- Reinforcement learning addresses learning control strategies for
autonomous agents. The training information is a real-valued reward for
each state-action transition, and the goal is to learn an action policy
that maximizes the total discounted reward from any starting state
(sketched below).
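
As a sketch of the quantity being maximized (the discount factor gamma is the standard assumption and does not appear explicitly in the summary above), the discounted cumulative reward from state s_t under policy pi is:

    V^{\pi}(s_t) \equiv r_t + \gamma r_{t+1} + \gamma^{2} r_{t+2} + \cdots
                 = \sum_{i=0}^{\infty} \gamma^{i} r_{t+i}, \qquad 0 \le \gamma < 1

The optimal policy pi* is the one that maximizes V^{\pi}(s) for every starting state s.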
- Reinforcement learning algorithms fit naturally to Markov decision
processes (MDPs), where the outcome of applying an action to a state
depends only on that action and state (not on preceding actions or
states). MDPs cover a wide range of problems, including robot control,
factory automation, and scheduling.
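
A short sketch of the Markov assumption, written in the delta/r notation used later in this summary:

    s_{t+1} = \delta(s_t, a_t), \qquad r_t = r(s_t, a_t)

Both functions may be nondeterministic, but they depend only on the current state and action; given s_t and a_t, earlier states and actions are irrelevant.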
- Q learning is one form of reinforcement learning in which the
function Q(s, a) is defined as the maximum expected, discounted,
cumulative reward the agent can achieve by applying action a to
state s. In Q learning no knowledge of how the actions affect the
environment is required.
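
Spelling this out (the symbols were lost in conversion; the recursive form below is the standard one for the deterministic case, with expectations added in the nondeterministic case), the Q function and the resulting one-step training rule are:

    Q(s,a) \equiv r(s,a) + \gamma V^{*}(\delta(s,a))
           = r(s,a) + \gamma \max_{a'} Q(\delta(s,a), a')

    \hat{Q}(s,a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s', a')

where r and s' are the observed immediate reward and successor state; because the update uses only observed quantities, no model of the environment is required.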
- Q learning is proven to converge under certain assumptions
when the hypothesis (the learned estimate of Q) is represented by a
lookup table. It converges for both deterministic and nondeterministic
MDPs, but requires thousands of training iterations to converge in even
modest problems (a minimal lookup-table implementation is sketched below).
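
A minimal lookup-table sketch in Python, under illustrative assumptions: the tiny deterministic chain environment, the reward of 10 at the goal, and the epsilon-greedy exploration constants are all hypothetical and not part of the original summary; only the update rule r + gamma * max_a' Q(s', a') comes from the material above.

    import random

    # Hypothetical deterministic chain world: states 0..4, actions -1/+1,
    # reward 10 for reaching the rightmost state, 0 otherwise.
    N_STATES, ACTIONS, GOAL = 5, (-1, +1), 4
    GAMMA, EPSILON, EPISODES = 0.9, 0.5, 200

    def step(state, action):
        """Deterministic transition delta(s, a) and reward r(s, a)."""
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 10.0 if next_state == GOAL else 0.0
        return next_state, reward

    # Lookup-table representation of the Q estimate, initialized to zero.
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    for _ in range(EPISODES):
        s = 0
        while s != GOAL:
            # Epsilon-greedy choice between exploration and exploitation.
            if random.random() < EPSILON:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s_next, r = step(s, a)
            # Deterministic Q-learning update: r + gamma * max_a' Q(s', a').
            Q[(s, a)] = r + GAMMA * max(Q[(s_next, act)] for act in ACTIONS)
            s = s_next

    # Greedy policy read off the learned table.
    policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
    print(policy)

Even on this five-state problem, many episodes of repeated updates are needed before the table settles, which illustrates the slow convergence noted above.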
- Q learning is a member of the class of temporal difference
algorithms, which learn by iteratively reducing the discrepancies
between estimates produced by the agent at different times.
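
For context, a hedged sketch of the temporal-difference view (the n-step estimates and the lambda blending parameter follow the usual TD(lambda) notation and are not stated explicitly above):

    Q^{(1)}(s_t, a_t) = r_t + \gamma \max_{a} \hat{Q}(s_{t+1}, a)

    Q^{(n)}(s_t, a_t) = r_t + \gamma r_{t+1} + \cdots + \gamma^{n-1} r_{t+n-1}
                        + \gamma^{n} \max_{a} \hat{Q}(s_{t+n}, a)

    Q^{\lambda}(s_t, a_t) = (1 - \lambda)\left[ Q^{(1)} + \lambda Q^{(2)} + \lambda^{2} Q^{(3)} + \cdots \right]

One-step Q learning corresponds to lambda = 0: the discrepancy being reduced is between the current estimate Q-hat(s_t, a_t) and the one-step lookahead estimate Q^{(1)}.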
- Reinforcement learning is closely related to dynamic programming.
The key difference is that dynamic programming assumes the agent possesses knowledge of
the transition function delta(s, a) and the reward function r(s, a), while Q learning assumes
the learner lacks this knowledge.
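
A brief sketch of the contrast in the notation used above: dynamic programming can back values up through a known model, while Q learning backs them up through observed transitions.

    Dynamic programming (model known):
        V^{*}(s) = \max_{a} \bigl[ r(s,a) + \gamma V^{*}(\delta(s,a)) \bigr]

    Q learning (model unknown):
        \hat{Q}(s,a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s', a')

The first backup evaluates r and delta directly; the second uses only the reward r and successor state s' actually observed after taking action a in state s.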
Patricia Riddle