-  Reinforcement learning is learning control strategies for
autonomous agents.  The training information is a real-valued reward for
each state-action transition.  The goal is to learn an action policy that maximizes
the total reward received from any starting state (the precise objective is written out
after this list).
 -  Reinforcement learning algorithms fit Markov decision processes (MDPs),
where the outcome of applying an action to a state depends only on that
action and state (not on preceding actions or states).  MDPs cover a wide
range of problems, including robot control, factory automation, and scheduling.
 -  Q learning is one form of reinforcement learning in which the agent learns an
evaluation function Q(s, a), defined as the maximum expected, discounted,
cumulative reward the agent can achieve by applying action a to state s
(the defining equation is written out after this list).  In Q learning no knowledge
of how the actions affect the environment is required.
 -  Q learning can be proven to converge under certain assumptions
when the hypothesis Q̂ is represented by a lookup
table (a tabular sketch appears after this list).  It converges for both deterministic
and nondeterministic MDPs, but requires thousands of training iterations to converge
on even modest problems.
 -  Q learning is a member of the class of temporal difference
algorithms.  These algorithms learn by iteratively reducing the
discrepancies between estimates produced by the agent at different
times (see the n-step estimates after this list).
 -  Reinforcement learning is closely related to dynamic programming.
The key difference is that dynamic programming assumes the agent possesses knowledge of
the state-transition function δ(s, a) and the reward function r(s, a), while Q learning assumes
the learner lacks this knowledge (a value-iteration sketch with a known model appears after
this list).
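
The "total reward" in the first item above is usually taken to be the discounted cumulative
reward obtained by following a policy π from state s_t, with discount factor 0 ≤ γ < 1.
A sketch of this objective in the standard notation:

    V^{\pi}(s_t) \equiv r_t + \gamma r_{t+1} + \gamma^{2} r_{t+2} + \cdots
                      = \sum_{i=0}^{\infty} \gamma^{i} r_{t+i}

The optimal policy is the one that maximizes V^{\pi}(s) for every state s.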
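
The Q function summarized above can be written recursively.  The following is a sketch for the
deterministic case, writing δ(s, a) for the successor state, r(s, a) for the immediate reward,
and γ for the discount factor; in the nondeterministic case the right-hand side becomes an
expected value:

    Q(s, a) \equiv r(s, a) + \gamma \max_{a'} Q(\delta(s, a), a')

Intuitively, the value of applying action a in state s is the immediate reward plus the
discounted value of acting optimally from the resulting state.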
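
The lookup-table convergence point is easiest to see in code.  Below is a minimal Python sketch
of tabular Q learning on a small, hypothetical deterministic MDP (a 4-state chain with a single
rewarding terminal state); the states, rewards, and episode count are illustrative choices, not
taken from the notes.  The learner only observes the transitions it experiences and never
inspects the environment's transition or reward functions.

    import random
    from collections import defaultdict

    # Hypothetical 4-state chain: states 0..3.  Moving "right" from state 2
    # reaches the terminal state 3 and earns reward 100; all other moves earn 0.
    ACTIONS = ["left", "right"]

    def step(state, action):
        """Environment dynamics: applied when the agent acts, never consulted as a model."""
        next_state = min(state + 1, 3) if action == "right" else max(state - 1, 0)
        reward = 100 if next_state == 3 else 0
        return next_state, reward

    GAMMA = 0.9                     # discount factor
    Q = defaultdict(float)          # lookup-table hypothesis, all entries start at 0

    for episode in range(1000):     # many iterations are needed even on tiny problems
        s = random.randint(0, 2)    # random non-terminal starting state
        while s != 3:
            a = random.choice(ACTIONS)        # explore by acting randomly
            s_next, reward = step(s, a)       # observe one transition
            # Deterministic Q-learning training rule:
            #   Q(s, a) <- r + gamma * max_a' Q(s', a')
            Q[(s, a)] = reward + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)
            s = s_next

    # The greedy policy is read directly off the learned table.
    for s in range(3):
        print(s, max(ACTIONS, key=lambda a: Q[(s, a)]))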
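
The temporal-difference idea can be made concrete with the family of n-step training estimates,
which blend observed rewards with the agent's own later estimates (r_t denotes the reward
received at time t, and Q̂ the current hypothesis).  A sketch:

    Q^{(1)}(s_t, a_t) \equiv r_t + \gamma \max_{a} \hat{Q}(s_{t+1}, a)
    Q^{(n)}(s_t, a_t) \equiv r_t + \gamma r_{t+1} + \cdots + \gamma^{n-1} r_{t+n-1}
                             + \gamma^{n} \max_{a} \hat{Q}(s_{t+n}, a)

Q learning as sketched above uses only the one-step estimate Q^{(1)}; other temporal difference
methods, such as TD(λ), reduce the discrepancy between Q̂ and blends of these estimates
computed at different times.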
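
To contrast with the model-free Q-learning sketch, the dynamic-programming setting in the last
item assumes the transition function δ and reward function r are known, so Q can be computed by
sweeping its defining equation over all state-action pairs with no interaction with the
environment.  A Python sketch on the same hypothetical 4-state chain:

    # Dynamic programming with a known model: delta() and r() are given to the learner.
    STATES = [0, 1, 2, 3]
    ACTIONS = ["left", "right"]
    GAMMA = 0.9

    def delta(s, a):
        """Known state-transition function."""
        return min(s + 1, 3) if a == "right" else max(s - 1, 0)

    def r(s, a):
        """Known reward function."""
        return 100 if delta(s, a) == 3 else 0

    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for sweep in range(100):                # iterate until the values settle
        for s in STATES:
            if s == 3:                      # terminal state: value stays 0
                continue
            for a in ACTIONS:
                s_next = delta(s, a)
                Q[(s, a)] = r(s, a) + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)

    print({key: round(v, 1) for key, v in Q.items()})

Both sketches converge to the same Q table; only the information available to the learner differs.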
 
 
Patricia Riddle 
Fri May 15 13:00:36 NZST 1998