-  Reinforcement learning is learning control strategies for
autonomous agents.  The training information is a real-valued reward for
each state-action transition.  The goal is to learn an action policy that maximizes
the total reward received from any starting state (the precise objective is written out
after this list).
 -  Reinforcement learning algorithms fit Markov decision processes (MDPs),
where the outcome of applying an action to a state depends only on that
action and state (not on preceding actions or states).  MDPs cover a wide
range of problems, including robot control, factory automation, and scheduling.
 -  Q learning is one form of reinforcement learning in which the agent learns an
evaluation function Q(s, a), defined as the maximum expected, discounted,
cumulative reward the agent can achieve by applying action a to state s
(the defining equation is written out after this list).  In Q learning no knowledge
of how the actions affect the environment is required.
 -  Q learning can be proven to converge under certain assumptions
when the hypothesis Q̂ is represented by a lookup
table (a tabular sketch appears after this list).  It converges for both deterministic
and nondeterministic MDPs, but requires thousands of training iterations to converge
on even modest problems.
 -  Q learning is a member of the class of temporal difference
algorithms.  These algorithms learn by iteratively reducing the
discrepancies between estimates produced by the agent at different
times (see the n-step estimates after this list).
 -  Reinforcement learning is closely related to dynamic programming.
The key difference is that dynamic programming assumes the agent possesses knowledge of
the state-transition function δ(s, a) and the reward function r(s, a), while Q learning assumes
the learner lacks this knowledge (a value-iteration sketch with a known model appears after
this list).
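
The "total reward" in the first item above is usually taken to be the discounted cumulative
reward obtained by following a policy π from state s_t, with discount factor 0 ≤ γ < 1.
A sketch of this objective in the standard notation:

    V^{\pi}(s_t) \equiv r_t + \gamma r_{t+1} + \gamma^{2} r_{t+2} + \cdots
                      = \sum_{i=0}^{\infty} \gamma^{i} r_{t+i}

The optimal policy is the one that maximizes V^{\pi}(s) for every state s.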
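
The Q function summarized above can be written recursively.  The following is a sketch for the
deterministic case, writing δ(s, a) for the successor state, r(s, a) for the immediate reward,
and γ for the discount factor; in the nondeterministic case the right-hand side becomes an
expected value:

    Q(s, a) \equiv r(s, a) + \gamma \max_{a'} Q(\delta(s, a), a')

Intuitively, the value of applying action a in state s is the immediate reward plus the
discounted value of acting optimally from the resulting state.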
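
The lookup-table convergence point is easiest to see in code.  Below is a minimal Python sketch
of tabular Q learning on a small, hypothetical deterministic MDP (a 4-state chain with a single
rewarding terminal state); the states, rewards, and episode count are illustrative choices, not
taken from the notes.  The learner only observes the transitions it experiences and never
inspects the environment's transition or reward functions.

    import random
    from collections import defaultdict

    # Hypothetical 4-state chain: states 0..3.  Moving "right" from state 2
    # reaches the terminal state 3 and earns reward 100; all other moves earn 0.
    ACTIONS = ["left", "right"]

    def step(state, action):
        """Environment dynamics: applied when the agent acts, never consulted as a model."""
        next_state = min(state + 1, 3) if action == "right" else max(state - 1, 0)
        reward = 100 if next_state == 3 else 0
        return next_state, reward

    GAMMA = 0.9                     # discount factor
    Q = defaultdict(float)          # lookup-table hypothesis, all entries start at 0

    for episode in range(1000):     # many iterations are needed even on tiny problems
        s = random.randint(0, 2)    # random non-terminal starting state
        while s != 3:
            a = random.choice(ACTIONS)        # explore by acting randomly
            s_next, reward = step(s, a)       # observe one transition
            # Deterministic Q-learning training rule:
            #   Q(s, a) <- r + gamma * max_a' Q(s', a')
            Q[(s, a)] = reward + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)
            s = s_next

    # The greedy policy is read directly off the learned table.
    for s in range(3):
        print(s, max(ACTIONS, key=lambda a: Q[(s, a)]))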
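
The temporal-difference idea can be made concrete with the family of n-step training estimates,
which blend observed rewards with the agent's own later estimates (r_t denotes the reward
received at time t, and Q̂ the current hypothesis).  A sketch:

    Q^{(1)}(s_t, a_t) \equiv r_t + \gamma \max_{a} \hat{Q}(s_{t+1}, a)
    Q^{(n)}(s_t, a_t) \equiv r_t + \gamma r_{t+1} + \cdots + \gamma^{n-1} r_{t+n-1}
                             + \gamma^{n} \max_{a} \hat{Q}(s_{t+n}, a)

Q learning as sketched above uses only the one-step estimate Q^{(1)}; other temporal difference
methods, such as TD(λ), reduce the discrepancy between Q̂ and blends of these estimates
computed at different times.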
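
To contrast with the model-free Q-learning sketch, the dynamic-programming setting in the last
item assumes the transition function δ and reward function r are known, so Q can be computed by
sweeping its defining equation over all state-action pairs with no interaction with the
environment.  A Python sketch on the same hypothetical 4-state chain:

    # Dynamic programming with a known model: delta() and r() are given to the learner.
    STATES = [0, 1, 2, 3]
    ACTIONS = ["left", "right"]
    GAMMA = 0.9

    def delta(s, a):
        """Known state-transition function."""
        return min(s + 1, 3) if a == "right" else max(s - 1, 0)

    def r(s, a):
        """Known reward function."""
        return 100 if delta(s, a) == 3 else 0

    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for sweep in range(100):                # iterate until the values settle
        for s in STATES:
            if s == 3:                      # terminal state: value stays 0
                continue
            for a in ACTIONS:
                s_next = delta(s, a)
                Q[(s, a)] = r(s, a) + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)

    print({key: round(v, 1) for key, v in Q.items()})

Both sketches converge to the same Q table; only the information available to the learner differs.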
 
 
Patricia Riddle 
Fri May 15 13:00:36 NZST 1998