 
-  Q learning is a special case of temporal difference learning
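 -  Q learning can be seen as one-step lookahead, $Q^{(1)}(s_t, a_t) \equiv r_t + \gamma \max_a \hat{Q}(s_{t+1}, a)$
  -  two-step lookahead, $Q^{(2)}(s_t, a_t) \equiv r_t + \gamma r_{t+1} + \gamma^2 \max_a \hat{Q}(s_{t+2}, a)$
  -  general formula for n-step lookahead, $Q^{(n)}(s_t, a_t) \equiv r_t + \gamma r_{t+1} + \cdots + \gamma^{n-1} r_{t+n-1} + \gamma^n \max_a \hat{Q}(s_{t+n}, a)$
  -  so can use a constant $0 \le \lambda \le 1$ to combine estimates from various lookahead distances, $Q^{\lambda}(s_t, a_t) \equiv (1 - \lambda) \left[ Q^{(1)}(s_t, a_t) + \lambda Q^{(2)}(s_t, a_t) + \lambda^2 Q^{(3)}(s_t, a_t) + \cdots \right]$
  -  recursive definition - $Q^{\lambda}(s_t, a_t) = r_t + \gamma \left[ (1 - \lambda) \max_a \hat{Q}(s_{t+1}, a) + \lambda Q^{\lambda}(s_{t+1}, a_{t+1}) \right]$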
  -  motivation: when the agent follows an optimal policy for choosing actions, $Q^{\lambda}$ with $\lambda = 1$ provides a perfect estimate of the true $Q$ regardless of errors in $\hat{Q}$; if suboptimal action sequences are chosen, the rewards $r_{t+i}$ observed far in the future can be misleading
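
The lookahead estimates above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the notes: the function names are hypothetical, the episode is assumed to end in a terminal state whose estimated value is 0, and the list `next_values` (where `next_values[k]` stands for the max over actions of the current estimate at state k+1) holds made-up numbers. It computes the n-step estimate, the lambda-weighted combination, and the recursive form, which should agree with each other.

```python
def n_step_estimate(rewards, next_values, t, n, gamma):
    """n-step lookahead estimate for the state-action pair visited at time t.

    rewards[k] is the reward received at step k.
    next_values[k] is max_a Q_hat(s_{k+1}, a); the last entry is assumed
    to be 0 because the episode ends in a terminal state.
    """
    steps = min(n, len(rewards) - t)  # cannot look past the end of the episode
    discounted = sum(gamma ** i * rewards[t + i] for i in range(steps))
    return discounted + gamma ** steps * next_values[t + steps - 1]


def lambda_estimate_sum(rewards, next_values, t, lam, gamma):
    """Weighted combination (1 - lam) * sum_n lam^(n-1) * (n-step estimate)."""
    horizon = len(rewards) - t
    total = 0.0
    for n in range(1, horizon):
        weight = (1 - lam) * lam ** (n - 1)
        total += weight * n_step_estimate(rewards, next_values, t, n, gamma)
    # Every lookahead of length >= horizon reaches the terminal state and
    # gives the same estimate; their weights sum to lam^(horizon - 1).
    total += lam ** (horizon - 1) * n_step_estimate(
        rewards, next_values, t, horizon, gamma
    )
    return total


def lambda_estimate_recursive(rewards, next_values, t, lam, gamma):
    """Recursive form: r_t + gamma * [(1 - lam) * max_a Q_hat(s_{t+1}, a)
    + lam * (lambda estimate at t + 1)]."""
    if t == len(rewards):  # past the terminal state: nothing left to collect
        return 0.0
    future = lambda_estimate_recursive(rewards, next_values, t + 1, lam, gamma)
    return rewards[t] + gamma * ((1 - lam) * next_values[t] + lam * future)


# Made-up three-step episode: a single reward of 100 at the final step.
rewards = [0.0, 0.0, 100.0]
next_values = [5.0, 20.0, 0.0]  # hypothetical current estimates; terminal = 0

by_sum = lambda_estimate_sum(rewards, next_values, 0, lam=0.5, gamma=0.9)
by_recursion = lambda_estimate_recursive(rewards, next_values, 0, lam=0.5, gamma=0.9)
print(by_sum, by_recursion)
```

Setting `lam=0` recovers the one-step (ordinary Q learning) training value, while `lam=1` ignores the current estimates entirely and uses only the observed discounted rewards.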
 
 
Patricia Riddle 
Fri May 15 13:00:36 NZST 1998