Next: Experimentation Strategies
Up: Reinforcement Learning
 Previous: Illustrative Example
 
-   
  values never decrease during training -  
  -   
  will remain in the interval between 0 and  
  -  
  -  will converge if 
-  deterministic MDP,
 -  immediate rewards are bounded -  
  -  the agent selects actions such that it visits every state-action
pair infinitely often - must execute  
  from  
  with nonzero
frequency as the length of its action sequence approaches infinity
 
 
 
Patricia Riddle 
Fri May 15 13:00:36 NZST 1998