Convergence

Next: Experimentation Strategies Up: Reinforcement Learning Previous: Illustrative Example

values never decrease during training -
will remain in the interval between 0 and -
will converge if
1. deterministic MDP,
2. immediate rewards are bounded -
3. the agent selects actions such that it visits every state-action pair infinitely often - must execute from with nonzero frequency as the length of its action sequence approaches infinity

Patricia Riddle
Fri May 15 13:00:36 NZST 1998