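- $\hat{Q}$ values never decrease during training: $\hat{Q}_{n+1}(s,a) \ge \hat{Q}_n(s,a)$
- $\hat{Q}$ values will remain in the interval between 0 and the true $Q$ value: $0 \le \hat{Q}_n(s,a) \le Q(s,a)$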
- $\hat{Q}$ will converge to $Q$ if
  - the system is a deterministic MDP,
  - immediate rewards are bounded: $|r(s,a)| < c$ for some constant $c$,
  - the agent selects actions such that it visits every state-action
    pair infinitely often: it must execute $a$ from $s$ with nonzero
    frequency as the length of its action sequence approaches infinity
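
The properties above can be checked on a toy problem. The following is a minimal sketch, not taken from these notes: it assumes a hypothetical four-state chain with an absorbing goal, a reward of 100 for entering the goal and 0 otherwise, a discount factor of 0.9, a table initialized to zero, and the deterministic update that sets each estimate to the immediate reward plus the discounted best estimate at the successor state. State-action pairs are sampled uniformly at random, which stands in for the "visits every pair infinitely often" condition. All names (`step`, `q_learning`, `GAMMA`) are illustrative.

```python
import random

GAMMA = 0.9           # discount factor (assumed value)
N_STATES = 4          # states 0..3; state 3 is an absorbing goal
ACTIONS = (-1, +1)    # move left or move right along the chain


def step(s, a):
    """Deterministic transition: returns (next_state, reward)."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    r = 100.0 if s2 == N_STATES - 1 else 0.0   # bounded, nonnegative reward
    return s2, r


def q_learning(n_updates=2000, seed=0):
    """Run tabular Q-learning updates on the chain.

    Returns the learned table and a flag recording whether any
    update ever lowered an entry.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    never_decreased = True
    for _ in range(n_updates):
        # Sample a non-goal state and an action uniformly, so every
        # state-action pair keeps being visited.
        s = rng.randrange(N_STATES - 1)
        a = rng.choice(ACTIONS)
        s2, r = step(s, a)
        new = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
        if new < q[(s, a)]:
            never_decreased = False
        q[(s, a)] = new
    return q, never_decreased


if __name__ == "__main__":
    table, monotone = q_learning()
    for key in sorted(table):
        print(key, round(table[key], 3))
    print("values never decreased:", monotone)
```

In this sketch the monotonicity and the lower bound of 0 depend on the rewards being nonnegative and the table starting at zero; with negative rewards or a different initialization the estimates may move in both directions, even though the convergence conditions listed above are unchanged.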
Patricia Riddle
Fri May 15 13:00:36 NZST 1998