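Assuming non-negative rewards and all $\hat{Q}$ values initialized to zero:
- $\hat{Q}$ values never decrease during training: $\hat{Q}_{n+1}(s,a) \ge \hat{Q}_n(s,a)$
- $\hat{Q}$ will remain in the interval between 0 and the true $Q$: $0 \le \hat{Q}_n(s,a) \le Q(s,a)$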
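
A brief sketch of why the interval property holds, assuming the standard deterministic Q-learning update with transition function $\delta$, discount factor $0 \le \gamma < 1$, non-negative rewards, and $\hat{Q}_0 = 0$. The symbols $\delta$ and $\gamma$ do not appear in the text above and are assumptions here. The argument is by induction on the update count $n$:

```latex
% Assumed deterministic update rule (applied to the entry being revised):
\hat{Q}_{n+1}(s,a) = r(s,a) + \gamma \max_{a'} \hat{Q}_n(\delta(s,a), a')

% Upper bound: if \hat{Q}_n(s',a') \le Q(s',a') for all (s',a'), then
\hat{Q}_{n+1}(s,a) \le r(s,a) + \gamma \max_{a'} Q(\delta(s,a), a') = Q(s,a)

% Lower bound: if r(s,a) \ge 0 and \hat{Q}_n(s',a') \ge 0 for all (s',a'), then
\hat{Q}_{n+1}(s,a) \ge 0
```

Both bounds hold trivially at $n = 0$, since $\hat{Q}_0 = 0 \le Q$ when rewards are non-negative.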
$\hat{Q}$ will converge to $Q$ if
- the system is a deterministic MDP,
- immediate rewards are bounded: $|r(s,a)| < c$ for some constant $c$,
- the agent selects actions such that it visits every state-action
pair infinitely often: it must execute action $a$ from state $s$ with nonzero
frequency as the length of its action sequence approaches infinity
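
The conditions above can be checked empirically on a toy problem. The following is a minimal sketch, not the method from these notes: it assumes a hypothetical five-state corridor with an absorbing goal, a reward of 100 for entering the goal, a discount factor of 0.9, and a table initialized to zero. To approximate "every state-action pair infinitely often", it samples pairs uniformly at random instead of following trajectories. All names (`step`, `true_q`, `q_learning`) are illustrative.

```python
import random

N_STATES = 5          # corridor states 0..4 (assumed toy environment)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # assumed discount factor
GOAL = N_STATES - 1   # absorbing goal state


def step(s, a):
    """Deterministic transition and bounded, non-negative reward."""
    if s == GOAL:
        return s, 0.0                      # goal is absorbing, no further reward
    s2 = max(s - 1, 0) if a == 0 else s + 1
    return s2, (100.0 if s2 == GOAL else 0.0)


def true_q(sweeps=200):
    """Reference Q values from repeated synchronous Bellman backups."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        q = [[step(s, a)[1] + GAMMA * max(q[step(s, a)[0]]) for a in ACTIONS]
             for s in range(N_STATES)]
    return q


def q_learning(updates=5000, seed=0):
    """Deterministic Q-learning; also records whether both properties held."""
    rng = random.Random(seed)
    q_hat = [[0.0, 0.0] for _ in range(N_STATES)]   # initialized to zero
    q_star = true_q()
    monotone = bounded = True
    for _ in range(updates):
        # Uniform sampling keeps every (s, a) pair visited again and again.
        s, a = rng.randrange(N_STATES), rng.choice(ACTIONS)
        s2, r = step(s, a)
        new = r + GAMMA * max(q_hat[s2])
        monotone &= new >= q_hat[s][a]               # values never decrease
        bounded &= 0.0 <= new <= q_star[s][a] + 1e-9  # stays in [0, Q]
        q_hat[s][a] = new
    return q_hat, q_star, monotone, bounded


if __name__ == "__main__":
    q_hat, q_star, monotone, bounded = q_learning()
    print("never decreased:", monotone)
    print("stayed within [0, Q]:", bounded)
    for s in range(N_STATES):
        print(s, q_hat[s], q_star[s])
```

If the sampling were restricted so that some pair is never tried, its entry would stay at zero and the estimate would not reach the true value, which is what the third condition rules out.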
Patricia Riddle
Fri May 15 13:00:36 NZST 1998