Next: Experimentation Strategies
Up: Reinforcement Learning
Previous: Illustrative Example
-
values never decrease during training -
-
will remain in the interval between 0 and
-
- will converge if
- deterministic MDP,
- immediate rewards are bounded -
- the agent selects actions such that it visits every state-action
pair infinitely often - must execute
from
with nonzero
frequency as the length of its action sequence approaches infinity
Patricia Jean Riddle
Wed Jun 23 13:06:34 NZST 1999