- Q learning is a special case of temporal difference learning
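- Q learning can be seen as one-step lookahead,
$Q^{(1)}(s_t, a_t) = r_t + \gamma \max_a \hat{Q}(s_{t+1}, a)$
- two-step lookahead,
$Q^{(2)}(s_t, a_t) = r_t + \gamma r_{t+1} + \gamma^2 \max_a \hat{Q}(s_{t+2}, a)$
- general formula for n-step lookahead,
$Q^{(n)}(s_t, a_t) = r_t + \gamma r_{t+1} + \cdots + \gamma^{n-1} r_{t+n-1} + \gamma^n \max_a \hat{Q}(s_{t+n}, a)$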
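- so can use a constant $\lambda$ ($0 \le \lambda \le 1$) to combine
estimates from various lookahead distances,
$Q^{\lambda}(s_t, a_t) = (1 - \lambda) \left[ Q^{(1)}(s_t, a_t) + \lambda Q^{(2)}(s_t, a_t) + \lambda^2 Q^{(3)}(s_t, a_t) + \cdots \right]$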
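- recursive definition -
$Q^{\lambda}(s_t, a_t) = r_t + \gamma \left[ (1 - \lambda) \max_a \hat{Q}(s_{t+1}, a) + \lambda Q^{\lambda}(s_{t+1}, a_{t+1}) \right]$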
- motivation is that when the agent follows an optimal policy for
choosing actions, if $\lambda = 1$ then $Q^{\lambda}$ will
provide the perfect estimate of the true $Q$ regardless of errors in
$\hat{Q}$; if suboptimal action sequences are chosen, then the rewards
$r_{t+i}$ observed far in the future can be misleading
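
The lookahead estimates in the list above can be illustrated with a small numeric sketch. It is not taken from the notes: the function names, the trajectory layout (`rewards[i]` holds the reward i steps ahead, `max_q_next[i]` holds the largest current Q estimate in the state reached after that step), and the sample numbers are all assumptions made for illustration. Because a recorded trajectory is finite, the sketch also assumes the infinite weighted sum is truncated at the trajectory length, with the leftover weight placed on the longest available lookahead.

```python
from typing import Sequence


def n_step_estimate(rewards: Sequence[float], max_q_next: Sequence[float],
                    gamma: float, n: int) -> float:
    """n-step lookahead: n discounted rewards plus a discounted bootstrap.

    rewards[i] is the reward observed i steps ahead (hypothetical layout).
    max_q_next[i] is the max over actions of the current Q estimate in the
    state reached after step i.
    """
    discounted_rewards = sum(gamma ** i * rewards[i] for i in range(n))
    return discounted_rewards + gamma ** n * max_q_next[n - 1]


def lambda_estimate(rewards: Sequence[float], max_q_next: Sequence[float],
                    gamma: float, lam: float) -> float:
    """Weighted combination of the n-step estimates with constant lam.

    Assumption: the sum is cut off at the trajectory length T, and the
    remaining weight lam**(T-1) goes to the T-step estimate.
    """
    T = len(rewards)
    total = 0.0
    for n in range(1, T):
        weight = (1 - lam) * lam ** (n - 1)
        total += weight * n_step_estimate(rewards, max_q_next, gamma, n)
    total += lam ** (T - 1) * n_step_estimate(rewards, max_q_next, gamma, T)
    return total


def lambda_estimate_recursive(rewards: Sequence[float],
                              max_q_next: Sequence[float],
                              gamma: float, lam: float) -> float:
    """Same quantity computed backwards with the recursive definition."""
    # Past the end of the trajectory, fall back on the last bootstrap value.
    q_lam = max_q_next[-1]
    for r, m in zip(reversed(rewards), reversed(max_q_next)):
        q_lam = r + gamma * ((1 - lam) * m + lam * q_lam)
    return q_lam


if __name__ == "__main__":
    # Made-up three-step trajectory ending in a terminal state (bootstrap 0).
    rewards = [0.0, 0.0, 1.0]
    max_q_next = [0.5, 0.2, 0.0]
    for lam in (0.0, 0.5, 1.0):
        direct = lambda_estimate(rewards, max_q_next, 0.9, lam)
        recursive = lambda_estimate_recursive(rewards, max_q_next, 0.9, lam)
        print(lam, direct, recursive)
```

Under these assumptions, setting `lam` to 0 reduces to the one-step Q learning estimate, while setting it to 1 ignores the intermediate Q estimates and relies only on the observed rewards plus the final bootstrap value. The direct and recursive forms should agree up to floating-point rounding.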
Patricia Riddle
Fri May 15 13:00:36 NZST 1998