Temporal Difference Learning

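Q learning is a special case of temporal difference learning.
Q learning can be seen as a one-step lookahead estimate:
    $Q^{(1)}(s_t, a_t) = r_t + \gamma \max_a \hat{Q}(s_{t+1}, a)$
Two-step lookahead:
    $Q^{(2)}(s_t, a_t) = r_t + \gamma r_{t+1} + \gamma^2 \max_a \hat{Q}(s_{t+2}, a)$
General formula for n-step lookahead:
    $Q^{(n)}(s_t, a_t) = r_t + \gamma r_{t+1} + \cdots + \gamma^{n-1} r_{t+n-1} + \gamma^n \max_a \hat{Q}(s_{t+n}, a)$
TD($\lambda$) uses a constant $0 \le \lambda \le 1$ to combine the estimates from the various lookahead distances:
    $Q^{\lambda}(s_t, a_t) = (1 - \lambda) \left[ Q^{(1)}(s_t, a_t) + \lambda Q^{(2)}(s_t, a_t) + \lambda^2 Q^{(3)}(s_t, a_t) + \cdots \right]$
Recursive definition:
    $Q^{\lambda}(s_t, a_t) = r_t + \gamma \left[ (1 - \lambda) \max_a \hat{Q}(s_{t+1}, a) + \lambda Q^{\lambda}(s_{t+1}, a_{t+1}) \right]$
Motivation: when the agent follows an optimal policy for choosing actions, $Q^{\lambda}$ with $\lambda = 1$ provides a perfect estimate of the true $Q$ regardless of errors in $\hat{Q}$; if suboptimal action sequences are chosen, the rewards $r_{t+i}$ observed far in the future can be misleading.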
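
As a rough illustration of how the lookahead estimates combine, the following Python sketch computes the n-step estimate and the lambda-weighted estimate for one recorded episode, both as a weighted sum and through the recursive definition. It rests on assumptions that are not part of the slide: the episode is finite, the learned estimate at the terminal state is zero, and the current table Q-hat has already been reduced to a list v_hat with v_hat[k] = max over a of Q-hat(s_k, a). All function and variable names are hypothetical.

```python
def n_step_estimate(rewards, v_hat, t, n, gamma):
    """n-step lookahead estimate for the state-action pair visited at time t.

    rewards[k] is the reward r_k observed at step k.
    v_hat[k] is max_a Q_hat(s_k, a); v_hat[len(rewards)] is the terminal
    state and is assumed to be 0.
    """
    n = min(n, len(rewards) - t)  # cannot look past the end of the episode
    discounted = sum(gamma ** i * rewards[t + i] for i in range(n))
    return discounted + gamma ** n * v_hat[t + n]


def lambda_estimate(rewards, v_hat, t, gamma, lam):
    """Weighted combination (1 - lam) * sum_n lam^(n-1) * Q^(n).

    In a finite episode every lookahead that reaches the terminal state
    gives the same value, so the remaining weight lam^(steps_left - 1)
    is placed on that final estimate.
    """
    steps_left = len(rewards) - t
    total = 0.0
    for n in range(1, steps_left):
        weight = (1 - lam) * lam ** (n - 1)
        total += weight * n_step_estimate(rewards, v_hat, t, n, gamma)
    final = n_step_estimate(rewards, v_hat, t, steps_left, gamma)
    return total + lam ** (steps_left - 1) * final


def lambda_estimate_recursive(rewards, v_hat, t, gamma, lam):
    """Same quantity computed from the recursive definition."""
    if t == len(rewards):  # terminal state: nothing left to estimate
        return 0.0
    rest = lambda_estimate_recursive(rewards, v_hat, t + 1, gamma, lam)
    return rewards[t] + gamma * ((1 - lam) * v_hat[t + 1] + lam * rest)


if __name__ == "__main__":
    # Made-up three-step episode, for illustration only.
    rewards = [0.0, 0.0, 1.0]
    v_hat = [0.5, 0.2, 0.9, 0.0]  # last entry is the terminal state
    for lam in (0.0, 0.5, 1.0):
        print(lam,
              lambda_estimate(rewards, v_hat, 0, 0.9, lam),
              lambda_estimate_recursive(rewards, v_hat, 0, 0.9, lam))
```

Under these assumptions, lam = 0 reduces to the one-step Q learning estimate, while lam = 1 ignores the learned table and uses only the observed discounted rewards.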

Patricia Riddle
Fri May 15 13:00:36 NZST 1998