Q Learning


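optimal action is the one that maximizes the sum of the immediate reward $r(s,a)$ and the value $V^*$ of the immediate successor state, discounted by $\gamma$:
$\pi^*(s) = \arg\max_a [ r(s,a) + \gamma V^*(\delta(s,a)) ]$
but to use this the agent must have perfect knowledge of the reward function $r$ and the state transition function $\delta$ !!!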
so create the Q function, $Q(s,a) = r(s,a) + \gamma V^*(\delta(s,a))$
now $\pi^*(s) = \arg\max_a Q(s,a)$
now we can select optimal actions even when we have no knowledge of $r$ or $\delta$
Q value for each state-action transition equals the $r$ value for this transition plus the $V^*$ value for the resulting state discounted by $\gamma$
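
The definition above can be illustrated with a minimal sketch. The three-state chain world, the reward of 100 for entering the goal, the discount of 0.9, and all names (`delta`, `r`, `greedy_action`) are assumptions made up for this example and are not from the slide. The sketch builds a Q table from a known reward and transition function, then picks actions by looking only at Q.

```python
# Toy deterministic world (hypothetical): states 0, 1, 2 in a row,
# state 2 is an absorbing goal. All numbers are illustrative assumptions.
GAMMA = 0.9
STATES = [0, 1, 2]
ACTIONS = ["left", "right"]

def delta(s, a):
    """State transition function: the successor of state s under action a."""
    if s == 2:                      # the goal is absorbing
        return 2
    return max(s - 1, 0) if a == "left" else s + 1

def r(s, a):
    """Reward function: 100 for entering the goal, 0 otherwise."""
    return 100.0 if s != 2 and delta(s, a) == 2 else 0.0

# Compute V* by simple value iteration (this step needs r and delta).
V = {s: 0.0 for s in STATES}
for _ in range(50):
    V = {s: max(r(s, a) + GAMMA * V[delta(s, a)] for a in ACTIONS)
         for s in STATES}

# Q value = reward for the transition + discounted V* of the resulting state.
Q = {(s, a): r(s, a) + GAMMA * V[delta(s, a)]
     for s in STATES for a in ACTIONS}

def greedy_action(s):
    """Select an action from the Q table alone, without calling r or delta."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])

for s in STATES:
    print(s, {a: Q[(s, a)] for a in ACTIONS}, greedy_action(s))
```

Here the table is filled in from a known model only to show what the Q values mean. Once the table exists, `greedy_action` never consults `r` or `delta`, which is the point of introducing Q.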

Patricia Riddle
Fri May 15 13:00:36 NZST 1998