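- optimal action is the one that maximizes the sum of the immediate
reward $r(s,a)$ and the value $V^*$ of the immediate successor state,
discounted by $\gamma$
- $\pi^*(s) = \arg\max_a \left[ r(s,a) + \gamma V^*(\delta(s,a)) \right]$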
- but this requires perfect knowledge of the reward function $r$ and the
state transition function $\delta$ !!!
- so create the Q function, $Q(s,a) = r(s,a) + \gamma V^*(\delta(s,a))$
- now $\pi^*(s) = \arg\max_a Q(s,a)$
- now we can select optimal actions even when we have no
knowledge of the reward function $r$ or the transition function $\delta$
- the Q value for each state-action transition equals the reward $r$
for this transition plus the value $V^*$ of the resulting state,
discounted by $\gamma$
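
The points above can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not from the notes: the three-state deterministic world, the reward of 100 for entering the goal, the discount factor 0.9, and the names `DELTA`, `REWARD`, `optimal_values`, `q_table` and `greedy_action`. The sketch builds the Q table from the reward function, the transition function and the optimal state values, then picks actions by looking only at the Q table.

```python
GAMMA = 0.9  # discount factor (assumed value)

# Hypothetical deterministic world: states 0, 1, 2; state 2 is an absorbing goal.
STATES = [0, 1, 2]
ACTIONS = ["left", "right"]

# delta(s, a): state transition function (assumed for this example).
DELTA = {
    (0, "left"): 0, (0, "right"): 1,
    (1, "left"): 0, (1, "right"): 2,
    (2, "left"): 2, (2, "right"): 2,
}

# r(s, a): immediate reward; only the move into the goal is rewarded.
REWARD = {(s, a): 0.0 for s in STATES for a in ACTIONS}
REWARD[(1, "right")] = 100.0


def optimal_values(sweeps=100):
    """Approximate V* by sweeping V(s) = max_a [r(s,a) + gamma * V(delta(s,a))]."""
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        v = {
            s: max(REWARD[s, a] + GAMMA * v[DELTA[s, a]] for a in ACTIONS)
            for s in STATES
        }
    return v


def q_table(v_star):
    """Q(s,a) = r(s,a) + gamma * V*(delta(s,a)) for every state-action pair."""
    return {
        (s, a): REWARD[s, a] + GAMMA * v_star[DELTA[s, a]]
        for s in STATES
        for a in ACTIONS
    }


def greedy_action(q, s):
    """pi*(s) = argmax_a Q(s,a); reads only the Q table, never r or delta."""
    return max(ACTIONS, key=lambda a: q[s, a])


V_STAR = optimal_values()
Q = q_table(V_STAR)

for s in STATES:
    print(s, {a: Q[s, a] for a in ACTIONS}, greedy_action(Q, s))
```

Note that `greedy_action` takes only the Q table and a state. Once Q is available, by whatever means, action selection no longer needs the reward function or the transition function, which is the reason for introducing Q in the first place.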
Patricia Riddle
Fri May 15 13:00:36 NZST 1998