 
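-  optimal action is the one that maximizes the sum of the immediate reward $r(s,a)$ and the value $V^*$ of the immediate successor state discounted by $\gamma$
  -  $\pi^*(s) = \arg\max_a \left[ r(s,a) + \gamma V^*(\delta(s,a)) \right]$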
  -  but this requires perfect knowledge of the reward function $r$ and the state transition function $\delta$
 !!! -  so create the Q function, $Q(s,a) = r(s,a) + \gamma V^*(\delta(s,a))$
  -  now $\pi^*(s) = \arg\max_a Q(s,a)$
  -  now we can select optimal actions even when we have no knowledge of $r$ or $\delta$
  -  Q value for each state-action transition equals the $r$ value for this transition plus the $V^*$ value for the resulting state discounted by $\gamma$
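
The points above can be checked on a small example. What follows is only a sketch under stated assumptions: the 3-state chain, the reward of 100 for entering the goal, the value 0.9 for the discount factor, and the names `delta`, `r`, `optimal_values`, `policy_from_model` and `policy_from_q` are all hypothetical and chosen for illustration. It builds the Q table from the reward function, the optimal values and the transition function, then shows that picking the action with the largest Q value gives the same choice as the model-based rule, without consulting the reward or transition function at decision time.

```python
# Hypothetical deterministic world: states 0 - 1 - 2, where 2 is an absorbing goal.
GAMMA = 0.9                      # assumed discount factor
STATES = [0, 1, 2]
ACTIONS = ["left", "right"]


def delta(s, a):
    """State transition function: the successor of state s under action a."""
    if s == 2:                   # the goal is absorbing
        return 2
    return max(s - 1, 0) if a == "left" else s + 1


def r(s, a):
    """Reward function: 100 for the move that enters the goal, otherwise 0."""
    return 100.0 if (s == 1 and a == "right") else 0.0


def optimal_values(sweeps=100):
    """Approximate V* by repeated Bellman backups (value iteration)."""
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        V = {s: max(r(s, a) + GAMMA * V[delta(s, a)] for a in ACTIONS)
             for s in STATES}
    return V


V_star = optimal_values()

# Q(s, a) = r(s, a) + gamma * V*(delta(s, a))
Q = {(s, a): r(s, a) + GAMMA * V_star[delta(s, a)]
     for s in STATES for a in ACTIONS}


def policy_from_model(s):
    """Needs r and delta at decision time."""
    return max(ACTIONS, key=lambda a: r(s, a) + GAMMA * V_star[delta(s, a)])


def policy_from_q(s):
    """Needs only the Q table at decision time."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])


for s in STATES:
    print(s, policy_from_model(s), policy_from_q(s))
```

Here the Q table is computed from a known model only to make the comparison possible; the point of the definition is that once Q values are available by any means, `policy_from_q` never touches `r` or `delta`.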
  
 
Patricia Riddle 
Fri May 15 13:00:36 NZST 1998