Q Learning Properties


Still needed: a way to estimate $Q$, by iterative approximation from a recursive definition
$V^*(s) = \max_{a'} Q(s, a')$, so $Q(s, a) = r(s, a) + \gamma \max_{a'} Q(\delta(s, a), a')$
$\hat{Q}$, the learner's estimate of $Q$, is stored in a big table, which is initially filled with random values or zeros
The agent starts in some state, $s$, chooses some action, $a$, and observes the resulting reward, $r = r(s, a)$, and the new state, $s' = \delta(s, a)$
It then updates the table: $\hat{Q}(s, a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s', a')$
The agent doesn't need to know the functions $\delta$ or $r$; it just executes the action and observes $s'$ and $r$, and so is just sampling these functions at the current values of $s$ and $a$
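
The table update described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's own code: the three-state corridor, the reward of 100 for reaching the goal, the discount factor of 0.9, and the names `q_update`, `step` and `run` are all assumptions made for the example. The learner never looks inside `step`; it only sees the reward and next state that come back, which is the sampling idea in the last point.

```python
import random
from collections import defaultdict

GAMMA = 0.9                   # discount factor (assumed value)
ACTIONS = ("left", "right")   # hypothetical action set
GOAL = 2                      # hypothetical goal state in a corridor 0-1-2


def q_update(q_table, state, action, reward, next_state, actions, gamma=GAMMA):
    """Deterministic-world update: Q(s,a) <- r + gamma * max_a' Q(s',a')."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    q_table[(state, action)] = reward + gamma * best_next
    return q_table[(state, action)]


def step(state, action):
    """Toy environment, hidden from the learner: returns (reward, next_state)."""
    if action == "right":
        next_state = min(state + 1, GOAL)
    else:
        next_state = max(state - 1, 0)
    reward = 100.0 if next_state == GOAL else 0.0
    return reward, next_state


def run(episodes=200, seed=0):
    """Random exploration; every observed transition updates the table."""
    rng = random.Random(seed)
    q_table = defaultdict(float)      # table initially filled with zeros
    for _ in range(episodes):
        state = rng.choice([0, 1])    # start each episode in a non-goal state
        while state != GOAL:
            action = rng.choice(ACTIONS)
            reward, next_state = step(state, action)   # execute and observe
            q_update(q_table, state, action, reward, next_state, ACTIONS)
            state = next_state
    return q_table


if __name__ == "__main__":
    for key, value in sorted(run().items()):
        print(key, round(value, 2))
```

Under these assumed rewards the fixed point of the update is 100 for moving right from state 1, 90 for moving right from state 0, and 81 for either leftward move, since each step away from the goal multiplies the value by the discount factor.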

Patricia Riddle
Fri May 15 13:00:36 NZST 1998