Optimal Policy
We require the agent to learn the optimal policy,
$$\pi^* \equiv \arg\max_{\pi} V^{\pi}(s), \quad \forall s$$
whose value function is
$$V^{\pi^*}(s)$$
or for simplicity
$$V^*(s)$$
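As a rough illustration of what "optimal" means here, the sketch below enumerates every deterministic policy on a tiny made-up problem and picks one whose value is at least as high as every other policy's in every state. Everything in it is an assumption for illustration only: the three-state chain, the reward of 100 for reaching the goal, the discount factor of 0.9, the use of discounted cumulative reward as a policy's value, and the names `step`, `evaluate` and `optimal_policy`.

```python
from itertools import product

# Hypothetical toy problem: a chain of three states, state 2 is an absorbing goal.
STATES = [0, 1, 2]
ACTIONS = ["left", "right"]
GAMMA = 0.9  # assumed discount factor


def step(state, action):
    """Return (next_state, reward) for the deterministic toy chain."""
    if state == 2:                      # the goal absorbs: no further reward
        return 2, 0.0
    nxt = max(state - 1, 0) if action == "left" else state + 1
    reward = 100.0 if nxt == 2 else 0.0  # reward only for entering the goal
    return nxt, reward


def evaluate(policy, sweeps=200):
    """Approximate the discounted value of each state under a fixed policy."""
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            nxt, r = step(s, policy[s])
            v[s] = r + GAMMA * v[nxt]
    return v


def optimal_policy():
    """Brute force: return a policy whose value is maximal in every state."""
    candidates = [dict(zip(STATES, acts))
                  for acts in product(ACTIONS, repeat=len(STATES))]
    values = [evaluate(p) for p in candidates]
    for p, v in zip(candidates, values):
        if all(v[s] >= w[s] - 1e-9 for w in values for s in STATES):
            return p, v
    raise ValueError("no policy dominates in every state")


if __name__ == "__main__":
    policy, value = optimal_policy()
    print(policy, value)
```

Brute-force enumeration is only feasible for very small problems, since the number of deterministic policies grows exponentially with the number of states. It is used here purely to make the definition concrete, not as a learning method.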
Patricia Riddle
Fri May 15 13:00:36 NZST 1998