- building a learning robot (or agent)
- sensors observe the state of the world - camera and sonars
- a set of actions can be performed to alter the state - move
forward, turn left
- Its task is to learn a control policy for choosing actions
that achieve its goals - docking onto a battery charger whenever its
battery is low
- we assume the goals of the agent can be defined by a reward
function that assigns a numerical value - an immediate payoff - to
each distinct action from each distinct state (for example, a reward
of +100 for state-action transitions that immediately result in a
connection to the charger and 0 for all other state-action transitions)
- the reward function can be built into the robot or known only to
an external teacher
- The task of the robot is to perform sequences of actions,
observe their consequences, and learn a control policy
- The desired control policy is one that, from any initial state,
chooses actions that maximize the reward accumulated over time by the agent
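
The reward function and the notion of reward accumulated over time can be
illustrated with a minimal sketch. Everything in it beyond the +100 / 0
reward scheme is an assumption made for illustration: the five-cell
corridor world, the names CHARGER, step, reward and run_episode, and the
two hand-written policies are hypothetical, and the total is a plain
undiscounted sum.

```python
from typing import Callable

# Hypothetical toy world: positions 0..4 on a line, charger at position 4.
CHARGER = 4
ACTIONS = ("left", "right")


def step(state: int, action: str) -> int:
    """Deterministic transition: move one cell, staying inside the corridor."""
    if action == "right":
        return min(state + 1, CHARGER)
    return max(state - 1, 0)


def reward(state: int, action: str) -> int:
    """+100 for a transition that immediately reaches the charger, else 0."""
    if state != CHARGER and step(state, action) == CHARGER:
        return 100
    return 0


def run_episode(policy: Callable[[int], str], start: int, max_steps: int = 10) -> int:
    """Follow a policy from a start state and sum the rewards it collects."""
    state, total = start, 0
    for _ in range(max_steps):
        if state == CHARGER:  # docked: the episode ends
            break
        action = policy(state)
        total += reward(state, action)
        state = step(state, action)
    return total


def always_right(state: int) -> str:
    """Assumed policy: head toward the charger from every state."""
    return "right"


def always_left(state: int) -> str:
    """Assumed policy: head away from the charger from every state."""
    return "left"


if __name__ == "__main__":
    print(run_episode(always_right, start=0))
    print(run_episode(always_left, start=0))
```

In this toy setting the policy that always moves toward the charger
collects the +100 payoff from any start state, while the policy that moves
away never does. A learning agent would have to discover the better policy
from observed consequences rather than being handed it.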
Patricia Riddle
Fri May 15 13:00:36 NZST 1998