- Learning a control policy: the target function is a policy pi : S -> A
that outputs an appropriate action a from the set of actions A, given
the current state s from the set of states S.
- Delayed Reward: there are no training examples of the form
<s, pi(s)>; the trainer provides only a sequence of immediate reward
values as the agent executes its sequence of actions. The agent
therefore faces the problem of temporal credit assignment: determining
which of the actions in its sequence are to be credited with producing
the eventual reward.
- Exploration: The agent influences the distribution of training
examples by the action sequence it chooses, which raises the question
of which experimentation strategy produces the most effective
learning. The learner faces a tradeoff between exploration of unknown
states and actions and exploitation of known states and actions that
it has already learned will yield high rewards.
- Partially Observable States: In many practical situations, sensors
provide only partial information about the environment's state. An
agent may have to consider its previous observations together with its
current sensor data. The best policy may be one that chooses actions
specifically to improve the observability of the environment.
- Life-long learning: robot learning often requires that the robot
learn several related tasks within the same environment. A robot might
need to learn how to dock on its battery charger, how to navigate
through narrow corridors, and how to pick up output from laser
printers. This raises the possibility of using previously obtained
experience or knowledge to reduce sample complexity when learning new
tasks.
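Several of the issues above (learning a policy, delayed reward, and
the exploration/exploitation tradeoff) can be illustrated with a small
Q-learning sketch. The corridor environment, the constants, and all
function names below are hypothetical choices made for this example,
not part of the notes: an agent starts at the left end of a five-state
corridor and receives reward only on reaching the right end, so the
value of early actions must be learned through temporal credit
assignment, and an epsilon-greedy rule balances exploring random
actions against exploiting the current value estimates.

```python
import random

# Hypothetical toy environment: a corridor of 5 states. Only reaching
# the last state yields reward 1; every other move yields reward 0,
# so intermediate actions are rewarded only indirectly (delayed reward).
N_STATES = 5
ACTIONS = [-1, +1]  # move left or move right


def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reached_goal = next_state == N_STATES - 1
    return next_state, (1.0 if reached_goal else 0.0), reached_goal


def epsilon_greedy(q, state, epsilon):
    """Exploration vs. exploitation: with probability epsilon try a
    random action (explore); otherwise take the action whose current
    Q-value estimate is highest (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Learn a state-action value table for the corridor task."""
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = epsilon_greedy(q, state, epsilon)
            next_state, reward, done = step(state, ACTIONS[a])
            # Temporal credit assignment: the discounted best value of
            # the successor state propagates the delayed reward
            # backwards through the states that led to it.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


random.seed(0)
q = q_learning()
# Greedy policy extracted from the learned values (action index 1
# means "move right"); the goal state itself needs no action.
policy = [max(range(len(ACTIONS)), key=lambda a: q[s][a])
          for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should move right in every state,
even though only the final move ever receives an immediate reward;
the discounted update is what assigns credit to the earlier moves.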
Patricia Riddle
Fri May 15 13:00:36 NZST 1998