- Learning a control policy: the target function is a policy pi : S -> A
that outputs an appropriate action a from the set of actions A, given
the current state s from the set of states S.
- Delayed Reward: there are no training examples of the form
<s, pi(s)>; the trainer provides only a sequence of immediate reward
values as the agent executes its sequence of actions. The agent
therefore faces the problem of temporal credit assignment: determining
which of the actions in its sequence are to be credited with producing
the eventual reward.
- Exploration: The agent influences the distribution of training
examples by the action sequence it chooses, which raises the question
of which experimentation strategy produces the most effective
learning. The learner faces a tradeoff between exploration of unknown
states and actions and exploitation of known states and actions that
it has already learned will yield high rewards.
- Partially Observable States: In many practical situations, sensors
provide only partial information about the environment's state. An
agent may have to consider its previous observations together with its
current sensor data. The best policy may be one that chooses actions
specifically to improve the observability of the environment.
- Life-long learning: robot learning often requires that the robot
learn several related tasks within the same environment. A robot might
need to learn how to dock on its battery charger, how to navigate
through narrow corridors, and how to pick up output from laser
printers. This raises the possibility of using previously obtained
experience or knowledge to reduce sample complexity when learning new
tasks.
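Several of the issues above (learning a policy, delayed reward, and
the exploration/exploitation tradeoff) can be illustrated with a small
Q-learning sketch. The corridor environment, the constants, and all
function names below are hypothetical choices made for this example,
not part of the notes: an agent starts at the left end of a five-state
corridor and receives reward only on reaching the right end, so the
value of early actions must be learned through temporal credit
assignment, and an epsilon-greedy rule balances exploring random
actions against exploiting the current value estimates.

```python
import random

# Hypothetical toy environment: a corridor of 5 states. Only reaching
# the last state yields reward 1; every other move yields reward 0,
# so intermediate actions are rewarded only indirectly (delayed reward).
N_STATES = 5
ACTIONS = [-1, +1]  # move left or move right


def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reached_goal = next_state == N_STATES - 1
    return next_state, (1.0 if reached_goal else 0.0), reached_goal


def epsilon_greedy(q, state, epsilon):
    """Exploration vs. exploitation: with probability epsilon try a
    random action (explore); otherwise take the action whose current
    Q-value estimate is highest (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Learn a state-action value table for the corridor task."""
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = epsilon_greedy(q, state, epsilon)
            next_state, reward, done = step(state, ACTIONS[a])
            # Temporal credit assignment: the discounted best value of
            # the successor state propagates the delayed reward
            # backwards through the states that led to it.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


random.seed(0)
q = q_learning()
# Greedy policy extracted from the learned values (action index 1
# means "move right"); the goal state itself needs no action.
policy = [max(range(len(ACTIONS)), key=lambda a: q[s][a])
          for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should move right in every state,
even though only the final move ever receives an immediate reward;
the discounted update is what assigns credit to the earlier moves.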
Patricia Riddle
Fri May 15 13:00:36 NZST 1998