General Problem


Learning to control sequential processes: e.g. manufacturing optimization problems, where the reward is the goods produced minus the costs involved
Sequential scheduling: e.g. choosing which taxis to send for passengers in a big city, where the reward is a function of the passengers' wait times and the total fuel cost of the taxi fleet
Specific settings: actions may be deterministic or nondeterministic, and the agent may or may not have prior knowledge of the effects of its actions on the environment
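
The rewards and settings listed above can be made concrete with a small sketch. Everything named here is hypothetical and chosen only for illustration: the function names, the linear weighting in the taxi reward (the notes only say the reward is "a function of" wait time and fuel cost), and the one-dimensional state with a "slip" probability used to contrast deterministic and nondeterministic actions.

```python
import random


def manufacturing_reward(goods_produced, costs):
    """Reward for one step: goods produced minus the costs involved."""
    return goods_produced - costs


def taxi_reward(wait_times, fuel_cost, wait_weight=1.0, fuel_weight=1.0):
    """One possible reward (an assumption): the negative weighted sum of
    total passenger wait time and total fleet fuel cost."""
    return -(wait_weight * sum(wait_times) + fuel_weight * fuel_cost)


def step(state, action, slip_prob=0.0, rng=random):
    """Toy transition on integer states; action is -1 or +1.

    slip_prob == 0.0 gives deterministic actions. With slip_prob > 0.0
    the action is reversed with that probability, so the same action
    can lead to different next states (nondeterministic actions).
    """
    if rng.random() < slip_prob:
        action = -action
    return state + action
```

An agent with prior knowledge of the effects of its actions could call something like `step` to plan ahead; an agent without it would only observe the next states and rewards that result from the actions it tries.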

Patricia Riddle
Fri May 15 13:00:36 NZST 1998