- Learning to control sequential processes: manufacturing optimization problems where the reward is
the value of goods produced minus the costs involved
- Sequential scheduling: choosing which taxis to send for
passengers in a big city, where the reward is a function of the wait time
of passengers and the total fuel costs of the taxi fleet
- Specific settings: actions are deterministic or
nondeterministic; the agent does or does not have prior knowledge of the
effects of its actions on the environment
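
The examples above can be sketched in Python as a rough illustration. Every name here (manufacturing_reward, taxi_reward, step, slip_prob, the weights) is hypothetical. The slide says only that the taxi reward is "a function of" wait time and fuel cost, so the negative weighted sum below is an assumed form, and the one-dimensional step function is just one simple way to contrast deterministic and nondeterministic actions.

```python
import random


def manufacturing_reward(goods_value, costs):
    """Reward for the manufacturing example: goods produced minus costs."""
    return goods_value - costs


def taxi_reward(wait_times, fuel_cost, wait_weight=1.0, fuel_weight=1.0):
    """Assumed form of the taxi reward: a negative weighted sum, so that
    shorter waits and lower fuel use both give a higher reward."""
    return -(wait_weight * sum(wait_times) + fuel_weight * fuel_cost)


def step(state, action, slip_prob=0.0, rng=random):
    """Toy transition on a number line.

    With slip_prob == 0.0 the action is deterministic: the state always
    moves by `action`. With slip_prob > 0.0 the action is nondeterministic:
    with that probability it has no effect and the state is unchanged.
    """
    if rng.random() < slip_prob:
        return state
    return state + action


if __name__ == "__main__":
    print(manufacturing_reward(100.0, 30.0))
    print(taxi_reward([5, 10], 20.0))
    print(step(0, 1))                  # deterministic move
    print(step(0, 1, slip_prob=0.3))   # may or may not move
```

An agent with prior knowledge of the effects of its actions could call something like step directly to plan ahead; an agent without that knowledge would have to estimate the outcomes from the states and rewards it observes.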
Patricia Riddle
Fri May 15 13:00:36 NZST 1998