Experimentation Strategies

Next: Updating Sequence Up: Reinforcement Learning Previous: Convergence

Common to use probabilistic approach to selecting actions
Actions with higher are assigned higher probabilities, but every action has a non-zero probability
is the probability of selecting action , given the agent is in state , where is the constant that determines how strongly the selection favors actions with high values
Sometimes is varied with the number of iterations so the agent favors exploration during the early stages of learning , then gradually shifts toward a strategy of exploitation.

Patricia Riddle
Fri May 15 13:00:36 NZST 1998