Next: Updating Sequence
Up: Reinforcement Learning
Previous: Convergence
- Common to use probabilistic approach to selecting actions
- Actions with higher are assigned higher
probabilities, but every action has a non-zero probability
- is the probability of selecting action
, given the agent is in state , where is the
constant that determines how strongly the selection favors actions
with high values
-
- Sometimes is varied with the number of iterations so the
agent favors exploration during the early stages of learning , then
gradually shifts toward a strategy of exploitation.
Patricia Riddle
Fri May 15 13:00:36 NZST 1998