- Common to use a probabilistic approach to selecting actions
- Actions with higher $\hat{Q}$ values are assigned higher
probabilities, but every action has a non-zero probability
- $P(a_i \mid s) = \dfrac{k^{\hat{Q}(s, a_i)}}{\sum_j k^{\hat{Q}(s, a_j)}}$
- $P(a_i \mid s)$ is the probability of selecting action $a_i$,
given the agent is in state $s$, where $k > 0$ is the
constant that determines how strongly the selection favors actions
with high $\hat{Q}$ values
- Sometimes $k$ is varied with the number of iterations so the
agent favors exploration during the early stages of learning, then
gradually shifts toward a strategy of exploitation.
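
The selection rule above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: it assumes the weighting is a constant k > 0 raised to the power of each action's current Q estimate, normalized over all actions in the state. The function names and the linear schedule in `k_schedule` (with its `k_start` and `growth` parameters) are hypothetical and chosen only for illustration; the slide says only that k is sometimes varied with the number of iterations.

```python
import random


def action_probabilities(q_values, k):
    """Return one selection probability per action.

    Assumes each action is weighted by k ** Q(s, a), then normalized.
    q_values: estimated Q values for every action in the current state.
    k: positive constant; values above 1 favor actions with high Q.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    weights = [k ** q for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, k, rng=random):
    """Sample an action index according to action_probabilities."""
    probs = action_probabilities(q_values, k)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def k_schedule(iteration, k_start=1.0, growth=0.01):
    """Hypothetical schedule: k grows linearly with the iteration count.

    At k = 1 every action is equally likely (pure exploration); as k
    increases, selection concentrates on high-Q actions (exploitation).
    """
    return k_start + growth * iteration


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]
    for step in (0, 100, 1000):
        k = k_schedule(step)
        print(step, k, action_probabilities(q, k))
```

With k = 1 the three actions are selected uniformly; with larger k the highest-valued action dominates, yet every action keeps a non-zero probability.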
Patricia Riddle
Fri May 15 13:00:36 NZST 1998