- the error surface in multilayer neural networks may contain many
different local minima where gradient descent can become trapped
- yet Backpropagation is a highly effective function approximation
method in practice - why?
- networks with large numbers of weights correspond to error
surfaces in very high dimensional spaces
- when gradient descent reaches a local minimum with respect to
one weight, it is not necessarily at a minimum with respect to the other weights
- the more weights, the more dimensions that might provide an
escape route (see the first sketch after this list) - do I believe this??? - new seed, more nodes, more data???
- during early gradient descent search the network represents a very smooth,
nearly linear function (weights are typically initialized near zero, where sigmoid
units operate in their linear range); only after the weights have had time to grow
can the network represent highly nonlinear functions (see the second sketch after this list)
- One might expect more local minima to exist in a region of the
weight space that represents these more complex functions
- one hopes that by the time the weights reach this point they
have already moved close enough to the global minimum that even a
local minimum in this region is acceptable
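
First sketch (illustrative only, not from the original notes): the "escape route"
intuition can be seen on a hypothetical two-weight error surface
E(w1, w2) = w1^2 - 0.5*w2^2. The critical point at the origin is a minimum along
w1 but not along w2, so gradient descent started slightly off that point is not
trapped - the extra dimension carries it away. In Python/NumPy:

    import numpy as np

    # Hypothetical error surface: a minimum along w1 but a maximum along w2,
    # so the critical point at (0, 0) does not trap gradient descent.
    def error(w):
        w1, w2 = w
        return w1 ** 2 - 0.5 * w2 ** 2

    def grad(w):
        w1, w2 = w
        return np.array([2.0 * w1, -1.0 * w2])

    w = np.array([0.5, 1e-3])   # start near the critical point, slightly off axis
    lr = 0.1
    for step in range(100):
        w = w - lr * grad(w)

    # w1 has been driven to (almost) 0, but w2 has grown steadily:
    # the second dimension provided an escape route from the apparent minimum.
    print(w, error(w))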
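
Second sketch (again illustrative, assuming standard sigmoid units): with a small
weight, a sigmoid unit's output stays very close to its linear approximation
around zero, sigmoid(z) ~ 0.5 + z/4; with a large weight the unit saturates and
the deviation from linearity is large. The weight values 0.1 and 10.0 below are
arbitrary examples of "early" and "grown" weights:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.linspace(-1.0, 1.0, 201)

    # Compare a sigmoid unit's output with its linear approximation 0.5 + z/4
    # for a small weight (early in training) and a large weight (after growth).
    for w in (0.1, 10.0):
        z = w * x
        deviation = np.max(np.abs(sigmoid(z) - (0.5 + z / 4.0)))
        print(f"w = {w:5.1f}   max deviation from linear fit: {deviation:.5f}")

The small-weight case deviates from the linear fit by only about 2e-5, while the
large-weight case deviates by about 2.0, matching the claim that the network can
only represent highly nonlinear functions once the weights have grown.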
Patricia Riddle
Fri May 15 13:00:36 NZST 1998