Gradient descent is an important general learning paradigm when:
- the hypothesis is continuously parameterized, and
- the error can be differentiated with respect to the hypothesis parameters
(a minimal sketch of this setting appears after the list of practical problems below).
The key practical problems are:
- convergence to a local minimum can be quite slow, and
- if there are multiple local minima, there is no guarantee that the
procedure will find the global minimum. (Note: the gradient-descent
algorithm can also work with other error definitions, whose error
surfaces need not have a single global minimum; if we use the sum
of squares error, this is not a problem.)
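As a concrete illustration of this paradigm, the Python sketch below applies batch gradient descent to a single linear unit with the sum of squares error. The learning rate, epoch count, and toy data are illustrative assumptions, not values from these notes. For this error surface there is a single global minimum, so the local-minima problem above does not arise, but how quickly the procedure converges still depends on the choice of learning rate.

  import numpy as np

  def gradient_descent(X, y, eta=0.1, epochs=500):
      # Batch gradient descent on the sum-of-squares error
      #   E(w) = 1/2 * sum_d (y_d - X_d . w)^2
      # for a single linear unit with weight vector w.
      w = np.zeros(X.shape[1])
      for _ in range(epochs):
          residual = y - X @ w        # error of the current hypothesis on all examples
          grad = -X.T @ residual      # dE/dw of the sum-of-squares error
          w = w - eta * grad          # step in the direction of steepest descent
      return w

  # Toy usage: recover the weights of a noiseless linear target.
  X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
  y = X @ np.array([2.0, -3.0])
  print(gradient_descent(X, y))       # approximately [ 2. -3.]

The update applied on every pass is the standard rule w <- w - eta * dE/dw; making eta smaller slows convergence, while making it too large can cause the weights to overshoot the minimum.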
Patricia Riddle
Fri May 15 13:00:36 NZST 1998