-  a practical method for learning real-valued and vector-valued target functions over continuous and discrete-valued attributes
 -  robust to noise in the training data
 -  the Backpropagation (Backprop) algorithm is the most common learning method
 -  hypothesis space: all functions that can be represented by assigning weights to a fixed network of interconnected units
 -  feedforward networks containing three layers of units can approximate any function to arbitrary accuracy, given a sufficient number of units in each layer
 -  networks of practical size are capable of representing a rich
space of highly nonlinear functions
 -  Backprop searches the space of possible hypotheses using gradient descent (GD) to iteratively reduce the network's error on the training data (a minimal sketch follows this list)
 -  GD converges to a local minimum in the training error with
respect to the network weights.
 -  Backprop has the ability to invent new features that are not
explicit in the input
 -  hidden units of multilayer networks learn to represent
intermediate features (e.g., face recognition)
 -  Overfitting is an important issue (caused, in my opinion, by relying too heavily on training-set accuracy)
 -  Cross-validation can be used to estimate an appropriate stopping point for gradient descent (a second sketch after this list illustrates a simple hold-out version)
 -  Many other algorithms and extensions.
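
A minimal sketch (not from these notes) of Backprop as gradient descent, in Python with NumPy, assuming a tiny 2-3-1 sigmoid network trained on the XOR data set; the hidden-layer size, learning rate, and iteration count are illustrative choices.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)

    # Training data: XOR, a highly nonlinear function of the two inputs.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    # A fixed network of interconnected units: 2 inputs -> 3 hidden -> 1 output.
    W1 = rng.normal(scale=0.5, size=(2, 3))
    b1 = np.zeros(3)
    W2 = rng.normal(scale=0.5, size=(3, 1))
    b2 = np.zeros(1)

    eta = 0.5                                     # learning rate (assumed)
    for step in range(10000):
        # Forward pass through the fixed network.
        h = sigmoid(X @ W1 + b1)                  # hidden-unit activations
        out = sigmoid(h @ W2 + b2)                # network outputs

        # Backward pass: error terms ("deltas") for output and hidden units.
        delta_out = (out - y) * out * (1 - out)
        delta_hid = (delta_out @ W2.T) * h * (1 - h)

        # Gradient-descent update: iteratively reduce the training error.
        W2 -= eta * (h.T @ delta_out)
        b2 -= eta * delta_out.sum(axis=0)
        W1 -= eta * (X.T @ delta_hid)
        b1 -= eta * delta_hid.sum(axis=0)

    print(np.round(out, 2))                       # typically near [0, 1, 1, 0]

Because GD finds only a local minimum of the training error, a different random initialization or learning rate may occasionally be needed for the network to fit XOR exactly.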
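
A minimal sketch (again illustrative, not from these notes) of using a held-out validation set, the simplest form of cross-validation, to estimate a stopping point for gradient descent; the noisy polynomial-fitting task, the train/validation split, and the hyperparameters are assumptions for the example.

    import numpy as np

    rng = np.random.default_rng(1)

    # Noisy samples of a simple target; a degree-9 polynomial fit can overfit.
    x = rng.uniform(-1, 1, size=40)
    t = np.sin(np.pi * x) + rng.normal(scale=0.3, size=40)
    Phi = np.vander(x, 10, increasing=True)       # polynomial features

    train, val = slice(0, 30), slice(30, 40)      # simple hold-out split
    w = np.zeros(10)
    eta = 0.05                                    # learning rate (assumed)

    best_err, best_w, best_epoch = np.inf, w, 0
    for epoch in range(20000):
        # One gradient-descent step on the training-set squared error.
        grad = Phi[train].T @ (Phi[train] @ w - t[train]) / 30
        w = w - eta * grad

        # Validation error estimates generalization; remember the best epoch.
        val_err = np.mean((Phi[val] @ w - t[val]) ** 2)
        if val_err < best_err:
            best_err, best_w, best_epoch = val_err, w.copy(), epoch

    print(f"stop gradient descent near epoch {best_epoch} (val MSE {best_err:.3f})")

The same idea applies directly to Backprop: track the validation error after each epoch of weight updates and keep the weights from the epoch where it was lowest.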
 
 
Patricia Riddle 
Fri May 15 13:00:36 NZST 1998