-  a practical method for learning real-valued and vector-valued target functions over continuous and discrete-valued attributes
 -  robust to noise in the training data
 -  the Backpropagation (Backprop) algorithm is the most common learning method
 -  hypothesis space: all functions that can be represented by assigning weights to a fixed network of interconnected units
 -  feedforward networks containing three layers of units can approximate any function to arbitrary accuracy, given a sufficient number of units in each layer
 -  networks of practical size are capable of representing a rich
space of highly nonlinear functions
 -  Backprop searches the space of possible hypotheses using gradient descent (GD) to iteratively reduce the network's error on the training data (a minimal sketch follows this list)
 -  GD converges to a local minimum in the training error with
respect to the network weights.
 -  Backprop has the ability to invent new features that are not
explicit in the input
 -  hidden units of multilayer networks learn to represent
intermediate features (e.g., face recognition)
 -  Overfitting is an important issue (caused, in my opinion, by relying too heavily on training-set accuracy)
 -  Cross-validation can be used to estimate an appropriate stopping point for gradient descent (a second sketch after this list illustrates a simple hold-out version)
 -  Many other algorithms and extensions.
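
A minimal sketch (not from these notes) of Backprop as gradient descent, in Python with NumPy, assuming a tiny 2-3-1 sigmoid network trained on the XOR data set; the hidden-layer size, learning rate, and iteration count are illustrative choices.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)

    # Training data: XOR, a highly nonlinear function of the two inputs.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    # A fixed network of interconnected units: 2 inputs -> 3 hidden -> 1 output.
    W1 = rng.normal(scale=0.5, size=(2, 3))
    b1 = np.zeros(3)
    W2 = rng.normal(scale=0.5, size=(3, 1))
    b2 = np.zeros(1)

    eta = 0.5                                     # learning rate (assumed)
    for step in range(10000):
        # Forward pass through the fixed network.
        h = sigmoid(X @ W1 + b1)                  # hidden-unit activations
        out = sigmoid(h @ W2 + b2)                # network outputs

        # Backward pass: error terms ("deltas") for output and hidden units.
        delta_out = (out - y) * out * (1 - out)
        delta_hid = (delta_out @ W2.T) * h * (1 - h)

        # Gradient-descent update: iteratively reduce the training error.
        W2 -= eta * (h.T @ delta_out)
        b2 -= eta * delta_out.sum(axis=0)
        W1 -= eta * (X.T @ delta_hid)
        b1 -= eta * delta_hid.sum(axis=0)

    print(np.round(out, 2))                       # typically near [0, 1, 1, 0]

Because GD finds only a local minimum of the training error, a different random initialization or learning rate may occasionally be needed for the network to fit XOR exactly.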
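
A minimal sketch (again illustrative, not from these notes) of using a held-out validation set, the simplest form of cross-validation, to estimate a stopping point for gradient descent; the noisy polynomial-fitting task, the train/validation split, and the hyperparameters are assumptions for the example.

    import numpy as np

    rng = np.random.default_rng(1)

    # Noisy samples of a simple target; a degree-9 polynomial fit can overfit.
    x = rng.uniform(-1, 1, size=40)
    t = np.sin(np.pi * x) + rng.normal(scale=0.3, size=40)
    Phi = np.vander(x, 10, increasing=True)       # polynomial features

    train, val = slice(0, 30), slice(30, 40)      # simple hold-out split
    w = np.zeros(10)
    eta = 0.05                                    # learning rate (assumed)

    best_err, best_w, best_epoch = np.inf, w, 0
    for epoch in range(20000):
        # One gradient-descent step on the training-set squared error.
        grad = Phi[train].T @ (Phi[train] @ w - t[train]) / 30
        w = w - eta * grad

        # Validation error estimates generalization; remember the best epoch.
        val_err = np.mean((Phi[val] @ w - t[val]) ** 2)
        if val_err < best_err:
            best_err, best_w, best_epoch = val_err, w.copy(), epoch

    print(f"stop gradient descent near epoch {best_epoch} (val MSE {best_err:.3f})")

The same idea applies directly to Backprop: track the validation error after each epoch of weight updates and keep the weights from the epoch where it was lowest.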
 
 
Patricia Riddle 
Fri May 15 13:00:36 NZST 1998