if the training examples are not linearly separable, the delta
rule converges toward a best-fit approximation of the target outputs
the delta rule uses gradient descent to search the space of
possible weights for those that best fit the training
examples - this search provides the basis of the Backpropagation algorithm
assume an unthresholded perceptron (a linear unit); then the training error is

E(w) = (1/2) * sum over d in D of (t_d - o_d)^2

where D is the set of training examples, t_d is the target output
for training example d, and o_d is the output of the linear unit
for training example d.
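This error definition can be sketched directly in code. The helper names and the tiny data set below are illustrative, not part of the original notes:

```python
# Squared-error for an unthresholded (linear) unit:
#   E(w) = 1/2 * sum over d in D of (t_d - o_d)^2,  with o_d = w . x_d
# Hypothetical example data; function names are illustrative.

def linear_output(weights, x):
    """o_d: dot product of the weight vector and the input vector.
    The input x carries a leading 1 so weights[0] acts as the bias."""
    return sum(w * xi for w, xi in zip(weights, x))

def training_error(weights, examples):
    """E(w) = 1/2 * sum_d (t_d - o_d)^2 over the training set D."""
    return 0.5 * sum((t - linear_output(weights, x)) ** 2 for x, t in examples)

# D: list of (input vector with bias term, target output) pairs
D = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0)]
w = [1.0, 2.0]  # produces outputs 1.0 and 3.0, matching both targets
print(training_error(w, D))  # -> 0.0
```

With weights that match the targets exactly, the error is zero; any other weight vector gives a strictly positive error, which is what makes the surface below a useful search landscape.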
Given this error definition, the error surface for a linear unit is
a paraboloid with a single global minimum, so gradient descent (with a
sufficiently small learning rate) converges to the minimum-error weights.
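A minimal sketch of that gradient descent search, using the batch update w_i ← w_i + eta * sum_d (t_d - o_d) * x_{i,d}; the data, learning rate, and epoch count are assumptions chosen for illustration:

```python
# Batch gradient descent on the parabolic error surface of a linear unit.
# Update rule: w_i <- w_i + eta * sum_d (t_d - o_d) * x_{i,d}
# A sketch under assumed hyperparameters, not a tuned implementation.

def gradient_descent(examples, n_weights, eta=0.05, epochs=500):
    w = [0.0] * n_weights
    for _ in range(epochs):
        delta = [0.0] * n_weights
        for x, t in examples:
            o = sum(wi * xi for wi, xi in zip(w, x))  # linear unit output o_d
            for i in range(n_weights):
                delta[i] += eta * (t - o) * x[i]      # accumulate the step
        w = [wi + di for wi, di in zip(w, delta)]     # apply the batch update
    return w

# Targets generated by t = 1 + 2*x1 (bias input x0 = 1), so the single
# global minimum of E(w) lies at w = (1, 2).
D = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0)]
w = gradient_descent(D, 2)
print([round(wi, 3) for wi in w])  # close to [1.0, 2.0]
```

Because the surface has only one minimum, the starting point does not matter here; a learning rate that is too large, however, can still cause the updates to overshoot and diverge.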