Gradient Search to Maximize Likelihood in ANN


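The cross entropy can be used in a gradient ascent algorithm for ANNs. For a single layer of sigmoid units, the weight update is
$$w_{jk} \leftarrow w_{jk} + \Delta w_{jk}, \qquad \Delta w_{jk} = \eta \sum_{i=1}^{m} \bigl(d_i - h(x_i)\bigr)\, x_{ijk},$$
where $x_{ijk}$ is the $k$th input to unit $j$ for the $i$th training example, $d_i$ is the observed target value, $h(x_i)$ is the network output, and $\eta$ is a small positive learning rate.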
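The update can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes a single sigmoid unit (so the unit index is dropped), batch updates, and a small made-up data set. The names `sigmoid`, `output`, `log_likelihood`, `gradient_ascent_step`, and `examples` are hypothetical and chosen only for this example. Here `d` is the observed boolean target, `h` is the unit's output, and `eta` is the learning rate.

```python
import math

def sigmoid(z):
    """Logistic function; maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def output(weights, x):
    """h(x): output of a single sigmoid unit for input vector x."""
    return sigmoid(sum(w * xk for w, xk in zip(weights, x)))

def log_likelihood(weights, examples):
    """Sum over examples of d*ln(h) + (1 - d)*ln(1 - h), the quantity being maximized."""
    total = 0.0
    for x, d in examples:
        h = output(weights, x)
        total += d * math.log(h) + (1 - d) * math.log(1.0 - h)
    return total

def gradient_ascent_step(weights, examples, eta):
    """One batch update: w_k <- w_k + eta * sum_i (d_i - h(x_i)) * x_ik."""
    deltas = [0.0] * len(weights)
    for x, d in examples:
        error = d - output(weights, x)  # uses the weights from before this update
        for k, xk in enumerate(x):
            deltas[k] += eta * error * xk
    return [w + dw for w, dw in zip(weights, deltas)]

# Hypothetical toy data: each x is [constant bias input, feature], d is the boolean target.
examples = [([1.0, 0.0], 0), ([1.0, 1.0], 0), ([1.0, 1.0], 1),
            ([1.0, 2.0], 1), ([1.0, 3.0], 1)]

initial = [0.0, 0.0]
weights = initial
for _ in range(100):
    weights = gradient_ascent_step(weights, examples, eta=0.1)

# The second value should be larger (closer to zero) than the first.
print(log_likelihood(initial, examples), log_likelihood(weights, examples))
```

The toy data is deliberately not linearly separable, so the weights stay bounded and `math.log` never receives 0. With separable data the weights can grow without limit, and a practical version would need to guard against that.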
The rule that minimizes squared error seeks the maximum likelihood hypothesis under the assumption that the training data can be modeled by Normally distributed noise added to a target function value.
The rule that minimizes cross entropy seeks the maximum likelihood hypothesis under the assumption that the observed boolean value is a probabilistic function of the input instance.
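To make the contrast concrete, the sketch below compares the per-example weight change each rule would give for one sigmoid unit. It assumes the squared error is written as (1/2)(d - h)^2, in which case differentiating through the sigmoid introduces an extra factor h(1 - h) that the cross-entropy rule does not have. The function names are hypothetical and the learning rate is left out of both terms.

```python
import math

def sigmoid(z):
    """Logistic function; maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def squared_error_term(d, h, x_k):
    """Per-example change from descending (1/2)*(d - h)^2 with a sigmoid output h."""
    return (d - h) * h * (1.0 - h) * x_k

def cross_entropy_term(d, h, x_k):
    """Per-example change from ascending d*ln(h) + (1 - d)*ln(1 - h)."""
    return (d - h) * x_k

# A confidently wrong prediction: the target is 1 but the unit outputs a value near 0.
d, x_k = 1, 1.0
h = sigmoid(-4.0)

print(squared_error_term(d, h, x_k), cross_entropy_term(d, h, x_k))
```

In this case the squared-error term is much smaller than the cross-entropy term, because h(1 - h) is close to zero when the unit saturates. Both rules move the weight in the same direction; they differ in step size because they rest on different noise assumptions.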

Patricia Riddle
Fri May 15 13:00:36 NZST 1998