- The cross entropy can be used in a gradient ascent algorithm
for ANNs: the weights climb the log likelihood, which is the
negative of the cross entropy
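- $w_{jk} \leftarrow w_{jk} + \Delta w_{jk}$
- $\Delta w_{jk} = \eta \sum_{i} (d_i - h(x_i)) \, x_{ijk}$,
where $x_{ijk}$ is the $k$th input to unit $j$ for the $i$th
training example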
- The rule that minimizes squared error seeks the maximum likelihood
hypothesis under the assumption that the training data can be modeled
by Normally distributed noise added to the target function value.
- The rule that minimizes cross entropy seeks the maximum likelihood
hypothesis under the assumption that the observed boolean value is a
probabilistic function of the input instance.
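
The gradient ascent rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's own code: it uses a single sigmoid unit (so the unit index j is dropped), and the names `predict`, `log_likelihood`, `gradient_ascent_step`, the toy boolean-OR data, and the learning rate `eta=0.5` are all made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    # h(x): output of one sigmoid unit, read as P(d = 1 | x)
    return sigmoid(sum(w * xk for w, xk in zip(weights, x)))

def log_likelihood(weights, examples):
    # G(h, D) = sum_i d_i ln h(x_i) + (1 - d_i) ln(1 - h(x_i)),
    # i.e. the negative of the cross entropy
    total = 0.0
    for x, d in examples:
        h = predict(weights, x)
        total += d * math.log(h) + (1 - d) * math.log(1.0 - h)
    return total

def gradient_ascent_step(weights, examples, eta):
    # Delta w_k = eta * sum_i (d_i - h(x_i)) * x_ik
    deltas = [0.0] * len(weights)
    for x, d in examples:
        error = d - predict(weights, x)
        for k, xk in enumerate(x):
            deltas[k] += eta * error * xk
    # w_k <- w_k + Delta w_k
    return [w + dw for w, dw in zip(weights, deltas)]

# Hypothetical toy data: the first input is a constant 1.0 (bias term),
# the boolean target d is the OR of the other two inputs.
examples = [
    ((1.0, 0.0, 0.0), 0),
    ((1.0, 0.0, 1.0), 1),
    ((1.0, 1.0, 0.0), 1),
    ((1.0, 1.0, 1.0), 1),
]

weights = [0.0, 0.0, 0.0]
before = log_likelihood(weights, examples)
for _ in range(200):
    weights = gradient_ascent_step(weights, examples, eta=0.5)
after = log_likelihood(weights, examples)

print("log likelihood before:", before)
print("log likelihood after: ", after)
```

Each step moves the weights in the direction of the gradient of G, so the log likelihood rises (equivalently, the cross entropy falls) as training proceeds on this small data set.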
Patricia Riddle
Fri May 15 13:00:36 NZST 1998