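- The cross entropy can be used in a gradient ascent algorithm for ANNs (the ascent maximizes the log likelihood, which is the negative of the cross entropy)
- $w_{jk} \leftarrow w_{jk} + \Delta w_{jk}$
- $\Delta w_{jk} = \eta \sum_{i=1}^{m} (d_i - h(x_i)) \, x_{ijk}$, where $x_{ijk}$ is the $k$th input to unit $j$ for the $i$th training example, $d_i$ is the observed target value, $h(x_i)$ is the unit's output, $\eta$ is the learning rate, and $m$ is the number of training examples
- The rule that minimizes squared error seeks the maximum likelihood hypothesis $h_{ML}$ under the assumption that the training data can be modeled by Normally distributed noise added to a target function value.
- The rule that minimizes cross entropy seeks the maximum likelihood hypothesis $h_{ML}$ under the assumption that the observed boolean value is a probabilistic function of the input instance.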
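
As a rough illustration of the gradient ascent rule for cross entropy, here is a minimal sketch in Python for a single sigmoid unit. It is not taken from the lecture: the function names (`sigmoid`, `train_sigmoid_unit`, `predict`), the learning rate `eta`, the number of `epochs`, and the bias handled as a constant input of 1 are all assumptions made for the example, and a full network would need backpropagation through hidden layers.

```python
import math


def sigmoid(z):
    """Logistic function, used here as the unit's output h(x)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """Output of the unit: estimated probability that the target is 1."""
    inputs = [1.0] + list(x)  # assumed convention: constant input 1 for the bias
    return sigmoid(sum(w * xi for w, xi in zip(weights, inputs)))


def train_sigmoid_unit(examples, targets, eta=0.5, epochs=2000):
    """Batch gradient ascent on the log likelihood of one sigmoid unit.

    Each epoch applies w_k <- w_k + eta * sum_i (d_i - h(x_i)) * x_ik,
    the single-unit case of the update rule (the unit index j is dropped).
    """
    weights = [0.0] * (len(examples[0]) + 1)  # weights[0] is the bias weight
    for _ in range(epochs):
        deltas = [0.0] * len(weights)
        for x, d in zip(examples, targets):
            h = predict(weights, x)
            inputs = [1.0] + list(x)
            for k, xk in enumerate(inputs):
                # Accumulate eta * (d_i - h(x_i)) * x_ik over all examples.
                deltas[k] += eta * (d - h) * xk
        weights = [w + dw for w, dw in zip(weights, deltas)]
    return weights
```

Calling `train_sigmoid_unit` on a small boolean data set such as logical OR (inputs `(0, 0)`, `(0, 1)`, `(1, 0)`, `(1, 1)` with targets `0, 1, 1, 1`) should give weights whose `predict` output is below 0.5 for `(0, 0)` and above 0.5 for the other three inputs. Note that the error term `(d - h)` is not multiplied by `h * (1 - h)`, which is where this rule differs from the usual squared-error update for a sigmoid unit.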
Patricia Riddle
Fri May 15 13:00:36 NZST 1998