- Bayesian methods provide a basis for probabilistic learning
methods that accommodate knowledge about the prior probabilities of
alternative hypotheses and about the probability of observing the data
given each hypothesis. They assign a posterior probability to each
candidate hypothesis, based on these assumed priors and the observed data.
- Bayesian methods return the most probable hypothesis given the data
(e.g., a MAP hypothesis). A small numeric sketch of computing posteriors
and selecting the MAP hypothesis appears after this summary list.
- The Bayes optimal classifier combines the predictions of all
alternative hypotheses, weighted by their posterior probabilities, to
calculate the most probable classification of a new instance; a sketch
of this weighted vote is given below.
- Naive Bayes has been found useful in many practical applications. It is
called naive because it incorporates the simplifying assumption that attribute
values are conditionally independent given the classification of the
instance. When this assumption holds, naive Bayes outputs the MAP
classification. Even when the assumption is violated, naive Bayes tends to
perform well (a minimal naive Bayes sketch appears below). Bayesian belief
networks (BBNs) provide a more expressive representation for sets of
conditional independence assumptions.
- The framework of Bayesian reasoning can provide a useful basis
for analyzing certain learning methods that do not directly apply
Bayes theorem. For example, under certain conditions, minimizing the
squared error when learning a real-valued target function corresponds
to computing the maximum likelihood (ML) hypothesis; the short
derivation below makes this precise.
- The Minimum Description Length (MDL) principle recommends choosing the
hypothesis that minimizes the description length of the hypothesis
plus the description length of the data given the hypothesis. Bayes
theorem and basic results from information theory can be used to
provide a rationale for this principle (see the MDL sketch below).
- In many practical learning tasks, some of the relevant instance
variables may be unobservable. The EM algorithm provides quite a
general approach to learning in the presence of unobservable
variables. It begins with an arbitrary initial hypothesis. It then
repeatedly calculates the expected values of the hidden variables
(assuming the current hypothesis is correct) and then recalculates the
ML hypothesis (assuming the hidden variables have the expected values
calculated in the first step). This procedure converges to a locally
maximum likelihood hypothesis, along with estimated values for the
hidden variables. A small EM sketch for a two-Gaussian mixture appears
below.
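
The sketches below illustrate several of the points above. They are minimal
Python illustrations built on made-up numbers and data, not implementations
from the text. First, computing posterior probabilities from assumed priors
and likelihoods and selecting the MAP hypothesis:

    # Sketch: posterior probabilities and the MAP hypothesis.
    # The hypotheses, priors P(h), and likelihoods P(D|h) are made-up numbers.
    priors = {"h1": 0.6, "h2": 0.3, "h3": 0.1}
    likelihoods = {"h1": 0.02, "h2": 0.10, "h3": 0.05}

    # Bayes theorem: P(h|D) = P(D|h) P(h) / P(D), with P(D) = sum over h of P(D|h) P(h).
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    posteriors = {h: likelihoods[h] * priors[h] / evidence for h in priors}

    map_hypothesis = max(posteriors, key=posteriors.get)   # argmax of P(h|D)
    print(posteriors)       # {'h1': 0.255..., 'h2': 0.638..., 'h3': 0.106...}
    print(map_hypothesis)   # 'h2'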
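
Next, the Bayes optimal classifier: instead of classifying with the single MAP
hypothesis, it weights each hypothesis's prediction by its posterior. The
posteriors and per-hypothesis predictions here are illustrative, chosen so that
the weighted vote disagrees with the MAP hypothesis:

    # Sketch: Bayes optimal classification of a new instance x.
    posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}    # P(h|D), illustrative values
    predictions = {"h1": "+", "h2": "-", "h3": "-"}   # class each hypothesis assigns to x

    # P(v|x,D) = sum over h of P(v|x,h) P(h|D); here each P(v|x,h) is 1 or 0.
    scores = {}
    for h, v in predictions.items():
        scores[v] = scores.get(v, 0.0) + posteriors[h]

    print(max(scores, key=scores.get))   # '-', although the MAP hypothesis h1 predicts '+'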
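
A minimal naive Bayes sketch over discrete attributes. The training data,
attribute values, and the test instance are invented for illustration;
log-probabilities and add-one smoothing are used only to keep the arithmetic
stable:

    import math
    from collections import Counter, defaultdict

    # Each training example is (tuple of attribute values, class label); data are made up.
    train = [
        (("sunny", "hot"), "no"),
        (("sunny", "mild"), "no"),
        (("rain", "mild"), "yes"),
        (("overcast", "hot"), "yes"),
        (("rain", "cool"), "yes"),
    ]

    class_counts = Counter(label for _, label in train)
    value_counts = defaultdict(Counter)   # (label, attribute index) -> Counter of values
    attr_values = defaultdict(set)        # attribute index -> set of observed values
    for attrs, label in train:
        for i, a in enumerate(attrs):
            value_counts[(label, i)][a] += 1
            attr_values[i].add(a)

    def classify(attrs):
        """Return argmax over classes v of P(v) * product over i of P(a_i | v)."""
        best, best_score = None, float("-inf")
        for label, n in class_counts.items():
            score = math.log(n / len(train))   # log P(v)
            for i, a in enumerate(attrs):
                count = value_counts[(label, i)][a]
                # log P(a_i | v), estimated with add-one smoothing
                score += math.log((count + 1) / (n + len(attr_values[i])))
            if score > best_score:
                best, best_score = label, score
        return best

    print(classify(("rain", "hot")))   # 'yes' under these toy counts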
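
The connection between squared error and the ML hypothesis can be spelled out
in one line, assuming independent training values d_i = f(x_i) + e_i where the
noise e_i is Gaussian with zero mean and fixed variance:

    \[
    h_{ML} = \arg\max_h \prod_i \frac{1}{\sqrt{2\pi\sigma^2}}
             \exp\!\left(-\frac{(d_i - h(x_i))^2}{2\sigma^2}\right)
           = \arg\max_h \sum_i -\frac{(d_i - h(x_i))^2}{2\sigma^2}
           = \arg\min_h \sum_i \bigl(d_i - h(x_i)\bigr)^2 ,
    \]

since taking logarithms turns the product into a sum and the remaining
constants do not depend on h.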
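
For the MDL principle, under optimal codes the description lengths are
L(h) = -log2 P(h) and L(D|h) = -log2 P(D|h), so minimizing their sum selects
the same hypothesis as the MAP rule. Reusing the illustrative numbers from the
first sketch:

    import math

    priors = {"h1": 0.6, "h2": 0.3, "h3": 0.1}
    likelihoods = {"h1": 0.02, "h2": 0.10, "h3": 0.05}

    # L(h) + L(D|h) = -log2 P(h) - log2 P(D|h); the minimizer is the MAP hypothesis.
    description_length = {h: -math.log2(priors[h]) - math.log2(likelihoods[h])
                          for h in priors}
    print(min(description_length, key=description_length.get))   # 'h2', the MAP hypothesis again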
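
Finally, a small EM sketch for a mixture of two one-dimensional Gaussians with
known, equal variance, where only the two means are re-estimated. The data,
variance, and initial means are illustrative assumptions:

    import math
    import random

    random.seed(0)
    data = ([random.gauss(0.0, 1.0) for _ in range(100)] +   # synthetic cluster near 0
            [random.gauss(5.0, 1.0) for _ in range(100)])    # synthetic cluster near 5

    sigma2 = 1.0                   # variance assumed known
    mu = [min(data), max(data)]    # arbitrary initial hypothesis for the two means

    def gaussian(x, m):
        return math.exp(-(x - m) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

    for _ in range(50):
        # E-step: expected value of the hidden indicator for each point and component,
        # computed assuming the current hypothesis (mu) is correct.
        resp = []
        for x in data:
            w = [gaussian(x, m) for m in mu]
            total = sum(w)
            resp.append([wi / total for wi in w])
        # M-step: recompute the ML estimate of each mean, assuming the hidden
        # variables take the expected values just calculated.
        mu = [sum(r[j] * x for r, x in zip(resp, data)) / sum(r[j] for r in resp)
              for j in range(2)]

    print([round(m, 2) for m in mu])   # the means settle near the two cluster centers

With hard 0/1 assignments in place of the expected values, essentially the same
loop becomes the familiar two-means clustering update.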
Patricia Riddle
Fri May 15 13:00:36 NZST 1998