- Bayesian methods provide a basis for probabilistic learning
methods that accommodate knowledge about the prior probabilities of
alternative hypotheses and about the probability of observing the data
given each hypothesis. They assign a posterior probability to each
candidate hypothesis, based on these assumed priors and the observed data.
- Bayesian methods return the most probable hypothesis given the data
(e.g., a MAP hypothesis). A small numeric sketch of computing posteriors
and selecting the MAP hypothesis appears after this summary list.
- The Bayes optimal classifier combines the predictions of all
alternative hypotheses, weighted by their posterior probabilities, to
calculate the most probable classification of a new instance; a sketch
of this weighted vote is given below.
- Naive Bayes has been found useful in many practical applications. It is
called naive because it incorporates the simplifying assumption that attribute
values are conditionally independent given the classification of the
instance. When this assumption holds, naive Bayes outputs the MAP
classification. Even when the assumption is violated, naive Bayes tends to
perform well (a minimal naive Bayes sketch appears below). Bayesian belief
networks (BBNs) provide a more expressive representation for sets of
conditional independence assumptions.
- The framework of Bayesian reasoning can provide a useful basis
for analyzing certain learning methods that do not directly apply
Bayes theorem. For example, under certain conditions, minimizing the
squared error when learning a real-valued target function corresponds
to computing the maximum likelihood (ML) hypothesis; the short
derivation below makes this precise.
- The Minimum Description Length (MDL) principle recommends choosing the
hypothesis that minimizes the description length of the hypothesis
plus the description length of the data given the hypothesis. Bayes
theorem and basic results from information theory can be used to
provide a rationale for this principle (see the MDL sketch below).
- In many practical learning tasks, some of the relevant instance
variables may be unobservable. The EM algorithm provides quite a
general approach to learning in the presence of unobservable
variables. It begins with an arbitrary initial hypothesis. It then
repeatedly calculates the expected values of the hidden variables
(assuming the current hypothesis is correct) and then recalculates the
ML hypothesis (assuming the hidden variables have the expected values
calculated in the first step). This procedure converges to a locally
maximum likelihood hypothesis, along with estimated values for the
hidden variables. A small EM sketch for a two-Gaussian mixture appears
below.
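
The sketches below illustrate several of the points above. They are minimal
Python illustrations built on made-up numbers and data, not implementations
from the text. First, computing posterior probabilities from assumed priors
and likelihoods and selecting the MAP hypothesis:

    # Sketch: posterior probabilities and the MAP hypothesis.
    # The hypotheses, priors P(h), and likelihoods P(D|h) are made-up numbers.
    priors = {"h1": 0.6, "h2": 0.3, "h3": 0.1}
    likelihoods = {"h1": 0.02, "h2": 0.10, "h3": 0.05}

    # Bayes theorem: P(h|D) = P(D|h) P(h) / P(D), with P(D) = sum over h of P(D|h) P(h).
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    posteriors = {h: likelihoods[h] * priors[h] / evidence for h in priors}

    map_hypothesis = max(posteriors, key=posteriors.get)   # argmax of P(h|D)
    print(posteriors)       # {'h1': 0.255..., 'h2': 0.638..., 'h3': 0.106...}
    print(map_hypothesis)   # 'h2'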
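
Next, the Bayes optimal classifier: instead of classifying with the single MAP
hypothesis, it weights each hypothesis's prediction by its posterior. The
posteriors and per-hypothesis predictions here are illustrative, chosen so that
the weighted vote disagrees with the MAP hypothesis:

    # Sketch: Bayes optimal classification of a new instance x.
    posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}    # P(h|D), illustrative values
    predictions = {"h1": "+", "h2": "-", "h3": "-"}   # class each hypothesis assigns to x

    # P(v|x,D) = sum over h of P(v|x,h) P(h|D); here each P(v|x,h) is 1 or 0.
    scores = {}
    for h, v in predictions.items():
        scores[v] = scores.get(v, 0.0) + posteriors[h]

    print(max(scores, key=scores.get))   # '-', although the MAP hypothesis h1 predicts '+'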
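
A minimal naive Bayes sketch over discrete attributes. The training data,
attribute values, and the test instance are invented for illustration;
log-probabilities and add-one smoothing are used only to keep the arithmetic
stable:

    import math
    from collections import Counter, defaultdict

    # Each training example is (tuple of attribute values, class label); data are made up.
    train = [
        (("sunny", "hot"), "no"),
        (("sunny", "mild"), "no"),
        (("rain", "mild"), "yes"),
        (("overcast", "hot"), "yes"),
        (("rain", "cool"), "yes"),
    ]

    class_counts = Counter(label for _, label in train)
    value_counts = defaultdict(Counter)   # (label, attribute index) -> Counter of values
    attr_values = defaultdict(set)        # attribute index -> set of observed values
    for attrs, label in train:
        for i, a in enumerate(attrs):
            value_counts[(label, i)][a] += 1
            attr_values[i].add(a)

    def classify(attrs):
        """Return argmax over classes v of P(v) * product over i of P(a_i | v)."""
        best, best_score = None, float("-inf")
        for label, n in class_counts.items():
            score = math.log(n / len(train))   # log P(v)
            for i, a in enumerate(attrs):
                count = value_counts[(label, i)][a]
                # log P(a_i | v), estimated with add-one smoothing
                score += math.log((count + 1) / (n + len(attr_values[i])))
            if score > best_score:
                best, best_score = label, score
        return best

    print(classify(("rain", "hot")))   # 'yes' under these toy counts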
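
The connection between squared error and the ML hypothesis can be spelled out
in one line, assuming independent training values d_i = f(x_i) + e_i where the
noise e_i is Gaussian with zero mean and fixed variance:

    \[
    h_{ML} = \arg\max_h \prod_i \frac{1}{\sqrt{2\pi\sigma^2}}
             \exp\!\left(-\frac{(d_i - h(x_i))^2}{2\sigma^2}\right)
           = \arg\max_h \sum_i -\frac{(d_i - h(x_i))^2}{2\sigma^2}
           = \arg\min_h \sum_i \bigl(d_i - h(x_i)\bigr)^2 ,
    \]

since taking logarithms turns the product into a sum and the remaining
constants do not depend on h.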
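
For the MDL principle, under optimal codes the description lengths are
L(h) = -log2 P(h) and L(D|h) = -log2 P(D|h), so minimizing their sum selects
the same hypothesis as the MAP rule. Reusing the illustrative numbers from the
first sketch:

    import math

    priors = {"h1": 0.6, "h2": 0.3, "h3": 0.1}
    likelihoods = {"h1": 0.02, "h2": 0.10, "h3": 0.05}

    # L(h) + L(D|h) = -log2 P(h) - log2 P(D|h); the minimizer is the MAP hypothesis.
    description_length = {h: -math.log2(priors[h]) - math.log2(likelihoods[h])
                          for h in priors}
    print(min(description_length, key=description_length.get))   # 'h2', the MAP hypothesis again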
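
Finally, a small EM sketch for a mixture of two one-dimensional Gaussians with
known, equal variance, where only the two means are re-estimated. The data,
variance, and initial means are illustrative assumptions:

    import math
    import random

    random.seed(0)
    data = ([random.gauss(0.0, 1.0) for _ in range(100)] +   # synthetic cluster near 0
            [random.gauss(5.0, 1.0) for _ in range(100)])    # synthetic cluster near 5

    sigma2 = 1.0                   # variance assumed known
    mu = [min(data), max(data)]    # arbitrary initial hypothesis for the two means

    def gaussian(x, m):
        return math.exp(-(x - m) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

    for _ in range(50):
        # E-step: expected value of the hidden indicator for each point and component,
        # computed assuming the current hypothesis (mu) is correct.
        resp = []
        for x in data:
            w = [gaussian(x, m) for m in mu]
            total = sum(w)
            resp.append([wi / total for wi in w])
        # M-step: recompute the ML estimate of each mean, assuming the hidden
        # variables take the expected values just calculated.
        mu = [sum(r[j] * x for r, x in zip(resp, data)) / sum(r[j] for r in resp)
              for j in range(2)]

    print([round(m, 2) for m in mu])   # the means settle near the two cluster centers

With hard 0/1 assignments in place of the expected values, essentially the same
loop becomes the familiar two-means clustering update.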
Patricia Riddle
Fri May 15 13:00:36 NZST 1998