- Learning - we want the best hypothesis from some space H, given the observed training data D. Best can be defined as most probable given the data D plus any initial knowledge about prior probabilities of the various hypotheses in H.
- This is a direct method!!! (no search)
- P(h) - the initial probability that hypothesis h holds before we observe the training data - the prior probability - if we have no prior knowledge we assign the same initial probability to them all (it is trickier than this!!)
- P(D) - the prior probability that training data D will be observed, given no knowledge about which hypothesis holds
- P(D|h) - the probability of observing data D given that hypothesis h holds
- P(h|D) - the probability that h holds given the training data D - the posterior probability
- Bayes Theorem - P(h|D) = P(D|h) P(h) / P(D)
- the posterior probability increases with P(h) and with P(D|h), and decreases with P(D) - this last is not true with a lot of other scoring functions!
- so we want a maximum a posteriori hypothesis (MAP) -
h_MAP = argmax_{h in H} P(h|D) = argmax_{h in H} P(D|h) P(h) / P(D) = argmax_{h in H} P(D|h) P(h)
(P(D) is the same for every h, so it drops out of the argmax - see the sketch after this list)
- if we assume every hypothesis is equally likely a priori then we want the maximum likelihood hypothesis -
h_ML = argmax_{h in H} P(D|h)
- Bayes theorem is more general than Machine Learning!!
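
To make the MAP and ML definitions above concrete, here is a minimal Python sketch over a small discrete hypothesis space. Everything in it - the hypothesis names h1 and h2 and all of the prior and likelihood numbers - is an illustrative assumption, not something from these notes.

# A minimal sketch of Bayesian hypothesis selection over a discrete
# hypothesis space H.  All names and numbers are illustrative
# assumptions, not taken from the notes.

def posteriors(priors, likelihoods):
    # Bayes Theorem: P(h|D) = P(D|h) P(h) / P(D), where
    # P(D) = sum over h of P(D|h) P(h) (total probability).
    p_D = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / p_D for h in priors}

def map_hypothesis(priors, likelihoods):
    # h_MAP = argmax_h P(D|h) P(h); P(D) is the same for every h,
    # so it can be dropped from the argmax.
    return max(priors, key=lambda h: likelihoods[h] * priors[h])

def ml_hypothesis(likelihoods):
    # h_ML = argmax_h P(D|h) - the MAP choice under a uniform prior.
    return max(likelihoods, key=likelihoods.get)

priors = {"h1": 0.8, "h2": 0.2}       # P(h): belief in h before seeing D
likelihoods = {"h1": 0.3, "h2": 0.9}  # P(D|h): how well h explains D

print(posteriors(priors, likelihoods))
# {'h1': 0.571..., 'h2': 0.428...} since P(D) = 0.8*0.3 + 0.2*0.9 = 0.42
print(map_hypothesis(priors, likelihoods))  # h1 - the large prior wins
print(ml_hypothesis(likelihoods))           # h2 - the data alone prefers h2

Note how the prior and the likelihood pull in opposite directions in this example: the MAP hypothesis is h1 because 0.8 * 0.3 > 0.2 * 0.9, while the ML hypothesis is h2 because 0.9 > 0.3. With a uniform prior the two answers coincide, which is exactly the "equally likely a priori" condition above.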
Patricia Riddle
Fri May 15 13:00:36 NZST 1998