-  Learning - we want the best hypothesis from some space H,
given the observed training data D. "Best" can be defined as most
probable given the data D, plus any initial knowledge about the prior
probabilities of the various hypotheses in H.
-  This is a direct method!!! (no search)
-  P(h) - the initial probability that hypothesis h holds
before we observe the training data - the prior probability - if we
have no prior knowledge we assign the same initial probability to them
all (it is trickier than this!!)
-  P(D) - the prior probability that training data D will be observed,
given no knowledge about which hypothesis holds
-  P(D|h) - the probability of observing data D given that
hypothesis h holds
-  P(h|D) - the probability that h holds given the training
data D - the posterior probability
-  Bayes Theorem:

       P(h|D) = P(D|h) P(h) / P(D)

-  the posterior probability increases with P(h)
and P(D|h) and decreases
with P(D) - this last is not true of a lot of other scoring
functions!
-  so we want a maximum a posteriori (MAP) hypothesis:

       h_MAP = argmax_{h in H} P(h|D) = argmax_{h in H} P(D|h) P(h)

   (P(D) can be dropped because it is the same for every hypothesis)
-  if we assume every hypothesis is equally likely a priori then we
want the maximum likelihood (ML) hypothesis:

       h_ML = argmax_{h in H} P(D|h)

-  Bayes theorem is more general than Machine Learning!!
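The definitions above can be sketched on a toy two-hypothesis space. All the numbers here (the priors and likelihoods for a hypothetical disease/test scenario) are illustrative assumptions, not from the notes; the point is only to show Bayes theorem, the MAP choice, and how dropping the prior gives the ML choice instead.

```python
# Toy hypothesis space: does a patient have a disease, given a positive test?
# All probabilities below are made-up illustrative values.
priors = {"disease": 0.008, "no_disease": 0.992}       # P(h)
likelihoods = {"disease": 0.98, "no_disease": 0.03}    # P(D|h): positive test given h

# P(D) by the law of total probability: sum over h of P(D|h) * P(h)
p_data = sum(likelihoods[h] * priors[h] for h in priors)

# Bayes theorem: P(h|D) = P(D|h) * P(h) / P(D)
posteriors = {h: likelihoods[h] * priors[h] / p_data for h in priors}

# MAP hypothesis: argmax over h of P(D|h) * P(h)
# (P(D) is the same for every h, so it can be dropped from the argmax)
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])

# ML hypothesis: argmax over h of P(D|h) -- what MAP reduces to
# when every hypothesis is equally likely a priori
h_ml = max(likelihoods, key=likelihoods.get)

print(h_map)  # no_disease: the small prior P(disease) dominates the test result
print(h_ml)   # disease: ignoring the priors flips the answer
```

Note how MAP and ML disagree here: the positive test makes "disease" the maximum likelihood hypothesis, but the low prior probability of disease makes "no_disease" the MAP hypothesis.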
 
 
Patricia Riddle 
Fri May 15 13:00:36 NZST 1998