-  Learning - we want the best hypothesis from some space H,
given the observed training data D. "Best" can be defined as most
probable given the data D, plus any initial knowledge about the prior
probabilities of the various hypotheses in H.
-  This is a direct method!!! (no search)
-  P(h) - the initial probability that hypothesis h holds
before we observe the training data - the prior probability - if we
have no prior knowledge we assign the same initial probability to them
all (it is trickier than this!!)
-  P(D) - the prior probability that training data D will be observed,
given no knowledge about which hypothesis holds
-  P(D|h) - the probability of observing data D given that
hypothesis h holds
-  P(h|D) - the probability that h holds given the training
data D - the posterior probability
-  Bayes Theorem:

       P(h|D) = P(D|h) P(h) / P(D)

-  the posterior probability increases with P(h)
and P(D|h) and decreases
with P(D) - this last is not true of a lot of other scoring
functions!
-  so we want a maximum a posteriori (MAP) hypothesis:

       h_MAP = argmax_{h in H} P(h|D) = argmax_{h in H} P(D|h) P(h)

   (P(D) can be dropped because it is the same for every hypothesis)
-  if we assume every hypothesis is equally likely a priori then we
want the maximum likelihood (ML) hypothesis:

       h_ML = argmax_{h in H} P(D|h)

-  Bayes theorem is more general than Machine Learning!!
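The definitions above can be sketched on a toy two-hypothesis space. All the numbers here (the priors and likelihoods for a hypothetical disease/test scenario) are illustrative assumptions, not from the notes; the point is only to show Bayes theorem, the MAP choice, and how dropping the prior gives the ML choice instead.

```python
# Toy hypothesis space: does a patient have a disease, given a positive test?
# All probabilities below are made-up illustrative values.
priors = {"disease": 0.008, "no_disease": 0.992}       # P(h)
likelihoods = {"disease": 0.98, "no_disease": 0.03}    # P(D|h): positive test given h

# P(D) by the law of total probability: sum over h of P(D|h) * P(h)
p_data = sum(likelihoods[h] * priors[h] for h in priors)

# Bayes theorem: P(h|D) = P(D|h) * P(h) / P(D)
posteriors = {h: likelihoods[h] * priors[h] / p_data for h in priors}

# MAP hypothesis: argmax over h of P(D|h) * P(h)
# (P(D) is the same for every h, so it can be dropped from the argmax)
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])

# ML hypothesis: argmax over h of P(D|h) -- what MAP reduces to
# when every hypothesis is equally likely a priori
h_ml = max(likelihoods, key=likelihoods.get)

print(h_map)  # no_disease: the small prior P(disease) dominates the test result
print(h_ml)   # disease: ignoring the priors flips the answer
```

Note how MAP and ML disagree here: the positive test makes "disease" the maximum likelihood hypothesis, but the low prior probability of disease makes "no_disease" the MAP hypothesis.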
 
 
Patricia Riddle 
Fri May 15 13:00:36 NZST 1998