1
An Overview of Learning Bayes Nets From Data
Chris Meek
Microsoft Research
http://research.microsoft.com/~meek
2
What's and Why's
- What is a Bayesian network?
- Why are Bayesian networks useful?
- Why learn a Bayesian network?
3
What is a Bayesian Network?
(also called belief networks, and (directed acyclic) graphical models)
- Directed acyclic graph
  – Nodes are variables (discrete or continuous)
  – Arcs indicate dependence between variables
- Conditional probabilities (local distributions)
- Missing arcs imply conditional independence
- Independencies + local distributions => modular specification of a joint distribution
[Figure: an example network over three variables X1, X2, X3]
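The modular specification mentioned above is the usual chain-rule factorization over the graph; a generic statement (not written out on the slide) is
$$ p(x_1, \ldots, x_n) \;=\; \prod_{i=1}^{n} p\big(x_i \mid \mathrm{pa}(x_i)\big), $$
where pa(x_i) denotes the parents of X_i in the DAG, so each node contributes one local distribution.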
4
Why Bayesian Networks?
- Expressive language
  – Finite mixture models, factor analysis, HMM, Kalman filter, ...
- Intuitive language
  – Can utilize causal knowledge in constructing models
  – Domain experts are comfortable building a network
- General-purpose "inference" algorithms
  – P(Bad Battery | Has Gas, Won't Start)
  – Exact: modular specification leads to large computational efficiencies
  – Approximate: "loopy" belief propagation
[Figure: a small network with nodes Battery, Gas, and Start]
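A minimal sketch of the kind of query mentioned above, computed by enumeration over the unobserved variable. The structure (Battery and Gas as parents of Start) matches the slide's figure, but every probability number below is invented purely for illustration.

```python
# Exact inference by enumeration on a toy Battery/Gas/Start network.
# All CPT numbers are illustrative assumptions, not from the slide.

p_battery_good = 0.95                      # P(Battery = good), assumed
p_has_gas = 0.90                           # P(Gas = yes), assumed
p_start = {                                # P(Start = yes | Battery, Gas), assumed
    (True, True): 0.99, (True, False): 0.00,
    (False, True): 0.05, (False, False): 0.00,
}

def joint(battery_good: bool, has_gas: bool, starts: bool) -> float:
    """Joint probability from the modular (factored) specification."""
    pb = p_battery_good if battery_good else 1 - p_battery_good
    pg = p_has_gas if has_gas else 1 - p_has_gas
    ps = p_start[(battery_good, has_gas)]
    return pb * pg * (ps if starts else 1 - ps)

# P(Bad Battery | Has Gas, Won't Start): enumerate the hidden Battery variable.
numerator = joint(False, True, False)
denominator = sum(joint(b, True, False) for b in (True, False))
print("P(bad battery | has gas, won't start) =", numerator / denominator)
```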
5
Why Learning?
Models range from knowledge-based (expert systems) to data-based.
- Answer Wizard, Office 95, 97, & 2000
- Troubleshooters, Windows 98 & 2000
- Causal discovery
- Data visualization
- Concise model of data
- Prediction
6
Overview
- Learning Probabilities (local distributions)
  – Introduction to Bayesian statistics: learning a probability
  – Learning probabilities in a Bayes net
  – Applications
- Learning Bayes-net structure
  – Bayesian model selection/averaging
  – Applications
7
Learning Probabilities: Classical Approach
Simple case: flipping a thumbtack, which lands heads or tails. The true probability θ of heads is unknown.
Given iid data, estimate θ using an estimator with good properties: low bias, low variance, consistent (e.g., the ML estimate).
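For completeness (the slide only names the estimator), the maximum-likelihood estimate for the thumbtack is simply the empirical frequency:
$$ \hat{\theta}_{\mathrm{ML}} \;=\; \frac{\#\text{heads}}{N}. $$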
8
Learning Probabilities: Bayesian Approach
Again the true probability θ of heads is unknown; uncertainty about θ is represented by a Bayesian probability density p(θ) over the interval [0, 1].
[Figure: a density p(θ) plotted over θ from 0 to 1]
9
Bayesian Approach: use Bayes' rule to compute a new density for θ given data d:
$$ \underbrace{p(\theta \mid d)}_{\text{posterior}} \;=\; \frac{\overbrace{p(\theta)}^{\text{prior}}\;\overbrace{p(d \mid \theta)}^{\text{likelihood}}}{p(d)} $$
10
The Likelihood: the "binomial distribution"
$$ p(d \mid \theta) \;=\; \theta^{\#\text{heads}}\,(1-\theta)^{\#\text{tails}} $$
11
Example: application of Bayes' rule to the observation of a single "heads":
$$ p(\theta \mid \text{heads}) \;\propto\; p(\theta)\, p(\text{heads} \mid \theta) \;=\; p(\theta)\,\theta $$
[Figure: the prior p(θ), the likelihood p(heads | θ) = θ, and the resulting posterior p(θ | heads), each plotted over θ in [0, 1]]
12
The probability of heads on the next toss is the posterior expectation of θ:
$$ p(X_{N+1} = \text{heads} \mid d) \;=\; \int_0^1 \theta\, p(\theta \mid d)\, d\theta $$
Note: this yields nearly identical answers to ML estimates when one uses a "flat" prior.
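The slide does not name a prior family; assuming the standard conjugate Beta(α_h, α_t) prior for the thumbtack, the integral above has the familiar closed form
$$ p(X_{N+1} = \text{heads} \mid d) \;=\; \frac{\alpha_h + \#\text{heads}}{\alpha_h + \alpha_t + N}, $$
so with a flat prior (α_h = α_t = 1) the prediction is (#heads + 1)/(N + 2), which approaches the ML estimate #heads/N as N grows.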
13
Overview
- Learning Probabilities
  – Introduction to Bayesian statistics: learning a probability
  – Learning probabilities in a Bayes net
  – Applications
- Learning Bayes-net structure
  – Bayesian model selection/averaging
  – Applications
14
From thumbtacks to Bayes nets
The thumbtack problem can be viewed as learning the probability θ for a very simple BN: a single node X with values heads/tails.
[Figure: θ as the parent of X1, X2, ..., XN (toss 1, toss 2, ..., toss N), drawn as a plate over Xi, i = 1 to N]
15
The next simplest Bayes net
[Figure: two unconnected nodes X and Y, each with values heads/tails, illustrated by two thumbtacks]
16
The next simplest Bayes net
[Figure: plate model with θX as the parent of Xi and θY as the parent of Yi, i = 1 to N; a question mark asks whether there should be a dependence between θX and θY]
17
The next simplest Bayes net
[Figure: the same plate model with no arc between θX and θY: "parameter independence"]
18
The next simplest Bayes net
[Figure: the same plate model with θX and θY independent]
"Parameter independence" gives two separate thumbtack-like learning problems.
19
A bit more difficult...
The network X → Y, with X and Y each heads/tails. Three probabilities to learn:
- θ_X=heads
- θ_Y=heads|X=heads
- θ_Y=heads|X=tails
20
A bit more difficult...
[Figure: θX as the parent of X1 and X2; θY|X=heads and θY|X=tails as parents of Y1 and Y2 (case 1, case 2); question marks ask which dependencies hold among the three parameters]
21
A bit more difficult...
[Figure: the same two-case diagram with no arcs among θX, θY|X=heads, and θY|X=tails, i.e., parameter independence]
22
A bit more difficult...
[Figure: the same diagram, with the Y observations split according to whether the corresponding X was heads or tails]
With parameter independence and complete data, this reduces to 3 separate thumbtack-like problems.
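A minimal sketch of those three separate estimation problems on complete data: each parameter is estimated from its own slice of the counts. The (X, Y) cases below are invented for illustration.

```python
# Three thumbtack-like problems for the network X -> Y (complete data).
# The data are illustrative, not from the slide.
data = [("heads", "heads"), ("heads", "tails"), ("tails", "heads"),
        ("heads", "heads"), ("tails", "tails")]

n = len(data)
n_xh = sum(1 for x, _ in data if x == "heads")                      # X = heads count
n_yh_xh = sum(1 for x, y in data if x == "heads" and y == "heads")  # Y = heads among X = heads
n_yh_xt = sum(1 for x, y in data if x == "tails" and y == "heads")  # Y = heads among X = tails

theta_x = n_xh / n                         # estimate of P(X = heads)
theta_y_given_xh = n_yh_xh / n_xh          # estimate of P(Y = heads | X = heads)
theta_y_given_xt = n_yh_xt / (n - n_xh)    # estimate of P(Y = heads | X = tails)
print(theta_x, theta_y_given_xh, theta_y_given_xt)
```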
23
In general...
Learning probabilities in a BN is straightforward if:
- Likelihoods are from the exponential family (multinomial, Poisson, gamma, ...)
- Parameter independence
- Conjugate priors
- Complete data
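For the multinomial case these conditions lead to a purely local, count-based update. As a standard result (not spelled out on the slide): with a Dirichlet prior on the distribution of X_i given each parent configuration j, the posterior is again Dirichlet and the predictive probabilities are
$$ p\big(X_i = k \mid \mathrm{pa}_i = j, d\big) \;=\; \frac{\alpha_{ijk} + N_{ijk}}{\sum_{k'} \big(\alpha_{ijk'} + N_{ijk'}\big)}, $$
where N_ijk counts the cases with X_i = k under parent configuration j, and α_ijk are the prior hyperparameters.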
24
Incomplete data makes parameters dependent
[Figure: the network X → Y with X1 unobserved; since Y1 then depends on both θY|X=heads and θY|X=tails, observing Y1 alone couples the parameters]
25
Incomplete data
- Incomplete data makes parameters dependent
Parameter learning for incomplete data:
- Monte-Carlo integration
  – Investigate properties of the posterior and perform prediction
- Large-sample approximation (Laplace/Gaussian approximation)
  – Expectation-maximization (EM) algorithm and inference to compute mean and variance
- Variational methods
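As an illustration of the EM option on the two-node example above: the E-step fills in each missing X with its posterior probability under the current parameters, and the M-step re-estimates the parameters from the resulting expected counts. The data and starting values in this sketch are invented for illustration.

```python
# A minimal EM sketch for the two-node network X -> Y (both heads/tails)
# when some X values are missing. Data and initial values are illustrative.

data = [("heads", "heads"), (None, "heads"), ("tails", "tails"),
        (None, "tails"), ("heads", "heads")]      # (X, Y); None = X unobserved

theta_x, theta_yh, theta_yt = 0.5, 0.6, 0.4       # P(X=h), P(Y=h|X=h), P(Y=h|X=t)

for _ in range(50):
    # E-step: posterior probability that X = heads in each case.
    n_xh = n_xt = n_yh_xh = n_yh_xt = 0.0
    for x, y in data:
        py_h = theta_yh if y == "heads" else 1 - theta_yh   # p(y | X = heads)
        py_t = theta_yt if y == "heads" else 1 - theta_yt   # p(y | X = tails)
        if x == "heads":
            w = 1.0
        elif x == "tails":
            w = 0.0
        else:                                               # X missing: infer it
            w = theta_x * py_h / (theta_x * py_h + (1 - theta_x) * py_t)
        n_xh += w
        n_xt += 1 - w
        if y == "heads":
            n_yh_xh += w
            n_yh_xt += 1 - w
    # M-step: re-estimate the parameters from the expected counts.
    theta_x = n_xh / len(data)
    theta_yh = n_yh_xh / n_xh
    theta_yt = n_yh_xt / n_xt

print(theta_x, theta_yh, theta_yt)
```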
26
Overview
- Learning Probabilities
  – Introduction to Bayesian statistics: learning a probability
  – Learning probabilities in a Bayes net
  – Applications
- Learning Bayes-net structure
  – Bayesian model selection/averaging
  – Applications
27
Example: Audio-video fusion (Beal, Attias, & Jojic 2002)
Goal: detect and track a speaker.
[Figure: audio scenario with two microphones (mic. 1, mic. 2) and a source at location lx; video scenario with a camera and object position (lx, ly)]
Slide courtesy Beal, Attias and Jojic
28
Separate audio-video models
[Figure: two separate graphical models, one for the audio data and one for the video data, each in a plate over frames n = 1, ..., N]
Slide courtesy Beal, Attias and Jojic
29
Combined model
[Figure: a single graphical model coupling the audio data and video data, in a plate over frames n = 1, ..., N]
Slide courtesy Beal, Attias and Jojic
30
Tracking Demo Slide courtesy Beal, Attias and Jojic
31
Overview
- Learning Probabilities
  – Introduction to Bayesian statistics: learning a probability
  – Learning probabilities in a Bayes net
  – Applications
- Learning Bayes-net structure
  – Bayesian model selection/averaging
  – Applications
32
Two Types of Methods for Learning BNs
- Constraint-based
  – Finds a Bayesian network structure whose implied independence constraints "match" those found in the data.
- Scoring methods (Bayesian, MDL, MML)
  – Find the Bayesian network structure that can represent distributions that "match" the data (i.e., could have generated the data).
33
Learning Bayes-net structure
Given data, which model is correct?
[Figure: two candidate structures over X and Y: model 1 and model 2]
34
Bayesian approach
Given data d, which model is correct? More likely?
[Figure: model 1 and model 2 over X and Y, with prior probabilities updated to posterior probabilities given data d]
35
Bayesian approach: Model Averaging
Given data d, which model is correct? More likely?
[Figure: posterior probabilities for model 1 and model 2 given data d]
Average predictions over the models, weighting each by its posterior probability.
36
Bayesian approach: Model Selection
Given data d, which model is correct? More likely?
[Figure: posterior probabilities for model 1 and model 2 given data d]
Keep the best model, for:
- Explanation
- Understanding
- Tractability
37
To score a model, use Bayes' rule. Given data d:
$$ \underbrace{p(m \mid d)}_{\text{model score}} \;\propto\; p(m)\,\underbrace{p(d \mid m)}_{\text{"marginal likelihood"}}, \qquad p(d \mid m) \;=\; \int \underbrace{p(d \mid \theta, m)}_{\text{likelihood}}\; p(\theta \mid m)\, d\theta $$
38
The Bayesian approach and Occam's Razor
[Figure: the space of all distributions p(θm | m); a simple model covers only a small region, a complicated model spreads its prior over a very large region, and a "just right" model concentrates near the true distribution, so it receives the highest marginal likelihood]
39
Computation of Marginal Likelihood
Efficient closed form if:
- Likelihoods are from the exponential family (binomial, Poisson, gamma, ...)
- Parameter independence
- Conjugate priors
- No missing data, including no hidden variables
Else use approximations:
- Monte-Carlo integration
- Large-sample approximations
- Variational methods
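For discrete networks with Dirichlet priors and complete data, the closed form is the well-known Cooper-Herskovits/BD score (quoted here for reference; the slide does not reproduce it):
$$ p(d \mid m) \;=\; \prod_{i}\prod_{j} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}, $$
where i ranges over variables, j over parent configurations, k over states, α_ij = Σ_k α_ijk, and N_ij = Σ_k N_ijk.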
40
Practical considerations
The number of possible BN structures is super-exponential in the number of variables.
- How do we find the best graph(s)?
41
Model search
- Finding the BN structure with the highest score among those structures with at most k parents is NP-hard for k > 1 (Chickering, 1995)
- Heuristic methods
  – Greedy
  – Greedy with restarts
  – MCMC methods
[Flowchart: initialize the structure; score all possible single changes; if any change is better, perform the best change and repeat; otherwise return the saved structure]
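A minimal sketch of the greedy loop in the flowchart, assuming a `score(structure, data)` function is available (e.g., a marginal-likelihood or BIC score) and representing a structure as a mapping from each node to its parent set. The names and representation here are illustrative, not the slide's.

```python
import itertools

def has_cycle(parents):
    """Return True if the parent-set representation contains a directed cycle."""
    visited, stack = set(), set()
    def visit(v):
        if v in stack:
            return True
        if v in visited:
            return False
        visited.add(v)
        stack.add(v)
        cyclic = any(visit(p) for p in parents[v])
        stack.discard(v)
        return cyclic
    return any(visit(v) for v in parents)

def neighbors(parents):
    """All structures reachable by one edge addition, deletion, or reversal."""
    nodes = list(parents)
    for x, y in itertools.permutations(nodes, 2):
        new = {v: set(ps) for v, ps in parents.items()}
        if x in parents[y]:
            new[y].discard(x)                       # deletion of x -> y
            yield new
            rev = {v: set(ps) for v, ps in parents.items()}
            rev[y].discard(x)
            rev[x].add(y)                           # reversal to y -> x
            if not has_cycle(rev):
                yield rev
        else:
            new[y].add(x)                           # addition of x -> y
            if not has_cycle(new):
                yield new

def greedy_search(nodes, data, score):
    """Hill-climbing: apply the best single change until none improves the score."""
    current = {v: set() for v in nodes}             # initialize structure (empty graph)
    best = score(current, data)
    while True:
        cand = max(neighbors(current), key=lambda s: score(s, data), default=None)
        if cand is None or score(cand, data) <= best:
            return current                          # no single change is better
        current, best = cand, score(cand, data)
```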
42
Learning the correct model
- Let G be the true graph and P the generative distribution
- Markov assumption: P satisfies the independencies implied by G
- Faithfulness assumption: P satisfies only the independencies implied by G
Theorem: under Markov and faithfulness, with enough data generated from P one can recover G (up to equivalence). Even with the greedy method!
43
Learning Bayes Nets From Data
[Figure: a data table (X1: true, false, true, ...; X2: 1, 5, 3, 2, ...; X3: Red, Blue, Green, Red, ...) plus prior/expert information feed a Bayes-net learner, which outputs Bayes net(s) over X1, ..., X9]
44
Overview
- Learning Probabilities
  – Introduction to Bayesian statistics: learning a probability
  – Learning probabilities in a Bayes net
  – Applications
- Learning Bayes-net structure
  – Bayesian model selection/averaging
  – Applications
45
Preference Prediction (a.k.a. Collaborative Filtering)
- Example: predict what products a user will likely purchase given the items in their shopping basket
- Basic idea: use other people's preferences to help predict a new user's preferences
- Numerous applications
  – Tell people about books or web pages of interest
  – Movies
  – TV shows
46
Example: TV viewing
Nielsen data: 2/6/95-2/19/95 (~200 shows, ~3000 viewers)

           Show1  Show2  Show3  ...
viewer 1     y      n      n
viewer 2     n      y      y
viewer 3     n      n      n
etc.

Goal: for each viewer, recommend shows they haven't watched that they are likely to watch
48
Making predictions
[Figure: the learned Bayes net over shows (Models Inc, Melrose Place, Friends, Beverly Hills 90210, Mad About You, Seinfeld, Frasier, NBC Monday Night Movies, Law & Order), with the user's watched / didn't watch observations entered as evidence]
Infer: p(watched 90210 | everything else we know about the user)
49
Making predictions
[Figure: the same network with the user's watched / didn't watch evidence]
Infer: p(watched 90210 | everything else we know about the user)
50
Making predictions
[Figure: the same network with the user's watched / didn't watch evidence]
Infer: p(watched Melrose Place | everything else we know about the user)
51
Recommendation list
- p = .67 Seinfeld
- p = .51 NBC Monday night movies
- p = .17 Beverly Hills 90210
- p = .06 Melrose Place
52
Software Packages
- BUGS: http://www.mrc-bsu.cam.ac.uk/bugs (parameter learning, hierarchical models, MCMC)
- Hugin: http://www.hugin.dk (inference and model construction)
- xBaies: http://www.city.ac.uk/~rgc (chain graphs, discrete only)
- Bayesian Knowledge Discoverer: http://kmi.open.ac.uk/projects/bkd (commercial)
- MIM: http://inet.uni-c.dk/~edwards/miminfo.html
- BAYDA: http://www.cs.Helsinki.FI/research/cosco (classification)
- BN Power Constructor: BN PowerConstructor
- Microsoft Research: WinMine http://research.microsoft.com/~dmax/WinMine/Tooldoc.htm
53
For more information...
Tutorials:
- K. Murphy (2001). http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html
- W. Buntine (1994). Operations for learning with graphical models. Journal of Artificial Intelligence Research, 2, 159-225.
- D. Heckerman (1999). A tutorial on learning with Bayesian networks. In Learning in Graphical Models (Ed. M. Jordan). MIT Press.
Books:
- R. Cowell, A. P. Dawid, S. Lauritzen, and D. Spiegelhalter (1999). Probabilistic Networks and Expert Systems. Springer-Verlag.
- M. I. Jordan (ed., 1998). Learning in Graphical Models. MIT Press.
- S. Lauritzen (1996). Graphical Models. Clarendon Press.
- J. Pearl (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
- P. Spirtes, C. Glymour, and R. Scheines (2001). Causation, Prediction, and Search, Second Edition. MIT Press.