An Overview of Learning Bayes Nets From Data
Chris Meek, Microsoft Research
http://research.microsoft.com/~meek

What's and Why's
- What is a Bayesian network?
- Why are Bayesian networks useful?
- Why learn a Bayesian network?

What is a Bayesian Network?
- Directed acyclic graph
  – Nodes are variables (discrete or continuous)
  – Arcs indicate dependence between variables
- Conditional probabilities (local distributions)
- Missing arcs imply conditional independence
- Independencies + local distributions => modular specification of a joint distribution
Also called belief networks, and (directed acyclic) graphical models.
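One line makes the last bullet concrete: the joint factors into one local distribution per node given its parents. (The chain X1 → X2 → X3 is assumed here only to illustrate the three nodes in the original figure.)

```latex
p(x_1, \dots, x_n) = \prod_{i=1}^{n} p\!\left(x_i \mid \mathrm{pa}(x_i)\right)
\qquad \text{e.g.} \qquad
p(x_1, x_2, x_3) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_2)
```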

Why Bayesian Networks?
- Expressive language
  – Finite mixture models, factor analysis, HMMs, Kalman filters, …
- Intuitive language
  – Can utilize causal knowledge in constructing models
  – Domain experts are comfortable building a network
- General-purpose "inference" algorithms
  – e.g., P(Bad Battery | Has Gas, Won't Start) in a Battery/Gas/Start network
  – Exact: modular specification leads to large computational efficiencies
  – Approximate: "loopy" belief propagation

Why Learning?
knowledge-based (expert systems) ←→ data-based
- Answer Wizard (Office 95, 97, & …); Troubleshooters (Windows 98 & …)
- Causal discovery
- Data visualization
- Concise model of data
- Prediction

Overview
- Learning probabilities (local distributions)
  – Introduction to Bayesian statistics: learning a probability
  – Learning probabilities in a Bayes net
  – Applications
- Learning Bayes-net structure
  – Bayesian model selection/averaging
  – Applications

Learning Probabilities: Classical Approach
Simple case: flipping a thumbtack, which lands either heads or tails. The true probability θ = p(heads) is unknown.
Given i.i.d. data, estimate θ using an estimator with good properties: low bias, low variance, consistent (e.g., the ML estimate).
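For concreteness, the ML estimate in this binomial case (a standard result; N_h and N_t denote the observed counts of heads and tails):

```latex
\hat{\theta}_{\mathrm{ML}}
  = \arg\max_{\theta}\; \theta^{N_h} (1-\theta)^{N_t}
  = \frac{N_h}{N_h + N_t}
```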

Learning Probabilities: Bayesian Approach
Again the true probability θ is unknown, but now uncertainty about θ is represented by a Bayesian probability density p(θ) on the interval [0, 1].

Bayesian Approach
Use Bayes' rule to compute a new density for θ given data d:

```latex
p(\theta \mid d) = \frac{p(\theta)\, p(d \mid \theta)}{p(d)}
\qquad \text{(posterior} \propto \text{prior} \times \text{likelihood)}
```

The Likelihood
For a sequence d containing N_h heads and N_t tails, the likelihood is the "binomial distribution":

```latex
p(d \mid \theta) = \theta^{N_h} (1-\theta)^{N_t}
```

Example: Application of Bayes' Rule to the Observation of a Single "Heads"
[Figure: three curves over θ ∈ [0, 1]: the prior p(θ); the likelihood p(heads | θ) = θ; and the posterior p(θ | heads) ∝ θ · p(θ).]

The Probability of Heads on the Next Toss
This is the posterior mean:

```latex
p(X_{N+1} = \text{heads} \mid d) = \int_0^1 \theta\, p(\theta \mid d)\, d\theta
```

Note: this yields nearly identical answers to ML estimates when one uses a "flat" prior.
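A worked special case, assuming a conjugate Beta(α_h, α_t) prior (the particular prior family is an assumption, though it is the standard choice for this example):

```latex
% Conjugate Beta prior and its update (assumed prior family):
p(\theta) = \mathrm{Beta}(\theta \mid \alpha_h, \alpha_t)
\;\Rightarrow\;
p(\theta \mid d) = \mathrm{Beta}(\theta \mid \alpha_h + N_h,\; \alpha_t + N_t)

% Predictive probability of heads on the next toss:
p(X_{N+1} = \text{heads} \mid d) = \frac{\alpha_h + N_h}{\alpha_h + \alpha_t + N_h + N_t}
```

With the flat prior α_h = α_t = 1 this gives (N_h + 1)/(N + 2), which approaches the ML estimate N_h/N as N grows.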

Overview
- Learning probabilities
  – Introduction to Bayesian statistics: learning a probability
  – Learning probabilities in a Bayes net
  – Applications
- Learning Bayes-net structure
  – Bayesian model selection/averaging
  – Applications

From Thumbtacks to Bayes Nets
The thumbtack problem can be viewed as learning the probability for a very simple BN: a single binary variable X (heads/tails).
[Figure: parameter θ with arrows to the i.i.d. tosses X_1, X_2, …, X_N; equivalently, a plate over X_i, i = 1 to N.]
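The plate expresses that, given θ, the tosses are i.i.d., so the likelihood factors:

```latex
p(x_1, \dots, x_N \mid \theta) = \prod_{i=1}^{N} p(x_i \mid \theta) = \theta^{N_h} (1-\theta)^{N_t}
```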

The Next Simplest Bayes Net
Two binary variables, X (heads/tails) and Y (heads/tails), with no arc between them.

[Figure: parameter θ_X over the plate of X_i and parameter θ_Y over the plate of Y_i, i = 1 to N. Question: should there be an arc between θ_X and θ_Y?]

With no arc between θ_X and θ_Y ("parameter independence"), learning decomposes into two separate thumbtack-like learning problems.

A Bit More Difficult...
Now add an arc X → Y. There are three probabilities to learn:
- θ_{X=heads}
- θ_{Y=heads|X=heads}
- θ_{Y=heads|X=tails}

[Figure: parameters θ_X, θ_{Y|X=heads}, and θ_{Y|X=tails}, with arrows into the observed cases (X_1, Y_1), (X_2, Y_2), …. Each Y_i depends on the parameter matching its observed X_i.]

With complete data and parameter independence, this is 3 separate thumbtack-like problems.
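A minimal sketch of this decomposition in code, assuming complete binary data and Beta(1, 1) priors on each parameter (both the example data and the prior choice are illustrative assumptions, not from the slides):

```python
# Complete data for the X -> Y network: list of (x, y) observations.
data = [("heads", "heads"), ("heads", "tails"),
        ("tails", "heads"), ("heads", "heads")]

def beta_posterior_mean(n_heads, n_total, alpha_h=1.0, alpha_t=1.0):
    """Posterior mean of a probability under a Beta(alpha_h, alpha_t) prior."""
    return (alpha_h + n_heads) / (alpha_h + alpha_t + n_total)

# theta_X: one thumbtack-like problem using all x-values.
xs = [x for x, _ in data]
theta_x = beta_posterior_mean(xs.count("heads"), len(xs))

# theta_{Y|X=x}: one thumbtack-like problem per value of X,
# using only the cases where X took that value.
theta_y_given_x = {}
for x_val in ("heads", "tails"):
    ys = [y for x, y in data if x == x_val]
    theta_y_given_x[x_val] = beta_posterior_mean(ys.count("heads"), len(ys))

print(theta_x, theta_y_given_x)
```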

In General…
Learning probabilities in a BN is straightforward if:
- Likelihoods are from the exponential family (multinomial, Poisson, gamma, ...)
- Parameter independence
- Conjugate priors
- Complete data

Incomplete Data Makes Parameters Dependent
[Figure: Y_1 is observed but X_1 is missing; both θ_{Y|X=heads} and θ_{Y|X=tails} now connect to Y_1, so the parameters are no longer independent in the posterior.]

Incomplete Data
- Incomplete data makes parameters dependent.
Parameter learning for incomplete data:
- Monte-Carlo integration
  – Investigate properties of the posterior and perform prediction
- Large-sample approximations (Laplace/Gaussian)
  – Expectation-maximization (EM) algorithm and inference to compute the mean and variance (see the sketch below)
- Variational methods
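As one concrete instance, here is a minimal EM sketch for the two-node X → Y network above, assuming binary variables and that only X is ever missing (both simplifications are mine; this computes ML rather than fully Bayesian estimates):

```python
def em_two_node(data, iters=50):
    """EM for the X -> Y network when some x-values are missing (None).

    data: list of (x, y) pairs with x in {"h", "t", None}, y in {"h", "t"}.
    Returns ML estimates of p(X=h) and p(Y=h | X=x).
    """
    theta_x = 0.5                      # initial guess for p(X=h)
    theta_y = {"h": 0.6, "t": 0.4}     # initial guess for p(Y=h | X=x)

    for _ in range(iters):
        n_xh = 0.0                     # expected count of X=h
        n_yh = {"h": 0.0, "t": 0.0}    # expected count of (X=x, Y=h)
        n_x = {"h": 0.0, "t": 0.0}     # expected count of X=x
        for x, y in data:
            if x is not None:
                r = 1.0 if x == "h" else 0.0
            else:
                # E-step: responsibility p(X=h | y, current parameters).
                def p_y_given(xv):
                    return theta_y[xv] if y == "h" else 1.0 - theta_y[xv]
                num = theta_x * p_y_given("h")
                r = num / (num + (1.0 - theta_x) * p_y_given("t"))
            n_xh += r
            for xv, w in (("h", r), ("t", 1.0 - r)):
                n_x[xv] += w
                if y == "h":
                    n_yh[xv] += w
        # M-step: re-estimate parameters from expected counts.
        theta_x = n_xh / len(data)
        theta_y = {xv: n_yh[xv] / n_x[xv] for xv in ("h", "t")}
    return theta_x, theta_y

# Example: the x-value of the last case is unobserved.
print(em_two_node([("h", "h"), ("t", "h"), ("h", "t"), (None, "h")]))
```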

Overview
- Learning probabilities
  – Introduction to Bayesian statistics: learning a probability
  – Learning probabilities in a Bayes net
  – Applications
- Learning Bayes-net structure
  – Bayesian model selection/averaging
  – Applications

Example: Audio-Video Fusion (Beal, Attias, & Jojic 2002)
Goal: detect and track a speaker.
[Figure: audio scenario with two microphones and a source at location l_x; video scenario with a camera observing positions l_x, l_y.]
Slide courtesy of Beal, Attias, and Jojic.

Separate Audio and Video Models
[Figure: two independent graphical models, one for the audio data and one for the video data, each with a plate over frames n = 1, …, N.]
Slide courtesy of Beal, Attias, and Jojic.

Combined Model
[Figure: a single graphical model that couples the audio data and the video data, with a plate over frames n = 1, …, N.]
Slide courtesy of Beal, Attias, and Jojic.

Tracking Demo Slide courtesy Beal, Attias and Jojic

Overview
- Learning probabilities
  – Introduction to Bayesian statistics: learning a probability
  – Learning probabilities in a Bayes net
  – Applications
- Learning Bayes-net structure
  – Bayesian model selection/averaging
  – Applications

Two Types of Methods for Learning BNs
- Constraint-based
  – Find a Bayesian network structure whose implied independence constraints "match" those found in the data.
- Scoring methods (Bayesian, MDL, MML)
  – Find a Bayesian network structure that can represent distributions that "match" the data (i.e., could have generated the data).

Learning Bayes-Net Structure
Given data, which model is correct?
[Figure: two candidate models over X and Y, model 1 and model 2, differing in the arc between them.]

Bayesian Approach
Given data, which model is correct... or rather, more likely?
[Figure: models 1 and 2, each assigned a probability that is updated given the data d.]

Bayesian Approach: Model Averaging
Given data d, weight each candidate model by its posterior probability and average their predictions.
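In symbols, the model-averaged prediction for a new quantity x is the posterior-weighted mixture:

```latex
p(x \mid d) = \sum_{m} p(x \mid m, d)\; p(m \mid d)
```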

Bayesian Approach: Model Selection
Alternatively, keep only the best (most probable) model, which aids:
- Explanation
- Understanding
- Tractability

To Score a Model, Use Bayes' Rule
Given data d, the model score is

```latex
p(m \mid d) \propto p(m)\, p(d \mid m),
\qquad
p(d \mid m) = \int p(d \mid \theta_m, m)\, p(\theta_m \mid m)\, d\theta_m
```

where the "marginal likelihood" p(d | m) integrates the likelihood over the prior on the model's parameters.

The Bayesian Approach and Occam's Razor
[Figure: the space of all distributions. A simple model puts its prior p(θ_m | m) on too small a region to reach the true distribution; a complicated model spreads it over far too large a region; a "just right" model concentrates prior mass near the true distribution and so receives the highest marginal likelihood.]

Computation of the Marginal Likelihood
Efficient closed form if:
- Likelihoods are from the exponential family (binomial, Poisson, gamma, ...)
- Parameter independence
- Conjugate priors
- No missing data, including no hidden variables
Otherwise, use approximations:
- Monte-Carlo integration
- Large-sample approximations
- Variational methods
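For the thumbtack with the Beta prior introduced earlier, the closed form is the classic Beta-binomial integral; under the four conditions above, a Bayes net's marginal likelihood is a product of such terms, one per node and parent configuration:

```latex
p(d) = \int_0^1 \theta^{N_h} (1-\theta)^{N_t}\, \mathrm{Beta}(\theta \mid \alpha_h, \alpha_t)\, d\theta
     = \frac{B(\alpha_h + N_h,\; \alpha_t + N_t)}{B(\alpha_h, \alpha_t)}
```

where B(·,·) is the Beta function.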

Practical Considerations
The number of possible BN structures is super-exponential in the number of variables. How do we find the best graph(s)?
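To make "super-exponential" concrete, here is a short sketch that counts labeled DAGs via Robinson's recurrence (a standard combinatorial result, not from the slides):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

for n in range(1, 8):
    print(n, num_dags(n))   # 1, 3, 25, 543, 29281, 3781503, 1138779265
```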

Model Search
- Finding the BN structure with the highest score among those structures with at most k parents is NP-hard for k > 1 (Chickering, 1995)
- Heuristic methods
  – Greedy (sketched below)
  – Greedy with restarts
  – MCMC methods
[Flowchart: initialize structure → score all possible single changes → if any change improves the score, perform the best change and repeat; otherwise return the saved structure.]
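A minimal sketch of the greedy loop in the flowchart, assuming hypothetical helpers score(g) (higher is better) and neighbors(g) (all single-edge additions, deletions, and reversals that keep the graph acyclic); neither helper is specified in the slides:

```python
def greedy_search(initial_graph, score, neighbors):
    """Greedy hill-climbing over BN structures (the loop in the flowchart).

    score(g)     -> float: structure score, higher is better.
    neighbors(g) -> iterable of graphs one edge-change away from g.
    """
    current, current_score = initial_graph, score(initial_graph)
    while True:
        # Score all possible single changes.
        best, best_score = None, current_score
        for candidate in neighbors(current):
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
        # Any change better? If not, return the saved structure.
        if best is None:
            return current
        current, current_score = best, best_score
```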

Learning the Correct Model
- Let G be the true graph and P the generative distribution
- Markov assumption: P satisfies the independencies implied by G
- Faithfulness assumption: P satisfies only the independencies implied by G
Theorem: under Markov and faithfulness, with enough data generated from P, one can recover G (up to equivalence). Even with the greedy method!

Learning Bayes Nets From Data
[Figure: a data table,
  X1: true, false, true, …
  X2: 1, 5, 3, 2, …
  X3: Red, Blue, Green, Red, …
plus prior/expert information, feeding a Bayes-net learner that outputs a Bayes net (or nets) over X1, …, X9.]

Overview
- Learning probabilities
  – Introduction to Bayesian statistics: learning a probability
  – Learning probabilities in a Bayes net
  – Applications
- Learning Bayes-net structure
  – Bayesian model selection/averaging
  – Applications

Preference Prediction (a.k.a. Collaborative Filtering)
- Example: predict what products a user will likely purchase given the items in their shopping basket
- Basic idea: use other people's preferences to help predict a new user's preferences
- Numerous applications
  – Tell people about books or web pages of interest
  – Movies
  – TV shows

Example: TV Viewing
Nielsen data, 2/6/95-2/19/95: ~200 shows, ~3000 viewers.

           Show1  Show2  Show3
viewer 1     y      n      n
viewer 2     n      y      y
viewer 3     n      n      n
  etc.

Goal: for each viewer, recommend shows they haven't watched that they are likely to watch.

Making Predictions
[Figure: a Bayes net learned over the shows: Models Inc., Melrose Place, Friends, Beverly Hills, Mad About You, Seinfeld, Frasier, NBC Monday Night Movies, and Law & Order. For the current user, some shows are marked "watched" or "didn't watch".]
For each show the user hasn't watched, infer p(watched | everything else we know about the user), e.g., p(watched Melrose Place | everything else we know about the user).
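A sketch of how such a recommendation list could be assembled, assuming a hypothetical learned `network` object and an `infer` routine for the conditional probabilities (neither API is specified in the slides):

```python
def recommend(network, user_evidence, infer, top_k=4):
    """Rank unwatched shows by their inferred watch probability.

    network: learned Bayes net over shows (hypothetical object with
             a `variables` attribute listing the show names).
    user_evidence: dict mapping show -> True/False (watched or not).
    infer(network, show, evidence): returns p(watched | evidence);
             assumed to be provided by a BN inference engine.
    """
    unwatched = [s for s in network.variables if s not in user_evidence]
    scored = [(infer(network, s, user_evidence), s) for s in unwatched]
    scored.sort(reverse=True)          # highest probability first
    return scored[:top_k]
```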

Recommendation List
- p = .67: Seinfeld
- p = .51: NBC Monday night movies
- p = .17: Beverly hills
- p = .06: Melrose place

Software Packages
- BUGS: parameter learning, hierarchical models, MCMC
- Hugin: inference and model construction
- xBaies: chain graphs, discrete only
- Bayesian Knowledge Discoverer: commercial
- MIM
- BAYDA: classification
- BN PowerConstructor
- Microsoft Research: WinMine

For More Information…
Tutorials:
- K. Murphy (2001)
- W. Buntine (1994). Operations for learning with graphical models. Journal of Artificial Intelligence Research, 2.
- D. Heckerman (1999). A tutorial on learning with Bayesian networks. In Learning in Graphical Models (Ed. M. Jordan). MIT Press.
Books:
- R. Cowell, A. P. Dawid, S. Lauritzen, and D. Spiegelhalter. Probabilistic Networks and Expert Systems. Springer-Verlag.
- M. I. Jordan (ed., 1998). Learning in Graphical Models. MIT Press.
- S. Lauritzen (1996). Graphical Models. Clarendon Press.
- J. Pearl (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
- P. Spirtes, C. Glymour, and R. Scheines (2001). Causation, Prediction, and Search, Second Edition. MIT Press.