Structure Learning (The Good), The Bad, The Ugly; Inference
Graphical Models – 10-708, Carlos Guestrin, Carnegie Mellon University, October 13th, 2008
Readings: K&F: 17.3, 17.4, 8.1

Decomposable score
Log data likelihood. A decomposable score:
- decomposes over the families in the BN (each node and its parents)
- will lead to significant computational efficiency!
- Score(G : D) = Σ_i FamScore(X_i | Pa_{X_i} : D)
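To make the decomposition concrete, here is a minimal sketch in Python; the graph encoding (a dict from node to its parent list), the fam_score callable, and the data object are illustrative assumptions rather than anything prescribed by the lecture.

```python
from typing import Callable, Dict, List

def bn_score(parents: Dict[str, List[str]],
             fam_score: Callable[[str, List[str], object], float],
             data: object) -> float:
    """Decomposable score: Score(G : D) = sum_i FamScore(X_i | Pa_{X_i} : D).
    Because the total is a sum of per-family terms, changing one node's parent
    set only requires recomputing that node's term."""
    return sum(fam_score(x, pa, data) for x, pa in parents.items())
```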

Structure learning for general graphs
In a tree, a node has at most one parent.
Theorem: the problem of learning a BN structure with at most d parents is NP-hard for any (fixed) d ≥ 2.
Most structure learning approaches use heuristics:
- exploit score decomposition
- (quickly) describe two heuristics that exploit decomposition in different ways

Understanding score decomposition
(Figure: example BN over Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, Happy.)

Fixed variable order 1
Pick a variable order, e.g., X_1, …, X_n.
X_i can only pick parents from {X_1, …, X_{i-1}}:
- any subset
- acyclicity guaranteed!
Total score = sum of each node's family score.

Fixed variable order 2
Fix the maximum number of parents to k. For each i in the order:
- pick Pa_{X_i} ⊆ {X_1, …, X_{i-1}} by exhaustively searching through all possible subsets
- Pa_{X_i} = argmax over U ⊆ {X_1, …, X_{i-1}} (of size at most k) of FamScore(X_i | U : D)
Optimal BN for each order!
Greedy search through the space of orders:
- e.g., try switching pairs of variables in the order
- if neighboring variables in the order are switched, only the scores for this pair need to be recomputed: an O(n) speedup per iteration
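A minimal sketch of the fixed-order step, assuming a decomposable fam_score callable and an in-memory data object (both placeholders, not the course's code):

```python
from itertools import combinations

def best_parents_for_order(order, k, fam_score, data):
    """For a fixed variable order, pick each node's best parent set of size <= k
    from its predecessors; acyclicity is guaranteed by construction."""
    parents = {}
    for i, x in enumerate(order):
        predecessors = order[:i]
        candidates = (set(c) for r in range(min(k, i) + 1)
                      for c in combinations(predecessors, r))
        parents[x] = max(candidates, key=lambda u: fam_score(x, u, data))
    return parents
```

Wrapping this in a loop that proposes swaps of adjacent variables in the order, and keeps a swap whenever the total score improves, gives the greedy order search described above.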

Learn BN structure using local search
Starting point: e.g., the Chow-Liu tree.
Local search, possible moves (only if the graph stays acyclic!):
- add edge
- delete edge
- reverse edge
Select moves using your favorite score.

Exploit score decomposition in local search
- Add edge and delete edge: only rescore one family!
- Reverse edge: rescore only two families.
(Figure: same example BN over Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, Happy.)
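Putting the last two slides together, here is a hedged hill-climbing sketch; fam_score, data, and the empty-graph starting point are illustrative assumptions (the lecture suggests starting from the Chow-Liu tree), and only add/delete moves are shown, since a reversal can be expressed as a delete followed by an add.

```python
def hill_climb(variables, fam_score, data, max_parents=3, max_iters=200):
    """Greedy local search over BN structures with add/delete edge moves.
    Exploits score decomposition: each move changes a single family,
    so only that family's score is recomputed."""
    parents = {x: set() for x in variables}     # illustrative start; a Chow-Liu tree also works
    family = {x: fam_score(x, parents[x], data) for x in variables}

    def creates_cycle(u, v):
        # Adding u -> v creates a cycle iff v is already an ancestor of u.
        stack, seen = list(parents[u]), set()
        while stack:
            node = stack.pop()
            if node == v:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(parents[node])
        return False

    for _ in range(max_iters):
        best_gain, best_move = 0.0, None
        for v in variables:
            for u in variables:
                if u == v:
                    continue
                if u in parents[v]:                # candidate: delete edge u -> v
                    new_set = parents[v] - {u}
                elif len(parents[v]) < max_parents and not creates_cycle(u, v):
                    new_set = parents[v] | {u}     # candidate: add edge u -> v
                else:
                    continue
                gain = fam_score(v, new_set, data) - family[v]
                if gain > best_gain:
                    best_gain, best_move = gain, (v, new_set)
        if best_move is None:                      # local optimum: no improving move left
            break
        v, new_set = best_move
        parents[v], family[v] = new_set, fam_score(v, new_set, data)
    return parents
```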

Some experiments
(Figure: structure learning results on the Alarm network.)

Order search versus graph search
Order search advantages:
- for a fixed order, the BN is optimal: a more "global" optimization
- the space of orders is much smaller than the space of graphs
Graph search advantages:
- not restricted to k parents, especially if exploiting CPD structure such as CSI
- cheaper per iteration
- finer moves within a graph

Bayesian model averaging
So far, we have selected a single structure. But if you are really Bayesian, you must average over structures (similar to averaging over parameters).
Inference for structure averaging is very hard! Clever tricks in the reading.

What you need to know about learning BN structures
Decomposable scores:
- data likelihood
- information-theoretic interpretation
- Bayesian
- BIC approximation
Priors:
- structure and parameter assumptions
- BDe if and only if score equivalence
Best tree (Chow-Liu); best TAN; nearly best k-treewidth (in O(N^{k+1}))
Search techniques:
- search through orders
- search through structures
Bayesian model averaging

Inference in graphical models: Typical queries 1
(Figure: BN over Flu, Allergy, Sinus, Headache, Nose.)
Conditional probabilities: the distribution of some variable(s) given evidence.

Inference in graphical models: Typical queries 2 – Maximization
(Figure: same BN over Flu, Allergy, Sinus, Headache, Nose.)
Most probable explanation (MPE): the most likely assignment to all hidden variables given the evidence.
Maximum a posteriori (MAP): the most likely assignment to some variable(s) given the evidence.

Are MPE and MAP consistent?
(Figure: two-node BN Sinus → Nose, with P(S=t) = 0.4, P(S=f) = 0.6, and a CPT P(N|S) to be filled in.)
Most probable explanation (MPE): most likely assignment to all hidden variables given the evidence.
Maximum a posteriori (MAP): most likely assignment to some variable(s) given the evidence.
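The slide leaves P(N|S) blank. As one hedged worked example with assumed values P(N=t|S=t) = 0.8 and P(N=t|S=f) = 0.5 (not taken from the slide), and with nothing observed:

```latex
\begin{align*}
P(S{=}t, N{=}t) &= 0.4 \cdot 0.8 = 0.32, & P(S{=}t, N{=}f) &= 0.4 \cdot 0.2 = 0.08,\\
P(S{=}f, N{=}t) &= 0.6 \cdot 0.5 = 0.30, & P(S{=}f, N{=}f) &= 0.6 \cdot 0.5 = 0.30.
\end{align*}
```

The MPE over (S, N) is (S=t, N=t) with probability 0.32, yet the MAP query over S alone prefers S=f (0.6 versus 0.4): the two answers assign different values to S, so MPE and MAP need not be consistent.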

C++ Library
Now available, join:
The library implements the following functionality:
- random variables, random processes, and linear algebra
- factorized distributions, such as Gaussians, multinomial distributions, and mixtures
- graph structures and basic graph algorithms
- graphical models, including Bayesian networks, Markov networks, and junction trees
- basic static and dynamic inference algorithms
- parameter learning for Gaussian distributions, Chow-Liu
Fairly advanced C++ (not for everyone).

Complexity of conditional probability queries 1
How hard is it to compute P(X | E=e)?
Reduction from 3-SAT: encode each clause as a deterministic OR of its literals and let X be the deterministic AND of the clause nodes; then P(X = 1) > 0 if and only if the formula is satisfiable.

Complexity of conditional probability queries 2
How hard is it to compute P(X | E=e)? At least NP-hard, but even harder!

Inference is #P-complete. Hopeless?
Exploit structure! Inference is hard in general, but easy for many (real-world relevant) BN structures.

Complexity for other inference questions
Probabilistic inference:
- general graphs: #P-complete
- poly-trees and low tree-width: polynomial time (exponential only in the tree-width)
Approximate probabilistic inference:
- absolute error: NP-hard
- relative error: NP-hard
Most probable explanation (MPE):
- general graphs: NP-complete
- poly-trees and low tree-width: polynomial time
Maximum a posteriori (MAP):
- general graphs: even harder (NP^PP-complete)
- poly-trees and low tree-width: NP-hard even here

Inference in BNs: hopeless?
In general, yes! Even approximate inference.
In practice:
- exploit structure
- many effective approximation algorithms (some with guarantees)
For now, we'll talk about exact inference; approximate inference comes later this semester.

General probabilistic inference
(Figure: BN over Flu, Allergy, Sinus, Headache, Nose.)
Query: P(X | e)
Using the definition of conditional probability: P(X | e) = P(X, e) / P(e)
Normalization: P(e) = Σ_x P(X = x, e), so P(X | e) ∝ P(X, e)

Marginalization
(Figure: sub-network over Flu, Sinus, and Nose = t.)

Probabilistic inference example
(Figure: BN over Flu, Allergy, Sinus, Headache, with evidence Nose = t.)
Inference seems exponential in the number of variables!
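To see where the apparent blow-up comes from, assume the usual factorization for this example network (Flu and Allergy are parents of Sinus, which is the parent of Headache and Nose); the query then expands to a sum over every joint assignment of the hidden variables:

```latex
P(F \mid n{=}t) \;\propto\; P(F, n{=}t)
  \;=\; \sum_{a} \sum_{s} \sum_{h} P(F)\,P(a)\,P(s \mid F, a)\,P(h \mid s)\,P(n{=}t \mid s)
```

Written this way, the number of terms grows exponentially with the number of summed-out variables.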

Fast probabilistic inference example – Variable elimination
(Figure: same BN over Flu, Allergy, Sinus, Headache, with evidence Nose = t.)
(Potential for) exponential reduction in computation!
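A minimal variable-elimination sketch over binary variables, offered as an illustration rather than as the course's code; factors are plain tables, evidence is absorbed by building reduced factors, and whatever elimination order is supplied is used as given.

```python
from itertools import product

class Factor:
    """A factor over named binary variables; table maps assignment tuples to values."""
    def __init__(self, scope, table):
        self.scope = tuple(scope)
        self.table = table

    def __mul__(self, other):
        scope = tuple(dict.fromkeys(self.scope + other.scope))   # ordered union of scopes
        table = {}
        for assign in product([0, 1], repeat=len(scope)):
            a = dict(zip(scope, assign))
            table[assign] = (self.table[tuple(a[v] for v in self.scope)]
                             * other.table[tuple(a[v] for v in other.scope)])
        return Factor(scope, table)

    def marginalize(self, var):
        # Sum out `var`, producing a factor over the remaining variables.
        keep = tuple(v for v in self.scope if v != var)
        table = {}
        for assign, val in self.table.items():
            a = dict(zip(self.scope, assign))
            key = tuple(a[v] for v in keep)
            table[key] = table.get(key, 0.0) + val
        return Factor(keep, table)

def variable_elimination(factors, elim_order):
    """Eliminate variables one at a time: multiply the factors that mention the
    variable, sum it out, and put the resulting factor back on the list."""
    factors = list(factors)
    for var in elim_order:
        touched = [f for f in factors if var in f.scope]
        if not touched:
            continue
        rest = [f for f in factors if var not in f.scope]
        prod = touched[0]
        for f in touched[1:]:
            prod = prod * f
        factors = rest + [prod.marginalize(var)]
    result = factors[0]
    for f in factors[1:]:
        result = result * f
    return result
```

For the example query one would build one factor per CPD (with P(N|S) reduced to a factor over S alone by the evidence N = t), eliminate the hidden variables, and normalize the final factor over Flu.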

Understanding variable elimination – Exploiting distributivity
(Figure: chain Flu → Sinus → Nose = t.)
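Assuming the chain factorization suggested by the figure, distributivity lets the factor that does not mention the summed variable be pulled outside the sum:

```latex
P(F, n{=}t) \;=\; \sum_{s} P(F)\, P(s \mid F)\, P(n{=}t \mid s)
            \;=\; P(F) \sum_{s} P(s \mid F)\, P(n{=}t \mid s)
```

The inner sum is computed once per value of F, instead of re-multiplying P(F) into every term of the double product.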

Understanding variable elimination – Order can make a HUGE difference
(Figure: BN over Flu, Allergy, Sinus, Headache, with evidence Nose = t.)
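As one hedged illustration on this same network: eliminating Sinus first ties Flu, Allergy, and Headache together in a single intermediate factor, whereas eliminating Headache first produces a trivial factor and keeps everything small:

```latex
g_1(F, A, H) \;=\; \sum_{s} P(s \mid F, A)\, P(H \mid s)\, P(n{=}t \mid s)
\qquad\text{versus}\qquad
\sum_{h} P(h \mid S) \;=\; 1
```

In general the cost of elimination is exponential in the number of variables appearing in the largest intermediate factor, and that number depends heavily on the chosen order.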

Understanding variable elimination – Intermediate results
(Figure: BN over Flu, Allergy, Sinus, Headache, with evidence Nose = t.)
Intermediate results are probability distributions.

Understanding variable elimination – Another example
(Figure: BN over Pharmacy, Sinus, Headache, with evidence Nose = t.)

Pruning irrelevant variables
(Figure: BN over Flu, Allergy, Sinus, Headache, with evidence Nose = t.)
Prune all non-ancestors of the query and evidence variables.
More generally: prune all nodes not on an active trail between the evidence and query variables.
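A minimal sketch of the first rule (ancestor-based, or "barren node", pruning); the d-separation-based rule would be a further refinement, and the dict-of-parents graph encoding is an assumption for illustration.

```python
def prune_non_ancestors(parents, query, evidence):
    """Keep only the query/evidence variables and their ancestors; the remaining
    (barren) nodes cannot affect P(query | evidence).
    `parents` maps each node to the set of its parents."""
    relevant, stack = set(), list(query) + list(evidence)
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents[node])
    return {x: parents[x] for x in relevant}
```

On the example network, querying Flu with evidence Nose = t prunes Headache, since it is not an ancestor of either the query or the evidence.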