Variable Elimination 2 Clique Trees

Slides:

Advertisements

Similar presentations

1 Undirected Graphical Models Graphical Models – Carlos Guestrin Carnegie Mellon University October 29 th, 2008 Readings: K&F: 4.1, 4.2, 4.3, 4.4,

Advertisements

Exact Inference. Inference Basic task for inference: – Compute a posterior distribution for some query variables given some observed evidence – Sum out.

Variational Inference Amr Ahmed Nov. 6 th Outline Approximate Inference Variational inference formulation – Mean Field Examples – Structured VI.

Variational Methods for Graphical Models Micheal I. Jordan Zoubin Ghahramani Tommi S. Jaakkola Lawrence K. Saul Presented by: Afsaneh Shirazi.

CS498-EA Reasoning in AI Lecture #15 Instructor: Eyal Amir Fall Semester 2011.

Lauritzen-Spiegelhalter Algorithm

Statistical Methods in AI/ML Bucket elimination Vibhav Gogate.

Exact Inference in Bayes Nets

Junction Trees And Belief Propagation. Junction Trees: Motivation What if we want to compute all marginals, not just one? Doing variable elimination for.

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering A Comparison of Lauritzen- Spiegelhalter, Hugin, and Shenoy- Shafer Architectures.

Dynamic Bayesian Networks (DBNs)

Clique Trees Amr Ahmed October 23, Outline Clique Trees Representation Factorization Inference Relation with VE.

Overview of Inference Algorithms for Bayesian Networks Wei Sun, PhD Assistant Research Professor SEOR Dept. & C4I Center George Mason University, 2009.

Junction Trees: Motivation Standard algorithms (e.g., variable elimination) are inefficient if the undirected graph underlying the Bayes Net contains cycles.

Junction tree Algorithm :Probabilistic Graphical Models Recitation: 10/04/07 Ramesh Nallapati.

From Variable Elimination to Junction Trees

Machine Learning CUNY Graduate Center Lecture 6: Junction Tree Algorithm.

GS 540 week 6. HMM basics Given a sequence, and state parameters: – Each possible path through the states has a certain probability of emitting the sequence.

Recent Development on Elimination Ordering Group 1.

Global Approximate Inference Eran Segal Weizmann Institute.

Belief Propagation, Junction Trees, and Factor Graphs

Exact Inference: Clique Trees

PGM 2002/03 Tirgul5 Clique/Junction Tree Inference.

Context-specific independence Graphical Models – Carlos Guestrin Carnegie Mellon University October 16 th, 2006 Readings: K&F: 4.1, 4.2, 4.3, 4.4,

Undirected Models: Markov Networks David Page, Fall 2009 CS 731: Advanced Methods in Artificial Intelligence, with Biomedical Applications.

1 Variable Elimination Graphical Models – Carlos Guestrin Carnegie Mellon University October 11 th, 2006 Readings: K&F: 8.1, 8.2, 8.3,

Daphne Koller Variable Elimination Graph-Based Perspective Probabilistic Graphical Models Inference.

Readings: K&F: 11.3, 11.5 Yedidia et al. paper from the class website

1 BN Semantics 1 Graphical Models – Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, –  Carlos.

1 Mean Field and Variational Methods finishing off Graphical Models – Carlos Guestrin Carnegie Mellon University November 5 th, 2008 Readings: K&F:

1 Parameter Learning 2 Structure Learning 1: The good Graphical Models – Carlos Guestrin Carnegie Mellon University September 27 th, 2006 Readings:

1 Use graphs and not pure logic Variables represented by nodes and dependencies by edges. Common in our language: “threads of thoughts”, “lines of reasoning”,

Exact Inference in Bayes Nets. Notation U: set of nodes in a graph X i : random variable associated with node i π i : parents of node i Joint probability:

1 Param. Learning (MLE) Structure Learning The Good Graphical Models – Carlos Guestrin Carnegie Mellon University October 1 st, 2008 Readings: K&F:

Christopher M. Bishop, Pattern Recognition and Machine Learning 1.

1 Learning P-maps Param. Learning Graphical Models – Carlos Guestrin Carnegie Mellon University September 24 th, 2008 Readings: K&F: 3.3, 3.4, 16.1,

1 BN Semantics 2 – Representation Theorem The revenge of d-separation Graphical Models – Carlos Guestrin Carnegie Mellon University September 17.

Today Graphical Models Representing conditional dependence graphically

Belief propagation with junction trees Presented by Mark Silberstein and Yaniv Hamo.

1 Structure Learning (The Good), The Bad, The Ugly Inference Graphical Models – Carlos Guestrin Carnegie Mellon University October 13 th, 2008 Readings:

1 Variable Elimination Graphical Models – Carlos Guestrin Carnegie Mellon University October 15 th, 2008 Readings: K&F: 8.1, 8.2, 8.3,

1 BN Semantics 1 Graphical Models – Carlos Guestrin Carnegie Mellon University September 15 th, 2006 Readings: K&F: 3.1, 3.2, 3.3.

1 BN Semantics 3 – Now it’s personal! Parameter Learning 1 Graphical Models – Carlos Guestrin Carnegie Mellon University September 22 nd, 2006 Readings:

Instructor: Eyal Amir Grad TAs: Wen Pu, Yonatan Bisk

Inference in Bayesian Networks

Exact Inference Continued

CSCI 5822 Probabilistic Models of Human and Machine Learning

Readings: K&F: 15.1, 15.2, 15.3, 15.4, 15.5 K&F: 7 (overview of inference) K&F: 8.1, 8.2 (Variable Elimination) Structure Learning in BNs 3: (the good,

Clique Tree & Independence

Exact Inference Eric Xing Lecture 11, August 14, 2010

Cluster Graph Properties

Parameter Learning 2 Structure Learning 1: The good

Readings: K&F: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7 Markov networks, Factor graphs, and an unified view Start approximate inference If we are lucky… Graphical.

Lecture 3: Exact Inference in GMs

Clique Tree Algorithm: Computation

Unifying Variational and GBP Learning Parameters of MNs EM for BNs

BN Semantics 3 – Now it’s personal! Parameter Learning 1

Junction Trees 3 Undirected Graphical Models

Readings: K&F: 11.3, 11.5 Yedidia et al. paper from the class website

Context-specific independence

Variable Elimination Graphical Models – Carlos Guestrin

Advanced Machine Learning

Mean Field and Variational Methods Loopy Belief Propagation

Generalized Belief Propagation

BN Semantics 2 – The revenge of d-separation

Presentation transcript:

Variable Elimination 2 Clique Trees Readings: K&F: 8.1, 8.2, 8.3, 8.7.1 K&F: 9.1, 9.2, 9.3, 9.4 Variable Elimination 2 Clique Trees Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University October 13th, 2006

Complexity of variable elimination – Graphs with loops Moralize graph: Connect parents into a clique and remove edge directions Difficulty SAT Grade Happy Job Coherence Letter Intelligence Connect nodes that appear together in an initial factor 10-708 – Carlos Guestrin 2006

Induced graph Elimination order: {C,D,S,I,L,H,J,G} The induced graph IFÁ for elimination order Á has an edge Xi – Xj if Xi and Xj appear together in a factor generated by VE for elimination order Á on factors F Elimination order: {C,D,S,I,L,H,J,G} Difficulty SAT Grade Happy Job Coherence Letter Intelligence 10-708 – Carlos Guestrin 2006

Induced graph and complexity of VE Read complexity from cliques in induced graph Structure of induced graph encodes complexity of VE!!! Theorem: Every factor generated by VE subset of a maximal clique in IFÁ For every maximal clique in IFÁ corresponds to a factor generated by VE Induced width (or treewidth) Size of largest clique in IFÁ minus 1 Minimal induced width – induced width of best order Á Coherence Difficulty Intelligence Grade SAT Letter Job Happy Elimination order: {C,D,I,S,L,H,J,G} 10-708 – Carlos Guestrin 2006

Example: Large induced-width with small number of parents Compact representation  Easy inference  10-708 – Carlos Guestrin 2006

Finding optimal elimination order Theorem: Finding best elimination order is NP-complete: Decision problem: Given a graph, determine if there exists an elimination order that achieves induced width · K Interpretation: Hardness of finding elimination order in addition to hardness of inference Actually, can find elimination order in time exponential in size of largest clique – same complexity as inference Coherence Difficulty Intelligence Grade SAT Letter Job Happy Elimination order: {C,D,I,S,L,H,J,G} 10-708 – Carlos Guestrin 2006

Induced graphs and chordal graphs Every cycle X1 – X2 – … – Xk – X1 with k ¸ 3 has a chord Edge Xi – Xj for non-consecutive i & j Theorem: Every induced graph is chordal “Optimal” elimination order easily obtained for chordal graph Difficulty SAT Grade Happy Job Coherence Letter Intelligence 10-708 – Carlos Guestrin 2006

Chordal graphs and triangulation Triangulation: turning graph into chordal graph Max Cardinality Search: Simple heuristic Initialize unobserved nodes X as unmarked For k = |X| to 1 X Ã unmarked var with most marked neighbors Á(X) Ã k Mark X Theorem: Obtains optimal order for chordal graphs Often, not so good in other graphs! B E D H G A F C 10-708 – Carlos Guestrin 2006

Minimum fill/size/weight heuristics Many more effective heuristics see reading Min (weighted) fill heuristic Often very effective Initialize unobserved nodes X as unmarked For k = 1 to |X| X Ã unmarked var whose elimination adds fewest edges Á(X) Ã k Mark X Add fill edges introduced by eliminating X Weighted version: Consider size of factor rather than number of edges B E D H G A F C 10-708 – Carlos Guestrin 2006

Choosing an elimination order Choosing best order is NP-complete Reduction from MAX-Clique Many good heuristics (some with guarantees) Ultimately, can’t beat NP-hardness of inference Even optimal order can lead to exponential variable elimination computation In practice Variable elimination often very effective Many (many many) approximate inference approaches available when variable elimination too expensive Most approximate inference approaches build on ideas from variable elimination 10-708 – Carlos Guestrin 2006

Announcements Recitation on advanced topic: Carlos on Context-Specific Independence On Monday Oct 16, 5:30-7:00pm in Wean Hall 4615A 10-708 – Carlos Guestrin 2006

Most likely explanation (MLE) Flu Allergy Sinus Headache Nose Query: Using defn of conditional probs: Normalization irrelevant: 10-708 – Carlos Guestrin 2006

Max-marginalization Flu Sinus Nose=t 10-708 – Carlos Guestrin 2006

Example of variable elimination for MLE – Forward pass Flu Allergy Sinus Headache Nose=t 10-708 – Carlos Guestrin 2006

Example of variable elimination for MLE – Backward pass Flu Allergy Sinus Headache Nose=t 10-708 – Carlos Guestrin 2006

MLE Variable elimination algorithm – Forward pass Given a BN and a MLE query maxx1,…,xnP(x1,…,xn,e) Instantiate evidence E=e Choose an ordering on variables, e.g., X1, …, Xn For i = 1 to n, If XiE Collect factors f1,…,fk that include Xi Generate a new factor by eliminating Xi from these factors Variable Xi has been eliminated! 10-708 – Carlos Guestrin 2006

MLE Variable elimination algorithm – Backward pass {x1*,…, xn*} will store maximizing assignment For i = n to 1, If Xi  E Take factors f1,…,fk used when Xi was eliminated Instantiate f1,…,fk, with {xi+1*,…, xn*} Now each fj depends only on Xi Generate maximizing assignment for Xi: 10-708 – Carlos Guestrin 2006

What you need to know about VE Variable elimination algorithm Eliminate a variable: Combine factors that include this var into single factor Marginalize var from new factor Cliques in induced graph correspond to factors generated by algorithm Efficient algorithm (“only” exponential in induced-width, not number of variables) If you hear: “Exact inference only efficient in tree graphical models” You say: “No!!! Any graph with low induced width” And then you say: “And even some with very large induced-width” (special recitation) Elimination order is important! NP-complete problem Many good heuristics Variable elimination for MLE Only difference between probabilistic inference and MLE is “sum” versus “max” 10-708 – Carlos Guestrin 2006

What if I want to compute P(Xi|x0,xn+1) for each i? Variable elimination for each i? Variable elimination for every i, what’s the complexity? 10-708 – Carlos Guestrin 2006

Reusing computation Compute: X0 X5 X3 X4 X2 X1 10-708 – Carlos Guestrin 2006

Cluster graph Cluster graph: For set of factors F Undirected graph Each node i associated with a cluster Ci Family preserving: for each factor fj 2 F, 9 node i such that scope[fi]µ Ci Each edge i – j is associated with a separator Sij = Ci Å Cj DIG JSL GJSL HGJ CD GSI D S G H J C L I 10-708 – Carlos Guestrin 2006

Factors generated by VE Difficulty SAT Grade Happy Job Coherence Letter Intelligence Elimination order: {C,D,I,S,L,H,J,G} 10-708 – Carlos Guestrin 2006

Cluster graph for VE VE generates cluster tree! Proposition: One clique for each factor used/generated Edge i – j, if fi used to generate fj “Message” from i to j generated when marginalizing a variable from fi Tree because factors only used once Proposition: “Message” dij from i to j Scope[dij] µ Sij DIG JSL GJSL HGJ CD GSI 10-708 – Carlos Guestrin 2006

Running intersection property Running intersection property (RIP) Cluster tree satisfies RIP if whenever X2 Ci and X2 Cj then X is in every cluster in the (unique) path from Ci to Cj Theorem: Cluster tree generated by VE satisfies RIP DIG JSL GJSL HGJ CD GSI 10-708 – Carlos Guestrin 2006

Constructing a clique tree from VE Select elimination order Á Connect factors that would be generated if you run VE with order Á Simplify! Eliminate factor that is subset of neighbor 10-708 – Carlos Guestrin 2006

Find clique tree from chordal graph Triangulate moralized graph to obtain chordal graph Find maximal cliques NP-complete in general Easy for chordal graphs Max-cardinality search Maximum spanning tree finds clique tree satisfying RIP!!! Generate weighted graph over cliques Edge weights (i,j) is separator size – |CiÅCj| Difficulty Grade Happy Job Coherence Letter Intelligence SAT 10-708 – Carlos Guestrin 2006

Clique tree & Independencies Clique tree (or Junction tree) A cluster tree that satisfies the RIP Theorem: Given some BN with structure G and factors F For a clique tree T for F consider Ci – Cj with separator Sij: X – any set of vars in Ci side of the tree Y – any set of vars in Ci side of the tree Then, (X  Y | Sij) in BN Furthermore, I(T) µ I(G) DIG JSL GJSL HGJ CD GSI 10-708 – Carlos Guestrin 2006

Variable elimination in a clique tree 1 C2: DIG C4: GJSL C5: HGJ C1: CD C3: GSI D S G H J C L I Clique tree for a BN Each CPT assigned to a clique Initial potential 0(Ci) is product of CPTs 10-708 – Carlos Guestrin 2006

Variable elimination in a clique tree 2 C2: DIG C4: GJSL C5: HGJ C1: CD C3: GSI VE in clique tree to compute P(Xi) Pick a root (any node containing Xi) Send messages recursively from leaves to root Multiply incoming messages with initial potential Marginalize vars that are not in separator Clique ready if received messages from all neighbors 10-708 – Carlos Guestrin 2006

Belief from message Theorem: When clique Ci is ready Received messages from all neighbors Belief i(Ci) is product of initial factor with messages: 10-708 – Carlos Guestrin 2006

Choice of root Message does not depend on root!!! Root: node 5 Root: node 3 “Cache” computation: Obtain belief for all roots in linear time!! 10-708 – Carlos Guestrin 2006

Shafer-Shenoy Algorithm (a.k.a. VE in clique tree for all roots) Clique Ci ready to transmit to neighbor Cj if received messages from all neighbors but j Leaves are always ready to transmit While 9 Ci ready to transmit to Cj Send message i! j Complexity: Linear in # cliques One message sent each direction in each edge Corollary: At convergence Every clique has correct belief C2 C3 C1 C4 C5 C6 C7 10-708 – Carlos Guestrin 2006

Calibrated Clique tree Initially, neighboring nodes don’t agree on “distribution” over separators Calibrated clique tree: At convergence, tree is calibrated Neighboring nodes agree on distribution over separator 10-708 – Carlos Guestrin 2006

Answering queries with clique trees Query within clique Incremental updates – Observing evidence Z=z Multiply some clique by indicator 1(Z=z) Query outside clique Use variable elimination! 10-708 – Carlos Guestrin 2006

Message passing with division C2: DIG C4: GJSL C5: HGJ C1: CD C3: GSI Computing messages by multiplication: Computing messages by division: 10-708 – Carlos Guestrin 2006

Lauritzen-Spiegelhalter Algorithm (a.k.a. belief propagation) Simplified description see reading for details Initialize all separator potentials to 1 mij Ã 1 All messages ready to transmit While 9 i! j ready to transmit mij’ Ã If mij’  mij di!j Ã j Ã j £ i!j ij Ã ij’ 8 neighbors k of j, k i, j!k ready to transmit Complexity: Linear in # cliques for the “right” schedule over edges (leaves to root, then root to leaves) Corollary: At convergence, every clique has correct belief C2 C4 C5 C1 C3 C7 C6 10-708 – Carlos Guestrin 2006

VE versus BP in clique trees VE messages (the one that multiplies) BP messages (the one that divides) 10-708 – Carlos Guestrin 2006

Clique tree invariant Clique tree potential: Clique tree invariant: Product of clique potentials divided by separators potentials Clique tree invariant: P(X) = T (X) 10-708 – Carlos Guestrin 2006

Belief propagation and clique tree invariant Theorem: Invariant is maintained by BP algorithm! BP reparameterizes clique potentials and separator potentials At convergence, potentials and messages are marginal distributions 10-708 – Carlos Guestrin 2006

Subtree correctness Informed message from i to j, if all messages into i (other than from j) are informed Recursive definition (leaves always send informed messages) Informed subtree: All incoming messages informed Theorem: Potential of connected informed subtree T’ is marginal over scope[T’] Corollary: At convergence, clique tree is calibrated i = P(scope[i]) ij = P(scope[ij]) 10-708 – Carlos Guestrin 2006

Clique trees versus VE Clique tree advantages Multi-query settings Incremental updates Pre-computation makes complexity explicit Clique tree disadvantages Space requirements – no factors are “deleted” Slower for single query Local structure in factors may be lost when they are multiplied together into initial clique potential 10-708 – Carlos Guestrin 2006

Clique tree summary Solve marginal queries for all variables in only twice the cost of query for one variable Cliques correspond to maximal cliques in induced graph Two message passing approaches VE (the one that multiplies messages) BP (the one that divides by old message) Clique tree invariant Clique tree potential is always the same We are only reparameterizing clique potentials Constructing clique tree for a BN from elimination order from triangulated (chordal) graph Running time (only) exponential in size of largest clique Solve exactly problems with thousands (or millions, or more) of variables, and cliques with tens of nodes (or less) 10-708 – Carlos Guestrin 2006