PGM 2003/04 Tirgul6 Clique/Junction Tree Inference


Undirected graph representation At each stage of the elimination procedure we have an algebraic term that we need to evaluate. In general this term is of the form ∑ ∏i fi(Zi), where the Zi are sets of variables. We now draw a graph with an undirected edge X--Y if X and Y are arguments of some common factor, that is, if X and Y are both in some Zi. Note: this is the Markov network that describes the probability distribution over the variables we have not yet eliminated.
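The edge rule can be sketched directly from the factor scopes. This is an illustrative Python helper (the function name is made up); the scopes listed are those of the Asia example used below.

```python
from itertools import combinations

def markov_edges(factor_scopes):
    """Undirected edge X--Y whenever X and Y are arguments of a common
    factor, i.e. both belong to some scope Z_i."""
    edges = set()
    for scope in factor_scopes:
        for x, y in combinations(sorted(scope), 2):
            edges.add((x, y))
    return edges

# Scopes of the initial Asia factors; the resulting edges are exactly
# the edges of the moralized graph
scopes = [{"V", "T"}, {"S", "L"}, {"S", "B"},
          {"T", "L", "A"}, {"A", "X"}, {"A", "B", "D"}]
print(len(markov_edges(scopes)))  # prints 10
```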

Undirected Graph Representation Consider the "Asia" example. The initial factors are the network's CPDs: P(V), P(S), P(T|V), P(L|S), P(B|S), P(A|T,L), P(X|A), P(D|A,B). Thus the undirected graph connects exactly those pairs of variables that share a factor; in this case this graph is just the moralized graph. [Figure: the graph over V, S, L, T, A, B, X, D before and after moralization]

Undirected Graph Representation Now we eliminate T, getting a new factor over T's neighbours (V, A, and L). The corresponding change in the graph: T and its incident edges are removed, and T's neighbours are connected. [Figure: the graph over V, S, L, A, B, X, D after eliminating T]

Example Want to compute P(L, V = t, S = f, D = t). Step so far: moralizing. [Figure: the moralized graph over V, S, D, T, L, A, B, X]

Example Want to compute P(L, V = t, S = f, D = t). Steps so far: moralizing, setting evidence. [Figure: the graph after instantiating V, S, D]

Example Want to compute P(L, V = t, S = f, D = t). Steps so far: moralizing, setting evidence, eliminating x, which gives the new factor fx(A). [Figure: the graph after eliminating X]

Example Want to compute P(L, V = t, S = f, D = t). Steps so far: moralizing, setting evidence, eliminating x, eliminating a, which gives the new factor fa(b,t,l). [Figure: the graph after eliminating A]

Example Want to compute P(L, V = t, S = f, D = t). Steps so far: …, eliminating b, which gives the new factor fb(t,l). [Figure: the graph after eliminating B]

Example Want to compute P(L, V = t, S = f, D = t). Steps so far: …, eliminating t, which gives the new factor ft(l). [Figure: the graph after eliminating T]

Elimination in Undirected Graphs Generalizing, we can eliminate a variable X as follows: 1. For all Y, Z such that Y--X and Z--X, add an edge Y--Z. 2. Remove X and all edges adjacent to it. This procedure creates a clique that contains all the neighbours of X. After step 1 we have a clique that corresponds to the intermediate factor (before marginalization). The cost of the step is exponential in the size of this clique.
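The two-step rule can be sketched with adjacency stored as a dict of neighbour sets (names here are illustrative, not from the slides):

```python
def eliminate(adj, x):
    """Eliminate X from an undirected graph:
    1. connect every pair of X's neighbours,
    2. remove X and its incident edges.
    Returns the clique {X} ∪ neighbours(X) created by step 1."""
    nbrs = adj[x]
    for y in nbrs:
        adj[y] |= (nbrs - {y})   # step 1: add edge Y--Z for every neighbour pair
        adj[y].discard(x)        # step 2: drop the edge to X
    del adj[x]
    return nbrs | {x}

# Eliminating A from the path B--A--C adds the fill-in edge B--C
adj = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
clique = eliminate(adj, "A")
print(clique)  # {'A', 'B', 'C'} -- the clique whose size drives the cost
```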

Undirected Graphs The process of eliminating nodes from an undirected graph gives us a clue to the complexity of inference To see this, we will examine the graph that contains all of the edges we added during the elimination

Example Want to compute P(D). Step so far: moralizing. [Figure: the moralized graph over V, S, D, T, L, A, B, X]

Example Want to compute P(D). Steps so far: moralizing, eliminating v. Multiply to get f'v(v,t); the result is fv(t). [Figure: the graph after eliminating V]

Example Want to compute P(D). Steps so far: moralizing, eliminating v, eliminating x. Multiply to get f'x(a,x); the result is fx(a). [Figure: the graph after eliminating X]

Example Want to compute P(D). Steps so far: …, eliminating s. Multiply to get f's(l,b,s); the result is fs(l,b). [Figure: the graph after eliminating S]

Example Want to compute P(D). Steps so far: …, eliminating t. Multiply to get f't(a,l,t); the result is ft(a,l). [Figure: the graph after eliminating T]

Example Want to compute P(D). Steps so far: …, eliminating l. Multiply to get f'l(a,b,l); the result is fl(a,b). [Figure: the graph after eliminating L]

Example Want to compute P(D). Steps so far: …, eliminating a and b. Multiply to get f'a(a,b,d); the result is f(d). [Figure: the graph after eliminating A and B]

Expanded Graphs The resulting graph is the induced graph (for this particular ordering). Main property: Every maximal clique in the induced graph corresponds to an intermediate factor in the computation, and every factor stored during the process is a subset of some maximal clique in the graph. These facts hold for any variable elimination ordering on any network. [Figure: the induced graph over V, S, L, T, A, B, X, D, including all fill-in edges]

Induced Width The size of the largest clique in the induced graph is thus an indicator of the complexity of variable elimination. The induced width of a graph according to the specified ordering is one less than the size of that largest clique. Finding a good ordering for a graph is equivalent to finding an ordering that attains the minimal induced width of the graph.
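The induced width for a given ordering can be computed by replaying the elimination. A minimal sketch (width measured as largest clique size minus one, so that a chain eliminated end-to-end has width 1, matching the tree case below):

```python
def induced_width(adj, order):
    """Replay elimination in the given order; the induced width is one
    less than the size of the largest clique created along the way."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    width = 0
    for x in order:
        nbrs = adj.pop(x)
        width = max(width, len(nbrs))   # clique {x} ∪ nbrs has len(nbrs)+1 nodes
        for y in nbrs:
            adj[y] |= (nbrs - {y})      # fill-in edges
            adj[y].discard(x)
    return width

chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(induced_width(chain, ["A", "B", "C", "D"]))  # end-to-end order: width 1
print(induced_width(chain, ["B", "C", "A", "D"]))  # eliminating B first: width 2
```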

Consequence: Elimination on Trees Suppose we have a tree: a network where each variable has at most one parent. Then all the factors involve at most two variables, and thus the moralized graph is also a tree. [Figure: a directed tree over A, B, C, D, E, F, G and its moralized undirected version]

Elimination on Trees We can maintain the tree structure by always eliminating leaf (degree-1) variables of the tree. [Figure: the tree over A–G shrinking as leaves are eliminated]

Elimination on Trees Formally, for any tree there is an elimination ordering with induced width 1. Thm: Inference on trees is linear in the number of variables.
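The width-1 ordering just peels leaves, as a sketch (the function name and the choice to leave `root` last are illustrative):

```python
def leaf_elimination_order(adj, root):
    """On a tree, repeatedly eliminate a leaf (degree-1 node) other than
    `root`. Each eliminated node has exactly one remaining neighbour, so
    no fill-in edges are ever added and the induced width is 1."""
    adj = {v: set(n) for v, n in adj.items()}  # work on a copy
    order = []
    while len(adj) > 1:
        leaf = next(v for v in adj if v != root and len(adj[v]) == 1)
        (parent,) = adj.pop(leaf)     # a leaf has exactly one neighbour
        adj[parent].discard(leaf)
        order.append(leaf)
    return order + [root]

tree = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}
print(leaf_elimination_order(tree, "A"))  # prints ['B', 'D', 'C', 'A']
```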

PolyTrees A polytree is a network where there is at most one path from one variable to another. Thm: Inference in a polytree is linear in the representation size of the network. This assumes a tabular CPT representation. Can you see how the argument would work? [Figure: a polytree over A, B, C, D, E, F, G, H]

General Networks What do we do when the network is not a polytree? If the network has a cycle, the induced width for any ordering is greater than 1.

Example Eliminating in the order A, B, C, D, E, …. [Figure: the graph over A–H, showing the fill-in edges this ordering adds]

Example Eliminating in the order H, G, E, C, F, D, B, A. [Figure: the graph over A–H under this ordering]

General Networks From graph theory: Thm: Finding an ordering that minimizes the induced width is NP-hard. However: there are reasonable heuristics for finding a "relatively" good ordering; there are provable approximations to the best induced width; and if the graph has a small induced width, there are algorithms that find it in polynomial time.
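One such reasonable heuristic is greedy min-fill: always eliminate the node whose elimination would add the fewest fill-in edges. A sketch (a heuristic, not an optimal algorithm; ties are broken arbitrarily):

```python
from itertools import combinations

def min_fill_order(adj):
    """Greedy min-fill elimination ordering (heuristic, not optimal)."""
    adj = {v: set(n) for v, n in adj.items()}  # work on a copy
    order = []

    def fill_count(x):
        # number of fill-in edges that eliminating x would add
        return sum(1 for y, z in combinations(adj[x], 2) if z not in adj[y])

    while adj:
        x = min(adj, key=fill_count)
        for y, z in combinations(adj[x], 2):   # add the fill-in edges
            adj[y].add(z)
            adj[z].add(y)
        for y in adj[x]:
            adj[y].discard(x)
        del adj[x]
        order.append(x)
    return order

# 4-cycle A--B--C--D--A: the first elimination needs one fill-in edge,
# after which the remaining triangle eliminates with no fill at all
cycle = {"A": {"B", "D"}, "B": {"A", "C"},
         "C": {"B", "D"}, "D": {"A", "C"}}
print(min_fill_order(cycle))
```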

Chordal Graphs Recall: an elimination ordering induces an undirected chordal graph. Maximal cliques are factors in the elimination, and factors in the elimination are cliques in the graph. Complexity is exponential in the size of the largest clique in the graph. [Figure: the Asia graph over V, S, L, T, A, B, X, D and the chordal graph induced by elimination]

Cluster Trees Variable elimination induces a graph of clusters. Nodes in the graph are annotated by the variables in a factor. Clusters (circles) correspond to multiplication; separators (boxes) correspond to marginalization. [Figure: the Asia cluster tree with clusters T,V; A,L,T; B,L,S; A,L,B; X,A; A,B,D and separators T; A,L; B,L; A; A,B]

Properties of cluster trees The cluster graph must be a tree: only one path between any two clusters. A separator is labeled by the intersection of the labels of the two neighboring clusters. Running intersection property: all separators on the path between two clusters contain their intersection. [Figure: the Asia cluster tree with its separators]
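The running intersection property is equivalent to requiring that, for every variable, the clusters containing it form a connected subtree. A checking sketch (function and variable names are illustrative; the first example is the Asia cluster tree from these slides):

```python
def has_running_intersection(clusters, tree_edges):
    """Check that for every variable, the clusters containing it form a
    connected subtree (the running intersection property)."""
    adj = {c: set() for c in clusters}
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)
    for v in set().union(*clusters.values()):
        holding = {c for c, vars_ in clusters.items() if v in vars_}
        start = next(iter(holding))
        seen, stack = {start}, [start]
        while stack:                      # DFS restricted to clusters holding v
            for n in adj[stack.pop()]:
                if n in holding and n not in seen:
                    seen.add(n)
                    stack.append(n)
        if seen != holding:               # clusters with v are disconnected
            return False
    return True

# The Asia cluster tree from the slides (cluster A,L,B in the middle)
asia = {1: {"T", "V"}, 2: {"A", "L", "T"}, 3: {"B", "L", "S"},
        4: {"A", "L", "B"}, 5: {"X", "A"}, 6: {"A", "B", "D"}}
asia_edges = [(1, 2), (2, 4), (3, 4), (4, 5), (4, 6)]
print(has_running_intersection(asia, asia_edges))  # prints True

# A broken tree: "A" appears in clusters 1 and 3, but the connecting
# cluster 2 does not contain it
bad = {1: {"A", "B"}, 2: {"B", "C"}, 3: {"A", "C"}}
print(has_running_intersection(bad, [(1, 2), (2, 3)]))  # prints False
```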

Cluster Trees & Chordal Graphs Combining the two representations we get that: every maximal clique in the chordal graph is a cluster in the tree, and every separator in the tree is a separator in the chordal graph. [Figure: the Asia chordal graph side by side with its cluster tree]

Cluster Trees & Chordal Graphs Observation: If a cluster is not a maximal clique, then it must be adjacent to a cluster that is a superset of it. We might as well work with a cluster tree where each cluster is a maximal clique. [Figure: the Asia cluster tree after merging non-maximal clusters]

Cluster Trees & Chordal Graphs Thm: If G is a chordal graph, then it can be embedded in a tree of cliques such that: Every clique in G is a subset of at least one node in the tree The tree satisfies the running intersection property

Elimination in Chordal Graphs A separator S divides the remaining variables in the graph into two groups; the variables in each group appear on one "side" of the cluster tree. Examples: {A,B}: {L, S, T, V} & {D, X}; {A,L}: {T, V} & {B, D, S, X}; {B,L}: {S} & {A, D, T, V, X}; {A}: {X} & {B, D, L, S, T, V}; {T}: {V} & {A, B, D, L, S, X}. [Figure: the Asia cluster tree with its separators]

Elimination in Cluster Trees Let X and Y be the partition induced by S. Observation: Eliminating all variables in X results in a factor fX(S). Proof: Since S is a separator, only variables in S are adjacent to variables in X, so the scope of the resulting factor is contained in S. Note: the same factor results regardless of the elimination ordering within X. [Figure: the two sides of separator S, with factors fX(S) and fY(S) passing across it]

Recursive Elimination in Cluster Trees How do we compute fX(S)? By recursive decomposition along the cluster tree. Let X1 and X2 be the disjoint partitioning of X − C implied by the separators S1 and S2. Then: eliminate X1 to get fX1(S1); eliminate X2 to get fX2(S2); eliminate the variables in C − S to get fX(S). [Figure: subtrees X1 and X2 hanging off cluster C through separators S1 and S2, with separator S toward Y]

Elimination in Cluster Trees (or Belief Propagation revisited) Assume we have a cluster tree with separators S1, …, Sk. Each Si determines two sets of variables Xi and Yi such that Si ∪ Xi ∪ Yi = {X1, …, Xn}, and all paths from clusters containing variables in Xi to clusters containing variables in Yi pass through Si. We want to compute fXi(Si) and fYi(Si) for all i.

Elimination in Cluster Trees Idea: Each of these factors can be decomposed as an expression involving some of the others. Use dynamic programming to avoid recomputation of factors.

Example [Figure: the Asia cluster tree with clusters T,V; A,L,T; B,L,S; A,L,B; X,A; A,B,D and separators T; A,L; B,L; A; A,B]

Dynamic Programming We now have the tools to solve the multi-query problem. Step 1: Inward propagation. Pick a cluster C and compute all factors by eliminating from the fringes of the tree toward C. This computes all "inward" factors associated with the separators. [Figure: messages flowing inward toward C]

Dynamic Programming We now have the tools to solve the multi-query problem. Step 1: Inward propagation. Step 2: Outward propagation. Compute all factors on separators going outward from C to the fringes. [Figure: messages flowing outward from C]

Dynamic Programming We now have the tools to solve the multi-query problem. Step 1: Inward propagation. Step 2: Outward propagation. Step 3: Computing beliefs on clusters. To get the belief on a cluster C', multiply: the CPDs that involve only variables in C', and the factors on the separators adjacent to C' (using the proper direction). This simulates the result of eliminating all variables except those in C', using pre-computed factors. [Figure: cluster C' combining its incoming separator factors]
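The multiply-and-marginalize steps can be illustrated on a toy two-cluster tree with binary variables. All names, potentials, and numbers here are invented for illustration (this is not the Asia network): cluster {A,B} holds P(A)P(B|A), cluster {B,C} holds P(C|B), and the separator is {B}.

```python
from itertools import product

def multiply(f, g):
    """Pointwise product of factors. A factor is (vars, table), with the
    table keyed by tuples of 0/1 assignments to vars."""
    fv, ft = f
    gv, gt = g
    vs = tuple(dict.fromkeys(fv + gv))          # union of scopes, in order
    table = {}
    for asg in product((0, 1), repeat=len(vs)):
        a = dict(zip(vs, asg))
        table[asg] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return vs, table

def marginalize(f, keep):
    """Sum out every variable not in `keep`."""
    fv, ft = f
    vs = tuple(v for v in fv if v in keep)
    table = {}
    for asg, val in ft.items():
        a = dict(zip(fv, asg))
        key = tuple(a[v] for v in vs)
        table[key] = table.get(key, 0.0) + val
    return vs, table

# Cluster potentials: C1 = P(A)P(B|A) on {A,B}, C2 = P(C|B) on {B,C}
pA = {0: 0.6, 1: 0.4}
pBgA = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}  # keyed (a, b)
pCgB = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5}  # keyed (b, c)
C1 = (("A", "B"), {(a, b): pA[a] * pBgA[(a, b)]
                   for a in (0, 1) for b in (0, 1)})
C2 = (("B", "C"), pCgB)

# Inward message on separator {B}, then belief on cluster 2 and P(C)
msg = marginalize(C1, {"B"})    # f_X(S): eliminate A
belief2 = multiply(C2, msg)     # cluster potential times incoming message
pC = marginalize(belief2, {"C"})[1]
print(pC)  # approximately {(0,): 0.7, (1,): 0.3}
```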

Complexity Time complexity: each traversal of the tree costs the same as standard variable elimination, so the total computation cost is twice that of standard variable elimination. Space complexity: we need to store partial results, which requires two factors for each separator. Space requirements can be up to 2n more expensive than variable elimination.

The "Asia" network with evidence [Figure: the Asia network. Nodes: Visit to Asia (V), Smoking (S), Tuberculosis (T), Lung Cancer (L), Abnormality in Chest (A), Bronchitis (B), X-Ray (X), Dyspnea (D)] We want to compute P(L | D=t, V=t, S=f).

Initial factors with evidence We want to compute P(L | D=t, V=t, S=f). With the evidence V=t, S=f, D=t instantiated, the factors reduce to:
P(T|V=t): T=f: 0.95; T=t: 0.05
P(B|S=f): B=f: 0.7; B=t: 0.3
P(L|S=f): L=f: 0.99; L=t: 0.01
P(D=t|B,A): B=f, A=f: 0.1; B=t, A=f: 0.8; B=f, A=t: 0.7; B=t, A=t: 0.9

Initial factors with evidence (cont.)
P(A|L,T):
A=f: T=f, L=f: 1; T=t, L=f: 0; T=f, L=t: 0; T=t, L=t: 0
A=t: T=f, L=f: 0; T=t, L=f: 1; T=f, L=t: 1; T=t, L=t: 1
P(X|A): X=f | A=f: 0.95; X=t | A=f: 0.05; X=f | A=t: 0.02; X=t | A=t: 0.98

Step 1: Initial Clique Values Each factor is assigned to a clique containing its scope: CT = P(T|V); CB,L = P(L|S)P(B|S); CT,L,A = P(A|L,T); CX,A = P(X|A); CB,L,A = 1; CB,A = 1. "Dummy" separators: each separator is the intersection between adjacent nodes in the junction tree and helps in defining the inference messages (see below). [Figure: the junction tree over cliques T,V; A,L,T; B,L,S; A,L,B; X,A; A,B,D with separators T; A,L; B,L; A; A,B]

Step 2: Update from leaves T,V CT T ST=CT T,L,A B,L,S CT,L,A CB,L L,A B,L S  B,L=CB,L B,L,A X,A CB,L,A CX,A B,A A S  A=CX,A D,B,A CB,A

Step 3: Update (cont.) Messages continue toward the center: SL,A = Σ (CT,L,A × ST); SB,A = Σ (CB,A × SA). [Figure: the junction tree with inward messages on separators A,L and A,B]

Step 4: Update (cont.) Outward messages from the central clique B,L,A, each combining the clique potential with the messages arriving from the other directions: SL,A = Σ (CB,L,A × SB,L × SB,A); SB,L = Σ (CB,L,A × SL,A × SB,A); SB,A = Σ (CB,L,A × SL,A × SB,L). [Figure: the junction tree with both inward and outward messages on each separator]

Step 5: Update (cont.) The remaining outward messages: ST = Σ (CT,L,A × SL,A); SA = Σ (CB,A × SB,A). [Figure: the junction tree with all messages computed]

Step 6: Compute Query P(L | D=t, V=t, S=f) = Σ (CB,L × SB,L) = Σ (CB,L,A × SL,A × SB,L × SB,A) = … and normalize. [Figure: the junction tree with the factors used for the query highlighted]

How to avoid small numbers Normalize each message locally as it is computed, by constants N1, …, N5 and NBLA for the central clique. Then P(L | D=t, V=t, S=f) = Σ (CB,L × SB,L) = Σ (CB,L,A × SL,A × SB,L × SB,A) = … and normalize at the end (with N1 × N2 × N3 × N4 × N5 × NBLA). [Figure: the junction tree annotated with the normalization constant used for each message]

A Theorem about elimination order Triangulated graph: a graph that has no chordless cycle of length > 3. Simplicial node: a node that can be eliminated without adding any extra edge, i.e. all its neighbouring nodes are already connected (they form a complete subgraph). Eliminatable graph: a graph that has an elimination order requiring no added edges; every node is simplicial at its turn in that order. Thm: Every triangulated graph is eliminatable.
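The theorem suggests a direct test, sketched below: greedily eliminate simplicial nodes. On a triangulated graph this never gets stuck, while a chordless cycle leaves no simplicial node to pick (function names are illustrative):

```python
from itertools import combinations

def perfect_elimination_order(adj):
    """Greedily eliminate simplicial nodes (nodes whose neighbours form
    a complete subgraph). Returns an elimination order that needs no
    added edges, or None if the graph is not triangulated."""
    adj = {v: set(n) for v, n in adj.items()}  # work on a copy
    order = []
    while adj:
        simplicial = next(
            (v for v in adj
             if all(z in adj[y] for y, z in combinations(adj[v], 2))),
            None)
        if simplicial is None:
            return None                 # stuck: a chordless cycle remains
        for y in adj[simplicial]:
            adj[y].discard(simplicial)  # simplicial, so no fill-in needed
        del adj[simplicial]
        order.append(simplicial)
    return order

triangle = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
square = {"A": {"B", "D"}, "B": {"A", "C"},
          "C": {"B", "D"}, "D": {"A", "C"}}   # 4-cycle with no chord
print(perfect_elimination_order(triangle))   # prints ['A', 'B', 'C']
print(perfect_elimination_order(square))     # prints None
```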

Lemma: An incomplete triangulated graph G with node set N (at least 3 nodes) has a complete subset S which separates the graph: every path between the two parts of N \ S goes through S. Proof: Pick non-adjacent nodes A and B, and let S be a minimal set of nodes such that any path between A and B contains a node from S; S splits the rest of the graph into a side GA containing A and a side GB containing B. Assume C, D in S are not neighbours. Since S is minimal, there is a path from A to B meeting S only at C (and likewise one meeting S only at D). These paths give a path from C to D inside GA and another inside GB, which together form a cycle of length > 3; the chord C--D must break it, a contradiction. [Figure: A and B separated by S, with sides GA and GB]

Claim: Let G be a triangulated graph. G always has two simplicial nodes, and if the graph is not complete they can be chosen non-adjacent. Proof (by induction on the number of nodes): The claim is trivial for a complete graph and for a graph with 2 nodes. Let G have n nodes and let S, GA, GB be as in the lemma. If GA is complete, choose any simplicial node of it outside S. If not, by induction it has two non-adjacent simplicial nodes; choose one of the two outside S (they cannot both be in S, or they would be adjacent). The same can be done for GB, and the two chosen nodes are non-adjacent (separated by S). Wrapping up: Any graph with 2 nodes is triangulated and eliminatable, and the claim gives us more than the single simplicial node we need at each elimination step. * The full proof can be found in Jensen, Appendix A.