Clique Trees
Amr Ahmed
October 23, 2008

Outline: Clique Trees
– Representation
– Factorization
– Inference
– Relation with VE

Representation
Given a probability distribution P:
– How do we represent it?
– What does this representation tell us about P?
  Cost of inference
  Independence relationships
– What are the options?
  Full CPT
  Bayes network (list of factors)
  Clique tree

Representation: Full CPT
– Space is exponential
– Inference is exponential
– Tells us nothing about the independencies in P
– Just bad

Representation: The Past (Bayes Networks)
– List of factors P(Xi | Pa(Xi)): space efficient
– Independence:
  Read local Markov independencies directly
  Compute global independencies via d-separation
– Inference:
  Can use dynamic programming by leveraging the factors
  Tells us little immediately about the cost of inference:
  – Fix an elimination order
  – Compute the induced graph
  – Find the largest clique size
  – Inference is exponential in this largest clique size
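To make that last point concrete, here is a minimal sketch (function and variable names are illustrative, not from the slides) that replays an elimination order on the undirected (moralized) graph and reports the largest clique it induces:

```python
def induced_width(adjacency, order):
    """Replay an elimination order on an undirected (moralized) graph:
    eliminating a variable connects all of its remaining neighbors, and
    inference cost is exponential in the largest clique created."""
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
    largest = 0
    for var in order:
        nbrs = adj[var]
        largest = max(largest, len(nbrs) + 1)   # clique = var plus its neighbors
        for u in nbrs:                          # connect the remaining neighbors
            adj[u] |= nbrs - {u}
            adj[u].discard(var)
        del adj[var]
    return largest

# Tiny chain A - B - C: any order creates cliques of size at most 2.
print(induced_width({"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}, ["A", "B", "C"]))  # 2
```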

Representation: Today
Clique Trees (CT)
– A tree of cliques
– Can be constructed from a Bayes network: Bayes network + elimination order → CT
– What independencies can we read from a CT about P?
– How does P factorize over a CT?
– How do we do inference using a CT?
– When should you use a CT?

The Big Picture
[Figure: the student Bayes network (nodes Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, Happy) mapped to its three representations]
– Full CPT: inference is just summation.
– List of local factors (Bayes network): VE operates on the factors, caching computations (intermediate factors) within a single inference operation P(X|E). Many structures can represent the same P (PDAG, minimal I-map, etc.).
– Tree of cliques: a CT enables caching computations (intermediate factors) across multiple inference operations, e.g. P(Xi, Xj) for all i, j. Many CTs are possible; choose one via an elimination order and a max-spanning tree.

Clique Trees: Representation
For a set of factors F (i.e., a Bayes net):
– An undirected graph
– Each node i is associated with a cluster C_i
– Family preserving: for each factor f_j in F, there exists a node i such that scope[f_j] is contained in C_i
– Each edge i – j is associated with a separator S_ij = C_i intersect C_j
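A minimal illustrative sketch of this structure (the class and method names are assumptions, not from the slides; RIP checking is omitted for brevity): clusters plus tree edges, with a family-preservation check and separator lookup.

```python
class CliqueTree:
    def __init__(self, clusters, edges):
        # clusters: dict node_id -> set of variables (the cluster C_i)
        # edges: list of (i, j) pairs forming a tree over the node ids
        self.clusters = {i: frozenset(c) for i, c in clusters.items()}
        self.edges = list(edges)

    def separator(self, i, j):
        # S_ij = C_i intersected with C_j
        return self.clusters[i] & self.clusters[j]

    def is_family_preserving(self, factor_scopes):
        # Every factor's scope must fit inside some cluster.
        return all(any(set(scope) <= c for c in self.clusters.values())
                   for scope in factor_scopes)

# Example: clusters {C, D} and {G, D, I} share the separator {D}.
ct = CliqueTree({1: {"C", "D"}, 2: {"G", "D", "I"}}, edges=[(1, 2)])
print(ct.separator(1, 2))                                             # frozenset({'D'})
print(ct.is_family_preserving([{"C"}, {"D", "C"}, {"G", "D", "I"}]))  # True
```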

Clique Trees: Representation
– Family preserving over the factors
– Running intersection property (RIP): if a variable appears in two cliques, it appears in every clique on the path between them
[Figure: the student network and two clique trees over cliques such as CD, GDI, GSI, GJLS, HJG]
Both are correct clique trees.

Clique Trees: Representation
What independencies can be read from a CT?
– I(CT) subset of I(G) subset of I(P)
– Use your intuition: how do you block a path? Observe a separator. (Q4)
[Figure: the student network]

Clique Trees: Representation
How P factorizes over a CT (when the CT is calibrated). Q4 (See )
[Figure: the student network]
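In symbols (the standard clique tree measure; writing \beta_i and \mu_{i,j} for the calibrated clique and separator beliefs is a notation choice made here for illustration):

\[
P(\mathcal{X}) \;=\; \frac{\prod_{i} \beta_i(C_i)}{\prod_{(i-j)} \mu_{i,j}(S_{i,j})},
\qquad \beta_i(C_i) = P(C_i), \quad \mu_{i,j}(S_{i,j}) = P(S_{i,j}).
\]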

Representation Summary
Clique trees (like Bayes nets) have two parts:
– Structure
– Potentials (the parallel to CPTs in a BN): clique potentials and separator potentials
Upon calibration, you can read marginals from the clique and separator potentials.
Initialize the clique potentials with the factors from the BN:
– Distribute the factors over the cliques (family preserving)
– The cliques must satisfy RIP
But we need calibration to reach a fixed point of these potentials (see later today).
Compare to a BN:
– In a BN you can only read local conditionals P(Xi | Pa(Xi)); you need VE to answer other queries.
– In a CT, upon calibration, you can read marginals over cliques; you need VE over the calibrated CT to answer queries whose scope cannot be confined to a single clique.

Clique Tree Construction
– Replay VE: connect the factors that would be generated if you ran VE with a given order.
– Simplify: eliminate any factor that is a subset of a neighbor.
[Figure: the student network]

Clique Tree Construction (details)
Replay VE with the order C, D, I, H, S, L, J, G.
[Figure: the student network and the clique graph built so far]
Initial factors: C, DC, GDI, SI, I, LG, JLS, HJG
– Eliminate C: multiply DC and C to get a factor over CD, then marginalize C to get a factor over D.
– Eliminate D: multiply D and GDI to get a factor over GDI, then marginalize D to get a factor over GI.
– Eliminate I: multiply GI, SI, and I to get a factor over GSI, then marginalize I to get a factor over GS.

Clique Tree Construction (details, continued)
– Eliminate H: just marginalize HJG to get a factor over JG.

Clique Tree Construction (details, continued)
– Eliminate S: multiply GS and JLS to get a factor over GJLS, then marginalize S to get a factor over GJL.

Clique Tree Construction (details, continued)
– Eliminate L: multiply GJL and LG to get a factor over GJL, then marginalize L to get a factor over GJ.
– Eliminate J, then G: multiply the factors over GJ and JG (the latter from eliminating H), marginalize J to get a factor over G, then marginalize G.

Clique Tree Construction (details)
Summarize the CT by removing subsumed nodes.
[Figure: the full clique graph from the VE replay, and the simplified clique tree CD – GDI – GSI – GJLS – HJG]
– The result satisfies RIP and family preservation (always true, for any elimination order).
– Finally, distribute the initial factors into the cliques to get the initial beliefs (the parallel of CPTs in a BN), to be used for inference.
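A minimal sketch of the "replay VE" construction, assuming factors are given only as scope sets; the function name is illustrative and the pruning of subsumed clusters is only indicated, not implemented.

```python
def clique_tree_from_ve(factor_scopes, order):
    """Replay variable elimination on factor scopes only, recording the
    cluster created at each step and the edges induced by which
    intermediate factor feeds into which later step."""
    factors = [(frozenset(s), None) for s in factor_scopes]   # (scope, producing step)
    clusters, edges = [], []
    for var in order:
        used = [f for f in factors if var in f[0]]
        cluster = frozenset().union(*(s for s, _ in used))
        step = len(clusters)
        clusters.append(cluster)
        # Edge from each earlier step whose intermediate factor we consume.
        edges += [(src, step) for _, src in used if src is not None]
        factors = [f for f in factors if var not in f[0]] + [(cluster - {var}, step)]
    # Pruning of subsumed clusters (e.g. GJL inside GJLS) is left as a sketch.
    return clusters, edges

scopes = [{"C"}, {"D", "C"}, {"G", "D", "I"}, {"S", "I"}, {"I"},
          {"L", "G"}, {"J", "L", "S"}, {"H", "J", "G"}]
clusters, edges = clique_tree_from_ve(scopes, order="CDIHSLJG")
print(clusters)  # CD, GDI, GSI, HJG, GJLS, GJL, GJ, G (before removing subsumed nodes)
```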

Clique Tree Construction: Another Method
From a triangulated graph (still based on VE, why?):
– Elimination order → triangulation
– Triangulation → max cliques
– Connect the cliques and find a max-spanning tree

Clique Tree Construction: Another Method (details)
– Get the chordal graph (add fill edges) for the same order as before, C, D, I, H, S, L, J, G.
– Extract the max cliques from this graph and build the maximum-spanning clique tree.
[Figure: the chordal student network and its max cliques CD, GDI, GSI, GJLS, HJG, giving the same clique tree as before]
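A minimal sketch of the last step, assuming the max cliques are already extracted: edges are weighted by separator size and the tree is grown Prim-style. The function name and representation are illustrative assumptions.

```python
def max_spanning_clique_tree(cliques):
    """Connect cliques with edges weighted by separator size |C_i & C_j|
    and keep a maximum-weight spanning tree (Prim-style sketch)."""
    cliques = [frozenset(c) for c in cliques]
    in_tree, edges = {0}, []
    while len(in_tree) < len(cliques):
        # Pick the heaviest crossing edge (largest separator).
        i, j = max(((i, j) for i in in_tree for j in range(len(cliques))
                    if j not in in_tree),
                   key=lambda e: len(cliques[e[0]] & cliques[e[1]]))
        in_tree.add(j)
        edges.append((i, j, cliques[i] & cliques[j]))
    return edges

cliques = [{"C", "D"}, {"G", "D", "I"}, {"G", "S", "I"},
           {"G", "J", "L", "S"}, {"H", "J", "G"}]
for i, j, sep in max_spanning_clique_tree(cliques):
    print(sorted(cliques[i]), "-", sorted(cliques[j]), "separator:", sorted(sep))
```

On this example the heaviest-separator choices recover the tree CD – GDI – GSI – GJLS – HJG, matching the construction from the VE replay.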

The Big Picture (recap)
Same picture as before: full CPT, list of local factors (VE caches intermediate factors within a single inference operation), and tree of cliques (caching across multiple inference operations). We now turn from construction to inference.

Clique Tree: Inference
P(X): assume X is in some node (the root). Just run VE, using the elimination order dictated by the tree and the initial factors placed into each clique to define \Pi_0(C_i).
[Figure: messages flowing toward the root, labeled Eliminate C, Eliminate D, Eliminate I, Eliminate H]
When done, we have P(G,J,S,L). In VE jargon, we would assign these messages names like g1, g2, etc.

Clique Tree: Inference (2)
In VE jargon, we call these messages by names like g1, g2, etc.
[Figure: the initial local beliefs \Pi_0; the message out of the CD clique is just a factor over D, and the message out of the GDI clique is just a factor over GI]
We are simply doing VE along a "partial" order determined by the tree: (C, D, I) and H (i.e., H can be anywhere in the order).

Clique Tree: Inference (3)
– When we are done, C5 will have received two messages, one from the left and one from the right.
– In VE, we end up with factors corresponding to these messages, in addition to all factors that were distributed into C5: P(L|G), P(J|L,S).
– In VE, we multiply all these factors to get the marginal.
– In a CT, we multiply all factors in C5, i.e. \Pi_0(C_5), with these two messages to get the calibrated potential of C_5 (which is also the marginal). So what is the deal? Why is this useful?
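As a concrete picture of the upward pass, here is a minimal sketch using tiny table factors over binary variables; Factor, collect_message, and calibrate_root are hypothetical helpers chosen for illustration, not the lecture's notation.

```python
from itertools import product

class Factor:
    def __init__(self, scope, table):
        self.scope = tuple(scope)      # ordered variable names
        self.table = dict(table)       # binary assignment tuple -> value

    def __mul__(self, other):
        scope = self.scope + tuple(v for v in other.scope if v not in self.scope)
        table = {}
        for assign in product([0, 1], repeat=len(scope)):
            a = dict(zip(scope, assign))
            table[assign] = (self.table[tuple(a[v] for v in self.scope)] *
                             other.table[tuple(a[v] for v in other.scope)])
        return Factor(scope, table)

    def marginalize(self, var):
        scope = tuple(v for v in self.scope if v != var)
        table = {}
        for assign, val in self.table.items():
            key = tuple(x for v, x in zip(self.scope, assign) if v != var)
            table[key] = table.get(key, 0.0) + val
        return Factor(scope, table)

def collect_message(tree, beliefs, i, parent):
    """Message from clique i toward `parent`: multiply the initial belief of i
    with messages from its other neighbors, then sum out everything not in
    the separator."""
    msg = beliefs[i]
    for nbr in tree[i]:
        if nbr != parent:
            msg = msg * collect_message(tree, beliefs, nbr, i)
    separator = set(beliefs[i].scope) & set(beliefs[parent].scope)
    for var in set(msg.scope) - separator:
        msg = msg.marginalize(var)
    return msg

def calibrate_root(tree, beliefs, root):
    # Calibrated belief at the root = initial belief times all incoming messages.
    belief = beliefs[root]
    for nbr in tree[root]:
        belief = belief * collect_message(tree, beliefs, nbr, root)
    return belief
```

Here `tree` maps a clique id to its neighbor ids and `beliefs` maps a clique id to its initial potential \Pi_0; calling calibrate_root with C5 as the root mirrors the slide's description of multiplying \Pi_0(C_5) by the two incoming messages.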

Clique Tree: Inter-Inference Caching
– P(G,L): use C5 as the root.
– P(I,S): use C3 as the root.
– Notice the same 3 messages, i.e., the same intermediate factors as in VE.

What is passed across the edge?
[Figure: the edge with separator (edge scope) GI; exclusive scope C, D on the left and S, L, J, H on the right]
– The message summarizes what the right side of the tree cares about in the left side (GI). See Theorem
– Completely determined by the root:
  Multiply all factors on the left side
  Eliminate the exclusive variables (but do it in steps along the tree: C, then D)
– The message depends ONLY on the direction of the edge!
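In symbols (a standard way to write this message; \delta denotes messages, \Pi_0 the initial clique potentials as on the slides, and Nb(i) the neighbors of clique i):

\[
\delta_{i \to j}(S_{i,j}) \;=\; \sum_{C_i \setminus S_{i,j}} \Pi_0(C_i) \prod_{k \in \mathrm{Nb}(i) \setminus \{j\}} \delta_{k \to i}(S_{k,i})
\]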

Clique Tree Calibration
A two-step process:
– Upward pass: as before
– Downward pass (after you calibrate the root)

Intuitively, Why Does It Work?
– Upward phase: the root is calibrated.
– Downward phase: take C4 and ask what would happen if it were the root. C4 just needs the message from C6 that summarizes the status of the separator from the other side of the tree. The two trees differ only in the direction of the edge C4-C6; everything else is the same. Now C4 is calibrated and can act recursively as a new root!
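Continuing the earlier sketch (and reusing the hypothetical Factor and beliefs conventions from it), a compact two-pass calibration might look like this: the message along every directed edge is computed once and cached, then combined at every clique.

```python
def calibrate(tree, beliefs):
    """Two-pass calibration sketch: compute the message along each directed
    edge (upward and downward combined via recursion with caching), then
    multiply every clique's initial belief by all of its incoming messages."""
    cache = {}

    def message(i, j):                  # message from clique i toward clique j
        if (i, j) not in cache:
            msg = beliefs[i]
            for nbr in tree[i]:
                if nbr != j:
                    msg = msg * message(nbr, i)
            sep = set(beliefs[i].scope) & set(beliefs[j].scope)
            for var in set(msg.scope) - sep:
                msg = msg.marginalize(var)
            cache[(i, j)] = msg
        return cache[(i, j)]

    calibrated = {}
    for i in tree:
        belief = beliefs[i]
        for nbr in tree[i]:
            belief = belief * message(nbr, i)
        calibrated[i] = belief          # proportional to the clique marginal
    return calibrated
```

Each directed edge's message is computed exactly once (two per undirected edge), which is consistent with the next slide's remark that all clique marginals cost about double a single VE run.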

Clique Trees
– Can compute all clique marginals at double the cost of a single VE run, but you need to store all intermediate messages.
– It is not magic: if you store the intermediate factors from VE, you get the same effect!
– You lose internal structure and some independencies. Do you care? Time: no! Space: YES.
– You can still run VE to get a marginal over variables that are not in the same clique, and even all pairwise marginals (Q5).
– Good for continuous inference; cannot be tailored to the evidence: only one elimination order.

Queries Outside a Clique: Q5
T is assumed calibrated
– Cliques agree on the separators
– See section , Section