Generalized Belief Propagation
Readings: K&F 10.2, 10.3
Graphical Models – 10-708, Carlos Guestrin, Carnegie Mellon University, November 12th, 2008
More details on Loopy BP
Numerical problem: messages smaller than 1 get multiplied together as we go around the loops, so the numbers can go to zero. Fix: normalize each message to sum to one. The normalizer Z_{i->j} doesn't depend on X_j, so it doesn't change the answer. Node "beliefs" (estimates of the marginals) are computed from the product of incoming messages.
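A minimal reconstruction of these updates (notation mine), assuming a pairwise Markov network with node potentials ψ_i and edge potentials ψ_{ij}:

    \delta_{i \to j}(x_j) \;=\; \frac{1}{Z_{i \to j}} \sum_{x_i} \psi_i(x_i)\, \psi_{ij}(x_i, x_j) \prod_{k \in N(i) \setminus \{j\}} \delta_{k \to i}(x_i),
    \qquad Z_{i \to j} \;=\; \sum_{x_j} \big(\text{unnormalized message}\big)

    b_i(x_i) \;\propto\; \psi_i(x_i) \prod_{k \in N(i)} \delta_{k \to i}(x_i)

Because Z_{i->j} is constant in x_j, dividing by it only rescales the products downstream, so the normalized beliefs are unchanged.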
Loopy BP in factor graphs
From node i to factor j: combine the messages from all factors in F(i), the set of factors whose scope includes X_i, except factor j.
From factor j to node i: multiply the factor by the incoming node messages and sum out Y, where Scope[j] = Y ∪ {X_i}.
Beliefs: a node belief multiplies all incoming factor messages; a factor belief multiplies the factor by all incoming node messages.
(Example factor graph: variables A, B, C, D, E with factors ABC, ABD, BDE, CDE.)
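In equations (a standard sum-product reconstruction, notation mine):

    \delta_{i \to j}(x_i) \;=\; \prod_{k \in F(i) \setminus \{j\}} \delta_{k \to i}(x_i)

    \delta_{j \to i}(x_i) \;=\; \sum_{Y} f_j(Y, x_i) \prod_{X_m \in \text{Scope}[j] \setminus \{X_i\}} \delta_{m \to j}(x_m)

    b_i(x_i) \;\propto\; \prod_{j \in F(i)} \delta_{j \to i}(x_i),
    \qquad b_j(\text{Scope}[j]) \;\propto\; f_j(\text{Scope}[j]) \prod_{X_i \in \text{Scope}[j]} \delta_{i \to j}(x_i)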
Loopy BP v. Clique trees: Two ends of a spectrum
(Figures: the pairwise cluster graph for loopy BP over the student network, versus a clique tree with cliques CD, DIG, GSI, GJSL, HGJ.)
Generalized cluster graph
Generalized cluster graph for a set of factors F:
- Undirected graph; each node i is associated with a cluster C_i.
- Family preserving: for each factor f_j ∈ F, there exists a node i such that Scope[f_j] ⊆ C_i.
- Each edge i–j is associated with a set of variables S_ij ⊆ C_i ∩ C_j.
Running intersection property
(Generalized) running intersection property (RIP): a cluster graph satisfies RIP if, whenever X ∈ C_i and X ∈ C_j, there exists one and only one path from C_i to C_j in which X ∈ S_uv for every edge (u,v) on the path.
Examples of cluster graphs
Two cluster graphs satisfying RIP with different edge sets
Generalized BP on cluster graphs satisfying RIP
Initialization: assign each factor φ_k to a cluster α(k) with Scope[φ_k] ⊆ C_{α(k)}; initialize each cluster potential to the product of its assigned factors; initialize all messages to 1.
While not converged, send messages between neighboring clusters. Beliefs are the initial cluster potentials times all incoming messages.
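A compact reconstruction of these updates (standard cluster-graph sum-product, notation mine):

    \pi_i^{(0)}(C_i) \;=\; \prod_{k:\, \alpha(k)=i} \phi_k,
    \qquad \delta_{i \to j} \;=\; 1 \quad \text{(initially)}

    \delta_{i \to j}(S_{ij}) \;\leftarrow\; \sum_{C_i \setminus S_{ij}} \pi_i^{(0)} \prod_{k \in N(i) \setminus \{j\}} \delta_{k \to i}

    b_i(C_i) \;\propto\; \pi_i^{(0)} \prod_{k \in N(i)} \delta_{k \to i}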
Cluster graph for Loopy BP
(Figure: the pairwise cluster graph for loopy BP over the student network: Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, Happy.)
What if the cluster graph doesn't satisfy RIP?
Region graphs to the rescue
Generalized cluster graphs that don't satisfy RIP can be handled using region graphs (K&F 10.3); there is an example in your homework.
Revisiting Mean Field
Choice of Q: a fully factorized distribution. Optimization problem: maximize the energy functional over this restricted family.
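In the standard mean-field form (as covered earlier in the course):

    Q(X) \;=\; \prod_i Q_i(X_i)

    \max_{Q} \;\; F[\tilde P, Q] \;=\; \sum_{\phi \in F} \mathbf{E}_Q[\ln \phi] + H_Q(X)
    \quad \text{s.t. } Q(X) = \prod_i Q_i(X_i), \;\; \sum_{x_i} Q_i(x_i) = 1 \;\; \forall i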
Announcements
Recitation tomorrow. HW5 out soon.
We will not cover relational models this semester; instead, we recommend Pedro Domingos' tutorial on Markov Logic (Markov logic is one example of a relational probabilistic model), November 14th from 1:00 pm to 3:30 pm in Wean 4623.
Interpretation of energy functional
The energy functional is exact if Q = P; approximate inference can be viewed as approximating its entropy term.
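In the form used earlier in the course:

    F[\tilde P, Q] \;=\; \sum_{\phi \in F} \mathbf{E}_Q[\ln \phi] \;+\; H_Q(X),
    \qquad \ln Z \;=\; F[\tilde P, Q] + D(Q \,\|\, P)

so F[\tilde P, Q] \le \ln Z, with equality if and only if Q = P. Different approximations (mean field, Bethe, Kikuchi) differ mainly in how they treat the entropy term H_Q(X).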
Entropy of a tree distribution
For a tree-structured distribution, the joint decomposes into edge marginals divided by node marginals, and the entropy term decomposes accordingly; more generally, d_i denotes the number of neighbors of X_i.
(Figure: a tree over Coherence, Difficulty, Intelligence, Grade, SAT, Letter.)
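Written out, for a tree T with edges (i,j):

    P(X) \;=\; \frac{\prod_{(i,j) \in T} P(x_i, x_j)}{\prod_i P(x_i)^{\,d_i - 1}}

    H(X) \;=\; \sum_{(i,j) \in T} H(X_i, X_j) \;-\; \sum_i (d_i - 1)\, H(X_i)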
Loopy BP & Bethe approximation
Energy functional as above; the Bethe approximation of the free energy uses the tree-entropy decomposition even when the graph is loopy.
Theorem: If loopy BP converges, the resulting edge beliefs b_ij and node beliefs b_i are a stationary point (usually a local maximum) of the Bethe free energy.
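A standard way to write the Bethe free energy for a pairwise network (notation mine):

    F_{\text{Bethe}}[\tilde P, b] \;=\; \sum_{(i,j)} \mathbf{E}_{b_{ij}}[\ln \psi_{ij}] + \sum_i \mathbf{E}_{b_i}[\ln \psi_i]
    \;+\; \sum_{(i,j)} H_{b_{ij}}(X_i, X_j) \;-\; \sum_i (d_i - 1)\, H_{b_i}(X_i)

maximized subject to local consistency: each b_ij marginalizes to b_i and b_j, and all beliefs sum to one.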
GBP & Kikuchi approximation
Exact free energy: junction tree. Bethe free energy: loopy BP on the pairwise cluster graph. The Kikuchi approximation spans the generalized-cluster-graph spectrum from Bethe to exact.
Theorem: If GBP converges, the resulting cluster beliefs b_{C_i} are a stationary point (usually a local maximum) of the Kikuchi free energy.
(Figures: the student network and a cluster graph with clusters CD, DIG, GSI, GJSL, HGJ.)
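One common way to write the Kikuchi (region-based) free energy, assuming each factor φ is assigned to a single region r(φ) and counting numbers c_r are chosen so that every variable is counted exactly once:

    F_{\text{Kikuchi}}[\tilde P, b] \;=\; \sum_{\phi \in F} \mathbf{E}_{b_{r(\phi)}}[\ln \phi] \;+\; \sum_{r} c_r\, H_{b_r}(C_r)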
What you need to know about GBP
A spectrum between loopy BP and junction trees: more computation, but typically better answers.
If the cluster graph satisfies RIP, the equations are very simple; in the general setting the equations are slightly trickier, but not hard.
Relates to variational methods: GBP corresponds to local optima of an approximate version of the energy functional.
Parameter learning in Markov nets
Readings: K&F: 10.2, 10.3
Graphical Models – 10-708, Carlos Guestrin, Carnegie Mellon University, November 12th, 2008
Learning Parameters of a BN
The log likelihood decomposes, so we learn each CPT independently using counts.
(Figure: the student network.)
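Concretely, for data D with M samples:

    \ell(\theta : D) \;=\; \sum_m \sum_i \log P\big(x_i[m] \mid \mathbf{pa}_i[m]\big)
    \;=\; \sum_i \sum_{x_i, \mathbf{u}_i} \text{Count}(x_i, \mathbf{u}_i)\, \log \theta_{x_i \mid \mathbf{u}_i}

    \hat\theta_{x_i \mid \mathbf{u}_i} \;=\; \frac{\text{Count}(x_i, \mathbf{u}_i)}{\text{Count}(\mathbf{u}_i)}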
Log Likelihood for MNs
Log likelihood of the data under a Markov net parameterized by its factors (equivalently, in log-linear form):
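A standard way to write it (notation mine):

    P(x \mid \theta) \;=\; \frac{1}{Z(\theta)} \prod_c \phi_c(x_c),
    \qquad \ell(\theta : D) \;=\; \sum_m \sum_c \ln \phi_c\big(x_c[m]\big) \;-\; M \ln Z(\theta)

In log-linear form with \phi_c(x_c) = \exp\{\theta_c f_c(x_c)\}:

    \ell(\theta : D) \;=\; \sum_m \theta \cdot f\big(x[m]\big) \;-\; M \ln Z(\theta)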
Log likelihood doesn't decompose for MNs
The log Z term couples all the parameters, so the likelihood doesn't decompose. It is still a concave maximization (a convex problem), so we can find the global optimum!
Derivative of Log Likelihood for MNs
(Derivation spread over two slides; the resulting gradient is summarized below.)
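Using the log-linear parameterization above, the gradient has the familiar moment-matching form (empirical minus model feature expectations):

    \frac{1}{M} \frac{\partial \ell(\theta : D)}{\partial \theta_i}
    \;=\; \mathbf{E}_{\hat D}\big[f_i(X)\big] \;-\; \mathbf{E}_{\theta}\big[f_i(X)\big]

At the optimum the two expectations match; computing E_theta[f_i] is the inference step.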
Derivative of Log Likelihood for MNs
Computing the derivative requires inference in the current model. A common approach is to optimize with gradient ascent (or conjugate gradient, Newton's method, ...). Let's also look at a simpler solution.
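A minimal gradient-ascent sketch for the log-linear case. The helper `model_expectations` is a hypothetical callback supplied by the caller; it runs inference in the current model (exact enumeration on a small network, or loopy BP as an approximation) and returns the vector of E_theta[f_i]:

    import numpy as np

    def fit_mn_params(data, features, model_expectations, n_iters=100, lr=0.1):
        # Maximize the average log-likelihood (1/M) sum_m theta . f(x[m]) - log Z(theta).
        # Empirical feature expectations: (1/M) sum_m f_i(x[m]).
        emp = np.mean([[f(x) for f in features] for x in data], axis=0)
        theta = np.zeros(len(features))
        for _ in range(n_iters):
            mod = model_expectations(theta)    # E_theta[f_i]; the inference step (assumed provided)
            theta = theta + lr * (emp - mod)   # gradient ascent on the average log-likelihood
        return theta

On a small model, `model_expectations` can enumerate assignments exactly; on larger models it would be replaced by approximate inference such as loopy BP.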
Iterative Proportional Fitting (IPF)
Setting the derivative to zero gives a fixed-point equation in the factor values. Iterate the fixed-point update and converge to the optimal parameters; each iteration must compute the model's marginals over the cliques (an inference step).
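A standard way to write the IPF update for table factors, where \hat P is the empirical distribution and P^{(t)} the current model:

    \phi_c^{(t+1)}(x_c) \;=\; \phi_c^{(t)}(x_c)\; \frac{\hat P(x_c)}{P^{(t)}(x_c)}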
What you need to know about learning MN parameters
BN parameter learning is easy; MN parameter learning doesn't decompose, and learning requires inference. Apply gradient ascent or IPF iterations to obtain the optimal parameters.