1 Mean Field and Variational Methods finishing off Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University November 5 th, 2008 Readings: K&F:

Slides:

Advertisements

Similar presentations

Mean-Field Theory and Its Applications In Computer Vision1 1.

Advertisements

Bayesian Belief Propagation

Linear Time Methods for Propagating Beliefs Min Convolution, Distance Transforms and Box Sums Daniel Huttenlocher Computer Science Department December,

1 Undirected Graphical Models Graphical Models – Carlos Guestrin Carnegie Mellon University October 29 th, 2008 Readings: K&F: 4.1, 4.2, 4.3, 4.4,

Discrete Optimization Lecture 4 – Part 3 M. Pawan Kumar Slides available online

Exact Inference. Inference Basic task for inference: – Compute a posterior distribution for some query variables given some observed evidence – Sum out.

Variational Inference Amr Ahmed Nov. 6 th Outline Approximate Inference Variational inference formulation – Mean Field Examples – Structured VI.

Variational Methods for Graphical Models Micheal I. Jordan Zoubin Ghahramani Tommi S. Jaakkola Lawrence K. Saul Presented by: Afsaneh Shirazi.

Exact Inference in Bayes Nets

Parameter Learning in MN. Outline CRF Learning CRF for 2-d image segmentation IPF parameter sharing revisited.

Junction Trees And Belief Propagation. Junction Trees: Motivation What if we want to compute all marginals, not just one? Doing variable elimination for.

ICCV 2007 tutorial Part III Message-passing algorithms for energy minimization Vladimir Kolmogorov University College London.

Loopy Belief Propagation a summary. What is inference? Given: –Observabled variables Y –Hidden variables X –Some model of P(X,Y) We want to make some.

Undirected Probabilistic Graphical Models (Markov Nets) (Slides from Sam Roweis)

Introduction to Belief Propagation and its Generalizations. Max Welling Donald Bren School of Information and Computer and Science University of California.

Clique Trees Amr Ahmed October 23, Outline Clique Trees Representation Factorization Inference Relation with VE.

Belief Propagation by Jakob Metzler. Outline Motivation Pearl’s BP Algorithm Turbo Codes Generalized Belief Propagation Free Energies.

Markov Networks.

Belief Propagation on Markov Random Fields Aggeliki Tsoli.

Junction tree Algorithm :Probabilistic Graphical Models Recitation: 10/04/07 Ramesh Nallapati.

Carnegie Mellon Focused Belief Propagation for Query-Specific Inference Anton Chechetka Carlos Guestrin 14 May 2010.

1 Graphical Models in Data Assimilation Problems Alexander Ihler UC Irvine Collaborators: Sergey Kirshner Andrew Robertson Padhraic Smyth.

Global Approximate Inference Eran Segal Weizmann Institute.

Belief Propagation, Junction Trees, and Factor Graphs

Genome evolution: a sequence-centric approach Lecture 6: Belief propagation.

Understanding Belief Propagation and its Applications Dan Yuan June 2004.

Exact Inference: Clique Trees

Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

CS774. Markov Random Field : Theory and Application Lecture 08 Kyomin Jung KAIST Sep

Mean Field Inference in Dependency Networks: An Empirical Study Daniel Lowd and Arash Shamaei University of Oregon.

Undirected Models: Markov Networks David Page, Fall 2009 CS 731: Advanced Methods in Artificial Intelligence, with Biomedical Applications.

1 Structured Region Graphs: Morphing EP into GBP Max Welling Tom Minka Yee Whye Teh.

Probabilistic Graphical Models

1 Variable Elimination Graphical Models – Carlos Guestrin Carnegie Mellon University October 11 th, 2006 Readings: K&F: 8.1, 8.2, 8.3,

Survey Propagation. Outline Survey Propagation: an algorithm for satisfiability 1 – Warning Propagation – Belief Propagation – Survey Propagation Survey.

Readings: K&F: 11.3, 11.5 Yedidia et al. paper from the class website

Lecture 2: Statistical learning primer for biologists

Belief Propagation and its Generalizations Shane Oldenburger.

CIAR Summer School Tutorial Lecture 1b Sigmoid Belief Nets Geoffrey Hinton.

Exact Inference in Bayes Nets. Notation U: set of nodes in a graph X i : random variable associated with node i π i : parents of node i Joint probability:

Graduate School of Information Sciences, Tohoku University

Pattern Recognition and Machine Learning

Mean field approximation for CRF inference

Today Graphical Models Representing conditional dependence graphically

Markov Networks: Theory and Applications Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208

1 Structure Learning (The Good), The Bad, The Ugly Inference Graphical Models – Carlos Guestrin Carnegie Mellon University October 13 th, 2008 Readings:

Bayesian Belief Propagation for Image Understanding David Rosenberg.

1 Variable Elimination Graphical Models – Carlos Guestrin Carnegie Mellon University October 15 th, 2008 Readings: K&F: 8.1, 8.2, 8.3,

Perfect recall: Every decision node observes all earlier decision nodes and their parents (along a “temporal” order) Sum-max-sum rule (dynamical programming):

Introduction of BP & TRW-S

Markov Networks.

Readings: K&F: 15.1, 15.2, 15.3, 15.4, 15.5 K&F: 7 (overview of inference) K&F: 8.1, 8.2 (Variable Elimination) Structure Learning in BNs 3: (the good,

Physical Fluctuomatics 7th~10th Belief propagation

Expectation-Maximization & Belief Propagation

Variable Elimination 2 Clique Trees

Readings: K&F: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7 Markov networks, Factor graphs, and an unified view Start approximate inference If we are lucky… Graphical.

Approximate Inference by Sampling

Lecture 3: Exact Inference in GMs

Unifying Variational and GBP Learning Parameters of MNs EM for BNs

Junction Trees 3 Undirected Graphical Models

Readings: K&F: 11.3, 11.5 Yedidia et al. paper from the class website

Markov Networks.

Variable Elimination Graphical Models – Carlos Guestrin

BP in Practice Message Passing Inference Probabilistic Graphical

Mean Field and Variational Methods Loopy Belief Propagation

Generalized Belief Propagation

Presentation transcript:

1 Mean Field and Variational Methods finishing off Graphical Models – Carlos Guestrin Carnegie Mellon University November 5 th, 2008 Readings: K&F: 10.1, –  Carlos Guestrin

2

3 What you need to know so far Goal:  Find an efficient distribution that is close to posterior Distance:  measure distance in terms of KL divergence Asymmetry of KL:  D(p||q)  D(q||p) Computing right KL is intractable, so we use the reverse KL

–  Carlos Guestrin Reverse KL & The Partition Function Back to the general case Consider again the defn. of D(q||p):  p is Markov net P F Theorem: where energy functional: Difficulty SATGrade Happy Job Coherence Letter Intelligence

–  Carlos Guestrin Understanding Reverse KL, Energy Function & The Partition Function Maximizing Energy Functional  Minimizing Reverse KL Theorem: Energy Function is lower bound on partition function  Maximizing energy functional corresponds to search for tight lower bound on partition function

–  Carlos Guestrin Structured Variational Approximate Inference Pick a family of distributions Q that allow for exact inference  e.g., fully factorized (mean field) Find Q2Q that maximizes For mean field

–  Carlos Guestrin Optimization for mean field Constrained optimization, solved via Lagrangian multiplier  9, such that optimization equivalent to:  Take derivative, set to zero Theorem: Q is a stationary point of mean field approximation iff for each i:

–  Carlos Guestrin Understanding fixed point equation Difficulty SATGrade Happy Job Coherence Letter Intelligence

–  Carlos Guestrin Simplifying fixed point equation Difficulty SATGrade Happy Job Coherence Letter Intelligence

–  Carlos Guestrin Theorem: The fixed point: is equivalent to:  where the Scope[  j ] = U j [ {X i } Q i only needs to consider factors that intersect X i Difficulty SATGrade Happy Job Coherence Letter Intelligence

–  Carlos Guestrin There are many stationary points!

–  Carlos Guestrin Initialize Q (e.g., randomly or smartly) Set all vars to unprocessed Pick unprocessed var X i  update Q i :  set var i as processed  if Q i changed set neighbors of X i to unprocessed Guaranteed to converge Very simple approach for finding one stationary point Difficulty SATGrade Happy Job Coherence Letter Intelligence

–  Carlos Guestrin More general structured approximations Mean field very naïve approximation Consider more general form for Q  assumption: exact inference doable over Q Theorem: stationary point of energy functional: Very similar update rule Difficulty SATGrade Happy Job Coherence Letter Intelligence

–  Carlos Guestrin Computing update rule for general case Consider one  : Difficulty SATGrade Happy Job Coherence Letter Intelligence

–  Carlos Guestrin Structured Variational update requires inference Compute marginals wrt Q of cliques in original graph and cliques in new graph, for all cliques What is a good way of computing all these marginals? Potential updates:  sequential: compute marginals, update  j, recompute marginals  parallel: compute marginals, update all  ’s, recompute marginals

–  Carlos Guestrin What you need to know about variational methods Structured Variational method:  select a form for approximate distribution  minimize reverse KL Equivalent to maximizing energy functional  searching for a tight lower bound on the partition function Many possible models for Q:  independent (mean field)  structured as a Markov net  cluster variational Several subtleties outlined in the book

17 Loopy Belief Propagation Graphical Models – Carlos Guestrin Carnegie Mellon University November 5 th, 2008 Readings: K&F: 10.2, –  Carlos Guestrin

18 Recall message passing over junction trees Exact inference:  generate a junction tree  message passing over neighbors  inference exponential in size of clique Difficulty SATGrade Happy Job Coherence Letter Intelligence DIG GJSL HGJ CD GSI

–  Carlos Guestrin Belief Propagation on Tree Pairwise Markov Nets Tree pairwise Markov net is a tree!!!  no need to create a junction tree Message passing: More general equation:  N(i) – neighbors of i in pairwise MN Theorem: Converges to true probabilities: Difficulty SATGrade Happy Job Coherence Letter Intelligence

–  Carlos Guestrin Loopy Belief Propagation on Pairwise Markov Nets What if we apply BP in a graph with loops?  send messages between pairs of nodes in graph, and hope for the best What happens?  evidence goes around the loops multiple times  may not converge  if it converges, usually overconfident about probability values But often gives you reasonable, or at least useful answers  especially if you just care about the MPE rather than the actual probabilities Difficulty SATGrade Happy Job Coherence Letter Intelligence

–  Carlos Guestrin More details on Loopy BP Numerical problem:  messages < 1 get multiplied together as we go around the loops  numbers can go to zero  normalize messages to one:  Z i!j doesn’t depend on X j, so doesn’t change the answer Computing node “beliefs” (estimates of probs.): Difficulty SATGrade Happy Job Coherence Letter Intelligence

–  Carlos Guestrin An example of running loopy BP

–  Carlos Guestrin Convergence If you tried to send all messages, and beliefs haven’t changed (by much) ! converged Difficulty SATGrade Happy Job Coherence Letter Intelligence

–  Carlos Guestrin (Non-)Convergence of Loopy BP Loopy BP can oscillate!!!  oscillations can small  oscillations can be really bad! Typically,  if factors are closer to uniform, loopy does well (converges)  if factors are closer to deterministic, loopy doesn’t behave well One approach to help: damping messages  new message is average of old message and new one:  often better convergence but, when damping is required to get convergence, result often bad graphs from Murphy et al. ’99

–  Carlos Guestrin Loopy BP in Factor graphs What if we don’t have pairwise Markov nets? 1. Transform to a pairwise MN 2. Use Loopy BP on a factor graph Message example:  from node to factor:  from factor to node: ABCDE ABCABDBDECDE

–  Carlos Guestrin Loopy BP in Factor graphs From node i to factor j:  F(i) factors whose scope includes X i From factor j to node i:  Scope[  j ] = Y[{X i } Belief:  Node:  Factor: ABCDE ABCABDBDECDE

–  Carlos Guestrin What you need to know about loopy BP Application of belief propagation in loopy graphs Doesn’t always converge  damping can help  good message schedules can help (see book) If converges, often to incorrect, but useful results Generalizes from pairwise Markov networks by using factor graphs