Some Surprises in the Theory of Generalized Belief Propagation
Jonathan Yedidia, Mitsubishi Electric Research Labs (MERL)
Collaborators: Bill Freeman (MIT), Yair Weiss (Hebrew University)
See "Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms," MERL TR, to be published in IEEE Trans. Info. Theory.

Outline
Introduction to GBP
–Quick Review of Standard BP
–Free Energies
–Region Graphs and Valid Approximations
Some Surprises
Maxent-Normal Approximations & a Useful Heuristic

Factor Graphs (Kschischang et al., 2001)
[Figure: an example factor graph with factor nodes A, B, C connected to variable nodes.]
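The factorization the figure depicts (reconstructed here; the slide showed it only as an image) is the standard factor-graph form

$$p(\mathbf{x}) = \frac{1}{Z} \prod_a f_a(\mathbf{x}_a), \qquad Z = \sum_{\mathbf{x}} \prod_a f_a(\mathbf{x}_a),$$

where each factor $f_a$ depends only on the variables $\mathbf{x}_a$ attached to factor node $a$.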

Computing Marginal Probabilities
Fundamental for:
–Decoding error-correcting codes
–Inference in Bayesian networks
–Computer vision
–Statistical physics of magnets
Non-trivial because of the huge number of terms in the sum.
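To make the difficulty concrete, here is a minimal brute-force sketch (my own illustration; the function name and data layout are invented for this example). It computes one marginal by summing over all $2^N$ joint states, which is exactly the exponential sum BP is designed to avoid:

```python
import itertools
import numpy as np

def brute_force_marginal(factors, n_vars, i):
    """Marginal p_i(x_i), by explicit summation over all 2**n_vars states.

    factors: list of (var_indices, table) pairs; table maps a tuple of
             binary values for var_indices to a nonnegative weight f_a.
    """
    p = np.zeros(2)
    for x in itertools.product([0, 1], repeat=n_vars):  # 2**n_vars terms!
        w = 1.0
        for var_idx, table in factors:
            w *= table[tuple(x[v] for v in var_idx)]
        p[x[i]] += w
    return p / p.sum()  # dividing by the partition function Z

# A single pairwise factor that favors agreement between x0 and x1:
f = {(0, 0): 2.0, (1, 1): 2.0, (0, 1): 1.0, (1, 0): 1.0}
print(brute_force_marginal([((0, 1), f)], n_vars=2, i=0))  # -> [0.5 0.5]
```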

Error-Correcting Codes
Marginal probabilities = a posteriori bit probabilities (Gallager, 1963; Tanner, 1981)

Statistical Physics
Marginal probabilities = local magnetizations

Standard Belief Propagation
[Figure: factor node a sends a message to variable node i; "beliefs" and "messages" are the two basic quantities.]
The "belief" is the BP approximation of the marginal probability.
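The belief equations shown on the slide (reconstructed here in the notation of the Yedidia-Freeman-Weiss paper) are

$$b_i(x_i) \propto \prod_{a \in N(i)} m_{a \to i}(x_i), \qquad b_a(\mathbf{x}_a) \propto f_a(\mathbf{x}_a) \prod_{i \in N(a)} \prod_{b \in N(i) \setminus a} m_{b \to i}(x_i).$$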

BP Message-Update Rules
Using the marginalization constraint $b_i(x_i) = \sum_{\mathbf{x}_a \setminus x_i} b_a(\mathbf{x}_a)$, we get

$$m_{a \to i}(x_i) = \sum_{\mathbf{x}_a \setminus x_i} f_a(\mathbf{x}_a) \prod_{j \in N(a) \setminus i} \; \prod_{b \in N(j) \setminus a} m_{b \to j}(x_j).$$
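A compact sketch of this update for binary variables (my own illustration; all names are invented):

```python
import numpy as np

def bp_update(factors, neighbors, messages):
    """One synchronous round of the factor-to-variable updates above.

    factors:   dict a -> (vs, table); table is an ndarray, shape (2,)*len(vs)
    neighbors: dict i -> list of factor ids that touch variable i
    messages:  dict (a, i) -> length-2 array, the current m_{a->i}
    """
    new = {}
    for a, (vs, table) in factors.items():
        for i in vs:
            m = np.array(table, dtype=float)
            for axis, j in enumerate(vs):
                if j == i:
                    continue
                # Product of messages into j from all factors other than a.
                prod = np.ones(2)
                for b in neighbors[j]:
                    if b != a:
                        prod *= messages[(b, j)]
                shape = [1] * len(vs)
                shape[axis] = 2
                m = m * prod.reshape(shape)
            # Sum out every variable except i.
            axes = tuple(k for k, j in enumerate(vs) if j != i)
            out = m.sum(axis=axes)
            new[(a, i)] = out / out.sum()  # normalize for stability
    return new
```

In practice one iterates this map (often with damping) to a fixed point and then reads the beliefs off the message products.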

Variational (Gibbs) Free Energy
Kullback-Leibler distance: $D(b \| p) = \sum_{\mathbf{x}} b(\mathbf{x}) \ln \frac{b(\mathbf{x})}{p(\mathbf{x})} \ge 0$
"Boltzmann's Law" (serves to define the "energy"; temperature set to 1): $p(\mathbf{x}) = \frac{1}{Z} e^{-E(\mathbf{x})}$
The variational free energy $F[b] = U[b] - H[b]$ is minimized when $b(\mathbf{x}) = p(\mathbf{x})$.
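The identity connecting these definitions (a standard one-line derivation, added here for completeness):

$$F[b] \;=\; \underbrace{\sum_{\mathbf{x}} b(\mathbf{x}) E(\mathbf{x})}_{U[b]} \;-\; \underbrace{\Big(-\sum_{\mathbf{x}} b(\mathbf{x}) \ln b(\mathbf{x})\Big)}_{H[b]} \;=\; -\ln Z \;+\; D(b \,\|\, p),$$

so $F[b] \ge -\ln Z$, with equality exactly when $b = p$.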

So we’ve replaced the intractable problem of computing marginal probabilities with the even more intractable problem of minimizing a variational free energy over the space of possible beliefs. But the point is that now we can introduce interesting approximations.

Region-based Approximations to the Variational Free Energy (Kikuchi, 1951)
Exact: minimize $F[b]$ over all global beliefs $b(\mathbf{x})$ (intractable).
Regions: minimize an approximate free energy $F_R = \sum_{r \in R} c_r F_r$ built from beliefs over small regions of the factor graph.

Defining a "Region"
A region r is a set of variable nodes V_r and factor nodes F_r such that if a factor node a belongs to F_r, all variable nodes neighboring a must belong to V_r.
[Figure: examples of valid regions, and a non-region that includes a factor but omits one of its neighboring variables.]

Region Definitions
Region states: $\mathbf{x}_r$, a joint configuration of the variables in $V_r$
Region beliefs: $b_r(\mathbf{x}_r)$
Region energy: $E_r(\mathbf{x}_r) = -\sum_{a \in F_r} \ln f_a(\mathbf{x}_a)$
Region average energy: $U_r = \sum_{\mathbf{x}_r} b_r(\mathbf{x}_r) E_r(\mathbf{x}_r)$
Region entropy: $H_r = -\sum_{\mathbf{x}_r} b_r(\mathbf{x}_r) \ln b_r(\mathbf{x}_r)$
Region free energy: $F_r = U_r - H_r$

"Valid" Approximations
Introduce a set of regions R, and a counting number c_r for each region r in R, such that c_r = 1 for the largest regions, and for every factor node a and variable node i,

$$\sum_{r \in R} c_r \, \mathbb{1}[a \in F_r] = 1 \qquad \text{and} \qquad \sum_{r \in R} c_r \, \mathbb{1}[i \in V_r] = 1.$$

Count every node once! ($\mathbb{1}[\cdot]$ denotes an indicator function.)

Entropy and Energy
$U = \sum_r c_r U_r$: counting each factor node once makes the approximate energy exact (if the beliefs are).
$H = \sum_r c_r H_r$: counting each variable node once makes the approximate entropy reasonable (at least the entropy is correct when all states are equiprobable).

Comments
We could actually use different counting numbers for energy and entropy; this leads to "fractional BP" (Wiegerinck & Heskes, 2002) or "convexified free energy" (Wainwright et al., 2002) algorithms.
Of course, we also need to impose normalization and consistency constraints on the beliefs in our free energy.

Methods to Generate Valid Region-based Approximations
–Bethe
–Junction trees
–Cluster variation method (Kikuchi)
–Junction graphs (Aji-McEliece)
–Region graphs (the most general)
(Bethe is an example of Kikuchi for factor graphs with no 4-cycles; Bethe is an example of Aji-McEliece for "normal" factor graphs.)

Example of a Region Graph
[Figure: large regions {A,C,1,2,4,5}, {B,D,2,3,5,6}, {C,E,4,5,7,8}, {D,F,5,6,8,9}; intersection regions {2,5}, {C,4,5}, {D,5,6}, {5,8}; and the smallest region {5}.]
[5] is a child of [2,5].

Definition of a Region Graph
Labeled, directed graph of regions. An arc may exist from region A to region B if B is a subset of A.
Sub-graphs formed from the regions containing a given node must be connected.
We insist that $c_r = 1 - \sum_{r' \in \mathcal{A}(r)} c_{r'}$, where $\mathcal{A}(r)$ is the set of ancestors of region r (a Möbius function).
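A small sketch of this recursion (my own illustration; it assumes, as in the cluster variation method below, that the ancestors of a region are exactly the regions in R that strictly contain it):

```python
from itertools import combinations

def counting_numbers(regions):
    """Counting numbers c_r = 1 - sum of c_{r'} over ancestors A(r).

    regions: iterable of frozensets of node labels. Assumes (as in the
    cluster variation method) that the ancestors of r are exactly the
    regions that strictly contain r.
    """
    regions = sorted(set(regions), key=len, reverse=True)  # largest first
    c = {}
    for r in regions:
        ancestors = [s for s in regions if r < s]  # strict supersets
        c[r] = 1 - sum(c[s] for s in ancestors)
    return c

# Example: all triplets of 4 fully connected nodes, plus their
# intersections (the pairs) and the singletons.
nodes = range(4)
triplets = [frozenset(t) for t in combinations(nodes, 3)]
pairs = [frozenset(p) for p in combinations(nodes, 2)]
singles = [frozenset([i]) for i in nodes]
c = counting_numbers(triplets + pairs + singles)
print({len(r): c[r] for r in c})  # -> {3: 1, 2: -1, 1: 1}
```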

Bethe Method (after Bethe, 1935)
Two sets of regions:
–Large regions, each containing a single factor node a and all attached variable nodes.
–Small regions, each containing a single variable node i.
[Figure: a factor graph with factors A–F and variables 1–9; the large regions are {A;1,2,4,5}, {B;2,3,5,6}, {C;4,5}, {D;5,6}, {E;4,5,7,8}, {F;5,6,8,9}.]

Bethe Approximation to Gibbs Free Energy
$$F_{\text{Bethe}} = \sum_a F_a + \sum_i (1 - d_i) F_i,$$
where $d_i$ is the number of factor nodes neighboring variable node i.
Equal to the exact Gibbs free energy when the factor graph is a tree, because in that case
$$p(\mathbf{x}) = \prod_a b_a(\mathbf{x}_a) \prod_i b_i(x_i)^{1 - d_i}.$$

Minimizing the Bethe Free Energy
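The content of this slide was an equation image; a reconstruction of the standard constrained minimization (in the spirit of the published paper, with normalization constraints suppressed for brevity): introduce Lagrange multipliers $\lambda_{ai}(x_i)$ enforcing the marginalization constraints and minimize

$$L = F_{\text{Bethe}} + \sum_a \sum_{i \in N(a)} \sum_{x_i} \lambda_{ai}(x_i) \Big( b_i(x_i) - \sum_{\mathbf{x}_a \setminus x_i} b_a(\mathbf{x}_a) \Big).$$

Setting $\partial L / \partial b_a = \partial L / \partial b_i = 0$ yields stationarity conditions that the next slide converts into BP.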

Bethe = BP
Identify $\lambda_{ai}(x_i) = \ln \prod_{b \in N(i) \setminus a} m_{b \to i}(x_i)$ to obtain the BP equations: fixed points of standard BP coincide with stationary points of the Bethe free energy.

Cluster Variation Method (Kikuchi, 1951)
Form a region graph with an arbitrary number of different-sized regions.
Start with the largest regions. Then find intersection regions of the largest regions, discarding any regions that are sub-regions of other intersection regions. Continue finding intersections of those intersection regions, etc.
All intersection regions obey $c_r = 1 - \sum_{r' \in S(r)} c_{r'}$, where S(r) is the set of super-regions of region r.

Region Graph Created Using CVM
[Figure: the resulting region graph. Top level: {A,C,1,2,4,5}, {B,D,2,3,5,6}, {C,E,4,5,7,8}, {D,F,5,6,8,9}; middle level: {2,5}, {C,4,5}, {D,5,6}, {5,8}; bottom level: {5}.]

Minimizing a Region Graph Free Energy
Minimization is possible, but it may be awkward because of all the constraints that must be satisfied.
Instead, we introduce generalized belief propagation algorithms whose fixed points are provably identical to the stationary points of the region graph free energy.

Generalized Belief Propagation
Belief in a region is the product of:
–Local information (factors in the region)
–Messages from parent regions
–Messages into descendant regions from parents who are not descendants
Message-update rules are obtained by enforcing marginalization constraints (written out below).
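In symbols (the parent-to-child form of GBP; $P(r)$ denotes the parents of region r and $D(r)$ its descendants):

$$b_r(\mathbf{x}_r) \propto \prod_{a \in F_r} f_a(\mathbf{x}_a) \prod_{p \in P(r)} m_{p \to r}(\mathbf{x}_r) \prod_{d \in D(r)} \; \prod_{\substack{p' \in P(d) \\ p' \notin \{r\} \cup D(r)}} m_{p' \to d}(\mathbf{x}_d).$$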

Use Marginalization Constraints to Derive Message-Update Rules
Enforcing $b_C(\mathbf{x}_C) = \sum_{\mathbf{x}_P \setminus \mathbf{x}_C} b_P(\mathbf{x}_P)$ for each parent region P and child region C, and substituting the belief equations, solves for the parent-to-child messages.
[The slides stepped through this graphically for the example region graph.]


(Mild) Surprise #1
Region beliefs (even those given by Bethe/BP) may not be realizable as the marginals of any global belief.
[Figure: a triangle of binary variables a, b, c with a pairwise factor on each edge.]
For example (the specific beliefs here are the standard illustration of this point), let each pairwise belief put probability 1/2 on each of the disagreeing states (0,1) and (1,0). This is a perfectly acceptable solution to BP, but cannot arise from any global $b(x_a, x_b, x_c)$: no distribution can make all three pairs of an odd cycle disagree with certainty.

(Minor) Surprise #2
For some sets of beliefs (sometimes even when the marginal beliefs are the exactly correct ones), the Bethe entropy is negative!
Each large region has counting number 1 and each small region has counting number -2 (each variable neighbors three pairwise factors), so if the beliefs are perfectly correlated, e.g. each pair belief puts probability 1/2 on (0,0) and (1,1) and each $b_i = (1/2, 1/2)$, then the entropy is negative:
$$H_{\text{Bethe}} = \sum_a H_a - 2\sum_i H_i = (N_{\text{factors}} - 2N_{\text{vars}})\ln 2 < 0 \quad \text{whenever } N_{\text{factors}} < 2N_{\text{vars}}.$$
Note that these are the exact marginals of the global distribution that puts probability 1/2 on all-zeros and 1/2 on all-ones, whose true entropy is $\ln 2 > 0$.
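A quick numerical check of this (the fully connected four-node instance is my own choice; any graph where every variable has degree 3 behaves the same way):

```python
import numpy as np
from itertools import combinations

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

n = 4                                # four binary variables, all pairs
pair_belief = np.array([[0.5, 0.0],  # perfectly correlated pair belief
                        [0.0, 0.5]])
single_belief = np.array([0.5, 0.5])

pairs = list(combinations(range(n), 2))          # 6 pair regions, c_a = +1
H_bethe = (sum(entropy(pair_belief) for _ in pairs)
           - 2 * sum(entropy(single_belief) for _ in range(n)))  # c_i = -2
print(H_bethe)  # -2 ln 2 ~= -1.386: a negative "entropy"
```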

(Serious) Surprise #3
When there are no interactions, the minimum of the CVM free energy should correspond to the equiprobable global distribution, but sometimes it doesn't!

Example
Fully connected pairwise model on n nodes, using CVM with all triplets as the largest regions, and with pairs and singlets as the intersection regions. (The counting numbers below follow from the CVM rule on the earlier slide.)

Region type | # of regions   | Counting #
Triplets    | n(n-1)(n-2)/6  | 1
Pairs       | n(n-1)/2       | 3 - n
Singlets    | n              | (n-1)(n-4)/2 + 1
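Reusing counting_numbers (and the imports) from the sketch above, we can tabulate the total counting number for this construction; it exceeds the number of variables once n reaches 6, which is exactly the pathology the heuristic below flags:

```python
# Total counting number for the all-triplets CVM on the complete graph K_n.
for n in (4, 5, 6, 8, 10):
    nodes = range(n)
    regions = ([frozenset(t) for t in combinations(nodes, 3)]
               + [frozenset(p) for p in combinations(nodes, 2)]
               + [frozenset([i]) for i in nodes])
    c = counting_numbers(regions)
    print(n, sum(c.values()))  # closed form: n(n-1)(n-5)/6 + n; > n for n >= 6
```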

Two Simple Distributions
Equiprobable distribution: each state is equally probable for all beliefs.
–e.g. $b_r(\mathbf{x}_r) = 1/2^{|r|}$ for every region (binary variables).
"Equipolarized" distribution: obtained by marginalizing the global distribution that only allows all-zeros or all-ones.
–e.g. each region gives $b_r(\text{all-zeros}) = b_r(\text{all-ones}) = 1/2$, and zero to every other state.

Surprise #3 (cont.)
Surprisingly, the "equipolarized" distribution can have a greater entropy than the equiprobable distribution for some valid CVM approximations.
This is a serious problem: if the approximation gets the wrong answer without any interactions, it can't be expected to be correct with interactions.

The Fix: Consider Only Maxent-Normal Approximations
A maxent-normal approximation is one whose entropy is maximized when the beliefs correspond to the equiprobable distribution.
Bethe approximations are provably maxent-normal.
Some other CVM and region-graph approximations are also provably maxent-normal.
–e.g. using quartets on square lattices
–Empirically, these approximations give good results.
Other valid CVM or region-graph approximations are not maxent-normal.
–Empirically, these approximations give poor results.

A Very Simple Heuristic
Sum all the counting numbers.
–If the sum is greater than N, the equipolarized distribution will have a greater entropy than the equiprobable distribution. (Too Much: Very Bad)
–If the sum is less than 0, the equipolarized distribution will have a negative entropy. (Too Little)
–If the sum equals 1, the equipolarized distribution will have the correct entropy. (Just Right)
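Why the sum matters (a short justification, assuming binary variables): under the equipolarized distribution every region has exactly two equally likely states, while under the equiprobable distribution region r has entropy $|r| \ln 2$, where $|r|$ is the number of variables in r. Hence

$$H_{\text{equipolarized}} = \sum_r c_r \ln 2 = \Big(\sum_r c_r\Big) \ln 2, \qquad H_{\text{equiprobable}} = \sum_r c_r |r| \ln 2 = N \ln 2,$$

where the last step uses the validity condition that every variable node is counted once. Comparing $\sum_r c_r$ with N, 0, and 1 gives the three cases above; the true equipolarized distribution has entropy $\ln 2$, which is why a sum of exactly 1 is "just right."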

10x10 Ising Spin Glass
Random interactions, random fields.
[Results figure not reproduced in the transcript.]

Conclusions
–Standard BP is essentially equivalent to minimizing the Bethe free energy.
–The Bethe method and the cluster variation method are special cases of the more general region graph method for generating valid free energy approximations.
–GBP is essentially equivalent to minimizing a region graph free energy.
–One should be careful to use maxent-normal approximations. Sometimes you can prove that your approximation is maxent-normal, and simple heuristics can show when it isn't.