Presentation is loading. Please wait.

Presentation is loading. Please wait.

Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Similar presentations


Presentation on theme: "Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)"— Presentation transcript:

1 Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT) Yair Weiss (Hebrew University) See “Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms,” MERL TR2004-040, to be published in IEEE Trans. Info. Theory.

2 Outline Introduction to GBP –Quick Review of Standard BP –Free Energies –Region Graphs and Valid Approximations Some Surprises Maxent Normal Approximations & a useful heuristic

3 Factor Graphs AB C 12 34 (Kschischang, et.al. 2001)

4 Computing Marginal Probabilities Decoding error-correcting codes Inference in Bayesian networks Computer vision Statistical physics of magnets Fundamental for Non-trivial because of the huge number of terms in the sum.

5 Error-correcting Codes + + + Marginal Probabilities = A posteriori bit probabilities (Tanner, 1981 Gallager, 1963)

6 Statistical Physics Marginal Probabilities= local magnetization

7 Standard Belief Propagation i “beliefs” “messages” a The “belief” is the BP approximation of the marginal probability.

8 BP Message-update Rules Using we get a i ia =

9 Variational (Gibbs) Free Energy Kullback-Leibler Distance: “Boltzmann’s Law” (serves to define the “energy”): Variational Free Energy is minimized when

10 So we’ve replaced the intractable problem of computing marginal probabilities with the even more intractable problem of minimizing a variational free energy over the space of possible beliefs. But the point is that now we can introduce interesting approximations.

11 Region-based Approximations to the Variational Free Energy Exact: Regions: (intractable) (Kikuchi, 1951)

12 Defining a “Region” A region r is a set of variable nodes V r and factor nodes F r such that if a factor node a belongs to F r, all variable nodes neighboring a must belong to V r. Regions Not a Region

13 Region Definitions Region states: Region beliefs: Region energy: Region average energy: Region entropy: Region free energy:

14 “Valid” Approximations Introduce a set of regions R, and a counting number c r for each region r in R, such that c r =1 for the largest regions, and for every factor node a and variable node i, Count every node once! Indicator functions

15 Entropy and Energy : Counting each factor node once makes the approximate energy exact (if the beliefs are). : Counting each variable node once makes the approximate entropy reasonable (at least the entropy is correct when all states are equiprobable).

16 Comments We could actually use different counting numbers for energy and entropy—leads to “fractional BP” (Weigerinck & Heskes 2002) or “convexified free energy” (Wainwright et.al., 2002) algorithms. Of course, we also need to impose normalization and consistency constraints on our free energy.

17 Methods to Generate Valid Region-based Approximations Cluster Variational Method (Kikuchi) Junction graphs Aji-McEliece Region Graphs Bethe (Bethe is example of Kikuchi for Factor graphs with no 4-cycles; Bethe is example of Aji-McEliece for “normal” factor graphs.) Junction trees

18 Example of a Region Graph A,C,1,2,4,5 B,D,2,3,5,6 C,E,4,5,7,8 D,F,5,6,8,9 2,5 C,4,5 D,5,6 5,8 5 [5] is a child of [2,5]

19 Definition of a Region Graph Labeled, directed graph of regions. Arc may exist from region A to region B if B is a subset of A. Sub-graphs formed from regions containing a given node are connected. where is the set of ancestors of region r. (Mobius Function) We insist that

20 Bethe Method Two sets of regions: Large regions containing a single factor node a and all attached variable nodes. Small regions containing a single variable node i. 3 6 9 1 2 45 7 8 AB C D E F A;1,2,4,5 B;2,3,5,6 C;4,5 D;5,6 E;4,5,7,8 F;5,6,8,9 1 2 34 5 6 7 8 9 (after Bethe, 1935)

21 Bethe Approximation to Gibbs Free Energy Equal to the exact Gibbs free energy when the factor graph is a tree because in that case,

22 Minimizing the Bethe Free Energy

23 Bethe = BP Identify to obtain BP equations: i

24 Cluster Variation Method Form a region graph with an arbitrary number of different sized regions. Start with largest regions. Then find intersection regions of the largest regions, discarding any regions that are sub-regions of other intersection regions. Continue finding intersections of those intersection regions, etc. All intersection regions obey, where S(r ) is the set of super-regions of region r. (Kikuchi, 1951)

25 Region Graph Created Using CVM A,C,1,2,4,5 B,D,2,3,5,6 C,E,4,5,7,8 D,F,5,6,8,9 2,5 C,4,5 D,5,6 5,8 5

26 Minimizing a Region Graph Free Energy Minimization is possible, but it may be awkward because of all the constraints that must be satisfied. We introduce generalized belief propagation algorithms whose fixed points are provably identical to the stationary points of the region graph free energy.

27 Generalized Belief Propagation Belief in a region is the product of: –Local information (factors in region) –Messages from parent regions –Messages into descendant regions from parents who are not descendants. Message-update rules obtained by enforcing marginalization constraints.

28 58 235645785689 25 45 56 1245 5 Generalized Belief Propagation 1 2 3 4 5 6 7 8 9

29 58 235645785689 25 45 56 1245 5 Generalized Belief Propagation 1 2 3 4 5 6 7 8 9

30 58 235645785689 25 45 56 1245 5 Generalized Belief Propagation 1 2 3 4 5 6 7 8 9

31 58 235645785689 25 45 56 1245 5 Generalized Belief Propagation 1 2 3 4 5 6 7 8 9

32 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 = Use Marginalization Constraints to Derive Message-Update Rules

33 1 2 3 4 5 6 7 8 9 Generalized Belief Propagation 1 2 3 4 5 6 7 8 9 = Use Marginalization Constraints to Derive Message-Update Rules

34 1 2 3 4 5 6 7 8 9 Generalized Belief Propagation 1 2 3 4 5 6 7 8 9 = Use Marginalization Constraints to Derive Message-Update Rules

35 1 2 3 4 5 6 7 8 9 Generalized Belief Propagation 1 2 3 4 5 6 7 8 9 = Use Marginalization Constraints to Derive Message-Update Rules

36 (Mild) Surprise #1 Region beliefs (even those given by Bethe/BP) may not be realizable as the marginals of any global belief. a c b 1 2 3 is a perfectly acceptable solution to BP, but cannot arise from any

37 (Minor) Surprise #2 For some sets of beliefs (sometimes even when the marginal beliefs are the exactly correct ones), the Bethe entropy is negative! Each large region has counting number 1, each small region has counting number -2, so if the beliefs are then the entropy is negative:

38 (Serious) Surprise #3 When there are no interactions, the minimum of the CVM free energy should correspond to the equiprobable global distribution, but sometimes it doesn’t!

39 Example Fully connected pairwise model, using CVM and all triplets as the largest regions, with pairs and singlets as the intersection regions. Triplets Pairs Singlets # of regions Counting #

40 Two simple distributions Equiprobable distribution: each state is equally probable for all beliefs. –e.g. Distribution obtained by marginalizing the global distribution that only allows all-zeros or all-ones. “Equipolarized distribution” –e.g. Each region gives

41 Surprise #3 (cont.) Surprisingly, the “equipolarized” distribution can have a greater entropy than the equiprobable distribution for some valid CVM approximations. This is a serious problem, because if the model gets the wrong answer without any interactions, it can’t be expected to be correct with interactions.

42 The Fix: Consider only Maxent-Normal Approximations A maxent-normal approximation is one which gives a maximum of the entropy when the beliefs correspond to the equiprobable distribution. Bethe approximations are provably maxent-normal. Some other CVM and other region-graph approximations are also provably maxent-normal. –Using quartets on square lattices –Empirically, these approximations give good results Other valid CVM or region-graph approximations are not maxent-normal. –Empirically, these approximations give poor results

43 A Very Simple Heuristic Sum all the counting numbers. –If the sum is greater than N, the equipolarized distribution will have a greater entropy than the equiprobable distribution. (Too Much—Very Bad) –If the sum is less than 0, the equipolarized distribution will have a negative entropy. (Too Little) –If the sum equals 1, the equipolarized distribution will have the correct entropy. (Just Right)

44 10x10 Ising Spin Glass Random interactions Random fields

45

46 Conclusions Standard BP essentially equivalent to minimizing the Bethe free energy. Bethe method and cluster variation method are special cases of the more general region graph method for generating valid free energy approximations. GBP is essentially equivalent to minimizing region graph free energy. One should be careful to use maxent-normal approximations. Sometimes you can prove that your approximation is maxent-normal; and simple heuristics can be used to prove when it isn’t.


Download ppt "Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)"

Similar presentations


Ads by Google