Loopy Belief Propagation: a summary

What is inference?
Given:
–Observed variables Y
–Hidden variables X
–Some model of P(X,Y)
We want to perform some analysis of P(X|Y):
–Estimate the marginal P(S) for S ⊆ X
–Minimum Mean Squared Error configuration (MMSE); this is just E[X|Y]
–Maximum A Posteriori configuration (MAP)
–The N most likely configurations
–Minimum-variance estimate (MVUE)

Representing Structure in P(X,Y)
Often, P(X,Y) = ∏_k ψ_k(X_Ck), where X_Ck ⊆ X ∪ Y.
The same structure can be drawn as a Markov Random Field, a Bayes Net, or a Factor Graph:
–Markov Random Field: P(X) = f1(x1,x2,x3) · f2(x3,x4) · f3(x3,x5) / Z
–Bayes Net: P(X) = P(x3|x1,x2) · P(x4|x3) · P(x5|x3)
–Factor Graph: P(X) = f1(x1,x2,x3) · f2(x3,x4) · f3(x3,x5) · f4(x1) · f5(x2) / Z
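To make the factorization concrete, here is a minimal Python sketch (my illustration, not from the original slides) that builds the Markov Random Field version, P(X) = f1(x1,x2,x3)·f2(x3,x4)·f3(x3,x5)/Z, by brute force; the binary state space and random factor tables are assumptions for illustration only.

```python
# Minimal sketch: brute-force evaluation of the MRF factorization
# P(X) = f1(x1,x2,x3) * f2(x3,x4) * f3(x3,x5) / Z
# (binary variables, arbitrary/hypothetical factor tables).
import itertools
import numpy as np

rng = np.random.default_rng(0)
f1 = rng.random((2, 2, 2))   # f1(x1, x2, x3)
f2 = rng.random((2, 2))      # f2(x3, x4)
f3 = rng.random((2, 2))      # f3(x3, x5)

def unnormalized(x1, x2, x3, x4, x5):
    return f1[x1, x2, x3] * f2[x3, x4] * f3[x3, x5]

# The partition function Z sums the product of factors over all configurations.
Z = sum(unnormalized(*cfg) for cfg in itertools.product([0, 1], repeat=5))
P = {cfg: unnormalized(*cfg) / Z for cfg in itertools.product([0, 1], repeat=5)}
assert abs(sum(P.values()) - 1.0) < 1e-12
```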

Sum-Product Algorithm (aka belief update)
Suppose the factor graph is a tree. For the example tree on the slide, we have:
P(X) = f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)
Then marginalization (for example, computing P(x1)) can be sped up by exploiting the factorization:
P(x1) = Σ_{x2,x3,x4,x5,x6} f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)
      = Σ_{x2,x3,x4} f1(x1,x2) f2(x2,x3,x4) (Σ_{x5} f3(x3,x5)) (Σ_{x6} f4(x4,x6))
This quickly computes every single-variable marginal P(xn) of a tree graph.
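A small Python sketch of the same idea (my illustration, not the presenter's code): binary variables and random factor tables are assumed, and the factored sum is checked against brute-force marginalization.

```python
# Sketch: factored marginalization vs. brute-force summation for
# P(X) = f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)
# (binary variables, hypothetical random tables).
import itertools
import numpy as np

rng = np.random.default_rng(1)
f1 = rng.random((2, 2))       # f1(x1, x2)
f2 = rng.random((2, 2, 2))    # f2(x2, x3, x4)
f3 = rng.random((2, 2))       # f3(x3, x5)
f4 = rng.random((2, 2))       # f4(x4, x6)

# Brute force: sum the full product over x2..x6.
brute = np.zeros(2)
for x1, x2, x3, x4, x5, x6 in itertools.product([0, 1], repeat=6):
    brute[x1] += f1[x1, x2] * f2[x2, x3, x4] * f3[x3, x5] * f4[x4, x6]

# Factored: push the sums over x5 and x6 inward first.
g3 = f3.sum(axis=1)           # g3(x3) = sum_x5 f3(x3, x5)
g4 = f4.sum(axis=1)           # g4(x4) = sum_x6 f4(x4, x6)
factored = np.zeros(2)
for x1, x2, x3, x4 in itertools.product([0, 1], repeat=4):
    factored[x1] += f1[x1, x2] * f2[x2, x3, x4] * g3[x3] * g4[x4]

assert np.allclose(brute, factored)   # unnormalized marginals agree
```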

Message Passing for Sum-Product
We can compute every marginal P(xn) quickly using a system of message passing.
Message from variable node n to factor node m:
  v_{n,m}(xn) = ∏_{i ∈ N(n)\m} μ_{i,n}(xn)
Message from factor node m to variable node n:
  μ_{m,n}(xn) = Σ_{x_{N(m)\n}} [ f_m(x_{N(m)}) ∏_{i ∈ N(m)\n} v_{i,m}(xi) ]
Marginal:
  P(xn) ∝ ∏_{m ∈ N(n)} μ_{m,n}(xn)
Each node n can pass a message to a neighbor m only once it has received a message from all other adjacent nodes. Intuitively, each message from n to m represents P(xm | Sn), where Sn is the set of all children of node n.
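The two message types can be seen end-to-end on a tiny chain-structured factor graph. The sketch below is my illustration, not code from the talk; it assumes binary variables, random factor tables, and the chain x1 – fa – x2 – fb – x3, and verifies the resulting marginals against brute force.

```python
# Sum-product message passing on the chain  x1 -- fa(x1,x2) -- x2 -- fb(x2,x3) -- x3.
# Leaf variables send the constant message 1; each marginal is the product of
# incoming factor-to-variable messages, normalized.
import numpy as np

rng = np.random.default_rng(2)
fa = rng.random((2, 2))          # fa(x1, x2)
fb = rng.random((2, 2))          # fb(x2, x3)

one = np.ones(2)                 # message from a leaf variable node

# Factor-to-variable messages: sum out the factor's other argument,
# weighted by the incoming variable-to-factor message.
mu_fa_x2 = fa.T @ one            # mu_{fa->x2}(x2) = sum_x1 fa(x1,x2) * v_{x1->fa}(x1)
mu_fb_x2 = fb @ one              # mu_{fb->x2}(x2) = sum_x3 fb(x2,x3) * v_{x3->fb}(x3)
mu_fa_x1 = fa @ mu_fb_x2         # v_{x2->fa}(x2) = mu_{fb->x2}(x2); then sum over x2
mu_fb_x3 = fb.T @ mu_fa_x2       # v_{x2->fb}(x2) = mu_{fa->x2}(x2); then sum over x2

P_x1 = mu_fa_x1 / mu_fa_x1.sum()
P_x2 = (mu_fa_x2 * mu_fb_x2) / (mu_fa_x2 * mu_fb_x2).sum()
P_x3 = mu_fb_x3 / mu_fb_x3.sum()

# Check against brute-force marginalization of P(x1,x2,x3) ∝ fa(x1,x2) fb(x2,x3).
joint = np.einsum('ij,jk->ijk', fa, fb)
joint /= joint.sum()
assert np.allclose(P_x1, joint.sum(axis=(1, 2)))
assert np.allclose(P_x2, joint.sum(axis=(0, 2)))
assert np.allclose(P_x3, joint.sum(axis=(0, 1)))
```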

Max-Product Algorithm (aka belief revision)
Instead of summing P(X), we take the maximum to get the "maximal" (instead of the marginal):
M(x1) = max_{x2,x3,x4,x5,x6} f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)
      = max_{x2,x3,x4} f1(x1,x2) f2(x2,x3,x4) (max_{x5} f3(x3,x5)) (max_{x6} f4(x4,x6))
Use the same message-passing system, with sums replaced by maxima, to compute the maximal of each variable.
This quickly computes the Maximum A Posteriori configuration of a tree graph.
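As a sketch (same assumptions as before: binary variables, hypothetical random tables), max-product on the little chain is the sum-product computation with every sum replaced by a max:

```python
# Max-product on the chain x1 -- fa -- x2 -- fb -- x3: same message structure,
# sums replaced by maxima.  M(x1) is the "maximal" of x1.
import numpy as np

rng = np.random.default_rng(2)
fa = rng.random((2, 2))                      # fa(x1, x2)
fb = rng.random((2, 2))                      # fb(x2, x3)

m_fb_x2 = fb.max(axis=1)                     # max_x3 fb(x2, x3)
M_x1 = (fa * m_fb_x2[None, :]).max(axis=1)   # max_x2 fa(x1,x2) * max_x3 fb(x2,x3)

# Brute-force check: max over all (x2, x3) of fa(x1,x2) fb(x2,x3).
joint = np.einsum('ij,jk->ijk', fa, fb)
assert np.allclose(M_x1, joint.max(axis=(1, 2)))
```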

Computational Cost of Max-Product and Sum-Product
Each message is a vector of size M, where M is the number of states of the random variable.
–usually pretty small
Each variable → factor node message requires (N-2)·M multiplies, where N is the number of neighbors of the variable node.
–that's tiny
Each factor → variable node message requires summing over N-1 variables, each with M states, where N is the number of variables adjacent to the factor. Total computation per message is O(N · M^N).
–not bad, as long as there aren't any hub-like nodes
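For a sense of scale, a hypothetical case with M = 5 states and N = 3 neighbors works out as follows (illustrative numbers only, not from the slides):

```python
# Worked numbers for the per-message costs above.
M, N = 5, 3
variable_to_factor = (N - 2) * M      # (N-2)*M multiplies     -> 5
factor_to_variable = N * M ** N       # O(N * M**N) operations -> 375
print(variable_to_factor, factor_to_variable)
```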

What if the graph is not a tree?
Several alternative methods:
–Gibbs sampling
–Expectation Maximization
–Variational methods
–Elimination Algorithm
–Junction-Tree algorithm
–Loopy Belief Propagation

Elimination Algorithm
Inferring P(x1): sum out the remaining variables one at a time, in some chosen elimination order.

Loopy Belief Propagation
–Just apply the BP message-update rules in spite of the loops
–In each iteration, every node sends all of its messages in parallel
–Seems to work well for some applications, e.g., decoding turbo codes
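A minimal sketch of loopy BP with parallel updates, assuming a pairwise MRF on a single 3-cycle with binary variables and random potentials (my illustration, not the presenter's code); because of the loop the final beliefs are approximate in general, so the exact marginals are computed for comparison.

```python
# Loopy BP with parallel ("flooding") updates on a pairwise MRF whose graph is
# the 3-cycle x0 - x1 - x2 - x0 (binary variables, hypothetical potentials).
import itertools
import numpy as np

rng = np.random.default_rng(3)
edges = [(0, 1), (1, 2), (2, 0)]
psi = {e: rng.random((2, 2)) for e in edges}     # pairwise potentials psi_ij(xi, xj)
phi = [rng.random(2) for _ in range(3)]          # unary potentials phi_i(xi)

neighbors = {i: [j for e in edges for j in e if i in e and j != i] for i in range(3)}
m = {(i, j): np.ones(2) for i in range(3) for j in neighbors[i]}   # messages i -> j

for _ in range(50):                              # parallel message updates
    new_m = {}
    for (i, j) in m:
        e = (i, j) if (i, j) in psi else (j, i)
        incoming = np.prod([m[(k, i)] for k in neighbors[i] if k != j], axis=0)
        msg = np.array([
            sum(phi[i][xi]
                * (psi[e][xi, xj] if e == (i, j) else psi[e][xj, xi])
                * incoming[xi]
                for xi in (0, 1))
            for xj in (0, 1)])
        new_m[(i, j)] = msg / msg.sum()          # normalize for numerical stability
    m = new_m

beliefs = [phi[i] * np.prod([m[(k, i)] for k in neighbors[i]], axis=0) for i in range(3)]
beliefs = [b / b.sum() for b in beliefs]

# Exact marginals for comparison (the cycle makes the beliefs approximate).
joint = np.zeros((2, 2, 2))
for x in itertools.product([0, 1], repeat=3):
    joint[x] = np.prod([phi[i][x[i]] for i in range(3)]) * \
               np.prod([psi[e][x[e[0]], x[e[1]]] for e in edges])
joint /= joint.sum()
print(beliefs[0], joint.sum(axis=(1, 2)))
```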

Trouble with LBP
–May not converge (a variety of tricks can help)
–Cycling error: old information is mistaken for new
–Convergence error: unlike in a tree, a node's neighbors need not be independent, but LBP treats them as if they were
Bolt & van der Gaag, "On the convergence error in loopy propagation" (2004).
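One widely used trick is damping, i.e., blending each freshly computed message with its previous value; this is a standard heuristic rather than something prescribed by this presentation. Applied to the loopy BP sketch above:

```python
# Damping: a common stabilization heuristic for loopy BP (assumed, not from the slides).
def damp(old_msg, new_msg, alpha=0.5):
    """Return a damped message; alpha = 1.0 recovers plain (undamped) LBP."""
    return alpha * new_msg + (1 - alpha) * old_msg

# In the loopy BP sketch above, replace
#     new_m[(i, j)] = msg / msg.sum()
# with
#     new_m[(i, j)] = damp(m[(i, j)], msg / msg.sum())
```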

Good news about MAP in LBP
–For a single loop, the MAP values are correct (although the "maximals" are not)
–If LBP converges, the resulting MAP configuration has higher probability than any other configuration in the "Single Loops and Trees" (SLT) neighborhood
(The slide shows example SLT neighborhoods on a grid.)
Weiss & Freeman, "On the optimality of solutions of the max-product belief propagation algorithm in arbitrary graphs" (2001).

MMSE in LBP
–If P(X) is jointly Gaussian and LBP converges, it converges to the correct posterior means (the variances may still be wrong)
–For pairwise-connected Markov random fields, if LBP converges, its beliefs correspond to stationary points of the Bethe free energy
Weiss & Freeman, "Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology" (2001).
Yedidia, Freeman & Weiss, "Bethe free energy, Kikuchi approximations, and belief propagation algorithms" (2001).

Free Energy
Suppose we were able to compute the marginals of a probability distribution b(X) that closely approximated P(X|Y). We would want b(X) to resemble P(X|Y) as much as possible. The total free energy F of b(X) is the Kullback–Leibler divergence between b(X) and P(X|Y). However, F is difficult to compute. Also, the b(X) we are working with is often ill-defined.
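Writing P(X|Y) = ∏_k f_k(X_Ck) / Z, the divergence referred to here decomposes in the standard way; the grouping into average energy, entropy, and a constant below is the usual identity, reconstructed rather than copied from the slide.

```latex
D\bigl(b \,\|\, P\bigr)
  = \sum_{X} b(X)\,\ln\frac{b(X)}{P(X \mid Y)}
  = \underbrace{-\sum_{X} b(X)\sum_{k}\ln f_k(X_{C_k})}_{\text{average energy } U[b]}
    \;-\; \underbrace{\Bigl(-\sum_{X} b(X)\,\ln b(X)\Bigr)}_{\text{entropy } S[b]}
    \;+\; \ln Z
```

Since ln Z does not depend on b, minimizing F[b] = U[b] − S[b] over b is equivalent to minimizing the divergence itself.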

Kikuchi Free Energy
We can approximate the total free energy using the Kikuchi free energy:
1) Select a set of clusters of nodes of the factor graph.
   a) Every node must be in at least one cluster.
   b) If a factor node is in a cluster, all of its adjacent variable nodes must also be included.
2) For each cluster of variables S_i, compute its free energy F[b(S_i)], the KL divergence between b(S_i) and the marginal P(S_i|Y), and sum these together.
3) The intersections between the clusters S_i have now been double-counted, so subtract the free energy of the intersections; repeat for intersections of intersections, and so on.
The Bethe free energy is the Kikuchi free energy obtained by starting with all clusters of size 2.

More advanced algorithms
Greater accuracy, at a price:
–Generalized Belief Propagation algorithms have been developed to minimize the Kikuchi free energy (Yedidia, Freeman & Weiss, 2004); the junction-tree algorithm is a special case
–Alan Yuille (2000) has devised a message-passing algorithm that minimizes the Bethe free energy and is guaranteed to converge
–Other groups are working on fast and robust Bethe minimization (Pretti & Pelizzola, 2003)