Belief Propagation in a Continuous World
Andrew Frank – 11/02/2009
Joint work with Alex Ihler and Padhraic Smyth
Graphical Models
– Nodes represent random variables.
– Edges represent dependencies.
[Figure: example graphs over nodes A, B, C]
Markov Random Fields
[Figure: undirected graphs over nodes A, B, C, D, E]
– B ⊥ E | C, D
– A ⊥ C | B
Factoring Probability Distributions
– Independence relations ↔ factorization
[Figure: graph over nodes A, B, C, D]
p(A,B,C,D) = f(A) f(B) f(C) f(D) f(A,B) f(B,C) f(B,D)
Toy Example: A Day in Court
[Figure: graph over nodes W, A, E, V with edges W–E, A–E, E–V]
– A, E, W ∈ {“Innocent”, “Guilty”}
– V ∈ {“Not guilty verdict”, “Guilty verdict”}
Inference
– Most probable explanation: x* = argmax_x p(x1, …, xn)
– Marginalization: p(xs) = Σ_{x \ xs} p(x1, …, xn)
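For a concrete sense of both operations, here is a brute-force sketch on the four-variable model from the factorization slide. The binary domains and factor values below are invented for illustration; enumerating every state is only feasible for toy models, which is exactly why message-passing algorithms matter.

```python
import itertools
import numpy as np

# Invented binary factors matching the slides' factorization:
# p(A,B,C,D) = (1/Z) f(A) f(B) f(C) f(D) f(A,B) f(B,C) f(B,D)
fA = np.array([0.6, 0.4])
fB = np.array([0.7, 0.3])
fC = np.array([0.5, 0.5])
fD = np.array([0.2, 0.8])
fAB = np.array([[0.9, 0.1], [0.2, 0.8]])
fBC = np.array([[0.5, 0.5], [0.3, 0.7]])
fBD = np.array([[0.6, 0.4], [0.1, 0.9]])

def unnorm_p(a, b, c, d):
    """Unnormalized joint: the product of all node and edge factors."""
    return fA[a] * fB[b] * fC[c] * fD[d] * fAB[a, b] * fBC[b, c] * fBD[b, d]

states = list(itertools.product([0, 1], repeat=4))
Z = sum(unnorm_p(*x) for x in states)          # partition function

# Marginalization: p(B) sums the joint over A, C, D.
pB = np.zeros(2)
for a, b, c, d in states:
    pB[b] += unnorm_p(a, b, c, d)
pB /= Z

# Most probable explanation: the single highest-scoring joint state.
mpe = max(states, key=lambda x: unnorm_p(*x))

print("p(B) =", pB, "   MPE (A,B,C,D) =", mpe)
```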
Iterative Message Updates
– Message from t to s:  m_ts(x_s) = Σ_{x_t} f_t(x_t) f_ts(x_t, x_s) Π_{u ∈ Γ(t)\s} m_ut(x_t)
– Belief at t:  b_t(x_t) ∝ f_t(x_t) Π_{u ∈ Γ(t)} m_ut(x_t)
Belief Propagation
[Figure: messages m_AE(E), m_WE(E), m_EV(V) passed along the edges of the court graph W, A, E, V]
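As a companion to the figure, here is a minimal sum-product pass on the court graph, rooted at V. The numeric potentials are invented; the point is that on a tree, one sweep of the messages m_AE(E), m_WE(E), m_EV(V) gives the exact marginal at V.

```python
import numpy as np

# Invented potentials over binary states (0/1) for the court example.
fA = np.array([0.9, 0.1])          # local factor on A
fW = np.array([0.7, 0.3])          # local factor on W
fAE = np.array([[0.8, 0.2],        # pairwise factor f(A, E)
                [0.3, 0.7]])
fWE = np.array([[0.6, 0.4],        # pairwise factor f(W, E)
                [0.2, 0.8]])
fEV = np.array([[0.95, 0.05],      # pairwise factor f(E, V)
                [0.10, 0.90]])

# Leaf-to-root messages (root = V):
m_AE = fAE.T @ fA                  # m_AE(E) = sum_A f(A) f(A,E)
m_WE = fWE.T @ fW                  # m_WE(E) = sum_W f(W) f(W,E)
m_EV = fEV.T @ (m_AE * m_WE)       # m_EV(V) = sum_E f(E,V) m_AE(E) m_WE(E)

p_V = m_EV / m_EV.sum()            # belief at V = normalized incoming message
print("p(V) =", p_V)
```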
Loopy BP
[Figure: a graph over A, B, C, D containing a loop]
– Does this work? Does it make any sense?
A Variational Perspective
Reformulate the problem:
[Figure: the true distribution P lies outside the set of “tractable” distributions; Q is the best tractable approximation]
– Find Q to minimize the divergence from P.
Choose an Approximating Family
Desired traits:
– Simple enough to enable easy computation
– Complex enough to represent P
e.g. Fully factored: Q(X1, X2, …, Xn) = f(X1) f(X2) … f(Xn)
     Structured: Q factorizes over a tractable subgraph, e.g. a tree.
Choose a Divergence Measure
– Kullback-Leibler divergence: KL(Q || P) = Σ_x Q(x) log [ Q(x) / P(x) ]
– Alpha divergence: D_α(P || Q) = (1 − Σ_x P(x)^α Q(x)^(1−α)) / (α(1−α))
– Common choices: α → 0 recovers KL(Q || P); α → 1 recovers KL(P || Q).
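A quick numeric check of the two divergences on discrete distributions; p and q below are arbitrary example distributions. Near α = 0 and α = 1 the α-divergence approaches the two KL divergences, which is what connects it to the algorithms on the next slide.

```python
import numpy as np

def kl(q, p):
    """KL(q || p) for discrete distributions with strictly positive entries."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    return np.sum(q * np.log(q / p))

def alpha_div(p, q, alpha):
    """Alpha-divergence D_alpha(p || q) for normalized discrete p, q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return (1.0 - np.sum(p**alpha * q**(1.0 - alpha))) / (alpha * (1.0 - alpha))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# Near the limits, the alpha-divergence matches the two KL divergences:
print(alpha_div(p, q, 1e-4),     "≈", kl(q, p))   # alpha → 0: KL(q || p)
print(alpha_div(p, q, 1 - 1e-4), "≈", kl(p, q))   # alpha → 1: KL(p || q)
```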
Behavior of α-Divergence
Source: T. Minka. Divergence measures and message passing. Technical Report MSR-TR, Microsoft Research, 2005.
Resulting Algorithms
Assuming a fully-factored form of Q, Q(X1, X2, …, Xn) = f(X1) f(X2) … f(Xn), we get…*
– Mean field: α = 0
– Belief propagation: α = 1
– Tree-reweighted BP: α ≥ 1
* By minimizing the “local divergence” at each update.
Local vs. Global Minimization
Source: T. Minka. Divergence measures and message passing. Technical Report MSR-TR, Microsoft Research, 2005.
Applications
Sensor Localization
[Figure: a network of sensors A, B, C to be localized]
Protein Side Chain Placement
[Figure: amino acid sequence (RTDCYGN) plus the side-chain conformations to be placed]
Common traits?
– Continuous state space: sensor positions and side-chain configurations are naturally continuous-valued.
Easy Solution: Discretize!
[Figure: a continuous domain gridded into bins, e.g. 10 bins per dimension; a finer grid gives domain size d = 400]
– Each message update costs O(d²).
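The cost claim is easy to see in code: once a continuous variable is binned into d states, each message update is a d × d matrix-vector product. The Gaussian-shaped pairwise potential below is an invented example.

```python
import numpy as np

d = 400                                   # number of discretization bins
grid = np.linspace(-1.0, 1.0, d)          # bin centers for x_t and x_s

# Hypothetical pairwise potential favoring nearby values (a smoothness prior).
F = np.exp(-0.5 * (grid[:, None] - grid[None, :])**2 / 0.1**2)   # f(x_t, x_s)

h = np.ones(d) / d                        # product of incoming messages at t
m_ts = F.T @ h                            # message to s: O(d^2) work
m_ts /= m_ts.sum()
```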
Particle BP
– We’d like to pass “continuous messages” m_AB(B)…
– Instead, pass discrete messages over sets of particles: draw {b^(i)} ~ W_B(B) and compute m_AB({b^(i)}) at the particles b^(1), b^(2), …, b^(N).
PBP: Computing the Messages
– Re-write the message update as an expectation over the proposal W_t:
  m_ts(x_s) = E_{x_t ~ W_t} [ f_t(x_t) f_ts(x_t, x_s) Π_{u ∈ Γ(t)\s} m_ut(x_t) / W_t(x_t) ]
– Finite-sample approximation, using particles {x_t^(i)} ~ W_t:
  m̂_ts(x_s) = (1/N) Σ_i f_t(x_t^(i)) f_ts(x_t^(i), x_s) Π_{u ∈ Γ(t)\s} m_ut(x_t^(i)) / W_t(x_t^(i))
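The finite-sample estimate is an ordinary importance-sampling average. Below is a sketch for a single message with invented Gaussian potentials and a standard-normal proposal; `incoming` is a stand-in for the product of messages from t's other neighbors.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100

def f_t(x):            # local potential at node t (invented)
    return np.exp(-0.5 * x**2)

def f_ts(xt, xs):      # pairwise potential between t and s (invented)
    return np.exp(-0.5 * (xt - xs)**2 / 0.5**2)

def W_t(x):            # proposal density at node t (standard normal here)
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def incoming(xt):      # product of messages from t's other neighbors (stub)
    return np.ones_like(xt)

xt = rng.normal(size=N)                 # particles {x_t^(i)} ~ W_t
xs = np.linspace(-2, 2, 5)              # particle locations at node s

# m_hat_ts(x_s^(j)) = (1/N) sum_i f_t f_ts(., x_s^(j)) * incoming / W_t
w = f_t(xt) * incoming(xt) / W_t(xt)            # per-particle weights
m_hat = (f_ts(xt[:, None], xs[None, :]) * w[:, None]).mean(axis=0)
print(m_hat)
```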
Choosing “Good” Proposals
[Figure: graph over A, B, C, D]
– The proposal should “match” the integrand.
– Sample from the belief: W_t(x_t) ∝ f_t(x_t) Π_{u ∈ Γ(t)} m̂_ut(x_t)
Iteratively Refine Particle Sets
(1) Draw a set of particles, {x_s^(i)} ~ W_s(x_s).
(2) Run discrete inference over the particle discretization.
(3) Adjust W_s(x_s) and repeat.
[Figure: nodes X_s and X_t joined by f(x_s, x_t), with steps (1)–(3) illustrated]
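A two-node sketch of this loop, with all potentials invented. Step (3) here refits each proposal as a Gaussian by importance-weighted moment matching, which is just one simple way to "adjust W_s"; the slides leave the adjustment strategy open.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200

def f_s(x):  return np.exp(-0.5 * (x - 1.0)**2)        # local factor at s (invented)
def f_t(x):  return np.exp(-0.5 * (x + 1.0)**2)        # local factor at t (invented)
def f_st(xs, xt): return np.exp(-0.5 * (xs - xt)**2)   # pairwise factor (invented)

def gauss(x, mu, sig):
    return np.exp(-0.5 * ((x - mu) / sig)**2) / (sig * np.sqrt(2 * np.pi))

mu_s = mu_t = 0.0                      # start from broad proposals
sig_s = sig_t = 3.0

for it in range(5):
    # (1) Draw particles from the current proposals W_s, W_t.
    xs, xt = rng.normal(mu_s, sig_s, N), rng.normal(mu_t, sig_t, N)
    Ws, Wt = gauss(xs, mu_s, sig_s), gauss(xt, mu_t, sig_t)

    # (2) Discrete (importance-weighted) BP over the particle sets.
    F = f_st(xs[:, None], xt[None, :])          # F[i, j] = f(xs_i, xt_j)
    m_ts = F @ (f_t(xt) / Wt) / N               # message t -> s, at {xs}
    m_st = F.T @ (f_s(xs) / Ws) / N             # message s -> t, at {xt}
    b_s, b_t = f_s(xs) * m_ts, f_t(xt) * m_st   # unnormalized beliefs

    # (3) Adjust the proposals: refit each as a Gaussian to its belief.
    w_s = b_s / Ws; w_s /= w_s.sum()
    w_t = b_t / Wt; w_t /= w_t.sum()
    mu_s = np.sum(w_s * xs); sig_s = np.sqrt(np.sum(w_s * (xs - mu_s)**2)) + 1e-3
    mu_t = np.sum(w_t * xt); sig_t = np.sqrt(np.sum(w_t * (xt - mu_t)**2)) + 1e-3

print(f"belief at s ~ N({mu_s:.2f}, {sig_s:.2f}^2), belief at t ~ N({mu_t:.2f}, {sig_t:.2f}^2)")
```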
Benefits of PBP
– No distributional assumptions.
– Easy accuracy/speed trade-off.
– Relies on an “embedded” discrete algorithm: belief propagation, mean field, tree-reweighted BP, …
Exploring PBP: A Simple Example
[Figure: pairwise potential plotted as a function of ||x_s – x_t||]
Continuous Ising Model Marginals
[Figure: approximate vs. exact marginals for mean field (PBP, α = 0), PBP (α = 1), and TRW PBP (α = 1.5)*]
* Run with 100 particles per node
A Localization Scenario
Exact Marginal
PBP Marginal
Tree-reweighted PBP Marginal
Estimating the Partition Function
– Mean field provides a lower bound on Z.
– Tree-reweighted BP provides an upper bound on Z.
p(A,B,C,D) = (1/Z) f(A) f(B) f(C) f(D) f(A,B) f(B,C) f(B,D)
Z = Σ_{A,B,C,D} f(A) f(B) f(C) f(D) f(A,B) f(B,C) f(B,D)
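For intuition about the lower bound: for any fully-factored Q, Jensen's inequality gives log Z ≥ Σ_x Q(x) log[ p̃(x) / Q(x) ], which is the quantity mean field maximizes. A tiny two-variable check is sketched below, with invented factor values; the tree-reweighted upper bound is not shown here.

```python
import itertools
import numpy as np

# Invented binary model: p(A,B) = (1/Z) f(A) f(B) f(A,B).
fA, fB = np.array([0.6, 0.4]), np.array([0.7, 0.3])
fAB = np.array([[0.9, 0.1], [0.2, 0.8]])

def p_tilde(a, b):                       # unnormalized joint
    return fA[a] * fB[b] * fAB[a, b]

# Exact partition function by enumeration.
Z = sum(p_tilde(a, b) for a, b in itertools.product([0, 1], repeat=2))

# A fully-factored Q (not even optimized); the bound still holds.
qA, qB = np.array([0.5, 0.5]), np.array([0.6, 0.4])
lower = 0.0
for a, b in itertools.product([0, 1], repeat=2):
    q = qA[a] * qB[b]
    lower += q * np.log(p_tilde(a, b) / q)    # E_Q[log p_tilde] + H(Q)

print("lower bound:", lower, "<= log Z =", np.log(Z))
```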
Partition Function Bounds
Conclusions
– BP and related algorithms are useful!
– Particle BP lets you handle continuous RVs.
– Extensions to BP can work with PBP, too.
Thank You!