Discrete Optimization Lecture 4 – Part 2 M. Pawan Kumar Slides available online
MRF (figure: variables $V_1, \dots, V_9$ with data $d_1, \dots, d_9$)
A is conditionally independent of B given C if there is no path from A to B when C is removed.
In particular, $V_a$ is conditionally independent of every other variable $V_b$ given $V_a$'s neighbors.
Pairwise MRF (figure: variables $V_1, \dots, V_9$ with data $d_1, \dots, d_9$)
Unary potential, for example $\psi_1(v_1, d_1)$. Pairwise potential, for example $\psi_{56}(v_5, v_6)$.
Probability: $P(v, d) = \frac{1}{Z} \prod_a \psi_a(v_a, d_a) \prod_{(a,b)} \psi_{ab}(v_a, v_b)$
$Z$ is known as the partition function.
Inference
Maximum a Posteriori (MAP) Estimation: $\max_v P(v)$
Energy Minimization: $\min_v Q(v)$, where $P(v) = \exp(-Q(v))/Z$
Computing Marginals: $P(v_a = l_i) = \sum_v P(v)\,\delta(v_a = l_i)$ and $P(v_a = l_i, v_b = l_k) = \sum_v P(v)\,\delta(v_a = l_i)\,\delta(v_b = l_k)$
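The three inference problems above can be checked by brute force on a tiny model. The sketch below is only an illustration: the chain a - b - c, the two labels, and every energy value are made-up assumptions, not values from the slides. It builds $P(v) = \exp(-Q(v))/Z$ by enumerating all labelings, then reads off the MAP labeling, the minimum-energy labeling, and a marginal.

```python
from itertools import product
from math import exp

# Hypothetical model (not from the slides): chain a - b - c, two labels each.
variables = ["a", "b", "c"]
labels = [0, 1]
unary_energy = {"a": [0.5, 1.0], "b": [0.2, 0.7], "c": [1.0, 0.4]}
disagree = [[0.0, 1.0], [1.0, 0.0]]  # cost 1 when neighbouring labels differ
pair_energy = {("a", "b"): disagree, ("b", "c"): disagree}

def energy(v):
    """Q(v): sum of unary and pairwise energies for a labeling v (a dict)."""
    q = sum(unary_energy[a][v[a]] for a in variables)
    q += sum(e[v[a]][v[b]] for (a, b), e in pair_energy.items())
    return q

# Enumerate all h^n labelings; this is the brute-force cost BP avoids.
labelings = [dict(zip(variables, ls)) for ls in product(labels, repeat=len(variables))]
Z = sum(exp(-energy(v)) for v in labelings)  # partition function

def prob(v):
    """P(v) = exp(-Q(v)) / Z."""
    return exp(-energy(v)) / Z

map_labeling = max(labelings, key=prob)           # MAP estimation
min_energy_labeling = min(labelings, key=energy)  # energy minimization

def marginal(a, i):
    """P(v_a = l_i) = sum_v P(v) * delta(v_a = l_i)."""
    return sum(prob(v) for v in labelings if v[a] == i)

print(map_labeling, min_energy_labeling, [marginal("b", i) for i in labels])
```

Because $\exp(-Q)$ is decreasing in $Q$, the MAP labeling and the minimum-energy labeling coincide, which is why the slide lists them together.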
Outline: Belief Propagation on Chains; Belief Propagation on Trees; Loopy Belief Propagation
Overview (chain $V_a - V_b - V_c - V_d$)
Compute the marginal probability for $V_d$ using $P(v) = P(v_a|v_b)P(v_b|v_c)P(v_c|v_d)P(v_d)$.
Compute the (unnormalized) distribution by summing out one variable at a time:
$m(v_b) = \sum_{v_a} \Psi_a(v_a)\Psi_{ab}(v_a, v_b)$, a function of $v_b$
$m(v_c) = \sum_{v_b} \Psi_b(v_b)\Psi_{bc}(v_b, v_c)\,m(v_b)$, a function of $v_c$
$\sum_{v_c} \Psi_c(v_c)\Psi_{cd}(v_c, v_d)\,m(v_c)$ gives the (unnormalized) marginals!!
Compute the marginal probability for $V_c$ using $P(v) = P(v_a|v_b)P(v_b|v_c)P(v_d|v_c)P(v_c)$. Several common terms!!
Compute the marginal probability for $V_b$ using $P(v) = P(v_a|v_b)P(v_c|v_b)P(v_d|v_c)P(v_b)$.
Compute the marginal probability for $V_a$ using $P(v) = P(v_b|v_a)P(v_c|v_b)P(v_d|v_c)P(v_a)$.
Belief Propagation on Chains: computes exact marginals; avoids re-computing common terms.
Two Variables ($V_a - V_b$)
Unary potentials: $\psi_a(l_i)$. Pairwise potentials: $\psi_{ab}(l_i, l_k)$.
Marginal probability: $P(v_b = l_j) = \sum_i \psi_a(l_i)\psi_b(l_j)\psi_{ab}(l_i, l_j)/Z$
Un-normalized marginal probability: $P'(v_b = l_j) = \sum_i \psi_a(l_i)\psi_b(l_j)\psi_{ab}(l_i, l_j) = \psi_b(l_j)\sum_i \psi_a(l_i)\psi_{ab}(l_i, l_j)$
Message: $M_{ab;j} = \sum_i \psi_a(l_i)\psi_{ab}(l_i, l_j)$, so $P'(v_b = l_j) = \psi_b(l_j)M_{ab;j}$
Example (figure): $M_{ab;0} = 2 \times 3 + 5 \times 1 = 11$ and $M_{ab;1} = 2 \times 1 + 5 \times 3 = 17$
$P'(v_b = l_0) = 22$, $P'(v_b = l_1) = 68$
$Z = \sum_j P'(v_b = l_j) = 90$
$P(v_b = l_j) = \psi_b(l_j)M_{ab;j}/Z$, so $P(v_b = l_0) = 0.244\ldots$ and $P(v_b = l_1) = 0.755\ldots$
Cost: $O(h^2)$!! Same as brute-force.
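The two-variable computation can be reproduced in a few lines. This is a sketch under assumptions: the potential values below are not read from the slide figure; they are hypothetical values picked so that the messages (11, 17), the un-normalized marginals (22, 68), and $Z = 90$ quoted on the slides come out.

```python
# Hypothetical potentials (assumed, chosen to reproduce the quoted numbers).
h = 2                        # number of labels per variable
psi_a = [5, 2]               # psi_a(l_i)
psi_b = [2, 4]               # psi_b(l_j)
psi_ab = [[1, 3], [3, 1]]    # psi_ab[i][j] = psi_ab(l_i, l_j)

# Message from a to b: M_ab;j = sum_i psi_a(l_i) * psi_ab(l_i, l_j)
M_ab = [sum(psi_a[i] * psi_ab[i][j] for i in range(h)) for j in range(h)]

# Un-normalized marginal: P'(v_b = l_j) = psi_b(l_j) * M_ab;j
P_unnorm = [psi_b[j] * M_ab[j] for j in range(h)]

Z = sum(P_unnorm)                # partition function
P = [p / Z for p in P_unnorm]    # true marginal of V_b

# Brute-force check: sum the joint over v_a directly.
brute = [sum(psi_a[i] * psi_b[j] * psi_ab[i][j] for i in range(h)) for j in range(h)]

print(M_ab, P_unnorm, Z, P)
```

Both routes touch every $(i, j)$ pair once, which is why the slide notes that with only two variables the cost is the same as brute force.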
Three Variables (chain $V_a - V_b - V_c$)
$P'(v_c = l_k) = \sum_j \sum_i \psi_a(l_i)\psi_b(l_j)\psi_c(l_k)\psi_{ab}(l_i, l_j)\psi_{bc}(l_j, l_k)$
$= \psi_c(l_k)\sum_j \sum_i \psi_a(l_i)\psi_b(l_j)\psi_{ab}(l_i, l_j)\psi_{bc}(l_j, l_k)$
$= \psi_c(l_k)\sum_j \psi_b(l_j)\sum_i \psi_a(l_i)\psi_{ab}(l_i, l_j)\psi_{bc}(l_j, l_k)$
$= \psi_c(l_k)\sum_j \psi_b(l_j)\psi_{bc}(l_j, l_k)\sum_i \psi_a(l_i)\psi_{ab}(l_i, l_j)$, where the inner sum is $M_{ab;j}$ (values 11 and 17)
$= \psi_c(l_k)\sum_j \psi_b(l_j)\psi_{bc}(l_j, l_k)M_{ab;j}$, where the sum over $j$ is $M_{bc;k}$
$= \psi_c(l_k)M_{bc;k}$
NOTE: $M_{bc;k}$ "includes" $M_{ab;j}$.
Example (figure): $M_{bc;0} = 156$, $M_{bc;1} = 146$, $Z = 156 \times 3 + 146 \times 6 = 1344$
$P(v_c = 0) = 0.35$, $P(v_c = 1) = 0.65$
Cost: $O(nh^2)$. Better than brute-force.
Three Variables: what about $P(v_b = l_j)$?
$P'(v_b = l_j) = \sum_k \sum_i \psi_a(l_i)\psi_b(l_j)\psi_c(l_k)\psi_{ab}(l_i, l_j)\psi_{bc}(l_j, l_k)$
$= \psi_b(l_j)\sum_k \sum_i \psi_a(l_i)\psi_c(l_k)\psi_{ab}(l_i, l_j)\psi_{bc}(l_j, l_k)$
$= \psi_b(l_j)\sum_k \psi_c(l_k)\psi_{bc}(l_j, l_k)\sum_i \psi_a(l_i)\psi_{ab}(l_i, l_j)$, where the inner sum is $M_{ab;j}$
$= \psi_b(l_j)M_{ab;j}\sum_k \psi_c(l_k)\psi_{bc}(l_j, l_k)$, where the remaining sum is $M_{cb;j}$
$= \psi_b(l_j)M_{ab;j}M_{cb;j}$
NOTE: $M_{cb;j}$ does not "include" $M_{bc;k}$.
Example (figure): $Z = 11 \times 12 \times 4 + 17 \times 24 \times 2 = 1344$
$P(v_b = 0) = 0.39$, $P(v_b = 1) = 0.61$
Cost: $O(nh^2)$. Better than brute-force.
Three Variables: what about $P(v_a = l_i)$?
$P'(v_a = l_i) = \sum_j \sum_k \psi_a(l_i)\psi_b(l_j)\psi_c(l_k)\psi_{ab}(l_i, l_j)\psi_{bc}(l_j, l_k)$
$= \psi_a(l_i)\sum_j \sum_k \psi_b(l_j)\psi_c(l_k)\psi_{ab}(l_i, l_j)\psi_{bc}(l_j, l_k)$
$= \psi_a(l_i)\sum_j \psi_b(l_j)\psi_{ab}(l_i, l_j)\sum_k \psi_c(l_k)\psi_{bc}(l_j, l_k)$, where the inner sum is $M_{cb;j}$
$= \psi_a(l_i)\sum_j \psi_b(l_j)\psi_{ab}(l_i, l_j)M_{cb;j}$, where the sum over $j$ is $M_{ba;i}$
$= \psi_a(l_i)M_{ba;i}$
NOTE: $M_{ba;i}$ "includes" $M_{cb;j}$.
Example (figure): $M_{ba;i} = 192$, $Z = 192 \times 2 + 192 \times 5 = 1344$
$P(v_a = 0) = 0.71$, $P(v_a = 1) = 0.29$
Cost: $O(nh^2)$. Better than brute-force.
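The three derivations above share their messages, which is easiest to see when all of them are computed together. The sketch below is an assumption-laden illustration: the potentials are hypothetical values inferred so that the messages (11, 17), (156, 146), (12, 24), (192, 192), the partition function 1344, and the marginals quoted on the slides are reproduced. They are not read from the slide figure.

```python
# Hypothetical potentials for the chain a - b - c (assumed, not from the figure).
h = 2
psi_a, psi_b, psi_c = [5, 2], [4, 2], [3, 6]
psi_ab = [[1, 3], [3, 1]]   # psi_ab[i][j]
psi_bc = [[2, 1], [2, 3]]   # psi_bc[j][k]
R = range(h)

# Left-to-right messages.
M_ab = [sum(psi_a[i] * psi_ab[i][j] for i in R) for j in R]
M_bc = [sum(psi_b[j] * psi_bc[j][k] * M_ab[j] for j in R) for k in R]  # "includes" M_ab

# Right-to-left messages.
M_cb = [sum(psi_c[k] * psi_bc[j][k] for k in R) for j in R]
M_ba = [sum(psi_b[j] * psi_ab[i][j] * M_cb[j] for j in R) for i in R]  # "includes" M_cb

# Un-normalized marginals: unary potential times all incoming messages.
Pc = [psi_c[k] * M_bc[k] for k in R]
Pb = [psi_b[j] * M_ab[j] * M_cb[j] for j in R]
Pa = [psi_a[i] * M_ba[i] for i in R]

# Every variable yields the same partition function.
Z_a, Z_b, Z_c = sum(Pa), sum(Pb), sum(Pc)

def normalize(p):
    z = sum(p)
    return [x / z for x in p]

print(M_ab, M_bc, M_cb, M_ba)
print(Z_a, Z_b, Z_c)
print(normalize(Pa), normalize(Pb), normalize(Pc))
```

Four messages of $h^2$ work each give all three marginals, whereas brute force would enumerate $h^3$ labelings per marginal.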
Belief Propagation on Chains
Start from left, go to right. For the current edge $(a,b)$, compute $M_{ab;k} = \sum_i \psi_a(l_i)\psi_{ab}(l_i, l_k)\prod_{n \neq b} M_{na;i}$, where $n$ ranges over the neighbors of $a$. Repeat till the end of the chain.
Start from right, go to left. For the current edge $(a,b)$, compute $M_{ab;k} = \sum_i \psi_a(l_i)\psi_{ab}(l_i, l_k)\prod_{n \neq b} M_{na;i}$. Repeat till the end of the chain.
Belief Propagation on Chains
$P'(v_a = l_i, v_b = l_j) = \psi_a(l_i)\psi_b(l_j)\psi_{ab}(l_i, l_j)\prod_{n \neq b} M_{na;i}\prod_{n \neq a} M_{nb;j}$
$P'(v_a = l_i) = \psi_a(l_i)\prod_n M_{na;i}$
Normalize to compute true marginals.
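The two passes and the marginal formulas can be sketched as one function. Assumptions in this sketch: variables are indexed 0 to n-1 along the chain, all variables share the same number of labels, `pairwise[a]` holds the potential for the edge between variables `a` and `a+1`, and the function name `chain_bp` and the example potentials are hypothetical.

```python
def chain_bp(unary, pairwise):
    """Sum-product BP on a chain.

    unary[a][i]       = psi_a(l_i)
    pairwise[a][i][k] = psi_{a,a+1}(l_i, l_k)
    Returns (node marginals, edge marginals for edges (a, a+1)).
    """
    n, h = len(unary), len(unary[0])
    fwd = [[1.0] * h for _ in range(n)]  # fwd[a]: message into a from its left neighbour
    bwd = [[1.0] * h for _ in range(n)]  # bwd[a]: message into a from its right neighbour

    # Left-to-right pass.
    for a in range(n - 1):
        for k in range(h):
            fwd[a + 1][k] = sum(unary[a][i] * pairwise[a][i][k] * fwd[a][i] for i in range(h))

    # Right-to-left pass.
    for a in range(n - 1, 0, -1):
        for k in range(h):
            bwd[a - 1][k] = sum(unary[a][i] * pairwise[a - 1][k][i] * bwd[a][i] for i in range(h))

    # Node marginals: unary potential times both incoming messages, normalized.
    node = []
    for a in range(n):
        p = [unary[a][i] * fwd[a][i] * bwd[a][i] for i in range(h)]
        z = sum(p)
        node.append([x / z for x in p])

    # Edge marginals: both unaries, the edge potential, and the messages
    # arriving from outside the edge.
    edge = []
    for a in range(n - 1):
        p = [[unary[a][i] * unary[a + 1][j] * pairwise[a][i][j] * fwd[a][i] * bwd[a + 1][j]
              for j in range(h)] for i in range(h)]
        z = sum(map(sum, p))
        edge.append([[x / z for x in row] for row in p])
    return node, edge

# Hypothetical three-variable example.
unary = [[5, 2], [4, 2], [3, 6]]
pairwise = [[[1, 3], [3, 1]], [[2, 1], [2, 3]]]
marginals, pair_marginals = chain_bp(unary, pairwise)
print(marginals)
```

Each pass computes n-1 messages at $O(h^2)$ each, so all node and edge marginals cost $O(nh^2)$ in total.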
Outline: Belief Propagation on Chains; Belief Propagation on Trees; Loopy Belief Propagation. Reference: Pearl, 1988
Belief Propagation on Trees (tree with edges $V_a - V_c$, $V_b - V_c$, $V_c - V_d$)
$P'(v_d = l_o) = \sum_k \sum_j \sum_i \psi_a(l_i)\psi_b(l_j)\psi_c(l_k)\psi_d(l_o)\psi_{ac}(l_i, l_k)\psi_{bc}(l_j, l_k)\psi_{cd}(l_k, l_o)$
$= \psi_d(l_o)\sum_k \sum_j \sum_i \psi_a(l_i)\psi_b(l_j)\psi_c(l_k)\psi_{ac}(l_i, l_k)\psi_{bc}(l_j, l_k)\psi_{cd}(l_k, l_o)$
$= \psi_d(l_o)\sum_k \psi_c(l_k)\psi_{cd}(l_k, l_o)\sum_j \sum_i \psi_a(l_i)\psi_b(l_j)\psi_{ac}(l_i, l_k)\psi_{bc}(l_j, l_k)$
$= \psi_d(l_o)\sum_k \psi_c(l_k)\psi_{cd}(l_k, l_o)\sum_j \psi_b(l_j)\psi_{bc}(l_j, l_k)\sum_i \psi_a(l_i)\psi_{ac}(l_i, l_k)$, where the innermost sum is $M_{ac;k}$
$= \psi_d(l_o)\sum_k \psi_c(l_k)\psi_{cd}(l_k, l_o)M_{ac;k}\sum_j \psi_b(l_j)\psi_{bc}(l_j, l_k)$, where the remaining inner sum is $M_{bc;k}$
$= \psi_d(l_o)\sum_k \psi_c(l_k)\psi_{cd}(l_k, l_o)M_{bc;k}M_{ac;k}$, where the sum over $k$ is $M_{cd;o}$
$= \psi_d(l_o)M_{cd;o}$
The other marginals reuse the same messages, plus the messages sent back from the root ($M_{dc;k}$, $M_{cb;j}$, $M_{ca;i}$):
$P'(v_c = l_k) = \psi_c(l_k)M_{ac;k}M_{bc;k}M_{dc;k}$
$P'(v_b = l_j) = \psi_b(l_j)M_{cb;j}$
$P'(v_a = l_i) = \psi_a(l_i)M_{ca;i}$
Belief Propagation on Trees
Start from the leaves, go towards the root. For the current edge $(a,b)$, compute $M_{ab;k} = \sum_i \psi_a(l_i)\psi_{ab}(l_i, l_k)\prod_{n \neq b} M_{na;i}$, where $n$ ranges over the neighbors of $a$. Repeat till the root is reached.
Start from the root, go towards the leaves. For the current edge $(a,b)$, compute $M_{ab;k} = \sum_i \psi_a(l_i)\psi_{ab}(l_i, l_k)\prod_{n \neq b} M_{na;i}$. Repeat till the leaves are reached.
Belief Propagation on Trees
$P'(v_a = l_i, v_b = l_j) = \psi_a(l_i)\psi_b(l_j)\psi_{ab}(l_i, l_j)\prod_{n \neq b} M_{na;i}\prod_{n \neq a} M_{nb;j}$
$P'(v_a = l_i) = \psi_a(l_i)\prod_n M_{na;i}$
Normalize to compute true marginals.
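The message rule can be sketched on the four-node tree used in the derivation (edges a-c, b-c, c-d). Two caveats: the potentials are hypothetical, and instead of an explicit leaves-to-root and root-to-leaves schedule the sketch uses memoized recursion, which on a tree computes each directed message exactly once and yields the same values. A brute-force enumeration is included only as a check.

```python
from functools import lru_cache
from itertools import product

# Hypothetical potentials on the tree a - c, b - c, c - d.
h = 2
unary = {"a": [5, 2], "b": [4, 2], "c": [3, 6], "d": [1, 2]}
pair = {("a", "c"): [[1, 3], [3, 1]],
        ("b", "c"): [[2, 1], [2, 3]],
        ("c", "d"): [[2, 1], [1, 2]]}

neighbors = {v: [] for v in unary}
for u, v in pair:
    neighbors[u].append(v)
    neighbors[v].append(u)

def psi(a, b, i, k):
    """psi_ab(l_i, l_k), looking the edge up in either orientation."""
    return pair[(a, b)][i][k] if (a, b) in pair else pair[(b, a)][k][i]

@lru_cache(maxsize=None)
def message(a, b):
    """M_ab;k = sum_i psi_a(l_i) psi_ab(l_i, l_k) prod_{n != b} M_na;i."""
    incoming = [message(n, a) for n in neighbors[a] if n != b]
    out = []
    for k in range(h):
        total = 0
        for i in range(h):
            term = unary[a][i] * psi(a, b, i, k)
            for m in incoming:
                term *= m[i]
            total += term
        out.append(total)
    return tuple(out)

def marginal(a):
    """Return (normalized marginal of a, partition function Z)."""
    p = list(unary[a])
    for n in neighbors[a]:
        m = message(n, a)
        p = [p[i] * m[i] for i in range(h)]
    z = sum(p)
    return [x / z for x in p], z

def brute_marginal(a):
    """Check: enumerate all h^n labelings."""
    names = list(unary)
    p = [0] * h
    for ls in product(range(h), repeat=len(names)):
        v = dict(zip(names, ls))
        w = 1
        for n in names:
            w *= unary[n][v[n]]
        for (x, y), table in pair.items():
            w *= table[v[x]][v[y]]
        p[v[a]] += w
    z = sum(p)
    return [x / z for x in p]

for name in unary:
    print(name, marginal(name)[0], brute_marginal(name))
```

The recursion terminates because a tree has no cycles: the request for `message(a, b)` only ever asks for messages coming from the side of `a` away from `b`.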
Outline: Belief Propagation on Chains; Belief Propagation on Trees; Loopy Belief Propagation. References: Pearl, 1988; Murphy et al., 1999
Loopy Belief Propagation
Initialize all messages to 1.
In some order of edges, update messages: $M_{ab;k} = \sum_i \psi_a(l_i)\psi_{ab}(l_i, l_k)\prod_{n \neq b} M_{na;i}$
Repeat until convergence (rate of change in messages < threshold). Convergence is not guaranteed!!
Loopy Belief Propagation (cycle $V_a - V_b - V_c - V_d - V_a$ with messages $M_{ab}$, $M_{bc}$, $M_{cd}$, $M_{da}$)
$M_{bc}$ contains $M_{ab}$, $M_{cd}$ contains $M_{bc}$, $M_{da}$ contains $M_{cd}$. Overcounting!!
Loopy Belief Propagation
$B'_a(i) = \psi_a(l_i)\prod_n M_{na;i}$
$B'_{ab}(i,j) = \psi_a(l_i)\psi_b(l_j)\psi_{ab}(l_i, l_j)\prod_{n \neq b} M_{na;i}\prod_{n \neq a} M_{nb;j}$
Normalize to compute beliefs $B_a(i)$, $B_{ab}(i,j)$.
At convergence, $\sum_j B_{ab}(i,j) = B_a(i)$.
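A minimal sketch of loopy BP on a single four-node cycle follows. Everything concrete here is an assumption for illustration: the potentials, the update order, the threshold, the iteration cap, and the function names. The messages are also rescaled to sum to 1 after each update, a common practical addition that the slides do not mention; it leaves the normalized beliefs unchanged and keeps the numbers bounded.

```python
# Hypothetical cycle a - b - c - d - a with two labels per variable.
h = 2
nodes = ["a", "b", "c", "d"]
unary = {"a": [5, 2], "b": [4, 2], "c": [3, 6], "d": [1, 2]}
agree = [[2, 1], [1, 2]]
pair = {("a", "b"): agree, ("b", "c"): agree, ("c", "d"): agree, ("d", "a"): agree}
neighbors = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}

def psi(a, b, i, k):
    """psi_ab(l_i, l_k), looking the edge up in either orientation."""
    return pair[(a, b)][i][k] if (a, b) in pair else pair[(b, a)][k][i]

def loopy_bp(threshold=1e-10, max_iters=1000):
    """Return (messages, converged). Convergence is not guaranteed in general."""
    directed = [(a, b) for a in nodes for b in neighbors[a]]
    M = {e: [1.0] * h for e in directed}       # initialize all messages to 1
    for _ in range(max_iters):
        change = 0.0
        for a, b in directed:                  # some fixed order of edges
            new = []
            for k in range(h):
                total = 0.0
                for i in range(h):
                    term = unary[a][i] * psi(a, b, i, k)
                    for n in neighbors[a]:
                        if n != b:
                            term *= M[(n, a)][i]
                    total += term
                new.append(total)
            s = sum(new)
            new = [x / s for x in new]         # rescaling (assumption, see lead-in)
            change = max(change, max(abs(x - y) for x, y in zip(new, M[(a, b)])))
            M[(a, b)] = new
        if change < threshold:
            return M, True
    return M, False

def node_belief(M, a):
    """B_a(i): normalized psi_a(l_i) * prod_n M_na;i."""
    p = list(unary[a])
    for n in neighbors[a]:
        p = [p[i] * M[(n, a)][i] for i in range(h)]
    z = sum(p)
    return [x / z for x in p]

def edge_belief(M, a, b):
    """B_ab(i, j): normalized product of potentials and outside messages."""
    p = [[unary[a][i] * unary[b][j] * psi(a, b, i, j) for j in range(h)] for i in range(h)]
    for i in range(h):
        for j in range(h):
            for n in neighbors[a]:
                if n != b:
                    p[i][j] *= M[(n, a)][i]
            for n in neighbors[b]:
                if n != a:
                    p[i][j] *= M[(n, b)][j]
    z = sum(map(sum, p))
    return [[x / z for x in row] for row in p]

M, converged = loopy_bp()
B_a = node_belief(M, "a")
B_ab = edge_belief(M, "a", "b")
print(converged, B_a, [sum(row) for row in B_ab])
```

On this example the pairwise belief marginalizes to the node belief once the messages stop changing. The beliefs are still approximations of the true marginals, because each message around the cycle contains information that is counted more than once.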