1
Today
2
Next week
3
Marginalization

P(x_1) = \sum_{x_2}\sum_{x_3} P(\mathbf{x}, \mathbf{y})
Suppose you have some joint probability P(\mathbf{x}, \mathbf{y}) involving observations \mathbf{y} and hidden states \mathbf{x}. Suppose you're at x_1, and you want to find the marginal probability there, given the observations. Normally, you would have to compute

P(x_1) = \sum_{x_2}\sum_{x_3} P(x_1, x_2, x_3, \mathbf{y}).

For N other hidden nodes, each of M states, that will take M^N additions.
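As a concrete toy sketch of this brute-force sum: the evidence tables `phi` and the compatibility table `psi` below are made-up stand-ins for the joint, not values from the slides.

```python
import itertools
import numpy as np

M = 2                         # states per node (assumed)
rng = np.random.default_rng(0)
phi = rng.random((3, M))      # phi[i, xi]: hypothetical local evidence at node i
psi = rng.random((M, M))      # psi[xi, xj]: hypothetical pairwise compatibility

def joint(x1, x2, x3):
    """Unnormalized chain joint P(x1, x2, x3, y)."""
    return phi[0, x1] * phi[1, x2] * phi[2, x3] * psi[x1, x2] * psi[x2, x3]

# Marginal at node 1: sum over all M**N settings of the N = 2 other nodes.
p_x1 = np.array([
    sum(joint(x1, x2, x3) for x2, x3 in itertools.product(range(M), repeat=2))
    for x1 in range(M)
])
p_x1 /= p_x1.sum()            # normalize to a probability
```

The double loop is exactly the M^N-term sum: it does not exploit any structure in the joint.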
4
Special case: Markov network
But suppose the joint probability has a special structure, shown by this Markov network:

[Figure: chain Markov network y_1–x_1, y_2–x_2, y_3–x_3]

Then this sum,

P(x_1) = \sum_{x_2}\sum_{x_3} P(x_1, x_2, x_3, \mathbf{y}),

can be computed with NM^2 additions, as follows…
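The saving from distributing the sums can be checked numerically. In this sketch the tables `phi` and `psi` are made-up stand-ins; each nested sum becomes a single M-by-M matrix–vector contraction, so the whole marginal costs on the order of NM^2 operations.

```python
import itertools
import numpy as np

M = 2
rng = np.random.default_rng(0)
phi = rng.random((3, M))   # hypothetical local evidence phi[i, xi]
psi = rng.random((M, M))   # hypothetical compatibility psi[xi, xj]

# Distribute the sums inward: each is one M x M contraction.
m32 = psi @ phi[2]                 # sum_x3 psi[x2, x3] phi[2, x3]
m21 = psi @ (phi[1] * m32)         # sum_x2 psi[x1, x2] phi[1, x2] m32[x2]
p_fast = phi[0] * m21
p_fast /= p_fast.sum()

# Check against the brute-force double sum over all M**2 configurations.
def joint(x1, x2, x3):
    return phi[0, x1] * phi[1, x2] * phi[2, x3] * psi[x1, x2] * psi[x2, x3]

p_slow = np.array([
    sum(joint(x1, x2, x3) for x2, x3 in itertools.product(range(M), repeat=2))
    for x1 in range(M)
])
p_slow /= p_slow.sum()
assert np.allclose(p_fast, p_slow)
```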
5
Derivation of belief propagation
[Figure: chain Markov network y_1–x_1, y_2–x_2, y_3–x_3]

P(x_1) = \sum_{x_2}\sum_{x_3} P(x_1, x_2, x_3, y_1, y_2, y_3)
6
The posterior factorizes
P(x_1) = \sum_{x_2}\sum_{x_3} P(x_1, x_2, x_3, y_1, y_2, y_3)
       = \sum_{x_2}\sum_{x_3} \Phi(x_1, y_1)\, \Phi(x_2, y_2)\Psi(x_1, x_2)\, \Phi(x_3, y_3)\Psi(x_2, x_3)
       = \Phi(x_1, y_1) \sum_{x_2} \Phi(x_2, y_2)\Psi(x_1, x_2) \sum_{x_3} \Phi(x_3, y_3)\Psi(x_2, x_3)

[Figure: chain Markov network y_1–x_1, y_2–x_2, y_3–x_3]

\hat{x}_1^{\mathrm{MMSE}} = \mathrm{mean}_{x_1}\; \Phi(x_1, y_1) \sum_{x_2} \Phi(x_2, y_2)\Psi(x_1, x_2) \sum_{x_3} \Phi(x_3, y_3)\Psi(x_2, x_3)
7
Propagation rules

P(x_1) = \sum_{x_2}\sum_{x_3} P(x_1, x_2, x_3, y_1, y_2, y_3)
       = \sum_{x_2}\sum_{x_3} \Phi(x_1, y_1)\, \Phi(x_2, y_2)\Psi(x_1, x_2)\, \Phi(x_3, y_3)\Psi(x_2, x_3)

P(x_1) = \Phi(x_1, y_1) \sum_{x_2} \Phi(x_2, y_2)\Psi(x_1, x_2) \sum_{x_3} \Phi(x_3, y_3)\Psi(x_2, x_3)

[Figure: chain Markov network y_1–x_1, y_2–x_2, y_3–x_3]
8
Propagation rules

P(x_1) = \Phi(x_1, y_1) \sum_{x_2} \Phi(x_2, y_2)\Psi(x_1, x_2) \sum_{x_3} \Phi(x_3, y_3)\Psi(x_2, x_3)

Read from the inside out: the innermost sum is a message passed from node 3 to node 2, and the outer sum is a message passed from node 2 to node 1.

[Figure: chain Markov network y_1–x_1, y_2–x_2, y_3–x_3]
9
Belief and message update rules
The belief at node j combines the local evidence with all incoming messages:

b_j(x_j) \propto \Phi(x_j, y_j) \prod_{k \in N(j)} m_j^k(x_j)

The message from node i to node j sums over the states of node i, using all messages into i except the one coming back from j:

m_j^i(x_j) = \sum_{x_i} \Psi(x_i, x_j)\, \Phi(x_i, y_i) \prod_{k \in N(i)\setminus j} m_i^k(x_i)
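These two update rules can be sketched for a chain of any length. The function name and the example tables below are illustrative assumptions, not code from the slides; on a chain, collecting the messages reduces to one forward and one backward sweep.

```python
import numpy as np

def chain_marginals(phi, psi):
    """Sum-product belief propagation on a chain.

    phi: (n, M) local-evidence tables; psi: (M, M) pairwise compatibility,
    indexed psi[x_i, x_j] for neighboring nodes i < j.
    Returns the (n, M) array of normalized beliefs (node marginals).
    """
    n, M = phi.shape
    fwd = [np.ones(M)]                       # message arriving from the left
    for i in range(1, n):
        fwd.append(psi.T @ (phi[i - 1] * fwd[-1]))
    bwd = [np.ones(M)]                       # message arriving from the right
    for i in range(n - 2, -1, -1):
        bwd.insert(0, psi @ (phi[i + 1] * bwd[0]))
    b = phi * np.array(fwd) * np.array(bwd)  # belief = evidence * messages
    return b / b.sum(axis=1, keepdims=True)

# Hypothetical evidence: node 0 prefers state 0, node 2 prefers state 1.
phi = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
psi = np.array([[0.8, 0.2], [0.2, 0.8]])   # assumed smoothing compatibility
beliefs = chain_marginals(phi, psi)
```

Each row of `beliefs` is the marginal P(x_i | y), built exactly from the belief rule above: local evidence times the product of the two incoming messages.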
10
Belief propagation updates
In matrix–vector form (writing .* for the elementwise product), the message update is a matrix multiply of the compatibility matrix with the elementwise product of the local evidence and the other incoming messages:

m_j^i = \Psi^{T} \big( \phi_i \mathbin{.*} m_i^k \mathbin{.*} m_i^l \big)
11
Simple example For the 3-node example, worked out in detail, see Sections 2.0, 2.1 of:
12
Optimal solution in a chain or tree: Belief Propagation
A “do the right thing” Bayesian algorithm. For Gaussian random variables over time, it reduces to the Kalman filter; for hidden Markov models, to the forward/backward algorithm (whose MAP variant is the Viterbi algorithm).
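For the discrete HMM case, the forward/backward computation is short. The transition matrix `A`, emission matrix `B`, initial distribution `pi`, and observation sequence below are made-up numbers for illustration, not from the slides.

```python
import numpy as np

# A tiny hypothetical HMM: A[i, j] = P(z_{t+1}=j | z_t=i),
# B[i, k] = P(obs=k | z=i), pi = initial state distribution.
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])
obs = [0, 0, 1, 1]                    # assumed observation sequence

T, S = len(obs), len(pi)
alpha = np.zeros((T, S))              # forward pass
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

beta = np.ones((T, S))                # backward pass
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

gamma = alpha * beta                  # smoothed posteriors P(z_t | obs)
gamma /= gamma.sum(axis=1, keepdims=True)
```

The forward and backward recursions are the chain's two message sweeps; their product, normalized, is the belief at each time step.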
13
Other loss functions The above rules let you compute the marginal probability at a node. From that, you can compute the mean estimate. But you can also use a related algorithm to compute the MAP estimate for x1.
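The related algorithm for the MAP estimate is standardly the max-product (Viterbi-style) variant: replace each sum with a max and keep back-pointers for the winning states. A sketch with made-up evidence and compatibility tables, checked against exhaustive search:

```python
import itertools
import numpy as np

M, n = 2, 3
rng = np.random.default_rng(0)
phi = rng.random((n, M))   # hypothetical local evidence
psi = rng.random((M, M))   # hypothetical compatibility psi[x_i, x_j]

# Max-product pass toward node 1 (index 0), recording argmax back-pointers.
msg = np.ones(M)
back = []
for i in range(n - 1, 0, -1):
    scores = psi * (phi[i] * msg)     # scores[x_{i-1}, x_i]
    back.insert(0, scores.argmax(axis=1))
    msg = scores.max(axis=1)

x = [int((phi[0] * msg).argmax())]    # MAP state at node 1, then backtrack
for bp in back:
    x.append(int(bp[x[-1]]))

# Verify against exhaustive search over all M**n configurations.
def score(xs):
    s = np.prod([phi[i, xs[i]] for i in range(n)])
    return s * np.prod([psi[xs[i], xs[i + 1]] for i in range(n - 1)])

best = max(itertools.product(range(M), repeat=n), key=score)
assert tuple(x) == best
```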
14
MAP estimate for a chain or a tree
[Figure: chain Markov network y_1–x_1, y_2–x_2, y_3–x_3]
15
The posterior factorizes
[Figure: chain Markov network y_1–x_1, y_2–x_2, y_3–x_3]

\hat{x}_1^{\mathrm{MAP}} = \arg\max_{x_1} \max_{x_2} \max_{x_3} P(x_1, x_2, x_3, y_1, y_2, y_3)
16
Propagation rules

\hat{x}_1^{\mathrm{MAP}} = \arg\max_{x_1} \Phi(x_1, y_1) \max_{x_2} \Phi(x_2, y_2)\Psi(x_1, x_2) \max_{x_3} \Phi(x_3, y_3)\Psi(x_2, x_3)

[Figure: chain Markov network y_1–x_1, y_2–x_2, y_3–x_3]
17
Using conditional probabilities instead of compatibility functions
[Figure: chain Markov network y_1–x_1, y_2–x_2, y_3–x_3]

By Bayes' rule.
18
Writing it as a factorization
[Figure: chain Markov network y_1–x_1, y_2–x_2, y_3–x_3]

By the fact that conditioning on x_1 makes y_1 independent of x_2, x_3, y_2, y_3.
19
Writing it as a factorization
[Figure: chain Markov network y_1–x_1, y_2–x_2, y_3–x_3]

Now use Bayes' rule (with x_2) on the rightmost term.
20
Writing it as a factorization
[Figure: chain Markov network y_1–x_1, y_2–x_2, y_3–x_3]

From the Markov structure, conditioning on x_1 and x_2 is the same as conditioning on x_2 alone.
21
Writing it as a factorization
[Figure: chain Markov network y_1–x_1, y_2–x_2, y_3–x_3]

Conditioning on x_2 makes y_2 independent of x_3 and y_3.
22
Writing it as a factorization
[Figure: chain Markov network y_1–x_1, y_2–x_2, y_3–x_3]

Apply the same operations once more to the rightmost term.
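Carrying the sequence of rewrites above to completion yields the chain factorization:

```latex
P(x_1, x_2, x_3, y_1, y_2, y_3)
  = P(y_1 \mid x_1)\, P(x_1)\, P(x_2 \mid x_1)\, P(y_2 \mid x_2)\, P(x_3 \mid x_2)\, P(y_3 \mid x_3)
```

Each factor is now a conditional probability rather than a compatibility function, which is the point of this sequence of slides.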
23
A toy problem. 10 nodes, 2 states for each node, local evidence as shown below.

[Figure: local-evidence values at each node]
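The toy problem's actual local-evidence values are not recoverable from this page, so the sketch below uses random stand-in evidence on a 10-node, 2-state chain and checks the BP beliefs against exhaustive enumeration (2^10 = 1024 configurations, still cheap).

```python
import itertools
import numpy as np

n, M = 10, 2
rng = np.random.default_rng(0)
phi = rng.random((n, M))     # stand-in local evidence (slide values assumed)
psi = np.array([[0.9, 0.1],
                [0.1, 0.9]]) # assumed smoothness compatibility

# Forward/backward sum-product message sweeps along the chain.
fwd = [np.ones(M)]
for i in range(1, n):
    fwd.append(psi.T @ (phi[i - 1] * fwd[-1]))
bwd = [np.ones(M)]
for i in range(n - 2, -1, -1):
    bwd.insert(0, psi @ (phi[i + 1] * bwd[0]))
beliefs = phi * np.array(fwd) * np.array(bwd)
beliefs /= beliefs.sum(axis=1, keepdims=True)

# Exhaustive check of every node marginal.
marg = np.zeros((n, M))
for xs in itertools.product(range(M), repeat=n):
    p = np.prod([phi[i, xs[i]] for i in range(n)])
    p *= np.prod([psi[xs[i], xs[i + 1]] for i in range(n - 1)])
    for i, xi in enumerate(xs):
        marg[i, xi] += p
marg /= marg.sum(axis=1, keepdims=True)
assert np.allclose(beliefs, marg)
```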
27
Classic 1976 paper
28
Relaxation labelling
29
Belief propagation vs. relaxation labelling
30
Yair’s motion example
31
Yair’s figure/ground example