Published by Lucinda Tucker. Modified over 9 years ago.
1
Belief propagation with junction trees
Presented by Mark Silberstein and Yaniv Hamo
2
Outline – Part I
● Belief propagation example
● Propagation using message passing
● Clique trees
● Junction trees
● Junction tree algorithm
3
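Simple belief propagation example: “Icy Roads” (from Jensen, “An Introduction to Bayesian Networks” [2])
$P(X_{\text{Icy}})$: yes 0.7, no 0.3
$P(X_{\text{Holmes}} \mid X_{\text{Icy}})$ (rows: $X_{\text{Icy}}$, columns: $X_{\text{Holmes}}$):
        yes   no
  yes   0.8   0.2
  no    0.1   0.9
$P(X_{\text{Watson}} \mid X_{\text{Icy}})$ (rows: $X_{\text{Icy}}$, columns: $X_{\text{Watson}}$):
        yes   no
  yes   0.8   0.2
  no    0.1   0.9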
4
“Watson has had an accident!” $P(X_{\text{Watson}} = \text{yes}) = 1$
Bayes' rule: $P(X_{\text{Icy}} \mid X_{\text{Watson}} = \text{yes}) = (0.95, 0.05)$, against $(0.70, 0.30)$ a priori
Joint probability + marginalization: $P(X_{\text{Holmes}} \mid X_{\text{Watson}} = \text{yes}) = (0.76, 0.24)$, against $(0.59, 0.41)$ a priori
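The two updates on this slide can be reproduced with a few lines of arithmetic. The following is a minimal sketch that uses only the Icy Roads tables from the previous slide; the variable names and the `normalize` helper are illustrative, not taken from the presentation.

```python
# Icy Roads tables from the slides; index 0 = "yes", index 1 = "no".
p_icy = [0.7, 0.3]
# p_crash[i][c] = P(crash = c | X_Icy = i); the same table is used for Holmes and Watson.
p_crash = [[0.8, 0.2], [0.1, 0.9]]


def normalize(v):
    """Scale a vector of non-negative weights so that it sums to 1."""
    total = sum(v)
    return [x / total for x in v]


# Bayes' rule: P(X_Icy | X_Watson = yes) is proportional to P(X_Icy) * P(X_Watson = yes | X_Icy).
post_icy = normalize([p_icy[i] * p_crash[i][0] for i in range(2)])

# Marginalization: P(X_Holmes | X_Watson = yes) = sum_i P(X_Holmes | i) * P(i | X_Watson = yes).
post_holmes = [sum(post_icy[i] * p_crash[i][h] for i in range(2)) for h in range(2)]

# A priori P(X_Holmes), for comparison.
prior_holmes = [sum(p_icy[i] * p_crash[i][h] for i in range(2)) for h in range(2)]

print([round(x, 2) for x in post_icy])      # posterior over X_Icy
print([round(x, 2) for x in post_holmes])   # posterior over X_Holmes
print([round(x, 2) for x in prior_holmes])  # prior over X_Holmes
```

Rounded to two decimals, the three vectors agree with the values quoted on the slide.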
5
“No, the roads are not icy.” $P(X_{\text{Icy}} = \text{no}) = 1$
When $X_{\text{Icy}}$ is instantiated, $X_{\text{Holmes}}$ becomes independent of $X_{\text{Watson}}$: $X_{\text{Holmes}} \perp X_{\text{Watson}} \mid X_{\text{Icy}}$
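The independence statement can be checked numerically: once $X_{\text{Icy}}$ is fixed, adding the evidence about Watson should not change the belief about Holmes. A small sketch, assuming the Icy Roads tables from the earlier slide; the function names are illustrative.

```python
p_icy = {"yes": 0.7, "no": 0.3}
# p_crash[icy][crash] = P(crash | X_Icy = icy), shared by Holmes and Watson.
p_crash = {"yes": {"yes": 0.8, "no": 0.2}, "no": {"yes": 0.1, "no": 0.9}}


def joint(icy, holmes, watson):
    """Joint probability P(X_Icy, X_Holmes, X_Watson) from the network factorization."""
    return p_icy[icy] * p_crash[icy][holmes] * p_crash[icy][watson]


def holmes_given(icy, watson=None):
    """P(X_Holmes | X_Icy = icy), optionally also conditioned on X_Watson = watson."""
    watson_values = [watson] if watson is not None else ["yes", "no"]
    weights = {h: sum(joint(icy, h, w) for w in watson_values) for h in ("yes", "no")}
    total = sum(weights.values())
    return {h: weights[h] / total for h in weights}


with_watson = holmes_given("no", watson="yes")
without_watson = holmes_given("no")
print(with_watson, without_watson)
```

Both calls give a belief of about (0.1, 0.9) for Holmes, which is the row of the conditional table for non-icy roads.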
6
Answering probabilistic queries (J. Pearl, [1])
● Joint probability using elimination: most likely the human brain does not do that! Why?
– It needs to hold the whole network in order to set the elimination order
– It answers only a single query, not all queries at once
– It creates and calculates spurious dependencies among variables conceived as independent
– It is sequential!
● Our brain probably computes it in parallel
7
Belief updating as constraint propagation (J. Pearl, [1])
● Local, simple computations
● But is it possible at all?
– Why would it ever stabilize?
● Rumour example: you updated your neighbour, and after several days you hear the same thing back from him. Should it increase your belief?
● Graph coloring
8
Simple example for chain propagation (J. Pearl, [1])
Definitions:
[Figure: chains X → Y → e and X → Y → Z → e, with the link matrix and vector definitions]
9
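Bidirectional propagation (J. Pearl, [1])
[Figure: chain e⁺ → T → U → X → Y → Z → e⁻; the messages π(t), π(u), π(x) travel forward from e⁺ and the messages λ(z), λ(y), λ(x) travel backward from e⁻; annotations: “chooses column”, “chooses row”]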
10
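HMM and Backward-Forward algorithm
[Figure: HMM with hidden chain $H_1 \to H_2 \to \dots \to H_{L-1} \to H_L$, where each $H_i$ emits $X_i$]
$P(x_1,\dots,x_L,h_i) = P(x_1,\dots,x_i,h_i)\,P(x_{i+1},\dots,x_L \mid x_1,\dots,x_i,h_i) = P(x_1,\dots,x_i,h_i)\,P(x_{i+1},\dots,x_L \mid h_i) \equiv f(h_i)\,b(h_i)$
Belief update: $P(h_i \mid x_1,\dots,x_L) = (1/K)\,P(x_1,\dots,x_L,h_i)$, where $K = \sum_{h_i} P(x_1,\dots,x_L,h_i)$.
Correspondence with the π/λ messages: $f(h_i) = P(x_1,\dots,x_i,h_i)$ is $\pi(h_i)$ and $b(h_i) = P(x_{i+1},\dots,x_L \mid h_i)$ is $\lambda(h_i)$.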
11
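The forward algorithm
[Figure: hidden chain $H_1 \to H_2 \to \dots \to H_i$ with emissions $X_1, X_2, \dots, X_i$]
The task: compute $f(h_i) = P(x_1,\dots,x_i,h_i)$ for $i = 1,\dots,L$ (namely, considering evidence up to time slot $i$).
Basis step: $P(x_1,h_1) = P(h_1)\,P(x_1 \mid h_1)$
Step $i$: $P(x_1,\dots,x_i,h_i) = \sum_{h_{i-1}} P(x_1,\dots,x_{i-1},h_{i-1})\,P(h_i \mid h_{i-1})\,P(x_i \mid h_i)$, where the first factor is $f(h_{i-1})$, i.e. $\pi(h_{i-1})$.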
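The recursion translates almost line by line into code. The sketch below is one possible rendering; the two-state model (`prior`, `trans`, `emit`) and the observation sequence `obs` are hypothetical numbers chosen only for illustration, and indices are 0-based, so `f[i]` corresponds to time slot i+1.

```python
def forward(prior, trans, emit, obs):
    """Return f with f[i][h] = P(x_1..x_{i+1}, H_{i+1} = h), using 0-based i.

    prior[h]    = P(H_1 = h)
    trans[g][h] = P(H_{t+1} = h | H_t = g)
    emit[h][x]  = P(X_t = x | H_t = h)
    """
    n = len(prior)
    # Basis step: P(x_1, h_1) = P(h_1) P(x_1 | h_1)
    f = [[prior[h] * emit[h][obs[0]] for h in range(n)]]
    for x in obs[1:]:
        prev = f[-1]
        # Step i: sum over h_{i-1} of f(h_{i-1}) P(h_i | h_{i-1}), times P(x_i | h_i)
        f.append([
            sum(prev[g] * trans[g][h] for g in range(n)) * emit[h][x]
            for h in range(n)
        ])
    return f


# Hypothetical two-state HMM and observation sequence (not from the slides).
prior = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0]

f = forward(prior, trans, emit, obs)
likelihood = sum(f[-1])  # P(x_1, ..., x_L) = sum over h_L of f(h_L)
print(f, likelihood)
```

Summing the last forward vector gives the likelihood of the whole observation sequence, which is the constant K from the belief update on the previous slide.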
12
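The backward algorithm
[Figure: hidden chain $H_i \to H_{i+1} \to \dots \to H_{L-1} \to H_L$ with emissions $X_{i+1}, \dots, X_{L-1}, X_L$]
The task: compute $b(h_i) = P(x_{i+1},\dots,x_L \mid h_i)$ for $i = L-1,\dots,1$ (namely, considering evidence after time slot $i$).
Step $i$: $b(h_i) = P(x_{i+1},\dots,x_L \mid h_i) = \sum_{h_{i+1}} P(h_{i+1} \mid h_i)\,P(x_{i+1} \mid h_{i+1})\,P(x_{i+2},\dots,x_L \mid h_{i+1})$, where the last factor is $b(h_{i+1})$.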
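Combining the backward recursion with the forward one gives the belief update $P(h_i \mid x_1,\dots,x_L) = f(h_i)\,b(h_i)/K$. The sketch below is self-contained and again uses a hypothetical two-state model; the base case `b(h_L) = 1` is the usual convention for "no evidence after the last slot" and is an assumption, since the slide does not state it.

```python
def forward(prior, trans, emit, obs):
    """f[i][h] = P(x_1..x_{i+1}, H_{i+1} = h), 0-based i."""
    n = len(prior)
    f = [[prior[h] * emit[h][obs[0]] for h in range(n)]]
    for x in obs[1:]:
        prev = f[-1]
        f.append([sum(prev[g] * trans[g][h] for g in range(n)) * emit[h][x]
                  for h in range(n)])
    return f


def backward(trans, emit, obs):
    """b[i][h] = P(x_{i+2}..x_L | H_{i+1} = h), 0-based i."""
    n = len(trans)
    b = [[1.0] * n]  # assumed base case: b(h_L) = 1
    for x in reversed(obs[1:]):
        nxt = b[0]
        # Step i: sum over h_{i+1} of P(h_{i+1} | h_i) P(x_{i+1} | h_{i+1}) b(h_{i+1})
        b.insert(0, [sum(trans[h][g] * emit[g][x] * nxt[g] for g in range(n))
                     for h in range(n)])
    return b


def posteriors(prior, trans, emit, obs):
    """P(h_i | x_1..x_L) for every time slot, via f(h_i) b(h_i) / K."""
    f = forward(prior, trans, emit, obs)
    b = backward(trans, emit, obs)
    result = []
    for fi, bi in zip(f, b):
        weights = [x * y for x, y in zip(fi, bi)]
        k = sum(weights)  # K = P(x_1..x_L), the same for every slot
        result.append([w / k for w in weights])
    return result


# Hypothetical two-state HMM and observation sequence (not from the slides).
prior = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0]

b = backward(trans, emit, obs)
post = posteriors(prior, trans, emit, obs)
print(post)
```

A useful sanity check is that the normalizer K comes out the same at every time slot, since it is always the likelihood of the full observation sequence.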
13
Can we generalize this approach to any graph?
● Loops pose a problem
– We might reach a contradiction or an infinite loop
● We should apply clustering and create a tree of clusters
● Each new vertex in the cluster tree has a potential Ψ (a mapping from every combination of values of the cluster variables to a non-negative real number; a joint distribution table is a special case)
● Problems:
– There are many ways to create clusters (e.g. all vertices forming a loop)
– How do we obtain marginal probabilities from potentials?
14
Clique trees
● Yet another representation of the joint probability
● How we build them:
– For every variable A there should exist a single clique V that …
– A clique potential is the product of all its tables (a table is multiplied in only if it was not used in another clique)
– Links are labeled with separators, which consist of the intersection of the adjacent nodes
– Separator tables are initialized to ones
● Claim: The joint distribution is the product of all cluster tables divided by the product of all separator tables
15
Example
[Figure: chordal graph over the variables A, B, C, D, E, F and its clique tree ABC -[C]- CDE -[DE]- DEF, with the separators shown in brackets]
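For this example, the claim from the clique trees slide (joint distribution = product of cluster tables divided by product of separator tables) can be written out explicitly. The clique and separator names are read off the figure labels, so treat the exact tree shape as an assumption:

```latex
P(A,B,C,D,E,F) \;=\; \frac{\Psi(A,B,C)\,\Psi(C,D,E)\,\Psi(D,E,F)}{\Psi(C)\,\Psi(D,E)}
```

With the separator tables initialized to ones, the denominator is 1 and the right-hand side is simply the product of the clique potentials.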
16
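Consistency
● The marginals of adjacent nodes on their separator should be equal: $\sum_{V \setminus S} \Psi(V) = \sum_{W \setminus S} \Psi(W)$
[Figure: nodes with potentials Ψ(V) and Ψ(W), joined by a separator with potential Ψ(S)]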
17
Absorption
● Absorption passes a message from one node to another
[Figure: W absorbs from V through the separator S; Ψ(V) is unchanged, while Ψ(S) and Ψ(W) are updated to Ψ*(S) and Ψ*(W)]
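The update formulas are not preserved in the slide text. A common formulation, following Jensen [2], is $\Psi^*(S) = \sum_{V \setminus S} \Psi(V)$ and $\Psi^*(W) = \Psi(W)\,\Psi^*(S)/\Psi(S)$; the sketch below assumes that rule. Tables are plain dictionaries keyed by value tuples, and the two-node clique tree built from the Icy Roads numbers is an illustrative choice, not part of the presentation.

```python
def marginalize(variables, table, keep):
    """Sum `table` over every variable that is not listed in `keep`."""
    idx = [variables.index(v) for v in keep]
    out = {}
    for assignment, value in table.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0.0) + value
    return out


def absorb(w_vars, w_table, v_vars, v_table, sep_vars, sep_table):
    """W absorbs from V through separator S; returns the new (W, S) tables."""
    new_sep = marginalize(v_vars, v_table, sep_vars)  # Psi*(S)
    idx = [w_vars.index(v) for v in sep_vars]
    new_w = {}
    for assignment, value in w_table.items():
        key = tuple(assignment[i] for i in idx)
        old = sep_table[key]
        # Assumed convention for potentials: 0/0 is treated as 0.
        new_w[assignment] = value * new_sep[key] / old if old else 0.0
    return new_w, new_sep


# Icy Roads as a clique tree: V = (Icy, Watson), W = (Icy, Holmes), S = (Icy,).
p_icy = {"yes": 0.7, "no": 0.3}
p_crash = {"yes": {"yes": 0.8, "no": 0.2}, "no": {"yes": 0.1, "no": 0.9}}
vals = ("yes", "no")
psi_v = {(i, w): p_icy[i] * p_crash[i][w] for i in vals for w in vals}
psi_w = {(i, h): p_crash[i][h] for i in vals for h in vals}
psi_s = {(i,): 1.0 for i in vals}  # separator tables start as ones

psi_w, psi_s = absorb(("Icy", "Holmes"), psi_w, ("Icy", "Watson"), psi_v, ("Icy",), psi_s)
p_holmes = marginalize(("Icy", "Holmes"), psi_w, ("Holmes",))
print(p_holmes)
```

After the absorption, Ψ(W) holds the joint table of Icy and Holmes, so marginalizing it gives the a priori belief of roughly (0.59, 0.41) for Holmes quoted in the Icy Roads example.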
18
Absorption (cont)
● Does absorption ensure consistency?
● The product of the cluster tables divided by the product of the separator tables is invariant under absorption
● This feature maintains the correctness of the clique tree representation
19
Rules of message passing in clique tree
● Node V can send exactly one message to neighbor W, and only after V has received a message from each of its other neighbors
● We continue until a message has been passed once in each direction along every link
[Figure: tree with six nodes, numbered 1 to 6, and ten messages, numbered 1 to 10 in the order they are sent]
After all messages have been sent in both directions over every link, the tree is consistent
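The sending rule fully determines which messages are allowed at any moment, so a valid schedule can be generated mechanically. Below is a minimal sketch; the six-node tree is hypothetical (the shape of the tree in the slide's figure is not recoverable), and `message_schedule` is an illustrative name.

```python
def message_schedule(neighbors):
    """Return an order of (sender, receiver) messages that obeys the sending rule.

    A node v may send to w only after v has received a message from each of
    its other neighbors. Messages that are ready at the same time are emitted
    in sorted order within one round.
    """
    received = {v: set() for v in neighbors}
    pending = {(v, w) for v in neighbors for w in neighbors[v]}
    order = []
    while pending:
        ready = sorted(
            (v, w) for (v, w) in pending
            if set(neighbors[v]) - {w} <= received[v]
        )
        if not ready:
            raise ValueError("no message can be sent: the graph is not a tree")
        for v, w in ready:
            received[w].add(v)
            pending.remove((v, w))
            order.append((v, w))
    return order


# Hypothetical tree with six nodes and five links (adjacency lists).
neighbors = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2, 5, 6], 5: [4], 6: [4]}
order = message_schedule(neighbors)
print(order)
```

For a tree with six nodes there are five links and therefore ten messages, one in each direction per link. The leaves send first, because they have no other neighbors to wait for.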
20
Does local consistency ensure global consistency?
● The same old loop problem
● Building a tree breaks the loops
[Figure: clique nodes ABC, DBC, ED, EA over the variables A, B, C, D, E, illustrating global consistency]
21
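Junction tree
● Ensures global consistency
● Definition: A clique tree is a junction tree if, for every pair of nodes V and W, all nodes on the path between V and W contain V ∩ W
[Figure: graph over the variables A to K and a junction tree with cliques ABE, EH, BCF, FI, FJ, CDG, GK and separators labeled E, B, F, C, G]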
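The definition can be checked directly by walking the path between every pair of nodes. In the sketch below, cliques are written as strings of single-letter variable names. The first tree is the ABC, CDE, DEF chain from the earlier example; the second arranges the nodes ABC, DBC, ED, EA in a chain, which is an assumed arrangement chosen to show a violation.

```python
from itertools import combinations


def path(tree, start, goal):
    """Unique path between two nodes of a tree, found by depth-first search."""
    stack = [(start, [start])]
    while stack:
        node, trail = stack.pop()
        if node == goal:
            return trail
        for nxt in tree[node]:
            if nxt not in trail:
                stack.append((nxt, trail + [nxt]))
    raise ValueError("nodes are not connected")


def is_junction_tree(tree):
    """True if every node on the path between V and W contains V ∩ W, for all pairs."""
    for v, w in combinations(tree, 2):
        shared = set(v) & set(w)
        if any(not shared <= set(node) for node in path(tree, v, w)):
            return False
    return True


# Chain ABC - CDE - DEF from the clique tree example.
chain = {"ABC": ["CDE"], "CDE": ["ABC", "DEF"], "DEF": ["CDE"]}

# Assumed arrangement EA - ABC - DBC - ED: E is shared by the two end nodes
# but is missing from the nodes between them.
broken = {"EA": ["ABC"], "ABC": ["EA", "DBC"], "DBC": ["ABC", "ED"], "ED": ["DBC"]}

print(is_junction_tree(chain), is_junction_tree(broken))
```

The check is quadratic in the number of cliques, which is fine for illustration; it is not meant as an efficient construction method.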
22
Claims on junction tree
● Claim: A consistent junction tree is globally consistent
● Claim: Take the product of all node potentials divided by the product of all separator potentials. Then [formula not preserved]
● Claim: After a full round of message passing in T, [formula not preserved]
● Claim: Given evidence at different nodes, after a full round of message passing in T, [formula not preserved]
23
References until now
1. J. Pearl, “Probabilistic Reasoning in Intelligent Systems”
2. Finn V. Jensen, “An Introduction to Bayesian Networks”
3. Presentations by: Sam Roweis, Henrik Bengtsson, David Barber