Pearl’s Belief Propagation Algorithm
Exact answers from tree-structured Bayesian networks
Heavily based on slides by Tomas Singliar; modified by Daniel Lowd for UW CSE 573
Outline
- What’s a polytree?
- Idea of belief propagation
- Messages and incorporation of evidence
- Combining messages into belief
- Computing messages
- Algorithm outlined
- What if the BN is not a polytree?
Polytree Bayesian networks
- What is a polytree? A Bayesian network graph is a polytree if (and only if) there is at most one path between any two nodes V_i and V_k (see the sketch below).
- Why do we care about polytrees? Exact BN inference is NP-hard in general…
- …but on polytrees, it takes linear time.
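The definition above suggests a simple test: a DAG is a polytree exactly when its underlying undirected graph has no cycles. Here is a minimal sketch (not from the slides), using union-find over the undirected edges:

```python
# Sketch: a DAG is a polytree iff ignoring edge directions yields an
# acyclic (forest-shaped) graph, i.e. at most one path between any pair.

def is_polytree(num_nodes, directed_edges):
    parent = list(range(num_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for u, v in directed_edges:       # ignore edge direction
        ru, rv = find(u), find(v)
        if ru == rv:                  # u and v already connected:
            return False              # adding (u,v) creates a second path
        parent[ru] = rv
    return True

# V0 -> V2 <- V1, V2 -> V3: a polytree
print(is_polytree(4, [(0, 2), (1, 2), (2, 3)]))           # True
# adding V0 -> V3 creates two paths between V0 and V3
print(is_polytree(4, [(0, 2), (1, 2), (2, 3), (0, 3)]))   # False
```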
Examples: Polytree or not?
[Figure: three example graphs, two over nodes V1–V6 and one over V1–V4, to be classified as polytree or not]
Our inference task
- We know the values of some evidence variables E.
- We wish to compute the conditional probability P(X_i | E) for all non-evidence variables X_i.
Pearl’s belief propagation
- We want a local computation at each node V.
- Information flows through the links of G as messages of two types: λ and π.
- V splits the network into two disjoint parts. This induces strong independence assumptions (crucial!).
- Denote by E_V^+ the part of the evidence accessible through the parents of V (causal evidence); it is passed downward in π messages.
- Analogously, let E_V^- be the diagnostic evidence accessible through the children of V; it is passed upward in λ messages.
Pearl’s Belief Propagation
[Figure: node V with parents U1, U2 and children V1, V2; π messages π(U1), π(U2), π(V1), π(V2) flow downward along the links, λ messages λ(U1), λ(U2), λ(V1), λ(V2) flow upward]
The Messages
- What are the messages? For simplicity, let the nodes be binary.
- Consider a link V1 → V2, with prior P(V1=T) = 0.8, P(V1=F) = 0.2 and a CPT P(V2 | V1) (entries left blank on the slide).
- The message passes on information. What information?
- Observe: P(V2) = P(V2 | V1=T) P(V1=T) + P(V2 | V1=F) P(V1=F). (Numeric sketch below.)
- The information V2 needs from above is the distribution of V1: this is the π message, π(V1). (In general it is not exactly the prior from V1’s CPT, but V1’s belief given the evidence above it.)
- π messages capture information passed from parent to child.
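A small numeric sketch of the marginalization on this slide; the values for P(V2 | V1) are assumptions, since the slide leaves that table blank:

```python
import numpy as np

# The child V2 needs only the distribution over V1 (the pi message),
# not the whole subnetwork above it.
p_v1 = np.array([0.8, 0.2])              # P(V1=T), P(V1=F), from the slide
p_v2_given_v1 = np.array([[0.9, 0.3],    # P(V2=T|V1=T), P(V2=T|V1=F)  (assumed)
                          [0.1, 0.7]])   # P(V2=F|V1=T), P(V2=F|V1=F)  (assumed)

# P(V2) = sum over v1 of P(V2|V1=v1) P(V1=v1)
p_v2 = p_v2_given_v1 @ p_v1
print(p_v2)                              # [0.78 0.22]
```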
The Evidence
- Evidence: values of observed nodes, e.g. V3 = T and V6 = 3.
- Our belief in what the value of each V_i ‘should’ be changes; this belief is propagated.
- It is as if the CPTs became: for V3, P(V3=T) = 1.0 and P(V3=F) = 0.0; for V6 (with parent V2), P(V6=3 | V2) = 1.0 and P(V6=1 | V2) = P(V6=2 | V2) = 0.0 for both values of V2. (Sketch below.)
[Figure: the network over V1–V6 with V3 and V6 marked as observed]
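A minimal sketch of the “as if the CPT became…” idea: observing a node clamps its local evidence vector to an indicator of the observed value.

```python
import numpy as np

def evidence_vector(num_values, observed_index):
    # Indicator distribution: all mass on the observed value.
    lam = np.zeros(num_values)
    lam[observed_index] = 1.0
    return lam

print(evidence_vector(2, 0))   # V3 = T  ->  [1. 0.]
print(evidence_vector(3, 2))   # V6 = 3  ->  [0. 0. 1.]
```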
The Messages
- We know what the π messages are. What about λ?
- The messages are π(V) = P(V | E_V^+) and λ(V) = P(E_V^- | V).
- Assume E = {V2} and compute by Bayes’ rule: P(V1 | V2) = P(V2 | V1) P(V1) / P(V2).
- The information not available at V1 is P(V2 | V1); it is to be passed upward by a λ-message.
- Again, this is not in general exactly the CPT, but the belief based on evidence down the tree.
Combination of evidence
- Recall that E_V = E_V^+ ∪ E_V^-, and compute (sketch below):
  BEL(v) = P(v | e_V^+, e_V^-) = α P(e_V^- | v) P(v | e_V^+) = α λ(v) π(v)
- α is the normalization constant.
- Normalization is not necessary (it can be done at the end) but may prevent numerical underflow problems.
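A minimal sketch of BEL(v) = α λ(v) π(v) for a binary node; the message values are made up:

```python
import numpy as np

pi_v = np.array([0.7, 0.3])     # P(v | e+)   (assumed values)
lam_v = np.array([0.2, 0.9])    # P(e- | v)   (assumed values)

# Combine causal and diagnostic support elementwise, then normalize.
bel = lam_v * pi_v
bel /= bel.sum()                # alpha = 1 / sum, deferred to the end
print(bel)                      # [0.3414... 0.6585...]
```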
Messages
- Assume X has received λ-messages from its neighbors. How do we compute λ(x) = P(e^- | x)?
- Let Y_1, …, Y_c be the children of X; λ_{XY}(x) denotes the λ-message sent between X and Y.
- Then λ(x) = ∏_{j=1..c} λ_{XY_j}(x): the elementwise product of the λ-messages from all children (sketch below).
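A sketch of the product over children’s λ-messages for a binary X; the message values are made up:

```python
import numpy as np

lam_from_children = [np.array([0.9, 0.4]),   # from Y1 (assumed values)
                     np.array([0.5, 0.8])]   # from Y2 (assumed values)

# lambda(x) is the elementwise product over all children's messages.
lam_x = np.ones(2)
for msg in lam_from_children:
    lam_x *= msg
print(lam_x)   # [0.45 0.32]
```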
Messages
- Assume X has received π-messages from its neighbors. How do we compute π(x) = P(x | e^+)?
- Let U_1, …, U_p be the parents of X; π_{XY}(x) denotes the π-message sent between X and Y.
- Then π(x) = Σ_{u_1..u_p} P(x | u_1, …, u_p) ∏_{i=1..p} π_{U_i X}(u_i): a summation over the CPT of X (sketch below).
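A sketch of the summation over the CPT for a binary X with two binary parents; the CPT and message values are made up:

```python
import numpy as np

cpt = np.random.rand(2, 2, 2)             # P(x | u1, u2), axis 0 is x (assumed)
cpt /= cpt.sum(axis=0, keepdims=True)     # normalize each parent configuration

pi_u1 = np.array([0.6, 0.4])              # pi message from parent U1 (assumed)
pi_u2 = np.array([0.1, 0.9])              # pi message from parent U2 (assumed)

# pi(x) = sum over (u1,u2) of P(x|u1,u2) * pi_u1(u1) * pi_u2(u2)
pi_x = np.einsum('xab,a,b->x', cpt, pi_u1, pi_u2)
print(pi_x, pi_x.sum())                   # a distribution over x; sums to 1
```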
Messages to pass
- We need to compute π_XY(x), the message a parent X sends to a child Y:
  π_XY(x) = α π(x) ∏_{Y_k ≠ Y} λ_{XY_k}(x)
  (the product runs over X’s other children).
- Similarly λ_XY(x), where X is the parent and Y the child. Symbolically, group the other parents of Y into V = V_1, …, V_q:
  λ_XY(x) = Σ_y λ(y) Σ_{v_1..v_q} P(y | x, v_1, …, v_q) ∏_{k=1..q} π_{V_k Y}(v_k)
- Both computations are sketched below.
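A sketch of both outgoing messages for binary variables, following the formulas above; all numeric values are made up:

```python
import numpy as np

# pi message from X to child Y: pi(x) times the lambda messages from
# X's *other* children, normalized.
pi_x = np.array([0.7, 0.3])                       # (assumed)
lam_from_other_children = [np.array([0.9, 0.4])]  # (assumed)
pi_xy = pi_x.copy()
for msg in lam_from_other_children:
    pi_xy *= msg
pi_xy /= pi_xy.sum()

# lambda message from child Y up to parent X: sum out Y and Y's other
# parent V, weighting by lambda(y) and the pi message from V.
cpt_y = np.random.rand(2, 2, 2)                   # P(y | x, v), axis 0 is y (assumed)
cpt_y /= cpt_y.sum(axis=0, keepdims=True)
lam_y = np.array([1.0, 0.2])                      # lambda(y)            (assumed)
pi_v = np.array([0.5, 0.5])                       # message from V       (assumed)
lam_xy = np.einsum('yxv,y,v->x', cpt_y, lam_y, pi_v)

print(pi_xy, lam_xy)
```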
The Pearl Belief Propagation Algorithm
We can summarize the algorithm now.
Initialization step:
- For all V_i = e_i in E:
  λ(x_i) = 1 wherever x_i = e_i; 0 otherwise
  π(x_i) = 1 wherever x_i = e_i; 0 otherwise
- For nodes without parents:
  π(x_i) = p(x_i) (the prior probabilities)
- For nodes without children:
  λ(x_i) = 1 uniformly (normalize at the end)
The Pearl Belief Propagation Algorithm
Iterate until no change occurs:
- (For each node X) If X has received all π-messages from its parents, calculate π(x).
- (For each node X) If X has received all λ-messages from its children, calculate λ(x).
- (For each node X) If π(x) has been calculated and X has received the λ-messages from all its children except Y, calculate π_XY(x) and send it to Y.
- (For each node X) If λ(x) has been calculated and X has received the π-messages from all its parents except U, calculate λ_{UX}(u) and send it to U.
- Compute BEL(x) = α λ(x) π(x) and normalize.
A minimal end-to-end run is sketched below.
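As a sanity check on the schedule above, here is a minimal sketch (not from the slides) of one full pass on a two-node chain X → Y with Y observed; the CPT numbers are made up:

```python
import numpy as np

p_x = np.array([0.6, 0.4])                # prior, pi(x) for the root X (assumed)
cpt = np.array([[0.8, 0.1],               # P(Y=y0 | X)                 (assumed)
                [0.2, 0.9]])              # P(Y=y1 | X)                 (assumed)

# Initialization: the evidence node Y gets an indicator lambda.
lam_y = np.array([0.0, 1.0])              # Y observed as y1

# Y can immediately send its lambda message up to X:
# lam_XY(x) = sum over y of lambda(y) P(y|x)
lam_xy = cpt.T @ lam_y                    # = [P(y1|x0), P(y1|x1)]

# X is a root, so pi(x) is the prior; combine and normalize.
bel_x = p_x * lam_xy
bel_x /= bel_x.sum()
print(bel_x)                              # P(X | Y=y1) = [0.25 0.75]
```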
Complexity
- On a polytree, the BP algorithm converges in time proportional to the diameter of the network, which is at most linear in the number of nodes.
- The work done in a node is proportional to the size of its CPT.
- Hence BP is linear in the number of network parameters.
- For general BNs: exact inference is NP-hard, and approximate inference is NP-hard too.
Most Graphs are not Polytrees
- Cutset conditioning
  - Instantiate a node in each cycle and absorb its value into the child’s CPT.
  - Do this for all possible values and run belief propagation each time.
  - Sum over the obtained conditionals (see the sketch below).
  - Hard to do: we need to compute P(c), and there is exponential blow-up in the cutset size, so a minimal cutset is desirable (finding one is also NP-complete).
- Clustering algorithm
- Approximate inference
  - MCMC methods
  - Loopy BP
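A rough sketch of the cutset-conditioning loop for a single cutset node C, mixing the per-value beliefs P(X | E, C=c) with weights P(C=c | E). Both helpers here are hypothetical stand-ins: `run_bp` is assumed to run polytree BP with the extra evidence clamped and return a numpy belief vector, and `weight_of` is assumed to supply P(C=c | E), which is itself nontrivial to obtain.

```python
def cutset_conditioning(run_bp, cutset_values, weight_of):
    # P(X | E) = sum over c of P(X | E, C=c) * P(C=c | E)
    total = None
    for c in cutset_values:
        bel = run_bp({'C': c})            # BP on the now singly connected net
        w = weight_of(c)                  # P(C=c | E)
        total = bel * w if total is None else total + bel * w
    return total / total.sum()            # renormalize at the end
```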