Pearl’s Belief Propagation Algorithm: Exact answers from tree-structured Bayesian networks
Heavily based on slides by Tomas Singliar; modified by Daniel Lowd for UW CSE 573

Outline
- What’s a polytree?
- Idea of belief propagation
- Messages and incorporation of evidence
- Combining messages into belief
- Computing messages
- Algorithm outlined
- What if the BN is not a polytree?

Polytree Bayesian networks
What is a polytree?
- A Bayesian network graph is a polytree if (and only if) there is at most one path between any two nodes V_i and V_k.
Why do we care about polytrees?
- Exact BN inference is NP-hard…
- …but on polytrees it takes linear time.
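A quick way to test the property: a directed graph is a polytree exactly when its undirected skeleton is a forest (no undirected cycles). The helper below is a minimal Python sketch of that check, not taken from the slides; the node and edge names in the demo are made up.

```python
def is_polytree(nodes, directed_edges):
    """Return True iff the DAG's undirected skeleton contains no cycle."""
    # Union-find over the undirected skeleton: if an edge joins two nodes
    # that are already connected, there is a second path, hence a cycle.
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path compression
            v = parent[v]
        return v

    for u, v in directed_edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

# Hypothetical examples (node names are illustrative only):
print(is_polytree("ABCD", [("A", "B"), ("A", "C"), ("B", "D")]))              # True
print(is_polytree("ABCD", [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]))  # False: two paths from A to D
```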

Examples: Polytree or not? [Diagrams of three example graphs over nodes V_1–V_6.]

Our inference task
We know the values of some evidence variables E. We wish to compute the conditional probability P(X_i | E) for every non-evidence variable X_i.
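For contrast with what follows, here is a brute-force way to answer such a query by summing the full joint distribution; the tiny two-variable network and its numbers are invented for illustration. This is exact but exponential in the number of variables, which is exactly what belief propagation avoids on polytrees.

```python
from itertools import product

variables = ["A", "B"]            # hypothetical binary variables, A -> B
joint = {}                        # joint table P(A, B)
for a, b in product([True, False], repeat=2):
    p_a = 0.3 if a else 0.7
    p_b_given_a = (0.9 if b else 0.1) if a else (0.2 if b else 0.8)
    joint[(a, b)] = p_a * p_b_given_a

def posterior(query_var, evidence, joint, variables):
    """Compute P(query_var | evidence) by enumerating the joint table."""
    totals = {True: 0.0, False: 0.0}
    for assignment, p in joint.items():
        world = dict(zip(variables, assignment))
        if all(world[v] == val for v, val in evidence.items()):
            totals[world[query_var]] += p
    z = sum(totals.values())
    return {val: t / z for val, t in totals.items()}

print(posterior("A", {"B": True}, joint, variables))   # P(A | B = True)
```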

Pearl’s belief propagation
- We want a local computation at each node V.
- Information flows through the links of G as messages of two types, λ and π.
- V splits the network into two disjoint parts; the strong independence assumptions this induces are crucial.
- Denote by E_V^+ the part of the evidence accessible through the parents of V (causal evidence); it is passed downward in π messages.
- Analogously, let E_V^- be the diagnostic evidence; it is passed upward in λ messages.

Pearl’s Belief Propagation
[Diagram: node V with parents U_1, U_2 and children V_1, V_2. π messages (π(U_1), π(U_2), π(V_1), π(V_2)) flow downward along the arcs, while λ messages (λ(U_1), λ(U_2), λ(V_1), λ(V_2)) flow upward.]

The  Messages What are the messages? For simplicity, let the nodes be binary V1V1 V2V2 V 1 =T0.8 V 1 =F0.2 PV 1 =TV 1 =F V 2 =T V 2 =F The message passes on information. What information? Observe: P(V 2 | V 1 ) = P(V 2 | V 1 =T)P(V 1 =T) + P(V 2 | V 1 =F)P(V 1 =F) The information needed is the CPT of V 1 =  V (V 1 )  Messages capture information passed from parent to child

The Evidence
Evidence = the values of the observed nodes, e.g. V_3 = T and V_6 = 3.
Our belief in what the value of each V_i ‘should’ be changes, and this belief is propagated as if the CPTs had become indicator tables:
- for V_3: P(V_3 = T) = 1.0, P(V_3 = F) = 0.0;
- for V_6: P(V_6 = 1 | V_2) = 0.0, P(V_6 = 2 | V_2) = 0.0, P(V_6 = 3 | V_2) = 1.0, regardless of the value of V_2.
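A small sketch of the same idea in code: clamping an observed node amounts to replacing its distribution with a 0/1 indicator over the observed value, which is what the "as if the CPT became" tables express.

```python
def evidence_vector(domain, observed_value):
    """Return the indicator 'CPT' of an observed node."""
    return {value: (1.0 if value == observed_value else 0.0) for value in domain}

print(evidence_vector([True, False], True))   # V3 = T -> {True: 1.0, False: 0.0}
print(evidence_vector([1, 2, 3], 3))          # V6 = 3 -> {1: 0.0, 2: 0.0, 3: 1.0}
```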

The λ Messages
We know what the π messages are; what about λ?
The messages are π(V) = P(V | E^+) and λ(V) = P(E^- | V).
Consider V_1 → V_2 again, assume E = {V_2}, and compute P(V_1 | V_2) by Bayes’ rule. The information not available at V_1 is P(V_2 | V_1); it is passed upward by a λ message. Again, this is not in general exactly the CPT, but the belief based on the evidence further down the tree.
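The Bayes-rule step can be made concrete as follows; with evidence on V_2, λ(V_1) is just the CPT column P(V_2 = observed | V_1). The CPT and prior values are assumed for illustration, matching the earlier sketch rather than the slide.

```python
prior_v1 = {True: 0.8, False: 0.2}               # pi(V1): prior, no upstream evidence
cpt_v2 = {True:  {True: 0.6, False: 0.3},        # P(V2=T | V1), assumed values
          False: {True: 0.4, False: 0.7}}        # P(V2=F | V1), assumed values

observed_v2 = True
lam_v1 = {v1: cpt_v2[observed_v2][v1] for v1 in (True, False)}   # lambda(V1) = P(V2=obs | V1)

# Bayes rule: P(V1 | V2 = obs) is proportional to lambda(V1) * pi(V1)
unnorm = {v1: lam_v1[v1] * prior_v1[v1] for v1 in (True, False)}
z = sum(unnorm.values())
print({v1: p / z for v1, p in unnorm.items()})   # {True: ~0.889, False: ~0.111}
```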

Combination of evidence
Recall that E_V = E_V^+ ∪ E_V^- and compute
BEL(v) = P(v | E_V) = α · λ(v) · π(v),
where α is a normalization constant. Normalization is not strictly necessary (it can be done once at the end), but it may prevent numerical underflow problems.
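A minimal sketch of this combination step, with made-up message values:

```python
def combine(lam, pi):
    """BEL(v) = alpha * lambda(v) * pi(v), with alpha chosen to normalize."""
    unnorm = {v: lam[v] * pi[v] for v in pi}
    z = sum(unnorm.values())
    return {v: p / z for v, p in unnorm.items()}

print(combine({True: 0.9, False: 0.3}, {True: 0.4, False: 0.6}))
# {True: 0.666..., False: 0.333...}
```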

Messages
Assume X has received λ messages from its neighbors. How do we compute λ(x) = P(e^- | x)?
Let Y_1, …, Y_c be the children of X, and let λ_{XY}(x) denote the λ message sent between X and Y. Then
λ(x) = ∏_{j=1..c} λ_{XY_j}(x).
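In code, this is just an elementwise product over the incoming child messages; the helper and the example messages below are illustrative, not from the slides.

```python
from math import prod

def lambda_value(domain, child_messages):
    """lambda(x): product over children of the lambda messages sent to X.
    For a leaf (no children) the empty product is 1, matching the initialization."""
    return {x: prod(msg[x] for msg in child_messages) for x in domain}

msgs = [{True: 0.9, False: 0.2}, {True: 0.5, False: 0.8}]   # two hypothetical children
print(lambda_value([True, False], msgs))                    # {True: 0.45, False: 0.16}
```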

Messages
Assume X has received π messages from its neighbors. How do we compute π(x) = P(x | e^+)?
Let U_1, …, U_p be the parents of X, and let π_{XY}(x) denote the π message sent between X and Y. Summing over the CPT,
π(x) = Σ_{u_1,…,u_p} P(x | u_1, …, u_p) · ∏_{i=1..p} π_{U_i X}(u_i).
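A sketch of that summation over the CPT; the representation of the CPT as a dictionary keyed by (x, parent-value tuple) is my own choice, and the one-parent example numbers are invented.

```python
from itertools import product
from math import prod

def pi_value(domain, parents, parent_messages, cpt):
    """pi(x) = sum over parent configurations of P(x | u1..up) * prod_i pi_{Ui X}(ui).
    cpt maps (x, (u1, ..., up)) -> P(x | u1, ..., up)."""
    parent_domains = [list(parent_messages[u]) for u in parents]
    return {x: sum(cpt[(x, us)] *
                   prod(parent_messages[u][v] for u, v in zip(parents, us))
                   for us in product(*parent_domains))
            for x in domain}

# Hypothetical binary X with one parent U:
cpt = {(True, (True,)): 0.9, (False, (True,)): 0.1,
       (True, (False,)): 0.2, (False, (False,)): 0.8}
print(pi_value([True, False], ["U"], {"U": {True: 0.7, False: 0.3}}, cpt))
# {True: 0.69, False: 0.31}
```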

Messages to pass
We need to compute π_{XY}(x), the message a parent X sends to its child Y:
π_{XY}(x) = α · π(x) · ∏_{k ≠ Y} λ_{XY_k}(x).
Similarly for λ_{XY}(x), where X is the parent and Y the child. Symbolically, group the other parents of Y into V = V_1, …, V_q:
λ_{XY}(x) = Σ_y λ(y) · Σ_{v_1,…,v_q} P(y | x, v_1, …, v_q) · ∏_{k=1..q} π_{V_k Y}(v_k).
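The two outgoing messages can be sketched directly from those formulas. The function names, argument layout, and example numbers below are mine; only the update rules themselves follow the slide.

```python
from itertools import product
from math import prod

def pi_message_to_child(pi_x, lambda_from_children, excluded_child):
    """pi_XY(x): pi(x) times the lambda messages from every child except Y, normalized."""
    out = {x: pi_x[x] * prod(msg[x] for child, msg in lambda_from_children.items()
                             if child != excluded_child)
           for x in pi_x}
    z = sum(out.values())
    return {x: p / z for x, p in out.items()}

def lambda_message_to_parent(x_domain, y_domain, lambda_y, other_parents,
                             other_parent_msgs, cpt):
    """lambda_XY(x) = sum_y lambda(y) sum_v P(y | x, v1..vq) prod_k pi_{Vk Y}(vk).
    cpt maps (y, x, (v1, ..., vq)) -> P(y | x, v1, ..., vq)."""
    other_domains = [list(other_parent_msgs[v]) for v in other_parents]
    return {x: sum(lambda_y[y] *
                   sum(cpt[(y, x, vs)] *
                       prod(other_parent_msgs[v][val]
                            for v, val in zip(other_parents, vs))
                       for vs in product(*other_domains))
                   for y in y_domain)
            for x in x_domain}

# Hypothetical usage: X has children Y and Z; the message X sends to Y uses only Z's lambda.
print(pi_message_to_child({True: 0.6, False: 0.4},
                          {"Y": {True: 0.9, False: 0.1}, "Z": {True: 0.5, False: 0.5}},
                          "Y"))                       # {True: 0.6, False: 0.4}

# Hypothetical usage: Y has the single parent X (no other parents V), and Y is observed True.
cpt = {(True, True, ()): 0.9, (False, True, ()): 0.1,
       (True, False, ()): 0.2, (False, False, ()): 0.8}
print(lambda_message_to_parent([True, False], [True, False],
                               {True: 1.0, False: 0.0}, [], {}, cpt))
# {True: 0.9, False: 0.2}
```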

The Pearl Belief Propagation Algorithm
We can summarize the algorithm now.
Initialization step:
- For all V_i = e_i in E: λ(x_i) = 1 wherever x_i = e_i, 0 otherwise; π(x_i) = 1 wherever x_i = e_i, 0 otherwise.
- For nodes without parents: π(x_i) = p(x_i), the prior probabilities.
- For nodes without children: λ(x_i) = 1 uniformly (normalize at the end).

The Pearl Belief Propagation Algorithm
Iterate until no change occurs:
- (For each node X) if X has received all the π messages from its parents, calculate π(x).
- (For each node X) if X has received all the λ messages from its children, calculate λ(x).
- (For each node X) if π(x) has been calculated and X has received the λ messages from all its children except Y, calculate π_{XY}(x) and send it to Y.
- (For each node X) if λ(x) has been calculated and X has received the π messages from all its parents except U, calculate λ_{XU}(x) and send it to U.
Finally, compute BEL(x) = λ(x) · π(x) and normalize; a worked run on a small chain follows below.
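To make the whole procedure concrete, here is a minimal end-to-end run on a three-node chain A → B → C with evidence C = T. The chain, the CPT numbers, and the variable names are made up; because each node has at most one parent and one child, the message products collapse and the two passes reduce to the few lines below. The final print shows BEL(A) agreeing with a brute-force computation over the joint.

```python
from itertools import product

p_a = {True: 0.6, False: 0.4}                                          # P(A), assumed
p_b = {True: {True: 0.7, False: 0.2}, False: {True: 0.3, False: 0.8}}  # P(B=b | A=a), keyed [b][a]
p_c = {True: {True: 0.9, False: 0.1}, False: {True: 0.1, False: 0.9}}  # P(C=c | B=b), keyed [c][b]
vals = (True, False)

def normalize(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}

# Lambda pass: the evidence C = True flows upward.
lam_c = {c: (1.0 if c else 0.0) for c in vals}                      # indicator on C = True
lam_b = {b: sum(lam_c[c] * p_c[c][b] for c in vals) for b in vals}  # lambda message C -> B
lam_a = {a: sum(lam_b[b] * p_b[b][a] for b in vals) for a in vals}  # lambda message B -> A

# Pi pass: prior information flows downward (A and B have no other children,
# so pi_AB = pi(A) and pi_BC = pi(B) here).
pi_a = dict(p_a)
pi_b = {b: sum(p_b[b][a] * pi_a[a] for a in vals) for b in vals}
pi_c = {c: sum(p_c[c][b] * pi_b[b] for b in vals) for c in vals}

bel_a = normalize({a: lam_a[a] * pi_a[a] for a in vals})

# Brute-force check: P(A | C = True) from the full joint.
post_a = {a: 0.0 for a in vals}
for a, b in product(vals, repeat=2):
    post_a[a] += p_a[a] * p_b[b][a] * p_c[True][b]
print(bel_a, normalize(post_a))   # both {True: 0.792, False: 0.208}
```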

Complexity
- On a polytree, the BP algorithm converges in time proportional to the diameter of the network, which is at most linear in the number of nodes.
- The work done at a node is proportional to the size of its CPT.
- Hence BP is linear in the number of network parameters.
For general BBNs:
- Exact inference is NP-hard.
- Approximate inference is NP-hard as well.

Most Graphs are not Polytrees
Cutset conditioning:
- Instantiate a node in a cycle and absorb its value into the children’s CPTs.
- Do this for every possible value and run belief propagation each time.
- Sum over the conditionals obtained (see the sketch below).
- Hard to do: we need to compute P(c), and the number of instantiations explodes exponentially, so a minimal cutset is desirable (finding one is itself NP-complete).
Clustering algorithm.
Approximate inference:
- MCMC methods
- Loopy BP
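A minimal sketch of the cutset-conditioning combination step (all names and numbers are hypothetical): each instantiation c of the cutset node would come from a separate polytree BP run, yielding an unnormalized weight P(c, E) and a conditional posterior; the query posterior is their weighted mixture.

```python
per_instantiation = {
    # cutset value c: (unnormalized weight P(c, E), posterior P(X | c, E))
    # Both entries are stand-ins for what a polytree BP run would return.
    True:  (0.30, {True: 0.8, False: 0.2}),
    False: (0.10, {True: 0.5, False: 0.5}),
}

z = sum(w for w, _ in per_instantiation.values())                 # = P(E)
posterior_x = {x: sum(w * post[x] for w, post in per_instantiation.values()) / z
               for x in (True, False)}
print(posterior_x)   # {True: 0.725, False: 0.275}
```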