Bayesian Networks: Independencies and Inference. Scott Davies and Andrew Moore. Note to other teachers and users of these slides: Andrew and Scott would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials: Comments and corrections gratefully received.

What Independencies does a Bayes Net Model? In order for a Bayesian network to model a probability distribution, the following must be true by definition: each variable is conditionally independent of all its non-descendants in the graph given the value of all its parents. This implies the factorization P(X_1, …, X_n) = Π_i P(X_i | parents(X_i)). But what else does it imply?
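For concreteness, here is a minimal sketch (with an invented three-node chain and made-up CPT values, not from the slides) of how that factorization lets the joint distribution be computed as a product of per-node conditionals:

```python
# Minimal sketch: for a hypothetical chain Z -> Y -> X, the Bayes-net
# factorization gives P(Z, Y, X) = P(Z) * P(Y | Z) * P(X | Y).
# The CPT numbers below are invented purely for illustration.
p_z = {True: 0.3, False: 0.7}
p_y_given_z = {True: {True: 0.8, False: 0.2},   # p_y_given_z[z][y] = P(Y=y | Z=z)
               False: {True: 0.1, False: 0.9}}
p_x_given_y = {True: {True: 0.6, False: 0.4},   # p_x_given_y[y][x] = P(X=x | Y=y)
               False: {True: 0.5, False: 0.5}}

def joint(z, y, x):
    """P(Z=z, Y=y, X=x) computed as the product of per-node conditionals."""
    return p_z[z] * p_y_given_z[z][y] * p_x_given_y[y][x]

# Sanity check: the factored joint sums to 1 over all eight assignments.
print(sum(joint(z, y, x) for z in (True, False)
          for y in (True, False) for x in (True, False)))  # 1.0
```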

What Independencies does a Bayes Net Model? Example [graph: Z → Y → X]: Given Y, does learning the value of Z tell us nothing new about X? I.e., is P(X | Y, Z) equal to P(X | Y)? Yes. Since we know the value of all of X’s parents (namely, Y), and Z is not a descendant of X, X is conditionally independent of Z. Also, since independence is symmetric, P(Z | Y, X) = P(Z | Y).

Quick proof that independence is symmetric. Assume: P(X | Y, Z) = P(X | Y). Then:
P(Z | X, Y) = P(X, Y | Z) P(Z) / P(X, Y)                                        (Bayes’s Rule)
            = P(X | Y, Z) P(Y | Z) P(Z) / (P(X | Y) P(Y))                       (Chain Rule)
            = P(X | Y) P(Y | Z) P(Z) / (P(X | Y) P(Y)) = P(Y | Z) P(Z) / P(Y)   (By Assumption)
            = P(Z | Y)                                                          (Bayes’s Rule)

What Independencies does a Bayes Net Model? Let I⟨X, {Y}, Z⟩ represent X and Z being conditionally independent given Y. [Graph: Y is a parent of both X and Z.] I⟨X, {Y}, Z⟩? Yes, just as in the previous example: all of X’s parents are given, and Z is not a descendant of X.

What Independencies does a Bayes Net Model? [Graph over nodes Z, V, U, X.] Whether I⟨X, S, Z⟩ holds depends on the evidence set S: for some choices of S the answer is no, for others it is yes. Maybe I⟨X, S, Z⟩ iff S acts as a cutset between X and Z in an undirected version of the graph…?

Things get a little more confusing. [Graph: X → Y ← Z.] X has no parents, so we know all its parents’ values trivially. Z is not a descendant of X. So I⟨X, {}, Z⟩ holds, even though there’s an undirected path from X to Z through an unknown variable Y. What if we do know the value of Y, though? Or one of its descendants?

The “Burglar Alarm” example. [Network: Burglar → Alarm ← Earthquake, Alarm → Phone Call.] Your house has a twitchy burglar alarm that is also sometimes triggered by earthquakes. The Earth arguably doesn’t care whether your house is currently being burgled. While you are on vacation, one of your neighbors calls and tells you your home’s burglar alarm is ringing. Uh oh!

Things get a lot more confusing. But now suppose you learn that there was a medium-sized earthquake in your neighborhood. Oh, whew! Probably not a burglar after all. Earthquake “explains away” the hypothetical burglar. But then it must not be the case that I⟨Burglar, {Phone Call}, Earthquake⟩, even though I⟨Burglar, {}, Earthquake⟩ holds! [Network: Burglar → Alarm ← Earthquake, Alarm → Phone Call.]
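To see explaining away numerically, here is a brute-force enumeration sketch over this network; the CPT numbers are made up for illustration and are not from the slides.

```python
from itertools import product

# Illustrative (made-up) CPTs for Burglar -> Alarm <- Earthquake, Alarm -> Call.
P_B = {True: 0.01, False: 0.99}
P_E = {True: 0.02, False: 0.98}
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(Alarm=T | Burglar, Earthquake)
       (False, True): 0.29, (False, False): 0.001}
P_C = {True: 0.9, False: 0.05}                     # P(Call=T | Alarm)

def joint(b, e, a, c):
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pc = P_C[a] if c else 1 - P_C[a]
    return P_B[b] * P_E[e] * pa * pc

def prob_burglar(given):
    """P(Burglar=T | given) by brute-force enumeration; 'given' fixes some variables."""
    num = den = 0.0
    for b, e, a, c in product((True, False), repeat=4):
        assign = {'B': b, 'E': e, 'A': a, 'C': c}
        if any(assign[k] != v for k, v in given.items()):
            continue
        p = joint(b, e, a, c)
        den += p
        if b:
            num += p
    return num / den

print(prob_burglar({'C': True}))              # ~0.133: well above the 0.01 prior
print(prob_burglar({'C': True, 'E': True}))   # ~0.028: the earthquake explains the alarm away
```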

d-separation to the rescue Fortunately, there is a relatively simple algorithm for determining whether two variables in a Bayesian network are conditionally independent: d-separation. Definition: X and Z are d-separated by a set of evidence variables E iff every undirected path from X to Z is “blocked”, where a path is “blocked” iff one or more of the following conditions is true:...

A path is “blocked” when... there exists a variable V on the path such that it is in the evidence set E and the arcs putting V on the path are “tail-to-tail” (← V →); or there exists a variable V on the path such that it is in the evidence set E and the arcs putting V on the path are “tail-to-head” (→ V →); or, ...

A path is “blocked” when… (the funky case) … or, there exists a variable V on the path such that it is NOT in the evidence set E, neither are any of its descendants, and the arcs putting V on the path are “head-to-head” (→ V ←).

d-separation to the rescue, cont’d. Theorem [Verma & Pearl, 1988]: If a set of evidence variables E d-separates X and Z in a Bayesian network’s graph, then I⟨X, E, Z⟩. d-separation can be computed in linear time using a depth-first-search-like algorithm. Great! We now have a fast algorithm for automatically inferring whether learning the value of one variable might give us any additional hints about some other variable, given what we already know. “Might”: variables may actually be independent even when they’re not d-separated, depending on the actual probabilities involved.
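As a sketch of what such a linear-time check can look like, here is the standard reachability (“Bayes-ball” style) formulation of d-separation, which applies the blocking rules above while searching over (node, direction) pairs. This particular code and the example network are mine, not from the slides.

```python
from collections import deque

def d_separated(parents, x, z, evidence):
    """Return True iff x and z are d-separated given the evidence set.

    'parents' maps each node to the list of its parents (this defines the DAG).
    Sketch of the standard reachability algorithm for d-separation.
    """
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)

    # Evidence nodes and their ancestors (needed for the head-to-head rule).
    anc = set(evidence)
    frontier = list(evidence)
    while frontier:
        n = frontier.pop()
        for p in parents[n]:
            if p not in anc:
                anc.add(p)
                frontier.append(p)

    # Search over (node, direction): 'up' = arrived from a child, 'down' = from a parent.
    visited = set()
    queue = deque([(x, 'up')])
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node == z and node not in evidence:
            return False  # found an active (unblocked) path from x to z
        if direction == 'up' and node not in evidence:
            # Tail at this node: may continue to parents (tail-to-head) or children (tail-to-tail).
            for p in parents[node]:
                queue.append((p, 'up'))
            for c in children[node]:
                queue.append((c, 'down'))
        elif direction == 'down':
            if node not in evidence:
                # Tail-to-head: pass straight through to children.
                for c in children[node]:
                    queue.append((c, 'down'))
            if node in anc:
                # Head-to-head: active only if the node or one of its descendants is observed.
                for p in parents[node]:
                    queue.append((p, 'up'))
    return True

# Example on the burglar-alarm network:
dag = {'B': [], 'E': [], 'A': ['B', 'E'], 'C': ['A']}
print(d_separated(dag, 'B', 'E', set()))    # True: independent a priori
print(d_separated(dag, 'B', 'E', {'C'}))    # False: the call activates the head-to-head path
print(d_separated(dag, 'B', 'C', {'A'}))    # True: the observed alarm blocks the chain
```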

d-separation example. [Graph over nodes A, B, C, D, E, F, G, H, I, J.] Exercise: does I⟨X, E, Z⟩ hold for a given choice of query variables and evidence set in this graph?

Bayesian Network Inference. Inference: calculating P(X | Y) for some variables or sets of variables X and Y. Inference in Bayesian networks is #P-hard! Counting satisfying assignments (#SAT) reduces to it: encode a formula over inputs I1, I2, I3, I4, I5, each with prior probability 0.5, feeding an output node O. Then P(O) must be (# satisfying assignments) × 0.5^(# inputs).
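A tiny worked instance of that reduction (illustrative, not from the slides): let O be the deterministic OR of the five inputs; 31 of the 2^5 input assignments satisfy it, so P(O = true) = 31 × 0.5^5 = 0.96875.

```python
from itertools import product

# Deterministic OR gate over 5 fair-coin inputs:
# P(O = true) = (# satisfying assignments) * 0.5 ** (# inputs).
n_inputs = 5
sat = sum(1 for bits in product((0, 1), repeat=n_inputs) if any(bits))
print(sat, sat * 0.5 ** n_inputs)  # 31 0.96875
```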

Bayesian Network Inference. But… inference is still tractable in some cases. Let’s look at a special class of networks: trees / forests, in which each node has at most one parent.

Decomposing the probabilities. Suppose we want P(X_i | E), where E is some set of evidence variables. Let’s split E into two parts: E_i^- is the part consisting of assignments to variables in the subtree rooted at X_i; E_i^+ is the rest of it.

Decomposing the probabilities, cont’d.
P(X_i | E) = P(X_i | E_i^+, E_i^-)
           = P(E_i^- | X_i, E_i^+) P(X_i | E_i^+) / P(E_i^- | E_i^+)   (Bayes’s Rule)
           = P(E_i^- | X_i) P(X_i | E_i^+) / P(E_i^- | E_i^+)          (X_i d-separates E_i^- from E_i^+)
           = α π(X_i) λ(X_i)
Where: α is a constant independent of X_i, π(X_i) = P(X_i | E_i^+), and λ(X_i) = P(E_i^- | X_i).

Using the decomposition for inference. We can use this decomposition to do inference as follows. First, compute λ(X_i) = P(E_i^- | X_i) for all X_i recursively, using the leaves of the tree as the base case. If X_i is a leaf: if X_i is in E, then λ(X_i) = 1 if X_i matches E, 0 otherwise; if X_i is not in E, then E_i^- is the null set, so P(E_i^- | X_i) = 1 (constant).
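In code, the base case might look like the following small sketch; the helper name and the 'evidence' dictionary are my own, not from the slides.

```python
def lambda_leaf(node, x, evidence):
    """lambda(X_i = x) = P(E_i^- | X_i = x) when X_i is a leaf.

    'evidence' is assumed to map observed node names to their observed values.
    """
    if node in evidence:
        return 1.0 if evidence[node] == x else 0.0  # x matches the observation, or it doesn't
    return 1.0  # no evidence below an unobserved leaf: P(empty E_i^- | X_i) = 1

print(lambda_leaf('C', True, {'C': True}))   # 1.0
print(lambda_leaf('C', False, {'C': True}))  # 0.0
```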

Quick aside: “Virtual evidence”. For theoretical simplicity, but without loss of generality, let’s assume that all variables in E (the evidence set) are leaves in the tree. Why we can do this WLOG: observing X_i is equivalent to attaching a new leaf child X_i' to X_i and observing X_i' instead, where P(X_i' | X_i) = 1 if X_i' = X_i, 0 otherwise.

Calculating λ(X_i) for non-leaves. Suppose X_i has one child, X_c. Then:
λ(X_i) = P(E_i^- | X_i)
       = Σ_{x_c} P(E_i^-, X_c = x_c | X_i)
       = Σ_{x_c} P(X_c = x_c | X_i) P(E_i^- | X_c = x_c, X_i)
       = Σ_{x_c} P(X_c = x_c | X_i) P(E_i^- | X_c = x_c)
       = Σ_{x_c} P(X_c = x_c | X_i) λ(X_c = x_c)

Calculating λ(X_i) for non-leaves. Now, suppose X_i has a set of children, C. Since X_i d-separates each of its subtrees, the contribution of each subtree to λ(X_i) is independent:
λ(X_i) = Π_{X_j ∈ C} λ_j(X_i), where λ_j(X_i) = Σ_{x_j} P(X_j = x_j | X_i) λ(X_j = x_j)
is the contribution to P(E_i^- | X_i) of the part of the evidence lying in the subtree rooted at one of X_i’s children X_j.

We are now λ-happy. So now we have a way to recursively compute all the λ(X_i)’s, starting from the root and using the leaves as the base case. If we want, we can think of each node in the network as an autonomous processor that passes a little “λ message” to its parent.
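Putting the base case and the recursion together, here is a small self-contained sketch of the λ pass on a hypothetical chain A → B → C with evidence at the leaf C; the structure and CPT numbers are mine, chosen purely for illustration.

```python
# Hypothetical chain A -> B -> C over booleans; evidence observed at the leaf C.
values = (True, False)
children = {'A': ['B'], 'B': ['C'], 'C': []}
# cpt[node][(node_value, parent_value)] = P(node = node_value | parent = parent_value)
cpt = {'B': {(True, True): 0.7, (False, True): 0.3, (True, False): 0.2, (False, False): 0.8},
       'C': {(True, True): 0.9, (False, True): 0.1, (True, False): 0.4, (False, False): 0.6}}
evidence = {'C': True}

def lam(node, x):
    """lambda(node = x) = P(evidence below 'node' | node = x)."""
    if not children[node]:                       # leaf: base case from the previous slide
        if node in evidence:
            return 1.0 if evidence[node] == x else 0.0
        return 1.0
    result = 1.0
    for c in children[node]:                     # lambda(X_i) = prod_j lambda_j(X_i)
        result *= sum(cpt[c][(xc, x)] * lam(c, xc) for xc in values)
    return result

print(lam('A', True), lam('A', False))  # 0.75 0.5, i.e. P(E^- | A=true), P(E^- | A=false)
```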

The other half of the problem. Remember, P(X_i | E) = α π(X_i) λ(X_i). Now that we have all the λ(X_i)’s, what about the π(X_i)’s? π(X_i) = P(X_i | E_i^+). What about the root of the tree, X_r? In that case, E_r^+ is the null set, so π(X_r) = P(X_r). No sweat. Since we also know λ(X_r), we can compute the final P(X_r | E). So for an arbitrary X_i with parent X_p, let’s inductively assume we know π(X_p) and/or P(X_p | E). How do we get π(X_i)?

Computing π(X_i).
π(X_i) = P(X_i | E_i^+)
       = Σ_{x_p} P(X_i, X_p = x_p | E_i^+)
       = Σ_{x_p} P(X_i | X_p = x_p, E_i^+) P(X_p = x_p | E_i^+)
       = Σ_{x_p} P(X_i | X_p = x_p) P(X_p = x_p | E_i^+)
       = Σ_{x_p} P(X_i | X_p = x_p) π_i(X_p = x_p)
Where π_i(X_p) is defined as P(X_p | E_i^+), which can be computed (up to a normalizing constant) from quantities we already have as P(X_p | E) / λ_i(X_p).

We’re done. Yay! Thus we can compute all the π(X_i)’s and, in turn, all the P(X_i | E)’s. We can think of the nodes as autonomous processors passing λ and π messages to their neighbors.
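As a complete (if simplified) sketch of the whole procedure on the same kind of tiny hypothetical chain as before, the following computes λ bottom-up, π top-down, and then the posteriors P(X_i | E). The π_i(X_p) messages are computed here as π(X_p) times the λ contributions of X_p’s other subtrees, which matches the definition above up to the overall normalization; all names and numbers are mine, for illustration only.

```python
values = (True, False)
parent = {'A': None, 'B': 'A', 'C': 'B'}
children = {'A': ['B'], 'B': ['C'], 'C': []}
prior_A = {True: 0.6, False: 0.4}               # prior at the root
cpt = {'B': {(True, True): 0.7, (False, True): 0.3, (True, False): 0.2, (False, False): 0.8},
       'C': {(True, True): 0.9, (False, True): 0.1, (True, False): 0.4, (False, False): 0.6}}
evidence = {'C': True}

def lam(node, x):                  # lambda(X_i = x) = P(E_i^- | X_i = x), as before
    if not children[node]:
        if node in evidence:
            return 1.0 if evidence[node] == x else 0.0
        return 1.0
    out = 1.0
    for c in children[node]:
        out *= sum(cpt[c][(xc, x)] * lam(c, xc) for xc in values)
    return out

def lam_child(c, xp):              # lambda_c(X_p = xp): child c's message to its parent
    return sum(cpt[c][(xc, xp)] * lam(c, xc) for xc in values)

def pi(node, x):                   # pi(X_i = x), proportional to P(X_i = x | E_i^+)
    if parent[node] is None:
        return prior_A[x]          # at the root, E^+ is empty, so pi is just the prior
    p = parent[node]
    total = 0.0
    for xp in values:
        # pi_i(X_p): pi(X_p) times the lambda contributions of X_p's *other* children
        # (equivalent, up to normalization, to P(X_p | E) / lambda_i(X_p) from the slides).
        msg = pi(p, xp)
        for sib in children[p]:
            if sib != node:
                msg *= lam_child(sib, xp)
        total += cpt[node][(x, xp)] * msg
    return total

def posterior(node):               # P(X_i | E) = alpha * pi(X_i) * lambda(X_i)
    unnorm = {x: pi(node, x) * lam(node, x) for x in values}
    z = sum(unnorm.values())
    return {x: v / z for x, v in unnorm.items()}

for n in ('A', 'B', 'C'):
    print(n, posterior(n))         # e.g. P(A=true | C=true) ~ 0.692, P(C=true | C=true) = 1.0
```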

Conjunctive queries What if we want, e.g., P(A, B | C) instead of just marginal distributions P(A | C) and P(B | C)? Just use chain rule: P(A, B | C) = P(A | C) P(B | A, C) Each of the latter probabilities can be computed using the technique just discussed.
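Continuing the hypothetical chain sketch above, a conjunctive query amounts to two runs of the same routine, the second with the first query variable added to the evidence; the dummy observation node A_obs used below is my own application of the earlier “virtual evidence” trick, not something from the slides.

```python
# P(A=true, B=true | C=true) = P(A=true | C=true) * P(B=true | A=true, C=true)
p_a = posterior('A')[True]                  # first factor, with evidence = {'C': True}; compute before conditioning on A

# Second factor: condition on A as well by attaching a dummy observed leaf A_obs
# whose CPT is the identity (the virtual-evidence construction).
children['A'].append('A_obs')
children['A_obs'] = []
parent['A_obs'] = 'A'
cpt['A_obs'] = {(True, True): 1.0, (False, True): 0.0, (True, False): 0.0, (False, False): 1.0}
evidence['A_obs'] = True

p_b_given_a_c = posterior('B')[True]        # P(B=true | A=true, C=true)
print(p_a * p_b_given_a_c)                  # P(A=true, B=true | C=true), ~0.582 with these numbers
```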

Polytrees Technique can be generalized to polytrees: undirected versions of the graphs are still trees, but nodes can have more than one parent

Dealing with cycles. We can deal with undirected cycles in the graph in two ways: (1) clustering variables together, e.g., merging B and C into a single compound node BC so that a network over A, B, C, D becomes a tree over A, BC, D; or (2) conditioning: instantiate a variable (set it to 0, then set it to 1), solve each resulting simplified network, and combine the results.
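In equation form (my paraphrase, not a formula from the slides), conditioning on a loop-cutting variable A means solving one cycle-free network per value of A and mixing the answers: P(X | E) = Σ_a P(A = a | E) · P(X | E, A = a), where each P(X | E, A = a) can be computed with the tree/polytree algorithm because fixing A breaks the loop.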

Join trees. An arbitrary Bayesian network can be transformed, via some evil graph-theoretic magic, into a join tree in which a similar method can be employed. [Example: a network over A, B, C, D, E, F, G becomes a join tree with cluster nodes such as ABC, BCD, and DF.] In the worst case the join tree nodes must take on exponentially many combinations of values, but this often works well in practice.