
1 Bayesian Networks (Directed Acyclic Graphical Models) The situation of a bell that rings whenever the outcomes of two coins are equal cannot be well represented by undirected graphical models: a clique would be formed because of the induced dependency between the two coins given the bell. [Figure: Coin 1 → Bell ← Coin 2.]

2 Bayesian Networks (BNs) Examples of models for diseases, symptoms, and risk factors: one variable for all diseases (values are diseases) versus one variable per disease (values are True/False); naïve Bayesian networks versus bipartite BNs.
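As a small illustration of the two structures as directed edge sets (a sketch in Python; the disease and symptom names are hypothetical, not from the slides):

```python
# Naive BN: a single disease variable, whose values are the diseases,
# pointing to every symptom.
naive_bn = {('Disease', s) for s in ('Fever', 'Cough', 'Fatigue')}

# Bipartite BN: one True/False variable per disease, each pointing only
# to the symptoms it can cause.
bipartite_bn = {('Flu', 'Fever'), ('Flu', 'Cough'),
                ('Bronchitis', 'Cough'), ('Anemia', 'Fatigue')}
```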

3 Boundary Basis for Dependency Models Let M be a dependency model over U = {X_1, …, X_n}. Let d be an ordering of these elements. A boundary basis of M with respect to d is a set of independence statements I(X_i, B_i, U_i - B_i) that hold in M, where B_i ⊆ U_i and U_i = {X_1, X_2, …, X_{i-1}}, i = 1, …, n. A boundary basis is minimal if every B_i is minimal. Example I: What is the boundary basis for P(X_1,X_2,X_3,X_4) = P(X_1)P(X_2|X_1)P(X_3|X_2)P(X_4|X_3)?

4 Example I A boundary basis and a boundary DAG for P(X_1,X_2,X_3,X_4) = P(X_1)P(X_2|X_1)P(X_3|X_2)P(X_4|X_3):
I(X_3, X_2, X_1)
I(X_4, X_3, {X_1, X_2})
[Figure: the chain DAG X_1 → X_2 → X_3 → X_4.]
The directed acyclic graph (DAG) created by assigning each vertex X_i the parents B_i is called the boundary DAG of M relative to order d.
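As a quick numerical check of Example I (a minimal sketch; only the chain structure comes from the slide, the CPT entries are random):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random binary CPTs for the chain X1 -> X2 -> X3 -> X4.
p1 = rng.dirichlet(np.ones(2))          # P(X1)
p2 = rng.dirichlet(np.ones(2), size=2)  # P(X2 | X1), one row per value of X1
p3 = rng.dirichlet(np.ones(2), size=2)  # P(X3 | X2)
p4 = rng.dirichlet(np.ones(2), size=2)  # P(X4 | X3)

# Joint P(x1,x2,x3,x4) = P(x1) P(x2|x1) P(x3|x2) P(x4|x3).
joint = np.einsum('a,ab,bc,cd->abcd', p1, p2, p3, p4)

# Verify the basis statement I(X3, X2, X1): P(x3 | x1, x2) must not depend on x1.
p123 = joint.sum(axis=3)
p3_given_12 = p123 / p123.sum(axis=2, keepdims=True)
print(np.allclose(p3_given_12[0], p3_given_12[1]))  # True
```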

5 Example II A boundary basis and a boundary DAG for P(coin1, coin2, bell) = P(coin1) P(coin2) P(bell | coin1, coin2):
I(coin1, {}, coin2)
[Figure: Coin 1 → Bell ← Coin 2.]
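The same kind of check for Example II (a sketch assuming fair coins, with the bell ringing exactly when the coins agree): the coins are marginally independent, but observing the bell induces a dependency between them.

```python
import numpy as np

# Joint over (coin1, coin2, bell), with bell = 1 iff the coins match.
P = np.zeros((2, 2, 2))
for c1 in (0, 1):
    for c2 in (0, 1):
        P[c1, c2, int(c1 == c2)] = 0.25

# I(coin1, {}, coin2): the marginal over the coins factorizes.
p12 = P.sum(axis=2)
print(np.allclose(p12, np.outer(p12.sum(axis=1), p12.sum(axis=0))))  # True

# But given bell = 1 the coins are perfectly correlated (induced dependency).
print(P[:, :, 1] / P[:, :, 1].sum())  # all mass on the diagonal
```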

6 Example III In the order V,S,T,L,B,A,X,D, we have a boundary basis:
I(S, {}, V)
I(T, V, S)
I(L, S, {T, V})
…
I(X, A, {V,S,T,L,B,D})
[Figure: the DAG over V,S,T,L,B,A,X,D with edges V → T, S → L, S → B, T → A, L → A, A → X, A → D, B → D.]
Does I({X, D}, A, V) also hold in the dependency model P?

7 Definitions 1. A Directed Acyclic Graph (DAG) D=(U,E) is an I-map of a dependency model M over U if I_D(X,Z,Y) ⇒ I_M(X,Z,Y) for all disjoint subsets X, Y, Z of U. 2. D is a minimal I-map of M if, by removing any edge, D ceases to be an I-map. 3. D is a perfect map of M if I_D(X,Z,Y) ⇔ I_M(X,Z,Y) for all disjoint subsets X, Y, Z of U. Can we define "independence" I_D(X,Z,Y) graphically in a way that answers these probabilistic independence questions?

8 From Separation in UGs to d-Separation in DAGs

9 Paths Intuition: dependency must "flow" along paths in the graph. A path is a sequence of neighboring variables. Examples: X ← A → D ← B and A ← L ← S → B. [Figure: the DAG of Example III.]

10 Path blockage Every path is classified given the evidence: active (creates a dependency between the end nodes) or blocked (does not create a dependency between the end nodes). Evidence means the assignment of a value to a subset of nodes.

11 Path Blockage Three cases; the first is a common cause (L ← S → B): the path is blocked when S is given, and active when S is not given. [Figure: L ← S → B shown in both the blocked and the active configuration.]

12 Path Blockage The second case is an intermediate cause (S → L → A): the path is blocked when L is given, and active when L is not given. [Figure: S → L → A shown in both the blocked and the active configuration.]

13 Path Blockage The third case is a common effect (T → A ← L, where X is a descendant of A): the path is blocked when neither A nor any of its descendants is given, and active when A or one of its descendants (e.g., X) is given. [Figure: T → A ← L with descendant X, shown in one blocked and two active configurations.]

14 Definition of Path Blockage Definition: A path is active, given evidence Z, if (1) whenever we have the configuration T → A ← L (a common effect), either A or one of its descendants is in Z, and (2) no other node in the path is in Z. Definition: A path is blocked, given evidence Z, if it is not active. Definition: X is d-separated from Y, given Z, if all paths between a node in X and a node in Y are blocked, given Z.
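The definition translates almost directly into code. Here is a minimal sketch of a d-separation test for small DAGs (the function and variable names are my own, not from the slides; it enumerates simple paths explicitly, which is fine for toy graphs but not for large ones):

```python
def descendants(children, node):
    """All nodes reachable from `node` along directed edges (excluding node)."""
    seen, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def d_separated(edges, x, y, z):
    """True iff every path between x and y is blocked given the evidence set z.

    `edges` is a set of (parent, child) pairs describing a DAG.
    """
    children, neighbors = {}, {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    def active(path):
        # A path is active iff every intermediate node lets dependency through.
        for prev, node, nxt in zip(path, path[1:], path[2:]):
            if (prev, node) in edges and (nxt, node) in edges:
                # Common effect: needs the node or one of its descendants in z.
                if node not in z and not (descendants(children, node) & z):
                    return False
            elif node in z:
                # Chain or common cause: observing the middle node blocks it.
                return False
        return True

    def paths(node, path):
        # Enumerate simple (loop-free) undirected paths from node to y.
        if node == y:
            yield path
            return
        for n in neighbors.get(node, ()):
            if n not in path:
                yield from paths(n, path + (n,))

    return not any(active(p) for p in paths(x, (x,)))
```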

15 d-Separation

16 Example I_D(T, S | {}) = yes. [Figure: the DAG of Example III.]

17 Example I_D(T, S | {}) = yes. I_D(T, S | D) = no. [Figure: the DAG of Example III.]

18 Example I_D(T, S | {}) = yes. I_D(T, S | D) = no. I_D(T, S | {D, L, B}) = yes. [Figure: the DAG of Example III.]
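Running the d_separated sketch from slide 14 on these queries, with the edge set read off the factorization on slide 27, reproduces all three answers:

```python
# Edges of the example DAG: V → T, S → L, S → B, T → A, L → A, A → X, A → D, B → D.
dag = {('V', 'T'), ('S', 'L'), ('S', 'B'), ('T', 'A'),
       ('L', 'A'), ('A', 'X'), ('A', 'D'), ('B', 'D')}

print(d_separated(dag, 'T', 'S', set()))            # True  (yes)
print(d_separated(dag, 'T', 'S', {'D'}))            # False (no)
print(d_separated(dag, 'T', 'S', {'D', 'L', 'B'}))  # True  (yes)
```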

19 Example In the order V,S,T,L,B,A,X,D, we get from the boundary basis:
I_D(S, {}, V)
I_D(T, V, S)
I_D(L, S, {T, V})
…
I_D(X, A, {V,S,T,L,B,D})
[Figure: the DAG of Example III.]

20 Main Result - Soundness

21 Bayesian Networks (Directed Acyclic Graphical Models) Definition: Given a probability distribution P on a set of variables U, a DAG D = (U,E) is called a Bayesian Network of P iff D is a minimal I-map of P.

22 First claim holds because any probability distribution is a semi-graphoid (symmetry, decomposition, contraction, weak union).
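For reference, the four semi-graphoid axioms written out (standard statements; the notation follows the slides):

```latex
\begin{align*}
\text{Symmetry:}      \quad & I(X,Z,Y) \Rightarrow I(Y,Z,X) \\
\text{Decomposition:} \quad & I(X,Z,YW) \Rightarrow I(X,Z,Y) \\
\text{Contraction:}   \quad & I(X,Z,Y) \wedge I(X,ZY,W) \Rightarrow I(X,Z,YW) \\
\text{Weak union:}    \quad & I(X,Z,YW) \Rightarrow I(X,ZW,Y)
\end{align*}
```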

23 Second claim, the uniqueness of parent sets, holds due to: I(X, ZW_1, YW_2) and I(X, ZW_2, YW_1) ⇒ I(X, Z, YW_1W_2).
Proof:
(1) I(X, ZW_1, YW_2). Given.
(2) I(X, ZW_2, YW_1). Given.
(3) I(X, ZW_1W_2, Y) by weak union from (1).
(4) I(X, ZYW_1, W_2) by weak union from (1).
(5) I(X, ZYW_2, W_1) by weak union from (2).
(6) I(X, ZY, W_1W_2) by intersection from (4) and (5).
(7) I(X, Z, YW_1W_2) by intersection from (3) and (6).

24 d-separation The definition of I_D(X, Z, Y) is such that: Soundness [Theorem 9]: I_D(X, Z, Y) = yes implies that I_P(X, Z, Y) follows from the boundary basis Basis(D). Completeness [Theorem 10]: I_D(X, Z, Y) = no implies that I_P(X, Z, Y) does not follow from the boundary basis Basis(D).

25 Revisiting Example III [Figure: the DAG of Example III.] So does I_P({X, D}, A, V) hold? It is enough to check d-separation!
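With the d_separated sketch and the dag edge set from the earlier examples, the set query reduces to checking each node of {X, D} against V:

```python
# I({X, D}, A, V) holds iff every path from X or from D to V is blocked given {A}.
print(all(d_separated(dag, n, 'V', {'A'}) for n in ('X', 'D')))  # True
```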

26 Bayesian Networks with numbers [Figure: the DAG of Example III, annotated with the local distributions p(v), p(s), p(t|v), p(l|s), p(b|s), p(a|t,l), p(x|a), p(d|a,b).]

27 Bayesian Network (cont.) Each directed acyclic graph defines a factorization of the form: p(v,s,t,l,b,a,x,d) = p(v) p(s) p(t|v) p(l|s) p(b|s) p(a|t,l) p(x|a) p(d|a,b). [Figure: the same annotated DAG as on slide 26.]
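A sketch of evaluating this factorization in Python. Only the structure and the p(a|t,l) entries (taken from slide 29) come from the slides; every other number is a made-up placeholder:

```python
from itertools import product

parents = {'V': (), 'S': (), 'T': ('V',), 'L': ('S',), 'B': ('S',),
           'A': ('T', 'L'), 'X': ('A',), 'D': ('A', 'B')}

# prob_one[v][parent_values] = p(v = 1 | parents). p(A=1|T,L) is from slide 29
# (keys are (t, l)); all remaining numbers are hypothetical placeholders.
prob_one = {'V': {(): 0.01}, 'S': {(): 0.50},
            'T': {(0,): 0.01, (1,): 0.05},
            'L': {(0,): 0.01, (1,): 0.10},
            'B': {(0,): 0.30, (1,): 0.60},
            'A': {(0, 0): 0.02, (1, 0): 0.60, (0, 1): 0.99, (1, 1): 0.99},
            'X': {(0,): 0.05, (1,): 0.98},
            'D': {(0, 0): 0.10, (0, 1): 0.80, (1, 0): 0.70, (1, 1): 0.90}}

order = list(parents)  # insertion order is topological here

def joint(assign):
    """p(v,s,t,l,b,a,x,d) = p(v)p(s)p(t|v)p(l|s)p(b|s)p(a|t,l)p(x|a)p(d|a,b)."""
    p = 1.0
    for v in order:
        p1 = prob_one[v][tuple(assign[u] for u in parents[v])]
        p *= p1 if assign[v] == 1 else 1.0 - p1
    return p

# Sanity check: the factorization defines a proper joint distribution.
total = sum(joint(dict(zip(order, vals)))
            for vals in product((0, 1), repeat=len(order)))
print(round(total, 10))  # 1.0
```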

28 Independence in Bayesian networks (*) I_P(X_i ; {X_1, …, X_{i-1}} \ Pa_i | Pa_i). This set of independence assertions is denoted Basis(G). All other independence assertions that are entailed by (*) are derivable using the semi-graphoid axioms.

29 Local distributions: asymmetric independence The table p(A|T,L), where A is Abnormality in Chest (yes/no), T is Tuberculosis (yes/no), and L is Lung Cancer (yes/no):
p(A=y | L=n, T=n) = 0.02
p(A=y | L=n, T=y) = 0.60
p(A=y | L=y, T=n) = 0.99
p(A=y | L=y, T=y) = 0.99
Note the asymmetry: given L=y, A is independent of T, but given L=n it is not.
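The asymmetry is easy to check mechanically (a tiny sketch using only the numbers above):

```python
# p(A=y | L, T), keyed by (L, T); entries copied from the table above.
p_a = {('n', 'n'): 0.02, ('n', 'y'): 0.60, ('y', 'n'): 0.99, ('y', 'y'): 0.99}

# Given L=y, A no longer depends on T (context-specific independence)...
print(p_a[('y', 'n')] == p_a[('y', 'y')])  # True
# ...but given L=n it still does.
print(p_a[('n', 'n')] == p_a[('n', 'y')])  # False
```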

30 COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents. Proof (⇐): If each variable X is conditionally independent of all its non-descendants given its parents, then, using decomposition, it is also independent of its predecessors in a particular order d. Proof (⇒): X is d-separated from all its non-descendants, given its parents. Since D is an I-map, by the soundness theorem the claim holds.

31 COROLLARY 5: If D=(U,E) is a boundary DAG of P constructed in some order d, then any topological order d' of U will yield the same boundary DAG of P. (Hence the construction order can be forgotten.) Proof: By Corollary 4, each variable X is d-separated from all its non-descendants, given its parents in the boundary DAG of P. In particular, due to decomposition, X is independent, given its parents, of all previous variables in any topological order d'.

32 Extension of the Markov Chain Property I(X_k, X_{k-1}, {X_1, …, X_{k-2}}) ⇒ I(X_k, {X_{k-1}, X_{k+1}}, {X_1, …, X_{k-2}, X_{k+2}, …, X_n}). This holds due to the soundness theorem; the converse holds when intersection is assumed. Markov Blankets in DAGs
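In a DAG, the Markov blanket of a node consists of its parents, its children, and its children's other parents; given the blanket, the node is d-separated from all remaining variables. A small sketch, reusing the dag edge set from the d-separation examples:

```python
def markov_blanket(edges, node):
    """Parents, children, and children's other parents of `node` in a DAG."""
    parents = {a for a, b in edges if b == node}
    children = {b for a, b in edges if a == node}
    spouses = {a for a, b in edges if b in children and a != node}
    return parents | children | spouses

print(sorted(markov_blanket(dag, 'A')))  # ['B', 'D', 'L', 'T', 'X']
```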

33 Consequence: There is no improvement to d-separation, and no statement escapes graphical representation. Reasoning: (1) If there were an independence statement σ not shown by d-separation, then σ would have to be true in all distributions that satisfy the basis. But Theorem 10 states that there exists a distribution that satisfies the basis and violates σ. (2) Same argument. [Note that (2) is a stronger claim.]