Bayesian Networks (Directed Acyclic Graphical Models)


Presentation transcript:

1 Bayesian Networks (Directed Acyclic Graphical Models) The situation of a bell that rings whenever the outcomes of two coins are equal cannot be well represented by undirected graphical models: a clique is formed because of the dependency between the two coins that is induced once the bell is given. [Figure: Coin 1 → Bell ← Coin 2]

2 Bayesian Networks (BNs) Examples of models for diseases, symptoms, and risk factors: one variable for all diseases (its values are the diseases), or one variable per disease (values are True/False). Naïve Bayesian networks versus bipartite BNs.

3 Boundary Basis for Dependency Models Let $M$ be a dependency model over $U = \{X_1, \dots, X_n\}$, and let $d$ be an ordering of these elements. A boundary basis of $M$ with respect to $d$ is a set of independence statements $I(X_i, B_i, U_i - B_i)$ that hold in $M$, where $U_i = \{X_1, X_2, \dots, X_{i-1}\}$ and $i = 1, \dots, n$. A boundary basis is minimal if every $B_i$ is minimal. Example I: What is the boundary basis for $P(X_1, X_2, X_3, X_4) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_2)\,P(X_4 \mid X_3)$?

4 Example I A boundary basis and a boundary DAG for $P(X_1, X_2, X_3, X_4) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_2)\,P(X_4 \mid X_3)$: $I(X_3, X_2, X_1)$ and $I(X_4, X_3, \{X_1, X_2\})$. [Figure: X1 → X2 → X3 → X4] The directed acyclic graph (DAG) created by assigning each vertex $X_i$ the parents $B_i$ is called the boundary DAG of $M$ relative to the order $d$.

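5 Example II A boundary basis and a boundary DAG for $P(\text{coin}_1, \text{coin}_2, \text{bell}) = P(\text{coin}_1)\,P(\text{coin}_2)\,P(\text{bell} \mid \text{coin}_1, \text{coin}_2)$: $I(\text{coin}_1, \emptyset, \text{coin}_2)$. [Figure: Coin 1 → Bell ← Coin 2]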
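The coin and bell example can be checked numerically. The sketch below assumes fair coins and a bell that rings deterministically exactly when the outcomes match; neither number is given on the slides, and the helper names `joint` and `prob` are illustrative only. It builds the joint from the factorization above and compares the two coins with and without knowledge of the bell.

```python
from fractions import Fraction
from itertools import product

# Assumptions (not from the slides): both coins are fair, and the bell
# rings with probability 1 exactly when the two outcomes are equal.
HALF = Fraction(1, 2)

def joint(c1, c2, bell):
    """P(coin1, coin2, bell) = P(coin1) P(coin2) P(bell | coin1, coin2)."""
    p_bell = Fraction(1) if bell == (c1 == c2) else Fraction(0)
    return HALF * HALF * p_bell

def prob(pred):
    """Sum the joint over all outcomes that satisfy pred(c1, c2, bell)."""
    return sum(joint(c1, c2, b)
               for c1, c2, b in product("HT", "HT", (True, False))
               if pred(c1, c2, b))

# Without evidence: compare P(c1=H, c2=H) with P(c1=H) P(c2=H).
p_hh = prob(lambda c1, c2, b: c1 == "H" and c2 == "H")
p_h1 = prob(lambda c1, c2, b: c1 == "H")
p_h2 = prob(lambda c1, c2, b: c2 == "H")
marginally_independent = (p_hh == p_h1 * p_h2)

# Given that the bell rang: the same comparison on conditional probabilities.
p_bell = prob(lambda c1, c2, b: b)
p_hh_given_bell = prob(lambda c1, c2, b: b and c1 == "H" and c2 == "H") / p_bell
p_h1_given_bell = prob(lambda c1, c2, b: b and c1 == "H") / p_bell
p_h2_given_bell = prob(lambda c1, c2, b: b and c2 == "H") / p_bell
independent_given_bell = (p_hh_given_bell == p_h1_given_bell * p_h2_given_bell)

print(marginally_independent, independent_given_bell)  # True False
```

Under these assumptions the coins are independent a priori but become dependent once the bell is observed, which is the induced dependency mentioned on slide 1.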

6 Example III In the order V, S, T, L, B, A, X, D, we have a boundary basis: $I(S, \emptyset, V)$; $I(T, V, S)$; $I(L, S, \{T, V\})$; …; $I(X, A, \{V, S, T, L, B, D\})$. [Figure: DAG over V, S, T, L, A, B, X, D] Does $I(\{X, D\}, A, V)$ also hold in the dependency model $P$?

7 Definitions 1. A directed acyclic graph (DAG) $D = (U, E)$ is an I-map of a dependency model $M$ over $U$ if $I_D(X, Z, Y) \Rightarrow I_M(X, Z, Y)$ for all disjoint subsets $X, Y, Z$ of $U$. 2. $D$ is a minimal I-map of $M$ if removing any edge makes $D$ cease to be an I-map. 3. $D$ is a perfect map of $M$ if $I_D(X, Z, Y) \Leftrightarrow I_M(X, Z, Y)$ for all disjoint subsets $X, Y, Z$ of $U$. Can we define an "independence" relation $I_D(X, Z, Y)$ graphically so that it answers these probabilistic independence questions?

8 From Separation in UGs To d-Separation in DAGs

9 Paths Intuition: dependency must "flow" along paths in the graph. A path is a sequence of neighboring variables. Examples: X ← A → D ← B and A ← L ← S → B. [Figure: DAG over V, S, T, L, A, B, X, D]

10 Path blockage Every path is classified given the evidence: active (creates a dependency between the end nodes) or blocked (does not create a dependency between the end nodes). Evidence means the assignment of a value to a subset of nodes.

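11 Path Blockage Three cases: (1) Common cause. [Figure: L ← S → B, shown twice: blocked when S is in the evidence, active otherwise]

12 Path Blockage Three cases: (1) Common cause. (2) Intermediate cause. [Figure: S → L → A, shown twice: blocked when L is in the evidence, active otherwise]

13 Path Blockage Three cases: (1) Common cause. (2) Intermediate cause. (3) Common effect. [Figure: T → A ← L with A → X, shown three times: blocked when neither A nor its descendant X is in the evidence, active when A or X is in the evidence]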

14 Definition of Path Blockage Definition: A path is active, given evidence Z, if (i) whenever the path contains the configuration T → A ← L, either A or one of its descendants is in Z, and (ii) no other nodes in the path are in Z. Definition: A path is blocked, given evidence Z, if it is not active. Definition: X is d-separated from Y, given Z, if all paths between a node in X and a node in Y are blocked, given Z.

15 d-Separation

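16 Example $I_D(T, S \mid \emptyset)$ = yes. [Figure: DAG over V, S, T, L, A, B, X, D]

17 Example $I_D(T, S \mid \emptyset)$ = yes; $I_D(T, S \mid D)$ = no. [Figure: DAG over V, S, T, L, A, B, X, D]

18 Example $I_D(T, S \mid \emptyset)$ = yes; $I_D(T, S \mid D)$ = no; $I_D(T, S \mid \{D, L, B\})$ = yes. [Figure: DAG over V, S, T, L, A, B, X, D]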
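The three answers above can be reproduced by applying the path-blockage definition literally. The following is a minimal sketch, not an efficient algorithm: it enumerates every simple path and tests each one. The parent sets are an assumption read off the local distributions listed on slide 26, and the function names are illustrative.

```python
# Parent sets of the example DAG, read off the local distributions
# p(v), p(s), p(t|v), p(l|s), p(b|s), p(a|t,l), p(x|a), p(d|a,b).
PARENTS = {
    "V": set(), "S": set(), "T": {"V"}, "L": {"S"}, "B": {"S"},
    "A": {"T", "L"}, "X": {"A"}, "D": {"A", "B"},
}

def descendants(node):
    """All nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for child, pa in PARENTS.items():
            if cur in pa and child not in found:
                found.add(child)
                stack.append(child)
    return found

def simple_paths(start, goal):
    """Yield every simple path from start to goal, ignoring edge direction."""
    def neighbors(n):
        return PARENTS[n] | {c for c, pa in PARENTS.items() if n in pa}

    def extend(path):
        if path[-1] == goal:
            yield path
            return
        for nxt in neighbors(path[-1]):
            if nxt not in path:
                yield from extend(path + [nxt])

    yield from extend([start])

def is_active(path, evidence):
    """Apply the definition of an active path to a single path."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if prev in PARENTS[mid] and nxt in PARENTS[mid]:
            # Common effect: mid or one of its descendants must be observed.
            if not (({mid} | descendants(mid)) & evidence):
                return False
        elif mid in evidence:
            # Common or intermediate cause: observing mid blocks the path.
            return False
    return True

def d_separated(xs, ys, evidence):
    """True iff every path between a node in xs and a node in ys is blocked."""
    evidence = set(evidence)
    return not any(is_active(p, evidence)
                   for x in xs for y in ys for p in simple_paths(x, y))

print(d_separated({"T"}, {"S"}, set()))            # True  (yes)
print(d_separated({"T"}, {"S"}, {"D"}))            # False (no)
print(d_separated({"T"}, {"S"}, {"D", "L", "B"}))  # True  (yes)
```

Path enumeration is exponential in general, so this form is only suitable for small graphs such as this one.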

19 Example In the order V, S, T, L, B, A, X, D, we get from the boundary basis: $I_D(S, \emptyset, V)$; $I_D(T, V, S)$; $I_D(L, S, \{T, V\})$; …; $I_D(X, A, \{V, S, T, L, B, D\})$. [Figure: DAG over V, S, T, L, A, B, X, D]

20 Main Result - Soundness

21 Bayesian Networks (Directed Acyclic Graphical Models) Definition: Given a probability distribution P on a set of variables U, a DAG D = (U,E) is called a Bayesian Network of P iff D is a minimal I-map of P.

22 The first claim holds because any probability distribution is a semi-graphoid (Symmetry, Decomposition, Contraction, Weak union).

23 The second claim, uniqueness of the parent sets, holds due to: $I(X, ZW_1, YW_2)$ and $I(X, ZW_2, YW_1)$ $\Rightarrow$ $I(X, Z, YW_1W_2)$. Proof: (1) $I(X, ZW_1, YW_2)$. Given. (2) $I(X, ZW_2, YW_1)$. Given. (3) $I(X, ZW_1W_2, Y)$ by weak union from (1). (4) $I(X, ZYW_1, W_2)$ by weak union from (1). (5) $I(X, ZYW_2, W_1)$ by weak union from (2). (6) $I(X, ZY, W_1W_2)$ by intersection from (4) and (5). Hence $I(X, Z, YW_1W_2)$ by intersection from (3) and (6).

24 d-separation The definition of $I_D(X, Z, Y)$ is such that: Soundness [Theorem 9]: $I_D(X, Z, Y)$ = yes implies that $I_P(X, Z, Y)$ follows from the boundary basis Basis(D). Completeness [Theorem 10]: $I_D(X, Z, Y)$ = no implies that $I_P(X, Z, Y)$ does not follow from the boundary basis Basis(D).

25 Revisiting Example III So does $I_P(\{X, D\}, A, V)$ hold? It is enough to check d-separation! [Figure: DAG over V, S, T, L, A, B, X, D]
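One way to run this check mechanically is the moralized ancestral graph criterion, a standard test that is equivalent to d-separation but is not the path-based definition used on the slides. The sketch below assumes the parent sets implied by the local distributions on slide 26; the function names are illustrative.

```python
from itertools import combinations

# Assumed edges, read off p(v), p(s), p(t|v), p(l|s), p(b|s),
# p(a|t,l), p(x|a), p(d|a,b).
PARENTS = {
    "V": set(), "S": set(), "T": {"V"}, "L": {"S"}, "B": {"S"},
    "A": {"T", "L"}, "X": {"A"}, "D": {"A", "B"},
}

def ancestral_set(nodes):
    """The given nodes together with all of their ancestors."""
    result, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(PARENTS[n])
    return result

def d_separated(xs, ys, zs):
    """Test I_D(xs, zs, ys) via the moralized ancestral graph."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestral_set(xs | ys | zs)

    # Moralize: link every node to its parents and marry co-parents,
    # then forget edge directions.
    adj = {n: set() for n in keep}
    for child in keep:
        for p in PARENTS[child]:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(PARENTS[child], 2):
            adj[p].add(q)
            adj[q].add(p)

    # Undirected search from xs that never enters a node of zs.
    seen, stack = set(), list(xs)
    while stack:
        n = stack.pop()
        if n in seen or n in zs:
            continue
        seen.add(n)
        stack.extend(adj[n])
    return not (seen & ys)

print(d_separated({"X", "D"}, {"V"}, {"A"}))  # False
```

With these assumed edges the test reports that {X, D} is not d-separated from V given A: observing A activates the path V → T → A ← L ← S → B → D. By the completeness statement on slide 24, the independence then does not follow from the basis.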

26 Bayesian Networks with numbers [Figure: DAG over V, S, T, L, A, B, X, D annotated with the local distributions p(v), p(s), p(t|v), p(l|s), p(b|s), p(a|t,l), p(x|a), p(d|a,b)]

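27 Bayesian Network (cont.) Each directed acyclic graph defines a factorization of the joint distribution into local distributions, one per node given its parents. [Figure: DAG over V, S, T, L, A, B, X, D annotated with the local distributions p(v), p(s), p(t|v), p(l|s), p(b|s), p(a|t,l), p(x|a), p(d|a,b)]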
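Written out, the factorization takes the standard product form below. The general formula is a reconstruction, with $\mathrm{pa}_i$ denoting the values of the parents of $X_i$; the second line instantiates it with the local distributions shown in the figure.

```latex
p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}_i)

p(v, s, t, l, b, a, x, d) =
  p(v)\, p(s)\, p(t \mid v)\, p(l \mid s)\, p(b \mid s)\,
  p(a \mid t, l)\, p(x \mid a)\, p(d \mid a, b)
```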

28 Independence in Bayesian networks $I_P(X_i ; \{X_1, \dots, X_{i-1}\} \setminus Pa_i \mid Pa_i)$ for $i = 1, \dots, n$. This set of independence assertions is denoted Basis(G). All other independence assertions that are entailed by (*) are derivable using the semi-graphoid axioms.

29 Local distributions - Asymmetric independence
Variables: Tuberculosis T (Yes/No), Lung Cancer L (Yes/No), Abnormality in Chest A (Yes/No).
Table for p(A|T,L):
p(A=y|L=n, T=n) = 0.02
p(A=y|L=n, T=y) = 0.60
p(A=y|L=y, T=n) = 0.99
p(A=y|L=y, T=y) = 0.99
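One reading of "asymmetric independence" in this table is that A does not depend on T when L = y, yet does depend on T when L = n. A small sketch that checks this directly from the four table entries (the dictionary layout and function name are illustrative):

```python
# p(A = y | L, T), copied from the table; keys are (L, T).
P_A_YES = {
    ("n", "n"): 0.02,
    ("n", "y"): 0.60,
    ("y", "n"): 0.99,
    ("y", "y"): 0.99,
}

def a_independent_of_t_given(l_value):
    """True iff p(A | L = l_value, T) is the same for both values of T."""
    return P_A_YES[(l_value, "n")] == P_A_YES[(l_value, "y")]

print(a_independent_of_t_given("y"))  # True:  with L = y, T is irrelevant to A
print(a_independent_of_t_given("n"))  # False: with L = n, T changes p(A = y)
```

The DAG cannot express this context-specific independence, because the edge T → A is needed for the case L = n.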

30 COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents. Proof ($\Leftarrow$): If each variable X is conditionally independent of all its non-descendants given its parents, then by decomposition it is also independent of its predecessors in a particular order d, given its parents. Proof ($\Rightarrow$): X is d-separated from all its non-descendants, given its parents. Since D is an I-map, by the soundness theorem the claim holds.

31 COROLLARY 5: If D=(U,E) is a boundary DAG of P constructed in some order d, then any topological order d’ of U will yield the same boundary DAG of P. (Hence the construction order can be forgotten.) Proof: By Corollary 4, each variable X is d-separated from all its non-descendants, given its parents in the boundary DAG of P. In particular, due to decomposition, X is independent, given its parents, of all previous variables in any topological order d’.

32 Markov Blankets in DAGs Extension of the Markov chain property: $I(X_k, X_{k-1}, X_1 \dots X_{k-2}) \Rightarrow I(X_k, X_{k-1}X_{k+1}, X_1 \dots X_{k-2}X_{k+2} \dots X_n)$. This holds due to the soundness theorem. The converse holds when Intersection is assumed.
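For a general DAG, the analogue of the pair $X_{k-1}, X_{k+1}$ is the usual Markov blanket: the parents of a node, its children, and the other parents of its children. The sketch below uses that standard definition, which the slide itself does not spell out; `markov_blanket` and the two example graphs are illustrative, with the second graph assuming the edges implied by the local distributions on slide 26.

```python
def markov_blanket(parents, node):
    """Parents, children, and the children's other parents of `node`."""
    children = {c for c, pa in parents.items() if node in pa}
    co_parents = {p for c in children for p in parents[c]}
    return (set(parents[node]) | children | co_parents) - {node}

# A Markov chain X1 -> X2 -> X3 -> X4 -> X5 written as a DAG.
CHAIN = {1: set(), 2: {1}, 3: {2}, 4: {3}, 5: {4}}
print(markov_blanket(CHAIN, 3))  # {2, 4}

# The example network, with edges assumed from slide 26.
ASIA = {
    "V": set(), "S": set(), "T": {"V"}, "L": {"S"}, "B": {"S"},
    "A": {"T", "L"}, "X": {"A"}, "D": {"A", "B"},
}
print(sorted(markov_blanket(ASIA, "A")))  # ['B', 'D', 'L', 'T', 'X']
```

In the chain, the blanket of $X_3$ is exactly its two neighbors, matching the right-hand side of the statement above with k = 3.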

33 Consequence: (1) There is no improvement to d-separation, and (2) no statement escapes graphical representation. Reasoning: (1) If there were an independence statement $\sigma$ not shown by d-separation, then $\sigma$ would have to be true in all distributions that satisfy the basis. But Theorem 10 states that there exists a distribution that satisfies the basis and violates $\sigma$. (2) Same argument. [Note that (2) is a stronger claim.]