Bell & Coins Example
[Figure: Coin1 → Bell ← Coin2]
Consider a bell that rings with high probability only when the outcomes of two coins are equal. This situation is well represented by a directed acyclic graph with the probability distribution
$P(c_1, c_2, b) = P(c_1)\,P(c_2 \mid c_1)\,P(b \mid c_1, c_2) = P(c_1)\,P(c_2)\,P(b \mid c_1, c_2)$ (8 assignments of the three binary variables).
Only assumption: the coins are marginally independent, I(Coin2, { }, Coin1).
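A minimal Python sketch of this example with hypothetical numbers (fair coins; the bell rings with probability 0.9 when the coins agree and 0.05 otherwise; none of these values come from the slide). It builds the 8-entry joint from P(c1) P(c2) P(b | c1, c2) and shows that the coins are marginally independent but become dependent once the bell is observed, the common-effect ("explaining away") pattern.

```python
from itertools import product

# Hypothetical parameters: fair coins; the bell rings with probability 0.9
# when the coins agree and 0.05 when they differ.
P_COIN = {0: 0.5, 1: 0.5}


def p_bell(b: int, c1: int, c2: int) -> float:
    """P(b | c1, c2) for the hypothetical bell."""
    p_ring = 0.9 if c1 == c2 else 0.05
    return p_ring if b == 1 else 1.0 - p_ring


# Joint distribution from the factorization P(c1) P(c2) P(b | c1, c2).
JOINT = {
    (c1, c2, b): P_COIN[c1] * P_COIN[c2] * p_bell(b, c1, c2)
    for c1, c2, b in product((0, 1), repeat=3)
}


def prob(event) -> float:
    """Probability of an event given as a predicate on (c1, c2, b)."""
    return sum(p for outcome, p in JOINT.items() if event(*outcome))


def cond(event, given) -> float:
    """P(event | given), computed from the joint."""
    return prob(lambda *o: event(*o) and given(*o)) / prob(given)


heads1 = lambda c1, c2, b: c1 == 1
heads2 = lambda c1, c2, b: c2 == 1
rang = lambda c1, c2, b: b == 1
heads2_and_rang = lambda c1, c2, b: c2 == 1 and b == 1

print(prob(heads1), cond(heads1, heads2))                 # both about 0.5
print(cond(heads1, rang), cond(heads1, heads2_and_rang))  # about 0.5 vs. about 0.947
```

Observing the bell alone leaves c1 at 0.5, but observing the bell together with c2 = 1 pushes c1 = 1 to about 0.947, even though the coins were marginally independent.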
Bayesian Networks (Directed Acyclic Graphical Models)
Definition: A Bayesian Network is a pair (P, D) where P is a probability distribution over variables X1, …, Xn and D is a directed acyclic graph (DAG) over vertices X1, …, Xn with the relationship
$P(x_1,\dots,x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{pa}_i)$
for all values xi of Xi, where pai is the set of parents of vertex Xi in D. (When Xi is a source vertex, pai is the empty set and P(xi | pai) reduces to P(xi).) A DAG is a directed graph without any directed cycles.
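The factorization in the definition can be sketched directly in Python. In this sketch each local distribution is a hypothetical function `cpt(x, parent_values)` returning P(X = x | pa = parent_values), source vertices receive an empty parent tuple, and the small DAG A → B, A → C with its numbers is made up for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# A CPT is modelled here as a function cpt(x, parent_values) -> P(X = x | pa = parent_values).
CPT = Callable[[int, Tuple[int, ...]], float]


def joint_probability(assignment: Dict[str, int],
                      parents: Dict[str, List[str]],
                      cpts: Dict[str, CPT]) -> float:
    """P(x1, ..., xn) as the product of P(xi | pa_i) over all variables."""
    result = 1.0
    for var, pa in parents.items():
        parent_values = tuple(assignment[p] for p in pa)  # () for a source vertex
        result *= cpts[var](assignment[var], parent_values)
    return result


def bernoulli(p_one: float, x: int) -> float:
    """P(X = x) for a binary X with P(X = 1) = p_one."""
    return p_one if x == 1 else 1.0 - p_one


# Hypothetical DAG A -> B, A -> C with made-up numbers.
PARENTS = {"A": [], "B": ["A"], "C": ["A"]}
CPTS: Dict[str, CPT] = {
    "A": lambda x, pa: bernoulli(0.3, x),
    "B": lambda x, pa: bernoulli(0.8 if pa[0] == 1 else 0.1, x),
    "C": lambda x, pa: bernoulli(0.6 if pa[0] == 1 else 0.2, x),
}

total = sum(
    joint_probability(dict(zip(PARENTS, values)), PARENTS, CPTS)
    for values in product((0, 1), repeat=len(PARENTS))
)
print(total)  # 1.0 up to floating-point rounding
```

Because every local table is normalized, the product automatically sums to one over all assignments; no global normalizing constant is needed.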
Bayesian Networks (BNs)
Examples to model diseases, symptoms, and risk factors:
- One variable for all diseases (values are diseases) (draw it)
- One variable per disease (values are True/False)
- Naïve Bayesian Networks versus Bipartite BNs (details)
- Adding Risk Factors
Naïve Bayes for Diseases & Symptoms
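[Figure: disease variable D with symptom children s1, …, sk]
Values of D are {D1, …, Dm} and represent non-overlapping (mutually exclusive) diseases. Values of each symptom S represent the severity of the symptom, {0, 1, 2}.
$P(d, s_1,\dots,s_k) = P(d)\prod_{i=1}^{k} P(s_i \mid d)$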
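A minimal sketch of this naïve Bayes model, with a hypothetical prior over m = 3 diseases and hypothetical severity tables for k = 2 symptoms. It evaluates the joint by the formula above and obtains P(d | s1, …, sk) by normalizing the joint over the diseases.

```python
from typing import Dict, Sequence

# Hypothetical prior over m = 3 mutually exclusive diseases.
PRIOR: Dict[str, float] = {"D1": 0.6, "D2": 0.3, "D3": 0.1}

# Hypothetical P(S_i = severity | d) for k = 2 symptoms with severities 0, 1, 2.
LIKELIHOOD = [
    {"D1": [0.7, 0.2, 0.1], "D2": [0.2, 0.5, 0.3], "D3": [0.1, 0.3, 0.6]},
    {"D1": [0.8, 0.15, 0.05], "D2": [0.3, 0.4, 0.3], "D3": [0.2, 0.2, 0.6]},
]


def joint(d: str, severities: Sequence[int]) -> float:
    """P(d, s_1, ..., s_k) = P(d) * prod_i P(s_i | d)."""
    p = PRIOR[d]
    for table, s in zip(LIKELIHOOD, severities):
        p *= table[d][s]
    return p


def posterior(severities: Sequence[int]) -> Dict[str, float]:
    """P(d | s_1, ..., s_k): normalize the joint over the m diseases."""
    scores = {d: joint(d, severities) for d in PRIOR}
    total = sum(scores.values())
    return {d: p / total for d, p in scores.items()}


print(posterior((2, 2)))  # D3 is the most probable disease (about 0.55)
```

With both symptoms at severity 2, the rare disease D3 overtakes D1 because its likelihood terms dominate the smaller prior.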
Risks, Diseases, Symptoms
[Figure: bipartite network with disease nodes D1, D2, D3, …, Dn and symptom nodes s1, s2, s3, …, sk-1, sk]
Values of each disease Di represent existence/non-existence, {True, False}. Values of each symptom S represent the severity of the symptom, {0, 1, 2}.
Natural Direction of Information
Consider our example of a hypothesis H = h indicating a disease and a set of symptoms such as fever, blood pressure, pain, etc., marked by S1 = s1, S2 = s2, S3 = s3, or in short E = e. We assume P(e | h) is given (or inferred from data) and compute P(h | e). The natural order in which to specify a BN usually follows causation or the time line. Nothing in the definition dictates this choice, but this choice normally makes more independence assumptions explicit in the graph. Example: an atomic clock and several watch clocks, modeled as a naïve Bayes model.
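One rough way to see why the causal order makes more independence explicit is to count free parameters. The sketch below assumes m hypothesis values and k symptoms with r values each, and further assumes that in the reverse (diagnostic) order S1, …, Sk, H the symptoms are, in general, marginally dependent, so a graph in that order typically needs every earlier variable as a parent. Both function names are hypothetical.

```python
def naive_bayes_params(m: int, k: int, r: int) -> int:
    """Free parameters of H -> S1, ..., Sk: P(h) plus one table P(s_i | h) per symptom."""
    return (m - 1) + k * m * (r - 1)


def complete_dag_params(m: int, k: int, r: int) -> int:
    """Free parameters when the graph encodes no independence (a complete DAG,
    e.g. order S1, ..., Sk, H with every earlier variable as a parent).
    This equals the size of the full joint table minus one."""
    return m * r ** k - 1


for k in (2, 5, 10):
    print(k, naive_bayes_params(3, k, 3), complete_dag_params(3, k, 3))
```

The causal (naïve Bayes) count grows linearly in k, while the count without encoded independence grows exponentially.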
The “Visit-to-Asia” Example
[Figure: network over Smoking, Lung Cancer, Tuberculosis, Abnormality in Chest, Bronchitis, X-Ray, Dyspnea]
What are the relevant variables and their dependencies?
An abstract example
In the order V, S, T, L, B, A, X, D, we have a boundary basis:
I(S, { }, V), I(T, V, S), I(L, S, {T, V}), …, I(X, A, {V, S, T, L, B})
[Figure: DAG over V, S, L, T, A, B, X, D]
Does I({X, D}, A, V) also hold in P?
Independence in Bayesian networks
$I(X_i, \mathrm{pa}_i, \{X_1,\dots,X_{i-1}\} \setminus \mathrm{pa}_i)$, for i = 1, …, n
This set of n independence assumptions is called Basis(D). These statements use a specific topological order d of the vertices, namely one in which, for every Xi, the parents of Xi appear before Xi. However, all choices of a topological order are equivalent. Check, for example, the topological orders of V, S, T, L in the visit-to-Asia example.
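A minimal sketch that generates Basis(D) from a given order and rejects orders that are not topological. It assumes the standard edges of the visit-to-Asia network (V → T, S → L, S → B, T → A, L → A, A → X, A → D, B → D), which the slides draw but do not list; the helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Assumed edges of the visit-to-Asia DAG (parents of each vertex).
ASIA_PARENTS: Dict[str, List[str]] = {
    "V": [], "S": [], "T": ["V"], "L": ["S"], "B": ["S"],
    "A": ["T", "L"], "X": ["A"], "D": ["A", "B"],
}


def is_topological(order: Sequence[str], parents: Dict[str, List[str]]) -> bool:
    """True if order lists every vertex once and puts parents before children."""
    if sorted(order) != sorted(parents):
        return False
    position = {v: i for i, v in enumerate(order)}
    return all(position[p] < position[v] for v in parents for p in parents[v])


def basis(order: Sequence[str],
          parents: Dict[str, List[str]]) -> List[Tuple[str, Set[str], Set[str]]]:
    """Basis(D): one statement I(X_i, pa_i, {X_1, ..., X_{i-1}} minus pa_i) per vertex."""
    if not is_topological(order, parents):
        raise ValueError("not a topological order of the DAG")
    statements = []
    for i, v in enumerate(order):
        pa = set(parents[v])
        statements.append((v, pa, set(order[:i]) - pa))
    return statements


for v, pa, rest in basis(list("VSTLBAXD"), ASIA_PARENTS):
    print(f"I({v}, {sorted(pa)}, {sorted(rest)})")
```

Passing another valid order such as S, V, L, T, B, A, X, D produces a different list of statements; the claim that all topological orders are equivalent says both lists entail the same independencies.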
Directed and Undirected Chain Example
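$P(x_1,x_2,x_3,x_4) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_2)\,P(x_4 \mid x_3)$
Assumptions: I(X3, X2, X1), I(X4, X3, {X1, X2})
[Figure: chain X1 → X2 → X3 → X4 and its undirected version X1 - X2 - X3 - X4]
$P(x_1,x_2,x_3,x_4) = P(x_1,x_2)\,P(x_3 \mid x_2)\,P(x_4 \mid x_3)$
Markov network: the joint distribution is a product of functions on the three maximal cliques {X1, X2}, {X2, X3}, {X3, X4} with normalizing factor K = 1.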
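A numeric sanity check of the last claim, using hypothetical random CPTs for a binary chain. The clique functions g12 = P(x1, x2), g23 = P(x3 | x2), g34 = P(x4 | x3) are read off the second factorization (their names are ours), and their product already sums to one over all 16 assignments, so K = 1.

```python
import random
from itertools import product

random.seed(0)


def bern(p_one: float, x: int) -> float:
    """P(X = x) for a binary X with P(X = 1) = p_one."""
    return p_one if x == 1 else 1.0 - p_one


# Hypothetical random CPTs for the chain X1 -> X2 -> X3 -> X4 (binary values).
P1 = random.random()  # P(X1 = 1)
# TRANS[0], TRANS[1], TRANS[2] hold P(X2 = 1 | X1 = a), P(X3 = 1 | X2 = a), P(X4 = 1 | X3 = a).
TRANS = [[random.random(), random.random()] for _ in range(3)]


def g12(x1: int, x2: int) -> float:
    return bern(P1, x1) * bern(TRANS[0][x1], x2)  # P(x1, x2)


def g23(x2: int, x3: int) -> float:
    return bern(TRANS[1][x2], x3)  # P(x3 | x2)


def g34(x3: int, x4: int) -> float:
    return bern(TRANS[2][x3], x4)  # P(x4 | x3)


# Sum of the clique-function product over all 16 assignments.
Z = sum(g12(a, b) * g23(b, c) * g34(c, d) for a, b, c, d in product((0, 1), repeat=4))
K = 1.0 / Z
print(K)  # 1.0 up to rounding: no extra normalization is needed
```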
Reminder: Markov Networks (Undirected Graphical Models)
Definition: A Markov Network is a pair (P, G) where P is a probability distribution over variables X1, …, Xn and G is an undirected graph over vertices X1, …, Xn with the relationship
$P(x_1,\dots,x_n) = K \prod_{j=1}^{m} g_j(C_j)$
for all values xi of Xi, where C1, …, Cm are the maximal cliques of G, K is a normalizing constant, and gj are functions that assign a non-negative number to every value combination of the variables in Cj. Markov networks and Bayesian networks represent the same set of independence assumptions only on chordal graphs (such as chains and trees).
From Separation in UGs To d-Separation in DAGs
Paths
Intuition: dependency must “flow” along paths in the graph.
A path is a sequence of neighboring variables (edge directions are ignored).
Examples: (X, A, D, B) and (A, L, S, B)
[Figure: visit-to-Asia DAG over V, S, L, T, A, B, X, D]
Path blockage
Every path is classified, given the evidence, as either:
- active: creates a dependency between the end nodes
- blocked: does not create a dependency between the end nodes
Evidence means the assignment of a value to a subset of nodes.
Path Blockage
Three cases:
- Common cause (L ← S → B): blocked if S is given; active if S is not given.
- Intermediate cause (S → L → A): blocked if L is given; active if L is not given.
- Common effect (T → A ← L, with A → X): blocked if neither A nor any of its descendants (such as X) is given; active if A or one of its descendants is given.
Definition of Path Blockage
Definition: A path is active, given evidence Z, if
- whenever the path contains a converging configuration T → A ← L, either A or one of its descendants is in Z, and
- no other nodes on the path are in Z.
Definition: A path is blocked, given evidence Z, if it is not active.
Definition: X is d-separated from Y, given Z, if all paths from a node in X to a node in Y are blocked, given Z. This is denoted by ID(X, Z, Y). (X, Y, and Z are disjoint sets of variables.)
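A direct, unoptimized transcription of this definition: enumerate every simple path between the two sets (ignoring edge directions) and test each one for activity. It assumes the standard visit-to-Asia edges (V → T, S → L, S → B, T → A, L → A, A → X, A → D, B → D); path enumeration is exponential in general, which is acceptable for eight variables. All function names are hypothetical.

```python
from typing import Dict, Iterator, List, Set

# Assumed edges of the visit-to-Asia DAG (parents of each vertex).
ASIA_PARENTS: Dict[str, List[str]] = {
    "V": [], "S": [], "T": ["V"], "L": ["S"], "B": ["S"],
    "A": ["T", "L"], "X": ["A"], "D": ["A", "B"],
}


def children_of(parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    children: Dict[str, Set[str]] = {v: set() for v in parents}
    for v, pa in parents.items():
        for p in pa:
            children[p].add(v)
    return children


def descendants(v: str, children: Dict[str, Set[str]]) -> Set[str]:
    seen: Set[str] = set()
    stack = [v]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def paths(x: str, y: str, parents, children) -> Iterator[List[str]]:
    """All simple paths from x to y, ignoring edge directions."""
    def extend(path: List[str]) -> Iterator[List[str]]:
        if path[-1] == y:
            yield path
            return
        for n in set(parents[path[-1]]) | children[path[-1]]:
            if n not in path:
                yield from extend(path + [n])
    yield from extend([x])


def is_active(path: List[str], z: Set[str], parents, children) -> bool:
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        if prev in parents[node] and nxt in parents[node]:  # converging: prev -> node <- nxt
            if node not in z and not (descendants(node, children) & z):
                return False
        elif node in z:  # any other interior node must not be in Z
            return False
    return True


def d_separated(xs: Set[str], z: Set[str], ys: Set[str], parents) -> bool:
    """ID(xs, z, ys): every path from a node in xs to a node in ys is blocked given z."""
    children = children_of(parents)
    return not any(is_active(p, z, parents, children)
                   for x in xs for y in ys for p in paths(x, y, parents, children))


print(d_separated({"T"}, set(), {"S"}, ASIA_PARENTS))            # True
print(d_separated({"T"}, {"D"}, {"S"}, ASIA_PARENTS))            # False
print(d_separated({"T"}, {"D", "L", "B"}, {"S"}, ASIA_PARENTS))  # True
```

Given D, the path T → A ← L ← S becomes active because D is a descendant of the converging node A; adding L and B to the evidence blocks both T-to-S paths again.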
Example
ID(T, { }, S) = yes
ID(T, D, S) = no
ID(T, {D, L, B}, S) = yes
[Figure: visit-to-Asia DAG over V, S, L, T, A, B, X, D]
Example
In the order V, S, T, L, B, A, X, D, we get from the boundary basis:
ID(S, { }, V), ID(T, V, S), ID(L, S, {T, V}), …, ID(X, A, {V, S, T, L, B})
[Figure: visit-to-Asia DAG over V, S, L, T, A, B, X, D]
Main results on d-separation
The definition of ID(X, Z, Y) is such that:
Soundness [Theorem 9]: ID(X, Z, Y) = yes implies that IP(X, Z, Y) follows from Basis(D).
Completeness [Theorem 10]: ID(X, Z, Y) = no implies that IP(X, Z, Y) does not follow from Basis(D).
Revisiting Example II
So does IP({X, D}, A, V) hold? It is enough to check d-separation!
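One way to run this check mechanically is the moralized ancestral graph criterion, a known equivalent of d-separation (a swapped-in technique, not the path-based definition used in these slides): keep only X ∪ Y ∪ Z and their ancestors, connect ("marry") the parents of each node, drop edge directions, and test ordinary graph separation by Z. The sketch assumes the standard visit-to-Asia edges. Under that structure it returns False: the path V → T → A ← L ← S → B → D is active given A, because A is a given converging node. By completeness (Theorem 10), IP({X, D}, A, V) therefore does not follow from Basis(D).

```python
from collections import deque
from typing import Dict, List, Set

# Assumed edges of the visit-to-Asia DAG (parents of each vertex).
ASIA_PARENTS: Dict[str, List[str]] = {
    "V": [], "S": [], "T": ["V"], "L": ["S"], "B": ["S"],
    "A": ["T", "L"], "X": ["A"], "D": ["A", "B"],
}


def ancestral_set(nodes: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """The given nodes together with all their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def moral_graph(keep: Set[str], parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Undirected graph on keep: parent-child edges plus edges between co-parents."""
    adj: Dict[str, Set[str]] = {v: set() for v in keep}
    for v in keep:
        pa = parents[v]  # an ancestral set contains all parents of its members
        for i, p in enumerate(pa):
            adj[v].add(p)
            adj[p].add(v)
            for q in pa[i + 1:]:  # "marry" the parents of v
                adj[p].add(q)
                adj[q].add(p)
    return adj


def d_separated_moral(xs: Set[str], z: Set[str], ys: Set[str], parents) -> bool:
    """True if z separates xs from ys in the moral graph of the ancestral set."""
    adj = moral_graph(ancestral_set(xs | ys | z, parents), parents)
    seen, queue = set(xs), deque(xs)
    while queue:  # breadth-first search from xs that never enters z
        for n in adj[queue.popleft()]:
            if n not in seen and n not in z:
                seen.add(n)
                queue.append(n)
    return not (seen & ys)


print(d_separated_moral({"X", "D"}, {"A"}, {"V"}, ASIA_PARENTS))  # False
```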
Local distributions: asymmetric independence
[Figure: Tuberculosis → Abnormality in Chest (Yes/No) ← Lung Cancer (Yes/No)]
Table p(A | T, L):
L    T    p(A = y | L, T)
n    n    0.02
n    y    0.60
y    n    0.99
y    y    0.99
Independence holds for some values only:
IP(A, L = y, T) holds.
IP(A, L = n, T) does not hold.
So IP(A, L, T) does not hold.
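A small sketch that reads this context-specific independence straight off the table: A is independent of T in the context L = l exactly when p(A = y | L = l, T = t) does not vary with t. It assumes every context (L = l, T = t) has positive probability; the function name is hypothetical.

```python
# p(A = y | L, T) from the table above, keyed by (L, T).
P_A_YES = {("n", "n"): 0.02, ("n", "y"): 0.60, ("y", "n"): 0.99, ("y", "y"): 0.99}


def a_independent_of_t_given_l(l_value: str, tol: float = 1e-12) -> bool:
    """IP(A, L = l_value, T): p(A = y | L = l_value, T = t) must not depend on t.

    Because A is binary, checking A = y is enough; this assumes every
    context (L = l_value, T = t) has positive probability.
    """
    values = [P_A_YES[(l_value, t)] for t in ("n", "y")]
    return max(values) - min(values) <= tol


print(a_independent_of_t_given_l("y"))  # True
print(a_independent_of_t_given_l("n"))  # False
print(all(a_independent_of_t_given_l(l) for l in ("n", "y")))  # False
```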
ID(Xi, pai, non-descendantsi).
Claim 1: Each vertex Xi in a Bayesian network is d-separated from all its non-descendants given its parents pai.
Proof: Each vertex Xi is connected to its non-descendants either via its parents or via its descendants. All paths via its parents are blocked because pai are given (on such a path each parent is a non-converging node in the evidence). All paths via descendants are blocked because they pass through a converging configuration → Z ← where neither Z nor any of its descendants is given (Z is a descendant of Xi, whereas the evidence pai contains no descendants of Xi). Hence, by the definition of d-separation, ID(Xi, pai, non-descendantsi) holds.
Independence in Bayesian networks
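$p(x_{d(1)},\dots,x_{d(n)}) = \prod_{i=1}^{n} p(x_{d(i)} \mid \mathbf{pa}_{d(i)})$
Does this imply, for every topological order d, that
$p(x_{d(1)},\dots,x_{d(n)}) = \prod_{i=1}^{n} p(x_{d(i)} \mid x_{d(1)},\dots,x_{d(i-1)})$
holds with each factor equal to the corresponding factor above?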
Claim 2: Each topological order d in a BN entails the same set of independence assumptions.
Proof: By Claim 1, ID(Xi, pai, non-descendantsi) holds for every i. By soundness (Theorem 9), IP(Xi, pai, non-descendantsi) holds as well. By the decomposition property of conditional independence, IP(Xi, pai, S) holds for every subset S of non-descendantsi. In any topological order d, every variable that precedes Xd(i) is a non-descendant of Xd(i), since all descendants of Xd(i) appear after it. Hence Xd(i) is independent, given its parents, of S = {Xd(1), …, Xd(i-1)} \ pad(i), for every topological order d.
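A numeric spot check of Claim 2 on one randomly generated distribution (a sanity check, not a proof). It uses a hypothetical five-vertex sub-network V → T → A ← L ← S of the visit-to-Asia example with random binary CPTs, enumerates all topological orders, and verifies that p(x_d(i) | x_d(1), …, x_d(i-1)) equals p(x_d(i) | pa_d(i)) for every order and every assignment.

```python
import random
from itertools import permutations, product

random.seed(1)

# Hypothetical sub-network of visit-to-Asia: V -> T -> A <- L <- S.
PARENTS = {"V": [], "S": [], "T": ["V"], "L": ["S"], "A": ["T", "L"]}
VARS = list(PARENTS)

# Random binary CPTs: CPT[v][parent_values] = P(v = 1 | parents).
CPT = {v: {pv: random.uniform(0.05, 0.95) for pv in product((0, 1), repeat=len(pa))}
       for v, pa in PARENTS.items()}


def joint(x):
    p = 1.0
    for v, pa in PARENTS.items():
        p_one = CPT[v][tuple(x[q] for q in pa)]
        p *= p_one if x[v] == 1 else 1.0 - p_one
    return p


ASSIGNMENTS = [dict(zip(VARS, vals)) for vals in product((0, 1), repeat=len(VARS))]
JOINT = [(x, joint(x)) for x in ASSIGNMENTS]


def marginal(fixed):
    """P(fixed): sum of the joint over assignments consistent with fixed."""
    return sum(p for x, p in JOINT if all(x[k] == val for k, val in fixed.items()))


def conditional(v, given, x):
    """P(v = x[v] | given = x[given])."""
    ctx = {g: x[g] for g in given}
    return marginal({**ctx, v: x[v]}) / marginal(ctx)


def topological_orders(parents):
    for order in permutations(parents):
        pos = {v: i for i, v in enumerate(order)}
        if all(pos[p] < pos[v] for v in parents for p in parents[v]):
            yield order


orders = list(topological_orders(PARENTS))
for order in orders:
    for i, v in enumerate(order):
        for x in ASSIGNMENTS:
            lhs = conditional(v, order[:i], x)   # p(x_d(i) | x_d(1), ..., x_d(i-1))
            rhs = conditional(v, PARENTS[v], x)  # p(x_d(i) | pa_d(i))
            assert abs(lhs - rhs) < 1e-9
print(f"checked {len(orders)} topological orders")
```

The six orders checked here are exactly the interleavings of V before T and S before L, followed by A.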
Extension of the Markov Chain Property
For Markov chains we assume Basis(D): IP(Xi, Xi-1, {X1, …, Xi-2}).
Due to soundness of d-separation (Theorem 9) we also get: IP(Xi, {Xi-1, Xi+1}, {X1, …, Xi-2, Xi+2, …, Xn}).
Definition: A Markov blanket of a variable X with respect to a probability distribution P is a set of variables B(X) such that IP(X, B(X), all other variables), where "all other variables" means every variable outside {X} ∪ B(X).
Chain example: B(Xi) = {Xi-1, Xi+1}.
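A numeric illustration of the chain's Markov blanket, using hypothetical random CPTs for a binary chain of length 5. P(xi | all other variables) is computed by brute force from the joint and compared with a local formula that only looks at x(i-1) and x(i+1): all factors of the chain factorization that do not mention xi cancel, leaving P(xi | x(i-1)) P(x(i+1) | xi) up to normalization. At the ends of the chain the blanket has a single neighbor.

```python
import random
from itertools import product
from typing import Tuple

random.seed(2)
N = 5  # chain X1 -> X2 -> ... -> X5, binary variables, positions 0..4 below

# Hypothetical random CPTs: P_FIRST = P(x[0] = 1); TRANS[i][a] = P(x[i + 1] = 1 | x[i] = a).
P_FIRST = random.uniform(0.1, 0.9)
TRANS = [[random.uniform(0.1, 0.9) for _ in range(2)] for _ in range(N - 1)]


def bern(p_one: float, v: int) -> float:
    return p_one if v == 1 else 1.0 - p_one


def joint(x: Tuple[int, ...]) -> float:
    p = bern(P_FIRST, x[0])
    for i in range(N - 1):
        p *= bern(TRANS[i][x[i]], x[i + 1])
    return p


def cond_given_rest(i: int, x: Tuple[int, ...]) -> float:
    """Brute force: P(x[i] | all other variables) from the full joint."""
    den = sum(joint(x[:i] + (v,) + x[i + 1:]) for v in (0, 1))
    return joint(x) / den


def cond_given_blanket(i: int, x: Tuple[int, ...]) -> float:
    """Uses only x[i - 1] and x[i + 1]: P(x_i | rest) is proportional to
    P(x_i | x_{i-1}) * P(x_{i+1} | x_i); all other factors cancel."""
    def score(v: int) -> float:
        s = bern(P_FIRST, v) if i == 0 else bern(TRANS[i - 1][x[i - 1]], v)
        if i < N - 1:
            s *= bern(TRANS[i][v], x[i + 1])
        return s
    return score(x[i]) / (score(0) + score(1))


for x in product((0, 1), repeat=N):
    for i in range(N):
        assert abs(cond_given_rest(i, x) - cond_given_blanket(i, x)) < 1e-12
print("P(X_i | rest) depends only on the Markov blanket {X_{i-1}, X_{i+1}}")
```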
Markov Blankets in BNs
Proof: Consequence of soundness of d-separation (Theorem 9).
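For reference, a standard characterization, stated here as background rather than read off the slide: in a BN, a Markov blanket of a vertex is the set of its parents, its children, and the other parents of its children. A minimal sketch on the assumed visit-to-Asia edges (V → T, S → L, S → B, T → A, L → A, A → X, A → D, B → D); the function name is hypothetical.

```python
from typing import Dict, List, Set

# Assumed edges of the visit-to-Asia DAG (parents of each vertex).
ASIA_PARENTS: Dict[str, List[str]] = {
    "V": [], "S": [], "T": ["V"], "L": ["S"], "B": ["S"],
    "A": ["T", "L"], "X": ["A"], "D": ["A", "B"],
}


def markov_blanket(v: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Parents of v, children of v, and the other parents of those children."""
    children = {c for c, pa in parents.items() if v in pa}
    co_parents = {p for c in children for p in parents[c]} - {v}
    return set(parents[v]) | children | co_parents


print(sorted(markov_blanket("A", ASIA_PARENTS)))  # ['B', 'D', 'L', 'T', 'X']
print(sorted(markov_blanket("S", ASIA_PARENTS)))  # ['B', 'L']

# On a chain this reduces to the two neighbours, matching B(Xi) = {Xi-1, Xi+1}.
CHAIN = {"X1": [], "X2": ["X1"], "X3": ["X2"], "X4": ["X3"]}
print(sorted(markov_blanket("X3", CHAIN)))  # ['X2', 'X4']
```

For A, the blanket includes B only because B is a co-parent of A's child D.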