BN Semantics 2 – The revenge of d-separation

Presentation transcript:

BN Semantics 2 – The revenge of d-separation. Graphical Models – 10-708. Carlos Guestrin, Carnegie Mellon University, September 20th, 2006. Readings: K&F: 3.3, 3.4

Local Markov assumption & I-maps. Local Markov Assumption: a variable Xi is independent of its non-descendants given its parents: (Xi ⊥ NonDescendants_Xi | Pa_Xi). These local independence assumptions are the independence assertions encoded by a BN structure G. G is an I-map (independence map) of a distribution P if every independence asserted by G also holds in P, i.e., I(G) ⊆ I(P). [Figure: running example BN — Flu → Sinus ← Allergy, with Sinus → Headache and Sinus → Nose.]
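To make this concrete, here is a minimal sketch (not from the slides; the dict-of-children graph encoding and function names are my own) that enumerates the local Markov assertions (Xi ⊥ NonDescendants_Xi | Pa_Xi) of a DAG, using the running example:

```python
from collections import defaultdict

def descendants(dag, node):
    """All descendants of `node` (children, grandchildren, ...)."""
    seen, stack = set(), list(dag[node])
    while stack:
        child = stack.pop()
        if child not in seen:
            seen.add(child)
            stack.extend(dag[child])
    return seen

def local_markov_assertions(dag):
    """Yield (X, non-descendants of X minus its parents, parents of X)."""
    parents = defaultdict(set)
    for u in dag:
        for v in dag[u]:
            parents[v].add(u)
    for x in dag:
        non_desc = set(dag) - descendants(dag, x) - {x} - parents[x]
        yield x, non_desc, parents[x]

# Running example: Flu -> Sinus <- Allergy, Sinus -> Headache, Sinus -> Nose
dag = {"Flu": {"Sinus"}, "Allergy": {"Sinus"},
       "Sinus": {"Headache", "Nose"}, "Headache": set(), "Nose": set()}
for x, nd, pa in local_markov_assertions(dag):
    print(f"({x} ⊥ {sorted(nd)} | {sorted(pa)})")
```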

Today: The Representation Theorem. The theorem connects a BN structure G, which encodes independence assumptions, with a joint probability distribution P, in two directions. Direction 1: if the conditional independencies encoded by the BN are a subset of the conditional independencies in P, we obtain a factorized joint probability distribution. Direction 2: if the joint probability distribution factorizes according to the BN, then the conditional independencies encoded by the BN are a subset of the conditional independencies in P.

Factorized distributions. Given random variables X1,…,Xn, a distribution P over these variables, and a BN structure G over the same variables: P factorizes according to G if P(X1,…,Xn) = ∏_i P(Xi | Pa_Xi). [Figure: running example BN.]
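As an illustration of this factorization, here is a small sketch for the running example with binary variables; the CPT numbers below are made up for illustration, not taken from the lecture:

```python
from itertools import product

def p_joint(flu, allergy, sinus, headache, nose):
    """Joint probability as a product of one CPT factor per variable."""
    p_flu = 0.1                                        # P(Flu = true)
    p_allergy = 0.2                                    # P(Allergy = true)
    p_sinus = {(True, True): 0.9, (True, False): 0.7,
               (False, True): 0.6, (False, False): 0.05}  # P(S=t | F, A)
    p_headache = {True: 0.8, False: 0.1}               # P(H=t | Sinus)
    p_nose = {True: 0.7, False: 0.2}                   # P(N=t | Sinus)

    def f(prob, value):                                # factor for one variable
        return prob if value else 1 - prob

    return (f(p_flu, flu) * f(p_allergy, allergy)
            * f(p_sinus[(flu, allergy)], sinus)
            * f(p_headache[sinus], headache)
            * f(p_nose[sinus], nose))

# Sanity check: the 2^5 entries of the joint sum to 1.
total = sum(p_joint(*v) for v in product([True, False], repeat=5))
assert abs(total - 1) < 1e-12
```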

BN Representation Theorem – I-map to factorization. If the conditional independencies encoded by the BN are a subset of the conditional independencies in P (i.e., G is an I-map of P), then P factorizes according to G.

BN Representation Theorem – I-map to factorization: proof, part 1. Claim: if G is an I-map of P, then P factorizes according to G. Topological ordering: number the variables so that every parent has a lower number than its child, i.e., Xi → Xj ⇒ i < j. DAGs always have (many) topological orderings; one can be found by a modification of breadth-first search (not exactly what is in the book). [Figure: running example BN.]
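One concrete BFS-style method is Kahn's algorithm (my choice; the slide does not say exactly which modification of breadth-first search is meant). A sketch:

```python
from collections import deque

def topological_order(dag):
    """dag: {node: set(children)}. Returns a list with parents before children."""
    in_degree = {u: 0 for u in dag}
    for u in dag:
        for v in dag[u]:
            in_degree[v] += 1
    queue = deque(u for u, d in in_degree.items() if d == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in dag[u]:          # removing u frees up its children
            in_degree[v] -= 1
            if in_degree[v] == 0:
                queue.append(v)
    if len(order) != len(dag):
        raise ValueError("graph has a cycle, so it is not a DAG")
    return order

dag = {"Flu": {"Sinus"}, "Allergy": {"Sinus"},
       "Sinus": {"Headache", "Nose"}, "Headache": set(), "Nose": set()}
print(topological_order(dag))  # e.g. ['Flu', 'Allergy', 'Sinus', 'Headache', 'Nose']
```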

BN Representation Theorem – I-map to factorization: proof, part 2. Take a topological ordering and apply the chain rule: P(X1,…,Xn) = ∏_i P(Xi | X1,…,Xi-1). All you need is the local Markov assumption — a variable Xi is independent of its non-descendants given its parents, (Xi ⊥ NonDescendants_Xi | Pa_Xi). Since X1,…,Xi-1 are all non-descendants of Xi under a topological ordering, each conditioning set collapses to Pa_Xi, giving the factorization. [Figure: running example BN.]

Adding edges doesn't hurt. Theorem: let G be an I-map for P; any DAG G' that contains the directed edges of G (and possibly more) is also an I-map for P. Proof: [worked on the example BN, extended with Season and Airplane nodes.]

Defining a BN. Given a set of variables and the conditional independence assertions of P: choose an ordering of the variables, e.g., X1,…,Xn. For i = 1 to n: add Xi to the network; define the parents of Xi, Pa_Xi, in the graph as the minimal subset of {X1,…,Xi-1} such that the local Markov assumption holds (Xi independent of the rest of {X1,…,Xi-1} given its parents Pa_Xi); define/learn the CPT P(Xi | Pa_Xi).

BN Representation Theorem – factorization to I-map. If the joint probability distribution P factorizes according to G, then the conditional independencies encoded by the BN are a subset of the conditional independencies in P (i.e., G is an I-map of P).

BN Representation Theorem – factorization to I-map: proof. If P factorizes according to G, then G is an I-map of P. Homework 1!!!!

The BN Representation Theorem. Direction 1 (I-map ⇒ factorization): if the conditional independencies encoded by the BN are a subset of those in P, we obtain a factorized joint distribution. Important because: every P has at least one BN structure G. Direction 2 (factorization ⇒ I-map): if P factorizes according to G, then the conditional independencies encoded by the BN are a subset of those in P. Important because: we can read independencies of P directly from the BN structure G.

What you need to know thus far: independence & conditional independence; the definition of a BN; the local Markov assumption; the representation theorem — its statement (G is an I-map for P if and only if P factorizes according to G) and its interpretation.

Announcements. Upcoming recitation: tomorrow, 5–6:30pm in Wean 4615A — review of BN representation, the representation theorem, and d-separation (coming next). Don't forget to register for the mailing list at: https://mailman.srv.cs.cmu.edu/mailman/listinfo/10708-announce. If you don't want to take the class for credit (will sit in or audit), please talk with me after class.

Independencies encoded in a BN. We said: all you need is the local Markov assumption, (Xi ⊥ NonDescendants_Xi | Pa_Xi). But then we talked about other (in)dependencies, e.g., explaining away. So what are the independencies encoded by a BN? The only assumption is local Markov, but many others can be derived using the algebra of conditional independencies!

Understanding independencies in BNs – BNs with 3 nodes. Local Markov assumption: a variable X is independent of its non-descendants given its parents. The four three-node structures: indirect causal effect, X → Z → Y; indirect evidential effect, X ← Z ← Y; common cause, X ← Z → Y; common effect (v-structure), X → Z ← Y.

Understanding independencies in BNs – Some examples. [Figure: examples worked on a larger BN over nodes A through K.]

Understanding independencies in BNs – Some more examples. [Figure: more examples on the same A–K BN.]

An active trail – Example. [Figure: example BN over nodes A–H, with additional nodes F' and F'' downstream of F.] When are A and H independent?

Active trails formalized. A trail X1 – X2 – ··· – Xk is an active trail when variables O ⊆ {X1,…,Xn} are observed if, for each consecutive triplet in the trail, one of the following holds: Xi-1 → Xi → Xi+1, and Xi is not observed (Xi ∉ O); Xi-1 ← Xi ← Xi+1, and Xi is not observed (Xi ∉ O); Xi-1 ← Xi → Xi+1, and Xi is not observed (Xi ∉ O); Xi-1 → Xi ← Xi+1, and Xi is observed (Xi ∈ O), or one of its descendants is.
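Here is a minimal sketch of this definition in code (the child-adjacency encoding of the DAG is my own convention): it walks a given trail and checks the four triplet cases above.

```python
def is_active_trail(dag, trail, observed):
    """dag: {node: set(children)}. True iff `trail` is active given `observed`."""
    def descendants(node):
        seen, stack = set(), list(dag[node])
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(dag[c])
        return seen

    for a, b, c in zip(trail, trail[1:], trail[2:]):
        if b in dag[a] and c in dag[b]:          # a -> b -> c (causal)
            blocked = b in observed
        elif b in dag[c] and a in dag[b]:        # a <- b <- c (evidential)
            blocked = b in observed
        elif a in dag[b] and c in dag[b]:        # a <- b -> c (common cause)
            blocked = b in observed
        else:                                    # a -> b <- c (v-structure)
            blocked = b not in observed and not (descendants(b) & observed)
        if blocked:
            return False
    return True

dag = {"Flu": {"Sinus"}, "Allergy": {"Sinus"},
       "Sinus": {"Headache", "Nose"}, "Headache": set(), "Nose": set()}
print(is_active_trail(dag, ["Flu", "Sinus", "Allergy"], set()))        # False
print(is_active_trail(dag, ["Flu", "Sinus", "Allergy"], {"Headache"})) # True (explaining away)
```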

Active trails and independence? Theorem: variables Xi and Xj are independent given Z ⊆ {X1,…,Xn} if there is no active trail between Xi and Xj when the variables Z are observed. [Figure: the A–K example BN.]

More generally: Soundness of d-separation. Given a BN structure G, the set of independence assertions obtained by d-separation is I(G) = {(X ⊥ Y | Z) : d-sep_G(X; Y | Z)}. Theorem (soundness of d-separation): if P factorizes over G, then I(G) ⊆ I(P). Interpretation: d-separation only captures true independencies. The proof is discussed when we talk about undirected models.

Existence of dependency when not d-separated. Theorem: if X and Y are not d-separated given Z, then X and Y are dependent given Z under some P that factorizes over G. Proof sketch: choose an active trail between X and Y given Z; make this trail dependent; make everything else uniform (independent) to avoid "canceling out" the influence. [Figure: the A–K example BN.]

More generally: Completeness of d-separation. Theorem (completeness of d-separation): for "almost all" distributions P that factorize over G, we have I(G) = I(P). Here "almost all" means: except for a set of parameterizations of the CPTs of measure zero (assuming no finite set of parameterizations has positive measure). Proof sketch:

Interpretation of completeness. Theorem (completeness of d-separation): for "almost all" distributions P that factorize over G, we have I(G) = I(P). So the BN graph is usually sufficient to capture all independence properties of the distribution!!!! But only for complete independence: P ⊨ (X = x ⊥ Y = y | Z = z) for all x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z). Often we have context-specific independence (CSI): there exist x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z) such that P ⊨ (X = x ⊥ Y = y | Z = z). Example: many factors may affect your grade — but if you are a frequentist, all other factors are irrelevant.

Algorithm for d-separation. How do I check if X and Y are d-separated given Z? There can be exponentially many trails between X and Y, but a two-pass, linear-time algorithm finds all d-separations for X. 1. Upward pass: mark Z and its ancestors (the nodes with a descendant in Z). 2. Breadth-first traversal from X: stop the traversal at a node if the trail is "blocked". (Some tricky details apply – see the reading.) [Figure: the A–K example BN.]
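A sketch of this two-pass procedure, patterned after the Reachable algorithm in the reading (Koller & Friedman, Alg. 3.1); the data representation and bookkeeping below are my reconstruction, not a verbatim transcription:

```python
from collections import deque

def d_separated(children, parents, x, y, z):
    """True iff x and y are d-separated given the set z."""
    # Phase 1 (upward pass): Z together with all ancestors of Z.
    anc, stack = set(), list(z)
    while stack:
        node = stack.pop()
        if node not in anc:
            anc.add(node)
            stack.extend(parents[node])

    # Phase 2: BFS over (node, direction) pairs starting from x.
    # 'up' = arrived from a child; 'down' = arrived from a parent.
    reachable, visited = set(), set()
    queue = deque([(x, "up")])
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in z:
            reachable.add(node)
        if direction == "up" and node not in z:
            for p in parents[node]:               # continue upward
                queue.append((p, "up"))
            for c in children[node]:              # or turn downward
                queue.append((c, "down"))
        elif direction == "down":
            if node not in z:                     # trail continues downward
                for c in children[node]:
                    queue.append((c, "down"))
            if node in anc:                       # v-structure is activated
                for p in parents[node]:
                    queue.append((p, "up"))
    return y not in reachable

children = {"Flu": {"Sinus"}, "Allergy": {"Sinus"},
            "Sinus": {"Headache", "Nose"}, "Headache": set(), "Nose": set()}
parents = {n: {u for u in children if n in children[u]} for n in children}
print(d_separated(children, parents, "Flu", "Allergy", set()))         # True
print(d_separated(children, parents, "Flu", "Allergy", {"Headache"}))  # False
```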

What you need to know. d-separation and independence: a sound procedure for finding independencies; the existence of distributions with exactly these independencies; (almost) all independencies can be read directly from the graph, without looking at the CPTs.

Building BNs from independence properties. From d-separation we learned: starting from the local Markov assumptions, we can obtain all independence assumptions encoded by the graph; and for most P's that factorize over G, I(G) = I(P). All of this discussion was for a given G that is an I-map for P. Now: given a P, how can I get a G? That is, give me the independence assumptions entailed by P. Many G's are "equivalent" — how do I represent this? Most of this discussion is not about practical algorithms, but about useful concepts that will be used by practical algorithms.

Minimal I-maps. One option: G is an I-map for P, and G is as simple as possible. G is a minimal I-map for P if deleting any edge from G makes it no longer an I-map.

Obtaining a minimal I-map. Given a set of variables and conditional independence assertions: choose an ordering of the variables, e.g., X1,…,Xn. For i = 1 to n: add Xi to the network; define the parents of Xi, Pa_Xi, in the graph as the minimal subset of {X1,…,Xi-1} such that the local Markov assumption holds (Xi independent of the rest of {X1,…,Xi-1} given its parents Pa_Xi); define/learn the CPT P(Xi | Pa_Xi). (A code sketch follows.)
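A sketch of this construction against a hypothetical independence oracle `indep(x, pa, prev)`, which should return True iff (x ⊥ prev − pa | pa) holds in P. The greedy deletion below suffices when the minimal parent set is unique (e.g., for positive distributions); in general one would search over subsets.

```python
def minimal_imap(order, indep):
    """Return {variable: parent set} for the given variable ordering.

    order: variables X1..Xn in the chosen ordering.
    indep(x, pa, prev): hypothetical oracle, True iff (x ⊥ prev - pa | pa) in P.
    """
    parents = {}
    for i, x in enumerate(order):
        prev = set(order[:i])
        pa = set(prev)            # start with all predecessors...
        for v in order[:i]:       # ...and greedily try to drop each one
            if indep(x, pa - {v}, prev):
                pa = pa - {v}
        parents[x] = pa
    return parents
```

For instance, the oracle could answer queries by running the d-separation sketch above on a known generating graph; different orderings will then generally produce different minimal I-maps.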

Minimal I-map not unique (or minimal). Running the same procedure as on the previous slide on Flu, Allergy, SinusInfection, Headache with different variable orderings yields different minimal I-maps — some with many more edges than the true structure.

Perfect maps (P-maps). I-maps are not unique and often not simple enough. Define the "simplest" G that is an I-map for P: a BN structure G is a perfect map for a distribution P if I(P) = I(G). Our goal: find a perfect map! We must also address equivalent BNs.

Inexistence of P-maps 1. The XOR example (this is a hint for the homework).
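Assuming this refers to the familiar XOR construction — X and Y independent fair coins, Z = X ⊕ Y — every pair of variables is marginally independent while any two determine the third, so no DAG over the three variables encodes exactly I(P). A small numeric check of one pairwise independence:

```python
from itertools import product

def p(x, y, z):
    """Joint of the XOR distribution: X, Y fair and independent, Z = X xor Y."""
    return 0.25 if z == x ^ y else 0.0

# Marginal independence of (X, Z): P(x, z) = P(x) P(z) for all values.
for x, z in product([0, 1], repeat=2):
    pxz = sum(p(x, y, z) for y in [0, 1])
    px = sum(p(x, y, zz) for y in [0, 1] for zz in [0, 1])
    pz = sum(p(xx, y, z) for xx in [0, 1] for y in [0, 1])
    assert abs(pxz - px * pz) < 1e-12   # 0.25 == 0.5 * 0.5
```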

Inexistence of P-maps 2. The (slightly un-PC) swinging couples example.

Obtaining a P-map. Given the independence assertions that are true for P: assume that there exists a perfect map G*; we want to find G*. Many structures may encode the same independencies as G* — when are we done? Find all equivalent structures simultaneously!

I-Equivalence. Two graphs G1 and G2 are I-equivalent if I(G1) = I(G2). This defines equivalence classes of BN structures: a mutually exclusive and exhaustive partition of graphs. How do we characterize these equivalence classes?

Skeleton of a BN. The skeleton of a BN structure G is the undirected graph over the same variables that has an edge X–Y for every X → Y or Y → X in G. (Little) Lemma: two I-equivalent BN structures must have the same skeleton. [Figure: the A–K example BN.]

What about V-structures? V-structures are a key property of a BN structure. Theorem: if G1 and G2 have the same skeleton and the same V-structures, then G1 and G2 are I-equivalent. [Figure: the A–K example BN.]

Same V-structures not necessary. Theorem: if G1 and G2 have the same skeleton and the same V-structures, then G1 and G2 are I-equivalent. But though sufficient, having the same V-structures is not necessary for I-equivalence.

Immoralities & I-Equivalence. The key concept is not V-structures but "immoralities" (unmarried parents): X → Z ← Y with no edge between X and Y. The important pattern: X and Y are independent given their parents, but not given Z. (If an edge exists between X and Y, the V-structure is "covered".) Theorem: G1 and G2 have the same skeleton and the same immoralities if and only if G1 and G2 are I-equivalent.
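A minimal sketch of this characterization: compute the skeleton and immoralities of two DAGs and compare (the dict-of-children encoding is my own convention):

```python
from itertools import combinations

def skeleton(children):
    """Undirected edges of the DAG, as frozensets {u, v}."""
    return {frozenset((u, v)) for u in children for v in children[u]}

def immoralities(children):
    """All (pair {x, y}, z) with x -> z <- y and x, y non-adjacent."""
    parents = {n: {u for u in children if n in children[u]} for n in children}
    skel = skeleton(children)
    return {(frozenset((x, y)), z)
            for z in children
            for x, y in combinations(sorted(parents[z]), 2)
            if frozenset((x, y)) not in skel}       # unmarried parents of z

def i_equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and immoralities(g1) == immoralities(g2)

# X -> Z -> Y and X <- Z <- Y: same skeleton, no immoralities => I-equivalent.
g1 = {"X": {"Z"}, "Z": {"Y"}, "Y": set()}
g2 = {"Y": {"Z"}, "Z": {"X"}, "X": set()}
print(i_equivalent(g1, g2))   # True
g3 = {"X": {"Z"}, "Y": {"Z"}, "Z": set()}   # immorality X -> Z <- Y
print(i_equivalent(g1, g3))   # False
```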

Obtaining a P-map. Given the independence assertions that are true for P: obtain the skeleton; obtain the immoralities; from the skeleton and immoralities, obtain every (and any) BN structure from the equivalence class.

Identifying the skeleton 1. When is there an edge between X and Y? When is there no edge between X and Y?

Identifying the skeleton 2. Assume d is the maximum number of parents (d could be n). For each pair Xi, Xj: set Eij ← true; for each U ⊆ X − {Xi, Xj} with |U| ≤ 2d, test whether (Xi ⊥ Xj | U), and if so set Eij ← false. If Eij is still true, add the edge Xi – Xj to the skeleton.
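A sketch of this loop against a hypothetical independence oracle `indep(xi, xj, U)`; in practice the oracle would come from the independencies of P, or from statistical tests on data:

```python
from itertools import chain, combinations

def subsets_up_to(items, k):
    """All subsets of `items` of size at most k."""
    return chain.from_iterable(combinations(items, r) for r in range(k + 1))

def identify_skeleton(variables, indep, max_parents):
    """Keep edge Xi - Xj unless some small conditioning set separates them."""
    edges = set()
    for xi, xj in combinations(variables, 2):
        others = [v for v in variables if v not in (xi, xj)]
        if not any(indep(xi, xj, set(u))
                   for u in subsets_up_to(others, 2 * max_parents)):
            edges.add(frozenset((xi, xj)))
    return edges

# e.g., with a d-separation oracle over a known generating graph (using the
# d_separated sketch from earlier):
#   oracle = lambda a, b, u: d_separated(children, parents, a, b, u)
#   identify_skeleton(list(children), oracle, max_parents=2)
```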

Identifying immoralities. Consider X – Z – Y in the skeleton: when should it be an immorality? It must be X → Z ← Y (an immorality) when X and Y are never independent given any U that contains Z. It must not be an immorality when there exists a U with Z ∈ U such that X and Y are independent given U.
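And a sketch of this immorality test under the same hypothetical oracle, for a skeleton triple X – Z – Y with X and Y non-adjacent (so some separating set of bounded size exists):

```python
from itertools import combinations

def is_immorality(x, z, y, variables, indep, max_cond=2):
    """True iff no separating set for x and y contains z, i.e. the triple
    must be oriented as the immorality x -> z <- y."""
    others = [v for v in variables if v not in (x, y)]
    for r in range(max_cond + 1):
        for u in combinations(others, r):
            if z in u and indep(x, y, set(u)):
                return False   # a separating set containing z exists
    return True
```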

From immoralities and skeleton to BN structures. Represent the BN equivalence class as a partially directed acyclic graph (PDAG). The immoralities force the direction of other BN edges. The full (polynomial-time) procedure is described in the reading.

What you need to know. Minimal I-maps: every P has one, but usually many. Perfect maps: a better choice for BN structure; not every P has one; you can find one (if it exists) by considering I-equivalence. Two structures are I-equivalent if and only if they have the same skeleton and the same immoralities.