1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2006 Readings: K&F: 3.1, 3.2, 3.3.

Slides:

Advertisements

Similar presentations

1 Undirected Graphical Models Graphical Models – Carlos Guestrin Carnegie Mellon University October 29 th, 2008 Readings: K&F: 4.1, 4.2, 4.3, 4.4,

Advertisements

Identifying Conditional Independencies in Bayes Nets Lecture 4.

Bayesian Networks VISA Hyoungjune Yi. BN – Intro. Introduced by Pearl (1986 ) Resembles human reasoning Causal relationship Decision support system/ Expert.

Review: Bayesian learning and inference

PGM 2003/04 Tirgul 3-4 The Bayesian Network Representation.

Bayesian Belief Networks

Bayesian Network Representation Continued

CS 188: Artificial Intelligence Fall 2009 Lecture 15: Bayes’ Nets II – Independence 10/15/2009 Dan Klein – UC Berkeley.

CS 188: Artificial Intelligence Spring 2009 Lecture 15: Bayes’ Nets II -- Independence 3/10/2009 John DeNero – UC Berkeley Slides adapted from Dan Klein.

Bayesian Networks Alan Ritter.

Announcements Homework 8 is out Final Contest (Optional)

. DAGs, I-Maps, Factorization, d-Separation, Minimal I-Maps, Bayesian Networks Slides by Nir Friedman.

1 Bayesian Networks Chapter ; 14.4 CS 63 Adapted from slides by Tim Finin and Marie desJardins. Some material borrowed from Lise Getoor.

Bayesian networks More commonly called graphical models A way to depict conditional independence relationships between random variables A compact specification.

1 EM for BNs Graphical Models – Carlos Guestrin Carnegie Mellon University November 24 th, 2008 Readings: 18.1, 18.2, –  Carlos Guestrin.

Advanced Artificial Intelligence

Machine Learning CUNY Graduate Center Lecture 21: Graphical Models.

Bayes’ Nets  A Bayes’ net is an efficient encoding of a probabilistic model of a domain  Questions we can ask:  Inference: given a fixed BN, what is.

1 Bayesian Param. Learning Bayesian Structure Learning Graphical Models – Carlos Guestrin Carnegie Mellon University October 6 th, 2008 Readings:

Introduction to Bayesian Networks

1 Variable Elimination Graphical Models – Carlos Guestrin Carnegie Mellon University October 11 th, 2006 Readings: K&F: 8.1, 8.2, 8.3,

Bayes Nets recitation Bayes Nets Represent dependencies between variables Compact representation of probability distribution Flu Allergy.

Generalizing Variable Elimination in Bayesian Networks 서울 시립대학원 전자 전기 컴퓨터 공학과 G 박민규.

1 EM for BNs Kalman Filters Gaussian BNs Graphical Models – Carlos Guestrin Carnegie Mellon University November 17 th, 2006 Readings: K&F: 16.2 Jordan.

Readings: K&F: 11.3, 11.5 Yedidia et al. paper from the class website

Announcements Project 4: Ghostbusters Homework 7

INTERVENTIONS AND INFERENCE / REASONING. Causal models  Recall from yesterday:  Represent relevance using graphs  Causal relevance ⇒ DAGs  Quantitative.

Probabilistic Reasoning [Ch. 14] Bayes Networks – Part 1 ◦Syntax ◦Semantics ◦Parameterized distributions Inference – Part2 ◦Exact inference by enumeration.

Marginalization & Conditioning Marginalization (summing out): for any sets of variables Y and Z: Conditioning(variant of marginalization):

1 BN Semantics 1 Graphical Models – Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, –  Carlos.

1 Bayesian Networks (Directed Acyclic Graphical Models) The situation of a bell that rings whenever the outcome of two coins are equal can not be well.

Review: Bayesian inference  A general scenario:  Query variables: X  Evidence (observed) variables and their values: E = e  Unobserved variables: Y.

1 Parameter Learning 2 Structure Learning 1: The good Graphical Models – Carlos Guestrin Carnegie Mellon University September 27 th, 2006 Readings:

1 Use graphs and not pure logic Variables represented by nodes and dependencies by edges. Common in our language: “threads of thoughts”, “lines of reasoning”,

1 BN Semantics 2 – The revenge of d-separation Graphical Models – Carlos Guestrin Carnegie Mellon University September 20 th, 2006 Readings: K&F:

1 Param. Learning (MLE) Structure Learning The Good Graphical Models – Carlos Guestrin Carnegie Mellon University October 1 st, 2008 Readings: K&F:

1 CMSC 671 Fall 2001 Class #20 – Thursday, November 8.

CS B 553: A LGORITHMS FOR O PTIMIZATION AND L EARNING Bayesian Networks.

1 Learning P-maps Param. Learning Graphical Models – Carlos Guestrin Carnegie Mellon University September 24 th, 2008 Readings: K&F: 3.3, 3.4, 16.1,

1 BN Semantics 2 – Representation Theorem The revenge of d-separation Graphical Models – Carlos Guestrin Carnegie Mellon University September 17.

1 Structure Learning (The Good), The Bad, The Ugly Inference Graphical Models – Carlos Guestrin Carnegie Mellon University October 13 th, 2008 Readings:

1 Variable Elimination Graphical Models – Carlos Guestrin Carnegie Mellon University October 15 th, 2008 Readings: K&F: 8.1, 8.2, 8.3,

CS 2750: Machine Learning Bayesian Networks Prof. Adriana Kovashka University of Pittsburgh March 14, 2016.

1 BN Semantics 3 – Now it’s personal! Parameter Learning 1 Graphical Models – Carlos Guestrin Carnegie Mellon University September 22 nd, 2006 Readings:

Daphne Koller Independencies Bayesian Networks Probabilistic Graphical Models Representation.

Artificial Intelligence Bayes’ Nets: Independence Instructors: David Suter and Qince Li Course Harbin Institute of Technology [Many slides.

CS 2750: Machine Learning Directed Graphical Models

General Gibbs Distribution

Markov Networks Independencies Representation Probabilistic Graphical

Readings: K&F: 15.1, 15.2, 15.3, 15.4, 15.5 K&F: 7 (overview of inference) K&F: 8.1, 8.2 (Variable Elimination) Structure Learning in BNs 3: (the good,

CAP 5636 – Advanced Artificial Intelligence

Bayesian Networks Independencies Representation Probabilistic

CS 188: Artificial Intelligence

CS 188: Artificial Intelligence Spring 2007

Variable Elimination 2 Clique Trees

Parameter Learning 2 Structure Learning 1: The good

Readings: K&F: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7 Markov networks, Factor graphs, and an unified view Start approximate inference If we are lucky… Graphical.

Unifying Variational and GBP Learning Parameters of MNs EM for BNs

BN Semantics 3 – Now it’s personal! Parameter Learning 1

Junction Trees 3 Undirected Graphical Models

Readings: K&F: 11.3, 11.5 Yedidia et al. paper from the class website

Markov Networks Independencies Representation Probabilistic Graphical

CS 188: Artificial Intelligence Spring 2006

Variable Elimination Graphical Models – Carlos Guestrin

Generalized Belief Propagation

CS 188: Artificial Intelligence Fall 2008

BN Semantics 2 – The revenge of d-separation

Presentation transcript:

1 BN Semantics 1 Graphical Models – Carlos Guestrin Carnegie Mellon University September 15 th, 2006 Readings: K&F: 3.1, 3.2, 3.3

–  Carlos Guestrin Let’s start on BNs… Consider P(X i )  Assign probability to each x i 2 Val(X i )  Independent parameters Consider P(X 1,…,X n )  How many independent parameters if |Val(X i )|=k?

–  Carlos Guestrin What if variables are independent?  (X i  X j ), 8 i,j  Not enough!!! (See homework 1 )  Must assume that (X  Y), 8 X,Y subsets of {X 1,…,X n } Can write  P(X 1,…,X n ) =  i=1…n P(X i ) How many independent parameters now?

–  Carlos Guestrin Conditional parameterization – two nodes Grade is determined by Intelligence

–  Carlos Guestrin Conditional parameterization – three nodes Grade and SAT score are determined by Intelligence (G  S | I)

–  Carlos Guestrin The naïve Bayes model – Your first real Bayes Net Class variable: C Evidence variables: X 1,…,X n assume that (X  Y | C), 8 X,Y subsets of {X 1,…,X n }

–  Carlos Guestrin What you need to know (From last class) Basic definitions of probabilities Independence Conditional independence The chain rule Bayes rule Naïve Bayes

–  Carlos Guestrin Announcements Homework 1:  Out yesterday  Due September 27th – beginning of class!  It’s hard – start early, ask questions Collaboration policy  OK to discuss in groups  Tell us on your paper who you talked with  Each person must write their own unique paper  No searching the web, papers, etc. for answers, we trust you want to learn Upcoming recitation  Monday 5:30-7pm in Wean 4615A – Matlab Tutorial Don’t forget to register to the mailing list at: 

–  Carlos Guestrin This class We’ve heard of Bayes nets, we’ve played with Bayes nets, we’ve even used them in your research This class, we’ll learn the semantics of BNs, relate them to independence assumptions encoded by the graph

–  Carlos Guestrin Causal structure Suppose we know the following:  The flu causes sinus inflammation  Allergies cause sinus inflammation  Sinus inflammation causes a runny nose  Sinus inflammation causes headaches How are these connected?

–  Carlos Guestrin Possible queries Flu Allergy Sinus Headache Nose Inference Most probable explanation Active data collection

–  Carlos Guestrin Car starts BN 18 binary attributes Inference  P(BatteryAge|Starts=f) 2 18 terms, why so fast? Not impressed?  HailFinder BN – more than 3 54 = terms

–  Carlos Guestrin Factored joint distribution - Preview Flu Allergy Sinus Headache Nose

–  Carlos Guestrin Number of parameters Flu Allergy Sinus Headache Nose

–  Carlos Guestrin Key: Independence assumptions Flu Allergy Sinus Headache Nose Knowing sinus separates the variables from each other

–  Carlos Guestrin (Marginal) Independence Flu and Allergy are (marginally) independent More Generally: Flu = tFlu = f Allergy = t Allergy = f Allergy = t Allergy = f Flu = t Flu = f

–  Carlos Guestrin Conditional independence Flu and Headache are not (marginally) independent Flu and Headache are independent given Sinus infection More Generally:

–  Carlos Guestrin The independence assumption Flu Allergy Sinus Headache Nose Local Markov Assumption: A variable X is independent of its non-descendants given its parents (X i  NonDescendants Xi | Pa Xi )

–  Carlos Guestrin Explaining away Flu Allergy Sinus Headache Nose Local Markov Assumption: A variable X is independent of its non-descendants given its parents (Xi  NonDescendants Xi | Pa Xi )

–  Carlos Guestrin What about probabilities? Conditional probability tables (CPTs) Flu Allergy Sinus Headache Nose

–  Carlos Guestrin Joint distribution Flu Allergy Sinus Headache Nose Why can we decompose? Markov Assumption!

–  Carlos Guestrin A general Bayes net Set of random variables Directed acyclic graph CPTs Joint distribution: Local Markov Assumption:  A variable X is independent of its non-descendants given its parents – (Xi  NonDescendantsXi | PaXi)

–  Carlos Guestrin Questions???? What distributions can be represented by a BN? What BNs can represent a distribution? What are the independence assumptions encoded in a BN?  in addition to the local Markov assumption

–  Carlos Guestrin Today: The Representation Theorem – Joint Distribution to BN Joint probability distribution: Obtain BN: Encodes independence assumptions If conditional independencies in BN are subset of conditional independencies in P

–  Carlos Guestrin Today: The Representation Theorem – BN to Joint Distribution If joint probability distribution: BN: Encodes independence assumptions Obtain Then conditional independencies in BN are subset of conditional independencies in P

–  Carlos Guestrin Let’s start proving it for naïve Bayes – From joint distribution to BN Independence assumptions:  X i independent given C Let’s assume that P satisfies independencies must prove that P factorizes according to BN:  P(C,X 1,…,X n ) = P(C)  i P(X i |C) Use chain rule!

–  Carlos Guestrin Let’s start proving it for naïve Bayes – From BN to joint distribution 1 Let’s assume that P factorizes according to the BN:  P(C,X 1,…,X n ) = P(C)  i P(X i |C) Prove the independence assumptions:  X i independent given C  Actually, (X  Y | C), 8 X,Y subsets of {X 1,…,X n }

–  Carlos Guestrin Let’s start proving it for naïve Bayes – From BN to joint distribution 2 Let’s consider a simpler case  Grade and SAT score are determined by Intelligence  P(I,G,S) = P(I)P(G|I)P(S|I)  Prove that P(G,S|I) = P(G|I) P(S|I)

–  Carlos Guestrin Today: The Representation Theorem BN: Encodes independence assumptions Joint probability distribution: Obtain If conditional independencies in BN are subset of conditional independencies in P If joint probability distribution: Obtain Then conditional independencies in BN are subset of conditional independencies in P

–  Carlos Guestrin Local Markov assumption & I-maps Flu Allergy Sinus Headache Nose Local Markov Assumption: A variable X is independent of its non-descendants given its parents (Xi  NonDescendants Xi | Pa Xi ) Local independence assumptions in BN structure G: Independence assertions of P: BN structure G is an I-map (independence map) if:

–  Carlos Guestrin Factorized distributions Given  Random vars X 1,…,X n  P distribution over vars  BN structure G over same vars P factorizes according to G if Flu Allergy Sinus Headache Nose

–  Carlos Guestrin BN Representation Theorem – I-map to factorization Joint probability distribution: Obtain If conditional independencies in BN are subset of conditional independencies in P G is an I-map of P P factorizes according to G

–  Carlos Guestrin BN Representation Theorem – I-map to factorization: Proof Local Markov Assumption: A variable X is independent of its non-descendants given its parents (Xi  NonDescendants Xi | Pa Xi ) ALL YOU NEED: Flu Allergy Sinus Headache Nose Obtain G is an I-map of P P factorizes according to G

–  Carlos Guestrin Defining a BN Given a set of variables and conditional independence assertions of P Choose an ordering on variables, e.g., X 1, …, X n For i = 1 to n  Add X i to the network  Define parents of X i, Pa X i, in graph as the minimal subset of {X 1,…,X i-1 } such that local Markov assumption holds – X i independent of rest of {X 1,…,X i-1 }, given parents Pa Xi  Define/learn CPT – P(X i | Pa Xi )

–  Carlos Guestrin BN Representation Theorem – Factorization to I-map G is an I-map of P P factorizes according to G If joint probability distribution: Obtain Then conditional independencies in BN are subset of conditional independencies in P

–  Carlos Guestrin BN Representation Theorem – Factorization to I-map: Proof G is an I-map of P P factorizes according to G If joint probability distribution: Obtain Then conditional independencies in BN are subset of conditional independencies in P Homework 1!!!!

–  Carlos Guestrin The BN Representation Theorem If joint probability distribution: Obtain Then conditional independencies in BN are subset of conditional independencies in P Joint probability distribution: Obtain If conditional independencies in BN are subset of conditional independencies in P Important because: Every P has at least one BN structure G Important because: Read independencies of P from BN structure G

–  Carlos Guestrin Independencies encoded in BN We said: All you need is the local Markov assumption  (X i  NonDescendants Xi | Pa Xi ) But then we talked about other (in)dependencies  e.g., explaining away What are the independencies encoded by a BN?  Only assumption is local Markov  But many others can be derived using the algebra of conditional independencies!!!

–  Carlos Guestrin Understanding independencies in BNs – BNs with 3 nodes Z YX Local Markov Assumption: A variable X is independent of its non-descendants given its parents ZYX ZYX Z YX Indirect causal effect: Indirect evidential effect: Common cause: Common effect:

–  Carlos Guestrin Understanding independencies in BNs – Some examples A H C E G D B F K J I

–  Carlos Guestrin Understanding independencies in BNs – Some more examples A H C E G D B F K J I

–  Carlos Guestrin An active trail – Example AH C E G DB F F’’ F’ When are A and H independent?

–  Carlos Guestrin Active trails formalized A trail X 1 – X 2 – · · · –X k is an active trail when variables O µ {X 1,…,X n } are observed if for each consecutive triplet in the trail:  X i-1  X i  X i+1, and X i is not observed (X i  O)  X i-1  X i  X i+1, and X i is not observed (X i  O)  X i-1  X i  X i+1, and X i is not observed (X i  O)  X i-1  X i  X i+1, and X i is observed (X i 2 O), or one of its descendents

–  Carlos Guestrin Active trails and independence? Theorem: Variables X i and X j are independent given Z µ {X 1,…,X n } if the is no active trail between X i and X j when variables Z µ {X 1,…,X n } are observed A H C E G D B F K J I

–  Carlos Guestrin More generally: Soundness of d-separation Given BN structure G Set of independence assertions obtained by d-separation:  I(G) = {(X  Y|Z) : d-sep G (X;Y|Z)} Theorem: Soundness of d-separation  If P factorizes over G then I(G) µ I(P) Interpretation: d-separation only captures true independencies Proof discussed when we talk about undirected models

–  Carlos Guestrin Adding edges doesn’t hurt Theorem: Let G be an I-map for P, any DAG G’ that includes the same directed edges as G is also an I-map for P. Proof sketch: Flu Allergy Sinus Headache Nose

–  Carlos Guestrin Existence of dependency when not d-separated Theorem: If X and Y are not d-separated given Z, then X and Y are dependent given Z under some P that factorizes over G Proof sketch:  Choose an active trail between X and Y given Z  Make this trail dependent  Make all else uniform (independent) to avoid “canceling” out influence A H C E G D B F K J I

–  Carlos Guestrin More generally: Completeness of d-separation Theorem: Completeness of d-separation  For “almost all” distributions that P factorize over to G, we have that I(G) = I(P)  “almost all” distributions: except for a set of measure zero of parameterizations of the CPTs (assuming no finite set of parameterizations has positive measure) Proof sketch:

–  Carlos Guestrin Interpretation of completeness Theorem: Completeness of d-separation  For “almost all” distributions that P factorize over to G, we have that I(G) = I(P) BN graph is usually sufficient to capture all independence properties of the distribution!!!! But only for complete independence:  P ² (X=x  Y=y | Z=z), 8 x 2 Val(X), y 2 Val(Y), z 2 Val(Z) Often we have context-specific independence (CSI)  9 x 2 Val(X), y 2 Val(Y), z 2 Val(Z): P ² (X=x  Y=y | Z=z)  Many factors may affect your grade  But if you are a frequentist, all other factors are irrelevant

–  Carlos Guestrin What you need to know Independence & conditional independence Definition of a BN The representation theorems  Statement  Interpretation d-separation and independence  soundness  existence  completeness

–  Carlos Guestrin Acknowledgements JavaBayes applet  me/index.html