Bayes Net Perspectives on Causation and Causal Inference


Bayes Net Perspectives on Causation and Causal Inference. Peter Spirtes. (Speaker note: thank the organizers.)

Example Problems. Genetic regulatory networks: yeast has ~5000 genes and ~2,500,000 potential edges. Figure: a gene regulatory network in mouse embryonic stem cells (http://www.pnas.org/content/104/42/16438/F3.expansion.html). Other examples: fMRI in the brain, climate science, social networks. (Speaker notes: point out Rcor2 and Oct4; change yeast to mouse.)

Causal Models → Predictions. There are 3 levels of prediction. Probabilistic: among the cells that have active Oct4, what percentage have active Rcor2? Causal: if I experimentally set a cell to have active Oct4, what percentage will have active Rcor2?

Causal Models → Predictions (continued). Counterfactual: among the cells that did not have active Oct4 at t-1, what percentage would have active Rcor2 if I had experimentally set a cell to have active Oct4 at t-1?

Data → Causal Models. Challenges: a large number of variables; a small observed sample size; overlapping variables; a small number of experiments; feedback; hidden common causes; selection bias; many kinds of entities causally interacting.

Outline. Bayesian Networks; Search; Limitations and Extensions of Bayesian Networks (Dynamic, Relational, Cycles, Counterfactual). (Speaker notes: what Bayesian networks are for; search problems; limitations of standard Bayesian networks, what needs work, and recent research.)

Directed Acyclic Graph (DAG). The vertices are random variables. All edges are directed. There are no directed cycles. The example DAG has five vertices: SES (Socioeconomic Status), SEX (Sex), PE (Parental Encouragement), CP (College Plans), and IQ (Intelligence Quotient). The data are from Sewell and Shah (1968), a study of 10308 high school seniors. (Speaker notes: point to the DAG terminology; state first that the vertices are random variables, then give the list of random variables. Random variables are in italics, sets of random variables in bold.)

Population. [Figure: the DAG over SES, SEX, PE, CP, IQ repeated for each unit in the population.] The units are independent and identically distributed. (Speaker note: connect to the set of probability distributions and causal relations.)

P Factoring According to G. If P(SES,SEX,PE,IQ,CP) = P(SEX) P(SES) P(IQ|SES) P(PE|SES,SEX,IQ) P(CP|PE,SES,IQ), then P factors according to G. G represents all of the distributions that factor according to G. (Speaker notes: say what the parameters are. We can't measure the causal relations directly. We could do experiments, but if we can't, we give the relationship between the causal and probabilistic interpretations so that we can infer as much as possible about the DAG from samples from the probability distribution.)
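The factorization above can be made concrete with a minimal sketch. Only the parent sets come from the slide; the variables are assumed to be binary, and every probability value and function name below is a made-up illustration, not an estimate from the Sewell and Shah data.

```python
from itertools import product

def bern(p1, value):
    """Probability of `value` (0 or 1) when P(value = 1) = p1."""
    return p1 if value == 1 else 1.0 - p1

# Hypothetical conditional probability tables, one per factor of the slide.
def p_sex(sex):
    return bern(0.5, sex)

def p_ses(ses):
    return bern(0.4, ses)

def p_iq(iq, ses):
    return bern(0.3 + 0.4 * ses, iq)

def p_pe(pe, ses, sex, iq):
    return bern(0.2 + 0.3 * ses + 0.1 * sex + 0.3 * iq, pe)

def p_cp(cp, pe, ses, iq):
    return bern(0.05 + 0.4 * pe + 0.2 * ses + 0.3 * iq, cp)

def joint(ses, sex, pe, iq, cp):
    """P(SES,SEX,PE,IQ,CP) as the product of the five factors."""
    return (p_sex(sex) * p_ses(ses) * p_iq(iq, ses)
            * p_pe(pe, ses, sex, iq) * p_cp(cp, pe, ses, iq))

# The product of the factors is a proper joint distribution.
total = sum(joint(*values) for values in product((0, 1), repeat=5))

# Marginals can be recovered by summing the joint, e.g. P(IQ = 1).
p_iq_1 = sum(joint(ses, sex, pe, 1, cp)
             for ses, sex, pe, cp in product((0, 1), repeat=4))
```

Any other choice of valid tables with the same parent sets would also factor according to G, which is the sense in which G represents a whole set of distributions.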

Conditional Independence. X is independent of Y conditional on Z (denoted IP(X,Y|Z)) iff P(X|Y,Z) = P(X|Z). Example: IP(CP,SEX|{SES,IQ,PE}) iff P(CP|{SES,IQ,PE,SEX}) = P(CP|{SES,IQ,PE}). (Speaker note: explain the notation.)
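As a hedged illustration of the definition, the sketch below tests conditional independence numerically on a small joint table. It uses the equivalent product form P(x,y,z) P(z) = P(x,z) P(y,z), which avoids dividing by zero. The three-variable chain A → B → C and its probabilities are hypothetical and are not part of the talk.

```python
from itertools import product

def marginal(joint, idxs):
    """Sum a joint table (dict: value tuple -> probability) down to the positions in idxs."""
    out = {}
    for values, p in joint.items():
        key = tuple(values[i] for i in idxs)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_indep(joint, x, y, z, tol=1e-12):
    """True iff variable x is independent of variable y conditional on the variables in z."""
    z = list(z)
    pxyz = marginal(joint, [x, y] + z)
    pxz = marginal(joint, [x] + z)
    pyz = marginal(joint, [y] + z)
    pz = marginal(joint, z)
    for (xv, yv, *zv), p in pxyz.items():
        zv = tuple(zv)
        if abs(p * pz[zv] - pxz[(xv, *zv)] * pyz[(yv, *zv)]) > tol:
            return False
    return True

def bern(p1, value):
    return p1 if value == 1 else 1.0 - p1

# Hypothetical chain A -> B -> C over binary variables (positions 0, 1, 2).
chain = {}
for a, b, c in product((0, 1), repeat=3):
    chain[(a, b, c)] = bern(0.5, a) * bern(0.2 + 0.6 * a, b) * bern(0.1 + 0.7 * b, c)

a_indep_c_given_b = cond_indep(chain, 0, 2, [1])   # holds in a chain
a_indep_c = cond_indep(chain, 0, 2, [])            # fails: A and C are dependent
```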

Graphical Entailment. If IP(X,Y|Z) holds for every P that factors according to G, then G entails I(X,Y|Z). Examples: G entails I(IQ,SEX|∅) and I(IQ,SEX|SES). Entailments can be read off the graph through d-separation. (Speaker notes: the first is local Markov; the second is entailed by local Markov.)

D-separation and D-connection. X is d-separated from Y conditional on Z in G iff G entails that X is independent of Y conditional on Z. D-separation between X and Y conditional on Z holds when certain kinds of paths do not exist between X and Y. D-connection (the negation of d-separation) between X and Y conditional on Z holds when certain kinds of paths do exist between X and Y. Examples: SES and SEX are d-separated conditional on the empty set, because every path between them contains a collider. SES and SEX are d-connected conditional on PE, because there is a path whose collider is in the conditioning set. (Speaker note: won't give the full definition here because it is too complicated; trust me.)

Definition of D-connection. A node X is active on a path U conditional on Z iff either X is a collider on U (→ X ←) and X is in Z or there is a directed path from X to a member of Z; or X is not a collider on U and X is not in Z. Example: SES → IQ → PE ← SEX is a path U. PE is active on U conditional on {CP, IQ}, because PE is a collider with a directed path to CP. IQ is inactive on U conditional on {CP, IQ}, because IQ is a non-collider that is in the conditioning set.

Definition of D-connection (continued). A path U is active conditional on Z iff every vertex on U is active relative to Z. X is d-connected to Y conditional on Z iff there is an active path between X and Y conditional on Z. Example: SES → IQ → PE ← SEX is inactive conditional on {CP, IQ}. SES is d-connected to SEX conditional on {CP, IQ} because SES → PE ← SEX is active conditional on {CP, IQ}.
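The definition can be sketched directly as a path enumeration. The edge list below is read off the parent sets in the factorization slide. The function names are illustrative, the activity test is applied to the interior vertices of a path only, and the endpoints are assumed not to be in Z. Enumerating all simple paths is exponential in general, so this is a sketch for small graphs rather than an efficient algorithm.

```python
# Edges of the example DAG, taken from the factorization
# P(SEX) P(SES) P(IQ|SES) P(PE|SES,SEX,IQ) P(CP|PE,SES,IQ).
EDGES = {("SES", "IQ"), ("SES", "PE"), ("SEX", "PE"), ("IQ", "PE"),
         ("PE", "CP"), ("SES", "CP"), ("IQ", "CP")}

def descendants(node, edges):
    """Vertices reachable from node by a directed path of length >= 1."""
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for a, b in edges:
            if a == cur and b not in found:
                found.add(b)
                stack.append(b)
    return found

def simple_paths(x, y, edges):
    """All simple paths from x to y, ignoring edge direction."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)

    def extend(path):
        if path[-1] == y:
            yield path
            return
        for n in nbrs.get(path[-1], ()):
            if n not in path:
                yield from extend(path + [n])

    yield from extend([x])

def path_active(path, z, edges):
    """A path is active iff every interior vertex is active conditional on z."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edges and (nxt, node) in edges
        if collider:
            # A collider is active iff it, or one of its descendants, is in z.
            if node not in z and not (descendants(node, edges) & z):
                return False
        elif node in z:
            # A non-collider is active iff it is not in z.
            return False
    return True

def d_connected(x, y, z, edges=EDGES):
    z = set(z)
    return any(path_active(p, z, edges) for p in simple_paths(x, y, edges))
```

With these definitions, `d_connected("SES", "SEX", set())` is false, while conditioning on `{"PE"}` or on `{"CP", "IQ"}` makes SES and SEX d-connected, matching the examples on the slides.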

If I is Not Entailed by G. If a conditional independence relation I is not entailed by G, then I may hold in some (but not every) distribution P that factors according to G, that is, for some values of the parameters but not for others. Example: there are P and P' that factor according to G such that ~IP(SES,CP|∅) and IP'(SES,CP|∅). P' is said to be unfaithful to G.
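One common way such an unfaithful distribution arises is exact cancellation between paths. The sketch below uses a hypothetical three-variable linear model (not the five-variable example from the slides): X → Y → Z with an additional direct edge X → Z. The coefficient names a, b, c are assumptions for illustration. When the direct effect c exactly cancels the indirect effect a·b, X and Z are uncorrelated even though the graph does not entail it; for jointly Gaussian variables, zero covariance amounts to independence.

```python
def cov_xz(a, b, c, var_x=1.0):
    """Cov(X, Z) in the linear model
         Y = a*X + noise_Y
         Z = b*Y + c*X + noise_Z
       with mutually independent noise terms that are independent of X.
       Substituting Y gives Z = (a*b + c)*X + b*noise_Y + noise_Z."""
    return (a * b + c) * var_x

faithful = cov_xz(a=0.5, b=0.8, c=0.3)     # paths reinforce each other
unfaithful = cov_xz(a=0.5, b=0.8, c=-0.4)  # direct path cancels the indirect one
```

The cancellation requires c = -a·b exactly, which is why it holds for some parameter values but not for others.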

Manipulations. An ideal manipulation assigns a density to a set X of properties (random variables) as a function of the values of a set Z of properties (random variables). It directly affects only the variables in X, and it is successful. Example: a randomized experiment. (Speaker note: we can't tell whether a particular action is an ideal manipulation.)

Manipulations and Causal Graph. There is an edge SES → CP in G because there are two ways of manipulating {SES,SEX,IQ,PE} that differ only in the value they assign to SES and that yield different probabilities of CP. Related assumption: the Stable Unit Treatment Value Assumption (SUTVA). (Speaker notes: this does not define causal terms in terms of non-causal ones; it gives the relationship of causal terms to experiments.)

Causal Sufficiency. A set S of variables is causally sufficient if there are no variables not in S that are direct causes of more than one variable in S. Examples: S = {SES,IQ} is causally sufficient. S = {SES,PE,CP} is not causally sufficient, because IQ, which is not in S, is a direct cause of both PE and CP.
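The definition can be checked mechanically on the example graph. The sketch below makes a simplifying assumption: "direct cause" is read as "parent in the full graph G", so it does not detect common causes that act on S only through other omitted variables. The function name is illustrative, and the edge list is read off the factorization slide.

```python
EDGES = {("SES", "IQ"), ("SES", "PE"), ("SEX", "PE"), ("IQ", "PE"),
         ("PE", "CP"), ("SES", "CP"), ("IQ", "CP")}

def is_causally_sufficient(s, edges=EDGES):
    """True iff no variable outside s is a parent (in the full graph)
    of more than one variable in s. Simplified reading of the definition."""
    s = set(s)
    outside_causes = {a for a, _ in edges} - s
    for cause in outside_causes:
        children_in_s = {b for a, b in edges if a == cause and b in s}
        if len(children_in_s) > 1:
            return False
    return True

ok = is_causally_sufficient({"SES", "IQ"})
not_ok = is_causally_sufficient({"SES", "PE", "CP"})  # IQ causes both PE and CP
```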

Causal Markov Assumption. In a population Pop with distribution P and causal graph G, if V is causally sufficient, P(V) factors according to G: P(SES,SEX,PE,IQ,CP) = P(SEX) P(SES) P(IQ|SES) P(PE|SES,SEX,IQ) P(CP|PE,SES,IQ). (Speaker notes: it is called a Markov assumption because of an equivalent form; it is related to Reichenbach's common cause principle. This assumption relates the causal interpretation of the graph to the probabilistic interpretation of the graph.)

Representation of Manipulation. P(SES,SEX,PE=1,IQ,CP||PE=1) = P(SEX) P(SES) P(IQ|SES) × 1 × P(CP|PE=1,SES,IQ) = P(SES,SEX,PE=1,IQ,CP) / P(PE=1|SEX,SES,IQ), where "||PE=1" marks the distribution after PE is manipulated to 1. The first form is truncation (the factor for PE is replaced by 1); the second is division (the joint is divided by the factor for PE).
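A minimal numerical sketch of the two forms, truncation and division, follows. The variables are assumed binary and all probability values and function names are hypothetical; only the parent sets come from the slides. The sketch also computes P(CP=1||PE=1) from the manipulated distribution.

```python
from itertools import product

def bern(p1, value):
    """Probability of `value` (0 or 1) when P(value = 1) = p1."""
    return p1 if value == 1 else 1.0 - p1

# Hypothetical conditional probability tables for the example DAG.
def p_iq(iq, ses):
    return bern(0.3 + 0.4 * ses, iq)

def p_pe(pe, ses, sex, iq):
    return bern(0.2 + 0.3 * ses + 0.1 * sex + 0.3 * iq, pe)

def p_cp(cp, pe, ses, iq):
    return bern(0.05 + 0.4 * pe + 0.2 * ses + 0.3 * iq, cp)

def joint(ses, sex, pe, iq, cp):
    return (bern(0.5, sex) * bern(0.4, ses) * p_iq(iq, ses)
            * p_pe(pe, ses, sex, iq) * p_cp(cp, pe, ses, iq))

def by_truncation(ses, sex, iq, cp, pe_value=1):
    """Manipulated distribution: the factor for PE is replaced by 1."""
    return (bern(0.5, sex) * bern(0.4, ses) * p_iq(iq, ses)
            * 1.0 * p_cp(cp, pe_value, ses, iq))

def by_division(ses, sex, iq, cp, pe_value=1):
    """Same quantity: the joint divided by P(PE | its parents)."""
    return joint(ses, sex, pe_value, iq, cp) / p_pe(pe_value, ses, sex, iq)

settings = list(product((0, 1), repeat=4))  # (ses, sex, iq, cp)
max_gap = max(abs(by_truncation(*s) - by_division(*s)) for s in settings)
total_after = sum(by_truncation(*s) for s in settings)

# Probability of CP = 1 after manipulating PE to 1.
p_cp_after = sum(by_truncation(ses, sex, iq, 1)
                 for ses, sex, iq in product((0, 1), repeat=3))
```

The division form is only defined where P(PE=1|SEX,SES,IQ) is positive, which holds for the tables chosen here.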

FCI Algorithm. Looks for the set of DAGs (possibly with latent variables and selection bias) that entail all and only the conditional independence relations that hold in the data according to statistical tests.

Markov Equivalence. Two DAGs G1 and G2 are Markov equivalent when they contain the same variables and, for all disjoint X, Y, Z, X is entailed to be independent of Y conditional on Z in G1 if and only if X is entailed to be independent of Y conditional on Z in G2. (Speaker notes: this does not mean that they represent the same set of distributions in general, although it does in certain cases; in those cases, equivalently, every distribution that factors according to G1 also factors according to G2 and vice versa.)
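Markov equivalence of two DAGs can be tested without enumerating independence relations. The sketch below uses a standard graphical characterization that is not stated on the slide: two DAGs over the same variables are Markov equivalent iff they have the same adjacencies and the same unshielded colliders (a → c ← b with a and b not adjacent). The function names are illustrative, both edge sets are assumed to describe DAGs over the same vertex set, and the graph G' below (G with SES → IQ reversed) is a hypothetical example.

```python
def skeleton(edges):
    """Adjacencies of a DAG, ignoring direction."""
    return {frozenset(e) for e in edges}

def unshielded_colliders(edges):
    """Triples a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    out = set()
    for a, c in edges:
        for b, c2 in edges:
            if c2 == c and a != b and frozenset((a, b)) not in skel:
                out.add((frozenset((a, b)), c))
    return out

def markov_equivalent(edges1, edges2):
    return (skeleton(edges1) == skeleton(edges2)
            and unshielded_colliders(edges1) == unshielded_colliders(edges2))

chain = {("X", "Y"), ("Y", "Z")}
reversed_chain = {("Z", "Y"), ("Y", "X")}
collider = {("X", "Y"), ("Z", "Y")}

G = {("SES", "IQ"), ("SES", "PE"), ("SEX", "PE"), ("IQ", "PE"),
     ("PE", "CP"), ("SES", "CP"), ("IQ", "CP")}
G_prime = (G - {("SES", "IQ")}) | {("IQ", "SES")}  # hypothetical variant of G
```

Here `markov_equivalent(chain, reversed_chain)` holds, while the chain and the collider are not equivalent; reversing SES → IQ in G leaves both the adjacencies and the unshielded colliders at PE unchanged.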

Markov Equivalence Class. [Figure: two Markov equivalent DAGs, G and G', over SES, SEX, PE, CP, IQ.] The two can't be told apart using conditional independence alone.

Causal Faithfulness Assumption. In a population Pop with causal graph G and distribution P(V), if V is causally sufficient, IP(X,Y|Z) only if G entails I(X,Y|Z). Example: ~IP(SES,CP|∅) because I(SES,CP|∅) is not entailed by G +…

Causal Faithfulness Assumption (continued). Causal Faithfulness is too strong because consistency can be proved with assumptions about fewer conditional independencies, and because it is unlikely to hold, especially when there are many variables. Causal Faithfulness is too weak because it is not sufficient to prove uniform consistency (that is, to put error bounds at finite sample sizes).

Good Features of FCI Algorithm. It is pointwise consistent: as sample size → ∞, P(error in output pattern) → 0. It can be applied to distributions where tests of conditional independence are known. It can be applied to hidden variable models (and selection bias models).

Bad Features of FCI Algorithm. There is no reliable way to set error bounds on the pattern without making stronger assumptions. It can only get a set of Markov equivalent DAGs, not a single DAG. It doesn't allow for comparing how much better one model is than another. It needs to assume some version of the Causal Faithfulness Assumption.

Non Independence Constraints. Depending on the parametric family, a DAG can entail constraints that are not conditional independence constraints. Example: assuming linearity and non-Gaussian error terms, if a distribution is compatible with X → Y it is not compatible with X ← Y, even though the two DAGs are Markov equivalent.

Score-Based Search Strategy. Assign a score to each graph and sample, based on the maximum likelihood of the data given the graph and on the simplicity of the model. Search over the graph space for the highest score.
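As one hedged illustration of a score that combines maximum likelihood with a simplicity penalty, the sketch below uses BIC for binary variables. BIC is a choice made here for the example; the slide does not name a particular score. The data set is made up, and the "search" is an exhaustive comparison of three candidate graphs over two variables, not a realistic search procedure.

```python
from collections import Counter
from math import log

def bic_score(data, parents):
    """BIC of a DAG for binary data.
    data: list of dicts mapping variable name -> 0 or 1.
    parents: dict mapping each variable to a tuple of its parents.
    Score = maximized log-likelihood - 0.5 * (number of parameters) * log(n)."""
    n = len(data)
    score = 0.0
    for var, pa in parents.items():
        family = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        config = Counter(tuple(row[p] for p in pa) for row in data)
        loglik = sum(c * log(c / config[cfg]) for (cfg, _), c in family.items())
        n_params = 2 ** len(pa)  # one free parameter per parent configuration
        score += loglik - 0.5 * n_params * log(n)
    return score

# Hypothetical sample of 100 units in which X and Y are strongly associated.
data = ([{"X": 0, "Y": 0}] * 40 + [{"X": 0, "Y": 1}] * 10
        + [{"X": 1, "Y": 0}] * 10 + [{"X": 1, "Y": 1}] * 40)

candidates = {
    "X   Y": {"X": (), "Y": ()},
    "X -> Y": {"X": (), "Y": ("X",)},
    "X <- Y": {"X": ("Y",), "Y": ()},
}
scores = {name: bic_score(data, g) for name, g in candidates.items()}
best = max(scores, key=scores.get)
```

On this sample the two graphs with an edge score equally and both beat the empty graph, which illustrates that a likelihood-based score of this kind does not distinguish Markov equivalent DAGs.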

Advantages of Score-Based Search Strategy. Get more information about the graph (for additive noise models, a unique DAG). Doesn't rely on binary decisions. Local mistakes don't propagate.

Disadvantages of Score-Based Search Strategy. The score is often slower to calculate, or it is not known how to calculate it exactly, if the model includes unmeasured variables, selection bias, or unusual distributions. The search over graph space is often heuristic.

Dynamic Bayes Nets. If the same variable is measured at different times, then the samples from the variable are not i.i.d. Solution: index each variable by time (time series).

Dynamic Bayes Nets (continued). Make a template for the causal structure that can be filled in with actual times. [Figure: template over X and Y at times t-2, t-1, and t.] Open question: continuous time or differential equations?
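Filling in a template with actual times can be sketched as follows. The slide's figure does not survive in the transcript, so the template edges below (X at t-1 causes X at t, Y at t-1 causes Y at t, X at t-1 causes Y at t) are assumptions chosen for illustration, as are the function and variable names.

```python
# Each template entry is (cause, lag, effect): cause at time t - lag -> effect at time t.
# These three edges are hypothetical, not taken from the slide.
TEMPLATE = [("X", 1, "X"), ("Y", 1, "Y"), ("X", 1, "Y")]

def unroll(template, times):
    """Instantiate the template at every time step in `times`.
    Edges whose cause would fall before the first time step are skipped."""
    times = sorted(times)
    known = set(times)
    edges = []
    for t in times:
        for cause, lag, effect in template:
            if t - lag in known:
                edges.append(((cause, t - lag), (effect, t)))
    return edges

unrolled = unroll(TEMPLATE, range(3))  # time steps 0, 1, 2
```

The unrolled graph is an ordinary DAG over time-indexed variables such as ("X", 0) and ("Y", 1), so the earlier definitions apply to it unchanged.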

Population. [Figure: the DAGs over SES, SEX, PE, CP, IQ for several units, linked by parent-of relations.]

Population (continued). With parent-of relations among the units: the distribution is not i.i.d.; there are violations of SUTVA; there are causal relations between relations (e.g. sibling causes rivalry).

Extended Manipulation Specification. A manipulation assigns a density to a set of properties or relations at a set of times (a measurable set of times T) for a set of units, as a function of the values of a set of properties or relations.

Extended Factorization Assumption. [Figure: Alice&Jim are linked by parent-of relations to Sue and Bob; SES belongs to Alice&Jim, and SEX, PE, IQ, CP belong to each child.] P(Alice&Jim.SES, Sue.SEX, Sue.PE, Sue.IQ, Sue.CP, Bob.SEX, Bob.PE, Bob.IQ, Bob.CP) = P(Alice&Jim.SES) P(Sue.SEX) P(Sue.IQ|Alice&Jim.SES) P(Sue.PE|Alice&Jim.SES, Sue.SEX, Sue.IQ) P(Sue.CP|Sue.PE, Alice&Jim.SES, Sue.IQ) P(Bob.SEX) P(Bob.IQ|Alice&Jim.SES) P(Bob.PE|Alice&Jim.SES, Bob.SEX, Bob.IQ) P(Bob.CP|Bob.PE, Alice&Jim.SES, Bob.IQ)
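The relational factorization can be generated from a template plus the parent-of relation. The sketch below rests on one assumption that should be read as such: the shared variable Alice&Jim.SES is a single random variable and therefore contributes a single factor, no matter how many children refer to it. The template encoding ("self" for the child's own attributes, "parents" for attributes of the parent couple) and all names are illustrative.

```python
# For each child attribute: the list of (owner, attribute) it depends on.
# "self" refers to the child, "parents" to the couple linked by parent-of.
TEMPLATE = {
    "SEX": [],
    "IQ": [("parents", "SES")],
    "PE": [("parents", "SES"), ("self", "SEX"), ("self", "IQ")],
    "CP": [("self", "PE"), ("parents", "SES"), ("self", "IQ")],
}
PARENT_OF = {"Sue": "Alice&Jim", "Bob": "Alice&Jim"}  # child -> parent couple

def ground_factors(template, parent_of):
    """List the factors of the joint distribution as strings."""
    factors = [f"P({couple}.SES)" for couple in sorted(set(parent_of.values()))]
    for child in parent_of:
        for attr, deps in template.items():
            names = [f"{child if owner == 'self' else parent_of[child]}.{a}"
                     for owner, a in deps]
            cond = "|" + ", ".join(names) if names else ""
            factors.append(f"P({child}.{attr}{cond})")
    return factors

factors = ground_factors(TEMPLATE, PARENT_OF)
```

Because Sue and Bob share the parent variable Alice&Jim.SES, their attributes are not independent of each other, which is the sense in which the population is no longer i.i.d.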

3 Interpretations of Cycles: PE ⇆ CP. (1) Equilibrium values of PE and CP cause each other. (2) The averages of the values of PE and CP while reaching equilibrium influence each other. (3) A mixture of PE → CP and CP → PE. (Speaker notes: relation to experiments; other kinds of equilibrium with different representations.)