Nov. 13th, 20031 Causal Discovery Richard Scheines Peter Spirtes, Clark Glymour, and many others Dept. of Philosophy & CALD Carnegie Mellon.

Slides:

Advertisements

Similar presentations

1 Learning Causal Structure from Observational and Experimental Data Richard Scheines Carnegie Mellon University.

Advertisements

CSE 473/573 Computer Vision and Image Processing (CVIP) Ifeoma Nwogu Lecture 27 – Overview of probability concepts 1.

Causal Data Mining Richard Scheines Dept. of Philosophy, Machine Learning, & Human-Computer Interaction Carnegie Mellon.

Interactions Up to this point, when we’ve discussed the causal effect of one variable upon another we’ve assumed that the effect is additive and independent.

Discovering Cyclic Causal Models by Independent Components Analysis Gustavo Lacerda Peter Spirtes Joseph Ramsey Patrik O. Hoyer.

Topic Outline Motivation Representing/Modeling Causal Systems

Outline 1)Motivation 2)Representing/Modeling Causal Systems 3)Estimation and Updating 4)Model Search 5)Linear Latent Variable Models 6)Case Study: fMRI.

Structure Learning Using Causation Rules Raanan Yehezkel PAML Lab. Journal Club March 13, 2003.

The IMAP Hybrid Method for Learning Gaussian Bayes Nets Oliver Schulte School of Computing Science Simon Fraser University Vancouver, Canada

Lecture 5: Causality and Feature Selection Isabelle Guyon

Challenges posed by Structural Equation Models Thomas Richardson Department of Statistics University of Washington Joint work with Mathias Drton, UC Berkeley.

1 Automatic Causal Discovery Richard Scheines Peter Spirtes, Clark Glymour Dept. of Philosophy & CALD Carnegie Mellon.

From: Probabilistic Methods for Bioinformatics - With an Introduction to Bayesian Networks By: Rich Neapolitan.

ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.

The Simple Linear Regression Model: Specification and Estimation

1Causality & MDL Causal Models as Minimal Descriptions of Multivariate Systems Jan Lemeire June 15 th 2006.

Ambiguous Manipulations

Inferring Causal Graphs Computing 882 Simon Fraser University Spring 2002.

1 gR2002 Peter Spirtes Carnegie Mellon University.

MACHINE LEARNING 6. Multivariate Methods 1. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2 Motivating Example  Loan.

Causal Models, Learning Algorithms and their Application to Performance Modeling Jan Lemeire Parallel Systems lab November 15 th 2006.

Causal Modeling for Anomaly Detection Andrew Arnold Machine Learning Department, Carnegie Mellon University Summer Project with Naoki Abe Predictive Modeling.

Learning In Bayesian Networks. Learning Problem Set of random variables X = {W, X, Y, Z, …} Training set D = { x 1, x 2, …, x N }  Each observation specifies.

1 Day 2: Search June 9, 2015 Carnegie Mellon University Center for Causal Discovery.

Review of Probability.

CAUSAL SEARCH IN THE REAL WORLD. A menu of topics  Some real-world challenges:  Convergence & error bounds  Sample selection bias  Simpson’s paradox.

Bayes Net Perspectives on Causation and Causal Inference

1 Part 2 Automatically Identifying and Measuring Latent Variables for Causal Theorizing.

1 Tetrad: Machine Learning and Graphcial Causal Models Richard Scheines Joe Ramsey Carnegie Mellon University Peter Spirtes, Clark Glymour.

Using Bayesian Networks to Analyze Expression Data N. Friedman, M. Linial, I. Nachman, D. Hebrew University.

Causal Inference and Graphical Models Peter Spirtes Carnegie Mellon University.

Bayesian Learning By Porchelvi Vijayakumar. Cognitive Science Current Problem: How do children learn and how do they get it right?

CS 4100 Artificial Intelligence Prof. C. Hafner Class Notes March 13, 2012.

1 Tutorial: Causal Model Search Richard Scheines Carnegie Mellon University Peter Spirtes, Clark Glymour, Joe Ramsey, others.

1 Causal Data Mining Richard Scheines Dept. of Philosophy, Machine Learning, & Human-Computer Interaction Carnegie Mellon.

1 Center for Causal Discovery: Summer Workshop June 8-11, 2015 Carnegie Mellon University.

Ch 8. Graphical Models Pattern Recognition and Machine Learning, C. M. Bishop, Revised by M.-O. Heo Summarized by J.W. Nam Biointelligence Laboratory,

Penn State - March 23, The TETRAD Project: Computational Aids to Causal Discovery Peter Spirtes, Clark Glymour, Richard Scheines and many others.

INTERVENTIONS AND INFERENCE / REASONING. Causal models  Recall from yesterday:  Represent relevance using graphs  Causal relevance ⇒ DAGs  Quantitative.

Slides for “Data Mining” by I. H. Witten and E. Frank.

Learning In Bayesian Networks. General Learning Problem Set of random variables X = {X 1, X 2, X 3, X 4, …} Training set D = { X (1), X (2), …, X (N)

1 BN Semantics 1 Graphical Models – Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, –  Carlos.

INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN © The MIT Press, Lecture.

Exploratory studies: you have empirical data and you want to know what sorts of causal models are consistent with it. Confirmatory tests: you have a causal.

Lecture 2: Statistical learning primer for biologists

04/21/2005 CS673 1 Being Bayesian About Network Structure A Bayesian Approach to Structure Discovery in Bayesian Networks Nir Friedman and Daphne Koller.

Mediation: The Causal Inference Approach David A. Kenny.

1Causal Performance Models Causal Models for Performance Analysis of Computer Systems Jan Lemeire TELE lab May 24 th 2006.

1 BN Semantics 2 – Representation Theorem The revenge of d-separation Graphical Models – Carlos Guestrin Carnegie Mellon University September 17.

1 BN Semantics 1 Graphical Models – Carlos Guestrin Carnegie Mellon University September 15 th, 2006 Readings: K&F: 3.1, 3.2, 3.3.

1 Day 2: Search June 9, 2015 Carnegie Mellon University Center for Causal Discovery.

1 Day 2: Search June 14, 2016 Carnegie Mellon University Center for Causal Discovery.

Tetrad 1)Main website: 2)Download:

1 BN Semantics 3 – Now it’s personal! Parameter Learning 1 Graphical Models – Carlos Guestrin Carnegie Mellon University September 22 nd, 2006 Readings:

INTRODUCTION TO Machine Learning 2nd Edition

Review of Probability.

Day 3: Search Continued Center for Causal Discovery June 15, 2015

Markov Properties of Directed Acyclic Graphs

Center for Causal Discovery: Summer Short Course/Datathon

Center for Causal Discovery: Summer Short Course/Datathon

Causal Data Mining Richard Scheines

Inferring Causal Graphs

Causal Discovery Richard Scheines Peter Spirtes, Clark Glymour,

Markov Networks.

BN Semantics 3 – Now it’s personal! Parameter Learning 1

Searching for Graphical Causal Models of Education Data

Presentation transcript:

Nov. 13th, Causal Discovery Richard Scheines Peter Spirtes, Clark Glymour, and many others Dept. of Philosophy & CALD Carnegie Mellon

Nov. 13th, Outline 1.Motivation 2.Representation 3.Connecting Causation to Probability (Independence) 4.Searching for Causal Models 5.Improving on Regression for Causal Inference

Nov. 13th, Motivation Non-experimental Evidence Typical Predictive Questions Can we predict aggressiveness from the amount of violent TV watched Can we predict crime rates from abortion rates 20 years ago Causal Questions: Does watching violent TV cause Aggression? I.e., if we change TV watching, will the level of Aggression change?

Nov. 13th, Causal Estimation Manipulated Probability P(Y | X set= x, Z=z) from Unmanipulated Probability P(Y | X = x, Z=z) When and how can we use non-experimental data to tell us about the effect of an intervention?

Nov. 13th, Representation 1.Association & causal structure - qualitatively 2.Interventions 3.Statistical Causal Models 1.Bayes Networks 2.Structural Equation Models

Nov. 13th, Causation & Association X is a cause of Y iff  x 1  x 2 P(Y | X set= x 1 )  P(Y | X set= x 2 ) Causation is asymmetric: X Y  Y X X and Y are associated (X _||_ Y) iff  x 1  x 2 P(Y | X = x 1 )  P(Y | X = x 2 ) Association is symmetric: X _||_ Y  Y _||_ X

Nov. 13th, Direct Causation X is a direct cause of Y relative to S, iff  z,x 1  x 2 P(Y | X set= x 1, Z set= z)  P(Y | X set= x 2, Z set= z) where Z = S - {X,Y}

Nov. 13th, Causal Graphs Causal Graph G = {V,E} Each edge X  Y represents a direct causal claim: X is a direct cause of Y relative to V Chicken Pox

Nov. 13th, Causal Graphs Not Cause Complete Common Cause Complete

Nov. 13th, Modeling Ideal Interventions Ideal Interventions (on a variable X): Completely determine the value or distribution of a variable X Directly Target only X (no “fat hand”) E.g., Variables: Confidence, Athletic Performance Intervention 1: hypnosis for confidence Intervention 2: anti-anxiety drug (also muscle relaxer)

Nov. 13th, Sweaters On Room Temperature Pre-experimental SystemPost Modeling Ideal Interventions Interventions on the Effect

Nov. 13th, Modeling Ideal Interventions Sweaters On Room Temperature Pre-experimental SystemPost Interventions on the Cause

Nov. 13th, Interventions & Causal Graphs Model an ideal intervention by adding an “intervention” variable outside the original system Erase all arrows pointing into the variable intervened upon Intervene to change Inf Post-intervention graph ? Pre-intervention graph

Nov. 13th, Conditioning vs. Intervening P(Y | X = x 1 ) vs. P(Y | X set= x 1 ) Teeth Slides

Nov. 13th, Causal Bayes Networks P(S = 0) =.7 P(S = 1) =.3 P(YF = 0 | S = 0) =.99P(LC = 0 | S = 0) =.95 P(YF = 1 | S = 0) =.01P(LC = 1 | S = 0) =.05 P(YF = 0 | S = 1) =.20P(LC = 0 | S = 1) =.80 P(YF = 1 | S = 1) =.80P(LC = 1 | S = 1) =.20 P(S,YF, L) = P(S) P(YF | S) P(LC | S) The Joint Distribution Factors According to the Causal Graph, i.e., for all X in V P(V) =  P(X|Immediate Causes of(X))

Nov. 13th, Structural Equation Models 1. Structural Equations 2. Statistical Constraints Statistical Model Causal Graph

Nov. 13th, Structural Equation Models zStructural Equations: One Equation for each variable V in the graph: V = f(parents(V), error V ) for SEM (linear regression) f is a linear function zStatistical Constraints: Joint Distribution over the Error terms Causal Graph

Nov. 13th, Structural Equation Models Equations: Education =  ed Income =    Education  income Longevity =    Education  Longevity Statistical Constraints: (  ed,  Income,  Income ) ~N(0,  2 )  2  diagonal - no variance is zero Causal Graph SEM Graph (path diagram)

Nov. 13th, Connecting Causation to Probability

Nov. 13th, Causal Structure Statistical Predictions The Markov Condition Causal Graphs Independence X _||_ Z | Y i.e., P(X | Y) = P(X | Y, Z) Causal Markov Axiom

Nov. 13th, Causal Markov Axiom If G is a causal graph, and P a probability distribution over the variables in G, then in P: every variable V is independent of its non-effects, conditional on its immediate causes.

Nov. 13th, Causal Markov Condition Two Intuitions: 1) Immediate causes make effects independent of remote causes (Markov). 2) Common causes make their effects independent (Salmon).

Nov. 13th, Causal Markov Condition 1) Immediate causes make effects independent of remote causes (Markov). E || S | I E = Exposure to Chicken Pox I = Infected S = Symptoms Markov Cond.

Nov. 13th, Causal Markov Condition 2) Effects are independent conditional on their common causes. YF || LC | S Markov Cond.

Nov. 13th, Causal Structure  Statistical Data

Nov. 13th, Causal Markov Axiom In SEMs, d-separation follows from assuming independence among error terms that have no connection in the path diagram - i.e., assuming that the model is common cause complete.

Nov. 13th, Causal Markov and D-Separation In acyclic graphs: equivalent Cyclic Linear SEMs with uncorrelated errors: D-separation correct Markov condition incorrect Cyclic Discrete Variable Bayes Nets: If equilibrium --> d-separation correct Markov incorrect

Nov. 13th, D-separation: Conditioning vs. Intervening P(X 3 | X 2 )  P(X 3 | X 2, X 1 ) X 3 _||_ X 1 | X 2 P(X 3 | X 2 set= ) = P(X 3 | X 2 set=, X 1 ) X 3 _||_ X 1 | X 2 set=

Nov. 13th, Search From Statistical Data to Probability to Causation

Nov. 13th, Causal Discovery Statistical Data  Causal Structure Background Knowledge - X 2 before X 3 - no unmeasured common causes Statistical Inference

Nov. 13th, Representations of D-separation Equivalence Classes We want the representations to: Characterize the Independence Relations Entailed by the Equivalence Class Represent causal features that are shared by every member of the equivalence class

Nov. 13th, Patterns & PAGs Patterns (Verma and Pearl, 1990): graphical representation of an acyclic d-separation equivalence - no latent variables. PAGs: (Richardson 1994) graphical representation of an equivalence class including latent variable models and sample selection bias that are d-separation equivalent over a set of measured variables X

Nov. 13th, Patterns

Nov. 13th, Patterns: What the Edges Mean

Nov. 13th, Patterns

Nov. 13th, PAGs: Partial Ancestral Graphs What PAG edges mean.

Nov. 13th, PAGs: Partial Ancestral Graph

Nov. 13th, Tetrad 4 Demo

Nov. 13th, Overview of Search Methods Constraint Based Searches TETRAD Scoring Searches Scores: BIC, AIC, etc. Search: Hill Climb, Genetic Alg., Simulated Annealing Very difficult to extend to latent variable models Heckerman, Meek and Cooper (1999). “A Bayesian Approach to Causal Discovery” chp. 4 in Computation, Causation, and Discovery, ed. by Glymour and Cooper, MIT Press, pp

Nov. 13th, Regession and Causal Inference

Nov. 13th, Regression to estimate Causal Influence Let V = {X,Y,T}, where - Y : measured outcome - measured regressors: X = {X 1, X 2, …, X n } - latent common causes of pairs in X U Y: T = {T 1, …, T k } Let the true causal model over V be a Structural Equation Model in which each V  V is a linear combination of its direct causes and independent, Gaussian noise.

Nov. 13th, Regression to estimate Causal Influence Consider the regression equation: Y = b 0 + b 1 X 1 + b 2 X 2 +..… b n X n Let the OLS regression estimate b i be the estimated causal influence of X i on Y. That is, holding X/Xi experimentally constant, b i is an estimate of the change in E(Y) that results from an intervention that changes Xi by 1 unit. Let the real Causal Influence X i  Y =  i When is the OLS estimate b i an unbiased estimate of  i ?

Nov. 13th, Linear Regression Let the other regressors O = {X 1, X 2,....,X i-1, X i+1,...,X n } b i = 0 if and only if  Xi,Y.O = 0 In a multivariate normal distribuion,  Xi,Y.O = 0 if and only if X i _||_ Y | O

Nov. 13th, Linear Regression So in regression: b i = 0  X i _||_ Y | O But provably :  i = 0   S  O, X i _||_ Y | S So  S  O, X i _||_ Y | S   i = 0 ~  S  O, X i _||_ Y | S  don’t know (unless we’re lucky)

Nov. 13th, Regression Example b 2 = 0 b 1  0 X 1 _||_ Y | X 2 X 2 _||_ Y | X 1 Don’t know ~  S  {X 2 } X 1 _||_ Y | S  S  {X 1 } X 2 _||_ Y | {X 1 }  2 = 0

Nov. 13th, Regression Example b 1  0 ~  S  {X 2,X 3 }, X 1 _||_ Y | S X 1 _||_ Y | {X 2,X 3 } X 2 _||_ Y | {X 1,X 3 } b 2  0 b 3  0 X 3 _||_ Y | {X 1,X 2 } DK  S  {X 1,X 3 }, X 2 _||_ Y | {X 1 }  2 = 0 DK ~  S  {X 1,X 2 }, X 3 _||_ Y | S

Nov. 13th, Regression Example X 2 Y X 3 X 1 PAG

Nov. 13th, Regression Bias If X i is d-separated from Y conditional on X/X i in the true graph after removing X i  Y, and X contains no descendant of Y, then: b i is an unbiased estimate of  i See “Using Path Diagrams ….”

Nov. 13th, Applications Parenting among Single, Black Mothers Pneumonia Photosynthesis Lead - IQ College Retention Corn Exports Rock Classification Spartina Grass College Plans Political Exclusion Satellite Calibration Naval Readiness

Nov. 13th, References Causation, Prediction, and Search, 2 nd Edition, (2000), by P. Spirtes, C. Glymour, and R. Scheines ( MIT Press) Causality: Models, Reasoning, and Inference, (2000), Judea Pearl, Cambridge Univ. Press Computation, Causation, & Discovery (1999), edited by C. Glymour and G. Cooper, MIT Press Causality in Crisis?, (1997) V. McKim and S. Turner (eds.), Univ. of Notre Dame Press. TETRAD IV: Web Course on Causal and Statistical Reasoning :