Day 3: Search Continued Center for Causal Discovery June 15, 2015

Presentation transcript:

Day 3: Search Continued Center for Causal Discovery June 15, 2015 Carnegie Mellon University

Outline
Models → Data
Bridge Principles: Markov Axiom and D-separation
Model Equivalence
Model Search
  For Patterns
  For PAGs
Multiple Regression vs. Model Search
Measurement Issues and Latent Variables

Search Results? Charitable Giving; Lead and IQ; Timberlake and Williams

Constraint-based Search for Patterns 1) Adjacency phase 2) Orientation phase

Constraint-based Search for Patterns: Adjacency phase
X and Y are not adjacent if they are independent conditional on any subset that doesn’t contain X and Y.
1) Adjacency
Begin with a fully connected undirected graph.
Remove adjacency X – Y if X _||_ Y | S for any set S.
Because a single separating set S is enough to remove an adjacency, it is much more efficient to start with a completely adjacent graph and look to prune.
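The adjacency phase can be sketched in a few lines of Python. This is a minimal illustration, not the Tetrad implementation: it assumes an independence oracle that answers from d-separation in a known DAG (here the X1, X2, X3, X4 example used later in the slides), whereas a real search would use statistical tests on data. The names `d_separated` and `adjacency_phase` are hypothetical.

```python
from itertools import combinations

def d_separated(parents, x, y, given):
    """True if x and y are d-separated by `given` in the DAG `parents` (node -> set of parents)."""
    # 1. Keep only the ancestors of x, y and the conditioning set.
    relevant, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents[node])
    # 2. Moralize: link each node to its parents, and co-parents to each other.
    nbrs = {n: set() for n in relevant}
    for child in relevant:
        ps = sorted(parents[child])
        for p in ps:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # 3. d-separated iff removing `given` disconnects x from y.
    seen, stack = set(given), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        if node not in seen:
            seen.add(node)
            stack.extend(nbrs[node] - seen)
    return True

def adjacency_phase(nodes, independent):
    """Start fully connected; remove X - Y as soon as some set S separates them."""
    adj = {n: set(nodes) - {n} for n in nodes}
    sepset = {}
    depth = 0
    while any(len(adj[x]) - 1 >= depth for x in nodes):
        for x in nodes:
            for y in list(adj[x]):
                # Candidate separating sets: subsets of x's other neighbours.
                for s in combinations(sorted(adj[x] - {y}), depth):
                    if independent(x, y, set(s)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(s)
                        break
        depth += 1
    return adj, sepset

# Assumed true DAG: X1 -> X3 <- X2, X3 -> X4.
parents = {"X1": set(), "X2": set(), "X3": {"X1", "X2"}, "X4": {"X3"}}
nodes = sorted(parents)
adj, sepset = adjacency_phase(nodes, lambda x, y, s: d_separated(parents, x, y, s))
print(adj)
```

The separating sets are stored because the orientation phase needs them for the collider test.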

Constraint-based Search for Patterns: Orientation phase
2) Orientation
Collider test: find triples X – Y – Z (with X and Z not adjacent) and orient according to whether the set that separated X and Z contains Y.
Away from collider test: find triples X → Y – Z and orient the Y – Z connection via the collider test.
Repeat until no further orientations.
Apply Meek Rules.

Search: Orientation (Patterns)
Unshielded triple X – Y – Z. Test: X _||_ Z | S; is Y ∈ S?
No: collider, X → Y ← Z
Yes: non-collider, X – Y – Z
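The collider test can be written as a short function over the output of the adjacency phase. The sketch below assumes the skeleton and separating sets are given as plain dictionaries (values chosen to match the X1, X2, X3, X4 example in the following slides); `orient_colliders` is a hypothetical name, not a Tetrad API.

```python
from itertools import combinations

def orient_colliders(adj, sepset):
    """Return directed edges (tail, head) found by the collider test."""
    directed = set()
    for y in adj:
        for x, z in combinations(sorted(adj[y]), 2):
            if z in adj[x]:
                continue  # shielded triple: x and z are adjacent, no test
            # Unshielded triple x - y - z: collider iff y is NOT in the separating set.
            if y not in sepset[frozenset((x, z))]:
                directed.add((x, y))
                directed.add((z, y))
    return directed

# Assumed adjacency-phase output for the DAG X1 -> X3 <- X2, X3 -> X4.
adj = {"X1": {"X3"}, "X2": {"X3"}, "X3": {"X1", "X2", "X4"}, "X4": {"X3"}}
sepset = {
    frozenset(("X1", "X2")): set(),
    frozenset(("X1", "X4")): {"X3"},
    frozenset(("X2", "X4")): {"X3"},
}
colliders = orient_colliders(adj, sepset)
print(sorted(colliders))  # [('X1', 'X3'), ('X2', 'X3')]
```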

Search: Orientation Away from Collider

Search: Orientation
After Adjacency Phase
Collider Test: X1 – X3 – X2, with X1 _||_ X2
Away from Collider Test: X1 → X3 – X4 and X2 → X3 – X4, with X1 _||_ X4 | X3 and X2 _||_ X4 | X3
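A minimal sketch of the away-from-collider step, assuming the skeleton and the already-oriented collider edges are supplied as inputs. When X → Y – Z is found with X and Z non-adjacent, Y was not marked as a collider on that triple, so the remaining edge must point away from Y. The function name is hypothetical.

```python
def orient_away_from_collider(adj, directed):
    """Repeatedly orient Y - Z as Y -> Z whenever X -> Y - Z and X, Z are not adjacent."""
    directed = set(directed)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(directed):
            for z in adj[y] - {x}:
                still_undirected = (y, z) not in directed and (z, y) not in directed
                if still_undirected and z not in adj[x]:
                    directed.add((y, z))
                    changed = True
    return directed

# Skeleton X1 - X3, X2 - X3, X3 - X4 with the collider X1 -> X3 <- X2 already oriented.
adj = {"X1": {"X3"}, "X2": {"X3"}, "X3": {"X1", "X2", "X4"}, "X4": {"X3"}}
oriented = orient_away_from_collider(adj, {("X1", "X3"), ("X2", "X3")})
print(sorted(oriented))  # [('X1', 'X3'), ('X2', 'X3'), ('X3', 'X4')]
```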

Away from Collider Power! X2 – X3 oriented as X2 → X3. Why does this test also show that X2 and X3 are not confounded?

Independence Equivalence Classes: Patterns & PAGs Patterns (Verma and Pearl, 1990): graphical representation of d-separation equivalence among models with no latent common causes PAGs: (Richardson 1994) graphical representation of a d-separation equivalence class that includes models with latent common causes and sample selection bias that are d-separation equivalent over a set of measured variables X

PAGs: Partial Ancestral Graphs
1. represents a set of conditional-independence and distribution-equivalent graphs
2. same adjacencies
3. undirected edges mean some graphs contain the edge one way, some contain it the other way
4. a directed edge means they all go the same way
5. Pearl and Verma: complete rules for generating from Meek; Andersson, Perlman, and Madigan; and Chickering
6. instance of a chain graph
7. since data can’t distinguish, in the absence of background knowledge this is the right output for search
8. what are they good for?

PAGs: Partial Ancestral Graphs: What PAG edges mean.

PAG Search: Orientation
Unshielded triple X – Y – Z.
X and Z dependent given Y: collider at Y.
X _||_ Z | Y: non-collider at Y.

PAG Search: Orientation
After Adjacency Phase
Collider Test: X1 – X3 – X2, with X1 _||_ X2
Away from Collider Test: X1 → X3 – X4 and X2 → X3 – X4, with X1 _||_ X4 | X3 and X2 _||_ X4 | X3

Interesting Cases
[Figure: three example graphs, M1, M2, and M3, over measured variables (X, Y, Z, X1, X2, Y1, Y2, Z1, Z2) and latent variables (L, L1, L2).]

Tetrad Demo and Hands-on
1) Create new session.
2) Select “Search from Simulated Data” from the Template menu.
3) Build graphs for the M1, M2, M3 “interesting cases”, parameterize, instantiate, and generate sample data, N = 1,000.
4) Execute PC search, α = .05.
5) Execute FCI search, α = .05.

Regression & Causal Inference

Regression & Causal Inference
Typical (non-experimental) strategy:
1. Establish a prima facie case (X associated with Y). But: omitted variable bias.
2. So, identify and measure potential confounders Z: prior to X, associated with X, associated with Y.
3. Statistically adjust for Z (multiple regression).

Regression & Causal Inference
Multiple regression or any similar strategy is provably unreliable for causal inference regarding X → Y, with covariates Z, unless:
X is truly prior to Y
X, Z, and Y are causally sufficient (no confounding)
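A small worked example of why causal sufficiency matters, using population covariances rather than sampled data. The model is an assumption made for illustration: Z → X (a), Z → Y (b), a latent L → X (c) and L → Y (d), and no X → Y edge; Z, L, and both error terms have variance 1 and are independent. The function name is hypothetical.

```python
from fractions import Fraction

def regress_y_on_x_z(a, b, c, d):
    """Population OLS coefficients of Y on (X, Z) when the true effect of X on Y is zero."""
    var_x = a * a + c * c + 1      # X = a*Z + c*L + e_x
    var_z = 1
    cov_xz = a
    cov_zy = b                     # Y = b*Z + d*L + e_y
    cov_xy = a * b + c * d
    # Solve the 2x2 normal equations by Cramer's rule.
    det = var_x * var_z - cov_xz ** 2
    beta_x = Fraction(cov_xy * var_z - cov_xz * cov_zy, det)
    beta_z = Fraction(var_x * cov_zy - cov_xz * cov_xy, det)
    return beta_x, beta_z

confounded = regress_y_on_x_z(1, 1, 1, 1)   # latent L affects both X and Y
sufficient = regress_y_on_x_z(1, 1, 0, 1)   # L no longer affects X
print(confounded)  # (Fraction(1, 2), Fraction(1, 2))
print(sufficient)  # (Fraction(0, 1), Fraction(1, 1))
```

Adjusting for the measured covariate Z leaves a coefficient of 1/2 on X even though X has no effect on Y; the coefficient drops to 0 only when {X, Z, Y} is causally sufficient.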

Tetrad Demo and Hands-on
1) Create new session.
2) Select “Search from Simulated Data” from the Template menu.
3) Build a graph for the M4 “interesting case”, parameterize as SEM, instantiate, and generate sample data, N = 1,000.
4) Execute PC search, α = .05.
5) Execute FCI search, α = .05.

Measurement

Measurement Error and Coarsening Endanger Conditional Independence!
[Figure: X and Y connected only through Z, so X _||_ Y | Z; Z is measured by Z’ (with error e’) and by Z*.]
Measurement Error: Z’ = Z + e’
X and Y are dependent given Z’ (unless Var(e’) = 0)
Coarsening:
-∞ < Z < 0 ⇒ Z* = 0
0 ≤ Z < i ⇒ Z* = 1
i ≤ Z < j ⇒ Z* = 2
...
k ≤ Z < ∞ ⇒ Z* = k
X and Y are dependent given Z* (almost always)
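The measurement-error claim can be checked with a short derivation. As an illustrative assumption (the slide does not give a parameterization), take a linear model $X = aZ + \varepsilon_X$, $Y = bZ + \varepsilon_Y$, $Z' = Z + e'$ with mutually independent errors and $\mathrm{Var}(Z) = \sigma_Z^2$:

```latex
\begin{align*}
\mathrm{Cov}(X, Y) &= ab\,\sigma_Z^2, \qquad
\mathrm{Cov}(X, Z') = a\,\sigma_Z^2, \qquad
\mathrm{Cov}(Y, Z') = b\,\sigma_Z^2, \qquad
\mathrm{Var}(Z') = \sigma_Z^2 + \mathrm{Var}(e') \\[4pt]
\mathrm{Cov}(X, Y \mid Z')
  &= \mathrm{Cov}(X, Y) - \frac{\mathrm{Cov}(X, Z')\,\mathrm{Cov}(Y, Z')}{\mathrm{Var}(Z')} \\
  &= ab\,\sigma_Z^2 - \frac{ab\,\sigma_Z^4}{\sigma_Z^2 + \mathrm{Var}(e')} \\
  &= ab\,\sigma_Z^2 \cdot \frac{\mathrm{Var}(e')}{\sigma_Z^2 + \mathrm{Var}(e')}
\end{align*}
```

With $a, b \neq 0$ the partial covariance vanishes only when $\mathrm{Var}(e') = 0$, which matches the qualification on the slide.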

Strategies
1. Parameterize measurement error: Sensitivity Analysis, Bayesian Analysis, Bounds
2. Multiple Indicators
[Figure: X, Y, and Z, with Z measured by Z’ (error e’) under strategy 1 and by indicators Z1, Z2 under strategy 2.]

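Strategies
1. Parameterize measurement error: Sensitivity Analysis, Bayesian Analysis, Bounds
2. Multiple Indicators: Scales
[Figure: X, Y, and Z, with Z measured by Z’ (error e’) under strategy 1 and by indicators Z1, Z2 combined into Z_scale under strategy 2.]
X and Y remain dependent given Z_scale.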

Pseudorandom sample: N = 2,000
[Figure: true model over Parental Resources, Lead Exposure, and IQ, with edge coefficients -.5 and 1.0.]
Regression of IQ on Lead, PR:
Independent Variable   Coefficient Estimate   p-value
PR                     0.98                   0.000
Lead                   -0.088                 0.378

Multiple Measures of the Confounder
[Figure: Parental Resources, Lead Exposure, and IQ, with X1, X2, X3 (errors e1, e2, e3) measuring Parental Resources.]
X1 := g1 * Parental Resources + e1
X2 := g2 * Parental Resources + e2
X3 := g3 * Parental Resources + e3

Scales don't preserve conditional independence
[Figure: Parental Resources, Lead Exposure, and IQ, with X1, X2, X3 measuring Parental Resources.]
PR_scale = (X1 + X2 + X3) / 3
Regression of IQ on Lead, PR_scale:
Independent Variable   Coefficient Estimate   p-value
PR_scale               0.290                  0.000
Lead                   -0.423
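Why a scale fails to screen off Lead from IQ can be seen from population covariances. The sketch below is an illustration under stated assumptions, not a reproduction of the simulated data: it assigns the slide's coefficients as PR → Lead = -.5 and PR → IQ = 1.0, gives PR unit variance, and gives every indicator a loading of 1 and an error variance of 1. The function name is hypothetical.

```python
from fractions import Fraction

def partial_cov_lead_iq(num_indicators, error_var=Fraction(1)):
    """Partial covariance of Lead and IQ given the mean of `num_indicators` indicators of PR."""
    b_lead, b_iq = Fraction(-1, 2), Fraction(1)   # assumed: PR -> Lead, PR -> IQ
    var_pr = Fraction(1)                          # assumed variance of PR
    cov_lead_iq = b_lead * b_iq * var_pr
    cov_lead_scale = b_lead * var_pr
    cov_iq_scale = b_iq * var_pr
    # Averaging k indicators shrinks the error variance to error_var / k, but not to zero.
    var_scale = var_pr + error_var / num_indicators
    return cov_lead_iq - cov_lead_scale * cov_iq_scale / var_scale

with_three = partial_cov_lead_iq(3)               # scale built from X1, X2, X3
error_free = partial_cov_lead_iq(3, Fraction(0))  # conditioning on PR itself
print(with_three, error_free)  # -1/8 0
```

Under these assumptions the residual association is -0.5 / (k + 1) for k indicators: adding indicators shrinks it but never removes it.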

Indicators Don’t Preserve Conditional Independence
[Figure: Parental Resources, Lead Exposure, and IQ, with X1, X2, X3 measuring Parental Resources.]
Regress IQ on: Lead, X1, X2, X3
Independent Variable   Coefficient Estimate   p-value
X1                     0.22                   0.002
X2                     0.45                   0.000
X3                     0.18                   0.013
Lead                   -0.414

Strategies
1. Parameterize measurement error: Sensitivity Analysis, Bayesian Analysis, Bounds
2. Multiple Indicators: Scales, SEM
[Figure: SEM with latent Z measured by Z1 and Z2 (errors ez1, ez2), X and Y with errors ex and ey, and coefficient bxy.]
⇒ X _||_ Y | Z

Structural Equation Models Work
[Figure: true model and estimated model, each with Parental Resources measured by X1, X2, X3, plus Lead Exposure and IQ; the estimated model shows coefficient b.]
In the Structural Equation Model (p-value = .499), Lead and IQ are “screened off” by PR.

Coarsening is Bad
[Figure: Parental Resources, Lead Exposure, and IQ, with Parental Resources measured by PR_binary.]
Parental Resources < m(PR) ⇒ PR_binary = 0
Parental Resources ≥ m(PR) ⇒ PR_binary = 1
Independent Variable   Coefficient Estimate   p-value
PR_binary              3.53                   0.000
Lead                   -0.56
Screened-off at .05? No

TV → Obesity
Proctor, et al. (2003). Television viewing and change in body fat from preschool to early adolescence: The Framingham Children’s Study. International Journal of Obesity, 27, 827-833.
[Figure: TV, Exercise, Diet, and Obesity (BMI).]
Goals:
Estimate the influence of TV on BMI
Tease apart the mechanisms (diet, exercise)

Measures of Exercise, Diet
[Figure: TV (age 4), Exercise, Diet (Calories), and Obesity (BMI) at age 11, with Exercise measured by Exercise_M [L,H] and Diet by Diet_M [L,H].]
Exercise_M = L: calories expended in exercise in the bottom two tertiles
Exercise_M = H: calories expended in exercise in the top tertile
Diet_M = L: calories consumed in the bottom two tertiles
Diet_M = H: calories consumed in the top tertile

Measures of Exercise, Diet
[Figure: TV (age 4), Exercise, Diet (Calories), and Obesity (BMI) at age 11, with Exercise measured by Exercise_M [L,H] and Diet by Diet_M [L,H].]
Findings:
TV and Obesity NOT screened off by Exercise_M & Diet_M
Bias in mechanism estimation unknown

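Problems with Latent Variable SEMs
[Figure: a specified model and a true model, each over X, Y, latent Z, indicators Z1 and Z2, and error terms ex, ey, ez1, ez2; the slide contrasts the coefficient bxy with the independence X _||_ Y | Z.]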

Latent Variable Models

Psychometric Models Social/Personality Psychology

Psychometric Models Educational Research

Local Independence / Pure Measurement Models
[Figure: two measurement models, one not locally independent and one locally independent.]
Local Independence: for every pair of measured items xi, xj: xi _||_ xj | modeled latent parents of xi

Local Independence / Pure Measurement Models Impure 1st Order Pure

Local Independence / Pure Measurement Models 1st Order Pure 2nd Order Pure

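Rank 1 Constraints: Tetrad Equations
[Figure: latent L with loadings λ1, λ2, λ3, λ4 on W, X, Y, Z.]
Fact: given
$W = \lambda_1 L + \varepsilon_1$
$X = \lambda_2 L + \varepsilon_2$
$Y = \lambda_3 L + \varepsilon_3$
$Z = \lambda_4 L + \varepsilon_4$
it follows that
$\rho_{WX}\,\rho_{YZ} = \rho_{WY}\,\rho_{XZ} = \rho_{WZ}\,\rho_{XY}$ (tetrad constraints)
$\mathrm{Cov}_{WX}\,\mathrm{Cov}_{YZ} = (\lambda_1\lambda_2\sigma^2_L)(\lambda_3\lambda_4\sigma^2_L) = (\lambda_1\lambda_3\sigma^2_L)(\lambda_2\lambda_4\sigma^2_L) = \mathrm{Cov}_{WY}\,\mathrm{Cov}_{XZ}$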
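The tetrad equations are easy to verify numerically. The sketch below builds the covariances implied by a one-factor model and checks that the three products agree; the loadings 2, 3, 5, 7 and the unit variance of L are arbitrary illustrative choices, and the function names are hypothetical.

```python
from itertools import combinations

def one_factor_cov(loadings, var_l=1):
    """Off-diagonal covariances implied by item = loading * L + independent error."""
    return {frozenset(pair): loadings[pair[0]] * loadings[pair[1]] * var_l
            for pair in combinations(loadings, 2)}

def tetrad_products(cov, w, x, y, z):
    """The three covariance products that a single common factor forces to be equal."""
    c = lambda p, q: cov[frozenset((p, q))]
    return (c(w, x) * c(y, z), c(w, y) * c(x, z), c(w, z) * c(x, y))

cov = one_factor_cov({"W": 2, "X": 3, "Y": 5, "Z": 7})
products = tetrad_products(cov, "W", "X", "Y", "Z")
print(products)  # (210, 210, 210)
```

Each product equals λ1 λ2 λ3 λ4 times the squared variance of L, so the equality holds for any choice of loadings.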

Charles Spearman (1904): Statistical Constraints → Measurement Model Structure
[Figure: general factor g with measures m1, m2, r1, r2.]
$\rho_{m1,m2}\,\rho_{r1,r2} = \rho_{m1,r1}\,\rho_{m2,r2} = \rho_{m1,r2}\,\rho_{m2,r1}$

Impurities/Deviations from Local Independence defeat tetrad constraints selectively
[Figure: two measurement models over x1, x2, x3, x4, each annotated with the three tetrad constraints.]
$\rho_{x1,x2}\,\rho_{x3,x4} = \rho_{x1,x3}\,\rho_{x2,x4}$
$\rho_{x1,x2}\,\rho_{x3,x4} = \rho_{x1,x4}\,\rho_{x2,x3}$
$\rho_{x1,x3}\,\rho_{x2,x4} = \rho_{x1,x4}\,\rho_{x2,x3}$
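To see what "selectively" means, the following sketch adds one impurity, an extra covariance between x1 and x2 (for example, correlated errors), to a pure one-factor model and reports which tetrad constraints survive. The loadings 1, 2, 3, 4 and the size of the impurity are assumptions chosen for illustration.

```python
def tetrad_constraints_hold(cov):
    """Report which of the three tetrad constraints over x1..x4 hold in `cov`."""
    c = lambda i, j: cov[frozenset((i, j))]
    p1 = c(1, 2) * c(3, 4)
    p2 = c(1, 3) * c(2, 4)
    p3 = c(1, 4) * c(2, 3)
    return {"12*34 = 13*24": p1 == p2,
            "12*34 = 14*23": p1 == p3,
            "13*24 = 14*23": p2 == p3}

loadings = {1: 1, 2: 2, 3: 3, 4: 4}
# Pure one-factor model with Var(L) = 1: cov(xi, xj) = loading_i * loading_j.
pure = {frozenset((i, j)): loadings[i] * loadings[j]
        for i in loadings for j in loadings if i < j}
# Impurity: x1 and x2 share extra covariance beyond the common factor.
impure = dict(pure)
impure[frozenset((1, 2))] += 1

print(tetrad_constraints_hold(pure))
print(tetrad_constraints_hold(impure))
```

Only the two constraints that involve the x1, x2 covariance fail; the third still holds, which is the pattern that lets a search locate the impure pair.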

Strategies
1. Cluster and Purify MM first
   Use rank constraints to find item subsets that form nth-order pure clusters.
   Using the pure MM: search for the Structural Model by testing independence relations among latents via SEM estimation.
2. Specify Impure Measurement Model
   Specify a Measurement Model for all items.
   Using the specified MM: search for the Structural Model by testing independence relations among latents via SEM estimation.

Purify Impure 1st Order Purified

Search for Measurement Models
BPC, FOFC: Find One Factor Clusters
  Input: covariance matrix of measured items
  Output: subset of items and clusters that are 1st Order Pure
FTFC: Find Two Factor Clusters
  Input: covariance matrix of measured items
  Output: subset of items and clusters that are 2nd Order Pure

BPC Case Study: Stress, Depression, and Religion
Masters Students (N = 127), 61-item survey (Likert Scale)
Stress: St1 - St21
Depression: D1 - D20
Religious Coping: C1 - C20
Specified Model: P(χ²) = 0.00

Case Study: Stress, Depression, and Religion Build Pure Clusters

Case Study: Stress, Depression, and Religion
Assume: Stress causally prior to Depression
Find: Stress _||_ Coping | Depression
P(χ²) = 0.28

2nd Order Pure S(X) FTFC 2nd-Order Pure Clusters: {X1, X2, X3, X4, X5, X6} {X8, X9, X10, X11, X12}

Summary of Search

Causal Search from Passive Observation
PC, FGS → Patterns (Markov equivalence class, no latent confounding)
FCI → PAGs (Markov equivalence, including confounders and selection bias)
CCD → Linear cyclic models (no confounding)
LiNGAM → unique DAG (no confounding; linear non-Gaussian; faithfulness not needed)
BPC, FOFC, FTFC → equivalence class of linear latent variable models
LVLingam → set of DAGs (confounders allowed)
CyclicLingam → set of DGs (cyclic models, no confounding)
Non-linear additive noise models → unique DAG
Most of these algorithms are pointwise consistent; uniformly consistent algorithms require stronger assumptions.

Causal Search from Manipulations/Interventions
What sorts of manipulations/interventions have been studied?
Do(X=x): replace P(X | parents(X)) with P(X=x) = 1.0
Randomize(X): replace P(X | parents(X)) with P_M(X), e.g., uniform
Soft interventions: replace P(X | parents(X)) with P_M(X | parents(X), I), P_M(I)
Simultaneous interventions (reduce the number of experiments required to be guaranteed to find the truth with an independence oracle from N-1 to 2 log(N))
Sequential interventions
Sequential, conditional interventions
Time-sensitive interventions
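At the level of the graph, Do(X=x) and Randomize(X) both amount to the same surgery: X no longer listens to its parents, so every edge into X is removed while the edges out of X stay. A minimal sketch, assuming a DAG stored as a node-to-parents dictionary and a hypothetical confounded example L → X, L → Y, X → Y:

```python
def do(parents, target):
    """Graph surgery for a hard intervention on `target`: drop every edge into it."""
    return {node: (set() if node == target else set(ps))
            for node, ps in parents.items()}

# Assumed observational graph: L confounds X and Y, and X causes Y.
observational = {"L": set(), "X": {"L"}, "Y": {"L", "X"}}
manipulated = do(observational, "X")
print(manipulated)
```

In the manipulated graph the edge L → X is gone, so any remaining association between X and Y is carried by X → Y alone. A soft intervention would keep the parents of X and add the intervention variable I as a further parent.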