Day 3: Search Continued Center for Causal Discovery June 15, 2015

Carnegie Mellon University

Outline
Models → Data
1) Bridge Principles: Markov Axiom and D-separation
2) Model Equivalence
3) Model Search
   a) For Patterns
   b) For PAGs
4) Multiple Regression vs. Model Search
5) Measurement Issues and Latent Variables

Search Results?
Charitable Giving
Lead and IQ
Timberlake and Williams

Constraint-based Search for Patterns
1) Adjacency phase 2) Orientation phase

Constraint-based Search for Patterns: Adjacency phase
X and Y are not adjacent if they are independent conditional on any subset that doesn’t contain X and Y.
1) Adjacency
Begin with a fully connected undirected graph.
Remove adjacency X – Y if X _||_ Y | any set S.
So it is much more efficient to start with a completely adjacent graph and look to prune.
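The pruning loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Tetrad's implementation: it assumes linear Gaussian data summarized by a correlation matrix, uses a Fisher z test of partial correlation as the independence test, and the function names (`partial_corr`, `independent`, `adjacency_phase`) are hypothetical. The example input is the population correlation matrix of a chain X → Y → Z with standardized variables and coefficients 0.8, which is also an assumption made for illustration.

```python
import itertools
import math

def partial_corr(corr, x, y, cond):
    """Partial correlation of x and y given cond, by the standard recursion."""
    if not cond:
        return corr[x][y]
    k, rest = cond[0], cond[1:]
    r_xy = partial_corr(corr, x, y, rest)
    r_xk = partial_corr(corr, x, k, rest)
    r_yk = partial_corr(corr, y, k, rest)
    return (r_xy - r_xk * r_yk) / math.sqrt((1 - r_xk ** 2) * (1 - r_yk ** 2))

def independent(corr, n, x, y, cond, alpha=0.05):
    """Fisher z test: True when x _||_ y | cond is not rejected at level alpha."""
    r = max(min(partial_corr(corr, x, y, cond), 0.999999), -0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    p_value = math.erfc(stat / math.sqrt(2))  # two-sided normal tail probability
    return p_value > alpha

def adjacency_phase(corr, n, alpha=0.05):
    """Start fully connected; remove X - Y when some subset S separates them."""
    p = len(corr)
    adj = {i: set(range(p)) - {i} for i in range(p)}
    sepset = {}
    depth = 0  # size of the conditioning sets tried in this pass
    while any(len(adj[i]) - 1 >= depth for i in range(p)):
        for x in range(p):
            for y in sorted(adj[x]):
                for cond in itertools.combinations(sorted(adj[x] - {y}), depth):
                    if independent(corr, n, x, y, cond, alpha):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(cond)
                        break
        depth += 1
    return adj, sepset

# Population correlations for the chain X -> Y -> Z (indices 0, 1, 2).
chain = [[1.0, 0.8, 0.64],
         [0.8, 1.0, 0.8],
         [0.64, 0.8, 1.0]]
adj, sepset = adjacency_phase(chain, n=1000)
print(adj)
print(sepset)
```

On this input the X – Z adjacency is removed because X _||_ Z | Y, and the separating set {Y} is recorded for use in the orientation phase.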

Constraint-based Search for Patterns: Orientation phase
Collider test: Find triples X – Y – Z, orient according to whether the set that separated X – Z contains Y.
Away from collider test: Find triples X → Y – Z, orient the Y – Z connection via the collider test.
Repeat until no further orientations.
Apply Meek Rules.

Search: Orientation
Patterns
[Figure: unshielded triple X – Y – Z, with X and Z not adjacent]
Test: X _||_ Z | S, is Y ∈ S?
No: Collider, X → Y ← Z
Yes: Non-Collider, X – Y – Z

Search: Orientation Away from Collider

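Away from Collider Test: for X → Y – Z with X and Z not adjacent, orient Y – Z via the collider test.

Search: Orientation
After Adjacency Phase
Collider Test: X1 – X3 – X2, X1 _||_ X2, so orient X1 → X3 ← X2
Away from Collider Test: X1 → X3 – X4, X2 → X3 – X4, with X1 _||_ X4 | X3 and X2 _||_ X4 | X3, so orient X3 → X4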
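The two orientation tests can be sketched on the X1, X2, X3, X4 example above. This is a minimal illustration under stated assumptions: the adjacencies and separating sets are typed in by hand rather than estimated, the function names are hypothetical, and the Meek rules beyond away-from-collider are not implemented.

```python
def orient_colliders(adj, sepset):
    """Collider test on every unshielded triple X - Y - Z.

    Returns directed edges as (tail, head) pairs. X -> Y <- Z is added when
    Y is not in the set that separated X and Z.
    """
    directed = set()
    for y in adj:
        for x in adj[y]:
            for z in adj[y]:
                if x < z and z not in adj[x]:  # unshielded: X, Z not adjacent
                    if y not in sepset[frozenset((x, z))]:
                        directed.add((x, y))
                        directed.add((z, y))
    return directed

def orient_away_from_collider(adj, directed):
    """For X -> Y - Z with X, Z not adjacent and Y - Z unoriented, orient Y -> Z.

    Run after orient_colliders, so any such remaining triple already failed
    the collider test. Repeats until no further orientations.
    """
    directed = set(directed)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(directed):
            for z in adj[y]:
                if z == x or z in adj[x]:
                    continue
                if (y, z) not in directed and (z, y) not in directed:
                    directed.add((y, z))
                    changed = True
    return directed

# Skeleton after the adjacency phase: X1 - X3, X2 - X3, X3 - X4.
adj = {1: {3}, 2: {3}, 3: {1, 2, 4}, 4: {3}}
# X1 _||_ X2 (empty set), X1 _||_ X4 | X3, X2 _||_ X4 | X3.
sepset = {frozenset((1, 2)): set(),
          frozenset((1, 4)): {3},
          frozenset((2, 4)): {3}}

edges = orient_away_from_collider(adj, orient_colliders(adj, sepset))
print(sorted(edges))  # [(1, 3), (2, 3), (3, 4)]
```

The output reads X1 → X3 ← X2 from the collider test and X3 → X4 from the away-from-collider test.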

Away from Collider Power!
X2 – X3 oriented as X2 → X3
Why does this test also show that X2 and X3 are not confounded?

Independence Equivalence Classes: Patterns & PAGs
Patterns (Verma and Pearl, 1990): graphical representation of d-separation equivalence among models with no latent common causes.
PAGs (Richardson, 1994): graphical representation of a d-separation equivalence class that includes models with latent common causes and sample selection bias that are d-separation equivalent over a set of measured variables X.

PAGs: Partial Ancestral Graphs
1. represents a set of conditional independence and distribution equivalent graphs
2. same adjacencies
3. undirected edges mean some contain the edge one way, some contain it the other way
4. directed edge means they all go the same way
5. Pearl and Verma; complete rules for generating from Meek; Andersson, Perlman, and Madigan; and Chickering
6. instance of chain graph
7. since data can’t distinguish, in absence of background knowledge this is the right output for search
8. what are they good for?

PAGs: Partial Ancestral Graphs
What PAG edges mean.

PAG Search: Orientation
[Figure: unshielded triple X – Y – Z, with X and Z not adjacent]
X and Z not independent conditional on Y: Collider at Y
X _||_ Z | Y: Non-Collider at Y

PAG Search: Orientation
After Adjacency Phase
Collider Test: X1 – X3 – X2, X1 _||_ X2
Away from Collider Test: X1 → X3 – X4, X2 → X3 – X4, X1 _||_ X4 | X3, X2 _||_ X4 | X3

Interesting Cases
[Figure: graphs M1, M2, M3 over measured variables X, Y, Z, X1, X2, Y1, Y2, Z1, Z2 and latent variables L, L1, L2]

Tetrad Demo and Hands-on
1) Create new session
2) Select “Search from Simulated Data” from Template menu
3) Build graphs for M1, M2, M3 “interesting cases”, parameterize, instantiate, and generate sample data N=1,000.
4) Execute PC search, α = .05
5) Execute FCI search, α = .05
[Figure: graphs M1, M2, M3]

Regression & Causal Inference

Typical (non-experimental) strategy:
1. Establish a prima facie case (X associated with Y).
   But: omitted variable bias.
2. So, identify and measure potential confounders Z: prior to X, associated with X, associated with Y.
3. Statistically adjust for Z (multiple regression).

Multiple regression or any similar strategy is provably unreliable for causal inference regarding X → Y, with covariates Z, unless:
X is truly prior to Y
X, Z, and Y are causally sufficient (no confounding)
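A small simulation can illustrate the causal sufficiency condition. The model here is a hypothetical one chosen for illustration, not necessarily any graph from the slides: X has no effect on Y, but the covariate Z shares an unmeasured cause L1 with X and an unmeasured cause L2 with Y. Adjusting for Z then creates a nonzero coefficient on X. All coefficients, the seed, and the helper names are assumptions.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def regress_on_two(y, x, z):
    """OLS coefficients (b_x, b_z) of y on x and z via the 2x2 normal equations."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    det = sxx * szz - sxz ** 2
    return (szz * sxy - sxz * szy) / det, (sxx * szy - sxz * sxy) / det

rng = random.Random(0)
n = 20000
l1 = [rng.gauss(0, 1) for _ in range(n)]   # unmeasured cause of X and Z
l2 = [rng.gauss(0, 1) for _ in range(n)]   # unmeasured cause of Z and Y
x = [a + rng.gauss(0, 1) for a in l1]
z = [a + b + rng.gauss(0, 1) for a, b in zip(l1, l2)]
y = [b + rng.gauss(0, 1) for b in l2]      # X does not appear: no effect of X on Y

b_simple = cov(x, y) / cov(x, x)           # regression of Y on X alone
b_x, b_z = regress_on_two(y, x, z)         # regression of Y on X and Z

print("Y on X alone, coefficient of X:", round(b_simple, 3))
print("Y on X and Z, coefficient of X:", round(b_x, 3))
```

In the population the coefficient of X is 0 when Y is regressed on X alone and -0.2 when Z is added, so in this example the adjustment introduces the bias rather than removing it.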

Tetrad Demo and Hands-on
1) Create new session
2) Select “Search from Simulated Data” from Template menu
3) Build a graph for M4 “interesting cases”, parameterize as SEM, instantiate, and generate sample data N=1,000.
4) Execute PC search, α = .05
5) Execute FCI search, α = .05

Measurement

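Measurement Error and Coarsening Endanger Conditional Independence!
[Figure: X and Y are separated by Z, so X _||_ Y | Z; Z* is a coarsened measure of Z; Z’ = Z + e’ is a noisy measure of Z]
Coarsening:
-∞ < Z < 0 ⇒ Z* = 0
0 ≤ Z < i ⇒ Z* = 1
i ≤ Z < j ⇒ Z* = 2
...
k ≤ Z < ∞ ⇒ Z* = k
Measurement Error: Z’ = Z + e’
X and Y are not independent conditional on Z’ (unless Var(e’) = 0)
X and Y are not independent conditional on Z* (almost always)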
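The effect can be checked with a short simulation. This sketch assumes a specific model that the slide does not state: Z is a common cause of X and Y, all variables are Gaussian with unit-variance noise, Var(e’) = 1, and coarsening is a single cut at 0. Partial correlation is used as a linear stand-in for a conditional independence test, and the helper names are hypothetical.

```python
import math
import random

def corr(a, b):
    """Sample Pearson correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def partial_corr(x, y, z):
    """Partial correlation of x and y given a single variable z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(1)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [v + rng.gauss(0, 1) for v in z]            # X := Z + ex
y = [v + rng.gauss(0, 1) for v in z]            # Y := Z + ey
z_err = [v + rng.gauss(0, 1) for v in z]        # Z' = Z + e'
z_coarse = [0 if v < 0 else 1 for v in z]       # Z* with one cut point at 0

pc_true = partial_corr(x, y, z)
pc_err = partial_corr(x, y, z_err)
pc_coarse = partial_corr(x, y, z_coarse)

print("given Z :", round(pc_true, 3))
print("given Z':", round(pc_err, 3))
print("given Z*:", round(pc_coarse, 3))
```

Under these assumptions the population partial correlation is 0 given Z and 1/3 given Z’, and it is also clearly nonzero given the binary Z*.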

Strategies
1. Parameterize measurement error: Sensitivity Analysis, Bayesian Analysis, Bounds
2. Multiple Indicators
[Figure: X, Y, Z with measured Z’ = Z + e’; and X, Z, Y with indicators Z1, Z2 of Z]

Strategies: Multiple Indicators, Scales
[Figure: X, Z, Y with indicators Z1, Z2 of Z combined into Z_scale]
X and Y are generally not independent conditional on Z_scale.

Pseudorandom sample: N = 2,000
[Figure: Parental Resources, Lead Exposure, IQ; edge coefficients -.5 and 1.0]
Regression of IQ on Lead, PR
Independent Variable    Coefficient Estimate    p-value
PR                      0.98                    0.000
Lead                    -0.088                  0.378

Multiple Measures of the Confounder
[Figure: Lead Exposure, Parental Resources, IQ; X1, X2, X3 are measures of Parental Resources with errors e1, e2, e3]
X1 := g1 * Parental Resources + e1
X2 := g2 * Parental Resources + e2
X3 := g3 * Parental Resources + e3

Scales don't preserve conditional independence
[Figure: Lead Exposure, Parental Resources, IQ; X1, X2, X3 are measures of Parental Resources]
PR_Scale = (X1 + X2 + X3) / 3
Independent Variable    Coefficient Estimate    p-value
PR_scale                0.290                   0.000
Lead                    -0.423
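The scale result can be reproduced in outline with a simulation. The parameter values below are assumptions for illustration (Parental Resources → Lead with coefficient -0.5, Parental Resources → IQ with coefficient 1.0, g1 = g2 = g3 = 1, unit-variance errors, no effect of Lead on IQ), and the sample is larger than the slides' N = 2,000 to reduce noise, so the estimates will not match the table above. Helper names are hypothetical.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def regress_on_two(y, x1, x2):
    """OLS coefficients (b1, b2) of y on x1 and x2 via the 2x2 normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(3)
n = 20000
pr = [rng.gauss(0, 1) for _ in range(n)]             # Parental Resources (latent)
lead = [-0.5 * p + rng.gauss(0, 1) for p in pr]      # Lead Exposure
iq = [1.0 * p + rng.gauss(0, 1) for p in pr]         # IQ, no direct effect of Lead
x1 = [p + rng.gauss(0, 1) for p in pr]               # three noisy measures of PR
x2 = [p + rng.gauss(0, 1) for p in pr]
x3 = [p + rng.gauss(0, 1) for p in pr]
pr_scale = [(a + b + c) / 3 for a, b, c in zip(x1, x2, x3)]

b_lead_true, _ = regress_on_two(iq, lead, pr)        # adjusting for PR itself
b_lead_scale, _ = regress_on_two(iq, lead, pr_scale) # adjusting for the scale

print("Lead coefficient, adjusting for PR      :", round(b_lead_true, 3))
print("Lead coefficient, adjusting for PR_scale:", round(b_lead_scale, 3))
```

With these assumed parameters the Lead coefficient is near 0 when the true Parental Resources variable is in the regression and clearly negative when PR_scale replaces it.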

Indicators Don’t Preserve Conditional Independence
[Figure: Lead Exposure, Parental Resources, IQ; X1, X2, X3 are measures of Parental Resources]
Regress IQ on: Lead, X1, X2, X3
Independent Variable    Coefficient Estimate    p-value
X1                      0.22                    0.002
X2                      0.45                    0.000
X3                      0.18                    0.013
Lead                    -0.414

Strategies: Multiple Indicators, SEM
[Figure: X, Y, latent Z with indicators Z1, Z2; errors ex, ey, ez1, ez2; coefficient bxy on the X – Y edge]
The coefficient bxy is used to test X _||_ Y | Z.

Structural Equation Models Work
[Figure: True Model and Estimated Model over Parental Resources, Lead Exposure, IQ, with X1, X2, X3 as measures of Parental Resources; coefficient b in the Estimated Model]
In the Structural Equation Model (p-value = .499), Lead and IQ are “screened off” by PR.

Coarsening is Bad
[Figure: Parental Resources, Lead Exposure, IQ]
Parental Resources < m(PR) ⇒ PR_binary = 0
Parental Resources ≥ m(PR) ⇒ PR_binary = 1
Independent Variable    Coefficient Estimate    p-value
PR_binary               3.53                    0.000
Lead                    -0.56
Screened-off at .05? No

TV → Obesity
Proctor, et al. (2003). Television viewing and change in body fat from preschool to early adolescence: The Framingham Children’s Study. International Journal of Obesity, 27.
[Figure: TV, Exercise, Diet, Obesity (BMI)]
Goals:
Estimate the influence of TV on BMI
Tease apart the mechanisms (diet, exercise)

Measures of Exercise, Diet
[Figure: TV (age 4), Exercise, Diet (Calories), Obesity (BMI) Age 11; Exercise_M [L,H] measures Exercise, Diet_M [L,H] measures Diet]
Exercise_M = L: calories expended in exercise in bottom two tertiles
Exercise_M = H: calories expended in exercise in top tertile
Diet_M = L: calories consumed in bottom two tertiles
Diet_M = H: calories consumed in top tertile

Measures of Exercise, Diet
[Figure: TV (age 4), Exercise, Diet (Calories), Obesity (BMI) Age 11; Exercise_M [L,H] measures Exercise, Diet_M [L,H] measures Diet]
Findings:
TV and Obesity NOT screened off by Exercise_M & Diet_M
Bias in mechanism estimation unknown

Problems with Latent Variable SEMs
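[Figure: Specified Model and True Model over X, Y, latent Z with indicators Z1, Z2; errors ex, ey, ez1, ez2; coefficient bxy on the X – Y edge]
When the Specified Model differs from the True Model, the estimated coefficient need not equal bxy, so the SEM test of X _||_ Y | Z is no longer reliable.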

Latent Variable Models

Psychometric Models Social/Personality Psychology

Psychometric Models Educational Research

Local Independence / Pure Measurement Models
[Figure: two measurement models, one labeled Not Locally Independent and one labeled Locally Independent]
Local Independence: For every pair of measured items xi, xj: xi _||_ xj | modeled latent parents of xi

[Figure: measurement models labeled Impure and 1st Order Pure]

[Figure: measurement models labeled 1st Order Pure and 2nd Order Pure]

Rank 1 Constraints: Tetrad Equations
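[Figure: latent L with indicators W, X, Y, Z and loadings $\lambda_1, \lambda_2, \lambda_3, \lambda_4$]
Fact: given
$W = \lambda_1 L + \epsilon_1$
$X = \lambda_2 L + \epsilon_2$
$Y = \lambda_3 L + \epsilon_3$
$Z = \lambda_4 L + \epsilon_4$
it follows that
$\mathrm{Cov}_{WX}\,\mathrm{Cov}_{YZ} = (\lambda_1 \lambda_2 \sigma^2_L)(\lambda_3 \lambda_4 \sigma^2_L) = (\lambda_1 \lambda_3 \sigma^2_L)(\lambda_2 \lambda_4 \sigma^2_L) = \mathrm{Cov}_{WY}\,\mathrm{Cov}_{XZ}$
tetrad constraints:
$\rho_{WX}\,\rho_{YZ} = \rho_{WY}\,\rho_{XZ} = \rho_{WZ}\,\rho_{XY}$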
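The tetrad equations can be checked numerically from the covariances a one-factor model implies. The loadings and latent variance below are hypothetical values, and the helper names are not from the slides; this is a sketch of the algebra, not a statistical test.

```python
import math

def implied_cov(loadings, var_l):
    """Covariances implied by a single latent L with independent errors.

    Only the off-diagonal entries (loading_i * loading_j * Var(L)) are
    meaningful here; the diagonal omits the error variances.
    """
    k = len(loadings)
    return [[loadings[i] * loadings[j] * var_l for j in range(k)]
            for i in range(k)]

def tetrad_products(c, w, x, y, z):
    """The three products compared by the tetrad constraints."""
    return (c[w][x] * c[y][z], c[w][y] * c[x][z], c[w][z] * c[x][y])

# Hypothetical loadings for W, X, Y, Z (indices 0..3) and Var(L) = 2.
c = implied_cov([0.9, 0.7, 0.5, 0.3], var_l=2.0)
t1, t2, t3 = tetrad_products(c, 0, 1, 2, 3)
print(t1, t2, t3)
```

All three products equal the product of the four loadings times Var(L) squared, which is why the constraints hold for any choice of loadings.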

Charles Spearman (1904)
Statistical Constraints → Measurement Model Structure
[Figure: g with indicators m1, m2, r1, r2]
$\rho_{m1,m2}\,\rho_{r1,r2} = \rho_{m1,r1}\,\rho_{m2,r2} = \rho_{m1,r2}\,\rho_{m2,r1}$

Impurities/Deviations from Local Independence
defeat tetrad constraints selectively
[Figure: two impure measurement models over x1, x2, x3, x4, each shown with the three tetrad constraints]
$\rho_{x1,x2}\,\rho_{x3,x4} = \rho_{x1,x3}\,\rho_{x2,x4}$
$\rho_{x1,x2}\,\rho_{x3,x4} = \rho_{x1,x4}\,\rho_{x2,x3}$
$\rho_{x1,x3}\,\rho_{x2,x4} = \rho_{x1,x4}\,\rho_{x2,x3}$
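One way to see the selectivity is to add a single impurity to a one-factor model and recompute the three tetrad products. The sketch below assumes hypothetical loadings, Var(L) = 1, and an impurity in the form of correlated errors between x1 and x2 (an extra covariance of 0.2); the names are illustrative only.

```python
import math

def pair_cov(loadings, var_l=1.0, extra=None):
    """Covariances between distinct indicators of one latent.

    `extra` maps an index pair to additional covariance, e.g. from
    correlated errors between those two indicators.
    """
    c = {}
    for i in range(len(loadings)):
        for j in range(i + 1, len(loadings)):
            c[(i, j)] = loadings[i] * loadings[j] * var_l
    for pair, value in (extra or {}).items():
        c[pair] += value
    return c

def tetrad_products(c):
    """Products for x1..x4 (indices 0..3): c12*c34, c13*c24, c14*c23."""
    return (c[(0, 1)] * c[(2, 3)],
            c[(0, 2)] * c[(1, 3)],
            c[(0, 3)] * c[(1, 2)])

lam = [0.9, 0.7, 0.5, 0.3]                                   # hypothetical loadings
pure = tetrad_products(pair_cov(lam))
impure = tetrad_products(pair_cov(lam, extra={(0, 1): 0.2}))  # x1, x2 errors correlated

print("pure  :", pure)
print("impure:", impure)
```

In the impure model the two constraints that involve the x1, x2 covariance fail, while the third, which does not involve it, still holds.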

Strategies
1. Cluster and Purify MM first
Use rank constraints to find item subsets that form nth order pure clusters.
Using Pure MM: search for Structural Model by testing independence relations among latents via SEM estimation.
2. Specify Impure Measurement Model
Specify Measurement Model for all items.
Using Specified MM: search for Structural Model by testing independence relations among latents via SEM estimation.

Purify
[Figure: measurement models labeled Impure and 1st Order Purified]

Search for Measurement Models
BPC, FOFC: Find One Factor Clusters
Input: Covariance Matrix of measured items
Output: Subset of items and clusters that are 1st Order Pure
FTFC: Find Two Factor Clusters
Input: Covariance Matrix of measured items
Output: Subset of items and clusters that are 2nd Order Pure

BPC Case Study: Stress, Depression, and Religion
Masters Students (N = 127)
item survey (Likert Scale)
Stress: St1 - St21
Depression: D1 - D20
Religious Coping: C1 - C20
Specified Model: P(χ²) = 0.00

Case Study: Stress, Depression, and Religion
Build Pure Clusters

Case Study: Stress, Depression, and Religion
Assume: Stress causally prior to Depression
Find: Stress _||_ Coping | Depression
P(χ²) = 0.28

2nd Order Pure
Σ(X) → FTFC → 2nd-Order Pure Clusters:
{X1, X2, X3, X4, X5, X6}
{X8, X9, X10, X11, X12}

Summary of Search

Causal Search from Passive Observation
PC, FGS → Patterns (Markov equivalence class, no latent confounding)
FCI → PAGs (Markov equivalence, including confounders and selection bias)
CCD → Linear cyclic models (no confounding)
Lingam → unique DAG (no confounding, linear non-Gaussian, faithfulness not needed)
BPC, FOFC, FTFC → Equivalence class of linear latent variable models
LVLingam → set of DAGs (confounders allowed)
CyclicLingam → set of DGs (cyclic models, no confounding)
Non-linear additive noise models → unique DAG
Most of these algorithms are pointwise consistent; uniformly consistent algorithms require stronger assumptions.

Causal Search from Manipulations/Interventions
What sorts of manipulation/interventions have been studied?
Do(X=x): replace P(X | parents(X)) with P(X=x) = 1.0
Randomize(X): replace P(X | parents(X)) with PM(X), e.g., uniform
Soft interventions: replace P(X | parents(X)) with PM(X | parents(X), I), PM(I)
Simultaneous interventions: reduces the number of experiments required to be guaranteed to find the truth with an independence oracle from N-1 to 2 log(N)
Sequential interventions
Sequential, conditional interventions
Time sensitive interventions
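The difference between passive observation and Randomize(X) can be sketched with a simulated linear model. The model is a hypothetical one chosen for illustration: Z is a common cause of X and Y, and X affects Y with coefficient 0.5. Under randomization, P(X | parents(X)) is replaced by a uniform PM(X) on [-2, 2]. The function names and seed are assumptions.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def sample(n, rng, randomize):
    """Draw (X, Y) pairs; if randomize, X ignores its parent Z."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        if randomize:
            x = rng.uniform(-2, 2)        # PM(X): uniform, independent of Z
        else:
            x = z + rng.gauss(0, 1)       # P(X | parents(X)): X := Z + ex
        y = 0.5 * x + z + rng.gauss(0, 1) # Y := 0.5 X + Z + ey
        xs.append(x)
        ys.append(y)
    return xs, ys

rng = random.Random(2)
obs_slope = slope(*sample(20000, rng, randomize=False))
exp_slope = slope(*sample(20000, rng, randomize=True))

print("observational slope of Y on X:", round(obs_slope, 3))
print("randomized slope of Y on X   :", round(exp_slope, 3))
```

In the population the observational slope is 1.0 because of confounding by Z, while the slope under randomization recovers the assumed causal coefficient 0.5.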
