Center for Causal Discovery: Summer Short Course/Datathon - 2018 June 13, 2018 Causal Search II
Outline Models Data Bridge Principles: Markov Axiom and D-separation Model Equivalence Model Search For Patterns For PAGs Multiple Regression vs. Model Search Mixed Variables – Discrete and Continuous Orienting via the degree of Gaussianity
Independence Equivalence Classes: Patterns & PAGs Patterns (Verma and Pearl, 1990): graphical representation of d-separation equivalence among models with no latent common causes PAGs: (Richardson 1994) graphical representation of a d-separation equivalence class that includes models with latent common causes and sample selection bias that are d-separation equivalent over a set of measured variables X
PAGs: Partial Ancestral Graphs 1. represents set of conditional independence and distribution equivalent graphs 2. same adjacencies 3. undirected edges mean some contain edge one way, some contain other way 4. directed edge means they all go same way 5. Pearl and Verma -complete rules for generating from Meek, Andersson, Perlman, and Madigan, and Chickering 6. instance of chain graph 7. since data can’t distinguish, in absence of background knowledge is right output for search 8. what are they good for?
PAGs: Partial Ancestral Graphs 1. represents set of conditional independence and distribution equivalent graphs 2. same adjacencies 3. undirected edges mean some contain edge one way, some contain other way 4. directed edge means they all go same way 5. Pearl and Verma -complete rules for generating from Meek, Andersson, Perlman, and Madigan, and Chickering 6. instance of chain graph 7. since data can’t distinguish, in absence of background knowledge is right output for search 8. what are they good for?
PAGs: Partial Ancestral Graphs 1. represents set of conditional independence and distribution equivalent graphs 2. same adjacencies 3. undirected edges mean some contain edge one way, some contain other way 4. directed edge means they all go same way 5. Pearl and Verma -complete rules for generating from Meek, Andersson, Perlman, and Madigan, and Chickering 6. instance of chain graph 7. since data can’t distinguish, in absence of background knowledge is right output for search 8. what are they good for?
PAGs: Partial Ancestral Graphs What PAG edges mean. 1. represents set of conditional independence and distribution equivalent graphs 2. same adjacencies 3. undirected edges mean some contain edge one way, some contain other way 4. directed edge means they all go same way 5. Pearl and Verma -complete rules for generating from Meek, Andersson, Perlman, and Madigan, and Chickering 6. instance of chain graph 7. since data can’t distinguish, in absence of background knowledge is right output for search 8. what are they good for?
PAG Search: Orientation Y Unshielded X Y Z X _||_ Z | Y X _||_ Z | Y Collider Non-Collider X Y Z X Y Z
PAG Search: Orientation After Adjacency Phase Collider Test: X1 – X3 – X2 X1 _||_ X2 Away from Collider Test: X1 X3 – X4 X2 X3 – X4 X1 _||_ X4 | X3 X2 _||_ X4 | X3
Tetrad Demo and Hands-on: College Plans Use FCI to search College Plans data, a = .01 Repeat with bootstrap
Interesting Cases X1 X2 L M1 M2 X Y Z Y1 Y2 L1 L L1 Z1 L2 X Y Z1 X Z2
Tetrad Demo and Hands-on Create new session Select “Simulate from a given graph, then search” from Pipelines menu Build a graph for interesting case M1, parameterize as you wish, and generate a large sample (e.g., N=1,000). Execute a Pattern search, e.g., PC Execute a PAG search, e.g., FCI. Explain the results Repeat for M2 M2 X1 Y2 L1 Y1 X2 M1 L X Y Z
Detectable Instrumental Variables X Y Z1 M3 Z2 L
rZ,Obesity rZ,TV Observational Studies - Instrumental Variables Goal: Estimate b2 SES a g b2 Traffic Density Upwind (IV) b1 Lead IQ C1 C2 Cn Idea: find a “natural” randomizer, i.e., an “instrument” rZ,Obesity rZ,TV Required Assumptions: IV adjacent and into cause Any trek from IV to effect goes through cause IV causally disconnected from, i.e., independent of,every confounder Z direct cause of Contract $ Z _||_ Every Confounder 1
rZ,Obesity rZ,TV Observational Studies - Instrumental Variables SES a g Traffic Density Upwind (IV) b1 b2 Lead IQ C1 C2 Cn In a standardized SEM: rZ,Obesity rZ,TV rIV,IQ rIV,Lead b1 * b2 = b2 = b1 rIV,effect rIV,cause = IV estimate of cause effect coefficient Z direct cause of Contract $ Z _||_ Every Confounder 1
Tetrad Demo and Hands-on Simulate from M3 interpreted as a standardized SEM Execute a PAG search, e.g., FCI. Use the sample correlation matrix to compute IV estimate of XY coefficient using Z1, and again separately for Z2. X Y Z1 M3 Z2 L Sample Correlation Matrix
Regression & Causal Inference Z1 X Z2 M4 Y
Regression & Causal Inference X Y ?? Size of the effect? Typical (non-experimental) strategy: Establish a prima facie case (X associated with Y) But, omitted variable bias X Y Z ? So, identifiy and measure potential confounders Z that are: prior to X, associated with X, associated with Y 3. Statistically control/adjust for Z (multiple regression) 1
Regression & Causal Inference Multiple regression or any similar strategy is provably unreliable for causal inference regarding X Y, with covariates Z, unless: X truly prior to Y X, Z, and Y are causally sufficient (no confounding) Multiple regression : Response Y, Independent Variable X, Covariates Z Coefficient bX is an unbiased estimate of the causal influence of X on Y, if and only if: in the true causal graph, after removing any edge that exists between X and Y, X and Y are d-separated by Z 1
Tetrad Demo and Hands-on Simulate from M4 parameterized as a standardized SEM, N=1,000. Execute FCI search Execute regression: Y response, X only predictor Execute regression: Y response, X and Z1 and Z2 as predictors
Outline Models Data Bridge Principles: Markov Axiom and D-separation Model Equivalence Model Search For Patterns For PAGs Multiple Regression vs. Model Search Mixed Variables – Discrete and Continuous (Joe Ramsey) Orienting via the degree of Gaussianity (Joe Ramsey)
Summary of Search
Causal Search from Passive Observation PC, FGS Patterns (Markov equivalence class - no latent confounding) FCI PAGs (Markov equivalence - including confounders and selection bias) CCD Linear cyclic models (no confounding) Lingam unique DAG (no confounding – linear non-Gaussian – faithfulness not needed) BPC, FOFC, FTFC (Equivalence class of linear latent variable models) LVLingam set of DAGs (confounders allowed) CyclicLingam set of DGs (cyclic models, no confounding) Non-linear additive noise models unique DAG Most of these algorithms are pointwise consistent – uniform consistent algorithms require stronger assumptions 1
Causal Search from Manipulations/Interventions What sorts of manipulation/interventions have been studied? Do(X=x) : replace P(X | parents(X)) with P(X=x) = 1.0 Randomize(X): (replace P(X | parents(X)) with PM(X), e.g., uniform) Soft interventions (replace P(X | parents(X)) with PM(X | parents(X), I), PM(I)) Simultaneous interventions (reduces the number of experiments required to be guaranteed to find the truth with an independence oracle from N-1 to 2 log(N) Sequential interventions Sequential, conditional interventions Time sensitive interventions 1
Randomization Association = Causation Randomizer Treatment Response Dropout Treatment _||_ Response | Dropout = no Treatment Response Randomizer U Assignment Treatment _||_ Response 1