Richard Scheines Carnegie Mellon University Searching for Statistical Causal Models: Theory and Practice Richard Scheines Carnegie Mellon University
Goals Policy, Law, and Science: How can we use data to answer subjunctive questions (effects of future policy interventions), or counterfactual questions (what would have happened had things been done differently (law)? scientific questions (what mechanisms run the world) Rumsfeld Problem: Do we know what we don’t know: Can we tell when there is or is not enough information in the data to answer causal questions?
Early Progenitors g rm1 * rr1 = rm2 * rr2 Charles Spearman (1904) Statistical Constraints Causal Structure g m1 m2 r1 r2 rm1 * rr1 = rm2 * rr2 1
Early Progenitors Sewall Wright (1920s,1930s) Graphical Model Causal & Statistical Interpretation The Method of Path Coefficients (1934). Annals of Mathematical Statistics, 5, 161-215. 1
Social Sciences: 1940s 1970s Factor Analysis Instrumental Variable Estimators Structural Equation Models Simultaneous Equation Models Economics: Cowles Commission Franklin Fisher Art Goldberger Clive Granger Herb Simon Haavelmo R. Strotz H. Wold Sociology, Psychometrics, etc. Hubert Blalock Herb Costner Otis Dudley Duncan David Heise David Kenny Ken Bollen 1
Population level Counterfactuals 1
1970s & 1980s:Graphical Models & Independence Structures S. Lauritzen J. Darroch T. Speed H. Kiiveri N. Wermuth D. Hausman D. Papineau P. Dawid D. Cox J. Robins J. Whittaker Judea Pearl 1988 1
1988 1993: Axioms, Intervention, and Latent Variable Model Search P. Spirtes, C. Glymour, and R. Scheines Causal Markov Axiom Full model of interventions, both surgical and non-surgical Equivalence classes for latent variable models, with search 1
Modern Non-Parametric Theory of Statistical Causal Models Intervention & Manipulation Graphical Models Counterfactuals Constraints (Independence) Modern Non-Parametric Theory of Statistical Causal Models Causal Bayes Nets 1
Semantics of SCMs Choice 1: Take direct causation as primitive, axiomatize Causal systems over V Probabilistic Independence Relations in P(V) Choice 2: Define direct causation in terms of intervention, i.e., (hypothetical) treatment)
Choice 1: Causal Markov Axiom If G is a causal graph, and P a probability distribution over the variables in G, then in <G,P> satisfy the Markov Axiom iff: every variable V is independent of its non-effects, conditional on its immediate causes.
Causal Graphs Causal Graph G = {V,E} Each edge X Y represents a direct causal claim: X is a direct cause of Y relative to V Chicken Pox 1. don’t define causality - but will introduce axioms to connect probability to causality 2. many fields proceed without agreement on definition - probability, “force” in mechanics, interpretation of quantum mechanics, etc. 3. a number of different kinds of graphs represent probability distributions and independence - advantage of directed graphs is also represents causal relations 4. will introduce several extensions
Causal Graphs Not Cause Complete Common Cause Complete 1. don’t define causality - but will introduce axioms to connect probability to causality 2. many fields proceed without agreement on definition - probability, “force” in mechanics, interpretation of quantum mechanics, etc. 3. a number of different kinds of graphs represent probability distributions and independence - advantage of directed graphs is also represents causal relations 4. will introduce several extensions
Interventions & Causal Graphs Model an intervention by adding an “intervention” variable outside the original system as a direct cause of its target. Pre-intervention graph Intervene on Income “Soft” Intervention Fat Hand - intervention - cholesterol drug -- arythmia “Hard” Intervention
Structural Equation Models Causal Graph 1. Structural Equations 2. Statistical Constraints Statistical Causal Model 1. example of recursive structural equation model without correlated errors 2. can show that assumption of independence of errors guarantees correctness of probabilitic interpretation 3. this represents both probability and causality
Structural Equation Models Causal Graph Structural Equations: One Assignment Equation for each variable V V := f(parents(V), errorV) for SEM (linear regression) f is a linear function Statistical Constraints: Joint Distribution over the Error terms 1. example of recursive structural equation model without correlated errors 2. can show that assumption of independence of errors guarantees correctness of probabilitic interpretation 3. this represents both probability and causality
Structural Equation Models Causal Graph Equations: Education := ed Income :=Educationincome Longevity :=EducationLongevity Statistical Constraints: (ed, Income,Income ) ~N(0,2) 2diagonal - no variance is zero SEM Graph (path diagram) 1. example of recursive structural equation model without correlated errors 2. can show that assumption of independence of errors guarantees correctness of probabilitic interpretation 3. this represents both probability and causality
Two Routes to the Causal Markov Condition Assumption 1: Weak Causal Markov Assumption V1,V2 causally disconnected V1 _||_ V2 Assumption 2a: Causal Markov Axiom Assumption 2b: Determinism, e.g., Structural Equations For each Vi V, Vi := f(parents(Vi)) 1
Choice 2: Define Direct Causation from Intervention X is a cause of Y iff x1 x2 P(Y | X set= x1) P(Y | X set= x2) X is a direct cause of Y relative to S, iff z,x1 x2 P(Y | X set= x1 , Z set= z) P(Y | X set= x2 , Z set= z) where Z = S - {X,Y}
Modularity of Intervention/Manipulation Causal Graph Structural Equations: Education := ed Longevity :=f1(Education)Longevity Income := f2(Education)income Manipulated Causal Graph M1 Manipulated Structural Equations: Education := ed Longevity :=f1(Education)Longevity Income := f3(M1) 1. example of recursive structural equation model without correlated errors 2. can show that assumption of independence of errors guarantees correctness of probabilitic interpretation 3. this represents both probability and causality
Manipulation --> Causal Markov Manipulation conception of causation and Modularity --> weak version of CMA Zhang, Jiji and Spirtes, Peter (2007) Detection of Unfaithfulness and Robust Causal Inference. In [2007] LSE-Pitt Conference: Confirmation, Induction and Science (London, 8 - 10 March, 2007) http://philsci-archive.pitt.edu/archive/00003188/01/Detection_of_Unfaithfulness_and_Robust_Causal_Inference.pdf 1
SCM Search Statistical Data Causal Structure Statistical Inference Background Knowledge - X2 before X3
Faithfulness Constraints on a probability distribution P generated by a causal structure G hold for all parameterizations of G. Revenues = aRate + cEconomy + eRev. Economy = bRate + eEcon. Faithfulness: a ≠ -bc 1
Equivalence Classes Equivalence: Independence (M1 ╞ X _||_ Y | Z M2 ╞ X _||_ Y | Z) Distribution (q1 q2 M1(q1) = M2(q2)) Independence (d-separation equivalence) DAGs : Patterns PAGs : Latent variable models Intervention Equivalence Classes Measurement Model Equivalence Classes Linear Non-Gaussian Model Equivalence Classes
Patterns 1. represents set of conditional independence and distribution equivalent graphs 2. same adjacencies 3. undirected edges mean some contain edge one way, some contain other way 4. directed edge means they all go same way 5. Pearl and Verma -complete rules for generating from Meek, Andersson, Perlman, and Madigan, and Chickering 6. instance of chain graph 7. since data can’t distinguish, in absence of background knowledge is right output for search 8. what are they good for?
Patterns: What the Edges Mean 1. represents set of conditional independence and distribution equivalent graphs 2. same adjacencies 3. undirected edges mean some contain edge one way, some contain other way 4. directed edge means they all go same way 5. Pearl and Verma -complete rules for generating from Meek, Andersson, Perlman, and Madigan, and Chickering 6. instance of chain graph 7. since data can’t distinguish, in absence of background knowledge is right output for search 8. what are they good for?
PAGs: Partial Ancestral Graphs 1. represents set of conditional independence and distribution equivalent graphs 2. same adjacencies 3. undirected edges mean some contain edge one way, some contain other way 4. directed edge means they all go same way 5. Pearl and Verma -complete rules for generating from Meek, Andersson, Perlman, and Madigan, and Chickering 6. instance of chain graph 7. since data can’t distinguish, in absence of background knowledge is right output for search 8. what are they good for?
PAGs: Partial Ancestral Graphs What PAG edges mean. 1. represents set of conditional independence and distribution equivalent graphs 2. same adjacencies 3. undirected edges mean some contain edge one way, some contain other way 4. directed edge means they all go same way 5. Pearl and Verma -complete rules for generating from Meek, Andersson, Perlman, and Madigan, and Chickering 6. instance of chain graph 7. since data can’t distinguish, in absence of background knowledge is right output for search 8. what are they good for?
Overview of Search Methods Constraint Based Searches TETRAD (SGS, PC, FCI) Very fast – max N ~ 1,000 Pointwise Consistent Scoring Searches Scores: BIC, AIC, etc. Search: Hill Climb, Genetic Alg., Simulated Annealing Difficult to extend to latent variable models Meek and Chickering Greedy Equivalence Class (GES) Very slow – max N ~ 30-40
Tetrad Demo http://www.phil.cmu.edu/projects/tetrad_download/
Case Study 1: Foreign Investment Does Foreign Investment in 3rd World Countries cause Repression? Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49, 141-146. N = 72 PO degree of political exclusivity CV lack of civil liberties EN energy consumption per capita (economic development) FI level of foreign investment 1
Case Study 1: Foreign Investment Correlations po fi en fi -.175 en -.480 0.330 cv 0.868 -.391 -.430 1
Case Study 1: Foreign Investment Regression Results po = .227*fi - .176*en + .880*cv SE (.058) (.059) (.060) t 3.941 -2.99 14.6 Interpretation: increases in foreign investment increases political exclusion 1
Case Study 1: Foreign Investment Alternatives No model with testable constraint (df > 0) in which FI has a positive effect on PO
Case Study 2: Welfare Reform Single Mothers’ Self-Efficacy, Parenting in the Home Environment, and Children’s Development in a Two-Wave Study (Social Work Research, 29, 1, 7-20) Aurora Jackson, Richard Scheines
Case Study 2: Welfare Reform Sampling Scheme Longitudinal Data Time 1: 1996-97 (N = 188) Time 2: 1998-99 (N = 178) Single black mothers in NYC Current and former welfare recipients With a child who was 3 – 5 at time 1, and 6 to 8 at time 2
Constructs/Scales/Measures Case Study 2: Welfare Reform Constructs/Scales/Measures Employment Status Perceived Self-efficacy Depressive Symptoms Quality of Mother/Father Relationship Father/Child Contact Quality of Home Environment Behavior Problems Cognitive Development
Case Study 2: Welfare Reform Background Knowledge Tier 1: Employment Status Tier 2: Depression Self-efficacy Mother/Father Relationship Father/Child Contact Mother’s Parenting/HOME Tier 3: Negative Behaviors Cognitive Development Over 22 million path models consistent with these constraints
Case Study 2: Welfare Reform Conceptual Model c2 = 22.3, df = 20, p = .32 Tetrad Equivalence Class c2 = 18.87, df = 19, p = .46
Case Study 2: Welfare Reform Points of Agreement: Mother’s Self-Efficacy mediates the effect of Employment on all other variables. HOME environment mediates the effect of all other factors on outcomes: Cog. Develop and Prob. Behaviors Conceptual Model Points of Disagreement: Depression key cause vs. only an effect Tetrad
Online Course in Causal & Statistical Reasoning Case Study 3: Online Courseware Online Course in Causal & Statistical Reasoning
Case Study 3: Online Courseware Variables Pre-test (%) Print-outs (% modules printed) Quiz Scores (avg. %) Voluntary Exercises (% completed) Final Exam (%) 9 other variables
Printing and Voluntary Comprehension Checks: 2002 --> 2003 Case Study 3: Online Courseware Printing and Voluntary Comprehension Checks: 2002 --> 2003 2002 2003
Case Study 4: Charitable Giving Variables Tangibility/Concreteness (Exp manipulation) Imaginability (likert 1-7) Impact (avg. of 2 likerts) Sympathy (likert) Donation ($)
Case Study 4: Charitable Giving Theoretical Model study 1 (N= 94) df = 5, c2 = 52.0, p= 0.0000 study 2 (N= 115) df = 5, c2 = 62.6, p= 0.0000
Case Study 4: Charitable Giving study 1: df = 5, c2 = 5.88, p= 0.32 study 2: df = 5, c2 = 8.23, p= 0.14 GES Outputs study 1: df = 5, c2 = 3.99, p= 0.55 study 2: df = 5, c2 = 7.48, p= 0.18
The Causal Theory Formation Problem for Latent Variable Models Given observations on a number of variables, identify the latent variables that underlie these variables and the causal relations among these latent concepts. Example: Spectral measurements of solar radiation intensities. Variables are intensities at each measured frequency. Example: Quality of a Child’s Home Environment, Cumulative Exposure to Lead, Cognitive Functioning
The Most Common Automatic Solution: Exploratory Factor Analysis Chooses “factors” to account linearly for as much of the variance/covariance of the measured variables as possible. Great for dimensionality reduction Factor rotations are arbitrary Gives no information about the statistical and thus the causal dependencies among any real underlying factors. No general theory of the reliability of the procedure
Other Solutions Independent Components, etc Background Theory Scales
Other Solutions: Background Theory Specified Model Key Causal Question Thus, key statistical question: Lead _||_ Cog | Home ?
Other Solutions: Background Theory True Model “Impurities” Lead _||_ Cog | Home ? Yes, but statistical inference will say otherwise.
Purify Specified Model
Purify True Model
Purify True Model
Purify True Model
Purify True Model
Purify Purified Model
Other Solutions: Scales Scale = sum(measures of a latent)
Other Solutions: Scales True Model Pseudo-Random Sample: N = 2,000
Scales vs. Latent variable Models True Model Regression: Cognition on Home, Lead Predictor Coef SE Coef T P Constant -0.02291 0.02224 -1.03 0.303 Home 1.22565 0.02895 42.33 0.000 Lead -0.00575 0.02230 -0.26 0.797 S = 0.9940 R-Sq = 61.1% R-Sq(adj) = 61.0% Insig.
Scales vs. Latent variable Models True Model Scales homescale = (x1 + x2 + x3)/3 leadscale = (x4 + x5 + x6)/3 cogscale = (x7 + x8 + x9)/3
Scales vs. Latent variable Models True Model Regression: Cognition on homescale, Lead Cognition = - 0.0295 + 0.714 homescale - 0.178 Lead Predictor Coef SE Coef T P Constant -0.02945 0.02516 -1.17 0.242 homescal 0.71399 0.02299 31.05 0.000 Lead -0.17811 0.02386 -7.46 0.000 Sig.
Scales vs. Latent variable Models True Model Modeling Latents Specified Model
Scales vs. Latent variable Models True Model Estimated Model (c2 = 29.6, df = 24, p = .19) B5 = .0075, which at t=.23, is correctly insignificant
Scales vs. Latent variable Models True Model Mixing Latents and Scales (c2 = 14.57, df = 12, p = .26) B5 = -.137, which at t=5.2, is incorrectly highly significant P < .001
Build Pure Clusters True Model Output Output - provably reliable (pointwise consistent): Equivalence class of measurement models over a pure subset of measures True Model Output
Build Pure Clusters Qualitative Assumptions Quantitative Assumptions: Two types of nodes: measured (M) and latent (L) M L (measured don’t cause latents) Each m M measures (is a direct effect of) at least one l L No cycles involving M Quantitative Assumptions: Each m M is a linear function of its parents plus noise P(L) has second moments, positive variances, and no deterministic relations
Case Study 4: Stress, Depression, and Religion MSW Students (N = 127) 61 - item survey (Likert Scale) Stress: St1 - St21 Depression: D1 - D20 Religious Coping: C1 - C20 Specified Model p = 0.00
Case Study 4: Stress, Depression, and Religion Build Pure Clusters
Case Study 4: Stress, Depression, and Religion Assume Stress temporally prior: MIMbuild to find Latent Structure: p = 0.28
Case Study 5: Test Anxiety Bartholomew and Knott (1999), Latent variable models and factor analysis 12th Grade Males in British Columbia (N = 335) 20 - item survey (Likert Scale items): X1 - X20: Exploratory Factor Analysis:
Case Study 5: Test Anxiety Build Pure Clusters:
Case Study 5: Test Anxiety Build Pure Clusters: Exploratory Factor Analysis: p-value = 0.00 p-value = 0.47
Case Study 5: Test Anxiety MIMbuild p = .43 Unininformative Scales: No Independencies or Conditional Independencies
Other Cases Economics Climate Research Bessler, Pork Prices Hoover, multiple Cryder & Loewenstein, Charitable Giving Climate Research Glymour, Chu, , Teleconnections Epidemiology Scheines, Lead & IQ Biology Shipley, SGS, Spartina Grass Educational Research Easterday, Bias & Recall Laski, Numerical coding Neuroscience Glymour & Ramsey, fMRI
References General Spirtes, P., Glymour, C., Scheines, R. (2000). Causation, Prediction, and Search, 2nd Edition, MIT Press. Pearl, J. (2000). Causation: Models of Reasoning and Inference, Cambridge University Press. Biology Chu, Tianjaio, Glymour C., Scheines, R., & Spirtes, P, (2002). A Statistical Problem for Inference to Regulatory Structure from Associations of Gene Expression Measurement with Microarrays. Bioinformatics, 19: 1147-1152. Shipley, B. Exploring hypothesis space: examples from organismal biology. Computation, Causation and Discovery. C. Glymour and G. Cooper. Cambridge, MA, MIT Press. Shipley, B. (1995). Structured interspecific determinants of specific leaf area in 34 species of herbaceous angeosperms. Functional Ecology 9.
References Scheines, R. (2000). Estimating Latent Causal Influences: TETRAD III Variable Selection and Bayesian Parameter Estimation: the effect of Lead on IQ, Handbook of Data Mining, Pat Hayes, editor, Oxford University Press. Jackson, A., and Scheines, R., (2005). Single Mothers' Self-Efficacy, Parenting in the Home Environment, and Children's Development in a Two-Wave Study, Social Work Research , 29, 1, pp. 7-20. Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49, 141-146.
References Economics Akleman, Derya G., David A. Bessler, and Diana M. Burton. (1999). ‘Modeling corn exports and exchange rates with directed graphs and statistical loss functions’, in Clark Glymour and Gregory F. Cooper (eds) Computation, Causation, and Discovery, American Association for Artificial Intelligence, Menlo Park, CA and MIT Press, Cambridge, MA, pp. 497-520. Awokuse, T. O. (2005) “Export-led Growth and the Japanese Economy: Evidence from VAR and Directed Acyclical Graphs,” Applied Economics Letters 12(14), 849-858. Bessler, David A. and N. Loper. (2001) “Economic Development: Evidence from Directed Acyclical Graphs” Manchester School 69(4), 457-476. Bessler, David A. and Seongpyo Lee. (2002). ‘Money and prices: U.S. data 1869-1914 (a study with directed graphs)’, Empirical Economics, Vol. 27, pp. 427-46. Demiralp, Selva and Kevin D. Hoover. (2003) !Searching for the Causal Structure of a Vector Autoregression," Oxford Bulletin of Economics and Statistics 65(supplement), pp. 745-767. Haigh, M.S., N.K. Nomikos, and D.A. Bessler (2004) “Integration and Causality in International Freight Markets: Modeling with Error Correction and Directed Acyclical Graphs,” Southern Economic Journal 71(1), 145-162. Sheffrin, Steven M. and Robert K. Triest. (1998). ‘A new approach to causality and economic growth’, unpublished typescript, University of California, Davis.
References Economics Swanson, Norman R. and Clive W.J. Granger. (1997). ‘Impulse response functions based on a causal approach to residual orthogonalization in vector autoregressions’, Journal of the American Statistical Association, Vol. 92, pp. 357-67. Demiralp, S., Hoover, K., & Perez, S. A Bootstrap Method for Identifying and Evaluating a Structural Vector Autoregression Oxford Bulletin of Economics and Statistics, 2008, 70, (4), 509-533 - Searching for the Causal Structure of a Vector Autoregression Oxford Bulletin of Economics and Statistics, 2003, 65, (s1), 745-767 Kevin D. Hoover, Selva Demiralp, Stephen J. Perez, Empirical Identification of the Vector Autoregression: The Causes and Effects of U.S. M2*, This paper was written to present at the Conference in Honour of David F. Hendry at Oxford University, 2325 August 2007. Selva Demiralp and Kevin D. Hoover , Searching for the Causal Structure of a Vector Autoregression, OXFORD BULLETIN OF ECONOMICS AND STATISTICS, 65, SUPPLEMENT (2003) 0305-9049 A. Moneta, and P. Spirtes “Graphical Models for the Identification of Causal Structures in Multivariate Time Series Model”, Proceedings of the 2006 Joint Conference on Information Sciences, JCIS 2006, Kaohsiung, Taiwan, ROC, October 8-11,2006, Atlantis Press, 2006.
References Causation, Prediction, and Search, 2nd Edition, (2000), by P. Spirtes, C. Glymour, and R. Scheines ( MIT Press) Causality: Models, Reasoning, and Inference (2000). By Judea Pearl, Cambridge Univ. Press Computation, Causation, & Discovery (1999), edited by C. Glymour and G. Cooper, MIT Press Eberhardt, F., and Scheines R., (2007).“Interventions and Causal Inference”, in PSA-2006, Proceedings of the 20th biennial meeting of the Philosophy of Science Association 2006 http://philsci.org/news/PSA06 Silva, R., Glymour, C., Scheines, R. and Spirtes, P. (2006) “Learning the Structure of Latent Linear Structure Models,” Journal of Machine Learning Research, 7, 191-246. TETRAD IV: www.phil.cmu.edu/projects/tetrad Web Course on Causal and Statistical Reasoning : www.phil.cmu.edu/projects/csr/ Causality Lab: www.phil.cmu.edu/projects/causality-lab 1