Richard Scheines Carnegie Mellon University

Searching for Statistical Causal Models: Theory and Practice Richard Scheines Carnegie Mellon University

Goals
Policy, Law, and Science: How can we use data to answer
subjunctive questions (effects of future policy interventions),
counterfactual questions (what would have happened had things been done differently; law), or
scientific questions (what mechanisms run the world)?
Rumsfeld Problem: Do we know what we don’t know? Can we tell when there is or is not enough information in the data to answer causal questions?

Early Progenitors
Charles Spearman (1904)
Statistical Constraints --> Causal Structure
[Figure: g with measures m1, m2, r1, r2]
ρm1 * ρr1 = ρm2 * ρr2
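The idea that a causal structure implies a testable statistical constraint can be illustrated with a small simulation. The sketch below assumes a linear one-factor model in which g is the only common cause of m1, m2, r1, and r2; the loadings and sample size are made up for illustration. It checks the tetrad constraint in its usual textbook form, ρ(m1,r1)·ρ(m2,r2) = ρ(m1,r2)·ρ(m2,r1), which may be written differently from the slide's notation.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_one_factor(n, loadings, seed=0):
    """Draw n cases from: measure := loading * g + independent noise."""
    rng = random.Random(seed)
    data = {name: [] for name in loadings}
    for _ in range(n):
        g = rng.gauss(0, 1)  # the single latent common cause
        for name, loading in loadings.items():
            data[name].append(loading * g + rng.gauss(0, 1))
    return data

# Hypothetical loadings, chosen only for illustration.
LOADINGS = {"m1": 0.9, "m2": 0.7, "r1": 0.8, "r2": 0.6}
data = simulate_one_factor(20000, LOADINGS)

def rho(a, b):
    return pearson(data[a], data[b])

# Under a single common cause this difference vanishes in the population.
tetrad_difference = rho("m1", "r1") * rho("m2", "r2") - rho("m1", "r2") * rho("m2", "r1")
print(f"tetrad difference: {tetrad_difference:.4f}")
```

In a sample the difference is only approximately zero, so a real analysis would use a statistical test of the vanishing tetrad rather than an exact comparison.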

Early Progenitors Sewall Wright (1920s,1930s)
Graphical Model  Causal & Statistical Interpretation The Method of Path Coefficients (1934). Annals of Mathematical Statistics, 5, 1

Social Sciences: 1940s to 1970s
Factor Analysis
Instrumental Variable Estimators
Structural Equation Models
Simultaneous Equation Models
Economics: Cowles Commission, Franklin Fisher, Art Goldberger, Clive Granger, Herb Simon, Haavelmo, R. Strotz, H. Wold
Sociology, Psychometrics, etc.: Hubert Blalock, Herb Costner, Otis Dudley Duncan, David Heise, David Kenny, Ken Bollen

Population level Counterfactuals

1970s & 1980s: Graphical Models & Independence Structures
S. Lauritzen, J. Darroch, T. Speed, H. Kiiveri, N. Wermuth, D. Hausman, D. Papineau, P. Dawid, D. Cox, J. Robins, J. Whittaker, Judea Pearl (1988)

1988  1993: Axioms, Intervention, and Latent Variable Model Search
P. Spirtes, C. Glymour, and R. Scheines Causal Markov Axiom Full model of interventions, both surgical and non-surgical Equivalence classes for latent variable models, with search 1

Modern Non-Parametric Theory of Statistical Causal Models
Intervention & Manipulation
Graphical Models
Counterfactuals
Constraints (Independence)
Causal Bayes Nets

Semantics of SCMs
Choice 1: Take direct causation as primitive, axiomatize: Causal systems over V --> Probabilistic Independence Relations in P(V)
Choice 2: Define direct causation in terms of intervention, i.e., (hypothetical) treatment

Choice 1: Causal Markov Axiom
If G is a causal graph and P a probability distribution over the variables in G, then <G,P> satisfies the Markov Axiom iff every variable V is independent of its non-effects, conditional on its immediate causes.
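For an acyclic graph G, the Markov condition is equivalent to a factorization of the joint distribution, a standard result stated here for reference. The notation pa_G(V_i) for the immediate causes (parents) of V_i in G is introduced for this note and is not from the slides.

```latex
P(V_1, \dots, V_n) \;=\; \prod_{i=1}^{n} P\bigl(V_i \mid \mathrm{pa}_G(V_i)\bigr)
```

For example, in the chain X → Y → Z this gives P(X, Y, Z) = P(X) P(Y | X) P(Z | Y), which entails that Z is independent of X given Y.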

Causal Graphs
Causal Graph G = {V,E}
Each edge X --> Y represents a direct causal claim: X is a direct cause of Y relative to V
[Figure: Chicken Pox example]
Notes:
1. We don’t define causality, but will introduce axioms to connect probability to causality.
2. Many fields proceed without agreement on a definition: probability, “force” in mechanics, the interpretation of quantum mechanics, etc.
3. A number of different kinds of graphs represent probability distributions and independence; the advantage of directed graphs is that they also represent causal relations.
4. Will introduce several extensions.

Causal Graphs Not Cause Complete Common Cause Complete

Interventions & Causal Graphs
Model an intervention by adding an “intervention” variable outside the original system as a direct cause of its target.
Pre-intervention graph
Intervene on Income
“Soft” Intervention
Fat Hand intervention: cholesterol drug, arrhythmia
“Hard” Intervention

Structural Equation Models
Causal Graph
Statistical Causal Model:
1. Structural Equations
2. Statistical Constraints
Notes:
1. Example of a recursive structural equation model without correlated errors.
2. Can show that the assumption of independence of errors guarantees correctness of the probabilistic interpretation.
3. This represents both probability and causality.

Causal Graph
Structural Equations: one assignment equation for each variable V
V := f(parents(V), error_V)
For SEM (linear regression), f is a linear function
Statistical Constraints: joint distribution over the error terms

Causal Graph
Equations:
Education := ε_ed
Income := β1·Education + ε_Income
Longevity := β2·Education + ε_Longevity
Statistical Constraints:
(ε_ed, ε_Income, ε_Longevity) ~ N(0, Σ²)
Σ² diagonal, no variance is zero
SEM Graph (path diagram)
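The Education, Income, Longevity model can be simulated directly from its assignment equations. The sketch below assumes unit-variance Gaussian errors and made-up coefficient values (0.8 and 0.5); it shows that Income and Longevity are correlated marginally but approximately uncorrelated once Education is partialled out, which is the statistical constraint this graph implies.

```python
import random

def corr(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def partial_corr(xs, ys, zs):
    """First-order partial correlation of xs and ys given zs."""
    rxy, rxz, ryz = corr(xs, ys), corr(xs, zs), corr(ys, zs)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5

def simulate_sem(n, beta1=0.8, beta2=0.5, seed=1):
    """One assignment equation per variable; independent errors."""
    rng = random.Random(seed)
    education, income, longevity = [], [], []
    for _ in range(n):
        ed = rng.gauss(0, 1)                          # Education := error_ed
        education.append(ed)
        income.append(beta1 * ed + rng.gauss(0, 1))   # Income := b1*Education + error
        longevity.append(beta2 * ed + rng.gauss(0, 1))  # Longevity := b2*Education + error
    return education, income, longevity

education, income, longevity = simulate_sem(20000)
marginal = corr(income, longevity)
conditional = partial_corr(income, longevity, education)
print(f"corr(Income, Longevity)             = {marginal:.3f}")
print(f"corr(Income, Longevity | Education) = {conditional:.3f}")
```

For linear Gaussian models a vanishing partial correlation corresponds to conditional independence, which is why the partial correlation is used as the check here.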

Two Routes to the Causal Markov Condition
Assumption 1: Weak Causal Markov Assumption: V1, V2 causally disconnected ⇒ V1 _||_ V2
Assumption 2a: Causal Markov Axiom
Assumption 2b: Determinism, e.g., Structural Equations: for each Vi ∈ V, Vi := f(parents(Vi))

Choice 2: Define Direct Causation from Intervention
X is a cause of Y iff ∃ x1 ≠ x2 such that P(Y | X set= x1) ≠ P(Y | X set= x2)
X is a direct cause of Y relative to S iff ∃ z, x1 ≠ x2 such that P(Y | X set= x1, Z set= z) ≠ P(Y | X set= x2, Z set= z), where Z = S - {X,Y}
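The two interventionist definitions can be checked by exact computation on a small example. The sketch below assumes a hypothetical binary chain X → Z → Y with made-up conditional probabilities and S = {X, Z, Y}; it is an illustration of the definitions, not a general procedure. In this chain X comes out as a cause of Y but not as a direct cause of Y relative to S.

```python
from itertools import permutations

# Hypothetical chain X -> Z -> Y over binary variables (values are made up).
P_Z1_GIVEN_X = {0: 0.2, 1: 0.9}  # P(Z=1 | X=x)
P_Y1_GIVEN_Z = {0: 0.1, 1: 0.7}  # P(Y=1 | Z=z); Y's mechanism never mentions X

def p_y1(set_x, set_z=None):
    """P(Y=1) after setting X, and optionally also Z, by intervention."""
    if set_z is not None:
        # Z is held fixed by intervention, so X can no longer act through it.
        return P_Y1_GIVEN_Z[set_z]
    total = 0.0
    for z in (0, 1):
        p_z = P_Z1_GIVEN_X[set_x] if z == 1 else 1 - P_Z1_GIVEN_X[set_x]
        total += p_z * P_Y1_GIVEN_Z[z]
    return total

def differs(p, q, tol=1e-12):
    return abs(p - q) > tol

def is_cause():
    """Some pair x1 != x2 changes the distribution of Y."""
    return any(differs(p_y1(x1), p_y1(x2)) for x1, x2 in permutations((0, 1), 2))

def is_direct_cause():
    """Some z and x1 != x2 change Y's distribution with Z = S - {X, Y} held at z."""
    return any(
        differs(p_y1(x1, z), p_y1(x2, z))
        for z in (0, 1)
        for x1, x2 in permutations((0, 1), 2)
    )

print("X is a cause of Y:       ", is_cause())
print("X is a direct cause of Y:", is_direct_cause())
```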

Modularity of Intervention/Manipulation
Causal Graph
Structural Equations:
Education := ε_ed
Longevity := f1(Education) + ε_Longevity
Income := f2(Education) + ε_Income
Manipulated Causal Graph (intervention variable M1)
Manipulated Structural Equations:
Education := ε_ed
Longevity := f1(Education) + ε_Longevity
Income := f3(M1)
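Modularity says that an intervention replaces the equation of its target and leaves every other equation untouched. The sketch below represents each structural equation as a function and models a hard intervention on Income by swapping in a new equation driven by an outside source. The linear forms chosen for f1 and f2, and the distribution used for M1, are assumptions made only for illustration.

```python
import random

ORDER = ["Education", "Longevity", "Income"]  # a causal order for evaluation

def make_equations():
    """Pre-intervention structural equations (f1, f2 are hypothetical)."""
    return {
        "Education": lambda v, rng: rng.gauss(0, 1),
        "Longevity": lambda v, rng: 0.5 * v["Education"] + rng.gauss(0, 1),
        "Income": lambda v, rng: 0.8 * v["Education"] + rng.gauss(0, 1),
    }

def intervene(equations, target, new_equation):
    """Return a copy in which only the target's equation is replaced."""
    manipulated = dict(equations)
    manipulated[target] = new_equation
    return manipulated

def sample(equations, n, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        values = {}
        for name in ORDER:
            values[name] = equations[name](values, rng)
        rows.append(values)
    return rows

original = make_equations()
# Income := f3(M1): here M1 is an exogenous draw and f3 is the identity.
manipulated = intervene(original, "Income", lambda v, rng: rng.gauss(5, 1))

pre = sample(original, 5000)
post = sample(manipulated, 5000)
mean_income_post = sum(row["Income"] for row in post) / len(post)
print(f"mean Income after intervention: {mean_income_post:.2f}")
```

Because `intervene` copies the dictionary and overwrites a single entry, the Education and Longevity mechanisms in the manipulated model are the very same function objects as before, which is the modularity claim expressed in code.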

Manipulation --> Causal Markov
Manipulation conception of causation and Modularity --> weak version of CMA
Zhang, Jiji and Spirtes, Peter (2007). Detection of Unfaithfulness and Robust Causal Inference. In LSE-Pitt Conference: Confirmation, Induction and Science (London, March 2007).

SCM Search
Statistical Data --> Causal Structure
Statistical Inference Background Knowledge - X2 before X3

Faithfulness
Constraints on a probability distribution P generated by a causal structure G hold for all parameterizations of G.
Revenues = a·Rate + c·Economy + ε_Rev
Economy = b·Rate + ε_Econ
Faithfulness: a ≠ -bc
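The tax example shows how a particular parameterization can hide a causal connection: the total effect of Rate on Revenues is a + bc, so when a = -bc the direct and indirect paths cancel and the two variables are uncorrelated even though Rate causes Revenues. A minimal simulation sketch, assuming unit-variance Gaussian errors and made-up coefficient values:

```python
import random

def corr(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def rate_revenue_correlation(a, b, c, n=20000, seed=2):
    """Simulate Economy = b*Rate + e, Revenues = a*Rate + c*Economy + e."""
    rng = random.Random(seed)
    rate, revenues = [], []
    for _ in range(n):
        r = rng.gauss(0, 1)
        economy = b * r + rng.gauss(0, 1)
        rate.append(r)
        revenues.append(a * r + c * economy + rng.gauss(0, 1))
    return corr(rate, revenues)

b, c = 0.6, 0.7  # hypothetical values
faithful = rate_revenue_correlation(a=0.5, b=b, c=c)         # a != -b*c
unfaithful = rate_revenue_correlation(a=-(b * c), b=b, c=c)  # a == -b*c: paths cancel
print(f"faithful parameters:   corr(Rate, Revenues) = {faithful:.3f}")
print(f"unfaithful parameters: corr(Rate, Revenues) = {unfaithful:.3f}")
```

A search procedure that reads independence as absence of a causal connection would be misled by the second parameterization, which is why faithfulness is stated as an explicit assumption.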

Equivalence Classes
Equivalence:
Independence: M1 ╞ X _||_ Y | Z ⇔ M2 ╞ X _||_ Y | Z
Distribution: ∀θ1 ∃θ2 M1(θ1) = M2(θ2)
Independence (d-separation equivalence):
DAGs : Patterns
PAGs : Latent variable models
Intervention Equivalence Classes
Measurement Model Equivalence Classes
Linear Non-Gaussian Model Equivalence Classes

Patterns
Notes:
1. A pattern represents a set of conditional-independence-equivalent and distribution-equivalent graphs.
2. All graphs in the set have the same adjacencies.
3. An undirected edge means some graphs contain the edge one way and some contain it the other way.
4. A directed edge means all graphs orient it the same way.
5. Pearl and Verma; complete rules for generating patterns from Meek; Andersson, Perlman, and Madigan; and Chickering.
6. A pattern is an instance of a chain graph.
7. Since data can’t distinguish among the graphs, a pattern is the right output for search in the absence of background knowledge.
8. What are they good for?
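Whether two DAGs belong to the same pattern can be checked with the standard graphical criterion: they are d-separation equivalent exactly when they have the same adjacencies and the same unshielded colliders. The sketch below is one small way to code that check; the edge-list representation, with each edge written as a (cause, effect) pair, and the three example graphs are choices made for this illustration.

```python
from itertools import combinations

def skeleton(edges):
    """Adjacencies of a DAG, ignoring direction."""
    return {frozenset(edge) for edge in edges}

def unshielded_colliders(edges):
    """Triples a -> child <- b in which a and b are not adjacent."""
    adjacencies = skeleton(edges)
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
    colliders = set()
    for child, child_parents in parents.items():
        for a, b in combinations(sorted(child_parents), 2):
            if frozenset((a, b)) not in adjacencies:
                colliders.add((a, child, b))
    return colliders

def markov_equivalent(edges1, edges2):
    """Same adjacencies and same unshielded colliders."""
    return (skeleton(edges1) == skeleton(edges2)
            and unshielded_colliders(edges1) == unshielded_colliders(edges2))

chain = [("X", "Y"), ("Y", "Z")]     # X -> Y -> Z
fork = [("Y", "X"), ("Y", "Z")]      # X <- Y -> Z
collider = [("X", "Y"), ("Z", "Y")]  # X -> Y <- Z

print(markov_equivalent(chain, fork))      # same pattern: X - Y - Z
print(markov_equivalent(chain, collider))  # different: the collider is identifiable
```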

Patterns: What the Edges Mean

PAGs: Partial Ancestral Graphs

PAGs: Partial Ancestral Graphs
What PAG edges mean.

Overview of Search Methods
Constraint-Based Searches
TETRAD (SGS, PC, FCI)
Very fast: max N ~ 1,000
Pointwise consistent
Scoring Searches
Scores: BIC, AIC, etc.
Search: hill climbing, genetic algorithms, simulated annealing
Difficult to extend to latent variable models
Meek and Chickering: Greedy Equivalence Search (GES)
Very slow: max N ~ 30-40
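The core move of a constraint-based search is to remove the edge between two variables whenever some set of their neighbors renders them independent. The sketch below shows an adjacency phase in the style of PC, using a d-separation oracle on a known graph in place of statistical tests so the logic can be seen without sampling noise. It is an illustrative sketch rather than the Tetrad implementation, and the four-variable graph is made up.

```python
from itertools import combinations

def ancestors(dag, nodes):
    """The given nodes plus all their ancestors; dag maps child -> set of parents."""
    found, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in found:
            found.add(v)
            stack.extend(dag.get(v, ()))
    return found

def d_separated(dag, x, y, z):
    """Oracle test: separation in the moralized ancestral graph of {x, y} and z."""
    keep = ancestors(dag, {x, y} | set(z))
    neighbors = {v: set() for v in keep}
    for child in keep:
        child_parents = dag.get(child, set())
        for p in child_parents:
            neighbors[child].add(p)
            neighbors[p].add(child)
        for p, q in combinations(child_parents, 2):  # "marry" co-parents
            neighbors[p].add(q)
            neighbors[q].add(p)
    seen, stack = {x}, [x]
    while stack:  # search for a path from x to y that avoids z
        v = stack.pop()
        for w in neighbors[v]:
            if w == y:
                return False
            if w not in seen and w not in z:
                seen.add(w)
                stack.append(w)
    return True

def pc_skeleton(variables, independent):
    """Start complete; drop x - y when a neighbor subset separates them."""
    adj = {v: set(variables) - {v} for v in variables}
    depth = 0
    while any(len(adj[x]) - 1 >= depth for x in variables):
        for x in variables:
            for y in list(adj[x]):
                for cond in combinations(sorted(adj[x] - {y}), depth):
                    if independent(x, y, set(cond)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        break
        depth += 1
    return {frozenset((x, y)) for x in variables for y in adj[x]}

# Hypothetical true graph: X1 -> X3 <- X2, X3 -> X4
true_dag = {"X3": {"X1", "X2"}, "X4": {"X3"}}
found = pc_skeleton(["X1", "X2", "X3", "X4"],
                    lambda x, y, z: d_separated(true_dag, x, y, z))
print(sorted(tuple(sorted(edge)) for edge in found))
```

With real data the oracle would be replaced by a statistical test of conditional independence, and a further orientation phase would turn the adjacencies into a pattern.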

Tetrad Demo

Case Study 1: Foreign Investment
Does Foreign Investment in 3rd World Countries cause Repression?
Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49.
N = 72
PO: degree of political exclusivity
CV: lack of civil liberties
EN: energy consumption per capita (economic development)
FI: level of foreign investment

Correlations among po, fi, en, cv [table values not preserved]

Regression Results
Regression of po on fi, en, cv [coefficients and t statistics not preserved]
SE: (.058) (.059) (.060)
Interpretation: increases in foreign investment increase political exclusion

Alternatives No model with testable constraint (df > 0) in which FI has a positive effect on PO

Case Study 2: Welfare Reform
Single Mothers’ Self-Efficacy, Parenting in the Home Environment, and Children’s Development in a Two-Wave Study (Social Work Research, 29, 1, 7-20) Aurora Jackson, Richard Scheines

Sampling Scheme
Longitudinal data: Time 1 (N = 188), Time 2 (N = 178)
Single black mothers in NYC
Current and former welfare recipients
With a child who was 3 to 5 at time 1, and 6 to 8 at time 2

Constructs/Scales/Measures
Employment Status
Perceived Self-efficacy
Depressive Symptoms
Quality of Mother/Father Relationship
Father/Child Contact
Quality of Home Environment
Behavior Problems
Cognitive Development

Background Knowledge
Tier 1: Employment Status
Tier 2: Depression, Self-efficacy, Mother/Father Relationship, Father/Child Contact, Mother’s Parenting/HOME
Tier 3: Negative Behaviors, Cognitive Development
Over 22 million path models consistent with these constraints

Conceptual Model: χ² = 22.3, df = 20, p = .32
Tetrad Equivalence Class: χ² = 18.87, df = 19, p = .46

Points of Agreement:
Mother’s Self-Efficacy mediates the effect of Employment on all other variables.
HOME environment mediates the effect of all other factors on outcomes: Cog. Develop and Prob. Behaviors
Points of Disagreement:
Depression: key cause (Conceptual Model) vs. only an effect (Tetrad)

Case Study 3: Online Courseware
Online Course in Causal & Statistical Reasoning

Case Study 3: Online Courseware
Variables
Pre-test (%)
Print-outs (% modules printed)
Quiz Scores (avg. %)
Voluntary Exercises (% completed)
Final Exam (%)
9 other variables

Printing and Voluntary Comprehension Checks: 2002 --> 2003
[Figure panels: 2002, 2003]

Case Study 4: Charitable Giving
Variables
Tangibility/Concreteness (experimental manipulation)
Imaginability (Likert 1-7)
Impact (avg. of 2 Likerts)
Sympathy (Likert)
Donation ($)

Theoretical Model
study 1 (N = 94): df = 5, χ² = 52.0, p =
study 2 (N = 115): df = 5, χ² = 62.6, p =

study 1: df = 5, χ² = 5.88, p = 0.32
study 2: df = 5, χ² = 8.23, p = 0.14
GES Outputs
study 1: df = 5, χ² = 3.99, p = 0.55
study 2: df = 5, χ² = 7.48, p = 0.18

The Causal Theory Formation Problem for Latent Variable Models
Given observations on a number of variables, identify the latent variables that underlie these variables and the causal relations among these latent concepts.
Example: Spectral measurements of solar radiation intensities. Variables are intensities at each measured frequency.
Example: Quality of a Child’s Home Environment, Cumulative Exposure to Lead, Cognitive Functioning

The Most Common Automatic Solution: Exploratory Factor Analysis
Chooses “factors” to account linearly for as much of the variance/covariance of the measured variables as possible.
Great for dimensionality reduction
Factor rotations are arbitrary
Gives no information about the statistical and thus the causal dependencies among any real underlying factors.
No general theory of the reliability of the procedure

Other Solutions
Independent Components, etc.
Background Theory
Scales

Other Solutions: Background Theory
Specified Model Key Causal Question Thus, key statistical question: Lead _||_ Cog | Home ?

Other Solutions: Background Theory
True Model “Impurities” Lead _||_ Cog | Home ? Yes, but statistical inference will say otherwise.

Purify Specified Model

Purify True Model

Purify Purified Model

Other Solutions: Scales
Scale = sum(measures of a latent)

Other Solutions: Scales
True Model Pseudo-Random Sample: N = 2,000

Scales vs. Latent variable Models
True Model
Regression: Cognition on Home, Lead
[Table columns: Predictor, Coef, SE Coef, T, P; rows: Constant, Home, Lead; values not preserved]
R-Sq = 61.1%, R-Sq(adj) = 61.0%
Lead: insignificant

True Model
Scales
homescale = (x1 + x2 + x3)/3
leadscale = (x4 + x5 + x6)/3
cogscale = (x7 + x8 + x9)/3

True Model
Regression: Cognition on homescale, Lead
[Table columns: Predictor, Coef, SE Coef, T, P; rows: Constant, homescale, Lead; values not preserved]
Lead: significant
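Why a scale can make Lead look significant when it has no effect can be reproduced in a few lines. The sketch below assumes a hypothetical true model in which Home causes Cognition, Lead is correlated with Home, and Lead has no effect on Cognition; all coefficients, the noise levels, and the sample size are made up. Because homescale measures Home with error, conditioning on it does not fully screen off Lead, and Lead picks up part of Home's effect.

```python
import random

def ols_two_predictors(y, x1, x2):
    """Slopes of y on x1 and x2 (with intercept), via the 2x2 normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(3)
home, lead, cognition, homescale = [], [], [], []
for _ in range(20000):
    h = rng.gauss(0, 1)                          # latent Home
    home.append(h)
    lead.append(-0.6 * h + rng.gauss(0, 1))      # Lead correlated with Home
    cognition.append(1.0 * h + rng.gauss(0, 1))  # no Lead term: Lead has no effect
    measures = [h + rng.gauss(0, 1) for _ in range(3)]  # x1, x2, x3
    homescale.append(sum(measures) / 3)

_, lead_coef_true_home = ols_two_predictors(cognition, home, lead)
_, lead_coef_homescale = ols_two_predictors(cognition, homescale, lead)
print(f"Lead coefficient, controlling for true Home: {lead_coef_true_home:.3f}")
print(f"Lead coefficient, controlling for homescale: {lead_coef_homescale:.3f}")
```

Averaging more indicators shrinks the measurement error in the scale and therefore the bias, but it does not remove it, which is the motivation for modeling the latent variables directly.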

True Model Modeling Latents Specified Model

True Model
Estimated Model (χ² = 29.6, df = 24, p = .19)
B5 = .0075, which at t = .23 is correctly insignificant

True Model
Mixing Latents and Scales (χ² = 14.57, df = 12, p = .26)
B5 = -.137, which at t = 5.2 is incorrectly highly significant (P < .001)

Build Pure Clusters True Model Output
Output - provably reliable (pointwise consistent): equivalence class of measurement models over a pure subset of measures

Build Pure Clusters
Qualitative Assumptions:
Two types of nodes: measured (M) and latent (L)
M ↛ L (measured don’t cause latents)
Each m ∈ M measures (is a direct effect of) at least one l ∈ L
No cycles involving M
Quantitative Assumptions:
Each m ∈ M is a linear function of its parents plus noise
P(L) has second moments, positive variances, and no deterministic relations

Case Study 4: Stress, Depression, and Religion
MSW Students (N = 127)
item survey (Likert Scale)
Stress: St1 - St21
Depression: D1 - D20
Religious Coping: C1 - C20
Specified Model: p = 0.00

Build Pure Clusters

Assume Stress temporally prior: MIMbuild to find Latent Structure: p = 0.28

Case Study 5: Test Anxiety
Bartholomew and Knott (1999), Latent variable models and factor analysis
12th Grade Males in British Columbia (N = 335)
20-item survey (Likert Scale items): X1 - X20
Exploratory Factor Analysis:

Build Pure Clusters:

Build Pure Clusters: Exploratory Factor Analysis: p-value = 0.00 p-value = 0.47

MIMbuild: p = .43
Uninformative Scales: No Independencies or Conditional Independencies

Other Cases
Economics: Bessler, Pork Prices; Hoover, multiple; Cryder & Loewenstein, Charitable Giving
Climate Research: Glymour, Chu, , Teleconnections
Epidemiology: Scheines, Lead & IQ
Biology: Shipley, SGS, Spartina Grass
Educational Research: Easterday, Bias & Recall; Laski, Numerical coding
Neuroscience: Glymour & Ramsey, fMRI

References

General
Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search, 2nd Edition. MIT Press.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
Glymour, C. and Cooper, G. (eds.) (1999). Computation, Causation, & Discovery. MIT Press.
Eberhardt, F. and Scheines, R. (2007). “Interventions and Causal Inference”, in PSA-2006, Proceedings of the 20th biennial meeting of the Philosophy of Science Association.
Silva, R., Glymour, C., Scheines, R., and Spirtes, P. (2006). “Learning the Structure of Linear Latent Variable Models,” Journal of Machine Learning Research, 7.

Biology
Chu, Tianjiao, Glymour, C., Scheines, R., and Spirtes, P. (2002). A Statistical Problem for Inference to Regulatory Structure from Associations of Gene Expression Measurement with Microarrays. Bioinformatics, 19.
Shipley, B. Exploring hypothesis space: examples from organismal biology. In C. Glymour and G. Cooper (eds.), Computation, Causation and Discovery. Cambridge, MA: MIT Press.
Shipley, B. (1995). Structured interspecific determinants of specific leaf area in 34 species of herbaceous angiosperms. Functional Ecology 9.

Case Studies
Scheines, R. (2000). Estimating Latent Causal Influences: TETRAD III Variable Selection and Bayesian Parameter Estimation: the effect of Lead on IQ. Handbook of Data Mining, Pat Hayes, editor, Oxford University Press.
Jackson, A. and Scheines, R. (2005). Single Mothers' Self-Efficacy, Parenting in the Home Environment, and Children's Development in a Two-Wave Study. Social Work Research, 29, 1.
Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49.

Economics
Akleman, Derya G., David A. Bessler, and Diana M. Burton (1999). ‘Modeling corn exports and exchange rates with directed graphs and statistical loss functions’, in Clark Glymour and Gregory F. Cooper (eds.), Computation, Causation, and Discovery, American Association for Artificial Intelligence, Menlo Park, CA and MIT Press, Cambridge, MA.
Awokuse, T. O. (2005). “Export-led Growth and the Japanese Economy: Evidence from VAR and Directed Acyclical Graphs,” Applied Economics Letters 12(14).
Bessler, David A. and N. Loper (2001). “Economic Development: Evidence from Directed Acyclical Graphs,” Manchester School 69(4).
Bessler, David A. and Seongpyo Lee (2002). ‘Money and prices: U.S. data (a study with directed graphs)’, Empirical Economics, Vol. 27.
Demiralp, Selva and Kevin D. Hoover (2003). “Searching for the Causal Structure of a Vector Autoregression,” Oxford Bulletin of Economics and Statistics 65(supplement).
Demiralp, S., Hoover, K., and Perez, S. (2008). “A Bootstrap Method for Identifying and Evaluating a Structural Vector Autoregression,” Oxford Bulletin of Economics and Statistics 70(4).
Haigh, M.S., N.K. Nomikos, and D.A. Bessler (2004). “Integration and Causality in International Freight Markets: Modeling with Error Correction and Directed Acyclical Graphs,” Southern Economic Journal 71(1).
Hoover, Kevin D., Selva Demiralp, and Stephen J. Perez. “Empirical Identification of the Vector Autoregression: The Causes and Effects of U.S. M2.” Written for the Conference in Honour of David F. Hendry at Oxford University, 23-25 August 2007.
Moneta, A. and P. Spirtes (2006). “Graphical Models for the Identification of Causal Structures in Multivariate Time Series Model”, Proceedings of the 2006 Joint Conference on Information Sciences, JCIS 2006, Kaohsiung, Taiwan, ROC, October 8-11, 2006, Atlantis Press.
Sheffrin, Steven M. and Robert K. Triest (1998). ‘A new approach to causality and economic growth’, unpublished typescript, University of California, Davis.
Swanson, Norman R. and Clive W.J. Granger (1997). ‘Impulse response functions based on a causal approach to residual orthogonalization in vector autoregressions’, Journal of the American Statistical Association, Vol. 92.

Software and Courseware
TETRAD IV
Web Course on Causal and Statistical Reasoning
Causality Lab
