Searching for Causal Models
Richard Scheines
Philosophy, Machine Learning, Human-Computer Interaction
Carnegie Mellon University

Goals
1. Basic Familiarity with Causal Model Search:
   o What it is
   o What it can and cannot do
2. Basic Familiarity with Tetrad IV
   o What it is
   o What it can and cannot do

Outline
1. Motivation
2. Representing Causal Systems
3. Strategies for Causal Inference
4. Causal Model Search
5. Examples
6. Causal Model Search with Latent Variables
1. Motivation Conditioning ≠ Intervening : P(Y | X = x ) ≠ P(Y | X set= x) When and how can we use non-experimental data to tell us about the effect of a future intervention?
Motivation Rumsfeld Problem: Do we know what we don’t know: Can we tell when there is not enough information in the data + background knowledge to infer causation?
Motivation: Example Online Course: As good or better than lecture? What student behaviors cause learning?
Full Semester Online Course in Causal & Statistical Reasoning
Course is tooled to record certain events: Logins, page requests, print requests, quiz attempts, quiz scores, voluntary exercises attempted, etc. Each event was associated with attributes: Time student-id Session-id
Experiments
2000: Online vs. Lecture, UCSD
   Winter (N = 180)
   Spring (N = 120)
2001: Online vs. Lecture, Pitt & UCSD
   UCSD - winter (N = 190)
   Pitt (N = 80)
   UCSD - spring (N = 110)

Online vs. Lecture Delivery
Online:
   No lecture / one recitation per week
   Required to finish approximately 2 online modules / week
Lecture:
   2 lectures / one recitation per week
   Printed out modules as reading – extra assignments
Same material, same exams:
   2 paper-and-pencil midterms
   1 paper-and-pencil final exam

Pitt Variables
   Pre-test (%)
   Midterm 1 (%)
   Midterm 2 (%)
   Final Exam (%)
   Recitation attendance (%)
   Lecture attendance (%)
   Gender
   Online (1 = online, 0 = lecture)
Online vs. Lecture - Pitt
Online students did 1/2 a St.Dev better than lecture students (p = .059)
Factors affecting performance: Practice Questions Attempted
Cost: Online condition costs 1/3 less per student
Model fit: df = 2, $\chi^2$ = 0.08, p-value = .96

Printing and Voluntary Comprehension Checks
2. Representing Causal Systems
1. Causal structure - qualitatively
2. Interventions
3. Statistical Causal Models
   1. Causal Bayes Networks
   2. Structural Equation Models

Causal Graphs
Causal Graph G = {V,E}
Each edge X → Y represents a direct causal claim: X is a direct cause of Y relative to V
[Figure: example causal graph including Chicken Pox]

Causal Graphs
Do not need to be Cause Complete
Do need to be Common Cause Complete
[Figure: graphs with Omitted Causes 1 and Omitted Causes 2]
Modeling Ideal Interventions: Interventions on the Effect
[Figure: Sweaters On and Room Temperature; Pre-experimental System vs. Post]

Modeling Ideal Interventions: Interventions on the Cause
[Figure: Sweaters On and Room Temperature; Pre-experimental System vs. Post]
19 Interventions & Causal Graphs Model an ideal intervention by adding an “intervention” variable outside the original system as a direct cause of its target. Pre-intervention graph Intervene on Income “Soft” Intervention “Hard” Intervention
Causal Bayes Networks
The joint distribution factors according to the causal graph, i.e., taking the product over all X in V:
$P(V) = \prod_{X \in V} P(X \mid \text{Immediate Causes of}(X))$

P(S = 0) = .7
P(S = 1) = .3

P(YF = 0 | S = 0) = .99    P(LC = 0 | S = 0) = .95
P(YF = 1 | S = 0) = .01    P(LC = 1 | S = 0) = .05
P(YF = 0 | S = 1) = .20    P(LC = 0 | S = 1) = .80
P(YF = 1 | S = 1) = .80    P(LC = 1 | S = 1) = .20

P(S, YF, LC) = P(S) P(YF | S) P(LC | S)
21 Tetrad Demo
22 Structural Equation Models 1. Structural Equations 2. Statistical Constraints Statistical Model Causal Graph
Structural Equation Models
Structural Equations: one equation for each variable V in the graph:
   V = f(parents(V), error_V)
   for SEM (linear regression) f is a linear function
Statistical Constraints: joint distribution over the error terms

Structural Equation Models
Equations:
   $Education = \varepsilon_{ed}$
   $Income = \beta_1 \, Education + \varepsilon_{Income}$
   $Longevity = \beta_2 \, Education + \varepsilon_{Longevity}$
Statistical Constraints:
   $(\varepsilon_{ed}, \varepsilon_{Income}, \varepsilon_{Longevity}) \sim N(0, \Sigma^2)$
   $\Sigma^2$ diagonal, no variance is zero
[Figure: Causal Graph and SEM Graph (path diagram)]
Calculating the Effect of Interventions Pre-manipulation Joint Distribution (YF,S,L) Intervention, Causal Graph Post-manipulation Joint Distribution (YF,S,L)
Calculating the Effect of Interventions
Pre-manipulation: $P(YF, S, L) = P(S)\,P(YF \mid S)\,P(L \mid S)$
Post-manipulation: $P_m(YF, S, L) = P(S)\,P(YF \mid Manip)\,P(L \mid S)$
Replace pre-manipulation causes with manipulation
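The replacement of P(YF | S) by P(YF | Manip) can be checked numerically. The sketch below is a minimal illustration, assuming the conditional probability tables from the Causal Bayes Networks slide and a hard intervention that sets YF to a fixed value; the function names are hypothetical and not part of Tetrad. It contrasts conditioning on YF = 1 with intervening to set YF = 1.

```python
# CPTs taken from the Causal Bayes Networks slide (S, YF, LC are binary).
p_s = {0: 0.7, 1: 0.3}
p_yf_given_s = {0: {0: 0.99, 1: 0.01}, 1: {0: 0.20, 1: 0.80}}  # indexed [s][yf]
p_lc_given_s = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.80, 1: 0.20}}  # indexed [s][lc]


def pre_joint(s, yf, lc):
    """Pre-manipulation joint: P(S) P(YF|S) P(LC|S)."""
    return p_s[s] * p_yf_given_s[s][yf] * p_lc_given_s[s][lc]


def post_joint(s, yf, lc, yf_set):
    """Post-manipulation joint for a hard intervention setting YF = yf_set.

    The factor P(YF|S) is replaced by P(YF|Manip), here assumed to be a
    point mass at yf_set; the other factors are unchanged.
    """
    p_yf_manip = 1.0 if yf == yf_set else 0.0
    return p_s[s] * p_yf_manip * p_lc_given_s[s][lc]


vals = (0, 1)

# Conditioning: P(LC = 1 | YF = 1) in the pre-manipulation distribution.
num = sum(pre_joint(s, 1, 1) for s in vals)
den = sum(pre_joint(s, 1, lc) for s in vals for lc in vals)
p_cond = num / den

# Intervening: P(LC = 1 | YF set= 1) in the post-manipulation distribution.
p_do = sum(post_joint(s, 1, 1, yf_set=1) for s in vals)

print(f"P(LC=1 | YF=1)      = {p_cond:.4f}")
print(f"P(LC=1 | YF set= 1) = {p_do:.4f}")
```

With these tables, conditioning gives about 0.196 while intervening gives 0.095, which equals the marginal P(LC = 1): observing YF = 1 is evidence about S, but setting YF does nothing to LC.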
Modularity of Intervention/Manipulation
Structural Equations (Causal Graph):
   $Education = \varepsilon_{ed}$
   $Longevity = f_1(Education) + \varepsilon_{Longevity}$
   $Income = f_2(Education) + \varepsilon_{income}$
Manipulated Structural Equations (Manipulated Causal Graph, M1):
   $Education = \varepsilon_{ed}$
   $Longevity = f_1(Education) + \varepsilon_{Longevity}$
   $Income = f_3(M1)$

Modularity of Intervention/Manipulation
Structural Equations (Causal Graph):
   $Education = \varepsilon_{ed}$
   $Longevity = f_1(Education) + \varepsilon_{Longevity}$
   $Income = f_2(Education) + \varepsilon_{income}$
Manipulated Structural Equations (Manipulated Causal Graph, M2):
   $Education = \varepsilon_{ed}$
   $Longevity = f_1(Education) + \varepsilon_{Longevity}$
   $Income = f_4(M2, Education) + \varepsilon_{income}$
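Modularity can be illustrated by simulation. The sketch below assumes a linear version of the Education, Income, Longevity model with made-up coefficients (0.8 and 0.5 are hypothetical, as are the standard normal errors) and a hard intervention in which Income is set by M1 alone. Only the Income equation changes; the Education and Longevity equations are reused as they are.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000


def simulate(intervene_on_income=False):
    """Draw n samples from a hypothetical linear SEM.

    With intervene_on_income=True only the Income equation is replaced
    (Income = M1); the other equations stay untouched (modularity).
    """
    education = rng.normal(size=n)                      # Education = eps_ed
    longevity = 0.5 * education + rng.normal(size=n)    # Longevity equation
    if intervene_on_income:
        income = rng.normal(size=n)                     # Income = M1
    else:
        income = 0.8 * education + rng.normal(size=n)   # Income equation
    return education, income, longevity


_, inc_pre, lon_pre = simulate()
_, inc_post, lon_post = simulate(intervene_on_income=True)

corr_pre = np.corrcoef(inc_pre, lon_pre)[0, 1]
corr_post = np.corrcoef(inc_post, lon_post)[0, 1]
print(f"corr(Income, Longevity) before intervention: {corr_pre:.3f}")
print(f"corr(Income, Longevity) after intervention:  {corr_post:.3f}")
```

Before the intervention Income and Longevity are correlated through their common cause Education; after it the correlation is close to zero, because Income no longer depends on Education and has no effect on Longevity in this model.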
29 3. Strategies for Causal Inference
Goal: Causation (X → Y)
Problem: Association ≠ Causation
Why? -- Mainly confounding
Solutions (Designs)
o Experiments
     Controlled Trials
     Randomized Trials
o Observational Studies
     Quasi-Experiments - Fortuitous Randomization
     Instrumental Variables
     Statistical Control
     Quasi-Experiments – Blocking
     Interrupted Time Series
o Causal Model Search
Statistical Evidence - Question 1: Is there an Association?
   $\rho_{TV,Obesity} \neq 0$
   $\rho_{TV,Obesity} = 0$

Statistical Evidence – Question 2: Is the Association Spurious?
   $\rho_{TV,Obesity} \neq 0$
   Produced by: Spurious Association or Causal Association
The Problem of Confounding
[Figure: TV (Hours of TV) and Obesity (BMI), with potential confounders Permissiveness of Parents, C1, C2, ..., Cn]
[Figure: Contract $ and # IEDs, with potential confounders Ethnic Alignment with Central Govt., C1, C2, ..., Cn]
Randomized Trials Eliminate Spurious Association
Exposure (treatment) is assigned randomly
In a randomized trial (RT), an association between exposure and outcome is strong evidence of causation
Designs for Dealing With Confounding
1) Experiments - Randomized Trials
[Figure: Randomizer, Contract $, # IEDs, with potential confounders Ethnic Alignment, C1, C2, ..., Cn]
All confounders removed
Often Ethically or Practically Impossible

Designs for Dealing With Confounding
2a) Observational Studies - Statistical Control
[Figure: Contract $, # IEDs, with potential confounders Ethnic Alignment, C1, C2, ..., Cn]
$\rho_{Contract\$,\ \#IEDs\ .\ EthnicAlignment,\ C_1,\ C_2,\ \ldots,\ C_n}$
All confounders must be measured.
Eliminating Spurious Association without Randomizing/Assigning/Controlling Exposure
All confounders measured?
   $\rho_{TV,Obesity.Permissiveness} \neq 0$
Confounders measured well?
   $\rho_{TV,Obesity.PoorMeasure} \neq 0$
Statistical Adjustment (controlling for covariates)
   $\rho_{TV,Obesity.Permissiveness} = 0$
   $\rho_{TV,Obesity.} \neq 0$
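Statistical adjustment and its dependence on measurement quality can be sketched with partial correlations. The simulation below is a hypothetical illustration: Permissiveness is assumed to be the only common cause of TV and Obesity, TV has no effect on Obesity, and all coefficients and noise levels are made up. `poor_measure` is a noisy proxy for Permissiveness.

```python
import numpy as np


def partial_corr(x, y, z):
    """First-order partial correlation of x and y controlling for z."""
    r = np.corrcoef([x, y, z])
    rxy, rxz, ryz = r[0, 1], r[0, 2], r[1, 2]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))


rng = np.random.default_rng(1)
n = 100_000

permissiveness = rng.normal(size=n)                  # common cause
tv = permissiveness + rng.normal(size=n)             # no TV -> Obesity edge
obesity = permissiveness + rng.normal(size=n)
poor_measure = permissiveness + 2.0 * rng.normal(size=n)  # noisy proxy

rho_marginal = np.corrcoef(tv, obesity)[0, 1]
rho_adjusted = partial_corr(tv, obesity, permissiveness)
rho_poor = partial_corr(tv, obesity, poor_measure)

print(f"rho(TV, Obesity)                  = {rho_marginal:.3f}")
print(f"rho(TV, Obesity . Permissiveness) = {rho_adjusted:.3f}")
print(f"rho(TV, Obesity . PoorMeasure)    = {rho_poor:.3f}")
```

Adjusting for the true confounder drives the partial correlation to roughly zero, while adjusting for the poorly measured proxy leaves most of the spurious association in place.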
Designs for Dealing With Confounding
2b) Observational Studies - Instrumental Variables
[Figure: Contracting Agent (Z), Contract $, # IEDs, with potential confounders Ethnic Alignment with Central Govt., C1, C2, ..., Cn]
Idea: Z is a partial natural randomizer
Needed Assumptions:
   Z direct cause of Contract $
   Z independent of every confounder
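A minimal sketch of the instrumental variable idea in a linear setting. Beyond the two assumptions listed on the slide, it also assumes linearity and that Z affects the outcome only through the exposure; the variable names, coefficients, and the true effect of 0.5 are all hypothetical. The simple ratio estimator cov(Z, Y) / cov(Z, X) is used here as one standard IV estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
true_effect = 0.5                       # hypothetical effect of X on Y

c = rng.normal(size=n)                  # unmeasured confounder
z = rng.normal(size=n)                  # instrument, independent of c
x = z + c + rng.normal(size=n)          # exposure: caused by z and c
y = true_effect * x + 2.0 * c + rng.normal(size=n)  # outcome


def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.cov(a, b)[0, 1]


ols_slope = cov(x, y) / cov(x, x)       # confounded regression estimate
iv_slope = cov(z, y) / cov(z, x)        # instrumental variable estimate

print(f"OLS slope: {ols_slope:.3f}")
print(f"IV slope:  {iv_slope:.3f}")
```

The regression slope is inflated by the confounder, while the IV ratio recovers a value near the assumed effect, because the variation in X induced by Z is unrelated to C.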
Designs for Dealing With Confounding
2c) Observational Studies: Quasi-Experiments – Fortuitous Randomization
[Figure: Random Assignment of Instructor, Gender-matched Instructor, Learning, with potential confounders C1, C2, ..., Cn]
Designs for Dealing With Confounding
2c) Quasi-Experiments - Blocking
[Figure: TV, Obesity, with potential confounders Permissiveness of Parents, C1, C2, ..., Cn; Identical Twins]
Subset Data to only Twins
TV, Obesity in Twin 1 vs. TV, Obesity in Twin 2
44 Regression & Causal Inference
Regression & Causal Inference
Typical (non-experimental) strategy:
1. Establish a prima facie case (X associated with Y)
2. So, identify and measure potential confounders Z:
   a) prior to X,
   b) associated with X,
   c) associated with Y
3. Statistically adjust for Z (multiple regression)
But: omitted variable bias
46 Regression & Causal Inference Strategy threatened by measurement error – ignore this for now Multiple regression is provably unreliable for causal inference unless: X prior to Y X, Z, and Y are causally sufficient (no confounding)
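Examples
[Figure: three columns, Truth, Regression, and Alternative?, with regression coefficients per example]
   $\beta_X = 0$, $\beta_Z \neq 0$
   $\beta_X \neq 0$, $\beta_Z \neq 0$
   $\beta_X \neq 0$, $\beta_{Z1} \neq 0$, $\beta_{Z2} \neq 0$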
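One way regression can mislead when X, Z, and Y are not causally sufficient is sketched below. The structure is a hypothetical illustration and not necessarily the one drawn on the slide: two unmeasured variables T1 and T2, with X ← T1 → Z ← T2 → Y, and no effect of X on Y. All coefficients are made up.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

t1 = rng.normal(size=n)                 # latent common cause of X and Z
t2 = rng.normal(size=n)                 # latent common cause of Z and Y
x = t1 + rng.normal(size=n)
z = t1 + t2 + rng.normal(size=n)
y = t2 + rng.normal(size=n)             # X is NOT a cause of Y


def ols(target, *regressors):
    """Least-squares slopes of target on the regressors (intercept dropped)."""
    design = np.column_stack([np.ones_like(target), *regressors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[1:]


(b_x_alone,) = ols(y, x)                # Y regressed on X only
b_x, b_z = ols(y, x, z)                 # Y regressed on X and Z

print(f"Y ~ X:     beta_X = {b_x_alone:.3f}")
print(f"Y ~ X + Z: beta_X = {b_x:.3f}, beta_Z = {b_z:.3f}")
```

Here X and Y are independent, so the simple regression gives a coefficient near zero; adding the "potential confounder" Z produces a clearly nonzero coefficient on X, even though X has no effect on Y.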
48 Better Methods Exist Causal Model Search (since 1988): Provably Reliable Provably Rumsfeld
49 4. Causal Model Search
Causal Discovery
Statistical Data
Causal Structure
Background Knowledge
   - X2 before X3
   - no unmeasured common causes
Statistical Inference
Faithfulness
Constraints on a probability distribution P generated by a causal structure G hold for all parameterizations of G.
   $Revenues = a \cdot Rate + c \cdot Economy + \varepsilon_{Rev}$
   $Economy = b \cdot Rate + \varepsilon_{Econ}$
Faithfulness: $a \neq -bc$
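The condition a ≠ -bc can be seen in a small simulation. The sketch below assumes standard normal errors and hypothetical coefficient values; it compares a parameterization where the direct path (a) and the indirect path through Economy (bc) cancel exactly with one where they do not.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000


def corr_rate_revenues(a, b, c):
    """Sample correlation of Rate and Revenues under the two equations."""
    rate = rng.normal(size=n)
    economy = b * rate + rng.normal(size=n)
    revenues = a * rate + c * economy + rng.normal(size=n)
    return np.corrcoef(rate, revenues)[0, 1]


b, c = -0.5, 2.0                        # hypothetical path coefficients

# Unfaithful: a = -b*c, so the two paths from Rate to Revenues cancel.
corr_unfaithful = corr_rate_revenues(a=-b * c, b=b, c=c)

# Faithful: a != -b*c, so the dependence shows up in the data.
corr_faithful = corr_rate_revenues(a=2.0, b=b, c=c)

print(f"a = -bc : corr(Rate, Revenues) = {corr_unfaithful:.3f}")
print(f"a != -bc: corr(Rate, Revenues) = {corr_faithful:.3f}")
```

In the cancelling case Rate and Revenues look independent although Rate is a cause of Revenues twice over; the independence holds only for that special parameterization, which is what faithfulness rules out.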
52 The Problem of Alternatives: Observationally Equivalent Models Given an Experimental Setup, and Background Knowledge, and Theory, and a set of independence relations, what are all the models that would entail those independence relations that are consistent with BK and Theory?
Equivalence Classes
Independence (d-separation equivalence)
   DAGs: Patterns
   PAGs: Latent variable models
Intervention Equivalence Classes
Measurement Model Equivalence Classes
Linear Non-Gaussian Model Equivalence Classes
Equivalence:
   Independence: $M_1 \models X \perp\!\!\!\perp Y \mid Z \iff M_2 \models X \perp\!\!\!\perp Y \mid Z$
   Distribution: $\forall \theta_1 \, \exists \theta_2 : M_1(\theta_1) = M_2(\theta_2)$
54 Representations of Independence Equivalence Classes We want the representations to: Characterize the Independence Relations Entailed by the Equivalence Class Represent causal features that are shared by every member of the equivalence class
55 Patterns & PAGs Patterns (Verma and Pearl, 1990): graphical representation of Markov equivalence - with no latent variables. PAGs: (Richardson 1994) graphical representation of an equivalence class including latent variable models and sample selection bias that are Markov equivalent over a set of measured variables X
56 Patterns
58 PAGs: Partial Ancestral Graphs
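Regression vs. PAGs
[Figure: three columns, Truth, Regression, and PAG, over X, Y, Z1, Z2, with regression coefficients per example]
   $\beta_X = 0$, $\beta_Z \neq 0$
   $\beta_X \neq 0$, $\beta_Z \neq 0$
   $\beta_X \neq 0$, $\beta_{Z1} \neq 0$, $\beta_{Z2} \neq 0$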
60 Causal Model Search Background Knowledge PC, GES, CPC FCI, CFCI Impossible
Overview of Search Methods
Constraint Based Searches
   TETRAD (SGS, PC, FCI)
   Very fast – capable of handling 1,000 variables
   Pointwise, but not uniformly consistent
Scoring Searches
   Scores: BIC, AIC, etc.
   Search: Hill Climb, Genetic Alg., Simulated Annealing
   Difficult to extend to latent variable models
   Meek and Chickering Greedy Equivalence Search (GES)
   Very slow – max N ~
   Pointwise, but not uniformly consistent
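The adjacency phase of a constraint-based search can be sketched in a few lines. This is a simplified illustration in the spirit of PC, not Tetrad's implementation: it assumes a linear model with no latent variables, uses the population covariance matrix with an exact "oracle" test (partial correlation equal to zero) in place of a statistical test on sample data, and omits the orientation phase. The function names and the example graph are hypothetical.

```python
from itertools import combinations

import numpy as np


def population_cov(B, noise_var):
    """Covariance implied by the linear SEM x = B x + e.

    B[i, j] is the coefficient of the edge j -> i; errors are independent.
    """
    n = B.shape[0]
    A = np.linalg.inv(np.eye(n) - B)
    return A @ np.diag(noise_var) @ A.T


def partial_corr(cov, i, j, cond):
    """Partial correlation of variables i and j given the variables in cond."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(cov[np.ix_(idx, idx)])
    return -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])


def pc_skeleton(cov, tol=1e-9):
    """Remove the edge i - j whenever some subset of i's other neighbours
    makes i and j independent, using conditioning sets of growing size."""
    n = cov.shape[0]
    adj = {i: set(range(n)) - {i} for i in range(n)}   # start fully connected
    size = 0
    while any(len(adj[i]) - 1 >= size for i in range(n)):
        for i in range(n):
            for j in sorted(adj[i]):
                others = sorted(adj[i] - {j})
                for cond in combinations(others, size):
                    if abs(partial_corr(cov, i, j, cond)) < tol:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        size += 1
    return {frozenset((i, j)) for i in adj for j in adj[i]}


# Hypothetical graph: X0 -> X2 <- X1, X2 -> X3.
B = np.zeros((4, 4))
B[2, 0], B[2, 1], B[3, 2] = 0.8, 0.6, 0.7
cov = population_cov(B, noise_var=np.ones(4))
skeleton = pc_skeleton(cov)
print(sorted(tuple(sorted(edge)) for edge in skeleton))
```

For this graph the search keeps exactly the three true adjacencies: X0 and X1 are separated by the empty set, and X3 is separated from X0 and from X1 by X2. On real data the oracle would be replaced by a conditional independence test at some significance level, which is where the pointwise (rather than uniform) consistency caveat applies.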
62 5. Examples
Case Study 1: Foreign Investment
Does Foreign Investment in 3rd World Countries cause Political Repression?
Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49,
N = 72
   PO: degree of political exclusivity
   CV: lack of civil liberties
   EN: energy consumption per capita (economic development)
   FI: level of foreign investment
64 Correlations po fi en fi en cv Case Study 1: Foreign Investment
Case Study 1: Foreign Investment
Regression Results
   po = .227*fi - .176*en + .880*cv
   SE   (.058)    (.059)    (.060)
Interpretation: foreign investment increases political repression
Alternatives Case Study 1: Foreign Investment There is no model with testable constraints (df > 0) in which FI has a positive effect on PO that is not rejected by the data.
Case Study 2: Charitable Giving
Cryder & Loewenstein (in prep)
Variables:
   Tangibility/Concreteness (Exp manipulation)
   Imaginability (Likert 1-7)
   Impact (avg. of 2 Likerts)
   Sympathy (Likert)
   Donation ($)
Case Study 2: Charitable Giving
Theoretical Model
   study 1 (N = 94): df = 5, $\chi^2$ = 52.0, p =
   study 2 (N = 115): df = 5, $\chi^2$ = 62.6, p =
GES Outputs
   study 1: df = 5, $\chi^2$ = 5.88, p = 0.32
   study 2: df = 5, $\chi^2$ = 8.23, p = 0.14
   study 1: df = 5, $\chi^2$ = 3.99, p = 0.55
   study 2: df = 5, $\chi^2$ = 7.48, p = 0.18
Lead and IQ: Variable Selection
Final Variables (Needleman)
   lead: baby teeth
   fab: father’s age
   mab: mother’s age
   nlb: number of live births
   med: mother’s education
   piq: parent’s IQ
   ciq: child’s IQ
Needleman Regression - standardized coefficient - (t-ratios in parentheses) - p-value for significance ciq = lead fab nlb med mab piq (2.32) (1.79) (2.30) (3.08) (1.97) (3.87) < <0.01 All variables significant at.1 R 2 =.271
TETRAD Variable Selection
Tetrad:
   mab _||_ ciq
   fab _||_ ciq
   nlb _||_ ciq | med
Regression:
   mab _||_ ciq | {lead, med, piq, nlb, fab}
   fab _||_ ciq | {lead, med, piq, nlb, mab}
   nlb _||_ ciq | {lead, med, piq, mab, fab}
Regressions - standardized coefficient - (t-ratios in parentheses) - p-value for significance Needleman (R 2 =.271) ciq = lead fab nlb med mab piq (2.32) (1.79) (2.30) (3.08) (1.97) (3.87) < <0.01 TETRAD (R 2 =.243) ciq = lead med piq (2.89) (3.50) (3.59) <0.01 <0.01 <0.01
Measurement Error
Measured regressor variables are proxies that involve measurement error
Errors-in-all-variables model for Lead’s influence on IQ - underidentified
Strategies:
   Sensitivity Analysis
   Bayesian Analysis

Prior over Measurement Error
Proportion of Variance from Measurement Error:
   Measured Lead: Mean = .2, SD = .1
   Parent’s IQ: Mean = .3, SD = .15
   Mother’s Education: Mean = .3, SD = .15
Prior otherwise uninformative
Posterior Zero Robust over similar priors
Using Needleman’s Covariates With similar prior, the marginal posterior: Very Sensitive to Prior Over Regressors TETRAD eliminated Zero
80 6. Causal Model Search with Latent Variables
81 The Causal Theory Formation Problem for Latent Variable Models Given observations on a number of variables, identify the latent variables that underlie these variables and the causal relations among these latent concepts. Example: Spectral measurements of solar radiation intensities. Variables are intensities at each measured frequency. Example: Quality of a Child’s Home Environment, Cumulative Exposure to Lead, Cognitive Functioning
82 The Most Common Automatic Solution: Exploratory Factor Analysis Chooses “factors” to account linearly for as much of the variance/covariance of the measured variables as possible. Great for dimensionality reduction Factor rotations are arbitrary Gives no information about the statistical and thus the causal dependencies among any real underlying factors. No general theory of the reliability of the procedure
83 Other Solutions Independent Components, etc Background Theory Scales
84 Other Solutions: Background Theory Key Causal Question Thus, key statistical question: Lead _||_ Cog | Home ? Specified Model
85 Lead _||_ Cog | Home ? Yes, but statistical inference will say otherwise. Other Solutions: Background Theory True Model “Impurities”
86 Purify Specified Model
87 Purify True Model
91 Purify Purified Model
92 Scale = sum(measures of a latent) Other Solutions: Scales
93 True Model Other Solutions: Scales Pseudo-Random Sample: N = 2,000
94 Scales vs. Latent variable Models Regression: Cognition on Home, Lead Predictor Coef SE Coef T P Constant Home Lead S = R-Sq = 61.1% R-Sq(adj) = 61.0% Insig. True Model
95 Scales vs. Latent variable Models Scales homescale = (x1 + x2 + x3)/3 leadscale = (x4 + x5 + x6)/3 cogscale = (x7 + x8 + x9)/3 True Model
96 Scales vs. Latent variable Models Cognition = homescale Lead Predictor Coef SE Coef T P Constant homescal Lead Regression: Cognition on homescale, Lead Sig. True Model
97 Scales vs. Latent variable Models Modeling Latents True Model Specified Model
Scales vs. Latent variable Models
Estimated Model ($\chi^2$ = 29.6, df = 24, p = .19)
B5 = .0075, which at t = .23, is correctly insignificant

Scales vs. Latent variable Models: Mixing Latents and Scales
($\chi^2$ = 14.57, df = 12, p = .26)
B5 = -.137, which at t = 5.2, is incorrectly highly significant (P < .001)
100 Build Pure Clusters Output - provably reliable (pointwise consistent): Equivalence class of measurement models over a pure subset of measures True Model Output
Build Pure Clusters
Qualitative Assumptions
1. Two types of nodes: measured (M) and latent (L)
2. No edges from M to L (measured don’t cause latents)
3. Each m ∈ M measures (is a direct effect of) at least one l ∈ L
4. No cycles involving M
Quantitative Assumptions:
1. Each m ∈ M is a linear function of its parents plus noise
2. P(L) has second moments, positive variances, and no deterministic relations
Case Study 4: Stress, Depression, and Religion
MSW Students (N = 127), 61-item survey (Likert Scale)
   Stress: St1 - St21
   Depression: D1 - D20
   Religious Coping: C1 - C20
Specified Model: p = 0.00
103 Build Pure Clusters Case Study 4: Stress, Depression, and Religion
104 Assume Stress temporally prior: MIMbuild to find Latent Structure: p = 0.28 Case Study 4: Stress, Depression, and Religion
105 Case Study 5: Test Anxiety Bartholomew and Knott (1999), Latent variable models and factor analysis 12th Grade Males in British Columbia (N = 335) 20 - item survey (Likert Scale items): X 1 - X 20 : Exploratory Factor Analysis:
106 Build Pure Clusters : Case Study 5: Test Anxiety
Case Study 5: Test Anxiety
Build Pure Clusters:
p-value = 0.00
p-value = 0.47
Exploratory Factor Analysis:

Case Study 5: Test Anxiety
MIMbuild: p = .43
Uninformative
Scales: No Independencies or Conditional Independencies
Other Cases
Economics
   Bessler, Pork Prices
   Hoover, multiple
Educational Research
   Easterday, Bias & Recall
   Laski, Numerical coding
Climate Research
   Glymour, Chu, Teleconnections
Biology
   Shipley, SGS, Spartina Grass
Neuroscience
   Glymour & Ramsey, fMRI
Epidemiology
   Scheines, Lead & IQ
Software Education: - Causality Lab: - Web Course on Causal and Statistical Reasoning, and Empirical Research Methods: Research: Tetrad:
References
Causation, Prediction, and Search, 2nd Edition (2000), by P. Spirtes, C. Glymour, and R. Scheines (MIT Press)
Causality: Models, Reasoning, and Inference (2000), by Judea Pearl, Cambridge Univ. Press
Computation, Causation, & Discovery (1999), edited by C. Glymour and G. Cooper, MIT Press

References: Biology
Chu, Tianjiao, Glymour, C., Scheines, R., & Spirtes, P. (2002). A Statistical Problem for Inference to Regulatory Structure from Associations of Gene Expression Measurement with Microarrays. Bioinformatics, 19:
Shipley, B. Exploring hypothesis space: examples from organismal biology. Computation, Causation and Discovery. C. Glymour and G. Cooper. Cambridge, MA, MIT Press.
Shipley, B. (1995). Structured interspecific determinants of specific leaf area in 34 species of herbaceous angiosperms. Functional Ecology 9.

References
Scheines, R. (2000). Estimating Latent Causal Influences: TETRAD III Variable Selection and Bayesian Parameter Estimation: the effect of Lead on IQ, Handbook of Data Mining, Pat Hayes, editor, Oxford University Press.
Jackson, A., and Scheines, R. (2005). Single Mothers' Self-Efficacy, Parenting in the Home Environment, and Children's Development in a Two-Wave Study, Social Work Research, 29, 1, pp
Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49,

References: Economics
Akleman, Derya G., David A. Bessler, and Diana M. Burton. (1999). ‘Modeling corn exports and exchange rates with directed graphs and statistical loss functions’, in Clark Glymour and Gregory F. Cooper (eds) Computation, Causation, and Discovery, American Association for Artificial Intelligence, Menlo Park, CA and MIT Press, Cambridge, MA, pp
Awokuse, T. O. (2005) “Export-led Growth and the Japanese Economy: Evidence from VAR and Directed Acyclical Graphs,” Applied Economics Letters 12(14),
Bessler, David A. and N. Loper. (2001) “Economic Development: Evidence from Directed Acyclical Graphs” Manchester School 69(4),
Bessler, David A. and Seongpyo Lee. (2002). ‘Money and prices: U.S. data (a study with directed graphs)’, Empirical Economics, Vol. 27, pp
Demiralp, Selva and Kevin D. Hoover. (2003) “Searching for the Causal Structure of a Vector Autoregression,” Oxford Bulletin of Economics and Statistics 65(supplement), pp
Haigh, M.S., N.K. Nomikos, and D.A. Bessler (2004) “Integration and Causality in International Freight Markets: Modeling with Error Correction and Directed Acyclical Graphs,” Southern Economic Journal 71(1),
Sheffrin, Steven M. and Robert K. Triest. (1998). ‘A new approach to causality and economic growth’, unpublished typescript, University of California, Davis.
Swanson, Norman R. and Clive W.J. Granger. (1997). ‘Impulse response functions based on a causal approach to residual orthogonalization in vector autoregressions’, Journal of the American Statistical Association, Vol. 92, pp
Demiralp, S., Hoover, K., & Perez, S. (2008). A Bootstrap Method for Identifying and Evaluating a Structural Vector Autoregression, Oxford Bulletin of Economics and Statistics, 70, (4),
Kevin D. Hoover, Selva Demiralp, Stephen J. Perez, Empirical Identification of the Vector Autoregression: The Causes and Effects of U.S. M2, paper written for presentation at the Conference in Honour of David F. Hendry at Oxford University, 23-25 August
A. Moneta and P. Spirtes (2006). “Graphical Models for the Identification of Causal Structures in Multivariate Time Series Model”, Proceedings of the 2006 Joint Conference on Information Sciences, JCIS 2006, Kaohsiung, Taiwan, ROC, October 8-11, 2006, Atlantis Press.

References
Eberhardt, F., and Scheines, R. (2007). “Interventions and Causal Inference”, in PSA-2006, Proceedings of the 20th biennial meeting of the Philosophy of Science Association
Silva, R., Glymour, C., Scheines, R. and Spirtes, P. (2006) “Learning the Structure of Latent Linear Structure Models,” Journal of Machine Learning Research, 7,