1 Richard Scheines Carnegie Mellon University Causal Graphical Models II: Applications with Search.


2 Case Studies
1. Foreign Investment
2. Welfare Reform
3. Online Learning
4. Charitable Giving
5. Stress & Prayer
6. Test Anxiety
7. Causal Connectivity among Brain Regions

3 Case Studies
1. Exceedingly simple
2. Background theory weak
3. Claim:
   - Not: search output is true
   - Is: search adds value

4 Case Study 1: Foreign Investment
Does Foreign Investment in 3rd World Countries cause Political Repression?
Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49.
N = 72
PO: degree of political exclusivity
CV: lack of civil liberties
EN: energy consumption per capita (economic development)
FI: level of foreign investment

5 Case Study 1: Foreign Investment
Correlations [correlation matrix over po, fi, en, cv; values not preserved]

6 Case Study 1: Foreign Investment
Regression Results
po = .227*fi - .176*en + .880*cv
SE   (.058)    (.059)    (.060)
t    [values not preserved]
Interpretation: foreign investment increases political repression
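The t row of the regression slide did not survive in this transcript, but a t-ratio is just a coefficient divided by its standard error. The short sketch below recomputes the ratios from the coefficients and standard errors shown on the slide; the recomputed values are derived here and are not quoted from the original slide.

```python
# Coefficients and standard errors as reported on the slide.
coefficients = {"fi": 0.227, "en": -0.176, "cv": 0.880}
standard_errors = {"fi": 0.058, "en": 0.059, "cv": 0.060}


def t_ratios(coefs, ses):
    """Return coefficient / standard error for each regressor."""
    return {name: coefs[name] / ses[name] for name in coefs}


t = t_ratios(coefficients, standard_errors)
for name, value in t.items():
    print(f"{name}: t = {value:.2f}")
```

All three ratios exceed 2 in absolute value, which is why the regression alone suggests that FI has a positive, statistically significant coefficient.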

7 Case Study 1: Foreign Investment
Alternatives
There is no model with testable constraints (df > 0) in which FI has a positive effect on PO that is not rejected by the data.

8 Case Study 2: Welfare Reform
Aurora Jackson, Richard Scheines. Single Mothers’ Self-Efficacy, Parenting in the Home Environment, and Children’s Development in a Two-Wave Study (Social Work Research, 29, 1, 7-20)

9 Case Study 2: Welfare Reform
Two-Wave Longitudinal Study
Longitudinal data:
- Time 1: N = 188
- Time 2: N = 178
Single black mothers in NYC
Current and former welfare recipients
With a child who was 3 to 5 at time 1, and 6 to 8 at time 2

10 Case Study 2: Welfare Reform
Constructs/Scales/Measures
- Employment Status
- Perceived Self-efficacy
- Depressive Symptoms
- Quality of Mother/Father Relationship
- Father/Child Contact
- Quality of Home Environment
- Behavior Problems
- Cognitive Development

11 Case Study 2: Welfare Reform
Background Knowledge
Tier 1: Employment Status
Tier 2: Depression, Self-efficacy, Mother/Father Relationship, Father/Child Contact, Mother’s Parenting/HOME
Tier 3: Negative Behaviors, Cognitive Development
Over 22 million path models are consistent with these constraints

12 Case Study 2: Welfare Reform
Tetrad Equivalence Class / Conceptual Model
χ² = 22.3, df = 20, p = .32
χ² = 18.87, df = 19, p = .46

13 Case Study 2: Welfare Reform
Tetrad vs. Conceptual Model
Points of Agreement:
- Mother’s Self-Efficacy mediates the effect of Employment on all other variables.
- Home environment mediates the effect of all other factors on outcomes: Cog. Develop and Prob. Behaviors
Points of Disagreement:
- Depression: key cause vs. only an effect

14 Case Study 3: Online Courseware
Online Course in Causal & Statistical Reasoning

15 Case Study 3: Online Courseware
Variables (arranged in Tier 1, Tier 2, Tier 3)
- Pre-test (%)
- Print-outs (% modules printed)
- Quiz Scores (avg. %)
- Voluntary Exercises (% completed)
- Final Exam (%)
- 9 other variables

16 Case Study 3: Online Courseware
Printing and Voluntary Comprehension Checks [figure not preserved]

17 Case Study 4: Charitable Giving
Cryder & Loewenstein (in prep)
Variables
- Tangibility/Concreteness (Exp manipulation)
- Imaginability (Likert 1-7)
- Impact (avg. of 2 Likerts)
- Sympathy (Likert)
- Donation ($)

18 Case Study 4: Charitable Giving
Theoretical Model
study 1 (N = 94): df = 5, χ² = 52.0, p =

19 Case Study 4: Charitable Giving
GES Outputs
study 1: df = 5, χ² = 5.88, p = 0.32
study 1: df = 5, χ² = 3.99, p = 0.55

20 Case Study 4: Charitable Giving
Theoretical Model
study 2 (N = 115): df = 5, χ² = 62.6, p =
study 2: df = 5, χ² = 8.23, p = 0.14
study 2: df = 5, χ² = 7.48, p = 0.18

21 Build Pure Clusters
Output - provably reliable (pointwise consistent): equivalence class of measurement models over a pure subset of measures
[Figure panels: True Model, Output]

22 Build Pure Clusters
Qualitative Assumptions:
1. Two types of nodes: measured (M) and latent (L)
2. No edges from M to L (measured don’t cause latents)
3. Each m ∈ M measures (is a direct effect of) at least one l ∈ L
4. No cycles involving M
Quantitative Assumptions:
1. Each m ∈ M is a linear function of its parents plus noise
2. P(L) has second moments, positive variances, and no deterministic relations
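One consequence of the linearity assumption is worth seeing concretely: if four measures are pure indicators of a single latent, their covariances satisfy vanishing tetrad constraints, which is the kind of constraint that pure-cluster searches exploit. The sketch below is an illustration only, not the Build Pure Clusters algorithm; the loadings, noise variances, and the added error covariance are hypothetical numbers chosen for the example.

```python
def one_factor_covariance(loadings, latent_var=1.0, noise_vars=None):
    """Covariance implied by m_i = loading_i * L + e_i with independent noise."""
    n = len(loadings)
    noise_vars = noise_vars or [1.0] * n
    cov = [[loadings[i] * loadings[j] * latent_var for j in range(n)]
           for i in range(n)]
    for i in range(n):
        cov[i][i] += noise_vars[i]
    return cov


def tetrad_differences(cov, i, j, k, l):
    """The three tetrad differences among four measured variables."""
    return (
        cov[i][j] * cov[k][l] - cov[i][k] * cov[j][l],
        cov[i][j] * cov[k][l] - cov[i][l] * cov[j][k],
        cov[i][k] * cov[j][l] - cov[i][l] * cov[j][k],
    )


# Hypothetical loadings for four pure measures of one latent.
pure_cov = one_factor_covariance([0.9, 0.7, 1.2, 0.5])
pure = tetrad_differences(pure_cov, 0, 1, 2, 3)

# Make the cluster impure: measures 0 and 1 now share extra covariance
# (for example, correlated errors), which breaks two of the three constraints.
impure_cov = [row[:] for row in pure_cov]
impure_cov[0][1] += 0.3
impure_cov[1][0] += 0.3
impure = tetrad_differences(impure_cov, 0, 1, 2, 3)

print("pure cluster:  ", pure)
print("impure cluster:", impure)
```

In the pure case all three differences are zero up to rounding; in the impure case the two differences involving the pair (0, 1) are clearly nonzero.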

23 Case Study 5: Stress, Depression, and Religion
MSW Students (N = 127)
61-item survey (Likert Scale)
- Stress: St1 - St21
- Depression: D1 - D20
- Religious Coping: C1 - C20
Specified Model: p = 0.00

24 Case Study 5: Stress, Depression, and Religion
Build Pure Clusters

25 Case Study 5: Stress, Depression, and Religion
Assume Stress temporally prior; MIMbuild to find Latent Structure: p = 0.28

26 Case Study 6: Test Anxiety
Bartholomew and Knott (1999), Latent variable models and factor analysis
12th Grade Males in British Columbia (N = 335)
20-item survey (Likert Scale items): X1 - X20
Exploratory Factor Analysis:

27 Case Study 6: Test Anxiety
Build Pure Clusters:

28 Case Study 6: Test Anxiety
Build Pure Clusters / Exploratory Factor Analysis
p-value = 0.00    p-value = 0.47

29 Case Study 6: Test Anxiety
MIMbuild: p = .43
Uninformative Scales: No Independencies or Conditional Independencies

30 Case Study 7: fMRI → Brain Connectivity
Goals:
- Identify relatively BIG brain regions (ROIs).
- Figure out how they influence one another, with what timing sequences, in producing behaviors of interest.
- Figure out individual differences.

31 Case Study 7: fMRI
Experiment (Xue and Poldrack, unpublished):
- 13 right-handed subjects
- On each trial, subject judged whether visual stimuli rhymed or not
- 8 pairs of words/nonwords presented for 2.5 seconds each in eight 20-second blocks, separated by 20 seconds of visual fixation
- TR = 2000 milliseconds
- 160 time points

32 Case Study 7: fMRI → Brain Connectivity
Problems:
- Criteria for identifying ROIs
- Individuals differ: brain ROIs, parameter values
- Brain processing is cyclic
- Time:
  - Varying time delays of neuron → ROI BOLD response
  - Time series sampling rate vs. processing rate
- Search Space: 11 ROIs, 3 23 DAGs
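The search-space figure is garbled in this transcript, but its order of magnitude can be checked independently. The sketch below counts labeled DAGs with Robinson's recurrence; the function name is illustrative, and the result is a property of DAG counting in general rather than a number taken from the slide.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of directed acyclic graphs on n labeled nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in (3, 5, 11):
    print(n, count_dags(n))
```

For 11 nodes the count is on the order of 10^22, which is why exhaustive enumeration is not an option and a guided search such as GES is used instead.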

33 Case Study 7: fMRI
ROI Construction
- Mean of signal intensity among voxels in a cluster at a time
- 1st or ... 4th principal component
- Average of top X% variance
- Maximum variance voxel
- Eyeballs
- Etc., etc.

34 Case Study 7: fMRI
Example ROIs

35 Case Study 7: fMRI → Brain Connectivity
- Individuals differ: brain ROIs, parameter values
- Assume the same qualitative causal structure but different quantitative causal structure (mixed effects)
- iMAGES search:
  1. Apply GES to each subject, 1 step
  2. Apply to each search the step with max(avg. BIC score)
  3. Repeat
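The multi-subject idea on this slide can be sketched as a greedy search in which every candidate step is scored by its average over subjects. The code below is a simplified illustration, not the iMAGES or GES implementation: it searches over individual DAGs with forward edge additions only, whereas GES works over equivalence classes and also has a backward phase. The per-subject score functions and their weights are hypothetical stand-ins for BIC scores.

```python
def creates_cycle(edges, new_edge):
    """True if adding new_edge = (a, b) would close a directed cycle."""
    a, b = new_edge
    stack, seen = [b], set()
    while stack:
        node = stack.pop()
        if node == a:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(child for parent, child in edges if parent == node)
    return False


def images_style_search(candidate_edges, subject_scores):
    """Greedy forward search; each step is chosen by its average score over subjects.

    subject_scores: one function per subject, mapping a frozenset of directed
    edges to a score where higher is better (for example a BIC score).
    """
    def average(edges):
        return sum(score(edges) for score in subject_scores) / len(subject_scores)

    graph = frozenset()
    best = average(graph)
    while True:
        options = [
            graph | {edge}
            for edge in candidate_edges
            if edge not in graph and not creates_cycle(graph, edge)
        ]
        if not options:
            return graph
        top = max(options, key=average)
        if average(top) <= best:
            return graph
        graph, best = top, average(top)


def make_score(weights, penalty=1.0):
    """Hypothetical score: reward per edge minus a complexity penalty per edge."""
    return lambda edges: sum(weights.get(e, 0.0) for e in edges) - penalty * len(edges)


subject_a = make_score({("X", "Y"): 3.0, ("Y", "Z"): 1.5, ("X", "Z"): 1.2})
subject_b = make_score({("X", "Y"): 2.0, ("Y", "Z"): 2.5, ("X", "Z"): 0.4})
candidates = [(a, b) for a in "XYZ" for b in "XYZ" if a != b]
result = images_style_search(candidates, [subject_a, subject_b])
print(sorted(result))
```

In this toy setup subject A alone would accept the edge X → Z, but its average gain across both subjects is negative, so the pooled search leaves it out. That is the sense in which averaging shares qualitative structure across subjects while letting each subject keep its own parameters.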

36 Case Study 7: fMRI
Time Problem 1
fMRI recordings at time intervals can be analyzed as a collection of independent cases. Or, they can be analyzed as an auto-regressive time series. Which is better?
- No general answer.
- But if you think the neural activities measured at time t influence the measurements at time t+1, then the data should be treated as a lag 1 auto-regressive time series.
- But then Granger causality isn’t a consistent estimator of causal relations.

37 Case Study 7: fMRI
Granger Causality Corrected
Causal processes faster than the sampling rate
[Figure: X, Y, Z at times t and t+1]
- Regress on t variables
- Apply GES to the residuals of the regression (Demiralp, Hoover)
- No false path
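The first half of this correction, regressing each variable at t+1 on all variables at t and keeping the residuals, can be sketched in plain Python. This is a minimal illustration under assumptions: the two-variable series is simulated with made-up coefficients, ordinary least squares is solved through the normal equations, and the search step (GES on the residuals) is not implemented here.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lag1_residuals(series):
    """Regress each variable at t+1 on all variables at t (plus an intercept).

    series: list of rows, one row per time point, one column per variable.
    Returns one residual row per time point after the first.
    """
    past = [[1.0] + row for row in series[:-1]]   # regressors at time t
    future = series[1:]                           # responses at time t+1
    p = len(past[0])
    xtx = [[sum(r[i] * r[j] for r in past) for j in range(p)] for i in range(p)]
    residuals = [[0.0] * len(series[0]) for _ in future]
    for v in range(len(series[0])):
        xty = [sum(r[i] * f[v] for r, f in zip(past, future)) for i in range(p)]
        beta = solve(xtx, xty)
        for t, (r, f) in enumerate(zip(past, future)):
            residuals[t][v] = f[v] - sum(b * x for b, x in zip(beta, r))
    return residuals


# Hypothetical lag-1 process: x drives y with a one-step delay.
random.seed(0)
series = [[0.0, 0.0]]
for _ in range(200):
    x, y = series[-1]
    series.append([0.5 * x + random.gauss(0, 1),
                   0.4 * y + 0.6 * x + random.gauss(0, 1)])

res = lag1_residuals(series)
print(len(res), "residual rows")
```

The residual rows are what a search procedure would then receive as if they were independent cases, since the lagged dependence has been regressed out.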

38 Case Study 7: fMRI
Time Problem 2
Varying time delays: neurons → BOLD responses
Try all time shifts of one or two units over all subsets of 3 vars; choose the shift that leads to the best likelihoods
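The shift search described here is a small combinatorial enumeration, sketched below. Two assumptions are made and labeled in the code: "subsets of 3 vars" is read as subsets of up to three variables (matching the simulation, which shifts 0 to 3 variables), and the scoring function is a hypothetical stand-in for the likelihood of the model fitted to the shifted data.

```python
from itertools import combinations, product


def candidate_shifts(variables, max_vars=3, shift_sizes=(1, 2)):
    """Yield dicts {variable: shift} for every subset of up to max_vars variables.

    Assumption: subsets of size 0..max_vars, each member shifted by 1 or 2 steps.
    """
    for size in range(max_vars + 1):
        for subset in combinations(variables, size):
            for shifts in product(shift_sizes, repeat=size):
                yield dict(zip(subset, shifts))


def apply_shift(series, shift):
    """Advance the selected columns in time and trim so all columns stay aligned.

    series: dict of variable name -> list of values (equal lengths).
    shift: dict of variable name -> number of time steps.
    """
    longest = max(shift.values(), default=0)
    length = len(next(iter(series.values()))) - longest
    return {
        name: values[shift.get(name, 0): shift.get(name, 0) + length]
        for name, values in series.items()
    }


def best_shift(series, score, **kwargs):
    """Return the candidate shift whose shifted data gets the highest score."""
    return max(candidate_shifts(list(series), **kwargs),
               key=lambda s: score(apply_shift(series, s)))


# Toy data: b repeats a with a two-step delay (first two values are filler).
toy = {"a": [0, 1, 2, 3, 4, 5], "b": [9, 9, 0, 1, 2, 3]}


def negative_squared_gap(data):
    """Hypothetical score standing in for a likelihood: higher is better."""
    return -sum((x - y) ** 2 for x, y in zip(data["a"], data["b"]))


chosen = best_shift(toy, negative_squared_gap)
n_candidates = sum(1 for _ in candidate_shifts(range(11)))
print(chosen, n_candidates)
```

With 11 ROIs and these assumptions there are 1563 candidate shifts, each requiring a model fit, which is cheap compared with the graph search itself.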

39 Lag 0 result; Lag 1 result [figures not preserved]

40 Case Study 7: fMRI
Simulation Studies: 11 ROIs, each consisting of 50 simulated neurons
- Neuron output spikes simulated by thresholding a tanh function of the sum of neuron inputs
- Excitatory feedback
- Random subset of neurons in one ROI input to random subset of neurons in an “effectively connected ROI”
- Measured variables = BOLD function of sum of ROI neurons + Gaussian error with variance = error variances of empirical measured variables in the X/P experiment

41 Case Study 7: fMRI
Simulate the Xue/Poldrack Experiment Time Series
Repeat 10 times:
- Randomly generate a graphical structure with 11 nodes and 11 (feedforward) directed edges
- Randomly select a subset of simulated ROIs
- Generate data
- Randomly shift 0 to 3 variables one or 2 time steps forward
- Apply the iMAGES method with 0 lag and 1 lag, with backshifting
Tabulate the errors.

42 Case Study 7: fMRI
Simulation Results
0 Lag:
- Average number of false positive edges: 0.7
- Average number of mis-directed edges:
1 Lag Residuals:
- Average number of false positive edges: 1.2
- Average number of mis-directed edges:

43 Other Cases
Economics: Bessler, Pork Prices; Hoover, multiple; Cryder & Loewenstein, Charitable Giving
Educational Research: Easterday, Bias & Recall; Laski, Numerical coding
Climate Research: Glymour, Chu, Teleconnections
Biology: Shipley; SGS, Spartina Grass
Neuroscience: Glymour & Ramsey, fMRI
Epidemiology: Scheines, Lead & IQ

44 Straw Men!
- Model Search ignores theory
- Model Search hides assumptions
- Model Search needs more assumptions than standard statistical models

45 References
Biology
Chu, Tianjiao, Glymour, C., Scheines, R., & Spirtes, P. (2002). A Statistical Problem for Inference to Regulatory Structure from Associations of Gene Expression Measurement with Microarrays. Bioinformatics, 19.
Shipley, B. Exploring hypothesis space: examples from organismal biology. In C. Glymour and G. Cooper (eds), Computation, Causation and Discovery. Cambridge, MA: MIT Press.
Shipley, B. (1995). Structured interspecific determinants of specific leaf area in 34 species of herbaceous angiosperms. Functional Ecology 9.
General
Spirtes, P., Glymour, C., Scheines, R. (2000). Causation, Prediction, and Search, 2nd Edition, MIT Press.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference, Cambridge University Press.

46 References
Scheines, R. (2000). Estimating Latent Causal Influences: TETRAD III Variable Selection and Bayesian Parameter Estimation: the effect of Lead on IQ, Handbook of Data Mining, Pat Hayes, editor, Oxford University Press.
Jackson, A., and Scheines, R. (2005). Single Mothers' Self-Efficacy, Parenting in the Home Environment, and Children's Development in a Two-Wave Study, Social Work Research, 29, 1.
Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49.

47 References
Economics
Akleman, Derya G., David A. Bessler, and Diana M. Burton. (1999). ‘Modeling corn exports and exchange rates with directed graphs and statistical loss functions’, in Clark Glymour and Gregory F. Cooper (eds) Computation, Causation, and Discovery, American Association for Artificial Intelligence, Menlo Park, CA and MIT Press, Cambridge, MA.
Awokuse, T. O. (2005). “Export-led Growth and the Japanese Economy: Evidence from VAR and Directed Acyclical Graphs,” Applied Economics Letters 12(14).
Bessler, David A. and N. Loper. (2001). “Economic Development: Evidence from Directed Acyclical Graphs,” Manchester School 69(4).
Bessler, David A. and Seongpyo Lee. (2002). ‘Money and prices: U.S. data (a study with directed graphs)’, Empirical Economics, Vol. 27.
Demiralp, Selva and Kevin D. Hoover. (2003). “Searching for the Causal Structure of a Vector Autoregression,” Oxford Bulletin of Economics and Statistics 65(supplement).
Haigh, M.S., N.K. Nomikos, and D.A. Bessler. (2004). “Integration and Causality in International Freight Markets: Modeling with Error Correction and Directed Acyclical Graphs,” Southern Economic Journal 71(1).
Sheffrin, Steven M. and Robert K. Triest. (1998). ‘A new approach to causality and economic growth’, unpublished typescript, University of California, Davis.

48 References
Economics
Swanson, Norman R. and Clive W.J. Granger. (1997). ‘Impulse response functions based on a causal approach to residual orthogonalization in vector autoregressions’, Journal of the American Statistical Association, Vol. 92.
Demiralp, S., Hoover, K., & Perez, S. (2008). A Bootstrap Method for Identifying and Evaluating a Structural Vector Autoregression. Oxford Bulletin of Economics and Statistics, 70(4).
Demiralp, Selva and Kevin D. Hoover. (2003). Searching for the Causal Structure of a Vector Autoregression. Oxford Bulletin of Economics and Statistics, 65(s1).
Hoover, Kevin D., Selva Demiralp, and Stephen J. Perez. Empirical Identification of the Vector Autoregression: The Causes and Effects of U.S. M2. Paper written for the Conference in Honour of David F. Hendry at Oxford University, 23-25 August.
Moneta, A. and P. Spirtes. (2006). “Graphical Models for the Identification of Causal Structures in Multivariate Time Series Model”, Proceedings of the 2006 Joint Conference on Information Sciences, JCIS 2006, Kaohsiung, Taiwan, ROC, October 8-11, 2006, Atlantis Press.

49 Extra

Lead and IQ: Variable Selection
Final Variables (Needleman)
- lead: baby teeth
- fab: father’s age
- mab: mother’s age
- nlb: number of live births
- med: mother’s education
- piq: parent’s IQ
- ciq: child’s IQ

Needleman Regression
- standardized coefficient
- (t-ratios in parentheses)
- p-value for significance
ciq regressed on lead, fab, nlb, med, mab, piq [coefficients and most p-values not preserved]
t-ratios: lead (2.32), fab (1.79), nlb (2.30), med (3.08), mab (1.97), piq (3.87)
All variables significant at .1; R² = .271

TETRAD Variable Selection
Tetrad:
- mab _||_ ciq
- fab _||_ ciq
- nlb _||_ ciq | med
Regression:
- mab _||_ ciq | { lead, med, piq, nlb, fab }
- fab _||_ ciq | { lead, med, piq, nlb, mab }
- nlb _||_ ciq | { lead, med, piq, mab, fab }
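Independence claims such as nlb _||_ ciq | med are typically checked, for linear Gaussian data, by testing whether a partial correlation is zero. The sketch below shows a first-order partial correlation and a Fisher z test using only the standard library. The correlation values and the sample size are hypothetical placeholders, since the slide does not report them, and this is not a description of the exact test TETRAD ran.

```python
from math import atanh, erf, sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r, n, n_conditioning=0):
    """Two-sided p-value for H0: (partial) correlation = 0, via Fisher's z."""
    z = atanh(r) * sqrt(n - n_conditioning - 3)
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - normal_cdf)


# Hypothetical correlations among nlb, ciq, and med; hypothetical sample size.
r_nlb_ciq, r_nlb_med, r_ciq_med, n = -0.15, -0.40, 0.35, 200

marginal_p = fisher_z_pvalue(r_nlb_ciq, n)
r_partial = partial_corr(r_nlb_ciq, r_nlb_med, r_ciq_med)
conditional_p = fisher_z_pvalue(r_partial, n, n_conditioning=1)

print(f"marginal p = {marginal_p:.3f}")
print(f"partial r = {r_partial:.3f}, conditional p = {conditional_p:.3f}")
```

With these made-up numbers the marginal association looks significant while the association given med does not, which is the pattern behind dropping a regressor that a full regression, conditioning on every other covariate at once, would keep.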

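Regressions
- standardized coefficient
- (t-ratios in parentheses)
- p-value for significance
Needleman (R² = .271): ciq regressed on lead, fab, nlb, med, mab, piq [coefficients and most p-values not preserved]
t-ratios: lead (2.32), fab (1.79), nlb (2.30), med (3.08), mab (1.97), piq (3.87)
TETRAD (R² = .243): ciq regressed on lead, med, piq [coefficients not preserved]
t-ratios: lead (2.89), med (3.50), piq (3.59); p-values: <0.01, <0.01, <0.01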

Measurement Error
- Measured regressor variables are proxies that involve measurement error
- Errors-in-all-variables model for Lead’s influence on IQ: underidentified
Strategies:
- Sensitivity Analysis
- Bayesian Analysis

Prior over Measurement Error
Proportion of Variance from Measurement Error:
- Measured Lead: Mean = .2, SD = .1
- Parent’s IQ: Mean = .3, SD = .15
- Mother’s Education: Mean = .3, SD = .15
Prior otherwise uninformative

Posterior
[Figure: posterior distribution, with zero marked]
Robust over similar priors

Using Needleman’s Covariates
With similar prior, the marginal posterior [figure, with zero marked]:
Very sensitive to the prior over the regressors TETRAD eliminated