1 Part 2: Automatically Identifying and Measuring Latent Variables for Causal Theorizing

2 Assumptions Throughout: Causal Bayes Nets; the Causal Markov Condition; Faithfulness
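
The two conditions can be made concrete for a linear Gaussian causal Bayes net. The sketch below is an illustration under assumptions of our own (a three-variable linear model with independent unit-variance noise; the helper names `implied_cov`, `partial_corr` and `corr` are hypothetical). It computes the covariance a graph implies, shows that the chain X → Z → Y makes X and Y independent given Z, as the Causal Markov Condition requires, and then adds a direct edge X → Y whose coefficient exactly cancels the indirect path. The resulting zero correlation between X and Y is not entailed by the graph; Faithfulness is the assumption that rules out such coincidences.

```python
import numpy as np

def implied_cov(B, noise_var):
    """Covariance implied by the linear SEM x = B x + e with independent noise e.

    B[i, j] is the coefficient of variable j in the equation for variable i.
    """
    n = B.shape[0]
    A = np.linalg.inv(np.eye(n) - B)          # x = A e
    return A @ np.diag(noise_var) @ A.T

def partial_corr(S, i, j):
    """Correlation of i and j given all remaining variables (via the precision matrix)."""
    P = np.linalg.inv(S)
    return -P[i, j] / np.sqrt(P[i, i] * P[j, j])

def corr(S, i, j):
    return S[i, j] / np.sqrt(S[i, i] * S[j, j])

X, Z, Y = 0, 1, 2
b, c = 0.5, 0.8                               # illustrative X -> Z and Z -> Y coefficients

# Chain X -> Z -> Y: the Markov condition entails X _||_ Y | Z.
B_chain = np.zeros((3, 3))
B_chain[Z, X] = b
B_chain[Y, Z] = c
S_chain = implied_cov(B_chain, np.ones(3))
print("chain:", round(corr(S_chain, X, Y), 3), round(partial_corr(S_chain, X, Y), 3))

# Add X -> Y with a coefficient that cancels the indirect path X -> Z -> Y exactly.
B_cancel = B_chain.copy()
B_cancel[Y, X] = -b * c
S_cancel = implied_cov(B_cancel, np.ones(3))
# X causes Y, yet their marginal correlation vanishes: an unfaithful distribution.
print("cancel:", round(corr(S_cancel, X, Y), 3), round(partial_corr(S_cancel, X, Y), 3))
```

In the second model X and Y are uncorrelated but dependent given Z, so a search procedure that trusted the marginal independence would draw the wrong graph.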

3 Latent Variables: Reduce Dimensionality

4 Latent Variables: Cluster of Causes

5 Latent Variables: Model concepts that might be “real” but cannot be directly measured, e.g., air pollution, depression

6 The Causal Theory Formation Problem for Latent Variable Models
Given observations on a number of variables, identify the latent variables that underlie these variables and the causal relations among these latent concepts.
Example: spectral measurements of solar radiation intensities; the variables are the intensities at each measured frequency.
Example: latent concepts such as Quality of a Child’s Home Environment, Cumulative Exposure to Lead, and Cognitive Functioning.

7 The Most Common Automatic Solution: Exploratory Factor Analysis
Chooses “factors” to account linearly for as much of the variance/covariance of the measured variables as possible.
- Great for dimensionality reduction
- Factor rotations are arbitrary
- Gives no information about the statistical, and thus the causal, dependencies among any real underlying factors
- No general theory of the reliability of the procedure
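
Why factor rotations are arbitrary can be checked in a few lines. In the common factor model Σ = ΛΛᵀ + Ψ (assuming, for this sketch, orthogonal factors with unit variance), replacing the loading matrix Λ by ΛR for any orthogonal matrix R leaves the implied covariance unchanged, so the data cannot prefer one rotation over another. The loadings below are made-up illustrative numbers, and `implied_cov` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical loadings: 6 measured variables on 2 orthogonal factors.
Lam = np.array([[0.8, 0.0],
                [0.7, 0.1],
                [0.6, 0.0],
                [0.0, 0.8],
                [0.1, 0.7],
                [0.0, 0.6]])
# Uniquenesses chosen so that every measured variable has unit variance.
Psi = np.diag(1.0 - np.sum(Lam**2, axis=1))

def implied_cov(L, Psi):
    """Covariance implied by the orthogonal common factor model."""
    return L @ L.T + Psi

theta = np.deg2rad(30.0)                      # any angle works
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthogonal: R @ R.T = I

Lam_rot = Lam @ R
print(np.allclose(implied_cov(Lam, Psi), implied_cov(Lam_rot, Psi)))  # True
print(np.round(Lam_rot, 2))                   # different-looking loadings, same fit
```

Rotation criteria such as varimax pick one member of this family for interpretability, but the choice is not dictated by the covariances themselves.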

8 Other Solutions: Independent Components, etc.; Background Theory; Scales

9 Other Solutions: Background Theory
Key causal question (shown in the figure). Thus, the key statistical question: Lead _||_ Cog | Home?
[Figure: Specified Model]

10 Other Solutions: Background Theory
Lead _||_ Cog | Home? Yes in the true model, but statistical inference will say otherwise because of “impurities” in the measures.
[Figure: True Model]
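
A question such as Lead _||_ Cog | Home is typically answered with a test of zero partial correlation. The sketch below assumes linear relations and approximately Gaussian data and uses Fisher's z transform; the correlation matrices and the sample size are invented for illustration, and `partial_corr` and `fisher_z_pvalue` are hypothetical helper names rather than an API from the tools discussed here. In the first matrix the Lead-Cog correlation is exactly what Home alone explains, so the test does not reject; in the second it is not, so the test rejects. With real indicators, such a rejection can come from impure measures rather than from a direct Lead → Cog effect.

```python
import math
import numpy as np

def partial_corr(R, i, j):
    """Partial correlation of variables i and j given all others in R."""
    P = np.linalg.inv(R)
    return -P[i, j] / math.sqrt(P[i, i] * P[j, j])

def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: partial correlation = 0, given k conditioning variables."""
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# Variable order: Home, Lead, Cog (illustrative correlations, not the slides' data).
R_indep = np.array([[ 1.0, -0.8,   0.7 ],
                    [-0.8,  1.0,  -0.56],   # -0.56 = (-0.8) * 0.7
                    [ 0.7, -0.56,  1.0 ]])
R_dep = np.array([[ 1.0, -0.8,  0.7],
                  [-0.8,  1.0, -0.4],
                  [ 0.7, -0.4,  1.0]])

n = 200                                     # assumed sample size
for name, R in [("independent", R_indep), ("dependent", R_dep)]:
    r = partial_corr(R, 1, 2)               # Lead, Cog | Home
    print(name, round(r, 3), fisher_z_pvalue(r, n, k=1))
```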

11 Other Solutions: Background Theory
“Impure” measures in the true model: C1, C2, T2, T20. A measure is “pure” if it is d-separated from all other measures by its latent parent.
[Figure: True Model]
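
In a linear model, purity has a testable signature. If four measures are pure indicators of a single latent, their covariances satisfy all three vanishing tetrad constraints, σ12σ34 = σ13σ24 = σ14σ23. The population-level sketch below uses illustrative loadings; a correlated error between x1 and x2 is our stand-in for an impurity, since it leaves x1 and x2 dependent even given their latent parent. The impurity breaks exactly the two tetrads that involve σ12. The names `one_factor_cov` and `tetrads` are hypothetical.

```python
import numpy as np

lam = np.array([0.9, 0.8, 0.7, 0.6])         # loadings of x1..x4 on one latent (variance 1)

def one_factor_cov(lam, err_cov_12=0.0):
    """Population covariance of x_i = lam_i * L + e_i; optionally correlate e1 and e2."""
    S = np.outer(lam, lam)
    np.fill_diagonal(S, 1.0)                 # unit-variance measures
    S[0, 1] += err_cov_12
    S[1, 0] += err_cov_12
    return S

def tetrads(S):
    """The three tetrad differences for variables 0..3."""
    return (S[0, 1] * S[2, 3] - S[0, 2] * S[1, 3],
            S[0, 1] * S[2, 3] - S[0, 3] * S[1, 2],
            S[0, 2] * S[1, 3] - S[0, 3] * S[1, 2])

print(np.round(tetrads(one_factor_cov(lam)), 4))                  # pure: all vanish
print(np.round(tetrads(one_factor_cov(lam, err_cov_12=0.2)), 4))  # x1, x2 impure
```

With sample data the differences are never exactly zero, so in practice each constraint is checked with a statistical test.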

12 Purify [Figure: Specified Model]

13-16 Purify [Figure: True Model]

17 Purify [Figure: Purified Model]

18 Other Solutions: Scales
Scale = sum(measures of a latent)

19 Other Solutions: Scales
Pseudo-random sample: N = 2,000
[Figure: True Model]

20 Scales vs. Latent Variable Models
Regression: Cognition on Home, Lead (predictors: Constant, Home, Lead)
R-Sq = 61.1%, R-Sq(adj) = 61.0%; the Lead coefficient is insignificant.
[Figure: True Model]

21 Scales vs. Latent Variable Models: Scales
homescale = (x1 + x2 + x3)/3
leadscale = (x4 + x5 + x6)/3
cogscale = (x7 + x8 + x9)/3
[Figure: True Model]

22 Scales vs. Latent Variable Models
Regression: Cognition on homescale, Lead (predictors: Constant, homescale, Lead)
The Lead coefficient is significant.
[Figure: True Model]
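
The contrast between slides 20 and 22 can be reproduced in a small simulation. The model below is our own assumption, chosen only to mimic the structure described here, and its numbers are not the slides' data: Home and Lead are correlated latents, Cognition depends on Home but not on Lead, and each latent has three noisy indicators averaged into a scale as in slide 21. Because homescale measures Home with error, it cannot fully adjust for Home, so leadscale picks up a spurious, significant coefficient. The helpers `indicators` and `ols_t` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000

# Assumed true model: Home -> Cog, Home and Lead correlated, no Lead -> Cog edge.
home = rng.normal(size=N)
lead = -0.8 * home + rng.normal(scale=0.6, size=N)
cog = home + rng.normal(scale=np.sqrt(0.5), size=N)

def indicators(latent, k=3):
    """k measures, each the latent plus independent unit-variance noise."""
    return latent[:, None] + rng.normal(size=(len(latent), k))

homescale = indicators(home).mean(axis=1)
leadscale = indicators(lead).mean(axis=1)
cogscale = indicators(cog).mean(axis=1)

def ols_t(y, *xs):
    """OLS coefficients and t statistics (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

b_true, t_true = ols_t(cog, home, lead)                   # on the true latents
b_scale, t_scale = ols_t(cogscale, homescale, leadscale)  # on the scales
print("Lead, latents: coef %.3f, t %.1f" % (b_true[2], t_true[2]))
print("Lead, scales:  coef %.3f, t %.1f" % (b_scale[2], t_scale[2]))
```

Raising the number of indicators per scale shrinks the measurement error in homescale and, with it, the spurious Lead coefficient, but it does not remove it.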

23 Scales vs. Latent Variable Models: Modeling Latents
[Figures: True Model, Specified Model]

24 Scales vs. Latent Variable Models
(χ² = 29.6, df = 24, p = .19)
B5 = .0075, which, at t = .23, is correctly insignificant.
[Figures: True Model, Estimated Model]

25 Scales vs. Latent Variable Models: Mixing Latents and Scales
(χ² = 14.57, df = 12, p = .26)
B5 = -.137, which, at t = 5.2 (p < .001), is incorrectly highly significant.
[Figure: True Model]
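
The p-values reported with the model fits on slides 24 and 25 are upper-tail probabilities of the χ² distribution with the stated degrees of freedom. Both df values are even, so the tail probability has a closed form, P(χ²_k ≥ x) = e^(-x/2) Σ_{i=0}^{k/2-1} (x/2)^i / i!, which the standard-library sketch below uses as a quick check; the helper name `chi2_sf_even_df` is hypothetical. Its results should land close to the reported p = .19 and p = .26.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi-square with even df >= x), via the Poisson-sum identity."""
    if df % 2:
        raise ValueError("this closed form needs an even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i                     # (x/2)^i / i!
        total += term
    return math.exp(-half) * total

for chi2, df in [(29.6, 24), (14.57, 12)]:
    print(f"chi2 = {chi2}, df = {df}, p = {chi2_sf_even_df(chi2, df):.3f}")
```

A large p here means the model's implied covariances are not rejected; as slide 25 shows, a well-fitting model can still yield a badly wrong coefficient.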

26 Algorithms: Washdown (Scheines and Glymour, 2000?); Build Pure Clusters (Silva, Scheines, and Glymour, 2003, 2004)

27 Build Pure Clusters
Qualitative assumptions (causal grammar, Tenenbaum):
1. Two types of nodes: measured (M) and latent (L)
2. M ↛ L (measured variables don't cause latents)
3. Each m ∈ M measures (is a direct effect of) at least one l ∈ L
4. No cycles involving M
Quantitative assumptions:
1. Each m ∈ M is a linear function of its parents plus noise
2. P(L) has second moments, positive variances, and no deterministic relations
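
The qualitative assumptions can be read as a checklist on a candidate graph. The helper below is a hypothetical illustration, not part of any published implementation of Build Pure Clusters: given the measured and latent node sets and a list of directed edges, it reports violations of qualitative assumptions 2-4 (assumption 1 is the split into the two node sets itself).

```python
def grammar_violations(measured, latent, edges):
    """Report violations of qualitative assumptions 2-4 for a directed graph."""
    problems = []
    children = {v: set() for v in measured | latent}
    for u, v in edges:
        children[u].add(v)
        if u in measured and v in latent:                       # assumption 2
            problems.append(f"measured {u} causes latent {v}")
    for m in sorted(measured):                                  # assumption 3
        if not any(u in latent and v == m for u, v in edges):
            problems.append(f"{m} has no latent parent")
    for m in sorted(measured):                                  # assumption 4
        stack, seen = list(children[m]), set()
        while stack:                                            # DFS from m's children
            v = stack.pop()
            if v == m:
                problems.append(f"cycle through {m}")
                break
            if v not in seen:
                seen.add(v)
                stack.extend(children[v])
    return problems

M, L = {"x1", "x2", "x3"}, {"Home"}
ok = [("Home", "x1"), ("Home", "x2"), ("Home", "x3")]
print(grammar_violations(M, L, ok))                             # []
print(grammar_violations(M, L, ok + [("x3", "Home")]))
```

The quantitative assumptions (linearity, non-degenerate latent distribution) concern the parameters rather than the graph and are not checked here.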

28 Build Pure Clusters
Output, provably reliable (pointwise consistent): an equivalence class of measurement models over a pure subset of M.
For example: [Figures: True Model, Output]

29 Build Pure Clusters
Measurement models in the equivalence class are at most refinements of the true clustering, never coarsenings or permuted clusterings.
[Figure: Output]

30 Build Pure Clusters
Algorithm sketch:
1. Use particular rank (tetrad) constraints on the measured correlations to find pairs m_j, m_k that do NOT share a latent parent.
2. Add a latent for each subset S of M such that no pair in S was found NOT to share a latent parent in step 1.
3. Purify.
4. Remove latents with no children.
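
Step 2 of the algorithm sketch can be read as finding the maximal sets of measures in which no pair was flagged in step 1, that is, the maximal cliques of a graph whose edges join measures not known to be separated. Taking maximal sets is our reading of the slide, not something it states. The sketch below uses basic Bron-Kerbosch clique enumeration on a hypothetical step-1 result. Steps 1 (the tetrad tests), 3 and 4 are not shown, and `candidate_clusters` is a hypothetical name.

```python
from itertools import combinations

def candidate_clusters(measures, separated):
    """Maximal sets of measures containing no pair flagged as NOT sharing a latent.

    `separated` holds frozenset({a, b}) for each pair flagged in step 1.
    Uses basic Bron-Kerbosch maximal-clique enumeration (no pivoting).
    """
    adj = {m: {o for o in measures if o != m and frozenset((m, o)) not in separated}
           for m in measures}
    clusters = []

    def expand(r, p, x):
        if not p and not x:
            clusters.append(sorted(r))
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(measures), set())
    return sorted(clusters)

# Hypothetical step-1 result: every pair across {m1, m2, m3} and {m4, m5, m6}
# is flagged as separated, except m3-m4, which step 1 could not separate.
ms = ["m1", "m2", "m3", "m4", "m5", "m6"]
sep = {frozenset((a, b)) for a, b in combinations(ms, 2)
       if (a in ms[:3]) != (b in ms[:3]) and {a, b} != {"m3", "m4"}}
print(candidate_clusters(ms, sep))
```

Measures that land in more than one candidate cluster, like m3 and m4 here, are presumably the kind of impurity that step 3 is meant to remove.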

31 Limitations
- Requires large sample sizes to be really reliable (~500).
- Pure indicators must exist for a latent to be discovered and included.
- Moderately computationally intensive (O(n^6)).
- No error probabilities.

32 Case Studies: Stress, Depression, and Religion (Lee, 2004); Test Anxiety (Bartholomew, 2002)

33 Stress, Depression, and Religion
MSW students (N = 127), 61-item survey (Likert scale):
Stress: St1-St21; Depression: D1-D20; Religious Coping: C1-C20
[Figure: Specified Model, p = 0.00]

34 Stress, Depression, and Religion: Build Pure Clusters

35 Stress, Depression, and Religion
Assuming Stress is temporally prior, use MIMbuild to find the latent structure: p = 0.28

36 Test Anxiety
12th-grade males in British Columbia (N = 335), 20-item survey (Likert-scale items): X1-X20
Exploratory Factor Analysis:

37 Test Anxiety: Build Pure Clusters

38 Test Anxiety
Build Pure Clusters: p-value = 0.00; p-value = 0.47; Exploratory Factor Analysis:

39 Test Anxiety
MIMbuild: p = .43
Uninformative scales: no independencies or conditional independencies

40 Future Directions
- Handle discrete items
- Incorporate background knowledge
- Apply to ETS data