1 Part 2 Automatically Identifying and Measuring Latent Variables for Causal Theorizing
2 Assumptions Throughout: Causal Bayes Nets; Causal Markov Condition; Faithfulness
3 Latent Variables Reduce Dimensionality
4 Latent Variables Cluster of Causes
5 Latent Variables: Model concepts that might be “real” but which cannot be directly measured, e.g., air pollution, depression
6 The Causal Theory Formation Problem for Latent Variable Models: Given observations on a number of variables, identify the latent variables that underlie these variables and the causal relations among these latent concepts. Example: Spectral measurements of solar radiation intensities; the variables are intensities at each measured frequency. Example: Quality of a Child’s Home Environment, Cumulative Exposure to Lead, Cognitive Functioning.
7 The Most Common Automatic Solution: Exploratory Factor Analysis. Chooses “factors” to account linearly for as much of the variance/covariance of the measured variables as possible. Great for dimensionality reduction. Factor rotations are arbitrary. Gives no information about the statistical, and thus the causal, dependencies among any real underlying factors. No general theory of the reliability of the procedure.
8 Other Solutions: Independent Components, etc.; Background Theory; Scales
9 Other Solutions: Background Theory. Specified Model. Key Causal Question. Thus, key statistical question: Lead _||_ Cog | Home?
10 Other Solutions: Background Theory. True Model, “Impurities”. Lead _||_ Cog | Home? Yes, but statistical inference will say otherwise.
11 Other Solutions: Background Theory. True Model. “Impure” Measures: C1, C2, T2, T20. A measure is “pure” if it is d-separated from all other measures by its latent parent.
12 Purify Specified Model
13 Purify True Model
14 Purify True Model
15 Purify True Model
16 Purify True Model
17 Purify Purified Model
18 Other Solutions: Scales. Scale = sum(measures of a latent)
19 Other Solutions: Scales. True Model. Pseudo-Random Sample: N = 2,000
20 Scales vs. Latent variable Models. True Model. Regression: Cognition on Home, Lead. Table columns: Predictor, Coef, SE Coef, T, P; rows: Constant, Home, Lead (values not preserved). R-Sq = 61.1%, R-Sq(adj) = 61.0%. Annotation: Insig.
21 Scales vs. Latent variable Models. True Model. Scales: homescale = (x1 + x2 + x3)/3; leadscale = (x4 + x5 + x6)/3; cogscale = (x7 + x8 + x9)/3
22 Scales vs. Latent variable Models. True Model. Regression: Cognition on homescale, Lead. Table columns: Predictor, Coef, SE Coef, T, P; rows: Constant, homescal, Lead (values not preserved). Annotation: Sig.
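The contrast between the two regressions can be reproduced in a small simulation. This is only a sketch under assumed values: the coefficients, noise levels, and the helper `ols` are invented for illustration and are not the parameters of the True Model on the slides. Lead has no direct effect on Cognition here, yet substituting a noisy scale for Home leaves Lead with a clearly nonzero coefficient.

```python
import numpy as np

def ols(y, *predictors):
    """Least-squares coefficients [intercept, b1, b2, ...] (hypothetical helper)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 2000  # sample size from the slides; every coefficient below is an assumption

# Assumed structure: Home -> Lead, Home -> Cog, and NO edge Lead -> Cog.
home = rng.normal(size=n)
lead = -0.8 * home + 0.6 * rng.normal(size=n)
cog = 1.0 * home + rng.normal(size=n)

# Three noisy indicators of Home, averaged into a scale as on the slides.
x = home[:, None] + 1.5 * rng.normal(size=(n, 3))
homescale = x.mean(axis=1)

coef_latent = ols(cog, home, lead)      # conditioning on the true Home
coef_scale = ols(cog, homescale, lead)  # conditioning on the noisy scale

print("Lead coefficient given Home:     ", round(coef_latent[2], 3))
print("Lead coefficient given homescale:", round(coef_scale[2], 3))
```

Because the scale measures Home with error, it only partially screens off Lead from Cognition, so the remaining association is attributed to Lead.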
23 Scales vs. Latent variable Models Modeling Latents True Model Specified Model
24 Scales vs. Latent variable Models. True Model, Estimated Model. (χ² = 29.6, df = 24, p = .19) B5 = .0075, which at t = .23 is correctly insignificant.
25 Scales vs. Latent variable Models: Mixing Latents and Scales. True Model. (χ² = 14.57, df = 12, p = .26) B5 = -.137, which at t = 5.2 is incorrectly highly significant (P < .001).
26 Algorithms: Washdown (Scheines and Glymour, 2000?); Build Pure Clusters (Silva, Scheines, Glymour, 2003, 2004)
27 Build Pure Clusters
Qualitative Assumptions (Causal Grammar - Tenenbaum):
1. Two types of nodes: measured (M) and latent (L)
2. M ↛ L (measured don’t cause latents)
3. Each m ∈ M measures (is a direct effect of) at least one l ∈ L
4. No cycles involving M
Quantitative Assumptions:
1. Each m ∈ M is a linear function of its parents plus noise
2. P(L) has second moments, positive variances, and no deterministic relations
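The first quantitative assumption can be written out as follows. The symbols are assumed notation, not taken from the slides: Pa(m_i) for the parents of m_i, λ for the linear coefficients, and ε for the noise term.

```latex
m_i \;=\; \sum_{p \,\in\, \mathrm{Pa}(m_i)} \lambda_{ip}\, p \;+\; \varepsilon_i ,
\qquad m_i \in M
```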
28 Build Pure Clusters Output - provably reliable (pointwise consistent): Equivalence class of measurement models over a pure subset of M For example: True Model Output
29 Build Pure Clusters Measurement models in the equivalence class are at most refinements, but never coarsenings or permuted clusterings. Output
30 Build Pure Clusters
Algorithm Sketch:
1. Use particular rank (tetrad) constraints on the measured correlations to find pairs m_j, m_k that do NOT share a latent parent
2. Add a latent for each subset S of M such that no pair in S was found NOT to share a latent parent in step 1
3. Purify
4. Remove latents with no children
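To make the tetrad constraints in step 1 concrete, the sketch below computes the three tetrad differences among four measures from population covariances implied by a linear measurement model. It is not the Build Pure Clusters test itself, which applies statistical tests to sample correlations; the loadings, latent correlation, noise variances, and function names are all assumptions chosen for illustration.

```python
import numpy as np

def implied_cov(loadings, latent_cov, noise_var):
    """Covariance of measures m = loadings @ latents + independent noise."""
    lam = np.asarray(loadings, dtype=float)
    return lam @ np.asarray(latent_cov, dtype=float) @ lam.T + np.diag(noise_var)

def tetrad_differences(cov, i, j, k, l):
    """The three tetrad differences among measures i, j, k, l."""
    s = cov
    return (
        s[i, j] * s[k, l] - s[i, k] * s[j, l],
        s[i, j] * s[k, l] - s[i, l] * s[j, k],
        s[i, k] * s[j, l] - s[i, l] * s[j, k],
    )

# Case A (assumed): all four measures are pure indicators of ONE latent.
one_latent = implied_cov([[0.9], [0.8], [0.7], [0.6]], [[1.0]], [0.5] * 4)

# Case B (assumed): measures 0,1 load on one latent, measures 2,3 on another,
# and the two latents have correlation 0.5.
two_latents = implied_cov(
    [[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]],
    [[1.0, 0.5], [0.5, 1.0]],
    [0.5] * 4,
)

t_one = tetrad_differences(one_latent, 0, 1, 2, 3)
t_two = tetrad_differences(two_latents, 0, 1, 2, 3)
print("one latent: ", np.round(t_one, 4))   # all three vanish
print("two latents:", np.round(t_two, 4))   # only the last one vanishes
```

With a single latent parent every tetrad difference vanishes; when the measures split across two latents, the differences that pair measures within clusters no longer vanish. Patterns of this kind are what the constraint tests look for when deciding which pairs do not share a latent parent.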
31 Limitations: Requires large sample sizes to be really reliable (~ 500). Pure indicators must exist for a latent to be discovered and included. Moderately computationally intensive (O(n^6)). No error probabilities.
32 Case Studies: Stress, Depression, and Religion (Lee, 2004); Test Anxiety (Bartholomew, 2002)
33 Stress, Depression, and Religion. MSW Students (N = 127). 61-item survey (Likert Scale). Stress: St1 - St21; Depression: D1 - D20; Religious Coping: C1 - C20. Specified Model: P = 0.00
34 Stress, Depression, and Religion Build Pure Clusters
35 Stress, Depression, and Religion Assume Stress temporally prior: MIMbuild to find Latent Structure: P = 0.28
36 Test Anxiety. 12th Grade Males in British Columbia (N = 335). 20-item survey (Likert Scale items): X1 - X20. Exploratory Factor Analysis:
37 Test Anxiety Build Pure Clusters :
38 Test Anxiety Build Pure Clusters : P-value = 0.00 P-value = 0.47 Exploratory Factor Analysis :
39 Test Anxiety. MIMbuild: p = .43. Uninformative Scales: No Independencies or Conditional Independencies
40 Future Directions: Handle discrete items; Incorporate background knowledge; Apply to ETS data