Longitudinal Data Analysis: Why and How to Do It With Multi-Level Modeling (MLM)?
Oi-man Kwok
Texas A&M University

Road Map
Why do we want to analyze longitudinal data under the multilevel modeling (MLM) framework?
– Dependency issue
– Advantages of using MLM over traditional methods (e.g., univariate ANOVA, multivariate ANOVA)
– Review of important parameters in MLM
How can we do it in SPSS?

Regression Model
e.g., DV: Test scores in a first-year graduate-level statistics course
IV: GRE_M (GRE Math test score)
150 students (i = 1, …, 150)
One of the important assumptions of OLS regression? (Observations are independent of each other.)
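
Written out, the regression on this slide would take the usual single-level form. The symbols $\beta_0$, $\beta_1$, and $e_i$ are assumptions here, chosen to match the notation used later in the deck:

```latex
\[
\text{Score}_i = \beta_0 + \beta_1\,\text{GRE\_M}_i + e_i,
\qquad i = 1, \dots, 150,
\]
\[
\text{with the } e_i \text{ mutually independent, } E(e_i) = 0,\; V(e_i) = \sigma^2 .
\]
```

The independence condition on the $e_i$ is the assumption that repeated measures on the same student violate.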

Ignoring the clustered structure (or dependency between observations) in the analysis can result in:
– Bias in the standard errors
– Bias in the tests of significance and confidence intervals (inflated Type I error rate: e.g., nominal α = .05 but actual α = .10)
– Non-replicable results

Advantages of MLM over the traditional methods for analyzing longitudinal data
– Univariate ANOVA: restricts the error structure to compound symmetry (CS); higher statistical power, but the CS assumption is not likely to be met in longitudinal data
– Multivariate ANOVA: places no restriction on the error structure (unstructured, UN); often too conservative (lower statistical power), and can only handle completely balanced data (listwise deletion)
– More…
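
To make the contrast concrete, the two error structures can be written out for four measurement occasions. This is a sketch in generic notation ($\sigma^2$, $\rho$, $\sigma_{jk}$ are assumed symbols, not taken from the slides):

```latex
\[
\Sigma_{\text{CS}} = \sigma^2
\begin{pmatrix}
1 & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho \\
\rho & \rho & 1 & \rho \\
\rho & \rho & \rho & 1
\end{pmatrix}
\quad (2 \text{ parameters}),
\qquad
\Sigma_{\text{UN}} =
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} & \sigma_{14} \\
\sigma_{12} & \sigma_{22} & \sigma_{23} & \sigma_{24} \\
\sigma_{13} & \sigma_{23} & \sigma_{33} & \sigma_{34} \\
\sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_{44}
\end{pmatrix}
\quad (10 \text{ parameters}).
\]
```

CS forces equal variances and equal covariances across all occasions, while UN estimates every variance and covariance freely, which is where the loss of power comes from.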

Analyzing Longitudinal Data: Example
(Based on actual data; variable names changed for ease of presentation)
Compare two different teaching methods on achievement over time.
Teaching methods: 78 students are randomly assigned to either
– A. Lecture (control group; 39 students), or
– B. Computer (treatment group; 39 students)
Four achievement (Ach) scores were collected from each student after treatment (i.e., the statistics course): right after the course, and 1 year, 2 years, and 3 years after.

[Figure: Achievement (y-axis) against Time in years (x-axis, 0 to 3) for the Computer and Lecture groups. Time = 0 is the immediate posttest measure.]
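
Multi-Level Model (MLM)
Note: Start with a simple growth model; introduce treatment in the example at the end.
A simple regression model for ONE student (Student 36), t = 0, 1, 2, 3:
$Ach_t = \beta_0 + \beta_1 Time_t + e_t$
[Figure: $Ach_t$ against $Time_t$ (0 to 3) for Student 36, showing the fitted line with intercept $\beta_0$ and slope $\beta_1$ and the residuals $e_0$, $e_1$, $e_2$, $e_3$.]
$e_t$: captures the variation of the individual achievement scores around the fitted regression line WITHIN Student 36; $V(e_{ti}) = \sigma^2$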
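
Compare to (Micro Level Model)
$Ach_{ti} = \beta_{0i} + \beta_{1i} Time_{ti} + e_{ti}$ (i = 1, 2, 3, …, 78)
[Figure: $Ach_{ti}$ against $Time_{ti}$ (0 to 3) with separate regression lines for Students 27, 36, and 52, each with its own intercept ($\beta_0$) and slope ($\beta_1$).]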

[Figure: $Ach_{ti}$ against $Time_{ti}$ with the regression lines for Students 27, 36, and 52, labeled by Student ID, each with its own intercept ($\beta_0$) and slope ($\beta_1$).]
The student-specific lines are summarized by:
– Grand intercept
– Grand slope
– Variance of the intercepts
– Variance of the slopes
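
Overall Model
[Figure: Ach against Time with the lines for Students 27, 36, and 52, illustrating the case of no variation among the 78 intercepts (all lines start at $\gamma_{00}$ at Time = 0).]
– $U_{0i}$: captures the deviations of the 78 intercepts from the grand intercept $\gamma_{00}$
– $U_{1i}$: captures the deviations of the 78 slopes from the grand slope $\gamma_{10}$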
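
Overall Model
[Figure: Ach against Time with the lines for Students 27, 36, and 52, illustrating the case of no variation among the 78 slopes (all lines share the slope $\gamma_{10}$).]
[Figure: Ach against Time, overall model.]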

Summary
– Grand intercept: $\gamma_{00}$
– Grand slope: $\gamma_{10}$
– G: captures between-student differences
  – Variance of the intercepts: $\tau_{00}$
  – Variance of the slopes: $\tau_{11}$
  – Covariance between intercepts and slopes: $\tau_{01} = \tau_{10}$
– R: captures within-student random errors, $V(e_{ti}) = \sigma^2$
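
In matrix form, the two pieces named in the summary can be sketched as follows. The form of R assumes independent within-student errors with constant variance, which is what $V(e_{ti}) = \sigma^2$ suggests; $I_4$ (the 4 × 4 identity, for a student observed at all four time points) is notation added here:

```latex
\[
G = \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{10} & \tau_{11} \end{pmatrix},
\quad \tau_{01} = \tau_{10},
\qquad
R = \sigma^2 I_4 =
\begin{pmatrix}
\sigma^2 & 0 & 0 & 0 \\
0 & \sigma^2 & 0 & 0 \\
0 & 0 & \sigma^2 & 0 \\
0 & 0 & 0 & \sigma^2
\end{pmatrix}.
\]
```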

MACRO vs. MICRO
UNITS:
         Educational study   Family study    Longitudinal study
MACRO    School/Class        Family          Individual
MICRO    Student             Family member   Repeated observations

MACRO vs. MICRO (Cont.)
MODELS:
– MICRO level model: a regression model fitted to the observations within each MACRO unit
– MACRO level model: a model that captures the differences between the overall model and the individual regression models of the different macro units

Dependent variable: Math achievement (Achieve; repeated measures, micro level)
Predictors:
– Repeated-measure (MICRO) level predictor: Time (and any time-varying covariates)
– Student (MACRO) level predictor: Computer (different teaching methods) (and any time-invariant variables such as gender)

Data format under MANOVA approaches (SPSS data format):
[Table: one row per student (S1, S2, S3) with columns Student, Treat, T0, T1, T2, T3.]
– S1 has responses at all time points
– S2 has a missing response at time 2 (indicated by "--")
– S3 has a missing response at time 0
MANOVA retains only S1 in the analysis.

Data format for MANOVA:
[Table: one row per student (S1, S2, S3) with columns Student, Treat, T0, T1, T2, T3.]
Data format for multilevel model:
[Table: one row per observed time point, with columns Student, Treat, Time, DV.]
All 3 students are included in the analyses.
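
One way to move from the MANOVA layout to the multilevel layout in SPSS is the VARSTOCASES command. The sketch below assumes the wide file is the active dataset and uses the column names from the tables (Student, Treat, T0 to T3, Time, DV); it is an illustration, not syntax taken from the original slides:

```spss
* Restructure from wide (one row per student) to long (one row per observation).
VARSTOCASES
  /MAKE DV FROM T0 T1 T2 T3
  /INDEX = Time(4)
  /KEEP = Student Treat
  /NULL = DROP.
* The index runs from 1 to 4, so shift it to 0 to 3 to make Time = 0 the immediate posttest.
COMPUTE Time = Time - 1.
EXECUTE.
```

With /NULL = DROP, a time point with a missing score produces no row, so students such as S2 and S3 keep their observed occasions instead of being deleted entirely.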

[Table: long format with columns Student, Treat, Time, DV; 12 rows.]
Can you transform this dataset back into the multivariate format?
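
A possible answer in SPSS is the CASESTOVARS command. This sketch assumes the long file is the active dataset with the columns Student, Treat, Time, and DV shown in the table:

```spss
* Restructure from long (one row per observation) back to wide (one row per student).
SORT CASES BY Student Time.
CASESTOVARS
  /ID = Student
  /INDEX = Time
  /GROUPBY = VARIABLE.
```

The result should have one row per student and one DV column per Time value, with Treat carried along as a student-level variable because it is constant within each student.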

Questions
1. On average, is there any trend in math achievement over time?
2. Are there any differences between students in the trend of math achievement over time? (Do all students have the same trend of math achievement over time?)
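
Micro Level (Level 1):
$MA_{ti} = \beta_{0i} + \beta_{1i} Time_{ti} + e_{ti}$
Macro Level (Level 2):
$\beta_{0i} = \gamma_{00} + U_{0i}$ ($\gamma_{00}$: grand intercept)
$\beta_{1i} = \gamma_{10} + U_{1i}$ ($\gamma_{10}$: grand slope)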
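
Micro Level + Macro Level = Combined Model
$MA_{ti} = \gamma_{00} + \gamma_{10} Time_{ti} + U_{0i} + U_{1i} Time_{ti} + e_{ti}$
– Grand intercept: $\gamma_{00}$; grand slope: $\gamma_{10}$
– Between-student differences: $U_{0i} + U_{1i} Time_{ti}$, with $V(U_{0i}) = \tau_{00}$ and $V(U_{1i}) = \tau_{11}$
– Within-student errors: $e_{ti}$, with $V(e_{ti}) = \sigma^2$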
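
The variance components imply a variance function for the outcome. As a short derivation, assuming the random effects $U_{0i}$, $U_{1i}$ are independent of the errors $e_{ti}$ and the errors are independent across occasions (with $\tau_{01}$ denoting the intercept-slope covariance):

```latex
\[
V(MA_{ti}) = \tau_{00} + 2\,\tau_{01}\,Time_{ti} + \tau_{11}\,Time_{ti}^{2} + \sigma^{2},
\]
\[
\mathrm{Cov}(MA_{ti}, MA_{t'i}) = \tau_{00} + \tau_{01}\,(Time_{ti} + Time_{t'i}) + \tau_{11}\,Time_{ti}\,Time_{t'i},
\qquad t \neq t'.
\]
```

Unlike compound symmetry, the variance is allowed to change with time, and the covariance between two occasions depends on which occasions they are.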

[Figure legend: Red = Computer; Blue = Lecture.]

$MA_{ti} = \gamma_{00} + \gamma_{10} Time_{ti} + U_{0i} + U_{1i} Time_{ti} + e_{ti}$
SPSS MIXED syntax:
  MIXED mathach WITH Time
    /METHOD = REML
    /FIXED = intercept Time
    /RANDOM = intercept Time | SUBJECT(Subid) COVTYPE(UN)
    /PRINT = G SOLUTION TESTCOV.
  EXECUTE.
Notes on the syntax:
– MIXED line: DV WITH continuous IV, BY categorical IV
– /METHOD: default is REML (restricted maximum likelihood); the other option is ML (maximum likelihood)
– /FIXED: captures the overall model
– /RANDOM: specifies the random effects, i.e., the effects that capture the between-student differences
– SUBJECT(Subid): identity variable for the macro-level units (e.g., Subid)
– COVTYPE(UN): structure of the G matrix (unstructured)
– /PRINT = G: prints the G matrix
– /PRINT = SOLUTION: requests the regression coefficients
– /PRINT = TESTCOV: produces asymptotic standard errors and Wald Z-tests for the covariance parameter estimates

SPSS Output: Basic Information

[SPSS output] Requested by the SOLUTION keyword in the PRINT subcommand (line 5 of the syntax):
– $\gamma_{00}$: average MA score at Time = 0
– $\gamma_{10}$: average trend of the MA score
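
[SPSS output] Requested by the G keyword in the PRINT subcommand (line 5 of the syntax): the G matrix
$G = \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{10} & \tau_{11} \end{pmatrix}$
[SPSS output] Requested by the TESTCOV keyword in the PRINT subcommand (line 5 of the syntax): asymptotic standard errors and Wald Z-tests for $\sigma^2$, $\tau_{00}$, $\tau_{10}$ ($= \tau_{01}$), and $\tau_{11}$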

Compare: Likelihood Ratio Test!
Can I have a simpler G matrix (i.e., $\tau_{01} = \tau_{10} = 0$)?
Compare the -2LL of the two models.

Syntax for fitting the simpler G (SPSS syntax):
  /RANDOM = intercept Time | SUBJECT(Subid) COVTYPE(DIAG)

– Model with $\tau_{01} = \tau_{10} = 0$: -2 Res Log Likelihood (or deviance). Choose this model.
– Model with $\tau_{01} = \tau_{10} \neq 0$: -2 Res Log Likelihood (or deviance)
$\chi^2(1) = .000$, $p = 1.00$
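
The comparison on this slide follows the usual likelihood ratio (deviance difference) recipe. A sketch in generic notation, where $D$ stands for the -2 Res Log Likelihood and $q$ for the number of covariance parameters (both symbols are introduced here for illustration):

```latex
\[
\chi^2 = D_{\text{simpler}} - D_{\text{fuller}},
\qquad
df = q_{\text{fuller}} - q_{\text{simpler}} .
\]
```

Here the unstructured G has three parameters ($\tau_{00}$, $\tau_{01}$, $\tau_{11}$) and the diagonal G has two, giving $df = 1$. Because the deviances come from REML, the comparison is only meaningful when both models contain the same fixed effects, as they do in this case.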

Compare to the model with $\tau_{11} = 0$ (SPSS syntax):
  /RANDOM = intercept | SUBJECT(Subid) COVTYPE(DIAG)

– Model with $\tau_{01} = \tau_{10} = 0$, $\tau_{11} \neq 0$: -2 Res Log Likelihood. Choose this model.
– Model with $\tau_{11} = \tau_{01} = \tau_{10} = 0$: -2 Res Log Likelihood
$\chi^2(1) = 14.51$, $p < .001$ (halved p-value)
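
The halved p-value can be reproduced with a few lines of SPSS syntax. This is a minimal sketch: the one-row scratch dataset and the variable names chisq, p_full, and p_half are made up for illustration, and only the value 14.51 comes from the slide:

```spss
* Hypothetical one-row scratch dataset holding the deviance difference from the slide.
DATA LIST FREE / chisq.
BEGIN DATA
14.51
END DATA.
* Upper-tail p-value from a chi-square distribution with 1 df.
COMPUTE p_full = 1 - CDF.CHISQ(chisq, 1).
* Halve it because the null value of the slope variance (zero) lies on the boundary of the parameter space.
COMPUTE p_half = p_full / 2.
FORMATS p_full p_half (F8.5).
LIST.
```

Both values should fall below .001, in line with the slide. Halving is the common adjustment when the parameter being tested is a variance, since a variance cannot be negative.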

Result of the Final Model
[SPSS output: estimates of $\gamma_{00}$, $\gamma_{10}$, $\tau_{00}$, $\tau_{11}$, and $\sigma^2$.]

1. On average, is there any trend in math achievement over time?
2. Are there any differences between students in the trend of math achievement over time? (Or, do all students have the same trend of math achievement over time?) See the estimates of $\tau_{00}$ and $\tau_{11}$.
3. If yes to Q2, what causes the differences?

Micro Level (Level 1):
$MA_{ti} = \beta_{0i} + \beta_{1i} Time_{ti} + e_{ti}$ (variance of $e_{ti}$ = $\sigma^2$)
Macro Level (Level 2):
$\beta_{0i} = \gamma_{00} + \gamma_{01} Comp_i + U_{0i}$
$\beta_{1i} = \gamma_{10} + \gamma_{11} Comp_i + U_{1i}$
(variance of $U_{0i}$ = $\tau_{00}$; variance of $U_{1i}$ = $\tau_{11}$)
Combined Model:
$MA_{ti} = \gamma_{00} + \gamma_{01} Comp_i + \gamma_{10} Time_{ti} + \gamma_{11} Time_{ti} \times Comp_i + U_{0i} + U_{1i} Time_{ti} + e_{ti}$
Null hypothesis: the different teaching methods have the SAME effect on achievement over time ($H_0$: $\gamma_{11} = 0$)

$MA_{ti} = \gamma_{00} + \gamma_{01} Comp_i + \gamma_{10} Time_{ti} + \gamma_{11} Time_{ti} \times Comp_i + U_{0i} + U_{1i} Time_{ti} + e_{ti}$
SPSS MIXED syntax:
  MIXED mathach WITH Time Comp
    /METHOD = REML
    /FIXED = intercept Comp Time Time*Comp
    /RANDOM = intercept Time | SUBJECT(Subid) COVTYPE(DIAG)
    /PRINT = G SOLUTION TESTCOV.
  EXECUTE.

[SPSS output: covariance parameter estimates without Comp in the macro models, and with Comp in the macro models.]
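
Comparing the model WITHOUT "Comp" to the model WITH "Comp":
– Proportion of variance in the intercept ($\tau_{00}$) explained by "Comp" = ($\tau_{00}$ without Comp − $\tau_{00}$ with Comp) / ($\tau_{00}$ without Comp) = .13 (or 13%)
– Proportion of variance in the slope ($\tau_{11}$) explained by "Comp" = ($\tau_{11}$ without Comp − $\tau_{11}$ with Comp) / 14.56 = .33 (or 33%)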

Solution for Fixed Effects
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept                                              <.0001
time
computer
time*comp

Overall model for students in the Lecture method group
Overall model for students in the Computer method group
Random effect: $V(e_{ti}) = \sigma^2 = 90.00$
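
In symbols, the two overall models follow from the combined model by plugging in the group code. This sketch assumes Comp is coded 0 for Lecture and 1 for Computer, a coding the slides do not state explicitly:

```latex
\[
\text{Lecture } (Comp_i = 0): \quad
E(MA_{ti}) = \gamma_{00} + \gamma_{10}\,Time_{ti}
\]
\[
\text{Computer } (Comp_i = 1): \quad
E(MA_{ti}) = (\gamma_{00} + \gamma_{01}) + (\gamma_{10} + \gamma_{11})\,Time_{ti}
\]
```

Under that coding, $\gamma_{01}$ is the group difference at Time = 0 and $\gamma_{11}$ is the group difference in the yearly trend.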

[Figure: Achievement (y-axis) against Time in years (x-axis) for the Computer and Lecture groups. Time = 0 is the immediate posttest measure.]

Conclusion
Advantages of using MLM over traditional ANOVA approaches for analyzing longitudinal data:
– 1. Can flexibly model the variance function
– 2. Retains the meaning of the random effects
– 3. Can explore factors that predict individual differences in change over time (e.g., treatment effect)
– 4. Takes both unequal spacing and missing data into account

Take Home Exercise
A clinical psychologist wants to examine the impact of the stress level of each family member (STRESS) on his/her level of symptomatology (SYMPTOM). There are 100 families, and families vary in size from three to eight members. The total number of participants is 400.
a) Can you write out the model? (Hint: What is in the micro model? What is in the macro model?)
b) Can you write out the syntax (SPSS) to analyze this model?
c) In designing the study, what possible macro predictors do you think the clinical psychologist should include in her study? (e.g., family size?)
d) In designing the study, what possible micro predictors do you think the clinical psychologist should include in her study? (e.g., participant's neuroticism?)
e) Can you write out the model? (Hint: What is in the micro model? What is in the macro model?)
f) Can you write out the syntax (SPSS) to analyze this model?

Answers
a) Micro-level model:
$SYMPTOM_{ij} = \beta_{0j} + \beta_{1j} STRESS_{ij} + e_{ij}$
Macro-level model:
$\beta_{0j} = \gamma_{00} + U_{0j}$
$\beta_{1j} = \gamma_{10} + U_{1j}$
Combined model:
$SYMPTOM_{ij} = \gamma_{00} + \gamma_{10} STRESS_{ij} + U_{0j} + U_{1j} STRESS_{ij} + e_{ij}$
b) SPSS syntax:
  MIXED Symptom WITH Stress
    /FIXED = intercept Stress
    /RANDOM = intercept Stress | SUBJECT(Family) COVTYPE(UN)
    /PRINT = G SOLUTION TESTCOV.
  EXECUTE.
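
For parts e) and f), one possible answer adds a macro-level predictor to both macro equations. The sketch below uses family size as that predictor, following the hint in part c); the variable name FamSize is hypothetical. The combined model would then be $SYMPTOM_{ij} = \gamma_{00} + \gamma_{01} FamSize_j + \gamma_{10} STRESS_{ij} + \gamma_{11} STRESS_{ij} \times FamSize_j + U_{0j} + U_{1j} STRESS_{ij} + e_{ij}$, and the syntax could look like this:

```spss
* FamSize is a hypothetical family-level (macro) predictor, constant within each family.
MIXED Symptom WITH Stress FamSize
  /FIXED = intercept FamSize Stress Stress*FamSize
  /RANDOM = intercept Stress | SUBJECT(Family) COVTYPE(UN)
  /PRINT = G SOLUTION TESTCOV.
EXECUTE.
```

The Stress*FamSize term plays the same role as Time*Comp in the teaching-method example: it tests whether the effect of stress on symptoms differs across families of different sizes.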
THE END! THANK YOU!