Latent variable modeling of psychological longitudinal data: taking into account the unobserved heterogeneity using Mplus
Jacques Juhel, University Rennes 2, CRPCC, EA 1285
June 2-4, 2010 - Saint-Raphaël INSERM workshop : Mixture modelling for longitudinal data
Introduction
Studying individual differences in learning, change and development.
A double compromise: random effect models and classification techniques.
Introduction
Among other methods: the growth mixture modeling (GMM) approach of Muthén and colleagues, a technique for longitudinal data that:
- combines categorical and continuous latent variables in the same model (“beyond SEM”),
- accommodates unobserved heterogeneity in the sample,
- allows the latent growth parameters within each latent class to be influenced by time-varying covariates and time-invariant predictor variables,
- incorporates consequent outcomes predicted by the latent class variable.
LGM specifications
The LGM for a continuous outcome: the multivariate latent variable approach
Factor analysis measurement model (level 1): $Y_i = \nu + \Lambda \eta_i + \varepsilon_i$
- Y_i (m×1): repeated measures over fixed time points,
- ν (m×1): intercepts in the regression of Y_i on η_i,
- η_i (p×1): latent growth factors,
- Λ (m×p): design matrix of factor loadings,
- ε_i (m×1): residuals in the regression of Y_i on η_i (covariance matrix Θ).
LGM specifications
The LGM for a continuous outcome: the multivariate latent variable approach
Structural regression model (level 2): $\eta_i = \alpha + B \eta_i + \zeta_i$
- α (p×1): means of η_i, or intercepts in the regression of η_i on η_i,
- B (p×p): regression coefficients in the regression of η_i on η_i,
- η_i (p×1): latent growth factors,
- ζ_i (p×1): residuals in the regression of η_i on η_i (covariance matrix Ψ).
LGM assumptions
The LGM for a continuous outcome: the multivariate latent variable approach
The covariance and mean structures are derived for the population under the hypothesis that:
- ε, ζ and η are mutually uncorrelated,
- E[ε] and E[ζ] equal 0.
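As an illustration of what "deriving the mean and covariance structure" amounts to, here is a minimal NumPy sketch of the standard model-implied moments, μ = ν + Λ(I − B)⁻¹α and Σ = Λ(I − B)⁻¹Ψ(I − B)⁻ᵀΛᵀ + Θ. The function name `implied_moments` and all numerical values are hypothetical and chosen only for readability; they are not taken from the workshop data.

```python
import numpy as np

def implied_moments(nu, lam, alpha, beta, psi, theta):
    """Model-implied mean vector and covariance matrix of the repeated measures."""
    p = lam.shape[1]
    inv = np.linalg.inv(np.eye(p) - beta)            # (I - B)^{-1}
    mu = nu + lam @ inv @ alpha                      # mean structure
    sigma = lam @ inv @ psi @ inv.T @ lam.T + theta  # covariance structure
    return mu, sigma

# Hypothetical linear growth model with four equally spaced occasions.
times = np.array([0.0, 1.0, 2.0, 3.0])
lam = np.column_stack([np.ones(4), times])  # loadings: intercept factor, slope factor
nu = np.zeros(4)                            # outcome intercepts fixed at 0
alpha = np.array([10.0, 2.0])               # growth factor means (made-up values)
beta = np.zeros((2, 2))                     # no regressions among growth factors
psi = np.array([[4.0, 0.5],
                [0.5, 1.0]])                # growth factor (co)variances (made-up)
theta = 2.0 * np.eye(4)                     # residual variances (made-up)

mu, sigma = implied_moments(nu, lam, alpha, beta, psi, theta)
print(mu)
print(sigma)
```

With the time scores 0, 1, 2, 3 in the second column of Λ, the implied means rise linearly from the intercept mean by one slope mean per occasion.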
The unconditional linear LGM
SEM representation [path diagram: y1, y2, y3, y4]
Free parameters (Mplus output):
- α: means of η0 and η1,
- Ψ: var(η0), var(η1), cov(η0, η1), res. var(y).
LGM specifications
The LGM with time-varying covariates
Factor analysis measurement model (level 1): $Y_i = \nu + \Lambda \eta_i + K a_i + \varepsilon_i$
- Y_i (m×1): repeated measures over fixed time points,
- ν (m×1): intercepts in the regression of Y_i on η_i,
- η_i (p×1): latent growth factors,
- Λ (m×p): design matrix of factor loadings,
- Κ (m×r): coefficients in the regression of Y_i on the time-varying covariates a_i,
- ε_i (m×1): residuals in the regression of Y_i on η_i (covariance matrix Θ).
Linear LGM with time-varying covariates
SEM representation [path diagram: y1, y2, y3, y4; a1, a2, a3, a4]
Free parameters (Mplus output):
- B: regression coefficients of y on a,
- α: means of η0 and η1,
- Ψ: var(η0), var(η1), cov(η0, η1), res. var(y), cov(a, η0), cov(a, η1).
LGM specifications
The LGM with time-invariant covariates
Structural regression model (level 2), with vector of predictors X: $\eta_i = \alpha + B \eta_i + \Gamma X_i + \zeta_i$
- η_i (p×1): latent growth factors,
- α (p×1): means of η_i, or intercepts in the regression of η_i on η_i,
- B (p×p): regression coefficients in the regression of η_i on η_i,
- X_i (q×1): time-invariant covariate predictors of change,
- Γ (p×q): regression coefficients in the regression of η on X,
- ζ_i (p×1): residuals in the regression of η_i on η_i (covariance matrix Ψ).
The linear LGM with time-varying and time-invariant covariates
SEM representation [path diagram: y1, y2, y3, y4; x1, x2, x3; a1, a2, a3, a4]
Free parameters (Mplus output):
- B: regression coefficients of y on a; regression coefficients of η0 and η1 on x,
- α: intercepts of η0 and η1; means of a1-a4,
- Ψ: res. var(η0), res. var(η1), res. cov(η0, η1), res. var(y), cov(a, η0), cov(a, η1), cov(a, x).
LGM specifications
The linear LGM with time-varying, time-invariant covariates and a distal outcome
Consequences of change can be modeled as outcomes predicted by the latent growth factors: $Z_i = \omega + \beta \eta_i + \xi_i$
- Z_i (d×1): vector of distal outcomes of change,
- β (d×p): matrix of regression coefficients of Z on η,
- ω (d×1): vector of regression intercepts for Z,
- ξ_i (d×1): residuals in the regression of Z_i on η_i (covariance matrix Ψ).
The linear LGM with time-varying, time-invariant covariates and a distal outcome
SEM representation [path diagram: y1, y2, y3, y4; x1, x2, x3; a1, a2, a3, a4; z]
Free parameters (Mplus output):
- B: regression coefficients of y on a; regression coefficients of η0 and η1 on x; regression coefficients of z on η0 and η1,
- α: intercepts of η0 and η1; means of a1-a4; intercept of z,
- Ψ: res. var(η0), res. var(η1), res. cov(η0, η1), res. var(y), cov(a, η0), cov(a, η1), cov(a, x).
Illustration: data set 1
Clinical symptomatology, performance on the Trail Making Test (TMT) and consciousness disorders in schizophrenia
- 130 stabilized patients with schizophrenia (M = 31.0 yr., IQ > 90, all on neuroleptic medication).
- Time to complete TMT parts A and B separately at 4 equally spaced time points (t = 0, t = 2, t = 4 and t = 6 months).
- t = -1: scores on the Positive and Negative Syndrome Scale.
Illustration: data set 1
Trail Making Test: response time, t0 to t3 (only N = 102 complete cases) [figure]
Illustration: data set 1
Fitting a linear LGM with time-varying and time-invariant covariates to TMT data (N=102)
[Path diagram: TMT form B (B1-B4); TMT form A (A1-A4); growth factors i and s; covariates Dis, Pos, Neg, Host, Anx]
Illustration: data set 1
Is the linear growth model tenable? [figure]
Illustration: data set 1
Conditional LGM: results (ML estimation)
             Estimate     S.E.   Est./S.E.   Two-Tailed P-Value
I ON
  DISORG        5.075    2.666       1.904     0.057
  POS           2.983    2.536       1.176     0.240
  NEG           0.089    2.562       0.035     0.972
  HOST         -3.696    2.875      -1.285     0.199
  ANX           4.272    2.817       1.516     0.129
S ON
  DISORG       -2.006    1.034      -1.940     0.052
  POS          -1.376    0.984      -1.400     0.162
  NEG           1.408    0.991       1.421     0.155
  HOST          1.222    1.115       1.095     0.273
  ANX          -0.360    1.092      -0.330     0.742
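The last two columns of this output follow from the first two: Est./S.E. is the estimate divided by its standard error, and the two-tailed p-value refers that ratio to a standard normal distribution. A small sketch using only the Python standard library (the helper name `wald_z_and_p` is ours, not an Mplus term):

```python
import math

def wald_z_and_p(estimate, se):
    """Return the ratio estimate/SE and its two-tailed standard-normal p-value."""
    z = estimate / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# I ON DISORG, values copied from the output above.
z, p = wald_z_and_p(5.075, 2.666)
print(round(z, 3), round(p, 3))
```

Applied to the I ON DISORG row, this should agree with the reported 1.904 and 0.057 up to rounding.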
Illustration: data set 1
Conditional LGM: results (ML estimation)
             Estimate     S.E.   Est./S.E.   Two-Tailed P-Value
B1 ON A1        1.674    0.226       7.394     0.000
B2 ON A2        1.703    0.166      10.274     0.000
B3 ON A3        1.511    0.115      13.110     0.000
B4 ON A4        1.797    0.156      11.516     0.000
Illustration: data set 1
Conditional LGM: results (ML estimation)
             Estimate     S.E.   Est./S.E.   Two-Tailed P-Value
Intercepts
  B1            0.000    0.000     999.000   999.000
  B2            0.000    0.000     999.000   999.000
  B3            0.000    0.000     999.000   999.000
  B4            0.000    0.000     999.000   999.000
  I           -39.325   27.652      -1.422     0.155
  S             4.543   10.730       0.423     0.672
Illustration: data set 1
Conditional LGM: results (ML estimation)
             Estimate     S.E.   Est./S.E.   Two-Tailed P-Value
Residual Variances
  B1         3172.312  461.870       6.868     0.000
  B2         1034.587  164.132       6.303     0.000
  B3          387.629   75.508       5.134     0.000
  B4          378.444   72.855       5.194     0.000
  I           265.423   61.838       4.292     0.000
  S             0.000    0.000     999.000   999.000
R-SQUARE
  B1            0.395    0.061       6.427     0.000
  B2            0.584    0.055      10.594     0.000
  B3            0.801    0.041      19.526     0.000
  B4            0.770    0.045      17.118     0.000
  I             0.468    0.144       3.240     0.001
  S             1.000  999.000     999.000   999.000
GMM specification
Representing heterogeneity with respect to the growth factors and covariates.
GMM specifies a separate LGM for each of the K latent classes simultaneously [two equations: class-specific measurement and structural models].
GMM specification
Modeling predictive effects of time-invariant covariates on latent class membership
Mixture components (c) are related to covariates through a multinomial logistic regression model, with class K as the reference class:
- (1×q) vector of logistic regression coefficients of c on X,
- π_0k: logistic regression intercept for class k relative to class K,
- X_i (q×1): vector of time-invariant covariate predictors of change.
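For reference, a multinomial logistic model of this kind is usually written as below. This is a sketch in assumed notation: the symbol π_k for the (1×q) coefficient vector is our choice, since only the intercept symbol π_0k is legible in the slide text.

```latex
P(c_i = k \mid X_i)
  = \frac{\exp(\pi_{0k} + \pi_k X_i)}
         {\sum_{j=1}^{K} \exp(\pi_{0j} + \pi_j X_i)},
\qquad \pi_{0K} = 0, \quad \pi_K = 0 .
```

Fixing the intercept and coefficients of the reference class K at zero identifies the model, so each remaining logit is a log-odds of class k relative to class K.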
GMM selection
Indices for determining the “best” GMM:
- Information-based criteria: BIC, SABIC.
- Nested-model likelihood ratio tests: LMR (Lo-Mendell-Rubin) LRT, bootstrapped LRT.
- Latent classification accuracy: entropy, average latent class probabilities for most likely latent class membership.
Illustration: data set 1
Mplus representation of a linear GMM fitted to TMT data (N=102).
[Path diagram: B1-B4; A1-A4; growth factors i and s; latent class variable c; covariates x: Disorg, Pos, Neg, Host, Anx]
Illustration: data set 1
Determining the “best” two-class growth model
[Diagram: x, c, i, s; differences between classes]
Illustration: data set 1
GMM results: TMT data (N=102)
Information Criteria
  Number of Free Parameters         29
  Akaike (AIC)                4025.603
  Bayesian (BIC)              4101.727
  Sample-Size Adjusted BIC    4010.126   (n* = (n + 2) / 24)
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS BASED ON ESTIMATED POSTERIOR PROBABILITIES
  Latent Class    Count       Proportion
  1                7.10321    0.06964
  2               94.89679    0.93036
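The three criteria share the same deviance term and differ only in their penalty. The sketch below recomputes BIC and the sample-size adjusted BIC from the reported AIC; the log-likelihood is backed out of the AIC (an inference, since it is not printed on this slide), and the function name is ours.

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """AIC, BIC and sample-size adjusted BIC from a log-likelihood."""
    deviance = -2.0 * loglik
    aic = deviance + 2.0 * n_params
    bic = deviance + n_params * math.log(n_obs)
    # The adjusted BIC replaces n by n* = (n + 2) / 24 in the penalty.
    sabic = deviance + n_params * math.log((n_obs + 2.0) / 24.0)
    return aic, bic, sabic

n_params, n_obs = 29, 102
# Inferred from the reported AIC: AIC = -2 logL + 2 * (number of parameters).
loglik = -(4025.603 - 2 * n_params) / 2.0

aic, bic, sabic = information_criteria(loglik, n_params, n_obs)
print(round(aic, 3), round(bic, 3), round(sabic, 3))
```

The recomputed BIC and adjusted BIC should match the reported 4101.727 and 4010.126 to within rounding of the AIC.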
Illustration: data set 1
GMM results: TMT data (N=102)
CLASSIFICATION QUALITY
  Entropy    0.987
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP
  Latent Class    Count    Proportion
  1                   7    0.06863
  2                  95    0.93137
Average Latent Class Probabilities for Most Likely Latent Class Membership (Row) by Latent Class (Column)
           1        2
  1    0.994    0.006
  2    0.002    0.998
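The entropy value summarizes how sharply the posterior class probabilities separate individuals: 1 − Σ_i Σ_k (−p_ik ln p_ik) / (n ln K). The individual posterior probabilities are not shown on the slide, so the sketch below uses a small made-up matrix purely to illustrate the formula; `relative_entropy` is a hypothetical helper name.

```python
import math

def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; values near 1 mean clear classification."""
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:                  # treat 0 * log(0) as 0
                total += -p * math.log(p)
    return 1.0 - total / (n * math.log(k))

# Made-up posterior probabilities for four individuals and two classes.
example = [
    [0.99, 0.01],
    [0.02, 0.98],
    [0.95, 0.05],
    [0.10, 0.90],
]
print(round(relative_entropy(example), 3))
```

Perfectly certain assignments give 1, while equal posterior probabilities for every individual give 0.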
Illustration: data set 1
Growth mixture model results: TMT data (N=102)
VUONG-LO-MENDELL-RUBIN LIKELIHOOD RATIO TEST FOR 1 (H0) VERSUS 2 CLASSES
  H0 Loglikelihood Value                      -2001.982
  2 Times the Loglikelihood Difference           36.361
  Difference in the Number of Parameters              8
  Mean                                           -7.722
  Standard Deviation                             35.246
  P-Value                                        0.0355
LO-MENDELL-RUBIN ADJUSTED LRT TEST
  Value                                          35.404
  P-Value                                        0.0383
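The "2 Times the Loglikelihood Difference" line can be checked by hand. The sketch below takes the one-class log-likelihood from this output and backs the two-class log-likelihood out of the AIC reported earlier for the two-class model (4025.603 with 29 free parameters), which is an inference rather than a printed value. The p-values are not reproduced, because the Lo-Mendell-Rubin test does not use the usual chi-square reference distribution.

```python
def lrt_statistic(loglik_h0, loglik_h1):
    """Twice the log-likelihood difference between nested models."""
    return 2.0 * (loglik_h1 - loglik_h0)

loglik_h0 = -2001.982                    # one-class model, from the output
loglik_h1 = -(4025.603 - 2 * 29) / 2.0   # two-class model, inferred from its AIC

stat = lrt_statistic(loglik_h0, loglik_h1)
print(round(stat, 3))
```

The result should agree with the reported 36.361, which also confirms that the information criteria and the likelihood ratio test refer to the same two-class solution.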
Illustration: data set 1
Growth mixture model results: TMT data (N=102)
Categorical Latent Variables
             Estimate     S.E.   Est./S.E.   Two-Tailed P-Value
C#1 ON
  DISORG        1.478    0.550       2.689     0.007
  POS           1.967    0.603       3.260     0.001
  NEG          -1.250    0.397      -3.150     0.002
  HOST         -2.240    0.869      -2.579     0.010
  ANX          -0.282    0.399      -0.706     0.480
Intercepts
  C#1          -1.700    3.014      -0.564     0.573
Illustration: data set 1
GMM: probability of class membership as a function of the value on each covariate, TMT data (N=102) [figure]
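Curves of this kind can be recomputed from the C#1 estimates in the preceding output. With two classes the multinomial model reduces to a binary logistic regression with class 2 as reference. In the sketch below the coefficients are copied from the output, but the scaling of the covariates is not stated in the slides, so the covariate values passed in are purely illustrative; covariates that are not supplied are treated as 0.

```python
import math

# Estimates copied from the "C#1 ON" block of the Mplus output.
COEF = {"DISORG": 1.478, "POS": 1.967, "NEG": -1.250, "HOST": -2.240, "ANX": -0.282}
INTERCEPT = -1.700

def prob_class1(covariates):
    """P(c = 1 | x) in the two-class model; omitted covariates count as 0."""
    logit = INTERCEPT + sum(COEF[name] * value for name, value in covariates.items())
    return 1.0 / (1.0 + math.exp(-logit))

# Probability of class 1 as POS varies, other covariates held at 0 (illustrative grid).
for pos in (-1.0, 0.0, 1.0, 2.0):
    print(pos, round(prob_class1({"POS": pos}), 3))
```

Because the POS coefficient is positive, the probability of belonging to class 1 rises along this grid; the negative NEG and HOST coefficients would produce decreasing curves.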
Illustration: data set 1
Growth mixture model results: TMT data (N=102)
Latent class 1 = latent class 2 (same estimates in both classes)
             Estimate     S.E.   Est./S.E.   Two-Tailed P-Value
I ON
  DISORG        1.335    2.595       0.514     0.607
  POS          -1.365    2.703      -0.505     0.613
  NEG           4.387    2.412       1.819     0.069
  HOST          0.264    3.270       0.081     0.936
  ANX           5.051    2.659       1.900     0.057
S ON
  DISORG       -1.617    1.090      -1.483     0.138
  POS          -0.892    1.196      -0.746     0.456
  NEG           0.917    0.899       1.019     0.308
  HOST          0.780    1.585       0.492     0.622
  ANX          -0.434    1.206      -0.360     0.719
Illustration: data set 1
Growth mixture model results: TMT data (N=102) [figure; class sizes: c#1 N = 7, c#2 N = 95]
Illustration: data set 2
Learning to read and development of phonological and morphological processing
- 344 children (6-7 years) tested 6 times (6 weeks between measurement occasions).
- t1-1: Raven matrices (int).
- t1 to t6: 4 observed variables: Syllables Implicit Processing, Phonemes Implicit Processing, Syllables Explicit Processing, Phonemes Explicit Processing.
- t6 + 1 week: word reading (frequent words, rare words, pseudo-words).
Illustration: data set 2
Data set 2: descriptive statistics [figure; occasions t0, t1, t2, t3, t4, t5]
Illustration: data set 2
SEM representation of a quadratic GGMM with time-invariant antecedents of change and a distal outcome (N=344)
[Path diagram: covariate Int; indicators sip1-sip6, pip1-pip5, sep1-sep6, pep1-pep6 loading on first-order factors f1-f6; growth factors i, s, q; latent class variable c; distal outcome Lect. (frequent, rare and pseudo-words)]
Multiple indicators GMM
Multiple indicator LGM:
- First-order factor scores: measurement model with (strong) invariance constraints.
- Second-order growth factors: factor scores as deviations from the group mean.
- Second-order growth model.
Multiple indicators GMM
Multiple indicator GMM:
- First-order constraints.
- Differences between latent classes: means, covariances, parameters for representing growth.
Illustration: data set 2
Unconditional GMM: 2 classes vs 3 classes [figure]
Illustration: data set 2
Three-class GMM with int as covariate, without (overall) and with (between) class differences [figure]
Illustration: data set 2
Conditional GMM: estimated means [figure]
Illustration: data set 2
GMM results: information criteria and quality of classification
Information Criteria
  Number of Free Parameters          127
  Akaike (AIC)                 31755.780
  Bayesian (BIC)               32243.542
  Sample-Size Adjusted BIC     31840.665   (n* = (n + 2) / 24)
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS BASED ON ESTIMATED POSTERIOR PROBABILITIES
  Latent Class    Count        Proportion
  1               278.61914    0.80994
  2                39.41000    0.11456
  3                25.97086    0.07550
Illustration: data set 2
GMM results: information criteria and quality of classification
  Entropy    0.986
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP
  Latent Class    Count    Proportion
  1                 280    0.81395
  2                  38    0.11047
  3                  26    0.07558
Average Latent Class Probabilities for Most Likely Latent Class Membership (Row) by Latent Class (Column)
           1        2        3
  1    0.995    0.005    0.000
  2    0.003    0.990    0.007
  3    0.000    0.011    0.989
Illustration: data set 2
GMM results: intercepts of i, s and q
Class 1
             Estimate     S.E.   Est./S.E.   P-Value
Intercepts
  I             3.693    0.275      13.451     0.000
  S             1.103    0.145       7.632     0.000
  Q            -0.095    0.027      -3.559     0.000
Residual Variances
  I             0.961    0.106       9.084     0.000
  S             0.152    0.031       4.924     0.000
  Q             0.005    0.001       5.221     0.000
Illustration: data set 2
GMM results: intercepts of i, s and q
Class 2
             Estimate     S.E.   Est./S.E.   P-Value
Intercepts
  I             2.616    0.420       6.223     0.000
  S             1.907    0.284       6.725     0.000
  Q            -0.254    0.055      -4.617     0.000
Residual Variances
  I             0.961    0.106       9.084     0.000
  S             0.152    0.031       4.924     0.000
  Q             0.005    0.001       5.221     0.000
Illustration: data set 2
GMM results: intercepts of i, s and q
Class 3
             Estimate     S.E.   Est./S.E.   P-Value
Intercepts
  I             0.000    0.000     999.000   999.000
  S             1.127    0.354       3.187     0.001
  Q             0.077    0.068      -1.137     0.256   (linear trend in class 3 in fixing q@0)
Residual Variances
  I             0.961    0.106       9.084     0.000
  S             0.152    0.031       4.924     0.000
  Q             0.005    0.001       5.221     0.000
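To read these growth factor intercepts as trajectories, one can evaluate i + s·t + q·t² over the measurement occasions. The sketch below does so for classes 1 and 2 under two assumptions that the slides do not state: time scores of 0 to 5 for the six occasions, and a covariate value of 0 (the intercepts are conditional on the covariate). Class 3 is left out for brevity.

```python
def quadratic_trajectory(intercept, slope, quad, times):
    """Expected factor level at each time score under a quadratic growth curve."""
    return [intercept + slope * t + quad * t * t for t in times]

times = [0, 1, 2, 3, 4, 5]   # assumed time scores for the six occasions

# Growth factor intercepts copied from the class 1 and class 2 output.
class1 = quadratic_trajectory(3.693, 1.103, -0.095, times)
class2 = quadratic_trajectory(2.616, 1.907, -0.254, times)

for t, y1, y2 in zip(times, class1, class2):
    print(t, round(y1, 3), round(y2, 3))
```

Under these assumptions class 2 starts lower than class 1 and initially grows faster, but its stronger negative quadratic term makes the curve flatten earlier.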
Illustration: data set 2
GMM results: regression coefficients of the categorical latent variable c on the covariate
Categorical Latent Variables
             Estimate     S.E.   Est./S.E.   P-Value
C#1 ON
  INTNV         0.172    0.058       2.969     0.003
C#2 ON
  INTNV         0.044    0.076       0.575     0.565
Intercepts
  C#1           0.392    0.709       0.553     0.580
  C#2          -0.052    0.925      -0.056     0.955
Illustration: data set 2
GMM results: probability of class membership [figure]
Illustration: data set 2
Estimated probabilities for c as a function of int level [figure]
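These probabilities follow from the C#1 and C#2 logits reported in the output, with class 3 as the reference class (logit fixed at 0). A minimal sketch; the scale of INTNV (raw or centered) is not given in the slides, so the grid of values below is only illustrative.

```python
import math

# (intercept, INTNV coefficient) per class; class 3 is the reference class.
LOGIT_PARAMS = {1: (0.392, 0.172), 2: (-0.052, 0.044), 3: (0.0, 0.0)}

def class_probabilities(intnv):
    """Multinomial-logit class probabilities for a given covariate value."""
    logits = {k: a + b * intnv for k, (a, b) in LOGIT_PARAMS.items()}
    top = max(logits.values())   # subtract the maximum for numerical stability
    weights = {k: math.exp(v - top) for k, v in logits.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

for level in (-10.0, 0.0, 10.0):   # illustrative covariate values
    probs = class_probabilities(level)
    print(level, {k: round(p, 3) for k, p in probs.items()})
```

Since the C#1 coefficient is the largest of the three, the probability of class 1 increases with INTNV while that of the reference class 3 decreases.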
Illustration: data set 2
GMM results: regression of i, s and q on the covariate
Class 1
             Estimate     S.E.   Est./S.E.   P-Value
I ON INTNV      0.122    0.020       6.050     0.000
S ON INTNV     -0.033    0.011      -2.939     0.003
Q ON INTNV      0.003    0.002       1.567     0.117
S WITH
  I            -0.008    0.040      -0.206     0.837
Q WITH
  I            -0.015    0.007      -2.309     0.021
  S            -0.026    0.005      -4.943     0.000
Illustration: data set 2
GMM results: regression of i, s and q on the covariate
Class 2
             Estimate     S.E.   Est./S.E.   P-Value
I ON INTNV      0.140    0.040       3.477     0.001
S ON INTNV     -0.095    0.025      -3.802     0.000
Q ON INTNV      0.015    0.005       3.136     0.002
S WITH
  I            -0.008    0.040      -0.206     0.837
Q WITH
  I            -0.015    0.007      -2.309     0.021
  S            -0.026    0.005      -4.943     0.000
Illustration: data set 2
GMM results: regression of i, s and q on the covariate
Class 3
             Estimate     S.E.   Est./S.E.   P-Value
I ON INTNV      0.341    0.022      15.275     0.000
S ON INTNV     -0.037    0.034      -1.085     0.278
Q ON INTNV      0.002    0.007       0.297     0.766
S WITH
  I            -0.008    0.040      -0.206     0.837
Q WITH
  I            -0.015    0.007      -2.309     0.021
  S            -0.026    0.005      -4.943     0.000
Illustration: data set 2
GMM results: reading proficiency level for each class
Means        Estimate     S.E.   Est./S.E.   P-Value
Class 1
  LECT          7.508    0.434      17.288     0.000
Class 2
  LECT          4.430    0.287      15.455     0.000
Class 3
  LECT          0.000    0.000     999.000   999.000
Concluding remarks
Interest, limitations, cautions
GMM is a promising approach for modeling heterogeneous latent change across unobserved population subgroups. But:
- GMM usually requires large samples.
- The search for heterogeneity should be conducted in a principled and disciplined way; the best way to guide GMM selection is to test competing theory-based models.
- GMM always identifies groups.
- The role that covariates play in the enumeration process has to be clarified.
- An important question: how to model missing data on x variables?