Latent Class Analysis Computing examples Karen Bandeen-Roche October 28, 2016
Objectives
For you to leave here knowing…
- How to use the LCR SAS Macro for latent class analysis
- Brief introduction to poLCA in R
- How to interpret, report output
- How to create residuals and conduct model checking with them
Part I: Basics on using the software
SAS MACRO …
Beginning of file: basic documentation

/*-----------------------------------------------------------------------------------------*/
/*                                                                                         */
/* TITLE: LCR                                                                              */
/* ----------                                                                              */
/* A SAS Macro for Latent Class Regression using PROC IML.                                 */
/* Requires SAS/IML                                                                        */
/* Please send any suggestions or corrections to kbandeen@jhu.edu                          */
/* DESCRIPTION                                                                             */
/* -----------                                                                             */
/* This program contains a macro for fitting LCA and LCR models and an example.            */
/* To fit a standard latent class model, only include an intercept in the model.           */
/* The macro uses the algorithm described in                                               */
/* Bandeen-Roche, K; Miglioretti, DL; Zeger, SL; Rathouz, P,                               */
/* "Latent variable regression for multiple discrete outcomes,"                            */
/* JASA, In Press (1997)                                                                   */
/* to fit latent class regression models.                                                  */
…
SAS MACRO …
This text creates output data sets! They include the posterior probabilities and the expected cell counts. Before anything else, SAS needs to run the macro code through here (%mend lcr;).

create &outlib..beta from beta [colname='value' rowname=betaname];
append from beta [rowname=betaname];
close &outlib..beta;
create &outlib..eta from eta [colname=etaname];
append from eta;
close &outlib..eta;
create &outlib..theta from h [colname=thetanam];
append from h;
close &outlib..theta;
expct = nrow(x)#bottom;
create &outlib..expect from expct [colname='expected'];
append from expct;
close &outlib..expect;
create &outlib..pi from pi [colname=pi2name rowname=varname];
append from pi [rowname=varname];
close &outlib..pi;
create &outlib..var from var [colname=parm rowname=parm];
append from var [rowname=parm];
close &outlib..var;
title;
%mend lcr;
Toy example (in software: immediately follows the macro)
BINARY INDICATORS

data dataset;
  set a;
  if y1=. | y2=. | y3=. | y4=. | y5=. then delete;  /* designed for complete data */
  int=1;                                            /* create “intercept” */
run;
SAS Macro: Command line format
Arguments in the macro call:
- Name of your dataset
- Response variables
- Covariates (just intercept for LCA)
- Number of classes
- Initial parameters (“0” triggers self-initialization)
- Convergence criterion (iterate to this precision)
SAS Macro: Output format
Top: initial estimates, # iterations
Bottom: final estimates, fit criteria
SAS Macro: Command line format
Example with initial estimates filled in rather than self-initialization
“pi” = as we have defined it (conditional probabilities)
“eta” = our “Pj” (latent class probabilities)
R function: poLCA

> poLCA(formula = cbind(Y1, Y2, Y3, Y4, Y5) ~ 1, data = j2, nclass = 2)
Conditional item response (column) probabilities,
 by outcome variable, for each class (row)

$V1
           Pr(1)  Pr(2)
class 1:  0.9534 0.0466
class 2:  0.2851 0.7149

$V2
           Pr(1)  Pr(2)
class 1:  0.7568 0.2432
class 2:  0.1559 0.8441

$V3
           Pr(1)  Pr(2)
class 1:  0.7848 0.2152
class 2:  0.2143 0.7857

$V4
           Pr(1)  Pr(2)
class 1:  0.7880 0.2120
class 2:  0.0571 0.9429

$V5
           Pr(1)  Pr(2)
class 1:  0.7794 0.2206
class 2:  0.0703 0.9297

Estimated class population shares
 0.6059 0.3941
R function: poLCA

=========================================================
Fit for 2 latent classes:
=========================================================
number of observations: 100
number of estimated parameters: 11
residual degrees of freedom: 20
maximum log-likelihood: -276.3204

AIC(2): 574.6407
BIC(2): 603.2976
G^2(2): 26.85842 (Likelihood ratio/deviance statistic)
X^2(2): 27.31029 (Chi-square goodness of fit)
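The fit summary above can be checked by hand. The sketch below is not part of poLCA or the LCR macro; it assumes the usual definitions AIC = -2 logL + 2p and BIC = -2 logL + p log(n), and the usual parameter count for an unconstrained latent class model. The function names are illustrative.

```python
import math


def lca_param_count(n_classes, n_items, n_levels=2):
    """Free parameters: (classes - 1) class shares plus one set of
    (levels - 1) item probabilities per item per class."""
    return (n_classes - 1) + n_classes * n_items * (n_levels - 1)


def information_criteria(loglik, n_params, n_obs):
    """Return (AIC, BIC) from a maximized log-likelihood."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    return aic, bic


# Values taken from the 2-class toy output: 5 binary items, n = 100.
n_params = lca_param_count(n_classes=2, n_items=5)
resid_df = 2 ** 5 - 1 - n_params  # response patterns - 1 - parameters
aic, bic = information_criteria(-276.3204, n_params, 100)
print(n_params, resid_df, aic, bic)
```

Up to rounding of the reported log-likelihood, this reproduces the 11 parameters, 20 residual degrees of freedom, and the AIC and BIC shown in the output.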
Part II: Application
Post-traumatic stress disorder
Data set up (immediately following Macro)
- Pull in pre-existing data
- A convenient way to code “patterns”
- Dataset to pass on to LCA
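Pattern frequency listing
pscor=b1+10*b4+100*b5+1000*c1+10000*c2+100000*c5+1000000*d1+10000000*d2+100000000*d3;
Each symptom indicator occupies one decimal digit of pscor, so the value identifies the response pattern:
0 = no symptoms; 1 = b1 only; 10 = b4 only; 11 = b1 & b4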
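The same pattern coding can be sketched outside SAS. The symptom names and the powers of ten come from the pscor statement above; the helper names and the five records are made up purely for illustration.

```python
from collections import Counter

# Symptom indicators in the order used by the pscor statement.
SYMPTOMS = ["b1", "b4", "b5", "c1", "c2", "c5", "d1", "d2", "d3"]


def pattern_score(record):
    """Encode nine 0/1 indicators as one integer, one decimal digit each."""
    return sum(record[name] * 10 ** k for k, name in enumerate(SYMPTOMS))


def make_record(*present):
    """Hypothetical helper: a record with the named symptoms set to 1."""
    return {name: int(name in present) for name in SYMPTOMS}


# Illustrative records only, not the PTSD data.
records = [
    make_record(),
    make_record("b1"),
    make_record("b4"),
    make_record("b1", "b4"),
    make_record("b1", "b4"),
]

# Pattern frequency listing: how many records share each pattern code.
freq = Counter(pattern_score(r) for r in records)
for code, count in sorted(freq.items()):
    print(f"{code:09d}  n={count}")
```

Printing the code zero-padded to nine digits makes each symptom readable directly from its position.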
LCA Macro “Call”
- Name of dataset
- Response variables (9 of them)
- Number of classes
- Initial parameters
“Canned” initialization and other starting values yield the same solution. The initial values here are arranged so that the “low” symptom probability class is the “last” class (relevant for LCR).
Output
[Macro output listing. Callouts: latent class probabilities for Classes 1, 2, 3 (the Class 3 prevalence is highlighted); conditional probabilities for Classes 1, 2, 3 (Pr(B1=1|Class 3) is highlighted).]
Classes reordered for reporting
Revisiting the Model for “Fit”
5-class model:
- “None,” “PTSD” classes very stable
- AIC, BIC: both lower
- LR test: better
Revisiting the Model for “Fit” Five class model appears “better” Trustworthy? Data quite sparse! Seeing is believing—thus….
Checking Fit - Residuals
Standardized residuals (multinomial)
In this case, residuals compare actual cell counts with expected cell counts, one cell per response pattern.
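The expected cell counts come from the fitted model: under conditional independence, the probability of a response pattern is the class-share-weighted sum of products of item probabilities. A minimal sketch follows, using the estimates from the 2-class poLCA toy output rather than the PTSD fit. It assumes class 1 is the class with share 0.6059, and the function and variable names are illustrative.

```python
from itertools import product

# Estimates copied from the 2-class poLCA toy output (n = 100).
# Assumption: shares are listed in class order, so class 1 has share 0.6059.
CLASS_SHARES = [0.6059, 0.3941]
# Pr(2) for items V1..V5, one row per class.
PROB_LEVEL2 = [
    [0.0466, 0.2432, 0.2152, 0.2120, 0.2206],
    [0.7149, 0.8441, 0.7857, 0.9429, 0.9297],
]
N_OBS = 100


def pattern_probability(pattern):
    """Model probability of a pattern; 1 means the item took its second level."""
    total = 0.0
    for share, probs in zip(CLASS_SHARES, PROB_LEVEL2):
        within = 1.0
        for y, p in zip(pattern, probs):
            within *= p if y == 1 else 1.0 - p
        total += share * within
    return total


# Expected count for each of the 2^5 = 32 response patterns.
expected = {
    pattern: N_OBS * pattern_probability(pattern)
    for pattern in product((0, 1), repeat=5)
}
print(len(expected), sum(expected.values()))
```

The expected counts sum to the sample size, which is a quick sanity check before comparing them with observed counts.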
Expected counts: SAS Macro
- Pull the expected values into a dataset.
- They’re labeled “expected”; rename them to avoid code-word problems.
- Sort and tabulate to show the pattern, observed count, and expected count.
Observed vs. Expected Comparison
Three class; Five class
Cut and paste into Excel, then use Stat/Transfer to move the table to Stata
Data structure in Stata
Columns: Obs | 3 Class | 5 Class
QQPlot of residuals, 5 vs 3 class
. gen resid3=n-tclass
. gen resid5=n-fclass
. qqplot resid5 resid3
QQPlot of standardized residuals, 5 vs 3 class
. gen sresid3 = resid3/sqrt(tclass*(1-tclass/1827))
. gen sresid5 = resid5/sqrt(fclass*(1-fclass/1827))
. qqplot sresid5 sresid3
Favors 5-class model
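The Stata commands above divide each raw residual by a multinomial standard deviation, sqrt(E(1 - E/N)), with N = 1827. The same calculation as a small Python sketch; the observed and expected counts below are invented for illustration and are not taken from the PTSD tables.

```python
import math

N_TOTAL = 1827  # sample size used in the Stata commands


def standardized_residual(observed, expected, n_total=N_TOTAL):
    """(observed - expected) / sqrt(expected * (1 - expected / n_total))."""
    return (observed - expected) / math.sqrt(expected * (1.0 - expected / n_total))


# Hypothetical (observed, expected) pairs for three response patterns.
cells = [(30, 20.0), (5, 8.5), (12, 12.0)]
for observed, expected_count in cells:
    print(observed, expected_count, standardized_residual(observed, expected_count))
```

A cell whose observed count equals its expected count gets a residual of zero; the sign shows whether the model under- or over-predicts that pattern.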
Listing: Largest |Standardized Residual| Differences
. gen sadiff = abs(sresid3)-abs(sresid5)
. sort sadiff
. list Pattern sadiff n tclass fclass sresid3 sresid5
Negative differences favor the 3-class model. Only a few are large, and these have small n.
Listing: Largest |Standardized Residual| Differences (other end of the same sorted listing)
Both models underestimate the number having all symptoms.
Positive values favor the 5-class model. A few large values have considerably large n (e.g., 110 = cues create distress, reactivity without re-experiencing).
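The comparison logic behind sadiff can be sketched as follows. The rows are hypothetical standardized residuals, not values from the actual listing; only the definition sadiff = |sresid3| - |sresid5| and the ascending sort are taken from the Stata commands.

```python
def abs_residual_difference(sresid3, sresid5):
    """Positive: the 3-class residual is larger in size, favoring 5 classes."""
    return abs(sresid3) - abs(sresid5)


# Hypothetical rows: pattern code and standardized residuals under each model.
rows = [
    {"pattern": 110, "sresid3": 4.0, "sresid5": 0.5},
    {"pattern": 1, "sresid3": -0.3, "sresid5": -1.8},
    {"pattern": 0, "sresid3": 0.2, "sresid5": 0.2},
]

for row in rows:
    row["sadiff"] = abs_residual_difference(row["sresid3"], row["sresid5"])

# Ascending sort, as with Stata's sort: 3-class-favoring patterns come first.
rows.sort(key=lambda row: row["sadiff"])
for row in rows:
    if row["sadiff"] > 0:
        verdict = "favors 5-class"
    elif row["sadiff"] < 0:
        verdict = "favors 3-class"
    else:
        verdict = "tie"
    print(row["pattern"], round(row["sadiff"], 3), verdict)
```

Reading the two ends of the sorted listing together with n shows whether the large differences sit on well-populated patterns or on sparse ones.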
Conclusion
The latent class model fit suggests a nosology with subpopulations exhibiting “few” (just over half), “many” (~14%), and “re-experiencing plus a few other” symptoms.
The conditional independence assumption may not be reasonable for these data.