Latent Class Analysis Computing examples


Latent Class Analysis: Computing examples
Karen Bandeen-Roche
October 28, 2016

Objectives
For you to leave here knowing…
  How to use the LCR SAS Macro for latent class analysis
  A brief introduction to poLCA in R
  How to interpret and report output
  How to create residuals and conduct model checking with them

Part I: Basics on using the software

SAS Macro: beginning of file (basic documentation)
  /*----------------------------------------------------------------------*/
  /*                                                                      */
  /* TITLE: LCR                                                           */
  /* ----------                                                           */
  /* A SAS Macro for Latent Class Regression using PROC IML.              */
  /* Requires SAS/IML                                                     */
  /* Please send any suggestions or corrections to kbandeen@jhu.edu       */
  /* DESCRIPTION                                                          */
  /* -----------                                                          */
  /* This program contains a macro for fitting LCA and LCR models and     */
  /* an example.                                                          */
  /* To fit a standard latent class model, only include an intercept in   */
  /* the model.                                                           */
  /* The macro uses the algorithm described in                            */
  /* Bandeen-Roche, K; Miglioretti, DL; Zeger, SL; Rathouz, P,            */
  /* "Latent variable regression for multiple discrete outcomes,"         */
  /* JASA, In Press (1997)                                                */
  /* to fit latent class regression models.                               */
  …

SAS Macro: this text creates the output data sets!
  …
  create &outlib..beta from beta [colname='value' rowname=betaname];
  append from beta [rowname=betaname];
  close &outlib..beta;
  create &outlib..eta from eta [colname=etaname];
  append from eta;
  close &outlib..eta;
  create &outlib..theta from h [colname=thetanam];
  append from h;
  close &outlib..theta;
  expct = nrow(x)#bottom;
  create &outlib..expect from expct [colname='expected'];
  append from expct;
  close &outlib..expect;
  create &outlib..pi from pi [colname=pi2name rowname=varname];
  append from pi [rowname=varname];
  close &outlib..pi;
  create &outlib..var from var [colname=parm rowname=parm];
  append from var [rowname=parm];
  close &outlib..var;
  title;
  %mend lcr;
Slide callouts: Posterior probabilities; Expected cell counts.
Before anything else, the code needs to be run through here.

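Toy example (in the software, it immediately follows the macro)
Binary indicators
  data dataset;
    set a;
    if y1=. | y2=. | y3=. | y4=. | y5=. then delete;
    int=1;
  run;
Slide callouts: "int=1" creates the “intercept”; the macro is designed for complete data, so any record with a missing indicator is deleted.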
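For readers who do not use SAS, the data step above can be sketched in Python. This is only an illustration of the same two operations (drop incomplete records, add a constant intercept column); the function name `complete_cases` and the toy records are assumptions made for the example, not part of the LCR macro.

```python
def complete_cases(rows, items=("y1", "y2", "y3", "y4", "y5")):
    """Keep rows with no missing indicator and add an intercept column.

    Missing values are represented here as None (SAS uses ".").
    """
    kept = []
    for row in rows:
        if any(row.get(name) is None for name in items):
            continue  # mirrors: if y1=. | ... | y5=. then delete;
        kept.append({**row, "int": 1})  # mirrors: int=1;
    return kept


# Hypothetical toy records, invented for illustration only.
toy = [
    {"y1": 1, "y2": 0, "y3": 1, "y4": 0, "y5": 1},
    {"y1": 1, "y2": None, "y3": 1, "y4": 0, "y5": 0},
    {"y1": 0, "y2": 0, "y3": 0, "y4": 0, "y5": 0},
]
dataset = complete_cases(toy)
print(len(dataset), "complete records kept")
```

The second toy record is dropped because `y2` is missing; every surviving record carries `int = 1`, which is the only covariate needed for a plain latent class analysis.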

SAS Macro: command line format
  Name of your dataset
  Response variables
  Covariates (just the intercept for LCA)
  Number of classes
  Initial parameters (“0” triggers self-initialization)
  Iterate to criterion precision

SAS Macro: output format
  Top: initial estimates, number of iterations
  Bottom: final estimates, fit criteria

SAS Macro: command line format
Example with initial estimates filled in rather than self-initialization
  “pi” = as we have defined it (conditional probabilities)
  “eta” = our “Pj” (latent class probabilities)

R function: poLCA
  > poLCA(formula = cbind(Y1, Y2, Y3, Y4, Y5) ~ 1, data = j2, nclass = 2)
  Conditional item response (column) probabilities,
   by outcome variable, for each class (row)

  $V1
             Pr(1)  Pr(2)
  class 1:  0.9534 0.0466
  class 2:  0.2851 0.7149

  $V2
             Pr(1)  Pr(2)
  class 1:  0.7568 0.2432
  class 2:  0.1559 0.8441

  $V3
             Pr(1)  Pr(2)
  class 1:  0.7848 0.2152
  class 2:  0.2143 0.7857

  $V4
             Pr(1)  Pr(2)
  class 1:  0.7880 0.2120
  class 2:  0.0571 0.9429

  $V5
             Pr(1)  Pr(2)
  class 1:  0.7794 0.2206
  class 2:  0.0703 0.9297

  Estimated class population shares
   0.6059 0.3941
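To make the output concrete, here is a minimal Python sketch of how these estimates combine under the standard latent class model: the probability of a response pattern is the class share times the product of the class-conditional item probabilities, summed over classes. The numbers are copied from the poLCA output above; the function names are illustrative assumptions, and `n_obs = 100` is the number of observations poLCA reports for this fit.

```python
from itertools import product

# pr2[j][m] = Pr(item m+1 = 2 | class j+1), copied from the output above.
# Pr(item = 1 | class) is the complement.
pr2 = [
    [0.0466, 0.2432, 0.2152, 0.2120, 0.2206],  # class 1
    [0.7149, 0.8441, 0.7857, 0.9429, 0.9297],  # class 2
]
shares = [0.6059, 0.3941]  # estimated class population shares
n_obs = 100                # number of observations reported by poLCA


def joint(pattern):
    """Return share_j * prod_m Pr(y_m | class j) for each class j."""
    terms = []
    for share, probs in zip(shares, pr2):
        term = share
        for y, p in zip(pattern, probs):
            term *= p if y == 2 else 1.0 - p
        terms.append(term)
    return terms


def pattern_probability(pattern):
    """Marginal probability of a response pattern (sum over classes)."""
    return sum(joint(pattern))


def posterior(pattern):
    """Posterior class membership probabilities given a pattern."""
    parts = joint(pattern)
    total = sum(parts)
    return [part / total for part in parts]


all_twos = (2, 2, 2, 2, 2)
print("expected count:", n_obs * pattern_probability(all_twos))
print("posterior:", posterior(all_twos))
```

Multiplying a pattern probability by the sample size gives the model-expected cell count, which is the quantity later compared with the observed count when checking fit.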

R function: poLCA
  =========================================================
  Fit for 2 latent classes:
  =========================================================
  number of observations: 100
  number of estimated parameters: 11
  residual degrees of freedom: 20
  maximum log-likelihood: -276.3204

  AIC(2): 574.6407
  BIC(2): 603.2976
  G^2(2): 26.85842 (Likelihood ratio/deviance statistic)
  X^2(2): 27.31029 (Chi-square goodness of fit)
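The fit criteria in this output can be reproduced from the log-likelihood with the usual definitions, AIC = -2 logL + 2k and BIC = -2 logL + k ln(n). The Python sketch below is an illustration, not poLCA's code; the function names are assumptions, and the residual degrees of freedom are taken as (number of response patterns - 1 - number of parameters), which matches the value of 20 reported here.

```python
import math


def lca_n_params(n_classes, n_items, n_categories=2):
    """Free parameters: class shares plus conditional probabilities."""
    return (n_classes - 1) + n_classes * n_items * (n_categories - 1)


def lca_fit_summary(loglik, n_params, n_obs, n_items, n_categories=2):
    """AIC, BIC, and residual df for a latent class model."""
    n_cells = n_categories ** n_items  # possible response patterns
    return {
        "aic": -2.0 * loglik + 2.0 * n_params,
        "bic": -2.0 * loglik + n_params * math.log(n_obs),
        "resid_df": n_cells - 1 - n_params,
    }


k = lca_n_params(n_classes=2, n_items=5)
fit = lca_fit_summary(loglik=-276.3204, n_params=k, n_obs=100, n_items=5)
print(k, fit)
```

With two classes and five binary items there are 1 + 2 × 5 = 11 free parameters, and 2^5 - 1 - 11 = 20 residual degrees of freedom, in agreement with the output.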

Part II: Application
Post-traumatic stress disorder

Data set up (immediately following the macro)
  Pull in pre-existing data
  A convenient way to code “patterns”
  Dataset to pass on to LCA

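Pattern frequency listing
  pscor=b1+10*b4+100*b5+1000*c1+10000*c2+100000*c5+1000000*d1+10000000*d2+100000000*d3;
Each symptom occupies one decimal digit of pscor, so the first rows of the listing read:
  0 = no symptoms
  1 = b1 only
  10 = b4 only
  11 = b1 & b4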
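The pattern code works like a decimal "bit mask": each binary symptom contributes one digit. A small Python sketch of the same encoding, with a decoder that recovers the symptoms from a code, may help when reading the listing. The symptom names and their order come from the `pscor` formula; the function names and the example records are assumptions for illustration.

```python
from collections import Counter

# Order matters: position k contributes 10**k, as in the pscor formula.
SYMPTOMS = ("b1", "b4", "b5", "c1", "c2", "c5", "d1", "d2", "d3")


def pscor(record):
    """Encode nine 0/1 symptom indicators as one decimal pattern code."""
    return sum(record[name] * 10 ** k for k, name in enumerate(SYMPTOMS))


def decode(score):
    """Recover the list of symptoms present from a pattern code."""
    present = []
    for name in SYMPTOMS:
        score, digit = divmod(score, 10)
        if digit:
            present.append(name)
    return present


def make_record(*present):
    """Hypothetical helper: build a record with the named symptoms set to 1."""
    return {name: int(name in present) for name in SYMPTOMS}


# Invented records, used only to show a pattern frequency table.
records = [make_record(), make_record("b1"), make_record("b1", "b4"),
           make_record("b1", "b4")]
frequencies = Counter(pscor(r) for r in records)
print(sorted(frequencies.items()))
print(decode(110))
```

Tabulating the codes gives the pattern frequency listing directly, and `decode` is convenient for interpreting a specific row, such as pattern 110 discussed later.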

LCA Macro “Call”
  Name of dataset
  Response variables (9 of them)
  Number of classes
  Initial parameters
  “Canned” initialization and other starting values yield the same result
  The starting values are arranged so that the “low” symptom probability class is the “last” class (relevant for LCR)

Output
  Latent class probabilities for classes 1, 2, 3 (callout: Class 3 prevalence)
  Conditional probabilities for classes 1, 2, 3 (callout: Pr(B1=1 | Class 3))

Classes reordered for reporting

Revisiting the Model for “Fit”: 5-class model
  “None” and “PTSD” classes very stable
  AIC, BIC: both lower
  LR test: better

Revisiting the Model for “Fit”
  Five-class model appears “better”
  Trustworthy? Data quite sparse!
  Seeing is believing, thus…

Checking Fit - Residuals
  Standardized residuals (multinomial)
  In this case, a residual is the observed cell count minus the expected cell count for a response pattern.

Expected counts: SAS Macro
  Pull the expected values into a dataset.
  They are labeled “expected”; rename them to avoid code-word problems.
  Sort and tabulate to show the pattern, observed count, and expected count.

Observed vs. Expected Comparison
  Three-class and five-class expected counts side by side
  Cut and paste into Excel, then use Stat/Transfer to move the table to Stata

Data structure in Stata
  Columns: Pattern, observed count (Obs), 3-class expected count, 5-class expected count

QQPlot of residuals, 5 vs 3 class
  . gen resid3=n-tclass
  . gen resid5=n-fclass
  . qqplot resid5 resid3

QQPlot of standardized residuals, 5 vs 3 class
  . gen sresid3 = resid3/sqrt(tclass*(1-tclass/1827))
  . gen sresid5 = resid5/sqrt(fclass*(1-fclass/1827))
  . qqplot sresid5 sresid3
Favors the 5-class model
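The Stata commands above standardize each residual by the multinomial standard deviation of a cell count, sqrt(e(1 - e/N)), with N = 1827. A minimal Python sketch of the same calculation follows; the function name and the example counts (30 observed, 25 expected) are assumptions chosen only for illustration.

```python
import math

N_TOTAL = 1827  # sample size used in the Stata commands above


def standardized_residual(observed, expected, n_total=N_TOTAL):
    """(observed - expected) / sqrt(expected * (1 - expected / n_total))."""
    variance = expected * (1.0 - expected / n_total)
    return (observed - expected) / math.sqrt(variance)


# Hypothetical cell: 30 observed, 25 expected under the model.
example = standardized_residual(30, 25.0)
print(round(example, 4))
```

Because the expected count appears in the denominator, sparse cells with very small expected counts can produce large standardized residuals from small absolute discrepancies, which is worth keeping in mind when reading the plots.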

Listing: Largest |Standardized Residual| Differences
  . gen sadiff = abs(sresid3)-abs(sresid5)
  . sort sadiff
  . list Pattern sadiff n tclass fclass sresid3 sresid5
Negative differences favor the 3-class model. Only a few are large, and these have small n.

Listing: Largest |Standardized Residual| Differences
  . gen sadiff = abs(sresid3)-abs(sresid5)
  . sort sadiff
  . list Pattern sadiff n tclass fclass sresid3 sresid5
Positive values favor the 5-class model. A few large values have fairly large n (e.g., pattern 110 = cues create distress, reactivity without re-experiencing).
Both models underestimate the number having all symptoms.
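The comparison in these listings can be sketched in Python as well: compute the standardized residual under each model, take the difference of absolute values, and sort. The variable names `n`, `tclass`, `fclass`, and `sadiff` follow the Stata commands; the counts below are invented for illustration and are not the PTSD data.

```python
import math

N_TOTAL = 1827  # sample size used in the Stata commands


def sres(observed, expected, n_total=N_TOTAL):
    """Multinomial standardized residual for one response pattern."""
    return (observed - expected) / math.sqrt(expected * (1 - expected / n_total))


def rank_by_sadiff(rows):
    """Add sadiff = |sresid3| - |sresid5| and sort ascending (like Stata's sort)."""
    ranked = []
    for row in rows:
        s3 = sres(row["n"], row["tclass"])
        s5 = sres(row["n"], row["fclass"])
        ranked.append({**row, "sresid3": s3, "sresid5": s5,
                       "sadiff": abs(s3) - abs(s5)})
    return sorted(ranked, key=lambda r: r["sadiff"])


# Hypothetical observed (n) and expected (tclass, fclass) counts.
rows = [
    {"pattern": 0, "n": 50, "tclass": 40.0, "fclass": 48.0},
    {"pattern": 11, "n": 5, "tclass": 5.5, "fclass": 8.0},
]
ranked = rank_by_sadiff(rows)
for r in ranked:
    print(r["pattern"], round(r["sadiff"], 3))
```

After sorting, patterns at the top of the list (negative `sadiff`) are fit better by the 3-class model and patterns at the bottom (positive `sadiff`) are fit better by the 5-class model, which is how the two listings are read.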

Conclusion
  The latent class model fit suggests a nosology with subpopulations exhibiting “few” (just over half), “many” (~14%), and “re-experiencing plus a few other” symptoms.
  The conditional independence assumption may not be reasonable for these data.

Objectives
For you to leave here knowing…
  How to use the LCR SAS Macro for latent class analysis
  A brief introduction to poLCA in R
  How to interpret and report output
  How to create residuals and conduct model checking with them