Caihong R. Li, MS Latent Class Analysis in Mplus November 7, 2017


Applied Psychometric Strategies Lab, Applied Quantitative and Psychometric Series

What will we learn today?
- Describe latent class analysis (LCA)
- Identify questions that can be answered by LCA
- Differentiate between LCA and factor analysis
- Describe LCA steps in Mplus
- Provide an empirical example using LCA

What is LCA?
- LCA is a latent variable modeling approach
- LCA identifies unseen (latent) subgroups within a population, using responses from a set of variables
- Variables in an LCA can be nominal, ordinal, or continuous

What are common applications of LCA?
- Identify subgroups of students (e.g., over-confident students vs. less-confident students)
- LCA could also be used as a diagnostic test in clinical settings (e.g., assessing the validity of scores from a cognitive assessment)
- LCA could also be used to classify a sample into subgroups when we don’t have a “gold standard” (e.g., the cut score between “self-regulators” and “procrastinators”)
- LCA could also be used to adjust for noise caused by invalid responding, nonresponse bias, etc.

What research questions can be answered by LCA?
- Are there different latent classes of students based on their responses to a set of items measuring a variable?
- If we hypothesize that the participants in our sample can be grouped into two latent classes, how do we confirm this hypothesis?
- If two latent classes are identified, what is the sample size per latent class?
- Given someone’s response pattern, what is the probability that the person belongs to a certain class?

What is modeled in LCA? The “latent classes” variable
- The “latent classes” variable is a categorical latent variable whose categories are the latent classes
- Latent classes differ from each other in their response patterns
- Individuals in each class are similar to each other in their response patterns
[Diagram: Item 1 to Item 5 as indicators of a single “Latent Classes” variable; Latent Class 1 and Latent Class 2 are each shown with the same five items.]
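For binary items like the ones used later in this talk, the standard latent class model can be written out as a sketch. The symbols are notation introduced here for illustration, not taken from the slides: \pi_c is the proportion of people in class c, \rho_{jc} is the probability that a member of class c endorses item j, and y_{ij} is person i's 0/1 response to item j. Items are assumed independent within a class.

```latex
P(Y_i = y_i) \;=\; \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J}
  \rho_{jc}^{\,y_{ij}} \,(1 - \rho_{jc})^{\,1 - y_{ij}},
\qquad \sum_{c=1}^{C} \pi_c = 1 .
```

Classes differ in their \rho_{jc} profiles, which is what "latent classes differ from each other in their response patterns" means in this notation.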

What is the difference between LCA and factor analysis (FA)?
[Diagram: two path diagrams with Item 1 to Item 5 as indicators, one of “Latent classes” and one of “A latent continuous construct”.]
You might wonder: I have seen this figure in CFAs and EFAs, but here it is used for LCA, so what is the difference between LCA and factor analysis? Both are latent variable modeling approaches. If the latent variable we are trying to model is continuous, it is a factor analysis. If the latent variable is categorical, it is a latent class analysis.
- LCA: person-centered
- FA: item-centered

Let’s Review
- LCA identifies unseen subgroups within a population, using responses from a set of variables/items
- LCA is commonly used when it is necessary to identify latent classes in a sample or when a “gold standard” for classifying people is NOT readily available
- The only variable modeled in LCA is the “latent classes” variable
- LCA classifies people, NOT items

What are the BASIC steps when conducting an LCA in Mplus?
1. Identify LCA indicators
2. Estimate LCA models
3. Evaluate LCA models
4. Interpret LCA results

1: Identify LCA indicators
Determine the items/variables that you want to use and that make sense for your purposes:
- An existing instrument
- A set of items written for your purpose
- A combination of the above options

Example: Do we have invalid respondents in online survey responses?
Data (FAKE!), N = 1,000

Mplus Code  | Indicator/Item                                    | Categories | Options
1. Speedy   | Average response time to an item is less than 3s | 2          | 1 = less than 3s; 0 = equal to or greater than 3s
2. Lying    | I have told the truth on this survey.             | 2          | 1 = NO; 0 = YES
3. Careless | I was careless when I answered this survey.       | 2          | 1 = YES; 0 = NO
4. Disable  | I have more than two types of disabilities.       | 2          | 1 = YES; 0 = NO
5. Extreme  | Mean score ranked at the 99th percentile or higher | 2         | 1 = 99th percentile or higher; 0 = lower than 99th percentile
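The coding rules in this example can be sketched in Python. This is an illustration only: the function names, argument names, and the choice of `statistics.quantiles` to estimate the 99th-percentile cutoff are assumptions made here, and the talk's fake dataset already contains the coded 0/1 indicators.

```python
from statistics import quantiles


def percentile_99(scores):
    """Estimate the 99th-percentile cutoff of the mean scores (hypothetical helper)."""
    # quantiles(..., n=100) returns 99 cut points; index 98 is the 99th percentile.
    return quantiles(scores, n=100)[98]


def code_indicators(mean_rt_seconds, told_truth, was_careless,
                    many_disabilities, mean_score, cutoff_99):
    """Return the five 0/1 indicators for one respondent."""
    return {
        # Speedy: 1 = average response time below 3 s, 0 = 3 s or more
        "speedy": 1 if mean_rt_seconds < 3 else 0,
        # Lying: the item is "I have told the truth", so 1 = NO and 0 = YES
        "lying": 0 if told_truth else 1,
        # Careless: 1 = YES, 0 = NO
        "careless": 1 if was_careless else 0,
        # Disable: assumed to follow the same 1 = YES coding as Careless
        "disable": 1 if many_disabilities else 0,
        # Extreme: 1 = mean score at or above the 99th-percentile cutoff
        "extreme": 1 if mean_score >= cutoff_99 else 0,
    }


if __name__ == "__main__":
    fake_scores = [float(s) for s in range(1, 201)]  # made-up mean scores
    cutoff = percentile_99(fake_scores)
    print(code_indicators(2.5, True, False, False, 200.0, cutoff))
```

Each respondent then contributes one row of five binary values, which is the form the CATEGORICAL indicators take in the Mplus input.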

2: Estimate LCA models
[Diagrams: four candidate models with 2, 3, 4, and 5 latent classes, each measured by Speedy, Lying, Careless, Disable, and Extreme.]

Create Mplus syntax
- Create the syntax for a 2-class LCA model
- Make sure the syntax runs properly
- Using the 2-class LCA model syntax as a template, create separate syntaxes for LCA models with 3, 4, and 5 latent classes
- Run the rest of the syntaxes

Create syntax for 2-latent-class model
Let’s look at the syntax file for an LCA with 2 classes. On the slide, the words in blue are commands and the text in black is the statements we can modify. Let’s go through the syntax command by command.

The TITLE command gives a name to the syntax file. The DATA command tells Mplus where to find the dataset. Giving only the file name, LCAaexample.csv, means the dataset is saved in the same folder as the input and output files. The VARIABLE command tells Mplus which variables are in the dataset, which variables we will use in the current analysis, and which variables are categorical. Pay particular attention to the CLASSES statement: it tells Mplus that there is one categorical latent variable (which we call c) and that it has 2 levels. The ANALYSIS command tells Mplus that we need a mixture model. This is how we request a latent class analysis.

The next section of the syntax is all about the LCA results. We use the PLOT command so that most of the LCA results can be combined into one figure. The SERIES statement links the x-axis to the items, with Speedy labeled as 1, Lying labeled as 2, and so on. Under the SAVEDATA command, we ask Mplus to save the “class” variable into the dataset (this is the default). We also ask Mplus to save the class probabilities, so that for each individual we can tell the probability of belonging to class 1 or class 2. Under OUTPUT, we specify TECH11 and TECH14 to request the two tests that assess the number of classes: TECH11 is the Vuong-Lo-Mendell-Rubin test and TECH14 is the parametric bootstrapped likelihood ratio test.

To recap, the blue circles on the slide mark the regular commands we use in Mplus, the red boxes mark the syntax needed to specify a latent class analysis, and the orange syntax concerns the LCA results.

The last thing that deserves special attention is the line LRTSTARTS = 0 0 300 20; we will see it again in the output file. This statement is related to the TECH14 output. If the TECH14 output reports an error, we need to go back to the syntax and increase the last two numbers. The default is 0 0 40 8, and I changed it to 0 0 300 20 to make the syntax work. More complex LCA models require bigger numbers.

Create syntaxes for other models
If we want to specify a 3-class LCA model, where do we make changes? There are two places. The first is the CLASSES statement: instead of specifying that the latent categorical variable “c” has two categories, we change the number to 3. The second is the first statement under SAVEDATA. For every new LCA model, Mplus produces a new class variable and new class probability variables, so we need to give the saved data file a new name. Let’s change it to lca3_save.txt. In the same way, we can create syntaxes for the 4-class and 5-class LCA models.
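The template idea can be sketched with a short Python script that fills in the two places that change. The Mplus input text below is a reconstruction from the description in this talk, not the author's actual file: the lowercase variable names, the `lca{k}.inp` input file names, and the PLOT3 plot type are assumptions, while the data file name, LRTSTARTS values, saved file names, and TECH11/TECH14 come from the talk.

```python
from pathlib import Path

# Reconstructed Mplus input; {k} marks the two places that change per model.
TEMPLATE = """TITLE: LCA with {k} latent classes;
DATA: FILE IS LCAaexample.csv;
VARIABLE:
  NAMES ARE speedy lying careless disable extreme;
  USEVARIABLES ARE speedy lying careless disable extreme;
  CATEGORICAL ARE speedy lying careless disable extreme;
  CLASSES = c({k});
ANALYSIS:
  TYPE = MIXTURE;
  LRTSTARTS = 0 0 300 20;
PLOT:
  TYPE = PLOT3;
  SERIES = speedy(1) lying(2) careless(3) disable(4) extreme(5);
SAVEDATA:
  FILE IS lca{k}_save.txt;
  SAVE = CPROB;
OUTPUT: TECH11 TECH14;
"""


def make_input(k):
    """Return Mplus input text for a k-class LCA model."""
    if k < 2:
        raise ValueError("the talk's models start at 2 latent classes")
    return TEMPLATE.format(k=k)


def write_inputs(directory, class_counts=range(2, 6)):
    """Write one input file per model (hypothetical file names lca2.inp ... lca5.inp)."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for k in class_counts:
        path = out_dir / f"lca{k}.inp"
        path.write_text(make_input(k))
        paths.append(path)
    return paths


if __name__ == "__main__":
    print(make_input(3))
```

Generating the files from one template keeps the 3-, 4-, and 5-class inputs identical to the 2-class input except for the CLASSES statement and the saved data file name.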

Mplus files for each LCA model
For each LCA model, 4 files are connected with it:
- Input file
- Output file
- Graph file
- New data file

3: Evaluate LCA models
Q: How many classes should we retain?
A: Use multiple statistical criteria:
- Bayesian Information Criterion (BIC)
- Adjusted BIC (ABIC)
- Lo-Mendell-Rubin likelihood ratio test (LMR LRT)
- Bootstrap likelihood ratio test (Bootstrap LRT)
- Interpretability
BIC and ABIC are information criteria that weigh model fit against model complexity; the smaller the value, the better. If the p value of the LMR LRT or the Bootstrap LRT is significant, the model fits better than the model with one fewer class. A Bayes factor (BF) compares two given models and indicates which one is more likely to be the true model, assuming that only one of them is true; a BF value greater than 10 is taken as strong evidence for the favored model. The correct model probability (cmP) also gives a sense of the true model: the model with the largest cmP value is considered the most likely to be true.
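How the two information criteria penalize complexity can be sketched in a few lines of Python. The formulas are the standard definitions (BIC = -2 log L + p ln N, with the sample-size adjusted version replacing N by (N + 2) / 24); the function names and the parameter count for binary items are written out here for illustration and are not taken from the slides.

```python
import math


def n_free_parameters(n_classes, n_binary_items):
    """One endorsement probability per item per class, plus (classes - 1) class proportions."""
    return n_classes * n_binary_items + (n_classes - 1)


def bic(loglik, n_params, n):
    """Bayesian Information Criterion; smaller is better."""
    return -2.0 * loglik + n_params * math.log(n)


def abic(loglik, n_params, n):
    """Sample-size adjusted BIC: same form, with n replaced by (n + 2) / 24."""
    return -2.0 * loglik + n_params * math.log((n + 2) / 24)


if __name__ == "__main__":
    n = 1000                       # sample size of the talk's fake dataset
    p2 = n_free_parameters(2, 5)   # 2 classes, 5 binary indicators
    # Back out the log-likelihood implied by the 2-class BIC reported in the talk.
    loglik_2 = -(2311.414 - p2 * math.log(n)) / 2.0
    print(p2, round(abic(loglik_2, p2, n), 3))
```

As a consistency check, the log-likelihood implied by the reported 2-class BIC should give an ABIC close to the 2276.477 reported for the same model, which suggests the output follows these definitions.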

BIC & ABIC

LCA Models | BIC      | ABIC
2-Class    | 2311.414 | 2276.477
3-Class    | 2342.577 | 2288.584
4-Class    | 2371.190 | 2298.140
5-Class    | 2405.083 | 2312.977

LMR LRT

LCA Models          | p for LMR
2-Class vs. 1-Class | < .001
3-Class vs. 2-Class | .0244
4-Class vs. 3-Class | .0017
5-Class vs. 4-Class | .0089

Bootstrap LRT

LCA Models          | p for Bootstrap
2-Class vs. 1-Class | < .001
3-Class vs. 2-Class | .1765
4-Class vs. 3-Class | .0698
5-Class vs. 4-Class | .1923

Which model is the best?

LCA Models | BIC      | ABIC
2-Class    | 2311.414 | 2276.477
3-Class    | 2342.577 | 2288.584
4-Class    | 2371.190 | 2298.140
5-Class    | 2405.083 | 2312.977

LCA Models          | p for LMR | p for Bootstrap
2-Class vs. 1-Class | < .001    | < .001
3-Class vs. 2-Class | .0244     | .1765
4-Class vs. 3-Class | .0017     | .0698
5-Class vs. 4-Class | .0089     | .1923
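One way to read these tables side by side is sketched below in Python. The fit values are the ones reported in the talk; the decision rules (pick the smallest BIC or ABIC, and keep adding classes only while each likelihood ratio test stays significant at .05) are a common convention assumed here, and "< .001" is stored as 0.001 as an upper bound.

```python
# Fit statistics reported in the talk, keyed by number of classes.
# The p values compare the k-class model with the (k - 1)-class model.
FIT = {
    2: {"bic": 2311.414, "abic": 2276.477, "lmr_p": 0.001, "blrt_p": 0.001},
    3: {"bic": 2342.577, "abic": 2288.584, "lmr_p": 0.0244, "blrt_p": 0.1765},
    4: {"bic": 2371.190, "abic": 2298.140, "lmr_p": 0.0017, "blrt_p": 0.0698},
    5: {"bic": 2405.083, "abic": 2312.977, "lmr_p": 0.0089, "blrt_p": 0.1923},
}


def best_by(criterion):
    """Number of classes with the smallest value of an information criterion."""
    return min(FIT, key=lambda k: FIT[k][criterion])


def last_supported(p_key, alpha=0.05):
    """Largest k reached before the first non-significant k vs. (k - 1) test."""
    chosen = 1
    for k in sorted(FIT):
        if FIT[k][p_key] < alpha:
            chosen = k
        else:
            break
    return chosen


if __name__ == "__main__":
    for name in ("bic", "abic"):
        print(name, "prefers", best_by(name), "classes")
    for name in ("lmr_p", "blrt_p"):
        print(name, "supports up to", last_supported(name), "classes")
```

Under these assumed rules the criteria do not all agree, which is why interpretability is listed alongside the statistical criteria.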

4: Interpret LCA Results
- Given a person belongs to a certain class, what is this person’s probability of saying “yes” to each item?
- What should we label each latent class?
- Given a person’s response pattern, what is the probability that person belongs to a certain class?
- What is the sample size of each latent class?

Given a person belongs to a certain class, what is this person’s probability of saying “yes” to each item?
In the output, Category 1 is 0 (no) and Category 2 is 1 (yes). For example, a person who belongs to Latent Class 1 has a high probability of endorsing Speedy (93.1%). Similarly, individuals in Latent Class 1 have high probabilities of endorsing all five items. In contrast, individuals in Latent Class 2 tend to have low probabilities of endorsing these five items.

What should we label each latent class?
Individuals in Latent Class 1 have high probabilities of endorsing all five items (for example, 93.1% for Speedy), whereas individuals in Latent Class 2 tend to have low probabilities of endorsing them. Thus, we decided to name Latent Class 1 “invalid respondents” and Latent Class 2 “honest respondents”.

Given a person’s response pattern, what is the probability that person belongs to a certain class? lca2_save.txt:
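The class probabilities saved for each person follow from Bayes' rule, which can be sketched in Python. Apart from the 93.1% reported for Speedy in Latent Class 1, every number below is a made-up placeholder (class proportions and the other item probabilities are not given in the transcript), and the function name is hypothetical.

```python
def posterior(pattern, prevalences, item_probs):
    """Posterior class probabilities for one 0/1 response pattern.

    pattern      -- list of 0/1 responses, one per item
    prevalences  -- class proportions, summing to 1
    item_probs   -- item_probs[c][j] = P(item j = 1 | class c)
    """
    joint = []
    for class_share, probs in zip(prevalences, item_probs):
        likelihood = class_share
        for y, p in zip(pattern, probs):
            # Items are treated as independent within a class.
            likelihood *= p if y == 1 else (1.0 - p)
        joint.append(likelihood)
    total = sum(joint)
    return [value / total for value in joint]


if __name__ == "__main__":
    prevalences = [0.10, 0.90]                 # placeholder class sizes
    item_probs = [
        [0.931, 0.90, 0.90, 0.90, 0.90],       # class 1; only 0.931 is from the talk
        [0.05, 0.05, 0.05, 0.05, 0.05],        # class 2; placeholders
    ]
    print(posterior([1, 1, 1, 1, 1], prevalences, item_probs))
    print(posterior([0, 0, 0, 0, 0], prevalences, item_probs))
```

A person who endorses every indicator is assigned almost entirely to the first class under these placeholder values, and a person who endorses none is assigned almost entirely to the second.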

What is the sample size of each latent class?
- Intervention?
- Sensitivity analysis
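Class sizes can be counted in two ways from the saved class probabilities, sketched here in Python: by assigning each person to the most likely class, or by summing the probabilities. The function name and the three example rows are hypothetical, and reading the actual columns of lca2_save.txt is left out because its layout is not shown in the transcript.

```python
def class_sizes(posteriors):
    """Return (modal counts, expected counts) per class.

    posteriors -- one row per person, each row holding that person's
                  class probabilities (summing to 1)
    """
    n_classes = len(posteriors[0])
    modal = [0] * n_classes
    expected = [0.0] * n_classes
    for row in posteriors:
        # Modal assignment: the person counts toward the most likely class.
        modal[row.index(max(row))] += 1
        # Expected size: each person contributes fractionally to every class.
        for c, p in enumerate(row):
            expected[c] += p
    return modal, expected


if __name__ == "__main__":
    example_rows = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]  # made-up probabilities
    print(class_sizes(example_rows))
```

The two counts agree closely when people are classified with high certainty and diverge when many rows are close to an even split.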

All in one plot
Figure 1. Item probability profiles for 2-class latent class model (N = 1,000). [Plot: items Speedy, Careless, Lying, Disable, and Extreme on the x-axis.]

How to get the item probability profile plot?

What does a bad model look like? Figure 2. Item probability profiles for 3-class latent class model (N = 1,000)

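How can we use the “latent classes” variable?
- Relationship with covariates
  [Diagram: Gender and Race related to the latent classes, which are measured by Speedy, Lying, Careless, Disable, and Extreme.]
  Under analysis, add: c on gender race;
- Relationship with outcome variables
  [Diagram: Gender, Race, and the latent classes related to Self-efficacy; the latent classes are measured by Speedy, Lying, Careless, Disable, and Extreme.]
  Under analysis, add: c on gender race; SE on c gender race;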

Troubleshooting bootstrap LRT

Common problem with bootstrap LRT

How to solve it?
- Default: LRTSTARTS = 0 0 40 8;
- Changed to: LRTSTARTS = 0 0 300 20;

Problem solved!

Things to keep in mind when doing LCA
- LCA classifies people (not items) into unseen subgroups
- A large dataset is required
- With many indicators, consider the full version of Mplus
- Other software for LCA: LatentGold, PROC LCA in SAS, poLCA in R
Mplus demo limits:
- Maximum number of dependent variables: 6
- Maximum number of independent variables: 2
A small dataset tends to produce many estimation errors, and inferences drawn from it may be biased.

References
Asparouhov, T., & Muthén, B. (2012, May 22). Using Mplus TECH11 and TECH14 to test the number of latent classes. Mplus Web Notes: No. 14.
Denson, N., & Ing, M. (2014). Latent class analysis in higher education: An illustrative example of pluralistic orientation. Research in Higher Education, 55, 508-526. doi:10.1007/s11162-013-9324-5
Porcu, M., & Giambona, F. (2017). Introduction to latent class analysis with applications. Journal of Early Adolescence, 37, 129-158. doi:10.1177/0272431616648452

How do I cite and reference this talk?
If you wish to cite the video of this AQPS Talk, please use this reference and citation:
Reference: Li, C.R. (2017, November). Latent class analysis in Mplus. [Video file]. Retrieved from http://sites.education.uky.edu/apslab/upcoming-events/
In-text citation: Li (2017) or (Li, 2017)
This PowerPoint Handout can be found at the APS Lab website: http://sites.education.uky.edu/apslab/upcoming-events/
You can download the fake raw data and Mplus input and output syntax used in this talk from the APS Lab website. You are encouraged to adapt our syntax for your own research; please use this reference and citation when doing so:
Li, C.R. (2017). Name of specific syntax file you adapted from us goes here [Data file]. Retrieved from http://sites.education.uky.edu/apslab/upcoming-events/