Caihong R. Li, MS Latent Class Analysis in Mplus November 7, 2017


1 Applied Psychometric Strategies Lab, Applied Quantitative and Psychometric Series

2 What will we learn today?
- Describe latent class analysis (LCA)
- Identify questions that can be answered by LCA
- Differentiate between LCA and factor analysis
- Describe the LCA steps in Mplus
- Provide an empirical example using LCA

3 What is LCA?
LCA is a latent variable modeling approach. It identifies unseen (latent) subgroups within a population using responses to a set of variables. The variables in an LCA can be nominal, ordinal, or continuous.

4 What are common applications of LCA?
- Identifying subgroups of students (e.g., over-confident students vs. less-confident students)
- Serving as a diagnostic test in clinical settings (e.g., assessing the validity of scores from a cognitive assessment)
- Classifying a sample into subgroups when there is no “gold standard” (e.g., the cut score between “self-regulators” and “procrastinators”)
- Adjusting for noise caused by invalid responding, nonresponse bias, etc.

5 What research questions can be answered by LCA?
- Are there different latent classes of students based on their responses to a set of items measuring a variable?
- If we hypothesize that the participants in our sample can be grouped into two latent classes, how do we confirm this hypothesis?
- If two latent classes are identified, what is the sample size of each latent class?
- Given someone’s response pattern, what is the probability that the person belongs to a certain class?

6 What is modeled in LCA? The “latent classes” variable
[Figure: path diagram in which the “latent classes” variable points to Items 1 to 5, followed by separate response-pattern panels over Items 1 to 5 for Latent Class 1 and Latent Class 2.]
The “latent classes” variable is a categorical latent variable whose categories are the latent classes. Latent classes differ from each other in their response patterns, while individuals within the same class have similar response patterns.
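For binary indicators such as the five in the example that follows, the latent classes variable is usually linked to the items through the standard local-independence LCA model. As a sketch (this formula is not on the slides): \pi_k is the proportion of the population in class k, p_{jk} is the probability that a member of class k endorses item j, and y_{ij} in {0, 1} is person i's response to item j.

```latex
P(\mathbf{y}_i) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} p_{jk}^{\,y_{ij}} \left(1 - p_{jk}\right)^{1 - y_{ij}},
\qquad \sum_{k=1}^{K} \pi_k = 1
```

Local independence means that, once class membership is known, the items carry no further association with each other; the classes differ only in their \pi_k and p_{jk}.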

7 What is the difference between LCA and factor analysis (FA)?
[Figure: two path diagrams side by side; left, a categorical “latent classes” variable points to Items 1 to 5 (LCA); right, a latent continuous construct points to Items 1 to 5 (FA).]
You might think you have seen this figure before in CFAs and EFAs, but this one is for LCA. So what is the difference between LCA and factor analysis? Both are latent variable modeling approaches. If the latent variable we model is continuous, it is a factor analysis; if the latent variable is categorical, it is a latent class analysis.
LCA: person-centered
FA: item-centered

8 Let’s review
- LCA identifies unseen subgroups within a population using responses to a set of variables/items.
- LCA is commonly used when latent classes need to be identified in a sample, or when a “gold standard” for classifying people is NOT readily available.
- The only latent variable modeled in LCA is the “latent classes” variable.
- LCA classifies people, NOT items.

9 What are the BASIC steps when conducting an LCA in Mplus?
1. Identify LCA indicators
2. Estimate LCA models
3. Evaluate LCA models
4. Interpret LCA results

10 1: Identify LCA indicators
Determine the items/variables you want to use and that make sense for your purposes. Options:
- Use an existing instrument
- Write a set of items for your purpose
- Combine the two options above

11 Example: Do we have invalid respondents in online survey responses?
Data (FAKE!): N = 1,000. Categories per indicator: 2.

Mplus code | Indicator/item | Options
1. Speedy | Average response time to an item is less than 3 s | 1 = less than 3 s; 0 = equal to or greater than 3 s
2. Lying | “I have told the truth on this survey.” | 1 = NO; 0 = YES
3. Careless | “I was careless when I answered this survey.” | 1 = YES; 0 = NO
4. Disable | “I have more than two types of disabilities.” |
5. Extreme | Mean score ranked at the 99th percentile or higher | 1 = 99th percentile or higher; 0 = lower than 99th percentile

12 2: Estimate LCA models
[Figure: four LCA models, each with the indicators Speedy, Lying, Careless, Disable, and Extreme, specifying 2, 3, 4, and 5 latent classes, respectively.]

13 Create Mplus syntax
1. Create the syntax for a 2-class LCA model.
2. Make sure the syntax runs properly.
3. Using the 2-class LCA syntax as a template, create separate syntax files for LCA models with 3, 4, and 5 latent classes.
4. Run the rest of the syntax files.

14 Create syntax for 2-latent-class model
Let’s look at the syntax file for an LCA with 2 classes. The blue words are commands, and the black words are statements we can modify. Let’s go through the syntax one command at a time.
The TITLE command gives the syntax file a name. The DATA command tells Mplus where to find the dataset; giving only the file name LCAaexample.csv means the dataset is saved in the same folder as the input and output files. The VARIABLE command tells Mplus which variables are in the dataset, which variables we will use in the current analysis, and which variables are categorical. Pay particular attention to the CLASSES statement: it tells Mplus there is one categorical latent variable (which we call c) with 2 levels. The ANALYSIS command tells Mplus that we need a mixture model (TYPE = MIXTURE); this is how we request a latent class analysis.
The rest of the syntax deals with the LCA results. We request the PLOT command so that most of the LCA results are combined into a single figure. The SERIES statement links the x-axis to the items, with Speedy labeled 1, Lying labeled 2, and so on. Under the SAVEDATA command, we ask Mplus to save each person’s most likely class (the “class” variable, saved by default) together with the conditional class probabilities, so for each individual we can tell the probability of belonging to class 1 or class 2. Under OUTPUT, we specify TECH11 and TECH14 to request two tests of the number of classes: TECH11 gives the Vuong-Lo-Mendell-Rubin test, and TECH14 gives the parametric bootstrapped likelihood ratio test.
To recap: the blue circles mark the regular commands we use in Mplus, the red boxes mark the syntax needed to specify a latent class analysis, and the orange syntax requests the LCA results.
Finally, pay special attention to the LRTSTARTS line. It is related to the TECH14 output, and we will see it again in the output file. If the TECH14 output reports an error, go back to the syntax and increase the last two numbers of LRTSTARTS; I changed them from their default values to make this syntax work. More complex LCA models require larger numbers.
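The notes above walk through each statement of the 2-class input file, but the syntax itself appeared only as an image. As a sketch, the Python function below assembles an Mplus input with the same commands (TITLE, DATA, VARIABLE with CLASSES = c(k), ANALYSIS with TYPE = MIXTURE and LRTSTARTS, PLOT with SERIES, SAVEDATA with CPROBABILITIES, OUTPUT with TECH11 and TECH14). Assumptions not taken from the slides: the function name make_lca_input, the lowercase variable names, that LCAaexample.csv holds exactly these five columns with no header row (Mplus data files contain numbers only), and the PLOT TYPE = PLOT3 setting. The LRTSTARTS values are the ones shown on slide 35.

```python
# Hypothetical variable names; assumes the CSV contains exactly these five columns.
ITEMS = ["speedy", "lying", "careless", "disable", "extreme"]


def make_lca_input(n_classes: int, data_file: str = "LCAaexample.csv") -> str:
    """Return the text of an Mplus input file for an LCA with n_classes classes."""
    items = " ".join(ITEMS)
    # SERIES maps each item to an x-axis position: speedy(1) lying(2) ...
    series = " ".join(f"{name}({i})" for i, name in enumerate(ITEMS, start=1))
    lines = [
        f"TITLE: LCA with {n_classes} latent classes;",
        f"DATA: FILE = {data_file};",
        "VARIABLE:",
        f"  NAMES = {items};",
        f"  USEVARIABLES = {items};",
        f"  CATEGORICAL = {items};",
        f"  CLASSES = c({n_classes});",  # one categorical latent variable c
        "ANALYSIS:",
        "  TYPE = MIXTURE;",  # requests the latent class analysis
        "  LRTSTARTS = 0 0 300 20;",  # starts used by the TECH14 bootstrap LRT
        "PLOT:",
        "  TYPE = PLOT3;",
        f"  SERIES = {series};",
        "SAVEDATA:",
        f"  FILE = lca{n_classes}_save.txt;",
        "  SAVE = CPROBABILITIES;",  # posterior class probabilities per person
        "OUTPUT:",
        "  TECH11 TECH14;",  # VLMR test and bootstrapped LRT
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical output file name for the 2-class model.
    with open("lca2.inp", "w") as f:
        f.write(make_lca_input(2))
```

Generating the text from one function keeps the 2- to 5-class inputs identical except for the class count and the saved-data file name.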

15 Create syntax for 2-latent-class model (cont.)

16 Create syntax for 2-latent-class model (cont.)

17 Create syntax for other models
If we want to specify a 3-class LCA model, where do we make changes? The first place is the CLASSES statement: instead of specifying that the categorical latent variable “c” has two categories, we change it to 3. In the same way, we can create syntax for the 4-class and 5-class LCA models. The second place is the first statement under SAVEDATA. Each new LCA model produces a new class variable and new conditional probability variables, so we need to give the saved data file a new name here, for example lca3_save.txt.
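The two edits described above can be automated when the 2-class file is used as a template. This is a minimal sketch, assuming the template contains exactly one CLASSES statement of the form c(2) and one SAVEDATA file named lcaK_save.txt; the function name adapt_template, the shortened template text, and the output names lca3.inp to lca5.inp are hypothetical.

```python
import re

# Shortened, hypothetical excerpt of a 2-class input file (only the relevant commands).
TEMPLATE_2CLASS = """\
VARIABLE:
  CLASSES = c(2);
ANALYSIS:
  TYPE = MIXTURE;
SAVEDATA:
  FILE = lca2_save.txt;
  SAVE = CPROBABILITIES;
"""


def adapt_template(template: str, n_classes: int) -> str:
    """Change the two statements that differ between K-class LCA inputs."""
    text, n_classes_edits = re.subn(
        r"CLASSES = c\(\d+\);", f"CLASSES = c({n_classes});", template
    )
    text, n_file_edits = re.subn(
        r"FILE = lca\d+_save\.txt;", f"FILE = lca{n_classes}_save.txt;", text
    )
    # Refuse to guess if the template does not look like the expected one.
    if n_classes_edits != 1 or n_file_edits != 1:
        raise ValueError("expected one CLASSES and one SAVEDATA FILE statement")
    return text


if __name__ == "__main__":
    for k in (3, 4, 5):
        with open(f"lca{k}.inp", "w") as f:
            f.write(adapt_template(TEMPLATE_2CLASS, k))
```

Raising an error when a statement is not found is deliberate: a silently unchanged CLASSES line would rerun the 2-class model under a new file name.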

18 Mplus files for each LCA model
Each LCA model is associated with 4 files:
- Input file (.inp)
- Output file (.out)
- Graph file (.gh5)
- New data file (e.g., lca2_save.txt)

19 3: Evaluate LCA models
Q: How many classes should we retain?
A: Use multiple criteria:
- Bayesian Information Criterion (BIC)
- Adjusted BIC (ABIC)
- Lo-Mendell-Rubin likelihood ratio test (LMR LRT)
- Bootstrap likelihood ratio test (bootstrap LRT)
- Interpretability
BIC and ABIC are information criteria rather than formal tests; smaller values indicate a better-fitting model. A significant p value on the LMR LRT or the bootstrap LRT indicates that the model fits better than the model with one fewer class. The Bayes factor (BF) compares two given models, assuming that one of them is the true model; a BF greater than 10 indicates strong evidence for the first model over the second. The correct model probability (cmP) also indicates which model is most likely to be the true one: among the models compared, the one with the largest cmP is preferred.
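Both BF and cmP can be approximated from the BIC values in the output. The sketch below uses the common BIC-based approximation, with SIC = -BIC / 2, BF(A vs. B) = exp(SIC_A - SIC_B), and cmP proportional to exp(SIC); this formulation is an assumption on my part, not something shown in the talk, and the BIC values in the test are hypothetical.

```python
import math


def bayes_factor(bic_a: float, bic_b: float) -> float:
    """Approximate Bayes factor of model A over model B from their BICs."""
    sic_a, sic_b = -0.5 * bic_a, -0.5 * bic_b
    return math.exp(sic_a - sic_b)


def correct_model_probabilities(bics: dict) -> dict:
    """Approximate probability that each candidate model is the true one."""
    sics = {name: -0.5 * b for name, b in bics.items()}
    best = max(sics.values())
    # Subtract the largest SIC before exponentiating to avoid underflow.
    weights = {name: math.exp(s - best) for name, s in sics.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

For example, with hypothetical BICs of 100 (model A) and 104 (model B), BF = exp(2), about 7.4, which falls short of the BF > 10 threshold even though A has the larger cmP (about .88).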

20 BIC & ABIC
LCA model | BIC | ABIC
2-Class | 2311.414 | 2276.477
3-Class | |
4-Class | |
5-Class | |
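Mplus computes BIC as -2 log L + p ln(n) and the sample-size adjusted BIC by replacing n with (n + 2) / 24, where p is the number of free parameters. As a worked check on the 2-class row of the table above, the gap between BIC and ABIC depends only on p and n, so it can be solved for p. With N = 1,000 it comes out at 11, which matches (K - 1) class proportions plus one threshold per item per class for 2 classes and 5 binary items. The helper names are mine.

```python
import math


def n_free_params(n_classes: int, n_binary_items: int) -> int:
    # (K - 1) class proportions + one threshold per binary item in each class
    return (n_classes - 1) + n_classes * n_binary_items


def bic(loglik: float, n_params: int, n: int) -> float:
    return -2.0 * loglik + n_params * math.log(n)


def abic(loglik: float, n_params: int, n: int) -> float:
    # Sample-size adjusted BIC: n is replaced by n* = (n + 2) / 24
    return -2.0 * loglik + n_params * math.log((n + 2) / 24)


# BIC - ABIC = p * [ln(n) - ln((n + 2) / 24)], so p can be recovered from the table
n = 1000
implied_p = (2311.414 - 2276.477) / (math.log(n) - math.log((n + 2) / 24))
```

implied_p evaluates to 11.0 (to about four decimal places), equal to n_free_params(2, 5).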

21 LMR LRT
LCA models | p for LMR
2-Class vs. 1-Class | < .001
3-Class vs. 2-Class | .0244
4-Class vs. 3-Class | .0017
5-Class vs. 4-Class | .0089

22 Bootstrap LRT
LCA models | p for bootstrap
2-Class vs. 1-Class | < .001
3-Class vs. 2-Class | .1765
4-Class vs. 3-Class | .0698
5-Class vs. 4-Class | .1923

23 Which model is the best?
LCA model | BIC | ABIC
2-Class | 2311.414 | 2276.477
3-Class | |
4-Class | |
5-Class | |

LCA models | p for LMR | p for bootstrap
2-Class vs. 1-Class | < .001 | < .001
3-Class vs. 2-Class | .0244 | .1765
4-Class vs. 3-Class | .0017 | .0698
5-Class vs. 4-Class | .0089 | .1923
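One common heuristic for reading sequential likelihood ratio tests (not necessarily the rule used in this talk) is to add classes until the k vs. (k - 1) test is first nonsignificant, then keep the last model before that point. The sketch applies it to the p values in the table; "< .001" is coded as .001, which does not change any decision at alpha = .05. The function name is hypothetical.

```python
def select_k_sequential(pvalues: dict, alpha: float = 0.05) -> int:
    """pvalues maps k to the p value of the k-class vs. (k - 1)-class test."""
    selected = min(pvalues) - 1  # the baseline model with one fewer class
    for k in sorted(pvalues):
        if pvalues[k] < alpha:
            selected = k  # k classes fit better than k - 1 classes
        else:
            break  # stop at the first nonsignificant comparison
    return selected


bootstrap_p = {2: 0.001, 3: 0.1765, 4: 0.0698, 5: 0.1923}
lmr_p = {2: 0.001, 3: 0.0244, 4: 0.0017, 5: 0.0089}
```

Under this rule the bootstrap LRT points to 2 classes, while every LMR comparison is significant, so the LMR never stops within the models fitted (it returns 5). The disagreement is why the evaluation step combines several criteria with interpretability.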

24 4: Interpret LCA results
- Given that a person belongs to a certain class, what is the person’s probability of saying “yes” to each item?
- What should we label each latent class?
- Given a person’s response pattern, what is the probability that the person belongs to a certain class?
- What is the sample size of each latent class?

25 Given that a person belongs to a certain class, what is the person’s probability of saying “yes” to each item?
In the output, Category 1 is 0 (no) and Category 2 is 1 (yes). For example, a person in Latent Class 1 has a high probability of endorsing Speedy: members of this class report being speedy with probability .931 (93.1%). Similarly, individuals in Latent Class 1 have high probabilities of endorsing all five items. In contrast, individuals in Latent Class 2 tend to have low probabilities of endorsing these five items. We therefore labeled Latent Class 1 as invalid respondents and Latent Class 2 as honest respondents.
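For binary indicators, Mplus estimates a class-specific threshold on the logit scale for each item, and the probability of Category 2 (yes) is 1 / (1 + exp(threshold)). Mplus also prints these probabilities directly, but the conversion helps when reading the threshold tables. The sketch below is a hedged illustration; the helper names are mine.

```python
import math


def prob_yes(threshold: float) -> float:
    """P(item = 1 | class) for a binary item, given a logit-scale threshold."""
    return 1.0 / (1.0 + math.exp(threshold))


def threshold_from_prob_yes(p: float) -> float:
    """Inverse of prob_yes: the threshold implied by an endorsement probability."""
    return math.log((1.0 - p) / p)
```

For example, the 93.1% endorsement probability for Speedy in Latent Class 1 corresponds to a threshold of about -2.60; negative thresholds mean "yes" is the more likely response.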

26 What should we label each latent class?

27 Given a person’s response pattern, what is the probability that person belongs to a certain class?
lca2_save.txt:
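Each row of lca2_save.txt holds one person's posterior class probabilities (Mplus lists the column order in the SAVEDATA section of the output). These come from Bayes' rule applied to the fitted model. The sketch below reproduces that computation for binary items under local independence. The function name and the class proportions and item probabilities in the test are hypothetical, not the estimates from this talk.

```python
def posterior_class_probs(pattern, class_props, item_probs):
    """
    P(class k | response pattern) for binary items under local independence.
    pattern: list of 0/1 responses; class_props: prior class proportions;
    item_probs: per class, the list of P(item = 1 | class).
    """
    joint = []
    for pi_k, p_k in zip(class_props, item_probs):
        likelihood = 1.0
        for y, p in zip(pattern, p_k):
            likelihood *= p if y == 1 else (1.0 - p)
        joint.append(pi_k * likelihood)  # P(class k) * P(pattern | class k)
    total = sum(joint)  # P(pattern), summed over classes
    return [j / total for j in joint]
```

A person who endorses all five items is assigned almost entirely to the class with high endorsement probabilities, even when that class is the smaller one.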

28 What is the sample size of each latent class?
Intervention? Sensitivity analysis
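Mplus reports class sizes in two ways: by summing the posterior probabilities over people, and by counting people by their most likely class. The sketch below computes both from a list of posterior-probability rows such as those saved in lca2_save.txt. The function name and the three-person input in the test are hypothetical.

```python
def class_counts(posteriors):
    """posteriors: one list of posterior class probabilities per person."""
    n_classes = len(posteriors[0])
    # (a) expected class sizes: sum of posterior probabilities per class
    expected = [sum(row[k] for row in posteriors) for k in range(n_classes)]
    # (b) modal assignment: count people whose most likely class is k
    modal = [0] * n_classes
    for row in posteriors:
        modal[max(range(n_classes), key=row.__getitem__)] += 1
    return expected, modal
```

The two counts agree closely when classification is sharp (posteriors near 0 or 1) and diverge when many people have ambiguous posteriors.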

29 All in one plot
Figure 1. Item probability profiles for 2-class latent class model (N = 1,000). The x-axis shows the five indicators: Speedy, Lying, Careless, Disable, and Extreme.

30 How to get the item probability profile plot?

31 What does a bad model look like?
Figure 2. Item probability profiles for 3-class latent class model (N = 1,000)

32 How can we use the “latent classes” variable?
[Figure: two path diagrams built on the latent classes variable measured by Speedy, Lying, Careless, Disable, and Extreme; in the first, Gender and Race predict the latent classes; in the second, the latent classes, Gender, and Race predict Self-efficacy.]
In the MODEL command, add:
- Relationship with covariates: c ON gender race;
- Relationship with outcome variables: c ON gender race; SE ON c gender race;
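A statement such as c ON gender race is a multinomial logistic regression of class membership on the covariates. In Mplus the last class is the reference class, with its intercept and slopes fixed at 0. The sketch below turns such coefficients into class probabilities for one person. The function name, the coefficient values, and the 0/1 coding of gender and race in the test are hypothetical.

```python
import math


def class_probs_given_covariates(intercepts, slopes, x):
    """
    Multinomial logit with the last class as reference:
    log[P(c = k | x) / P(c = K | x)] = a_k + b_k . x for k < K.
    intercepts: a_1..a_{K-1}; slopes: b_1..b_{K-1} (each a list over covariates).
    """
    logits = [a + sum(b_j * x_j for b_j, x_j in zip(b, x)) for a, b in zip(intercepts, slopes)]
    logits.append(0.0)  # reference (last) class
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # shift by max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]
```

With two classes this reduces to ordinary logistic regression for membership in class 1 relative to class 2.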

33 Troubleshooting bootstrap LRT

34 Common problem with bootstrap LRT

35 How to solve it?
Increase the last two LRTSTARTS numbers above their default values, for example:
LRTSTARTS = 0 0 300 20;
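As I understand the Mplus documentation, the four LRTSTARTS numbers are the initial-stage random starts and final-stage optimizations for the (k - 1)-class model, followed by the same two numbers for the k-class model. The fix above raises the last two. When input files are generated by script, a small helper can do this. The name bump_lrtstarts and the scaling approach are hypothetical, a sketch rather than an Mplus feature.

```python
def bump_lrtstarts(statement: str, factor: int = 2) -> str:
    """Scale the k-class starts and final-stage optimizations (last two numbers)."""
    body = statement.strip().rstrip(";")
    name, values = body.split("=", 1)
    nums = [int(v) for v in values.split()]
    if len(nums) != 4:
        raise ValueError("LRTSTARTS takes four numbers")
    nums[2] *= factor  # random starts for the k-class model
    nums[3] *= factor  # final-stage optimizations for the k-class model
    return f"{name.strip()} = {' '.join(str(v) for v in nums)};"
```

Keeping the first two numbers unchanged mirrors the advice to increase only the last two values.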

36 Problem solved!

37 Things to keep in mind when doing LCA
- LCA classifies people (not items) into unseen subgroups.
- A large dataset is required. A small dataset can produce many estimation errors, and any inferential statistics based on it may be biased.
- With many indicators, consider the full version of Mplus. The Mplus demo allows a maximum of 6 dependent variables and 2 independent variables.
- Other software for LCA: Latent GOLD, PROC LCA in SAS, poLCA in R.


39 How do I cite and reference this talk?
If you wish to cite the video of this AQPS Talk, please use this reference and citation:
Reference: Li, C.R. (2017, November). Latent class analysis in Mplus. [Video file]. Retrieved from
In-text citation: Li (2017) or (Li, 2017)
This PowerPoint handout can be found at the APS Lab website:
You can download the fake raw data and the Mplus input and output syntax used in this talk from the APS Lab website. You are encouraged to adapt our syntax for your own research; please use this reference and citation when doing so:
Li, C.R. (2017). Name of specific syntax file you adapted from us goes here [Data file]. Retrieved from


Download ppt "Caihong R. Li, MS Latent Class Analysis in Mplus November 7, 2017"

Similar presentations


Ads by Google