Latent Class Analysis Presented by Nicholas Branic UCI Stats n’ Snacks December 9, 2014.


Presentation Overview What is latent class analysis? Writing MPlus code and running LCA. Importing MPlus output into Stata. …And fixing an irritating importation problem.

What is Latent Class Analysis? A data-driven technique for identifying group classifications (“latent” classes) based on shared characteristics within a unique dataset. Groups are not specified a priori, though they may mirror existing theory/literature. Rather, you specify the variables/attributes used for the classifications.

What is Latent Class Analysis? Generate predicted probabilities of class membership

What is Latent Class Analysis? Identify class membership for cases in dataset. For example, changing home mortgage loan activity across SoCal tracts. [Table with columns Class, Tracts, Percent; the values did not survive in this transcript.]

What is Latent Class Analysis? Other statistical techniques similar to LCA: exploratory factor analysis, principal components analysis, confirmatory factor analysis, k-means cluster analysis, and hot spot analysis. …And I’m sure there are more examples.

So How Can I Use It? Steps for using Stata and MPlus: (1) In Stata: prepare the dataset for LCA; use the “outfile” command to produce the data as a .txt file. (2) In MPlus: write an input file to run the latent class analysis; execute the model; produce a .txt output file. (3) In Stata: import the .txt output into Stata; clean the LCA results; merge the LCA results into the original dataset.

Preparing Your Data Open your full dataset. Remove any cases that feature entirely missing data (in Stata, any case that has all “.” values). A shortcut: use the “mdesc” command (a downloadable .ado file). Sort your data; this is not necessary, but not a bad idea. Save out a copy of the prepared data; you’ll merge the LCA results into this dataset later.
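The “entirely missing” rule can be illustrated outside Stata. The Python below is only a sketch of the logic, not part of the Stata/MPlus workflow from the talk; the function name and the toy rows are hypothetical, and “.” stands in for Stata’s missing value.

```python
def drop_all_missing(rows, missing="."):
    """Keep rows that have at least one non-missing analysis value."""
    return [row for row in rows if any(value != missing for value in row)]


# Toy rows of analysis variables (made-up values).
rows = [
    ["0.25", ".", "3"],
    [".", ".", "."],     # every value missing: this case is removed
    [".", "1.75", "."],  # partly missing: this case is kept
]
kept = drop_all_missing(rows)
```

Only the case with no information at all is dropped; partly missing cases stay in the data.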

Exporting Your Data Use the “outfile” command to create a .txt version of your data
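As a rough picture of what the exported file looks like, here is a minimal Python sketch. It assumes the target is a plain-text file with one case per line, values separated by spaces, no header row, and “.” for missing values. The function name and the data are hypothetical; the talk itself does this step with Stata’s “outfile” command.

```python
import io


def write_plain_text(rows, handle, missing="."):
    """Write one case per line, space-separated, with no header row."""
    for row in rows:
        fields = [missing if value is None else str(value) for value in row]
        handle.write(" ".join(fields) + "\n")


buffer = io.StringIO()  # stands in for an open .txt file
write_plain_text([[1, 0.5, None], [2, None, 3.25]], buffer)
exported = buffer.getvalue()
```

Because there is no header row, the variable names have to be listed again in the MPlus input file, in the same order as the columns.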

Writing LCA Code in MPlus To estimate models in MPlus, you need to write an input (.inp) file. You need to include specific fields in the code (e.g. TITLE, DATA, VARIABLE). Use “!” to write comments in code (like “*” in Stata). Each line of code cannot be longer than 80 characters; the MPlus window shows the character count for the selected line at the bottom (e.g. Col 69).
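The 80-character rule is easy to check mechanically before running a model. The following Python sketch is a hypothetical helper, not an MPlus feature. It reports every line of an input file’s text that is longer than the limit.

```python
def lines_over_limit(text, limit=80):
    """Return (line_number, length) for every line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


# A made-up input fragment whose second line is too long.
long_names = " ".join(f"v{i}" for i in range(1, 31))
example = "TITLE: two-class LCA\nNAMES ARE " + long_names + ";\n"
problems = lines_over_limit(example)
```

Any line reported here would need to be broken across two or more lines in the input file.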

Writing LCA Code in MPlus First, specify TITLE and DATA fields

Writing LCA Code in MPlus Under VARIABLE field, include all variables in dataset

Writing LCA Code in MPlus Indicate the MISSING ARE field (for Stata, this will be “.”). USEVAR ARE lists the subset of variables to include in the LCA.

Writing LCA Code in MPlus The CLASSES field indicates the number of classes to estimate

Writing LCA Code in MPlus ANALYSIS specifies the model you will run: TYPE = missing mixture. STARTS indicates the number of model iterations, each with different starting values.

Writing LCA Code in MPlus Next, specify the MODEL – four important parts

Writing LCA Code in MPlus For MODEL, write “%OVERALL%”

Writing LCA Code in MPlus The “%c#2%” part indicates the class solution. You will always have at least the c#2 block of code. For a three-class solution, you would repeat the block under a “%c#3%” header, and so on.

Writing LCA Code in MPlus Next, include all of your LCA-selected variables in two sections. The first section is enclosed in brackets [ ].

Writing LCA Code in MPlus The second section has the same variables but without brackets

Writing LCA Code in MPlus After MODEL, specify the OUTPUT field. For our purposes, use “sampstat” and “standardized”.

Writing LCA Code for MPlus Finally, write the SAVEDATA field to kick out the LCA results. It indicates the name of the file to create (this will always be a .txt file) and what information to save out (we want “CPROBABILITIES”).

Writing LCA Code for MPlus Some additional notes: You need to include “;” to denote the end of different sections of code. Read through the example input file to see all of the necessary locations.

Writing LCA Code for MPlus Some additional notes: Remember that you need to keep each line of code within 80 characters; otherwise, MPlus will cut off any code at 81 characters or beyond. Remember that you can use “!” to include comments in your code. All files referenced in your input file will be .txt files: the data you’re calling in (e.g. “mplus_df2_hmda_test.txt”) and the output that you save out (e.g. “df2_c2.txt”).
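The example input file from the talk is not reproduced in this transcript, so the Python sketch below only guesses at its general shape from the fields named on the slides (TITLE, DATA, VARIABLE, ANALYSIS, MODEL, OUTPUT, SAVEDATA). Several details are assumptions: the variable names, the second STARTS number, the output file name “df2_c3.txt”, and the exact layout. The generated text has not been verified against MPlus. The sketch shows two mechanical points: wrapping long variable lists to 80 characters, and adding one class block per class from c#2 upward.

```python
import textwrap


def wrap(statement, width=80):
    """Wrap one statement so that no line is longer than `width` characters."""
    return textwrap.fill(statement, width=width, initial_indent="  ",
                         subsequent_indent="    ")


def build_lca_input(title, data_file, all_vars, use_vars, n_classes,
                    out_file, starts=(100, 20)):
    """Return the text of a hypothetical MPlus input file for an LCA."""
    names = " ".join(all_vars)
    used = " ".join(use_vars)
    lines = [
        f"TITLE: {title}",
        f"DATA: FILE IS {data_file};",
        "VARIABLE:",
        wrap(f"NAMES ARE {names};"),       # every variable in the data file
        "  MISSING ARE .;",                # Stata's missing-value symbol
        wrap(f"USEVAR ARE {used};"),       # subset used in the LCA
        f"  CLASSES = c({n_classes});",
        "ANALYSIS:",
        "  TYPE = MISSING MIXTURE;",
        f"  STARTS = {starts[0]} {starts[1]};",  # second number is an assumption
        "MODEL:",
        "  %OVERALL%",
    ]
    # One block per class from c#2 upward, as the slides describe.
    for k in range(2, n_classes + 1):
        lines.append(f"  %c#{k}%")
        lines.append(wrap(f"[{used}];"))   # first section: in brackets
        lines.append(wrap(f"{used};"))     # second section: no brackets
    lines += [
        "OUTPUT: sampstat standardized;",
        f"SAVEDATA: FILE IS {out_file};",
        "  SAVE = CPROBABILITIES;",
    ]
    return "\n".join(lines) + "\n"


variables = [f"v{i}" for i in range(1, 41)]  # placeholder variable names
inp_text = build_lca_input(
    title="Three-class LCA (sketch)",
    data_file="mplus_df2_hmda_test.txt",
    all_vars=variables,
    use_vars=variables[:25],
    n_classes=3,
    out_file="df2_c3.txt",
)
```

Calling the function with n_classes=2 produces only the c#2 block. Each additional class adds another block and nothing else changes apart from the CLASSES line and the output file name.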

Running the Latent Class Analysis Click on the “Run” icon to begin estimation

Running the Latent Class Analysis MPlus will estimate your model according to the number of iterations specified in your input file (e.g. 100 iterations), each with different starting values for estimation

Running the Latent Class Analysis How long will estimation take? It depends on a few factors: the size of your dataset, the number of included variables, the number of specified classes, the number of specified model iterations, and your computer’s processing speed and memory. I’ve had models take 30 seconds and models that run for 60+ hours.

I Ran My Model…Now What? After completing your LCA estimation, scroll through and review the output file (.out) that MPlus generates. Some things to look for: the message “MODEL ESTIMATION TERMINATED NORMALLY”.

Post-Estimation Review Some things to look for: the Bayesian Information Criterion (BIC). BIC is used to compare different models (e.g. a two-class versus a three-class solution) and see which provides a better fit for your data. A lower BIC value indicates a better fit, so keep testing models until the BIC stops declining and begins to increase again.

Post-Estimation Review Some things to look for: The number/percent of cases that fall into each estimated class

Post-Estimation Review Some things to look for: the entropy score for your model. “Entropy” values range between zero and one, where a value of one means that each class is perfectly unique from the others, so you want this value to be as close to one as possible. I don’t know if there is an accepted threshold or cutoff for entropy levels that are “too low.” I also don’t know whether entropy is reported in published research as an indicator of model fit or quality.
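For intuition about why the score runs from zero to one, here is a Python sketch of the usual relative-entropy formula for mixture models: E = 1 - [sum over cases and classes of (-p ln p)] / (n ln K). Treat it as an assumption that this matches the exact number MPlus prints. The function name and the toy probabilities are hypothetical.

```python
import math


def relative_entropy(posteriors):
    """Relative entropy from a list of per-case class probabilities."""
    n_cases = len(posteriors)
    n_classes = len(posteriors[0])
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:  # treat 0 * ln(0) as 0
                uncertainty -= p * math.log(p)
    return 1.0 - uncertainty / (n_cases * math.log(n_classes))


crisp = relative_entropy([[1.0, 0.0], [0.0, 1.0]])  # every case clearly assigned
fuzzy = relative_entropy([[0.5, 0.5], [0.5, 0.5]])  # every case a coin flip
```

Cases whose predicted probabilities sit near 0 or 1 push the score toward one. Cases split evenly across classes push it toward zero.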

Post-Estimation Review Some things to look for: The end of the output file shows how long your model took to estimate

Post-Estimation Review After estimating your model, try estimating a new model with one additional class. For this example, I ran a two-class solution, so next I would specify three classes. This way, I can find the optimal class solution for my data (by comparing BIC values between models).

Running the Next LCA Model Open the two-class input file, use “Save As” to save a new three-class input file, and make just a few edits: Change the number of classes from (2) to (3)

Running the Next LCA Model Copy and paste the variable list in the MODEL field and then change the header to “%c#3%” for a three-class model Note: you still need to keep the “%c#2%” section from before, so now you will have a c#2 section followed by a c#3 section in your input file.

Running the Next LCA Model Change the name of the .txt data file that MPlus will kick out

Running the Next LCA Model After running a three-class solution, review the output file created by MPlus; if the BIC decreased, create a new input file for a four-class solution and estimate this new model. Repeat these steps until your BIC value stops decreasing and instead begins to increase; the model with the lowest BIC is your optimal solution!
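The stopping rule described here can be written down in a few lines. This Python sketch is a hypothetical helper with made-up BIC values (not the values from the talk). It walks up from the smallest class count and stops as soon as the BIC stops decreasing.

```python
def best_class_solution(bic_by_classes):
    """Return (classes, bic) for the last model before the BIC rises."""
    best_k = None
    best_bic = float("inf")
    for k in sorted(bic_by_classes):
        bic = bic_by_classes[k]
        if bic >= best_bic:  # BIC stopped decreasing: stop searching
            break
        best_k, best_bic = k, bic
    return best_k, best_bic


# Made-up BIC values for two- through five-class models.
bics = {2: 1520.4, 3: 1498.7, 4: 1491.2, 5: 1495.9}
choice = best_class_solution(bics)
```

With these illustrative numbers the four-class model is chosen, because the five-class BIC is higher than the four-class BIC.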

Running the Next LCA Model [Table comparing the C2, C3, C4, and C5 models: the percent of cases in each class, the BIC, and the entropy for each model; the values did not survive in this transcript.]

Importing MPlus Output into Stata After identifying your optimal model, read your .txt LCA output back into Stata and merge it into your original dataset. My preference: use the “stcmd” commands, which call StatTransfer from within Stata and easily convert .txt to .dta format. For example: “inputst df2_c5.txt” followed by “outputst df2_c5.dta /y”. Alternatively, you could open the .txt file in Excel, save it as a .csv file, and use the “import delimited” command in Stata.

…What Just Happened? MPlus uses asterisks (“*”) to denote missing data in its output file, whereas Stata uses periods (“.”). These asterisks cause a number of issues in your dataset: they turn numeric variables into strings, cause data to “shift” columns to the left, and pull your predicted probabilities and class ID out of their proper columns.
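To see why the asterisks matter, consider reading one line of the saved output by hand. The Python sketch below is not the author’s fix and makes two assumptions about the file: values are separated by whitespace, and each line holds the analysis variables, then one predicted probability per class, then the class ID. The function name and the sample line are hypothetical. Each “*” is kept as an explicit missing value, so the later columns stay in place.

```python
def parse_cprob_line(line, n_classes, missing="*"):
    """Split one output line into (variables, class probabilities, class ID)."""
    tokens = line.split()
    values = [None if token == missing else float(token) for token in tokens]
    class_id = int(values[-1])               # last column: assigned class
    probabilities = values[-(n_classes + 1):-1]
    variables = values[:-(n_classes + 1)]
    return variables, probabilities, class_id


# Made-up line from a two-class model with one missing analysis value.
parsed = parse_cprob_line("1.500 * 0.250 0.913 0.087 1.000", n_classes=2)
```

If the asterisk were simply dropped instead of being kept as a placeholder, every later value would slide one column to the left, which is the shifting problem described on this slide.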

The Solution? Shift the Data Back I wrote an .ado file for Stata that will automatically reverse the data shifting problem with MPlus LCA output. I gave this .ado file an imaginative title: “mpluslcafix”.

Fixing Your MPlus Output With your MPlus output loaded into Stata, enter the following commands: adopath + “ ” mpluslcafix

Fixing Your MPlus Output

The “mpluslcafix” command will save out a new .dta file with your corrected LCA data. Merge this new file back into your original dataset: “use, clear” followed by “merge 1:1 _n using”. For example: “use df2_hmda_lca_test_tomerge, clear” followed by “merge 1:1 _n using df2_c5”.
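A merge on “_n” pairs cases purely by their position in each file, so it is only safe when both files have the same cases in the same order. The Python sketch below illustrates that idea with hypothetical names and made-up values. It refuses to merge when the row counts differ.

```python
def merge_by_row(original_rows, lca_rows):
    """Pair the i-th original case with the i-th row of LCA results."""
    if len(original_rows) != len(lca_rows):
        raise ValueError("row counts differ; a row-order merge would misalign cases")
    return [{**orig, **lca} for orig, lca in zip(original_rows, lca_rows)]


original = [{"tract": "A"}, {"tract": "B"}]
lca = [
    {"cprob1": 0.913, "cprob2": 0.087, "class": 1},
    {"cprob1": 0.120, "cprob2": 0.880, "class": 2},
]
merged = merge_by_row(original, lca)
```

This is why the prepared dataset should be saved in the same order and with the same cases as the file exported for MPlus.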

Fixing Your MPlus Output Now, you can use your latent class analysis results in statistical models! Hooray!

Thanks for Listening! Please feel free to contact me with questions, comments, or issues. Also, please help me to “stress test” the .ado file! Try it on different types of data. Try to break it. Let me know if you find glitches so that I can fix them.