Multiple Imputation in Finite Mixture Modeling Daniel Lee Presentation for MMM conference May 24, 2016 University of Connecticut 1.

Presentation transcript:

1 Multiple Imputation in Finite Mixture Modeling
Daniel Lee
Presentation for MMM conference, May 24, 2016
University of Connecticut

2 Introduction: Finite Mixture Models
- A class of statistical models that treat group membership as a latent categorical variable.
- These analyses estimate parameters for a hypothesized number of groups, or classes, from a single data set (McLachlan & Peel, 2000).
- This usually involves:
  - investigating population heterogeneity in model parameters
  - finding the possible number of latent groups
  - classifying cases into these groups
  - examining the extent to which auxiliary information can be used to evaluate classes
- Any statistical method that can be formulated as a multiple-group problem can be formulated as a finite mixture model.
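For reference, the general form of a finite mixture density can be written as follows. The slide does not give this formula; the notation (K classes, mixing proportions \pi_k, class-specific densities f_k with parameters \theta_k, latent class indicator c_i) is an assumption chosen for illustration.

```latex
% Mixture density for case i with K latent classes
f(y_i) = \sum_{k=1}^{K} \pi_k \, f_k(y_i \mid \theta_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

% Posterior probability that case i belongs to class k (used to classify cases)
P(c_i = k \mid y_i) =
\frac{\pi_k \, f_k(y_i \mid \theta_k)}{\sum_{j=1}^{K} \pi_j \, f_j(y_i \mid \theta_j)}
```

When class membership c_i is observed, the same component densities describe an ordinary multiple-group model, which is the link the last bullet refers to.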

3 Introduction: Finite Mixture Models example (factor mixture models)

4 Introduction: Missing data in finite mixtures
- Missing data handling methods in finite mixture models (Sterba, 2014)
  - The way missingness is handled can interfere with discriminating between latent class and latent continuous models.
  - MVN MI, FIML-EM, and newer MI approaches were considered.
- MI strategies for multiple-group SEMs (Enders & Gottschall, 2011)
  - Explored 2 MI methods with multiple groups: separate-group imputation (SGI) and product-term imputation (PTI)
  - Cautionary note on latent categorical variables (mixture models)

5 Introduction: Missing Data
- Missing data in practice
  - Listwise/pairwise deletion
  - Full information maximum likelihood
  - Multiple imputation (MI; Rubin, 1976)
- Multiple imputation
  - Imputation phase: generate m different data sets, each with slightly different estimates for the missing values.
  - Analysis phase: run the analysis on each of the m data sets and average the parameter estimates across the m results (Rubin (1987) provides a special rule for standard errors).
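The analysis phase can be sketched in a few lines. This is a minimal illustration of Rubin's pooling rules for a single parameter, not the Mplus implementation; the function name `pool_rubin` and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed data sets using Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                      # pooled point estimate
    within = mean(se ** 2 for se in std_errors)  # average sampling variance
    between = variance(estimates)                # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between       # total variance
    return q_bar, sqrt(total)


# Hypothetical estimates and standard errors from m = 3 imputed data sets
pooled_est, pooled_se = pool_rubin([0.50, 0.54, 0.46], [0.10, 0.10, 0.10])
```

The pooled standard error is larger than any single-data-set standard error because the between-imputation variance carries the extra uncertainty due to the missing values.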

6 Introduction: Research Questions
- When groups are unknown (mixture models), how will MI perform?
- In a recent discussion with Craig Enders: “The gist is that standard MI routines will not work for mixtures because they will generate imputations from a single-class model. In effect, MI leaves out the most important variable in the analysis, the latent classes, thereby biasing the resulting estimates toward a single, common class...”
- In MI the group structure should be accounted for; otherwise the imputations will be poor, because they are generated from the entire data set.
- Label switching problem (Tueller, Drotar, & Lubke, 2011)
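The concern in the quote can be illustrated with a toy example. The sketch below is not the study's procedure: it uses one variable, two groups with assumed means 0 and 3, MCAR deletion, and a single stochastic normal imputation that ignores parameter uncertainty. All names and values are hypothetical. It compares imputing from the pooled data (a single-class model) with imputing within each group, which is only possible when group membership is known.

```python
import random
from statistics import mean, stdev


def impute(values, donor_mean, donor_sd, rng):
    """Replace None entries with draws from Normal(donor_mean, donor_sd)."""
    return [v if v is not None else rng.gauss(donor_mean, donor_sd) for v in values]


rng = random.Random(2016)
n_per_group, miss_rate = 2000, 0.25
true_means = {1: 0.0, 2: 3.0}  # assumed group means, for illustration only

# Generate complete data per group, then delete values completely at random
data = {}
for group, mu in true_means.items():
    complete = [rng.gauss(mu, 1.0) for _ in range(n_per_group)]
    data[group] = [None if rng.random() < miss_rate else y for y in complete]

observed = {g: [y for y in ys if y is not None] for g, ys in data.items()}
pooled_obs = observed[1] + observed[2]

# Single-class imputation: every missing value is drawn from the pooled distribution
pooled = {g: impute(ys, mean(pooled_obs), stdev(pooled_obs), rng) for g, ys in data.items()}
# Group-specific imputation: missing values are drawn from the case's own group
separate = {g: impute(ys, mean(observed[g]), stdev(observed[g]), rng) for g, ys in data.items()}

diff_pooled = mean(pooled[2]) - mean(pooled[1])
diff_separate = mean(separate[2]) - mean(separate[1])
```

In this setup the pooled imputations pull both group means toward the common mean, so `diff_pooled` is close to 3 × (1 − 0.25) = 2.25, while `diff_separate` stays close to the generating difference of 3.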

7 Methods: Simulation
- Manipulated 3 variables (12 conditions in total):
  - Sample size: 50 and 250
  - MCAR missing rates: 5%, 15%, 25% (even benign missing values can cause bias)
  - Mahalanobis distances: low ( 4)
- Generated 100 multivariate normal complete data sets from a 2-group CFA model with 6 indicator variables.
- Each data set contained data for two groups with distinct population parameters, including the true group variable (e.g., n = 250 was split into two groups of 125, each with different population values).
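The design grid and the MCAR deletion step can be sketched as follows. The two Mahalanobis distance levels are inferred from the stated total of 12 conditions, so the labels "low" and "high" are assumptions, as are the function name `delete_mcar` and the choice to delete values on all 6 indicators.

```python
import random
from itertools import product

sample_sizes = [50, 250]
missing_rates = [0.05, 0.15, 0.25]
distances = ["low", "high"]  # assumed labels for the two Mahalanobis distance levels

# Fully crossed design: 2 x 3 x 2 = 12 conditions
conditions = list(product(sample_sizes, missing_rates, distances))


def delete_mcar(rows, rate, rng):
    """Set each value to None independently with probability `rate` (MCAR)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


# Example: one data set with n = 250 and 6 indicators, 15% MCAR missingness
rng = random.Random(1)
rows = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(250)]
masked = delete_mcar(rows, 0.15, rng)
observed_rate = sum(v is None for row in masked for v in row) / (250 * 6)
```

Because deletion is independent of the data values, the realized missing rate varies around the nominal rate from one data set to the next.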

8 Methods: Data Generating Model
[Figure: data-generating models for Group 1 and Group 2]

9 Methods: Data analysis
- Analysis 1: Used MI with 10 imputations when groups were known (normal CFA model), using the SGI procedure. Used built-in Mplus imputation (MI in Mplus; Asparouhov & Muthén, 2010) and MG-CFA analysis.
  - WHAT KIND OF IMPUTATION MODEL IS USED HERE?
- Analysis 2: Used MI with 10 imputations when groups were unknown (factor mixture model). Used Mplus for imputation and FMM analysis.
  - Starting values: true parameters
- Estimates from analysis 1 and analysis 2 were compared against the true population parameters, and standardized bias was estimated. Standardized bias estimates greater than 0.40 were considered significant (Collins, Schafer, & Kam, 2001).

10 Label switching (Tueller, Drotar, & Lubke, 2011)
- Common issue in LVMM simulations
- Simple example:
  - TRUE generating values for factor variances: class 1 = 2 and class 2 = 4.
  - Rep. 1 LVMM estimates show: class 1 = 3.9 and class 2 = 2.1 (switched)
  - Rep. 2 LVMM estimates show: class 1 = 1.9 and class 2 = 4.1 (OK)
  - Rep. 3 LVMM estimates show: class 1 = 2 and class 2 = 3.7 (OK)
- Problem: aggregating parameter estimates over potentially mislabeled classes
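One simple way to detect and undo switched labels before aggregating is to reorder the estimated classes so they lie as close as possible to the generating values. The sketch below is a parameter-distance heuristic written for illustration; it is not the algorithm proposed by Tueller, Drotar, and Lubke (2011), and the function name `align_labels` is hypothetical.

```python
from itertools import permutations


def align_labels(estimates, truth):
    """Reorder class estimates to best match the true values.

    Returns the aligned estimates and a flag that is True when the
    best-matching order differs from the order the software reported.
    """
    k = len(truth)
    # Try every relabeling and keep the one with the smallest squared distance
    best = min(
        permutations(range(k)),
        key=lambda p: sum((estimates[p[i]] - truth[i]) ** 2 for i in range(k)),
    )
    aligned = [estimates[i] for i in best]
    return aligned, best != tuple(range(k))


truth = [2.0, 4.0]  # generating factor variances for class 1 and class 2
rep1_aligned, rep1_switched = align_labels([3.9, 2.1], truth)
rep2_aligned, rep2_switched = align_labels([1.9, 4.1], truth)
```

Checking every permutation is feasible only for a small number of classes, and a distance rule like this can fail when the classes are poorly separated, which is when label switching is hardest to diagnose.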

11 Methods: Evaluation criteria
- Bias
  - PUT THE FORMULA HERE
  - used as cut-off (Hoogland & Boomsma, 1998)
- RMSE
  - PUT THE FORMULA HERE
  - Expected squared loss around the true parameter
- Standard error ratio (e.g., Lee, Poon, & Bentler, 1995)
  - SE(theta_hat(m)) / SD(theta_hat(m))
  - values < 1: inflated Type I error
  - values > 1: inflated Type II error
  - non-converged replications omitted
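The slide leaves the bias and RMSE formulas unspecified, so the sketch below uses conventional definitions as an assumption: relative bias as (mean estimate − true value) / true value, RMSE as the square root of the mean squared deviation from the true value, and the standard error ratio as the mean estimated standard error divided by the empirical standard deviation of the estimates across replications. The function name `evaluate` and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def evaluate(estimates, std_errors, truth):
    """Summarize one parameter across converged replications."""
    # Relative bias: average estimation error as a proportion of the true value
    rel_bias = (mean(estimates) - truth) / truth
    # RMSE: square root of the average squared error around the true value
    rmse = sqrt(mean((e - truth) ** 2 for e in estimates))
    # SE ratio: average estimated SE over the empirical SD of the estimates
    se_ratio = mean(std_errors) / stdev(estimates)
    return rel_bias, rmse, se_ratio


# Hypothetical estimates and standard errors from four converged replications
rel_bias, rmse, se_ratio = evaluate(
    estimates=[1.8, 2.0, 2.2, 2.4],
    std_errors=[0.2, 0.2, 0.3, 0.3],
    truth=2.0,
)
```

A ratio below 1 means the estimated standard errors understate the actual sampling variability, which is why the slide associates it with inflated Type I error.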

12 Results: Bias

13 Results: Bias

14 Label switching check (Tueller, Drotar, & Lubke, 2011)

15 Results: RMSE

16 Results: Standard Error Ratio

17 Discussion and Recommendations (and issues)
- MI not recommended for finite mixture models
- Other solutions?
  - Different sample sizes?
  - Larger differences in parameters?
- Label switching?
  - Does it happen at the imputation level or the analysis level?