Missing Data in Randomized Control Trials


Missing Data in Randomized Control Trials John W. Graham The Prevention Research Center and Department of Biobehavioral Health Penn State University IES/NCER Summer Research Training Institute, August 2, 2010 jgraham@psu.edu

Sessions in Three Parts
  (1) Introduction: Missing Data Theory
  (2) Attrition: Bias and Lost Power
  After the break ...
  (3) Hands-on with Multiple Imputation
      Multiple Imputation with NORM
      SPSS Automation Utility (New!)
      SPSS Regression
      HLM Automation Utility (New!)
      2-Level Regression with HLM 6

Recent Papers
  Graham, J. W. (2009). Missing data analysis: making it work in the real world. Annual Review of Psychology, 60, 549-576.
  Graham, J. W., Cumsille, P. E., & Elek-Fisk, E. (2003). Methods for handling missing data. In J. A. Schinka & W. F. Velicer (Eds.), Research Methods in Psychology (pp. 87-114). Volume 2 of Handbook of Psychology (I. B. Weiner, Editor-in-Chief). New York: John Wiley & Sons.
  Graham, J. W. (2010, forthcoming). Missing Data: Analysis and Design. New York: Springer.
      Chapter 4: Multiple Imputation with Norm 2.03
      Chapter 6: Multiple Imputation and Analysis with SPSS 17/18
      Chapter 7: Multiple Imputation and Analysis with Multilevel (Cluster) Data

Recent Papers
  Collins, L. M., Schafer, J. L., & Kam, C. M. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6, 330-351.
  Schafer, J. L., & Graham, J. W. (2002). Missing data: our view of the state of the art. Psychological Methods, 7, 147-177.
  Graham, J. W., Taylor, B. J., Olchowski, A. E., & Cumsille, P. E. (2006). Planned missing data designs in psychological research. Psychological Methods, 11, 323-343.

Part 1: A Brief Introduction to Analysis with Missing Data

Problem with Missing Data Analysis procedures were designed for complete data . . .

Solution 1 Design new model-based procedures Missing Data + Parameter Estimation in One Step Full Information Maximum Likelihood (FIML) SEM and Other Latent Variable Programs (Amos, LISREL, Mplus, Mx, LTA)

Solution 2: Data-based procedures, e.g., Multiple Imputation (MI)
  Two Steps
  Step 1: Deal with the missing data (e.g., replace missing values with plausible values); produce a product
  Step 2: Analyze the product as if there were no missing data

FAQ Aren't you somehow helping yourself with imputation? . . .

NO. Missing data imputation . . . does NOT give you something for nothing DOES let you make use of all data you have . . .

FAQ Is the imputed value what the person would have given?

NO. When we impute a value . . We do not impute for the sake of the value itself We impute to preserve important characteristics of the whole data set . . .

We want . . . unbiased parameter estimation e.g., b-weights Good estimate of variability e.g., standard errors best statistical power

Causes of Missingness
  Ignorable
    MCAR: Missing Completely At Random
    MAR: Missing At Random
  Non-Ignorable
    MNAR: Missing Not At Random

MCAR (Missing Completely At Random) MCAR 1: Cause of missingness completely random process (like coin flip) MCAR 2: (essentially MCAR) Cause uncorrelated with variables of interest Example: parents move No bias if cause omitted

MAR (Missing At Random) Missingness may be related to measured variables But no residual relationship with unmeasured variables Example: reading speed No bias if you control for measured variables

MNAR (Missing Not At Random) Even after controlling for measured variables ... Residual relationship with unmeasured variables Example: drug use reason for absence
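
The three mechanisms can be illustrated with a small simulation. This is a sketch under assumed values and is not taken from the slides: the variable names, the logistic missingness model, and all coefficients are hypothetical. Here x is always observed, y sometimes goes missing, and the true mean of y is 0. The "adjusted" estimate controls for x by fitting a regression on the observed cases and averaging its predictions over everyone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# x is always observed; y is the variable that may go missing (true mean 0).
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=np.sqrt(0.75), size=n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Probability that y is missing under each (hypothetical) mechanism.
p_missing = {
    "MCAR": np.full(n, 0.3),           # unrelated to x or y
    "MAR": sigmoid(-1.0 + 1.5 * x),    # depends only on the measured x
    "MNAR": sigmoid(-1.0 + 1.5 * y),   # depends on the unseen y itself
}

results = {}
for name, p in p_missing.items():
    observed = rng.random(n) >= p
    # Complete-case estimate: ignore x entirely.
    complete_case = y[observed].mean()
    # Adjusted estimate: regress y on x among observed cases, predict for all.
    slope, intercept = np.polyfit(x[observed], y[observed], 1)
    adjusted = (intercept + slope * x).mean()
    results[name] = (complete_case, adjusted)
    print(f"{name}: complete-case {complete_case:+.3f}, adjusted {adjusted:+.3f}")
```

In this setup the complete-case mean should be close to 0 only under MCAR, the adjusted mean should recover 0 under MAR, and neither estimate should recover 0 under MNAR.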

MNAR Causes The recommended methods assume missingness is MAR But what if the cause of missingness is not MAR? Should these methods be used when MAR assumptions not met? . . .

YES! These Methods Work!
  Suggested methods work better than “old” methods
  Multiple causes of missingness
  Only small part of missingness may be MNAR
  Suggested methods usually work very well

Methods: "Old" vs MAR vs MNAR
  MAR methods (MI and ML) are ALWAYS at least as good as, and usually better than, "old" methods (e.g., listwise deletion)
  Methods designed to handle MNAR missingness are NOT always better than MAR methods

Analysis: Old and New

Old Procedures: Analyze Complete Cases (listwise deletion)
  may produce bias
  you always lose some power (because you are throwing away data)
  reasonable if you lose only 5% of cases
  often lose substantial power

Analyze Complete Cases (listwise deletion)
  Missingness pattern (1 = observed, 0 = missing; rows are cases, columns are variables):
    1 1 1 1
    0 1 1 1
    1 0 1 1
    1 1 0 1
    1 1 1 0
  very common situation
  only 20% (4 of 20) data points missing
  but discard 80% of the cases
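
The arithmetic on this slide can be checked directly. A minimal sketch, assuming the 20 indicator values form five cases by four variables (an inferred layout, consistent with "4 of 20" data points and "80% of the cases"):

```python
import numpy as np

# 1 = observed, 0 = missing; rows are cases, columns are variables.
pattern = np.array([
    [1, 1, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
])

# Share of individual data points that are missing.
share_points_missing = 1 - pattern.mean()

# Listwise deletion keeps only cases with every variable observed.
complete_cases = pattern.all(axis=1)
share_cases_discarded = 1 - complete_cases.mean()

print(f"data points missing: {share_points_missing:.0%}")   # 20%
print(f"cases discarded:     {share_cases_discarded:.0%}")  # 80%
```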

Other "Old" Procedures
  Pairwise deletion: may be of occasional use for preliminary analyses
  Mean substitution: never use it
  Regression-based single imputation: generally not recommended ... except ...

Recommended Model-Based Procedures Multiple Group SEM (Structural Equation Modeling) Latent Transition Analysis (Collins et al.) A latent class procedure

Recommended Model-Based Procedures
  Raw Data Maximum Likelihood SEM, aka Full Information Maximum Likelihood (FIML)
    Amos (James Arbuckle)
    LISREL 8.5+ (Jöreskog & Sörbom)
    Mplus (Bengt Muthén)
    Mx (Michael Neale)

Amos, Mx, Mplus, LISREL 8.8 Structural Equation Modeling (SEM) Programs In Single Analysis ... Good Estimation Reasonable standard errors Windows Graphical Interface

Limitation with Model-Based Procedures That particular model must be what you want

Recommended Data-Based Procedures
  EM Algorithm (ML parameter estimation)
    Norm-Cat-Mix, EMcov, SAS, SPSS
  Multiple Imputation
    NORM, Cat, Mix, Pan (Joe Schafer)
    SAS Proc MI
    SPSS 17/18 (not quite yet)
    LISREL 8.5+
    Amos

EM Algorithm: Expectation - Maximization
  Alternate between
    E-step: predict missing data
    M-step: estimate parameters
  Excellent (ML) parameter estimates
  But no standard errors: must use bootstrap or multiple imputation
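
The E-step / M-step alternation can be sketched for the simplest case: two normally distributed variables, with x fully observed and some y values missing. This is an illustrative sketch, not the algorithm as implemented in NORM or any other package named here; the function name `em_bivariate` and the simulated data are assumptions. Note that the E-step fills in expected sufficient statistics (including the residual variance term for y squared), not just predicted values.

```python
import numpy as np

def em_bivariate(x, y, n_iter=200):
    """EM estimates of the mean and covariance of (x, y) when some y are NaN."""
    miss = np.isnan(y)
    # Start from complete-case estimates.
    mu = np.array([x.mean(), y[~miss].mean()])
    cov = np.cov(x[~miss], y[~miss], bias=True)
    for _ in range(n_iter):
        # E-step: expected y and y^2 for the missing cases, given x.
        slope = cov[0, 1] / cov[0, 0]
        resid_var = cov[1, 1] - slope * cov[0, 1]
        ey = np.where(miss, mu[1] + slope * (x - mu[0]), y)
        ey2 = np.where(miss, ey**2 + resid_var, y**2)
        # M-step: ML estimates from the expected sufficient statistics.
        mu = np.array([x.mean(), ey.mean()])
        sxx = np.mean(x**2) - mu[0] ** 2
        sxy = np.mean(x * ey) - mu[0] * mu[1]
        syy = np.mean(ey2) - mu[1] ** 2
        cov = np.array([[sxx, sxy], [sxy, syy]])
    return mu, cov

# Hypothetical data: true mean of y is 1.0, var(y) is 1.0, cov(x, y) is 0.6.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = 1.0 + 0.6 * x + rng.normal(scale=0.8, size=n)

# MAR missingness: y is more likely to be missing when x is large.
y_obs = y.copy()
y_obs[rng.random(n) < 1 / (1 + np.exp(-2 * x))] = np.nan

mu_hat, cov_hat = em_bivariate(x, y_obs)
cc_mean = np.nanmean(y_obs)
print("complete-case mean of y:", round(cc_mean, 3))
print("EM mean of y:          ", round(mu_hat[1], 3))
```

As the slide notes, this yields parameter estimates only; it produces no standard errors.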

Multiple Imputation
  Problem with Single Imputation: Too Little Variability
    Because of Error Variance
    Because covariance matrix is only one estimate

Too Little Error Variance Imputed value lies on regression line

Imputed Values on Regression Line

Restore Error . . . Add random normal residual
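
The loss of variance from imputing on the regression line, and its repair by adding a random normal residual, can be shown numerically. A minimal sketch with simulated data; the sample size, coefficients, and 50% MCAR missingness rate are assumptions chosen for illustration. The true variance of y is 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=np.sqrt(0.75), size=n)  # var(y) = 1

# Mark half of the y values as missing (MCAR); the true values are kept
# in y only so the code stays short, and are never used where miss is True.
miss = rng.random(n) < 0.5

# Fit the regression of y on x using the observed cases only.
slope, intercept = np.polyfit(x[~miss], y[~miss], 1)
pred = intercept + slope * x
resid_sd = np.std(y[~miss] - pred[~miss], ddof=2)

# Single imputation on the regression line: no error variance.
y_line = np.where(miss, pred, y)

# Same prediction plus a random normal residual: error variance restored.
y_noise = np.where(miss, pred + rng.normal(scale=resid_sd, size=n), y)

var_line = y_line.var()
var_noise = y_noise.var()
print("variance, imputed on the line: ", round(var_line, 3))
print("variance, with random residual:", round(var_noise, 3))
```

Adding the residual addresses only the first problem named above; the regression line itself is still a single estimate.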

Regression Line only One Estimate

Covariance Matrix (Regression Line) only One Estimate
  Obtain multiple plausible estimates of the covariance matrix
  ideally draw multiple covariance matrices from population
  Approximate this with
    Bootstrap
    Data Augmentation (Norm)
    MCMC (SAS)
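
The bootstrap option can be sketched as follows. For brevity this example resamples a complete simulated data set; with real missing data each bootstrap sample would still need an estimation step such as EM, which is omitted here. The function name `bootstrap_covariances` and the data are hypothetical.

```python
import numpy as np

def bootstrap_covariances(data, n_draws, rng):
    """Return n_draws covariance matrices, each from a resample of the rows."""
    n = data.shape[0]
    covs = []
    for _ in range(n_draws):
        idx = rng.integers(0, n, size=n)          # sample rows with replacement
        covs.append(np.cov(data[idx], rowvar=False))
    return np.array(covs)

rng = np.random.default_rng(2)
data = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=500)

covs = bootstrap_covariances(data, n_draws=20, rng=rng)
print("covariance estimates across draws:", np.round(covs[:, 0, 1], 3))
```

Each draw gives a slightly different regression line, which is the variability a single covariance estimate leaves out.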

Data Augmentation: stochastic version of EM
  EM
    E (expectation) step: predict missing data
    M (maximization) step: estimate parameters
  Data Augmentation
    I (imputation) step: simulate missing data
    P (posterior) step: simulate parameters

Data Augmentation
  Parameters from consecutive steps ... too related, i.e., not enough variability
  after 50 or 100 steps of DA ... covariance matrices are like random draws from the population
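
The I-step / P-step cycle, and the reason for spacing draws many steps apart, can be sketched in the simplest possible setting: a single normal variable with some values missing and a standard noninformative prior. This is an assumption-laden toy and not the multivariate algorithm used by NORM; the function names and data are hypothetical.

```python
import numpy as np

def data_augmentation(y, n_steps=2000, seed=0):
    """Toy data augmentation for one normal variable with NaN for missing."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(y)
    n = y.size
    mu, sigma2 = np.nanmean(y), np.nanvar(y)
    filled = y.copy()
    draws = []
    for _ in range(n_steps):
        # I-step: simulate the missing values from the current parameters.
        filled[miss] = rng.normal(mu, np.sqrt(sigma2), size=miss.sum())
        # P-step: simulate parameters from their posterior given the
        # completed data (prior proportional to 1 / sigma2).
        ybar = filled.mean()
        ss = np.sum((filled - ybar) ** 2)
        sigma2 = ss / rng.chisquare(n - 1)
        mu = rng.normal(ybar, np.sqrt(sigma2 / n))
        draws.append((mu, sigma2))
    return np.array(draws)

def autocorr(series, lag):
    """Correlation between the series and itself shifted by `lag` steps."""
    return np.corrcoef(series[:-lag], series[lag:])[0, 1]

rng = np.random.default_rng(3)
y = rng.normal(5.0, 2.0, size=200)
y[:60] = np.nan                      # 30% of the values missing

draws = data_augmentation(y)
mu_chain = draws[100:, 0]            # drop the first 100 steps
lag1 = autocorr(mu_chain, 1)
lag100 = autocorr(mu_chain, 100)
print("lag-1 autocorrelation of mu:  ", round(lag1, 3))
print("lag-100 autocorrelation of mu:", round(lag100, 3))
```

Consecutive parameter draws are visibly correlated, while draws 100 steps apart behave much more like independent draws, which is the motivation for imputing only from widely spaced steps.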

Multiple Imputation Allows:
  Unbiased Estimation
  Good standard errors, provided number of imputations (m) is large enough
  too few imputations → reduced power with small effect sizes
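
The slides do not show how results from the m imputed data sets are combined. The standard approach is Rubin's rules: average the m estimates, then add the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). A minimal sketch; the function name `pool_rubin` and the five estimates and standard errors are made-up illustrative numbers, not results from any study.

```python
import numpy as np

def pool_rubin(estimates, std_errors):
    """Combine m point estimates and standard errors using Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(std_errors, dtype=float) ** 2
    m = q.size
    q_bar = q.mean()                 # pooled point estimate
    within = u.mean()                # average sampling variance
    between = q.var(ddof=1)          # variance across imputations
    total = within + (1 + 1 / m) * between
    return q_bar, np.sqrt(total)

# Hypothetical b-weights and standard errors from m = 5 imputed data sets.
estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
std_errors = [0.10, 0.10, 0.10, 0.10, 0.10]

pooled_est, pooled_se = pool_rubin(estimates, std_errors)
print(f"pooled estimate {pooled_est:.3f}, pooled SE {pooled_se:.4f}")
```

The (1 + 1/m) factor shows one way a small m costs precision: the between-imputation variance is both inflated and itself estimated from only a few values.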

[Table not reproduced; surviving label: ρ] From Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8, 206-213.