Missing Data Mechanisms


Missing Data Mechanisms
- MCAR (missing completely at random)
- MAR (missing at random)
- MNAR (missing not at random)

References:
- Schafer, J.L., Graham, J.W. Missing data: our view of the state of the art. Psychological Methods, 7(2), 147-177, 2002.
- Raghunathan, T.E. What do we do with missing data? Some options for analysis of incomplete data. Ann. Rev. Public Health, 25, 99-117, 2004.

Graphical representation
- Y = variable partly missing
- X = variable completely observed
- Z = cause of missingness (unrelated to Y)
- R = missingness indicator

[Figure: three path diagrams relating X, Y, Z, and R, one each for MCAR, MAR, and MNAR]

Use of conditional probability
- Yc = the complete vector of Y observations, Yc = (Yo, Ym), where Yo is the observed part and Ym is the missing part
- MCAR: P(R | Yc) = P(R). The probability of missingness depends on neither Yo nor Ym.
- MAR: P(R | Yc) = P(R | Yo). The probability of missingness depends only on Yo.
- MNAR: P(R | Yc) depends on the unobserved Ym, so it cannot be reduced to P(R | Yo).

Methods for analyzing data with missing values in the repeated measures situation
- Case deletion (complete case analysis): delete subjects with missing components
- Available case analysis: base the analysis on all observed data, using subjects with incomplete Y vectors as well as those with complete Y vectors

Simulation Study

Parameter (true value)   MCAR           MAR            MNAR
Mean(Y): 125             125 (7.0)      143.3 (19.3)   155.5 (30.7)
Std(Y): 25               24.6 (5.3)     20.9 (5.8)     12.2 (13.2)
Rho: 0.6                 0.59 (0.2)     0.33 (0.37)    0.34 (0.36)
Beta Y|X: 0.6            0.61 (0.27)    0.60 (0.51)    0.21 (0.43)
Beta X|Y: 0.6            0.60 (0.25)    0.20 (0.44)    0.60 (0.52)

Design:
- Generate 50 observations from a bivariate normal (Y, X)
- MCAR: Y is missing with probability 0.73 (high!)
- MAR: Y is missing if X < 141
- MNAR: Y is missing if Y < 141
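The design on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the table. The slide does not give the distribution of X, so X is assumed to share the mean (125) and standard deviation (25) of Y. The function names are hypothetical. The statistics are computed on the cases where Y is observed.

```python
import numpy as np

def simulate(n=50, mean=125.0, sd=25.0, rho=0.6, cutoff=141.0,
             p_mcar=0.73, seed=0):
    """Draw (Y, X) and build one missingness mask for Y per mechanism."""
    rng = np.random.default_rng(seed)
    # Assumption: X has the same mean and SD as Y (not stated on the slide).
    cov = sd**2 * np.array([[1.0, rho], [rho, 1.0]])
    y, x = rng.multivariate_normal([mean, mean], cov, size=n).T
    masks = {
        "MCAR": rng.random(n) < p_mcar,  # unrelated to X and Y
        "MAR": x < cutoff,               # depends only on the observed X
        "MNAR": y < cutoff,              # depends on Y itself
    }
    return y, x, masks

def complete_case(y, x, miss):
    """Summary statistics using only the cases where Y is observed."""
    yo, xo = y[~miss], x[~miss]
    cov_yx = np.cov(yo, xo)[0, 1]
    return {
        "mean_y": yo.mean(),
        "sd_y": yo.std(ddof=1),
        "rho": np.corrcoef(yo, xo)[0, 1],
        "beta_y_on_x": cov_yx / xo.var(ddof=1),
        "beta_x_on_y": cov_yx / yo.var(ddof=1),
    }

def run_simulation(n=50, seed=0):
    """One replication: complete-case estimates under each mechanism."""
    y, x, masks = simulate(n=n, seed=seed)
    return {name: complete_case(y, x, miss) for name, miss in masks.items()}
```

Calling `run_simulation(n=20000)` instead of the slide's n = 50 makes the direction of the bias easier to see, because sampling noise no longer dominates a single replication.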

Methods for analyzing survey data
- Weight the responses that are present
- Average the available items (used in the social sciences with standardized scores, but not studied in any systematic fashion)

Single imputation
- MS: mean substitution
- HD: hot deck
- CM: conditional mean
- PD: predictive distribution
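The four single-imputation methods can be sketched as follows for the bivariate case, with X fully observed and Y partly missing. The slide only names the methods, so the details are assumptions: hot deck is taken to be a simple random draw from the observed Y values, and CM and PD both use a linear regression of Y on X fitted to the complete cases. The function name `impute_single` is hypothetical.

```python
import numpy as np

def impute_single(y, x, miss, method, rng):
    """Return a copy of y with the entries flagged by `miss` filled in."""
    y_imp = y.copy()
    obs = ~miss
    n_miss = int(miss.sum())
    if method == "MS":
        # Mean substitution: every missing Y gets the observed mean.
        y_imp[miss] = y[obs].mean()
    elif method == "HD":
        # Hot deck (assumed variant): draw donors at random from observed Y.
        y_imp[miss] = rng.choice(y[obs], size=n_miss, replace=True)
    elif method in ("CM", "PD"):
        # Regression of Y on X fitted to the complete cases.
        slope, intercept = np.polyfit(x[obs], y[obs], 1)
        pred = intercept + slope * x[miss]
        if method == "PD":
            # Predictive distribution: add a random residual to each
            # prediction (parameter uncertainty is ignored in this sketch).
            resid = y[obs] - (intercept + slope * x[obs])
            pred = pred + rng.normal(0.0, resid.std(ddof=2), size=n_miss)
        y_imp[miss] = pred
    else:
        raise ValueError(f"unknown method: {method}")
    return y_imp
```

Mean substitution and conditional-mean imputation put the imputed values at the mean or on the regression line, which is why they tend to understate the spread of Y. The two methods that draw values (HD and PD) keep more of it.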

Average parameter estimate (RMSE): MCAR

Parameter     MS             HD             CM             PD
µ = 125.0     125.1 (7.18)   125.2 (7.89)   ? (6.26)       ? (6.57)
σ = 25.0      12.3 (13.0)    23.4 (5.40)    18.2 (8.57)    24.7 (5.37)
ρ = .60       .30 (.32)      .16 (.46)      .79 (.27)      .59 (.20)
βy|x = .60    ? (.45)        ? (.47)        .61 (.25)      .60 (?)
βx|y = .60    ? (.26)        .17 (?)        1.12 (.64)     ? (.24)

? = value missing from the transcript

Average parameter estimate (RMSE): MAR

Parameter     MS             HD             CM             PD
µ = 125.0     143.5 (19.4)   ? (19.5)       124.9 (18.1)   124.8 (18.3)
σ = 25.0      10.6 (14.6)    20.0 (6.68)    20.4 (10.7)    27.0 (8.77)
ρ = .60       .08 (.52)      .04 (.57)      .64 (.48)      .50 (.40)
βy|x = .60    (.56) .61 .62  (only these values survive; column assignment unclear)
βx|y = .60    .20 (.44)      .06 (?)        .78 (.75)      .45 (?)

? = value missing from the transcript

Average parameter estimate (RMSE): MNAR

Parameter     MS             HD             CM             PD
µ = 125.0     155.5 (30.7) (30.73) 151.6 (26.9)  (only these values survive; column assignment unclear)
σ = 25.0      6.2 (18.9)     11.7 (13.7)    8.42 (16.9)    12.9 (12.7)
ρ = .60       .08 (.47)      .04 (.53)      .64 (.40)      .50 (.37)
βy|x = .60    (.56) .61 (.43) .62  (only these values survive; column assignment unclear)
βx|y = .60    .20 (.55)      .06 (?)        .78 (1.72)     .45 (.68)

? = value missing from the transcript

ML estimation
- Widely accepted
- Yields asymptotically unbiased estimators under general regularity conditions
- Provides a mechanism for inference: hypothesis tests and confidence intervals
- Often relies on the EM algorithm
- Newton-Raphson / Fisher scoring is used in multilevel modeling
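For the bivariate normal case in the simulations (X fully observed, Y partly missing), the ML estimates do not need EM iterations. This is a swapped-in shortcut rather than the EM algorithm named on the slide: the likelihood factors into the marginal of X and the regression of Y on X, which gives closed-form estimates. The sketch below assumes that setup and that missingness is MAR; the function name is hypothetical.

```python
import numpy as np

def ml_bivariate_normal(y, x, miss):
    """Closed-form ML estimates for bivariate normal (Y, X) with X complete.

    Factored likelihood: the marginal of X uses all cases, and the
    regression of Y on X uses only the cases where Y is observed.
    """
    obs = ~miss
    xo, yo = x[obs], y[obs]
    # ML (divisor n) moments from the complete cases.
    sxx_cc = xo.var()
    sxy_cc = np.mean((xo - xo.mean()) * (yo - yo.mean()))
    beta = sxy_cc / sxx_cc                    # slope of Y on X
    resid_var = yo.var() - beta**2 * sxx_cc   # residual variance of Y given X
    # Marginal of X from all cases.
    mu_x, var_x = x.mean(), x.var()
    # Combine the two factors to recover the marginal of Y.
    mu_y = yo.mean() + beta * (mu_x - xo.mean())
    var_y = resid_var + beta**2 * var_x
    rho = beta * np.sqrt(var_x / var_y)
    return {"mean_y": mu_y, "sd_y": np.sqrt(var_y), "rho": rho,
            "beta_y_on_x": beta}
```

The estimate of the mean of Y shifts the complete-case mean by the slope times the gap between the full-sample and complete-case means of X. This is how the method corrects for selection on X.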

Software for ML estimation
- SPSS: missing data module
- EMCOV
- NORM
- SAS: Proc Mixed
- S-Plus: lme function
- Stata
- LISREL
- Mplus
- HLM / MLwiN (multilevel models)

Simulation Study: ML estimation

Parameter (true value)   MCAR           MAR            MNAR
Mean(Y): 125             124.8 (6.5)    125.2 (16.9)   151.6 (26.9)
Std(Y): 25               24.2 (5.7)     25.5 (7.4)     12.3 (13.2)
Rho: 0.6                 0.61 (0.2)     0.52 (0.38)    0.39 (0.36)
Beta Y|X: 0.6            0.61 (0.27)    0.60 (0.51)    0.21 (0.43)
Beta X|Y: 0.6            0.63 (0.25)    0.49 (0.38)    0.79 (0.68)

Design:
- Generate 50 observations from a bivariate normal (Y, X)
- MCAR: Y is missing with probability 0.73 (high!)
- MAR: Y is missing if X < 141
- MNAR: Y is missing if Y < 141

ML estimation
- More attractive than ad hoc methods
- Assumes a large sample
- May or may not be robust to model assumptions
- Assumes MAR

Multiple Imputation
- Each missing value is replaced by m > 1 values, effectively creating m datasets
- Efficiency: (1 + λ/m)^(-1), where λ is the rate of missing information; this implies m need not be large, but certainly larger than 1
- Rubin's rules for combining estimators are now well accepted
- Helps to be a Bayesian!
- MAR is usually assumed
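Both the efficiency formula and Rubin's rules are simple to compute. The sketch below uses the standard form of Rubin's rules for a scalar estimate: the pooled estimate is the mean over the m datasets, and the total variance is the within-imputation variance plus (1 + 1/m) times the between-imputation variance. The slide does not spell these out, so treat the function names and argument layout as illustrative.

```python
import numpy as np

def relative_efficiency(lam, m):
    """Efficiency of m imputations relative to infinitely many: (1 + lam/m)^(-1)."""
    return 1.0 / (1.0 + lam / m)

def pool_rubin(estimates, variances):
    """Combine m point estimates and their variances with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()              # pooled point estimate
    within = u.mean()             # average within-imputation variance
    between = q.var(ddof=1)       # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total
```

With λ = 0.5 and m = 5, `relative_efficiency(0.5, 5)` is 1/1.1, about 0.91, which illustrates why a handful of imputations is often considered enough.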

Software
- NORM
- Proc MI in SAS: regression, propensity scores, MCMC (this does NORM plus other routines)
- SAS macro: IVE library
- S-Plus: missing data library (NORM); longitudinal data uses the function PAN
- LISREL: missing data library like NORM
- SOLAS (same as Proc MI??)
- http://www.multiple-imputation.com

Comments on MI methods
- Regression-based MI methods are really based on ML estimation and usually require a multivariate normal distribution
- Should you transform skewed data to normality (log or power transformation)? Partial answer: no (Graham and Schafer, 1999)
- What about the practice of rounding data to create binary/ordinal variables? Partial answer: okay even for small samples

Comments continued
- However, better specialized methods are available: Schafer (1997) for nominal data, Liu et al. (2000) for clustered data
- How about propensity scores? No: they can distort the covariance structure in the data (Allison, 2000)

Simulation Study: MI (NORM)

Parameter (true value)   MCAR           MAR            MNAR
Mean(Y): 125             124.9 (6.5)    125.3 (17.2)   151.6 (26.9)
Std(Y): 25               25.9 (5.9)     28.7 (8.2)     13.6 (12.1)
Rho: 0.6                 0.57 (0.2)     0.45 (0.37)    0.35 (0.36)
Beta Y|X: 0.6            0.61 (0.27)    0.59 (0.52)    0.21 (0.43)
Beta X|Y: 0.6            0.56 (0.22)    0.39 (0.38)    0.66 (0.56)

Design:
- Generate 50 observations from a bivariate normal (Y, X)
- MCAR: Y is missing with probability 0.73 (high!)
- MAR: Y is missing if X < 141
- MNAR: Y is missing if Y < 141

Methods that do not assume MAR
- Selection models
- Pattern mixture models
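The two families differ in how they factor the joint distribution of the complete data Yc and the missingness indicator R. In the notation of the earlier slides (written here as Y_c), the standard factorizations are:

```latex
\begin{align*}
\text{Selection model:} \quad & P(Y_c, R) = P(Y_c)\, P(R \mid Y_c) \\
\text{Pattern mixture model:} \quad & P(Y_c, R) = P(Y_c \mid R)\, P(R)
\end{align*}
```

A selection model specifies how the probability of missingness depends on the data, including the unobserved part. A pattern mixture model specifies a separate distribution of the data for each missingness pattern.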

Food for thought
- In a longitudinal study on aging, many subjects die while on study
- Is MAR a reasonable assumption?
- Alternatively, joint modeling of outcome and death may be superior