Statistical Methods for Missing Data Roberta Harnett MAR 550 October 30, 2007.



Outline
- When do we see missing data?
- Types of missing data
- Traditional approaches
  - Deletion
  - Substitution
- Modern approaches
  - Maximum likelihood and Bayes
- Software

Missing Data
- Medical studies, nonresponse in surveys or censuses, dropouts in clinical trials, censored data
- Loss of information, power
- Bias in results due to differences between missing and observed data
- Complicated analysis with standard software

Types of missing data
- MCAR
- MAR
- MNAR

MCAR: Missing Completely at Random
- Probability that x_i is missing doesn’t depend on its value or on the values of other variables
- Doesn’t matter if it is associated with other “missingness”

MAR: Missing at Random
- Missingness doesn’t depend on x_i after controlling for other variables
- This is not great, but we can deal with it

MNAR: Missing Not at Random
- Not MCAR or MAR (anything else)
- BAD!!
- Need to model the missingness
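The three mechanisms can be illustrated with a small simulation. This is only a sketch under assumed settings: the variables `x` and `y`, the coefficients, and the logistic missingness probabilities are all hypothetical choices, not from the slides. It prints the mean of the observed `y` values; the true mean is 0.

```python
import math
import random
import statistics

def simulate(mechanism, n=20000, seed=0):
    """Return the y values that remain observed under one mechanism."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        x = rng.gauss(0, 1)              # fully observed covariate
        y = 0.8 * x + rng.gauss(0, 0.6)  # variable with missing values, true mean 0
        if mechanism == "MCAR":
            p_missing = 0.3                         # constant: ignores x and y
        elif mechanism == "MAR":
            p_missing = 1 / (1 + math.exp(-2 * x))  # depends only on observed x
        else:  # MNAR
            p_missing = 1 / (1 + math.exp(-2 * y))  # depends on y itself
        if rng.random() >= p_missing:
            observed.append(y)
    return observed

# Mean of the observed y under each mechanism (assumed toy settings).
means = {mech: statistics.mean(simulate(mech)) for mech in ("MCAR", "MAR", "MNAR")}
for mech, value in means.items():
    print(mech, round(value, 3))
```

Under MCAR the observed mean stays close to 0. Under MAR the raw observed mean is biased too, but the bias can be removed by conditioning on `x`; under MNAR it cannot be removed without a model for the missingness.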

Traditional Approaches
- Deletion
  - List-wise: unbiased (when data are MCAR), but loses power
  - Alternatives are really replacements for list-wise
  - Pair-wise (also called “unwise”) deletion
    - Leads to different sample sizes for different parts of analysis
    - Can be a disaster
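A small sketch of how the two deletion strategies differ in the cases they keep. The five rows and the variable names `a`, `b`, `c` are made-up toy data.

```python
from itertools import combinations

# Hypothetical toy data; None marks a missing value.
rows = [
    {"a": 1.0, "b": 2.0, "c": None},
    {"a": 2.0, "b": None, "c": 1.0},
    {"a": 3.0, "b": 4.0, "c": 2.0},
    {"a": None, "b": 5.0, "c": 3.0},
    {"a": 5.0, "b": 6.0, "c": 5.0},
]
variables = ["a", "b", "c"]

# List-wise deletion: keep only rows with every variable observed.
listwise = [r for r in rows if all(r[v] is not None for v in variables)]

# Pair-wise deletion: each pair of variables uses every row where both are observed.
pairwise_n = {
    (u, v): sum(1 for r in rows if r[u] is not None and r[v] is not None)
    for u, v in combinations(variables, 2)
}

print("list-wise n:", len(listwise))
print("pair-wise n:", pairwise_n)
```

Here list-wise deletion keeps 2 of 5 rows, while each pair-wise statistic is computed on 3 rows, and on a different set of 3 rows for each pair. That mismatch is what produces different sample sizes in different parts of the analysis.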

Traditional cont…
- Single imputation
  - Hot deck (used by the Census Bureau) vs. cold deck
  - Mean substitution
  - Regression substitution
  - Stochastic regression substitution
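The substitution methods differ mainly in how much variability they preserve. The sketch below compares three of them on simulated data; the model `y = 2 + 1.5 x + noise`, the 40% MCAR missingness, and all names are assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(1)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 + 1.5 * xi + rng.gauss(0, 1) for xi in x]
missing = [rng.random() < 0.4 for _ in range(n)]  # assumed: 40% of y missing, MCAR

# Fit a simple least-squares line on the complete cases.
obs = [(xi, yi) for xi, yi, m in zip(x, y, missing) if not m]
ox, oy = zip(*obs)
mx, my = statistics.mean(ox), statistics.mean(oy)
slope = sum((a - mx) * (b - my) for a, b in obs) / sum((a - mx) ** 2 for a in ox)
intercept = my - slope * mx
resid_sd = statistics.pstdev([b - (intercept + slope * a) for a, b in obs])

def impute(method):
    """Fill in missing y values with one of three single-imputation rules."""
    filled = []
    for xi, yi, m in zip(x, y, missing):
        if not m:
            filled.append(yi)
        elif method == "mean":
            filled.append(my)                        # mean substitution
        elif method == "regression":
            filled.append(intercept + slope * xi)    # regression substitution
        else:                                        # stochastic regression
            filled.append(intercept + slope * xi + rng.gauss(0, resid_sd))
    return filled

# Variance of the filled-in y relative to the variance of the full data.
var_true = statistics.variance(y)
ratios = {m: statistics.variance(impute(m)) / var_true
          for m in ("mean", "regression", "stochastic")}
for method, ratio in ratios.items():
    print(method, round(ratio, 2))
```

Mean substitution shrinks the variance the most, regression substitution less so, and adding a random residual brings it back near the true value. Even then, a single imputed data set still treats the filled-in values as if they were known.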

Modern Methods: Maximum Likelihood
- EM algorithm
  - Estimate parameters: start from listwise deletion, add some error
  - E step: predict missing data
  - M step: maximize likelihood
  - Repeat
- NORM (
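A minimal EM sketch for one simple case, assuming a bivariate normal model in which `x` is always observed and `y` is missing at random. The data-generating values and names are hypothetical. Starting values come from listwise deletion; the "error" added in the E step is the residual variance that enters the expected value of y squared.

```python
import math
import random

rng = random.Random(2)
data = []
for _ in range(4000):
    x = rng.gauss(0, 1)
    y = 1.0 + 2.0 * x + rng.gauss(0, 1)            # true mean of y is 1, variance 5
    if rng.random() < 1 / (1 + math.exp(-2 * x)):  # MAR: depends on x only
        y = None
    data.append((x, y))

def moments(xs, ys, y2s):
    """M step: means and (co)variances from (expected) sufficient statistics."""
    m = len(xs)
    mu_x, mu_y = sum(xs) / m, sum(ys) / m
    s_xx = sum(a * a for a in xs) / m - mu_x ** 2
    s_xy = sum(a * b for a, b in zip(xs, ys)) / m - mu_x * mu_y
    s_yy = sum(y2s) / m - mu_y ** 2
    return mu_x, mu_y, s_xx, s_xy, s_yy

# Starting values from listwise deletion (complete cases only).
cx, cy = zip(*[(x, y) for x, y in data if y is not None])
theta = moments(cx, cy, [b * b for b in cy])
listwise_mu_y = theta[1]

xs = [x for x, _ in data]
for _ in range(200):
    mu_x, mu_y, s_xx, s_xy, s_yy = theta
    beta = s_xy / s_xx
    resid_var = s_yy - s_xy ** 2 / s_xx
    ey, ey2 = [], []
    for x, y in data:
        if y is None:
            pred = mu_y + beta * (x - mu_x)   # E step: E[y | x]
            ey.append(pred)
            ey2.append(pred ** 2 + resid_var)  # E[y^2 | x] includes residual variance
        else:
            ey.append(y)
            ey2.append(y * y)
    new_theta = moments(xs, ey, ey2)
    converged = max(abs(a - b) for a, b in zip(theta, new_theta)) < 1e-8
    theta = new_theta
    if converged:
        break

em_mu_y, em_s_yy = theta[1], theta[4]
print("listwise mean of y:", round(listwise_mu_y, 2))
print("EM mean of y:", round(em_mu_y, 2), "EM variance of y:", round(em_s_yy, 2))
```

In this setup the listwise mean of `y` is biased downward because large `x` values are more likely to lose their `y`, while the EM estimates should land near the true values.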

Modern Methods: Multiple Imputation
- Simple and general: works for any type of analysis
- Validity of method depends on how imputation is carried out
- Should reasonably predict missing data, but should also reflect uncertainty in predictions
- Using a “sensible” imputation model

“Random Imputation”
- Predict missing values, then add an error component drawn randomly from the residual distribution of the variable
- Repeat several times to improve error estimates

Multiple Imputation
- Use Bayesian arguments to impute data:
  - Parametric model for data
    - Ignorable missing data
    - Non-ignorable missing data
  - Apply prior for unknown model parameters
  - Simulate m independent draws from distribution of Y_mis given Y_obs
  - Calculate values explicitly or through MCMC

MI procedure
- Simulate a random draw of the unknown parameters from the observed-data posterior
- Simulate a random draw of the missing values from the conditional predictive distribution
- Repeat, obtaining new parameter estimates from the “complete” data set, until the process stabilizes
- Do 3-5 times total (Rubin)
- MCMC: data augmentation algorithm of Tanner and Wong (1987)

Parameter Estimates
- Calculate parameter Q from each of the m data sets
- Estimate of Q is just the average of the m values of Q
- Variance of Q is $T = (1 + m^{-1}) B + U$
- U is the mean within-imputation variance and B is the between-imputation variability, $B = \frac{1}{m-1} \sum_{l=1}^{m} (Q_l - Q_{ave})^2$
- As $m \to \infty$, $T = B + U$ and you don’t need to correct B for low numbers of imputations.
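The combining rules are easy to compute directly. A minimal sketch, assuming each imputed data set has already produced a point estimate and its variance; the five numbers below are invented for illustration. B is computed with the usual m - 1 divisor of Rubin's rules.

```python
import statistics

def pool(estimates, variances):
    """Combine m completed-data estimates with the repeated-imputation rules."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)   # pooled point estimate
    u_bar = statistics.mean(variances)   # mean within-imputation variance U
    b = statistics.variance(estimates)   # between-imputation variance B (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b          # total variance T
    return q_bar, u_bar, b, t

# Hypothetical estimates and variances from m = 5 imputed data sets.
q_bar, u_bar, b, t = pool([10.0, 12.0, 11.0, 9.0, 13.0], [4.0, 5.0, 4.5, 4.0, 5.5])
print(q_bar, u_bar, b, t)
```

With these inputs the pooled estimate is 11, U is 4.6, B is 2.5, and T is 4.6 + 1.2 × 2.5 = 7.6.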

MI
- Imputation is computationally distinct from analysis
- Problem if assumptions of imputation are not compatible with analysis assumptions
  - Loss of power if imputation makes fewer assumptions than analysis
  - “Superefficient” if imputation is based on more (valid) assumptions than analysis

MI
- Inconsistent if imputation makes invalid assumptions that are not included in analysis
  - Ex: interaction terms
- Imputation needs to preserve features of data that will be included in analysis

ABB
- Approximate Bayesian Bootstrap (Rubin, 1987)
- Fancier version of hot deck imputation
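A minimal sketch of ABB as it is commonly described: first resample the observed values with replacement to form a donor pool, then draw the imputations with replacement from that pool. The function name `abb_impute` and the toy values are hypothetical, and this ignores any grouping of cases into imputation classes.

```python
import random

def abb_impute(observed, n_missing, rng):
    """One approximate-Bayesian-bootstrap imputation of n_missing values."""
    # Step 1: bootstrap the observed values to get a donor pool of the same size.
    donor_pool = [rng.choice(observed) for _ in observed]
    # Step 2: draw the imputed values with replacement from the donor pool.
    return [rng.choice(donor_pool) for _ in range(n_missing)]

rng = random.Random(3)
observed = [3.1, 4.7, 5.0, 2.2, 6.3]   # hypothetical observed values
imputed_sets = [abb_impute(observed, 3, rng) for _ in range(5)]  # m = 5 imputations
for s in imputed_sets:
    print(s)
```

The extra bootstrap step is what distinguishes this from a plain hot deck: the donor pool itself varies between imputations, which adds between-imputation variability.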

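Comparison of Methods
- Removing entries with missing data vs. MI
- Imputing once vs. MI
- Number of imputations
  - Efficiency is $(1 + \lambda/m)^{-1}$, where λ is the fraction of missing information and m is the number of imputations
- MI vs. EM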
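The efficiency formula shows why a handful of imputations is often enough. A short sketch that tabulates it, taking λ as the fraction of missing information and m as the number of imputations; the grid of values is an arbitrary choice.

```python
def relative_efficiency(lam, m):
    """Efficiency of m imputations relative to infinitely many: (1 + lam/m)^-1."""
    return 1 / (1 + lam / m)

# Tabulate efficiency for a few assumed values of lambda and m.
table = {
    (lam, m): relative_efficiency(lam, m)
    for lam in (0.1, 0.3, 0.5)
    for m in (3, 5, 10, 20)
}
for (lam, m), eff in table.items():
    print(f"lambda={lam}, m={m}: {eff:.3f}")
```

Even with half the information missing (λ = 0.5), five imputations already give an efficiency of about 0.91.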

Nonignorable nonresponse
- Ignorable if data are MAR
- MI can be used when there is nonignorable nonresponse
- Missing-data mechanism must then be modeled

Programs
- For S-PLUS:
- For R:
  - Amelia (II) (surveys and time-series data)
  - Norm (for multivariate normal data)
- SOLAS (tested by Allison, 2000), for Windows

References
Little, R.J.A. and Rubin, D.B. (1987) Statistical Analysis with Missing Data. J. Wiley & Sons, New York.
Schafer, J.L. (1999) Multiple imputation: a primer. Statistical Methods in Medical Research, 8,
Barnard, J. and Meng, X. (1999) Applications of multiple imputation in medical studies: from AIDS to NHANES. Statistical Methods in Medical Research, 8,
Allison, P.D. (2000) Multiple Imputation for Missing Data: A Cautionary Tale. Sociological Methods and Research, 28 (3),

MI Example (Tu et al., 1993)
- AIDS survival time with reporting delay
  (1) Survival-time model
  (2) Reporting-lag model using available information
  (3) Multiply impute delayed cases using model from step 2
  (4) Compute estimates of survival-time model parameters
  (5) Combine estimates using repeated-imputation rules

Milwaukee Parental Choice Program (MPCP)
- Effects of school choice on achievement tests (public vs. private schools)
- School vouchers to attend “choice” schools, participating private schools
- Only households with income less than 1.75 times the poverty line could participate

Milwaukee Parental Choice Program (MPCP)
- Randomized block design
- Outcome variables were scores from ITBS
- Maximum of 4 years observed ( )
- Higher levels of missingness than in a typical medical study
- Pattern in missing data was not monotone