How to Handle Missing Values in Multivariate Data By Jeff McNeal & Marlen Roberts

Presentation transcript:


The Missing Data Problem: Problems with Statistical Inference; Sample Size & Power; Biased Results. Little, R., & Rubin, D. (2002). Introduction. In Statistical Analysis with Missing Data (2nd ed., pp. 1-2). Hoboken, New Jersey: John Wiley & Sons.

Real World Examples: Respondents in a household survey refuse to report income; results of a manufacturing experiment are missing due to equipment failure; voters are unable to express a preference for a political candidate in an opinion poll. Little, R., & Rubin, D. (2002). Introduction. In Statistical Analysis with Missing Data (2nd ed., pp. 1-2). Hoboken, New Jersey: John Wiley & Sons.

Outline: Common Assumptions and Missing Data Patterns; Taxonomy of Methods for Handling Missing Values; Multiple Imputation; Maximum Likelihood; Simulation

Missing Data Patterns: Not all missing data are created equal. Data can be missing due to a random process or missing due to a non-random process.

A Simple Example: Income Survey. Westfall, P., & Henning, K. (2013). Understanding Advanced Statistical Methods (1st ed.). Boca Raton, Florida: CRC Press, Taylor & Francis Group.

Univariate Missing Data Process: MCAR (missing completely at random). P.H. Westfall

Multivariate Missing Data Processes: MCAR and MAR (missing at random)

Missing Data Processes: MNAR (missing not at random)
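The three missingness processes can be contrasted in a small simulation. This is only a sketch: the age and income variables, the coefficients, and the missingness probabilities are assumptions made up for illustration and do not come from the slides.

```python
import random
import statistics

random.seed(0)
n = 20000

# Hypothetical population: income (in $1000s) rises with age.
age = [random.uniform(20, 60) for _ in range(n)]
income = [20 + 1.0 * a + random.gauss(0, 10) for a in age]

def observed_mean(values, p_missing):
    """Mean of the values that survive a per-row missingness probability."""
    kept = [v for v, p in zip(values, p_missing) if random.random() >= p]
    return statistics.fmean(kept)

# MCAR: every income has the same 40% chance of being missing.
p_mcar = [0.4] * n
# MAR: missingness depends only on the fully observed age.
p_mar = [0.7 if a > 40 else 0.1 for a in age]
# MNAR: missingness depends on the unobserved income itself.
p_mnar = [0.7 if y > 60 else 0.1 for y in income]

full_mean = statistics.fmean(income)
mean_mcar = observed_mean(income, p_mcar)
mean_mar = observed_mean(income, p_mar)
mean_mnar = observed_mean(income, p_mnar)

print(f"full data: {full_mean:.2f}")
print(f"MCAR     : {mean_mcar:.2f}")
print(f"MAR      : {mean_mar:.2f}")
print(f"MNAR     : {mean_mnar:.2f}")
```

In this setup the naive observed mean stays close to the full-data mean under MCAR and is pulled downward under both MAR and MNAR. The difference is that under MAR the bias can be removed by conditioning on the observed age, while under MNAR the observed data alone do not contain enough information to do so.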

Taxonomy of Missing-Data Methods: Complete Case Analysis (Listwise Deletion); Available Case Analysis (Pairwise Deletion); Least Squares on Imputed Data; Multiple Imputation; Maximum Likelihood (and Bayes). Little, R., & Rubin, D. (2002). Introduction. In Statistical Analysis with Missing Data (2nd ed., pp ). Hoboken, New Jersey: John Wiley & Sons.

Complete Case Analysis (Listwise Deletion): Easy to implement; works well when the MCAR assumption is met; wastes a lot of information. Q/Regression%20with%20Missing%20X's.pdf
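A minimal sketch of listwise deletion, assuming a small made-up data set in which `None` marks a missing value:

```python
# Rows are cases, columns are variables; None marks a missing value.
data = [
    (25, 48.0, 12),
    (31, None, 16),
    (47, 81.5, None),
    (52, 90.0, 18),
    (38, 62.0, 14),
]

def listwise_delete(rows):
    """Keep only the rows in which every variable is observed."""
    return [row for row in rows if all(v is not None for v in row)]

complete = listwise_delete(data)
print(f"kept {len(complete)} of {len(data)} rows")
```

Here two missing cells out of fifteen remove two of the five rows, which is the sense in which the method wastes information.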

Available Case Analysis (Pairwise Deletion): Attempts to minimize the loss of data in listwise deletion; increases the power of your test; is usually outperformed by Maximum Likelihood. Caveat: can result in non-positive definite covariance matrices. Q/Regression%20with%20Missing%20X's.pdf
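The caveat about non-positive definite covariance matrices can be shown with a deliberately extreme example. The data below are made up so that each pair of variables is observed together on a different third of the rows; the helper `pairwise_cov` is a hypothetical name, not a library function.

```python
# None marks a missing value.
rows = [
    (1, 1, None), (2, 2, None), (3, 3, None),   # x and y observed together
    (None, 1, 1), (None, 2, 2), (None, 3, 3),   # y and z observed together
    (1, None, 3), (2, None, 2), (3, None, 1),   # x and z observed together
]

def pairwise_cov(rows, i, j):
    """Sample covariance of columns i and j over rows where both are observed."""
    pairs = [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]
    mi = sum(a for a, _ in pairs) / len(pairs)
    mj = sum(b for _, b in pairs) / len(pairs)
    return sum((a - mi) * (b - mj) for a, b in pairs) / (len(pairs) - 1)

p = 3
cov = [[pairwise_cov(rows, i, j) for j in range(p)] for i in range(p)]

# A valid covariance matrix needs v'Cv >= 0 for every vector v.
v = [1, -1, 1]
quad_form = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))

for row in cov:
    print(row)
print("v'Cv =", quad_form)
```

Each entry of the matrix is a legitimate covariance on its own subset of rows, but the entries are mutually inconsistent: x and y move together, y and z move together, yet x and z move in opposite directions. The negative quadratic form shows that the assembled matrix is not positive semidefinite.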

Least Squares Imputation Methods: Unconditional Mean Substitution; Conditional Mean Imputation based on X; Conditional Mean Imputation based on X and Y. Q/Regression%20with%20Missing%20X's.pdf

Unconditional Mean Substitution: Just take the sample mean of the observed data and use it for the missing values. Heavily biases the covariance matrix. Bias can be corrected, but the inferences (confidence intervals, tests, etc.) are distorted and over-precise. Q/Regression%20with%20Missing%20X's.pdf
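The effect on the covariance matrix is easy to see on a toy example. The numbers below are made up for illustration, and `sample_cov` is a small helper written for this sketch.

```python
import statistics

# Toy data; None marks a missing y.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 2.0, 3.0, 4.0, None, None]

observed = [v for v in y if v is not None]
y_bar = statistics.fmean(observed)
y_filled = [y_bar if v is None else v for v in y]

def sample_cov(a, b):
    """Sample covariance with divisor n - 1."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((u - ma) * (w - mb) for u, w in zip(a, b)) / (len(a) - 1)

var_observed = statistics.variance(observed)
var_filled = statistics.variance(y_filled)
cov_observed = sample_cov(x[:4], observed)
cov_filled = sample_cov(x, y_filled)

print(f"variance of y:   observed {var_observed:.3f}, mean-filled {var_filled:.3f}")
print(f"covariance x, y: observed {cov_observed:.3f}, mean-filled {cov_filled:.3f}")
```

The filled-in values add cases without adding any spread, so both the variance of y and its covariance with x shrink, while the apparent sample size grows. That combination is why the resulting inferences look more precise than they should.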

Conditional Mean Imputation. Q/Regression%20with%20Missing%20X's.pdf
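A minimal sketch of conditional mean imputation based on X, assuming a single predictor and a straight-line model fitted to the complete cases. The data and the helper names `fit_line` and `conditional_mean_impute` are made up for illustration.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares intercept and slope for y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def conditional_mean_impute(xs, ys):
    """Replace each missing y (None) with the fitted value a + b*x."""
    cc = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in cc], [y for _, y in cc])
    return [a + b * x if y is None else y for x, y in zip(xs, ys)]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 4.9, 7.2, 8.8, None, None]
y_imputed = conditional_mean_impute(x, y)
print(y_imputed)
```

Unlike unconditional mean substitution, the imputed values follow the relationship between x and y. They fall exactly on the fitted line, though, so the filled-in data look less noisy than real observations would.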

Multiple Imputation. Little, R., & Rubin, D. (2002). Introduction. In Statistical Analysis with Missing Data (2nd ed., pp ). Hoboken, New Jersey: John Wiley & Sons.

Steps Involved in Multiple Imputation: (1) Introduce random variation into the process of imputing missing values. (2) Generate several data sets, each with different imputed values. (3) Perform an analysis on each data set. (4) Combine the results into a single set of parameter estimates, standard errors, and test statistics.
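The steps above can be sketched end to end. This is a simplified illustration under stated assumptions, not the presenters' procedure: the data are simulated, the analysis is just the mean of y, and parameter uncertainty is approximated by refitting the imputation model on a bootstrap resample of the complete cases instead of drawing from a posterior distribution.

```python
import random
import statistics

random.seed(1)

def fit_line(xs, ys):
    """OLS intercept, slope, and residual standard deviation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sigma = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sigma

# Made-up data: y depends on x, and y is missing more often when x is large.
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
y_full = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]
y = [None if (xi > 0 and random.random() < 0.6) else yi
     for xi, yi in zip(x, y_full)]
complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]

m = 20  # number of imputed data sets
estimates, variances = [], []
for _ in range(m):
    # Random variation: refit on a bootstrap resample, then add residual noise.
    boot = random.choices(complete, k=len(complete))
    a, b, sigma = fit_line([p[0] for p in boot], [p[1] for p in boot])
    y_imp = [a + b * xi + random.gauss(0, sigma) if yi is None else yi
             for xi, yi in zip(x, y)]
    # Analysis of this data set: the mean of y and its sampling variance.
    estimates.append(statistics.fmean(y_imp))
    variances.append(statistics.variance(y_imp) / n)

# Combine the m results into one estimate and one standard error.
q_bar = statistics.fmean(estimates)
within = statistics.fmean(variances)
between = statistics.variance(estimates)
total_var = within + (1 + 1 / m) * between

print(f"pooled mean of y: {q_bar:.3f} (SE {total_var ** 0.5:.3f})")
```

The true mean of y in this simulation is 1.0. The pooled standard error is larger than the average within-data-set standard error because it also carries the disagreement between the imputed data sets.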

Introducing Randomness into a M.I. Model

Adding Variability to the Imputed Values
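One common way to add variability, sketched here under the assumption of a simple linear imputation model, is to add a random residual draw to each fitted value. The simulated data and parameter values are made up for illustration.

```python
import random
import statistics

random.seed(2)

# Made-up data: y = 2x + noise, with about half of the y values missing.
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
y_full = [2.0 * xi + random.gauss(0, 1) for xi in x]
missing = [random.random() < 0.5 for _ in range(n)]

# Fit y = a + b*x on the complete cases.
xc = [xi for xi, mis in zip(x, missing) if not mis]
yc = [yi for yi, mis in zip(y_full, missing) if not mis]
mx, my = statistics.fmean(xc), statistics.fmean(yc)
b = (sum((u - mx) * (w - my) for u, w in zip(xc, yc))
     / sum((u - mx) ** 2 for u in xc))
a = my - b * mx
sigma = (sum((w - a - b * u) ** 2 for u, w in zip(xc, yc)) / (len(xc) - 2)) ** 0.5

# Deterministic imputation: fill in the fitted value.
y_det = [a + b * xi if mis else yi
         for xi, yi, mis in zip(x, y_full, missing)]
# Stochastic imputation: fitted value plus a random residual draw.
y_sto = [a + b * xi + random.gauss(0, sigma) if mis else yi
         for xi, yi, mis in zip(x, y_full, missing)]

var_full = statistics.variance(y_full)
var_det = statistics.variance(y_det)
var_sto = statistics.variance(y_sto)

print(f"variance of y, full data:              {var_full:.2f}")
print(f"variance of y, deterministic imputes:  {var_det:.2f}")
print(f"variance of y, stochastic imputes:     {var_sto:.2f}")
```

The deterministic version understates the variance of y because every imputed value sits exactly on the regression line. Adding the residual draw puts the missing scatter back.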

Why Do We Want to Add Variability? This is the whole point of multiple imputation

Combining Inferences from Imputed Data
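For reference, the standard combining rules (Rubin's rules) for a scalar quantity can be written as follows. The notation is an assumption of this note: m is the number of imputed data sets, \hat{Q}_i is the estimate from data set i, and U_i is its estimated variance.

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{Q}_i - \bar{Q}\right)^2, \qquad
T = \bar{U} + \left(1 + \frac{1}{m}\right) B
```

Here \bar{Q} is the pooled estimate, \bar{U} is the within-imputation variance, B is the between-imputation variance, and the pooled standard error is \sqrt{T}.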

Simplified Form using a Regression Example

Likelihood-Based Inference

ML with Ignorable Missing Data

ML with Ignorable Missing Data
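One textbook case where maximum likelihood with ignorable missing data has a closed form is bivariate normal data in which x is fully observed and y is partly missing. The likelihood factors into the marginal distribution of x, estimated from all cases, and the regression of y on x, estimated from the complete cases. The sketch below assumes that setting, with simulated data and made-up parameter values, and may not match the example on the slides.

```python
import random
import statistics

random.seed(3)

# Made-up MAR data: x is always observed; y is missing whenever x > 0.
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
y = [None if xi > 0 else 1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]

cc = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
xc = [p[0] for p in cc]
yc = [p[1] for p in cc]
r = len(cc)

# Marginal distribution of x: all n cases (ML uses divisor n, not n - 1).
mu_x = statistics.fmean(x)
var_x = sum((xi - mu_x) ** 2 for xi in x) / n

# Regression of y on x: the r complete cases.
xc_bar, yc_bar = statistics.fmean(xc), statistics.fmean(yc)
sxx = sum((xi - xc_bar) ** 2 for xi in xc) / r
sxy = sum((xi - xc_bar) * (yi - yc_bar) for xi, yi in cc) / r
syy = sum((yi - yc_bar) ** 2 for yi in yc) / r
beta = sxy / sxx
resid_var = syy - beta * sxy

# Map back to the parameters of the joint distribution of (x, y).
mu_y_ml = yc_bar + beta * (mu_x - xc_bar)
var_y_ml = resid_var + beta ** 2 * var_x
cov_xy_ml = beta * var_x

print(f"complete-case mean of y: {yc_bar:.3f}")
print(f"ML estimate of mean y:   {mu_y_ml:.3f}")
```

The true mean of y in this simulation is 1.0. The complete-case mean is badly biased because only cases with small x are kept, while the ML estimate uses the observed x values from the incomplete cases to correct for this.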

Comparison of Methods
Listwise: Easiest to implement; has minimal effect if data are MCAR, or MAR for large sample sizes; has a tendency to bias results.
Pairwise: Uses more information than listwise; increases statistical power; also easy to implement.
Multiple Imputation / Maximum Likelihood: Requires no special software once the imputed datasets are generated; requires specification of a model; requires more assumptions; requires specification of a model for each variable; most asymptotically efficient; most complex; you get model comparison statistics (AIC, BIC, etc.).