Published by Pauline Stafford. Modified over 9 years ago.
1
How to Handle Missing Values in Multivariate Data
By Jeff McNeal & Marlen Roberts
2
The Missing Data Problem
- Problems with Statistical Inference
- Reduced Sample Size & Power
- Biased Results
Little, R., & Rubin, D. (2002). Introduction. In Statistical Analysis with Missing Data (2nd ed., pp. 1-2). Hoboken, New Jersey: John Wiley & Sons.
3
Real-World Examples
- Respondents in a household survey refuse to report income
- Results of a manufacturing experiment are missing due to equipment failure
- Voters are unable to express a preference for a political candidate in an opinion poll
Little, R., & Rubin, D. (2002). Introduction. In Statistical Analysis with Missing Data (2nd ed., pp. 1-2). Hoboken, New Jersey: John Wiley & Sons.
4
Outline
- Common Assumptions and Missing Data Patterns
- Taxonomy of Methods for Handling Missing Values
- Multiple Imputation
- Maximum Likelihood
- Simulation
5
Missing Data Patterns
- Not all missing data are created equal
- Missing due to a random process
- Missing due to a non-random process
6
A Simple Example: Income Survey
Westfall, P., & Henning, K. (2013). Understanding Advanced Statistical Methods (1st ed.). Boca Raton, Florida: CRC Press, Taylor & Francis Group.
7
Univariate Missing Data Process: MCAR
P.H. Westfall
8
Multivariate Missing Data Processes: MCAR and MAR
http://support.sas.com/resources/papers/proceedings12/312-2012.pdf
9
Missing Data Processes: MNAR
http://www.stat.columbia.edu/~gelman/arm/missing.pdf
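The three mechanisms (MCAR, MAR, MNAR) can be illustrated with a small simulation. This is a minimal sketch under assumed settings, not the slides' own example. The function name `simulate`, the model y = 50 + 8x + noise (mean 50, SD 10), and the 30%/80% missingness rates are all hypothetical choices. The variable x is always observed. Only y goes missing: completely at random (MCAR), depending on the observed x (MAR), or depending on the unobserved y itself (MNAR). The complete-case mean of y stays close to 50 under MCAR and is pulled downward under MAR and MNAR. Under MAR the bias can be removed by methods that condition on x; under MNAR it cannot without modeling the missingness mechanism.

```python
import random
import statistics

def simulate(n=20000, seed=1):
    """Hypothetical simulation: complete-case mean of y under three mechanisms."""
    rng = random.Random(seed)
    # x ~ N(0, 1); y = 50 + 8x + N(0, 6^2), so y has mean 50 and SD 10.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [50.0 + 8.0 * xi + rng.gauss(0.0, 6.0) for xi in x]

    # MCAR: every y has the same 30% chance of being missing.
    mcar = [yi for yi in y if rng.random() >= 0.3]
    # MAR: missingness depends only on the observed x (large x -> often missing).
    mar = [yi for xi, yi in zip(x, y) if not (xi > 1.0 and rng.random() < 0.8)]
    # MNAR: missingness depends on the unobserved y itself (large y -> often missing).
    mnar = [yi for yi in y if not (yi > 60.0 and rng.random() < 0.8)]

    return {
        "full": statistics.fmean(y),
        "MCAR": statistics.fmean(mcar),
        "MAR": statistics.fmean(mar),
        "MNAR": statistics.fmean(mnar),
    }

if __name__ == "__main__":
    for name, mean in simulate().items():
        print(f"{name:>5}: mean of observed y = {mean:.2f}")
```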
10
Taxonomy of Missing-Data Methods
- Complete Case Analysis (Listwise Deletion)
- Available Case Analysis (Pairwise Deletion)
- Least Squares on Imputed Data
- Multiple Imputation
- Maximum Likelihood (and Bayes)
Little, R., & Rubin, D. (2002). Introduction. In Statistical Analysis with Missing Data (2nd ed., pp. 19-20). Hoboken, New Jersey: John Wiley & Sons.
11
Complete Case Analysis (Listwise Deletion)
- Easy to implement
- Works well when the MCAR assumption is met
- Wastes a lot of information: any case with a missing value on any analysis variable is dropped entirely
http://statistics.ucla.edu/system/resources/BAhbBlsHOgZmSSI7MjAxMi8wNS8yOS8xNF80OF8wOV83M19SZWdyZXNzaW9uX3dpdGhfTWlzc2luZ19YX3MucGRmBjoGRVQ/Regression%20with%20Missing%20X's.pdf
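Listwise deletion amounts to a single filter over the rows. The sketch below uses hypothetical toy records and a hypothetical helper name (`listwise_delete`), with `None` standing in for a missing value. The second and third rows are discarded even though most of their values were observed. This is the information loss noted above.

```python
# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "educ": 16},
    {"age": 51, "income": None, "educ": 12},
    {"age": 29, "income": 41000, "educ": None},
    {"age": 45, "income": 67000, "educ": 18},
]

def listwise_delete(rows, variables):
    """Keep only the rows in which every analysis variable is observed."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

complete = listwise_delete(rows, ["age", "income", "educ"])
print(len(complete), "of", len(rows), "rows kept")  # 2 of 4 rows kept
```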
12
Available Case Analysis (Pairwise Deletion)
- Attempts to minimize the data loss of listwise deletion by using, for each covariance or correlation, every case observed on that pair of variables
- Increases power relative to listwise deletion
- Usually outperformed by Maximum Likelihood
- Caveat: can result in covariance matrices that are not positive definite
http://statistics.ucla.edu/system/resources/BAhbBlsHOgZmSSI7MjAxMi8wNS8yOS8xNF80OF8wOV83M19SZWdyZXNzaW9uX3dpdGhfTWlzc2luZ19YX3MucGRmBjoGRVQ/Regression%20with%20Missing%20X's.pdf
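The positive-definiteness caveat can be reproduced with a deliberately exaggerated toy data set. The data are hypothetical and the helper names `pairwise_corr` and `det3` are illustrative only. Each pair of variables is observed together in a different pair of cases, so the pairwise correlations are +1, +1 and -1. No real joint distribution can produce that combination. The resulting "correlation matrix" has a negative determinant. For a 3x3 symmetric matrix, that means at least one eigenvalue is negative, so the matrix is not positive (semi)definite.

```python
import math

# Hypothetical toy data (None = missing): each pair of variables is
# observed together in a different subset of cases.
data = {
    "x1": [1, 2, None, None, 1, 2],
    "x2": [1, 2, 1, 2, None, None],
    "x3": [None, None, 1, 2, 2, 1],
}

def pairwise_corr(a, b):
    """Pearson correlation using only the cases where both a and b are observed."""
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    n = len(pairs)
    ma = sum(u for u, _ in pairs) / n
    mb = sum(v for _, v in pairs) / n
    sab = sum((u - ma) * (v - mb) for u, v in pairs)
    saa = sum((u - ma) ** 2 for u, _ in pairs)
    sbb = sum((v - mb) ** 2 for _, v in pairs)
    return sab / math.sqrt(saa * sbb)

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

names = ["x1", "x2", "x3"]
R = [[1.0 if i == j else pairwise_corr(data[a], data[b])
      for j, b in enumerate(names)] for i, a in enumerate(names)]
print(R)
print("determinant:", det3(R))  # -4.0, so R is not positive definite
```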
13
Least Squares Imputation Methods
- Unconditional Mean Substitution
- Conditional Mean Imputation based on X
- Conditional Mean Imputation based on X and Y
http://statistics.ucla.edu/system/resources/BAhbBlsHOgZmSSI7MjAxMi8wNS8yOS8xNF80OF8wOV83M19SZWdyZXNzaW9uX3dpdGhfTWlzc2luZ19YX3MucGRmBjoGRVQ/Regression%20with%20Missing%20X's.pdf
14
Unconditional Mean Substitution
- Replace every missing value with the sample mean of the observed values
- Heavily biases the covariance matrix: variances and covariances are shrunk toward zero
- The bias can be corrected, but inferences (confidence intervals, tests, etc.) remain distorted and over-precise
http://statistics.ucla.edu/system/resources/BAhbBlsHOgZmSSI7MjAxMi8wNS8yOS8xNF80OF8wOV83M19SZWdyZXNzaW9uX3dpdGhfTWlzc2luZ19YX3MucGRmBjoGRVQ/Regression%20with%20Missing%20X's.pdf
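The variance shrinkage is easy to see on a hypothetical six-value variable. Filling the two gaps with the observed mean adds cases with zero deviation. The sum of squared deviations therefore stays at 20 while the divisor grows from 3 to 5, so the sample variance falls from 20/3 to 4.

```python
import statistics

# Hypothetical toy variable with two missing entries (None).
x = [2.0, 4.0, 6.0, 8.0, None, None]

observed = [v for v in x if v is not None]
fill = statistics.fmean(observed)                  # 5.0
imputed = [fill if v is None else v for v in x]   # missing -> observed mean

print(statistics.variance(observed))  # 6.666666666666667 (= 20/3)
print(statistics.variance(imputed))   # 4.0: same sum of squares, more cases
```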
15
Conditional Mean Imputation
http://statistics.ucla.edu/system/resources/BAhbBlsHOgZmSSI7MjAxMi8wNS8yOS8xNF80OF8wOV83M19SZWdyZXNzaW9uX3dpdGhfTWlzc2luZ19YX3MucGRmBjoGRVQ/Regression%20with%20Missing%20X's.pdf
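Conditional mean imputation based on X replaces each missing value with its predicted value from a regression on the observed variables. The sketch below is a minimal one-predictor version on hypothetical data: covariate `x2` is incomplete and `x1` is fully observed. The helper `fit_ols` is an illustrative name, not a library function. Every imputed point lies exactly on the fitted line. That is why this method still understates variability, and it motivates adding random noise in multiple imputation.

```python
# Hypothetical toy data: covariate x2 is missing for two cases; x1 is fully observed.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 4.0, None, 8.0, None, 12.0]

def fit_ols(xs, ys):
    """Least-squares intercept and slope of the line ys ~ xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Fit x2 ~ x1 on the cases where x2 is observed ...
cc = [(a, b) for a, b in zip(x1, x2) if b is not None]
b0, b1 = fit_ols([a for a, _ in cc], [b for _, b in cc])
# ... then replace each missing x2 with its conditional mean given x1.
x2_imputed = [b0 + b1 * a if b is None else b for a, b in zip(x1, x2)]
print(x2_imputed)  # [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
```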
16
Multiple Imputation
Little, R., & Rubin, D. (2002). Introduction. In Statistical Analysis with Missing Data (2nd ed., pp. 19-20). Hoboken, New Jersey: John Wiley & Sons.
17
Steps Involved in Multiple Imputation
- Introduce random variation into the process of imputing missing values
- Generate several data sets, each with different imputed values
- Perform an analysis on each data set
- Combine the results into a single set of parameter estimates, standard errors, and test statistics
http://support.sas.com/resources/papers/proceedings12/312-2012.pdf
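The four steps above can be sketched end to end for the simplest case: one variable, no covariates, and ignorable (MCAR-like) missingness. The imputation method here is swapped in for illustration. It is the approximate Bayesian bootstrap, which is not necessarily what the SAS paper uses. Each imputation first resamples the observed values, which reflects parameter uncertainty, and then draws the missing entries from that resample. The function names, the data, and m = 20 are hypothetical. The analysis estimates the mean, and the results are pooled with Rubin's rules.

```python
import random
import statistics

def abb_impute(values, rng):
    """One approximate-Bayesian-bootstrap imputation of the None entries."""
    observed = [v for v in values if v is not None]
    # Random variation: bootstrap-resample the observed values ...
    donors = rng.choices(observed, k=len(observed))
    # ... then fill each missing entry with a random draw from that resample.
    return [rng.choice(donors) if v is None else v for v in values]

def analyze(completed):
    """Complete-data analysis: the sample mean and its estimated variance."""
    return statistics.fmean(completed), statistics.variance(completed) / len(completed)

def pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance from m analyses."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    return q_bar, w + (1 + 1 / m) * b     # total variance T

def multiple_imputation(values, m=20, seed=0):
    rng = random.Random(seed)
    results = [analyze(abb_impute(values, rng)) for _ in range(m)]  # steps 1-3
    return pool([e for e, _ in results], [v for _, v in results])   # step 4

# Hypothetical incomes (in thousands) with three nonrespondents.
incomes = [42.0, 55.0, None, 61.0, 38.0, None, 47.0, 70.0, None, 52.0]
est, total_var = multiple_imputation(incomes)
print(f"pooled mean = {est:.2f}, SE = {total_var ** 0.5:.2f}")
```

Because every imputed value is an observed value, each completed-data mean, and hence the pooled mean, stays within the observed range.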
18
Introducing Randomness into an M.I. Model
http://support.sas.com/resources/papers/proceedings12/312-2012.pdf
19
Adding Variability to the Imputed Values
http://support.sas.com/resources/papers/proceedings12/312-2012.pdf
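One common way to add variability is stochastic regression imputation. The fitted value is replaced by the fitted value plus a random residual drawn from N(0, σ²), where σ is the residual SD of the complete-case fit. The sketch below compares this with the deterministic fill on hypothetical simulated data (y = 1 + 2x + noise, every third y deleted). The helper names are illustrative. A fully "proper" imputation would also draw the regression coefficients and σ from their posterior before each imputation; this sketch adds only the residual noise. The deterministic fill shrinks the variance of y, while the stochastic fill keeps it close to the complete-data value.

```python
import random
import statistics

def fit_ols(xs, ys):
    """Intercept, slope, and residual SD of the least-squares line ys ~ xs."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(xs, ys))
    return intercept, slope, (rss / (n - 2)) ** 0.5

def regression_impute(x, y, rng=None):
    """Fill None entries of y from the complete-case fit of y on x.

    With rng=None the fitted value is used (deterministic conditional mean);
    otherwise a N(0, sigma^2) residual draw is added (stochastic imputation).
    """
    cc = [(a, b) for a, b in zip(x, y) if b is not None]
    b0, b1, sigma = fit_ols([a for a, _ in cc], [b for _, b in cc])
    filled = []
    for a, b in zip(x, y):
        if b is None:
            b = b0 + b1 * a
            if rng is not None:
                b += rng.gauss(0.0, sigma)
        filled.append(b)
    return filled

# Hypothetical data: y = 1 + 2x + N(0, 3^2) noise, with every third y deleted.
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(3000)]
y_full = [1 + 2 * a + rng.gauss(0, 3) for a in x]
y = [None if i % 3 == 0 else b for i, b in enumerate(y_full)]

det = regression_impute(x, y)
sto = regression_impute(x, y, rng=rng)
print("variance, complete data:     ", round(statistics.variance(y_full), 1))
print("variance, deterministic fill:", round(statistics.variance(det), 1))
print("variance, stochastic fill:   ", round(statistics.variance(sto), 1))
```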
20
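Why Do We Want to Add Variability?
- This is the whole point of multiple imputation: imputed values are uncertain, and a single deterministic fill treats them as if they had been observed, so standard errors come out too small
http://www.stat.columbia.edu/~gelman/arm/missing.pdf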
21
Combining Inferences from Imputed Data
http://support.sas.com/resources/papers/proceedings12/312-2012.pdf
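For reference, Rubin's standard combining rules are written below in our own notation, which may differ from the slide's. Let $\hat Q_j$ be the estimate and $U_j$ its squared standard error from imputed data set $j = 1, \dots, m$. The between-imputation term $B$ is exactly what a single imputation cannot capture.

```latex
\begin{aligned}
\bar Q &= \frac{1}{m}\sum_{j=1}^{m} \hat Q_j
  && \text{(pooled estimate)} \\
\bar U &= \frac{1}{m}\sum_{j=1}^{m} U_j
  && \text{(within-imputation variance)} \\
B &= \frac{1}{m-1}\sum_{j=1}^{m} \bigl(\hat Q_j - \bar Q\bigr)^2
  && \text{(between-imputation variance)} \\
T &= \bar U + \Bigl(1 + \frac{1}{m}\Bigr) B
  && \text{(total variance)} \\
\nu &= (m-1)\left(1 + \frac{\bar U}{(1 + 1/m)\,B}\right)^{2}
  && \text{(degrees of freedom for } (\bar Q - Q)/\sqrt{T}\text{)}
\end{aligned}
```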
22
Simplified Form using a Regression Example
http://www.stat.columbia.edu/~gelman/arm/missing.pdf
23
Likelihood-Based Inference
https://www.amstat.org/sections/srms/webinarfiles/ModernMethodWebinarMay2012.pdf
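The usual argument for why the missingness mechanism can be ignored in likelihood-based inference runs as follows. It follows the standard presentation in, for example, Little and Rubin. Let $M$ be the missingness indicators, $\theta$ the data-model parameters, and $\psi$ the mechanism parameters. Under MAR, $M$ depends only on $Y_{\text{obs}}$, so the joint likelihood factors. When $\theta$ and $\psi$ are distinct, maximizing over $\theta$ needs only $f(Y_{\text{obs}} \mid \theta)$.

```latex
\begin{aligned}
f(Y_{\text{obs}}, M \mid \theta, \psi)
  &= \int f(Y_{\text{obs}}, Y_{\text{mis}} \mid \theta)\,
          f(M \mid Y_{\text{obs}}, Y_{\text{mis}}, \psi)\, dY_{\text{mis}} \\
  &= f(M \mid Y_{\text{obs}}, \psi)
     \int f(Y_{\text{obs}}, Y_{\text{mis}} \mid \theta)\, dY_{\text{mis}}
     \qquad \text{(MAR)} \\
  &= f(M \mid Y_{\text{obs}}, \psi)\, f(Y_{\text{obs}} \mid \theta),
  \qquad\text{so}\qquad
  L(\theta \mid Y_{\text{obs}}) \propto f(Y_{\text{obs}} \mid \theta).
\end{aligned}
```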
24
ML with Ignorable Missing Data
https://www.amstat.org/sections/srms/webinarfiles/ModernMethodWebinarMay2012.pdf
25
ML with Ignorable Missing Data
https://www.amstat.org/sections/srms/webinarfiles/ModernMethodWebinarMay2012.pdf
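A classic closed-form case of ML with ignorable missing data is the bivariate normal with x fully observed and y missing at random given x. Here the factored likelihood gives the ML estimate of the mean of y: the complete-case mean plus the complete-case slope times the difference between the all-case and complete-case means of x. This example is swapped in for illustration and is not taken from the webinar slides. The function name and the noise-free toy data are hypothetical. Because y is missing exactly where x is large, the complete-case mean is biased low. The ML estimate recovers the mean of the true values 2x (7.0).

```python
import statistics

def ml_mean_y(x, y):
    """ML estimate of E[y] for bivariate normal (x, y) when x is complete and
    y is missing at random given x (monotone pattern, factored likelihood)."""
    cc = [(a, b) for a, b in zip(x, y) if b is not None]
    xc = [a for a, _ in cc]
    yc = [b for _, b in cc]
    mx_cc, my_cc = statistics.fmean(xc), statistics.fmean(yc)
    # Regression of y on x, estimated from the complete cases.
    slope = (sum((a - mx_cc) * (b - my_cc) for a, b in cc)
             / sum((a - mx_cc) ** 2 for a in xc))
    # The mean of x uses every case, including those whose y is missing.
    mx_all = statistics.fmean(x)
    return my_cc + slope * (mx_all - mx_cc)

# Hypothetical toy data: y is missing exactly where x is large (MAR given x).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, None, None]

print(statistics.fmean([b for b in y if b is not None]))  # 5.0 (complete-case mean)
print(ml_mean_y(x, y))                                    # 7.0 (ML estimate)
```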
26
Comparison of Methods
Listwise
- Easiest to implement
- Has minimal effect if data are MCAR, or MAR for large sample sizes
- Has a tendency to bias results
Pairwise
- Uses more information than listwise
- Increases statistical power
- Also easy to implement
Multiple Imputation / Maximum Likelihood
- Requires no special software once the imputed datasets are generated
- Requires specification of a model
- Requires more assumptions
- Requires specification of a model for each variable
- Most asymptotically efficient
- Most complex
- You get model comparison statistics (AIC, BIC, etc.)