How to Handle Missing Values in Multivariate Data, by Jeff McNeal & Marlen Roberts


1 How to Handle Missing Values in Multivariate Data By Jeff McNeal & Marlen Roberts

2 The Missing Data Problem
- Problems with Statistical Inference
- Sample Size & Power
- Biased Results
Source: Little, R., & Rubin, D. (2002). Introduction. In Statistical Analysis with Missing Data (2nd ed., pp. 1-2). Hoboken, New Jersey: John Wiley & Sons.

3 Real World Examples
- Respondents in a household survey refuse to report income
- Missing results of manufacturing experiment due to equipment failure
- Voters’ inability to express preference for a political candidate in an opinion poll
Source: Little, R., & Rubin, D. (2002). Introduction. In Statistical Analysis with Missing Data (2nd ed., pp. 1-2). Hoboken, New Jersey: John Wiley & Sons.

4 Outline
- Common Assumptions and Missing Data Patterns
- Taxonomy of Methods for Handling Missing Values
- Multiple Imputation
- Maximum Likelihood
- Simulation

5 Missing Data Patterns
- Not all missing data are created equal
- Missing due to a random process
- Missing due to a non-random process

6 A Simple Example: Income Survey
Source: Westfall, P., & Henning, K. (2013). Understanding Advanced Statistical Methods (1st ed.). Boca Raton, Florida: CRC Press, Taylor & Francis Group.

7 Univariate Missing Data Process: MCAR
Source: P.H. Westfall

8 Multivariate Missing Data Processes: MCAR and MAR
Source: http://support.sas.com/resources/papers/proceedings12/312-2012.pdf

9 Missing Data Processes: MNAR
Source: http://www.stat.columbia.edu/~gelman/arm/missing.pdf
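
The three missing-data processes named on slides 7 to 9 can be illustrated with a small simulation. This is only a sketch under assumed numbers: the age and income model, the drop probabilities, and every variable name are hypothetical and do not come from the slides. It shows how the mean of the reported incomes behaves when values go missing completely at random (MCAR), at random given an observed variable (MAR), or not at random (MNAR).

```python
import random
import statistics

rng = random.Random(0)
n = 20000

# Hypothetical survey: income (in thousands) rises with age, plus noise.
age = [rng.uniform(20, 65) for _ in range(n)]
income = [20 + a + rng.gauss(0, 10) for a in age]


def keep_mask(drop_probs):
    """Return True where a value is reported, given per-row drop probabilities."""
    return [rng.random() >= p for p in drop_probs]


# MCAR: every respondent has the same chance of not reporting income.
mcar_keep = keep_mask([0.3] * n)
# MAR: the chance depends only on age, which is always observed.
mar_keep = keep_mask([0.6 if a > 45 else 0.1 for a in age])
# MNAR: the chance depends on the unreported income itself.
mnar_keep = keep_mask([0.6 if y > 70 else 0.1 for y in income])


def observed_mean(values, keep):
    """Mean of the values that were actually reported."""
    return statistics.fmean([v for v, k in zip(values, keep) if k])


true_mean = statistics.fmean(income)
mcar_mean = observed_mean(income, mcar_keep)
mar_mean = observed_mean(income, mar_keep)
mnar_mean = observed_mean(income, mnar_keep)

print(f"true {true_mean:.1f}  MCAR {mcar_mean:.1f}  "
      f"MAR {mar_mean:.1f}  MNAR {mnar_mean:.1f}")
```

In this setup the MCAR mean stays close to the full-data mean, while the MAR and MNAR means are pulled downward because high earners report less often. The MAR bias could in principle be removed by conditioning on age; the MNAR bias cannot be removed using the observed data alone.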

10 Taxonomy of Missing-Data Methods
- Complete Case Analysis (Listwise Deletion)
- Available Case Analysis (Pairwise Deletion)
- Least Squares on Imputed Data
- Multiple Imputation
- Maximum Likelihood (and Bayes)
Source: Little, R., & Rubin, D. (2002). Introduction. In Statistical Analysis with Missing Data (2nd ed., pp. 19-20). Hoboken, New Jersey: John Wiley & Sons.

11 Complete Case Analysis (Listwise Deletion)
- Easy to implement
- Works well when MCAR assumption is met
- Wastes a lot of information
Source: http://statistics.ucla.edu/system/resources/BAhbBlsHOgZmSSI7MjAxMi8wNS8yOS8xNF80OF8wOV83M19SZWdyZXNzaW9uX3dpdGhfTWlzc2luZ19YX3MucGRmBjoGRVQ/Regression%20with%20Missing%20X's.pdf

12 Available Case Analysis (Pairwise Deletion)
- Attempts to minimize the loss of data in listwise deletion
- Increases the power of your test
- Usually is outperformed by Maximum Likelihood
- Caveat: Can result in non-positive definite covariance matrices
Source: http://statistics.ucla.edu/system/resources/BAhbBlsHOgZmSSI7MjAxMi8wNS8yOS8xNF80OF8wOV83M19SZWdyZXNzaW9uX3dpdGhfTWlzc2luZ19YX3MucGRmBjoGRVQ/Regression%20with%20Missing%20X's.pdf
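
The difference between listwise and pairwise deletion can be sketched on a toy data set. The rows, variable names, and helper functions below are made up for illustration and are not from the slides. Listwise deletion keeps only rows with no missing entries; pairwise deletion keeps, for each pair of variables, every row where both are observed.

```python
from itertools import combinations

# Hypothetical data: three variables, None marks a missing value.
names = ["x", "y", "z"]
rows = [
    (1.0, 2.0, 3.0),
    (2.0, None, 1.0),
    (3.0, 4.0, None),
    (4.0, 5.0, 6.0),
    (None, 7.0, 8.0),
    (6.0, 8.0, 9.0),
]


def complete_cases(rows):
    """Listwise deletion: drop any row that has a missing entry."""
    return [r for r in rows if all(v is not None for v in r)]


def pairwise_cases(rows, i, j):
    """Pairwise deletion: keep rows where both variables i and j are observed."""
    return [(r[i], r[j]) for r in rows
            if r[i] is not None and r[j] is not None]


def covariance(pairs):
    """Sample covariance of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    return sum((a - mean_a) * (b - mean_b) for a, b in pairs) / (n - 1)


listwise_n = len(complete_cases(rows))
pairwise_n = {
    (names[i], names[j]): len(pairwise_cases(rows, i, j))
    for i, j in combinations(range(len(names)), 2)
}
cov_xy = covariance(pairwise_cases(rows, 0, 1))

print("listwise rows:", listwise_n)
print("pairwise rows per pair:", pairwise_n)
```

Each pairwise covariance here is computed from a different subset of rows, which is why the assembled covariance matrix is not guaranteed to be positive definite.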

13 Least Squares Imputation Methods
- Unconditional Mean Substitution
- Conditional Mean Imputation based on X
- Conditional Mean Imputation based on X and Y
Source: http://statistics.ucla.edu/system/resources/BAhbBlsHOgZmSSI7MjAxMi8wNS8yOS8xNF80OF8wOV83M19SZWdyZXNzaW9uX3dpdGhfTWlzc2luZ19YX3MucGRmBjoGRVQ/Regression%20with%20Missing%20X's.pdf

14 Unconditional Mean Substitution
- Just take the sample mean of the observed data and use it for the missing values
- Heavily biases the covariance matrix
- Bias can be corrected but the inferences (confidence intervals, tests, etc.) are distorted and over-precise
Source: http://statistics.ucla.edu/system/resources/BAhbBlsHOgZmSSI7MjAxMi8wNS8yOS8xNF80OF8wOV83M19SZWdyZXNzaW9uX3dpdGhfTWlzc2luZ19YX3MucGRmBjoGRVQ/Regression%20with%20Missing%20X's.pdf
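
A tiny worked example makes the variance distortion from mean substitution concrete. The six values below are invented for illustration. Filling the gaps with the observed mean leaves the mean unchanged but shrinks the sample variance, because the filled-in points add no spread.

```python
import statistics

# Hypothetical variable with two missing entries (None).
values = [2.0, None, 4.0, 6.0, None, 8.0]

observed = [v for v in values if v is not None]
mean_obs = statistics.fmean(observed)

# Unconditional mean substitution: every gap gets the observed mean.
imputed = [mean_obs if v is None else v for v in values]

var_observed = statistics.variance(observed)  # based on 4 observed values
var_imputed = statistics.variance(imputed)    # based on 6 values, 2 of them filled in

print(mean_obs, var_observed, var_imputed)
```

Here the observed values have sample variance 20/3 (about 6.67), while the mean-substituted data have variance 4.0, so anything built on that variance, such as a standard error, looks more precise than it should.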

15 Conditional Mean Imputation
Source: http://statistics.ucla.edu/system/resources/BAhbBlsHOgZmSSI7MjAxMi8wNS8yOS8xNF80OF8wOV83M19SZWdyZXNzaW9uX3dpdGhfTWlzc2luZ19YX3MucGRmBjoGRVQ/Regression%20with%20Missing%20X's.pdf
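
One common form of conditional mean imputation is regression imputation: fit a least-squares line on the complete cases and fill each gap with its fitted value. The sketch below assumes a single fully observed predictor `x2` and a partly missing variable `x1`; the data and the helper `fit_line` are hypothetical and are not taken from the slide.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: x2 is fully observed, x1 has two missing entries.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x1 = [2.0, 4.0, None, 8.0, None]

# Fit on complete cases only.
complete = [(a, b) for a, b in zip(x2, x1) if b is not None]
intercept, slope = fit_line([a for a, _ in complete],
                            [b for _, b in complete])

# Replace each missing x1 with its conditional mean given x2.
x1_imputed = [intercept + slope * a if b is None else b
              for a, b in zip(x2, x1)]

print(x1_imputed)
```

The imputed points fall exactly on the fitted line, so they carry no residual scatter, and standard errors computed from the filled-in data remain too small.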

16 Multiple Imputation
Source: Little, R., & Rubin, D. (2002). Introduction. In Statistical Analysis with Missing Data (2nd ed., pp. 19-20). Hoboken, New Jersey: John Wiley & Sons.

17 Steps Involved in Multiple Imputation
- Introduce random variation into the process of imputing missing values
- Generate several data sets, each with different imputed values
- Perform an analysis on each data set
- Combine the results into a single set of parameter estimates, standard errors, and test statistics
Source: http://support.sas.com/resources/papers/proceedings12/312-2012.pdf
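
The steps listed above can be sketched end to end on simulated data. This is a deliberately simplified illustration, not the procedure from the cited paper: it adds only residual noise to regression predictions and does not redraw the regression parameters for each imputation, which a proper multiple-imputation routine would do. The data-generating model, the function names, and the choice of the mean of `y` as the analysis are all assumptions.

```python
import random
import statistics


def fit_line(xs, ys):
    """OLS fit of y on x; returns intercept, slope, residual std. deviation."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return intercept, slope, sigma


def impute_once(x, y, rng):
    """Steps 1-2: fill each gap with prediction plus a random residual."""
    obs = [(a, b) for a, b in zip(x, y) if b is not None]
    intercept, slope, sigma = fit_line([a for a, _ in obs],
                                       [b for _, b in obs])
    return [b if b is not None
            else intercept + slope * a + rng.gauss(0, sigma)
            for a, b in zip(x, y)]


def multiply_impute(x, y, m=5, seed=0):
    """Step 3: analyze each completed data set (here: the mean of y)."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(x, y, rng)
        estimates.append(statistics.fmean(completed))
        # Squared standard error of the mean within this data set.
        variances.append(statistics.variance(completed) / len(completed))
    return estimates, variances


# Hypothetical data: y depends on x, and y goes missing more often
# when x is large (missingness depends on observed x only).
data_rng = random.Random(1)
x = [data_rng.uniform(0, 10) for _ in range(200)]
y_full = [1 + 2 * a + data_rng.gauss(0, 1) for a in x]
y = [None if a > 6 and data_rng.random() < 0.5 else b
     for a, b in zip(x, y_full)]

estimates, variances = multiply_impute(x, y, m=5)
# Step 4 (point estimate only): average the per-data-set estimates.
pooled_estimate = statistics.fmean(estimates)
print(estimates, pooled_estimate)
```

Because each completed data set gets different random residuals, the five estimates differ slightly, and that spread is what carries the uncertainty due to the missing values.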

18 Introducing Randomness into a M.I. Model
Source: http://support.sas.com/resources/papers/proceedings12/312-2012.pdf

19 Adding Variability to the Imputed Values
Source: http://support.sas.com/resources/papers/proceedings12/312-2012.pdf

20 Why Do We Want to Add Variability?
- This is the whole point of multiple imputation
Source: http://www.stat.columbia.edu/~gelman/arm/missing.pdf

21 Combining Inferences from Imputed Data
Source: http://support.sas.com/resources/papers/proceedings12/312-2012.pdf
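
The slide's own formulas did not survive in this transcript. The standard way to combine results from m imputed data sets is Rubin's rules: average the estimates, average the within-imputation variances, and add the between-imputation variance inflated by (1 + 1/m). A minimal sketch follows; the function name `rubin_pool` and the numbers are illustrative only.

```python
import statistics


def rubin_pool(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)          # pooled point estimate
    within = statistics.fmean(variances)         # average within-imputation variance
    between = statistics.variance(estimates)     # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between       # total variance of the pooled estimate
    return q_bar, within, between, total


# Hypothetical results from m = 3 imputed data sets.
q_bar, within, between, total = rubin_pool([10.0, 12.0, 14.0],
                                           [1.0, 1.0, 1.0])
pooled_se = total ** 0.5
print(q_bar, within, between, total, pooled_se)
```

In this example the pooled estimate is 12.0, the within variance is 1.0, the between variance is 4.0, and the total variance is 1 + (4/3)(4), about 6.33. The pooled standard error is therefore larger than any single data set's standard error of 1.0.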

22 Simplified Form using a Regression Example
Source: http://www.stat.columbia.edu/~gelman/arm/missing.pdf

23 Likelihood-Based Inference
Source: https://www.amstat.org/sections/srms/webinarfiles/ModernMethodWebinarMay2012.pdf

24 ML with Ignorable Missing Data
Source: https://www.amstat.org/sections/srms/webinarfiles/ModernMethodWebinarMay2012.pdf

25 ML with Ignorable Missing Data
Source: https://www.amstat.org/sections/srms/webinarfiles/ModernMethodWebinarMay2012.pdf
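
The equations on these two slides are not preserved in the transcript. As a reminder of what "ignorable" usually means, here is the standard argument in the conventional notation of the missing-data literature. The symbols are assumptions, not copied from the slides: Y_obs and Y_mis are the observed and missing parts of the data, M is the missingness indicator, θ holds the data-model parameters, and ψ holds the missingness-model parameters.

```latex
% Joint density of the observed data and the missingness pattern:
f(Y_{obs}, M \mid \theta, \psi)
  = \int f(Y_{obs}, Y_{mis} \mid \theta)\,
         f(M \mid Y_{obs}, Y_{mis}, \psi)\, dY_{mis}

% Under MAR, M depends on the data only through Y_{obs}:
f(M \mid Y_{obs}, Y_{mis}, \psi) = f(M \mid Y_{obs}, \psi)

% so the missingness term factors out of the integral:
f(Y_{obs}, M \mid \theta, \psi)
  = f(M \mid Y_{obs}, \psi) \int f(Y_{obs}, Y_{mis} \mid \theta)\, dY_{mis}

% If \theta and \psi are distinct, inference about \theta can use
L(\theta \mid Y_{obs}) \propto \int f(Y_{obs}, Y_{mis} \mid \theta)\, dY_{mis}
```

In words: when the data are MAR and the two parameter sets are distinct, the missingness model can be ignored, and maximum likelihood only needs the data model integrated over the missing values.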

26 Comparison of Methods
Listwise
- Easiest to implement
- Has minimal effect if data are MCAR, or MAR for large sample sizes
- Has a tendency to bias results
Pairwise
- Uses more information than listwise
- Increases statistical power
- Also easy to implement
Multiple Imputation / Maximum Likelihood
- Requires no special software once the imputed datasets are generated
- Requires specification of a model
- Requires more assumptions
- Requires specification of a model for each variable
- Most asymptotically efficient
- Most complex
- You get model comparison statistics (AIC, BIC, etc.)

