Multiple Imputation
Multiple Imputation
Multiple imputation is a missing-data method developed by Donald Rubin. We simulate multiple samples of “complete” data and compute estimates and standard errors from each completed data set.
Rubin distinguished multiple imputations generated under different models from multiple imputations generated under the same model. We will focus on same-model multiple imputation.
Missing Data Mechanisms
MCAR (missing completely at random): the cases with missing data are a random subsample of the complete data, so missingness is unrelated to any values, observed or missing.
MAR (missing at random): missingness may depend on observed values, such as the independent variables, but not on the unobserved values of the response.
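A small simulation can make the MCAR/MAR distinction concrete. This is an illustrative sketch with made-up settings (standard normal X, Y = X + noise, and a logistic missingness model in X, all assumptions introduced here). Under MAR the complete-case mean of Y is biased, while the complete-case regression slope of Y on X is not, because missingness depends only on X.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 100_000
x = rng.normal(size=n)
y = x + rng.normal(size=n)          # true mean of Y is 0, true slope on X is 1

# MCAR: every response has the same 50% chance of being missing
obs_mcar = rng.random(n) > 0.5

# MAR: probability of missingness rises with x (observed), not with y itself
p_miss = 1.0 / (1.0 + np.exp(-2.0 * x))
obs_mar = rng.random(n) > p_miss

# Complete-case summaries (only rows with Y observed)
mean_mcar = y[obs_mcar].mean()
mean_mar = y[obs_mar].mean()
slope_mar = np.polyfit(x[obs_mar], y[obs_mar], 1)[0]   # polyfit returns [slope, intercept]

print(f"complete-case mean, MCAR:  {mean_mcar:.3f}")
print(f"complete-case mean, MAR:   {mean_mar:.3f}")
print(f"complete-case slope, MAR:  {slope_mar:.3f}")
```

Under MAR the observed cases over-represent small x, so the complete-case mean of Y drifts well below 0, while the slope stays near 1.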
Missing Data Mechanisms
Nonresponse is ignorable when:
the data are MAR (MCAR is a special case), and
the parameters of the missingness process are distinct from the parameters of the data model.
Example for discussion: growth curve models for largemouth bass.
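The reason these two conditions suffice can be sketched as a short derivation. The notation is introduced here as an assumption: R is the matrix of missingness indicators, θ the data-model parameters, and ψ the missingness-mechanism parameters.

```latex
\begin{aligned}
f(Y_{\text{obs}}, R \mid \theta, \psi)
  &= \int f(Y_{\text{obs}}, Y_{\text{mis}} \mid \theta)\,
        P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \psi)\, dY_{\text{mis}} \\
  &= P(R \mid Y_{\text{obs}}, \psi) \int f(Y_{\text{obs}}, Y_{\text{mis}} \mid \theta)\, dY_{\text{mis}}
     && \text{(MAR)} \\
  &= P(R \mid Y_{\text{obs}}, \psi)\, f(Y_{\text{obs}} \mid \theta)
\end{aligned}
```

When θ and ψ are distinct, the first factor carries no information about θ, so likelihood-based inference about θ can ignore the missingness model and work with f(Y_obs | θ) alone.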
Computer Example
Five teachers, three methods; Y = relative improvement ("." denotes a missing value).
Method A: 10, 7, 6, 11
Method B: 4, ., 8.5, 4.5, 3
Method C: 9, 13, 16, 8
Multiple Imputation Simulation
Make repeated draws, m = 1, …, M, from the posterior predictive distribution of the missing data. The completed data sets all share the same observed responses; only the imputed values differ.
In practice, there are numerous ways to generate completed data. Introductory methods rely on monotone missingness and on classic results for the conditional distributions of jointly multivariate normal random variables.
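As one concrete way to draw from the posterior predictive distribution when only Y is missing (a trivially monotone pattern), the sketch below uses the classic Bayesian normal linear regression imputation: draw σ² and β from their posterior under the noninformative prior p(β, σ²) ∝ 1/σ², then draw the missing responses given those parameters. The function name `impute_regression` and the simulated data are hypothetical, and this is not necessarily the method used for the SAS example in these notes.

```python
import numpy as np


def impute_regression(y, X, M, rng):
    """Return M completed copies of y; NaN entries are imputed by Bayesian
    normal linear regression of y on the fully observed design matrix X."""
    miss = np.isnan(y)
    Xo, yo = X[~miss], y[~miss]
    n_obs, p = Xo.shape
    df = n_obs - p                                   # residual degrees of freedom
    XtX_inv = np.linalg.inv(Xo.T @ Xo)
    beta_hat = XtX_inv @ Xo.T @ yo                   # OLS fit on the observed cases
    ssr = float(np.sum((yo - Xo @ beta_hat) ** 2))   # residual sum of squares
    completed = []
    for _ in range(M):
        # Posterior draw of sigma^2: SSR / chi-square(df)
        sigma2 = ssr / rng.chisquare(df)
        # Posterior draw of beta given sigma^2
        beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
        # Draw the missing responses from the predictive distribution
        y_m = y.copy()
        y_m[miss] = X[miss] @ beta + rng.normal(0.0, np.sqrt(sigma2), miss.sum())
        completed.append(y_m)
    return completed


# Usage with simulated (hypothetical) data: 50 cases, first 10 responses missing
rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])                 # intercept + one covariate
y = 2.0 + 3.0 * x + rng.normal(size=n)
y[:10] = np.nan
completed = impute_regression(y, X, M=5, rng=rng)
```

Drawing σ² and β anew for each imputation, rather than plugging in the OLS fit, is what carries parameter uncertainty into the between-imputation variability.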
Multiple Imputation Simulation
In a multivariate normal setting, where some values of Y are missing, we generate the imputations as draws from the conditional distribution of Y given X.
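For reference, the classic conditional-normal result is sketched below, assuming (Y, X) is jointly normal with means μ_Y and μ_X and covariance blocks Σ_YY, Σ_YX, Σ_XX (notation introduced here). In proper multiple imputation, the mean and covariance are themselves redrawn from their posterior for each imputation rather than held fixed at point estimates.

```latex
Y \mid X = x \;\sim\; N\!\left(\mu_Y + \Sigma_{YX}\Sigma_{XX}^{-1}(x - \mu_X),\;
\Sigma_{YY} - \Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}\right)
```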
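Multiple Imputation Estimation
Combining the results from the imputations for the parameters of interest is surprisingly straightforward. For example, let q represent the population marginal means (PMMs) for Method. From each completed data set m = 1, …, M we compute an estimate of q and its estimated variance, and then combine these across the M data sets.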
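Under Rubin's combining rules, write q̂_m and U_m for the estimate and its estimated variance from the m-th completed data set (notation assumed here). The quantities computed are then the average estimate, the average within-imputation variance, the between-imputation variance, and the total variance:

```latex
\bar{q} = \frac{1}{M}\sum_{m=1}^{M} \hat{q}_m, \qquad
\bar{U} = \frac{1}{M}\sum_{m=1}^{M} U_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M} \left(\hat{q}_m - \bar{q}\right)^2, \qquad
T = \bar{U} + \left(1 + \frac{1}{M}\right) B
```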
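Multiple Imputation Estimation
The combined estimate is the average of the M completed-data estimates, and its standard error combines the within-imputation variance (the average of the M completed-data variances) with the between-imputation variance of the estimates.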
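A minimal sketch of Rubin's rules follows. It takes the M completed-data estimates and their squared standard errors and returns the pooled estimate, its standard error, and Rubin's (1987) degrees of freedom for the t reference distribution. The function name `rubin_combine` and the input numbers are hypothetical, chosen only to illustrate pooling one LS-mean.

```python
import math
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Pool M completed-data estimates q_m and their variances U_m (squared SEs)."""
    M = len(estimates)
    q_bar = mean(estimates)                 # combined point estimate
    u_bar = mean(variances)                 # within-imputation variance
    b = variance(estimates)                 # between-imputation variance (divisor M - 1)
    t = u_bar + (1 + 1 / M) * b             # total variance
    # Rubin's degrees of freedom: (M - 1) * (1 + 1/r)^2, r = (1 + 1/M) * B / U_bar
    df = (M - 1) * (1 + u_bar / ((1 + 1 / M) * b)) ** 2 if b > 0 else math.inf
    return q_bar, math.sqrt(t), df


# Hypothetical LS-mean estimates for one method from M = 5 imputations,
# with their squared standard errors (made-up numbers for illustration)
estimates = [8.1, 8.4, 7.9, 8.3, 8.0]
variances = [0.50, 0.48, 0.52, 0.49, 0.51]
q_bar, se, df = rubin_combine(estimates, variances)
print(f"estimate = {q_bar:.3f}, SE = {se:.3f}, df = {df:.1f}")
```

The factor (1 + 1/M) inflates the between-imputation variance to account for using a finite number of imputations.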
Multiple Imputation Estimation
Combining estimates in SAS is nonstandard. Our LSMeans example is atypical and more straightforward than most.