Missing Data Mechanisms

1 Missing Data Mechanisms
MCAR: missing completely at random
MAR: missing at random
MNAR: missing not at random
References:
Schafer, J., Graham, J.W. Missing data: our view of the state of the art. Psychological Methods, 7(2), 2002.
Raghunathan, T.E. What do we do with missing data? Some options for analysis of incomplete data. Ann. Rev. Public Health 25, 2004.

2 Graphical representation
Y = variable partly missing
X = variable completely observed
Z = cause of missingness (unrelated to Y)
R = represents missingness

3 [Figure: three path diagrams relating X, Z, R, and Y, one panel each for MCAR, MAR, and MNAR]

4 Use of conditional probability
Yc = the complete vector of Y observations: Yc = (Yo, Ym), with Yo the observed part and Ym the missing part
MCAR: P(R | Yc) = P(R). The probability of missingness depends on neither Yo nor Ym.
MAR: P(R | Yc) = P(R | Yo). The probability of missingness depends only on Yo.
MNAR: P(R | Yc) does not reduce to P(R | Yo). The probability of missingness depends on the unobserved Ym.

5 Methods for analyzing data with missing values in the repeated measures situation
Case deletion: delete subjects with missing components (complete case analysis)
Available case analysis: analysis is based on all observable data (use data from subjects with complete Y vectors as well as incomplete Y vectors)
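The difference between the two approaches can be sketched on a small repeated-measures array. The data below are hypothetical and only illustrate which rows each method uses; np.nan marks a missing measurement.

```python
import numpy as np

# Hypothetical repeated measures: 5 subjects (rows) x 3 visits (columns).
y = np.array([
    [10.0, 12.0, 14.0],
    [11.0, np.nan, 15.0],
    [9.0, 11.0, np.nan],
    [12.0, 13.0, 16.0],
    [np.nan, 10.0, 13.0],
])

# Case deletion: keep only subjects with no missing components.
complete_rows = ~np.isnan(y).any(axis=1)
cc_means = y[complete_rows].mean(axis=0)

# Available case analysis: each visit mean uses every observed value,
# so different visits may be based on different subjects.
ac_means = np.nanmean(y, axis=0)
ac_counts = (~np.isnan(y)).sum(axis=0)

print("complete-case means:", cc_means, "from", complete_rows.sum(), "subjects")
print("available-case means:", ac_means, "from", ac_counts, "values per visit")
```

Here case deletion uses 2 of the 5 subjects, while the available case means each use 4 values.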

6 Simulation Study
Parameter   MCAR     MAR      MNAR
Mean(Y)     (7.0)    (19.3)   (30.7)
Std(Y)      (5.3)    (5.8)    12.2 (13.2)
Rho         (0.2)    (0.37)   0.34 (0.36)
Beta Y|X    (0.27)   (0.51)   0.21 (0.43)
Beta X|Y    (0.25)   (0.44)   (0.52)
Generate: 50 observations from bivariate normal (Y, X)
MCAR: prob Y missing is 0.73 (high!)
MAR: Y missing if X < 141
MNAR: Y missing if Y < 141
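The generating step can be sketched as follows. This is an illustration under assumptions, not the authors' code: the slide does not give the bivariate normal parameters, so the sketch takes mean 125, SD 25 and correlation 0.6 for both variables from the true values listed in the later tables (the mean of X in particular is assumed). Under those values P(X < 141) is roughly 0.74, in line with the 0.73 missing rate quoted for MCAR. The function names are hypothetical.

```python
import numpy as np

def simulate(n=50, mu=125.0, sigma=25.0, rho=0.6, seed=0):
    """Draw (Y, X) from a bivariate normal; equal means and SDs are assumed."""
    rng = np.random.default_rng(seed)
    cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
    y, x = rng.multivariate_normal([mu, mu], cov, size=n).T
    return y, x, rng

def apply_missingness(y, x, mechanism, rng, cutoff=141.0, p_miss=0.73):
    """Return a copy of y with np.nan wherever Y is set to missing."""
    if mechanism == "MCAR":
        missing = rng.random(len(y)) < p_miss   # unrelated to X and Y
    elif mechanism == "MAR":
        missing = x < cutoff                    # depends on observed X only
    elif mechanism == "MNAR":
        missing = y < cutoff                    # depends on Y itself
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    out = y.copy()
    out[missing] = np.nan
    return out

# n is raised well above the slide's 50 so the pattern is visible in one draw.
y, x, rng = simulate(n=5000)
cc_mean = {m: np.nanmean(apply_missingness(y, x, m, rng))
           for m in ("MCAR", "MAR", "MNAR")}
print(cc_mean)
```

Under these assumptions the mean of the observed Y values stays near 125 for MCAR and shifts upward for MAR and further for MNAR.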

7 Methods for analyzing survey data
Weight responses that are present
Average the available items (used in the social sciences, based on standardized scores, but not studied in any systematic fashion)

8 Single imputation
MS: mean substitution
HD: hot deck
CM: conditional mean
PD: predictive distribution
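A minimal sketch of the four rules for one partly missing Y and a complete X follows. The details are assumptions made for illustration: the hot deck draws a random observed donor without using X, the conditional mean is a linear regression of Y on X fitted to the complete cases, and the predictive distribution adds normal residual noise to that prediction without drawing the regression parameters. The function name impute_single is hypothetical.

```python
import numpy as np

def impute_single(x, y, method, rng):
    """Fill np.nan entries of y using one of four single-imputation rules."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)
    obs = ~miss
    filled = y.copy()
    if method == "MS":        # mean substitution
        filled[miss] = y[obs].mean()
    elif method == "HD":      # hot deck: copy a randomly chosen observed value
        filled[miss] = rng.choice(y[obs], size=miss.sum(), replace=True)
    elif method in ("CM", "PD"):
        # Regression of Y on X fitted to the complete cases.
        slope, intercept = np.polyfit(x[obs], y[obs], 1)
        pred = intercept + slope * x[miss]
        if method == "PD":    # add a residual draw to the conditional mean
            resid = y[obs] - (intercept + slope * x[obs])
            pred = pred + rng.normal(0.0, resid.std(ddof=2), size=miss.sum())
        filled[miss] = pred
    else:
        raise ValueError(f"unknown method: {method}")
    return filled

# Hypothetical data with a MAR-style pattern: Y missing when X is low.
rng = np.random.default_rng(1)
x_demo = rng.normal(125.0, 25.0, 200)
y_demo = 50.0 + 0.6 * x_demo + rng.normal(0.0, 20.0, 200)
y_demo[x_demo < 125.0] = np.nan
for method in ("MS", "HD", "CM", "PD"):
    filled = impute_single(x_demo, y_demo, method, rng)
    print(method, round(filled.mean(), 1), round(filled.std(), 1))
```

Mean substitution places every imputed value at the observed mean, which necessarily shrinks the standard deviation of the filled-in variable.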

9 Average parameter estimate (RMSE)
Mechanism: MCAR. Entries are average parameter estimate (RMSE).
Methods, in order: MS, HD, CM, PD
µ = 125.0: 125.1 (7.18), 125.2 (7.89), (6.26), (6.57)
σ = 25.0: 12.3 (13.0), 23.4 (5.40), 18.2 (8.57), 24.7 (5.37)
ρ = .60: .30 (.32), .16 (.46), .79 (.27), .59 (.20)
βy|x = .60: (.45), (.47), .61 (.25), .60
βx|y = .60: (.26), .17, 1.12 (.64), (.24)

10 Average parameter estimate (RMSE)
Mechanism: MAR. Entries are average parameter estimate (RMSE).
Methods, in order: MS, HD, CM, PD
µ = 125.0: 143.5 (19.4), (19.5), 124.9 (18.1), 124.8 (18.3)
σ = 25.0: 10.6 (14.6), 20.0 (6.68), 20.4 (10.7), 27.0 (8.77)
ρ = .60: .08 (.52), .04 (.57), .64 (.48), .50 (.40)
βy|x = .60: (.56), .61, .62
βx|y = .60: .20 (.44), .06, .78 (.75), .45

11 Average parameter estimate (RMSE)
Mechanism: MNAR. Entries are average parameter estimate (RMSE).
Methods, in order: MS, HD, CM, PD
µ = 125.0: 155.5 (30.7), (30.73), 151.6 (26.9)
σ = 25.0: 6.2 (18.9), 11.7 (13.7), 8.42 (16.9), 12.9 (12.7)
ρ = .60: .08 (.47), .04 (.53), .64 (.40), .50 (.37)
βy|x = .60: (.56), .61 (.43), .62
βx|y = .60: .20 (.55), .06, .78 (1.72), .45 (.68)

12 ML estimation
Widely accepted
Yields asymptotically unbiased estimators under general regularity conditions
Provides a mechanism to do inference: testing hypotheses and confidence intervals
Often relies on the EM algorithm
Newton-Raphson / Fisher scoring used in multilevel modeling
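As an illustration of the EM algorithm in the simplest setting used in these slides, the sketch below fits a bivariate normal to (X, Y) when X is complete and Y is partly missing. It is a minimal example written for this case only, not the routine used by any of the packages named in this presentation; the function name and the demo data are hypothetical.

```python
import numpy as np

def em_bivariate_normal(x, y, n_iter=500):
    """ML estimates for bivariate normal (X, Y); X complete, Y has np.nan."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)
    # X is fully observed, so its ML mean and variance are fixed.
    mu_x = x.mean()
    s_xx = ((x - mu_x) ** 2).mean()
    # Starting values from the complete cases, with zero covariance.
    mu_y = y[~miss].mean()
    s_yy = y[~miss].var()
    s_xy = 0.0
    for _ in range(n_iter):
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        # E-step: expected Y and Y^2 given X for the missing cases.
        ey = np.where(miss, mu_y + beta * (x - mu_x), y)
        ey2 = np.where(miss, ey**2 + resid_var, y**2)
        # M-step: update moments from the expected sufficient statistics.
        mu_y = ey.mean()
        s_yy = ey2.mean() - mu_y**2
        s_xy = (x * ey).mean() - mu_x * mu_y
    return {"mu_x": mu_x, "mu_y": mu_y, "s_xx": s_xx, "s_yy": s_yy, "s_xy": s_xy}

# Hypothetical data with Y missing whenever X is below 125 (a MAR pattern).
rng = np.random.default_rng(0)
x_demo = rng.normal(125.0, 25.0, 300)
y_demo = 50.0 + 0.6 * x_demo + rng.normal(0.0, 20.0, 300)
y_demo[x_demo < 125.0] = np.nan
est = em_bivariate_normal(x_demo, y_demo)
print(est["mu_y"], np.nanmean(y_demo))
```

With this missingness pattern the EM estimate of the mean of Y coincides at convergence with the regression-based value, intercept + slope * mean(X), where the regression of Y on X is fitted to the complete cases. The complete-case mean printed next to it is shifted upward.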

13 Software for ML estimation
SPSS: missing data module
EMCOV
NORM
SAS: Proc Mixed
S-Plus: lme function
STATA
LISREL
Mplus
HLM / MLWin (multi-level models)

14 Simulation Study: ML estimation
Parameter   MCAR     MAR      MNAR
Mean(Y)     (6.5)    (16.9)   (26.9)
Std(Y)      (5.7)    (7.4)    (13.2)
Rho         (0.2)    (0.38)   (0.36)
Beta Y|X    (0.27)   (0.51)   0.21 (0.43)
Beta X|Y    (0.25)   (0.38)   (0.68)
Generate: 50 observations from bivariate normal (Y, X)
MCAR: prob Y missing is 0.73 (high!)
MAR: Y missing if X < 141
MNAR: Y missing if Y < 141

15 ML estimation
More attractive than ad-hoc methods
Assumes a large sample
May or may not be robust to model assumptions
Assumes MAR

16 Multiple Imputation
Each missing value is replaced by m > 1 values, effectively creating m datasets
Efficiency: (1 + λ/m)^(-1), where λ is the rate of missing information; this implies m need not be large, but certainly larger than 1
Rubin's rules for combining estimators are now well accepted
Helps to be a Bayesian!
MAR is usually assumed
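The efficiency formula and Rubin's combining rules for a scalar parameter can be sketched as below. The function names and the example numbers are hypothetical; the pooling uses the standard form, in which the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import numpy as np

def mi_efficiency(lam, m):
    """Relative efficiency (1 + lam/m)^(-1) of m imputations."""
    return 1.0 / (1.0 + lam / m)

def rubin_pool(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()              # pooled point estimate
    within = u.mean()             # average within-imputation variance
    between = q.var(ddof=1)       # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total

# Efficiency for 30% missing information at several values of m.
for m in (2, 5, 20):
    print(m, round(mi_efficiency(0.3, m), 3))

# Pooling three hypothetical estimates, each with variance 0.5.
print(rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

With λ = 0.3, five imputations already give an efficiency of about 0.94, which is the sense in which m need not be large.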

17 Software
NORM
Proc MI in SAS: regression, propensity scores, MCMC. This does NORM plus other routines
SAS macro: IVE library
S-Plus: missing data library (NORM); longitudinal data uses function PAN
LISREL: missing data library like NORM
SOLAS (same as Proc MI ??)

18 Comments on MI methods
Regression-based MI methods are really based on ML estimation: they usually require a multivariate normal distribution
Should you transform skewed data to normality (log or power transformation)? Partial answer: no (Graham and Schafer, 1999)
Practice of rounding data to create binary/ordinal variables? Partial answer: okay even for small samples

19 Comments continued
However: better specialized methods are available
Schafer (1997) for nominal data
Liu et al (2000) for clustered data
How about propensity scores? No: can distort covariance structure in data (Allison, 2000)

20 Simulation Study: MI (NORM)
Parameter   MCAR     MAR      MNAR
Mean(Y)     (6.5)    (17.2)   (26.9)
Std(Y)      (5.9)    (8.2)    (12.1)
Rho         (0.2)    (0.37)   (0.36)
Beta Y|X    (0.27)   (0.52)   0.21 (0.43)
Beta X|Y    (0.22)   (0.38)   (0.56)
Generate: 50 observations from bivariate normal (Y, X)
MCAR: prob Y missing is 0.73 (high!)
MAR: Y missing if X < 141
MNAR: Y missing if Y < 141

21 Methods that do not assume MAR
Selection models
Pattern mixture models
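The two approaches differ in how they factor the joint distribution of the complete data Yc and the missingness indicator R. The sketch below uses the notation of slide 4; the parameter symbols θ and ψ are introduced here for illustration and do not appear in the slides.

```latex
% Selection model: model the data, then missingness given the data
f(Y_c, R \mid \theta, \psi) = f(Y_c \mid \theta)\, P(R \mid Y_c, \psi)

% Pattern mixture model: model the missingness pattern, then the data within each pattern
f(Y_c, R \mid \theta, \psi) = f(Y_c \mid R, \theta)\, P(R \mid \psi)
```

In the selection form, MNAR corresponds to P(R | Yc, ψ) depending on the missing part Ym.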

22 Food for thought
In a longitudinal study on aging, many subjects die while on study
Is MAR a reasonable assumption?
Alternatively: joint modeling of outcome and death may be superior

