Partially Missing At Random and Ignorable Inferences for Parameter Subsets with Missing Data. Roderick Little, Rennes 2015

2 Outline
– Inference with missing data: Rubin's (1976) paper on conditions for ignoring the missing-data mechanism
– Rubin's standard conditions are sufficient but not necessary: example
– Proposed definitions of partial MAR and ignorability for likelihood (and Bayes) inference about subsets of parameters (Little and Zangeneh, 2013)
– Application: subsample ignorable methods for regression with missing covariates (Little and Zhang, 2011)
– Joint work with Nanhua Zhang and Sahar Zangeneh

4 Rubin (1976, Biometrika)
– Landmark paper (5000+ citations, after being rejected by many journals!)
  – I wrote my first referee's report (11 pages!), and an obscure discussion on ancillarity
– Modeled the missing-data mechanism by treating the missingness indicators as random variables and assigning them a distribution
– Gave sufficient conditions under which the missing-data mechanism can be ignored for likelihood and frequentist inference about parameters
  – Focus here on likelihood and Bayes

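5 Ignoring the mechanism
Notation: data $D = (D_{(0)}, D_{(1)})$, with observed part $D_{(0)}$ and missing part $D_{(1)}$; missingness indicators $M$; data-model parameters $\theta$; missing-data-mechanism parameters $\psi$
– Full likelihood: $L(\theta, \psi \mid D_{(0)}, M) \propto \int p(D_{(0)}, D_{(1)} \mid \theta)\, p(M \mid D_{(0)}, D_{(1)}, \psi)\, dD_{(1)}$
– Likelihood ignoring the mechanism: $L_{\mathrm{ign}}(\theta \mid D_{(0)}) \propto \int p(D_{(0)}, D_{(1)} \mid \theta)\, dD_{(1)}$
– The missing-data mechanism can be ignored for likelihood inference about $\theta$ when inference based on $L_{\mathrm{ign}}$ is the same as inference based on the full likelihood

6 Rubin's sufficient conditions for ignoring the mechanism
The missing-data mechanism can be ignored for likelihood inference when
– (a) the missing data are missing at random (MAR): $p(M \mid D_{(0)}, D_{(1)}, \psi) = p(M \mid D_{(0)}, \psi)$ for all $D_{(1)}$ and $\psi$
– (b) the parameters of the data model and of the missing-data mechanism are distinct: $(\theta, \psi) \in \Omega_\theta \times \Omega_\psi$
MAR is the key condition: without (b), inferences are valid but not fully efficient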
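To make the MAR condition concrete, here is a small pure-Python sketch under an illustrative setup chosen for this example (not taken from the talk): bivariate normal data with Y1 always observed and Y2 missing with a probability that depends only on Y1, so the mechanism is MAR. The estimate of the mean of Y2 that ignores the mechanism (the classical factored-likelihood ML estimate for monotone bivariate normal data) should recover the true mean of 1, whereas the complete-case mean should not. The model, coefficients, missingness probabilities and the function names `simulate`, `mean` and `estimates` are all assumptions made for illustration.

```python
import random


def simulate(n, seed=1):
    """Hypothetical model: Y1 ~ N(0, 1), Y2 = 1 + 2*Y1 + N(0, 1), so E(Y2) = 1."""
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = 1.0 + 2.0 * a + rng.gauss(0.0, 1.0)
        # MAR: Pr(Y2 missing) depends only on the always-observed Y1
        p_mis = 0.8 if a > 0 else 0.2
        y1.append(a)
        y2.append(None if rng.random() < p_mis else b)
    return y1, y2


def mean(v):
    return sum(v) / len(v)


def estimates(y1, y2):
    """Return (complete-case mean of Y2, ignorable ML estimate of the mean of Y2)."""
    pairs = [(a, b) for a, b in zip(y1, y2) if b is not None]
    m1 = mean([a for a, _ in pairs])
    m2 = mean([b for _, b in pairs])
    # Regression of Y2 on Y1, estimated from the complete cases
    slope = sum((a - m1) * (b - m2) for a, b in pairs) / sum((a - m1) ** 2 for a, _ in pairs)
    # Factored likelihood: shift the CC mean of Y2 using the mean of Y1 over ALL cases
    ml = m2 + slope * (mean(y1) - m1)
    return m2, ml
```

Because units with large Y1 are more often missing, the complete-case mean of Y2 is pulled well below 1, while the ignorable estimate borrows the information in the fully observed Y1 and should land close to 1.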

7 More on Rubin (1976)
– Seaman et al. (2013) propose a more complex but precise notation
– They distinguish between “direct” likelihood inference and “frequentist” likelihood inference
  – “Realized MAR” is sufficient for direct likelihood inference: R depends only on the realized observed data
  – “Everywhere MAR” is sufficient for frequentist likelihood inference: the MAR condition needs to hold for the observed values in future repeated sampling
  – Rubin (1976) uses the term “always MAR”. See also Mealli and Rubin (2015, forthcoming)

8 “Sufficient for ignorable” is not the same as “ignorable”
– These conditions have come to define ignorability (e.g., Little and Rubin, 2002)
– However, Rubin (1976) described (a) and (b) as the "weakest simple and general conditions under which it is always appropriate to ignore the process that causes missing data"
– These conditions are not necessary for ignoring the mechanism in all situations

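9 Example 1: Nonresponse with auxiliary data
[Figure: sample with response indicators (0/1) and missing values (?), plus auxiliary data that are not linked to the sample, or are available for the whole population of N units]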

10 MAR, ignorability for parameter subsets
– MAR and ignorability are defined in terms of the complete set of parameters in the data model for D
– It would be useful to have a definition of MAR that applies to subsets of parameters of substantive interest
– Example: inference about regression parameters might be “partially MAR” even when the mechanism is not MAR for the parameters of a model for the full data

11 MAR, ignorability for parameter subsets

12 MAR, ignorability for parameter subsets

13 Partial MAR given a function of mechanism

14 Example 1: Auxiliary Survey Data
[Figure: response indicators (0/1) and missing values (?); auxiliary data not linked to the sample]

15 Ex. 2: MNAR Monotone Bivariate Data
[Figure: response indicators (0/1) and missing values (?) for monotone bivariate data]

16 More generally…

17 Application: missing data in covariates
Target: regression of Y on X and Z, with missing data on X (CC = complete-case analysis; IL = ignorable likelihood)
– IL methods include information for the regression in the incomplete cases (particularly for the intercept and the coefficients of Z), and are valid assuming MAR: Pr(X missing) = g(Z, Y)
– BUT: if Pr(X missing) = g(Z, X), CC analysis is consistent, while IL methods (or weighted CC) are inconsistent, since the mechanism is not MAR
– Simulations favoring IL often generate data under MAR, and hence are biased against CC
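A minimal pure-Python simulation of the CC claims, under an illustrative linear-normal model chosen for this sketch (not taken from the talk): complete-case least squares for the regression of Y on X and Z should stay close to the true X coefficient of 1 when Pr(X missing) depends on (Z, X), but should be attenuated when it depends on Y, which is MAR. The IL side of the comparison is not implemented here. The helpers `ols`, `sigmoid` and `cc_fit`, the mechanism labels and all parameter values are hypothetical.

```python
import math
import random


def ols(rows, ys):
    """Least-squares coefficients via the normal equations (Gauss-Jordan elimination)."""
    p = len(rows[0])
    aug = [[0.0] * (p + 1) for _ in range(p)]  # augmented matrix [X'X | X'y]
    for r, y in zip(rows, ys):
        for i in range(p):
            aug[i][p] += r[i] * y
            for j in range(p):
                aug[i][j] += r[i] * r[j]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(aug[k][col]))  # partial pivoting
        aug[col], aug[piv] = aug[piv], aug[col]
        for k in range(p):
            if k != col:
                f = aug[k][col] / aug[col][col]
                for j in range(col, p + 1):
                    aug[k][j] -= f * aug[col][j]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def cc_fit(n, mechanism, seed=7):
    """CC estimates [intercept, X, Z]; true values are all 1 in this hypothetical model.

    mechanism="covariates": Pr(X missing) = g(Z, X)  (not MAR, but CC is consistent)
    mechanism="outcome":    Pr(X missing) = g(Y)     (MAR, but CC is biased)
    """
    rng = random.Random(seed)
    rows, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = 0.5 * z + rng.gauss(0.0, 1.0)
        y = 1.0 + x + z + rng.gauss(0.0, 1.0)
        if mechanism == "covariates":
            p_mis = sigmoid(x + z - 0.5)
        else:
            p_mis = sigmoid(3.0 * (y - 1.0))
        if rng.random() >= p_mis:  # X observed: keep as a complete case
            rows.append([1.0, x, z])
            ys.append(y)
    return ols(rows, ys)
```

Selecting complete cases on the covariates leaves the conditional distribution of Y given X and Z unchanged, whereas selecting them on Y does not, which is why the "outcome" mechanism shrinks the CC slopes toward zero.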

18 More general missing data in X
Pattern   Observation i       Z   X   Y
P1        i = 1, …, m         √   √   √
P2        i = m+1, …, n       √   ?   √
Key: √ denotes observed, ? denotes observed or missing; variables could be vectors

19 Ignorable Likelihood methods

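20 CC analysis
– MNAR mechanism: missingness can depend on the missing values of X (but not on Y)
– CC analysis is nevertheless valid for the regression of Y on X and Z: the conditional distribution of Y given X and Z is the same in the complete cases as in the full sample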
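One way to see why CC analysis stays valid when Pr(X missing) = g(Z, X) is a one-line Bayes'-rule argument. This sketch assumes $M_X$ denotes the missingness indicator of X and that $g(z, x) < 1$; the denominator uses $\Pr(M_X = 0 \mid x, z) = \int \Pr(M_X = 0 \mid x, z, y)\, p(y \mid x, z)\, dy = 1 - g(z, x)$.

```latex
\begin{aligned}
p(y \mid x, z, M_X = 0)
  &= \frac{\Pr(M_X = 0 \mid x, z, y)\, p(y \mid x, z)}{\Pr(M_X = 0 \mid x, z)} \\
  &= \frac{\{1 - g(z, x)\}\, p(y \mid x, z)}{1 - g(z, x)} \\
  &= p(y \mid x, z).
\end{aligned}
```

The argument breaks as soon as $g$ also depends on $y$, because the numerator then no longer cancels against the denominator.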

21 Extension: missing data on X and Y
Pattern   Observation i       Z   X   Y
P1        i = 1, …, m         √   √   ?
P2        i = m+1, …, n       √   ?   ?
P1: covariates complete; P2: X incomplete
Key: √ denotes observed, ? denotes observed or missing; variables could be vectors

22 SSIL analysis, X and Y missing

23 SSIL likelihood, X and Y missing

24 Two covariates X, W with different mechanisms
Pattern   Observation i           Z   X   W   Y
P1        i = 1, …, m             √   √   √   ?
P2        i = m+1, …, m+r         √   √   ?   ?
P3        i = m+r+1, …, n         √   ?   ?   ?
P1: covariates complete; P2: X observed, W and Y may be missing; P3: X missing, W and Y may be missing
SSIL: analyze the cases in patterns 1 and 2

25 Subsample Ignorable Likelihood (SSIL)
– Target: regression of Y on Z, X, and W
– Assume: …
– By a proof similar to the previous case, the data are …
– SSIL applies an IL method (e.g., ML) to the subsample of cases for which X is observed, but W or Y may be incomplete
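The SSIL recipe can be sketched in pure Python under simplifying assumptions chosen for this sketch, not the talk: Y is fully observed, only X and W have missing values, (Z, X, W, Y) follow a linear-normal model whose regression of Y on (1, Z, X, W) has all coefficients equal to 1, X is missing with a probability that depends on X itself, and W is missing with a probability that depends on the observed Y. Within the subsample with X observed, only W is missing and the pattern is monotone, so ignorable ML has a closed form via the factored likelihood f(y | z, x) f(w | z, x, y); the target coefficients then follow by conditioning the implied bivariate normal of (Y, W) given (Z, X) on W. All function names and parameter values are hypothetical.

```python
import math
import random


def ols(rows, ys):
    """Least-squares coefficients via the normal equations (Gauss-Jordan elimination)."""
    p = len(rows[0])
    aug = [[0.0] * (p + 1) for _ in range(p)]  # augmented matrix [X'X | X'y]
    for r, y in zip(rows, ys):
        for i in range(p):
            aug[i][p] += r[i] * y
            for j in range(p):
                aug[i][j] += r[i] * r[j]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(aug[k][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for k in range(p):
            if k != col:
                f = aug[k][col] / aug[col][col]
                for j in range(col, p + 1):
                    aug[k][j] -= f * aug[col][j]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def ml_resid_var(rows, ys, coef):
    """ML (divide-by-n) residual variance of a fitted linear regression."""
    res = [y - sum(c * v for c, v in zip(coef, r)) for r, y in zip(rows, ys)]
    return sum(e * e for e in res) / len(res)


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def simulate(n, seed=11):
    """Hypothetical data: the regression of Y on (1, Z, X, W) has all coefficients 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = 0.5 * z + rng.gauss(0.0, 1.0)
        w = 0.5 * x + rng.gauss(0.0, 1.0)
        y = 1.0 + z + x + w + rng.gauss(0.0, 1.0)
        # X missingness depends on X itself (not on Y): not MAR
        x_obs = rng.random() >= sigmoid(1.5 * x - 0.5)
        # W missingness depends on the always-observed Y: MAR within the subsample
        w_obs = rng.random() >= sigmoid(3.0 * (y - 1.0))
        data.append((z, x if x_obs else None, w if w_obs else None, y))
    return data


def ssil(data):
    """SSIL sketch: returns [intercept, Z, X, W] coefficients for the regression of Y."""
    sub = [r for r in data if r[1] is not None]  # step 1: cases with X observed
    # Step 2: ignorable ML in the subsample via f(y | z, x) f(w | z, x, y)
    rows_y = [[1.0, z, x] for z, x, _, _ in sub]
    ys = [y for _, _, _, y in sub]
    a = ols(rows_y, ys)
    s11 = ml_resid_var(rows_y, ys, a)
    comp = [r for r in sub if r[2] is not None]
    rows_w = [[1.0, z, x, y] for z, x, _, y in comp]
    ws = [w for _, _, w, _ in comp]
    coef_w = ols(rows_w, ws)
    s22 = ml_resid_var(rows_w, ws, coef_w)
    b, c = coef_w[:3], coef_w[3]
    # (Y, W) given (1, Z, X) is bivariate normal: Cov = c*s11, Var(W) = s22 + c^2*s11
    gamma = c * s11 / (s22 + c * c * s11)
    return [a[k] - gamma * (b[k] + c * a[k]) for k in range(3)] + [gamma]


def complete_case(data):
    """CC comparison: least squares on cases with both X and W observed."""
    comp = [r for r in data if r[1] is not None and r[2] is not None]
    return ols([[1.0, z, x, w] for z, x, w, _ in comp], [y for _, _, _, y in comp])
```

In this setup the SSIL coefficient of W should be close to 1, whereas `complete_case` should understate it, because its cases are selected on Y. Conditioning on X being observed is harmless here because X's missingness depends only on the covariates.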

26 Simulation Study
– For each of 1000 replications, 5000 observations of Z, W, X and Y were generated as: …
– 20–35% missing values in W and X, generated by four mechanisms

27 Simulation: missing-data mechanisms
– I: all methods valid
– II: CC valid
– III: IML valid
– IV: SSIML valid

28 RMSE × 1000 of estimated regression coefficients for Before Deletion (BD), Complete Cases (CC), Ignorable Maximum Likelihood (IML) and Subsample Ignorable Maximum Likelihood (SSIML), under four missing-data mechanisms (valid methods: I all, II CC, III IML, IV SSIML)
[Table: RMSE × 1000 by method (BD, CC, IML, SSIML) and mechanism (I–IV)]

29 Missing Covariates in Survival Analysis

30 How to choose X, W
– The choice requires an understanding of the mechanism:
  – Variables that may be missing based on their underlying values belong in X, the variables whose observed cases define the SSIL subsample
  – Variables that are MAR belong in W
– Collecting data about why variables are missing is obviously useful for getting the model right
– But this applies to all missing-data adjustments…

31 Other questions and points
– How much is lost with SSIL relative to a full likelihood model of the data and the missing-data mechanism?
  – In some special cases, SSIL is efficient for a pattern-mixture model
  – In other cases, there is a trade-off between the additional specification of the mechanism and the loss of efficiency from the conditional likelihood
– The MAR analysis applied to the subsample does not have to be likelihood-based, e.g., weighted GEE or AIPWEE
– Pattern-mixture models (Little, 1993) can also avoid modeling the mechanism

32 Conclusions
– Defined partial MAR for a subset of parameters
– Application to regression with missing covariates: sometimes discarding data is useful!
– Subsample ignorable likelihood: apply a likelihood method to the data, selectively discarding cases based on the assumed missing-data mechanism
  – More efficient than CC
  – Valid for P-MAR mechanisms where IL and CC are inconsistent

33 References
Harel, O. and Schafer, J.L. (2009). Partial and latent ignorability in missing-data problems. Biometrika, 1–14.
Little, R.J.A. (1993). Pattern-mixture models for multivariate incomplete data. JASA, 88, 125–134.
Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data (2nd ed.). Wiley.
Little, R.J.A. and Zangeneh, S.Z. (2013). Missing at random and ignorability for inferences about subsets of parameters with missing data. University of Michigan Biostatistics Working Paper Series.
Little, R.J.A. and Zhang, N. (2011). Subsample ignorable likelihood for regression analysis with missing data. JRSSC, 60(4), 591–605.
Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Seaman, S., Galati, J., Jackson, D. and Carlin, J. (2013). What is meant by “missing at random”? Statist. Sci., 28(2), 257–268.
Zhang, N. and Little, R.J.A. (2014). Lifetime Data Analysis, published online August 2014. doi:10.1007/s10985-014-9304-x.

