
1 Imputation for Multi Care Data Naren Meadem


3 Introduction
What is certain in life?
– Death
– Taxes
What is certain in research?
– Measurement error
– Missing data
Missing data can be:
– Due to preventable errors, mistakes, or lack of foresight by the researcher
– Due to problems outside the control of the researcher
– Deliberate, intended, or planned by the researcher to reduce cost or respondent burden
– Due to differential applicability of some items to subsets of respondents
– Etc.

4 Missing Data Mechanisms (1)
Preliminaries:
– Y_obs: the non-missing (observed) data
– Y_miss: the missing (unobserved) data
– M: whether the data on a given item for a given case is missing (1) or not (0)
Missing Completely at Random (MCAR)
– The probability that an item is missing (M) is unrelated to either the observed data (Y_obs) or the unobserved data (Y_miss)
Missing at Random (MAR)
– The probability that an item is missing (M) may be related to the observed data (Y_obs) but is unrelated to the unobserved data (Y_miss)
Missing Not at Random (MNAR)
– The probability that an item is missing (M) is related to the (unknown) value of the unobserved data (Y_miss), even after conditioning on the observed data (Y_obs)
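Formally, MCAR means P(M | Y_obs, Y_miss) = P(M), MAR means P(M | Y_obs, Y_miss) = P(M | Y_obs), and MNAR means the dependence on Y_miss remains. A minimal simulation sketch in Python/NumPy can make this concrete. The variable names, coefficients, and logistic missingness models below are illustrative assumptions, not taken from the slides: x is always observed, and only y receives missing values.

```python
import numpy as np

def simulate_missingness(n=10_000, seed=0):
    """Return x (always observed), y (complete), and three missingness masks for y.
    True in a mask means y is missing, matching the slide's M = 1 convention.
    All coefficients here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                         # fully observed covariate (part of Y_obs)
    y = 0.8 * x + rng.normal(scale=0.6, size=n)    # variable that will receive missing values

    def logistic(z):
        return 1.0 / (1.0 + np.exp(-z))

    masks = {
        # MCAR: P(M = 1) is a constant, unrelated to x or y
        "MCAR": rng.random(n) < 0.3,
        # MAR: P(M = 1) depends only on the observed x
        "MAR": rng.random(n) < logistic(-1.0 + 1.5 * x),
        # MNAR: P(M = 1) depends on the (unobserved) value of y itself
        "MNAR": rng.random(n) < logistic(-1.0 + 1.5 * y),
    }
    return x, y, masks

x, y, masks = simulate_missingness()
for name, m in masks.items():
    # Compare the mean of the y values that remain observed with the true mean of y
    print(f"{name}: {m.mean():.2f} missing, "
          f"observed mean of y = {y[~m].mean():+.3f}, true mean = {y.mean():+.3f}")
```

In this setup the observed mean of y stays close to the true mean under MCAR but is pulled downward under MAR and MNAR. Under MAR the bias can be removed by conditioning on x; under MNAR it cannot without further assumptions about the missingness model.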

5 Missing Data Mechanisms (2)
– The appropriateness of different missing data treatments depends (among other things) on the underlying missing data mechanism
– "Real" missing data can seldom be classified into just one of the three (MCAR, MAR, MNAR)
– Because we don't have access to the missing data (Y_miss), we cannot empirically test whether or not the data are MNAR
– If we know (or can convincingly argue) that the data are not MNAR, a test of whether the data are MCAR is available (e.g., Little's MCAR test in SPSS Missing Value Analysis)
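Little's MCAR test is an omnibus test across all variables. A simpler per-variable check compares a fully observed variable between cases where another variable is missing and cases where it is observed. The sketch below computes Welch's t statistic directly in NumPy; the simulated data and the function name `welch_t` are illustrative assumptions. A large |t| is evidence against MCAR. A small |t| does not rule out MNAR, because the check only looks at observed data.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic for a difference in means between samples a and b."""
    va, vb = a.var(ddof=1) / a.size, b.var(ddof=1) / b.size
    return (a.mean() - b.mean()) / np.sqrt(va + vb)

rng = np.random.default_rng(1)
n = 5_000
x = rng.normal(size=n)                                    # fully observed variable
miss_mcar = rng.random(n) < 0.3                           # missingness in y unrelated to anything
miss_mar = rng.random(n) < 1 / (1 + np.exp(-(x - 0.5)))   # missingness in y driven by x

for name, m in [("MCAR", miss_mcar), ("MAR", miss_mar)]:
    # Under MCAR, x should have about the same mean whether y is missing or not
    t = welch_t(x[m], x[~m])
    print(f"{name}: t = {t:+.2f}")
```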

6 Missing Data in Research Studies
Missing data mechanism
– Missing completely at random (MCAR): ignorable
– Missing at random (MAR): conditionally ignorable
– Missing not at random (MNAR): nonignorable
Amount of missing data
– Percent of cases with missing data
– Percent of variables having missing data
– Percent of data values that are missing
Pattern of missing data
– Missing by design
– Missing data patterns: univariate, monotonic, file matching, general
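The three "amount" measures and a check for a monotonic pattern can all be computed from the missingness indicator M. This is a minimal NumPy sketch, assuming NaN marks a missing value. The function names and the small example array are hypothetical.

```python
import numpy as np

def missing_summary(data):
    """Summarize the amount of missing data in a 2-D float array (NaN = missing)."""
    miss = np.isnan(data)                                # M matrix: True where a value is missing
    return {
        "pct_cases": 100 * miss.any(axis=1).mean(),      # cases with at least one missing value
        "pct_variables": 100 * miss.any(axis=0).mean(),  # variables with at least one missing value
        "pct_values": 100 * miss.mean(),                 # individual values that are missing
    }

def is_monotone(data):
    """True if the variables can be ordered so that, in every case, once a variable
    is missing all later variables are missing too (a monotonic pattern)."""
    miss = np.isnan(data)
    order = np.argsort(miss.sum(axis=0), kind="stable")  # least-missing variable first
    m = miss[:, order].astype(int)
    # Along each row the indicators must never go from missing (1) back to observed (0)
    return bool(np.all(np.diff(m, axis=1) >= 0))

# Hypothetical 4 cases x 3 variables, e.g. a study with dropout over time
data = np.array([
    [1.0, 2.0, 3.0],
    [1.5, 2.5, np.nan],
    [0.5, np.nan, np.nan],
    [2.0, 1.0, 4.0],
])
print(missing_summary(data))
print(is_monotone(data))
```

Sorting variables by how often they are missing works because, in a monotonic pattern, the sets of cases missing each variable are nested.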

7 Newer Missing Data Treatments
Modern state-of-the-art missing data treatments for MAR data
– Maximum likelihood
– Multiple imputation
Cutting-edge investigational missing data treatments for MNAR data
– Pattern mixture models
– Selection models
– Shared parameter models
– Inverse probability weighting

8 Clustering methods: Mean substitution
– Substitute the mean of the observed values of the variable for each missing value
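A minimal NumPy sketch of mean substitution follows. The simulated data and the 40% MCAR missingness rate are illustrative assumptions. It also shows a well-known drawback: the filled-in variable keeps the observed mean, but its standard deviation shrinks because every imputed value sits exactly at the mean.

```python
import numpy as np

def mean_substitute(col):
    """Replace NaNs in a 1-D array with the mean of the observed values."""
    filled = col.copy()
    filled[np.isnan(filled)] = np.nanmean(col)
    return filled

rng = np.random.default_rng(2)
y = rng.normal(loc=10.0, scale=2.0, size=1_000)
y_missing = y.copy()
y_missing[rng.random(y.size) < 0.4] = np.nan     # roughly 40% MCAR missingness

y_filled = mean_substitute(y_missing)
print(f"observed mean {np.nanmean(y_missing):.2f}, filled mean {y_filled.mean():.2f}")
print(f"observed SD   {np.nanstd(y_missing, ddof=1):.2f}, filled SD   {y_filled.std(ddof=1):.2f}")
```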

9 Graphical illustration

10 Better methods of handling missing data
Full information maximum likelihood (FIML) methods
– Can handle data that are MAR and NI (nonignorable); special consideration is required for NI data
– Implemented as part of hierarchical linear modeling and structural equation modeling
– Missing data are handled during the analysis
Multiple imputation
– Can also handle data that are MAR and NI; special consideration is required for NI data
– Simulation-based approach
– Missing data are handled separately from the analysis
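The idea behind full information maximum likelihood is that each case contributes the likelihood of only the values it actually has: ℓ(θ | Y_obs) = Σ_i log f(y_i,obs | θ). Under MAR, with the data-model and missingness-model parameters distinct, this is sufficient to estimate θ. The sketch below evaluates this observed-data log-likelihood for a multivariate normal model. The function name and parameter values are assumptions for illustration. An actual FIML fit would maximize this function over mu and sigma, which HLM and SEM software does internally.

```python
import numpy as np

def observed_loglik(data, mu, sigma):
    """Observed-data (FIML-style) log-likelihood for a multivariate normal model.
    Each row contributes the log-density of only its observed entries (NaN = missing),
    using the matching sub-vector of mu and sub-matrix of sigma."""
    total = 0.0
    for row in data:
        obs = ~np.isnan(row)
        if not obs.any():
            continue                               # a fully missing case adds no information
        r = row[obs] - mu[obs]                     # residual on the observed variables
        s = sigma[np.ix_(obs, obs)]                # covariance of the observed variables
        _, logdet = np.linalg.slogdet(s)
        quad = r @ np.linalg.solve(s, r)           # Mahalanobis term r' S^{-1} r
        total += -0.5 * (obs.sum() * np.log(2 * np.pi) + logdet + quad)
    return total

# Illustrative parameters and data: 2 variables, some values missing
mu = np.array([0.0, 1.0])
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
data = np.array([[0.2, 1.5], [np.nan, 0.3], [-1.0, np.nan]])
print(observed_loglik(data, mu, sigma))
```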

11 Multiple imputation
Three steps:
1. Generate multiple completed datasets (imputations) through simulation (classically, only 5–10 are needed)
2. Perform the analysis on each imputation
3. Combine the multiple analyses using a set of special rules (Rubin's rules; Rubin, 1987)
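The three steps can be sketched in NumPy for a simple case: y has MAR missing values that depend on a fully observed x, and the analysis of interest is the mean of y. This is an illustrative sketch under stated assumptions, not a general MI implementation. Each imputation uses stochastic regression imputation, with the regression re-estimated on a bootstrap resample to reflect parameter uncertainty. That bootstrap step is a simplification of the Bayesian parameter draws used by standard MI software. The results are pooled with Rubin's rules: the pooled estimate is the mean of the m estimates, and the total variance is T = W + (1 + 1/m)B, where W is the average within-imputation variance and B the between-imputation variance. Function names are hypothetical.

```python
import numpy as np

def impute_once(x, y, rng):
    """One stochastic regression imputation of missing y given fully observed x,
    with parameters re-estimated on a bootstrap resample of the complete cases."""
    obs = ~np.isnan(y)
    xo, yo = x[obs], y[obs]
    idx = rng.integers(0, xo.size, size=xo.size)         # bootstrap the observed cases
    X = np.column_stack([np.ones(idx.size), xo[idx]])
    beta, *_ = np.linalg.lstsq(X, yo[idx], rcond=None)   # intercept and slope
    resid_sd = np.std(yo[idx] - X @ beta, ddof=2)
    y_imp = y.copy()
    miss = ~obs
    # Prediction plus random noise, so imputed values keep realistic spread
    y_imp[miss] = beta[0] + beta[1] * x[miss] + rng.normal(scale=resid_sd, size=miss.sum())
    return y_imp

def rubin_pool(estimates, variances):
    """Combine m point estimates and their squared standard errors with Rubin's rules."""
    q, u = np.asarray(estimates), np.asarray(variances)
    m = q.size
    w = u.mean()                   # within-imputation variance
    b = q.var(ddof=1)              # between-imputation variance
    t = w + (1 + 1 / m) * b        # total variance
    return q.mean(), np.sqrt(t)

rng = np.random.default_rng(3)
n = 2_000
x = rng.normal(size=n)
y = 2.0 + 1.0 * x + rng.normal(size=n)
y_missing = y.copy()
y_missing[rng.random(n) < 1 / (1 + np.exp(-2 * x))] = np.nan   # MAR: depends on x only

m = 10
ests, vars_ = [], []
for _ in range(m):
    y_imp = impute_once(x, y_missing, rng)    # step 1: one completed dataset
    ests.append(y_imp.mean())                 # step 2: analysis = mean of y
    vars_.append(y_imp.var(ddof=1) / n)       # squared standard error of that mean
est, se = rubin_pool(ests, vars_)             # step 3: pool with Rubin's rules
print(f"complete-case mean {np.nanmean(y_missing):.3f}, "
      f"MI mean {est:.3f} (SE {se:.3f}), true mean {y.mean():.3f}")
```

Here the complete-case mean is biased downward because cases with large x (and hence large y) are more often missing, while the MI estimate recovers the mean by conditioning on x.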


13 Results (AUC)

                  Naive Bayes   Logistic Regression   SVM
No imputation     0.6362        0.6025                0.635
Imputation        0.6377        0.6033                0.649

14 Conclusions
– When you have missing data, think about WHY they are missing
– Ask yourself whether you have observed variables that could explain why the data are missing
– Missing data handled improperly can bias your conclusions
– Multiple imputation is one good way of handling missing data
Caveats:
– Multiple imputation is complex
– It is an evolving field
– The standards for reporting results from imputed data are not well established
– If you need to do it (especially if you think your data are NI), read the source papers I referenced at the beginning of the slides

15 Questions?

