
Imputation for Multi Care Data
Naren Meadem

Introduction

What is certain in life?
– Death
– Taxes

What is certain in research?
– Measurement error
– Missing data

Missing data can be:
– Due to preventable errors, mistakes, or lack of foresight by the researcher
– Due to problems outside the control of the researcher
– Deliberate, intended, or planned by the researcher to reduce cost or respondent burden
– Due to differential applicability of some items to subsets of respondents
– Etc.

Missing Data Mechanisms (1)

Preliminaries:
– Y_obs: the non-missing (observed) data
– Y_miss: the missing (unobserved) data
– M: indicator of whether the data on a given item for a given case is missing (1) or not (0)

Missing Completely at Random (MCAR)
– The probability that an item is missing (M) is unrelated to either the observed (Y_obs) or the unobserved (Y_miss) data

Missing at Random (MAR)
– The probability that an item is missing (M) may be related to the observed data (Y_obs) but is unrelated to the unobserved data (Y_miss)

Missing Not at Random (MNAR)
– The probability that an item is missing (M) is related to the (unknown) value of the unobserved data (Y_miss), even after conditioning on the observed data (Y_obs)
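The three mechanisms can be made concrete with a small simulation. The sketch below is illustrative only and is not from the slides: the variables x (always observed) and y (sometimes missing), the logistic missingness probabilities, and the sample size are all assumptions chosen for the example.

```python
import math
import random
import statistics

random.seed(0)

def simulate(n=20000):
    """Return (x, y) pairs; x is always observed, y may be masked later."""
    data = []
    for _ in range(n):
        x = random.gauss(0, 1)
        y = x + random.gauss(0, 1)  # y is correlated with x
        data.append((x, y))
    return data

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def observed_y(data, mechanism):
    """Return the y values that stay observed (M = 0) under a mechanism."""
    kept = []
    for x, y in data:
        if mechanism == "MCAR":
            p_missing = 0.5             # unrelated to x and y
        elif mechanism == "MAR":
            p_missing = sigmoid(2 * x)  # depends only on the observed x
        else:  # MNAR
            p_missing = sigmoid(2 * y)  # depends on the value of y itself
        if random.random() >= p_missing:
            kept.append(y)
    return kept

data = simulate()
true_mean = statistics.fmean(y for _, y in data)
observed_means = {
    m: statistics.fmean(observed_y(data, m)) for m in ("MCAR", "MAR", "MNAR")
}
```

In this setup the mean of the observed y values stays close to the full-data mean only under MCAR. Under MAR the raw observed mean is also shifted, but the shift can in principle be corrected by conditioning on x; under MNAR it cannot be corrected from the observed data alone.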

Missing Data Mechanisms (2)
– The appropriateness of different missing data treatments depends (among other things) on the underlying missing data mechanism
– "Real" missing data can seldom be classified into just one of the three mechanisms (MCAR, MAR, MNAR)
– Because we do not have access to the missing data (Y_miss), we cannot empirically test whether or not the data are MNAR
– If we know (or can convincingly argue) that the data are not MNAR, a test of whether the data are MCAR is available (e.g., in SPSS Missing Values Analysis)
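One informal way to probe the MCAR assumption is to compare an always-observed variable between cases where another item is missing and cases where it is present. The sketch below is not the SPSS procedure the slide mentions; it is a simpler two-group comparison using a Welch t statistic, and the covariate x, the missingness probabilities, and the sample size are hypothetical.

```python
import math
import random
import statistics

random.seed(3)

def welch_t(a, b):
    """Welch two-sample t statistic for the difference in means."""
    va, vb = statistics.variance(a), statistics.variance(b)
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff / math.sqrt(va / len(a) + vb / len(b))

def split_by_missingness(p_missing, n=4000):
    """Split an observed covariate x by whether another item is missing."""
    x_when_missing, x_when_observed = [], []
    for _ in range(n):
        x = random.gauss(0, 1)
        if random.random() < p_missing(x):
            x_when_missing.append(x)
        else:
            x_when_observed.append(x)
    return x_when_missing, x_when_observed

# MCAR: missingness ignores x, so the two groups should look alike.
t_mcar = welch_t(*split_by_missingness(lambda x: 0.3))

# Not MCAR: missingness is more likely when x is positive.
t_not_mcar = welch_t(*split_by_missingness(lambda x: 0.6 if x > 0 else 0.1))
```

A large absolute t value is evidence against MCAR. A small value does not establish MCAR, and no such comparison can distinguish MAR from MNAR, because that would require the unobserved values.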

Missing Data in Research Studies

Missing data mechanism
– Missing completely at random (MCAR): ignorable
– Missing at random (MAR): conditionally ignorable
– Missing not at random (MNAR): nonignorable

Amount of missing data
– Percent of cases with missing data
– Percent of variables having missing data
– Percent of data values that are missing

Pattern of missing data
– Missing by design
– Missing data patterns
  – Univariate
  – Monotonic
  – File matching
  – General
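The three "amount" measures and the missing data patterns are straightforward to compute. The following is a minimal sketch on a made-up four-row dataset; the variable names (age, bmi, hba1c) and values are hypothetical and missing entries are represented as None.

```python
from collections import Counter

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 34, "bmi": 27.1, "hba1c": None},
    {"age": 51, "bmi": None, "hba1c": None},
    {"age": 46, "bmi": 31.4, "hba1c": 6.9},
    {"age": 29, "bmi": 22.8, "hba1c": 5.4},
]
variables = ["age", "bmi", "hba1c"]

# Percent of cases with at least one missing value.
pct_cases = 100 * sum(any(r[v] is None for v in variables) for r in rows) / len(rows)

# Percent of variables with at least one missing value.
pct_vars = 100 * sum(any(r[v] is None for r in rows) for v in variables) / len(variables)

# Percent of all data values that are missing.
n_cells = len(rows) * len(variables)
pct_values = 100 * sum(r[v] is None for r in rows for v in variables) / n_cells

# Missingness pattern per case: 1 = missing, 0 = observed, in variable order.
patterns = Counter(tuple(int(r[v] is None) for v in variables) for r in rows)

# Monotone (for this variable order): once a variable is missing,
# every later variable is missing too, so each pattern is non-decreasing.
monotone = all(list(p) == sorted(p) for p in patterns)
```

Here half of the cases and two of the three variables are affected, yet only a quarter of the data values are missing, which is why the three percentages are worth reporting separately.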

Newer Missing Data Treatments

Modern state-of-the-art missing data treatments for MAR data
– Maximum likelihood
– Multiple imputation

Cutting edge investigational missing data treatments for MNAR data
– Pattern mixture models
– Selection models
– Shared parameter models
– Inverse probability weighting

Clustering methods: Mean substitution
– Substitute the mean of the variable's observed values for each missing value
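A short simulation shows what mean substitution does to a variable's distribution. This is an illustrative sketch with assumed numbers (a normal variable with mean 50 and standard deviation 10, and 40% of values deleted completely at random), not data from the presentation.

```python
import random
import statistics

random.seed(1)

# Hypothetical complete variable.
complete = [random.gauss(50, 10) for _ in range(5000)]

# Delete about 40% of the values completely at random (None = missing).
with_gaps = [v if random.random() > 0.4 else None for v in complete]

# Mean substitution: fill every gap with the mean of the observed values.
observed_mean = statistics.fmean(v for v in with_gaps if v is not None)
imputed = [observed_mean if v is None else v for v in with_gaps]

sd_complete = statistics.stdev(complete)
sd_imputed = statistics.stdev(imputed)
```

The mean is preserved, but the standard deviation of the filled-in variable is noticeably smaller than that of the complete data, because every imputed value sits exactly at the center. Variances, correlations, and standard errors computed from such data are therefore distorted.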

Graphical illustration

Better methods of handling missing data

Full information maximum likelihood methods
– Can handle data that are MAR and nonignorable (NI)
  – Special consideration required for NI data
– Implemented as part of hierarchical linear modeling and structural equation modeling
– Missing data handled during analysis

Multiple imputation
– Can also handle data that are MAR and NI
  – Special consideration required for NI data
– Simulation-based approach
– Missing data are handled separately from analysis

Multiple imputation

Three steps:
1. Generate multiple completed datasets (imputations) through simulation (5 to 10 are often enough)
2. Perform the analysis on each imputation
3. Combine the multiple analyses using a set of special rules (Rubin's (1987) rules)
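The three steps can be sketched end to end for a very simple target, the mean of a partly missing variable y. Everything in this example is an assumption made for illustration: the data-generating model, the MAR missingness rule, and the use of a bootstrap refit of a linear regression as a stand-in for drawing imputation-model parameters. It is not the procedure used in the presentation.

```python
import math
import random
import statistics

random.seed(2)

def fit_line(pairs):
    """Least-squares fit of y on x; returns intercept, slope, residual SD."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in pairs)
    return intercept, slope, math.sqrt(sse / (len(pairs) - 2))

# Hypothetical data: true mean of y is 2; y is missing more often when x > 0 (MAR).
n = 2000
data = []
for _ in range(n):
    x = random.gauss(0, 1)
    y = 2 + x + random.gauss(0, 1)
    if random.random() < (0.7 if x > 0 else 0.1):
        y = None
    data.append((x, y))
observed = [(x, y) for x, y in data if y is not None]
complete_case_mean = statistics.fmean(y for _, y in observed)

m = 10  # number of imputations
estimates, variances = [], []
for _ in range(m):
    # Step 1: refit on a bootstrap sample (parameter uncertainty), then
    # fill each gap with a prediction plus random residual noise.
    a, b, s = fit_line(random.choices(observed, k=len(observed)))
    completed = [y if y is not None else a + b * x + random.gauss(0, s)
                 for x, y in data]
    # Step 2: analyse the completed dataset (estimate and its variance).
    estimates.append(statistics.fmean(completed))
    variances.append(statistics.variance(completed) / n)

# Step 3: Rubin's rules.
pooled_estimate = statistics.fmean(estimates)
within_var = statistics.fmean(variances)       # W: average within-imputation variance
between_var = statistics.variance(estimates)   # B: variance across imputations
total_var = within_var + (1 + 1 / m) * between_var
```

The pooled estimate recovers the true mean much better than the complete-case mean, and the total variance is larger than the within-imputation variance because it also carries the uncertainty due to the missing values.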

Results

AUC by classifier (Naive Bayes, Logistic Regression, SVM), without imputation and with imputation. [AUC values not preserved in the transcript]

Conclusions
– When you have missing data, think about WHY they are missing
  – Ask yourself whether you have observed variables that could explain why the data are missing
– Missing data handled improperly can bias your conclusions
– Multiple imputation is one good way of handling missing data

Caveats:
– Multiple imputation is complex
  – An evolving field
  – The standards of reporting the results from imputed data are not well-established
– If you need to do it (especially if you think your data are NI), read the source papers I referenced at the beginning of the slides

Questions?