Missing Data

Presentation transcript:

Missing Data.

What do we mean by missing data?
Missing observations that were intended to be collected but were:
– Never collected
– Lost accidentally
– Wrongly collected and so deleted
Missing values can occur in outcomes and/or explanatory variables.

Effect of Missing Data
Missing data can cause:
– Biased estimates (e.g. means, regression parameters)
– Biased standard errors, resulting in incorrect P-values and confidence intervals (CIs)

Missing data mechanism
1. Missing Completely At Random: MCAR
– Missingness does not depend on observed or unobserved values
– E.g. FBC missing because a tube of blood is accidentally broken
– E.g. BP missing due to a broken machine

Missing data mechanism
2. Missing At Random: MAR
– Missingness depends on the observed data, but not on the unobserved data
– E.g. [age range missing] year olds are less likely to respond to a follow-up postal questionnaire, as they are more likely to change address several times

Missing data mechanism
3. Missing Not At Random: MNAR
– Given all available observed information, the probability of being missing still depends on the unobserved data
– E.g. a patient misses an appointment because they feel ill, and this illness (e.g. flu) is related to the measurement that was to be made (e.g. temperature)
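The three mechanisms can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the variables x (always observed) and y (sometimes missing), the missingness probabilities and the function names are all assumptions made for illustration. The true mean of y is 0, and the complete-case mean is computed under each mechanism.

```python
import random
import statistics


def complete_case_mean(values, observed):
    """Mean of the values whose 'observed' flag is True."""
    return statistics.fmean([v for v, keep in zip(values, observed) if keep])


def simulate_mechanisms(n=20000, seed=1):
    """Complete-case mean of y under MCAR, MAR and MNAR (true mean is 0)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]   # covariate, always observed
    y = [xi + rng.gauss(0, 1) for xi in x]    # variable that may go missing

    # MCAR: every record has the same 30% chance of being missing.
    mcar = [rng.random() > 0.3 for _ in range(n)]
    # MAR: the chance of being missing depends only on the observed x.
    mar = [rng.random() > (0.6 if xi > 0 else 0.1) for xi in x]
    # MNAR: the chance of being missing depends on the unobserved y itself.
    mnar = [rng.random() > (0.6 if yi > 0 else 0.1) for yi in y]

    return {
        "MCAR": complete_case_mean(y, mcar),
        "MAR": complete_case_mean(y, mar),
        "MNAR": complete_case_mean(y, mnar),
    }


result = simulate_mechanisms()
for name, value in result.items():
    print(f"{name}: complete-case mean of y = {value:.3f}")
```

In this set-up the complete-case mean stays close to 0 under MCAR but is pulled below 0 under MAR and MNAR. Under MAR the bias can in principle be corrected using x. Under MNAR it cannot be corrected from the observed data alone.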

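The Assumptions
– Cannot tell for certain from the data at hand whether the missing values are MCAR, MAR or MNAR, because MNAR depends on values that were never observed
– Can, however, distinguish between MCAR and MAR by checking whether missingness is related to observed variables
– MAR can be made more plausible by looking at associations between missingness and the observed values of the explanatory variables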
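One simple way to look at the association between missingness and an observed variable is to compare that variable between records with and without a missing value. This is a minimal sketch using a Welch t statistic. The data (age, and bp with None marking a missing value) and the function name are hypothetical.

```python
import math
import statistics


def missingness_t_statistic(covariate, target):
    """Welch t statistic comparing an observed covariate between
    records where 'target' is missing (None) and where it is observed."""
    miss = [c for c, t in zip(covariate, target) if t is None]
    obs = [c for c, t in zip(covariate, target) if t is not None]
    se = math.sqrt(
        statistics.variance(miss) / len(miss)
        + statistics.variance(obs) / len(obs)
    )
    return (statistics.fmean(miss) - statistics.fmean(obs)) / se


# Hypothetical data: blood pressure is missing for everyone aged 50 or over.
age = list(range(20, 60))
bp = [None if a >= 50 else 120.0 + 0.5 * a for a in age]

t_stat = missingness_t_statistic(age, bp)
print(f"t statistic for age by missingness of bp: {t_stat:.2f}")
```

A large absolute t statistic suggests that missingness depends on the observed covariate, which argues against MCAR. A small one does not prove MCAR, and no such check can rule out MNAR.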

Simple methods to handle missing data
– Complete Case (CC) analysis
– Mean imputation
– Regression imputation
– Stochastic imputation
Problem: makes results too certain (imputed values are treated as if they had been observed)
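The "too certain" problem is easy to see with mean imputation. This is a minimal sketch with made-up values, where None marks a missing observation. Filling in the mean leaves the mean unchanged but shrinks the spread of the data.

```python
import statistics


def mean_impute(values):
    """Replace every missing value (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical measurements; None marks a missing value.
raw = [2.0, 4.0, None, 6.0, None, 8.0, None, None]
filled = mean_impute(raw)

sd_observed = statistics.stdev([v for v in raw if v is not None])
sd_filled = statistics.stdev(filled)
print(f"SD of observed values:    {sd_observed:.3f}")
print(f"SD after mean imputation: {sd_filled:.3f}")
```

The smaller standard deviation, combined with the apparently larger sample size, gives standard errors that are too small.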

Multiple Imputation (MI)
– Under the MAR assumption, gives less biased estimates and SEs when compared to CC
– Covers many different data structures
– Never the absolute best thing to do

Multiple Imputation (MI)
[Figure: data tables with columns ID, x1 and x2; missing entries are shown as "?"]

Key Idea behind Imputation
– Express our uncertainty about the missing data by creating ‘m’ imputed data sets
– Analyse each of these in the usual way
– Combine the estimates using particular rules (Rubin’s rules)
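The combining step can be sketched as follows. Rubin's rules take the mean of the m estimates as the pooled estimate, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name and the numbers below are hypothetical. For hazard ratios the pooling is normally done on the log scale.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine m estimates and their variances with Rubin's rules.

    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)        # mean of the m estimates
    within = statistics.fmean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)    # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical log hazard ratios and squared standard errors from m = 5 analyses.
log_hr = [0.10, 0.14, 0.08, 0.12, 0.11]
squared_se = [0.0025, 0.0027, 0.0024, 0.0026, 0.0025]

estimate, std_err = pool_rubin(log_hr, squared_se)
print(f"pooled log HR = {estimate:.3f}, SE = {std_err:.3f}")
```

The between-imputation term is what carries the uncertainty about the missing values into the final standard error.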

Two variables: X1 and X2
– X1 is missing in some records
– X2 is not missing; it is observed in every unit
Learn the relationship between X1 and X2
Complete the data set by drawing the missing observations from X1 | X2
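The two-variable case can be sketched with a linear regression of X1 on X2. This is an illustrative simplification, not the algorithm used by any particular package. For each imputed data set the complete cases are bootstrapped (an assumption made here to reflect uncertainty in the fitted line), the line is refitted, and each missing X1 is drawn from the fitted line plus random residual noise. All names and data are hypothetical.

```python
import random
import statistics


def fit_line(x2, x1):
    """Least-squares fit of x1 = a + b * x2; returns (a, b, residual SD)."""
    mean_x2, mean_x1 = statistics.fmean(x2), statistics.fmean(x1)
    sxx = sum((u - mean_x2) ** 2 for u in x2)
    b = sum((u - mean_x2) * (v - mean_x1) for u, v in zip(x2, x1)) / sxx
    a = mean_x1 - b * mean_x2
    residuals = [v - (a + b * u) for u, v in zip(x2, x1)]
    sd = (sum(r * r for r in residuals) / (len(x1) - 2)) ** 0.5
    return a, b, sd


def impute_once(x1, x2, rng):
    """Return one completed copy of x1, drawing missing values from X1 | X2."""
    complete = [(u, v) for u, v in zip(x2, x1) if v is not None]
    # Bootstrap the complete cases so the fitted line varies between imputations.
    # Assumes enough complete cases that the resample has more than one distinct x2.
    boot = [rng.choice(complete) for _ in complete]
    a, b, sd = fit_line([u for u, _ in boot], [v for _, v in boot])
    return [
        v if v is not None else a + b * u + rng.gauss(0, sd)
        for u, v in zip(x2, x1)
    ]


def multiply_impute(x1, x2, m=5, seed=0):
    """Create m completed copies of x1."""
    rng = random.Random(seed)
    return [impute_once(x1, x2, rng) for _ in range(m)]


# Hypothetical data: X2 fully observed, every fourth X1 missing (None).
data_rng = random.Random(1)
x2 = [float(i) for i in range(40)]
x1 = [None if i % 4 == 0 else 2 * u + data_rng.gauss(0, 1) for i, u in enumerate(x2)]

imputed_sets = multiply_impute(x1, x2, m=5)
print([round(copy[0], 2) for copy in imputed_sets])  # imputed value for record 0 in each copy
```

Each completed copy would then be analysed in the usual way and the results combined with Rubin's rules.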

Example 1
Longitudinal Breast Cancer study
– Outcome: early death or disease recurrence
– Explanatory variables: age, meno, tam
Analysis: Cox regression

How much is missing?
variables with no mv's: id meno rectime censrec _st _d _t _t0 lnt
Variable | type    obs    mv    variable label
age      | float                age, years
tam      | byte                 hormonal therapy
N: 686

CC Analysis
Cox regression -- Breslow method for ties
No. of subjects = 452          Number of obs = 452
No. of failures = 193
Time at risk    =
                               LR chi2(3)    = 5.15
Log likelihood  =              Prob > chi2   =
_t   | Haz. Ratio   Std. Err.   z   P>|z|   [95% Conf. Interval]
age  |
tam  |
meno |

MI in Practice
Stata: ICE
– Multiple Imputation by Chained Equations (MICE)
– Univariate imputation: uvis
– Multivariate imputation: ice

[Figure: density graphs by agemiss; x-axis: Age (years); y-axis: Density]

MI Analysis
mim: stcox age tam meno
Multiple-imputation estimates (stcox)
Imputations = 5
Minimum obs = 686
Minimum dof =
_t   | Haz. Rat.   Std. Err.   t   P>|t|   [95% Conf. Int.]   FMI
age  |
tam  |
meno |

Summary
– Most studies will have missing data
– MI is suitable: it gives less biased estimates and SEs under MAR and MCAR
– MI is a useful tool for dealing with missing data