The bane of data analysis

Presentation transcript:

Missing values: The bane of data analysis

Missing values in computations
Totals using arithmetic
Totals using the statistical function “sum”
These methods treat missing values differently
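The difference between the two kinds of total can be sketched in a few lines of Python. This is only an illustration, not the formula editor the slide shows: it assumes missing values are coded as NaN, and the helper names `total_arithmetic` and `total_skip_missing` are made up for the example.

```python
import math

MISSING = math.nan  # assumption: a missing value is represented as NaN

def total_arithmetic(values):
    """Add values with plain '+': one missing value makes the whole total missing."""
    total = 0.0
    for v in values:
        total += v  # NaN propagates through ordinary arithmetic
    return total

def total_skip_missing(values):
    """Add only the non-missing values, the way a 'sum'-style function typically does."""
    return sum(v for v in values if not math.isnan(v))

scores = [4.0, MISSING, 5.0]
print(total_arithmetic(scores))    # nan
print(total_skip_missing(scores))  # 9.0
```

Neither answer is automatically right. The arithmetic total flags the row as incomplete, while the skipping total quietly treats the missing item as contributing nothing, so a respondent who skipped items looks like a low scorer. In this sketch a row with every item missing would get a total of 0 rather than missing.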

Statistical functions for formulas

Beware: Missing data can introduce bias
Depends on why data are missing:
  MCAR – Missing Completely At Random
  MAR – Missing At Random
  MNAR – Missing Not At Random
… and on how many values are missing

When does missing data introduce bias?
When the missing data would have a different distribution than the available data.
Examples:
In your clinical data, patients who are younger and healthier are less likely to have their blood pressures taken (MAR), but you think the BPs you have are representative of the ones you don’t within each subgroup, and you have other variables that can identify the relevant subgroups.
Patients who don’t have good insurance are less likely to be in your clinical data because they don’t seek medical care (MNAR) – and you can’t assume the data you have are similar to the missing data.
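A small simulation can make the point concrete. The sketch below is hypothetical throughout: the blood pressure distribution, the missingness probabilities, and the helper `observe` are invented for illustration and do not come from the presentation's data.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical "true" systolic blood pressures for 20,000 patients.
true_bp = [rng.gauss(120, 15) for _ in range(20_000)]

def observe(values, prob_missing):
    """Keep each value unless it goes missing; prob_missing(v) is that chance."""
    return [v for v in values if rng.random() >= prob_missing(v)]

# MCAR: every value has the same 30% chance of being missing.
mcar = observe(true_bp, lambda v: 0.30)

# MNAR: low readings are far more likely to be missing than high ones.
mnar = observe(true_bp, lambda v: 0.60 if v < 120 else 0.10)

true_mean = statistics.fmean(true_bp)
mcar_bias = statistics.fmean(mcar) - true_mean
mnar_bias = statistics.fmean(mnar) - true_mean
print(f"true mean {true_mean:.1f}")
print(f"MCAR bias {mcar_bias:+.2f}")
print(f"MNAR bias {mnar_bias:+.2f}")
```

Under these assumptions the MCAR bias should come out close to zero, while the MNAR mean is pushed clearly upward because the values that went missing were disproportionately low.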

When does missing data NOT introduce bias?
When there’s no pattern to missingness (MCAR)
But, if too many values are missing, then the variable is not informative
But, in a multivariate analysis, the combination of missing values may kick out too many observations
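A quick calculation shows how fast the last problem grows. The sketch assumes each variable is missing independently with the same probability, which is a simplification; the function name is made up for the example.

```python
def expected_complete_fraction(p_missing, n_variables):
    """Expected share of rows with no missing values, assuming every variable
    is missing independently with the same probability (a simplification)."""
    return (1 - p_missing) ** n_variables

for k in (1, 5, 10, 20):
    share = expected_complete_fraction(0.05, k)
    print(f"{k:2d} variables, 5% missing each: {share:.1%} complete cases")
```

With only 5% missing per variable, a 20-variable analysis would be expected to keep roughly 36% of its observations under this assumption. Real missingness is usually correlated across variables, so the actual loss is often smaller, but it should always be checked.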

What happens in data analysis?
Computations with missing values result in missing values (with some exceptions).
Thus, in most multivariate procedures, all observations with a missing value for any of the variables being used are thrown out.
The result is a “complete case analysis”, which is sometimes OK for MCAR.
Beware: different procedures using the same data may be using different observations due to the missing value pattern.
You may want to delete the observations with missing values up front.
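The warning about different procedures using different observations can be seen on a toy dataset. The rows, variable names, and the helper `usable_rows` below are hypothetical, and `None` stands in for a missing value.

```python
# Hypothetical records; None marks a missing value.
rows = [
    {"age": 34,   "bp": 118,  "chol": 190},
    {"age": 51,   "bp": None, "chol": 220},
    {"age": None, "bp": 131,  "chol": 205},
    {"age": 46,   "bp": 125,  "chol": None},
    {"age": 29,   "bp": 110,  "chol": 180},
]

def usable_rows(rows, variables):
    """Rows a procedure keeps if it drops any row missing one of `variables`."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

# Two analyses on the same data keep different observations.
age_bp = usable_rows(rows, ["age", "bp"])     # keeps rows 1, 4, 5
bp_chol = usable_rows(rows, ["bp", "chol"])   # keeps rows 1, 3, 5

# Deleting incomplete rows up front gives every analysis the same sample.
complete = usable_rows(rows, ["age", "bp", "chol"])  # keeps rows 1, 5

print(len(age_bp), len(bp_chol), len(complete))
```

Both two-variable analyses report the same sample size here, yet they are not based on the same people. Restricting to the complete cases first makes the analyses comparable, at the cost of a smaller sample.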

Possible Solutions
MCAR – complete case analysis or imputation
MAR – imputation based on other variables that are related to the missing data pattern, based on expert judgment
MNAR – be very careful trying to draw an inference
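For the MAR case, one very simple form of imputation based on another variable is to fill each missing value with the mean of the observed values in the same subgroup. The sketch below assumes an `age_group` variable explains the missingness; the records and the function `impute_by_group` are hypothetical, and this is not presented as the method the slides have in mind.

```python
import statistics

# Hypothetical records: (age_group, systolic_bp); None marks a missing BP.
# Younger patients are missing more often, as in a MAR setting.
records = [
    ("young", 110), ("young", None), ("young", 114), ("young", None),
    ("older", 135), ("older", 141), ("older", 138), ("older", None),
]

def impute_by_group(records):
    """Replace each missing value with the mean of observed values in its group."""
    observed = {}
    for group, bp in records:
        if bp is not None:
            observed.setdefault(group, []).append(bp)
    group_mean = {g: statistics.fmean(vals) for g, vals in observed.items()}
    return [(g, bp if bp is not None else group_mean[g]) for g, bp in records]

completed = impute_by_group(records)
complete_case_mean = statistics.fmean(bp for _, bp in records if bp is not None)
imputed_mean = statistics.fmean(bp for _, bp in completed)
print(complete_case_mean, imputed_mean)
```

The complete case mean is pulled toward the older group because younger patients are under-represented among the observed values, and imputing within age group corrects for that. Filling in a single mean understates the variability of the data, and the sketch would fail for a group with no observed values at all.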

Other considerations
Informative missing – the fact of missingness can be an interesting category in itself.
The loss of observations in a model due to missing values will decrease your power. Are you better off leaving a problematic variable out of your model?
Always check your statistical reports for the actual sample size included.
Check whether the observations with missing values are comparable to the rest in terms of your outcome or other important variables.
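One way to act on the first and last points is to turn missingness into a category and compare the outcome across it. The data and variable names below (`income`, a satisfaction score as the outcome) are hypothetical.

```python
import statistics

# Hypothetical rows: (income, satisfaction); income is sometimes missing (None).
rows = [
    (52_000, 7), (None, 4), (61_000, 8), (None, 5),
    (48_000, 6), (75_000, 9), (None, 3), (58_000, 7),
]

# Treat missingness itself as a category and collect the outcome for each group.
by_status = {"reported": [], "missing": []}
for income, outcome in rows:
    status = "missing" if income is None else "reported"
    by_status[status].append(outcome)

for status, outcomes in by_status.items():
    mean_outcome = statistics.fmean(outcomes)
    print(f"income {status}: n={len(outcomes)}, mean outcome={mean_outcome:.2f}")
```

In this made-up example the respondents who did not report income have a noticeably lower mean outcome, which would suggest the missingness is informative and that dropping those rows could bias the results.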

An ounce of prevention
Consider the “Required response” option for crucial variables.
But this can frustrate the respondent, so test your questions to make sure they’re not confusing.
Incentivize the respondents to complete the survey.

(JMP) Tables > Missing Data Pattern
For the 20 PCA items in lei_krupdat_data_for_fellows.jmp

(JMP) Tables > Missing Data Pattern
For the 3 Cluster items based on the 20 PCA items in lei_krupdat_data_for_fellows.jmp
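The idea behind a missing data pattern table can be sketched outside JMP as well: encode each row by which columns are missing, then count how often each pattern occurs. The sketch below uses made-up item responses and is not a reproduction of JMP's output or of the data file named above.

```python
from collections import Counter

# Hypothetical item responses; None marks a missing answer.
columns = ["item1", "item2", "item3"]
rows = [
    (3, 4, 5),
    (2, None, 4),
    (5, 5, None),
    (1, None, 2),
    (None, None, 3),
    (4, 4, 4),
]

def missing_pattern(row):
    """Encode a row as a string of flags, one per column: 1 where the value is missing."""
    return "".join("1" if v is None else "0" for v in row)

pattern_counts = Counter(missing_pattern(r) for r in rows)

print("columns:", ", ".join(columns))
print("pattern  n_missing  count")
for pattern, count in sorted(pattern_counts.items()):
    print(f"{pattern:7}  {pattern.count('1'):9}  {count:5}")
```

The count for the all-zero pattern is the number of complete cases, and a pattern with a large count points to the variable or combination of variables that is costing the most observations.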