SW 983 Missing Data Treatment Most of the slides presented here are from the Modern Missing Data Methods, 2011, 5 day course presented by the KUCRMDA,

Presentation transcript:

SW 983 Missing Data Treatment Most of the slides presented here are from the Modern Missing Data Methods, 2011, 5 day course presented by the KUCRMDA, Instructors: Wei Wu and Mijke Rhemtulla

Why Do We Care? A major goal of inferential statistics is the estimation of parameters and their standard errors. When our parameter estimates are systematically off, they are said to be biased. When our estimates of standard errors are wrong, it affects our ability to do significance tests. Missing data can affect both.

Old Methods
Deletion methods (listwise, pairwise deletion)
– Ignore data from the subset of participants who have missing values on some variables (listwise deletion = throw out whole cases; pairwise deletion = selectively drop cases depending on the computation)
– Throwing out data leads to loss of power
– Selectively deleting data/ignoring missing data can lead to a high degree of bias
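The difference between the two deletion rules can be sketched on a toy data set. This is a minimal illustration only: the variable names (y, x1, x2) and the values are made up, and None stands in for a missing value.

```python
# Toy data set: None marks a missing value (values are made up for illustration).
rows = [
    {"y": 10.0, "x1": 1.0, "x2": 5.0},
    {"y": 12.0, "x1": None, "x2": 6.0},
    {"y": None, "x1": 3.0, "x2": 7.0},
    {"y": 15.0, "x1": 4.0, "x2": None},
    {"y": 18.0, "x1": 5.0, "x2": 9.0},
]

def listwise(data):
    """Listwise deletion: keep only cases with no missing value on any variable."""
    return [r for r in data if all(v is not None for v in r.values())]

def pairwise(data, a, b):
    """Pairwise deletion: keep cases observed on both a and b, whatever else is missing."""
    return [(r[a], r[b]) for r in data if r[a] is not None and r[b] is not None]

n_listwise = len(listwise(rows))
n_pairs = {
    (a, b): len(pairwise(rows, a, b))
    for a, b in [("y", "x1"), ("y", "x2"), ("x1", "x2")]
}

print("cases kept by listwise deletion:", n_listwise)
for pair, count in n_pairs.items():
    print("cases available for", pair, ":", count)
```

Listwise deletion keeps only the fully observed cases, while each pairwise computation draws on a different, larger subset, so statistics computed under pairwise deletion are based on different cases from one pair of variables to the next.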

Old Methods
Single-imputation methods
– Mean imputation: replace each missing value with the variable’s mean. Adds no new information.
– Regression imputation: use linear regression to predict each missing observation based on the other variables that are present.

Mean imputation (3. Traditional Methods)
Advantages:
– Filling in missing values results in complete N
– Can use all available data
Disadvantages:
– Will give biased statistics (correlations, regression coefficients, standard deviations, path coefficients) under any kind of missingness
– Correlations, covariances, and regression/path coefficients will be too weak
– Variances and standard deviations will be too small
– “The method achieves little except the illusion of progress” (Little & Rubin, 1990, p.380)
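The shrinkage of standard deviations and correlations under mean imputation can be checked with a small simulation. This is a sketch under stated assumptions, not part of the original slides: the data are simulated with a true correlation of about 0.6, 40% of y is deleted completely at random, and all names are illustrative.

```python
import random
import statistics

random.seed(1)
n = 2000

# Simulated complete data: var(y) is about 1 and corr(x, y) is about 0.6.
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.6 * xi + random.gauss(0, 0.8) for xi in x]

def corr(a, b):
    """Pearson correlation computed from sums of squares and cross-products."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5

# Delete 40% of y completely at random (assumed rate, for illustration).
missing = [random.random() < 0.4 for _ in range(n)]
x_obs = [xi for xi, m in zip(x, missing) if not m]
y_obs = [yi for yi, m in zip(y, missing) if not m]

# Mean imputation: every missing y is replaced by the observed mean of y.
y_mean = statistics.fmean(y_obs)
y_imp = [y_mean if m else yi for yi, m in zip(y, missing)]

sd_observed = statistics.stdev(y_obs)
sd_imputed = statistics.stdev(y_imp)
r_observed = corr(x_obs, y_obs)
r_imputed = corr(x, y_imp)

print("SD of y, observed cases only:", round(sd_observed, 3))
print("SD of y, after mean imputation:", round(sd_imputed, 3))
print("corr(x, y), observed cases only:", round(r_observed, 3))
print("corr(x, y), after mean imputation:", round(r_imputed, 3))
```

Because every imputed value sits exactly at the mean, the imputed cases add nothing to the sum of squares or the cross-products while still inflating N, which is why both the standard deviation and the correlation shrink.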

Regression imputation
Note that in order to get a predicted value for every missing value, the predictors cannot themselves have missing values.
– When computing the regression equations, you must use either a deletion method or mean imputation
– When filling in missing values, you must use mean imputation to substitute into the regression equations
– Alternatively, only use complete predictors
[Example data table with columns Subject, Y, X1, X2; values not recovered]

Regression imputation
Advantages:
– Borrows information from observed data
– Point estimates of missing observations are more accurate than mean imputation
Disadvantages:
– Will give biased statistics (correlations, regression coefficients, standard deviations, path coefficients) under any kind of missingness
– Correlations and R² involving the imputed variable will be too strong, because every imputed value falls exactly on the regression line
– Variances and standard deviations will be too small
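A small simulation makes it possible to check the direction of each distortion under regression imputation and to compare its point estimates with those of mean imputation. This is a sketch under assumed conditions (simulated data, one complete predictor x, 40% of y deleted completely at random); none of the names or values come from the slides.

```python
import random
import statistics

random.seed(2)
n = 2000

# Simulated complete data with a single, fully observed predictor x.
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.6 * xi + random.gauss(0, 0.8) for xi in x]
missing = [random.random() < 0.4 for _ in range(n)]  # True = y is missing

x_obs = [xi for xi, m in zip(x, missing) if not m]
y_obs = [yi for yi, m in zip(y, missing) if not m]

# Ordinary least squares of y on x, fitted on the complete cases only.
mx, my = statistics.fmean(x_obs), statistics.fmean(y_obs)
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x_obs, y_obs))
         / sum((xi - mx) ** 2 for xi in x_obs))
intercept = my - slope * mx

# Fill in each missing y with its predicted value (regression imputation)
# or with the observed mean of y (mean imputation, for comparison).
y_reg = [intercept + slope * xi if m else yi for xi, yi, m in zip(x, y, missing)]
y_mean = [my if m else yi for yi, m in zip(y, missing)]

def corr(a, b):
    """Pearson correlation."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return sab / (sum((ai - ma) ** 2 for ai in a)
                  * sum((bi - mb) ** 2 for bi in b)) ** 0.5

def rmse(imputed):
    """Root mean squared error of the imputed values against the true deleted values."""
    errors = [(imp - true) ** 2 for imp, true, m in zip(imputed, y, missing) if m]
    return statistics.fmean(errors) ** 0.5

sd_observed, sd_reg = statistics.stdev(y_obs), statistics.stdev(y_reg)
r_observed, r_reg = corr(x_obs, y_obs), corr(x, y_reg)
rmse_reg, rmse_mean = rmse(y_reg), rmse(y_mean)

print("SD of y: observed", round(sd_observed, 3), "| regression-imputed", round(sd_reg, 3))
print("corr(x, y): observed", round(r_observed, 3), "| regression-imputed", round(r_reg, 3))
print("RMSE of imputed values: regression", round(rmse_reg, 3), "| mean", round(rmse_mean, 3))
```

The true values of the deleted observations are known here only because the data are simulated; with real data the RMSE comparison is not available.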

Patterns and Mechanisms
Mechanisms describe why data are missing
– can affect the ease of recovering relations among variables
Patterns describe where data are missing
– can affect the ease of recovering relations among variables
– can affect the extent to which results will be biased

Missing Patterns (2. Patterns and Mechanisms)
[Figure: three missing-data patterns, labeled A, B, and C, each shown across variables Y1, Y2, Y3, Y4]

MISSING MECHANISMS
1. Missing Completely at Random (MCAR)
2. Missing at Random (MAR)
3. Missing Not at Random (MNAR or NMAR)

MCAR
1. Completely Random Missingness (MCAR)
For example: You measure SES and math scores
-Some students were randomly selected to go on a field trip on the day of testing and missed the test
-You collected data from all students and your dog ate some of them
-Some students forgot, at random, to fill in the answer sheet for some items

MCAR 1. Completely Random Missingness (MCAR) Prognosis: great! Almost any method of analysis will lead to unbiased results. Modern methods (Imputation, Maximum Likelihood) will give you more power than older methods (e.g., Deletion).

MAR 2. Random Missingness (MAR) The reason data are missing is unrelated to the missing values after controlling for the relation between missingness and measured variables.

MAR
2. Random Missingness (MAR)
For example: You measure SES and math scores
-Low-SES children tend to have poorer math scores and are more likely to be absent from testing, AND after accounting for SES, there is no further relation between math scores and the propensity for data to be missing.

MAR
2. Random Missingness (MAR)
Prognosis: Good. IF you use modern missing data methods that account for the relations between observed variables and missingness, results will be unbiased and power will be recovered. Using old methods (e.g., deletion), results will be biased.

MNAR
3. Non-Random Missingness (MNAR or NMAR)
For example: You measure SES and math scores
-Children with low math scores are more likely to avoid writing the test. Even after accounting for SES, math scores continue to be related to the propensity for missingness
-All children write the test but they skip items that they find difficult. Whether an item is missing is directly related to the child’s true score on that item
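The three mechanisms can be contrasted in a short simulation of the SES and math example. This is a hypothetical sketch: the effect size, the missingness probabilities (0.3, 0.6, 0.1), and the cut at SES = 0 are all assumptions chosen for illustration. It reports how far the observed-case mean of math is from the full-data mean, overall and within the low-SES group.

```python
import random
import statistics

random.seed(3)
n = 5000

# Simulated SES and math scores; math depends on SES (assumed effect of 0.5).
ses = [random.gauss(0, 1) for _ in range(n)]
math_score = [0.5 * s + random.gauss(0, 0.866) for s in ses]

# Missingness indicators for math (True = missing) under each mechanism.
mcar = [random.random() < 0.3 for _ in range(n)]                         # unrelated to anything
mar = [random.random() < (0.6 if s < 0 else 0.1) for s in ses]           # depends on SES only
mnar = [random.random() < (0.6 if m < 0 else 0.1) for m in math_score]   # depends on math itself

def bias(missing, keep=lambda s: True):
    """Observed-case mean of math minus the full-data mean, among cases where keep(ses) holds."""
    full = [m for m, s in zip(math_score, ses) if keep(s)]
    observed = [m for m, s, r in zip(math_score, ses, missing) if keep(s) and not r]
    return statistics.fmean(observed) - statistics.fmean(full)

def low_ses(s):
    return s < 0

mechanisms = [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]
overall = {name: bias(r) for name, r in mechanisms}
within_low_ses = {name: bias(r, low_ses) for name, r in mechanisms}

for name, _ in mechanisms:
    print(name, "| overall bias:", round(overall[name], 3),
          "| bias within low-SES group:", round(within_low_ses[name], 3))
```

Under the MAR setup the overall observed mean is biased, but the bias largely disappears once the comparison is made within an SES group; under the MNAR setup it remains even after conditioning on SES, which is the distinction the examples above draw.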

MNAR → MAR
To the extent that it is possible to collect data that correlate with the missingness (R), MNAR data can be made to approximate MAR.
– e.g., if math ability and reading ability are highly correlated, then reading scores might predict MNAR missingness on math scores. If the missingness can be predicted, it becomes MAR missingness.

MCAR TESTS
How do you know what kind of missingness you have?
MAR/MNAR are impossible to test for
But there are tests for MCAR
– If missingness is MCAR, then observed data should be no different from missing data
– We can measure this by examining group differences on a predictor (“auxiliary”) variable, e.g., we can ask whether those who are missing a math score have significantly different SES scores than those whose math scores are not missing
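The group-difference check described above can be sketched as a two-sample comparison of SES between cases with and without a math score. This is an illustrative sketch, not the procedure used in the course: the data are simulated, Welch's t statistic is one reasonable choice of test, and the p-value uses a large-sample normal approximation.

```python
import random
import statistics

random.seed(4)
n = 2000
ses = [random.gauss(0, 1) for _ in range(n)]

# Two hypothetical missingness indicators for the math score (True = missing).
miss_mcar = [random.random() < 0.35 for _ in range(n)]               # unrelated to SES
miss_mar = [random.random() < (0.6 if s < 0 else 0.1) for s in ses]  # depends on SES

def welch_t(a, b):
    """Welch's t statistic for a difference in means (unequal variances allowed)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se

def ses_difference_test(missing):
    """Compare SES between cases missing math and cases with math observed."""
    ses_missing = [s for s, r in zip(ses, missing) if r]
    ses_observed = [s for s, r in zip(ses, missing) if not r]
    t = welch_t(ses_missing, ses_observed)
    # Large-sample normal approximation to the two-sided p-value.
    p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, p

t_mcar, p_mcar = ses_difference_test(miss_mcar)
t_mar, p_mar = ses_difference_test(miss_mar)

print("MCAR-style missingness: t =", round(t_mcar, 2), "p =", round(p_mcar, 4))
print("SES-dependent missingness: t =", round(t_mar, 2), "p =", round(p_mar, 4))
```

A significant difference is evidence against MCAR and flags SES as a candidate auxiliary variable. A non-significant difference does not establish MCAR, since missingness could still depend on variables that were not examined or on the missing values themselves.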

MCAR TESTS
Is it worth testing for MCAR?
– Most naturally-occurring missingness is not MCAR
– Even if missingness is MCAR, new methods are still better (i.e., more powerful) than old methods
– New methods don’t distinguish between MCAR and MAR
– But: MCAR tests may be useful for identifying variables relevant to the MAR process (i.e., auxiliary variables)
Auxiliary variables are those that we include in the analysis because they predict missingness

Patterns and Mechanisms: Summary
– Patterns of missingness can indicate different causes of missingness (attrition, planned missing, nonresponse)
– The number and kind of missing patterns can affect covariance coverage and fraction of missing information, resulting in better or worse parameter estimates
– MCAR missingness is the only missing mechanism that does not lead to bias with traditional methods (e.g., deletion), and it is also the rarest
– MAR missingness is attainable by measuring covariates that predict missingness
– MNAR missingness is related to the missing values themselves and will result in poor-quality estimates
– Tests for MCAR are possible but of questionable value

FIML vs. Multiple Imputation (Note: with large samples they produce the same results)
Advantages of Multiple Imputation
– Use of auxiliary variables (though Mplus does allow them with FIML)
– Treatment of incomplete explanatory variables (?)
– Item-level imputation (i.e., item nonresponse)
Advantages of Maximum Likelihood
– Estimating interaction terms
– SEM
– Fewer procedural ambiguities (i.e., it’s easier to do)