1crmda.KU.edu Todd D. Little University of Kansas Director, Quantitative Training Program Director, Center for Research Methods and Data Analysis Director,

Slides:



Advertisements
Similar presentations
Handling attrition and non- response in longitudinal data Harvey Goldstein University of Bristol.
Advertisements

Treatment of missing values
1crmda.KU.edu Todd D. Little University of Kansas Director, Quantitative Training Program Director, Center for Research Methods and Data Analysis Director,
1crmda.KU.edu Todd D. Little University of Kansas Director, Quantitative Training Program Director, Center for Research Methods and Data Analysis Director,
1 Multiple Frame Surveys Tracy Xu Kim Williamson Department of Statistical Science Southern Methodist University.
Some birds, a cool cat and a wolf
Survey Inference with Incomplete Data Trivellore Raghunathan Chair and Professor of Biostatistics, School of Public Health Research Professor, Institute.
How to Handle Missing Values in Multivariate Data By Jeff McNeal & Marlen Roberts 1.
Resampling techniques Why resampling? Jacknife Cross-validation Bootstrap Examples of application of bootstrap.
Resampling techniques
Missing Data in Randomized Control Trials
Independent Sample T-test Often used with experimental designs N subjects are randomly assigned to two groups (Control * Treatment). After treatment, the.
How to deal with missing data: INTRODUCTION
Partially Missing At Random and Ignorable Inferences for Parameter Subsets with Missing Data Roderick Little Rennes
Laurent Itti: CS599 – Computational Architectures in Biological Vision, USC Lecture 7: Coding and Representation 1 Computational Architectures in.
Chapter 9 Flashcards. measurement method that uses uniform procedures to collect, score, interpret, and report numerical results; usually has norms and.
Statistical Methods for Missing Data Roberta Harnett MAR 550 October 30, 2007.
Multiple imputation using ICE: A simulation study on a binary response Jochen Hardt Kai Görgen 6 th German Stata Meeting, Berlin June, 27 th 2008 Göteborg.
Modeling Menstrual Cycle Length in Pre- and Peri-Menopausal Women Michael Elliott Xiaobi Huang Sioban Harlow University of Michigan School of Public Health.
Modeling errors in physical activity data Sarah Nusser Department of Statistics and Center for Survey Statistics and Methodology Iowa State University.
1crmda.KU.edu Todd D. Little University of Kansas Director, Quantitative Training Program Director, Center for Research Methods and Data Analysis Director,
Model Inference and Averaging
PARAMETRIC STATISTICAL INFERENCE
1crmda.KU.edu Todd D. Little University of Kansas Director, Quantitative Training Program Director, Center for Research Methods and Data Analysis Director,
Sampling Design and Analysis MTH 494 Ossam Chohan Assistant Professor CIIT Abbottabad.
1 Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis Survey Research Laboratory University of Illinois.
Performance of Resampling Variance Estimation Techniques with Imputed Survey data.
G Lecture 11 G Session 12 Analyses with missing data What should be reported?  Hoyle and Panter  McDonald and Moon-Ho (2002)
Applied Epidemiologic Analysis - P8400 Fall 2002 Lab 10 Missing Data Henian Chen, M.D., Ph.D.
Defining Success Understanding Statistical Vocabulary.
DTC Quantitative Methods Survey Research Design/Sampling (Mostly a hangover from Week 1…) Thursday 17 th January 2013.
Sampling Design and Analysis MTH 494 LECTURE-12 Ossam Chohan Assistant Professor CIIT Abbottabad.
Imputation for Multi Care Data Naren Meadem. Introduction What is certain in life? –Death –Taxes What is certain in research? –Measurement error –Missing.
1 Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis Survey Research Laboratory University of Illinois.
SW 983 Missing Data Treatment Most of the slides presented here are from the Modern Missing Data Methods, 2011, 5 day course presented by the KUCRMDA,
1crmda.KU.edu Todd D. Little University of Kansas Director, Quantitative Training Program Director, Center for Research Methods and Data Analysis Director,
© John M. Abowd 2007, all rights reserved General Methods for Missing Data John M. Abowd March 2007.
1 G Lect 13W Imputation (data augmentation) of missing data Multiple imputation Examples G Multiple Regression Week 13 (Wednesday)
Missing Values Raymond Kim Pink Preechavanichwong Andrew Wendel October 27, 2015.
Lecture 2: Statistical learning primer for biologists
Multivariate Analysis and Data Reduction. Multivariate Analysis Multivariate analysis tries to find patterns and relationships among multiple dependent.
1crmda.KU.edu Todd D. Little University of Kansas Director, Quantitative Training Program Director, Center for Research Methods and Data Analysis Director,
A shared random effects transition model for longitudinal count data with informative missingness Jinhui Li Joint work with Yingnian Wu, Xiaowei Yang.
Traits – The Big Five Kimberley A. Clow
Tutorial I: Missing Value Analysis
Multiple Imputation using SAS Don Miller 812 Oswald Tower
1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.
Pre-Processing & Item Analysis DeShon Pre-Processing Method of Pre-processing depends on the type of measurement instrument used Method of Pre-processing.
Parameter Estimation. Statistics Probability specified inferred Steam engine pump “prediction” “estimation”
A framework for multiple imputation & clustering -Mainly basic idea for imputation- Tokei Benkyokai 2013/10/28 T. Kawaguchi 1.
Density Estimation in R Ha Le and Nikolaos Sarafianos COSC 7362 – Advanced Machine Learning Professor: Dr. Christoph F. Eick 1.
DATA STRUCTURES AND LONGITUDINAL DATA ANALYSIS Nidhi Kohli, Ph.D. Quantitative Methods in Education (QME) Department of Educational Psychology 1.
Research and Evaluation Methodology Program College of Education A comparison of methods for imputation of missing covariate data prior to propensity score.
Missing data: Why you should care about it and what to do about it
Multiple Imputation using SOLAS for Missing Data Analysis
The Centre for Longitudinal Studies Missing Data Strategy
Maximum Likelihood & Missing data
Introduction to Survey Data Analysis
Multiple Imputation.
Multiple Imputation Using Stata
How to handle missing data values
Remember that our objective is for some density f(y|) for observations where y and  are vectors of data and parameters,  being sampled from a prior.
The European Statistical Training Programme (ESTP)
EM for Inference in MV Data
Missing Data Mechanisms
EM for Inference in MV Data
Rachael Bedford Mplus: Longitudinal Analysis Workshop 23/06/2015
Clinical prediction models
Chapter 13: Item nonresponse
Missing data: Is it all the same?
Presentation transcript:

1crmda.KU.edu Todd D. Little University of Kansas Director, Quantitative Training Program Director, Center for Research Methods and Data Analysis Director, Undergraduate Social and Behavioral Sciences Methodology Minor Member, Developmental Psychology Training Program crmda. KU.edu Colloquium presented University of California Merced Special Thanks to: Mijke Rhemtulla & Wei Wu On the Merits of Planning and Planning for Missing Data* *You’re a fool for not using planned missing data design On the Merits of Planning and Planning for Missing Data* *You’re a fool for not using planned missing data design

2crmda.KU.edu Road Map Learn about the different types of missing data Learn about ways in which the missing data process can be recovered Understand why imputing missing data is not cheating Learn why NOT imputing missing data is more likely to lead to errors in generalization! Learn about intentionally missing designs

3crmda.KU.edu Key Considerations Recoverability Is it possible to recover what the sufficient statistics would have been if there was no missing data? (sufficient statistics = means, variances, and covariances) Is it possible to recover what the parameter estimates of a model would have been if there was no missing data. Bias Are the sufficient statistics/parameter estimates systematically different than what they would have been had there not been any missing data? Power Do we have the same or similar rates of power (1 – Type II error rate) as we would without missing data?

4crmda.KU.edu Types of Missing Data Missing Completely at Random (MCAR) No association with unobserved variables (selective process) and no association with observed variables Missing at Random (MAR) No association with unobserved variables, but maybe related to observed variables Random in the statistical sense of predictable Non-random (Selective) Missing (MNAR) Some association with unobserved variables and maybe with observed variables

5crmda.KU.edu Effects of imputing missing data

6crmda.KU.edu Effects of imputing missing data No Association with Observed Variable(s) An Association with Observed Variable(s) No Association with Unobserved /Unmeasured Variable(s) MCAR Fully recoverable Fully unbiased MAR Partly to fully recoverable Less biased to unbiased An Association with Unobserved /Unmeasured Variable(s) NMAR Unrecoverable Biased (same bias as not estimating) MAR/NMAR Partly recoverable Same to unbiased

7crmda.KU.edu No Association with ANY Observed Variable An Association with Analyzed Variables An Association with Unanalyzed Variables No Association with Unobserved /Unmeasured Variable(s) MCAR Fully recoverable Fully unbiased MAR Partly to fully recoverable Less biased to unbiased MAR Partly to fully recoverable Less biased to unbiased An Association with Unobserved /Unmeasured Variable(s) NMAR Unrecoverable Biased (same bias as not estimating) MAR/NMAR Partly to fully recoverable Same to unbiased MAR/NMAR Partly to fully recoverable Same to unbiased Effects of imputing missing data Statistical Power: Will always be greater when missing data is imputed!

8crmda.KU.edu Modern Missing Data Analysis In 1978, Rubin proposed Multiple Imputation (MI) An approach especially well suited for use with large public-use databases. First suggested in 1978 and developed more fully in MI primarily uses the Expectation Maximization (EM) algorithm and/or the Markov Chain Monte Carlo (MCMC) algorithm. Beginning in the 1980’s, likelihood approaches developed. Multiple group SEM Full Information Maximum Likelihood (FIML). An approach well suited to more circumscribed models MI or FIML

9crmda.KU.edu Full Information Maximum Likelihood FIML maximizes the case-wise -2loglikelihood of the available data to compute an individual mean vector and covariance matrix for every observation. Since each observation’s mean vector and covariance matrix is based on its own unique response pattern, there is no need to fill in the missing data. Each individual likelihood function is then summed to create a combined likelihood function for the whole data frame. Individual likelihood functions with greater amounts of missing are given less weight in the final combined likelihood function than those will a more complete response pattern, thus controlling for the loss of information. Formally, the function that FIML is maximizing is where

10crmda.KU.edu Multiple Imputation Multiple imputation involves generating m imputed datasets (usually between 20 and 100), running the analysis model on each of these datasets, and combining the m sets of results to make inferences. By filling in m separate estimates for each missing value we can account for the uncertainty in that datum’s true population value. Data sets can be generated in a number of ways, but the two most common approaches are through an MCMC simulation technique such as Tanner & Wong’s (1987) Data Augmentation algorithm or through bootstrapping likelihood estimates, such as the bootstrapped EM algorithm used by Amelia II. SAS uses data augmentation to pull random draws from a specified posterior distribution (i.e., stationary distribution of EM estimates). After m data sets have been created and the analysis model has been run on each separately, the resulting estimates are commonly combined with Rubin’s Rules (Rubin, 1987).

Fraction Missing Fraction Missing is a measure of efficiency lost due to missing data. It is the extent to which parameter estimates have greater standard errors than they would have had, had all the data been observed. It is a ratio of variances: Estimated parameter variance in the complete data set Between-imputation variance estimated parameter variance in the complete data set total parameter variance taking into account missingness 11crmda.KU.edu

12 Fraction Missing Fraction of Missing Information (asymptotic formula) Varies by parameter in the model Is typically smaller for MCAR than MAR data crmda.KU.edu

Figure 7. Simulation results showing XY correlation estimates (with 95 and 99% confidence intervals) associated with a 60% MAR Situation and 1 PCA auxiliary variable. 60% MAR correlation estimates with 1 PCA auxiliary variable (r =.60) 13 crmda.KU.edu

What goes in the Common Set? Three-form design FormCommon Set X Variable Set AVariable Set BVariable Set C 1¼ of items missing 2¼ of items missing¼ of items 3 missing¼ of items 14crmda.KU.edu

Three-form design: Example SubtestItem DemographicsHow old are you? Are you male or female? What is your occupation? Musical TasteWhat is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? OpennessI have a rich vocabulary. I have excellent ideas. I have a vivid imagination. SubtestItem ExtraversionI start conversations. I am the life of the party. I am comfortable around people. NeuroticismI get stressed out easily. I get irritated easily. I have frequent mood swings. ConscientiousnessI am always prepared. I like order. I pay attention to details. AgreeablenessI am interested in people. I have a soft heart. I take time out for others. 21 questions made up of 7 3-question subtests 15crmda.KU.edu

Three-form design: Example SubtestItem DemographicsHow old are you? Are you male or female? What is your occupation? Musical TasteWhat is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? OpennessI have a rich vocabulary. I have excellent ideas. I have a vivid imagination. SubtestItem ExtraversionI start conversations. I am the life of the party. I am comfortable around people. NeuroticismI get stressed out easily. I get irritated easily. I have frequent mood swings. ConscientiousnessI am always prepared. I like order. I pay attention to details. AgreeablenessI am interested in people. I have a soft heart. I take time out for others. Common Set (X) crmda.KU.edu

Three-form design: Example crmda.ku.edu SubtestItem DemographicsHow old are you? Are you male or female? What is your occupation? Musical TasteWhat is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? OpennessI have a rich vocabulary. I have excellent ideas. I have a vivid imagination. SubtestItem ExtraversionI start conversations. I am the life of the party. I am comfortable around people. NeuroticismI get stressed out easily. I get irritated easily. I have frequent mood swings. ConscientiousnessI am always prepared. I like order. I pay attention to details. AgreeablenessI am interested in people. I have a soft heart. I take time out for others. Common Set (X) 17

Three-form design: Example SubtestItem DemographicsHow old are you? Are you male or female? What is your occupation? Musical TasteWhat is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? OpennessI have a rich vocabulary. I have excellent ideas. I have a vivid imagination. SubtestItem ExtraversionI start conversations. I am the life of the party. I am comfortable around people. NeuroticismI get stressed out easily. I get irritated easily. I have frequent mood swings. ConscientiousnessI am always prepared. I like order. I pay attention to details. AgreeablenessI am interested in people. I have a soft heart. I take time out for others. Set A I have a rich vocabulary. I start conversations. I get stressed out easily. I am always prepared. I am interested in people. 18crmda.KU.edu

Three-form design: Example SubtestItem DemographicsHow old are you? Are you male or female? What is your occupation? Musical TasteWhat is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? OpennessI have a rich vocabulary. I have excellent ideas. I have a vivid imagination. SubtestItem ExtraversionI start conversations. I am the life of the party. I am comfortable around people. NeuroticismI get stressed out easily. I get irritated easily. I have frequent mood swings. ConscientiousnessI am always prepared. I like order. I pay attention to details. AgreeablenessI am interested in people. I have a soft heart. I take time out for others. Set B I have excellent ideas. I am the life of the party. I get irritated easily. I like order. I have a soft heart. 19 crmda.KU.edu

Three-form design: Example SubtestItem DemographicsHow old are you? Are you male or female? What is your occupation? Musical TasteWhat is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? OpennessI have a rich vocabulary. I have excellent ideas. I have a vivid imagination. SubtestItem ExtraversionI start conversations. I am the life of the party. I am comfortable around people. NeuroticismI get stressed out easily. I get irritated easily. I have frequent mood swings. ConscientiousnessI am always prepared. I like order. I pay attention to details. AgreeablenessI am interested in people. I have a soft heart. I take time out for others. Set C I have a vivid imagination. I am comfortable around people. I have frequent mood swings. I pay attention to details. I take time out for others. 20crmda.KU.edu

21 Form 1 (XAB)Form 2 (XAC)Form 3 (XBC) How old are you? Are you male or female? What is your occupation? How old are you? Are you male or female? What is your occupation? How old are you? Are you male or female? What is your occupation? What is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? What is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? What is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? I have a rich vocabulary. I have excellent ideas. I have a rich vocabulary. I have a vivid imagination. I have excellent ideas. I have a vivid imagination. I start conversations. I am the life of the party. I start conversations. I am comfortable around people. I am the life of the party. I am comfortable around people. I get stressed out easily. I get irritated easily. I get stressed out easily. I have frequent mood swings. I get irritated easily. I have frequent mood swings. I am always prepared. I like order. I am always prepared. I pay attention to details. I like order. I pay attention to details. I am interested in people. I have a soft heart. I am interested in people. I take time out for others. I have a soft heart. I take time out for others.

22crmda.KU.edu

Expansions of 3-Form Design (Graham, Taylor, Olchowski, & Cumsille, 2006) crmda.KU.edu23

Expansions of 3-Form Design (Graham, Taylor, Olchowski, & Cumsille, 2006) crmda.KU.edu24

25 2-Method Planned Missing Design crmda.KU.edu

Use when you have an ideal (highly valid) measure that is time-consuming or expensive By supplementing this measure with a less expensive or time-consuming measure, it is possible to increase total sample size and get higher power e.g., measuring stress Expensive measure = collect spit samples, measure cortisol Inexpensive measure = survey querying stressful thoughts e.g., measuring intelligence Expensive measure = WAIS IQ scale Inexpensive measure = multiple choice IQ test e.g., measuring smoking Expensive measure = carbon monoxide measure Inexpensive measure = self-report 2-Method Planned Missing Design 26crmda.KU.edu

Assumptions: expensive measure is unbiased (i.e., valid) inexpensive measure is systematically biased Using both measures (on a subset of participants) enables us to estimate and remove the bias from the inexpensive measure (for all participants) As the inexpensive measure gets more valid, fewer observations are needed on the expensive measure If inexpensive measure is perfectly unbiased, we don’t need the expensive measure at all! 2-Method Planned Missing Design 27crmda.KU.edu

Self- Report 1 Self- Report 2 COCotinine Smoking Self-Report Bias 2-Method Planned Missing Design 28crmda.KU.edu

2-Method Planned Missing Design 29crmda.KU.edu

2-Method Planned Missing Design 30crmda.KU.edu

K1 2 grade 1 student ;6- 4;11 5;0- 5;5 5;6- 5;11 age 6;0- 6;5 6;6- 6;11 7;0- 7;5 7;6- 7;11 5;6 5;3 4;9 4;6 4;11 5;7 5;2 5;4 6;7 6;0 5;11 5;5 5;9 6;7 6;1 6;5 7;4 6;10 7;3 6;10 7;5 6;4 7;3 7;6 31crmda.KU.edu

4;6- 4;11 5;0- 5;5 5;6- 5;11 age 6;0- 6;5 6;6- 6;11 7;0- 7;5 7;6- 7;11 5;6 5;3 4;9 4;6 4;11 5;7 5;2 5;4 6;7 6;0 5;11 5;5 5;9 6;7 6;1 6;5 7;4 6;10 7;3 6;10 7;5 6;4 7;3 7;6  Out of 3 waves, we create 7 waves of data with high missingness  Allows for more fine- tuned age-specific growth modeling  Even high amounts of missing data are not typically a problem for estimation 32crmda.KU.edu

Growth-Curve Design GroupTime 1Time 2Time 3Time 4Time 5 1xxxxx 2xxxxmissing 3xxx x 4xx xx 5x xxx 6 xxxx 33crmda.KU.edu

Growth Curve Design II GroupTime 1Time 2Time 3Time 4Time 5 1xxxxx 2xxxmissing 3xx x 4x xx 5 xxx 6xx x 7x x x 8 xx x 9x xx 10missingx xx 11missing xxx 34crmda.KU.edu

Growth Curve Design II GroupTime 1Time 2Time 3Time 4Time 5 1xxxxx 2xxxmissing 3xx x 4x xx 5 xxx 6xx x 7x x x 8 xx x 9x xx 10missingx xx 11missing xxx 35crmda.KU.edu

36crmda.KU.edu Thanks for your attention! Questions? crmda. KU.edu Colloquium presented University of California at Merced On the Merits of Planning and Planning for Missing Data* *You’re a fool for not using planned missing data design On the Merits of Planning and Planning for Missing Data* *You’re a fool for not using planned missing data design

Update Dr. Todd Little is currently at Texas Tech University Director, Institute for Measurement, Methodology, Analysis and Policy (IMMAP) Director, “Stats Camp” Professor, Educational Psychology and Leadership IMMAP (immap.educ.ttu.edu) Stats Camp (Statscamp.org) 37www.Quant.KU.edu