Missing Data Imputation in the Bayesian Framework

Slides:



Advertisements
Similar presentations
Bayes rule, priors and maximum a posteriori
Advertisements

Unido.org/statistics International workshop on industrial statistics 8 – 10 July, Beijing Non response in industrial surveys Shyam Upadhyaya.
Using Business Taxation Data as Auxiliary Variables and as Substitution Variables in the Australian Bureau of Statistics Frank Yu, Robert Clark and Gabriele.
Treatment of missing values
1 Unsupervised Learning With Non-ignorable Missing Data Machine Learning Group Talk University of Toronto Monday Oct 4, 2004 Ben Marlin Sam Roweis Rich.
How to deal with missing data: INTRODUCTION
FINAL REPORT: OUTLINE & OVERVIEW OF SURVEY ERRORS
Statistical Methods for Missing Data Roberta Harnett MAR 550 October 30, 2007.
Jeff Howbert Introduction to Machine Learning Winter Classification Bayesian Classifiers.
Standard Error of the Mean
A P STATISTICS LESSON 9 – 1 ( DAY 1 ) SAMPLING DISTRIBUTIONS.
Guide to Handling Missing Information Contacting researchers Algebraic recalculations, conversions and approximations Imputation method (substituting missing.
Chapter 8 Introduction to Inference Target Goal: I can calculate the confidence interval for a population Estimating with Confidence 8.1a h.w: pg 481:
1 Institute of Engineering Mechanics Leopold-Franzens University Innsbruck, Austria, EU H.J. Pradlwarter and G.I. Schuëller Confidence.
Using Resampling Techniques to Measure the Effectiveness of Providers in Workers’ Compensation Insurance David Speights Senior Research Statistician HNC.
1 S T A T A U S E R S G R O U P M E E T I N G SEPTEMBER Multiple Imputation for households surveys A comparison of methods Stata Users Group Meeting.
1 Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis Survey Research Laboratory University of Illinois.
1 ©2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
AP STATISTICS LESSON AP STATISTICS LESSON DESIGNING DATA.
SW 983 Missing Data Treatment Most of the slides presented here are from the Modern Missing Data Methods, 2011, 5 day course presented by the KUCRMDA,
September 18-19, 2006 – Denver, Colorado Sponsored by the U.S. Department of Housing and Urban Development Conducting and interpreting multivariate analyses.
© John M. Abowd 2007, all rights reserved General Methods for Missing Data John M. Abowd March 2007.
1 G Lect 13W Imputation (data augmentation) of missing data Multiple imputation Examples G Multiple Regression Week 13 (Wednesday)
A REVIEW By Chi-Ming Kam Surajit Ray April 23, 2001 April 23, 2001.
Sampling Theory and Some Important Sampling Distributions.
6. Population Codes Presented by Rhee, Je-Keun © 2008, SNU Biointelligence Lab,
Statistics 350 Lecture 2. Today Last Day: Section Today: Section 1.6 Homework #1: Chapter 1 Problems (page 33-38): 2, 5, 6, 7, 22, 26, 33, 34,
Weighting and imputation PHC 6716 July 13, 2011 Chris McCarty.
HANDLING MISSING DATA.
Introduction to Survey Research
The simple linear regression model and parameter estimation
Bayesian Semi-Parametric Multiple Shrinkage
socI 100: INTRODUCTION TO SOCIOLOGY
The treatment of uncertainty in the results
ECO 173 Chapter 10: Introduction to Estimation Lecture 5a
Modeling approaches for the allocation of costs
Ch3: Model Building through Regression
The Centre for Longitudinal Studies Missing Data Strategy
Statistical Data Analysis
Analyzing Redistribution Matrix with Wavelet
Introduction to Survey Data Analysis
Multiple Imputation.
Bayesian Classification
ECO 173 Chapter 10: Introduction to Estimation Lecture 5a
الإحصاء ومنهجية البحث Statistics and Research Methodology Fall 2016
How to handle missing data values
Business Research Methods
University of Pittsburgh at Johnstown
Dealing with missing data
An Online CNN Poll.
Presenter: Ting-Ting Chung July 11, 2017
2. Stratified Random Sampling.
Random sampling Carlo Azzarri IFPRI Datathon APSU, Dhaka
Sampling Distribution
Sampling Distribution
EVALUATING STATISTICAL REPORTS
The European Statistical Training Programme (ESTP)
Chapter 8: Weighting adjustment
Chapter 12: Other nonresponse correction techniques
Chapter 5: Producing Data
Statistical Data Analysis
New Techniques and Technologies for Statistics 2017  Estimation of Response Propensities and Indicators of Representative Response Using Population-Level.
Statistical Inference
MBA 231 RESEARCH METHODOLOGY Course Introduction Dr. Mohammed Al-Abed.
STA 291 Spring 2008 Lecture 13 Dustin Lueker.
A bootstrap method for estimators based on combined administrative and survey data Sander Scholtus (Statistics Netherlands) NTTS Conference 13 March 2019.
Mathematical Foundations of BME Reza Shadmehr
Chapter 13: Item nonresponse
The Use of Test Scores in Secondary Analysis
The Research Process & Surveys, Samples, and Populations
Presentation transcript:

Missing Data Imputation in the Bayesian Framework by John Draper, David Kadonsky June 9, 2005

Introduction Dr. Paul Robbins of The Ohio State University developed a survey on lawn care beliefs and behaviors in 2001 to challenge commonly held beliefs in the environmental literature. Unfortunately, some variables were missing in the survey data.

Introduction Dr. Paul Robbins of The Ohio State University developed a survey on lawn care beliefs and behaviors in 2001 to challenge commonly held beliefs in the environmental literature. Unfortunately, some variables were missing in the survey data. A variety of imputation methods (cold deck, hot deck, regression based, and Bayesian) are explored to analyze the advantages and disadvantages of each in the context of these survey data.

Purpose The primary purpose of the paper is to focus on developing a comprehensive Bayesian imputation method that would improve on established methods.

Purpose The primary purpose of the paper is to focus on developing a comprehensive Bayesian imputation method that would improve on established methods. Ideally, the augmented data would more closely resemble the true sampled population and therefore, the logistic regression models derived would lead to more accurate conclusions.

Data National telephone survey conducted by the Survey Research Center at The Ohio State University in 2001 on behalf of Dr. Paul Robbins. Asked questions on the opinions, behaviors and knowledge of lawn care Missing data values come primarily from the household income and housing value variables.

Methods Cold Deck Imputation Hot Deck Imputation Non-random Regression Bayesian Imputation

Cold Deck Imputation Advantages: simple unbiased for observed sample

Cold Deck Imputation Advantages: Disadvantages: simple unbiased for observed sample Disadvantages: does not use any concomitant information ignores any missing data mechanism if MAR, then cold deck will be biased large variance in imputed missing values

Hot Deck Imputation Advantages: fairly simple uses concomitant information can capture crude missing data mechanisms

Hot Deck Imputation Advantages: Disadvantages: fairly simple uses concomitant information can capture crude missing data mechanisms Disadvantages: must discretize continuous concomitant data no distinct values imputed

Non-random Regression Advantages: strong use of concomitant data can capture complex missing data mechanisms

Non-random Regression Advantages: strong use of concomitant data can capture complex missing data mechanisms Disadvantages: deterministic cannot take any exogenous information into account

Bayesian Imputation Advantages: strong use of concomitant data can capture complex missing data mechanisms can take any exogenous information into account

Bayesian Imputation Advantages: Disadvantages: strong use of concomitant data can capture complex missing data mechanisms can take any exogenous information into account Disadvantages: complexity strong distributional assumptions

Bayesian Imputation Requires normality for imputation to be reasonable.

Bayesian Imputation

Bayesian Imputation Prior data says: Data model says:

Bayesian Imputation Prior data says: Data model says: Original Model: Taking prior information into account:

Bayesian Imputation

Bayesian Imputation

Bayesian Imputation

Bayesian Imputation To impute the missing values, one simply computes the posterior predictive mean of the data, Yβ*.

Results Imputing the housing value To get some idea as to how reliable our imputed estimates are, we chose to use each non-deterministic method to impute the missing data values 5,000 times. The resulting mean values of the full data sets (observed and imputed housing value) were accumulated.

Results

Results Cold deck and hot deck imputation methods seem to be centered around the sample mean housing value Regression suggests that respondents with lower housing values may be underrepresented. Bayesian falls between Regression and cold deck

Results Bayesian reduces standard deviation of full data means of housing value by 60%

Results – Logistic Model gender1 is coded for Female metro1 is coded for Urban or Suburban (vs. Rural) hvalcat1 is coded for Housing Value between $100,000 and $150,000 hvalcat2 is coded for Housing Value less than $100,000

Results – Logistic Model

Conclusions Very little gained by imputing housing value by cold deck and hot deck. Enormous reduction in variance of the imputed means of housing value using Bayesian methods. Very little difference in resulting logistic models.

Further Research FRITZ imputation Predictive Computational techniques (Federal Reserve Imputation Technique Zeta) Predictive Computational techniques Other Bayesian modeling approaches