A REVIEW. By Chi-Ming Kam and Surajit Ray, April 23, 2001.

Imputation Techniques Implemented in SOLAS 3.0
Single imputation: Hot Decking, Predicted Mean Imputation, Last Value Carried Forward.
Multiple imputation: Propensity Score Based Imputation, Predictive Model Based Imputation.

Method 1: Propensity Score Based Imputation
This was the only method in Version 1. It is similar to the method of Lavori, Dawson, and Shera (1995), "A multiple imputation strategy for clinical trials with truncation of patient data".
Goal: to impute missing values under minimal distributional assumptions.

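How it Works
Let $R$ be the indicator for the missingness pattern ($R = 0$ or $1$). The data consist of the covariates $X_1, X_2, \dots, X_P$, the variable $Y$ with some values missing (shown as "?" on the slide), and the indicator $R$.
Model $R$ from $X_1, X_2, \dots, X_P$ using logistic regression, and compute $p_i = \mathrm{Prob}(R = 1 \mid X_1, X_2, \dots, X_P)$ for each case, yielding $N$ values $p_i$.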
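The propensity step above can be sketched in a few lines of NumPy. This is a minimal illustration, not SOLAS's implementation: the Newton-Raphson fitter, the function names `fit_logistic` and `propensity_scores`, the simulated data, and the convention that R = 1 marks a missing Y are all assumptions made for the example.

```python
import numpy as np

def fit_logistic(X, r, n_iter=25, tol=1e-8):
    """Fit a logistic regression of r on X (plus intercept) by Newton-Raphson."""
    A = np.column_stack([np.ones(len(X)), X])    # design matrix with intercept
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))      # current fitted probabilities
        grad = A.T @ (r - p)                     # score vector
        hess = A.T @ (A * (p * (1.0 - p))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def propensity_scores(X, r):
    """Return p_i = Prob(R = 1 | X_i) for every case."""
    beta = fit_logistic(X, r)
    A = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-A @ beta))

# Hypothetical data: missingness depends on the first covariate only.
rng = np.random.default_rng(0)
N = 500
X = rng.normal(size=(N, 2))
true_p = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * X[:, 0])))
r = (rng.uniform(size=N) < true_p).astype(float)  # assumed: 1 = Y missing

p_hat = propensity_scores(X, r)                   # the N values p_i
```

Each case ends up with one fitted probability, which is all the later grouping step uses from the covariates.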

How it Works (continued): Approximate Bayesian Bootstrap (Rubin, 1987)
Group the units by the quintiles of $p$ (the grouping is user specified). Suppose that within a particular group there are $n_1$ observed and $n_0$ missing values.

Sample $n_1 + n_0$ units with replacement from the observed values.
From the sampled pool, subsample $n_0$ units with replacement.
Use these $n_0$ units as the imputed values for the $n_0$ missing values.
Repeat the procedure $m$ times to get $m$ imputations.
(Diagram: $n_1$ observed values, then a pool of $n_0 + n_1$ values drawn with replacement, then $n_0$ imputed values drawn with replacement.)
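The grouping and two-stage resampling described above might look like the following sketch. It follows the pool size of $n_1 + n_0$ given on the slide; the function name `abb_impute`, the use of NaN to mark missing values, and the toy data are assumptions for illustration only.

```python
import numpy as np

def abb_impute(y, p, n_groups=5, rng=None):
    """One imputed copy of y, resampling within quantile groups of p."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)                        # assumed: NaN marks missing Y
    # Interior cut points of p; n_groups=5 gives quintile groups.
    edges = np.quantile(p, np.linspace(0, 1, n_groups + 1)[1:-1])
    group = np.searchsorted(edges, p, side="right")  # labels 0..n_groups-1
    y_imp = y.copy()
    for g in range(n_groups):
        in_g = group == g
        obs = y[in_g & ~missing]                 # the n1 observed values
        n0 = int(np.sum(in_g & missing))         # number of missing values
        if n0 == 0 or obs.size == 0:
            continue                             # nothing to impute, or no donors
        # Stage 1: pool of n1 + n0 values drawn with replacement.
        pool = rng.choice(obs, size=obs.size + n0, replace=True)
        # Stage 2: n0 imputed values drawn with replacement from the pool.
        y_imp[in_g & missing] = rng.choice(pool, size=n0, replace=True)
    return y_imp

# Hypothetical usage: m = 5 imputations from made-up scores and outcomes.
rng = np.random.default_rng(1)
p = rng.uniform(size=200)
y = rng.normal(size=200)
y[rng.uniform(size=200) < 0.3] = np.nan
imputations = [abb_impute(y, p, rng=rng) for _ in range(5)]
```

Because every imputed value is a resampled observed value from the same group, the only information the covariates contribute is the group a case falls into.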

Theoretical Justification
The method produces an imputed distribution of $Y$ that has been corrected for biases due to missingness related to $X$. It is similar in spirit to reweighting, but in a multiple imputation version. The method produces unbiased estimates for the marginal distribution of $Y$.

Problems/Drawbacks
The method does not preserve the association between $Y$ and the individual $X_i$'s.
Reasoning: the only aspect of the $X_i$'s that is used here is the linear predictor $\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$ in the logistic model. This is the function that predicts the missingness of $Y$ (that is, $R$), not $Y$ itself.

Problems/Drawbacks (continued)
Suppose $X_1$ is highly correlated with $Y$ but is unrelated to $P(R = 1)$. Then $X_1$ will drop out of the logistic model and will not be used in the imputation. As a result, the imputed data will misrepresent the correlation between $X_1$ and $Y$.
Also, by not using $X_1$ in the imputation, we fail to impute $Y$ efficiently.
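A small simulation can make this drawback concrete. The sketch below is an illustration under made-up assumptions, not a reproduction of any result in the talk: $Y = X_1 + \text{noise}$, missingness is completely at random (so a propensity model has nothing to learn from $X_1$ and all cases are treated as one group), and imputed values come from the two-stage resampling of observed $Y$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x1 = rng.normal(size=n)
y = x1 + rng.normal(size=n)          # X1 strongly related to Y
miss = rng.uniform(size=n) < 0.5     # missingness unrelated to X1

obs = y[~miss]
n0 = int(miss.sum())
# X1 carries no information about missingness, so donors are drawn
# from all observed Y values regardless of X1.
pool = rng.choice(obs, size=obs.size + n0, replace=True)
y_imp = y.copy()
y_imp[miss] = rng.choice(pool, size=n0, replace=True)

corr_obs = np.corrcoef(x1[~miss], y[~miss])[0, 1]      # observed cases
corr_imp = np.corrcoef(x1[miss], y_imp[miss])[0, 1]    # imputed cases
corr_all = np.corrcoef(x1, y_imp)[0, 1]                # completed data
print(corr_obs, corr_imp, corr_all)
```

In this setup the imputed values are drawn independently of $X_1$, so the correlation among imputed cases sits near zero and the completed-data correlation is pulled below the observed-case value.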

Simulation Results Using SOLAS 1.1
Data generation mechanism: $Y = X + Z + \varepsilon$, where [...] and $\varepsilon \sim N(0, 1)$.
Source: Paul D. Allison, "Multiple Imputation for Missing Data: A Cautionary Tale".

Some Comments About the Propensity Score Based Method
The method can provide valid but possibly inefficient inferences about the marginal distribution of $Y$.
The method can lead to very misleading inferences about the relationships between $Y$ and other variables.

Method 2: Predictive Model Based Multiple Imputation
This method is implemented in SOLAS 2.0 and 3.0.
How it works:
Regress $Y$ on $X_1, X_2, \dots, X_p$.
Get the estimates of $\beta_0, \beta_1, \beta_2, \dots, \beta_p$ and $\sigma$.
Draw $\beta_0^*, \beta_1^*, \beta_2^*, \dots, \beta_p^*, \sigma^*$ from an approximate posterior distribution.
Impute $Y^* = \beta_0^* + \beta_1^* X_1 + \beta_2^* X_2 + \dots + \beta_p^* X_p + \varepsilon^*$, where $\varepsilon^* \sim \mathrm{Normal}(0, \sigma^*)$.
Repeat $m$ times to get the $m$ imputed datasets.
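The regression-based procedure can be sketched as below. The slide does not say which approximate posterior SOLAS uses, so this example assumes the standard draws under a noninformative prior for a normal linear model (a scaled inverse chi-square draw for the variance, then a normal draw for the coefficients), and it treats $\sigma^*$ as a standard deviation. The function name `regression_impute`, the NaN coding of missing values, and the test data are hypothetical.

```python
import numpy as np

def regression_impute(X, y, m=5, rng=None):
    """Return m imputed copies of y from a normal linear model of y on X."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)                           # assumed: NaN marks missing Y
    A = np.column_stack([np.ones(len(y)), X])    # design matrix with intercept
    A_obs, y_obs = A[~miss], y[~miss]
    n1, q = A_obs.shape
    # Least-squares estimates from the observed cases.
    XtX_inv = np.linalg.inv(A_obs.T @ A_obs)
    beta_hat = XtX_inv @ A_obs.T @ y_obs
    rss = np.sum((y_obs - A_obs @ beta_hat) ** 2)
    chol = np.linalg.cholesky(XtX_inv)
    imputed = []
    for _ in range(m):
        # sigma*^2 = RSS / chi-square draw with n1 - q degrees of freedom.
        sigma_star = np.sqrt(rss / rng.chisquare(n1 - q))
        # beta* ~ Normal(beta_hat, sigma*^2 (X'X)^-1).
        beta_star = beta_hat + sigma_star * chol @ rng.normal(size=q)
        y_imp = y.copy()
        noise = sigma_star * rng.normal(size=int(miss.sum()))
        y_imp[miss] = A[miss] @ beta_star + noise   # Y* = X beta* + eps*
        imputed.append(y_imp)
    return imputed

# Hypothetical usage on simulated data with about 30% of Y missing.
rng = np.random.default_rng(3)
x = rng.normal(size=300)
y = 1.0 + 2.0 * x + rng.normal(size=300)
y[rng.uniform(size=300) < 0.3] = np.nan
datasets = regression_impute(x.reshape(-1, 1), y, m=5, rng=rng)
```

Drawing new parameters for each of the $m$ datasets is what lets the imputations reflect uncertainty about the regression itself, rather than only the residual noise.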

Good Points
The method provides correct model-based multiple imputation under the regression model and MAR.
It also preserves the correlation between the $X_i$'s and $Y$.
What is the difference from NORM? NORM does the same thing with MCMC. Under the multivariate normal model, both methods give the same results.

Which Software is More General?
"I work for arbitrary missingness patterns."
"I work for non-linear relations of Y on X."
"But that's probably very similar to NORM with rounding."

Concluding Remarks
SOLAS is the first commercial missing data software.
It has a good graphical interface.
Data import from and export to other software is easy.
It performs well under monotone missingness patterns.
Its estimates are not always unbiased.