Non response and missing data in longitudinal surveys



Traditional ways of handling attrition and missing data
- Weighting is typically used for attrition.
- Sample design and initial non-response provide the basic weights.
- For several waves, define ‘typical’ pathways and provide weights for each one; e.g. LSYP may require 12 or more.
- For item non-response, use ‘hot deck’ single imputation.

Problems with weighting procedures
- Inefficient: can only use complete data for each combination of variables analysed.
- Restrictive, since weights are only provided for the chosen ‘pathways’.
- Possibly inconsistent results, through different weights for different analyses.
- Not very transparent to use.
- Problematic for ‘structurally missing’ items.

Problems with hot deck imputation
- Not theoretically based.
- Selection of ‘matched’ cases may not always be possible, especially in multilevel data.
- Single imputation does not allow easy computation of standard errors.
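
The matching problem can be seen in a small sketch. The code below is only an illustration of one simple form of hot deck imputation (copy a value from a randomly chosen observed case in the same matching cell); the function name, the record layout and the variables `school`, `sex` and `score` are hypothetical and are not taken from the slides.

```python
import random

def hot_deck_impute(records, match_keys, target, rng):
    """Fill missing `target` values by copying from a random donor in the same cell.

    Illustrative only: `records` is a list of dicts and None marks a missing value.
    Returns the completed records and the indices for which no donor was found.
    """
    # Collect the observed target values for each matching cell.
    donors = {}
    for rec in records:
        if rec[target] is not None:
            cell = tuple(rec[k] for k in match_keys)
            donors.setdefault(cell, []).append(rec[target])

    completed, unmatched = [], []
    for idx, rec in enumerate(records):
        new_rec = dict(rec)
        if rec[target] is None:
            cell = tuple(rec[k] for k in match_keys)
            if cell in donors:
                new_rec[target] = rng.choice(donors[cell])
            else:
                unmatched.append(idx)  # no 'matched' case exists for this record
        completed.append(new_rec)
    return completed, unmatched

# Hypothetical two-level example: pupils within schools.
records = [
    {"school": "A", "sex": "F", "score": 52.0},
    {"school": "A", "sex": "F", "score": None},
    {"school": "A", "sex": "M", "score": 47.0},
    {"school": "B", "sex": "M", "score": None},  # school B has no observed donor
]
completed, unmatched = hot_deck_impute(
    records, ["school", "sex"], "score", random.Random(0)
)
```

The second record receives a donor value from its cell, while the record in school B stays missing because no matched case exists. Each missing value is also filled only once, so the completed data set carries no information about imputation uncertainty.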

Multiple imputation – briefly and simply
- Consider the model of interest (MOI).
- We turn this into a multivariate response model and obtain residual estimates (from an MCMC chain) where x or y is missing.
- Use these to ‘fill in’ the missing values and produce a complete data set.
- Do this independently n times (e.g. n = 20).
- Fit the MOI to each data set and combine the results according to rules to get estimates and standard errors.
- Note that at the imputation stage we can use auxiliary data.
- Note also that we can handle attrition as missing data.
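
The combining step can be sketched as follows. The slides only say "rules"; this sketch assumes the usual Rubin's rules for a single scalar parameter, and the function name and the numbers are made up for illustration (five imputations rather than 20, to keep it short).

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Combine one parameter's estimates from m completed data sets.

    Assumes Rubin's rules: the pooled estimate is the mean of the estimates,
    and the total variance is the within-imputation variance plus
    (1 + 1/m) times the between-imputation variance.
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)        # average squared standard error
    between = statistics.variance(estimates)    # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Hypothetical results of fitting the MOI to m = 5 completed data sets.
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04, 0.05, 0.04, 0.05, 0.04]
pooled_estimate, pooled_se = pool_rubin(estimates, variances)
print(pooled_estimate, pooled_se)
```

The pooled standard error is larger than the average single-data-set standard error, because the between-imputation term carries the uncertainty due to the missing values.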

What not to do
- Omit all records with missing data: inefficient.
- In categorical data, use an extra category for missing: biased.
- Plug in the mean over the non-missing values: biased.
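
One way the distortion from mean imputation shows up is in the spread of the completed variable. The toy numbers below are invented for illustration: plugging in the observed mean leaves the mean unchanged but shrinks the variance, so anything that depends on the variance (standard errors, correlations) is affected.

```python
import statistics

# Hypothetical variable with four observed and four missing values.
observed = [2.0, 4.0, 6.0, 8.0]
n_missing = 4

# Mean imputation: every missing value is replaced by the observed mean.
mean_obs = statistics.fmean(observed)
filled = observed + [mean_obs] * n_missing

var_obs = statistics.variance(observed)    # sample variance of the observed values
var_filled = statistics.variance(filled)   # sample variance after mean imputation
print(var_obs, var_filled)
```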

Multiple imputation in MLwiN
- Existing methods assume normality.
- For multilevel data they cannot handle level 2 variables with missing data.
- They cannot handle discrete variables with missing data.
- REALCOM-IMPUTE links REALCOM with MLwiN and can handle level 2 and discrete variables.
- It works by transforming discrete variables to normality using a ‘latent variable’ model, so that all response variables have a joint multivariate normal distribution, and then applies MI theory.

Partially observed data values
- Where we have a prior (estimated) probability distribution (PD) for a missing discrete variable value, we simply insert an extra MCMC step that accepts the ‘standard’ MI value with a probability that is just the probability given by the PD. A corresponding step is used for normal data.
- This uses all of the data efficiently: no data are discarded so long as it is possible to assign a PD.
- It may also reduce ‘partial response bias’.
- Several completed data sets are produced and combined as in standard MI.
- These procedures are computationally intensive, but once the completed data sets are produced they can be used for many different models, so long as a model uses only variables that have been involved in the imputation procedure.
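
The extra acceptance step for a discrete variable might look roughly like the sketch below. This is not the REALCOM-IMPUTE implementation: the function, the stand-in proposal and the category labels are hypothetical, and the slides do not say what happens after a rejection, so redrawing until a value is accepted is an assumption made here.

```python
import random

def draw_with_prior(propose, prior, rng, max_tries=10_000):
    """Draw an imputed category, accepting each proposal with probability prior[value].

    `propose(rng)` stands in for the 'standard' MI draw; `prior` maps each
    category to the probability assigned by the PD. Redrawing after a
    rejection is an assumption of this sketch.
    """
    for _ in range(max_tries):
        value = propose(rng)
        if rng.random() < prior.get(value, 0.0):
            return value
    raise RuntimeError("no proposal accepted; check that the PD and the proposals overlap")

def standard_mi_draw(rng):
    # Stand-in for the model-based MI draw: uniform over three categories.
    return rng.choice(["low", "mid", "high"])

# Hypothetical PD for one partially observed value: 'low' has been ruled out.
prior = {"low": 0.0, "mid": 0.3, "high": 0.7}
rng = random.Random(1)
draws = [draw_with_prior(standard_mi_draw, prior, rng) for _ in range(200)]
```

Categories to which the PD gives zero probability are never imputed, and the remaining categories are drawn in proportion to the product of the model-based proposal probability and the PD.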

References
- Goldstein, H., Carpenter, J., Kenward, M. and Levin, K. (2009). Multilevel models with multivariate mixed response types. Statistical Modelling (to appear). Gives methodological background.
- Handling attrition and non-response in longitudinal data. International Journal of Longitudinal and Life Course Studies, April 2009. Available at http://www.journal.longviewuk.com/index.php/llcs (discusses issues for longitudinal studies in detail).

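Sampling weights
- Consider a 2-level model: [formula missing in transcript]
- Write the level 2 weights as [symbol missing in transcript].
- Write the level 1 weights for the j-th level 2 unit as [symbol missing in transcript].
- Final level 1 weights: [formula missing in transcript]
- We use [symbol missing in transcript] as the level 1 random part explanatory variable instead of the constant (= 1).
- This will be used for imputation and for the MOI.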