Handling attrition and non-response in longitudinal data
Harvey Goldstein, University of Bristol

What's the problem?
- Loss of individuals in a survey over time can lead to smaller numbers
  - By age 42, ~70% of the original NCDS cohort gave information
- Non-random loss can lead to biases
  - Especially important when loss is associated with the variable values that are not subsequently available

Fixing the losses
- Preventing loss is another topic. This is a look at how you might compensate for it.
  - A brief look at traditional weighting procedures
  - Use of multiple imputation (MI): a simple introduction and its application to attrition
  - Combining MI with weighting

Traditional approach to handling attrition and missing data
- Sets of weights
  - Sample design and any initial non-response provide basic weights for wave 1
  - For several waves we can define typical pathways and provide weights for each one, e.g. LSYPE may require 12 or more depending on selected components
  - For item non-response, hot deck single imputation (weighted?) is often used

Problems with weighting procedures
- Inefficient: can only use the data available for each combination of variables analysed
- Restrictive, since weights are only provided for chosen pathways
- Possibly inconsistent results through different weights for different analyses
- Not very transparent for use

Problems with hot deck imputation
- Not theoretically based
- Selection of matched cases may not always be possible, especially in multilevel data
- Single imputation does not allow easy computation of standard errors
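
To make the matching problem concrete, here is a minimal sketch of hot deck single imputation within classes. The function name `hot_deck`, the record layout and the `school`/`score` fields are hypothetical, chosen only for illustration; real survey implementations differ in how donors are matched and weighted.

```python
import random


def hot_deck(records, class_key, target, rng):
    """Fill missing `target` values (None) with the value of a randomly
    chosen donor from the same class. Returns a new list of records."""
    # Collect observed donor values for each class.
    donors = {}
    for r in records:
        if r[target] is not None:
            donors.setdefault(r[class_key], []).append(r[target])

    completed = []
    for r in records:
        filled = dict(r)
        if filled[target] is None:
            pool = donors.get(filled[class_key])
            if pool:
                filled[target] = rng.choice(pool)
            # If no matched donor exists, the value stays missing.
        completed.append(filled)
    return completed


# Hypothetical data: school B has no observed score to donate.
records = [
    {"school": "A", "score": 52.0},
    {"school": "A", "score": None},
    {"school": "B", "score": None},
]
completed = hot_deck(records, "school", "score", random.Random(0))
```

In this toy data the missing score in school A is filled from its only donor, while school B has no donor and stays missing. Because each gap is filled exactly once, nothing in the completed data records the uncertainty of the imputation.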

Multiple imputation: very briefly
- Consider the model of interest (MOI), assuming normal x, y
- We turn this into a multivariate normal response model and obtain residual estimates (from an MCMC chain) where x or y is missing.
- Use these to fill in the missing values and produce a complete data set.
- Do this (independently) n times (e.g. n = 20).
- Fit the MOI to each data set and combine according to rules to get estimates and standard errors.
- Note that other methods (listwise deletion, mean imputation, hot deck etc.) are either inefficient or biased.
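
The final combining step can be sketched as follows. The slide does not name the rules; this sketch assumes the standard combining rules usually attributed to Rubin, and the function name `pool_rubin` and the input numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine one parameter's estimates from m completed data sets.

    estimates: the m point estimates, one per imputed data set
    variances: the m squared standard errors, one per imputed data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total ** 0.5


# Hypothetical results from m = 3 imputed data sets.
pooled, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what single imputation lacks: it carries the extra uncertainty due to the missing values into the final standard error.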

Attrition treated as missing data
- A missing record at a follow-up gives an individual with many known and many missing values.
- Even where no data at all are collected directly, auxiliary data may be available (interviewer observations etc.)
- Together with item missingness, we can use MI to fill in all the missing data.

Distributional issues
- Existing methods assume normality.
- We would like to handle multilevel data and mixtures of normal and discrete variables with missing data.
- The ESRC REALCOM project developed an MCMC algorithm and software for these cases.
- REALCOM-IMPUTE links REALCOM with MLwiN and can handle level 2 and discrete variables.
- It works by transforming discrete variables to normality using a latent variable model, so that all response variables have a joint multivariate normal distribution, and then applies MI theory.
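
As an illustration of the latent variable idea for a single binary variable, the sketch below draws a latent standard normal value consistent with an observed 0/1 response (a probit-style construction). It is not the REALCOM-IMPUTE algorithm; the function name `draw_latent` and the probability 0.3 are assumptions made for the example.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def draw_latent(z, p_one, rng):
    """Draw a latent N(0, 1) value consistent with a binary response z,
    where P(z = 1) = p_one and z = 1 exactly when the latent value
    exceeds the cut point inv_cdf(1 - p_one)."""
    cut_prob = 1.0 - p_one  # probability mass below the cut point
    if z == 1:
        u = rng.uniform(cut_prob, 1.0)   # upper tail only
    else:
        u = rng.uniform(0.0, cut_prob)   # lower tail only
    # Keep u strictly inside (0, 1), as inv_cdf requires.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return STD_NORMAL.inv_cdf(u)


rng = random.Random(1)
cut = STD_NORMAL.inv_cdf(1.0 - 0.3)
latent_ones = [draw_latent(1, 0.3, rng) for _ in range(100)]
latent_zeros = [draw_latent(0, 0.3, rng) for _ in range(100)]
```

Once every discrete response has a latent normal counterpart, it can be modelled jointly with the continuous variables as part of one multivariate normal response.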

Putting weights into MI
- Consider a 2-level model: [equation not preserved]
- Write level 2 weights as [symbol not preserved]
- Level 1 weights for the j-th level 2 unit as [symbol not preserved]
- Final level 1 weights: [formula not preserved]
- These weights can be used for the MOI and also for imputation.
- This involves an MCMC estimation using weighted likelihoods, where variances are inversely proportional to weights.
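
The slide's own weight formulas are not reproduced in this transcript, so the following is only a sketch of one common construction, not necessarily the author's: each final level 1 weight is the product of the unit's level 2 weight and the conditional level 1 weight, rescaled to have mean 1. The function names, the rescaling choice and the data are all assumptions.

```python
def composite_weights(level2_weights, level1_weights):
    """Assumed construction: final weight = (level 2 weight of unit j)
    * (level 1 weight within unit j), rescaled so the mean weight is 1.

    level2_weights: {unit: weight}
    level1_weights: {unit: [weights of the level 1 units in that unit]}
    """
    raw = {
        j: [w * level2_weights[j] for w in ws]
        for j, ws in level1_weights.items()
    }
    n = sum(len(ws) for ws in raw.values())
    total = sum(sum(ws) for ws in raw.values())
    scale = n / total
    return {j: [w * scale for w in ws] for j, ws in raw.items()}


def weighted_variance(sigma2, weight):
    """Variance inversely proportional to the weight, as stated on the slide."""
    return sigma2 / weight


# Hypothetical schools (level 2) and pupils (level 1).
level2 = {"s1": 1.0, "s2": 2.0}
level1 = {"s1": [1.0, 1.0], "s2": [1.0, 3.0]}
final = composite_weights(level2, level1)
```

With these inputs the raw products are 1, 1, 2 and 6, so after rescaling the final weights are 0.4, 0.4, 0.8 and 2.4. A pupil with final weight 2.4 is then treated as having a residual variance 2.4 times smaller than a pupil with weight 1.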

References
- Goldstein, H., Carpenter, J., Kenward, M. and Levin, K. (2009). Multilevel models with multivariate mixed response types. Statistical Modelling (to appear). Gives methodological background.
- Goldstein, H. (2009). Handling attrition and non-response in longitudinal data. International Journal of Longitudinal and Life Course Studies, April 2009. Discusses issues for longitudinal studies in detail.
- Web site for software: