Robert Voogt, Dutch Ministry of Social Affairs and Employment (formerly of the University of Amsterdam). Nonresponse in survey research: why is it a problem?



2 Overview
What is nonresponse, why is it a problem, and why does the traditional way of correcting for nonresponse not solve the problem?
Overview of general correction techniques
An alternative approach to correct for nonresponse bias
Real-life illustration

3 What is nonresponse, why is it a problem and why does the traditional way of correcting for nonresponse not solve the problem?

4 Survey research
Population is sampled
Sample is a good representation of the population when good sampling techniques are used
Not all sample elements will respond

5 Unit vs Item nonresponse
Some are not reached, others refuse or do not send back the questionnaire: unit nonresponse
Some who do answer the questionnaire do so incompletely: item nonresponse

6 MCAR, MAR, MNAR
Three general nonresponse mechanisms can be distinguished:
MCAR: Missing Completely At Random
MAR: Missing At Random
MNAR: Missing Not At Random

7 Missing Completely At Random (MCAR)
Consider the conditional distribution of the missingness indicator M given the survey outcomes Y and the survey design variables Z.
Let $f(M \mid Y, Z, \phi)$ denote this distribution, with $\phi$ the unknown parameters.
If MCAR: $f(M \mid Y, Z, \phi) = f(M \mid \phi)$ for all $Y, Z, \phi$
Not a realistic assumption

8 Example MCAR
Taking a random subsample of a group of nonrespondents
If a random subsample of nonrespondents is analysed (after obtaining answers from all of them), the nonsampled nonrespondents can be said to be MCAR
So correction methods using the MCAR assumption can be used

9 Missing At Random (MAR)
MAR: $f(M \mid Y, Z, \phi) = f(M \mid Y_{obs}, Z, \phi)$ for all $Y_{mis}, \phi$, where $Y_{obs}$ denotes all the observed survey data
This means that missingness depends on the observed variables, the observed values of incomplete variables, or the design variables, but not on the variables or values that are missing

10 Example MAR
For both respondents and nonrespondents we know their level of education
Respondents and nonrespondents who share the same level of education have the same distribution on the unobserved variables
Most survey nonresponse adjustment methods assume MAR

11 Missing Not At Random (MNAR, also written NMAR)
MNAR: $f(M \mid Y, Z, \phi) = f(M \mid Y_{obs}, Y_{mis}, Z, \phi)$ for all $Y_{obs}, Y_{mis}, \phi$
This means that missingness depends on the missing values, even after conditioning on the observed data
To get unbiased estimates, a joint model of the data and the nonresponse mechanism is necessary

12 Example MNAR
For both respondents and nonrespondents we know their level of education
Given the level of education, nonresponse on the variables of interest is not random
This means it is not sufficient to use only level of education to correct for nonresponse bias
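The difference between the MAR and MNAR examples can be made concrete with a small calculation. The sketch below uses a hypothetical six-person population (none of these numbers come from the slides) and works with expected response counts rather than random draws. It adjusts the respondent mean by education class, which recovers the true mean when response depends only on education (MAR) but not when it also depends on the answer itself (MNAR).

```python
from collections import defaultdict

# Hypothetical toy population (not from the slides):
# (education, y, response probability under MAR, response probability under MNAR)
population = [
    ("low", 1, 0.4, 0.6),
    ("low", 0, 0.4, 0.2),
    ("low", 0, 0.4, 0.2),
    ("high", 1, 0.8, 0.9),
    ("high", 1, 0.8, 0.9),
    ("high", 0, 0.8, 0.5),
]

def adjusted_mean(pop, p_index):
    """Education-adjusted respondent mean, using expected response counts."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [sum of p*y, sum of p, class size]
    for row in pop:
        edu, y, p = row[0], row[1], row[p_index]
        sums[edu][0] += p * y
        sums[edu][1] += p
        sums[edu][2] += 1
    n = len(pop)
    # Weight each class's respondent mean by the class's population share.
    return sum(size / n * (spy / sp) for spy, sp, size in sums.values())

true_mean = sum(row[1] for row in population) / len(population)
mar_estimate = adjusted_mean(population, 2)   # response depends on education only
mnar_estimate = adjusted_mean(population, 3)  # response also depends on y
print(true_mean, round(mar_estimate, 3), round(mnar_estimate, 3))
```

Under the MAR probabilities the adjusted estimate equals the true mean; under the MNAR probabilities it stays too high, because within each education class the people who answer "yes" respond more often.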

13 Nonresponse bias
If nonresponse is not a result of the design, the data are almost always MNAR, and the data are biased by nonresponse as a result.
The amount of nonresponse bias depends on:
1. the correlation between the target variable(s) and the nonresponse mechanism;
2. the level of nonresponse.

14 Nonresponse bias
[Equation not preserved in the transcript.]
with
$Y_k$: the score of element k in the population on the target variable
$\rho_k$: the probability of response of element k in the population when contacted in the sample
$C(\rho, Y)$: the population covariance between the response probabilities and the values of the target variable
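Assuming the lost formula is the standard random-response-model approximation for the bias of the respondent mean, and writing the response probability as $\rho_k$ (the original symbol is lost, so this notation is an assumption), it would read roughly as follows, with $\bar{\rho}$ the mean response probability and $\bar{Y}$ the population mean:

```latex
B(\bar{y}_R) \approx \frac{C(\rho, Y)}{\bar{\rho}},
\qquad
C(\rho, Y) = \frac{1}{N} \sum_{k=1}^{N} (\rho_k - \bar{\rho})\,(Y_k - \bar{Y})
```

Read this way, the bias grows with the covariance between response probability and target variable and shrinks as the mean response probability rises, which is what the two factors listed on the previous slide describe.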

15 Nonresponse bias
[Equation not preserved in the transcript.]
with
$(Y_k - \bar{Y})$: the difference between the score of element k on the variable of interest and the population mean
$(\rho_k - \bar{\rho})$: the difference between the probability to respond of element k and the mean probability to respond
It follows from this equation that the response level in itself does not say everything: the amount of bias depends on the relation between the first and second part of the equation
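A small numeric check can illustrate why the response level alone is not informative. The sketch below uses a hypothetical population of six elements (values and probabilities are made up) and assumes the covariance is the population covariance with divisor N. It compares the bias of the expected respondent mean with the covariance divided by the mean response probability.

```python
# Hypothetical toy population: target values Y_k and response probabilities rho_k.
Y = [1, 1, 0, 0, 1, 0]
rho = [0.9, 0.8, 0.3, 0.4, 0.7, 0.5]

N = len(Y)
Y_bar = sum(Y) / N
rho_bar = sum(rho) / N

# Population covariance: (1/N) * sum((rho_k - rho_bar) * (Y_k - Y_bar)).
cov = sum((r - rho_bar) * (y - Y_bar) for r, y in zip(rho, Y)) / N

# Approximate expected respondent mean: Y weighted by response probability.
resp_mean = sum(r * y for r, y in zip(rho, Y)) / sum(rho)

bias_direct = resp_mean - Y_bar   # bias computed directly
bias_formula = cov / rho_bar      # bias from covariance / mean response probability
print(round(bias_direct, 4), round(bias_formula, 4))
```

If every `rho` value were the same, the covariance and hence the bias would be zero regardless of how low the response level is.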

16 Traditional correction methods
Use population information to compare the respondent group with the population
Use information that is available for both respondents and nonrespondents
Use information about the difficulty of obtaining data from the respondents
In fact, the assumption is that the data are MAR, given the values of the variables for which population information or information about the nonrespondents is available

17 Traditional correction methods
No information about the difference on the variables of interest between the respondents and nonrespondents
No information about the difference in response probabilities between sample elements that score differently on the variables of interest
So there is no reason why this way of correcting should work

18 Overview of general correction techniques

19 Different correction techniques
Weighting: assigning each observed element an adjustment weight
Extrapolation: respondents who are most like the nonrespondents are used for correction
Imputation: missing values are substituted by estimates

20 Weighting
Weighting: assigning each observed element an adjustment weight
Sample elements that belong to groups that seem underrepresented among the respondents on the variables used in the weighting will have a high adjustment weight
Sample elements that belong to groups that seem overrepresented among the respondents will have a low adjustment weight

21 Weighting Example
Question: Have you ever visited Lugano? (Y/N)
Population information available about age (18-30, 31-64, 65 and older)
Comparison of respondents and population
Weighting:
Age      Resp   Popul   Weight
18-30    20%    30%     30/20 = 1.5
31-64    70%    50%     50/70 = 0.7
65+      10%    20%     20/10 = 2.0
Result (Unw: unweighted, W*: weighted):
Lug   18-30      31-64      65+       Unw        W*
Yes   20% (4)    50% (35)   10% (1)   40% (40)   33% (33)
No    80% (16)   50% (35)   90% (9)   60% (60)   67% (67)
N     20         70         10        100
Yes: 4*1.5 + 35*.7 + 1*2.0 = 6 + 24.5 + 2 = 32.5
No: 16*1.5 + 35*.7 + 9*2.0 = 24 + 24.5 + 18 = 66.5
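The weighting arithmetic above can be sketched in a few lines. The shares and counts are the ones from the slide; the variable names are illustrative. Unlike the slide, the sketch does not round the weight for the 31-64 group to 0.7, so the weighted counts come out at 33 and 67 rather than 32.5 and 66.5.

```python
# Shares and counts taken from the slide's example.
resp_share = {"18-30": 0.20, "31-64": 0.70, "65+": 0.10}   # among respondents
pop_share = {"18-30": 0.30, "31-64": 0.50, "65+": 0.20}    # in the population
yes_count = {"18-30": 4, "31-64": 35, "65+": 1}
no_count = {"18-30": 16, "31-64": 35, "65+": 9}

# Adjustment weight per age group: population share / respondent share.
weights = {g: pop_share[g] / resp_share[g] for g in resp_share}

weighted_yes = sum(yes_count[g] * weights[g] for g in weights)
weighted_no = sum(no_count[g] * weights[g] for g in weights)
pct_yes = 100 * weighted_yes / (weighted_yes + weighted_no)
print(round(weighted_yes, 1), round(weighted_no, 1), round(pct_yes, 1))
```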

22 Extrapolation
Central idea: some groups of respondents are more like the nonrespondents than others are
For example, sample elements that first refused but were persuaded to participate when contacted a second time can be used as proxies for the final refusals

23 Extrapolation Example
Question: Have you ever visited Lugano? (Y/N)
Two respondent groups: early respondents and late respondents
Calculate the distribution among the nonrespondents using the last respondent method
Lug   R1         R2         TR         NR         TS
Yes   48% (29)   28% (11)   40% (40)   20% (10)   33% (50)
No    52% (31)   72% (29)   60% (60)   80% (41)   67% (100)
N     [values not preserved in the transcript]
Last respondent: $L = A_2 + (A_2 - A_1)\,\frac{X_2 - X_1}{X_2}$, with:
L: theoretical last respondent
A: % response to an item in a wave
X: cumulative % respondents at the end of a wave
Worked line as transcribed (some digits lost): L = 50+(50-40) (67-40/67) = 50+*.40 = 18%
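The last respondent formula is easy to evaluate directly. The sketch below plugs in values read from the table, on the assumption that A1 and A2 are the "Yes" percentages in R1 and R2 (48 and 28) and that X1 and X2 are the cumulative response percentages 40 and 67 that appear in the worked line. The function name is illustrative. Under these assumptions the result is about 20%, which agrees with the NR column of the table rather than with the 18% in the transcribed worked line.

```python
def last_respondent(a1, a2, x1, x2):
    """Extrapolated value for the theoretical last respondent.

    a1, a2: % giving the answer in wave 1 and wave 2
    x1, x2: cumulative % of the sample that has responded after each wave
    """
    return a2 + (a2 - a1) * (x2 - x1) / x2

# Assumed inputs, read from the slide's table and worked line.
L = last_respondent(a1=48, a2=28, x1=40, x2=67)
print(round(L, 1))
```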

24 Imputation
Imputation: missing values are substituted by estimates
Different methods of imputation:
Single Imputation: for each variable one value is imputed
Hot Deck Imputation: a missing value is replaced by an observed value of a comparable respondent
Multiple Imputation: for each variable several values are imputed; in this way the uncertainty that imputation brings with it is also taken into account

25 Hot Deck Imputation Example
Divide the respondents into homogeneous groups, for example by using CHAID. CHAID recursively partitions a sample into groups so that the variance of the dependent variable is minimized within groups and maximized among groups
Link each nonrespondent to the group it fits in best
Substitute the values of a random respondent from the same group for the missing values of the nonrespondent

26 Hot Deck Imputation Example, part 2
CHAID finds groups: age 18-30, 31-64/low education, 31-64/high education, 65+/male and 65+/female
Grp           R          HDI NR         TS
18-30         20% (4)    25*.20 = 5     9
31-64/low     33% (10)   4*.33 = 1      11
31-64/high    63% (25)   1*.63 = 1      26
65+/male      20% (1)    8*.20 = 2      3
65+/female    0% (0)     12*.0 = 0      0
% Lug Yes     40% (40)   18% (9)        33% (49)
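A minimal sketch of the hot deck step follows. The donor lists are built from the respondent counts in the table, with group sizes inferred from the percentages (for example, 4 "yes" at 20% implies 20 respondents), which is an assumption. The function name and the fixed seed are illustrative. The sketch draws a random donor for each nonrespondent and also computes the expected number of imputed "yes" answers, which is what the slide's column "HDI NR" tabulates after rounding.

```python
import random

# Respondent answers per group (1 = visited Lugano), built from the slide's counts;
# group sizes are inferred from the percentages and are an assumption.
donors = {
    "18-30": [1] * 4 + [0] * 16,
    "31-64/low": [1] * 10 + [0] * 20,
    "31-64/high": [1] * 25 + [0] * 15,
    "65+/male": [1] * 1 + [0] * 4,
    "65+/female": [0] * 5,
}
# Number of nonrespondents linked to each group, from the slide.
nonrespondents = {"18-30": 25, "31-64/low": 4, "31-64/high": 1,
                  "65+/male": 8, "65+/female": 12}

def hot_deck(donors, nonrespondents, rng):
    """Give each nonrespondent the answer of a random respondent in the same group."""
    return {g: [rng.choice(donors[g]) for _ in range(n)]
            for g, n in nonrespondents.items()}

rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
imputed = hot_deck(donors, nonrespondents, rng)
imputed_yes = sum(sum(values) for values in imputed.values())

# Expected number of imputed "yes": group size times the group's respondent rate.
expected_yes = sum(n * sum(donors[g]) / len(donors[g])
                   for g, n in nonrespondents.items())
print(imputed_yes, round(expected_yes, 2))
```

A single random draw will vary around the expected value of roughly 8.6, which the slide rounds group by group to 9 of 50 nonrespondents (18%).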

27 Multiple Imputation Example
For each case, 5 values for each missing variable are calculated, using a regression equation and adding a random error term
These values are combined into one single value, for example by taking the mean
The variance takes the uncertainty due to the imputed values into account by combining the within-imputation variance (the variance within each estimated data set) and the between-imputation variance (the variance across all 5 data sets)

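28 Multiple Imputation Example, part 2
[Table: imputed values Imp1 to Imp5 and their Mean for each nonrespondent (NR); cell values not preserved in the transcript. Final row: TNR, mean .33.]
Percentage that has visited Lugano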
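The combination of within- and between-imputation variance can be sketched with the standard combining rules for multiple imputation (Rubin's rules); the slides do not name the rule they used, so treating it as Rubin's rules is an assumption. The five estimates and variances below are hypothetical, chosen only so that the pooled estimate lands at .33.

```python
from statistics import mean, variance

def pool(estimates, variances):
    """Pool m imputed-data estimates: returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    within = mean(variances)           # within-imputation variance
    between = variance(estimates)      # between-imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical results from 5 imputed data sets: proportion "yes" and its variance.
estimates = [0.30, 0.34, 0.32, 0.36, 0.33]
variances = [0.004, 0.004, 0.005, 0.004, 0.005]

q_bar, total_var = pool(estimates, variances)
print(round(q_bar, 3), round(total_var, 4))
```

The factor `1 + 1/m` inflates the between-imputation part to account for using a finite number of imputations.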

29 An alternative approach to correct for nonresponse

30 Key to success of correction methods
The information used in the correction method
The correction method must model the nonresponse mechanism
The variables used in correction should have a relation with:
– the variables of interest
– the probability to respond of a sample element

31 Central Question Method (Bethlehem & Kersten, 1984)
Nonrespondents are asked to answer one (or more) questions central to the subject of the study
The central questions are believed to have a strong relation with both the nonresponse process and the subject of the study
Central questions are used in correction

32 Central Question Example
Central Question: Have you ever visited Switzerland? (Y/N)
Question of interest: Have you ever visited Lugano? (Y/N)
Comparison of respondents and nonrespondents
Weighting as correction technique
CQ    Resp   Nonr   TS    Weight
Yes   60%    10%    43%   43/60 = 0.72
No    40%    90%    57%   57/40 = 1.43
N     [values not preserved in the transcript]
Result (Unw: unweighted, W*: weighted):
Lug   CQ:Y       CQ:N        Unw        W*
Yes   67% (40)   0% (0)      40% (40)   29% (29)
No    33% (20)   100% (40)   60% (60)   71% (71)
N     60         40          100
Yes: 40*0.72 + 0*1.43 = 28.8 + 0 ≈ 29
No: 20*0.72 + 40*1.43 = 14.4 + 57.2 ≈ 71
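The central question correction uses the same weighting arithmetic as the age example, only with the central question as the weighting variable. A short sketch with the slide's numbers (variable names are illustrative; weights are kept unrounded, so the results differ slightly from the slide's rounded figures):

```python
# Distribution of the central question (visited Switzerland), from the slide.
cq_resp = {"yes": 0.60, "no": 0.40}    # among respondents
cq_total = {"yes": 0.43, "no": 0.57}   # in the total sample

# Respondent counts on the question of interest (visited Lugano), by central question.
lugano_yes = {"yes": 40, "no": 0}
lugano_no = {"yes": 20, "no": 40}

# Weight per central-question category: total-sample share / respondent share.
weights = {k: cq_total[k] / cq_resp[k] for k in cq_resp}

weighted_yes = sum(lugano_yes[k] * weights[k] for k in weights)
weighted_no = sum(lugano_no[k] * weights[k] for k in weights)
pct_yes = 100 * weighted_yes / (weighted_yes + weighted_no)
print(round(weighted_yes, 1), round(weighted_no, 1), round(pct_yes, 1))
```

The weighted share of "yes" drops from 40% to about 29%, because nobody who answered "no" to the central question had visited Lugano and that group is strongly underrepresented among the respondents.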

33 Real Life Illustration

34 Illustration
Election study
High levels of nonresponse
External information available to test the success of the correction procedures

35 Our research questions
Does nonresponse cause a problem in election studies?
Is using background variables sufficient or do we need central questions?
Do different correction techniques lead to different results?
Is it really necessary to recontact nonrespondents?

36 Data Collection
City of Zaanstad, The Netherlands
N=995; 901 used
Recontacting refusals
Mixed mode data collection
Two central questions:
– Voted in 1998 national elections
– Political interest

37 Response rate
Method         Outcome                  N                 %
Telephone      Complete questionnaire   [not preserved]
               Central questions        81                9.0
Mail           Complete questionnaire   [not preserved]
               Central questions        27                3.0
Face-to-face   Complete questionnaire   [not preserved]
               Central questions        37                4.1
Nonresponse                             52                5.8
Total sample                            901               100

38 Does nonresponse cause problems?
We distinguish four groups:
Response at first contact (470)
Response after two contacts (76)
Response after three or four contacts (158)
Nonrespondents (including those who answered the central questions) (197)

39 Comparison of response groups
[Table: columns R1, R2, R3, NR; rows Voted nat. elections, Voted prov. elections, Interested in politics, Voting not important. Cell values not preserved in the transcript.]
Conclusion: nonresponse bias is present

40 How to correct?
Use the Central Question Procedure and compare it with more traditional correction methods
Two central questions:
Voted at national elections (0-1), from election lists (so no response bias)
Political interest (0-1), from the short nonresponse questionnaire

41 Correction methods
Weighting by background variables / + central questions
Extrapolation
Hot Deck Imputation by background variables / + central questions
Multiple Imputation by background variables / + central questions
for response levels of 52% and 78%

42 Weighting
On background variables: age, ethnicity, gender, household composition, education, residential value, number of years living in current residence, social cohesion in neighborhood; using an iterative procedure
As above, plus validated voter turnout in the 1998 national elections and political interest (central questions)
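The slides do not say which iterative procedure was used. One common choice for weighting on several variables at once is iterative proportional fitting (raking), so the sketch below assumes that technique; it is not a reconstruction of the study's actual procedure. The table of respondent counts and the target margins are hypothetical, and the function name is illustrative.

```python
def rake(table, row_targets, col_targets, iterations=50):
    """Iterative proportional fitting on a two-way table of weighted counts."""
    cells = [row[:] for row in table]  # work on a copy
    for _ in range(iterations):
        # Scale each row so that its sum matches the row target.
        for i, target in enumerate(row_targets):
            row_sum = sum(cells[i])
            cells[i] = [c * target / row_sum for c in cells[i]]
        # Scale each column so that its sum matches the column target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in cells)
            for row in cells:
                row[j] *= target / col_sum
    return cells

# Hypothetical respondent counts: rows = age (young, old), columns = gender (male, female).
respondents = [[20, 30], [25, 25]]
# Hypothetical population margins (both sets of targets sum to 100).
raked = rake(respondents, row_targets=[40, 60], col_targets=[50, 50])

# Adjustment weight per cell: raked count divided by observed count.
cell_weights = [[r / o for r, o in zip(raked_row, obs_row)]
                for raked_row, obs_row in zip(raked, respondents)]
```

Raking only needs the population margins of each variable, not the full cross-classification, which is why it suits a long list of background variables like the one on this slide.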

43 Extrapolation Last Respondent Method

44 Hot Deck Imputation
Obtain subgroups by using CHAID
Assign nonrespondents to the groups
Decide the exact value to be imputed using a regression model (multiple imputation)
For background variables / background variables and central questions

45 Multiple Imputation
Use AMELIA (King et al., 1998) to calculate 10 discrete imputation values for each variable
Calculate the mean distribution by summing the 10 proportions for each category of the variable and dividing by 10
Compute the variance so that both within- and between-imputation variance are taken into account
For background variables / background variables and central questions

46 Dependent variables
Voted at national elections
Voted at provincial elections
Self-reported political interest
Importance of voting

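47 Results for weighting
[Table: rows Voted national, Political Interest, Voted provincial, Importance Voting; columns Rsp, BG, CQ at response levels 52% and 78%, plus TS. Cell values not preserved in the transcript.]
Rsp: Respondents, BG: Background variables, CQ: Central Questions, TS: Total sample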

48 Compare different methods
[Table at the 78% response level: rows Voted National, Political Interest, Voted Provincial, Importance Voting; columns Rsp, then W, HDI, MI, EX under BG, then W, HDI, MI under CQ, then TS. Cell values not preserved in the transcript.]
Rsp: Respondents, W: Weighting, HDI: Hot Deck Imputation, MI: Multiple Imputation, EX: Extrapolation, BG: Background variables, CQ: Central Questions, TS: Total sample

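49 Relations: regression turnout provincial elections
[Table: columns for response levels 52 and 78, each with Resp, W (BV), W, HDI, MI (CQ), plus TS. Significance markers as transcribed; their column positions are not preserved.]
VtNat ***********
Age ***********
Urb
Sex *
Educ *******
Ethn
Value **
Mobil ***
Cohe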

50 Conclusions
Using central questions leads to better estimates than using only background variables
Higher response levels lead to better estimates
All correction techniques perform equally well: the information used in the correction is more important than the technique used
Correcting bias in regression parameters is less successful

51 Recommendations
Always reapproach nonrespondents, to try to reach a response level of 75%
Always ask (a sample of) nonrespondents to answer a small number of central questions
Always try to get as much information as possible from external sources
The technique used is not so important: simple techniques perform as well as more complex ones

52 Thank you for your attention! Questions? Contact: