Chapter 8: Nonresponse

Reading
- 8.1-8.3
- 8.4 (read for concepts)
- 8.5 (intro, 8.5.2 are focus)
- 8.6
- 8.8 (no 8.7)

Outline
- What is nonresponse (NR)?
- Why should we do something about NR?
- Strategies to reduce NR
  - Design phase
  - After data collection
    - Callbacks to gain info on nonrespondents (double sampling)
    - Weighting adjustments – post-stratification only
    - Imputation of missing values (item NR), a little from mechanisms for NR
- Response rate calculations
Stat 804, Ch 8: Nonresponse (4/23/2017)

What is nonresponse?
- Failure to obtain data through some part of the data collection process
- Nonresponse occurs during data collection process, after sample is selected
- Separate from ineligible cases
  - Can not locate (may not know if eligible)
  - Locate but refuse to participate (may or may not know eligibility)
  - Participate but don’t answer all questions (eligibility known)
  - …

Types of nonresponse
- Unit nonresponse
  - Missing data for entire observation unit
  - All variables have missing data
- Item nonresponse
  - Missing data for one or more variables for the observation unit
  - Failure to obtain a response to an individual item (= question)

Example: random digit dialing (RDD) phone calls
- Some case (= phone number) dispositions
  - Non-working
  - Rings, but get no answer
  - Get answer, determine it’s not a household
  - Get a household, refuse survey participation
  - Get a household, answer all but a few questions
  - Get a household and answer all questions
- For each disposition: eligible, unit NR, item NR?

Example: soil survey
- Can not reach sample unit (in canyon)
- Can reach, but can’t collect data (denied permission by land owner)
- Collect data, data sheet destroyed
- Forget to collect data for an item

Ignoring nonresponse (is bad)
- Impacts are related to differences between nonresponding and responding subpopulations in relation to analysis variables
- If population mean is different for responding and nonresponding subpopulations, will get a biased estimate when analyzing data from only the responding subpopulation
- Bias depends on
  - Nonresponse rate
  - Difference between population means for responding and nonresponding subpopulations
- See p. 258 subpopulation table and equations
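
A minimal sketch of the two bias drivers named above, assuming a population split into fixed responding and nonresponding subpopulations. The function name and the numbers are hypothetical illustrations, not values from the text. The bias of the respondent mean works out to (nonresponse rate) x (difference in subpopulation means).

```python
def respondent_mean_bias(n_pop, n_nonresp, mean_resp, mean_nonresp):
    """Return (population mean, bias of the respondent mean).

    Assumes every unit belongs to either the responding or the
    nonresponding subpopulation (illustrative setup).
    """
    n_resp = n_pop - n_nonresp
    pop_mean = (n_resp * mean_resp + n_nonresp * mean_nonresp) / n_pop
    bias = mean_resp - pop_mean
    return pop_mean, bias


# Hypothetical numbers: 30% nonresponse, means differ by 0.20.
pop_mean, bias = respondent_mean_bias(10_000, 3_000, 0.60, 0.40)

# The same bias written as (nonresponse rate) * (difference in means).
bias_from_formula = (3_000 / 10_000) * (0.60 - 0.40)
print(pop_mean, bias, bias_from_formula)
```

Neither factor involves the sample size, which is why a larger sample does not remove this bias.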

Ignoring nonresponse – 2
- Hard to determine if distributions (parameters) for responding and nonresponding subpopulations are different
  - Often no information on nonrespondents
- Examine causes of NR
  - Is mechanism generating NR related to analysis variables?
- Figure 8.2 – framework for factors
  - Data collectors (interviewers, field observers)
  - Survey content (questionnaire, field protocols)
  - Respondent or field site characteristics

Ignoring nonresponse – 3
- Sample size reductions affect precision
  - Low response rate -> low sample size -> higher variances
  - Less of a concern because often you can anticipate and design for NR sample size attrition
- Increasing sample size will NOT mitigate bias problems
  - Literary Digest Survey

Example: Norwegian voting behavior survey (Table 8.1)
- Survey with good follow-up methodology
  - Examined differences between nonrespondents and full sample
  - Age-specific voting rates lower for NR portion, especially for younger voters
- Low nonresponse, but high bias potential
  - 90% response rate, but differences are large with respect to main analysis variables
- Mechanisms causing NR
  - Absence or illness -> less likely to respond, lower voting rates
- Impact: overestimate prevalence of positive voting behaviors

Strategies
- Best: design survey to prevent NR
- Post-data collection
  - Perform nonresponse study (call-backs)
  - Use weights to adjust for NR units
  - Use a model to impute (fill in) values for missing items

Strategy 1: Design to prevent
- Consider likely mechanisms for NR when designing survey
- Reduce respondent burden to extent possible
- Two main areas
  - Data collection methodology: burden for individual, population
  - Sample design: burden for population
- Remedies for avoiding NR also tend to improve data quality

Factors to consider
- Survey content
  - Salience of topic to respondent
  - Sensitive topics (socially undesirable behaviors, medical issues)
- Timing
  - Farm surveys avoid peak work times
  - Holidays associated with higher NR
- Interviewers
  - Training to improve technique
  - Refusal conversion staff
  - Observer variation for bird counts

Factors to consider – 2
- Data collection method
  - Mail/fax/web has highest NR, then phone, then in-person
  - Interviewer assists in locating process, gaining cooperation to participate, avoiding item NR
  - Computer-assisted data collection instruments prevent item NR due to data collector error
    - Guides data collection, checks for completeness

Factors to consider – 3
- Questionnaire design
  - Key: reduce respondent burden (effort to respond, frustration in responding)
  - Cognitive psych principles used to simplify, clarify, test questions and questionnaire flow
  - Examples of factors follow …
- Wording of individual questions
  - Can respondent answer the question?
  - Does s/he understand the question?
  - Single concept, simple wording, transition

Factors to consider – 4
- Questionnaire flow/design
  - Content: is the flow logical, and does it assist the cognitive process?
  - Mail, web, fax: visual interface is very important to helping respondent accurately complete questionnaire
- Length of questionnaire
  - Shorten to extent possible
  - Allowable length depends on how vested the respondent is likely to be

Factors to consider – 5
- Survey introduction
  - First contact between respondent and data collector
  - Want to motivate respondent to participate
    - Positive: contributions to knowledge base
    - Negative: confidentiality concerns
- Methods (use both if possible)
  - Advance letter to respondent or land owner (need address)
  - Phone or written introduction to questionnaire

Factors to consider – 6
- Incentives
  - Money, gifts, coupons, lottery; penalties
  - Hard to determine what is appropriate
  - Generally has a positive effect
  - Worry: incentive creep, increases cost of survey
    - Respondents get used to it -> increases difficulty and cost in gaining response
- Follow-up to obtain response
  - Mail: repeated notifications after initial mailing
    - Postcard reminder, 2nd questionnaire mailing
  - Phone: protocols for repeated attempts to get an answer, refusal conversion

Factors to consider – 7
- Sample design
  - Use design and estimation principles that increase precision for a given sample size
    - Stratification, ratio/regression estimation
  - Less burden on population by using smaller sample size to achieve a given precision level

Example: Census study
- Decennial census
  - Start with a mail survey, then do in-person nonresponse follow-up
- Small increases in response rates save big $$
  - Much cheaper to do a mail survey
  - Entire US population, so “sample size” is large
- Impact of three methods on response rates
  - Advance letter notifying household that census forms are coming
  - Stamped return envelope included with form
  - Reminder postcard sent a few days after the form
- Figure 8.1: letter, postcard > envelope
  - Response rate increased from 50% to 65%

Mechanisms for nonresponse
- Define a new random variable that indicates whether a unit responds to the survey: $R_i = 1$ if unit i responds, $R_i = 0$ otherwise
  - We use a random variable because willingness to respond is not a fixed characteristic of a unit
- Define the probability that a unit will respond to the survey = propensity score: $\phi_i = P(R_i = 1)$

Types of nonresponse
- MCAR: missing completely at random
- MAR: missing at random given covariates
  - Also called ignorable nonresponse
- Nonignorable nonresponse

Missing completely at random (MCAR)
- Propensity to respond is completely random
  - Default assumption in many analyses
  - Often not true
- Propensity score is not related to
  - Known information about the respondent or design factors (x)
  - Response variables to be observed (y)
- Implies
  - If we take a SRS of n units, responding portion of sample is a SRS of $n_R$ units
  - $\bar{y}_R$ (sample mean of responding units) is unbiased for $\bar{y}_U$ (population mean for whole pop)

Missing at random given covariates (ignorable)
- Propensity score
  - Depends on known information about respondent or variables used in sample design (x)
  - Does not depend on response (y)
- Since we know values of x for all units in the population, can create adjustments for the nonresponse
  - Adjustment methods depend on a model for nonresponse
- Example: propensity score depends only on gender and age, but does not depend on responses to questions in survey

Nonignorable nonresponse
- Propensity score depends on response (y) and can not be completely explained by other factors (x)
  - Example: crime victims less likely to respond to victimization questions (y) on a survey
- Models will not fully adjust for potential nonresponse bias
- Very difficult to verify if nonresponse mechanism is nonignorable
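
The three mechanisms can be compared on a toy population. This is a sketch under stated assumptions: the four y values, the class labels x, and the propensities are all made up, and the "expected respondent mean" is the large-sample approximation sum(phi*y)/sum(phi). The class adjustment weights each class's respondent mean by its population share.

```python
def expected_respondent_mean(y, phi):
    """Large-sample mean of respondents when unit i responds with probability phi[i]."""
    return sum(p * v for p, v in zip(phi, y)) / sum(phi)


def class_adjusted_mean(y, phi, x):
    """Combine per-class respondent means using population class shares."""
    total = 0.0
    for c in sorted(set(x)):
        idx = [i for i, xi in enumerate(x) if xi == c]
        class_mean = expected_respondent_mean(
            [y[i] for i in idx], [phi[i] for i in idx]
        )
        total += len(idx) / len(y) * class_mean
    return total


# Hypothetical population: y is the survey variable, x a known covariate.
y = [10, 14, 20, 24]
x = ["a", "a", "b", "b"]
true_mean = sum(y) / len(y)

mechanisms = {
    "MCAR": [0.5, 0.5, 0.5, 0.5],          # same propensity for everyone
    "MAR": [0.8, 0.8, 0.4, 0.4],           # propensity depends only on x
    "nonignorable": [0.8, 0.4, 0.8, 0.4],  # propensity depends on y within x
}
results = {
    name: (expected_respondent_mean(y, phi), class_adjusted_mean(y, phi, x))
    for name, phi in mechanisms.items()
}
for name, (raw, adjusted) in results.items():
    print(name, round(raw, 3), round(adjusted, 3), "true:", true_mean)
```

In this toy setup the class adjustment recovers the true mean under MAR but not under the nonignorable mechanism, matching the point that models will not fully adjust in that case.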

Strategy 2: Call-backs and double sampling
- Basic idea
  - Select a subsample of nonrespondents
  - Collect data from contacted nonrespondents
  - Use these data to estimate population mean for nonrespondents
- This subsample is referred to by Lohr as the “call-back” sample
  - It is a telephone follow-up to a mail survey
  - Method is more general than that
- The sampling design is an example of “double” or “2-phase” sampling (we won’t cover this in general)
- We will make the (very unrealistic) assumption that all of the “call-back” sample provides responses to the survey

Framework (diagram)
- Whole population: N units
  - Respondents (R)
  - Non-respondents (NR): $N_M$ units
- Sample: n units

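Subsample the nonresponding portion of population (diagram)
- Whole population: N units, split into respondents (R, $N_R$ units) and non-respondents (NR, $N_M$ units)
- Sample: n units, of which $n_R$ respond and $n_M$ do not
- Sample 100ν% of the nonresponding part of sample: $n_{MCB} = \nu n_M$ units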

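Estimation
- Notation (SRS of n from N): $S_R$ = the $n_R$ respondents in the original sample, $S_{MCB}$ = the $n_{MCB} = \nu n_M$ “call-back” nonrespondents
- Sample mean from responding population: $\bar{y}_R = \frac{1}{n_R} \sum_{i \in S_R} y_i$
- Sample mean from “call-back” subset of nonresponding population: $\bar{y}_M = \frac{1}{n_{MCB}} \sum_{i \in S_{MCB}} y_i$

Estimation – 2
- Estimator for population mean: $\hat{\bar{y}} = \frac{n_R}{n} \bar{y}_R + \frac{n_M}{n} \bar{y}_M$
- Estimator for population total: $\hat{t} = N \hat{\bar{y}} = \frac{N}{n} \sum_{i \in S_R} y_i + \frac{N}{n \nu} \sum_{i \in S_{MCB}} y_i$

Estimation – 3
- Analysis weights
  - Respondents in original sample: $w_i = N / n$
  - Nonrespondent “call-backs”: $w_i = N / (n \nu)$
- Estimator for variance of $\hat{\bar{y}}$
- Estimator for variance of $\hat{t}$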
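
A minimal sketch of the call-back point estimators, assuming an SRS of n from N and full response from the call-back subsample. The function name and data are hypothetical. It computes the estimated mean as (n_R/n) x respondent mean + (n_M/n) x call-back mean, the total as N times that mean, and the analysis weights N/n and N/(n x nu). Variance estimation is not covered here.

```python
def callback_estimates(y_resp, y_callback, n_nonresp, pop_size):
    """Two-phase (call-back) point estimates under SRS.

    y_resp: values from the n_R respondents in the original sample
    y_callback: values from the subsampled nonrespondents (all assumed to respond)
    n_nonresp: n_M, number of nonrespondents in the original sample
    """
    n_resp = len(y_resp)
    n = n_resp + n_nonresp
    nu = len(y_callback) / n_nonresp           # subsampling fraction
    ybar_resp = sum(y_resp) / n_resp
    ybar_callback = sum(y_callback) / len(y_callback)
    mean_hat = (n_resp / n) * ybar_resp + (n_nonresp / n) * ybar_callback
    return {
        "mean": mean_hat,
        "total": pop_size * mean_hat,
        "w_resp": pop_size / n,                # weight for original respondents
        "w_callback": pop_size / (n * nu),     # weight for call-back units
    }


# Hypothetical data: n = 10, 6 respond, 2 of the 4 nonrespondents are called back.
y_resp = [4, 6, 5, 5, 6, 4]
y_callback = [8, 10]
est = callback_estimates(y_resp, y_callback, n_nonresp=4, pop_size=1000)

# The weighted sum of all observed values reproduces the estimated total.
weighted_total = est["w_resp"] * sum(y_resp) + est["w_callback"] * sum(y_callback)
print(est, weighted_total)
```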

Strategy 3: weighting methods for nonresponse
- Approaches
  - Weighting-class adjustment
  - Post-stratification
- In previous chapters
  - Assume that all SUs/OUs provided a response
  - Weights were typically inverse of inclusion probability: $w_i = 1 / \pi_i$
- Interpretation of weight
  - Number of units in the population represented by unit i in the sample

Weighting methods for nonresponse
- What if not all SUs/OUs provide a response?
- Second probability: $\phi_i$ = probability of responding for unit i = propensity score
- Weight for unit i: $1 / (\pi_i \phi_i)$, where $\pi_i$ is the inclusion probability
- Interpretation
  - Number of units in the population represented by responding unit i
- Assumes data are missing at random (MAR, ignorable given covariates)

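Weighting-class adjustment
- Create a set of “weighting” classes such that we can assume propensity score is same within each class
  - Example: age classes 15-24, 25-34, 35-44, 45-64, 65+
- Estimate propensity score using initial sampling weights, $w_i = 1 / \pi_i$
  - For class c: $\hat{\phi}_c$ = (sum of weights for respondents in class c) / (sum of weights for all sampled units in class c)

Weighting-class adjustment – 2
- New analysis weight for responding portion of sample: $\tilde{w}_i = w_i / \hat{\phi}_c$ for responding unit i in class c
- Estimators for population total $t_U$ and mean $\bar{y}_U$: $\hat{t} = \sum_{i \in S_R} \tilde{w}_i y_i$ and $\hat{\bar{y}} = \hat{t} / \sum_{i \in S_R} \tilde{w}_i$, where $S_R$ is the set of responding units

Example: SRS design (p. 266)
- Notation: $n_c$ = sampled units in class c, $n_{cR}$ = responding units in class c, $\bar{y}_{cR}$ = mean of responding units in class c
- Inclusion probability for unit i: $\pi_i = n / N$
- Estimated propensity score for unit i in class c: $\hat{\phi}_c = n_{cR} / n_c$
- Analysis weight for responding unit i in class c: $\tilde{w}_i = \frac{N}{n} \cdot \frac{n_c}{n_{cR}}$

Example: SRS design – 2
- Table 8.2 for analysis weight (= weight factor in table)
- Estimator for population total under SRS: $\hat{t} = N \sum_c \frac{n_c}{n} \bar{y}_{cR}$
- Estimator for population mean under SRS: $\hat{\bar{y}} = \sum_c \frac{n_c}{n} \bar{y}_{cR}$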
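
A short sketch of the weighting-class adjustment under SRS, with hypothetical class labels and data (this is not Table 8.2). Within each class the propensity is estimated as the class response rate, the base weight N/n is divided by it, and the adjusted weights give the total and mean.

```python
def weighting_class_srs(pop_size, n, classes):
    """Weighting-class estimates under SRS.

    classes maps a class label to (n_c, respondent y values), where n_c is
    the number of sampled units in the class, respondents or not.
    Returns (adjusted weight per class, estimated total, estimated mean).
    """
    weights = {}
    total_hat = 0.0
    weight_sum = 0.0
    for label, (n_c, y_resp) in classes.items():
        phi_hat = len(y_resp) / n_c          # estimated propensity: n_cR / n_c
        w = (pop_size / n) / phi_hat         # (N / n) * (n_c / n_cR)
        weights[label] = w
        total_hat += w * sum(y_resp)
        weight_sum += w * len(y_resp)
    return weights, total_hat, total_hat / weight_sum


# Hypothetical SRS of n = 10 from N = 1000, two age classes.
classes = {
    "15-34": (4, [2, 4]),        # 4 sampled, 2 responded
    "35+": (6, [5, 7, 6, 6]),    # 6 sampled, 4 responded
}
weights, total_hat, mean_hat = weighting_class_srs(1000, 10, classes)

# Unadjusted respondent mean, for comparison.
all_resp = [v for _, ys in classes.values() for v in ys]
naive_mean = sum(all_resp) / len(all_resp)
print(weights, total_hat, mean_hat, naive_mean)
```

The adjusted mean equals the sum over classes of (n_c/n) times the class respondent mean, so the under-responding class is weighted back up to its share of the sample.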

Weighting-class adjustment – 3
- Selecting weighting classes
  - Use principles for selecting strata
  - Classes should be groups of similar units in relation to
    - Propensity score (likelihood of responding)
    - Response variable
  - Should maximize variation across classes for these two factors

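Post-stratification
- Assume SRS
- Very similar to weighting-class adjustment
  - Classes are post-strata
  - Use population counts rather than sample counts
- Weighting-class approach essentially estimates $N_h$ with $N n_h / n$, where $n_h$ is the number of sampled units in post-stratum h

Post-stratification (under SRS)
- Assume SRS of n from N
- Estimator for population mean: $\bar{y}_{post} = \sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_{hR}$, where $\bar{y}_{hR}$ is the mean of the $n_{hR}$ responding units in post-stratum h
- For a particular survey data set (condition on $n_{hR}$, h = 1, 2, …, H)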
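
A sketch of the post-stratified mean under SRS, assuming the population counts N_h are known from an outside source. The post-strata labels and counts below are hypothetical.

```python
def poststratified_mean(strata):
    """Post-stratified mean: sum over h of (N_h / N) * mean of respondents in h.

    strata maps a post-stratum label to (N_h, respondent y values).
    """
    pop_size = sum(n_h for n_h, _ in strata.values())
    return sum(
        (n_h / pop_size) * (sum(y_resp) / len(y_resp))
        for n_h, y_resp in strata.values()
    )


# Hypothetical known population counts and respondent data.
strata = {
    "15-34": (500, [2, 4]),
    "35+": (500, [5, 7, 6, 6]),
}
ybar_post = poststratified_mean(strata)
print(ybar_post)
```

The only change from the weighting-class version is that each class share comes from population counts rather than from n_c/n in the sample.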

Strategy 4: Imputation
- Missing item (question) data are typical in a survey
  - Refusals, data collector error, edit erroneous value after data collection
- Imputation is a statistical method for “filling in” missing values
  - If impute all missing values, can get a complete rectangular data set (rows = units, columns = variables)
  - An indicator variable should be developed to identify which values are imputed

Imputation methods
- Deductive imputation
  - Common method, rarely applicable
- Cell mean imputation
  - Leads to incorrect distribution of y in dataset
- Hot-deck imputation (random)
  - Most common and generally applicable
- Regression imputation
  - Between hot-deck and cell mean
- Multiple imputation
  - Accounting for variation due to imputation process

Deductive imputation
- Sufficient information exists to identify the missing value
- Relatively uncommon (especially with computer-based systems)
- Example for NCVS
  - Person 7: Crime victim = no, Violent crime victim = ?
  - Deductive imputation: Crime victim = no -> Violent crime victim = no

Cell mean imputation
- Procedure
  - Divide responding units into imputation classes
  - Within a given imputation class:
    - Calculate the average value for available item data in class
    - Fill in missing value for nonresponding unit with average value
- Properties
  - Assumes MAR (covariates = classes)
  - Retains mean estimate for an imputation class
  - Underestimates variance, distorts distribution of y
    - All missing values in a class are equal to the class mean
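
The procedure above can be sketched as follows. The record layout (keys "cls", "y", "imputed") and the data are hypothetical. The flag column follows the earlier advice to mark imputed values, and the final comparison shows the variance shrinking after imputation.

```python
from statistics import mean, variance


def cell_mean_impute(records):
    """Fill each missing y with the mean of observed y in the same class."""
    class_means = {}
    for cls in {r["cls"] for r in records}:
        observed = [r["y"] for r in records if r["cls"] == cls and r["y"] is not None]
        class_means[cls] = mean(observed)
    completed = []
    for r in records:
        if r["y"] is None:
            completed.append({"cls": r["cls"], "y": class_means[r["cls"]], "imputed": True})
        else:
            completed.append({"cls": r["cls"], "y": r["y"], "imputed": False})
    return completed


# Hypothetical item data; None marks item nonresponse.
records = [
    {"cls": "A", "y": 10.0}, {"cls": "A", "y": 14.0}, {"cls": "A", "y": None},
    {"cls": "B", "y": 20.0}, {"cls": "B", "y": None}, {"cls": "B", "y": 24.0},
]
completed = cell_mean_impute(records)

var_observed = variance([r["y"] for r in records if r["y"] is not None])
var_completed = variance([r["y"] for r in completed])
print(completed, var_observed, var_completed)
```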

(Random) hot deck imputation
- Procedure
  - Divide responding units into imputation classes (like weighting classes)
    - Choose like strata – group similar units in relation to variable with missing value
  - Within a given imputation class
    - Randomly select a donor from responding units in class
    - Fill in missing value for nonresponding unit with value from donor unit
- Properties
  - Retains variation in individual values
  - Assumes MAR (imputation class = covariate)
  - Can impute for many variables from same donor
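
A sketch of random hot-deck imputation for one variable, using the same hypothetical record layout ("cls", "y", "imputed") as an illustration. The seed parameter is an assumption added so the example is reproducible.

```python
import random


def hot_deck_impute(records, seed=0):
    """Fill each missing y with the y of a randomly chosen donor in the same class."""
    rng = random.Random(seed)
    donors = {}
    for r in records:
        if r["y"] is not None:
            donors.setdefault(r["cls"], []).append(r["y"])
    completed = []
    for r in records:
        if r["y"] is None:
            donor_value = rng.choice(donors[r["cls"]])   # random donor within class
            completed.append({"cls": r["cls"], "y": donor_value, "imputed": True})
        else:
            completed.append({"cls": r["cls"], "y": r["y"], "imputed": False})
    return completed


# Hypothetical item data; None marks item nonresponse.
records = [
    {"cls": "A", "y": 10.0}, {"cls": "A", "y": 14.0}, {"cls": "A", "y": None},
    {"cls": "B", "y": 20.0}, {"cls": "B", "y": None}, {"cls": "B", "y": 24.0},
]
hot_deck_completed = hot_deck_impute(records, seed=1)
print(hot_deck_completed)
```

Because imputed values are actual observed values from donors, they keep the spread of individual values instead of piling up at the class mean.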

Regression imputation
- Procedure
  - Use a regression model to relate covariate(s) to variable with missing data
  - Estimate regression parameters with data from responding units
  - Fill in missing value with predicted value, or derived value from prediction (if > .5, binary y = 1)
- Properties
  - Assumes MAR
  - Useful when number of responding units in imputation class is too small
  - Useful if a strong relationship exists that provides a better predicted value for the missing data
  - May be a form of (conditional) mean imputation
  - Requires separate model for each variable with missing data
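
A minimal sketch of regression imputation with one covariate, assuming a simple linear model fitted by ordinary least squares to the responding units. The data and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def regression_impute(x, y):
    """Return (completed y, imputed flags); None in y marks a missing item."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    intercept, slope = fit_line([p[0] for p in observed], [p[1] for p in observed])
    completed = [yi if yi is not None else intercept + slope * xi for xi, yi in zip(x, y)]
    flags = [yi is None for yi in y]
    return completed, flags


# Hypothetical data: the covariate x is known for every unit, y is missing for one.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, None, 8.0]
y_completed, imputed_flags = regression_impute(x, y)
print(y_completed, imputed_flags)
```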

Multiple imputation
- Procedure
  - Select an imputation method
  - Impute m > 1 values for each missing data item
  - Result is m (different) data sets with no missing values
- Properties
  - Variation in estimates across data sets provides an estimate of the variability associated with the imputation process
  - Solution to problem with other methods
    - Most analysts treat imputed data as “real” rather than “estimated” data
    - Underestimate variance of estimates

Imputation summary
- Most imputation methods assume MAR given covariates
  - Variation in methods associated with model used to account for covariate
- Good methods exist that do not lead to a distorted distribution of y in the data set
  - Avoid cell mean imputation
- Hot deck imputation allows us to perform imputation for >1 variable at a time
- Most imputation methods do not account for the fact that you are “estimating” the data when estimating the variance of an estimate
  - This is the motivation for multiple imputation
  - Need special estimators for variance in multiple imputation
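
One common choice for the "special estimators" is Rubin's combining rules, named here as an assumption since the slides do not say which estimator is meant. The total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The estimates and variances below are hypothetical.

```python
from statistics import mean, variance


def combine_multiple_imputations(estimates, variances):
    """Rubin's rules: pooled estimate and total variance from m imputed data sets."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)           # average of the m variance estimates
    between = variance(estimates)      # sample variance of the m point estimates
    total = within + (1 + 1 / m) * between
    return pooled, within, between, total


# Hypothetical results from m = 3 completed data sets.
mi_estimates = [10.0, 11.0, 12.0]
mi_variances = [1.0, 1.0, 1.0]
pooled, within, between, total_var = combine_multiple_imputations(mi_estimates, mi_variances)
print(pooled, within, between, total_var)
```

The between-imputation term is what a single completed data set cannot supply, which is why treating imputed data as "real" understates the variance.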

Outcome rates
- MANY ways to describe results of processes between sample selection and completing data collection
- Phases
  - Locating unit
  - Contacting unit (for people, businesses)
  - Gaining cooperation of a unit (refusals)
  - Determining eligibility
  - Obtaining complete item data for a unit
- AAPOR reference: http://www.aapor.org/default.asp?page=survey_methods/response_rate_calculator
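
As one illustration of an outcome rate, the sketch below computes AAPOR Response Rate 1: complete interviews divided by all eligible cases plus cases of unknown eligibility. The slides do not say which rate the course uses, so the choice of RR1 is an assumption, and the disposition counts are hypothetical.

```python
def aapor_rr1(complete, partial, refusal, noncontact, other,
              unknown_household, unknown_other):
    """AAPOR Response Rate 1: completes over eligible plus unknown-eligibility cases."""
    denominator = (
        (complete + partial)
        + (refusal + noncontact + other)
        + (unknown_household + unknown_other)
    )
    return complete / denominator


# Hypothetical final case dispositions for 1000 sampled cases.
rr1 = aapor_rr1(
    complete=600, partial=50, refusal=200, noncontact=100, other=30,
    unknown_household=15, unknown_other=5,
)
print(rr1)
```

RR1 is the most conservative of the AAPOR response rates because partial interviews are not counted as responses and every unknown-eligibility case stays in the denominator.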