Chapter 5: The analysis of nonresponse
Handbook: chapter 5
  How to detect a bias?
  Where to find auxiliary information
  An example: POLS 1998
  Global analysis
  A more detailed analysis
  Conclusions
How to detect a bias?
  The question: How can you detect whether estimates for a target variable are biased if you do not know its values for the nonrespondents?
  The answer: Use auxiliary variables!
  Auxiliary variables:
    Have been measured in the survey
    Have a known population (or complete sample) distribution
    Are correlated with the target variables
  Important: Decide at the design stage which auxiliary variables to include in the survey.
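The idea of checking an auxiliary variable can be sketched as follows: compare its distribution among the respondents with its known distribution in the complete sample. The function name `compare_distributions` and the age-class counts are hypothetical and only illustrate the comparison; they are not POLS figures.

```python
from collections import Counter


def compare_distributions(sample_values, respondent_values):
    """Return {category: (sample share, response share, difference)}."""
    sample_counts = Counter(sample_values)
    resp_counts = Counter(respondent_values)
    n_sample = len(sample_values)
    n_resp = len(respondent_values)
    result = {}
    for category in sample_counts:
        sample_share = sample_counts[category] / n_sample
        resp_share = resp_counts[category] / n_resp
        result[category] = (sample_share, resp_share, resp_share - sample_share)
    return result


# Hypothetical toy data: age class of every sampled person and of the respondents.
sample = ["young"] * 400 + ["middle"] * 350 + ["old"] * 250
respondents = ["young"] * 180 + ["middle"] * 230 + ["old"] * 190

table = compare_distributions(sample, respondents)
for category, (s, r, d) in table.items():
    print(f"{category:8s} sample {s:.3f}  response {r:.3f}  difference {d:+.3f}")
```

A clearly non-zero difference for some category signals selective response with respect to that auxiliary variable, and so a risk of bias for target variables correlated with it.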
How to detect a bias?
  Possible relationships between target variable Y, response behaviour R and auxiliary variable X:
    Missing Completely At Random (MCAR): No relation between X and R. No bias. No problem.
    Missing At Random (MAR): Relation between X and R. Bias can be reduced by using X.
    Not Missing At Random (NMAR): Direct relation between Y and R. Bias cannot be reduced.
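The three mechanisms can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: a binary X and a binary Y, made-up response probabilities, and a simple adjustment that weights the respondent means per category of X by the known share of that category. None of these numbers come from the handbook.

```python
import random


def simulate(mechanism, n=100_000, seed=1):
    """Return (true mean of Y, respondent mean, X-adjusted respondent mean)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random() < 0.5                    # auxiliary variable X
        y = rng.random() < (0.7 if x else 0.3)    # target variable Y, correlated with X
        if mechanism == "MCAR":
            p = 0.6                               # response unrelated to X and Y
        elif mechanism == "MAR":
            p = 0.8 if x else 0.4                 # response depends on X only
        elif mechanism == "NMAR":
            p = 0.8 if y else 0.4                 # response depends directly on Y
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((x, y, rng.random() < p))

    true_mean = sum(y for _, y, _ in rows) / n
    resp = [(x, y) for x, y, r in rows if r]
    naive = sum(y for _, y in resp) / len(resp)

    # Weight the respondent mean within each X category by the known X share.
    adjusted = 0.0
    for level in (False, True):
        share = sum(1 for x, _, _ in rows if x == level) / n
        ys = [y for x, y in resp if x == level]
        adjusted += share * sum(ys) / len(ys)
    return true_mean, naive, adjusted


for mechanism in ("MCAR", "MAR", "NMAR"):
    true_mean, naive, adjusted = simulate(mechanism)
    print(f"{mechanism}: true {true_mean:.3f}  naive {naive:.3f}  adjusted {adjusted:.3f}")
```

In this toy setting the unadjusted respondent mean should be close to the true mean under MCAR, the adjustment with X should bring the estimate back close to the true mean under MAR, and a substantial bias should remain after adjustment under NMAR.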
Where to find auxiliary variables?
  In the sampling frame itself
  Example: Survey on Household Living Conditions 1998
    Two-stage sample; first stage: municipalities, second stage: persons from the population register
    Auxiliary variables: gender, age, marital status, geographical area
Where to find auxiliary variables?
  Observations by interviewers
  Example: Housing Demand Survey 1977/1978
    Estimate of the building period of the house, type of house, number of floors
Where to find auxiliary variables?
  National Statistical Institute
  Example: Survey on Household Living Conditions 1998
    Population by sex, age, marital status and region
Survey on Household Living Conditions 1998 (POLS)
  Continuous survey: every month a new sample, thematic modules
  Fieldwork: first month CAPI, second month CATI
  Fieldwork results:

  Result                              Frequency   Percentage
  Sample size                            39 302      100.0 %
  Response                               24 008       61.1 %
    Immediate response                    9 718       24.7 %
    Converted refusers                   14 275       36.3 %
    Other response                           15        0.0 %
  Nonresponse                            15 294       38.9 %
    Unprocessed cases                     2 514        6.4 %
    Non contact (not-at-home)             2 093        5.3 %
    Non contact (moved)                     376        1.0 %
    Not able (illness, handicap)            735        1.9 %
    Not able (language problem)             416        1.1 %
    Refusal                               8 918       22.7 %
    Other nonresponse                       242        0.6 %
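The percentages in the fieldwork results can be reproduced from the frequencies. The sketch below uses the counts reported on this slide; the variable and function names are illustrative.

```python
SAMPLE_SIZE = 39302

response = {
    "Immediate response": 9718,
    "Converted refusers": 14275,
    "Other response": 15,
}
nonresponse = {
    "Unprocessed cases": 2514,
    "Non contact (not-at-home)": 2093,
    "Non contact (moved)": 376,
    "Not able (illness, handicap)": 735,
    "Not able (language problem)": 416,
    "Refusal": 8918,
    "Other nonresponse": 242,
}


def percentage(count, total=SAMPLE_SIZE):
    """Share of the total sample, in percent, rounded to one decimal."""
    return round(100 * count / total, 1)


total_response = sum(response.values())
total_nonresponse = sum(nonresponse.values())

# The two groups together must account for the whole sample.
assert total_response + total_nonresponse == SAMPLE_SIZE

print(f"Response     {total_response:6d}  {percentage(total_response):5.1f} %")
print(f"Nonresponse  {total_nonresponse:6d}  {percentage(total_nonresponse):5.1f} %")
for label, count in {**response, **nonresponse}.items():
    print(f"  {label:30s} {count:6d}  {percentage(count):5.1f} %")
```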
Survey on Household Living Conditions 1998 (POLS) Composition of nonresponse
Survey on Household Living Conditions 1998 (POLS)
  Auxiliary variables
  Social Statistics Database of Statistics Netherlands
    Data on demography, geography, income, labour, education, health, social protection
    Variables used: sex, age, province, ethnic group, employment, social security benefits
  Data on postal code areas
    420,000 areas, homogeneous with respect to socio-economic characteristics
    Variables used: urbanisation, town size, percentage of non-natives, average housing value
  Fieldwork data
    Information about all contact attempts
    Variables used: result of contact attempt, type of nonresponse, interviewer district, telephone number
Global nonresponse analysis of POLS 1998 Age
Global nonresponse analysis of POLS 1998 Marital status
Global nonresponse analysis of POLS 1998 Household size
Global nonresponse analysis of POLS 1998 Degree of urbanisation
Global nonresponse analysis of POLS 1998 Province
Global nonresponse analysis of POLS 1998 Average house value in neighbourhood
Global nonresponse analysis of POLS 1998 Percentage of non-natives in neighbourhood
Global nonresponse analysis of POLS 1998 Listed fixed-line telephone number
Detailed analysis
  Ingredients for a detailed analysis of nonresponse
    Respondent characteristics
    Neighbourhood characteristics
    Interviewer and data collection characteristics
Detailed analysis
  Respondent characteristics
    Demographics like age, gender, marital status, type of household
    Socio-economics like income, employment, education
  Neighbourhood characteristics
    Type of dwelling and neighbourhood, degree of urbanisation
  Interviewer and data collection characteristics
    Type of nonresponse
    Contact and refusal strategy
    Interviewer experience and gender
    Interview mode
Detailed analysis
  Decomposition of response process
    Sample
    Processed? If not: Not processed
    Not moved? If not: Moved
    Contacted? If not: Not contacted
    Participated? If not: Language problem, Not able, Refusal
    Response
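The decomposition can be sketched numerically: the overall response rate is the product of the conditional rates of the successive steps. The counts are the POLS 1998 fieldwork results reported earlier. Two points are assumptions of this sketch rather than statements from the source: the steps are applied in the order shown above, and all remaining nonresponse (not able, language problem, refusal, other nonresponse) is counted at the participation step.

```python
# Fieldwork counts as reported for POLS 1998.
SAMPLE = 39302
UNPROCESSED = 2514
MOVED = 376
NOT_CONTACTED = 2093  # not-at-home
RESPONSE = 24008


def step_rates():
    """Conditional rate of passing each step, given that the previous step was passed."""
    processed = SAMPLE - UNPROCESSED
    not_moved = processed - MOVED
    contacted = not_moved - NOT_CONTACTED
    # Assumption: every contacted case that did not respond is counted here.
    return [
        ("processed", processed / SAMPLE),
        ("not moved", not_moved / processed),
        ("contacted", contacted / not_moved),
        ("participated", RESPONSE / contacted),
    ]


rates = step_rates()
overall = 1.0
for name, rate in rates:
    overall *= rate
    print(f"{name:13s} {100 * rate:5.1f} %")
print(f"{'overall':13s} {100 * overall:5.1f} %")
```

Looking at the steps separately shows where most cases are lost, which the overall rate alone does not reveal.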
Detailed analysis
  Why a decomposition of the response process?
    It gives insight into the possible causes of selective response
    It may reveal possible measures to reduce nonresponse
    It enables the incorporation of auxiliary information
  Compare survey topics to steps in the response process
    Do you travel a lot?
    Did you recently buy a new house?
    How many years of education in Dutch do you have?
    Do you participate in surveys?
Detailed analysis
  Processing allocated cases
    Respondent characteristics: listed phone, access to web
    Neighbourhood characteristics: type of neighbourhood
    Interviewer and data collection characteristics: interviewer workload
  Example
    Interviewer district, average value of houses
    Proportion explained: 9.1%
Detailed analysis
  Households that moved
    Respondent characteristics: available auxiliary information
    Neighbourhood characteristics: available auxiliary information
    Interviewer and data collection characteristics: tracking strategy, interviewer experience
  Example
    Age, marital status, degree of urbanisation
    Proportion explained: 15.3%
Detailed analysis
  Non-contacted cases
    Respondent characteristics: available auxiliary information
    Neighbourhood characteristics: available auxiliary information
    Interviewer and data collection characteristics: number and timing of contact attempts, registered phone, interviewer workload
  Example
    Registered phone, interviewer district, age, type of household
    Proportion explained: 15.2%
Detailed analysis
  Non-contacted cases (tree diagram; the node layout was lost in extraction)
  Labels shown in the tree:
    Listed phone number / No listed phone number
    Child, married, single parent, partner in unmarried couple without children / Other
    Couple without children, single parent
    Divorced
    Big cities / Other towns
    Flevoland, Amsterdam
    Districts: 8, 12, 15, 17
  Node values in extraction order:
    N=32996 P=5.9%; N=25512 P=3.8%; N=232 P=8.2%; N=1494 P=18.9%; N=482 P=13.3%; N=1012 P=21.5%; N=1064 P=30.3%; N=581 P=17.2%
Detailed analysis
  Language problems
    Respondent characteristics: available auxiliary information
    Neighbourhood characteristics: available auxiliary information
    Interviewer and data collection characteristics: ethnicity of interviewer, proxy-interviewing
  Example
    Ethnicity, i.e. origin and generation
    Proportion explained: 51.2%
Detailed analysis
  Mentally or physically not able
    Respondent characteristics: available auxiliary information
    Neighbourhood characteristics: available auxiliary information
    Interviewer and data collection characteristics: -
  Example
    Age, employment status
    Proportion explained: 20%
Detailed analysis
  Refusals
    Respondent characteristics: available auxiliary information
    Neighbourhood characteristics: available auxiliary information
    Interviewer and data collection characteristics: interviewer experience, interviewer gender and age, interview mode, follow-up strategy
  Example
    Registered phone, type of household, average house value, age, interviewer district
    Proportion explained: <5%
Detailed analysis
  In summary, the literature confirms most findings
    Sample
    Processed? Interviewer workload, average house value
    Not moved? Age, type of household
    Contacted? Registered phone, interviewer workload, age, type of household
    Participated? Age, interviewer experience, ethnic background, average house value
    Response
Conclusions
  Identify and link available auxiliary information
    Respondent characteristics
    Neighbourhood characteristics
    Interviewer and data collection characteristics
    Investigate the literature on similar surveys and data collection strategies
    If crucial auxiliary information is not available, plan to collect it in future
  Decompose the response process
    Make at least a distinction between contact and participation
    Relate survey topics to steps in the response process
    Relate interviewer and data collection characteristics to steps in the response process
    Identify mode characteristics