Analysis on nonresponse bias for the Swedish Labour Force Surveys (LFS)
Fredrik Olsson, Statistics Sweden, fredrik.olsson@scb.se
28 June 2018, Session 18
Nonresponse rate in the Swedish LFS
Analysis on nonresponse bias for the Swedish Labour Force Surveys (LFS)
Joint work by Martin Axelson, Vanja Hultkrantz, Pär Sandberg, Fredrik Olsson and Frida Videll
Report published at https://www.scb.se/publikation/35754
Methods for analyzing effects of nonresponse
The Hansen-Hurwitz method: subsample of nonrespondents
+ Enables analysis on survey variables
- Expensive
- Hard to collect data from the entire subsample
Study using relevant and available data: register variables are collected for the entire sample
+ Relatively cheap
+ Easy to repeat
- Not survey variables
There are different methods for analyzing the effects of nonresponse. In our work we discussed two of them: the Hansen-Hurwitz method and a register-based analysis. In the Hansen-Hurwitz method, a subsample of nonrespondents is drawn and asked about some or all of the survey variables. On the plus side, this method enables analysis of the survey variables. On the downside, it is expensive and it's hard to collect data from the entire subsample. In a register-based analysis, relevant and available data are used as an approximation of the survey variables. To enable this analysis, register variables are collected for the entire sample. The advantages of this method are that it's relatively cheap, especially compared to the Hansen-Hurwitz method, and that it's easy to repeat. The drawback is that the analysis isn't done on the survey variables themselves.
Chosen method
Analysis using register variables
$\hat{\theta}_{zr}$ is the estimator for variable $z$ under nonresponse
$\hat{\theta}_{zs}$ is the estimator for variable $z$ under full response
Bias: $B(\hat{\theta}_{zr}) = \hat{\theta}_{zr} - \hat{\theta}_{zs}$
Relative bias: $RB(\hat{\theta}_{zr}) = (\hat{\theta}_{zr} - \hat{\theta}_{zs}) / \hat{\theta}_{zs}$
We chose to do an analysis using register variables. We computed two different estimates for each register variable. These estimates are computed using the same estimation procedure that is currently in use in the Swedish LFS. $\hat{\theta}_{zr}$ is the estimator for variable $z$ that is obtained under nonresponse, that is, based on the respondents. $\hat{\theta}_{zs}$ is the estimator for variable $z$ that is obtained under full response, that is, based on the entire sample. The estimated nonresponse bias is computed by subtracting the estimate for the sample from the estimate for the respondents. The estimated relative bias is computed by dividing the estimated bias by the estimate for the sample.
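As a minimal sketch of the two formulas above, the following Python snippet computes the estimated bias and the estimated relative bias from a respondent-based and a sample-based estimate. The function names are illustrative assumptions and not part of the LFS estimation system; the input figures are the December 2015 register-based estimates for employed that appear in the summary table of this presentation.

```python
def estimated_bias(theta_zr: float, theta_zs: float) -> float:
    """Estimated nonresponse bias: respondent-based minus sample-based estimate."""
    return theta_zr - theta_zs


def relative_bias(theta_zr: float, theta_zs: float) -> float:
    """Estimated relative bias, as a fraction of the sample-based estimate."""
    if theta_zs == 0:
        raise ValueError("sample-based estimate must be non-zero")
    return (theta_zr - theta_zs) / theta_zs


# Employed, age 16-74, December 2015 (figures from the summary table).
theta_zr = 4_726_000  # estimate based on the respondents
theta_zs = 4_673_000  # estimate based on the entire sample

bias = estimated_bias(theta_zr, theta_zs)
rb_percent = 100 * relative_bias(theta_zr, theta_zs)
print(f"bias = {bias}, relative bias = {rb_percent:.1f} per cent")
```

With these inputs the sketch gives a bias of 53 000 and a relative bias of about 1.1 per cent, which matches the table row for employed.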
Example of registers used
The Employment Register: Employed
Swedish Public Employment Service's register of job-seekers: Unemployed
The Employment Register and Swedish Public Employment Service's register of job-seekers: Not in the labour force
Register on Participation in Education: Students
Register over young people not in employment, education or training: NEET
We have used variables from different registers to approximate some of the survey variables in the LFS. From the Employment Register we use information to approximate the LFS variable employed. From the Swedish Public Employment Service's register of job-seekers we use information to approximate unemployed. We also use information from these two registers to approximate not in the labour force. To approximate employees we use information from the register on activity. From the Register on Participation in Education we use information to approximate students. From the Register on income and taxes we analyze three different income groups. To approximate NEET, we use the Register over young people not in employment, education or training.
Summary. Age 16-74, December 2015
Variable | Estimate respondents | Estimate sample | Bias | Relative bias (per cent)
Employed | 4 726 000 (±31 000) | 4 673 000 (±23 000) | 53 000 (±21 000) * | 1,1 (±0,4) *
Unemployed | 276 000 (±18 000) | 268 000 (±13 000) | 8 000 (±13 000) | 2,9 (±4,9)
Not in the labour force | 2 180 000 (±32 000) | 2 241 000 (±24 000) | -61 000 (±21 000) * | -2,7 (±0,9) *
Students | 1 123 000 (±32 000) | 1 019 000 (±23 000) | 104 000 (±23 000) * | 10,2 (±2,2) *
NEET (16-24, year 2014) | 60 900 (±4 700) | 86 000 (±4 000) | -25 100 (±2 500) * | -29,1 (±3,2) *
Here's a table that summarizes the results for December 2015 for those aged 16-74. There are two different estimates of each variable: one based on the respondents and one based on the sample. The estimated bias is obtained by subtracting the estimate based on the sample from the estimate based on the respondents. The estimated relative bias is obtained by dividing the bias by the estimate based on the sample. Each of these estimates is presented with its margin of error. If the absolute value of the estimated bias is larger than the margin of error, the bias is significantly different from zero and is marked with a star. The same goes for the estimate of the relative bias. We can see that the relative bias is significantly different from zero for all variables except unemployed. For employed, the relative bias is 1,1… We can see that NEET and students have the highest levels of relative bias. What they have in common is that they focus on young people.
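The star rule described in these notes can be sketched in a few lines of Python. This is only an illustration: the function name `is_significant` is an assumption, and the rule is implemented as the notes describe it, comparing the absolute value of the estimate with its margin of error. The figures are the estimated biases and margins of error from the table.

```python
def is_significant(estimate: float, margin_of_error: float) -> bool:
    """True when the interval estimate +/- margin_of_error does not cover zero."""
    return abs(estimate) > margin_of_error


# (variable, estimated bias, margin of error), taken from the summary table.
rows = [
    ("Employed", 53_000, 21_000),
    ("Unemployed", 8_000, 13_000),
    ("Not in the labour force", -61_000, 21_000),
    ("Students", 104_000, 23_000),
    ("NEET (16-24, year 2014)", -25_100, 2_500),
]

significant = {name: is_significant(bias, moe) for name, bias, moe in rows}
for name, flag in significant.items():
    print(f"{name}: {'*' if flag else 'not significant'}")
```

Only unemployed fails the check, in line with the statement that all variables except unemployed show a significant bias.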
“Employed”
Relative bias for employed. Age 16-74. December 2011 – December 2015. Per cent.
Looking at the relative bias for employed for December 2011 to December 2015, we can see that it has been relatively stable over time.
“Unemployed”
Relative bias for unemployed. Age 16-74. December 2011 – December 2015. Per cent.
Looking at the relative bias for unemployed for December 2011 to December 2015, we can see that there have been some changes, but there hasn't been an increase in the relative bias.
“Not in the labour force”
Relative bias for not in the labour force. Age 16-74. December 2011 – December 2015. Per cent.
The relative bias for not in the labour force has been relatively stable, looking at estimates for December 2011 to December 2015. Over the same time period the nonresponse rate has increased from 25 to 40 per cent. Looking at these results, we can't see a clear indication that the nonresponse bias increases as the nonresponse rate increases.
Estimates of change
Employed or unemployed according to register variables
Monthly basis
Comparison of corresponding months from one year to the next
Let's continue with estimates of change. In order to conduct this analysis we have created a variable for employed and unemployed using register variables. This variable is created on a monthly basis and is used to compare corresponding months from one year to the next.
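A rough sketch of how such a comparison could be set up is shown below. It assumes that the bias in an estimate of change is computed in the same way as for estimates of level, that is, as the change based on the respondents minus the change based on the sample. The function name and all input values are hypothetical and chosen only for illustration; they are not LFS results.

```python
def year_over_year_change(series):
    """Change between each month and the corresponding month one year earlier.

    `series` maps (year, month) tuples to estimates.
    """
    changes = {}
    for (year, month), value in series.items():
        previous = (year - 1, month)
        if previous in series:
            changes[(year, month)] = value - series[previous]
    return changes


# Hypothetical monthly estimates of employed (illustrative values only).
respondents = {(2014, 1): 4_600_000, (2015, 1): 4_650_000}
sample = {(2014, 1): 4_560_000, (2015, 1): 4_620_000}

change_respondents = year_over_year_change(respondents)
change_sample = year_over_year_change(sample)

# Estimated bias in the estimate of change, month by month.
bias_in_change = {
    period: change_respondents[period] - change_sample[period]
    for period in change_respondents
    if period in change_sample
}
print(bias_in_change)
```

Keying the series on (year, month) keeps the lookup of the corresponding month one year earlier trivial and avoids any date arithmetic.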
“Employed” and “unemployed”
Estimates of change regarding employed and unemployed. Age 16-74. January 2013 – December 2015.
In this graph we can see the estimates of change regarding employed and unemployed for January 2013 to December 2015. The blue lines represent estimates of employed and the purple lines represent estimates of unemployed. The solid lines represent estimates based on respondents and the dotted lines represent estimates based on the sample. For unemployed, the estimates based on the respondents are similar to those based on the sample. The same pattern is also seen for employed, with some exceptions.
“Employed”
Estimated bias and 95 per cent confidence interval. Estimate of change regarding employed. Age 16-74. January 2013 – December 2015.
If we take a closer look at employed, we can see that only a few of the observed differences are significantly different from zero. In this graph we can see the estimated bias and the 95 per cent confidence interval. If all of the marks belonging to a given period are above or below zero, we have a difference that is significantly different from zero. Here we can see two observations whose confidence intervals don't cover zero.
“Employed” and “unemployed”, upper secondary or non-tertiary education
Estimates of change regarding employed and unemployed. Upper secondary or non-tertiary education. Age 16-74. January 2013 – December 2015.
The pattern of higher levels of nonresponse bias for subgroups by education that we saw for estimates of level is also seen for estimates of change. Here we can see the estimates for employed and unemployed for upper secondary or non-tertiary education. For unemployed, the estimates based on the respondents are similar to those based on the sample. For employed, the estimates based on the respondents tend to be lower than those based on the sample. Let's take a closer look at the estimates for employed…
“Employed”, upper secondary or non-tertiary education
Estimated bias with 95 per cent confidence interval. Upper secondary or non-tertiary education. Age 16-74. January 2013 – December 2015.
Here we can see that the estimates based on the respondents are systematically lower than the estimates based on the sample.
Summary – Estimates of level
The estimated bias is significantly different from zero for many estimates
Estimates with a focus on young people have the highest levels of estimated bias: NEET, Students
The relative bias has been fairly constant over time
We'll start with estimates of level. For many estimates, the estimated bias is significantly different from zero. The highest levels of estimated bias are obtained for estimates with a focus on young people, that is, for estimates of NEET and students. We have also seen that the estimated bias varies between subgroups and that the highest levels are obtained for subgroups by education. Estimates for primary or lower secondary education tend to be underestimated and estimates for tertiary education tend to be overestimated. When we looked at the relative bias over time, we could see that it has been fairly constant for employed, unemployed and not in the labour force.
Summary - Estimates of change
Unemployed
  Estimates based on respondents are similar to those based on the sample
Employed
  In general the same pattern as for unemployed
  A difference is seen for education
    Upper secondary or non-tertiary education: estimates based on respondents are systematically lower than estimates based on the sample
    Tertiary: estimates based on the respondents are systematically higher than estimates based on the sample
For estimates of change, we looked at unemployed and employed. For unemployed, the estimates based on respondents are similar to those based on the sample. For employed, we can, in general, see the same pattern as for unemployed. For upper secondary or non-tertiary education, estimates based on respondents are systematically lower than those based on the sample. For tertiary education, estimates based on the respondents are systematically higher than those based on the sample.
Fredrik Olsson, Statistics Sweden, fredrik.olsson@scb.se
Analysis on nonresponse bias for the Swedish Labour Force Surveys (LFS) (https://www.scb.se/publikation/35754)