Stratification, calibration and reducing attrition rate in the Dutch EU-SILC Judit Arends
Overview
  The Dutch EU-SILC (2016 redesign)
  Stratification: oversampling low income groups
  Calibration
  Reducing attrition, nonresponse rate
  Plans and discussion points
Dutch EU-SILC: sources of data
  Statistics Netherlands (SN)
  Register country: most information comes from registers
  Selected respondent: only one person is interviewed
  Income component
    Mostly registers (t-1)
    child support, students, hh transfer
  Material deprivation
    All items: survey
  Work intensity
    Register: employee status, source
    Survey: working hours, current status
  Some other target variables from register: country of birth, citizenship, NUTS, ethnic origin, child-care costs, rent, housing costs
Data collection strategy
  "Old" design
    w1: intro letter; 64-: LFS, CATI; 65+: BRP, CATI
    w2-w4: intro letter, CATI
  "New" design
    w1: intro letter, BRP, CAWI; nonresponse with phone number: CATI (64-, 65+)
    w2-w4: intro letter, CAWI; nonresponse with phone number: CATI
  Response = yes: recruitment for the next poll
Panel – Old & New design
  Designs shown: "Old" and "New"
  Interview year: t-3, t-2, t-1, t, t+1
  Sampling year t-1: w1 w2 w3 w4
  Sampling year t: w1 w2 w3 w4
  Sampling year t+1: w1 w2 w3 w4
  Sampling year t+2 ("New"): w1 w2 w3 w4
  Sampling year t+3 ("New"): w1 w2 w3 w4
  Note: Not sure if everyone is familiar with the 'structure' of SILC, so I will briefly explain …
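The four-wave rotation on this slide can be illustrated with a small sketch. It assumes that wave 1 is fielded in the sampling year itself and that each sampled cohort is followed for four consecutive years; the calendar years and the function name `waves_in_field` are hypothetical and only stand in for the slide's t notation.

```python
def waves_in_field(interview_year, first_cohort, n_waves=4):
    """Map each active sampling year to the wave it is in during interview_year.

    Assumption (not stated in the source): wave 1 takes place in the
    sampling year, wave 2 one year later, and so on up to n_waves.
    """
    active = {}
    for sampling_year in range(interview_year - n_waves + 1, interview_year + 1):
        if sampling_year >= first_cohort:
            active[sampling_year] = interview_year - sampling_year + 1
    return active


# Hypothetical years: the first cohort is sampled in 2016.
print(waves_in_field(2017, first_cohort=2016))
print(waves_in_field(2019, first_cohort=2016))
```

Under these assumptions, a full cross-section in any year after the start-up phase combines four cohorts, one in each of the waves w1 to w4.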
Sampling frames at CBS
  Municipal basic registration of population data:
    gender, date of birth, marital status, native country, native country of parents, nationality, type of household, position in household, RIN number, address, municipality, district code
  Not listed in CBS register: name, telephone number
Sampling frames at CBS
  Additional register information:
    income, self-employed, benefit, employed / unemployed, student grant, disability, addresses of institutional population
  Supplied by:
    Tax authorities, Employment office, Ministry of Education, Ministry of Social Affairs and Employment, Municipalities
Sampling frames: addresses (households) 10 %
Completing the samples
  Deleting sample elements with missing or incorrect address information
  Deleting institutional population
  Deleting addresses that were selected in a different survey in the last 12 months
  Adding telephone numbers
  (Under- or oversampling for subpopulations)
  Reducing sample to desired size in each stratum
  < 12, 65+
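The completion steps listed on this slide can be sketched as a small filter pipeline. This is only an illustration under stated assumptions: the record fields (`address`, `institutional`, `stratum`), the function name `complete_sample`, the phone lookup and the toy data are all hypothetical, and random thinning per stratum is one plausible way to reduce the sample to the desired size.

```python
import random


def complete_sample(records, recently_sampled, phone_book, target_size, seed=1):
    """Apply the sample-completion steps to a list of record dicts."""
    rng = random.Random(seed)  # fixed seed so the thinning is reproducible
    by_stratum = {}
    for rec in records:
        address = rec.get("address")
        if not address:                  # missing or incorrect address information
            continue
        if rec.get("institutional"):     # institutional population
            continue
        if address in recently_sampled:  # selected in another survey in the last 12 months
            continue
        # Add a telephone number where one is known (None otherwise).
        enriched = dict(rec, phone=phone_book.get(address))
        by_stratum.setdefault(rec["stratum"], []).append(enriched)

    final = []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        size = min(target_size.get(stratum, len(members)), len(members))
        final.extend(rng.sample(members, size))  # reduce to the desired size per stratum
    return final


# Hypothetical toy records, not real data.
records = [
    {"id": 1, "address": "A", "institutional": False, "stratum": 1},
    {"id": 2, "address": "", "institutional": False, "stratum": 1},
    {"id": 3, "address": "C", "institutional": True, "stratum": 1},
    {"id": 4, "address": "D", "institutional": False, "stratum": 1},
    {"id": 5, "address": "E", "institutional": False, "stratum": 2},
    {"id": 6, "address": "F", "institutional": False, "stratum": 2},
]
sample = complete_sample(
    records,
    recently_sampled={"D"},
    phone_book={"A": "010-0000000"},  # placeholder number
    target_size={1: 1, 2: 1},
)
print([rec["id"] for rec in sample])
```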
Sampling design 2016
  Sample persons were drawn from the sampling frame of persons from the Population Register (BRP)
  Stratified sampling design
  Strata: income, household size, and 16 years
  30 strata (22 – 21 strata): 10 income decile groups (t-2); 16 years; household size for 17+ (1 – 2 or more)
Sampling size
  Stratum  Age  Hh size    Income decile  Total (sample)  Population  Over-sampling
  1        17+  1          1              1301            441262      2.45
  2        17+  1          2              1402            409657      2.84
  3        17+  1          3              1229            389943      2.62
  4        17+  1          4              911             352467      2.15
  5        17+  1          5              613             311516      1.63
  6        17+  1          6              496             275288      1.50
  7        17+  1          7              402             245862      1.36
  8        17+  1          8              346             219692      1.31
  9        17+  1          9              287             190783      1.25
  10       17+  1          10             289             173555      1.38
  11       17+  2 or more  1              643             539456      0.99
  12       17+  2 or more  2              788             575937      1.14
  13       17+  2 or more  3              930             733345      1.05
  14       17+  2 or more  4              896             849475      0.88
  15       17+  2 or more  5              815             990995      0.68
  16       17+  2 or more  6              851             1120995     0.63
  17       17+  2 or more  7              859             1230166     0.58
  18       17+  2 or more  8              898             1331365     0.56
  19       17+  2 or more  9              931             1435404     0.54
  20       17+  2 or more  10             1022            1442278     0.59
  21       16   1 or 2     1              39              15017       2.18
  22       16   1 or 2     2              52              18777       2.27
  23       16   1 or 2     3              45              17759       2.10
  24       16   1 or 2     4              35              16760       1.75
  25       16   1 or 2     5              34              20404       1.37
  26       16   1 or 2     6                              22900       1.26
  27       16   1 or 2     7              32              23481       1.16
  28       16   1 or 2     8              31              22678       1.12
  29       16   1 or 2     9                              20753       1.08
  30       16   1 or 2     10                             20641       1.18
  Tot                                     16268           13506529    1.00
  Screening: -7%, so 1.07 × 2.84 × 16268 = 49435
  Thinning out: in each stratum
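The over-sampling column and the screening arithmetic on this slide can be checked with a few lines of Python. The sketch assumes that the over-sampling factor is the stratum sampling fraction divided by the overall sampling fraction, which reproduces the published values for the strata tried here; the function name `oversampling_factor` is hypothetical. Reading 1.07 × 2.84 × 16268 as "draw at the highest over-sampling rate plus 7% for screening, then thin out per stratum" is an interpretation, not something the slide states.

```python
# Sample size n_h and population size N_h for strata 1-3, copied from the slide.
strata = {1: (1301, 441262), 2: (1402, 409657), 3: (1229, 389943)}
N_SAMPLE, N_POP = 16268, 13506529  # totals from the "Tot" row


def oversampling_factor(n_h, pop_h, n=N_SAMPLE, pop=N_POP):
    """Stratum sampling fraction relative to the overall sampling fraction."""
    return (n_h / pop_h) / (n / pop)


factors = {h: round(oversampling_factor(n_h, pop_h), 2)
           for h, (n_h, pop_h) in strata.items()}
print(factors)

# Gross sample: 7% extra for screening, highest over-sampling factor (2.84).
gross_sample = round(1.07 * 2.84 * N_SAMPLE)
print(gross_sample)
```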
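Response distribution: income group
  Stratum  Age  Hh size    Income decile  Response
  1        17+  1          1              26%
  2        17+  1          2              29%
  3        17+  1          3              33%
  4        17+  1          4              37%
  5        17+  1          5              42%
  6        17+  1          6              41%
  7        17+  1          7
  8        17+  1          8              46%
  9        17+  1          9              47%
  10       17+  1          10             39%
  11       17+  2 or more  1              35%
  12       17+  2 or more  2              31%
  13       17+  2 or more  3
  14       17+  2 or more  4              45%
  15       17+  2 or more  5              49%
  16       17+  2 or more  6
  17       17+  2 or more  7              51%
  18       17+  2 or more  8              54%
  19       17+  2 or more  9              56%
  20       17+  2 or more  10             57%
  21       16   1 or 2     1 to 5
  22       16   1 or 2     6 to 10
  Total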
Response distribution: income group
Response probabilities 2017
  [Figure: response probabilities for Hh size = 1, Hh size = 2+ and 16 y]
  P_total = 42.0% (average of 2 types of incentives)
Weighting adjustments
  Four weighting adjustments are applied at SN (all towards the population of 16 years and older):
    Wave 1
    Waves 2 to 4 (separately)
    Cross-sectional
    Longitudinal
Administrative variables
  General socio-demographic
    Gender, age, province, household type and size, ethnicity, country region, urbanization
  SILC-specific
    Income (personal and household), house ownership, socio-economic status (SES)
    SES: employee, other active, allowance, pension, other
  From income data, three variables are derived:
    Household income deciles
    Household income below SN threshold
    Household income below EU poverty threshold
Weighting wave 1
  Model =
      Gender (2 classes) × Age (15 classes)
    + Province/NUTS2 (12 classes) × Age (2 classes)
    + Household size (5 classes)
    + NUTS2 (12 classes)
    + Ethnicity (3 classes)
    + Low income category SN (3 classes)
    + Degree of urbanization (5 classes) × EU poverty (3 classes)
    + Region/NUTS1 (4 classes) × EU poverty (3 classes)
    + NUTS1 (4 classes) × Income deciles (10 classes)
    + Tenure status/Houseownership (3 classes)
    + Activity status/SES (5 classes)
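To illustrate what calibrating to several margins at once means, here is a minimal raking (iterative proportional fitting) sketch in plain Python. The source does not say which calibration algorithm SN uses, so raking is a stand-in chosen for brevity; the respondents, the two margins and their population totals are hypothetical toy values, not SILC data.

```python
def rake(start_weights, respondents, margins, n_iter=100):
    """Scale weights repeatedly until weighted totals match every margin."""
    weights = list(start_weights)
    for _ in range(n_iter):
        for var, target in margins.items():
            # Current weighted total per class of this variable.
            current = {}
            for w, person in zip(weights, respondents):
                current[person[var]] = current.get(person[var], 0.0) + w
            # Rescale each weight so the classes hit their population totals.
            weights = [w * target[person[var]] / current[person[var]]
                       for w, person in zip(weights, respondents)]
    return weights


def weighted_total(weights, respondents, var, value):
    return sum(w for w, p in zip(weights, respondents) if p[var] == value)


# Hypothetical respondents and population totals (both margins sum to 100).
respondents = [
    {"gender": "m", "hhsize": "1"}, {"gender": "m", "hhsize": "2+"},
    {"gender": "m", "hhsize": "2+"}, {"gender": "f", "hhsize": "1"},
    {"gender": "f", "hhsize": "1"}, {"gender": "f", "hhsize": "2+"},
]
margins = {
    "gender": {"m": 50.0, "f": 50.0},
    "hhsize": {"1": 40.0, "2+": 60.0},
}
weights = rake([100 / 6] * 6, respondents, margins)
print([round(w, 2) for w in weights])
```

In this scheme a crossed term such as Gender × Age would enter as a single margin whose classes are the combinations of the two variables, while terms joined by "+" are separate margins.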
Weighting waves 2 to 4
  Model =
      gender × age14
    + province × age2
    + hhsize4
    + lowincome3
    + urbanization × EU poverty
    + region × EU poverty
    + income deciles
    + houseownership3
  Like wave 1 but less detailed
Weighting longitudinal data
  Model =
      gender × age15
    + province × age2
    + hhsize5
    + lowincome3
    + urbanization × EU poverty
    + region4 × EU poverty
    + region4 × income deciles
    + houseownership3
    + SES
  Like wave 1 but without ethnicity.
Weighting cross-sectional data
  Final model
    Model =
        gender × age15
      + province × age2
      + hhsize5
      + province
      + ethnicity3
      + lowincome3
      + urbanization × EU poverty
      + region4 × EU poverty
      + province × income deciles
      + region4 × houseownership3
      + province × SES
      + gender × age15 × hhtype
  Model without SILC-specific variables
    Model =
        gender × age15
      + province × age2
      + hhsize5
      + province
      + ethnicity3
      + urbanization
      + region4
      + gender × age15 × hhtype
Weighting cross-sectional data
  Income-related variables decrease the estimates, i.e. they give a more positive view of poverty
  Standard errors are also strongly reduced by the addition of the extra terms
Reducing attrition, nonresponse
  Age (especially younger people)
    To obtain a better estimate of the risk of poverty by age, the weighting model will be expanded with a crossing of age class and AROP
  Movers (CAWI)
    Question at the end about their plans
    W2-W4: BRP before fieldwork?
  Incentives
    10 euros (10% more response)
    iPad lottery: effect on response in W2
    W2-W4: 5 euros
  Feedback of the results of the previous year?
  E-mail addresses?
Reducing attrition, nonresponse
  Invitation letters: received before the weekend
  Reminders: 2 letters
  CAWI - CATI
    CATI in W1 and CATI in W2: worked well
    CATI for under 65 and 65 plus
    Morning, afternoon, evening
  Recruiting for W2
    About 80%: YES
    65% in W2
    52% response
    Experiment: not asking: 67% response
THANK YOU FOR YOUR ATTENTION Judit Arends-Tóth e-mail: jtoh@cbs.nl Bart Huynen e-mail: bhun@cbs.nl