The European Statistical Training Programme (ESTP)


1 The European Statistical Training Programme (ESTP)

2 Chapter 11: Adjustment for different types of nonresponse
Handbook: chapter 12. Outline: Motivation, Methods, Example.

3 Motivation
The relationship between response and the auxiliary variables is the same within one survey. The relationship between the auxiliary variables and the survey variables, however, can differ within one survey. There are situations in which distinct causes of nonresponse may have a different influence on the survey variables.

4 Motivation
Relationship between employment and non-contact (Daalmans et al. (2006)).
[Figure: estimated number of employed persons as a function of the number of contact attempts]
Hard-to-contact persons are more often employed.

5 Motivation
There may be a correlation between the survey variables and the different response types. This results in a different effect on the nonresponse bias.
Nonresponse bias of the estimated response mean under the fixed response model: [equation not recovered]
Two main causes of nonresponse: non-contact and refusal. These two types are nested.
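The slide's own expression is not preserved in this transcript. As a hedged sketch, the textbook form of the bias under the fixed response model can be written as follows, with notation that is assumed here rather than taken from the slide: $N$ is the population size, $\bar{Y}_R$, $\bar{Y}_{NC}$ and $\bar{Y}_{RF}$ are the population means of the survey variable among respondents, non-contacts and refusers, and $N_{NC}$, $N_{RF}$ are the sizes of the two nonresponse groups, with $N_{NR} = N_{NC} + N_{RF}$.

```latex
\begin{align*}
B(\bar{y}_R) &= \bar{Y}_R - \bar{Y}
  = \frac{N_{NR}}{N}\left(\bar{Y}_R - \bar{Y}_{NR}\right), \\
\bar{Y}_{NR} &= \frac{N_{NC}\,\bar{Y}_{NC} + N_{RF}\,\bar{Y}_{RF}}{N_{NR}}, \\
B(\bar{y}_R) &= \frac{N_{NC}}{N}\left(\bar{Y}_R - \bar{Y}_{NC}\right)
  + \frac{N_{RF}}{N}\left(\bar{Y}_R - \bar{Y}_{RF}\right).
\end{align*}
```

Written this way, each cause of nonresponse contributes its own term, so non-contact and refusal can pull the response mean in different directions.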

6 Motivation
Effect on bias: [diagram not recovered; it splits the sample into contact and noncontact, and contacted elements into participation and refusal, with labels NR|NC, NRF|C and NRF|NC]

7 Methods
The influence of the different causes of nonresponse can be a reason to use more advanced methods to adjust for nonresponse bias. Two methods are discussed: the sequential weight adjustment method and the sample selection model.
Sequential response process for contact and participation: [diagram: Sample, then Contact (Ci = 1) or Non-contact (Ci = 0); for contacted elements, Participation (Pi = 1) or Refusal (Pi = 0); propensity labels ςi and γi]

8 Methods – Sequential weight adjustment method
Groves and Couper (1998), Iannacchione (2003). The method comes down to sequentially fitting a number of logistic regression models for different subsets of observations.
Step 1: Fit a logistic regression model for contact for all elements.

9 Methods – Sequential weight adjustment method
Sample elements that have successfully been contacted (Ci = 1) receive as a weight wi the inverse of their estimated contact propensity, so that they represent the sample.
Step 2: Fit a weighted logistic regression model for participation, for contacted elements only.

10 Methods – Sequential weight adjustment method
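Final weights are obtained by multiplying wi with the inverse of the estimated participation propensity. Hence, the final weight is the inverse of the product of the estimated contact and participation propensities. These weights can be used in regular nonresponse adjustment methods, for instance the Horvitz-Thompson estimator.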
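The two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated, the single binary auxiliary variable and all propensity values are hypothetical, and the logistic regression is fitted with a plain gradient ascent written out here rather than with a statistics package.

```python
import math
import random

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(beta, x):
    # Estimated propensity for covariate vector x.
    return sigmoid(sum(b * v for b, v in zip(beta, x)))

def fit_logit(X, y, weights=None, steps=500, lr=1.0):
    # Weighted logistic regression by plain gradient ascent (illustrative only).
    weights = weights or [1.0] * len(X)
    total = sum(weights)
    beta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, yi, wi in zip(X, y, weights):
            resid = wi * (yi - predict(beta, x))
            for j, v in enumerate(x):
                grad[j] += resid * v
        beta = [b + lr * g / total for b, g in zip(beta, grad)]
    return beta

# Hypothetical sample: intercept plus one binary auxiliary variable.
random.seed(1)
n = 1000
X = [[1.0, float(random.random() < 0.5)] for _ in range(n)]
contact = [int(random.random() < (0.9 if x[1] else 0.7)) for x in X]
respond = [int(c == 1 and random.random() < (0.7 if x[1] else 0.5))
           for x, c in zip(X, contact)]

# Step 1: contact model on all sample elements; weight = 1 / contact propensity.
beta_contact = fit_logit(X, contact)
w = [1.0 / predict(beta_contact, x) for x in X]

# Step 2: weighted participation model, contacted elements only.
contacted = [i for i in range(n) if contact[i] == 1]
beta_part = fit_logit([X[i] for i in contacted],
                      [respond[i] for i in contacted],
                      [w[i] for i in contacted])

# Final weights for respondents: w_i / participation propensity.
final_w = {i: w[i] / predict(beta_part, X[i])
           for i in contacted if respond[i] == 1}

# Share of the auxiliary variable: full sample, unweighted and weighted respondents.
sample_share = sum(x[1] for x in X) / n
response_share = sum(X[i][1] for i in final_w) / len(final_w)
weighted_share = (sum(wt * X[i][1] for i, wt in final_w.items())
                  / sum(final_w.values()))
```

In this simulated setting the group with the auxiliary variable equal to 1 is both easier to contact and more willing to participate, so it is over-represented among respondents. The final weights should bring the weighted share back close to the share in the full sample.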

11 Methods – Sequential weight adjustment method
The method accounts for the sequential nature of the response process and allows a distinction between different response types. It does not allow for correlation between response types. Participation propensities are only defined for the subset of contacted elements. This can be overcome by using a probit instead of a logit model: there is then an underlying, latent participation probability (also for non-contacted elements), and the error distribution allows for correlation between the different stages of the response process. This leads to the sample selection model.

12 Methods – Sample selection model
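Heckman (1979). Sample selection arises due to self-selection, either explicit (refusal) or implicit (non-contact).
[Diagram: Sample, then Contact (Ci = 1) or Non-contact (Ci = 0); for contacted elements, Participation (Pi = 1) or Refusal (Pi = 0); latent variables ςi* and γi*; Yi* = Yi under participation, Yi* missing otherwise]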

13 Methods – Sample selection model
Two equations: response (selection equation) and survey variable (regression equation). Both variables are latent variables. The outcome of the first equation determines whether the survey variable is observed.

14 Methods – Sample selection model
The error term distribution is assumed to be bivariate normal: [equation not recovered]
Estimator based on the sample selection model: [equation not recovered]
Estimation of the model can be done by maximum likelihood estimation (MLE) or Heckman’s two-stage estimator.
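The equations on these slides are not preserved in the transcript. For orientation, the textbook single-selection form of the sample selection model (Heckman, 1979) is sketched below. The notation is an assumption and not necessarily the one used on the slides: $R_i^*$ is the latent response variable, $Y_i^*$ the latent survey variable, $X_i$ the auxiliary variables, and $\phi$ and $\Phi$ the standard normal density and distribution function.

```latex
\begin{align*}
R_i^* &= X_i'\beta^{R} + \varepsilon_i^{R}, &
R_i &= \begin{cases} 1 & \text{if } R_i^* > 0 \\ 0 & \text{otherwise} \end{cases} \\
Y_i^* &= X_i'\beta^{Y} + \varepsilon_i^{Y}, &
Y_i &= \begin{cases} Y_i^* & \text{if } R_i = 1 \\ \text{missing} & \text{if } R_i = 0 \end{cases} \\
\begin{pmatrix} \varepsilon_i^{R} \\ \varepsilon_i^{Y} \end{pmatrix}
 &\sim N_2\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
 \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix} \right) \\
E\left(Y_i \mid R_i = 1, X_i\right) &= X_i'\beta^{Y}
 + \rho\,\sigma\,\lambda\!\left(X_i'\beta^{R}\right), &
\lambda(z) &= \frac{\phi(z)}{\Phi(z)}
\end{align*}
```

The last line is the basis of the two-stage estimator: a probit model for response gives an estimate of $\lambda$, which is then added as an extra regressor in the outcome equation. If $\rho = 0$ the correction term vanishes, which corresponds to no selection bias.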

15 Methods – Sample selection model
Identification of the sample selection model hinges on the assumption of a bivariate normal distribution of the error terms. This causes a serious lack of robustness against misspecification.

16 Example – GPS
Methods applied to the General Population Survey.
Fieldwork results:

Result                 Frequency   Percentage
Sample size               32,019       100.0%
Response                  18,792        58.7%
Nonresponse               13,227        41.3%
  Unprocessed cases        2,456         7.7%
  Non-contact              1,847         5.8%
  Not able                 1,034         3.2%
  Refusal                  7,890        24.6%

Unprocessed cases become non-contact; not able becomes refusal:

Result                 Frequency   Percentage
Sample size               32,019       100.0%
Response                  18,792        58.7%
Nonresponse               13,227        41.3%
  Non-contact              4,303        13.4%
  Refusal                  8,924        27.8%
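The recoding of the fieldwork results is simple arithmetic and can be checked with a short script. The counts are those from the slide; the dictionary and variable names are chosen here for illustration.

```python
# Fieldwork counts as reported on the slide.
fieldwork = {
    "Response": 18792,
    "Unprocessed cases": 2456,
    "Non-contact": 1847,
    "Not able": 1034,
    "Refusal": 7890,
}

# Recoding rule from the slide: unprocessed -> non-contact, not able -> refusal.
recode = {"Unprocessed cases": "Non-contact", "Not able": "Refusal"}

collapsed = {}
for result, count in fieldwork.items():
    key = recode.get(result, result)
    collapsed[key] = collapsed.get(key, 0) + count

n = sum(fieldwork.values())                  # total sample size
n_contacted = n - collapsed["Non-contact"]   # elements entering the participation model
shares = {k: round(100 * v / n, 1) for k, v in collapsed.items()}

print(collapsed)
print(n, n_contacted)
print(shares)
```

The collapsed counts reproduce the 4,303 non-contacts and 8,924 refusals, and the number of contacted elements matches the nc = 27,716 used for the participation model. Computed this way, the refusal share rounds to 27.9%, slightly different from the figure printed on the slide.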

17 Example – SWA-method
Fit two separate logit models: one for contact and one for participation. The contact logit model is fitted for all n = 32,019 observations. The participation logit model is fitted for the subset of observations that are contacted, nc = 27,716. First, a bivariate analysis of the auxiliary variables with contact and participation.

18 Example – SWA-method
Cramér’s V per auxiliary variable. Columns in the source: Contact, Participation, Response. Rows with fewer than three values lost entries in extraction; the surviving values are listed in source order.
Region of the country: 0.205 0.092 0.163
Degree of urbanization: 0.186 0.086 0.153
Has listed phone number: 0.138 0.107 0.150
Percentage non-western non-natives in neighborhood: 0.144 0.088
Percentage non-natives in neighborhood: 0.081 0.133
Average house value in neighborhood: 0.109 0.115
Ethnic background: 0.098 0.089 0.112
Type of household: 0.122 0.067 0.106
Size of the household: 0.114 0.066 0.099
Marital status: 0.130 0.058 0.097
Is non-native: 0.087
Has social allowance: 0.064 0.077
Age in 13 classes: 0.095 0.071 0.061
Has an allowance: 0.034 0.055
Children in household: 0.053 0.038 0.056
Has a job: 0.012 0.037
Age in 3 classes: 0.084 0.044 0.030
Has disability allowance: 0.001 0.025 0.021
Gender: 0.011
Has unemployment allowance: 0.003 0.002 0.000
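For readers who want to reproduce this kind of bivariate screening, Cramér’s V can be computed from a contingency table of an auxiliary variable against a response indicator. The sketch below uses the standard definition, V = sqrt(χ2 / (n · (min(r, c) − 1))), and assumes that every row and column total is positive. The example counts are hypothetical and not taken from the GPS.

```python
import math

def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_tot), len(col_tot)) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical counts: rows = listed phone (yes, no), columns = contacted (yes, no).
example = [[900, 100],
           [700, 300]]
print(round(cramers_v(example), 3))
```

A value of 0 means the variable is unrelated to the response indicator in the table, and a value of 1 means a perfect association.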

19 Example – SWA-method
Next, logistic regression models are built stepwise. The region of the country and having a listed phone are by far the most influential variables in the models for contact and participation.
Wald χ2 per variable for the Contact, Participation and Response models (three side-by-side lists, flattened in extraction): Region 135.40 Listed phone 149.84 242.04 127.57 113.14 164.79 Age in 13 categories 81.95 Marital status 75.87 Ethnic background 93.65 Urbanization 43.17 72.45 74.12 Type of household 42.84 Is non-native 72.02 Size of household 52.58 Average house value 41.19 Age in 3 categories 63.59 25.29 Has children 35.51 58.48 Has a job 23.70 31.02 44.50 23.61 Gender 21.37 43.70 16.28 Percentage of non-natives 19.62 14.40 14.72 12.00 Has social allowance 11.30 10.97 8.97 3.91 Has a social allowance 8.62

20
Columns: Variable, Category, β, se
Intercept 0.8868** 0.2028
Region (Reference Greenfields)
  Woodlands 0.1424 0.0762
  Lowlands ** 0.0721
  Highlands 0.0279 0.0728
  Metropolis ** 0.1009
Urbanization (Reference very strong)
  Strong 0.0837
  Fairly 0.1126 0.0882
  Little 0.0895
  Not 0.3465** 0.0977
Percentage of non-natives (Reference <5%)
  5-10% 0.0310 0.0481
  10-15% 0.0582
  15-20% 0.0699
  20-30% ** 0.0707
  30-40% * 0.0923
  40-50% * 0.1186
  >50% ** 0.1394
Listed Phone (Reference no)
  Yes 0.4452** 0.0394
Marital status (Reference not married)
  Married 0.2880** 0.0570
  Widowed 0.2856** 0.0962
  Divorced 0.0401 0.0726
Type of household (Reference single)
  Couple without children 0.3688** 0.0591
  Couple with children 0.0219 0.0834
  Single parent collinear
  Other 0.1534
Average housevalue (Reference < 50 thousand)
  50 – 75 thousand 0.3062* 0.1511
  75 – 100 thousand 0.3643* 0.1474
  100 – 125 thousand 0.3602* 0.1483
  125 – 150 thousand 0.3489* 0.1492
  150 – 200 thousand 0.3047* 0.1480
  200 – 250 thousand 0.2214 0.1517
  250 – 300 thousand 0.2167 0.1588
  300 – 350 thousand 0.1674
  350 – 400 thousand 0.1831
  400 – 500 thousand 0.1223 0.1982
  > 500 thousand 0.2067 0.2306
Age in 13 categories (Reference years)
  20-24 years ** 0.1161
  25-29 years ** 0.1167
  30-34 years * 0.1186
  35-39 years 0.1208
  40-44 years 0.0090 0.1253
  45-49 years 0.1267
  50-54 years 0.0618 0.1303
  55-59 years 0.1361
  60-64 years 0.2272 0.1442
  65-69 years 0.1183 0.1450
  70-74 years 0.3128* 0.1533
  75 + 0.2099 0.1460

21 Example – SWA-method
Columns: Variable, Category, β, se
Is non-native (Reference no)
  Yes ** 0.0482
Has children (Reference no)
  0.4805** 0.0806
Gender (Reference male)
  Female 0.1631** 0.0353
Has a job
  0.1430** 0.0413
Pseudo R2 0.0780
χ2-value
df 49
Pseudo R2 is low, 7.8% explained variance. Results confirm what is known from the literature on nonresponse.

22
Columns: Variable, Category, β, se
Intercept 1.1674** 0.1905
Listed Phone (Reference no)
  Yes 0.4064** 0.0332
Region (Reference Greenfields)
  Woodlands 0.0497
  Lowlands ** 0.0474
  Highlands 0.0484
  Metropolis ** 0.0581
Ethnic background (Reference native)
  First generation non-western collinear
  First generation western 0.4839** 0.0923
  Second generation non-western 0.3608 0.2579
  Second generation western 0.7092** 0.0933
Average housevalue (Reference < 50 thousand)
  50 – 75 thousand 0.1695
  75 – 100 thousand 0.1650
  100 – 125 thousand 0.1646
  125 – 150 thousand 0.1647
  150 – 200 thousand 0.1637
  200 – 250 thousand 0.1653
  250 – 300 thousand 0.1686
  300 – 350 thousand 0.1751
  350 – 400 thousand 0.0701 0.1885
  400 – 500 thousand 0.1913
  > 500 thousand 0.0406 0.2125
Age in 13 categories (Reference years)
  20-24 years ** 0.0995
  25-29 years **
  30-34 years ** 0.1011
  35-39 years 0.3099** 0.0604
  40-44 years 0.2821** 0.0615
  45-49 years 0.1258* 0.0596
  50-54 years collinear
  55-59 years 0.0779
  60-64 years 0.0042 0.0783
  65-69 years 0.1028 0.0793
  70-74 years
  75 + ** 0.0745
Size of household (Reference 1)
  2 ** 0.0485
  3 ** 0.0548
  4 * 0.0582
  5 or more 0.0671
Has social allowance (Reference no)
  Yes ** 0.0711
Marital status (Reference not married)
  Married 0.3727** 0.0458
  Widowed 0.0558 0.0719
  Divorced 0.2570** 0.0645
Is non-native
  ** 0.0757

23 Example – SWA-method
Columns: Variable, Category, β, se
Has a job (Reference no)
  Yes 0.1276** 0.0336
Age (Reference 18 – 34 years)
  35 – 54 years ** 0.1060
  55 years and older ** 0.1196
Gender (Reference male)
  Female 0.0541* 0.0274
Pseudo R2 0.0270
χ2-value 941.99
df 42
Pseudo R2 is even lower than for contact, 2.7% explained variance. This may be an advantage (think in terms of representativity).

24 Example – SWA-method
Columns: Variable, Category, Response mean, SWA, GREG. Rows with fewer than three values lost entries in extraction; the surviving values are listed in source order.
PC in household
  yes 57.8 55.2 55.3
  no 42.6 44.8 44.7
Wants to move within a year
  definitely not 72.3 70.7
  possibly 15.2 15.6 15.7
  cannot find something 1.7 1.8
  definitely 8.4 9.4 9.3
  is going to move 2.5
General health condition
  very good 22.4 21.8
  good 55.4 54.7 54.8
  reasonable 13.1 13.6
  varied 6.6 7.1 7.0
  bad 2.8
Has newspaper subscription
  yes 65.6 62.9 62.8
  no 34.4 37.1 37.2
Is active in a club
  yes 44.4 42.8
  no 55.6 57.2
Is interested in politics
  very interested 11.6 11.9
  fairly interested 42.4 41.9
  little interested 29.1 28.5
  not interested 16.9 17.7
Job level
  very low 3.6 3.8
  low 13.3 13.1 13.0
  middle 22.8 22.0
  high 11.2 11.0
  academic 4.2
  no job 44.8 46.0 46.1
Level of education
  primary 20.0 21.3 21.2
  junior secondary 9.2 9.1
  prevocational 17.4 17.0
  senior secondary 6.9 7.0 7.1
  post secondary 28.0 26.9
  higher professional
  university 5.2 5.4
Owns a house
  yes 62.5 58.3 59.3
  no 37.5 41.7
Religious denomination
  none 37.2 38.4 38.7
  roman catholic 33.7 32.1 31.9
  protestant 22.3 21.1
  islam 1.6 2.7
  other 5.3 5.7
Employment situation
  works 12 hours or more 55.2 54.0 53.9
  works less than 12 hours 6.1 5.8
  does not work 40.2 40.3

25 Example – Sample selection model
Applied to overall response for the target variables PC in household, owns a house, has a newspaper subscription, and is active in a club. The sample selection model then reduces to the bivariate model with sample selection. Hence, no distinction is made between different types of response. The selection equation models the relationship between the auxiliary variables and response. The outcome equation models the relationship between the auxiliary variables and the different target variables.

26 Example – Sample selection model
Variable                   Wald χ2
Region                      164.79
Degree of urbanization       16.28
Having a listed phone       242.04
Average housevalue           25.29
Ethnic background            93.65
Type of household            23.61
Size of household            52.58
Marital status               74.12
Has a social allowance        8.62
Has a job                    23.70
Age (3 categories)           10.97
Gender                       14.72

pseudo R2 0.042
χ2
df 40

27 Example – Sample selection model
Columns: Variable, Category, β
Intercept
Region (Reference Greenfields)
  Woodlands 0.0109
  Lowlands **
  Highlands
  Metropolis **
Urbanization (Reference very strong)
  Strong
  Fairly
  Little
  Not 0.0825
Listed phone (Reference no)
  Yes 0.4595**
Average housevalue (Reference < 50 thousand)
  50 – 75 thousand
  75 – 100 thousand
  100 – 125 thousand
  125 – 150 thousand 0.0965
  150 – 200 thousand 0.1144
  200 – 250 thousand 0.1478
  250 – 300 thousand 0.1749
  300 – 350 thousand 0.1860
  350 – 400 thousand 0.0938
  400 – 450 thousand 0.1217
  450 – 500 thousand 0.0849
  > 500 thousand 0.2209
Ethnic background (Reference native)
  First generation non-western **
  First generation western **
  Second generation non-western
  Second generation western 0.0056
Type of household (Reference single)
  Couple without children
  Couple with children
  Single parent
  Other *
Size of household (Reference 1)
  2 0.3415
  3 0.4114
  4 0.5972
  5 or more 0.7391*
Marital status (Reference not married)
  Married 0.3229**
  Widowed 0.0867
  Divorced 0.1763**
Has social allowance (Reference no)
  Yes **
Has a job
  0.1408**
Age (Reference 18 – 34 years)
  35 – 54 years **
  55 years and older *
Gender (Reference male)
  Female 0.0932**

28 Example – Sample selection model
The models for the target variables consist of the five variables that have the strongest bivariate relationships with the target variables. Hence, for each target variable a different model is used.
Target variable: Model
PC in household: age in 13 categories, age in 3 categories, size of household, type of household, has a job
Has newspaper subscription: average house value, age in 13 categories, percentage of non-western non-natives in neighborhood, percentage of non-natives in neighborhood, has a listed phone
Is active in a club: percentage of non-western non-natives in neighborhood, average house value, ethnic background, has a listed phone, degree of urbanization
Owns a house: average house value, percentage of non-western non-natives in neighborhood, percentage of non-natives in neighborhood, type of household, size of household

29 Example – Sample selection model
Columns: Target variable, Category, Response mean, Sample selection model, GREG. Rows with fewer than three values lost entries in extraction; the surviving values are listed in source order.
PC in household
  yes 57.8 55.0 55.3
  no 42.6 45.0 44.7
Has newspaper subscription
  yes 65.6 62.8
  no 34.4 37.2
Is active in a club
  yes 44.4 42.8
  no 55.6 57.2
Owns a house
  yes 62.5 58.5 59.3
  no 37.5 41.5 41.7
Compared to the response mean, the sample selection estimates for the ‘yes’ categories of the target variables are lower. The same holds for the GREG estimates. The adjustment made to the estimated percentage of persons that are active in a club is smaller than the adjustments made for the other target variables.

30 Example – Sample selection model
The sample selection model allows for a correlation between the selection equation (response) and the outcome equation (target variable). A zero correlation implies that there is no selection bias due to nonresponse. There is a negative correlation between the four bivariate target variables in the GPS and response. This shows in the downward adjustment of the values for the target variables compared to the response means.
Columns: Target variable, [correlation; column header lost in extraction], χ2-value, p-value. Rows with fewer than three values lost entries in extraction.
PC in household: -0.8 186.8 0.0000
Has newspaper subscription: 113.2
Is active in a club: -0.4 10.6 0.0011
Owns a house: 213.7

