The European Statistical Training Programme (ESTP)
Chapter 8: Weighting adjustment
Handbook: chapter 8
- What is weighting adjustment?
- Post-stratification
- Linear weighting
- Multiplicative weighting
- Calibration estimation
- Other weighting issues
- An example
Introduction
What is weighting adjustment?
- Assignment of weights to observed (responding) persons.
- Use of weighted values to compute estimates.
Why weighting?
- Reducing the bias due to nonresponse.
- Increasing the precision of estimates (decreasing the variance).
Required ingredients: auxiliary variables
- Usually categorical variables.
- Strongly correlated with the target variables of the survey.
- Individual values are measured in the survey.
- Distribution in the population (or full sample) must be available.
Introduction
Principle
- Make the response representative with respect to the auxiliary variables.
- If the auxiliary variables are correlated with the target variable, the weighted response will also be representative with respect to the target variable.
- Use as many auxiliary variables as possible.
Auxiliary variables
- Usually only a limited number is available (statistical institute).
- Examples: age, gender, marital status, region.
- These are not the most effective ones.
- Statistics Netherlands has many more auxiliary variables in the Social Statistical Database (SSD).
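Introduction
Adjustment weighting to correct for unit nonresponse
Use of auxiliary information
- A set of variables that have been measured in the survey and for which the population distribution is available.
Calculate adjustment weights
- Example: with inclusion weights $c_i = 1 / \pi_i$, the Horvitz-Thompson estimator of the population mean is
  $$\bar{y}_{HT} = \frac{1}{N} \sum_{i=1}^{n} c_i y_i .$$
- With adjustment weights the estimator is written as
  $$\bar{y}_{W} = \frac{1}{N} \sum_{i=1}^{n} w_i y_i ,$$
  where $w_i = c_i d_i$ is the inclusion weight $c_i$ times a correction weight $d_i$.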
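Post-stratification
Suppose auxiliary variable X has L categories. It divides the population U into L strata $U_1, U_2, \ldots, U_L$.
The number of elements in stratum $U_h$ is denoted by $N_h$ for $h = 1, 2, \ldots, L$, so $N = N_1 + N_2 + \ldots + N_L$.
The sample consists of n elements that can be divided into the same strata, so $n = n_1 + n_2 + \ldots + n_L$.
In case of simple random sampling without replacement, with inclusion probabilities $\pi_i = n / N$ (inclusion weights $c_i = N / n$), each element in stratum h gets the correction weight $d_i = (N_h / N) / (n_h / n)$, and the post-stratification estimator becomes
$$\bar{y}_{PS} = \frac{1}{N} \sum_{h=1}^{L} N_h \bar{y}^{(h)} ,$$
where $\bar{y}^{(h)}$ is the sample mean in stratum h.
The estimator is equal to a weighted sum of sample stratum means.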
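Post-stratification
In case of nonresponse the post-stratification estimator becomes
$$\bar{y}_{PS,R} = \frac{1}{N} \sum_{h=1}^{L} N_h \bar{y}_R^{(h)} ,$$
where $\bar{y}_R^{(h)}$ is the mean of the responding elements in stratum h.
The bias of this estimator is equal to
$$B(\bar{y}_{PS,R}) = \frac{1}{N} \sum_{h=1}^{L} N_h B(\bar{y}_R^{(h)}) .$$
This can also be written as
$$B(\bar{y}_{PS,R}) \approx \frac{1}{N} \sum_{h=1}^{L} N_h \frac{R_{\rho Y}^{(h)} S_{\rho}^{(h)} S_{Y}^{(h)}}{\bar{\rho}^{(h)}} ,$$
where, within stratum h, $R_{\rho Y}^{(h)}$ is the correlation between the response probabilities and the target variable, $S_{\rho}^{(h)}$ and $S_{Y}^{(h)}$ are their standard deviations, and $\bar{\rho}^{(h)}$ is the mean response probability.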
Post-stratification
A closer look at the bias:
- The bias is small if the biases within the strata are small.
- This is the case when, within strata,
  - there is little or no relationship between the target variable and the response behaviour,
  - all response probabilities are more or less equal, or
  - all values of the target variable are more or less equal.
- Therefore, it is important to construct homogeneous strata.
Post-stratification
Example: Sex × Age
Weight of a young female: 0.209 / 0.150 = 1.393

Population
            Male   Female   Total
Young        226      209     435
Middle       152      144     296
Elderly      133      136     269
Total        511      489    1000

Sample
            Male   Female   Total
Young         23       15      38
Middle        16       17      33
Elderly       13       16      29
Total         52       48     100

Weights
            Male   Female
Young      0.983    1.393
Middle     0.950    0.847
Elderly    1.023    0.850
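The weights in this example can be reproduced with a few lines of Python. This is a minimal sketch, assuming simple random sampling and using the population and sample counts from the Sex × Age tables; the function name `poststratification_weights` is a hypothetical helper, not part of any library.

```python
# Counts from the Sex x Age example.
# Rows: Young, Middle, Elderly; columns: Male, Female.
population = [[226, 209], [152, 144], [133, 136]]
sample = [[23, 15], [16, 17], [13, 16]]


def poststratification_weights(population, sample):
    """Return the correction weight (N_h / N) / (n_h / n) for every stratum."""
    N = sum(map(sum, population))
    n = sum(map(sum, sample))
    weights = []
    for pop_row, sam_row in zip(population, sample):
        row = []
        for N_h, n_h in zip(pop_row, sam_row):
            if n_h == 0:
                # Post-stratification breaks down for empty sample strata.
                raise ValueError("empty stratum in the sample")
            row.append((N_h / N) / (n_h / n))
        weights.append(row)
    return weights


weights = poststratification_weights(population, sample)
for label, row in zip(["Young", "Middle", "Elderly"], weights):
    print(label, [round(w, 3) for w in row])
```

The explicit check for empty strata reflects the practical limitation of post-stratification: a stratum without observations cannot be weighted up.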
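Linear weighting
Why not post-stratification?
- Many auxiliary variables: too few (or no) observations per stratum.
- Lack of sufficient population information.
Solution
- Linear or multiplicative weighting.

Population (only the marginal distributions are known)
            Male   Female   Total
Young          ?        ?     435
Middle         ?        ?     296
Elderly        ?        ?     269
Total        511      489    1000

Sample
            Male   Female   Total
Young         23       15      38
Middle        16       17      33
Elderly       13       16      29
Total         52       48     100

Weights: cannot be computed per cell, because the population cell counts are unknown.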
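Linear weighting
Generalised regression estimator, full response
$$\bar{y}_{GR} = \bar{y} + (\bar{X} - \bar{x})' b$$
with
- $\bar{X}$ the vector of population means of the auxiliary variables,
- $\bar{x}$ the vector of sample means of the auxiliary variables,
- $b$ the vector of regression coefficients:
  $$b = \left( \sum_{i=1}^{n} x_i x_i' \right)^{-1} \sum_{i=1}^{n} x_i y_i .$$
The estimator is asymptotically design unbiased (ADU).
Variance (simple random sampling, with $f = n / N$ and $S_E^2$ the population variance of the regression residuals):
$$V(\bar{y}_{GR}) \approx \frac{1 - f}{n} S_E^2 .$$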
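Linear weighting
Generalised regression estimator, nonresponse
$$\bar{y}_{GR,R} = \bar{y}_R + (\bar{X} - \bar{x}_R)' b_R ,$$
where $\bar{y}_R$, $\bar{x}_R$ and $b_R$ are computed from the responding elements only.
Bias:
$$B(\bar{y}_{GR,R}) \approx \bar{X}' B_R - \bar{Y}$$
with
$$B_R = \left( \sum_{k=1}^{N} \rho_k X_k X_k' \right)^{-1} \sum_{k=1}^{N} \rho_k X_k Y_k ,$$
where $\rho_k$ is the response probability of element k.
The bias is small if
- the residuals are small, i.e. the regression model fits well;
- there is little correlation between the residuals and the response behaviour (MAR).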
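Linear weighting
Generalised regression estimator = weighting
The estimator can be rewritten (under general conditions) as
$$\bar{y}_{GR} = \bar{X}' b .$$
Consequently
$$\bar{y}_{GR} = \frac{1}{n} \sum_{i=1}^{n} w_i y_i$$
with $w_i = v' x_i$ and $v$ a vector of weight coefficients:
$$v = n \left( \sum_{i=1}^{n} x_i x_i' \right)^{-1} \bar{X} .$$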
Linear weighting
Post-stratification as a special case
- Replace each qualitative auxiliary variable by a set of dummy variables.
- Use these dummy variables in the regression model.
Example: one qualitative variable with L categories (strata)
- Introduce L dummy variables $X_1, X_2, \ldots, X_L$.
- $X_h = 1$ for an observation in stratum h, and 0 otherwise.
- Vector of population means: $\bar{X} = (N_1 / N, N_2 / N, \ldots, N_L / N)'$
- Vector of weight coefficients: $v = \frac{n}{N} (N_1 / n_1, N_2 / n_2, \ldots, N_L / n_L)'$
Linear weighting
Post-stratification: example
Two auxiliary variables, fully crossed: Sex × AgeClass
Weight for young female = 1.393

Sex       AgeClass       X1      X2      X3      X4      X5      X6
Male      Young           1       0       0       0       0       0
Male      Middle          0       1       0       0       0       0
Male      Elderly         0       0       1       0       0       0
Female    Young           0       0       0       1       0       0
Female    Middle          0       0       0       0       1       0
Female    Elderly         0       0       0       0       0       1
Population means      0.226   0.152   0.133   0.209   0.144   0.136
Weight coefficients   0.983   0.950   1.023   1.393   0.847   0.850
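Linear weighting
Use only marginal distributions
Two auxiliary variables: Sex + AgeClass
X1 is the constant term, X2 and X3 are the dummies for Male and Female, X4 to X6 the dummies for Young, Middle and Elderly.
Weight for young female = 0.991 + 0.033 + 0.161 = 1.185

Sex       AgeClass       X1      X2      X3      X4      X5      X6
Male      Young           1       1       0       1       0       0
Male      Middle          1       1       0       0       1       0
Male      Elderly         1       1       0       0       0       1
Female    Young           1       0       1       1       0       0
Female    Middle          1       0       1       0       1       0
Female    Elderly         1       0       1       0       0       1
Population means      1.000   0.511   0.489   0.435   0.296   0.269
Weight coefficients   0.991  -0.033   0.033   0.161  -0.095  -0.066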
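The linear weights for the model Sex + AgeClass can be checked numerically. The sketch below assumes the weight formula $v = n (\sum x_i x_i')^{-1} \bar{X}$ and the sample counts of the Sex × AgeClass example (23, 15, 16, 17, 13, 16). Because the dummy columns are linearly dependent, the matrix is singular; the code therefore solves the system with a least-squares routine. The resulting coefficients can differ from the ones on the slide, which correspond to a different identifying constraint, but the weights per person are the same.

```python
import numpy as np

# Sample counts per cell (rows: Young, Middle, Elderly; columns: Male, Female).
counts = np.array([[23, 15], [16, 17], [13, 16]], dtype=float)
n = counts.sum()


def design_row(age, sex):
    """Dummy coding [constant, male, female, young, middle, elderly]."""
    row = np.zeros(6)
    row[0] = 1.0
    row[1 + sex] = 1.0
    row[3 + age] = 1.0
    return row


cells = [(age, sex) for age in range(3) for sex in range(2)]
X = np.array([design_row(age, sex) for age, sex in cells])
n_cell = np.array([counts[age, sex] for age, sex in cells])

# Population means of the six columns (marginal distributions only).
pop_means = np.array([1.000, 0.511, 0.489, 0.435, 0.296, 0.269])

# Sum of x_i x_i' over the sample, accumulated cell by cell.
xtx = (X * n_cell[:, None]).T @ X

# xtx is singular, so take a minimum-norm solution of xtx @ u = pop_means.
u, _, _, _ = np.linalg.lstsq(xtx, pop_means, rcond=1e-10)
v = n * u

# One weight per cell; every person in the cell gets this weight.
weights = X @ v
labels = ["Young", "Middle", "Elderly"]
for (age, sex), w in zip(cells, weights):
    print(labels[age], "Male" if sex == 0 else "Female", round(float(w), 3))
```

The weighted sample means of all six columns equal the population means, which is the defining property of linear weighting.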
Linear weighting
Many possible weighting models with more than two variables
For example: three variables Sex, AgeClass, MarStat
Models:
- Sex × AgeClass × MarStat
- (Sex × AgeClass) + (AgeClass × MarStat) + (Sex × MarStat)
- Sex + AgeClass + MarStat
- And many more, for example (Sex × MarStat) + AgeClass
Linear weighting
Qualitative and quantitative auxiliary variables
Examples: Age, Age + Sex, Age × Sex
Age Sex X1 X2 X3 X4 X5 X6
65 Male 1 36 73 Female 6 33 82 2 32 66
Population means 1.000 34.369 0.511 0.489 33.509 35.268
Weight coefficients 1 1.101 -0.003
Weight coefficients 2 -0.032 0.032
Weight coefficients 3 1.087 -0.001 -0.004
Multiplicative weighting
- Alternative for linear weighting.
- Qualitative auxiliary variables only.
- Difference: the weight is not a linear combination of weight coefficients but a product of weight factors.
Iterative process
Weight model: (A × B × C) + (D × E) + (F × G × H)
Step 1: Introduce a weight factor for each stratum in each cross-classification term. Set all factors to 1.
Step 2: Adjust the factors for term 1, so that the weighted sample distribution equals the population distribution for the variables involved.
Step 3: Adjust the factors for the next term. This may disturb the adjustment for previous terms.
Step 4: Do this for all terms in the model.
Step 5: Repeat steps 2-4 until the factors no longer change.
Multiplicative weighting
Example: Sex + AgeClass
Starting situation
Weight for young female = 1.000 × 1.000 = 1.000

                 Male   Female   Weight factor   Weighted sum   Population distribution
Young           0.230    0.150           1.000          0.380                     0.435
Middle          0.160    0.170           1.000          0.330                     0.296
Elderly         0.130    0.160           1.000          0.290                     0.269
Weight factor   1.000    1.000
Weighted sum    0.520    0.480
Popul. distr.   0.511    0.489
Multiplicative weighting
Example: Sex + AgeClass
Step 1: adjustment for Age
Weight for young female = 1.000 × 1.145 = 1.145

                 Male   Female   Weight factor   Weighted sum   Population distribution
Young           0.230    0.150           1.145          0.435                     0.435
Middle          0.160    0.170           0.897          0.296                     0.296
Elderly         0.130    0.160           0.928          0.269                     0.269
Weight factor   1.000    1.000
Weighted sum    0.527    0.473
Popul. distr.   0.511    0.489
Multiplicative weighting
Example: Sex + AgeClass
Step 2: adjustment for Sex
Weight for young female = 1.035 × 1.145 = 1.185

                 Male   Female   Weight factor   Weighted sum   Population distribution
Young           0.230    0.150           1.145          0.433                     0.435
Middle          0.160    0.170           0.897          0.297                     0.296
Elderly         0.130    0.160           0.928          0.270                     0.269
Weight factor   0.969    1.035
Weighted sum    0.511    0.489
Popul. distr.   0.511    0.489
Multiplicative weighting
Example: Sex + AgeClass
Final situation after convergence
Weight for young female = 1.151 × 1.035 = 1.191

                 Male   Female   Weight factor   Weighted sum   Population distribution
Young           0.230    0.150           1.151          0.435                     0.435
Middle          0.160    0.170           0.895          0.296                     0.296
Elderly         0.130    0.160           0.923          0.269                     0.269
Weight factor   0.968    1.035
Weighted sum    0.511    0.489
Popul. distr.   0.511    0.489
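The iteration for the model Sex + AgeClass can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation used for the slides: the function name `rake`, the tolerance and the iteration cap are hypothetical choices, and the input is the table of sample proportions from the example.

```python
# Sample proportions (rows: Young, Middle, Elderly; columns: Male, Female).
sample = [[0.230, 0.150], [0.160, 0.170], [0.130, 0.160]]
pop_age = [0.435, 0.296, 0.269]  # population distribution of AgeClass
pop_sex = [0.511, 0.489]         # population distribution of Sex


def rake(sample, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Alternately adjust row and column factors until they stop changing."""
    row_f = [1.0] * len(row_targets)
    col_f = [1.0] * len(col_targets)
    for _ in range(max_iter):
        previous = row_f + col_f
        # Adjust the age factors so the weighted row sums match pop_age.
        for r, target in enumerate(row_targets):
            weighted = sum(sample[r][c] * col_f[c] for c in range(len(col_f)))
            row_f[r] = target / weighted
        # Adjust the sex factors; this may disturb the row sums slightly.
        for c, target in enumerate(col_targets):
            weighted = sum(sample[r][c] * row_f[r] for r in range(len(row_f)))
            col_f[c] = target / weighted
        change = max(abs(a - b) for a, b in zip(previous, row_f + col_f))
        if change < tol:
            break
    return row_f, col_f


age_factors, sex_factors = rake(sample, pop_age, pop_sex)
weight_young_female = age_factors[0] * sex_factors[1]
print([round(f, 3) for f in age_factors], [round(f, 3) for f in sex_factors])
print(round(weight_young_female, 3))
```

Only the product of the factors is uniquely determined; the individual factors depend on the starting values and on the order in which the terms are adjusted.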
Linear or multiplicative weighting?
Advantages of linear weighting:
- Linear weighting is based on a regression model.
- Analytic formula for the variances of estimators.
- Both qualitative and quantitative auxiliary variables can be used.
Disadvantage of linear weighting:
- Resulting weights may be negative.
However:
- Both weighting methods produce estimates that are asymptotically equal.
Calibration estimation
How to compare different weighting methods?
- Calibration estimation offers a general framework for weighting methods.
- General formulas for properties of methods, such as asymptotic distributions.
- Linear and multiplicative weighting are special cases.
Idea: calibrate on known auxiliary characteristics while changing the original (inclusion) weights as little as necessary.
Ingredients
- A distance measure D.
- Calibration of auxiliary variables X.
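Calibration estimation
Strategy
Minimize
$$\sum_{i=1}^{n} D(w_i, c_i)$$
under the constraint
$$\frac{1}{N} \sum_{i=1}^{n} w_i x_i = \bar{X} .$$
Examples
- Linear weighting: $D(w_i, c_i) = (w_i - c_i)^2 / c_i$
- Multiplicative weighting: $D(w_i, c_i) = w_i \ln(w_i / c_i) - w_i + c_i$
Without nonresponse both methods have the same asymptotic properties.
With nonresponse the effectiveness depends on the validity of the underlying model.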
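For the linear case the minimisation has a closed-form solution. The derivation below is a sketch under assumptions that the slides do not spell out: the distance is taken to be the chi-square distance $(w_i - c_i)^2 / c_i$, which is the usual choice in the calibration literature, $c_i$ are the inclusion weights, the constraint is written in means, and $\lambda$ is a vector of Lagrange multipliers introduced here for illustration.

```latex
% Lagrangian for the chi-square distance (assumed form of D)
\mathcal{L}(w, \lambda)
  = \sum_{i=1}^{n} \frac{(w_i - c_i)^2}{c_i}
    - 2 \lambda' \left( \sum_{i=1}^{n} w_i x_i - N \bar{X} \right)

% First-order condition for each weight
\frac{\partial \mathcal{L}}{\partial w_i}
  = \frac{2 (w_i - c_i)}{c_i} - 2 \lambda' x_i = 0
  \quad \Longrightarrow \quad
  w_i = c_i \left( 1 + \lambda' x_i \right)

% Substituting into the constraint gives the multipliers
\sum_{i=1}^{n} c_i x_i + \left( \sum_{i=1}^{n} c_i x_i x_i' \right) \lambda = N \bar{X}
  \quad \Longrightarrow \quad
  \lambda = \left( \sum_{i=1}^{n} c_i x_i x_i' \right)^{-1}
            \left( N \bar{X} - \sum_{i=1}^{n} c_i x_i \right)
```

Each calibrated weight is the inclusion weight times a linear function of $x_i$, which is why this distance corresponds to linear weighting. It also shows where negative weights come from: nothing prevents $1 + \lambda' x_i$ from dropping below zero.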
Other weighting issues
Consistency between person and household weights
- A survey may be used to make statistics about persons as well as about households.
- The person weights need not sum up to the household weights.
- Two sets of weights are impractical.
Unsatisfactory solutions
- The household weight equals the weight of a reference person or a randomly selected person in the household.
- Use the average person weight for the household.
A better approach: generalised regression estimation
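Other weighting issues
Generalised regression estimation
- Auxiliary information at person level: matrix X, with one row per person (1, 2, 3, 4, ...) and one column per category (0-20, 20-60, > 60, Male, Female).
- Household membership: matrix H, with one row per person and one column per household (H1, H2, H3, ...).
- Auxiliary information at household level: Z = H'X, with one row per household (1, 2, ...) and the same columns as X.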
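The aggregation Z = H'X is a single matrix product. The sketch below uses made-up data, since the cell values of the original matrices are not available: four hypothetical persons in two hypothetical households, with columns for the age classes 0-20, 20-60, > 60 and for Male, Female.

```python
import numpy as np

# Hypothetical person-level auxiliary information X.
# Columns: age 0-20, age 20-60, age > 60, male, female.
X = np.array([
    [0, 1, 0, 1, 0],  # person 1: 20-60, male
    [0, 1, 0, 0, 1],  # person 2: 20-60, female
    [1, 0, 0, 0, 1],  # person 3: 0-20, female
    [0, 0, 1, 1, 0],  # person 4: > 60, male
])

# Hypothetical household membership H: H[i, j] = 1 if person i
# belongs to household j. Persons 1-3 form household 1, person 4 household 2.
H = np.array([
    [1, 0],
    [1, 0],
    [1, 0],
    [0, 1],
])

# Household-level auxiliary information: counts of members per category.
Z = H.T @ X
print(Z)
```

Each row of Z counts the household members in every category, so weighting households on Z calibrates on the same person-level totals as weighting persons on X.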
A practical example
Selection of a weighting model
- Identification of auxiliary variables
- Collection of population totals
- Selection of weighting variables
- Weighting
Example, step 1
Available auxiliary variables: sex, age, marital status, province of residence and degree of urbanisation.
A practical example
Collection of population totals
- Ideal: complete crossing of all auxiliary variables.
- In practice, totals for the complete crossing are often not available.
- Empty cells.
Example, step 2
Available: Age × Sex × Marital status, Age × Province and Age × Degree of urbanisation

         Male                                       Female
Age      Unmarried  Married  Widowed  Divorced      Unmarried  Married  Widowed  Divorced
12-19        752.4      0.4      0.0                    716.5      3.5
20-29        981.5    185.7      0.2      10.2          785.0    330.6      0.7      22.7
30-39        445.4    795.1      1.9      72.1          283.5    879.3      5.7      93.8
40-49        164.7    899.0      6.9     113.9          103.1    882.9     21.5     138.4
50-59         67.3    732.9     15.8      86.3           44.4    675.9     56.1      98.8
60-69         42.0    519.2     31.7      42.6           41.4    458.9    140.0      51.5
70-79         21.4    308.4     52.5      16.6           43.0    239.9    254.3      27.9
80+            8.0     84.0     50.4       4.0           35.0     49.6    243.9      12.4
A practical example
How to select weighting variables?
Criteria
- Relation to nonresponse
- Relation to key survey topics
- Variables that are used as marginal variables in statistical tables
Relation to nonresponse
- Compute contingency tables (and possibly test for independence).
- Build a model for nonresponse using the auxiliary variables.
Relation to key survey topics
- Select survey variables that cover the range of topics in the survey.
- Compute contingency tables.
- Build models for the survey variables using the auxiliary variables.
Take the union of the sets of auxiliary variables in the models.
A practical example
Example, step 3: response and population distributions (%)

Age        Resp    Pop   Diff
12-19      12.8   11.1    1.7
20-29      15.9   17.5   -1.6
30-39      20.5   19.4    1.1
40-49      17.9   17.6    0.3
50-59      14.0   13.4    0.6
60-69      10.0   10.0    0.0
70-79       6.5    7.3   -0.8
80+         2.5    3.7   -1.2

Province   Resp    Pop   Diff
Groningen   2.7    3.5   -0.8
Friesland   4.3    3.9    0.4
Drenthe     2.3    3.0   -0.7
Overijssel  6.8    6.7    0.1
Flevoland   1.8
Gelderland 15.4   12.1    3.3
Utrecht     5.4    6.9   -1.5
N-Holland  14.0   16.1   -2.1
Z-Holland  18.0   21.5   -3.5
Zeeland            2.4
N-Brabant  17.6   14.8    2.8
Limburg     9.1    7.4    1.7
A practical example
Example, step 3 (continued): response and population distributions (%)

Mar. status    Resp    Pop   Diff
Unmarried      32.7   34.2   -1.5
Married        57.2   53.2    4.0
Widowed         5.2    6.0   -0.8
Divorced        4.9    6.7   -1.8

Urbanisation   Resp    Pop   Diff
Very strong    11.8   18.0   -6.2
Strong         24.0   23.8    0.2
Moderate       23.2   20.5    2.7
Little         23.3   21.1    2.2
Non            17.7   16.5    1.2

Sex            Resp    Pop   Diff
Male           48.6   49.1   -0.5
Female         51.4   50.9    0.5
A practical example
Computation of weights
- Construct weighting models with candidate auxiliary variables as ingredients for a number of key survey topics.
- Control the variance of the estimates.
Example, step 4

No.  Weighting model                                          Parameters  Estimate  Standard error
1    No weighting                                                             43.4             1.2
2    Sex                                                               2
3    Prov                                                             12      43.3
4    MarStat                                                           4      42.9
5    Urban                                                             5                       1.0
6    Age8                                                              8      42.8
7    Age5 × Province                                                  60
8    (Sex × Age8) + (Sex × MarStat)                                   22      42.3             1.1
9    Age5 × Urban                                                     25      42.5
10   Sex + Age8 + MarStat + Urban + Province                          23      42.1
11   (Sex × Age8) + (Sex × MarStat) + (Age5 × Urban) + Province       53      42.0             0.9