Peter Linde, Interviewservice Statistics Denmark

Peter Linde, Interviewservice pli@dst.dk Statistics Denmark
An example on: Non-response adjustment by registers and estimation of variance by replicate weights Peter Linde, Interviewservice Statistics Denmark

Agenda Challenge with non-response
Case: Household Budget Survey How to weight as correctly as possible Case: Weighting Household Budget Survey How everything is done easy with replicate weights Case: Analyzing Household Budget Survey Conclusion

Four statements No statistic is better than its weakest point
The weakest point (the bias) is often one of the three points: - Population, frame and selection - Non-response or - Measurement error in the data collection Not everything that counts is countable - and not everything that is countable counts (Einstein) Every good statistics try to control all the weakest points as good as possible with the know information and resources

Bias and standard error of mean
The bias is “dominating” the standard error Horizontal: Bias (assumed = 3%) Black curve: The max. standard error by SRS. E.g. 9.8% if n=100 Green curve: The total error (square root of the mean square error) Scale horizontal: 100 to 4000 sample units. Vertical: 0-15%

Challenge (1) In the 90s the response rate was around 65-80%
Today the response rate is 50-65% in Denmark In 2000 the tax the income overstated by 4% i a SRS survey before adjusting for non-response To day the tax income is overstated by 7% Both demographic and social factors have a growing and more complex negative impact, e.g. sex and nationality

Challenge (2) Response rate 1992 Response rate 2012 Age
16-29 years 62% 47% 30-39 years 66% 58% 40-49 years 73% 57% 50-64 years 68% 61% 65-74 years 67% 71% Education Basic school 62% 51% Youth education 67% 56% Short further education 85% 73% Medium education 83% 74% Long further education 80% 70%

Challenge (3) Non-response on 52% in Household Budget Survey
From the tax register the taxable income for the entire population (and sample) is known Average household income in the population is DKK before tax. ( euro) In the sample after non-response the average household income is DDK or 7% higher

Challenge (4) When weighting for non-response one wish:
Estimate the mean and parameters correctly Use the correct sample size n, and not the sum of the weights or weights rounded, when calculate the variance Calculate the variance with respect for The price for different weights The benefit of explained variation

Challenge (5) And extended to be used in multivariate analysis
with model-assisted calibration of weights with GREG using all known categorical and continuous variables In standard programs as SAS (proc survey…) and STATA (svy) is this possible

How to weight (1) In the simple case weighting for sex: Men: Women: Total: If rescale with n / N

How to weight (2) When both weights and strata (respondent belong to N1 and N2) are known Two things are important gj is squared (it costs variance) centered on strata average (it wins variance: R**2)

How to weight (3) An indicator for the cost in variance: Example if g1=1.5, g2=0.5 and n=2: ( )/2=1.25 Designeffect from 1.1 to 1.25 is normal compare to g=1 When weights are known and strata are unknown: If e.g. dataproducer not disclose strata or only weights are used

How to weight (4) Calibration with generalised regression (GREG) estimation use information from registers Continuous variables may be involved Traditional post stratification: sex*age*region Different tables separately at the same time: sex*age and sex*nationality Continuous variables and separate tables It is a soft weighting – mode assisted – the weights only sum up to the population tables. If no sample bias compared to population tables (=“model”), no weighting More math in my paper on the internet, which also shows how to get from the general formula to the simple example with stratification for sex

Household Budget Survey (1)
Sample: Simple Random Sample from the hole population. Before non-response therefor only small (random) difference between then sample and the population (weak point 1 ok). Mode: Personal interview in households and diaries Non-response on 52% Register information in frame and sample: Region, age, sex, socio-economic background, years in the house, education, dwelling type, family type, Danish citizen and household income and ….. Measurement variable in questionnaire: Consumption

No weighting (no adjustment for non-response on 52%) Region, age and sex Region, age * sex, socio-economic background, years in the house, education, dwelling type, family type and Danish citizen Region, age * sex, socio-economic background, years in the house, education, dwelling type, family type and Danish citizen and household income in 5 groups Household income Pct. consumption Standard error more than of mean Weighting a (+7%) 46% Weighting b (strata known) (+4%) 44% Weighting c (strata known) (-½%) 39% Weighting d (strata known) % Weighting d (only weights) % Population not known

Bias reduced by up to 7% Design effect when all information is used: Effective sample size increases from 2462 to 2462/0.51 = 4827 compared with SRS If only the weights are used (but not to strata) the bias is still control but Design effect is: Effective sample size decreases from 2462 to 2462/1.19 = 2069 compared with SRS.

Repeated weighting (1) Different methods is used to take a sample in the sample and use the new samples to estimate the initial sample variance. BRR - Balanced Repeated Replicates Bootstrapping Jackknife, e.g. JK1, where e.g. the first 10 units SRS is taken out of a SRS sample on Then the next 10 is taken out and so on until you have G=100 samples with each 990 units.

Repeated weighting (2) In Household Budget Survey the standard error of consuming more than is On the basis of G=80 replicate weights the standard error is estimated in the range of The Jackknife method is able roughly accurately to estimate the standard error at the same level as the GREG estimator In SAS and STATA you just inform about the names of the replicate weights and the method e.g. JK1 and a regression analysis is make in 2-3 minutes.

Conclusion If register information is used to adjust for non-response
and if the data producer estimate the overall weight and replicate weights Researcher today can achieve the complete benefits by including registry variables in weighting for non-response: - least possible bias - greatest possible effective sample - and correct estimation of the variance when weighting Thank you for listening

Peter Linde, Interviewservice Statistics Denmark

Similar presentations

Presentation on theme: "Peter Linde, Interviewservice Statistics Denmark"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Peter Linde, Interviewservice Statistics Denmark

Similar presentations

Presentation on theme: "Peter Linde, Interviewservice Statistics Denmark"— Presentation transcript:

Similar presentations

About project

Feedback