Published by Lorraine Bruce. Modified over 5 years ago.
1
A bootstrap method for estimators based on combined administrative and survey data
Sander Scholtus (Statistics Netherlands) NTTS Conference 13 March 2019
2
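Dutch Virtual Census
[Diagram: availability of education level ($y$) next to the background variables ($x$), with the number of cases in the 2011 Census]
- 0–14 year olds: ±2.9 million
- $U_1$, education level from admin. data: ±6.5 million
- $s_2$, education level from the LFS: ±0.34 million
- $U_2$, education level missing ("?"): ±6.9 million
- Total: ±16.7 million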
3
Dutch Virtual Census
Goal: estimate tables of frequencies involving education level
Typical element: $\theta_{hc} = \sum_{i \in U} h_i y_{ci}$, with $h_i \in \{0,1\}$, $y_{ci} \in \{0,1\}$, $c \in \{1,\dots,C\}$
[Table layout: rows $h = 1,\dots,H$ for the other variables; columns for educational attainment ($c$), level $1, \dots,$ level $c, \dots,$ level $C$; cell $(h,c)$ contains $\theta_{hc}$]
4
Dutch Virtual Census
Goal: estimate tables of frequencies involving education level
Typical element: $\theta_{hc} = \sum_{i \in U} h_i y_{ci}$, with $h_i \in \{0,1\}$, $y_{ci} \in \{0,1\}$, $c \in \{1,\dots,C\}$
Proposal (De Waal and Daalmans, 2018): use mass imputation
- Estimate a model (e.g., logistic regression) for $y_1,\dots,y_C$ based on $s_2$
- For each $i \in U_2 \setminus s_2$, impute predictions $\hat{y}_{1i},\dots,\hat{y}_{Ci}$ based on this model
Estimator for $\theta_{hc}$:
$\hat{\theta}_{hc} = \theta_{hc1} + \hat{\theta}_{hc2} = \sum_{i \in U_1} h_i y_{ci} + \sum_{i \in s_2} h_i y_{ci} + \sum_{i \in U_2 \setminus s_2} h_i \hat{y}_{ci}$
Question: how to evaluate the variance of $\hat{\theta}_{hc}$?
- Analytical approximation: possible but cumbersome (Scholtus, 2018)
- Bootstrap procedure
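As a toy illustration of the mass-imputation estimator on this slide, the sketch below adds up the three contributions for a single cell $(h, c)$. The data values and the function name `mass_imputation_estimate` are hypothetical, and the imputed predictions are assumed to be predicted probabilities from some already fitted model.

```python
def mass_imputation_estimate(u1, s2, u2_rest):
    """Estimate theta_hc as the sum of three contributions.

    u1      : (h_i, y_ci) pairs observed in the admin. data (U_1)
    s2      : (h_i, y_ci) pairs observed in the survey (s_2)
    u2_rest : (h_i, yhat_ci) pairs for U_2 minus s_2, where yhat_ci is an
              imputed prediction (assumed here: a predicted probability)
    """
    theta_hc1 = sum(h * y for h, y in u1)            # fixed admin. part
    observed = sum(h * y for h, y in s2)             # observed survey part
    imputed = sum(h * yhat for h, yhat in u2_rest)   # mass-imputed part
    return theta_hc1 + observed + imputed


# Hypothetical toy data for one cell (h, c)
u1 = [(1, 1), (1, 0), (0, 1)]                # contributes 1
s2 = [(1, 1), (0, 0)]                        # contributes 1
u2_rest = [(1, 0.25), (1, 0.5), (0, 0.75)]   # contributes 0.75
theta_hat = mass_imputation_estimate(u1, s2, u2_rest)
print(theta_hat)  # 2.75
```

Only the last two contributions depend on the sample, which is why the variance question concerns $\hat{\theta}_{hc2}$.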
5
General set-up
Target population $U = U_1 \cup U_2$
Subpopulation $U_1$:
- Admin. data available
- Considered fixed (no variance)
Probability sample $s$:
- May have overlap with admin. data
- $s_1 = s \cap U_1$; $s_2 = s \cap U_2$
- Inclusion probabilities $\pi_i$ known
Estimator of interest: $\hat{\theta} = t(s, U_1)$
[Diagram: population $U = U_1 \cup U_2$, with $U_2$ marked "?"]
6
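Bootstrap
Classical bootstrap (Efron, 1979):
Estimator $\hat{\theta} = t(s)$, with $s$ an i.i.d. random sample of size $n$ from a distribution $F$
Resampling:
- Draw a with-replacement sample $s_b^*$ of size $n$ from $s$
- Compute $\hat{\theta}_b^* = t(s_b^*)$
- Repeat this a large number of times ($b = 1,2,\dots,B$)
Bootstrap estimator for the variance of $\hat{\theta}$:
$\mathrm{var}_{\mathrm{boot}}(\hat{\theta}) = \frac{1}{B-1} \sum_{b=1}^{B} \left( \hat{\theta}_b^* - \bar{\theta}^* \right)^2$, with $\bar{\theta}^* = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}_b^*$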
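The resampling steps of the classical bootstrap can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `bootstrap_variance`, the seeds, and the simulated data are assumptions made for the example.

```python
import random
import statistics


def bootstrap_variance(sample, estimator, B=1000, seed=12345):
    """Classical bootstrap variance estimate of estimator(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(B):
        # Draw a with-replacement sample s*_b of size n from s
        resample = rng.choices(sample, k=n)
        # Compute the replicate t(s*_b)
        replicates.append(estimator(resample))
    # statistics.variance uses the 1/(B-1) divisor
    return statistics.variance(replicates)


# Hypothetical i.i.d. sample; the estimator of interest is the sample mean
data_rng = random.Random(1)
sample = [data_rng.gauss(10.0, 2.0) for _ in range(200)]
var_boot = bootstrap_variance(sample, statistics.fmean, B=2000)
```

For the sample mean, the result should be close to the textbook estimate `statistics.variance(sample) / len(sample)`, which gives a quick sanity check.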
7
Bootstrap
Classical bootstrap does not account for:
- Finite-population sampling
- Complex survey design
Different extensions of the bootstrap are available
- Overview: Mashreghi, Haziza and Léger (2016)
Here: extension based on pseudo-populations
- Theory: Booth, Butler and Hall (1994), Chauvet (2007)
- Previous application: Kuijvenhoven and Scholtus (2011)
8
Bootstrap
First assume: $w_i = 1/\pi_i$ is always integer-valued
Bootstrap algorithm:
1. Create a pseudo-population $U^*$ by taking $w_i$ copies of each unit $i \in s$.
2. For each $b = 1,\dots,B$ do the following:
   - Draw sample $s_b^*$ from $U^*$ analogous to the design used to draw $s$ from $U$. If $k \in U^*$ is a copy of $i \in s$ then its inclusion probability is $\pi_k^* \propto \pi_i$.
   - Analogously to $\hat{\theta} = t(s, U_1)$, construct replicate $\hat{\theta}_b^* = t(s_b^*, U_1)$.
3. Compute the variance estimate for $\hat{\theta}$ based on pseudo-population $U^*$ as:
   $\mathrm{var}_{\mathrm{boot}}(\hat{\theta}) = \frac{1}{B-1} \sum_{b=1}^{B} \left( \hat{\theta}_b^* - \bar{\theta}^* \right)^2$, with $\bar{\theta}^* = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}_b^*$.
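The three steps above can be sketched for the simplest possible design. The example assumes simple random sampling without replacement with an integer weight $N/n$, no admin. data, and an estimated population total as the estimator; the function name and all data are hypothetical.

```python
import random
import statistics


def pseudo_population_bootstrap(y_sample, weight, B=1000, seed=2019):
    """Bootstrap variance of the estimated total weight * sum(y).

    Assumes simple random sampling without replacement and an integer
    weight = 1/pi_i = N/n that is the same for every sampled unit.
    """
    rng = random.Random(seed)
    n = len(y_sample)
    # Step 1: pseudo-population U* with `weight` copies of each sampled unit
    pseudo_pop = [y for y in y_sample for _ in range(weight)]
    replicates = []
    for _ in range(B):
        # Step 2a: draw s*_b from U* with the original design (SRS, size n)
        s_star = rng.sample(pseudo_pop, n)
        # Step 2b: apply the original estimator to the resample
        replicates.append(weight * sum(s_star))
    # Step 3: variance over the replicates, with divisor B - 1
    return statistics.variance(replicates)


# Hypothetical sample of n = 100 from a population of N = 500 (weight 5)
data_rng = random.Random(3)
y = [data_rng.gauss(50.0, 10.0) for _ in range(100)]
var_boot = pseudo_population_bootstrap(y, weight=5)
```

Because the resamples are drawn without replacement from a finite pseudo-population, the result reflects the finite-population correction; for this design it should be close to $N^2 (1 - n/N) s^2 / n$.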
9
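Bootstrap
General case: $w_i = 1/\pi_i = \lfloor w_i \rfloor + \varphi_i$, with $\lfloor w_i \rfloor \in \mathbb{N}$, $\varphi_i \in [0,1)$
Bootstrap algorithm:
1. Create a pseudo-population $U^*$ by taking $\delta_i$ copies of each unit $i \in s$.
   Random inflation weight: $\delta_i = \lfloor w_i \rfloor$ with probability $1 - \varphi_i$, $\delta_i = \lfloor w_i \rfloor + 1$ with probability $\varphi_i$.
2. For each $b = 1,\dots,B$ do the following:
   - Draw sample $s_b^*$ from $U^*$ analogous to the design used to draw $s$ from $U$. If $k \in U^*$ is a copy of $i \in s$ then its inclusion probability is $\pi_k^* \propto \pi_i$.
   - Analogously to $\hat{\theta} = t(s, U_1)$, construct replicate $\hat{\theta}_b^* = t(s_b^*, U_1)$.
3. Compute the variance estimate for $\hat{\theta}$ based on pseudo-population $U^*$ as:
   $\mathrm{var}_{\mathrm{boot}}(\hat{\theta}) = \frac{1}{B-1} \sum_{b=1}^{B} \left( \hat{\theta}_b^* - \bar{\theta}^* \right)^2$, with $\bar{\theta}^* = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}_b^*$.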
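The random inflation weight in step 1 is a form of stochastic rounding: each weight is rounded down or up at random so that its expected value equals $1/\pi_i$. A minimal sketch, in which the function names and the inclusion probabilities are hypothetical:

```python
import math
import random


def random_inflation_weight(pi, rng):
    """Round w = 1/pi down with probability 1 - frac, up with probability frac."""
    w = 1.0 / pi
    floor_w = math.floor(w)
    frac = w - floor_w  # fractional part of the weight, in [0, 1)
    return floor_w + 1 if rng.random() < frac else floor_w


def build_pseudo_population(inclusion_probs, rng):
    """Return unit indices, each repeated according to its random weight."""
    pseudo_pop = []
    for i, pi in enumerate(inclusion_probs):
        pseudo_pop.extend([i] * random_inflation_weight(pi, rng))
    return pseudo_pop


rng = random.Random(42)
inclusion_probs = [0.4, 0.4, 0.25, 0.3]  # weights 2.5, 2.5, 4 and 3.33...
pseudo_pop = build_pseudo_population(inclusion_probs, rng)
```

A unit with an integer weight (here the third unit, with weight 4) always receives exactly that many copies, so this procedure reduces to the integer-valued case when all weights are integers.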
10
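Bootstrap
General case: $w_i = 1/\pi_i = \lfloor w_i \rfloor + \varphi_i$, with $\lfloor w_i \rfloor \in \mathbb{N}$, $\varphi_i \in [0,1)$
Bootstrap algorithm:
For each $a = 1,\dots,A$ do the following:
1. Create a pseudo-population $U_a^*$ by taking $\delta_i$ copies of each unit $i \in s$.
   Random inflation weight: $\delta_i = \lfloor w_i \rfloor$ with probability $1 - \varphi_i$, $\delta_i = \lfloor w_i \rfloor + 1$ with probability $\varphi_i$.
2. For each $b = 1,\dots,B$ do the following:
   - Draw sample $s_{ab}^*$ from $U_a^*$ analogous to the design used to draw $s$ from $U$. If $k \in U_a^*$ is a copy of $i \in s$ then its inclusion probability is $\pi_k^* \propto \pi_i$.
   - Analogously to $\hat{\theta} = t(s, U_1)$, construct replicate $\hat{\theta}_{ab}^* = t(s_{ab}^*, U_1)$.
3. Compute the variance estimate for $\hat{\theta}$ based on pseudo-population $U_a^*$ as:
   $v_a(\hat{\theta}) = \frac{1}{B-1} \sum_{b=1}^{B} \left( \hat{\theta}_{ab}^* - \bar{\theta}_a^* \right)^2$, with $\bar{\theta}_a^* = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}_{ab}^*$.
Finally compute: $\mathrm{var}_{\mathrm{boot}}(\hat{\theta}) = \frac{1}{A} \sum_{a=1}^{A} v_a(\hat{\theta})$.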
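Putting the outer loop over pseudo-populations and the inner loop over resamples together gives the following sketch. It assumes simple random sampling without replacement with a common inclusion probability, no admin. data, and an estimated total in which every copy keeps the original weight $1/\pi_i$ of its parent unit; these choices, the function name, and the data are illustrative assumptions.

```python
import math
import random
import statistics


def pseudo_population_variance(y_sample, pi, A=5, B=300, seed=7):
    """Average of A pseudo-population bootstrap variance estimates.

    Assumes SRS without replacement with common inclusion probability pi
    and the estimator total = sum(y_i / pi).
    """
    rng = random.Random(seed)
    n = len(y_sample)
    w = 1.0 / pi
    floor_w = math.floor(w)
    frac = w - floor_w
    v = []
    for _ in range(A):
        # Step 1: pseudo-population U*_a with randomly rounded weights
        pseudo_pop = []
        for y in y_sample:
            copies = floor_w + (1 if rng.random() < frac else 0)
            pseudo_pop.extend([y] * copies)
        # Step 2: B resamples of size n drawn from U*_a by SRS
        replicates = []
        for _ in range(B):
            s_star = rng.sample(pseudo_pop, n)
            replicates.append(w * sum(s_star))
        # Step 3: variance estimate v_a for this pseudo-population
        v.append(statistics.variance(replicates))
    # Final step: average the A variance estimates
    return statistics.fmean(v)


# Hypothetical sample of n = 300 from N = 1000, so the weight is 3.33...
data_rng = random.Random(11)
y = [data_rng.gauss(20.0, 5.0) for _ in range(300)]
var_boot = pseudo_population_variance(y, pi=0.3)
```

Averaging over several pseudo-populations smooths out the extra randomness introduced by the rounding of the weights; with integer weights every pseudo-population is identical and `A=1` suffices.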
11
Bootstrap
Key step: analogously to $\hat{\theta} = t(s, U_1)$, construct replicate $\hat{\theta}_{ab}^* = t(s_{ab}^*, U_1)$
Example: Dutch Virtual Census with mass imputation
Original estimator:
$\hat{\theta}_{hc} = \sum_{i \in U_1} h_i y_{ci} + \sum_{i \in s_2} h_i y_{ci} + \sum_{i \in U_2 \setminus s_2} h_i \hat{y}_{ci}$
Construction of bootstrap replicate $\hat{\theta}_{hc,ab}^*$:
- $U_{2a}^*$ is the subpopulation of $U_a^*$ consisting of copies of units from $s_2$
- $s_{2ab}^* = s_{ab}^* \cap U_{2a}^*$; note: the size of the overlap is random
- Use $s_{2ab}^*$ to re-estimate the imputation model for $y_1,\dots,y_C$
- Impute the missing values of $y_1,\dots,y_C$ in $U_{2a}^* \setminus s_{2ab}^*$
- Compute:
  $\hat{\theta}_{hc,ab}^* = \sum_{i \in U_1} h_i y_{ci} + \sum_{k \in s_{2ab}^*} h_k y_{ck} + \sum_{k \in U_{2a}^* \setminus s_{2ab}^*} h_k \hat{y}_{ck}$
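The construction of one replicate can be sketched as follows. This is an illustration under several assumptions that are not from the slides: simple random sampling with an integer weight (so a single pseudo-population suffices), one binary target $y$, and a simple per-class proportion model standing in for the logistic regression. All names and data are hypothetical.

```python
import random
import statistics


def fit_cell_model(units):
    """Stand-in imputation model: share of y = 1 within each class of x."""
    counts = {}
    for u in units:
        ones, total = counts.get(u["x"], (0, 0))
        counts[u["x"]] = (ones + u["y"], total + 1)
    overall = sum(u["y"] for u in units) / max(len(units), 1)
    shares = {x: ones / total for x, (ones, total) in counts.items()}
    return lambda x: shares.get(x, overall)


def bootstrap_replicate(pseudo_pop, n, theta_u1, rng):
    """Return one replicate and the (random) size of the resampled s_2 part."""
    drawn = set(rng.sample(range(len(pseudo_pop)), n))
    # Resampled units that are copies of s_2 units
    s2_star = [u for k, u in enumerate(pseudo_pop)
               if k in drawn and not u["in_U1"]]
    # Copies of s_2 units that were not resampled: their y is re-imputed
    rest = [u for k, u in enumerate(pseudo_pop)
            if k not in drawn and not u["in_U1"]]
    predict = fit_cell_model(s2_star)  # re-estimate the model
    observed = sum(u["h"] * u["y"] for u in s2_star)
    imputed = sum(u["h"] * predict(u["x"]) for u in rest)
    return theta_u1 + observed + imputed, len(s2_star)


rng = random.Random(5)
sample = []
for _ in range(200):  # hypothetical sample s, partly overlapping with U_1
    x = rng.randint(0, 1)
    sample.append({"in_U1": rng.random() < 0.4, "h": rng.randint(0, 1),
                   "x": x, "y": int(rng.random() < 0.3 + 0.4 * x)})
theta_u1 = 150.0  # hypothetical fixed contribution of the admin. data
pseudo_pop = [u for u in sample for _ in range(4)]  # integer weight 4
results = [bootstrap_replicate(pseudo_pop, len(sample), theta_u1, rng)
           for _ in range(200)]
var_boot = statistics.variance([theta for theta, _ in results])
overlap_sizes = [size for _, size in results]
```

The admin. data contribution `theta_u1` is passed through unchanged in every replicate, reflecting that $U_1$ is considered fixed, while the number of resampled $s_2$ copies in `overlap_sizes` varies from replicate to replicate.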
12
Simulation study
True counts and true standard deviations by age and educational attainment (values given as low / medium / high):
- young (15–35): true counts 330 / 795 / 400; true st. dev. 34.5 / 42.2 / 36.8
- middle (36–55): true counts 115 / 560 / 480; true st. dev. 22.3, 36.1 (one value missing)
- old (56+): true counts 120, 525 (one value missing); true st. dev. 22.8, 35.6 (one value missing)
Set-up:
- Synthetic target population of $N = 3725$ persons
- Simple random sample of size $n = N/5 = 745$; no admin. data
- Mass imputation based on logistic regression: Gender × (Age + Income)
- True standard deviations: estimated by repeating sampling and imputing … times
13
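Simulation study
True counts and true standard deviations by age and educational attainment (values given as low / medium / high):
- young (15–35): true counts 330 / 795 / 400; true st. dev. 34.5 / 42.2 / 36.8
- middle (36–55): true counts 115 / 560 / 480; true st. dev. 22.3, 36.1 (one value missing)
- old (56+): true counts 120, 525 (one value missing); true st. dev. 22.8, 35.6 (one value missing)
Variance estimates:
- Analytical variance approximation (Scholtus, 2018), repeated 100 times
- Bootstrap procedure with $A = 1$, $B = 200$, repeated 100 times
Estimated standard deviations (values given as low / medium / high):
- young (15–35): analytical 32.2 / 39.5 / 34.5; bootstrap 34.1 / 41.9 / 36.4
- middle (36–55): analytical 20.6 / 34.0 / 33.3; bootstrap 22.7 / 36.6 / 36.0
- old (56+): analytical 21.1 / 32.8 / 31.8; bootstrap 22.5, 35.2 (one value missing)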
14
Conclusion
Bootstrap method for estimating the accuracy of statistics based on combined administrative and survey data
- Advantage over analytical variance estimation: flexibility
- Possible disadvantage: computational workload
Future work:
- Simulation study with real Dutch Census data (in progress)
- Extending the method to account for additional sources of uncertainty:
  - Micro-integration of survey and admin. data in the overlapping part
  - Measurement error
  - …
15
References
J.G. Booth, R.W. Butler, and P. Hall (1994), Bootstrap Methods for Finite Populations. Journal of the American Statistical Association 89, 1282–1289.
G. Chauvet (2007), Méthodes de Bootstrap en Population Finie. PhD Thesis (in French), L'Université de Rennes.
T. de Waal and J. Daalmans (2018), Mass Imputation for Census Estimation: Methodology. Report, Statistics Netherlands.
B. Efron (1979), Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7, 1–26.
L. Kuijvenhoven and S. Scholtus (2011), Bootstrapping Combined Estimators based on Register and Sample Survey Data. Discussion Paper, Statistics Netherlands.
Z. Mashreghi, D. Haziza, and C. Léger (2016), A Survey of Bootstrap Methods in Finite Population Sampling. Statistics Surveys 10, 1–52.
S. Scholtus (2018), Variances of Census Tables after Mass Imputation. Discussion Paper, Statistics Netherlands.