Published by Kristian Gerstle. Modified over 5 years ago.
1
Determining Subsampling Rates for Nonrespondents
Rachel Harter, NORC at the University of Chicago
Traci Mach, Board of Governors of the Federal Reserve
John Wolken, Board of Governors of the Federal Reserve
Janella Chapline, NORC at the University of Chicago
2
The views expressed herein are those of the authors. They do not necessarily reflect the opinions of the Federal Reserve Board or its staff.
3
Overview
Double Sampling
Examples
Methods for Determining Subsampling Rates
Illustrations using Survey of Small Business Finances
Comments
4
Introduction
Double sampling (two-phase or sequential sampling)
Hansen and Hurwitz (1946)
Relatively inexpensive data collection method applied to a larger sample
More expensive follow-up method for a subsample
5
Introduction (cont.)
Subsample phase 1 nonrespondents to:
Increase weighted response rates
Maintain response rates while reducing costs
Reduce nonresponse bias
6
Introduction (cont.)
Subsampling affects:
Variability in weights
Effective sample size
Number of completed cases
Costs
7
Introduction (cont.)
Subsampling rate depends on:
Design stage decision vs. late decision
Design objectives
Assumed/known parameters
Contractual constraints
8
Example 1: American Community Survey
Three modes of data collection: mail → telephone → in-person interview of subsample
Subsampling rates are based on expected completion rates at tract level
Typically subsample 1-in-3 or 2-in-3
9
Example 2: Chicago Health and Social Life Survey
Subsampled nonrespondents when the response rates were lower than expected
Subsampled 1 in 4
10
Example 3: National Survey of Family Growth 2006
Used response propensity models to stratify segments
Subsampling rates varied by stratum to favor segments likely to yield more completed cases for lower cost
11
Example 4: General Social Survey 2004, 2006
Nonrespondents subsampled to boost weighted response rates and control costs
Subsampled 45% as a balance between the unweighted number of completed cases and the weighted response rate
12
General Guidelines for Subsampling Nonrespondents
Kish’s rule of thumb (1965)
For subsampling to be economical, phase 2 data collection should cost at least 10 times as much per case as phase 1 data collection.
13
General Guidelines for Subsampling Nonrespondents
Elliott et al. (2000)
Subsampling saves resources whenever the per-callback or per-interview cost increases with each attempt, or when the probability of a successful interview attempt decreases.
14
Hansen-Hurwitz Method (1946)
Basic strategy
Determine the sample size needed to achieve the desired precision, assuming no nonresponse.
Assume the cost structure and expected response rates for each phase are known.
Solve for the initial sample size n and subsampling rate f that minimize cost subject to the desired precision level.
15
Hansen-Hurwitz Method (cont.)
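Per-unit cost structure
C = c0 n + c1 n1 + c2 n2’
where n is the initial sample size, n1 the number of phase 1 respondents, n2’ the number of nonrespondents subsampled for phase 2, and r1 the phase 1 completion rate.
Optimal subsampling rate
f = sqrt{(c0 + c1 r1) / (c2 r1)}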
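The closed form can be checked numerically. The sketch below is illustrative only: the cost and response-rate values are hypothetical, and the variance term assumes equal unit variances for respondents and nonrespondents, full response in phase 2, and no finite population correction, none of which is stated on the slide. For a fixed precision target, total cost is proportional to the product of per-case cost and per-case relative variance, so minimizing that product over f should recover the formula.

```python
import math


def hansen_hurwitz_rate(c0, c1, c2, r1):
    """Closed-form rate from the slide, capped at 1 (follow up everyone)."""
    return min(1.0, math.sqrt((c0 + c1 * r1) / (c2 * r1)))


def cost_variance_product(f, c0, c1, c2, r1):
    # Expected cost per initial sample case: n1 = n*r1 and n2' = n*(1-r1)*f.
    cost = c0 + c1 * r1 + c2 * (1 - r1) * f
    # Relative variance per initial case. ASSUMPTION: equal unit variances,
    # full phase 2 response, no finite population correction.
    variance = r1 + (1 - r1) / f
    return cost * variance


# Hypothetical inputs, chosen only for illustration.
params = dict(c0=1.0, c1=4.0, c2=20.0, r1=0.6)

f_closed = hansen_hurwitz_rate(**params)
grid = [i / 1000 for i in range(1, 1001)]
f_grid = min(grid, key=lambda f: cost_variance_product(f, **params))

print(f"closed form: {f_closed:.3f}, grid search: {f_grid:.3f}")
```

The two values should agree to within the grid spacing; if the variance assumptions change, the closed form no longer applies as written.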
16
Hansen-Hurwitz Method (cont.)
Drawbacks (Groves 1989)
Takes into account sampling error only.
Completion rate in phase 2 is assumed to be high.
Mode effects between phases are ignored.
No distinction is made between noncontacts and refusals.
Completion rates and cost structures are assumed to be known in advance.
17
Deming Method (1953)
Goal
Minimize cost for a specified mean squared error, or vice versa
Basic Strategy
All sample cases are attempted once
Use variance, nonresponse bias, and cost to determine the number of callback attempts
Subsample for the callback attempts
18
Deming Method (cont.)
Mean squared error of the estimator
MSE = A + B/n + C/(nf)
Cost function
Cost = Dn + Enf
Subsampling rate
f = sqrt{CD/(BE)}
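As a rough check on the rate formula, the sketch below fixes the variable part of the MSE, B/n + C/(nf), at a target value, solves for n, and searches a grid for the f with the lowest cost. The constant A does not depend on n or f, so it drops out. All coefficient values are hypothetical.

```python
import math


def deming_rate(b, c, d, e):
    """f = sqrt(CD / (BE)), capped at 1."""
    return min(1.0, math.sqrt(c * d / (b * e)))


def cost_at_fixed_mse(f, b, c, d, e, target=1.0):
    # B/n + C/(n f) = target  =>  n = (B + C/f) / target
    n = (b + c / f) / target
    return d * n + e * n * f


# Hypothetical coefficients, for illustration only.
params = dict(b=4.0, c=1.0, d=10.0, e=5.0)

f_closed = deming_rate(**params)
grid = [i / 1000 for i in range(1, 1001)]
f_grid = min(grid, key=lambda f: cost_at_fixed_mse(f, **params))

print(f"closed form: {f_closed:.3f}, grid search: {f_grid:.3f}")
```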
19
Deming Method (cont.)
Drawbacks
Estimates of means and variances for each attempt are needed in advance for the MSE function.
Assumes cases are equally likely to respond on each attempt.
20
Elliott-Little-Lewitzky Method (2000)
Allow different response probabilities with each callback attempt and nonzero costs of refusals.
Define the efficiency ratio as the ratio of the cost under the subsampling approach to the cost under the full-callback approach.
Subsampling is effective when the efficiency ratio is less than 1.
21
Elliott-Little-Lewitzky Method (cont.)
Basic strategy
Subsample at the mth callback attempt, out of K total callback attempts.
Find the subsampling rate f that minimizes the efficiency ratio for the mth callback attempt.
Repeat for all values of m up to K.
Determine the values of m and f that minimize the efficiency ratio.
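The search over m and f can be sketched as a simple grid search. The efficiency ratio used here is a simplified stand-in, not the formula from Elliott et al.: it assumes equal unit variances, ignores refusal costs and nonresponse bias, and weights cases completed after subsampling by 1/f. The per-attempt response probabilities and costs are hypothetical.

```python
def efficiency_ratio(m, f, p, c):
    """Cost x variance with subsampling at attempt m, relative to f = 1.

    Simplified stand-in (NOT the published formula): equal unit variances,
    no refusal costs, post-subsampling completes weighted by 1/f.
    """
    remaining = 1.0  # fraction of the sample still unresolved
    cost_before = cost_after = resp_before = resp_after = 0.0
    for k, (p_k, c_k) in enumerate(zip(p, c), start=1):
        if k < m:
            cost_before += remaining * c_k
            resp_before += remaining * p_k
        else:
            cost_after += remaining * c_k
            resp_after += remaining * p_k
        remaining *= 1 - p_k
    cost = cost_before + f * cost_after
    # Sum of squared weights per initial case; the sum of weights is
    # the same for every f, so it cancels in the ratio.
    variance = resp_before + resp_after / f
    full = (cost_before + cost_after) * (resp_before + resp_after)
    return cost * variance / full


def best_design(p, c, steps=100):
    """Return (m, f, ratio) minimizing the ratio over m = 1..K and a grid of f."""
    candidates = (
        (efficiency_ratio(m, i / steps, p, c), m, i / steps)
        for m in range(1, len(p) + 1)
        for i in range(1, steps + 1)
    )
    ratio, m, f = min(candidates)
    return m, f, ratio


# Hypothetical inputs: response probability falls and cost rises per attempt.
p = [0.30, 0.20, 0.10, 0.05]
c = [5.0, 6.0, 8.0, 12.0]

best_m, best_f, best_ratio = best_design(p, c)
print(f"subsample at attempt {best_m} with f = {best_f:.2f}, ratio = {best_ratio:.3f}")
```

With rising costs and falling response probabilities, the search should return a ratio below 1, in line with the guideline attributed to Elliott et al. earlier in the deck.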
22
Alternative Constraints
Required Completed Cases
Required Response Rate
Keep Costs and Weighting Effect Within Limits, Given Completes
23
Required Completed Cases
nspec = n r1 + n (1-r1) f r2
where nspec is the required number of completed cases.
When r1 and r2 are fixed, f is determined by n.
To also minimize cost, f is either 0 or 1, depending on whether c2 > c1 or vice versa.
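Rearranging the constraint gives the initial sample size needed for a given subsampling rate. The sketch below shows that trade-off; the target of 1,000 completes and the completion rates are hypothetical inputs.

```python
import math


def initial_sample_size(n_spec, r1, r2, f):
    """Solve n_spec = n*r1 + n*(1 - r1)*f*r2 for n, rounded up."""
    completes_per_case = r1 + (1 - r1) * f * r2
    return math.ceil(n_spec / completes_per_case)


# Hypothetical target and completion rates.
for f in (0.0, 0.5, 1.0):
    n = initial_sample_size(n_spec=1000, r1=0.5, r2=0.33, f=f)
    print(f"f = {f:.1f}: initial sample n = {n}")
```

Lower values of f require a larger initial sample to reach the same number of completes.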
24
Required Response Rate
The response rate is a function of r1 and r2, the completion rates for each phase—not the subsampling rate f. Subsampling affects the response rate by redeploying funds to change r2.
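A small sketch of why f drops out of the weighted response rate. It assumes, as a simplification not spelled out on the slide, that each phase 2 complete is weighted by 1/f to represent the nonrespondents who were not subsampled. The sample size and rates are hypothetical.

```python
import math


def weighted_response_rate(n, r1, r2, f):
    phase1_completes = n * r1
    subsampled = n * (1 - r1) * f
    phase2_completes = subsampled * r2
    # ASSUMPTION: each phase 2 complete carries weight 1/f.
    weighted_completes = phase1_completes + phase2_completes / f
    return weighted_completes / n


def unweighted_completes(n, r1, r2, f):
    return n * r1 + n * (1 - r1) * f * r2


# Hypothetical inputs.
for f in (0.3, 0.6, 1.0):
    rate = weighted_response_rate(1000, 0.5, 0.33, f)
    completes = unweighted_completes(1000, 0.5, 0.33, f)
    print(f"f = {f:.1f}: weighted rate = {rate:.3f}, completes = {completes:.0f}")
```

The weighted rate stays at r1 + (1 - r1) r2 for every f, while the unweighted number of completes changes with f.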
25
Weighting Effect and Cost Within Limits, Given Required Completes
Each phase may have multiple outcomes, and each outcome may have a different known cost. The expected rates for each outcome are assumed known. Determine initial sample size and cost to achieve completes without subsampling.
26
WEFF, Cost Constraints Given Completes (cont.)
Determine increase in initial sample size to compensate for fewer completes with subsampling (function of f). Determine cost with subsampling (function of f).
27
WEFF, Cost Constraints Given Completes (cont.)
The ratio of cost with subsampling to cost without subsampling must be less than a specified percentage. Solve for the acceptable range of f that meets the cost reduction constraint.
28
WEFF, Cost Constraints Given Completes (cont.)
Assuming equal base weights, the weighting effect is a function of f. The weighting effect must be less than a specified value.
29
WEFF, Cost Constraints Given Completes (cont.)
Solve for acceptable range of f. Use the intersection of the cost range and the WEFF range for f (if the intersection is non-empty).
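The two constraints can be combined with a grid search over f. This is a minimal sketch under several assumptions: costs follow the two-term form Cost = Dn + Enf rather than the multi-outcome cost structure described for this method, the weighting effect is Kish's n Σw² / (Σw)² with weights 1 for phase 1 completes and 1/f for phase 2 completes, and the thresholds (cost ratio at most 0.97, weighting effect at most 1.2) are hypothetical. The cost coefficients and completion rates are taken from the SSBF illustration in this deck.

```python
def cost_ratio(f, r1, r2, d, e):
    """Cost per complete at rate f, relative to no subsampling (f = 1)."""
    def per_complete(g):
        return (d + e * g) / (r1 + (1 - r1) * g * r2)
    return per_complete(f) / per_complete(1.0)


def weighting_effect(f, r1, r2):
    """Kish's n * sum(w^2) / (sum w)^2 with weights 1 and 1/f (an assumption)."""
    a = r1                  # phase 1 completes per initial case, weight 1
    b = (1 - r1) * f * r2   # phase 2 completes per initial case, weight 1/f
    return (a + b) * (a + b / f**2) / (a + b / f) ** 2


def acceptable_rates(max_cost_ratio, max_weff, r1, r2, d, e, steps=100):
    """Grid values of f that satisfy both constraints."""
    grid = [i / steps for i in range(1, steps + 1)]
    return [
        f for f in grid
        if cost_ratio(f, r1, r2, d, e) <= max_cost_ratio
        and weighting_effect(f, r1, r2) <= max_weff
    ]


# Thresholds are hypothetical; rates and costs follow the SSBF illustration.
rates = acceptable_rates(0.97, 1.2, r1=0.50, r2=0.33, d=9.72, e=4.29)
if rates:
    print(f"acceptable f from {rates[0]:.2f} to {rates[-1]:.2f}")
else:
    print("no f satisfies both constraints")
```

Tightening either threshold narrows the range, and the list comes back empty when the cost range and the weighting-effect range do not intersect.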
30
2003 Survey of Small Business Finances (SSBF)
Two types of data collection
Screener: respondents screened by phone after advance mailing
Main interview: eligible businesses interviewed by phone, after sending worksheet
31
SSBF (cont.)
Four batches/replicates of sample firms
Double sampling applied to both screener and main interview
32
Illustrations Using SSBF
Use screener cases in batch 2
n = 5,666 total sample cases
2,838 completed the screener by the end of phase 1 (r1 = 50%)
1,099 cases selected for phase 2 (f = 60%)
Additional 359 cases completed the screener in phase 2 (r2 = 33%)
33
Hansen-Hurwitz Subsampling Rate
C = $0.98 n + $17.48 n1 + $26.03 n2’
f = 86%
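The 86% figure follows from plugging these unit costs and the batch 2 phase 1 completion rate (r1 = 50%) into the Hansen-Hurwitz rate formula given earlier in the deck. A short check, with variable names chosen here for illustration:

```python
import math

c0, c1, c2 = 0.98, 17.48, 26.03   # unit costs from the cost function above
r1 = 0.50                          # phase 1 completion rate, batch 2 screener

f = math.sqrt((c0 + c1 * r1) / (c2 * r1))
print(f"f = {f:.0%}")
```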
34
Deming Subsampling Rate
Cost = $9.72 n + $4.29 n f
MSE = A + B/n + C/(nf)
f = sqrt{CD/(BE)} = sqrt{(C/B)(9.72/4.29)}
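The slide leaves the ratio C/B of the MSE coefficients unspecified, so the rate cannot be evaluated from the cost figures alone. The sketch below shows how f would vary over a few hypothetical values of C/B, capping the rate at 1.

```python
import math

D, E = 9.72, 4.29   # cost coefficients from the cost function above


def deming_rate_from_ratio(c_over_b):
    """f = sqrt((C/B) * (D/E)), capped at 1."""
    return min(1.0, math.sqrt(c_over_b * D / E))


# The C/B values below are hypothetical, not taken from the SSBF data.
for ratio in (0.1, 0.2, 0.4):
    print(f"C/B = {ratio:.1f}: f = {deming_rate_from_ratio(ratio):.2f}")
```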
36
Subsampling Rate for Sample Size Constraint
nspec = n r1 + n (1-r1) f r2
37
Subsampling Rate for Sample Size Constraint
c2 > c1
If cost and sample size were the only considerations, take a larger initial sample and set f = 0.
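A rough illustration of this conclusion, assuming the per-case cost function from the Deming illustration (Cost = $9.72 n + $4.29 n f) and the batch 2 completion rates. The target of 3,000 completes is hypothetical.

```python
def total_cost(n_spec, f, r1=0.50, r2=0.33, d=9.72, e=4.29):
    """Cost of reaching n_spec completes at subsampling rate f."""
    # Initial sample size from n_spec = n*r1 + n*(1 - r1)*f*r2.
    n = n_spec / (r1 + (1 - r1) * f * r2)
    return d * n + e * n * f


# Hypothetical target of 3,000 completed cases.
for f in (0.0, 0.6, 1.0):
    print(f"f = {f:.1f}: cost = ${total_cost(3000, f):,.0f}")
```

Under these inputs the cost rises with f, so the cheapest way to hit the target is a larger initial sample with no phase 2 follow-up, which ignores the bias and response rate considerations raised elsewhere in the deck.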
38
Discussion
Consider Bias and Variance Implications
Oh and Scheuren (1983)
Alternatives to Subsampling for Nonresponse
Politz and Simmons (1949)
Groves (1989)
Limitations of Cost/Error Models
Fellegi and Sunter (1974)
39
Next Step
Explore relationships among subsampling rates, cost redeployment, and response rates.
Relationships may be institution-specific.
40
Contact Info: