1
Sampling with unequal probabilities
2
Introduction
We have learned the following sampling schemes:
SRS: take an SRS from all the units in a population
Stratified sampling: take an SRS in each stratum
One-stage cluster sampling: take an SRS of clusters
Two-stage cluster sampling: (1) take an SRS of clusters; (2) in each sampled cluster, take an SRS
3
Introduction Sampling with equal probabilities
SRS: all the units have the same sampling probability: n/N
Stratified sampling: all the units within the same stratum h have the same sampling probability: nh/Nh
4
Introduction Sampling with equal probabilities
One-stage cluster sampling: all the clusters have the same sampling probability: n/N
Two-stage cluster sampling:
All the clusters have the same sampling probability: n/N
All the units within the same sampled cluster i have the same sampling probability: mi/Mi
5
Introduction
Surveys that use equal probabilities are easy to design and explain
In many situations, sampling with equal probabilities is not as efficient as sampling with unequal probabilities
6
Motivating examples: nursing homes
Sampling nursing home residents in an area
N = 294 homes
K = 37,652 beds
Mi is between 20 and 1000
If we do cluster sampling with equal probabilities:
Step 1: take an SRS of nursing homes
Step 2: take an SRS of residents within each selected home
7
Motivating examples: nursing homes
A home with 20 beds is as likely to be chosen as a home with 1000 beds
Consequences: large variance, inconvenient to administer, cost
An alternative is to let the sampling probability of a home be proportional to the number of beds in that home
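As a rough sketch of that alternative, the snippet below draws one home with probability proportional to its bed count. The bed counts and the helper name `draw_pps` are made up for illustration; they are not the survey's data.

```python
import random

def draw_pps(sizes, rng):
    """Draw one PSU index with probability proportional to its size."""
    total = sum(sizes)
    probs = [m / total for m in sizes]
    index = rng.choices(range(len(sizes)), weights=probs, k=1)[0]
    return index, probs

# Hypothetical bed counts for five homes (assumed values).
beds = [20, 50, 100, 330, 1000]
index, probs = draw_pps(beds, random.Random(0))
print("selected home:", index)
print("selection probabilities:", probs)
```

Under this scheme the 1000-bed home is 50 times as likely to be drawn as the 20-bed home, in line with their sizes.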
8
Motivating examples: California schools
There are 757 school districts
A school district is a form of special-purpose district which serves to operate local public primary and secondary schools, for formal academic or scholastic teaching, in various nations
The district sizes vary
9
Motivating examples: California schools
Some large school districts:
Los Angeles Unified: 552
San Diego Unified: 142
San Francisco Unified: 100
Irvine Unified: 29

District size   1      2      3      4     5     5-10   >10
%               24.7%  11.3%  11.9%  9.1%  6.7%  15.9%  20.0%
10
California schools: estimated API99
11
Statistical Inference: Theory and Methods
Outline:
When the sample size n=1
When the sample size n=2
A general sample size n
12
Sampling one primary sampling unit
Suppose we want to select one PSU (n=1) from the N PSUs
The total for PSU i is ti
We want to estimate the population total t
We use the case n=1 to show the ideas of unequal-probability sampling
13
Sampling one primary sampling unit
Let ψi be the probability that store i is selected on the first draw
Let πi be the probability that store i is in the sample
Because only one unit is sampled, πi = ψi
14
Sampling one primary sampling unit
Make the sampling probability proportional to store size:
15
Sampling one primary sampling unit
16
Sampling one primary sampling unit
17
Sampling one primary sampling unit
If we do equal-probability sampling
18
Example: California schools
Sampling one district in every sample
With equal probabilities: ψi = 1/N
With unequal probabilities
19
Example: California schools
20
Example: California schools
Equal probabilities: theoretical variance 1.05E14, empirical variance 9.12E13
Unequal probabilities: theoretical variance 4.08E11, empirical variance 4.96E11
21
Sampling one primary sampling unit
In the examples, the variance under equal-probability sampling is much greater than the variance under unequal-probability sampling
This is because the unequal-probability scheme uses auxiliary information
The sales of a store are expected to be related to the store size
The unequal-probability scheme builds this information into the sampling design and thus improves precision
22
Sampling one primary sampling unit
In the unequal-probability sampling we considered, the probabilities are proportional to the areas of the stores
In the equal-probability sampling, we use the same probability (1/4) for all the stores
Question: if we choose arbitrary probabilities, is the estimate (the weighted sum) biased or unbiased?
23
Sampling one primary sampling unit
Sampling with probability proportional to size
Equal-probability sampling
Sampling with arbitrary probabilities
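The question of bias can be checked numerically. The sketch below assumes the n=1 estimator is ti/ψi for the sampled PSU and computes its exact mean and variance under three sets of probabilities. The store totals and sizes are illustrative values chosen for this sketch, not figures read from the slides.

```python
def expectation_and_variance(totals, probs):
    """Exact mean and variance of t_i / psi_i when a single PSU is drawn."""
    t = sum(totals)
    estimates = [t_i / p for t_i, p in zip(totals, probs)]
    expected = sum(p * e for p, e in zip(probs, estimates))
    variance = sum(p * (e - t) ** 2 for p, e in zip(probs, estimates))
    return expected, variance

# Illustrative values for four stores (assumed).
totals = [11, 20, 24, 245]
sizes = [100, 200, 300, 1000]

schemes = {
    "pps": [m / sum(sizes) for m in sizes],
    "equal": [0.25] * 4,
    "arbitrary": [0.1, 0.2, 0.3, 0.4],
}
results = {name: expectation_and_variance(totals, p) for name, p in schemes.items()}
for name, (expected, variance) in results.items():
    print(name, expected, variance)
```

In this sketch the expected value equals the true total for every scheme, because each ψi cancels in the sum of ψi × (ti/ψi), so the estimator is unbiased for any positive probabilities. Only the variance depends on how the probabilities are chosen.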
24
Unequal-probability sampling
To illustrate some features of unequal-probability sampling, let’s examine the supermarket example
Now assume n=2
The population
25
Unequal-probability sampling
Let
26
Unequal-probability sampling
27
Inclusion probability and joint inclusion probability
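Inclusion probability: πi, the probability that unit i is in the sample
Joint inclusion probability: πij, the probability that units i and j are both in the sample
The next slide shows that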
28
Inclusion probability and joint inclusion probability
29
Inclusion probability and joint inclusion probability
30
The Joint Inclusion Probability (πij)
The calculation is straightforward for sampling with equal probabilities:
For i≠j, Zi and Zj are not independent
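For reference, the standard result for an SRS of size n drawn without replacement can be written as follows, assuming Zi denotes the indicator that unit i is in the sample:

```latex
\pi_i = P(Z_i = 1) = \frac{n}{N},
\qquad
\pi_{ij} = P(Z_i = 1,\, Z_j = 1) = \frac{n}{N}\cdot\frac{n-1}{N-1}
\quad (i \neq j).
```

Since \pi_{ij} \neq \pi_i \pi_j whenever n < N, the indicators Z_i and Z_j are dependent.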
31
The Joint Inclusion Probability (πij)
We have the formula for n=2:
The derivation for n>2 is not straightforward
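One common n=2 scheme can be sketched as follows. The assumption here, which the slide text does not confirm, is that the first unit is drawn with probabilities ψi and the second is drawn from the remaining units with probabilities ψj/(1 − ψi). The function name and the ψ values are hypothetical.

```python
def inclusion_probabilities_n2(psi):
    """Inclusion and joint inclusion probabilities for two successive draws
    without replacement (second draw renormalised over the remaining units)."""
    N = len(psi)
    joint = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            if i != j:
                # i first then j, plus j first then i
                joint[i][j] = (psi[i] * psi[j] / (1 - psi[i])
                               + psi[j] * psi[i] / (1 - psi[j]))
    # For n = 2, pi_i is the sum of pi_ij over all j != i.
    pi = [sum(joint[i][j] for j in range(N) if j != i) for i in range(N)]
    return pi, joint

# Hypothetical draw probabilities for four PSUs.
psi = [1 / 16, 2 / 16, 3 / 16, 10 / 16]
pi, joint = inclusion_probabilities_n2(psi)
print(pi, sum(pi))
```

The inclusion probabilities sum to the sample size n=2, which is a quick sanity check on the calculation.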
32
The Horvitz-Thompson estimator: the general idea
33
The Horvitz-Thompson estimator: the general idea
34
The Horvitz-Thompson estimator: one-stage sampling
35
The Horvitz-Thompson estimator: one-stage sampling
The estimator:
Results:
SYG form: Sen-Yates-Grundy (Sen 1953, Yates and Grundy 1953)
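The standard textbook forms of these results, written in the πi and πij notation used above, are given here as a reference; the slide's own typesetting is not recoverable. Here S denotes the sample, and the SYG form assumes a fixed sample size n.

```latex
\hat{t}_{HT} = \sum_{i \in S} \frac{t_i}{\pi_i},
\qquad
E\big[\hat{t}_{HT}\big] = t,

V\big(\hat{t}_{HT}\big)
  = \sum_{i=1}^{N} \frac{1-\pi_i}{\pi_i}\, t_i^2
  + \sum_{i=1}^{N} \sum_{j \neq i} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}\, t_i t_j
  = \frac{1}{2} \sum_{i=1}^{N} \sum_{j \neq i}
      (\pi_i \pi_j - \pi_{ij})
      \left( \frac{t_i}{\pi_i} - \frac{t_j}{\pi_j} \right)^{2}.
```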
36
The Horvitz-Thompson estimator: one-stage sampling
Proof
37
The Horvitz-Thompson estimator: one-stage sampling
Proof (SYG)
38
The Horvitz-Thompson estimator: one-stage sampling
How to estimate the variance?
Horvitz and Thompson (1952)
Sen-Yates-Grundy (Sen 1953, Yates and Grundy 1953)
Both estimators are unbiased for the variance
39
The Horvitz-Thompson estimator: one-stage sampling
Horvitz and Thompson (1952)
40
The Horvitz-Thompson estimator: one-stage sampling
Sen-Yates-Grundy
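A minimal sketch of the two variance estimators in their usual textbook forms is given below. The function names are hypothetical, and the inputs are the PSU totals, inclusion probabilities and joint inclusion probabilities of the sampled units only.

```python
def ht_total(t, pi):
    """Horvitz-Thompson estimate of the population total."""
    return sum(t_i / p for t_i, p in zip(t, pi))

def ht_variance_estimate(t, pi, joint):
    """Horvitz-Thompson (1952) form of the variance estimate."""
    n = len(t)
    v = sum((1 - pi[i]) / pi[i] ** 2 * t[i] ** 2 for i in range(n))
    for i in range(n):
        for k in range(n):
            if i != k:
                v += ((joint[i][k] - pi[i] * pi[k]) / joint[i][k]
                      * t[i] * t[k] / (pi[i] * pi[k]))
    return v

def syg_variance_estimate(t, pi, joint):
    """Sen-Yates-Grundy form of the variance estimate (fixed sample size)."""
    n = len(t)
    v = 0.0
    for i in range(n):
        for k in range(n):
            if i != k:
                v += ((pi[i] * pi[k] - joint[i][k]) / joint[i][k]
                      * (t[i] / pi[i] - t[k] / pi[k]) ** 2)
    return v / 2

# Check against an SRS of n = 2 from N = 4: pi_i = 1/2, pi_ij = 1/6.
t_sample = [3, 7]            # hypothetical PSU totals
pi_sample = [0.5, 0.5]
joint_sample = [[0.0, 1 / 6], [1 / 6, 0.0]]  # diagonal entries are unused
print(ht_total(t_sample, pi_sample))
print(ht_variance_estimate(t_sample, pi_sample, joint_sample))
print(syg_variance_estimate(t_sample, pi_sample, joint_sample))
```

For this SRS case the two forms coincide with the usual SRS variance estimate N²(1 − n/N)s²/n. With unequal probabilities they generally differ from each other.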
41
The Horvitz-Thompson estimator: one-stage sampling
Example:
42
The Horvitz-Thompson estimator
Practical issues:
Sampling weights are often provided; as a result, it is usually easy to compute the HT estimate of a population total or mean
The variance part is tricky
The inclusion probabilities and the joint inclusion probabilities are difficult to calculate for n>2
The HT variance estimator might lead to a negative estimate
Approximate estimators for the variance?
43
One-stage sampling with replacement
Suppose n>1
Sampling with replacement: the selection probabilities do not change from draw to draw, and the draws are independent
ψi = Pr(select unit i on any draw)
πi = 1 - (1 - ψi)^n is the probability that unit i is in the sample at least once
If n=1, then πi = ψi
44
One-stage sampling with replacement
45
One-stage sampling with replacement
46
One-stage sampling with replacement
47
One-stage sampling with replacement An example
Consider the population of introductory statistics classes at a college shown in the next slide. The college has 15 such classes; class i has Mi students, for a total of 647 students in introductory statistics courses. We decide to sample 5 classes with replacement, with probability proportional to Mi, and then collect a questionnaire from each student in the sampled classes. For this example, ψi = Mi/647.
48
One-stage sampling with replacement An example
49
One-stage sampling with replacement An example
50
One-stage sampling with replacement An example
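A sketch of the usual with-replacement estimator follows: average the values ti/ψi over the n draws, and estimate the variance from their spread. The class sizes and totals below are illustrative numbers assumed for this sketch, with ψi = Mi/647; they are not taken from the slide's table.

```python
def with_replacement_estimate(totals, psi):
    """Estimate a population total from n independent draws with replacement.

    totals[k] and psi[k] belong to the PSU selected on draw k; a PSU drawn
    twice appears twice.
    """
    n = len(totals)
    u = [t / p for t, p in zip(totals, psi)]
    t_hat = sum(u) / n
    var_hat = sum((x - t_hat) ** 2 for x in u) / (n * (n - 1))
    return t_hat, var_hat

K = 647
class_sizes = [24, 100, 100, 76, 44]      # assumed; one class is drawn twice
class_totals = [75, 203, 203, 191, 168]   # assumed totals for the sampled classes
psi = [m / K for m in class_sizes]

t_hat, var_hat = with_replacement_estimate(class_totals, psi)
print(t_hat, var_hat ** 0.5)
```

A PSU drawn more than once contributes once per draw, which is why the repeated class appears twice in the lists.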
51
One-stage sampling with replacement Designing selection probabilities
How to choose sampling probabilities?
We want to choose sampling probabilities that minimize the variance
The ideal sampling probabilities: ψi = ti/t
But the ti’s are unknown before a survey is taken
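To see why probabilities proportional to the totals are ideal, consider the variance of the with-replacement estimator in its standard form, assuming all t_i > 0:

```latex
V\big(\hat{t}_{\psi}\big)
  = \frac{1}{n} \sum_{i=1}^{N} \psi_i \left( \frac{t_i}{\psi_i} - t \right)^{2}.

\text{If } \psi_i = \frac{t_i}{t}, \text{ then } \frac{t_i}{\psi_i} = t \text{ for every } i,
\text{ so each term is zero and } V\big(\hat{t}_{\psi}\big) = 0.
```

In practice this choice is unavailable, so one looks for a known size measure that is roughly proportional to t_i.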
52
One-stage sampling with replacement Designing selection probabilities
Many totals in a PSU are related to the number of elements in the PSU
We often choose the sampling probabilities to be the relative sizes of the PSUs: ψi = Mi/K, where Mi is the number of elements in the ith PSU and K is the total number of elements in the population
A large PSU then has a greater chance of being in the sample than a small PSU
This is probability proportional to size (PPS) sampling
53
The Horvitz-Thompson estimator: two-stage sampling
54
The Horvitz-Thompson estimator: two-stage sampling
55
The Horvitz-Thompson estimator for unequal-probability sampling
56
The Horvitz-Thompson estimator for unequal-probability sampling
57
The Horvitz-Thompson estimator: the unbiasedness of the variance estimator (optional)
58
The Horvitz-Thompson estimator for unequal-probability sampling
59
The Horvitz-Thompson estimator for unequal-probability sampling
60
The Horvitz-Thompson estimator for unequal-probability sampling
61
The Horvitz-Thompson estimator for unequal-probability sampling
62
The Horvitz-Thompson estimator for unequal-probability sampling
63
The Horvitz-Thompson estimator for unequal-probability sampling
Summary
The Horvitz-Thompson estimator is unbiased
The estimate of the variance could be negative because
64
The Horvitz-Thompson estimator: two-stage sampling with replacement
65
Two-stage sampling with replacement
The estimators for two-stage unequal-probability sampling with replacement are almost the same as those for one-stage sampling
Take a sample of PSUs with replacement, choosing the ith PSU with known probability ψi on each draw
Once a PSU is selected, take an SRS of mi SSUs from it
66
Two-stage sampling with replacement
The only difference is that in two-stage sampling, we must estimate ti
If PSU i is in the sample more than once, there are several estimates of the total for PSU i
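The two-stage calculation can be sketched as below, assuming an SRS of SSUs within each drawn PSU, so that the PSU total is estimated by Mi times the subsample mean. The function name and all data values are hypothetical.

```python
from statistics import mean

def two_stage_with_replacement(draws):
    """Estimate the population total from PSU draws made with replacement.

    draws: one (M_i, psi_i, y_values) tuple per draw. A PSU drawn twice
    appears twice, each time with its own independent subsample.
    """
    n = len(draws)
    # Estimated PSU total M_i * ybar_i, divided by the draw probability psi_i.
    u = [M * mean(y) / psi for M, psi, y in draws]
    t_hat = sum(u) / n
    var_hat = sum((x - t_hat) ** 2 for x in u) / (n * (n - 1))
    return t_hat, var_hat

K = 647  # total number of SSUs in the population
draws = [
    (24, 24 / K, [2, 3, 2, 5, 3]),
    (100, 100 / K, [3, 2, 4, 2, 1]),
    (100, 100 / K, [2, 2, 3, 1, 4]),  # same PSU size drawn again, new subsample
    (76, 76 / K, [1, 2, 3, 3, 2]),
    (44, 44 / K, [4, 3, 5, 2, 4]),
]
t_hat, var_hat = two_stage_with_replacement(draws)
print(t_hat, var_hat ** 0.5)
```

The variance estimate uses only the spread of the per-draw values, so no separate within-PSU variance term is computed in this sketch.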
67
Two-stage sampling with replacement
The subsampling procedure needs to meet two requirements:
(1) If a PSU is sampled L times, we need to subsample it L times independently
(2) The jth subsample taken from PSU i is selected in such a way that the resulting estimate of ti is unbiased
68
Two-stage sampling with replacement
69
Two-stage sampling with replacement
Consider the population of introductory statistics classes at a college shown in the next slide. The college has 15 such classes; class i has Mi students, for a total of 647 students in introductory statistics courses. We decide to sample 5 classes with replacement, with probability proportional to Mi, and then collect a questionnaire from each student in the sampled classes. For this example, ψi = Mi/647.
70
Two-stage sampling with replacement
71
Two-stage sampling with replacement
Five SSU’s per selected PSU
72
Two-stage sampling with replacement
In this example:
Classes were selected with probability proportional to the number of students in the class
Subsampling the same number (mi = m) of students in each class resulted in a self-weighting sample
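The self-weighting property can be verified directly, assuming the usual with-replacement weight for SSU j in PSU i and taking ψi = Mi/K with mi = m:

```latex
w_{ij} = \frac{1}{n\,\psi_i} \cdot \frac{M_i}{m_i}
       = \frac{K}{n\,M_i} \cdot \frac{M_i}{m}
       = \frac{K}{n\,m}.
```

The weight does not depend on i or j, so every sampled student represents the same number of students. With K = 647, n = 5 and m = 5, this gives 647/25 = 25.88.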