1
A new sampling method: stratified sampling
In stratified sampling, we conduct an SRS in each stratum.
Outline: definition and motivation; statistical inference (theory of stratified sampling); advantages of stratified sampling; sample size calculation.
2
Stratified sampling: definition and motivation
A motivating example: the average number of words in the saved messages of people in this room.
What is stratified sampling? To stratify is to divide into layers; the strata are subpopulations.
The strata do not overlap: each sampling unit belongs to exactly one stratum.
Together, the strata make up the whole population.
3
Why do we use stratified sampling?
It protects us from obtaining a really bad sample.
Example: the population size is N = 500 (250 women and 250 men), and we take an SRS of size n = 50. It is possible to obtain a sample with no men or only a few men:
Pr(at most 15 men in the SRS) = 0.003
Pr(at most 20 men in the SRS) = 0.10
With stratified sampling, we can instead sample exactly 25 men and 25 women.
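The probabilities above can be checked with a short Python sketch. This is our own check and is not part of the slides. It computes the probability exactly from the hypergeometric distribution, which describes drawing without replacement from 250 men and 250 women. It also computes the binomial approximation, which ignores the finite population. The two methods give slightly different numbers, so the printed values need not match the rounded figures above exactly. The function names are hypothetical.

```python
from math import comb

def prob_at_most_men(k, N=500, men=250, n=50):
    """Exact P(an SRS of size n has at most k men): hypergeometric distribution."""
    women = N - men
    favourable = sum(comb(men, x) * comb(women, n - x) for x in range(k + 1))
    return favourable / comb(N, n)  # big-integer ratio converted to float

def binom_at_most(k, n=50, p=0.5):
    """Binomial approximation (sampling with replacement) to the same probability."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

for k in (15, 20):
    print(f"k={k}: exact={prob_at_most_men(k):.4f}, binomial={binom_at_most(k):.4f}")
```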
4
Why do we use stratified sampling?
Stratified sampling allows us to compare subgroups. It can be convenient, reduce cost, and make sampling easier. It can also be more precise, as the following example shows.
5
Total number of farm acres (3078 counties)
SRS of 300 counties from the Census of Agriculture
Estimate:
Standard error:
Stratified sampling: sample about 10% of the counties in each stratum (region)
6
Total number of farm acres (3078 counties)
Estimate: Standard error:
7
Theory of stratified sampling
8
Notation for Stratification: Population
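The notation on this slide did not survive extraction. The block below is a hedged reconstruction that uses the conventional population notation for stratified sampling. These symbols are our assumption and may differ from the original slide.

```latex
\begin{aligned}
&H = \text{number of strata}, \quad N_h = \text{number of units in stratum } h, \quad N = \sum_{h=1}^{H} N_h,\\
&y_{hj} = \text{value of the } j\text{th unit in stratum } h,\\
&t_h = \sum_{j=1}^{N_h} y_{hj}, \quad \bar y_{hU} = \frac{t_h}{N_h}, \quad
 S_h^2 = \sum_{j=1}^{N_h} \frac{(y_{hj} - \bar y_{hU})^2}{N_h - 1},\\
&t = \sum_{h=1}^{H} t_h, \quad \bar y_U = \frac{t}{N} = \sum_{h=1}^{H} \frac{N_h}{N}\,\bar y_{hU}.
\end{aligned}
```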
9
Notation for Stratification: Sample
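The sample notation is likewise missing. The block below is a hedged reconstruction in conventional notation, which we assume here. In it, an SRS of size n_h is taken independently in each stratum h of size N_h, and y_hj is the value of unit j in stratum h.

```latex
\begin{aligned}
&n_h = \text{SRS size in stratum } h, \quad n = \sum_{h=1}^{H} n_h, \quad \mathcal{S}_h = \text{set of sampled units in stratum } h,\\
&\bar y_h = \frac{1}{n_h}\sum_{j \in \mathcal{S}_h} y_{hj}, \quad \hat t_h = N_h \bar y_h, \quad
 s_h^2 = \sum_{j \in \mathcal{S}_h} \frac{(y_{hj} - \bar y_h)^2}{n_h - 1}.
\end{aligned}
```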
10
Stratified sampling: estimation
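Here are the usual stratified estimators, in conventional notation that we assume here. N_h is the size of stratum h, N = Σ_h N_h, and ȳ_h is the sample mean of the SRS in stratum h. Each stratum total is estimated by N_h ȳ_h, and these estimates are summed.

```latex
\hat t_{\text{str}} = \sum_{h=1}^{H} \hat t_h = \sum_{h=1}^{H} N_h \bar y_h,
\qquad
\bar y_{\text{str}} = \frac{\hat t_{\text{str}}}{N} = \sum_{h=1}^{H} \frac{N_h}{N}\,\bar y_h .
```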
11
Statistical Properties: Bias and Variance
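The SRSs in different strata are selected independently, so the variances of the stratum estimates add. Each stratum then contributes an ordinary SRS variance. The block below uses conventional notation, which we assume here: S_h² is the population variance within stratum h and n_h is the SRS size in that stratum. Under these assumptions, the stratified estimator is unbiased and has the following variance.

```latex
E(\hat t_{\text{str}}) = t, \qquad
V(\hat t_{\text{str}}) = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) N_h^2 \frac{S_h^2}{n_h}, \qquad
V(\bar y_{\text{str}}) = \frac{V(\hat t_{\text{str}})}{N^2}
 = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) \left(\frac{N_h}{N}\right)^2 \frac{S_h^2}{n_h}.
```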
12
Variance Estimates for stratified samples
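Replacing each unknown S_h² by the sample variance s_h² of the SRS in stratum h gives the usual unbiased variance estimators. The notation is conventional and assumed here.

```latex
\hat V(\hat t_{\text{str}}) = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) N_h^2 \frac{s_h^2}{n_h}, \qquad
\hat V(\bar y_{\text{str}}) = \frac{\hat V(\hat t_{\text{str}})}{N^2}.
```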
13
Confidence intervals for stratified samples
Some books use a t distribution with n − H degrees of freedom instead of the standard normal distribution.
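The estimation steps for a stratified sample fit into one small function. This is a minimal Python sketch, not the slides' own spreadsheet. It computes t̂_str = Σ_h N_h ȳ_h and the variance estimate Σ_h (1 − n_h/N_h) N_h² s_h²/n_h. It also builds a normal-based interval ȳ_str ± z_{α/2} SE(ȳ_str). The function name, the input format (a list of (N_h, sample values) pairs), and the toy data are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance

def stratified_summary(strata, alpha=0.05):
    """Stratified estimates from an independent SRS in each stratum.

    strata: list of (N_h, sample_values) pairs (a hypothetical input format).
    Returns the estimated total and mean, their standard errors, and a
    normal-based 100(1 - alpha)% confidence interval for the mean.
    """
    N = sum(N_h for N_h, _ in strata)
    t_str = 0.0   # estimated population total
    v_t = 0.0     # estimated variance of t_str
    for N_h, ys in strata:
        n_h = len(ys)
        t_str += N_h * mean(ys)  # t_hat_h = N_h * ybar_h
        # fpc * N_h^2 * s_h^2 / n_h, where s_h^2 uses divisor n_h - 1
        v_t += (1 - n_h / N_h) * N_h**2 * variance(ys) / n_h
    ybar_str = t_str / N
    se_mean = math.sqrt(v_t) / N
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return {
        "t_str": t_str,
        "se_t": math.sqrt(v_t),
        "ybar_str": ybar_str,
        "se_mean": se_mean,
        "ci_mean": (ybar_str - z * se_mean, ybar_str + z * se_mean),
    }

# Made-up toy data: two strata of sizes 100 and 50
result = stratified_summary([(100, [1.0, 2.0, 3.0]), (50, [10.0, 12.0])])
print(result)
```

To use the t distribution with n − H degrees of freedom instead, replace `z` with a t quantile. For example, `scipy.stats.t.ppf(1 - alpha / 2, n - H)` would work if SciPy is available.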
14
Sampling probabilities and weights
Suppose a population has 1600 men and 400 women, and the stratified sampling design specifies sampling 200 men and 200 women. Each man in the sample has weight 8 and each woman has weight 2. Each woman in the sample represents herself and 1 other woman not selected; each man represents himself and 7 other men not in the sample.
15
Sampling probabilities and weights
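The sampling probability for the jth unit in the hth stratum is π_hj = n_h/N_h. Its sampling weight is w_hj = 1/π_hj = N_h/n_h. The sum of the sampling weights over the sample is N: Σ_h Σ_j w_hj = Σ_h n_h (N_h/n_h) = N.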
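The men/women example can be checked against these definitions in a few lines. This small Python sketch uses only the numbers from that example. The dictionary layout is our own choice.

```python
# Population and sample sizes from the men/women example
N_h = {"men": 1600, "women": 400}
n_h = {"men": 200, "women": 200}

pi = {h: n_h[h] / N_h[h] for h in N_h}          # sampling probability n_h / N_h
w = {h: N_h[h] / n_h[h] for h in N_h}           # sampling weight N_h / n_h = 1 / pi
total_weight = sum(w[h] * n_h[h] for h in N_h)  # every sampled unit contributes its weight
print(pi, w, total_weight)
```

The weights come out as 8 for men and 2 for women, and they sum to N = 2000 over the 400 sampled units.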
16
Sampling probabilities and weights
17
Sampling probabilities and weights
example
18
Sampling probabilities and weights in proportional allocation
In proportional allocation, the number of sampled units in each stratum is proportional to the size of the stratum, i.e., n_h/N_h = n/N, or n_h = n N_h/N. Every unit in the sample then has the same weight, N/n, and represents the same number of units in the population. Such a sample is called self-weighting.
19
Sampling probabilities and weights in proportional allocation
The sampling probability is about 10% for every unit, so all the weights are the same: about 10.
21
An example of stratified sampling
22
Observed data
23
Spreadsheet for calculations in the example
24
Stratified sampling for proportions
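A proportion is the mean of a 0/1 variable. In that case ȳ_h = p̂_h, and s_h² = n_h p̂_h(1 − p̂_h)/(n_h − 1). Substituting these into the general stratified formulas gives the following, in conventional notation assumed here.

```latex
\hat p_{\text{str}} = \sum_{h=1}^{H} \frac{N_h}{N}\,\hat p_h, \qquad
\hat V(\hat p_{\text{str}}) = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) \left(\frac{N_h}{N}\right)^2 \frac{\hat p_h (1 - \hat p_h)}{n_h - 1}.
```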
25
Allocating observations to strata
In the theoretical derivation and the examples of stratified sampling, we assumed that someone had already designed the survey. Survey design is the most important part of using a survey in research: if the survey is badly designed, no analysis can recover the correct result. The problem of allocating observations to strata is how to determine the sample size (or relative sample size) for each stratum.
26
Proportional Allocation
In proportional allocation, the number of sampled units in each stratum is proportional to the size of the stratum. The probability of selection, n_h/N_h = n/N, is the same for all strata. Every unit in the sample has the same weight (N/n) and represents the same number of units in the population. The sample is a self-weighting sample.
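In practice, n N_h/N is rarely an integer, so the allocation has to be rounded. The Python sketch below computes a proportional allocation. The rounding rule (largest-remainder rounding, so that the n_h still sum to n) is our assumption and is not specified in the slides. The function name is hypothetical.

```python
def proportional_allocation(N_h, n):
    """n_h = n * N_h / N, rounded so that the integer n_h sum to n.

    Rounding rule (an assumption): floor every share, then give the
    leftover units to the strata with the largest fractional parts.
    """
    N = sum(N_h)
    exact = [n * Nh / N for Nh in N_h]
    alloc = [int(x) for x in exact]  # floor of each exact share (all shares are >= 0)
    leftover = n - sum(alloc)
    order = sorted(range(len(N_h)), key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in order[:leftover]:
        alloc[h] += 1
    return alloc

print(proportional_allocation([1600, 400], 400))     # proportional version of the men/women example
print(proportional_allocation([103, 205, 310], 50))  # made-up strata sizes
```

After rounding, the weights N_h/n_h are only approximately equal to N/n, so the sample is only approximately self-weighting.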
27
Stratified sampling (with proportional allocation) vs SRS
What is the benefit of using stratified sampling with proportional allocation? Under what conditions is stratified sampling with proportional allocation better than SRS? To compare the two sampling methods, we need to compare the between-strata and within-strata variances.
28
Analysis of Variance (ANOVA) for the population
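The slide's ANOVA table is not recoverable. The standard decomposition of the population sum of squares is shown below in conventional notation, which we assume here. In it, ȳ_U is the population mean, ȳ_hU is the mean of stratum h, S² is the overall population variance, and S_h² is the variance within stratum h.

```latex
\underbrace{\sum_{h=1}^{H}\sum_{j=1}^{N_h} (y_{hj} - \bar y_U)^2}_{\text{SSTO} \,=\, (N-1)S^2}
= \underbrace{\sum_{h=1}^{H} N_h (\bar y_{hU} - \bar y_U)^2}_{\text{SSB (between strata)}}
+ \underbrace{\sum_{h=1}^{H} (N_h - 1) S_h^2}_{\text{SSW (within strata)}}
```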
29
Stratified sampling (with proportional allocation) vs SRS
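Substituting n_h = n N_h/N into V(t̂_str) = Σ_h (1 − n_h/N_h) N_h² S_h²/n_h gives the proportional-allocation variance. The block below sets this beside the SRS variance of the estimated total for the same n. The notation is conventional and assumed here.

```latex
V_{\text{SRS}}(\hat t) = \left(1 - \frac{n}{N}\right) \frac{N^2 S^2}{n},
\qquad
V_{\text{prop}}(\hat t_{\text{str}}) = \left(1 - \frac{n}{N}\right) \frac{N}{n} \sum_{h=1}^{H} N_h S_h^2 .
```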
30
Stratified sampling (with proportional allocation) vs SRS
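The two variances can be compared using the ANOVA identity (N − 1)S² = Σ_h N_h(ȳ_hU − ȳ_U)² + Σ_h (N_h − 1)S_h². With this identity, the difference between the SRS and proportional-allocation variances of the estimated total simplifies as shown below. This is a standard derivation in conventional notation, assumed here.

```latex
V_{\text{SRS}}(\hat t) - V_{\text{prop}}(\hat t_{\text{str}})
= \left(1 - \frac{n}{N}\right) \frac{N}{n(N-1)}
\left[ N \sum_{h=1}^{H} N_h (\bar y_{hU} - \bar y_U)^2 - \sum_{h=1}^{H} (N - N_h) S_h^2 \right].
```

Proportional allocation can therefore do worse than SRS only when N·SSB < Σ_h (N − N_h) S_h², that is, when the stratum means are nearly equal.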
31
Stratified sampling (with proportional allocation) vs SRS
The situation in which stratified sampling with proportional allocation gives a larger variance than SRS rarely occurs when the strata sizes are large. The more unequal the stratum means, the more precision we gain by using stratified sampling with proportional allocation.
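Both the comparison and the role of the stratum means can be seen on toy populations. This is a minimal Python sketch. It computes the exact V_SRS(t̂) and V_prop(t̂_str) for a fully known toy population, and it checks the identity V_SRS − V_prop = (1 − n/N) N/(n(N − 1)) [N·SSB − Σ_h (N − N_h) S_h²]. It ignores rounding of the n_h. The function name and the two made-up populations are hypothetical. One population has very different stratum means, and the other has identical stratum means.

```python
from statistics import mean, variance

def compare_prop_vs_srs(strata, n):
    """Exact V(t_hat) under SRS and under proportional allocation.

    strata: list of lists holding every y-value of each stratum (a toy population).
    n_h = n * N_h / N is used unrounded, so the formulas apply exactly.
    """
    all_y = [y for s in strata for y in s]
    N = len(all_y)
    f = 1 - n / N                                   # finite population correction
    v_srs = f * N**2 * variance(all_y) / n          # S^2 uses divisor N - 1
    v_prop = f * (N / n) * sum(len(s) * variance(s) for s in strata)
    # ANOVA-based expression for the gap V_SRS - V_prop
    ybar = mean(all_y)
    ssb = sum(len(s) * (mean(s) - ybar) ** 2 for s in strata)
    gap = f * N / (n * (N - 1)) * (N * ssb - sum((N - len(s)) * variance(s) for s in strata))
    return v_srs, v_prop, gap

print(compare_prop_vs_srs([[1, 2, 3, 4], [10, 11, 12, 13]], n=4))  # very different means
print(compare_prop_vs_srs([[1, 5], [1, 5]], n=2))                  # identical means
```

With very different stratum means, V_prop is far below V_SRS. With identical means, SSB = 0 and proportional allocation is slightly worse, which matches the condition stated above.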
32
Optimal Allocation
Stratified sampling with proportional allocation is easy to conduct, and it is more precise than SRS in most situations. But it is not necessarily the most efficient stratified design, especially when the variances differ substantially from stratum to stratum.
33
Optimal allocation
The goal of optimal allocation is to gain the most information for the least cost. We can assume that the total cost is fixed; given that, we want to minimize the variance.
Different types of cost:
Total cost: C
Overhead cost, such as maintaining an office: C_0
Cost of taking an observation in stratum h: C_h
34
Optimal allocation
Recall that V(ȳ_str) = Σ_h (1 − n_h/N_h) (N_h/N)² S_h²/n_h. We want to minimize V(ȳ_str) subject to C = C_0 + Σ_h C_h n_h.
35
Optimal allocation
Introducing a Lagrange multiplier λ, we need to minimize L = V(ȳ_str) − λ(C − C_0 − Σ_h C_h n_h). Take the partial derivative with respect to each n_h and set it to zero.
36
Optimal allocation
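The first-order condition follows from two facts. First, V(ȳ_str) = Σ_h (N_h/N)² S_h²/n_h − Σ_h N_h S_h²/N², and the second term does not involve n_h. Second, L = V(ȳ_str) − λ(C − C_0 − Σ_h C_h n_h). Differentiating L with respect to each n_h gives the following. This is the standard derivation, in conventional notation assumed here.

```latex
\frac{\partial L}{\partial n_h} = -\left(\frac{N_h}{N}\right)^2 \frac{S_h^2}{n_h^2} + \lambda C_h = 0
\quad\Longrightarrow\quad
n_h = \frac{N_h S_h}{N\sqrt{\lambda C_h}} \;\propto\; \frac{N_h S_h}{\sqrt{C_h}} .
```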
37
Optimal allocation
We need to find the value of n.
Recall that the total cost C is fixed, i.e., C = C_0 + Σ_h C_h n_h.
38
Optimal allocation
Combining the results, we have n_h = (C − C_0) (N_h S_h/√C_h) / Σ_l N_l S_l √C_l, so n_h is proportional to N_h S_h/√C_h.
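The cost-constrained result translates directly into code. This is a minimal Python sketch of n_h = (C − C_0)(N_h S_h/√C_h) / Σ_l N_l S_l √C_l. It returns real-valued n_h, and rounding is left to the user. The function name, its parameters, and all input numbers are hypothetical, and the S_h are assumed known (for example, from a pilot study).

```python
import math

def optimal_allocation(N_h, S_h, C_h, C, C0=0.0):
    """Cost-constrained optimal allocation, n_h proportional to N_h * S_h / sqrt(C_h).

    N_h, S_h, C_h: per-stratum sizes, standard deviations and per-unit costs.
    The budget C minus overhead C0 is spent entirely on sampling.
    """
    a = [N * S / math.sqrt(c) for N, S, c in zip(N_h, S_h, C_h)]      # N_h S_h / sqrt(C_h)
    b = sum(N * S * math.sqrt(c) for N, S, c in zip(N_h, S_h, C_h))   # sum_l N_l S_l sqrt(C_l)
    return [(C - C0) * ah / b for ah in a]

n_h = optimal_allocation(N_h=[50, 150], S_h=[4, 1], C_h=[9, 1], C=100, C0=10)
print(n_h, sum(c * x for c, x in zip([9, 1], n_h)))  # allocation and the cost it uses
```

By construction, Σ_h C_h n_h = C − C_0. With equal costs, the result reduces to n_h ∝ N_h S_h. With equal costs and equal S_h, it reduces to proportional allocation.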
39
Optimal allocation: two special situations
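The slide formulas are missing. Two special cases are usually singled out, and we assume these are the ones meant here. The first is equal per-unit costs (C_h = c for all h), which gives Neyman allocation. The second is equal costs and equal within-stratum variances, which gives back proportional allocation. In both cases, n_h ∝ N_h S_h/√C_h.

```latex
\text{Equal costs (Neyman allocation):}\quad n_h = n\,\frac{N_h S_h}{\sum_{l=1}^{H} N_l S_l},
\qquad
\text{Equal costs and variances:}\quad n_h = n\,\frac{N_h}{N}.
```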
40
An example
41
Optimal allocation: two special situations
42
Optimal allocation for fixed variance (v)
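One may instead want to minimize the cost for a fixed variance v. Mathematically, we want to minimize C = C_0 + Σ_h C_h n_h subject to V(ȳ_str) = v. Using a Lagrange multiplier, one can show that the optimal n_h is again proportional to N_h S_h/√C_h.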
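We can substitute n_h = k N_h S_h/√C_h into V(ȳ_str) = Σ_h (N_h/N)² S_h²/n_h − Σ_h N_h S_h²/N² and set the result equal to v. Solving for k then gives the total sample size shown below. This is our own derivation in conventional notation, so it may not match the form on the original slide.

```latex
n_h = n\,\frac{N_h S_h/\sqrt{C_h}}{\sum_{l=1}^{H} N_l S_l/\sqrt{C_l}},
\qquad
n = \frac{\left(\sum_{l=1}^{H} N_l S_l \sqrt{C_l}\right)\left(\sum_{l=1}^{H} N_l S_l/\sqrt{C_l}\right)}
         {N^2 v + \sum_{l=1}^{H} N_l S_l^2}.
```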
43
Some practical issues
Stratified sampling often gives higher precision than SRS. But how should we define the strata? Stratification is most efficient when the stratum means differ widely.
44
Define strata
Try to find variables closely related to y.
E.g., for farm income, use the size of the farm as a stratification variable; for estimating total business expenditures on advertising, stratify by number of employees or by type of product. Get information from experts, old data, preliminary data, etc.
45
Effects of unknown strata sizes and variances
Errors in the strata sizes can bias the estimates, and poor guesses of the strata variances make the allocation less efficient. One can use a pilot study to obtain good estimates of the strata sizes and variances.
46
Summary
Stratified sampling almost always gives higher precision than SRS.
Stratification adds complexity to a survey, e.g., when the strata sizes and variances are unknown.
In many situations, the potential gain from stratification is large enough to justify the effort of stratifying the population and the expense of conducting pilot studies.
47
Poststratification
Suppose a sampling frame lists all households in an area, and you would like to estimate the average amount spent on food in a month. One desirable stratification variable is household size, since large households are expected to have higher food bills. The distribution of household size is known (from U.S. census data).
48
An example of poststratification
The distribution of household size from U.S. census
49
An example of poststratification
The sampling frame does not include information on household size, so we cannot conduct stratified sampling based on household size. Instead, we take an SRS and record the amount spent on food and the household size. If n (the SRS size) is large enough, we expect about 26% one-person households, about 31% two-person households, and so on.
50
An example of poststratification
We can use the methods of stratified sampling to estimate the average amount spent on food for each category of household sizes After the observations are taken, we can form a “stratified” estimate of the population mean
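The poststratified mean weights each poststratum's sample mean by its known population share: ȳ_post = Σ_h (N_h/N) ȳ_h. This is a minimal Python sketch of that estimator. The census shares above give only the first two household-size categories (about 26% and 31%), so the shares and spending values in the usage line are made up. The function name and input format are hypothetical.

```python
from statistics import mean

def poststratified_mean(sample, pop_share):
    """ybar_post = sum_h (N_h / N) * ybar_h, with the poststrata formed after sampling.

    sample: list of (poststratum, y) pairs from an SRS.
    pop_share: dict mapping poststratum -> N_h / N, assumed known (e.g. from a census).
    """
    groups = {}
    for h, y in sample:
        groups.setdefault(h, []).append(y)
    if set(groups) != set(pop_share):
        raise ValueError("every poststratum needs at least one sampled unit")
    return sum(pop_share[h] * mean(ys) for h, ys in groups.items())

# Made-up data: monthly food spending by household size
sample = [("1", 100), ("1", 120), ("2", 200), ("2", 220), ("2", 240)]
print(poststratified_mean(sample, {"1": 0.3, "2": 0.7}))
```

Here the plain sample mean is 176. The poststratified estimate gives the larger households their known population share of 70% instead of their sample share of 60%.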
51
An example of poststratification
52
An example of poststratification
Discussion of the example
53
An example of poststratification
Poststratification can be dangerous: you can obtain arbitrarily small variances if you choose the strata after seeing the data. Poststratification is most often used to correct for the effects of differential nonresponse in the poststrata (chapter 8).
54
A new sampling method: motivating example
We want to study the average amount of water used per person. How would you design a survey?