Stratified Sampling STAT262.

1 Stratified Sampling STAT262

2 Motivating example: enrollment in California Schools
California schools: E=4421, H=775, M=1018. We are interested in the number of enrolled students. 25%

3 What is stratified sampling?
Stratify: make layers. Strata: subpopulations. Strata do not overlap. Each sampling unit belongs to exactly one stratum. Strata constitute the whole population.

4 Why do we use stratified sampling?
Be protected from obtaining a really bad sample. Population size is N=500 (250 women and 250 men). We take an SRS of size n=50. It is possible to obtain a sample with no men or only a few men. Pr(less than or equal to 20 men in an SRS)≈0.10. In stratified sampling, we can sample 25 men and 25 women.
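The probability quoted on this slide can be checked with the hypergeometric distribution, since an SRS is drawn without replacement. The following is a minimal sketch; the function name `prob_at_most` is hypothetical, not something from the course material.

```python
from math import comb

def prob_at_most(k_max, pop_size, n_men, sample_size):
    """P(X <= k_max), where X is the number of men in an SRS
    drawn without replacement (hypergeometric distribution)."""
    n_women = pop_size - n_men
    total = comb(pop_size, sample_size)
    favourable = sum(
        comb(n_men, k) * comb(n_women, sample_size - k)
        for k in range(k_max + 1)
    )
    return favourable / total

# Setting from the slide: N = 500 (250 men, 250 women), SRS of size n = 50.
p = prob_at_most(20, pop_size=500, n_men=250, sample_size=50)
print(f"Pr(at most 20 men) = {p:.3f}")
```

The result should come out at roughly one in ten, in line with the rounded figure on the slide.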

5 Enrollment in California Schools
SRS of 0.5% of N. Stratified sampling: 0.5% from each stratum. We use 1000 simulations to compare their estimates of the population mean.
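A simulation of this kind can be sketched as follows. The stratum sizes are the ones from the motivating example, but the enrollment values are synthetic: the stratum means and standard deviations below are made-up assumptions, because the real enrollment data are not part of this transcript.

```python
import random
import statistics

random.seed(262)

# stratum: (size from the slides, ASSUMED mean, ASSUMED standard deviation)
strata = {
    "E": (4421, 400, 150),
    "H": (775, 1300, 600),
    "M": (1018, 800, 300),
}
# Synthetic enrollments, truncated below at 1 student.
population = {
    s: [max(1.0, random.gauss(mu, sd)) for _ in range(size)]
    for s, (size, mu, sd) in strata.items()
}
all_units = [y for ys in population.values() for y in ys]
N = len(all_units)
frac = 0.005  # 0.5% sampling fraction

def srs_mean():
    """Sample mean from an SRS of 0.5% of the whole population."""
    return statistics.fmean(random.sample(all_units, round(frac * N)))

def stratified_mean():
    """Stratified estimate: 0.5% SRS within each stratum, weighted by N_h/N."""
    est = 0.0
    for ys in population.values():
        n_h = round(frac * len(ys))
        est += len(ys) / N * statistics.fmean(random.sample(ys, n_h))
    return est

srs_estimates = [srs_mean() for _ in range(1000)]
strat_estimates = [stratified_mean() for _ in range(1000)]
var_srs = statistics.variance(srs_estimates)
var_strat = statistics.variance(strat_estimates)
print(f"variance of SRS estimates:        {var_srs:.1f}")
print(f"variance of stratified estimates: {var_strat:.1f}")
```

With stratum means this far apart, the stratified estimates should vary noticeably less than the SRS estimates; the size of the gap depends entirely on the assumed population.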

6 SRS vs Stratified Sampling

7 Stratified Sampling: Theory
or utilize existing layers/strata

8 Population

9 The SRS from Stratum h

10 Stratified Sampling: Estimation

11 Stratified Sampling: Bias and Variance

12 Stratified Sampling: Bias and Variance

13 Stratified Sampling: Confidence Intervals
A large-sample 100(1-α)% CI for the population mean is $\bar{y}_{str} \pm z_{\alpha/2}\,\mathrm{SE}(\bar{y}_{str})$. Some books use a t distribution with n-H degrees of freedom for the critical value.
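The interval depends on the stratified estimator of the mean and its standard error. The formulas on slides 10 to 12 are not in this transcript, so the block below restates them in standard textbook notation, which is an assumption about the notation the slides used: $N_h$ and $n_h$ are the population and sample sizes in stratum $h$, and $\bar{y}_h$ and $s_h^2$ are the sample mean and sample variance in stratum $h$.

```latex
\begin{align*}
\bar{y}_{str} &= \sum_{h=1}^{H} \frac{N_h}{N}\,\bar{y}_h, \\
\widehat{V}(\bar{y}_{str}) &= \sum_{h=1}^{H} \left(\frac{N_h}{N}\right)^{2}
  \left(1-\frac{n_h}{N_h}\right)\frac{s_h^{2}}{n_h}, \\
\mathrm{SE}(\bar{y}_{str}) &= \sqrt{\widehat{V}(\bar{y}_{str})}, \\
\text{CI:}\quad & \bar{y}_{str} \pm z_{\alpha/2}\,\mathrm{SE}(\bar{y}_{str}).
\end{align*}
```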

14 Stratified Sampling: Sampling Probabilities and weights
Example. Population: 1600 men and 400 women. Sample: 200 randomly chosen men and 200 randomly chosen women. Each man in the sample has a weight of 8: he represents himself and 7 other men not in the sample. Each woman has a weight of 2: she represents herself and 1 other woman not in the sample.

15 Stratified Sampling: Sampling Probabilities and weights
The sampling probability for the jth unit in the hth stratum is nh/Nh. Sampling weight: the reciprocal of the sampling probability, Nh/nh. The sum of the sampling weights over all sampled units is N.
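The men and women example can be checked numerically. This is a small sketch; the helper name `sampling_weights` is hypothetical.

```python
def sampling_weights(stratum_sizes, sample_sizes):
    """Return per-stratum sampling probabilities n_h/N_h and weights N_h/n_h."""
    probs = {h: sample_sizes[h] / stratum_sizes[h] for h in stratum_sizes}
    weights = {h: stratum_sizes[h] / sample_sizes[h] for h in stratum_sizes}
    return probs, weights

# Example from the slides: 1600 men and 400 women, 200 sampled from each.
N_h = {"men": 1600, "women": 400}
n_h = {"men": 200, "women": 200}
probs, weights = sampling_weights(N_h, n_h)

# Summing the weight of every sampled unit recovers the population size N.
total_weight = sum(n_h[h] * weights[h] for h in N_h)
print(probs, weights, total_weight)
```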

16 Stratified Sampling: Sampling Probabilities and weights

17 Stratified Sampling: Sampling Probabilities and weights
example

18 Sampling probabilities and weights in proportional allocation
In proportional allocation, the number of sampled units in each stratum is proportional to the size of the stratum, i.e., nh/Nh = n/N for every stratum h. Every unit in the sample has the same weight and represents the same number of units in the population. The sample is called self-weighting.

19 Sampling probabilities and weights in proportional allocation
Sampling probability for all units is about 10%. All the weights are the same: 10.

20 An Example of Stratified Sampling
California enrollments. Elementary schools: 4397. High schools: 751. Middle schools: 1009.

21 Observed Data
Stratum  Nh    nh   ybarh   s2h
E        4397  200  439.14  30759.85

22 Estimation and CI
Columns: Nh, nh, ybarh, s2h, th, th.var. Rows: E, H, M, Total. Total: th = 3845143, th.var = 978180565554.
Estimate of the total enrollments: 3845143.
A 95% CI: (3845143 - 1.96*989030, 3845143 + 1.96*989030), i.e., (1906644, 5783642).
Estimate of the mean enrollments (school size): 3845143/N = 624.5.
A 95% CI: (624.5 - 1.96*989030/N, 624.5 + 1.96*989030/N), i.e., (309.7, 939.4).
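The arithmetic on this slide can be reproduced from the two totals that are given, the estimated total and its estimated variance. This sketch assumes the critical value 1.96 and takes N as the sum of the three stratum sizes from slide 20.

```python
from math import sqrt

N = 4397 + 751 + 1009        # number of schools, from slide 20
t_hat = 3845143              # estimated total enrollment, from the slide
t_var = 978180565554         # estimated variance of the total, from the slide
z = 1.96                     # assumed critical value for a 95% interval

se_total = sqrt(t_var)
ci_total = (t_hat - z * se_total, t_hat + z * se_total)

# Dividing the total and its standard error by N gives the mean school size.
mean_hat = t_hat / N
se_mean = se_total / N
ci_mean = (mean_hat - z * se_mean, mean_hat + z * se_mean)

print(f"SE(total) = {se_total:.0f}, CI(total) = ({ci_total[0]:.0f}, {ci_total[1]:.0f})")
print(f"mean = {mean_hat:.1f}, CI(mean) = ({ci_mean[0]:.1f}, {ci_mean[1]:.1f})")
```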

23 Stratified sampling for proportions

24 Allocating observations to strata
Survey design is the most important part of using a survey in research. If we use a badly designed survey: garbage in, garbage out. Allocating observations to strata determines the sample size, or relative sample size, of each stratum.

25 Proportional Allocation
The number of sampled units in each stratum is proportional to the size of the stratum. The probability of selection is the same across strata (= nh/Nh = n/N). Every unit in the sample has the same weight (= N/n) and represents the same number of units in the population. The sample is a self-weighting sample.
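In practice n*Nh/N is rarely an integer, so some rounding rule is needed. The sketch below uses largest-remainder rounding, which is one reasonable choice and not something prescribed by the slides. The helper name `proportional_allocation` is hypothetical, and the total sample size of 600 is only an illustrative value.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate n units so that n_h is as close as possible to n * N_h / N.

    Rounds down first, then hands the leftover units to the strata with
    the largest fractional parts (largest-remainder rounding).
    """
    N = sum(stratum_sizes.values())
    exact = {h: n * N_h / N for h, N_h in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc

# Stratum sizes from the California example.
sizes = {"E": 4397, "H": 751, "M": 1009}
alloc = proportional_allocation(sizes, 600)
weights = {h: sizes[h] / alloc[h] for h in sizes}
print(alloc)
print(weights)
```

The weights come out close to N/n in every stratum but not exactly equal, which is the price of rounding to whole schools.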

26 Stratified sampling (with proportional allocation) vs SRS
What is the benefit of using stratified sampling (with proportional allocation)? Under what conditions is stratified sampling (with proportional allocation) better than SRS? To compare the two sampling methods, we need to compare between-strata and within-strata variances.

27 Population ANOVA Table

28 Stratified sampling (with proportional allocation) vs SRS

29 Stratified sampling (with proportional allocation) vs SRS

30 Stratified sampling (with proportional allocation) vs SRS
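Proportional allocation is better than SRS unless $\mathrm{SSB} < \sum_{h=1}^{H} (1 - N_h/N)\,S_h^2$, where SSB is the between-strata sum of squares and $S_h^2$ is the variance within stratum h. This inequality doesn’t happen often. The more heterogeneous the strata are (the more unequal the stratum means), the more we gain from using a stratified sample with proportional allocation.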
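The condition can be derived from the population ANOVA decomposition. The following is a sketch in standard textbook notation, which is an assumption because the slide formulas are not in this transcript: $S^2$ is the population variance, $S_h^2$ the variance within stratum $h$, and SSB and SSW are the between-strata and within-strata sums of squares.

```latex
\begin{align*}
(N-1)S^2 &= \mathrm{SSB} + \mathrm{SSW},
  \qquad \mathrm{SSW} = \sum_{h=1}^{H}(N_h-1)S_h^2, \\
V_{SRS}(\bar{y}) &= \left(1-\frac{n}{N}\right)\frac{S^2}{n}, \\
V_{prop}(\bar{y}_{str}) &= \left(1-\frac{n}{N}\right)\frac{1}{n}
  \sum_{h=1}^{H}\frac{N_h}{N}S_h^2, \\
V_{SRS}(\bar{y}) - V_{prop}(\bar{y}_{str}) &=
  \left(1-\frac{n}{N}\right)\frac{1}{n(N-1)}
  \left[\mathrm{SSB} - \sum_{h=1}^{H}\left(1-\frac{N_h}{N}\right)S_h^2\right].
\end{align*}
```

The difference is negative only when SSB is smaller than the weighted sum of within-stratum variances, that is, when the stratum means are nearly equal.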

31 The California API Example
The population ANOVA:
> summary(aov(enroll ~ stype, data=pop.enroll))
Df Sum Sq Mean Sq
stype
Residuals
=9,872,252,775 =19,221,868, % reduction

32 Optimal Allocation
Stratified sampling with proportional allocation is easy to conduct. It is more precise than SRS in most situations. Is it the most efficient stratified sampling?

33 Optimal allocation
The goal of optimal allocation is to gain the most information for the least cost. We can assume that the total cost is fixed; given that, we want to minimize the variance. Different types of cost: total cost, C; overhead cost such as maintaining an office, C0; the cost of taking an observation in stratum h, Ch.

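35 Optimal allocation
Recall that $V(\bar{y}_{str}) = \sum_{h=1}^{H} (N_h/N)^2 (1 - n_h/N_h)\,S_h^2/n_h$. We want to minimize $V(\bar{y}_{str})$ subject to $C = C_0 + \sum_{h=1}^{H} C_h n_h$.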

36 Optimal allocation
Introducing a Lagrange multiplier λ, we need to minimize $V(\bar{y}_{str}) + \lambda\,(C_0 + \sum_{h=1}^{H} C_h n_h - C)$. Take the partial derivative with respect to each $n_h$ and set it to zero.
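The Lagrange-multiplier step can be written out as follows. This is a sketch in standard notation, assumed because the slide formulas are not in the transcript; $C_0$ and $C_h$ are the overhead and per-observation costs defined on slide 33, and $S_h$ is the standard deviation within stratum $h$.

```latex
\begin{align*}
L(n_1,\dots,n_H,\lambda) &= \sum_{h=1}^{H}\left(\frac{N_h}{N}\right)^{2}
  \left(\frac{1}{n_h}-\frac{1}{N_h}\right)S_h^{2}
  + \lambda\left(C_0 + \sum_{h=1}^{H} C_h n_h - C\right), \\
\frac{\partial L}{\partial n_h} &=
  -\left(\frac{N_h}{N}\right)^{2}\frac{S_h^{2}}{n_h^{2}} + \lambda C_h = 0, \\
n_h &= \frac{1}{\sqrt{\lambda}}\,\frac{N_h S_h}{N\sqrt{C_h}}
  \;\propto\; \frac{N_h S_h}{\sqrt{C_h}}.
\end{align*}
```

So a stratum gets more observations when it is larger, more variable, or cheaper to sample.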

37 Optimal allocation

38 Optimal allocation
We need to find the value of n. Recall that the total cost C is fixed, i.e., $C = C_0 + \sum_{h=1}^{H} C_h n_h$.

39 Optimal allocation
Combining the results, we have $n_h = (C - C_0)\,\dfrac{N_h S_h/\sqrt{C_h}}{\sum_{l=1}^{H} N_l S_l \sqrt{C_l}}$.
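The resulting allocation is easy to compute once the stratum sizes, standard deviations, and costs are known. The sketch below assumes the standard result that n_h is proportional to N_h S_h / sqrt(C_h), scaled so that the budget is spent exactly. The function name `optimal_allocation` is hypothetical, the standard deviations are made-up values, and the returned sample sizes are left unrounded.

```python
from math import sqrt

def optimal_allocation(sizes, sds, costs, budget, overhead=0.0):
    """Sample sizes n_h proportional to N_h * S_h / sqrt(C_h).

    The scale factor is chosen so that overhead + sum(C_h * n_h) == budget.
    Values are real numbers; rounding to integers is left to the caller.
    """
    denom = sum(sizes[h] * sds[h] * sqrt(costs[h]) for h in sizes)
    k = (budget - overhead) / denom
    return {h: k * sizes[h] * sds[h] / sqrt(costs[h]) for h in sizes}

sizes = {"E": 4397, "H": 751, "M": 1009}    # stratum sizes from the slides
sds = {"E": 175.0, "H": 600.0, "M": 300.0}  # ASSUMED standard deviations

# Equal unit costs: the allocation is proportional to N_h * S_h.
equal_costs = {"E": 1.0, "H": 1.0, "M": 1.0}
alloc_equal = optimal_allocation(sizes, sds, equal_costs, budget=600)

# High schools four times as expensive to survey: fewer of them are sampled.
unequal_costs = {"E": 1.0, "H": 4.0, "M": 1.0}
alloc_unequal = optimal_allocation(sizes, sds, unequal_costs, budget=600)

print(alloc_equal)
print(alloc_unequal)
```

With equal unit costs and a budget of 600, the sample sizes add up to 600, so the budget plays the role of the total sample size.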

40 Optimal allocation: two special situations

41 An example

42 Optimal allocation: two special situations

43 Optimal allocation for fixed variance
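One may want to minimize cost for a fixed variance. Mathematically, we want to minimize $C = C_0 + \sum_{h=1}^{H} C_h n_h$ subject to $V(\bar{y}_{str})$ being held at a prescribed value. One can use a Lagrange multiplier to show that the optimal $n_h$ is again proportional to $N_h S_h/\sqrt{C_h}$.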

44 Some practical issues
Stratified sampling often gives higher precision than SRS. But how to define strata? Stratification is most efficient when stratum means differ widely.

45 Define strata
Try to find some variables closely related to y. E.g., for farm income, use the size of a farm as a stratification variable. For estimating total business expenditures on advertising, stratify by number of employees or by the type of product. Get information from experts, old data, preliminary data, etc.

46 Effects of unknown strata sizes and variances
Unknown strata sizes and variances cause bias. One can use a pilot study to obtain good estimates of strata sizes and variances.

47 Homework 2 Problem
We are interested in the average school size (number of enrollments). Suppose that the only thing you know about California schools is that there are 6157 schools: 4397 elementary schools, 1009 middle schools and 751 high schools. Use the “pop.enroll” dataframe in the R file for enrollments and school types. There are no missing values in the dataframe. Budget information: you can sample 60 schools for a pilot study to estimate stratum variances. Use Neyman’s allocation to sample 600 schools (from the remaining schools) for your formal study.

48 Homework 2 Problem 1
Note that I didn’t specify the way you can allocate the 60 schools for your pilot study. Consider three methods: an SRS; a stratified sample with proportional allocation; a stratified sample with 20 for each of the three types of schools. Use simulations (say 1000 times) to study how the methods of pilot study affect the precision of your estimate for the average school size.

49 Homework 2 Problem 1
Summarize and discuss your findings. Remark: another unspecified detail is how the data from a pilot study should be used. Should it be discarded? Should it be combined with the data you sample in the formal study? This is up to you.

50 Summary
Stratified sampling almost always gives higher precision than SRS. Stratification adds complexity to a survey, e.g., when strata sizes and variances are unknown. In many situations, the potential gain from stratification is large enough to justify the effort of stratifying the population and the expense of conducting pilot studies.

51 Poststratification
Suppose a sampling frame lists all households in an area. You would like to estimate the average amount spent on food in a month. One desirable stratification variable is household size: large households are expected to have higher food bills. The distribution of household size is known (from U.S. census data).

52 An example of poststratification
The distribution of household size from U.S. census

53 An example of poststratification
The sampling frame does not include information on household size, so we cannot conduct stratified sampling based on household size. We take an SRS and record the amount spent on food and the household size. If n (of the SRS) is large enough, we expect about 26% one-person households and about 31% two-person households, and so on.

54 An example of poststratification
We can use the methods of stratified sampling to estimate the average amount spent on food for each category of household sizes. After the observations are taken, we can form a “stratified” estimate of the population mean.
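A poststratified estimate weights the sample mean of each household-size group by that group's known population share. In the sketch below, the 26% and 31% shares are the ones quoted on the previous slide; collapsing all larger households into a single "3+" group with the remaining 43% is an assumption made only for illustration, the food amounts are made-up toy data, and the function name `poststratified_mean` is hypothetical.

```python
from statistics import fmean

def poststratified_mean(sample, pop_shares):
    """Weight each poststratum's sample mean by its known population share.

    sample: list of (group, value) pairs from an SRS.
    pop_shares: dict mapping group -> N_h / N (should sum to 1).
    """
    by_group = {}
    for group, value in sample:
        by_group.setdefault(group, []).append(value)
    missing = set(pop_shares) - set(by_group)
    if missing:
        raise ValueError(f"no sampled units in poststrata: {sorted(missing)}")
    return sum(share * fmean(by_group[g]) for g, share in pop_shares.items())

# Shares for sizes 1 and 2 are from the slides; the "3+" group is ASSUMED.
pop_shares = {"1": 0.26, "2": 0.31, "3+": 0.43}

# Toy SRS of (household size group, monthly food spending); made-up numbers.
sample = [
    ("1", 200), ("1", 240),
    ("2", 380), ("2", 420),
    ("3+", 600), ("3+", 700), ("3+", 650),
]

post_est = poststratified_mean(sample, pop_shares)
srs_est = fmean(value for _, value in sample)
print(post_est, srs_est)
```

The two estimates differ because the toy sample's group proportions do not match the assumed population shares; poststratification corrects for that imbalance.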

55 An example of poststratification

56 An example of poststratification
Consider the following poststratification method: stratify families into H groups according to the average amount spent on food per month. What is the consequence of doing it?

57 An example of poststratification
Poststratification can be dangerous. You can obtain arbitrarily small variances if you choose the strata after seeing the data. Poststratification is most often used to correct for the effects of differential nonresponse rates in the poststrata.

58 A new sampling method
A motivating example: we want to study the average amount spent on food per month and per person. How would you design a survey?
