Published by Eric Greer. Modified over 9 years ago.
1
Sampling Design and Analysis MTH 494 Lecture-30 Ossam Chohan Assistant Professor CIIT Abbottabad
2
Sampling with unequal probabilities
Up to now, we have only discussed sampling schemes in which the probabilities of choosing sampling units are equal. Equal probabilities give schemes that are often easy to design and explain. Such schemes are not, however, always possible or, if practicable, as efficient as schemes using unequal probabilities. A cluster sample with equal probabilities may result in a large variance for the design-unbiased estimators of the population mean and total.
3
Primary Sampling Units (PSUs)
In sample surveys, the primary sampling unit (commonly abbreviated as PSU) arises in samples in which population elements are grouped into aggregates and the aggregates become units in sample selection. The aggregates are, due to their intended usage, called "sampling units." Primary sampling units are the sampling units selected in the first (primary) stage of a multi-stage sample ultimately aimed at selecting individual elements. In selecting a sample, one may choose elements directly; in such a design, the elements are the only sampling units. One may also group the elements into aggregates, choose the aggregates in a first stage of selection, and then choose elements at a later stage. The aggregates and the elements are both sampling units in such a design. For example, if a survey is selecting households as elements, then counties may serve as the primary sampling unit, with blocks and households...
4
Sampling one primary unit
As a special case, suppose we select just one (n = 1) of the N psus to be in the sample. The total for psu i is denoted by t_i, and we want to estimate the population total, t. Sampling one psu will demonstrate the ideas of unequal-probability sampling without introducing the complications.
5
Understanding example
Let us start out by looking at what happens in a situation in which we know the whole population. A town has four supermarkets, ranging in size from 100 square meters (m²) to 1000 m². We want to estimate the total amount of sales in the four stores for last month by sampling just one of the stores. (Of course, this is just an illustration; if we really had only four supermarkets, we would probably take a census.)
6
You might expect that a larger store would have more sales than a smaller store, and that the variability in total sales among several 1000 m² stores will be greater than the variability in total sales among several 100 m² stores. Since we sample only one store, the probability that a store is selected on the first draw (ϒ_i) is the same as the probability that the store is included in the sample (π_i). For this example, take π_i = ϒ_i = Pr(store i selected).
7
This probability is proportional to the size of the store. Since store A accounts for 1/16 of the total floor area of the four stores, it is sampled with probability 1/16. For illustrative purposes, we know the values of t_i for the whole population. The values are given on the next slide.
8
Store   Size (m²)   ϒ_i     t_i (in thousands)
A       100         1/16    11
B       200         2/16    20
C       300         3/16    24
D       1000        10/16   245
Total   1600        1       300
9
We could select a probability sample of size 1 with the probabilities given above by shuffling cards numbered 1 through 16 and choosing one card. If the card's number is 1, choose store A; if 2 or 3, choose B; if 4, 5 or 6, choose C; and if 7 through 16, choose D. Or we could spin once on a spinner like this:
10
[Figure: spinner]
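The card-shuffling procedure described on the previous slide can be sketched in Python as follows. This is only an illustration: the names `CARD_RANGES`, `store_for_card` and `draw_store` are made up for this sketch and are not part of the lecture.

```python
import random

# Card ranges from the slide: card 1 -> A, 2-3 -> B, 4-6 -> C, 7-16 -> D.
CARD_RANGES = {
    "A": range(1, 2),
    "B": range(2, 4),
    "C": range(4, 7),
    "D": range(7, 17),
}


def store_for_card(card):
    """Map a card number (1..16) to the store it selects."""
    for store, cards in CARD_RANGES.items():
        if card in cards:
            return store
    raise ValueError(f"card must be between 1 and 16, got {card}")


def draw_store(rng=random):
    """Draw one card uniformly at random and return the selected store."""
    return store_for_card(rng.randint(1, 16))


# Number of cards (out of 16) that lead to each store.
cards_per_store = {store: len(cards) for store, cards in CARD_RANGES.items()}
print(cards_per_store)
print(draw_store())
```

Because each of the 16 cards is equally likely, a store's selection probability is its card count divided by 16, which reproduces 1/16, 2/16, 3/16 and 10/16.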
11
We compensate for the unequal probabilities of selection by also using ϒ_i in the estimator. We have already seen such compensation for unequal probabilities in stratified sampling: if we select 10% of the units in stratum 1 and 20% of the units in stratum 2, the sampling weight is 10 for each unit in stratum 1 and 5 for each unit in stratum 2. Here, we select store A with probability 1/16, so store A's sampling weight is 16.
12
If the size of a store is roughly proportional to the total sales for that store, we would expect that store A also has about 1/16 of the total sales and that multiplying store A's sales by 16 would estimate the total sales for all four stores. As always, the sampling weight of unit i is the reciprocal of the probability of selection: w_i = 1/ϒ_i.
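The weighting idea can be checked numerically with the supermarket population. The sketch below uses the selection probabilities and sales totals from the table; the variable names are chosen for illustration only. It computes each store's sampling weight and the estimate of total sales if that store were the one sampled, then averages over the four possible samples.

```python
from fractions import Fraction

# Population values from the slides: (selection probability, sales t_i in thousands).
stores = {
    "A": (Fraction(1, 16), 11),
    "B": (Fraction(2, 16), 20),
    "C": (Fraction(3, 16), 24),
    "D": (Fraction(10, 16), 245),
}

# Sampling weight = 1 / selection probability; estimate of the total = weight * t_i.
weights = {s: 1 / p for s, (p, t) in stores.items()}
estimates = {s: weights[s] * t for s, (p, t) in stores.items()}

# Expectation and variance of the estimator over the four possible samples of size 1.
expected = sum(p * estimates[s] for s, (p, t) in stores.items())
variance = sum(p * (estimates[s] - expected) ** 2 for s, (p, t) in stores.items())

for s in stores:
    print(s, "weight:", float(weights[s]), "estimated total:", float(estimates[s]))
print("expected value:", expected, "variance:", variance)
```

The expected value equals the true total of 300 (thousand), so the weighted estimator is unbiased for this population even though a single sample can be far from 300.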
13
Unequal-probability sampling without replacement
Generally, sampling with replacement is less efficient than sampling without replacement; sampling with replacement is introduced first because of the ease of selecting and analyzing samples. Nevertheless, in large surveys with many small strata, the inefficiencies may wipe out the gains in convenience. Much research has been done on unequal-probability sampling without replacement; the theory is more complicated because the probability that a unit is selected is different for the first unit chosen than for the second, third and subsequent units. When you understand the probabilistic arguments involved, however, you can find the properties of any sampling scheme.
14
Example
The supermarket example discussed above can be used to illustrate some of the features of unequal-probability sampling without replacement. Here is the population again.
15
Store   Size (m²)   ϒ_i     t_i (in thousands)
A       100         1/16    11
B       200         2/16    20
C       300         3/16    24
D       1000        10/16   245
Total   1600        1       300
16
Let's select two psus without replacement and with unequal probabilities. As discussed above, ϒ_i = P(select unit i on first draw). Since we are sampling without replacement, though, the probability that unit j is selected on the second draw depends on which unit was selected on the first draw.
17
One way to select the units with unequal probabilities is to use ϒ_i as the probability of selecting unit i on the first draw, and then adjust the probabilities of selecting the other stores on the second draw. If store A was chosen on the first draw, then to select the second store we would spin the wheel while blocking out the section for store A, or shuffle the deck and redeal without card 1. Thus P(B on second draw | A on first draw) = ϒ_B / (1 − ϒ_A) = (2/16) / (15/16) = 2/15.
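The draw-by-draw scheme just described (first draw with probability ϒ_i, second draw from the remaining stores with their probabilities rescaled) can be enumerated exactly. The following Python sketch assumes that scheme and the probabilities from the table; `ordered_prob`, `pair_prob` and `inclusion` are illustrative names, not notation from the lecture.

```python
from fractions import Fraction
from itertools import permutations

# First-draw selection probabilities from the table.
psi = {
    "A": Fraction(1, 16),
    "B": Fraction(2, 16),
    "C": Fraction(3, 16),
    "D": Fraction(10, 16),
}


def ordered_prob(first, second):
    """P(first on draw 1 and second on draw 2), rescaling after removing `first`."""
    return psi[first] * psi[second] / (1 - psi[first])


# Probability that an unordered pair forms the sample: either order may occur.
pair_prob = {}
for i, j in permutations(psi, 2):
    key = frozenset((i, j))
    pair_prob[key] = pair_prob.get(key, 0) + ordered_prob(i, j)

# Inclusion probability of a store: sum over all pairs that contain it.
inclusion = {s: sum(p for pair, p in pair_prob.items() if s in pair) for s in psi}

print({"".join(sorted(pair)): float(p) for pair, p in pair_prob.items()})
print({s: float(p) for s, p in inclusion.items()})
```

The pair probabilities sum to 1 and the inclusion probabilities sum to the sample size 2. Note that the inclusion probabilities are no longer equal to the first-draw probabilities, which is one reason the without-replacement theory is more involved.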
18
Rest of the solution
19
20
Summary
Summary of the Designs and Methods
21
You will recall that the objective of statistics is to make inferences about a population from information contained in a sample. This unit discusses the design of sample surveys and associated methods of inference for populations containing a finite number of elements. Practical examples have been selected primarily from the fields of business and the social sciences where finite populations of human responses are frequently the target of surveys. Natural resource management examples are also included.
22
Summary
The method of inference employed for most sample surveys is estimation. Thus we consider appropriate estimators for population parameters and the associated two-standard-deviation bound on the error of estimation. In repeated sampling, the error of estimation will be less than its bound with probability approximately equal to 0.95. Equivalently, we construct confidence intervals that, in repeated sampling, enclose the true population parameter approximately 95 times out of 100.
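As a concrete instance of a two-standard-deviation bound, the sketch below estimates a population mean from a simple random sample and reports the bound 2·sqrt((1 − n/N)·s²/n). The choice of this particular estimator, the function name `srs_mean_with_bound`, and the data values are assumptions made for illustration; they do not come from the lecture.

```python
import math
import statistics


def srs_mean_with_bound(sample, population_size):
    """Return (sample mean, two-standard-deviation bound) for a simple random sample."""
    n = len(sample)
    if n < 2 or n > population_size:
        raise ValueError("need 2 <= n <= N")
    ybar = statistics.mean(sample)
    s2 = statistics.variance(sample)   # sample variance with divisor n - 1
    fpc = 1 - n / population_size      # finite population correction
    bound = 2 * math.sqrt(fpc * s2 / n)
    return ybar, bound


# Hypothetical sample of n = 5 measurements from a population of N = 100 elements.
ybar, bound = srs_mean_with_bound([4, 6, 5, 7, 8], 100)
print(f"estimate: {ybar}, bound on the error of estimation: {bound:.3f}")
print(f"approximate 95% interval: ({ybar - bound:.3f}, {ybar + bound:.3f})")
```

The interval estimate ± bound is the approximate 95% confidence interval mentioned in the summary.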
23
The quantity of information pertinent to a given parameter is measured by the bound on the error of estimation.
24
The first segment, presented in the initial discussions, reviews the objective of statistics and points to the peculiarities of problems arising in the social sciences, business and natural resource management that make them different from the traditional type of experiment conducted in the laboratory.
25
The basic sample survey design, simple random sampling, is presented first. For this design the sample is selected so that every sample of size n in the population has an equal chance of being chosen. The design does not make a specific attempt to reduce the cost of the desired quantity of information. It is the most basic type of sample survey design, and all other designs are compared with it.
26
The second type of design, stratified random sampling, divides the population into homogeneous groups called strata. This procedure usually produces an estimator with a smaller variance than that obtained by simple random sampling. Thus the cost of the survey can be reduced by selecting fewer elements to achieve an equivalent bound on the error of estimation.
27
The third type of sample survey design is systematic sampling, which is usually applied to population elements that are available in a list or line, such as names on file cards in a drawer or people coming out of a factory. A random starting point is selected and then every kth element thereafter is sampled. Systematic sampling is frequently conducted when collecting a simple random or a stratified random sample is extremely costly or impossible. Once again, the reduction in survey cost is primarily associated with the cost of collecting the sample.
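The selection rule for systematic sampling (random start, then every kth element) is short enough to sketch directly. The function name `systematic_sample` and the 20-element frame below are hypothetical and serve only to illustrate the rule.

```python
import random


def systematic_sample(items, k, rng=random):
    """1-in-k systematic sample: random start among the first k, then every kth element."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    start = rng.randrange(k)   # 0-based start position, uniform on 0..k-1
    return items[start::k]


# Hypothetical frame of 20 listed elements, sampled 1-in-5.
frame = list(range(1, 21))
sample = systematic_sample(frame, 5, random.Random(1))
print(sample)
```

Only one random number is needed regardless of the sample size, which is part of why the method is inexpensive to carry out in the field.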
28
The fourth type of sample survey design is cluster sampling. Cluster sampling may reduce cost because each sampling unit is a collection of elements usually selected so as to be physically close together. Cluster sampling is most often used when a frame that lists all population elements is not available or when travel costs from element to element are considerable. Cluster sampling reduces the cost of the survey primarily by reducing the cost of collecting the data.
29
A discussion of ratio, regression, and difference estimators, which utilize information on an auxiliary variable, is covered in the third segment of the material. The ratio estimator illustrates how additional information, frequently acquired at little cost, can be used to reduce the variance of the estimator and, consequently, reduce the overall cost of a survey. It also suggests the possibility of acquiring more sophisticated estimators by using information on more than one auxiliary variable.
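To make the role of the auxiliary variable concrete, here is a minimal sketch of the ratio estimator of a population total, t̂_y = (Σy / Σx)·t_x, where t_x is the known population total of the auxiliary variable x. The function name and the numbers are hypothetical and chosen for illustration only.

```python
def ratio_estimate_total(y_sample, x_sample, x_population_total):
    """Ratio estimator of the population total of y using auxiliary variable x."""
    if len(y_sample) != len(x_sample):
        raise ValueError("y_sample and x_sample must have the same length")
    ratio = sum(y_sample) / sum(x_sample)   # estimated ratio of y to x
    return ratio * x_population_total


# Hypothetical data: y is observed only in the sample, x is known for the whole population.
y = [2, 4, 6]
x = [1, 2, 3]
print(ratio_estimate_total(y, x, 100))
```

The estimator works well when y is roughly proportional to x, the same intuition that motivated sampling the supermarkets with probability proportional to floor area.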
30
This unit on ratio estimation follows naturally from the discussion of simple random sampling in the previous unit. That is, you can take a measurement of y, the response of interest, for each element of the simple random sample and utilize the traditional estimators.
31
To summarize, we have presented various elementary sample survey designs along with their associated methods of inference. Treatment of the topics has been directed towards practical applications so that you can see how sample survey design can be employed to make inferences at minimum cost when sampling from finite social, business or natural resource populations.