Lecture-4 Sampling Methods 2. Stratified Random Sampling. Engr. Dr. Attaullah Shah
Simple Random Sampling: Used when there is inadequate information for developing a conceptual model for a site or for stratifying a site. It is a sample in which the probabilities of selection are known. Sampling units are chosen by a method that uses chance to determine selection.
Simple random sampling is the basis for all probability sampling techniques and is the point of reference from which modifications to increase sampling efficiency may be made. Alone, simple random sampling may not give the desired precision.
Simple Random Sampling. Advantages: prior information about the population is not necessary; easy to perform and easy to analyze. Disadvantages: may not give the desired precision; needs a sampling frame. One way to overcome the precision problem while still keeping the advantages of random sampling is to use stratified random sampling. This involves dividing the units in the population into non-overlapping strata and selecting an independent simple random sample from each of these strata.
Stratified Random Sampling: Prior knowledge of the sampling area and information obtained from background data may be used to reduce the number of observations necessary to attain a specified precision. The goal is to increase precision and control sources of variability in the data.
Stratified Random Sampling: Variability between strata must be larger than variability within strata for any benefit to be seen. Sampling within each stratum is done with a simple random sample.
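The selection step can be sketched in Python as follows. This is a minimal illustration rather than a prescribed procedure: the function name `stratified_sample`, the stratum labels, and the sample sizes are all hypothetical, and the strata are assumed to be given as non-overlapping lists of unit identifiers.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw an independent simple random sample from each stratum.

    strata: dict mapping a stratum label to the list of units in it
    sizes:  dict mapping the same labels to the sample size n_i
    """
    rng = random.Random(seed)
    # rng.sample draws without replacement, so no unit is picked twice.
    return {label: rng.sample(units, sizes[label])
            for label, units in strata.items()}

# Hypothetical population of 400 unit IDs split into three strata.
strata = {
    "A": list(range(0, 100)),
    "B": list(range(100, 150)),
    "C": list(range(150, 400)),
}
sizes = {"A": 10, "B": 5, "C": 25}

sample = stratified_sample(strata, sizes, seed=1)
for label, units in sample.items():
    print(label, len(units))
```

Because each stratum is sampled separately, the sample size in every stratum is fixed in advance, which a single simple random sample of the whole population cannot guarantee.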
Stratified Random Sampling. Advantages: gives estimates for subgroups; can be more precise than simple random sampling; can be more convenient to implement. Disadvantages: requires prior information about the population; more complicated computation.
Potential gains of Stratified Sampling: First, if the individuals within strata are more similar than individuals in general, then the estimate of the overall population mean will have a smaller standard error than can be obtained from a simple random sample of the same size. Second, there may be value in having separate estimates of population parameters for the different strata. Third, stratification makes it possible to sample different parts of a population in different ways, which may make some cost savings possible.
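The first gain can be made concrete with a short derivation. It is a sketch under simplifying assumptions that are not part of the lecture: proportional allocation ($n_i = n N_i / N$), strata large enough that finite-population corrections can be ignored, $W_i = N_i / N$, and $\sigma_i^2$ denoting the true variance within stratum $i$. The population variance splits into a within-strata part and a between-strata part,

$$\sigma^2 \approx \sum_{i} W_i \sigma_i^2 + \sum_{i} W_i (\mu_i - \mu)^2 ,$$

so the variance of the simple random sample mean, $\sigma^2 / n$, exceeds the variance of the stratified estimate, $\frac{1}{n}\sum_i W_i \sigma_i^2$, by approximately

$$\frac{1}{n} \sum_{i} W_i (\mu_i - \mu)^2 .$$

The more the stratum means differ from one another, the larger this saving; if all stratum means are equal, stratification with proportional allocation gives essentially no gain in precision.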
Assume that $K$ strata have been chosen, with the $i$th of these having size $N_i$ and the total population size being $\sum N_i = N$. Then if a random sample of size $n_i$ is taken from the $i$th stratum, the sample mean $\bar{y}_i$ will be an unbiased estimate of the true stratum mean $\mu_i$, with estimated variance
$$\widehat{\mathrm{Var}}(\bar{y}_i) = \frac{s_i^2}{n_i}\left(1 - \frac{n_i}{N_i}\right),$$
where $s_i$ is the sample standard deviation within the stratum. In terms of the true stratum means, the overall population mean is the weighted average
$$\mu = \sum_{i=1}^{K} \frac{N_i}{N}\,\mu_i .$$
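The corresponding sample estimate is
$$\bar{y}_s = \sum_{i=1}^{K} \frac{N_i}{N}\,\bar{y}_i ,$$
with estimated variance
$$\widehat{\mathrm{Var}}(\bar{y}_s) = \sum_{i=1}^{K} \left(\frac{N_i}{N}\right)^2 \frac{s_i^2}{n_i}\left(1 - \frac{n_i}{N_i}\right).$$
The estimated standard error of $\bar{y}_s$ is $\widehat{\mathrm{SE}}(\bar{y}_s)$, the square root of the estimated variance, and an approximate $100(1 - \alpha)\%$ confidence interval for the population mean is given by
$$\bar{y}_s \pm z_{\alpha/2}\,\widehat{\mathrm{SE}}(\bar{y}_s),$$
where $z_{\alpha/2}$ is the value exceeded with probability $\alpha/2$ by the standard normal distribution. If the population total is of interest, then this can be estimated by
$$T_s = N\,\bar{y}_s ,$$
with estimated standard error
$$\widehat{\mathrm{SE}}(T_s) = N\,\widehat{\mathrm{SE}}(\bar{y}_s).$$
Again, an approximate $100(1 - \alpha)\%$ confidence interval takes the form
$$T_s \pm z_{\alpha/2}\,\widehat{\mathrm{SE}}(T_s).$$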
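A minimal Python sketch of these calculations follows. It assumes the standard estimators: stratum sample means weighted by N_i/N, and a finite-population correction (1 − n_i/N_i) in each stratum's variance term. The function name `stratified_estimates` and the two-stratum data are hypothetical, chosen only to exercise the arithmetic.

```python
import math
from statistics import NormalDist, mean, variance

def stratified_estimates(samples, stratum_sizes, alpha=0.05):
    """Estimate the population mean and total from a stratified random sample.

    samples:       one list of observations per stratum
    stratum_sizes: the population size N_i of each stratum, in the same order
    """
    N = sum(stratum_sizes)
    est_mean = 0.0
    est_var = 0.0
    for y, N_i in zip(samples, stratum_sizes):
        n_i = len(y)
        s2_i = variance(y)  # sample variance with divisor n_i - 1
        est_mean += (N_i / N) * mean(y)
        est_var += (N_i / N) ** 2 * (s2_i / n_i) * (1 - n_i / N_i)
    se = math.sqrt(est_var)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return {
        "mean": est_mean,
        "se_mean": se,
        "ci_mean": (est_mean - z * se, est_mean + z * se),
        "total": N * est_mean,
        "se_total": N * se,
        "ci_total": (N * (est_mean - z * se), N * (est_mean + z * se)),
    }

# Hypothetical data: two strata with N_1 = 30 and N_2 = 70 units.
result = stratified_estimates(
    samples=[[1.0, 2.0, 3.0], [10.0, 12.0, 14.0, 16.0]],
    stratum_sizes=[30, 70],
)
print(result["mean"], result["se_mean"], result["total"])
```

The total and its standard error are the mean and its standard error scaled by N, so the confidence interval for the total is simply N times the interval for the mean.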
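When a stratified sample of points in a spatial region is carried out, it will often be the case that there are an unlimited number of sample points that can be taken from any of the strata, so that $N_i$ and $N$ are infinite. The estimate of the population mean can then be written as
$$\bar{y}_s = \sum_{i=1}^{K} w_i\,\bar{y}_i ,$$
and, because the finite-population correction $n_i/N_i$ vanishes, its estimated variance becomes
$$\widehat{\mathrm{Var}}(\bar{y}_s) = \sum_{i=1}^{K} w_i^2\,\frac{s_i^2}{n_i},$$
where $w_i$, the proportion of the total study area within the $i$th stratum, replaces $N_i/N$.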
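For the spatial case, the same calculation can be sketched with area proportions in place of N_i/N and no finite-population correction. The function name `area_weighted_estimates`, the weights, and the data below are hypothetical; the weights are assumed to sum to 1.

```python
import math
from statistics import mean, variance

def area_weighted_estimates(samples, weights):
    """Stratified mean and standard error when strata are effectively infinite.

    samples: one list of observations per stratum
    weights: proportion of the study area in each stratum (must sum to 1)
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("area proportions must sum to 1")
    est_mean = sum(w * mean(y) for y, w in zip(samples, weights))
    # No (1 - n_i/N_i) factor: the sampling fraction is zero in every stratum.
    est_var = sum(w ** 2 * variance(y) / len(y)
                  for y, w in zip(samples, weights))
    return est_mean, math.sqrt(est_var)

# Hypothetical example: two strata covering 30% and 70% of the study area.
area_mean, area_se = area_weighted_estimates(
    samples=[[1.0, 2.0, 3.0], [10.0, 12.0, 14.0, 16.0]],
    weights=[0.3, 0.7],
)
print(area_mean, area_se)
```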
Example 2.3: Bracken Density in Otago. As part of a study of the distribution of scrub weeds in New Zealand, data were obtained on the density of bracken on 1-hectare (ha, 100 × 100 m) pixels along a transect 90 km long and 3 km wide, running from Balclutha to Katiki Point on the South Island of New Zealand, as shown in Figure 2.2 (Gonzalez and Benwell 1994). This example involves a comparison between estimating the density (the percentage of the land in the transect covered with bracken) using (a) a simple random sample of 400 pixels, and (b) a stratified random sample with five strata and the same total sample size. There are altogether 27,000 pixels in the entire transect, most of which contain no bracken. The simple random sample of 400 pixels was found to contain 377 with no bracken, 14 with 5% bracken, 6 with 15% bracken, and 3 with 30% bracken. The sample mean is therefore $\bar{y} = 0.625\%$, the sample standard deviation is $s = 3.261$, and the estimated standard error of the mean is
$$\widehat{\mathrm{SE}}(\bar{y}) = \sqrt{\frac{s^2}{n}\left(1 - \frac{n}{N}\right)} = \sqrt{\frac{3.261^2}{400}\left(1 - \frac{400}{27000}\right)} = 0.162 .$$
The approximate 95% confidence limits for the true population mean density are therefore 0.625 ± 1.96 × 0.162, or 0.31% to 0.94%.
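The arithmetic for the simple random sample can be checked with a short script built from the pixel counts quoted in the example. It assumes the finite-population correction (1 − n/N) is applied to the standard error, which is what reproduces the reported value of 0.162; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

N = 27000  # pixels in the whole transect
# Bracken density (%) for each sampled pixel, expanded from the counts.
data = [0.0] * 377 + [5.0] * 14 + [15.0] * 6 + [30.0] * 3
n = len(data)  # 400

y_bar = mean(data)
s = stdev(data)  # sample standard deviation, divisor n - 1
se = math.sqrt((s ** 2 / n) * (1 - n / N))
lower, upper = y_bar - 1.96 * se, y_bar + 1.96 * se

print(round(y_bar, 3), round(s, 3), round(se, 3))  # 0.625 3.261 0.162
print(round(lower, 2), round(upper, 2))            # 0.31 0.94
```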
The strata for stratified sampling were five stretches of the transect, each about 18 km long, and each containing 5400 pixels. The sample results and some of the calculations for this sample are shown in Table 2.4. The estimated population mean density, the weighted average of the stratum sample means, is 0.613%, with an estimated variance of 0.0208. The estimated standard error is therefore √0.0208 = 0.144, and the approximate 95% confidence limits for the true population mean density are 0.613 ± 1.96 × 0.144, or 0.33% to 0.90%.
Post-Stratification: Can be used when stratification is appropriate for some key variable but cannot be done until after the sample is selected. It is often appropriate when a simple random sample is not properly balanced according to major groupings.
A simple random sample is expected to place sample units in the different strata in proportion to the sizes of those strata. Therefore, post-stratification should be quite similar to stratified sampling with proportional allocation, provided that the total sample size is reasonably large. It therefore has considerable potential merit as a method that permits the stratification to be changed after a sample has been selected. This may be particularly valuable in situations where the data may be used for a variety of purposes, some of which are not known at the time of sampling.
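A minimal sketch of a post-stratified estimate of the mean is given below. It rests on assumptions the lecture does not spell out: the stratum sizes N_i are known, each sampled unit can be assigned to its stratum after selection, and the estimate weights each stratum's sample mean by N_i/N as in ordinary stratified sampling. The function name `post_stratified_mean` and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def post_stratified_mean(observations, stratum_sizes):
    """Post-stratified estimate of the population mean.

    observations:  (stratum_label, value) pairs from a simple random sample,
                   with labels assigned after the sample was drawn
    stratum_sizes: dict mapping each stratum label to its known size N_i
    """
    groups = defaultdict(list)
    for label, value in observations:
        groups[label].append(value)
    # A stratum with no sampled units has no sample mean to weight.
    missing = set(stratum_sizes) - set(groups)
    if missing:
        raise ValueError(f"no sample units fell in strata: {sorted(missing)}")
    N = sum(stratum_sizes.values())
    return sum(stratum_sizes[label] / N * mean(groups[label])
               for label in stratum_sizes)

# Hypothetical sample: two units landed in stratum "a", one in stratum "b".
obs = [("a", 1.0), ("a", 3.0), ("b", 10.0)]
ps_mean = post_stratified_mean(obs, {"a": 25, "b": 75})
print(ps_mean)
```

Unlike a designed stratified sample, the number of units falling in each stratum is random here, which is why a reasonably large total sample size is needed before the two approaches behave alike.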