SAMPLING METHODS
Reasons for Sampling
Samples can be studied more quickly than populations. A study of a sample is less expensive than a study of an entire population, because a smaller number of items or subjects is examined. This consideration is especially important in the design of large studies that require a lengthy follow-up. A study of an entire population (census) is impossible in most situations. Sometimes, the process of the study destroys or depletes the item being studied.
Sample results are often more accurate than results based on a population. If samples are properly selected, probability methods can be used to estimate the error in the resulting statistics. It is this aspect of sampling that permits investigators to make probability statements about observations in a study.
SAMPLING METHODS
The primary purpose of sampling is to estimate certain population parameters such as means, totals, proportions or ratios. A probability sample has the characteristic that every element in the population has a known, nonzero probability of being included in the sample. A non-probability sample is one that does not have this feature. Sampling methods therefore fall into two groups: probability sampling and non-probability sampling.
Probability Sampling Methods
1. Simple Random Sampling
2. Stratified Random Sampling
3. Systematic Sampling
4. Cluster Sampling
Simple Random Sampling
A simple random sample is one in which every subject has an equal probability of being selected for the study. The recommended way to select a simple random sample is to use a table of random numbers or a computer-generated list of random numbers.
To select a simple random sample of size n from a population of size N:
1. List and number each element in the population from 1 to N (1, 2, 3, …, N−1, N).
2. Determine the required sample size, n.
3. Select n random numbers by a random process, e.g. a table of random numbers or software such as MS Excel.
4. Take the subjects from the population corresponding to the selected random numbers.
5. Estimate the population values (parameters).
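The five steps can be sketched in Python as a minimal illustration, assuming the standard-library `random` module plays the role of the "computer-generated list of random numbers". The function name `simple_random_sample` and the `seed` argument are hypothetical choices made for this sketch; sampling is without replacement, so no subject can be chosen twice.

```python
import random


def simple_random_sample(population_size, n, seed=None):
    """Return n distinct subject numbers from 1..population_size.

    Every subject has the same chance of selection (sampling without
    replacement). The seed only makes the draw reproducible.
    """
    if not 0 < n <= population_size:
        raise ValueError("n must be between 1 and the population size")
    rng = random.Random(seed)
    # Step 1: subjects are numbered 1..N.
    # Step 3: draw n of those numbers at random.
    return sorted(rng.sample(range(1, population_size + 1), n))


# Step 2: the sample size is fixed at n = 10.
# Step 4: the subjects carrying these numbers form the sample.
selected = simple_random_sample(500, 10, seed=1)
print(selected)
```

Each run with a different seed gives a different, equally likely sample. Step 5 (estimation) is carried out on the measurements taken from the selected subjects.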
Example: From a population of size N = 500, select a random sample of size n = 10. Number the subjects from 1 to 500. In the table of random numbers, start from a random starting point, 838, and move down the column. Take the numbers ≤ 500: the subjects numbered 404, 100, 215, 290, 479, 487, 69, 405, 290 in the population will constitute the sample. Make observations on the selected subjects and estimate the parameters.
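The table-reading rule can be written as a short routine: read numbers in order, discard any number greater than N (or equal to 0), skip a number that has already been drawn, and stop once n subjects are found. This is a sketch only. The list `readings` below is a hypothetical column of three-digit table entries, not the table used in the example, and the helper name `select_from_table` is likewise made up for illustration.

```python
def select_from_table(readings, population_size, n):
    """Walk down a column of random-number-table readings.

    Keep each number between 1 and N that has not been chosen yet,
    until n subject numbers have been collected.
    """
    chosen = []
    for number in readings:
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == n:
                break
    if len(chosen) < n:
        raise ValueError("not enough usable readings; continue in the next column")
    return chosen


# Hypothetical readings starting at 838 (illustrative, not the real table).
readings = [838, 127, 954, 362, 127, 48, 501, 299, 716, 430,
            85, 612, 257, 390, 174, 0, 466]
result = select_from_table(readings, 500, 10)
print(result)  # [127, 362, 48, 299, 430, 85, 257, 390, 174, 466]
```

Here 838, 954, 501, 716, 612 and 0 are rejected because they fall outside 1 to 500, and the second 127 is skipped because that subject is already in the sample.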
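Point Estimates
From the sample, calculate statistics to estimate the parameters (population values): the sample mean $\bar{x} = \frac{\sum x_i}{n}$ estimates the population mean μ, and the sample proportion $p = \frac{a}{n}$ (a = number of sampled subjects with the characteristic) estimates the population proportion P.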
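Interval Estimates
Confidence interval for the population mean:
One tail: $\mu \ge \bar{x} - t_{(n-1,\,\alpha)}\frac{S}{\sqrt{n}}$ (or, for an upper bound, $\mu \le \bar{x} + t_{(n-1,\,\alpha)}\frac{S}{\sqrt{n}}$)
Two tails: $\bar{x} - t_{(n-1,\,\alpha/2)}\frac{S}{\sqrt{n}} \le \mu \le \bar{x} + t_{(n-1,\,\alpha/2)}\frac{S}{\sqrt{n}}$
where S is the sample standard deviation, n is the sample size and t is the tabulated t value with n − 1 degrees of freedom.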
Example: A researcher wishes to estimate the average age of mothers at first birth. He selects 10 mothers at random and gathers the following data:
Mother No. | Age at first birth
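Point estimate of the population mean: $\bar{x} = \frac{\sum x_i}{n}$
Sample standard deviation: $S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$
Estimated standard error of the mean: $S_{\bar{x}} = \frac{S}{\sqrt{n}}$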
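If the researcher wishes to be 95% confident in his estimate (n − 1 = 9 degrees of freedom):
One tail: $\mu \ge \bar{x} - 1.833\,\frac{S}{\sqrt{n}}$
Two tails: $\bar{x} - 2.262\,\frac{S}{\sqrt{n}} \le \mu \le \bar{x} + 2.262\,\frac{S}{\sqrt{n}}$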
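The calculation of the point estimate, standard deviation, standard error and two-tailed interval can be sketched as follows. This is an illustration under assumptions. The original data table is not reproduced here, so the ages in `ages` are hypothetical values. The critical value 2.262 (t for 9 degrees of freedom, two tails, 95%) is passed in by hand, as it would be read from a t table, rather than computed by a statistics library.

```python
import math
import statistics


def mean_confidence_interval(data, t_crit):
    """Two-tailed CI for the population mean: x̄ ± t * S / sqrt(n).

    t_crit is the tabulated t value for n - 1 degrees of freedom.
    """
    n = len(data)
    x_bar = statistics.mean(data)   # point estimate of μ
    s = statistics.stdev(data)      # sample standard deviation (divisor n - 1)
    se = s / math.sqrt(n)           # estimated standard error of the mean
    return x_bar, s, se, (x_bar - t_crit * se, x_bar + t_crit * se)


# Hypothetical ages at first birth for 10 mothers (not the data from the notes).
ages = [19, 21, 24, 18, 26, 22, 20, 25, 23, 22]
x_bar, s, se, (lower, upper) = mean_confidence_interval(ages, t_crit=2.262)
print(round(x_bar, 2), round(s, 2), round(se, 2), round(lower, 2), round(upper, 2))
# 22 2.58 0.82 20.15 23.85
```

For a one-tailed lower bound, replace 2.262 with 1.833 and keep only `x_bar - t_crit * se`.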
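CONFIDENCE INTERVAL FOR A POPULATION PROPORTION
When the population proportion P is unknown, its estimate, the sample proportion p, can be used in the standard error: $S_p = \sqrt{\frac{p(1-p)}{n}}$. The confidence interval is then $p \pm z\,S_p$, with z = 1.96 for 95% confidence.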
Example: A researcher wishes to estimate, with 95% confidence, the proportion of women who are 20 years of age or younger at first birth.
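Point estimate of the population proportion: p = a/n = 4/10 = 0.4
Estimated standard error of the proportion: $S_p = \sqrt{\frac{0.4 \times 0.6}{10}} \approx 0.155$
95% confidence interval: $0.4 \pm 1.96 \times 0.155$, i.e. about 0.096 to 0.704.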
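In the above example, if the sample size were 100 instead of 10 (with p still 0.4), the 95% confidence interval would be $0.4 \pm 1.96\sqrt{\frac{0.4 \times 0.6}{100}} = 0.4 \pm 0.096$, i.e. about 0.304 to 0.496. The interval is considerably narrower.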
Example: Among 250 students of Hacettepe University who were interviewed, 185 responded that they regularly read a daily newspaper. With 95% confidence, find an interval within which the proportion of students at Hacettepe University who regularly read a newspaper lies.
Point estimate of the proportion of students who read a newspaper: p = 185/250 = 0.74.
The standard error of the estimate is $S_p = \sqrt{\frac{0.74 \times 0.26}{250}} \approx 0.0277$.
In other words, the standard deviation of the proportions that can be computed from all possible samples of size 250 is approximately 0.0277. The 95% confidence interval is $0.74 \pm 1.96 \times 0.0277$, i.e. about 0.686 to 0.794.
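The three proportion calculations in this section follow the same formula, so one small function covers them all. The function name `proportion_confidence_interval` is hypothetical. The z value of 1.96 is the 95% normal critical value used in the worked examples.

```python
import math


def proportion_confidence_interval(p, n, z=1.96):
    """CI for a population proportion: p ± z * sqrt(p(1 - p) / n).

    The sample proportion p stands in for the unknown P in the standard error.
    """
    se = math.sqrt(p * (1 - p) / n)  # estimated standard error of p
    return se, (p - z * se, p + z * se)


# Worked examples: p = 0.4 with n = 10 and n = 100; p = 185/250 with n = 250.
for p, n in [(0.4, 10), (0.4, 100), (185 / 250, 250)]:
    se, (lower, upper) = proportion_confidence_interval(p, n)
    print(f"p={p:.2f} n={n}: SE={se:.4f}, 95% CI=({lower:.3f}, {upper:.3f})")
# p=0.40 n=10: SE=0.1549, 95% CI=(0.096, 0.704)
# p=0.40 n=100: SE=0.0490, 95% CI=(0.304, 0.496)
# p=0.74 n=250: SE=0.0277, 95% CI=(0.686, 0.794)
```

Because the standard error shrinks with √n, multiplying the sample size by 10 narrows the interval by a factor of about √10 ≈ 3.2.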
Systematic Sampling
A systematic random sample is one in which every kth item is selected; k is determined by dividing the number of items in the population by the desired sample size (k = N/n). A random starting point i is chosen between 1 and k, and the items i, i+k, i+2k, i+3k, … up to N are selected.
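Systematic selection can be sketched as follows: compute the interval k, pick a random start i between 1 and k, then take every kth subject. This is a minimal sketch. When N/n is not a whole number, it assumes that k is rounded down and that the list is cut at n subjects. That is one common convention, not necessarily the one intended in these notes. The function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(population_size, n, seed=None):
    """Select every k-th subject (k = N // n) after a random start i in 1..k."""
    k = population_size // n          # sampling interval (rounded down if N/n is not whole)
    rng = random.Random(seed)
    start = rng.randint(1, k)         # random start i, with 1 <= i <= k
    # i, i + k, i + 2k, ... up to N, truncated to n subjects
    return list(range(start, population_size + 1, k))[:n]


sample = systematic_sample(500, 10, seed=3)
print(sample)
```

With N = 500 and n = 10, k = 50, so the sample is the start i followed by i + 50, i + 100, …, i + 450.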
Stratified Sampling
A stratified random sample is one in which the population is first divided into relevant strata (subgroups) that are internally homogeneous with respect to the variable of interest. A random sample is then selected from each stratum. The characteristics used to stratify should be related to the measurement of interest. In that case, stratified random sampling is the most efficient method, meaning that it requires the smallest sample size for a given precision.
Stratum | Stratum size | Sample size
1 | N_1 | n_1
2 | N_2 | n_2
… | … | …
k | N_k | n_k
TOTAL | N | n
From each stratum, select a random sample independently, with a size proportional to the size of that stratum: $n_i = n \frac{N_i}{N}$.
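Proportional allocation $n_i = n N_i / N$ rarely gives whole numbers, so some rounding rule is needed. The sketch below uses a largest-remainder rule: take the integer part of each share, then give the leftover units to the strata with the largest fractional parts. That rule is an assumption of this example, not something the notes prescribe. The stratum names and sizes are hypothetical.

```python
def proportional_allocation(strata_sizes, n):
    """Allocate n sample units to strata in proportion to their sizes N_i."""
    N = sum(strata_sizes.values())
    exact = {s: n * size / N for s, size in strata_sizes.items()}
    alloc = {s: int(x) for s, x in exact.items()}   # integer part of each share
    shortfall = n - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:shortfall]:
        alloc[s] += 1
    return alloc


strata = {"urban": 600, "semi-urban": 250, "rural": 150}  # hypothetical N_i
allocation = proportional_allocation(strata, 40)
print(allocation)  # {'urban': 24, 'semi-urban': 10, 'rural': 6}
```

A random sample of the allocated size is then drawn independently within each stratum, for example with the simple random sampling procedure described earlier.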
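Estimation of the parameters
The population mean is estimated by weighting each stratum's sample mean by that stratum's share of the population, $\bar{x}_{st} = \sum_{i=1}^{k} \frac{N_i}{N}\,\bar{x}_i$. A population proportion is estimated in the same way, $p_{st} = \sum_{i=1}^{k} \frac{N_i}{N}\,p_i$.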
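The weighted stratified estimate of the mean can be sketched as follows. Both the stratum sizes and the sample values (birth weights in grams) are hypothetical and serve only to show the arithmetic. The name `stratified_mean` is also made up for this illustration.

```python
import statistics


def stratified_mean(strata):
    """Stratified estimate of the mean: sum over strata of (N_i / N) * x̄_i.

    strata is a list of (N_i, sample values from stratum i) pairs.
    """
    N = sum(size for size, _ in strata)
    return sum(size / N * statistics.mean(values) for size, values in strata)


# Hypothetical strata: (stratum size, sampled birth weights in grams)
strata = [(600, [3100, 3300, 3200]), (400, [2800, 3000])]
estimate = stratified_mean(strata)
print(round(estimate, 1))  # 3080.0  (0.6 * 3200 + 0.4 * 2900)
```

Under proportional allocation, each stratum's weight in the sample equals its weight in the population, so this estimate coincides with the ordinary mean of the pooled sample.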
Cluster Sampling
A cluster random sample results from a two-stage process in which the population is divided into clusters and a subset of the clusters is randomly selected. Clusters are commonly based on geographic areas or districts, so this approach is used more often in epidemiologic research than in clinical studies.
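The two stages can be sketched as follows: first group the population into clusters, then randomly select whole clusters and keep every member of each selected cluster. The district names and household identifiers below are hypothetical. The sketch also assumes that all members of a chosen cluster are studied. Some designs instead subsample within clusters, and that variant is not shown here.

```python
import random


def cluster_sample(clusters, m, seed=None):
    """Randomly select m whole clusters and return all members of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # random subset of cluster names
    return {name: clusters[name] for name in chosen}


# Hypothetical districts (clusters) with household IDs as members
districts = {
    "District A": [101, 102, 103],
    "District B": [201, 202],
    "District C": [301, 302, 303, 304],
    "District D": [401, 402],
    "District E": [501, 502, 503],
}
sample = cluster_sample(districts, 2, seed=7)
print(sample)
```

The sampling unit is the cluster rather than the individual, so only a list of clusters, not a full list of every subject, is needed at the selection stage.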
Non-probability Sampling The sampling methods just discussed are all based on probability, but nonprobability sampling methods also exist, such as convenience samples or quota samples. Nonprobability samples are those in which the probability that a subject is selected is unknown. Nonprobability samples often reflect selection biases of the person doing the study and do not fulfill the requirements of randomness needed to estimate sampling error. When we use the term “sample” in the context of observational studies, we will assume that the sample has been randomly selected in an appropriate way.
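DETERMINATION OF THE SAMPLE SIZE
How large a sample is needed for estimating:
a) Population mean, μ:
i) When the population size, N, is unknown: $n = \frac{t^2 \sigma^2}{d^2}$
ii) When the population size, N, is known: $n = \frac{N t^2 \sigma^2}{d^2 (N-1) + t^2 \sigma^2}$
where σ is the population standard deviation, d is the maximum acceptable difference between the sample estimate and the unknown population mean, and t is the tabulated value for the chosen confidence level (1.96 for 95% confidence).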
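Example: If we wish, with 95% confidence, to estimate the average birth weight of infants to within 250 g of the unknown population mean, how large a sample should we select? (Assume σ = 700 g.)
When N is unknown: $n = \frac{1.96^2 \times 700^2}{250^2} \approx 30.1$, so about 31 infants.
When N = 60 and d = 400 g, the required sample size is $n = \frac{60 \times 1.96^2 \times 700^2}{400^2 \times 59 + 1.96^2 \times 700^2} \approx 9.97 \approx 10$.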
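Both sample-size formulas for a mean fit in one function, which makes it easy to check the worked example. The function name `sample_size_mean` is hypothetical. Following the example, t is taken as 1.96 for 95% confidence. The result is rounded up with `math.ceil`, because a fractional requirement means one more subject is needed.

```python
import math


def sample_size_mean(sigma, d, t=1.96, N=None):
    """Required n for estimating a mean to within d.

    N unknown: n = t²σ² / d²
    N known:   n = N t²σ² / (d²(N - 1) + t²σ²)
    """
    t2s2 = t ** 2 * sigma ** 2
    if N is None:
        return t2s2 / d ** 2
    return N * t2s2 / (d ** 2 * (N - 1) + t2s2)


n_unknown = sample_size_mean(700, 250)        # σ = 700 g, d = 250 g, N unknown
n_known = sample_size_mean(700, 400, N=60)    # σ = 700 g, d = 400 g, N = 60
print(round(n_unknown, 3), math.ceil(n_unknown))  # 30.118 31
print(round(n_known, 3), math.ceil(n_known))      # 9.975 10
```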
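b) Population proportion, P:
i) When the population size, N, is unknown: $n = \frac{t^2 P(1-P)}{d^2}$
ii) When the population size, N, is known: $n = \frac{N t^2 P(1-P)}{d^2 (N-1) + t^2 P(1-P)}$
where d is the maximum acceptable difference between the sample proportion and the unknown population proportion.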
Example: If we wish to estimate, with 95% confidence, the proportion of infants with low birth weight to within 10% of the unknown population proportion, how many infants should be selected?
Example: If we know that the population size from which we will sample is 100, how many infants should be selected?
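The proportion formulas can be sketched in the same way. These examples do not state a prior value for P. This sketch therefore assumes P = 0.5, which maximises P(1 − P) and so gives the largest (most conservative) sample size. It also reads "within 10%" as d = 0.10 on the proportion scale. Both are assumptions. The function name `sample_size_proportion` is hypothetical.

```python
import math


def sample_size_proportion(P, d, t=1.96, N=None):
    """Required n for estimating a proportion to within d.

    N unknown: n = t²P(1 - P) / d²
    N known:   n = N t²P(1 - P) / (d²(N - 1) + t²P(1 - P))
    """
    t2pq = t ** 2 * P * (1 - P)
    if N is None:
        return t2pq / d ** 2
    return N * t2pq / (d ** 2 * (N - 1) + t2pq)


P = 0.5    # assumption: no prior estimate of P, so use the most conservative value
d = 0.10   # assumption: "within 10%" taken as an absolute difference of 0.10
n_unknown_p = math.ceil(sample_size_proportion(P, d))        # N unknown
n_known_p = math.ceil(sample_size_proportion(P, d, N=100))   # N = 100
print(n_unknown_p, n_known_p)  # 97 50
```

If a prior estimate of P is available, passing it in place of 0.5 gives a smaller required sample.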