Sampling Evan Mandell Geog 3000
What is a sample? A sample is a relatively small subset of the total population. Why do we sample? There are just too many things in the world to measure each and every one.
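Sampling Design: How big of a sample do we need for the results to be meaningful? Answer: It depends on how precise we need to be. As a rule of thumb, the margin of error for a proportion is about 1/sqrt(n), where n is the sample size. How do we get statistically dependable results? Answer: We choose the sample at random.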
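To make the 1/sqrt(n) rule of thumb concrete, here is a minimal sketch. It assumes the rule is read as the approximate 95% margin of error for a sample proportion; the function name `approx_margin_of_error` is made up for this illustration.

```python
import math

def approx_margin_of_error(n: int) -> float:
    """Rule-of-thumb 95% margin of error for a sample proportion: 1 / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1.0 / math.sqrt(n)

# Quadrupling the sample size only halves the margin of error.
for n in (100, 400, 1600):
    print(f"n = {n:5d}  margin of error is about {approx_margin_of_error(n):.3f}")
```

The takeaway is that precision gets expensive: to cut the margin of error in half, we need four times as many observations.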
The Simple Random Sample: This method is the standard against which we measure all other methods because it has two properties. Unbiasedness: every item has the same chance of being chosen as every other item. Independence: the selection of one item has no effect on the selection of other items.
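A simple random sample can be sketched with Python's standard `random` module. The helper name `simple_random_sample` and the population of 100 numbered units are assumptions for illustration only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every unit is equally likely to be chosen."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return rng.sample(list(population), n)

# Hypothetical population: 100 units numbered 1 to 100.
population = range(1, 101)
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

`random.sample` draws without replacement, so no unit can appear twice in the sample.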
Stratified Sampling: Split the population into similar groups (strata) and draw a simple random sample from each group.
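A rough sketch of stratified sampling follows. The census tracts, the urban/rural split, and the function name `stratified_sample` are all hypothetical, and taking the same number of units from every stratum is just one possible allocation.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then draw a simple random sample from each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical population: 60 urban and 40 rural tracts, stored as (id, type) pairs.
tracts = [(i, "urban") for i in range(60)] + [(i, "rural") for i in range(60, 100)]
picked = stratified_sample(tracts, stratum_of=lambda t: t[1], n_per_stratum=5, seed=1)
for name, members in picked.items():
    print(name, members)
```

Every stratum is guaranteed to show up in the result, which a single simple random sample cannot promise.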
Cluster Sampling: Group the population into smaller clusters, choose some of the clusters at random, and sample the units within the chosen clusters.
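Here is a sketch of one common version, one-stage cluster sampling: pick a few clusters at random and measure every unit inside them. The city blocks, households, and the name `cluster_sample` are made up for the example.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters clusters at random and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: 6 city blocks with 4 households each.
blocks = {f"block_{b}": [f"house_{b}_{h}" for h in range(4)] for b in range(6)}
surveyed = cluster_sample(blocks, n_clusters=2, seed=7)
for name, households in surveyed.items():
    print(name, households)
```

The randomness lives at the cluster level here, which is what keeps fieldwork cheap: we only travel to the chosen blocks.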
Systematic Sampling: Start with a randomly chosen unit and then select every kth unit thereafter.
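Systematic sampling might look like the sketch below. It assumes the random starting unit is chosen from the first k units of an ordered list; the name `systematic_sample` and the street of 100 houses are hypothetical.

```python
import random

def systematic_sample(units, k, seed=None):
    """Start at a random position among the first k units, then take every kth unit."""
    if k <= 0:
        raise ValueError("k must be positive")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random index from 0 to k - 1
    return list(units[start::k])

# Hypothetical ordered population: houses numbered 1 to 100 along a street.
houses = list(range(1, 101))
picked = systematic_sample(houses, k=10, seed=3)
print(picked)
```

With 100 houses and k = 10, every run returns exactly 10 houses spaced 10 apart; only the starting house changes.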
Sample Size and Standard Error: We introduce a new variable, p-hat (written as p with a hat, ^, on top). P-hat is the number of successes x in the sample divided by the sample size n: p-hat = x / n
Sampling Error Continued: The standard deviation of p-hat is a measure of the sampling error.
A) Define the population and its unknown parameter P.
B) Find an estimator, its theoretical sampling distribution, and its standard deviation: σp = sqrt[ P(1 - P) / n ]
C) Draw a random sample and find the estimate.
D) Report the result and its sampling error.
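Steps A through D can be sketched as follows. The population here is simulated with a true proportion of 0.3 that the analyst pretends not to know. Plugging p-hat into the formula in place of the unknown P is an assumption of this sketch (a common practical shortcut), and the function names are made up.

```python
import math
import random

def p_hat(successes, n):
    """Sample proportion: number of successes divided by the sample size."""
    return successes / n

def standard_error(p, n):
    """Standard deviation of p-hat: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# A) Hypothetical population of 10,000 units; 1 marks a success (true P = 0.3).
population = [1] * 3000 + [0] * 7000

# C) Draw a simple random sample and compute the estimate.
rng = random.Random(0)
sample = rng.sample(population, 400)
estimate = p_hat(sum(sample), len(sample))

# B) and D) Assumption: p-hat stands in for the unknown P in the formula.
se = standard_error(estimate, len(sample))
print(f"p-hat = {estimate:.3f}, sampling error (standard error) = {se:.3f}")
```

Reporting the estimate together with its standard error tells the reader how far p-hat is likely to sit from the true P.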
Central Limit Theorem: X-bar is approximately normal. As n gets larger, the sampling distribution of x-bar approaches the normal distribution, even when the population itself is not normal. To find the distribution of x-bar, we only need to know the population mean and standard deviation: x-bar has mean μ and standard deviation σ/sqrt(n).
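A quick simulation can make the theorem easier to believe. This sketch assumes a deliberately skewed population (exponential, with mean 1 and standard deviation 1) and checks that the sample means center on the population mean with spread close to σ/sqrt(n). The sample size of 30 and the 2000 repetitions are arbitrary choices.

```python
import math
import random
import statistics

def sample_means(n, reps, rng):
    """Draw `reps` samples of size n from a skewed population and return their means."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(2024)
n = 30
means = sample_means(n, reps=2000, rng=rng)

observed_center = statistics.fmean(means)   # should be near the population mean, 1
observed_spread = statistics.stdev(means)   # should be near sigma / sqrt(n)
predicted_spread = 1.0 / math.sqrt(n)

print(f"mean of x-bar values:   {observed_center:.3f}")
print(f"spread of x-bar values: {observed_spread:.3f} (predicted {predicted_spread:.3f})")
```

A histogram of `means` would look roughly bell-shaped even though the individual measurements are strongly skewed.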
The t-distribution: The t-distribution solves two problems with the central limit theorem. A) The theorem depends on a large sample size. B) To use it, we need to know the population standard deviation. The t-distribution can handle small sample sizes, using the sample standard deviation in place of the unknown population value!
T-Distribution Continued: T is “the best we can do under the circumstances”. T is more spread out than Z because the uncertainty from estimating the standard deviation makes T a bit sloppier. The larger the sample size, the closer T gets to Z, the standard normal!
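The claim that T is more spread out than Z can be checked with a small simulation. This is only a sketch: it assumes samples drawn from a standard normal population, computes the t statistic with the sample standard deviation, and counts how often it lands beyond the usual 95% Z cutoff. The function name, sample sizes, and repetition count are arbitrary.

```python
import math
import random
import statistics

# Two-sided 95% cutoff for the standard normal (about 1.96).
Z_CUTOFF = statistics.NormalDist().inv_cdf(0.975)

def share_beyond_z(n, reps, rng):
    """Share of t statistics, from samples of size n, that fall beyond the Z cutoff."""
    beyond = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # t statistic: the population mean is 0, and s replaces the unknown sigma.
        t = statistics.fmean(sample) / (statistics.stdev(sample) / math.sqrt(n))
        if abs(t) > Z_CUTOFF:
            beyond += 1
    return beyond / reps

rng = random.Random(11)
small = share_beyond_z(n=5, reps=4000, rng=rng)
large = share_beyond_z(n=100, reps=4000, rng=rng)
print(f"n = 5:   {small:.1%} of t values fall beyond the Z cutoff")
print(f"n = 100: {large:.1%} of t values fall beyond the Z cutoff")
```

If T behaved exactly like Z, both shares would sit near 5%. The small-sample share comes out noticeably higher, which is the extra "sloppiness", while the large-sample share lands close to 5%.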
What the Heck Did We Just Learn? Proportions (p-hat) are approximately normally distributed. The larger the sample size, the more “normal” they appear. We use the t-distribution for small sample sizes.