Why sample? Diversity in populations Practicality and cost
Terms Population = large group about which conclusions are drawn. Real, but unknown. Sample = small group that represents population. Real, known. [Figure: many different samples can be drawn from one population]
Element = individual member of a population. Sampling unit = element or group of elements selected in a sample. Unit of analysis = element or group of elements compared in the analysis The above units can be the same or different.
Element, Sampling Unit, Unit of Analysis: Examples Opinion Survey of UMD Students Element = individual student Sampling unit = individual student Unit of analysis = individual student (student opinions measured) Survey of family incomes Element = adult household member Sampling unit = household or address Unit of analysis = family (total family income measured)
Element, Sampling Unit, Unit of Analysis: More Examples Voter Polls Element = individual voter Sampling unit = telephone number Unit of analysis = individual voter (voter opinions measured) U.S. Census of housing Element = household or address Sampling unit = household or address Unit of analysis = household or address (# of rooms measured)
Sampling frame = list of all the sampling units in the population. Needed for probability sampling. Probability sample = researcher knows and controls the probability of selection. Main advantage: Only probability samples permit accurate estimation of sampling error.
Simple Random Sample Every element in the population has an equal and constant chance of selection 1. Physical sampling with replacement 2. Table of random numbers 3. Random selection by computer Probability of selection = Sample Size/ Pop. size Requires list (frame) of all elements in population
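A minimal sketch of option 3 (random selection by computer), assuming a hypothetical frame of 5,000 ID numbers. Note that `random.sample` draws without replacement, which is the usual computer approach; each element still has the same chance of ending up in the sample, equal to sample size / population size.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements from the frame, each element equally likely to be chosen."""
    rng = random.Random(seed)      # seed is only for reproducible illustration
    return rng.sample(frame, n)    # sampling without replacement

frame = list(range(1, 5001))       # hypothetical frame: ID numbers 1..5000
sample = simple_random_sample(frame, 100, seed=1)
prob_selection = len(sample) / len(frame)   # sample size / population size
print(prob_selection)
```

The frame must list every element in the population; anything missing from the list has no chance of selection.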
Systematic Random Sample Every “kth” element is drawn from a list (e.g. every 50th name). 1. K = sampling interval = Pop. size/Sample size (e.g. 5000/100 = 50). 2. Random starting point between 1 and K (e.g. between 1 and 50). 3. Statistically equivalent to simple random sample when the list is randomly ordered. 4. List must be randomly ordered (no repeating pattern that lines up with the interval K). 5. Convenient, since lists are available for many populations
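The steps above can be sketched as follows, using the slide's own numbers (population of 5000, sample of 100, so K = 50). The frame of ID numbers and the function name are assumptions for illustration, and the sketch assumes the population size divides evenly by the sample size.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Take every k-th element of the frame after a random start between 1 and k."""
    rng = random.Random(seed)
    k = len(frame) // n            # sampling interval K = pop. size / sample size
    start = rng.randint(1, k)      # random starting point, 1..K inclusive
    return frame[start - 1::k][:n]

frame = list(range(1, 5001))       # hypothetical list of 5000 names, numbered 1..5000
sample = systematic_sample(frame, 100, seed=7)
print(sample[:3])                  # first three selected positions, 50 apart
```

Only the starting point is random; once it is chosen, the rest of the sample is determined by the interval.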
Stratified Random Sample Population is first divided into groups (strata). Simple random sample is taken from within each stratum Separate random samples are combined into a single total sample.
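A minimal sketch of these three steps, assuming hypothetical class sizes and the same sampling fraction in every stratum (proportionate allocation). The slides do not say how the sample is allocated across strata, so that choice is an assumption.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Take a simple random sample within each stratum, then combine them."""
    rng = random.Random(seed)
    combined = []
    for name, members in strata.items():
        n = round(len(members) * fraction)        # same fraction in every stratum
        combined.extend((name, m) for m in rng.sample(members, n))
    return combined

# Hypothetical strata: student IDs grouped by year in school
strata = {
    "Freshmen": list(range(0, 400)),
    "Sophomores": list(range(400, 700)),
    "Juniors": list(range(700, 900)),
    "Seniors": list(range(900, 1000)),
}
sample = stratified_sample(strata, 0.10, seed=2)
counts = {name: sum(1 for s, _ in sample if s == name) for name in strata}
print(counts)
```

Because every stratum is sampled, no class year can be missed by chance, which is what a single simple random sample cannot guarantee.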
Example of stratified sample [Figure: the UMD population is divided into four strata (Freshmen, Sophomores, Juniors, Seniors); a separate sample is drawn from each stratum (Samples 1 to 4) and the four are combined into a single stratified sample]
Considerations in Stratified Sampling Requires knowledge of stratifying variable Best used when there is much variation between strata in variable being measured (Example: Stratify by year in school if measuring opinions of advising) Lowest sampling error Most costly
Sampling error = estimated difference between sample value and actual population value (e.g. ± 3%)
Cluster Sample Elements in population are naturally grouped together (“clusters”) Simple random sample of clusters is taken Every element in selected clusters is studied. [Figure: population of clusters, with the selected clusters forming the sample]
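A minimal sketch of a one-stage cluster sample, assuming a hypothetical population of 20 city blocks with 5 homes each. The block and home labels are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters, then keep every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)     # SRS of cluster names
    elements = [e for c in chosen for e in clusters[c]]   # all elements in chosen clusters
    return chosen, elements

# Hypothetical population: 20 blocks (clusters) of 5 homes (elements) each
clusters = {
    f"block_{i}": [f"block_{i}_home_{j}" for j in range(5)]
    for i in range(20)
}
chosen, elements = cluster_sample(clusters, 4, seed=3)
print(chosen, len(elements))
```

Only a list of clusters is needed to draw the sample; the individual homes are listed only for the blocks that were selected.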
Considerations in Cluster Sampling Best when there is little variation between clusters in variable being measured. Does not require a list of individual elements (only clusters). May be used to cover large geographic area (smaller areas = clusters) May be less expensive Highest sampling error.
Multistage Designs Combines two or more sampling designs. Example: sampling voters in MN Stage 1: Stratify by geographic area (e.g. county) Stage 2: Sample census tracts (clusters) in selected counties. Stage 3: Take SRS of households in each tract. Commonly used in large, diverse populations Design is best left to experts!
Sampling Why use sampling? Terms and definitions Probability Sampling Designs Simple random Systematic Stratified Cluster Multistage designs Estimation from samples
Estimation from Samples Find a likely range of values for a population parameter (e.g. average, %) Parameter = characteristic of a population Statistic = characteristic of a sample Statistical inference = drawing conclusions about a population based on sample data Usually connected with a probability of error.
Sampling Distribution Distribution of results of all possible samples of size N taken from same population Theoretical, not actually done in practice Properties of sampling distributions are known to statisticians Used as basis for inferring from samples to populations
Example: estimating proportion of homes with internet access Suppose population proportion = .62 Take 1 sample of size 200 homes. 120 have internet access. Sample p = 120/200 = .60 Can we conclude that the population proportion is .60? A different sample might produce a different answer
What if we took all possible samples? Most sample proportions would be close to population value A few would be much higher or lower Average of sample proportions would be the true population proportion Distribution would be a bell-shaped curve [Figure: bell-shaped curve of all possible sample proportions (horizontal axis, 0 to 1.0) against % of samples (vertical axis), centered on .62]
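Taking all possible samples is not feasible, but the idea can be approximated by simulation. The sketch below assumes the population proportion of .62 and the sample size of 200 from the example, and draws 2,000 random samples instead of every possible one; the function name and the number of repetitions are assumptions.

```python
import random

def simulate_sample_proportions(true_p, n, n_samples, seed=None):
    """Draw many samples of size n and return the proportion found in each one."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < true_p for _ in range(n)) / n
        for _ in range(n_samples)
    ]

props = simulate_sample_proportions(true_p=0.62, n=200, n_samples=2000, seed=42)
mean_p = sum(props) / len(props)                 # should land close to .62
share_within = sum(abs(p - 0.62) <= 0.07 for p in props) / len(props)
print(round(mean_p, 3), round(share_within, 3))
```

Here .07 is roughly two standard errors for samples of 200 (the standard error is about .034), so most simulated proportions should fall inside that band, with a few landing farther out.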
What we know from sampling distribution: We DON’T know the true population proportion. We DO know how many sample proportions fall within a given distance of the true proportion. Sampling error = estimated difference between sample value and actual population value (example: 95% of sample proportions fall within ± 3% of true proportion)
How we make an estimate Find sample proportion Add sampling error (margin of error) on either side True proportion probably falls within this interval [Figure: sampling distribution of all possible sample proportions, centered on .62, with intervals drawn around several sample proportions p]
Examples of estimates If 95% of sample proportions (p) fall within ± 3% of true proportion, then 95% of all intervals p ± .03 will contain true population proportion. If p = .6, we estimate the true proportion is .6 ± .03 = .57 to .63 If p = .62, we estimate the true proportion is .62 ± .03 = .59 to .65 If p = .57, we estimate the true proportion is .57 ± .03 = .54 to .60 If p = .7, we estimate the true proportion is .7 ± .03 = .67 to .73 95% of the time this procedure yields a correct estimate.
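The interval arithmetic above can be sketched in a few lines. The `interval` helper reproduces the slide's p ± .03 examples. The `margin_of_error` helper is an addition not taken from the slides: it uses the standard normal-approximation formula z * sqrt(p(1 - p)/n) with z = 1.96 for 95%, to show roughly what sample size gives a ± 3% margin.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion (assumed formula)."""
    return z * math.sqrt(p * (1 - p) / n)

def interval(p, margin):
    """Return (low, high): the sample proportion minus and plus the margin of error."""
    return (p - margin, p + margin)

low, high = interval(0.60, 0.03)          # the slide's first example
moe_1067 = margin_of_error(0.5, 1067)     # about .03 for a sample of 1067
print(round(low, 2), round(high, 2), round(moe_1067, 3))
```

Under this formula a sample of 200 with p = .6 would have a margin closer to ± 7%, so the ± 3% used in the examples corresponds to a sample of roughly 1,000 or more.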