Sampling
Sampling: Two broad approaches. Probability sampling: based on random selection. Non-probability sampling: based on convenience.
Sampling Miscues: Alf Landon for President (1936). The Literary Digest sent post cards to voters in 6 states, correctly predicting elections from 1920 to 1932. Names were selected from telephone directories and automobile registrations. In 1936, they sent out 10 million post cards. The results picked Landon, 57% to Roosevelt's 43%. Election: Roosevelt won in the largest landslide, with 61% of the vote and 523-8 in the Electoral College. Why so inaccurate? A poor sampling frame led to the selection of wealthy respondents.
Sampling Miscues: Thomas E. Dewey for President (1948). Gallup used quota sampling to pick the winner from 1936 to 1944. Quota sampling matches sample characteristics to characteristics of the population. Gallup drew quota samples on the basis of income. In 1948, Gallup picked Dewey to defeat Truman. Reasons: 1. Most pollsters quit polling in October. 2. Undecided voters went for Truman. 3. Unrepresentative samples: WWII had changed society since the census.
Non-probability Sampling: Used in situations where a sampling frame for randomization doesn’t exist. Types of non-probability samples: 1. Reliance on available subjects (convenience sampling). 2. Purposive or judgmental sampling. 3. Snowball sampling. 4. Quota sampling.
Reliance on Available Subjects: Person on the street, easily accessible. Examples: mall intercepts, college students, person on the street. Frequently used, but usually biased. Notoriously inaccurate, especially in making inferences about a larger population.
Purposive or Judgmental Sampling: Dictated by the purpose of the study. Situational judgments are made about which individuals should be surveyed to make for a useful or representative sample. E.g., using college students to study third-person effects (3PE) regarding rap and metal music. 3PE: others are seen as more affected by exposure than the self. The study assesses effects on self and others. Using college students makes for a homogeneous "self" group.
Snowball Sampling: Used when the population of interest is difficult to locate, e.g., homeless people. The researcher collects data from a few people in the targeted group. The individuals surveyed initially are asked to name other people to contact. Good for exploration; bad for generalizability.
Quota Sampling: Begins with a table of relevant characteristics of the population: proportions of gender, age, education, and ethnicity from census data. A sample is selected to match those proportions. Problems: 1. The quota frame must be accurate. 2. The sample is not random.
Probability Sampling: Goal: representativeness, meaning the sample resembles the larger population. Random selection enhances the likelihood of a representative sample. Each unit of the population has an equal chance of being selected into the sample.
Population Parameters: A parameter is a summary statistic for the population, e.g., the mean age of the population. A sample is used to make parameter estimates, e.g., the mean age of the sample is used as an estimate of the population parameter.
Sampling Error: Every time you draw a sample from the population, the parameter estimate will fluctuate slightly. E.g., Sample 1: mean age = 37.2; Sample 2: mean age = 36.4; Sample 3: mean age = 38.1. If you drew lots of samples, the estimates would form a normal curve of values.
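The fluctuation described above can be seen in a small simulation. This is a minimal sketch: the population of ages, the sample size of 200, and the helper name `sample_means` are all made up for illustration and do not come from the slides.

```python
import random
import statistics

def sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated simple random samples and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]

# Hypothetical population: 10,000 ages drawn uniformly between 18 and 90.
_rng = random.Random(42)
population = [_rng.randint(18, 90) for _ in range(10_000)]

means = sample_means(population, sample_size=200, num_samples=1000)
print("population mean:", round(statistics.mean(population), 1))
print("first three sample means:", [round(m, 1) for m in means[:3]])
print("mean of sample means:", round(statistics.mean(means), 1))
```

Each sample mean lands a little above or below the population mean, while the average of many sample means sits very close to it.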
[Figure: Normal Curve of Sample Estimates. Horizontal axis: estimated mean. Vertical axis: frequency of estimated means from multiple samples. Marked on the curve: likely population parameter.]
Standard Error: The typical distance of sample estimates from the population parameter (formally, the standard deviation of the sample estimates). About 68% of sample estimates will fall within one standard error of the population parameter.
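The slides do not give a formula for the standard error. For reference, the usual textbook expressions for a sample mean and for a sample proportion are shown below. The symbols s (sample standard deviation), n (sample size), and \hat{p} (sample proportion) are notation introduced here, not taken from the slides.

```latex
\mathrm{SE}_{\bar{x}} = \frac{s}{\sqrt{n}}, \qquad
\mathrm{SE}_{\hat{p}} = \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}
```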
[Figure: Normal Curve of Sample Estimates. Horizontal axis: estimated mean. Vertical axis: frequency of estimated means from multiple samples. Marked on the curve: population parameter, 1 standard error unit, and the region containing 2/3 of samples.]
Standard Error Estimates and Sample Size: As the sample size increases, the standard error decreases. In other words, our sample estimate is likely to be closer to the population parameter. As the sample size increases, we become more confident in our parameter estimate.
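A short sketch of this relationship, assuming the textbook formula for the standard error of a mean (standard deviation divided by the square root of the sample size). The standard deviation of 15 and the function name are hypothetical values chosen for illustration.

```python
import math

def standard_error_of_mean(std_dev, sample_size):
    """Standard error of a sample mean: std_dev / sqrt(sample_size)."""
    return std_dev / math.sqrt(sample_size)

# Hypothetical standard deviation of age in the population (an assumption).
STD_DEV = 15.0

for n in (100, 400, 1600):
    print(f"n = {n:>4}: standard error = {standard_error_of_mean(STD_DEV, n):.3f}")
```

Under this formula, quadrupling the sample size halves the standard error.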
Confidence Levels: About two thirds of samples will fall within one standard error of the population parameter. Therefore, a single sample has a 68% chance of being within one standard error. Confidence levels: 68% sure the estimate is within 1 s.e. of the parameter; 95% sure the estimate is within 2 s.e. of the parameter; 99.7% sure the estimate is within 3 s.e. of the parameter.
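These shares can be checked against the normal curve. The sketch below assumes sample estimates are normally distributed and uses Python's standard-library `statistics.NormalDist`; the helper name `coverage_within` is made up for illustration.

```python
from statistics import NormalDist

def coverage_within(k):
    """Share of a normal distribution lying within k standard errors of its center."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} s.e.: {coverage_within(k):.1%}")
```

The computed shares are roughly 68.3%, 95.4%, and 99.7%, so the round figures quoted on slides are approximations.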
Confidence Interval: The interval that we are 95% confident contains the population parameter. For example, we predict that Candidate X will receive 45% of the vote, with a margin of error of plus or minus 3 percentage points. We are 95% sure the parameter will be between 42% and 48%. The confidence interval shrinks as the standard error gets smaller and as the sample size gets larger.
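The Candidate X example can be worked through in code. This sketch assumes the interval is the estimate plus or minus 2 standard errors of a proportion. The sample size of 1,100 is not in the slides; it is a hypothetical value chosen because it yields a margin of 3 points.

```python
import math

def confidence_interval(proportion, sample_size, num_se=2.0):
    """Return (low, high) bounds num_se standard errors around a sample proportion."""
    std_error = math.sqrt(proportion * (1.0 - proportion) / sample_size)
    margin = num_se * std_error
    return proportion - margin, proportion + margin

# Hypothetical poll: 45% support among 1,100 respondents (sample size is an assumption).
low, high = confidence_interval(0.45, 1100)
print(f"95% interval: {low:.1%} to {high:.1%}")
```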
Sample Size & Confidence Interval: How precise does the estimate have to be? More precision requires a larger sample size. Larger samples increase precision, but at a diminishing rate: each unit you add to your sample contributes to the accuracy of your estimate, but the amount it adds shrinks with each additional unit.
95% Confidence Intervals by Sample Size (plus or minus percentage points)
% split   N=100   N=200   N=300   N=400   N=500   N=700   N=1000   N=1500
50/50     10.0    7.1     5.8     5.0     4.5     3.8     3.2      2.6
70/30      9.2    6.5     5.3     4.6     4.1     3.5     2.9      2.4
90/10      6.0    4.2     3.5     3.0     2.7     2.3     1.9      1.5
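The slides do not say how these figures were computed. The 50/50 and 70/30 rows agree with a margin of 2 standard errors of a proportion, expressed in percentage points, so the sketch below assumes that formula. The function and constant names are illustrative.

```python
import math

def margin_of_error(split, sample_size, num_se=2.0):
    """Margin of error in percentage points for a proportion `split` (e.g. 0.5)."""
    std_error = math.sqrt(split * (1.0 - split) / sample_size)
    return 100.0 * num_se * std_error

SAMPLE_SIZES = (100, 200, 300, 400, 500, 700, 1000, 1500)
SPLITS = (0.5, 0.7, 0.9)

print("split  " + "".join(f"{n:>7}" for n in SAMPLE_SIZES))
for split in SPLITS:
    label = f"{round(split * 100)}/{round((1 - split) * 100)}"
    cells = "".join(f"{margin_of_error(split, n):>7.1f}" for n in SAMPLE_SIZES)
    print(f"{label:<7}{cells}")
```

The margin is widest for an even split and narrows as the split becomes more lopsided or the sample grows.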
Sampling Frame: The list of units from which the sample is drawn; it defines your population. E.g., a list of members of an organization or community. Ideally, you’d list all members of your population as your sampling frame and randomly select your sample from that list. It is often impractical to list the entire population.
Sampling Frames for Surveys: Limitations of the telephone book: it misses unlisted numbers, and it has a class bias, since poor people may not have a phone and are less likely to have multiple phone lines. Most studies use a technique such as Random Digit Dialing as a surrogate for a sampling frame.
Types of Sampling Designs: simple random sampling, systematic sampling, stratified sampling, and multi-stage cluster sampling.
Simple Random Sampling: Establish a sampling frame. A number is assigned to each element. Numbers are randomly selected into the sample.
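The three steps can be sketched with Python's standard `random` module. The frame of member IDs and the function name are hypothetical.

```python
import random

def simple_random_sample(frame, sample_size, seed=None):
    """Select sample_size elements from the frame, each equally likely, without replacement."""
    rng = random.Random(seed)
    # Number every element, then draw numbers at random.
    numbers = rng.sample(range(len(frame)), sample_size)
    return [frame[i] for i in numbers]

# Hypothetical sampling frame of 1,000 member IDs.
frame = [f"member-{i:04d}" for i in range(1000)]
sample = simple_random_sample(frame, sample_size=100, seed=1)
print(len(sample), sample[:3])
```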
Systematic Sampling: Establish a sampling frame. Select every kth element, beginning from a random start. E.g., with 1000 names on the list, choosing every 10th name yields a sample size of 100. Sampling interval: the standard distance between selected units on the sampling frame; sampling interval = population size / sample size. Sampling ratio: the proportion of the population that is selected; sampling ratio = sample size / population size.
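A minimal sketch of the 1000-name example. It assumes the frame length is an exact multiple of the sample size; the function name and the numbered frame are illustrative.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Take every kth element of the frame, starting from a random offset."""
    interval = len(frame) // sample_size               # sampling interval k
    start = random.Random(seed).randrange(interval)    # random start in [0, k)
    return frame[start::interval][:sample_size]

frame = list(range(1, 1001))   # hypothetical list of 1,000 numbered names
sample = systematic_sample(frame, sample_size=100, seed=7)
print("sampling interval:", len(frame) // len(sample))
print("sampling ratio:", len(sample) / len(frame))
print("first five:", sample[:5])
```

Only the starting point is random; every later selection is fixed by the interval, so a frame with a periodic ordering can bias the result.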
Stratified Sampling: A modification used to reduce the potential for sampling error. The researcher ensures that certain groups are represented proportionately in the sample. E.g., if the population is 60% female, a stratified sample selects 60% females into the sample. E.g., stratifying by region of the country to make sure that each region is proportionately represented.
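The 60% female example can be sketched as follows. The frame, the `stratum_of` callback, and the function name are hypothetical, and the rounding rule for each group's share is one simple choice among several.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, sample_size, seed=None):
    """Proportionate stratified sample: draw from each stratum in proportion to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        # Rounding can make the total differ slightly from sample_size
        # when the shares do not divide evenly.
        quota = round(sample_size * len(members) / len(frame))
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical frame: 600 women and 400 men, stored as (id, gender) pairs.
frame = [(i, "F") for i in range(600)] + [(i, "M") for i in range(600, 1000)]
sample = stratified_sample(frame, stratum_of=lambda unit: unit[1], sample_size=100, seed=3)
women = sum(1 for _, gender in sample if gender == "F")
print(f"{women} women, {len(sample) - women} men")
```

Unlike a simple random sample, the gender split here matches the population by construction rather than by chance.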
Two Methods of Stratification: 1. Sort the population into groups, then randomly select within groups in proportion to relative group size. 2. Sort the population into groups, then systematically select within groups using a random start. Disproportionate stratification: some stratification groups can be over-sampled for subgroup analysis. Samples are then weighted to restore population proportions.
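The weighting step for a disproportionate sample can be sketched as below. It assumes the common rule of weighting each group by its population share divided by its sample share. The groups, responses, and function names are made up for illustration.

```python
def stratum_weights(population_shares, sample_counts):
    """Weight per stratum = population share / sample share."""
    total = sum(sample_counts.values())
    return {
        stratum: population_shares[stratum] / (sample_counts[stratum] / total)
        for stratum in sample_counts
    }

def weighted_mean(values_by_stratum, weights):
    """Mean of all sampled values after applying each stratum's weight."""
    weighted_sum = sum(weights[s] * sum(vals) for s, vals in values_by_stratum.items())
    weight_total = sum(weights[s] * len(vals) for s, vals in values_by_stratum.items())
    return weighted_sum / weight_total

# Hypothetical example: group B is 10% of the population but half of the sample.
population_shares = {"A": 0.9, "B": 0.1}
responses = {"A": [1, 1, 0, 1], "B": [0, 0, 1, 0]}   # made-up yes/no answers
weights = stratum_weights(population_shares, {s: len(v) for s, v in responses.items()})
print(weights)
print(weighted_mean(responses, weights))
```

The over-sampled group receives a weight below 1 and the under-sampled group a weight above 1, so the weighted estimate reflects the population mix rather than the sample mix.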
Cluster Sampling: Frequently, there is no convenient way of listing the population for sampling purposes. E.g., for a sample of Dane County or Wisconsin, it is hard to get a list of the population members. Cluster sample: draw a sample of census blocks, list the people on each selected census block, then select a sub-sample of the people living on each block.
Multi-stage Cluster Sample: Cluster sampling done in a series of stages: list, then sample within. Example: Stage 1: List zip codes; randomly select zip codes. Stage 2: List census blocks within selected zip codes; randomly select census blocks. Stage 3: List households on selected census blocks; randomly select households. Stage 4: List residents of selected households; randomly select a person to interview.
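The four stages can be sketched on a toy data set. The nested dictionary of zip codes, blocks, households, and residents is entirely hypothetical, as are the function name and the number of units drawn at each stage.

```python
import random

def multistage_sample(zip_codes, n_zips, n_blocks, n_households, seed=None):
    """Select one resident per sampled household via zip -> block -> household -> person.

    zip_codes is a nested dict: {zip: {block: {household: [residents]}}}.
    """
    rng = random.Random(seed)
    interviews = []
    for zip_code in rng.sample(sorted(zip_codes), n_zips):                  # stage 1
        blocks = zip_codes[zip_code]
        for block in rng.sample(sorted(blocks), n_blocks):                  # stage 2
            households = blocks[block]
            for household in rng.sample(sorted(households), n_households):  # stage 3
                resident = rng.choice(households[household])                # stage 4
                interviews.append((zip_code, block, household, resident))
    return interviews

# Hypothetical frame: 5 zip codes x 4 blocks x 6 households x 3 residents.
zip_codes = {
    f"zip{z}": {
        f"block{b}": {
            f"house{h}": [f"person{z}-{b}-{h}-{r}" for r in range(3)]
            for h in range(6)
        }
        for b in range(4)
    }
    for z in range(5)
}
selected = multistage_sample(zip_codes, n_zips=2, n_blocks=2, n_households=3, seed=11)
print(len(selected))
```

Only the units selected at one stage need to be listed at the next, which is what makes the design practical when no full population list exists.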
Multi-stage Sampling and Sampling Error: Error is introduced at each stage. One solution is to use stratification at each stage to try to reduce sampling error.