SAMPLING
Sampling Terminology
- Population: the relevant target group for the study
- Census: a study that includes the entire population
- Sample: a subset of the target population selected to represent the population
- Sample unit: individuals, households, stores, companies, or other target population “elements” available for selection during the sampling process
- Sample frame: a list or other way of identifying the units from which the sample is to be drawn (e.g., time/place frame)
Sampling Terminology
- Sample representativeness: degree to which the sample is similar to the target population in terms of key characteristics
- Incidence rate: percentage of people in the general population or on a list who fit the qualifications of those the researcher wishes to describe
Sampling Terminology
- Sampling error: discrepancies between data generated from a sample and the actual population data as a result of sampling instead of taking a census
- Nonsampling error: all other biases at any stage, including inaccurate population definition, sample frame error, etc., that can occur regardless of whether a sample or a census is used
Steps in Developing a Sample Plan
- Step 1: Define the target population
- Step 2: Define the data collection method
- Step 3: Obtain or designate a sample frame
- Step 4: Determine sampling method
  - Time/area sampling
  - Nonprobability methods
  - Probability methods
- Step 5: Determine sample size
- Step 6: Develop operational procedures
Step 1: Define the Target Population
- Specify the target population described in the overall study objective(s)
- Demographic characteristics (age, gender, income, geographic location, household)
- Lifestyle characteristics (fun-loving, health-conscious, blue-collar)
- Product usage (customers who spend over $50/month, current customers, first-time users)
Step 2: Define Data Collection Method
Considerations:
- Representativeness
- Sources of sample bias
- Incidence rate
- Response rate
Step 3: Obtain/Designate Sample Frame
- Directories, company records, mailing lists, public records, event or specified location and time frame
- Screening questions (disqualifiers)
- Sample frame error: coverage problems
  - Omission
  - Ineligibility
  - Duplication
  - Underrepresentation/overrepresentation
Step 4: Determine Sampling Method
Time/Area Sampling
- Different blocks of time
- Different days
- Different locations
Appendix C: Time/Area Matrix
*Filled blocks represent times surveyors were present at Duck Pond.
Columns (days): M 10/15, T 10/16, W 10/17, R 10/18, F 10/19
Rows (time blocks): 12:00-12:30, 12:30-1:00, 1:00-1:30, 1:30-2:00, 2:00-2:30, 2:30-3:00, 3:00-3:30
Nonprobability Methods
- Based on representativeness, not statistical chances of inclusion
- Convenience samples: “catch as catch can,” self-selected volunteers
- Judgment samples: hand-picked by the researcher or another expert
- Referrals (“snowballing”): additional respondents referred by previous respondents
- Quota sampling: set goals for sampling subgroup members (e.g., 5 men and 5 women each sampling session)
Probability Methods
- Based on statistical probability: every population unit has a known chance of being picked
- Simple random sampling (SRS)
  - Every unit has an equal probability of being selected from the sample frame (n/N)
  - Drawings
  - Random number tables, random digit dialing (RDD)
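A minimal Python sketch of simple random sampling, assuming the sample frame is available as a list. The frame of 2,000 unit IDs and the function name `simple_random_sample` are hypothetical illustrations, not taken from the slides.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has chance n/N."""
    rng = random.Random(seed)  # seeded generator so the draw can be repeated
    return rng.sample(frame, n)

# Hypothetical sample frame: unit IDs 1..2000
frame = list(range(1, 2001))
sample = simple_random_sample(frame, 250, seed=42)

# Inclusion probability for each unit is n/N
print(len(sample), len(sample) / len(frame))  # 250 0.125
```

Passing a seed is optional; it only makes the draw reproducible for auditing.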
Probability Methods
Systematic sampling (SYMRS)
- Procedure that samples every ith unit after a random start point
- Example: list of 2,000 names and n = 250
  - 2000/250 = skip interval of 8, so select every 8th name
  - Select a random number for the start point (e.g., 5)
  - Selected names are the 5th, 13th, 21st, etc.
- Drop-down substitution for ineligibles, refusals
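The systematic sampling example can be sketched in Python as follows. This is an illustration under stated assumptions: the list of names is made up, the function name `systematic_sample` is hypothetical, and the random start is drawn between 1 and the skip interval.

```python
import random

def systematic_sample(frame, n, start=None, seed=None):
    """Select every k-th unit after a random start, with k = N // n."""
    k = len(frame) // n  # skip interval
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in 1..k
    # Positions are 1-based: start, start + k, start + 2k, ...
    positions = range(start, len(frame) + 1, k)
    return [frame[p - 1] for p in positions][:n]

# Hypothetical list of 2,000 names
names = [f"name_{i}" for i in range(1, 2001)]
picked = systematic_sample(names, 250, start=5)

print(len(names) // 250)  # 8
print(picked[:3])         # ['name_5', 'name_13', 'name_21']
```

The sketch does not handle drop-down substitution; ineligible or refusing units would need a separate replacement rule.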
Probability Methods
Stratified sampling (STRS)
- The population is classified into subpopulations, or “strata,” based on some known and available surrogate information.
- Then apply a sampling method to select units from within each stratum.
- Types:
  - Proportionate
  - Disproportionate
STRATIFIED SAMPLES (n = 400)

                                   Proportionate      Disproportionate
Income             Population %    Sample %    n      Sample %    n
< 20,000               30%           30%      120       15%       60
20,000 – 29,999        35            35       140       20        80
30,000 – 49,999        25            25       100       30       120
> 50,000               10            10        40       35       140
Total                 100%          100%      400      100%      400
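The allocations in the stratified sample table can be reproduced with a short Python sketch. The function name `allocate` and the dictionary layout are hypothetical; the percentages are the ones given in the table.

```python
def allocate(n, sample_shares):
    """Return per-stratum sample sizes from total n and shares in percent."""
    return {stratum: round(n * pct / 100) for stratum, pct in sample_shares.items()}

# Proportionate: sample shares equal the population shares
population_pct = {"< 20,000": 30, "20,000-29,999": 35, "30,000-49,999": 25, "> 50,000": 10}
# Disproportionate: shares chosen by the researcher (values from the table)
disproportionate_pct = {"< 20,000": 15, "20,000-29,999": 20, "30,000-49,999": 30, "> 50,000": 35}

proportionate = allocate(400, population_pct)
disproportionate = allocate(400, disproportionate_pct)

print(list(proportionate.values()))     # [120, 140, 100, 40]
print(list(disproportionate.values()))  # [60, 80, 120, 140]
```

With shares that do not divide evenly, rounding can make the stratum sizes sum to slightly more or less than n, so the total should be checked.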
Probability Methods
Cluster sampling
- The population is divided into subgroups, each of which is representative of the population.
- Then apply a sampling method to select clusters (1-step) or units within the selected clusters (2-step).
- Examples: neighborhoods, university classes, store locations
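One-step and two-step cluster sampling might be sketched in Python as below. The neighborhoods, household IDs, and the function name `cluster_sample` are hypothetical, and simple random selection is assumed at both stages.

```python
import random

def cluster_sample(clusters, n_clusters, units_per_cluster=None, seed=None):
    """Pick clusters at random; optionally subsample units within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    if units_per_cluster is None:
        # One-step: keep every unit in each selected cluster
        return {c: list(clusters[c]) for c in chosen}
    # Two-step: draw a random subset of units inside each selected cluster
    return {
        c: rng.sample(clusters[c], min(units_per_cluster, len(clusters[c])))
        for c in chosen
    }

# Hypothetical population: 10 neighborhoods with 30 households each
clusters = {
    f"neighborhood_{i}": [f"hh_{i}_{j}" for j in range(30)] for i in range(10)
}
one_step = cluster_sample(clusters, 3, seed=1)
two_step = cluster_sample(clusters, 3, units_per_cluster=5, seed=1)
```

Here `one_step` holds all 30 households from each of 3 neighborhoods, while `two_step` holds 5 households from each.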
Step 5: Determine Sample Size
NONSTATISTICAL APPROACHES
- Arbitrary: % of population
- Conventional: past studies, industry standards
- Cost basis: budget or value of information
The Confidence Interval Statistical Approach
3 factors in determining sample size:
- Confidence level (confidence in the estimate), expressed as a z value
  - 90%: z = 1.65
  - 95%: z = 1.96
  - 98%: z = 2.33
  - 99%: z = 2.58
- Sampling error: precision, or tolerance for error around the estimate, stated in percentage points
- Estimated standard deviation: estimate of the variability of the population characteristic, based on prior information
Sample size formula for means (interval data)
\[ n = \frac{Z^2 s^2}{e^2} \]
where
- n = required sample size
- Z = the Z value for your desired confidence level
- s = estimated standard deviation of the population characteristic
- e = desired accuracy range
Example: Sample size for mean
You plan on doing a survey to estimate the average amount spent on a “special occasion” restaurant meal for two. How large should the sample be?
- Desired confidence level is 95% (z = 1.96)
- Desired accuracy level is ± $3.00
- Estimated standard deviation is $15.70
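A worked version of the restaurant-meal example, as a small Python sketch. The function name `sample_size_mean` is hypothetical; it applies the formula n = Z² s² / e² and rounds up, since a fraction of a respondent cannot be sampled.

```python
import math

def sample_size_mean(z, s, e):
    """n = Z^2 * s^2 / e^2, rounded up to the next whole respondent."""
    return math.ceil((z ** 2) * (s ** 2) / (e ** 2))

# 95% confidence (z = 1.96), standard deviation $15.70, accuracy +/- $3.00
n = sample_size_mean(z=1.96, s=15.70, e=3.00)
print(n)  # 106
```

The unrounded value is about 105.2, so 106 completed surveys would be needed under these assumptions.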
Sample size formula for percentages (nominal or ordinal)
\[ n = \frac{Z^2 (P \cdot Q)}{e^2} \]
where
- n = required sample size
- Z = the Z value for your desired confidence level
- P = estimation of the population %
- Q = (100 – P)
- e = desired accuracy range
Example: Sample size for percentages
A perfume maker wants to know, within 2 percentage points, the percentage of women in the U.S. who have tried its new cologne. Trial rates for similar new products typically run about 10%. How large a sample would be needed to make the estimate with 90% confidence?
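The perfume example can be worked through with a short Python sketch. The function name `sample_size_pct` is hypothetical; it applies n = Z² (P · Q) / e² with P, Q, and e all in percentage points, and uses the slide's rounded z value of 1.65 for 90% confidence.

```python
import math

def sample_size_pct(z, p, e):
    """n = Z^2 * P * Q / e^2 with P, Q = 100 - P, and e in percentage points."""
    q = 100 - p
    return math.ceil((z ** 2) * p * q / (e ** 2))

# 90% confidence (z = 1.65), expected trial rate 10%, accuracy +/- 2 points
n = sample_size_pct(z=1.65, p=10, e=2)
print(n)  # 613
```

Using the more precise z value of 1.645 instead gives 609, so the answer depends slightly on how z is rounded.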
Effects of the 3 factors on statistical sample sizes
- What happens to sample size when confidence level is increased?
- What happens to sample size when allowable error is increased?
- What happens to sample size when population variability is high?
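One way to explore these three questions is to vary a single input of the means formula at a time. In the sketch below, the baseline values and the alternative values (z = 2.58, e = $5.00, s = $25.00) are illustrative assumptions, and `sample_size_mean` is a hypothetical helper implementing n = Z² s² / e². Because Z and s sit in the numerator and e in the denominator, n grows with confidence level and variability and shrinks as the allowable error grows.

```python
import math

def sample_size_mean(z, s, e):
    """n = Z^2 * s^2 / e^2, rounded up."""
    return math.ceil((z ** 2) * (s ** 2) / (e ** 2))

base = sample_size_mean(z=1.96, s=15.70, e=3.00)               # baseline
higher_confidence = sample_size_mean(z=2.58, s=15.70, e=3.00)  # 99% instead of 95%
larger_error = sample_size_mean(z=1.96, s=15.70, e=5.00)       # looser precision
more_variability = sample_size_mean(z=1.96, s=25.00, e=3.00)   # more spread

print(base, higher_confidence, larger_error, more_variability)
```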
Oversampling: Sample Size vs. Number of Contacts Needed
- Calculate the needed sample size (n)
- Divide by estimates of ECR, RR, OIR (multiply all applicable terms)
- Example: n = 1000, ECR = 50%, OIR = 65%
  - Number of contacts = 1000/[.50 * .65] = 3077
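The contacts calculation can be expressed as a small Python sketch. The function name `contacts_needed` is hypothetical; it divides the needed sample size by the product of whichever rates apply, each given as a proportion, and rounds up.

```python
import math

def contacts_needed(n, *rates):
    """Divide the needed sample size by the product of all applicable rates."""
    return math.ceil(n / math.prod(rates))

# Slide example: n = 1000, ECR = 50%, OIR = 65%
print(contacts_needed(1000, 0.50, 0.65))  # 3077
```

Any additional applicable rate can be passed as a further argument, which increases the number of contacts required.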
Step 6: Develop Operational Procedures
- Coping with refusals, increasing response rates
- Tracking procedures: call-back logs; tracking the number of refusals, breakoffs, and ineligibles
- Schedules, time/area matrix
- Training, practice (field pretests), interviewer guidelines and procedures
- Materials (clipboards, response cards, pencils)
- Keeping track of collected surveys, follow-up