Published by Beryl Matthews. Modified over 6 years ago.
1
Sampling Plans
Copyright (c) 2008 by The McGraw-Hill Companies. This spreadsheet is intended solely for educational purposes by licensed users of LearningStats. It may not be copied or resold for profit.
2
Why Sample?
That's right: a sample can be both cheaper and more accurate than an attempted census! The 2000 U.S. Census cost more and was less accurate because Congress rejected a plan to use scientific sampling in hard-to-count locations.*
Lower cost
Improved accuracy
Destructive testing, such as
  Light bulb life (hours until failure)
  5 mph crash car bumper tests (dollars of damage)
Perishable data (e.g., fresh fish inspection)
Timely data is required (e.g., political polls)
Large, dispersed population (e.g., seat belt usage)
* Source:
3
Why Census?
When the population is small or when the data are already in a computer database, why take a sample? But in auditing a computer record, we may still have to check physical records, so sampling may still be appropriate.
Computer database exists
Population is small and accessible
Legal requirements
  U.S. Census (national headcount)
  Bank cash (end-of-day cash drawer balance)
Money is no constraint (rare)
Time is no constraint (rare)
4
Objectives
Descriptive (to get a general overview)
  Percent of HMO patients who arrive late
  Your mean commute time to work
Inferential (to estimate a key parameter)
  Mean insurance claim size for angioplasty
  Six-month stent failure rate
5
Inference
Question: What is the average salary of a computer security specialist?
We sample because the population is large and hard to reach.
Population
  μ = $58,222
  σ = $6,771
  N = 125,000
Sample
  x̄ = $57,545
  s = $6,958
  n = 100
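As a rough illustration of how the sample figures on this slide relate to the population figures, the sketch below computes the sampling error and the standard error of the sample mean. The 95% interval using z = 1.96 is an assumption added here for illustration; the slide itself does not specify a confidence level or method.

```python
import math

# Figures taken from the slide
pop_mean, pop_sd, pop_size = 58222.0, 6771.0, 125000
sample_mean, sample_sd, sample_size = 57545.0, 6958.0, 100

# Sampling error: how far the sample mean landed from the population mean
sampling_error = sample_mean - pop_mean

# Standard error of the mean, estimated from the sample standard deviation
std_error = sample_sd / math.sqrt(sample_size)

# Assumption: a normal-approximation 95% interval (z = 1.96)
z = 1.96
interval = (sample_mean - z * std_error, sample_mean + z * std_error)

print(f"sampling error: {sampling_error:,.0f}")
print(f"standard error: {std_error:,.1f}")
print(f"approx. 95% interval: {interval[0]:,.0f} to {interval[1]:,.0f}")
```

Under these assumptions the interval covers the population mean, so a sample of 100 gives a usable estimate for a population of 125,000.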
6
Sampling Plans - I
Simple random sample
  List of population items must be available
  Random numbers are used to choose items
  Each item has the same chance of being selected
Systematic sample
  Continuous process or non-enumerable population (no list)
  Choose every Kth item (e.g., every 10th voter at poll exit)
  Use a random starting point (e.g., the 8th voter)
  Unbiased unless the data are ordered in a pattern (e.g., a cycle that matches K)
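The two plans above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the numbered population of 100 items are hypothetical, and the standard-library `random` module stands in for whatever random-number source is actually used.

```python
import random

def simple_random_sample(items, n, seed=None):
    """Draw n items without replacement; every item is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(items), n)

def systematic_sample(items, k, seed=None):
    """Take every k-th item, beginning at a random start among the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random starting point, 0-based
    return list(items)[start::k]

# Hypothetical population: items numbered 1..100
population = list(range(1, 101))
srs = simple_random_sample(population, 10, seed=1)
every_tenth = systematic_sample(population, 10, seed=1)
print(srs)
print(every_tenth)
```

Note that the simple random sample needs the full list up front, while the systematic rule only needs a counter, which is why it suits a stream such as voters leaving a poll.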
7
Sampling Plans - II
Stratified sample
  Each stratum is a defined population subgroup (e.g., male/female)
  May be many strata (e.g., gender, race, occupation)
  Weight sample estimates by strata size (strata % must be known)
  Corrects for possible under-representation of groups
Cluster sample
  Like stratification except based on geography (e.g., school districts)
  Two-stage is common (random cluster, random items in cluster)
  Reduces travel cost for in-person interviews
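Weighting by strata size can be shown with a small worked example. The sketch below is an illustration under stated assumptions: the function name, the stratum labels "A" and "B", the sample values, and the population shares are all made up, and stratum B is deliberately under-represented in the sample.

```python
def stratified_mean(stratum_samples, stratum_shares):
    """Weight each stratum's sample mean by its known population share."""
    total_share = sum(stratum_shares.values())
    estimate = 0.0
    for name, values in stratum_samples.items():
        stratum_mean = sum(values) / len(values)
        estimate += (stratum_shares[name] / total_share) * stratum_mean
    return estimate

# Hypothetical data: stratum B is 75% of the population
# but only one of the three sampled items.
samples = {"A": [10.0, 20.0], "B": [30.0]}
shares = {"A": 0.25, "B": 0.75}

all_values = [v for values in samples.values() for v in values]
unweighted = sum(all_values) / len(all_values)
weighted = stratified_mean(samples, shares)
print(unweighted, weighted)
```

The unweighted mean treats every sampled item alike, so it leans toward the over-sampled stratum; the weighted estimate restores each stratum to its known population share.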
8
Sampling Plans - III
Judgment sample
  Experts in the field select the sample (e.g., which firms)
  Utilizes domain knowledge of experts (e.g., software engineers)
  May avoid wasting time on atypical or unimportant respondents
  But introduces subjectivity
Convenience sample
  Asking co-workers' opinions "because they're handy"
  Using a data set that happens to exist already
  Subjective
  Unknowable biases
Alas, many key business decisions are made this way!
9
Simple Random Sample
Data are the GPAs of college freshmen in a classroom. Choose 10 at random by picking random rows and columns. Sampled GPAs are highlighted. How well do the sample statistics and histograms match the population?
Note: If you chose the sample yourself, you might try to avoid adjacent items, or try to cover every row or column. That would not be random.
10
Systematic Sample
Data are the GPAs of college freshmen in a classroom. Choose 10 by picking every 12th student, starting in row 5, column 1 and going down and across. Sampled GPAs are highlighted. How well do the sample statistics and histograms match the population?
Note: Although this sample happens to give poor estimates, the systematic method is not to blame unless the population items were arranged in a certain way (e.g., every 12th student is in the honors program). In this case, it's just sampling error.
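The caveat about ordering can be made concrete with a small, deliberately artificial example. In the sketch below, the classroom of 120 students and the GPA values are hypothetical (they are not the data from the slide): every 12th student is given a 4.0 and everyone else a 3.0, so a sampling interval of 12 lines up exactly with the pattern.

```python
def systematic_sample(items, k, start):
    """Take every k-th item beginning at index `start` (0-based)."""
    return items[start::k]

def mean(values):
    return sum(values) / len(values)

# Hypothetical, made-up population: every 12th student is an honors
# student with GPA 4.0; all others have GPA 3.0.
gpas = [4.0 if (i + 1) % 12 == 0 else 3.0 for i in range(120)]
population_mean = mean(gpas)

# Interval 12 matches the pattern, so the start decides everything:
honors_only = systematic_sample(gpas, 12, start=11)  # hits only honors students
no_honors = systematic_sample(gpas, 12, start=0)     # misses them entirely

print(round(population_mean, 3), mean(honors_only), mean(no_honors))
```

With this ordering no starting point yields a sample that mixes the two groups, which is the kind of arrangement the note warns about. When the order is unrelated to the variable measured, a poor estimate is ordinary sampling error.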
11
Cluster Sample
Data are the ages of concertgoers. Each cluster is a zip code. There are 11 clusters. Choose 3 clusters at random, then select 3 persons from each selected cluster. Sampled ages are highlighted. How well do the sample statistics and histograms match the population?
Should we sample without replacement?
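A two-stage draw like the one described here might be sketched as follows. The cluster labels and ages are made up for illustration (they are not the concert data), and the function name is hypothetical. The sketch samples without replacement at both stages, using `random.Random.sample`, so no zip code and no person can be picked twice.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick items within each.

    Both stages sample without replacement. rng.sample raises ValueError
    if a cluster holds fewer than n_per_cluster items.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical data: 11 zip-code clusters, each with 8 distinct made-up ages
clusters = {f"zip{z:02d}": [18 + z + 3 * j for j in range(8)] for z in range(11)}

sample = two_stage_cluster_sample(clusters, n_clusters=3, n_per_cluster=3, seed=42)
for name, ages in sample.items():
    print(name, ages)
```

Swapping `rng.sample` for `rng.choices(..., k=n_per_cluster)` would sample with replacement instead, in which case the same person could appear more than once in a cluster's draw.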
12
Bottom Line
Do a cost/benefit analysis before sampling
Define the purpose before you plunge
Expert advice can help with
  The sampling plan
  The sample sizes required
  Analysis of the sample