Survey sampling Sampling & non-sampling error Bias

Survey sampling
Topics: sampling and non-sampling error; bias; simple sampling methods; sampling terminology; cluster sampling; design effect; stratified sampling; sampling weights

Why sample?
To make an inference about a population when studying the entire population is impractical or impossible.

Example of sampling
Goal: estimate the proportion of adults aged 18-65 in Port Elizabeth who have type 2 diabetes, using a sample selected from which to estimate the proportion.
Population: adults aged 18-65 living in Port Elizabeth
Inference: proportion with type 2 diabetes

Probability sampling
Each individual has a known, non-zero probability of selection, so the precision of estimates can be quantified.

Non-probability sampling
Cheaper and more convenient, but the quality of estimates cannot be assessed, and the sample may not be representative of the population.

Sampling error v. Non-sampling error

Sampling error
Random variability in sample estimates that arises from the randomness of the sample selection process. Its size can be quantified by estimating standard errors and confidence intervals.
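
A minimal sketch of how sampling error is quantified for a proportion under simple random sampling, using the usual normal approximation and ignoring the finite population correction. The counts (36 cases among 400 sampled adults) are hypothetical and only illustrate the arithmetic; `proportion_estimate` is an illustrative helper name, not part of any library.

```python
import math


def proportion_estimate(cases: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Estimate a proportion from a simple random sample.

    Returns (estimate, standard error, half-width of the z-based CI).
    Uses the normal approximation and ignores the finite population correction.
    """
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, z * se


# Hypothetical result: 36 of 400 sampled adults have type 2 diabetes.
p, se, half_width = proportion_estimate(36, 400)
print(f"p = {p:.3f}, SE = {se:.4f}, 95% CI = ({p - half_width:.3f}, {p + half_width:.3f})")
```

A larger sample shrinks the standard error roughly in proportion to 1/sqrt(n), which is why precision can be planned in advance for probability samples.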

Non-sampling error
Estimation error that arises from sources other than random variation, such as non-response, undercoverage of the survey, poorly trained interviewers, untruthful answers, and non-probability sampling. This type of error typically introduces bias.

What is bias?
We want to estimate the mean weight of all women aged 15-44 living in Coopersville. Suppose there are 50,000 such women and their true mean weight is 61.7 kg. We select a sample of 200 of these women and interview them, asking each woman her weight. The sample mean weight is 59.4 kg. Is our estimate biased?

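Bias
Suppose we could repeat the survey many, many times and then compute the mean of all the sample means. Say the mean of the means is 62.9 kg. Then
Bias = (mean of means) - (true mean) = 62.9 - 61.7 = 1.2 kg
So the single estimate of 59.4 kg falling below the true mean does not, by itself, tell us whether the estimator is biased; bias describes the average behavior of the estimator over repeated samples.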

Unbiased estimation
If (mean of the means) = (true mean), then the bias is zero, and we say that the estimator is unbiased. The "mean of the means" is called the "expected value" of the estimator.
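
The "repeat the survey many times" idea can be simulated directly. This is a sketch on a made-up population: the 50,000 weights are drawn from a normal distribution with mean 61.7 kg and SD 10 kg (an assumption, not data from the example), and `reporting_shift` is a hypothetical systematic reporting error added to every answer. With truthful answers the mean of the sample means lands on the true mean; with a constant 2 kg under-report it lands about 2 kg low, however many times the survey is repeated.

```python
import random
import statistics

rng = random.Random(2024)

# Hypothetical population of 50,000 weights (kg); simulated, not real data.
population = [rng.gauss(61.7, 10.0) for _ in range(50_000)]
true_mean = statistics.fmean(population)


def mean_of_means(reps: int, n: int, reporting_shift: float = 0.0) -> float:
    """Average the sample mean over many repeated simple random samples.

    reporting_shift is a hypothetical systematic error added to every
    reported weight (e.g. -2.0 if each woman under-reports by 2 kg).
    """
    means = []
    for _ in range(reps):
        sample = rng.sample(population, n)
        means.append(statistics.fmean(w + reporting_shift for w in sample))
    return statistics.fmean(means)


honest_bias = mean_of_means(2000, 200) - true_mean
underreport_bias = mean_of_means(2000, 200, reporting_shift=-2.0) - true_mean
print(f"bias with truthful answers:  {honest_bias:+.2f} kg")
print(f"bias with 2 kg under-report: {underreport_bias:+.2f} kg")
```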

Simple sampling methods
Task: select a sample of n individuals or items from a population of N individuals or items.
Common methods: simple random sampling and systematic sampling.

Simple sampling methods
Simple random sampling (SRS): each item in the population is equally likely to be selected, and each combination of n items is equally likely to be selected.
Systematic sampling (typical method): randomly select a starting point, then select every kth item thereafter.

Systematic sampling example
A stack of 213 hospital admission forms; select a sample of 15.
213/15 = 14.2, so select every 14th form.
Starting point: a random number between 1 and 14 (we choose 11).
First form selected is the 11th from the top.
Second form selected is the 25th from the top (11 + 14 = 25).
Third form selected is the 39th from the top (11 + 2x14 = 39).
And so forth . . .
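
A small sketch of the worked example, with a simple random sample drawn alongside for comparison. `systematic_sample` is an illustrative helper; `random.randint` and `random.sample` are standard-library functions (the latter draws without replacement, so every set of 15 forms is equally likely).

```python
import random


def systematic_sample(start: int, k: int, n: int) -> list[int]:
    """Positions (1-based) selected by taking every k-th item from `start`."""
    return [start + j * k for j in range(n)]


# Worked example: 213 forms, n = 15, k = 14, start chosen as 11.
forms = systematic_sample(start=11, k=14, n=15)
print(forms[:3], "...", forms[-1])   # [11, 25, 39] ... 207

# In practice the start is a random number between 1 and k.
random_start = random.randint(1, 14)
print(systematic_sample(random_start, k=14, n=15))

# A simple random sample of 15 of the 213 forms, for comparison.
print(sorted(random.sample(range(1, 214), 15)))
```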

Systematic sampling, continued
What is the probability that the 146th form will be selected? The 195th?
Does this qualify as a simple random sample? Why or why not?
Is there any potential problem arising from the use of systematic sampling in this situation?
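
One way to check answers to these questions is brute force: with a random start between 1 and 14 there are only 14 equally likely samples, so inclusion probabilities can be counted exactly. This sketch assumes the sample stops after exactly 15 forms, so the last reachable form is 14 + 14x14 = 210; under that assumption forms 211-213 can never be chosen. It also shows that adjacent forms never appear together, so not every set of 15 forms is possible.

```python
from fractions import Fraction

N, k, n = 213, 14, 15

# All possible systematic samples: one per equally likely start 1..k,
# each containing exactly n forms.
samples = [set(range(start, start + k * n, k)) for start in range(1, k + 1)]


def inclusion_probability(form: int) -> Fraction:
    """Share of the k possible samples that contain `form`."""
    hits = sum(form in s for s in samples)
    return Fraction(hits, len(samples))


for form in (146, 195, 211):
    print(form, inclusion_probability(form))   # 146 1/14, 195 1/14, 211 0

# Under SRS any pair of forms could be selected together; here it cannot.
together = any({11, 12} <= s for s in samples)
print(together)   # False
```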

The example used the typical quick method
In the preceding example, we selected every 14th form. Ideally, we would select every 14.2th form (see the later example of a two-stage sample of nurses). The quick method is easy and commonly used in the field; it is a good approximation to the more rigorous procedure.
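
A sketch of the more rigorous procedure, assuming one common variant: keep the exact interval I = N/n = 14.2, form sampling numbers start + j*I, and round each up to the next whole position. If the start is drawn uniformly from (0, I], every form has selection probability exactly n/N. Exact fractions avoid floating-point rounding surprises; `fractional_systematic` is an illustrative name, and the start of 11 simply reuses the value chosen in the example.

```python
import math
from fractions import Fraction


def fractional_systematic(N: int, n: int, start: Fraction) -> list[int]:
    """Systematic sample using the exact (fractional) interval I = N/n.

    `start` should be drawn uniformly from (0, I]; each sampling number
    start + j*I is rounded up to the next whole position.
    """
    interval = Fraction(N, n)
    return [math.ceil(start + j * interval) for j in range(n)]


picks = fractional_systematic(213, 15, Fraction(11))
print(picks[:4], "...", picks[-1])   # [11, 26, 40, 54] ... 210
```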

Systematic sampling: + and -
Advantages of systematic sampling: typically simpler to implement than SRS; can provide more uniform coverage.
Potential disadvantage: it can produce a bias if there is a systematic pattern in the sequence of items from which the sample is selected.

Role of simple sampling methods
These simple sampling methods are necessary components of more complex sampling methods such as cluster sampling and stratified sampling. We'll discuss these more complex methods next (following some definitions).

Definitions
Listing units (or enumeration units): the lowest-level sampled units (e.g., households or individuals)
PSUs (primary sampling units): the first units sampled (e.g., states or regions)
Sampling probability: for any unit eligible to be sampled, the probability that the unit is selected in the sample

More definitions
EPSEM sampling: "equal probability of selection method", i.e., a method in which each listing unit has the same sampling probability
Sampling frame: the set of items from which sampling is done, often a list of items

More definitions
Undercoverage: the degree to which we fail to identify all eligible units in the population, e.g., because of incomplete lists or incomplete or incorrect eligibility information

Still more definitions
Non-response: failure to interview sampled listing units (study subjects), due to refusal, death, physician refusal, inability to locate the subject, or unavailability

Still more definitions
Precision: the amount of random error in an estimate. It is often measured by the width or half-width of the confidence interval; the standard error is another measure. Estimates with a smaller standard error or narrower CI are said to be more precise.

CLUSTER SAMPLING single stage

Clusters
Subsets of the listing units in the population. The set of clusters must be mutually exclusive and collectively exhaustive. Examples: counties, townships, regions, institutions.

Example: single-stage cluster sampling
There are 361 nurses working at the 31 hospitals and clinics in Region 4. We wish to interview a sample of these nurses:
select a simple random sample of 5 hospitals/clinics;
interview all nurses employed at the 5 selected institutions.

Assessing the example
Hospitals/clinics are the PSUs; nurses are the listing units.
The sampling probability for each nurse is 5/31; thus, this is an EPSEM sample.
The sampling frame is the list of 31 hospitals and clinics.
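
Why 5/31? A nurse is interviewed exactly when her institution is among the 5 selected. This sketch counts the simple random samples of 5 institutions that contain a given institution and divides by the total number of possible samples; it uses only `math.comb` and `fractions.Fraction` from the standard library.

```python
from fractions import Fraction
from math import comb

n_clusters, n_selected = 31, 5

# Samples of 5 institutions that include one particular institution ...
containing = comb(n_clusters - 1, n_selected - 1)
# ... divided by all possible samples of 5 institutions.
p_nurse = Fraction(containing, comb(n_clusters, n_selected))
print(p_nurse)   # 5/31
```

The same probability holds for every institution regardless of its number of nurses, which is what makes this single-stage design EPSEM.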

CLUSTER SAMPLING two stage

Cluster sampling: two stage
Select a sample of clusters, as in the single-stage method. Then, from each selected cluster, select a subsample of listing units.

Cluster sampling: two stage
It is always nice to do EPSEM sampling, because such samples are self-weighting: no sampling weights are needed in the analysis. A common EPSEM method for two-stage sampling is PPS (probability proportional to size).

PPS sampling
The key to the method is that the sampling probabilities of clusters in the first stage are proportional to the "sizes" of the clusters (size = number of listing units in the cluster). At stage 2, select the same number of listing units from each selected cluster.

Nurse example revisited: two-stage sampling
We want to interview a sample of 36 nurses, and we can afford to visit 9 different hospitals/clinics. Thus, we need to interview 36/9 = 4 nurses at each institution.

Nurse example revisited: two-stage sampling
Stage 1: select a sample of 9 hospitals/clinics, with selection probability proportional to "size".
Stage 2: select a sample of 4 nurses from each selected institution.
At each stage, use one of the simple sampling methods.

Nurse example revisited: two-stage sampling
PSUs are the hospitals/clinics; listing units are the nurses.
Sampling frames:
Stage 1: list of 31 hospitals/clinics
Stage 2: lists of nurses at each selected hospital/clinic

Selecting the two-stage nurse sample
Sampling interval: I = 361/9 = 40.1
Starting point: random number between 1 and 40; we choose R = 14
First sampling number = R = 14
Second sampling number = 14 + 1x40.1 = 54.1
Third sampling number = 14 + 2x40.1 = 94.2
We have selected institutions 2, 5, 9, . . .

Two-stage nurse sample

Applying the sampling numbers
For each sampling number, choose the first unit whose cumulative "size" is equal to or greater than the sampling number.
Example: for sampling number 54.1, the first unit with cumulative size ≥ 54.1 is unit 5 (cumulative number of nurses = 57), so we select unit 5 for the sample.
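
A sketch of the lookup step, assuming the cumulative-size rule above. `bisect_left` on the list of cumulative sizes returns the first position whose cumulative size is at least the sampling number. The full table of 31 institutions is not reproduced here, so the sizes below are partly hypothetical: 12 and 7 nurses for institutions 1 and 2 and a cumulative total of 57 at institution 5 come from the slides, while 15 and 11 for institutions 3 and 4 are invented to fill the gap. The exact interval 361/9 is used instead of the rounded 40.1; it gives the same selections here.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate


def sampling_numbers(total: int, n: int, start: int) -> list[Fraction]:
    """start, start + I, start + 2I, ... with the exact interval I = total/n."""
    interval = Fraction(total, n)
    return [start + j * interval for j in range(n)]


def select_unit(cumulative: list[int], number: Fraction) -> int:
    """First unit (1-based) whose cumulative size is >= the sampling number."""
    return bisect_left(cumulative, number) + 1


numbers = sampling_numbers(361, 9, start=14)
print([round(float(x), 1) for x in numbers[:3]])   # [14.0, 54.1, 94.2]

# Sizes of institutions 3 and 4 are hypothetical (see lead-in).
cumulative = list(accumulate([12, 7, 15, 11, 12]))   # [12, 19, 34, 45, 57]
print([select_unit(cumulative, x) for x in numbers[:2]])   # [2, 5]
```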

Optional challenge
What is the selection probability for institution 1? 12/40.1 = 0.299
What is the selection probability for a nurse in institution 1? (12/40.1) x (4/12) = 4/40.1 ≈ 0.0997, i.e., 36/361 with the exact interval 361/9
What is the selection probability for a nurse in institution 2? (7/40.1) x (4/7) = 4/40.1 ≈ 0.0997, i.e., 36/361
All nurses have the same selection probability.
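
The cancellation behind "all nurses have the same selection probability" can be checked with exact fractions: the stage 1 probability is (cluster size)/I and the stage 2 probability is 4/(cluster size), so the size drops out and the product is 4/I = 36/361. This sketch assumes each cluster is no larger than the interval I (otherwise its stage 1 "probability" would exceed 1 and it would be a certainty selection).

```python
from fractions import Fraction

TOTAL_NURSES, N_CLUSTERS, TAKE = 361, 9, 4
interval = Fraction(TOTAL_NURSES, N_CLUSTERS)


def nurse_probability(cluster_size: int) -> Fraction:
    """Overall selection probability for one nurse (valid while size <= interval)."""
    p_cluster = cluster_size / interval          # stage 1: PPS systematic
    p_within = Fraction(TAKE, cluster_size)      # stage 2: 4 nurses per cluster
    return p_cluster * p_within


for size in (12, 7):   # institutions 1 and 2 in the example
    prob = nurse_probability(size)
    print(size, prob, float(prob))   # each line shows 36/361
```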

Why do cluster sampling instead of a simple sampling method?
Advantages: reduced logistical costs (e.g., travel); a list of all 361 nurses may not be available, and cluster sampling reduces listing labor.
Disadvantages: estimates are less precise; analysis is more complicated (requires special software).

Design effect
The relative increase in the variance of an estimate due to the sampling design, where "variance" = (standard error)$^2$.
Formula: with $s_1$ = standard error under simple random sampling and $s_2$ = standard error under the complex sampling design (e.g., cluster sampling),
$\text{design effect} = (s_2 / s_1)^2$
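
A minimal sketch of the formula. The two standard errors are hypothetical values for the same estimate and sample size; the ratio n / deff (an "effective sample size") is a common companion figure, shown here as an illustration rather than something stated in the slides.

```python
def design_effect(se_complex: float, se_srs: float) -> float:
    """Ratio of variances: (s2 / s1) ** 2."""
    return (se_complex / se_srs) ** 2


# Hypothetical standard errors for the same estimate and the same n.
s1 = 0.020   # under simple random sampling
s2 = 0.030   # under the cluster design
deff = design_effect(s2, s1)
n = 360
print(f"deff = {deff:.2f}; an SRS of about {n / deff:.0f} would be as precise as n = {n}")
```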

Design effect for cluster sampling
For cluster sampling designs, the design effect is almost always greater than 1, because listing units in the same cluster tend to resemble each other. This means that estimates from a survey done with cluster sampling are less precise than the corresponding estimates from a survey of the same sample size done with simple random sampling.

Cluster sizes
The recommended "take" per cluster is 20-40 for multi-purpose surveys. Time and resource limitations will often dictate the maximum number of clusters you can include in the study. Including more clusters improves the precision of your estimates more than a corresponding increase in sample size within the clusters already in the sample.
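
The last point can be illustrated with Kish's standard approximation for equal-sized clusters, deff ≈ 1 + (m - 1)ρ, where m is the take per cluster and ρ the intra-cluster correlation. The formula is not in the slides, and ρ = 0.05 is a hypothetical value. Holding the total sample at 360, spreading it over more, smaller clusters lowers the design effect and raises the effective sample size; with any ρ > 0 the design effect stays above 1.

```python
def kish_deff(take: int, rho: float) -> float:
    """Kish approximation for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (take - 1) * rho


TOTAL_N = 360
RHO = 0.05   # hypothetical intra-cluster correlation, for illustration only

for n_clusters, take in [(9, 40), (18, 20), (36, 10)]:
    assert n_clusters * take == TOTAL_N
    deff = kish_deff(take, RHO)
    print(f"{n_clusters:2d} clusters x {take:2d} per cluster: "
          f"deff = {deff:.2f}, effective n = {TOTAL_N / deff:.0f}")
```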

STRATIFIED SAMPLING

Strata
Subsets of the listing units in the population. The set of strata must be mutually exclusive and collectively exhaustive. Strata are often based on demographic variables such as age, sex, and race.

Stratified sampling
Sample from each stratum. Often, sampling probabilities vary across strata.
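
A minimal sketch of stratified selection: an independent simple random sample is drawn within each stratum, so sampling probabilities can differ by stratum. The two strata (50 and 100 units, 10 sampled from each) and the unit IDs are hypothetical; `stratified_sample` is an illustrative helper built on `random.Random.sample`.

```python
import random


def stratified_sample(strata: dict[str, list[int]], n_per_stratum: dict[str, int],
                      rng: random.Random) -> dict[str, list[int]]:
    """Draw an independent simple random sample within each stratum."""
    return {name: rng.sample(units, n_per_stratum[name])
            for name, units in strata.items()}


rng = random.Random(1)
# Hypothetical unit IDs: 50 in the first stratum, 100 in the second.
strata = {"18-29": list(range(1, 51)), "30-69": list(range(51, 151))}
sample = stratified_sample(strata, {"18-29": 10, "30-69": 10}, rng)

for name, units in sample.items():
    prob = len(units) / len(strata[name])
    print(f"{name}: {len(units)} of {len(strata[name])}, sampling probability {prob:.2f}")
```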

Stratified sampling
Advantages: guarantees coverage across strata; can over-sample some strata in order to obtain precise within-stratum estimates; typically, design effect < 1.
Disadvantages: with unequal sampling probabilities, sampling weights must be included in the analysis; analysis is more complicated and requires special software.

Example: sampling breast cancer cases for the Women's CARE Study
Stratification variables: geographic site, race (2 races), five-year age group.
Over-sampled younger women; over-sampled black women.

Example: sampling households for a reproductive health survey in 11 refugee camps in Pakistan
Selected a simple random sample of households from within each of the 11 camps. All households were selected with the same probability.

Refugee camp sampling

The sampling operation
Must be carefully controlled: don't leave it to discretion in the field; use a carefully defined procedure.
Document what you did, for reference during analysis and to defend your study.

Sampling frames
A list containing all listing units is great if you can get it; it is OK if it includes some ineligible units.
Problems are associated with geographic (location-based) sampling, such as map-based sampling and EPI sampling.

Sampling weights
The inverse of the net sampling probability.
Interpretation: the sampling weight for a sampled individual is the number of individuals his/her data "represent".

Example: sampling weights
There are 150 employees in a firm:
stratum 1: 50 employees aged 18-29
stratum 2: 100 employees aged 30-69
We sample 10 from each stratum. Sampling probabilities are
stratum 1: 10/50 = 0.20
stratum 2: 10/100 = 0.10

Example: sampling weights (continued)
Sampling weights:
stratum 1: 1/0.20 = 5
stratum 2: 1/0.10 = 10
Interpretation: each sampled employee in stratum 1 represents 5 employees; each sampled employee in stratum 2 represents 10 employees.
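
The employee example as code, using exact fractions so the weights come out as whole numbers. A useful check, implied by the "represents" interpretation, is that the weights of all sampled employees add up to the firm's 150 employees.

```python
from fractions import Fraction

# stratum: (population size, sample size), from the example
strata = {"18-29": (50, 10), "30-69": (100, 10)}

weights = {}
for name, (pop_size, n_sampled) in strata.items():
    prob = Fraction(n_sampled, pop_size)
    weights[name] = 1 / prob
    print(name, "probability", float(prob), "weight", weights[name])

# Weighted count of the sampled employees recovers the firm's size.
total = sum(weights[name] * n_sampled for name, (_, n_sampled) in strata.items())
print(total)   # 150
```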

What about non-response?
1 employee in the stratum 1 sample and 3 employees in the stratum 2 sample refuse to participate in the survey.
Net sampling probabilities:
stratum 1: 9/50 = 0.18
stratum 2: 7/100 = 0.07

Revised sampling weights
Sampling weights revised for non-response:
stratum 1: 1/0.18 = 5.56
stratum 2: 1/0.07 = 14.29
This computation is often done by multiplying the original sampling weights by adjustment factors that account for non-response rates.
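
A sketch of the adjustment-factor route described above: base weight (population / sampled) times an adjustment factor (sampled / responded, the inverse of the response rate). It reproduces the revised weights of 5.56 and 14.29, and the respondents' weights still add up to 150 employees. `adjusted_weight` is an illustrative helper name.

```python
from fractions import Fraction


def adjusted_weight(pop_size: int, sampled: int, responded: int) -> Fraction:
    """Base weight times non-response adjustment factor (= pop_size / responded)."""
    base_weight = Fraction(pop_size, sampled)
    adjustment = Fraction(sampled, responded)   # inverse of the response rate
    return base_weight * adjustment


# stratum: (population size, number sampled, number who responded)
strata = {"18-29": (50, 10, 9), "30-69": (100, 10, 7)}

for name, (pop_size, sampled, responded) in strata.items():
    print(f"{name}: revised weight {float(adjusted_weight(pop_size, sampled, responded)):.2f}")

total = sum(adjusted_weight(*counts) * counts[2] for counts in strata.values())
print(total)   # 150
```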

Post-stratification weighting
Define strata, which may or may not have been used as strata in the sampling design. Compute sampling probabilities as the proportion of each stratum that was actually sampled, then compute sampling weights from these sampling probabilities. This allows post-hoc treatment of unequal representation of population segments in the sample.
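
A minimal sketch of the steps above: for each post-stratum h, the achieved sampling probability is n_h / N_h (respondents over known population count), and the weight is its inverse, N_h / n_h. Applied to the employee example with the age groups as post-strata, it gives 50/9 ≈ 5.56 and 100/7 ≈ 14.29, the same as the non-response-adjusted weights. `poststratification_weights` is an illustrative name, not a library function.

```python
from fractions import Fraction


def poststratification_weights(pop_counts: dict[str, int],
                               sample_counts: dict[str, int]) -> dict[str, Fraction]:
    """Weight for every respondent in post-stratum h: N_h / n_h."""
    return {h: Fraction(pop_counts[h], sample_counts[h]) for h in pop_counts}


# Known staff counts and the respondents actually obtained in each age group.
weights = poststratification_weights({"18-29": 50, "30-69": 100},
                                     {"18-29": 9, "30-69": 7})
for h, w in weights.items():
    print(h, w, round(float(w), 2))
```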

Discussion topics
What is the population of interest?
Infinite populations
Selecting random numbers
Selecting simple random samples: from finite populations; from infinite populations
Analysis software for complex surveys