Sampling: What you don’t know can hurt you
Juan Muñoz



Outline of presentation
Basic concepts
– Scientific sampling
– Simple Random Sampling
– Sampling errors and confidence intervals
– Sampling errors and sample size
– Sample size and population size
– Non-sampling errors
– Sampling for rare events
– Two-stage sampling and clustering
– Stratification
– Design effect
Implementation issues
– Planning the survey
– Sample frames
– Excluded strata
– Paneling
– Nonresponse

Random Sampling
Random Sampling (a.k.a. Scientific Sampling) is a selection procedure that gives each element of the population a known, positive probability of being included in the sample
Random Sampling permits establishing Sampling Errors and Confidence Intervals
Other sampling procedures (purposive sampling, quota sampling, etc.) cannot do that
Other sampling procedures can also yield biased conclusions

Simple Random Sampling
In a Simple Random Sample, households are chosen
– With the same probability
– Independently of each other
In a Simple Random Sample, the selection probability of each household is p = n / N, where
– n = sample size
– N = size of the population
A Simple Random Sample is self-weighted
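As an illustration of the definition above, here is a minimal sketch of drawing a Simple Random Sample and computing p = n / N. The frame of 10,000 household identifiers and the sample size of 500 are hypothetical, and the draw is made without replacement, which is the usual practice.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; each unit has inclusion probability n / N."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sample frame: 10,000 household identifiers.
frame = list(range(10_000))
sample = simple_random_sample(frame, n=500, seed=1)

p = len(sample) / len(frame)  # selection probability p = n / N
weight = 1 / p                # identical for every household: the sample is self-weighted
print(f"p = {p}, weight = {weight}")
```

Because every household carries the same weight, unweighted sample proportions estimate population proportions directly.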

Simple Random Sampling
A simple random sample would be hard to implement...
– A list of all households in the country is generally not available to select the sample from
– In other words, we don’t have a good sample frame
– High transportation costs
– Difficult management
...but can be used to illustrate some basic facts about sampling
– Sampling Errors and Confidence Intervals
– The relationship between sampling error and sample size
– The relationship between sample size and population size
– Sampling vs. non-sampling errors

Sampling error and sample size
Standard error e when estimating a prevalence P in a sample of size n taken from an infinite population:
e = √(P(1 − P) / n)

Confidence intervals
In a sample of 1,000 households, 280 households (28 percent) have preschool children
The standard error is √(0.28 × 0.72 / 1,000) ≈ 1.42 percent
95 percent confidence interval: 28 ± 1.96 × 1.42 ≈ 28 ± 2.8 percent
… percent confidence interval: 28 ± …
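The figures in this example can be checked with a short sketch. It assumes the usual standard error of a proportion under simple random sampling from an infinite population, e = √(P(1 − P)/n), and a normal-approximation interval with z = 1.96; the function names are illustrative.

```python
import math


def standard_error(p, n):
    """Standard error of an estimated proportion p from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)


def confidence_interval(p, n, z=1.96):
    """Normal-approximation interval; z = 1.96 gives roughly 95 percent coverage."""
    e = standard_error(p, n)
    return p - z * e, p + z * e


p_hat, n = 280 / 1000, 1000
e = standard_error(p_hat, n)
low, high = confidence_interval(p_hat, n)
print(f"standard error: {100 * e:.2f} percent")
print(f"95% interval: {100 * low:.1f} to {100 * high:.1f} percent")
```

The standard error comes out at about 1.42 percent, matching the slide, and the 95 percent interval runs from about 25.2 to 30.8 percent.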

Sampling error and sample size
[Figure: standard error plotted against sample size]
To halve the sampling error, the sample size must be quadrupled
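The quadrupling rule follows from the square root in the standard error. A minimal sketch, assuming e = √(P(1 − P)/n) and reusing the prevalence of 28 percent from the earlier example:

```python
import math


def standard_error(p, n):
    """Standard error of a proportion under simple random sampling (infinite population)."""
    return math.sqrt(p * (1 - p) / n)


# Each fourfold increase in n halves the standard error.
for n in (1_000, 4_000, 16_000):
    print(f"n = {n:>6}: e = {100 * standard_error(0.28, n):.2f} percent")

ratio = standard_error(0.28, 4_000) / standard_error(0.28, 1_000)
print(f"ratio of errors at 4,000 vs 1,000 households: {ratio:.3f}")
```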

Sample size and population size
Standard error e when estimating a prevalence P in a sample of size n taken from a population of size N:
e = √(P(1 − P) / n) × √(1 − n / N)
The second factor, √(1 − n / N), is the finite population correction

Sample size and population size
[Figure: sample size needed for a given precision, plotted against population size]
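The relationship between population size and required sample size can be sketched numerically. This assumes the finite population correction takes the common form √(1 − n/N) (the slide's own formula did not survive); solving e² = P(1 − P)/n × (1 − n/N) for n gives n = n₀ / (1 + n₀/N), with n₀ = P(1 − P)/e². The target error of 2 percent and P = 0.5 are hypothetical.

```python
import math


def standard_error_fpc(p, n, N):
    """Standard error with the finite population correction sqrt(1 - n/N) (assumed form)."""
    return math.sqrt(p * (1 - p) / n * (1 - n / N))


def sample_size_needed(p, e, N=None):
    """Sample size that yields standard error e; N=None means an infinite population."""
    n0 = p * (1 - p) / e ** 2
    if N is None:
        return n0
    return n0 / (1 + n0 / N)


for N in (1_000, 10_000, 100_000, 1_000_000, 100_000_000):
    print(f"N = {N:>11,}: n = {sample_size_needed(0.5, 0.02, N):.0f}")
print(f"infinite population: n = {sample_size_needed(0.5, 0.02):.0f}")
```

Beyond a few tens of thousands of units, the required sample size barely changes: it depends on the precision wanted, not on the size of the population.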

Sampling vs. non-sampling errors
[Figure: sampling error, non-sampling error and total error, plotted against sample size]

Absolute and relative errors
The formula gives the absolute error e
But we are often interested in the relative error e / P
For rare events (small P), the relative error can be large, even with very big samples
This may be the case for some of the MDGs
– Infant / maternal mortality
– HIV/AIDS prevalence
– Extreme poverty
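A short sketch of why rare events are hard to measure, assuming the simple random sampling formula e = √(P(1 − P)/n) and defining the relative error as e / P. The prevalences and the sample size of 10,000 are hypothetical.

```python
import math


def relative_error(p, n):
    """Standard error divided by the prevalence being estimated."""
    return math.sqrt(p * (1 - p) / n) / p


n = 10_000
for p in (0.28, 0.05, 0.01, 0.001):
    absolute = math.sqrt(p * (1 - p) / n)
    print(f"P = {p:<5}: absolute error = {100 * absolute:.2f} points, "
          f"relative error = {100 * relative_error(p, n):.1f} percent")
```

The absolute error shrinks as P gets smaller, but the relative error grows roughly like 1/√(nP), so a prevalence of 1 in 1,000 is estimated to within about a third of its own value even with 10,000 households.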

Two-stage sampling
The country is divided into small Primary Sampling Units (PSUs)
In the first stage, PSUs are selected
In the second stage, households are chosen within the selected PSUs

Two-stage sampling
Solves the problems of Simple Random Sampling
Provides an opportunity to link community-level factors to household behavior
The sample can be made self-weighted if
– In the first stage, PSUs are selected with Probability Proportional to Size (PPS)
– In the second stage, a fixed number of households are chosen within each of the selected PSUs
The price to pay is the cluster effect
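The self-weighting property can be verified with a small sketch. It uses systematic PPS selection, which is one common way to implement PPS and is an assumption here, not necessarily the procedure the author had in mind. The PSU sizes, k = 3 PSUs and m = 10 households per PSU are hypothetical.

```python
import random


def pps_systematic(sizes, k, seed=None):
    """Select k PSU indices with probability proportional to size (systematic PPS).

    Assumes no PSU is larger than the sampling interval sum(sizes) / k.
    """
    rng = random.Random(seed)
    step = sum(sizes) / k
    start = rng.random() * step          # random start in [0, step)
    selected, cumulative, idx = [], 0, 0
    for j in range(k):
        target = start + j * step
        # Advance until the target falls inside the current PSU's size interval.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= target:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


# Hypothetical measures of size (households per PSU).
sizes = [120, 80, 300, 150, 90, 260, 200, 100, 180, 220]
k, m = 3, 10
N = sum(sizes)

selected = pps_systematic(sizes, k, seed=7)
for i in selected:
    p_psu = k * sizes[i] / N   # first stage: proportional to size
    p_hh = m / sizes[i]        # second stage: fixed take of m households
    print(f"PSU {i} (size {sizes[i]}): overall probability = {p_psu * p_hh:.6f}")
```

The PSU size cancels out in the product, so every household has the same overall probability k × m / N regardless of which PSU it lives in.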

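Cluster effect
Standard error grows when the sample of size n is drawn from k PSUs, with m households in each PSU (n = km)
e²_TSS = e²_SRS × [1 + ρ(m − 1)]
– e_TSS = standard error of the Two-Stage Sample
– e_SRS = standard error of a Simple Random Sample of the same size
– 1 + ρ(m − 1) = cluster effect
– ρ = intra-cluster correlation coefficient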

Cluster effects
For a total sample size of 12,000 households
[Table: cluster effect by intra-cluster correlation coefficient, number of PSUs and number of households per PSU]
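A table of this kind can be regenerated with a short sketch. It assumes the standard approximation cluster effect = 1 + ρ(m − 1); the values of ρ and the ways of splitting 12,000 households into PSUs are hypothetical, since the slide's own figures did not survive.

```python
def cluster_effect(rho, m):
    """Variance inflation from taking m households per PSU with intra-cluster correlation rho."""
    return 1 + rho * (m - 1)


n = 12_000
rhos = (0.01, 0.05, 0.10)  # hypothetical intra-cluster correlation coefficients

print("PSUs  hh/PSU  " + "  ".join(f"rho={rho:.2f}" for rho in rhos))
for m in (6, 12, 20, 30):
    k = n // m  # number of PSUs needed to reach 12,000 households
    effects = "  ".join(f"{cluster_effect(rho, m):8.2f}" for rho in rhos)
    print(f"{k:>4}  {m:>6}  {effects}")
```

For a fixed total sample, spreading the households over more PSUs with fewer households in each keeps the cluster effect closer to 1, at the price of more travel and listing work.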

Stratified Sampling
The population is divided up into subgroups or “strata”. A separate sample of households is then selected from each stratum.
There are two primary reasons for using a stratified sampling design:
– To potentially reduce sampling error by gaining greater control over the composition of the sample
– To ensure that particular groups within a population are adequately represented in the sample
These objectives are often contradictory in practice
The sampling fraction generally varies across strata
Sampling weights need to be used to analyze the data

Design effect
In a two-stage sample: Cluster effect = e²_TSS / e²_SRS
In a more complex sample (with two or more stages, stratification, etc.): Design effect = Deff = e²_CS / e²_SRS
It can be interpreted as an apparent shrinking of the sample size, as a result of clustering and stratification
It can be estimated with specialized software (such as Stata’s svy commands)
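The "apparent shrinking" interpretation can be made concrete with a small sketch: dividing the actual sample size by Deff gives the size of the simple random sample that would be equally precise. The standard error of 0.58 percent for the complex sample is a hypothetical figure; in practice it would come from survey software.

```python
import math


def design_effect(e_complex, e_srs):
    """Deff = ratio of the variances (squared standard errors) of the two designs."""
    return (e_complex / e_srs) ** 2


def effective_sample_size(n, deff):
    """Size of the simple random sample that would give the same precision."""
    return n / deff


n, p = 12_000, 0.28
e_srs = math.sqrt(p * (1 - p) / n)  # what a simple random sample of size n would give
e_complex = 0.0058                  # hypothetical estimate from the complex sample

deff = design_effect(e_complex, e_srs)
print(f"Deff = {deff:.2f}")
print(f"effective sample size = {effective_sample_size(n, deff):.0f} of {n} households")
```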

First stage sample frame: The list of Census Enumeration Areas
– Exhaustive
– Unambiguous
– Linked with cartography
– Measure of size (for PPS selection)
– Up to date (?)
– Area Units of adequate size

Second stage sample frame: The household listing operation
– What is involved? Training, organization, supervision, forms
– How long does it take? … households per enumerator/day
– How much does it cost? ~15% of the total cost of fieldwork
– How much earlier than the survey? As close as possible
– Is it always needed? Yes (almost)
– Dwellings or households? A dwelling listing is more permanent
– Who draws the sample? Ideally, central staff
– Asking extra questions during listing: Not recommended
– Can new technologies help? Yes (GPS)

Planning the survey
Selected PSUs should be allocated
– Among teams
– During the survey period

Excluded strata
Parts of the country may need to be excluded from the sample for security or other reasons

Panel Surveys can measure change better
Two estimates: Y_2001 and Y_2005
It seems that Y_2001 > Y_2005, but both measures are affected by sampling errors (e_2001 and e_2005)
The error of the difference Y_2005 − Y_2001 is
– √(e²_2001 + e²_2005) if the two samples are independent
– only √(e²_2001 + e²_2005 − 2ρ e_2001 e_2005) if the sample is the same, where ρ is the correlation between Y_2001 and Y_2005
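The gain from paneling can be quantified with a brief sketch. It assumes ρ denotes the correlation between the two estimates, so that their covariance is ρ × e_2001 × e_2005; the standard errors of 1.42 percent and the correlation of 0.7 are hypothetical.

```python
import math


def error_of_difference(e1, e2, rho=0.0):
    """Standard error of the difference between two estimates.

    rho is the correlation between the estimates: 0 for independent samples,
    typically positive when the same households are interviewed twice.
    """
    return math.sqrt(e1 ** 2 + e2 ** 2 - 2 * rho * e1 * e2)


e_2001, e_2005 = 1.42, 1.42  # hypothetical standard errors, in percentage points

independent = error_of_difference(e_2001, e_2005)
panel = error_of_difference(e_2001, e_2005, rho=0.7)
print(f"independent samples: {independent:.2f} points")
print(f"panel (rho = 0.7):   {panel:.2f} points")
```

With a correlation of 0.7, the panel detects a change of a given size with a little over half the error of two independent samples.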

Advantages and disadvantages of panels
Analytical advantages
– Can measure changes better
– Permit understanding better why things changed
– Permit correlating past and present behavior
Analytical disadvantages
– Become progressively less representative of the population
Practical advantages
– No sampling design needed for the second and subsequent surveys
Practical disadvantages
– Sample attrition
– Much harder to manage
– Better to design them prospectively rather than as an afterthought

Nonresponse
Possible solutions…
– Replace nonrespondents with similar households
– Increase the sample size to compensate for it
– Use correction formulas
– Use imputation techniques (hot-deck, cold-deck, warm-deck, etc.) to simulate the answers of nonrespondents
– None of the above ✔

The best way to deal with nonresponse is to prevent it
Lohr, Sharon L. Sampling: Design & Analysis (1999)

Total Nonresponse
[Diagram: factors affecting total nonresponse, grouped under Interviewers, Type of survey and Respondents. Labels: Training, Work Load, Motivation, Qualification, Data collection method, Demographic, Socio-economic, Economic, Burden, Motivation, Proxy, Availability]
Source: R. Platek, “Some factors affecting Non-Response,” Survey Methodology

Case study: The IHSES (Iraq Household Socio-Economic Survey)
Presenter: Ms Najla Murad, COSIT
Total sample size: 18,144 households
56 strata = 18 governorates × 3 zones (Urban Center / Other Urban / Rural), with 5 zones in Baghdad
No explicitly excluded strata
Within each stratum: 324 households, selected in two stages
– 54 blocks, selected with PPS
– In each block: 6 households (a cluster), selected with equal probability (EP)
The 162 clusters of each governorate were allocated
– To fieldworkers: 3 teams × 3 interviewers × 18 clusters
– In time: 18 waves × 9 clusters (randomly)
One wave = 20 days, so the fieldwork period = 12 months
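The design arithmetic and the random allocation of clusters to waves can be sketched as follows. The stratum count is read as 17 governorates with 3 zones plus Baghdad with 5, which is an interpretation of the slide; the cluster identifiers and the `allocate_to_waves` helper are hypothetical and do not reproduce the procedure actually used by the survey team.

```python
import random

# Design arithmetic as read from the slide.
strata = 17 * 3 + 5              # 56 strata
blocks_per_stratum = 54
households_per_block = 6
total_clusters = strata * blocks_per_stratum
total_households = total_clusters * households_per_block
print(strata, total_clusters, total_households)


def allocate_to_waves(clusters, waves=18, seed=None):
    """Randomly split a governorate's clusters into equal-sized fieldwork waves."""
    rng = random.Random(seed)
    shuffled = list(clusters)
    rng.shuffle(shuffled)
    per_wave = len(shuffled) // waves
    return [shuffled[w * per_wave:(w + 1) * per_wave] for w in range(waves)]


# Hypothetical identifiers for the 162 clusters of one governorate (3 strata x 54 blocks).
clusters = [f"cluster-{i:03d}" for i in range(162)]
schedule = allocate_to_waves(clusters, waves=18, seed=2007)
print(f"wave 1: {schedule[0]}")
print(f"fieldwork length: {len(schedule) * 20} days")
```

Randomizing the order of visits spreads each governorate's clusters evenly over the year, so seasonal effects are not confounded with geography.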

Case study: The IHSES (Iraq Household Socio-Economic Survey)
Performance of the contingency plans
If a cluster could not be visited at the scheduled time, it was swapped with one of the selected clusters not yet visited, chosen at random
At the end of fieldwork, 75 of the 3,024 originally selected clusters could not be visited (2.5 percent)
However, over 30 percent of the clusters were not visited at the scheduled time
In the clusters that could be visited, nonresponse was negligible (~1.5 percent)