Session 10 Sampling Weights: an appreciation

Presentation transcript:

1 Session 10 Sampling Weights: an appreciation

2 Session Objectives
- To provide you with an overview of the role of sampling weights in estimating population parameters
- To demonstrate computation of sampling weights for a simple scenario
- To highlight difficulties in calculating sampling weights for complex survey designs and the need to seek professional expertise for this purpose
- To learn about file merging and continue with the ongoing project work

3 What are sampling weights?
- Real surveys are generally multi-stage.
- At each stage, the probabilities of selecting units at that stage are not generally equal.
- When population parameters such as a mean or proportion are to be estimated, results from lower levels need to be scaled up from the sample to the population.
- This scaling-up factor, applied to each unit in the sample, is called its sampling weight.

4 A simple example
- Suppose, for example, that a simple random sample of 500 households (HHs) in a rural district (having 7349 HHs in total) showed 140 living below the poverty line.
- Hence the estimated total in the population living below the poverty line = (140/500)*7349 = 2058 (rounded).
- The data for each HH were a 0,1 variable, with 1 allocated if the HH was below the poverty line.
- Multiplying this variable by 7349/500 = 14.7 and summing would lead to the same answer, i.e. the sampling weight for each HH = 14.7.
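
The arithmetic on this slide can be checked with a minimal Python sketch. The figures (7349, 500, 140) come from the slide; the variable names and the way the 0/1 indicator list is built are assumptions made only for illustration.

```python
# Figures from the slide: a simple random sample of 500 HHs out of 7349.
population_size = 7349
sample_size = 500
below_poverty_in_sample = 140

# Equal-probability design: every sampled HH carries the same weight N / n.
sampling_weight = population_size / sample_size

# Illustrative 0/1 indicator per sampled HH (1 = below the poverty line).
indicators = [1] * below_poverty_in_sample + [0] * (sample_size - below_poverty_in_sample)

# Weighted total: sum of (weight * indicator) over the sample.
estimated_total = sum(sampling_weight * y for y in indicators)

print(round(sampling_weight, 1))  # 14.7
print(round(estimated_total))     # 2058
```

Summing weight times indicator gives the same estimate as scaling the sample proportion by the population size, which is the point the slide makes.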

5 Why are weights needed?
- The above was a trivial example with equal probabilities of selection.
- In general, units in the sample have very different probabilities of selection.
- To allow for unequal probabilities of selection, each unit is weighted by the reciprocal of its probability of selection.
- Thus sampling weight = 1/(probability of selection).

6 An example
- Consider a conveniently rectangular forest with a river running down the middle, dividing the forest into Region 1 and Region 2.
- Region 1 is divided into 96 strips, each 50m x 50m, while Region 2 is divided into 72 strips.
- The data are the number of small trees and the number of large trees in each strip.
- Aim: to find the total number of large trees, the total number of small trees, and hence the total number of trees in the forest.

7 Weights in stratified sampling
- Each region can be regarded as a stratum: 8 strips were chosen from region 1 and 6 from region 2.
- The mean numbers of large trees per strip were 97.875 in region 1 (based on n1 = 8) and 83.5 in region 2 (based on n2 = 6).
- Hence the total number of large trees in the forest can be computed as (96*97.875) + (72*83.5) = 15408.
- So what are the sampling weights used for each unit (strip)?
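
As a quick check of the stratified total, the sketch below multiplies each stratum's number of strips by its sample mean and adds the results. The numbers are those in the slide's formula; the dictionary layout and names are assumptions for illustration.

```python
# Stratum sizes (N), sample sizes (n) and sample means from the forest example.
strata = {
    "region 1": {"N": 96, "n": 8, "mean": 97.875},
    "region 2": {"N": 72, "n": 6, "mean": 83.5},
}

# Estimated stratum total = number of strips in the stratum * mean per strip.
stratum_totals = {name: s["N"] * s["mean"] for name, s in strata.items()}

# Estimated forest total = sum of the stratum totals.
total_large_trees = sum(stratum_totals.values())

print(stratum_totals["region 1"])  # 9396.0
print(stratum_totals["region 2"])  # 6012.0
print(total_large_trees)           # 15408.0
```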

8 Self-weighting
- The sampling weights are the same for all strips, whether in region 1 or region 2. Why is this?
- What are the probabilities of selection here? In region 1, each unit is selected with probability 8/96. In region 2, each unit is selected with probability 6/72.
- A design where the probabilities of selection are equal for all selected units is called a self-weighting design.
- Regarding the sample as a simple random sample then gives the correct mean.
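
A short sketch, assuming nothing beyond the two selection probabilities quoted on this slide, confirms that both reduce to the same fraction and therefore give the same weight. Exact fractions are used to avoid rounding; the names are illustrative.

```python
from fractions import Fraction

# Selection probabilities from the slide: n_h / N_h in each region.
selection_prob = {
    "region 1": Fraction(8, 96),
    "region 2": Fraction(6, 72),
}

# Sampling weight is the reciprocal of the selection probability.
sampling_weight = {region: 1 / p for region, p in selection_prob.items()}

# The design is self-weighting if every unit has the same selection probability.
is_self_weighting = len(set(selection_prob.values())) == 1

for region in selection_prob:
    print(region, str(selection_prob[region]), str(sampling_weight[region]))
print(is_self_weighting)  # True
```

Both probabilities equal 1/12, so every sampled strip represents 12 strips in the forest.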

9 Results for means
- It is easy to see that the mean number of large trees per strip in the forest is [(96/168)*97.875] + [(72/168)*83.5] = 91.71.
- Regarding the 14 observations as though they were drawn as a simple random sample gives 91.71, i.e. the same answer.
- The results for variances, however, differ:
  Variance of stratified sample mean = 1.28
  Variance of mean ignoring stratification = 2.18
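
The equality of the two means can be verified from the stratum means alone (97.875 and 83.5, as used in the forest total). This is only a sketch: the strip-level counts are not given in the slides, so the pooled mean is rebuilt from the stratum sample sums, and the variances quoted above cannot be reproduced here.

```python
# Stratum sizes, sample sizes and sample means from the forest example.
N1, n1, mean1 = 96, 8, 97.875
N2, n2, mean2 = 72, 6, 83.5

# Stratified mean: stratum means weighted by their share of the population.
stratified_mean = (N1 / (N1 + N2)) * mean1 + (N2 / (N1 + N2)) * mean2

# Mean of the 14 observations treated as one simple random sample,
# rebuilt from the stratum sample sums n_h * mean_h.
pooled_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

print(round(stratified_mean, 2))  # 91.71
print(round(pooled_mean, 2))      # 91.71
```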

10 More on weights
- It is important to note that the weights used in computing a mean, i.e. (96/168)*(1/8) = 1/14 for strips in region 1 and (72/168)*(1/6) = 1/14 for strips in region 2, are not sampling weights.
- Sampling weights refer to the multiplying factor used when estimating a total.
- Essentially, they represent the number of elements in the population that an individual sampling unit represents.
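
The difference between the two kinds of weight can be made concrete with the forest figures. In the sketch below (names are illustrative), the weights used for the mean sum to 1 over the 14 sampled strips, whereas the sampling weights sum to the 168 strips in the population.

```python
from fractions import Fraction

# (N_h, n_h) for each stratum in the forest example.
strata = {"region 1": (96, 8), "region 2": (72, 6)}
N = sum(N_h for N_h, _ in strata.values())  # 168 strips in total

mean_weight = {}
sampling_weight = {}
for region, (N_h, n_h) in strata.items():
    # Weight applied to each sampled strip when computing the mean.
    mean_weight[region] = Fraction(N_h, N) * Fraction(1, n_h)
    # Weight applied to each sampled strip when estimating a total.
    sampling_weight[region] = Fraction(N_h, n_h)

# Sum each kind of weight over all sampled strips.
sum_mean_weights = sum(n_h * mean_weight[r] for r, (_, n_h) in strata.items())
sum_sampling_weights = sum(n_h * sampling_weight[r] for r, (_, n_h) in strata.items())

print(str(mean_weight["region 1"]), str(sampling_weight["region 1"]))  # 1/14 12
print(sum_mean_weights, sum_sampling_weights)                          # 1 168
```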

11 Other uses of weights
- Weights are also used to deal with non-response and missing values.
- If measurements are not available on all units for some reason, the sampling weights may be recomputed to allow for this.
- For example, in conducting the Household Budget Survey 2000/2001 in Tanzania, not all rural areas planned in the sampling scheme were visited. As a result, the sampling weights had to be recalculated and used in the analysis.
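
One common way to recompute weights is to inflate the base weight within a group of similar units by the ratio of units selected to units actually observed. The sketch below illustrates that idea only; it is not the procedure used in the Tanzanian survey, the function name `adjust_for_nonresponse` is hypothetical, and the example numbers are made up. It assumes all units in the group share the same base weight.

```python
def adjust_for_nonresponse(base_weight, n_selected, n_responding):
    """Inflate a common base weight so responding units also stand in
    for the selected units in the same group that were not observed."""
    if n_responding <= 0:
        raise ValueError("no responding units in this group")
    if n_responding > n_selected:
        raise ValueError("more responding units than selected units")
    return base_weight * n_selected / n_responding


# Hypothetical numbers: 8 units selected with base weight 12, only 6 observed.
adjusted = adjust_for_nonresponse(base_weight=12, n_selected=8, n_responding=6)
print(adjusted)      # 16.0
print(6 * adjusted)  # 96.0, the same population count as 8 * 12
```

The adjusted weights still sum to the number of population units the group was meant to represent, which is the property the adjustment is designed to preserve.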

12 Computation of weights
- The general approach is to find the probability of selecting a unit at every stage of the sample selection process.
- For example, in a 3-stage design, three sets of probabilities will result.
- The probability of selecting each final-stage unit is then the product of these three probabilities.
- The reciprocal of this probability is then the sampling weight.
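
The product-then-reciprocal rule can be sketched as a small function. The function name `sampling_weight` and the three stage probabilities in the example are hypothetical, chosen only to show the calculation; a real survey would take them from its own design.

```python
from fractions import Fraction


def sampling_weight(stage_probabilities):
    """Return 1 / (product of the selection probabilities at each stage)."""
    overall = Fraction(1)
    for p in stage_probabilities:
        if not 0 < p <= 1:
            raise ValueError("each selection probability must be in (0, 1]")
        overall *= p
    return 1 / overall


# Hypothetical 3-stage design: 10 of 50 districts, then 5 of 20 villages
# in a chosen district, then 12 of 120 households in a chosen village.
probs = [Fraction(10, 50), Fraction(5, 20), Fraction(12, 120)]
weight = sampling_weight(probs)
print(weight)  # 200
```

With these made-up figures the overall selection probability is 1/200, so each sampled household would represent 200 households in the population.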

13 Difficulties in computations
- Standard methods, as illustrated in textbooks on sampling, often do not apply in real surveys.
- Complex sampling designs are common.
- Computing correct probabilities of selection can then be very challenging.
- Professional assistance is usually needed to determine the correct sampling weights and to use them correctly in the analysis.

14 Software for dealing with weights
- When analysing data from complex survey designs, it is important to check that the software can deal with sampling weights.
- Packages such as Stata, SAS and Epi Info have facilities for dealing with sampling weights.
- However, you need to check that the approaches used are appropriate for your own survey design.
- Note: the above discussion was aimed at providing you with an overview of sampling weights. See the next slide for the work in the remainder of this session.

15 Practical work
- To understand how files may be merged, work through sections 10.5 and 10.6 of the Stata Guide.
- Now move to your project work and practise file merging to address objectives 4 and 5 of your task.
- A description of the work you should undertake is provided in the handout titled Practical 10.