1 Session 10 Sampling Weights: an appreciation
2 Session Objectives
- To provide an overview of the role of sampling weights in estimating population parameters
- To demonstrate the computation of sampling weights for a simple scenario
- To highlight the difficulties in calculating sampling weights for complex survey designs, and the need to seek professional expertise for this purpose
- To learn about file merging and continue with the ongoing project work
3 What are sampling weights?
- Real surveys are generally multi-stage
- At each stage, the probabilities of selecting units are generally not equal
- When population parameters such as a mean or a proportion are to be estimated, results from lower levels need to be scaled up from the sample to the population
- This scaling-up factor, applied to each unit in the sample, is called its sampling weight
4 A simple example
- Suppose, for example, that a simple random sample of 500 households (HHs) in a rural district (having 7349 HHs in total) showed that 140 were living below the poverty line
- Hence the estimated total in the population living below the poverty line = (140/500)*7349 = 2058
- The data for each HH was a 0/1 variable, with 1 allocated if the HH was below the poverty line
- Multiplying this variable by 7349/500 = 14.7 and summing leads to the same answer
- i.e. the sampling weight for each HH = 14.7
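The arithmetic in this example can be checked with a short Python sketch. Only the counts 500, 140 and 7349 come from the slide; the individual 0/1 records are not shown there, so the list below is a hypothetical reconstruction with 140 ones and 360 zeros.

```python
# Counts taken from the example on the slide.
population_hhs = 7349
sample_hhs = 500
poor_in_sample = 140

# Hypothetical reconstruction of the 0/1 data (the raw records are not shown):
# 1 = household below the poverty line, 0 = otherwise.
below_poverty = [1] * poor_in_sample + [0] * (sample_hhs - poor_in_sample)

# Equal-probability sample: every household carries the same weight N/n.
weight = population_hhs / sample_hhs

# Route 1: scale the sample proportion up to the population.
total_from_proportion = (poor_in_sample / sample_hhs) * population_hhs

# Route 2: multiply each 0/1 value by its sampling weight and sum.
total_from_weights = sum(weight * y for y in below_poverty)

print(round(weight, 1))
print(round(total_from_proportion), round(total_from_weights))
```

Both routes give the same estimated total, which is the sense in which 7349/500 acts as the sampling weight for every household.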
5 Why are weights needed?
- The above was a trivial example with equal probabilities of selection
- In general, units in the sample have very different probabilities of selection
- To allow for unequal probabilities of selection, each unit is weighted by the reciprocal of its probability of selection
- Thus sampling weight = 1/(probability of selection)
6 An example
- Consider a conveniently rectangular forest with a river running down the middle, dividing the forest into Region 1 and Region 2
- Region 1 is divided into 96 strips, each 50m x 50m, while Region 2 is divided into 72 strips
- The data are the number of small trees and the number of large trees in each strip
- Aim: to find the total number of large trees, the total number of small trees, and hence the total number of trees in the forest
7 Weights in stratified sampling
- Each region can be regarded as a stratum: 8 strips were chosen from region 1 and 6 from region 2
- The mean numbers of large trees per strip were:
  in region 1, 97.875, based on n1 = 8
  in region 2, 83.5, based on n2 = 6
- Hence the total number of large trees in the forest can be computed as (96*97.875) + (72*83.5) = 15408
- So what are the sampling weights used for each unit (strip)?
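One way to see the weights behind this total is the following Python sketch. It uses only the stratum sizes, sample sizes and sample means quoted in the example; the dictionary layout and variable names are illustrative choices, not part of the original material.

```python
# Stratum sizes (N), sample sizes (n) and sample means quoted in the example.
strata = {
    "region 1": {"N": 96, "n": 8, "mean_large": 97.875},
    "region 2": {"N": 72, "n": 6, "mean_large": 83.5},
}

total_large = 0.0
weights = {}
for name, s in strata.items():
    # Sampling weight = number of strips in the population that each
    # sampled strip stands for, i.e. N/n = 1 / (probability of selection).
    weights[name] = s["N"] / s["n"]
    # The sample total in the stratum (n * mean) scaled up by the weight
    # is the same as N * mean.
    total_large += weights[name] * s["n"] * s["mean_large"]

print(weights)
print(total_large)
```

Each sampled strip is multiplied by the same factor in both regions, which is what the next slide picks up.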
8 Self-weighting
- The sampling weights are the same for all strips, whether in region 1 or region 2. Why is this?
- What are the probabilities of selection here?
  In region 1, each unit is selected with prob = 8/96 = 1/12
  In region 2, each unit is selected with prob = 6/72 = 1/12
- A design in which the probabilities of selection are equal for all units is called a self-weighting design
- Regarding the sample as a simple random sample then gives the correct mean
9 Results for means
- It is easy to see that the mean number of large trees per strip in the forest is [(96/168)*97.875] + [(72/168)*83.5] = 91.71
- Regarding the 14 observations as though they were drawn as a simple random sample gives 91.71, i.e. the same answer
- The results for variances, however, differ:
  Variance of stratified sample mean = 1.28
  Variance of mean ignoring stratification = 2.18
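The agreement between the two means can be verified with a small Python sketch. It works from the stratum means and sample sizes given in the forest example (97.875 from 8 strips, 83.5 from 6 strips). The strip-level counts are not shown in the slides, so the two variances cannot be reproduced here.

```python
# Values from the forest example.
N1, n1, mean1 = 96, 8, 97.875   # region 1: strips, sampled strips, sample mean
N2, n2, mean2 = 72, 6, 83.5     # region 2

# Stratified estimate: weight each stratum mean by its share of the population.
stratified_mean = (N1 / (N1 + N2)) * mean1 + (N2 / (N1 + N2)) * mean2

# Treating the 14 strips as one simple random sample: pooled sample mean,
# rebuilt from the stratum sample totals n * mean.
pooled_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

print(round(stratified_mean, 2), round(pooled_mean, 2))
```

The two estimates coincide because the sampling fractions 8/96 and 6/72 are equal; with unequal fractions they would generally differ.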
10 More on weights
- It is important to note that the weights used in computing a mean, i.e. (96/168)*(1/8) = 1/14 for strips in region 1 and (72/168)*(1/6) = 1/14 for strips in region 2, are not sampling weights
- Sampling weights refer to the multiplying factor used when estimating a total
- Essentially, they represent the number of elements in the population that an individual sampling unit represents
11 Other uses of weights
- Weights are also used to deal with non-response and missing values
- If measurements are not available on all units for some reason, the sampling weights may be re-computed to allow for this
- e.g. in conducting the Household Budget Survey 2000/2001 in Tanzania, not all of the rural areas planned in the sampling scheme were visited. As a result, sampling weights had to be re-calculated and used in the analysis
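As a rough illustration of re-computing a weight, the Python sketch below inflates a base weight by the ratio of planned to visited units, so that the units actually measured also stand in for the missed ones. This simple ratio adjustment is an assumption made for illustration; the slides do not say how the Tanzanian weights were re-calculated. The function name and the figures are hypothetical, reusing the forest setting and supposing that only 6 of the 8 planned strips in region 1 had been measured.

```python
def adjusted_weight(base_weight, planned_units, visited_units):
    """Inflate a base weight so that visited units also represent missed ones.

    Illustrative ratio adjustment only; it assumes the missed units resemble
    the visited ones, which needs checking in any real survey.
    """
    if visited_units <= 0 or visited_units > planned_units:
        raise ValueError("visited_units must be between 1 and planned_units")
    return base_weight * planned_units / visited_units


# Hypothetical scenario: base weight 96/8 = 12, but only 6 of 8 strips measured.
base = 96 / 8
new_weight = adjusted_weight(base, planned_units=8, visited_units=6)
print(new_weight)  # 16.0, the same as 96/6
```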
12 Computation of weights
- The general approach is to find the probability of selecting a unit at every stage of the sample selection process
- e.g. in a 3-stage design, three sets of probabilities will result
- The probability of selecting each final-stage unit is then the product of these three probabilities
- The reciprocal of this probability is then the sampling weight
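The steps above can be sketched in Python as follows. The function name and the three stage probabilities (districts, villages, households) are hypothetical and chosen only to make the arithmetic easy to follow. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction
from math import prod


def sampling_weight(stage_probabilities):
    """Return 1 / (product of the selection probabilities over all stages)."""
    for p in stage_probabilities:
        if not 0 < p <= 1:
            raise ValueError("each selection probability must lie in (0, 1]")
    return 1 / prod(stage_probabilities)


# Hypothetical 3-stage design:
#   stage 1: 10 of 50 districts selected
#   stage 2: 5 of 20 villages selected within a chosen district
#   stage 3: 8 of 40 households selected within a chosen village
stage_probs = [Fraction(10, 50), Fraction(5, 20), Fraction(8, 40)]

overall_probability = prod(stage_probs)
weight = sampling_weight(stage_probs)
print(overall_probability, weight)  # 1/100 100
```

In this made-up design each sampled household would represent 100 households in the population. In a real survey the probabilities usually vary from unit to unit, so each final-stage unit gets its own weight.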
13 Difficulties in computations
- Standard methods, as illustrated in textbooks on sampling, often do not apply in real surveys
- Complex sampling designs are common
- Computing the correct probabilities of selection can then be very challenging
- Professional assistance is usually needed to determine the correct sampling weights and to use them correctly in the analysis
14 Software for dealing with weights
- When analysing data from complex survey designs, it is important to check that the software can deal with sampling weights
- Packages such as Stata, SAS and Epi-info have facilities for dealing with sampling weights
- However, you need to be careful that the approaches used are appropriate for your own survey design
Note: The above discussion was aimed at providing an overview of sampling weights. See the next slide for the work for the remainder of this session.
15 Practical work
- To understand how files may be merged, work through sections 10.5 and 10.6 of the Stata Guide
- Now move to your project work and practise file merging to address objectives 4 and 5 of your task
- A description of the work you should undertake is provided in the handout titled Practical 10