Weights in the DHS A brief overview


1 Weights in the DHS A brief overview
Note to facilitators: Wherever possible, add or modify the examples in the presentation to cover the specific data files and topics of interest to workshop participants. If you are only using Stata/SPSS, please modify slides and examples as needed. This presentation briefly introduces sampling and weighting procedures used in the DHS, and how to use sampling weights in analysis. Session 16 will cover sampling design in further detail, with instructions on how to take DHS’s complex sample design into account in your analysis.

2 Overview
- Sample sizes in the DHS
- Why weights are used in the DHS
- How to weight data in Stata/SPSS
- Replicating weighted and unweighted results to match DHS final reports: frequencies and crosstabulations
In this presentation, we will talk about some of the general procedures the DHS programme uses to select samples and create weights for DHS datasets. We will then discuss how to choose the most appropriate weight for your analytical purposes, and give you practice using weights to create properly weighted frequencies and crosstabulations with DHS data.

3 Group meant to represent a larger population
DHS data are a SAMPLE of people selected to represent an entire country Group meant to represent a larger population NOT a census – not everyone in the country is interviewed A sample is: A group of people selected for a study, and A group meant to represent a larger population. The participants in the DHS are a sample of individuals selected to be representative of the entire population of the country. As an example, we could pretend that the people in this room represent the entire population of a country. We could sample some, but not all, of the people in this room as a SAMPLE to represent ALL of the people in the room. We wouldn’t want to sample EVERYONE in the room – that would be a census. Note to facilitators: update map for your country

4 Sample Sizes in the DHS The larger the sample, the more reliable the survey findings (in general) Need to balance sample size with budgetary constraints The more sub-national areas (e.g., provinces or districts) for which data are desired, the larger the sample must be Sample sizes in the DHS are large, in the thousands, which produces estimates for indicators that we can be confident in. Some indicators, such as mortality and fertility, require larger sample sizes than others in order to obtain a reliable estimate. The DHS provides estimates at the national level, for urban and rural areas, and usually for sub-national administrative areas. Providing representative samples at the sub-national level requires a larger sample and increases the cost of the survey. Many more households, women, and men need to be included in the survey. The sample sizes in the DHS depend on funding and the needs of the government. DHS sampling attempts to balance methodological sampling concerns against cost effectiveness; in other words, the DHS strives to get the “best” indicators for the best price.

5 Ideal Sample Sizes for Indicators
Indicator                              Ideal sample size
Infant mortality rate (IMR)            800 to 1,000 women
Total fertility rate (TFR)             800 to 1,000 women
Contraceptive prevalence rate (CPR)    400 to 500 women
This table shows the minimum sample sizes for some basic indicators. These are the ideal numbers that are required to achieve a reasonable level of accuracy when estimating the indicator. If sample sizes are lower than those indicated in the table, the level of accuracy is reduced and the confidence intervals are larger.

6 Sample sizes – most recent DHS
Kenya: 9,057 households; representative at regional level and urban/rural level Nigeria: 34,070 households; representative at state and zonal level Rwanda: 12,792 households; representative at urban/rural level and provincial level Uganda: 9,864 households; representative at national level, urban/rural level, sub-national regions including IDP camps Refer to Chapter 1 and appendix tables for details on sampling Note to organizers: The examples of sample sizes on this slide can be substituted with examples from country surveys relevant to workshop participants. For Fellows Data Users’ Workshops, this would be the surveys from the fellows’ country teams, or the surveys the teams will be analyzing. For Further Analysis Workshops, this would be the survey for the country holding the workshop and a selection of regional neighboring countries. Note to facilitators: use this slide instead of the next if your workshop covers multiple countries and you want to facilitate comparisons. The DHS provides estimates at the national level, for urban and rural areas, and usually for about five to ten sub-national administrative areas. Providing representative samples at the sub-national level requires a larger sample and increases the cost of the survey. Many more households, women, and men need to be included in the survey. This slide shows the total sample size for several recent surveys, and the sub-national levels at which they are representative. Details about sampling procedures, sample size, and representative areas for the survey(s) you are working with can be found in Chapter 1 and appendices of the survey’s Final Report.

7 Number of sampling domains
The goal of the ZDHS was to provide reliable estimates for each of the following areas: Zimbabwe as a whole Urban and rural areas of Zimbabwe 10 provinces (including Harare and Bulawayo) Women ages 15-49, men 15-54 Note to facilitators: use this slide instead of the previous if your workshop focuses on one country, and update the examples for your country. The DHS provides estimates at the national level, for urban and rural areas, and usually for about five to ten sub-national administrative areas. Providing representative samples at the sub-national level requires a larger sample and increases the cost of the survey. Many more households, women, and men need to be included in the survey.

8 Computing Sample Size
- Roughly, we need about 1,000 households to produce accurate estimates for each of the 10 provinces.
- Larger samples are usually needed to provide accurate estimates for urban areas (rarer events).
- Consider non-response rates. Non-response may vary by area: generally higher in Zimbabwe, especially for males and especially in urban areas.
- Result: final target sample size of approx. 11,000 households; actual realized sample of 9,756 households.
The computation of the desired sample size for a DHS survey is household-based. The DHS considers expected non-response rates, based on its extensive experience, and also considers the quality and recency of the sampling frame. We also consider the minimum size needed for each sub-group for which we anticipate producing indicators. As shown before, for our standard fertility and mortality indicators, we need approximately 1,000 households for each geographic area. For example, the Zimbabwe DHS needs to survey about 10,000 households to produce indicators representative of each of its 10 provinces (1,000 x 10). The final target is set somewhat higher, at approximately 11,000 households, to allow for non-response.
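The household-based arithmetic on this slide can be sketched in a few lines of Python. The 90% response rate and the function name are illustrative assumptions, not DHS figures; the slide itself gives only the rule of thumb of 1,000 households per province and the approximate target of 11,000.

```python
def target_households(domains, per_domain, response_pct):
    """Households to select so that about domains * per_domain respond.

    response_pct is an ASSUMED expected response rate, in whole percent.
    """
    base = domains * per_domain  # e.g. 10 provinces x 1,000 households
    # Integer ceiling division: inflate the base to allow for non-response.
    return -(-base * 100 // response_pct)


base_sample = 10 * 1000
inflated = target_households(10, 1000, 90)  # 90% is a hypothetical rate
print(base_sample, inflated)
```

With a response rate in this range, the inflated figure lands near the "approx. 11,000" target quoted on the slide; the actual DHS calculation also weighs the sampling frame and urban/rural differences.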

9 Making the data representative: sampling weights
Why weight the data? The population is not evenly distributed among different regions, but minimum n’s needed. Response rates (especially for HIV testing) may be very different by region or urban/rural residence Weights are used to restore the representativeness of the sample, so the total sample “looks like” the country’s total population Weights “take into account” or “adjust for” disproportionate sampling and non-response In almost every country, the population is not evenly distributed across the different regions being surveyed. But we need at least the minimum sample size in each region to obtain reliable estimates, for example, of fertility and infant mortality. The DHS undersamples larger regions and oversamples smaller regions to achieve these minimum required sample sizes. In addition to differences in population sizes, response rates may be very different by region or urban/rural residence, especially for HIV testing and other biomarker data collection. These differences are also taken into account in the calculation of weights. Weights are used to restore the representativeness of the sample, so that the total sample again resembles the country’s total population. Weights take into account, or adjust for, disproportionate sampling and non-response.

10 What is a Weight? Technical definition: An adjustment factor applied to each case in tabulations to adjust for differences in probability of selection and interview among cases in a sample, either due to design or happenstance In practice: A number that is multiplied by each case (woman, child, household, couple) to “weight up” or “weight down” that observation If a woman in a survey has a weight of 1.2, she represents 1.2 women in the total survey population If a household in a survey has a weight of 0.8, it represents 0.8 households in the total survey population Technically, a sample weight is an adjustment factor applied to each case in tabulations to adjust for differences in probability of selection and interview among cases in a sample, either due to design or happenstance. In practical terms, a sample weight is a number that is multiplied by each case (i.e., woman, child, household, or couple) to “weight up” or “weight down” that observation. If a woman in a survey sample has a weight of 1.2, she represents 1.2 women in the total survey population. If a household in a survey sample has a weight of 0.8, it represents 0.8 household in the total survey population.

11 Let’s Imagine…
- We need to interview approximately 14,500 households in Ethiopia to have reliable estimates at the national and sub-national levels: Ethiopia as a whole; urban and rural areas of Ethiopia; 11 geographic areas (9 regions and 2 city administrations).
- We select a simple random sample of households based on the regional population distribution.
Weights are easier to understand with an example. Here, we’ll use Ethiopia as an example, because the population is very unevenly distributed across the country. This example shows how sampling weights were applied in the 2005 Ethiopia DHS. Let’s assume that the sampling experts have determined that we need to interview about 14,500 households in order to get reliable estimates of our indicators at the national and sub-national levels. We select these households from each region EXACTLY in proportion to the distribution of households in the country; in other words, we take a simple random sample of households in Ethiopia.

12 Distribution of Households by Region
Ethiopia 2005: Distribution of Households by Region
This is how households in Ethiopia are distributed by region. What will our simple random sample of 14,500 households look like based on this distribution?
This pie chart shows the distribution of households by region in Ethiopia. It is clear that some regions have much larger populations than others. Ask: What would the sample look like based on this distribution? Answer: The large regions would have very big samples, and the small regions would have very small samples.

13 Regional Distribution in a Simple Random Sample
Region               % of sample   No. of households
Tigray                    6.3             914
Affar                     1.6             232
Amhara                   25.4           3,683
Oromiya                  36.0           5,220
Somali                    4.3             624
Benishangul-Gumuz         1.0             145
SNNP                     19.9           2,886
Gambela                   0.4              58
Harari                    0.3              44
Addis Ababa               4.2             609
Dire Dawa                 0.5              73
Total                    100%          14,500
Notice the sample only includes 58 households in Gambela. Is that enough households to give reliable estimates for Gambela?
This table shows how a simple random sample would be divided across Ethiopia’s 11 regions. Look closely at the numbers for Gambela, one of the smallest regions. Only 0.4% of Ethiopia’s population lives in Gambela. Therefore, in a simple random sample only 58 households (0.4% of the total sample of 14,500 households) would be selected from Gambela. Ask: Can we really get enough information about Gambela by selecting only 58 households? Answer: No. We cannot calculate a fertility or a mortality rate with data from only 58 households. Remember, we only have enough resources to select 14,500 households. What do we do? The answer is that we oversample in some regions and undersample in others, or redistribute our 14,500 households differently than we would in a simple random sample.
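The proportional allocation in this table can be reproduced with a short Python sketch. The percentages are the ones shown on this slide; the variable and function names are made up for illustration, and shares are stored in tenths of a percent so the rounding is done in integer arithmetic.

```python
# Percent of households by region (from the slide), in tenths of a percent.
SHARE_TENTHS = {
    "Tigray": 63, "Affar": 16, "Amhara": 254, "Oromiya": 360,
    "Somali": 43, "Benishangul-Gumuz": 10, "SNNP": 199,
    "Gambela": 4, "Harari": 3, "Addis Ababa": 42, "Dire Dawa": 5,
}
TOTAL_SAMPLE = 14500


def proportional_allocation(share_tenths, total):
    """Households per region, rounded half up to a whole household."""
    return {region: (tenths * total + 500) // 1000
            for region, tenths in share_tenths.items()}


allocation = proportional_allocation(SHARE_TENTHS, TOTAL_SAMPLE)
for region, n in allocation.items():
    print(f"{region:<18} {n:>6,}")
```

Because each region is rounded separately, the regional counts do not have to add up to exactly 14,500.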

14 Actual EDHS Sample
Region               No. of households   % of sample
Tigray                     1,349               9
Affar                        935               6
Amhara                     2,158              15
Oromiya                    2,241              15
Somali                       901               6
Benishangul-Gumuz            954               7
SNNP                       2,012              14
Gambela                      925               6
Harari                       960               7
Addis Ababa                1,400              10
Dire Dawa                    810               6
Total                     14,645             100%
If we oversample in Gambela, we have enough households to be confident that estimates will be reliable for Gambela. Conversely, we can undersample in Oromiya and be confident that the sample is still large enough to produce reliable estimates.
In order to include enough households to provide representative data at the regional level, we must oversample in regions with smaller populations and undersample in regions with larger populations. The total number of households in the sample (14,645) remains about the same, but the households are redistributed so as to select more households in smaller regions. This table shows the number of households selected in the actual 2005 EDHS sample. You can see that the survey selected 925 households in Gambela instead of 58. This provides enough information about households in Gambela to describe their health and population status accurately. In contrast, the EDHS undersampled in the largest region, Oromiya. So many households from Oromiya would be included in a simple random sample (over 5,000) that their number could be reduced without affecting the representativeness of the regional sample.

15
Region               No. of households selected   % of sample
Tigray                     1,349                        9
Affar                        935                        6
Amhara                     2,158                       15
Oromiya                    2,241                       15
Somali                       901                        6
Benishangul-Gumuz            954                        7
SNNP                       2,012                       14
Gambela                      925                        6
Harari                       960                        7
Addis Ababa                1,400                       10
Dire Dawa                    810                        6
Total                     14,645                      100%
But now Gambela’s households represent 6% of the DHS sample when they only make up 0.4% of the population. And Oromiya is only 15% of the sample when it makes up 36% of the population. Why is this a problem?
EXPLAIN that, after oversampling, the households selected in Gambela make up 6% of the total DHS sample, a much higher percentage than their 0.4% share of Ethiopia’s population. In contrast, after undersampling, the households selected in Oromiya make up 15% of the DHS sample, which is much less than their 36% share of Ethiopia’s population. Ask: Why is this a problem? Answer: Because less than 1% of the population lives in Gambela, but Gambelans make up 6% of the sample. And 36% of the population lives in Oromiya, but they make up only 15% of the sample. The sample will no longer be representative of the population.

16 The Problem of Over- and Undersampling
Suppose we want to know what percent of households in Ethiopia own an insecticide-treated net (ITN). Assume, hypothetically, that an NGO did a huge ITN distribution campaign in Gambela, so 70% of households in Gambela own an ITN, while the average in all the other regions is 20%. Because of oversampling, households in Gambela make up 6% of the sample instead of their 0.4% share of the population. Why is this a problem?
Note to facilitators: if participants are unlikely to be familiar with malaria and mosquito nets, create a different example that is more relevant to your workshop. For example, you could discuss different fertility rates in Gambela vs. the rest of the country.
Here’s why oversampling and undersampling can be a problem when calculating national averages for indicators. Let’s assume that after a mass ITN distribution campaign, households in Gambela have an unusually high rate of ITN ownership of 70%, compared with just 20% for households in other regions. Ask: Why is this a problem? Answer: This is a problem because Gambela will have too much influence on the national average. The households in Gambela account for 6% of the EDHS sample when, in reality, they make up less than 1% of the population. Therefore, the high rate of ITN ownership in Gambela would skew the national average upward. To fix this problem, the DHS applies sampling weights. Sampling weights are designed to correct for the fact that people in some areas or subgroups are more likely than others to be selected for interviews and included in the survey sample. Because of over- and undersampling in the 2005 EDHS, for example, households in Gambela had a greater probability of being selected than households in Oromiya.
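To see how much the oversampling would skew the result, the hypothetical ITN example can be worked through numerically. This is only a sketch of the slide's two-group scenario (Gambela versus everywhere else); the function name is an assumption, and exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction


def national_average(gambela_share, gambela_rate, other_rate):
    """Average of two groups, weighted by Gambela's share of households."""
    return gambela_share * gambela_rate + (1 - gambela_share) * other_rate


# Hypothetical ownership rates from the example: 70% in Gambela, 20% elsewhere.
unweighted = national_average(Fraction(6, 100), 70, 20)  # Gambela is 6% of the sample
weighted = national_average(Fraction(4, 1000), 70, 20)   # Gambela is 0.4% of the population

print(f"Unweighted estimate: {float(unweighted):.1f}%")
print(f"Weighted estimate:   {float(weighted):.1f}%")
```

The gap between the two estimates is the distortion that sampling weights are meant to remove.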

17 Creating Sampling Weights
Once we have the final number of households interviewed, we multiply that number by a weight that will reduce or increase it to the size it should be in proportion to the total population. In areas where we oversampled, the weight will be less than 1, and where we undersampled, the weight will be more than 1. For example, in Gambela (where we oversampled), a household may count for only a fraction of a household when calculating the national percent of households that own an ITN. In Oromiya (where we undersampled), one household may count for 2.22 households.
In calculating a sampling weight, sampling experts use information about the probability each household (actually, each cluster) had of being selected into the sample, and the proportion of the population represented by each cluster, to “weight” the number of households from each region. Weighting makes each region’s contribution to the total sample proportional to the actual distribution of households across the country. The DHS does this so that the impact of any one region on the national average will not be out of proportion to its actual share of the population. In areas where we oversampled (like Gambela), one household should not have as much influence on the national average. In areas that were undersampled, households should have more influence. So the weight for households in oversampled areas is less than 1, and the weight for households in undersampled areas is more than 1. We “weight up” areas that were undersampled, and “weight down” areas that were oversampled, to make the sample accurately represent the population.

18
Region               Actual no. of households   Weighted no. of households
Tigray                     1,282                        940
Affar                        806                        138
Amhara                     2,066                      3,709
Oromiya                    2,155                      4,790
Somali                       796                        540
Benishangul-Gumuz            869                        128
SNNP                       1,933                      2,802
Gambela                      820                         47
Harari                       904                         39
Addis Ababa                1,333                        525
Dire Dawa                    757                         64
Total                     13,721                     13,721
To make sure that households in Gambela are not overrepresented when calculating the national average, sampling weights are applied to reflect their true percentage of the population.
This table shows the actual number of households, and the weighted number of households, in the final 2005 EDHS sample. (NOTE: The numbers in this slide show the actual and weighted number of households interviewed, which is less than the number selected.) After weights are applied, the weighted number of households from Gambela in the 2005 EDHS sample shrinks from 820 actual households to 47 weighted households. 47 divided by 13,721 comes to a little less than 0.4%, which is about the same as the actual percentage Gambela contributes to the national population. Weighting is also done for regions with large populations, such as Oromiya: households in Oromiya are “weighted up” so that the unweighted sample of 2,155 households “grows” to 4,790 households, about a third of the weighted sample. Each final country report shows the weighted and unweighted numbers of women and men interviewed for the survey in Table 3.1.
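Dividing the weighted count by the actual count gives the average weight implied for each region, which is a quick way to check the "less than 1 where oversampled, more than 1 where undersampled" rule. The sketch below uses the household counts from this slide; the names are illustrative, and real DHS weights are assigned per cluster and household, so these ratios are only regional averages.

```python
# (actual, weighted) households interviewed, from the 2005 EDHS table on this slide.
HOUSEHOLDS = {
    "Tigray": (1282, 940), "Affar": (806, 138), "Amhara": (2066, 3709),
    "Oromiya": (2155, 4790), "Somali": (796, 540),
    "Benishangul-Gumuz": (869, 128), "SNNP": (1933, 2802),
    "Gambela": (820, 47), "Harari": (904, 39),
    "Addis Ababa": (1333, 525), "Dire Dawa": (757, 64),
}


def average_weight(region):
    """Implied average weight: weighted households / actual households."""
    actual, weighted = HOUSEHOLDS[region]
    return weighted / actual


total_actual = sum(actual for actual, _ in HOUSEHOLDS.values())

for region, (actual, weighted) in HOUSEHOLDS.items():
    share = 100 * weighted / total_actual
    print(f"{region:<18} weight {average_weight(region):5.2f}   "
          f"weighted share {share:5.1f}%")
```

The Oromiya ratio reproduces the 2.22 quoted on the previous slide, while the Gambela ratio is far below 1.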

19 Applying Sampling Weights
[Three pie charts, each with the Gambela slice labeled: (1) Real distribution of households by region in Ethiopia; (2) Distribution of households selected in the DHS (unweighted numbers); (3) Adjusted number of households in sample (weighted numbers).]
This first pie chart is the one we saw at the beginning of this presentation, showing the distribution of households in Ethiopia. Gambela is a tiny pie slice. (Click to show the next pie chart.) The second pie chart shows the unweighted distribution of households in our sample. Gambela is a much bigger pie slice, and Oromiya, the tan slice, is much smaller. The final pie chart shows the weighted distribution of households in the sample. We can see that the proportion of households in each region has been adjusted to reflect the distribution of households in the population: the last pie chart matches the first one.

20 The ultimate weight is calculated to…
Ensure representativeness of population Considered before survey is implemented To account for non-response After the survey is finished, if response rates are different by region or residence, weights are adjusted To summarize: The final weight is calculated to ensure the sample is representative of the population. Thus, survey design characteristics are considered before the survey is implemented. Weights have another important function: they can also correct for non-response. For example, a certain percentage of selected respondents probably will not complete the survey, and that percentage may vary between subgroups. Weights can account for this. Weights are calculated after all the data are collected. Adjustments for non-response are made after the survey is finished if response rates are different by region or residence. Response rates by region are shown in the appendices of each final report. If a survey includes HIV biomarkers, response rates for HIV testing are also shown in the HIV chapter.

21 Using weights in Stata and SPSS
In this section, we will discuss the different sampling weights available in DHS datasets, and which weight to use when.

22 Weights in DHS data files
Sample weights are calculated by sampling experts and included in each DHS recode file. Because response rates (and sometimes sample selections) are different for households, women, men, HIV testing, and domestic violence, different weights are calculated for the different units of analysis.
Unit of analysis               Variable
Households                     hv005
Household members (PR file)    hv005
Women or children              v005
Men                            mv005
Domestic violence              d005
HIV test results               hiv05
Sample weights are calculated by sampling experts and included in each DHS recode file before the dataset is made publicly available. So the sampling weights are included in the data files you’re using for analysis. Response rates, and sometimes sample selections, are different for households, women, men, HIV testing, and domestic violence. For example, while generally every woman age 15-49 in selected households is selected as a respondent to the survey, for ethical reasons, only one eligible woman per household is administered the domestic violence module. Also, in some surveys, eligible male respondents or respondents selected for HIV testing are not selected from every survey household, but from one in every two or one in every three survey households. For this reason, different weights are calculated for the different units of analysis. This table shows the standard variable name for each weight provided in the DHS recode files.

23 DHS weight value convention
- All DHS weights allow for 8 digits, with an implicit 6 decimals
- Remember: no decimal places in DHS variables
- If hv005 = 2561000, the actual weight value is 2.561
- If mv005 = 789000, the actual weight value is 0.789
- In practice, divide by 1,000,000
All DHS weights allow for eight digits, with an implicit six decimals. For example, if hv005 = 2,561,000, the actual weight value of the household weight variable for the observation is 2.561, meaning that observation represents 2.561 households in the survey population. If mv005 = 789,000, the actual weight value of the men’s weight variable for the observation is 0.789, meaning that observation represents 0.789 men in the survey population. In practice, you will need to transform the weight variable by dividing it by 1,000,000.
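The conversion is a single division, shown here as a small Python sketch using the two example values from the notes (2,561,000 and 789,000). The function and constant names are illustrative; `Decimal` is used only so the printed values are exact.

```python
from decimal import Decimal

WEIGHT_SCALE = 1_000_000  # DHS weights carry six implicit decimal places


def actual_weight(stored_value):
    """Convert a stored DHS weight (e.g. hv005, v005, mv005) to its real value."""
    return Decimal(stored_value) / WEIGHT_SCALE


print(actual_weight(2561000))  # hv005 example from the notes
print(actual_weight(789000))   # mv005 example from the notes
```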

24 Weights: Which Weight When?
Use the weight applicable to the unit of analysis (household, women, men):
- When analyzing households, use the household weight
- When analyzing women, use the women’s weight
- If analyzing domestic violence data, use the DV weight (only a subsample of women were selected for the DV module)
- When analyzing HIV data, use the HIV weight
- When analyzing men or couples, use the men’s weight
In your analyses, you should use the weight applicable to the unit of analysis, e.g., household, women, or men. When analyzing households, use the household weight (hv005). When analyzing women, use the women’s weight (v005). When analyzing domestic violence data, use the domestic violence weight (d005): only a subsample of women are selected for the domestic violence module. When analyzing HIV data, use the HIV weight (hiv05). When analyzing men or couples, use the men’s weight (mv005). For couples, it is the male partner’s weight that is used because men have a higher non-response rate than do women (and are sometimes only a subsample), and the men’s weight adjusts for this non-response.

25 DHS conventions for presenting data
- DHS tables show weighted percentages (figures) AND weighted sample sizes (denominators)
- For every table, DHS also checks the unweighted denominator to see if the sample size is too small, and thus results are unreliable
- If a figure is based on <25 unweighted cases, the figure is not shown, and is replaced with a *
- If a figure is based on 25-49 unweighted cases, the figure is shown in (parentheses), and should be interpreted with caution
DHS tables show weighted percentages (figures) and weighted sample sizes (denominators). The DHS implements a sampling design that is intended to achieve sufficient sample sizes to produce reliable national and certain specified sub-national estimates. Yet, as these data are disaggregated at finer and finer levels, the sample size for one subgroup may become small, e.g., for one current marital status category or a 5-year age group. The concern with sample size relates to the unweighted number of respondents, which is generally not shown in DHS tables (only weighted n’s are). Thus, for every table, DHS also checks the unweighted denominator to see if the sample size is too small and, thus, the results may be unreliable. DHS tables use symbols to alert you if the number of unweighted cases in a subgroup is too small. For results involving percentages based on 25-49 unweighted cases, the figure is shown in parentheses, denoting that it should be interpreted with caution. For results based on fewer than 25 unweighted cases, the sample is too small to be reliable and, therefore, the figure is not shown. Instead, it is replaced with an asterisk. Note to facilitators: find an example table with parentheses or *s in the final report and ask participants to interpret the results.
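The display convention can be written as a small rule applied to each table cell. This is a sketch only: the function name is made up, and the 25 to 49 range for parentheses follows the footnote convention used in DHS final report tables.

```python
def format_figure(percent, unweighted_n):
    """Apply the DHS display convention to one table cell."""
    if unweighted_n < 25:
        return "*"                  # too few cases: figure suppressed
    if unweighted_n < 50:
        return f"({percent:.1f})"   # 25-49 cases: interpret with caution
    return f"{percent:.1f}"


for n in (12, 30, 400):
    print(n, format_figure(41.7, n))
```

Note that the check uses the unweighted number of cases, even though the percentage itself is weighted.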

26 Practice! Now let’s open our datasets and practice using weights.
Note to facilitators: it will be helpful at this point to run SPSS/Stata on your computer and swap back and forth between these slides and the statistical package as you follow these commands.

27 Run weighted and unweighted frequencies
BEFORE you turn on weights (particularly important in SPSS):
1. Find a variable you plan to use in your analysis (using the recode manual, etc.)
2. Run an unweighted frequency of that variable
3. Create your weight variable
THEN, in SPSS, apply the weight
4. Run a weighted frequency of the same variable
5. Compare the weighted and unweighted frequencies
To compare weighted and unweighted frequencies of a variable, first run an unweighted frequency. Then, create your weight variable, and in SPSS turn the weights on. Then, run a weighted frequency of the same variable and compare the results. Note: in Stata these steps can be done in any order; in SPSS you must have weights turned off to run unweighted frequencies.

28 Syntax
In SPSS:
    Freq v025.
    Compute weight = v005/1000000.
    Weight by weight.
In Stata:
    tab v025
    generate weight = v005/1000000
    tab v025 [iweight=weight]
This is the syntax used to complete the steps we just described. Here we’re looking at urban/rural residence, v025. Note that in Stata, all commands and the “v” in v025 must be lower case, with no CAPS. SPSS is not case sensitive. Facilitators: follow these steps on your computer. In SPSS, you may want to show this from the drop-down menus.
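For readers who want to see what the weighted frequency is doing behind the scenes, here is a minimal Python sketch. The five records are a made-up mini-dataset, not DHS data; only the variable names v025 and v005 and the division by 1,000,000 come from the slides.

```python
from collections import defaultdict

# Hypothetical mini-dataset: (v025 residence, v005 stored weight).
ROWS = [
    ("urban", 500000), ("urban", 500000), ("urban", 1000000),
    ("rural", 1500000), ("rural", 1500000),
]


def frequencies(rows, weighted):
    """Return {category: (N, percent)}, using v005/1,000,000 when weighted."""
    counts = defaultdict(float)
    for category, v005 in rows:
        counts[category] += v005 / 1_000_000 if weighted else 1.0
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


print("unweighted:", frequencies(ROWS, weighted=False))
print("weighted:  ", frequencies(ROWS, weighted=True))
```

In this toy data the urban cases have small weights, so urban residence drops from the majority of the unweighted cases to the minority of the weighted ones, while the total N stays the same.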

29 An aside: Multiple types of weights in Stata
Stata recognizes several types of weights For frequencies and tables, we use what Stata calls the importance weight: [iweight=weightvar] We will use a different type of weight when calculating sampling errors – later this week! For other weights, look in help in STATA Suggestion: avoid analytic weights (aweights) For Stata users who are interested, you can read more about the different types of weights by typing “help weight” in the command window. Though aweights “analytic weights” sound good, they rescale your data, which is not something we want. Only use iweights or pweights.

30 Match your results to table 3.1 in your DHS final report
Facilitators: Have participants repeat these steps and match their results to final report table Take a break and walk around, checking to make sure everyone is able to do this. Note to organizers: swap this and other examples for examples relevant to your workshop.

31 Matching results to Kenya FR table 3.1
Your results should match the final report exactly – both the percentages and sample sizes. Note that SPSS will round the weighted Ns, which sometimes gives slight inaccuracies in sample sizes. Stata gives the un-rounded Ns, which is helpful – we can always tell that the data are weighted if the Ns are not whole numbers. When you get to presenting your final results, you can round the Ns – this is easy to do in Excel, for example.

32 Match your results to table 3.1 ALWAYS check denominators!
NOTE: Weighted and Unweighted Total Ns will always be the same because the weights have been normalized. Weighted and Unweighted Ns for subpopulations will not always be the same, because they have not been normalized. It’s good practice to always make sure your total sample size, or N, matches the final report. Table 3.1 gives both weighted and unweighted Ns; all other tables are weighted and show weighted Ns unless the table heading says otherwise. Sample weights are normalized so that the unweighted and weighted TOTAL sample sizes are the same. For parts of the sample, or subpopulations, the weighted and unweighted sample sizes will NOT be the same. Here we can see an example: the total N for men is the same weighted and unweighted for the whole population The weighted and unweighted sample sizes are not the same, however, for men 15-49, because the weights were not normalized to add up to this sample size.
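Normalization can be illustrated with a few made-up weights. The sketch below assumes the simplest form of normalization (rescaling so that the weights sum to the number of cases); the raw values and names are hypothetical.

```python
def normalize(raw_weights):
    """Rescale weights so they sum to the number of cases (mean weight = 1)."""
    scale = len(raw_weights) / sum(raw_weights)
    return [w * scale for w in raw_weights]


# Hypothetical raw weights for six respondents; the first two form a subgroup.
raw = [64.0, 64.0, 128.0, 128.0, 256.0, 128.0]
weights = normalize(raw)

total_weighted_n = sum(weights)         # equals the unweighted total of 6
subgroup_weighted_n = sum(weights[:2])  # not equal to the unweighted 2
print(total_weighted_n, subgroup_weighted_n)
```

The total matches by construction, but nothing forces the weights inside a subgroup to add up to that subgroup's unweighted count.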

33 Can you find and tabulate the other variables shown and match them to table 3.1?
Facilitators: give participants time to work in small groups to find and tabulate other variables in table 3.1: education, age, province/region, etc. Remind them to look for variables with the tools they learned earlier: lookfor, the recode manual, etc. After a bit of practice, regroup to introduce crosstabulations.

34 Crosstabs Once you’ve matched weighted and unweighted frequencies to table 3.1 in your final report, try matching the first few columns of table 3.2.1: Educational attainment: Women Rather than a frequency of one variable, these types of tables tabulate two variables at once. In Stata, the syntax is very similar: Tabulate one variable: tab var1 Crosstab of two variables: tab var1 var2 Options to get columns, rows, Ns. We now know how to tabulate one variable at a time. Now let’s try tabulating two variables together: a crosstabulation. For example, we can examine how education varies by women’s age. In Stata, the code is very straightforward, as shown.

35 Facilitators: discuss difference between one-way and 2-way table
Facilitators: discuss difference between one-way and 2-way table. Ask participants what table 3.1 was showing (answer: percent distributions), vs. this table (answer: row percentages).

36 Crosstab commands in Stata
To match table 3.2.1, run a crosstab of age by educational status. This gives weighted Ns:
    tab v013 v149 [iweight=weightvar]
To get row %s:
    tab v013 v149 [iweight=weightvar], row
If, after checking the Ns, you want just a table of row %s, add the “nofreq” option:
    tab v013 v149 [iweight=weightvar], row nofreq
The same tools will allow you to match results for the other variables in table 3.2.1, or for men.
We can adapt the weighted frequency commands we used earlier to create weighted crosstabs. The second command shown here will give you row percentages, which will match the final report table. If you want column percentages, type “col” instead of “row”. If you want a cleaner table, you can use the “nofreq” option to suppress cell frequencies, or Ns. However, remember to check your Ns first! If your Ns don’t match the final report, your percentages definitely won’t match, and it’s easier to find your problem with Ns than percentages.
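The logic of a weighted crosstab with row percentages can be sketched in Python as follows. The four records and the function names are hypothetical; in a real analysis the rows would be v013, the columns v149, and the weight v005/1,000,000.

```python
from collections import defaultdict

# Hypothetical records: (age group, education level, weight).
RECORDS = [
    ("15-19", "primary", 0.5), ("15-19", "secondary", 1.5),
    ("20-24", "primary", 1.0), ("20-24", "secondary", 1.0),
]


def weighted_crosstab(records):
    """Sum weights into cells: {row: {column: weighted N}}."""
    table = defaultdict(lambda: defaultdict(float))
    for row, col, weight in records:
        table[row][col] += weight
    return table


def row_percentages(table):
    """Convert each row of weighted Ns into percentages that sum to 100."""
    return {row: {col: 100 * n / sum(cells.values()) for col, n in cells.items()}
            for row, cells in table.items()}


table = weighted_crosstab(RECORDS)
percents = row_percentages(table)
for row in table:
    print(row, dict(table[row]), percents[row])
```

As in the Stata workflow, the weighted Ns are computed first and the row percentages are derived from them, which is why checking the Ns is the natural first step.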

37 Crosstab commands in SPSS
In SPSS, different commands are needed for frequencies of one variable and crosstabs of two variables.
    One variable: Freq var1.
    Two variables: Crosstabs tables var1 by var2.
We also need to tell SPSS NOT to round cell counts, which is the default. If we run
    Crosstabs tables v013 by v149.
with weights turned on, what N do we get for the # of women in Kenya who are ages 15-19? Does this match the # of women from a weighted frequency of v013? Does this match the # of women in the final report?
SPSS is a little different. We need a different type of command to create a crosstabulation than we did to create a frequency. As mentioned earlier, SPSS rounds the cell counts. This results in inaccuracies: SPSS rounds the counts in each cell (the ns) and then sums them up to get the final Ns. This rounding gives us the wrong N for the # of women who are 15-19 in this example. Note to facilitators: find a different example for your survey. Follow these steps on your computer and compare them to the next table to demonstrate the problem.

38 The # of women 15-19 does not match the final report due to rounding.

39 Crosstab commands in SPSS: ASIS
By default, SPSS applies weights to each cell, then counts up the # of rounded cases in each cell, resulting in inaccurate total Ns. To turn off this option, add the /count=ASIS subcommand:
    Crosstabs tables v013 by v149 /count=ASIS.
To get row %s:
    Crosstabs tables v013 by v149 /count=ASIS /cell=row.
You can add column, total, and count to the /cell line.
To override this problematic rounding, we use the ASIS (as is) subcommand. Facilitators: demonstrate these commands on your computer, perhaps using the drop-down menus.
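The effect of rounding each cell before adding them up can be shown with a few made-up numbers. This is a simplified illustration of the rounding problem described in these slides, not a reproduction of SPSS's internal procedure; the cell values are hypothetical.

```python
# Hypothetical weighted cell counts for one row of a crosstab (one age group).
cells = [10.4, 20.4, 30.4, 40.4]

rounded_then_summed = sum(round(c) for c in cells)  # round each cell first
summed_then_rounded = round(sum(cells))             # keep cells "as is"

print(rounded_then_summed, summed_then_rounded)
```

Each cell loses a little when rounded on its own, and those small losses accumulate in the row total, which is why the total can drift away from the figure in the final report.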

40 Crosstabs in SPSS: drop-down menu instructions
Analyze > Descriptive Statistics > Crosstabs
- Put V013 in the Row box and V149 in the Column box
- Click on the “Cells” button
- Counts: select “Observed”
- Percentages: select “Row”
- Noninteger weights: select “No Adjustment” (ASIS)
- Click “Continue”
- Click “OK” to run, or “Paste” to paste to syntax
More on keeping records of your work (.do and syntax files) tomorrow.
Here is a summary of the process using SPSS’s drop-down menus. You can either run the command directly, or hit “Paste” to keep a copy of your syntax, then highlight the pasted syntax and run it. We will discuss syntax files, and the equivalent do-files in Stata, tomorrow.

