SADC Course in Statistics
Objectives and data needs (Session 01)
Module overview
- Basic concepts and definitions
- Sampling methods – simple random, stratified, cluster, multi-stage, etc.
- Designing a sampling scheme for relatively simple scenarios in accordance with objectives and available resources
- How to produce estimates for population characteristics with measures of precision
- Sample size determination
- An appreciation of what is meant by sampling weights
Module aims
By the end of this module, you will be able to
- explain what is meant by sample, population, sampling frame, sampling units
- explain the notions of representativeness and generalisability of results
- design sampling schemes for simple scenarios
- produce population estimates and associated measures of precision
- discuss options for calculation of sample sizes with a good understanding of the general issues involved
Aims – this session…
By the end of this session, you will be able to
- appreciate the different types of objectives that may arise in real-life surveys
- critically assess the type of data needed to address questions of interest
- explain the benefits of sampling
- recognise the importance of utilising existing knowledge about the population sampled
We begin with some general remarks, then move to survey objectives and other issues.
Some general remarks
- This module is on sampling ideas and issues arising from sampling procedures when conducting a survey
- It is not about survey methods and analysis, which are covered in Module H7
- As such, the survey process is covered only where it relates to sampling
- The accompanying handout gives an outline of the survey process so that the context is clear; it is useful to read
Survey Objectives
- Decisions regarding the sampling process cannot be made rationally unless we are clear about the survey objectives
- Surveys conducted by national statistical offices are often done to provide information on which policy decisions can be based
- The stated objectives are often vague, e.g. “the objective of this survey is to collect information about…….”
- This is acceptable as a starting point, but the objectives need to be more specific if a sensible sampling procedure is to be used. Some examples follow…
Objectives related to estimation
To estimate
- median income of dwellings in slum areas of a city
- proportion of rural households that have no access to a medical facility within 3 km
- maternal mortality rate, i.e. deaths of mothers from puerperal causes per 1000 live births
- mean yield per hectare of pigeonpea production in small-holder commercial farms
Objectives related to comparisons
Questions of interest may be:
- does a newly introduced farming practice for managing banana plants result in higher yields compared to a standard practice?
- is there a difference in access to health facilities between rural and urban areas?
- is there evidence that children from poorer families have fewer opportunities for entering higher educational institutions?
Objectives related to relationships
Is there a relationship between
- consumption expenditure (as a proxy for income) and household demographics and assets?
- children’s enrolment in primary school and educational level of household head?
- mean number of visits by household members to a health clinic and their level of access to clean water and adequate sanitary facilities?
Sampling units and data
Sampling is a first step in any survey study. There is always a need, before data collection, to identify (amongst other things):
- the ultimate sampling unit on which measurements are to be made
- the actual measurements needed, plus clarity on the calculation of any derived variable(s)
- the sampling procedure to use to select sampling units
Two examples from slides 7 and 9 follow…
Initial steps – some examples
Example 1. Estimating the proportion of rural households that have no access to a medical facility with qualified personnel within 3 km of their homestead.
- Unit: Household
- Measurement: Distance to the closest medical facility (with “medical facility” appropriately defined)
- Derived variable: Coded as 1 if the above measurement > 3 km, 0 otherwise
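The derived variable in Example 1 can be sketched in a few lines of Python. This is only an illustration: the function names and the five distances are hypothetical, and the unweighted proportion shown is a sensible estimate only if households were selected with equal probability.

```python
def no_access_indicator(distance_km, threshold_km=3.0):
    """Derived variable: 1 if the closest facility is more than threshold_km away, else 0."""
    return 1 if distance_km > threshold_km else 0


def proportion_without_access(distances_km, threshold_km=3.0):
    """Unweighted sample proportion of households coded 1 (assumes equal-probability selection)."""
    indicators = [no_access_indicator(d, threshold_km) for d in distances_km]
    return sum(indicators) / len(indicators)


# Hypothetical distances (km) for five sampled households; not real survey data.
sample_distances = [0.8, 4.2, 3.0, 6.5, 1.1]
estimate = proportion_without_access(sample_distances)
print(estimate)  # two of the five households are beyond 3 km, so this prints 0.4
```

Note that a household at exactly 3 km is coded 0, following the "> 3 km" rule in the example.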
An example with a hierarchy
Example 2. Determining whether there is a relationship between the number of visits by household members to a health clinic and their level of access to clean water and adequate sanitary facilities.
- Units (within household, HH): Household members
- Measurements: Visits made by each member in the HH, how clean water is accessed, sanitary facilities
- Derived variable: Mean number of visits = sum of visits by all members divided by HH size (needed because the objective is at the HH level, but the measurement is at person level)
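The move from person-level measurements to a household-level derived variable in Example 2 might look as follows. This is a minimal sketch: the record layout, the function name and the data are hypothetical, and it assumes every household member appears in the records (including members with zero visits), so that the number of records per household equals HH size.

```python
from collections import defaultdict


def mean_visits_by_household(person_records):
    """Aggregate (household_id, visits) person records to mean visits per household."""
    totals = defaultdict(int)  # sum of visits over members of each household
    sizes = defaultdict(int)   # number of members recorded in each household
    for hh_id, visits in person_records:
        totals[hh_id] += visits
        sizes[hh_id] += 1
    return {hh_id: totals[hh_id] / sizes[hh_id] for hh_id in totals}


# Hypothetical person-level data: (household id, clinic visits by that member).
records = [("HH1", 2), ("HH1", 0), ("HH1", 1), ("HH2", 4), ("HH2", 4)]
hh_means = mean_visits_by_household(records)
print(hh_means)  # HH1 has 3 visits over 3 members, HH2 has 8 visits over 2 members
```

The household-level means can then be set against household-level measurements such as water access, which is the level at which the objective is stated.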
Selection of households?
Another aspect to settle before data collection is the sampling procedure, i.e. how the households are to be selected. It is important to study the sampling setting in order to understand better the structure (social, geographical, etc.) of the “target” population from which households are to be drawn. Here the “target” population refers to the group of households to which the survey results are intended to generalise.
Using existing knowledge
- Examining the literature should also point to existing data sources; duplicate data collection should be avoided. Use what is relevant.
- Also use general knowledge about the target population. Often much is on record, and it is desirable to use this information:
  - Administrative structure: e.g. districts, counties, subcounties, parishes, villages in Uganda
  - Agro-ecological regions
  - Rural/urban divide, etc.
- Build knowledge about existing data sources and the population into the sampling and data collection process
Why Sample?
- The whole population is rarely measurable
- An exception is the census, e.g. population censuses are usually done once every 10 years
- A well-designed sample enables us to extrapolate our results to the population
- Statistical methods enable us to measure the reliability of our conclusions
Though the benefits of sampling were covered in Module H2, we reiterate them below for completeness and as a reminder.
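As a rough illustration of the last two points on this slide, the sketch below estimates a population mean from a sample and attaches a standard error as a measure of reliability. It assumes simple random sampling without replacement from a population of known size; the function name, the four sample values and the population size of 100 are all hypothetical.

```python
import math
import statistics


def srs_mean_and_se(sample, population_size):
    """Sample mean and its standard error under simple random sampling
    without replacement (an assumption of this sketch)."""
    n = len(sample)
    mean = statistics.mean(sample)
    s2 = statistics.variance(sample)   # sample variance, divisor n - 1
    fpc = 1 - n / population_size      # finite population correction
    se = math.sqrt(fpc * s2 / n)
    return mean, se


# Hypothetical measurements on four sampled units from a population of 100.
sample_values = [2.0, 4.0, 4.0, 6.0]
mean, se = srs_mean_and_se(sample_values, population_size=100)
print(mean, se)  # mean is 4.0; the standard error is about 0.8
```

Other sampling schemes (stratified, cluster, multi-stage) need different formulas, so this applies only to the simplest case.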
Benefits of sampling
- Cheaper, quicker and administratively easier than a census
- Less prone to errors, and those that do occur are more easily identified
- A well-thought-out sampling procedure can ensure proper coverage of major population characteristics
- If suitably structured, the sample survey can (i) take account of varying sizes of units, e.g. farms, and (ii) correct for under-enumeration and some sorts of non-response
Limitations of sampling
- Sound sample surveys require considerable time and effort to plan and run. If the tasks entailed and resources needed are under-estimated, the results will be poor
- Unless a pre-determined data analysis plan is in place at the start, data relevant to the objectives may not be collected, or too much unnecessary data will be collected
- Training survey staff is crucial. Ill-phrased questions, poorly linked to objectives, can lead to non-informative results