Farm Household Surveys: DATABASE ORGANISATION AND DATA CLEANING
Glwadys Aymone GBETIBOUO, C4ECOSOLUTIONS, CAPE TOWN
Economics analyses of climate change impacts workshop, Accra, Ghana
Database organisation and cleaning, or data management, is generally seen as a set of tasks belonging to the tabulation phase of the survey: activities conducted towards the end of the survey project, on computers in clean offices. Instead, survey data management should begin concurrently with questionnaire design.
Key points to consider:
– Nature and identification of the statistical units observed
– Built-in redundancies
– Length and complexity of the questionnaire
– Sample size and design
– Survey timing and scheduling
DATA ENTRY : “flat file”
Codification of the statistical unit
ADM0         | ADM1         | ADM2     | CADM0 | CADM1 | CADM2 | CODE
South Africa | Eastern Cape | Aberdeen |       |       |       |
Household code: an 8-digit code (HHCODE)
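Because every later check lists suspect records by household code, it may help to confirm first that each code really has 8 digits and appears only once. The following is a minimal sketch in Python (chosen here only for illustration; the slides themselves use Stata). The function name `check_hhcodes` and the sample codes are hypothetical, and the sketch assumes codes are stored as strings so that leading zeros are preserved.

```python
from collections import Counter


def check_hhcodes(codes):
    """Return (malformed, duplicated) household codes.

    Assumption: codes are strings, so leading zeros survive data entry.
    """
    # A well-formed code is exactly 8 ASCII digits.
    malformed = [
        c for c in codes
        if not (len(c) == 8 and c.isascii() and c.isdigit())
    ]
    # A code that occurs more than once cannot identify a single household.
    counts = Counter(codes)
    duplicated = sorted(c for c, n in counts.items() if n > 1)
    return malformed, duplicated


# Hypothetical codes, invented for illustration only.
sample = ["10203001", "10203002", "1020300", "10203002", "1020A004"]
malformed, duplicated = check_hhcodes(sample)
print("malformed:", malformed)
print("duplicated:", duplicated)
```

Running such a check before any other cleaning step means the household codes printed in later listings can be traced back to exactly one questionnaire.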
DATA ENTRY SYSTEM
A complex household survey typically contains hundreds of variables. For example, the household survey dataset of the 2003 GEF study contains 1342 variables.
After the survey instrument has been finalized, develop the data entry system and provide a protocol for data entry.
– Coding questionnaire
– Coding sheet
– Household data: 12 worksheets
– Climate data, soil data, runoff data
DATA ENTRY
hhcode | TIB | farmtype | relhead | hhsize | gender1 | age1
:      | :   | :        | :       | :      | :       | :
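In a flat file, each row is one household and each column is one variable. The sketch below, in Python for illustration only, reads such a file into one record per household. The column names are taken from the slide (a subset), all values are invented, and the helper name `read_flat_file` is hypothetical.

```python
import csv
import io

# Hypothetical flat file: column names follow the slide, values are invented.
FLAT_FILE = """hhcode,farmtype,relhead,hhsize,gender1,age1
10203001,1,1,5,1,46
10203002,2,1,3,2,38
"""


def read_flat_file(text):
    """Parse CSV text into a list of dicts, one dict per household row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_flat_file(FLAT_FILE)
for row in rows:
    # Every value is read as a string; convert types before numeric checks.
    print(row["hhcode"], int(row["hhsize"]))
```

Reading the household code as a string rather than a number keeps any leading zeros intact.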
Data cleaning
Generally, data is subjected to three control mechanisms:
1. range checks,
2. consistency checks, and
3. typographical checks
Range checks
Every variable in the survey should contain only data within a limited domain of valid values.
    tab farmtype, missing
    farmtype | Freq.   Percent   Cum.
    ---------+-----------------------
       Total |
    hhcode   farmtype   remark
                        CHECK DATA FOR THIS OBS.
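The same idea can be sketched outside Stata: tabulate the variable, then list every household whose value is missing or falls outside the valid domain. This is a minimal Python illustration. The set of valid `farmtype` codes is an assumption (the real codebook is not given here), the records are invented, and `range_check` is a hypothetical helper name.

```python
from collections import Counter

REMARK = "CHECK DATA FOR THIS OBS."


def range_check(rows, var, valid_values):
    """List (hhcode, value, remark) for values missing or outside valid_values."""
    flagged = []
    for row in rows:
        value = row.get(var)
        if value is None or value not in valid_values:
            flagged.append((row["hhcode"], value, REMARK))
    return flagged


# Hypothetical records and an assumed codebook for farmtype.
households = [
    {"hhcode": "10203001", "farmtype": 1},
    {"hhcode": "10203002", "farmtype": 7},
    {"hhcode": "10203003", "farmtype": None},
]
VALID_FARMTYPES = {1, 2, 3}

# Frequency table including missing values, similar in spirit to `tab ..., missing`.
frequencies = Counter(row["farmtype"] for row in households)
flagged = range_check(households, "farmtype", VALID_FARMTYPES)
for hhcode, value, remark in flagged:
    print(hhcode, value, remark)
```

Flagged records are only listed, not changed: the correction should come from going back to the paper questionnaire.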
Consistency check
Values from one question should be consistent with values from another question.
– Demographic consistency of the household
– Consistency of age and other individual characteristics
    gen test = hhmales + hhfemales
    list hhcode hhsize hhmales hhfemales test remark if test != hhsize
    hhcode   hhsize   hhmales   hhfemales   test   remark
                                                   CHECK DATA FOR THIS OBS.
                                                   CHECK DATA FOR THIS OBS.
    tab age5
    hhcode   age5   remark
                    CHECK DATA FOR THIS OBS.
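The demographic check above (males plus females should equal household size) can be sketched as follows, in Python for illustration only. The variable names `hhsize`, `hhmales`, and `hhfemales` come from the slide. The records are invented, and `inconsistent_households` is a hypothetical helper name.

```python
def inconsistent_households(rows):
    """Return rows where hhmales + hhfemales does not equal hhsize."""
    flagged = []
    for row in rows:
        test = row["hhmales"] + row["hhfemales"]
        if test != row["hhsize"]:
            flagged.append({
                "hhcode": row["hhcode"],
                "hhsize": row["hhsize"],
                "test": test,
                "remark": "CHECK DATA FOR THIS OBS.",
            })
    return flagged


# Hypothetical records, invented for illustration only.
households = [
    {"hhcode": "10203001", "hhsize": 5, "hhmales": 2, "hhfemales": 3},
    {"hhcode": "10203002", "hhsize": 4, "hhmales": 2, "hhfemales": 3},
    {"hhcode": "10203003", "hhsize": 6, "hhmales": 1, "hhfemales": 2},
]
flagged = inconsistent_households(households)
for row in flagged:
    print(row["hhcode"], row["hhsize"], row["test"], row["remark"])
```

The check shows that the values disagree, not which one is wrong, so each flagged household still has to be verified against the questionnaire.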
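Typographical checks
A typographical error is, for example, a transposition of digits, such as entering 41 rather than 14. This kind of error can be detected through double data entry of all questionnaires.
Another example is a mistyped missing-value code: -999 rather than -99 in a numerical input.
    * Loop over numeric variables only: replace would fail on string variables
    ds, has(type numeric)
    foreach var of varlist `r(varlist)' {
        replace `var' = -99 if `var' == -999   // correct the mistyped missing-value code
        replace `var' = . if `var' == -99      // recode -99 to Stata missing
    }
Use the tab command to obtain frequency tables of the data.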
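Double data entry works by keying every questionnaire twice and comparing the two versions field by field. Any disagreement points to a likely typing error. A minimal Python sketch of that comparison follows (Python is used for illustration only). It assumes each entry pass is held as a mapping from household code to that household's values. The data are invented, and `compare_entries` is a hypothetical helper name.

```python
def compare_entries(first, second):
    """Return (hhcode, variable, value1, value2) for each disagreement.

    Only households present in both entry passes are compared.
    """
    mismatches = []
    for hhcode in sorted(first.keys() & second.keys()):
        a, b = first[hhcode], second[hhcode]
        for var in sorted(a.keys() | b.keys()):
            if a.get(var) != b.get(var):
                mismatches.append((hhcode, var, a.get(var), b.get(var)))
    return mismatches


# Hypothetical entry passes: age1 was keyed as 41 once and as 14 once.
first_pass = {
    "10203001": {"age1": 41, "hhsize": 5},
    "10203002": {"age1": 38, "hhsize": 3},
}
second_pass = {
    "10203001": {"age1": 14, "hhsize": 5},
    "10203002": {"age1": 38, "hhsize": 3},
}
mismatches = compare_entries(first_pass, second_pass)
for hhcode, var, v1, v2 in mismatches:
    print(hhcode, var, v1, v2)
```

A transposed value such as 41 for 14 can pass a range check because both values are plausible ages, which is why a second, independent entry is needed to catch it. Households keyed in only one pass are not reported by this sketch and would need a separate listing.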