How data quality affects poverty and inequality measurement
PovcalNet team, DECPI, The World Bank
Outline
Household survey data
–Sampling, questionnaire design, interview methods, income/consumption aggregates
–Non-response bias
–Data processing and analysis
–Comparison over time
Other data
–National accounts (NA) data
–Price data
–PPPs and ICP data
Household survey data
1. Household survey design
–Sampling
–Daily diary vs. recall
–Different recall periods
–Different income/consumption modules
–Non-response bias
2. Common problems in data processing
3. Common mistakes when calculating poverty measures
Household survey data: sampling
Malawi 1997 and 2004 household surveys
More than 4,000 households were dropped from the 1997 sample
Household survey data: sampling
Different sample sizes or sampling frames cause comparison problems.
Vietnam 2010 survey vs. previous rounds
India NSS: thick and thin rounds
Indonesia and other countries
China 2013 national household survey
–census frame vs. legal resident registration
–how to compare with previous rural/urban surveys?
Household survey data: daily diary vs. recall
Example of China SW poverty monitoring survey
1995 survey: one-time recall method
1996 survey: daily diary method
1995 mean income per capita: Yuan
1996 mean income per capita: Yuan
Is there a 16% increase in per capita income in one year?
A 10-15% increase is due to the switch from recall to diary.
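A rough way to see how little of the measured growth may be left once the method switch is netted out. This is only a sketch: it assumes the recall-to-diary effect acts multiplicatively on the measured mean, which the slide does not state, and the function name `underlying_growth` is hypothetical.

```python
def underlying_growth(observed_growth, method_effect):
    """Growth remaining after removing a multiplicative method effect.

    Assumption (not from the slides): measured mean = true mean * (1 + method_effect).
    """
    return (1 + observed_growth) / (1 + method_effect) - 1


observed = 0.16  # measured one-year increase quoted on the slide
for effect in (0.10, 0.15):  # range attributed to the recall-to-diary switch
    growth = underlying_growth(observed, effect)
    print(f"method effect {effect:.0%}: underlying growth {growth:.1%}")
```

Under this assumption, most of the apparent 16% increase reflects the change of method rather than a change in incomes.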
Example of China SW poverty monitoring survey
Household survey data: different recall periods
Example of India NSS 55th round
Result: poverty estimates from NSS 55 are not comparable with previous years
Household survey data: different income/consumption modules Example of Honduras 1997 and 1999 surveys
Household survey data: different income/consumption modules Example of Honduras:
Different income modules?
Household survey data: different income/consumption modules
Example of Ethiopia 2000 surveys:
Reason: different consumption modules
Nonresponse bias in measuring poverty and inequality
High nonresponse rates of 10-30% are now common
LSMS: 0-26% nonresponse (Scott and Steele, 2002)
UK surveys: 15-30%
US: 10-20%
Concerns that the problem might be increasing
Nonresponse bias in measuring poverty and inequality
Compliance is unlikely to be random:
Rich people:
–have a higher opportunity cost of time
–have more to hide (tax reasons)
–are more likely to be away from home?
–have multiple earners
The poorest might also not comply:
–alienated from society?
–homeless
Probability of being in UHS in 2004/05 plotted against income (n=235,000)
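If the probability of responding falls with income, one common correction is to scale each respondent's weight by the inverse of its estimated response probability. The sketch below illustrates that idea only; the incomes, weights and probabilities are hypothetical, and the helper names are not from any survey package.

```python
def reweight_for_nonresponse(weights, response_probs):
    """Divide each design weight by the unit's estimated response probability."""
    adjusted = []
    for w, p in zip(weights, response_probs):
        if not 0 < p <= 1:
            raise ValueError("response probabilities must lie in (0, 1]")
        adjusted.append(w / p)
    return adjusted


def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical respondents: response probability declines with income.
income = [50.0, 100.0, 200.0, 400.0, 1600.0]
design_weights = [1.0, 1.0, 1.0, 1.0, 1.0]
p_respond = [0.9, 0.9, 0.8, 0.6, 0.3]

adjusted_weights = reweight_for_nonresponse(design_weights, p_respond)
naive_mean = weighted_mean(income, design_weights)
adjusted_mean = weighted_mean(income, adjusted_weights)
print(f"naive mean: {naive_mean:.1f}, adjusted mean: {adjusted_mean:.1f}")
```

Because richer respondents receive larger weights after the adjustment, the corrected mean is higher than the naive one in this example; measured inequality would typically rise as well.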
Common problems in data processing
1. Income/consumption aggregates
2. Valuing income in kind
3. Missing values
4. Outliers
Income/consumption aggregates
Missing, zero and outliers
Never mix missing values and zeros
–Examples from LAC labor force surveys
Outliers: check carefully and always keep original records
–Income by source
–Subcomponents of consumption
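A small illustration of why missing values and zeros must be kept apart. The records are hypothetical, and `None` is used here as the marker for "not reported"; real surveys use their own missing-value codes.

```python
def mean_income(incomes, treat_missing_as_zero=False):
    """Mean income; None marks a value that was not reported."""
    if treat_missing_as_zero:
        values = [0.0 if x is None else x for x in incomes]
    else:
        values = [x for x in incomes if x is not None]
    if not values:
        raise ValueError("no reported incomes")
    return sum(values) / len(values)


def share_zero(incomes, treat_missing_as_zero=False):
    """Share of records counted as zero income."""
    if treat_missing_as_zero:
        values = [0.0 if x is None else x for x in incomes]
    else:
        values = [x for x in incomes if x is not None]
    if not values:
        raise ValueError("no reported incomes")
    return sum(1 for x in values if x == 0) / len(values)


# Hypothetical records: two missing, one genuine zero, three positive incomes.
records = [None, 0.0, 120.0, None, 300.0, 180.0]

print(mean_income(records))                              # reported values only
print(mean_income(records, treat_missing_as_zero=True))  # missing recoded as zero
print(share_zero(records))
print(share_zero(records, treat_missing_as_zero=True))
```

Recoding the missing values as zero lowers the mean and inflates the share of zero-income records, which feeds directly into poverty and inequality estimates.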
Argentina (urban)
Annual income growth of bottom 40% (circa )
Annual per capita GDP growth is less than 1% during the same period
Missing and outliers
Examples from Colombia 2000 survey: 7% of records have zero income
Welfare indicator: income vs. consumption – LAC
More than 14% of people with zero income!
Welfare indicator: income vs. consumption – East Asia
Common mistakes in calculating poverty measures
1. Ranking variable
2. Weights
–Household weight
–Sampling weight
3. Outliers and missing values
4. Adult equivalence
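The weighting mistake can be illustrated with a headcount ratio. The sketch assumes a per capita welfare variable and treats a household as poor when it is strictly below the line; the data and the function name `headcount_ratio` are hypothetical. Weighting by household weight alone counts households, while multiplying by household size counts people.

```python
def headcount_ratio(welfare_pc, hh_weight, hh_size, poverty_line,
                    use_population_weight=True):
    """Share of the weighted total with per capita welfare below the line.

    With use_population_weight=True each household counts hh_weight * hh_size
    people; with False it counts hh_weight households.
    """
    total = 0.0
    poor = 0.0
    for y, w, n in zip(welfare_pc, hh_weight, hh_size):
        weight = w * n if use_population_weight else w
        total += weight
        if y < poverty_line:
            poor += weight
    return poor / total


# Hypothetical sample: poorer households are larger.
welfare_pc = [20.0, 35.0, 60.0, 90.0]
hh_weight = [100.0, 100.0, 100.0, 100.0]
hh_size = [8, 6, 3, 2]
line = 40.0

people_poor = headcount_ratio(welfare_pc, hh_weight, hh_size, line)
households_poor = headcount_ratio(welfare_pc, hh_weight, hh_size, line,
                                  use_population_weight=False)
print(f"share of people poor: {people_poor:.3f}")
print(f"share of households poor: {households_poor:.3f}")
```

When poor households are larger than average, the household-weighted figure understates the share of people in poverty.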
National accounts data
GDP
Private consumption
Population
CPI: sample and weights change over time
Spatial prices
Currency changes over time
PPPs and ICP data
ICP rounds: 1985, 1993, 1996 and 2005
–difference in coverage
PPPs
–PWT PPP and the World Bank’s PPP
–GDP PPP and consumption PPP
PPPP: PPP for the poor
Biases in 2005 ICP
“Urban bias” in price surveys
–China: 11 cities; reasonably representative of urban areas but not rural
–Similar problems for Argentina, Brazil, Bolivia, Cambodia, Chile, Colombia, Pakistan, Peru, Thailand and Uruguay. Correction using urban/rural poverty line differentials.
India: ICP surveys under-represent rural areas (only 28%)
–Implicit PPPs for urban and rural India (Rs 17 and Rs 11)
PPPs for the poor: Deaton and Dupriez have re-weighted the PPPs for a sub-sample of countries with the necessary data and find similar results
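One way to read the urban/rural correction: treat the ICP PPP as an urban PPP and scale it by the ratio of the rural to the urban national poverty line to obtain an implicit rural PPP. The sketch below follows that reading, which is an assumption about the method rather than a statement of it; all numbers and function names are hypothetical.

```python
def implicit_rural_ppp(urban_ppp, rural_line, urban_line):
    """Rural PPP implied by an urban PPP and the rural/urban poverty line ratio.

    Assumption: the poverty line differential reflects the cost-of-living gap.
    """
    if urban_line <= 0 or rural_line <= 0:
        raise ValueError("poverty lines must be positive")
    return urban_ppp * rural_line / urban_line


def to_ppp_dollars(local_currency_amount, ppp):
    """Convert a local-currency amount to PPP dollars."""
    return local_currency_amount / ppp


# Hypothetical inputs: local currency units per PPP dollar and national lines.
urban_ppp = 4.0
rural_line = 400.0
urban_line = 500.0

rural_ppp = implicit_rural_ppp(urban_ppp, rural_line, urban_line)
rural_consumption_lcu = 160.0
print(f"implicit rural PPP: {rural_ppp:.2f}")
print(f"with urban PPP: {to_ppp_dollars(rural_consumption_lcu, urban_ppp):.2f}")
print(f"with rural PPP: {to_ppp_dollars(rural_consumption_lcu, rural_ppp):.2f}")
```

Using the urban PPP for rural households understates their consumption in PPP dollars whenever rural prices are lower, which overstates rural poverty.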
China
First time China has participated in the ICP
Urban bias: prices collected from 11 cities
Correction using urban/rural poverty line differential
Figure: national poverty line ($/month at 2005 PPP) plotted against log consumption per person at 2005 PPP (Note: See Figure 1)
Figure labels: Food PPP; Fisher PPPP (Deaton-Dupriez); Lower poverty line: $22.72; $1 a day line!; $31.72