Challenges in small area estimation of poverty indicators
Risto Lehtonen, Ari Veijanen, Maria Valaste (University of Helsinki), and Mikko Myrskylä (Max Planck Institute for Demographic Research, Rostock)
Ameli 2010 Conference, 25-26 February 2010, Vienna
Outline: Background; Material and methods; Results; Discussion; References.
EU/FP7 Project AMELI: Advanced Methodology for European Laeken Indicators (2008-2011). The project is supported by European Commission funding from the Seventh Framework Programme for Research. According to the DoW (Description of Work), the study will include research on data quality, including: measurement of quality; treatment of outliers and nonresponse; small area estimation; measurement of development over time.
Material and methods. Aim: investigation of the statistical properties (bias and accuracy) of estimators of selected Laeken indicators for population subgroups (domains) and small areas. Method: design-based Monte Carlo simulation experiments based on real data. Data: statistical register data obtained by merging administrative registers at the unit level (Finland).
Laeken indicators based on binary variables: the at-risk-of-poverty rate. Direct estimators: Horvitz-Thompson (HT) estimators. Indirect estimators: model-assisted GREG and MC estimators; model-based EBLUP and EB estimators. Modelling framework: generalized linear mixed models (GLMM) (Lehtonen and Veijanen 2009; Rao 2003; Jiang and Lahiri 2006).
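The direct HT estimator of a domain poverty rate can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the poverty threshold is taken as 60% of the weighted median of equivalized income (the usual Eurostat convention, not stated on the slide), the domain size N_d is assumed known from the registers, and the function names are hypothetical.

```python
import numpy as np


def weighted_quantile(y, w, q):
    """Lower weighted quantile: smallest y whose cumulative weight share reaches q."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(y, kind="stable")
    cum_w = np.cumsum(w[order])
    idx = np.searchsorted(cum_w, q * cum_w[-1], side="left")
    return y[order][min(idx, len(y) - 1)]


def ht_poverty_rate(income, weights, domain, d, N_d, threshold):
    """HT estimate of the at-risk-of-poverty rate in domain d (hypothetical helper).

    income, weights, domain: NumPy arrays over the sample; weights are a_k = 1 / pi_k.
    N_d: population size of domain d, assumed known.
    """
    in_d = domain == d
    poor = income < threshold  # binary poverty indicator y_k
    return np.sum(weights[in_d] * poor[in_d]) / N_d


# Toy sample with invented numbers, for illustration only
income = np.array([5.0, 10.0, 20.0, 30.0])
weights = np.array([2.0, 2.0, 2.0, 2.0])
domain = np.array([0, 0, 1, 1])
# Assumption: threshold = 60% of the weighted median (Eurostat convention)
threshold = 0.6 * weighted_quantile(income, weights, 0.5)
rate_d0 = ht_poverty_rate(income, weights, domain, 0, N_d=4, threshold=threshold)
```

If N_d were not known, dividing by the sum of the domain's design weights instead would give the Hajek-type ratio estimator.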
Laeken indicators based on medians or quantiles. These indicators are based on medians or quantiles of the cumulative distribution function of the underlying continuous variable: the relative median at-risk-of-poverty gap; the quintile share ratio (S80/S20 ratio); the Gini coefficient. Estimators: direct (DEFAULT); synthetic (SYN); synthetic based on expanded predictions (EP-SYN); composite (COMP). Simulation-based methods.
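To make the quantile-based indicators concrete, here is a minimal sketch of weighted plug-in versions of the S80/S20 ratio and the Gini coefficient. Applied to the sampled persons of one domain with their design weights, such functions give simple direct estimates; whether this matches the DEFAULT estimator on the slides is an assumption. The function names are hypothetical, and the S80/S20 version assigns each person wholly to a quintile group, a simplification relative to some official implementations.

```python
import numpy as np


def weighted_quantile(y, w, q):
    """Lower weighted quantile: smallest y whose cumulative weight share reaches q."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(y, kind="stable")
    cum_w = np.cumsum(w[order])
    idx = np.searchsorted(cum_w, q * cum_w[-1], side="left")
    return y[order][min(idx, len(y) - 1)]


def s80_s20(income, weights):
    """Quintile share ratio: income total of the top 20% over that of the bottom 20%."""
    income = np.asarray(income, dtype=float)
    weights = np.asarray(weights, dtype=float)
    q20 = weighted_quantile(income, weights, 0.2)
    q80 = weighted_quantile(income, weights, 0.8)
    low, high = income <= q20, income > q80
    bottom = np.sum(weights[low] * income[low])
    top = np.sum(weights[high] * income[high])
    return top / bottom


def gini(income, weights):
    """Weighted Gini coefficient from the piecewise-linear Lorenz curve."""
    income = np.asarray(income, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(income, kind="stable")
    y, w = income[order], weights[order]
    p = np.concatenate(([0.0], np.cumsum(w) / w.sum()))            # population shares
    L = np.concatenate(([0.0], np.cumsum(w * y) / np.sum(w * y)))  # income shares
    # Gini = 1 - 2 * (area under the Lorenz curve), area by the trapezoid rule
    return 1.0 - np.sum(np.diff(p) * (L[1:] + L[:-1]))


ratio = s80_s20(np.arange(1, 11), np.ones(10))  # top {9, 10} over bottom {1, 2}: 19 / 3
```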
Generalized linear mixed models
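The formulas of this slide are not preserved. As an illustration of the GLMM framework, one typical specification for the binary poverty indicator y_k of person k in domain d (assumed here, not taken from the slide) is a logistic mixed model with domain-specific random intercepts u_d, whose fitted values feed the GREG-type and model-based estimators:

```latex
\operatorname{logit} P(y_k = 1 \mid u_d) = \mathbf{x}_k^{\top}\boldsymbol{\beta} + u_d,
\qquad u_d \sim N(0, \sigma_u^2), \qquad k \in U_d,\; d = 1, \dots, D,

\hat{y}_k = \frac{\exp(\mathbf{x}_k^{\top}\hat{\boldsymbol{\beta}} + \hat{u}_d)}
                 {1 + \exp(\mathbf{x}_k^{\top}\hat{\boldsymbol{\beta}} + \hat{u}_d)}.
```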
Design-based GREG type estimators for poverty rate
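A GREG-type domain estimator of a total adds a design-weighted residual correction to the sum of model predictions: t̂_d = Σ_{k∈U_d} ŷ_k + Σ_{k∈s_d} a_k (y_k − ŷ_k), and the poverty rate is t̂_d / N_d. The sketch below implements that generic form; it assumes the fitted values ŷ_k (for example from a logistic mixed model, as in MLGREG) are already available for every population unit, and the helper name is hypothetical.

```python
import numpy as np


def greg_domain_rate(y_s, yhat_s, a_s, dom_s, yhat_U, dom_U, d):
    """GREG-type estimate of the poverty rate in domain d (hypothetical helper).

    y_s, yhat_s, a_s, dom_s: sample poverty indicators, fitted values,
        design weights a_k = 1 / pi_k and domain codes.
    yhat_U, dom_U: fitted values and domain codes for all population units.
    """
    in_U = dom_U == d
    in_s = dom_s == d
    # Sum of model predictions over the whole domain population
    synthetic_total = np.sum(yhat_U[in_U])
    # Design-weighted sum of sample residuals in the domain (bias correction)
    residual_total = np.sum(a_s[in_s] * (y_s[in_s] - yhat_s[in_s]))
    N_d = np.count_nonzero(in_U)
    return (synthetic_total + residual_total) / N_d


# Toy example with invented numbers: a domain of 4 persons, 2 of them sampled
yhat_U = np.array([0.2, 0.2, 0.2, 0.2])
dom_U = np.array([0, 0, 0, 0])
y_s = np.array([1.0, 0.0])
yhat_s = np.array([0.2, 0.2])
a_s = np.array([2.0, 2.0])
dom_s = np.array([0, 0])
rate = greg_domain_rate(y_s, yhat_s, a_s, dom_s, yhat_U, dom_U, d=0)
```

The residual term is what keeps the estimator (approximately) design unbiased; with few sampled persons in a domain it is also the source of the large variance.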
Model-based estimators for poverty rate
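A common model-based projection form, assumed here rather than taken from the slide, combines the observed indicators of sampled persons with model predictions for the non-sampled persons of the domain: p̂_d = (Σ_{k∈s_d} y_k + Σ_{k∈U_d∖s_d} ŷ_k) / N_d. In an EB estimator the predictions would be empirical best predictors under the GLMM; in this sketch they are simply inputs, and the helper name is hypothetical.

```python
import numpy as np


def projection_domain_rate(y_s, dom_s, yhat_r, dom_r, d):
    """Model-based estimate: observed y for sampled units, predictions for the rest.

    y_s, dom_s: poverty indicators and domain codes of sampled persons.
    yhat_r, dom_r: model predictions and domain codes of non-sampled persons.
    """
    in_s = dom_s == d
    in_r = dom_r == d
    N_d = np.count_nonzero(in_s) + np.count_nonzero(in_r)
    return (np.sum(y_s[in_s]) + np.sum(yhat_r[in_r])) / N_d


y_s = np.array([1.0, 0.0])
dom_s = np.array([0, 0])
yhat_r = np.array([0.25, 0.25])
dom_r = np.array([0, 0])
rate = projection_domain_rate(y_s, dom_s, yhat_r, dom_r, d=0)  # (1 + 0.5) / 4 = 0.375
```

Unlike the GREG form there is no design-weighted residual correction, so the estimate inherits any model misspecification as design bias, while borrowing strength through the model keeps its variance small.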
Poverty gap for domains: relative median at-risk-of-poverty gap. The poverty gap in domain d is the difference between the at-risk-of-poverty threshold t and the median income of the poor people in domain d, expressed relative to t.
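As a worked sketch of this definition, the relative gap is (t − m_d) / t, where m_d is the weighted median income of the sampled persons of domain d who fall below t. This is a simple direct plug-in version under the assumption that t is given; the helper name is hypothetical. Because only the sampled poor of the domain enter the median, the estimate can rest on very few observations in small domains.

```python
import numpy as np


def weighted_quantile(y, w, q):
    """Lower weighted quantile: smallest y whose cumulative weight share reaches q."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(y, kind="stable")
    cum_w = np.cumsum(w[order])
    idx = np.searchsorted(cum_w, q * cum_w[-1], side="left")
    return y[order][min(idx, len(y) - 1)]


def relative_median_gap(income, weights, domain, d, threshold):
    """Relative median at-risk-of-poverty gap in domain d: (t - median income of the poor) / t."""
    poor_d = (domain == d) & (income < threshold)
    if not np.any(poor_d):
        raise ValueError("no sampled persons below the threshold in this domain")
    median_poor = weighted_quantile(income[poor_d], weights[poor_d], 0.5)
    return (threshold - median_poor) / threshold


income = np.array([2.0, 4.0, 8.0, 20.0, 30.0])
weights = np.ones(5)
domain = np.zeros(5, dtype=int)
gap = relative_median_gap(income, weights, domain, d=0, threshold=10.0)  # (10 - 4) / 10 = 0.6
```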
Estimators of poverty gap
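The composite estimator combines DEFAULT with EP-SYN domain by domain. The weights are not shown on the slides; the sketch below assumes a standard approximately optimal choice, φ_d = MSE(EP-SYN) / (MSE(DEFAULT) + MSE(EP-SYN)), which minimizes the MSE of the combination when the covariance between the two estimators is ignored. The MSE inputs would in practice be estimates, and the helper name is hypothetical.

```python
import numpy as np


def composite_estimate(direct, ep_syn, mse_direct, mse_ep_syn):
    """Composite of DEFAULT and EP-SYN estimates, domain by domain (hypothetical helper).

    Weight phi_d = MSE(EP-SYN) / (MSE(DEFAULT) + MSE(EP-SYN)); the covariance
    between the two estimators is ignored.
    """
    direct, ep_syn = np.asarray(direct, float), np.asarray(ep_syn, float)
    mse_direct, mse_ep_syn = np.asarray(mse_direct, float), np.asarray(mse_ep_syn, float)
    phi = mse_ep_syn / (mse_direct + mse_ep_syn)
    return phi * direct + (1.0 - phi) * ep_syn, phi


# Invented inputs: a "small" domain (noisy direct) and a "large" one (precise direct)
est, phi = composite_estimate([0.30, 0.25], [0.20, 0.20], [0.009, 0.0001], [0.001, 0.0009])
```

The weight moves toward the direct estimator as its MSE falls relative to that of EP-SYN, so the composite leans on EP-SYN mainly in small domains.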
MSE estimation for direct estimator DEFAULT
MSE estimation for SYN estimator
Monte Carlo simulation. Fixed finite population of 1,000,000 persons. D = 70 domains of interest: cross-classification of NUTS 3 regions with sex and age group (7 x 2 x 5). Y-variables: equivalized income (based on register data); binary indicator for persons in poverty. X-variables (binary or continuous): house_owner (binary); education_level (7 classes) and educ_thh; lfs_code (3 classes) and empmohh; socstrat (6 classes); sex_class and age_class (5 age classes); NUTS3.
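The 70 domains arise from crossing 7 NUTS 3 regions, 2 sexes and 5 age classes. The sketch below shows one way to map the three category codes to a single domain index; the 0-based coding and the function name are assumptions made for illustration only.

```python
import itertools

N_NUTS3, N_SEX, N_AGE = 7, 2, 5  # category counts from the slide: 7 x 2 x 5 = 70 domains


def domain_id(nuts3, sex, age_class):
    """Map 0-based (NUTS3, sex, age class) codes to a domain index 0..69 (hypothetical coding)."""
    if not (0 <= nuts3 < N_NUTS3 and 0 <= sex < N_SEX and 0 <= age_class < N_AGE):
        raise ValueError("category code out of range")
    return (nuts3 * N_SEX + sex) * N_AGE + age_class


all_ids = {
    domain_id(r, s, a)
    for r, s, a in itertools.product(range(N_NUTS3), range(N_SEX), range(N_AGE))
}
```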
Sampling designs: SRSWOR and stratified SRSWOR, sample size n = 5,000 persons. Stratified SRSWOR: stratification by education level of the household head (H = 7 strata), giving unequal inclusion probabilities; design weights vary between strata (min 185, max 783). K = 1,000 independent samples.
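One draw from a stratified SRSWOR design, with design weights a_k = N_h / n_h, can be sketched as below. The toy population and the per-stratum sample sizes are invented for illustration (the slide reports only n = 5,000, H = 7 and the weight range), and the helper name is hypothetical.

```python
import numpy as np


def stratified_srswor(strata, n_h, rng):
    """Draw one stratified SRSWOR sample (hypothetical helper).

    strata: stratum code for every population unit.
    n_h: dict mapping stratum code -> sample size in that stratum.
    Returns the sampled unit indices and their design weights N_h / n_h.
    """
    strata = np.asarray(strata)
    idx_parts, w_parts = [], []
    for h, n in n_h.items():
        units = np.flatnonzero(strata == h)        # population units in stratum h
        chosen = rng.choice(units, size=n, replace=False)
        idx_parts.append(chosen)
        w_parts.append(np.full(n, units.size / n))  # a_k = N_h / n_h
    return np.concatenate(idx_parts), np.concatenate(w_parts)


# Toy population with two strata (invented sizes); the study used H = 7 strata
rng = np.random.default_rng(2010)
strata = np.repeat([0, 1], [100, 300])
idx, w = stratified_srswor(strata, {0: 10, 1: 10}, rng)
```

In a design-based simulation this draw would be repeated K = 1,000 times on the fixed population, with every estimator computed on each sample.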
Quality measures of estimators. Design bias: absolute relative bias, ARB (%). Accuracy: relative root mean squared error, RRMSE (%).
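Assuming the commonly used definitions, ARB_d = 100 · |(1/K) Σ_k θ̂_dk − θ_d| / θ_d and RRMSE_d = 100 · sqrt((1/K) Σ_k (θ̂_dk − θ_d)²) / θ_d, where θ_d is the true domain value in the fixed population and θ̂_dk the estimate from sample k, both measures can be computed per domain as in this sketch (function name hypothetical).

```python
import numpy as np


def arb_rrmse(estimates, true_values):
    """ARB (%) and RRMSE (%) per domain from a design-based simulation.

    estimates: array of shape (K, D), estimates from K samples for D domains.
    true_values: array of shape (D,), the domain parameters in the fixed population.
    """
    est = np.asarray(estimates, dtype=float)
    theta = np.asarray(true_values, dtype=float)
    arb = 100.0 * np.abs(est.mean(axis=0) - theta) / theta
    rrmse = 100.0 * np.sqrt(np.mean((est - theta) ** 2, axis=0)) / theta
    return arb, rrmse


# Toy check: K = 2 samples, one domain with true value 0.2
arb, rrmse = arb_rrmse([[0.1], [0.3]], [0.2])
```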
Discussion: Poverty rate. The indirect design-based estimator MLGREG is (approximately) design unbiased, with large variance in small domains and small variance in large domains. The indirect model-based estimator EB is design biased but has small variance also in small domains. In accuracy, EB outperformed MLGREG, so EB might be the best choice, at least for small domains, unless it is important to avoid design bias.
Discussion: Poverty gap. The direct estimator DEFAULT has small design bias but large variance. The indirect model-based SYN estimator has very large bias but small variance. The indirect model-based EP-SYN estimator, based on expanded predictions, has much smaller bias and variance than SYN. The composite estimator (DEFAULT combined with EP-SYN) is a good compromise in small domains; in large domains, bias can still dominate the MSE.
Thank you for your attention!