Small Area Models for Unemployment Rate Estimation at Sub-Provincial Areas in Italy


1 Small Area Models for Unemployment Rate Estimation at Sub-Provincial Areas in Italy. D'Alò M., Di Consiglio L., Falorsi S., Solari F. ~ Istat; Pratesi M., Salvati N. ~ University of Pisa; Ranalli M.G. ~ University of Perugia

2 OUTLINE: Italian Labour Force Survey; Standard and current small area estimators; Enhanced small area estimators; Experimental study; Analysis of results; Final remarks

3 Labour Force Survey description. The Labour Force Survey (LFS) is a quarterly two-stage survey with partial overlap of sampling units according to a 2-2-2 rotation scheme. In each province the municipalities are classified as Self-Representing Areas (SRAs) or Non Self-Representing Areas (NSRAs). From each SRA a sample of households is selected. In NSRAs the sample is based on a stratified two-stage sampling design: the municipalities are the primary sampling units (PSUs), while the households are the secondary sampling units (SSUs). Each quarterly sample involves about 1,350 municipalities and 200,000 individuals.
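As a rough illustration of what a 2-2-2 rotation scheme implies for sample overlap, the sketch below assumes the usual reading of the pattern (a household is interviewed for two consecutive quarters, rests for two quarters, and is interviewed for two more) and equal-sized rotation groups. The function names are hypothetical and are not part of any Istat software.

```python
def in_sample(entry_quarter: int, quarter: int) -> bool:
    """True if a rotation group that entered at `entry_quarter`
    is interviewed in `quarter` under the assumed 2-2-2 pattern."""
    # In for 2 quarters (offsets 0, 1), out for 2 (offsets 2, 3), in for 2 (offsets 4, 5).
    return (quarter - entry_quarter) in (0, 1, 4, 5)


def active_groups(quarter: int) -> set:
    """Entry quarters of the rotation groups interviewed in `quarter`."""
    return {e for e in range(quarter - 5, quarter + 1) if in_sample(e, quarter)}


def overlap(lag: int, quarter: int = 10) -> float:
    """Share of the groups interviewed in `quarter` that are
    interviewed again `lag` quarters later (equal group sizes assumed)."""
    current = active_groups(quarter)
    later = active_groups(quarter + lag)
    return len(current & later) / len(current)


if __name__ == "__main__":
    for lag in range(1, 6):
        print(lag, overlap(lag))
```

Under these assumptions half of the sample is shared between consecutive quarters and between the same quarter of consecutive years, which is the overlap that quarter-on-quarter and year-on-year change estimates rely on.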

4 Small area estimation on LFS. Since 2000, ISTAT has disseminated yearly LFS estimates of employed and unemployed counts for the 784 Local Labour Market Areas (LLMAs). LLMAs are unplanned domains, obtained as clusters of municipalities that cut across provinces, which are the finest planned domains of the LFS. The direct estimates are unstable because LLMA sample sizes are very small (more than 100 LLMAs have zero sample size), so SAE methods are necessary. Until 2003, a design-based composite-type estimator was adopted. Since 2004, after the redesign of the LFS sampling strategy, a unit-level EBLUP estimator with spatially autocorrelated random area effects has been used.

5 Standard small area estimators – design based: Direct and GREG estimator. The direct estimator is given by [formula]. The GREG estimator is based on the standard linear model [formula] and can be expressed as an adjustment of the direct estimator for differences between the sample and population area means of the covariates.
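The formulas on this slide are not preserved in the transcript. A common textbook form that matches the verbal description is sketched below; the notation is an assumption (area d, sample s_d, sampling weights w_{di}, known population covariate means \bar{X}_d) and may differ from the authors' slide.

```latex
% Assumed notation, not taken from the slide.
\begin{align*}
y_{di} &= x_{di}^{\top}\beta + e_{di}, \qquad E(e_{di}) = 0, \\
\hat{\bar{Y}}_d^{\mathrm{DIR}} &= \frac{\sum_{i \in s_d} w_{di}\, y_{di}}{\sum_{i \in s_d} w_{di}}, \qquad
\hat{\bar{X}}_d^{\mathrm{DIR}} = \frac{\sum_{i \in s_d} w_{di}\, x_{di}}{\sum_{i \in s_d} w_{di}}, \\
\hat{\bar{Y}}_d^{\mathrm{GREG}} &= \hat{\bar{Y}}_d^{\mathrm{DIR}}
  + \bigl(\bar{X}_d - \hat{\bar{X}}_d^{\mathrm{DIR}}\bigr)^{\top}\hat{\beta}.
\end{align*}
```

The second term of the GREG expression is the adjustment for the gap between the population and the estimated area means of the covariates.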

6 Standard small area estimators – model based: Unit level EBLUP. The EBLUP assumes a standard linear mixed model with unit-specific auxiliary variables, and with random area-specific effects and errors that are independently normally distributed. It is given by [formula], where [formula].
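The slide's formulas are missing from the transcript. The standard unit-level (nested error) form that fits the description is sketched here under assumed notation: \bar{y}_d and \bar{x}_d are area sample means, n_d is the area sample size, and the sampling fraction n_d/N_d is taken as negligible. It should be read as a reference form, not as the authors' exact expression.

```latex
% Assumed notation, not taken from the slide.
\begin{align*}
y_{di} &= x_{di}^{\top}\beta + u_d + e_{di}, \qquad
u_d \sim N(0,\sigma_u^2), \quad e_{di} \sim N(0,\sigma_e^2), \\
\hat{\bar{Y}}_d^{\mathrm{EBLUP}} &= \bar{X}_d^{\top}\hat{\beta} + \hat{u}_d, \qquad
\hat{u}_d = \hat{\gamma}_d\bigl(\bar{y}_d - \bar{x}_d^{\top}\hat{\beta}\bigr), \\
\hat{\gamma}_d &= \frac{\hat{\sigma}_u^2}{\hat{\sigma}_u^2 + \hat{\sigma}_e^2 / n_d}.
\end{align*}
```

The shrinkage factor \hat{\gamma}_d tends to 0 as n_d decreases, so areas with few sampled units fall back on the regression prediction.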

7 Standard small area estimators – model based: Unit level Synthetic estimator. Two synthetic estimators have been considered. The first assumes a standard linear model, as in GREG. The second assumes a linear mixed model with unit-specific auxiliary variables, as for the EBLUP. In both cases the estimator is given by [formula].
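The formula is not preserved in the transcript. In the usual formulation a synthetic estimator is the regression prediction at the known population covariate means; the notation below is an assumption.

```latex
% Assumed notation, not taken from the slide.
\begin{equation*}
\hat{\bar{Y}}_d^{\mathrm{SYN}} = \bar{X}_d^{\top}\hat{\beta},
\end{equation*}
% \hat{\beta} is fitted under the standard linear model (first estimator)
% or under the linear mixed model (second estimator).
```

Because no area-specific term enters the prediction, this form can also be computed for areas with zero sample size.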

8 Enhanced small area estimators 1. Unit level EBLUP with spatial correlation of area effects. The SEBLUP estimator is based on the following unit level linear mixed model: [formula]. The matrix A depends on the distances among the areas and on an unknown parameter connected to the spatial correlation coefficient among the areas.
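The model equation is missing from the transcript. A generic way to write a unit-level mixed model whose area effects are spatially correlated through a matrix A is sketched below. The symbol \alpha for the unknown parameter and the exponential-decay example for the entries of A are assumptions made for illustration; the slide does not state which distance function the authors used.

```latex
% Assumed notation; the exponential form is illustrative only.
\begin{align*}
y_{di} &= x_{di}^{\top}\beta + u_d + e_{di}, \qquad e_{di} \sim N(0,\sigma_e^2), \\
u &= (u_1,\dots,u_D)^{\top} \sim N\bigl(0,\ \sigma_u^2 A(\alpha)\bigr), \qquad
a_{dd'}(\alpha) = f\bigl(\mathrm{dist}(d,d');\alpha\bigr), \\
\text{e.g.}\quad a_{dd'}(\alpha) &= \exp\bigl(-\mathrm{dist}(d,d')/\alpha\bigr).
\end{align*}
```

With A equal to the identity matrix the model reduces to the standard unit-level EBLUP model with independent area effects.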

9 Enhanced small area estimators 2. Model Based Direct Estimator (Chambers & Chandra, 2006). The MBDE estimator is based on a unit level linear mixed model and is given by [formula], where the weights are such that [formula] is the (E)BLUP of [formula] under the model (Royall, 1976). Properties: it is calibrated with respect to the total of x; it reduces bias compared with the EBLUP; it does not allow estimation for non-sampled areas; it is less efficient than the EBLUP.
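The formulas are not preserved in the transcript. One way to write the estimator consistently with the description is given below, assuming that w_i denotes the model-based weight of sampled unit i, s the full sample, s_d the sample in area d and U the population; the authors' notation may differ.

```latex
% Assumed notation, not taken from the slide.
\begin{equation*}
\hat{\bar{Y}}_d^{\mathrm{MBDE}}
  = \frac{\sum_{i \in s_d} w_i\, y_i}{\sum_{i \in s_d} w_i},
\qquad \text{with } \sum_{i \in s} w_i\, y_i \ \text{the (E)BLUP of } \sum_{i \in U} y_i .
\end{equation*}
```

The estimator is a weighted mean of the sampled values in area d only, which is why it cannot be computed for areas with no sample.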

10 Enhanced small area estimators 3. Nonparametric EBLUP (Opsomer et al., 2008). The literature offers many nonparametric regression methods (kernel, local polynomial, wavelets, ...), but they are difficult to incorporate in a small area model. Methods based on penalized splines (Eilers and Marx, 1996; Ruppert et al., 2003) can be estimated by means of mixed models, which makes them a promising candidate for SAE methods. They offer great flexibility in the definition of the model and are estimable with existing software using REML. It is hard, however, to estimate efficiency and to test the significance of terms (via bootstrap?).
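The link between penalized splines and mixed models can be sketched as follows. The example assumes a truncated linear basis in the style of Ruppert et al. (2003) and treats the smoothing parameter `lam` (the ratio of error variance to spline-coefficient variance in mixed-model terms) as known, whereas the slides estimate the variance components by REML. Function names and data are hypothetical.

```python
import numpy as np


def truncated_linear_basis(x, knots):
    """Return the fixed-effects matrix X (intercept, x) and the
    random-effects matrix Z with columns (x - knot)_+ ."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)
    return X, Z


def fit_pspline(x, y, knots, lam):
    """Penalized least squares fit: only the spline coefficients are
    penalized, which matches the BLUP of a mixed model in which they
    are random effects with variance ratio lam = sigma_e^2 / sigma_gamma^2."""
    X, Z = truncated_linear_basis(x, knots)
    C = np.hstack([X, Z])
    D = np.diag([0.0, 0.0] + [1.0] * Z.shape[1])  # no penalty on intercept and slope
    coef = np.linalg.solve(C.T @ C + lam * D, C.T @ np.asarray(y, dtype=float))
    return coef, C @ coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 200)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
    coef, fitted = fit_pspline(x, y, knots=np.linspace(0.05, 0.95, 19), lam=0.1)
    print(fitted[:5])
```

In a small area setting the same Z matrix would enter the model as an additional random-effects block next to the area effects, so standard mixed-model software can fit it.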

11 Enhanced small area estimators 4. Logistic models. The probability of being unemployed has been estimated under different models, both fixed effects models and mixed effects models. The estimator of the mean value has the following form: [formula].
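The estimator's formula is missing from the transcript. One common predictive form for a binary variable is shown below as an assumption: observed values are used for sampled units and fitted probabilities for non-sampled units. The slide may instead have averaged fitted probabilities over the whole area.

```latex
% Assumed form and notation, not taken from the slide.
\begin{align*}
\hat{p}_{di} &= \frac{\exp(\hat{\eta}_{di})}{1+\exp(\hat{\eta}_{di})}, \qquad
\hat{\eta}_{di} = x_{di}^{\top}\hat{\beta} \ \ \text{(fixed effects)}
\quad \text{or} \quad
\hat{\eta}_{di} = x_{di}^{\top}\hat{\beta} + \hat{u}_d \ \ \text{(mixed effects)}, \\
\hat{\bar{Y}}_d &= \frac{1}{N_d}\Bigl(\sum_{i \in s_d} y_{di}
  + \sum_{i \in U_d \setminus s_d} \hat{p}_{di}\Bigr).
\end{align*}
```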

12 LFS empirical study. A simulation study on the LFS has been carried out to estimate the unemployment rate at LLMA level: 500 two-stage LFS samples have been drawn from the 2001 census data set. The performance of the methods has been evaluated for the estimation of the unemployment rate in the 127 LLMAs belonging to the geographical area Center of Italy. GREG, Synthetic and EBLUP have been applied with two different sets of auxiliary variables: LFS = the real LFS covariates (sex by 14 age classes plus the employment indicator at the previous census); LFS+C = the real covariates plus geographic coordinates (latitude and longitude of the municipality the sampling unit belongs to).

13 Enhanced small area estimators. SEBLUP: a spatial correlation in the variance matrix of the random effects has been considered in the LFS model. MBDE: model-based direct estimation is performed on sampled LLMAs, while a synthetic estimator based on a unit level linear mixed model is used for non-sampled LLMAs (LFS covariates). Nonparametric EBLUP: two semiparametric representations based on penalized splines have been applied (fitted as additional random effects): (i) geographical coordinates of the municipality (EBLUP-LFS+SplineC), which allows a finer representation of the spatial component than SEBLUP-LFS (at municipality level instead of LLMA level); (ii) age (EBLUP-SplineA and SEBLUP-SplineA). LOGIT: logistic models are implemented with LFS covariates in both a fixed and a mixed effects model, and with a spline for age in a fixed effects model.

14 Evaluation Criteria. % Relative Bias: [formula]. % Relative Root Mean Squared Error: [formula]. Average Absolute RB: [formula]. Average RRMSE: [formula]. Maximum Absolute RB: [formula]. Maximum RRMSE: [formula].
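The formulas for these criteria are not preserved in the transcript. The sketch below uses the definitions that are usual in simulation studies of this kind, which is an assumption about what the slide showed: for each area, the percentage relative bias is the mean error over replicates divided by the true value, and the percentage RRMSE is the root mean squared error divided by the true value; the summary measures are the average and the maximum over areas. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def evaluation_criteria(estimates, true_values):
    """estimates: array of shape (R replicates, D areas);
    true_values: array of shape (D,). Returns summaries in percent."""
    est = np.asarray(estimates, dtype=float)
    true = np.asarray(true_values, dtype=float)
    err = est - true  # broadcasts the true values over the replicates
    rb = 100.0 * err.mean(axis=0) / true                      # % relative bias per area
    rrmse = 100.0 * np.sqrt((err ** 2).mean(axis=0)) / true   # % RRMSE per area
    return {
        "AARB": float(np.abs(rb).mean()),
        "ARRMSE": float(rrmse.mean()),
        "MARB": float(np.abs(rb).max()),
        "MRRMSE": float(rrmse.max()),
    }


if __name__ == "__main__":
    # Toy example: 2 replicates, 2 areas (made-up numbers).
    toy = evaluation_criteria([[11.0, 20.0], [9.0, 22.0]], [10.0, 20.0])
    print(toy)
```

In the study described on the slides, R would be the 500 simulated samples and D the 127 LLMAs.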

15 Results (LFS = LFS covariates; LFS+C = LFS covariates + geographical coordinates of the municipality). Table columns: ESTIMATOR, AARB, ARRMSE, MARB, MRRMSE. Table rows: DIRECT, GREG-LFS, GREG-LFS+C, SYNTH-LFS, SYNTH-EB-LFS, SYNTH-EB-LFS+C, EBLUP-LFS, EBLUP-LFS+C, SEBLUP-LFS, MBDE-LFS, EBLUP-LFS+SplineC, EBLUP-SplineA, SEBLUP-SplineA, LOGIT-LFS, LOGIT-SplineA, MLOGIT-LFS. [numeric values not preserved]

16 Analysis of results. The results show that the DIRECT estimator has the lowest AARB and the highest ARRMSE, as expected. With respect to the DIRECT estimator, the two GREG estimators increase bias and decrease variance. When geographical information is included as a fixed effect, the estimators perform better in terms of bias. The purely model-based SYNTH-LFS estimator performs worse than the mixed-model-based synthetic estimators. The EBLUP estimators show a larger bias but a much lower variance than the GREG estimators, and a lower bias than the corresponding SYNTH-EB-LFS and SYNTH-EB-LFS+C. The close performance of the EBLUP and SYNTH-EB estimators highlights the importance of introducing random area effects in the model.

17 Analysis of results. The performance of the MBDE-LFS estimator is a compromise between GREG-LFS and EBLUP-LFS. The estimators EBLUP-LFS+C, SEBLUP-LFS and EBLUP-LFS+SplineC, which include the spatial information in different ways, display similar results in terms of bias and MSE. EBLUP-LFS+C is to be preferred for its simplicity and because it seems to keep the bias under better control. EBLUP-LFS and EBLUP-SplineA give similar outcomes, but the latter is more parsimonious. LOGIT-LFS underperforms the corresponding SYNTH-LFS. LOGIT-SplineA gives results very similar to those of LOGIT-LFS, suggesting again that a model simpler than LFS should be sought. With respect to the EBLUP-LFS estimator, the MLOGIT-LFS estimator is better in terms of bias but performs poorly in terms of MSE.

18 Final remarks. Sensitivity to the choice of smoothing parameters in the spline approach has to be investigated. The introduction of the sampling weights should be considered, to try to achieve benchmarking with the direct estimates produced at regional level. The model group is a small portion of Italy (the Center); hence the area-specific effects are smaller than they could be if an overall model were considered for the whole country. The introduction of geographical information should therefore be analyzed considering a larger model group.