
Sampling strategy for the dual-system correction of the under-coverage in the Register Supported 2011 Italian Population Census
Loredana Di Consiglio, Marco Fortini, Stefano Falorsi (ISTAT)
Q2010 European Conference on Quality in Official Statistics - Helsinki, May 2010

Outline
Purpose: to plan a sampling strategy that takes into account municipal under-coverage in the next Italian Census round.
- Sketch of the 2011 Italian Census
- Sources of data useful in planning the Post Enumeration Survey (PES)
- Sampling strategies considered for comparison
- Construction of a fictitious, but plausible, population to simulate the sampling universe
- Results of the simulation study

Key innovations of the 2011 Italian census
From the traditional enumeration method (searching for households and people in the field)...
... to a register-supported census:
- municipal population registers used to mail out questionnaires to people
- data collection based on web, mail-back and municipal data collection centres
- reduction of the number of enumerators
- data collection from late respondents
- coverage evaluation activities

Coverage evaluation program
Required by the Eurostat quality report, it is in any case crucial in this context of extensive innovation in processes and methods.
- Over-coverage: people no longer living in the municipality who are still listed in the population registers. Checked by interviewers when contacting late respondents.
- Under-coverage: people living in the municipality who are not yet listed in the population registers. Addressed through:
  - supplemental lists of people
  - extensive search in the field
  - statistical estimation based on capture-recapture techniques

Overview of Italian census undercount
- Gross under-coverage of population registers: estimated by Fortini and Gallo (2009) at about 400,000 people (up to 560,000), using administrative data and a mixture model analysis to account for under-reporting in the source.
- Gross under-coverage of the 2001 Census (enumeration based): the 2001 Post Enumeration Survey estimates that about 800,000 people were missed.
Both estimates rest on strong assumptions. Nevertheless, this evidence makes it reasonable to use municipal population registers as the main source for household enumeration.

Capture-recapture approach
Register under-count is corrected through a second source based on an independent field enumeration:
- $x_{1+}$: people listed in the municipal register
- $\hat{X}_{+1}$: estimate of the municipal population based on a field enumeration survey in a sample of enumeration areas (EAs)
- $\hat{X}_{11}$: estimate of the people who would have been counted by both sources had the field enumeration been carried out over the whole municipal area
The Petersen estimator of the population, including the part hidden from the register, is (Wolter, 1986)
$\hat{N} = \frac{x_{1+}\,\hat{X}_{+1}}{\hat{X}_{11}}$
Main goal: municipal estimates of population counts.
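The Petersen combination of the three counts, $\hat{N} = x_{1+}\hat{X}_{+1}/\hat{X}_{11}$, is simple arithmetic. A minimal sketch follows; the function name and the figures are hypothetical, chosen only to show that a register coverage of 98% estimated from the survey (931 of 950 enumerated people also on the register) inflates 1,000 registered people to about 1,020.

```python
def petersen_estimate(x_reg: float, x_survey: float, x_both: float) -> float:
    """Dual-system (Petersen) estimate of the true population size.

    x_reg    -- people listed in the municipal register (x_{1+})
    x_survey -- (estimated) field enumeration count (X_{+1})
    x_both   -- (estimated) people counted by both sources (X_{11})
    """
    if x_both <= 0:
        raise ValueError("the matched count must be positive")
    return x_reg * x_survey / x_both


# Hypothetical figures: 1,000 registered, 950 enumerated, 931 matched.
n_hat = petersen_estimate(1_000, 950, 931)
hidden = n_hat - 1_000  # people missed by the register, about 20
```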

Sampling design for the 2011 Post Enumeration Survey
About 1,300 municipalities and 1,200,000 people will be sampled.
Two alternative two-stage sampling designs, with municipalities and enumeration areas as primary and secondary sampling units:
- Design A: region by class of population size (four classes, with cut-offs at 5,000, 20,000 and 50,000 inhabitants)
- Design B: aggregations of provinces within each region by the same 4 classes of population size (helps reduce the bias of small area estimation, SAE)
In both designs, municipalities are stratified and selected according to their population size. Sampling among municipalities is necessary to control costs.
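As a sketch of how strata could be keyed under the two designs, the snippet below assigns the four size classes using cut-offs at 5,000, 20,000 and 50,000 inhabitants (read from the size-class labels of the bias charts). The province-to-group lookup for design B is a hypothetical input: the slides do not say how provinces were aggregated.

```python
from typing import Mapping

# Cut-offs (inhabitants) separating the four population-size classes.
SIZE_CUTOFFS = (5_000, 20_000, 50_000)


def size_class(population: int) -> int:
    """0: <5,000; 1: 5,000-19,999; 2: 20,000-49,999; 3: 50,000 and more."""
    return sum(population >= c for c in SIZE_CUTOFFS)


def stratum_design_a(region: str, population: int) -> tuple[str, int]:
    """Design A stratum: region by size class."""
    return (region, size_class(population))


def stratum_design_b(province: str, population: int,
                     province_group: Mapping[str, str]) -> tuple[str, int]:
    """Design B stratum: group of provinces (within a region) by size class.

    `province_group` is a hypothetical lookup from province to its group.
    """
    return (province_group[province], size_class(population))
```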

Estimators
Direct estimates of census counts are available only at planned domain level; small area estimation methods are needed, at least for municipalities not included in the sample.
Possible predictors available for area-level modelling:
- population counts from the register
- demographic indicators (e.g. dependency ratios)
- socio-economic indicators
In what follows we consider:
- direct estimation at regional level (planned domains)
- a synthetic estimator at municipality level, assuming that municipal under-coverage rates are invariant within each planned domain

Direct estimators
Simple and calibrated expansion estimators:
- simple: the weight is the inverse of the selection probability
- calibrated: the final weight is obtained by calibration
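A minimal sketch of the two weight systems, assuming the overall inclusion probability of each sampled EA is known and that calibration is a simple ratio adjustment to the known domain register total (the slides do not state the calibration constraints). `SampledEA` and the toy numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SampledEA:
    """One sampled enumeration area (field names are illustrative)."""
    incl_prob: float  # overall inclusion probability (municipality x EA stage)
    reg: int          # register count, x_{1+}
    survey: int       # field enumeration count, x_{+1}
    both: int         # people found by both sources, x_{11}


def simple_weights(sample):
    """Expansion weights: inverse of the selection probability."""
    return [1.0 / ea.incl_prob for ea in sample]


def calibrated_weights(sample, reg_total):
    """Final weights: simple weights ratio-adjusted so that the weighted
    register count reproduces the known domain register total."""
    d = simple_weights(sample)
    g = reg_total / sum(w * ea.reg for w, ea in zip(d, sample))
    return [g * w for w in d]


def expansion_totals(sample, weights):
    """Weighted (expansion) totals of survey and matched counts."""
    survey = sum(w * ea.survey for w, ea in zip(weights, sample))
    both = sum(w * ea.both for w, ea in zip(weights, sample))
    return survey, both


# Toy domain: two sampled EAs and a known register total of 1,000.
sample = [SampledEA(0.25, 100, 95, 93), SampledEA(0.5, 200, 190, 186)]
simple = expansion_totals(sample, simple_weights(sample))
calib = expansion_totals(sample, calibrated_weights(sample, 1_000))
```

In this toy case the calibration factor (1.25) scales both weighted totals equally, so it cancels in the Petersen ratio $\hat{X}_{+1}/\hat{X}_{11}$; a calibration whose adjustment varies across EAs is needed for the two estimates of that ratio to differ.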

Synthetic estimator
Based on the assumption that under-coverage rates are invariant across municipalities belonging to the same planned domain.
For each system of weights (simple and calibrated), the coverage ratio is computed at domain level. From these ratios, simple and calibrated synthetic estimators are obtained for municipalities.
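Under the invariance assumption, one natural reading is $\hat{N}_m = x_{1+,m}/\hat{r}_d$, with the register coverage ratio $\hat{r}_d = \hat{X}_{11,d}/\hat{X}_{+1,d}$ estimated in planned domain $d$ with either system of weights. The sketch below assumes that form; function names and figures are hypothetical.

```python
def domain_coverage_ratio(survey_total: float, both_total: float) -> float:
    """Estimated register coverage in a planned domain: X11 / X+1."""
    if survey_total <= 0:
        raise ValueError("survey total must be positive")
    return both_total / survey_total


def synthetic_estimates(register_counts: dict[str, int],
                        coverage_ratio: float) -> dict[str, float]:
    """Inflate each municipality's register count by the domain ratio, under
    the invariance assumption (same coverage in every municipality of d)."""
    return {m: x / coverage_ratio for m, x in register_counts.items()}


# Hypothetical weighted domain totals and two municipalities of the domain,
# which need not be in the sample.
ratio = domain_coverage_ratio(950.0, 931.0)
est = synthetic_estimates({"A": 490, "B": 2_450}, ratio)
```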

Empirical study
The study is based on simulation. Two pseudo-populations of 335,643 Italian EAs were considered.
Sources of information:
- the 2001 Italian Post Enumeration Survey
- administrative data on changes of residence that occurred after the 2001 census (from November 2002 to December 2005)
For every non-empty EA belonging to the 8,101 Italian municipalities, the following counts were generated:
- observed count from the population register ($X_{1+}$)
- true population count ($N$)
- field enumeration count ($X_{+1}$)
- count of people enumerated by both sources ($X_{11}$)

Assemble the pseudo-population
For each municipality, a table is built with one row per EA and a municipal total row, with columns Munic. Id, EA Id, True N, P. Reg., Survey, Both.
Population register counts (P. Reg.) come from 2001 Census counts.

Assign true population counts to municipalities
True municipal population counts are obtained by inflating the register counts (P. Reg.) by 1/r, where the coverage rate r is estimated by the model in Fortini and Gallo (2009) (2 different populations).

Assign true population counts to EAs
The true municipal count N is allocated among EAs by a hierarchical Dirichlet/Multinomial model, with parameter vector p given by the distribution of the register population (P. Reg.) among EAs.
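The two steps of inflating by $1/r$ and splitting among EAs might be simulated as follows. This is a sketch under assumptions: the hierarchical Dirichlet/Multinomial model is read here as a single Dirichlet-multinomial draw centred on register shares, and `concentration`, the value of r and the EA counts are hypothetical. It uses NumPy's `Generator.dirichlet` and `Generator.multinomial`.

```python
import numpy as np


def true_municipal_count(reg_total: int, r: float) -> int:
    """True municipal count: register count inflated by 1/r (rounded)."""
    return int(round(reg_total / r))


def allocate_to_eas(true_total: int, reg_by_ea, concentration: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Split a municipal true count among EAs with a Dirichlet-multinomial draw.

    The Dirichlet mean is each EA's share of the register population;
    `concentration` is a hypothetical tuning parameter (larger values keep
    the allocation closer to the register shares). EAs must be non-empty.
    """
    reg = np.asarray(reg_by_ea, dtype=float)
    p = rng.dirichlet(concentration * reg / reg.sum())
    return rng.multinomial(true_total, p)


rng = np.random.default_rng(2010)
reg_by_ea = [300, 500, 200]
n_true = true_municipal_count(sum(reg_by_ea), r=0.98)  # hypothetical r
true_by_ea = allocate_to_eas(n_true, reg_by_ea, concentration=200.0, rng=rng)
```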

Assign survey counts to EAs
Survey counts are the true EA count N multiplied by a coverage rate rs, drawn from a beta-binomial distribution whose parameters alpha and beta are chosen so that the mean and variance of the 2001 PES coverage rates are reproduced (5 macro-regions by 4 classes of municipal population size).
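A sketch of the survey-count step, assuming alpha and beta are obtained by the method of moments from the stratum mean $\mu$ and variance $\sigma^2$ of PES coverage rates: $c = \mu(1-\mu)/\sigma^2 - 1$, $\alpha = \mu c$, $\beta = (1-\mu)c$. The moments used below are made up, not the 2001 PES values.

```python
import numpy as np


def beta_params_from_moments(mean: float, var: float) -> tuple[float, float]:
    """Method-of-moments Beta(alpha, beta) with the given mean and variance."""
    if not 0 < var < mean * (1 - mean):
        raise ValueError("need 0 < var < mean * (1 - mean)")
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def survey_counts(true_by_ea, mean: float, var: float,
                  rng: np.random.Generator) -> np.ndarray:
    """Beta-binomial survey counts: draw a coverage rate rs for each EA from
    the fitted beta, then the enumerated people as Binomial(true count, rs)."""
    a, b = beta_params_from_moments(mean, var)
    true_by_ea = np.asarray(true_by_ea)
    rs = rng.beta(a, b, size=true_by_ea.shape)
    return rng.binomial(true_by_ea, rs)


# Made-up stratum moments (the study uses 2001 PES values per stratum).
alpha, beta = beta_params_from_moments(0.95, 0.0019)
survey = survey_counts([306, 510, 204], 0.95, 0.0019, np.random.default_rng(0))
```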

Assign survey counts to municipalities
The municipal survey count is obtained by summing the values of its EAs.

Assign the number of people enumerated by both lists (EA level)
People enumerated by both lists are drawn from a hypergeometric distribution at EA level, with parameters True N, P. Reg. and Survey.
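The hypergeometric step can be sketched with NumPy's `Generator.hypergeometric(ngood, nbad, nsample)`: among the true EA population, the registered people are the "good" items and the survey draws `survey` people. The slides do not say what happens where the allocated true count falls below the register count; capping the register count, as done here, is an assumption, and the counts are toy values.

```python
import numpy as np


def matched_counts(true_by_ea, reg_by_ea, survey_by_ea,
                   rng: np.random.Generator) -> np.ndarray:
    """People enumerated by both lists in each EA.

    The survey picks `survey` people out of `true`, of whom `reg` are on the
    register; if the two lists are independent, the overlap is hypergeometric.
    """
    true_by_ea = np.asarray(true_by_ea)
    # Assumption: cap register counts at the true count so the draw is valid.
    reg_by_ea = np.minimum(np.asarray(reg_by_ea), true_by_ea)
    return rng.hypergeometric(reg_by_ea, true_by_ea - reg_by_ea,
                              np.asarray(survey_by_ea))


true_by_ea = [1_000, 500]
reg_by_ea = [980, 490]
survey_by_ea = [950, 475]
both = matched_counts(true_by_ea, reg_by_ea, survey_by_ea,
                      np.random.default_rng(7))
```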

Assign the number of people enumerated by both lists (municipal level)
The municipal count is obtained by summing over its EAs.

Standard deviation of coverage rates among municipalities
About 400,000 and 900,000 missing people were generated for the pseudo-register and the pseudo-survey, respectively.
- Population register variability is larger for POP2 than for POP1.
- Survey variability is larger than the corresponding population register variability (because of its lower coverage rate).
- Survey variability is not very close to PES variability, although their order of magnitude is the same.

Variability of coverage rates among EAs: population registers
[Figure: pseudo-coverage of the register vs size of EAs (left), compared with the distribution of EA coverage rates in the 2001 Italian PES (1,098 EAs). Annotation: "Too many points here".]
Simulated EAs show too many large units with very small coverage rates, which does not seem realistic in our context.

Variability of coverage rates among EAs: control survey
[Figure: pseudo-coverage of the survey vs size of EAs (left), compared with the distribution of EA coverage rates in the 2001 Italian PES (1,098 EAs). Annotation: "Too few points here".]
In this case, simulated EAs show too few small units with small coverage rates.

Simulation of the sampling space
Four tests: designs A and B for populations 1 and 2. Each simulation is based on 500 sample replications:
- sampling of municipalities with probability proportional to their population size
- simple random sampling of EAs within municipalities
- simple and weighted direct estimation at domain level
- synthetic estimation at municipality level
Population counts from the population registers are used here as the benchmark for comparisons: they are biased downwards but available at no cost.
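A sketch of one replication of the two-stage selection. The slides state only "probability proportional to size" for municipalities; systematic PPS is used here as one standard choice, and the number of EAs per municipality `m` and the toy sizes are hypothetical. In the study this draw is repeated 500 times for each of the four tests.

```python
import numpy as np


def pps_systematic(sizes, n: int, rng: np.random.Generator):
    """Systematic PPS selection of n units.

    Returns the selected indices and their inclusion probabilities
    n * size / total. Units with probability >= 1 (self-representative
    municipalities) must be taken out beforehand.
    """
    sizes = np.asarray(sizes, dtype=float)
    pi = n * sizes / sizes.sum()
    if np.any(pi >= 1):
        raise ValueError("remove self-representative units first")
    cum = np.cumsum(pi)
    cum[-1] = n  # guard against rounding in the last cumulative value
    points = rng.uniform() + np.arange(n)  # one random start, unit step
    idx = np.searchsorted(cum, points)
    return idx, pi[idx]


def two_stage_sample(munic_sizes, n_eas, n_munic: int, m: int,
                     rng: np.random.Generator):
    """PPS sample of municipalities, then SRS without replacement of up to m
    EAs in each. Returns (municipality, EA, overall inclusion probability)."""
    sample = []
    idx, pi_munic = pps_systematic(munic_sizes, n_munic, rng)
    for i, p1 in zip(idx, pi_munic):
        m_i = min(m, n_eas[i])
        eas = rng.choice(n_eas[i], size=m_i, replace=False)
        sample.extend((int(i), int(ea), p1 * m_i / n_eas[i]) for ea in eas)
    return sample


rng = np.random.default_rng(1)
sample = two_stage_sample([1_000, 2_000, 3_000, 4_000, 5_000, 6_000],
                          n_eas=[3, 4, 5, 6, 7, 8], n_munic=2, m=2, rng=rng)
```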

Results: bias of registers vs synthetic estimates
Main results:
- Direct estimates perform well in terms of bias and MSE at domain level.
- Calibrated estimates outperform the simple ones in terms of MSE, for both direct and synthetic estimators.
- The less aggregated design B does not significantly improve the estimates, so only design A is shown here.
In terms of bias, the synthetic estimator improves on the registers. The improvement decreases for larger municipalities. These results are more evident for population 1 than for population 2. In terms of maximum bias, the improvement is less noticeable.

Bias of synthetic estimator vs register counts
Population 1, design A, by class of municipality size
[Figure: four panels by municipality size: less than 5,000; 5,000–19,000; 20,000–49,000; 50,000 and more. Bisectors delimit the zone where synthetic estimates are better than simple register counts in terms of bias.]
- The synthetic estimator almost always improves on the registers in terms of bias.
- However, the improvement does not seem very pronounced.
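Assuming each panel plots the bias of the synthetic estimator (y) against the bias of the register count (x), the bisectors $y = x$ and $y = -x$ enclose the points with $|y| < |x|$. The share of municipalities falling in that zone could be computed as below, with `synth_mean` the average synthetic estimate over the replications (names and numbers are illustrative).

```python
import numpy as np


def improvement_share(true_counts, reg_counts, synth_mean) -> float:
    """Share of municipalities where the synthetic estimator's bias is smaller
    in absolute value than the register's (the zone between the bisectors)."""
    true_counts = np.asarray(true_counts, dtype=float)
    bias_reg = np.asarray(reg_counts, dtype=float) - true_counts
    bias_syn = np.asarray(synth_mean, dtype=float) - true_counts
    return float(np.mean(np.abs(bias_syn) < np.abs(bias_reg)))


share = improvement_share([100, 200, 300], [95, 198, 300], [99, 203, 301])
```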

Bias of synthetic estimator vs register counts
Population 2, design A, by class of municipality size
The same conclusion holds for POP2, with worse results for larger municipalities.

Results: MSE of synthetic and direct estimators
The direct estimator can be applied to self-representative municipalities; it is reported in the table for the two classes of larger municipalities.
- On average, the synthetic estimator outperforms the direct one, which does not seem useful even in sampled municipalities.
- The MSE of synthetic estimates is much larger than their bias (in Table 2).
Since this does not happen in real cases, it may be evidence that the variability of the pseudo-populations at EA level is too high.

Difference between synthetic and direct estimator in terms of MSE: municipalities larger than 50,000 inhabitants
Most municipalities larger than 50,000 inhabitants show a better synthetic MSE (negative values).
Direct and synthetic estimates are equivalent for the largest municipalities (more than 250,000 inhabitants), but only in POP1.
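The bias and MSE figures come from the sample replications. A minimal sketch of the Monte Carlo computation and of the MSE difference (synthetic minus direct, so negative values favour the synthetic estimator), with toy inputs. Since MSE = bias² + variance, an MSE much larger than the squared bias means the variance dominates.

```python
import numpy as np


def mc_bias_mse(estimates, true_value: float) -> tuple[float, float]:
    """Monte Carlo bias and MSE of an estimator from its replicate estimates."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value
    mse = float(np.mean((est - true_value) ** 2))  # = bias**2 + variance
    return float(bias), mse


def mse_difference(synth_estimates, direct_estimates, true_value: float) -> float:
    """Synthetic minus direct MSE: negative values favour the synthetic estimator."""
    return (mc_bias_mse(synth_estimates, true_value)[1]
            - mc_bias_mse(direct_estimates, true_value)[1])


bias, mse = mc_bias_mse([98, 102, 104], true_value=100)
diff = mse_difference([99, 101, 100], [98, 102, 104], true_value=100)
```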

Concluding remarks
- The sampling strategy for the next Italian Census PES is evaluated here through pseudo-populations and simulation experiments.
- Synthetic estimates yield a slight improvement over the census counts from registers.
- Although a Census PES is required by EU regulation for evaluation purposes, our present results do not support using the PES to correct Census counts.
- Although not discussed here, direct estimation with calibration achieved suitable results at domain level in terms of both bias and variance.
Further developments
- Better definition of pseudo-populations with respect to coverage ratios between EAs
- Use of model-based estimation (EBLUP), which was promising in our previous studies carried out in a simplified framework