Joint UNECE/Eurostat Meeting on Population and Housing Censuses (13-15 May 2008): Sample results expected accuracy in the Italian Population and Housing Census

Presentation transcript:

Joint UNECE/Eurostat Meeting on Population and Housing Censuses (13-15 May 2008). Sample results expected accuracy in the Italian Population and Housing Census. Giancarlo Carbonetti, Marco Fortini, Istat – Italian National Statistical Institute, General Censuses Directorate. May 13th 2008

Outline
- Introduction
- Some aspects related to the use of samples of households for long form enumerations
- Sampling strategies
- Simulation study
- Some results
- Conclusions

Introduction - 1
Main critical issue of the last Census:
- the huge organizational (and economic) effort required of Municipal Census Offices: a sudden, time-concentrated increase in workload for the largest municipalities; a massive network of enumerators and coordinators to be trained and managed; a lack of adequately skilled resources and high turnover rates.
Main objectives for the next Census:
- to improve the efficiency of census operations
- to reduce the municipalities' workload
- to keep a high level of quality

Introduction - 2
Innovations proposed to reach the objectives:
- the use of population registers
- mail-out of census forms
- a mixed mode of data collection, mainly based on mail and web
Expected consequences of the innovations:
- an increase in "back office" work
- a reduction in the number of enumerators ("front office" work)
How can response rates be increased? A proposal: a "short form" version of the questionnaire is considered as a way to reach high response rates.

Introduction - 3
Consequences of using a short form:
- higher response rates
- response delays reduced as much as possible
However, this approach risks a loss of information. How can the richness of census information be preserved? By selecting a sample of households to which a "long form" version of the questionnaire is supplied.
Strategy: the simultaneous use of short and long forms.

Some aspects related to the use of samples of households for long form enumeration - 1
Which types of information can be surveyed by means of a sample of long forms, and which must be collected for the whole population?
- The overall set of census variables is partitioned into two subsets: the demographic variables (gender, date of birth, marital status, nationality, …) and the remaining variables (educational level, occupational status, commuting).
- The short form covers only the first set of variables, whereas the long form covers the whole set.

Some aspects related to the use of samples of households for long form enumeration - 2
What is the municipality population threshold below which the sampling strategy cannot be adopted?
- One option under consideration is to sample in municipalities with more than 5,000 inhabitants: long forms will be submitted to a sample of households, and short forms will be administered to the remaining households.
- In municipalities with fewer than 5,000 inhabitants, long forms will be submitted to the whole population.
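The allocation rule above can be sketched as a small function. This is a minimal illustration, assuming one household list per municipality and a simple random sample of households; the function name `assign_forms` and the treatment of a municipality with exactly 5,000 inhabitants (long form to everyone) are hypothetical choices, not part of the proposal.

```python
import random

LONG_FORM_THRESHOLD = 5_000  # threshold under consideration in the slides


def assign_forms(household_ids, municipality_population, sampling_ratio, seed=None):
    """Hypothetical helper: map each household id to 'long' or 'short'.

    Above the threshold, a simple random sample of households gets the long
    form and the rest get the short form; at or below it (assumption for the
    boundary case), every household gets the long form.
    """
    ids = list(household_ids)
    if municipality_population <= LONG_FORM_THRESHOLD:
        return {h: "long" for h in ids}
    rng = random.Random(seed)
    n = round(sampling_ratio * len(ids))
    sampled = set(rng.sample(ids, n))
    return {h: ("long" if h in sampled else "short") for h in ids}
```

For example, `assign_forms(range(300), 50_000, 0.33, seed=1)` marks 99 of the 300 households for the long form.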

Some aspects related to the use of samples of households for long form enumeration - 3
Which domains have to be considered to plan the sample and to produce accurate estimates?
- New "census domains" have been defined: an appropriate methodology was adopted to build census domains by aggregating the smallest census areas, and the new "areas" refer to the sub-municipal level.
- Accuracy of sampling estimates at different territorial levels: a similar precision is expected for estimates across areas, and higher precision is expected for larger territorial references (from sub-municipal up to national level).

Some aspects related to the use of samples of households for long form enumeration - 4
Which statistical methodology produces the most accurate estimates, in terms of sampling design, use of appropriate lists, efficient estimation methods and sampling error assessment? Answering this question is the aim of the study whose results are partly presented here.

Sampling strategies
Two different sampling designs have been tested:
- Simple Random Sampling of HOUseholds from Administrative Registers managed by municipalities (SRSHOU)
- Area Frame Sampling based on Simple Random Sampling of ENumeration Areas (SRSENA), which implies complete data collection from the households living in the selected enumeration areas (taken from the Digital Geocoded Database)
Several studies have been conducted:
- to compare the two approaches (with a sampling ratio of about one third of the population considered)
- to evaluate, for SRSHOU, the improvement in the precision of the estimates as the sampling ratio increases (10%, 15%, 20%, 33%)
- to introduce some stratification of the units involved
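The difference between the two designs lies in what is drawn: individual households from a register list, or whole enumeration areas whose households are all enumerated. A minimal sketch, assuming households are identified by ids and an `area_of` mapping gives each household's enumeration area; both function names and the data layout are hypothetical.

```python
import random
from collections import defaultdict


def srshou(households, ratio, rng):
    """SRSHOU: simple random sample of households from a register list."""
    n = round(ratio * len(households))
    return rng.sample(households, n)


def srsena(households, area_of, ratio, rng):
    """SRSENA: simple random sample of enumeration areas; every household in a
    selected area is enumerated (take-all clusters)."""
    by_area = defaultdict(list)
    for h in households:
        by_area[area_of[h]].append(h)
    areas = sorted(by_area)
    m = round(ratio * len(areas))
    chosen = rng.sample(areas, m)
    return [h for a in chosen for h in by_area[a]]
```

With the same sampling ratio both designs yield a similar number of households, but SRSENA concentrates them in fewer places, which is the source of the cluster effect noted in the results.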

Simulation study - 1
Main features of the sampling designs:
- Domains: the "new areas", referring to sub-municipal districts
- Target variables: cross-classifications of educational level, employment status and commuting with demographic variables
- Sampling units: households or enumeration areas
- Estimator: calibrated estimators, using final weights adjusted so as to make the sample more representative
The sampling strategies were compared through Monte Carlo sampling replications (carried out on 2001 Italian Census data) in order to assess the sampling error, measured by the coefficient of variation (CV) of the sampling estimates.
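The calibrated estimator can be illustrated with raking (iterative proportional fitting), one common way to compute calibrated weights. The slides do not state which calibration distance was used, so raking is a swapped-in illustration; the function `rake`, the dict-based unit records and the category labels are hypothetical. The margins mirror the constraints listed in the simulation set-up (gender by age class, gender by marital status), with population totals assumed known from the full enumeration.

```python
from collections import defaultdict


def rake(units, weights, margins, n_iter=200, tol=1e-10):
    """Iterative proportional fitting (raking) of design weights.

    units   : list of dicts, e.g. {"sex_age": "F_30-44", "sex_mar": "F_married"}
    weights : initial design weights (e.g. 1 / sampling ratio)
    margins : {variable: {category: known population total}}
    Returns weights whose weighted totals match every margin.
    """
    w = list(weights)
    for _ in range(n_iter):
        max_change = 0.0
        for var, totals in margins.items():
            # current weighted total for each category of this variable
            current = defaultdict(float)
            for u, wi in zip(units, w):
                current[u[var]] += wi
            # scale weights so this margin is matched exactly
            for i, u in enumerate(units):
                factor = totals[u[var]] / current[u[var]]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:
            break
    return w
```

Each pass matches one margin exactly and may disturb the others; iterating until all adjustment factors are close to 1 gives weights satisfying both sets of constraints when such weights exist.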

Simulation study - 2
Because of the strong differences among Italian municipalities, 40 municipalities of different population sizes and from different regions of Italy were considered.
Table: municipalities in the simulation study by geographical area (North, Center, South, Total) and class of population size (a): 10,000-20,000, 20,000-…, more than 100,000 inhabitants.
(a) Legal (official) population from the 2001 Census of Population.

Simulation study - 3
Amount of units involved in the simulation study
Units                Sampled      Universe      %
Areas                497          3,347 (*)     14.85%
Enumeration areas    30,890       382,…         …
Households           2,243,511    21,810,…      …
Individuals          5,537,582    56,594,…      …
(*) Estimated number

Scatter plot of CV and p (estimates) for each census area. SRSHOU design (sampling ratio = 33%). City of Perugia.

Table: distribution of median CV by classes of p for the SRSHOU and SRSENA designs (both with sampling ratio = 33%). Comparison of 4 municipalities: Milano (111 areas), Bologna (32 areas), Padova (18 areas), Livorno (13 areas), with SRSHOU and SRSENA columns for each.
Classes of p: < 0.05%, 0.05%├0.1%, 0.1%├0.25%, 0.25%├0.5%, 0.5%├1%, 1%├2.5%, 2.5%├5%, 5%├10%, 10%├15%, 15%├20%, 20%├30%, ≥ 30%.
The higher CVs of SRSENA are due to the cluster effect.
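The cluster effect is commonly summarized by Kish's approximate design effect, deff ≈ 1 + (m̄ − 1)ρ, where m̄ is the average cluster size and ρ the intra-cluster correlation of the target variable. The sketch below is this textbook approximation, not the authors' calculation; ρ would have to be estimated for each target cell and is not reported in the slides. For scale, the counts in the simulation study (2,243,511 households in 30,890 enumeration areas) imply about 73 households per enumeration area on average.

```python
import math


def kish_deff(mean_cluster_size, icc):
    """Kish approximation of the design effect of cluster sampling."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def cv_inflation(mean_cluster_size, icc):
    """Approximate ratio CV(cluster design) / CV(SRS) at the same sample size."""
    return math.sqrt(kish_deff(mean_cluster_size, icc))
```

Because m̄ multiplies ρ, even a weak intra-area correlation can inflate the CV noticeably when enumeration areas contain dozens of households.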

Table: loss of efficiency (in terms of CV, by classes of p) of estimation with SRSENA with respect to SRSHOU (both with sampling ratio = 33%), measured as [CV(SRSHOU, s.r. 33%) − CV(SRSENA, s.r. 33%)]. Comparison of 4 municipalities: Milano (111 areas), Bologna (32 areas), Padova (18 areas), Livorno (13 areas). Classes of p: the same twelve classes, from < 0.05% to ≥ 30%.

Table: distribution of median CV by classes of p. Comparison of 4 sampling ratios with the SRSHOU design: 10% (170 areas), 15% (140 areas), 20% (111 areas), 33% (204 areas). Classes of p: the same twelve classes, from < 0.05% to ≥ 30%.

Table: gain in efficiency (in terms of CV, by classes of p) of estimation with the SRSHOU design obtained by increasing the sampling ratio from 10% to 15%, to 20% and to 33%, measured as
Gain% = [CV(SRSHOU, s.r. 10%) − CV(SRSHOU, s.r. N%)] × 100 / CV(SRSHOU, s.r. 10%)
Classes of p: the same twelve classes, from < 0.05% to ≥ 30%. For each of the three increases, the gain lies between … and … percent.
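A rough theoretical benchmark for these gains comes from simple random sampling: for a fixed population, the CV of an estimated proportion is proportional to √((1 − f)/f), where f is the sampling ratio. The sketch below applies the slide's Gain% formula to that factor. It is an assumption-laden approximation (no calibration, no clustering of persons within households), not a reproduction of the measured gains; the function names are hypothetical.

```python
import math


def srs_cv_factor(f):
    """Part of the SRS CV of a proportion that depends on the sampling ratio f
    (population size and p held fixed): sqrt((1 - f) / f)."""
    return math.sqrt((1.0 - f) / f)


def expected_gain(f_base, f_new):
    """Gain% = [CV(f_base) - CV(f_new)] * 100 / CV(f_base), under the SRS factor."""
    return 100.0 * (1.0 - srs_cv_factor(f_new) / srs_cv_factor(f_base))


for f in (0.15, 0.20, 0.33):
    print(f"10% -> {f:.0%}: {expected_gain(0.10, f):.1f}%")
```

Under these assumptions, moving from 10% to 15%, 20% and 33% reduces the CV by about 20.7%, 33.3% and 52.5% respectively, independently of p and of the area size.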

Table: distribution of median CV for five classes of p and three classes of area (by population size), comparing 4 sampling ratios (10%, 15%, 20%, 33%) with the SRSHOU design. Classes of p: 0.1%├0.25%, 0.5%├1%, 2.5%├5%, 10%├15%, 20%├30%. Population by area (thousands): < 10, 10├12, ≥ 12.

Median CV for some classes of p and for three classes of area (by population size). Comparison of 4 sampling ratios (s.r.) with the SRSHOU design. Graph for areas with fewer than 10,000 inhabitants.

Median CV for some classes of p and for three classes of area (by population size). Comparison of 4 sampling ratios (s.r.) with the SRSHOU design. Graph for areas with between 10,000 and 12,000 inhabitants. The gain in efficiency (in terms of CV) for census areas with 10,000 to 12,000 inhabitants, relative to census areas with fewer than 10,000, is about … percent. Similar results are obtained for all tested sampling ratios.

Median CV for some classes of p and for three classes of area (by population size). Comparison of 4 sampling ratios (s.r.) with the SRSHOU design. Graph for areas with more than 12,000 inhabitants. The gain in efficiency (in terms of CV) for census areas with more than 12,000 inhabitants, relative to census areas with fewer than 10,000, is about … percent. As before, similar results are obtained for all tested sampling ratios.
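Why larger areas help can be seen from the usual SRS approximation for the CV of an estimated proportion, CV ≈ √((1 − f)(1 − p)/(f·N·p)), where N is the area population: at fixed f and p the CV falls as 1/√N. The sketch below is an approximation assuming simple random sampling of individuals (ignoring household clustering and calibration), offered only as a benchmark; the function names are hypothetical and the measured gains in the slides may differ.

```python
import math


def cv_srs_proportion(p, population, f):
    """Approximate CV of an estimated proportion p under SRS of individuals with
    sampling ratio f in an area of `population` people (the N/(N-1) factor is
    ignored)."""
    n = f * population
    return math.sqrt((1.0 - f) * (1.0 - p) / (n * p))


def relative_cv_gain(pop_small, pop_large):
    """Percentage CV reduction of the larger area at the same p and f,
    since CV is proportional to 1 / sqrt(N)."""
    return 100.0 * (1.0 - math.sqrt(pop_small / pop_large))
```

For instance, `relative_cv_gain(9_000, 15_000)` is about 22.5%: the expected advantage of an area of about 15,000 people over one just below 10,000, at the same sampling ratio. Since f cancels in this ratio, the approximation also predicts a similar gain for every sampling ratio.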

Table: distribution of the estimates for areas larger than 12,000 inhabitants by classes of CV; percentage frequencies for 4 sampling ratios (10%, 15%, 20%, 33%) with the SRSHOU design. Classes of CV: < 2%, 2%├5%, 5%├10%, 10%├20%, 20%├50%, 50%├100%, 100%├200%, ≥ 200%, grouped into HA (high accuracy), MA (medium accuracy) and LA (low accuracy).

Table: distribution of the estimates for areas larger than 12,000 inhabitants by classes of CV; percentage frequencies for 4 sampling ratios (10%, 15%, 20%, 33%) with the SRSHOU design - 2. Classes of CV: < 10% (HA – high accuracy), 10%├50% (MA – medium accuracy), ≥ 50% (LA – low accuracy).
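The three accuracy classes above can be applied to a list of CVs to obtain the percentage frequencies shown in the table. A minimal sketch, assuming CVs are expressed in percent and that "├" denotes an interval closed on the left and open on the right; the function names are hypothetical.

```python
from collections import Counter


def accuracy_class(cv_percent):
    """HA: CV < 10%; MA: 10% <= CV < 50%; LA: CV >= 50%."""
    if cv_percent < 10.0:
        return "HA"
    if cv_percent < 50.0:
        return "MA"
    return "LA"


def class_frequencies(cvs):
    """Percentage of estimates falling in each accuracy class."""
    counts = Counter(accuracy_class(c) for c in cvs)
    n = len(cvs)
    return {k: 100.0 * counts.get(k, 0) / n for k in ("HA", "MA", "LA")}
```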

Estimates of p for a territory obtained by aggregating areas. Generic sampled area a; territory R_S given by the aggregation of K sampled areas, with the percentage expected reduction of CV in R_S. Territory R given by the aggregation of sampled and non-sampled areas; share of the population of R eligible for drawing the long form (LF) sample; percentage expected reduction of CV in R.

Conclusions - 1
- As expected, the most accurate estimates were obtained with simple random sampling of households from administrative registers and with the largest sampling ratio.
- Estimates are more efficient for the largest areas (> 12,000 inhabitants); this result suggests planning the sampling design with larger census areas (of about 15,000 people).
- Estimates for large domains obtained by aggregating areas show high accuracy: accuracy increases with the number of areas aggregated, and when part of the large domain is fully enumerated, accuracy increases further.

Conclusions - 2
- However, area frame sampling is only slightly less efficient than SRSHOU, so it could be adopted where reliable administrative registers are not available.
- The sampling ratio will be chosen considering the trade-off between the financial savings needed and the accuracy required at different territorial levels.
- Further analyses will be conducted on small area estimation techniques to produce more accurate estimates for the smallest territorial levels and for rare populations.

Thank you for your attention and … have a good lunch!


Simulation study - 4
- Cross-classification cells: educational level, employment status, commuting and gender, giving 90 simple estimation cells.
- Calibration constraints defined by cross-classifying gender by age, and gender by marital status.
- Computational algorithm implemented in SAS code, for each municipality and each alternative sampling design:
  step 1) selection of a sample (of households or enumeration areas)
  step 2) computation of final weights
  step 3) estimation of the relative frequency p for each target cell
  step 4) iteration of steps 1), 2) and 3) for 1,000 sampling replications
  step 5) computation of the mean and standard error of the sampling distribution for each of the 90 frequency cells
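Steps 1) to 5) can be sketched in Python for the SRSHOU design. This is a simplified illustration, not the authors' SAS implementation: it uses a synthetic list of households instead of 2001 Census records, a single target cell, and plain design weights instead of calibrated final weights (so step 2 reduces to a constant weight, which cancels in the ratio). The function `simulate_cv` and its arguments are hypothetical.

```python
import random
import statistics


def simulate_cv(households, ratio, n_rep=1000, seed=2008):
    """households: list of lists of 0/1 flags, one flag per person
    (1 = person belongs to the target cell).
    Returns (mean of p-hat, standard error, CV) over n_rep SRSHOU replications.
    """
    rng = random.Random(seed)
    n = round(ratio * len(households))
    w = len(households) / n                       # step 2: design weight only
    estimates = []
    for _ in range(n_rep):                        # step 4: replications
        sample = rng.sample(households, n)        # step 1: SRS of households
        in_cell = w * sum(sum(h) for h in sample)
        persons = w * sum(len(h) for h in sample)
        estimates.append(in_cell / persons)       # step 3: relative frequency p
    mean = statistics.fmean(estimates)            # step 5: mean ...
    se = statistics.pstdev(estimates, mu=mean)    # ... and standard error
    return mean, se, se / mean
```

Running it once per target cell and per design, and keeping the resulting CVs, gives the empirical CV distribution analysed in the following slides.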

Evaluation criterion: the coefficient of variation
The coefficient of variation (CV) was used as the criterion for comparing the sampling strategies:
CV(p̂) = SE(p̂) / E(p̂)
which measures the accuracy of the sampling estimates; in the simulation, E(p̂) and SE(p̂) are the mean and standard error of the sampling distribution over the 1,000 replications. The maximum expected percentage error implied by the estimation method, with probability 0.95, then follows as:
Δ% ≈ 1.96 · CV
- The distribution of the empirical CVs over all 90 target cells was determined.
- After classifying the target cells by their value of p, the distribution of CVs for cells in the same p class was studied.
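The last two points (grouping target cells by p and summarizing CVs per group) and the error bound Δ% ≈ 1.96 · CV can be sketched as follows. The class limits are the twelve p classes used in the result tables, read as intervals closed on the left; the function names and the (p, cv) input format are hypothetical.

```python
import statistics

# Upper limits of the p classes: < 0.05%, 0.05%-0.1%, ..., 20%-30%; last class >= 30%.
P_CLASS_UPPER = [0.0005, 0.001, 0.0025, 0.005, 0.01, 0.025,
                 0.05, 0.10, 0.15, 0.20, 0.30]


def p_class(p):
    """Index of the p class (0 to 11) for a relative frequency p."""
    for i, upper in enumerate(P_CLASS_UPPER):
        if p < upper:
            return i
    return len(P_CLASS_UPPER)


def max_expected_error(cv):
    """Maximum expected relative error at 95% probability: 1.96 * CV."""
    return 1.96 * cv


def median_cv_by_class(cells):
    """cells: iterable of (p, cv) pairs, one per target cell."""
    groups = {}
    for p, cv in cells:
        groups.setdefault(p_class(p), []).append(cv)
    return {k: statistics.median(v) for k, v in sorted(groups.items())}
```

For example, a cell with CV = 5% carries a maximum expected error of about 9.8% of its estimated value.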

Estimates of p for a territory obtained by aggregating areas. Case 1: aggregation of sampled areas.
Estimate for the generic sampled area a; estimate for the territory R_S given by the aggregation of K sampled areas.
Graph: percentage expected reduction of CV as a function of the number of areas K. For K > 5, red% > 50%; for K > 30, red% > 80%; for K > 100, red% > 90%.
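The formula behind these thresholds is not preserved in the transcript. Under the assumptions that the K areas have similar population size, similar p and similar CV, and are sampled independently, the aggregate estimate is the average of the area estimates, its variance is V(p̂_a)/K, and so CV(R_S) ≈ CV(a)/√K, giving red% = 100 · (1 − 1/√K). The sketch below checks that this reconstruction is consistent with the stated thresholds; the function name is hypothetical.

```python
import math


def cv_reduction_sampled(k):
    """Expected % CV reduction when K similar, independently sampled areas
    are aggregated, assuming CV(R_S) = CV(a) / sqrt(K)."""
    return 100.0 * (1.0 - 1.0 / math.sqrt(k))


for k in (5, 30, 100):
    print(k, round(cv_reduction_sampled(k), 1))
```

This gives 55.3% for K = 5, 81.7% for K = 30 and 90.0% for K = 100, so the reduction exceeds 50%, 80% and 90% respectively beyond those values of K.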

Estimates of p for a territory obtained by aggregating areas. Case 2: aggregation of sampled and non-sampled areas.
Territory R_S of sampled areas: long form to a sample of households. Territory R_NS of non-sampled areas: long form to all households. γ: share of the population of R eligible for drawing the long form (LF) sample.
Graph: percentage expected reduction of CV as a function of the number of areas K, for γ = 1, γ = 0.7, γ = 0.6 and γ = 0.5.
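Case 2 can be sketched under the same assumptions as Case 1 plus one more, which is a reconstruction since the slide's formula is not preserved: households in non-sampled areas all receive the long form, so their contribution carries no sampling error. Writing the estimate for R as p̂_R = γ · p̂_RS + (1 − γ) · p_RNS, its variance is γ² · V(p̂_RS); if p is similar in the two parts, CV(R) ≈ γ · CV(a)/√K and the expected reduction is 100 · (1 − γ/√K). With γ = 1 this is the same as Case 1, and smaller values of γ give larger reductions. The function name is hypothetical.

```python
import math


def cv_reduction_mixed(k, gamma):
    """Expected % CV reduction for a territory R made of K sampled areas
    (population share gamma) plus fully enumerated areas (share 1 - gamma),
    assuming CV(R) = gamma * CV(a) / sqrt(K)."""
    return 100.0 * (1.0 - gamma / math.sqrt(k))


for gamma in (1.0, 0.7, 0.6, 0.5):
    print(gamma, [round(cv_reduction_mixed(k, gamma), 1) for k in (5, 30, 100)])
```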