THE MAIN INNOVATIONS OF DATA EDITING AND IMPUTATION FOR THE 2010 ITALIAN AGRICULTURAL CENSUS G. Bianchi, R. M. Lipsi, P. Francescangeli, G. Ruocco, A.


THE MAIN INNOVATIONS OF DATA EDITING AND IMPUTATION FOR THE 2010 ITALIAN AGRICULTURAL CENSUS
G. Bianchi, R. M. Lipsi, P. Francescangeli, G. Ruocco, A. M. Salvatore, F. Scalfati
Work Session on Statistical Data Editing, Ljubljana, Slovenia, 9-11 May 2011
UNECE - CONFERENCE OF EUROPEAN STATISTICIANS

Outline
- Introduction
- E&I strategy
  - guidelines
  - strategy and census stages
- E&I
  - during data collection
  - stages after data capturing
- E&IS tools and innovations
- Conclusions
- References

Introduction
- For the 6th Italian Agricultural Census, a new Editing and Imputation System (E&IS) has been implemented to reduce the total census error
- The main purpose of the E&IS is to identify and treat non-sampling errors, in order to provide a complete and consistent set of data

E&I strategy guidelines (1)
- Quality-oriented approach: the E&I process runs from data collection through to the final figures
- Data editing and detection of outliers and influential errors (selective editing) during data collection
- After data capturing, two main correction stages are scheduled, centrally managed by Istat

E&I strategy guidelines (2)
- Use of techniques that minimize the number of changes, especially for the treatment of non-influential random errors
- Quality indicators to monitor the main steps of E&I
- Ad hoc documentation to evaluate the outcome of the procedures, paying particular attention to changes due to the E&I process

E&I strategy and census stages (1)
- According to the E&I strategy, the variables are divided into related subsets so that the most appropriate treatment can be identified for each subset
- The E&I process will feature three main stages:
  1. E&I during data collection
  2. Provisional figures dissemination (primary variables)
  3. Final results dissemination

E&I strategy and census stages (2)

E&I during data collection (1)
Census Data Collection System:
- Questionnaire editing (holdings/enumerators): to prevent and correct fatal errors and missing values during data capturing
- A subset of 220 checking rules (fatal and query) has been implemented in the web-based data entry system
- Automatic check (data collection staff): before the final release of data to the census DB, to localize potential errors that slipped through during data gathering
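The slide does not list the 220 checking rules, so the following is only a minimal sketch of how fatal and query rules could behave in a data entry system. It assumes that a fatal rule blocks submission until the value is corrected, while a query rule only asks the respondent to confirm the value. The rule names, field names and the density limit are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    name: str
    severity: str                      # "fatal" or "query" (assumed semantics)
    passes: Callable[[dict], bool]     # True when the record satisfies the rule

# Hypothetical rules, not taken from the census questionnaire.
RULES: List[Rule] = [
    Rule("total_area_present", "fatal",
         lambda r: r.get("total_area") is not None),
    Rule("uaa_not_above_total", "fatal",
         lambda r: r.get("total_area") is None
         or r.get("uaa", 0) <= r["total_area"]),
    Rule("plausible_cattle_density", "query",
         lambda r: not r.get("uaa") or r.get("cattle", 0) / r["uaa"] <= 50),
]

def check(record: dict) -> Dict[str, List[str]]:
    """Return the names of failed rules, grouped by severity."""
    failures: Dict[str, List[str]] = {"fatal": [], "query": []}
    for rule in RULES:
        if not rule.passes(record):
            failures[rule.severity].append(rule.name)
    return failures

def can_submit(record: dict, confirmed_queries=frozenset()) -> bool:
    """Fatal failures always block; query failures block until confirmed."""
    failures = check(record)
    return not failures["fatal"] and set(failures["query"]) <= set(confirmed_queries)
```

A record with 12 ha of utilised area on a 10 ha holding would be blocked by the fatal rule, whereas a record with an unusually high cattle density could still be submitted once the enumerator confirms it.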

E&I during data collection (2)
Before the end of field enumeration operations, while the data collection network is still in place, Istat implemented and launched two distinct procedures in the E&I system to detect influential errors and outlier values:
- Outlier detection
  - Forward Search Technique
  - manual review of anomalous values by data collection staff
- Micro-editing check: highlights inconsistent data by analyzing, at unit level, the coherence between answers referring to related topics

E&I during data collection (3)
Forward Search Technique: outlier detection by strata, defined according to crop type and farm size
Data sources: Census, Administrative Register
- Regression line Y = aX + b
- Estimation of the parameters a and b, with and without outliers
- Statistical significance and goodness of fit of the regression model (R²)
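To illustrate the idea of fitting Y = aX + b with and without outliers, here is a heavily simplified forward-search-style sketch for a single stratum. It is not the procedure used by Istat. It makes several assumptions of its own: a Theil-Sen line as the robust starting point, an initial subset of just over half the units, and a crude fixed threshold in place of the formal tests used in the literature. The data are invented.

```python
from statistics import median

def ols(points):
    """Least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    a = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return a, mean_y - a * mean_x

def r_squared(points, a, b):
    """Goodness of fit of the line y = a*x + b on the given points."""
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot

def forward_search(points, threshold=5.0):
    """Return (indices flagged as outliers, (a, b) fitted without them)."""
    n = len(points)
    # Robust start (assumption): median of pairwise slopes and intercepts.
    slopes = [(y2 - y1) / (x2 - x1)
              for i, (x1, y1) in enumerate(points)
              for x2, y2 in points[i + 1:] if x2 != x1]
    a = median(slopes)
    b = median(y - a * x for x, y in points)
    m = n // 2 + 1
    resid = [abs(y - (a * x + b)) for x, y in points]
    subset = sorted(range(n), key=resid.__getitem__)[:m]
    while m < n:
        a, b = ols([points[i] for i in subset])
        resid = [abs(y - (a * x + b)) for x, y in points]
        # Residual scale of the current subset; it tends to understate the
        # true noise level, hence the generous default threshold.
        scale = (sum(resid[i] ** 2 for i in subset) / (m - 2)) ** 0.5
        outside = sorted(set(range(n)) - set(subset))
        if min(resid[i] for i in outside) > threshold * scale:
            return outside, (a, b)     # everything still outside is flagged
        m += 1                         # grow the subset by one unit
        subset = sorted(range(n), key=resid.__getitem__)[:m]
    return [], ols(points)

# Invented stratum: ten consistent units plus two gross errors.
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
points = [(float(x), 2 * x + 1 + e) for x, e in zip(range(1, 11), noise)]
points += [(11.0, 60.0), (12.0, 5.0)]

flagged, (a_clean, b_clean) = forward_search(points)
a_full, b_full = ols(points)
clean_points = [p for i, p in enumerate(points) if i not in flagged]
r2_clean = r_squared(clean_points, a_clean, b_clean)
r2_full = r_squared(points, a_full, b_full)
```

Comparing `(a_full, b_full, r2_full)` with `(a_clean, b_clean, r2_clean)` mirrors the comparison on the slide. The flagged units would then go to manual review rather than being corrected automatically.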

E&I stages after data capturing (1)
- To achieve maximum coherence between provisional and final data at regional level, the strategy is to correct all the primary variables first and then the secondary ones
- After data collection, two main correction stages are scheduled. In the first stage, all the variables needed for the dissemination of provisional figures (primary variables) are corrected
- Each E&I stage repeats the same two steps: automatic error detection, then treatment of errors

E&I stages after data capturing (2)
First step of each E&I stage: automatic error detection
- Macro-level editing
  - Uses all (or a large part) of the data to identify errors
  - Makes it possible to evaluate the accuracy of preliminary estimates such as totals (or main figures for subgroups)
  - Outlier detection
- Micro-level editing: erroneous values in individual records are automatically identified by means of edit rules

E&I stages after data capturing (3)
Second step of each E&I stage: treatment of errors
- Selective editing: treatment of outliers and influential errors, which have a substantial impact on the disseminated data, is based on manual review
- Random errors: treatment of non-influential random errors is based on minimum change approaches
- Imputation: model-based techniques or nearest neighbour donor imputation will be used for non-influential random errors
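As an illustration of nearest neighbour donor imputation, the sketch below copies a missing value from the complete record that is closest on a set of matching variables. The variable names and the plain squared Euclidean distance are assumptions. In practice the matching variables would normally be rescaled first, so that no single variable dominates the distance.

```python
def impute_from_nearest_donor(recipient, donors, match_vars, target):
    """Copy `target` from the donor closest to `recipient` on `match_vars`."""
    def distance(donor):
        return sum((recipient[v] - donor[v]) ** 2 for v in match_vars)

    # Only records with an observed target value can act as donors.
    usable = [d for d in donors if d.get(target) is not None]
    donor = min(usable, key=distance)
    return {**recipient, target: donor[target]}

# Hypothetical holdings; "cattle" is missing for the recipient.
donors = [
    {"uaa_ha": 5.0, "labour_days": 120, "cattle": 4},
    {"uaa_ha": 48.0, "labour_days": 610, "cattle": 85},
    {"uaa_ha": 52.0, "labour_days": 580, "cattle": None},
]
recipient = {"uaa_ha": 50.0, "labour_days": 600, "cattle": None}

imputed = impute_from_nearest_donor(
    recipient, donors, match_vars=["uaa_ha", "labour_days"], target="cattle")
```

Because the value is taken from a real holding, the imputed record keeps a combination of values that was actually observed. This is one reason donor methods are often preferred when relationships between variables are hard to model.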

E&IS tools and innovations (1)
- Inclusion of a subset of edit rules in the data capture stage
- Use of Forward Search methods for outlier detection
- Use of administrative sources for micro and macro data checks
- Use of score functions to prioritize records for manual review
- Use of a minimum change based model or nearest neighbour approach for localizing residual random errors
- Mix of different imputation methods, such as the nearest neighbour approach and model-based imputation
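The slides do not give the score function that was used. A common generic form, shown here only as a sketch, measures how much a suspicious value could move a published total: the weighted absolute difference between the reported value and an anticipated value, divided by the estimated total. The anticipated values are assumed to come from an auxiliary source such as an administrative register. The field names and the threshold are hypothetical.

```python
def local_score(reported, anticipated, weight, estimated_total):
    """Potential relative impact of one record on the estimated total."""
    return weight * abs(reported - anticipated) / estimated_total

def prioritise(records, estimated_total, threshold):
    """Return ids of records to review manually, highest score first."""
    scored = [
        (local_score(r["reported"], r["anticipated"],
                     r.get("weight", 1.0), estimated_total), r["id"])
        for r in records
    ]
    return [rid for score, rid in sorted(scored, reverse=True)
            if score > threshold]

# Hypothetical records: reported census value vs. anticipated value.
records = [
    {"id": "A", "reported": 100.0, "anticipated": 95.0},
    {"id": "B", "reported": 5000.0, "anticipated": 50.0},
    {"id": "C", "reported": 20.0, "anticipated": 22.0},
]
to_review = prioritise(records, estimated_total=10000.0, threshold=0.01)
```

Only record B exceeds the threshold here, so it alone would be sent to manual review. A and C would be left to automatic treatment.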

E&IS tools and innovations (2)
- The core of the E&IS is the software DIESIS (Data Imputation Editing System – Italian Software), used for dealing with non-influential errors in quantitative variables
- DIESIS was developed in 2001 by ISTAT and academic researchers of the University of Rome "Sapienza"
- In DIESIS, optimization techniques were implemented for the simultaneous treatment of qualitative and quantitative variables
- Joint use of data-driven and minimum change approaches
- The error localization performance of DIESIS has been compared with that of the Canadian software BANFF
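The minimum change idea can be shown with a brute-force sketch: find the smallest set of fields that, if changed, lets a record satisfy every edit rule. This is not the optimization method implemented in DIESIS, which the slides do not describe in detail. The exhaustive search, the edit rules and the value domains below are all illustrative assumptions, and the approach is workable only for very small domains.

```python
from itertools import combinations, product

def localize_min_change(record, edits, domains):
    """Return (fields to change, repaired record) with the fewest changes.

    Ties between equally small sets are broken by field order here; a real
    system would rather use reliability weights or donor data.
    """
    fields = list(record)
    for k in range(len(fields) + 1):           # try 0 changes, then 1, ...
        for subset in combinations(fields, k):
            for values in product(*(domains[f] for f in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if all(edit(candidate) for edit in edits):
                    return set(subset), candidate
    return None                                # no consistent repair exists

# Hypothetical edits mixing a quantitative and a qualitative variable.
edits = [
    lambda r: r["dairy_cows"] <= r["cows"],
    lambda r: (r["has_cattle"] == "yes") == (r["cows"] > 0),
]
domains = {
    "cows": range(0, 101),
    "dairy_cows": range(0, 101),
    "has_cattle": ["yes", "no"],
}
record = {"cows": 40, "dairy_cows": 55, "has_cattle": "yes"}

changed, repaired = localize_min_change(record, edits, domains)
```

Here a single change is enough. The search reports `cows` as the field to change, although changing `dairy_cows` alone would also work. This kind of tie is where a data-driven criterion can help choose the more plausible repair.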

E&IS tools and innovations (3)
- The scheduling and monitoring of all procedures, and the interactive corrections, will be managed by CONCERT, a Java web application
- To test the E&IS while implementing the scheduled procedures, an Oracle database was set up
- The whole E&I process will be documented by a set of quality indicators, both on the data collected and on the results of the different editing steps

E&IS tools and innovations (4)

E&IS tools and innovations (5)
Some simulation studies have been carried out for:
- identifying, for each section of the questionnaire, the most appropriate correction approach
- evaluating the imputation of missing nonlinearly dependent data through conditional copula functions (developed by ISTAT and the University of Bologna)
- assessing the use of Forward Search techniques (robust statistical methods) for outlier detection (developed by ISTAT and the University of Parma)

Conclusions
- The innovative E&I strategy will make it easier to meet timeliness constraints and will increase data consistency and accuracy
- The results of the procedures implemented in the E&IS are very encouraging and point to a marked improvement in census data quality

Thank you!!!

References
- Bianchi G., Di Lascio F. M. L., Giannerini S., Manzari A., Reale A., Ruocco G. (2009-a) Exploring copulas for the imputation of missing nonlinearly dependent data. Seventh Scientific Meeting of the CLAssification and Data Analysis Group of the Italian Statistical Society, Università di Catania (Italy), September 9-11.
- Bianchi G., Francescangeli P., Manzari A., Reale A., Ruocco G., Salvi S. (2009-b) An overview of Editing and Imputation System of 2010 Italian Agriculture Census. Roundtable Meeting on Programme for the 2010 Round Census of Agriculture, Budapest, November.
- Bianchi G., Manzari A., Reale A., Salvi S. (2009-c) Valutazione dell'idoneità del software DIESIS all'individuazione dei valori errati in variabili quantitative. Istat - Collana Contributi Istat, n. 1.
- Cotton C. (1991) Functional description of the generalized edit and imputation system. Business Survey Methods Division, July 25, Statistics Canada.
- Kovar J.G., MacMillian J.H., and Whitridge P. (1988) Overview and strategy for the generalized edit and imputation system. Report, Methodology Branch, April 1988 (updated February 1991), Statistics Canada.
- Luzi et al. (2007) EDIMBUS. Recommended Practices for Editing and Imputation in Cross-Sectional Business Surveys, August.
- Riani M., Atkinson A. C. (2000) Robust Diagnostic Data Analysis: Transformations in Regression. Technometrics, vol. 42. With discussion.