CBS-SSB STATISTICS NETHERLANDS – STATISTICS NORWAY Work Session on Statistical Data Editing Oslo, Norway, 24-26 September 2012 Jeroen Pannekoek and Li-Chun.

CBS-SSB STATISTICS NETHERLANDS – STATISTICS NORWAY Work Session on Statistical Data Editing Oslo, Norway, September 2012 Jeroen Pannekoek and Li-Chun Zhang On the general flow of editing

CBS - SSB 1 Introduction
An overall data editing process involves all activities that transform raw micro-data, with errors and missing values, into edited statistical micro-data suitable for producing publication figures.
The GSBPM describes this at a high level: review, validate and edit, impute, output control.
To implement an editing and imputation (E&I) system we need more detailed descriptions, called statistical functions, each of which performs some action on the data.
This paper tries to identify common statistical functions that serve as building blocks in different overall E&I processes or strategies.
Decomposing the overall process can facilitate process design, re-use of methodological components, documentation and generic software tools.

CBS - SSB 2 Contents
- Some classifications of data editing functions that are relevant for the process design.
- A summary of statistical data editing functions in some detail.
- Some process flow examples from the Netherlands and Norway, using the statistical functions as building blocks.
- Concluding remarks.

CBS - SSB 3 Classification of functions by purpose
Verification
- Checking of hard and soft edit rules, calculation of scores, detection of systematic errors.
- Input: rules and data → Output: quality indicators and measures.
- Less formal: graphical macro-editing, output control.
Selection (for further processing)
- Selection of units for manual editing.
- Selection of variables to change (error localisation).
- Input: quality indicators and data → Output: selection of records or fields.
Amending
- Modifying selected data values to resolve problems detected by verification, including imputation of missing values.

CBS - SSB 4 Unit-mode versus batch-mode operation
Since manual editing is time-consuming, it should start during the sometimes lengthy data collection period. The same must then hold for any automatic editing function that is applied before manual editing.
Unit-mode functions
- Proceed on a record-by-record basis and can be applied during the data collection phase.
Batch-mode functions
- Use all of the data (or a large subset) and can only be applied near the end of the data collection phase.

CBS - SSB 5 Editing functions: verification (1/2)
Edit-rules (unit-mode)
- Systems of connected balance edits: profit = turnover - total costs; total costs = costs of employees + costs of purchases + …
- Non-negativity edits and inequalities.
- Ratio edits (soft).
Score functions
- Measure the potential effect that editing a unit may have on estimates of totals or other aggregate parameters of interest.
- Based on measures of the deviation between observed values and predicted or "anticipated" values: $s_i = f(x_j, x_j^a)$.
- Unit-mode: $x_j^a$ is based on historical data or another external source. Batch-mode: $x_j^a$ is based on current data.
- Also applied to measure and check the actual effect of (automatic) editing instead of the potential effect of editing. Then $x_j^a$ is the edited value.
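The verification functions on this slide can be sketched in a few lines of Python. This is only an illustration: the field names, the tolerance, and the particular form of f (absolute deviation relative to the anticipated value) are assumptions and are not taken from the presentation. The second balance edit is left out because its right-hand side is elided on the slide.

```python
from statistics import median

def check_hard_edits(record, tol=0.0):
    """Unit-mode verification: return the names of the failed hard edits."""
    failed = []
    # Balance edit from the slide: profit = turnover - total costs.
    if abs(record["profit"] - (record["turnover"] - record["total_costs"])) > tol:
        failed.append("profit = turnover - total_costs")
    # Non-negativity edits (profit itself may be negative).
    if any(record[v] < 0 for v in ("turnover", "total_costs")):
        failed.append("non-negativity")
    return failed

def score(observed, anticipated):
    """Hypothetical choice of f: deviation relative to the anticipated value."""
    return abs(observed - anticipated) / max(abs(anticipated), 1.0)

record = {"turnover": 1000.0, "total_costs": 700.0, "profit": 250.0}
print(check_hard_edits(record))  # the balance edit fails: 1000 - 700 != 250

# Unit-mode: the anticipated value comes from an external source,
# here an assumed previous-period turnover of 900.
print(score(record["turnover"], 900.0))

# Batch-mode: the anticipated value is estimated from the current data,
# here the median of all turnovers received so far.
turnovers = [800.0, 950.0, 1000.0, 1200.0, 40000.0]
anticipated = median(turnovers)
print([score(x, anticipated) for x in turnovers])
```

The unit-mode call needs only the record and a reference value, so it can run as each response arrives. The batch-mode call needs the collected data before the anticipated value can be computed.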

CBS - SSB 6 Editing functions: verification (2/2)
Extended score functions
- Score functions can be extended by adding indicators for further processing, based on simple criteria other than the regular score function. For instance:
  >0: regular score value
  -9: "crucial" (dominates the totals in its branch) → manual editing
  -8: influential and main variables are missing → re-contact
  -7: non-influential and main variables missing → unit nonresponse
Macro-verification
- Macro-verification functions are batch-mode by definition.
- They include all macro-editing activities: verifying aggregates, graphical inspection of distributions, graphical or model-based outlier detection, etc.
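A minimal sketch of the extended score codes listed above. The function name, the boolean inputs, and the order in which the criteria are tested (crucial units first) are assumptions made for illustration; the slide gives only the codes and their meaning.

```python
# Follow-up action attached to each special code on the slide.
ACTIONS = {-9: "manual editing", -8: "re-contact", -7: "unit nonresponse"}

def extended_score(regular_score, crucial, influential, main_vars_missing):
    """Return a special negative code when a simple criterion applies,
    otherwise the regular score value."""
    if crucial:                      # dominates the totals in its branch
        return -9
    if main_vars_missing:
        return -8 if influential else -7
    return regular_score

def follow_up(code):
    """Map a code to its action; regular scores go on to score-based selection."""
    return ACTIONS.get(code, "selection on regular score")

print(follow_up(extended_score(0.4, False, True, True)))    # re-contact
print(follow_up(extended_score(0.4, False, False, False)))  # selection on regular score
```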

CBS - SSB 7 Editing functions: selection
Selection of units for manual editing using regular scores
- By comparing scores to a predetermined threshold value (unit-mode).
- By ordering units on scores and selecting the highest ranking (batch-mode).
Selection of variables for amendment: error localization (unit-mode)
- To resolve edit failures, some values need to be changed. The error localization problem is deciding which variables to change.
- A generic automatic approach (Fellegi-Holt): select the smallest (weighted) number of variables to change.
Macro-selection (batch-mode) of units for manual editing
- Implausible aggregates eventually lead to suspect units (drilling down).
- Graphical verification leads to selection of the most extraordinary units.
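The Fellegi-Holt criterion can be illustrated with a brute-force search: try sets of variables in order of size, and keep the cheapest set for which some replacement values satisfy every edit. The sketch below assumes small finite domains and an invented toy record, edits and weights. It is not the algorithm used in the systems described in the presentation and would not scale to records with 68 numerical variables.

```python
from itertools import combinations, product

def localize_errors(record, edits, domains, weights):
    """Return a minimum-weight set of variables that can be changed so that
    all edits hold, or None if no such set exists (exhaustive search)."""
    names = list(record)
    best, best_weight = None, float("inf")
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            weight = sum(weights[v] for v in subset)
            if weight >= best_weight:
                continue  # cannot improve on the current best set
            # Try every combination of new values for the chosen variables.
            for values in product(*(domains[v] for v in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if all(edit(candidate) for edit in edits):
                    best, best_weight = set(subset), weight
                    break
    return best

# Hypothetical record that fails both edits below.
record = {"age": 5, "marital": "married", "employed": "yes"}
edits = [
    lambda r: r["age"] >= 16 or r["marital"] == "single",
    lambda r: r["age"] >= 16 or r["employed"] == "no",
]
domains = {"age": [5, 20, 40], "marital": ["single", "married"], "employed": ["yes", "no"]}

# Equal weights: changing age alone resolves both failures.
print(localize_errors(record, edits, domains, {"age": 1, "marital": 1, "employed": 1}))
# If age is considered very reliable (high weight), the other two are selected.
print(localize_errors(record, edits, domains, {"age": 3, "marital": 1, "employed": 1}))
```

The weights express how reliable each variable is assumed to be, which is why the second call prefers changing two low-weight variables over one high-weight variable.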

CBS - SSB 8 Editing functions: amendment
Amendment of systematic errors (unit-mode)
- Errors with a detectable cause and a reliable correction mechanism.
- Generic: thousand errors, recognizable typos, rounding errors.
- Subject-related: specific "if-then" type correction rules.
Deductive imputation of missing values (unit-mode)
- Some missing values can be determined uniquely from the hard edit-rules, which gives the only feasible imputation.
Model-based imputation (batch- or unit-mode)
- For most missing values we need model-based predicted values to impute.
- Batch-mode if current data are used to estimate parameters.
Adjustment for inconsistency (unit-mode)
- Adjustment of imputed values to ensure consistency with the edit-rules.
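Two of the unit-mode amendment functions above can be sketched as follows. The detection rule for thousand errors (ratio to a reference value between 300 and 3000) and the field names are assumptions chosen for illustration; the presentation does not specify them.

```python
def correct_thousand_error(value, reference, low=300.0, high=3000.0):
    """Divide by 1000 when the ratio to a reference value (for example the
    previous period) suggests reporting in units instead of thousands."""
    if reference and low <= value / reference <= high:
        return value / 1000.0
    return value

def deductive_impute(record):
    """Fill a single missing value (None) that is uniquely determined by the
    balance edit profit = turnover - total_costs."""
    missing = [k for k in ("profit", "turnover", "total_costs") if record[k] is None]
    if len(missing) != 1:
        return record  # nothing to do, or not uniquely determined
    filled = dict(record)
    if missing[0] == "profit":
        filled["profit"] = filled["turnover"] - filled["total_costs"]
    elif missing[0] == "turnover":
        filled["turnover"] = filled["profit"] + filled["total_costs"]
    else:
        filled["total_costs"] = filled["turnover"] - filled["profit"]
    return filled

print(correct_thousand_error(1_250_000.0, 1200.0))  # ratio is about 1042: corrected
print(correct_thousand_error(1300.0, 1200.0))       # ratio is about 1.08: unchanged
print(deductive_impute({"turnover": 1000.0, "total_costs": 700.0, "profit": None}))
```

When two or more values in the same balance edit are missing, the edit no longer determines them uniquely, so the record is returned unchanged and left for model-based imputation.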

CBS - SSB 9 Illustration of automatic editing

Action                              # failed hard edits   # missing values
None                                514                   0
Treatment of systematic errors
  Thousand errors                   514                   0
  Typing errors                     476                   0
  Rounding errors                   440                   0
Selection of fields to change
  F-H error localization            -                     397
Automatic imputation/adjustment
  Deductive imputation              -                     266
  Regression imputation             254                   0
  Adjustment of imp. values         0                     0

Data from child day care institutions: 500 records with 68 SBS-type variables and 40 hard edit-rules.

CBS - SSB 10 Process flow. Scenario A: Selective editing
[Flow diagram: Input micro data → Edited micro data, with Yes/No decision branches]
1. Primary automated processing: 1a. Systematic errors; 1b. Evaluation of scores
2. Micro-selection: 2a. Selection using scores; 2b. (FH-)selection of fields
3. Clerical interactive editing
4. Automatic amendment of uncritical units: 4a. Imputation of missings; 4b. Adjustments
5. Macro-selection: macro-verification and selection

CBS - SSB 11 Process flow. Scenario B: More automatic editing
[Flow diagram: Input micro data → Edited micro data, with Yes/No decision branches]
1. Primary automated processing (all unit-mode automatic editing): 1a. Systematic errors; 1b. (FH-)selection of fields; 1c. Imputation; 1d. Adjustments; 1e. Evaluation of scores
2. Micro-selection
3. Clerical interactive editing
4. Automatic amendment of uncritical units: 4a. Batch-mode imputation; 4b. Adjustments
5. Macro-selection: macro-verification and selection

CBS - SSB 12 Process flow. Scenario C: No timeliness problems
[Flow diagram: Input micro data → Edited micro data, with Yes/No decision branches]
1. Primary automated processing: systematic errors
2. Macro-selection: macro-verification and selection, including batch-mode scores
3. (Partial) clerical interactive editing
4. Automatic amendment: 4a. Imputation of missings; 4b. Adjustments

CBS - SSB 13 Concluding remarks
- The description of the overall process shown here can help communication between editing staff, project managers, process designers and methodologists. It clarifies the organization of the process and the choices that must be made.
- It also helps to define the functionalities and interfaces of generic software components by placing them in the context of the overall process scheme.
- Increasing automatic editing can greatly reduce the amount of manual editing. This may involve automatic editing of influential units and subject-specific "if-then" rules.