Data Validation in the ESS Context

Presentation transcript:

Data Validation in the ESS Context
IPA training course, Item 02
Luxembourg, 18 May 2017
Vincent.tronet@ec.europa.eu
Eurostat, Unit B1

Presentation Outline
ESS Vision 2020
Quality and compliance
GSBPM
Data validation: life cycle
Main problems in current situation and medium-term goals
Validation principles
Validation levels
Time for Quiz and …

The path to ESS Vision 2020
Reg. 223/2009: ESS
COM(2009)404: Vision
COM(2009)433: GDP and beyond
COM(2011)211: Robust quality
ESSC Nov. 2012: ESS.VIP Programme
Reg. 99/2013: ESP 2013-2017
"The ESS provides the EU, the world and the public with independent high quality information on the economy and society on European, national and regional levels and makes the information available to everyone for decision-making purposes, research and debate."

In May 2014, the ESSC agreed on the ESS Vision 2020 as the guiding frame for ESS development during the years up to 2020.
5 key areas of work:
Focus on users
Strive for quality
Harness new data sources
Promote efficiency in production processes
Improve dissemination and communication

ESS Vision 2020 portfolio

ESS Vision 2020: Building Blocks
Standards, Network, Services, Data warehouses, Enterprise Architecture, Quality, Validation, Users, Communication, Administrative data, Big Data, Governance, Methods

Quality and compliance
Countries have to provide statistics to Eurostat on time (punctuality) and of acceptable quality.
Link with the quality principles of the European Statistics Code of Practice: Principles 11 to 15 relate to statistical output.
Principle 11: Relevance
Principle 12: Accuracy and reliability
Principle 13: Timeliness and punctuality
Principle 14: Coherence and comparability
Principle 15: Accessibility and clarity

GSBPM: Generic Statistical Business Process Model

Data validation: life cycle

Main problems in the current situation
Time-consuming "validation ping-pong"
Possibility of validation gaps
No clear picture of who does what
Risk of inconsistent assessment over time, between countries and across statistical domains
Possible misunderstanding of what is acceptable or not
Possible subjective assessment of data quality
Duplication of IT development costs within the ESS
Manual work due to low integration between the different tools
Lack of standards and of a common architecture

Medium-term goals: ESS.VIP Validation
Goal 1: Ensure the transparency of the validation procedures applied to the data sent to Eurostat by the ESS Member States.
Business outcomes: increase in the quality and credibility of European statistics; reduction of costs related to the time-consuming validation cycle in the ESS ("validation ping-pong").
Goal 2: Enable sharing and re-use of validation services across the ESS on a voluntary basis.
Business outcome: reduction of costs related to IT development and maintenance.

Validation Principles
The sooner, the better: Validation processes must be designed so that errors can be corrected as early as possible, so that data editing is performed at the stage where the knowledge is available to do it properly and efficiently.
Trust, but verify: When exchanging data between organisations, data producers should be trusted to have checked the data beforehand, and data consumers should verify the data against the commonly agreed rules.
Well-documented and appropriately communicated validation rules: Validation rules must be clearly and unambiguously defined and documented in order to achieve a common understanding and implementation among the different actors involved.
Well-documented and appropriately communicated validation errors: The error messages related to the validation rules need to be clearly and unambiguously defined and documented, so that they can be communicated appropriately and ensure a common understanding of the result of the validation process.
Comply or explain: Validation rules must be satisfied or reasonably well explained.
Good enough is the new perfect: Validation rules should be fit for purpose: they should balance data consistency and accuracy requirements with timeliness and feasibility constraints.
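As an illustration of "comply or explain", the sketch below accepts a rule outcome when the rule is satisfied or when the sender has attached an explanation. The names RuleResult and is_acceptable are hypothetical and not part of any ESS tool; whether an explanation is "reasonably" good remains a human judgement that this sketch does not attempt to automate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleResult:
    """Hypothetical outcome of one validation rule for one data delivery."""
    rule_id: str
    passed: bool
    explanation: Optional[str] = None  # free text supplied by the data sender


def is_acceptable(result: RuleResult) -> bool:
    """Comply or explain: accept if the rule passed or a non-empty explanation exists."""
    has_explanation = bool(result.explanation and result.explanation.strip())
    return result.passed or has_explanation
```

A failed rule with an empty or whitespace-only explanation is therefore not accepted, which mirrors the principle that a break in a rule needs a documented reason.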

Validation levels 0 to 5 (Graph A)
Data
  Within a statistical authority
    Within a domain
      From the same source
        Same dataset
          Same file
            Level 0: Format and file structure checks
            Level 1: Cells, records, file checks
          Between files
            Level 2: Revisions and time series
        Between datasets
          Level 2: Checks between correlated datasets
      From different sources
        Level 3: Mirror checks
    Between domains
      Level 4: Consistency checks
  Between statistical authorities
    Level 5: Consistency checks
Validation complexity increases from level 0 to level 5.

Data Validation levels 0 to 5 (Graph B)
Level 0: Format & file structure (same file)
Level 1: Cells, records, file (same file)
Level 2: Revisions and time series (between files of the same dataset)
Level 2: Between correlated datasets (between datasets from the same source)
Level 3: Mirror checks (from different sources, within a domain)
Level 4: Consistency checks (between domains, within an organisation)
Level 5: Consistency checks (between different organisations)
Validation complexity increases from level 0 to level 5.
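The level of a rule follows from the scope of the data it compares, so the classification can be written as a small decision function. This is only a sketch of the hierarchy shown in the graphs; the CheckScope fields and the validation_level function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckScope:
    """Hypothetical description of the data a validation rule compares."""
    structural_only: bool = False   # rule only inspects format / file structure
    same_organisation: bool = True
    same_domain: bool = True
    same_source: bool = True
    same_dataset: bool = True
    same_file: bool = True


def validation_level(scope: CheckScope) -> int:
    """Return the validation level (0 to 5), testing the widest scope first."""
    if not scope.same_organisation:
        return 5  # consistency checks between different organisations
    if not scope.same_domain:
        return 4  # consistency checks between domains
    if not scope.same_source:
        return 3  # mirror checks
    if not scope.same_dataset or not scope.same_file:
        return 2  # correlated datasets, or revisions and time series across files
    return 0 if scope.structural_only else 1
```

For example, validation_level(CheckScope(same_source=False)) gives 3, the mirror-check level.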

Validation levels 0 to 5 (Graph C)

Priority to validation levels 0 and 1
Most numerous
Easiest to define
Easiest and fastest to check
Easiest to implement, configure and share in a validation service (can be used for "pre-validation")
Account for most of the rules leading to severity level "error" (with rejection of data)
When not checked, lead to tedious and burdensome manual work
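A minimal sketch of what such a "pre-validation" of levels 0 and 1 could look like for a delimited text file. The column layout, the code list and the function name prevalidate are assumptions made for this example and do not describe an actual Eurostat service.

```python
import csv
import io

# Hypothetical file structure and code list, for illustration only.
EXPECTED_COLUMNS = ["country", "aircraft_type", "year", "value"]
AIRCRAFT_CODES = {"A320", "B737"}


def prevalidate(text: str, sender_country: str) -> list[tuple[int, str]]:
    """Return (level, message) findings; an empty list means the file passed."""
    rows = list(csv.reader(io.StringIO(text)))
    # Level 0: the file must start with the expected header.
    if not rows or rows[0] != EXPECTED_COLUMNS:
        return [(0, "unexpected header")]
    findings = []
    for line_no, row in enumerate(rows[1:], start=2):
        # Level 0: every record must have the expected number of fields.
        if len(row) != len(EXPECTED_COLUMNS):
            findings.append((0, f"line {line_no}: wrong number of fields"))
            continue
        record = dict(zip(EXPECTED_COLUMNS, row))
        # Level 1: file content must be consistent with the sender ("envelope").
        if record["country"] != sender_country:
            findings.append((1, f"line {line_no}: country differs from sender"))
        # Level 1: coded fields must contain a valid code.
        if record["aircraft_type"] not in AIRCRAFT_CODES:
            findings.append((1, f"line {line_no}: invalid aircraft type"))
        # Level 1: the observation value must be numeric.
        try:
            float(record["value"])
        except ValueError:
            findings.append((1, f"line {line_no}: value is not numeric"))
    return findings
```

Because these checks need nothing but the file itself and the identity of the sender, they can run before the data is transmitted, which is what makes them easy to share.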

Questions before the surprise?

Time for Quiz 1: your favourite representation of validation levels
Graph A
Graph B
Graph C

Time for Quiz 2: validation rules, guess the level(s)
No worry: you'll remain anonymous…
Valid code in field "aircraft type"
Annual data consistent with quarterly data
Country code in first field consistent with data sender
Eurostat data for country "X" is consistent with OECD data for country "X"
Eurostat data for country "X" is consistent with country data on national website
Outliers detection
Changes in agricultural production of potatoes for country "X" consistent with neighbouring countries

Time for a coffee and chocolate break
Before the results of the Quiz…

Results of Quiz 2: validation rules, guess the level(s)
Valid code in field "aircraft type": Level 1, even if normally linked to the DSD
Annual data consistent with quarterly data: Level 2, correlated datasets from the same country
Country code in first field consistent with data sender: Level 1, consistency between file content and "envelope"
Eurostat data for country "X" is consistent with OECD data for country "X": Level 5, consistency between data from 2 different organisations (different methods and databases)
Eurostat data for country "X" is consistent with country data on national website: Level 2, normally the same methodology for ESS countries, but data may have been revised in only one place
Outliers detection: Level 1 if the time series is in the same file, else Level 2
Changes in agricultural production of potatoes for country "X" consistent with neighbouring countries: Level 3, same domain but different sources
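The level 2 rule "annual data consistent with quarterly data" can be sketched as follows. The sketch assumes a flow variable whose annual figure should equal the sum of the four quarters, and the relative tolerance of 0.1% is an arbitrary illustrative value, not an agreed ESS threshold; the function name is hypothetical.

```python
import math


def annual_matches_quarters(annual: float, quarters: list[float],
                            rel_tol: float = 0.001) -> bool:
    """Check that an annual value equals the sum of its four quarterly values.

    Assumes a flow variable (annual = Q1 + Q2 + Q3 + Q4); rel_tol absorbs rounding.
    """
    if len(quarters) != 4:
        raise ValueError("exactly four quarterly values are required")
    return math.isclose(annual, sum(quarters), rel_tol=rel_tol)
```

The two inputs would normally come from two correlated datasets delivered by the same country, which is why the rule sits at level 2 rather than level 1.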

Thank you for your attention!