Data Validation in the ESS Context

Presentation transcript:

Data Validation in the ESS Context ESTP course Item 02 Luxembourg, 21-22 Nov 2017 Vincent.tronet@ec.europa.eu Eurostat, Unit B1

Presentation Outline
- ESS Vision 2020
- Quality and compliance
- GSBPM
- Data validation – Life cycle
- Main problems in current situation and medium term goals
- Validation levels
- 20 main types of validation rules in the ESS

ESS Validation YouTube video at https://www.youtube.com/watch

The path to ESS Vision 2020
- COM (2009)404 Vision
- COM (2009)433 GDP and beyond
- (2011)211 Robust quality
- ESSC Nov.2012 ESS.VIP Programme
- Reg. 99/2013 ESP 2013-2017
"The ESS provides the EU, the world and the public with independent high quality information on the economy and society on European, national and regional levels and makes the information available to everyone for decision-making purposes, research and debate."
Reg. 223/2009 ESS

In May 2014, the ESSC agreed on the ESS Vision 2020 as the guiding framework for ESS development during the years up to 2020.
5 key areas of work:
- Focus on users
- Strive for quality
- Harness new data sources
- Promote efficiency in production processes
- Improve dissemination and communication

ESS Vision 2020 portfolio

ESS Vision 2020 – Building Blocks Standards Network Services Data warehouses Enterprise Architecture Quality Validation Users Communication Administrative data Big Data Governance Methods

Quality and compliance
Countries have to provide statistics to Eurostat on time (punctuality) and of acceptable quality.
Link with the quality principles of the European Statistics Code of Practice. Principles 11 to 15 relate to statistical output:
- 11: Relevance
- 12: Accuracy and reliability
- 13: Timeliness and punctuality
- 14: Coherence and comparability
- 15: Accessibility and clarity

GSBPM Generic Statistical Business Process Model

Data validation – life cycle

Main problems in current situation
- Time-consuming "validation ping-pong"
- Possibility of validation gaps
- No clear picture of who does what
- Risk of inconsistent assessment over time, between countries and across statistical domains
- Possible misunderstanding on what is acceptable or not
- Possible subjective assessment of data quality
- Duplication of IT development costs within the ESS
- Manual work due to low integration between the different tools
- Lack of standards and of common architecture

Medium-term goals - ESS VIP validation
Goal 1: Ensure the transparency of the validation procedures applied to the data sent to Eurostat by the ESS Member States.
Business outcomes:
- Increase in the quality and credibility of European statistics
- Reduction of costs related to the time-consuming validation cycle in the ESS ("validation ping-pong")
Goal 2: Enable sharing and re-use of validation services across the ESS on a voluntary basis.
Business outcome:
- Reduction of costs related to IT development and maintenance

Data Validation levels 0 to 5 (Graph B), by increasing validation complexity
Data
  Within an organisation
    Within a domain
      From the same source
        Same dataset
          Same file
            Level 0: Format & file structure
            Level 1: Cells, records, file
          Between files
            Level 2: Revisions and Time series
        Between datasets
          Level 2: Between correlated datasets
      From different sources
        Level 3: Mirror checks
    Between domains
      Level 4: Consistency checks
  Between different organisations
    Level 5: Consistency checks

Validation levels 0 to 5 (Graph A), by increasing validation complexity
Data
  Within a statistical authority
    Within a domain
      From the same source
        Same dataset
          Same file
            Level 0: Format and File structure checks
            Level 1: Cells, Records, File checks
          Between files
            Level 2: Revisions and Time series
        Between datasets
          Level 2: Checks between correlated datasets
      From different sources
        Level 3: Mirror checks
    Between domains
      Level 4: Consistency checks
  Between statistical authorities
    Level 5: Consistency checks
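The decision tree in Graphs A and B can be read as a small function. The sketch below is only an illustration of that reading: the `Scope` class, its field names and `validation_level` are hypothetical and do not come from any ESS tool. Each field states whether the data compared by a check stay within the same authority, domain, source, dataset and file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """What a check compares (hypothetical field names, for illustration)."""
    same_authority: bool = True
    same_domain: bool = True
    same_source: bool = True
    same_dataset: bool = True
    same_file: bool = True
    structural_only: bool = False  # True for format / file structure checks


def validation_level(scope: Scope) -> int:
    """Walk the tree from the widest scope down to a single file."""
    if not scope.same_authority:
        return 5  # consistency checks between statistical authorities
    if not scope.same_domain:
        return 4  # consistency checks between domains
    if not scope.same_source:
        return 3  # mirror checks
    if not scope.same_dataset:
        return 2  # checks between correlated datasets
    if not scope.same_file:
        return 2  # revisions and time series
    return 0 if scope.structural_only else 1
```

For example, a check comparing export figures of country A with import figures of country B uses data from different sources within one domain, so `validation_level(Scope(same_source=False))` gives level 3.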

Priority to Validation at file level - Why?
File-level rules are:
- Easiest to define, check, implement, configure and share in a validation service (for "pre-validation")
- Fastest to check
- The most numerous validation rules
- The most numerous among the rules that lead to severity level "error" (with rejection of data)
- When not checked, a cause of tedious and burdensome manual work and "ping-pong"
At least these rules should be checked prior to transmission to Eurostat.
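A file-level "pre-validation" of this kind could look roughly like the sketch below. It is an illustration under stated assumptions, not an ESS validation service: the function name `prevalidate`, the column name `OBS_VALUE` and the sample code list are hypothetical. The rule acronyms in the findings (RNR, COV, RWD, VIR, FDT) are those used in the overview of the 20 main rule types.

```python
import csv
import io


def prevalidate(text, key_fields, code_lists, value_field="OBS_VALUE"):
    """Run a few file-level checks on a ';'-delimited file with a header row.

    Returns a list of (rule, line_number, message); an empty list means
    that none of the checks implemented here failed.
    """
    findings = []
    rows = list(csv.DictReader(io.StringIO(text), delimiter=";"))
    if len(rows) < 1:
        findings.append(("RNR", 0, "file contains no records"))
    seen = set()
    for n, row in enumerate(rows, start=2):  # line 1 is the header
        # Codes are valid: each coded field must belong to its code list
        for field, allowed in code_lists.items():
            if row.get(field) not in allowed:
                findings.append(("COV", n, f"invalid code {row.get(field)!r} in {field}"))
        # Records are without duplicates: the key must be unique
        key = tuple(row.get(f) for f in key_fields)
        if key in seen:
            findings.append(("RWD", n, f"duplicate key {key}"))
        seen.add(key)
        # Field type and values in range: numeric, zero or positive
        try:
            if float(row[value_field]) < 0:
                findings.append(("VIR", n, "negative value"))
        except (KeyError, TypeError, ValueError):
            findings.append(("FDT", n, f"{value_field} is not numeric"))
    return findings
```

A sender could run such checks before transmission and send the file only when the list of findings is empty, which is the situation the "ping-pong" argument above is about.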

Time for a Quiz … Validation rules… Guess the level(s)
No worry: you'll remain anonymous…
For each validation rule, choose the level(s) from 1 to 5:
- Valid code in field "aircraft type"
- Annual data consistent with quarterly data
- Country code in first field consistent with data sender
- Eurostat data for country X is consistent with OECD data for country X
- The revisions of data between 2 versions are limited
- Outliers detection
- Export figures reported by country A towards country B correspond to Import figures reported by country B from country A
- Changes in agricultural production of olives for country "X" are consistent with neighbouring countries

20 Main types of validation rules
Cover at least 80-90% of the validation rules used for ESS data.
Used:
- To structure the standard documentation that describes validation rules in domains (and as a check-list)
- To identify the main VTL operators needed for validation of ESS data
- In the Validation Rule Manager, to allow an easy definition of rules based on a simple set of parameters per rule and an easy, automatic generation of VTL

Classified in 4 groups
Basic file structure checks (level 0):
  => Preliminary checks needed
  => SDMX full coverage (structural validation)
  => 4 types of rules that lead to errors in case of failure
Basic intra-file checks (levels 0 and 1):
  => SDMX large coverage
  => 5 types of rules that lead in most cases to errors in case of failure
Checks intra or inter files (levels 1 to 5):
  => SDMX very partial coverage
  => 8 types of rules that usually lead to errors in case of failure
Checks inter-files in same statistical domain (levels 2 and 3):
  => No SDMX coverage
  => 3 types of rules that usually lead to warnings in case of failure

Main types of validation rules - Overview
Columns: Rule type; Mandatory; Default; Validation level (1 to 5); SDMX; Micro data; Severity level (E, W, I); Comments
- (EVA) Envelope is Acceptable: X
- (FLF) File Format
- (FDD) Fields Delimiter: (X); “;”; Mandatory for CSV files
- (DES) Decimals Separator: “.”; Mandatory for CSV files (always “.” for SDMX-ML)
- (FDT) Field Type
- (FDL) Field Length
- (FDM) Field is Mandatory or empty
- (COV) Codes are Valid: Mandatory for key fields
- (RWD) Records are Without Duplicates: Key; Mandatory for aggregated (tabular) data; Default: no duplicate key
- (REP) Records Expected are Provided
- (RNR) Records’ Number is in a Range: >=1; Default: at least one record (file not empty)
- (COC) Codes are Consistent
- (VIR) Values are in Range: >=0; Default: values are zero or positive (Min=zero)
- (VCO) Values are Consistent
- (VAD) Values for Aggregates are consistent with Details: =; Mandatory if aggregates and details are provided; Default: aggregates = sum of details
- (VNO) Values are Not Outliers
- (VSA) Values for Seasonally Adjusted data are plausible
- (RRL) Records Revised are Limited
- (VRT) Values are Revised within a Tolerance level
- (VMP) Values for Mirror data are Plausible
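The last rule types in the overview, (VRT) and (VMP), both compare two figures that should be close to each other. The sketch below shows one possible way to express them. The function names, the symmetric relative difference and the tolerance values are assumptions made for illustration; the slides do not prescribe a formula or a threshold.

```python
def relative_gap(a: float, b: float) -> float:
    """Symmetric relative difference between two values (0.0 if both are zero)."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0 else abs(a - b) / scale


def check_revisions(previous, current, tolerance=0.05):
    """VRT-style check: keys whose value moved by more than `tolerance`
    between two versions of the same data (dicts mapping key -> value)."""
    common = previous.keys() & current.keys()
    return sorted(k for k in common
                  if relative_gap(previous[k], current[k]) > tolerance)


def check_mirror(exports, imports, tolerance=0.10):
    """VMP-style check: exports[(a, b)] is what country a reports exporting
    to b; imports[(b, a)] is what b reports importing from a."""
    flagged = []
    for (a, b), exported in sorted(exports.items()):
        mirrored = imports.get((b, a))
        if mirrored is not None and relative_gap(exported, mirrored) > tolerance:
            flagged.append((a, b))
    return flagged
```

Since rules of this group usually lead to warnings rather than errors, the flagged keys would typically be reported for review instead of causing a rejection of the file.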

Main types of validation rules - Focus on 8 rules from Group 3
Columns: Rule type; Mandatory; Default; Validation level (1 to 5); SDMX; Micro data; Severity level (E, W, I); Comments
- (REP) Records Expected are Provided: X (X)
- (RNR) Records’ Number is in a Range: >=1; Default: at least one record (file not empty)
- (COC) Codes are Consistent
- (VIR) Values are in Range: >=0; Default: values are zero or positive (Min=zero)
- (VCO) Values are Consistent
- (VAD) Values for Aggregates are consistent with Details: =; Mandatory if aggregates and details are provided; Default: aggregates = sum of details
- (VNO) Values are Not Outliers
- (VSA) Values for Seasonally Adjusted data are plausible
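Two of these Group 3 rules are simple enough to sketch in a few lines. The example below is illustrative only: the function names and the rounding tolerance are assumptions, and the default behaviour mirrors the defaults stated above (aggregates = sum of details for VAD).

```python
def check_aggregate(total, details, tolerance=0.5):
    """VAD-style check: the aggregate equals the sum of its details,
    within an absolute tolerance that allows for rounding (assumed value)."""
    return abs(total - sum(details)) <= tolerance


def missing_records(expected_keys, provided_keys):
    """REP-style check: expected record keys that were not provided."""
    return sorted(set(expected_keys) - set(provided_keys))
```

The same `check_aggregate` pattern covers the quiz example "annual data consistent with quarterly data" when the annual figure is the sum of the four quarters, for instance `check_aggregate(400, [90, 100, 105, 105])`.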

QUESTIONS ?

Time for a Coffee break