1
Data Validation in the ESS Context
IPA training course Item 02 Luxembourg, 18 May 2017 Eurostat, Unit B1
2
Presentation Outline
ESS Vision 2020
Quality and compliance
GSBPM
Data validation – Life cycle
Main problems in current situation and medium-term goals
Validation principles
Validation levels
Time for Quiz and …
3
The path to ESS Vision 2020
COM (2009)404 Vision
COM (2009)433 GDP and beyond
(2011)211 Robust quality
ESSC Nov. 2012 ESS.VIP Programme
Reg. 99/2013 ESP
Reg. 223/2009 ESS
"The ESS provides the EU, the world and the public with independent high quality information on the economy and society on European, national and regional levels and makes the information available to everyone for decision-making purposes, research and debate."
4
In May 2014, the ESSC came to an agreement on the ESS Vision 2020 as the guiding frame for the ESS development during the years up to 2020.
5 key areas of work:
Focus on users
Strive for quality
Harness new data sources
Promote efficiency in production processes
Improve dissemination and communication
5
ESS Vision 2020 portfolio
6
ESS Vision 2020 – Building Blocks
Standards Network Services Data warehouses Enterprise Architecture Quality Validation Users Communication Administrative data Big Data Governance Methods
7
Quality and compliance
Countries have to provide statistics to Eurostat on time (punctuality) and of acceptable quality.
Link with the quality principles of the European Statistics Code of Practice. Principles 11 to 15 relate to statistical output:
11: Relevance
12: Accuracy and reliability
13: Timeliness and punctuality
14: Coherence and comparability
15: Accessibility and clarity
8
GSBPM: Generic Statistical Business Process Model
9
Data validation – life cycle
10
Main problems in current situation
Time-consuming "validation ping-pong"
Possibility of validation gaps
No clear picture of who does what
Risk of inconsistent assessment over time, between countries and across statistical domains
Possible misunderstanding of what is acceptable or not
Possible subjective assessment of data quality
Duplication of IT development costs within the ESS
Manual work due to low integration between the different tools
Lack of standards and of a common architecture
11
Medium-term goals - ESS VIP validation
Goal 1: Ensure the transparency of the validation procedures applied to the data sent to Eurostat by the ESS Member States.
Business outcomes:
  Increase in the quality and credibility of European statistics
  Reduction of costs related to the time-consuming validation cycle in the ESS ("validation ping-pong")
Goal 2: Enable sharing and re-use of validation services across the ESS on a voluntary basis.
Business outcomes:
  Reduction of costs related to IT development and maintenance
12
Validation Principles
The sooner, the better: Validation processes must be designed to be able to correct errors as soon as possible, so that data editing can be performed at the stage where the knowledge is available to do this properly and efficiently.
Trust, but verify: When exchanging data between organisations, data producers should be trusted to have checked the data beforehand, and data consumers should verify the data against the commonly agreed rules.
Well-documented and appropriately communicated validation rules: Validation rules must be clearly and unambiguously defined and documented in order to achieve a common understanding and implementation among the different actors involved.
Well-documented and appropriately communicated validation errors: The error messages related to the validation rules need to be clearly and unambiguously defined and documented, so that they can be communicated appropriately to ensure a common understanding of the result of the validation process.
Comply or explain: Validation rules must be satisfied or reasonably well explained.
Good enough is the new perfect: Validation rules should be fit-for-purpose: they should balance data consistency and accuracy requirements with timeliness and feasibility constraints.
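The "well-documented rules", "well-documented errors" and "comply or explain" principles can be illustrated with a minimal Python sketch. The `Rule` class, the `evaluate` function and the three outcome labels are hypothetical names chosen for illustration; they are not part of any ESS validation service.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    """A validation rule with its documentation attached (hypothetical structure)."""
    rule_id: str
    description: str                 # documented, unambiguous definition of the rule
    error_message: str               # documented message sent back when the rule fails
    check: Callable[[dict], bool]    # returns True when the record satisfies the rule


def evaluate(rule: Rule, record: dict, explanation: Optional[str] = None) -> str:
    """Apply one rule to one record, following "comply or explain"."""
    if rule.check(record):
        return "passed"
    if explanation:
        # The rule is not satisfied, but the data producer supplied an explanation.
        return "explained"
    return f"failed: {rule.rule_id}: {rule.error_message}"


# Illustrative rule; the field name "flights" is an assumption.
non_negative_flights = Rule(
    rule_id="R001",
    description="The number of flights must not be negative.",
    error_message="Negative value in field 'flights'.",
    check=lambda record: record["flights"] >= 0,
)

print(evaluate(non_negative_flights, {"flights": 120}))
print(evaluate(non_negative_flights, {"flights": -1}))
print(evaluate(non_negative_flights, {"flights": -1}, explanation="Correction entry"))
```

Keeping the description and the error message next to the check is one way to make sure the rule and its message are communicated together; whether an explanation is "reasonable" remains a human judgement that this sketch does not model.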
13
Validation levels 0 to 5 (Graph A)
Data
  Within a statistical authority
    Within a domain
      From the same source
        Same dataset
          Same file
            Level 0: Format and file structure checks
            Level 1: Cells, records, file checks
          Between files
            Level 2: Revisions and time series
        Between datasets
          Level 2: Checks between correlated datasets
      From different sources
        Level 3: Mirror checks
    Between domains
      Level 4: Consistency checks
  Between statistical authorities
    Level 5: Consistency checks
Axis: validation complexity, from Level 0 to Level 5
14
Validation levels 0 to 5 (Graph B)
Within an organisation:
  Level 0: Format & file structure (same file)
  Level 1: Cells, records, file (same file)
  Level 2: Revisions and time series (same dataset, between files)
  Level 2: Between correlated datasets (same source, between datasets)
  Level 3: Mirror checks (within a domain, from different sources)
  Level 4: Consistency checks (between domains)
Between different organisations:
  Level 5: Consistency checks
Axis: validation complexity, from Level 0 to Level 5
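The level of a check follows from the scope of the data it needs. A minimal Python sketch of that decision, assuming the scope can be reduced to a few yes/no questions (the function name and its parameters are hypothetical, chosen only for illustration):

```python
def validation_level(same_organisation: bool,
                     same_domain: bool = True,
                     same_source: bool = True,
                     same_dataset: bool = True,
                     same_file: bool = True,
                     structural: bool = False) -> int:
    """Return the validation level (0 to 5) of a check from the scope of data it needs.

    The questions are asked from the widest scope to the narrowest, so the
    first "no" decides the level.
    """
    if not same_organisation:
        return 5  # consistency checks between different organisations
    if not same_domain:
        return 4  # consistency checks between domains
    if not same_source:
        return 3  # mirror checks, data from different sources
    if not same_dataset or not same_file:
        return 2  # correlated datasets, or revisions and time series between files
    # Everything needed is in one file: structure (level 0) or content (level 1).
    return 0 if structural else 1


# A check comparing two files of the same dataset (e.g. a revision):
print(validation_level(same_organisation=True, same_file=False))  # 2
```

The sketch flattens the graph into a chain of conditions; it does not decide borderline cases, such as a time series that may or may not fit in a single file.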
15
Validation levels 0 to 5 (Graph C)
16
Priority to Validations
Level 0 and 1 validations are:
Most numerous
Easiest to define
Easiest and fastest to check
Easiest to implement, configure and share in a validation service (can be used for "pre-validation")
The most frequent source of severity level "error" (with rejection of data)
A cause of tedious and burdensome manual work when they are not checked
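A "pre-validation" step of this kind can be sketched in a few lines of Python. The file layout, the column names and the code list below are assumptions made for illustration; a real data flow would take them from its agreed data structure definition.

```python
import csv
import io

# Hypothetical file structure and code list, for illustration only.
EXPECTED_COLUMNS = ["country", "year", "aircraft_type", "flights"]
VALID_AIRCRAFT_TYPES = {"A320", "B737", "ATR72"}


def prevalidate(text: str) -> list:
    """Run level 0 and level 1 checks on CSV text.

    Returns a list of (level, severity, message) tuples; an empty list means
    that no problem was found.
    """
    findings = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)

    # Level 0: file structure (the header must match the expected columns).
    if header != EXPECTED_COLUMNS:
        findings.append((0, "error", f"unexpected header: {header}"))
        return findings  # without a valid structure, cell checks are meaningless

    for line_no, row in enumerate(reader, start=2):
        # Level 0: every record must have the expected number of fields.
        if len(row) != len(EXPECTED_COLUMNS):
            findings.append((0, "error", f"line {line_no}: wrong number of fields"))
            continue
        _country, _year, aircraft_type, flights = row
        # Level 1: cell checks (valid code, valid number).
        if aircraft_type not in VALID_AIRCRAFT_TYPES:
            findings.append((1, "error", f"line {line_no}: invalid aircraft type {aircraft_type!r}"))
        if not flights.isdigit():
            findings.append((1, "error", f"line {line_no}: flights is not a non-negative integer"))
    return findings


sample = "country,year,aircraft_type,flights\nLU,2016,A320,120\nLU,2016,XXXX,abc\nLU,2016\n"
for finding in prevalidate(sample):
    print(finding)
```

Because these checks need nothing beyond the file itself, the data sender can run them before transmission, which is what makes them suitable for a shared pre-validation service.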
17
Questions before the surprise?
18
Your favourite representation of validation levels
Time for Quiz 1 and…
Graph A
Graph B
Graph C
19
Validation rules… Guess the level(s)
Time for Quiz 2 and…
No worries: you'll remain anonymous…
Valid code in field "aircraft type"
Annual data consistent with quarterly data
Country code in first field consistent with data sender
Eurostat data for country "X" is consistent with OECD data for country "X"
Eurostat data for country "X" is consistent with country data on national website
Outliers detection
Changes in agricultural production of potatoes for country "X" consistent with neighbouring countries
20
Time for a Coffee and chocolate break Before the results of the Quiz…
21
Validation rules… Guess the level(s)
Results of Quiz 2
Valid code in field "aircraft type": Level 1 (even if normally linked to the DSD)
Annual data consistent with quarterly data: Level 2 (correlated datasets from the same country)
Country code in first field consistent with data sender: Level 1 (consistency between file content and "envelope")
Eurostat data for country "X" is consistent with OECD data for country "X": Level 5 (consistency between data from 2 different organisations, with different methods and databases)
Eurostat data for country "X" is consistent with country data on national website: Level 2 (normally the same methodology for ESS countries, but the data may have been revised in only one place)
Outliers detection: Level 1 or 2 (1 if the time series is in the same file, else 2)
Changes in agricultural production of potatoes for country "X" consistent with neighbouring countries: Level 3 (same domain but different sources)
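As an illustration of a level 2 rule, the "annual data consistent with quarterly data" check can be sketched as follows. The sketch assumes a flow variable, where the annual figure should equal the sum of the four quarters, and a relative tolerance of 0.1 %; both the function name and the tolerance are assumptions, not agreed ESS values.

```python
import math


def annual_matches_quarterly(annual: float, quarterly: list, rel_tol: float = 0.001) -> bool:
    """Level 2 check between two correlated datasets from the same source.

    Assumes a flow variable: the annual value should equal the sum of the four
    quarterly values, within a relative tolerance (hypothetical default: 0.1 %).
    """
    if len(quarterly) != 4:
        raise ValueError("expected exactly four quarterly values")
    return math.isclose(annual, sum(quarterly), rel_tol=rel_tol)


print(annual_matches_quarterly(100.0, [25.0, 25.0, 25.0, 25.05]))  # within tolerance
print(annual_matches_quarterly(100.0, [25.0, 25.0, 25.0, 35.0]))   # outside tolerance
```

The tolerance reflects the "good enough is the new perfect" principle: rounding differences should not trigger a rejection, while real discrepancies should be reported or explained.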
22
Thank you for your attention!