Published by Pernille Hetland. Modified over 6 years ago.
1
Validation in the ESS CoE Data Warehousing 23./
2
Background The ESS.VIP Validation and the ESSnet ValiDat Foundation
2012: Eurostat emphasizes the strategic importance of validation, but is concerned mainly with data exchange with NSIs
2013: Launch of ESS.VIP Validation
2014: Launch of a task force to integrate Member States
2015: ESSnet ValiDat Foundation (IT, NL, LT, DE and Eurostat), which takes a somewhat broader approach to validation
2016: The ESSnet still casts its shadows…
2017: ESSnet ValiDat Integration (?)
3
What is data validation (exactly)?
Data validation as defined by the ESSnet:
Data validation is an activity verifying whether or not a combination of values is a member of a set of acceptable combinations.
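The definition can be read quite literally as a set-membership test. The following is a minimal sketch under that reading; the variables (`age_group`, `marital_status`) and the enumerated set are hypothetical and chosen only for illustration. In practice the acceptable set is usually described by rules rather than enumerated.

```python
# Hypothetical set of acceptable (age_group, marital_status) combinations.
ACCEPTABLE = {
    ("child", "single"),
    ("adult", "single"),
    ("adult", "married"),
    ("adult", "widowed"),
}

def is_valid(combination):
    """Validation in the sense of the definition: a pure membership test."""
    return combination in ACCEPTABLE
```

For example, `is_valid(("child", "married"))` is rejected because that combination is not in the set.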
4
Validation as a Problem
Is there a business case?
A survey conducted by the ESSnet revealed the impact of data validation on statistical production:
Effort: Member states estimated that data validation (and editing) accounts for 40 to 60 % of the total effort in five sample domains.
Relevance: The impact of data validation on data quality is generally assumed to be of paramount importance.
5
Validation as a Problem
if employment status == "old-age pensioner" and age < 35 then error "Too young!"
0.5 < turnover(curMonth)/turnover(prevMonth) < 2
WENN ANZAHL VON Familie[ALLE].Person[MIT Alter < 18] > 0 DANN ... ENDE
(German rule dialect; in English: IF COUNT OF Family[ALL].Person[WITH Age < 18] > 0 THEN ... END)
IF maritalstate=married THEN Age>15 "Too young to be married" ENDIF
profit <= 0.6*revenue
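The rules above come from different rule dialects. As a rough sketch of how the complete ones might look in a single general-purpose language, here they are in Python. The field names, function names, and the handling of a zero previous-month turnover are assumptions made for illustration; the German rule is omitted because its action is elided on the slide.

```python
def check_person(rec):
    """Return the error messages raised by the two person-level rules."""
    errors = []
    if rec["employment_status"] == "old-age pensioner" and rec["age"] < 35:
        errors.append("Too young!")
    if rec["marital_status"] == "married" and not rec["age"] > 15:
        errors.append("Too young to be married")
    return errors

def check_turnover(cur_month, prev_month):
    """0.5 < turnover(curMonth)/turnover(prevMonth) < 2."""
    # Assumption: a zero previous-month turnover counts as a failed check.
    return prev_month != 0 and 0.5 < cur_month / prev_month < 2

def check_profit(profit, revenue):
    """profit <= 0.6*revenue."""
    return profit <= 0.6 * revenue
```

The person-level rules return messages while the numeric rules return a pass/fail flag, which mirrors the mixed style of the original examples.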
6
ValiDat – Foundation I Business case - implications:
If validation has such a high impact on data quality and consumes so many resources, then it should be well understood, fairly widely standardized and, as far as possible, automated.
7
ValiDat – Foundation II
The Base Line: Methodology
The core deliverable of the methodological work of the ESSnet is the handbook on validation, which compiles and builds on the work of others and makes it available (pragmatically) to statisticians.
Why are we doing validation (remember the business case!)?
Enhance data quality dimensions:
  Directly (like accuracy, coherence and comparability)
  Indirectly (timeliness) as restrictions
8
ValiDat - Foundation The Base Line: Methodology Content of handbook:
What Why How When
9
ValiDat - Foundation The Base Line: Methodology – What?
The handbook provides classification schemes for validation rules:
  Levels
  Pragmatic typology
  Formal typology
All have their merits and help communicate about validation.
10
ValiDat - Foundation The Base Line: Methodology – What?
Levels and rule types are building blocks to discuss other important concepts like:
  Structural vs. content based validation
  Simple vs. complex rule types
  Soft vs. hard checks
  Micro data vs. macro data validation
They can be used as a framework for metrics, languages and technologies.
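One of these distinctions, soft vs. hard checks, can be illustrated with a small sketch. It assumes the common reading that a failed hard check is an error while a failed soft check only raises a warning. The `Rule` class, the `apply_rules` function, and the classification of the two sample rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes
    hard: bool                     # True: hard check, False: soft check

def apply_rules(record, rules):
    """Split the failed rules into errors (hard) and warnings (soft)."""
    failed = [r for r in rules if not r.check(record)]
    errors = [r.name for r in failed if r.hard]
    warnings = [r.name for r in failed if not r.hard]
    return errors, warnings

# Hypothetical classification, for illustration only.
RULES = [
    Rule("married implies age > 15",
         lambda rec: rec["marital_status"] != "married" or rec["age"] > 15,
         hard=True),
    Rule("profit <= 0.6 * revenue",
         lambda rec: rec["profit"] <= 0.6 * rec["revenue"],
         hard=False),
]
```

Keeping the severity as data next to the rule, rather than in the checking code, makes it possible to change a check from soft to hard without touching the engine.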
11
ValiDat - Foundation The Base Line: Methodology – When?
(Diagram with three positions marked "Here"; image not reproduced)
12
ValiDat - Foundation The Base Line: Methodology – How?
Validation Life Cycle Simon et al. 2015
13
ValiDat - Foundation The Base Line: Methodology – How?
How do we know that we have struck the right balance between
  Improving data quality
  Keeping costs at bay
Our solution: use metrics!
  Analyse the internal consistency of validation rule sets
  Analyse the value of validation rules on observed data
  Analyse validation rule sets in comparison to observed and expected data
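As a simple illustration of the second kind of metric, the sketch below computes the share of observed records that fail each rule. This is not one of the handbook's metrics, only a plausible starting point. The function names and the representation of rules as a name-to-predicate mapping are assumptions.

```python
def failure_rates(records, rules):
    """Share of observed records failing each rule.

    rules maps a rule name to a predicate that returns True on pass.
    """
    if not records:
        return {name: 0.0 for name in rules}
    return {
        name: sum(1 for rec in records if not passes(rec)) / len(records)
        for name, passes in rules.items()
    }

def never_failing(rates):
    """Rules that no observed record violated: candidates for review."""
    return sorted(name for name, rate in rates.items() if rate == 0.0)
```

A rule that never fails on observed data is not necessarily useless, but it costs effort to maintain, so it is a natural candidate when weighing data quality against cost.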
14
ValiDat - Foundation Validation language
Any (future) validation language must meet two main requirements:
  Provide a formal means for domain experts to unambiguously communicate about and exchange validation rules (human-readable).
  Enable generic IT systems to validate data for any survey by providing the necessary information (machine-readable).
These might be conflicting goals!
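The tension between the two requirements can be made concrete with a toy rule format. The sketch below stores a rule as plain data that a generic engine can evaluate for any survey, and derives a readable form from the same data. The format (`variable`, `op`, `value`) is invented for illustration and is not VTL or any existing standard.

```python
import operator

# Comparison operators supported by this toy rule format.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}

def evaluate(rule, record):
    """Machine-readable side: a generic engine applies the rule to a record."""
    return OPS[rule["op"]](record[rule["variable"]], rule["value"])

def describe(rule):
    """Human-readable side: render the same rule as text for domain experts."""
    return f'{rule["variable"]} {rule["op"]} {rule["value"]}'
```

With `rule = {"variable": "age", "op": ">", "value": 15}`, `describe(rule)` yields `age > 15`. A format this simple is easy to read and to execute, but it cannot express conditional or cross-record rules, which is where the two goals start to pull apart.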
15
VTL A new Standard?
VTL (Validation and Transformation Language) has been specified by the SDMX community.
16
VTL A new Standard?
The ESSnet evaluated the following characteristics of VTL:
  Correctness and coherence
  Completeness
  Usability (by human users)
  Feasibility (for machine-to-machine communication)
The evaluation is publicly available on CROS.
17
VTL A PoC (Proof of Concept) Let's simulate a European Infrastructure!
18
Rule 5, 3 implementations: VTL, eStatistik (DE), Validate (NL)
19
VTL PoC Results
VTL code is hard to read and understand
VTL code is lengthy
Manual translation from VTL to national dialects requires strong IT skills
Automatic translation from VTL to national dialects will not be easy
-> VTL must be improved considerably
-> Good tool support is essential (GUI)
20
Tools and Services Infrastructure as proposed by Eurostat
21
Tools and Services Additional requirements of the NSIs
End-to-end validation
  Support of the whole production chain (GSBPM)
  Support of the whole validation life cycle (from specification to evaluation)
Language and standards (VTL, SDMX, DDI, CSPA, ..)
Other functional requirements
  Roles
  Versioning
  Metrics
22
Tools and Services Additional requirements of the NSIs
Non-functional requirements
  Adaptability (to national systems)
  Usability (for different user groups)
  Performance (working with big datasets and complex rules)
  Stable and error free (as central part of statistical production)
  IT security, data protection acts and statistical confidentiality
Organisational issues
  Training, support and documentation have to be secured
  Maintenance has to be secured
  Costs (development, modification, production)
23
Next Steps Deployment: Making it work!
Handbook (Trainings, Workshops, CoE?)
VTL
  VTL 1.1. under public review till
  Evaluation:
  ESS Decision Gate:
Tools & Services (Test facilities, Improvements)
24
Next Steps How to proceed
VTL pilots (running until end of 1st quarter 2017)
  National Accounts
  Animal Production
  Short Term Statistics
  Asylum and Migration
  Trade in Services
  Energy Statistics
  Household Budget Survey
25
Next Steps How to proceed Involvement of more member states Task Force
ESSnet ValiDat Integration
January 2017 – March 2018; NL, PT, LT, PL, SE, DE
Work Packages:
  Revision of the handbook, further work on metrics
  Validation reports
  Assessment of validation service scenarios
  Regional workshops (Nov to Feb. 2018): Lisbon, The Hague, Belgrade, Vilnius
26
Thank you for your attention!
27
Tools and Services
Business Architecture is currently limited
(Diagram labels: EStat, NSI; Focus on transmission; Focus on validation process (end to end). © Luca Gramaglia)