Published by Kamilla Viken, modified over 5 years ago
1
Modernisation of Validation in the ESS: Status report
eDAMIS and Validation Services User Group, Luxembourg, 9-10 October 2017
Eurostat, Unit B1
2
Presentation Outline
- Project background
- Main achievements since Jan 2016 and next steps until Dec 2019
- Priorities for deployment (validation levels and domains)
- Business Architecture
- Standard framework for validation rules and roles
- VTL (Validation and Transformation Language)
- Validation services
- Training courses
3
Project background: ESS.VIP Validation and follow-up
Three phases:
(1) Eurostat Validation project: pilots for agriculture statistics
(2) ESS.VIP Validation project: methodological handbook, VTL, validation policy, prototypes for validation services, …
(3) Follow-up actions to the ESS.VIP Validation, approved by the ESSC in May 2016: focus on deploying the deliverables of the ESS.VIP Validation into statistical production
4
Main problems in current situation
- Time-consuming "validation ping-pong"
- Possible validation gaps
- No clear picture of who does what
- Risk of inconsistent assessment over time, between countries and across statistical domains
- Possible misunderstanding of what is or is not acceptable
- Possible subjective assessment of data quality
- Duplication of IT development costs within the ESS
- Manual work due to poor integration between the different tools
- Lack of standards and of a common architecture
5
Goals and expected outcomes for ESS Validation
Medium-term goal 1: Ensure the transparency of the validation procedures applied to the data sent to Eurostat by the ESS Member States.
Business outcomes:
- Increase in the quality and credibility of European statistics
- Reduction of costs related to the time-consuming validation cycle in the ESS ("validation ping-pong")
Medium-term goal 2: Enable sharing and re-use of validation services across the ESS on a voluntary basis.
Business outcome:
- Reduction of costs related to IT development and maintenance
6
Main achievements Jan 2016 to Sept 2017
- Business architecture for ESS Validation finalised and approved
- VTL:
  - Assessment of VTL for content validation rules in 5 domains
  - Proof of concept of a VTL interpreter (VTL sandbox)
  - Contribution to the improvement of VTL (via public consultation)
- Tools and services:
  - STRUVAL available at Eurostat in Process Manager and exposed as a shared service
  - CONVAL-1 (based on EDIT) available at Eurostat in Process Manager and exposed as a shared service
- Communication (video, training courses, web pages, …)
7
Next steps until Dec 2019
- Deployment of a standard framework (template, guidelines, best practices, …) for describing and agreeing on validation rules and roles, covering all statistical domains
- Deployment of the Business Architecture in the ESS
- Validation services available: (STRUVAL), CONVAL-2 (with VTL support), Validation Rule Manager
- Validation Rule Registry available
- Standardisation of ESS validation reports
8
Priority for Validation at file level (levels 0 and 1)
[Figure: ESS validation levels 0 to 5, ordered by increasing validation complexity. Level 0: format and file structure checks. Level 1: cell, record and file checks. Levels 2 to 5 (checks between correlated datasets, revisions and time series, mirror checks, consistency checks) widen the scope from the same file to between files, between datasets, different sources, different domains and different statistical authorities.]
9
Priority to Validation at file level
Why?
- Most numerous validation rules
- Easiest to define
- Easiest to check
- Fastest to check
- Easiest to implement, configure and share in a validation service (can be used for "pre-validation")
- Most likely to lead to severity level “error” (with rejection of data) and, when not checked, to tedious and burdensome manual work
10
Priority for deployment in Domains
Top 20 domains for eDAMIS transmissions in 2016:
- Cover about 80% of all transmissions (out of a total of 83 domains)
- Include 7 of the 8 pilot domains
5 priority levels:
1. 8 pilot domains (Animal, Milk, Asylum, BOP-ITS, Energy, HBS, NA, STS)
2. 9 domains in the top 20 with ratio V2/V1 >= 50%
3. 4 domains in the top 20 with ratio V2/V1 < 50%
4. 23 domains ranked 21 to 83 with ratio V2/V1 >= 50%
5. 39 domains ranked 21 to 83 with ratio V2/V1 < 50%
Tasks:
- Maturity assessment and documentation gathering
- Standardisation of documentation
- Discussions in Working Groups
- Validation rules written in VTL
- Validation services can be used
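The five priority levels amount to a small decision rule over each domain's transmission rank, its pilot status and its V2/V1 ratio (version-2 transmissions divided by version-1 transmissions). A minimal Python sketch of that reading follows; the `Domain` record and the function names are hypothetical, and only the 50% threshold, the top-20 cut-off and the order of the levels come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical record describing one domain's eDAMIS transmissions."""
    name: str
    rank: int        # rank by number of eDAMIS transmissions in 2016 (1 = most)
    version1: int    # transmissions received as version 1
    version2: int    # transmissions received as version 2
    pilot: bool = False


def v2_v1_ratio(d: Domain) -> float:
    """Version-2 transmissions divided by version-1 transmissions."""
    return d.version2 / d.version1


def priority_level(d: Domain, top_n: int = 20, threshold: float = 0.5) -> int:
    """Return priority level 1 (highest) to 5, following the five levels above."""
    if d.pilot:
        return 1  # pilot domains come first, whatever their rank
    frequent_resubmission = v2_v1_ratio(d) >= threshold
    if d.rank <= top_n:
        return 2 if frequent_resubmission else 3
    return 4 if frequent_resubmission else 5


# Counts taken from the top-20 table for three domains.
for dom in [
    Domain("STS", 1, 4888, 1094, pilot=True),
    Domain("CROPROD", 3, 1578, 841),
    Domain("SBS", 8, 1817, 469),
]:
    print(dom.name, f"{v2_v1_ratio(dom):.2%}", priority_level(dom))
```

Run on these three rows, it prints the V2/V1 ratios 22.38%, 53.30% and 25.81% and places STS, CROPROD and SBS at levels 1, 2 and 3.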
11
Priorities for deployment
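Top 20 domains (eDAMIS transmissions in 2016)
V2/V1 = Version 2 / Version 1; Versions = Transm. / Version 1.

| Rank | Dir. | Unit | Process | Transm. | % Transm. | Version 1 | Version 2 | Version 3+ | V2/V1 | Versions | Pilot |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | G | G3 | STS | 6605 | 10.00% | 4888 | 1094 | 623 | 22.38% | 1.35 | X |
| 2 | C | C2 | NA-ESA | 6255 | 9.47% | 2569 | 1548 | 2138 | 60.26% | 2.43 | X |
| 3 | E | E1 | CROPROD | 4849 | 7.34% | 1578 | 841 | 2430 | 53.30% | 3.07 | |
| 4 | G | G5 | COMEXT | 4384 | 6.64% | 1287 | 682 | 2415 | 52.99% | 3.41 | |
| 5 | E | E5 | ENERGY | 4310 | 6.52% | 2852 | 838 | 620 | 29.38% | 1.51 | X |
| 6 | F | F2 | ASYLUM | 2848 | 4.31% | 2238 | 506 | 104 | 22.61% | 1.27 | X |
| 7 | C | C5 | BPM6 | 2730 | 4.13% | 572 | 383 | 1775 | 66.96% | 4.77 | X |
| 8 | G | G2 | SBS | 2575 | 3.90% | 1817 | 469 | 289 | 25.81% | 1.42 | |
| 9 | | | TOUR | 2525 | 3.82% | 1068 | 732 | 725 | 68.54% | 2.36 | |
| 10 | E | E3 | AIR | 2399 | 3.63% | 959 | 764 | 676 | 79.67% | 2.50 | |
| 11 | | | REGWEB | 2286 | 3.46% | 1228 | 797 | 261 | 64.90% | 1.86 | |
| 12 | C | C4 | HICP | 2000 | 3.03% | | | | 22.58% | | |
| 13 | | | MRTM | 1977 | 2.99% | 568 | 442 | 967 | 77.82% | 3.48 | |
| 14 | | | ANI | 1783 | 2.70% | 1444 | 241 | 98 | 16.69% | 1.23 | X |
| 15 | | | MILK | 962 | 1.46% | 554 | 202 | 206 | 36.46% | 1.74 | X |
| 16 | D | D4 | EDP | 954 | 1.44% | 333 | 241 | 380 | 72.37% | 2.86 | |
| 17 | | | ROAD | 844 | 1.28% | 417 | 189 | 238 | 45.32% | 2.02 | |
| 18 | E | E2 | WASTE | 807 | 1.22% | 558 | 156 | 93 | 27.96% | 1.45 | |
| 19 | F | F5 | ESSPROS | 783 | 1.19% | 252 | 188 | 343 | 74.60% | 3.11 | |
| 20 | F | F3 | LFS | 726 | 1.10% | 270 | 160 | 296 | 59.26% | 2.69 | |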
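In the top-20 table the V2/V1 column equals Version 2 / Version 1, the Versions column equals Transm. / Version 1, and the three version counts add up to Transm. The Python sketch below (the function name is hypothetical) recomputes both derived columns for a row and rejects a row whose version counts do not add up, a small example of checking an aggregate against its details.

```python
def derived_columns(transm: int, v1: int, v2: int, v3_plus: int) -> tuple[float, float]:
    """Recompute the V2/V1 (%) and Versions columns for one row of the table.

    Raises ValueError when the version counts do not add up to Transm.
    """
    if v1 + v2 + v3_plus != transm:
        raise ValueError(f"version counts sum to {v1 + v2 + v3_plus}, not {transm}")
    v2_v1_percent = round(100 * v2 / v1, 2)  # V2/V1 column, in %
    versions = round(transm / v1, 2)         # Versions column: Transm. / Version 1
    return v2_v1_percent, versions


# (Process, Transm., Version 1, Version 2, Version 3+) taken from the table.
rows = [
    ("STS", 6605, 4888, 1094, 623),
    ("NA-ESA", 6255, 2569, 1548, 2138),
    ("CROPROD", 4849, 1578, 841, 2430),
]
for process, transm, v1, v2, v3_plus in rows:
    print(process, derived_columns(transm, v1, v2, v3_plus))
```

The three rows print (22.38, 1.35), (60.26, 2.43) and (53.3, 3.07), the values shown in the table.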
12
Business Architecture
- What is it?
- Validation principles
- As-is and to-be situation
- Severity levels and data acceptance process
- Scenarios for implementation
13
Focus of ESS.VIP Validation
[Figure: NSI and ESTAT production processes]
Validation can take place at several points of the ESS statistical production process.
14
What does the business architecture for validation do?
- Sets basic principles for validation in the ESS
- Defines the to-be state
- Clarifies the roles of Eurostat and the Member States in the validation of data sent to Eurostat
- Clarifies how common validation services could be used by Member States and Eurostat
15
Validation Principles
- The sooner, the better: validation processes must be designed to correct errors as early as possible, so that data editing can be performed at the stage where the knowledge needed to do it properly and efficiently is available.
- Trust, but verify: when exchanging data between organisations, data producers should be trusted to have checked the data beforehand, and data consumers should verify the data against the common rules agreed.
- Well-documented and appropriately communicated validation rules: validation rules must be clearly and unambiguously defined and documented in order to achieve a common understanding and implementation among the different actors involved.
- Well-documented and appropriately communicated validation errors: the error messages related to the validation rules need to be clearly and unambiguously defined and documented, so that they can be communicated appropriately and ensure a common understanding of the result of the validation process.
- Comply or explain: validation rules must be satisfied or reasonably well explained.
- Good enough is the new perfect: validation rules should be fit for purpose, balancing data consistency and accuracy requirements with timeliness and feasibility constraints.
16
As-is situation
17
To-be situation
18
To-be situation: "The sooner, the better" and "Good enough is the new perfect"
19
To-be situation: well-documented validation rules and well-documented validation errors
20
To-be situation: "Comply or explain" and "Trust, but verify"
21
Definition of severity levels
- Errors: rules must necessarily be satisfied for mathematical or logical reasons. Data containing errors are not considered acceptable.
- Warnings: highlight suspicious values. Data with warnings may be accepted upon further verification and/or justification.
- Information: highlights potentially suspicious values which do not usually require further justification for acceptance.
22
Data acceptance process
- Data with no errors or warnings: data are accepted.
- Data with errors: data are rejected. In exceptional circumstances, Member States may request to override the error (and provide a justification).
- Data with warnings and no errors: a justification is required. If the justification is deemed sufficient, data are accepted.
Minimum standard for compliance purposes: data with no errors must be received before the deadline.
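The severity levels and acceptance rules above can be sketched as one decision function over the severities of the rules a transmission failed. This is an illustrative Python sketch, not Eurostat's implementation: the enum and function names are hypothetical, and treating an overridden error like a warning that still needs an accepted justification is an assumption.

```python
from enum import Enum


class Severity(Enum):
    ERROR = "E"        # must be satisfied for mathematical or logical reasons
    WARNING = "W"      # suspicious value: accepted only after verification/justification
    INFORMATION = "I"  # potentially suspicious value: usually no justification needed


class Decision(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    JUSTIFICATION_REQUIRED = "justification required"


def acceptance_decision(failed: list[Severity],
                        override_granted: bool = False,
                        justification_accepted: bool = False) -> Decision:
    """Decide on a transmission from the severities of the rules it failed.

    Assumption: an error override (exceptional case) is handled like a warning,
    i.e. the data are accepted only once the justification is deemed sufficient.
    """
    has_error = Severity.ERROR in failed
    has_warning = Severity.WARNING in failed
    if has_error and not override_granted:
        return Decision.REJECTED
    if has_error or has_warning:
        if justification_accepted:
            return Decision.ACCEPTED
        return Decision.JUSTIFICATION_REQUIRED
    # No errors or warnings (information messages only, or none): accepted.
    return Decision.ACCEPTED


print(acceptance_decision([Severity.WARNING, Severity.INFORMATION]).value)
```

For a transmission that only triggered a warning and an information message, it prints "justification required".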
23
Data acceptance process
24
Validation in the ESS – IT building blocks
25
Scenario 1: Member States receive the common rules and implement them in their own validation systems
26
Scenario 2: Member States use common ESS services in their validation process, either as a shared service or as replicated services
27
Scenario 3: Member States use common ESS processes for validation
28
Standard framework for validation rules and roles
- The practice of designing and agreeing on validation rules at Working Group level is extended to all statistical domains by end 2019
- Validation rules are documented using common cross-domain standards
- Clear validation responsibilities are assigned to the different actors
29
20 Main types of validation rules
- Cover at least 80-90% of the validation rules used for ESS data
- Used to structure the standard documentation that describes validation rules in domains (and as a checklist)
- Used in the Validation Rule Manager to allow easy definition of rules from a simple set of parameters per rule, and easy, automatic generation of VTL
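The Validation Rule Manager is described as generating VTL from a small set of parameters per rule type. As a hedged illustration only, the sketch below renders a "Value is in Range" (VIR) rule as a VTL statement. The parameter names are hypothetical, and the output follows our reading of the VTL 2.0 `check` and `between` operators (the version still in preparation at the time of this presentation); the syntax should be verified against the VTL reference manual.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValueInRangeRule:
    """Hypothetical VRM-style parameters for a 'Value is in Range' (VIR) rule."""
    rule_id: str                   # illustrative identifier, reused as error code
    dataset: str                   # name of the VTL dataset to check
    lower: float = 0               # the rule-type table lists ">=0" for VIR
    upper: Optional[float] = None  # optional upper bound
    errorlevel: int = 1            # assumed numeric encoding of the severity


def to_vtl(rule: ValueInRangeRule) -> str:
    """Render the rule as a VTL 2.0-style check statement (syntax to be verified)."""
    if rule.upper is None:
        condition = f"{rule.dataset} >= {rule.lower}"
    else:
        condition = f"between({rule.dataset}, {rule.lower}, {rule.upper})"
    # 'invalid' asks check to return only the data points that fail the condition.
    return (f"{rule.rule_id} := check({condition} "
            f'errorcode "{rule.rule_id}" errorlevel {rule.errorlevel} invalid);')


print(to_vtl(ValueInRangeRule("VIR_001", "DS_1")))
print(to_vtl(ValueInRangeRule("VIR_002", "DS_2", 0, 100, 2)))
```

Generating the statement from a fixed template per rule type, rather than writing VTL by hand, is what keeps the rule descriptions and the executable code aligned.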
30
20 Main types of validation rules
Each rule type is characterised by whether it is mandatory, its default, its validation level (1 to 5), its applicability to SDMX and to micro data, and its severity level (E = error, W = warning, I = information).
- (EVP) Envelope is Plausible
- (FLF) File Format
- (FDD) Fields Delimiters
- (DES) Decimal Separator
- (FDT) Field Type
- (FDL) Field Length
- (FDM) Field is Mandatory or empty
- (COV) Code is Valid
- (RWD) Records are Without Duplicates (default: key)
- (REP) Records Expected are Provided
- (RNR) Records' Number is in a Range (default: >= 1)
- (RRL) The number of Records Revised is Limited
- (COC) Codes are Consistent
- (VIR) Value is in Range (default: >= 0)
- (VCO) Values are Consistent
- (VAD) Values for Aggregates are consistent with Details (default: =)
- (VNO) Value is Not an Outlier
- (VSA) Values for Seasonally Adjusted data are plausible
- (VRT) Value is Revised within a Tolerance level
- (VMP) Values for Mirror data are Plausible
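Several of the file-level rule types above are simple enough to express directly in code. The Python sketch below applies FDD, FDM, COV, RWD, FDT and VIR checks to a delimited file. The GEO;TIME;OBS_VALUE layout, the GEO code list and the reporting format are hypothetical, and the checks illustrate the rule types rather than the logic of STRUVAL or CONVAL.

```python
import csv
import io

# Hypothetical file layout and code list, for illustration only.
FIELDS = ["GEO", "TIME", "OBS_VALUE"]
KEY = ("GEO", "TIME")
VALID_GEO = {"AT", "BE", "LU"}


def check_file(text: str, delimiter: str = ";") -> list[tuple[str, int, str]]:
    """Return (rule type, line number, message) for every failed file-level check."""
    failures: list[tuple[str, int, str]] = []
    seen_keys: set[tuple[str, ...]] = set()
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for line_no, fields in enumerate(reader, start=1):
        # FDD: splitting on the expected delimiter gives the expected number of fields.
        if len(fields) != len(FIELDS):
            failures.append(("FDD", line_no, f"expected {len(FIELDS)} fields, got {len(fields)}"))
            continue
        record = dict(zip(FIELDS, fields))
        # FDM: mandatory fields are not empty.
        for name in FIELDS:
            if not record[name].strip():
                failures.append(("FDM", line_no, f"{name} is empty"))
        # COV: the GEO code belongs to the code list.
        if record["GEO"] and record["GEO"] not in VALID_GEO:
            failures.append(("COV", line_no, f"invalid GEO code {record['GEO']!r}"))
        # RWD: no two records share the same key.
        key = tuple(record[k] for k in KEY)
        if key in seen_keys:
            failures.append(("RWD", line_no, f"duplicate key {key}"))
        seen_keys.add(key)
        # FDT then VIR: the value is numeric and, if so, not negative.
        value = record["OBS_VALUE"].strip()
        if value:
            try:
                number = float(value)
            except ValueError:
                failures.append(("FDT", line_no, f"OBS_VALUE {value!r} is not numeric"))
            else:
                if number < 0:
                    failures.append(("VIR", line_no, f"OBS_VALUE {number} is below 0"))
    return failures


sample = "AT;2016;10\nBE;2016;-5\nXX;2016;3\nAT;2016;7\nLU;2016\n"
for failure in check_file(sample):
    print(failure)
```

On this five-line sample it reports VIR on line 2, COV on line 3, RWD on line 4 and FDD on line 5. Because such checks need no external data, they are the kind a Member State could run as "pre-validation" before transmission.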
31
VTL
A new standard language for validation created by the SDMX community: VTL (Validation and Transformation Language)
32
VTL – purposes
- A non-ambiguous language designed for the validation (and transformation) of statistical datasets
- Aims to be understood by both statisticians and computers:
  - used by statisticians to agree on validation rules
  - converted or interpreted by computers to execute validation rules on various IT platforms
The ESS.VIP Validation took as the starting point for its activities the following definition given by the UNECE: data validation is "an activity aimed at verifying whether the value of a data item comes from the given (finite or infinite) set of acceptable values." Data validation focuses on checking the validity and consistency of data. Checking process or structural metadata is not within the scope of validation, although process or structural metadata may serve as input to validation procedures.
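The UNECE definition treats validation as a membership test: a value is valid if it belongs to a given set of acceptable values, which may be finite (a code list) or infinite (for example, all non-negative numbers). A minimal Python sketch of that idea, with a hypothetical `is_valid` helper, represents a finite set explicitly and an infinite one by a predicate.

```python
from typing import Callable, Collection, Union

# A set of acceptable values: either listed explicitly (finite, e.g. a code list)
# or described by a membership predicate (possibly infinite).
AcceptableValues = Union[Collection[object], Callable[[object], bool]]


def is_valid(value: object, acceptable: AcceptableValues) -> bool:
    """Return True if value comes from the given set of acceptable values."""
    if callable(acceptable):
        return bool(acceptable(value))
    return value in acceptable


# Finite set: a code list.
print(is_valid("LU", {"AT", "BE", "LU"}))  # True
# Infinite set: all non-negative numbers.
print(is_valid(-3.5, lambda v: isinstance(v, (int, float)) and v >= 0))  # False
```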
33
Assessment of VTL 1.1 by statisticians on pilot domains (Feb 2017)
- VTL 1.1 is not very intuitive for statisticians
- The description of the validation rules needs to be harmonised and:
  - linked to the DSD
  - written in plain English, with examples of bad/good data and with optimised (simplified) VTL code
  - well structured (pre-condition, rule, severity, message, …)
- Rules should be expressed in a positive way (what is correct)
- Description and VTL code should be harmonised and made as clear as possible for the main types of rules
- For specific (less usual) rules, expertise is needed to support VTL
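The recommended structure for rule descriptions (pre-condition, rule, severity, message, with the rule stated positively as what correct data look like) maps naturally onto a small record type. The sketch below is a hypothetical Python illustration of such a structure, not a format defined by the ESS; the example rule and its identifier `R001` are made up.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# A record is a mapping from field names to values (hypothetical representation).
Record = Mapping[str, object]


@dataclass(frozen=True)
class ValidationRule:
    """Hypothetical structured rule: pre-condition, rule, severity, message."""
    rule_id: str
    precondition: Callable[[Record], bool]  # when the rule applies
    rule: Callable[[Record], bool]          # what correct data look like (positive form)
    severity: str                           # "E" (error), "W" (warning) or "I" (information)
    message: str                            # reported when the rule is not satisfied


def evaluate(rule: ValidationRule, record: Record) -> Optional[str]:
    """Return the formatted message if an applicable rule fails, otherwise None."""
    if not rule.precondition(record):
        return None  # the rule does not apply to this record
    if rule.rule(record):
        return None  # the record satisfies the positively stated rule
    return f"[{rule.severity}] {rule.rule_id}: {rule.message}"


# Illustrative rule: "if a value is reported, it is greater than or equal to 0".
non_negative = ValidationRule(
    rule_id="R001",
    precondition=lambda r: r.get("OBS_VALUE") is not None,
    rule=lambda r: r["OBS_VALUE"] >= 0,
    severity="E",
    message="OBS_VALUE must be greater than or equal to 0",
)

print(evaluate(non_negative, {"OBS_VALUE": -2}))
```

For a record with OBS_VALUE = -2 it prints "[E] R001: OBS_VALUE must be greater than or equal to 0"; for a record without a value the pre-condition fails and nothing is reported.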
34
VTL Tools – state of play
- Eurostat has developed a VTL sandbox as a demonstrator and as a tool to improve VTL and test VTL scripts
- IT (ISTAT) is working on a VTL editor
- IT (Bank of Italy) is integrating VTL into its Infostat platform
- PL has developed a VTL to T-SQL translator
- NO is working on integrating VTL into its production environment
- ECB: BIRD (Banks' Integrated Reporting Dictionary)
See: VTL part 1 (General description), VTL part 2 (Library of Operators), eBNF (Extended Backus-Naur Form) technical notation
35
Versions of VTL – state of play
- VTL 1.0 published in March 2015
- Collection of comments (public review)
- VTL 1.1 published in November 2016
- VTL 2.0 to be published by the end of 2017
  - Used for developing tools and services in the ESS
SDMX web site: VTL part 1 (General description), VTL part 2 (Library of Operators), eBNF (Extended Backus-Naur Form) technical notation
36
Validation services
- STRUVAL available at Eurostat in Process Manager and exposed as a shared service
  - Pilots with countries for National Accounts and STS
- CONVAL-1 (based on EDIT) available at Eurostat in Process Manager and exposed as a shared service
- CONVAL-2 (with VTL support) planned for 2018
- VRM (Validation Rule Manager) and VRR (Validation Rule Registry) planned for …
37
Training courses
- ESTP course (November)
- 4 ESSnet regional conferences
38
YouTube video: https://www.youtube.com/watch?v=vuBJxGtg7qA
Questions?
Eurostat, Unit B1