IT Directors’ Group Meeting, October 2010
Sharing data validation tools in the ESS
Christine WIRTZ – Head of Unit B3
Georges PONGAS – Unit B3
Daniel SURANYI – Unit B5
Item 3.3.b of the agenda

19-20 October 2010, IT Directors’ Group Meeting

Background
ITDG 2009:
– Eurostat presented new ESS vision with a specific view on IT architecture and IT tools
Harmonising statistical production processes welcomed, BUT
– considered very ambitious
– should be medium to longer term perspective
Sharing of IT tools
– Implicit and crucial aspect of future infrastructure of the ESS
– Virtual sharing OR sharing of real software could be envisaged
– Challenges: IT standards, interdependence of actors
– Linked to SDMX framework
Data validation services appropriate and logical step

Data validation in the ESS
Data validation takes place:
– In Member States – before transmission
– In Eurostat – before further dissemination and processing
Several steps in validation:
– Format validation
– Codes validation
– Data validation
  1st level: basic checks – existence of mandatory fields, range checks, consistency of info inside file
  2nd level: consistency with historical data / data from other sources (other countries, other statistics)
  3rd level: expert validation / in-depth analysis
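The validation steps listed above (format, codes, then 1st-level data checks) can be illustrated with a minimal Python sketch. It is not taken from any Eurostat tool: the record layout, the code list and the range limit are assumptions chosen only for illustration.

```python
# Illustrative sketch only: field names, code list and limits are assumptions.
import csv
import io

EXPECTED_COLUMNS = ["country", "period", "value"]      # hypothetical record layout
COUNTRY_CODES = {"AT", "LV", "NL", "RO", "SE", "UK"}   # hypothetical code list


def validate(text):
    """Return a list of (line_number, step, message) findings for a CSV text."""
    findings = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    # Format validation: the file must have the agreed structure.
    if header != EXPECTED_COLUMNS:
        return [(1, "format", "unexpected header")]
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_COLUMNS):
            findings.append((line_no, "format", "wrong number of fields"))
            continue
        country, period, value = row
        # Codes validation: coded fields must use known codes.
        if country not in COUNTRY_CODES:
            findings.append((line_no, "codes", "unknown country code"))
        # Data validation, 1st level: mandatory field and range check.
        if value == "":
            findings.append((line_no, "data", "mandatory value missing"))
            continue
        try:
            number = float(value)
        except ValueError:
            findings.append((line_no, "format", "value is not numeric"))
            continue
        if number < 0:
            findings.append((line_no, "data", "value out of range"))
    return findings
```

2nd- and 3rd-level validation (comparison with historical data or other sources, expert analysis) needs reference data and human judgement, so it is deliberately left out of this sketch.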

Data validation tools developed by Eurostat
eVE = eDAMIS Validation Engine
– Allows for a final check before transmitting data to Eurostat
– Covers format, codes and basic checks
– For files in SDMX-ML format; linked to DSD
EBB = Editing Building Block
– Allows importing of external reference files
– Can be configured for 2nd level validation
– For files with an agreed format applied by all data senders (csv, flr, sdmx-ml, sdmx-edi)
Different ways of coding validation rules
Validation of confidential data currently limited

VIP “Data validation”
VIP on efficiency gains in the validation process
Initial focus on Agriculture Statistics (Animal Production and Farm Structure Survey 2010)
Ultimate aim: improve efficiency in the production chain from MS to Eurostat through improvements in the validation process
Looks at different approaches to achieve efficiency gains:
– Implementing validation tools
– Rebalancing validation tasks – ‘the sooner the better’ approach
– Policy decisions and guidelines on the roles of different actors

EBB = Editing Building Block

EBB = Editing Building Block
Main Functionalities:
– Acceptance of various file formats and number of variables (limited by the DBMS column number capacity)
– Validation programs are parametric
– Not only validation but also variable creation
– Possibility to manipulate incoming datasets
– Information is persistent (data + metadata) and reusable

Functionality in detail
File management:
– Fixed length records
– Variable length records (delimited)
– SDMX-ML
– GESMES files
Scripting and web services
Web version (Dec 2010) and stand-alone version

Validation rules, Computations
Rules are logical expressions followed by:
– The rule name
– The rule severity
– The rule warning message
– A possible modification or creation of data depending on the rule result
Rules can be horizontal (within a record) or vertical (inter-record)
Special computations (outliers)
Output statistics (summary) and details for errors (which error, where in the dataset)
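The rule structure described above can be sketched in Python as follows. This is not EBB's actual rule syntax, which the slides do not show: the `Rule` class, the field names (`item`, `value`, `flag`) and both helper functions are hypothetical and only mirror the listed elements (logical expression, name, severity, message, optional data modification, horizontal versus vertical rules, summary and details).

```python
# Illustrative sketch only: not EBB's rule syntax; all names are assumptions.
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    severity: str                      # e.g. "error" or "warning"
    message: str
    check: Callable[[dict], bool]      # logical expression; True means the record passes
    on_fail: Optional[Callable[[dict], None]] = None  # optional data modification


def apply_rules(records, rules):
    """Apply horizontal (within-record) rules; return error details and a summary."""
    details = []
    for index, record in enumerate(records):
        for rule in rules:
            if not rule.check(record):
                details.append((index, rule.name, rule.severity, rule.message))
                if rule.on_fail is not None:
                    rule.on_fail(record)
    summary = Counter(name for _, name, _, _ in details)
    return details, summary


def total_matches_parts(records, value_key, total_label, tolerance=0):
    """Vertical (inter-record) rule: the 'total' record must equal the sum of the other records."""
    total = sum(r[value_key] for r in records if r["item"] == total_label)
    parts = sum(r[value_key] for r in records if r["item"] != total_label)
    return abs(total - parts) <= tolerance
```

A possible usage, with made-up rules: `Rule("R1", "error", "value must not be negative", lambda r: r["value"] >= 0)` only reports, while a rule with `on_fail=lambda r: r.update(flag="p")` also modifies the failing record. The `details` list answers "which error, where", and `summary` gives the count per rule.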

Dataset operations
– Copy file, select part of file
– Split file
– Aggregate
– Rename
– Merge
– Append
– Reorder lines or columns

The Architecture

Applied in the domains
– Foreign Trade
– ESSPROS
– AES, CVTS
– BOP
– EHIS
– Transport

eVE = eDAMIS Validation Engine

eDAMIS Validation Engine
Validation at the Single Entry Point
Based on SDMX
No installation or configuration in Member States
eDAMIS Web Forms: Real-Time Validation (in Production for some years)
eDAMIS Web Portal: New Validation Engine (available since eDAMIS 3.0, July 2010)
eDAMIS Web Application
– Server side validation for all eWA versions
– Local validation in eWA 3.1 (using rules from the server)

eVE – Data Validation Features
Same Validation Rule Syntax as Web Forms
Within one file and reference period
Different rule sets per reference period possible
Country specific rules
Mandatory values, Range checks
Basic expressions
Validation of confidential datasets (Portal or eWA 3.1)
Fully automatic transmission and validation workflow
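One way to picture rule sets that vary by reference period and by country is a lookup with country-specific overrides, sketched below in Python. This does not reproduce the eVE or Web Forms rule syntax, which the slides do not show: `RULE_SETS`, `rules_for`, `check_observation`, the field name `OBS_VALUE` used as a key, and every period, country and limit are assumptions made for illustration.

```python
# Illustrative sketch only: not the eVE rule syntax; periods, countries and limits are made up.
RULE_SETS = {
    # (reference_period, country) -> rules; country None means "all countries"
    ("2010", None): {"mandatory": ["OBS_VALUE"], "ranges": {"OBS_VALUE": (0, 1000)}},
    ("2010", "SE"): {"ranges": {"OBS_VALUE": (0, 5000)}},
}


def rules_for(period, country):
    """Merge the general rule set of a period with any country-specific overrides."""
    general = RULE_SETS.get((period, None), {})
    specific = RULE_SETS.get((period, country), {})
    mandatory = general.get("mandatory", []) + specific.get("mandatory", [])
    return {
        "mandatory": list(dict.fromkeys(mandatory)),  # de-duplicate, keep order
        "ranges": {**general.get("ranges", {}), **specific.get("ranges", {})},
    }


def check_observation(obs, period, country):
    """Run mandatory-value and range checks for one observation."""
    rules = rules_for(period, country)
    problems = []
    for field in rules["mandatory"]:
        if obs.get(field) in (None, ""):
            problems.append(f"{field}: mandatory value missing")
    for field, (low, high) in rules["ranges"].items():
        value = obs.get(field)
        if value not in (None, "") and not (low <= value <= high):
            problems.append(f"{field}: outside range {low}-{high}")
    return problems
```

In this sketch the same observation can pass for one country and fail for another, because the country-specific entry replaces the general range for that field while the mandatory-value rule is inherited.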

Workflow eDAMIS Validation Engine
[Workflow diagram. Labels, in extraction order: eWA; eWP; eDAMIS Server; Validation; SDMX Registry; DSDs; Eurostat Production Unit; MS Database; Web Service; SDMX Report; Browser; SDMX Converter; CSV; Settings]

Projects in Member States
Fisheries Pilot (May 2010)
– Workshops in Sweden, Latvia, UK, Romania
– Remote Testing in Netherlands (CBS)
– SDMX based collection starts in December 2010
Aviation Pilot (September 2010)
– Workshop with Statistik Austria
Results from both Pilot Projects
– Implementation of SDMX is simpler than expected
– Countries visited appreciated simple usage of eVE

Conclusion
Tools have been developed that could be shared and tested
For SDMX-ML data collections: eVE offers basic validation without further configuration
EBB can be integrated without changing data transmission formats. It allows for more complex validation.
More sophisticated validation requires further multi-disciplinary reflection.

Your feedback on:
– How to use these tools ESS-wide?
– Suggestions for directions of improvements