12. Validation services and the new Validation & Transformation Language (VTL)

Presentation transcript:

12. Validation services and the new Validation & Transformation Language (VTL). Marco Pellegrino and Andras Nikl, Eurostat Unit B5: “Central data and metadata services”. SDMX Basics course, 1-2 March 2017.

Outline of the presentation: VTL - Validation and Transformation Language; Validation Services – implementation.

Transformation activities. SDMX evolution: originally focused on data collection and dissemination. From 2011 on: supporting other stages of the statistical production process. Validation and transformation activities.

VTL governance. VTL is governed by the SDMX initiative. It is maintained under the umbrella of the SDMX TWG (Technical Working Group) by the VTL Task Force, composed of members of the SDMX TWG and SWG (Statistical Working Group) and other experts involved in DDI, GSIM and SDMX design and evolution. VTL is a “stand-alone” specification: it can be used with SDMX, DDI, GSIM or others, and it can be used on its own.

The main VTL goals: define and preserve V&T rules; exchange and share V&T rules; apply V&T rules in automated processes; make VTL applicable to several standards (e.g. SDMX, DDI, GSIM and possibly others). A very challenging target!

A language manipulates the artefacts of an IM (IM = information model). SDMX, DDI, GSIM … have different IMs, so a language for one of them wouldn’t fit the others. Hence a dedicated IM for VTL, designed to be very abstract and mappable to the IMs of SDMX, DDI, GSIM (and possibly others). VTL is used in SDMX, DDI, GSIM ... by mapping their artefacts to the VTL artefacts.
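The mapping idea on this slide can be pictured with a small Python sketch. The role table below is a simplified assumption made for illustration (dimensions as identifiers, the primary measure as a measure, data attributes as attributes); it is not the official SDMX-to-VTL mapping, and the names `SDMX_TO_VTL_ROLE`, `VtlComponent` and `map_dsd_to_vtl` are hypothetical.

```python
from dataclasses import dataclass

# Assumed, simplified role mapping between SDMX component kinds and
# VTL component roles. Illustrative only, not the official mapping table.
SDMX_TO_VTL_ROLE = {
    "Dimension": "Identifier",
    "TimeDimension": "Identifier",
    "PrimaryMeasure": "Measure",
    "DataAttribute": "Attribute",
}


@dataclass(frozen=True)
class VtlComponent:
    """A component of an abstract, VTL-style data structure (hypothetical type)."""
    name: str
    role: str


def map_dsd_to_vtl(dsd_components):
    """Translate (name, sdmx_kind) pairs into VTL-style components."""
    return [VtlComponent(name, SDMX_TO_VTL_ROLE[kind]) for name, kind in dsd_components]


# A toy data structure definition, invented for this example.
toy_dsd = [
    ("FREQ", "Dimension"),
    ("REF_AREA", "Dimension"),
    ("TIME_PERIOD", "TimeDimension"),
    ("OBS_VALUE", "PrimaryMeasure"),
    ("OBS_STATUS", "DataAttribute"),
]
vtl_structure = map_dsd_to_vtl(toy_dsd)
```

Once artefacts are expressed in the abstract roles, a rule written against those roles does not need to know which standard the data came from.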

Tuned information model. Section with the relationships between VTL and GSIM. Detailed information provided on each artefact.

Model for Variables and Value Domains. One of the main goals of the VTL standard is to provide organizations with the ability to acknowledge and apply transformation/validation rules defined by others. The VTL manipulation language can reference permanent artefacts as many times as needed. Reusable artefacts and rules are defined through the VTL definition language and reused through the VTL manipulation language.

VTL 1.0: published in March 2015 (http://sdmx.org/?page_id=5096): VTL part 1 (General description), VTL part 2 (Library of Operators), eBNF (Extended Backus-Naur Form) technical notation. VTL 1.1: in progress; language extensions, reusability of rules, structural validation. SDMX implementation: in progress; mapping of SDMX and VTL artefacts, messages for exchanging VTL rules, registry for storing VTL rules, web services for retrieving VTL rules.

Assessment of VTL 1.0. September 2014: VTL 1.0 released for public review. December 2014: ESSnet ValiDat WP4: assessment of VTL from the point of view of completeness, correctness and usability. March 2015: final release of VTL 1.0. April-December 2015: in-depth assessment of VTL 1.0 carried out by the ValiDat ESSnet.

Towards VTL 1.1. VTL 1.1 takes into account the ESSnet comments and: includes new operators proposed by NSIs and Eurostat; defines a set of "core" operators and a library of high-level operators; allows the creation of user functions; enhances the reusability of the VTL code.

High-level mechanisms: horizontal validation rules; vertical validation rules.
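The slide does not define the two kinds of rules. As a rough illustration, the Python sketch below reads "horizontal" as a check across the fields of a single record and "vertical" as a check across records (a parent total compared with the sum of its children). That reading, the toy data and the function names are assumptions, and the code is plain Python, not VTL syntax.

```python
def horizontal_check(record):
    """Rule within one record: the total must equal the sum of its parts."""
    return record["total"] == record["male"] + record["female"]


def vertical_check(records, parent, children):
    """Rule across records: the parent's total must equal the sum of the children's totals."""
    totals_by_area = {r["area"]: r["total"] for r in records}
    return totals_by_area[parent] == sum(totals_by_area[c] for c in children)


# Toy data invented for the example: one aggregate area and two member areas.
records = [
    {"area": "AGG", "male": 30, "female": 34, "total": 64},
    {"area": "A", "male": 10, "female": 12, "total": 22},
    {"area": "B", "male": 20, "female": 22, "total": 42},
]

horizontal_results = [horizontal_check(r) for r in records]
vertical_result = vertical_check(records, "AGG", ["A", "B"])
```

Both kinds of rule are declarative statements about the data, which is what makes them candidates for being stored, shared and reused.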

Timeline for VTL 1.1. October 2016: VTL 1.1 (General part and Reference manual) published at https://sdmx.org/?page_id=5096. 6 January 2017: deadline for public review. April 2017: final version of VTL 1.1 ready. 2017 Q2: decision gate on using VTL as a validation language for the ESS. For info: marco.pellegrino@ec.europa.eu. Pending challenges: improve VTL documentation; reference implementation; compliance kit; how to organize user and implementer support; how to involve user groups and communities.

Validation Services - Implementation. Eurostat - Directorate B, Unit B5 – Data and Metadata Services and Standards. Presenter: Andras Nikl, representing B5, the unit responsible for validation service implementation. Planned time: 10-15 mins. Objective: give external stakeholders insight into Eurostat's objectives regarding data validation; highlight business benefits and understand basic expectations. The presentation covers the following subjects: how standards and services build on each other (reminder); the As Is > To Be situation of the Eurostat validation infrastructure; benefits to stakeholders; walkthrough of the target architecture; the Validation Rule Manager as a tool exploiting VTL; pre-validation and reporting; implementation scenarios; implementation as a collaborative effort; support structure and continuous committed development; timeline. Open to feedback and requests.

Slide timeline labels: February 2017, Q4 2017. Duration: 3 min. Talking points: Explain the objectives of the presentation; welcome questions, comments or suggestions. Explain the sequence and interdependencies between standards and services. Standards in this context also refer to public repositories (SDMX, VTL-based validation rules). The goal of Eurostat is to receive local data of higher quality, consistency and predictability, with harmonized and transparent activities across the ESS. This is a joint interest with data providers: effectiveness and efficiency, high quality standards, transparent communications. Explain objectives: SDMX as the single standard for transparent and consistent data production; services for a single robust process. Explain the planned timeline in terms of service availability.

Duration: 6 min for slides 3-7. Talking points: Systems perspective: automation and standardization; industrial-grade data validation. We are offering a single, robust service solution to support a commonly agreed way to validate data. Context: the current technological landscape of Eurostat is fragmented, with multiple parallel systems, technologies and approaches. Our ultimate direction is identical, but parallel efforts result in lost opportunities, added cost and error-prone manual activities. Shareability, transparency, reusability of technology and knowledge. We are building a single, automated system, utilizing readily available technologies and reusing existing infrastructure. Tried and tested: precedents for implementation already exist (NA, STS, EA). Automation brings increased consistency, predictability and productivity. Benefits to business: compressed response times, ease of use, no validation ping-pong, influence on rule construction, reusable rules, development synergies, specialized support. Pre-validation. Walkthrough of process: the architecture is designed to execute validation and return results to the user. Edamis remains the single point of entry.

Talking points: The process manager orchestrates the process; all other components have a single purpose. This provides stability and makes the system easy to maintain and configure. Benefits to business: everything is configurable: messaging, business rules, validation rules, report structures, rule severities. The process manager is configured as per business requirements. All other components have a single purpose and are called when needed. Fully automated.
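The orchestration pattern described in these notes can be sketched in a few lines of Python: one configurable process manager calling single-purpose steps in a configured order and tagging findings with a configured severity. This is a minimal illustration under assumptions; the step names, the configuration layout and the function `run_process` are hypothetical and do not describe the actual Eurostat components.

```python
def check_structure(dataset):
    """Single-purpose step: report a problem if the dataset has no observations key."""
    return [] if "observations" in dataset else ["missing observations"]


def check_content(dataset):
    """Single-purpose step: report every negative observation value."""
    return [f"negative value: {v}" for v in dataset.get("observations", []) if v < 0]


# Registry of available steps; the process manager only knows them by name.
STEPS = {"structure": check_structure, "content": check_content}

# Hypothetical business configuration: which steps run, in which order,
# and which severity is attached to findings from each step.
CONFIG = {
    "steps": ["structure", "content"],
    "severity": {"structure": "ERROR", "content": "WARNING"},
}


def run_process(dataset, config, steps=STEPS):
    """Process manager: call each configured step and collect a flat report."""
    report = []
    for name in config["steps"]:
        for message in steps[name](dataset):
            report.append(
                {"step": name, "severity": config["severity"][name], "message": message}
            )
    return report


report = run_process({"observations": [1, -2, 3]}, CONFIG)
```

Changing which checks run, or how severe their findings are, only requires editing `CONFIG`; the steps themselves stay untouched.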

Talking points: Recap what the services are and what they do. STRUVAL: SDMX compliance (including checks on file format, well-formedness and validity of the XML structure); compliance with the relevant DSD; presence of mandatory fields. Up and running. Public SDMX registry for DSDs. Feedback positive.
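The three kinds of structural check listed for STRUVAL (well-formedness, compliance with a structure definition, presence of mandatory fields) can be illustrated with a toy Python example. This is only a sketch under assumptions: the XML layout is invented and is not SDMX-ML, `TOY_DSD` is a hypothetical stand-in for a real DSD, and `structural_check` is not the STRUVAL implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a data structure definition:
# mandatory fields plus the allowed codes for coded fields.
TOY_DSD = {
    "mandatory": ["FREQ", "REF_AREA", "OBS_VALUE"],
    "codes": {"FREQ": {"A", "Q", "M"}},
}


def structural_check(xml_text, dsd=TOY_DSD):
    """Return a list of structural errors for a toy XML data message."""
    # 1. Well-formedness: the parser rejects malformed XML.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    errors = []
    for i, obs in enumerate(root.iter("Obs")):
        # 2. Presence of mandatory fields.
        for field in dsd["mandatory"]:
            if field not in obs.attrib:
                errors.append(f"Obs {i}: missing mandatory field {field}")
        # 3. Compliance with the structure definition: coded values must be in the code list.
        for field, allowed in dsd["codes"].items():
            value = obs.attrib.get(field)
            if value is not None and value not in allowed:
                errors.append(f"Obs {i}: {field}={value} not in code list")
    return errors


good = '<DataSet><Obs FREQ="A" REF_AREA="LU" OBS_VALUE="1.5"/></DataSet>'
bad = '<DataSet><Obs FREQ="X" REF_AREA="LU"/></DataSet>'
broken = '<DataSet><Obs></DataSet>'
```

Calling `structural_check(good)` returns an empty list, while `bad` yields one error for the missing `OBS_VALUE` and one for the unknown `FREQ` code.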

Talking points: CONVAL: domain- and dataset-specific validation rules and rule sets, within or between files and sources (as a strategic goal). VRM and VTL: a collection of agreed and standardized rules and rule sets. Shareability and reusability of rules. VRM: an easy-to-use, intuitive rule creation and management tool for DMAs, with a graphical interface; easy to learn, no VTL scripting knowledge needed. Involvement of external users in rule creation is encouraged: collaboration leads to higher quality and transparency regarding goals.

Talking points: The result is sent back to Edamis (no change). If the dataset passes validation successfully, the data is transferred to further processing and dissemination. Two final points to make here: Full automation allows for pre-validation process runs, which inform the data provider and do not interrupt anyone else. Reporting: configurable, user-defined; error categorization, clear actions to be taken, INC management.

Duration: 3 min. for slides 8-10. Talking points: The implementation option is a local decision, and Eurostat service support will be available. Local replication: services are shared as software packages and process orchestration is a local responsibility; an independent and autonomous approach, with transmission only once the input is correct; may be desirable for sensitive data. Mixed model. BPaaS: reliance on robust Estat infrastructure with no local overhead; supported by Estat INC management and a dedicated point of contact. Clarify: transmitted data will automatically run through the Estat validation process in either scenario.

Talking points: Equivalent to BPaaS (Business Process as a Service) in cloud computing.

Implementation steps shown on the slide: Collect your requirements; Prepare and configure; Test and accept; Deploy to production and use. Duration: 3 min. Talking points: Service implementation is a collaborative activity; this is a business-driven change, not a technology-driven one. This slide explains the steps of the implementation process. Requirements: introduce the implementation steps and general areas of interest (collection of rules and dataflows etc.); business needs emphasized. Configuration: everything is provided for the business; 2-3 weeks. Testing: business involvement is not mandatory but recommended to build experience; external testers may be involved if desired. Deployment: piloting phase (limited participants), with parallel runs executed for risk-free introduction to production systems. Post-deployment support, change requests and ongoing development effort: this is only Phase 1.

Slide timeline labels: February 2017, Q4 2017. Duration: 1 min. Talking points: Repeat slide to highlight service availability timelines.

Thank you for your interest! Talking points: Open for questions.