Published by Jayson Bryant. Modified over 6 years ago.
1
12. Validation services and the new Validation & Transformation Language (VTL)
Marco Pellegrino and Andras Nikl
Eurostat Unit B5: “Central data and metadata services”
SDMX Basics course, 1-2 March 2017
2
Outline of the presentation
VTL - Validation and Transformation Language
Validation Services – implementation
3
Transformation activities
SDMX evolution: originally focused on data collection and dissemination
From 2011 on: supporting other stages of the statistical production process
Validation and Transformation activities
4
VTL governance
VTL is governed by the SDMX initiative
Maintained under the umbrella of the SDMX TWG (Technical Working Group) by the VTL Task Force, composed of members of the SDMX TWG and SWG (Statistical Working Group) and other experts involved in DDI, GSIM and SDMX design and evolution
VTL is a “stand-alone” specification
  It can be used with SDMX, DDI, GSIM or others
  It can be used on its own
5
The main VTL goals
Define and preserve V&T rules
Exchange and share V&T rules
Apply V&T rules in automated processes
Keep VTL applicable to several standards (e.g. SDMX, DDI, GSIM and possibly others)
A very challenging target!
6
A language manipulates the artefacts of an IM (IM = information model)
SDMX, DDI, GSIM … have different IMs
  A language for one of them wouldn’t fit the others
A dedicated IM for VTL
  Designed to be very abstract and mappable to the IMs of SDMX, DDI, GSIM (and possibly others)
Using VTL in SDMX, DDI, GSIM ...
  By mapping their artefacts to the VTL artefacts
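The mapping idea can be sketched in a few lines of Python. This is an illustration only: the role names on the VTL side (Identifier, Measure, Attribute) are the component roles of the VTL information model, but the lookup table, the function name `to_vtl_structure` and the sample structure are assumptions made for this example, not the official SDMX-VTL mapping.

```python
# Assumed, simplified mapping from SDMX component kinds to VTL component roles.
SDMX_TO_VTL_ROLE = {
    "Dimension": "Identifier",
    "TimeDimension": "Identifier",
    "PrimaryMeasure": "Measure",
    "DataAttribute": "Attribute",
}

def to_vtl_structure(sdmx_components):
    """Map (name, sdmx_kind) pairs to (name, vtl_role) pairs."""
    return [(name, SDMX_TO_VTL_ROLE[kind]) for name, kind in sdmx_components]

# Hypothetical data structure definition, reduced to component names and kinds.
sdmx_dsd = [
    ("FREQ", "Dimension"),
    ("TIME_PERIOD", "TimeDimension"),
    ("OBS_VALUE", "PrimaryMeasure"),
    ("OBS_STATUS", "DataAttribute"),
]

vtl_structure = to_vtl_structure(sdmx_dsd)
```

The same pattern would apply to DDI or GSIM: only the lookup table changes, while rules written against the VTL roles stay the same.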
7
Tuned information model
Section with the relationships between VTL and GSIM
Detailed information provided on each artefact
8
Model for Variables and Value Domains
One of the main goals of the VTL standard is to provide organizations with the ability to acknowledge and apply transformation/validation rules defined by others
The VTL manipulation language can reference permanent artefacts as many times as needed
Reusable artefacts and rules are defined through the VTL definition language and reused through the VTL manipulation language
9
VTL 1.0: published in March 2015 (http://sdmx.org/?page_id=5096)
  VTL part 1 (General description)
  VTL part 2 (Library of Operators)
  eBNF (Extended Backus-Naur Form) technical notation
VTL 1.1: in progress
  Language extensions
  Reusability of rules, structural validation
SDMX implementation: in progress
  Mapping of SDMX and VTL artefacts
  Messages for exchanging VTL rules
  Registry for storing VTL rules
  Web services for retrieving VTL rules
10
Assessment of VTL 1.0
September 2014: VTL 1.0 released for public review
December 2014: ESSnet ValiDat WP4: assessment of VTL from the point of view of completeness, correctness and usability
March 2015: Final release of VTL 1.0
April-December 2015: In-depth assessment of VTL carried out by the ValiDat ESSnet
11
Towards VTL 1.1
VTL 1.1 takes into account the ESSnet comments and …
Includes new operators proposed by NSIs and Eurostat
Defines a set of "core" operators and a library of high-level operators
Allows the creation of user-defined functions
Enhances the reusability of the VTL code
12
High-level mechanisms
Horizontal validation rules
Vertical validation rules
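To make the two rule families concrete, here is a minimal sketch in plain Python (not VTL syntax). It assumes that a horizontal rule compares components within a single data point, and that a vertical rule compares values across data points, for instance a total against the sum of its parts. The record layout, field names and figures are hypothetical.

```python
# Hypothetical data points: one row per sex category, with two measures.
rows = [
    {"country": "HU", "sex": "F", "employed": 40, "population": 50},
    {"country": "HU", "sex": "M", "employed": 45, "population": 48},
    {"country": "HU", "sex": "T", "employed": 85, "population": 99},
]

def horizontal_check(row):
    """Within one data point: employed must not exceed population."""
    return row["employed"] <= row["population"]

def vertical_check(rows, measure):
    """Across data points: the total (T) must equal F + M for one measure."""
    by_sex = {r["sex"]: r[measure] for r in rows}
    return by_sex["T"] == by_sex["F"] + by_sex["M"]

horizontal_results = [horizontal_check(r) for r in rows]
vertical_employed = vertical_check(rows, "employed")      # 40 + 45 == 85
vertical_population = vertical_check(rows, "population")  # 50 + 48 != 99
```

In this sample every row passes the horizontal rule, while the vertical rule flags the population total.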
13
Timeline for VTL 1.1
October 2016: VTL 1.1 (General part and Reference manual) published
6 January 2017: Deadline for public review
April 2017: Final version of VTL 1.1 ready
2017 Q2: Decision gate on using VTL as a validation language for the ESS
For info, pending challenges:
  Improve VTL documentation
  Reference implementation
  Compliance kit
  How to organize support for users and implementers
  How to involve user groups and communities
14
Validation Services - Implementation
Presenter: Andras Nikl, representing B5, the unit responsible for validation service implementation.
Planned time: mins.
Objective: Give external stakeholders insight into Eurostat's objectives regarding data validation. Highlight business benefits and understand basic expectations.
The presentation explores the following subjects:
  How standards and services build on each other (reminder)
  As-is and to-be situation of the Eurostat validation infrastructure
  Benefits to stakeholders
  Walkthrough of the target architecture
  Validation Rule Manager as a tool exploiting VTL
  Pre-validation and reporting
  Implementation scenarios
  Implementation as a collaborative effort
  Support structure and continuous, committed development
  Timeline
Open to feedback and requests
Eurostat - Directorate B, Unit B5 – Data and Metadata Services and Standards
15
Timeline labels on slide: Q4 2017, Q4 2017, February 2017
Duration: 3 min.
Talking points:
Explain the objectives of the presentation; welcome questions, comments or suggestions.
Explain the sequence of, and interdependencies between, standards and services. Standards in this context also refer to public repositories (SDMX, VTL-based validation rules).
The goal of Eurostat is to receive local data of higher quality, consistency and predictability, with harmonized and transparent activities across the ESS. This is a joint interest with data providers: effectiveness and efficiency, high quality standards, transparent communications.
Explain objectives: SDMX as the single standard for transparent and consistent data production; services for a single robust process.
Explain the planned timeline in terms of service availability.
16
Duration: 6 min for slides 3-7
Talking points:
Systems perspective: automation and standardization. Industrial-grade data validation. We are offering a single, robust service solution to support a commonly agreed way to validate data.
Context: The current technological landscape of Eurostat is fragmented, with multiple parallel systems, technologies and approaches. Our ultimate direction is identical, but parallel efforts result in lost opportunities, added cost and error-prone manual activities. Shareability, transparency, reusability of technology and knowledge.
We are building a single, automated system, utilizing readily available technologies and reusing existing infrastructure. Tried and tested: precedents of implementation already exist (NA, STS, EA). Automation brings increased consistency, predictability and productivity.
Benefits to business: compressed response times, ease of use, no validation ping-pong, influence on rule construction, reusable rules, development synergies, specialized support. Pre-validation.
Walkthrough of process: The architecture is designed to execute validation and return results to the user. Edamis remains the single point of entry.
17
Talking points: The process manager orchestrates the process; all other components have a single purpose. This provides stability and keeps the system easy to maintain and configure.
Benefits to business: Everything is configurable: messaging, business rules, validation rules, report structures, rule severities. The process manager is configured as per business requirements. All other components have a single purpose and are called when needed. Fully automated.
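The orchestration pattern described here can be sketched as follows. This is a minimal illustration, not the Eurostat implementation: the class `ProcessManager`, the two component functions and the `blocking` option are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Set

# Each component has a single purpose: take a dataset, return a list of findings.
Component = Callable[[dict], List[str]]

def structural_check(dataset: dict) -> List[str]:
    return [] if "observations" in dataset else ["missing observations"]

def content_check(dataset: dict) -> List[str]:
    return [f"negative value at index {i}"
            for i, v in enumerate(dataset.get("observations", [])) if v < 0]

class ProcessManager:
    """Runs the configured components in order; holds no validation logic itself."""

    def __init__(self, steps: List[str], registry: Dict[str, Component],
                 blocking: Set[str]):
        self.steps = steps          # configured order of execution
        self.registry = registry    # step name -> single-purpose component
        self.blocking = blocking    # steps whose findings stop the process

    def run(self, dataset: dict) -> Dict[str, List[str]]:
        report: Dict[str, List[str]] = {}
        for name in self.steps:
            findings = self.registry[name](dataset)
            report[name] = findings
            if findings and name in self.blocking:
                break
        return report

manager = ProcessManager(
    steps=["structural", "content"],
    registry={"structural": structural_check, "content": content_check},
    blocking={"structural"},
)
report_ok = manager.run({"observations": [1, -2, 3]})
report_bad = manager.run({})
```

Because the order, the components and the blocking behaviour are all passed in as configuration, changing the process means changing data rather than code, which is the point the slide makes about configurability.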
18
Talking points: Recap what the services are and what they do.
STRUVAL: SDMX compliance (including checks on file format, well-formedness and validity of the XML structure); compliance with the relevant DSD; presence of mandatory fields. Up and running. Public SDMX registry for DSDs. Feedback positive.
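The three kinds of structural check listed for STRUVAL can be illustrated with a small Python sketch. It does not use real SDMX-ML or the STRUVAL API: the XML layout, the `DSD` dictionary standing in for a data structure definition, and the function name `structural_validate` are all simplified assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DSD: allowed codes per dimension.
DSD = {"FREQ": {"A", "Q", "M"}, "GEO": {"HU", "IT"}}
# Hypothetical list of mandatory fields on each observation.
MANDATORY = ["FREQ", "GEO", "OBS_VALUE"]

def structural_validate(xml_text):
    """Return a list of structural errors; an empty list means the file passes."""
    # 1. Well-formedness: the file must parse as XML at all.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    errors = []
    for i, obs in enumerate(root.iter("Obs")):
        # 2. Presence of mandatory fields.
        for field in MANDATORY:
            if field not in obs.attrib:
                errors.append(f"Obs {i}: missing mandatory field {field}")
        # 3. Compliance with the (simplified) DSD code lists.
        for dim, codes in DSD.items():
            value = obs.attrib.get(dim)
            if value is not None and value not in codes:
                errors.append(f"Obs {i}: {dim}={value} not in code list")
    return errors

good = '<DataSet><Obs FREQ="A" GEO="HU" OBS_VALUE="1.5"/></DataSet>'
bad = '<DataSet><Obs FREQ="X" GEO="HU"/></DataSet>'
broken = '<DataSet><Obs'

good_errors = structural_validate(good)
bad_errors = structural_validate(bad)
broken_errors = structural_validate(broken)
```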
19
Talking points:
CONVAL: domain- and dataset-specific validation rules and rulesets, within or between files and sources (as a strategic goal).
VRM and VTL: collection of agreed and standardized rules and rule sets. Shareability and reusability of rules.
VRM: easy-to-use, intuitive rule creation and management tool for DMAs; graphical interface, easy to learn, no VTL scripting knowledge needed.
Involvement of external users in rule creation is encouraged: collaboration leads to higher quality and transparency regarding goals.
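A reusable ruleset with configurable severities could look like the sketch below. It is written in Python rather than VTL, and the `Rule` class, the rule identifiers and the severity labels "ERROR" and "WARNING" are assumptions for illustration, not CONVAL or VRM artefacts.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Rule:
    rule_id: str
    description: str
    severity: str                  # assumed labels: "ERROR" or "WARNING"
    check: Callable[[dict], bool]  # returns True when the record passes

# A shared ruleset: defined once, applicable to any dataset with a "value" field.
RULESET = [
    Rule("R1", "value must be non-negative", "ERROR", lambda r: r["value"] >= 0),
    Rule("R2", "value should be below 1000", "WARNING", lambda r: r["value"] < 1000),
]

def apply_ruleset(records: List[dict], ruleset: List[Rule]) -> List[Tuple[int, str, str]]:
    """Return (record index, rule id, severity) for every failed rule."""
    findings = []
    for idx, rec in enumerate(records):
        for rule in ruleset:
            if not rule.check(rec):
                findings.append((idx, rule.rule_id, rule.severity))
    return findings

def passes(findings: List[Tuple[int, str, str]]) -> bool:
    """A dataset passes when no finding has severity ERROR."""
    return not any(severity == "ERROR" for _, _, severity in findings)

records = [{"value": 5}, {"value": -1}, {"value": 2000}]
findings = apply_ruleset(records, RULESET)
dataset_passes = passes(findings)
warnings_only_pass = passes(apply_ruleset([{"value": 2000}], RULESET))
```

Keeping severity as data on the rule, rather than hard-coding it in the check, is what lets the same rule be an error for one domain and a warning for another.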
20
Talking points: The result is sent back to Edamis (no change); if the dataset passes validation, the data is transferred to further processing and dissemination.
Two final points to make here:
Full automation allows for pre-validation process runs. These inform the data provider and do not interrupt anyone else.
Reporting: configurable, user-defined. Error categorization, clear actions to be taken, INC management.
21
Duration: 3 mins. for slides 8-10
Talking points: Implementation options are a local decision, and Eurostat service support will be available.
Local replication: services are shared as software packages; process orchestration is a local responsibility. Independent and autonomous approach; transmission only once the input is correct. May be desirable in case of sensitive data.
Mixed model.
BPaaS (Business Process as a Service): reliance on robust Estat infrastructure, no local overhead. Supported by Estat INC management and a dedicated point of contact.
Clarify: transmitted data will automatically run through the Estat validation process in either scenario.
22
Duration: Talking points:
23
Duration: Talking points: Equivalent to BPaaS in cloud computing.
24
Collect your requirements
Prepare and configure
Test and accept
Deploy to production and use
Duration: 3 min.
Talking points: Service implementation is a collaborative activity; this is a business-driven change, not a technology-driven one. This slide explains the steps of the implementation process.
Requirements: Introduce implementation steps and general areas of interest (collection of rules and dataflows etc.). Business needs emphasized.
Configuration: All is provided for the business. 2-3 weeks.
Testing: Business involvement is not mandatory but is recommended to build experience. External testers may be involved if desired.
Deployment: Piloting phase (limited participants), and parallel runs executed for risk-free introduction to production systems.
Post-deployment support, change requests and ongoing development effort: this is only Phase 1.
25
Timeline labels on slide: Q4 2017, Q4 2017, February 2017
Duration: 1 min.
Talking points:
Repeat slide to highlight service availability timelines.
26
Thank you for your interest!
Talking points: Open for questions.