VTL: Validation and Transformation Language


VTL: Validation and Transformation Language
ESTP training course, Item 4
Luxembourg, 21-22 Nov 2017
Maurizio.Capaccioli@ec.europa.eu
Eurostat, Unit B1

VTL: the origin
- Based on a generic information model that can be used with different standards: SDMX, DDI, GSIM or others
- VTL is maintained by the VTL Task Force, composed of members of Eurostat, ECB, ILO, INEGI, Bank of Italy, ISTAT
- The VTL Task Force works under the umbrella of the SDMX Technical Working Group

VTL is a potential standard syntax for validation created by the SDMX Sponsors. The aim of VTL is to provide an unambiguous language to communicate validation rules between different statistical organisations. Through the ESS.VIP Validation, Eurostat contributed to the development of VTL 1.0, which was published in March 2015. The ESS.VIP Validation has also conducted a pilot translation of validation rules into VTL in selected domains (i.e. SIMSTAT, Animal Production Statistics, National Accounts coming soon). Through the ESSnet, the ESS.VIP Validation is also conducting a thorough review of VTL to see whether it can meet the criteria for a standard ESS validation language.

SDMX implementation: in progress
- Mapping of SDMX and VTL artefacts
- Messages for exchanging VTL rules
- Registry for storing VTL rules
- Web services for retrieving VTL rules

SDMX, which stands for Statistical Data and Metadata eXchange, is an international initiative that aims at standardising and modernising ("industrialising") the mechanisms and processes for the exchange of statistical data and metadata among international organisations and their member countries. SDMX is sponsored by seven international organisations: the Bank for International Settlements (BIS), the European Central Bank (ECB), Eurostat (Statistical Office of the European Union), the International Monetary Fund (IMF), the Organisation for Economic Co-operation and Development (OECD), the United Nations Statistics Division (UNSD), and the World Bank.

VTL – purposes
- provide an unambiguous language to communicate validation rules between different statistical organisations
- provide a high-level language to document the data transformations
- provide an efficient language for implementing data validation services
- provide an efficient language for implementing data transformations

The ESS.VIP Validation took as a starting point for its activities the following definition given by the UNECE: data validation is "an activity aimed at verifying whether the value of a data item comes from the given (finite or infinite) set of acceptable values." Data validation is focused on checking the validity and consistency of data. Checking process or structural metadata is not within the scope of validation, though process or structural metadata may serve as input to validation procedures.

Versions of VTL
- VTL 1.0 published in March 2015
- Collection of comments (public review)
- VTL 1.1 published in November 2016
- VTL 2.0 will be published in December 2017

SDMX web site: http://sdmx.org/?page_id=5096
- VTL part 1 (General description)
- VTL part 2 (Library of Operators)
- eBNF (Extended Backus-Naur Form) technical notation

VTL – main principles
- Most of the VTL operators operate on datasets
- A dataset is described by dimensions, measures and attributes

Example: dataset ds_bop_1
Columns: REF_AREA, PARTNER, TIME (dimensions); OBS_VALUE (measure); OBS_STATUS (attribute)
Values (row layout not recoverable from the transcript): EU25 CA 2010 20 D BG 1 P RO EU27 23
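The dataset structure described above can be sketched in Python. This is only an illustration, not part of VTL: the role assignment follows the usual SDMX reading of the column names, the helper names `key` and `has_unique_keys` are hypothetical, and the data values are invented (the slide's own table did not survive extraction).

```python
# Role of each component, following the column names shown on the slide.
ROLES = {
    "REF_AREA": "dimension",
    "PARTNER": "dimension",
    "TIME": "dimension",
    "OBS_VALUE": "measure",
    "OBS_STATUS": "attribute",
}

# Hypothetical data points; the codes "AA", "BB", "CC" are invented.
ds_example = [
    {"REF_AREA": "AA", "PARTNER": "BB", "TIME": 2010, "OBS_VALUE": 20, "OBS_STATUS": "D"},
    {"REF_AREA": "CC", "PARTNER": "BB", "TIME": 2010, "OBS_VALUE": 1, "OBS_STATUS": "P"},
]


def key(point):
    """Return the tuple of dimension values that identifies a data point."""
    return tuple(point[name] for name, role in ROLES.items() if role == "dimension")


def has_unique_keys(dataset):
    """A dataset should not contain two data points with the same dimension values."""
    keys = [key(p) for p in dataset]
    return len(keys) == len(set(keys))
```

The dimensions together identify a data point, which is why operators can later match points from two datasets by their dimension values.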

VTL – main principles
Example of a typical VTL operation:

    ds3 := ds1 + ds2

Operations carried out by VTL:
- join the data points of ds1 and ds2 using the dimension values
- apply the scalar function "+" to all pairs of numeric measures of ds1 and ds2 having the same name
- if desired, execute an attribute propagation function defined by the user (e.g. concatenate the "flag" attribute of the two data points)
- create a temporary dataset containing the resulting data points
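The four steps above can be mimicked in a few lines of Python. This is a minimal sketch of the semantics, not a VTL engine: the function name `add_datasets`, its parameters and the sample data are all hypothetical, and it assumes that data points without a match in the other dataset are dropped.

```python
def add_datasets(ds1, ds2, dimensions, measures, propagate=None):
    """Sketch of ds3 := ds1 + ds2 on lists of dict-shaped data points."""
    # Step 1: index ds2 by its dimension values so points can be joined.
    index2 = {tuple(p[d] for d in dimensions): p for p in ds2}
    result = []
    for p1 in ds1:
        p2 = index2.get(tuple(p1[d] for d in dimensions))
        if p2 is None:
            continue  # assumption: unmatched data points are discarded
        out = {d: p1[d] for d in dimensions}
        # Step 2: apply "+" to each pair of measures with the same name.
        for m in measures:
            out[m] = p1[m] + p2[m]
        # Step 3: optional user-defined attribute propagation.
        if propagate is not None:
            out.update(propagate(p1, p2))
        # Step 4: collect the resulting data point.
        result.append(out)
    return result


# Hypothetical input data.
ds1 = [
    {"ref_area": "AA", "time": 2010, "obs_value": 1, "flag": "P"},
    {"ref_area": "BB", "time": 2010, "obs_value": 5, "flag": "D"},
]
ds2 = [{"ref_area": "AA", "time": 2010, "obs_value": 2, "flag": "E"}]

ds3 = add_datasets(
    ds1, ds2,
    dimensions=["ref_area", "time"],
    measures=["obs_value"],
    propagate=lambda a, b: {"flag": a["flag"] + b["flag"]},  # concatenate flags
)
```

Here `ds3` holds a single data point for ("AA", 2010) with obs_value 3 and flag "PE"; the "BB" point has no partner in ds2 and is left out under the stated assumption.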

Example of VTL validation rules
- Hierarchical validation rules
- Data point validation rules
- Time-series rules
- Boolean conditions

    check ( ds1#obs_value >= 0 )

VTL – hierarchical ruleset
Hierarchical ruleset: hr_euro_agg

N.  Antecedent variables: time    Rule variables: ref_area
1                                 EU15 = AT + BE + LU + DE + ES + FI + FR + EL + IE + IT + NL + PT + DK + UK + SE
2                                 EU25 = EU15 + CY + CZ + EE + HU + LT + LV + MT + PL + SK + SI
3                                 EU27 = EU25 + BG + RO
4                                 EU28 = EU27 + HR
5   time between 1995 and 2003    EU = EU15
6   time between 2004 and 2005    EU = EU25
7   time between 2006 and 2012    EU = EU27
8   time >= 2013                  EU = EU28

VTL syntax:

    define hierarchical ruleset hr_euro_agg ( antecedent variable = time, variable = ref_area ) is
        EU15 = AT + BE + LU + DE + ES + FI + FR + EL + IE + IT + NL + PT + DK + UK + SE ;
        EU25 = EU15 + CY + CZ + EE + HU + LT + LV + MT + PL + SK + SI ;
        EU27 = EU25 + BG + RO ;
        EU28 = EU27 + HR ;
        when time between 1995 and 2003 then EU = EU15 ;
        when time between 2004 and 2005 then EU = EU25 ;
        when time between 2006 and 2012 then EU = EU27 ;
        when time >= 2013 then EU = EU28 ;
    end hierarchical ruleset
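To make the idea of a hierarchical check concrete, here is a rough Python sketch that compares each aggregate with the sum of its components. It is not how a VTL engine is implemented: the names `RULES` and `check_hierarchy` are hypothetical, only two of the unconditional rules are included, the time-dependent "when" rules are left out, the values are invented, and rules with a missing code are simply skipped (an assumption, since real engines offer several ways to treat missing values).

```python
# Two unconditional rules from hr_euro_agg, as (aggregate, components).
RULES = [
    ("EU27", ["EU25", "BG", "RO"]),
    ("EU28", ["EU27", "HR"]),
]


def check_hierarchy(values, rules):
    """Return (aggregate, reported, expected) for every rule that fails.

    `values` maps a ref_area code to its obs_value for one time period,
    with all other dimensions held fixed.
    """
    failures = []
    for total, parts in rules:
        # Assumption: a rule is skipped when any code it needs is missing.
        if total not in values or any(p not in values for p in parts):
            continue
        expected = sum(values[p] for p in parts)
        if values[total] != expected:
            failures.append((total, values[total], expected))
    return failures


# Hypothetical observations for a single year.
values = {"EU25": 100, "BG": 3, "RO": 4, "EU27": 110}
failures = check_hierarchy(values, RULES)
```

With these invented numbers the EU27 rule fails (110 reported against 107 expected), while the EU28 rule is skipped because neither EU28 nor HR is present.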

VTL – datapoint validation ruleset

    create datapoint ruleset dr_flow_positive ( flow, obs_value )
        when flow = "IMP" or flow = "EXP" then obs_value > 0 ;
    end datapoint ruleset

The datapoint ruleset:
- is defined on the variables flow and obs_value
- verifies that in each data point of the dataset to be validated (not shown here) the component obs_value is greater than zero when the flow is "IMP" or "EXP"

The above syntax creates a ruleset (a permanent object) named "dr_flow_positive".
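The logic of this ruleset can be restated as a small Python predicate applied to each data point in turn. This is an illustrative sketch only: the helper `failing_points`, the flow code "BAL" and the sample values are hypothetical, and the sketch assumes that a data point whose flow is neither "IMP" nor "EXP" passes the rule because the condition does not apply.

```python
def dr_flow_positive(point):
    """Return True when the data point satisfies the rule."""
    if point["flow"] in ("IMP", "EXP"):
        return point["obs_value"] > 0
    return True  # assumption: the rule does not apply to other flows


def failing_points(dataset, rule):
    """Return the data points that violate the rule."""
    return [p for p in dataset if not rule(p)]


# Hypothetical data to validate.
ds_example = [
    {"flow": "IMP", "obs_value": 20},
    {"flow": "EXP", "obs_value": 0},
    {"flow": "BAL", "obs_value": -5},
]
violations = failing_points(ds_example, dr_flow_positive)
```

Only the "EXP" point with obs_value 0 is reported: the "IMP" point is positive, and the negative "BAL" point is outside the rule's scope.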

VTL – checking Boolean conditions

    ds_result := check ( ds1 # obs_value > 1000, errorcode ( "Value must be greater than 1000" ) )
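As a rough Python analogue of the statement above, the sketch below evaluates a Boolean condition per data point and attaches the error code to those that fail. The function `check` here is a hypothetical stand-in, not the VTL operator, and the data are invented. It assumes that only failing data points are returned; whether the real operator returns failing points only or every point with a Boolean result depends on the VTL version and options.

```python
def check(dataset, condition, errorcode=None):
    """Return a copy of each failing data point, tagged with the error code."""
    return [dict(p, errorcode=errorcode) for p in dataset if not condition(p)]


# Hypothetical dataset.
ds1 = [
    {"ref_area": "AA", "obs_value": 1500},
    {"ref_area": "BB", "obs_value": 900},
]

ds_result = check(
    ds1,
    lambda p: p["obs_value"] > 1000,
    errorcode="Value must be greater than 1000",
)
```

Under that assumption `ds_result` contains only the "BB" data point, carrying the error message alongside its original components.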

Exercise 1
VTL code:

    ds_result := check ( ds_bop # time_period between 2008 and 2015, errorcode ( "_____" ), errorlevel ( "Error" ) ) ;

ds_bop is the dataset containing the data to be validated.
Question: What is the correct text (error message) to be inserted in _____ ?

Exercise 2
Dataset ds_bop1
Columns: REF_AREA, PARTNER, TIME, OBS_VALUE, OBS_STATUS
Values (row layout not recoverable from the transcript): EU25 CA 2010 20 D BG 1 P RO EU27 23
VTL code:

    ds_result := check ( ds_bop1 # obs_value, hr_euro_agg ) ;

hr_euro_agg is the hierarchical ruleset described in slide 9.
Question: What is the data point contained in the ds_result dataset?

Exercise 3
Dataset ds_bop1
Columns: REF_AREA, PARTNER, FLOW, TIME, OBS_VALUE, OBS_STATUS
Values (row layout not recoverable from the transcript): EU25 CA IMP 2010 20 D BG 1 P RO EU27 23
VTL code:

    ds_result := check ( ds_bop1, dr_flow_positive ) ;

dr_flow_positive is the datapoint ruleset described in slide 10.
Question: What is the data point contained in the ds_result dataset?

VTL – assessment of usability
Assessment of usability by statisticians:
- Covering several domains: Animal Production, Asylum, International Trade in Services, National Accounts, Short Term Statistics
- Participation of 8 countries + Eurostat

Some comments received:
- Rules in plain English and examples of bad/good data are both essential
- Rules in VTL may be useful as a complement (to limit risks of ambiguity)
- Need to agree on the way to express the rule (negative or positive)

Development of VTL tools
IT tools and services under development:
- ECB: VTL parser
- Norway: Java API based on JSON-stat format, https://github.com/statisticsnorway/java-vtl
- Poland: VTL to SQL translator, UNECE paper
- Istat: VTL Editor
- ESTAT: Compiler (part of the Validation Service)
- ESTAT: Validation Rule Manager
- ESTAT: Sandbox (simple GUI + VTL translator to SQL)

Use of VTL
- ECB BIRD portal: VTL is used to document the data validations and transformations of the statistical process. http://banks-integrated-reporting-dictionary.eu/bird-group
- Continuous Capture of Metadata: there is a proposal to use VTL as a common language to describe data transformations. http://c2metadata.org/

Thank you for your attention! Any questions?