SDMX Global Conference, September 2015
SDMX into the future
VTL (Validation and Transformation Language)
A new technical standard for enhancing data validation and processing
Vincenzo Del Vecchio, Bank of Italy
Marco Pellegrino, Eurostat
SDMX TWG & VTL Task Force

Approach
- SDMX originally focused on data collection and dissemination
- Current trend: support more stages of the statistical production process
- Reference: Generic Statistical Business Process Model

What is VTL
- A standard language for defining validation and transformation rules
  - Validation (now)
  - Transformation (partially now, to be enriched at a later stage)
- Main goals:
  - Define and preserve validation and transformation rules
  - Exchange and share rules
  - Apply rules in industrialized processes
  - Apply to several standards (e.g. SDMX, DDI, GSIM) thanks to a generic information model

The VTL Information Model
- VTL is a “stand-alone” specification
  - It can be used with SDMX, DDI, GSIM or potentially anything else
  - It can be used on its own
- VTL has its own information model
  - All kinds of data are modelled as mathematical functions having independent variables (Identifiers) and dependent variables (Measures and Attributes)
  - The GSIM information model is used as a basis
  - It can be mapped against SDMX
  - It can be mapped against DDI

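As a rough illustration of the model above, a Data Set can be sketched in Python as a mapping from Identifier values to Measure and Attribute values. This is only a sketch: the names `DataPoint`, `population`, `obs_status` and the sample figures are hypothetical and are not taken from the VTL specification.

```python
from typing import NamedTuple

# Hypothetical data set: population by age group and civil status.
# Identifiers (independent variables) form the dictionary key;
# Measures and Attributes (dependent variables) form the value.
class DataPoint(NamedTuple):
    population: int   # Measure
    obs_status: str   # Attribute (assumed coding: "A" = normal, "E" = estimated)

d1 = {
    ("0-14", "single"): DataPoint(population=1200, obs_status="A"),
    ("15-64", "single"): DataPoint(population=3400, obs_status="A"),
    ("15-64", "married"): DataPoint(population=5100, obs_status="E"),
}

def lookup(data_set, age, civil_status):
    """Evaluate the data set as a function of its Identifiers."""
    return data_set[(age, civil_status)]

point = lookup(d1, "15-64", "married")
print(point.population, point.obs_status)
```

Treating the Identifiers as the key makes the "function" reading explicit: each combination of Identifier values maps to at most one Data Point.
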
Main VTL drivers (1)
- Business orientation
  - Designed for use by subject matter experts
- Integrated approach
  - Any kind of data
  - Independent of the phase of the process
  - Unique language for validation and calculation

Main VTL drivers (2)
- IT implementation independence
  - Independent of IT tools
  - Allowing multiple tools
  - Resilient to changes in tools
- Active role for processing
  - Formal (described by means of BNF)
  - Able to drive the validation & calculation software
- Extensible and customizable

VTL 1.0 Operators

VTL features (1)
- Declarative language based on Expressions
    D4 := Check( (D1 - D2) = D3 )
  - D1, D2, D3: Operands
  - D4: Result
  - Check, -, =: Operators
- Operates on Data Sets (SDMX Dataflows)
  - D1, D2, D3, D4 are typically Data Sets, e.g.:
    - D1: population at time T by age and civil status
    - D2: population at time T-1 by age and civil status
    - D3: population flows between T-1 and T by age and civil status
    - D4: consistency of population figures (true / false), by age and civil status
- ... and on parts of Data Sets, e.g. Time Series, Cross Sections, single Data Points

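The population check above can be mimicked in Python to show what a Data Set level expression evaluates to. This is a minimal sketch under stated assumptions: data sets are plain dictionaries keyed by (age, civil status), the figures are invented, and `check_difference` is a hypothetical helper, not the VTL `Check` operator, whose exact semantics are defined in the VTL 1.0 documents.

```python
# Hypothetical data sets keyed by (age, civil_status); each value is the single measure.
d1 = {("15-64", "single"): 3400, ("15-64", "married"): 5100}  # population at T
d2 = {("15-64", "single"): 3300, ("15-64", "married"): 5000}  # population at T-1
d3 = {("15-64", "single"): 100, ("15-64", "married"): 90}     # flows between T-1 and T

def check_difference(a, b, c):
    """For each key present in all three inputs, report whether a - b == c."""
    common_keys = a.keys() & b.keys() & c.keys()
    return {key: (a[key] - b[key]) == c[key] for key in sorted(common_keys)}

# Sketch of D4 := Check( (D1 - D2) = D3 ): one boolean per (age, civil_status).
d4 = check_difference(d1, d2, d3)
print(d4)
```

The result has the same Identifiers as the operands and a boolean measure, which matches the description of D4 as a consistency flag by age and civil status.
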
VTL features (2)
- Supports operations on many types of statistical data, e.g.:
  - Dimensional and Unit data
  - Survey and register data
  - Quantitative and qualitative data, ...
- And can combine them, e.g.:
  - D1: Securities Register (by security id)
  - D2: Securities Holdings (by security holder, security id, date)
    D3 := merge (D1, D2, on (D1#sec_id = D2#sec_id), return D2#sec_holder, D2#sec_id, D1#sec_type)
  (produces D3 by adding to D2 the security type taken from D1)

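The merge of the register and the holdings can be pictured with a small Python sketch. It assumes inner-join behaviour (holdings with no matching register entry are dropped), which the slide does not state; the sample records and the helper `merge_on_sec_id` are hypothetical.

```python
# D1: securities register, keyed by security id (hypothetical values).
d1 = {"SEC1": {"sec_type": "bond"}, "SEC2": {"sec_type": "share"}}

# D2: securities holdings, one record per (holder, security id, date).
d2 = [
    {"sec_holder": "H1", "sec_id": "SEC1", "date": "2015-06-30"},
    {"sec_holder": "H2", "sec_id": "SEC2", "date": "2015-06-30"},
    {"sec_holder": "H2", "sec_id": "SEC9", "date": "2015-06-30"},  # not in the register
]

def merge_on_sec_id(register, holdings):
    """Attach the register's sec_type to each holding with a matching sec_id.

    Assumption: unmatched holdings are dropped (inner join).
    """
    return [
        {
            "sec_holder": holding["sec_holder"],
            "sec_id": holding["sec_id"],
            "sec_type": register[holding["sec_id"]]["sec_type"],
        }
        for holding in holdings
        if holding["sec_id"] in register
    ]

d3 = merge_on_sec_id(d1, d2)
for row in d3:
    print(row)
```

Only the three components named in the `return` clause of the expression are kept in the result.
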
VTL features (3)
- Can concatenate expressions
    D4 := Check( (D1 - D2) = D3 )
    D5 := if D4 = False then D2 else D1
  (the result of the former is an operand of the latter)
- Considers validation as a kind of Transformation (calculation), in order to:
  - Use a common language
  - Use validations and calculations together, e.g.:
    Validation:  D4 := Check( (D1 - D2) = D3 )
    Calculation: D5 := if D4 = False then D2 else D1

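Chaining a validation into a calculation can be sketched in Python as two steps, where the output of the first feeds the second. The single-letter keys, the figures, and the point-wise reading of `if ... then ... else` are assumptions made for illustration only.

```python
# Hypothetical single-measure data sets sharing the same keys.
d1 = {"A": 3400, "B": 5100}
d2 = {"A": 3300, "B": 5000}
d3 = {"A": 100, "B": 90}

# Validation step, sketching D4 := Check( (D1 - D2) = D3 ).
d4 = {key: (d1[key] - d2[key]) == d3[key] for key in d1}

# Calculation step, sketching D5 := if D4 = False then D2 else D1.
# Assumed to apply point by point: where the check failed take D2, otherwise D1.
d5 = {key: (d2[key] if d4[key] is False else d1[key]) for key in d4}

print(d4)
print(d5)
```

Here the check passes for key "A" and fails for key "B", so D5 takes the D1 value for "A" and the D2 value for "B".
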
The Transformations graph
[Figure: a graph of Data Sets and Transformations spanning Collection activities n.1, n.2 and n.3, Analysis & research models, Statistical products, and Publications]
Legend: D i = Data Set i; T j = Transformation j

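Because the result of one Transformation can be an operand of another, a set of expressions forms a dependency graph that an engine could use to decide the order of calculation. The sketch below uses Python's standard `graphlib` module and the two expressions from the "VTL features (3)" slide; it is an illustration of the idea, not a description of how any VTL implementation works.

```python
from graphlib import TopologicalSorter

# Each result maps to the set of data sets it is computed from.
dependencies = {
    "D4": {"D1", "D2", "D3"},  # D4 := Check( (D1 - D2) = D3 )
    "D5": {"D4", "D1", "D2"},  # D5 := if D4 = False then D2 else D1
}

# static_order() yields every node after all of its predecessors.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Any order produced this way lists D1, D2 and D3 before D4, and D4 before D5.
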
VTL features (4)
- VTL 1.0 allows:
  - Persistent and temporary results
  - Operations on mono-measure and multi-measure data
  - Dealing with missing data
  - Dealing with Attributes and their propagation rules
- VTL 1.1 will introduce:
  - Other operators, mainly for validation purposes
  - Reusable rules
  - Bug fixing, fine tuning

VTL status
- VTL 1.0: published in March 2015
  - Documents: VTL part 1, VTL part 2, and the BNF (Extended Backus-Naur Form) technical notation
- VTL 1.1 (language extensions): work in progress
- SDMX implementation: work in progress
  - Messages for exchanging VTL rules
  - Registry for storing VTL rules
  - Web services for retrieving VTL rules

Governance and Standards Alignment
- VTL is maintained by the SDMX TWG through the VTL Task Force
  - Extensions will be considered for inclusion in future versions
- VTL has already produced some feedback to GSIM for the next version
  - VTL can be mapped against SDMX
  - VTL can be directly utilized by DDI in those places where computations are included
  - As GSIM processing rules

SDMX into the future
Contribute to VTL 1.1!
Comments on VTL 1.0 and suggestions for improvement can be sent to the SDMX Technical Working Group.
Thanks for your attention!