SDMX Tools Architecture

Raynald PALMIERI
June 2015, IPA Workshop on ITS Statistics

Technical integration
International organisations provide the source SDMX Data Structure Definitions and Dataflows; national statistical organisations map their national databases (national data sources) to them. The SDMX implementation challenge: a metadata-driven process based on SDMX standards and tools, which makes full automation of the data exchange possible.

Registry

Work roles and process
The collecting organisation creates the DSD and publishes it in the SDMX Registry. Data providers consult and get the DSD from the registry, format/structure their data as SDMX messages and send them; the collecting organisation also consults the DSD to interpret the SDMX messages it receives.
Now we'll turn to another important module of the standard: the IT architecture. The IT architecture involves the standard formats for data exchange, different architectures for data exchange, and the SDMX registry.

SDMX Registry
The registry is only concerned with providing the information needed to access data and reference metadata sets. An application that wants a particular data or metadata set queries the registry for the URL, and then retrieves the data or metadata set directly from the provider's web server. Examples of existing SDMX registries: Euro SDMX Registry (2.1) and SDMX Global Registry (2.1).

The Euro-SDMX Registry is a metadata registry which implements the SDMX registry specifications. It provides a web-based user interface as well as web services for interacting with the SDMX structural metadata objects in use within Eurostat and with statistical partners. It enables National Statistics Institutes and other external organisations to obtain Data Structure Definitions (DSDs) and other structural metadata, such as the Metadata Structure Definitions (MSDs) for the Euro-SDMX Metadata Structure (ESMS).

The business vision for SDMX envisages a "data sharing" model to facilitate data and metadata exchange. The SDMX Registry is tasked with providing structure, organisation, maintenance and query interfaces for most of the SDMX components required to support the data sharing vision. The "data sharing" model is based on the possibility of easily discovering where data and metadata are available and how to access them. The SDMX Registry plays an important role in this architecture: it can be seen as a central application, accessible to other programs over the Internet (or an intranet or extranet), that provides the information needed to facilitate the reporting, collection and dissemination of statistics. In broad terms, the SDMX Registry (as understood in web services terminology) is an application which stores metadata for querying, and which can be used by any other application in the network with sufficient access privileges.

It can be seen as the index of a distributed database or metadata repository made up of all the data providers' data sets and reference metadata sets within a statistical community. It is important to stress that registry services are not concerned with the storage of data or reference metadata sets: these are stored elsewhere, on the data providers' sites.

The SDMX registry specifications define the logical interfaces for the SDMX registry in terms of the functions required and the data that may be present in each function call, the behaviour expected of the registry, and normative rules regarding the Registry Interface and the identification of registry objects.

The Euro-SDMX Registry can be reached via its web graphical user interface at https://webgate.ec.europa.eu/sdmxregistry. It provides the registry web services according to the SDMX Version 2.0 standards. Eurostat has also deployed a training instance of the Registry, which can be used as a "sandbox" for training courses and presentations without the risk of modifying the real registry. Access to the training environment can be provided on request. The tool is licensed under the terms of the European Union Public Licence V.1.1. A complete SDMX Registry package is stored in the CIRCA repository and can be downloaded as EuroSdmxRegistry_Java_v3.3.19_2011.08.19.zip. The package contains the application, its source code and documentation (User Manual, Installation Guide, Analysis and Design, and Testing Documentation). All versions of the tool can be downloaded from the CIRCA repository.
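The lookup pattern described above (the registry acts as an index that says where data live, not a store of the data itself) can be sketched in a few lines of Python. This is a toy illustration only: the `Registration` and `Registry` classes, the provider names and the `.example` URLs are hypothetical, `NA_MAIN` is used merely as a sample dataflow ID, and nothing here reproduces the Euro-SDMX Registry web service API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Registration:
    """Hypothetical registry entry: which provider offers which dataflow, and where."""
    provider: str
    dataflow: str
    url: str


class Registry:
    """Toy in-memory index: stores the locations of data sets, never the data."""

    def __init__(self) -> None:
        self._entries = []

    def register(self, entry: Registration) -> None:
        self._entries.append(entry)

    def find(self, dataflow: str, provider: Optional[str] = None) -> list:
        # Return the URLs of every registered data set for the dataflow,
        # optionally restricted to a single data provider.
        return [
            e.url
            for e in self._entries
            if e.dataflow == dataflow and (provider is None or e.provider == provider)
        ]


registry = Registry()
registry.register(Registration("NSI_A", "NA_MAIN", "https://nsi-a.example/sdmx/NA_MAIN"))
registry.register(Registration("NSI_B", "NA_MAIN", "https://nsi-b.example/sdmx/NA_MAIN"))

# A client first asks the registry where the data are ...
urls = registry.find("NA_MAIN")
# ... and would then fetch each data set directly from the provider's web server.
print(urls)
```

The design choice mirrors the text: the registry answers "where", and retrieval is a separate step against the provider, so the registry never becomes a bottleneck for data volumes.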

SDMX Global Registry (SGR)
Contains SDMX DSDs and artefacts that are used and maintained by national and international organisations. It is designed so that it can be developed into a global portal for SDMX structural data and metadata, providing information not only about how statistical information is structured, but also about where the related data can be accessed.

Where to find the BOP DSD? http://sdmx.org/?page_id=1747

Registry Content: navigate the content of the registry, search for an item that contains the text entered, and get a technical view of SDMX objects.

Registry Content: dimensions, measure and attributes of a DSD. The DSD can be downloaded as an XML file. Clicking on a concept in the left frame displays the corresponding list of codes (if one exists).

View the XML file in your browser (DSD NA_MAIN)

SDMX Tools

SDMX Tools for Data Providers (tool offered: use / action)
Web Forms (EU): Excel-like templates for the transmission of low volumes of data / no costs for EU organisations; manual work for senders (type, copy/paste)
SDMX Converter (sender's PC): converts data files between SDMX formats and other file formats / installation on sender's PC; manual work for senders (convert)
SDMX Converter (batch mode): converts data files between SDMX formats and other file formats / installation on server
SDMX Reference Infrastructure: a set of tools that allows you to connect your IT systems to the SDMX world / mapping of database to DSDs
Own SDMX Implementation: local development in organisations

Data provider view: one goal, different possibilities
Webforms -> EDAMIS Single Entry Point
Excel sheets -> SDMX Converter -> SDMX-ML file
Database export -> SDMX Converter -> SDMX-ML file
Database -> SDMX Reference Infrastructure -> SDMX Web Service -> SDMX-ML file
In all cases: same DSD, same format.

Data provider view: one goal, different IT architectures
Webforms: web-based, push mode
Excel sheets + SDMX Converter: local conversion, push mode
Database export + SDMX Converter: local conversion, push mode
Database + SDMX Reference Infrastructure: pull mode

SDMX Dataset based on a DSD structure
Now that the data structure definition has been defined, let's see how the statistical data are organised in the file to be exchanged, which SDMX calls a Dataset. In the related SDMX-ML structure-specific time series data file, we can highlight the different structural levels to which attributes can be attached: the Dataset level, where in this example the attribute UNIT is attached; the Series level, where in this example no attribute is attached; and the Observation level, where the attribute OBSERVATION STATUS is attached. Each piece of information in the table is placed into the dataset with its corresponding concept name: UNIT, FREQ, COUNTRY, TOURISM INDICATOR, TOURISM ACTIVITY, TIME, OBSERVATION VALUE and OBSERVATION STATUS.
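To make the three attachment levels concrete, here is a minimal Python sketch that groups a flat table into dataset, series and observations, assuming the concept IDs `FREQ`, `COUNTRY`, `TOURISM_INDICATOR`, `TOURISM_ACTIVITY`, `TIME_PERIOD`, `OBS_VALUE`, `OBS_STATUS` and `UNIT` (modelled on the example above, not taken from a published DSD). The codes and values are illustrative only, not real statistics, and the output is a plain dictionary rather than an SDMX-ML file.

```python
from collections import defaultdict

# Hypothetical concept IDs modelled on the tourism example.
SERIES_DIMENSIONS = ("FREQ", "COUNTRY", "TOURISM_INDICATOR", "TOURISM_ACTIVITY")
DATASET_ATTRIBUTES = ("UNIT",)          # attached once, to the whole data set
OBSERVATION_ATTRIBUTES = ("OBS_STATUS",)  # attached to each observation


def build_dataset(rows):
    """Group flat table rows into dataset -> series -> observations."""
    first = rows[0]
    dataset = {
        "attributes": {a: first[a] for a in DATASET_ATTRIBUTES},
        "series": defaultdict(list),
    }
    for row in rows:
        # A dataset-level attribute must have the same value in every row.
        for a in DATASET_ATTRIBUTES:
            if row[a] != first[a]:
                raise ValueError(f"{a} must be constant at dataset level")
        # The series key is the tuple of dimension values (all except time).
        key = tuple(row[d] for d in SERIES_DIMENSIONS)
        obs = {"TIME_PERIOD": row["TIME_PERIOD"], "OBS_VALUE": row["OBS_VALUE"]}
        obs.update({a: row[a] for a in OBSERVATION_ATTRIBUTES})
        dataset["series"][key].append(obs)
    dataset["series"] = dict(dataset["series"])
    return dataset


base = {"UNIT": "NR", "FREQ": "A", "TOURISM_INDICATOR": "NIGHTS", "TOURISM_ACTIVITY": "HOTEL"}
rows = [
    {**base, "COUNTRY": "AT", "TIME_PERIOD": "2013", "OBS_VALUE": 100.0, "OBS_STATUS": "A"},
    {**base, "COUNTRY": "AT", "TIME_PERIOD": "2014", "OBS_VALUE": 110.0, "OBS_STATUS": "P"},
    {**base, "COUNTRY": "BE", "TIME_PERIOD": "2014", "OBS_VALUE": 90.0, "OBS_STATUS": "A"},
]
ds = build_dataset(rows)
print(ds["attributes"], len(ds["series"]))
```

As in the example, the series level carries no attribute here: everything that varies within a series goes to the observations, and UNIT is stated only once.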

SDMX Converter (https://webgate.ec.europa.eu/fpfis/mwikis/sdmx/index)
File-based conversion; open source and platform independent (Java).
Different ways of using it: graphical user interface; batch file (server or client side); web service interface; reusing the source code in your own Java application.
Formats: SDMX-ML 2.0, SDMX-EDI (GESMES), FLR, CSV, Google DSPL, XLS, XLSX (2014).

SDMX Converter graphical user interface:
Select the input file
Select the output file
Select the input and output formats
Identify a DSD to download from the SDMX Registry
Select the DSD on the local drive
Identify a dataflow linked to the DSD to download from the SDMX Registry
Select / manage headers for CSV input formats
CSV parameters
Select mapping / transcoding tables
GESMES representation for GESMES output formats
XML parameters for SDMX output formats
Load / save the current settings
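The SDMX Converter itself is a Java tool; the following Python sketch only illustrates the idea behind two of the settings listed above, a CSV input with a header row and a transcoding table. The CSV layout, the transcoding table, the dimension names and the simplified XML element names (`DataSet`, `Series`, `Obs`) are all assumptions: the output is SDMX-like but does not follow the exact SDMX-ML 2.0 schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical transcoding table: local codes -> SDMX codes, per column.
TRANSCODING = {"COUNTRY": {"Austria": "AT", "Belgium": "BE"}}
# Hypothetical series dimensions present in the CSV header.
DIMENSIONS = ("FREQ", "COUNTRY", "INDICATOR")


def csv_to_xml(text: str) -> str:
    """Convert a CSV with a header row into a simplified, SDMX-like XML string."""
    dataset = ET.Element("DataSet")
    series_nodes = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Apply the transcoding table; an unmapped local code raises KeyError.
        for column, table in TRANSCODING.items():
            row[column] = table[row[column]]
        key = tuple(row[d] for d in DIMENSIONS)
        if key not in series_nodes:
            series_nodes[key] = ET.SubElement(
                dataset, "Series", {d: row[d] for d in DIMENSIONS}
            )
        ET.SubElement(
            series_nodes[key],
            "Obs",
            {"TIME_PERIOD": row["TIME"], "OBS_VALUE": row["VALUE"]},
        )
    return ET.tostring(dataset, encoding="unicode")


sample = (
    "FREQ,COUNTRY,INDICATOR,TIME,VALUE\n"
    "A,Austria,NIGHTS,2014,110\n"
    "A,Austria,NIGHTS,2015,115\n"
)
print(csv_to_xml(sample))
```

Letting an unknown local code fail loudly, rather than passing it through, reflects why transcoding tables matter: a code missing from the table would otherwise produce a file that fails validation against the DSD.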

SDMX Reference Infrastructure (SDMX-RI)
A generalized service infrastructure that can be re-used partially or completely by any organisation interested in starting SDMX projects for data exchange: a set of pick-and-choose building blocks allowing a statistical office to expose data to the external world based on access rights, developed in both Java and .NET with a well-defined API.

An organisation can decide to use the SDMX Reference Infrastructure as a whole, extend the infrastructure by adding new modules, modify some modules, or integrate some building blocks within its existing dissemination environment. The most common SDMX Reference Infrastructure modules and supporting tools are described below.

SDMX Query Parser is an XML parsing API implementation for incoming SDMX-ML Query messages. It validates the received SDMX Query against the SDMX-ML Query schema and then translates the query into the internal SDMX Data Model, returning it to the Web Service Provider.

Data Retriever is dedicated to retrieving the requested data from dissemination databases. It translates the internal SDMX Data Model query into an SQL statement that extracts the requested data from the dissemination database, using the mapping information from the Mapping Store. The result is an SDMX dataset represented in the internal SDMX Data Model.

Structure Retriever works similarly to the Data Retriever, but retrieves SDMX data structures based on an SDMX Structure Query. It translates the query into an SQL statement, takes the SDMX structural metadata from the Mapping Store and delivers an SDMX-ML structure message.

SDMX Data Generator creates the responses sent to clients. It translates the (internal Data Model) data message into an SDMX-ML dataset in the requested data format. The Data Structure Definition and the SDMX-ML dataset message format are the inputs in this case.

There are two web components in the SDMX Reference Infrastructure: the Web Client and the Web Service Provider.

Web Client acts as the GUI for interacting with the Web Service: it displays structural metadata and lets users select, display and export data. It is a web interface for creating basic SDMX queries that expose structural metadata from a Mapping Store and data from dissemination databases. All data and metadata are retrieved using the SDMX-RI Web Service. Users can customize the presentation of the tabular data and download it in various formats (e.g. SDMX-ML, XLS and CSV).

Web Service Provider receives SDMX (data or structure) query messages from a client and responds with the SDMX-ML (data or structure) message that answers the query. It performs XML validation of the received SDMX query, includes a SOAP error handler, and coordinates the information exchange with the other modules.

Mapping Assistant, a tool developed by Eurostat, plays an important role in the SDMX Reference Infrastructure. The purpose of this desktop application is to translate "legacy" dissemination systems to the "SDMX world". Unlike the modules above, which operate under the hood, the Mapping Assistant requires user interaction. Additional explanation is available in our SDMX Reference Infrastructure PowerPoint presentation.

The tools are licensed under the terms of the European Union Public Licence V.1.1. All versions of the tools can be downloaded from the CIRCA repository.
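The Data Retriever's core step, turning a query expressed in SDMX terms into SQL with the help of a mapping, can be sketched as follows. This is not the SDMX-RI code: the query is a plain dictionary `{component: [codes]}` rather than a parsed SDMX-ML Query, the Mapping Store is reduced to a hypothetical `MAPPING` dictionary, and the `tourism` table and its columns are invented for illustration. Column names come only from the trusted mapping, while codes are passed as SQL parameters.

```python
import sqlite3

# Hypothetical Mapping Store content: SDMX component -> dissemination column.
MAPPING = {
    "COUNTRY": "geo",
    "INDICATOR": "indic",
    "TIME_PERIOD": "year",
    "OBS_VALUE": "val",
}
TABLE = "tourism"  # hypothetical dissemination table
ORDER_BY = ("COUNTRY", "INDICATOR", "TIME_PERIOD")


def query_to_sql(query):
    """Translate {component: [codes]} into a parameterised SQL SELECT."""
    select = ", ".join(f"{col} AS {comp}" for comp, col in MAPPING.items())
    clauses, params = [], []
    for component, codes in query.items():
        placeholders = ", ".join("?" for _ in codes)
        clauses.append(f"{MAPPING[component]} IN ({placeholders})")
        params.extend(codes)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    order = ", ".join(MAPPING[c] for c in ORDER_BY)
    return f"SELECT {select} FROM {TABLE}{where} ORDER BY {order}", params


# Toy dissemination database (illustrative values only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tourism (geo TEXT, indic TEXT, year TEXT, val REAL)")
conn.executemany(
    "INSERT INTO tourism VALUES (?, ?, ?, ?)",
    [("AT", "NIGHTS", "2014", 110.0), ("BE", "NIGHTS", "2014", 90.0),
     ("AT", "ARRIVALS", "2014", 40.0)],
)
sql, params = query_to_sql({"COUNTRY": ["AT", "BE"], "INDICATOR": ["NIGHTS"]})
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

The rows returned here would then be handed to a component playing the Data Generator's role, which serialises them into the requested SDMX-ML format.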

SDMX Reference Infrastructure: Census Hub architecture (Eurostat to NSI)
The Eurostat Census Hub sends an SDMX Query (XML file) to the web service of the National Statistics Institute's SDMX-RI, which replies with an SDMX DataMessage (XML file).

SDMX Reference Infrastructure (https://webgate.ec.europa)
Components: Data Provider, Data Collector, SDMX Registry, Mapping Assistant, DSD, Web Service, Web Client, Test Client, non-SDMX local database, SDMX data set, SDMX-RI.

Mapping Assistant
Facilitates the mapping between the structural metadata provided by an SDMX-ML Data Structure Definition (DSD) and those that reside in a database of a dissemination environment. It maintains a Mapping Store that keeps the mappings between SDMX and the local data storage scheme and, in the SDMX Reference Infrastructure, provides mapping information to the Data Retriever.

The Data Retriever module connects to the Mapping Store database and accesses the appropriate mappings to translate SDMX-ML queries into SQL for the dissemination database. The mapping process with the Mapping Assistant tool can be described in four steps:
Step 1: loading of the SDMX structures (CategoryScheme, Dataflow, Data Structure Definition) from SDMX-ML structure files
Step 2: loading of the local non-SDMX database schema and creation of the Dataset
Step 3: mapping of local concepts to the SDMX Concepts of the Data Structure Definition
Step 4: transcoding of local codes to the SDMX Codes of the Codelists referenced in the Data Structure Definition

The objective of the mapping process is to define mappings from each of the mandatory SDMX Components to local data columns residing in local storage (the Dissemination Database, DDB). The mapping should be complete and well defined in a machine-readable way. This allows automated data retrieval from the DDB by an SDMX-ML Query and the transformation of the dataset into an SDMX-ML format.

The Mapping Assistant (MA) employs an intermediate layer to simplify the mapping and hide the complexity of the local storage: it defines a Dataset artefact that encapsulates the local storage. The Dataset presents the required data as if they were in a single table, whose columns have to be mapped to the SDMX Components. Defining a Dataset therefore means defining the columns of this virtual table and the SQL query that returns them. The Dataset concept resembles an SQL VIEW, where data coming from a complex query over several tables are presented as a single table. A complete definition of a Dataset also requires the connection to the DDB on which the query is executed.

Once a Dataset has been defined, the mapping to the SDMX constructs is simplified. The Mapping Assistant uses the Mapping Set artefact, which groups the mappings for a specific data retrieval case. Since data exchange in SDMX is based on Dataflows, a Mapping Set is defined using a specific Dataflow and a Dataset. The mappings are performed between the data fields (Concepts) of the Data Structure Definition related to that Dataflow and the Dataset's columns. The codes defined in the Codelists for the DSD Concepts must also be matched with the codes of the Dataset's columns (local codes); this is called transcoding.

The tool is licensed under the terms of the European Union Public Licence V.1.1. The complete Mapping Assistant package, MappingAssistant_.NET_v2.7.2_2011.10.19.zip, contains the application, its source code and documentation (User Manual, Installation Guide, Analysis and Design, and Testing Documentation), and is stored in the CIRCA repository.
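The three ideas above (a Dataset defined like an SQL VIEW, a Mapping Set from SDMX components to Dataset columns, and transcoding of local codes) can be sketched with Python's built-in `sqlite3` module. The local schema, the view name `ma_dataset`, the `MAPPING_SET` and `TRANSCODING` dictionaries and all values are hypothetical; the real Mapping Assistant keeps this information in its Mapping Store rather than in Python dictionaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical local (non-SDMX) schema: two normalised tables.
conn.executescript(
    """
    CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE nights (region_id INTEGER, yr INTEGER, nights REAL);
    INSERT INTO region VALUES (1, 'Austria'), (2, 'Belgium');
    INSERT INTO nights VALUES (1, 2014, 110.0), (2, 2014, 90.0);
    -- The "Dataset": one virtual table over the local storage, like an SQL VIEW.
    CREATE VIEW ma_dataset AS
        SELECT r.region_name AS region, n.yr AS yr, n.nights AS nights
        FROM nights n JOIN region r ON r.region_id = n.region_id;
    """
)

# Hypothetical Mapping Set: SDMX component -> Dataset column.
MAPPING_SET = {"COUNTRY": "region", "TIME_PERIOD": "yr", "OBS_VALUE": "nights"}
# Hypothetical transcoding: local codes -> SDMX codes, per component.
TRANSCODING = {"COUNTRY": {"Austria": "AT", "Belgium": "BE"}}


def sdmx_rows(connection):
    """Read the Dataset and express each row in terms of SDMX components."""
    columns = ", ".join(MAPPING_SET.values())
    cursor = connection.execute(f"SELECT {columns} FROM ma_dataset ORDER BY region")
    for values in cursor:
        row = dict(zip(MAPPING_SET, values))
        # Replace local codes with the SDMX codes from the DSD codelists.
        for component, table in TRANSCODING.items():
            row[component] = table[row[component]]
        yield row


print(list(sdmx_rows(conn)))
```

Because the view hides the join, the mapping only has to deal with one flat set of columns, which is the simplification the Dataset artefact is meant to provide.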

DSD

Local database

Mapping local variables to DSD concepts: local table, DSD, and mapping of composite codings.


Connection to the database; build the SDMX query; generate the SQL and the SDMX-ML file.

Display the SDMX dataset in an Excel-like form

Validation

SDMX Validation Possibilities
Before/during transmission ("first level"):
- Covered by SDMX today
- Format check (SDMX-ML)
- Codes exist (SDMX DSD)
- Codes used correctly (SDMX Dataflow & Constraint)
After transmission ("second level"):
- Not yet covered by SDMX (VTL project)
- Detailed value check
- Validation expressions
- …
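Two of the first-level checks listed above ("codes exist" and "codes used correctly") can be sketched as set lookups. The sketch assumes a hypothetical DSD with two coded dimensions and a hypothetical dataflow constraint that narrows the allowed codes; it does not cover the SDMX-ML format check, and it is not how any particular Eurostat validation tool is implemented.

```python
# Hypothetical DSD codelists: which codes exist for each dimension.
CODELISTS = {"FREQ": {"A", "Q", "M"}, "COUNTRY": {"AT", "BE", "DE", "FR"}}
# Hypothetical dataflow constraint: which codes may be used in this dataflow.
CONSTRAINT = {"FREQ": {"A"}, "COUNTRY": {"AT", "BE"}}


def check_series_key(key):
    """Return first-level validation errors for one series key {dimension: code}."""
    errors = []
    for dimension, code in key.items():
        if code not in CODELISTS.get(dimension, set()):
            # Check 1: the code must exist in the codelist referenced by the DSD.
            errors.append(f"{dimension}={code}: code does not exist in the DSD codelist")
        elif code not in CONSTRAINT.get(dimension, CODELISTS[dimension]):
            # Check 2: the code must be allowed for this dataflow
            # (an unconstrained dimension accepts the whole codelist).
            errors.append(f"{dimension}={code}: code not allowed by the dataflow constraint")
    return errors


print(check_series_key({"FREQ": "Q", "COUNTRY": "XX"}))
```

Second-level checks (detailed value checks, validation expressions) need rules about the values themselves, which is why they fall outside this kind of structural lookup.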
