SDMX Tools Architecture


1 SDMX Tools Architecture
Raynald PALMIERI, June 2015, IPA Workshop on ITS Statistics

2 Technical Integration
Diagram labels: National statistical organisations; International organisations; National data sources; National databases; (Mapping); Source; SDMX Data Structure Definitions & Data Flows.
SDMX implementation challenge:
- Metadata-driven process based on SDMX standards and tools
- Full automation of the data exchange is possible

3 Registry

4 Work roles and process SDMX Registry
Diagram labels: Collecting organisation; Create DSD; Publish DSD; Consult and get DSD (shown twice); Format / structure SDMX messages; Send SDMX messages; Interpret SDMX messages.
Now we'll turn to another important module of the standard: the IT architecture. The IT architecture involves:
- the standard formats for data exchange
- different architectures for data exchange
- the SDMX Registry

5 SDMX Registry
Only concerned with providing the information needed to access the data and reference metadata sets. An application that wants a particular data or metadata set queries the registry for the URL, then retrieves the data or metadata set directly from the provider's web server.
Examples of existing SDMX registries:
- Euro SDMX Registry (2.1)
- SDMX Global Registry (2.1)
The Euro-SDMX Registry is a metadata registry which implements the SDMX registry specifications. It provides a web-based user interface as well as web services for interacting with the SDMX structural metadata objects in use within Eurostat and with statistical partners. It will enable National Statistics Institutes and other external organisations to obtain Data Structure Definitions (DSD) and other structural metadata, such as the Metadata Structure Definitions (MSD) for the Euro-SDMX Metadata Structure (ESMS).
The business vision for SDMX envisages a "data sharing" model to facilitate data and metadata exchange. The SDMX Registry is tasked with providing structure, organisation, maintenance and query interfaces for most of the SDMX components required to support the data sharing vision. The "data sharing" model is based on the possibility of easily discovering where data and metadata are available and how to access them.
The SDMX Registry plays an important role in this architecture: it can be seen as a central application which is accessible to other programs over the Internet (or an intranet or extranet) and which provides the information needed to facilitate the reporting, collection and dissemination of statistics. In broad terms, the SDMX Registry, as understood in web services terminology, is an application which stores metadata for querying and which can be used by any other application in the network with sufficient access privileges. It can be seen as the index of a distributed database or metadata repository made up of all the data providers' data sets and reference metadata sets within a statistical community.
It is important to stress that registry services are not concerned with the storage of data or reference metadata sets. Data and metadata sets are stored elsewhere, on the sites of the data providers. The registry is only concerned with providing the information needed to access the data and reference metadata sets. An application which wants a particular data or metadata set would query the registry for the URL, and then retrieve the data or metadata set directly from the provider's web server.
The SDMX registry specifications define the logical interfaces for the SDMX registry in terms of the functions required, the data that may be present in the function call and the behaviour expected of the registry, together with normative rules regarding the Registry Interface and the identification of registry objects.
The Euro-SDMX Registry can be reached via its web graphical user interface. It provides the registry web services according to the SDMX Version 2.0 standards. Eurostat has also deployed a training instance of the Registry, which can be used as a "sandbox" for training courses and presentations, without the risk of modifying the real registry. Access to the training environment can be provided on request.
The tool is licensed under the terms of the European Union Public Licence V.1.1. A complete SDMX Registry package is stored in the CIRCA repository and can be downloaded here: EuroSdmxRegistry_Java_v3.3.19_ zip (ZIP). The package contains an application, source code and documentation such as a User Manual, Installation Guide, Analysis and Design, and Testing Documentation. All versions of the tool can be downloaded from the CIRCA repository.
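The "registry as an index" idea described in this slide can be illustrated with a minimal Python sketch. It is not the Euro-SDMX Registry API: the class names, the identifiers (BOP, NSI_XX), the URL and the stubbed provider server are all hypothetical, and a real registry is queried through its web services rather than an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Registration:
    """One registry entry: where a provider publishes a data set (hypothetical model)."""
    dataflow_id: str
    provider_id: str
    url: str


class Registry:
    """Toy index: it stores the locations of data sets, never the data themselves."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], Registration] = {}

    def register(self, reg: Registration) -> None:
        self._entries[(reg.dataflow_id, reg.provider_id)] = reg

    def query_url(self, dataflow_id: str, provider_id: str) -> str:
        return self._entries[(dataflow_id, provider_id)].url


def retrieve(registry: Registry, dataflow_id: str, provider_id: str,
             fetch: Callable[[str], str]) -> str:
    """Step 1: ask the registry for the URL. Step 2: fetch from the provider."""
    url = registry.query_url(dataflow_id, provider_id)
    return fetch(url)


# Hypothetical provider web server, stubbed as a dictionary so the sketch runs offline.
PROVIDER_FILES = {"https://nsi.example/data/bop.xml": "<DataSet/>"}


def fake_fetch(url: str) -> str:
    return PROVIDER_FILES[url]


registry = Registry()
registry.register(Registration("BOP", "NSI_XX", "https://nsi.example/data/bop.xml"))
result = retrieve(registry, "BOP", "NSI_XX", fake_fetch)
print(result)
```

The point of the split is that the registry only ever returns a location; the data set itself comes from the provider's side.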

6 SDMX Global Registry (SGR)
Contains SDMX DSDs and artefacts that are used and maintained by national and international organisations.
Designed in such a way that it can be developed into a global portal for SDMX structural data and metadata, providing not only information about how statistical information is structured, but also where the related data can be accessed.

7 Where to find the BOP DSD?

8 Navigate the content of the registry
Registry Content
- Navigate the content of the registry
- Search for an item that contains the text entered
- Technical view of SDMX objects

9 Dimensions, measure and attributes
Registry Content
- Download the DSD (XML file)
- Dimensions, measure and attributes
- Clicking on a concept in the left frame displays the corresponding list of codes (if one exists)

10 View the XML file in your browser (DSD NA_MAIN)

11 SDMX Tools

12 SDMX Tools for Data Providers
Tools offered, with their use and the action required:
- Web Forms (EU). Use: Excel-like templates; transmission of low volumes of data. Action: no costs for EU organisations; manual work for senders (type, copy/paste).
- SDMX Converter (sender’s PC). Use: converts data files between SDMX formats and other file formats. Action: installation on sender’s PC; manual work for senders (convert).
- SDMX Converter (batch mode). Action: installation on server.
- SDMX Reference Infrastructure. Use: a set of tools for connecting your IT systems to the SDMX world. Action: mapping of database to DSDs.
- Own SDMX implementation. Action: local development in organisations.

13 Data provider view: One goal – different possibilities
Diagram labels: Webforms; Excel sheets, SDMX Converter, SDMX-ML file; Database export, SDMX Converter, SDMX-ML file; Database, SDMX Reference Infrastructure, SDMX Web Service, SDMX-ML file; EDAMIS Single Entry Point; Same DSD, same format.

14 Data provider view: One goal – different IT Architectures
- Webforms: web-based, push mode
- Excel sheets with SDMX Converter: local conversion, push mode
- Database export with SDMX Converter: local conversion, push mode
- Database with SDMX Reference Infrastructure: pull mode

15 SDMX Dataset based on a DSD
With the data structure definition in place, let’s see how the statistical data are organised in the file to be exchanged, which SDMX calls a Dataset. In the related SDMX-ML Structure Specific time series data file, we can highlight the structural levels to which attributes can be attached:
- the Data set, to which the attribute UNIT is attached in this example
- the Series, to which no attribute is attached in this example
- the Observation, to which the attribute OBSERVATION STATUS is attached
Each item of information in the table is placed in the dataset under its corresponding concept name: UNIT, FREQ, COUNTRY, TOURISM INDICATOR, TOURISM ACTIVITY, TIME, OBSERVATION VALUE and OBSERVATION STATUS.
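The three attachment levels can be made concrete with a small Python sketch that builds a simplified, namespace-free XML tree in the spirit of a structure-specific time series file. This is an illustration only: real SDMX-ML messages carry a header and namespaces, and the component IDs (such as TOURISM_INDICATOR or TIME_PERIOD), codes and values used here are assumptions, not taken from the tourism DSD.

```python
import xml.etree.ElementTree as ET


def build_dataset(unit, series_key, observations):
    """Build a simplified data set with attributes at three levels.

    unit         -> attribute attached at data set level
    series_key   -> dimension values identifying the series (no attribute here)
    observations -> (time, value, status) tuples; status sits at observation level
    """
    dataset = ET.Element("DataSet", {"UNIT": unit})
    series = ET.SubElement(dataset, "Series", series_key)
    for time_period, value, status in observations:
        ET.SubElement(series, "Obs", {
            "TIME_PERIOD": time_period,
            "OBS_VALUE": str(value),
            "OBS_STATUS": status,
        })
    return dataset


# Hypothetical codes and figures, for illustration only.
dataset = build_dataset(
    unit="NBR",
    series_key={
        "FREQ": "A",
        "COUNTRY": "XX",
        "TOURISM_INDICATOR": "ARR",
        "TOURISM_ACTIVITY": "HOTELS",
    },
    observations=[("2013", 1200, "A"), ("2014", 1350, "P")],
)
xml_text = ET.tostring(dataset, encoding="unicode")
print(xml_text)
```

Stating UNIT once on the data set rather than on every observation is what keeps a time series file compact.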

16 SDMX Converter https://webgate.ec.europa.eu/fpfis/mwikis/sdmx/index
File-based conversion
Open source and platform independent (Java)
Different ways of using it:
- Graphical user interface
- Batch file (server or client side)
- Web service interface
- Reusing source code in your own Java application
Formats: SDMX-ML 2.0, SDMX-EDI (GESMES), FLR, CSV, Google DSPL, XLS, XLSX (2014)

17 Select the input file
Select the output file
Select the input and output formats
Identify a DSD to download from the SDMX Registry
Select the DSD on the local drive
Identify a dataflow linked to the DSD to download from the SDMX Registry
Select / manage headers for CSV input formats
CSV parameters
Select mapping / transcoding tables
GESMES representation for GESMES output formats
XML parameters for SDMX output formats
Load / save the current settings

18 SDMX Reference Infrastructure SDMX-RI
Generalized service infrastructure that can be re-used partially or completely by any organisation interested in starting SDMX projects for data exchange.
Set of pick-and-choose building blocks allowing a statistical office to expose data to the external world based on access rights.
Developed in both Java and .NET with a well-defined API.
SDMX-RI is a generalized service infrastructure that can be re-used partially or completely by any organisation interested in starting SDMX projects for data exchange. An organisation can decide to use the SDMX Reference Infrastructure as a whole, extend the infrastructure by adding new modules, modify some modules, or integrate some building blocks within its existing dissemination environment. The most common SDMX Reference Infrastructure modules and supporting tools are described below.
SDMX Query Parser is an XML parsing API implementation for incoming SDMX-ML Query messages. It validates the received SDMX Query against the SDMX-ML Query schema and then translates the query to the internal SDMX Data Model, returning it to the Web Service Provider.
Data Retriever retrieves the requested data from dissemination databases. It translates the internal SDMX Data Model query to an SQL statement that fetches the requested data from the dissemination database, using the mapping information from the Mapping Store. The result is an SDMX dataset represented in the internal SDMX Data Model.
Structure Retriever works in a similar way: it retrieves the SDMX data structures based on an SDMX Structure Query. It translates the query to an SQL statement and takes the SDMX structural metadata from the Mapping Store, delivering an SDMX-ML structure message at the end.
SDMX Data Generator creates the responses sent to clients. It translates the (internal Data Model) Data Message to an SDMX-ML Dataset in the requested data format. The Data Structure Definition and the SDMX-ML Dataset message format are the inputs in this case.
There are two web components of the SDMX Reference Infrastructure: Web Client and Web Service Provider.
Web Client acts as the GUI for interacting with the Web Service to display structural metadata and to select, display and export data. It works as a web interface for creating basic SDMX queries that expose structural metadata from a Mapping Store and data from dissemination databases. All data and metadata are retrieved using the (SDMX-RI) Web Service. The user can customize the presentation of the tabular data and download it in various formats (e.g. SDMX-ML, XLS and CSV).
Web Service Provider receives SDMX (Data or Structure) Query messages from a client and responds with an SDMX-ML (Data or Structure) message related to the input SDMX Query. It performs “XML Validation” of the received SDMX Query and includes a “Soap Error Handler”. It also operates and controls the information exchange with the other modules.
Mapping Assistant, a tool developed by Eurostat, plays an important role in the SDMX Reference Infrastructure. The purpose of this desktop application is to translate “legacy” dissemination systems to the “SDMX world”. Unlike the modules mentioned above, which operate “under the hood”, the Mapping Assistant requires user interaction.
Additional explanation is available in our SDMX Reference Infrastructure PowerPoint presentation. The tools are licensed under the terms of the European Union Public Licence V.1.1. All versions of the tools can be downloaded from the CIRCA repository.
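The Data Retriever step, translating a parsed query into SQL with the help of the mapping information, can be sketched in Python against an in-memory SQLite database. This is a minimal illustration under stated assumptions, not SDMX-RI code: the table `tourism`, its column names, the `MAPPING` dictionary and the function `query_to_sql` are all hypothetical, and the query is represented as a plain dictionary rather than the internal SDMX Data Model.

```python
import sqlite3

# Hypothetical mapping information: SDMX component -> local column name.
MAPPING = {
    "FREQ": "periodicity",
    "COUNTRY": "ctry",
    "TIME_PERIOD": "yr",
    "OBS_VALUE": "val",
}


def query_to_sql(query, mapping, table):
    """Translate {component: code} filters into a parameterised SQL statement.

    Column and table names come only from the trusted mapping, never from the
    incoming query; the requested codes are passed as bound parameters.
    """
    select = ", ".join(f'{col} AS "{comp}"' for comp, col in mapping.items())
    where = " AND ".join(f"{mapping[comp]} = ?" for comp in query)
    sql = f"SELECT {select} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql, list(query.values())


# Stand-in for the dissemination database, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tourism (periodicity TEXT, ctry TEXT, yr TEXT, val REAL)")
conn.executemany(
    "INSERT INTO tourism VALUES (?, ?, ?, ?)",
    [("A", "XX", "2013", 1200.0), ("A", "XX", "2014", 1350.0), ("A", "YY", "2014", 900.0)],
)

sql, params = query_to_sql({"COUNTRY": "XX"}, MAPPING, "tourism")
cursor = conn.execute(sql, params)
columns = [d[0] for d in cursor.description]
observations = [dict(zip(columns, row)) for row in cursor.fetchall()]
print(sql)
print(observations)
```

The rows come back keyed by SDMX component IDs, which is the shape a data generator would then serialise into the requested SDMX-ML format.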

19 SDMX Reference Infrastructure Census-HUB architecture – Eurostat to NSI
Diagram labels: Eurostat Census Hub; Web service; SDMX Query (XML file); SDMX DataMessage (XML file); SDMX-RI; National Statistics Institute.

20 SDMX Reference Infrastructure https://webgate.ec.europa
Diagram labels: Data Provider; Data Collector; SDMX Registry; Mapping Assistant; DSD; Web Svc; Web Client; Test Client; Non-SDMX local database; SDMX data set; SDMX-RI.

21 Mapping Assistant
Facilitates the mapping between the structural metadata provided by an SDMX-ML Data Structure Definition (DSD) and those that reside in a database of a dissemination environment.
Maintains a Mapping Store for keeping the mappings between the SDMX and the local data storage scheme.
In the SDMX Reference Infrastructure, provides mapping information to the Data Retriever.
The Mapping Assistant is meant to facilitate the mapping between the structural metadata provided by an SDMX-ML Data Structure Definition (DSD) and those that reside in a database of a dissemination environment. The Mapping Assistant maintains a Mapping Store for keeping the mappings between the SDMX and the local data storage scheme. In the SDMX Reference Infrastructure, the Mapping Assistant provides mapping information to the Data Retriever. The Data Retriever module connects to the Mapping Store database and accesses the appropriate mappings to translate the SDMX-ML queries to SQL for the dissemination database.
The mapping process with the Mapping Assistant tool can be described in four steps:
Step 1 – loading of the SDMX structures (CategoryScheme, DataFlow, Data Structure Definition) from SDMX-ML structure files
Step 2 – loading of the local non-SDMX database schema and creation of the Dataset
Step 3 – mapping of local concepts to SDMX Concepts of the Data Structure Definition
Step 4 – transcoding of local codes to SDMX Codes of the Codelists referenced in the Data Structure Definition
The objective of the mapping process is to define the mappings of each of the mandatory SDMX Components to local data columns residing in local storage (the Dissemination Database). The mapping should be complete and well defined, in a machine-readable way. This allows automated data retrieval from the Dissemination Database (DDB) by an SDMX-ML Query and the transformation of the dataset to an SDMX-ML format.
The Mapping Assistant (MA) employs an intermediate layer in order to simplify the mapping and hide the complexity of the local storage. To this end, the MA defines a Dataset artefact that encapsulates the local storage. The Dataset presents the required data as if they were in a single table. The columns of this table have to be mapped to the SDMX Components. Defining a Dataset therefore includes defining the columns of this virtual table and the SQL query that returns them. The Dataset concept resembles an SQL VIEW, where data coming from a complex underlying structure are presented in a single table. A complete definition of a Dataset also requires the Connection to the DDB on which the query is executed.
Once a Dataset has been defined, the mapping to the SDMX constructs is simplified. The Mapping Assistant uses the Mapping Set artefact, which groups the mappings for a specific data retrieval case. Since data exchange in SDMX is based on Dataflows, the Mapping Set is defined using a specific Dataflow and a Dataset. The mappings are made between the data fields (Concepts) of the Data Structure Definition related to the particular Dataflow and the Dataset’s columns. The codes defined (in Codelists) for the DSD Concepts also have to be matched with the codes of the Dataset’s columns (local codes). This is called transcoding.
The tool is licensed under the terms of the European Union Public Licence V.1.1. Complete Mapping Assistant package: MappingAssistant_.NET_v2.7.2_ zip (ZIP), containing an application, source code and documentation such as a User Manual, Installation Guide, Analysis and Design, and Testing Documentation, stored in the CIRCA repository.
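Steps 3 and 4, mapping Dataset columns to DSD components and transcoding local codes, can be sketched in Python as follows. This is a hedged illustration, not the Mapping Assistant's data model: the class `ComponentMapping`, the function `apply_mapping_set`, the column names and the code values are all hypothetical, and the Dataset is represented simply as a list of rows that an SQL query is assumed to have returned.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ComponentMapping:
    """Maps one Dataset column to one DSD component (hypothetical model)."""
    component: str                                              # SDMX component ID
    column: str                                                 # column of the virtual table
    transcoding: Dict[str, str] = field(default_factory=dict)   # local code -> SDMX code


def apply_mapping_set(rows: List[dict], mapping_set: List[ComponentMapping]) -> List[dict]:
    """Rename columns to SDMX components and transcode local codes.

    A component with a transcoding table must find every local code in it;
    a missing code raises KeyError, which flags an incomplete transcoding.
    Components without a transcoding table pass their values through unchanged.
    """
    result = []
    for row in rows:
        mapped = {}
        for m in mapping_set:
            local_value = row[m.column]
            mapped[m.component] = m.transcoding[local_value] if m.transcoding else local_value
        result.append(mapped)
    return result


# Rows as the Dataset's SQL query might return them (made-up values).
dataset_rows = [{"ctry": "Germany", "periodicity": "yearly", "yr": "2014", "val": 1350.0}]

mapping_set = [
    ComponentMapping("COUNTRY", "ctry", {"Germany": "DE"}),
    ComponentMapping("FREQ", "periodicity", {"yearly": "A"}),
    ComponentMapping("TIME_PERIOD", "yr"),
    ComponentMapping("OBS_VALUE", "val"),
]

sdmx_rows = apply_mapping_set(dataset_rows, mapping_set)
print(sdmx_rows)
```

Failing loudly on an untranscoded local code is a design choice made for this sketch; it reflects the requirement that the mapping be complete and well defined before retrieval is automated.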

22 DSD

23 Local database

24 Local variables to DSD concepts
Mapping local variables to DSD concepts. Diagram labels: Local table; DSD; Mapping of composite codings.

25 SDMX Reference Infrastructure https://webgate.ec.europa
Diagram labels: Data Provider; Data Collector; SDMX Registry; Mapping Assistant; DSD; Web Svc; Web Client; Test Client; Non-SDMX local database; SDMX data set; SDMX-RI.

26 Connection to the database Generate SQL and SDMX-ML file
- Build SDMX query
- Connection to the database
- Generate SQL and SDMX-ML file

27 Display the SDMX dataset in an Excel-like form

28 Validation

29 SDMX Validation Possibilities
Before/during transmission (“first level”), covered by SDMX today:
- Format check (SDMX-ML)
- Codes exist (SDMX DSD)
- Codes used correctly (SDMX Dataflow & Constraint)
After transmission (“second level”), not yet covered by SDMX (VTL project):
- Detailed value check
- Validation expressions
- …
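Two of the first-level checks, "codes exist" and "codes used correctly", can be illustrated with a short Python sketch. It is an assumption-laden simplification: codelists and the dataflow constraint are modelled as plain sets of allowed codes, and the function name `validate_codes`, the concept IDs and the codes are hypothetical rather than taken from any SDMX tool.

```python
def validate_codes(observation, codelists, constraint=None):
    """Return a list of first-level validation errors for one observation.

    codelists  -> {concept: set of codes defined in the DSD's codelists}
    constraint -> {concept: subset of codes allowed for this dataflow}
    """
    constraint = constraint or {}
    errors = []
    for concept, allowed in codelists.items():
        code = observation.get(concept)
        if code is None:
            errors.append(f"{concept}: missing value")
        elif code not in allowed:
            errors.append(f"{concept}: unknown code {code!r}")
        elif concept in constraint and code not in constraint[concept]:
            errors.append(f"{concept}: code {code!r} not allowed for this dataflow")
    return errors


# Hypothetical codelists and constraint, for illustration only.
codelists = {"FREQ": {"A", "M", "Q"}, "COUNTRY": {"XX", "YY"}, "OBS_STATUS": {"A", "P"}}
annual_only = {"FREQ": {"A"}}

unknown_country = validate_codes(
    {"FREQ": "A", "COUNTRY": "ZZ", "OBS_STATUS": "A"}, codelists, annual_only)
wrong_frequency = validate_codes(
    {"FREQ": "M", "COUNTRY": "XX", "OBS_STATUS": "A"}, codelists, annual_only)
print(unknown_country)
print(wrong_frequency)
```

Second-level checks, such as comparing a value with its previous period, need validation expressions and fall outside this sketch.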

30 SDMX Tools Architecture

