SDMX The basics Eurostat Statistical Information Technology
WHY SDMX? Some parallels with everyday life
Let’s start with something specific: communication. Using the same language lets us work more efficiently, which is why we need a lingua franca when we want to speak to each other. English saves us the cost of interpretation and helps reduce errors in the exchange of ideas. Here is the parallel with SDMX: at the office we also use a variety of file formats, such as CSV, Excel, PC-Axis and flat files. This is where SDMX comes into play, providing a common language for data and metadata exchange, a common model for statistical data and the potential to simplify our work.
WHY SDMX? Some parallels with everyday life
A second example: mobile phone chargers. Having different types of chargers with different plugs is impractical, costly and wasteful. Today, manufacturers have agreed to use a single type of plug, following an initiative of the European Union. In statistics, too, we do not want different data models and transmission formats that lead to a variety of expensive tools and time-consuming conversions. SDMX standardises the way data is organised and exchanged by using standard protocols, together with international guidelines on how to shape the data.
WHY SDMX?
SDMX provides technical and statistical standards for the exchange of statistics, adapted to new internet technologies.
The internet changed the way people exchange information; SDMX changes the way statisticians exchange and process data. The internet offers modern technologies, and SDMX uses them for the activities statisticians carry out every day: describing, sharing and relating data, and managing statistical processes. Finally, statistics can be published in a way that conveys their full meaning and context, which also helps people compare, interpret and understand the data.
WHAT SDMX IS
SDMX is:
- A logical model to describe statistical data and metadata, which also provides guidelines on how to structure the content
- A standard for automated communication from machine to machine
- A technology supporting standardised IT and methodological tools that can be used by everyone involved in data exchange and processing
In order to take advantage of all this:
- Statisticians agree to use a common description for data and metadata.
- The common description then provides the parameters that drive the exchange and processing.
- The data descriptions are made available to everybody, so that all can understand and reuse the data.
This is what SDMX provides and enables.
SDMX describes the data and metadata exchange
[Diagram: SDMX Registry holding the Organisation scheme (maintainer), Concept Schemes, Concepts, Codelists and DSD; Provision Agreement: by end of June]
Now that the structure has been defined, we describe the exchange process. Let’s say that Eurostat has to send the tourism table to the OECD and that, according to the organisation scheme, Eurostat has the role of maintaining the objects related to the table in question. SDMX incorporates the idea of a "service level agreement" by defining a "provision agreement", which describes the way data should be provided. Here, Eurostat is the Data Provider: the Provision Agreement might say that the table should be provided before the end of June according to a specific DSD (data structure definition).
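The Eurostat-to-OECD example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SDMX object model: the class `ProvisionAgreement`, its fields (including `receiver`), the identifier `TOURISM_DSD` and the year 2024 are all hypothetical names and values chosen for the sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProvisionAgreement:
    """Hypothetical, simplified stand-in for an SDMX provision agreement."""
    data_provider: str   # organisation that sends the data
    receiver: str        # organisation that receives the data (assumption)
    dsd_id: str          # identifier of the DSD the data must follow
    deadline: date       # latest acceptable delivery date

    def is_on_time(self, delivered: date) -> bool:
        """Return True when the delivery date is not after the deadline."""
        return delivered <= self.deadline


# Illustrative values only: the DSD identifier and the year are made up.
agreement = ProvisionAgreement(
    data_provider="Eurostat",
    receiver="OECD",
    dsd_id="TOURISM_DSD",
    deadline=date(2024, 6, 30),
)

print(agreement.is_on_time(date(2024, 6, 15)))  # True
print(agreement.is_on_time(date(2024, 7, 1)))   # False
```

Keeping the agreement as plain data means the same description can be read by people and checked by software, which is the point the slide makes about the description driving the exchange.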
SDMX - The background
The exchange of statistical data and metadata is complex, resource-intensive and expensive. In the past, national and international organisations each developed their own processes and IT solutions. In recent years, new technologies such as XML and web services have brought both opportunities and challenges. SDMX is the global answer from the world's main statistical organisations.
What does SDMX deliver? SDMX is global
Seven international organisations (BIS, ECB, Eurostat, IMF, OECD, UN, World Bank) have joined forces on the basis of:
- a Memorandum of Understanding signed in March 2007
- a chair that rotates every two years
U.N. Statistical Commission (02/2008):
- Recognised SDMX as “the preferred standard for the exchange and sharing of data and metadata in the global statistical community”.
- Emphasised the need to further involve national and international agencies.
- Underlined the importance of capacity building and outreach (seminars, workshops, manuals, training, technical assistance).
- Requested the SDMX Sponsors to continue developing the SDMX technical and statistical standards, IT service infrastructure and IT tools.
The SDMX components
SDMX consists of …
1. The SDMX information model for data and metadata
2. The SDMX Content-oriented Guidelines as statistical standards
3. The SDMX IT architecture for data and metadata exchange
4. SDMX IT tools supporting implementation and use
SDMX is not just a data transmission format: it should be used end to end across the statistical business process.
1. The SDMX information model
- SDMX technical standard 2.0/2.1
- XML format for the exchange of SDMX-structured data and metadata
- Data and metadata structure definitions (DSDs/MSDs) are to be defined for statistical domains
- Users can retrieve the DSDs and MSDs from SDMX registries (e.g. the Euro SDMX Registry)
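To give a feel for what a data structure definition contributes, here is a minimal Python sketch, assuming a much-reduced DSD that only lists ordered dimensions and attributes. The class `DataStructureDefinition`, the identifier `TOURISM_DSD` and the dimension `INDICATOR` with its code `NIGHTS` are hypothetical; a real DSD carries far more (concept references, code lists, measures) and is expressed in SDMX-ML rather than Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStructureDefinition:
    """Hypothetical, much-reduced model of a DSD."""
    dsd_id: str
    dimensions: tuple[str, ...]   # order matters: it fixes the series key layout
    attributes: tuple[str, ...]   # extra information attached to observations

    def series_key(self, record: dict[str, str]) -> str:
        """Join the dimension values of a record in DSD order, dot-separated."""
        missing = [d for d in self.dimensions if d not in record]
        if missing:
            raise ValueError(f"record lacks dimensions: {missing}")
        return ".".join(record[d] for d in self.dimensions)


# Illustrative DSD for a tourism table; all identifiers are assumptions.
tourism_dsd = DataStructureDefinition(
    dsd_id="TOURISM_DSD",
    dimensions=("FREQ", "REF_AREA", "INDICATOR"),
    attributes=("OBS_STATUS",),
)

record = {"FREQ": "A", "REF_AREA": "FR", "INDICATOR": "NIGHTS", "OBS_STATUS": "A"}
print(tourism_dsd.series_key(record))  # A.FR.NIGHTS
```

Because sender and receiver share the same structure definition, both sides derive the same key for the same series and can reject a record that omits a required dimension.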
2. The SDMX Content-Oriented Guidelines (the SDMX statistical standards)
- Statistical Cross-Domain Concepts and Code lists: a short list of statistical concepts relevant to all statistical domains, to be used within the SDMX technical standards (e.g. frequency, observation status, time format, unit of measurement)
- Statistical Subject-Matter Domains: a list of subject-matter domains (e.g. demographic and social statistics, economic statistics, environment)
- Metadata Common Vocabulary: common cross-domain statistical terminology used in the SDMX content-oriented guidelines
See also www.sdmx.org
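As a rough illustration of how cross-domain concepts and their code lists work together, the Python sketch below decodes coded values into readable labels. The dictionary `CODE_LISTS` and the function `decode` are hypothetical helpers, and the codes shown are small excerpts that follow commonly used SDMX conventions; treat them as assumptions and check the published guidelines for the authoritative lists.

```python
# Small excerpts of code lists for two cross-domain concepts.
# Codes and labels are assumptions for illustration, not the official lists.
CODE_LISTS = {
    "FREQ": {"A": "Annual", "Q": "Quarterly", "M": "Monthly"},
    "OBS_STATUS": {"A": "Normal value", "E": "Estimated value", "P": "Provisional value"},
}


def decode(concept: str, code: str) -> str:
    """Translate a code into its label, failing loudly on unknown input."""
    try:
        return CODE_LISTS[concept][code]
    except KeyError:
        raise ValueError(f"unknown code {code!r} for concept {concept!r}") from None


print(decode("FREQ", "Q"))         # Quarterly
print(decode("OBS_STATUS", "P"))   # Provisional value
```

When every organisation uses the same codes for the same concepts, a value such as `Q` means the same thing in every data set, with no per-partner mapping tables.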
Benefits of SDMX
Using the SDMX standards and guidelines provides the following main benefits:
- Reducing the reporting burden
- Fostering international data and metadata consistency
- Enhancing the integration and efficiency of statistical business processes (vertical and horizontal)
- Providing standard dissemination formats
- Facilitating data and metadata use and analysis
- Open standards maintained by the SDMX sponsors with input from the statistical organisations
Summarising …
- SDMX is global.
- SDMX is not only about IT standards, but also about statistical standards.
- SDMX is increasingly used by statistical organisations.
- SDMX is one of the main enablers of the integration and harmonisation of statistical business processes.
How to get there?
[Diagram labels: Preparation, Compliance, DSD, Implementation, Production]
We have defined the data structure and the exchange process, and so achieved what SDMX calls compliance. Now we are ready to put SDMX to work for our data exchange.
These are the steps achieved so far:
- Preparation. Statisticians from the organisations involved in the data exchange described the data and the different data flows, data sets and provision agreements.
- Compliance. We created all the necessary objects in accordance with the SDMX standard.
These are the next practical steps:
- Implementation. We put all this into practice: standard software is installed and parameterised to use the DSD and the other SDMX objects. The exchange process is set up and tested.
- Production. The data and metadata exchange is carried out according to SDMX specifications.
Eurostat SDMX Implementation Process