SDMX - Appendices Francesco Rizzo Istat ESTP Training Course


SDMX - Appendices
Francesco Rizzo, Istat
ESTP Training Course “Information standards and technologies for describing, exchanging and disseminating data and metadata”
Rome, 19-22 June 2018

Appendix 1 SDMX messages and formats

SDMX data formats 2.0 vs 2.1

Simplified data formats by use case:

Use case: single data set XML schema that supports all DSDs
- V 2.0: Generic (restricted to time series)
- V 2.1: SDMXDataGenericTimeSeries (restricted to time series); SDMXDataGeneric (supports both time series and non-time series)

Use case: specific time series data set XML schema for a DSD
- V 2.0: Compact; Utility
- V 2.1: SDMXDataStructureSpecificTimeSeries (restricted to time series); Utility is discontinued

Use case: specific data set XML schema for non-time series
- V 2.0: Cross-Sectional
- V 2.1: SDMXDataStructureSpecific (supports both time series and non-time series)

SDMX 2.0

Generic data message:
- No validation
- Carries data for any data structure definition
- Verbose: files are very large
- Can perform incremental updates and carry partial data sets
- Useful for applications which need to carry potentially incorrect data for processing and cleaning
- Useful for generic applications which handle data for more than one DSD
- Serves as a “pivot format” between other SDMX-ML format types

Utility data message:
- Provides the strongest validation: all business rules in the DSD are enforced by a generic XML parser (schemas are specific to particular DSDs)
- Less verbose than Generic; more verbose than Compact and Cross-Sectional
- Incremental updates not supported
- For XML tools, this is the most “normal” type of XML schema, so it performs best

Compact data message:
- Equivalent of the SDMX-EDI data format, but schemas are specific to a particular DSD
- Good for exchanging partial data sets and incremental updates
- Very compact (for XML) in terms of file sizes
- Very simple, but performs limited validation
- Will validate codelists, but not some other things

Cross-sectional data message:
- Similar to the Compact format, but allows for many observations at a single point in time (not time-series oriented like the other formats)
- Very compact
- Supports incremental updates
- Provides limited validation: schemas are specific to a particular DSD

SDMX 2.1

Simplified data formats:
- All data formats are more consistent
- Cross-sectional and time-series formats are more similar
- Two families of data set:
  - Generic (i.e. XML data set constructs support data for any DSD)
    - SDMXDataGenericTimeSeries (time dimension at the observation level)
    - SDMXDataGeneric (e.g. a dimension other than the time dimension at the observation level)
  - Structure Specific (i.e. XML data set constructs specific to a DSD)
    - SDMXDataStructureSpecificTimeSeries (time dimension at the observation level)
    - SDMXDataStructureSpecific (e.g. all dimensions at the observation level, i.e. flat; or a dimension other than the time dimension at the observation level)

Note that the time series variants are identical in structure to the non-time series variants, but restrict the content to time series.
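The difference between the Generic and Structure Specific families can be illustrated with a minimal Python sketch. It is not an SDMX-ML writer: namespaces, message headers and schema references are deliberately left out, and the dimension IDs (FREQ, REF_AREA), the values and the function names are hypothetical. It only shows that the generic style carries concept IDs as data, while the structure-specific style turns them into attribute names tied to one DSD, which is why the latter is more compact.

```python
import xml.etree.ElementTree as ET

# Hypothetical series: two dimensions in the series key, time at observation level.
SERIES_KEY = {"FREQ": "A", "REF_AREA": "IT"}
OBSERVATIONS = [("2016", "1.2"), ("2017", "1.5")]


def generic_series(key, observations):
    """Generic style: concept IDs travel as data (id/value pairs), so the
    same element names can carry data for any DSD."""
    series = ET.Element("Series")
    series_key = ET.SubElement(series, "SeriesKey")
    for concept_id, value in key.items():
        ET.SubElement(series_key, "Value", id=concept_id, value=value)
    for period, obs_value in observations:
        obs = ET.SubElement(series, "Obs")
        ET.SubElement(obs, "ObsDimension", value=period)
        ET.SubElement(obs, "ObsValue", value=obs_value)
    return series


def structure_specific_series(key, observations):
    """Structure-specific style: concept IDs become XML attribute names,
    so the layout is tied to one DSD but is much shorter."""
    series = ET.Element("Series", dict(key))
    for period, obs_value in observations:
        ET.SubElement(series, "Obs", TIME_PERIOD=period, OBS_VALUE=obs_value)
    return series


generic_xml = ET.tostring(generic_series(SERIES_KEY, OBSERVATIONS), encoding="unicode")
specific_xml = ET.tostring(structure_specific_series(SERIES_KEY, OBSERVATIONS), encoding="unicode")

print(generic_xml)
print(specific_xml)
print(len(generic_xml), "characters (generic) vs", len(specific_xml), "(structure specific)")
```

Because the attribute names in the second form come from the DSD, an XML schema generated for that DSD can validate them, whereas the first form can be parsed without knowing the DSD in advance.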

Structure message

Structure message: focus on DSD

Appendix 2 SDMX Implementation in Istat

SDMX Istat Strategy as part of Stat2015 modernization program

Implementing steps of the SDMX strategy
- Looking for the necessary funds to support the implementation
- Developing a suitable cross-cutting architecture
- Streamlining the internal capabilities and capacity building actions
- Collaborating with other organizations

Results achieved up to now
- Developed the SDMX Istat Framework
- Metadata management system in production:
  - all disseminated datasets described through SDMX artefacts
  - legacy “reference and quality metadata system” wrapped for extracting SDMX metadata sets
  - developed APIs for handling SDMX artefacts for reference metadata
- Dissemination data warehouse accessible through the SDMX Single Exit Point
- Streamlined the reporting system

Istat - SDMX architecture

SDMX Istat toolkit
A set of pick-and-choose building blocks that help a statistical office standardize and industrialize the dissemination/reporting process:
- metadata handling
- database building
- data loading
- data/metadata dissemination/reporting (M2M)
- data/metadata dissemination/reporting (GUI)
- data exchange between organizations (pull and push)

The toolkit is:
- subject-matter domain independent
- built using the SDMX Common API (SdmxSource.NET)
- a complement to the SDMX-RI (it extends the SDMX-RI)
- usable for building:
  - “distributed” data warehouses
  - SDMX-based “stand-alone” dissemination systems

Lessons learnt
- SDMX is mature enough to be used beyond the data exchange between data producers (NSIs) and data collectors (IOs):
  - standardization and industrialization
  - data sharing (facilitating open (statistical) data)
- Re-using software, experience and know-how is the only way to reduce costs and move forward quickly
- Capacity building actions are needed for creating “consensus” and all the necessary capabilities

Appendix 3 SDMX Istat Toolkit

SDMX Istat toolkit

SDMX Istat Toolkit – modules (1/3)
- Metadata Repository/Registry: based on the SDMX-RI Mapping Store, it handles SDMX structural metadata (Data Structure Definition, Code List, Hierarchical Code List, Concept Scheme, Dataflow, Category Scheme, Structure Set, Process, Organisation Scheme, Metadata Structure Definition, Metadata Flow).
- SDMX Web Service: based on the SDMX-RI Web Service Provider, it allows structural metadata to be queried and submitted. Data can also be extracted in different formats: SDMX, RDF, Google/DSPL, CSV, JSON.
- Metadata Web GUI: provides a graphical user interface for browsing, downloading, creating and submitting structural metadata. It can be used as a “switch” towards different SDMX Web Services based on the SDMX-RI, so that a user can browse metadata stored in distributed repositories. The application also allows the order of items in Code Lists to be managed and further items to be added to Code Lists that are already final.
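As a rough illustration of querying such a web service, the sketch below builds request URLs following the SDMX 2.1 RESTful path conventions (structure queries as resource/agencyID/resourceID/version, data queries as data/flowRef/key). It assumes the service exposes a REST interface; the base URL, the dataflow ID and the helper names are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical entry point; replace with the address of a real SDMX web service.
BASE_URL = "https://example.org/sdmx/rest"


def structure_url(resource, agency_id="all", resource_id="all", version="latest", **params):
    """URL for a structural metadata query, e.g. a code list or a DSD."""
    path = f"{BASE_URL}/{resource}/{agency_id}/{resource_id}/{version}"
    return f"{path}?{urlencode(params)}" if params else path


def data_url(flow_ref, key_values=(), **params):
    """URL for a data query.

    key_values holds one list per dimension, in DSD order: an empty list
    is a wildcard, several values are OR-ed together with '+'.
    """
    key = ".".join("+".join(values) for values in key_values) or "all"
    path = f"{BASE_URL}/data/{flow_ref}/{key}"
    return f"{path}?{urlencode(params)}" if params else path


codelist_query = structure_url("codelist", "ESTAT", "CL_FREQ")
data_query = data_url("DF_EXAMPLE", [["A"], ["IT", "FR"], []], startPeriod="2010")

print(codelist_query)
print(data_query)
```

The response format (SDMX-ML, JSON, CSV and so on) would normally be negotiated separately, typically through the HTTP Accept header; that part is not shown here.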

SDMX Istat Toolkit – modules (2/3)
- Meta Manager: it can perform many of the functions offered by the Metadata Web GUI, such as creating Code Lists, Concept Schemes, Category Schemes, Dataflows and Data Structure Definitions. Moreover, it makes it possible to overcome some SDMX constraints and modify finalized item scheme artefacts (e.g. Code Lists, Concept Schemes, Category Schemes):
  - add new items (deletion is not allowed)
  - modify name, description, annotations, etc.
  - handle the order and hierarchy of the items
  - move a Dataflow from one Category to another, or between different Category Schemes
  This application can also be used for building “nomenclature” servers, such as classification servers and glossaries.
- Data Manager (formerly Builder & Loader): creates an SDMX-compliant dissemination/reporting database. The database schema is created from DSDs and related artefacts. CSV and SDMX data files can be loaded into the database using a Web GUI.
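The idea of deriving a database schema from a DSD can be sketched as follows. This is not how the Data Manager is implemented; it is a minimal illustration that assumes a heavily simplified, hypothetical DSD (component IDs and roles only) and maps dimensions to key columns, the measure to a numeric column and attributes to optional text columns in an in-memory SQLite database.

```python
import sqlite3

# Hypothetical, heavily simplified DSD: only component IDs and their roles.
DSD = {
    "id": "DSD_EXAMPLE",
    "dimensions": ["FREQ", "REF_AREA", "TIME_PERIOD"],
    "measure": "OBS_VALUE",
    "attributes": ["OBS_STATUS"],
}


def create_table_sql(dsd):
    """Build a CREATE TABLE statement: dimensions form the primary key,
    the measure is numeric, attributes are optional text columns."""
    columns = [f'"{dim}" TEXT NOT NULL' for dim in dsd["dimensions"]]
    columns.append(f'"{dsd["measure"]}" REAL')
    columns += [f'"{attr}" TEXT' for attr in dsd["attributes"]]
    key = ", ".join(f'"{dim}"' for dim in dsd["dimensions"])
    columns.append(f"PRIMARY KEY ({key})")
    return f'CREATE TABLE "{dsd["id"]}" ({", ".join(columns)})'


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(DSD))

# Load one observation, as a CSV loader might do row by row.
conn.execute(
    'INSERT INTO "DSD_EXAMPLE" VALUES (?, ?, ?, ?, ?)',
    ("A", "IT", "2017", 1.5, "A"),
)
rows = conn.execute('SELECT * FROM "DSD_EXAMPLE"').fetchall()
print(create_table_sql(DSD))
print(rows)
```

Using the dimensions as the primary key means that a second observation with the same key is rejected by the database, which mirrors the role of the series and observation key in SDMX.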

SDMX Istat Toolkit – modules (3/3)
- Data Web Browser: interacts with SDMX-RI web services (or compliant services), allowing data users to browse, present and visualize datasets. It can be used within a single organization to disseminate datasets stored in one or more databases, or in the context of a “multi-source” project (Hub architecture), where several organizations expose their databases through SDMX Web Services based on the SDMX-RI.

A data user can:
- switch between the available dashboards
- switch between different distributed databases (web services)
- browse one or more theme trees and select the dataset of interest (the same tree leaf can categorize datasets coming from different databases)
- set filters for each dataset
- specify the layout of the table
- calculate cyclical and trend variations
- create graphs
- store queries (authenticated users only) that can be used in other working sessions
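The slide does not define “cyclical” and “trend” variation. The sketch below assumes the usage common in short-term statistics: cyclical variation as the percentage change on the previous period, and trend variation as the percentage change on the same period of the previous year. The function name and the quarterly figures are hypothetical.

```python
def percentage_change(series, lag):
    """Percentage change of each value on the value `lag` periods earlier.

    Positions without a usable reference value (too early in the series,
    or a reference value of zero) yield None.
    """
    changes = []
    for i, value in enumerate(series):
        if i < lag or series[i - lag] == 0:
            changes.append(None)
        else:
            reference = series[i - lag]
            changes.append(round((value - reference) / reference * 100, 2))
    return changes


# Hypothetical quarterly index values, oldest first.
quarterly = [100.0, 102.0, 101.0, 104.0, 105.0, 107.1]

# Assumption: cyclical = change on the previous quarter (lag 1),
# trend = change on the same quarter of the previous year (lag 4).
cyclical = percentage_change(quarterly, lag=1)
trend = percentage_change(quarterly, lag=4)

print(cyclical)
print(trend)
```

For monthly data the same function would be called with lag=12 for the year-on-year comparison.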

Appendix 4 Useful terms and concepts

Useful terms and concepts (1/5)
- Standard (ISO): a document that provides requirements, specifications, guidelines or characteristics that can be used consistently to ensure that materials, products, processes and services are fit for their purpose.
- Information System (IS): an organized system for the collection, organization, storage and communication of information. The term also covers the study of complementary networks that people and organizations use to collect, filter, process, create and distribute data.
- Computer(-based) information system: essentially an IS that uses computer technology (hardware, software, databases, networks, procedures) to carry out some or all of its planned tasks.
- Information Model: an abstract but formal representation of entities, including their properties, relationships and the operations that can be performed on them.

Useful terms and concepts (2/5)
- Web Services: an interface for a service-oriented architecture (see SOA), in which Web-based applications dynamically interact with each other using open standards that include XML, HTTP, UDDI and SOAP. Such applications typically run behind the scenes, one program "talking to" another, server to server or client to server.
- Tightly (highly) coupled systems: systems that are dependent upon each other.
- Loosely coupled systems: systems that interact when necessary, but remain uncoupled from each other.

Useful terms and concepts (3/5)

Information
- Microdata: typically address the data as fields and records
- Macrodata (aggregated data): data that can be operated upon as a “hypercube” of set dimensionality
- Metadata: data that are used to describe other data

Actors
- Data provider: the party that provides data to somebody else
- Data collector: the party that collects data provided by somebody else

Actions
- Push: the data provider starts the “data exchange” action and sends data to the data collector(s) using different means (mail, email, ad hoc systems such as eDamis, etc.)
- Pull: the data collector starts the “data exchange” action and grabs data directly from the data provider's database or file server. In this case the data provider “shares” the data

Useful terms and concepts (4/5)
Microdata: typically address the data as fields and records. Features that can be essential when operating on microdata include:
- the ability for a single field (e.g. respondent's annual income) to be harnessed for different purposes:
  - as a continuous measure
  - as a dimension (e.g. as the basis for categorisation by income range)
  - as an attribute (e.g. used at a person level as part of determining whether a household should be assigned the characteristic “double income, no kids (DINK)”)
- the ability to derive new unit level indicators via complex formulas (e.g. “decision tables”) applied across fields within a record and/or across related records
- relationships between different types and sets of records, for example:
  - person and household within a single survey
  - relationships between records for the same unit in multiple waves of a longitudinal study
  - probabilistic linking of records

Useful terms and concepts (5/5)
Macrodata: aggregate data is more commonly visualized, and operated upon, as a “hypercube” of set dimensionality. This can have many benefits for efficiency when, for example:
- understanding specific characteristics of a population (and comparing characteristics of subpopulations within that population) rather than details of individual respondents
- selecting and understanding a subset of aggregates which are of interest for a particular purpose
- identifying and analyzing time series, including trends
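The relationship between microdata and macrodata can be made concrete with a small sketch that aggregates unit records into the cells of a cube. The records, field names and the `aggregate` helper are hypothetical; the point is only that each combination of dimension values identifies one cell, and that individual respondents are no longer visible in the result.

```python
from collections import defaultdict

# Hypothetical microdata: one record per respondent, one field per variable.
RECORDS = [
    {"person_id": 1, "region": "North", "sex": "F", "income": 30000},
    {"person_id": 2, "region": "North", "sex": "M", "income": 20000},
    {"person_id": 3, "region": "South", "sex": "F", "income": 25000},
    {"person_id": 4, "region": "North", "sex": "F", "income": 40000},
]


def aggregate(records, dimensions, measure):
    """Aggregate unit records into cube cells keyed by dimension values.

    Each cell holds the number of records and the mean of the measure.
    """
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for record in records:
        key = tuple(record[dim] for dim in dimensions)
        totals[key]["count"] += 1
        totals[key]["sum"] += record[measure]
    return {
        key: {"count": cell["count"], "mean": cell["sum"] / cell["count"]}
        for key, cell in totals.items()
    }


# Two dimensions (region, sex) give a two-dimensional cube of mean income.
cube = aggregate(RECORDS, ("region", "sex"), "income")
for key, cell in sorted(cube.items()):
    print(key, cell)
```

Calling `aggregate` with a different tuple of dimensions, for example `("region",)`, gives a lower-dimensional view of the same microdata.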