Published by Caroline Atkinson. Modified over 5 years ago.
1
SDMX - Appendices
Francesco Rizzo, Istat
ESTP Training Course “Information standards and technologies for describing, exchanging and disseminating data and metadata”
Rome, June 2018
2
Appendix 1 SDMX messages and formats
3
SDMX data formats 2.0 vs 2.1
Simplified data formats:
Use case: single data set XML schema that supports all DSDs
  V 2.0: Generic (restricted to time series)
  V 2.1: SDMXDataGenericTimeSeries (restricted to time series); SDMXDataGeneric (supports both time series and non-time series)
Use case: specific time series data set XML schema for a DSD
  V 2.0: Compact; Utility
  V 2.1: SDMXDataStructureSpecificTimeSeries (restricted to time series) in place of Compact; Utility is discontinued
Use case: specific data set XML schema for non-time series
  V 2.0: Cross Sectional
  V 2.1: SDMXDataStructureSpecific (supports both time series and non-time series)
SDMX 2.0
Generic data message:
  No validation
  Carries data for any data structure definition
  Verbose: files are very large
  Can perform incremental updates and carry partial data sets
  Useful for applications which need to carry potentially incorrect data for processing and cleaning
  Useful for generic applications which handle data for more than one DSD
  Serves as a “pivot format” between other SDMX-ML format types
Utility data message:
  Provides the strongest validation: all business rules in the DSD are enforced by a generic XML parser (schemas are specific to particular DSDs)
  Less verbose than Generic; more verbose than Compact and Cross-Sectional
  Incremental updates not supported
  For XML tools, this is the most “normal” type of XML schema and performs best
Compact data message:
  Equivalent of the SDMX-EDI data format, but schemas are specific to a particular DSD
  Good for exchanging partial data sets and incremental updates
  Very compact (for XML) in terms of file sizes
  Very simple, but performs limited validation
  Will validate codelists, but not some other things
Cross-sectional data message:
  Similar to the Compact format, but allows for many observations for a single point in time (not time-series oriented like the other formats)
  Very compact
  Supports incremental updates
  Provides limited validation: schemas are specific to a particular DSD
SDMX 2.1
Simplified data formats:
  All data formats are more consistent
  Cross-sectional and time-series formats are more similar
Two families of data set:
  Generic (i.e. XML data set constructs support data for any DSD)
    SDMXDataGenericTimeSeries (time dimension at the observation level)
    SDMXDataGeneric (e.g. a dimension other than the time dimension at the observation level)
  Structure Specific (i.e. XML data set constructs specific to a DSD)
    SDMXDataStructureSpecificTimeSeries (time dimension at the observation level)
    SDMXDataStructureSpecific (e.g. all dimensions at the observation level, i.e. flat; or a dimension other than the time dimension at the observation level)
Note that the time series variants are identical in structure to the non-time series variants, but restrict the content to time series.
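The difference between the two SDMX 2.1 families, and the role of the dimension at the observation level, may be easier to see in a small sketch. The code below serializes the same observations once in a generic style (dimension ids carried as data) and once in a structure-specific style (dimension ids used as attribute names). It is a simplified illustration only: namespaces, headers and schema details are omitted, and the dimension ids and values are hypothetical, not taken from the slides.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical observations for an illustrative DSD (ids and values are assumptions).
OBSERVATIONS = [
    {"FREQ": "A", "REF_AREA": "IT", "TIME_PERIOD": "2016", "OBS_VALUE": "1.2"},
    {"FREQ": "A", "REF_AREA": "IT", "TIME_PERIOD": "2017", "OBS_VALUE": "1.5"},
    {"FREQ": "A", "REF_AREA": "FR", "TIME_PERIOD": "2016", "OBS_VALUE": "0.9"},
]


def group_series(observations, dim_at_obs):
    """Group observations into series keyed by every dimension except dim_at_obs."""
    series = defaultdict(list)
    for obs in observations:
        key = tuple(sorted(
            (dim, code) for dim, code in obs.items()
            if dim not in (dim_at_obs, "OBS_VALUE")
        ))
        series[key].append((obs[dim_at_obs], obs["OBS_VALUE"]))
    return dict(series)


def generic_dataset(observations, dim_at_obs="TIME_PERIOD"):
    """Generic style: the same element names work for any DSD."""
    root = ET.Element("DataSet")
    for key, points in group_series(observations, dim_at_obs).items():
        series = ET.SubElement(root, "Series")
        series_key = ET.SubElement(series, "SeriesKey")
        for dim, code in key:
            ET.SubElement(series_key, "Value", id=dim, value=code)
        for position, value in points:
            obs = ET.SubElement(series, "Obs")
            ET.SubElement(obs, "ObsDimension", value=position)
            ET.SubElement(obs, "ObsValue", value=value)
    return root


def structure_specific_dataset(observations, dim_at_obs="TIME_PERIOD"):
    """Structure-specific style: attribute names come from the DSD itself."""
    root = ET.Element("DataSet")
    for key, points in group_series(observations, dim_at_obs).items():
        series = ET.SubElement(root, "Series", dict(key))
        for position, value in points:
            ET.SubElement(series, "Obs", {dim_at_obs: position, "OBS_VALUE": value})
    return root


if __name__ == "__main__":
    print(ET.tostring(generic_dataset(OBSERVATIONS), encoding="unicode"))
    print(ET.tostring(structure_specific_dataset(OBSERVATIONS), encoding="unicode"))
    # A cross-sectional view: REF_AREA, not time, at the observation level.
    print(ET.tostring(structure_specific_dataset(OBSERVATIONS, "REF_AREA"), encoding="unicode"))
```

Passing `dim_at_obs="TIME_PERIOD"` corresponds to the time series variants; passing another dimension gives a cross-sectional grouping with the same element layout, which is the consistency the 2.1 formats aim for.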
4
Structure message
5
Structure message: focus on DSD
6
Appendix 2 SDMX Implementation in Istat
7
SDMX Istat Strategy as part of Stat2015 modernization program
8
Implementation steps of the SDMX strategy
Looking for the necessary funds to support the implementation
Developing a suitable cross-cutting architecture
Streamlining the internal capabilities and capacity building actions
Collaborating with other organizations
9
Results achieved so far
Developed the SDMX Istat Framework
Metadata management system in production:
  all disseminated datasets described through SDMX artefacts
  legacy “reference and quality metadata system” wrapped for extracting SDMX metadata sets
  developed APIs for handling SDMX artefacts for reference metadata
Dissemination data warehouse accessible through the SDMX Single Exit Point
Streamlined the reporting system
10
Istat - SDMX architecture
11
SDMX Istat toolkit
A set of pick-and-choose building blocks that helps a statistical office standardize and industrialize the dissemination/reporting process:
  metadata handling
  database building
  data loading
  data/metadata dissemination/reporting (M2M)
  data/metadata dissemination/reporting (GUI)
  data exchange between organizations (pull and push)
Subject-matter domain independent
Built using the SDMX Common API (SdmxSource.NET)
Complements and extends the SDMX-RI
Can be used for building:
  “distributed” data warehouses
  SDMX-based “stand-alone” dissemination systems
12
Lessons learnt
SDMX is mature enough to be used beyond the data exchange between data producers (NSIs) and data collectors (IOs):
  standardization and industrialization
  data sharing (facilitating open (statistical) data)
Re-using software, experience and know-how is the only way to reduce costs and move forward quickly
Capacity building actions are needed to create “consensus” and all the necessary capabilities
13
Appendix 3 SDMX Istat Toolkit
14
SDMX Istat toolkit
15
SDMX Istat Toolkit – modules (1/3)
Metadata Repository/Registry: based on the SDMX-RI Mapping Store, it handles SDMX structural metadata (Data Structure Definition; Code List; Hierarchical Code List; Concept Scheme; Dataflow; Category Scheme; Structure Set; Process; Organisation Scheme; Metadata Structure Definition; Metadata Flow).
SDMX Web Service: based on the SDMX-RI Web Service Provider, it allows structural metadata to be queried and submitted. Data can also be extracted in different formats: SDMX, RDF, Google/DSPL, CSV, JSON.
Metadata Web GUI: provides a graphical user interface for browsing, downloading, creating and submitting structural metadata. It can be used as a “switch” between different SDMX Web Services based on the SDMX-RI, so that a user can browse metadata stored in distributed repositories. The application also allows the order of items in Code Lists to be managed and further items to be added to already final Code Lists.
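As a rough illustration of how a client might address such a web service, the sketch below builds query URLs following the SDMX 2.1 RESTful pattern (`/{resource}/{agencyID}/{resourceID}/{version}` for structural metadata and `/data/{flowRef}/{key}` for data). The slides do not say which interfaces the toolkit's service exposes, so the use of REST is an assumption; the base URL, agency id, artefact ids and key are all hypothetical placeholders.

```python
from urllib.parse import quote, urlencode

# Hypothetical endpoint; replace with the address of a real SDMX web service.
BASE_URL = "https://example.org/sdmx/rest"


def structure_url(resource, agency_id="all", resource_id="all",
                  version="latest", base=BASE_URL):
    """Build a structural metadata query, e.g. for a codelist or datastructure."""
    parts = [resource, agency_id, resource_id, version]
    return base + "/" + "/".join(quote(part, safe="") for part in parts)


def data_url(flow_ref, key="all", base=BASE_URL, **params):
    """Build a data query.

    key is a dot-separated list of dimension values; an empty position acts
    as a wildcard and "+" separates alternative values for one dimension.
    """
    path = "/".join([base, "data", quote(flow_ref, safe=","), quote(key, safe=".+")])
    query = urlencode(params)
    return path + ("?" + query if query else "")


if __name__ == "__main__":
    # EXAMPLE_AGENCY, CL_FREQ and DF_EXAMPLE are made-up identifiers.
    print(structure_url("codelist", "EXAMPLE_AGENCY", "CL_FREQ"))
    print(data_url("DF_EXAMPLE", "A.IT.", startPeriod="2016"))
```

Only URL construction is shown; sending the request and choosing the response format are left out because they depend on the specific service.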
16
SDMX Istat Toolkit – modules (2/3)
Meta Manager: performs many of the functions offered by the Metadata Web GUI, such as creating Codelists, Concept Schemes, Category Schemes, Dataflows and Data Structure Definitions. Moreover, it overcomes some SDMX constraints by allowing finalized item scheme artefacts (e.g. Codelists, Concept Schemes, Category Schemes) to be modified:
  add new items (deletion is not allowed)
  modify name, description, annotations, etc.
  handle the order and hierarchy of the items
  move a Dataflow from one Category to another, or between different Category Schemes
This application can also be used for building “nomenclature” servers, such as classification servers and glossaries.
Data Manager (formerly Builder & Loader): creates an SDMX-compliant dissemination/reporting database. The database schema is created from DSDs and related artefacts. CSV and SDMX data files can be loaded into the database using a Web GUI.
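To give a feel for what deriving a database schema from a DSD and then loading a CSV file could involve, here is a minimal sketch using an in-memory SQLite database. It is not the Data Manager's actual schema or loading logic, which the slides do not describe: the simplified DSD dictionary, the one-table-per-DSD layout and all identifiers are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical, heavily simplified DSD: component ids only, no codelists.
DSD = {
    "id": "DSD_EXAMPLE",
    "dimensions": ["FREQ", "REF_AREA", "TIME_PERIOD"],
    "attributes": ["OBS_STATUS"],
    "measure": "OBS_VALUE",
}

CSV_TEXT = """FREQ,REF_AREA,TIME_PERIOD,OBS_VALUE,OBS_STATUS
A,IT,2016,1.2,A
A,IT,2017,1.5,P
"""


def create_table_sql(dsd):
    """Derive a CREATE TABLE statement: one column per DSD component."""
    columns = [f'"{dim}" TEXT NOT NULL' for dim in dsd["dimensions"]]
    columns.append(f'"{dsd["measure"]}" REAL')
    columns += [f'"{attr}" TEXT' for attr in dsd["attributes"]]
    # The full set of dimensions identifies an observation.
    primary_key = ", ".join(f'"{dim}"' for dim in dsd["dimensions"])
    column_sql = ", ".join(columns)
    return f'CREATE TABLE "{dsd["id"]}" ({column_sql}, PRIMARY KEY ({primary_key}))'


def load_csv(conn, dsd, csv_text):
    """Insert CSV rows whose header matches the DSD component ids."""
    columns = dsd["dimensions"] + [dsd["measure"]] + dsd["attributes"]
    column_sql = ", ".join(f'"{col}"' for col in columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f'INSERT INTO "{dsd["id"]}" ({column_sql}) VALUES ({placeholders})'
    rows = [[row[col] for col in columns]
            for row in csv.DictReader(io.StringIO(csv_text))]
    conn.executemany(sql, rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(DSD))
loaded = load_csv(conn, DSD, CSV_TEXT)

if __name__ == "__main__":
    print(create_table_sql(DSD))
    print(loaded, "rows loaded")
```

Using the dimensions as the primary key means that loading the same observation twice fails, which is one simple way a DSD-driven schema can enforce structural rules.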
17
SDMX Istat Toolkit – modules (3/3)
Data Web Browser: interacts with SDMX-RI (or compliant) web services, allowing data users to browse, present and visualize datasets. It can be used within a single organization to disseminate datasets stored in one or more databases, or in the context of a “multi-source” project (Hub architecture), where several organizations expose their databases through SDMX Web Services based on the SDMX-RI. A data user can:
  switch between the available dashboards;
  switch between different distributed databases (web services);
  browse one or more theme trees and select the dataset of interest (the same tree leaf can categorize datasets coming from different databases);
  set filters for each dataset;
  specify the layout of the table;
  calculate cyclical and trend variations;
  create graphs;
  store queries (only for authenticated users) that can be used in other working sessions.
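The slide does not define the cyclical and trend variations. One common reading, assumed here, is that the cyclical variation is the percentage change on the previous period and the trend variation is the percentage change on the same period of the previous year. Under that assumption, a minimal sketch with made-up quarterly values looks like this:

```python
def pct_change(values, lag):
    """Percentage change of each value against the value `lag` positions earlier.

    Positions without a valid reference value (too early, or reference is 0)
    yield None.
    """
    changes = []
    for i, value in enumerate(values):
        if i < lag or values[i - lag] == 0:
            changes.append(None)
        else:
            reference = values[i - lag]
            changes.append((value - reference) / reference * 100)
    return changes


def cyclical_variation(values):
    """Assumed meaning: change on the previous period."""
    return pct_change(values, lag=1)


def trend_variation(values, periods_per_year):
    """Assumed meaning: change on the same period of the previous year."""
    return pct_change(values, lag=periods_per_year)


if __name__ == "__main__":
    # Hypothetical quarterly index values, oldest first.
    quarterly = [100.0, 102.0, 101.0, 103.0, 110.0]
    print(cyclical_variation(quarterly))
    print(trend_variation(quarterly, periods_per_year=4))
```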
18
Appendix 4 Useful terms and concepts
19
Useful terms and concepts (1/5)
Standard (ISO): a document that provides requirements, specifications, guidelines or characteristics that can be used consistently to ensure that materials, products, processes and services are fit for their purpose.
Information System (IS): an organized system for the collection, organization, storage and communication of information. As a field, it is the study of the complementary networks that people and organizations use to collect, filter, process, create and distribute data.
Computer(-based) information system: essentially an IS that uses computer technology (hardware, software, databases, networks, procedures) to carry out some or all of its planned tasks.
Information Model: an abstract but formal representation of entities, including their properties, relationships and the operations that can be performed on them.
20
Useful terms and concepts (2/5)
Web Services: an interface for a service-oriented architecture (see SOA), in which Web-based applications dynamically interact with each other using open standards that include XML, HTTP, UDDI and SOAP. Such applications typically run behind the scenes, one program "talking to" another, server to server or client to server.
Tightly (Highly) Coupled systems: systems that are dependent upon each other.
Loosely Coupled systems: systems that interact when necessary, but remain uncoupled from each other.
21
Useful terms and concepts (3/5)
Information
  Microdata: typically address the data as fields and records
  Macrodata (aggregated data): data that can be operated upon as a “hypercube” of set dimensionality
  Metadata: data that are used to describe other data
Actors
  Data provider: who provides data to somebody else
  Data collector: who collects data provided by somebody else
Actions
  Push: the data provider starts the “data exchange” action and sends data to the data collector(s) using different means (mail, ad-hoc systems such as eDamis, etc.)
  Pull: the data collector starts the “data exchange” action and grabs data directly from the data provider’s database or file server. In this case the data provider “shares” the data
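The difference between push and pull comes down to which actor starts the exchange. The toy sketch below models both with plain Python objects; the class and method names are invented for illustration and stand in for whatever transport (mail, file server, web service) is actually used.

```python
class DataCollector:
    """Collects data provided by somebody else."""

    def __init__(self):
        self.inbox = {}

    def receive(self, provider_name, dataset_id, data):
        self.inbox[(provider_name, dataset_id)] = data

    def pull(self, provider, dataset_id):
        """Pull: the collector starts the exchange and fetches shared data."""
        self.receive(provider.name, dataset_id, provider.share(dataset_id))


class DataProvider:
    """Provides data to somebody else."""

    def __init__(self, name, datasets):
        self.name = name
        self._datasets = dict(datasets)

    def share(self, dataset_id):
        """Expose a dataset so that collectors can fetch it on demand."""
        return self._datasets[dataset_id]

    def push(self, dataset_id, collector):
        """Push: the provider starts the exchange and sends the data."""
        collector.receive(self.name, dataset_id, self._datasets[dataset_id])


if __name__ == "__main__":
    # Hypothetical provider and datasets.
    provider = DataProvider("NSI_EXAMPLE", {"DS1": [1.2, 1.5], "DS2": [0.9]})
    collector = DataCollector()
    provider.push("DS1", collector)   # provider-initiated
    collector.pull(provider, "DS2")   # collector-initiated
    print(collector.inbox)
```

In both cases the collector ends up with the same data; only the initiating side differs, which matters for scheduling and for who has to know about whom.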
22
Useful terms and concepts (4/5)
Microdata: typically address the data as fields and records. Features which can be essential when operating on microdata include:
  the ability for a single field (e.g. respondent’s annual income) to be harnessed for different purposes:
    as a continuous measure
    as a dimension (e.g. as the basis for categorisation by income range)
    as an attribute (e.g. used at a person level as part of determining whether a household should be assigned the characteristic “double income, no kids (DINK)”)
  the ability to derive new unit-level indicators via complex formulas (e.g. “decision tables”) applied across fields within a record and/or across related records
  relationships between different types and sets of records, for example:
    person and household within a single survey
    relationships between records for the same unit in multiple waves of a longitudinal study
    probabilistic linking of records
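A small sketch may make the three uses of a single field more concrete. It treats annual income as a continuous measure (a mean), as a dimension (an income range) and as an input to a household-level indicator (DINK) derived across related person records. The records, income thresholds and the exact DINK rule are illustrative assumptions, not definitions from the slides.

```python
from collections import defaultdict

# Hypothetical person records; household_id links persons to households.
PERSONS = [
    {"person_id": 1, "household_id": "H1", "age": 34, "annual_income": 32000},
    {"person_id": 2, "household_id": "H1", "age": 36, "annual_income": 41000},
    {"person_id": 3, "household_id": "H2", "age": 40, "annual_income": 55000},
    {"person_id": 4, "household_id": "H2", "age": 8, "annual_income": 0},
]


def mean_income(persons):
    """Income as a continuous measure."""
    return sum(p["annual_income"] for p in persons) / len(persons)


def income_range(income):
    """Income as a dimension: categorise by range (thresholds are assumptions)."""
    if income < 20000:
        return "LOW"
    if income < 50000:
        return "MEDIUM"
    return "HIGH"


def households(persons):
    """Relate person records to their household."""
    groups = defaultdict(list)
    for person in persons:
        groups[person["household_id"]].append(person)
    return dict(groups)


def is_dink(members):
    """Derived household indicator: exactly two earners and no member under 18."""
    earners = sum(1 for m in members if m["annual_income"] > 0)
    children = sum(1 for m in members if m["age"] < 18)
    return earners == 2 and children == 0


if __name__ == "__main__":
    print(mean_income(PERSONS))
    print([income_range(p["annual_income"]) for p in PERSONS])
    print({hid: is_dink(members) for hid, members in households(PERSONS).items()})
```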
23
Useful terms and concepts (5/5)
Macrodata: aggregate data is more commonly visualized, and operated upon, as a “hypercube” of set dimensionality. This can have many benefits for efficiency when, for example:
  understanding specific characteristics of a population (and comparing characteristics of subpopulations within that population) rather than details of individual respondents
  selecting and understanding a subset of aggregates which are of interest for a particular purpose
  identifying and analyzing time series, including trends
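The hypercube idea can be sketched as a mapping from a tuple of dimension values to an aggregate. The example below builds such a cube of mean income from unit records, selects a subset of cells and extracts a time series from it. Dimension ids, records and the choice of the mean as the aggregate are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical dimensions of the cube, in key order.
DIMENSIONS = ("REF_AREA", "SEX", "TIME_PERIOD")

# Hypothetical unit records to be aggregated.
RECORDS = [
    {"REF_AREA": "IT", "SEX": "F", "TIME_PERIOD": "2016", "income": 30000},
    {"REF_AREA": "IT", "SEX": "F", "TIME_PERIOD": "2016", "income": 34000},
    {"REF_AREA": "IT", "SEX": "M", "TIME_PERIOD": "2016", "income": 31000},
    {"REF_AREA": "IT", "SEX": "F", "TIME_PERIOD": "2017", "income": 36000},
    {"REF_AREA": "FR", "SEX": "M", "TIME_PERIOD": "2017", "income": 40000},
]


def build_cube(records, dimensions, measure):
    """Aggregate records into cells keyed by dimension values (mean of measure)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        key = tuple(record[dim] for dim in dimensions)
        sums[key] += record[measure]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def slice_cube(cube, dimensions, **fixed):
    """Select the cells whose key matches every fixed dimension value."""
    index = {dim: i for i, dim in enumerate(dimensions)}
    return {
        key: value for key, value in cube.items()
        if all(key[index[dim]] == code for dim, code in fixed.items())
    }


def time_series(cube, dimensions, time_dim="TIME_PERIOD", **fixed):
    """Extract (period, value) pairs, ordered by period, for a fixed slice."""
    t = dimensions.index(time_dim)
    selected = slice_cube(cube, dimensions, **fixed)
    return sorted((key[t], value) for key, value in selected.items())


cube = build_cube(RECORDS, DIMENSIONS, "income")

if __name__ == "__main__":
    print(slice_cube(cube, DIMENSIONS, TIME_PERIOD="2016"))
    print(time_series(cube, DIMENSIONS, REF_AREA="IT", SEX="F"))
```

Fixing every dimension except time yields a time series, while fixing only the time period yields a cross-section of the same cube.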