SDMX training session on basic principles, data structure definitions and data file implementation, 29 November 2007


1 SDMX training session on basic principles, data structure definitions and data file implementation
29 November 2007

2 A - Introduction

3 Purpose of the training session
- Provide an understanding of the basic SDMX principles (DSD and data set implementation)
- Provide knowledge of the SDMX standard and its XML implementation
- Present ESTAT tools as case studies illustrating their scope and usage

4 Current practices
Current practices in data and metadata exchange:
- Legal framework (Commission Regulations, Council Regulations, etc.)
- Data and metadata files, questionnaires, quality reports, etc.
- Format (paper form, EDIFACT, XML, structured files, etc.)
- Media (e-mail, file upload, web form, removable media, dial-up, etc.)

5 The need for a standard…
- Enhance electronic data and metadata exchange
- Enhance the availability of statistical data and metadata for users
- Promote interoperability between different systems
- Improve the quality of transmitted data (timeliness and punctuality, accessibility and clarity, accuracy, comparability)

6 SDMX (Statistical Data and Metadata eXchange)
- Initiative to standardise the exchange of statistical data and metadata
- 7 sponsors (BIS, ECB, ESTAT, IMF, OECD, UN, WB)
- “Push” and “pull” modes
- Use of XML technologies to promote interoperability
- Basic principles:
  - Data Structure Definitions (DSD) and Metadata Structure Definitions (MSD)
  - SDMX registries
  - Data on the web using SDMX

7 SDMX (cont.)
Exchange and sharing of statistical information:
- Statistical data
- Statistical metadata
  - Structural metadata
  - Reference metadata
- Emphasis on macro-data (aggregated statistics)
Promotes a “data sharing” model:
- low cost
- high quality of transmitted data
- interoperability between (otherwise) incompatible systems
Data sharing based on SDMX:
- reduces the reporting burden of organisations
- allows them to publish data once and let their counterparties “pull” data and related metadata as required
How this is achieved:
- an abstract Information Model (SDMX-IM) capable of supporting any time-series and cross-sectional data, structural metadata and reference metadata
- standardised XML schemas derived from the model (SDMX-ML)
- the use of web-services technology
The data sharing process is based on an architecture of central registry services. Registry services provide visibility into the data and metadata existing within the community, and support access to and use of this data and metadata by providing a set of triggers for automated processing. The data itself is not stored in a central registry; these services merely provide a useful set of metadata about the data (and additional metadata) in a known location, so that users and applications can easily locate and obtain whatever data or metadata is registered. Standards are used throughout for the data, the metadata and the registry services themselves, which permits a high level of automation within a data-sharing community.

8 B – SDMX Core Elements

9 EXAMPLE DATASET1

10 EXAMPLE DATASET2

11 SDMX Information Model
- The SDMX Information Model (SDMX-IM) is a conceptual model from which syntax-specific implementations are developed.
- The SDMX-IM provides for the structuring not only of data, but also of “reference” metadata.
- The model is constructed as a set of structures which assist in the understanding, re-use and maintenance of the model:
  - Data Structure Definition and Metadata Structure Definition
  - Dataflows and Datasets
  - Data Provisioning
- All technology specifications in SDMX are implementations of the conceptual model.

12 Structures in the SDMX-IM
- Components
- Concept Scheme
  - Concept
- Code List
  - Code
- Category Scheme
  - Category
- Organisation Scheme
  - Organisation
  - Organisation Role: DataProvider, DataConsumer, MaintenanceAgency
- Data Structure Definition (DSD)
  - Dimensions
  - Attributes
  - Measures
  - Groups

13 Structures in the SDMX-IM (cont.)
Fundamental parts:
- Structural metadata (DSD, concepts, code lists)
- Observational data (organised set of numeric observations)
- Reference metadata
Definitions:
- Data Structure Definition (DSD): the set of structural metadata needed to understand the structure of a dataset; it defines the valid content and structure of a dataset
- Dataflow Definition: a description of the dataset which identifies, categorises and constrains the allowable content of the dataset; it links a Dataset to a specific DSD
- Dataset: an organised collection of statistical data; the “container” of a Dataflow Definition for an instance of the data
- Code lists and Codes: lists of predefined values to be used within the DSD
- Concept Schemes and Concepts: a concept is a statistical characteristic used within a DSD

14 Structures in the SDMX-IM (cont.)
- Code lists and Codes: lists of predefined values to be used within the DSD. Code lists enumerate the set of values used to represent several structural components of SDMX.
- Concept Schemes and Concepts: a concept is a statistical characteristic used within a DSD. Additional properties can be defined for concepts:
  - Name and description in various locales
  - Default representation (coded or uncoded)
  - Semantic hierarchies of concepts
- Category Schemes and Categories: a category scheme is a hierarchy of categories (subject-matter domains), which in SDMX may include any type of classification useful for organising data and metadata. A Dataflow may be linked to many Categories.
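The relationship between code lists and concepts described on this slide can be sketched in a few lines of Python. This is only an illustration, not part of SDMX or of any ESTAT tool: the class names, the `FREQ` concept id and the codes shown are assumptions made for the example (only the code list id `CL_FREQ` appears elsewhere in this session).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CodeList:
    """A list of predefined values, keyed by code id (hypothetical class)."""
    id: str
    codes: Dict[str, str] = field(default_factory=dict)  # code id -> description

    def is_valid(self, code: str) -> bool:
        return code in self.codes


@dataclass
class Concept:
    """A statistical characteristic with names in several locales."""
    id: str
    names: Dict[str, str]                        # locale -> name
    default_codelist: Optional[CodeList] = None  # None means "uncoded"

    def name(self, locale: str, fallback: str = "en") -> str:
        # Fall back to the default locale when no translation exists.
        return self.names.get(locale, self.names[fallback])


# Illustrative content only; the codes and names are assumptions.
cl_freq = CodeList("CL_FREQ", {"A": "Annual", "Q": "Quarterly", "M": "Monthly"})
freq = Concept("FREQ", {"en": "Frequency", "de": "Frequenz"}, cl_freq)
```

A concept with `default_codelist=None` would correspond to an uncoded (free-text) representation.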

15 DSD components
- Dimension (e.g. frequency, reference area):
  - Classificatory variable used to identify subsets or single observations
  - Defines the key descriptor for reporting Datasets
- Attribute (e.g. title, observation status):
  - Adds metadata about the observations
  - Can be attached at four possible levels (Observation, Time Series / Cross-Sectional data, Group, Data Set)
- Measure (e.g. turnover index, number of births, number of deaths):
  - Data (uncoded / unclassified) that can be reported (the observation value)
  - Primary (time-series data) or Cross-Sectional (cross-sectional data)
- Groups:
  - Grouping of dimensions in order to attach group attributes (e.g. sibling group)
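As a rough sketch of how these components fit together, the following Python models a DSD whose ordered dimensions form the key of a series. All identifiers (`DEMO_DSD`, `FREQ`, `REF_AREA`, `OBS_STATUS`, `TITLE`, `OBS_VALUE`) and the dot-separated key notation are assumptions chosen for illustration, and the time dimension is deliberately left out of the key.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Dimension:
    concept: str
    allowed: Set[str]  # codes permitted for this dimension


@dataclass
class DataStructureDefinition:
    id: str
    dimensions: List[Dimension]  # their order defines the key
    attributes: Dict[str, str]   # attribute concept -> attachment level
    measure: str                 # concept holding the observation value

    def series_key(self, values: Dict[str, str]) -> str:
        """Validate one code per dimension and join them into a key."""
        parts = []
        for dim in self.dimensions:
            code = values[dim.concept]
            if code not in dim.allowed:
                raise ValueError(f"{code!r} is not a valid code for {dim.concept}")
            parts.append(code)
        return ".".join(parts)


# Hypothetical DSD used only for this example.
dsd = DataStructureDefinition(
    id="DEMO_DSD",
    dimensions=[Dimension("FREQ", {"M", "Q", "A"}), Dimension("REF_AREA", {"EE", "FR"})],
    attributes={"OBS_STATUS": "Observation", "TITLE": "Time Series"},
    measure="OBS_VALUE",
)
print(dsd.series_key({"FREQ": "M", "REF_AREA": "EE"}))  # prints M.EE
```

Attributes are kept separate from the key because they describe observations rather than identify them.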

16 Data Structure Definition
Examples:
- Time-series dataset: STS domain, Turnover Index for Retail Trade and Repair DSD
- Cross-sectional dataset: Demography domain, Rapid questionnaire DSD

17 STS Sample Dataset
[Figure: sample dataset with its Dimensions, Attributes and Measure marked]

18 STS DSD components
Dataflow: STSRTD_TURN_M

19 Demography Sample Dataset
[Figure: sample dataset with its Measures, Attributes and Dimensions marked]

20 Demography DSD components
Dataflow: DEMOGRAPHY_RQ

21 Data Provisioning
- A Data Provider can provide data or metadata for many Dataflows using an agreed data structure.
- A Dataflow may incorporate data coming from more than one Data Provider.
- A Provision Agreement records which data providers supply what data to which Dataflows.
- A Dataflow may be linked to one or more Categories (subject-matter domains) from different Category Schemes.
- Regulations
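The many-to-many relationship between Data Providers and Dataflows can be pictured as a list of provision agreements. In the sketch below, the provider ids `NSI_A` and `NSI_B` are hypothetical; the dataflow ids are the two used as examples in this session.

```python
from collections import defaultdict
from typing import NamedTuple


class ProvisionAgreement(NamedTuple):
    """Links one data provider to one dataflow (illustrative structure)."""
    provider: str
    dataflow: str


agreements = [
    ProvisionAgreement("NSI_A", "STSRTD_TURN_M"),
    ProvisionAgreement("NSI_A", "DEMOGRAPHY_RQ"),
    ProvisionAgreement("NSI_B", "STSRTD_TURN_M"),
]

# Index the agreements in both directions.
flows_by_provider = defaultdict(set)
providers_by_flow = defaultdict(set)
for pa in agreements:
    flows_by_provider[pa.provider].add(pa.dataflow)
    providers_by_flow[pa.dataflow].add(pa.provider)
```

Here `NSI_A` reports to two dataflows, while `STSRTD_TURN_M` receives data from two providers.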

22 Identification, Versioning & Maintenance
- Identification: every structural element must have a semantic identifier (e.g. CL_UNIT)
- Versioning: a specific element may have different versions (updates of the element)
- Maintenance: some structures must be maintained by an organisation
- Unique identification: id + version + agency
  - id: CL_UNIT, version: 1.0, agency: ESTAT
  - id: CL_UNIT, version: 1.0, agency: ECB
- Internationalisation: multiple languages can be used to describe any element
- SDMX-IM covers aggregate data and metadata in all domains (it is not domain-specific)
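The CL_UNIT example shows why the agency is part of the identity: two artefacts may share an id and a version and still be different objects. A minimal Python sketch, assuming a hypothetical `ArtefactRef` type and an in-memory dictionary standing in for a registry:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtefactRef:
    """Identity of a structural element: agency + id + version."""
    agency: str
    id: str
    version: str


# frozen=True makes instances hashable, so they can serve as dictionary keys.
registry = {
    ArtefactRef("ESTAT", "CL_UNIT", "1.0"): "unit code list maintained by ESTAT",
    ArtefactRef("ECB", "CL_UNIT", "1.0"): "unit code list maintained by ECB",
}

estat_ref = ArtefactRef("ESTAT", "CL_UNIT", "1.0")
ecb_ref = ArtefactRef("ECB", "CL_UNIT", "1.0")
```

Looking up `registry[estat_ref]` and `registry[ecb_ref]` returns two distinct entries even though id and version coincide.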

23 SDMX High-level View
[Diagram: only the labels survive]
Artefacts: Category Scheme, Category, Data or Metadata Structure Definition, Data or Metadata Flow, Data Provider, Provision Agreement, Registered Data or Metadata Set, Data or Metadata Set
Relationship labels:
- can have child categories
- comprises subject or reporting categories
- uses specific data/metadata structure
- can be linked to categories in multiple category schemes
- conforms to business rules of the data/metadata flow
- can get data from multiple data providers
- can provide data or metadata for many data or metadata flows using agreed data or metadata structure
- is registered for

24 Tools Demonstration

25 SDMX Registry
A repository for keeping:
- Structural metadata (e.g. CodeLists, ConceptSchemes, DSDs)
- Provisioning information (e.g. Dataflows, Provision Agreements)
The repository is accessible via a web service accepting SDMX-ML messages, and via a graphical user interface (GUI) for user interaction over the Web.
- A type of technology which can be operated by many different users, each for the purposes of their own statistical community
- A part of the overall SDMX package of standards
- An application accessible to other programs over the Internet

26 Data Structure Wizard
DSW: “standalone” application (replacing the AccessDB tool)
Main functionalities:
- Manage data structures (create, modify, delete, query)
- Import/export SDMX-ML structures (validate structure messages)
- Import/export GESMES/TS structure files
- Create data messages
- Query the SDMX Registry
- Submit data structures to the SDMX Registry

27 Example - DSD creation using the DSW
Live demonstration of the DSW in order to create datasets conforming with the previously created DSD

28 Example
Dimensions: Frequency (CL_FREQ), Reference Area (CL_AREA_EE), Time period, Product (CL_PRODUCT)
Attributes: Compilation Confidentiality Status Availability
Group

29 C – SDMX-ML Data sets

30 Syntaxes for SDMX data
Based on a common Information Model:
- SDMX-EDI (GESMES/TS)
  - EDIFACT syntax
  - Time-series oriented
  - One format for Data Sets
- SDMX-ML
  - XML syntax
  - Four different formats for Data Sets
  - Easier validation (XML based)
- Tools enable us to use the desired format

31 SDMX-ML Data Messages
Equivalent representations for reporting Datasets:
- Generic message: one schema, not domain-specific
- Compact message: format for large-volume exchange of data; schema is specific to a DSD
- Utility message: format for advanced validation; schema is specific to a DSD
- Cross-Sectional message: format for non-time-series data; schema is specific to a DSD
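To give a feel for the difference between the Generic and Compact representations, the sketch below writes the same series in two simplified XML shapes using Python's standard library. It is not schema-valid SDMX-ML: namespaces, headers and most details are omitted, the element and attribute names are a simplification, and the concept ids and observation values are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up sample series: key values plus (period, value) observations.
key = {"FREQ": "M", "REF_AREA": "EE"}
observations = [("2007-01", "102.3"), ("2007-02", "104.1")]


def generic_series(key, observations):
    """Generic style: concepts are named in attribute *values*, so one
    schema can describe data for any DSD."""
    series = ET.Element("Series")
    series_key = ET.SubElement(series, "SeriesKey")
    for concept, value in key.items():
        ET.SubElement(series_key, "Value", concept=concept, value=value)
    for period, obs_value in observations:
        obs = ET.SubElement(series, "Obs")
        ET.SubElement(obs, "Time").text = period
        ET.SubElement(obs, "ObsValue", value=obs_value)
    return series


def compact_series(key, observations):
    """Compact style: concepts become XML attribute *names*, so the
    schema depends on the DSD but the message is much shorter."""
    series = ET.Element("Series", attrib=dict(key))
    for period, obs_value in observations:
        ET.SubElement(series, "Obs", TIME_PERIOD=period, OBS_VALUE=obs_value)
    return series


generic_xml = ET.tostring(generic_series(key, observations), encoding="unicode")
compact_xml = ET.tostring(compact_series(key, observations), encoding="unicode")
print(generic_xml)
print(compact_xml)
```

Both strings carry the same information; the compact one is shorter because the concept names are moved from the message into the DSD-specific schema.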

32 The SDMX-ML Time-Series format
- Used for representing time-series data
- Contains related metadata as defined in DSDs
- Three different (equivalent) representations available:
  - Generic message
  - Compact message
  - Utility message

33 Generic Dataset

34 Compact Dataset

35 Utility Dataset

36 The SDMX-ML Cross-Sectional data format
- Used for representing non-time-series data
- Contains related metadata as defined in DSDs
- Two different representations available:
  - Generic message
  - Cross-Sectional message

37 Cross-Sectional Dataset

38 Conversions
Equivalent formats:
- Any SDMX-ML format can be converted to another
- All are based on the same IM
- Exception: if a Cross-Sectional DSD does NOT contain a time dimension, its data cannot be expressed in the time-series formats
Conversions:
- Between the SDMX-ML formats
- Can be extended to other formats (e.g. CSV, GESMES)

39 D – Producing SDMX-ML Data sets

40 Reporting and Dissemination Guidelines
- Define and classify all the underlying concepts of a dataset
- Provide the specification of the DSD:
  - Name and identifier
  - List of statistical concepts
  - List of metadata concepts
  - List of code lists
- Provide the related Dataflows (e.g. STSRTD_TURN_M, DEMOGRAPHY_RQ)
- List the Mandatory attributes (e.g. reference area, frequency), and the Conditional ones

41 Message Implementation Guidelines (MIG)
Comprises:
- DSD details (id, version, agencyID)
- Dimensions (concepts, representations, dimension types such as frequency, entity, count, etc., attachment level)
- Measure (primary or cross-sectional)
- Attributes (concept, representation, assignment status (mandatory or conditional), attachment level, attribute type, attachment measure)
- Groups (subset of dimensions)

42 Structure of a MIG document
- DSD table
- Dataflows table
- Referenced concept schemes
- Referenced code lists
- Detailed explanation of the Generic SDMX-ML sample dataset
- Detailed explanation of the Compact (or Cross-Sectional) SDMX-ML sample dataset

43 Example - Data Set creation using the DSW
Live demonstration of the DSW in order to create datasets conforming with the previously created DSD

44 SDMX Converter
Main functionality:
- Reading the input message
  - parsing the message
  - populating the data model of the tool (based on the SDMX v2.0 information model)
- Writing the converted message
  - uses the data model to write the output message in the required target format
- Information retrieved from the Registry
  - The dataflow ID is used to retrieve the dataflow definition from the Registry.
  - The DSD reference is taken from the dataflow definition and is used to retrieve the DSD.
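The read, populate, write sequence can be illustrated with a toy CSV-to-compact conversion in Python. This is not the SDMX Converter's code or API: the column names, the `DIMENSIONS` list (which in the real tool would come from the DSD retrieved via the Registry), the sample values and the simplified, namespace-free output are all assumptions made for the sketch.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed key dimensions; a real converter would read these from the DSD.
DIMENSIONS = ["FREQ", "REF_AREA"]

# Made-up input data.
CSV_INPUT = """FREQ,REF_AREA,TIME_PERIOD,OBS_VALUE
M,EE,2007-01,102.3
M,EE,2007-02,104.1
M,FR,2007-01,99.8
"""


def read_csv(text):
    """Step 1: parse the input and populate an in-memory model
    {series key tuple: [(period, value), ...]}."""
    model = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = tuple(row[d] for d in DIMENSIONS)
        model.setdefault(key, []).append((row["TIME_PERIOD"], row["OBS_VALUE"]))
    return model


def write_compact(model):
    """Step 2: write the model in a simplified compact-style XML layout."""
    dataset = ET.Element("DataSet")
    for key, observations in model.items():
        series = ET.SubElement(dataset, "Series", dict(zip(DIMENSIONS, key)))
        for period, value in observations:
            ET.SubElement(series, "Obs", TIME_PERIOD=period, OBS_VALUE=value)
    return ET.tostring(dataset, encoding="unicode")


model = read_csv(CSV_INPUT)
output = write_compact(model)
print(output)
```

Because the writer depends only on the in-memory model, supporting another target format means adding another writer function, with the reader left unchanged.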

45 SDMX Converter (cont.)
Tool utility:
- You may already have data in a format other than SDMX-ML (e.g. CSV, GESMES/TS)
  - CSV → Compact SDMX-ML
- You may want further validation of your data
  - Compact SDMX-ML → Utility SDMX-ML
Conversions:
- From CSV to any type
- From SDMX-ML to any type
- From SDMX-EDI to any type

46 Conversion Example

