1
Basics
David Barraclough
OECD SDMX Coordinator
2
Overview
  What is SDMX?
  Why SDMX?
  SDMX at OECD
  How to start with SDMX?
  Some SDMX concepts
  How is data exchanged
  The main tools
  Content-oriented guidelines
  Future of SDMX
3
What is SDMX (not)? Not simply a technical format!
4
What is SDMX?
  Statistical Data and Metadata eXchange
  Released in 2002
  “SDMX is an initiative to foster standards for the exchange of statistical information.”
  Sponsor organisations: BIS, ECB, EUROSTAT, IMF, OECD, UN, World Bank
5
What is SDMX?
  Format: XML and EDI (rebranded GESMES)
  SDMX Information model
  Web service standards: APIs
  SDMX Registry standards
  Content-oriented guidelines
6
Why SDMX? The Business Case
  Reusable, open-source (free) tools save money and time
  Standard codes and naming help improve reuse and save time
    Reuse of categories
    Less mapping/data processing saving
    Shopping list of concepts when defining structures
  Strongly-typed structures help improve validation and processing
    Heavy-lifting processing of data messages can be automated
  Text format is human-readable
    Easier to create new tools around the agreed format
7
Why SDMX? The Business Case
  Standard technical architecture promotes more timely, better quality data
    Timely because less manual conversion is needed
    Quality because automated processing means less human error
  SDMX Information model
    Provides a common terminology
    Makes tool development much easier
    Information model described later
8
What’s in it for Data Reporters?
  SDMX Registry helps structure metadata
  SDMX Tools exist and are free
  One dissemination channel instead of packaging data for multiple consumers
  Can easily disseminate SDMX from existing data warehouse with SDMX-RI
  Lots of SDMX methodology available and growing
9
Why not use…? / Issues
  CSV
    Not structured, hard to validate
    No metadata
  Excel
    Metadata tied to presentation
    Proprietary format
    Licensing
    Hard to process and automate
  FAME, SAS, STATA
  GESMES
    No information model
    Few tools or international support
  XML
    No context to tags
    SDMX adds context to XML
  XBRL, DDI
    Not focused on aggregated data exchange
10
The SDMX XML Data file format:
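The slide's sample file did not survive extraction. As a stand-in, here is a minimal sketch of how such a file can be read. The XML fragment is a hypothetical, heavily simplified illustration loosely modelled on the structure-specific layout, where dimensions and attributes appear as XML attributes. Real SDMX-ML messages also carry namespaces, a header and a reference to the DSD. The concept names and values are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment (not schema-valid SDMX-ML).
SAMPLE = """
<DataSet>
  <Series FREQ="A" REF_AREA="FR" SECTOR="S1">
    <Obs TIME_PERIOD="2012" OBS_VALUE="101.5" OBS_STATUS="A"/>
    <Obs TIME_PERIOD="2013" OBS_VALUE="103.2" OBS_STATUS="E"/>
  </Series>
</DataSet>
"""


def read_observations(xml_text):
    """Return one flat dict per observation, merging series and observation attributes."""
    root = ET.fromstring(xml_text)
    rows = []
    for series in root.iter("Series"):
        for obs in series.iter("Obs"):
            row = dict(series.attrib)   # dimensions shared by the whole series
            row.update(obs.attrib)      # time period, value and observation-level attributes
            rows.append(row)
    return rows


rows = read_observations(SAMPLE)
```

Because every value is tagged with its concept name, the same reader works for any dataset laid out this way, which is the "context" that plain XML or CSV lacks.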
11
“Global DSDs”
Domains at various stages of implementation:
  National Accounts
  Balance of Payments
  Foreign Direct Investment (FDI)
In draft:
  Harmonized Trade IMTS
  R&D
  Education
Many other “Shared” DSDs.
12
SDMX at OECD: Harmonized Trade data
  Synchronised from the UN database every night
  Only the “delta” (the changes since the last run) is synced into our database
    Required because the trade database is huge
  SDMX standards support querying the delta for a given date
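A delta query of this kind can be sketched as below. This is only an illustration, not the OECD's actual synchronisation job. It assumes a REST endpoint that follows the SDMX RESTful data-query pattern and accepts the `updatedAfter` parameter. The base URL and the dataflow id `TRADE` are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def delta_query_url(base_url, dataflow, last_sync, key="all"):
    """Build a data query asking only for observations changed since last_sync.

    last_sync must be a timezone-aware datetime; it is sent in UTC.
    """
    stamp = last_sync.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = urlencode({"updatedAfter": stamp})
    return f"{base_url.rstrip('/')}/data/{quote(dataflow)}/{quote(key)}?{query}"


# Hypothetical endpoint and dataflow id, for illustration only.
url = delta_query_url(
    "https://example.org/sdmx/rest",
    "TRADE",
    datetime(2015, 6, 1, 2, 0, 0, tzinfo=timezone.utc),
)
```

A nightly job would store the time of its last successful run and pass it as `last_sync`, so only the changed observations travel over the network.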
13
SDMX at OECD
OECD.Stat SDMX web service is used for:
  Data resellers receive data in standard format, easy to process
  Incremental updates are possible by slicing data
  Querying autonomously. Standard API is easy to use in programs
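Slicing can be illustrated with a small key builder. The sketch assumes the key convention of the SDMX RESTful API as commonly described: one position per dimension, positions separated by `.`, several values joined with `+`, and an empty position meaning "all values". The dimension names, dataflow id and base URLs are hypothetical, and this is not the OECD.Stat client itself.

```python
def build_key(dimension_order, filters):
    """Turn {dimension: [codes]} into a dot-separated key; unfiltered dimensions stay empty."""
    unknown = set(filters) - set(dimension_order)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    return ".".join("+".join(filters.get(dim, [])) for dim in dimension_order)


def data_url(base_url, dataflow, key):
    """Build the data query URL for one endpoint."""
    return f"{base_url.rstrip('/')}/data/{dataflow}/{key}"


# Hypothetical dimension order, as it would be read from the DSD.
DIMENSIONS = ["FREQ", "REF_AREA", "SECTOR"]
key = build_key(DIMENSIONS, {"FREQ": ["A"], "REF_AREA": ["FR", "DE"]})

# The same code can target different providers, because the API is standard.
urls = [
    data_url(base, "NA_MAIN", key)
    for base in ("https://example.org/sdmx/rest", "https://example.net/api")
]
```

The consumer decides which slice to request, so the provider does not need to package a separate file for each client.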
14
SDMX at OECD <Demo of OECD.Stat web service>
15
How to start with SDMX? Data Structure Definition Data set
Structure specific data set Structure specific time series data set Generic time series data set Generic data set Data flow Data flow definition Category Category map Category scheme Category scheme map Code Code map Codelist Codelist map Hierarchy Hierarchical code Hierarchical codelist Hybrid code map Hybrid codelist map Concept Concept map Concept scheme Concept scheme map Metadata structure definition Metadata set Metadata flow Metadata flow definition Metadata concept Metadata concept scheme Reporting category Reporting category map Structure map Structure set Structure usage Constraints Annotation Representation Identifiable artefact ref Maintainable artefact ref Structure ref International string Localised string Agency Agency scheme Contact Provision agreement Data and metadata provisioning Data provider Data provider scheme Data provider ref Data consumer Data consumer scheme Organisation map Organisation unit Organisation unit scheme Organisation scheme map Metadata target Attribute descriptor Data attribute Metadata report Report structure Metadata attribute Measure descriptor Primary measure Component map Transition Enumerated attribute value XHTML attribute value Text attribute value Other non enumerated attribute value Target data key Target object key Level Coding format Source code Source hierarchical code Source codelist Source hierarchical codelist Hierarchical code reference Target code Target codelist Target hierarchical code Target hierarchical codelist Dimension descriptor Dimension Time dimension Measure dimension Group dimension descriptor Data set target Target data set Report period target Target report period Dimension description values target Identifiable object target Target identifiable object Constraint content target Reporting taxonomy Reporting taxonomy map Series key Group key Reporting year start day Attachment constraint No specified relationship Primary measure relationship Group relationship Dimension relationship Measure key 
value Coded key value Uncoded key value Time key value Time dimension value Component value Observation Uncoded observation Coded observation Uncoded attribute value Coded attribute value Scheme map To text format To value type Data key set Data key Metadata key set Metadata key Constraint role Content constraint Cube region Metadata target region Constraint role type Reference period Release calendar Member selection Member value Range period Start period End period Before period After period Registration Process Process step Process artefact Simple datasource Rest datasource Web service datasource Computation Transition Transformation Transformation scheme Operator scheme Reference node Constant node Operator Operator node Parameter
16
How to start with SDMX?
Not much needed, but at least:
  Understand the business case – the value in doing the project
  SDMX.org
  Learning and working groups:
  Understand the basic SDMX terms, but don’t try to understand the whole of the standard…
17
SDMX Information Model
What is an Information Model? Examples:

  Information Model     Objects                   Used by
  Excel                 Sheets, Cells, Rows       Formulae, VBA
  Relational database   Database, Table, Column   SQL, Interface
  OECD metadata         42 categories             OECD.Stat, Metastore

SDMX IM designed for statistical data and metadata exchange
SDMX IM focused on aggregated data, but can be used for microdata
18
SDMX Information Model
Benefits of having an information model:
  Common vocabulary (Code list, Concept, Dataset)
  IM objects are fit-for-purpose
  Clearly defined relationships between objects and their usage
  SDMX formats and tools are built around the IM
    Interoperable tools
  IM is highly structured, easier to use a part of it rather than implementing full SDMX standard
19
Basic SDMX Artefacts
DSD: Data Structure Definition
  Defines a cube/dataset for a domain such as National Accounts
  States dimensions, their members, and attributes
  Understand the difference between a dimension and an attribute
Concept
  Either a dimension or an attribute, e.g.
    Dimensions: Age, Location, Sector, Time
    Attributes: Observation status, Unit multiplier
Code list
  Dimension or attribute members
  Each code list item has a code and a description
Concept Scheme
  List of all concepts for a domain before splitting them into DSDs
20
Basic SDMX Artefacts
Example with National Accounts
  Concept Scheme: National Accounts
  DSD: NA Main

  Concept              Code List
  Frequency            Frequency CL
  Reference area       Area CL
  Sector               Sector CL
  Observation Status   Observation Status CL
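The relationship between these artefacts can be sketched in a few lines of code. This is a toy model under stated assumptions, not the SDMX Information Model itself. The class names, the concept ids (`FREQ`, `REF_AREA`, `SECTOR`, `OBS_STATUS`) and the codes are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Codelist:
    id: str
    codes: dict  # code -> description


@dataclass
class Component:
    concept: str        # concept id taken from the concept scheme
    codelist: Codelist  # allowed members for this concept
    role: str           # "dimension" or "attribute"


@dataclass
class DataStructure:
    id: str
    components: list

    def validate(self, record):
        """Return a list of problems found in one observation record."""
        problems = []
        for comp in self.components:
            value = record.get(comp.concept)
            if value is None:
                # Dimensions are mandatory here; attributes may be left out.
                if comp.role == "dimension":
                    problems.append(f"missing dimension {comp.concept}")
                continue
            if value not in comp.codelist.codes:
                problems.append(f"{comp.concept}: unknown code {value!r}")
        return problems


# Illustrative code lists; real ones are much longer.
freq_cl = Codelist("Frequency CL", {"A": "Annual", "Q": "Quarterly"})
area_cl = Codelist("Area CL", {"FR": "France", "DE": "Germany"})
sector_cl = Codelist("Sector CL", {"S1": "Total economy", "S13": "General government"})
status_cl = Codelist("Observation Status CL", {"A": "Normal value", "E": "Estimated value"})

na_main = DataStructure("NA Main", [
    Component("FREQ", freq_cl, "dimension"),
    Component("REF_AREA", area_cl, "dimension"),
    Component("SECTOR", sector_cl, "dimension"),
    Component("OBS_STATUS", status_cl, "attribute"),
])

ok = na_main.validate({"FREQ": "A", "REF_AREA": "FR", "SECTOR": "S1"})
bad = na_main.validate({"FREQ": "A", "REF_AREA": "XX", "OBS_STATUS": "A"})
```

Because the structure lists every allowed code, a record can be checked mechanically before it is sent or loaded.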
21
How is SDMX data exchanged?
  Web Services used for automation
    Web Service: a web site without a user interface
    Instead of user interface there is an API (Application Programming Interface)
    Used for machine-to-machine processing
  SDMX has a standard API
    Means that same software can use API from many locations
22
How is SDMX data exchanged?
  Push mode
    Data provider sends data files to each collector
    Each collector gets the data
  Pull mode
    Data provider publishes the data once
    Each collector gets the data from the provider
  Data hub
    Data published to a central location (the hub)
    Consumers get notification when data is published
  Pull mode offers more efficient dissemination and collection of data, enables client-driven slicing, and increases timeliness of data
  SDMX uses web services to support Pull mode
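The data hub pattern can be illustrated with a toy in-memory model: the provider publishes once, subscribers are notified, and each consumer pulls only the slice it needs. The class, its method names and the sample data are hypothetical. A real hub would sit behind SDMX web services.

```python
class DataHub:
    """Toy hub: publish once, notify subscribers, let consumers pull slices."""

    def __init__(self):
        self._store = {}        # dataflow id -> list of observation dicts
        self._subscribers = []  # callbacks invoked on every publication

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, dataflow, rows):
        self._store[dataflow] = list(rows)
        for notify in self._subscribers:
            notify(dataflow)

    def pull(self, dataflow, **filters):
        """Return only the rows matching the consumer's own filters."""
        return [
            row for row in self._store.get(dataflow, [])
            if all(row.get(dim) == value for dim, value in filters.items())
        ]


hub = DataHub()
notifications = []
hub.subscribe(notifications.append)  # a consumer that simply records notifications

hub.publish("NA_MAIN", [
    {"REF_AREA": "FR", "TIME_PERIOD": "2013", "OBS_VALUE": 103.2},
    {"REF_AREA": "DE", "TIME_PERIOD": "2013", "OBS_VALUE": 98.7},
])
french_rows = hub.pull("NA_MAIN", REF_AREA="FR")
```

In push mode the provider would instead loop over every collector and send each one a full file, which is the repeated packaging that pull mode avoids.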
23
Content-oriented Guidelines
Common code lists: Country Observation Status Currency Etc. Rules in coding Guidelines for SDMX projects and creating new DSDs, etc. Benefits: Promote best practices in artifact creation, governance Alignment between domains Speed-up SDMX projects. Provide shopping list of existing code lists Help SDMX projects with recommendations
24
SDMX Project Steps
  Map data flows between organisations
    Data formats
    Reporting forms or tables
    Mailbox/web
  List domain concepts for entire domain
    Becomes Concept Scheme
  Define code lists
    Codify all items using SDMX guidelines
    Hierarchy can come later
  Decide whether each concept is a dimension or an attribute
    Dimension uniquely identifies data
    Attribute adds info to data, e.g. flags
  Create DSDs from concepts. Use data flows
    Each dimension grouping is a DSD
    Ways to avoid many DSDs
  Pilot DSDs
    1st pilot: Reporters provide feedback on DSD structures
    2nd pilot: Reporters send data, Consumers process it
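One practical test when deciding between dimension and attribute is whether the chosen dimensions alone identify every observation. The sketch below checks that on sample records. The concept names and values are hypothetical.

```python
from collections import Counter


def observation_key(record, dimensions):
    """The tuple of dimension values that should identify one observation."""
    return tuple(record[dim] for dim in dimensions)


def duplicate_keys(records, dimensions):
    """Return keys shared by more than one record; an empty list means the dimensions suffice."""
    counts = Counter(observation_key(r, dimensions) for r in records)
    return [key for key, n in counts.items() if n > 1]


records = [
    {"REF_AREA": "FR", "SECTOR": "S1", "TIME_PERIOD": "2013", "OBS_STATUS": "A"},
    {"REF_AREA": "FR", "SECTOR": "S13", "TIME_PERIOD": "2013", "OBS_STATUS": "E"},
]

# SECTOR left out: two records collide, so it has to be a dimension.
too_few = duplicate_keys(records, ["REF_AREA", "TIME_PERIOD"])

# SECTOR included: keys are unique; OBS_STATUS only qualifies the value, so it stays an attribute.
enough = duplicate_keys(records, ["REF_AREA", "SECTOR", "TIME_PERIOD"])
```

Running this kind of check on real reporting tables during the first pilot helps catch a concept that was wrongly classified.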
25
SDMX Main Tools
  SDMX Registry
    Directory of the structural metadata
  SDMX Converter
    Converts between formats (Excel, GESMES, CSV, etc.)
  SDMX Reference Infrastructure
    SDMX export and mapping for existing database
26
SDMX Tools <Demo of Global Registry>
27
Future of SDMX
  SDMX Validation Language
    Automate basic level of data validation, e.g. a+b=c
    Transform data
  More standard code lists
    E.g. Seasonal adjustment
  Better, more reusable tools, e.g. Mapping
    Plug-and-play modules to transform, validate messages
  More guidelines and harmonised structures
    Such as Global DSDs
  Use SDMX for reference metadata exchange
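The a+b=c rule can be sketched in plain Python to show what such an automated check does. This is not the syntax of any SDMX validation language. The function name, the tolerance and the sample figures are assumptions made for illustration.

```python
def check_sum_rule(observations, parts, total, tolerance=0.005):
    """Return the periods where the parts do not add up to the total.

    observations maps a period to a dict of {series name: value}.
    """
    failures = []
    for period, values in sorted(observations.items()):
        expected = sum(values[name] for name in parts)
        if abs(expected - values[total]) > tolerance:
            failures.append(period)
    return failures


# Made-up figures: the rule holds in 2012 and fails in 2013.
observations = {
    "2012": {"A": 1.5, "B": 2.5, "C": 4.0},
    "2013": {"A": 2.0, "B": 3.0, "C": 5.5},
}
failed_periods = check_sum_rule(observations, parts=["A", "B"], total="C")
```

A standard validation language would let rules like this be exchanged alongside the DSD, so every recipient runs the same checks.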
28
Thank you
Any questions?
David Barraclough
OECD SDMX Coordinator