SDMX and DDI: How Do They Fit Together in Practical Terms? Arofan Gregory The Open Data Foundation European DDI User’s Group 2011 Gothenburg, Sweden.

Slides:



Advertisements
Similar presentations
The SDMX Registry Model April 2, 2009 Arofan Gregory Open Data Foundation.
Advertisements

Status on the Mapping of Metadata Standards
SDMX in the Vietnam Ministry of Planning and Investment - A Data Model to Manage Metadata and Data ETV2 Component 5 – Facilitating better decision-making.
CESSDA Question Databank Tender, results and future Maarten Hoogerwerf, CESSDA expert seminar 2009.
SDMX training session on basic principles, data structure definitions and data file implementation 29 November
Information Infrastructure: Foundations for ABS Transformation Stuart Girvan, Australian Bureau of Statistics MSIS Paris, April 2013.
GSIM, CSPA, and Related Activities of the High-Level Group
Implementing a New Classification Management System at Statistics New Zealand Andrew Hancock, Statistics New Zealand Arofan Gregory, Metadata Technology.
DDI 3.0 Conceptual Model Chris Nelson. Why Have a Model Non syntactic representation of the business domain Useful for identifying common constructs –Identification,
Modernizing the Data Documentation Initiative (DDI-4) Dan Gillman, Bureau of Labor Statistics Arofan Gregory, Open Data Foundation WICS, 5-7 May 2015.
Introduction to ebXML Mike Rawlins ebXML Requirements Team Project Leader.
Environment Change Information Request Change Definition has subtype of Business Case based upon ConceptPopulation Gives context for Statistical Program.
Background Data validation, a critical issue for the E.S.S.
MDC Open Information Model West Virginia University CS486 Presentation Feb 18, 2000 Lijian Liu (OIM:
Data Documentation Initiative (DDI): Goals and Benefits Mary Vardigan Director, DDI Alliance.
ESCWA SDMX Workshop Session: Role in the Statistical Lifecycle and Relationship with DDI (Data Documentation Initiative)
WP.5 - DDI-SDMX Integration
WP.5 - DDI-SDMX Integration E.S.S. cross-cutting project on Information Models and Standards Marco Pellegrino, Denis Grofils Eurostat METIS Work Session6-8.
NSI 1 Collect Process AnalyseDisseminate Survey A Survey B Historically statistical organisations have produced specialised business processes and IT.
Case Studies: Statistics Canada (WP 11) Alice Born Statistics UNECE Workshop on Statistical Metadata.
Survey Data Management and Combined use of DDI and SDMX DDI and SDMX use case Labor Force Statistics.
Using ISO/IEC to Help with Metadata Management Problems Graeme Oakley Australian Bureau of Statistics.
SDMX and DDI Working Together Technical Workshop 5-7 June 2013
Generic Statistical Information Model (GSIM) Thérèse Lalor and Steven Vale United Nations Economic Commission for Europe (UNECE)
3 rd Annual European DDI Users Group Meeting, 5-6 December 2011 The Ongoing Work for a Technical Vocabulary of DDI and SDMX Terms Marco Pellegrino Eurostat.
DDI-RDF Discovery Vocabulary A Metadata Vocabulary for Documenting Research and Survey Data Linked Data on the Web (LDOW 2013) Thomas Bosch.
Introduction to MDA (Model Driven Architecture) CYT.
Architecture for a Database System
Development of metadata in the National Statistical Institute of Spain Work Session on Statistical Metadata Genève, 6-8 May-2013 Ana Isabel Sánchez-Luengo.
METADATA HARMONISATION SDMX Training BANK INDONESIA SEPTEMBER 2015 YOGYAKARTA, INDONESIA.
SDMX Standards Relationships to ISO/IEC 11179/CMR Arofan Gregory Chris Nelson Joint UNECE/Eurostat/OECD workshop on statistical metadata (METIS): Geneva.
Business needs and context for DDI and SDMX ESS DDI/SDMX Workshop
Technical Overview of SDMX and DDI : Describing Microdata Arofan Gregory Metadata Technology.
SDMX and DDI working together Technical workshop, Luxembourg, June 2013 Use cases for DDI and SDMX.
Describing Statistical registers in SDMX and DDI: A Comparison Arofan Gregory Metadata Technology Eurostat, June 4-6, 2013 Luxembourg.
DDI-RDF Leveraging the DDI Model for the Linked Data Web.
XML Registries Source: Java TM API for XML Registries Specification.
Development Process and Testing Tools for Content Standards OASIS Symposium: The Meaning of Interoperability May 9, 2006 Simon Frechette, NIST.
February 17, 1999Open Forum on Metadata Registries 1 Census Corporate Statistical Metadata Registry By Martin V. Appel Daniel W. Gillman Samuel N. Highsmith,
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
United Nations Economic Commission for Europe Statistical Division Introduction to Steven Vale UNECE
InSPIRe Australian initiatives for standardising statistical processes and metadata Simon Wall Australian Bureau of Statistics December
Panel discussion questions for Session ii. Panellists Dan Gilman (BLS) – BLS are an associate member of the DDI Alliance Eric Rodriguez (INEGI) Achim.
Statistical Metadata Strategy and GSIM Implementation in Canada Statistics Canada.
Eurostat SDMX and Global Standardisation Marco Pellegrino Eurostat, Statistical Office of the European Union Bangkok,
Eurostat 4. SDMX: Main objects for data exchange 1 Raynald Palmieri Eurostat Unit B5: “Central data and metadata services” SDMX Basics course, October.
SDMX IT Tools Introduction
Metadata By N.Gopinath AP/CSE Metadata and it’s role in the lifecycle. The collection, maintenance, and deployment of metadata Metadata and tool integration.
2.An overview of SDMX (What is SDMX? Part I) 1 Edward Cook Eurostat Unit B5: “Central data and metadata services” SDMX Basics course, October 2015.
Eurostat Standardisation within the ESS: SDMX present and future Luxembourg, October 2015 Marco Pellegrino Eurostat, Statistical Office of the European.
ABS Issues for DDI Futures Bryan Fitzpatrick October 2012.
Aim: “to support the enhancement and implementation of the standards needed for the modernisation of statistical production and services”
1 Chapter 2 Database Environment Pearson Education © 2009.
1 Joint UNECE/EUROSTAT/OECD METIS Work Session (Geneva, March 2010) The On-Going Review of the SDMX Technical Specifications Marco Pellegrino, Håkan.
SDMX Basics course, March 2016 Eurostat SDMX Basics course, March Introducing the Roadmap Marco Pellegrino Eurostat Unit B5: “Data and.
>> Metadata What is it, and what could it be? EU Twinning Project Activity E.2 26 May 2013.
DDI and GSIM – Impacts, Context, and Future Possibilities
Chapter 2 Database Environment Pearson Education © 2009.
SDMX Information Model
Chapter 2 Database Environment.
GSBPM, GSIM, and CSPA.
2. An overview of SDMX (What is SDMX? Part I)
2. An overview of SDMX (What is SDMX? Part I)
The problem we are trying to solve
Database Design Hacettepe University
Metadata The metadata contains
SISAI STATISTICAL INFORMATION SYSTEMS ARCHITECTURE AND INTEGRATION
Presentation to SISAI Luxembourg, 12 June 2012
The role of metadata in census data dissemination
DDI and GSIM – Impacts, Context, and Future Possibilities
Presentation transcript:

SDMX and DDI: How Do They Fit Together in Practical Terms? Arofan Gregory The Open Data Foundation European DDI User’s Group 2011 Gothenburg, Sweden

Outline Background Characterizing the Standards – DDI – SDMX – Similarities and Differences – Other Relevant Standards Implementation Approaches – DDI In, SDMX Out – SDMX-Centric – Standards Agnostic Future Possibilities: The SDMX-DDI Dialogue Proposal

Background This presentation intends to examine the different architectural approaches to implementations of SDMX and DDI together – While several organizations are mentioned, it is not a report on the status of prototypes or implementations This presentation does not intend to introduce DDI or SDMX to an unfamiliar audience – Familiarity with the standards is assumed

Background (2) When people think about using SDMX and DDI together, they make assumptions – Microdata (and tabulations) can be described using DDI – A transformation could be applied to produce SDMX to describe the aggregates/tables – There is a straight mapping from DDI to SDMX Interestingly, this conceptual model is not how the use of DDI and SDMX together is being approached in reality – The Devil is in the details! (Or is it “The Tomten is in the details” ?)

Background (3) People have been discussing the use of SDMX and DDI together for some time Now, we are at the stage where implementations are being investigated and prototyped – Not “if”, but “how” Most often, this is done in the context of the Generic Statistical Business Process Model (GSBPM), by data producers – The idea of “industrialized” statistical production – Strong emphasis on process management

Characterizing the Standards: DDI DDI Lifecycle can provide a very detailed set of metadata, covering: – The study or series of studies – Many aspects of data collection, including surveys and processing of microdata – The structure of data files, including hierarchical files and those with complex relationships – The lifecycle events and archiving of data files and their metadata – The tabulation and processing of data into tables (Ncubes) Allows for a link between the microdata variables and the resulting aggregates

Characterizing the Standards: SDMX Describes the structure of aggregate/dimensional data (“structural metadata”) Provides formats for the dimensional data Provides a model of data reporting/collection and dissemination Provides a way of describing the structures of arbitrary metadata sets (“reference metadata”) Provides formats for the arbitrary metadata sets Provides a set of standard registry interfaces, providing a catalog of resources Provides guidelines for deploying standard web services for SDMX resources Provides a way of describing statistical processes

Differences DDI has much more detailed metadata at the level of the study, because it is intended to describe the full process of data production (the data lifecycle) DDI provides more complete descriptions of the processing of data SDMX provides more architectural components, to support reporting/collecting and exchange

Similarities: Design Both standards use a similar mechanism for structuring URN identifiers Both standards use a similar model for identifiable, versionable, and maintainable things – Both have a concept of an owning agency – There is a very similar set of rules about versioning and maintenance Both standards use “schemes” as packages for lists of like items Both standards are designed to support reuse, and have similar referencing models

Similarities: Specific Metadata Items Concept Schemes SDMX Codelists/DDI Codes and Categories Dimensional data structures (Ncubes/DSDs) Organization Schemes There is an effort as part of the SDMX-DDI Dialogue to produce a common vocabulary of terms, describing similarities and differences

Other Relevant Standards Some things are not covered well by either SDMX or DDI, particularly classification management – The Neuchatel model is probably a better standard, but it has no standard XML representation – The older (and similar) CLASET model is also potentially useful, and does have an XML representation The GSBPM gives us a generic model for describing business processes, but to implement process management you will use other standards such as BPMN (specific process modelling) and BPEL (for executing processes)

Implementation Approach: DDI In, SDMX Out This is an approach used by the Australian Bureau of Statistics (ABS) in one part of their microdata access facility, REEM It is based on a set of software tools developed and sold by Space-Time Research (“SuperCross”) to support tabulations from microdata

DDI In, SDMX Out STR Database Format REEM Tabulation Tool <SDMX 2.0> ASCII Microdata File Production Time Process Run Time Process Run Time Process

Considerations This is a limited implementation, providing secure access to microdata in the form of user-defined tabulations – It is only a limited dissemination scenario – It relies on run-time confidentialization, which limits the data that can be made available (only data and tabulations for which robust automated confidentialisation can be assured) The internal formats are proprietary, and lack some of the richness of the DDI-L model – We also identified some bugs in DDI 3.1 Not sufficient for users who wish to perform statistical analysis of the microdata rather than produce tabulations – That need will be met by another part of the overall REEM solution in future There is no direct mapping from DDI to SDMX

Implementation Approach: SDMX-Centric This approach came out of discussions within INSEE, as they considered designs for the new metadata repository they are developing Similar approaches have been considered by other organizations This relies heavily on the use of the SDMX architectural components and model, especially the SDMX Registry There is an idea of GSBPM-based process management, but no process-management tool

SDMX Registry Classification Management Concept Management Study Unit Management (Instruments, Groups) Process Metadata Management Quality Metadata Management Dissemination SDMX Registry Dissemination Database(s) SDMX Metadata Reports SDMX Metadata Reports 5 different things: DDI Categories & Codes; SDMX Codelists, Hierarchical Codelists, Categories SDMX And DDI Concepts SDMX MD Reports, DDI Instances Resource Coordinator

MSD for DDI SDMX Registry SDMX Metadata Report Is Registered Query is made on the MD Report (not the DDI) Application Response contains pointers to the DDI resource Application reads the MD Report and can then go and get the DDI,

Considerations The SDMX Registry is available as a free tool, reducing the amount of development needed to deploy such a system – Other SDMX tools are also available for free Applications are coded against specific versions of the standards, coming with fairly high maintenance costs if future versions need to be supported Access to non-SDMX resources (DDI) involves a level of indirection – Retrieval is a two-step process: first get the “placeholder” SDMX Metadata Report, process it, and then retrieve the non-SDMX resource (DDI) The GSBPM was described as a set of SDMX Processes, and these are held in the registry to help organize and manage the statistical production process

Implementation Approach: Standards Agnostic This approach is currently being prototyped by the ABS, as part of a major re-development of their IT infrastructure to “industrialize” their production processes It is a registry-based, distributed model, but it does not rely on the SDMX Registry, but on a standards-agnostic registry It is also based on the GSBPM, and on the emergent sibling to it, the Generic Statistical Information Model (GSIM) There is a major component of process management and automation

Services Registry Metadata Registry Business Process Management Metadata Repositories Data Repositories BP Instance Repository Centralized Metadata Repository Corporate Directory Access/ User Management Rules Engine Other Services ID Service Resolution Service Rules Repository Schema Repository Centralized Data Repository

Registration of a New Metadata Object BPMS Metadata Repository (1)MD object published to repository Resolution service Registration service Metadata Registry Access control service (2) Write access Is determined for MD object ID service Object-type-specific indexing service (3) BPMS invokes the registration service (4) MD object (and location) obtained (5) Valid ID for MD object obtained (6) Location and ID are stored (7) Indexing service is invoked (8) Properties and relationships stored in registry; BP instance status updated (9)Registration response sent with errors/warnings and/or success

Standards Agnosticism The term “standards-agnostic” means that the standards themselves are represented as metadata objects within the registry – Each version of each relevant standard is described as either a read-only or a sufficient read-write format for any type of object – Every metadata object describes which versions of which standards are supported – Transformation services between standards and versions are also registered resources – Introducing new standards or new versions of new standards has a minimal impact on existing applications – Some “standards” could be agreed organization-wide standards, not necessarily public standards such as DDI and SDMX

Considerations There is a huge emphasis on process automation and management – All functionality is exposed as web services – this is an “SOA” architecture which works well with existing process management tools The cost of developing and deploying the new infrastructure will be very high – Migration from legacy systems will be challenging – Organizational change issues will need to be overcome The value of deploying such a system will be immense – Flexibility and speed will be greatly increased for statistical production – Management of the statistical production process will be easier and more effective – Consistency and quality of the data products will be enhanced

Future Possibilities: The SDMX-DDI Dialogue Proposal There has been a set of informal meetings between members (and prospective members) of the SDMX community and the DDI community, looking for ways in which the standards can be used together effectively – The first meeting was held at EDDI 2010 – There have been several other meetings since One proposal is now being discussed which outlines an approach to using SDMX and DDI interchangeably

A Simple Fact Its not about which flavor of XML you use – XML doesn’t really matter It’s about the data and the metadata!

The Challenge If I want to use DDI to describe my data, and you want to use SDMX, how can we ensure that we are getting the same data and metadata?

The Proposed Approach The SDMX-DDI Dialogue has been defining a set of relevant business cases where the two standards could be used together One of these business cases involves retrieving unit record data from a register A model of the full set of useful data and metadata has been identified – The metadata is a subset of the DDI elements, which could be expressed in DDI as a “DDI Profile”

The Proposed Approach (2) The full set of information includes: – The unit record data – Structural information about the variables and representations – Additional information about how the data has been generated/collected/processed In DDI, this set of information can be expressed as a DDI instance and a data file – Both the structural and processing metadata can be expressed as a single DDI instance

The Proposed Approach In SDMX, we have three XML files: – A file holding the data, expressed as dimensional microdata The unit identifier is a dimension The variable identifier is a dimension There are dimensions related to time – A reference metadata report will all other metadata describing the process/collection/generation of the administrative data – A file describing the concepts, data structure, and codelists (“structural metadata”) for the data, and also the structure of the metadata report

Metadata Set Unit Record Data DDI Instance ASCII Data File SDMX Data Set SDMX Structural Metadata SDMX Metadata Report

Results If I am using SDMX, but I am sent DDI, a simple transformation will give me the same payload of data and metadata Vice-versa for SDMX users There are some conventions which will need to be established regarding identifiers and the way the unit record files are structured There will need to be agreed models for each business case

An SDMX File?

Questions?