WP.5 - DDI-SDMX Integration


METIS Work Session, 6-8 May 2013
ESS cross-cutting project on Information Models and Standards
Marco Pellegrino, Denis Grofils (Eurostat)

Outline
- ESS.VIP programme
- Cross-cutting project on Information Models and Standards: scope, phases, deliverables
- SDMX-DDI integration: open points for discussion

ESS.VIP programme
Transformation programme for the modernisation of the production systems in the European Statistical System (ESS) through:
- moving towards more common solutions, shared services and shared environments
- economies of scale, efficiency gains and cost sharing
Context: production of European statistics
- The European Statistical System is a complex system with many production lines (NSIs and Eurostat)
- Long tradition of output harmonisation (legal framework)
- Tradition of cooperative developments (ESSnets)
- ICT changes and IT rationalisation in many organisations

Infrastructure transformation
- Improve communication and networking among ESS partners
- Provide common information repositories
- Foster process interoperability
- Foster process agility and reuse of components

ESS.VIP Programme components
The programme diagram shows three groups of components, linked by several dependencies: projects in statistical domains, technical cross-cutting projects, and frameworks and administrative mechanisms.
Components named in the diagram: ADMIN, Information models and standards, Communication, NAPS, Network for information exchange, Governance, PRIX, Data warehouses, Human resources, ESBRs, Shared services, Sharing costs, Financial resources, SIMSTAT, Legal framework, ICT, Programme management, Common data validation policy.

ESS.VIP business and information principles
- Maximum reuse of existing process components and segments
- Metadata-driven processes allowing adaptation/parameterisation and extension to other contexts
- New business processes built as a sequence of modular process steps / services
- Information objects structured according to available information models and stored in corporate registries/repositories in view of reuse
- Adherence to industry and open standards as available (e.g. Plug & Play)

Information Models and Standards
Objectives:
- To ensure that the ESS.VIP programme has access to a set of agreed-upon standards supporting the modernisation of statistical production processes.
- To increase coherence between standards, while ensuring that they are consistent with best practices and recommendations from the international community.
- To define information models that can be used across the ESS to model structural metadata for micro-data and aggregated data.
- To set up guidelines for designing and documenting business processes.
- To provide support mechanisms (e.g. capacity-building and training).

The four meta-modelling levels
- M0: the real system, e.g. real data (BOP, ESA)
- M1: a model that represents this system, e.g. data model, concepts, DSD
- M2: the metamodel to which the model conforms, e.g. the SDMX metamodel
- M3: the meta-metamodel to which the metamodel conforms; the meta-metamodel conforms to itself
A model is a partial analogy of a system: the properties of the model are not identical to the properties of the reality. In short, a model represents a system and conforms to a metamodel.
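The chain of levels can be sketched in a few lines of Python. This is only an illustration of the "represents / conforms to" relations on this slide; the class name `Level`, the field `described_by` and the helper `chain` are hypothetical and not part of SDMX or DDI.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Level:
    name: str
    example: str
    # Next level up. For M0 the relation is "is represented by";
    # for M1 and M2 it is "conforms to". None marks the top level.
    described_by: Optional["Level"] = None

    def next_level(self) -> "Level":
        # The meta-metamodel (M3) conforms to itself.
        return self.described_by if self.described_by is not None else self


m3 = Level("M3", "meta-metamodel")
m2 = Level("M2", "metamodel, e.g. the SDMX metamodel", m3)
m1 = Level("M1", "model, e.g. data model, concepts, DSD", m2)
m0 = Level("M0", "real system, e.g. BOP or ESA data", m1)


def chain(level: Level) -> List[str]:
    """Walk upwards until a level that refers to itself is reached."""
    names = [level.name]
    while level.next_level() is not level:
        level = level.next_level()
        names.append(level.name)
    return names


print(chain(m0))
```

Starting from the real system, the walk stops at M3 because that level refers back to itself.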

Integration of methods and techniques
Diagram labels: Data, Software, Services, Administrator, Definitions, User, Information Model. Process phases: Design, Build, Collect, Process, Disseminate, Use.
- Users would insert definitions into the information system using a subject-matter language (instead of handing them over as requirements to EDP staff), and the definitions would drive the software, so the desired result could be obtained sooner and more cheaply.
- Definitions would have to be proper and accurate, otherwise the process would fail. They would also remain in the information system, so they could be queried to recall what the system contains and does, instead of having the behaviour of the information system hidden in computer programs.
- To reach this goal, the system must also contain the information needed to operate it: the controls, calculations, estimates and other functions to be executed, together with the order and conditions of execution (the process).

Which standards and models?
- Re-use existing resources
- Link to new initiatives (e.g. Sponsorship on Standardisation, GSIM)

DDI and SDMX
SDMX: Statistical Data and Metadata eXchange
- Standard for the exchange of statistical data and metadata
- “the preferred standard for exchange and sharing of data and metadata in the global statistical community”
- Widely used in the ESS and at UN level
- Extended to support some registers as well
DDI: The Data Documentation Initiative
- Standard for the documentation of data
- Initially focused on archiving micro-data in the area of social sciences; widely used in data archives
- Extended to support the full life-cycle of data

The SDMX-DDI approach
- Informal meetings (2010-2013) between members of the SDMX and DDI communities, looking for ways in which the standards could be used together effectively
- Initiative of the SDMX Secretariat through its Technical Working Group
- Approach to using SDMX and DDI interchangeably
- We are now at the stage where implementations are being investigated and prototyped: not “if”, but “how”
- Most often, this is done in the context of the Generic Statistical Business Process Model (GSBPM)
- Idea of “industrialised” statistical production
- Strong emphasis on process management

Generic Statistical Business Process Model (diagram: DDI and SDMX positioned against the GSBPM)

GSBPM, DDI and SDMX: towards a complete system?

DDI and SDMX
- DDI offers a very rich model for the documentation of micro-data
- SDMX offers a very integrated exchange platform for statistical outputs (IT architectures, tools, web services)
When people think about using SDMX and DDI together, they make assumptions:
- Microdata (and tabulations) can be described using DDI
- A transformation could be applied to produce SDMX to describe the aggregates/tables
- There is a straightforward mapping from DDI to SDMX
Interestingly, this conceptual model is not how the use of DDI and SDMX together is being approached in reality.
The combined use of both standards could allow a higher level of integration of the complete production process. But: the devil is in the detail!

Characterizing the Standards: DDI
DDI Lifecycle can provide a very detailed set of metadata, covering:
- The study or series of studies
- Many aspects of data collection, including surveys and processing of microdata
- The structure of data files, including hierarchical files and those with complex relationships
- The lifecycle events and archiving of data files and their metadata
- The tabulation and processing of data into tables (NCubes)
It allows for a link between the microdata variables and the resulting aggregates.

Characterizing the Standards: SDMX
- Describes the structure of aggregate/dimensional data (“structural metadata”)
- Provides formats for the dimensional data
- Provides a model of data reporting and dissemination
- Provides a way of describing and formatting stand-alone metadata sets (“reference metadata”)
- Provides standard registry interfaces, providing a catalogue of resources
- Provides guidelines for deploying standard web services for SDMX resources
- Provides a way of describing statistical processes

SDMX Process Metadata
Diagram labels: data validation and editing, SDMX Registry, DSD and data set, MSD, metadata set, web services, process metadata.
Differences:
- DDI has much more detailed metadata at the level of the study, because it is intended to describe the full process of data production (the data lifecycle)
- DDI provides more complete descriptions of the processing of data
- SDMX provides more architectural components, to support reporting/collecting and exchange
- SDMX provides generic mechanisms to support foreseen and non-foreseen use cases: categorisation, hierarchical codelists (HCL), metadata structure definitions (MSD)
Similarities:
- Both standards use a similar mechanism for structuring URN identifiers
- Both standards use a similar model for identifiable, versionable, and maintainable things
- Both have a concept of an owning agency
- There is a very similar set of rules about versioning and maintenance
- Both standards use “schemes” as packages for lists of like items
- Both standards are designed to support reuse, and have similar referencing models
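The shared identification pattern (owning agency, identifier, version) can be illustrated by reducing URNs from both standards to the same triple. The sketch below assumes simplified URN shapes, roughly `urn:sdmx:org.sdmx.infomodel.<package>.<class>=<agency>:<id>(<version>)` for SDMX and `urn:ddi:<agency>:<id>:<version>` for DDI; the exact grammars should be checked against the respective specifications. The names `Ref` and `parse_urn`, and the example agency `example.agency`, are hypothetical.

```python
import re
from typing import NamedTuple


class Ref(NamedTuple):
    agency: str
    item_id: str
    version: str


# Simplified patterns (assumptions, not the full grammars of either standard).
SDMX_URN = re.compile(
    r"^urn:sdmx:org\.sdmx\.infomodel\.[a-z]+\.[A-Za-z]+="
    r"(?P<agency>[^:]+):(?P<id>[^()]+)\((?P<version>[^)]+)\)$"
)
DDI_URN = re.compile(
    r"^urn:ddi:(?P<agency>[^:]+):(?P<id>[^:]+):(?P<version>[^:]+)$"
)


def parse_urn(urn: str) -> Ref:
    """Reduce an SDMX-style or DDI-style URN to (agency, id, version)."""
    for pattern in (SDMX_URN, DDI_URN):
        match = pattern.match(urn)
        if match:
            return Ref(match["agency"], match["id"], match["version"])
    raise ValueError(f"Unrecognised URN: {urn}")


print(parse_urn("urn:sdmx:org.sdmx.infomodel.codelist.Codelist=SDMX:CL_FREQ(1.0)"))
print(parse_urn("urn:ddi:example.agency:CategoryScheme_1:1.0.0"))
```

Both calls yield the same kind of reference, which is what makes a common referencing layer over the two standards plausible.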

Analysis of use cases
The SDMX TWG has been defining a set of relevant use cases where the two standards could be compared and, if possible, used together:
- Survey data collection
- Administrative and register data
- Combined use of DDI and SDMX
- Micro-data access and on-demand tabulation of micro-data
- Metadata and quality reporting

Survey
- A Survey is targeted at a specific Population and comprises Questions
- A Question may be linked to a Variable, which specifies:
  - its conceptual meaning (Concept)
  - the valid set of responses that are allowed (Category Scheme and contained Categories)
- The output from the Survey is a Unit Record Data Set
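The relationships on this slide can be written down as a small object model. This is a minimal sketch, not the DDI schema: the class and field names mirror the slide's terms, while `invalid_fields`, the scheme id `CS_SEX` and the example survey are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Category:
    code: str
    label: str


@dataclass
class CategoryScheme:
    scheme_id: str
    categories: List[Category] = field(default_factory=list)

    def allows(self, code: str) -> bool:
        return any(c.code == code for c in self.categories)


@dataclass
class Variable:
    name: str
    concept: str                     # conceptual meaning
    category_scheme: CategoryScheme  # valid responses


@dataclass
class Question:
    text: str
    variable: Optional[Variable] = None  # a question *may* be linked to a variable


@dataclass
class Survey:
    name: str
    population: str
    questions: List[Question] = field(default_factory=list)

    def invalid_fields(self, record: Dict[str, str]) -> List[str]:
        """Names of variables whose value in a unit record is not an allowed category."""
        bad = []
        for question in self.questions:
            var = question.variable
            if var is None or var.name not in record:
                continue
            if not var.category_scheme.allows(record[var.name]):
                bad.append(var.name)
        return bad


sex_scheme = CategoryScheme("CS_SEX", [Category("F", "Female"), Category("M", "Male")])
sex = Variable("SEX", "Sex of respondent", sex_scheme)
survey = Survey(
    "Example survey",
    "Residents aged 15 and over",
    [Question("What is your sex?", sex), Question("Any other comments?")],
)

print(survey.invalid_fields({"SEX": "X"}))
```

Each dictionary passed to `invalid_fields` stands for one row of the unit record data set.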

The Proposed Approach
The full set of information includes:
- The unit record data
- Structural information about the variables and representations
- Additional information about how the data has been generated/collected/processed
In DDI, this set of information can be expressed as a DDI instance and a data file. Both the structural and processing metadata can be expressed as a single DDI instance.

Data Process and Cleaning
The Editing Process can include:
- Validation
- Outlier trimming
- Recodes
- Editing for non-response
The Editing Process consists of:
- A description of the process (Process Description)
- The software environment (Executable Code)
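As a rough illustration of how such an editing process could be held as an ordered list of executable steps, the sketch below chains the four kinds of edit named on the slide. The field names (`hours`, `sex`), the 100-hour threshold, the recode table and the mean imputation are all invented for the example and are not taken from the presentation.

```python
from statistics import mean

RECODES = {"1": "M", "2": "F"}  # hypothetical recode table


def validate(records):
    # Validation: negative hours are impossible, so blank them out.
    return [
        {**r, "hours": None if r["hours"] is not None and r["hours"] < 0 else r["hours"]}
        for r in records
    ]


def trim_outliers(records, max_hours=100):
    # Outlier trimming: drop records above an assumed plausibility threshold.
    return [r for r in records if r["hours"] is None or r["hours"] <= max_hours]


def recode(records):
    # Recodes: map source codes to output codes.
    return [{**r, "sex": RECODES.get(r["sex"], r["sex"])} for r in records]


def edit_nonresponse(records):
    # Editing for non-response: fill missing hours with the mean of observed values.
    observed = [r["hours"] for r in records if r["hours"] is not None]
    if not observed:
        return records
    fill = mean(observed)
    return [{**r, "hours": fill if r["hours"] is None else r["hours"]} for r in records]


def run_editing(records):
    # The order of the steps is itself part of the process description.
    for step in (validate, trim_outliers, recode, edit_nonresponse):
        records = step(records)
    return records


raw = [
    {"id": 1, "hours": 40, "sex": "1"},
    {"id": 2, "hours": -5, "sex": "2"},
    {"id": 3, "hours": 200, "sex": "2"},
    {"id": 4, "hours": None, "sex": "1"},
]
cleaned = run_editing(raw)
print(cleaned)
```

The tuple of steps plays the role of the process description, and the functions play the role of the executable code.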

Tabulation
- The result of a Tabulation is an Aggregate Data Set
- It is structured according to a Dimensional Structure Definition (SDMX DSD)
- The DSD comprises Dimensions, Attributes and Measures
- Each takes its semantics and representation from a Variable
- The Data Set comprises statistical series, each with a Key, Attributes and Observations
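A tabulation of this kind can be sketched as grouping unit records by the chosen dimensions and summing a measure. The variable names (`SEX`, `REGION`, `EMPLOYED`) and the sample records are invented for the example; a real DSD would also carry attributes and a time dimension.

```python
from collections import defaultdict

# Hypothetical unit records; EMPLOYED is 1 if the person is employed.
unit_records = [
    {"SEX": "F", "REGION": "NORTH", "EMPLOYED": 1},
    {"SEX": "F", "REGION": "NORTH", "EMPLOYED": 0},
    {"SEX": "M", "REGION": "NORTH", "EMPLOYED": 1},
    {"SEX": "F", "REGION": "SOUTH", "EMPLOYED": 1},
]


def tabulate(records, dimensions, measure):
    """Sum a measure over unit records, keyed by the values of the dimensions."""
    cells = defaultdict(int)
    for record in records:
        key = tuple(record[d] for d in dimensions)  # plays the role of the series key
        cells[key] += record[measure]               # the observation value
    return dict(cells)


table = tabulate(unit_records, ["SEX", "REGION"], "EMPLOYED")
print(table)
```

The dimensions of the resulting table are exactly the microdata variables used for grouping, which is the link between variables and aggregates that the slide describes.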

Output Tables

Concepts

Metadata Set
Diagram labels: Unit Record Data; DDI Instance; ASCII Data File; SDMX Data; SDMX Structural Metadata; SDMX Metadata Report.
In SDMX, we have three XML files:
- A file holding the data, expressed as dimensional microdata:
  - the unit identifier is a dimension
  - the variable identifier is a dimension
  - there are dimensions related to time
- A reference metadata report with all other metadata describing the process/collection/generation of the administrative data
- A file describing the concepts, data structure, and codelists (“structural metadata”) for the data, and also the structure of the metadata report
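Expressing unit records as dimensional microdata amounts to turning each record (one row per unit) into one observation per unit, variable and period. The sketch below shows that reshaping on plain dictionaries; the output keys `UNIT`, `VARIABLE`, `TIME_PERIOD` and `OBS_VALUE` and the input field names are illustrative choices, not a prescribed DSD.

```python
def to_dimensional(records, unit_id_field, time_field):
    """Turn wide unit records into one observation per (unit, variable, time)."""
    observations = []
    for record in records:
        for name, value in record.items():
            if name in (unit_id_field, time_field):
                continue  # these become dimensions, not observed variables
            observations.append(
                {
                    "UNIT": record[unit_id_field],      # unit identifier as a dimension
                    "VARIABLE": name,                   # variable identifier as a dimension
                    "TIME_PERIOD": record[time_field],  # time dimension
                    "OBS_VALUE": value,
                }
            )
    return observations


wide = [{"UNIT_ID": "U1", "YEAR": "2012", "INCOME": 100, "HOURS": 40}]
rows = to_dimensional(wide, "UNIT_ID", "YEAR")
print(rows)
```

One wide record with two variables becomes two observations that share the same unit and time dimensions.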

The challenge
- It’s not about which flavor of XML we use (XML doesn’t really matter)
- It’s about data and metadata!
- If I want to use DDI to describe my data, and you want to use SDMX, how can we ensure that we are getting the same data and metadata?

The challenge (2)
- If I am using SDMX, but I am sent DDI, a simple transformation must give me the same payload of data and metadata
- The same holds in reverse for DDI users who are sent SDMX
- Conventions will need to be established regarding identifiers and the way the unit record files are structured
- There will need to be agreed models for each business case

Combined DDI-SDMX approaches
- Mixing the two standards within an implementation, allowing for the expression of the same metadata set in both standards, so that the information could be transformed from one format to the other. This way, it would become possible to select either DDI or SDMX for a particular operation, similar to what we discussed above regarding application functionality.
- Metadata stored and indexed in such a fashion that it can be expressed either as SDMX or DDI on an as-needed basis. Example: the Metadata Repository and Registry project at ABS. The actual format used for metadata storage may be neither SDMX nor DDI, so long as it can be expressed using both standards.
- GSIM to be implemented through a combination of SDMX and DDI?
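The idea of a neutral store that can be expressed in either standard, with a lossless round trip, can be sketched as follows. The two output shapes are deliberately simplified stand-ins built from plain dictionaries; they are not the real SDMX-ML or DDI XML structures, and every function name here is hypothetical.

```python
# Neutral storage format (neither SDMX nor DDI).
neutral = {
    "agency": "EXAMPLE",
    "id": "CL_SEX",
    "version": "1.0",
    "items": {"F": "Female", "M": "Male"},
}


def as_sdmx_like(n):
    # Simplified stand-in for an SDMX codelist.
    return {
        "Codelist": {
            "agencyID": n["agency"],
            "id": n["id"],
            "version": n["version"],
            "Code": [{"id": k, "Name": v} for k, v in n["items"].items()],
        }
    }


def from_sdmx_like(doc):
    cl = doc["Codelist"]
    return {
        "agency": cl["agencyID"],
        "id": cl["id"],
        "version": cl["version"],
        "items": {c["id"]: c["Name"] for c in cl["Code"]},
    }


def as_ddi_like(n):
    # Simplified stand-in for a DDI category scheme.
    return {
        "CategoryScheme": {
            "Agency": n["agency"],
            "ID": n["id"],
            "Version": n["version"],
            "Category": [{"Value": k, "Label": v} for k, v in n["items"].items()],
        }
    }


def from_ddi_like(doc):
    cs = doc["CategoryScheme"]
    return {
        "agency": cs["Agency"],
        "id": cs["ID"],
        "version": cs["Version"],
        "items": {c["Value"]: c["Label"] for c in cs["Category"]},
    }


# Whichever format is exchanged, the receiver recovers the same payload.
via_sdmx = from_sdmx_like(as_sdmx_like(neutral))
via_ddi = from_ddi_like(as_ddi_like(neutral))
print(via_sdmx == via_ddi == neutral)
```

The check at the end is the property that matters: both routes must return the same data and metadata, which is why agreed conventions on identifiers are needed.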

Generic Statistical Information Model (GSIM): SDMX, DDI, ISO 11179, etc.

Feedback is welcome. Thank you!
Marco Pellegrino, Denis Grofils