Designing a Metadata Repository
Chris Nelson, Metadata Technology Ltd.
Work Session on Statistical Metadata, Geneva, 6-8 May 2013



Roadmap
- What type of metadata
- Major issues facing designers of metadata repositories
- How these can be overcome
- Concentrates on SDMX, as this is the standard supported by the Metadata Technology metadata repository, but many of the issues and solutions are generic to any approach that supports a standard metadata model
- Concentrates on data dissemination and how to link the data points to the metadata that are relevant to them

What Type of Metadata
- What SDMX calls “referential metadata”, sometimes known as “footnote metadata”
- Often used in data quality frameworks
  - ECB
  - Eurostat
  - ILO
  - IMF SDDS and GDDS
  - OECD
  - Many others

Problem Statement
- We are building a metadata repository to support the metadata model of SDMX
  - the repository is therefore not built to support user requirements specific to an individual organisation
  - it is built to support the genericity of metadata that can be authored, stored, queried, and made available with the SDMX constructs and data points to which they relate
- Most problems stem from the need to support a standardised metadata model
  - it is important not to deviate from the model
- There are many models for metadata, but few of them support discovery, query, and retrieval of metadata
  - SDMX has a strong and generic model for (reference) metadata

Problem Statement
- SDMX has strong support for data query
  - both simple (REST) queries and more complex (SOAP) queries
- but no real support for simple metadata queries
- and little real support for combining metadata with data points in the same “message”
- But applications (e.g. web clients) want simple mechanisms
  - to know what metadata are relevant to the data points
  - to retrieve the metadata
  - to process the metadata
- In general, XML is NOT friendly for web clients

Design Issues
- How to query for metadata?
  - in SDMX the standard is very clear on how to query for data, but there is no equivalent simple query format for metadata that lends itself to use by web clients
- What should the metadata repository return when queried?
  - should this be a link to the metadata or the actual metadata?
- How is the web client informed of the presence of metadata?
  - should this be separated from the data set or embedded with the data, even though it was authored independently of the data?
- Does the data store need to know about the metadata repository?
  - or can the data store be totally de-coupled from the metadata repository?

Design Issues
- How does the application know precisely to which object metadata is attached?
  - it is simple to attach metadata to data, as data has a precise key
  - but metadata related to structural metadata can be more complex
  - careful thought needs to be given to validating that the “context” of the metadata is well understood
- What input and output formats should be supported for the metadata?
  - for SDMX systems, clearly SDMX will need to be supported
  - but there are other, more web-friendly formats such as JSON (JavaScript Object Notation)
- How to attach the same metadata to multiple objects?
  - metadata often travels through the statistical lifecycle and may be used by multiple artefacts in this lifecycle

Query for Metadata – Scenario 1
1. The web client queries for the data and obtains the result.
2. It determines the “key” for each possible point at which metadata can be available, and queries the metadata repository for all these points.
3. The web client places an “i” at each point at which there is metadata.
4. When the user clicks on the “i”, the web client retrieves the metadata.
5. The metadata is returned.

Query for Metadata – Scenario 1
The disadvantages of this approach are:
1. The web client needs to know how to query for the metadata.
2. The web client needs to make multiple queries to the metadata service, one for each data point.
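The cost of Scenario 1 can be made concrete with a small sketch. This is not an SDMX API: the in-memory index, the key layout (country, sex, age), the sample values, and the rule that every prefix of an observation key is a possible attachment point are all assumptions made for illustration.

```python
# Hypothetical stand-in for the metadata repository: maps a target key
# (a tuple of dimension values) to a metadata report identifier.
METADATA_INDEX = {
    ("AT",): "report-country-AT",
    ("AT", "M"): "report-AT-M",
}


def has_metadata(key):
    """Stands in for ONE query from the web client to the metadata service."""
    return key in METADATA_INDEX


def candidate_keys(observation_key):
    """Assumption: every prefix of the observation key may carry metadata."""
    return [observation_key[:n] for n in range(1, len(observation_key) + 1)]


def mark_points(observations):
    """Return the keys that get an "i" marker and the number of queries issued."""
    marked, queries, seen = [], 0, set()
    for obs_key in observations:
        for key in candidate_keys(obs_key):
            if key in seen:
                continue  # already asked about this point
            seen.add(key)
            queries += 1
            if has_metadata(key):
                marked.append(key)
    return marked, queries


# Made-up sample data: three observations keyed by (country, sex, age).
observations = [
    ("AT", "M", "Y15-24"),
    ("AT", "F", "Y15-24"),
    ("BE", "M", "Y15-24"),
]
marked, queries = mark_points(observations)
print(marked, queries)
```

Even with de-duplication, the number of queries grows with the number of distinct points in the result, which is the second disadvantage listed above.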

Query for Metadata – Scenario 2
1. The web client queries for the data.
2. The same data query is passed to the metadata repository, and the data service annotates the data response with the metadata points.
3. The web client receives the response and can use the metadata points to indicate to the user (“i”) that extra metadata is available.
4. When the user wishes to view the additional metadata, the web client makes an additional query to the metadata repository.
5. The metadata is returned.

Query for Metadata – Scenario 2
The disadvantages of this approach are:
1. The database needs to know about the metadata repository, it needs to query the metadata repository, and it needs to know how to interpret the response and embed the information into the dataset.
2. It is not possible to retrofit the metadata repository onto existing data web services without enhancing the data web service.

Query for Metadata – Scenario 3
Looks similar to Scenario 2, but the architecture is different:
- Here the metadata repository is in charge: it receives the data query and passes it to the database.
- The metadata repository enhances the data response with the metadata points.
- The remaining steps are as for Scenario 2.
Advantage of this approach:
- The metadata repository can be retro-fitted to existing database systems, because the database does not need to have any knowledge of the existence of the metadata repository.
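A minimal sketch of the Scenario 3 arrangement, assuming a data service that can be called as a plain function and a repository that keeps a lookup from observation key to metadata URL. The class name, the query strings, the URLs, and the `metadata_url` field are hypothetical and not part of SDMX.

```python
def legacy_data_service(query):
    """Stand-in for an existing data service that knows nothing about metadata."""
    rows = {
        "LFS/AT.M": [{"key": "AT.M", "value": 12.5}],  # made-up values
        "LFS/BE.M": [{"key": "BE.M", "value": 9.1}],
    }
    return rows.get(query, [])


class MetadataRepositoryFacade:
    """The metadata repository sits in front of the data service."""

    def __init__(self, data_service, metadata_points):
        self._data_service = data_service
        self._metadata_points = metadata_points  # observation key -> metadata URL

    def query(self, data_query):
        # Pass the data query through unchanged, then enhance the response.
        observations = self._data_service(data_query)
        annotated = []
        for obs in observations:
            enriched = dict(obs)  # copy, so the data service's rows stay untouched
            url = self._metadata_points.get(obs["key"])
            if url is not None:
                enriched["metadata_url"] = url
            annotated.append(enriched)
        return annotated


facade = MetadataRepositoryFacade(
    legacy_data_service,
    {"AT.M": "https://example.org/metadata/AT.M"},
)
with_metadata = facade.query("LFS/AT.M")
without_metadata = facade.query("LFS/BE.M")
print(with_metadata, without_metadata)
```

The data service is used exactly as it was before, which is what allows the repository to be retro-fitted.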

Unite the Metadata with the Data
- Remember, we must conform to the SDMX Information Model
- SDMX has
  - Data Set for data
  - Metadata Set for metadata
  - (Structure message for structural metadata)

A Not Very Friendly Response
Messages sent to the application:
- “Here’s some metadata. It relates to country=Austria in the context of LFS by Sex and Age. Find the data and sort it out yourself.”
- “Here’s some data for LFS by Sex and Age for employment of men aged for all countries. There may be some metadata, but this will be in a separate message.”

A More Friendly Response
Messages sent to the application:
- “Here’s some URLs of related metadata. I’ve put them next to the data points to which they refer, or in the data set section if the metadata refer to many data points. If you go to the URL you will get the metadata.”
- “Here’s some data for LFS by Sex and Age for employment of men aged for all countries.”
This can be embedded in the SDMX data set using Annotations.
Remember, we are talking about the Information Model, not the specific syntax used to represent it. This could be XML, or JSON, or CSV(!), or any syntax that supports the model.
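As a rough illustration of the friendlier response, the sketch below attaches metadata URLs as annotations either to an individual observation or to the data set as a whole, then serialises the result as JSON. The field names (`flow`, `annotations`, `type`, `url`), the flow identifier, and the URLs are assumptions for this example and do not follow any official SDMX message schema.

```python
import json


def annotate(dataset, metadata_links):
    """Return a copy of the data set with metadata URLs added as annotations."""
    out = {"flow": dataset["flow"], "annotations": [], "observations": []}
    per_observation = {}
    for link in metadata_links:
        if link["target"] is None:
            # Metadata that refers to many data points goes at data set level.
            out["annotations"].append({"type": "METADATA", "url": link["url"]})
        else:
            per_observation[link["target"]] = link["url"]
    for obs in dataset["observations"]:
        item = dict(obs)  # copy, the input data set is left unchanged
        if obs["key"] in per_observation:
            item["annotations"] = [
                {"type": "METADATA", "url": per_observation[obs["key"]]}
            ]
        out["observations"].append(item)
    return out


# Made-up sample data set and links.
dataset = {
    "flow": "LFS_SEX_AGE",
    "observations": [{"key": "AT", "value": 1.0}, {"key": "BE", "value": 2.0}],
}
links = [
    {"target": "AT", "url": "https://example.org/metadata/AT"},
    {"target": None, "url": "https://example.org/metadata/LFS"},
]
message = annotate(dataset, links)
payload = json.dumps(message, indent=2)
print(payload)
```

The same annotated structure could equally be written out as XML or CSV, since only the model, not the syntax, carries the link between data point and metadata.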

Attaching Metadata to Non-Data Points
All these points are valid:
- Data or Metadata Structure Definition
- Concept Scheme
- Concept
- Category Scheme
- Category
- Categorisation
- Data or Metadata Flow
- Data Provider Scheme
- Data Provider
- Provision Agreement
- Content Constraint
- Structure and Item Scheme Maps
- Registered Data Set or Metadata Set

Attaching Metadata to Non-Data Points
Diagram: a Data Provider (from a Data Provider Scheme) united with a Concept (from a Concept Scheme).
- This union has no semantic meaning in the SDMX Information Model
- So it has no meaning for a metadata repository supporting the semantics of the Information Model
- But it may be meaningful for an individual organisation, and so the SDMX Metadata Structure Definition (MSD) allows this type of union
- However, the generic metadata repository cannot support this and must ensure such MSDs are not defined or used in the repository
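One way a repository might enforce this is to check each target combination declared in an MSD against an allow-list before accepting it. The sketch below assumes such an allow-list exists; the entries shown, including the Data Provider plus Dataflow pairing, are illustrative only, and a real repository would derive them from the Information Model.

```python
# Hypothetical allow-list of target combinations the repository accepts.
MEANINGFUL_TARGETS = {
    frozenset({"DataProvider"}),
    frozenset({"Concept"}),
    frozenset({"Dataflow"}),
    frozenset({"DataProvider", "Dataflow"}),  # assumed meaningful for this sketch
}


def rejected_targets(target_sets):
    """Return the target combinations that are not on the allow-list."""
    return [
        sorted(targets)
        for targets in target_sets
        if frozenset(targets) not in MEANINGFUL_TARGETS
    ]


# A made-up MSD declaring two targets: one single object, one union.
msd_targets = [{"Dataflow"}, {"DataProvider", "Concept"}]
rejected = rejected_targets(msd_targets)
print(rejected)
```

An MSD for which `rejected_targets` returns anything would be refused at registration time rather than discovered later when metadata is authored against it.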

Storing, Indexing, Retrieving Metadata
- Metadata needs to be identified uniquely and to be found easily
  - a lot of metadata is textual in nature
  - this is often better supported in a NoSQL database, which has good in-built indexing and textual query support
- Metadata may be versioned
  - draft and final versions
  - metadata for different reporting periods
- Metadata may be shared between different objects
  - this is quite common for metadata supporting processes
- This means the metadata repository needs to separate the actual metadata from the way it is indexed
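The separation of stored metadata from its index can be sketched with two dictionaries: one holding each versioned report once, the other mapping targets to report references. The class, its method names, and the target strings are hypothetical, and a production repository would more likely sit on a document store with text indexing.

```python
class MetadataStore:
    """Keeps metadata content apart from the index that locates it."""

    def __init__(self):
        self._reports = {}  # (report_id, version) -> text, stored once
        self._index = {}    # target -> set of (report_id, version)

    def save(self, report_id, version, text):
        self._reports[(report_id, version)] = text

    def attach(self, target, report_id, version):
        if (report_id, version) not in self._reports:
            raise KeyError(f"unknown report {report_id} version {version}")
        self._index.setdefault(target, set()).add((report_id, version))

    def find(self, target):
        refs = sorted(self._index.get(target, ()))
        return [(rid, ver, self._reports[(rid, ver)]) for rid, ver in refs]

    def report_count(self):
        return len(self._reports)


store = MetadataStore()
store.save("QUALITY", "1.0-draft", "Draft quality notes")
store.save("QUALITY", "1.0", "Final quality notes")
# The same final report is shared by two different objects.
store.attach("Dataflow:LFS", "QUALITY", "1.0")
store.attach("DataProvider:AT", "QUALITY", "1.0")
print(store.find("Dataflow:LFS"), store.report_count())
```

Because attachment only adds index entries, sharing a report between objects or adding a new version never duplicates the stored text.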

Summary
- Get the design right by knowing up front the user requirements and the problems these may reveal
- Designing a generic solution that supports a specific standard (or even multiple standards) is more difficult than designing a solution for a specific organisation
- There will always be specific user requirements that will “test” the robustness of the solution
- But the robustness of the design will be helped if the design is based on a strong model for metadata