Metadata Common Vocabulary: a journey from a glossary to an ontology of statistical metadata, and back
Sérgio Bacelar


Slide 1. Metadata Common Vocabulary: a journey from a glossary to an ontology of statistical metadata, and back
Sérgio Bacelar (sergio.bacelar@ine.pt), Statistics Portugal
Joint UNECE/Eurostat/OECD Work Session on Statistical Metadata (METIS), Lisbon, 11–13 March 2009

Slide 2. Definitions
SDMX and the SDMX Content-Oriented Guidelines (COG)
– Metadata Common Vocabulary (MCV): concepts and related definitions used in the structural and reference metadata of international organizations and national data-producing agencies.
– Content-Oriented Guidelines = MCV + Cross-Domain Concepts (a subset of MCV) + Statistical Subject-Matter Domains.
– Latest version (2009): 397 terms.
– Goal: a uniform understanding of standard metadata concepts.

Slide 3. ESSnet on SDMX
Objective
– Further development of SDMX
– Further development and improvement of the SDMX Content-Oriented Guidelines
Metadata Task Force on SDMX (Statistics Portugal)
WP proposal: MCV ontology
Metadata Common Vocabulary (MCV)
– Semantic univocity → design of a conceptual model of the domain
– Detecting possible inconsistencies, redundancies or incompleteness in the glossary
– Lack of structure: a flat list, with no hierarchical relations between terms
– No semantic relations between terms

Slide 4. Conceptual system
Building a glossary usually presupposes a prior design of a conceptual model of the domain.
Proposal for a revision of MCV:
– start with the existing terms and definitions;
– create semantic relations between terms based on the definitions of the MCV terms (bottom-up or middle-out strategy);
– goal: reveal the latent conceptual system, detecting possible structural incongruities or redundancies.
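The bottom-up step of this proposal can be sketched in Python. This is only an illustration under assumptions: the mini-glossary is hypothetical (definitions paraphrased for the example, not quoted from the MCV), and the two heuristics are simple stand-ins, not the method used in the project. A multi-word term ending in another term becomes a candidate "is-a" link, and a term named in another term's definition becomes a candidate "mentions" link. Every candidate would still need review by a terminologist.

```python
import re

# Hypothetical mini-glossary; definitions are paraphrased for illustration only.
GLOSSARY = {
    "data": "Characteristics or information, usually numerical, collected through observation.",
    "metadata": "Data that defines and describes other data.",
    "reference metadata": "Metadata describing the contents and the quality of statistical data.",
    "structural metadata": "Metadata acting as identifiers and descriptors of the data.",
}


def candidate_relations(glossary: dict[str, str]) -> list[tuple[str, str, str]]:
    """Propose (term, relation, other_term) links from the terms and their definitions."""
    edges = set()
    for term, definition in glossary.items():
        text = definition.lower()
        for other in glossary:
            if other == term:
                continue
            if term.endswith(" " + other):
                # Head-word heuristic: "reference metadata" is-a "metadata".
                edges.add((term, "is-a", other))
            elif re.search(rf"\b{re.escape(other)}\b", text):
                # The other term occurs as a whole word in this definition.
                edges.add((term, "mentions", other))
    return sorted(edges)


for edge in candidate_relations(GLOSSARY):
    print(edge)
```

Keeping the output as plain triples makes it easy to load the accepted candidates into a concept map or an RDF/SKOS representation later.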

Slide 5. Conceptual system and concept map
Main goals:
– find redundancies, inconsistencies, omissions, and terms belonging to domains other than statistical metadata (their presence is justified by the complex, interdisciplinary nature of metadata);
– to find omitted terms (important and relevant ones), it is necessary to analyze the definitions of the concepts.
With this in mind, we built a “Concept Map” covering about 20% of the terms in the MCV (draft version). A concept map is a diagram showing the relationships among terms/concepts: concepts are connected by labeled arrows in a downward-branching hierarchical structure.
Graphical visualization is difficult because of the large number of terms and relations.
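Because graphical visualization becomes unwieldy as the number of terms grows, one option is to keep the concept map as a list of labeled edges and render the downward-branching hierarchy as indented text. The Python sketch below assumes a hypothetical fragment of such a map; apart from the quality dimension "relevance", which the presentation cites from Eurostat, the concepts and arrow labels are illustrative and not taken from the MCV.

```python
from collections import defaultdict

# Hypothetical concept-map fragment: (parent concept, arrow label, child concept).
EDGES = [
    ("metadata", "includes", "structural metadata"),
    ("metadata", "includes", "reference metadata"),
    ("reference metadata", "describes", "quality"),
    ("quality", "has dimension", "relevance"),
]


def render(edges: list[tuple[str, str, str]], root: str) -> str:
    """Render labeled arrows as an indented, downward-branching text tree."""
    children = defaultdict(list)
    for parent, label, child in edges:
        children[parent].append((label, child))

    lines = [root]

    def walk(node: str, depth: int, path: frozenset) -> None:
        for label, child in children[node]:
            lines.append(f"{'  ' * depth}--{label}--> {child}")
            if child not in path:  # do not descend again if the map contains a cycle
                walk(child, depth + 1, path | {child})

    walk(root, 1, frozenset({root}))
    return "\n".join(lines)


print(render(EDGES, "metadata"))
```

The cycle guard matters for a real glossary: definitions often refer to each other circularly, and a text rendering should show such a loop once rather than recurse forever.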

Slide 6. Concept map (partial view)


Slide 8. Terms and relations between MCV terms/concepts

Slide 9. Using the Resource Description Framework (RDF)
RDF is a framework for representing information on the Web, and it is particularly concerned with meaning. An RDF graph is a set of triples, each consisting of a subject, a predicate and an object, e.g. “MetadataExchange is-a DataAndMetadataExchange”.
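To make the triple model concrete, here is a minimal Python sketch that stores triples as plain tuples and answers pattern queries with wildcards. It is not an RDF library: real RDF identifies resources by IRIs rather than bare names, and in practice an RDF store with a query language such as SPARQL would be used. The "DataExchange" triple is a hypothetical addition next to the slide's example.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Illustrative graph: the first triple is the slide's example, the second is hypothetical.
GRAPH: set[Triple] = {
    ("MetadataExchange", "is-a", "DataAndMetadataExchange"),
    ("DataExchange", "is-a", "DataAndMetadataExchange"),
}


def match(graph: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a (subject, predicate, object) pattern; None is a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Which resources are declared to be a kind of DataAndMetadataExchange?
print(match(GRAPH, p="is-a", o="DataAndMetadataExchange"))
```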

Slide 10. A middle-range solution
Using SKOS (Simple Knowledge Organization System), currently being developed within the W3C framework:
– a bridging technology between “chaos” and the more rigorous logical formalism of ontology languages (such as OWL);
– an application of the Resource Description Framework (RDF), providing a model for expressing the basic structure and content of concept schemes such as thesauri.

Slide 11. SKOS example: the concept "data"
<rdf:RDF...........
http://www.mycom/#data
Characteristics or information, usually numerical, that are collected through observation
data
Data is the physical representation of information in a manner suitable for communication, interpretation, or processing by human beings or by automatic means (Economic Commission for Europe of the United Nations (UNECE), "Terminology on Statistical Metadata", Conference of European Statisticians Statistical Standards and Studies, No. 53, Geneva, 2000).
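Since the markup of this example did not survive, the following Python sketch shows one way such a SKOS concept could be serialized as RDF/XML with the standard library. Assumptions: the MCV definition goes in skos:definition, the label "data" in skos:prefLabel, and the UNECE text in skos:note; which properties the original slide used is not recoverable. The namespaces are the standard RDF and SKOS ones, and the concept URI is the one shown on the slide.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def skos_concept(uri: str, label: str, definition: str, note: str) -> str:
    """Serialize one skos:Concept as RDF/XML (the choice of properties is an assumption)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    concept = ET.SubElement(root, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": uri})
    for prop, text in (("prefLabel", label), ("definition", definition), ("note", note)):
        element = ET.SubElement(concept, f"{{{SKOS}}}{prop}", {XML_LANG: "en"})
        element.text = text
    return ET.tostring(root, encoding="unicode")


rdf_xml = skos_concept(
    "http://www.mycom/#data",
    "data",
    "Characteristics or information, usually numerical, that are collected through observation",
    "Data is the physical representation of information in a manner suitable for "
    "communication, interpretation, or processing by human beings or by automatic means",
)
print(rdf_xml)
```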

Slide 12. Ontologies
Ontology = an explicit formal specification of the terms in a domain (here, statistical metadata) and the relations among them. It is a model of reality, built through iterative design.
Tool: an ontology editing and modeling system such as Protégé (open-source software, http://protege.stanford.edu).

Slide 13. Ontology reasoning
It is essential to provide tools and services (reasoners) that help users answer queries over ontology classes and instances, e.g.:
– find more general or more specific classes;
– retrieve the individuals matching a given query, e.g. “Is there any survey with quarterly frequency that uses a classification system and is disseminated as an online database?”
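What a reasoner does for such queries can be imitated, very roughly, in plain Python. This sketch is not an OWL reasoner: it only follows subclass-of links transitively and filters individuals by property values. Every class, individual and property name in it is hypothetical.

```python
# Hypothetical, acyclic taxonomy: each class mapped to its direct superclass.
SUBCLASS_OF = {
    "Survey": "StatisticalOperation",
    "HouseholdSurvey": "Survey",
    "BusinessSurvey": "Survey",
}

# Hypothetical individuals with an asserted class and property values.
INDIVIDUALS = {
    "survey_A": {"type": "HouseholdSurvey", "frequency": "quarterly",
                 "usesClassification": ["classification_X"],
                 "disseminationFormat": "online database"},
    "survey_B": {"type": "BusinessSurvey", "frequency": "annual",
                 "usesClassification": [],
                 "disseminationFormat": "online database"},
}


def superclasses(cls: str) -> list[str]:
    """More general classes, following subclass-of links transitively."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def subclasses(cls: str) -> list[str]:
    """More specific classes: every class that has `cls` among its superclasses."""
    return sorted(c for c in SUBCLASS_OF if cls in superclasses(c))


def instance_of(name: str, cls: str) -> bool:
    """An individual belongs to its asserted class and to all of that class's superclasses."""
    asserted = INDIVIDUALS[name]["type"]
    return asserted == cls or cls in superclasses(asserted)


def quarterly_surveys_online() -> list[str]:
    """The slide's example query, evaluated over the hypothetical individuals."""
    return [
        name for name, props in INDIVIDUALS.items()
        if instance_of(name, "Survey")
        and props["frequency"] == "quarterly"
        and props["usesClassification"]  # uses at least one classification
        and props["disseminationFormat"] == "online database"
    ]


print(superclasses("HouseholdSurvey"))
print(subclasses("Survey"))
print(quarterly_surveys_online())
```

Note that survey_A is never asserted to be a "Survey"; it is found only because its class is inferred to be a subclass of Survey, which is the kind of inference a reasoner automates.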

Slide 14. Ontologies: methodology
Developing an ontology:
1. Define classes.
2. Arrange the classes in a taxonomic hierarchy (classes and subclasses).
3. Define slots (also called roles or properties).
4. Describe the allowed values for these slots (facets, role restrictions).
5. Fill in the slot values for instances (individuals).
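The five steps can be mirrored in a small Python data model. This is a sketch under assumptions, not how Protégé stores ontologies: classes point to one parent class, each slot's only facet is a set of allowed values, and instances are validated when created. All names and values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slot:
    """Step 3: a slot (property); step 4: its facet, here a set of allowed values."""
    name: str
    allowed: frozenset


@dataclass
class OntClass:
    """Step 1: a class; step 2: its place in the taxonomy via `parent`."""
    name: str
    parent: Optional["OntClass"] = None
    slots: dict = field(default_factory=dict)

    def all_slots(self) -> dict:
        """Slots declared on this class plus those inherited from its ancestors."""
        inherited = self.parent.all_slots() if self.parent else {}
        return {**inherited, **self.slots}


@dataclass
class Instance:
    """Step 5: an individual whose slot values are checked against the facets."""
    name: str
    cls: OntClass
    values: dict

    def __post_init__(self) -> None:
        slots = self.cls.all_slots()
        for key, value in self.values.items():
            if key not in slots:
                raise ValueError(f"class {self.cls.name} has no slot {key!r}")
            if value not in slots[key].allowed:
                raise ValueError(f"{value!r} is not an allowed value for {key!r}")


# Hypothetical example: names and allowed values are illustrative only.
has_dimension = Slot("hasDimension", frozenset({"relevance", "accuracy", "coherence"}))
metadata = OntClass("StatisticalMetadata")
quality = OntClass("QualityAssessment", parent=metadata,
                   slots={"hasDimension": has_dimension})
q1 = Instance("quality_report_1", quality, {"hasDimension": "relevance"})
print(sorted(quality.all_slots()))
```

Creating an instance with a value outside the facet, e.g. `{"hasDimension": "colour"}`, raises a `ValueError`, which is the small-scale analogue of the consistency checks an ontology editor performs.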

Slide 15. Ontology: classes
A first attempt to build an ontology of statistical metadata: main classes created from the MCV (following SDMX Content-Oriented Guidelines: Framework, draft, March 2006, p. 6):
1. General metadata (derived from ISO, UNECE and UN documents);
2. Metadata describing statistical methodologies;
3. Metadata describing quality assessment;
4. Terms referring to data and metadata exchange (SDMX information model, data structure definitions, etc.).

Slide 16. Classes and subclasses (Protégé)

Slide 17. Classes and subclasses

Slide 18. Classes and subclasses: Quality

Slide 19. Properties
A property relates a class to a value, e.g. “Quality, according to Eurostat, has a dimension called relevance” (class: Quality; property: has dimension; value: relevance).

Slide 20. Encoding in the Web Ontology Language (OWL)
…………………..
<rdfs:comment> Metadata Common Vocabulary (MCV) ontology.
………………………
// Object Properties
………………………..
// Classes
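As a rough illustration of what such an OWL file declares, the Python sketch below writes a skeleton in RDF/XML, the syntax visible on the slide, using only the standard library. Assumptions: the base IRI http://example.org/mcv# and all class and property names are hypothetical (the classes loosely follow the four main MCV groupings listed for the ontology); an ontology editor such as Protégé would normally generate this file.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

BASE = "http://example.org/mcv#"  # hypothetical namespace for the MCV ontology


def tag(prefix: str, local: str) -> str:
    """Build an ElementTree qualified name such as {uri}Class."""
    return f"{{{NS[prefix]}}}{local}"


def owl_skeleton(classes: dict, object_properties: dict) -> str:
    """Emit owl:Ontology, owl:ObjectProperty and owl:Class declarations as RDF/XML."""
    root = ET.Element(tag("rdf", "RDF"))
    ontology = ET.SubElement(root, tag("owl", "Ontology"), {tag("rdf", "about"): BASE.rstrip("#")})
    ET.SubElement(ontology, tag("rdfs", "comment")).text = "Metadata Common Vocabulary (MCV) ontology."

    # Object properties, each with a domain class and a range class.
    for name, (domain, range_) in object_properties.items():
        prop = ET.SubElement(root, tag("owl", "ObjectProperty"), {tag("rdf", "about"): BASE + name})
        ET.SubElement(prop, tag("rdfs", "domain"), {tag("rdf", "resource"): BASE + domain})
        ET.SubElement(prop, tag("rdfs", "range"), {tag("rdf", "resource"): BASE + range_})

    # Classes, with an optional rdfs:subClassOf link to a parent class.
    for name, parent in classes.items():
        cls = ET.SubElement(root, tag("owl", "Class"), {tag("rdf", "about"): BASE + name})
        if parent is not None:
            ET.SubElement(cls, tag("rdfs", "subClassOf"), {tag("rdf", "resource"): BASE + parent})
    return ET.tostring(root, encoding="unicode")


# Hypothetical class names loosely following the four main MCV groupings.
CLASSES = {
    "StatisticalMetadata": None,
    "GeneralMetadata": "StatisticalMetadata",
    "StatisticalMethodology": "StatisticalMetadata",
    "QualityAssessment": "StatisticalMetadata",
    "DataAndMetadataExchange": "StatisticalMetadata",
    "QualityDimension": None,
}
PROPERTIES = {"hasDimension": ("QualityAssessment", "QualityDimension")}

print(owl_skeleton(CLASSES, PROPERTIES))
```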

Slide 21. Conclusion
Since an ontology is a strict, rigorous and formal way of representing knowledge, mapping a glossary such as the Metadata Common Vocabulary into a statistical metadata ontology can help reduce possible inconsistencies, incompleteness and lack of structure. This may facilitate the harmonization of the concepts describing data (semantic univocity) for SDMX users.

