A Semantic Modelling Approach to Biological Parameter Interoperability Roy Lowry & Laura Bird British Oceanographic Data Centre Pieter Haaring RIKZ, Rijkswaterstaat,

Presentation transcript:

A Semantic Modelling Approach to Biological Parameter Interoperability
Roy Lowry & Laura Bird, British Oceanographic Data Centre
Pieter Haaring, RIKZ, Rijkswaterstaat, The Netherlands
Ocean Biodiversity Informatics

Presentation Overview
The nature of the problem
Dictionaries and data models
The starting position
Manual mapping
Automation through semantic matching
From dictionary to semantic model
Mapping semantic models
Semantic model applications
Conclusions and lessons learned

The Nature of the Problem
BODC and Rijkswaterstaat both have marine databases holding a wide range of physical, chemical and biological parameters
Both were to be included in pan-European metadatabases (EDIOS and SEA-SEARCH CDI) using a common discovery vocabulary
BODC set up the vocabulary and so included a mapping to the BODC Parameter Dictionary
The problem arose of how to provide a similar mapping for Rijkswaterstaat
If the Rijkswaterstaat data markup vocabulary could be mapped to the BODC Parameter Dictionary, then the BODC discovery vocabulary mapping could be used

Dictionaries and Data Models
BODC systems have roots in the GF3 model, which means:
  Data values are linked to a parameter code
  Parameter code is defined in a Parameter Dictionary
  The parameter code specifies more than one metadata item for the data value
  For chemical and biological data, ‘more than one’ becomes ‘a lot’

Dictionaries and Data Models
Rijkswaterstaat uses data models (DONAR becoming WADI)
  Measurements are accompanied by attributes containing specific atomic metadata items
  Each attribute is populated from a controlled vocabulary
  DONAR constrains attribute term combinations using a ‘parameter dictionary’ concept
  WADI reduces maintenance overheads by allowing any combination

The Starting Position
BODC
  Parameter Codes defined by two plain-text fields
  Related semantic information not necessarily in the same field
  Fields would not concatenate sensibly
  OK for humans, but not for machines
Rijkswaterstaat
  Consistently located semantics
  Metadata fields that concatenate sensibly in both Dutch and English

Manual Mapping
Manual mapping protocol
  For each entry in the Rijkswaterstaat ‘dictionary’ spreadsheet
  Look up code with identical meaning using BODC Dictionary search tools (Access Filter by Form)
  If found
    – Copy BODC code from Access and paste into spreadsheet
  Else
    – Prepare dictionary update record and submit for QA and load
Error prone, and 500 entries is pushing the limit of human endurance!
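The manual protocol can be written down as a loop. The sketch below is only an illustration: the dictionary contents, the code `CODE0001` and the function name `map_entries` are invented, and an exact text match stands in for the human judgement of "identical meaning" that the real protocol relied on.

```python
# Hypothetical stand-in for the BODC Parameter Dictionary: description -> code.
# The code value is invented for illustration.
bodc_dictionary = {
    "abundance of calanus per unit volume of the water column": "CODE0001",
}


def map_entries(rws_entries, dictionary):
    """Apply the manual protocol to each spreadsheet entry."""
    mapped = {}           # spreadsheet entry -> BODC code that was found
    update_requests = []  # entries that need a new dictionary record (QA, then load)
    for entry in rws_entries:
        code = dictionary.get(entry.lower())
        if code is not None:
            mapped[entry] = code           # "copy the code into the spreadsheet"
        else:
            update_requests.append(entry)  # "prepare dictionary update record"
    return mapped, update_requests


mapped, update_requests = map_entries(
    [
        "Abundance of Calanus per unit volume of the water column",
        "Biomass of Calanus per unit area of the bed",
    ],
    bodc_dictionary,
)
```

Every pass through the loop corresponds to one human look-up, which is why the approach stops scaling at a few hundred entries.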

Semantic Matching
When code lists run into thousands, automation is required
Rijkswaterstaat developed a semantic matching tool to pull matching terms (preferably one) from the BODC dictionary
The tool was defeated by the lack of standardisation in the BODC plain-text fields, e.g.
  Calanus abundance
  Abundance of Calanus
  Calanus count
  Number of Calanus
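To see why free text resists automation, consider a simple token normaliser applied to the four variants. This is not the Rijkswaterstaat tool, whose internals are not described here; it is a hypothetical minimal matcher (`normalise` and `STOPWORDS` are invented names) that lowercases, drops filler words and sorts the remaining tokens.

```python
STOPWORDS = {"of", "the"}


def normalise(description: str) -> str:
    """Lowercase, drop filler words, and sort tokens so word order is ignored."""
    tokens = [t for t in description.lower().split() if t not in STOPWORDS]
    return " ".join(sorted(tokens))


variants = [
    "Calanus abundance",
    "Abundance of Calanus",
    "Calanus count",
    "Number of Calanus",
]

keys = {v: normalise(v) for v in variants}
distinct = set(keys.values())  # the four spellings still yield several different keys
```

Word order and filler words can be normalised away, but "abundance", "count" and "number" remain different strings. Resolving them needs a controlled vocabulary or a synonym list, which is what the plain-text fields lacked.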

Dictionary to Semantic Model
It became apparent that the BODC Dictionary required significant improvement if it was to support mapping automation
The development strategy was to model the parameter code in the same way DONAR models a measurement
A semantic model was developed to cover all codes in the BODC Dictionary

Dictionary to Semantic Model
Semantic model developed from DONAR with an increased semantic element count to overcome shoe-horning
Principle that semantic elements may be combined automatically to produce text descriptions maintained
Currently implemented as three sub-models
Element superset will ultimately be created as a single model

Dictionary to Semantic Model
Biological sub-model semantic elements
  Parameter (Abundance, Biomass)
  Taxon_code (ITIS code)
  Taxon_name
  Taxon_subgroup (gender, size, stage)
  Parameter_compartment_relationship (per unit volume of the, per unit area of the)
  Compartment (water column, bed, sediment)
  Sample_preparation
  Analysis
  Data_processing
Needs further refinement, e.g. subdivide Taxon_subgroup
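The principle that semantic elements combine automatically into a text description can be sketched with a small record type. The field names follow the element list above; the class name, the concatenation template in `describe` and the placeholder taxon code are assumptions made for illustration, not the BODC implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiologicalParameter:
    """One parameter code expressed as atomic semantic elements."""
    parameter: str
    taxon_code: str
    taxon_name: str
    parameter_compartment_relationship: str
    compartment: str
    taxon_subgroup: str = ""
    sample_preparation: str = ""
    analysis: str = ""
    data_processing: str = ""

    def describe(self) -> str:
        """Concatenate the populated elements into a readable description."""
        taxon = self.taxon_name
        if self.taxon_subgroup:
            taxon = f"{taxon} ({self.taxon_subgroup})"
        parts = [
            f"{self.parameter} of {taxon}",
            self.parameter_compartment_relationship,
            self.compartment,
        ]
        # Optional method elements are appended only when populated.
        parts += [
            v
            for v in (self.sample_preparation, self.analysis, self.data_processing)
            if v
        ]
        return " ".join(parts)


example = BiologicalParameter(
    parameter="Abundance",
    taxon_code="ITIS-PLACEHOLDER",  # placeholder, not a real ITIS code
    taxon_name="Calanus",
    parameter_compartment_relationship="per unit volume of the",
    compartment="water column",
)
description = example.describe()
```

Because every description is generated from the same elements in the same order, the "Calanus abundance" versus "Number of Calanus" variation cannot arise.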

Mapping Semantic Models
Two-stage process
  First map the semantic elements
    DONAR Parameter = BODC Parameter + Parameter_compartment_relationship
    DONAR Compartment = BODC Compartment
  Then map vocabularies for mapped elements
    Surface water = water column
Relational database designers will recognise this as normalisation
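The two stages can be sketched as two look-up tables applied in turn. The element map and the "Surface water" entry come from the slide; the DONAR parameter term "Abundance per unit volume" and the names `ELEMENT_MAP`, `VOCAB_MAP` and `map_record` are hypothetical.

```python
# Stage 1: which BODC element(s) each DONAR element corresponds to.
ELEMENT_MAP = {
    "Parameter": ("Parameter", "Parameter_compartment_relationship"),
    "Compartment": ("Compartment",),
}

# Stage 2: per-element vocabulary maps, DONAR term -> BODC term(s).
# The DONAR parameter term below is invented for illustration.
VOCAB_MAP = {
    "Parameter": {
        "Abundance per unit volume": ("Abundance", "per unit volume of the"),
    },
    "Compartment": {
        "Surface water": ("water column",),
    },
}


def map_record(donar_record):
    """Translate a DONAR attribute record into BODC semantic elements."""
    bodc = {}
    for element, term in donar_record.items():
        targets = ELEMENT_MAP[element]       # stage 1: element mapping
        values = VOCAB_MAP[element][term]    # stage 2: vocabulary mapping
        bodc.update(zip(targets, values))
    return bodc


result = map_record(
    {"Parameter": "Abundance per unit volume", "Compartment": "Surface water"}
)
```

Each vocabulary map is short because it covers one element only, rather than every combination of elements.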

Mapping Semantic Models
Number of ‘look-ups’ required is reduced by an order of magnitude
Vocabulary elements have simple semantics, so automation is possible
Approximately 90% of the Rijkswaterstaat to BODC mapping accomplished by a single SQL statement
Straightforward extension of vocabulary maps (different names for same thing) sorted out most of the rest
Thesauri could help reduce the need for this
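The original SQL statement is not reproduced in the slides. The sketch below shows what a single-statement mapping of this kind might look like, using an in-memory SQLite database; every table name, column name, term and code is invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE rws_entry (id INTEGER, parameter TEXT, compartment TEXT, taxon TEXT);
    CREATE TABLE vocab_map (element TEXT, rws_term TEXT, bodc_term TEXT);
    CREATE TABLE bodc_code (code TEXT, parameter TEXT, compartment TEXT, taxon TEXT);

    INSERT INTO rws_entry VALUES
        (1, 'Number', 'Surface water', 'Calanus'),
        (2, 'Number', 'Surface water', 'Unmapped taxon');
    INSERT INTO vocab_map VALUES
        ('parameter', 'Number', 'Abundance'),
        ('compartment', 'Surface water', 'water column');
    INSERT INTO bodc_code VALUES
        ('CODE0001', 'Abundance', 'water column', 'Calanus');
    """
)

# One statement: translate each element through its vocabulary map, then
# join on the translated elements. Unmatched entries come back with NULL.
rows = con.execute(
    """
    SELECT r.id, b.code
    FROM rws_entry AS r
    JOIN vocab_map AS p
      ON p.element = 'parameter' AND p.rws_term = r.parameter
    JOIN vocab_map AS c
      ON c.element = 'compartment' AND c.rws_term = r.compartment
    LEFT JOIN bodc_code AS b
      ON b.parameter = p.bodc_term
     AND b.compartment = c.bodc_term
     AND b.taxon = r.taxon
    ORDER BY r.id
    """
).fetchall()
con.close()
```

Rows that return no code are the residue left for vocabulary-map extension or manual resolution.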

Mapping Semantic Models
‘Hard Core’ problems required manual resolution
  Unclear or ambiguous semantics in Rijkswaterstaat element vocabularies (residual beta)
  Problems with Dutch to English translation
Some mapping errors were detected
  Caused by homonyms (Branchiura)
  Emphasises the need for more than just a name for a taxon (reference or ITIS code)

Semantic Model Applications
Semantic modelling is a lowest common denominator approach to metadata
This is what makes it good for mapping
The approach also offers the basis for user-controlled data discovery and interoperability
  User chooses the semantic element subset
  User data selection interaction based on the subset vocabulary
  Automated interoperability requires more sophistication (thesauri, ontologies)
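User-controlled discovery over a semantic model might work along the following lines. This is a hypothetical sketch: the records, the taxon names and the helper names `subset_vocabulary` and `select` are invented, and no existing BODC interface is implied.

```python
# Invented parameter records expressed as semantic elements.
records = [
    {"Parameter": "Abundance", "Taxon_name": "Calanus", "Compartment": "water column"},
    {"Parameter": "Biomass", "Taxon_name": "Calanus", "Compartment": "water column"},
    {"Parameter": "Abundance", "Taxon_name": "Mytilus", "Compartment": "bed"},
]


def subset_vocabulary(records, elements):
    """Build the pick-lists for the semantic elements the user chose."""
    return {e: sorted({r[e] for r in records}) for e in elements}


def select(records, criteria):
    """Return the records matching every user-selected term."""
    return [r for r in records if all(r[e] == v for e, v in criteria.items())]


# The user cares only about Parameter and Compartment, so only those
# vocabularies are presented.
vocab = subset_vocabulary(records, ["Parameter", "Compartment"])
hits = select(records, {"Parameter": "Abundance", "Compartment": "water column"})
```

The user never sees the elements they left out, so the interaction stays as simple or as detailed as they choose.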

Conclusions
Don’t even think about manual mapping of large parameter dictionaries
99% of a map is completed in the first 10% of the time
More standardisation means fewer errors and problems
Semantic model vocabularies need ontologies and thesauri to achieve their full interoperability potential

Conclusions
Semantic modelling works for mappings between dictionaries and data models
It also has great potential for parameter discovery and interoperability