Jessie Kennedy Rob Gales, Robert Kukla

Presentation transcript:

Converting an Existing Taxonomic Data Resource to Employ an Ontology and LSIDs
Jessie Kennedy, Rob Gales, Robert Kukla

Introduction
- Data sharing is fundamental to biodiversity and taxonomic data applications.
- Previous attempts to facilitate sharing have had limited success:
  - lack of take-up of data exchange standards (now slowly happening due to the TDWG standards initiative)
  - the absence of a common terminology or vocabulary for use in taxonomic data
  - the lack of reference database systems for serving authoritative data
- Proposed new technologies:
  - a Core Ontology for taxonomic data to model the biodiversity domain
  - adoption of Life Science Identifiers (LSIDs) by the TDWG GUID group for uniquely identifying taxonomic data objects, e.g. specimens, names, concepts, etc.
  - LSIDs can make use of an ontology to define the data to be returned
- Need a mechanism for migrating existing data to the new technologies: explore the issues in using LSIDs and RDF according to an ontology.

Re-using LSIDs
- Using LSIDs per se will not address the issue of data sharing.
- Repositories must reuse LSIDs to cross-reference data within and outwith their own repository.
- It is important that we use the same LSID to refer to the same entity.
  - If multiple LSIDs exist for the same entity, we would be required to decide whether or not two LSIDs were really the same thing.
  - We would be in a similar situation to the one we are in today, for example trying to decide whether two taxonomic names are really the same.
- Generating LSIDs for any self-contained data set is a fairly trivial task.
- Assigning LSIDs from an authoritative repository to existing data, so that they are re-used, is more challenging.
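To make the "same LSID for the same entity" point concrete, here is a minimal sketch of parsing and comparing LSIDs. It assumes the general LSID URN layout `urn:lsid:authority:namespace:object[:revision]`, with the `urn:lsid:` prefix and authority treated as case-insensitive. The authority `example.org` and the identifiers are hypothetical, and the function names are illustrative, not part of the project's tooling.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split a string of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    # Only the authority is lower-cased; the remaining parts are kept as given.
    return LSID(parts[2].lower(), parts[3], parts[4], revision)


def same_lsid(a: str, b: str) -> bool:
    """True when both strings denote the same LSID after normalisation."""
    return parse_lsid(a) == parse_lsid(b)


# Hypothetical identifiers: the first two are the same LSID written differently.
print(same_lsid("urn:lsid:example.org:names:12345",
                "URN:LSID:Example.org:names:12345"))
print(same_lsid("urn:lsid:example.org:names:12345",
                "urn:lsid:example.org:names:54321"))
```

Note that a string comparison like this only detects identical LSIDs. Deciding whether two different LSIDs describe the same real-world entity is exactly the matching problem the slide warns about.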

Project Overview
- Imagining the future: assume we have authority providers for certain data (publications, names, etc.), e.g. IPNI, ZOObank, IF, Pubbank…
- Want to convert an existing data repository:
  - a relational database, the Hexacorallians of the World
  - represent existing data as RDF triples
  - use LSIDs to uniquely identify entities in the data, according to a domain ontology which extends the TDWG core ontology
  - use LSIDs to cross-reference between the data in the repository
    - some LSIDs re-used from external sources
    - some LSIDs generated locally (owned data)
- Development of a tool to:
  - aid the process of converting internal database keys to LSIDs
  - aid users in assigning the appropriate LSID from some external LSID authority

Creating Domain Ontology
- Draft Core Ontology
  - Core and BDI ontology
  - Classes and optional relationships between classes
- Extend to Domain Ontology
  - Domain classes inherit from the core classes
  - Extended with additional classes (re-use existing ontologies where possible)
  - Specify additional literal properties where necessary
  - Straightforward for developer, for Hexacorallia data
- Creating RDF triples
  - Manual mapping of relational data to RDF triples according to the OWL specification
  - Used WASABI mapping extensions and custom code for generation
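As an illustration of mapping relational data to RDF triples, the sketch below turns one database row into triples whose subject and foreign-key reference are LSIDs built from internal keys. It does not reproduce the WASABI mapping extensions. The ontology namespace, class and property names, table names, and the `example.org` authority are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical domain-ontology namespace; the real ontology URIs are not given here.
ONT = "http://example.org/hexacorallia-ontology#"


def local_lsid(table: str, key: int) -> str:
    """Build a locally generated LSID from a table name and an internal key."""
    return f"urn:lsid:example.org:{table}:{key}"


def literal(value: str) -> str:
    """Mark a value as a plain literal rather than a resource."""
    return '"' + value + '"'


def publication_to_triples(row: Dict) -> List[Triple]:
    """Map one row of a hypothetical 'publication' table to RDF triples."""
    subject = local_lsid("publication", row["id"])
    return [
        (subject, RDF_TYPE, ONT + "Publication"),
        (subject, ONT + "title", literal(row["title"])),
        # The foreign key becomes a cross-reference to another LSID.
        (subject, ONT + "author", local_lsid("person", row["author_id"])),
    ]


for triple in publication_to_triples({"id": 7, "title": "Sea anemones", "author_id": 3}):
    print(triple)
```

In this sketch every LSID is generated locally. The linking step described later would swap the person LSID for one issued by an external authority where a match is confirmed.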

Simulate Authority Providers
[Diagram: the Hexacorallian database is mapped ("Map + AutoLSID") into separate triple stores for Name, Person, Specimen, Concept and Publication, which act as simulated authority data providers, e.g. IPNI/Zoobank, Pubbank, museum specimens.]
- Test data set
- Generate LSID and RDF instances according to the classes in the ontology appropriate to each "authority".

Convert Existing Provider
Convert an existing thematic data provider to use existing LSIDs and the ontology.
[Diagram: the original data repository is mapped to the ontology to give the Hexacorallia thematic provider (triple store, LSID, Observation subset). Its RDF data is to be updated with LSIDs from the "authority" providers. The linker tool matches against the simulated authority triple stores (Name, Person, Specimen, Concept, Publication, Observation), each match yielding an LSID ("Match + -> LSID"). Other labels: LSID Resolution, Services.]

Linking…
[Diagram: a Linker Client connects two WASABI Service Request Dispatchers: the authoritative ("source") provider and linker, and the local ("target") provider. Service labels: LSID, SPARQL, OAI, Linker. Triple stores shown: Person Triple Store and Hexacorallia Thematic Triple Store.]

Configure Provider for Update
- Name the local repository
- Select class to be linked


Configure the linker
- Name the authority provider with the linking service
- Select class to link on


Request Annotations

Linking Service…
Communication between linking service and linking client:
- The RDF handler takes RDF models in a POST request. If the data is sufficiently large, it is cached and a thread is spawned to link it and maintain status, which is fetched through the polling mechanism.
- Contains URIs of classes that may be linked by the service.
- Contains status information and any suggestions that have been made since the last poll.
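The cache, background thread and polling pattern described here can be sketched as follows. This is not the WASABI implementation: the class `LinkJob`, its method names, and the shape of the status dictionary are hypothetical, the HTTP layer is omitted, and so is the size check that decides whether a background thread is needed at all.

```python
import threading
import uuid
from typing import Callable, Dict, List


class LinkJob:
    """Holds a cached copy of the submitted resources and links them in a thread."""

    def __init__(self, resources: List[str], match_fn: Callable[[str], List[str]]):
        self.job_id = str(uuid.uuid4())
        self._resources = list(resources)      # cached copy of the submitted data
        self._match_fn = match_fn
        self._lock = threading.Lock()
        self._new_suggestions: List[tuple] = []
        self._done = 0
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def wait(self) -> None:
        self._thread.join()

    def _run(self) -> None:
        for resource in self._resources:
            suggestions = self._match_fn(resource)
            with self._lock:
                self._new_suggestions.append((resource, suggestions))
                self._done += 1

    def poll(self) -> Dict:
        """Return status plus only the suggestions made since the last poll."""
        with self._lock:
            fresh, self._new_suggestions = self._new_suggestions, []
            return {
                "done": self._done,
                "total": len(self._resources),
                "finished": self._done == len(self._resources),
                "suggestions": fresh,
            }


# Usage with a stand-in matcher that simply upper-cases the resource name.
job = LinkJob(["person-a", "person-b"], lambda r: [r.upper()])
job.start()
job.wait()
print(job.poll())
```

A client would normally call `poll()` repeatedly while the job runs. Because each poll drains the pending list, a suggestion is delivered once, matching the "since last poll" behaviour on the slide.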

Linking Service
- Determines properties for matching
- Returns suggestions to the client
- Weights possible matches

Details:
- Examines the ontology, based on the classes the linking service or application has been configured with, to determine any other classes that may be linked upon (superclasses) and the properties that have been defined on those classes.
- Determines, by further examination of the ontology, the properties defined on submitted instances that have a range of one of the classes identified by the bootstrapping process.
- Will download and cache additional ontologies if necessary.
- Each step of the linking pipeline is executed for each resource in the RDF model submitted.
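Two of the steps above can be sketched under stated assumptions: collecting the superclasses of the configured classes, and weighting candidate matches by property similarity. The slides do not say how matches are actually weighted, so the string similarity from Python's `difflib` and the per-property weights used here are stand-ins. All class names, property names and LSIDs are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, List, Set, Tuple


def with_superclasses(configured: Iterable[str],
                      subclass_of: Dict[str, List[str]]) -> Set[str]:
    """Return the configured classes plus all their (transitive) superclasses."""
    seen: Set[str] = set()
    stack = list(configured)
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            stack.extend(subclass_of.get(cls, ()))
    return seen


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range 0..1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(local: Dict[str, str], candidate: Dict[str, str],
          weights: Dict[str, float]) -> float:
    """Weighted average similarity over the properties chosen for matching."""
    total = sum(weights.values())
    achieved = 0.0
    for prop, weight in weights.items():
        if prop in local and prop in candidate:
            achieved += weight * similarity(local[prop], candidate[prop])
    return achieved / total


def suggest(local: Dict[str, str], candidates: List[Dict[str, str]],
            weights: Dict[str, float], limit: int = 3) -> List[Tuple[float, str]]:
    """Rank authority records for one local resource, best first."""
    ranked = [(score(local, c, weights), c["lsid"]) for c in candidates]
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked[:limit]


hierarchy = {"Author": ["Person"], "Person": ["Agent"]}
print(sorted(with_superclasses(["Author"], hierarchy)))

authority_records = [
    {"lsid": "urn:lsid:example.org:person:1", "surname": "Smyth", "initials": "J."},
    {"lsid": "urn:lsid:example.org:person:2", "surname": "Smith", "initials": "J."},
]
print(suggest({"surname": "Smith", "initials": "J."}, authority_records,
              {"surname": 2, "initials": 1}))
```

The ranked list is what a client could present as the "choice of possible persons with LSIDs" for the user to confirm or skip.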

Confirm/Skip Annotations
[Screenshot labels: suggested match; person to find LSID for]

Confirm/Skip Annotations
[Screenshot labels: person to find LSID for; choice of possible persons with LSIDs]

Research Questions
- How effective is the draft ontology for representing existing data sources?
  - Can suitable extensions be easily defined?
  - Straightforward for developer
  - Need independent verification…
- What are the issues for an existing data provider in converting their data to use the ontology and LSIDs?
  - Replace or annotate existing data: if, for example, I replace an author with a person LSID, what I get when I resolve the person is unlikely to be what I had when I held the data for an author.
  - Dependencies between LSID-able objects: if you link via a taxon name LSID, the resolved name should have an embedded LSID for a publication, so there shouldn't be any need (in principle) to match publications for names.
  - What about authorities that issue LSIDs but don't map to other authorities, e.g. name providers that do not map to either publication or specimen providers and don't want to?
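The "replace or annotate" choice can be shown on a toy triple list. This is only a sketch: the property names (`ex:author`, `ex:authorLSID`) and identifiers are hypothetical, and neither function is taken from the project's tooling.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def replace_value(triples: List[Triple], subject: str, prop: str, lsid: str) -> List[Triple]:
    """Replace the local value of a property with the authority's LSID."""
    return [(s, p, lsid) if (s, p) == (subject, prop) else (s, p, o)
            for s, p, o in triples]


def annotate_value(triples: List[Triple], subject: str, link_prop: str, lsid: str) -> List[Triple]:
    """Keep the local value and add a separate triple pointing at the LSID."""
    return list(triples) + [(subject, link_prop, lsid)]


data = [("pub:7", "ex:author", '"Smith, J."')]
person = "urn:lsid:example.org:person:2"

print(replace_value(data, "pub:7", "ex:author", person))
print(annotate_value(data, "pub:7", "ex:authorLSID", person))
```

Replacing discards the original author string, so what is later resolved depends entirely on what the authority returns. Annotating keeps the local data alongside the link, at the cost of storing both.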

Research Questions…
- What support would a linking tool need to provide end users?
  - How would users want to process this data?
  - How much automation? E.g. above a certain confidence level. Would this be trusted?
  - Order of matching, e.g. match all instances of persons at once? Match persons by publication?
- Other issues…
  - Performance of the existing linking tool approach: lots of data passing going on; need better batch or one-at-a-time processing.
  - Finding authorities that provide linking services: how do you find out about authorities with linking services? How do you know which ones to use?
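One way to picture the "automation above a certain confidence level" question is a triage step over ranked suggestions. The sketch assumes suggestions arrive as `(score, lsid)` pairs with scores between 0 and 1. The thresholds 0.95 and 0.5 are arbitrary placeholders, not values from the project.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[float, str]]


def triage(suggestions: Dict[str, Ranked],
           auto_accept: float = 0.95,
           min_review: float = 0.5):
    """Split resources into auto-accepted, needing user review, and unmatched."""
    accepted: Dict[str, str] = {}
    to_review: Dict[str, Ranked] = {}
    unmatched: List[str] = []
    for resource, ranked in suggestions.items():
        best = max(ranked, key=lambda pair: pair[0], default=None)
        if best is None or best[0] < min_review:
            unmatched.append(resource)
        elif best[0] >= auto_accept:
            accepted[resource] = best[1]
        else:
            to_review[resource] = sorted(ranked, key=lambda pair: pair[0], reverse=True)
    return accepted, to_review, unmatched


accepted, to_review, unmatched = triage({
    "local:person:1": [(0.99, "urn:lsid:example.org:person:2")],
    "local:person:2": [(0.70, "urn:lsid:example.org:person:5"),
                       (0.82, "urn:lsid:example.org:person:9")],
    "local:person:3": [],
})
print(accepted)
print(to_review)
print(unmatched)
```

Whether users would trust the auto-accepted group is the open question on the slide. Lowering `auto_accept` reduces manual work but raises the risk of wrong links.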

Acknowledgements: TDWG / Gordon and Betty Moore Foundation