Converting an Existing Taxonomic Data Resource to Employ an Ontology and LSIDs
Jessie Kennedy, Rob Gales, Robert Kukla

Introduction
- Data sharing is fundamental to biodiversity and taxonomic data applications.
- Previous attempts to facilitate sharing have had limited success:
  - lack of take-up of data exchange standards (now slowly happening due to the TDWG standards initiative)
  - the absence of a common terminology or vocabulary for use in taxonomic data
  - the lack of reference database systems for serving authoritative data
- Proposed new technologies:
  - a Core Ontology for taxonomic data, to model the biodiversity domain
  - adoption of Life Science Identifiers (LSIDs) by the TDWG GUID group, for uniquely identifying taxonomic data objects, e.g. specimens, names, concepts
  - LSIDs can make use of an ontology to define the data to be returned
- Need a mechanism for migrating existing data to the new technologies:
  - explore the issues in using LSIDs and RDF according to an ontology

Re-using LSIDs
- Using LSIDs per se will not address the issue of data sharing.
  - Repositories must re-use LSIDs to cross-reference data within and outwith their own repository.
- It is important that we use the same LSID to refer to the same entity.
  - If multiple LSIDs exist for the same entity, we would have to decide whether or not two LSIDs really refer to the same thing.
  - We would be in a similar situation to the one we are in today, for example, trying to decide whether two taxonomic names are really the same.
- Generating LSIDs for any self-contained data set is a fairly trivial task.
- Appointing LSIDs to existing data from an authoritative repository, in order to re-use them, is more challenging.
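To make the "same LSID for the same entity" point concrete, here is a minimal Python sketch of comparing two LSIDs. It assumes the usual LSID URN layout, urn:lsid:authority:namespace:object with an optional revision, and treats the "urn:lsid:" prefix and the authority as case-insensitive. The names LSID, parse_lsid and same_entity, and the example.org identifiers, are hypothetical and do not come from the project.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Parts of an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its parts, raising ValueError if it is malformed."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    # Assumption of this sketch: only the prefix and the authority are
    # case-insensitive, so the authority is lower-cased and the rest kept as-is.
    return LSID(authority.lower(), namespace, object_id, revision)


def same_entity(a: str, b: str) -> bool:
    """True only when both strings are the same LSID after normalisation."""
    return parse_lsid(a) == parse_lsid(b)
```

A comparison like this only detects re-use of the same identifier. Two different LSIDs issued for the same real-world entity still compare as different, which is exactly the situation the slide warns about.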

Project Overview
- Imagining the future
  - Assume we have authority providers for certain data
  - Publications, names etc., e.g. IPNI, ZooBank, IF, Pubbank…
- Want to convert an existing data repository
  - Relational database: the Hexacorallians of the World
  - Represent existing data as RDF triples
  - Use LSIDs to uniquely identify entities in the data
    - according to a domain ontology which extends the TDWG core ontology
  - Use LSIDs to cross-reference between the data in the repository
    - Some LSIDs re-used from external sources
    - Some LSIDs generated locally (owned data)
- Development of a tool to aid the process of converting internal database keys to LSIDs
  - aid users in appointing the appropriate LSID from some external LSID authority

Creating Domain Ontology
- Draft Core Ontology
  - Core and BDI ontology
  - Classes and optional relationships between classes
- Extend to Domain Ontology
  - Domain classes inherit from the core classes
  - Extended with additional classes
    - Re-use existing ontologies where possible
  - Specify additional literal properties where necessary
  - Straightforward for the developer, for the Hexacorallia data
- Creating RDF triples
  - Manual mapping of relational data to RDF triples according to the OWL specification
  - Used WASABI mapping extensions and custom code for generation
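The mapping step can be illustrated with a small Python sketch that turns one relational row into RDF triples keyed by an LSID and serialises them as N-Triples. This is not the WASABI mapping code used in the project. The ontology namespace, the column names, the property names and the mint_lsid helper are all assumptions made up for the example.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace standing in for the real domain ontology.
ONT = "http://example.org/hexacorallia#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping from relational column names to ontology properties.
PERSON_COLUMNS: Dict[str, str] = {
    "surname": ONT + "surname",
    "forename": ONT + "forename",
}


def mint_lsid(authority: str, namespace: str, key: object) -> str:
    """Build a locally generated LSID from an internal database key."""
    return f"urn:lsid:{authority}:{namespace}:{key}"


def literal(value: str) -> str:
    """Quote a plain string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def row_to_triples(row: Dict[str, object], lsid: str, rdf_class: str,
                   column_map: Dict[str, str]) -> List[Triple]:
    """Emit one rdf:type triple plus one triple per non-empty mapped column."""
    subject = f"<{lsid}>"
    triples: List[Triple] = [(subject, f"<{RDF_TYPE}>", f"<{rdf_class}>")]
    for column, prop in column_map.items():
        value = row.get(column)
        if value is None or value == "":
            continue  # skip NULL or empty cells rather than emit empty literals
        triples.append((subject, f"<{prop}>", literal(str(value))))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise triples, one per line, in N-Triples syntax."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```

For a row such as {"person_id": 7, "surname": "Smith"}, calling mint_lsid("example.org", "person", 7) and then row_to_triples with ONT + "Person" as the class yields a type triple and a surname triple for that person.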

Simulate Authority Providers
[Diagram: the Hexacorallian database serves as the test data set. It is mapped, with automatically generated LSIDs ("Map + AutoLSID"), into separate Specimen, Publication, Concept, Name and Person triple stores, which act as simulated authority data providers (e.g. IPNI/ZooBank, Pubbank, museum specimens).]
- Generate LSID and RDF instances according to the classes in the ontology appropriate to each "authority"

Convert Existing Provider
[Diagram: the original data repository is mapped to the ontology to give the Hexacorallia thematic triple store of the Hexacorallia thematic provider, with an observation triple store (LSID observation subset). The simulated authority triple stores (Specimen, Publication, Concept, Name, Person) are reached through LSID resolution services, and each is connected to the thematic store by a "Match -> LSID" step performed with the Linker Tool.]
- Convert the existing thematic data provider to use existing LSIDs and the ontology
- RDF data to be updated with LSIDs from the "authority" providers

Linking…
[Diagram: two WASABI providers, each with a service request dispatcher exposing LSID, SPARQL and OAI services. The authoritative ("source") provider also hosts the Linker service; the other is the local ("target") provider. Also shown: the Linker Client, the Hexacorallia thematic triple store and the Person triple store.]

Configure Provider for Update
- Select class to be linked
- Name the local repository

Configure the Linker
- Select class to link on
- Name the authority provider with the linking service

Request Annotations

Linking Service…
- Communication between linking service and linking client

Linking Service
- Determines properties for matching
- Weights possible matches
- Returns suggestions to the client
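The three steps of the linking service can be sketched in Python as a weighted comparison of property values. The slides do not say how the real service scores candidates, so everything here is an assumption: the property names, the weights, the use of difflib string similarity, and the function names similarity and suggest_matches.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical matching properties and weights for the "person" class.
PERSON_WEIGHTS: Dict[str, float] = {"surname": 0.7, "forename": 0.3}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]; 0 if either value is missing."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_matches(local: Dict[str, str],
                    candidates: Dict[str, Dict[str, str]],
                    weights: Dict[str, float],
                    limit: int = 5) -> List[Tuple[str, float]]:
    """Score each authority record against the local record and rank the results.

    `candidates` maps an authority LSID to that record's property values.
    Returns up to `limit` (lsid, score) pairs, best match first.
    """
    scored = []
    for lsid, props in candidates.items():
        score = sum(weight * similarity(local.get(name, ""), props.get(name, ""))
                    for name, weight in weights.items())
        scored.append((lsid, score))
    # Sort by descending score, breaking ties on the LSID for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:limit]
```

The client would then show the ranked suggestions to the user to confirm or skip, as in the screens that follow.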

Confirm/Skip Annotations
- Person to find LSID for
- Suggested match

Confirm/Skip Annotations
- Person to find LSID for
- Choice of possible persons with LSIDs

Research Questions
- How effective is the draft ontology for representing existing data sources?
  - Can suitable extensions be easily defined?
  - Straightforward for the developer
  - Needs independent verification…
- What are the issues for an existing data provider in converting their data to use the ontology and LSIDs?
  - Replace or annotate existing data?
    - If, for example, I replace an author with a person LSID, what I get when I resolve the person is unlikely to be what I had when I held the data for an author.
  - Dependencies between LSID-able objects
    - If you link via a taxon name LSID, the resolved name should have an embedded LSID for a publication, so there shouldn't be any need (in principle) to match publications for names.
  - What about authorities that issue LSIDs but don't map to other authorities?
    - e.g. name providers not mapping to either publication or specimen providers
    - and don't want to!

Research Questions…
- What support would a linking tool need to provide end users?
  - How would users want to process this data?
  - How much automation?
    - e.g. above a certain confidence level
    - Would this be trusted?
  - Order of matching
    - e.g. match all instances of persons at once
    - Match persons by publication?
- Other issues…
  - Performance of the existing linking tool approach
    - Lots of data passing going on
    - Need better batch or one-at-a-time processing
  - Finding authorities that provide linking services
    - How do you find out about authorities with linking services?
    - How do you know which ones to use?
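One way to picture the "how much automation?" question is a simple triage rule: accept a suggestion automatically only when its score clears a confidence threshold and clearly beats the runner-up, and leave everything else for the user. The Python sketch below is only an illustration of that idea. The threshold and margin values and the function name triage are assumptions, and the slides leave open whether users would trust such a rule.

```python
from typing import Dict, List, Tuple

Suggestion = Tuple[str, float]  # (authority LSID, match score in [0, 1])


def triage(suggestions_by_item: Dict[str, List[Suggestion]],
           threshold: float = 0.9,
           margin: float = 0.1) -> Tuple[Dict[str, str], Dict[str, List[Suggestion]]]:
    """Split local items into auto-accepted links and items needing manual review.

    An item is auto-accepted only if its best score is at least `threshold`
    and leads the second-best score by at least `margin`.
    """
    accepted: Dict[str, str] = {}
    review: Dict[str, List[Suggestion]] = {}
    for item, suggestions in suggestions_by_item.items():
        ranked = sorted(suggestions, key=lambda s: s[1], reverse=True)
        confident = bool(ranked) and ranked[0][1] >= threshold
        unambiguous = len(ranked) < 2 or ranked[0][1] - ranked[1][1] >= margin
        if confident and unambiguous:
            accepted[item] = ranked[0][0]
        else:
            review[item] = ranked  # the user confirms or skips these by hand
    return accepted, review
```

Raising the threshold or the margin sends more items to manual review. Processing all instances of one class in a single call corresponds to the "match all persons at once" ordering mentioned above.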

Acknowledgements
- TDWG / Gordon and Betty Moore Foundation