Jessie Kennedy, Rob Gales, Robert Kukla

1 Converting an Existing Taxonomic Data Resource to Employ an Ontology and LSIDs

2 Introduction
Data sharing is fundamental to biodiversity and taxonomic data applications. Previous attempts to facilitate sharing have had limited success because of:
- lack of take-up of data exchange standards (now slowly happening thanks to the TDWG standards initiative)
- the absence of a common terminology or vocabulary for taxonomic data
- the lack of reference database systems for serving authoritative data
Proposed new technologies:
- a Core Ontology for taxonomic data, to model the biodiversity domain
- adoption by the TDWG GUID group of Life Science Identifiers (LSIDs) for uniquely identifying taxonomic data objects, e.g. specimens, names, concepts
- LSIDs can use an ontology to define the data they return
We need a mechanism for migrating existing data to these technologies; this work explores the issues in using LSIDs and RDF according to an ontology.
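LSIDs are URNs of the form urn:lsid:<authority>:<namespace>:<object>, optionally followed by :<revision>. The Python sketch below splits one into those parts; the LSID class and parse_lsid function are illustrative names, not part of any LSID library, and the example identifiers use a placeholder authority.

```python
import re
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Components of an LSID URN."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


# urn:lsid:<authority>:<namespace>:<object>[:<revision>]
# The "urn:lsid" prefix is matched case-insensitively; no part may contain ':'.
_LSID_PATTERN = re.compile(
    r"^urn:lsid:([^:]+):([^:]+):([^:]+)(?::([^:]+))?$", re.IGNORECASE
)


def parse_lsid(text: str) -> LSID:
    """Split an LSID into its parts, raising ValueError if it is malformed."""
    match = _LSID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an LSID: {text!r}")
    # groups() yields None for the revision when it is absent
    return LSID(*match.groups())


print(parse_lsid("urn:lsid:example.org:names:12345"))
# -> LSID(authority='example.org', namespace='names', object_id='12345', revision=None)
```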

3 Re-using LSIDs
Using LSIDs per se will not solve the data-sharing problem.
- Repositories must re-use LSIDs to cross-reference data within and outwith their own repository.
- It is important to use the same LSID to refer to the same entity. If multiple LSIDs existed for the same entity, we would have to decide whether two LSIDs really denote the same thing, leaving us in the same situation as today: for example, trying to decide whether two taxonomic names are really the same.
- Generating LSIDs for any self-contained data set is a fairly trivial task; assigning LSIDs from an authoritative repository to existing data, so that they are re-used, is more challenging.
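The contrast between minting and re-using LSIDs can be sketched in Python. This assumes a hypothetical local authority name and a lookup table of records already matched to authoritative LSIDs; assign_lsids is an illustrative helper, not part of the project's tool. Minting reduces to string formatting, while re-use depends entirely on how the lookup table was filled.

```python
from typing import Dict, Iterable, Mapping, Tuple

Record = Tuple[str, str]  # (table name, primary key as text)

# Hypothetical authority under which the repository mints its own LSIDs.
LOCAL_AUTHORITY = "hexacorallians.example.org"


def assign_lsids(
    keys_by_table: Mapping[str, Iterable[object]],
    known_external: Mapping[Record, str],
) -> Dict[Record, str]:
    """Give every (table, key) record an LSID.

    Records already matched to an authoritative LSID keep that LSID
    (re-use); all others get a locally minted one.
    """
    assigned: Dict[Record, str] = {}
    for table, keys in keys_by_table.items():
        for key in keys:
            record = (table, str(key))
            if record in known_external:
                assigned[record] = known_external[record]  # re-use
            else:
                assigned[record] = f"urn:lsid:{LOCAL_AUTHORITY}:{table}:{key}"  # mint
    return assigned
```

Everything difficult is hidden in known_external: that table is what the linking tool described in the following slides helps to build.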

4 Project Overview: Imagining the Future
Assume there are authority providers for certain data (publications, names, etc.), e.g. IPNI, ZooBank, IF, Pubbank…
We want to convert an existing data repository, the relational database Hexacorallians of the World:
- represent the existing data as RDF triples
- use LSIDs to uniquely identify entities in the data, according to a domain ontology that extends the TDWG core ontology
- use LSIDs to cross-reference between the data in the repository: some LSIDs are re-used from external sources, and some are generated locally for owned data
- develop a tool to aid the conversion of internal database keys to LSIDs and to help users assign the appropriate LSID from an external LSID authority
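As a sketch of using LSIDs to cross-reference data, the Python below turns one relational row into RDF-style triples: foreign-key columns become links to the referenced record's LSID and other columns become literals. The ontology namespace, column names and key-to-LSID table are hypothetical, and plain tuples stand in for a real RDF library.

```python
from dataclasses import dataclass
from typing import List, Mapping, Tuple, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespace standing in for the domain ontology.
ONT = "http://example.org/hexacorallia#"


@dataclass(frozen=True)
class Literal:
    """A literal value, kept distinct from URI/LSID objects."""
    value: str


Triple = Tuple[str, str, Union[str, Literal]]


def row_to_triples(
    subject: str,
    rdf_class: str,
    row: Mapping[str, object],
    foreign_keys: Mapping[str, str],
    key_to_lsid: Mapping[Tuple[str, str], str],
) -> List[Triple]:
    """Convert one relational row into triples about `subject` (an LSID).

    Columns in foreign_keys (column -> referenced table) become links to
    the referenced record's LSID; other non-null columns become literals.
    """
    triples: List[Triple] = [(subject, RDF_TYPE, ONT + rdf_class)]
    for column, value in row.items():
        if value is None:
            continue  # NULL columns produce no triple
        if column in foreign_keys:
            target = key_to_lsid[(foreign_keys[column], str(value))]
            triples.append((subject, ONT + column, target))
        else:
            triples.append((subject, ONT + column, Literal(str(value))))
    return triples
```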

5 Creating the Domain Ontology
Draft Core Ontology
- Core and BDI ontology: classes and optional relationships between classes
Extending to a Domain Ontology
- domain classes inherit from the core classes
- extended with additional classes, re-using existing ontologies where possible
- additional literal properties specified where necessary
- straightforward for the developer for the Hexacorallia data
Creating RDF triples
- manual mapping of relational data to RDF triples according to the OWL specification
- WASABI mapping extensions and custom code used for generation
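Class inheritance in the domain ontology can be illustrated with a small subclass table: a domain class picks up the properties declared on the core classes above it. The class and property names below are invented for the example and are not taken from the TDWG core ontology; a real implementation would read these relations from the OWL files.

```python
from typing import Dict, List, Set

# Hypothetical fragment: "core:" classes stand for the core ontology,
# "domain:" classes for the Hexacorallia extension.
SUBCLASS_OF: Dict[str, List[str]] = {
    "domain:HexacoralSpecimen": ["core:Specimen"],
    "domain:MuseumHexacoralSpecimen": ["domain:HexacoralSpecimen"],
    "domain:Collector": ["core:Person"],
}

# Literal properties declared directly on each class (hypothetical).
DECLARED_PROPERTIES: Dict[str, Set[str]] = {
    "core:Specimen": {"core:catalogueNumber"},
    "domain:HexacoralSpecimen": {"domain:depthMetres"},
}


def superclasses(cls: str) -> Set[str]:
    """All direct and indirect superclasses of cls."""
    found: Set[str] = set()
    stack = list(SUBCLASS_OF.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in found:  # also guards against accidental cycles
            found.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, []))
    return found


def properties_of(cls: str) -> Set[str]:
    """Properties declared on cls plus those inherited from superclasses."""
    props = set(DECLARED_PROPERTIES.get(cls, set()))
    for parent in superclasses(cls):
        props |= DECLARED_PROPERTIES.get(parent, set())
    return props
```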

6 Simulate Authority Providers
[Diagram: the Hexacorallian Database (test data set), Map + AutoLSID, and simulated authority data providers (e.g. IPNI/ZooBank, Pubbank, museum specimens) holding Specimen, Publication, Concept, Name and Person triple stores.]
Generate LSIDs and RDF instances according to the classes in the ontology appropriate to each “authority”.
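One way to set up the simulated authorities is to split the test data set by ontology class and mint each record's LSID under the authority responsible for that class. The Python sketch below assumes hypothetical authority names and record layout; it is not the project's Map + AutoLSID component.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Properties = Dict[str, str]

# Hypothetical assignment of ontology classes to simulated authorities.
AUTHORITY_FOR_CLASS: Dict[str, str] = {
    "Name": "names.sim.example.org",
    "Publication": "pubs.sim.example.org",
    "Specimen": "museum.sim.example.org",
    "Person": "persons.sim.example.org",
    "Concept": "concepts.sim.example.org",
}


def build_simulated_authorities(
    records: Iterable[Tuple[str, str, Properties]],
) -> Dict[str, Dict[str, Tuple[str, Properties]]]:
    """records: (class name, local key, properties).

    Returns authority -> {LSID: (class name, properties)}, i.e. one
    in-memory "store" per simulated authority.
    """
    stores: Dict[str, Dict[str, Tuple[str, Properties]]] = defaultdict(dict)
    for class_name, local_key, properties in records:
        authority = AUTHORITY_FOR_CLASS[class_name]
        lsid = f"urn:lsid:{authority}:{class_name.lower()}:{local_key}"
        stores[authority][lsid] = (class_name, properties)
    return dict(stores)
```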

7 Convert Existing Provider
Convert the existing thematic data provider to use existing LSIDs and the ontology: the original data repository's RDF data is to be updated with LSIDs from the “authority” providers.
[Diagram: the Hexacorallia Thematic Provider is mapped to the ontology, and an observation subset is held in a Hexacorallia Thematic LSID Triple Store. The Linker Tool matches its entities against simulated authority triple stores (Name, Person, Specimen, Concept, Publication, Observation) with LSID resolution services; each match yields an LSID.]
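Once the linker has confirmed matches, the thematic provider's triples must be updated. The sketch below shows two options, assuming the matches map locally minted identifiers to authority LSIDs: rewriting the identifiers in place, or keeping the data and adding owl:sameAs links. Which of these to prefer is one of the open questions raised later in the talk; the function name and mode flag are illustrative.

```python
from typing import List, Mapping, Tuple

Triple = Tuple[str, str, str]
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def apply_matches(
    triples: List[Triple], matches: Mapping[str, str], mode: str = "replace"
) -> List[Triple]:
    """Update local triples with LSIDs confirmed by the linker.

    mode="replace": rewrite each matched local identifier, wherever it
    appears as subject or object, to the authority's LSID.
    mode="annotate": keep the data unchanged and add an owl:sameAs
    triple from each local identifier to its authority LSID.
    """
    if mode == "replace":
        return [(matches.get(s, s), p, matches.get(o, o)) for s, p, o in triples]
    if mode == "annotate":
        extra = [(local, OWL_SAME_AS, lsid) for local, lsid in sorted(matches.items())]
        return list(triples) + extra
    raise ValueError(f"unknown mode: {mode!r}")
```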

8 WASABI Service Request Dispatcher
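[Diagram: Linking. A Linker Client connects the local (“target”) provider, the Hexacorallia Thematic Triple Store, with the authoritative (“source”) provider and linker, the Person Triple Store. Providers are accessed through the WASABI Service Request Dispatcher, which exposes LSID, SPARQL, OAI and Linker services.]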

9 Configure Provider for Update
- Name the local repository
- Select the class to be linked

11 Name authority provider with linking service
- Configure the linker
- Select the class to link on
- Name the authority provider with a linking service

13 Request Annotations

14 Linking Service…
Communication between the linking service and the linking client:
- The RDF Handler takes RDF models in a POST request. If the data is sufficiently large, it is cached and a thread is spawned to perform the linking and maintain status, which the client fetches through the polling mechanism.
- One response contains the URIs of the classes that may be linked by the service.
- The poll response contains status information and any suggestions made since the last poll.
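The asynchronous submit-and-poll exchange can be sketched with Python's threading module. LinkJob, submit, the size threshold and the suggestion format are assumptions for illustration; the HTTP layer (the POST and the poll requests) is left out, and holding the submitted resources in the job stands in for caching the model.

```python
import threading
from typing import Callable, Dict, List, Optional, Tuple

Suggestion = Tuple[str, str]  # (submitted resource, suggested LSID)


class LinkJob:
    """Hypothetical linking job whose suggestions a client collects by polling."""

    def __init__(self, resources: List[str],
                 find: Callable[[str], Optional[str]]) -> None:
        self._resources = resources  # the "cached" submitted model
        self._find = find  # returns a suggested LSID or None
        self._lock = threading.Lock()
        self._new: List[Suggestion] = []
        self._status = "queued"
        self._thread = threading.Thread(target=self.run, daemon=True)

    def run(self) -> None:
        with self._lock:
            self._status = "running"
        for resource in self._resources:
            lsid = self._find(resource)
            if lsid is not None:
                with self._lock:
                    self._new.append((resource, lsid))
        with self._lock:
            self._status = "done"

    def start(self) -> None:
        self._thread.start()

    def join(self) -> None:
        if self._thread.ident is not None:  # only join a started thread
            self._thread.join()

    def poll(self) -> Dict[str, object]:
        """Status plus the suggestions made since the previous poll."""
        with self._lock:
            new, self._new = self._new, []
            return {"status": self._status, "suggestions": new}


def submit(resources: List[str], find: Callable[[str], Optional[str]],
           async_threshold: int = 100) -> LinkJob:
    """Link small submissions inline; run large ones in a background thread."""
    job = LinkJob(resources, find)
    if len(resources) >= async_threshold:
        job.start()
    else:
        job.run()
    return job
```

Because poll() hands back only the suggestions accumulated since the previous call, the client can show matches as they arrive instead of waiting for the whole model to be linked.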

15 Linking Service
Each step of the linking pipeline is executed for each resource in the submitted RDF model:
- Bootstrapping: examines the ontology, based on the classes the linking service or application has been configured with, to determine any other classes that may be linked upon (superclasses) and the properties defined on those classes.
- Determine properties for matching: by further examination of the ontology, determines which properties defined on the submitted instances have a range of one of the classes identified during bootstrapping.
- Weight possible matches.
- Return suggestions to the client.
The service will download and cache additional ontologies if necessary.
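The pipeline steps can be sketched as small Python functions: bootstrapping the linkable classes from the ontology's subclass relations, selecting properties whose range is one of those classes, then scoring and ranking candidate authority records. The weighted exact-match score is an assumed stand-in, since the slides do not say how matches are weighted; all names are illustrative.

```python
from typing import List, Mapping, Set, Tuple


def linkable_classes(configured: Set[str],
                     subclass_of: Mapping[str, List[str]]) -> Set[str]:
    """Bootstrapping: configured classes plus all their superclasses."""
    result = set(configured)
    stack = list(configured)
    while stack:
        for parent in subclass_of.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def link_properties(property_ranges: Mapping[str, str],
                    classes: Set[str]) -> Set[str]:
    """Properties whose declared range is one of the linkable classes."""
    return {prop for prop, rng in property_ranges.items() if rng in classes}


def score(candidate: Mapping[str, str], record: Mapping[str, str],
          weights: Mapping[str, float]) -> float:
    """Weighted fraction of fields whose values agree
    (case- and whitespace-insensitive exact match)."""
    total = sum(weights.values())
    agree = sum(
        w for field, w in weights.items()
        if field in candidate and field in record
        and candidate[field].strip().lower() == record[field].strip().lower()
    )
    return agree / total if total else 0.0


def suggest(candidate: Mapping[str, str],
            authority_records: Mapping[str, Mapping[str, str]],
            weights: Mapping[str, float],
            min_score: float = 0.5) -> List[Tuple[str, float]]:
    """Rank authority LSIDs for one submitted resource, best first."""
    scored = [(lsid, score(candidate, rec, weights))
              for lsid, rec in authority_records.items()]
    return sorted((s for s in scored if s[1] >= min_score),
                  key=lambda s: s[1], reverse=True)
```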

16 Confirm/Skip Annotations
[Screenshot callouts: the person to find an LSID for; the suggested match.]

17 Confirm/Skip Annotations
[Screenshot callouts: the person to find an LSID for; a choice of possible persons with LSIDs.]

18 Research Questions
How effective is the draft ontology for representing existing data sources?
- Can suitable extensions be easily defined? Straightforward for the developer, but this needs independent verification…
What are the issues for an existing data provider converting their data to use the ontology and LSIDs?
- Replace or annotate existing data? If, for example, I replace an author with a person LSID, what I get when I resolve the person is unlikely to be what I had when I held the data for the author.
- Dependencies between LSID-able objects: if you link via a taxon name LSID, the resolved name should have an LSID for a publication embedded in it, so there should be no need (in principle) to match publications for names.
- What about authorities that issue LSIDs but don't map to other authorities, e.g. name providers that map to neither publication nor specimen providers, and don't want to!
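The dependency point can be made concrete: if resolving a name LSID yields an embedded publication LSID, that publication need not be matched separately, and only names from authorities that give none fall back to the linking tool. The Python below is a hypothetical planning step; the record layout and function name are invented for the example.

```python
from typing import Dict, List, Mapping, Optional, Tuple


def plan_publication_matching(
    records: List[Tuple[str, str, str]],
    embedded_publication: Mapping[str, Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Decide which publications still need the linking tool.

    records: (record id, name LSID, local publication reference).
    embedded_publication: name LSID -> publication LSID found when the
    name LSID was resolved, or None if the name authority gives none.
    Returns (inherited, to_match): publication LSIDs taken from the
    resolved names, keyed by record id, and the distinct local
    references that must still be matched against a publication authority.
    """
    inherited: Dict[str, str] = {}
    to_match: List[str] = []
    for record_id, name_lsid, local_pub in records:
        pub_lsid = embedded_publication.get(name_lsid)
        if pub_lsid:
            inherited[record_id] = pub_lsid
        elif local_pub not in to_match:
            to_match.append(local_pub)
    return inherited, to_match
```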

19 Research Questions…
What support would a linking tool need to provide to end users?
- How would users want to process this data?
- How much automation? E.g. accepting matches above a certain confidence level automatically. Would this be trusted?
- Order of matching: e.g. match all instances of persons at once? Match persons by publication?
Other issues…
- Performance of the existing linking tool approach: a lot of data is passed back and forth; better batch or one-at-a-time processing is needed.
- Finding authorities that provide linking services: how do you find out about authorities with linking services? How do you know which ones to use?
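One possible answer to the automation question is a triage rule: accept a suggestion automatically only when its score is high and clearly beats the runner-up, and send everything else to the confirm/skip screen. The thresholds and the function below are illustrative assumptions, not the project's behaviour.

```python
from typing import Dict, List, Mapping, Sequence, Tuple


def triage(
    suggestions: Mapping[str, Sequence[Tuple[str, float]]],
    auto_accept: float = 0.95,
    margin: float = 0.1,
) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Split linker suggestions into automatic links, items needing a
    confirm/skip decision, and items with no suggestion at all."""
    accepted: Dict[str, str] = {}
    review: List[str] = []
    unmatched: List[str] = []
    for local_id, options in suggestions.items():
        if not options:
            unmatched.append(local_id)
            continue
        ranked = sorted(options, key=lambda o: o[1], reverse=True)
        best_lsid, best = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
        # Auto-accept only a clear, high-confidence winner.
        if best >= auto_accept and best - runner_up >= margin:
            accepted[local_id] = best_lsid
        else:
            review.append(local_id)
    return accepted, review, unmatched
```

Requiring a margin over the runner-up, not just a high score, keeps near-ties (e.g. two people with the same surname and initials) in front of a human.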

20 Acknowledgements
TDWG / Gordon and Betty Moore Foundation

