Wednesday 25 June 2014 – FAO, Rome BiOnym A concept-mapping workflow for taxon names reconciliation iMarine Board 5 – 25 June 2014, FAO, Rome, Italy Fabio.

1 BiOnym
A concept-mapping workflow for taxon names reconciliation
iMarine Board 5 – Wednesday 25 June 2014, FAO, Rome, Italy
Fabio Fiorellato, Edward Vanden Berghe, Gianpaolo Coro, Nicolas Bailly, Caselyn Aldemita
FAO / CNR / FIN / VUB

2 ‘Big Data’: data makes its way into biology
Need for data integration
Becoming a very realistic possibility
– Management of DBs of millions of records
Needs integration of small, restricted-scope datasets into massive databases
– Intra-discipline integration (homogeneous)
– Inter-discipline integration (heterogeneous)
Individual studies are too small to inform on a scale commensurate with the problems humankind faces
– Evidence-based management of living resources
– Climate change, global warming…

3 Central role of taxon name reconciliation
[Diagram; box labels: Taxon name access, Taxon name reconciliation, Taxon name enrichment, Occurrence data access, Occurrence data reconciliation, Occurrence data enrichment, Environmental data access, Distribution modelling, openModeller, AquaMaps]

4 The BiOnym Workflow

5 Taxonomic names are the keys…
… Keys to bind together information on the same taxon from different sources
But there are problems
– Different research groups use different spellings
– Accidental misspellings
– Synonym, homonym reconciliation (but outside the scope of BiOnym)

6 Some people can’t type
Real example in the OBIS point data database:
Asthenognathas inaefaipes
Asthenognathus inaeqipes
Asthenognathus maefaipes
Astheognathus inaequipes
Asthenognathus inaeguipes
Astheognathus inaeqinipes
Asthenognathus inaequipes
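One common way to quantify how far such variants are from an intended spelling is the Levenshtein edit distance. The sketch below is only an illustration and is not taken from BiOnym itself; it assumes that the last spelling on the slide, Asthenognathus inaequipes, is the intended one, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalise the edit distance to a score in [0, 1], ignoring case."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest


# Assumption: the last spelling listed on the slide is the intended one.
REFERENCE = "Asthenognathus inaequipes"
VARIANTS = [
    "Asthenognathas inaefaipes",
    "Asthenognathus inaeqipes",
    "Asthenognathus maefaipes",
    "Astheognathus inaequipes",
    "Asthenognathus inaeguipes",
    "Astheognathus inaeqinipes",
]

if __name__ == "__main__":
    for variant in VARIANTS:
        print(f"{variant}: distance {levenshtein(variant, REFERENCE)}, "
              f"similarity {similarity(variant, REFERENCE):.2f}")
```

A score like this only ranks candidates; deciding which threshold counts as a match remains a use-case decision.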

7 Things can go very wrong with Excel
Clupea harengus Linnaeus, 1758
Clupea harengus Linnaeus, 1759
Clupea harengus Linnaeus, 1760
…
Clupea harengus Linnaeus, 2254
Clupea harengus Linnaeus, 2255
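A spreadsheet fill-down error of this kind leaves a recognisable trace: the same name and author followed by consecutive years. The following is a minimal sketch of how such runs could be flagged before matching; it is not part of BiOnym, and the function name, the regular expression and the `min_run` threshold are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Assumed input shape: "<name and author>, <4-digit year>", as on the slide.
NAME_YEAR = re.compile(r"^(?P<name>.+?),\s*(?P<year>\d{4})$")


def suspicious_fill_series(rows, min_run=3):
    """Return {name: first_year} for names whose years increase by 1 at least
    `min_run` times in a row, which suggests a dragged spreadsheet cell."""
    years_by_name = defaultdict(list)
    for row in rows:
        match = NAME_YEAR.match(row.strip())
        if match:
            years_by_name[match.group("name")].append(int(match.group("year")))

    flagged = {}
    for name, years in years_by_name.items():
        run = longest = 1
        for prev, cur in zip(years, years[1:]):
            run = run + 1 if cur == prev + 1 else 1
            longest = max(longest, run)
        if longest >= min_run:
            flagged[name] = years[0]  # the first value is the likely original
    return flagged


if __name__ == "__main__":
    sample = [
        "Clupea harengus Linnaeus, 1758",
        "Clupea harengus Linnaeus, 1759",
        "Clupea harengus Linnaeus, 1760",
        "Gadus morhua Linnaeus, 1758",
    ]
    print(suspicious_fill_series(sample))
```

The check is deliberately conservative: it reports candidates for review and does not rewrite any year on its own.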

8 Taxonomic names are the keys…
… Keys to bind together information on the same taxon from different sources
But there are problems
– Different research groups use different spellings
– Accidental misspellings
Reconciliation is a necessity, not a luxury!!!

9 Existing systems…
… Are not flexible
– We need flexibility, as our use case will dictate what the ‘optimal’ behaviour of the system is
– E.g. manual vs automatic systems
… Are often coupled to a single ‘reference list’
– Using a different taxonomic scope for test and reference only increases false positives
– E.g. TaxaMatch with IRMNG…
… Don’t always have the throughput needed for large-scale projects
– Largest DB approx. 20M names: too many pairs!
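The "too many pairs" point can be made concrete with a little arithmetic. The sketch below counts the comparisons a naive all-against-all approach would need; the 20M figure comes from the slide, while the input list size and the comparison rate are purely illustrative assumptions.

```python
def cross_pairs(test_names: int, reference_names: int) -> int:
    """Comparisons needed to score every test name against every reference name."""
    return test_names * reference_names


def all_pairs(names: int) -> int:
    """Unordered pairs within a single list of names: n * (n - 1) / 2."""
    return names * (names - 1) // 2


def days_needed(pairs: int, pairs_per_second: float) -> float:
    """Wall-clock days on one worker at a given (assumed) comparison rate."""
    return pairs / pairs_per_second / 86_400


if __name__ == "__main__":
    reference = 20_000_000   # approximate size quoted on the slide
    test_list = 10_000       # assumption: a modest input file
    rate = 1_000_000         # assumption: comparisons per second per worker

    print("test list vs reference:", cross_pairs(test_list, reference))
    print("reference vs itself:   ", all_pairs(reference))
    print("days for the latter:   ", round(days_needed(all_pairs(reference), rate)))
```

Under these assumptions a single worker would need years for the all-against-all case, which is the motivation for spreading the work over several nodes or pruning candidate pairs first.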

10 Our need
A flexible, highly customisable, workflow-based approach to taxon name matching
– User controls input
– Output can be used as input in other processes
– Running on high performance computing infrastructure
BiOnym!

11 The BiOnym Workflow

12 Key concepts and features in BiOnym
Real-world application of the concept-mapping principles
Focused on marine taxonomy but extendible to other life zones, and embedded in a wider-scope technology (COMET)
Provides a fully customisable workflow (order of matchers)
Takes advantage of the iMarine distributed infrastructure
The modular architecture enables developers to integrate third-party components, add new functionalities or improve existing ones with ease…
… And to add taxonomic authority files
Based on standard and open formats (DwC, DwCA, …)

13 The iMarine solution: existing state-of-the-art
A general purpose concept mapping framework (COMET) was already available in FAO:
– based on an existing FAO product (limited to the fishing vessels domain) initially developed with the support of the Japanese trust fund
– domain independent (can be tailored to any custom domain with little effort)
– provided with all the necessary building blocks and components for general purpose usage

14 The iMarine solution: the quest for integration
The integration of COMET inside iMarine was hailed and expected. Its main challenges:
– Identify and define the custom domain (biological taxonomy)
– Design and implement:
  custom COMET matchlets (engines assigning similarity scores to pairs of names)
  additional, reusable tools for data interchange and data preparation (DwCA converter, input parser, pre- and post-processors)
– Enable components to be easily distributed among worker nodes inside the infrastructure
– Integration in the iMarine Statistical Manager
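The idea of matchlets that assign similarity scores to pairs of names, run in a user-chosen order, can be sketched as follows. This is not the COMET or BiOnym API: every name here (`Matchlet`, `run_workflow`, the three example matchlets and their thresholds) is hypothetical, and the fuzzy stage uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever scoring the real matchlets apply.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Tuple

# A matchlet scores a (test name, reference name) pair in [0, 1].
Matchlet = Callable[[str, str], float]


def exact(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


def case_insensitive(a: str, b: str) -> float:
    return 1.0 if a.casefold() == b.casefold() else 0.0


def fuzzy(a: str, b: str) -> float:
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def run_workflow(
    names: List[str],
    reference: List[str],
    stages: List[Tuple[str, Matchlet, float]],
) -> Dict[str, Optional[Tuple[str, str, float]]]:
    """Try each stage in order; the first stage whose best score reaches its
    threshold decides the match. `reference` must not be empty."""
    results: Dict[str, Optional[Tuple[str, str, float]]] = {}
    for name in names:
        results[name] = None  # stays None if no stage accepts a candidate
        for label, matchlet, threshold in stages:
            best = max(reference, key=lambda ref: matchlet(name, ref))
            score = matchlet(name, best)
            if score >= threshold:
                results[name] = (best, label, score)
                break
    return results


if __name__ == "__main__":
    stages = [("exact", exact, 1.0),
              ("case", case_insensitive, 1.0),
              ("fuzzy", fuzzy, 0.9)]
    reference = ["Clupea harengus", "Gadus morhua"]
    for name, hit in run_workflow(
            ["Clupea harengus", "gadus morhua", "Clupea harengas"],
            reference, stages).items():
        print(name, "->", hit)
```

Putting cheap, strict matchlets first means the expensive fuzzy comparison only runs on names the earlier stages could not resolve; reordering the `stages` list changes that trade-off.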

15 BiOnym System: Overview

16 Where are we?
Tools available in the VRE Statistical Manager for techies (!)
Portlet available in the infrastructure but …
… still to be integrated in the production part of the Biodiversity Research VRE …
… after testing by users outside iMarine
Match names from a file in SM-VRE, not yet in portlet
Accessible as a web service under the WPS protocol
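For readers unfamiliar with WPS (the OGC Web Processing Service standard), a client typically starts by asking a server which processes it offers (GetCapabilities) and what inputs a given process expects (DescribeProcess). The sketch below only builds the corresponding key-value request URLs as defined by WPS 1.0.0; the endpoint and the process identifier are placeholders, not the actual BiOnym service address or process name, and no request is sent.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder values: the real endpoint and process identifier are not given
# in the slides and must be obtained from the infrastructure.
HYPOTHETICAL_ENDPOINT = "https://example.org/wps"
HYPOTHETICAL_PROCESS = "hypothetical.bionym.process"


def wps_url(endpoint: str, request: str, identifier: Optional[str] = None) -> str:
    """Build a WPS 1.0.0 key-value-pair request URL."""
    params = {"service": "WPS", "request": request}
    if request != "GetCapabilities":
        params["version"] = "1.0.0"  # mandatory for DescribeProcess and Execute
    if identifier is not None:
        params["identifier"] = identifier
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(wps_url(HYPOTHETICAL_ENDPOINT, "GetCapabilities"))
    print(wps_url(HYPOTHETICAL_ENDPOINT, "DescribeProcess", HYPOTHETICAL_PROCESS))
```

The response to DescribeProcess lists the inputs a process accepts, which is what a client would need before composing an Execute request.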

17 The BiOnym Interface in Statistical Manager VRE
Never mind the small print.
Step 1: Select your data
Step 2: Compose the matching process. This relies on infrastructure resources
Step 3: Review results. These can be private and ‘for your eyes only’, or public.

18 Interface in the portlet: Advanced search http://bionym.d4science.org:8080/bionym-portletv6/

19 Matching results
VME-DB and iMarine Reports - 8th TCom Feb 4th @Athens

20 Future work
Within the framework of EC iMarine:
– Finalise/fine-tune the interface;
– Analyse the feedback from the members of the biodiversity community who are still to be contacted
Beyond September:
– Several suggestions in the technical report recently published, see section 5 [http://puma.isti.cnr.it/dfdownload.php?ident=/cnr.isti/2014-TR-022]
  Postprocessing
  Sharing matching results
– Explore and fine-tune the WPS services.

21 Thank you

