Core Data Resources and FAIRification of Data
Biodiversity Data Integration IG
Presentation for Joint meeting: IG ELIXIR Bridging Force, IG Biodiversity Data Integration, WG BioSharing Registry
RDA P11 Berlin 2018
Wouter Addink, DiSSCo Coordination team member
From data... to knowledge?
Moving away from siloed data
Global Biodiversity Knowledgebase
- Atlas of Living Australia
- Biodiversity Heritage Library
- Catalogue of Life
- GBIF
- Barcode of Life
- iDigBio
- Encyclopedia of Life
- Treebase
Hot topics in the Biodiversity Data community
Source: A.H. Ariño et al.: TDWG Now and Then, TDWG, Costa Rica, 7-XII-2016
UN Sustainable Development Goals
Required:
- High-quality integrated data and services
- Coordinated strategy
Linking dispersed information is imperative
Example: Invasive Alien Species (IAS)
- UN Sustainable Development Goals (Target 15.8)
- Economic costs of IAS for EU: €20 billion / year (Kettunen et al. 2009)
- Urgent challenge
Figure labels: Institutional collections; Facilities & information; Climate data; Ecological monitoring data; Genomic information; Species distribution & genomics; Linked Data; Other Research Infrastructures; Analysis / Interpretation Services; Modelling / Prevention / Early detection
RI landscape for linking biodiversity information (ENVRI plus, 2017)
Use case: Alien Invasive species
Figure labels: DATA, MEASUREMENTS, MODELLING; Species/organisms observatories; Experiments; RIs providing data on external factors; Integrative RIs; Biodiversity standards / Reference data; Taxonomic backbone; System; Modelling / Prevention / Early detection; Species distribution & genomics; Institutional collections
Alignment of Projects for effective RI development - DiSSCo example
SAP: Strategic Alignment of Projects
Timeline labels (budget | period), in slide order:
- ICEDIG: €3M | 2018-2020
- CoL+: €0.5M | 2017-2020
- DiSSCo Design Study: €10M | 2014-2017
- SYNTHESYS+: €10M | 2019-2021
- DiSSCo Deploy: €2M | 2024-2025
- DiSSCo Construct: €53M | 2021-2024
- DiSSCo Prepare: €20M | 2019-2023
- MOBILISE: €0.5M | 2018-2022
DiSSCo: A new European infrastructure
114 National Facilities, 21 Countries
- Largest ever formal agreement between natural science collection facilities
- Centralised governance model already in place
- Supporting network of working groups
- DiSSCo builds on top of a mature community of institutions
- Strategic collaboration already underpinned by sound governance and decision-making structures
Challenges in the Biodiversity Data domain
- Accelerate generation and linking of information into research data objects
- Ensure provenance and quality
- Provide reliable, unified, certified services and harmonised policies
- Provide services to other Research Infrastructures
- Connect publishing and use
- Improve feedback and ability to reference data
Stakeholders in FAIRification of data
- e-Infrastructures: Specification of core cloud services | Service Level Agreements
- Standardisation bodies: New community data standards
- Technical communities: Recommendations - specifications | Knowledge exchange
- Research Infrastructures: User requirements | Systems interoperability
Data, workflow and systems integrity
FAIR principles
Linking Biodiversity Data & Core data resources
- Species names: Catalogue of Life Plus project. Joint development of a practical, community-based approach to rapid completion of a global taxonomic backbone: (re-)connects taxonomic research with specimen data; quality control and enhanced linkages; contribution of taxonomic expertise through a clearinghouse.
- DNA barcoding: iBOL, the International Barcode of Life project. The largest biodiversity genomics initiative ever undertaken, to create a digital identification system for life.
- Specimen identifiers: CETAF Identifiers initiative. A joint Linked Open Data (LOD) compliant identifier system developed by the CETAF Information Science and Technology Committee (ISTC), providing mechanisms for consistently referencing individual specimens.
- Literature references: BHL, the Biodiversity Heritage Library. Collaboratively makes biodiversity literature openly available to the world.
FAIRification process adopted by GO FAIR
Steps:
1. Retrieve non-FAIR data
2. Analyse the retrieved data
3. Define the semantic model
4. Make data linkable
5. Assign license
6. Define metadata for the dataset
7. Deploy FAIR data resource
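The seven steps can be walked through on a toy record. This is only a minimal sketch under stated assumptions, not the GO FAIR tooling: the input columns, the `example.org` IRIs and the CC0 license choice are hypothetical, the sample values are illustrative, and Darwin Core and Dublin Core terms are used here as one possible semantic model.

```python
import csv
import io
import json

DWC = "http://rs.tdwg.org/dwc/terms/"   # Darwin Core term namespace
DCTERMS = "http://purl.org/dc/terms/"   # Dublin Core term namespace

# Step 1 input: a "non-FAIR" export with local column names (illustrative values).
RAW_CSV = 'cat_no,species,collector,date\n61643,Tarenaya spinosa,"Soares Neto, RL",2016-01-20\n'

# Step 3: semantic model, here a mapping from local columns to Darwin Core terms.
COLUMN_TO_TERM = {
    "cat_no": DWC + "catalogNumber",
    "species": DWC + "scientificName",
    "collector": DWC + "recordedBy",
    "date": DWC + "eventDate",
}

BASE_IRI = "https://example.org/specimen/"      # hypothetical identifier base
DATASET_IRI = "https://example.org/dataset/1"   # hypothetical dataset identifier
LICENSE = "https://creativecommons.org/publicdomain/zero/1.0/"  # assumed choice


def retrieve(text):
    """Step 1: read the raw records."""
    return list(csv.DictReader(io.StringIO(text)))


def analyse(rows):
    """Step 2: report columns the semantic model does not cover yet."""
    columns = set()
    for row in rows:
        columns.update(row.keys())
    return sorted(columns - set(COLUMN_TO_TERM))


def make_linkable(rows):
    """Steps 3-4: rename columns to term IRIs and give each record its own IRI."""
    records = []
    for row in rows:
        record = {"@id": BASE_IRI + row["cat_no"]}
        for column, value in row.items():
            if column in COLUMN_TO_TERM:
                record[COLUMN_TO_TERM[column]] = value
        records.append(record)
    return records


def describe(records, title):
    """Steps 5-6: attach a license and dataset-level metadata."""
    return {
        "@id": DATASET_IRI,
        DCTERMS + "title": title,
        DCTERMS + "license": {"@id": LICENSE},
        "@graph": records,
    }


def deploy(dataset):
    """Step 7: serialise as JSON-LD-style text; publishing it is out of scope here."""
    return json.dumps(dataset, indent=2, ensure_ascii=False)


rows = retrieve(RAW_CSV)
unmapped = analyse(rows)
dataset = describe(make_linkable(rows), "Demo specimen dataset")
document = deploy(dataset)
print(document)
```

Keeping each step as its own function mirrors the list on the slide, so a step (for example the semantic model) can be replaced without touching the others.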
Some issues for FAIRification of Biodiversity Data
- No infrastructure yet for sensitive biodiversity data
- No standard ontologies
- Semantic Web and Linked Data technologies not widely used in community
- No common standard for metadata, and current standards are incomplete for giving attribution for the maintenance, curation, and digitization of collections (the RDA / TDWG Metadata Standards WG is working on this)
The need for taxon concept identifiers
From: D. Remsen, The use and limits of scientific names in biological informatics, http://zookeys.pensoft.net/articles.php?id=6234
Data classes in the biodiversity data domain: Occurrence, Specimen, Taxon Concept, Interaction, Taxon Name, Publication, Trait, Collection, Sequence, Gene
Meta-model interpretation: Relations in occurrence data
Record <Class=Occurrence> BRA:UFPB:JPB:0000061643
Nodes:
- Occurrence: BRA:UFPB:JPB:0000061643
- Observer: Soares Neto, RL
- Event: <Unnamed>, 20 Jan 2016
- Place: Campus I da UFPB (7.1375 S, 34.84586 W)
- Place: João Pessoa
- Place: Paraíba
- Place: Brasil
- Specimen: 61643
- TaxonConcept <Species>, TaxonName: Tarenaya spinosa
- TaxonConcept <Genus>, TaxonName: Tarenaya
- TaxonConcept <Family>, TaxonName: Cleomaceae
- Collection: JPB
- Institution: UFPB
Relations: observedBy, fromEvent, atLocality, hasEvidence, identifiedAs, hasName, includedIn, inCollection, hasOwner, hasCustodian
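One way to make this meta-model concrete is to hold the example as subject-predicate-object triples. The sketch below is hedged: node and relation names are taken from the slide, but the extracted text does not preserve which nodes each arrow connects, so every endpoint in `TRIPLES` is an assumption. `hasCustodian` is left out because its endpoints are unclear. The helper names `build_index` and `follow` are hypothetical.

```python
from collections import defaultdict

OCC = "Occurrence BRA:UFPB:JPB:0000061643"

# (subject, predicate, object). Endpoints are ASSUMED, not read from the slide.
TRIPLES = [
    (OCC, "observedBy", "Observer Soares Neto, RL"),
    (OCC, "fromEvent", "Event 20 Jan 2016"),
    (OCC, "hasEvidence", "Specimen 61643"),
    ("Event 20 Jan 2016", "atLocality", "Place Campus I da UFPB"),
    ("Place Campus I da UFPB", "includedIn", "Place João Pessoa"),
    ("Place João Pessoa", "includedIn", "Place Paraíba"),
    ("Place Paraíba", "includedIn", "Place Brasil"),
    ("Specimen 61643", "identifiedAs", "TaxonConcept <Species>"),
    ("Specimen 61643", "inCollection", "Collection JPB"),
    ("Collection JPB", "hasOwner", "Institution UFPB"),
    ("TaxonConcept <Species>", "hasName", "TaxonName Tarenaya spinosa"),
    ("TaxonConcept <Species>", "includedIn", "TaxonConcept <Genus>"),
    ("TaxonConcept <Genus>", "hasName", "TaxonName Tarenaya"),
    ("TaxonConcept <Genus>", "includedIn", "TaxonConcept <Family>"),
    ("TaxonConcept <Family>", "hasName", "TaxonName Cleomaceae"),
]


def build_index(triples):
    """Index objects by (subject, predicate) for direct lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index


def follow(index, start, predicate):
    """Follow one predicate from `start` until no further target exists."""
    chain = [start]
    while True:
        targets = index.get((chain[-1], predicate), [])
        if not targets or targets[0] in chain:  # stop at the end or on a cycle
            return chain
        chain.append(targets[0])


index = build_index(TRIPLES)
places = follow(index, "Place Campus I da UFPB", "includedIn")
taxa = follow(index, "TaxonConcept <Species>", "includedIn")
names = [index[(taxon, "hasName")][0] for taxon in taxa]
print(places)
print(names)
```

Because both the place hierarchy and the taxon hierarchy use `includedIn`, the same `follow` helper resolves either chain.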