Dr Dimitris Koureas Lead of Research Data & Partnerships Natural History Museum London Executive Secretary, Biodiversity Information Standards (TDWG) Co-chair,

Dr Dimitris Koureas Lead of Research Data & Partnerships Natural History Museum London Executive Secretary, Biodiversity Information Standards (TDWG) Co-chair, RDA Biodiversity Data Integration IG RDA outputs for addressing biodiversity data challenges

The problem: Capturing and integrating biodiversity data How do we join up these activities? How do we use this as a tool? Species conservation & protected areas Impacts of human development Biodiversity & human health Impacts of climate change Food, farming & biofuels Invasive alien species What infrastructures do we need? (technologies, tools, standards…) What processes do we need? (Modelling, workflows…) What data do we need? (Genes, localities…)

Challenge 1: mobilising data at all scales

Technical aspects of data mobilisation Collections 1.5-3B specimens in collections worldwide Fragmented efforts / need coordination Biodiversity literature >300M pages, BHL scanned 41M to date Copyright post-1923 & article metadata Informatics challenges Automation & annotation Storage & persistence Business models to sustain activity Collections, literature & metadata How can we quickly, efficiently and cost-effectively mobilise biological data at scale? Bibliography of Life (RefFinder & RefBank) BHL literature NHM Digitisation

Big Data in Taxonomy and Systematics

Challenge 2: linking & aggregating data at different scales National Efforts c.5M (e.g. NHM Data Portal) Communities c.50k (e.g. Scratchpads) Global Efforts c.500M (e.g. GBIF Data Portal)

Challenge 3: Synthesising data, e.g. modelling human pressures on biodiversity Reasoning across large, linked biodiversity datasets A clear, singular, long-term vision, which biodiversity data can contribute to Conceptually has many potential uses Identifying trends Explaining patterns Making predictions Real-time alerts - when data contradict current knowledge The ultimate policy tool Major informatics challenges Technically very difficult (many years off) Needs effective prototypes & platforms Some first steps e.g. Local Ecological Footprint Tool Nature 2013, doi: /493295a

Projecting Responses of Ecological Diversity In Changing Terrestrial Systems 2M records, 19k sites, 34k spp. Management Practices Ecosystems Agro-systems Small aggregated datasets Species richness in different ecosystems Land-use change Pollution Invasive species Infrastructure Models to predict how biodiversity responds to human pressures Synthetic challenges: Modeling the biosphere

Diversity of Data types AND data sources Rod Page

GBIF Aggregators Occurrence data aggregated from different nodes (data holders)

Encyclopedia of Life Aggregators

EOL - TraitBank Over 8 million traits Aggregators

GenBank GenBank is part of the International Nucleotide Sequence Database Collaboration A comprehensive database that contains publicly available nucleotide sequences for almost 260,000 formally described species Aggregators

Species+ Aggregators A combined source for legislation, distribution and trade in MEA-listed species

Making taxonomy digital, open & linked Aggregators

The Scratchpads concept Your data External data & services Data papers

Catalogue of Life Providers A single authoritative source of taxonomic information

Biodiversity literature openly available to the world Biodiversity Heritage Library (BHL) > 200M pages of legacy literature Providers

Rod Page

Linking everything together is not enough. We need to provide turn-key solutions for researchers across scientific domains Will spark new ideas and support disruptive innovation in science

The 1k species project

Data mobilisation: mass digitisation and institutional data portals Generate the data necessary to document our collections and provide the means to access this information Provide common institutional portals to these data, using a common licensing framework Tracking and linkage of data, specimens and authors through the adoption of persistent digital object identifier frameworks (e.g. ORCID, DataCite, and CrossRef DOIs). Data citation Metadata PID Information Types RDA outputs
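
As an illustration of what adopting a persistent identifier framework involves at the data-entry level, the sketch below checks that an ORCID iD attached to a record is well formed. It is a minimal example, not part of the presentation: it assumes the hyphenated 16-character form of the iD and uses the ISO 7064 MOD 11-2 check digit that ORCID documents for its identifiers. The function names are hypothetical.

```python
import re

# Hyphenated ORCID iD: four groups of four characters; the last may be "X".
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has the expected shape and a matching check digit."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


if __name__ == "__main__":
    # Sample iD published by ORCID for a fictitious researcher.
    print(is_valid_orcid("0000-0002-1825-0097"))
    # Same iD with the final digit altered.
    print(is_valid_orcid("0000-0002-1825-0098"))
```

A check like this only confirms that an identifier is syntactically valid; whether it resolves to the right person still depends on the registry.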

Data access: a common digital gateway to collections Three billion specimens dispersed across multiple physical locations Union catalogue – Integrated information on collective museum holdings Download datasets, images and 3D models from across all our institutions in a single step Repository Audit and certification RDA outputs

Tools and services that facilitate the manipulation and analysis of big, integrated datasets. Examples include: taxonomic name matching, checklist production, authority files, georeferencing, image recognition and acoustic recognition. Cross-institutionally agreed core data models and technical interfaces (APIs). Data services: a service driven architecture for tool and model development RDA outputs Practical policy Publishing Data Services Data Foundation and terminology
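
Taxonomic name matching, one of the services listed above, can be illustrated with a small sketch. This is only an assumption about how a simple matcher might work, not a description of any existing service: it uses approximate string matching from the Python standard library (`difflib`) against a tiny hypothetical checklist, whereas production services also handle authorship, synonymy and rank.

```python
import difflib
from typing import Optional, Sequence


def match_name(query: str, checklist: Sequence[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest checklist name to `query`, or None if nothing is close enough.

    Matching is case-insensitive; `cutoff` is the minimum similarity ratio (0 to 1).
    """
    # Map lower-cased names back to their canonical spelling.
    canonical = {name.lower(): name for name in checklist}
    hits = difflib.get_close_matches(
        query.strip().lower(), list(canonical), n=1, cutoff=cutoff
    )
    return canonical[hits[0]] if hits else None


if __name__ == "__main__":
    # Hypothetical three-name checklist used only for illustration.
    names = ["Bellis perennis", "Bellis sylvestris", "Quercus robur"]
    print(match_name("Belis perenis", names))
    print(match_name("Homo sapiens", names))
```

The cutoff is the main design choice: a lower value tolerates more misspellings but raises the risk of matching a different species in the same genus.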

Data consensus: community data curation and attribution Incentivise community contribution and curation through: 1. Easy-to-use mechanisms (services) 2. Clear incentives with academic and wider societal value RDA outputs Data Attribution (Will be proposed)

What is the potential value of RDA outputs for integrating biodiversity data? Strong advocacy from domain experts for RDA activities Robust, simple and transferable case studies of how RDA outputs can underpin our efforts Streamline the process of collating requirements specifications from all domain IGs as a first step for all tech WGs Organise joint sessions with emphasis on scientific topics derived from domain IGs Leverage the domain IGs as the driving force for technical solutions

The vision Develop the Biodiversity knowledge graph and build services on top of that
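
To make the idea of a knowledge graph with services built on top more concrete, the sketch below stores links between a specimen, a taxon, a publication and an author as subject-predicate-object triples and offers one toy service: finding how two entities are connected. All identifiers, predicate names and the class itself are hypothetical placeholders; a real graph would more likely be built on RDF tooling and resolvable identifiers.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple


class KnowledgeGraph:
    """A tiny in-memory graph of (subject, predicate, object) links."""

    def __init__(self) -> None:
        self._edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # Store the link in both directions so it can be traversed either way.
        self._edges[subject].append((predicate, obj))
        self._edges[obj].append(("inverse:" + predicate, subject))

    def path(self, start: str, goal: str) -> Optional[List[str]]:
        """Breadth-first search for a shortest chain of linked entities."""
        queue = deque([[start]])
        seen = {start}
        while queue:
            route = queue.popleft()
            if route[-1] == goal:
                return route
            for _, neighbour in self._edges.get(route[-1], []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(route + [neighbour])
        return None


if __name__ == "__main__":
    graph = KnowledgeGraph()
    # Placeholder identifiers, invented for this example.
    graph.add("specimen:EXAMPLE-0001", "identifiedAs", "taxon:Bellis perennis")
    graph.add("taxon:Bellis perennis", "treatedIn", "publication:example-article")
    graph.add("publication:example-article", "authoredBy", "person:example-author")
    print(graph.path("specimen:EXAMPLE-0001", "person:example-author"))
```

The point of the example is that once specimens, names, literature and people share one graph, a question such as "who has published on the taxon this specimen belongs to" becomes a simple traversal.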

Thank you

Array Database Working Group | Peter Baumann
Brokering Governance WG | Stefano Nativi, Max Craglia, Jay Pearlman
Data Citation WG | Andreas Rauber, Ari Asmi, Dieter van Uytvanck
Data Description Registry Interoperability (DDRI) WG | Amir Aryani, Adrian Burton
Data Foundation and Terminology WG | Peter Wittenburg, Gary Berg-Cross, Raphael Ritz
Data Type Registries WG | Larry Lannom, Daan Broeder
Metadata Standards Catalog WG | Rebecca Koskela, Keith Jeffery, Alex Ball
Metadata Standards Directory WG groups.org | Jane Greenberg, Keith Jeffery, Rebecca Koskela, Alex Ball
PID Information Types WG | Tobias Weigel, Tim DiLauro
Practical Policy WG | Reagan Moore, Rainer Stotzka
QoS-DataLC Definitions WG | Paul Millar
RDA/CODATA Summer Schools in Data Science and Cloud Computing in the Developing World | Hugh Shanahan, Andrew Harrison, Simon Hodson
RDA/WDS Publishing Data Bibliometrics WG | Kerstin Lehnert, Todd Carpenter, John Kratz, Sarah Callaghan
RDA/WDS Publishing Data Services WG | Hylke Koers, Adrian Burton
RDA/WDS Publishing Data Workflows WG | Sunje Dallmeier-Tiessen, Fiona Murphy, Nurnberger, Varsha Khodiyar
Repository Audit and Certification DSA–WDS Partnership WG | Lesley Rickards, Mary Vardigan, Rorie Edmunds
Research Data Collections WG | Bridget Almas, Frederik Baumgardt, Tobias Weigel, Tom Zastrow
The BioSharing Registry: connecting data policies, standards & databases in life sciences | Susanna-Assunta Sansone, Rebecca Lawrence, Simon Hodson & Peter McQuilton
Wheat Data Interoperability WG groups.org | Esther Dzalé Yeumo, Richard Fulss
Working Group Data Security and Trust

All data objects need to have unique identifiers and resolvable handlers Standardised web-services Ontologies and vocabularies and open services Clear and robust governance models The challenges of a fragmented domain of e-infrastructures

Diversity of Data types AND data sources Investigator-focused 'small data' Locally generated 'invisible data' 'incidental data' Dark data 20% 80% Published and discoverable data Dark data more important mainly due to their volume [1] [1] Heidorn PB. Library Trends 57: