Facilitating biodiversity science through Virtual Research Environments


Presentation transcript:

Facilitating biodiversity science through Virtual Research Environments: Challenges and opportunities for the Scratchpads platform
Dr Dimitris Koureas, Natural History Museum London | Research Data Alliance | Biodiversity Information Standards (TDWG)
Firstly, please accept my apologies for not being able to attend this workshop in person; this was due to prior teaching commitments. In the next 15 minutes I aim to present the context in which VREs have been developed and the challenges they address, especially in the biodiversity domain, using Scratchpads as an example.

The problem: capturing and integrating biodiversity data
How do we join up these activities? How do we use this as a tool? Species conservation & protected areas | Impacts of human development | Biodiversity & human health | Impacts of climate change | Food, farming & biofuels | Invasive alien species. What infrastructures do we need? (technologies, tools, standards…) What processes do we need? (modelling, workflows…) What data do we need? (genes, localities…)
Studies of biodiversity have many components; broadly, they map to one or more of the six circles on this slide. Topics like applied ecology, conservation science, genomics and evolutionary biology are at the heart of what we do. Answers to urgent global societal challenges such as biodiversity loss, climate change, the sustainability of ecosystem services or invasive alien species can only be provided if we link these topics up and use the products of these activities as a resource to address issues relevant to science and society. This means our job is, in part, to identify the kinds of infrastructures we need to achieve this (for example, where do cloud computing or high-performance computing fit on this diagram?). We need to ask ourselves what kinds of processes are needed to achieve this linkage (what workflows are required, what models do we need?). And of course we need to identify and target the data required to address these questions (what genotypic, phenotypic, environmental and temporal data are needed to integrate these areas?). So my talk today is really about the work needed to achieve integration between these topics, and how we might develop it within new consortia as part of the Horizon 2020 funding framework.

Challenge 1: mobilising data at all scales
If I were giving this talk, say, five years ago, I would have said the major barrier to delivering this integration was a lack of computing power or storage, or perhaps getting the right algorithms together. These used to be the limiting factors when we thought about infrastructure integration. Now the limiting factors are different and, in my view, they are three-fold. Firstly, the major challenge is finding the data. Within our community we are surrounded by a sea of legacy data and many new forms of data. There are major digitisation efforts going on

Challenge 2: linking & aggregating data at different scales
Communities: c. 50k (e.g. Scratchpads)
National efforts: c. 5M (e.g. NHM Data Portal)
Global efforts: c. 500M (e.g. GBIF Data Portal)
Linking that data. Data is created and
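As records move from community sites to national and global portals, the same record can arrive by more than one route. A minimal sketch of aggregation across scales, assuming every record carries a persistent identifier (here a Darwin Core-style occurrenceID) that is used to collapse duplicates; the class, function and identifiers below are hypothetical and are not the API of Scratchpads, the NHM Data Portal or GBIF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    occurrence_id: str    # persistent identifier, e.g. a Darwin Core occurrenceID
    scientific_name: str
    source: str           # portal that contributed this copy of the record


def aggregate(*sources: list[Occurrence]) -> dict[str, Occurrence]:
    """Merge record lists from several portals, keeping the first copy seen of each identifier."""
    merged: dict[str, Occurrence] = {}
    for records in sources:
        for record in records:
            merged.setdefault(record.occurrence_id, record)
    return merged


# Hypothetical records: one specimen is published by both a community site and a national portal
community = [
    Occurrence("urn:example:occ:1", "Apis mellifera", "community site"),
    Occurrence("urn:example:occ:2", "Bombus terrestris", "community site"),
]
national = [
    Occurrence("urn:example:occ:2", "Bombus terrestris", "national portal"),
    Occurrence("urn:example:occ:3", "Vespa crabro", "national portal"),
]
global_index = aggregate(community, national)
print(len(global_index))  # 3
```

Without a shared identifier the aggregator cannot tell that the two copies of occ:2 are the same record, which is one reason persistent identifiers matter for linking data across scales.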

Challenge 3: synthesising data, e.g. modelling human pressures on biodiversity
Models to predict how biodiversity responds to human pressures (land-use change, pollution, invasive species, infrastructure). Small aggregated datasets: 2M records, 19k sites, 34k spp. Species richness in different ecosystems, agro-systems and management practices.
Making use of the data: PREDICTS (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems, www.predicts.org.uk) is a collaborative project that uses a meta-analytic approach to investigate how local biodiversity typically responds to human pressures such as land-use change, pollution, invasive species and infrastructure, and ultimately to improve our ability to predict future biodiversity changes. The PREDICTS project is collecting data from scientists worldwide in order to produce a global database of terrestrial species' responses to human pressures. Thanks to generous contributions from researchers, and a great deal of hard work by students and staff at the Natural History Museum and UNEP-WCMC, the project now has over 2 million biodiversity records from over 19,000 sites, covering more than 34,000 species. The project looks at species richness in a range of natural, agricultural and land-management settings, and then extrapolates these data forward to predict species richness under projected changes in patterns of land use.
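To make the comparison of species richness across land-use settings concrete, here is a minimal sketch using the log response ratio, a common effect size in ecological meta-analysis: ln(mean richness under a land use / mean richness in a baseline). This is a swapped-in illustration, not necessarily the statistical model PREDICTS uses, and the richness values are hypothetical.

```python
import math
from statistics import mean

# Hypothetical site-level species richness, grouped by land-use class
richness = {
    "primary vegetation": [42, 38, 45],
    "secondary vegetation": [35, 30, 33],
    "cropland": [18, 22, 20],
}


def log_response_ratio(treatment: list[int], control: list[int]) -> float:
    """ln(mean treatment / mean control); negative values mean fewer species than the control."""
    return math.log(mean(treatment) / mean(control))


baseline = richness["primary vegetation"]  # assumed reference condition
for land_use, values in richness.items():
    print(f"{land_use}: LRR = {log_response_ratio(values, baseline):+.2f}")
```

A log ratio is symmetric for proportional gains and losses, which makes effects from studies with very different absolute richness easier to pool.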

Reaching the long-term vision is predicated on a BIG change in the way we (researchers) work
To actually tackle these global challenges in biodiversity science and fulfil the community-inspired vision of modelling the biosphere, researchers, and scientists in general, need to change significantly the way they do science. Each step of the scientific lifecycle is subject to huge change: a paradigm shift in science comparable in scale and impact only to the scientific revolution of the Enlightenment period.
Data-driven science | Open science | Efficient infrastructures

Data is everywhere and is produced at an ever-increasing rate: 90% of all science data generated in the last 3 years!
Data are everywhere, and they are being produced at unprecedented rates. It is only when we realise their volume that we can appreciate the potential impact and opportunities, as well as the size of the challenges that lie ahead!

Big Data in Taxonomy and Systematics: c. 17,000 new taxa described every year
Even within traditionally isolated research domains like alpha taxonomy, there is a wave of new data that supports the publication of c. 17,000 new taxa every year! BUT producing the data doesn't actually mean that we can effectively reuse, preserve or aggregate them (next slide).

An informatician's view of biodiversity
20% published and discoverable data
80% dark (or grey) data: investigator-focused 'small data', locally generated 'invisible data', 'incidental data'. Dark data are more important, mainly due to their volume [1]. Dark data are lost within 20 years.
In fact, it is estimated that 80% of all the data produced in science are eventually lost after two decades! Most of the dark data are produced by…, forming the main corpus of what we call long-tail data!
[1] Heidorn PB. Library Trends 57:280-299

Socio-cultural & technological challenges
To fully embark on the new data-driven scientific era:
Socio-cultural: a shift in the modus operandi of doing science
Technical: mobilisation, standardisation and accessibility

Biodiversity informatics landscape: key problems
Landscape is complex, fragmented & hard to navigate
Many audiences (policy makers, scientists, amateurs, citizen scientists)
Many scales (global solutions to local problems)
Figure adapted from Peterson et al., Syst. & Biodiv. 2010, doi: 10.1080/14772001003739369

The role of Virtual Research Environments
VREs sit on top of e-infrastructures and abstract from the available services. They act as thematic gateways to data and cyberinfrastructures, and as platforms for collaboration, capacity building, interdisciplinary research and cross-border collaboration.
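The idea that a VRE sits on top of e-infrastructures and abstracts from their services can be sketched as a gateway that exposes one interface and fans requests out to several backends. This is a minimal sketch under stated assumptions: the `NameService` protocol, the two backend classes and the taxon-name use case are hypothetical, and no real remote service is contacted.

```python
from typing import Protocol


class NameService(Protocol):
    """Any backend that can resolve a taxon name (hypothetical interface)."""

    def resolve(self, name: str) -> list[str]: ...


class LocalChecklist:
    """A community's own checklist held inside the VRE."""

    def __init__(self, names: set[str]):
        self.names = names

    def resolve(self, name: str) -> list[str]:
        return [name] if name in self.names else []


class RemoteIndexStub:
    """Stand-in for an external e-infrastructure service; it only looks up a local dict."""

    def __init__(self, index: dict[str, str]):
        self.index = index

    def resolve(self, name: str) -> list[str]:
        hit = self.index.get(name.lower())
        return [hit] if hit else []


class VirtualResearchEnvironment:
    """Gateway that hides which backend answered, so users see a single interface."""

    def __init__(self, services: list[NameService]):
        self.services = services

    def resolve(self, name: str) -> list[str]:
        results: list[str] = []
        for service in self.services:
            for match in service.resolve(name):
                if match not in results:  # drop answers already returned by another backend
                    results.append(match)
        return results


vre = VirtualResearchEnvironment([
    LocalChecklist({"Apis mellifera"}),
    RemoteIndexStub({"apis mellifera": "Apis mellifera Linnaeus, 1758"}),
])
print(vre.resolve("Apis mellifera"))
```

Because backends only need to satisfy the protocol, a service can be swapped or added without changing what the researcher sees.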

Biodiversity data online: Virtual Research Environment
Enter – Structure – Curate – Link – Share – Publish
8 years of continuing development | 3 major grants | Industry-leading platform

Scratchpads: 650 communities | 3.1 million visitors | 150,000 taxa | 6,500 active users
The Scratchpads platform has been developed over the last 7 years under this framework, to provide researchers with the tools necessary to make taxonomy digital, open and linked, and to facilitate the development of virtual research environments.

A Scratchpad is a collaborative platform, a gateway to big data
(Harvest | Feed to | Open biodiversity standards and services, e.g. TDWG: DwC | In-house data | External data & services)
Scratchpads are fed by your data. They help you structure your data in a way that makes it both human- and machine-readable. They allow you to contribute to global biodiversity databases, and they also aggregate information related to your data from external resources.
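Structuring data so that it is machine-readable largely means mapping local column names onto a shared standard. A minimal sketch, assuming a researcher's spreadsheet with hypothetical column names that are renamed to real Darwin Core terms (occurrenceID, scientificName, eventDate, decimalLatitude, decimalLongitude, basisOfRecord) and written as CSV; this is not how Scratchpads implements its import or export.

```python
import csv
import io

# Hypothetical mapping from a researcher's own spreadsheet columns to Darwin Core terms
LOCAL_TO_DWC = {
    "id": "occurrenceID",
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "type": "basisOfRecord",
}


def to_dwc_csv(rows: list[dict]) -> str:
    """Rename local columns to Darwin Core terms and serialise the rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(LOCAL_TO_DWC.values()), lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Columns with no Darwin Core equivalent (e.g. free-text notes) are left out
        writer.writerow({LOCAL_TO_DWC[k]: v for k, v in row.items() if k in LOCAL_TO_DWC})
    return buffer.getvalue()


spreadsheet = [
    {"id": "urn:example:specimen:0001", "species": "Apis mellifera", "date": "2014-06-12",
     "lat": 51.4967, "lon": -0.1764, "type": "PreservedSpecimen", "notes": "drawer 12"},
]
print(to_dwc_csv(spreadsheet))
```

A Darwin Core Archive (DwC-A) additionally packages such files with a metadata descriptor; that step is omitted here.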

User and stakeholder engagement | Data preservation & citability | Service longevity

User buy-in: incentives for mobilising long-tail research
Share your work and take credit for it: publication of data in peer-reviewed open access journals
Biodiversity Data Journal (Pensoft) | GigaScience (BMC) | Scientific Data (NPG) | F1000Research
Pensoft Writing Tool (PWT) | XML

User and stakeholder buy-in: incentives for mobilising long-tail data
Confidence | Commitment | Longevity | Agility | Adaptability | User monitoring | Marketing | Visibility | Intuitive interface

Data structure, annotation and storage
Adhere to ratified community standards: DwC (DwC-A), Audubon Core, phytogeographical areas
Allocation of persistent identifiers to data objects (PURL already in place)
Deposition in open repositories
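Persistent identifiers such as PURLs keep a citation stable while the content behind it moves: the identifier stays fixed and only its registered target changes. A toy sketch of that indirection, assuming a hypothetical base URL under example.org; a real PURL service answers with an HTTP redirect rather than returning a string, and this class is not any existing PURL server's API.

```python
class PurlResolver:
    """Toy persistent-URL resolver: the identifier never changes, only its target does."""

    def __init__(self, base: str):
        self.base = base.rstrip("/")
        self._targets: dict[str, str] = {}

    def register(self, local_id: str, target: str) -> str:
        purl = f"{self.base}/{local_id}"
        if purl in self._targets:
            raise ValueError(f"{purl} already assigned; persistent IDs must not be reused")
        self._targets[purl] = target
        return purl

    def move(self, purl: str, new_target: str) -> None:
        """Point an existing identifier at the content's new location."""
        if purl not in self._targets:
            raise KeyError(purl)
        self._targets[purl] = new_target

    def resolve(self, purl: str) -> str:
        """Return the current location (a real PURL server would redirect to it)."""
        return self._targets[purl]


resolver = PurlResolver("https://purl.example.org/scratchpad")
purl = resolver.register("specimen/0001", "https://old-server.example.org/node/42")
resolver.move(purl, "https://new-server.example.org/specimen/0001")
print(purl, "->", resolver.resolve(purl))
```

Anyone who cited the identifier before the migration still reaches the record afterwards, which is what makes data objects citable over the long term.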

Data structure, annotation and storage
Effective implementation of Knowledge Organisation Systems: biodiversity communities, vocabularies and ontologies
The domain is lagging behind in making optimal use of controlled vocabularies and ontologies.
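One small, practical use of a controlled vocabulary is normalising free-text entries onto agreed terms, and flagging the rest for curation. A minimal sketch, assuming a subset of the Darwin Core basisOfRecord values as the vocabulary and a hypothetical community-maintained synonym table; real Knowledge Organisation Systems (e.g. SKOS-based vocabularies or ontologies) carry far richer relationships than a flat lookup.

```python
from typing import Optional

# A subset of values from the Darwin Core basisOfRecord vocabulary
BASIS_OF_RECORD = {"PreservedSpecimen", "LivingSpecimen", "HumanObservation", "MachineObservation"}

# Hypothetical synonym table a community might maintain to map free text onto the vocabulary
SYNONYMS = {
    "museum specimen": "PreservedSpecimen",
    "pinned specimen": "PreservedSpecimen",
    "field sighting": "HumanObservation",
    "camera trap": "MachineObservation",
    "zoo animal": "LivingSpecimen",
}


def normalise(value: str) -> Optional[str]:
    """Return the controlled term for a free-text value, or None if it needs manual curation."""
    cleaned = value.strip()
    if cleaned in BASIS_OF_RECORD:
        return cleaned
    return SYNONYMS.get(cleaned.lower())


entries = ["Pinned specimen ", "HumanObservation", "sweep net"]
print([normalise(e) for e in entries])
```

Values that come back as None are exactly the gaps a shared vocabulary or ontology would need to cover.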

Longevity of services is key
We need to look beyond the fragile model of recurring research funding. This requires a shift in the way we think of e-infrastructures and information resources: from a stable/rigid system to a dynamic/open process, outsourced to the end-user community. We need to set up the environment that will enable community contribution.

Infrastructure maintenance
Community-based sustainability model: technical maintenance | user support
Open source & modular | Crowdsourcing support activities | Maximising support efficiency
Three basic pillars for community support

Key actions to increase interoperability, efficiency and uptake
Minimise infrastructure redundancy | Harmonise user experiences | Open access and open source | Learn from experience across domains

Leverage effort and data impact
Data mobilisation & generation | Data curation | Data publishing | Data analysis
Seamless virtual research environments that incentivise mobilisation of long-tail research

Common issues - different approaches A highly dynamic but fragmented landscape

Efficient networking and collaboration platforms
Biodiversity Data Integration IG | The single largest organisation on research data | Cross-domain | Bottom-up | Multilateral agreement | ca. 60 members
European COST Actions | European ESFRI projects | US RC Networks

Tools for making sense of the big data world are important because… Science is a 'light's better' endeavour, in that research effort is not directed at areas where the work is technically infeasible. Research is directed where real, interpretable results may be obtained.

Thank you http://uk.linkedin.com/in/dkoureas @DimitrisKoureas d.koureas@nhm.ac.uk