1
Facilitating biodiversity science through
Virtual Research Environments
Challenges and opportunities for the Scratchpads platform
Dr Dimitris Koureas, Natural History Museum London; Research Data Alliance; Biodiversity Information Standards (TDWG)
Firstly, please accept my apologies for not being able to attend this workshop in person, owing to prior teaching engagements. In the next 15 minutes I aim to present the context in which VREs have been developed and the challenges they address, especially in the biodiversity domain, using Scratchpads as the example.
2
The problem: Capturing and integrating biodiversity data
How do we join up these activities? How do we use this as a tool?
Species conservation & protected areas; Impacts of human development; Biodiversity & human health; Impacts of climate change; Food, farming & biofuels; Invasive alien species
What infrastructures do we need? (technologies, tools, standards…)
What processes do we need? (Modelling, workflows…)
What data do we need? (Genes, localities…)
Studies of biodiversity have many components; broadly, they map to one or more of the six circles on this slide. Topics like Applied Ecology, Conservation Science, Genomics and Evolutionary Biology are at the heart of what we do. Answers to urgent global societal challenges such as biodiversity loss, climate change, the sustainability of ecosystem services and invasive alien species can only be provided if we link these topics up and use the products of these activities as a resource to address issues relevant to science and society. This means our job is, in part, to identify the kinds of infrastructures we need to achieve this (for example, where does cloud computing or High Performance Computing fit on this diagram?). We need to ask ourselves what kinds of processes are needed to achieve this linkage (what kinds of workflows are required, what models do we need?). And of course we need to identify and target the data we need to address these questions (what genotypic, phenotypic, environmental and temporal data are required to integrate these areas?). So my talk today is really about the kinds of work we need to do to achieve integration between these topics, and how we might develop these within new consortia as part of the Horizon 2020 funding framework.
3
mobilising data at all scales
Challenge 1: mobilising data at all scales
If I were giving this talk, say, five years ago, I'd have said that the major barrier to delivering this integration was a lack of computing power or storage, or perhaps getting the right algorithms together. These used to be the limiting factors when we thought about infrastructure integration. Now the limiting factors are different and, in my view, they are three-fold. Firstly, the major challenge is finding the data. Within our community we are surrounded by a sea of legacy data and many new forms of data. There are major digitisation efforts going on.
4
linking & aggregating data at different scales
Challenge 2: linking & aggregating data at different scales
Communities c.50k (e.g. Scratchpads)
National Efforts c.5M (e.g. NHM Data Portal)
Global Efforts c.500M (e.g. GBIF Data Portal)
Linking that data. Data is created and
5
Models to predict how biodiversity responds to human pressures
Challenge 3: Synthesising data, e.g. modelling human pressures on biodiversity
Management Practices; Ecosystems; Agro-systems
Land-use change; Pollution; Invasive species; Infrastructure
2M records, 19k sites, 34k spp. Small aggregated datasets. Species richness in different ecosystems
Making use of the data: PREDICTS (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems) is a collaborative project that uses a meta-analytic approach to investigate how local biodiversity typically responds to human pressures such as land-use change, pollution, invasive species and infrastructure, and ultimately to improve our ability to predict future biodiversity changes. The PREDICTS project is collecting data from scientists worldwide in order to produce a global database of terrestrial species' responses to human pressures. Thanks to generous contributions from researchers, and a great deal of hard work by students and staff at the Natural History Museum and UNEP-WCMC, the project now has over 2 million biodiversity records from over 19,000 sites, covering more than 34,000 species. The project looks at species richness in a range of natural, agricultural and land management settings, and then extrapolates these data forward to predict species richness under predicted changes in patterns of land use.
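As a minimal sketch of the first step described in these notes (comparing species richness across land-use settings), the Python below counts distinct species per site and averages the counts per land-use class. The records, site names and land-use labels are hypothetical, and this simple average is only an illustration, not the PREDICTS project's actual statistical method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (site, land_use, species). Illustrative values only.
records = [
    ("site_A", "primary vegetation", "sp1"),
    ("site_A", "primary vegetation", "sp2"),
    ("site_A", "primary vegetation", "sp3"),
    ("site_B", "cropland", "sp1"),
    ("site_B", "cropland", "sp1"),  # repeat observation, counted once
    ("site_C", "cropland", "sp2"),
    ("site_C", "cropland", "sp4"),
]


def richness_by_land_use(records):
    """Return the mean number of distinct species per site for each land use."""
    species_at_site = defaultdict(set)
    land_use_of_site = {}
    for site, land_use, species in records:
        species_at_site[site].add(species)
        land_use_of_site[site] = land_use

    # Group the per-site richness values by the site's land-use class.
    per_land_use = defaultdict(list)
    for site, species in species_at_site.items():
        per_land_use[land_use_of_site[site]].append(len(species))

    return {land_use: mean(counts) for land_use, counts in per_land_use.items()}


summary = richness_by_land_use(records)
```

A real analysis would also need to account for differences in sampling effort and study design between sites, which this sketch ignores.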
6
Reaching the long term vision is predicated on
a BIG change in the way we (researchers) work
Data driven science; Open science; Efficient infrastructures
To actually be able to tackle these global challenges in biodiversity science and fulfil the community-inspired vision of modelling the biosphere, researchers, and scientists in general, need to significantly change the way they do science. Each step of the scientific lifecycle is subject to huge changes. This is a paradigm shift in science comparable in scale and impact only to the scientific revolution of the Enlightenment period.
7
90% of all science data generated in the last 3 years!
Data is everywhere and is produced at an ever increasing rate
Data are everywhere, and they are being produced at unprecedented rates. It is only when we realise their volume that we can appreciate the potential impact and opportunities, as well as the size of the challenges that lie ahead! 90% of all science data generated in the last 3 years!
8
Big Data in Taxonomy and Systematics
Even within traditionally isolated research domains like alpha-taxonomy there is a wave of new data that supports the publication of c new taxa on a yearly basis! BUT producing the data doesn't actually mean that we can effectively re-use, preserve or aggregate them... (next slide)
c new taxa described every year
9
An informatician's view of biodiversity
80% dark (or grey) data: investigator-focused 'small data', locally generated 'invisible data', 'incidental data'
20% published and discoverable data
Dark data are more important, mainly due to their volume¹
Dark data lost within 20 years
In fact, an estimated 80% of all the data produced in science are eventually lost after two decades! Most of the dark data are produced by…. forming the main corpus of what we call long-tail data!
¹ Heidorn PB. Library Trends 57:
10
Socio-cultural & Technological challenges
To fully embark on the new data-driven scientific era
Socio-cultural: Shift in the modus operandi of doing science
Technical: Mobilisation, standardisation and accessibility
11
Biodiversity informatics landscape
Key problems Landscape is complex, fragmented & hard to navigate Many audiences (policy makers, scientists, amateurs, citizen scientists) Many scales (global solutions to local problems) Figure adapted from Peterson et al, Syst. & Biodiv. 2010 doi: /
12
The role of Virtual Research Environments
VREs sit on top of e-infrastructures
They abstract from available services
Thematic gateways to data and cyberinfrastructures, collaboration platforms, capacity building, interdisciplinary research, cross-border collaboration
13
Biodiversity data online
Virtual Research Environment Enter – Structure – Curate – Link – Share – Publish Biodiversity data online 8 years of continuing development | 3 major Grants | Industry leading platform
14
Scratchpads 650 Communities 3.1 million visitors 150,000 taxa
6,500 active users
The Scratchpads platform has been developed over the last 7 years under this framework: to provide researchers with the necessary tools to make taxonomy digital, open and linked, and to facilitate the development of virtual research environments.
15
A Scratchpad is a collaborative platform, a gateway to big data
Harvest; Feed to; In-house data; External data & services
Open Biodiversity standards and services (e.g. TDWG: DwC)
Scratchpads are fed by your data. Scratchpads help you structure your data in a way that makes them both human and machine readable. They allow you to contribute to global biodiversity databases and also aggregate all information related to your data from external resources.
16
User and stakeholder engagement
Data preservation & citability Service longevity
17
User buy-in Incentives for mobilising long-tail research
Share your work and take credit for it Publication of data to peer-reviewed open access journals Biodiversity Data Journal – Pensoft GigaScience - BMC, Scientific Data – NPG & F1000 Research Pensoft Writing Tool XML PWT
18
User and stakeholder buy-in
Incentives for mobilising long-tail data Confidence Commitment Longevity Agility Adaptability User monitoring Marketing Visibility Intuitive interface
19
Data structure, annotation and storage
Adhere to ratified community standards DwC (DwC-A) Audubon core Phytogeographical areas Allocation of persistent identifiers to data objects - PURL already in place - Deposition in open repositories
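To make the two points on this slide concrete, here is a minimal Python sketch of occurrence records that use Darwin Core term names as column headers and carry a persistent identifier in `occurrenceID`. The term names come from the Darwin Core standard; the resolver base URL, the specimen values and the helper functions are hypothetical, and this is not how Scratchpads itself implements export.

```python
import csv
import io

# Darwin Core term names; the split into core and optional is this sketch's choice.
CORE_TERMS = ["occurrenceID", "basisOfRecord", "scientificName"]
OPTIONAL_TERMS = ["eventDate", "decimalLatitude", "decimalLongitude", "countryCode"]

# Hypothetical resolver base, not a real Scratchpads endpoint.
PURL_BASE = "https://purl.example.org/specimen/"


def make_record(local_id, scientific_name, **extra):
    """Build one occurrence record keyed by Darwin Core terms."""
    unknown = set(extra) - set(OPTIONAL_TERMS)
    if unknown:
        raise ValueError(f"not in this sketch's term list: {sorted(unknown)}")
    record = {
        "occurrenceID": PURL_BASE + str(local_id),  # persistent identifier
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": scientific_name,
    }
    record.update(extra)
    return record


def to_csv(records):
    """Serialise records to CSV text with Darwin Core terms as headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=CORE_TERMS + OPTIONAL_TERMS, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # missing optional terms are written as empty cells
    return buffer.getvalue()


records = [
    make_record(1, "Quercus robur L.", eventDate="2014-05-12", countryCode="GB"),
    make_record(2, "Fagus sylvatica L.", decimalLatitude=51.4967, decimalLongitude=-0.1764),
]
dwc_csv = to_csv(records)
```

Because every column is a shared term and every row has a resolvable identifier, a file like this can be read by any aggregator that understands the standard. A full Darwin Core Archive (DwC-A) would additionally bundle a metadata descriptor, which is omitted here.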
20
Data structure, annotation and storage
Effective implementation of Knowledge Organisation Systems Biodiversity communities Vocabularies and ontologies The domain is lagging in achieving the optimum use of controlled vocabularies and ontologies
21
Longevity of services is key
Need to look beyond the fragile model of recursive research funding Shift in the way we think of e-infrastructures and information resources Stable/rigid system Dynamic/open process Outsource to the end user community We need to set up the environment that will enable the community contribution
22
Infrastructure maintenance
Community based sustainability model Infrastructure maintenance Technical maintenance User support Open source & modular Crowdsourcing support activities Maximising support efficiency Three basic pillars for community support
24
Key actions to increase interoperability, efficiency and uptake
Minimise infrastructure redundancy Harmonise user experiences Open access and open source Learn from experience across domains
25
Leverage effort and data impact
mobilisation & generation Data curation Data publishing Data analysis Seamless virtual research environments that incentivise mobilisation of long tail research
26
Common issues - different approaches
A highly dynamic but fragmented landscape
27
Efficient Networking and collaboration platforms
Biodiversity Data Integration IG The single largest organisation on research data Crossdomain | Bottom-up | Multilateral agreement ca.60 members European COST Actions European ESFRI projects US RC Networks
28
Science is a ‘light’s better’ endeavour in that research effort is
Tools for making sense of the big data world are important because… Science is a ‘light’s better’ endeavour in that research effort is not directed at areas where the work is technically infeasible. Research is directed where real, interpretable results may be obtained.
29
Thank you http://uk.linkedin.com/in/dkoureas @DimitrisKoureas