partly FAIR, partly Cloudy Barend Mons Leiden University Medical Center Dutch Techcenter for Life Sciences Netherlands e-Science Centre Chair: HLEG-EOSC
https://vimeo.com/162062013
Data loss is real and significant, while data growth is staggering Science1.0 Data loss is real and significant, while data growth is staggering Nature news, 19 December 2013 Computer speed and storage capacity is doubling every 18 months and this rate is steady DNA sequence data is doubling every 6 months over the last 3 years and looks to continue for this decade ‘Oops, that link was the laptop of my PhD student’ 80% of data is lost for reuse
Cloud
: 2 B (kick start phase)
The Data Stewardship Cycle 5% (50 B) The full data cycle and: Where the boudaries of ELIXIR are (ref: our discussion on the ‘left side’ (data design and planning and analysis/modelling I put the third ‘flag’ there to emphasize that TRAINING of respected data experts is a remit of ELIXIR and that the NL node in GOBLET will participate. Argument: the better the (data colelction in) big data experiments of the future is planned and designed, the easier it is for ELIXIR later to deal with the data properly and serve them up for analysis at the end of the cycle. I can also emphaisze that the BBMRI etc. are more on the left and could partner with ELIXIR on the right etc. We can discuss this slide once more before the meeting to align our messages. 6
The Future of FAIR Data Stewardship 5% FAIR
Cloud European Open Science Cloud Introduction (extremely brief)
partly FAIR, partly Cloudy
EOSC: Challenges and Observations Cloud EOSC: Challenges and Observations The majority of the challenges are social rather than technical Not just the size of data, but in particular complex data and analytics across domains. Shortage of data experts globally and in the European Union Archaic system of rewards and funding of science and innovation ‘Valley of death’ between (e-)infrastructure providers and domain specialists. Short funding cycles of core research infrastructures are not fit for purpose Fragmentation between domains causes repetitive and isolated solutions Distributed data sets increasingly do not move (size & privacy reasons) Centralised HPC is insufficient to support distributed meta-analysis and learning. However, the major components for a first generation EOSC are largely ‘there’ But ‘lost in fragmentation’ and spread over 28 Member States. 80% social/ 20% technical
EOSC: Framing Trusted access to services & systems Cloud EOSC: Framing Trusted access to services & systems Re-use of shared data Across disciplinary, social and geographical borders Federated environment, across Member States
EOSC: ‘Internet approach’ Cloud EOSC: ‘Internet approach’ Minimal international guidance and governance Maximum freedom to implement. Globally interoperable and accessible Globally embedded in a ‘Commons’
EOSC: Scope Human expertise Core resources Standards, Best Practices Cloud EOSC: Scope Human expertise Core resources Standards, Best Practices Underpinning technical infrastructures A web of Data and Services
EOSC: Supports Open Science Open Innovation Cloud EOSC: Supports Open Science Open Innovation Systematic and professional data management Long term data stewardship
EOSC: Implementation Recommendations Cloud EOSC: Implementation Recommendations I1: Turn this report into an EC approved document to guide EOSC initiative I2: Develop, Endorse and implement a Rules of Engagement scheme I3: Fund a concentrated effort to locate and develop Data Expertise in Europe I4: Install a highly innovative guided funding scheme for the preparatory phase I5: Make adequate data stewardship mandatory for all research proposals I6: Install an executive team to deal with international coherence of the EOSC I7: Install an executive team to deal with the preparatory phase of the EOSC
Open Science Conference Open Access FAIR data
‘We support appropriate efforts to promote open science and facilitate appropriate access to publicly funded research results on findable, accessible, interoperable and reusable (FAIR)’
Services Data Exchange layer minimal protocols
inter MS (self) coordination RoE MoU RoE Partner 1 2 3 4 services opting as ‘Certified EOSC provider MS-stakeholder Group Cloud Coins H2020 Funder PI