EUDAT: Towards a European Collaborative Data Infrastructure
  Mark van de Sanden, SARA, The Netherlands
  EGI TF, Lyon, 19 September 2011
  (slides from Damien Carpentier, CSC)
Outline of the talk
  Problem description: current data infrastructure landscape
  EUDAT project
  Expected benefits and challenges of a CDI
The current data infrastructure landscape: challenges and opportunities
  Long history of data management in Europe: several existing data infrastructures dealing with established and growing user communities (e.g., ESO, ESA, EBI, CERN)
  New Research Infrastructures are emerging and are also trying to build data infrastructure solutions to meet their needs (CLARIN, EPOS, ELIXIR, ESS, etc.)
  A large number of projects providing excellent data services (EURO-VO, GENESI-DR, Geo-Seas, HELIO, IMPACT, METAFOR, PESI, SEALS, etc.)
  However, most of these infrastructures and initiatives primarily address the needs of a specific discipline and user community
  Challenges
    Compatibility, interoperability, and cross-disciplinary research
    Data growth in volume and complexity (the so-called “data tsunami”): strong impact on costs, threatening the sustainability of the infrastructure
  Opportunities
    Potential synergies do exist: although disciplines have different ambitions, they have common basic needs and requirements that could be matched with generic pan-European services supporting multiple communities and ensuring greater interoperability.
  Strategy needed at pan-European level
EUDAT: Key facts and objectives
  Initiative funded through FP7 e-Infrastructure Call 9 (WP11): INFRA-2011-1.2.2: Data infrastructure for e-Science (November 2010)
    Call 9 objective: “Establish a persistent and robust service infrastructure for scientific data in Europe that responds to the need of data-intensive Science of 2020”
    Budget: 43 M€
  EUDAT selected for funding (three-year project)
    Official starting date: 1 October 2011
    Biggest budget of the call: 9,3 M€ EC grant
    Total budget: 16,3 M€
  Consortium
    23 partners representing 13 countries
    15 user communities from a wide range of disciplines (Biomed, Earth Science, Climate, SSH, etc.)
  Targets
    EUDAT objective: “To deliver a Collaborative Data Infrastructure (CDI) with the capacity and capability for meeting researchers’ needs in a flexible and sustainable way, across geographical and disciplinary boundaries.”
    The infrastructure must be collaborative
    The infrastructure must be driven by researchers’ needs
    The infrastructure must be sustainable yet flexible
    The infrastructure must be pan-European
    The infrastructure must be multi-disciplinary
Towards a Collaborative Data Infrastructure
  Source: HLEG report, p. 31
  EUDAT will focus on building this generic data infrastructure layer and offer a trusted domain for long-term data preservation, accompanied by related services to store, identify, authenticate and mine these data.
  This needs to be done in close collaboration with the communities
    Core services must match the requirements of the communities
    Community services can also be incorporated into the common data service infrastructure when they are of use to other communities.
The EUDAT Consortium
The EUDAT Communities (by field)
  Biological and Medical Science: VPH, ELIXIR, BBMRI, ECRIN
  Environmental Science: ENES, EPOS, Lifewatch, EMSO, IAGOS-ERI, ICOS
  Social Sciences and Humanities: CLARIN
  Physical Sciences and Engineering: WLCG, ISIS
  Material Science: ESS…
  Energy: EUFORIA…
  EUDAT targets all scientific disciplines (discipline neutral):
    To enable the capture and identification of cross-discipline requirements
    To involve the scientists of all the communities in the shaping of the infrastructure and its services
EUDAT core services
  Core services are the building blocks of EUDAT’s Common Data Infrastructure, mainly located in the bottom layer of data services
  Fundamental Core Services
    Long-term preservation
    Persistent identifier service
    Data access and upload
    Workspaces
    Web execution and workflow services
    Single Sign On (federated AAI)
    Monitoring and accounting services
    Network services
  Extended Core Services (community-supported)
    Joint metadata service
    Joint data mining service
  No need to match the needs of all communities at the same time; addressing a group of communities can be very valuable, too
EUDAT Timeline (figure, 2012 to 2015)
  Phases: User requirements, Service design, Service deployment
  Milestones shown: EUDAT Kick-Off, 1st User Forum, 2nd User Forum, 3rd User Forum, 4th User Forum, First Services available, Cross-Community Services, Full core Services deployed, Sustainability Plan
Expected benefits of a Collaborative Data Infrastructure
  Enabling multi-disciplinary data-intensive research and collaboration
    Development of common services supporting research communities
    Support to existing scientific communities’ infrastructures
    Support to smaller communities through access to sophisticated services
  Inter-disciplinary collaboration and exploitation of synergies between communities
    Communities from different disciplines working together to build services
    Data sharing between disciplines
  Collaboration with other large-scale infrastructures
    European e-Infrastructures: Géant, PRACE, EGI, etc.
    Global initiatives in the US, Japan, Australia, etc.
  Ensuring wide access to and preservation of data in a sustainable way
    A robust generic infrastructure capable of handling the scale and complexity of data that will be generated over the next 10-20 years
    Greater access to existing data and better management of data for the future
    Increased security by managing multiple copies in geographically distant locations
    Put Europe in a competitive position for important data repositories of world-wide relevance
  Economies of scale and cost-efficiency
    Shared resources and work are less costly
Challenges and Opportunities
  Delivering high-level multi-disciplinary data services
    Achieving a high level of interoperability in the context of diversity of data, research disciplines and practices
    Need to strongly involve the different communities in the design and evaluation of services
    EUDAT as a platform to discuss interoperability issues (along with other initiatives, e.g. DAITF)
  Building trust among stakeholders
    Trust between service providers and users, but also between the researchers and disciplines themselves
    Trust in the EUDAT infrastructure, the data deposited and collected, data integrity
  Ensuring the sustainability of the infrastructure
    Providing a framework and a plan to ensure the continuity of services beyond the immediate funding window, through the setting up of a sustainable entity
    Funding and business models
    Partnerships (new communities, industry, etc.) and governance models
Thank You sanden@sara.nl