EUDAT: Towards a European Collaborative Data Infrastructure
Damien Lecarpentier, CSC – IT Center for Science, Finland
ISC’11, Hamburg, 20 June 2011
Outline of the talk: EUDAT concept; EUDAT consortium; EUDAT service approach; expected benefits and challenges of a CDI
EUDAT key facts and objectives
Initiative funded through the FP7 e-Infrastructure Call 9 (WP11): INFRA: Data infrastructure for e-Science (November 2010)
Call 9 objective: ”Establish a persistent and robust service infrastructure for scientific data in Europe that responds to the needs of data-intensive science of 2020”
Budget: 43 M€
EUDAT selected for funding (three-year project): official starting date 1 October 2011; biggest budget of the call, with a 9.3 M€ EC grant; total budget 16.3 M€
Consortium: 23 partners representing 13 countries; 15 user communities from a wide range of disciplines (Biomed, Earth Science, Climate, SSH, etc.)
Targets
EUDAT objective: “To deliver a Collaborative Data Infrastructure (CDI) with the capacity and capability for meeting researchers’ needs in a flexible and sustainable way, across geographical and disciplinary boundaries.”
The infrastructure must be collaborative.
The infrastructure must be driven by researchers’ needs.
The infrastructure must be sustainable yet flexible.
The infrastructure must be pan-European.
The infrastructure must be multi-disciplinary.
The current data infrastructure landscape: challenges and opportunities
Long history of data management in Europe: several existing data infrastructures serve established and growing user communities (e.g., ESO, ESA, EBI, CERN).
New Research Infrastructures are emerging and are also trying to build data infrastructure solutions to meet their needs (CLARIN, EPOS, ELIXIR, ESS, etc.).
A large number of projects provide excellent data services (EURO-VO, GENESI-DR, Geo-Seas, HELIO, IMPACT, METAFOR, PESI, SEALS, etc.).
However, most of these infrastructures and initiatives primarily address the needs of a specific discipline and user community.
Challenges:
Compatibility, interoperability, and cross-disciplinary research
Data growth in volume and complexity (the so-called “data tsunami”), with a strong impact on costs that threatens the sustainability of the infrastructure
Opportunities:
Potential synergies do exist: although disciplines have different ambitions, they share common basic needs and requirements that could be met by generic pan-European services supporting multiple communities and ensuring greater interoperability.
A strategy is needed at the pan-European level.
Towards a Collaborative Data Infrastructure
(Figure: Collaborative Data Infrastructure; source: HLEG report, p. 31)
EUDAT will focus on building this generic data infrastructure layer and will offer a trusted domain for long-term data preservation, accompanied by related services to store, identify, authenticate and mine these data.
This must be done in close collaboration with the communities:
Core services must match the requirements of the communities.
Community services can also be incorporated into the common data service infrastructure when they are of use to other communities.
The EUDAT Consortium
The EUDAT Communities
The EUDAT Communities (by field)
EUDAT targets all scientific disciplines (discipline-neutral):
To capture and identify cross-disciplinary requirements
To involve the scientists of all the communities in shaping the infrastructure and its services
Biological and Medical Science: VPH, ELIXIR, BBMRI, ECRIN
Environmental Science: ENES, EPOS, LifeWatch, EMSO, IAGOS-ERI, ICOS
Social Sciences and Humanities: CLARIN
Physical Sciences and Engineering: WLCG, ISIS
Material Science: ESS…
Energy: EUFORIA…
EUDAT Services Activities – Iterative Design
EUDAT’s Services activity is concerned with identifying the types of data services needed by the European research communities, delivering them through a federated data infrastructure, and supporting their users.
1. Capturing community requirements (WP4)
Services to be deployed must be based on the user communities’ needs.
Strong engagement and collaboration with user communities (EUDAT communities and beyond) to capture requirements
2. Building the services (WP5)
User requirements must be matched with available technologies.
Need to identify:
available technologies and tools to develop the required services (technology appraisal)
gaps and market failures that should be addressed by EUDAT research activities
Services must be designed, built and tested in a pre-production test bed environment and made available to WP4 for evaluation by their users.
3. Deploying the services and operating the federated infrastructure (WP6)
Services must be deployed on the EUDAT infrastructure and made available to users, with interfaces for cross-site, cross-community operation.
Reliability, 24/7 availability and accessibility of the shared services, with operational security, data integrity and compliance with stakeholder requirements and policies
EUDAT core services
Core services are the building blocks of EUDAT’s Common Data Infrastructure and sit mainly in the bottom layer of data services.
Fundamental Core Services:
Long-term preservation
Persistent identifier service
Data access and upload
Workspaces
Web execution and workflow services
Single Sign-On (federated AAI)
Monitoring and accounting services
Network services
Extended Core Services (community-supported):
Joint metadata service
Joint data mining service
There is no need to meet the needs of all communities at the same time; addressing a group of communities can be very valuable, too.
Service Model Approach and Generic Collaboration
Generic Service Model:
Fundamental Core Services meet strongly overlapping service requirements.
Extended Core Services are mainly community-supported; community requirements typically overlap between some disciplines.
Collaboration between Teams:
Fundamental Core Services are operated and supported by an Operations Team that collaborates across the participating centres.
Extended Core Services and other joint multi-disciplinary services must be community-supported; their requirements overlap within a specific subset of disciplines.
EUDAT Timeline
(Figure: from the EUDAT kick-off through the user requirements, service design and service deployment phases, with the 1st, 2nd, 3rd and 4th User Forums and the milestones “First services available”, “Cross-community services”, “Full core services deployed” and “Sustainability plan”)
Expected benefits of a Collaborative Data Infrastructure
Enabling multi-disciplinary, data-intensive research and collaboration:
Development of common services supporting research communities
Support for existing scientific communities’ infrastructures
Support for smaller communities through access to sophisticated services
Inter-disciplinary collaboration and exploitation of synergies between communities:
Communities from different disciplines working together to build services
Data sharing between disciplines
Collaboration with other large-scale infrastructures:
European e-Infrastructures: GÉANT, PRACE, EGI, etc.
Global initiatives in the US, Japan, Australia, etc.
Ensuring wide access to and preservation of data in a sustainable way:
A robust generic infrastructure capable of handling the scale and complexity of data that will be generated in the coming years
Greater access to existing data and better management of data for the future
Increased security by managing multiple copies in geographically distant locations
Putting Europe in a competitive position for important data repositories of world-wide relevance
Economies of scale and cost-efficiency:
Shared resources and work are less costly.
Challenges and Opportunities
Delivering high-level multi-disciplinary data services:
Achieving a high level of interoperability despite the diversity of data, research disciplines and practices
Need to strongly involve the different communities in the design and evaluation of services
EUDAT as a platform to discuss interoperability issues (along with other initiatives, e.g., DAITF)
Building trust among stakeholders:
Trust between service providers and users, but also between the researchers and disciplines themselves
Trust in the EUDAT infrastructure, the data deposited and collected, and data integrity
Ensuring the sustainability of the infrastructure:
Providing a framework and a plan to ensure the continuity of services beyond the immediate funding window by setting up a sustainable entity
Funding and business models
Partnerships (new communities, industry, etc.) and governance models
“Do the difficult things while they are easy and do the great things while they are small. A journey of a thousand miles must begin with a single step.” Lao Tzu The beginning of a long journey…
How to get in touch with EUDAT?
Kimmo Koski, CSC – IT Center for Science, EUDAT Project Coordinator
Peter Wittenburg, Max Planck Institute for Psycholinguistics at Nijmegen (MPI-PL), EUDAT Scientific Coordinator
Damien Lecarpentier, CSC – IT Center for Science, EUDAT Project Manager
BoF session on “e-Infrastructure for science in Europe”, Tuesday 21 June, 14:30–15:15, Hall B
Partners’ booths at ISC: CSC #146, BSC #114, DKRZ #140, EPCC #152
THANK YOU!