Design your e-infrastructure!

https://indico.egi.eu/indico/event/3025/
Use case: CLL ERIC
Break out group coordinator: Roberta Piscitelli, Strategy and Policy Officer at EGI Foundation
Krakow, 27 September 2016

Group members
 Roberta Piscitelli (EGI Foundation)
 Giuseppe La Rocca (EGI Foundation)
 Fotis E. Psomopoulos (CERTH/AUTH)
 Kostas Stamatopoulos (ERIC, CERTH)
 Paolo Manghi (OpenAIRE, ISTI-CNR)
 Jani Heikkinen (CSC)
 Xavier Jeannin (RENATER)
 Peter Kacsuk (MTA-SZTAKI)

First break-out: Background and Users

Who will be the user? Can the users be characterised? How many are they?
 Scientists (physicians, biologists, (bio)informaticians, engineers, biostatisticians, others) involved in research on CLL and related disorders.
 ERIC has more than 730 members representing more than 50 countries, mostly in Europe but also Australia, the US and the Gulf region.
 ERIC members are also active in other European and international scientific societies and consortia (e.g. European Haematology Association, European LeukemiaNet, American Society of Hematology, Euroclonality-NGS, iwCLL), with large lists of members/partners who can be considered potential users.


What value will the envisaged system deliver for them (the whole setup)? What will the system exactly deliver to them?
 The value for the users would be a single place to use the different services coming from the different e-infrastructures.
 Migration of local services (grouped as computational, data and other services) to the supporting e-infrastructures.
 A community platform like gCube that ensures a certain level of interoperability and scalability and integrates different services:
  Persistent sensitive data repository (patient and clinical trial data)
  Computational analysis services around the repository
 Authentication is not always needed to process the data: some activities require authentication and others do not.
 OpenAIRE can integrate with the outcome of ERIC experimental processes to publish data and experiments (services applied to the data at a given time and under given conditions).

How should they use the system? Depending on user expertise:
 Non-tech-savvy users: access to data and services through a web interface. End users go to the web interface, query the analysis dataset and request the original data; in some cases they might request processing of the data. Different representations of the data should be available.
 Experienced users: develop services on top of the platform
 Data producers: command-line access to resources
 Project managers: orchestration of services towards specific solutions

What's the timeline for development, testing and large-scale operation? (Consecutive releases can/should be considered.)
 ERIC currently includes several mature services; however, they
  run only on local infrastructure, when available
  do not scale
 Q1 2017: Development and setup of the first pilot
  Requirements collection (EGI)
  Storing/archiving data (EUDAT)
  A small computational and storage resource is allocated
 Q3 2017: Testing
 Q1 2018: Large-scale deployment: a second pilot is tested in the infrastructure to verify interoperability and feasibility for high-throughput data (EGI and EUDAT)
 Q2 2018: Scientific results are integrated in OpenAIRE

Second break-out: Design and implementation plan

What should the first version include? The most basic product prototype imaginable that already brings value to the users (the so-called Minimum Viable Product, MVP).
 The first version would be a virtual research environment on a single site that would allow:
  Integration of the existing distributed services into a single end-point
  A central repository of datasets
  A catalogue of services (tools and software?) and data sources
  A Virtual Organisation for CLL is created
  A small number of computational services hosted on EGI resources
  Replicated datasets in EUDAT
  One experimental case of dumping the results of one experiment in EUDAT and retrieving the data
  Related literature and documents are provided to OpenAIRE together with the metadata connected to them, and are made discoverable through OpenAIRE services
 Indicative KPIs:
  Over 5 computational services that offer data analysis processes
  Over 5 datasets that are currently spread across different sites
  1 integrated service combining 3 or more of the existing services on extended datasets
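
The indicative KPIs above can be read as a simple acceptance check on the first version. The following is a minimal sketch in Python, assuming a hypothetical in-memory catalogue: the `Catalogue` and `IntegratedService` classes, their field names and the sample entries are illustrative only and are not part of any ERIC, EGI, EUDAT or OpenAIRE interface. "Over 5" is interpreted here as strictly more than five.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IntegratedService:
    """A service built by combining existing services (hypothetical model)."""
    name: str
    combines: list[str] = field(default_factory=list)


@dataclass
class Catalogue:
    """Hypothetical snapshot of what the first version offers."""
    computational_services: list[str] = field(default_factory=list)
    datasets: list[str] = field(default_factory=list)
    integrated_services: list[IntegratedService] = field(default_factory=list)


def check_kpis(cat: Catalogue) -> dict[str, bool]:
    """Evaluate the three indicative KPIs against a catalogue snapshot."""
    return {
        # Assumption: "over 5" means strictly more than five.
        "over_5_computational_services": len(cat.computational_services) > 5,
        "over_5_datasets": len(cat.datasets) > 5,
        # At least one integrated service combining 3 or more distinct services.
        "integrated_service_of_3_or_more": any(
            len(set(s.combines)) >= 3 for s in cat.integrated_services
        ),
    }


# Illustrative data only: six services, five datasets, one combined pipeline.
example = Catalogue(
    computational_services=[f"service-{i}" for i in range(6)],
    datasets=[f"dataset-{i}" for i in range(5)],
    integrated_services=[
        IntegratedService("pipeline", ["service-0", "service-1", "service-2"])
    ],
)
kpi_report = check_kpis(example)
```

With this sample data the dataset KPI is not met, because five datasets is not "over 5"; the other two checks pass.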

Which components/services already exist in this architecture?
 Several databases available to the community:
  ERIC minimum dataset project
  ERIC TP53 database
  IMGT/CLL-DB
 Several tools available through ERIC (http://www.ericll.org/pages/services):
  IGHV gene mutational analysis
  MRD diagnostics
  ARResT/AssignSubsets
 Two EU-funded projects are already running under ERIC; most, though not all, of their results and publications are on the ERIC website, so they can be linked by OpenAIRE.

Which components/services are under development (and by whom)?
 An effort towards data integration is currently under way, involving the majority of ERIC members.
 A centralized computational infrastructure would greatly facilitate this effort.


Which components/services should still be brought into the system? Which EGI/EUDAT/GEANT/OpenAIRE partner can do it?
 A community in OpenAIRE should be created (OpenAIRE)
 OpenAIRE Resource Community Dashboard (OpenAIRE system and Zenodo)
 EGI FedCloud resources, to make the user requests more scalable in terms of resources
 A resource catalogue (D4Science/OpenAIRE)
 B2SHARE and B2SAFE (publish datasets and long-term storage) (EUDAT)

Are there gaps in the EGI/EUDAT/GEANT/OpenAIRE service catalogues that should be filled to implement the use case? Which service provider could fill the gap?
 A scalable application framework to handle different kinds of clouds and hybrid architectures, running on top of EGI FedCloud (MTA-SZTAKI). It is available as a prototype and will be further developed in the H2020 project COLA, starting in January 2017.
 Moving data around, moving computation to the data, and creating the pipeline: wherever the data is stored at a given time, will that location also have access to compute?
 The algorithm can be moved around the different services.
 Maybe OneData? Need to check how EGI is connected to EUDAT.
 A second pilot that tests and solves this with high-throughput data should be implemented (EGI and EUDAT should address this).
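
The data-locality question above (does the site holding the data also offer compute?) can be illustrated with a small placement rule. This is only a sketch under stated assumptions: the `Site` and `Placement` records, the site and dataset names, and the rule itself (prefer running where the data already is, otherwise stage the data to a site with compute) are hypothetical and do not describe how EGI, EUDAT or OneData actually schedule work.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Site:
    """Hypothetical description of a site in the federation."""
    name: str
    has_compute: bool
    datasets: frozenset[str]


@dataclass(frozen=True)
class Placement:
    """Where to run, and where to copy data from (None if already local)."""
    run_at: str
    stage_from: Optional[str]


def place(dataset: str, sites: list[Site]) -> Optional[Placement]:
    """Decide where an analysis of `dataset` should run."""
    holders = [s for s in sites if dataset in s.datasets]
    if not holders:
        return None  # nobody holds the dataset
    # Case 1: move the computation to the data.
    for site in holders:
        if site.has_compute:
            return Placement(run_at=site.name, stage_from=None)
    # Case 2: move the data to the computation.
    for site in sites:
        if site.has_compute:
            return Placement(run_at=site.name, stage_from=holders[0].name)
    return None  # no compute available anywhere


# Illustrative sites only: one storage-only site and one compute-only site.
sites = [
    Site("storage-site", has_compute=False, datasets=frozenset({"cll-dataset"})),
    Site("cloud-site", has_compute=True, datasets=frozenset()),
]
decision = place("cll-dataset", sites)
```

In this example the data sits at a site without compute, so the rule stages it from `storage-site` and runs at `cloud-site`. If the storage site also offered compute, no transfer would be needed, which is the situation the second pilot is meant to verify for high-throughput data.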

Next steps
 Set up a dedicated VO for ERIC-CLL (EGI)
 Move the first pilot to EGI resources (EGI and ERIC-CLL)
 A community in OpenAIRE should be created (OpenAIRE)
 Publish datasets and long-term storage (EUDAT)
 Describe how and what ERIC-CLL resources will be included in the Resource Catalogue (ERIC-CLL in collaboration with OpenAIRE)
 Similarly, for the second pilot.