Published by Shannon Wood. Modified over 6 years ago.
1
Design your e-infrastructure. https://indico. egi
Design your e-infrastructure! Use case: CLL ERIC
Break-out group coordinator: Roberta Piscitelli, Strategy and Policy Officer at EGI Foundation
Krakow, 27 September 2016
2
Group members
Roberta Piscitelli (EGI Foundation)
Giuseppe La Rocca (EGI Foundation)
Fotis E. Psomopoulos (CERTH/AUTH)
Kostas Stamatopoulos (ERIC, CERTH)
Paolo Manghi (OpenAIRE, ISTI-CNR)
Jani Heikkinen (CSC)
Xavier Jeannin (RENATER)
Peter Kacsuk (MTA-SZTAKI)
3
First break-out: Background and Users
4
Who will be the user? Can the users be characterised? How many are they?
Scientists (physicians, biologists, (bio)informaticians, engineers, biostatisticians, other) involved in research on CLL and related disorders
ERIC has more than 730 members representing more than 50 countries, mostly in Europe but also Australia, the US and the Gulf region.
ERIC members are also active in other European and international scientific societies and consortia (e.g. European Haematology Association, European LeukemiaNet, American Society of Hematology, Euroclonality-NGS, iwCLL), with large lists of members/partners who can be considered potential users.
5
What value will the envisaged system deliver for them (the whole setup)? What will the system exactly deliver to them?
The value for the users would be one place to use the different services coming from the different e-infrastructures.
Migration of local services (grouped as computational/data/other services) to supporting e-infrastructures
A community platform like gCube that ensures a certain level of interoperability and scalability; different services are integrated:
Persistent sensitive data repository (patient and clinical trial data)
Computational analysis services around the repository
Authentication is not always needed to process the data: some activities require authentication and others do not.
OpenAIRE can integrate with the outcome of ERIC experimental processes to publish data and experiments (services applied to the data at given time and conditions)
6
How should they use the system?
Depending on user expertise:
non-tech-savvy users: access to data and services through a web interface. The end users go to the web interface, query the analysis dataset, and request the original data; in some cases they might request processing of the data. Different representations of the data should be available.
Experienced users will develop services on top of the platform
data producers: command-line access to resources
project managers: orchestration of services towards specific solutions
7
What's the timeline for development, testing and large-scale operation? (Consecutive releases can/should be considered.)
ERIC currently includes several mature services, however:
only local infrastructure when available
No scalability
Q1 2017: Development and setup of the first pilot
Requirements collection (EGI)
Storing/archiving data (EUDAT)
A small computational and storage resource is allocated
Q3 2017: Testing
Q1 2018: Large-scale deployment: a second pilot is tested in the infrastructure to verify interoperability and feasibility for high-throughput data (EGI and EUDAT)
Q2 2018: Scientific results are integrated in OpenAIRE
8
Second break-out: Design and implementation plan
9
What should the first version include? The most basic product prototype imaginable that already brings value to the users (the so-called Minimal Viable Product, MVP).
The first version would be a virtual research environment on a single site that would allow:
Integration of the existing distributed services into a single end-point
Central repository of datasets
A catalogue of services (tools and software?) and data sources
A Virtual Organisation for CLL is created
A small number of computational services hosted in EGI resources
Replicated datasets in EUDAT
One experimental case of dumping the results of one experiment in EUDAT and retrieving the data
Literature and related documents, together with their metadata, provided to OpenAIRE and made discoverable through OpenAIRE services
Indicative KPIs:
Over 5 computational services that offer data analysis processes
Over 5 datasets that are currently across different sites
1 integrated service combining 3 or more of the existing services on extended datasets
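The indicative KPIs above can be read as simple thresholds. The following is a minimal sketch of such a check, not part of the original plan: the `PilotStatus` fields and the `check_kpis` helper are hypothetical names introduced for illustration, and "over 5" is assumed to mean strictly more than five.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PilotStatus:
    """Hypothetical snapshot of the first pilot (all field names are assumptions)."""
    computational_services: int          # services offering data analysis processes
    datasets: int                        # datasets brought in from different sites
    # For each integrated service, how many existing services it combines.
    integrated_service_sizes: List[int] = field(default_factory=list)


def check_kpis(status: PilotStatus) -> Dict[str, bool]:
    """Evaluate the three indicative KPIs; 'over 5' is read as strictly greater than 5."""
    return {
        "over_5_computational_services": status.computational_services > 5,
        "over_5_datasets": status.datasets > 5,
        "integrated_service_of_3_or_more": any(
            size >= 3 for size in status.integrated_service_sizes
        ),
    }


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the result.
    print(check_kpis(PilotStatus(6, 5, [2, 3])))
```

With the made-up numbers in the example, the services and integration KPIs are met while the dataset KPI is not, since exactly five datasets is not "over 5" under the assumed reading.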
10
Which components/services already exist in this architecture?
Several databases available to the community:
ERIC minimum dataset project
ERIC TP53 database
IMGT/CLL-DB
Several tools available through ERIC:
IGHV gene mutational analysis
MRD diagnostics
ARResT/AssignSubsets
Two EU-funded projects already run under ERIC; their results and publications (though not all of them) are on the ERIC website, so they can be linked by OpenAIRE.
11
Which components/services are under development (and by whom)?
An effort towards Data Integration is currently under way, involving the majority of ERIC members.
A centralized computational infrastructure would greatly facilitate this effort
12
Which components/services should still be brought into the system? Which EGI/EUDAT/GEANT/OpenAIRE partner can do it?
A community in OpenAIRE should be created (OpenAIRE)
OpenAIRE Resource Community Dashboard (OpenAIRE system and Zenodo)
EGI FedCloud Resources, to make the user requests more scalable in terms of resources
A resource catalogue (D4Science/OpenAIRE)
B2SHARE and B2SAFE (publish datasets and long-term storage) (EUDAT)
13
Are there gaps in the EGI/EUDAT/GEANT/OpenAIRE service catalogues that should be filled to implement the use case? Which service provider could fill the gap?
A scalable application framework (available in a prototype version, to be further developed in the H2020 project COLA starting in January 2017) to handle different kinds of clouds and hybrid architectures, running on top of EGI FedCloud (MTA-SZTAKI)
Moving data around, moving computation to the data, and creating the pipeline: will the location in which the data is stored always have access to compute as well?
The algorithm can be moved around the different services
Maybe OneData? Need to check how EGI is connected to EUDAT
A second pilot that is testing and solving this with high-throughput data should be implemented (EGI & EUDAT should address this)
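The open question above, whether the place where data is stored also has compute, can be illustrated with a small placement rule. This is only a sketch under stated assumptions: the site names, the `plan_placement` helper and the "first compute site alphabetically" fallback are all hypothetical, and no EGI, EUDAT or OneData API is involved.

```python
from typing import Dict, Optional, Set, Tuple

Placement = Tuple[str, Optional[str]]


def plan_placement(dataset_sites: Dict[str, str],
                   compute_sites: Set[str]) -> Dict[str, Placement]:
    """Decide, per dataset, whether to move the computation or the data.

    If the site storing a dataset also offers compute, run the algorithm
    there. Otherwise stage the data to a compute site (here simply the
    first one in alphabetical order, an arbitrary illustrative choice).
    """
    fallback = sorted(compute_sites)[0] if compute_sites else None
    plan: Dict[str, Placement] = {}
    for dataset, site in dataset_sites.items():
        if site in compute_sites:
            plan[dataset] = ("compute_at_data", site)
        elif fallback is not None:
            plan[dataset] = ("move_data", fallback)
        else:
            plan[dataset] = ("no_compute_available", None)
    return plan


if __name__ == "__main__":
    # Hypothetical sites: only site-a and site-b offer compute.
    datasets = {"dataset-1": "site-a", "dataset-2": "site-c"}
    print(plan_placement(datasets, {"site-a", "site-b"}))
```

A real pilot would also have to weigh data size, network cost and the restrictions on sensitive patient data, none of which this rule models.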
14
Next steps
Set up a dedicated VO for ERIC-CLL (EGI)
Move the first pilot to EGI resources (EGI and ERIC-CLL)
A community in OpenAIRE should be created (OpenAIRE)
Publish datasets and long-term storage (EUDAT)
Describe how and what ERIC-CLL resources will be included in the Resource Catalogue (ERIC-CLL in collaboration with OpenAIRE)
Similarly, for the second pilot.