INDIGO: Building a DataCloud Framework to Support Open Science Yin Chen, Fernando Aguilar,

Presentation transcript:

INDIGO: Building a DataCloud Framework to Support Open Science
Yin Chen, Fernando Aguilar, Jesus Marco, Sandro Fiore, Massimiliano Rossi, INDIGO Collaboration Team

Towards a sustainable European PaaS-based cloud solution for e-Science
INDIGO-DataCloud develops an open source data and computing platform targeted at scientific communities, deployable on multiple hardware platforms and provisioned over hybrid (private or public) e-infrastructures.
The platform aims to address a wide range of challenging requirements posed by leading-edge research activities conducted by 11 scientific communities from different research areas:
> Biological & Medical science
> Social science & Humanities
> Environmental and Earth science
> Physics & Astrophysics
26 European partners in 11 European countries
H2020 project, Apr 2015 – Oct 2017, 11M EUR
website:
INDIGO-DataCloud RIA

INDIGO Architecture
Enhanced features in: IaaS, PaaS, SaaS
User-driven design approach
Based on community requirements
Reduce the gaps between ICT & Researchers

INDIGO Communities
Life Sciences: ELIXIR, INSTRUCT/WeNMR, EuroBioImaging
Physical Sciences & Astronomy: CTA, LBT, WLCG
Social Sciences & Humanities: DARIAH, DCH-RP
Environmental Science: LifeWatch, EMSO, ENES

Methodology for Requirement Collection
A template was designed to gather information from communities, based on Case Studies.
A Case Study is an implementation of a research method involving an up-close, in-depth, and detailed examination of a subject of study (the Case), as well as its related contextual conditions.
The focus is on Case Studies that are representative of the research challenge and its complexity, and also of the possibilities that INDIGO-DataCloud solutions offer for it.
A Case Study is (ideally) based on a set of User Stories, i.e. how the researcher describes the steps to solve each part of the problem addressed.
User Stories are the starting point of Use Cases, where they are transformed into a description using software engineering terms (actors, scenario, preconditions, etc.).
Use Cases are useful to capture the Requirements that will be handled by the INDIGO software developed in the JRA work packages, and tracked by the Backlog system of the OpenProject tool.
The template serves as a structured framework with guiding questions of concern to the INDIGO development work packages.
(Highly criticized and refined in several iterations.)
[Diagram: User Stories A, B, C (Case Study 1) and User Stories A', B' (Case Study 2) -> Use Cases -> Requirements (#LW1, #LW2) -> OpenProject Backlog]
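
The chain described on this slide (Case Study, User Stories, Use Cases, Requirements, Backlog) can be pictured as a small data model. The following is a minimal sketch only: the class names, field names and the example stories are assumptions made for illustration, not the INDIGO template or the OpenProject schema. Only the "#LW1" label style comes from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class UserStory:
    label: str   # e.g. "A", as in the slide's diagram
    text: str    # the researcher's own description of one step


@dataclass
class Requirement:
    req_id: str  # e.g. "#LW1"
    description: str


@dataclass
class UseCase:
    # Software-engineering view of one or more user stories.
    name: str
    actors: list[str]
    preconditions: list[str]
    stories: list[UserStory]
    requirements: list[Requirement] = field(default_factory=list)


@dataclass
class CaseStudy:
    title: str
    use_cases: list[UseCase] = field(default_factory=list)

    def backlog(self) -> list[tuple[str, str, list[str]]]:
        """Flatten to (requirement id, use case, story labels) rows."""
        return [
            (req.req_id, uc.name, [s.label for s in uc.stories])
            for uc in self.use_cases
            for req in uc.requirements
        ]


# Hypothetical content, for illustration only.
story_a = UserStory("A", "I upload sensor data from the reservoir.")
story_b = UserStory("B", "I run the model on the uploaded data.")
run_model = UseCase(
    name="Run model",
    actors=["researcher"],
    preconditions=["input data uploaded"],
    stories=[story_a, story_b],
    requirements=[Requirement("#LW1", "Shared storage reachable from compute nodes")],
)
study = CaseStudy("Case Study 1", [run_model])
rows = study.backlog()
```

Each backlog row keeps a pointer back to the user stories it came from, which is the traceability the methodology relies on.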

Methodology for Requirement Analysis
Step 1: Analysis of the questionnaires to identify requirements. Produced a large Excel table with several entries per Case Study (Community, Req#, Req. Descr., Rank (Mandatory/Convenient/Optional), Current, Gaps, Solution…).
Step 2: Identification of common requirements. Produced a single Excel table.
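
The two analysis steps can be sketched in a few lines. This is a hedged illustration, not the procedure the project used (the slides only mention Excel tables): the dictionary keys, the naive text matching, the `min_communities` threshold, the rule of keeping the strongest rank, and the sample rows (including the "#EN3" and "#CT2" identifiers) are all assumptions.

```python
from collections import defaultdict

RANKS = ("Mandatory", "Convenient", "Optional")  # rank values from the slide


def common_requirements(rows, min_communities=2):
    """Group step-1 rows by description; keep those shared across communities."""
    by_descr = defaultdict(list)
    for row in rows:
        key = row["descr"].strip().lower()  # naive matching (assumption)
        by_descr[key].append(row)

    merged = []
    for key, group in by_descr.items():
        communities = sorted({r["community"] for r in group})
        if len(communities) < min_communities:
            continue
        # Keep the strongest rank seen across the communities (assumption).
        rank = min((r["rank"] for r in group), key=RANKS.index)
        merged.append({"descr": key, "communities": communities, "rank": rank})
    return merged


# Hypothetical step-1 rows, not taken from the INDIGO table.
step1 = [
    {"community": "LifeWatch", "req": "#LW1", "descr": "Shared storage", "rank": "Mandatory"},
    {"community": "ENES", "req": "#EN3", "descr": "shared storage ", "rank": "Convenient"},
    {"community": "CTA", "req": "#CT2", "descr": "Tape archive", "rank": "Optional"},
]
step2 = common_requirements(step1)
```

In practice, matching requirements across communities needs human judgment; the string comparison here only stands in for that step.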

INDIGO Community Requirements List

Reduce the Gaps between ICT and Community Researchers
Identify community Champions: share activities inside INDIGO (in particular with ICT people); advanced research activities (postdoc+ level); different technical/scientific background/interest
Understand how INDIGO services will match community requirements: confirmation of the Case Studies analysis; selection of INDIGO components/services
Use a tool for communication, i.e. OpenProject: record Case Studies, User Stories, Requirements; track INDIGO development; give feedback to ICT
Testbeds to "try": realistic resources; where demos can be built
Build an exploitation path for their communities: help to prepare material to "support" their solution in front of colleagues; explore all components (e-infrastructure/resources included!); dissemination material

Case Studies
Case Study P0_1: Monitoring and Modelling Algae Bloom in a Water Reservoir
Case Study P0_2: TRUFA (Transcriptomes User-Friendly Analysis)
Case Study P1: Medical Imaging Biobanks
Case Study P2: Molecular Dynamics Simulations
Case Study P3_1: Astronomical Data Archives
Case Study P3_2: Archive System for the Cherenkov Telescope Array (CTA)
Case Study P4_1: HADDOCK Portal
Case Study P4_2: DisVis
Case Study P4_3: PowerFit
Case Study P5: Climate models intercomparison data analysis
Case Study P6: eCulture Science Gateway
Case Study P7: EGI FedCloud Community Requirements
Case Study P8: ELIXIR-ITA: Galaxy as a Cloud Service
Case Study P9: MOIST (Multidisciplinary Oceanic Information System)
Case Study P10: Data repository platform for DARIAH

Community use cases of using INDIGO Solutions - Modeling
A. The ICT expert configures a VM/container with all the software needed.
B. The ICT expert uploads all the input files to be modelled.
C. N models can be run using the same input (parameter sweep).
D. The TOSCA expert configures the template to deploy the 1-N instances for running the models.
E. The Orchestrator deploys the N Running Instances (RIs), which are monitored.
F. The RIs access the input files directly and write the output files.
G. The BIO expert (final user) can check the running status and access the output directly.
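
Step C, the parameter sweep, can be illustrated with a short sketch that expands a grid of parameter values into N run descriptions sharing one input. The function names, parameter names and paths are hypothetical, and the sketch does not touch TOSCA or the Orchestrator; it only shows how the N instances of steps D to F might be enumerated.

```python
from itertools import product


def parameter_sweep(grid):
    """Yield one parameter dict per combination of the values in `grid`."""
    names = list(grid)
    for values in product(*(grid[n] for n in names)):
        yield dict(zip(names, values))


def plan_instances(input_path, grid):
    """Describe the N runs: same input, one parameter set and output dir each."""
    return [
        {
            "instance": i,
            "input": input_path,
            "params": params,
            "output": f"output/run_{i:03d}",
        }
        for i, params in enumerate(parameter_sweep(grid), start=1)
    ]


# Hypothetical parameters and paths, for illustration only.
grid = {"wind_speed": [1.0, 5.0], "temperature": [15, 20, 25]}
plan = plan_instances("input/reservoir.nc", grid)
```

Here N is the product of the number of values per parameter (2 x 3 = 6 runs), and every run writes to its own output directory so the final user can inspect results per parameter set.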

Community use cases of using INDIGO Solutions - Workflows
A. The TRUFA manager sets up a container/VM for TRUFA Web and the worker nodes (WNs), including all the pipeline software.
B. The TOSCA expert configures the templates to deploy the environment.
C. TRUFA Web is a service that needs to be always running (Kubernetes?).
D. TRUFA Web should be able to communicate with the Orchestrator, which manages the WNs and scales them when needed.
E. TRUFA Web can check the status of the jobs.
F. Both TRUFA Web and the WNs can access a disk volume managed by OneData for input/output.
G. The user accesses TRUFA Web directly; it is the endpoint for managing the actions: upload input, download outputs, configure a pipeline, check the status.
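
The slide says the WNs are scaled "when needed" but does not say how. One possible reading is a simple load-based rule, sketched below. The function name, the jobs-per-worker ratio and the minimum/maximum bounds are all assumptions for illustration; the actual interface between TRUFA Web and the Orchestrator is not described in the slides.

```python
import math


def desired_workers(queued_jobs, running_jobs, jobs_per_worker=4,
                    min_workers=1, max_workers=10):
    """Return how many worker nodes the current job load calls for."""
    if jobs_per_worker < 1:
        raise ValueError("jobs_per_worker must be at least 1")
    # One worker per `jobs_per_worker` jobs, rounded up.
    needed = math.ceil((queued_jobs + running_jobs) / jobs_per_worker)
    # Clamp to the allowed range so the pool never empties or grows unbounded.
    return max(min_workers, min(max_workers, needed))
```

A service in the role of TRUFA Web could evaluate such a rule periodically and pass the result to whatever component actually resizes the worker pool.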

Better Software for Better Science
Save the date - The path towards the INDIGO-DataCloud release
> Internal demo software release – April 2016
> First INDIGO software release – July 2016
> Second INDIGO software release – March 2017
More info: www. twitter.com/indigodatacloud
Subscribe to INDIGO Newsletter:
Follow on: