MMG: from proof-of-concept to production services at scale


MMG: from proof-of-concept to production services at scale
Lars Ailo Bongo (ELIXIR-NO, WP6)
WP4 F2F, 8-9 February 2017, Stockholm, Sweden

MMG on ELIXIR compute clouds
Proof-of-concept:
  - META-pipe on cPouta √
  - EMG on Embassy cloud √
  - Webinar: ELIXIR Compute Platform Roadmap
TODO:
  - Test META-pipe and EMG at scale
  - Deploy META-pipe and EMG production service on cloud
  - Document best practices
  - Integrate META-pipe and EMG with other ELIXIR platforms
  - Incorporate other MMG pipelines such as BioMaS
Issues:
  - Missing policies: who is paying for resources? Which resources can different users use? …
  - Missing technology: how to do accounting? How to ensure a stable service? …

Outline
META-pipe:
  - User feedback
  - ELIXIR compute TUCs and other components used
  - Need your help here
  - Future plans
Other MMG/WP6 activities
EMG presentation to follow

META-pipe: analysis as a service √
  1. Login
  2. Upload data
  3. Select analysis tool parameters
  4. Execute analysis
  5. Download results

META-pipe: architecture √

META-pipe: front-end technical solutions √
Login
  - Authorization server integrated with ELIXIR AAI
Upload data
  - Incoming! web app library
  - META-pipe storage server
Select analysis parameters
  - META-pipe web app
Execute analysis
  - META-pipe job queue
  - META-pipe execution environment
Download result
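The hand-off between the web app, the job queue, and the execution environment can be pictured with a small in-memory sketch. This is an illustration only, not the META-pipe implementation: the `Job` and `JobQueue` names, their fields, and the status strings are all hypothetical.

```python
import queue
from dataclasses import dataclass


@dataclass
class Job:
    """One analysis request (hypothetical fields, for illustration)."""
    user: str
    dataset_id: str
    parameters: dict
    status: str = "queued"


class JobQueue:
    """FIFO queue between the web app (producer) and the
    execution environment (consumer)."""

    def __init__(self):
        self._pending = queue.Queue()

    def submit(self, job):
        # Called by the web app once the user has selected parameters.
        self._pending.put(job)
        return job

    def next_job(self):
        # Called by the execution environment; returns None when idle.
        try:
            job = self._pending.get_nowait()
        except queue.Empty:
            return None
        job.status = "running"
        return job
```

A production queue would also need persistence, authentication, and retry handling, none of which is shown here.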

META-pipe: front-end policies
Login
  - All ELIXIR users can log in, which gives (user, home institution)
  - Who can pay for the resources?
  - Who is allowed to use tools and resources (academic vs industry)?
Upload data
  - Data size gives computation requirements
  - Small for free? Medium on pre-allocated? Large as special case?
Select analysis parameters / execute analysis
  - Which resource to use? Who decides? Commercial clouds?
  - Scheduling/prioritization of jobs?
  - Response time guarantees?
  - Who is responsible for maintaining and monitoring resources?
Download result
  - Private vs (eventually) public?
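The "small / medium / large" question could be expressed as a simple size-based rule. The sketch below is only one way to phrase such a policy; the slides give no thresholds, so the two limits (1 GB and 100 GB) and all names are assumptions chosen purely for illustration.

```python
from enum import Enum

GB = 1024 ** 3

# Hypothetical thresholds; the slides leave the actual limits open.
SMALL_LIMIT_BYTES = 1 * GB
MEDIUM_LIMIT_BYTES = 100 * GB


class Tier(Enum):
    FREE = "free"
    PRE_ALLOCATED = "pre-allocated"
    SPECIAL_CASE = "special case"


def classify_upload(size_bytes):
    """Map an upload size to a resource tier under the assumed limits."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= SMALL_LIMIT_BYTES:
        return Tier.FREE
    if size_bytes <= MEDIUM_LIMIT_BYTES:
        return Tier.PRE_ALLOCATED
    return Tier.SPECIAL_CASE
```

Who sets the limits, and who pays for each tier, remains the open policy question raised on the slide.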

META-pipe (and EMG): backend layers (√)
  - Pipeline tools & DBs: META-pipe
  - Pipeline specification: Spark program
  - Analysis engine: Spark, NFS
  - Cloud setup: cPouta ansible playbook

META-pipe: cloud execution
Pipeline tools & reference DBs:
  - Mostly 3rd party binaries
  - Hundreds of GB of reference DBs
  - Packaged in META-pipe Jenkins server
  - Not in a container/VM (no benefits for now)
  - TODO: standardize description/provenance data reporting (WP4?)
  - TODO: summarize best practices (WP4 / ?)
Spark program:
  - Regular Spark program + abstractions/interfaces for running 3rd party binaries
  - TODO: better error detection, logging, and handling (WP6)
  - TODO: more secure execution (WP6/WP4)
  - TODO: accounting and payment (WP4)
  - TODO: use our approach for other pipelines? (WP4)
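An abstraction for running 3rd party binaries, with the error detection and logging listed as a TODO, might look roughly like the sketch below. It is not the META-pipe interface: `run_tool`, `ToolError`, the logger name, and the default timeout are hypothetical, and only the Python standard library is used so the wrapper can be tried without a Spark cluster.

```python
import logging
import subprocess
import sys

logger = logging.getLogger("pipeline.tools")  # hypothetical logger name


class ToolError(RuntimeError):
    """Raised when an external tool exits with a non-zero status."""


def run_tool(argv, timeout_s=3600):
    """Run one external binary and return its standard output as text.

    argv is the command as a list, e.g. ["some_tool", "--in", "x.fa"].
    A non-zero exit status is logged and turned into ToolError, so the
    caller (for example a Spark task) fails loudly instead of silently.
    """
    logger.info("running %s", argv)
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout_s
    )
    if result.returncode != 0:
        logger.error(
            "%s failed with status %d: %s",
            argv[0], result.returncode, result.stderr.strip(),
        )
        raise ToolError(f"{argv[0]} exited with status {result.returncode}")
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a real tool: the Python interpreter itself.
    print(run_tool([sys.executable, "-c", "print('ok')"]).strip())
```

In a Spark program, a function like this could be called from inside a map over input partitions, so that each task wraps one tool invocation; that wiring is omitted here.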

META-pipe: cloud execution
Spark, NFS execution environment:
  - Standalone Spark
  - NFS since some tools need a shared file system
  - TODO: optimize execution environments (WP6/WP4)
  - TODO: test scalability (WP6/WP4)
  - ?: integrate META-pipe storage server with ELIXIR storage & transfer
cPouta ansible playbook:
  - Set up Spark and NFS execution environment on cPouta OpenStack
  - Ongoing work: set up execution environment on OpenNebula (CZ)
  - TODO: port to other clouds (WP4?)
  - TODO: provide best practice guidelines (WP4)
  - TODO: long-term maintenance of setup tools (?)

WP6 deliverables
The comprehensive metagenomics standards environment √
  - Paper to be submitted on Friday
  - Provenance of sampling standard
  - Provenance of sequencing standard
  - Provenance of analysis best practices
  - Archiving of analysis discussion
Marine metagenomics portal (MMP) √
  - https://mmp.sfb.uit.no/
  - Marine reference databases (MarRef, MarDB, MarCat)
  - META-pipe used to process data for MarCat

WP6 deliverables
MMG analysis pipelines: August 2018
  - Test META-pipe and MMG at scale
  - Deploy META-pipe and MMG on ELIXIR compute clouds
Evaluation of tools
  - Synthetic benchmark metagenomes
Federated search engine
Training and workshops
  - Metagenomics data analysis, 3-6 April 2017, Helsinki, Finland
  - Metagenomics data analysis, ?, ?, Portugal

BioMaS pipeline on INDIGO-Datacloud
  - BioMaS is a taxonomical classification pipeline (ELIXIR-IT)
  - Provided as an on-demand Galaxy instance
  - Based on INDIGO-Datacloud

Pyttipanna
  - Who is the user of cloud services? Pipeline providers? End-users?
  - Data transfer vs storage vs AAI: 3 services? 1 distributed file storage?
  - EMG cloud proof-of-concept = Plant use case: set up VMs, transfer data, allow user to run analyses

Summary
  - 2 MMG pipelines can be run on ELIXIR clouds
  - Need resources to test at scale
  - Need policies and TUCs (21 and 22) for production use of clouds

TUCs
TUC1/TUC3 (Federated ID/ELIXIR Identity):
  - Give access to service
  - Get information needed for accounting and payment
TUC2 (Other ID):
  - Give access to non-European-academic users
TUC4 (Cloud IaaS Services)
  - Cloud providers that can run execution environment
TUC5 (HTC/HPC cluster)
  - Run batch jobs to produce reference databases
TUC6 (PRACE cluster)
  - We do not need PRACE scale resources

TUCs
TUC7 (Network file storage)
  - Not provided (we set up NFS as part of execution environment)
TUC8 (File transfer)
  - Not needed (file transfer time is low)
TUC9/TUC11 (Infrastructure service directory/registry)
  - Not needed
TUC10 (Credential translation)
TUC11 (Service access management)
  - Needed to maintain user submitted data

TUCs
TUC12/13 (Virtual machine library/container library)
  - We provide analysis as a service
  - VMs/containers useful for visualization tools
TUC14 (Module library)
  - We have META-pipe in a deployment server
TUC15 (Data set replication)
  - Not needed (our datasets are small)
TUC17 (Endorsed…)
  - User submitted data management
TUC18 (Cloud storage)
  - Replace META-pipe storage server

TUCs
TUC19 (PID and metadata registry)
  - Provide in reference databases?
TUC20/23 (Federated cloud/HPC/HTC)
  - Not exposed to our end-users
TUC21 (Operational integration)
  - Service availability monitoring is needed
TUC22 (Resource accounting)
  - Is very much needed
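To make the resource accounting need (TUC22) concrete, the sketch below sums core-hours per home institution from per-job usage records. It is a minimal illustration under assumptions: the `UsageRecord` fields and the choice of core-hours as the unit are hypothetical, and it relies only on the (user, home institution) pair that login is said to provide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One finished job (hypothetical record layout)."""
    user: str
    institution: str
    cores: int
    wall_seconds: float


def core_hours_by_institution(records):
    """Sum cores * wall-clock hours for each home institution."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec.institution] += rec.cores * rec.wall_seconds / 3600.0
    return dict(totals)
```

Totals like these could feed whatever payment or quota policy is eventually agreed; storage and data transfer costs are not covered by this sketch.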