Distributing META-pipe on ELIXIR compute resources

Presentation transcript:

Distributing META-pipe on ELIXIR compute resources Lars Ailo Bongo (NO)

Outline. META-pipe: biological functionality, resource requirements, user interface demo. META-pipe backend: design choices, user management using ELIXIR AAI, distributed execution on the ELIXIR compute cloud platform. Future plans. (30 min)

META-pipe: marine metagenomics analysis pipeline. QC and assembly, taxonomic classification, and functional assignment. Focus on full-length genes and the marine domain. Outputs GenBank files and Krona charts; more formats are being implemented. Generates data for MarCat.

META-pipe: resource usage. QC and assembly: memory intensive; parallel but not distributed. Taxonomic classification: very low resource usage. Functional analysis: computationally intensive; low memory usage; data-parallel (scales well). Resource usage depends on dataset size. For a big 2x7 GB (paired-end, compressed) dataset: 5-6 hours of QC and assembly on 12 cores, and 24 hours of functional analysis on 20x20 vcores. More machines/cores => reduced execution time.
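The figures on this slide can be turned into a rough core-hour estimate. The sketch below assumes that "20x20 vcores" means 20 machines with 20 vcores each (400 vcores in total) and uses 5.5 hours as the midpoint of the 5-6 hour range. Both readings are assumptions, and the function name is illustrative only.

```python
def core_hours(cores: int, hours: float) -> float:
    """Return total core-hours for a stage that runs on `cores` for `hours`."""
    return cores * hours

# QC and assembly: 12 cores for 5-6 hours (midpoint assumed here).
assembly = core_hours(12, 5.5)

# Functional analysis: assumed 20 machines x 20 vcores, for 24 hours.
functional = core_hours(20 * 20, 24)

print(f"QC and assembly:     {assembly:.0f} core-hours")
print(f"Functional analysis: {functional:.0f} vcore-hours")
print(f"Ratio:               {functional / assembly:.0f}x")
```

Under these assumptions functional analysis needs about two orders of magnitude more compute than assembly, which is consistent with the slide's split between one high-memory machine and many cheap virtual machines.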

META-pipe: resource requirements. Compute resources: a high-memory machine for assembly; lots of cheap compute (virtual) machines for functional analysis. Storage resources: bytes << cycles; network transfer time is not an issue; not human data (not sensitive). Summary: need more than one server/VM; can move data to compute resources.

META-pipe backend architecture

Demo https://www.elixir-europe.org/documents/elixir-webinar-elixir-compute-platform- roapmap-november-2016

Authentication using ELIXIR AAI. For the user: single sign-on using home institution credentials. For the analysis service provider: information from ELIXIR AAI (user ID, name, e-mail; optionally home institution, persistent ID, affiliation); use this information to implement authentication between our servers. Resource monitoring and accounting (and payment?). Integration with ELIXIR data storage and transfer systems? Integration with other ELIXIR services?

File upload. Using web browser: Incoming! plugin to support large (gigabyte) files. But multi-GB files require lots of compute resources! (In Norwegian NeLS: “ssh” between national infrastructure centers.) Stored on a META-pipe storage server: currently one physical machine; object store (minio, S3 compatible); not used during job execution; capacity most important.

Job execution. On our Stallo supercomputer: press the execute button. On cPouta (FI) or CESNET (CZ): an administrator runs a script to set up the backend on cPouta or CESNET (once); specify cPouta or CESNET as a tag for the job (for each job); press the execute button (for each job). In the future? The user selects an ELIXIR-supported compute cloud resource (for each job); the backend automatically sets up the execution environment (for each job).
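Routing jobs by tag, as described on this slide, can be sketched as a simple filter over a job queue. The `Job` record, the tag strings, and `next_job_for` below are hypothetical names chosen for illustration; the slides do not describe the job server's actual data model or API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Job:
    job_id: str
    tag: str              # hypothetical tag values: "stallo", "cpouta", "cesnet"
    state: str = "queued"

def next_job_for(tag: str, queue: List[Job]) -> Optional[Job]:
    """Claim the first queued job carrying `tag` (marking it running), or return None."""
    for job in queue:
        if job.tag == tag and job.state == "queued":
            job.state = "running"
            return job
    return None

queue = [Job("j1", "stallo"), Job("j2", "cpouta"), Job("j3", "cpouta")]
picked = next_job_for("cpouta", queue)
print(picked)
```

A backend running on cPouta would only ever ask for jobs tagged for cPouta, so the central job server needs no knowledge of how each execution site is set up.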

Execution environment layers. Pipeline: META-pipe 2.0 tools, tool dependencies, and reference DBs. Pipeline specification: Spark program (+ our pipeline abstractions). Analysis engine: Spark, NFS. Cloud setup: Ansible, Terraform.

Execution environment nodes. Bastion node: cluster setup/teardown scripts; cache with META-pipe tools and DBs. Master node: NFS server, Spark driver. NFS volume: Java, Scala, Spark; META-pipe tools and dependencies; reference databases; Spark job input files. Worker nodes: Spark workers; local storage (reference DBs, Spark temporary files).

cPouta cloud setup. cPouta is an OpenStack cloud at CSC (FI). We provide a tool for setting up the execution environment. Work done in collaboration with ELIXIR-FI and the ELIXIR compute platform. Create environment (once): create security group and ssh keys, set up network, set up bastion host; download META-pipe tools, dependencies, and databases from our artifact server; create a persistent volume with artifacts (used to initialize the NFS disk on the master).

cPouta cloud setup and META-pipe job execution. Create virtual cluster: cluster provisioning and configuration. Master: install and set up Java, Scala, and Spark (generic); set up NFS, provision and mount the cached volume. Workers: mount NFS, set up Spark worker. Launch a job: get a job tagged with cPouta from the META-pipe job server; copy input files from the META-pipe storage server to master:/tmp/; run the Spark job on the virtual cluster; copy results from master:/tmp/ to the META-pipe storage server.
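The three "launch a job" steps (stage inputs, run Spark, collect results) can be written down as a command plan. This is a sketch under stated assumptions: the slides do not say which transfer tool is used, so `scp` is an assumption, and every host name, path, and jar name below is hypothetical. The plan is only built and printed, never executed.

```python
from typing import List

def job_plan(job_id: str, storage: str, master: str, jar: str) -> List[List[str]]:
    """Build (but do not run) the commands for one job as argv lists."""
    workdir = f"/tmp/{job_id}"  # scratch directory on the master (hypothetical layout)
    return [
        # 1. Copy input files from the storage server to the master.
        ["scp", "-r", f"{storage}:/jobs/{job_id}/input", f"{master}:{workdir}/input"],
        # 2. Submit the Spark job to the standalone master (default port 7077).
        ["spark-submit", "--master", f"spark://{master}:7077", jar,
         f"{workdir}/input", f"{workdir}/output"],
        # 3. Copy results from the master back to the storage server.
        ["scp", "-r", f"{master}:{workdir}/output", f"{storage}:/jobs/{job_id}/output"],
    ]

plan = job_plan("job42", "storage.example", "master.example", "metapipe.jar")
for argv in plan:
    print(" ".join(argv))
```

Keeping the plan as data makes it easy to log, dry-run, or hand to a different executor per cloud.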

cPouta cloud teardown. Virtual cluster teardown: deprovision the cluster and remove temporary files. Environment cleanup (once): remove security group and keys, delete the META-pipe volume.
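One way to make sure the teardown steps run even when a job fails is to wrap the virtual cluster in a context manager. This is an illustrative pattern, not the project's actual tooling: `virtual_cluster` and the logged step names are hypothetical, and real provisioning calls are replaced by log entries.

```python
from contextlib import contextmanager

log = []

@contextmanager
def virtual_cluster(name: str):
    """Provision a cluster on entry and always tear it down on exit."""
    log.append(f"provision {name}")
    try:
        yield name
    finally:
        # Runs on success and on failure alike.
        log.append(f"deprovision {name}")
        log.append(f"remove temporary files for {name}")

try:
    with virtual_cluster("demo") as cluster:
        log.append(f"run job on {cluster}")
        raise RuntimeError("simulated job failure")
except RuntimeError:
    pass

print("\n".join(log))
```

The one-time environment cleanup (security group, keys, volume) would sit outside this per-job scope, since it is shared across virtual clusters.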

ELIXIR compute cloud setup. ELIXIR-CZ has created Terraform configurations for setting up META-pipe on ELIXIR compute clouds (OCCI endpoints), based on our OpenStack cluster setup tool. Create environment and set up master and slaves: provision hosts, install backend, and install META-pipe tools (as for cPouta). Launch job (as for cPouta). Cleanup.

Backend design choices. Scalable distributed execution of jobs: one job is distributed over many machines in a (virtual) cluster; many virtual clusters may run at the same time. Centralized servers reduce complexity: lightweight and portable execution managers. Layered architecture: reuse, optimize separately. Spark-based backend: “cloud standard” with an active software ecosystem.
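The data-parallel idea behind "one job is distributed over many machines" can be illustrated without Spark: split the input into partitions, process each partition independently, and merge the results. The sketch below uses a thread pool on one machine as a stand-in for a cluster, and `annotate` is a hypothetical placeholder for the real functional analysis tools.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

def chunk(items: List[str], n_chunks: int) -> List[List[str]]:
    """Split `items` into at most `n_chunks` non-empty round-robin partitions."""
    parts = [items[i::n_chunks] for i in range(n_chunks)]
    return [p for p in parts if p]

def annotate(seqs: List[str]) -> List[Tuple[str, int]]:
    """Placeholder per-partition work: pair each sequence with its length."""
    return [(s, len(s)) for s in seqs]

sequences = ["ATGC", "ATGCGT", "AT", "ATGCGTAA", "A"]

# Each partition is processed independently; no partition needs another's data.
with ThreadPoolExecutor(max_workers=3) as pool:
    parts = list(pool.map(annotate, chunk(sequences, 3)))

# Merge the per-partition results into one sorted list.
results = sorted(pair for part in parts for pair in part)
print(results)
```

Because partitions share nothing, adding workers shortens the run, which is the property that lets functional analysis scale with more machines.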

Future work. Technical: elastic resource allocation (at scale); assembly + functional analysis as a single job; reliable resource allocation for “bring your own cloud”; automatic failure handling; improved security. Administrative: off-load monitoring and management; user support on distributed resources; accounting.

Summary. META-pipe: compute-intensive workload; distributed backend; layered (reusable) architecture; ELIXIR AAI integration; ELIXIR compute cloud ready.

Acknowledgments. META-pipe team: Nils P. Willassen, Lars Ailo Bongo, Erik Hjerde, Espen M. Robertsen, Inge Alexander Raknes, Aleksandr Agafonov, Terje Klemetsen, Giacomo Tartari … ELIXIR-NO: NeLS. ELIXIR-FI and ELIXIR-CZ: AAI, cloud setup. EXCELERATE WP6.