1
MMG: from proof-of-concept to production services at scale
Lars Ailo Bongo (ELIXIR-NO, WP6) WP4 F2F, 8-9 February 2017, Stockholm, Sweden
2
MMG on ELIXIR compute clouds
Proof-of-concept:
- META-pipe on cPouta √
- EMG on Embassy cloud √
- Webinar: ELIXIR Compute Platform Roadmap

TODO:
- Test META-pipe and EMG at scale
- Deploy META-pipe and EMG production services on cloud
- Document best practices
- Integrate META-pipe and EMG with other ELIXIR platforms
- Incorporate other MMG pipelines such as BioMaS

Issues:
- Missing policies: who is paying for resources? Which resources can different users use? …
- Missing technology: how to do accounting? How to ensure a stable service? …
3
Outline
- META-pipe:
  - User feedback
  - ELIXIR compute TUCs and other components used
  - Need your help here
  - Future plans
- Other MMG/WP6 activities
- EMG presentation to follow
4
META-pipe: analysis as a service √
- Login
- Upload data
- Select analysis tool parameters
- Execute analysis
- Download results
5
META-pipe: architecture √
6
META-pipe: front-end technical solutions √
- Login: authorization server integrated with ELIXIR AAI
- Upload data: Incoming! web app library, META-pipe storage server
- Select analysis parameters: META-pipe web app
- Execute analysis: META-pipe job queue, META-pipe execution environment
- Download result
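The META-pipe job queue named above can be sketched as a minimal state machine: jobs are submitted by the web app, pulled in FIFO order by the execution environment, and finished with a success or failure state. This is an illustrative Python sketch; the class and state names are assumptions, not META-pipe's actual implementation.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


class JobQueue:
    """FIFO queue of analysis jobs, each tracked through a simple state machine."""

    def __init__(self):
        self._jobs = {}      # job_id -> JobState
        self._pending = []   # FIFO order of queued job ids
        self._next_id = 0

    def submit(self, params):
        """Accept a job (e.g. the selected analysis parameters) and queue it."""
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = JobState.QUEUED
        self._pending.append(job_id)
        return job_id

    def take_next(self):
        """Execution environment pulls the oldest queued job and marks it RUNNING."""
        job_id = self._pending.pop(0)
        self._jobs[job_id] = JobState.RUNNING
        return job_id

    def finish(self, job_id, ok=True):
        """Record the terminal state of a job."""
        self._jobs[job_id] = JobState.DONE if ok else JobState.FAILED

    def state(self, job_id):
        return self._jobs[job_id]
```

A real queue would also persist jobs and report progress back to the web app; the sketch only shows the state transitions.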
7
META-pipe: front-end policies
- Login: all ELIXIR users can log in; login gives (user, home institution). Who can pay for the resources? Who is allowed to use tools and resources (academic vs industry)?
- Upload data: data size gives the computation requirements. Small for free? Medium on pre-allocated resources? Large as a special case?
- Select analysis parameters / execute analysis: which resource to use? Who decides? Commercial clouds? Scheduling/prioritization of jobs? Response-time guarantees? Who is responsible for maintaining and monitoring resources?
- Download result: private vs (eventually) public?
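The "small for free / medium on pre-allocated / large as a special case" question above amounts to a tiering policy on upload size. A minimal sketch of such a policy function; the thresholds are invented placeholders, not ELIXIR policy:

```python
def resource_tier(upload_bytes,
                  small_limit=1 * 10**9,      # placeholder: 1 GB
                  medium_limit=100 * 10**9):  # placeholder: 100 GB
    """Map an upload size to a resource tier.

    Tiers mirror the policy question on the slide:
    small jobs free, medium jobs on pre-allocated resources,
    large jobs handled as a special case.
    """
    if upload_bytes <= small_limit:
        return "small-free"
    if upload_bytes <= medium_limit:
        return "medium-preallocated"
    return "large-special-case"
```

Deciding the actual limits, and who pays for each tier, is exactly the missing policy the slide points at.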
8
META-pipe (and EMG): backend layers (√)
- Pipeline tools & DBs: META-pipe
- Pipeline specification: Spark program
- Analysis engine: Spark, NFS
- Cloud setup: cPouta ansible playbook
9
META-pipe: cloud execution
Pipeline tools & reference DBs:
- Mostly 3rd-party binaries
- Hundreds of GB of reference DBs
- Packaged in META-pipe Jenkins server
- Not in a container/VM (no benefits for now)
- TODO: standardize description/provenance data reporting (WP4?)
- TODO: summarize best practices (WP4/?)

Spark program:
- Regular Spark program + abstractions/interfaces for running 3rd-party binaries
- TODO: better error detection, logging, and handling (WP6)
- TODO: more secure execution (WP6/WP4)
- TODO: accounting and payment (WP4)
- TODO: use our approach for other pipelines? (WP4)
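An abstraction for running 3rd-party binaries from a Spark program typically streams each partition of records through the tool's stdin/stdout, as Spark's `pipe()` transformation does. A plain-Python sketch of that pattern, independent of Spark itself (the function name and the example tool are illustrative, not META-pipe's actual interface):

```python
import subprocess


def pipe_partition(records, argv):
    """Stream one partition of text records through an external tool.

    Records are written to the tool's stdin, one per line, and the
    tool's stdout lines become the transformed partition -- the same
    contract as Spark's pipe()-style abstraction for 3rd-party binaries.
    """
    proc = subprocess.run(
        argv,
        input="\n".join(records) + "\n",
        capture_output=True,
        text=True,
        check=True,  # surface a non-zero exit code as an error
    )
    return proc.stdout.splitlines()
```

Wrapping the subprocess call like this is also where the TODOs above would hook in: error detection (exit codes, stderr logging) and sandboxing for more secure execution.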
10
META-pipe: cloud execution
Spark, NFS execution environment:
- Standalone Spark
- NFS, since some tools need a shared file system
- TODO: optimize execution environments (WP6/WP4)
- TODO: test scalability (WP6/WP4)
- ?: integrate META-pipe storage server with ELIXIR storage & transfer

cPouta ansible playbook:
- Sets up the Spark and NFS execution environment on cPouta OpenStack
- Ongoing work: set up the execution environment on OpenNebula (CZ)
- TODO: port to other clouds (WP4?)
- TODO: provide best-practice guidelines (WP4)
- TODO: long-term maintenance of setup tools (?)
11
WP6 deliverables

The comprehensive metagenomics standards environment √
- Paper to be submitted on Friday
- Provenance of sampling standard
- Provenance of sequencing standard
- Provenance of analysis best practices
- Archiving of analysis discussion

Marine metagenomics portal (MMP) √
- Marine reference databases (MarRef, MarDB, MarCat)
- META-pipe used to process data for MarCat
12
WP6 deliverables MMG analysis pipelines: August 2018
- Test META-pipe and MMG at scale
- Deploy META-pipe and MMG on ELIXIR compute clouds
- Evaluation of tools
- Synthetic benchmark metagenomes
- Federated search engine
- Training and workshops:
  - Metagenomics data analysis, 3-6 April 2017, Helsinki, Finland
  - Metagenomics data analysis, ?, ?, Portugal
13
BioMaS pipeline on INDIGO-Datacloud
- BioMaS is a taxonomic classification pipeline (ELIXIR-IT)
- Provided as an on-demand Galaxy instance
- Based on INDIGO-Datacloud
14
Pyttipanna (Swedish: "odds and ends"): Who is the user of cloud services?
- Pipeline providers? End-users?
- Data transfer vs storage vs AAI: 3 services? 1 distributed file storage?
- EMG cloud proof-of-concept = plant use case: set up VMs, transfer data, allow the user to run analyses
15
Summary
- 2 MMG pipelines can be run on ELIXIR clouds
- Need resources to test at scale
- Need policies and TUCs (21 and 22) for production use of clouds
16
TUCs

TUC1/TUC3 (Federated ID/ELIXIR Identity):
- Give access to service
- Get information needed for accounting and payment

TUC2 (Other ID):
- Give access to non-European-academic users

TUC4 (Cloud IaaS Services):
- Cloud providers that can run the execution environment

TUC5 (HTC/HPC cluster):
- Run batch jobs to produce reference databases

TUC6 (PRACE cluster):
- We do not need PRACE-scale resources
17
TUCs

TUC7 (Network file storage):
- Not provided (we set up NFS as part of the execution environment)

TUC8 (File transfer):
- Not needed (file transfer time is low)

TUC9/TUC11 (Infrastructure service directory/registry):
- Not needed

TUC10 (Credential translation)

TUC11 (Service access management):
- Needed to maintain user-submitted data
18
TUCs

TUC12/13 (Virtual machine library/container library):
- We provide analysis as a service
- VMs/containers useful for visualization tools

TUC14 (Module library):
- We have META-pipe in a deployment server

TUC15 (Data set replication):
- Not needed (our datasets are small)

TUC17 (Endorsed…):
- User-submitted data management

TUC18 (Cloud storage):
- Replace META-pipe storage server
19
TUCs

TUC19 (PID and metadata registry):
- Provide in reference databases?

TUC20/23 (Federated cloud/HPC/HTC):
- Not exposed to our end-users

TUC21 (Operational integration):
- Service availability monitoring is needed

TUC22 (Resource accounting):
- Very much needed