
1 The CLoud Infrastructure for Microbial Bioinformatics
Dr Tom Connor, Cardiff University
Genome Science 2015
@tomrconnor

2 The CLIMB Consortium
Professor Mark Pallen (Warwick) and Dr Sam Sheppard (Swansea) – Joint PIs
Professor Mark Achtman (Warwick), Professor Steve Busby FRS (Birmingham), Dr Tom Connor (Cardiff)*, Professor Tim Walsh (Cardiff) and Dr Robin Howe (Public Health Wales) – Co-Is
Dr Nick Loman (Birmingham)* and Dr Chris Quince (Warwick) – MRC Research Fellows
Simon Thompson (Birmingham, Project Technical/OpenStack lead), Marius Bakke (Warwick, Systems Administrator) and Simon Thompson (Swansea, Systems Administrator)
* The principal bioinformaticians who architected and designed the system

3 The CLoud Infrastructure for Microbial Bioinformatics (climb.ac.uk)
We are creating a one-stop shop for microbial bioinformatics:
A public/private cloud for use by UK academics
Standardised cloud images that implement key pipelines
A storage repository for data and images, made available online and within our system, anywhere ('eduroam for microbial genomics')
We will provide access to other databases from within the system, as well as a place to support orphan databases and tools

4 System Outline
4 sites, connected over Janet
Different sizes of VM available: personal, standard, large memory, huge memory
Able to support >1,000 VMs simultaneously (1:1 vCPUs/vRAM to physical CPUs/RAM)
7-8PB of object storage across the 4 sites (~2-3PB usable with erasure coding; see the access sketch after this list)
400-500TB of local high-performance storage per site
A single system, with common login and between-site data replication
The system has been designed to enable the addition of extra nodes/universities
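To make the object store concrete, here is a minimal sketch of how a user might push data into an S3-compatible Ceph object gateway from Python using boto3. This is an illustration under assumptions: the endpoint URL, bucket name and credentials are placeholders, not real CLIMB values, and CLIMB's actual storage interface may differ.

```python
# Hedged sketch: uploading a read set to an S3-compatible object store
# (e.g. a Ceph RADOS Gateway). The endpoint, keys and bucket name are
# illustrative placeholders, not confirmed CLIMB values.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.climb.ac.uk",   # assumed endpoint
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

s3.create_bucket(Bucket="my-genomes")         # create a bucket once
s3.upload_file("sample01_R1.fastq.gz",        # local file
               "my-genomes",                  # bucket
               "sample01_R1.fastq.gz")        # object key

# Confirm the upload by listing the bucket's contents
for obj in s3.list_objects_v2(Bucket="my-genomes").get("Contents", []):
    print(obj["Key"], obj["Size"])
```

Erasure coding is a property of the underlying Ceph pools, so a client like this one just sees ordinary buckets and objects; the usable-capacity overhead quoted above is handled transparently on the storage side.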

5 CLIMB Overview: Genome Science Update
4 sites, running OpenStack
Hardware procured in a three-stage process: IBM/OCF provided compute, Dell/Red Hat provided storage, and networks were provided by Brocade
We are defining a reference architecture to enable other sites to be added trivially

6 Hardware (per site)
2 router/firewalls (each capable of routing >80Gb/s)
3 controllers
21x 64 vCore, 512GB RAM nodes
3x 192 vCore, 3TB RAM nodes
~500TB GPFS (local): 4 controllers; Infiniband, with 10Gb failover
~2PB total Ceph (shared): 27x 64TB nodes/site, with cross-site replication
10Gb backbone

7 Overview – 4 sites, (virtually) identical hardware
[Diagram: the four sites linked over VPN tunnels, with connections out to external clouds and external databases]
Each site is connected to the others over VPN tunnels, and sites can be easily added. The system can use free router software on commodity hardware, paid-for software, or dedicated router/firewalls.
Our intention is for the system to be presented to users as a single system, with a single login, via Shibboleth. We are currently working on that bit :)
A single system makes it easy(er) to share methods and data!

8 Flavours
User configurable, with standard flavours (see the launch sketch after this list):
Regular: up to 8 vCPUs, 64GB RAM
xlarge: up to 16 vCPUs, 256GB RAM
Huge: up to 192 vCPUs, 3TB RAM
The system also supports a scalable virtual cluster: 2+ nodes with 2+ vCPUs and 2-4GB RAM per vCPU
It also provides for long-term hosting (for new or orphan datasets/tools)
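As a rough illustration of requesting one of these flavours programmatically, below is a minimal sketch using the openstacksdk Python client. The cloud name, image name, flavour name and network name are hypothetical placeholders; CLIMB's real identifiers may differ, and most users would do this through the dashboard instead.

```python
# Hedged sketch: booting a VM on an OpenStack cloud such as CLIMB.
# "climb", "GVL-1.0", "climb.xlarge" and "private" are illustrative
# placeholders, not confirmed CLIMB identifiers.
import openstack

conn = openstack.connect(cloud="climb")  # credentials from clouds.yaml

image = conn.compute.find_image("GVL-1.0")         # assumed image name
flavor = conn.compute.find_flavor("climb.xlarge")  # assumed flavour name
network = conn.network.find_network("private")     # assumed network name

server = conn.compute.create_server(
    name="analysis-vm",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)      # block until ACTIVE
print(server.name, server.status)
```

The same calls extend to the virtual-cluster case: booting 2+ servers in a loop against a small flavour gives the node pool described above.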

9 Access
Microbial researchers will be able to access the system in one of two ways:
Externally, via a federated access system: login via a .ac.uk user login in the first instance, later (hopefully) open to anyone who uses Shibboleth
Internally, via user accounts set up by the consortium for collaborators

10 Where are we now?
Computational hardware was procured by March 2015 (a ~6-month process)
Ahead of schedule: the system is now online and in use for research
We are adopting two models for access:
Access for registered users to a "pro" dashboard, online now (intended for bioinformaticians/developers)
A "version 1.0" system providing universal access to predefined images, starting with the GVL, by Winter 2015 (intended for those who just want a single server with predefined software)

11 Live Demo
Dashboard login: http://birmingham.climb.ac.uk
Wiki login:

12 VMs are already up

13 Users are already using CLIMB to do research

14 CLoud Infrastructure for Microbial Bioinformatics
A multi-site system providing a one-stop bioinformatics shop, designed specifically to support microbial researchers
For both bioinformaticians and wet-lab scientists
Combines hardware with training
Free, with a simple, easy-to-use interface
Common login
Easy data and method sharing
We already have multiple users from across UK academia and healthcare

15 The CLIMB Consortium
Professor Mark Pallen (Warwick) and Dr Sam Sheppard (Swansea) – Joint PIs
Professor Mark Achtman (Warwick), Professor Steve Busby FRS (Birmingham), Dr Tom Connor (Cardiff)*, Professor Tim Walsh (Cardiff) and Dr Robin Howe (Public Health Wales) – Co-Is
Dr Nick Loman (Birmingham)* and Dr Chris Quince (Warwick) – MRC Research Fellows
Simon Thompson (Birmingham, Project Technical/OpenStack lead), Marius Bakke (Warwick, Systems Administrator) and Simon Thompson (Swansea, Systems Administrator)
* The principal bioinformaticians who architected and designed the system

16 CLoud Infrastructure for Microbial Bioinformatics (CLIMB)
An MRC-funded project to develop cloud infrastructure for microbial bioinformatics
~£4M of hardware, capable of supporting >1,000 individual virtual servers
An 'Amazon/Google cloud' for academics
Already in production, and in use by researchers

