Providing a Research Environment for life sciences

Presentation transcript:

Providing a Research Environment for life sciences
David Brown
24th January 2018
www.genomicsengland.co.uk

Background
Genomics England was set up to deliver the UK 100K Genomes programme. Under the charter of Genomics England and the 100K Genomes programme, no data can be downloaded. Genomics England therefore had to create a research environment in which researchers can work with the Genomics England dataset. The environment needs to address the needs of three categories of user:
Bioinformaticians – 'just give me the prompt'
Scientists with no programming skills
Management

Research Environment (architecture diagram)
Access: virtual desktop
Analysis tools: DataBology, Pipeline Pilot, R, command-line tools, Python API
Rights management layer
Data: Clinical and Phenotypic Data in LabKey, Variant Library in OpenCGA, Read Level Data in file storage
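For illustration only, a minimal sketch of how an analysis running inside the research environment might pull variant records from the variant library over a REST API loosely modelled on OpenCGA. The base URL, token, study name and endpoint path are assumptions, not the actual Genomics England service:

```python
import requests

# Hypothetical base URL and token: the real variant library service inside the
# Genomics England research environment is not publicly documented here.
BASE_URL = "https://variant-library.example.internal/rest/v2"
TOKEN = "REPLACE_WITH_SESSION_TOKEN"

def fetch_variants(study, region, limit=100):
    """Query a variant store over REST (endpoint modelled loosely on OpenCGA)."""
    resp = requests.get(
        f"{BASE_URL}/analysis/variant/query",
        params={"study": study, "region": region, "limit": limit},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    # Placeholder study name and a small region on chromosome 17.
    variants = fetch_variants(study="example_study", region="17:41196312-41277500")
    print(variants)
```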

Physical Environment – Hybrid Cloud (infrastructure diagram)
Isilon Storage Cluster: initial 6.4 PB deployed in Corsham, 300 TB deployed in Farnborough
Expansion: 10 PB of additional storage
Hadoop Cluster with HDFS; Backup Subsystem
Compute: 2,000 cores and 18,000 cores
Network: spine switches, multiple 10 GiB/s links, 160 GiB/s
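As a small operational sketch, assuming the standard Hadoop command-line client is available inside the environment, storage usage on the HDFS side of the hybrid platform could be checked like this (nothing below is specific to the Genomics England deployment):

```python
import subprocess

def hdfs_capacity():
    """Report configured capacity and usage of the HDFS root filesystem.

    Assumes the stock Hadoop client ('hdfs' command) is installed and already
    configured to point at the cluster.
    """
    result = subprocess.run(
        ["hdfs", "dfs", "-df", "-h", "/"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(hdfs_capacity())
```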

Major Factors
Initial speed of implementation – when GEL was in startup we had 3 months to be able to accept sequences; only a cloud solution could meet these timescales.
Compute demand is not constant – there is a need to flex capacity based on project demand.
Project timescales vs rate of technology change – adoption of any research environment is always slower than you plan; if you invest in technology early, it will be obsolete by the time it is used.

Compute Demand Types (architecture diagrams)
Traditional HPC: scheduled grid compute on a static, centralised resource. A head node accepts submitted jobs and images, the grid master schedules them onto grid exec nodes, and everything runs inside the Embassy secure compute environment next to the data.
On-demand compute with 'thick nodes': a pop-up environment created by a research team on demand, for example to search a 'research repository'. Cloud control builds the environment from a submitted image; jobs run on thick nodes (24 cores, 256 GB RAM) inside the Embassy secure compute environment, again next to the data.
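To make the traditional HPC path concrete, here is a minimal sketch of submitting a job from the head node, assuming a Grid Engine style scheduler with qsub and an 'smp' parallel environment; the script name, job name and slot count are placeholders rather than the actual Genomics England pipeline:

```python
import subprocess

def submit_grid_job(script_path, job_name="example_job", slots=24):
    """Submit a batch job to a Grid Engine style scheduler via qsub."""
    cmd = [
        "qsub",
        "-N", job_name,            # job name shown in the queue
        "-cwd",                    # run from the current working directory
        "-pe", "smp", str(slots),  # request slots on a single 'thick' node
        script_path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    print(submit_grid_job("run_analysis.sh"))
```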

Airlock
The airlock is in place; the open question is who validates inbound and outbound data, and that is not an IT issue!

Research Environment front-end performance: ~300 ms
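For context, a minimal sketch of how front-end round-trip latency could be sampled from a client; the URL below is a placeholder, since the real front end is only reachable from within the secure environment:

```python
import statistics
import time
import requests

# Placeholder URL: not the actual Genomics England research environment front end.
FRONTEND_URL = "https://research-env.example.org/health"

def median_round_trip_ms(url, samples=10):
    """Return the median HTTP round-trip time to a front-end URL in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        requests.get(url, timeout=10)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)

if __name__ == "__main__":
    print(f"median round trip: {median_round_trip_ms(FRONTEND_URL):.0f} ms")
```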