Bioinformatic analysis using Jetstream, a cloud computing environment

Bioinformatic analysis using Jetstream, a cloud computing environment Bhavya Papudeshi, Sheri Sanders, Carrie Ganote, Jeremy Fischer, Thomas G. Doak Research Technologies, University Information Technology Services, Pervasive Technology Institute, Indiana University, Bloomington, USA

National Center for Genome Analysis Support (NCGAS) The National Center for Genome Analysis Support (NCGAS) assists researchers in addressing the scientific challenges of understanding and analyzing the wealth of gene sequence information now available. This includes on-boarding biology professionals who lack the computational background needed to run their analyses on high performance computing systems. Many life scientists, outstanding in their fields though they may be, remain novices when transplanted to a computational environment. This talk covers what NCGAS is and how one of our aims is achievable via Jetstream (JS).

Jetstream, a cloud computing environment As in other cloud computing environments, you can spin up a virtual machine (VM) on Jetstream. A virtual machine is similar to a laptop, but the computation is done elsewhere, so a heavy analysis does not crash your own computer. Unlike on high performance clusters, users have root privileges. Virtual machines help with the transition to command line use, software installation, and running analyses in the Linux environment. You can develop workflows or spin up preconfigured virtual machines, which can be shared with collaborators or published publicly.

NCGAS use cases for Jetstream Jetstream virtual machines were used in workshops to teach transcriptomics, metagenomics, and R, and they worked really well. A base JBrowse image has been preconfigured (not public yet) to help different labs set up their own JBrowse copies to share with their collaborators. A preconfigured RStudio virtual machine is available for workshops. MATLAB is available on Jetstream at no cost.

Jetstream, a cloud computing environment Go to www.jetstream-cloud.org and select Jetstream login.

Step 1- Getting an XSEDE account First, create an XSEDE account at https://portal.xsede.org Anybody can create an account on XSEDE, US or non-US. XSEDE is a community effort to provide scientists with the cyberinfrastructure (cloud computing, supercomputers) they need for analysis and data collection/sharing, along with expertise. Once you have an XSEDE account, the next step is to log in to Jetstream using the XSEDE ID.

Step 2- Selecting a Jetstream allocation To spin up a VM from a preconfigured image or a basic Ubuntu image, you need to be part of an XSEDE allocation that gives access to Jetstream resources. There are three options for this.
Option 1: Request Jetstream trial access, which comes with limited resources: http://wiki.jetstream-cloud.org/Jetstream+Trial+Access+Allocation
Option 2: Contact NCGAS (help@ncgas.org) to get on our allocation. This is also trial access, but with more resources than Option 1.
Option 3: Request your own Jetstream allocation: http://wiki.jetstream-cloud.org/Jetstream+Allocations
Requirements for a complete XSEDE-Jetstream allocation: You have to be faculty or staff at a US-based institution. You do NOT have to have a doctorate. Non-US users need a US collaborator to request the allocation and add them to it. You can request a specific amount of resources and renew by requesting again. The XSEDE allocation process requires an abstract describing your research and how Jetstream will help, as well as a list of the resources required. The Jetstream wiki has more details on this process. You can also email help@ncgas.org and we can help.

In a project space- Start a preconfigured VM Starting a new instance

Preconfigured biology based VM’s on Jetstream There are more than 20 preconfigured bioinformatics-related images currently hosted on Jetstream; this slide lists some of them. If, for example, you are interested in running Café, you can start up a Café VM. This clones the Café image listed here and gives you your own copy, which you can personalize.

Virtual machine size and volumes Select a VM size. When you select a VM size, you have options ranging from small to s1.xxlarge. The differences between them are spelled out: the number of CPUs, total memory, and storage space. Pick accordingly; for example, the largest size uses up 44 CPUs from your allocation. On a trial allocation, you can't even start a VM this large. Use your service units (SUs) wisely, or you will often find yourself spending time on JS renewal requests, which means emails and writing abstracts instead of analysis.
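
To get a feel for how quickly a large VM drains an allocation, a rough estimate can be sketched as below. It assumes that one service unit (SU) corresponds to one virtual CPU running for one hour, which the slide does not state; the function names and the example numbers are hypothetical, so check the Jetstream wiki for the actual charging rules.

```python
def estimate_su(cpus: int, hours: float) -> float:
    """Estimate SUs consumed, ASSUMING 1 SU = 1 vCPU running for 1 hour."""
    if cpus < 0 or hours < 0:
        raise ValueError("cpus and hours must be non-negative")
    return cpus * hours


def hours_available(allocation_su: float, cpus: int) -> float:
    """How long a VM of the given size could run on an allocation (same assumption)."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return allocation_su / cpus


if __name__ == "__main__":
    week = 7 * 24  # hours in one week
    # A 44-CPU VM (the largest size mentioned on the slide) left running for a week.
    print("44 CPUs for a week:", estimate_su(44, week), "SUs")
    # A hypothetical 24-CPU VM for the same week, for comparison.
    print("24 CPUs for a week:", estimate_su(24, week), "SUs")
```

Under this assumption the cost grows with every hour the VM stays active, whether or not it is computing, which is why picking a smaller size matters.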

Virtual machine size and volumes What if your computation requires less than 60 GB of memory but your dataset is 300 GB? In that case you have no choice but to spin up the s1.xxlarge size just for its storage. If you set the really large dataset aside and have only 10 GB of data, you can pick the m1.xlarge size instead, which saves nearly half of your allocation. For large datasets there is therefore an alternative: start up a volume, which acts like a hard drive that hosts your data and stores your results.
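
The reasoning on this slide can be sketched as a small selection routine: pick the smallest size that satisfies both the memory and the storage requirement, and note how attaching a volume removes the storage constraint. The flavor table below is illustrative only. The 44 CPUs for the largest size and the 60 GB memory threshold come from the slides; every other figure, and the names `Flavor`, `FLAVORS`, and `pick_flavor`, are assumptions to be replaced with values from the Jetstream wiki.

```python
from typing import NamedTuple, Optional


class Flavor(NamedTuple):
    name: str
    cpus: int
    memory_gb: int
    disk_gb: int


# ILLUSTRATIVE figures, not an authoritative list of Jetstream sizes.
FLAVORS = [
    Flavor("m1.medium", 6, 16, 60),
    Flavor("m1.xlarge", 24, 60, 240),
    Flavor("s1.xxlarge", 44, 120, 1920),
]


def pick_flavor(memory_gb: float, data_gb: float,
                use_volume: bool = False) -> Optional[Flavor]:
    """Return the size with the fewest CPUs that fits the job, or None.

    With use_volume=True the data is assumed to live on an attached
    volume, so it no longer has to fit on the VM's own disk.
    """
    disk_needed = 0 if use_volume else data_gb
    candidates = [f for f in FLAVORS
                  if f.memory_gb >= memory_gb and f.disk_gb >= disk_needed]
    return min(candidates, key=lambda f: f.cpus, default=None)


if __name__ == "__main__":
    print(pick_flavor(50, 300))                   # data must fit on the VM disk
    print(pick_flavor(50, 10))                    # small dataset
    print(pick_flavor(50, 300, use_volume=True))  # data kept on a volume
```

Choosing by CPU count reflects the point made above: the CPU count is what is drawn from the allocation, so the cheapest size that fits is the one with the fewest CPUs.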

Solution- Setting up a volume Each user can get 10 volumes, up to 500 GB of total storage*. “Use what you need, but not more than you need” – Jeremy
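
Before creating volumes, a planned layout can be checked against the limits quoted on this slide. This is a minimal sketch: the two limits are taken from the slide (whose footnote is not reproduced in the transcript), while the function name `check_volume_plan` and the example sizes are hypothetical.

```python
MAX_VOLUMES = 10     # per user, as quoted on the slide
MAX_TOTAL_GB = 500   # total storage across all volumes, as quoted on the slide


def check_volume_plan(sizes_gb: list) -> list:
    """Return a list of problems with a planned set of volume sizes (empty if fine)."""
    problems = []
    if len(sizes_gb) > MAX_VOLUMES:
        problems.append(
            f"{len(sizes_gb)} volumes planned, limit is {MAX_VOLUMES}")
    total = sum(sizes_gb)
    if total > MAX_TOTAL_GB:
        problems.append(
            f"{total} GB planned, limit is {MAX_TOTAL_GB} GB in total")
    return problems


if __name__ == "__main__":
    # One volume for raw data, one for results: within both limits.
    print(check_volume_plan([300, 100]))
    # Two 300 GB volumes exceed the total storage limit.
    print(check_volume_plan([300, 300]))
```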

Uploading large amounts of data Globus allows you to transfer gigabytes or terabytes of data securely and quickly. Setting Globus up on your own computer is free (see below), but cluster, academic, and commercial endpoints are required to pay for a subscription. Jetstream has a paid Globus subscription. More information on this process is at http://ncgas.org/Blog_Posts/Getting%20Started%20with%20Globus.php

Start a preconfigured VM- bcbio nextgen toolkit

Configuring a Virtual Machine on Jetstream The imaging request generates a ticket to help@xsede.edu; you then wait for the Jetstream team to image it for you.

Acknowledgments
NCGAS blog, Getting started on Jetstream: http://ncgas.org/Blog_Posts/Getting%20Started%20on%20Jetstream.php
File transfer using Globus: http://ncgas.org/Blog_Posts/Getting%20Started%20with%20Globus.php
Contact us: help@ncgas.org
Jetstream documentation: https://iujetstream.atlassian.net/wiki/spaces/JWT/overview
@ncgasiu @ncgas