Bioinformatic analysis using Jetstream, a cloud computing environment
Bhavya Papudeshi, Sheri Sanders, Carrie Ganote, Jeremy Fischer, Thomas G. Doak
Research Technologies, University Information Technology Services, Pervasive Technology Institute, Indiana University, Bloomington, USA
National Center for Genome Analysis Support (NCGAS)
NCGAS assists researchers in addressing the scientific challenges of understanding and analyzing the wealth of gene sequence information now available. This includes on-boarding biology professionals who lack the computational background needed to run their analyses on high performance computing systems. Many life scientists, outstanding in their fields though they may be, remain novices when transplanted to a computational environment. This talk covers what NCGAS is and how one of our aims is achievable via Jetstream (JS).
Jetstream, a cloud computing environment
As in other cloud computing environments, you can spin up a virtual machine (VM) on Jetstream. A virtual machine is similar to a laptop, but the computation is done elsewhere, so a heavy analysis will not crash your own computer. Unlike on high performance clusters, users have root privileges. Virtual machines help with the transition to command line use, software installation, and running analyses in the Linux environment. You can develop workflows or spin up preconfigured virtual machines, which can be shared with collaborators or published publicly.
NCGAS use cases for Jetstream
Jetstream virtual machines were used in workshops teaching transcriptomics, metagenomics, and R. This worked really well!
A base JBrowse image has been preconfigured (not public yet) to help different labs set up their own JBrowse copies to share with their collaborators.
A preconfigured RStudio virtual machine is available for workshops.
MATLAB is available on Jetstream at no cost.
Jetstream, a cloud computing environment
Go to www.jetstream-cloud.org and select the Jetstream login.
Step 1- Getting an XSEDE account
First, you need an XSEDE account; create one at https://portal.xsede.org. Anybody can create an XSEDE account, whether US-based or not. XSEDE is a community effort to provide scientists with the cyberinfrastructure (cloud computing, supercomputers) and expertise they need for analysis and for data collection and sharing. Once you have an XSEDE account, the next step is to log in to Jetstream using your XSEDE ID.
Step 2- Selecting a Jetstream allocation
To spin up an image/VM, whether preconfigured or basic Ubuntu, you need to be part of an XSEDE allocation that includes Jetstream. There are three options for this:
Option 1: Request Jetstream trial access, which comes with limited resources: http://wiki.jetstream-cloud.org/Jetstream+Trial+Access+Allocation
Option 2: Contact NCGAS (help@ncgas.org) to get on our allocation. This is also trial access, but with more resources than Option 1.
Option 3: Request your own Jetstream allocation: http://wiki.jetstream-cloud.org/Jetstream+Allocations
Requirements for a complete XSEDE-Jetstream allocation:
You have to be faculty or staff at a US-based institution. You do NOT need a doctorate.
Non-US users need a US collaborator to request the allocation and then add them to it.
You can request a specific amount of resources and renew by submitting a new request.
The XSEDE allocation process requires an abstract describing your research and how Jetstream will help, as well as a list of the resources required.
The Jetstream wiki has more details on this process. You can also email help@ncgas.org and we can help.
In a project space- Start a preconfigured VM
Starting a new instance
Preconfigured biology based VMs on Jetstream
There are more than 20 preconfigured bioinformatics related images currently hosted on Jetstream. Here is a list of some of them. If you are interested in running Café, for example, you can start up a Café VM. This clones the Café image listed here and gives you your own copy, which you can personalize.
Virtual machine size and volumes
Select a VM size
When you select a VM size, the options range from small to s1.xxlarge. The differences between them are spelled out: number of CPUs, total memory, and storage space. Pick accordingly. For example, the largest size uses up 44 CPUs from your allocation, and on a trial allocation you cannot even start a VM that large. Use your SUs (service units) wisely, or you will find yourself writing Jetstream renewal requests often, which means emails and abstracts instead of analysis.
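The budgeting advice above can be made concrete with a little arithmetic. The sketch below assumes that one SU is charged per CPU per hour of run time, which should be checked against the Jetstream wiki. The 44-CPU figure comes from the slide; the 24-CPU size and the 1,000 SU allocation are hypothetical values chosen only for illustration.

```python
def su_cost(cpus: int, hours: float) -> float:
    """SUs consumed by a VM, ASSUMING 1 SU = 1 CPU running for 1 hour."""
    return cpus * hours


def hours_available(allocation_sus: float, cpus: int) -> float:
    """How long a VM with `cpus` CPUs can run before the allocation is spent."""
    return allocation_sus / cpus


if __name__ == "__main__":
    allocation = 1000   # hypothetical allocation size, in SUs
    largest = 44        # CPUs used by the largest size (from the slide)
    smaller = 24        # hypothetical CPU count for a smaller size

    for cpus in (largest, smaller):
        hours = hours_available(allocation, cpus)
        print(f"{cpus} CPUs: about {hours:.1f} hours of run time")
```

Under these assumptions, the smaller VM runs almost twice as long on the same allocation, which is the practical reason to pick the smallest size that fits the job.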
Virtual machine size and volumes
What if your computation requires less than 60 GB of memory but your dataset is 300 GB? Going by VM sizes alone, you have no choice but to spin up the s1.xxlarge size just to get the storage. Now forget the really large dataset and say you only have 10 GB of data: you would pick the m1.xlarge size and save nearly half of your allocation. So in such cases we use an alternative: start up a volume, which works like a hard drive that hosts your data and stores your results, and attach it to the smaller VM.
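The size-selection reasoning above can be sketched as a small lookup. Only the size names and the 44-CPU figure come from the slides; every other number in the table is an assumed value for illustration, and the `Flavor` type and `smallest_flavor` helper are hypothetical. Check the real figures in the Jetstream launch dialog before relying on them.

```python
from typing import NamedTuple, Optional, Sequence


class Flavor(NamedTuple):
    name: str
    cpus: int
    memory_gb: int
    disk_gb: int


# ASSUMED figures for illustration only (not taken from the slides,
# apart from the names and the 44 CPUs of s1.xxlarge).
FLAVORS = [
    Flavor("m1.xlarge", 24, 60, 60),
    Flavor("s1.xlarge", 24, 60, 240),
    Flavor("s1.xxlarge", 44, 120, 480),
]


def smallest_flavor(memory_gb: float, data_gb: float,
                    flavors: Sequence[Flavor] = FLAVORS) -> Optional[Flavor]:
    """Return the cheapest flavor (fewest CPUs) that fits memory and on-VM data."""
    fits = [f for f in flavors
            if f.memory_gb >= memory_gb and f.disk_gb >= data_gb]
    return min(fits, key=lambda f: (f.cpus, f.disk_gb), default=None)


if __name__ == "__main__":
    # 300 GB stored on the VM itself forces the largest size.
    print(smallest_flavor(memory_gb=50, data_gb=300))
    # With the data on an attached volume, the VM disk only holds working files.
    print(smallest_flavor(memory_gb=50, data_gb=10))
```

The second call models the volume approach: because the dataset no longer has to fit on the VM's own disk, a size with far fewer CPUs is enough.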
Solution- Setting up a volume
Each user can get 10 volumes, up to 500 GB of total storage*
"Use what you need, but not more than you need," as Jeremy puts it.
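The volume limits above are easy to check before requesting a new volume. This is a minimal sketch using only the two limits stated on the slide (10 volumes, 500 GB in total); the function name and the example sizes are hypothetical, and actual quotas may differ per allocation.

```python
from typing import Sequence

MAX_VOLUMES = 10     # per-user volume count (from the slide)
MAX_TOTAL_GB = 500   # per-user total volume storage (from the slide)


def can_add_volume(existing_gb: Sequence[int], new_gb: int) -> bool:
    """True if one more volume of `new_gb` GB stays within both limits."""
    if new_gb <= 0:
        raise ValueError("volume size must be positive")
    within_count = len(existing_gb) + 1 <= MAX_VOLUMES
    within_size = sum(existing_gb) + new_gb <= MAX_TOTAL_GB
    return within_count and within_size


if __name__ == "__main__":
    current = [100, 100]  # hypothetical volumes already created, in GB
    print(can_add_volume(current, 300))  # fits exactly within 500 GB
    print(can_add_volume(current, 301))  # exceeds the total storage limit
```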
Uploading large amounts of data
Globus lets you transfer gigabytes or terabytes of data securely and quickly. Setting Globus up on your own computer is free (see below), but cluster, academic, and commercial endpoints require a paid subscription. Jetstream has a paid Globus subscription. More information on this process: http://ncgas.org/Blog_Posts/Getting%20Started%20with%20Globus.php
Start a preconfigured VM- bcbio-nextgen toolkit
Configuring a Virtual Machine on Jetstream
Requesting an image of your configured VM generates a ticket to help@xsede.edu. Then wait for the Jetstream team to image it for you.
Acknowledgments
NCGAS blog, Getting started on Jetstream: http://ncgas.org/Blog_Posts/Getting%20Started%20on%20Jetstream.php
File transfer using Globus: http://ncgas.org/Blog_Posts/Getting%20Started%20with%20Globus.php
Jetstream documentation: https://iujetstream.atlassian.net/wiki/spaces/JWT/overview
Contact us: help@ncgas.org
@ncgasiu @ncgas