Published by Blanche Black. Modified over 6 years ago.
1
Bioinformatic analysis using Jetstream, a cloud computing environment
Bhavya Papudeshi, Sheri Sanders, Carrie Ganote, Jeremy Fischer, Thomas G. Doak Research Technologies, University Information Technology Services, Pervasive Technology Institute, Indiana University, Bloomington, USA
2
National Center for Genome Analysis Support (NCGAS)
The National Center for Genome Analysis Support (NCGAS) assists researchers in addressing the scientific challenges of understanding and analyzing the wealth of gene sequence information now available. This includes on-boarding biology professionals who lack the computational background needed to run their analyses on high performance computing systems. Many life scientists, outstanding in their fields though they may be, remain novices when transplanted to a computational environment. This talk covers what NCGAS is and how one of our aims can be achieved with Jetstream (JS).
3
Jetstream, a cloud computing environment
Like other cloud computing environments, Jetstream lets you spin up a virtual machine. A virtual machine is similar to a laptop, but the computation is done elsewhere, so a heavy job does not crash your own computer. Users have root privileges, unlike on high performance clusters. Virtual machines help with the transition to command line use, software installation, and running analyses in the Linux environment. You can develop workflows or spin up preconfigured virtual machines, which can be shared with collaborators or published publicly.
4
NCGAS use cases for Jetstream
Jetstream virtual machines were used in workshops to teach transcriptomics, metagenomics, and R, and this worked really well. A base JBrowse image has been preconfigured (not public yet) to help different labs set up their own JBrowse copies to share with their collaborators. A preconfigured RStudio virtual machine is available for workshops. MATLAB is available on Jetstream at no cost.
5
Jetstream, a cloud computing environment
Go to Jetstream-cloud.org and select Jetstream login.
6
Step 1- Getting an XSEDE account
First, you need an XSEDE account. Anybody can create one, US or non-US. XSEDE is a community effort to provide scientists with the cyberinfrastructure (cloud computing, supercomputers) needed for analysis and for data collection and sharing, along with expertise. Once you have an XSEDE account, the next step is to log in to Jetstream using your XSEDE ID.
7
Step 2- Selecting a Jetstream allocation
To access Jetstream resources, you need access to a Jetstream allocation. To spin up your image/VM from a preconfigured image or from basic Ubuntu, you must be part of an XSEDE allocation. There are three options for this:
Option 1: Request Jetstream trial access, which comes with limited resources.
Option 2: Contact NCGAS to get on our allocation. This is also trial access, but with more resources than Option 1.
Option 3: Request your own Jetstream allocation.
Requirements for a complete XSEDE-Jetstream allocation: You have to be faculty or staff at a US-based institution. You do NOT have to have a doctorate. Non-US users need a US collaborator to request the allocation and then add them to it. You can request a specific amount of resources and renew by submitting another request. The XSEDE allocation process requires an abstract describing your research and how Jetstream will help, as well as a list of the resources required. The Jetstream wiki has more details on this process. You can also email us, and we can help.
8
In a project space- Start a preconfigured VM
Starting a new instance
9
Preconfigured biology-based VMs on Jetstream
There are more than 20 preconfigured bioinformatics-related images currently hosted on Jetstream (JS), and this slide lists some of them. If you are interested in running Café, for example, you can start up a Café VM, which clones the Café image listed here and gives you your own copy to personalize.
10
Virtual machine size and volumes
Select a VM size. The options range from small to s1.xxlarge, and the differences between them are spelled out: the number of CPUs, total memory, and storage space. Pick accordingly. For example, the largest size uses 44 CPUs from your allocation, and on a trial allocation you cannot even start a VM that large. Use your SUs (service units) wisely, or you will find yourself filing JS renewal requests often, which means writing up abstracts instead of doing analysis.
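As a rough illustration of why size matters, the sketch below estimates how long an allocation lasts when a VM runs continuously. It assumes a charge of one SU per CPU per hour, and both the 50,000 SU budget and the 24-CPU comparison size are hypothetical figures chosen for the example; only the 44-CPU figure comes from the slide.

```python
def hours_of_runtime(allocation_sus: float, vcpus: int,
                     sus_per_vcpu_hour: float = 1.0) -> float:
    """Return how many wall-clock hours a VM can run before the allocation is spent.

    sus_per_vcpu_hour is an assumed charging rate, not a figure from the slides.
    """
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    return allocation_sus / (vcpus * sus_per_vcpu_hour)


if __name__ == "__main__":
    budget = 50_000  # hypothetical allocation size in SUs
    for vcpus in (24, 44):
        days = hours_of_runtime(budget, vcpus) / 24
        print(f"{vcpus} CPUs: about {days:.0f} days of continuous use")
```

Shutting a VM down when it is idle stretches the same budget further, since under this assumption only running hours are charged.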
11
Virtual machine size and volumes
What if your computation requires less than 60GB of memory but your dataset is 300GB? Going by VM size alone, you have no choice but to spin up the s1.xxlarge image. If you forget about the really large dataset and have only 10GB of data, you can pick the m1.xlarge image instead, which saves nearly half of your allocation. So for large datasets we use an alternative: start up a volume, which works like a hard drive that hosts your data and saves your results.
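The trade-off described here can be sketched as a small size picker. The specs in the table are illustrative assumptions (only the 44 CPUs of the largest size appear in the slides), and `Flavor` and `smallest_fit` are hypothetical helper names, so check the Jetstream size table before relying on any of these numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flavor:
    name: str
    vcpus: int
    ram_gb: int
    disk_gb: int


# Assumed, illustrative specs; not taken from the slides apart from 44 CPUs.
FLAVORS = [
    Flavor("m1.xlarge", vcpus=24, ram_gb=60, disk_gb=60),
    Flavor("s1.xxlarge", vcpus=44, ram_gb=120, disk_gb=480),
]


def smallest_fit(ram_gb: int, data_gb: int,
                 use_volume: bool = False) -> Optional[Flavor]:
    """Pick the flavor with the fewest CPUs that satisfies the requirements.

    With use_volume=True the dataset lives on an attached volume, so only
    memory constrains the choice. Returns None when nothing fits.
    """
    candidates = [
        f for f in FLAVORS
        if f.ram_gb >= ram_gb and (use_volume or f.disk_gb >= data_gb)
    ]
    return min(candidates, key=lambda f: f.vcpus, default=None)


if __name__ == "__main__":
    print(smallest_fit(ram_gb=60, data_gb=300))                   # data on the VM disk
    print(smallest_fit(ram_gb=60, data_gb=300, use_volume=True))  # data on a volume
```

With the data moved to a volume, the 300GB job fits the smaller size, which is the saving the slide is pointing at.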
12
Solution- Setting up a volume
Each user can get up to 10 volumes, with up to 500GB of total storage.*
"Use what you need, but not more than you need" (Jeremy's quote)
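A minimal sketch of checking a planned volume against these limits is shown below. The two constants come from the slide; the function name is hypothetical, and the footnote attached to the limits is not preserved here, so the real quota may carry conditions this check ignores.

```python
from typing import Sequence

MAX_VOLUMES = 10     # per-user volume count stated on the slide
MAX_TOTAL_GB = 500   # per-user total volume storage stated on the slide


def can_create_volume(existing_sizes_gb: Sequence[int], new_size_gb: int) -> bool:
    """Return True if one more volume of new_size_gb stays within both limits."""
    if new_size_gb <= 0:
        raise ValueError("new_size_gb must be positive")
    within_count = len(existing_sizes_gb) + 1 <= MAX_VOLUMES
    within_size = sum(existing_sizes_gb) + new_size_gb <= MAX_TOTAL_GB
    return within_count and within_size


if __name__ == "__main__":
    current = [100, 150]  # hypothetical volumes already in use, in GB
    print(can_create_volume(current, 200))  # 450GB total, 3 volumes
    print(can_create_volume(current, 300))  # 550GB total exceeds the limit
```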
13
Uploading large amounts of data
Globus allows you to transfer gigabytes or terabytes of data securely and quickly. Setting Globus up on your own computer is free (see below), but cluster, academic, or commercial endpoints are required to pay for a subscription. Jetstream has a paid Globus subscription. More information on this process is listed under "File transfer using Globus" at the end of the presentation.
14
Start a preconfigured VM- bcbio nextgen toolkit
15
Configuring a Virtual Machine on Jetstream
Requesting an image generates a ticket to the Jetstream team; then wait for the team to image the VM for you.
16
Acknowledgments
NCGAS blog
Getting started on Jetstream
File transfer using Globus
Contact us
Jetstream documentation
@ncgasiu @ncgas