Published by Berenice Dean. Modified over 9 years ago.
1
The cloud … The cluster
3
What is “the cloud”?
1. Many computers “in the sky”
2. A service “in the sky”
3. Sometimes #1 and #2
4
Basically virtual computers…
6
To you.
7
What is a virtual computer?
8
What is a “regular” computer?
[Diagram: Core 1, Core 2, Core 3, Core 4; 8 GB]
9
[Diagram: Core 1, Core 2 | 8 GB | Core 3, Core 4]
10
[Diagram labels: transcript assembly; mrbayes – model 1; mrbayes – model 2]
11
But it’s even cooler than that. You can have it your way!
– Each machine can be set up just like your computer: programs, settings, etc.
– Different machines for different tasks
– Or one large machine for all tasks
– Caveat: pretty much command line only
12
Momentary Digression
What is the command line?
– Text-based means of interacting with your computer
– More commonly used on OS X or Linux
– Fast
– Somewhat obtuse
16
So, why, again, is this helpful?
The Cloud can make similar resources available at a fraction of their overall cost. It’s essentially “on-demand” computing power.
48 Cores, 256 GB RAM = $33,500
17
Benefits of The Cloud
Pay by the hour
Use what you need
No purchase/depreciation of equipment
Almost instant access to many resources
– If you need 1 node, no problem
– If you need 500 nodes, no problem
18
Costs of The Cloud
Few safety nets
– With flexibility comes the power to do wrong
Interactions can be complex
– Requires proficiency in seemingly arcane tools (the CLI)
Can be expensive
Must rely on “others”
20
68.4 GB RAM 8 Cores
21
$2.00/hr.
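As a rough illustration of the pay-by-the-hour trade-off, the sketch below divides the $33,500 purchase price quoted earlier by the $2.00/hr rate. It is only a back-of-the-envelope comparison: the purchased machine (48 cores, 256 GB RAM) is larger than the rented one (8 cores, 68.4 GB RAM), and the function name `break_even_hours` is made up for this example. Power, cooling, storage and data-transfer costs are ignored.

```python
# Figures quoted on the slides. The two machines are not the same size,
# so treat the result as an order-of-magnitude estimate only.
PURCHASE_PRICE_USD = 33_500.00
HOURLY_RATE_USD = 2.00


def break_even_hours(purchase_price: float, hourly_rate: float) -> float:
    """Hours of rental that cost as much as buying the hardware outright."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return purchase_price / hourly_rate


hours = break_even_hours(PURCHASE_PRICE_USD, HOURLY_RATE_USD)
print(f"Break-even after {hours:,.0f} hours "
      f"(~{hours / 24:,.0f} days of continuous use)")
```

If the machine would sit idle most of the time, renting by the hour stays well below the purchase price; if it would run around the clock for years, buying may be cheaper.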
22
Why would you use this?
Data pre-processing
– Read trimming, adapter trimming
Genome assembly
Long-running processes that tie up machines
– mrbayes, raxml, best
– alignments (blast, blat, lastz, bwa)
23
Practical example
De novo genome assembly
– Have many reads
– Need to put them together
– Generally RAM intensive
– Generally slow
24
Actual example
Start an Amazon EC2 “instance”
Add in necessary software
Add 454 assembly software
Get data to machine
Start assembly
Let it run
Download assembled data
25
Reads → Align and orient → Assemble
26
Why is this hard?
Must ensure correct ends overlap
Must put correct pieces together
Must do this quickly
– Do things in RAM/memory
Must deal with massive amounts of data
– 0.5 to 2 to 20 GB or more
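To make “correct ends overlap” concrete, here is a toy sketch that finds the longest suffix of one read matching the prefix of another and merges the two. It is not how any production assembler (including the 454 software mentioned above) works: real tools tolerate sequencing errors, handle reverse complements, and index millions of reads. The names `overlap_length` and `merge_reads` and the `min_overlap` threshold are made up for this example.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no exact overlap of at least `min_overlap` bases exists.
    """
    longest_possible = min(len(left), len(right))
    # Try the longest candidate first so the first hit is the best one.
    for size in range(longest_possible, max(min_overlap, 1) - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads on their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


print(merge_reads("ACGTTGCA", "TGCAGGTA"))
```

Even this naive comparison must be repeated for every pair of reads, which hints at why assembly is slow and why keeping everything in RAM matters.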
27
What, exactly, is a “cluster”?
Group of machines interacting to achieve a common goal
28
Clusters
1000 Work Units
29
125 Work Units: ~8X speedup, or 1/8th the time
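The arithmetic behind these numbers can be sketched as follows, assuming 1000 independent, equally sized work units spread over 8 machines (the machine count is inferred from 1000 / 125, not stated on the slide) and no scheduling or communication overhead. The function names are made up for this example.

```python
import math


def units_per_node(total_units: int, nodes: int) -> int:
    """Largest number of work units any single node has to process."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return math.ceil(total_units / nodes)


def ideal_speedup(total_units: int, nodes: int) -> float:
    """Speedup over one machine when the slowest node sets the finish time."""
    return total_units / units_per_node(total_units, nodes)


print(units_per_node(1000, 8))   # work units per machine
print(ideal_speedup(1000, 8))    # best-case speedup
```

Real clusters rarely reach this ideal figure, hence the “~” on the slide: some time always goes to distributing work and collecting results.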
30
Why?
Very long-running processes/complex jobs
– Genome:Genome alignments
– Substitution models for thousands of loci
– Species trees for thousands of loci
Sometimes the only way to accomplish a “genome-scale” job in a reasonable time frame
31
Practical example chr1 Similar
32
Practical example chr1 chr2 chr3 chr4
33
Practical example chr1 chr2 chr3 chr4 chr1 chr2 chr3 chr4
34
Practical example chr1 chr2 chr3 chr4 chr1 chr2 chr3 chr4
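One way to read the chromosome diagrams above is that each chromosome of one genome is compared against each chromosome of another, and those comparisons are handed out to cluster nodes. The sketch below works under that assumption; the round-robin assignment, the node count of 8, and the function names are illustrative choices, not something the slides specify.

```python
from itertools import product
from typing import List, Sequence, Tuple

Job = Tuple[str, str]


def alignment_jobs(genome_a: Sequence[str], genome_b: Sequence[str]) -> List[Job]:
    """Every (chromosome from A, chromosome from B) pair is one independent job."""
    return list(product(genome_a, genome_b))


def assign_round_robin(jobs: Sequence[Job], nodes: int) -> List[List[Job]]:
    """Deal jobs out to nodes one at a time, like cards."""
    buckets: List[List[Job]] = [[] for _ in range(nodes)]
    for index, job in enumerate(jobs):
        buckets[index % nodes].append(job)
    return buckets


chromosomes = ["chr1", "chr2", "chr3", "chr4"]
jobs = alignment_jobs(chromosomes, chromosomes)
for node_id, node_jobs in enumerate(assign_round_robin(jobs, nodes=8)):
    print(node_id, node_jobs)
```

Because no pair depends on the result of another, the jobs can run at the same time on different machines.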
35
Cluster Caveats
Sometimes not suited to certain jobs
– Essentially those without component parts
– Some modeling (e.g. MCMC)
Complex
– More moving parts = more to break
36
Clusters in the Cloud
You have a big, complicated job
You need many computers for a job
You need to run the job infrequently
You don’t have massive computing resources
37
http://web.mit.edu/star
38
The Cloud as a service
Alternative meaning of The Cloud
Essentially web-powered software
“Galaxy” is one such service
39
http://galaxy.psu.edu
40
Galaxy
Very powerful analyses
Relatively simple to use
Repeatable
Understandable
Extendable
41
Galaxy – Basic services
Convert fastq to fasta
Summarize fastq reads
Fasta + Qual to Fastq
Trim fastq reads
Merge data sets
Convert SFF
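To show what a basic service such as “Convert fastq to fasta” does to the data, here is a minimal standalone sketch. It is not Galaxy’s implementation; it assumes simple four-line FASTQ records (header, sequence, “+” line, quality) with no wrapped sequence lines, and the function names are made up for this example.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or a blank line): stop reading
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line carries no data we need
        quality = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], sequence, quality  # drop leading '@'


def fastq_to_fasta(handle: TextIO) -> str:
    """Keep names and sequences; discard the quality strings."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in read_fastq(handle))


example = "@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nHHHH\n"
print(fastq_to_fasta(io.StringIO(example)), end="")
```

The conversion is lossy by design: FASTA has no place for per-base quality scores, which is why the reverse direction needs a separate quality file (“Fasta + Qual to Fastq”).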
42
Galaxy – Advanced services
Intersect genomic regions
Merge genomic regions
Map with bowtie
Map with bwa
Use bwa to identify variants
Convert genome coordinates
43
Actual example
Finding “missing” genes
– You have a genome sequence
– You have gene annotation (i.e., RefSeq)
– You have aligned mRNA data
– You want to know where these do not overlap
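The “where these do not overlap” step boils down to an interval comparison. The sketch below is one plain-Python way to express it, assuming half-open coordinates (0-based start, exclusive end) and made-up example intervals. It is not Galaxy’s own tool, and the all-against-all loop is fine for a demonstration but too slow for a whole genome.

```python
from typing import List, NamedTuple, Sequence


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (an assumption of this sketch)
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True when both intervals are on the same chromosome and share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def unannotated(alignments: Sequence[Interval],
                annotations: Sequence[Interval]) -> List[Interval]:
    """Aligned mRNA regions that touch no annotated gene: candidate missing genes."""
    return [aln for aln in alignments
            if not any(overlaps(aln, gene) for gene in annotations)]


# Hypothetical coordinates, for illustration only.
genes = [Interval("chr1", 100, 500)]
mrna = [Interval("chr1", 450, 600),
        Interval("chr1", 700, 900),
        Interval("chr2", 100, 500)]
print(unannotated(mrna, genes))
```

Swapping the two arguments answers the opposite question: which annotated genes have no aligned mRNA supporting them.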
44
Galaxy is very flexible
Runs locally
Runs on network
Runs on cluster
Runs in cloud
Runs on cluster in cloud
45
Galaxy has some prerequisites
You know what you want to do
You generally know how to do it
You know what the data are that you need
You know how to ensure the results are correct
Galaxy abstracts away the complexity of the implementation steps