….. The cloud The cluster…... What is “the cloud”? 1.Many computers “in the sky” 2.A service “in the sky” 3.Sometimes #1 and #2.


1 ….. The cloud The cluster…..

2

3 What is “the cloud”?
1. Many computers “in the sky”
2. A service “in the sky”
3. Sometimes #1 and #2

4 Basically virtual computers…

5

6 To you.

7 What is a virtual computer?

8 What is a “regular” computer? [Figure: one machine with Core 1, Core 2, Core 3, Core 4 and 8 GB of RAM]

9 [Figure: Core 1, Core 2, Core 3, Core 4; 8 GB]

10 [Figure labels: transcript assembly; mrbayes – model 1; mrbayes – model 2]

11 But it’s even cooler than that. You can have it your way!
– Each machine can be set up just like your computer (programs, settings, etc.)
– Different machines for different tasks
– Or one large machine for all tasks
– Caveat – pretty much command line only

12 Momentary Digression
What is the command line?
– Text-based means of interacting with your computer
– More commonly used on OS X or Linux
– Fast
– Somewhat obtuse

13

14

15

16 So, why, again, is this helpful? The Cloud can make similar resources available at a fraction of their overall cost. It’s essentially “on-demand” computing power. 48 Cores, 256 GB RAM = $33,500

17 Benefits of The Cloud
Pay by the hour
Use what you need
No purchase/depreciation of equipment
Almost instant access to many resources
– If you need 1 node, no problem
– If you need 500 nodes, no problem

18 Costs of The Cloud
Few safety nets
– With flexibility comes the power to do wrong
Interactions can be complex
– Requires proficiency in seemingly arcane tools (the CLI)
Can be expensive
Must rely on “others”

19

20 68.4 GB RAM 8 Cores

21 $2.00/hr.
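As a rough illustration of the prices quoted on these slides, the sketch below works out how many rented hours add up to the $33,500 purchase price at $2.00/hr. The function name `break_even_hours` is made up for this example. Note that the two figures describe different machines (a purchased 48-core, 256 GB machine versus what is assumed here to be the 8-core, 68.4 GB rented instance), so this is an order-of-magnitude comparison only.

```python
def break_even_hours(purchase_price: float, hourly_rate: float) -> float:
    """Hours of rental at which total rental cost equals the purchase price."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return purchase_price / hourly_rate


# Figures taken from the slides; pairing them is an assumption of this sketch.
hours = break_even_hours(33_500, 2.00)
print(f"{hours:.0f} hours, about {hours / 24:.0f} days of continuous use")
# prints "16750 hours, about 698 days of continuous use"
```

The calculation ignores power, cooling, storage, and data-transfer charges, all of which would shift the break-even point.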

22 Why would you use this?
Data pre-processing
– Read trimming, adapter trimming
Genome assembly
Long-running processes that tie up machines
– mrbayes, raxml, best
– alignments (blast, blat, lastz, bwa)

23 Practical example
De novo genome assembly
– Have many reads
– Need to put them together
– Generally RAM intensive
– Generally slow

24 Actual example
Start an Amazon EC2 “instance”
Add in necessary software
Add 454 assembly software
Get data to machine
Start assembly
Let it run
Download assembled data

25 [Figure labels: Reads; Align and orient; Assemble]

26 Why is this hard?
Must ensure correct ends overlap
Must put correct pieces together
Must do this quickly
– Do things in RAM/memory
Must deal with massive amounts of data
– 0.5 to 2 to 20 GB or more
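To make "correct ends overlap" concrete, here is a toy sketch of the core idea: find the longest suffix of one read that matches a prefix of another, then join the two. This is not how the 454 assembly software works internally; the function names `overlap` and `merge` and the example reads are invented for illustration, and real assemblers must also handle sequencing errors, reverse complements, and repeats.

```python
from typing import Optional


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def merge(a: str, b: str, min_len: int = 3) -> Optional[str]:
    """Join two reads on their overlap, or return None if they do not overlap."""
    n = overlap(a, b, min_len)
    return a + b[n:] if n else None


# Hypothetical reads: the last four bases of the first match the start of the second.
print(merge("ACGTTGCA", "TGCAGGT"))  # prints "ACGTTGCAGGT"
```

Comparing every read against every other read this way grows quadratically with the number of reads, which is one reason assembly is slow and memory-hungry.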

27 What, exactly, is a “cluster”?
Group of machines interacting to achieve a common goal

28 Clusters: 1000 Work Units

29 125 Work Units: ~8X speedup, or 1/8th the time
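The numbers on these two slides can be reproduced with a short calculation. The sketch below assumes 8 machines (inferred from 1000 / 125, not stated on the slides) and assumes every work unit takes the same time and needs no communication with the others; `units_per_node` is a name made up for this example.

```python
from typing import List


def units_per_node(total_units: int, nodes: int) -> List[int]:
    """Split `total_units` as evenly as possible across `nodes` machines."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    base, extra = divmod(total_units, nodes)
    # The first `extra` machines take one additional unit each.
    return [base + 1 if i < extra else base for i in range(nodes)]


shares = units_per_node(1000, 8)
# The job finishes when the busiest machine finishes.
speedup = 1000 / max(shares)
print(shares[0], speedup)  # prints "125 8.0"
```

In practice the speedup is usually somewhat below this ideal figure, because of start-up time, data transfer, and uneven work units.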

30 Why?
Very long running processes/complex jobs
– Genome:Genome alignments
– Substitution models for thousands of loci
– Species trees for thousands of loci
Sometimes the only way to accomplish a “genome-scale” job in a reasonable time frame

31 Practical example chr1 Similar

32 Practical example chr1 chr2 chr3 chr4

33 Practical example chr1 chr2 chr3 chr4 chr1 chr2 chr3 chr4

34 Practical example chr1 chr2 chr3 chr4 chr1 chr2 chr3 chr4
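One plausible reading of the chromosome figures above is that a genome-to-genome alignment is broken into independent chromosome-versus-chromosome jobs, each of which can go to a different cluster node. The sketch below only enumerates those jobs; it is an assumption about what the figures show, and `pairwise_jobs` is a hypothetical helper, not part of any alignment tool.

```python
from itertools import product
from typing import List, Sequence, Tuple


def pairwise_jobs(genome_a: Sequence[str], genome_b: Sequence[str]) -> List[Tuple[str, str]]:
    """List every (chromosome of A, chromosome of B) pair as a separate job."""
    return list(product(genome_a, genome_b))


chroms = ["chr1", "chr2", "chr3", "chr4"]
jobs = pairwise_jobs(chroms, chroms)
print(len(jobs))  # prints "16"
print(jobs[:2])   # prints "[('chr1', 'chr1'), ('chr1', 'chr2')]"
```

Because no job depends on another, 16 nodes could in principle run all 16 alignments at once.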

35 Cluster Caveats
Sometimes not suited to certain jobs
– Essentially those without component parts
– Some modeling (e.g. MCMC)
Complex
– More moving parts = more to break

36 Clusters in the Cloud
You have a big, complicated job
You need many computers for a job
You need to run the job infrequently
You don’t have massive computer resources

37 http://web.mit.edu/star

38 The Cloud as a service
Alternative meaning of The Cloud
Essentially web-powered software
“Galaxy” is one such service

39 http://galaxy.psu.edu

40 Galaxy
Very powerful analyses
Relatively simple to use
Repeatable
Understandable
Extendable

41 Galaxy – Basic services
Convert fastq to fasta
Summarize fastq reads
Fasta + Qual to Fastq
Trim fastq reads
Merge data sets
Convert SFF
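For a sense of what the simplest of these services does, here is a minimal sketch of a fastq-to-fasta conversion. It is not Galaxy's implementation; `fastq_to_fasta` is a name chosen for this example, and it assumes plain four-line FASTQ records (header, sequence, `+` line, qualities) with no line wrapping.

```python
def fastq_to_fasta(fastq_text: str) -> str:
    """Convert four-line FASTQ records to FASTA, dropping the quality lines."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    out = []
    for i in range(0, len(lines), 4):
        header, sequence = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"record header does not start with '@': {header!r}")
        # FASTA headers start with '>' instead of '@'.
        out.append(f">{header[1:]}\n{sequence}\n")
    return "".join(out)


# Hypothetical two-read input.
example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nIIII\n"
print(fastq_to_fasta(example), end="")
```

The output for the example is two FASTA records, `>read1` followed by `ACGT` and `>read2` followed by `GGCA`.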

42 Galaxy – Advanced services
Intersect genomic regions
Merge genomic regions
Map with bowtie
Map with bwa
Use bwa to identify variants
Convert genome coordinates

43 Actual example
Finding “missing” genes
– You have a genome sequence
– You have gene annotation (i.e., RefSeq)
– You have aligned mRNA data
– You want to know where these do not overlap
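The "where these do not overlap" step is an interval comparison. The sketch below is one way to express it outside Galaxy, under simplifying assumptions: all intervals are on a single chromosome and are half-open `(start, end)` pairs. The names `overlaps` and `unannotated` and the coordinates are invented for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open: start included, end excluded


def overlaps(a: Interval, b: Interval) -> bool:
    """True when the two intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def unannotated(mrna_hits: List[Interval], genes: List[Interval]) -> List[Interval]:
    """Return aligned mRNA intervals that overlap no annotated gene."""
    return [hit for hit in mrna_hits if not any(overlaps(hit, g) for g in genes)]


# Hypothetical coordinates.
genes = [(100, 500), (900, 1200)]
mrna_hits = [(150, 300), (600, 700), (1190, 1300)]
print(unannotated(mrna_hits, genes))  # prints "[(600, 700)]"
```

The remaining interval marks a region with mRNA evidence but no annotation, that is, a candidate "missing" gene. For genome-scale data a sorted sweep or an interval tree would replace the nested loop.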

44 Galaxy is very flexible
Runs locally
Runs on network
Runs on cluster
Runs in cloud
Runs on cluster in cloud

45 Galaxy has some pre-requisites
You know what you want to do
You generally know how to do it
You know what the data are that you need
You know how to ensure the results are correct
Galaxy abstracts away the complexity of the implementation steps

