Slide 1: Running Scientific Workflow Applications on the Amazon EC2 Cloud
Bruce Berriman, NASA Exoplanet Science Institute, IPAC
Gideon Juve, Ewa Deelman, Karan Vahi, Gaurang Mehta, Information Sciences Institute, USC
Benjamin Berman, USC Epigenome Center
Phil Maechling, Southern California Earthquake Center
Slide 2: Clouds (Utility Computing)
- Pay for what you use, rather than purchasing compute and storage resources that end up underutilized; analogous to household utilities
- Originated in the business domain, to provide services for small companies that did not want to maintain an IT department
- Provided by data centers built on compute and storage virtualization technologies, using commodity hardware
- Clouds are a "new purchasing paradigm" rather than a new technology
Slide 3: Benefits and Concerns
Benefits:
- Pay only for what you need
- Elasticity: increase or decrease capacity within minutes
- Ease strain on the local physical plant
- Control local system administration costs
Concerns:
- What if providers become oversubscribed and users cannot increase capacity on demand?
- How will the cost structure change with time?
- If we become dependent on them, will we be at the cloud providers' mercy?
- Are clouds secure?
- Are they up to the demands of science applications?
Slide 4: Cloud Providers
Pricing structures vary widely:
- Amazon EC2 charges for hourly usage
- Skytap charges per month
- IBM requires an annual subscription
- Savvis offers servers for purchase
Uses:
- Running business applications
- Web hosting
- Providing additional capacity for heavy loads
- Application testing
Providers: Amazon.com EC2, AT&T Synaptic Hosting, GNi Dedicated Hosting, IBM Computing on Demand, Rackspace Cloud Servers, Savvis Open Cloud, ServePath GoGrid, Skytap Virtual Lab, 3Tera, Unisys Secure, Verizon Computing, Zimory Gateway.
(Source: InformationWeek, 9/4/09)
Slide 5: Purposes of Our Study
How useful is cloud computing for scientific workflow applications?
- An experimental study of the performance of three workflows with different I/O, memory, and CPU requirements on a commercial cloud
- A comparison of the performance of cloud resources and typical HPC resources
- An analysis of the various costs associated with running workflows on a commercial cloud
Clouds are well suited to processing workflows:
- Workflows are loosely coupled applications composed of tasks connected by data
- Resources can be allocated as needed for processing tasks, decreasing scheduling overheads
We chose the Amazon EC2 cloud and the NCSA Abe cluster:
- http://aws.amazon.com/ec2/
- http://www.ncsa.illinois.edu/UserInfo/Resources/Hardware/Intel64Cluster/
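The "tasks connected by data" model is easy to make concrete. Below is a minimal sketch, not taken from the study: a task becomes runnable as soon as all of its input files exist, which is what lets a workflow engine allocate resources task by task. The task and file names are invented.

```python
# A minimal sketch (not from the paper) of a workflow as tasks connected
# by the data files they consume and produce. Names are hypothetical.
workflow = {
    "reproject": {"inputs": ["raw.fits"],  "outputs": ["proj.fits"]},
    "rectify":   {"inputs": ["proj.fits"], "outputs": ["rect.fits"]},
    "coadd":     {"inputs": ["rect.fits"], "outputs": ["mosaic.fits"]},
}

def ready_tasks(available_files, workflow):
    """Tasks whose inputs all exist can be scheduled, possibly in parallel."""
    return [name for name, spec in workflow.items()
            if all(f in available_files for f in spec["inputs"])]

print(ready_tasks({"raw.fits"}, workflow))  # -> ['reproject']
```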
Slide 6: The Applications: Montage
Montage is a toolkit for assembling FITS images into science-grade mosaics.
http://montage.ipac.caltech.edu
Processing flow: reprojection, background rectification, co-addition.
- Science grade: preserves the spatial and calibration fidelity of the input images
- Portable: runs on all common *nix platforms
- Open-source code
- General: supports all common coordinate systems and image projections
- Speed: processes 40 million pixels in 32 minutes on 128 nodes of a 1.2 GHz Linux cluster
- Utilities for managing and manipulating image files
- Stand-alone modules
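To make the three-stage flow concrete, here is a hedged sketch of driving it from Python. The module names (mImgtbl, mProjExec, mOverlaps, mDiffExec, mFitExec, mBgModel, mBgExec, mAdd) are real Montage tools, but the argument lists follow common tutorial usage and all directory and file names are placeholders; verify the exact invocations against the Montage documentation before use.

```python
import subprocess

# A hedged sketch of Montage's three stages driven from Python; argument
# lists are simplified tutorial-style placeholders, not verified commands.
def run(cmd):
    print(">", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["mImgtbl", "rawdir", "images.tbl"])                 # catalog the input images
run(["mProjExec", "-p", "rawdir", "images.tbl",
     "template.hdr", "projdir", "stats.tbl"])            # 1. reprojection
run(["mImgtbl", "projdir", "pimages.tbl"])
run(["mOverlaps", "pimages.tbl", "diffs.tbl"])           # find overlapping pairs
run(["mDiffExec", "-p", "projdir", "diffs.tbl",
     "template.hdr", "diffdir"])
run(["mFitExec", "diffs.tbl", "fits.tbl", "diffdir"])
run(["mBgModel", "pimages.tbl", "fits.tbl",
     "corrections.tbl"])                                 # 2. background rectification
run(["mBgExec", "-p", "projdir", "pimages.tbl",
     "corrections.tbl", "corrdir"])
run(["mAdd", "-p", "corrdir", "pimages.tbl",
     "template.hdr", "mosaic.fits"])                     # 3. co-addition
```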
Slide 7: The Applications: Broadband and Epigenome
Broadband simulates and compares seismograms from earthquake simulation codes:
- Generates high- and low-frequency seismograms for several earthquake sources
- Computes the intensities of the seismograms at measuring stations
Epigenome maps short DNA segments collected with high-throughput gene-sequencing machines to a reference genome:
- Maps chunks of sequence data to the reference genome
- Produces an output map of gene density compared with the reference genome
Slide 8: Comparison of Resource Usage: Montage
- Ran an 8-deg-sq mosaic of M17 in the 2MASS J band
- The workflow contains 10,429 tasks, reads 4.2 GB of input data, and produces 7.9 GB of output data
- Montage is I/O-bound: it spends more than 95% of its time in I/O operations

Application | I/O    | Memory | CPU
Montage     | High   | Low    | Low
Broadband   | Medium | High   | Medium
Epigenome   | Low    | Medium | High
Slide 9: Comparison of Resource Usage: Broadband and Epigenome
Broadband (4 sources, 5 stations):
- The workflow contains 320 tasks, 6 GB of input data, and 160 MB of output data
- Memory-limited: more than 75% of its runtime is consumed by tasks requiring more than 1 GB of physical memory
Epigenome:
- The workflow contains 81 tasks, 1.8 GB of input data, and 300 MB of output data
- CPU-bound: it spends 99% of its runtime in the CPU and only 1% on I/O and other activities
(Resource-usage table as on Slide 8.)
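The characterization above boils down to a few thresholds quoted on these slides. A sketch, with illustrative sample numbers:

```python
# Classify an application from measured runtime fractions, using the
# thresholds quoted on Slides 8-9. The sample inputs are illustrative.
def classify(io_frac, cpu_frac, frac_runtime_over_1gb):
    if io_frac > 0.95:
        return "I/O-bound"          # Montage: >95% of time in I/O
    if frac_runtime_over_1gb > 0.75:
        return "memory-limited"     # Broadband: >75% of runtime in >1 GB tasks
    if cpu_frac >= 0.99:
        return "CPU-bound"          # Epigenome: 99% of runtime in the CPU
    return "mixed"

print(classify(io_frac=0.96, cpu_frac=0.03, frac_runtime_over_1gb=0.1))
# -> I/O-bound
```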
Slide 10: Processing Resources
Networks and file systems:
- HPC systems use high-performance networks and parallel file systems, but Amazon EC2 uses commodity hardware
- Ran all processes on single, multi-core nodes
- Used both the local and the parallel (Lustre) file system on Abe
Processors and OS:
- Red Hat Enterprise Linux with VMWare
- Amazon EC2 offers different instance types: compare cost vs. performance
- c1.xlarge and abe.local are equivalent, which lets us estimate the overhead due to virtualization
- abe.lustre and abe.local differ only in file system

Type       | Arch   | CPU                 | Cores | Memory | Network            | Storage | Price
m1.small   | 32-bit | 2.0-2.6 GHz Opteron | 1/2   | 1.7 GB | 1-Gbps Ethernet    | Local   | $0.10/hr
m1.large   | 64-bit | 2.0-2.6 GHz Opteron | 2     | 7.5 GB | 1-Gbps Ethernet    | Local   | $0.40/hr
m1.xlarge  | 64-bit | 2.0-2.6 GHz Opteron | 4     | 15 GB  | 1-Gbps Ethernet    | Local   | $0.80/hr
c1.medium  | 32-bit | 2.33-2.66 GHz Xeon  | 2     | 1.7 GB | 1-Gbps Ethernet    | Local   | $0.20/hr
c1.xlarge  | 64-bit | 2.0-2.66 GHz Xeon   | 8     | 7.5 GB | 1-Gbps Ethernet    | Local   | $0.80/hr
abe.local  | 64-bit | 2.33 GHz Xeon       | 8     | 8 GB   | 10-Gbps InfiniBand | Local   | …
abe.lustre | 64-bit | 2.33 GHz Xeon       | 8     | 8 GB   | 10-Gbps InfiniBand | Lustre  | …
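Because c1.xlarge and abe.local are equivalent hardware, the overhead due to virtualization is just the relative slowdown between them. A sketch; the runtimes below are hypothetical placeholders, not measurements from the study:

```python
# Virtualization overhead estimated as the relative slowdown of the
# virtualized instance (c1.xlarge) versus equivalent physical hardware
# (abe.local). Runtimes here are hypothetical placeholders.
def virtualization_overhead_pct(t_virtualized, t_physical):
    return 100.0 * (t_virtualized - t_physical) / t_physical

print(f"{virtualization_overhead_pct(110.0, 100.0):.0f}% overhead")  # -> 10%
```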
Slide 11: Execution Environment
- Established equivalent software environments on the two platforms
- A "submit" host sends jobs to EC2 or Abe
- All workflows used the Pegasus Workflow Management System with DAGMan and Condor:
  - Pegasus transforms abstract workflow descriptions into concrete plans
  - DAGMan manages dependencies
  - Condor manages task execution
(Diagram: the submit host dispatching jobs to Amazon EC2 and Abe.)
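For illustration, here is a minimal sketch of an abstract workflow description for Pegasus, assuming the classic Pegasus DAX3 Python API (Pegasus.DAX3); the jobs, files, and arguments are hypothetical, not the study's workflows.

```python
# A minimal abstract-workflow sketch, assuming the Pegasus DAX3 Python
# API; job and file names are hypothetical.
from Pegasus.DAX3 import ADAG, File, Job, Link

dax = ADAG("example")

raw, proj = File("raw.fits"), File("proj.fits")

reproject = Job(name="mProject")
reproject.addArguments("raw.fits", "proj.fits")
reproject.uses(raw, link=Link.INPUT)
reproject.uses(proj, link=Link.OUTPUT)
dax.addJob(reproject)

coadd = Job(name="mAdd")
coadd.uses(proj, link=Link.INPUT)
dax.addJob(coadd)

dax.depends(parent=reproject, child=coadd)  # the edge DAGMan will enforce

with open("example.dax", "w") as f:
    dax.writeXML(f)  # Pegasus plans this abstract DAX into a concrete DAG
```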
Slide 12: Montage Performance (I/O-Bound)
- Slowest on m1.small; fastest on the machines with the most cores: m1.xlarge, c1.xlarge, abe.lustre, and abe.local
- The parallel file system on abe.lustre offers a big performance advantage for I/O-bound applications; to compete, cloud providers would need to offer parallel file systems and high-speed networks
- Virtualization overhead is less than 10%
Slide 13: Broadband Performance (Memory-Bound)
- Lower I/O requirements, so there is not much difference between abe.lustre and abe.local; both have 8 GB of memory
- Only slightly worse performance on c1.xlarge, with 7.5 GB of memory
- Poor performance on c1.medium, with only 1.7 GB of memory: cores may sit idle to prevent the system from running out of memory
- Virtualization overhead is small
Slide 14: Epigenome Performance (CPU-Bound)
- c1.xlarge, abe.lustre, and abe.local give the best performance; they are the three most powerful machines (64-bit, 2.3-2.6 GHz)
- The parallel file system on abe.lustre offers little benefit
- Virtualization overhead is roughly 10%, the largest of the three applications, because the application competes with the OS for the CPU
Slide 15: Resource Cost Analysis
You get what you pay for: the cheapest instances are the least powerful.

Instance  | Cost $/hr
m1.small  | 0.10
m1.large  | 0.40
m1.xlarge | 0.80
c1.medium | 0.20
c1.xlarge | 0.80

c1.medium is a good choice for Montage, but the more powerful processors are better for the other two applications.
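Since EC2 billed by the hour, the cost of a run is the wall-clock time rounded up to whole hours times the instance price from the table above. A sketch; the 2.2-hour runtime is a hypothetical placeholder:

```python
import math

# Cost per workflow run: wall-clock hours rounded up (EC2 billed hourly)
# times the instance price from the table above. Runtime is hypothetical.
PRICE_PER_HR = {"m1.small": 0.10, "m1.large": 0.40, "m1.xlarge": 0.80,
                "c1.medium": 0.20, "c1.xlarge": 0.80}

def run_cost(instance, runtime_hours):
    return math.ceil(runtime_hours) * PRICE_PER_HR[instance]

for inst in ("c1.medium", "c1.xlarge"):
    print(inst, f"${run_cost(inst, runtime_hours=2.2):.2f}")
# -> c1.medium $0.60, c1.xlarge $2.40
```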
Slide 16: Data Transfer Costs

Operation    | Cost $/GB
Transfer In  | 0.10
Transfer Out | 0.17

- For Broadband and Epigenome, it is economical to transfer data out of the cloud
- For Montage, the output is larger than the input, so the cost of transferring data out equals or exceeds the processing cost on all but one instance type
- Is it more economical to store the data on the cloud?

Application | Input (GB) | Output (GB) | Logs (MB)
Montage     | 4.2        | 7.9         | 40
Broadband   | 4.1        | 0.16        | 5.5
Epigenome   | 1.8        | 0.3         | 3.3

Application | Input | Output | Logs   | Total
Montage     | $0.42 | $1.32  | <$0.01 | $1.75
Broadband   | $0.40 | $0.03  | <$0.01 | $0.43
Epigenome   | $0.18 | $0.05  | <$0.01 | $0.23
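The cost table is just volume times rate; the sketch below reproduces it from the $/GB rates and data volumes above, matching the slide's figures to within a couple of cents of rounding:

```python
# Transfer costs = data volume (GB) x $/GB rate, from the tables above.
RATE_IN, RATE_OUT = 0.10, 0.17  # $/GB

volumes = {  # (input GB, output GB)
    "Montage":   (4.2, 7.9),
    "Broadband": (4.1, 0.16),
    "Epigenome": (1.8, 0.3),
}
for app, (gb_in, gb_out) in volumes.items():
    print(f"{app}: in ${gb_in * RATE_IN:.2f}, out ${gb_out * RATE_OUT:.2f}")
```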
Slide 17: Storage Costs

Storage charges:
Item                        | Charge
Storage of VM images in S3  | $0.15/GB-month
Storage of data on EBS disk | $0.10/GB-month

Storage cost of output per job:
Application | Data ($) | VM ($) | Monthly Cost ($)
Montage     | 0.95     | 0.12   | 1.07
Broadband   | 0.02     | 0.10   | 0.12
Epigenome   | 0.20     | 0.10   | 0.32

… and the bottom line:
Item             | Low Cost ($) | High Cost ($)
Transfer Data In | 0.42         | 0.42
Processing       | 0.55         | 2.45
Storage          | 1.07         | 1.07
Transfer Out     | …            | 1.32
Totals           | 2.04         | 5.22
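Monthly storage cost is data on an EBS volume plus the VM image in S3, at the rates above. In this sketch the ~9.5 GB data volume and ~0.8 GB VM image size are assumptions back-computed from the Montage row, not figures quoted on the slide:

```python
# Monthly storage = data on EBS + VM image in S3, at the rates above.
# The volume sizes are assumptions back-computed from the Montage row.
EBS_RATE, S3_RATE = 0.10, 0.15  # $/GB-month

def monthly_storage(data_gb, vm_image_gb):
    return data_gb * EBS_RATE + vm_image_gb * S3_RATE

print(f"${monthly_storage(data_gb=9.5, vm_image_gb=0.8):.2f}/month")  # ~ $1.07
```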
Slide 18: Most Cost-Effective Model?
Local hardware for comparison:
- 15 Dell PowerEdge 2650 servers (3.2 GHz dual-processor, dual-core Xeon)
- Aberdeen Technologies 6-TB staging disk farm
- Dell PowerVault MD1200 storage disks
Assumptions: 1,000 2MASS mosaics of 4 deg sq centered on M17 per month for 3 years; c1.medium instances on Amazon EC2.

Item              | Transfer Inputs Per Job ($) | Store 2MASS on EC2 ($) | IPAC Service ($)
Transfer In       | 7,560                       | 3,780                  | …
Store input data  | 17,100                      | 61,500                 | 13,200
Processing        | 9,000                       | 9,000                  | 66,000
Transfer Data Out | 25,560                      | 25,560                 | …
Cost $/job        | 1.65                        | 2.75                   | 2.20
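The per-job figures follow by dividing each scenario's three-year line items by the number of mosaics (1,000 per month for 36 months). A check of the first column:

```python
# Per-job cost: three-year line items divided by the number of mosaics.
JOBS = 1000 * 36  # 1,000 mosaics/month for 36 months

# "Transfer inputs per job" column: transfer in + storage + processing
# + transfer out, from the table above.
total = 7_560 + 17_100 + 9_000 + 25_560

print(f"${total / JOBS:.3f}/job")  # -> $1.645, i.e. the table's $1.65
```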
Slide 19: Conclusions
- Clouds can be used effectively and fairly efficiently for scientific applications; the virtualization overhead is low
- High-speed networks and parallel file systems give HPC clusters a significant performance advantage over cloud computing for I/O-bound applications
- On Amazon EC2, data transfer is the primary cost for Montage; processing is the primary cost for Broadband and Epigenome
- Amazon EC2 offers no dramatic cost benefit over a locally hosted image-mosaic service

Reference: G. Juve, E. Deelman, K. Vahi, G. Mehta, B. Berriman, B. P. Berman, and P. Maechling, "Scientific Workflow Applications on Amazon EC2," Cloud Computing Workshop in conjunction with e-Science, Oxford, UK: IEEE, 2009.