Evaluating Caching and Storage Options on the Amazon Web Services Cloud Gagan Agrawal, Ohio State University - Columbus, OH David Chiu, Washington State University - Vancouver, WA Presented by Smita Vijayakumar, Juniper Networks
2 Outline Introduction to Cloud Computing Background on AWS and Motivation Cost and Performance Evaluation Conclusion
3 Cloud Computing Paradigm Cloud Utility Providers: Amazon AWS, Azure, Cloudera, Google App Engine Consumers: Companies, labs, schools, et al.
6 Cloud Computing Paradigm Algorithms & Data Cloud Utility Providers: Amazon AWS, Azure, Cloudera, Google App Engine Consumers: Companies, labs, schools, et al. Processed Results
7 Promises of Cloud Computing: Allows us to consolidate machines and outsource computation and storage. Pay-as-you-go computing. Infinite compute resources and storage.
9 A Motivating Example: A service-oriented system that answers queries from a similar domain. Intermediate and final results can be cached and reused for future queries. Often present in workflow applications.
10 Data-Intensive Applications: High-energy physics, bioinformatics, data mining, geoinformatics. Good uses of the Cloud.
11 Storage Requirements for Application Data: Need for data storage: each stage of a workflow application can store many GBs of data; streaming applications require fast and vast storage for efficient analysis. Need for caching.
12 Leveraging the Cloud for Storage: Store and cache intermediate and final results in the Cloud. The Cloud has many options for data storage: memory, disks, network disks, and highly available persistent storage. There are several tradeoffs in each option.
13 Amazon Web Services (AWS): A case study. AWS has emerged as one of the most widely used Cloud platforms. We consider caching and storage performance in three AWS services: Elastic Compute Cloud (EC2) machine instances, Simple Storage Service (S3), and Elastic Block Store (EBS).
14 AWS Services: EC2. Elastic Compute Cloud (EC2): access to virtualized machines with varying capabilities (e.g., CPU cores, memory, disk space) depending on price.
Instance Type   CPU                                       Memory   Disk    I/O
Small           1 virtual core                            1.7GB    160GB   medium
XLarge          4 virtual cores (x 2 compute units ea)    15.0GB   1.7TB   high
15 AWS Services: EBS. Elastic Block Store (EBS): persisted network disks. A volume must be mounted onto an EC2 machine before use. Users must initially specify a fixed size and format the volume with an appropriate file system.
16 AWS Services: S3. Simple Storage Service (S3): simple FTP-style API (GET, PUT, etc.). Highly available, reliable, and durable storage (but slower). Infinite capacity. Does not need to be used with EC2 machines. Very inexpensive.
17 Costs of AWS Services
18 Tradeoffs Per Application and Service: Caching in-core (EC2-Memory). Fast, but expensive. Small; may need extra logic to coordinate a set of EC2 nodes. Data is volatile.
19 Tradeoffs Per Application and Service: Caching on local disk (EC2-Disk). Much slower than memory. Much more space. Data is still volatile.
20 Tradeoffs Per Application and Service: Caching on Elastic Block Store (EC2-EBS). Possibly slower than disk. Volume size is initially configured by application users. Data is persisted.
21 Tradeoffs Per Application and Service: Caching on S3. Slowest option, but most reliable. No bound on size. Data is persisted.
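The four options on slides 18 to 21 can be summarized as a small lookup table. The following is a minimal sketch, not part of the study: the class and field names are illustrative assumptions, and `speed_rank` only encodes the qualitative ordering from the slides (memory fastest, S3 slowest, EBS only "possibly" slower than local disk).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheOption:
    name: str
    speed_rank: int   # 1 = fastest; qualitative ordering taken from the slides
    persistent: bool  # True if cached data is persisted rather than volatile
    size_note: str    # capacity remark from the slides


# Hypothetical table built from slides 18-21.
OPTIONS = [
    CacheOption("EC2-Memory", 1, False, "small; may need coordination across nodes"),
    CacheOption("EC2-Disk", 2, False, "much more space than memory"),
    CacheOption("EC2-EBS", 3, True, "volume size fixed up front by the user"),
    CacheOption("S3", 4, True, "no bound on size"),
]


def persistent_options(options=OPTIONS):
    """Return the names of the persistent options, fastest first."""
    ranked = sorted(options, key=lambda o: o.speed_rank)
    return [o.name for o in ranked if o.persistent]


if __name__ == "__main__":
    print(persistent_options())
```

A table like this makes the first filtering step explicit: if persistence is required, only EBS and S3 remain, and the choice between them becomes a cost and performance question.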
23 Experiments: We compare performance and cost tradeoffs across these AWS options: caching in-core (Small and XLarge instances, not persistent); caching on-disk (Small and XLarge instances, not persistent); caching on EBS (Small and XLarge instances, persistent); caching in S3 (persistent).
24 Experimental Application: Geospatial application: land elevation change. In general, two large matrices (DEM files) are retrieved, and their difference is returned. 500 unique requests. Requests are issued randomly. Eviction is not considered (we assume the cache/storage configuration is used to store all results).
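The request pattern described on this slide can be sketched as follows. This is an illustrative toy, not the authors' code: `load_dem` is a hypothetical stand-in for retrieving a DEM file, the matrices are tiny lists of rows, and the total request count and random seed are arbitrary assumptions. The cache is an unbounded in-process dictionary, which mirrors only the "no eviction" assumption and none of the AWS storage tiers.

```python
import random


def elevation_change(dem_a, dem_b):
    """Element-wise difference (dem_b - dem_a) of two equally sized matrices."""
    return [[b - a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(dem_a, dem_b)]


class ResultCache:
    """Unbounded result cache: nothing is ever evicted."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compute()
        return self._store[key]


def load_dem(request_id, epoch):
    """Hypothetical stand-in for fetching a DEM file; returns a 2x2 matrix."""
    base = request_id + 10 * epoch
    return [[base, base + 1], [base + 2, base + 3]]


def run(num_unique=500, num_requests=2000, seed=0):
    """Issue random requests over num_unique distinct queries."""
    rng = random.Random(seed)
    cache = ResultCache()
    for _ in range(num_requests):
        rid = rng.randrange(num_unique)  # requests are issued randomly
        cache.get_or_compute(
            rid, lambda: elevation_change(load_dem(rid, 0), load_dem(rid, 1)))
    return cache


if __name__ == "__main__":
    c = run()
    print(f"hits={c.hits} misses={c.misses}")
```

Because every result is kept, the number of misses can never exceed the number of unique requests; all further requests are served from the cache.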
25 Performance: We use 4 different DEM data sizes to test performance: 1KB, 1MB, 5MB, and 50MB. With 500 unique requests, a full cache would hold 500KB, 500MB, 2.5GB, and 25GB, respectively.
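The full-cache sizes follow from multiplying each result size by the 500 unique requests. A short check of that arithmetic, assuming decimal units (1 KB = 1000 bytes); the function names are illustrative:

```python
NUM_UNIQUE_REQUESTS = 500

# Result sizes from the slide, in bytes (decimal units assumed).
RESULT_SIZES = {
    "1KB": 10**3,
    "1MB": 10**6,
    "5MB": 5 * 10**6,
    "50MB": 50 * 10**6,
}


def full_cache_bytes(result_bytes, num_results=NUM_UNIQUE_REQUESTS):
    """Size of a cache that holds one result per unique request."""
    return result_bytes * num_results


def human(n_bytes):
    """Format a byte count using the largest decimal unit that fits."""
    for unit, factor in (("GB", 10**9), ("MB", 10**6), ("KB", 10**3)):
        if n_bytes >= factor:
            return f"{n_bytes / factor:g}{unit}"
    return f"{n_bytes}B"


if __name__ == "__main__":
    for label, size in RESULT_SIZES.items():
        print(label, "->", human(full_cache_bytes(size)))
```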
26 1KB DEM Size
27 1MB DEM Size
28 5MB DEM Size
29 50MB DEM Size
30 Cost Analysis: We next assess the costs versus the performance. Performance is measured as relative speedup over the baseline DEM process execution, shown in Table 2. We project costs and speedup over 2000 and requests
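One plausible reading of the metric used in the cost slides is sketched below. It assumes that relative speedup is the baseline execution time divided by the time with the cache in place, and that "cost per unit speedup" is the monthly cost divided by that speedup. Both definitions and all numbers in the example are assumptions for illustration, not values from the study.

```python
def speedup(baseline_seconds, cached_seconds):
    """Relative speedup over the baseline execution (assumed definition)."""
    if baseline_seconds <= 0 or cached_seconds <= 0:
        raise ValueError("times must be positive")
    return baseline_seconds / cached_seconds


def cost_per_unit_speedup(monthly_cost_usd, baseline_seconds, cached_seconds):
    """Monthly cost divided by relative speedup (assumed definition)."""
    return monthly_cost_usd / speedup(baseline_seconds, cached_seconds)


if __name__ == "__main__":
    # Hypothetical numbers: $100 per month, 20 s baseline, 2 s with the cache.
    print(speedup(20.0, 2.0))
    print(cost_per_unit_speedup(100.0, 20.0, 2.0))
```

Under this reading, a configuration with a higher monthly bill can still be the better choice if its speedup grows faster than its cost.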
31 Monthly Costs for Volatile Cache (1MB). [Figure: panels "I/O Requests outside of AWS" and "2000 I/O Requests outside of AWS"; axis: Speedup.] Cost per unit speedup is low when requests are high. I/O costs are still low because of the small data size.
32 Monthly Costs for Volatile Cache (50MB). [Figure: panels "I/O Requests outside of AWS" and "2000 I/O Requests outside of AWS"; axis: Speedup.] Costs are now dominated by I/O due to the large data size. In terms of performance, it makes more sense to use XLarge for large data sizes. The Small instance makes better economic sense for a small number of requests.
33 Monthly Costs for Persistent Cache (1MB). [Figure: panels "I/O Requests outside of AWS" and "2000 I/O Requests outside of AWS"; axis: Speedup.] S3 makes better economic sense than EBS-based instances. S3 performance is comparable for a cache with small I/O requests.
34 Monthly Costs for Persistent Cache (50MB). [Figure: panels "I/O Requests outside of AWS" and "2000 I/O Requests outside of AWS"; axis: Speedup.] Interesting: even with the low cost of S3, it still makes sense to use XLarge when I/O requests are high. S3 is still comparable, and makes better economic sense than EBS-based instances.
36 Summary (1): For smaller data (<= 5MB): if the request rate is low, use a Small instance on-disk; if the request rate is high, use a Small instance in-memory. Although I/O is slow, the cost of using a Small instance is very low. If persistence is needed, use S3 and avoid EBS.
37 Summary (2): For larger data (>= 50MB and large cache sizes), use XLarge instances: higher I/O rates, larger memory and disk capacity. EBS may be considered in conjunction with XLarge instances for persistence. If performance is not an issue, but persistence and costs are, use S3.
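The two summary slides amount to a small decision procedure, which can be written down as a sketch. The function name, parameters, and returned strings are illustrative assumptions. Result sizes between 5MB and 50MB are not covered by the summary, so the sketch refuses to guess for them.

```python
def recommend(result_mb, high_request_rate, need_persistence,
              performance_matters=True):
    """Map the summary slides' guidance to a storage recommendation."""
    if result_mb <= 5:
        # Smaller data: Small instances are cheap; S3 is preferred over EBS.
        if need_persistence:
            return "S3"
        if high_request_rate:
            return "Small instance, in-memory"
        return "Small instance, on-disk"
    if result_mb >= 50:
        # Larger data: XLarge offers higher I/O rates and more capacity.
        if need_persistence:
            if performance_matters:
                return "XLarge instance with EBS"
            return "S3"
        return "XLarge instance"
    raise ValueError("result sizes between 5MB and 50MB are not covered")


if __name__ == "__main__":
    print(recommend(1, high_request_rate=False, need_persistence=False))
    print(recommend(50, high_request_rate=True, need_persistence=True))
```

Keeping `performance_matters` as a separate flag reflects the last bullet of the second summary slide: S3 is the recommendation for large data only when persistence and cost outweigh performance.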
38 Conclusion: The Cloud offers many viable options for data storage and caching. We evaluated the cost-performance tradeoffs of these options and derived a roadmap for making clear decisions on resource usage.
39 Thank you. Questions and comments? David Chiu, Gagan Agrawal