Evaluating Caching and Storage Options on the Amazon Web Services Cloud Gagan Agrawal, Ohio State University - Columbus, OH David Chiu, Washington State.

Evaluating Caching and Storage Options on the Amazon Web Services Cloud. Gagan Agrawal, Ohio State University, Columbus, OH; David Chiu, Washington State University, Vancouver, WA. Presented by Smita Vijayakumar, Juniper Networks.

2 Outline: Introduction to Cloud Computing; Background on AWS and Motivation; Cost and Performance Evaluation; Conclusion

3 Cloud Computing Paradigm: Cloud utility providers: Amazon AWS, Azure, Cloudera, Google App Engine. Consumers: companies, labs, schools, etc.

4 Cloud Computing Paradigm: Algorithms & Data. Cloud utility providers: Amazon AWS, Azure, Cloudera, Google App Engine. Consumers: companies, labs, schools, etc.

5 Cloud Computing Paradigm: Algorithms & Data. Cloud utility providers: Amazon AWS, Azure, Cloudera, Google App Engine. Consumers: companies, labs, schools, etc.

6 Cloud Computing Paradigm: Algorithms & Data. Cloud utility providers: Amazon AWS, Azure, Cloudera, Google App Engine. Consumers: companies, labs, schools, etc. Processed Results.

7 Promises of Cloud Computing: Allows us to consolidate machines and outsource computation and storage. Pay-as-you-go computing. Seemingly infinite compute resources and storage.

8 Outline: Introduction to Cloud Computing; Background on AWS and Motivation; Cost and Performance Evaluation; Conclusion

9 A Motivating Example: A service-oriented system that answers queries from a similar domain. Intermediate and final results can be cached and reused for future queries. This pattern is often present in workflow applications.

10 Data-Intensive Applications: High-Energy Physics, Bioinformatics, Data Mining, Geoinformatics. Good uses of the Cloud.

11 Storage Requirements for Application Data: Need for data storage: each stage of a workflow application can store many GBs of data, and streaming applications require fast, large-capacity storage for efficient analysis. Need for caching.

12 Leveraging the Cloud for Storage: Store and cache intermediate and final results in the Cloud. The Cloud has many options for data storage: memory, disks, network disks, and highly available persistent storage. Each option involves several tradeoffs.

13 Amazon Web Services (AWS), a Case Study: AWS has emerged as one of the most widely used Cloud platforms. We consider caching and storage performance in three AWS services: Elastic Compute Cloud (EC2) machine instances, Simple Storage Service (S3), and Elastic Block Store (EBS).

14 AWS Services: EC2. Elastic Compute Cloud (EC2): access to virtualized machines whose capabilities (e.g., CPU cores, memory, disk space) vary with price.
Instance Type | CPU | Memory | Disk | I/O
Small | 1 virtual core | 1.7GB | 160GB | Medium
XLarge | 4 virtual cores (2 compute units each) | 15.0GB | 1.7TB | High

15 AWS Services: EBS. Elastic Block Store (EBS): persistent network disks. A volume must be mounted onto an EC2 machine before use. Users must specify a fixed size up front and format the volume with an appropriate file system.

16 AWS Services: S3. Simple Storage Service (S3): simple FTP-style API (GET, PUT, etc.). Highly available, reliable, and durable storage (but slower). Virtually unlimited capacity. Not required to be used with EC2 machines. Very inexpensive.
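A minimal sketch of using S3 as a result cache through its GET/PUT interface, assuming a boto3 S3 client (`boto3.client("s3")`), whose `put_object` and `get_object` calls and `NoSuchKey` exception do exist. The class name `S3Cache`, the bucket name, and the `cache/` key prefix are hypothetical, not taken from the presentation. The client is passed in rather than created inside the class, so the sketch can also be exercised with a stand-in object.

```python
from typing import Optional


class S3Cache:
    """Hypothetical cache that stores each result as one S3 object.

    `client` is assumed to be a boto3 S3 client, or any object exposing
    the same put_object/get_object methods and exceptions.NoSuchKey.
    """

    def __init__(self, client, bucket: str, prefix: str = "cache/") -> None:
        self.client = client
        self.bucket = bucket
        self.prefix = prefix

    def put(self, key: str, value: bytes) -> None:
        # PUT: store the value as an object named prefix + key.
        self.client.put_object(Bucket=self.bucket, Key=self.prefix + key, Body=value)

    def get(self, key: str) -> Optional[bytes]:
        # GET: return the object's bytes, or None on a cache miss.
        try:
            response = self.client.get_object(Bucket=self.bucket, Key=self.prefix + key)
        except self.client.exceptions.NoSuchKey:
            return None
        return response["Body"].read()
```

Usage, assuming AWS credentials and an existing bucket (the bucket name here is made up): `cache = S3Cache(boto3.client("s3"), "my-dem-cache")`, then `cache.put("dem-a:dem-b", data)` and `cache.get("dem-a:dem-b")`. Because every call is a network request, this is the slowest of the options the slides compare, but the entries persist independently of any EC2 instance.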

17 Costs of AWS Services

18 Tradeoffs Per Application and Service: Caching in-core (EC2-Memory). Fast, but expensive. Small capacity; may need extra logic to coordinate a set of EC2 nodes. Data is volatile.
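The "extra logic to coordinate a set of EC2 nodes" can be as simple as hash-partitioning keys so that each node owns a subset of the cache. The sketch below is hypothetical (the class and method names are not from the presentation) and simulates each node's memory with a local dict; in a real deployment each partition would live in a separate instance's memory and be reached over the network.

```python
import hashlib
from typing import Dict, List, Optional


class PartitionedMemoryCache:
    """Hypothetical in-core cache spread across several nodes by key hash.

    Each entry in self.nodes is a local dict standing in for one EC2
    instance's memory. Contents are volatile: losing a node loses its keys.
    """

    def __init__(self, num_nodes: int) -> None:
        if num_nodes < 1:
            raise ValueError("num_nodes must be at least 1")
        self.nodes: List[Dict[str, bytes]] = [{} for _ in range(num_nodes)]

    def node_for(self, key: str) -> int:
        # Use a stable hash (not Python's per-process salted hash()) so that
        # every client maps a given key to the same node.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key: str, value: bytes) -> None:
        self.nodes[self.node_for(key)][key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self.nodes[self.node_for(key)].get(key)
```

Simple modulo partitioning remaps most keys whenever a node is added or removed; consistent hashing is a common alternative when the node set changes often.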

19 Tradeoffs Per Application and Service: Caching on local disk (EC2-Disk). Much slower than memory. Much more space. Data is still volatile.
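A minimal sketch of an on-disk cache, assuming results are byte strings and the cache directory (hypothetical) sits on the instance's local disk. Data on EC2 instance-store disks is lost when the instance is stopped or terminated, which is why this option is still volatile. Pointing the same class at a directory on a mounted EBS volume would make the entries persist. `DiskCache` is an illustrative name, not the authors' code.

```python
import hashlib
import os
from typing import Optional


class DiskCache:
    """Hypothetical cache that stores each value as a file in one directory."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hash the key so arbitrary query strings become safe file names.
        name = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, name)

    def put(self, key: str, value: bytes) -> None:
        # Write to a temporary file, then atomically rename it into place,
        # so readers never observe a partially written entry.
        path = self._path(key)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(value)
        os.replace(tmp, path)

    def get(self, key: str) -> Optional[bytes]:
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None  # cache miss
```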

20 Tradeoffs Per Application and Service: Caching on Elastic Block Store (EC2-EBS). Possibly slower than local disk. Volume size is initially configured by application users. Data is persisted.

21 Tradeoffs Per Application and Service: Caching on S3. Slowest option, but most reliable. No bound on size. Data is persisted.

22 Outline: Introduction to Cloud Computing; Background on AWS and Motivation; Cost and Performance Evaluation; Conclusion

23 Experiments: We compare the performance and cost tradeoffs of these AWS options: caching in-core (small and XLarge instances, not persistent); caching on-disk (small and XLarge instances, not persistent); caching on EBS (small and XLarge instances, persistent); and caching in S3 (persistent).

24 Experimental Application: Geospatial application: Land Elevation Change. In general, two large matrices (DEM, digital elevation model, files) are retrieved, and their difference is returned. 500 unique requests. Requests are issued randomly. Eviction is not considered (we assume the cache/storage configuration is used to store all results).
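A minimal sketch of the application's core computation with a cache in front of it, assuming DEMs arrive as equally sized grids (lists of lists). The `load_dem` callable, the `ElevationService` name, and the later-minus-earlier sign convention are assumptions for illustration. The cache is an unbounded dict, matching the experiment's assumption that nothing is evicted.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]


def elevation_change(earlier: Matrix, later: Matrix) -> Matrix:
    """Element-wise difference (later - earlier) of two equally sized DEM grids."""
    if len(earlier) != len(later) or any(
        len(ra) != len(rb) for ra, rb in zip(earlier, later)
    ):
        raise ValueError("DEM grids must have the same shape")
    return [[vb - va for va, vb in zip(ra, rb)] for ra, rb in zip(earlier, later)]


class ElevationService:
    """Hypothetical service answering (earlier, later) queries with a result cache."""

    def __init__(self, load_dem: Callable[[str], Matrix]) -> None:
        self.load_dem = load_dem  # hypothetical loader: DEM id -> grid
        self.cache: Dict[Tuple[str, str], Matrix] = {}
        self.misses = 0

    def query(self, earlier: str, later: str) -> Matrix:
        key = (earlier, later)
        if key in self.cache:  # hit: reuse the stored result
            return self.cache[key]
        self.misses += 1  # miss: retrieve both DEMs and compute
        result = elevation_change(self.load_dem(earlier), self.load_dem(later))
        self.cache[key] = result  # never evicted, as in the experiment
        return result
```

In the evaluated configurations, the dict would be replaced by whichever storage option is under test (memory, local disk, EBS, or S3).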

25 Performance: We use 4 different DEM data sizes to test performance: 1KB, 1MB, 5MB, and 50MB. With 500 unique requests, a full cache would therefore hold 500KB, 500MB, 2.5GB, and 25GB, respectively.
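The full-cache sizes follow from multiplying the 500 unique results by the per-result size. The sketch below reproduces that arithmetic in decimal units, as the slide uses, and adds a rough check against the single-instance memory listed in the EC2 table (1.7GB Small, 15.0GB XLarge). The check is an illustration only; it ignores operating-system and process overhead.

```python
UNIQUE_REQUESTS = 500

# Size of one cached result in KB (decimal units, as on the slide).
RESULT_SIZE_KB = {"1KB": 1, "1MB": 1_000, "5MB": 5_000, "50MB": 50_000}

# Memory per instance type in GB, from the EC2 instance table.
INSTANCE_MEMORY_GB = {"Small": 1.7, "XLarge": 15.0}


def full_cache_gb(result_size_kb: int, requests: int = UNIQUE_REQUESTS) -> float:
    """Total cache footprint once every unique request's result is stored."""
    return result_size_kb * requests / 1_000_000


if __name__ == "__main__":
    for label, size_kb in RESULT_SIZE_KB.items():
        total = full_cache_gb(size_kb)
        fits = [name for name, mem in INSTANCE_MEMORY_GB.items() if total <= mem]
        print(f"{label}: {total:g} GB; fits in one instance's memory: {fits or 'none'}")
```

Under this rough check, the 2.5GB cache exceeds one small instance's memory and the 25GB cache exceeds either instance's memory. This is one reason an in-core cache may need to be spread over several nodes or moved to disk.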

26 1KB DEM Size

27 1MB DEM Size

28 5MB DEM Size

29 50MB DEM Size

30 Cost Analysis: We next assess costs versus performance. Performance is measured as relative speedup over the baseline DEM process execution, shown in Table 2. We project costs and speedup over 2000 and … requests.

31 Monthly Costs for Volatile Cache (1MB). [Figure panels: I/O requests outside of AWS; 2000 I/O requests outside of AWS; speedup.] Cost per unit speedup is low when requests are high. I/O costs are still low because of the small data size.
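One hypothetical way to see why cost per unit speedup can fall as request volume grows: assume each of the 500 unique requests is computed once at baseline cost and every repeat is a cache hit, and model monthly cost as fixed charges plus a per-request I/O charge. This model and all of its inputs are illustrative assumptions, not the paper's cost model, prices, or measurements.

```python
def effective_speedup(requests: int, unique: int, baseline_s: float, hit_s: float) -> float:
    """Speedup over answering every request from scratch.

    Assumes each unique request is computed once at baseline cost (a miss)
    and every repeated request is served from the cache (a hit).
    """
    if requests <= 0:
        raise ValueError("requests must be positive")
    misses = min(requests, unique)
    hits = requests - misses
    return (requests * baseline_s) / (misses * baseline_s + hits * hit_s)


def monthly_cost(fixed: float, io_cost_per_request: float, requests: int) -> float:
    """Fixed monthly charges (e.g., instance-hours, storage) plus per-request I/O charges."""
    return fixed + io_cost_per_request * requests


def cost_per_unit_speedup(fixed: float, io_cost_per_request: float, requests: int,
                          unique: int, baseline_s: float, hit_s: float) -> float:
    return (monthly_cost(fixed, io_cost_per_request, requests)
            / effective_speedup(requests, unique, baseline_s, hit_s))


if __name__ == "__main__":
    # Hypothetical inputs: 500 unique requests, 10 s baseline, 1 s per cache hit.
    for n in (500, 2000, 10000):
        low_io = cost_per_unit_speedup(100.0, 0.001, n, 500, 10.0, 1.0)
        high_io = cost_per_unit_speedup(100.0, 0.5, n, 500, 10.0, 1.0)
        print(f"{n} requests: low-I/O {low_io:.2f}, high-I/O {high_io:.2f}")
```

Under these assumptions, a small per-request I/O charge (as with 1MB results) lets the growing hit rate drive cost per unit speedup down. When I/O charges dominate (as the 50MB results suggest), cost grows roughly with request volume and cost per unit speedup can rise instead.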

32 Monthly Costs for Volatile Cache (50MB). [Figure panels: I/O requests outside of AWS; 2000 I/O requests outside of AWS; speedup.] Costs are now dominated by I/O because of the large data size. In terms of performance, it makes more sense to use XLarge instances for large data. The small instance makes better economic sense for a small number of requests.

33 Monthly Costs for Persistent Cache (1MB). [Figure panels: I/O requests outside of AWS; 2000 I/O requests outside of AWS; speedup.] S3 makes better economic sense than EBS-based instances. S3 performance is comparable for a cache with small I/O requests.

34 Monthly Costs for Persistent Cache (50MB). [Figure panels: I/O requests outside of AWS; 2000 I/O requests outside of AWS; speedup.] Interestingly, even with the low cost of S3, it still makes sense to use XLarge instances when I/O requests are high. S3 speedup is still comparable, and S3 makes better economic sense than EBS-based instances.

35 Outline: Introduction to Cloud Computing; Background on AWS and Motivation; Cost and Performance Evaluation; Conclusion

36 Summary (1): For smaller data (<= 5MB): if the request rate is low, use a small instance on-disk; if the request rate is high, use a small instance in-memory. Although I/O is slow, the cost of using a small instance is very low. If persistence is needed, use S3 and avoid EBS.

37 Summary (2): For larger data (>= 50MB and large cache sizes): use XLarge instances, which offer higher I/O rates and larger memory and disk capacity. EBS may be considered in conjunction with XLarge instances for persistence. If performance is not an issue, but persistence and cost are, use S3.
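The two summary slides can be read as a small decision procedure. The function below encodes one reading of them. The function name and its boolean inputs are hypothetical. Mapping "persistence needed and performance matters" to XLarge plus EBS is an interpretation of "EBS may be considered". Sizes between 5MB and 50MB were not evaluated, so the sketch raises an error for them instead of guessing.

```python
def recommend(result_size_mb: float, high_request_rate: bool,
              needs_persistence: bool, performance_matters: bool = True) -> str:
    """Hypothetical encoding of the Summary (1) and Summary (2) roadmap."""
    if result_size_mb <= 5:
        if needs_persistence:
            return "S3"  # small data needing persistence: use S3, avoid EBS
        if high_request_rate:
            return "Small instance, in-memory"
        return "Small instance, on-disk"
    if result_size_mb >= 50:
        if needs_persistence:
            # S3 when persistence and cost matter more than speed;
            # otherwise consider EBS alongside XLarge instances.
            return "XLarge instance + EBS" if performance_matters else "S3"
        return "XLarge instance"
    raise ValueError("sizes between 5MB and 50MB were not evaluated")
```

For example, `recommend(1, high_request_rate=True, needs_persistence=False)` returns `"Small instance, in-memory"`, and `recommend(50, high_request_rate=False, needs_persistence=True, performance_matters=False)` returns `"S3"`.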

38 Conclusion: The Cloud offers many viable options for data storage and caching. We evaluated the cost-performance tradeoffs of these options and determined a roadmap for making clear decisions on resource usage.

39 Thank you. Questions and comments? David Chiu, Gagan Agrawal