Comparison of Cloud Providers Presented by Mi Wang

Motivation Internet-based cloud computing has gained tremendous momentum, and a growing number of companies provide public cloud computing services. How can we compare their performance? ◦ Help customers choose the provider that best fits their performance and cost needs ◦ Help cloud providers identify the right directions for improvement

A Benchmark for Clouds Introduction ◦ The goal of benchmarking a software system is to evaluate its average performance under a particular workload ◦ TPC benchmarks are widely used today in evaluating the performance of computer systems ◦ The Transaction Processing Performance Council (TPC) is a non-profit organization that defines transaction processing and database benchmarks ◦ However, these benchmarks are not sufficient for analyzing novel cloud services

Requirements of a Cloud Benchmark Features and Metrics ◦ The main advantages of cloud computing are scalability, pay-per-use, and fault tolerance ◦ A benchmark for the cloud should test these features and provide appropriate metrics for them

Requirements of a Cloud Benchmark Architectures ◦ Clouds may have different service architectures ◦ A cloud benchmark should be general enough to cover the different architectural variants

Problems of TPC-W The TPC-W benchmark specifies an online bookstore with 14 web interactions that allow users to browse, search, display, update, and order the products of the store. The main measured parameter is WIPS, the number of web interactions per second that the system can handle. TPC-W measures cost as the ratio of total system cost to maximum WIPS.

Problems of TPC-W TPC-W is designed for transactional database systems; cloud systems may not offer the strong consistency constraints it requires. WIPS is not suited to adaptive, scalable systems: an ideal cloud would compensate for increasing load by adding new processing units, so it is not possible to report a maximum WIPS.

Problems of TPC-W The cost metric is not applicable to clouds: different price plans and lot sizes prevent calculating a single $/WIPS number. TPC-W does not reflect the technical evolution of web applications. TPC-W lacks adequate metrics for measuring cloud features such as scalability, pay-per-use, and fault tolerance.

Ideas for a new benchmark Features ◦ Should analyze the ability of a dynamic system to adapt to a changing load, in terms of both scalability and cost ◦ Should run in different locations ◦ Should comprise web interactions that resemble the access patterns of Web 2.0-style applications; multimedia content should also be included

Ideas for a new benchmark Configurations: a new benchmark can choose between three different levels of consistency ◦ Low: all web interactions use only BASE (Basically Available, Soft-State, Eventually Consistent) guarantees ◦ Medium: the web interactions use a mix of consistency guarantees ranging from BASE to ACID ◦ High: all web interactions use only ACID guarantees

Ideas for a new benchmark Metrics: Scalability ◦ Ideally, clouds should scale linearly and infinitely with a constant cost per web interaction ◦ Increase the issued WIPS over time and continuously count the web interactions answered within a given response time; measure the deviation between the issued WIPS and the answered WIPS

Ideas for a new benchmark Metrics: Scalability ◦ Fit a non-linear regression function f(x) = x^b to the measured points (f_i, y_i) of issued versus answered WIPS, with b ∈ (0, 1]; b = 1 indicates perfect linear scaling
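
To make the metric concrete, here is a minimal sketch (not part of the original proposal) of how the exponent b of f(x) = x^b can be fitted by least squares in log space; the WIPS figures in main() are made-up sample values.

/** Fits f(x) = x^b to (issuedWIPS, answeredWIPS) pairs via least squares in log space. */
public class ScalingExponent {

    /** log f(x) = b * log x, so b = sum(log x * log y) / sum((log x)^2). */
    static double fitExponent(double[] issued, double[] answered) {
        double num = 0, den = 0;
        for (int i = 0; i < issued.length; i++) {
            double lx = Math.log(issued[i]);
            double ly = Math.log(answered[i]);
            num += lx * ly;
            den += lx * lx;
        }
        return num / den;
    }

    public static void main(String[] args) {
        // Hypothetical measurements: issued WIPS vs. WIPS answered within the response-time bound.
        double[] issued   = {100, 200, 400, 800, 1600};
        double[] answered = {100, 195, 370, 650, 1100};
        double b = fitExponent(issued, answered);
        System.out.printf("scaling exponent b = %.3f (b = 1 means perfect linear scaling)%n", b);
    }
}

The closer b is to 1, the closer the system tracks the issued load as it grows.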

Ideas for a new benchmark Metrics: Cost ◦ Measure the cost in dollars per WIPS ◦ Price plans might cause variations in the $/WIPS ◦ Measure the average and standard deviation of the cost per WIPS, assuming the price plans are fully utilized
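
As a rough illustration of why $/WIPS fluctuates under lot-sized price plans, the sketch below uses hypothetical prices and load levels (none of them from the paper) to compute the mean and standard deviation of the cost per WIPS.

import java.util.Arrays;

/** Illustrates how a lot-sized price plan makes $/WIPS vary with load. */
public class CostPerWips {
    public static void main(String[] args) {
        // Hypothetical plan: $0.10 per instance-hour, each instance sustains 50 WIPS.
        double pricePerInstanceHour = 0.10;
        double wipsPerInstance = 50.0;
        double[] loadWips = {40, 110, 260, 510};   // sample load levels

        // Instances can only be bought in whole units, so $/WIPS changes with the load.
        double[] dollarPerWips = Arrays.stream(loadWips)
                .map(w -> Math.ceil(w / wipsPerInstance) * pricePerInstanceHour / w)
                .toArray();

        double mean = Arrays.stream(dollarPerWips).average().orElse(0);
        double var = Arrays.stream(dollarPerWips)
                .map(c -> (c - mean) * (c - mean)).average().orElse(0);
        System.out.printf("mean $/WIPS = %.5f, std dev = %.5f%n", mean, Math.sqrt(var));
    }
}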

Ideas for a new benchmark Metrics: Fault tolerance ◦ A failure is defined as shutting down a certain percentage of the resources used by the application ◦ An ideal cloud would replace these resources automatically ◦ Measure the ratio between the WIPS answered within the response time and the issued WIPS during the failure

Goals of CloudCmp Provide performance and cost information about various cloud providers. Help a provider identify its under-performing services compared to its competitors. Provide a fair comparison ◦ Characterize all providers using the same set of workloads and metrics ◦ Skip specialized services that only a few providers offer

Goals of CloudCmp Reduce measurement overhead and monetary cost ◦ Periodically measure each provider at different times of day across all its locations. Comply with cloud providers' use policies. Cover a representative set of cloud providers

Method: Select Providers ◦ Amazon AWS ◦ Microsoft Azure ◦ Google AppEngine ◦ Rackspace CloudServers

Method: Identify core functionality Elastic compute cluster ◦ The cluster includes a variable number of virtual instances that run application code. Persistent storage ◦ The storage service keeps the state and data of an application and can be accessed by application instances through API calls. Intra-cloud network ◦ The intra-cloud network connects application instances with each other and with shared services. Wide-area network. ◦ The content of an application is delivered to end users through the wide-area network from multiple data centers (DCs) at different geographical locations.

Method: Identify core functionality Services offered by the providers:
Provider | Elastic Cluster | Storage | Wide-area Network
Amazon | Xen VM | SimpleDB (table), S3 (blob), SQS (queue) | 3 data centers (2 in US, 1 in EU)
Microsoft | Azure VM | XStore (table, blob, queue) | 6 data centers (2 each in US, EU, and Asia)
Google AppEngine | Proprietary sandbox | DataStore (table) | Unpublished number of Google data centers
Rackspace CloudServers | Xen VM | CloudFiles (blob) | 2 data centers (all in US)

Method: Choose Performance Metrics Elastic compute cluster ◦ Provides virtual instances that host and run a customer's application code ◦ Is charged per usage: IaaS providers charge for the time an instance remains allocated; PaaS providers charge for the CPU cycles consumed ◦ Elastic: the number of instances can be dynamically scaled up and down

Method: Choose Performance Metrics Metrics to compare the elastic compute cluster ◦ Benchmark finishing time: how long an instance takes to complete the benchmark tasks ◦ Scaling latency: the time taken by a provider to allocate a new instance after a customer requests it ◦ Cost per benchmark: the cost to complete each benchmark task

Method: Choose Performance Metrics Persistent storage ◦ Three common types: table, blob, and queue ▪ Table storage is designed to store structured data, like a conventional database ▪ Blob storage is designed to store unstructured blobs such as binary objects ▪ Queue storage implements a global message queue to pass messages between different instances ◦ Two pricing models: ▪ based on the CPU cycles consumed by an operation ▪ a fixed cost per operation

Method: Choose Performance Metrics Metrics to compare persistent storage ◦ Operation response time: the time for a storage operation to finish ◦ Time to consistency: the time between when data is written to the storage service and when all reads for the data return consistent and valid results ◦ Cost per operation

Method: Choose Performance Metrics Intra-cloud network ◦ Connects a customer's instances among themselves and with the shared services offered by a cloud ◦ None of the providers charge for traffic within their data centers; inter-datacenter traffic is charged based on the amount of data. Metrics to compare the intra-cloud network ◦ Path capacity: TCP throughput ◦ Latency

Method: Choose Performance Metrics Wide-area network ◦ The collection of network paths between a cloud's data centers and external hosts on the Internet. Metric to compare the wide-area network ◦ Optimal wide-area network latency: the minimum latency between a tester's node and any data center owned by the provider

Implementation: Computation Metrics Benchmark tasks ◦ A modified set of Java-based benchmark tasks from SPECjvm2008 that satisfies the constraints of all the providers. Metric: Benchmark finishing time ◦ Run the benchmark tasks on each of the virtual instance types provided by the clouds and measure their finishing times ◦ Run instances of the same benchmark task in multiple threads to test multi-threading performance
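
A minimal sketch of such a timing harness, assuming a hypothetical benchmarkTask() stand-in rather than the actual SPECjvm2008-derived tasks; it runs the task in several threads and reports the wall-clock finishing time.

import java.util.ArrayList;
import java.util.List;

/** Measures wall-clock finishing time of a benchmark task run in several threads. */
public class FinishingTime {

    // Hypothetical CPU-bound stand-in for a SPECjvm2008-derived benchmark task.
    static void benchmarkTask() {
        double x = 0;
        for (int i = 1; i < 50_000_000; i++) x += Math.sqrt(i);
        if (x < 0) System.out.println(x);   // keep the work from being optimized away
    }

    static long measureMillis(int threads) throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(FinishingTime::benchmarkTask);
            t.start();
            workers.add(t);
        }
        for (Thread t : workers) t.join();   // finishing time = until the last thread completes
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int threads : new int[] {1, 2, 4}) {
            System.out.printf("%d thread(s): %d ms%n", threads, measureMillis(threads));
        }
    }
}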

Implementation: Computation Metrics Metric: Cost per benchmark ◦ Multiply the published per-hour price by the finishing time for providers that charge by time ◦ Use the billing API for providers that charge by CPU cycles. Metric: Scaling latency ◦ Repeatedly request new instances and record the request time and the time the instance becomes available ◦ Divide the latency into two segments to locate the performance bottleneck ▪ Provisioning latency: request time to powered-on time ▪ Booting latency: powered-on time to available time
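
The scaling-latency measurement might look like the sketch below, where CloudApi, requestInstance, isPoweredOn, and isReachable are hypothetical placeholders for a provider's real SDK calls; it records the request, powered-on, and available timestamps and splits the latency into the two segments.

import java.time.Duration;
import java.time.Instant;

/** Splits scaling latency into provisioning (request -> powered on) and booting (powered on -> available). */
public class ScalingLatency {

    /** Hypothetical abstraction over a provider's instance API. */
    interface CloudApi {
        String requestInstance();            // returns an instance id
        boolean isPoweredOn(String id);      // e.g. reported by the provider's status API
        boolean isReachable(String id);      // e.g. an SSH/HTTP probe succeeds
    }

    static void measure(CloudApi cloud) throws InterruptedException {
        Instant requested = Instant.now();
        String id = cloud.requestInstance();

        while (!cloud.isPoweredOn(id)) Thread.sleep(500);
        Instant poweredOn = Instant.now();

        while (!cloud.isReachable(id)) Thread.sleep(500);
        Instant available = Instant.now();

        Duration provisioning = Duration.between(requested, poweredOn);
        Duration booting = Duration.between(poweredOn, available);
        System.out.printf("provisioning = %ds, booting = %ds, total = %ds%n",
                provisioning.toSeconds(), booting.toSeconds(),
                provisioning.plus(booting).toSeconds());
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake provider that "boots" after a few polls, just to exercise the harness.
        measure(new CloudApi() {
            final Instant start = Instant.now();
            public String requestInstance() { return "i-test"; }
            public boolean isPoweredOn(String id) { return Instant.now().isAfter(start.plusSeconds(2)); }
            public boolean isReachable(String id) { return Instant.now().isAfter(start.plusSeconds(4)); }
        });
    }
}

Separating the two segments makes it clear whether the bottleneck lies in the provider's provisioning pipeline or in the guest OS boot.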

Implementation: Storage Metrics Benchmark tasks ◦ Use a Java-based client to exercise the APIs that get, put, or query data from the service ◦ Non-Java-based clients are also tested ◦ Mimic a streaming workload to avoid the potential impact of memory or disk bottlenecks on the client side

Implementation: Storage Metrics Metric: Response time ◦ The time from when the client instance begins the operation to when the last byte reaches the client. Metric: Throughput ◦ The maximum rate that a client instance obtains from the storage service

Implementation: Storage Metrics Metric: Time to consistency ◦ Write an object to the storage service, then repeatedly read the object and measure how long it takes before the correct result is returned. Metric: Cost per operation ◦ Obtained via the billing API
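
A sketch of the consistency probe, under the assumption that the storage service is reachable through simple put/get callbacks (the real tests go through each provider's own storage API): write a fresh value, poll reads until that value comes back, and report the elapsed time.

import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Measures how long it takes for a freshly written value to become visible to reads. */
public class TimeToConsistency {

    static Duration measure(BiConsumer<String, String> put, Function<String, String> get,
                            Duration timeout) throws InterruptedException {
        String key = "probe-" + UUID.randomUUID();
        String value = UUID.randomUUID().toString();

        Instant written = Instant.now();
        put.accept(key, value);                       // write through the provider's API

        while (Duration.between(written, Instant.now()).compareTo(timeout) < 0) {
            if (value.equals(get.apply(key))) {       // read returns the fresh value: consistent
                return Duration.between(written, Instant.now());
            }
            Thread.sleep(10);                         // small back-off between probe reads
        }
        return timeout;                               // did not converge within the timeout
    }

    public static void main(String[] args) throws InterruptedException {
        // In-memory stand-in for a storage service, just to exercise the probe.
        Map<String, String> store = new HashMap<>();
        Duration d = measure(store::put, store::get, Duration.ofSeconds(30));
        System.out.println("time to consistency: " + d.toMillis() + " ms");
    }
}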

Implementation: Network Metrics Metrics: Intra-cloud network throughput and latency ◦ Allocate a pair of instances in the same or different data centers and run standard tools such as iperf and ping between the two instances. Metric: Optimal wide-area network latency ◦ Run an instance in each data center owned by the provider and ping these instances from over 200 nodes on PlanetLab (a group of computers available as a testbed for computer networking and distributed systems research); record the smallest RTT ◦ For AppEngine: collect the IP addresses of the instance as seen from each of the PlanetLab nodes, then ping all of these IP addresses from each of the PlanetLab nodes
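
As a sketch of the smallest-RTT step, assuming the Linux ping utility is available on the measurement node (the study itself drives ping and iperf from PlanetLab nodes and provider instances); the endpoints listed in main() are hypothetical.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Pings a list of data-center endpoints and records the minimum RTT seen for each. */
public class OptimalLatency {

    // Matches the "time=12.3 ms" field in Linux ping output.
    private static final Pattern RTT = Pattern.compile("time=([0-9.]+) ms");

    static double minRttMillis(String host, int probes) throws Exception {
        Process p = new ProcessBuilder("ping", "-c", Integer.toString(probes), host)
                .redirectErrorStream(true).start();
        double min = Double.MAX_VALUE;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                Matcher m = RTT.matcher(line);
                if (m.find()) min = Math.min(min, Double.parseDouble(m.group(1)));
            }
        }
        p.waitFor();
        return min;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoints, one per data center of a provider.
        for (String host : List.of("dc-us-east.example.com", "dc-eu-west.example.com")) {
            System.out.printf("%s: min RTT = %.1f ms%n", host, minRttMillis(host, 5));
        }
    }
}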

Results Anonymize the identities of the providers in the results and refer to them as C1 to C4 (though it is easy to see that C1 = AWS, C2 = Rackspace, C3 = AppEngine, C4 = Microsoft Azure). Test all instance types offered by C2 and C4, and the general-purpose instances from C1. Refer to instance types as provider.i, where i denotes the tier of service. Compare both Linux and Windows instances for experiments that depend on the OS type; test Linux instances for the others

Results Cloud instances tested

Results: Elastic Compute Cluster Metrics: Benchmark finishing time

Results: Elastic Compute Cluster Price-comparable instances offered by different providers have widely different CPU and memory performance, and the instance types appear to be constructed in different ways ◦ For C1, the high-end instances may have faster CPUs ◦ For C4, all instances might share the same type of physical CPU ◦ C2 may have been lightly loaded during the test. The disk-I/O-intensive task exhibits high variation on some C1 and C4 instances, probably due to interference from other colocated instances

Results: Elastic Compute Cluster Metrics: Performance at Cost

Results: Elastic Compute Cluster For single-threaded tests, the smallest instances of most providers are the most cost-effective. For multi-threaded tests, the high-end instances are no more cost-effective than the low-end ones ◦ The prices of high-end instances are proportional to the number of CPU cores ◦ Performance is bounded by memory bus and I/O bandwidth ◦ For parallel applications it might be more cost-effective to use more low-end instances

Results: Elastic Compute Cluster Metrics: Scaling Latency

Results: Elastic Compute Cluster Metrics: Scaling latency ◦ All cloud providers can allocate new instances quickly, with an average scaling latency below 10 minutes ◦ Windows instances appear to take longer to create than Linux ones ▪ For C1, the Windows instances have a larger booting latency, possibly due to slower CPUs ▪ For C2, the provisioning latency of the Windows instances is much larger; C2 may use different infrastructures to provision Linux and Windows instances

Results: Persistent Storage Table storage ◦ Test the performance of three operations: get, put, and query ◦ Each operation runs against two pre-defined data tables: a small one with 1K entries and a large one with 100K entries ◦ Repeat each operation several hundred times ◦ C2 is not tested because it does not provide a table service

Results: Persistent Storage Table storage: response time ◦ The three services perform similarly for both get and put operations ◦ For the query operation, C1 appears to have a better indexing strategy ◦ None of the services shows noticeable performance degradation under multiple concurrent operations

Results: Persistent Storage Table storage: time to consistency ◦ 40% of the get operations in C1 see inconsistency when triggered right after a put; the other providers exhibit no such inconsistency ◦ C1 does provide an API option to request strong consistency, but it is disabled by default

Results: Persistent Storage Table storage: cost per operation ◦ Both C1 and C3 charge a lower cost for get/put than for query ◦ C4 charges the same across operations and could improve its charging model by accounting for the complexity of each operation

Results: Persistent Storage Blob Storage: Response time

Results: Persistent Storage Blob storage: response time ◦ Blobs of different sizes may stress different bottlenecks: the latency for small blobs can be dominated by one-off costs, whereas that for large blobs can be determined by service throughput, network bandwidth, or client-side contention ◦ C2's store may be tuned for read-heavy workloads

Results: Persistent Storage Blob storage: response time under multiple concurrent operations ◦ C1's and C4's blob service throughput is well tuned for multiple concurrent operations

Results: Persistent Storage Blob storage: maximum throughput ◦ C1's and C2's blob service throughput is close to their intra-datacenter network bandwidth ◦ C4's blob service throughput from a large instance also corresponds to the TCP throughput inside its datacenter, so it may not be constrained by the instance itself

Results: Persistent Storage Blob storage: cost per operation ◦ The charging models are similar for all three providers and are based on the number of operations and the size of the blob ◦ No significant differences

Results: Persistent Storage Queue storage: response time ◦ Message size is 50 bytes

Results: Persistent Storage Queue storage ◦ No significant performance degradation is found when sending up to 32 concurrent messages ◦ The response time of the queue service is on the same order of magnitude as that of the table and blob services ◦ Both services charge similarly: 1 cent per 10K operations

Results: Intra-cloud Network Data Centers ◦ C3 is not considered because it does not allow direct communication between instances

Results: Intra-cloud Network Intra-datacenter Network ◦ C1 and C4 provide very high TCP throughput ◦ C2 has much lower throughput

Results: Intra-cloud Network Inter-datacenter network ◦ Only the results for data centers within the US are shown ◦ The throughput across datacenters is much smaller than that within a datacenter

Results: Wide-area Network Optimal wide-area latency

Results: Wide-area Network Optimal wide-area latency ◦ The latencies of C3 are lower than those of the other providers, possibly due to its widely dispersed presence ◦ C1 has a larger fraction of nodes with an optimal latency above 100 ms; these nodes are mostly in Asia and South America, where C1 does not have a presence ◦ C2 has the worst latency distribution because it has the smallest number of data centers

Using CloudCmp: Case Studies Deploy three simple applications on the clouds to check whether the benchmark results from CloudCmp are consistent with the performance experienced by real applications ◦ a storage-intensive e-commerce website ◦ a computation-intensive application for DNA alignment ◦ a latency-sensitive website that serves static objects

Using CloudCmp: Case Studies E-commerce website ◦ A Java implementation of TPC-W, with database operations redirected to each cloud's table storage APIs ◦ The major performance goal is to minimize the page generation time; the performance bottleneck lies in accessing table storage ◦ C1 offers the lowest table service response time among all providers, so it should have the best performance for TPC-W

Using CloudCmp: Case Studies E-commerce website ◦ C1 has the lowest page generation time ◦ C4 has a lower generation time than C3 for most pages, except for pages 9 and 10, which contain many query operations; this matches the relative query performance of C3 and C4

Using CloudCmp: Case Studies Parallel scientific computation ◦ Blast, a parallel computation application for DNA alignment ◦ Blast instances communicate with each other through the queue storage service and use the blob storage service to store results ◦ The major performance goal is to reduce job execution time given a budget on the number of instances ◦ At a similar price point, C4.1 performs better than C1.1

Using CloudCmp: Case Studies Parallel Scientific Computation

Using CloudCmp: Case Studies Latency-sensitive website ◦ Set up a simple web server to serve static pages and download the pages from PlanetLab nodes around the world ◦ The performance goal is to minimize the page downloading time from many nodes ◦ The major bottleneck is the wide-area network latency ◦ C3 has the lowest wide-area network latency distribution

Using CloudCmp: Case Studies Latency Sensitive Website ◦ C3 has the smallest page downloading time

Limitations and Future Work Limitations ◦ On several occasions, CloudCmp sacrifices depth for breadth ◦ The results are only a snapshot comparison among cloud providers. Future work ◦ Use CloudCmp's measurement results to make application-specific performance predictions ◦ It could be promising to develop a meta-cloud that combines the diverse strengths of various providers