Cloud Performance Evaluation at TU Delft (2008—)

Presentation transcript:

Alexandru Iosup, Parallel and Distributed Systems Group, Delft University of Technology, The Netherlands
Our team:
Undergrad: Nassos Antoniou, Thomas de Ruiter, Ruben Verboon, …
Grad: Siqi Shen, Nezih Yigitbasi, Ozan Sonmez
Staff: Henk Sips, Dick Epema, Alexandru Iosup
Collaborators: Ion Stoica and the Mesos team (UC Berkeley), Thomas Fahringer, Radu Prodan (U. Innsbruck), Nicolae Tapus, Mihaela Balint, Vlad Posea (UPB), Derrick Kondo, Emmanuel Jeannot (INRIA), Assaf Schuster, Mark Silberstein, Orna Ben-Yehuda (Technion), ...
November 21, 2018, SPEC RG Cloud Meeting

The Real IaaS Cloud: “The path to abundance” vs. “The killer cyclone”
“The path to abundance”: on-demand capacity; cheap for short-term tasks; great for web apps (EIP, web crawl, DB ops, I/O).
“The killer cyclone”: not-so-great performance for scientific applications (compute- or data-intensive).
Image credits: http://www.flickr.com/photos/dimitrisotiropoulos/4204766418/; Tropical Cyclone Nargis (NASA, ISSS, 04/29/08).

This Presentation: Three Research Questions
Q1: What is the performance of production IaaS cloud services? (And what is the impact on large-scale applications?)
Q2: How variable is the performance of widely used production cloud services? (And what is the impact on large-scale applications?)
Q3: How do provisioning and allocation policies affect the performance of IaaS cloud services?
Other questions studied at TU Delft: How does virtualization affect the performance of IaaS cloud services? What is a good model for cloud workloads? Etc.

Some Previous Work (>50 important references across our studies)
Virtualization overhead:
Loss below 5% for computation [Barham03] [Clark04]
Loss below 15% for networking [Barham03] [Menon05]
Loss below 30% for parallel I/O [Vetter08]
Negligible for compute-intensive HPC kernels [You06] [Panda06]
Cloud performance evaluation:
Performance and cost of executing scientific workflows [Dee08]
Study of Amazon S3 [Palankar08]
Amazon EC2 for the NPB benchmark suite [Walker08] or selected HPC benchmarks [Hill08]
CloudCmp [Li10]
Kossmann et al.

Approach: Real Traces, Models, Real Tools, Real-World Experimentation (+ Simulation)
Formalize real-world scenarios.
Exchange real traces.
Model relevant operational elements.
Scalable tools for meaningful and repeatable experiments.
Comparative studies, almost like benchmarking.
Simulation only when needed (long-term scenarios, etc.).

Agenda
Three Research Questions and General Approach
IaaS Cloud Performance (Q1): Experimental Setup, Experimental Results, Implications on Real-World Workloads
Cloud Performance Variability (Q2)
Provisioning and Allocation Policies for IaaS Clouds (Q3)
Conclusion

Q1: Production IaaS Cloud Services
Production IaaS cloud: leases resources (infrastructure) to users, operates on the market, and has active customers.
AWS and GAE are the two largest in terms of the number of customers.
Source: Iosup et al., Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing (IEEE TPDS 2011).

Q1: Our Method
Based on a general performance technique: model the performance of individual components; system performance is the performance of workload + model [Saavedra and Smith, ACM TOCS’96].
Adapted to clouds:
Cloud-specific elements: resource provisioning and allocation.
Benchmarks for single- and multi-machine jobs.
Benchmarks for CPU, memory, I/O, etc.

Q1: Single Resource Provisioning/Release
Time depends on instance type.
Boot time is non-negligible.

Q1: Multi-Resource Provisioning/Release
Time for multi-resource provisioning/release increases with the number of resources.

Q1: CPU Performance of Single Resource
ECU definition: “a 1.1 GHz 2007 Opteron”, ~4 flops per cycle at full pipeline, which means that at peak performance one ECU equals 4.4 GFLOPS.
Real performance: 0.6..1.1 GFLOPS = ~1/7..1/4 of the theoretical peak.
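
The peak-performance arithmetic on this slide can be checked with a short sketch. The clock rate, the flops-per-cycle figure, and the 1/7 and 1/4 fractions are taken from the slide; the function names are hypothetical and only serve the illustration.

```python
# Peak-vs-measured arithmetic for one EC2 Compute Unit (ECU).
# Inputs come from the slide; helper names are hypothetical.

def peak_gflops(clock_ghz: float, flops_per_cycle: float) -> float:
    """Theoretical peak in GFLOPS: clock rate (GHz) times flops per cycle."""
    return clock_ghz * flops_per_cycle


def gflops_at_fraction(peak: float, fraction: float) -> float:
    """Achieved GFLOPS when only a given fraction of the peak is reached."""
    return peak * fraction


ecu_peak = peak_gflops(1.1, 4)               # 1.1 GHz x 4 flops/cycle
low = gflops_at_fraction(ecu_peak, 1 / 7)    # lower end of the measured range
high = gflops_at_fraction(ecu_peak, 1 / 4)   # upper end of the measured range
print(f"peak: {ecu_peak:.1f} GFLOPS, measured range: {low:.2f}..{high:.2f} GFLOPS")
```

The lower end works out to roughly 0.63 GFLOPS and the upper end to 1.1 GFLOPS per ECU.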

Q1: HPLinpack Performance (Parallel)
Low efficiency for parallel compute-intensive applications.
Low performance vs. cluster computing and supercomputing.

Q1: Performance Stability (Variability)
High performance stability for the best-performing instances.

Q1: Summary
Much lower performance than the theoretical peak, especially for CPU (GFLOPS).
Performance variability.
Compared results with some of the commercial alternatives (see report).

Q1: Implications: Simulations
Input: real-world workload traces from grids and parallel production environments (PPEs).
Running in: the original environment; a cloud with source-like performance; a cloud with measured performance.
Metrics: WT, ReT, BSD(10s); cost [CPU-h].
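
The slide names the per-job metrics only by abbreviation. As a minimal sketch, assuming WT is the wait time, ReT the response time (wait plus run time), and BSD(10s) the bounded slowdown as commonly defined in the job-scheduling literature with a 10-second threshold, they could be computed as follows. The slides do not spell out these definitions, so the formulas and function names are assumptions.

```python
# Per-job metrics under ASSUMED standard definitions (not stated on the slide).

def response_time(wait_s: float, run_s: float) -> float:
    """ReT: time from submission to completion."""
    return wait_s + run_s


def bounded_slowdown(wait_s: float, run_s: float, threshold_s: float = 10.0) -> float:
    """BSD: response time over run time, with the run time bounded from
    below by threshold_s so that very short jobs do not dominate, and the
    result bounded from below by 1."""
    return max(1.0, response_time(wait_s, run_s) / max(run_s, threshold_s))


# Hypothetical jobs as (wait, run) pairs in seconds.
jobs = [(20.0, 5.0), (0.0, 2.0), (100.0, 400.0)]
avg_bsd = sum(bounded_slowdown(w, r) for w, r in jobs) / len(jobs)
print(f"average bounded slowdown: {avg_bsd:.3f}")
```

Averaging these per-job values over a trace would give quantities like the AWT, AReT, and ABSD reported in the results.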

Q1: Implications: Results
Cost: Clouds, real >> Clouds, source.
Performance: AReT: Clouds, real >> Source env. (bad); AWT, ABSD: Clouds, real << Source env. (good).

Agenda
Three Research Questions and General Approach
IaaS Cloud Performance (Q1)
Cloud Performance Variability (Q2): Experimental Setup, Experimental Results, Implications on Real-World Workloads
Provisioning and Allocation Policies for IaaS Clouds (Q3)
Conclusion

Q2: Production Cloud Services
Production cloud: operates on the market and has active customers.
IaaS/PaaS: Amazon Web Services (AWS): EC2 (Elastic Compute Cloud), S3 (Simple Storage Service), SQS (Simple Queue Service), SDB (SimpleDB), FPS (Flexible Payments Service).
PaaS: Google App Engine (GAE): Run (Python/Java runtime), Datastore (database, ~SDB), Memcache (caching), URL Fetch (web crawling).
AWS and GAE are the two largest in terms of the number of customers.
Source: Iosup, Yigitbasi, Epema. On the Performance Variability of Production Cloud Services (IEEE CCGrid 2011).

Q2: Our Method [1/3]: Performance Traces
CloudStatus (www.cloudstatus.com): real-time values and weekly averages for most of the AWS and GAE services.
Periodic performance probes; the sampling interval is under 2 minutes.

Q2: Our Method [2/3]: Analysis
Our analysis comprises three steps:
1. Find out whether variability is present: investigate over several months whether the performance metric is highly variable.
2. Find out the characteristics of variability. Basic statistics: the five quartiles (Q0-Q4), including the median (Q2), the mean, and the standard deviation. Derivative statistic: the IQR (Q3-Q1). A coefficient of variation (CoV) > 1.1 indicates high variability.
3. Analyze the time patterns of performance variability: investigate for each performance metric the presence of daily/weekly/monthly/yearly time patterns. E.g., for monthly patterns, divide the dataset into twelve subsets, compute the statistics for each subset, and plot them for visual inspection.
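
The statistics named on this slide can be sketched with Python's standard library. The 1.1 CoV threshold is taken from the slide; the function names, the (month, value) record layout, and the sample values are hypothetical, and the choice of the population standard deviation and the "inclusive" quantile method is an assumption.

```python
import statistics

HIGH_VARIABILITY_COV = 1.1  # threshold quoted on the slide


def summarize(samples):
    """Five quartiles (Q0-Q4), mean, standard deviation, IQR, and CoV.
    Needs at least two samples and a non-zero mean."""
    s = sorted(samples)
    q1, q2, q3 = statistics.quantiles(s, n=4, method="inclusive")
    mean = statistics.fmean(s)
    std = statistics.pstdev(s)
    return {
        "q0": s[0], "q1": q1, "q2": q2, "q3": q3, "q4": s[-1],
        "mean": mean, "std": std,
        "iqr": q3 - q1,
        "cov": std / mean,
    }


def monthly_summaries(records):
    """Split (month, value) records into per-month subsets and summarize each."""
    groups = {}
    for month, value in records:
        groups.setdefault(month, []).append(value)
    return {month: summarize(values) for month, values in sorted(groups.items())}


# Hypothetical probe values for two months.
records = [(1, 1.0), (1, 2.0), (1, 3.0), (1, 4.0), (1, 5.0),
           (2, 10.0), (2, 10.5), (2, 11.0)]
for month, stats in monthly_summaries(records).items():
    flag = "high" if stats["cov"] > HIGH_VARIABILITY_COV else "low"
    print(month, f"IQR={stats['iqr']:.2f}", f"CoV={stats['cov']:.2f}", flag)
```

Plotting the per-month summaries as box plots would correspond to the visual inspection step.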

Q2: Our Method [3/3]: Is Variability Present?
Validated assumption: the performance delivered by production services is variable.
We verify our assumption in this slide and show the result only for EC2; the paper has similar results for the other services.

Q2: AWS Dataset (1/4): EC2
Variable performance.
Deployment Latency [s]: the time it takes to start a small instance, from startup until the instance is available.
Higher IQR and range from week 41 to the end of the year; possible reason: increasing EC2 user base.
Impact on applications using EC2 for auto-scaling.

Q2: AWS Dataset (2/4): S3
Stable performance.
Get Throughput [bytes/s]: the estimated rate at which an object in a bucket is read.
The last five months of the year exhibit much lower IQR and range: more stable performance, probably due to software/infrastructure upgrades.

Q2: AWS Dataset (3/4): SQS
Periods of both variable and stable performance.
Average Lag Time [s]: the time it takes for a posted message to become available to read, averaged over multiple queues.
Long periods of stability (low IQR and range).
Periods of high performance variability also exist.

Q2: AWS Dataset (4/4): Summary
All services exhibit time patterns in performance.
EC2: periods of special behavior.
SDB and S3: daily, monthly, and yearly patterns.
SQS and FPS: periods of special behavior.

Q2: GAE Dataset (1/4): Run Service
Fibonacci [ms]: the time it takes to calculate the 27th Fibonacci number.
Highly variable performance until September.
The last three months have stable performance (low IQR and range).

Q2: GAE Dataset (2/4): Datastore
Read Latency [s]: the time it takes to read a “User Group”.
Yearly pattern from January to August.
The last four months of the year exhibit much lower IQR and range.
More stable performance for the last five months, probably due to software/infrastructure upgrades.
To measure create/delete/read times, CloudStatus uses a simple set of data entities; we refer to the combination of all these entities as a “User Group”.

Q2: GAE Dataset (3/4): Memcache
PUT [ms]: the time it takes to put 1 MB of data in memcache.
The median performance per month has an increasing trend over the first 10 months.
The last three months of the year exhibit stable performance.

Q2: GAE Dataset (4/4): Summary
All services exhibit time patterns.
Run Service: daily patterns and periods of special behavior.
Datastore: yearly patterns and periods of special behavior.
Memcache: monthly patterns and periods of special behavior.
URL Fetch: daily and weekly patterns, and periods of special behavior.

Q2: Experimental Setup (1/2): Simulations
Trace-based simulations for three applications.
Input: GWA traces; number of daily unique users; monthly performance variability.
Application               Service
Job Execution             GAE Run
Selling Virtual Goods     AWS FPS
Game Status Maintenance   AWS SDB/GAE Datastore

Q2: Experimental Setup (2/2): Metrics
Average Response Time and Average Bounded Slowdown.
Cost in millions of consumed CPU hours.
Aggregate Performance Penalty, APP(t), defined in terms of:
Pref (reference performance): the average of the twelve monthly medians.
P(t): a random value sampled from the distribution corresponding to the current month at time t. (“Performance is like a box of chocolates, you never know what you’re gonna get” ~ Forrest Gump)
max U(t): the maximum number of users over the whole trace.
U(t): the number of users at time t.
APP: the lower, the better.
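
The transcript lists the symbols of the APP metric, but the formula itself did not survive. As a sketch only, assuming APP(t) is the product of the performance ratio P(t)/Pref and the load ratio U(t)/max U(t), and assuming a lower-is-better performance metric such as latency, it could be computed as below. The product form, the function names, and all data values are assumptions made for illustration.

```python
import random
import statistics


def reference_performance(monthly_samples):
    """Pref: mean of the per-month medians (twelve months on the slide)."""
    medians = [statistics.median(v) for v in monthly_samples.values()]
    return statistics.fmean(medians)


def sample_performance(monthly_samples, month, rng):
    """P(t): random draw from the distribution of the current month."""
    return rng.choice(monthly_samples[month])


def app(perf, perf_ref, users, max_users):
    """ASSUMED form of the metric: (P(t) / Pref) * (U(t) / max U(t))."""
    return (perf / perf_ref) * (users / max_users)


# Made-up latency samples (ms) and daily user counts for two months.
monthly_latency_ms = {1: [10.0, 12.0, 14.0], 2: [20.0, 22.0, 24.0]}
daily_users = {1: 500, 2: 1000}

rng = random.Random(42)                      # seeded for repeatability
p_ref = reference_performance(monthly_latency_ms)
max_u = max(daily_users.values())
for month in sorted(monthly_latency_ms):
    p_t = sample_performance(monthly_latency_ms, month, rng)
    print(month, round(app(p_t, p_ref, daily_users[month], max_u), 3))
```

Under this assumed form, a value near 1 means that the service performs at its reference level while serving the peak number of users.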

Q2: Grid & PPE Job Execution (1/2): Scenario
Execution of compute-intensive jobs typical for grids and PPEs on cloud resources.
Traces.

Q2: Grid & PPE Job Execution (2/2): Results
All metrics differ by less than 2% between the cloud with stable performance and the cloud with variable performance.
The impact of service performance variability is low for this scenario.

Q2: Selling Virtual Goods (1/2): Scenario
A virtual-goods selling application operating on a large-scale social network like Facebook.
Amazon FPS is used for payment transactions.
Amazon FPS performance variability is modeled from the AWS dataset.
Traces: number of daily unique users of Facebook (www.developeranalytics.com).

Q2: Selling Virtual Goods (2/2): Results
The significant performance decrease of FPS during the last four months, combined with the increasing number of daily users, is well captured by APP.
The APP metric can trigger and motivate the decision to switch cloud providers.

Q2: Game Status Maintenance (1/2): Scenario
Maintenance of game status for a large-scale social game such as Farm Town or Mafia Wars, which have millions of unique users daily.
Services: AWS SDB and GAE Datastore.
We assume that the number of database operations depends linearly on the number of daily unique users.

Q2: Game Status Maintenance (2/2): Results
Panels: AWS SDB; GAE Datastore.
Big discrepancy between the SDB and Datastore services.
Sep’09-Jan’10: the APP of Datastore is well below that of SDB, due to the increasing performance of Datastore.
APP of Datastore ~1 => no performance penalty.
APP of SDB ~1.4 => 40% higher performance penalty than Datastore.
We show in slide 20 that the performance of Datastore increases for the last months of the year.
APP gives a hint for provider selection.

Agenda
Three Research Questions and General Approach
IaaS Cloud Performance (Q1)
Cloud Performance Variability (Q2)
Provisioning and Allocation Policies for IaaS Clouds (Q3): Experimental Setup, Experimental Results
Conclusion

Q3: Provisioning and Allocation Policies (for user-level scheduling)
Provisioning policies; allocation policies.
Also looked at combined provisioning + allocation policies.
Source: Villegas, Antoniou, Sadjadi, Iosup. An Analysis of Provisioning and Allocation Policies for Infrastructure-as-a-Service Clouds (submitted). PDS Tech.Rep.2011-009.

Q3: Experimental Tool: SkyMark
Provisioning and allocation policies: steps 6+9 and 8, respectively.

Q3: Experimental Setup (1)
Environments: DAS4, Florida International University (FIU), Amazon EC2.
Workloads: contents, arrival pattern.

Q3: Experimental Setup (2)
Performance metrics: traditional (makespan, job slowdown); Workload Speedup One (SU1); Workload Slowdown Infinite (SUinf).
Cost metrics: Actual Cost (Ca); Charged Cost (Cc).
Compound metrics: Cost Efficiency (Ceff); Utility.

Q3: Performance Metrics
Makespan very similar.
Very different job slowdown.

Q3: Cost Metrics
Panels: Actual Cost; Charged Cost.
Very different results between actual and charged cost.
The cloud charging function is an important selection criterion.
All policies are better than Startup in actual cost.
Policies are much better/worse than Startup in charged cost.
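
Why actual and charged cost can diverge is easy to see with a small sketch. It assumes, purely for illustration, that actual cost is the total time instances were leased and that charged cost rounds every lease up to whole billing periods of one hour; the slides do not define the charging function, so this model and all names below are hypothetical.

```python
import math

SECONDS_PER_HOUR = 3600


def actual_cost_hours(lease_seconds):
    """Actual cost: total leased time, in instance-hours."""
    return sum(lease_seconds) / SECONDS_PER_HOUR


def charged_cost_hours(lease_seconds, billing_period_s=SECONDS_PER_HOUR):
    """Charged cost under an ASSUMED charging function: every lease is
    rounded up to a whole number of billing periods."""
    periods = sum(math.ceil(s / billing_period_s) for s in lease_seconds)
    return periods * billing_period_s / SECONDS_PER_HOUR


# Hypothetical leases: 10 minutes, just over 1 hour, exactly 1 hour.
leases = [600, 3700, 3600]
print(f"actual:  {actual_cost_hours(leases):.2f} instance-hours")
print(f"charged: {charged_cost_hours(leases):.2f} instance-hours")
```

Under such a charging function, a policy that releases and re-acquires instances often may lower the actual cost while raising the charged cost.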

Q3: Compound Metrics
The utility-cost trade-off still needs investigation.
Performance and cost are not both improved by the policies we have studied.

Agenda
Three Research Questions and General Approach
IaaS Cloud Performance (Q1)
Cloud Performance Variability (Q2)
Provisioning and Allocation Policies for IaaS Clouds (Q3)
Conclusion

Conclusion: Take-Home Message
Understanding how real large-scale distributed systems work:
Q1: What is the performance of production IaaS cloud services?
Q2: How variable is the performance of widely used production cloud services?
Q3: How do provisioning and allocation policies affect the performance of IaaS cloud services?
Tools and workloads: SkyMark, MapReduce.
Image credit: http://www.flickr.com/photos/dimitrisotiropoulos/4204766418/

Thank you for your attention! Questions? Suggestions? Observations?
More info:
http://www.st.ewi.tudelft.nl/~iosup/research.html
http://www.st.ewi.tudelft.nl/~iosup/research_cloud.html
http://www.pds.ewi.tudelft.nl/
Do not hesitate to contact me: Alexandru Iosup, A.Iosup@tudelft.nl, http://www.pds.ewi.tudelft.nl/~iosup/ (or google “iosup”).
Parallel and Distributed Systems Group, Delft University of Technology