Capacity Scaling for Elastic Compute Clouds Ahmed Aleyeldin Hassan ahmeda@cs.umu.se Ph. Lic. Defense Presentation Advisor: Erik Elmroth Coadvisor: Johan Tordsson Department of Computing Science Umeå University, Sweden www.cloudresearch.org
Outline Introduction Elasticity and Auto-scaling Contributions Paper 1 Paper 2 Paper 3 Conclusions Future Work
Computing as a utility: Cloud Computing John McCarthy envisioned computing as a utility in 1961 Amazon announced its first cloud services in 2006, renting spare capacity on its infrastructure as Virtual Machines (VMs) Enterprise-scale computing power available to anyone, on demand A step closer to computing as a utility
Cloud Computing Definition NIST definition: a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction On-demand provisioning can thus handle peaks in workloads at a lower cost Rapid elasticity is one of the five essential characteristics of cloud computing identified by NIST
Cloud Elasticity The ability of the cloud to rapidly scale the resource capacity allocated to a service according to demand, in order to meet the QoS requirements specified in Service Level Agreements (SLAs) Capacity scaling can be done manually or automatically
Motivation & Problem Definition The cloud elasticity problem: how much capacity to (de)allocate to a cloud service, and when? Bursty and unknown workloads Reduce resource usage Reduce Service Level Agreement (SLA) violations In a cloud context Vertical elasticity: resize VMs (CPUs, memory, etc.) Horizontal elasticity: add/remove VMs to/from a service
Problem Description Prediction of a load/signal/the future is not a new problem Studied extensively within many disciplines: time series analysis, control theory, stock market prediction, epileptic seizure detection in EEG signals, etc. Multiple approaches proposed for the prediction problem: neural networks, fuzzy logic, adaptive control, regression, Kriging models, <your favorite machine learning technique> However, the solution must be suitable for our problem…
Requirements Adaptive: changing workload and infrastructure dynamics Robust: avoid oscillations or behavioral changes Scalable: tens of thousands of servers, and even more VMs Rapid: a late prediction can be useless
Main Topics This thesis contributes to automating capacity scaling in the cloud Contributions include scientific publications studying: the design of algorithms for automatic capacity scaling, an enhanced algorithm for automatic capacity scaling, and a tool for workload analysis and classification that assigns workloads to the most suitable capacity scaling algorithm Common objective: automatic elasticity control
Paper I: An Adaptive Hybrid Elasticity Controller Hybrid control: a controller that combines reactive control (a step controller) and proactive control (predicts future workload) But how to best combine them? For scale-up For scale-down Adaptive to the workload and changing system dynamics
Assumptions (Paper I) Service with homogeneous requests Short requests that take one time unit (or less) to serve VM startup time is negligible Delayed requests are dropped Constant VM capacity Perfect load balancing assumed
Elasticity Controller Model [Figure: monitoring feeds the measured load L(t) to the elasticity controller, which adds or removes N VMs in the infrastructure; requests are either completed or dropped]
Controller How to estimate the change in workload? Estimated load change: F = C * P, where C is the average capacity over the last time window and P is the control parameter Two control parameter alternatives studied: 1. Periodic rate of change of the system load: P1 = load change over TD / TD 2. Ratio of load change over the average system service rate: P2 = load change / avg. service rate over all time The window size changes dynamically: smaller upon prediction errors; a tolerance level decides how often the window is resized
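The estimator above can be sketched in a few lines. This is a minimal illustration of the F = C * P idea, not the thesis implementation: class and parameter names are assumptions, and the adaptive window is reduced to a fixed-size sliding window.

```python
from collections import deque

class ProactiveEstimator:
    """Sketch of the Paper I proactive estimator: F = C * P.

    C is the average capacity over a sliding window; P is one of two
    control parameters (P1: periodic rate of load change, P2: last load
    change over the average service rate). Illustrative only: the real
    controller also resizes the window upon prediction errors, governed
    by a tolerance level.
    """

    def __init__(self, window=10):
        self.capacities = deque(maxlen=window)  # last-window capacities
        self.loads = []                         # full load history

    def p1(self, td):
        # P1: rate of change of the system load over the last td steps.
        if len(self.loads) <= td:
            return 0.0
        return (self.loads[-1] - self.loads[-1 - td]) / td

    def p2(self, avg_service_rate):
        # P2: last load change divided by the average service rate.
        if len(self.loads) < 2 or avg_service_rate == 0:
            return 0.0
        return (self.loads[-1] - self.loads[-2]) / avg_service_rate

    def estimate(self, load, capacity, td=1):
        # Record the new observation, then return F = C * P1.
        self.loads.append(load)
        self.capacities.append(capacity)
        c = sum(self.capacities) / len(self.capacities)
        return c * self.p1(td)
```

A hybrid controller would combine this estimate with a reactive step controller, using one component for scale-up and the other for scale-down as studied in the paper.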
Performance Evaluation Simulation-based evaluations FIFA world cup web server traces Three aspects studied: the best combination of reactive and proactive controllers, controller stability w.r.t. workload size, and comparison with a state-of-the-art controller, regression control [Iqbal et al., FGCS 2011] Performance metrics Over-provisioning (OP): VMs allocated but not needed Under-provisioning (UP): VMs needed but not allocated (an SLA violation)
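The two metrics can be computed directly from an allocation trace and a demand trace. A minimal sketch, assuming per-step VM counts and percentages relative to total demand (the exact normalization in the thesis may differ):

```python
def provisioning_metrics(allocated, needed):
    """Over/under-provisioning over a trace.

    OP: VMs allocated but not needed.
    UP: VMs needed but not allocated (an SLA violation).
    Both returned as percentages of total demand. Illustrative helper,
    not the exact thesis computation.
    """
    op = sum(max(a - n, 0) for a, n in zip(allocated, needed))
    up = sum(max(n - a, 0) for a, n in zip(allocated, needed))
    total = sum(needed)
    return 100 * op / total, 100 * up / total
```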
Selected Results Baseline: Reactive scale-up, Reactive scale-down 1.63% UP, 1.40% OP
Selected Results (cont.) Reactive scale-up, P1 scale-down 0.18% UP (1.63% for baseline), 14.33% OP (1.40% for baseline)
Selected Results (cont.) Reactive scale-up, P2 scale-down 0.41% UP (1.63% for baseline), 9.44% OP (1.40% for baseline)
Comparison with Regression Regression-based control: scale up reactively, scale down by regression 2nd-order regression based on the full workload history Evaluation on a selected (nasty) part of the FIFA trace Reactive scale-up, Reactive scale-down: 2.99% UP, 19.57% OP Reactive scale-up, Regression scale-down: 2.24% UP, 47% OP Reactive scale-up, P1 scale-down: 1.07% UP, 39.75% OP Reactive scale-up, P2 scale-down: 1.51% UP, 32.24% OP
Assumptions (Paper II) Homogeneous requests Short requests that take one time unit (or less) Machine startup time is negligible Delayed requests are dropped Constant machine service rate Perfect load balancing assumed
Model G/G/N queue with variable N (#VMs)
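The model can be illustrated with a discrete-time sketch: a queue served by N VMs, where N is set each step by an elasticity controller. This is a simplified illustration under assumed unit service rates, not the thesis simulator:

```python
def simulate(loads, controller, service_rate=1):
    """Minimal discrete-time sketch of the Paper II model: a G/G/N
    queue where N (the number of VMs) is chosen each step by an
    elasticity controller. Each VM serves `service_rate` requests per
    step; excess requests wait in the queue.

    Returns a list of (n, served, queue_length) tuples per step.
    """
    queue = 0
    history = []
    for load in loads:
        n = controller(load, queue)                 # controller picks N
        served = min(queue + load, n * service_rate)
        queue = queue + load - served               # backlog carries over
        history.append((n, served, queue))
    return history

# A trivial reactive policy: provision for the current load plus backlog.
reactive = lambda load, queue: load + queue
```

In the thesis model delayed requests are dropped rather than queued indefinitely; the queue here stands in for that buffering step.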
Performance Evaluation Simulation-based evaluations Performance metrics Over-provisioning (OP): VMs allocated but not needed Under-provisioning (UP): VMs needed but not allocated (an SLA violation) Average queue length (Q) Oscillations (O): total number of servers (VMs) added and removed Workload traces used A one-month Google cluster trace The FIFA 1998 world cup web server traces
Selected Results: Google Cluster Workload Our controller vs. a baseline controller
Selected Results: Google Cluster Workload

Metric   CProactive    CReactive
N        847 VMs       687 VMs
OP       164 VMs       1.3 VMs
UP       1.7 VMs       5.4 VMs
Q        3.48 jobs     10.22 jobs
O        153979 VMs    505289 VMs

~23% extra resources required by our controller Reduces Q, UP and O by almost a factor of three compared to a reactive controller
Different Workloads No one-size-fits-all predictor/controller exists
WAC: A Workload Analyzer and Classifier
Workload Analyzer Periodicity means easier predictions Auto-Correlation Function (ACF) Almost a standard tool The cross-correlation of a signal with a time-shifted version of itself Bursts are difficult to predict; completely random bursts are very difficult to predict Sample Entropy, derived from the Kolmogorov-Sinai entropy The negative natural logarithm of the conditional probability that two sequences similar for m points are also similar at the next point
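Both analyzer features can be sketched briefly. This is an illustrative, O(n^2) version under assumed defaults (absolute tolerance r; the usual convention scales r by the signal's standard deviation), not WAC's implementation:

```python
import math

def acf(x, lag):
    """Autocorrelation of x at a given lag: correlation of the signal
    with a time-shifted copy of itself. A high ACF at some lag suggests
    periodicity, hence easier prediction."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy: -ln of the conditional probability that sequences
    matching for m points (within tolerance r) also match at point m+1.
    Higher values indicate burstier, harder-to-predict workloads."""
    def matches(mm):
        # Count template pairs of length mm within Chebyshev distance r.
        templates = [x[i:i + mm] for i in range(len(x) - mm + 1)]
        hits = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    hits += 1
        return hits
    b, a = matches(m), matches(m + 1)
    return -math.log(a / b) if a > 0 and b > 0 else float("inf")
```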
Workload Classifier Supervised learning with K-Nearest Neighbors (KNN) Training on objects with known classes: workloads with known best controller/predictor KNN is fast with good prediction accuracy Two flavors during training: majority vote on the class with equal weight for all votes, or votes inversely proportional to distance Evaluation using 14 real workloads + 55 synthetic traces
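The two voting flavors can be shown in a compact KNN sketch. Feature layout and names are illustrative (e.g. features could be ACF and sample entropy values, and classes the controllers); this is not WAC's classifier code:

```python
from collections import Counter

def knn_classify(train, query, k=3, weighted=False):
    """K-Nearest Neighbors with the two voting flavors described above.

    weighted=False: majority vote, every neighbor counts equally.
    weighted=True: each vote is weighted by the inverse of its distance.

    train: list of (feature_vector, class_label) pairs.
    """
    dist = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
    # Pick the k training points closest to the query.
    neighbors = sorted(train, key=lambda t: dist(t[0], query))[:k]
    votes = Counter()
    for features, label in neighbors:
        d = dist(features, query)
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    return votes.most_common(1)[0][0]
```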
Controllers Implemented The controllers are the classes: Modified second-order regression [Iqbal et al., FGCS 2011] (Regression) Step controller [Chieu et al., ICEBE 2009] (Reactive) Histogram-based controller [Urgaonkar et al., TAAS 2008] (Histogram) The algorithm proposed in our second paper (Proactive)
Controller Evaluation Under-provisioning: how many requests can you afford to drop? Over-provisioning: how much cost are you willing to pay to serve all requests? Oscillations: can the service handle frequent changes in the assigned resources? Consistency? Load migration? There are trade-offs between these objectives
Best Controller

Controller   Real workloads   Generated workloads
Reactive     6.55%            0.1%
Regression   33.72%           61.33%
Histogram    12.56%           4.27%
Proactive    47.17%           34.3%
Classifier Results: Real Workloads (Selected Results) Two controllers to choose from
Classifier Results: Mixed Workloads (Selected Results) Four controllers to choose from
Conclusions General conclusions: no one solution fits all; there are trade-offs between overprovisioning, underprovisioning, speed and oscillations Paper I: controllers that reduce underprovisioning Paper II: enhancing the model in Paper I Paper III: a tool for workload analysis and classification Common theme: automatic elasticity control
Future Work Realistic workload generation Collaboration with EIT (LU) already started Design of better controllers Collaboration with the Dept. of Automatic Control (LU) already started A deeper study of workload characteristics and their impact on different elasticity controllers Collaboration with the Dept. of Mathematical Statistics (UMU) already started Workload classification Elasticity control vs. other management components, e.g., VM placement (scheduling)
Acknowledgments Erik Elmroth and Johan Tordsson Colleagues in the group Collaboration partners Maria Kihl Family Parents and siblings Wife and daughter