Dynamic Placement of Virtual Machines for Managing SLA Violations NORMAN BOBROFF, ANDRZEJ KOCHUT, KIRK BEATY SOME SLIDE CONTENT ADAPTED FROM ALEXANDER.


Presentation transcript:

Dynamic Placement of Virtual Machines for Managing SLA Violations
NORMAN BOBROFF, ANDRZEJ KOCHUT, KIRK BEATY
SOME SLIDE CONTENT ADAPTED FROM ALEXANDER NUS
PRESENTED BY JON LOGAN

Motivation
• Virtual machines are becoming increasingly common in our datacenters
• Servers use electricity
• Electricity can be expensive!
• How do we minimize the number of active machines while meeting our SLA obligations?
• Machine usage patterns are NOT static; they generally change over time

Goals
• Maximize utilization of active machines
• Minimize Service Level Agreement (SLA) violations
• Minimize the number of active machines
  • Power off unused machines to save cost (electricity)
• Essentially, minimize cost while meeting SLA guarantees

Static Allocation
• All machines are taken offline, and historical usage is used to determine the ideal placement
• Happens very infrequently (~weeks or months)
• Service must be interrupted to relocate VMs
• In many cases utilization is not consistent! Demand may vary significantly within the period between allocations

Dynamic Allocation
• VMs are seamlessly migrated between machines based on predicted demand
• Done relatively frequently (~minutes to hours)
• Uses live migration
  • Minimal (~ms) service disruption during migration
• Allows allocations to follow demand more closely

Live Migration
• Moves a VM image between machines without service interruption
• The paper cites a transition time of ~45 seconds
• The VM must be serialized and transferred over the network
• This puts a lower bound on the reallocation period
  • We can't reallocate faster than we can migrate!
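To make the migration bound concrete, here is a rough Python sketch of the check. It assumes the ~45 s per-migration figure cited above, an optional power-on time, and that migrations run in batches of a fixed size (sequentially by default); the function names and parameters are hypothetical and not taken from the paper.

```python
def min_remap_period_s(num_migrations: int,
                       migration_time_s: float = 45.0,
                       power_on_time_s: float = 0.0,
                       parallel_slots: int = 1) -> float:
    """Lower bound (seconds) on the reallocation period T.

    Assumes migrations run in batches of `parallel_slots`, each batch taking
    `migration_time_s`, after any needed servers have been powered on.
    """
    if num_migrations < 0 or parallel_slots < 1:
        raise ValueError("need num_migrations >= 0 and parallel_slots >= 1")
    batches = -(-num_migrations // parallel_slots)  # ceiling division
    return power_on_time_s + batches * migration_time_s


def period_is_feasible(period_s: float, num_migrations: int, **kwargs) -> bool:
    """True if power-on plus all migrations finish strictly within the period."""
    return min_remap_period_s(num_migrations, **kwargs) < period_s


if __name__ == "__main__":
    # 10 sequential migrations at ~45 s each need ~450 s.
    print(min_remap_period_s(10))           # 450.0
    print(period_is_feasible(15 * 60, 10))  # True  (15-minute period)
    print(period_is_feasible(5 * 60, 10))   # False (5-minute period)
```

Allowing parallel migrations shortens the bound, but in practice they compete for the same network bandwidth, so the sequential default is the conservative choice.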

Service Level Agreement
• Essentially a contract between the provider and the customer stating that resources R will be available X% of the time
• Violations cost money!
• X is usually high (e.g., 95%)
• VMs do not necessarily use their entire resource allocation at all times, but it must be available should they need it
  • E.g., a VM may be doing batch processing and only do substantial work between 12:00 AM and 1:00 AM
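One way to read "available X% of the time" is as a per-interval check, sketched below in Python. The assumption here (not from the paper) is that in each interval the VM is entitled to min(demand, R), and the interval counts as met if the capacity actually available to the VM on its host covers that amount; `availability` and `sla_met` are hypothetical helpers.

```python
from typing import Sequence


def availability(demand: Sequence[float], available: Sequence[float],
                 reserved: float) -> float:
    """Fraction of intervals in which the VM got what it was entitled to.

    Assumption: in each interval the VM is entitled to min(demand, reserved),
    i.e. it may use less than its reservation R, but anything it asks for up
    to R must be there.
    """
    if len(demand) != len(available) or not demand:
        raise ValueError("demand and available must be non-empty and equal length")
    met = sum(1 for d, a in zip(demand, available) if a >= min(d, reserved))
    return met / len(demand)


def sla_met(demand: Sequence[float], available: Sequence[float],
            reserved: float, target: float = 0.95) -> bool:
    """True if availability reaches the SLA target X (e.g. 0.95 for 95%)."""
    return availability(demand, available, reserved) >= target
```

For example, with demand [1, 2, 3, 4], 2 units available in every interval and R = 3, only the first two intervals are covered, so availability is 0.5 and a 95% SLA is violated.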

Static vs Dynamic Usage
• Workloads are not static!
• Predict the usage of each VM over the next time interval T
• Reallocate VMs so that the predicted usage can be met
  • The prediction must cover a certain percentile of demand to meet SLA requirements
• Capacity savings is simply:
  Static Allocation - (Predicted Usage + Error Factor)
• Repeat this process every interval T
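The capacity-savings expression can be sketched as a small function. The error factor is an assumption: forecast errors are treated as zero-mean Gaussian, and the factor is their quantile at the SLA target (about 1.645 standard deviations for 95%). The function name and units are illustrative.

```python
from statistics import NormalDist


def capacity_savings(static_allocation: float, predicted_usage: float,
                     error_std: float, sla_target: float = 0.95) -> float:
    """Static Allocation - (Predicted Usage + Error Factor).

    Assumption: forecast errors are zero-mean Gaussian with standard deviation
    `error_std`, so the error factor is the `sla_target` quantile of that error.
    """
    if error_std < 0 or not 0.0 < sla_target < 1.0:
        raise ValueError("error_std must be >= 0 and 0 < sla_target < 1")
    z = NormalDist().inv_cdf(sla_target)  # ~1.645 for a 95% target
    error_factor = z * error_std
    return static_allocation - (predicted_usage + error_factor)


if __name__ == "__main__":
    # A VM statically sized at 4 CPUs, forecast to need 1.5 CPUs (error std 0.5).
    print(round(capacity_savings(4.0, 1.5, 0.5), 3))  # 1.678
```

In the example, roughly 1.68 CPUs are freed for other VMs; a less accurate forecast (larger `error_std`) eats directly into that savings.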

What Workloads Are Best for Dynamic Allocation?
• Not all workloads are created equal; some are better candidates than others
• Constant workloads = bad!
• A workload is an ideal candidate for dynamic allocation if
  • it has strong variability, AND
  • it has strong autocorrelation combined with periodic behavior
• Essentially, the workload needs a decent degree of variability, and its usage must be reasonably predictable
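The two properties above can be measured directly from a usage trace. The sketch below uses the coefficient of variation as the variability measure and the sample autocorrelation at lag 1 and at the suspected period; the `is_good_candidate` thresholds (0.3 and 0.5) are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(x: Sequence[float]) -> float:
    """Variability: population standard deviation divided by the mean."""
    m = mean(x)
    return pstdev(x) / m if m else float("inf")


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of x at the given lag (0.0 for a constant series)."""
    n = len(x)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(x)")
    m = mean(x)
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        return 0.0
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag))
    return num / denom


def is_good_candidate(x: Sequence[float], period: int,
                      min_cv: float = 0.3, min_acf: float = 0.5) -> bool:
    """Hypothetical screen: strong variability AND strong short-term (lag-1)
    and periodic (lag-`period`) autocorrelation. Thresholds are illustrative."""
    return (coefficient_of_variation(x) >= min_cv
            and autocorrelation(x, 1) >= min_acf
            and autocorrelation(x, period) >= min_acf)
```

A repeating ramp such as `[10, 20, 30, 40, 50, 60, 50, 40, 30, 20] * 10` passes (it resembles workload 3c), while a constant trace fails on variability alone.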

Workload 3a
• Strongly variable – good
• Autocorrelation ~0.8 – good
• Weak periodic behavior – bad
• Verdict – Good
  • Large variability offers significant potential for optimization
  • Strong autocorrelation makes it possible to obtain a low-error prediction

Workload 3b
• Weakly variable – bad
• Decaying autocorrelation – bad
• Weak periodic behavior – bad
• Verdict – Bad
  • Low variability means the potential gain is low
  • Weak autocorrelation and no periodic component make demand difficult to predict

Workload 3c
• Strongly variable – good
• Strong autocorrelation – good
• Strong periodic behavior – good
• Verdict – Very Good
  • An ideal case for dynamic allocation

Potential Gain

Demand Forecast Algorithm
• Determine the periods in demand using "common sense" aided by a periodogram (e.g., time of day, day of week, …)
• Decompose the process into deterministic periodic and residual components: $D_i + r_i$
• Estimate the deterministic part by averaging multiple smoothed historical periods
• Fit an Auto-Regressive Moving Average (ARMA) model to the residual process
• Use the combined components for demand prediction: $U_i = D_i + r_i$
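A minimal Python sketch of the forecasting steps, under simplifying assumptions: the periodogram is a naive DFT, the periodic profile is a plain per-phase average (no smoothing), the history is assumed to start at phase 0, and a least-squares AR(1) model stands in for the full ARMA fit used in the paper. All function names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence


def dominant_period(x: Sequence[float]) -> float:
    """Period (in samples) with the largest periodogram power (naive DFT)."""
    n, m = len(x), mean(x)
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        re = sum((x[t] - m) * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum((x[t] - m) * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k


def periodic_profile(history: Sequence[float], period: int) -> List[float]:
    """Deterministic part D: mean value at each phase over complete periods.
    Assumes history[0] is at phase 0; no smoothing is applied here."""
    n_periods = len(history) // period
    if n_periods < 1:
        raise ValueError("need at least one full period of history")
    usable = history[: n_periods * period]
    return [mean(usable[p::period]) for p in range(period)]


def forecast(history: Sequence[float], period: int, k: int = 1) -> float:
    """Predict U k steps ahead as D (periodic profile) + AR(1) residual forecast."""
    profile = periodic_profile(history, period)
    r = [u - profile[t % period] for t, u in enumerate(history)]
    # Least-squares AR(1) coefficient for the residuals (a stand-in for ARMA).
    denom = sum(v * v for v in r[:-1])
    phi = sum(a * b for a, b in zip(r[1:], r[:-1])) / denom if denom else 0.0
    t_next = len(history) - 1 + k
    return profile[t_next % period] + (phi ** k) * r[-1]
```

For a perfectly periodic trace such as `[1, 2, 3, 4] * 5`, the periodogram picks a period of 4, the residuals are all zero, and the forecast simply replays the profile. For a production-quality ARMA fit, a statistics library would replace the AR(1) shortcut.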

Management Algorithm
• The goal is to minimize the time-averaged number of active servers without violating the SLA
• Machines not needed to host VMs are powered off or put in a low-power state
  • They are reactivated if/when required (at the earliest, in the next period)
• The time to power on & migrate must be less than the period T
• The algorithm is responsible for the actual VM migrations
• Placing VMs is essentially a version of the bin packing problem
  • NP-hard!
  • We use an approximation: first-fit
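First-fit itself is short. The sketch below packs one-dimensional (CPU) forecasts onto identical PMs; it is a generic first-fit, not the paper's exact remapping code, and the names are illustrative.

```python
from typing import List, Sequence, Tuple


def first_fit(demands: Sequence[float], capacity: float) -> Tuple[List[int], int]:
    """First-fit bin packing: place each VM on the first PM with enough room.

    Returns (PM index for each VM, number of active PMs). VMs are taken in
    the given order.
    """
    free: List[float] = []     # remaining capacity of each active PM
    placement: List[int] = []
    for d in demands:
        if d > capacity:
            raise ValueError(f"demand {d} exceeds PM capacity {capacity}")
        for m, room in enumerate(free):
            if d <= room:
                free[m] -= d
                placement.append(m)
                break
        else:                  # no active PM fits: power on a new one
            free.append(capacity - d)
            placement.append(len(free) - 1)
    return placement, len(free)


if __name__ == "__main__":
    print(first_fit([50, 70, 50, 30], capacity=100))  # ([0, 1, 0, 1], 2)
```

Feeding the demands sorted in decreasing order turns this into first-fit decreasing, a common variant that often opens fewer PMs.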

Management Algorithm
• Measure – measure usage
• Forecast – predict usage for the next window
• Remap – relocate VMs if necessary
• Perform this cycle (MFR) at regular intervals
• Designed to try to predict the "best we can do"
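The MFR cycle can be written as a loop with pluggable steps. In this hypothetical Python sketch, `measure`, `forecast`, and `remap` are caller-supplied functions; the example plugs in a peak-of-last-two-samples forecast and a remap that only counts PMs assuming perfect packing, neither of which is the paper's method.

```python
from typing import Callable, Dict, List, TypeVar

Placement = TypeVar("Placement")


def mfr_loop(measure: Callable[[int], Dict[str, float]],
             forecast: Callable[[List[float]], float],
             remap: Callable[[Dict[str, float]], Placement],
             intervals: int) -> List[Placement]:
    """Run Measure-Forecast-Remap once per interval; return each placement."""
    history: Dict[str, List[float]] = {}
    placements: List[Placement] = []
    for i in range(intervals):
        # Measure: observed usage of every VM in interval i
        for vm, usage in measure(i).items():
            history.setdefault(vm, []).append(usage)
        # Forecast: predicted demand of each VM for the next window
        predicted = {vm: forecast(h) for vm, h in history.items()}
        # Remap: compute a new placement from the predictions
        placements.append(remap(predicted))
    return placements


if __name__ == "__main__":
    trace = [{"vm1": 30, "vm2": 60}, {"vm1": 80, "vm2": 60}, {"vm1": 20, "vm2": 10}]
    pms = mfr_loop(measure=lambda i: trace[i],
                   forecast=lambda h: max(h[-2:]),                 # peak of last two samples
                   remap=lambda p: (sum(p.values()) + 99) // 100,  # PMs of capacity 100
                   intervals=len(trace))
    print(pms)  # [1, 2, 2]
```

In a real deployment, `remap` would run a packing heuristic such as first-fit and issue the resulting live migrations.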

Management Algorithm Overview

Key Terms
• $N$ – virtual machines
• $M$ – physical machines
• $C_m$ – maximum capacity of physical machine $m$
• $f^{n}_{i,k}$ – forecast value for the resource demand of VM $n$ at interval $i+k$
• $R$ – migration interval
• $C_p(\mu, \sigma^2)$ – $(1-p)$-percentile of a Gaussian distribution with mean $\mu$ and variance $\sigma^2$
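The sketch below shows how C_p(mu, sigma^2) can be computed with Python's standard `statistics.NormalDist`, and one way it could be used for a capacity check against C_m. The aggregation step is an assumption for illustration: forecast errors of co-located VMs are treated as independent Gaussians, so their sum is Gaussian with summed means and variances. `c_p` and `fits_on_pm` are hypothetical names.

```python
from statistics import NormalDist
from typing import Sequence


def c_p(mu: float, var: float, p: float) -> float:
    """C_p(mu, sigma^2): the (1-p)-percentile of a Gaussian N(mu, sigma^2)."""
    if var < 0 or not 0.0 < p < 1.0:
        raise ValueError("need var >= 0 and 0 < p < 1")
    if var == 0:
        return mu
    return NormalDist(mu, var ** 0.5).inv_cdf(1.0 - p)


def fits_on_pm(forecasts: Sequence[float], variances: Sequence[float],
               capacity: float, p: float) -> bool:
    """Can these VMs share one PM of capacity C_m?

    Assumes independent Gaussian forecast errors, so aggregate demand is
    N(sum of forecasts, sum of variances); the PM is acceptable if that
    aggregate's (1-p)-percentile stays within capacity.
    """
    return c_p(sum(forecasts), sum(variances), p) <= capacity
```

With p = 0.05, two VMs forecast at 40 and 30 (variance 25 each) need about 81.6 units and fit on a PM of capacity 100, while forecasts of 50 and 40 need about 101.6 and do not.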

Management Algorithm

Management Algorithm (2)

Management Algorithm (3)

Management Algorithm (4)

Simulations
• Simulated using traces gathered from hundreds of production servers running various applications
• Traces contain CPU, memory, storage, and network usage
  • Only CPU usage is considered
• Samples were collected every 15 minutes
• The simulation study:
  • Verifies that MFR meets SLA targets
  • Quantifies the reduction in SLA violations
  • Quantifies the number of machines saved
  • Explores the relationship between the remapping interval and the gain from dynamic management
  • Performs measurements to determine properties of a practical infrastructure with respect to VM migration

Overflows vs Number of PMs

Number of Machines vs Desired Overflow
• Significantly reduces the number of active machines

• Performance degrades as the migration interval increases
• Essentially, the prediction used for placement is the maximum usage predicted over the whole interval
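Taking the maximum over the interval is what makes long remapping intervals costly. In this sketch, each step k = 1..R has a forecast mean and error standard deviation, and the VM's reservation is the largest per-step (1-p)-percentile; combining the max with a Gaussian percentile is an assumption for illustration, and `interval_reservation` is a hypothetical name.

```python
from statistics import NormalDist
from typing import Sequence


def interval_reservation(mu: Sequence[float], sigma: Sequence[float],
                         R: int, p: float = 0.05) -> float:
    """Capacity to reserve for one VM over a remapping interval of R steps.

    mu[k-1], sigma[k-1]: forecast mean and error std for step k (k = 1..R).
    Takes the max over the interval of each step's (1-p)-percentile, so the
    placement must hold for the worst predicted step.
    """
    if not 1 <= R <= min(len(mu), len(sigma)):
        raise ValueError("R must be between 1 and the forecast horizon")
    z = NormalDist().inv_cdf(1.0 - p)
    return max(m + z * s for m, s in zip(mu[:R], sigma[:R]))
```

Because the max ranges over more steps as R grows, the reservation can only stay the same or increase. If forecast error also widens with the horizon (e.g. `sigma = [2, 4, 6, 8]`), more capacity is held back per VM, which is consistent with the degradation noted above.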

Limitations
• The paper only considers one resource
  • In this case, CPU utilization
  • In the real world, allocations must account for numerous resources: memory, CPU, I/O, network, etc.
• Assumes bandwidth between machines is free & unrestricted
  • In some cases, relocating a VM may not be worth the cost of transferring its image
• The study size is small
  • Only 6 physical machines
• What if different VMs have different SLA requirements?
• What if the PMs have differing hardware?

Conclusion
• Based on the simulated data, the approach significantly reduces the cost of running virtual machines
• Relies on an ideal case of VMs
  • Predictable yet volatile usage
• The algorithm could be optimized to reduce the number of VM relocations, or to schedule more optimally
• The simulation is too small
• The paper claims an average savings of 44% in the number of active PMs