Effective VM Sizing in Virtualized Data Centers
Ming Chen (1), Hui Zhang (2), Ya-Yunn Su (3), Xiaorui Wang (1), Guofei Jiang (2), Kenji Yoshihira (2)
1. University of Tennessee
2. NEC Laboratories America
3. National Taiwan University
Virtualized data centers: server consolidation and green IT
Server consolidation - virtualization facilitates consolidating several physical servers onto a single high-end system (a shared resource pool)
- Reduces management costs and overheads
- Increases overall utilization
Green IT - compute more, consume less
- Improving infrastructure efficiency
- Increasing IT productivity
Organizations are constantly faced with the challenge of accommodating increasing amounts of data, increasing numbers of devices and users, and increasingly powerful servers to support critical applications, all while controlling data center costs for power, cooling, and other operations.
Improving infrastructure efficiency - strategies that can have a significant effect on overall data center performance per watt, such as using energy-efficient equipment, optimizing data center temperature (77 degrees), and following best-practice data center design.
Increasing IT productivity - increasing the amount of computing work completed in the data center relative to the amount of power used.
Although these two options are key tactics in controlling energy use, they generally yield only incremental savings. Substantially reducing energy use typically requires a more strategic approach to managing the IT infrastructure than these tactics can provide.
Efficiency metrics, today and future:
DCiE (data center infrastructure efficiency) = IT load power / total facility power
DCpW (data center performance per watt) = data center useful work / total data center input power
11/27/2018
In virtualized data centers…
Performance and power management mechanisms based on server utilization: VMware DPM, NEC SSC, IBM Tivoli…
Figure: CPU utilization, with thresholds CPUhigh (overload threshold) and CPUlow (power-saving mode).
VM sizing – a resource management primitive in virtualized data centers
How much resource should be allocated to this VM?
- Sizing for the maximal load? Low resource utilization!
- Sizing for the average load? Frequent performance violations!
Figure: CPU utilization over time.
Figure: cumulative distribution function of the servers' normalized percentile loads (5,415 servers from 10 IT systems).
The maximal load is much larger than the average load:
- 90% of the servers have a maximal load at least 2.2 times their average load;
- 50% of the servers have a maximal load at least 7.2 times their average load.
Effective VM sizing
Effective size: a new VM sizing concept under probabilistic SLAs.
A probabilistic SLA example [Bobroff2007]: Prob[server x's CPU utilization at any time > 90%] < 5%
A VM's effective size is determined by four factors:
- its own workload
- the performance constraint, defined as a probabilistic SLA
- the resource capacity of the server
- the VMs co-hosted on the server
Check the NOMS06 paper for a reference.
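As a minimal sketch of how a probabilistic SLA of this kind could be checked against monitored data, the Python below estimates the violation probability as the fraction of samples above the threshold. The function names, the sample trace, and the use of an empirical frequency as the probability estimate are assumptions made for illustration; they do not come from the slides.

```python
from typing import Sequence


def violation_fraction(utilization: Sequence[float], threshold: float) -> float:
    """Fraction of samples whose utilization exceeds the threshold."""
    if not utilization:
        raise ValueError("utilization trace is empty")
    return sum(1 for u in utilization if u > threshold) / len(utilization)


def sla_satisfied(utilization: Sequence[float],
                  threshold: float = 0.90,
                  max_violation_prob: float = 0.05) -> bool:
    """True if the empirical Prob[utilization > threshold] is below the bound."""
    return violation_fraction(utilization, threshold) < max_violation_prob


# Hypothetical trace: 40 samples, exactly one of them above 90% utilization.
trace = [0.35] * 39 + [0.95]
print(violation_fraction(trace, 0.90))
print(sla_satisfied(trace))
```

The strict inequality mirrors the "< 5%" in the SLA example, so a trace that violates the threshold in exactly 5% of its samples does not pass.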
Stochastic bin packing problem
Given a set of items whose sizes are described by independent random variables S = {X1, X2, …, Xn}, and an overflow probability p, partition the set S into the smallest number of sets (bins) S1, …, Sk such that $\Pr\big[\sum_{X_i \in S_j} X_i > 1\big] \le p$ for all 1 ≤ j ≤ k.
Mapping to server consolidation: the items are VM workloads, the bins are machines, and the overflow probability is the SLA.
Effective sizing is the basis of a family of O(1)-approximation algorithms for the stochastic bin packing problem.
Effective Sizing – intrinsic demand
Let a random variable Xi represent VM i's resource demand, and let Cj be the resource capacity of server j. The intrinsic demand of VM i on server j is defined as $C_j / N_{ij}$, where $N_{ij}$ is the maximal value of N satisfying the constraint $\Pr\big[\sum_{k=1}^{N} U_k \le C_j\big] \ge 1 - p$. Here the Uk are independent and identically distributed (i.i.d.) random variables with the same distribution as Xi, and p is the overflow probability.
Intuitively, Nij is the maximal number of VMs that can be packed into server j without breaking the probabilistic SLA when all the VMs have the same workload pattern as VM i.
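For intuition, the definition can be worked out in closed form for one special case. The derivation below assumes that the constraint bounds the overflow probability by p, as in the stochastic bin packing problem, and that Xi is normally distributed with mean μ > 0 and standard deviation σ. Both assumptions are made here for illustration, and the symbols μ, σ, and $Z_{1-p}$ (the (1 − p)-quantile of the unit normal, with p < 0.5) are introduced only for this sketch.

```latex
% Sum of N i.i.d. normal demands: \sum_{k=1}^{N} U_k \sim \mathcal{N}(N\mu,\, N\sigma^2)
\Pr\Big[\sum_{k=1}^{N} U_k \le C_j\Big] \ge 1 - p
\;\iff\;
N\mu + Z_{1-p}\,\sigma\sqrt{N} \le C_j

% The left-hand side increases with N, so solving the quadratic in \sqrt{N} gives
N_{ij} = \left\lfloor \left( \frac{-Z_{1-p}\,\sigma + \sqrt{Z_{1-p}^{2}\sigma^{2} + 4\mu C_j}}{2\mu} \right)^{2} \right\rfloor,
\qquad
\text{intrinsic demand} = \frac{C_j}{N_{ij}}
```

The safety margin grows only as √N while the mean load grows as N, which is why the per-VM share Cj / Nij falls below the stand-alone size μ + Z_{1-p}σ as more VMs share a server.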
Intrinsic demand – one example
Statistical multiplexing rocks!
Figure: effective sizing example for i.i.d. random variables with a normal distribution (server overload probability = 2.5%).
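An example of this kind can be reproduced numerically. The sketch below assumes normally distributed i.i.d. demands and takes the SLA to mean that the summed demand of N identical VMs may exceed the server capacity with probability at most p. The function names and the sample numbers (mean 10, standard deviation 5, capacity 100) are hypothetical and not read off the slide's figure.

```python
from math import sqrt
from statistics import NormalDist


def max_identical_vms(mu: float, sigma: float, capacity: float, p: float) -> int:
    """Largest N with Pr[sum of N i.i.d. N(mu, sigma^2) demands <= capacity] >= 1 - p."""
    if mu <= 0 or sigma < 0 or not 0 < p < 0.5:
        raise ValueError("need mu > 0, sigma >= 0 and 0 < p < 0.5")
    z = NormalDist().inv_cdf(1 - p)  # (1 - p)-quantile of the unit normal
    n = 0
    # The (1 - p)-quantile of the sum of n demands is n*mu + z*sigma*sqrt(n).
    while (n + 1) * mu + z * sigma * sqrt(n + 1) <= capacity:
        n += 1
    return n


def intrinsic_demand(mu: float, sigma: float, capacity: float, p: float) -> float:
    """Capacity share per VM when the server hosts the maximal number of identical VMs."""
    n = max_identical_vms(mu, sigma, capacity, p)
    if n == 0:
        raise ValueError("a single VM already violates the SLA on this server")
    return capacity / n


# Hypothetical VM: mean demand 10, standard deviation 5, server capacity 100, p = 2.5%.
n_max = max_identical_vms(10, 5, 100, 0.025)
size = intrinsic_demand(10, 5, 100, 0.025)
standalone = 10 + NormalDist().inv_cdf(0.975) * 5  # size needed by one VM on its own
print(n_max, round(size, 2), round(standalone, 2))
```

In this made-up case the intrinsic demand is smaller than the stand-alone size, which is the statistical multiplexing effect the slide refers to.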
Intrinsic demand – analysis
Theorem 1. For items with independent Poisson distributions, the First Fit Decreasing (FFD) deterministic bin packing algorithm with effective sizing (intrinsic demand) finds a solution to the stochastic bin packing problem with at most (1.22B* + 1) bins of size 1, where B* is the minimum possible number of bins.
Theorem 2. For items with independent normal distributions, the FFD deterministic bin packing algorithm with effective sizing (intrinsic demand) finds a solution to the stochastic bin packing problem with at most (1.22B* + 1) bins of size 1 + rc, where B* is the minimum possible number of bins and rc ≤ 0.496.
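Both theorems apply deterministic First Fit Decreasing to the items' effective sizes. The following is a minimal sketch of plain FFD, assuming the effective sizes have already been computed and normalized to a bin capacity of 1. The function name, the small floating-point tolerance, and the sample sizes are illustrative choices, not part of the original analysis.

```python
from typing import List, Sequence


def first_fit_decreasing(sizes: Sequence[float], capacity: float = 1.0) -> List[List[int]]:
    """Pack items by FFD; returns one list of item indices per bin."""
    eps = 1e-12  # tolerance for floating-point round-off (an illustrative choice)
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    bins: List[List[int]] = []  # item indices placed in each bin
    free: List[float] = []      # remaining capacity of each bin
    for i in order:
        if sizes[i] > capacity + eps:
            raise ValueError(f"item {i} does not fit into an empty bin")
        for b in range(len(bins)):
            if sizes[i] <= free[b] + eps:  # first bin with enough room
                bins[b].append(i)
                free[b] -= sizes[i]
                break
        else:  # no open bin fits the item: open a new one
            bins.append([i])
            free.append(capacity - sizes[i])
    return bins


# Hypothetical effective sizes, already normalized to the server capacity.
effective_sizes = [0.5, 0.7, 0.3, 0.2, 0.4, 0.1]
placement = first_fit_decreasing(effective_sizes)
print(len(placement), placement)
```

Each returned bin corresponds to one active server, so the length of the result is the server count a consolidation scheme would report.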
Intrinsic demand may not be enough
The workload independence assumption might not hold in practice.
Effective Sizing – correlation-aware demand
Let a random variable Xi represent VM i's resource demand, and let another random variable Xj represent server j's existing aggregate resource demand from the VMs already allocated to it.
The correlation-aware demand of VM i on server j is defined in terms of the following quantities:
- σi² and σj², the variances of the random variables Xi and Xj;
- ρ, the correlation coefficient between Xi and Xj;
- Zα, the α-percentile of the unit normal distribution (α = 1 − p).
For example, for an overflow probability p = 0.25%, α = 99.75% and Zα ≈ 2.81 (roughly 3).
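The defining formula is not reproduced in this text, so the sketch below should be read as an assumption rather than as the authors' definition. It computes one plausible correlation term from the quantities the slide lists: under a normal approximation, the Zα-scaled standard deviation of Xi + Xj with correlation ρ, minus the same quantity for independent workloads (ρ = 0). The function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def correlation_buffer(sigma_i: float, sigma_j: float, rho: float, p: float) -> float:
    """Extra capacity attributable to correlation (assumed form, see lead-in).

    Returns Z_alpha * (std of X_i + X_j with correlation rho
                       - std of X_i + X_j if they were independent).
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    z_alpha = NormalDist().inv_cdf(1.0 - p)  # alpha = 1 - p
    var_corr = sigma_i ** 2 + sigma_j ** 2 + 2.0 * rho * sigma_i * sigma_j
    std_corr = sqrt(max(0.0, var_corr))  # guard against tiny negative round-off
    std_indep = sqrt(sigma_i ** 2 + sigma_j ** 2)
    return z_alpha * (std_corr - std_indep)


# Hypothetical standard deviations for the VM and the server's existing load.
for rho in (-0.5, 0.0, 0.5, 1.0):
    print(rho, round(correlation_buffer(3, 4, rho, 0.0025), 3))
```

Under this assumed form the term vanishes for uncorrelated workloads, is positive when the VM peaks together with the server's existing load, and is negative when their peaks tend to offset each other.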
Applying effective sizing in production systems
Practical issues in many dimensions:
- Product implementation
- VM migration cost: see the history and correlation aware (HCA) VM placement algorithm in the paper
- Workload distribution modeling
- Workload stationarity
- Application-layer SLAs
Please see the discussions in the paper.
Data center workload traces
Traces from 2,525 servers in 10 IT systems; each server is regarded as a VM in the simulations.
Monitoring data: CPU utilization.
Trace length: 1 week at a 15-minute monitoring interval (672 time points).
Simulation methodology
All physical servers have homogeneous hardware specs.
- CPU resource: 3 GHz x 4 (quad-core), the most common CPU model in the traces
- Memory constraint: the maximal number of VMs allowed if the server is memory bound (4, 8, 16, …)
At the beginning of each time window, invoke the server consolidation scheme, using the monitoring information from the previous window to make decisions.
During each time window, evaluate the placement scheme by:
- the number of active servers
- the server overflow probability
p = 5% in the evaluation.
Five server consolidation schemes:
- B1: FFD + average load
- B2: FFD + maximal load
- B3: FFD + VMware DPM VM sizing (μ + 2σ, where μ is the mean and σ the standard deviation)
- B4: FFD + 95th-percentile load
- ES-CA: FFD + effective sizing
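The four baseline sizing rules (B1 to B4) can be sketched on a single utilization trace as follows. The function names, the nearest-rank percentile definition, the use of the population standard deviation, and the sample trace are assumptions made for illustration. ES-CA is not shown because it also depends on the target server and its co-hosted VMs.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def nearest_rank_percentile(samples: Sequence[float], percentile: int) -> float:
    """Nearest-rank percentile: smallest sample with at least percentile% of samples <= it."""
    if not samples or not 0 < percentile <= 100:
        raise ValueError("need a non-empty trace and a percentile in (0, 100]")
    ordered = sorted(samples)
    rank = -(-percentile * len(ordered) // 100)  # ceiling division on integers
    return ordered[max(rank, 1) - 1]


def baseline_sizes(trace: Sequence[float]) -> Dict[str, float]:
    """VM size under each baseline rule, computed from one window of monitoring data."""
    mu = mean(trace)
    sigma = pstdev(trace)  # population standard deviation (an assumption)
    return {
        "B1 average load": mu,
        "B2 maximal load": max(trace),
        "B3 mean + 2 std": mu + 2 * sigma,
        "B4 95th percentile": nearest_rank_percentile(trace, 95),
    }


# Hypothetical trace: utilization samples 1, 2, ..., 100 (arbitrary units).
sizes = baseline_sizes(list(range(1, 101)))
for rule, value in sizes.items():
    print(f"{rule}: {value:.2f}")
```

Feeding each rule's sizes into FFD would give the server count for the corresponding baseline scheme.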
Simulation results
Effective sizing uses:
- 46% fewer servers than max-load sizing
- 23% fewer servers than VMware DPM sizing
- 10% fewer servers than 95th-percentile sizing
Simulation results
Effective sizing (ES-CA) uses:
- 34% fewer servers than max-load sizing
- 16% fewer servers than VMware DPM sizing
- 11% fewer servers than 95th-percentile sizing
Conclusions & Future Work
- Effective sizing, a new VM sizing method for server consolidation.
- O(1)-approximation algorithms for the stochastic bin packing problem.
- Migration-cost and workload-correlation aware VM placement algorithms.
Future work
- Server consolidation in multiple dimensions: CPU, memory, disk, network.