Dynamic Placement of Virtual Machines for Managing SLA Violations. Norman Bobroff, Andrzej Kochut, Kirk Beaty. IBM T.J. Watson Research Center. IFIP/IEEE International Symposium on Integrated Network Management, 2007. Presented by: Yun Liaw
Outline: Introduction; Server Workload Signatures that Benefit from Dynamic Placement; Forecasting; Placement Algorithm; Simulation Studies; Related Works; Conclusion and Comments. 2019/10/30
Introduction: Low average server utilization is a well-known cost concern in datacenters. To guarantee good performance at periods of peak demand, processing capacity is over-provisioned for many business applications; the strong daily variability of demand then leads to low average utilization. Traditional deployment pattern: one application per OS, and one OS per physical machine. Consolidation at the application and OS levels can mitigate inefficiencies in using physical resources. Application-level consolidation requires considerable skill to ensure isolation between co-hosted applications within an OS image. OS-level consolidation: a hypervisor executes on a physical machine (PM) and presents an abstraction of the underlying hardware to multiple virtual machines (VMs).
Introduction: Server consolidation can be static or dynamic. Static consolidation: historical average resource utilization is the input to an algorithm that maps VMs to PMs; the mapping is computed off-line and stays in place for a long period before it is recomputed. Dynamic consolidation: implemented at a shorter timescale; requires the ability to live-migrate VMs. Live migration: moving a VM from one PM to another without service interruption.
Introduction: This paper proposes a management algorithm for dynamic resource allocation in virtualized server environments. Three parts of the algorithm (MFR): Measuring historical data; Forecasting the future demand; Remapping VMs to PMs.
Introduction: This paper's contributions: a method for classifying workload signatures to identify the servers that benefit most from dynamic migration; a forecasting technique suited to time series of resource demand; an algorithm that remaps VMs to PMs. Goal: minimize the number of PMs. Constraint: support the workload with a rate of SLA violations no greater than a specified p.
The algorithm – Key insight: Consider a single PM that hosts a single VM. The PM's CPU capacity can be dynamically adjusted at time intervals of length τ. Goal: adjust the CPU capacity so that the probability of the VM's CPU demand exceeding the PM's capacity is no greater than p.
The algorithm – Key insight: Figure: time series of the VM's CPU demand Ui, shown alongside the probability density function of the VM's historical CPU demand.
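A minimal sketch of the key insight, not the paper's code. It assumes a synthetic hourly demand trace, τ equal to one sample, and a naive last-value predictor; all three are illustrative choices made here. The per-interval capacity is the forecast plus the (1 − p) quantile of the forecast error, and it is compared with the static capacity taken as the (1 − p) quantile of the whole history.

```python
import numpy as np

def static_capacity(demand, p):
    """Single capacity that historical demand exceeds with probability about p."""
    return float(np.quantile(demand, 1.0 - p))

def dynamic_capacity(demand, p):
    """Per-interval capacity: last-value forecast plus the (1 - p) error quantile."""
    forecast = demand[:-1]            # naive predictor (assumption): next = current
    error = demand[1:] - forecast     # in-sample forecast errors
    return forecast + np.quantile(error, 1.0 - p)

# Synthetic trace (hypothetical): 28 days of hourly CPU demand with a daily cycle.
rng = np.random.default_rng(0)
hours = np.arange(24 * 28)
demand = 50 + 30 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 3, hours.size)

p = 0.05
static = static_capacity(demand, p)
dynamic = dynamic_capacity(demand, p)
violation_rate = float(np.mean(demand[1:] > dynamic))
print(f"static: {static:.1f}  mean dynamic: {dynamic.mean():.1f}  "
      f"violation rate: {violation_rate:.3f}")
```

On a trace like this the time-averaged dynamic capacity sits well below the static one while the violation rate stays near p, because the forecast error is much narrower than the demand distribution.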
Server workload signatures: Properties that make a VM workload suitable for dynamic management: the timescale over which the resource demand varies must exceed the rebalancing interval τ; the resource demand must show large variability; the resource demand must be predictable. A deterministic periodic component, or strong autocorrelation, makes it possible to obtain a low-error predictor.
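The three signature properties can be estimated from a demand trace. The sketch below is an illustration under assumptions: the trace is synthetic, and the helper names `autocorrelation` and `dominant_period` are hypothetical, not taken from the paper. It reports variability as std/mean, the sample autocorrelation, and the strongest period in the periodogram.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:x.size - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def dominant_period(x):
    """Period (in samples) of the strongest non-zero frequency in the periodogram."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size)
    k = int(np.argmax(power[1:])) + 1   # skip the zero-frequency bin
    return 1.0 / freqs[k]

# Synthetic trace (hypothetical): 14 days of hourly demand with a daily cycle.
rng = np.random.default_rng(1)
hours = np.arange(24 * 14)
demand = 50 + 30 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 3, hours.size)

acf = autocorrelation(demand, 24)
print("variability (std/mean):", round(float(np.std(demand) / np.mean(demand)), 2))
print("autocorrelation at lag 24:", round(float(acf[24]), 2))
print("dominant period (samples):", dominant_period(demand))
```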
Figure: workload signatures of three servers, [A], [B] and [C]. Panels: 1: raw CPU demand time series; 2: autocorrelation function; 3: cumulative distribution of demand; 4: periodogram. [A]: strong autocorrelation, large variability and no distinct periodic component. [B]: low autocorrelation, low variability and no distinct periodic component. [C]: strong autocorrelation, large variability and a distinct periodic component.
Analytical formula for gain from dynamic management: Notations: reallocation interval: τ; demand probability density function: u(x); p-percentile of distribution u: Lp; distribution of predicted time series: ūτ(x); p-percentile of the distribution of predictor error: Ep(τ). Gain G(τ):
Forecasting: Goal: build a predictor that forecasts the probability distribution of demand in a future observation interval from historical usage data. Approach (based on [20]): decompose the demand time series $U_i$ into a sum of periodic components $D^j$ and a residual component $r_i$ of the demand, $U_i = \sum_j D^j_i + r_i$, such that $D^j_{i + n p_j} = D^j_i$, where $n$ is an integer and $p_j$ is a period. [20] G. Jenkins, et al., Time Series Analysis: Forecasting and Control, Prentice Hall, 1994
Forecasting: Approach (cont'd): Decouple Dj from the time series Ui: use a low-pass filter to smooth the time series; subdivide the smoothed time series into contiguous intervals of length pj; average the intervals together to form Dj; subtract Dj from Ui. Model the residual ri with an autoregressive process, AR(2): $r_i = \alpha_1 r_{i-1} + \alpha_2 r_{i-2} + \varepsilon_i$, where εi is the error term, represented by a Gaussian variable, and α1, α2 are parameters estimated from the data.
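The decomposition steps can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: a single known period, a 3-sample moving average standing in for the low-pass filter, and ordinary least squares for the AR(2) parameters. The function names are hypothetical.

```python
import numpy as np

def periodic_component(u, period, window=3):
    """Smooth u with a moving average (odd window), then average same-phase samples."""
    padded = np.pad(u, window // 2, mode="edge")
    smooth = np.convolve(padded, np.ones(window) / window, mode="valid")
    n = u.size // period
    profile = smooth[:n * period].reshape(n, period).mean(axis=0)
    return np.tile(profile, n + 1)[:u.size]

def fit_ar2(r):
    """Least-squares estimates of (a1, a2) in r_i = a1*r_{i-1} + a2*r_{i-2} + eps_i."""
    lagged = np.column_stack([r[1:-1], r[:-2]])
    alpha, *_ = np.linalg.lstsq(lagged, r[2:], rcond=None)
    return alpha

def forecast_next(u, period):
    """One-step point forecast: periodic part plus AR(2) prediction of the residual."""
    d = periodic_component(u, period)
    r = u - d
    a1, a2 = fit_ar2(r)
    return float(d[u.size % period] + a1 * r[-1] + a2 * r[-2])

# Synthetic trace (hypothetical): 14 days of hourly demand with a daily cycle.
rng = np.random.default_rng(2)
hours = np.arange(24 * 14)
u = 50 + 30 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 3, hours.size)
print("forecast for the next hour:", round(forecast_next(u, 24), 1))
```

The sketch returns only a point forecast. A forecast of the full distribution, as the slide describes, would also need the variance of the Gaussian error term, which can be estimated from the fit residuals.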
Figure (panels [A], [B], [C]): "The key to achieving a gain from dynamic management is the width of the predictor's error distribution being less than the width of the total demand distribution"
Management algorithm: Management objective: minimize the time-averaged number of active PMs hosting VMs. Constraint: the rate at which demand overloads the resource capacity is bounded by a specified threshold p. The remapping algorithm: a version of the bin-packing problem (NP-hard), solved with a first-fit heuristic approximation.
Management algorithm: Notations: For each VM, use the forecasting approach described earlier to forecast its maximum usage. Sort the VMs in descending order of forecast usage.
Management algorithm: Notations: cp(μ, σ²): the capacity needed to guarantee an error rate less than p. Take each VM off the ordered list in turn and attempt to place it on the PMs. For each target PM, compute the distribution of the sum of the resource demands of all VMs allocated to this PM.
Management algorithm: Notations: Take each VM off the ordered list in turn and attempt to place it on the PMs. If the p-percentile of the summed-demand distribution on a PM is no greater than the capacity of that PM, the VM is allocated to it. If the list of PMs is exhausted without satisfying this condition, the VM is assigned to the PM that has the smallest difference between allocated demands and its capacity.
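The placement steps can be sketched as a first-fit pass. The sketch rests on assumptions made here for illustration: each VM's forecast demand is Gaussian with mean μ and variance σ², VM demands are independent (so means and variances add), and cp(μ, σ²) = μ + z·σ with z the standard normal (1 − p) quantile. The fallback rule reads the slide's "smallest difference" as the least overload; that reading is also an assumption.

```python
from statistics import NormalDist

def required_capacity(mu, var, p):
    """c_p(mu, var): capacity that a Gaussian demand exceeds with probability p."""
    return mu + NormalDist().inv_cdf(1.0 - p) * var ** 0.5

def first_fit(vms, pm_capacities, p):
    """vms: dict name -> (mu, var). Returns dict name -> index of the hosting PM."""
    order = sorted(vms, key=lambda v: required_capacity(*vms[v], p), reverse=True)
    load = [(0.0, 0.0) for _ in pm_capacities]   # summed (mu, var) per PM
    placement = {}
    for name in order:
        mu, var = vms[name]
        # Capacity each PM would need if this VM were added to it.
        need = [required_capacity(m + mu, s + var, p) for m, s in load]
        fits = [k for k, cap in enumerate(pm_capacities) if need[k] <= cap]
        if fits:
            k = fits[0]                           # first PM that satisfies the bound
        else:
            # Fallback (interpretation): the PM that would be least overloaded.
            k = min(range(len(pm_capacities)),
                    key=lambda j: need[j] - pm_capacities[j])
        load[k] = (load[k][0] + mu, load[k][1] + var)
        placement[name] = k
    return placement

# Hypothetical forecasts: (mean, variance) of CPU demand per VM.
vms = {"a": (40, 25), "b": (30, 16), "c": (20, 9), "d": (10, 4)}
print(first_fit(vms, pm_capacities=[100, 100], p=0.05))
```

Because variances add while standard deviations do not, the capacity needed for co-located VMs is smaller than the sum of their individual requirements, which is what lets the heuristic pack more VMs per PM.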
Simulation studies: The methodology is validated using simulations driven by traces gathered from hundreds of production servers running multiple OSes and a broad variety of applications. The simulation studies: verify that MFR meets SLA targets; quantify the reduction in the number of SLA violations; quantify the number of PMs used to support a workload; explore the relationship between the remapping interval and the gain from dynamic management. Measurements on a VMware ESX testbed determine the properties of a practical VM migration.
Simulation studies: Meeting SLA objectives: for a given p, a set of 10 simulations is executed.
Simulation studies: Quantifying the reduction in the rate of SLA violations: a single set of 25 VMs is mapped onto a set of PMs.
Simulation studies: Quantifying the reduction in the number of PMs used: compare the time-averaged number of active PMs required by MFR and by static allocation at a specific target rate p.
Simulation studies: Quantifying the reduction in the number of PMs used: compare the number of PMs required under different target rates p.
Simulation studies: Relationship between the remapping interval and the gain: the quality of the predictor decreases as the prediction horizon lengthens.
Related Works: Virtualization: storage-based virtualization; OS-level virtualization; application-level virtualization. Time series forecasting. Resource management: bin-packing heuristics; resource economy in the datacenter; algorithms that try to minimize the number of migrations.
Conclusions: A management algorithm for dynamic allocation of VMs to PMs, providing probabilistic SLA guarantees, is presented. Time series analysis and a bin-packing heuristic are combined to minimize the number of PMs required to support a workload. A method for characterizing the gain that a given VM can achieve from migration is presented.
Comments: Only one measured attribute: CPU. Some functions are mysterious… for me. The perspectives of recent papers on VM consolidation are similar.
Analytical formula for gain from dynamic management: The G(τ) above can be simplified to: We apply the approximation that: Thus: Note: the mean of the predictor is the same as the mean of the underlying distribution, since the predictor is unbiased.
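The formulas on this slide did not survive extraction, so the sketch below rests on an assumed form: G(τ) ≈ 1 − (E[U] + Ep(τ)) / Lp. That form is consistent with the note about the unbiased predictor but is not confirmed by the text here. Lp and Ep are read as the (1 − p) quantiles of demand and of forecast error, and the function name is hypothetical.

```python
import numpy as np

def estimated_gain(demand, forecast_error, p):
    """Assumed form: G = 1 - (mean demand + E_p) / L_p."""
    l_p = np.quantile(demand, 1.0 - p)          # static capacity for violation rate p
    e_p = np.quantile(forecast_error, 1.0 - p)  # headroom needed above the forecast
    return float(1.0 - (np.mean(demand) + e_p) / l_p)

# Hypothetical numbers: a perfect predictor (zero error) on a small demand sample.
demand = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
print(estimated_gain(demand, np.zeros(5), p=0.0))
```

Under this reading the gain grows as the error quantile Ep(τ) shrinks relative to the distance between the mean and the Lp quantile of demand, which matches the remark that the predictor's error distribution must be narrower than the demand distribution.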