Host Load Prediction in a Google Compute Cloud with a Bayesian Model. Sheng Di (INRIA), Derrick Kondo (INRIA), Walfredo Cirne (Google).


1/28 Host Load Prediction in a Google Compute Cloud with a Bayesian Model. Sheng Di (INRIA), Derrick Kondo (INRIA), Walfredo Cirne (Google)

2/28 Outline
- Motivation of Load Prediction
- Google Load Measurements & Characterization
- Pattern Prediction Formulation
  - Exponentially Segmented Pattern (ESP) Prediction
  - Transformation of Pattern Prediction
- Mean Load Prediction based on Bayes Model
  - Bayes Classifier
  - Features of Load Fluctuation
- Evaluation of Prediction Effect
- Conclusion

3/28 Motivation (Who Needs Load Prediction?)
- From the perspective of on-demand allocation: users' resources and QoS are sensitive to host load.
- From the perspective of system performance (stable vs. unstable load): the system runs best in a load-balanced state, where load bursts can be absorbed as soon as possible.
- From the perspective of green computing: resource consolidation, since shutting down idle machines saves electricity costs.

4/28 Google Load Measurements & Characterization
Overview of the Google trace:
- Google released a one-month trace in Nov. 2011 (40 GB of disk space).
- 10,000+ Google hosts; 670,000 jobs and 25 million tasks in total.
- Task: the basic resource-consumption unit.
- Job: a logical computation object that contains one or more tasks.

5/28 Google Load Measurements & Characterization
Load comparison between Google and Grid (GWA) traces:
- Google host load fluctuates with much higher noise than grid load.
- Mean noise: Google 0.028, more than 20 times that of AuverGrid.

6/28 Pattern Prediction Formulation
Exponentially Segmented Pattern (ESP):
- The host-load fluctuation over a future period is split into a set of consecutive segments whose lengths increase exponentially.
- We predict the mean load l_1, l_2, ... over each segment, based on an evidence window of recent samples.

7/28 Pattern Prediction Formulation (Cont'd)
Reduction of the ESP prediction problem:
- Idea: obtain each segment level l_i from the mean loads (denoted η_i) over the intervals [t_0, t_i]; we can get l_i based on t_0, (t_{i-1}, η_{i-1}), and (t_i, η_i), as sketched below.
Two key steps in the pattern prediction algorithm:
1. Predict the mean load over intervals of length b·2^k from the current point.
2. Transform the set of mean-load predictions into the ESP.
(Figure: a time series split at t_0, t_1, t_2, t_3, t_4 from the current time point.)
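The transformation in step 2 follows from simple accounting of the load mass over nested intervals; a sketch of the algebra, using only the quantities defined above:

```latex
% Segment i spans [t_{i-1}, t_i]; \eta_i is the mean load over [t_0, t_i].
% The total load over [t_0, t_i] is \eta_i (t_i - t_0), so the segment level is
l_i \;=\; \frac{\eta_i\,(t_i - t_0) \;-\; \eta_{i-1}\,(t_{i-1} - t_0)}{t_i - t_{i-1}}
```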

8/28 Traditional Approaches to Mean Load Prediction
- Can a feedback-control model (e.g., a Kalman filter) work? No: one-step look-ahead prediction does not fit our long-interval prediction goal (intervals on the order of 16 hours).
- Can we use short-term prediction errors to steer long-term prediction via feedback? No.
- Can traditional linear models, such as the simple moving average or auto-regression (AR), fit Google host-load prediction?

9/28 Mean Load Prediction based on Bayes Model
Principle of the Bayes model (why Bayes?):
- We rely on the posterior probability of load states given the observed evidence, rather than the prior probability alone.
- Naïve Bayes Classifier (N-BC): predicts the most probable load state.
- Minimized-MSE Bayes Classifier (MMSE-BC): predicts the posterior expectation of the load state (formulas reconstructed below).
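The predicted-value formulas did not survive the transcript; a plausible reconstruction from the definitions above, where e is the evidence-window feature vector and l_1, ..., l_r are the discrete load states:

```latex
\text{N-BC:}\quad \hat{l} \;=\; \arg\max_{l_i}\, P(l_i \mid e)
\qquad
\text{MMSE-BC:}\quad \hat{l} \;=\; \mathbb{E}[\,l \mid e\,] \;=\; \sum_{i=1}^{r} l_i\, P(l_i \mid e)
```

The posterior expectation is the estimator that minimizes mean squared error, hence the MMSE name.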

10/28 Why Do We Use the Bayes Model?
Special advantages of the Bayes model. The Bayes method can:
1. effectively retain important features of load fluctuation and noise, rather than ignoring them;
2. dynamically improve prediction accuracy, as the probabilities are updated with an increasing number of samples;
3. estimate the future with low computational complexity, since the probability calculation is quick;
4. take only limited disk space, since it just needs to keep and update the corresponding probability values.

11/28 Mean Load Prediction based on Bayes Model
Implementation of the Bayes classifier:
- Evidence window: an interval of recent samples up to the current moment.
- States of mean load for the prediction interval: r states (e.g., r = 50 means there are 50 mean-load states to predict: [0, 0.02), [0.02, 0.04), ..., [0.98, 1]).
- Key point: how to extract features from the evidence window? (A sketch of this setup follows below.)
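To make the setup concrete, here is a minimal Python sketch of a count-based Bayes classifier over r = 50 discretized load states. All names are hypothetical, the evidence is abstracted to a single discrete feature value, and Laplace smoothing is an added assumption, not something stated on the slide:

```python
import numpy as np

R = 50                                 # number of mean-load states: [0,0.02), [0.02,0.04), ...
STATE_MID = (np.arange(R) + 0.5) / R   # representative (midpoint) load of each state

def to_state(load):
    """Map a load value in [0, 1] to its discrete state index."""
    return min(int(load * R), R - 1)

class BayesLoadPredictor:
    """Count-based Bayes classifier; a hypothetical sketch, not the paper's code."""

    def __init__(self, n_feature_values, alpha=1.0):
        self.alpha = alpha                                  # Laplace smoothing (assumption)
        self.n_vals = n_feature_values
        self.state_counts = np.zeros(R)                     # for P(state)
        self.cond_counts = np.zeros((R, n_feature_values))  # for P(feature | state)

    def update(self, feature_value, true_mean_load):
        """Online update: record one (evidence feature, observed mean load) pair."""
        s = to_state(true_mean_load)
        self.state_counts[s] += 1
        self.cond_counts[s, feature_value] += 1

    def posterior(self, feature_value):
        prior = (self.state_counts + self.alpha) / (self.state_counts.sum() + self.alpha * R)
        likelihood = (self.cond_counts[:, feature_value] + self.alpha) / \
                     (self.state_counts + self.alpha * self.n_vals)
        post = prior * likelihood
        return post / post.sum()

    def predict_mmse(self, feature_value):
        """MMSE-BC: posterior expectation of the load."""
        return float(self.posterior(feature_value) @ STATE_MID)

    def predict_map(self, feature_value):
        """N-BC: load of the maximum-a-posteriori state."""
        return float(STATE_MID[np.argmax(self.posterior(feature_value))])
```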

12/28 Mean Load Prediction based on Bayes Model
Features of host load in the evidence window (sketches follow below):
1. Mean load state (F_ml(e))
2. Weighted mean load state (F_wml(e))
3. Fairness index (F_fi(e))
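Hedged Python sketches of these three features, assuming the fairness index is Jain's index and a linear weighting for F_wml (the exact weighting is not given on the slide):

```python
import numpy as np

def mean_load(window):
    """F_ml: mean load over the evidence window."""
    return float(np.mean(window))

def weighted_mean_load(window):
    """F_wml: weighted mean giving recent samples more weight
    (linear weights are an assumption here)."""
    w = np.arange(1, len(window) + 1, dtype=float)
    return float(np.dot(w, window) / w.sum())

def fairness_index(window):
    """F_fi: Jain's fairness index; equals 1.0 for a perfectly flat load."""
    x = np.asarray(window, dtype=float)
    return float(x.sum() ** 2 / (len(x) * (x ** 2).sum()))
```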

13/28 Mean Load Prediction based on Bayes Model
4. Noise-decreased fairness index (F_ndfi(e)): load outliers are filtered out before computing the fairness index.
5. Type state (F_ts(e)): captures the degree of jitter. Representation: (α, β), where α = number of types (state levels) and β = number of state changes. A sketch follows below.
(Figure example over the prediction interval: α = 4, β = 8.)
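A small sketch of how (α, β) could be computed from a discretized evidence window, reusing the state discretization from above (the exact level definition is an assumption):

```python
def type_state(window, r=50):
    """F_ts: (alpha, beta) = (# of distinct state levels, # of state changes)."""
    states = [min(int(v * r), r - 1) for v in window]
    alpha = len(set(states))
    beta = sum(1 for a, b in zip(states, states[1:]) if a != b)
    return alpha, beta
```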

14/28 Mean Load Prediction based on Bayes Model
6. First-Last Load (F_fll(e)) = {first load level, last load level}
7. N-segment pattern (F_N-sp(e)): the evidence window is split into N equal segments and the mean load of each is recorded (sketch below), e.g.:
- F_2-sp(e): {0.01, 0.03}
- F_3-sp(e): {0.02, 0.04, 0.04}
- F_4-sp(e): {0.02, 0.02, 0.05, 0.05}
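A one-function sketch of the N-segment pattern, assuming equal-length segments:

```python
import numpy as np

def n_segment_pattern(window, n):
    """F_N-sp: mean load of each of n equal segments of the evidence window."""
    return [float(seg.mean()) for seg in np.array_split(np.asarray(window, dtype=float), n)]
```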

15/28 Mean Load Prediction based on Bayes Model
Correlation of features, measured with:
- the linear (Pearson) correlation coefficient
- the rank (Spearman) correlation coefficient
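The slide's own formulas did not survive the transcript; for reference, the standard definitions for feature samples x_i, y_i with ranks rg(x_i), rg(y_i):

```latex
r_{xy} \;=\; \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
                  {\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}
\qquad
\rho \;=\; 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)},
\quad d_i = \operatorname{rg}(x_i) - \operatorname{rg}(y_i)
```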

16/28 Mean Load Prediction based on Bayes Model
Compatibility of features:
- The features split into four groups: {F_ml, F_wml, F_2-sp, F_3-sp, F_4-sp}, {F_fi, F_ndfi}, {F_ts}, {F_fll}.
- Total number of compatible combinations: see the count sketched below.
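The count itself did not survive the transcript. Under the natural reading that features within a group are mutually exclusive and at most one feature may be picked per group, the arithmetic would be:

```latex
(5+1)(2+1)(1+1)(1+1) - 1 \;=\; 6 \cdot 3 \cdot 2 \cdot 2 - 1 \;=\; 71
```

Here the "-1" removes the empty selection; this reading is an assumption, not a figure taken from the paper.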

17/28 Evaluation of Prediction Effect (Cont'd)
List of well-known load prediction methods compared against (sketches below):
- Simple Moving Average: mean value in the evidence window (EW)
- Linear Weighted Moving Average: linearly weighted moving-average value in the EW
- Exponential Moving Average
- Last-State: use the last state in the EW as the predicted value
- Prior Probability: the value with the highest prior probability
- Auto-Regression (AR): improved recursive AR
- Hybrid Model [27]: Kalman filter + SG filter + AR
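Minimal Python sketches of the simpler baselines. The SMA and LWMA coincide with the mean_load / weighted_mean_load helpers sketched earlier, so only the remaining ones appear here; the smoothing factor and the AR(1) fit are assumptions (the paper's improved recursive AR and the hybrid model are not reproduced):

```python
import numpy as np

def exponential_moving_average(window, gamma=0.9):
    """Exponential moving average; gamma is an assumed smoothing factor."""
    est = float(window[0])
    for v in window[1:]:
        est = gamma * float(v) + (1.0 - gamma) * est
    return est

def last_state(window):
    """Last-State: the most recent observation is the prediction."""
    return float(window[-1])

def ar1_predict(window):
    """One plausible AR baseline: fit x_{t+1} = a*x_t + c by least squares,
    then extrapolate one step."""
    x = np.asarray(window, dtype=float)
    a, c = np.polyfit(x[:-1], x[1:], 1)
    return float(a * x[-1] + c)
```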

18/28 Evaluation of Prediction Effect (Cont'd)
Training and evaluation:
- Evaluation Type A (insufficient samples): training period [day 1, day 25], only 18,000 load samples; test period [day 26, day 29].
- Evaluation Type B (ideal case with sufficient samples): training period [day 1, day 29], emulating a larger sample set; test period [day 26, day 29].

19/28 Evaluation of Prediction Effect (Cont'd)
Evaluation metrics for accuracy (sketched below):
- Mean Squared Error: $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (\hat{\eta}_i - \eta_i)^2$, where the $\eta_i$ are the true mean values and the $\hat{\eta}_i$ the predicted ones.
- Success rate (delta of 10%) in the test period: success rate = (number of accurate predictions) / (total number of predictions).
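A small sketch of both metrics. Whether the 10% tolerance is absolute or relative to the true value is not stated in the transcript, so the relative reading below is an assumption:

```python
import numpy as np

def mse(pred, true):
    """Mean squared error between predicted and true mean loads."""
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    return float(np.mean((pred - true) ** 2))

def success_rate(pred, true, delta=0.10):
    """Fraction of predictions whose error is within delta of the true value."""
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    return float(np.mean(np.abs(pred - true) <= delta * np.abs(true)))
```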

20/28 Evaluation of Prediction Effect (Cont'd)
1. Exploration of the best feature combination (success rate), Evaluation Type A.
Representation of feature combinations: a label such as {F_ml, F_fi} denotes the combination of the mean-load feature and the fairness-index feature.
(Figure: success rate per feature combination; panels (a) s = 3.2 hours, (b) s = 6.4 hours, (c) s = 12.8 hours.)

21/28 Evaluation of Prediction Effect (Cont'd)
1. Exploration of the best feature combination (mean squared error).
(Figure: MSE per feature combination; panels (a) s = 3.2 hours, (b) s = 6.4 hours, (c) s = 12.8 hours.)

22/28 Evaluation of Prediction Effect (Cont'd)
2. Comparison of mean load prediction methods (success rate of CPU load, Evaluation Type A).
(Figure: panels (a) s = 6.4 hours, (b) s = 12.8 hours.)

23/28 Evaluation of Prediction Effect (Cont'd)
2. Comparison of mean load prediction methods (MSE of CPU load, Evaluation Type A).
(Figure: panels (a) s = 6.4 hours, (b) s = 12.8 hours.)

24/28 Evaluation of Prediction Effect (Cont'd)
4. Comparison of mean-load prediction methods (CPU load, Evaluation Type B).
Best feature combination: {F_ml, F_fi, F_ts, F_fll} (mean load + fairness index + type state + first-last load).

25/28 Evaluation of Prediction Effect (Cont'd)
5. Evaluation of the pattern prediction effect.
Metrics: mean error and mean MSE across the predicted segments.

26/28 Evaluation of Prediction Effect (Cont'd)
5. Evaluation of the pattern prediction effect.
(Figure: snapshot of pattern prediction, Evaluation Type A.)

27/28 Conclusion
- Objective: predict the ESP of host-load fluctuation.
- Two-step algorithm: (1) mean load prediction over exponentially growing intervals from the current moment; (2) transformation to the ESP.
- Bayes model (for mean load prediction): exploration of the best-fit combination of features; comparison with 7 other well-known methods.
- The Google trace is used in the experiments:
  - Evaluation Type A: the Bayes model ({F_ml}) outperforms the other methods.
  - Evaluation Type B: {F_ml, F_fi, F_ts, F_fll} is the best combination.
  - MSE of pattern predictions: the majority lie in [10^-8, ...].

28/28 Thanks! Questions?