Slide 1: Understanding and Predicting Host Load
Peter A. Dinda, Carnegie Mellon University
http://www.cs.cmu.edu/~pdinda
Slide 2: Talk in a Nutshell
- Load is self-similar
- Load exhibits epochal behavior
- Load prediction benefits from capturing self-similarity
- Statistical analysis of two sets of week-long, 1 Hz resolution traces of load on ~40 machines, and evaluation of linear time series models for load prediction
Slide 3: Why Study Load?
- Load partially determines execution time
- We want to model and predict load
[Diagram: an interactive application submits short tasks with deadlines to an unmodified distributed system; the goal is a predicted execution-time interval [t_min, t_max].]
Slide 4: Load and Execution Time
Slide 5: Outline
- Measurement methodology
- Load traces
- Load variance
- New results
  - Self-similarity
  - Epochal behavior
- Benefits of capturing self-similarity in linear models
- Conclusions
Slide 6: Measurement Methodology
[Diagram: the Digital Unix kernel samples the ready-queue length (len_t, len_{t-T}, ..., len_{t-30T}) every T = 2 seconds and folds it into an exponential average, the 1-minute load "average" (avg_t, avg_{t-0.5T}, avg_{t-T}, ...); a user-level measurement tool records this average at a 1 Hz sample rate.]
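A minimal sketch of the measurement pipeline described on this slide, assuming a kernel-style exponential average of ready-queue length. The `sample_queue_length()` helper and the `decay` constant are hypothetical stand-ins; the slide only specifies the T = 2 second kernel update period and the 1 Hz user-level sample rate.

```python
import random

T = 2.0            # kernel update period in seconds (from the slide)
SAMPLE_RATE = 1.0  # user-level tool samples at 1 Hz (from the slide)

def sample_queue_length():
    """Hypothetical stand-in for reading the ready-queue length (len_t)."""
    return random.randint(0, 4)

def simulate_load_trace(duration_s=60, decay=0.1):
    """Simulate the pipeline: every T seconds the kernel folds the current
    ready-queue length into an exponential average (the load "average");
    every 1/SAMPLE_RATE seconds the user-level tool records that average.
    The decay constant is an assumption; the slide does not give it."""
    avg = 0.0
    trace = []
    next_kernel_update = 0.0
    t = 0.0
    while t < duration_s:
        if t >= next_kernel_update:
            avg = (1.0 - decay) * avg + decay * sample_queue_length()
            next_kernel_update += T
        trace.append(avg)          # the user-level tool reads the current average
        t += 1.0 / SAMPLE_RATE
    return trace

print(simulate_load_trace(duration_s=10))
```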
Slide 7: Load Traces
Slide 8: Absolute Variation
Slide 9: Relative Variation
Slide 10: Load Autocorrelation and Periodogram
[Plots: autocorrelation vs. time lag, and periodogram vs. frequency.]
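A sketch of how the two plots on this slide could be computed for a load trace, using only numpy; the `trace` variable is a placeholder for one of the week-long 1 Hz traces.

```python
import numpy as np

def autocorrelation(trace, max_lag):
    """Sample autocorrelation of a load trace for lags 0..max_lag."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def periodogram(trace, sample_rate=1.0):
    """Periodogram (squared FFT magnitude) versus frequency, for 1 Hz sampling."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    spectrum = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    return freqs, spectrum

# Example with a synthetic trace; the real traces are one week at 1 Hz.
trace = np.random.rand(3600)
acf = autocorrelation(trace, max_lag=600)
freqs, spectrum = periodogram(trace)
```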
Slide 11: Visual Self-Similarity
Slide 12: The Hurst Parameter
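The slides report Hurst parameters but do not say which estimator was used; below is a sketch of one common choice, the aggregated-variance method, where H is read off the slope of log(variance of the m-aggregated series) versus log(m). This is an illustrative assumption, not necessarily the talk's estimator.

```python
import numpy as np

def hurst_aggregated_variance(trace, block_sizes=(2, 4, 8, 16, 32, 64, 128)):
    """Estimate the Hurst parameter H with the aggregated-variance method:
    for each block size m, average the series over non-overlapping blocks and
    compute the variance of the aggregated series; for a self-similar process
    that variance scales as m**(2H - 2)."""
    x = np.asarray(trace, dtype=float)
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        if n_blocks < 2:
            continue
        blocks = x[:n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(blocks.var()))
    slope, _ = np.polyfit(log_m, log_var, 1)
    return 1.0 + slope / 2.0   # slope = 2H - 2

# H near 0.5 for uncorrelated noise; between 0.5 and 1 for self-similar load.
print(hurst_aggregated_variance(np.random.rand(100000)))
```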
Slide 13: Self-Similarity Statistics
Slide 14: Why is Self-Similarity Important?
- Complex structure
  - Not completely random, nor independent
  - Short-range dependence: excellent for history-based prediction
  - Long-range dependence: possibly a problem
- Modeling implications
  - Suggests models that can capture it: ARFIMA, FGN, TAR
Slide 15: Load Exhibits Epochal Behavior
Slide 16: Epoch Length Statistics
Slide 17: Why is Epochal Behavior Important?
- Complex structure
  - Non-stationary
- Modeling implications
  - Suggests models: ARIMA, ARFIMA, etc.; non-parametric spectral methods
  - Suggests problem decomposition
Slide 18: Linear Time Series Models
- Choose the filter weights to minimize a²
- a is the confidence interval for t+1 predictions
[Diagram: an unpredictable random sequence passes through a fixed linear filter to produce the partially predictable load sequence.]
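A minimal sketch of the "fixed linear filter" idea for the simplest member of the family, an AR(p) predictor fitted by least squares. The order p = 4, the train/test split, and forming the interval as 1.96 times the residual standard deviation are assumptions for illustration, not the talk's exact procedure.

```python
import numpy as np

def fit_ar(trace, p):
    """Fit AR(p) weights by least squares: predict z_t from z_{t-1}..z_{t-p}."""
    z = np.asarray(trace, dtype=float)
    rows = np.array([z[t - p:t][::-1] for t in range(p, len(z))])
    targets = z[p:]
    weights, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    return weights

def one_step_predictions(trace, weights):
    """Apply the fitted filter to produce t+1 predictions."""
    z = np.asarray(trace, dtype=float)
    p = len(weights)
    return np.array([np.dot(weights, z[t - p:t][::-1]) for t in range(p, len(z))])

# Example: fit on one part of a trace, measure t+1 error on the rest.
trace = np.random.rand(5000)                     # placeholder for a load trace
w = fit_ar(trace[:4000], p=4)
pred = one_step_predictions(trace[4000:], w)
resid = trace[4000 + 4:] - pred
a = 1.96 * resid.std()                           # assumed form of the t+1 interval
print(f"AR(4) 95% interval half-width: {a:.3f}")
```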
Slide 19: Realizable Pole-Zero Models
- Model family: MA(q), AR(p), ARMA(p,q), ARIMA(p,d,q), ARFIMA(p,d,q)
- p and q are the numbers of model parameters
- d is the degree of differencing
- ARIMA: integer d captures non-stationarity
- ARFIMA: fractional d captures self-similarity (d is related to the Hurst parameter)
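If a library is acceptable, the ARMA/ARIMA members of this hierarchy can be fitted directly; the sketch below uses statsmodels, which is an assumption (the talk's evaluation did not use this library), and its ARIMA class does not cover ARFIMA's fractional d.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

trace = np.random.rand(2000)   # placeholder for a 1 Hz host-load trace

# ARMA(4,4) is ARIMA with d = 0; ARIMA(4,1,4) differences once (integer d).
arma_fit = ARIMA(trace, order=(4, 0, 4)).fit()
arima_fit = ARIMA(trace, order=(4, 1, 4)).fit()

# One-step-ahead (t+1) forecast from each fitted model.
print(arma_fit.forecast(steps=1), arima_fit.forecast(steps=1))
```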
Slide 20: Real-World Benefits of Models
- a is the confidence interval for t+1 predictions
- Map work that would take 100 ms at zero load onto a predicted execution-time range
- axp0: σ_z = 0.54, mean load = 1.0; a(ARMA(4,4)) = 0.109, a(ARFIMA(4,d,4)) = 0.108
  - no model: 1.0 +/- 1.06 (95%) => 100 to 306 ms
  - ARMA: 1.0 +/- 0.22 (95%) => 178 to 222 ms
  - ARFIMA: 1.0 +/- 0.21 (95%) => 179 to 221 ms (~1% tighter than ARMA)
- axp7: σ_z = 0.14, mean load = 0.12; a(ARMA(4,4)) = 0.041, a(ARFIMA(4,d,4)) = 0.025
  - no model: 0.12 +/- 0.27 (95%) => 100 to 139 ms
  - ARMA: 0.12 +/- 0.08 (95%) => 104 to 120 ms
  - ARFIMA: 0.12 +/- 0.05 (95%) => 107 to 117 ms (~40% tighter than ARMA)
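The execution-time ranges on this slide are consistent with the simple mapping exec_time ≈ nominal × (1 + predicted load), with the lower end of the load interval clamped at zero. A small check of the axp0 numbers under that assumption:

```python
def exec_time_range(nominal_ms, load_mean, half_width):
    """Map a predicted load interval onto an execution-time interval,
    assuming exec_time = nominal * (1 + load) and load >= 0."""
    lo = nominal_ms * (1.0 + max(load_mean - half_width, 0.0))
    hi = nominal_ms * (1.0 + load_mean + half_width)
    return lo, hi

# axp0: mean load 1.0; no model +/-1.06, ARMA +/-0.22, ARFIMA +/-0.21
print(exec_time_range(100, 1.0, 1.06))  # (100.0, 306.0) -> 100 to 306 ms
print(exec_time_range(100, 1.0, 0.22))  # (178.0, 222.0) -> 178 to 222 ms
print(exec_time_range(100, 1.0, 0.21))  # (179.0, 221.0) -> 179 to 221 ms
```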
Slide 21: t+1 Prediction
Slide 22: t+8 Prediction
Slide 23: Conclusions
- Load has high variance
- Load is self-similar
- Load exhibits epochal behavior
- Capturing self-similarity in linear time series models improves predictability
Slide 24: Load Traces
- Would a web-accessible load trace database be useful?
- Would you like to contribute?