Slide 1: Online Prediction of the Running Time of Tasks
Peter A. Dinda, Department of Computer Science, Northwestern University
http://www.cs.northwestern.edu/~pdinda
Slide 2: Overview
- Predict the running time of a task
- Application supplies the task size (0.1-10 seconds currently)
- Task is compute-bound (current limit)
- Prediction is a confidence interval
  - Expresses prediction error
  - Enables statistically valid decision-making in a scheduler
- Based on host load prediction
- Homogeneous Digital Unix hosts (current limit)
  - System is portable to many operating systems
- Everything in this talk is publicly available
Slide 3: Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions
Slide 4: A Universal Challenge in High Performance Distributed Applications
- Highly variable resource availability
  - Shared resources
  - No reservations
  - No globally respected priorities
  - Competition from other users ("background workload")
- Running time can vary drastically
- Adaptation
  - Example goal: soft real-time for interactivity
  - Example mechanism: server selection
- Performance queries
Slide 5: Running Time Advisor (RTA)
- The application asks the host: "What will be the running time of this 3-second task if started now?"
- Nominal time: the running time on an empty host, i.e., the task size
- The host, which is running a background workload, answers: "It will be 5.3 seconds"
- Entirely user-level tool
- No reservations or admission control
- The query result is a prediction
Slide 6: Variability and Prediction
- Exchange high resource-availability variability for low prediction-error variability, plus a characterization of that variability
- (Diagram: resource signal over time t, prediction, error signal, and its ACF)
Slide 7: Running Time Advisor (RTA)
- The application asks the host: "With 95% confidence, what will be the running time of this 3-second task if started now?"
- The host, which is running a background workload, answers: "It will be 4.1 to 6.3 seconds"
- The CI captures prediction error to the extent the application is interested in it
- Independent of prediction technique
Slide 8: RTA API (interface diagram; a hypothetical query sketch follows below)
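The slide shows the API as a diagram only, so here is a minimal sketch of what a running-time-advisor query could look like. The names `predict_running_time` and `RunningTimePrediction` are illustrative assumptions, not the actual RPS/RTA interface.

```python
from dataclasses import dataclass

@dataclass
class RunningTimePrediction:
    """Result of a running-time query: a confidence interval, not a guarantee."""
    lower: float       # lower bound of the CI, in seconds
    upper: float       # upper bound of the CI, in seconds
    confidence: float  # confidence level requested, e.g. 0.95

def predict_running_time(host: str, tnom: float,
                         confidence: float = 0.95) -> RunningTimePrediction:
    """Ask the advisor on `host` how long a task with nominal time `tnom`
    (its running time on an unloaded machine) would take if started now."""
    ...  # query the host's predictor; the CI computation is covered later in the talk

# Example query, mirroring slide 7:
# ci = predict_running_time("host-a", tnom=3.0, confidence=0.95)
# -> e.g. RunningTimePrediction(lower=4.1, upper=6.3, confidence=0.95)
```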
Slide 9: Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions
Slide 10: Host Load Traces
- DEC Unix 5-second exponential average
- Full bandwidth captured (1 Hz sample rate)
- Long durations
- http://www.cs.northwestern.edu/~pdinda/LoadTraces
Slide 11: Host Load Properties
- Self-similarity: long-range dependence
- Epochal behavior: non-stationarity
- Complex correlation structure
- [LCR '98; Scientific Programming, 3:4, 1999]
Slide 12: Host Load Prediction
- Fully randomized study on the traces
- MEAN, LAST, AR, MA, ARMA, ARIMA, ARFIMA models
- AR(16) models most appropriate (a fitting sketch follows below)
- Covariance matrix for the prediction errors
- Low overhead: <1% CPU
- [HPDC '99; Cluster Computing, 3:4, 2000]
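A minimal sketch of fitting an AR(16) model to a 1 Hz load signal by least squares and producing a one-step-ahead prediction. This is a generic AR fit (no intercept term), not the RPS implementation, and the trace file name is illustrative.

```python
import numpy as np

P = 16  # AR model order found most appropriate in the study

def fit_ar(load: np.ndarray, p: int = P) -> np.ndarray:
    """Least-squares fit of AR(p) coefficients to a load time series."""
    # Row i of X holds the lagged window [z_{t-1}, ..., z_{t-p}] for target y_i = z_t.
    X = np.column_stack([load[p - k - 1 : len(load) - k - 1] for k in range(p)])
    y = load[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(load: np.ndarray, coef: np.ndarray) -> float:
    """One-step-ahead prediction from the last p samples."""
    p = len(coef)
    return float(coef @ load[-1 : -p - 1 : -1])  # most recent sample first

# Usage (file name is hypothetical):
# z = np.loadtxt("host_load_trace.txt")   # 1 Hz load samples
# coef = fit_ar(z)
# z_hat = predict_next(z, coef)
```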
Slide 13: RPS Toolkit
- Extensible toolkit for implementing resource signal prediction systems
- Easy "buy-in" for users
  - C++ and sockets (no threads)
  - Prebuilt prediction components
  - Libraries (sensors, time series, communication)
- Users have bought in
  - Incorporated in CMU Remos and BBN QuO
- http://www.cs.northwestern.edu/~RPS
- [CMU-CS-99-138]
Slide 14: Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions
Slide 15: A Model of the Unix Scheduler
- A task with nominal running time t_nom enters the Unix scheduler alongside the background workload (the actual load) and completes with actual running time t_act
- t_act = f(t_nom, background workload)
Slide 16: A Model of the Unix Scheduler
- Replacing the actual load with the predicted load yields a predicted running time:
- $\hat{t}_{\mathrm{exp}} = g(t_{\mathrm{nom}}, \widehat{\mathrm{load}}) = t_{\mathrm{act}} + \mathrm{error}$
Slide 17: Available Time and Average Load
- at(t): available time from 0 to t; al(t): average load from 0 to t
- t_act is the minimum t where at(t) = t_nom
- Fluid model: processor sharing, idealized round-robin, ... (formulas sketched below)
- The load signal is then replaced with a prediction of the load signal
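The slide's formulas are shown only graphically, so the following is a sketch of the fluid-model relationship under processor sharing, assuming the background load z(s) excludes the task itself (so the task receives a 1/(1+z(s)) share of the CPU). The notation approximates, but may not exactly match, the slide's.

```latex
% Fluid / processor-sharing model (sketch)
\begin{aligned}
\mathrm{at}(t) &= \int_{0}^{t} \frac{ds}{1 + z(s)}
  &&\text{(available CPU time in } [0,t])\\
\mathrm{al}(t) &= \frac{1}{t}\int_{0}^{t} z(s)\, ds
  &&\text{(average load in } [0,t])\\
t_{\mathrm{act}} &= \min \{\, t : \mathrm{at}(t) = t_{\mathrm{nom}} \,\}
  &&\text{(task finishes when its nominal time is available)}
\end{aligned}
```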
Slide 18: Discrete Time
- No magic here: this is the obvious discretization (see the sketch below)
- Δ is the sample interval
- Each load sample $z_{t+j}$ is replaced with its prediction
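A sketch of that discretization: accumulate available time sample by sample until it reaches t_nom, using predicted load samples in place of the unknown future load. Variable names are illustrative.

```python
import numpy as np

def predict_tact(z_pred: np.ndarray, tnom: float, delta: float = 1.0) -> float:
    """Estimate the running time of a task with nominal time `tnom`.

    z_pred : predicted load samples z_hat_{t+1}, z_hat_{t+2}, ... (one per interval)
    delta  : sample interval in seconds (1 Hz traces -> delta = 1.0)
    """
    at = 0.0  # available time accumulated so far
    for j, z in enumerate(z_pred, start=1):
        step = delta / (1.0 + z)  # CPU time the task gets during this interval
        if at + step >= tnom:
            # Interpolate within the interval where at(t) crosses tnom.
            return (j - 1) * delta + (tnom - at) / step * delta
        at += step
    raise ValueError("prediction horizon too short for this task")
```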
Slide 19: Confidence Intervals
- Each $z_{t+j}$ is replaced by its prediction $\hat{z}_{t+j}$, giving predicted values of $al_i$, $at_i$, and $at(t)$
- A confidence interval for $at(t)$ reduces to a confidence interval for the $al_i$, i.e., for a sum of prediction errors
- Since it is a sum, the central limit theorem applies
- A 95% confidence interval then follows from the variance of that sum (see below)
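The slide's closing equation is not reproduced in the text; a sketch of the standard form it leads to, assuming the aggregate prediction error is approximately normal by the central limit theorem:

```latex
% Sketch: the estimate of at(t) involves a sum of predicted load samples,
% so its error is a sum of prediction errors; by the CLT it is roughly normal.
% A 95% confidence interval for the sum S = \sum_j z_{t+j} is then
\hat{S} \;\pm\; 1.96\,\sqrt{\operatorname{Var}\!\big(\hat{S} - S\big)}
% and the running-time CI follows by mapping this load CI through the
% scheduler model of slides 15-18.
```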
Slide 20: The Variance of the Sum
- The prediction errors for the $z_{t+j}$ are not independent
- The predictor's covariance matrix captures this
- The predictor therefore makes it possible to compute this variance, and thus the CI (see the sketch below)
- Important detail: load discounting
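Because the step-ahead prediction errors are correlated, the variance of their sum must include the off-diagonal covariances. A minimal sketch using the predictor's error covariance matrix (which an AR-style predictor can supply):

```python
import numpy as np

def variance_of_sum(cov: np.ndarray, k: int) -> float:
    """Variance of the sum of the first k step-ahead prediction errors.

    cov : covariance matrix of the step-ahead errors, cov[i, j] = Cov(e_{i+1}, e_{j+1}).
    Var(e_1 + ... + e_k) is the sum of every entry of the leading k x k block,
    which keeps the off-diagonal terms an independence assumption would drop.
    """
    return float(cov[:k, :k].sum())

def ci_halfwidth(cov: np.ndarray, k: int, z: float = 1.96) -> float:
    """Half-width of a 95% CI (z = 1.96) for the summed prediction error."""
    return z * np.sqrt(variance_of_sum(cov, k))
```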
Slide 21: Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions
Slide 22: Experimental Setup
- Environment
  - AlphaStation 255s, Digital Unix 4.0
  - Workload: host load trace playback [LCR 2000]
  - Prediction system on each host: AR(16), MEAN, LAST
- Tasks
  - Nominal time ~ U(0.1, 10) seconds
  - Interarrival time ~ U(5, 15) seconds
  - 95% confidence level
- Methodology
  - Predict CIs
  - Run the task and measure
- http://www.cs.northwestern.edu/~pdinda/LoadTraces/playload
Slide 23: Metrics (computed as sketched below)
- Coverage: fraction of test cases whose measured running time falls within the predicted confidence interval; ideally equals the 95% target
- Span: average length of the confidence interval; ideally as short as possible
- $R^2$ between $t_{\mathrm{exp}}$ and $t_{\mathrm{act}}$
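These metrics are easy to state precisely; a sketch of computing them from a batch of test cases (array names are illustrative):

```python
import numpy as np

def coverage(t_act: np.ndarray, lower: np.ndarray, upper: np.ndarray) -> float:
    """Fraction of measured running times that fall inside their predicted CIs.
    Ideally equals the target confidence level (here, 0.95)."""
    return float(np.mean((t_act >= lower) & (t_act <= upper)))

def span(lower: np.ndarray, upper: np.ndarray) -> float:
    """Average CI length; shorter is better, provided coverage is maintained."""
    return float(np.mean(upper - lower))

def r_squared(t_exp: np.ndarray, t_act: np.ndarray) -> float:
    """R^2 between point predictions and measured running times."""
    ss_res = np.sum((t_act - t_exp) ** 2)
    ss_tot = np.sum((t_act - t_act.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```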
Slide 24: General Picture of Results
- Five classes of behavior; I'll show you two
- The RTA works: coverage near 95% is possible in most cases
- Predictor quality matters: better predictors lead to smaller spans on lightly loaded hosts and to correct coverage on heavily loaded hosts
- AR(16) >= LAST >= MEAN
- Performance is slightly dependent on nominal time
Slide 25: Most Common Coverage Behavior (chart)
Slide 26: Most Common Span Behavior (chart)
Slide 27: Uncommon Coverage Behavior (chart)
Slide 28: Uncommon Span Behavior (chart)
Slide 29: Related Work
- Distributed interactive applications: QuakeViz/Dv, Aeschlimann et al. [PDPTA '99]
- Quality of service: QuO, Zinky, Bakken, Schantz [TPOS, April '97]; QRAM, Rajkumar et al. [RTSS '97]
- Distributed soft real-time systems: Lawrence, Jensen [assorted]
- Workload studies for load balancing: Mutka et al. [PerfEval '91]; Harchol-Balter et al. [SIGMETRICS '96]
- Resource signal measurement systems: Remos [HPDC '98]; Network Weather Service [HPDC '97, HPDC '99]
- Host load prediction: Wolski et al. [HPDC '99] (NWS); Samadani et al. [PODC '95]; Hailperin ['93]
- Application-level scheduling: Berman et al. [HPDC '96]; stochastic scheduling, Schopf [Supercomputing '99]
Slide 30: Conclusions
- Predict the running time of a compute-bound task
- Based on host load prediction
- Prediction is a confidence interval
- Confidence interval algorithm uses the predictor's covariance matrix and load discounting
- Effective for the domain: Digital Unix, 0.1-10 second tasks, 5-15 second interarrivals
- Extensions in progress
Slide 31: For More Information
- All software and traces are available
- RPS + RTA + RTSA: http://www.cs.northwestern.edu/~RPS
- Load traces and playback: http://www.cs.northwestern.edu/~pdinda/LoadTraces
- Prescience Lab (Peter Dinda, Jason Skicewicz, Dong Lu): http://www.cs.northwestern.edu/~plab
Slide 32: Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions
Slide 33: A Universal Problem
- Which host should the application send the task to so that its running time is appropriate?
- Resource requirements are known
- "What will the running time be if I...?"
- Example: real-time
Slide 34: Running Time Advisor
- Application notifies the advisor of the task's computational requirements (nominal time)
- Advisor predicts the running time on each host
- Application assigns the task to the most appropriate host
Slide 35: Real-time Scheduling Advisor
- Application specifies the task's computational requirements (nominal time) and its deadline
- Advisor acquires predicted task running times for all hosts
- Advisor chooses one of the hosts where the deadline can be met
Slide 36: Confidence Intervals to Characterize Variability
- Application specifies a confidence level (e.g., 95%)
- Running time advisor predicts running times as a confidence interval (CI), e.g., "3 to 5 seconds with 95% confidence"
- Real-time scheduling advisor chooses a host where the CI falls below the deadline (see the sketch below)
- The CI captures variability to the extent the application is interested in it
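A sketch of that selection rule: query every host for a CI and pick one whose upper bound beats the deadline. The choice among qualifying hosts (random here) is an assumption, as is the `predict` callable, which stands in for an advisor query like the one sketched after the RTA API slide.

```python
import random
from typing import Callable, Optional, Sequence

def choose_host(hosts: Sequence[str], tnom: float, deadline: float,
                predict: Callable, confidence: float = 0.95) -> Optional[str]:
    """Pick a host whose predicted running-time CI fits within the deadline.

    predict(host, tnom, confidence) must return an object with
    `lower`/`upper` CI bounds at the requested confidence level.
    """
    candidates = [h for h in hosts
                  if predict(h, tnom, confidence).upper <= deadline]
    # Every candidate meets the deadline at the requested confidence level;
    # picking randomly spreads load across qualifying hosts.
    return random.choice(candidates) if candidates else None
```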
Slide 37: Prototype System (block diagram; the "This Paper" label marks the components covered in this talk)
Slide 38: Load Discounting Motivation
- I/O priority boost
- Short tasks are less affected by load
Slide 39: Load Discounting
- Apply before using the load predictions (illustrated below)
- The discount is an estimable machine property
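The slides do not give the discounting formula; purely as an illustration of where it fits in the pipeline, one plausible form subtracts a per-machine discount from each predicted load sample before the available-time computation. The constant-subtraction form is an assumption, not the paper's actual rule.

```python
import numpy as np

def discount_load(z_pred: np.ndarray, discount: float) -> np.ndarray:
    """Apply a machine-specific discount to predicted load before use.

    ASSUMED form: subtract a constant discount (estimated per machine) and
    clamp at zero. The real discounting rule may differ; the point is only
    that it is applied to the load predictions before computing available
    time, reflecting the scheduler's priority boost for short tasks.
    """
    return np.maximum(z_pred - discount, 0.0)
```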