Slide 1: Online Prediction of the Running Time of Tasks
Peter A. Dinda, Department of Computer Science, Northwestern University
http://www.cs.northwestern.edu/~pdinda
Slide 2: Overview
- Predict the running time of a task
- Application supplies the task size (0.1-10 seconds currently)
- Task is compute-bound (current limit)
- Prediction is a confidence interval
  - Expresses prediction error
  - Enables statistically valid decision-making in a scheduler
- Based on host load prediction
- Homogeneous Digital Unix hosts (current limit)
  - System is portable to many operating systems
- Everything in this talk is publicly available
Slide 3: Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions
Slide 4: A Universal Challenge in High Performance Distributed Applications
- Highly variable resource availability
  - Shared resources
  - No reservations
  - No globally respected priorities
  - Competition from other users ("background workload")
- Running time can vary drastically
- Adaptation
  - Example goal: soft real-time for interactivity
  - Example mechanism: server selection
- Performance queries
Slide 5: Running Time Advisor (RTA)
- The application asks the host: "What will be the running time of this 3-second task if started now?"
- Nominal time: the running time on an empty host, i.e., the task size
- The host, which is running a background workload, answers: "It will be 5.3 seconds"
- Entirely user-level tool
- No reservations or admission control
- The query result is a prediction
Slide 6: Variability and Prediction
- Exchange high resource-availability variability for low prediction-error variability, plus a characterization of that variability
- (Diagram: resource signal over time t, prediction, error signal, and its ACF)
Slide 7: Running Time Advisor (RTA)
- The application asks the host: "With 95% confidence, what will be the running time of this 3-second task if started now?"
- The host, which is running a background workload, answers: "It will be 4.1 to 6.3 seconds"
- The CI captures prediction error to the extent the application is interested in it
- Independent of prediction technique
Slide 8: RTA API (interface diagram; a hypothetical query sketch follows below)
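The slide shows the API as a diagram only, so here is a minimal sketch of what a running-time-advisor query could look like. The names `predict_running_time` and `RunningTimePrediction` are illustrative assumptions, not the actual RPS/RTA interface.

```python
from dataclasses import dataclass

@dataclass
class RunningTimePrediction:
    """Result of a running-time query: a confidence interval, not a guarantee."""
    lower: float       # lower bound of the CI, in seconds
    upper: float       # upper bound of the CI, in seconds
    confidence: float  # confidence level requested, e.g. 0.95

def predict_running_time(host: str, tnom: float,
                         confidence: float = 0.95) -> RunningTimePrediction:
    """Ask the advisor on `host` how long a task with nominal time `tnom`
    (its running time on an unloaded machine) would take if started now."""
    ...  # query the host's predictor; the CI computation is covered later in the talk

# Example query, mirroring slide 7:
# ci = predict_running_time("host-a", tnom=3.0, confidence=0.95)
# -> e.g. RunningTimePrediction(lower=4.1, upper=6.3, confidence=0.95)
```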
Slide 9: Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions
Slide 10: Host Load Traces
- DEC Unix 5-second exponential average
- Full bandwidth captured (1 Hz sample rate)
- Long durations
- http://www.cs.northwestern.edu/~pdinda/LoadTraces
Slide 11: Host Load Properties
- Self-similarity: long-range dependence
- Epochal behavior: non-stationarity
- Complex correlation structure
- [LCR '98; Scientific Programming, 3:4, 1999]
Slide 12: Host Load Prediction
- Fully randomized study on the traces
- MEAN, LAST, AR, MA, ARMA, ARIMA, ARFIMA models
- AR(16) models most appropriate (a fitting sketch follows below)
- Covariance matrix for the prediction errors
- Low overhead: <1% CPU
- [HPDC '99; Cluster Computing, 3:4, 2000]
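A minimal sketch of fitting an AR(16) model to a 1 Hz load signal by least squares and producing a one-step-ahead prediction. This is a generic AR fit (no intercept term), not the RPS implementation, and the trace file name is illustrative.

```python
import numpy as np

P = 16  # AR model order found most appropriate in the study

def fit_ar(load: np.ndarray, p: int = P) -> np.ndarray:
    """Least-squares fit of AR(p) coefficients to a load time series."""
    # Row i of X holds the lagged window [z_{t-1}, ..., z_{t-p}] for target y_i = z_t.
    X = np.column_stack([load[p - k - 1 : len(load) - k - 1] for k in range(p)])
    y = load[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(load: np.ndarray, coef: np.ndarray) -> float:
    """One-step-ahead prediction from the last p samples."""
    p = len(coef)
    return float(coef @ load[-1 : -p - 1 : -1])  # most recent sample first

# Usage (file name is hypothetical):
# z = np.loadtxt("host_load_trace.txt")   # 1 Hz load samples
# coef = fit_ar(z)
# z_hat = predict_next(z, coef)
```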
Slide 13: RPS Toolkit
- Extensible toolkit for implementing resource signal prediction systems
- Easy "buy-in" for users
  - C++ and sockets (no threads)
  - Prebuilt prediction components
  - Libraries (sensors, time series, communication)
- Users have bought in
  - Incorporated in CMU Remos and BBN QuO
- http://www.cs.northwestern.edu/~RPS
- [CMU-CS-99-138]
Slide 14: Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions
Slide 15: A Model of the Unix Scheduler
- A task with nominal running time t_nom enters the Unix scheduler alongside the background workload (the actual load) and completes with actual running time t_act
- t_act = f(t_nom, background workload)
Slide 16: A Model of the Unix Scheduler
- Replacing the actual load with the predicted load yields a predicted running time:
- $\hat{t}_{\mathrm{exp}} = g(t_{\mathrm{nom}}, \widehat{\mathrm{load}}) = t_{\mathrm{act}} + \mathrm{error}$
Slide 17: Available Time and Average Load
- at(t): available time from 0 to t; al(t): average load from 0 to t
- t_act is the minimum t where at(t) = t_nom
- Fluid model: processor sharing, idealized round-robin, ... (formulas sketched below)
- The load signal is then replaced with a prediction of the load signal
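The slide's formulas are shown only graphically, so the following is a sketch of the fluid-model relationship under processor sharing, assuming the background load z(s) excludes the task itself (so the task receives a 1/(1+z(s)) share of the CPU). The notation approximates, but may not exactly match, the slide's.

```latex
% Fluid / processor-sharing model (sketch)
\begin{aligned}
\mathrm{at}(t) &= \int_{0}^{t} \frac{ds}{1 + z(s)}
  &&\text{(available CPU time in } [0,t])\\
\mathrm{al}(t) &= \frac{1}{t}\int_{0}^{t} z(s)\, ds
  &&\text{(average load in } [0,t])\\
t_{\mathrm{act}} &= \min \{\, t : \mathrm{at}(t) = t_{\mathrm{nom}} \,\}
  &&\text{(task finishes when its nominal time is available)}
\end{aligned}
```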
Slide 18: Discrete Time
- No magic here: this is the obvious discretization (see the sketch below)
- Δ is the sample interval
- Each load sample $z_{t+j}$ is replaced with its prediction
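A sketch of that discretization: accumulate available time sample by sample until it reaches t_nom, using predicted load samples in place of the unknown future load. Variable names are illustrative.

```python
import numpy as np

def predict_tact(z_pred: np.ndarray, tnom: float, delta: float = 1.0) -> float:
    """Estimate the running time of a task with nominal time `tnom`.

    z_pred : predicted load samples z_hat_{t+1}, z_hat_{t+2}, ... (one per interval)
    delta  : sample interval in seconds (1 Hz traces -> delta = 1.0)
    """
    at = 0.0  # available time accumulated so far
    for j, z in enumerate(z_pred, start=1):
        step = delta / (1.0 + z)  # CPU time the task gets during this interval
        if at + step >= tnom:
            # Interpolate within the interval where at(t) crosses tnom.
            return (j - 1) * delta + (tnom - at) / step * delta
        at += step
    raise ValueError("prediction horizon too short for this task")
```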
Slide 19: Confidence Intervals
- Each $z_{t+j}$ is replaced by its prediction $\hat{z}_{t+j}$, giving predicted values of $al_i$, $at_i$, and $at(t)$
- A confidence interval for $at(t)$ reduces to a confidence interval for the $al_i$, i.e., for a sum of prediction errors
- Since it is a sum, the central limit theorem applies
- A 95% confidence interval then follows from the variance of that sum (see below)
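The slide's closing equation is not reproduced in the text; a sketch of the standard form it leads to, assuming the aggregate prediction error is approximately normal by the central limit theorem:

```latex
% Sketch: the estimate of at(t) involves a sum of predicted load samples,
% so its error is a sum of prediction errors; by the CLT it is roughly normal.
% A 95% confidence interval for the sum S = \sum_j z_{t+j} is then
\hat{S} \;\pm\; 1.96\,\sqrt{\operatorname{Var}\!\big(\hat{S} - S\big)}
% and the running-time CI follows by mapping this load CI through the
% scheduler model of slides 15-18.
```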
Slide 20: The Variance of the Sum
- The prediction errors for the $z_{t+j}$ are not independent
- The predictor's covariance matrix captures this
- The predictor therefore makes it possible to compute this variance, and thus the CI (see the sketch below)
- Important detail: load discounting
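Because the step-ahead prediction errors are correlated, the variance of their sum must include the off-diagonal covariances. A minimal sketch using the predictor's error covariance matrix (which an AR-style predictor can supply):

```python
import numpy as np

def variance_of_sum(cov: np.ndarray, k: int) -> float:
    """Variance of the sum of the first k step-ahead prediction errors.

    cov : covariance matrix of the step-ahead errors, cov[i, j] = Cov(e_{i+1}, e_{j+1}).
    Var(e_1 + ... + e_k) is the sum of every entry of the leading k x k block,
    which keeps the off-diagonal terms an independence assumption would drop.
    """
    return float(cov[:k, :k].sum())

def ci_halfwidth(cov: np.ndarray, k: int, z: float = 1.96) -> float:
    """Half-width of a 95% CI (z = 1.96) for the summed prediction error."""
    return z * np.sqrt(variance_of_sum(cov, k))
```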
Slide 21: Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions
Slide 22: Experimental Setup
- Environment
  - AlphaStation 255s, Digital Unix 4.0
  - Workload: host load trace playback [LCR 2000]
  - Prediction system on each host: AR(16), MEAN, LAST
- Tasks
  - Nominal time ~ U(0.1, 10) seconds
  - Interarrival time ~ U(5, 15) seconds
  - 95% confidence level
- Methodology
  - Predict CIs
  - Run the task and measure
- http://www.cs.northwestern.edu/~pdinda/LoadTraces/playload
Slide 23: Metrics (computed as sketched below)
- Coverage: fraction of test cases whose measured running time falls within the predicted confidence interval; ideally equals the 95% target
- Span: average length of the confidence interval; ideally as short as possible
- $R^2$ between $t_{\mathrm{exp}}$ and $t_{\mathrm{act}}$
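These metrics are easy to state precisely; a sketch of computing them from a batch of test cases (array names are illustrative):

```python
import numpy as np

def coverage(t_act: np.ndarray, lower: np.ndarray, upper: np.ndarray) -> float:
    """Fraction of measured running times that fall inside their predicted CIs.
    Ideally equals the target confidence level (here, 0.95)."""
    return float(np.mean((t_act >= lower) & (t_act <= upper)))

def span(lower: np.ndarray, upper: np.ndarray) -> float:
    """Average CI length; shorter is better, provided coverage is maintained."""
    return float(np.mean(upper - lower))

def r_squared(t_exp: np.ndarray, t_act: np.ndarray) -> float:
    """R^2 between point predictions and measured running times."""
    ss_res = np.sum((t_act - t_exp) ** 2)
    ss_tot = np.sum((t_act - t_act.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```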
Slide 24: General Picture of Results
- Five classes of behavior; I'll show you two
- The RTA works: coverage near 95% is possible in most cases
- Predictor quality matters: better predictors lead to smaller spans on lightly loaded hosts and to correct coverage on heavily loaded hosts
- AR(16) >= LAST >= MEAN
- Performance is slightly dependent on nominal time
Slide 25: Most Common Coverage Behavior (chart)
Slide 26: Most Common Span Behavior (chart)
Slide 27: Uncommon Coverage Behavior (chart)
Slide 28: Uncommon Span Behavior (chart)
Slide 29: Related Work
- Distributed interactive applications: QuakeViz/Dv, Aeschlimann et al. [PDPTA '99]
- Quality of service: QuO, Zinky, Bakken, Schantz [TPOS, April '97]; QRAM, Rajkumar et al. [RTSS '97]
- Distributed soft real-time systems: Lawrence, Jensen [assorted]
- Workload studies for load balancing: Mutka et al. [PerfEval '91]; Harchol-Balter et al. [SIGMETRICS '96]
- Resource signal measurement systems: Remos [HPDC '98]; Network Weather Service [HPDC '97, HPDC '99]
- Host load prediction: Wolski et al. [HPDC '99] (NWS); Samadani et al. [PODC '95]; Hailperin ['93]
- Application-level scheduling: Berman et al. [HPDC '96]; stochastic scheduling, Schopf [Supercomputing '99]
Slide 30: Conclusions
- Predict the running time of a compute-bound task
- Based on host load prediction
- Prediction is a confidence interval
- Confidence interval algorithm uses the predictor's covariance matrix and load discounting
- Effective for the domain: Digital Unix, 0.1-10 second tasks, 5-15 second interarrivals
- Extensions in progress
Slide 31: For More Information
- All software and traces are available
- RPS + RTA + RTSA: http://www.cs.northwestern.edu/~RPS
- Load traces and playback: http://www.cs.northwestern.edu/~pdinda/LoadTraces
- Prescience Lab (Peter Dinda, Jason Skicewicz, Dong Lu): http://www.cs.northwestern.edu/~plab
Slide 32: Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions
Slide 33: A Universal Problem
- Which host should the application send the task to so that its running time is appropriate?
- Resource requirements are known
- "What will the running time be if I...?"
- Example: real-time
Slide 34: Running Time Advisor
- Application notifies the advisor of the task's computational requirements (nominal time)
- Advisor predicts the running time on each host
- Application assigns the task to the most appropriate host
Slide 35: Real-time Scheduling Advisor
- Application specifies the task's computational requirements (nominal time) and its deadline
- Advisor acquires predicted task running times for all hosts
- Advisor chooses one of the hosts where the deadline can be met
Slide 36: Confidence Intervals to Characterize Variability
- Application specifies a confidence level (e.g., 95%)
- Running time advisor predicts running times as a confidence interval (CI), e.g., "3 to 5 seconds with 95% confidence"
- Real-time scheduling advisor chooses a host where the CI falls below the deadline (see the sketch below)
- The CI captures variability to the extent the application is interested in it
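A sketch of that selection rule: query every host for a CI and pick one whose upper bound beats the deadline. The choice among qualifying hosts (random here) is an assumption, as is the `predict` callable, which stands in for an advisor query like the one sketched after the RTA API slide.

```python
import random
from typing import Callable, Optional, Sequence

def choose_host(hosts: Sequence[str], tnom: float, deadline: float,
                predict: Callable, confidence: float = 0.95) -> Optional[str]:
    """Pick a host whose predicted running-time CI fits within the deadline.

    predict(host, tnom, confidence) must return an object with
    `lower`/`upper` CI bounds at the requested confidence level.
    """
    candidates = [h for h in hosts
                  if predict(h, tnom, confidence).upper <= deadline]
    # Every candidate meets the deadline at the requested confidence level;
    # picking randomly spreads load across qualifying hosts.
    return random.choice(candidates) if candidates else None
```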
Slide 37: Prototype System (block diagram; the "This Paper" label marks the components covered in this talk)
Slide 38: Load Discounting Motivation
- I/O priority boost
- Short tasks are less affected by load
Slide 39: Load Discounting
- Apply before using the load predictions (illustrated below)
- The discount is an estimable machine property
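The slides do not give the discounting formula; purely as an illustration of where it fits in the pipeline, one plausible form subtracts a per-machine discount from each predicted load sample before the available-time computation. The constant-subtraction form is an assumption, not the paper's actual rule.

```python
import numpy as np

def discount_load(z_pred: np.ndarray, discount: float) -> np.ndarray:
    """Apply a machine-specific discount to predicted load before use.

    ASSUMED form: subtract a constant discount (estimated per machine) and
    clamp at zero. The real discounting rule may differ; the point is only
    that it is applied to the load predictions before computing available
    time, reflecting the scheduler's priority boost for short tasks.
    """
    return np.maximum(z_pred - discount, 0.0)
```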