The Running Time Advisor A Resource Signal-based Approach to Predicting Task Running Time and Its Applications Peter A. Dinda Carnegie Mellon University.

2 High Level Goals
Build systems that use statistics to help distributed applications adapt to highly variable resource availability.
– Application-level performance predictions: running time of compute-bound tasks
– Adaptation advice: host selection to meet soft real-time deadlines
– Resource signal approach: host load signals
This talk focuses on the information side.

3 Outline
– Bird's eye view: adapting to highly variable resource availability; Dv/QuakeViz
– Real-time scheduling advisor
– Running time advisor: confidence intervals; performance results (feasible, practical, useful); prototype system
– Host load prediction: traces, structure, linear models, evaluation; RPS Toolkit
– Conclusion

4 A Universal Challenge in High Performance Distributed Applications
Highly variable resource availability:
– Shared resources
– No reservations
– No globally respected priorities
– Competition from other users ("background workload")
Running time can vary drastically, so applications must adapt.

5 A Universal Problem
Given a task with known resource requirements, which host should the application send it to so that its running time is appropriate? What will the running time be if I send it to a particular host?

6 DV Framework for Distributed Interactive Visualization
– Large datasets (e.g., earthquake simulations)
– Distributed VTK visualization pipelines
– Active frames: encapsulate data, computation, and path through the pipeline; launched from the server by user interaction; annotated with a deadline
– Dynamically choose on which host each pipeline stage will execute and what quality settings to use

7 Example DV Pipeline for QuakeViz
[Figure: active frames flow from simulation output through the pipeline stages – reading (region of interest), morphology reconstruction (resolution), interpolation, isosurface extraction (contours), scene synthesis, and rendering on the local display – each frame annotated with a deadline; the logical pipeline is mapped onto physical hosts.]

8 Real-time Scheduling Advisor
Target: distributed interactive applications (examples: CMU Dv/QuakeViz, BBN OpenMap)
Assumptions:
– Sequential tasks initiated by user actions
– Aperiodic arrivals
– Resilient deadlines (soft real-time)
– Compute-bound tasks with known computational requirements
Best-effort semantics: recommend a host where the deadline is likely to be met, and predict the running time on that host. No guarantees.

9 Running Time Advisor
The application notifies the advisor of a task's computational requirements (its nominal time). The advisor predicts the task's running time on each host, and the application assigns the task to the most appropriate host.
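The advisor's core estimate can be sketched in code. This is a deliberately simplified model assumed for illustration (the real advisor builds on host load predictions with confidence information): a compute-bound task gets roughly a 1/(1 + load) share of the CPU, so its running time stretches with the predicted load.

```python
def predict_running_time(nominal_time, predicted_loads, interval=1.0):
    """Estimate running time of a compute-bound task under predicted host load.

    Walks the sequence of predicted load values (one per sampling interval)
    and charges the task a 1/(1 + load) share of each interval's CPU,
    until the nominal (unloaded) compute time has been consumed.
    predicted_loads is assumed non-empty.
    """
    done = 0.0
    elapsed = 0.0
    for load in predicted_loads:
        share = 1.0 / (1.0 + load)      # task's CPU share this interval
        done += share * interval
        elapsed += interval
        if done >= nominal_time:
            # back off the overshoot within the final interval
            elapsed -= (done - nominal_time) / share
            return elapsed
    # predictions exhausted: extrapolate with the last load value
    last_share = 1.0 / (1.0 + predicted_loads[-1])
    return elapsed + (nominal_time - done) / last_share
```

For example, a task with a 2-second nominal time on a host with a steady predicted load of 1.0 gets half the CPU and is estimated at 4 seconds.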

10 Real-time Scheduling Advisor
The application notifies the advisor of a task's computational requirements (nominal time) and its deadline. The advisor acquires predicted task running times for all hosts, then recommends one of the hosts where the deadline can be met.

11 Variability and Prediction
Prediction exchanges high variability in resource availability for low variability in prediction error, plus a characterization of that variability.
[Figure: a highly variable resource signal is transformed by a predictor into a low-variability error signal, characterized by its autocorrelation.]

12 Confidence Intervals to Characterize Variability
– The application specifies a confidence level (e.g., 95%)
– The running time advisor predicts running times as confidence intervals (CIs), e.g., "3 to 5 seconds with 95% confidence"
– The real-time scheduling advisor chooses a host where the entire CI falls below the deadline
– The CI captures variability to the extent the application is interested in it
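The CI arithmetic is straightforward if we assume, as a sketch rather than the talk's exact machinery, that each running-time prediction comes with an error standard deviation and roughly normal errors:

```python
def running_time_ci(predicted, stddev, confidence=0.95):
    """Confidence interval for a predicted running time, assuming roughly
    normal prediction error. z-values for a few common confidence levels."""
    z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}[confidence]
    return (predicted - z * stddev, predicted + z * stddev)

def recommend_hosts(predictions, deadline):
    """Hosts whose entire CI falls below the deadline, i.e. the deadline
    is likely met at the requested confidence level.
    predictions maps host name -> (ci_low, ci_high)."""
    return [host for host, (lo, hi) in predictions.items() if hi < deadline]
```

running_time_ci(4.0, 0.5) yields (3.02, 4.98): essentially the "3 to 5 seconds with 95% confidence" answer from the slide.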

13 Confidence Intervals and Predictor Quality
Good predictors provide smaller CIs, and smaller CIs simplify scheduling decisions: with a bad predictor the intervals are wide and there is no obvious choice of host; with a good predictor the intervals are narrow and the hosts that can meet the deadline stand out.

14 Overview of Research Results
Predicting CIs is feasible:
– Host load prediction using AR(16) models
– Running time estimation using host load predictions
Predicting CIs is practical:
– RPS Toolkit (incorporated in CMU Remos and BBN QuO)
– Extremely low-overhead online system
Predicting CIs is useful:
– Performance of the real-time scheduling advisor
– Statistically rigorous analysis and evaluation
– Measured performance of a real system

15 Experimental Setup
Environment:
– AlphaStation 255s, Digital Unix 4.0
– Workload: host load trace playback
– Prediction system on each host
Tasks:
– Nominal time ~ U(0.1,10) seconds
– Interarrival time ~ U(5,15) seconds
Methodology: predict CIs and host recommendations, then run each task and measure.

16 Predicting CIs is Feasible
3000 randomized tasks; near-perfect CIs on typical hosts.

17 Predicting CIs is Practical: the RPS System
– 1-2 ms latency from measurement to prediction
– 2 KB/sec transfer rate
– <2% of CPU at an appropriate measurement rate

18 Predicting CIs is Useful: Real-time Scheduling Advisor
[Figure: fraction of tasks meeting their deadlines when the advisor picks a host with predicted CI < deadline, versus picking a random host or the host with the lowest load.]

20 Outline
– Bird's eye view: adapting to highly variable resource availability; Dv/QuakeViz
– Real-time scheduling advisor
– Running time advisor: confidence intervals; performance results (feasible, practical, useful); prototype system
– Host load prediction: traces, structure, linear models, evaluation; RPS Toolkit
– Conclusion

21 Design Space
Can the gap between the resources and the application be spanned? Yes!

22 Resource Signals
Characteristics:
– Easily measured, time-varying scalar quantities
– Strongly correlated with resource availability
– Periodically sampled (discrete-time signals)
Examples:
– Host load (Digital Unix 5-second load average)
– Network flow bandwidth and latency
This lets us leverage existing statistical signal analysis and prediction techniques.
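The Digital Unix load average named above is, like most Unix load averages, an exponentially weighted moving average of run-queue length sampled at a fixed rate. A minimal sketch (the time constant and sampling period here are illustrative assumptions, not the kernel's exact constants):

```python
import math

def exp_load_average(run_queue_samples, tau=5.0, dt=1.0):
    """Exponentially weighted average of run-queue length: the discrete-time
    resource signal. tau is the averaging time constant (seconds), dt the
    sampling period. Returns one smoothed sample per raw sample."""
    decay = math.exp(-dt / tau)
    load = 0.0
    out = []
    for n in run_queue_samples:
        load = load * decay + n * (1.0 - decay)
        out.append(load)
    return out
```

A constant run queue of 1 process makes the signal rise smoothly toward 1.0, which is why load averages lag behind sudden workload changes.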

23 RPS Toolkit
An extensible toolkit for implementing resource signal prediction systems.
– Easy "buy-in" for users: C++ and sockets (no threads)
– Prebuilt prediction components
– Libraries (sensors, time series, communication)
Users have bought in: incorporated in CMU Remos and BBN QuO; research users include Bruce Lowekamp, Nancy Miller, and LeMonte Green.

24 Prototype System RPS components can be composed in other ways

25 Research Results
Host load on real hosts has exploitable structure:
– Strong autocorrelation, self-similarity, epochal behavior
– Trace database and host load trace playback
Host load is predictable using simple linear models:
– Recommendation: AR(16) models or better for 1-30 second predictions
– RPS Toolkit for low-overhead systems (<2% of CPU); C++, ported to 5 OSes, incorporated in CMU Remos and BBN QuO
Running time CIs can be computed from load predictions:
– Load discounting, error covariances
Effective real-time scheduling advice can be based on CIs:
– Know whether the deadline will be met before running the task

26 Outline
– Bird's eye view: adapting to highly variable resource availability; Dv/QuakeViz
– Real-time scheduling advisor
– Running time advisor: confidence intervals; performance results (feasible, practical, useful); prototype system
– Host load prediction: traces, structure, linear models, evaluation; RPS Toolkit
– Conclusion

27 Questions What are the properties of host load? Is host load predictable? What predictive models are appropriate? Are host load predictions useful?

28 Overview of Answers
– Host load exhibits complex behavior: strong autocorrelation, self-similarity, epochal behavior
– Host load is predictable on the 1 to 30 second timeframe
– Simple linear models are sufficient: recommend AR(16) or better
– Predictions are useful: effective CIs can be computed from them
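A minimal version of the AR modeling step can be sketched as an ordinary least-squares fit over lagged values, with multi-step prediction by iterating the one-step model. This is an illustration of the technique, not RPS's own fitting code:

```python
import numpy as np

def fit_ar(signal, p=16):
    """Least-squares fit of AR(p): z[t] ~ sum_k coef[k] * z[t-1-k] + c."""
    z = np.asarray(signal, dtype=float)
    # Column k holds lag k+1 of the signal, aligned with targets z[p:].
    X = np.column_stack([z[p - 1 - k : len(z) - 1 - k] for k in range(p)])
    X = np.column_stack([X, np.ones(len(z) - p)])   # intercept column
    y = z[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[:-1], coef[-1]                      # (AR weights, intercept)

def predict_ahead(history, coef, intercept, steps):
    """Iterate the one-step predictor to get 1..steps-ahead predictions."""
    hist = list(history)
    preds = []
    for _ in range(steps):
        lagged = hist[-1 : -len(coef) - 1 : -1]     # most recent value first
        nxt = float(np.dot(coef, lagged) + intercept)
        preds.append(nxt)
        hist.append(nxt)
    return preds
```

On a synthetic AR(1) load signal with coefficient 0.8, fit_ar recovers a coefficient close to 0.8, and predict_ahead produces the 1..w-step-ahead stream the evaluation slides refer to.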

29 Host Load Traces
– DEC Unix 5-second exponential average
– Full bandwidth captured (1 Hz sample rate)
– Long durations

30 If Host Load Were "Random" (White Noise)...
[Figure: time domain, frequency domain, autocorrelation, and spectrogram of a white-noise signal.]

31 Host Load Has Exploitable Structure
[Figure: time domain, frequency domain, autocorrelation, and spectrogram of a real host load trace.]

32 Linear Time Series Models
(2000-sample fits, largest models in the study, 30 seconds ahead)
Pole-zero / state-space models capture autocorrelation parsimoniously.

33 Evaluation Methodology
Ran ~190,000 randomly chosen testcases on the traces:
– Evaluates models independently of any prediction/evaluation framework, with no monitoring overhead
– ~30 testcases per trace, model class, and parameter set
Data-mined the results. Offline and online systems were implemented using the RPS Toolkit.

34 Testcases
Models:
– MEAN, LAST/BM(32)
– Randomly chosen models from: AR(1..32), MA(1..8), ARMA(1..8,1..8), ARIMA(1..8,1..2,1..8), ARFIMA(1..8,d,1..8)
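The baseline predictors on this slide can be sketched as follows. This is my reading of the names, offered as an assumption: MEAN predicts the fit-interval average regardless of lead time, LAST predicts the most recent measurement, and BM(k) predicts a k-sample moving average. Each returns a predictor function of (history, steps):

```python
def mean_predictor(fit_data):
    """MEAN: always predict the average of the fit data."""
    mu = sum(fit_data) / len(fit_data)
    return lambda history, steps: [mu] * steps

def last_predictor():
    """LAST: always predict the most recent measurement."""
    return lambda history, steps: [history[-1]] * steps

def bm_predictor(k=32):
    """BM(k): predict the mean of the last k measurements."""
    def predict(history, steps):
        window = history[-k:]
        return [sum(window) / len(window)] * steps
    return predict
```

These baselines matter because a sophisticated model is only worthwhile if it beats them consistently.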

35 Evaluating a Testcase
[Diagram: measurements in the fit interval feed a modeler, which produces a model of the chosen type; the model drives a load predictor that emits a stream of 1- to w-step-ahead predictions z'(t,t+1) ... z'(t,t+w) from each measurement z(t); an evaluator compares the prediction stream against the measurements in the test interval, producing error metrics and error estimates. Model fitting is one-time; prediction runs continuously on the production stream. The predictions characterize variation; the measurements capture it.]

36 Measured Prediction Variance: Mean Squared Error
From the prediction stream, collect the k-step-ahead predictions for each lead k = 1..w and compare them to the later measurements. The k-step-ahead mean squared error is the average of (z'(t,t+k) - z(t+k))² over the test interval, giving σ²_a1, σ²_a2, ..., σ²_aw. The baseline is the raw signal variance σ²_z, the average of (mean - z(t))², i.e. the error of the MEAN predictor.
A good load predictor achieves σ²_a1, σ²_a2, ..., σ²_aw << σ²_z.
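The metric can be sketched in a few lines: slide a cut point through the test data, ask the predictor for 1..w-step-ahead values at each point, and accumulate squared error per lead. Here a "predictor" is assumed to be any function predict(history, steps) returning the list of 1- through steps-step-ahead predictions:

```python
def lead_time_mse(signal, predict, w):
    """Mean squared error at each lead 1..w over a test signal.

    predict(history, steps) returns [1-step, 2-step, ..., steps-step]
    ahead predictions given the history observed so far.
    """
    sq_err = [0.0] * w
    count = 0
    for t in range(1, len(signal) - w):
        preds = predict(signal[:t], w)          # predictions made at time t-1
        for k in range(1, w + 1):
            sq_err[k - 1] += (preds[k - 1] - signal[t + k - 1]) ** 2
        count += 1
    return [s / count for s in sq_err]
```

On a strictly alternating 0/1 signal, a LAST-style predictor is always wrong one step ahead and always right two steps ahead, which the per-lead errors make visible.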

37 Unpaired Box Plot Comparisons
[Figure: box plots (2.5%, 25%, 50%, mean, 75%, 97.5%) of mean squared error for three models: Model A with inconsistent low error, Model B with consistent low error, Model C with consistent high error.]
Good models achieve consistently low error.

38 1 Second Predictions, All Hosts
[Figure: box plots (2.5%, 25%, 50%, mean, 75%, 97.5%) of mean squared error across models.]
Predictive models are clearly worthwhile.

39 30 Second Predictions, All Hosts
[Figure: box plots (2.5%, 25%, 50%, mean, 75%, 97.5%) of mean squared error across models.]
Predictive models are clearly beneficial even at long prediction horizons.

40 30 Second Predictions, High Load, Dynamic Host
[Figure: box plots (2.5%, 25%, 50%, mean, 75%, 97.5%) of mean squared error across models.]
Predictive models are clearly worthwhile; here we begin to see differentiation between models.

41 Outline
– Bird's eye view: adapting to highly variable resource availability; Dv/QuakeViz
– Real-time scheduling advisor
– Running time advisor: confidence intervals; performance results (feasible, practical, useful); prototype system
– Host load prediction: traces, structure, linear models, evaluation; RPS Toolkit
– Conclusion

42 Related Work
– Distributed interactive applications: QuakeViz/Dv, Aeschlimann [PDPTA'99]
– Quality of service: QuO, Zinky, Bakken, Schantz [TPOS, April '97]; QRAM, Rajkumar et al. [RTSS'97]
– Distributed soft real-time systems: Lawrence, Jensen [assorted]
– Workload studies for load balancing: Mutka et al. [PerfEval '91]; Harchol-Balter et al. [SIGMETRICS '96]
– Resource signal measurement systems: Remos [HPDC'98]; Network Weather Service [HPDC'97, HPDC'99]
– Host load prediction: Wolski et al. [HPDC'99] (NWS); Samadani et al. [PODC'95]; Hailperin ['93]
– Application-level scheduling: Berman et al. [HPDC'96]; stochastic scheduling, Schopf [Supercomputing '99]

43 Conclusions
Goal: help applications adapt to highly variable resource availability via resource signal prediction, predicting running times as confidence intervals.
– Predicting CIs is feasible: host load prediction using AR(16) models; running time estimation using host load predictions
– Predicting CIs is practical: RPS Toolkit (incorporated in CMU Remos, BBN QuO); extremely low-overhead online system
– Predicting CIs is useful: demonstrated performance of the real-time scheduling advisor

44 Future Work
– New resource signals: network bandwidth and latency (Remos)
– New prediction approaches: wavelets, nonlinearity, cointegration
– Resource scheduler models: better Unix scheduler model; network models
– Adaptation advisors
– Applications and workloads: DV/QuakeViz, GIMP, instrumentation

45 Tools/Venues for Future Work
Resource signal methodology, RPS Toolkit, Remos, QuakeViz/DV, Grid Forum

46 Future Work (Long Term)
Experimental computer science research at the intersection of systems × applications × statistics:
– Application-oriented view
– Measurement studies and analysis
– Statistical approach
– Application services
– Systems building

47 Teaching “Signals, systems, and statistics for computer scientists” “Performance data analysis” “Introduction to computer systems”

48 Response of Typical AR(16)

49 Response of AR(1024)