Published by Abigail Thornton. Modified over 9 years ago.
1
Network Weather Service
2
Introduction
“NWS provides accurate forecasts of dynamically changing performance characteristics from a distributed set of metacomputing resources”
What will be the future load (not current load) when a program is executed?
Producing short-term performance forecasts based on historical performance measurements
The forecasts can be used by dynamic scheduling agents
3
Introduction
Resource allocation and scheduling decisions must be based on predictions of resource performance during a timeframe
NWS takes periodic measurements of performance and, using numerical models, forecasts resource performance
4
NWS
Goals
Components
  Persistent state
  Name server
  Sensors
    Passive (CPU availability)
    Active (network measurements)
  Forecaster
5
Architecture
6
Architecture
7
Performance measurements
Using sensors
CPU sensors
  Measure CPU availability
  Use uptime, vmstat, and active probes
Network sensors
  Measure latency and bandwidth
Each host maintains
  Current data
  One-step-ahead predictions
  Time series of data
8
Network Measurements
9
Issues with Network Sensors
Appropriate transfer size for measuring throughput
Collision of network probes
Solutions
  Tokens and hierarchical trees with cliques
10
Available CPU measurement
11
The formulae shown do not take job priorities into account.
Hence, an active probe is run periodically to adjust the estimates.
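The formulae themselves did not survive in this transcript. As a minimal sketch only, the code below assumes the commonly cited load-average estimate, availability = 1 / (load average + 1), and models the active-probe adjustment as a simple additive bias. The function names and the additive form of the correction are illustrative assumptions, not the slide's definitions.

```python
def availability_from_load(load_average: float) -> float:
    """Assumed estimate: a new process competes equally with
    `load_average` runnable processes, so it gets 1 / (load + 1) of the CPU."""
    return 1.0 / (load_average + 1.0)


def probe_bias(probe_observed: float, estimate_at_probe: float) -> float:
    """Difference between what a short CPU-bound probe actually obtained
    and what the passive estimate predicted at that moment."""
    return probe_observed - estimate_at_probe


def adjusted_availability(estimate: float, bias: float) -> float:
    """Apply the most recent probe bias and clamp to the range [0, 1]."""
    return min(1.0, max(0.0, estimate + bias))


# Example: load average 1.0 suggests half the CPU, but a probe run at the
# same time only obtained 40% (for instance because of job priorities).
estimate = availability_from_load(1.0)
bias = probe_bias(0.4, estimate)
corrected = adjusted_availability(availability_from_load(1.0), bias)
```

The passive estimate is cheap and can be taken often; the probe is intrusive, so it is run only occasionally and its bias is reused between runs.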
12
Predictions
To generate a forecast, the forecaster requests persistent state data
When a forecast is requested, the forecaster makes predictions for existing measurements using different forecast models
Dynamic choice of forecast model based on the best (lowest) Mean Absolute Error, Mean Square Prediction Error, or Mean Percentage Prediction Error
Forecasts requested by:
  InitForecaster()
  RequestForecasts()
Forecasting methods
  Mean-based
  Median-based
  Autoregressive
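The dynamic choice of model can be sketched as follows: every candidate model is replayed over the measurement history, its one-step-ahead errors are accumulated, and the model with the lowest Mean Absolute Error supplies the next forecast. The three models and all names here are illustrative assumptions, not the NWS API.

```python
from statistics import mean, median


def last_value(history):
    """Predict that the next measurement equals the most recent one."""
    return history[-1]


def running_mean(history):
    """Predict the mean of all measurements seen so far."""
    return mean(history)


def sliding_median(history, window=5):
    """Predict the median of the most recent `window` measurements."""
    return median(history[-window:])


MODELS = {
    "last_value": last_value,
    "running_mean": running_mean,
    "sliding_median": sliding_median,
}


def choose_forecast(series, models=MODELS):
    """Return (best model name, its forecast for the next value, MAE per model)."""
    if len(series) < 2:
        raise ValueError("need at least two measurements")
    abs_errors = {name: [] for name in models}
    # Replay history: at each step, predict series[t] from series[:t].
    for t in range(1, len(series)):
        history, actual = series[:t], series[t]
        for name, model in models.items():
            abs_errors[name].append(abs(model(history) - actual))
    mae = {name: mean(errs) for name, errs in abs_errors.items()}
    best = min(mae, key=mae.get)
    return best, models[best](series), mae


best, forecast, mae = choose_forecast([1, 2, 3, 4, 5, 6])
```

On a steadily rising series such as this one, the last-value model tracks the trend most closely, so it is selected. On a noisy but stationary series a mean or median model would typically win instead.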
13
Forecasting Methods
Notations:
Prediction Accuracy:
Mean Absolute Error (MAE) is the average of the above
Prediction Method:
14
Forecasting Methods – Mean-based 1. 2. 3.
15
Forecasting Methods – Mean-based 4. 5.
16
Forecasting Methods – Median-based 1. 2. 3.
17
Autoregression
1. The coefficients a_i are found such that they minimize the overall error.
r_{i,j} is the autocorrelation function for the series of N measurements.
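One standard way to obtain such coefficients from the autocorrelation values is to solve the Yule-Walker equations with the Levinson-Durbin recursion. The sketch below assumes that approach and a mean-removed series; the slide's own formula is not preserved in the transcript, so this is an illustration rather than the NWS implementation, and the function names are hypothetical.

```python
def autocovariance(x, lag):
    """Biased sample autocovariance of x at the given lag."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, n)) / n


def fit_ar(x, p):
    """Fit x_t - m ~= sum_i a_i * (x_{t-i} - m) via Levinson-Durbin."""
    r = [autocovariance(x, k) for k in range(p + 1)]
    if r[0] == 0.0:
        return [0.0] * p  # constant series: nothing to model
    a = [0.0] * (p + 1)   # a[1..p] hold the coefficients
    err = r[0]            # prediction error variance so far
    for k in range(1, p + 1):
        acc = r[k] - sum(a[j] * r[k - j] for j in range(1, k))
        refl = acc / err  # reflection coefficient for order k
        new_a = a[:]
        new_a[k] = refl
        for j in range(1, k):
            new_a[j] = a[j] - refl * a[k - j]
        a = new_a
        err *= 1.0 - refl * refl
    return a[1:]


def predict_next(x, coeffs):
    """One-step-ahead forecast from the most recent len(coeffs) values."""
    m = sum(x) / len(x)
    return m + sum(a * (x[-1 - i] - m) for i, a in enumerate(coeffs))


# An alternating series is strongly negatively correlated at lag 1.
series = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]
coeffs = fit_ar(series, 1)
forecast = predict_next(series, coeffs)
```

For the alternating series the fitted lag-1 coefficient is close to -1, so the forecast flips the sign of the last measurement, which is what the data does.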
18
Forecasting Methodology
19
Forecast Results
20
Forecasting Complexity vs Accuracy
Semi Non-parametric Time Series Analysis (SNP) – an accurate but complicated model
  Model fit using iterative search
  Calculation of conditional expected value using conditional probability density
21
Sensor Control
If each sensor connects to every other sensor to perform measurements, the number of probes is O(N^2)
To reduce this complexity, sensors are organized in a hierarchy of groups called cliques
To avoid collisions, tokens are used
Adaptive control using adaptive token timeouts
Adaptive time-out discovery and distributed leader election protocol
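The saving from cliques can be illustrated by counting directed probe pairs. The sketch assumes a two-level arrangement in which hosts are split into equal-sized cliques and one representative per clique joins a top-level clique; that particular layout and the function names are assumptions for illustration.

```python
def full_mesh_probes(n_hosts: int) -> int:
    """Directed host pairs when every sensor probes every other sensor."""
    return n_hosts * (n_hosts - 1)


def hierarchical_probes(n_hosts: int, clique_size: int) -> int:
    """Probe pairs for leaf cliques plus one top-level clique of representatives."""
    full, rest = divmod(n_hosts, clique_size)
    sizes = [clique_size] * full + ([rest] if rest else [])
    leaf = sum(full_mesh_probes(s) for s in sizes)
    top = full_mesh_probes(len(sizes))  # one representative per leaf clique
    return leaf + top


mesh = full_mesh_probes(100)
cliques = hierarchical_probes(100, 10)
```

With 100 hosts in cliques of 10, the probe count falls from 9900 to 990. The cost is that performance between hosts in different cliques is only approximated through the representatives.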
22
Synchronizing network probes
Consistent periodicity and mutual exclusion
Token
  List of hosts to probe
  Periodicity of probe
  Parameters to the probe
  Sequence number
Leader initiates the token
A host, after receiving a token:
  Conducts probes with the other hosts in the token
  Passes the token to the next host
Token passed back to the leader
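One token circuit can be sketched as a small simulation. The field names mirror the token contents listed above; the `probe` callback, the in-process loop standing in for network message passing, and the class layout are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    hosts: list                      # hosts to probe, leader first
    period_s: float                  # desired periodicity of the probes
    probe_params: dict = field(default_factory=dict)
    sequence: int = 0                # incremented once per completed circuit


def run_circuit(token, probe):
    """Pass the token around once; only the current holder conducts probes."""
    log = []
    for holder in token.hosts:       # the leader initiates, then each host in turn
        for target in token.hosts:
            if target != holder:
                result = probe(holder, target, token.probe_params)
                log.append((holder, target, result))
        # the holder now passes the token to the next host in the list
    token.sequence += 1              # token is back at the leader
    return log


def fake_probe(src, dst, params):
    """Stand-in for a real latency/bandwidth measurement."""
    return {"bytes": params.get("bytes", 0)}


token = Token(hosts=["a", "b", "c"], period_s=240.0, probe_params={"bytes": 65536})
log = run_circuit(token, fake_probe)
```

Because probing happens only while a host holds the token, no two hosts in the clique measure at the same time, which is the mutual-exclusion property the slide describes.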
23
Contd…
Leader notes the token circuit time and calculates the next token initiation time as (desired periodicity – token circuit time)
To avoid long delays in token circulation and to provide fault tolerance:
  Each host maintains a timer
  When the timer times out, the host declares itself the leader and initiates a new token
  When a host encounters two tokens, the old token is destroyed
Calculation of time-outs
  Each host records the token circuit time and the variance of that time
  Uses NWS forecasting models to predict the next token arrival time
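The two calculations above can be sketched in a few lines. The leader's delay follows the formula on the slide directly. For the time-out, a plain mean stands in for the NWS forecasting models and the safety margin of k standard deviations is an assumed policy, not one stated in the source.

```python
from statistics import mean, pstdev


def next_initiation_delay(desired_period_s: float, circuit_time_s: float) -> float:
    """Delay before the leader starts the next token:
    desired periodicity minus the measured token circuit time."""
    return max(0.0, desired_period_s - circuit_time_s)


def token_timeout(recorded_times_s, k: float = 2.0) -> float:
    """Assumed time-out policy: predicted time (here simply the mean of the
    recorded times) plus k standard deviations of those times."""
    return mean(recorded_times_s) + k * pstdev(recorded_times_s)


delay = next_initiation_delay(240.0, 12.5)
timeout = token_timeout([8.0, 12.0])
```

If the circuit takes longer than the desired period, the delay is clamped to zero so the leader re-initiates immediately. A host whose timer exceeds its time-out would then declare itself leader, as described above.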
24
New Protocol
Compromise between periodicity and mutual exclusion
NWS administrator specifies a periodicity and an upper range on the desired periodicity
  If network conditions are stable and tokens are received within the upper range, mutual exclusion is guaranteed
  If not, hosts time out and start conducting probes, with possible collisions
Thus the protocol switches between good and bad phases
25
Illustration
26
Comparison of 2 protocols – Experimental setup
4 machines – 2 in Lyon, France and 2 in Tennessee, USA
240 second periodicity
5 second range
27
Comparison - Periodicity
28
Comparison – Mutual exclusion
29
Use of NWS: Scheduling a Jacobi application
The problem: an appropriate partitioning strategy to balance processor efficiencies and communication overheads, i.e. deriving partitions to obtain resource performance
30
Deriving Partitions for Jacobi
Notations
Per-processor execution time
The goal
31
Deriving Partitions for Jacobi
Communication time
Solution: solve a system of linear equations by Gaussian elimination
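The slide's notation and equations are not preserved in this transcript. As a hedged sketch, the code below assumes a time-balancing model in which processor i takes T_i = A_i * P_i + C_i, where A_i is the area assigned to it, P_i is its forecast time per grid point, and C_i is its forecast communication time. It then solves for areas that make all T_i equal while summing to the total area. The model, symbol names, and functions are illustrative assumptions.

```python
def gaussian_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                    # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def balance_partitions(total_area, time_per_point, comm_time):
    """Unknowns: A_0..A_{p-1} and the common finish time T.
    Equations: P_i * A_i - T = -C_i for each i, and sum(A_i) = total_area."""
    p = len(time_per_point)
    rows, rhs = [], []
    for i in range(p):
        row = [0.0] * (p + 1)
        row[i] = float(time_per_point[i])
        row[p] = -1.0
        rows.append(row)
        rhs.append(-float(comm_time[i]))
    rows.append([1.0] * p + [0.0])
    rhs.append(float(total_area))
    solution = gaussian_solve(rows, rhs)
    return solution[:p], solution[p]


# Two processors, the second half as fast as the first, no communication cost.
areas, finish = balance_partitions(1000, [1.0, 2.0], [0.0, 0.0])
```

In this example the faster processor receives two thirds of the grid. In a scheduler the P_i and C_i values would come from NWS forecasts of CPU availability and network performance; a negative area in the solution would signal that a processor is too slow or too costly to include.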
32
NWS in Jacobi
33
Resource Selection and Scheduling