Network Weather Service Sathish Vadhiyar Sources / Credits: NWS web site: NWS papers.


Introduction: NWS provides accurate forecasts of dynamically changing performance characteristics from a distributed set of metacomputing resources. The question it answers is what the load will be in the future, when a program is executed, not what the current load is. It produces short-term performance forecasts based on historical performance measurements. The forecasts can be used by dynamic scheduling agents.

Introduction: Resource allocation and scheduling decisions must be based on predictions of resource performance during a given timeframe. NWS takes periodic measurements of performance and, using numerical models, forecasts resource performance.

NWS Goals Components: Persistent state. Name server. Sensors: passive (CPU availability) and active (network measurements). Forecaster.

Architecture

Architecture

Performance measurements: Using sensors. CPU sensors measure CPU availability, using uptime, vmstat, and active probes. Network sensors measure latency and bandwidth. Each host maintains current data, one-step-ahead predictions, and a time series of data.

Network Measurements

Issues with Network Sensors: Choosing an appropriate transfer size for measuring throughput. Collision of network probes. Solutions: tokens and hierarchical trees with cliques.

Available CPU measurement

The formulae shown do not take job priorities into account. Hence, an active probe is run periodically to adjust the estimates.
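
The adjustment described above can be sketched as follows. This is a minimal sketch, not the NWS source: it assumes the uptime-based estimate is 1 / (load average + 1), and the helper names `probe_bias` and `corrected_availability` are hypothetical. The idea is that the difference between what an active probe actually obtained and what the passive estimate said at that moment is carried forward as a bias.

```python
def availability_from_load(load_average):
    """Passive estimate: CPU fraction a new process might get.

    Assumes equal sharing among runnable processes (an assumption of
    this sketch), so one more process gets 1 / (load + 1).
    """
    if load_average < 0:
        raise ValueError("load average cannot be negative")
    return 1.0 / (load_average + 1.0)


def probe_bias(probe_fraction, passive_at_probe_time):
    """Hypothetical helper: gap between an active probe and the passive estimate."""
    return probe_fraction - passive_at_probe_time


def corrected_availability(passive_estimate, bias):
    """Apply the last observed bias and clamp the result to [0, 1]."""
    return min(1.0, max(0.0, passive_estimate + bias))


# A probe obtained 80% of the CPU while the passive estimate said 50%,
# for example because the competing job runs at low priority.
bias = probe_bias(0.8, availability_from_load(1.0))
print(corrected_availability(availability_from_load(1.0), bias))
```

The clamp keeps a large bias from producing an availability outside the physically meaningful range.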

Predictions: To generate a forecast, the forecaster requests persistent state data. When a forecast is requested, the forecaster makes predictions for the existing measurements using different forecast models. The forecast model is chosen dynamically, based on the best Mean Absolute Error, Mean Square Prediction Error, or Mean Percentage Prediction Error. Forecasts are requested by InitForecaster() and RequestForecasts(). Forecasting methods: mean-based, median-based, autoregressive.
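
The dynamic choice of forecast model can be illustrated with a small sketch. It is not the NWS implementation and does not use InitForecaster() or RequestForecasts(); the two predictors and the function names here are illustrative assumptions. Each model is replayed over the measurement history, its Mean Absolute Error is computed, and the model with the lowest error supplies the next forecast.

```python
def last_value(history):
    """Predict that the next measurement equals the most recent one."""
    return history[-1]


def running_mean(history):
    """Predict the mean of all measurements seen so far."""
    return sum(history) / len(history)


def mean_absolute_error(predictor, series):
    """Replay the predictor over the series and average its absolute errors."""
    errors = [abs(predictor(series[:t]) - series[t]) for t in range(1, len(series))]
    return sum(errors) / len(errors)


def forecast(series, predictors):
    """Return (name of best model, its forecast) using lowest MAE so far."""
    scores = {name: mean_absolute_error(fn, series) for name, fn in predictors.items()}
    best = min(scores, key=scores.get)
    return best, predictors[best](series)


models = {"last_value": last_value, "running_mean": running_mean}
print(forecast([1.0, 2.0, 3.0, 4.0, 5.0], models))
```

On a steadily rising series the last-value model tracks the data more closely than the running mean, so it is selected. Swapping the error function would give selection by squared or percentage error instead.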

Forecasting Methods: Notations. Prediction accuracy: Mean Absolute Error (MAE) is the average of the above. Prediction method.
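
One conventional way to write these quantities is given below. The symbol names are assumptions of this sketch: value(t) is the measurement at time t and prediction_f(t) is the forecast that method f made for time t.

```latex
% Error of forecasting method f at time t (symbol names assumed)
\mathrm{err}_f(t) = \mathrm{value}(t) - \mathrm{prediction}_f(t)

% Mean Absolute Error of method f over the first T measurements
\mathrm{MAE}_f(T) = \frac{1}{T} \sum_{t=1}^{T} \left| \mathrm{err}_f(t) \right|
```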

Forecasting Methods – Mean-based

Forecasting Methods – Mean-based (continued)

Forecasting Methods – Median-based
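
As an illustration of what mean-based and median-based forecasters can look like, here is a short sketch over a sliding window of recent measurements. The specific variants, function names, and window sizes are assumptions chosen for illustration and are not taken from the slides.

```python
import statistics


def sliding_mean(history, window):
    """Mean of the most recent `window` measurements."""
    return statistics.mean(history[-window:])


def sliding_median(history, window):
    """Median of the most recent `window` measurements; robust to spikes."""
    return statistics.median(history[-window:])


def trimmed_mean(history, window, trim):
    """Mean of the recent window after dropping `trim` values from each end."""
    recent = sorted(history[-window:])
    kept = recent[trim:len(recent) - trim]
    if not kept:
        raise ValueError("trim removes every value in the window")
    return statistics.mean(kept)


history = [1.0, 2.0, 3.0, 10.0, 4.0]  # one outlier at 10.0
print(sliding_mean(history, 3))
print(sliding_median(history, 3))
print(trimmed_mean(history, 5, 1))
```

The outlier pulls the sliding mean upward, while the median and the trimmed mean largely ignore it. That difference is the usual reason for keeping both families of predictors available.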

Autoregression: The coefficients a_i are found such that they minimize the overall error. r_{i,j} is the autocorrelation function for the series of N measurements.
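
A standard least-squares formulation consistent with this description is sketched below. The model order p and the summation range are assumptions of this sketch, since the slide's own formula is not part of the transcript.

```latex
% Autoregressive prediction of order p from the previous p measurements
\hat{x}_t = \sum_{i=1}^{p} a_i \, x_{t-i}

% Overall squared error to be minimized over a_1, ..., a_p
E = \sum_{t=p+1}^{N} \Bigl( x_t - \sum_{i=1}^{p} a_i \, x_{t-i} \Bigr)^2

% Setting dE/da_j = 0 gives p linear equations in the coefficients
\sum_{i=1}^{p} a_i \, r_{i,j} = r_{0,j}, \quad j = 1, \dots, p,
\qquad r_{i,j} = \sum_{t=p+1}^{N} x_{t-i} \, x_{t-j}
```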

Forecasting Methodology

Forecast Results

Forecasting Complexity vs Accuracy: Semi Non-parametric Time Series Analysis (SNP) is an accurate but complicated model. The model is fit using an iterative search. The conditional expected value is calculated using the conditional probability density.

Sensor Control: If each sensor connects to every other sensor and performs measurements, the cost is O(N^2). To reduce this complexity, sensors are organized in a hierarchy of groups called cliques. To avoid collisions, tokens are used. Adaptive control uses adaptive token timeouts, with adaptive time-out discovery and a distributed leader election protocol.
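
The saving from cliques can be made concrete by counting host pairs that must be probed. The sketch below assumes a two-level hierarchy in which each clique contributes one representative to a top-level clique; that layout and the function names are assumptions for illustration.

```python
def full_mesh_pairs(n):
    """Number of unordered host pairs when every host probes every other."""
    return n * (n - 1) // 2


def hierarchical_pairs(clique_sizes):
    """Pairs probed with per-clique meshes plus one mesh of representatives."""
    within = sum(full_mesh_pairs(k) for k in clique_sizes)
    between = full_mesh_pairs(len(clique_sizes))
    return within + between


print(full_mesh_pairs(12))            # flat organization of 12 hosts
print(hierarchical_pairs([4, 4, 4]))  # same 12 hosts in three cliques
```

For 12 hosts, the flat organization needs 66 pairs and the three-clique layout needs 21, at the cost of having no direct measurement between hosts in different cliques.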

Synchronizing network probes: The goals are consistent periodicity and mutual exclusion. A token carries the list of hosts to probe, the periodicity of the probe, the parameters to the probe, and a sequence number. The leader initiates the token. A host, after receiving the token, conducts probes with the other hosts in the token and then passes the token to the next host. The token is finally passed back to the leader.
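
One circuit of the token can be simulated in a few lines. This is a sketch under stated assumptions: the `Token` field names mirror the slide's list but are hypothetical, the `probe` callback stands in for a real network measurement, and incrementing the sequence number once per circuit is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    hosts: list                 # hosts to probe, in circulation order
    periodicity: float          # desired seconds between probe rounds
    probe_params: dict = field(default_factory=dict)
    sequence_number: int = 0


def circulate(token, probe):
    """Run one circuit; only the current token holder is allowed to probe."""
    log = []
    for holder in token.hosts:
        for peer in token.hosts:
            if peer != holder:
                log.append(probe(holder, peer, token.probe_params))
        # The holder now passes the token on to the next host in the list.
    token.sequence_number += 1  # token is back at the leader (assumed rule)
    return log


token = Token(hosts=["a", "b", "c"], periodicity=240.0)
probes = circulate(token, lambda src, dst, params: (src, dst))
print(probes)
```

Because probes are issued only inside the holder's turn, no two hosts in the clique probe at the same time, which is the mutual exclusion property the token is meant to provide.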

Contd…: The leader notes the token circuit time and calculates the next token initiation time as (desired periodicity – token circuit time). To avoid long delays in token circulation and to provide fault tolerance, each host maintains a timer. When the timer times out, the host declares itself the leader and initiates a new token. When a host encounters two tokens, the old token is destroyed. Calculation of time-outs: each host records the token circuit time and the variance of that time, and uses the NWS forecasting models to predict the next token arrival time.
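
The two calculations above can be sketched as follows. The wait before the next initiation follows the slide's formula directly. The timeout rule is an assumption of this sketch: it uses the plain mean of recorded circuit times as a stand-in for an NWS forecast and adds a hypothetical `safety_factor` times the standard deviation.

```python
import math
import statistics


def next_initiation_delay(desired_periodicity, circuit_time):
    """Seconds the leader waits before starting the next token circuit."""
    return max(0.0, desired_periodicity - circuit_time)


def token_timeout(circuit_times, safety_factor=2.0):
    """Timer value after which a host assumes the token is lost.

    Uses mean + safety_factor * standard deviation of past circuit times;
    the mean stands in for a forecast of the next arrival (assumption).
    """
    expected = statistics.mean(circuit_times)
    spread = math.sqrt(statistics.pvariance(circuit_times))
    return expected + safety_factor * spread


print(next_initiation_delay(240.0, 30.0))
print(token_timeout([8.0, 12.0]))
```

Clamping the delay at zero covers the case where a circuit took longer than the desired periodicity, so the leader starts the next circuit immediately.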

New Protocol: A compromise between periodicity and mutual exclusion. The NWS administrator specifies the periodicity and an upper range of desired periodicity. If network conditions are stable and tokens are received within the upper range, then mutual exclusion is guaranteed. If not, hosts time out and start conducting probes, with possible collisions. Thus the protocol switches between good and bad phases.
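
The switch between good and bad phases can be sketched as a simple classification of the intervals between token arrivals at a host. This sketch assumes the upper range is extra slack added on top of the periodicity; the function name and that interpretation are assumptions.

```python
def classify_phases(token_intervals, periodicity, upper_range):
    """Label each token inter-arrival interval as a good or bad phase.

    good: the token arrived in time, so probing stays mutually exclusive.
    bad:  the host timed out and probed on its own, risking collisions.
    """
    deadline = periodicity + upper_range
    return ["good" if dt <= deadline else "bad" for dt in token_intervals]


# Example values: 240-second periodicity with a 5-second range.
print(classify_phases([240.0, 244.0, 250.0], 240.0, 5.0))
```

A wider range keeps the protocol in the good phase more often but lets the actual probe period drift further from the requested one, which is the compromise the slide refers to.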

Illustration

Comparison of 2 protocols – Experimental setup: 4 machines, 2 in Lyon, France and 2 in Tennessee, USA. 240-second periodicity. 5-second range.

Comparison - Periodicity

Comparison – Mutual exclusion

Use of NWS: Scheduling a Jacobi application. The problem: finding an appropriate partitioning strategy to balance processor efficiencies and communication overheads, i.e., deriving partitions to obtain resource performance.

Deriving Partitions for Jacobi: Notations. Per-processor execution time. The goal.

Deriving Partitions for Jacobi: Communication time. Solution: a system of linear equations, solved by Gaussian elimination.
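
A time-balancing formulation of this step can be sketched in code. The model here is an assumption, since the slide's notation is not in the transcript: processor i takes T_i = A_i * P_i + C_i, where A_i is the number of grid points assigned, P_i the forecast time per point, and C_i the forecast communication time. Requiring all T_i to be equal, with the A_i summing to the grid size, gives a linear system that is solved by Gaussian elimination.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def balance_partitions(time_per_point, comm_time, total_points):
    """Return (points per processor, common finish time T)."""
    p = len(time_per_point)
    a, b = [], []
    for i in range(p):
        row = [0.0] * (p + 1)
        row[i] = time_per_point[i]   # A_i * P_i
        row[p] = -1.0                # minus T
        a.append(row)
        b.append(-comm_time[i])      # A_i * P_i - T = -C_i
    a.append([1.0] * p + [0.0])      # sum of A_i = total_points
    b.append(float(total_points))
    solution = solve_linear(a, b)
    return solution[:p], solution[p]


areas, finish = balance_partitions([1.0, 2.0], [0.0, 0.0], 300)
print(areas, finish)
```

In this example the processor that is twice as fast per point receives twice as many points, and both finish together. In a scheduler, the per-point times and communication times would come from NWS forecasts of CPU availability and network performance rather than from fixed constants.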

NWS in Jacobi

Resource Selection and Scheduling

References
Implementing a Performance Forecasting System for Metacomputing: The Network Weather Service. Rich Wolski, Neil Spring, Chris Peterson, in Proceedings of SC97, November.
Dynamically Forecasting Network Performance Using the Network Weather Service. Rich Wolski, in Journal of Cluster Computing, Volume 1, January.
The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing. Rich Wolski, Neil Spring, and Jim Hayes, Journal of Future Generation Computing Systems, Volume 15, Numbers 5-6, October, 1999.
Synchronizing Network Probes to avoid Measurement Intrusiveness with the Network Weather Service. B. Gaidioz, R. Wolski, and B. Tourancheau, Proceedings of 9th IEEE High-performance Distributed Computing Conference, August, 2000.
Experiences with Predicting Resource Performance On-line in Computational Grid Settings. Rich Wolski, ACM SIGMETRICS Performance Evaluation Review, Volume 30, Number 4, March, 2003.

Forecasting Methods Summary

Prediction Accuracy