Customized Dynamic Load Balancing for a Network of Workstations
Based on work by: Mohammed Javeed Zaki, Wei Li, Srinivasan Parthasarathy
Computer Science Department, University of Rochester, June 1997
Presenter: Jacqueline Ewell

Static vs. Dynamic Load Balancing

Static load balancing:
- lets the programmer distribute work before runtime
- can account for heterogeneous processors and non-uniform loops
- avoids runtime scheduling overheads
- requires complete information about the workstations ahead of time

Dynamic load balancing:
- distributes work based on the runtime performance of a Network of Workstations (NOW)
- adapts to transient external loads from multiple users, heterogeneous processors, varying memory availability, network bandwidth and contention, and software differences
- these runtime factors make dynamic load balancing the more practical choice
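For contrast with the dynamic schemes discussed next, here is a minimal sketch of the static case, assuming iteration counts and relative processor speeds are known up front (the function names and speed weights are illustrative, not from the paper):

/* Minimal sketch (not from the paper): static block partitioning of
 * n_iters loop iterations across P workstations, weighted by relative
 * speed. With equal speeds this reduces to equal-sized blocks. */
#include <stdio.h>

/* Assign iterations [lo, hi) to processor `rank` out of `P`, in
 * proportion to its normalized speed. */
static void static_block(long n_iters, int P, const double *speed,
                         int rank, long *lo, long *hi)
{
    double total = 0.0, before = 0.0;
    for (int p = 0; p < P; p++) total += speed[p];
    for (int p = 0; p < rank; p++) before += speed[p];

    *lo = (long)(n_iters * before / total);
    *hi = (long)(n_iters * (before + speed[rank]) / total);
}

int main(void)
{
    double speed[4] = { 1.0, 1.0, 2.0, 0.5 }; /* hypothetical speeds */
    for (int r = 0; r < 4; r++) {
        long lo, hi;
        static_block(1000, 4, speed, r, &lo, &hi);
        printf("proc %d: iterations [%ld, %ld)\n", r, lo, hi);
    }
    return 0;
}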

Dynamic Load Balancing Strategies

- Task queue model: a centralized queue of work
- Diffusion model: work is initially spread across all processors; when an imbalance is detected between a processor and its neighbor, work is moved
- This work predicts future performance from past performance, via an exchange of performance information, along two axes (local vs. global, distributed vs. centralized), giving four schemes:
  - Global Distributed Scheme
  - Global Centralized Scheme
  - Local Distributed Scheme
  - Local Centralized Scheme

Dynamic Load Balancing Strategies

- Global: load balancing decisions are made on a global scale
- Centralized: the load balancer is located on one processor
- Local: processors are divided into groups of size K, and load balancing decisions are made within a group
- Distributed: the load balancer is replicated on every processor
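To make the centralized variant concrete, here is a hedged MPI sketch of one global centralized (GCDLB) round; it is my own illustration, not the paper's implementation, and all names and the proportional-share policy are assumptions. Each processor reports its measured speed to a central balancer on rank 0, which computes and scatters a new distribution.

/* One GCDLB round: all-to-one gather of performance profiles, a
 * sequential redistribution on rank 0, then a one-to-all scatter of
 * the new shares. Speeds and the remaining-work count are stand-ins. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int P, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &P);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    long remaining = 100000;        /* iterations left (assumed known) */
    double my_speed = 1.0 + rank;   /* stand-in for measured iters/sec */

    double *speeds = NULL;
    long   *shares = NULL;
    if (rank == 0) {
        speeds = malloc(P * sizeof *speeds);
        shares = malloc(P * sizeof *shares);
    }

    /* all-to-one: gather performance profiles at the central balancer */
    MPI_Gather(&my_speed, 1, MPI_DOUBLE, speeds, 1, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    if (rank == 0) {                /* sequential redistribution step */
        double total = 0.0;
        for (int p = 0; p < P; p++) total += speeds[p];
        for (int p = 0; p < P; p++)
            shares[p] = (long)(remaining * speeds[p] / total);
    }

    /* one-to-all: send each processor its new share of the work */
    long my_share = 0;
    MPI_Scatter(shares, 1, MPI_LONG, &my_share, 1, MPI_LONG,
                0, MPI_COMM_WORLD);
    printf("proc %d: new share = %ld iterations\n", rank, my_share);

    free(speeds);
    free(shares);
    MPI_Finalize();
    return 0;
}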

Dynamic Load Balancing Strategies

[Figure: schematic diagrams of the four schemes. Global Centralized: one load balancer serving P1...Pn. Global Distributed: a load balancer replicated on each of P1...Pn. Local Centralized: processors partitioned into groups G1, G2, each with its own load balancer. Local Distributed: a load balancer on every processor within each group.]

Strategy Tradeoffs

Global vs. Local:
- Global: complete load information is available at synchronization time, so the work distribution is optimal
- Global: synchronization and communication costs are much higher
- Local: groups may sit idle while other groups are overloaded

Centralized vs. Distributed:
- Centralized: a single load balancer hurts scalability
- Centralized: distribution calculations run on one processor and are therefore done sequentially
- Distributed: requires an "all-to-all" exchange of performance profiles, so network contention can become a problem

DLB Modeling & Decision Process

Modeling parameters:
- Processor parameters: number of processors, normalized processor speed, number of neighbors
- Program parameters: data size, number of loop iterations, work per iteration, number of bytes to be communicated per iteration, time per iteration
- Network parameters: network latency and bandwidth, network topology
- External load modeling: maximum load, duration of persistence of load
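One possible C encoding of these inputs, grouped by the slide's four categories (the struct and its field names are my assumptions, not the paper's):

/* Inputs to the DLB cost model, one field per slide parameter. */
typedef struct {
    /* processor parameters */
    int    n_procs;
    double norm_speed;       /* normalized processor speed */
    int    n_neighbors;
    /* program parameters */
    long   data_size;
    long   n_iterations;
    double work_per_iter;
    double bytes_per_iter;   /* bytes communicated per iteration */
    double time_per_iter;
    /* network parameters */
    double latency;
    double bandwidth;
    int    topology;         /* would be an enum in a fuller version */
    /* external load model */
    double max_load;
    double load_persistence; /* duration of persistence of load */
} dlb_model_params;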

DLB Modeling & Decision Process (cont.)

Total DLB cost = synchronization cost + cost of calculating the new distribution + cost of sending instructions* + cost of data movement

*only applies to centralized schemes

DLB Modeling & Decision Process (cont.)

Synchronization cost:
- GCDLB: one-to-all(P) + all-to-one(P)
- GDDLB: one-to-all(P) + all-to-all(P²)
- LCDLB: one-to-all(K) + all-to-one(K)
- LDDLB: one-to-all(K) + all-to-all(K²)

Cost of calculating the new distribution: usually very small

Cost of sending instructions: number of send messages × latency

Cost of data movement: number of messages × latency + number of iterations moved × number of bytes communicated per iteration / bandwidth
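A numeric sketch of these terms follows; the latency and bandwidth constants are hypothetical, and all-to-all is modeled as n·n point-to-point messages to match the P² and K² terms above. It illustrates how the quadratic all-to-all cost grows against the linear one-to-all and all-to-one costs.

/* Illustrative encoding of the slide's cost terms (constants assumed). */
#include <stdio.h>

#define LATENCY   1e-3   /* seconds per message (assumed) */
#define BANDWIDTH 1e6    /* bytes per second (assumed)    */

static double one_to_all(int n) { return n * LATENCY; }
static double all_to_one(int n) { return n * LATENCY; }
static double all_to_all(int n) { return (double)n * n * LATENCY; }

/* Synchronization cost for each scheme, as on the slide. */
static double sync_gcdlb(int P) { return one_to_all(P) + all_to_one(P); }
static double sync_gddlb(int P) { return one_to_all(P) + all_to_all(P); }
static double sync_lcdlb(int K) { return one_to_all(K) + all_to_one(K); }
static double sync_lddlb(int K) { return one_to_all(K) + all_to_all(K); }

/* Data movement: messages * latency + iters * bytes/iter / bandwidth. */
static double data_movement(int n_msgs, long iters, double bytes_per_iter)
{
    return n_msgs * LATENCY + iters * bytes_per_iter / BANDWIDTH;
}

int main(void)
{
    printf("GCDLB sync, P=16: %g s\n", sync_gcdlb(16));
    printf("GDDLB sync, P=16: %g s\n", sync_gddlb(16));
    printf("LCDLB sync, K=4:  %g s\n", sync_lcdlb(4));
    printf("LDDLB sync, K=4:  %g s\n", sync_lddlb(4));
    printf("move 1000 iters @ 64 B/iter in 1 msg: %g s\n",
           data_movement(1, 1000, 64.0));
    return 0;
}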

DLB Modeling & Decision Process (cont.)

- Initially: work is divided equally among all processors
- At synchronization: 1/P of the work has been done, the load function is known, and the average effective speed is known
- Performance metric (number of iterations per second): the load function and other parameters are plugged into the model to select the best strategy
- Work movement: work is moved only if the amount to be moved is above a threshold
- Profitability analysis: move work only if there is at least a 10% improvement in execution time
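The 10% profitability rule in the last bullet could be encoded as follows; the threshold is from the slides, while the function and its inputs are an illustrative assumption, fed by the cost model sketched earlier:

/* Move work only if the predicted execution time, after paying the
 * total DLB cost, improves by at least 10%. Inputs are illustrative. */
#include <stdbool.h>
#include <stdio.h>

static bool worth_rebalancing(double t_current,    /* predicted time, no move  */
                              double t_rebalanced, /* predicted time if moved  */
                              double dlb_cost)     /* total DLB cost from model */
{
    return (t_rebalanced + dlb_cost) <= 0.9 * t_current;
}

int main(void)
{
    /* 80 s + 5 s cost vs. 90% of 100 s: rebalancing pays off here. */
    printf("rebalance? %s\n",
           worth_rebalancing(100.0, 80.0, 5.0) ? "yes" : "no");
    return 0;
}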

Experiment

- Global schemes perform best here: the computation/communication ratio is high
- More processors mean more synchronization cost, which favors the local schemes
- Global is still better at 16 processors
- The centralized master, sequential redistribution, instruction sends, and delay factors add significant overhead to the centralized schemes

Experiment

- The amount of work per iteration is small, so Local Distributed is favored
- As the data size increases, Global Distributed does better
- On 16 processors, Local Distributed is the best
- Local beats Global here because the computation/communication ratio is small
- Distributed beats Centralized

Modeling Results

Conclusions

- Different schemes are best for different applications
- Customized dynamic load balancing is essential when transient external loads are introduced
- Given the model, it is possible to select a good scheduling scheme

Future Work
- Incorporate other dynamic load balancing schemes into the model (ones not lying at the extremes)
- Instead of Local Centralized, have one master per group
- In local schemes, allow work to be exchanged between different groups
- Dynamic group memberships