Adaptivity and Dynamic Load Balancing

Slides:



Advertisements
Similar presentations
Alyce Brady CS 470: Data Structures CS 510: Computer Algorithms Post-order Traversal: Left Child - Right Child - Root Depth-First Search.
Advertisements

CAN 1.Distributed Hash Tables a)DHT recap b)Uses c)Example – CAN.
Sharks and Fishes – The problem
Resource Management §A resource can be a logical, such as a shared file, or physical, such as a CPU (a node of the distributed system). One of the functions.
Token-Dased DMX Algorithms n LeLann’s token ring n Suzuki-Kasami’s broadcast n Raymond’s tree.
CS 484. Discrete Optimization Problems A discrete optimization problem can be expressed as (S, f) S is the set of all feasible solutions f is the cost.
1 Load Balancing and Termination Detection ITCS 4/5145 Parallel Programming, UNC-Charlotte, B. Wilkinson, 2009.
Parallel Graph Algorithms
Game of Life Courtesy: Dr. David Walker, Cardiff University.
Advanced Topics in Algorithms and Data Structures Lecture pg 1 Recursion.
Shirokuro : A Backtracking Approach Benjamin Bush Faculty Advisors: Dr. Russ Abbott, Dr. Gary Brookfield Department of Computer Science, Department of.
Load Balancing How? –Partition the computation into units of work (tasks or jobs) –Assign tasks to different processors Load Balancing Categories –Static.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Search Algorithms for Discrete Optimization Problems
CS 584. Discrete Optimization Problems A discrete optimization problem can be expressed as (S, f) S is the set of all feasible solutions f is the cost.
Strategies for Implementing Dynamic Load Sharing.
CS 206 Introduction to Computer Science II 10 / 16 / 2009 Instructor: Michael Eckmann.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
P2P Course, Structured systems 1 Introduction (26/10/05)
CSC 2300 Data Structures & Algorithms February 16, 2007 Chapter 4. Trees.
Search Algorithms for Discrete Optimization Problems Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar Adapted for 3030 To accompany the text.
Broadcast & Convergecast Downcast & Upcast
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Dynamic Load Balancing Tree and Structured Computations CS433 Laxmikant Kale Spring 2001.
Load Balancing and Termination Detection Load balance : - statically before the execution of any processes - dynamic during the execution of the processes.
Designing and Evaluating Parallel Programs Anda Iamnitchi Federated Distributed Systems Fall 2006 Textbook (on line): Designing and Building Parallel Programs.
Parallel Search Algorithm
Chapter 11 Indexing & Hashing. 2 n Sophisticated database access methods n Basic concerns: access/insertion/deletion time, space overhead n Indexing 
Application Paradigms: Unstructured Grids CS433 Spring 2001 Laxmikant Kale.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
CS 584. Load Balancing Goal: All processors working all the time Efficiency of 1 Distribute the load (work) to meet the goal Two types of load balancing.
Arboles B External Search The algorithms we have seen so far are good when all data are stored in primary storage device (RAM). Its access is fast(er)
More on Adaptivity in Grids Sathish S. Vadhiyar Source/Credits: Figures from the referenced papers.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
CS 484 Designing Parallel Algorithms Designing a parallel algorithm is not easy. There is no recipe or magical ingredient Except creativity We can benefit.
CS 484 Load Balancing. Goal: All processors working all the time Efficiency of 1 Distribute the load (work) to meet the goal Two types of load balancing.
CS 584. Discrete Optimization Problems A discrete optimization problem can be expressed as (S, f) S is the set of all feasible solutions f is the cost.
Parallel Graph Algorithms Sathish Vadhiyar. Graph Traversal  Graph search plays an important role in analyzing large data sets  Relationship between.
Dynamic Load Balancing Tree and Structured Computations.
CPS120: Introduction to Computer Science Sorting.
Load Balancing Definition: A load is balanced if no processes are idle
Auburn University
Load Balancing and Termination Detection
2D AFEAPI Overview Goals, Design Space Filling Curves Code Structure
Courtesy: Dr. David Walker, Cardiff University
Parallel Graph Algorithms
ChaNGa: Design Issues in High Performance Cosmology
Lecture 3: State, Detection
Planning & System installation
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing Mapping Techniques Dr. Xiao Qin Auburn University.
Mutual Exclusion Continued
Parallel Density-based Hybrid Clustering
Parallel Algorithm Design
Parallel Programming in C with MPI and OpenMP
i206: Lecture 13: Recursion, continued Trees
L21: Putting it together: Tree Search (Ch. 6)
Alyce Brady CS 470: Data Structures CS 510: Computer Algorithms
Butterfly Network A butterfly network consists of (K+1)2^k nodes divided into K+1 Rows, or Ranks. Let node (i,j) refer to the jth node in the ith Rank.
Alyce Brady CS 470: Data Structures CS 510: Computer Algorithms
Advanced Associative Structures
Parallel Sort, Search, Graph Algorithms
CS 584.
Cosmology Applications N-Body Simulations
Introduction to Data Structures
CSC4005 – Distributed and Parallel Computing
Load Balancing Definition: A load is balanced if no processes are idle
Introduction to High Performance Computing Lecture 17
CIS825 Lecture 5 1.
Presented by Sams Uddin Ahamed & Najmus Sakib Borson Scapegoat Tree (Data Structure)
Data Mining CSCI 307, Spring 2019 Lecture 23
Presentation transcript:

Adaptivity and Dynamic Load Balancing Sathish Vadhiyar

Introduction Advanced scientific applications can be irregular They show changes in workloads, computation and communication characteristics in both space and time. Hence, need to adapt to changes for best performance

Dynamic Load Balancing in Depth First Search (DFS)

Recall DFS… Left subtree can be searched in parallel with the right subtree Statically assign a node to a processor – the whole subtree rooted at that node can be searched independently. Can lead to load imbalance; Load imbalance increases with the number of processors

Dynamic Load Balancing (DLB) Difficult to estimate the size of the search space beforehand Need to balance the search space among processors dynamically In DLB, when a processor runs out of work, it gets work from another processor

Maintaining Search Space Each processor searches the space depth-first Unexplored states saved as stack; each processor maintains its own local stack Initially, the entire search space assigned to one processor When a processor’s local stack is empty, it requests untried alternative from another processor’s stack

Work Splitting When a processor receives work request, it splits its search space Half-split: Stack space divided into two equal pieces – may result in load imbalance Giving stack space near the bottom of the stack can lead to giving bigger trees Stack space near the top of the stack tend to have small trees To avoid sending very small amounts of work – nodes beyond a specified stack depth are not given away – cutoff depth

Strategies 1. Send nodes near the bottom of the stack 2. Send nodes near the cutoff depth 3. Send half the nodes between the bottom of the stack and the cutoff depth Example: Figures 11.5(a) and 11.9

Load Balancing Strategies Asynchronous round-robin: Each processor has a target processor to get work from; the value of the target is incremented with modulo Global round-robin: One single target processor variable is maintained for all processors Random polling: randomly select a donor

Termination Detection Dijikstra’s Token Termination Detection Algorithm Based on passing of a token in a logical ring; P0 initiates a token when idle; A processor holds a token until it has completed its work, and then passes to the next processor; when P0 receives again, then all processors have completed However, a processor may get more work after becoming idle

Algorithm Continued…. Taken care of by using white and black tokens A processor can be in one of two states: black and white Initially, the token is white; all processors are in white state

Algorithm Continued…. If a processor Pj sends work to Pi (i<j), the token must traverse the ring again A processor j becomes black if it sends work to i<j If j completes work, it changes token to black and sends it to next processor; after sending, changes to white. When P0 receives a black token, reinitiates the ring

Dynamic Load Balancing in Game of Life problem

Dynamic load balancing Performed at each time step Orthogonal Recursive Bisection (ORB) Problems: Complexity in finding the neighbors and data for communication