1
An Evaluation of a Framework for the Dynamic Load Balancing of Highly Adaptive and Irregular Parallel Applications
Kevin J. Barker, Nikos P. Chrisochoides
Proceedings of the ACM/IEEE SC2003 Conference, 2003, ACM
Presented by 張肇烜
2
Outline
- Introduction
- Load Balancing State-of-the-Art
- Representative Load Balancing Systems
- PREMA
- Performance Evaluation
- Conclusions
3
Introduction
Asynchronous and highly adaptive applications are defined by several characteristics:
- No global synchronization points are inherent to the application.
- The computational weights associated with individual work units may vary drastically throughout the execution of the application.
- The progress of the computation is impossible to predict.
4
Introduction (cont.)
Existing load balancing methods found in the literature and in publicly available software are not suitable for asynchronous and highly adaptive applications, for the following three reasons:
- Large penalty for global synchronization.
- Difficulty in predicting future workloads.
- Heavy workloads may delay message processing.
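The first reason can be made concrete with a little arithmetic: at a global synchronization point every processor must wait for the slowest one. The following minimal Python sketch computes that wasted time; the function name and the finish times are invented for illustration and do not come from the paper.

```python
# Illustrative sketch only: the function name and inputs are hypothetical.
def barrier_idle_time(finish_times):
    """Total processor time wasted waiting at a global synchronization point.

    Every processor must wait until the slowest one arrives, so processor i
    sits idle for (max(finish_times) - finish_times[i]) time units.
    """
    slowest = max(finish_times)
    return sum(slowest - t for t in finish_times)


# Four processors reach the barrier at different times.
print(barrier_idle_time([4.0, 5.0, 9.0, 6.0]))  # 12.0
```

One slow processor (finishing at 9.0) costs the other three a combined 12 time units, and the penalty grows with the processor count.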
5
Load Balancing State-of-the-Art
Existing load balancing methods can be compared by dividing the load balancing process into its three primary steps:
- Information gathering and dissemination.
- Decision making.
- Data or computation migration.
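As a rough illustration of how the three steps fit together, here is a minimal, centralized Python sketch. The `Processor` class, the rule of moving the donor's smallest work unit, and all function names are hypothetical choices made for this example; none of the systems discussed here is claimed to work this way.

```python
# Illustrative sketch only: names and the greedy rule are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Processor:
    pid: int
    work_units: list = field(default_factory=list)  # weights of pending work units

    @property
    def load(self):
        return sum(self.work_units)


def gather_loads(procs):
    """Step 1, information gathering: collect every processor's current load."""
    return {p.pid: p.load for p in procs}


def decide(loads):
    """Step 2, decision making: pick the most- and least-loaded processors."""
    donor = max(loads, key=loads.get)
    receiver = min(loads, key=loads.get)
    return donor, receiver


def migrate(procs, donor, receiver):
    """Step 3, migration: move the donor's smallest work unit to the receiver,
    but only if that narrows the gap between them."""
    by_id = {p.pid: p for p in procs}
    src, dst = by_id[donor], by_id[receiver]
    if not src.work_units:
        return False
    unit = min(src.work_units)
    if dst.load + unit >= src.load:  # moving it would not lower the maximum
        return False
    src.work_units.remove(unit)
    dst.work_units.append(unit)
    return True


def balance(procs, max_rounds=100):
    """Repeat gather -> decide -> migrate until no beneficial move remains."""
    for _ in range(max_rounds):
        donor, receiver = decide(gather_loads(procs))
        if donor == receiver or not migrate(procs, donor, receiver):
            break
    return gather_loads(procs)


procs = [Processor(0, [4, 4, 4, 4]), Processor(1, [1]), Processor(2, [])]
print(balance(procs))  # {0: 8, 1: 5, 2: 4}
```

Note that `gather_loads` collects every processor's load before any decision is made, which is precisely the global view that synchronous methods rely on.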
6
Load Balancing State-of-the-Art (cont.)
7
(Loosely) Synchronous vs. Asynchronous
- Synchronous load balancing methods and tools must gather load information from all processors in order to reconstruct the global system state.
- Asynchronous methods require communication with only a small, fixed-size ‘neighborhood’ of processors.
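To contrast the two approaches, the sketch below computes the synchronous target (the global average, which needs every processor's load) next to one step of first-order diffusion on a ring, a common neighborhood-based scheme chosen here only as an example. The ring topology, the `alpha` coefficient, and the function names are assumptions for illustration.

```python
# Illustrative sketch only: ring topology, alpha and names are hypothetical.
def synchronous_target(loads):
    """Synchronous view: the global average needs the load of every processor."""
    return sum(loads) / len(loads)


def diffusion_step(loads, alpha=0.25):
    """Neighborhood view: processor i looks only at its two ring neighbours and
    moves alpha times the load difference across each link (assumes n >= 3)."""
    n = len(loads)
    if n < 3:
        raise ValueError("ring diffusion sketch assumes at least 3 processors")
    new = list(loads)
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            new[i] += alpha * (loads[j] - loads[i])
    return new


loads = [12.0, 0.0, 0.0, 0.0]
print(synchronous_target(loads))  # 3.0
print(diffusion_step(loads))      # [6.0, 3.0, 0.0, 3.0]
```

For simplicity every processor updates in lockstep here; an asynchronous balancer would let each processor perform its neighborhood exchange independently, whenever it chooses. Total load is conserved, and repeated steps converge toward the global average without any processor ever seeing it.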
8
Load Balancing State-of-the-Art (cont.)
Programmer-supplied Hints vs. Runtime Instrumentation
A load balancer needs an estimate of the weight of pending computation.
- The first method is for the programmer to provide hints about the weight of pending computation.
- The second method is to assume that future performance will be related to what has been observed in the past.
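The two estimation methods can be combined in a single estimator. In this hypothetical sketch, a programmer hint always wins; otherwise the estimator falls back on instrumented history, smoothed with an exponential moving average. The class name, the smoothing rule, and the default weight of 1.0 are all assumptions, not taken from the paper.

```python
# Illustrative sketch only: class name, EMA smoothing and default are hypothetical.
class WorkEstimator:
    """Estimate the weight of a pending work unit."""

    def __init__(self, smoothing=0.5):
        self.smoothing = smoothing
        self.history = {}  # unit id -> smoothed measured cost

    def record(self, unit_id, measured_cost):
        """Runtime instrumentation: fold a measured execution time into history."""
        old = self.history.get(unit_id)
        if old is None:
            self.history[unit_id] = measured_cost
        else:
            a = self.smoothing
            self.history[unit_id] = a * measured_cost + (1 - a) * old

    def estimate(self, unit_id, hint=None):
        """Prefer a programmer hint; otherwise assume the future resembles the
        past; with neither available, fall back to a default weight of 1.0."""
        if hint is not None:
            return hint
        return self.history.get(unit_id, 1.0)


est = WorkEstimator()
est.record("a", 4.0)
est.record("a", 8.0)
print(est.estimate("a"))             # 6.0  (history)
print(est.estimate("a", hint=20.0))  # 20.0 (hint overrides history)
```

For highly adaptive applications the history branch is exactly where prediction can go wrong, since a unit's weight may change drastically between executions.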
9
Load Balancing State-of-the-Art (cont.)
Explicitly Initiated Load Balancing vs. Preemptive Load Balancing
- Explicit load balancing has the advantage that well-tuned application routines are not interrupted.
- Implicit (preemptive) load balancing periodically checks for pending balancer messages.
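A small simulation shows why the choice matters for message latency. Under explicit balancing, a message that arrives mid-computation waits until the current work unit finishes; under preemptive balancing, it waits only until the next periodic check. The work-unit durations, the arrival time, and the check period below are made-up inputs, and both functions are hypothetical.

```python
# Illustrative sketch only: a toy latency model with hypothetical inputs.
import math


def explicit_delay(arrival, unit_durations):
    """Explicit: a message arriving mid-computation waits until the current
    work unit finishes and the application polls."""
    t = 0.0
    for d in unit_durations:
        t += d
        if t >= arrival:
            return t - arrival
    return 0.0  # assumption: arriving after all work, it is handled at once


def implicit_delay(arrival, period):
    """Implicit/preemptive: a timer checks for messages every `period` time
    units, regardless of what the application is doing."""
    next_check = math.ceil(arrival / period) * period
    return next_check - arrival


units = [1.0, 10.0, 1.0]            # one long, well-tuned routine in the middle
print(explicit_delay(1.25, units))  # 9.75
print(implicit_delay(1.25, 0.5))    # 0.25
```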
10
Representative Load Balancing Systems
ParMETIS
- ParMETIS is an MPI-based parallel library that implements a variety of algorithms for partitioning unstructured graphs.
- This type of explicit repartitioning suffers from the global synchronization and inaccurate workload prediction problems.
11
Representative Load Balancing Systems (cont.)
Charm++
- Charm++ is a parallel object-oriented programming language based on C++.
- Programs written in Charm++ are decomposed into a number of cooperating message-driven objects called chares.
- The load balancing methods are implemented using a global barrier.
- Load balancing is achieved by mapping and re-mapping chares to available processors.
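Mapping and re-mapping objects to processors can be illustrated with the classic greedy heuristic: place the heaviest remaining object on the currently least-loaded processor. Charm++ ships several load balancing strategies; this Python sketch does not reproduce any specific one, and the function name and object loads are hypothetical.

```python
# Illustrative sketch only: a generic greedy mapping, not a Charm++ strategy.
import heapq


def greedy_map(object_loads, num_procs):
    """Assign objects (heaviest first) to the currently least-loaded processor.
    Returns (mapping object -> processor, per-processor total load)."""
    heap = [(0.0, p) for p in range(num_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    mapping = {}
    for obj, load in sorted(object_loads.items(), key=lambda kv: kv[1], reverse=True):
        proc_load, p = heapq.heappop(heap)
        mapping[obj] = p
        heapq.heappush(heap, (proc_load + load, p))
    totals = [0.0] * num_procs
    for load, p in heap:
        totals[p] = load
    return mapping, totals


mapping, totals = greedy_map({"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}, 2)
print(mapping)  # {'a': 0, 'b': 1, 'c': 1, 'd': 0, 'e': 1}
print(totals)   # [8.0, 8.0]
```

Because the whole object set is considered at once, a mapping like this needs a consistent global snapshot, which is why such strategies pair naturally with a global barrier.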
12
PREMA
PREMA is a runtime library based on a design philosophy which includes:
- Single-sided communication.
- A global namespace.
- A framework which allows implementation of customized dynamic load balancing algorithms.
- A suite of commonly used dynamic load balancing strategies.
13
PREMA (cont.)
- The application's domain is first decomposed into some number of subdomains.
- Each subdomain is then registered with the PREMA system as a mobile object and assigned a unique mobile pointer.
- The PREMA library allows load balancing to be initiated either explicitly or implicitly.
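The idea of a mobile pointer can be sketched as a directory that maps each pointer to the processor currently holding the object, so that senders never need to know where an object lives. This is a single-process toy, assuming a centralized dictionary where a real runtime would use a distributed directory and message forwarding; the class and method names are hypothetical and are not PREMA's API.

```python
# Illustrative sketch only: hypothetical names, not PREMA's API.
import itertools


class MobileObjectDirectory:
    """Toy global namespace: objects are addressed by a mobile pointer that
    stays valid no matter which processor currently owns the object."""

    def __init__(self):
        self._next_id = itertools.count()
        self._location = {}  # mobile pointer -> owning processor
        self._data = {}      # mobile pointer -> object state

    def register(self, obj, proc):
        """Register a subdomain as a mobile object; return its mobile pointer."""
        mp = next(self._next_id)
        self._location[mp] = proc
        self._data[mp] = obj
        return mp

    def migrate(self, mp, new_proc):
        """Move an object; its mobile pointer does not change."""
        self._location[mp] = new_proc

    def send(self, mp, handler, *args):
        """Deliver a message by mobile pointer; the sender never names a
        processor. Returns (processor that ran the handler, handler result)."""
        proc = self._location[mp]
        return proc, handler(self._data[mp], *args)


d = MobileObjectDirectory()
mp = d.register({"cells": 100}, proc=0)  # subdomain starts on processor 0
d.migrate(mp, 3)                         # load balancer moves it to processor 3
print(d.send(mp, lambda sub, k: sub["cells"] * k, 2))  # (3, 200)
```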
14
PREMA (cont.)
Explicit Load Balancing
- Explicit load balancing requires the application program to explicitly hand control to the load balancing algorithm.
- This is done with a polling operation.
- Its drawback is the delay often suffered by load balancing information and request messages, which wait until the application next polls.
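The delay follows directly from the control flow. In this hypothetical simulation, messages that arrive while a work unit executes are queued and only handled at the explicit `poll` between units; `poll`, `run_explicit`, and the message names are illustrative, not PREMA calls.

```python
# Illustrative sketch only: `poll` and `run_explicit` are hypothetical names.
from collections import deque


def poll(inbox, log):
    """Hand control to the load balancer: drain every pending message."""
    while inbox:
        log.append(("handled", inbox.popleft()))


def run_explicit(work_units, arrivals):
    """work_units: names of units to execute in order.
    arrivals: unit name -> messages that arrive while that unit runs."""
    inbox, log = deque(), []
    for unit in work_units:
        log.append(("start", unit))
        inbox.extend(arrivals.get(unit, []))  # message lands mid-computation...
        log.append(("finish", unit))          # ...but waits for the unit to end
        poll(inbox, log)                      # explicit poll between units
    return log


for event in run_explicit(["u1", "u2"], {"u1": ["work-request"]}):
    print(event)
```

The `work-request` appears in the log only after `u1` finishes; the longer the work unit, the longer the balancer is kept waiting.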
15
PREMA (cont.)
Implicit Load Balancing
- Load balancing messages that are processed preemptively in no way affect the execution of the application.
- Load balancing messages can be guaranteed to be received in a timely manner.
- The number of wasted processor cycles is minimized.
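One way to get preemptive handling on a shared-memory node is a dedicated thread that services the balancer's message queue while the application computes. This Python sketch uses a thread purely as a stand-in for whatever preemption mechanism a real runtime employs; the function name, the message, and the 0.2-second "work unit" are assumptions.

```python
# Illustrative sketch only: a thread stands in for the runtime's preemption.
import queue
import threading
import time


def run_implicit(work_seconds=0.2):
    """Run a 'work unit' on the main thread while a background thread services
    load-balancing messages as soon as they arrive."""
    inbox = queue.Queue()
    working = threading.Event()
    handled = []  # (message, was the work unit still running when handled?)

    def balancer():
        while True:
            msg = inbox.get()  # blocks until a message arrives
            if msg is None:    # shutdown sentinel
                return
            handled.append((msg, working.is_set()))

    t = threading.Thread(target=balancer)
    t.start()
    working.set()
    inbox.put("work-request")  # arrives while the application computes
    time.sleep(work_seconds)   # stand-in for a long computation
    working.clear()
    inbox.put(None)
    t.join()
    return handled
```

Calling `run_implicit()` should return `[('work-request', True)]`: the message was handled while the work unit was still running, rather than after it finished.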
16
Performance Evaluation
The benchmark program allows us to compare the performance of the three load balancers.
- Command-line parameters are parsed to determine the number of work units.
- The work units are created and distributed to the available processors.
- Computation is assigned to each work unit.
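The setup steps above can be sketched as follows. The flag names `--units` and `--procs`, the round-robin distribution, and the `weight_of` callback are hypothetical; the paper's benchmark may parse and distribute its work units differently.

```python
# Illustrative sketch only: flags, distribution and callback are hypothetical.
import argparse


def parse_args(argv=None):
    """Parse the number of work units (and processors) from the command line."""
    parser = argparse.ArgumentParser(description="Synthetic load-balancing benchmark (sketch)")
    parser.add_argument("--units", type=int, default=64, help="number of work units")
    parser.add_argument("--procs", type=int, default=4, help="number of processors")
    return parser.parse_args(argv)


def create_and_distribute(num_units, num_procs, weight_of):
    """Create work units, assign each a computational weight via weight_of,
    and distribute them round-robin across processors."""
    assignment = {p: [] for p in range(num_procs)}
    for u in range(num_units):
        assignment[u % num_procs].append((u, weight_of(u)))
    return assignment


args = parse_args(["--units", "6", "--procs", "3"])
print(create_and_distribute(args.units, args.procs, lambda u: 1.0))
```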
17
Performance Evaluation (cont.)
- Control is handed to the runtime system and the load balancer.
- There is no communication between work units, and work units are able to execute in any order.
- We vary two parameters: the initial imbalance percentage and the difference in computational weights.
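Assuming, purely for illustration, that the initial imbalance percentage is the share of processors that start with heavy work units, and that the difference in computational weights is the ratio between heavy and light units, a workload generator for such a parameter sweep might look like this. Both definitions and all names are hypothetical; the imbalance measure reported is max load divided by average load.

```python
# Illustrative sketch only: parameter definitions and names are assumptions.
def initial_weights(num_procs, units_per_proc, imbalance_pct, weight_ratio, light=1.0):
    """The first imbalance_pct percent of processors receive 'heavy' units
    weighing weight_ratio * light; all other processors receive light units."""
    heavy_procs = round(num_procs * imbalance_pct / 100)
    return {
        p: [light * weight_ratio if p < heavy_procs else light] * units_per_proc
        for p in range(num_procs)
    }


def imbalance(weights):
    """Max processor load divided by average load (1.0 means perfectly balanced)."""
    loads = [sum(units) for units in weights.values()]
    return max(loads) / (sum(loads) / len(loads))


w = initial_weights(num_procs=4, units_per_proc=2, imbalance_pct=25, weight_ratio=4.0)
print(w[0], w[1])    # [4.0, 4.0] [1.0, 1.0]
print(imbalance(w))  # about 2.29 (8 / 3.5)
```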
18
Performance Evaluation (cont.)
26
Conclusions
- We have presented a runtime software system for implementing asynchronous, highly adaptive, and irregular applications on distributed memory platforms.
- Our approach is effective in terms of minimizing idle cycles due to workload imbalances, and efficient in terms of the overhead introduced during load balancing, for asynchronous and highly adaptive applications.