CS 484 Load Balancing
Goal: all processors working all the time (an efficiency of 1). Distribute the load (work) to meet this goal.
Two types of load balancing:
- Static
- Dynamic
Load Balancing
The load balancing problem can be reduced to the bin-packing problem, which is NP-complete. For simple cases we can do well, but heterogeneity complicates things: there are different types of resources (processor, network, etc.).
Evaluation of Load Balancing
Efficiency:
- Are the processors always working?
- How much processing overhead does the load-balancing algorithm add?
Communication:
- Does load balancing introduce or affect the communication pattern?
- How much communication overhead does the load-balancing algorithm add?
- How many edges of the communication graph are cut?
Partitioning Techniques
Regular grids (easy):
- striping
- blocking
- use processing power to divide the load more fairly
Generalized graphs:
- levelization
- scattered decomposition
- recursive bisection
Example
Consider a set of twelve independent tasks with the following execution times: {10, 6, 4, 4, 2, 2, 2, 2, 1, 1, 1, 1}. How would you distribute these tasks among 4 processors?
Consecutive Block Assignment
With a consecutive block assignment, the execution time for these twelve tasks would be 20 time units. A better schedule takes only 10 time units.
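The two schedules can be reproduced with a short sketch. This is illustrative code, not from the course; `block_assignment` and `lpt_assignment` are my names, and the greedy longest-processing-time (LPT) heuristic stands in for the better schedule:

```python
import heapq

def block_assignment(times, p):
    """Makespan when consecutive blocks of tasks go to each processor."""
    per = len(times) // p
    loads = [sum(times[i * per:(i + 1) * per]) for i in range(p)]
    return max(loads)  # finish time of the busiest processor

def lpt_assignment(times, p):
    """LPT greedy: give each task (largest first) to the least-loaded processor."""
    loads = [0] * p                      # current load of each processor
    heapq.heapify(loads)                 # kept as a min-heap
    for t in sorted(times, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + t)
    return max(loads)

tasks = [10, 6, 4, 4, 2, 2, 2, 2, 1, 1, 1, 1]
print(block_assignment(tasks, 4))  # 20
print(lpt_assignment(tasks, 4))    # 10
```

With these task times the LPT schedule is in fact optimal, since the total work (36) divided by 4 processors is 9, and one task alone takes 10 units.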
Evaluation of Load Balancing
Goal: find a good mapping from the application graph G = (V, E) onto the processor graph H = (U, F).
Consider:
- Load: the maximum number of nodes from G assigned to any single node of H
- Dilation: the maximum distance in H of the route of any single edge from G
- Congestion: the maximum number of edges from G that have to be routed via any single edge of H
Overall goal: find a mapping π that minimizes all three measures: load, dilation, and congestion.
Note: today's networks make dilation inconsequential to some extent.
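The three measures can be computed directly for a small example. A sketch under my own assumptions (edges of G are routed along BFS shortest paths in H; `evaluate_mapping` is a made-up helper, not a standard API):

```python
from collections import Counter, deque

def bfs_path(adj, src, dst):
    """Shortest path from src to dst in an unweighted graph, by BFS."""
    prev = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                q.append(v)
    path, u = [], dst
    while u is not None:
        path.append(u)
        u = prev[u]
    return path[::-1]

def evaluate_mapping(G_edges, H_adj, pi):
    """Load, dilation, congestion of a mapping pi: V(G) -> V(H)."""
    load = max(Counter(pi.values()).values())
    dilation, congestion = 0, Counter()
    for u, v in G_edges:
        path = bfs_path(H_adj, pi[u], pi[v])
        dilation = max(dilation, len(path) - 1)
        for a, b in zip(path, path[1:]):      # each hop uses one edge of H
            congestion[frozenset((a, b))] += 1
    return load, dilation, max(congestion.values(), default=0)

# G: a 4-cycle of tasks; H: two processors joined by one link
G_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
H_adj = {0: [1], 1: [0]}
pi = {0: 0, 1: 0, 2: 1, 3: 1}
print(evaluate_mapping(G_edges, H_adj, pi))  # (2, 1, 2)
```

Here two cycle edges cross the single link of H, so its congestion is 2 even though load and dilation are as small as possible.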
Levelization
Begin with a boundary and number those nodes level 1. All nodes connected to a level-1 node are labeled level 2, and so on. Partitioning is then performed:
- determine the number of nodes per processor
- count off the nodes of a level until that quota is exhausted
- proceed to the next level
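The levelization scheme above amounts to a BFS labeling followed by counting off nodes in level order. A minimal sketch (function names are mine; nodes per processor is taken as ceil(n/p)):

```python
from collections import Counter, deque

def levelize(adj, boundary):
    """BFS levelization: boundary nodes are level 1, their neighbors level 2, ..."""
    level = {v: 1 for v in boundary}
    q = deque(boundary)
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                q.append(v)
    return level

def partition_by_levels(adj, boundary, p):
    """Count off nodes level by level until each processor has its share."""
    level = levelize(adj, boundary)
    order = sorted(level, key=lambda v: level[v])  # nodes in level order
    per = -(-len(order) // p)                      # ceil(n / p) nodes per processor
    return {v: i // per for i, v in enumerate(order)}

# A 3x2 grid graph; the boundary is the left column {0, 3}
adj = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5],
       3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}
parts = partition_by_levels(adj, [0, 3], 2)
```

Because nodes are assigned in level order, each processor ends up with a contiguous band of levels, which tends to keep cut edges between neighboring bands only.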
Recursive Coordinate Bisection
Divide the domain based on the physical coordinates of the nodes: pick a dimension and divide it in half. RCB uses no connectivity information, so:
- many edges cross partition boundaries
- partitions may be disconnected
Some newer research based on graph separators overcomes some of these problems.
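RCB as described can be sketched in a few lines: sort along an alternating coordinate axis and split the point set in half, recursively. An illustrative sketch (assumes p is a power of two; `rcb` is my name):

```python
def rcb(points, ids, p, depth=0):
    """Recursive coordinate bisection of the given point ids into p parts."""
    if p == 1:
        return [ids]
    dim = depth % len(points[0])                      # alternate x, y, ... axes
    ids = sorted(ids, key=lambda i: points[i][dim])   # order along chosen axis
    half = len(ids) // 2                              # median split
    return (rcb(points, ids[:half], p // 2, depth + 1) +
            rcb(points, ids[half:], p // 2, depth + 1))

pts = [(x, y) for y in range(4) for x in range(4)]    # a 4x4 grid of points
parts = rcb(pts, list(range(16)), 4)                  # four quadrant-like parts
```

Note that only coordinates are consulted; no edge information enters the split, which is exactly why cut quality can be poor.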
Unbalanced Recursive Bisection
An attempt at reducing communication costs: create subgrids that have better aspect ratios. Instead of dividing the grid in half, consider unbalanced subgrids of size:
- 1/p and (p-1)/p
- 2/p and (p-2)/p
- etc.
Choose the partition size that minimizes the subgrid aspect ratio.
Graph-Theory-Based Algorithms
Geometric algorithms are generally low quality because they do not take connectivity into account. Graph-theory algorithms apply what we know about generalized graphs to the partitioning problem; the hope is that they reduce the cut size.
Greedy Bisection
- Start with a vertex of the smallest degree (fewest edges).
- Mark all its neighbors, then all its neighbors' neighbors, and so on.
- The first n/p marked vertices form one subdomain.
- Apply the algorithm to the remaining vertices.
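The greedy scheme above is essentially repeated BFS growth from a low-degree seed. A sketch under my own assumptions (degree is counted among still-unassigned vertices; `greedy_partition` is a made-up name):

```python
from collections import deque

def greedy_partition(adj, p):
    """Grow each subdomain by BFS from a smallest-degree unassigned vertex."""
    target = len(adj) // p            # n/p vertices per subdomain
    part = {}
    for k in range(p):
        free = [v for v in adj if v not in part]
        if not free:
            break
        # seed: unassigned vertex with fewest unassigned neighbors
        seed = min(free, key=lambda v: sum(u not in part for u in adj[v]))
        marked, q = {seed}, deque([seed])
        while q and len(marked) < target:
            u = q.popleft()
            for v in adj[u]:
                if v not in part and v not in marked and len(marked) < target:
                    marked.add(v)     # mark neighbors, then their neighbors, ...
                    q.append(v)
        for v in marked:
            part[v] = k
    return part

# A path of six vertices split across two processors
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
part = greedy_partition(adj, 2)
```

On the path graph the BFS fronts stay contiguous, so the two subdomains meet along a single cut edge.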
Recursive Graph Bisection
Based on graph distance rather than coordinate distance:
- determine the two most widely separated nodes
- organize and partition the nodes according to their distance from these extremities
Computationally expensive, but approximation methods can be used.
Recursive Spectral Bisection
Minimize the number of edges cut by the partition.
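One level of spectral bisection splits the graph by the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the graph Laplacian). A sketch using NumPy for the eigendecomposition; the median split and the function name are my choices:

```python
import numpy as np

def spectral_bisect(adj_matrix):
    """Split a graph by the median of the Fiedler vector of its Laplacian."""
    A = np.asarray(adj_matrix, dtype=float)
    L = np.diag(A.sum(axis=1)) - A        # graph Laplacian L = D - A
    vals, vecs = np.linalg.eigh(L)        # eigenpairs in ascending order
    fiedler = vecs[:, 1]                  # eigenvector of 2nd-smallest eigenvalue
    return fiedler <= np.median(fiedler)  # boolean mask: side of the cut

# Two triangles joined by a single edge (between vertices 2 and 3)
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[u, v] = A[v, u] = 1
side = spectral_bisect(A)
```

On this barbell-shaped graph the Fiedler vector changes sign exactly across the bridge edge, so the bisection cuts only that one edge, which matches the slide's claim that RSB minimizes the cut.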
Comparison on the same mesh: RCB cuts 529 edges, RGB cuts 618 edges, RSB cuts 299 edges.
Dynamic Load Balancing
The load is statically partitioned initially; adjust it when an imbalance is detected. Objectives:
- rebalance the load
- keep the edge cut minimized (communication)
- avoid too much overhead
Dynamic Load Balancing
Consider adaptive algorithms: after an interval of computation, the mesh is adjusted according to an estimate of the discretization error, coarsened in some areas and refined in others. Mesh adjustment causes load imbalance.
Centralized DLB
Control of the load is centralized. Two approaches:
- Master-worker (task scheduling): tasks are kept in a central location and workers ask for tasks. Requires many tasks with weak locality requirements; no major communication between workers.
- Load monitor: periodically monitor the load on the processors and adjust it to keep an optimal balance.
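The master-worker pattern can be sketched with a shared queue. This is an illustrative thread-based sketch, not course code; squaring stands in for real work, and self-scheduling emerges because fast workers simply pull more tasks:

```python
import queue
import threading

def master_worker(tasks, n_workers):
    """Workers repeatedly pull tasks from a central queue until it is empty."""
    work = queue.Queue()
    for t in tasks:
        work.put(t)                  # master publishes all tasks up front
    results = queue.Queue()

    def worker():
        while True:
            try:
                t = work.get_nowait()    # ask the central location for a task
            except queue.Empty:
                return                   # no tasks left: worker retires
            results.put(t * t)           # stand-in for the real computation

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    out = []
    while not results.empty():
        out.append(results.get())
    return sorted(out)

print(master_worker(range(5), 3))  # [0, 1, 4, 9, 16]
```

Note the workers never talk to each other, only to the central queue, which is why this scheme suits tasks with weak locality requirements.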
Decentralized DLB
Generally focused on a work pool. Two approaches:
- hierarchical
- fully distributed
Fully Distributed DLB
Lower overhead than centralized schemes, but no global information:
- load is only locally optimized
- propagation is slow
- load balance may not be as good as with a centralized scheme
Three steps:
- flow calculation (how much to move)
- mesh node selection (which work to move)
- actual mesh node migration
Flow Calculation
View it as a network flow problem on the processor communication graph:
- add source and sink nodes
- connect the source to every node, with edge value equal to that node's current load
- connect the sink to every node, with edge value equal to the mean load
Flow Calculation
Many network flow algorithms are more intense than necessary, and they are not parallel. Use simpler, more scalable algorithms instead, such as random matchings:
- pick random pairs of neighboring processes
- exchange some load
- eventually you may reach balance
Diffusion
Each processor balances its load with all its neighbors. How much work should processor i have?
li ← li + α Σj∈N(i) (lj − li), where α is a weighting factor.
How much to send on edge (i, j)? α (li − lj), whenever li > lj.
Repeat until all load is balanced.
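The diffusion step can be simulated directly; here is a minimal sketch (the ring topology, α = 0.25, and the name `diffuse` are my choices for illustration):

```python
def diffuse(loads, adj, alpha=0.25, steps=200):
    """First-order diffusion: each node trades load with every neighbor per step."""
    loads = list(map(float, loads))
    for _ in range(steps):
        delta = [0.0] * len(loads)
        for i, nbrs in adj.items():
            for j in nbrs:
                delta[i] += alpha * (loads[j] - loads[i])  # flow on edge (i, j)
        loads = [l + d for l, d in zip(loads, delta)]      # synchronous update
    return loads

# A ring of four processors with all the load on one of them
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
final = diffuse([8.0, 0.0, 0.0, 0.0], adj)
```

Because each edge's flow is antisymmetric, the total load is conserved, and every node converges toward the mean (2.0 here); the number of steps needed grows with the diameter of the network, which is the slowness the next slide addresses.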
Diffusion
Convergence to a balanced load can be slow. It can be improved with over-relaxation:
- monitor what is sent in each step
- determine how much to send based on the current imbalance and how much was sent in previous steps
This diffuses the load in fewer steps.
Dimension Exchange
Rather than communicating with all neighbors each round, communicate with only one (a synchronous algorithm). The idea comes from the dimensions of a hypercube; use an edge coloring for general graphs. Exchange load with the neighbor along one dimension:
l = (li + lj) / 2
This will converge in d steps on a d-dimensional hypercube. Some graphs may need a different exchange factor to converge faster:
l = li * a + lj * (1 - a)
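On a hypercube the exchange pairs each node i with its neighbor i XOR 2^k in round k, and after d rounds every node holds the mean. A minimal sketch (assumes the averaging factor a = 1/2 and a power-of-two node count):

```python
def dimension_exchange(loads):
    """Balance loads on a d-dimensional hypercube, one dimension per step."""
    loads = list(map(float, loads))
    n = len(loads)              # number of nodes: n = 2**d
    d = n.bit_length() - 1
    for k in range(d):          # one synchronous round per hypercube dimension
        for i in range(n):
            j = i ^ (1 << k)    # neighbor along dimension k
            if i < j:           # average each pair exactly once
                loads[i] = loads[j] = (loads[i] + loads[j]) / 2
    return loads

final = dimension_exchange([8, 0, 0, 0, 0, 0, 0, 0])
print(final)  # all nodes hold 1.0 after d = 3 rounds
```

Each round halves the imbalance along one dimension, so after the d-th round the remaining difference between any two nodes is zero, matching the d-step convergence claim.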