A Parallelization of State-of-the-Art Graph Bisection Algorithms


1 A Parallelization of State-of-the-Art Graph Bisection Algorithms
Nan Dun, Kenjiro Taura, Akinori Yonezawa
Graduate School of Information Science and Technology, The University of Tokyo

2 Problem Description
Graph Partition
- Goal: to minimize the cut
- K-partition
- Bisection (bipartition)
Problem Complexity
- Finding the best partition, or even an approximate partition, is NP-hard 1)2)
Solutions
- Heuristics
- Non-deterministic
- On the Grid
Graph bisection problem
- Given an undirected graph G=(V,E), find a partition (L,R) of V that satisfies |L|=|R| and minimizes the number of edges between L and R.
Figure: example graph with vertices 1 to 6, bisected into L={1,2,3} and R={4,5,6}.
July 31, Kochi SWoPP 2006
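The problem statement above can be made concrete with a small sketch. The edge list below is a hypothetical 6-vertex graph (the edges of the slide's figure are not recoverable from the transcript), and the exhaustive search is only meant to illustrate the objective; it is not one of the heuristics discussed in the talk.

```python
from itertools import combinations

def cut_size(edges, left):
    """Number of edges with exactly one endpoint in `left`."""
    return sum(1 for u, v in edges if (u in left) != (v in left))

def brute_force_bisection(vertices, edges):
    """Try every balanced split; only feasible for tiny graphs."""
    half = len(vertices) // 2
    best = min(combinations(sorted(vertices), half),
               key=lambda left: cut_size(edges, set(left)))
    return set(best), cut_size(edges, set(best))

# Hypothetical graph (an assumption): two triangles joined by the edge (3, 4).
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
VERTICES = {1, 2, 3, 4, 5, 6}

best_left, best_cut = brute_force_bisection(VERTICES, EDGES)
print(best_left, best_cut)
```

The number of balanced splits grows exponentially with |V|, which is why heuristics are used in practice.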

3 Practical Application
In Mathematics
- Analysis of sparse systems of linear equations
In Computer Science
- Modeling data placement on distributed memory to minimize communication
In Various Other Domains
- VLSI design
- Transportation networks
- Communication networks

4 Bisection Initialization
Bisection Flow
Bisection Initialization
- Random initialization
- Half-half initialization
- Region growing
Bisection Refinement
- Kernighan-Lin 3)4)
- Tabu Search 7): Fixed Tabu Search, Reactive Tabu Search
Figure (flow): Bisection Initialization -> Initial Bisection -> Bisection Refinement -> Final Bisection

5 Min-Max Greedy Growing7)
Min: search for the vertices whose addition causes the minimal edge-cut.
Max: break ties by maximizing internal connections.
Figure: candidate vertices A, B, and C next to the growing set (addset).
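One possible reading of the Min and Max rules is sketched below. It assumes that the two sides grow alternately from one seed vertex each, that "Min" means the fewest edges to the opposite side, and that "Max" means the most edges to the side being grown. The function name, the seeds, and the graph are hypothetical, so this is an illustration rather than the exact procedure of reference 7).

```python
def min_max_greedy(adj, seed_a, seed_b):
    """Grow two sides alternately, one vertex per turn (a sketch)."""
    sides = [{seed_a}, {seed_b}]
    free = set(adj) - {seed_a, seed_b}
    turn = 0
    while free:
        addset, other = sides[turn], sides[1 - turn]

        def key(v):
            to_other = sum(1 for u in adj[v] if u in other)   # Min criterion
            to_addset = sum(1 for u in adj[v] if u in addset)  # Max criterion
            return (to_other, -to_addset, v)  # v only makes ties deterministic

        chosen = min(free, key=key)
        addset.add(chosen)
        free.remove(chosen)
        turn = 1 - turn
    return sides[0], sides[1]

# Hypothetical graph (an assumption): two triangles joined by the edge (3, 4).
ADJ = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
       4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
side_a, side_b = min_max_greedy(ADJ, seed_a=1, seed_b=6)
print(side_a, side_b)
```

Alternating turns keeps the two sides balanced when |V| is even, so no separate balancing step is needed in this sketch.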

6 Swapping Pair of Vertices
Kernighan-Lin 3)4)
1. Calculate the gain of each vertex.
2. Search for a series of pairs that leads to the maximal edge-cut reduction if swapped.
3. Swap the pairs of vertices obtained in step 2 and lock them from further swaps in the current pass.
4. Iterate steps 1, 2, and 3 until the edge-cut stops decreasing.
Swapping a Pair of Vertices
Figure: vertices A, B, C, and D.
- gain(B) = -1, gain(C) = -2
- ΔCut of swapping B, C = gain(B) + gain(C) + 2 = -1
- The + 2 term applies because B and C are adjacent: both gains count their shared edge as leaving the cut, yet it still crosses the cut after the swap.
*gain := # of internal edges - # of external edges
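The gain arithmetic above can be checked with a short sketch. The four-vertex graph is hypothetical; it was chosen only so that gain(B) = -1 and gain(C) = -2 as on the slide, and it may differ from the slide's actual figure. The gain follows the slide's definition (internal minus external edges), so a negative ΔCut means the cut shrinks.

```python
def cut_size(adj, left):
    """Count edges crossing the bisection (each undirected edge once)."""
    return sum(1 for u in adj for v in adj[u]
               if u < v and ((u in left) != (v in left)))

def gain(adj, left, v):
    """Slide's definition: internal edges minus external edges of v."""
    internal = sum(1 for u in adj[v] if (u in left) == (v in left))
    return internal - (len(adj[v]) - internal)

def swap_delta(adj, left, a, b):
    """Change in cut size if a and b (on opposite sides) are swapped."""
    shared = 2 if b in adj[a] else 0  # an edge a-b stays in the cut
    return gain(adj, left, a) + gain(adj, left, b) + shared

# Hypothetical graph (an assumption) with left = {A, B}, right = {C, D}.
ADJ = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}
LEFT = {"A", "B"}

delta = swap_delta(ADJ, LEFT, "B", "C")
before = cut_size(ADJ, LEFT)
after = cut_size(ADJ, {"A", "C"})
print(gain(ADJ, LEFT, "B"), gain(ADJ, LEFT, "C"), delta, before, after)
```

Recomputing the cut before and after the swap confirms that the incremental formula agrees with the direct count.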

7 Tabu Search 7)
Kernighan-Lin Like
- Swaps pairs of vertices according to their gains
Temporarily Forbidden
- Previously swapped vertices are temporarily forbidden to move for a period of time (the Tabu Length)
- Tabu Length: a fraction (Tabu Fraction) of |V|
- E.g.: Tabu Fraction = 0.01, |V| = 1000, Tabu Length = 0.01 x |V| = 10
- Previously swapped pairs are allowed to move again after 10 other swaps
- The goal is to escape local minima
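The bookkeeping behind the tabu length can be sketched as follows. The class and method names are hypothetical, and rounding the product to an integer with a minimum of 1 is an assumption; the slide only gives the example 0.01 x 1000 = 10.

```python
def tabu_length(tabu_fraction, num_vertices):
    """Tabu length as a fraction of |V| (rounding is an assumption)."""
    return max(1, round(tabu_fraction * num_vertices))

class TabuList:
    """Remembers when each swapped vertex may move again."""

    def __init__(self, length):
        self.length = length
        self.step = 0       # number of swaps performed so far
        self.release = {}   # vertex -> first step at which it is free

    def is_tabu(self, v):
        return self.release.get(v, 0) > self.step

    def record_swap(self, a, b):
        self.step += 1
        self.release[a] = self.release[b] = self.step + self.length

length = tabu_length(0.01, 1000)
tabu = TabuList(length)
tabu.record_swap("a", "b")
for i in range(9):                      # nine other swaps: still forbidden
    tabu.record_swap(f"x{i}", f"y{i}")
still_forbidden = tabu.is_tabu("a")
tabu.record_swap("x9", "y9")            # tenth other swap: released
released = not tabu.is_tabu("a")
print(length, still_forbidden, released)
```

Storing a release step per vertex avoids scanning a queue of recent moves when checking whether a candidate is forbidden.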

8 Graph Types – Tabu Lengths
Figure: Edge-Cut versus Tabu Fraction, and Number of Vertex versus Degree, for two graphs.
- First graph: |V| = |E| = Deg: max 43, min 3, avg. 19.8
- Second graph: |V| = |E| = Deg: max 573, min 1, avg. 6.1
Denser random graphs tend to prefer smaller tabu lengths, while denser geometric graphs tend to prefer larger tabu lengths 8)
Distribution of Vertex Degree
- Graphs with a uniform distribution of vertex degree tend to have a single best-fitting tabu length

9 RRTS 7)
Synthesis of Heuristics
- The heuristics complement each other
Reactive
- Tries each tabu length to see which is better
- Adaptive to various graphs
Best Quality
- Goes beyond local minima
Long Running Time
- Scoring phase
REACTIVERANDOMIZEDTABUSEARCH
  Scoring each Tabu length by small runs of TS
  do I times
    Initial bisection by Min-Max
    do J times
      TS with high-scored Tabu length
      Refine by Kernighan-Lin runs
R. Battiti and A. A. Bertossi. Greedy, Prohibition, and Reactive Heuristics for Graph Partitioning. IEEE Transactions on Computers, Vol. 48, April 1999.

10 Multi-level for Large Graphs
Coarsen Phase
- Coarsen a large graph into a smaller one by using a "matching scheme"
- Multi-level coarsening
Bisection Phase
- Bisecting small graphs is usually very fast
Uncoarsen Phase
- Map the bisection back to the original graph
- Perform refinement in each uncoarsening phase
METIS 5)12)
Figure: Matching Scheme
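A single coarsening level can be sketched as below. This version contracts a random maximal matching, which is an assumption made for brevity; METIS offers several matching schemes and this is not a reproduction of any of them. All function names and the graph are hypothetical. Edge weights are summed when parallel edges appear, so the cut of a coarse bisection equals the cut of its projection onto the fine graph.

```python
import random

def coarsen(adj, rng):
    """Contract a random maximal matching; adj[v] maps neighbor -> weight."""
    order = sorted(adj)
    rng.shuffle(order)
    coarse_of, next_id = {}, 0
    for v in order:
        if v in coarse_of:
            continue
        coarse_of[v] = next_id
        for u in sorted(adj[v]):        # match v with one free neighbor
            if u not in coarse_of:
                coarse_of[u] = next_id
                break
        next_id += 1
    coarse = {c: {} for c in range(next_id)}
    for v in adj:
        for u, w in adj[v].items():
            cv, cu = coarse_of[v], coarse_of[u]
            if cv != cu:                # drop edges inside a merged pair
                coarse[cv][cu] = coarse[cv].get(cu, 0) + w
    return coarse, coarse_of

def cut_weight(adj, left):
    """Total weight of edges leaving `left`."""
    return sum(w for u in left for v, w in adj[u].items() if v not in left)

def project(coarse_left, coarse_of):
    """Uncoarsen: fine vertices whose coarse vertex lies in coarse_left."""
    return {v for v, c in coarse_of.items() if c in coarse_left}

# Hypothetical unit-weight graph (an assumption): two triangles plus a bridge.
ADJ = {1: {2: 1, 3: 1}, 2: {1: 1, 3: 1}, 3: {1: 1, 2: 1, 4: 1},
       4: {3: 1, 5: 1, 6: 1}, 5: {4: 1, 6: 1}, 6: {4: 1, 5: 1}}
coarse, coarse_of = coarsen(ADJ, random.Random(0))
print(len(ADJ), "->", len(coarse))
```

Because the cut is preserved under projection, a bisection found on the small graph is a valid starting point for refinement on the larger one.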

11 Comparison of Heuristics
Edge-cut and running time (blank cells are missing in the transcript):
Graph  METIS          RRTS100                    FTS10000
       cut    time    cut    time                cut    time
G1     130    0.01           168.11                     1.22
G2     366    0.07    353    696.49              354    13.85
G3     311    0.10           935.56              306    32.85
G4     6337   0.04    6257   353.45              6316   3.77
G5     950    0.17           Timeout (1 hour)    929    31.55

Graph properties (blank cells are missing in the transcript):
Graph        |V|     |E|      Degree                Best Tabu Fraction
                              Avg     Min    Max
G1:fe_4elt   11143   32818    7.93           15     0.02
G2:fe_pwt    36519   144794   5.89    3      12
G3:fe_body   45087   163734   7.26           28
G4:mem       17758   54196    6.10    1      573    0.14
G5:wing      62032   121544   3.92    2      4      0.01

12 Comparison of Heuristics
METIS
- Extremely fast: uses the multi-level technique
- High-quality bisections, but worse than RRTS: multi-level lacks "global optimizing" during the coarsening phase
RRTS
- Very slow: the scoring phase is time-consuming
- "Ever-best" bisections: adaptive to many kinds of graphs
FTS with Known Tabu-Length
- Much faster than RRTS
- Results comparable to RRTS

13 A Naive Parallelization
Figure (flow): Dispatch Graphs -> RRTS100 on each of seven nodes -> Synthesize Results
- Run RRTS independently on each node
- Simply equivalent to scaling up the number of iterations
- Generate different seeds for different nodes: the heuristics are sensitive to initial conditions
- 10% to 20% improvement
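The dispatch-and-synthesize pattern can be sketched as follows. Everything here is a stand-in: a thread pool plays the role of the grid nodes, and `run_once` is a hypothetical placeholder that draws a seeded random balanced bisection instead of running RRTS. Only the structure (one seed per worker, keep the smallest cut) reflects the slide.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def cut_size(adj, left):
    """Count edges crossing the bisection (each undirected edge once)."""
    return sum(1 for u in adj for v in adj[u]
               if u < v and ((u in left) != (v in left)))

def run_once(adj, seed):
    """Placeholder for one heuristic run: seeded random balanced split."""
    rng = random.Random(seed)
    vertices = sorted(adj)
    rng.shuffle(vertices)
    left = set(vertices[: len(vertices) // 2])
    return cut_size(adj, left), seed

def best_of_seeds(adj, seeds, workers=4):
    """Dispatch one run per seed, then keep the smallest cut."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda s: run_once(adj, s), seeds))
    return min(results)

# Hypothetical graph (an assumption): a ring of 12 vertices.
ADJ = {i: {(i - 1) % 12, (i + 1) % 12} for i in range(12)}
best_cut, best_seed = best_of_seeds(ADJ, seeds=range(16))
print(best_cut, best_seed)
```

Since the runs never communicate, the result is the same as running all seeds sequentially on one machine, which is the sense in which this approach only scales up the iteration count.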

14 Statistical Properties of Cut-size
Incidence of Bests
- Average quality is good
- Only 0.25% of the runs reach the best
General Property
- The distribution becomes more peaked as |V| grows
- The distribution tends towards Gaussian 8)
- Mean and variance scale linearly with |V|
Figure: Count versus Edge-Cut; |V| = |E| = Degree: max 43, min 3, avg 19.80; RRTS100 on 400 nodes provided by the Grid Challenge Federation

15 Issues of Parallelizing Heuristics
Hard with the Message-Passing Model (MPI)
- J.R. Gilbert and E. Zmijewski 9): A parallel graph partitioning algorithm for a message-passing multiprocessor. International Journal of Parallel Programming
- Par-METIS (Parallel METIS): only the "coarsen-uncoarsen" part is parallelized
Hard to Be Efficient (statistical property)
- Even if we could parallelize the heuristic efficiently, the fraction of iterations that reach the best bisections would still be small
- If we cooperatively run independent instances on the Grid: how many nodes are needed to reach the best partition, and when is a good threshold reached?

16 Contribution of Phases
Initial Phase
- Removes a large portion of the edge-cut
- Good initial partitions lead to good final partitions
- Running time is consistent across runs; good initial partitions save time for refinement
TS and KL Phase
- Reductions tend to be alike
- More iterations, better results
Figure: ΔEdge-Cut and Best Edge-Cuts

17 Results from Same Initial Bisections
Given the Same Initial Partitions
- The best initial partitions lead to the best final partitions
- FTS and KL tend to be deterministic: fewer swaps are available
- Diversity of edge-cut can be cancelled by distributing only one phase: running FTS and KL on one node is enough
Figure: Count histogram; FTS and KL performed on the same initial partitions, 50 nodes

18 Multi-level Scoring
Figure: Edge-Cut versus Level-1 Tabu Fraction, and Edge-Cut versus Level-2 Tabu Fraction
Mainly Used to Adapt to Large-Scale Graphs
- If |V| = 1000, Tabu = 0.01 x 1000 = 10
- If |V| = 100000, Tabu = 0.01 x 100000 = 1000
Tuning the Tabu Length to Fit Specific Graphs Better
- Level-1 scoring distinguishes graphs by their types
- Level-2 scoring tests for a better tabu length for the specific graph
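A coarse-then-fine search over tabu fractions might look like the sketch below. The candidate grids, the step size, and the `score` callback are all hypothetical; in the talk the score comes from short Tabu Search runs, whereas here a synthetic function stands in so the example is self-contained.

```python
def two_level_scoring(score, level1, step, points):
    """Pick the best coarse fraction, then search a fine grid around it.

    `score(fraction)` is assumed to return an edge-cut (lower is better).
    """
    best1 = min(level1, key=score)
    level2 = [round(best1 + k * step, 6) for k in range(-points, points + 1)]
    level2 = [f for f in level2 if f > 0]   # a fraction must stay positive
    best2 = min(level2, key=score)
    return best1, best2

# Synthetic score (an assumption): pretend the ideal fraction is 0.013.
def synthetic_score(fraction):
    return (fraction - 0.013) ** 2

best1, best2 = two_level_scoring(synthetic_score,
                                 level1=[0.01, 0.05, 0.1],
                                 step=0.001, points=4)
print(best1, best2)
```

The fine grid costs only a few extra scoring runs, since it is centred on one level-1 winner rather than covering the whole range.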

19 Final Approaches
Not to Use Multi-level Partition
- To preserve "best" quality
Not to Parallelize the Heuristic Itself
- Not a good trade-off
To Parallelize the Scoring Phase
- One group of nodes scores one tabu length
- With the multi-level scoring technique
To Parallelize the Initial Phase Only
- Remove diversity of edge-cut as soon as possible: take advantage of distributed runs to remove it
- Reduce computing effort as much as possible: further refinement can be done on a single node
To Use the GXP Cluster Shell
- "mw" command: mw M {{ W }}

20 Full Picture
Figure (flow):
- Multi-Level Scoring: High-Scored Level-1 Tabu Fraction -> parallel scoring (S: 0.001, S: 0.002, S: 0.003, S: 0.004, S: 0.005, S: 0.006, S: 0.007) -> High-Scored Level-2 Tabu Fraction
- Initial Phase: parallel Init runs -> Best Initial Partitions
- Refinement Phase: FTS and KL

21 Conclusions
Bisection Quality
- "Ever-best" partitions
- Edge-Cut_OUR ≤ Edge-Cut_RRTS ≤ Edge-Cut_METIS
Bisection Time
- Comparable and reasonable
- Time_METIS < Time_OUR << Time_RRTS
- Speed-up of 10 compared to RRTS
Adapted to the Grid Environment
- Scalable performance
- Convenient usage
- Good fault tolerance

22 Thank you for your attention!

