A Parallelization of State-of-the-Art Graph Bisection Algorithms

Nan Dun, Kenjiro Taura, Akinori Yonezawa
Graduate School of Information Science and Technology, The University of Tokyo

Problem Description
- Graph partition problem
  - Goal: minimize the cut
  - K-partition
  - Bisection (bipartition)
- Problem complexity
  - Finding the best partition, or even an approximate partition, is NP-hard 1)2)
- Solutions
  - Heuristics
  - Non-deterministic
  - On the Grid
- Graph partitioning problem: given an undirected graph G = (V, E), find a partition (L, R) of V satisfying |L| = |R| that minimizes the number of edges between L and R.
- Figure: a six-vertex example with L = {1, 2, 3} and R = {4, 5, 6}
July 31, Kochi SWoPP 2006

Practical Application
- In mathematics
  - Analysis of sparse systems of linear equations
- In computer science
  - Modeling data placement on distributed memory to minimize communication
- In various other domains
  - VLSI design
  - Transportation networks
  - Communication networks

Bisection Initialization
Bisection Flow
- Bisection initialization
  - Random initialization
  - Half-half initialization
  - Region growing
- Bisection refinement
  - Kernighan-Lin 3)4)
  - Tabu search 7)
    - Fixed tabu search
    - Reactive tabu search
- Figure (flow): Bisection Initialization -> Initial Bisection -> Bisection Refinement -> Final Bisection

Min-Max Greedy Growing7)
- Min: search for the vertices whose addition causes the minimal edge cut
- Max: break ties by maximizing internal connections
- Figure labels: addset; vertices A, B, C
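A minimal sketch of the Min and Max rules follows. It assumes that the two sides grow alternately from two random seed vertices and that "addset" is the side currently receiving a vertex; the slide does not spell out these details, so the function name and the alternation scheme are assumptions.

```python
import random


def min_max_greedy(adj, seed=0):
    """Grow two sides alternately; returns a dict vertex -> 0 or 1.

    Min rule: pick the unassigned vertex with the fewest edges to the
    other side. Max rule: break ties by the most edges to the add set.
    """
    rng = random.Random(seed)
    vertices = sorted(adj)
    s0, s1 = rng.sample(vertices, 2)      # assumed: random seed vertices
    side = {s0: 0, s1: 1}
    unassigned = [v for v in vertices if v not in side]
    addset = 0                            # side that receives the next vertex
    while unassigned:
        other = 1 - addset

        def key(v):
            to_other = sum(1 for u in adj[v] if side.get(u) == other)
            to_own = sum(1 for u in adj[v] if side.get(u) == addset)
            return (to_other, -to_own)    # min edge cut, then max internal

        best = min(key(v) for v in unassigned)
        choice = rng.choice([v for v in unassigned if key(v) == best])
        side[choice] = addset
        unassigned.remove(choice)
        addset = other                    # alternate to keep the sides balanced
    return side


# Hypothetical test graph: two triangles joined by one edge.
two_triangles = {
    1: [2, 3], 2: [1, 3], 3: [1, 2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5],
}

if __name__ == "__main__":
    print(min_max_greedy(two_triangles, seed=1))
```

Alternating the add set keeps the two sides within one vertex of each other. The quality of the result still depends on where the seed vertices land.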

Swapping Pair of Vertices
Kernighan-Lin 3)4)
1. Calculate the gain of each vertex
2. Search for a series of pairs that leads to the maximal edge-cut reduction if swapped
3. Swap the pairs of vertices obtained in step 2 and lock them against further swaps in the current pass
4. Repeat steps 1 to 3 until the edge cut converges

Swapping a pair of vertices (figure: vertices A, B, C, D)
- gain := (# of internal edges) - (# of external edges)
- gain(B) = -1, gain(C) = -2
- ΔCut of swapping B and C = gain(B) + gain(C) + 2 = -1
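The gain arithmetic can be checked with a short sketch. The four-vertex graph below is hypothetical, chosen only so that it reproduces gain(B) = -1 and gain(C) = -2; the slide's actual figure is not recoverable. The +2 term applies because B and C are adjacent: their shared edge is counted as external in both gains, yet it still crosses the cut after the swap.

```python
def gain(adj, side, v):
    """gain := # of internal edges - # of external edges (the slide's definition)."""
    internal = sum(1 for u in adj[v] if side[u] == side[v])
    external = len(adj[v]) - internal
    return internal - external


def delta_cut_swap(adj, side, a, b):
    """Change in edge cut if a and b (on opposite sides) are swapped."""
    shared = 1 if b in adj[a] else 0      # an a-b edge stays in the cut
    return gain(adj, side, a) + gain(adj, side, b) + 2 * shared


def cut_size(adj, side):
    """Count each undirected edge that crosses the bisection once."""
    return sum(1 for v in adj for u in adj[v] if v < u and side[v] != side[u])


# Hypothetical graph with edges A-B, A-C, B-C, B-D.
kl_adj = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"], "D": ["B"]}
kl_side = {"A": 0, "B": 0, "C": 1, "D": 1}

if __name__ == "__main__":
    print(gain(kl_adj, kl_side, "B"), gain(kl_adj, kl_side, "C"))
    print(delta_cut_swap(kl_adj, kl_side, "B", "C"))
```

In this assumed graph the cut drops from 3 to 2 after swapping B and C, which agrees with ΔCut = -1.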

Tabu Search 7)
- Kernighan-Lin like
  - Swaps pairs of vertices according to their gains
- Temporarily forbidden
  - Previously swapped vertices are temporarily forbidden to move for a period of time (the tabu length)
  - Tabu length: a fraction (the tabu fraction) of |V|
  - E.g., tabu fraction = 0.01, |V| = 1000: tabu length = 0.01 x |V| = 10
  - Previously swapped pairs are allowed to move again after 10 other swaps
- Purpose: to escape local minima
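The tabu rule can be sketched as a fixed tabu search over pair swaps. This is a simplified illustration, not the implementation from the talk: it scans all non-tabu pairs at every step, and the function names, the iteration budget and the test graph are assumptions.

```python
import itertools


def cut_size(adj, side):
    return sum(1 for v in adj for u in adj[v] if v < u and side[v] != side[u])


def delta_cut_swap(adj, side, a, b):
    """Cut change for swapping a and b, with gain = internal - external."""
    def gain(v):
        internal = sum(1 for u in adj[v] if side[u] == side[v])
        return 2 * internal - len(adj[v])
    return gain(a) + gain(b) + (2 if b in adj[a] else 0)


def fixed_tabu_search(adj, side, tabu_fraction=0.01, iterations=50):
    side = dict(side)
    tabu_length = max(1, int(tabu_fraction * len(adj)))   # fraction of |V|
    tabu_until = {v: 0 for v in adj}      # last iteration at which v is tabu
    cur_cut = cut_size(adj, side)
    best_side, best_cut = dict(side), cur_cut
    for it in range(1, iterations + 1):
        left = [v for v in adj if side[v] == 0 and tabu_until[v] < it]
        right = [v for v in adj if side[v] == 1 and tabu_until[v] < it]
        if not left or not right:
            continue                      # every candidate is tabu; wait
        # Take the best allowed swap even if it worsens the cut:
        # accepting bad moves is what lets the search leave a local minimum.
        a, b = min(itertools.product(left, right),
                   key=lambda p: delta_cut_swap(adj, side, p[0], p[1]))
        cur_cut += delta_cut_swap(adj, side, a, b)
        side[a], side[b] = 1, 0
        tabu_until[a] = tabu_until[b] = it + tabu_length
        if cur_cut < best_cut:
            best_side, best_cut = dict(side), cur_cut
    return best_side, best_cut


# Hypothetical test graph: two triangles joined by the edge 3-4.
ts_adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
ts_start = {1: 0, 2: 0, 4: 0, 3: 1, 5: 1, 6: 1}

if __name__ == "__main__":
    print(fixed_tabu_search(ts_adj, ts_start, tabu_fraction=0.2))
```

Because every move is a swap, the bisection stays balanced throughout, and the best bisection seen so far is kept separately from the current one.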

Graph Types – Tabu Lengths
- Figure: edge cut vs. tabu fraction for two graphs (the |V| and |E| values are missing)
  - Graph with degree: max 43, min 3, avg. 19.8
  - Graph with degree: max 573, min 1, avg. 6.1
- Vertex degree
  - Denser random graphs tend to prefer smaller tabu lengths, while denser geometric graphs tend to prefer larger tabu lengths 8)
- Distribution of vertex degree
  - Graphs with a uniform distribution of vertex degree tend to have a single best-fitting tabu length

RRTS 7)
- Synthesis of heuristics
  - The heuristics complement each other
- Reactive
  - Tries each tabu length to see which is better
  - Adapts to various graphs
- Best quality
  - Goes beyond local minima
- Long running time

REACTIVE RANDOMIZED TABU SEARCH
  Scoring phase: score each tabu length by small runs of TS
  do I times
    Initial bisection by Min-Max
    do J times
      TS with high-scored tabu length
      Refine by Kernighan-Lin runs

R. Battiti and A. A. Bertossi. Greedy, Prohibition, and Reactive Heuristics for Graph Partitioning. IEEE Transactions on Computers, Vol. 48, April 1999.
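The control flow of the pseudocode can be sketched as a driver that takes the individual heuristics as parameters. Everything here is an assumption made for illustration: the loop nesting is read off the flattened pseudocode, the parameter names are invented, and the stand-in heuristics at the bottom only make the skeleton runnable (they perform no real search).

```python
import random


def cut_size(adj, side):
    return sum(1 for v in adj for u in adj[v] if v < u and side[v] != side[u])


def rrts_skeleton(adj, fractions, init, tabu_search, kl_refine,
                  outer=2, inner=2, seed=0):
    """Scoring phase, then `outer` x `inner` rounds of TS followed by KL."""
    rng = random.Random(seed)
    # Scoring phase: one short TS run per candidate tabu fraction.
    scores = {f: cut_size(adj, tabu_search(adj, init(adj, rng), f))
              for f in fractions}
    best_fraction = min(scores, key=scores.get)
    best = None
    for _ in range(outer):                       # "do I times"
        side = init(adj, rng)                    # initial bisection
        for _ in range(inner):                   # "do J times"
            side = tabu_search(adj, side, best_fraction)
            side = kl_refine(adj, side)
            if best is None or cut_size(adj, side) < cut_size(adj, best):
                best = dict(side)
    return best_fraction, best


# Stand-ins (hypothetical): a random balanced start and no-op refinements.
def random_init(adj, rng):
    order = sorted(adj)
    rng.shuffle(order)
    half = len(order) // 2
    return {v: int(i >= half) for i, v in enumerate(order)}


def no_op_tabu(adj, side, fraction):
    return dict(side)


def no_op_kl(adj, side):
    return dict(side)


rrts_adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}

if __name__ == "__main__":
    print(rrts_skeleton(rrts_adj, [0.01, 0.02, 0.05],
                        random_init, no_op_tabu, no_op_kl))
```

Passing the heuristics in as functions keeps the skeleton independent of any particular Min-Max, TS or KL implementation.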

Multi-level for Large Graphs
- Coarsening phase
  - Coarsen large graphs into smaller ones using a matching scheme
  - Multi-level coarsening
- Bisection phase
  - Bisecting small graphs is usually very fast
- Uncoarsening phase
  - Map the bisection back to the original graph
  - Perform refinement at each uncoarsening level
- METIS 5)12)
- Figure: matching scheme
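One coarsening level can be sketched with a random maximal matching. This is an illustrative stand-in, not METIS's actual matching scheme: the function names are assumptions, and the graph is given as a dict of neighbor-to-weight dicts so that merged edges can accumulate weight.

```python
import random


def coarsen_once(adj, seed=0):
    """Merge matched vertex pairs; returns (coarse_adj, coarse_of)."""
    rng = random.Random(seed)
    order = sorted(adj)
    rng.shuffle(order)
    coarse_of = {}                        # fine vertex -> coarse vertex id
    next_id = 0
    for v in order:
        if v in coarse_of:
            continue
        free = [u for u in adj[v] if u not in coarse_of and u != v]
        coarse_of[v] = next_id
        if free:                          # match v with a random free neighbor
            coarse_of[rng.choice(free)] = next_id
        next_id += 1
    coarse_adj = {c: {} for c in range(next_id)}
    for v in adj:
        for u, w in adj[v].items():
            cv, cu = coarse_of[v], coarse_of[u]
            if cv != cu:                  # edges inside a merged pair vanish
                coarse_adj[cv][cu] = coarse_adj[cv].get(cu, 0) + w
    return coarse_adj, coarse_of


def project(coarse_of, coarse_side):
    """Uncoarsening: each fine vertex inherits the side of its coarse vertex."""
    return {v: coarse_side[c] for v, c in coarse_of.items()}


def weighted_cut(adj, side):
    total = sum(w for v in adj for u, w in adj[v].items() if side[v] != side[u])
    return total // 2                     # each edge was seen from both ends


# Hypothetical test graph: two triangles joined by one edge, unit weights.
ml_adj = {
    1: {2: 1, 3: 1}, 2: {1: 1, 3: 1}, 3: {1: 1, 2: 1, 4: 1},
    4: {3: 1, 5: 1, 6: 1}, 5: {4: 1, 6: 1}, 6: {4: 1, 5: 1},
}

if __name__ == "__main__":
    coarse, mapping = coarsen_once(ml_adj, seed=1)
    print(len(ml_adj), "->", len(coarse))
```

Summing edge weights during merging means a bisection of the coarse graph has the same cut weight as its projection onto the fine graph. This is why refinement during uncoarsening can start from the coarse result.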

Comparison of Heuristics
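Results ("?" marks a value missing from the source)

Graph | METIS cut | METIS time | RRTS100 cut | RRTS100 time | FTS10000 cut | FTS10000 time
G1 | 130 | 0.01 | ? | 168.11 | ? | 1.22
G2 | 366 | 0.07 | 353 | 696.49 | 354 | 13.85
G3 | 311 | 0.10 | ? | 935.56 | 306 | 32.85
G4 | 6337 | 0.04 | 6257 | 353.45 | 6316 | 3.77
G5 | 950 | 0.17 | Timeout (1 hour) | Timeout (1 hour) | 929 | 31.55

Test graphs (columns: Graph, |V|, |E|, Degree Avg / Min / Max, Best Tabu Fraction; some degree and tabu-fraction cells are missing, so the remaining values are listed in source order)

Graph | |V| | |E| | Degree and best tabu fraction values
G1: fe_4elt | 11143 | 32818 | 7.93, 15, 0.02
G2: fe_pwt | 36519 | 144794 | 5.89, 3, 12
G3: fe_body | 45087 | 163734 | 7.26, 28
G4: mem | 17758 | 54196 | 6.10, 1, 573, 0.14
G5: wing | 62032 | 121544 | 3.92, 2, 4, 0.01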

Comparison of Heuristics
- METIS
  - Extremely fast, thanks to the multi-level technique
  - High-quality bisections, but worse than RRTS
  - Multi-level lacks global optimization during the coarsening phase
- RRTS
  - Very slow: the scoring phase is time-consuming
  - "Ever-best" bisections
  - Adapts to many kinds of graphs
- FTS with a known tabu length
  - Much faster than RRTS
  - Results comparable to RRTS

A Naive Parallelization
- Figure: dispatch graphs -> RRTS100 on each node -> synthesize results
- Run RRTS independently on each node
  - Simply equivalent to scaling up the number of iterations
- Generate different seeds for different nodes
  - The heuristics are sensitive to the initial bisection
- 10% to 20% improvement
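The dispatch and synthesize pattern can be sketched as follows. A thread pool stands in for the grid nodes, and a seeded random bisection stands in for RRTS100; both substitutions, and all function names, are assumptions made to keep the example small and runnable.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def cut_size(adj, side):
    return sum(1 for v in adj for u in adj[v] if v < u and side[v] != side[u])


def randomized_bisection(adj, seed):
    """Stand-in for one independent heuristic run on one node."""
    rng = random.Random(seed)             # a different seed per node
    order = sorted(adj)
    rng.shuffle(order)
    half = len(order) // 2
    side = {v: int(i >= half) for i, v in enumerate(order)}
    return cut_size(adj, side), seed, side


def best_of_seeds(adj, seeds, workers=4):
    """Dispatch one run per seed, then keep the smallest cut."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda s: randomized_bisection(adj, s), seeds))
    return min(results, key=lambda r: r[0])


np_adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}

if __name__ == "__main__":
    cut, seed, side = best_of_seeds(np_adj, range(16))
    print("best cut", cut, "from seed", seed)
```

Since the runs share nothing, adding seeds can only keep or improve the best cut, which matches the slide's remark that this is equivalent to scaling up iterations.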

Statistical Properties of Cut-size
- Incidence of bests
  - Average quality is good
  - Only 0.25% of the results are the best
- General properties
  - The distribution becomes more peaked as |V| grows
  - The distribution tends towards Gaussian 8)
  - Mean and variance scale linearly with |V|
- Figure: count vs. edge cut for a graph with degree max 43, min 3, avg 19.80 (the |V| and |E| values are missing); RRTS100 on 400 nodes provided by Grid Challenge Federation

Issues of Parallelizing Heuristics
- Hard with the message-passing model (MPI)
  - J. R. Gilbert and E. Zmijewski 9): A parallel graph partitioning algorithm for a message-passing multiprocessor. International Journal of Parallel Programming
  - Par-METIS (Parallel METIS) only parallelizes the coarsening and uncoarsening part
- Hard to be efficient (statistical property)
  - Even if the heuristic could be parallelized efficiently, the fraction of iterations that reach the best bisection is still small
  - If independent instances are run cooperatively on the Grid:
    - How many nodes are needed to reach the best partition?
    - When is a good stopping threshold reached?

Contribution of Phases
- Initial phase
  - Removes a large portion of the edge cut
  - Good initial partitions lead to good final partitions
  - Running time is consistent across runs, so good initial partitions save time for refinement
- TS and KL phases
  - Their reductions tend to be alike
  - More iterations give better results
- Figure: ΔEdge-Cut and best edge cuts

Results from Same Initial Bisections
- Given the same initial partitions
  - The best initial partitions lead to the best final partitions
  - FTS and KL tend to be deterministic
  - Fewer swaps are available
- The diversity of edge cuts can be cancelled by distributing only one phase
  - Running FTS and KL on one node is enough
- Figure: count of results when FTS and KL are performed on the same initial partitions, 50 nodes

Multi-level Scoring
- Figure: edge cut vs. level-1 tabu fraction, and edge cut vs. level-2 tabu fraction
- Mainly used to adapt to large-scale graphs
  - If |V| = 1000, tabu length = 0.01 x 1000 = 10
  - If |V| = 100000, tabu length = 0.01 x 100000 = 1000
- Tunes the tabu length to fit specific graphs better
  - Level-1 scoring distinguishes graphs by type
  - Level-2 scoring tests for a better tabu length on the specific graph
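A coarse-then-fine search over tabu fractions can be sketched as below. The candidate grids, the step size, and the toy scoring function are assumptions; in the talk the score would come from short tabu search runs on the actual graph.

```python
def tabu_length(fraction, num_vertices):
    """Tabu length as a fraction of |V|, at least 1."""
    return max(1, round(fraction * num_vertices))


def two_level_scoring(score, level1, step, width=3):
    """Pick the best coarse fraction, then rescan a fine grid around it.

    `score` maps a tabu fraction to an edge cut (lower is better).
    """
    best1 = min(level1, key=score)
    level2 = [round(best1 + k * step, 6)
              for k in range(-width, width + 1)
              if best1 + k * step > 0]
    best2 = min(level2, key=score)
    return best1, best2


# Toy score (hypothetical): pretend the best fraction for this graph is 0.012.
def toy_score(fraction):
    return abs(fraction - 0.012)


if __name__ == "__main__":
    print(tabu_length(0.01, 1000), tabu_length(0.01, 100000))
    print(two_level_scoring(toy_score, [0.01, 0.05, 0.1], step=0.001))
```

The level-1 pass only has to separate broad graph types, so its grid can stay small. The level-2 pass spends its runs near the fraction that already looked promising.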

Final Approaches
- Do not use multi-level partitioning
  - To preserve the "best" quality
- Do not parallelize the heuristic itself
  - Not a good trade-off
- Parallelize the scoring phase
  - One group of nodes scores one tabu length
  - Combined with the multi-level scoring technique
- Parallelize the initial phase only
  - Remove the diversity of edge cuts as soon as possible, taking advantage of the distributed runs
  - Reduce the computing effort as much as possible
  - Further refinement can be done on a single node
- Use the GXP cluster shell
  - "mw" command: mw M {{ W }}

Full Picture
- Multi-level scoring
  - High-scored level-1 tabu fraction
  - S: 0.001, S: 0.002, S: 0.003, S: 0.004, S: 0.005, S: 0.006, S: 0.007
  - High-scored level-2 tabu fraction
- Initial phase
  - Multiple Init runs, keeping the best initial partitions
- Refinement phase
  - FTS and KL

Conclusions
- Bisection quality
  - "Ever-best" partitions
  - Edge-Cut(ours) ≤ Edge-Cut(RRTS) ≤ Edge-Cut(METIS)
- Bisection time
  - Comparable and reasonable
  - Time(METIS) < Time(ours) << Time(RRTS)
  - Speedup of 10 compared to RRTS
- Adapted to the Grid environment
  - Scalable performance
  - Convenient usage
  - Good fault tolerance

Thank you for your attention!
