Rusakov A. S. (IPPM RAS), Sheblaev M.

Slides:

Advertisements

Similar presentations

Multilevel Hypergraph Partitioning Daniel Salce Matthew Zobel.

Advertisements

Social network partition Presenter: Xiaofei Cao Partick Berg.

A Graph-Partitioning-Based Approach for Multi-Layer Constrained Via Minimization Yih-Chih Chou and Youn-Long Lin Department of Computer Science, Tsing.

Simulated Evolution Algorithm for Multi- Objective VLSI Netlist Bi-Partitioning Sadiq M. Sait, Aiman El-Maleh, Raslan Al-Abaji King Fahd University of.

CISC October Goals for today: Foster’s parallel algorithm design –Partitioning –Task dependency graph Granularity Concurrency Collective communication.

EE 5301 – VLSI Design Automation I

Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 11: March 4, 2008 Placement (Intro, Constructive)

VLSI Layout Algorithms CSE 6404 A 46 B 65 C 11 D 56 E 23 F 8 H 37 G 19 I 12J 14 K 27 X=(AB*CD)+ (A+D)+(A(B+C)) Y = (A(B+C)+AC+ D+A(BC+D)) Dr. Md. Saidur.

Parallel Simulation etc Roger Curry Presentation on Load Balancing.

Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 9: February 20, 2008 Partitioning (Intro, KLFM)

1 Simulated Evolution Algorithm for Multiobjective VLSI Netlist Bi-Partitioning By Dr Sadiq M. Sait Dr Aiman El-Maleh Raslan Al Abaji King Fahd University.

2004/9/16EE VLSI Design Automation I 85 EE 5301 – VLSI Design Automation I Kia Bazargan University of Minnesota Part III: Partitioning.

Chapter 2 – Netlist and System Partitioning

EDA (CS286.5b) Day 5 Partitioning: Intro + KLFM. Today Partitioning –why important –practical attack –variations and issues.

Local Unidirectional Bias for Smooth Cutsize-delay Tradeoff in Performance-driven Partitioning Andrew B. Kahng and Xu Xu UCSD CSE and ECE Depts. Work supported.

Maximum Flows Lecture 4: Jan 19. Network transmission Given a directed graph G A source node s A sink node t Goal: To send as much information from s.

CS294-6 Reconfigurable Computing Day 13 October 6, 1998 Interconnect Partitioning.

Partitioning 2 Outline Goal Fiduccia-Mattheyses Algorithm Approach

Hard Optimization Problems: Practical Approach DORIT RON Tel Ziskind room #303

1 Circuit Partitioning Presented by Jill. 2 Outline Introduction Cut-size driven circuit partitioning Multi-objective circuit partitioning Our approach.

1 Enhancing Performance of Iterative Heuristics for VLSI Netlist Partitioning Dr. Sadiq M. Sait Dr. Aiman El-Maleh Mr. Raslan Al Abaji. Computer Engineering.

Multilevel Graph Partitioning and Fiduccia-Mattheyses

Partitioning Outline –What is Partitioning –Partitioning Example –Partitioning Theory –Partitioning Algorithms Goal –Understand partitioning problem –Understand.

Multilevel Hypergraph Partitioning G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar Computer Science Department, U of MN Applications in VLSI Domain.

Graph partition in PCB and VLSI physical synthesis Lin Zhong ELEC424, Fall 2010.

A Fast Algorithm for Enumerating Bipartite Perfect Matchings Takeaki Uno (National Institute of Informatics, JAPAN)

Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 4: January 28, 2015 Partitioning (Intro, KLFM)

Graph Partitioning Problem Kernighan and Lin Algorithm

CSE 494: Electronic Design Automation Lecture 4 Partitioning.

March 20, 2007 ISPD An Effective Clustering Algorithm for Mixed-size Placement Jianhua Li, Laleh Behjat, and Jie Huang Jianhua Li, Laleh Behjat,

10/25/ VLSI Physical Design Automation Prof. David Pan Office: ACES Lecture 3. Circuit Partitioning.

CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 6: January 23, 2002 Partitioning (Intro, KLFM)

1 CS612 Algorithms for Electronic Design Automation CS 612 – Lecture 3 Partitioning Mustafa Ozdal Computer Engineering Department, Bilkent University Mustafa.

Speeding Up Enumeration Algorithms with Amortized Analysis Takeaki Uno (National Institute of Informatics, JAPAN)

1 Partitioning. 2 Decomposition of a complex system into smaller subsystems  Done hierarchically  Partitioning done until each subsystem has manageable.

CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 9: February 9, 2004 Partitioning (Intro, KLFM)

1 NTUplace: A Partitioning Based Placement Algorithm for Large-Scale Designs Tung-Chieh Chen 1, Tien-Chang Hsu 1, Zhe-Wei Jiang 1, and Yao-Wen Chang 1,2.

Data Structures and Algorithms in Parallel Computing Lecture 7.

About Me Swaroop Butala  MSCS – graduating in Dec 09  Specialization: Systems and Databases  Interests:  Learning new technologies  Application of.

Simulated Evolution Algorithm for Multi- Objective VLSI Netlist Bi-Partitioning Sadiq M. Sait, Aiman El-Maleh, Raslan Al Abaji King Fahd University of.

ICS 252 Introduction to Computer Design

Outline Motivation and Contributions Related Works ILP Formulation

CprE566 / Fall 06 / Prepared by Chris ChuPartitioning1 CprE566 Partitioning.

Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 6: January 30, 2013 Partitioning (Intro, KLFM)

Example Apply hierarchical clustering with d min to below data where c=3. Nearest neighbor clustering d min d max will form elongated clusters!

Hypergraph Partitioning With Fixed Vertices Andrew E. Caldwell, Andrew B. Kahng and Igor L. Markov UCLA Computer Science Department

Multilevel Partitioning

James Hipp Senior, Clemson University.  Graph Representation G = (V, E) V = Set of Vertices E = Set of Edges  Adjacency Matrix  No Self-Inclusion (i.

1 Simulated Evolution Algorithm for Multi- Objective VLSI Netlist Bi-Partitioning Sadiq M. Sait,, Aiman El-Maleh, Raslan Al Abaji King Fahd University.

1 مرتضي صاحب الزماني Partitioning. 2 First Project Synthesis by Design Compiler Physical design by SoC Encounter –Use tutorials in \\fileserver\common\szamani\EDA.

3/21/ VLSI Physical Design Automation Prof. David Pan Office: ACES Lecture 4. Circuit Partitioning (II)

Partitioning Jong-Wha Chong Wireless Location and SOC Lab. Hanyang University.

High Performance Computing Seminar

A Continuous Optimization Approach to the Minimum Bisection Problem

VLSI Quadratic Placement

CS137: Electronic Design Automation

Delay Optimization using SOP Balancing

Andrew B. Kahng and Xu Xu UCSD CSE and ECE Depts.

Michael Chu, Kevin Fan, Scott Mahlke

A Linear-Time Heuristic for Improving Network Partitions

Chapter 2 – Netlist and System Partitioning

Plan Introduction to multilevel heuristics Rich partitioning problems

Haim Kaplan and Uri Zwick

EE5780 Advanced VLSI Computer-Aided Design

A Parallelization of State-of-the-Art Graph Bisection Algorithms

Delay Optimization using SOP Balancing

ESE535: Electronic Design Automation

Fast Min-Register Retiming Through Binary Max-Flow

An Improved Two-Way Partitioning Algorithm with Stable Performance

A Linear-Time Heuristic for Improving Network Partitions

Presentation transcript:

Rusakov A. S. (IPPM RAS), Sheblaev M. A modification of the Fiduccia-Mattheyses hypergraph bipartitioning algorithm Rusakov A. S. (IPPM RAS), Sheblaev M.

Motivation Let H(V, E) - hypergraph with V — nodes, Е – hyperedges, are edge and node weights. The problem of the balanced hypergraph bipartitioning is to find sets of nodes A and B to minimize sum of edges weights belong to cut : Balance constrain must be satisfied:

Motivation Graph and hypergraph partitioning — wide application area. Spectral method, simulated annealing, genetic algorithms Iterative improvement. KL method. FM algorithm. State of the art: multilevel iterative improvement algorithms. Metis, Hmetis, Scotch, MLPART, PaToH… FM algorithm does not perform well on graphs with largely varied node weights w(vi) or multiple node weight for node. We address FM algorithm weakness as part of multilevel bipartition algorithm.

Multilevel Partitioning Clustering Refinement

Fiduccia-Mattheyses pass. FM algorithm: One FM pass: given inital bipartition of the hypergraph G(v,e) Initialize node gains, bucket. while ( !bucket.empty() ) { take best gain node satisfying balance constrain vi Move vi Lock vi, remove vi from bucket update gains in the vi neighbourhood } Choose best solution seen in the pass. «Gain bucket» data structure is used. It allows to update bucket and take best node at O(1) if max gain move is balanced. One FM pass takes O(n)

Moves are made based on object gain. FM Partitioning: Moves are made based on object gain. Object Gain: The amount of change in cut crossings that will occur if an object is moved from its current partition into the other partition -1 2 - each object is assigned a gain - objects are put into a sorted gain list - the object with the highest gain from the larger of the two sides is selected and moved. - the moved object is "locked" - gains of "touched" objects are recomputed - gain lists are resorted -1 -2 -2 -1 1 -1 1

FM Partitioning: -1 2 -1 -2 -2 -1 1 -1 1

-1 -2 -2 -2 -1 -2 -2 -1 1 -1 1

-1 -2 -2 -2 -1 -2 -2 -1 1 1 -1

-1 -2 -2 -2 -1 -2 -2 -2 1 -1 -1 -1

-1 -2 -2 -2 1 -2 -2 -2 -2 1 -1 -1 -1

-1 -2 -2 -2 1 -2 -2 -2 -2 1 -1 -1 -1

-1 -2 -2 1 -2 -2 -1 -2 -2 -2 -3 -1 -1

-1 -2 -2 -1 -2 -2 -2 -1 -2 -2 -2 -3 -1 -1

Cut During One Pass (Bipartitioning) Moves

Gain Bucket Data Structure +pmax Max Gain Cell # Cell # -pmax 1 2 n

FM with largely varied weights Classic bucket *cell = max_gain_cell while( cell != NULL) { if ( IsBalancedMove(cell) return cell; Remove cell from bucket(cell) cell = cell->next } return NULL Fast. O(1) Bucket with restart *cell = max_gain_cell while( cell != NULL) { if ( IsBalancedMove(cell) return cell; cell = cell->next } return cell; Gives much better cut size

Gain Bucket Data Structure +pmax Max Gain Cell # Cell # -pmax 1 2 n

FM with largely varied weights Update of the gain cell. Reset UpdateGainCell(new gain) { ... If(new_gain > first_balanced_gain) first_balance = NULL … } Our modification of bucket. Introduce a new pointer to first balanced cell: GetBestMove() { *cell = max_gain_cell If (first_balanced) cell = first_balanced; while( cell != NULL) { if ( IsBalancedMove(cell) { first_balanced = cell->next; return cell; } cell = cell->next return NULL Starts search from the “good” cell. Good tradeoff of runtime and cut size

FLAT FM results. 2% imbalance bucket with restart Classic bucket new bucket TestName Cut Time Quality cut time ibm01: 303 1 1,00 328 0,92 393 2 0,77 ibm02 446 4 0,84 768 0,49 373 3 ibm03: 1734 0,85 2513 0,59 1558 0,95 ibm03 1042 7 0,94 1423 0,69 1093 5 0,90 ibm05: 2010 0,93 3201 1878 ibm06: 1076 9 0,96 2752 0,38 1173 0,88 ibm07: 1505 16 0,89 1574 1332 12 ibm08: 1937 35 5242 10 0,37 2491 17 0,78 ibm09: 1246 20 3806 0,33 1457 15 0,86 ibm10: 2638 49 4204 0,53 2230 24 ibm11: 2612 32 0,82 5757 18 2136 22 ibm12: 3722 73 8350 0,45 4741 30 0,79 ibm13 2188 39 0,76 3042 0,54 2036 28 0,81 ibm14: 3855 69 7528 43 0,42 4561 54 0,70 ibm15: 4948 168 0,87 9542 51 5246 75 ibm16: 3683 179 10828 57 0,34 4356 82 ibm17: 3285 117 13236 60 0,25 5434 79 0,60 ibm18: 3345 191 9522 63 0,35 3758 102 Score 1020,00 16,52 398,00 8,89 565,00 15,58

Application to Multilevel Bipartitioning runs with 5 clusterization scheme (FC, MHEC, HEM, HEC, PinHEM). Choose best V cycle

Multilevel Results Bucket with restarts New bucket Classic bucket TestName Cut Time Score ibm01: 219 4 1,00 245 0,89 336 3 0,65 ibm02 269 5 0,99 266 ibm03: 751 10 791 8 0,95 911 7 0,82 ibm03 535 9 0,96 515 513 ibm05: 1733 11 1732 12 1725 ibm06: 585 0,98 788 0,66 ibm07: 802 22 833 17 820 15 ibm08: 1190 32 1187 16 ibm09: 533 18 ibm10: 1119 39 0,97 1084 1216 ibm11: 874 34 868 26 844 27 ibm12: 2195 64 2185 33 2084 ibm13 1073 62 1111 43 0,92 1024 ibm14: 1938 100 1955 77 1960 74 ibm15: 2567 200 2653 92 2709 93 ibm16: 1788 151 1805 1868 104 ibm17: 2296 150 2362 158 2648 0,87 ibm18: 1943 49 2232 40 2057 23 0,94 971,00 17,67 677,00 17,40 589,00 16,71 Achieve almost the same cut size with 30% speeding up

Applying LIFO* [*] Implemented LIFO*. Evaluated [*] on our benchmarks set. Pure LIFO* did not show improvement. Combination of LIFO and LIFO* passes improved cut and runtime. Each 3th pass uses LIFO*. [*] - Yourim Yoon, Yong-Hyuk Kim, New Bucket Managements in Iterative Improvement Partitioning Algorithms test case LIFO LIFO* LIFO+LIFO* ibm01 262 252 3,97% 245 6,49% ibm02 269 346 -22,25% 0,00% ibm03 826 850 -2,82% 822 0,48% ibm04 537 539 -0,37% ibm05 1731 1726 0,29% ibm06 596 718 -16,99% ibm07 820 ibm08 1190 ibm09 535 ibm10 1088 1011 7,62% ibm11 877 6,95% ibm12 2185 2193 -0,36% ibm13 1170 1175 -0,43% 1166 0,34% ibm14 1955 1969 -0,71% ibm15 2488 2639 -5,72% ibm16 1805 1824 -1,04% 1789 0,89% ibm17 2378 2403 ibm18 1959 1931 1,45% -31,47% 8,20%

Combination of LIFO and LIFO* Results on ISPD98 hypergraph benchmarks. Multilevel bipartition. LIFO* on each 3th pass test case LIFO LIFO* LIFO+LIFO* ibm01 262 252 3,97% 245 6,49% ibm02 269 346 -22,25% 0,00% ibm03 826 850 -2,82% 822 0,48% ibm04 537 539 -0,37% ibm05 1731 1726 0,29% ibm06 596 718 -16,99% ibm07 820 ibm08 1190 ibm09 535 ibm10 1088 1011 7,62% ibm11 877 6,95% ibm12 2185 2193 -0,36% ibm13 1170 1175 -0,43% 1166 0,34% ibm14 1955 1969 -0,71% ibm15 2488 2639 -5,72% ibm16 1805 1824 -1,04% 1789 0,89% ibm17 2378 2403 ibm18 1959 1931 1,45% -31,47% 8,20%

Conclusion Classic FM performs suboptimal on graps/hypergraphs with varied node weights. Bucket with restarts provides best cut value. No crucial runtime penalty in multilevel FM. Propose new algorithm modification. Good quality/runtime tradeoff