EE 5301 – VLSI Design Automation I, Kia Bazargan, University of Minnesota. Part III: Partitioning (2004/9/16).



References and Copyright
Textbooks referred (none required)
  - [Mic94] G. De Micheli, "Synthesis and Optimization of Digital Circuits", McGraw-Hill
  - [CLR90] T. H. Cormen, C. E. Leiserson, R. L. Rivest, "Introduction to Algorithms", MIT Press
  - [Sar96] M. Sarrafzadeh, C. K. Wong, "An Introduction to VLSI Physical Design", McGraw-Hill
  - [She99] N. Sherwani, "Algorithms For VLSI Physical Design Automation", Kluwer Academic Publishers, 3rd edition, 1999

References and Copyright (cont.)
Slides used (modified by Kia when necessary):
  - [©Sarrafzadeh] © Majid Sarrafzadeh, 2001; Department of Computer Science, UCLA
  - [©Sherwani] © Naveed A. Sherwani, 1992 (companion slides to [She99])
  - [©Keutzer] © Kurt Keutzer, Dept. of EECS, UC Berkeley
  - [©Gupta] © Rajesh Gupta, UC Irvine
  - [©Kang] © Steve Kang, UIUC

Partitioning [©Sherwani]
Decomposition of a complex system into smaller subsystems
  - Done hierarchically
  - Partitioning done until each subsystem has manageable size
  - Each subsystem can be designed independently
Interconnections between partitions minimized
  - Less hassle interfacing the subsystems
  - Communication between subsystems usually costly

Example: Partitioning of a Circuit [©Sherwani]
[Figure: a circuit with input size 48 split into three partitions; Size 1 = 15, Size 2 = 16, Size 3 = 17; Cut 1 = 4, Cut 2 = 4]

Hierarchical Partitioning [©Sherwani]
Levels of partitioning:
  - System-level partitioning: each sub-system can be designed as a single PCB
  - Board-level partitioning: the circuit assigned to a PCB is partitioned into sub-circuits, each fabricated as a VLSI chip
  - Chip-level partitioning: the circuit assigned to the chip is divided into manageable sub-circuits
    NOTE: physically not necessary

Delay at Different Levels of Partitions [©Sherwani]
[Figure: components A, B, C and D placed on boards PCB1 and PCB2, with relative interconnect delays x, 10x and 20x]

Partitioning: Formal Definition
Input:
  - Graph or hypergraph
  - Usually with vertex weights
  - Usually with weighted edges
Constraints
  - Number of partitions (K-way partitioning)
  - Maximum capacity of each partition, OR maximum allowable difference between partitions
Objective
  - Assign nodes to partitions, subject to the constraints, such that the cutsize is minimized
Tractability
  - The problem is NP-complete
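To make the objective and the capacity constraint concrete, here is a minimal sketch in Python. The function names (`cutsize`, `is_feasible`), the net list and the unit weights are illustrative assumptions, not part of the slides. A net counts as cut when its pins fall in more than one partition.

```python
from typing import Dict, Hashable, Iterable, List


def cutsize(nets: List[Iterable[Hashable]], part: Dict[Hashable, int]) -> int:
    """Number of nets (hyperedges) whose pins lie in more than one partition."""
    return sum(1 for net in nets if len({part[v] for v in net}) > 1)


def is_feasible(part: Dict[Hashable, int], weight: Dict[Hashable, int],
                k: int, capacity: int) -> bool:
    """Check the 'maximum capacity of each partition' constraint for K = k."""
    loads = [0] * k
    for v, p in part.items():
        loads[p] += weight[v]
    return all(load <= capacity for load in loads)


# Hypothetical 4-vertex hypergraph with one 3-pin net.
nets = [("a", "b"), ("b", "c", "d"), ("c", "d")]
part = {"a": 0, "b": 0, "c": 1, "d": 1}
weight = {v: 1 for v in part}

print(cutsize(nets, part))               # 1 (only the 3-pin net is cut)
print(is_feasible(part, weight, 2, 2))   # True
```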

Kernighan-Lin (KL) Algorithm
  - Works on non-weighted graphs
  - An iterative improvement technique
  - A two-way (bisection) partitioning algorithm
  - The partitions must be balanced (of equal size)
  - Iterate as long as the cutsize improves:
      o Find a pair of vertices that results in the largest decrease in cutsize if exchanged
      o Exchange the two vertices (potential move)
      o "Lock" the vertices
      o If no improvement is possible and some vertices are still unlocked, exchange the vertices that result in the smallest increase in cutsize
B. W. Kernighan and S. Lin, Bell System Technical Journal, 1970.

Kernighan-Lin (KL) Algorithm
Initialize
  - Bipartition G into $V_1$ and $V_2$, s.t. $|V_1| = |V_2| \pm 1$
  - $n = |V|$
Repeat
  - for $i = 1$ to $n/2$
      o Find a pair of unlocked vertices $v_{ai} \in V_1$ and $v_{bi} \in V_2$ whose exchange makes the largest decrease or smallest increase in cut-cost
      o Mark $v_{ai}$ and $v_{bi}$ as locked
      o Store the gain $g_i$
  - Find $k$ s.t. $Gain_k = \sum_{i=1}^{k} g_i$ is maximized
  - If $Gain_k > 0$ then move $v_{a1}, \dots, v_{ak}$ from $V_1$ to $V_2$ and $v_{b1}, \dots, v_{bk}$ from $V_2$ to $V_1$
Until $Gain_k \le 0$

Kernighan-Lin (KL) Example [©Sarrafzadeh]
[Figure: graph on vertices a, b, c, d, e, f, g, h]

Step No.   Vertex Pair   Gain   Cut-cost
1          { d, g }       3      2
2          { c, f }       1      1
3          { b, h }      -2      3
4          { a, e }
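The "find k" step of a KL pass is a best-prefix-sum search over the stored gains. The sketch below applies it to the gains 3, 1 and -2 shown in the example for the pairs { d, g }, { c, f } and { b, h }; the helper name `best_prefix` is an assumption made for illustration.

```python
from itertools import accumulate


def best_prefix(gains):
    """Return (k, Gain_k) maximizing the running sum of gains.

    k = 0 means no prefix has positive total gain, so nothing is swapped.
    """
    best_k, best_gain = 0, 0
    for k, total in enumerate(accumulate(gains), start=1):
        if total > best_gain:
            best_k, best_gain = k, total
    return best_k, best_gain


print(best_prefix([3, 1, -2]))  # (2, 4): keep the first two swaps
```

Keeping only the first two swaps lowers the cut-cost by 4 in total, which matches the cut-cost of 1 listed after step 2.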

Kernighan-Lin (KL): Analysis
  - Add "dummy" nodes
  - Replace a vertex of weight $w$ with $w$ vertices of size 1
Time complexity?
  - Inner (for) loop
      o Iterates $n/2$ times
      o Iteration 1: $(n/2) \times (n/2)$
      o Iteration $i$: $(n/2 - i + 1)^2$
  - Passes? Usually independent of $n$
  - $O(n^3)$
Drawbacks?
  - Local optimum
  - Balanced partitions only
  - No weight for the vertices
  - High time complexity
  - Only on edges, not hyper-edges
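The $O(n^3)$ bound follows from summing the per-iteration pair counts over one pass. A short worked version, assuming $n$ is even and writing $m = n/2$ (the symbol $m$ is introduced here only for the derivation):

```latex
\sum_{i=1}^{n/2} \left(\frac{n}{2} - i + 1\right)^2
  = \sum_{j=1}^{m} j^2
  = \frac{m(m+1)(2m+1)}{6}
  = \frac{\frac{n}{2}\left(\frac{n}{2}+1\right)(n+1)}{6}
  = O(n^3)
```

With a number of passes that does not grow with $n$, the overall cost stays $O(n^3)$.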

Fiduccia-Mattheyses (FM) Algorithm
  - Modified version of KL
  - A single vertex is moved across the cut in a single move
      o Unbalanced partitions
  - Vertices are weighted
  - Concept of cutsize extended to hypergraphs
  - Special data structure to improve time complexity to $O(n^2)$
      o (Main feature)
  - Can be extended to multi-way partitioning
C. M. Fiduccia and R. M. Mattheyses, 19th DAC, 1982.

The FM Algorithm: Data Structure [©Sherwani]
[Figure: one bucket array per partition (1st partition, 2nd partition), each indexed from -pmax to +pmax and pointing to lists of free vertices such as $v_{a1}, v_{a2}$ and $v_{b1}, v_{b2}$; plus a vertex array with entries 1, 2, ..., n]

The FM Algorithm: Data Structure
Pmax
  - Maximum gain
  - $p_{max} = d_{max} \cdot w_{max}$, where $d_{max}$ is the maximum degree of a vertex (number of edges incident to it) and $w_{max}$ is the maximum edge weight
  - What does it mean intuitively?
-Pmax .. Pmax array
  - Index $i$ is a pointer to the list of unlocked vertices with gain $i$
Limit on size of partition
  - A maximum defined for the sum of vertex weights in a partition (alternatively, the maximum ratio of partition sizes might be defined)
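The bucket array can be sketched in a few lines of Python. This is an illustration under assumptions, not the slides' implementation: the class name `GainBuckets` and its methods are hypothetical, gains are assumed to be integers, and Python sets stand in for the linked lists of free vertices.

```python
class GainBuckets:
    """Buckets indexed by gain in [-pmax, +pmax] for the free vertices of one partition."""

    def __init__(self, pmax):
        self.pmax = pmax
        self.buckets = [set() for _ in range(2 * pmax + 1)]
        self.gain = {}            # vertex -> current gain
        self.max_gain = -pmax     # upper bound on the highest non-empty bucket

    def insert(self, v, g):
        self.buckets[g + self.pmax].add(v)
        self.gain[v] = g
        self.max_gain = max(self.max_gain, g)

    def remove(self, v):
        g = self.gain.pop(v)
        self.buckets[g + self.pmax].discard(v)

    def update(self, v, delta):
        """Move v to the bucket for its new gain (constant time on average)."""
        g = self.gain[v] + delta
        self.remove(v)
        self.insert(v, g)

    def pop_max(self):
        """Remove and return (vertex, gain) with the highest gain, or None if empty."""
        while self.max_gain > -self.pmax and not self.buckets[self.max_gain + self.pmax]:
            self.max_gain -= 1
        bucket = self.buckets[self.max_gain + self.pmax]
        if not bucket:
            return None
        v = bucket.pop()
        del self.gain[v]
        return v, self.max_gain


b = GainBuckets(pmax=3)
b.insert("a", 1)
b.insert("b", -2)
b.insert("c", 3)
b.update("c", -3)      # gain of c drops from 3 to 0
print(b.pop_max())     # ('a', 1)
```

Because every gain lies in [-pmax, +pmax], inserting, removing and updating a vertex never requires sorting, which is the point of the structure.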

The FM Algorithm
Initialize
  - Start with a balanced partition A, B of G (can be done by sorting vertex weights in decreasing order and placing them in A and B alternately)
Iterations
  - Similar to KL
  - A vertex cannot move if the move violates the balance condition
  - Choosing the node to move: pick the max gain in the partitions
  - Moves are tentative (similar to KL)
  - When no moves are possible or no more unlocked vertices are available, the pass ends
  - When no move can be made in a pass, the algorithm terminates
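The initialization heuristic and the balance check can be sketched as follows. The function names, the vertex weights and the `max_load` parameter are assumptions chosen for illustration.

```python
def initial_partition(weights):
    """Sort vertices by decreasing weight and deal them alternately into A and B."""
    order = sorted(weights, key=weights.get, reverse=True)
    return order[0::2], order[1::2]


def can_move(v, dst_load, weights, max_load):
    """A vertex may move only if the destination stays within its weight limit."""
    return dst_load + weights[v] <= max_load


weights = {"u": 4, "v": 3, "w": 2, "x": 1}   # hypothetical vertex weights
A, B = initial_partition(weights)
print(A, B)                                  # ['u', 'w'] ['v', 'x']

load_b = sum(weights[v] for v in B)          # 4
print(can_move("w", load_b, weights, max_load=6))   # True  (4 + 2 <= 6)
print(can_move("u", load_b, weights, max_load=6))   # False (4 + 4 > 6)
```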

Why Hyperedges? [©Kang]
  - For multi-terminal nets, K-L may decompose them into many 2-terminal nets, but this is not efficient!
  - Consider this example:
    [Figure: a six-cell circuit with multi-terminal nets m, q, k, p, and its graph model with each net expanded into 2-terminal edges]
  - If A = {1, 2, 3} and B = {4, 5, 6}, the graph model shows cutsize = 4, but in the real circuit only 3 wires are cut
  - Reducing the number of nets cut is more realistic than reducing the number of edges cut

Hyperedge to Edge Conversion [©Keutzer]
A hyperedge can be converted to a "clique". What edge weight $w$?
  - $w = 2/(n-1)$ has been used, also $w = 2/n$
  - Best: $w = 4/(n^2 - \mathrm{mod}(n,2))$; for $n = 3$, $w = 4/(9-1) = 0.5$
Always necessary to convert hyper-edge to edge?
[Figure: a 3-pin net modeled as a clique with edge weights w; "real" cut = 1, "net" cut = 2w]
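A small sketch comparing the three weighting schemes. The helper names are assumptions; `clique_cut` counts the clique edges that cross the cut when `k` of the `n` pins sit on one side, which is `k * (n - k)` edges of weight `w`.

```python
def clique_weights(n):
    """Edge weight assigned to each clique edge of an n-pin net, per scheme."""
    return {
        "2/(n-1)": 2 / (n - 1),
        "2/n": 2 / n,
        "4/(n^2 - mod(n,2))": 4 / (n * n - n % 2),
    }


def clique_cut(n, k, w):
    """Total weight of clique edges cut when k of the n pins are on one side."""
    return k * (n - k) * w


w3 = clique_weights(3)
print(w3["4/(n^2 - mod(n,2))"])     # 0.5
print(clique_cut(3, 1, 0.5))        # 1.0, i.e. the 2w of the figure equals the real cut of 1
```

The third scheme scales the weights so that the most balanced split of a net costs exactly 1, since the largest value of `k * (n - k)` is `(n^2 - mod(n,2)) / 4`.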

Gain Calculation [©Kang]
External cost, internal cost
[Figure: vertices $a_1, a_2, \dots, a_i, \dots, a_n$ in group $G_A$ and vertices $b_1, b_2, \dots, b_j, \dots$ in group $G_B$, with edges inside and between the groups]

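Gain Calculation (cont.) [©Kang]
Lemma: Consider any $a_i \in A$, $b_j \in B$. If $a_i$, $b_j$ are interchanged, the gain is
  $g = D_{a_i} + D_{b_j} - 2C_{a_i b_j}$
where $D_v = E_v - I_v$ is the external cost minus the internal cost of vertex $v$.
Proof: compare the total cost $T$ between A and B before the interchange with the total cost $T'$ between A and B after the interchange; the gain is therefore $T - T'$.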
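The equations of the proof did not survive in the transcript. The following is the standard Kernighan-Lin argument, written out as a sketch; the symbol $z$ (the cost of all A-B connections that touch neither $a_i$ nor $b_j$) is notation assumed here.

```latex
\begin{aligned}
T  &= z + E_{a_i} + E_{b_j} - C_{a_i b_j}
     && \text{(the edge } a_i b_j \text{ is counted in both } E_{a_i} \text{ and } E_{b_j}\text{)} \\
T' &= z + I_{a_i} + I_{b_j} + C_{a_i b_j}
     && \text{(former internal edges now cross; } a_i b_j \text{ still crosses)} \\
g  &= T - T' = (E_{a_i} - I_{a_i}) + (E_{b_j} - I_{b_j}) - 2C_{a_i b_j}
    = D_{a_i} + D_{b_j} - 2C_{a_i b_j}
\end{aligned}
```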

Gain Calculation (cont.) [©Kang]
Lemma:
  - Let $D_x'$, $D_y'$ be the new D values for elements of $A - \{a_i\}$ and $B - \{b_j\}$. Then after interchanging $a_i$ and $b_j$,
    $D_x' = D_x + 2C_{x a_i} - 2C_{x b_j}$, for $x \in A - \{a_i\}$
    $D_y' = D_y + 2C_{y b_j} - 2C_{y a_i}$, for $y \in B - \{b_j\}$
Proof:
  - The edge $x$-$a_i$ changed from internal in $D_x$ to external in $D_x'$
  - The edge $y$-$b_j$ changed from internal in $D_y$ to external in $D_y'$
  - The $x$-$b_j$ edge changed from external to internal
  - The $y$-$a_i$ edge changed from external to internal
More clarification in the next two slides

Clarification of the Lemma
[Figure: a vertex $x$ with edges to $a_i$ and $b_j$]

Clarification of the Lemma (cont.)
Decompose $I_x$ and $E_x$ to separate the edges from $a_i$ and $b_j$. Write the equations before the move, and then after the move.
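The slide's equations are missing from the transcript. A sketch of the decomposition for a vertex $x \in A - \{a_i\}$ follows; $\hat{I}_x$ and $\hat{E}_x$ (the internal and external costs of $x$ excluding its edges to $a_i$ and $b_j$) are notation assumed here.

```latex
\begin{aligned}
\text{Before the move:}\quad
  I_x &= \hat{I}_x + C_{x a_i}, \qquad E_x = \hat{E}_x + C_{x b_j}, \\
  D_x &= E_x - I_x = \hat{E}_x - \hat{I}_x + C_{x b_j} - C_{x a_i} \\
\text{After the move:}\quad
  I_x' &= \hat{I}_x + C_{x b_j}, \qquad E_x' = \hat{E}_x + C_{x a_i}, \\
  D_x' &= E_x' - I_x' = \hat{E}_x - \hat{I}_x + C_{x a_i} - C_{x b_j}
        = D_x + 2C_{x a_i} - 2C_{x b_j}
\end{aligned}
```

The case $y \in B - \{b_j\}$ is symmetric and gives $D_y' = D_y + 2C_{y b_j} - 2C_{y a_i}$.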

FM Gain Calculation: Direct Hyperedge Calc [©Keutzer]
FM is able to calculate gain directly using hyperedges (so it is not necessary to convert hyperedges to edges)
Definition:
  - Given a partition (A|B), we define the terminal distribution of net n as an ordered pair of integers (A(n), B(n)), which represents the number of cells net n has in blocks A and B respectively (how fast can it be computed?)
  - A net is critical if there exists a cell on it such that, if it were moved, it would change the net's cut state (whether it is cut or not)
  - A net is critical if A(n) = 0 or 1, or B(n) = 0 or 1
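A minimal sketch of the terminal distribution and the criticality test. The function names and the sample nets are assumptions; blocks are labeled with the strings "A" and "B".

```python
def terminal_distribution(net, side):
    """Return (A(n), B(n)): how many cells of the net lie in block A and in block B."""
    a = sum(1 for cell in net if side[cell] == "A")
    return a, len(net) - a


def is_critical(net, side):
    """A net is critical if A(n) is 0 or 1, or B(n) is 0 or 1."""
    a, b = terminal_distribution(net, side)
    return a in (0, 1) or b in (0, 1)


side = {"c1": "A", "c2": "A", "c3": "B", "c4": "B", "c5": "B"}
print(terminal_distribution(("c1", "c2", "c3", "c4"), side))  # (2, 2)
print(is_critical(("c1", "c2", "c3", "c4"), side))            # False
print(is_critical(("c1", "c3", "c4"), side))                  # True: moving c1 uncuts the net
```

Computing the distribution this way takes time linear in the number of pins of the net; an implementation would typically keep the two counts per net and update them incrementally on each move.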

FM Gain Calc: Direct Hyperedge Calc (cont.) [©Keutzer]
Gain of a cell depends only on its critical nets:
  - If a net is not critical, its cut state cannot be affected by the move
  - A net which is not critical either before or after a move cannot influence the gains of its cells
Let F be the "from" partition of cell i and T the "to" partition:
  g(i) = FS(i) - TE(i), where:
  - FS(i) = number of nets which have cell i as their only F cell
  - TE(i) = number of nets which contain i and have an empty T side
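The formula g(i) = FS(i) - TE(i) translates directly into code. The sketch below recomputes the counts from scratch for clarity; the function name `fm_gain` and the sample netlist are assumptions.

```python
def fm_gain(cell, nets, side):
    """g(i) = FS(i) - TE(i) for moving `cell` from its block F to the other block T."""
    f = side[cell]
    fs = te = 0
    for net in nets:
        if cell not in net:
            continue
        from_count = sum(1 for c in net if side[c] == f)
        to_count = len(net) - from_count
        if from_count == 1:   # cell is the net's only F cell: the move uncuts the net
            fs += 1
        if to_count == 0:     # T side is empty: the move cuts the net
            te += 1
    return fs - te


nets = [("a", "b"), ("a", "c", "d"), ("a", "e")]
side = {"a": "A", "b": "B", "c": "A", "d": "B", "e": "A"}

print(fm_gain("a", nets, side))   # 0  (uncuts net 1, cuts net 3)
print(fm_gain("b", nets, side))   # 1  (uncuts net 1)
print(fm_gain("e", nets, side))   # -1 (cuts net 3)
```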

Example: KL [©Kang]
Step 1 - Initialization (initial partition)
  A = {2, 3, 4}, B = {1, 5, 6}
  A' = A = {2, 3, 4}, B' = B = {1, 5, 6}
Step 2 - Compute D values
  $D_1 = E_1 - I_1 = 1 - 0 = +1$
  $D_2 = E_2 - I_2 = 1 - 2 = -1$
  $D_3 = E_3 - I_3 = 0 - 1 = -1$
  $D_4 = E_4 - I_4 = 2 - 1 = +1$
  $D_5 = E_5 - I_5 = 1 - 1 = 0$
  $D_6 = E_6 - I_6 = 1 - 1 = 0$
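The D values can be checked mechanically. The graph figure is not in the transcript, so the edge list below is an assumption: it is the unit-weight graph inferred from the E, I and C values quoted in this example.

```python
# Edge list inferred from the example's E/I/C values (assumption: unit edge weights).
edges = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6), (5, 6)]
A, B = {2, 3, 4}, {1, 5, 6}


def d_values(edges, A, B):
    """D_v = E_v - I_v: external minus internal edge count for every vertex."""
    D = {v: 0 for v in sorted(A | B)}
    for u, v in edges:
        crossing = (u in A) != (v in A)
        for x in (u, v):
            D[x] += 1 if crossing else -1
    return D


print(d_values(edges, A, B))
# {1: 1, 2: -1, 3: -1, 4: 1, 5: 0, 6: 0}
```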

Example: KL (cont.) [©Kang]
Step 3 - Compute gains
  $g_{21} = D_2 + D_1 - 2C_{21} = (-1) + (+1) - 2(1) = -2$
  $g_{25} = D_2 + D_5 - 2C_{25} = (-1) + (0) - 2(0) = -1$
  $g_{26} = D_2 + D_6 - 2C_{26} = (-1) + (0) - 2(0) = -1$
  $g_{31} = D_3 + D_1 - 2C_{31} = (-1) + (+1) - 2(0) = 0$
  $g_{35} = D_3 + D_5 - 2C_{35} = (-1) + (0) - 2(0) = -1$
  $g_{36} = D_3 + D_6 - 2C_{36} = (-1) + (0) - 2(0) = -1$
  $g_{41} = D_4 + D_1 - 2C_{41} = (+1) + (+1) - 2(0) = +2$
  $g_{45} = D_4 + D_5 - 2C_{45} = (+1) + (0) - 2(+1) = -1$
  $g_{46} = D_4 + D_6 - 2C_{46} = (+1) + (0) - 2(+1) = -1$
The largest g value is $g_{41} = +2$, so interchange 4 and 1: $(a_1, b_1) = (4, 1)$
  A' = A' - {4} = {2, 3}
  B' = B' - {1} = {5, 6}
  Both are not empty

Example: KL (cont.) [©Kang]
Step 4 - Update D values of nodes connected to vertices (4, 1)
  $D_2' = D_2 + 2C_{24} - 2C_{21} = (-1) + 2(+1) - 2(+1) = -1$
  $D_5' = D_5 + 2C_{51} - 2C_{54} = (0) + 2(0) - 2(+1) = -2$
  $D_6' = D_6 + 2C_{61} - 2C_{64} = (0) + 2(0) - 2(+1) = -2$
Assign $D_i = D_i'$, repeat step 3:
  $g_{25} = D_2 + D_5 - 2C_{25} = (-1) + (-2) - 2(0) = -3$
  $g_{26} = D_2 + D_6 - 2C_{26} = (-1) + (-2) - 2(0) = -3$
  $g_{35} = D_3 + D_5 - 2C_{35} = (-1) + (-2) - 2(0) = -3$
  $g_{36} = D_3 + D_6 - 2C_{36} = (-1) + (-2) - 2(0) = -3$
All values are equal; arbitrarily choose $g_{36} = -3$, so $(a_2, b_2) = (3, 6)$
  A' = A' - {3} = {2}, B' = B' - {6} = {5}
New D values are:
  $D_2' = D_2 + 2C_{23} - 2C_{26} = (-1) + 2(1) - 2(0) = +1$
  $D_5' = D_5 + 2C_{56} - 2C_{53} = (-2) + 2(1) - 2(0) = 0$
New gain with $D_2 \leftarrow D_2'$, $D_5 \leftarrow D_5'$:
  $g_{25} = D_2 + D_5 - 2C_{52} = (+1) + (0) - 2(0) = +1$, so $(a_3, b_3) = (2, 5)$

Example: KL (cont.)
Step 5 - Determine the number of moves to take
  $g_1 = +2$
  $g_1 + g_2 = 2 - 3 = -1$
  $g_1 + g_2 + g_3 = 2 - 3 + 1 = 0$
The value of k for max G is 1
  X = {$a_1$} = {4}, Y = {$b_1$} = {1}
Move X to B, Y to A: A = {1, 2, 3}, B = {4, 5, 6}
Repeat the whole process:
  The final solution is A = {1, 2, 3}, B = {4, 5, 6}
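The whole worked example can be replayed with a compact KL pass. This is a sketch under assumptions: the function `kl_pass` is hypothetical, edge weights are taken as 1, and the edge list is inferred from the E, I and C values quoted in the example because the graph figure is not in the transcript. Ties are broken by vertex order, so the second and third swapped pairs may differ from the slide's arbitrary choice while the gains stay the same.

```python
from itertools import accumulate


def kl_pass(edges, A, B):
    """One Kernighan-Lin pass on a unit-weight graph; returns (A, B, gains, k)."""
    c = {}
    for u, v in edges:
        c[(u, v)] = c[(v, u)] = 1

    def cost(u, v):
        return c.get((u, v), 0)

    A, B = set(A), set(B)
    D = {}
    for v in A | B:
        same, other = (A, B) if v in A else (B, A)
        D[v] = sum(cost(v, u) for u in other) - sum(cost(v, u) for u in same)

    free_a, free_b = set(A), set(B)
    swaps, gains = [], []
    while free_a and free_b:
        # Best unlocked pair: g = D_a + D_b - 2 C_ab
        g, a, b = max(((D[a] + D[b] - 2 * cost(a, b), a, b)
                       for a in sorted(free_a) for b in sorted(free_b)),
                      key=lambda t: t[0])
        free_a.remove(a)          # lock the pair
        free_b.remove(b)
        for x in free_a:          # D_x' = D_x + 2 C_xa - 2 C_xb
            D[x] += 2 * cost(x, a) - 2 * cost(x, b)
        for y in free_b:          # D_y' = D_y + 2 C_yb - 2 C_ya
            D[y] += 2 * cost(y, b) - 2 * cost(y, a)
        swaps.append((a, b))
        gains.append(g)

    prefix = list(accumulate(gains))
    best = max(prefix, default=0)
    k = prefix.index(best) + 1 if best > 0 else 0
    for a, b in swaps[:k]:        # commit only the first k tentative swaps
        A.remove(a); B.remove(b)
        A.add(b); B.add(a)
    return A, B, gains, k


edges = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6), (5, 6)]
A1, B1, gains, k = kl_pass(edges, {2, 3, 4}, {1, 5, 6})
print(gains, k)           # [2, -3, 1] 1
print(A1, B1)             # {1, 2, 3} {4, 5, 6}
print(kl_pass(edges, A1, B1)[3])   # 0: the second pass finds no improving prefix
```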

Subgraph Replication to Reduce Cutsize [©Sherwani]
  - Vertices are replicated to improve cutsize
  - Good results if a limited number of components are replicated
C. Kring and A. R. Newton, ICCAD, 1991.

Clustering
  - Bottom-up process
  - Merge heavily connected components into clusters
  - Each cluster will be a new "node"
  - "Hide" internal connections (i.e., those connecting nodes within a cluster)
  - "Merge" two edges incident to an external vertex that connect it to two nodes in a cluster
Can be a preprocessing step before partitioning
  - Each cluster treated as a single node
[Figure: clustering example]
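The "hide" and "merge" steps amount to contracting the graph. A minimal sketch follows; the function name `contract`, the sample graph and the choice to sum the weights of merged edges are assumptions.

```python
from collections import Counter


def contract(edges, cluster_of):
    """Collapse vertices into clusters: hide internal edges, merge parallel ones."""
    merged = Counter()
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:                                  # internal connections are hidden
            merged[tuple(sorted((cu, cv)))] += 1      # parallel edges merge into one weight
    return dict(merged)


# Hypothetical graph: vertices 1-3 form cluster "c1", vertices 4-5 form cluster "c2".
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)]
cluster_of = {1: "c1", 2: "c1", 3: "c1", 4: "c2", 5: "c2"}

print(contract(edges, cluster_of))   # {('c1', 'c2'): 2}
```

The contracted graph has one node per cluster, so a partitioner run on it treats each cluster as a single node.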

Other Partitioning Methods
  - KL and FM have each held up very well
  - Min-cut / max-flow algorithms
      o Ford-Fulkerson, for unconstrained partitions
  - Ratio cut
  - Genetic algorithm
  - Simulated annealing

To Probe Further...
  - George Karypis, Rajat Aggarwal, Vipin Kumar and Shashi Shekhar, "Multilevel hypergraph partitioning: application in VLSI domain", Design Automation Conference.
  - George Karypis and Vipin Kumar, "Multilevel k-way hypergraph partitioning", Design Automation Conference.
  - A. E. Caldwell, A. B. Kahng and I. L. Markov, "Hypergraph Partitioning With Fixed Vertices", Design Automation Conference (DAC).
  - A. E. Caldwell, A. B. Kahng and I. L. Markov, "Design and Implementation of Move-Based Heuristics for VLSI Hypergraph Partitioning", ACM Journal on Experimental Algorithms, Vol. 5, 2000.