CSE 494: Electronic Design Automation Lecture 4 Partitioning.

Organization
- Partitioning
- Kernighan-Lin (KL) Heuristic
- Fiduccia-Mattheyses (FM) Heuristic
- Simulated annealing

Partitioning
- Division of a graph (or hypergraph) into multiple sub-graphs is known as partitioning.
- Partitioning should
  - Maintain functionality
  - Minimize interconnections between sub-graphs
  - Have low run-time complexity

Problem Formulation
- Given
  - A hypergraph G(V,E)
  - V = {v_1, v_2, …, v_n}, the set of vertices
  - E = {e_1, e_2, …, e_m}, the set of hyperedges, where e_i = {v_i, v_j, …, v_k}
  - Area of each vertex, a(v_i)
- Partition V into {V_1, V_2, V_3, …, V_k} where
  - V_i ∩ V_j = ∅ for i ≠ j
  - Union of all V_i = V
  - Size of each partition < Constraint
  - Cut-set is minimized
- Partitioning is an NP-complete problem.
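The formulation above can be checked mechanically. The following is a minimal sketch, assuming hyperedges are given as Python sets and every vertex has unit area; the names `is_valid_partition`, `cut_size` and `max_area` are illustrative and do not come from the slides.

```python
def is_valid_partition(vertices, blocks, area, max_area):
    """True if blocks are pairwise disjoint, cover V, and each has size < max_area."""
    seen = set()
    for block in blocks:
        if seen & block:                      # V_i and V_j must not overlap
            return False
        seen |= block
        if sum(area[v] for v in block) >= max_area:
            return False                      # size constraint violated
    return seen == set(vertices)              # union of all V_i must equal V


def cut_size(hyperedges, blocks):
    """Number of hyperedges whose vertices lie in more than one block."""
    block_of = {v: i for i, block in enumerate(blocks) for v in block}
    return sum(1 for e in hyperedges if len({block_of[v] for v in e}) > 1)


vertices = [1, 2, 3, 4, 5, 6]
hyperedges = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
area = {v: 1 for v in vertices}               # assumed unit areas
blocks = [{1, 2, 3}, {4, 5, 6}]

valid = is_valid_partition(vertices, blocks, area, max_area=4)
cut = cut_size(hyperedges, blocks)
print(valid, cut)  # True 2
```

Here the hyperedges {3, 4} and {1, 6} span both blocks, so the cut-set has size 2.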

Objective and Constraints
- Objective
  - Obj1: Minimize interconnection between various partitions
  - Obj2: Minimize delay due to partition
- Constraints
  - Const1: Number of terminals or pins
  - Const2: Area of each partition
  - Const3: Number of partitions

Partitioning and Design Styles
- Full Custom
  - Area and terminal count constraints
  - Minimize nets crossing a partition, delay
- Standard Cell
  - At RTL, circuit level
  - Partition RTL specification into disjoint sub-circuits, such that each sub-circuit corresponds to a standard cell
  - Minimize nets, delay
- Gate array
  - At RTL
  - Partition RTL specification recursively such that each partition corresponds to a gate
  - Minimize delay

Classification of Partitioning Algorithms
- Constructive algorithms versus iterative improvement algorithms
- Deterministic versus probabilistic algorithms

Bi-partitioning problem
- Also known as min-cut partitioning
- Number of partitions = 2
- Minimize the nets crossing the partitions
- Size of the two partitions is equal
- Exercise: given a graph with N nodes, calculate the number of different bi-partitions!
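One way to work the counting exercise, assuming N is even and the two halves are unlabeled: choose N/2 nodes for one side, then divide by 2 because swapping the sides gives the same bi-partition. The function name `balanced_bipartitions` is illustrative.

```python
from math import comb


def balanced_bipartitions(n):
    """Number of ways to split n nodes into two unlabeled halves of size n/2."""
    if n < 2 or n % 2:
        raise ValueError("n must be even and at least 2")
    return comb(n, n // 2) // 2


for n in (4, 8, 20):
    print(n, balanced_bipartitions(n))
# 4 3
# 8 35
# 20 92378
```

The count grows exponentially with N, which is why exhaustive search is impractical and heuristics such as KL and FM are used instead.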

Kernighan-Lin (KL) Heuristic
- Bi-partitioning algorithm
- Input specified as a graph G(V,E)
  - Obj: Divide V into two equal halves
  - Minimize cut-set
- Iterative improvement
  - Starts with a random initial partition.

KL: Input and Output

KL: Gain Calculation
- For each vertex a
  - I(a) = number of edges that do not cross the cut
  - E(a) = number of edges that cross the cut
  - Gain(a) = E(a) - I(a)
- If two vertices a in A and b in B are exchanged
  - Gain(a,b) = Gain(a) + Gain(b) - 2c(a,b), where c(a,b) is the cost of the edge between a and b (0 if there is none)
- Cutcost’ = Cutcost - Gain(a,b)
- For the remaining vertices x in A and y in B
  - Gain’(x) = Gain(x) + 2c(x,a) - 2c(x,b)
  - Gain’(y) = Gain(y) + 2c(y,b) - 2c(y,a)
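The gain formulas can be tried on a small graph. This is a sketch under assumptions: the graph is stored as a symmetric adjacency dictionary with unit edge costs, and the example graph (two triangles joined by one edge) is made up for illustration.

```python
def build_graph(edges):
    """Symmetric adjacency dict with unit edge costs (an assumed representation)."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, {})[v] = 1
        graph.setdefault(v, {})[u] = 1
    return graph


def c(graph, u, v):
    """Cost of edge (u, v), 0 if the edge does not exist."""
    return graph[u].get(v, 0)


def gain(graph, v, own, other):
    """Gain(v) = E(v) - I(v): external minus internal edge cost."""
    external = sum(c(graph, v, u) for u in other)
    internal = sum(c(graph, v, u) for u in own if u != v)
    return external - internal


def swap_gain(graph, a, b, A, B):
    """Gain(a,b) = Gain(a) + Gain(b) - 2c(a,b)."""
    return gain(graph, a, A, B) + gain(graph, b, B, A) - 2 * c(graph, a, b)


def cut_cost(graph, A, B):
    return sum(c(graph, a, b) for a in A for b in B)


edges = [(1, 3), (1, 4), (3, 4), (2, 5), (2, 6), (5, 6), (4, 5)]
graph = build_graph(edges)
A, B = {1, 2, 3}, {4, 5, 6}

before = cut_cost(graph, A, B)                 # 4
g = swap_gain(graph, 2, 4, A, B)               # 3
after = cut_cost(graph, {1, 3, 4}, {2, 5, 6})  # 1
print(before, g, after)  # 4 3 1
```

Exchanging vertices 2 and 4 reduces the cut from 4 to 1, matching Cutcost’ = Cutcost - Gain(a,b).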

KL: Strategy
- Find a pair of nodes, one from each partition, whose exchange results in the largest gain.
- Exchange the nodes, and lock them in their new partitions.
- Maintain a table that records and updates the cumulative gain after every move.
- Continue exchanging nodes until all nodes are locked.
- Based on the table, implement the first “k” moves that result in the largest cumulative gain.

KL: Table
Iteration  Vertex pair  Gain(i,j)  Sum of Gain(i,j)  Cutsize
0          -            -          -                 9
1          (3,5)        3          3                 6
2          (4,6)        5          8                 1
3          (1,7)        -6         2                 7
4          (2,8)        -2         0                 9
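Choosing “k” from the table amounts to finding the prefix of moves with the largest cumulative gain. A small sketch using the gains from the table; the initial cut size of 9 is inferred from the first row (6 + 3), and the helper name `best_prefix` is illustrative.

```python
from itertools import accumulate


def best_prefix(gains):
    """Return (k, total): the number of leading moves with maximum cumulative gain."""
    best_k, best_total = 0, 0
    for k, total in enumerate(accumulate(gains), start=1):
        if total > best_total:
            best_k, best_total = k, total
    return best_k, best_total


pairs = [(3, 5), (4, 6), (1, 7), (2, 8)]
gains = [3, 5, -6, -2]
initial_cut = 9  # inferred from the table

k, total = best_prefix(gains)
print(k, pairs[:k], initial_cut - total)  # 2 [(3, 5), (4, 6)] 1
```

Only the first two exchanges are kept; the remaining moves would undo the improvement.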

KL: Algorithm
begin
  initialize();
  while (improve == TRUE)
    while (UNLOCK(A) == TRUE)
      min = infinity
      for all unlocked (a) in A
        for all unlocked (b) in B
          if (cutcost - gain(a,b) < min)
            min = cutcost - gain(a,b)
            sel_a = a, sel_b = b
      cutcost = min, lock(sel_a), lock(sel_b), update(T)
    implement first k moves that achieve the lowest cutset
    set improve
end
Complexity = O(n^3)
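A runnable sketch of the algorithm, assuming an unweighted graph stored as a symmetric adjacency dictionary in which every vertex appears as a key. The function names and the example graph are illustrative; the pair search is the naive one, so each pass costs O(n^3) as stated on the slide.

```python
def build_graph(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, {})[v] = 1
        graph.setdefault(v, {})[u] = 1
    return graph


def cut_cost(graph, A, B):
    return sum(graph[a].get(b, 0) for a in A for b in B)


def kl_pass(graph, A, B):
    """One KL pass: tentatively swap all pairs, keep the best prefix of swaps."""
    A, B = set(A), set(B)
    D = {v: sum(w if u in B else -w for u, w in graph[v].items()) for v in A}
    D.update({v: sum(w if u in A else -w for u, w in graph[v].items()) for v in B})
    free_a, free_b = set(A), set(B)
    swaps, total, best_total, best_k = [], 0, 0, 0
    while free_a and free_b:
        # Pick the unlocked pair with the largest Gain(a,b).
        g, a, b = max(((D[a] + D[b] - 2 * graph[a].get(b, 0), a, b)
                       for a in free_a for b in free_b), key=lambda t: t[0])
        free_a.remove(a)   # lock a
        free_b.remove(b)   # lock b
        for x in free_a:   # update gains of the remaining free vertices
            D[x] += 2 * graph[x].get(a, 0) - 2 * graph[x].get(b, 0)
        for y in free_b:
            D[y] += 2 * graph[y].get(b, 0) - 2 * graph[y].get(a, 0)
        swaps.append((a, b))
        total += g
        if total > best_total:
            best_total, best_k = total, len(swaps)
    for a, b in swaps[:best_k]:   # implement only the first k moves
        A.remove(a); B.add(a)
        B.remove(b); A.add(b)
    return A, B, best_total


def kernighan_lin(graph, A, B):
    while True:
        A, B, improvement = kl_pass(graph, A, B)
        if improvement <= 0:
            return A, B


edges = [(1, 3), (1, 4), (3, 4), (2, 5), (2, 6), (5, 6), (4, 5)]
graph = build_graph(edges)
A, B = kernighan_lin(graph, {1, 2, 3}, {4, 5, 6})
final_cut = cut_cost(graph, A, B)
print(sorted(A), sorted(B), final_cut)  # [1, 3, 4] [2, 5, 6] 1
```

The outer loop stops as soon as a pass yields no positive cumulative gain, which plays the role of the `improve` flag in the pseudocode.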

KL Drawbacks
- Handles only unit vertex weights.
- Addresses only exact bisections.
- Cannot handle hypergraphs.
- Time complexity is high.

Fiduccia-Mattheyses (FM) Problem Definition
- Given
  - A hypergraph G(C, N), where C is the set of cells and N is the set of nets.
  - Each cell i has a size s(i).
  - A fraction r = |A|/(|A| + |B|)
- Partition G into two blocks A and B such that
  - the resulting cutset is minimized, and
  - the fraction r is satisfied.

FM Definitions
- Total number of nets: N
- Total number of cells: C
- Size of each cell: s(i)
- Number of cells in a net: n(i)
- Number of pins in a cell: p(i)
- Total number of pins: P = p(1) + p(2) + … + p(C) = n(1) + n(2) + … + n(N)

FM Definition
- The cut state of a net is ‘1’ if the net has cells in both partitions.
- A net is considered critical if it has a cell which, if moved, will change its cut state. This is the case when:
  - It has no cell in one partition (all of its cells are in the other partition), or
  - It has only one cell in one partition (say A), and the remaining cells are in the other partition (B).
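The two definitions can be written as small predicates. A sketch, assuming each net is a list of cell names and `side` maps a cell to "A" or "B"; all names here are illustrative.

```python
def net_distribution(net, side):
    """Return (number of the net's cells in A, number in B)."""
    in_a = sum(1 for cell in net if side[cell] == "A")
    return in_a, len(net) - in_a


def cut_state(net, side):
    """1 if the net has cells in both partitions, else 0."""
    in_a, in_b = net_distribution(net, side)
    return 1 if in_a > 0 and in_b > 0 else 0


def is_critical(net, side):
    """A net is critical if either side holds 0 or exactly 1 of its cells."""
    in_a, in_b = net_distribution(net, side)
    return in_a in (0, 1) or in_b in (0, 1)


side = {"c1": "A", "c2": "A", "c3": "B", "c4": "B", "c5": "B"}
n1 = ["c1", "c2", "c3", "c4"]   # 2 cells in A, 2 in B
n2 = ["c1", "c3", "c4"]         # 1 cell in A, 2 in B
n3 = ["c3", "c4", "c5"]         # 0 cells in A, 3 in B

for net in (n1, n2, n3):
    print(cut_state(net, side), is_critical(net, side))
# 1 False
# 1 True
# 0 True
```

Net n1 is cut but not critical: no single cell move can uncut it.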

FM Strategy
- Overall strategy is similar to KL.
  - Iterative improvement.
  - However, with some modifications.
- Support for hypergraphs.
- Only one cell is moved at a time.
  - The cell with max gain is chosen.
  - The move must maintain the ratio: r·W - smax <= |A| <= r·W + smax, where W = |A| + |B| and smax is the size of the largest cell.
- Efficient data structures for:
  - Accessing cells and nets
  - Obtaining cells with max gain
  - Calculating and updating gain

Cell and Net Data Structures
- An array of cell nodes
  - Each node has a linked list of nets
- An array of nets
  - Each position has a linked list of cells
- Constructed in O(P).
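A sketch of how both arrays can be built in a single pass over the pins, which is where the O(P) bound comes from. Python lists stand in for the linked lists, cells are assumed to be numbered from 0, and `build_structures` is an illustrative name.

```python
def build_structures(nets, num_cells):
    """Build the cell array (cell -> nets) and net array (net -> cells)."""
    cell_nets = [[] for _ in range(num_cells)]
    net_cells = [[] for _ in range(len(nets))]
    for n, cells in enumerate(nets):
        for cell in cells:            # one step per pin
            cell_nets[cell].append(n)
            net_cells[n].append(cell)
    return cell_nets, net_cells


nets = [[0, 1, 2], [2, 3], [0, 3]]
cell_nets, net_cells = build_structures(nets, num_cells=4)
total_pins = sum(len(x) for x in cell_nets)

print(cell_nets)    # [[0, 2], [0], [0, 1], [1, 2]]
print(total_pins)   # 7
```

Summing the list lengths in either array gives the same total P, as in the pin-count identity from the FM definitions.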

Bucket Structure
- The gain when a cell is moved can vary from -pmax to pmax, where pmax is the maximum number of pins on any cell.
- Each partition has an array of pointers called the bucket array.
- Size of the array is given by 2*pmax + 1.
- Each array location “i” has a linked list of pointers to the cells with gain “-pmax + i”.
- The bucket structure keeps the cells sorted by gain (bucket sort).
- A pointer MAXGAIN points to the location holding the max-gain cell.
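A simplified sketch of one partition's bucket array. Python lists replace the doubly linked lists, so removal here is not constant time as it would be in the structure the slide describes; the class and method names are illustrative.

```python
class GainBuckets:
    """Bucket array for one partition: index i holds cells with gain i - pmax."""

    def __init__(self, pmax):
        self.pmax = pmax
        self.buckets = [[] for _ in range(2 * pmax + 1)]
        self.gain_of = {}
        self.max_index = -1            # MAXGAIN pointer; -1 means empty

    def insert(self, cell, gain):
        i = gain + self.pmax
        self.buckets[i].append(cell)
        self.gain_of[cell] = gain
        self.max_index = max(self.max_index, i)

    def remove(self, cell):
        i = self.gain_of.pop(cell) + self.pmax
        self.buckets[i].remove(cell)
        # Walk MAXGAIN down past any buckets that became empty.
        while self.max_index >= 0 and not self.buckets[self.max_index]:
            self.max_index -= 1

    def update(self, cell, delta):
        gain = self.gain_of[cell] + delta
        self.remove(cell)
        self.insert(cell, gain)

    def max_gain_cell(self):
        return self.buckets[self.max_index][-1] if self.max_index >= 0 else None


b = GainBuckets(pmax=3)
b.insert("x", 2)
b.insert("y", -1)
b.insert("z", 2)
first = b.max_gain_cell()     # "z"
b.update("z", -3)             # z now has gain -1
second = b.max_gain_cell()    # "x"
b.remove("x")                 # e.g. x was moved and locked
third = b.max_gain_cell()     # "z"
print(first, second, third)   # z x z
```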

Free List
- Once a cell has been moved and locked, it is
  - Removed from the bucket structure.
  - Placed in the free cell list.
- Reduces the number of entries in the bucket structure.

Selection of base cell
- Consider the cell of highest gain from each of the bucket structures.
  - Must satisfy the r “inequality” on the move.
- Break ties by selecting the one that gives the best r.
- Selected cell is called the base cell.
- Remove it from the bucket structure, lock it and place it in the free list.
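A sketch of the selection step. It assumes the balance test from the original FM formulation, r·W - smax <= |A| <= r·W + smax with W the total cell size, and interprets "best r" as the smallest deviation of |A| from r·W; the function names and the example sizes are illustrative.

```python
def is_balanced(size_a, total, r, smax):
    """Assumed FM balance criterion on the size of block A."""
    return r * total - smax <= size_a <= r * total + smax


def select_base_cell(candidates, size, size_a, total, r, smax):
    """candidates: (cell, from_block, gain) tuples, e.g. the top cell of each bucket."""
    legal = []
    for cell, block, gain in candidates:
        new_a = size_a - size[cell] if block == "A" else size_a + size[cell]
        if is_balanced(new_a, total, r, smax):
            # Sort key: highest gain first, then smallest deviation from r*total.
            legal.append((gain, -abs(new_a - r * total), cell))
    return max(legal)[2] if legal else None


size = {"a1": 2, "b1": 1}
tie = select_base_cell([("a1", "A", 1), ("b1", "B", 1)],
                       size, size_a=5, total=10, r=0.5, smax=2)
higher = select_base_cell([("a1", "A", 2), ("b1", "B", 1)],
                          size, size_a=5, total=10, r=0.5, smax=2)
print(tie, higher)  # b1 a1
```

With equal gains, moving b1 leaves |A| = 6 (deviation 1) while moving a1 leaves |A| = 3 (deviation 2), so b1 wins the tie.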

Initial Computation of Cell Gains
- F => current or “from” block of cell i.
- T => target or “to” block of cell i.
- Gain is determined only by critical nets.
- FS(i) => number of nets that have cell i as their only F cell.
- TE(i) => number of nets that contain cell i and have an empty T.
- G(i) = FS(i) - TE(i)
- Can be calculated in O(P).
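The gain formula G(i) = FS(i) - TE(i) can be computed with one scan per net. A sketch, assuming nets are lists of cell indices with no repeated cell and `side` maps each cell to "A" or "B"; the example netlist is made up.

```python
def initial_gains(nets, side):
    """G(i) = FS(i) - TE(i) for every cell."""
    gains = {cell: 0 for cell in side}
    for net in nets:
        count = {"A": 0, "B": 0}
        for cell in net:
            count[side[cell]] += 1
        for cell in net:
            F = side[cell]
            T = "B" if F == "A" else "A"
            if count[F] == 1:      # cell is the only F cell on this net: FS
                gains[cell] += 1
            if count[T] == 0:      # T side of this net is empty: TE
                gains[cell] -= 1
    return gains


side = {0: "A", 1: "A", 2: "B", 3: "B"}
nets = [[0, 1, 2], [2, 3], [0, 3]]
gains = initial_gains(nets, side)
print(gains)  # {0: 1, 1: 0, 2: 0, 3: 0}
```

Cell 0 has gain 1 because moving it to B removes net [0, 3] from the cut and changes nothing else.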

Updating Cell Gains
- Base cell is moved from one partition to another.
- Only nets that are critical before or after the move need to be considered.
- Cells that are not locked and belong to such critical nets are updated.

Updating Cell Gains
[Figures: Cases 1 to 4, each showing how a net's cells are distributed between the F and T blocks around the base-cell move, with the resulting gain changes marked on the cells.]

Update Algorithm
For each net n on the base cell
  /* critical before move */
  If T(n) = 0 then incr gain of all free cells on n
  If T(n) = 1 then decr gain of the only T cell, if it is free
  /* change net distribution */
  decr F(n), incr T(n)
  /* critical after move */
  If F(n) = 0 then decr gain of all free cells on n
  If F(n) = 1 then incr gain of the only F cell, if it is free
End
Complexity is O(P)
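A runnable sketch of the update step. For brevity it recounts F(n) and T(n) for each net instead of keeping running counters as an efficient implementation would; the function name and the small netlist are illustrative, and the starting gains are the initial gains for this netlist.

```python
def move_and_update(base, nets_of, cells_of, side, locked, gains):
    """Move the base cell to the other block and update gains of free cells."""
    F = side[base]
    T = "B" if F == "A" else "A"
    locked.add(base)
    for n in nets_of[base]:
        cells = cells_of[n]
        f = sum(1 for cell in cells if side[cell] == F)   # includes the base cell
        t = len(cells) - f
        free = [cell for cell in cells if cell not in locked]
        # Critical before the move.
        if t == 0:
            for cell in free:
                gains[cell] += 1
        elif t == 1:
            for cell in free:
                if side[cell] == T:
                    gains[cell] -= 1
        # Change the net distribution to reflect the move.
        f, t = f - 1, t + 1
        # Critical after the move.
        if f == 0:
            for cell in free:
                gains[cell] -= 1
        elif f == 1:
            for cell in free:
                if side[cell] == F:
                    gains[cell] += 1
    side[base] = T


cells_of = [[0, 1, 2], [2, 3], [0, 3]]
nets_of = {0: [0, 2], 1: [0], 2: [0, 1], 3: [1, 2]}
side = {0: "A", 1: "A", 2: "B", 3: "B"}
gains = {0: 1, 1: 0, 2: 0, 3: 0}
locked = set()

move_and_update(0, nets_of, cells_of, side, locked, gains)
free_gains = {cell: g for cell, g in gains.items() if cell not in locked}
print(free_gains)  # {1: 1, 2: -1, 3: -2}
```

Recomputing FS - TE from scratch on the new partition gives the same values for the free cells, which is a useful sanity check when implementing the incremental update.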

KL and FM are Deterministic algorithms
- Every invocation of the algorithm with identical inputs generates the same solution (hence, deterministic).
- Fast, but inherently greedy in nature.
[Figure: cost versus successive solutions; the search settles in a local minimum.]

Non-deterministic algorithms
- Also known as probabilistic or stochastic algorithms.
- Invocations of the algorithm with identical inputs can generate different solutions.
- Slower than deterministic algorithms, but demonstrate non-greedy behavior.
[Figure: cost versus successive solutions, showing hill-climbing behavior.]

Simulated Annealing
- Simulated annealing is a generic optimization technique.
- In physical design automation (PDA), it has been applied to partitioning and placement.
- Maintains a temperature variable that is reduced from a high value to a low value.
- A number of solutions are explored at each temperature, each obtained by modifying the existing solution.
- A solution that decreases cost is always accepted.
- At high temperatures, solutions that increase cost are accepted with greater probability.
- At low temperatures, solutions that increase cost are accepted with very low probability.

Partitioning by Simulated Annealing
Algorithm SA
Begin
  T = T_initial;
  P = initial partition;
  C = cutsize(P);
  repeat
    repeat
      P’ = neighbourhood(P);
      C’ = cutsize(P’);
      D = C’ - C;
      r = random(0,1);
      If (D < 0 OR r < exp(-D/T)) accept P’: P = P’; C = C’;
    until (equilibrium at T is reached)
    T = alpha * T; /* 0 < alpha < 1 */
  until (T <= T_final);
End.
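A runnable sketch of the algorithm for graph bi-partitioning. The neighbourhood move exchanges one node from each side, equilibrium is approximated by a fixed number of moves per temperature, and the best partition seen so far is remembered (an addition that is not in the pseudocode). All parameter values and names are assumptions chosen for illustration.

```python
import math
import random


def cutsize(edges, side):
    return sum(1 for u, v in edges if side[u] != side[v])


def anneal(edges, side, t_initial=10.0, t_final=0.01, alpha=0.9,
           moves_per_t=50, seed=0):
    rng = random.Random(seed)
    side = dict(side)
    cost = cutsize(edges, side)
    best, best_cost = dict(side), cost
    t = t_initial
    while t > t_final:
        for _ in range(moves_per_t):            # "equilibrium" = fixed move count
            a = rng.choice([v for v in side if side[v] == 0])
            b = rng.choice([v for v in side if side[v] == 1])
            side[a], side[b] = 1, 0             # neighbourhood: exchange two nodes
            new_cost = cutsize(edges, side)
            d = new_cost - cost
            if d < 0 or rng.random() < math.exp(-d / t):
                cost = new_cost                 # accept P'
                if cost < best_cost:
                    best, best_cost = dict(side), cost
            else:
                side[a], side[b] = 0, 1         # reject: undo the exchange
        t *= alpha                              # cooling schedule
    return best, best_cost


edges = [(1, 3), (1, 4), (3, 4), (2, 5), (2, 6), (5, 6), (4, 5)]
start = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
best, best_cost = anneal(edges, start)
print(best_cost <= cutsize(edges, start))  # True
```

Because every move exchanges one node from each side, the partition sizes stay equal throughout the search.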

Partitioning by Simulated Annealing
- A neighbourhood solution could be generated by exchanging two nodes.
- Equilibrium at T
  - Apply a fixed number of moves.

Ratio Cut
- KL aims to generate equally sized bi-partitions.
- FM allows unequal bi-partitions.
- Neither considers the graph structure itself.
- Ratio cut overcomes this limitation.

Ratio Cut
- Ratio cut is a cost function.
- Utilized instead of just the cut set.
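The slides do not spell out the cost function. The sketch below assumes the commonly cited definition, cut(A,B) / (|A|·|B|), with block sizes measured in number of vertices; the function name and example graph are illustrative.

```python
def ratio_cut(edges, A, B):
    """Assumed definition: number of cut edges divided by |A| * |B|."""
    cut = sum(1 for u, v in edges if (u in A) != (v in A))
    return cut / (len(A) * len(B))


edges = [(1, 3), (1, 4), (3, 4), (2, 5), (2, 6), (5, 6), (4, 5)]
natural = ratio_cut(edges, {1, 3, 4}, {2, 5, 6})      # cut 1, sizes 3 and 3
lopsided = ratio_cut(edges, {1}, {2, 3, 4, 5, 6})     # cut 2, sizes 1 and 5
print(natural < lopsided)  # True
```

The denominator penalizes very uneven splits without forcing an exact bisection, so the split that follows the two natural clusters scores better.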