Analytical Minimization of Signal Delay in VLSI Placement Andrew B. Kahng and Igor L. Markov UCSD, Univ. of Michigan


Analytical Minimization of Signal Delay in VLSI Placement Andrew B. Kahng and Igor L. Markov UCSD, Univ. of Michigan IBM technical contact: Paul Villarrubia

Outline
Background: Global Placement for VLSI
  – wirelength minimization
  – delay minimization
Contribution
  – minimization objective
  – “generic” minimization algorithm: outer loop and inner loop
  – empirical results
Futures

VLSI Global Placement
Find locations for standard cells
Standard cells placed in rows, without overlap
Minimize wirelength, “routing congestion”
Minimize clock cycle
Key abstractions:
  – standard cells → rectangular outlines
  – netlist → weighted hypergraph (signal nets → hyperedges)
  – signal delay → function of cell locations (interconnect dominates)

A VLSI Global Placement Example
[Figure: bad placement vs. good placement]

Netlist Hypergraph and Timing Graph
Two signal nets: 3 pins (light blue) and 4 pins (light green)
Ovals: hyperedges
Red edges: timing graph edges

Top-Down Global Placement
Placement blocks represent cells and layout area
  – single block at the start, driven by recursive (min-cut) bipartitioning
  – each pass: number of blocks doubles, size of blocks halves
  – end case: several cells in a tiny region, etc.
Intuition: many cells can operate in parallel. Partitioning finds “independent” groups of cells

Analytical Global Placement
Find a continuous placement (locations == reals)
Efficient optimizations when nonconvex constraints are relaxed (e.g., cells are allowed to overlap)
Represent multi-pin hyperedges by sets of edges
  – minimize total weighted “wirelength” of all edges
Popular objectives (contribution of an edge between pins $P_1$ and $P_2$):
  – Linear (Manhattan) WL = $w_{12} ( |x_1 - x_2| + |y_1 - y_2| )$
  – Quadratic (“squared”) WL = $w_{12} ( (x_1 - x_2)^2 + (y_1 - y_2)^2 )$
Constraints: fixed vertices and/or “region constraints”
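The two wirelength objectives above can be sketched in a few lines of Python. This is only an illustration: the function names, the `(i, j, w)` edge tuples and the example coordinates are assumptions made for the sketch, not part of the presentation.

```python
def linear_wl(edges, pos):
    """Total weighted Manhattan wirelength over two-pin edges (i, j, w)."""
    total = 0.0
    for i, j, w in edges:
        (xi, yi), (xj, yj) = pos[i], pos[j]
        total += w * (abs(xi - xj) + abs(yi - yj))
    return total


def quadratic_wl(edges, pos):
    """Total weighted squared Euclidean wirelength over two-pin edges."""
    total = 0.0
    for i, j, w in edges:
        (xi, yi), (xj, yj) = pos[i], pos[j]
        total += w * ((xi - xj) ** 2 + (yi - yj) ** 2)
    return total


# Hypothetical three-cell example.
pos = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 1.0)}
edges = [("a", "b", 1.0), ("b", "c", 2.0)]

print(linear_wl(edges, pos))     # 1*(3+4) + 2*(2+3)
print(quadratic_wl(edges, pos))  # 1*(9+16) + 2*(4+9)
```

The quadratic form is differentiable everywhere, which is one reason it is popular in analytical placers; the linear form matches routed Manhattan distance more closely.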

Analytical Placement Alone is Not Enough
Many cells overlap
Must “spread” the placement
IBM CPlace and XQ
  – Remove overlap (computational geometry)
  – CPlace combines min-cut with analytical techniques

Timing-Driven Placement
Cycle time is determined by maximum path delay, not total path delay (!)
  – max(x,y,...) is not differentiable
  – framework: pin-based timing graph
Analytical approaches allow cell overlaps
  – Cell overlaps are resolved later
Main difficulty: cannot enumerate signal paths
Signal paths implicitly defined by device types
  – signal path sources, sinks == I/O pins and storage elements
Timing constraints also implicitly defined
  – “actual arrival times” (AATs) at sources
  – “required arrival times” (RATs) at sinks
  – source-sink path constraint: path delay ≤ RAT(sink) - AAT(source)

Implicit Analysis of Path Constraints
Static Timing Analysis (STA) methodology
  – forward topological traversal in timing graph ⇒ AAT at every vertex
  – similar backward traversal ⇒ RAT at every vertex
  – slack at a vertex is given by RAT - AAT
  – negative slacks ⇔ violated timing constraints
STA-based and STA-inspired placement methods
  – slacks → net weights for HPWL minimization
      top-down placement to maximize negative slack (Marek-Sadowska/Lin 86)
  – note: STA requires edge delays (e.g., from placement)
  – delay budgets
      zero-slack (Hauge, Nair and Yoffa 86)
      iterative min-max (Shragowitz et al. 90/92)
      limit-bumping (Frankle 92)
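The two STA traversals can be sketched as follows. This is a minimal illustration under stated assumptions: the timing graph is a DAG given as a dictionary of edge delays, every source has an AAT, every sink has a RAT, and the function name `sta` and the example graph are hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def sta(edge_delay, aat_sources, rat_sinks):
    """Return (aat, rat, slack) per vertex for a DAG timing graph."""
    preds, succs, verts = {}, {}, set()
    for (u, v), d in edge_delay.items():
        preds.setdefault(v, []).append((u, d))
        succs.setdefault(u, []).append((v, d))
        verts.update((u, v))

    # TopologicalSorter expects a mapping: node -> iterable of predecessors.
    graph = {v: [u for u, _ in preds.get(v, [])] for v in verts}
    order = list(TopologicalSorter(graph).static_order())

    # Forward traversal: latest arrival over all incoming edges.
    aat = dict(aat_sources)
    for v in order:
        if v in preds:
            aat[v] = max(aat[u] + d for u, d in preds[v])

    # Backward traversal: earliest requirement over all outgoing edges.
    rat = dict(rat_sinks)
    for v in reversed(order):
        if v in succs:
            rat[v] = min(rat[w] - d for w, d in succs[v])

    slack = {v: rat[v] - aat[v] for v in verts}
    return aat, rat, slack


# Hypothetical example: two paths from "in" to "out".
delays = {("in", "g1"): 2, ("g1", "out"): 3, ("in", "g2"): 1, ("g2", "out"): 1}
aat, rat, slack = sta(delays, {"in": 0}, {"out": 4})
print(slack)  # negative entries mark vertices on violating paths
```

Both passes visit each edge once, so no signal path is ever enumerated explicitly.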

Motivations For Novelty
Many promising techniques available
  – net reweighting
  – delay budgeting
  – others
Existing frameworks have weaknesses
  – speed/scalability
  – loss or ignorance of input information
      delay budgeting algorithms tend to ignore fixed locations, obstacles
  – optimization of “wrong” global objectives (e.g., average wirelength)

The Dimensionless Path-Timing Objective
For path $\pi$ consider edge $e \in \pi$
Dimensionless Path-Timing Objective (DPO):
  $\phi = \max_{\pi} \{ t_\pi / c_\pi \} = \max_{\pi} \{ ( \sum_{e \in \pi} d_e ) / c_\pi \}$
where
  – $c_\pi$ is the path constraint
  – $t_\pi$ is the path delay
  – $d_e = d_{ij}(x_i, y_i, x_j, y_j)$ is the edge delay

DPO: Properties
$\phi = \max_{\pi} \{ t_\pi / c_\pi \} = \max_{\pi} \{ ( \sum_{e \in \pi} d_e ) / c_\pi \}$
$\phi \le 1$ ⇔ all timing constraints are satisfied
Convex when edge delay models are convex
Min DPO ⇔ max slack when all $c_\pi$ are equal
Max slack can be reduced to min DPO
  – add two new vertices: the source and the sink
  – connect the source to former sources
  – connect the sink to former sinks
  – use constant edge delay models
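To make the definition concrete, the sketch below evaluates the DPO on a tiny timing graph by brute-force path enumeration. Enumeration is used only because the example is small; it is not the STA-based computation the slides refer to. The sketch assumes the path constraint equals RAT(sink) - AAT(source), and the names `enumerate_paths` and `dpo` are hypothetical.

```python
def enumerate_paths(succs, sources, sinks):
    """Yield every source-to-sink path as a list of vertices (small DAGs only)."""
    def walk(path):
        last = path[-1]
        if last in sinks:
            yield list(path)
        for nxt in succs.get(last, []):
            yield from walk(path + [nxt])

    for s in sources:
        yield from walk([s])


def dpo(edge_delay, aat, rat):
    """Max over paths of (path delay) / (path constraint)."""
    succs = {}
    for u, v in edge_delay:
        succs.setdefault(u, []).append(v)
    worst = 0.0
    for path in enumerate_paths(succs, set(aat), set(rat)):
        t = sum(edge_delay[(u, v)] for u, v in zip(path, path[1:]))
        c = rat[path[-1]] - aat[path[0]]  # assumed form of the path constraint
        worst = max(worst, t / c)
    return worst


# Hypothetical example: two paths from "in" to "out", constraint 4 on both.
delays = {("in", "g1"): 2, ("g1", "out"): 3, ("in", "g2"): 1, ("g2", "out"): 1}
value = dpo(delays, {"in": 0}, {"out": 4})
print(value)  # a value above 1 means some path violates its constraint
```

In this example the path through `g1` has delay 5 against a constraint of 4, so the objective exceeds 1 and at least one timing constraint is violated.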

Criticalities: “Multiplicative Slacks”
By analogy with slack, define criticalities
  – $\kappa_i = \max_{\pi \ni v} \{ t_\pi / c_\pi \}$ for vertex $v = v_i$
  – $\kappa_{ij} = \max_{\pi \ni e} \{ t_\pi / c_\pi \}$ for edge $e = e_{ij}$
Criticalities are multiplicative versions of slack
DPO and criticalities quickly computable
  – STA + postprocessing
Vertex criticalities → cells on critical paths
  – can be used by the proposed top-down timing-driven placement flow
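The criticality definitions can be illustrated with a brute-force sketch that visits every source-to-sink path and records, for each vertex and edge, the largest delay-to-constraint ratio of any path through it. This is for illustration on small graphs only; the slides obtain the same quantities from STA plus postprocessing. The path constraint is again assumed to be RAT(sink) - AAT(source), and the function name is hypothetical.

```python
def criticalities(edge_delay, aat, rat):
    """Return (vertex_crit, edge_crit) by explicit path enumeration."""
    succs = {}
    for u, v in edge_delay:
        succs.setdefault(u, []).append(v)

    vertex_crit, edge_crit = {}, {}
    stack = [[s] for s in aat]  # depth-first search over partial paths
    while stack:
        path = stack.pop()
        last = path[-1]
        if last in rat:  # reached a sink: score the complete path
            hops = list(zip(path, path[1:]))
            ratio = sum(edge_delay[h] for h in hops) / (rat[last] - aat[path[0]])
            for v in path:
                vertex_crit[v] = max(vertex_crit.get(v, 0.0), ratio)
            for h in hops:
                edge_crit[h] = max(edge_crit.get(h, 0.0), ratio)
        for nxt in succs.get(last, []):
            stack.append(path + [nxt])
    return vertex_crit, edge_crit


# Hypothetical example: two paths from "in" to "out", constraint 4 on both.
delays = {("in", "g1"): 2, ("g1", "out"): 3, ("in", "g2"): 1, ("g2", "out"): 1}
vertex_crit, edge_crit = criticalities(delays, {"in": 0}, {"out": 4})
print(vertex_crit)
print(edge_crit)
```

Vertices shared by both paths inherit the ratio of the worse path, while `g2` and its edges keep the smaller ratio of the path that actually passes through them.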

Generic Minimization of DPO
Reduce DPO to a simpler objective: $\max_{ij} w_{ij} d_{ij}$
  – maximal weighted edge delay
  – use “reweighting iterations”
One reweighting iteration
  – assume a placement
  – compute edge criticalities
  – compute new edge weights $w_{ij}$
  – minimize $\max_{ij} w_{ij} d_{ij}$
(New weights: $w_{ij}' = \kappa_{ij} \mu / d_{ij}$, where $\mu = \max_{ij} w_{ij} d_{ij}$ and $\kappa_{ij}$ is the criticality of edge $e_{ij}$)
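The weight update inside one reweighting iteration can be sketched as below. Only the update rule is shown; the inner minimization of the maximal weighted edge delay is the placer's job and is not implemented here. The function name, the dictionary layout and the example numbers (edge delays and criticalities of a small two-path graph) are assumptions made for the sketch.

```python
def reweight(weights, delays, edge_crit):
    """New weight = criticality * (current max weighted delay) / edge delay."""
    mu = max(weights[e] * delays[e] for e in weights)
    return {e: edge_crit[e] * mu / delays[e] for e in weights}


# Hypothetical inputs for a current placement.
delays = {("in", "g1"): 2.0, ("g1", "out"): 3.0, ("in", "g2"): 1.0, ("g2", "out"): 1.0}
edge_crit = {("in", "g1"): 1.25, ("g1", "out"): 1.25, ("in", "g2"): 0.5, ("g2", "out"): 0.5}
weights = {e: 1.0 for e in delays}

new_weights = reweight(weights, delays, edge_crit)
for e, w in new_weights.items():
    # At the current placement, w * d equals criticality * mu for every edge.
    print(e, w, w * delays[e])
```

After the update, the weighted delay of each edge at the current placement is proportional to its criticality, so the subsequent min-max step puts the most pressure on edges that lie on the worst paths.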

Properties of Reweighting
Theorem 1. If $\mu = \max_{ij} w_{ij} d_{ij}$ does not increase at a particular iteration, all timing constraints must be satisfied.
Theorem 2. A reweighting iteration either decreases DPO or leaves it unchanged.
Reweighting upper-bounds $d_{ij}$ because $w_{ij} d_{ij} \le \mu$
  ⇒ can interpret reweighting as delay rebudgeting
Youssef and Shragowitz used $w_{ij} = \kappa_{ij}$ (the edge criticalities) in 1990/92
  – [interpretation of their iterative MiniMax]
  – no iterations with placement: ignore fixed pad locations

Optimization of Maximal Edge Delay
Must consider particular edge delay models
  – popular choices: linear and quadratic
Theorem 3. 2-dim max edge delay can be reduced to the 1-dim case with double the number of vertices
  [“Inlined” implementation: no new graph]
  – linear: $\max a_{km} |t_k - t_m|$
  – quadratic: $\max b_{km} (t_k - t_m)^2$
Theorem 4. Let $b_{km} = a_{km}^2$ ⇒ minimizers coincide
  ⇒ Linear and quadratic WL are numerically equivalent!
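The intuition behind Theorem 4 is that squaring is strictly increasing on nonnegative numbers, so the maximum of the squared terms is the square of the maximum and both objectives are minimized at the same point. The sketch below checks this numerically on a hypothetical 1-D instance with one movable vertex and two fixed neighbors; the coarse grid search is an assumption made for illustration, not the minimization algorithm from the presentation.

```python
def argmin_on_grid(objective, grid):
    """Return the grid point with the smallest objective value."""
    return min(grid, key=objective)


# Hypothetical instance: fixed neighbors as (position, linear coefficient a).
fixed = [(0.0, 1.0), (10.0, 3.0)]
grid = [k * 0.5 for k in range(21)]  # candidate positions 0.0, 0.5, ..., 10.0


def linear(t):
    """Max weighted linear edge delay for movable position t."""
    return max(a * abs(t - p) for p, a in fixed)


def quadratic(t):
    """Max weighted quadratic edge delay, using b = a squared."""
    return max(a * a * (t - p) ** 2 for p, a in fixed)


t_lin = argmin_on_grid(linear, grid)
t_quad = argmin_on_grid(quadratic, grid)
print(t_lin, t_quad)  # both minimizers land on the same grid point
```

Here the optimum balances the two weighted edges, t = 3 (10 - t), which gives t = 7.5 for both objectives.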

Top-Down Placement Framework
Top-down placement done in passes
In one pass
  – split every previously existing block
Cell-to-block assignments
  – viewed as region constraints
  – gradually refine, converge to cell locations
Assume we analytically minimized signal delay
  → have cell locations
  → can compute edge delays
  → can perform Static Timing Analysis
  → know which cells lie on critical paths
Use delay-minimizing cell locations when splitting blocks

Empirical Validation
We combined min-max placement with recursive min-cut bisection (Capo → CapoT)
Implemented minimization of edge delay objectives:
  – Length as delay
  – Squared length as delay
  – Quadratic RC delay
  – MST-based Elmore delay (using
Evaluated
  – Internal evaluators (after placement): sanity check
  – Industry timing analyzer
Compared to an industry placer on 4 test-cases
  – Won on three test-cases (by slack computed with industry STA)

Results of Quadratic, Linear and Min-Max Placement

Conclusions and Ongoing Work
New timing-driven placement framework
  – can potentially be combined with budgeting or reweighting
  – expected to be successful enough on its own
  – leverages min-cut placement
  – relies on a novel analytical delay minimization
Dimensionless Path-timing Objective (DPO)
  – novel global timing objective; generalizes slack optimization
New minimization algorithms
  – reweighting iteration: reduction to simpler MAX-based objective
  – MAX-based objective can be minimized very quickly
Ongoing work in the context of timing-driven flows

Future Work
Observation (how the proposed method works)
  – a classic placement approach is split into stages
  – a new timing optimization is performed between those stages
  – most critical wires/gates are found first (traditionally: placement is found first)
→ Try other types of optimizations during placement
  – routing of timing-critical nets
      better delay estimation
      early cross-talk detection?
  – sizing of timing-critical drivers
  – buffer insertion for timing-critical nets
  – early detection of dangerous cross-talk
→ Faster and cheaper ICs