EECS 527 Paper Presentation: Techniques for Fast Physical Synthesis

Presentation transcript:

VLSI Physical Design: From Graph Partitioning to Timing Closure, Chapter 5: Global Routing © KLMH Lienig
EECS 527 Paper Presentation

Techniques for Fast Physical Synthesis
By Charles J. Alpert, Shrirang K. Karandikar, Zhuo Li, Gi-Joon Nam, Stephen T. Quay, Haoxing Ren, C. N. Sze, Paul G. Villarrubia, and Mehmet C. Yildiz
Presented by Lingfeng Xu, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, 11/2011

Outline
- Introduction
- Buffering Trends
- Major Phases of Physical Synthesis
- Closer Look at Optimization
- Selected Techniques
  - Fast Timing-Driven Buffering
  - Layout Aware Buffer Trees
  - Diffusion Based Legalization
- Q&A

Introduction
- Purpose of physical synthesis
  - Timing closure
  - Physical synthesis
- Iterations
  - Iterate between manual design work and automatic physical synthesis
- Philosophy
  - As fast as possible, even if a little optimality is sacrificed
- IBM's physical synthesis tool
  - PDS (Placement-Driven Synthesis) system

Buffering Trends
- "Buffering Explosion"
  - Thinner wires mean higher resistance
  - Wire delays increasingly dominate gate delays
  - Saxena et al. [3] predict that half of all logic will consist of buffers
  - 20% to 25% of cells in today's 90 nm designs are buffers or inverters

[Figure: Percentage of block-level nets requiring repeaters [3]]
[Figure: Intra-block communication repeaters as a percentage of the total cell count for the block [3]]

Buffering Trends
- Challenges
  - Buffer insertion needs to be performed fast
  - Area and power
  - Layout awareness
  - Buffering constricts or seeds global routing

Major Phases of Physical Synthesis
- PDS stages
  - Initial placement and optimization
  - Timing-driven placement and optimization
  - Timing-driven detailed placement
  - Optimization techniques
  - Clock insertion and optimization
  - Routing and post-routing optimization
  - Early-mode timing optimization

Closer Look at Optimization
- Optimization phases
  - Electrical correction
  - Critical path optimization
  - Histogram compression
  - Legalization
- An example of physical synthesis breakdown
  [Figure: flow diagram of the four phases]
  - Phase 1: Initial Placement, Electrical Correction, Legalization, Critical Slack Optimization
  - Phase 2: Timing-driven Placement, Electrical Correction, Critical Slack Optimization, Legalization, Compression, Legalization
  - Phase 3: Timing-driven Detailed Placement
  - Phase 4: Electrical Correction, Legalization, Critical Slack Optimization, Legalization, Critical Slack Optimization, Legalization, Compression, Legalization

How to Achieve Fast Physical Synthesis?
- Selected techniques
  - Fast Timing-Driven Buffering
  - Layout Aware Buffer Trees
  - Diffusion Based Legalization

Fast Timing-Driven Buffering
- Motivation
  - Over a million buffers
  - Rebuffering rips up all buffers and reinserts them from scratch
- Considerations
  - Buffering resources vs. delay
  - Runtime
  - Slew, noise, and capacitance constraints

Fast Timing-Driven Buffering
- Classical Buffering Algorithm
  - Goal: maximize the RAT at the source
  - Dynamic programming
  - Candidate solutions are generated and propagated from the sinks to the source
  - Each candidate solution at an internal node is characterized by (q, c, w)
    - q: required arrival time
    - c: downstream load capacitance
    - w: accumulated cost of the buffer insertion decisions
  - Example: at a sink, (q = RAT, c = load capacitance, w = 0)

Fast Timing-Driven Buffering
- Classical Buffering Algorithm
  - Two solutions α1, α2
    - α2 dominates α1 if q2 ≥ q1, c2 ≤ c1, and w2 ≤ w1
    - α1 is then redundant and can be pruned
  - At the end of the algorithm
    - A set of solutions with different cost-RAT tradeoffs is obtained
    - Choose one in the middle
    - "10 ps rule": if the marginal RAT gain is more than 10 ps, choose the solution with the bigger RAT
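The dominance rule and the final selection step can be sketched in a few lines of Python. This is a minimal illustration, not the PDS implementation: the `Candidate` type, the function names, and the quadratic pruning loop are assumptions made for the example. The reading of the "10 ps rule" is also an assumption: scan the solutions in cost order and move to a costlier one only when it improves RAT by more than the threshold over the current pick.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    q: float  # required arrival time (ps)
    c: float  # downstream load capacitance
    w: float  # accumulated buffer cost


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b in q, c, and w."""
    return a.q >= b.q and a.c <= b.c and a.w <= b.w


def prune_dominated(cands: List[Candidate]) -> List[Candidate]:
    """Drop every candidate dominated by another one (simple O(n^2) version).

    Of several identical candidates, only the first is kept.
    """
    kept = []
    for i, cand in enumerate(cands):
        redundant = any(
            dominates(other, cand) and (other != cand or j < i)
            for j, other in enumerate(cands)
            if j != i
        )
        if not redundant:
            kept.append(cand)
    return kept


def pick_by_marginal_gain(cands: List[Candidate],
                          threshold_ps: float = 10.0) -> Candidate:
    """Assumed reading of the 10 ps rule: start from the cheapest solution and
    accept a costlier one only if it gains more than threshold_ps of RAT."""
    ordered = sorted(cands, key=lambda s: s.w)
    choice = ordered[0]
    for nxt in ordered[1:]:
        if nxt.q - choice.q > threshold_ps:
            choice = nxt
    return choice
```

A production implementation would keep the candidate lists sorted so that pruning is linear per node; the quadratic loop above is chosen only for readability.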

Fast Timing-Driven Buffering
- Prebuffer Slack Pruning (PSP)
  - Based on the current node being processed
  - If q2 < q1, c2 < c1, and (q2 - q1)/(c2 - c1) ≥ Rmin, then α2 is pruned early
  - An appropriate Rmin guarantees optimality; however, a larger value does not hurt solution quality

Fast Timing-Driven Buffering
- Squeeze Pruning
  - Three partial solutions α1, α2, α3 with the same cost
  - If (q2 - q1)/(c2 - c1) ≤ (q3 - q2)/(c3 - c2), then α2 is pruned
  - For a two-pin net, the middle solution is always dominated by either the first or the third solution; for a multi-sink net, optimality is not guaranteed, but most of the time the solution does not degrade
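The squeeze rule amounts to discarding a middle solution that lies on or below the line joining its two neighbours in the (c, q) plane. The sketch below is one way to apply it; the function name, the `(q, c)` tuple layout, and the stack-based repeated application of the rule are assumptions for illustration. The input is assumed to hold same-cost, non-redundant solutions sorted by strictly increasing c (and therefore increasing q).

```python
from typing import List, Tuple


def squeeze_prune(solutions: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Apply the squeeze rule to (q, c) pairs sorted by increasing c."""
    kept: List[Tuple[float, float]] = []
    for q3, c3 in solutions:
        while len(kept) >= 2:
            (q1, c1), (q2, c2) = kept[-2], kept[-1]
            # (q2-q1)/(c2-c1) <= (q3-q2)/(c3-c2), cross-multiplied to avoid
            # division; both capacitance differences are positive.
            if (q2 - q1) * (c3 - c2) <= (q3 - q2) * (c2 - c1):
                kept.pop()  # the middle solution is squeezed out
            else:
                break
        kept.append((q3, c3))
    return kept
```

For example, `squeeze_prune([(10, 1), (11, 2), (20, 3)])` drops the middle pair because its slope from the first (1) is no larger than the slope to the third (9).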

Fast Timing-Driven Buffering
- Library Lookup
  - Classically, every buffer in the library is examined at each candidate position
    - With m kinds of buffers and inverters and n nodes, there are mn candidate solutions in total
  - However, many candidate solutions are not worth considering
  - Pre-compute a buffer table and an inverter table
  - This leaves 2n candidate solutions: n with inverters and n with buffers
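One plausible way to build such a table is to precompute, for each discretized load capacitance, the library cell with the smallest delay, once for buffers and once for inverters. The sketch below assumes a linear delay model (intrinsic delay plus output resistance times load); the `Cell` type, the cell parameters, and the table granularity are hypothetical and not taken from the slides.

```python
from typing import List, NamedTuple


class Cell(NamedTuple):
    name: str
    intrinsic: float   # intrinsic delay (assumed linear model)
    resistance: float  # output resistance


def delay(cell: Cell, c_load: float) -> float:
    """Assumed linear delay model: intrinsic + resistance * load."""
    return cell.intrinsic + cell.resistance * c_load


def build_table(cells: List[Cell], c_step: float, n_steps: int) -> List[Cell]:
    """Entry i holds the fastest cell for a load of i * c_step."""
    return [min(cells, key=lambda cell: delay(cell, i * c_step))
            for i in range(n_steps + 1)]


def lookup(table: List[Cell], c_step: float, c_load: float) -> Cell:
    """Return the precomputed cell for the table entry nearest to c_load."""
    i = min(int(round(c_load / c_step)), len(table) - 1)
    return table[i]
```

With one table for buffers and one for inverters, each node tries only two cells instead of all m, which matches the 2n count on the slide.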

Fast Timing-Driven Buffering
- Results and Summary
  - Derived from 5000 high-capacitance nets from an ASIC chip
  - 3% quality degradation and 20x speedup
  - Philosophy: as fast as possible, even if a little optimality is sacrificed
  - Rip-up and rebuffering with more accurate techniques can be performed later if desired

Layout Aware Fast and Flexible Buffer Trees
- Layout problems in buffering
  - (a) Alley
  - (b) Pile-ups
  - Holes in large blocks
- Layout constraints
  - Holes in large blocks
  - Navigating blocks and dense regions
  - Critical and non-critical routes
  - Avoiding routing congestion

Layout Aware Fast and Flexible Buffer Trees
- Layout aware buffer tree flow
  - Step 1: Construct a fast timing-driven Steiner tree
  - Step 2: Reroute the Steiner tree to preserve its topology while navigating environmental constraints
  - Step 3: Insert buffers (e.g., with Fast Timing-Driven Buffering)
- This work focuses on Step 2

Layout Aware Fast and Flexible Buffer Trees
- Algorithm
  - Break the existing Steiner tree into disjoint 2-paths, i.e., paths that start and end at a source, a sink, or a Steiner point
  - Each 2-path is routed in turn to minimize cost, starting from the sinks and ending at the source
  - Maze routing is used for each 2-path, guided by a cost function
  - If a Steiner point is in a congested region, move it within a specified "plate region"
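The 2-path decomposition can be illustrated with a short Python sketch. The tree representation (a `children` dictionary rooted at the source), the explicit `endpoints` set, and the function name are assumptions for the example; intermediate nodes that are not endpoints are assumed to have exactly one child.

```python
from typing import Dict, List, Set


def two_paths(children: Dict[str, List[str]], root: str,
              endpoints: Set[str]) -> List[List[str]]:
    """Split a rooted tree into paths whose end nodes are in `endpoints`
    (source, sinks, Steiner points). Paths are listed source-side first."""
    paths = []
    stack = [root]
    while stack:
        start = stack.pop()
        for child in children.get(start, []):
            path = [start, child]
            node = child
            # Walk through bend points until the next endpoint is reached.
            while node not in endpoints:
                (node,) = children[node]  # exactly one child assumed
                path.append(node)
            paths.append(path)
            stack.append(node)
    return paths
```

Because a path is always recorded before any path that starts at its far end, iterating over `reversed(two_paths(...))` visits the 2-paths from the sinks toward the source, the order in which the slide says they are routed.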


Layout Aware Fast and Flexible Buffer Trees
- General maze routing cost function
  - Tradeoff parameter: 0 ≤ K ≤ 1
  - Tile cost: cost(t) = 1 + K·e(t)
  - Merging branches: cost(t) = max(cost(L), cost(R)) + K·min(cost(L), cost(R))
  - Sink initialization: cost(s) = (K - 1)·RAT(s)/DpT
  - Use K = 1 for electrical correction; use K = 0.1 for critical paths
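The three cost expressions can be written down directly to see how K moves the objective between total cost and critical-path cost. The function names are made up for this sketch. The slide does not define e(t); it is assumed here to be an environmental (congestion or density) cost of tile t, and DpT is taken as the delay-per-tile constant from the sink formula.

```python
def tile_cost(e_t: float, k: float) -> float:
    """Cost of stepping into tile t: cost(t) = 1 + K * e(t).
    e_t is assumed to be the tile's environmental cost."""
    return 1.0 + k * e_t


def merge_cost(cost_left: float, cost_right: float, k: float) -> float:
    """Cost at a branch point: max(L, R) + K * min(L, R).
    K = 1 gives the sum of both branches, K = 0 only the worse branch."""
    return max(cost_left, cost_right) + k * min(cost_left, cost_right)


def sink_cost(rat: float, delay_per_tile: float, k: float) -> float:
    """Initial cost at a sink: (K - 1) * RAT(s) / DpT.
    With K = 1 the sink's RAT is ignored; smaller K weights it more."""
    return (k - 1.0) * rat / delay_per_tile
```

With K = 1, `merge_cost(3, 5, 1)` is 8 (pure cost accumulation, as used for electrical correction), while K = 0 would return 5, so only the more expensive branch matters.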

Layout Aware Fast and Flexible Buffer Trees
- Example and Summary
  - A 7-pin net of an industrial design
  - (a) K = 1.0: 4134 ps slack improvement
  - (b) K = 0.1: 4646 ps slack improvement

Diffusion-Based Placement Techniques for Legalization
- Classical legalization
  - After optimization, local regions can be overfull
  - Legalization is run periodically to snap overlapping cells to legal positions
  - If one waits too long between two legalizations, cells may end up quite far from their optimal positions, which may severely hurt timing
- Diffusion-Based Legalization
  - Avoids moving cells too far away
  - Fast: runs in minutes on designs with millions of gates

Diffusion-Based Placement Techniques for Legalization
- Diffusion as a Physical Process
  - Moves elements from a state with non-zero potential energy to a state of equilibrium
  - Can be modeled by breaking it down into finite time steps
  - Relationship of material concentration with time and space

Diffusion-Based Placement Techniques for Legalization
- Diffusion as a Physical Process
  - Cell velocity
  - Cell new location

Diffusion-Based Placement Techniques for Legalization
- Diffusion Based Placement
  - Coordinates are scaled so that the width and height of each bin is one
  - Location (x, y) lies in bin (⌊x⌋, ⌊y⌋)
  - Forward Time Centered Space (FTCS) scheme
    - New bin density
    - Bin velocity

Diffusion-Based Placement Techniques for Legalization
- Diffusion Based Placement
  - Enforce vV = 0 at horizontal boundaries and vH = 0 at vertical boundaries
  - Two cells right next to each other can be assigned very different velocities, which could change their relative ordering. Velocity interpolation based on the four closest bins is applied to remedy this behavior
  - New locations (x, y) for the next time step
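The equations for these slides are not preserved in the transcript, so the sketch below falls back on textbook forms and should be read as an assumption rather than the presented formulas: the standard FTCS discretization of the diffusion equation on unit bins, a bin velocity equal to the negative centered density gradient divided by the bin density, and zero normal velocity at the chip boundary. All function names and the `alpha` parameter (diffusivity times time step) are made up for the example, and the four-bin velocity interpolation is left out for brevity.

```python
from typing import List, Tuple

Grid = List[List[float]]  # d[j][k]: density of the bin in column j, row k


def ftcs_step(d: Grid, alpha: float) -> Grid:
    """One FTCS update of the bin densities.

    alpha = diffusivity * time step; alpha <= 0.25 keeps the explicit
    scheme stable on a 2-D grid with unit spacing.
    """
    nx, ny = len(d), len(d[0])

    def at(j: int, k: int) -> float:
        # Clamp indices so an off-chip neighbour mirrors the edge bin,
        # which means no density flows across the boundary.
        return d[min(max(j, 0), nx - 1)][min(max(k, 0), ny - 1)]

    return [[d[j][k] + alpha * (at(j + 1, k) + at(j - 1, k)
                                + at(j, k + 1) + at(j, k - 1) - 4 * d[j][k])
             for k in range(ny)] for j in range(nx)]


def bin_velocity(d: Grid, j: int, k: int) -> Tuple[float, float]:
    """Velocity of bin (j, k): minus the centered gradient over the density.
    The normal component is forced to zero on the chip boundary."""
    nx, ny = len(d), len(d[0])
    if d[j][k] == 0:
        return 0.0, 0.0
    vx = 0.0 if j in (0, nx - 1) else -(d[j + 1][k] - d[j - 1][k]) / (2 * d[j][k])
    vy = 0.0 if k in (0, ny - 1) else -(d[j][k + 1] - d[j][k - 1]) / (2 * d[j][k])
    return vx, vy


def move_cell(d: Grid, x: float, y: float, dt: float) -> Tuple[float, float]:
    """Advance a cell by its bin's velocity (nearest bin, no interpolation)."""
    j = min(int(x), len(d) - 1)
    k = min(int(y), len(d[0]) - 1)
    vx, vy = bin_velocity(d, j, k)
    return x + vx * dt, y + vy * dt
```

In a full loop, cells would be moved using the velocities of the current density map, and the map would then be advanced with `ftcs_step` before the next time step.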

Diffusion-Based Placement Techniques for Legalization
- Diffusion Based Placement: getting it to work
  - The diffusion process reaches equilibrium when every bin has the same density, i.e., the average density. This can cause unnecessary spreading, even if every bin's density is well below dmax
  - Idea: run diffusion only for regions that require it
  - Local diffusion: run diffusion on cells in a window around bins that violate the target density constraint
  - If the FTCS error exceeds a certain threshold, update the density based on the real cell placement and restart the diffusion algorithm

Diffusion-Based Placement Techniques for Legalization
- Example
  - [Figure: placement before legalization, after traditional legalization, and after diffusion legalization]
  - 4% total wire length saved
  - 48% worst slack improvement
  - 36% fewer negative paths
- Summary
  - Diffusion-based legalization is less likely to disrupt the state of the design

Summary
- Buffering trends
  - "Buffering Explosion"
- Physical synthesis phases
  - 4 phases
- Fast Timing-Driven Buffering
- Layout Aware Buffer Trees
- Diffusion-Based Legalization

Thanks! Q&A