
Presentation transcript:

Advanced Interconnect Optimizations

Buffers Improve Slack
Before buffering:
– Sink 1: RAT = 300, Delay = 350, Slack = -50
– Sink 2: RAT = 700, Delay = 600, Slack = 100
– slack_min = -50
After buffering:
– Sink 1: RAT = 300, Delay = 250, Slack = 50
– Sink 2: RAT = 700, Delay = 400, Slack = 300
– slack_min = 50
The buffer decouples capacitive load from the critical path.
RAT = Required Arrival Time
Slack = RAT - Delay
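The slack arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes the first two (RAT, Delay) pairs describe the net before buffering and the last two describe it after buffering.

```python
def slack(rat, delay):
    """Slack = RAT - Delay, as defined on the slide."""
    return rat - delay


def min_slack(sinks):
    """Worst (minimum) slack over a list of (rat, delay) pairs."""
    return min(slack(rat, delay) for rat, delay in sinks)


# (RAT, Delay) per sink; the before/after grouping is an assumption.
before_buffering = [(300, 350), (700, 600)]
after_buffering = [(300, 250), (700, 400)]

print(min_slack(before_buffering))  # -50
print(min_slack(after_buffering))   # 50
```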

Timing Driven Buffering Problem Formulation
Given
– A Steiner tree
– RAT at each sink
– A buffer type
– RC parameters
– Candidate buffer locations
Find a buffer insertion solution such that the slack at the driver is maximized.

Candidate Buffering Solutions

Candidate Solution Characteristics
Each candidate solution is associated with
– v_i: a node
– c_i: downstream capacitance
– q_i: RAT
When v_i is a sink, c_i is the sink capacitance. A candidate may also sit at an internal node v.

Van Ginneken’s Algorithm
– Candidate solutions are propagated toward the source
– Dynamic programming

Solution Propagation: Add Wire
A wire of length x extends candidate (v_1, c_1, q_1) to (v_2, c_2, q_2):
c_2 = c_1 + c x
q_2 = q_1 - r c x^2 / 2 - r x c_1
r: wire resistance per unit length
c: wire capacitance per unit length
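The wire update follows from the Elmore delay of the added segment. The derivation below is a sketch that assumes the wire is lumped into a single pi-segment (half of its capacitance at each end); the names R_wire, C_wire and d_wire are introduced here for illustration only.

```latex
% Assumption: wire of length x modeled as one pi-segment.
\begin{align*}
R_{\text{wire}} &= r x, \qquad C_{\text{wire}} = c x \\
d_{\text{wire}} &= R_{\text{wire}} \left( \frac{C_{\text{wire}}}{2} + c_1 \right)
                 = \frac{r c x^2}{2} + r x c_1 \\
c_2 &= c_1 + C_{\text{wire}} = c_1 + c x \\
q_2 &= q_1 - d_{\text{wire}} = q_1 - \frac{r c x^2}{2} - r x c_1
\end{align*}
% Numeric instance with r = c = 1, x = 2, (c_1, q_1) = (1, 20):
% d_wire = 2 + 2 = 4, so (c_2, q_2) = (3, 16).
```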

Solution Propagation: Insert Buffer
A buffer at v_1 turns candidate (v_1, c_1, q_1) into (v_1, c_1b, q_1b):
c_1b = C_b
q_1b = q_1 - R_b c_1 - t_b
C_b: buffer input capacitance
R_b: buffer output resistance
t_b: buffer intrinsic delay

Solution Propagation: Merge
Left candidate (v, c_l, q_l) and right candidate (v, c_r, q_r) merge at node v:
c_merge = c_l + c_r
q_merge = min(q_l, q_r)

Solution Propagation: Add Driver
The driver turns candidate (v_0, c_0, q_0) into (v_0, c_0d, q_0d):
q_0d = q_0 - R_d c_0 = slack_min
R_d: driver resistance
Pick the solution with the maximum slack_min.
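A minimal sketch of the driver step, assuming candidates are stored as (c, q) tuples. The helper names are hypothetical and not taken from the slides.

```python
def driver_slack(cand, r_d):
    """Slack at the driver for one candidate: q_0 - R_d * c_0."""
    cap, q = cand
    return q - r_d * cap


def pick_best(cands, r_d):
    """Return the candidate that maximizes the slack seen at the driver."""
    return max(cands, key=lambda cand: driver_slack(cand, r_d))


# Two candidates with equal q: the smaller load wins once R_d is applied.
candidates = [(5, 8), (3, 8)]
best = pick_best(candidates, r_d=1)
print(best, driver_slack(best, r_d=1))
```

A candidate with the largest q is not automatically the winner; the driver resistance penalizes load, which is why both c and q are carried all the way to the source.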

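Example of Solution Propagation
Parameters: r = 1, c = 1; R_b = 1, C_b = 1, t_b = 1; R_d = 1; each wire segment has length 2.
Start at the sink: (v_1, 1, 20)
Add wire: (v_2, 3, 16)
Insert buffer: (v_2, 1, 12)
Add wire: (v_2, 3, 16) becomes (v_3, 5, 8); (v_2, 1, 12) becomes (v_3, 3, 8)
Add driver: (v_3, 5, 8) gives slack = 3; (v_3, 3, 8) gives slack = 5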
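The numbers in this example can be reproduced by chaining the three propagation rules. The sketch below assumes each wire segment has length 2 (inferred from the candidate values, not stated explicitly), and the function names are hypothetical.

```python
R, C = 1, 1               # wire resistance / capacitance per unit length
R_B, C_B, T_B = 1, 1, 1   # buffer output resistance, input cap, intrinsic delay
R_D = 1                   # driver resistance
SEG = 2                   # assumed wire segment length


def add_wire(cand, x):
    cap, q = cand
    return (cap + C * x, q - R * C * x * x / 2 - R * x * cap)


def insert_buffer(cand):
    cap, q = cand
    return (C_B, q - R_B * cap - T_B)


def driver_slack(cand):
    cap, q = cand
    return q - R_D * cap


sink = (1, 20)
at_v2 = add_wire(sink, SEG)              # unbuffered candidate at v_2
at_v2_buf = insert_buffer(at_v2)         # buffered candidate at v_2
at_v3 = [add_wire(at_v2, SEG), add_wire(at_v2_buf, SEG)]
slacks = [driver_slack(cand) for cand in at_v3]
print(at_v2, at_v2_buf, at_v3, slacks)
```

The unbuffered path ends with slack 3 and the buffered path with slack 5, so the buffered solution is the one selected at the driver.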

Example of Merging
[Figure: left candidates, right candidates, merged candidates]

Solution Pruning
Two candidate solutions
– (v, c_1, q_1)
– (v, c_2, q_2)
Solution 1 is inferior if
– c_1 > c_2: larger load
– and q_1 < q_2: tighter timing
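The inferiority test can be written as a small predicate. This sketch makes one assumption beyond the slide: ties are also treated as inferior (c_1 >= c_2 and q_1 <= q_2, with the two candidates not identical), since a candidate with the same load and worse timing is never useful. The function name is hypothetical.

```python
def is_inferior(a, b):
    """True if candidate a = (c, q) is dominated by candidate b.

    Assumption: non-strict comparison, so equal load with worse timing
    (or equal timing with larger load) also counts as inferior.
    """
    return a != b and a[0] >= b[0] and a[1] <= b[1]


print(is_inferior((3, 10), (2, 12)))  # larger load and tighter timing
print(is_inferior((2, 12), (3, 10)))  # the dominating candidate is kept
print(is_inferior((3, 14), (2, 12)))  # trade-off: neither dominates
```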

Pruning When Inserting a Buffer
All buffered candidates at a node have the same load capacitance C_b, so only the one with the maximum q is kept.
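A sketch of this rule, assuming candidates are (c, q) tuples and a single buffer type: every existing candidate is tentatively buffered, and because all results share the load C_b, only the best q survives. The function name is hypothetical.

```python
def best_buffered(cands, r_b, c_b, t_b):
    """Buffer every candidate and keep only the one with the largest q.

    All buffered candidates present the same load c_b upstream, so the
    others would be pruned as inferior anyway.
    """
    return (c_b, max(q - r_b * cap - t_b for cap, q in cands))


# Buffering (5, 8) gives q = 2; buffering (3, 8) gives q = 4.
print(best_buffered([(5, 8), (3, 8)], r_b=1, c_b=1, t_b=1))
```

This is why inserting a buffer adds only one new candidate per node, regardless of how many candidates were already there.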

Generating Candidates
[Figure: candidate generation, steps (1) to (3)]
From Dr. Charles Alpert

Pruning Candidates
[Figure: candidates (a) and (b) at step (3), pruned result at step (4)]
Both (a) and (b) “look” the same to the source. Throw out the one with the worst slack.

Candidate Example Continued
[Figure: candidates at steps (4) and (5)]

Candidate Example Continued
[Figure: candidates at step (5) after pruning]
At the driver, compute which candidate maximizes slack. The result is optimal.

Merging Branches
[Figure: left candidates and right candidates]

Pruning Merged Branches
[Figure: merged candidates with pruning; the critical branch is marked]

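Van Ginneken Example
Candidates are written as (c, q).
Start: (20, 400)
Wire, C = 10, d = 150: (20, 400) becomes (30, 250)
Buffer, C = 5, d = 30: (30, 250) becomes (5, 220)
Wire, C = 15: (30, 250) becomes (45, 50) with d = 200; (5, 220) becomes (20, 100) with d = 120
Buffer, C = 5: (45, 50) becomes (5, 0) with d = 50; (20, 100) becomes (5, 70) with d = 30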

Van Ginneken Example Cont’d
Candidates: (45, 50), (5, 0), (20, 100), (5, 70)
(5, 0) is inferior to (5, 70). (45, 50) is inferior to (20, 100).
Remaining candidates: (20, 100), (5, 70)
Wire, C = 10: (20, 100) becomes (30, 10); (5, 70) becomes (15, -10)
Pick the solution with the largest slack, then follow the arrows back to recover the solution.
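The pruning and final selection in this example can be replayed in code. This is a sketch under two assumptions: ties count as inferior (needed for (5, 0) versus (5, 70)), and the final wire delays of 90 and 80 are inferred from the before and after q values rather than stated on the slide. All names are hypothetical.

```python
def prune(cands):
    """Drop every (c, q) candidate dominated by another one (quadratic, for clarity)."""
    def inferior(a, b):
        return a != b and a[0] >= b[0] and a[1] <= b[1]
    return [a for a in cands if not any(inferior(a, b) for b in cands)]


stage = [(45, 50), (5, 0), (20, 100), (5, 70)]
kept = prune(stage)

# Final wire with C = 10; per-candidate delays are inferred, not given.
inferred_delays = (90, 80)
final = [(cap + 10, q - d) for (cap, q), d in zip(kept, inferred_delays)]
best = max(final, key=lambda cand: cand[1])
print(kept, final, best)
```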

Basic Data Structure
(c_1, q_1), (c_2, q_2), (c_3, q_3)
Sorted list such that c_1 < c_2 < c_3
If there are no inferior candidates, then q_1 < q_2 < q_3
Moving right along the list: worse load cap, better timing

Prune Solution List
(c_1, q_1), (c_2, q_2), (c_3, q_3), (c_4, q_4), in order of increasing c
[Figure: decision tree. Each test q_i < q_j compares the last kept candidate i with the next candidate j in the list. On Y, candidate j is kept and becomes the new reference; on N, candidate j is pruned.]
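One way to realize this single-pass pruning is sketched below. It assumes candidates are (c, q) tuples; the sort is included only so the function also accepts unsorted input and breaks ties on c in favor of the larger q. If the list is already sorted, the scan itself is linear. The function name is hypothetical.

```python
def prune_sorted(cands):
    """Keep only non-inferior candidates, in order of increasing c.

    After sorting by c, a candidate survives only if its q is strictly
    larger than the q of the last candidate kept so far.
    """
    kept = []
    for cap, q in sorted(cands, key=lambda t: (t[0], -t[1])):
        if not kept or q > kept[-1][1]:
            kept.append((cap, q))
    return kept


print(prune_sorted([(45, 50), (5, 0), (20, 100), (5, 70)]))
```

Because the kept list has increasing c and increasing q, comparing against only its last element is enough; no earlier element can dominate a candidate that the last one does not.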

Pruning In Merging
Left candidates: (c_l1, q_l1), (c_l2, q_l2), (c_l3, q_l3)
Right candidates: (c_r1, q_r1), (c_r2, q_r2)
q_l1 < q_l2 < q_r1 < q_l3 < q_r2
Merged candidates: (c_l1 + c_r1, q_l1), (c_l2 + c_r1, q_l2), (c_l3 + c_r1, q_r1), (c_l3 + c_r2, q_l3)
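The merged list on this slide can be produced with a two-pointer walk over the two sorted lists. The sketch below assumes both inputs are already pruned and sorted by increasing c (and therefore increasing q); the function name and the numeric values are made up to match the ordering q_l1 < q_l2 < q_r1 < q_l3 < q_r2.

```python
def merge_branches(left, right):
    """Merge two pruned, sorted (c, q) lists in O(len(left) + len(right)).

    At each step the pair is combined, then the side whose q limits the
    merged q is advanced, since pairing it with a larger load on the
    other side could only produce an inferior candidate.
    """
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        (cl, ql), (cr, qr) = left[i], right[j]
        merged.append((cl + cr, min(ql, qr)))
        if ql <= qr:
            i += 1
        if qr <= ql:
            j += 1
    return merged


left = [(1, 10), (2, 20), (3, 40)]   # q_l1 < q_l2 < q_l3
right = [(1, 30), (2, 50)]           # q_r1 < q_r2
print(merge_branches(left, right))
```

The result has at most len(left) + len(right) - 1 candidates, which is the additive (not multiplicative) growth that keeps the overall algorithm polynomial.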

Van Ginneken Complexity
Generate candidates from sinks to source
Quadratic runtime
– Adding a wire does not change the number of candidates
– Adding a buffer adds only one new candidate
– Merging branches is additive, not multiplicative
– Linear-time solution list pruning
Optimal for the Elmore delay model
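To see how the candidate count stays small, here is a sketch of the dynamic program restricted to a two-pin net (a single path with no merging). It assumes one buffer type, candidate buffer locations only between consecutive wire segments, and (c, q) tuples for candidates. All names are hypothetical, and this is not the full tree algorithm.

```python
def prune_sorted(cands):
    """Keep non-inferior (c, q) candidates, sorted by increasing c."""
    kept = []
    for cap, q in sorted(cands, key=lambda t: (t[0], -t[1])):
        if not kept or q > kept[-1][1]:
            kept.append((cap, q))
    return kept


def van_ginneken_path(sink, segments, r, c, r_b, c_b, t_b, r_d):
    """Return (best driver slack, candidate count after each segment).

    sink: (cap, rat); segments: wire lengths ordered from sink to source.
    """
    cands = [sink]
    sizes = []
    for k, x in enumerate(segments):
        # Add wire: one updated candidate per existing candidate.
        cands = [(cap + c * x, q - r * c * x * x / 2 - r * x * cap)
                 for cap, q in cands]
        # Insert buffer (not at the driver itself): at most one new candidate.
        if k < len(segments) - 1:
            cands.append((c_b, max(q - r_b * cap - t_b for cap, q in cands)))
        cands = prune_sorted(cands)
        sizes.append(len(cands))
    best = max(q - r_d * cap for cap, q in cands)
    return best, sizes


# Unit parameters, sink (1, 20), and two wire segments of length 2.
print(van_ginneken_path((1, 20), [2, 2], 1, 1, 1, 1, 1, 1))
```

With n candidate buffer locations the list never holds more than n + 1 candidates, and each location costs time linear in the list length, which is where the quadratic bound comes from.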

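Multiple Buffer Types
Parameters: r = 1, c = 1; R_b1 = 1, C_b1 = 1, t_b1 = 1; R_b2 = 0.5, C_b2 = 2, t_b2 = 0.5; R_d = 1; the wire segment has length 2.
Start at the sink: (v_1, 1, 20)
Add wire: (v_2, 3, 16)
Insert buffer type 1: (v_2, 1, 12)
Insert buffer type 2: (v_2, 2, 14)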
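With several buffer types, each type contributes its own candidate at a buffer location. The sketch below reproduces the two buffered candidates from this slide, assuming buffers are given as (R_b, C_b, t_b) tuples; the function name is hypothetical.

```python
def insert_buffers(cand, buffers):
    """Return one buffered (c, q) candidate per buffer type.

    buffers: list of (r_b, c_b, t_b) tuples, one per type.
    """
    cap, q = cand
    return [(c_b, q - r_b * cap - t_b) for r_b, c_b, t_b in buffers]


buffer_types = [(1, 1, 1), (0.5, 2, 0.5)]
unbuffered = (3, 16)   # candidate at v_2 after the wire
print(insert_buffers(unbuffered, buffer_types))
```

Neither buffered candidate dominates the other here: type 1 presents less load (c = 1) while type 2 leaves more timing margin (q = 14), so both are kept alongside the unbuffered candidate.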