EE4271 VLSI Design Interconnect Optimizations Buffer Insertion.

Slides:



Advertisements
Similar presentations
Porosity Aware Buffered Steiner Tree Construction C. Alpert G. Gandham S. Quay IBM Corp M. Hrkic Univ Illinois Chicago J. Hu Texas A&M Univ.
Advertisements

Topics Electrical properties of static combinational gates:
Gregory Shklover, Ben Emanuel Intel Corporation MATAM, Haifa 31015, Israel Simultaneous Clock and Data Gate Sizing Algorithm with Common Global Objective.
Advanced Interconnect Optimizations. Buffers Improve Slack RAT = 300 Delay = 350 Slack = -50 RAT = 700 Delay = 600 Slack = 100 RAT = 300 Delay = 250 Slack.
EE141 © Digital Integrated Circuits 2nd Wires 1 The Wires Dr. Shiyan Hu Office: EERC 731 Adapted and modified from Digital Integrated Circuits: A Design.
Buffer and FF Insertion Slides from Charles J. Alpert IBM Corp.
ELEN 468 Lecture 261 ELEN 468 Advanced Logic Design Lecture 26 Interconnect Timing Optimization.
1 Modeling and Optimization of VLSI Interconnect Lecture 9: Multi-net optimization Avinoam Kolodny Konstantin Moiseev.
Confidentiality/date line: 13pt Arial Regular, white Maximum length: 1 line Information separated by vertical strokes, with two spaces on either side Disclaimer.
1 Interconnect Layout Optimization by Simultaneous Steiner Tree Construction and Buffer Insertion Presented By Cesare Ferri Takumi Okamoto, Jason Kong.
EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
A Look at Chapter 4: Circuit Characterization and Performance Estimation Knowing the source of delays in CMOS gates and being able to estimate them efficiently.
Fall 06, Sep 19, 21 ELEC / Lecture 6 1 ELEC / (Fall 2005) Special Topics in Electrical Engineering Low-Power Design of Electronic.
Minimum-Buffered Routing of Non- Critical Nets for Slew Rate and Reliability Control Supported by Cadence Design Systems, Inc. and the MARCO Gigascale.
Modern VLSI Design 2e: Chapter4 Copyright  1998 Prentice Hall PTR.
Interconnect Optimizations. A scaling primer Ideal process scaling: –Device geometries shrink by  = 0.7x) Device delay shrinks by  –Wire geometries.
Interconnect Optimizations. A scaling primer Ideal process scaling: –Device geometries shrink by S  = 0.7x) Device delay shrinks by s –Wire geometries.
04/11/02EECS 3121 Lecture 26: Interconnect Modeling, continued EECS 312 Reading: 8.2.2, (text) HW 8 is due now!
04/09/02EECS 3121 Lecture 25: Interconnect Modeling EECS 312 Reading: 8.3 (text), 4.3.2, (2 nd edition)
UCLA TRIO Package Jason Cong, Lei He Cheng-Kok Koh, and David Z. Pan Cheng-Kok Koh, and David Z. Pan UCLA Computer Science Dept Los Angeles, CA
Interconnect Optimizations
Outline Noise Margins Transient Analysis Delay Estimation
Fast Buffer Insertion Considering Process Variation Jinjun Xiong, Lei He EE Department University of California, Los Angeles Sponsors: NSF, UC MICRO, Actel,
EE4271 VLSI Design Advanced Interconnect Optimizations Buffer Insertion.
ELEN 468 Lecture 271 ELEN 468 Advanced Logic Design Lecture 27 Interconnect Timing Optimization II.
Introduction to CMOS VLSI Design Lecture 21: Scaling and Economics Credits: David Harris Harvey Mudd College (Material taken/adapted from Harris’ lecture.
Interconnect Synthesis. Buffering Related Interconnect Synthesis Consider –Layer assignment –Wire sizing –Buffer polarity –Driver sizing –Generalized.
Advanced Interconnect Optimizations. Timing Driven Buffering Problem Formulation Given –A Steiner tree –RAT at each sink –A buffer type –RC parameters.
Practical Aspects of Logic Gates COE 202 Digital Logic Design Dr. Aiman El-Maleh College of Computer Sciences and Engineering King Fahd University of Petroleum.
1 Delay Estimation Most digital designs have multiple data paths some of which are not critical. The critical path is defined as the path the offers the.
Modern VLSI Design 4e: Chapter 4 Copyright  2008 Wayne Wolf Topics n Interconnect design. n Crosstalk. n Power optimization.
EE141 © Digital Integrated Circuits 2nd Wires 1 The Wires Dr. Shiyan Hu Office: EERC 731 Adapted and modified from Digital Integrated Circuits: A Design.
Review: CMOS Inverter: Dynamic
EE 5900 Advanced Algorithms for Robust VLSI CAD, Spring 2009 Static Timing Analysis and Gate Sizing.
Elmore Delay, Logical Effort
A Polynomial Time Approximation Scheme For Timing Constrained Minimum Cost Layer Assignment Shiyan Hu*, Zhuo Li**, Charles J. Alpert** *Dept of Electrical.
An Efficient Clustering Algorithm For Low Power Clock Tree Synthesis Rupesh S. Shelar Enterprise Microprocessor Group Intel Corporation, Hillsboro, OR.
Thermal-aware Steiner Routing for 3D Stacked ICs M. Pathak and S.K. Lim Georgia Institute of Technology ICCAD 07.
Modern VLSI Design 3e: Chapter 4 Copyright  1998, 2002 Prentice Hall PTR Topics n Interconnect design. n Crosstalk. n Power optimization.
A Faster Approximation Scheme for Timing Driven Minimum Cost Layer Assignment Shiyan Hu*, Zhuo Li**, and Charles J. Alpert** *Dept of ECE, Michigan Technological.
ELEN 468 Lecture 271 ELEN 468 Advanced Logic Design Lecture 27 Gate and Interconnect Optimization.
Linear Delay Model In general the propagation delay of a gate can be written as: d = f + p –p is the delay due to intrinsic capacitance. –f is the effort.
Fast Algorithms for Slew Constrained Minimum Cost Buffering S. Hu*, C. Alpert**, J. Hu*, S. Karandikar**, Z. Li*, W. Shi* and C. Sze** *Dept of ECE, Texas.
© Digital Integrated Circuits 2nd Inverter EE5900 Advanced Algorithms for Robust VLSI CAD The Inverter Dr. Shiyan Hu Office: EERC 731 Adapted.
Physical Synthesis Buffer Insertion, Gate Sizing, Wire Sizing,
Modern VLSI Design 4e: Chapter 3 Copyright  2008 Wayne Wolf Topics n Wire delay. n Buffer insertion. n Crosstalk. n Inductive interconnect. n Switch logic.
EE 4271 VLSI Design, Fall 2013 Static Timing Analysis and Gate Sizing Optimization.
Introduction to Clock Tree Synthesis
An Efficient Surface-Based Low-Power Buffer Insertion Algorithm
Modern VLSI Design 3e: Chapter 3 Copyright  1998, 2002 Prentice Hall PTR Topics n Wire delay. n Buffer insertion. n Crosstalk. n Inductive interconnect.
Modern VLSI Design 3e: Chapter 3 Copyright  1998, 2002 Prentice Hall PTR Topics n Electrical properties of static combinational gates: –transfer characteristics;
EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion Shiyan Hu*, Zhuo Li**, Charles Alpert** *Dept of Electrical.
Effects of Inductance on the Propagation Delay and Repeater Insertion in VLSI Circuits Yehea I. Ismail and Eby G. Friedman, Fellow, IEEE.
A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion Shiyan Hu*, Zhuo Li**, Charles Alpert** *Dept of Electrical.
1 Modeling and Optimization of VLSI Interconnect Lecture 2: Interconnect Delay Modeling Avinoam Kolodny Konstantin Moiseev.
An O(bn 2 ) Time Algorithm for Optimal Buffer Insertion with b Buffer Types Authors: Zhuo Li and Weiping Shi Presenter: Sunil Khatri Department of Electrical.
Circuit Delay Performance Estimation Most digital designs have multiple signal paths and the slowest one of these paths is called the critical path Timing.
An O(nm) Time Algorithm for Optimal Buffer Insertion of m Sink Nets Zhuo Li and Weiping Shi {zhuoli, Texas A&M University College Station,
Unified Adaptivity Optimization of Clock and Logic Signals Shiyan Hu and Jiang Hu Dept of Electrical and Computer Engineering Texas A&M University.
COE 360 Principles of VLSI Design Delay. 2 Definitions.
Wires & wire delay Lecture 9 Tuesday September 27, 2016.
The Interconnect Delay Bottleneck.
Static Timing Analysis and Gate Sizing Optimization
Topics Driving long wires..
Static Timing Analysis and Gate Sizing Optimization
Buffered tree construction for timing optimization, slew rate, and reliability control Abstract: With the rapid scaling of IC technology, buffer insertion.
Wire Indctance Consequences of on-chip inductance include:
Objectives What have we learned? What are we going to learn?
Performance-Driven Interconnect Optimization Charlie Chung-Ping Chen
Presentation transcript:

EE4271 VLSI Design Interconnect Optimizations Buffer Insertion

Moore’s law Twice the number of transistors, approximately every two years, so double clock frequency accordingly

Source: Gordon Moore, Chairman Emeritus, Intel Corp Technology generation (  m ) Delay (psec) Transistor/Gate delay Interconnect delay Interconnects Dominate This is why Moore’s law is not true anymore.

Objectives What have we learned? –Compute circuit delay on wires and gates –Gate delay optimization What are we going to learn? –Interconnect delay optimization: buffer insertion Why reducing delay How to perform it –This is the most important optimization in circuit design

Source: Gordon Moore, Chairman Emeritus, Intel Corp Technology generation (  m ) Delay (psec) Transistor/Gate delay Interconnect delay Why is this trend?

A scaling primer Ideal process scaling: –Device geometries shrink by S  = 0.7x) Device delay shrinks by s –Wire geometries shrink by  Unit resistance R/  :  /(ws.hs) = r/s 2 Unit coupling capacitance Cc/  : (hs)/(Ss) Resistance doubled, capacitance roughly unchanged for unit length How about the change in wire length? SGD h w l S ll hh SS ww

Technology scaling Global (long) interconnect lengths don’t shrink –Global interconnect link cells far apart Local (short) interconnect lengths shrink by s –Local interconnects link cells nearby

Interconnect delay scaling Delay of a wire of length l :  int = (rl)(cl) = rcl 2 (a quadratic function of length) Local interconnects :  int : (r/s 2 )(c)(ls) 2 = rcl 2 –Local interconnect delay unchanged Global interconnects :  int : (r/s 2 )(c)(l) 2 = (rcl 2 )/s 2 –Global interconnect delay doubled – unsustainable! Interconnect delay increasingly more dominant

Buffer Insertion For Delay Reduction

Elmore Delay for Wire x C unit wire capacitance c unit wire resistance r

Elmore Delay for Buffer v C u Driving resistance Input capacitance

Elmore Delay for A Circuit Delay =  all Ri  all Cj downstream from Ri Ri*Cj Elmore delay to n1 R(B)*(C1+C2) Elmore delay to n2 R(B)*(C1+C2)+R(w)*C2 R(B) C1 R(w) C2 n1 B n2

R Buffers Reduce Wire Delay x/2 cx/4 rx/2 t_unbuf = R( cx + C ) + rx( cx/2 + C ) t_buf = 2R( cx/2 + C ) + rx( cx/4 + C ) t_buf – t_unbuf = RC – rcx 2 /4 x/2 cx/4 rx/2 C C R x ∆t∆t

Buffered global interconnects: Intuition Interconnect delay = r.c.l 2 /2 Interconnect delay =  r.c.l i 2 /2 < r.c.l 2 /2 (where l =  l j ) since  (l j 2 ) < (  l j ) 2 (Of course, we need to consider buffer delay as well) l1l1 lnln l3l3 l2l2 l

Optimal Buffer Insertion on A Wire Delay before buffer insertion = rcL 2 /2 Assume N identical buffers with equal inter-buffer length l (L = Nl) For minimum delay, L R d – On resistance of inverter C g – Gate input capacitance r,c – unit resistance and capacitance … … l

Optimal interconnect delay Substituting l opt back into the interconnect delay expression: Delay grows linearly with L (instead of quadratically)

Total buffer count Ever-increasing fractions of total cell count will be buffers –70% in 32nm –25% is widely observed nm65nm45nm32nm % cells used to buffer nets clk-buf buf tot-buf

ITRS projections

Exercise 1 Given a wire of length 10 with r=2, c=2, what is its delay? Given a buffer with R d =10, C g =20, after optimally buffering the wire, what is the delay? What if wire length is 100? Any conclusion?

Exercise 2 Relationship with gate sizing –If we can size the buffer, what is the best buffer size? –Let R 0 denote the unit size buffer driving resistance, and C 0 denote the unit size buffer input capacitance. Thus, R d =R 0 /h and C g =C 0 h –What is best h leading to smallest delay?

Analogy

Advancing technology = period of city expansion, more transistors = larger city Interconnects = streets Buffers = gas stations Signal delay (timing) = time to cross the city Buffer insertion = gas station construction

Previous Result is Only Theoretical: Discrete Buffer Locations Candidate buffer locations

RAT: Required Arrival Time RAT = 100 Wire delay = 80 AT = 0 RAT = 100 Wire delay = 80 AT = 0 RAT = 20 AT = 80

Slack: RAT - AT RAT = 100 Wire delay = 80 AT = 0 RAT = 20 AT = 80 Slack = 20 Minimizing circuit delay = maximizing RAT at driver = maximizing slack at driver

Motivation for Problem Formulation RAT = 300 AT = 350 Slack = RAT-AT= -50 RAT = 700 AT = 600 Slack = 100 RAT = 300 AT = 250 Slack = 50 RAT = 700 AT = 400 Slack = 300 slack = -50 slack = 50 Decouple capacitive load from critical path RAT = Required Arrival Time Slack = RAT - AT We need to maximum slack or RAT at driver

Timing Driven Buffering Problem Formulation Given –A Steiner tree –RAT at each sink –A buffer type –RC parameters –Candidate buffer locations Find buffer insertion solution such that the slack (or RAT) at the driver is maximized

An Example for Buffer Insertion (v 1, 1, 20) 22 v1v1 v1v1 (v 2, 3, 16) r = 1, c = 1 R b = 1, C b = 1 R d = 1 (v 2, 1, 13) v1v1 (v 3, 5, 8) v1v1 (v 3, 3, 9) slack = 6slack = 3 Add wire Insert buffer Add wire Add driver CQ

Candidate Buffering Solution Definition Each candidate solution is associated with –v i : a node –c i : downstream capacitance –q i : RAT v i is a sink c i is sink capacitance v is an internal node

Van Ginneken’s Algorithm Candidate solutions are propagated toward the source

Solution Propagation: Add Wire c 2 = c 1 + cx q 2 = q 1 – rcx 2 /2 – rxc 1 r: wire resistance per unit length c: wire capacitance per unit length (v 1, c 1, q 1 ) (v 2, c 2, q 2 ) x

32 Solution Propagation: Insert Buffer c 1b = C b q 1b = q 1 – R b c 1 C b : buffer capacitance R b : buffer resistance (v 1, c 1, q 1 ) (v 1, c 1b, q 1b )

Solution Propagation: Add Driver q 0d = q 0 – R d c 0 R d : driver resistance Pick solution with max slack (v 0, c 0, q 0 ) (v 0, c 0d, q 0d )

Exercise (20,400) 2 2 Unit Wire Cap = 5 Unit Wire Res = 3 Buffer C=5, R=1 Perform buffer insertion to maximize the slack at driver 2

Exponential Runtime 2 solutions 4 solutions 8 solutions 16 solutions n candidate buffer locations lead to 2 n solutions

Solution Pruning Two candidate solutions –(v, c 1, q 1 ) –(v, c 2, q 2 ) Solution 1 is inferior if –c 1 > c 2 : larger load –and q 1 < q 2 : tighter timing

LOAD An Analogy - I Faster -> Smaller Delay -> Larger RAT (since RAT = RAT output - Delay) Larger Load -> Larger Capacitance

LOAD Faster & smaller load (larger RAT, smaller capacitance): Good Slower & larger load (smaller RAT, larger capacitance): Inferior END An Analogy - II

END Who will be the winner? Cannot tell at this moment, so keep both of them. An Analogy - III

END Who will be the winner? Cannot tell at this moment, so keep both of them. An Analogy - IV

Pruning When Insert Buffer They have the same load cap C b, only the one with max q is kept

42 Generating Candidates (1) (2) (3) From Dr. Charles Alpert

43 Pruning Candidates (3) (a) (b) Both (a) and (b) “look” the same to the source. Throw out the one with the worse slack (4)

44 Candidate Example Continued (4) (5)

45 Candidate Example Continued After pruning (5) At driver, compute which candidate maximizes slack. Result is optimal.

46 Example (20,400) (30,250) (5, 220) (20,400) (30,250) (5, 220) (40, 40) (5, 0) (15,160) (5, 145) Unit Wire Cap = 5 Unit Wire Res = 3 Buffer C=5, R=

47 Example Cont’d (20,400) (30,250) (5, 220) (40, 40) (5, 0) (15,160) (5, 145) (5,0) is inferior to (5,145). (45,40) is inferior to (15,160) (20,400) (30,250) (5, 220) (15,160) (5, 145) (5,15) (5,70) Pick solution with largest slack, follow arrows to get solution

Exercise Without pruning, there will be exponential number of candidate solutions (i.e., given n candidate buffer locations, there will be 2 n solutions). With pruning, how many solutions will we have?

Exercise Unit Wire Cap = 1 Unit Wire Res = 1 Buffer C=1, R=1 2 2 (10,40) (8,50) (5,10) (15,40) (7,10) (9,30) (12,20) Continue the following buffer insertion process. Assume that all partial candidate buffering solutions are as shown.

Summary Interconnect delay increases with technology scaling Linear interconnect delay with buffer insertion Buffer insertion with candidate buffer locations Pruning for accelerating buffer insertion technique