Interconnect Optimizations

Presentation transcript:

Interconnect Optimizations

A scaling primer
Ideal process scaling:
- Device geometries shrink by $s$ (= 0.7x); device delay shrinks by $s$
- Wire geometries shrink by $s$
- R/m: $r/(ws \cdot hs) = r/s^2$
- Cc/m: $(hs)\,\epsilon/(Ss) = C_c$
- C/m: similar
- R/m doubles; C/m and Cc/m are unchanged
Figure labels: wire width w, height h, length l, spacing S; scaled dimensions ws, hs, ls, Ss

Interconnect role
Short (local) interconnect
- Used to connect nearby cells
- Minimize wire C, i.e., use short min-width wires
Medium to long-distance (global) interconnect
- Size wires to trade off area vs. delay
- Increasing width: capacitance increases, resistance decreases
- Need to find an acceptable tradeoff: the wire sizing problem
“Fat” wires
- Thicker cross-sections in higher metal layers
- Useful for reducing delays for global wires
- Inductance issues, sharing of a limited resource

Cross-Section of A Chip

Block scaling
- Block area often stays the same
- # cells, # nets doubles
- Wiring histogram shape invariant
- Global interconnect lengths don't shrink
- Local interconnect lengths shrink by s

Interconnect delay scaling
Delay of a wire of length $l$ (first order): $t_{int} = (rl)(cl) = rcl^2$
Local interconnects: $t_{int} = (r/s^2)(c)(ls)^2 = rcl^2$
- Local interconnect delay is unchanged (compare to faster devices)
Global interconnects: $t_{int} = (r/s^2)(c)(l)^2 = rcl^2/s^2$
- Global interconnect delay doubles: unsustainable!
Interconnect delay is increasingly dominant
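
The scaling argument above can be checked numerically. The following is a minimal sketch, assuming normalized values r = c = l = 1 and s = 0.7; the function names are illustrative and not part of the slides.

```python
def wire_delay(r, c, length):
    """First-order wire delay t_int = (r*l)(c*l) = r*c*l^2."""
    return r * c * length ** 2


def scaled_wire_delay(r, c, length, s=0.7, length_shrinks=True):
    """Delay after one scaling step: r becomes r/s^2, c is unchanged.

    length_shrinks=True models a local wire (length scales by s);
    length_shrinks=False models a global wire (length stays the same).
    """
    new_length = length * s if length_shrinks else length
    return wire_delay(r / s ** 2, c, new_length)


base = wire_delay(1.0, 1.0, 1.0)
local_ratio = scaled_wire_delay(1.0, 1.0, 1.0, length_shrinks=True) / base
global_ratio = scaled_wire_delay(1.0, 1.0, 1.0, length_shrinks=False) / base
print(f"local: {local_ratio:.2f}x, global: {global_ratio:.2f}x")
```

The local ratio stays at about 1 while the global ratio is 1/s^2, roughly 2 for s = 0.7.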

Buffer Insertion For Delay Reduction

Analysis of Simple RC Circuit
Figure: RC circuit driven by the input waveform $v_T(t)$ (source marked ±), with the voltage $v(t)$ on capacitor C as the state variable

Analysis of Simple RC Circuit
Step-input response: the input is $v_0 u(t)$
Matching the initial state gives the output response for the step input: $v(t) = v_0(1 - e^{-t/RC})u(t)$

Delays of Simple RC Circuit
$v(t) = v_0(1 - e^{-t/RC})$: waveform under step input $v_0 u(t)$
- $v(t) = 0.5v_0 \Rightarrow t = 0.69RC$, i.e., delay = 0.69RC (50% delay)
- $v(t) = 0.1v_0 \Rightarrow t = 0.1RC$
- $v(t) = 0.9v_0 \Rightarrow t = 2.3RC$, i.e., rise time = 2.2RC (if defined as the time from 10% to 90% of Vdd)
Commonly used metric: $T_D = RC$ (= Elmore delay)
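
The threshold-crossing times quoted above follow from inverting the step response. A minimal sketch, assuming R = C = 1 so that times are in units of RC; the helper name is illustrative.

```python
import math


def time_to_fraction(fraction, R, C):
    """Time at which v(t) = v0*(1 - exp(-t/RC)) reaches fraction*v0."""
    return -R * C * math.log(1.0 - fraction)


R, C = 1.0, 1.0
t50 = time_to_fraction(0.5, R, C)   # 50% delay
t10 = time_to_fraction(0.1, R, C)
t90 = time_to_fraction(0.9, R, C)
rise_time = t90 - t10               # 10% to 90% rise time
print(f"t50={t50:.2f}RC t10={t10:.2f}RC t90={t90:.2f}RC rise={rise_time:.2f}RC")
```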

Elmore Delay
Figure label: Delay

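Elmore Delay
- Driver is modeled as R
- Driver intrinsic gate delay: t(B)
- Delay $= \sum_{\text{all } R_i} \; \sum_{\text{all } C_j \text{ downstream from } R_i} R_i C_j$
- Elmore delay at n1: $R(B)(C_1 + C_2)$
- Elmore delay at n2: $R(B)(C_1 + C_2) + R(w) C_2$
Figure: driver B with resistance R(B) drives node n1 (capacitance C1); wire resistance R(w) connects n1 to node n2 (capacitance C2)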
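
The double sum above can be evaluated on any RC tree with one pass to accumulate downstream capacitance and one pass to accumulate delay. A minimal sketch, assuming a hypothetical array representation (parent index, entering resistance, node capacitance) and arbitrary example values; the driver's intrinsic delay t(B) is left out.

```python
def elmore_delays(parent, res, cap):
    """Elmore delay at every node of an RC tree.

    parent[i]: index of the parent of node i (None for the first node).
    res[i]:    resistance of the branch entering node i.
    cap[i]:    capacitance attached at node i.
    Nodes are assumed to be listed so that a parent precedes its children.
    """
    n = len(parent)
    downstream = list(cap)
    # Accumulate downstream capacitance from the leaves toward the source.
    for i in range(n - 1, -1, -1):
        if parent[i] is not None:
            downstream[parent[i]] += downstream[i]
    delay = [0.0] * n
    # Each branch adds R_i times all capacitance downstream of it.
    for i in range(n):
        upstream = delay[parent[i]] if parent[i] is not None else 0.0
        delay[i] = upstream + res[i] * downstream[i]
    return delay


# Two-node example from the slide: R(B) feeds n1 (C1), R(w) feeds n2 (C2).
RB, Rw, C1, C2 = 2.0, 3.0, 4.0, 5.0
d_n1, d_n2 = elmore_delays(parent=[None, 0], res=[RB, Rw], cap=[C1, C2])
print(d_n1, d_n2)
```

With these values the result matches R(B)(C1+C2) at n1 and R(B)(C1+C2) + R(w)C2 at n2.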

Elmore Delay
For a uniform wire, the Elmore delay is the same no matter how the wire is lumped
Figure labels: wire length x, unit wire capacitance c, unit wire resistance r, load C
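
The lumping claim can be checked numerically. The following sketch assumes each piece of the wire is modeled as a pi segment (half of its capacitance at each end), and compares the result with the closed form rx(cx/2 + C) that appears as the wire term on the later "Buffers Reduce Wire Delay" slide. The function name and the example values are illustrative.

```python
def pi_lumped_delay(r, c, x, load, n):
    """Elmore delay of a wire of length x split into n equal pi segments."""
    seg_r = r * x / n
    seg_c = c * x / n
    delay = 0.0
    for k in range(1, n + 1):
        # Segment k sees half of its own capacitance (the far-end half),
        # all capacitance of the segments after it, and the load.
        downstream = seg_c / 2 + (n - k) * seg_c + load
        delay += seg_r * downstream
    return delay


r, c, x, load = 0.1, 0.2, 10.0, 3.0
closed_form = r * x * (c * x / 2 + load)
for n in (1, 2, 5, 50):
    print(n, round(pi_lumped_delay(r, c, x, load, n), 9), closed_form)
```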

Delay for Buffer
Figure labels: nodes u and v, load C, C(b)
Buffer parameters: input capacitance, driver resistance, intrinsic buffer delay

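Buffers Reduce Wire Delay
Figure: a wire of length x, driven through resistance R into load C, is split by a buffer into two halves of length x/2; each half has resistance rx/2 and capacitance cx/4 at each end
$t_{unbuf} = R(cx + C) + rx(cx/2 + C)$
$t_{buf} = 2R(cx/2 + C) + rx(cx/4 + C) + t_b$
$t_{buf} - t_{unbuf} = RC + t_b - rcx^2/4$
Plot: ∆t versus x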
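
Setting the difference RC + tb - rcx^2/4 to zero gives the wire length beyond which a single mid-point buffer helps, x = 2·sqrt((RC + tb)/(rc)). A minimal sketch of the three expressions, with illustrative function names and normalized values (all parameters equal to 1):

```python
import math


def t_unbuf(R, C, r, c, x):
    """Unbuffered delay: R(cx + C) + rx(cx/2 + C)."""
    return R * (c * x + C) + r * x * (c * x / 2 + C)


def t_buf(R, C, r, c, x, tb):
    """Delay with one buffer at the midpoint: 2R(cx/2 + C) + rx(cx/4 + C) + tb."""
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C) + tb


def break_even_length(R, C, r, c, tb):
    """Length at which t_buf equals t_unbuf."""
    return 2 * math.sqrt((R * C + tb) / (r * c))


R, C, r, c, tb = 1.0, 1.0, 1.0, 1.0, 1.0
x_star = break_even_length(R, C, r, c, tb)
for x in (1.0, x_star, 4.0):
    print(f"x={x:.3f} unbuf={t_unbuf(R, C, r, c, x):.3f} buf={t_buf(R, C, r, c, x, tb):.3f}")
```

Below the break-even length the buffer's own delay outweighs the saving; above it the buffered wire is faster.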

Combinational Logic Delay
Figure labels: Primary Input, Register, Combinational Logic, Register, Primary Output, clock
Combinational logic delay <= clock period

Buffered global interconnects: Intuition
Unbuffered interconnect delay $= rcl^2$
With buffers, interconnect delay $= \sum_i rcl_i^2 < rcl^2$ (where $l = \sum_j l_j$), since $\sum_j l_j^2 < (\sum_j l_j)^2$
(Of course, account for buffer delay also)
Figure: a wire of length l split into segments l1, l2, l3, ..., ln

Optimal inter-buffer length
First order (lumped parasitic, Elmore delay) analysis
Assume N identical buffers with equal inter-buffer length l (L = Nl)
For minimum delay, there is an optimal inter-buffer length $l_{opt}$
- Rd: on resistance of inverter
- Cg: gate input capacitance
- r, c: resistance, capacitance per micron
Figure: a line of total length L divided into segments of length l

Optimal interconnect delay
Substituting $l_{opt}$ back into the interconnect delay expression:
Delay grows linearly with L (instead of quadratically)
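
The linear growth can be illustrated with a small model. This is a sketch under stated assumptions, not the slide's own derivation: each stage is taken to follow the Elmore model of the "Buffers Reduce Wire Delay" slide, with stage delay Rd(cl + Cg) + rl(cl/2 + Cg) + tb, where tb (intrinsic buffer delay, default 0) is an added assumption, and the stage count L/l is treated as continuous. Under that model the minimizing length is sqrt(2(Rd·Cg + tb)/(rc)); constant factors depend on the model chosen.

```python
import math


def stage_delay(Rd, Cg, r, c, l, tb=0.0):
    """Assumed Elmore delay of one buffer driving a wire of length l into Cg."""
    return Rd * (c * l + Cg) + r * l * (c * l / 2 + Cg) + tb


def total_delay(Rd, Cg, r, c, L, l, tb=0.0):
    """Delay of a line of length L cut into L/l identical stages."""
    return (L / l) * stage_delay(Rd, Cg, r, c, l, tb)


def optimal_length(Rd, Cg, r, c, tb=0.0):
    """Stage length minimizing total_delay under the assumed model."""
    return math.sqrt(2 * (Rd * Cg + tb) / (r * c))


Rd, Cg, r, c = 1.0, 1.0, 1.0, 1.0
l_opt = optimal_length(Rd, Cg, r, c)
for L in (10.0, 20.0, 40.0):
    print(L, round(total_delay(Rd, Cg, r, c, L, l_opt), 3))
```

Doubling L doubles the delay at the optimal spacing, whereas the unbuffered rcL^2 term would quadruple.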

Total buffer count
Figure: % of cells used to buffer nets (series: clk-buf, buf, tot-buf) at the 90nm, 65nm, 45nm, and 32nm nodes
Ever-increasing fractions of total cell count will be buffers: 70% in 32nm

ITRS projections
Source: ITRS, 2003
Figure: relative delay (log scale, 0.1 to 100) versus feature size (250, 180, 130, 90, 65, 45, 32 nm) for gate delay (fanout 4), local interconnect (M1,2), global interconnect with repeaters, and global interconnect without repeaters

Buffers Improve Slack
RAT = Required Arrival Time; Slack = RAT - Delay
Unbuffered (slackmin = -50):
- Sink with RAT = 300: Delay = 350, Slack = -50
- Sink with RAT = 700: Delay = 600, Slack = 100
Buffered, decoupling the capacitive load from the critical path (slackmin = 50):
- Sink with RAT = 300: Delay = 250, Slack = 50
- Sink with RAT = 700: Delay = 400, Slack = 300
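
The slack numbers above are plain arithmetic on (RAT, delay) pairs. A minimal sketch that recomputes them; the list names are illustrative.

```python
def min_slack(sinks):
    """Worst slack over (RAT, delay) pairs, with slack = RAT - delay."""
    return min(rat - delay for rat, delay in sinks)


unbuffered = [(300, 350), (700, 600)]
buffered = [(300, 250), (700, 400)]
print(min_slack(unbuffered), min_slack(buffered))
```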

Timing Driven Buffering Problem Formulation
Given
- A Steiner tree
- RAT at each sink
- A buffer type
- RC parameters
- Candidate buffer locations
Find a buffer insertion solution such that the slack at the driver is maximized

Candidate Buffering Solutions

Candidate Solution Characteristics
Each candidate solution is associated with
- vi: a node
- ci: downstream capacitance
- qi: RAT
Figure labels: vi is a sink, where ci is the sink capacitance; v is an internal node

Van Ginneken’s Algorithm Candidate solutions are propagated toward the source Dynamic Programming

Solution Propagation: Add Wire
A wire of length x takes candidate (v1, c1, q1) to (v2, c2, q2):
$c_2 = c_1 + cx$
$q_2 = q_1 - rcx^2/2 - rxc_1$
r: wire resistance per unit length
c: wire capacitance per unit length

Solution Propagation: Insert Buffer
Inserting a buffer takes candidate (v1, c1, q1) to (v1, c1b, q1b):
$c_{1b} = C_b$
$q_{1b} = q_1 - R_b c_1 - t_b$
Cb: buffer input capacitance
Rb: buffer output resistance
tb: buffer intrinsic delay

Solution Propagation: Merge
Merging candidates (v, cl, ql) and (v, cr, qr):
$c_{merge} = c_l + c_r$
$q_{merge} = \min(q_l, q_r)$

Solution Propagation: Add Driver
Adding the driver takes candidate (v0, c0, q0) to (v0, c0d, q0d):
$q_{0d} = q_0 - R_d c_0 = slack_{min}$
Rd: driver resistance
Pick the solution with the maximum slackmin

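Example of Solution Propagation
Parameters: r = 1, c = 1; Rb = 1, Cb = 1, tb = 1; Rd = 1; two wire segments of length 2
- Start at the sink: (v1, 1, 20)
- Add wire: (v2, 3, 16)
- Insert buffer at v2: (v2, 1, 12)
- Add wire: (v2, 3, 16) becomes (v3, 5, 8); (v2, 1, 12) becomes (v3, 3, 8)
- Add driver: slack = 3 for (v3, 5, 8); slack = 5 for (v3, 3, 8)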
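
The four propagation rules from the preceding slides can be written as small functions and run on this example. A minimal sketch, assuming a candidate is represented as a (capacitance, RAT) tuple with the node left implicit; the function names are illustrative.

```python
def add_wire(cand, x, r, c):
    """c2 = c1 + c*x,  q2 = q1 - r*c*x^2/2 - r*x*c1."""
    cap, q = cand
    return (cap + c * x, q - r * c * x * x / 2 - r * x * cap)


def insert_buffer(cand, Rb, Cb, tb):
    """c1b = Cb,  q1b = q1 - Rb*c1 - tb."""
    cap, q = cand
    return (Cb, q - Rb * cap - tb)


def merge(left, right):
    """cmerge = cl + cr,  qmerge = min(ql, qr)."""
    return (left[0] + right[0], min(left[1], right[1]))


def add_driver(cand, Rd):
    """Slack at the driver: q0 - Rd*c0."""
    cap, q = cand
    return q - Rd * cap


r, c, Rb, Cb, tb, Rd = 1, 1, 1, 1, 1, 1
sink = (1, 20)
v2_unbuffered = add_wire(sink, 2, r, c)                 # (3, 16)
v2_buffered = insert_buffer(v2_unbuffered, Rb, Cb, tb)  # (1, 12)
v3_candidates = [add_wire(s, 2, r, c) for s in (v2_unbuffered, v2_buffered)]
slacks = [add_driver(s, Rd) for s in v3_candidates]
print(v3_candidates, slacks, max(slacks))
```

The buffered candidate wins at the driver with slack 5, against slack 3 for the unbuffered one.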

Example of Merging Left candidates Right candidates Merged candidates

Solution Pruning
Two candidate solutions: (v, c1, q1) and (v, c2, q2)
Solution 1 is inferior if
- c1 > c2: larger load
- and q1 < q2: tighter timing

Pruning When Inserting a Buffer
All buffered candidates have the same load cap Cb, so only the one with the maximum q is kept

Generating Candidates
Figure: steps (1), (2), (3)
From Dr. Charles Alpert

Pruning Candidates
Figure: steps (3) and (4), with candidates (a) and (b)
Both (a) and (b) “look” the same to the source. Throw out the one with the worst slack.

Candidate Example Continued (4) (5)

Candidate Example Continued
Figure: step (5), after pruning
At the driver, compute which candidate maximizes slack. The result is optimal.

Merging Branches Right Candidates Left

Pruning Merged Branches Critical With pruning

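Van Ginneken Example
Candidates are (C, q) pairs, starting from the sink (20,400)
- Wire (C=10, d=150): (20,400) becomes (30,250)
- Buffer (C=5, d=30): (30,250) becomes (5,220)
- Wire (C=15): (30,250) becomes (45,50) with d=200; (5,220) becomes (20,100) with d=120
- Buffer (C=5): (45,50) becomes (5,0) with d=50; (20,100) becomes (5,70) with d=30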

Van Ginneken Example Cont’d
Candidates before pruning: (45,50), (5,0), (20,100), (5,70)
- (5,0) is inferior to (5,70)
- (45,50) is inferior to (20,100)
Wire (C=10): (20,100) becomes (30,10); (5,70) becomes (15,-10)
Pick the solution with the largest slack, then follow the arrows to get the solution

Basic Data Structure
Sorted list of candidates (c1, q1), (c2, q2), (c3, q3) such that c1 < c2 < c3: load cap gets worse along the list
If there are no inferior candidates, then q1 < q2 < q3: timing gets better along the list

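Prune Solution List
Candidates (c1, q1), (c2, q2), (c3, q3), (c4, q4) are listed in order of increasing c
Compare the last kept candidate i with the next candidate j:
- qi < qj? Y: keep j and continue from j
- N: prune j and compare i with the candidate after j
Figure: decision tree over the tests q1 < q2, q1 < q3, q1 < q4, q2 < q3, q2 < q4, q3 < q4, with outcomes Prune 2, Prune 3, Prune 4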
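
The scan described above takes a single pass once the list is ordered by capacitance. A minimal sketch, assuming candidates are (c, q) tuples; the function sorts first for convenience, so only the loop itself is linear. The test data is the candidate set from the Van Ginneken example slides.

```python
def prune(candidates):
    """Drop inferior candidates from a list of (c, q) pairs.

    After sorting by increasing c (highest q first among equal c),
    a candidate survives only if its q exceeds that of the last kept one.
    """
    kept = []
    for cap, q in sorted(candidates, key=lambda s: (s[0], -s[1])):
        if not kept or q > kept[-1][1]:
            kept.append((cap, q))
    return kept


before = [(45, 50), (5, 0), (20, 100), (5, 70)]
print(prune(before))
```

On this input the survivors are (5, 70) and (20, 100), matching the two pruning decisions stated in the example.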

Pruning In Merging
Left candidates: (cl1, ql1), (cl2, ql2), (cl3, ql3)
Right candidates: (cr1, qr1), (cr2, qr2)
Ordering of q values: ql1 < ql2 < qr1 < ql3 < qr2
Merged candidates: (cl1+cr1, ql1), (cl2+cr1, ql2), (cl3+cr1, qr1), (cl3+cr2, ql3)
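
One way to obtain exactly these four merged candidates is a two-pointer walk over the two sorted lists: pair the current left and right candidates, then advance whichever side has the smaller q, since that side limits the merged RAT. A minimal sketch, assuming both lists are already pruned and sorted by increasing c (and therefore increasing q); the numeric values are made up to satisfy the ordering on the slide.

```python
def merge_lists(left, right):
    """Merge two pruned candidate lists of (c, q) pairs at a branch point."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        cl, ql = left[i]
        cr, qr = right[j]
        merged.append((cl + cr, min(ql, qr)))
        # Advance the side that limits q; advance both on a tie.
        if ql <= qr:
            i += 1
        if qr <= ql:
            j += 1
    return merged


# ql1 < ql2 < qr1 < ql3 < qr2, as on the slide
left = [(1, 10), (2, 20), (3, 40)]
right = [(1, 30), (2, 50)]
print(merge_lists(left, right))
```

The output has at most len(left) + len(right) - 1 entries, which is the additive (not multiplicative) growth the algorithm relies on.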

Van Ginneken Complexity
- Generate candidates from sinks to source
- Quadratic runtime
  - Adding a wire does not change #candidates
  - Adding a buffer adds only one new candidate
  - Merging branches is additive, not multiplicative
  - Linear-time solution list pruning
- Optimal for the Elmore delay model

Multiple Buffer Types
Parameters: r = 1, c = 1; Rb1 = 1, Cb1 = 1, tb1 = 1; Rb2 = 0.5, Cb2 = 2, tb2 = 0.5; Rd = 1; wire segments of length 2
- Sink: (v1, 1, 20)
- Add wire: (v2, 3, 16)
- Insert buffer type 1 at v2: (v2, 1, 12)
- Insert buffer type 2 at v2: (v2, 2, 14)
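
With several buffer types, the insert-buffer rule is applied once per type, so each location can add one new candidate per type. A minimal sketch, assuming candidates are (c, q) tuples and each buffer type is an (Rb, Cb, tb) tuple; the names are illustrative.

```python
def insert_buffers(cand, buffer_types):
    """Apply c_b = Cb, q_b = q - Rb*c - tb for every buffer type."""
    cap, q = cand
    return [(Cb, q - Rb * cap - tb) for (Rb, Cb, tb) in buffer_types]


buffer_types = [(1, 1, 1), (0.5, 2, 0.5)]  # (Rb, Cb, tb) for types 1 and 2
at_v2 = (3, 16)                            # unbuffered candidate at v2
buffered = insert_buffers(at_v2, buffer_types)
print([at_v2] + buffered)
```

Here neither buffered candidate dominates the other: type 2 has the better q (14 vs 12) but presents a larger load (2 vs 1), so both are kept alongside the unbuffered candidate.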