Presentation is loading. Please wait.

Presentation is loading. Please wait.

Interconnect Optimizations

Similar presentations


Presentation on theme: "Interconnect Optimizations"— Presentation transcript:

1 Interconnect Optimizations

2 A scaling primer Ideal process scaling:
Device geometries shrink by S (= 0.7x) Device delay shrinks by s Wire geometries shrink by s R/m : r/(ws.hs) = r/s2 Cc/m : (hs).e/(Ss) = Cc C/m : similar R/m doubles, C/m and Cc/m unchanged h w l S ls hs Ss ws

3 Interconnect role Short (local) interconnect
Used to connect nearby cells Minimize wire C, i.e., use short min-width wires Medium to long-distance (global) interconnect Size wires to tradeoff area vs. delay Increasing width  Capacitance increases, Resistance decreases Need to find acceptable tradeoff - wire sizing problem “Fat” wires Thicker cross-sections in higher metal layers Useful for reducing delays for global wires Inductance issues, sharing of limited resource

4 Cross-Section of A Chip

5 Block scaling Block area often stays same
# cells, # nets doubles Wiring histogram shape invariant Global interconnect lengths don’t shrink Local interconnect lengths shrink by s

6 Interconnect delay scaling
Delay of a wire of length l : tint = (rl)(cl) = rcl (first order) Local interconnects : tint : (r/s2)(c)(ls)2 = rcl2 Local interconnect delay unchanged (compare to faster devices) Global interconnects : tint : (r/s2)(c)(l)2 = (rcl2)/s2 Global interconnect delay doubles – unsustainable! Interconnect delay increasingly more dominant

7 Buffer Insertion For Delay Reduction

8 Analysis of Simple RC Circuit
vT(t) v(t) C state variable Input waveform

9 Analysis of Simple RC Circuit
Step-input response: v0 v0u(t) v0(1-e-t/RC)u(t) match initial state: output response for step-input:

10 Delays of Simple RC Circuit
v(t) = v0(1 - e-t/RC) -- waveform under step input v0u(t) v(t)=0.5v0  t = 0.69RC i.e., delay = 0.69RC (50% delay) v(t)=0.1v0  t = 0.1RC v(t)=0.9v0  t = 2.3RC i.e., rise time = 2.2RC (if defined as time from 10% to 90% of Vdd) Commonly used metric TD = RC (= Elmore delay)

11 Elmore Delay Delay

12 Elmore Delay Driver is modeled as R Driver intrinsic gate delay t(B)
Delay = all Ri all Cj downstream from Ri Ri*Cj Elmore delay at n2 R(B)*(C1+C2)+R(w)*C2 Elmore delay at n1 R(B)*(C1+C2) n1 n2 R(B) B R(w) C1 C2

13 Elmore Delay For uniform wire
No matter how to lump, the Elmore delay is the same x unit wire capacitance c unit wire resistance r C

14 Delay for Buffer u v u C C(b) Input capacitance Driver resistance
Intrinsic buffer delay

15 Buffers Reduce Wire Delay
x/2 x/2 R C rx/2 R rx/2 cx/4 cx/4 cx/4 cx/4 C ∆t t_unbuf = R( cx + C ) + rx( cx/2 + C ) t_buf = 2R( cx/2 + C ) + rx( cx/4 + C ) + tb t_buf – t_unbuf = RC + tb – rcx2/4 x

16 Combinational Logic Delay
Register Primary Input Register Primary Output Combinational Logic clock Combinational logic delay <= clock period

17 Buffered global interconnects: Intuition
Interconnect delay = r.c.l2 Now, interconnect delay =  r.c.li2 < r.c.l2 (where l = S lj ) since S (lj 2) < (S lj )2 (Of course, account for buffer delay also) l l1 ln l3 l2

18 Optimal inter-buffer length
First order (lumped parasitic, Elmore delay) analysis Assume N identical buffers with equal inter-buffer length l (L = Nl) For minimum delay, L Rd – On resistance of inverter Cg – Gate input capacitance r,c – Resistance, cap. per micron l

19 Optimal interconnect delay
Substituting lopt back into the interconnect delay expression: Delay grows linearly with L (instead of quadratically)

20 Total buffer count 10 20 30 40 50 60 70 80 90nm 65nm 45nm 32nm % cells used to buffer nets clk-buf buf tot-buf Ever-increasing fractions of total cell count will be buffers 70% in 32nm

21 ITRS projections Source: ITRS, 2003 0.1 1 10 100 250 180 130 90 65 45 32 Feature size (nm) Relative delay Gate delay (fanout 4) Local interconnect (M1,2) Global interconnect with repeaters Global interconnect without repeaters

22 Buffers Improve Slack slackmin = -50 slackmin = 50 RAT = 300
Delay = 350 Slack = -50 slackmin = -50 RAT = 700 Delay = 600 Slack = 100 RAT = Required Arrival Time Slack = RAT - Delay RAT = 300 Delay = 250 Slack = 50 Decouple capacitive load from critical path slackmin = 50 RAT = 700 Delay = 400 Slack = 300

23 Timing Driven Buffering Problem Formulation
Given A Steiner tree RAT at each sink A buffer type RC parameters Candidate buffer locations Find buffer insertion solution such that the slack at the driver is maximized

24 Candidate Buffering Solutions

25 Candidate Solution Characteristics
Each candidate solution is associated with vi: a node ci: downstream capacitance qi: RAT vi is a sink ci is sink capacitance v is an internal node

26 Van Ginneken’s Algorithm
Candidate solutions are propagated toward the source Dynamic Programming

27 Solution Propagation: Add Wire
x (v1, c1, q1) (v2, c2, q2) c2 = c1 + cx q2 = q1 – rcx2/2 – rxc1 r: wire resistance per unit length c: wire capacitance per unit length

28 Solution Propagation: Insert Buffer
(v1, c1, q1) (v1, c1b, q1b) c1b = Cb q1b = q1 – Rbc1 – tb Cb: buffer input capacitance Rb: buffer output resistance tb: buffer intrinsic delay

29 Solution Propagation: Merge
(v, cl , ql) (v, cr , qr) cmerge = cl + cr qmerge = min(ql , qr)

30 Solution Propagation: Add Driver
(v0, c0, q0) (v0, c0d, q0d) q0d = q0 – Rdc0 = slackmin Rd: driver resistance Pick solution with max slackmin

31 Example of Solution Propagation
r = 1, c = 1 Rb = 1, Cb = 1, tb = 1 Rd = 1 2 2 (v1, 1, 20) Add wire (v2, 3, 16) (v2, 1, 12) v1 v1 Insert buffer Add wire Add wire (v3, 5, 8) (v3, 3, 8) v1 v1 slack = 3 slack = 5 Add driver Add driver

32 Example of Merging Left candidates Right candidates Merged candidates

33 Solution Pruning Two candidate solutions Solution 1 is inferior if
(v, c1, q1) (v, c2, q2) Solution 1 is inferior if c1 > c2 : larger load and q1 < q2 : tighter timing

34 Pruning When Insert Buffer
They have the same load cap Cb, only the one with max q is kept

35 Generating Candidates
(1) (2) (3) From Dr. Charles Alpert

36 Pruning Candidates (3) (b) (a)
Both (a) and (b) “look” the same to the source. Throw out the one with the worst slack (4)

37 Candidate Example Continued
(4) (5)

38 Candidate Example Continued
After pruning (5) At driver, compute which candidate maximizes slack. Result is optimal.

39 Merging Branches Right Candidates Left

40 Pruning Merged Branches
Critical With pruning

41 Van Ginneken Example (20,400) Buffer C=5, d=30 Wire C=10,d=150
(30,250) (5, 220) (20,400) Buffer C=5, d=50 C=5, d=30 Wire C=15,d=200 C=15,d=120 (30,250) (5, 220) (45, 50) (5, 0) (20,100) (5, 70) (20,400)

42 Van Ginneken Example Cont’d
(45, 50) (5, 0) (20,100) (5, 70) (30,250) (5, 220) (20,400) (5,0) is inferior to (5,70). (45,50) is inferior to (20,100) Wire C=10 (30,250) (5, 220) (20,100) (5, 70) (30,10) (15, -10) (20,400) Pick solution with largest slack, follow arrows to get solution

43 Basic Data Structure (c1, q1) (c2, q2) (c3, q3) Sorted list such that
Worse load cap (c1, q1) (c2, q2) (c3, q3) Better timing Sorted list such that c1 < c2 < c3 If there is no inferior candidates q1 < q2 < q3

44 Prune Solution List (c1, q1) (c2, q2) (c3, q3) (c4, q4) Increasing c N
q1 < q2 ? Prune 2 q1 < q3 ? Prune 3 q1 < q4 ? Y Y N q2 < q3 ? Prune 3 q2 < q4 ? Y N N q3 < q4 ? Prune 4 q3 < q4 ? Prune 4

45 Pruning In Merging Left candidates Right candidates
ql1 < ql2 < qr1 < ql3 < qr2 (cl1, ql1) (cl2, ql2) (cl3, ql3) (cr1, qr1) (cr2, qr2) (cl1, ql1) (cl2, ql2) (cl3, ql3) (cr1, qr1) (cr2, qr2) Merged candidates (cl1+cr1, ql1) (cl2+cr1, ql2) (cl3+cr1, qr1) (cl3+cr2, ql3) (cl1, ql1) (cl2, ql2) (cl3, ql3) (cr1, qr1) (cr2, qr2) (cl1, ql1) (cl2, ql2) (cl3, ql3) (cr1, qr1) (cr2, qr2)

46 Van Ginneken Complexity
Generate candidates from sinks to source Quadratic runtime Adding a wire does not change #candidates Adding a buffer adds only one new candidate Merging branches additive, not multiplicative Linear time solution list pruning Optimal for Elmore delay model

47 Multiple Buffer Types 2 2 (v1, 1, 20) (v2, 3, 16) v1 (v2, 2, 14)
r = 1, c = 1 Rb1 = 1, Cb1 = 1, tb1 = 1 Rb2 = 0.5, Cb2 = 2, tb2 = 0.5 Rd = 1 (v1, 1, 20) (v2, 3, 16) v1 (v2, 2, 14) (v2, 1, 12) v1 v1


Download ppt "Interconnect Optimizations"

Similar presentations


Ads by Google