Elmore Delay, Logical Effort
Modern Interconnect © Rabaey, ch4Wire.ppt, slide 22
Example: Intel 0.25 micron Process. 5 metal layers; Ti/Al-Cu/Ti/TiN; polysilicon dielectric. © Rabaey, ch4Wire.ppt, slide 23
Modern Interconnect 90nm process © Chris Kim (image from Intel?)
The Lumped RC-Model, The Elmore Delay (multiply the Elmore time constant by 0.69 to get the 50% propagation delay) © Rabaey, ch4Wire.ppt, slide 27
Example: The Elmore Delay
Shared path resistances:
R44 = R1 + R3 + R4
Rii = R1 + R3 + Ri
Ri4 = R1 + R3
Ri2 = R1
Elmore delay at node i:
Ti = C1·R1 + C2·R1 + C3·(R1 + R3) + C4·(R1 + R3) + Ci·(R1 + R3 + Ri)
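The shared-path rule in this example can be checked numerically. The sketch below is a minimal illustration, not the slide author's code: the function name `elmore_delay`, the dictionary layout, and the element values are all assumptions. The tree topology is read off the formula for Ti (R1 feeds node 1, R2 and R3 branch from node 1, R4 and Ri branch from node 3).

```python
def elmore_delay(parent, resistance, capacitance, target):
    """Elmore time constant from the source to `target` in an RC tree.

    parent[n]:      upstream neighbour of node n (None if n hangs off the source)
    resistance[n]:  resistor between parent[n] and n
    capacitance[n]: capacitor from node n to ground
    """
    def nodes_on_path(node):
        # Walk upstream; the resistor owned by each visited node lies on the path.
        visited = set()
        while node is not None:
            visited.add(node)
            node = parent[node]
        return visited

    target_path = nodes_on_path(target)
    tau = 0.0
    for k, c_k in capacitance.items():
        # Resistance shared by the source->target and source->k paths.
        shared = target_path & nodes_on_path(k)
        tau += c_k * sum(resistance[n] for n in shared)
    return tau


# Topology assumed from the slide's formula for Ti.
parent = {"1": None, "2": "1", "3": "1", "4": "3", "i": "3"}
# Hypothetical element values (arbitrary units), chosen only to exercise the code.
R = {"1": 1.0, "2": 2.0, "3": 3.0, "4": 4.0, "i": 5.0}
C = {"1": 1.0, "2": 2.0, "3": 3.0, "4": 4.0, "i": 5.0}

tau_i = elmore_delay(parent, R, C, "i")
tau_4 = elmore_delay(parent, R, C, "4")
tp_i = 0.69 * tau_i  # 50% delay estimate, per the 0.69 factor on the lumped-RC slide
print(tau_i, tau_4, tp_i)
```

With these made-up values, `tau_i` equals C1·R1 + C2·R1 + C3·(R1+R3) + C4·(R1+R3) + Ci·(R1+R3+Ri) term by term, which is the point of the check.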
The Elmore Delay RC Chain © Rabaey, ch4Wire.ppt
The Distributed RC-line Diffusion Equation © Rabaey, ch4Wire.ppt
Deriving the Diffusion Eq
Step-response of RC wire as a function of time and space © Rabaey, ch4Wire.ppt
RC-Models © Rabaey, ch4Wire.ppt
Driving an RC-line © Rabaey, ch4Wire.ppt
Designing Fast CMOS Gates Slides from chapter6.ppt of Rabaey’s page
Fan-In Considerations
[Figure: 4-input NAND with inputs A, B, C, D, load CL and internal nodes C1, C2, C3]
Distributed RC model (Elmore delay): tpHL = 0.69 Reqn (C1 + 2·C2 + 3·C3 + 4·CL)
Propagation delay deteriorates rapidly as a function of fan-in, quadratically in the worst case.
While the output capacitance makes a full-swing transition (from VDD to 0), the internal nodes only transition from VDD - VTn to GND.
C1, C2, C3 are on the order of 0.85 fF for W/L of 0.5/0.25 NMOS and 0.375/0.25 PMOS.
CL is 3.2 fF with no output load (all diffusion capacitance, the intrinsic capacitance of the gate itself).
This gives a tpHL of 80.3 psec (simulated as 86 psec).
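The arithmetic behind the 80.3 psec figure can be sketched as follows. The slide does not state Reqn, so the value below is back-calculated from the quoted delay and should be read as an implied number, not a datum from the source. The helper name `nand_tphl` is hypothetical.

```python
FF = 1e-15  # one femtofarad, in farads

def nand_tphl(r_eqn, internal_caps, c_load):
    """Elmore estimate for an NMOS stack: 0.69 * Reqn * (C1 + 2*C2 + ... + N*CL).

    internal_caps lists C1, C2, ... from the node nearest ground upward;
    every transistor in the stack is assumed to have the same resistance r_eqn.
    """
    caps = list(internal_caps) + [c_load]
    weighted = sum((k + 1) * c for k, c in enumerate(caps))
    return 0.69 * r_eqn * weighted

# Values quoted on the slide.
c_internal = [0.85 * FF, 0.85 * FF, 0.85 * FF]
c_load = 3.2 * FF
tp_quoted = 80.3e-12

# Weighted capacitance C1 + 2*C2 + 3*C3 + 4*CL.
weighted_c = sum((k + 1) * c for k, c in enumerate(c_internal + [c_load]))

# Reqn implied by the quoted delay (an inference, not a slide value).
r_eqn_implied = tp_quoted / (0.69 * weighted_c)

print(weighted_c / FF, r_eqn_implied)
print(nand_tphl(r_eqn_implied, c_internal, c_load))
```

The weighted capacitance comes to 17.9 fF, so the quoted delay corresponds to an equivalent NMOS resistance of roughly 6.5 kΩ.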
tp as a Function of Fan-In
[Plot: tp (psec) versus fan-in; the tpHL curve is quadratic, the tpLH curve is linear]
Gates with a fan-in greater than 4 should be avoided.
Fixed fan-out (NMOS 0.5 micron, PMOS 1.5 micron).
tpLH increases linearly due to the linearly increasing diffusion capacitance.
tpHL increases quadratically due to the simultaneous increase in pull-down resistance and internal capacitance.
tp as a Function of Fan-In and Fan-Out
Fan-in: quadratic due to increasing resistance and capacitance.
Fan-out: each additional fan-out gate adds two gate capacitances to CL.
tp = a1·FI + a2·FI^2 + a3·FO
The a1 term is for the parallel chain, the a2 term is for the serial chain, and the a3 term is for fan-out.
Fast Complex Gates: Design Technique 1
Transistor sizing, as long as fan-out capacitance dominates.
Progressive sizing: M1 > M2 > M3 > … > MN (the FET closest to the output is the smallest).
[Figure: distributed RC line; series stack M1 (bottom, input In1) through MN (top, input InN) with internal nodes C1, C2, C3 and load CL]
M1 has to carry the discharge current from M2, M3, … MN and CL, so make it the largest.
MN only has to carry the discharge current from CL (no internal capacitances).
Can reduce delay by more than 20%; the gains decrease as technology shrinks.
Fast Complex Gates: Design Technique 2
Transistor ordering
The critical input is the latest-arriving signal. Place the latest-arriving signal (critical path) closest to the output.
[Figure: two 3-transistor stacks, M1 at the bottom and M3 next to the output, with load CL and internal nodes C1, C2]
Left: In1 on M1 switches 0→1 last (critical path) while In2 = In3 = 1. CL, C1 and C2 are all charged, so the delay is determined by the time to discharge CL, C1 and C2.
Right: In1 on M3 switches 0→1 last (critical path) while In2 = In3 = 1. C1 and C2 are already discharged, so the delay is determined by the time to discharge CL.
Fast Complex Gates: Design Technique 3
Alternative logic structures: F = ABCDEFGH
Reduced fan-in means deeper logic depth.
The reduction in fan-in offsets, by far, the extra delay incurred by the NOR gate (second configuration).
Only simulation will tell which of the last two configurations is faster or lower power.
Fast Complex Gates: Design Technique 4
Isolating fan-in from fan-out using buffer insertion.
Reduce CL on large fan-in gates, especially for large CL, and size the inverters progressively to handle the CL more effectively.
Logical Effort Slides from chapter6.ppt of Rabaey’s page
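Transistor Sizing
Inverter: D = 1 + f
2-input NAND: D = 2 + (4/3) f
2-input NOR: D = 2 + (5/3) f
[Figure: Cg and Cint labels for each gate; the values did not survive extraction]
Assumes Rp = Rn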
Normalized Space
Parasitic Term p NOTE: p is a gate parameter, not a function of W (transistor width)
Logical Effort Term g NOTE: g is a gate parameter, not a function of W (transistor width)
Transistor Sizing a Complex CMOS Gate
OUT = NOT(D + A • (B + C))
[Figure: pull-down (NMOS) sizes A 2, B 2, C 2, D 1; pull-up (PMOS) labels as extracted, two color-coded values per device: B 8/6, C 8/6, D 4/6, and 4/3 (presumably A)]
Red sizing assumes Rp = Rn. Follow the short path first; note PMOS for C and B is 4 rather than 3, since the average in the pull-up chain of three is (4+4+2)/3 ≈ 3.
Also note the structure of the pull-up and pull-down networks, chosen to minimize diffusion capacitance at the output (e.g., a single PMOS drain connected to the output).
Green sizing is for symmetric response and for performance (where Rp = 3 Rn).
Sizing rules of thumb: PMOS = 3 × NMOS; 1 in series = 1, 2 in series = 2, 3 in series = 3, etc.
Logical Effort From Sutherland, Sproull
Logical Effort of Gates
tp as a Function of Fan-Out
[Plot: tp (psec) versus effective fan-out; curves for tpNOR2, tpNAND2 and tpINV]
All gates have the same drive current.
The slope is a function of the “driving strength”.
Buffer Example
[Figure: chain of N inverters, numbered 1, 2, …, N, from In to Out driving load CL; delay expressed in units of tinv]
For given N: Ci+1/Ci = Ci/Ci-1
To find N: Ci+1/Ci ~ 4
How to generalize this to any logic path?
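The two rules on this slide (equal capacitance ratios for a given N, and a ratio near 4 to pick N) can be sketched as below. The function name `buffer_chain` and the 1:64 example load are hypothetical and chosen only for illustration.

```python
import math

def buffer_chain(c_in, c_load, n_stages=None):
    """Size an inverter chain with equal stage ratios C[i+1]/C[i].

    If n_stages is not given, pick it so the per-stage ratio is close to 4,
    i.e. N ~ log4(c_load / c_in), with a minimum of one stage.
    """
    total_fanout = c_load / c_in
    if n_stages is None:
        n_stages = max(1, round(math.log(total_fanout, 4)))
    # Equal ratios: each stage sees the N-th root of the total fanout.
    stage_ratio = total_fanout ** (1.0 / n_stages)
    # Input capacitance of each inverter, first to last.
    sizes = [c_in * stage_ratio ** i for i in range(n_stages)]
    return n_stages, stage_ratio, sizes

# Hypothetical example: unit-sized first inverter driving a load 64 times larger.
n, ratio, sizes = buffer_chain(c_in=1.0, c_load=64.0)
print(n, ratio, sizes)
```

For this made-up load the chain comes out as three inverters, each about four times the size of the one before it.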
Delay in a Logic Gate
Gate delay: d = h + p (effort delay h, intrinsic delay p)
Effort delay: h = g f (logical effort g, effective fanout f = Cout/Cin)
Logical effort is a function of topology, independent of sizing.
Effective fanout (electrical effort) is a function of load/gate size.
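As a small illustration of d = g f + p, the sketch below evaluates the delay of three gates at the same effective fanout. The g and p values are taken from the delay expressions on the Transistor Sizing slide (D = 1 + f, D = 2 + 4/3 f, D = 2 + 5/3 f), assuming those refer to an inverter, a 2-input NAND and a 2-input NOR in that order; the names `gate_delay` and `GATES` are hypothetical.

```python
def gate_delay(g, p, f):
    """Normalized gate delay d = h + p, with effort delay h = g * f."""
    return g * f + p

# (logical effort g, intrinsic delay p), read off the Transistor Sizing slide.
# The gate names are an assumption about which expression belongs to which gate.
GATES = {
    "INV": (1.0, 1.0),
    "NAND2": (4.0 / 3.0, 2.0),
    "NOR2": (5.0 / 3.0, 2.0),
}

f = 4.0  # effective fanout Cout/Cin, an arbitrary example value
delays = {name: gate_delay(g, p, f) for name, (g, p) in GATES.items()}
for name, d in delays.items():
    print(f"{name}: d = {d:.2f}")
```

The ordering INV < NAND2 < NOR2 at equal fanout matches the ordering of the curves on the fan-out plot.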
Add Branching Effort
Branching effort: b = (Con-path + Coff-path) / Con-path
Multistage Networks
Stage effort: hi = gi·fi
Path electrical effort: F = Cout/Cin
Path logical effort: G = g1·g2·…·gN
Branching effort: B = b1·b2·…·bN
Path effort: H = GFB
Path delay: D = Σ di = Σ pi + Σ hi
Optimum Effort per Stage
When each stage bears the same effort: h^N = H, so h = H^(1/N)
Stage efforts: g1·f1 = g2·f2 = … = gN·fN
Effective fanout of each stage: fi = h/gi
Minimum path delay: D = Σ (gi·fi + pi) = N·H^(1/N) + P, where P = Σ pi
Optimal Number of Stages
For a given load and a given input capacitance of the first gate, find the optimal number of stages and the optimal sizing.
Substitute the ‘best stage effort’.
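One way to see where the best stage count comes from is to evaluate the equal-effort path delay for several values of N and keep the smallest. The sketch below assumes every stage has the same intrinsic delay `p_stage` (an inverter-like value of 1 by default); that assumption, the function names and the example path effort are not from the slides.

```python
def path_delay(path_effort, n_stages, p_stage=1.0):
    """Delay of an N-stage path when every stage bears effort H**(1/N)."""
    stage_effort = path_effort ** (1.0 / n_stages)
    return n_stages * (stage_effort + p_stage)

def best_stage_count(path_effort, p_stage=1.0, max_stages=20):
    """Brute-force search for the stage count with the smallest path delay."""
    return min(range(1, max_stages + 1),
               key=lambda n: path_delay(path_effort, n, p_stage))

H = 64.0  # hypothetical path effort
n_best = best_stage_count(H)
print(n_best, H ** (1.0 / n_best), path_delay(H, n_best))
```

For this example the search settles on three stages with a stage effort of about 4, in line with the rule of thumb N ~ log4 of the path effort.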
Example: Optimize Path
Stage 1: g = 1, f = a
Stage 2: g = 5/3, f = b/a
Stage 3: g = 5/3, f = c/b
Stage 4: g = 1, f = 5/c
Effective fanout, F = 5
Find G, H, h, a, b, c.
Example: Optimize Path (solution)
Stage 1: g = 1, f = a
Stage 2: g = 5/3, f = b/a
Stage 3: g = 5/3, f = c/b
Stage 4: g = 1, f = 5/c
Effective fanout, F = 5
G = 25/9
H = 125/9 = 13.9
h = H^(1/4) = 1.93
a = 1.93
b = h·a/g2 = 2.23
c = h·b/g3 = 5·g4/h = 2.59
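The numbers in this example can be reproduced by working backwards from the load, using Cin = g·Cout/h at each stage. This is a minimal check under the slide's values (no branching, first gate of unit size, load of 5); the variable names are hypothetical.

```python
import math

g = [1.0, 5.0 / 3.0, 5.0 / 3.0, 1.0]  # logical effort of each stage
F = 5.0                               # path electrical effort Cout/Cin
B = 1.0                               # no branching in this path

G = math.prod(g)             # path logical effort
H = G * F * B                # path effort
h = H ** (1.0 / len(g))      # equal stage effort

# Walk from the load back to the input: Cin_i = g_i * Cout_i / h.
c_out = F
sizes = []
for g_i in reversed(g):
    c_in = g_i * c_out / h
    sizes.append(c_in)
    c_out = c_in
sizes.reverse()              # [first gate, a, b, c]

first, a, b, c = sizes
print(G, H, h, a, b, c)
```

The first gate comes back as size 1, which confirms the sizing is self-consistent. Unrounded, b evaluates to about 2.236; the slide's 2.23 follows from using the rounded h = 1.93.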
Example – 8-input AND
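Method of Logical Effort
Compute the path effort: H = GFB
Find the best number of stages: N ~ log4 H
Compute the stage effort: h = H^(1/N)
Sketch the path with this number of stages.
Working from either end, find sizes: Cin = Cout·g/h
(Sutherland et al. write the same steps as F = GBH, f = F^(1/N), Cin = Cout·g/f, with f the stage effort and h the electrical effort.)
Reference: Sutherland, Sproull, Harris, “Logical Effort: Designing Fast CMOS Circuits,” Morgan Kaufmann, 1999.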
Summary Sutherland, Sproull, Harris