1 Timing Optimization in Logic with Interconnect. Arkadiy Morgenshtein, Eby G. Friedman, Ran Ginosar, Avinoam Kolodny. Technion – Israel Institute of Technology. SLIP (System Level Interconnect Prediction) 2008
2 Timing Optimization (Intro). Timing optimization of a logic function between points A and B. Special cases: only gates; only wires. Typically, a mixture of both.
3 Logic with Wires. Common example: UART design.
4 The Interconnect Wall. Logic w/o wires: logic gate sizing by Logical Effort. Long wires: interconnect optimization by Repeater Insertion.
5 Timing Optimization in Logic with Interconnect. [Figure: path from A to B, spanning the range from logic w/o wires to long wires]
6 Existing Techniques A (very) Short Tutorial
7 Logical Effort (only logic). Delay model: $D = \tau\,(g \cdot h + p)$, where $\tau = R_0 \cdot C_0$ is the delay of a minimal inverter (a technology constant); $g$ is the logical effort, a gate-type factor (e.g. $g_{inv} = 1$); $h$ is the electrical effort, the load-driving capability; and $p$ is the parasitic effort, due to the output capacitance. Optimal sizing: $Delay_i = Delay_{i+1}$, i.e. $g_i h_i = g_{i+1} h_{i+1}$. I. Sutherland, B. Sproull, and D. Harris, “Logical Effort - Designing Fast CMOS Circuits,” Morgan Kaufmann.
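The equal-effort rule above can be sketched in a few lines of Python. This is a minimal illustration of standard Logical Effort sizing for a path without wires or branches, not code from the talk; the function name `logical_effort_sizing` and the example path of three inverters are assumptions made for the example, and capacitances are expressed in units of C0.

```python
from math import prod

def logical_effort_sizing(g, c_in, c_load):
    """Size an N-stage path so that every stage bears the same effort g_i * h_i.

    g      -- logical effort of each gate, first to last (g = 1 for an inverter)
    c_in   -- input capacitance of the first gate, in units of C0
    c_load -- load capacitance at the end of the path, in units of C0
    Returns (stage_effort, input capacitance of every gate).
    Assumes no wires and no side branches.
    """
    n = len(g)
    path_effort = prod(g) * c_load / c_in        # F = G * H
    stage_effort = path_effort ** (1.0 / n)      # f = F^(1/N), equal in all stages
    caps = [0.0] * n
    c_next = c_load
    for i in reversed(range(n)):                 # walk back from the load
        caps[i] = g[i] * c_next / stage_effort   # g_i * h_i = f, with h_i = C_next / C_i
        c_next = caps[i]
    return stage_effort, caps

# Hypothetical example: three inverters driving a load of 64 * C0.
f, caps = logical_effort_sizing([1.0, 1.0, 1.0], c_in=1.0, c_load=64.0)
print(f, caps)  # stage effort close to 4, capacitances close to [1, 4, 16]
```

Each gate ends up four times larger than its predecessor, which is the familiar geometric taper for an inverter chain.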
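8 Limitations of Logical Effort. LE assumes no wires and no fixed side branches, so that all stage delays can be made equal. In logic with wires and branches, LE breaks down. [Figure: stage delays, equal without wires, unknown with wires and branches]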
9 Repeater Insertion (only wires). Without repeaters: Delay ~ Length² (example: D = RC = 25). With repeaters: Delay ~ Length (example: D = Σrc = 5). The optimal sizing and the optimal number of repeaters are expressed in terms of: the effective resistance of a minimal inverter; the wire resistance; the gate capacitance of a minimal inverter; the wire capacitance. H.B. Bakoglu, “Circuits, Interconnections and Packaging for VLSI,” Addison-Wesley, pp. 194-219, 1990
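The two numbers on this slide can be reproduced with a small arithmetic sketch. It assumes, as the figures 25 and 5 suggest, a wire with total resistance 5 and total capacitance 5 (arbitrary units) cut into 5 equal segments, and it idealizes the repeaters as adding no delay of their own; the function names are hypothetical.

```python
def unrepeated_delay(r_total, c_total):
    """RC product of the whole wire: grows with length squared."""
    return r_total * c_total

def segmented_delay(r_total, c_total, n_segments):
    """Sum of per-segment rc products when ideal repeaters isolate the segments."""
    r = r_total / n_segments
    c = c_total / n_segments
    return sum(r * c for _ in range(n_segments))

print(unrepeated_delay(5, 5))      # 25
print(segmented_delay(5, 5, 5))    # 5.0

# Doubling the length (same segment length, so twice as many segments):
print(unrepeated_delay(10, 10))    # 100  (4x: quadratic in length)
print(segmented_delay(10, 10, 10)) # 10.0 (2x: linear in length)
```

Real repeaters add their own delay, which is why the number and size of repeaters have an optimum rather than growing without bound.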
10 Properties of Repeater Insertion. Assumptions of basic repeater insertion (RI): equal size; equal spacing; terminal gates are similar to repeaters. Characteristics of RI: the number and the size of repeaters are independent; there is a single optimal size x for a given process and metal layer.
11 So, What Are We Going To Do?
12 We Are Breaking The Wall. Logic w/o wires: Logical Effort. Long wires: Repeater Insertion. WANTED: a solution for the mixed case. Challenges: gate placement; gate sizes; number of gates and repeaters.
13 Our Approach to Timing Optimization. Logic Gates as Repeaters (LGR): gate placement (along the wire). Unified Logical Effort (ULE): gate sizes. Gate-terminated Sized Repeater Insertion (GSRI): number of repeaters.
14 Logic Gates as Repeaters - LGR “Where should the gates be located (along the wire)?”
15 The Idea. Problem: delay reduction in logic with wires. A solution: wire segmenting by repeaters. Drawback: power and area spent without logical functionality are wasted. Proposed: logic gates as repeaters (LGR), i.e. distributing the logic gates over the interconnect and driving the partitioned wire without adding repeaters. K. Venkat, “Generalized Delay Optimization of Resistive Interconnections through an Extension of Logical Effort,” ISCAS 1993
16 LGR Delay Modeling. Total delay. M. Moreinis, A. Morgenshtein, I. Wagner, and A. Kolodny, “Logic Gates as Repeaters (LGR) for Area-Efficient Timing Optimization,” IEEE TVLSI, 2006
17 Optimal Wire Segmenting. If the output resistance of driving gate i is below average, the length of wire segment i is increased. If the input capacitance of successor gate i+1 is above average, the length of wire segment i is decreased. If all gates are equal, the partitioning is equal. In the case of a negative segment length, the neighboring gates are merged.
18 LGR Results. Critical path of a decoder circuit. Delay reduction of up to 27% by “moving” the gates. Further delay reduction by scaling and by LGR+RI. M. Moreinis, et al., “Repeater Insertion combined with LGR Methodology for on-Chip Interconnect Timing Optimization,” ICECS, 2004.
19 Optimal Gate Scaling. All gates are enlarged by a uniform factor S to minimize delay. Scaling can be performed iteratively with segmenting. For inverters, the segments are equal.
20 LGR Segmenting and Scaling. Uniform scaling is performed for all gates. For intermediate wires, LGR outperforms RI by up to 55%. For long wires, RI is faster, BUT it requires 44 repeaters. Best for long wires: combined LGR and RI. M. Moreinis, et al., “Repeater Insertion combined with LGR Methodology for on-Chip Interconnect Timing Optimization,” ICECS, 2004.
21 LGR Summary. Logic gates serve as repeaters. No need for logically redundant repeaters. Delay reduction plus lower area and power. Can be combined with RI.
22 Unified Logical Effort - ULE “What is the optimal size of the gates?”
23 Unified Delay Model (including wires) Capacitive interconnect effort Resistive interconnect effort ULE
24 Minimal Delay Condition ULE Minimal Delay Equal Stage Delays
25 Minimal Delay for Capacitive Wires Capacitive interconnect (short wires and branches) General RC interconnect ULE
26 ULE Convergence to LE and RI. Special cases of ULE: for logic without wires, ULE converges to Logical Effort; for repeater insertion, ULE converges to repeater scaling.
27 Some Algebra… ULE
28 Intuition of ULE Optimum. At the optimal gate size, the delay caused by the gate capacitance equals the delay caused by the gate resistance.
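The balance stated on this slide can be illustrated with a short derivation. The slide's own equations were not recovered, so the following is a sketch under an assumed Elmore-style model, not necessarily the authors' exact formulation: gate i of size $x_i$ has input capacitance $C_i = g_i C_0 x_i$ and output resistance $R_i = R_0 / x_i$, wire i (with resistance $R_{w,i}$ and capacitance $C_{w,i}$) follows gate i, and constant delay coefficients are dropped. Only two terms of the path delay depend on $x_i$:

```latex
\begin{aligned}
D(x_i) &= \left(\frac{R_0}{x_{i-1}} + R_{w,i-1}\right) g_i C_0 x_i
        + \frac{R_0}{x_i}\left(C_{w,i} + C_{i+1}\right) + \text{const} \\
\frac{\partial D}{\partial x_i} = 0
  \;&\Rightarrow\;
  \underbrace{\left(\frac{R_0}{x_{i-1}} + R_{w,i-1}\right) g_i C_0 x_i}_{\text{delay due to the capacitance of gate } i}
  \;=\;
  \underbrace{\frac{R_0}{x_i}\left(C_{w,i} + C_{i+1}\right)}_{\text{delay due to the resistance of gate } i}
\end{aligned}
```

In this model a gate that is too small makes the right-hand term dominate (high resistance), and a gate that is too large makes the left-hand term dominate (high capacitance).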
29 ULE Optimality. Size too small: high resistance. Size too big: high capacitance.
30 Optimal Gate Capacitance. Expression for the size of a single gate. Gate sizes along a logic path are determined iteratively.
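The iterative procedure can be sketched as follows. Since the slide's expression was not recovered, this sketch assumes a simple Elmore-style model normalized to R0 = C0 = 1, in which gate i has input capacitance g[i]*x[i] and output resistance 1/x[i], and each gate is resized in turn so that the delay due to its capacitance balances the delay due to its resistance. The function name `ule_sizing`, its parameters, and the update formula are assumptions of this sketch rather than the authors' published expression.

```python
from math import sqrt

def ule_sizing(g, rw, cw, x_first, c_load, sweeps=200):
    """Iteratively size gates 1..n-1 of a path; gate 0 keeps size x_first.

    g[i]   -- logical effort of gate i
    rw[i]  -- resistance of the wire after gate i (units of R0)
    cw[i]  -- capacitance of the wire after gate i (units of C0)
    c_load -- capacitance after the last wire (units of C0)
    """
    n = len(g)
    x = [x_first] * n
    for _ in range(sweeps):
        for i in range(1, n):
            # capacitance of the next gate, or the final load for the last gate
            c_next = g[i + 1] * x[i + 1] if i + 1 < n else c_load
            # resistance seen upstream: previous gate plus previous wire
            r_up = 1.0 / x[i - 1] + rw[i - 1]
            # balance r_up * g_i * x_i = (1 / x_i) * (cw_i + c_next)
            x[i] = sqrt((cw[i] + c_next) / (g[i] * r_up))
    return x

# Without wires, the result should fall back to Logical Effort sizing.
print(ule_sizing([1, 1, 1], rw=[0, 0, 0], cw=[0, 0, 0], x_first=1.0, c_load=64.0))
# close to [1, 4, 16]

# Hypothetical path with wires (values chosen only for illustration).
print(ule_sizing([1, 4 / 3, 1], rw=[0.5, 0.5, 0.5], cw=[20, 20, 20],
                 x_first=1.0, c_load=10.0))
```

A fixed number of sweeps is used for simplicity; a convergence check on the change in x would be a natural refinement.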
31 Examples (1): ULE Sizing. Equal wires, total electrical effort H = 10. L = 0: size converges to LE. Longer wires: ULE is faster. Long wires: fixed sizing x opt. [Plot: gate capacitance (×C0) vs. gate number, for LE and for ULE with L = 0, 10 µm, 50 µm, 100 µm, 0.5 mm, 1 mm]
32 Examples (2): ULE Sizing. Total electrical effort H = 1. L = 0: converges to LE (no scaling). All wire lengths: ULE is faster. Long wires: fixed sizing x opt. [Plot: gate capacitance (×C0) vs. gate number, for LE and for ULE with L = 0, 10 µm, 50 µm, 100 µm, 0.5 mm, L=1]
33 So, What is X opt ? For long wires ULE
34 Optimum Condition for Long Wires ULE For long wires
35 X opt and Repeaters. For equal wires and inverters (g = 1), x opt is the optimal sizing condition for a repeater. H.B. Bakoglu, “Circuits, Interconnections and Packaging for VLSI,” Addison-Wesley, pp. 194-219, 1990
36 Solving Design Problems with X opt. Layout constraint: find the optimal size of a repeater located between two wires.
37 Solving Design Problems with X opt. Cell size constraint: find the optimal wire length for a repeater of size x rep.
38 Typical Design Example. Optimal ULE sizing for (a) similar gates, similar wires; (b) different gates, similar wires; (c) similar gates, different wires. Gates with higher logical effort get a bigger size. There is no fixed x opt in circuits with various gates and wires.
39 ULE Results. Simulation setup: critical path in a logic circuit (e.g. an adder), 65 nm CMOS. ULE is compared to the Cadence Virtuoso® Analog Optimizer (which uses numerical algorithms).
40 Delay Optimization. LE becomes inaccurate as the wire length grows. ULE is within 9% of the Analog Optimizer tool. ULE: minimal delay. Analog Optimizer: minimal delay (but sloooooow). Logical Effort: higher delay.
41 Run Time Comparison. ULE run time is orders of magnitude shorter than the run time of the Analog Optimizer. ULE run time is shorter than 1 second. [Plot: run time [min]]
42 Power-Delay Optimization in ULE. Power is a function of the gate and wire capacitances. Optimal gate size C i.
43 Sizing for Minimal P×D. A random logic path with 10 stages is assumed (gates x1 to x10, connected by wires L1 to L9). Four wire-length scenarios: S1: all wires L = 100 µm; S2: all wires L = 80 µm; S3: all wires L = 400 µm; S4: L = {900, 600, 150, 300, 800, 200, 400, 150, 250}. Power-delay optimization reduces the gate sizes compared to delay optimization. [Plot (S4): gate size (×C0) per gate, minimal delay vs. minimal power×delay]
44 Reduced Energy, Low Delay Penalty. [Plots: energy [pJ] and delay [ps] for scenarios S1 to S4, minimal power-delay vs. minimal delay]
45 ULE for Branches and Fanout General ULE condition for gate sizing ULE
46 ULE Sizing in Path with Branches. Lw = 100 µm for all wires on the critical path. Four branch scenarios: S1: Lb = 400 µm, Cb = 1 for all branches; S2: Lb = 400 µm, Cb = 30 for all branches; S3: Lb = {400, 100, 400, 400} µm, Cb = {30, 1, 30, 1}; S4: Lb = {100, 100, 100, 400} µm, Cb = {1, 1, 1, 30}. Branches cause a change in sizing compared to ULE without branches.
47 ULE Delay Optimization with Branches. Additional delay reduction is obtained by using the extended ULE condition with branches.
48 Unified Logical Effort Summary. Useful over the entire range of problems: logic only, logic and wires, wires only. Computes optimal gate sizes. Low computational complexity.
49 One More Question: “When can I reduce delay by adding an inverter?”
50 Adding an Inverter to Reduce Delay condition for inverter insertion ULE
51 Inverter Addition vs. Gate Sizing. L = 1000 µm; X1 and X3 are variables. Inverter insertion depends on the value and the ratio of the gate sizes X1 and X3. The size of the inverter X2 is determined from ULE.
52 Inverter Addition: More Applications. No wires: beneficial when the electrical effort is higher than 4. Equal wires (vs. wire length): beneficial when the wire is longer than Lcr. Power: beneficial when the expected delay reduction is more than ∆.
53 Example: Critical Wire Length. The critical length Lcr for inverter insertion depends upon the minimal delay reduction factor ∆. The size of the inverter X2 is determined from ULE. [Plot: critical length Lcr (µm) vs. ∆]
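The no-wire threshold of 4 can be checked with a standard Logical Effort argument. The slide's own condition was not recovered, so this is an assumed model: a single inverter driving electrical effort H has normalized delay H + p, while two optimally sized inverters have delay 2·sqrt(H) + 2p. With the parasitic delay p neglected, the second option wins exactly when H > 4. The function names below are hypothetical.

```python
from math import sqrt

def one_stage_delay(h_total, p=0.0):
    """Normalized delay of a single inverter driving electrical effort h_total."""
    return h_total + p

def two_stage_delay(h_total, p=0.0):
    """Normalized delay after adding an inverter; each stage bears sqrt(h_total)."""
    return 2.0 * sqrt(h_total) + 2.0 * p

def inverter_helps(h_total, p=0.0):
    """True when the two-stage path is strictly faster than the single stage."""
    return two_stage_delay(h_total, p) < one_stage_delay(h_total, p)

print(inverter_helps(3.5))         # False: below the threshold of 4
print(inverter_helps(4.5))         # True: above the threshold of 4
print(inverter_helps(5.0, p=1.0))  # False: parasitic delay raises the threshold
```

With p = 1 the same algebra gives a threshold of (1 + sqrt(2))², roughly 5.8, so the value 4 corresponds to neglecting parasitic delay in this model.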
54 Gate-Terminated Sized Repeater Insertion - GSRI “What is the optimal number of gates/repeaters?”
55 Revisiting Standard Repeater Insertion. RI assumptions: fixed and equal sizes; terminal gates are similar to repeaters. BUT: the wires are usually located between different logic gates, and different repeater sizes may be chosen. Gate-Terminated Sized Repeater Insertion (GSRI) is proposed.
56 Delay Model of Logic with Repeaters GSRI
57 Delay Minimization by GSRI. RI assumptions: long wires; terminal gates are repeaters; many repeaters (K >> 1). H.B. Bakoglu, “Circuits, Interconnections and Packaging for VLSI,” Addison-Wesley, pp. 194-219, 1990
58 Example: Single Wire. How many repeaters? RI: 2. GSRI: 4. Why? The first gate is weaker than the repeater (the RI assumption is inaccurate).
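The effect of a weak first gate on the repeater count can be illustrated with a brute-force search. This is a sketch under an assumed Elmore-style model normalized to R0 = C0 = 1, with equally spaced repeaters of a given size; it is not the analytic GSRI expression from the talk, and the wire and gate values are invented, so it does not reproduce the slide's counts of 2 and 4.

```python
def wire_delay(k, rw_total, cw_total, x_first, x_rep, c_last):
    """Delay of a wire driven by a gate of size x_first, with k equally spaced
    repeaters of size x_rep and a receiving gate of input capacitance c_last."""
    n = k + 1                       # number of wire segments
    r = rw_total / n
    c = cw_total / n
    total = 0.0
    for j in range(n):
        x_drv = x_first if j == 0 else x_rep    # first segment: terminal gate
        c_load = x_rep if j < k else c_last     # last segment: receiving gate
        total += (1.0 / x_drv) * (c + c_load) + r * (c / 2.0 + c_load)
    return total

def best_repeater_count(rw_total, cw_total, x_first, x_rep, c_last, k_max=50):
    """Exhaustive search for the repeater count with the smallest delay."""
    return min(range(k_max + 1),
               key=lambda k: wire_delay(k, rw_total, cw_total, x_first, x_rep, c_last))

# RI-style assumption: the terminal gates look like repeaters.
k_ri = best_repeater_count(4.0, 40.0, x_first=3.0, x_rep=3.0, c_last=3.0)
# Gate-terminated case: the first gate is weaker than the repeaters.
k_gate = best_repeater_count(4.0, 40.0, x_first=1.0, x_rep=3.0, c_last=3.0)
print(k_ri, k_gate)
```

In this model a weaker driver adds a delay term that shrinks as the first segment gets shorter, so the optimal count can only stay the same or grow compared to the RI-style assumption.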
59 Number of Repeaters in Logic Path. Setup: ALU critical path, 65 nm process; several wire-length scenarios; ULE sizing performed before GSRI. GSRI allows optimization of shorter wires than RI. The number of repeaters per wire is not equal in GSRI: a higher electrical effort leads to more repeaters.
60 Delay Reduction by GSRI. GSRI results in up to 25% delay reduction compared to RI. ULE further reduces the delay by up to 27%, mostly in short wires. [Plot legend: ULE sizing w/o repeaters; RI/GSRI; ULE sizing on repeaters]
61 GSRI Followed by ULE Sizing. Two alternatives for ULE sizing: (1) sizing of the repeaters only, without sizing the gates, which is power-efficient; (2) sizing of the entire path, including the gates and the repeaters, which gives the lowest delay. [Plot: size (×C0)]
62 Using Smaller Repeaters. A smaller size leads to more repeaters. Power may decrease for a higher number of smaller repeaters: many smaller repeaters give a reduced transition time and therefore lower short-circuit currents. Result: 17% delay reduction and 15% power reduction. [Plots: delay [ps]; power [pW]]
63 Additional Perspective. GSRI may provide a smaller delay with smaller repeaters than RI. Power-aware RI will lead to a higher delay penalty than currently assumed.
64 Gate-terminated Sized Repeater Insertion Summary. Accurate number of repeaters: terminal gates ≠ repeaters. Supports smaller repeaters. Analytic expression: no more “rules of thumb”. Minimal delay: GSRI delay < standard RI delay.
65 Summary of Approaches ULE GSRI LGR
66 Summary. LE: only logic. RI: only wires. We propose a general solution for logic with wires. Unified Logical Effort (ULE): fast sizing of gates in the presence of interconnect; intuitive conditions for minimal delay. Gate-terminated Sized Repeater Insertion (GSRI): accurate optimal number of repeaters; enhanced design flexibility and smaller delay than in RI. Logic Gates as Repeaters (LGR): distribution of logic gates over interconnect; delay optimization without logically redundant repeaters.
67 Future Work. Analyzing wire sizing. Developing power-efficient heuristics. Incorporating inductance. Integration in EDA tools.
68 Thank You!