1
VLSI Arithmetic Lecture 8
Prof. Vojin G. Oklobdzija University of California
2
Designing for Speed and Power
Ultimate Speed Adders, IEEE Trans on Electronic Computers, April, 1963 – correspondence between Sklansky and Lehman: Sklansky: “Consequently the question: “Which adder is the fastest?” is an impossibly difficult question if we define adder speed as the contribution of an adder to the over-all computational effectiveness.” June 18, 2003
3
Designing for Speed and Power
Ultimate Speed Adders, IEEE Trans on Electronic Computers, April, 1963 – correspondence between Sklansky and Lehman: Sklansky: “At this point we find that we still cannot answer the proposed question, for the following reason: we have not yet defined a unit of addition time. The natural unit to adopt is the delay of a single AND gate or OR gate. In practice, however, the speeds of these gates are dependent on several factors of which gate delay is only one.”
4
Designing for Speed and Power
Ultimate Speed Adders, IEEE Trans on Electronic Computers, April, 1963 – correspondence between Sklansky and Lehman: Sklansky: “When in this communication we answer the question: ‘Which adder is the fastest?’ we shall really be answering the following more restricted question: ‘Which binary parallel adder consumes the fewest gate delays in adding two summands, under the constraint that the fan-in and fan-out capacities of the individual gates do not exceed certain specified limits?’”
5
Designing for Speed and Power
Ultimate Speed Adders, IEEE Trans on Electronic Computers, April, 1963 – correspondence between Sklansky and Lehman: Lehman: “Continued study of the binary adder design problem.. has convinced me that the search after a “best”, a “fastest” or a “most efficient” circuit is futile.” “Moreover at the highest speeds the logically faster circuit is not necessarily physically faster. Increased cabling and higher component densities in the more complex circuits may often do more harm than good”.
6
Designing for Speed and Power
Ultimate Speed Adders, IEEE Trans on Electronic Computers, April, 1963 – correspondence between Sklansky and Lehman: Lehman: “Thus I believe that in the future too, devices with fan-in and fan-out each of order five or more will be perfectly practical, yielding circuits as fast as and probably cheaper than those based on gates with more restricted operation conditions.” “How realistic is an assessment based on circuits with a fan-in of two and a large or even unrestricted fan-out, for example?”
7
Designing for Speed and Power
Ultimate Speed Adders, IEEE Trans on Electronic Computers, April, 1963 – correspondence between Sklansky and Lehman: Lehman: “Thus I do not believe that Sklansky has satisfactorily answered his question: ‘Which is the faster adder?’. In fact this question appears to me to be meaningless. If as seems reasonable, we define an adder as the physical realization of some logical scheme for achieving (binary) addition within some larger systems, no absolutely ‘fastest adder’ can exist.”
8
Design of Power Efficient VLSI Arithmetic: Speed and Power Trade-offs
Vojin G. Oklobdzija, Ram Krishnamurthy
Intel AMR / ACSEL Laboratory, Intel Corp. / University of California, Davis
Tutorial presentation, 16th International Symposium on Computer Arithmetic, Santiago de Compostela, SPAIN, June 18, 2003
9
Issues to be addressed
How do we compare different topologies for their efficiency?
How do we estimate speed and efficiency of our algorithm?
What criteria should we use when developing a new algorithm?
How does power enter into this equation?
10
Additional Issues
Determine which topology is the best for a given power or delay budget
Determine which topology can stretch the furthest in terms of speed or power
11
Metric
12
Previously used estimates
Counting the number of gates (logic levels): not accurate
13
Critical path in Motorola's 64-bit CLA
Unlike in ripple-carry or carry-skip adders, where the critical path runs horizontally as shown in the previous slide, the critical path in the carry-lookahead adder (CLA) runs vertically through the lookahead levels. The delay of a CLA is therefore not directly proportional to the adder size N, but to the number of levels used. Because the groups and super-groups of a CLA form a tree structure, its delay is proportional to log N. This logarithmic dependency makes the CLA one of the theoretically fastest structures for addition. However, it can be argued that the speed efficiency of the CLA has passed the point of diminishing returns, given the fan-in and fan-out dependencies of the logic gates and the inadequacy of a delay model based on counting the number of gates in the critical path. In practice, the CLA achieves lower speed than expected, especially when compared to some techniques that consume less hardware. An example of a carry-lookahead adder and its critical path, as implemented in a Motorola processor, is shown in this slide.
14
Motorola's 64-bit CLA Modified PG Block
Intermediate propagate signals $P_{i:0}$ are generated to speed up $C_3$
15
Fan-In and Fan-Out Dependency (Oklobdzija, Barnes: IBM 1985)
16
Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Complexity
17
Design Objective
Design takes time: finding results afterward is not of much value
There is a disconnect between the measures used in computer arithmetic when developing an algorithm and what is obtained after implementation; we want estimates as close as possible to the measured results
A simple tool that can evaluate different design trade-offs for a given technology is needed
Power trade-off is the most important: speed and power are tradable
18
Logical Effort Theory
“Back of the Envelope” complexity: good for estimating speed
Gate delay = linear function of load
Slope: logical effort (gate driving characteristics)
Intercept: parasitic delay (gate internal load)
“Logical Effort” accuracy is not sufficient
We needed to extend and refine the method
However, that becomes more than “Back of the Envelope”
Logical Effort does not account for possible power-delay trade-offs
19
Logical Effort Theory
Excel: a platform of choice (ARITH-16)
Simple enough
Can provide computation quickly
Easy to enter a given design
Technology characterization is needed; this needs to be done only once and is available for every design afterwards
Domino gate = 2 stages, dynamic and static, with different driving characteristics
Multi-output gates (carry-lookahead, Ling/conditional sum)
Energy model needs to be included
20
Energy Motivation
AGUs: performance and peak-current limiters
High activity: thermal hotspot
Goal: high-performance energy-efficient design
[Figure: processor thermal map, temperature in °C, showing the cache, the execution core, and the AGU at 120 °C. Courtesy of Intel Corp.]
21
Kogge-Stone Adder
Critical path = PG + 5 + XOR = 7 gate stages
Generate, propagate fanout of 2, 3
Maximum interconnect spans 16b
Energy inefficient
[Figure: 32-bit Kogge-Stone prefix tree with rows of carry-merge gates followed by the sum XOR]
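The stage count on this slide can be illustrated with a small bit-parallel model. This is only a behavioural sketch, not the circuit from the tutorial: the function name `kogge_stone_add` and the default 32-bit width are assumptions made for the example. For 32 bits the loop runs five carry-merge levels, which together with the PG stage and the final XOR gives the seven stages quoted above.

```python
def kogge_stone_add(a: int, b: int, width: int = 32):
    """Return (sum, carry_out, levels) for a width-bit Kogge-Stone addition."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    p = a ^ b   # per-bit propagate, reused by the final sum XOR
    g = a & b   # per-bit generate
    G, P = g, p
    levels = 0
    dist = 1
    while dist < width:
        # Carry-merge at every bit position in parallel:
        # G <- G | (P & G_lower), P <- P & P_lower,
        # where *_lower is the group ending `dist` positions below.
        G, P = (G | (P & (G << dist))) & mask, (P & (P << dist)) & mask
        dist *= 2
        levels += 1
    carries = (G << 1) & mask            # carry into bit i = group generate of bits i-1..0
    total = p ^ carries                  # sum XOR stage
    carry_out = (G >> (width - 1)) & 1
    return total, carry_out, levels


total, carry_out, levels = kogge_stone_add(0xFFFFFFFF, 1)
print(total, carry_out, levels)
```

The model assumes no carry-in, so the group propagate of the lowest bits is never needed once their generate signals are final.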
22
Sparse-tree Adder Architecture
Generate every 4th carry in parallel
Side-path: 4-bit conditional sum generator
73% fewer carry-merge gates: energy-efficient
23
Kogge-Stone adder (8-stage)
$D = 8\,(GBH)^{1/8} + P$
24
MXA2 – Architecture & Result
Multiplexer-based
Generate carries using radix-2 (P,G)
4-bit conditional sum selected by carries
4-b cell width = 17 µm
9-stage critical path
Per-stage effort = 3.7
Total effort delay = 33.3
Total parasitic = 22.5
Total delay = 55.8
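The MXA2 figures follow the usual logical-effort bookkeeping: effort delay is the number of stages times the per-stage effort, and the total adds the path parasitic. A minimal check of that arithmetic, with the helper name `path_delay` chosen only for this example:

```python
def path_delay(stages: int, stage_effort: float, total_parasitic: float):
    """Return (effort_delay, total_delay) for a path with equal stage efforts."""
    effort_delay = stages * stage_effort
    return effort_delay, effort_delay + total_parasitic


# Values quoted on the MXA2 slide: 9 stages, effort 3.7 per stage, parasitic 22.5.
effort, total = path_delay(9, 3.7, 22.5)
print(round(effort, 1), round(total, 1))
```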
25
HC2 – Architecture
Generate even carries using radix-2 (P,G)
Generate odd carries from even carries
CMOS adder for sum
1-b cell width 4 µm
10-stage critical path
26
HC2 – Circuits & Results
27
KS2 – Architecture & Results
Generate carries using radix-2 (P,G)
CMOS adder for sum
Similar circuits as HC2
1-b cell width 4 µm
9-stage critical path
28
KS4 – Architecture
Generate carries using redundant radix-4 (P,G)
Dynamic circuit
1-b cell width 4 µm
6-stage critical path
29
KS4 – Circuits & Result
30
CLA4 – Architecture
Generate carries using radix-4 (P,G,C)
1-b cell width 4 µm
15-stage critical path
[Figure: (P,G,C) network with the G-path and P-path marked]
31
CLA4 – Circuits & Result
32
LNG4 – Architecture
Generate carries using Ling pseudo-carries
Conditional sums selected by local & long carries
1-b cell width 5.1 µm; 9-stage critical path
33
LNG4 – Circuits & Result
34
Results from Simulation
Fairly consistent with logical effort analysis
Per-stage delay: 1.4 FO4 (static), 0.8 FO4 (dynamic)
35
Delay of Representative 64-b Adders
36
What happens when power is considered?
37
Energy-Delay Space
Comparison must be done in the energy-delay space
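One way to read a comparison in the energy-delay space is to keep only the designs that no other design beats on both axes. The sketch below filters such a set; the design names and the (delay, energy) numbers are hypothetical placeholders, not results from the tutorial.

```python
def pareto_front(points):
    """points maps a design name to (delay, energy); return non-dominated names."""
    front = []
    for name, (d, e) in points.items():
        dominated = any(
            d2 <= d and e2 <= e and (d2 < d or e2 < e)
            for other, (d2, e2) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    # Order the surviving designs from fastest to slowest.
    return sorted(front, key=lambda n: points[n][0])


# Hypothetical (delay, energy) pairs in arbitrary units.
designs = {"A": (10.0, 5.0), "B": (12.0, 3.0), "C": (13.0, 4.0), "D": (15.0, 2.0)}
print(pareto_front(designs))
```

Design C is dropped here because B is both faster and lower in energy; the remaining designs each represent a different speed-for-power trade.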
38
Logical Effort
39
Delay in a Logic Gate
Delay of a logic gate has two components: $d = f + p$, where $f$ is the effort delay (stage effort) and $p$ is the parasitic delay
Effort delay: $f = gh$, where $g$ is the logical effort and $h$ is the electrical effort
Logical effort describes the relative ability of a gate topology to deliver current (defined to be 1 for an inverter)
Electrical effort is the ratio of output to input capacitance, $h = C_{out}/C_{in}$; it is also called “fanout”
*from Mathew Sanu / D. Harris
40
Logical Effort Parameters: Inverter
Delay: $d = gh + p$
$g = 2.2$ (logical effort)
$p = 3.8$ ps (parasitic delay)
Fanout: $h = C_{out}/C_{in}$
Delay increases linearly with fanout
More complex gates have greater $g$ and $p$
*from Mathew Sanu / D. Harris
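The linear model on this slide can be evaluated directly. The sketch below assumes that the quoted slope of 2.2 is in picoseconds per unit of fanout, matching the 3.8 ps parasitic term; the slide does not state the unit of the slope, so that reading is an assumption, as is the function name `gate_delay`.

```python
def gate_delay(h: float, g: float = 2.2, p: float = 3.8) -> float:
    """Delay d = g*h + p for fanout h = Cout/Cin (assumed to be in ps)."""
    return g * h + p


# Delay grows linearly with fanout; h = 4 is the familiar FO4 load.
for h in (0, 1, 2, 4):
    print(h, round(gate_delay(h), 2))
```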
41
Normalized Logical Effort: Inverter
Inverter: $g = 1$, $p = 1$
Normalized delay: $d = gh + p = h + 1$ (effort delay $h$ plus parasitic delay 1)
Fanout: $h = C_{out}/C_{in}$
Define delay of unloaded inverter = 1
Define logical effort ‘g’ of inverter = 1
Delay of complex gates can be defined w.r.t. d = 1
[Figure: normalized delay d versus fanout h for the inverter]
*from Mathew Sanu / D. Harris
42
Computing Logical Effort
DEF: Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current
Measured from delay vs. fanout plots of simulated gates
Or estimated, counting capacitance in units of transistor width W
*from Mathew Sanu / D. Harris
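The estimate by counting transistor widths can be sketched for the textbook static NAND and NOR gates. The sizing rules used here (series devices widened by the stack height, pMOS to nMOS width ratio of 2 in the reference inverter) are the standard hand-analysis assumptions, not values taken from the tutorial's characterization.

```python
from fractions import Fraction


def logical_effort_nand(n_inputs: int, pn_ratio: int = 2) -> Fraction:
    """Input capacitance of an n-input NAND relative to an equal-drive inverter."""
    # Series nMOS are n_inputs wide; parallel pMOS keep the inverter width.
    c_in_gate = n_inputs + pn_ratio
    c_in_inverter = 1 + pn_ratio
    return Fraction(c_in_gate, c_in_inverter)


def logical_effort_nor(n_inputs: int, pn_ratio: int = 2) -> Fraction:
    """Input capacitance of an n-input NOR relative to an equal-drive inverter."""
    # Parallel nMOS keep unit width; series pMOS are n_inputs times wider.
    c_in_gate = 1 + n_inputs * pn_ratio
    c_in_inverter = 1 + pn_ratio
    return Fraction(c_in_gate, c_in_inverter)


print(logical_effort_nand(2), logical_effort_nor(2))
```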
43
L.E for Adder Gates
Logical effort parameters obtained from simulation for std cells
Define logical effort ‘g’ of inverter = 1
Delay of complex gates can be defined w.r.t. d = 1
*from Mathew Sanu / D. Harris
44
Normalized L.E
Gate type, Logical Eff. (g), Parasitics (Pinv)
Inverter 1
Dyn. Nand 0.6 1.34
Dyn. CM 1.62
Dyn. CM-4N 3.71
Static CM 1.48 2.53
Mux 1.68 2.93
XOR 1.69 2.97
Logical effort & parasitic delay normalized to that of inverter
*from Mathew Sanu
45
Delay of a string of gates
Delay of a path: $D = \sum_i d_i = \sum_i g_i h_i + \sum_i p_i$
$g_i$ and $p_i$ are constants
To minimize path delay, optimal values of $h_i$ are to be determined
$D$ is minimized when each stage bears the same effort, i.e. $g_i h_i = g_{i+1} h_{i+1}$
*from Mathew Sanu / D. Harris
46
Minimizing path delay
Logical effort of a string of gates: $G = \prod_i g_i$
Path electrical effort: $H = \prod_i h_i = C_{out}(\mathrm{path}) / C_{in}(\mathrm{path})$
Branching effort: $b = (C_{on\text{-}path} + C_{off\text{-}path}) / C_{on\text{-}path}$
Path branching effort: $B = \prod_i b_i$
Path effort: $F = GBH$
Delay is minimized when each stage bears the same effort: $f = g_i h_i = F^{1/N}$
The minimum delay of an N-stage path is $N F^{1/N} + P$
*from Mathew Sanu / D. Harris
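The path formulas above translate into a few lines of code. This is a sketch with illustrative names (`min_path_delay` and its arguments are not from the tutorial), and the three-inverter example is hypothetical.

```python
import math


def min_path_delay(g, b, H, p):
    """Return (stage_effort, min_delay) for an N-stage path.

    g: logical efforts per stage, b: branching efforts per stage,
    H: path electrical effort Cout/Cin, p: parasitic delays per stage.
    """
    N = len(g)
    G = math.prod(g)            # path logical effort
    B = math.prod(b)            # path branching effort
    F = G * B * H               # path effort F = GBH
    f = F ** (1.0 / N)          # equal effort borne by every stage
    return f, N * f + sum(p)    # D = N * F^(1/N) + P


# Hypothetical path: three inverters, no branching, driving 64x the input load.
f, D = min_path_delay(g=[1, 1, 1], b=[1, 1, 1], H=64, p=[1, 1, 1])
print(round(f, 3), round(D, 3))
```

With equal stage efforts each inverter here sees a fanout of about 4, which is the familiar rule of thumb for sizing buffer chains.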
47
Inclusion of Wire Delay into Logical Effort
48
Wiring Load
Wiring in hand analysis: only lumped capacitance included
Wiring in HSPICE: short wire, 1-segment π-model RC network; long wire, 4-segment π-model RC network; using worst-case wire capacitance
Wire length: estimated from most critical 1-bit pitch
49
Modeling interconnect cap.
Include interconnect cap in branching factor
Without interconnect: $b = (C_{on\text{-}path} + C_{off\text{-}path}) / C_{on\text{-}path} = 2$
With interconnect: $b = (C_{on\text{-}path} + C_{off\text{-}path} + C_{int}) / C_{on\text{-}path} = 2 + C_{int}/C_{on\text{-}path} = 2 + I$
$I$: ratio of interconnect cap to gate cap in 1 adder bitpitch
[Figure: PG cell driving two CM0 cells across adder bitpitches, without and with the interconnect capacitance Cint]
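The branching factor with interconnect can be written as a one-line helper. The function name and the sample capacitances are assumptions for the example; the case shown matches the slide's setting, where the on-path and off-path gate loads are equal.

```python
def branching_effort(c_on_path: float, c_off_path: float, c_int: float = 0.0) -> float:
    """b = (C_on-path + C_off-path + C_int) / C_on-path."""
    return (c_on_path + c_off_path + c_int) / c_on_path


# Equal on-path and off-path loads give b = 2 without wire;
# wire capacitance equal to half a gate load (I = 0.5) gives b = 2 + I.
print(branching_effort(1.0, 1.0), branching_effort(1.0, 1.0, 0.5))
```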
50
Branching
Logical Effort assumes the “branching” factor of this circuit to be 2. This is incorrect and can create inaccuracies
[Figure: branching circuit with gates g0, g1, g2, g3]
51
Correction on Branching
$f_0 = f_1$, $f_2 = f_3$
$T_{d1} = f_0 + f_1 + \text{parasitics}$
$T_{d2} = f_2 + f_3 + \text{parasitics}$
Minimum delay occurs when $T_{d1} = T_{d2}$
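The balancing condition can be explored numerically. The sketch below assumes a simplified model that the slide does not spell out: a shared node with a fixed capacitance budget drives two independent two-stage branches, each sized for equal stage efforts so that its delay is $2\sqrt{G\,C_{load}/C_{in}} + P$. The function names and the bisection approach are choices made for this example.

```python
import math


def branch_delay(G: float, c_load: float, c_in: float, P: float) -> float:
    """Delay of a two-stage branch with equal stage efforts."""
    return 2.0 * math.sqrt(G * c_load / c_in) + P


def balance_branches(c_total: float, branch1, branch2, iters: int = 200):
    """Split c_total between two branch inputs so that Td1 == Td2.

    Each branch is (G, c_load, P). Returns (c_in_branch1, effective_branching),
    where effective_branching = c_total / c_in_branch1.
    """
    G1, C1, P1 = branch1
    G2, C2, P2 = branch2
    lo, hi = 0.0, c_total
    for _ in range(iters):
        x = 0.5 * (lo + hi)
        if x <= lo or x >= hi:   # interval exhausted at floating-point resolution
            break
        td1 = branch_delay(G1, C1, x, P1)
        td2 = branch_delay(G2, C2, c_total - x, P2)
        if td1 > td2:
            lo = x               # branch 1 is slower: give it more input capacitance
        else:
            hi = x
    x = 0.5 * (lo + hi)
    return x, c_total / x


print(balance_branches(10.0, (1.0, 8.0, 2.0), (1.0, 8.0, 2.0)))
print(balance_branches(10.0, (1.0, 32.0, 2.0), (1.0, 8.0, 2.0)))
```

In this model the effective branching factor is 2 for identical branches and moves away from 2 as soon as the branch loads differ, which is the kind of correction that is awkward to do by hand.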
52
“Real” Branching Calculation
Branching only equals 2 when:
This explains why we had to resort to Excel!
53
Technology Characterization
54
Characterization Setup
Logical Effort requirements:
Equalize input and output transitions.
Logical Effort is characterized by varying the h (Cout/Cin) of a gate.
By using a variable load of inverters, each gate can be characterized over the same range of loads.
The Logical Effort of each gate is characterized for each input.
Energy is characterized for each output transition of the gate caused by each input transition, e.g. for an inverter, energy is measured for tLH and tHL.
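Extracting the two parameters from a load sweep amounts to fitting a straight line to delay versus h: the slope is the effort term and the intercept is the parasitic delay. The sketch below uses an ordinary least-squares fit; the sample points are synthetic, generated from the inverter line quoted earlier in the deck, and do not come from any simulation.

```python
def fit_delay_line(fanouts, delays):
    """Least-squares fit of delay = slope * h + intercept."""
    n = len(fanouts)
    mean_h = sum(fanouts) / n
    mean_d = sum(delays) / n
    sxx = sum((h - mean_h) ** 2 for h in fanouts)
    sxy = sum((h - mean_h) * (d - mean_d) for h, d in zip(fanouts, delays))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_h
    return slope, intercept


# Synthetic sweep over h = 1..6 using d = 2.2*h + 3.8 (illustrative only).
hs = [1, 2, 3, 4, 5, 6]
ds = [2.2 * h + 3.8 for h in hs]
slope, intercept = fit_delay_line(hs, ds)
print(round(slope, 3), round(intercept, 3))
```

Dividing a gate's fitted slope and intercept by the inverter's values gives normalized parameters in the form used by the normalized tables.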
55
LE Characterization Setup for Static Gates
[Figure: static gate with input In driving a variable load; tLH, tHL, and average energy are measured]
56
LE Characterization Setup for Dynamic Gates
[Figure: dynamic gate with input In driving a variable load; tHL and energy are measured]
57
LE Table (Static CMOS)
Technology: P/N Ratio = 2
INV = 3.67, pINV = 4.29
Measured on worst-case single-input switching
58
Static CMOS Gates: Delay Graphs
59
Static Gates: Pull-up Delay Graph
60
LE Table (Dynamic CMOS)
Technology: Minimum-sized keeper included
Measured on all-input switching of worst path
61
Dynamic CMOS: Delay Graphs
62
Dynamic CMOS: Delay Graphs
63
Energy Calculation
64
Energy Calculation
16X Minimal Size Dyn-NAND
8X Minimal Size Dyn-NAND
65
Energy Calculation
66
Energy Calculation
67
Energy Calculation: NAND-2
68
Examples
69
64-Bit Adders
Han-Carlson (prefix-2, HC2): Static and Dynamic
Han-Carlson (prefix-2, HC2-2): Dynamic-Static
Kogge-Stone (prefix-2, KS2): Static and Dynamic
Kogge-Stone (prefix-2, KS2-2): Dynamic-Static
Quaternary-Tree (prefix-2, QT2): Static and Dynamic
Included wire delay, $t_{delay} = 0.7\,R_{wire} C_{wire}$
Included wire energy, $E_w = C_{wire} V^2$
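The two wire terms can be evaluated with a pair of helpers. The resistance, capacitance, and supply values below are hypothetical placeholders chosen only to show the units; they are not the wire parameters used in the tutorial.

```python
def wire_delay(r_wire: float, c_wire: float) -> float:
    """Wire delay t = 0.7 * R_wire * C_wire, in seconds for ohms and farads."""
    return 0.7 * r_wire * c_wire


def wire_energy(c_wire: float, vdd: float) -> float:
    """Wire energy E_w = C_wire * V^2, in joules for farads and volts."""
    return c_wire * vdd ** 2


# Hypothetical wire: 100 ohm, 200 fF, 1.2 V supply.
print(wire_delay(100.0, 200e-15), wire_energy(200e-15, 1.2))
```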
70
Test Setup
$H = (C_{in} + C_{wire}) / C_{in}$
[Figure: 64-bit adder with inputs A0 to A63 and outputs S0 to S63; 1 mm wire with capacitance Cwire]
71
Energy-Delay Estimates
72
Adders: Energy
[Figure: energy of the KS, HC, and QT adders, with groups labeled Dynamic, Static, and Dynamic-Static]
73
Dynamic Static Implementation of Carry-Merge stage
[Figure: regular domino implementation and compound-domino implementation, with the inverters to be eliminated marked]
74
Energy-Delay comparison of 64-bit KS, HC and QT adders
75
Adders: Critical Path Energy
[Figure: critical-path energy curves labeled QT dynamic-static, HC dynamic, KS dynamic, HC dynamic-static, QT static, KS dynamic-static, HC static, KS static]
76
Intel 32-bit Adder, 0.13 µm, 1.2 V [VLSI-2002]
[Figure: curves labeled KS, KS estimated, QT, QT estimated]
77
Energy-Delay comparison of 32-bit QT and KS adders: estimated vs. simulation in 0.10 µm technology
78
Est. Results: All Adders w/o Wires
79
Est. Results: All Adders w/ Wires
80
Conclusion
Using realistic measures for comparing various designs leads to better design choices
Power is as important as speed
Making comparison in Energy-Delay space is necessary: power can always be traded for speed and vice versa
Wire effects are significant
Leakage currents?