Published by Lorena Barnett. Modified over 8 years ago.
1
CSE477 L20 Adder Design, Irwin & Vijay, PSU, 2003
CSE477 VLSI Digital Circuits, Fall 2003
Lecture 20: Adder Design
Mary Jane Irwin (www.cse.psu.edu/~mji)
www.cse.psu.edu/~cg477
[Adapted from Rabaey's Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
2
Review: Basic Building Blocks
Datapath
- Execution units: adder, multiplier, divider, shifter, etc.
- Register file and pipeline registers
- Multiplexers, decoders
Control
- Finite state machines (PLA, ROM, random logic)
Interconnect
- Switches, arbiters, buses
Memory
- Caches (SRAMs), TLBs, DRAMs, buffers
3
The 1-bit Binary Adder
1-bit Full Adder (FA): inputs A, B, C_in; outputs S, C_out
S = A ⊕ B ⊕ C_in
C_out = A&B | A&C_in | B&C_in (majority function)

A  B  C_in  C_out  S  carry status
0  0  0     0      0  kill
0  0  1     0      1  kill
0  1  0     0      1  propagate
0  1  1     1      0  propagate
1  0  0     0      1  propagate
1  0  1     1      0  propagate
1  1  0     1      0  generate
1  1  1     1      1  generate

G = A & B
P = A ⊕ B
K = !A & !B
S = P ⊕ C_in
C_out = G | P&C_in

- How can we use it to build a 64-bit adder?
- How can we modify it easily to build an adder/subtractor?
- How can we make it better (faster, lower power, smaller)?
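The full adder equations above can be checked exhaustively in a few lines. This is a minimal Python sketch, not part of the original slides; the function names `full_adder` and `carry_status` are illustrative choices.

```python
from itertools import product

def full_adder(a, b, c_in):
    """Return (sum, carry_out) for three input bits, using G and P."""
    g = a & b              # generate: both inputs are 1
    p = a ^ b              # propagate: exactly one input is 1
    s = p ^ c_in           # S = P xor C_in
    c_out = g | (p & c_in) # C_out = G | P & C_in
    return s, c_out

def carry_status(a, b):
    """Classify an input pair as kill, propagate, or generate."""
    if a & b:
        return "generate"
    if a ^ b:
        return "propagate"
    return "kill"

# Reproduce the truth table and cross-check against the majority function.
for a, b, c_in in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c_in)
    assert c_out == (a & b) | (a & c_in) | (b & c_in)
    assert 2 * c_out + s == a + b + c_in
    print(a, b, c_in, c_out, s, carry_status(a, b))
```

The loop prints the rows in the same column order as the truth table (A, B, C_in, C_out, S, carry status).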
4
FA Gate Level Implementations
[Figure: two gate-level full adder schematics with inputs A, B, C_in, outputs S and C_out, and gate delays labeled t0, t1, t2]
The way you learned to design in CSE271 and CSE471
5
Review: XOR FA
[Figure: XOR-based full adder with inputs A, B, C_in and outputs S, C_out]
16 transistors
6
Review: CPL FA
[Figure: CPL full adder with dual-rail inputs A/!A, B/!B, C_in/!C_in and dual-rail outputs S/!S, C_out/!C_out]
20+8 transistors, dual rail; beware of threshold drops
7
Delay Balanced FA
[Figure: full adder in three sections: signal set-up (forms P and !P from A, B, !B), carry generation (!C_out), and sum generation (S)]
Identical delays for carry and sum
20+2 transistors
8
Review: Mirror Adder
[Figure: mirror adder transistor schematic with inputs A, B, C_in, outputs !C_out and !S, device width annotations, and carry-stage paths labeled kill, generate, 0-propagate, and 1-propagate]
24+4 transistors
C_out = A&B | B&C_in | A&C_in
SUM = A&B&C_in | !C_out&(A | B | C_in)
Sizing: Each input in the carry circuit has a logical effort of 2, so the optimal fan-out for each is also 2. Since !C_out drives 2 internal and 2 inverter transistor gates (to form C_in for the next most significant bit adder), the carry circuit should be oversized. PMOS/NMOS ratio of 2.
9
Mirror Adder Features
- The NMOS and PMOS chains are completely symmetrical, with a maximum of two series transistors in the carry circuitry, guaranteeing identical rise and fall transitions if the NMOS and PMOS devices are properly sized.
- When laying out the cell, the most critical issue is minimizing the capacitance at node !C_out (four diffusion capacitances, two internal gate capacitances, and two inverter gate capacitances). Shared diffusions can reduce the stack node capacitances.
- The transistors connected to C_in are placed closest to the output.
- Only the transistors in the carry stage have to be optimized for speed. All transistors in the sum stage can be minimal size.
10
A 64-bit Adder/Subtractor
[Figure: 64 1-bit FAs in a chain; inputs A_0..A_63 and B_0..B_63, an add/subt control, C_0 = C_in, carries C_1..C_63, sums S_0..S_63, C_64 = C_out]
Ripple Carry Adder (RCA) built out of 64 FAs
Subtraction: complement all subtrahend bits (XOR gates) and set the low-order carry-in
RCA
- advantage: simple logic, so small (low cost)
- disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)
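The adder/subtractor described above can be modeled bit by bit. The following is a behavioral Python sketch under the assumption that operands are unsigned integers of `n` bits; the name `ripple_add_sub` and its parameters are illustrative, not from the slides.

```python
def full_adder(a, b, c_in):
    """Return (sum, carry_out) for three input bits."""
    p = a ^ b
    return p ^ c_in, (a & b) | (p & c_in)

def ripple_add_sub(a, b, subtract=False, n=64):
    """N-bit ripple-carry add/subtract. Returns (result, carry_out)."""
    mode = 1 if subtract else 0
    carry = mode                         # low-order carry-in is 1 when subtracting
    result = 0
    for i in range(n):                   # carry ripples from bit 0 upward: O(N)
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ mode      # XOR gate complements the subtrahend bit
        s_i, carry = full_adder(a_i, b_i, carry)
        result |= s_i << i
    return result, carry

print(ripple_add_sub(5, 3))                   # addition
print(ripple_add_sub(5, 3, subtract=True))    # two's complement subtraction
```

Subtraction works because complementing B and setting the carry-in computes A + (!B) + 1, the two's complement form of A - B.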
11
Ripple Carry Adder (RCA)
[Figure: four FAs in a chain; inputs A_0..A_3 and B_0..B_3, C_0 = C_in, sums S_0..S_3, C_out = C_4]
T = O(N) worst-case delay
T_adder ≈ T_FA(A,B → C_out) + (N-2)·T_FA(C_in → C_out) + T_FA(C_in → S)
Real goal: make the fastest possible carry path
12
Inversion Property
[Figure: an FA with inputs A, B, C_in and outputs S, C_out, next to the same FA with all inputs and outputs inverted]
!S(A, B, C_in) = S(!A, !B, !C_in)
!C_out(A, B, C_in) = C_out(!A, !B, !C_in)
Inverting all inputs to an FA results in inverted values for all outputs.
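The inversion property is easy to confirm by enumerating all eight input combinations. This is a small Python sketch; the helper names `inv` and `inversion_property_holds` are made up for illustration.

```python
from itertools import product

def full_adder(a, b, c_in):
    """Return (sum, carry_out) for three input bits."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out

def inv(x):
    """Invert a single bit."""
    return x ^ 1

def inversion_property_holds():
    """Check !S(A,B,C) == S(!A,!B,!C) and !C_out(A,B,C) == C_out(!A,!B,!C)."""
    for a, b, c in product((0, 1), repeat=3):
        s, c_out = full_adder(a, b, c)
        s_inv, c_out_inv = full_adder(inv(a), inv(b), inv(c))
        if s_inv != inv(s) or c_out_inv != inv(c_out):
            return False
    return True

print(inversion_property_holds())
```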
13
Exploiting the Inversion Property
[Figure: 4-bit ripple chain built from regular FA cells and inverted FA' cells; inputs A_0..A_3 and B_0..B_3, C_0 = C_in, sums S_0..S_3, C_out = C_4]
Now need two "flavors" of FAs: a regular cell and an inverted cell.
Minimizes the critical path (the carry chain) by eliminating inverters between the FAs (will need to increase the transistor sizing on the carry chain portion of the mirror adder).
15
Fast Carry Chain Design
The key to fast addition is a low-latency carry network.
What matters is whether, in a given position, a carry is
- generated: G_i = A_i & B_i
- propagated: P_i = A_i ⊕ B_i (sometimes use A_i | B_i)
- annihilated (killed): K_i = !A_i & !B_i
Giving a carry recurrence of C_{i+1} = G_i | P_i&C_i
C_1 = G_0 | P_0&C_0
C_2 = G_1 | P_1&G_0 | P_1&P_0&C_0
C_3 = G_2 | P_2&G_1 | P_2&P_1&G_0 | P_2&P_1&P_0&C_0
C_4 = G_3 | P_3&G_2 | P_3&P_2&G_1 | P_3&P_2&P_1&G_0 | P_3&P_2&P_1&P_0&C_0
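To see that the expanded equations agree with the recurrence, both can be evaluated side by side. The Python sketch below is illustrative only; `gp_signals`, `carries_recurrence`, and `carries_expanded` are hypothetical helper names, and bit lists are ordered least significant first.

```python
def gp_signals(a, b, n=4):
    """Per-bit generate and propagate lists, least significant bit first."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    return g, p

def carries_recurrence(g, p, c0):
    """C_{i+1} = G_i | P_i & C_i, computed serially."""
    c = [c0]
    for g_i, p_i in zip(g, p):
        c.append(g_i | (p_i & c[-1]))
    return c

def carries_expanded(g, p, c0):
    """Each carry from its own sum-of-products, with no dependence on other carries."""
    c = [c0]
    for i in range(len(g)):
        sources = [c0] + list(g[: i + 1])   # C_0, G_0, ..., G_i
        carry = 0
        for j, src in enumerate(sources):
            term = src
            for k in range(j, i + 1):       # source must propagate through P_j..P_i
                term &= p[k]
            carry |= term
        c.append(carry)
    return c

g, p = gp_signals(0b1011, 0b0110)
print(carries_recurrence(g, p, 0))
print(carries_expanded(g, p, 0))
```

The expanded form trades the serial dependence for wider gates: the term count and fan-in grow with the bit position, which is why direct expansion is usually limited to small groups.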
16
Manchester Carry Chain (MCC)
[Figure: one MCC stage between !C_i and !C_{i+1}, with switches controlled by G_i and P_i and a clk-controlled device]
Switches controlled by G_i and P_i
Total delay of
- time to form the switch control signals G_i and P_i
- setup time for the switches
- signal propagation delay through N switches in the worst case
17
4-bit Sliced MCC Adder
[Figure: four G/P cells driven by A_0..A_3 and B_0..B_3 on a clocked carry chain from !C_0 through !C_1, !C_2, !C_3 to !C_4, with sum outputs S_0..S_3]
18
8-bit MCC Adder
[Figure: two 4-bit MCC slices in series, with the carry chain running from !C_0 to !C_7]
It's really hard to beat the speed of a well-designed MCC for word lengths of 8 bits or less!
19
Carry Skip Adders (aka Carry Bypass Adders)
T = O(√n)
A = O(n)
20
Carry Skip Adder
[Figure: 4-bit FA chain with inputs A_0..A_3 and B_0..B_3, carry-in C_0, sums S_0..S_3, and a bypass path controlled by BP that produces C_4]
If (P_0 & P_1 & P_2 & P_3 = 1) then C_4 = C_0; otherwise the block itself kills or generates the carry internally.
BP = P_0&P_1&P_2&P_3 ("Block Propagate")
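The bypass rule can be written out as a behavioral model of one block. This Python sketch assumes bit lists ordered least significant first; `to_bits` and `carry_skip_block` are hypothetical names. It models function only, since the speed benefit of the bypass comes from circuit delay, which is not captured here.

```python
def to_bits(x, n=4):
    """Integer to a list of n bits, least significant bit first."""
    return [(x >> i) & 1 for i in range(n)]

def carry_skip_block(a_bits, b_bits, c_in):
    """One carry-skip block. Returns (sum_bits, c_out, block_propagate)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]
    bp = int(all(p))                    # BP = P_0 & P_1 & ... & P_{B-1}
    carry = c_in
    sums = []
    for g_i, p_i in zip(g, p):          # ordinary ripple inside the block
        sums.append(p_i ^ carry)
        carry = g_i | (p_i & carry)
    c_out = c_in if bp else carry       # bypass: C_in goes straight to C_out when BP = 1
    return sums, c_out, bp

print(carry_skip_block(to_bits(0b1010), to_bits(0b0101), 1))  # all bits propagate
print(carry_skip_block(to_bits(0b1100), to_bits(0b0100), 1))  # carry generated inside
```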
21
Carry-Skip Chain Implementation
[Figure: carry chain from C_in to !C_out through stages controlled by G_0..G_3 and P_0..P_3, plus a bypass path controlled by BP; labels: block carry-in, block carry-out, carry-out]
22
16-bit, 4-bit Block Carry Skip Adder
[Figure: four blocks (bits 0 to 3, bits 4 to 7, bits 8 to 11, bits 12 to 15), each with Setup, Carry Propagation, and Sum stages; carry input C_i,0]
Worst-case delay, carry from bit 0 to bit 15: the carry is generated in bit 0, ripples through bits 1, 2, and 3, skips the middle two groups, and ripples in the last group from bit 12 to bit 15.
T_add = t_setup + B·t_carry + ((N/B) - 2)·t_skip + B·t_carry + t_sum
where B is the group size in bits.
23
Optimal Skip Block Size and Add Time
Assuming one stage of ripple (t_carry) has the same delay as one skip logic stage (t_skip) and both are 1:
T_CSkA = 1 + B + (N/B - 2) + B + 1
         (t_setup, ripple in block 0, skips, ripple in last block, t_sum)
       = 2B + N/B
So the optimal block size B satisfies dT_CSkA/dB = 2 - N/B^2 = 0, which gives B_opt = √(N/2)
And the optimal time is
Optimal T_CSkA = 4√(N/2) = 2√(2N)
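The block-size trade-off can be explored numerically. The Python sketch below evaluates the delay expression T = t_setup + B·t_carry + (N/B - 2)·t_skip + B·t_carry + t_sum with unit delays by default; the function names and the choice of N = 32 are illustrative assumptions.

```python
import math

def carry_skip_delay(n, b, t_setup=1, t_carry=1, t_skip=1, t_sum=1):
    """Worst-case delay of an n-bit carry skip adder with fixed block size b."""
    return t_setup + b * t_carry + (n / b - 2) * t_skip + b * t_carry + t_sum

def optimal_block_size(n):
    """Block size that zeroes the derivative of 2B + N/B (unit delays assumed)."""
    return math.sqrt(n / 2)

n = 32
for b in (2, 4, 8, 16):
    print(b, carry_skip_delay(n, b))
print("B_opt =", optimal_block_size(n))
```

In practice B must be an integer that divides the word length sensibly, so the continuous optimum is a starting point that gets rounded to a nearby block size.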
24
Carry Skip Adder Extensions
Variable block sizes
- A carry that is generated in, or absorbed by, one of the inner blocks travels a shorter distance through the skip blocks, so the inner blocks can be bigger without increasing the overall delay
[Figure: carry skip adder with variable block sizes, C_in to C_out]
Multiple levels of skip logic
- The second skip level uses the AND of the first-level skip signals (BP's)
[Figure: skip level 1 and skip level 2 paths between C_in and C_out]
25
RCA, Carry Skip Adder Comparison
[Figure: comparison of the RCA and carry skip adders for block sizes B = 2, 3, 4, 5, 6]
26
Carry Select Adder
[Figure: 4-bit block with Setup (A's and B's in, P's and G's out), "0" carry propagation, "1" carry propagation, a multiplexer selected by C_in that produces C_out and the C's, and Sum generation producing the S's]
Precompute the carry out of each block for both carry_in = 0 and carry_in = 1 (can be done for all blocks in parallel) and then select the correct one.
28
Carry Select Adder: Critical Path
[Figure: four carry select blocks (bits 0 to 3, bits 4 to 7, bits 8 to 11, bits 12 to 15), each with Setup, "0" carry, "1" carry, mux, and Sum gen stages; the mux chain runs from C_in to C_out. Delay annotations in the figure: 1, +4, +1]
T_add = t_setup + B·t_carry + (N/B)·t_mux + t_sum
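A behavioral model makes the precompute-then-select idea concrete. This Python sketch is illustrative: `ripple`, `carry_select_add`, and `carry_select_delay` are hypothetical names, and the delay function simply evaluates T_add = t_setup + B·t_carry + (N/B)·t_mux + t_sum with unit delays by default.

```python
def ripple(a_bits, b_bits, c_in):
    """Ripple through one block. Returns (sum_bits, carry_out)."""
    sums, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        p = a ^ b
        sums.append(p ^ carry)
        carry = (a & b) | (p & carry)
    return sums, carry

def carry_select_add(a, b, c_in=0, n=16, block=4):
    """Linear carry select adder. Returns (result, carry_out)."""
    result, carry = 0, c_in
    for lo in range(0, n, block):
        width = min(block, n - lo)
        a_bits = [(a >> (lo + i)) & 1 for i in range(width)]
        b_bits = [(b >> (lo + i)) & 1 for i in range(width)]
        sums0, c0 = ripple(a_bits, b_bits, 0)   # precomputed for carry_in = 0
        sums1, c1 = ripple(a_bits, b_bits, 1)   # precomputed for carry_in = 1
        sums, carry = (sums1, c1) if carry else (sums0, c0)  # the multiplexer
        for i, s in enumerate(sums):
            result |= s << (lo + i)
    return result, carry

def carry_select_delay(n, b, t_setup=1, t_carry=1, t_mux=1, t_sum=1):
    """Critical-path delay of a linear carry select adder with block size b."""
    return t_setup + b * t_carry + (n / b) * t_mux + t_sum

print(carry_select_add(0x1234, 0x0FFF))
print(carry_select_delay(16, 4))
```

In hardware the two ripple chains of every block run in parallel, so only the multiplexer chain depends on the incoming carry; the sequential loop here reproduces the result, not the timing.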
30
Square Root Carry Select Adder
[Figure: five carry select blocks of increasing width (bits 0 to 1, bits 2 to 4, bits 5 to 8, bits 9 to 13, bits 14 to 19), each with Setup, "0" carry, "1" carry, mux, and Sum gen stages; the mux chain runs from C_in to C_out. Delay annotations in the figure: 1, +2, +1, +3, +4, +5, +6]
T_add = t_setup + 2·t_carry + √N·t_mux + t_sum
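The benefit of growing block sizes shows up in a simple arrival-time model. The Python sketch below is an assumption-laden illustration, not the slides' method: it assumes every block starts rippling right after setup, each mux waits for both its block's ripple result and the previous mux output, and all delays are 1 unless overridden. The name `select_critical_path` is hypothetical.

```python
def select_critical_path(block_sizes, t_setup=1, t_carry=1, t_mux=1, t_sum=1):
    """Arrival time of the last sum bit for a carry select adder."""
    mux_out = 0                                      # C_in is available at time 0
    for size in block_sizes:
        ripple_done = t_setup + size * t_carry       # both precomputed carries ready
        mux_out = max(ripple_done, mux_out) + t_mux  # select once both inputs arrive
    return mux_out + t_sum

print(select_critical_path([4, 4, 4, 4, 4]))   # 20 bits, equal blocks
print(select_critical_path([2, 3, 4, 5, 6]))   # 20 bits, growing blocks
```

With equal blocks the later multiplexers sit idle waiting for the select signal; letting each block be one bit wider than the previous one fills that slack, so the same number of mux stages covers the word with a shorter first ripple.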
31
Prefix Adders
T = O(log n)
A = O(n log n)
32
Parallel Prefix Adders (PPAs)
Define carry operator € on (G,P) signal pairs:
(G,P) = (G'',P'') € (G',P')
where G = G'' | P''&G'
      P = P''&P'
- € is associative, i.e.,
  [(g''',p''') € (g'',p'')] € (g',p') = (g''',p''') € [(g'',p'') € (g',p')]
[Figure: the € cell symbol and its gate-level implementation]
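Associativity of the carry operator can be verified by brute force over all 64 triples of (G,P) pairs. This is a minimal Python sketch; `carry_op` is an illustrative name for the € operator, with the left argument taken as the more significant pair.

```python
from itertools import product

def carry_op(left, right):
    """(G'',P'') € (G',P') -> (G'' | P''&G', P''&P')."""
    g2, p2 = left
    g1, p1 = right
    return (g2 | (p2 & g1), p2 & p1)

PAIRS = list(product((0, 1), repeat=2))

def is_associative():
    """True if (x € y) € z == x € (y € z) for every triple of pairs."""
    return all(
        carry_op(carry_op(x, y), z) == carry_op(x, carry_op(y, z))
        for x, y, z in product(PAIRS, repeat=3)
    )

def is_commutative():
    """True if x € y == y € x for every pair of pairs."""
    return all(carry_op(x, y) == carry_op(y, x) for x, y in product(PAIRS, repeat=2))

print(is_associative(), is_commutative())
```

Associativity is what allows the pairs to be grouped into a tree; the operand order still has to be preserved because the operator is not commutative.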
33
PPA General Structure
Given P and G terms for each bit position, computing all the carries is equal to finding all the prefixes in parallel:
(G_0,P_0) € (G_1,P_1) € (G_2,P_2) € … € (G_{N-2},P_{N-2}) € (G_{N-1},P_{N-1})
Since € is associative, we can group them in any order
- but note that it is not commutative
Structure:
- P_i, G_i logic (1 unit delay)
- C_i parallel prefix logic tree (1 unit delay per level)
- S_i logic (1 unit delay)
Measures to consider
- number of € cells
- tree cell depth (time)
- tree cell area
- cell fan-in and fan-out
- max wiring length
- wiring congestion
- delay path variation (glitching)
34
Brent-Kung PPA
Parallel Prefix Computation
[Figure: 16-bit Brent-Kung tree of € cells; inputs (G_0,P_0) through (G_15,P_15), outputs C_1 through C_16]
Figure annotations: T = log2 N, T = log2 N - 2, A = 2 log2 N, A = N/2
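One common way to express a Brent-Kung style prefix tree in software is an up-sweep followed by a down-sweep over the (G,P) pairs. The Python sketch below is an illustration under several assumptions: N is a power of two, there is no carry-in, bit lists are least significant first, and the exact cell placement may differ from the figure. The name `brent_kung_prefix` is hypothetical. It also counts € cells and tracks the depth of each node.

```python
def brent_kung_prefix(g, p):
    """Return (G, cells, depth); G[i] is the carry out of bit i, i.e. C_{i+1}."""
    n = len(g)                       # assumed to be a power of two
    G, P = list(g), list(p)
    depth = [0] * n                  # logic depth of the value held at each position
    cells = 0

    def combine(i, d):
        """Apply one € cell: position i absorbs the group ending at i - d."""
        nonlocal cells
        G[i], P[i] = G[i] | (P[i] & G[i - d]), P[i] & P[i - d]
        depth[i] = max(depth[i], depth[i - d]) + 1
        cells += 1

    d = 1
    while d < n:                     # up-sweep: build groups of size 2, 4, 8, ...
        for i in range(2 * d - 1, n, 2 * d):
            combine(i, d)
        d *= 2
    d = n // 4
    while d >= 1:                    # down-sweep: fill in the remaining prefixes
        for i in range(3 * d - 1, n, 2 * d):
            combine(i, d)
        d //= 2
    return G, cells, max(depth)

a, b, n = 0b1011001110001111, 0b0110110001110001, 16
g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
G, cells, depth = brent_kung_prefix(g, p)
print(cells, depth)
```

For N = 16 this sketch uses 26 cells with a depth of 6 cell levels.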
36
A Faster Yet PPA
Brent-Kung (BK) adder has the time bound of T_BK = 1 + (2 log N - 2) + 1
There are even faster PPA approaches that are used in most modern-day machines for operands of 32 bits or greater.
Kogge-Stone (KS)
- faster pp tree (log N for KS versus 2 log N - 2 for BK)
- fan-out of carry cell € limited to two
- takes more € cells (N log N - N + 1 for KS versus 2N - 2 - log N for BK) and has more wiring
38
Kogge-Stone PPF Adder
Parallel Prefix Computation
[Figure: 16-bit Kogge-Stone tree of € cells; inputs (G_0,P_0) through (G_15,P_15) and C_in, outputs C_1 through C_16]
Figure annotations: T = log2 N, A = log2 N, A = N
T_add = t_setup + log2 N · t_€ + t_sum
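A Kogge-Stone style prefix computation doubles the combining distance at every level. The Python sketch below is a behavioral illustration, not a description of the figure: `kogge_stone_add` is a hypothetical name, the carry-in is folded into the bit 0 generate term as a modeling shortcut, and the cell count excludes that fold.

```python
def kogge_stone_add(a, b, c_in=0, n=16):
    """Return (sum, carry_out, levels, cells) for an n-bit Kogge-Stone style adder."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]   # setup: per-bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]   # setup: per-bit propagate
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & c_in)        # assumption: fold C_in into bit 0
    levels = cells = 0
    dist = 1
    while dist < n:                    # log2(n) levels, distance 1, 2, 4, ...
        new_G, new_P = G[:], P[:]
        for i in range(dist, n):       # one € cell per position at each level
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
            cells += 1
        G, P = new_G, new_P
        dist *= 2
        levels += 1
    carries = [c_in] + G               # carries[i] is the carry into bit i
    total = 0
    for i in range(n):                 # sum logic: S_i = P_i xor C_i
        total |= (p[i] ^ carries[i]) << i
    return total, carries[n], levels, cells

print(kogge_stone_add(0x1234, 0x0FFF))
print(kogge_stone_add(0xFFFF, 1))
```

Each level writes into fresh lists so every cell reads the previous level's values, mirroring the way all cells in a hardware level evaluate in parallel.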
39
PPA Comparisons

Measure            BK PPA                  N=64    KS PPA            N=64
# of € cells       2N - 2 - log N          120     N log N - N + 1   321
tree depth         2 log N - 2             10      log N             6
tree area (WxH)    (N/2) * (2 log N - 2)   320     N * log N         384
cell fan-in        2                       2       2                 2
cell fan-out       log N                   6       2                 2
max wire length    N/4                     16      N/2               32
wiring density     sparse                          dense
glitching          high                            low
40
More Adder Comparisons
[Figure: adder comparison chart]
41
Next Lecture and Reminders
Next lecture
- Multiplier Design
  - Reading assignment: Rabaey, et al, 11.4
Reminders
- HW#4 due November 11th (not Nov 4th as on outline)
- HW#5 will be optional (due November 20th)
- Project final reports due December 4th
- Final grading negotiations/correction (except for the final exam) must be concluded by December 10th
- Final exam scheduled
  - Tuesday, December 16th from 10:10 to noon in 118 and 113 Thomas