Published by Randolf Roger Beasley. Modified over 9 years ago.
1
Fast Adders: Parallel Prefix Network Adders, Conditional-Sum Adders, & Carry-Skip Adders ECE 645: Lecture 5
2
Required Reading
Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design
Chapter 6.4, Carry Determination as Prefix Computation
Chapter 6.5, Alternative Parallel Prefix Networks
Chapter 7.4, Conditional-Sum Adder
Chapter 7.5, Hybrid Adder Designs
Chapter 7.1, Simple Carry-Skip Adders
Note errata at: http://www.ece.ucsb.edu/~parhami/text_comp_arit_1ed.htm#errors
3
Recommended Reading
J-P. Deschamps, G. Bioul, G. Sutter, Synthesis of Arithmetic Circuits: FPGA, ASIC and Embedded Systems
Chapter 11.1.9, Prefix Adders
Chapter 11.1.3, Carry-Skip Adder
Chapter 11.1.4, Optimization of Carry-Skip Adders
Chapter 11.1.10, FPGA Implementations of Adders
Chapter 11.1.11, Long-Operand Adders
4
Parallel Prefix Network Adders
5
Basic component: Carry operator (1)
Block B' supplies $(g', p')$ and block B'' supplies $(g'', p'')$. The combined block B has signals $(g, p)$:
$g = g'' + g' p''$
$p = p' p''$
$(g, p) = (g', p') \,¢\, (g'', p'') = (g'' + g' p'',\; p' p'')$
6
Basic component: Carry operator (2)
Blocks B' and B'' may overlap (overlap okay!). The operator is unchanged:
$g = g'' + g' p''$
$p = p' p''$
$(g, p) = (g', p') \,¢\, (g'', p'') = (g'' + g' p'',\; p' p'')$
7
Properties of the carry operator ¢
Associative:
$[(g_1, p_1) \,¢\, (g_2, p_2)] \,¢\, (g_3, p_3) = (g_1, p_1) \,¢\, [(g_2, p_2) \,¢\, (g_3, p_3)]$
Not commutative:
$(g_1, p_1) \,¢\, (g_2, p_2) \neq (g_2, p_2) \,¢\, (g_1, p_1)$
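The two properties can be checked exhaustively over all bit pairs. The following is a minimal sketch; the function names `carry_op`, `is_associative`, and `commutativity_counterexamples` are illustrative and do not come from the slides. The left argument is taken to be the lower block $(g', p')$ and the right argument the upper block $(g'', p'')$.

```python
from itertools import product

def carry_op(left, right):
    """Carry operator: left = (g', p') of the lower block, right = (g'', p'')."""
    g1, p1 = left
    g2, p2 = right
    # g = g'' + g' p'',  p = p' p''
    return (g2 | (g1 & p2), p1 & p2)

# All four possible (g, p) bit pairs.
PAIRS = list(product((0, 1), repeat=2))

def is_associative():
    """Check (a op b) op c == a op (b op c) for every triple of pairs."""
    return all(
        carry_op(carry_op(a, b), c) == carry_op(a, carry_op(b, c))
        for a, b, c in product(PAIRS, repeat=3)
    )

def commutativity_counterexamples():
    """Return every (a, b) for which a op b differs from b op a."""
    return [
        (a, b)
        for a, b in product(PAIRS, repeat=2)
        if carry_op(a, b) != carry_op(b, a)
    ]
```

For instance, with a = (1, 0) and b = (0, 0), `carry_op(a, b)` gives (0, 0) while `carry_op(b, a)` gives (1, 0), so the order of the operands matters.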
8
Major concept
Given: $(g_0, p_0),\ (g_1, p_1),\ (g_2, p_2),\ \ldots,\ (g_{k-1}, p_{k-1})$
Find: $(g_{[0,0]}, p_{[0,0]}),\ (g_{[0,1]}, p_{[0,1]}),\ (g_{[0,2]}, p_{[0,2]}),\ \ldots,\ (g_{[0,k-1]}, p_{[0,k-1]})$
Here $g_{[0,k-1]}$ is the block generate from index 0 to k-1.
$c_i = g_{[0,i-1]} + c_0 \, p_{[0,i-1]}$
9
Parallel Prefix Sum Problem
Given: $x_0,\ x_1,\ x_2,\ \ldots,\ x_{k-1}$
Find: $x_0,\ x_0 + x_1,\ x_0 + x_1 + x_2,\ \ldots,\ x_0 + x_1 + x_2 + \cdots + x_{k-1}$
Parallel Prefix Adder Problem (similar to the Parallel Prefix Sum Problem)
Given: $x_0,\ x_1,\ x_2,\ \ldots,\ x_{k-1}$, where $x_i = (g_i, p_i)$
Find: $x_0,\ x_0 \,¢\, x_1,\ x_0 \,¢\, x_1 \,¢\, x_2,\ \ldots,\ x_0 \,¢\, x_1 \,¢\, x_2 \,¢\, \cdots \,¢\, x_{k-1}$
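To make the connection between the prefix problem and addition concrete, here is a small sketch that computes all prefixes sequentially with `itertools.accumulate` and then derives the carries from $c_i = g_{[0,i-1]} + c_0 p_{[0,i-1]}$. It is a functional model only, not a parallel network; the name `add_via_prefix` is an assumption made for this example.

```python
from itertools import accumulate

def carry_op(left, right):
    """Carry operator: left is the lower block, right is the upper block."""
    g1, p1 = left
    g2, p2 = right
    return (g2 | (g1 & p2), p1 & p2)

def add_via_prefix(x, y, k, c0=0):
    """Add two k-bit integers using prefix computation of (g, p) pairs.

    Returns (sum as a k-bit integer, carry-out c_k).
    """
    xs = [(x >> i) & 1 for i in range(k)]
    ys = [(y >> i) & 1 for i in range(k)]
    # Bit-level signals: g_i = x_i y_i, p_i = x_i XOR y_i.
    gp = [(a & b, a ^ b) for a, b in zip(xs, ys)]
    # prefixes[i] = (g[0,i], p[0,i]); accumulate keeps the lower block on the left.
    prefixes = list(accumulate(gp, carry_op))
    # carries[i] = c_i, with c_{i+1} = g[0,i] + c_0 p[0,i].
    carries = [c0] + [g | (c0 & p) for g, p in prefixes]
    total = sum((gp[i][1] ^ carries[i]) << i for i in range(k))
    return total, carries[k]
```

A parallel prefix network computes the same `prefixes` list, only with the operator applications arranged in a shallow graph instead of a chain.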
10
Parallel Prefix Sums Network I
11
Parallel Prefix Sums Network I: Cost (Area) Analysis
$C(2) = 1$
$\text{Cost} = C(k) = 2C(k/2) + k/2 = 2[2C(k/4) + k/4] + k/2 = 4C(k/4) + k/2 + k/2 = \cdots = 2^{\log_2 k - 1} C(2) + \frac{k}{2}(\log_2 k - 1) = \frac{k}{2} \log_2 k$
Example: $C(16) = 2C(8) + 8 = 2[2C(4) + 4] + 8 = 4C(4) + 16 = 4[2C(2) + 2] + 16 = 8C(2) + 24 = 8 + 24 = 32 = (16/2) \log_2 16$
12
Parallel Prefix Sums Network I: Delay Analysis
$D(2) = 1$
$\text{Delay} = D(k) = D(k/2) + 1 = [D(k/4) + 1] + 1 = D(k/4) + 1 + 1 = \cdots = \log_2 k$
Example: $D(16) = D(8) + 1 = [D(4) + 1] + 1 = D(4) + 2 = [D(2) + 1] + 2 = 4 = \log_2 16$
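The closed forms for Network I can be sanity-checked by evaluating the two recurrences directly. This is a short sketch assuming k is a power of two; the function names are illustrative.

```python
def cost_network1(k):
    """C(k) = 2 C(k/2) + k/2 with C(2) = 1, for k a power of two."""
    return 1 if k == 2 else 2 * cost_network1(k // 2) + k // 2

def delay_network1(k):
    """D(k) = D(k/2) + 1 with D(2) = 1, for k a power of two."""
    return 1 if k == 2 else delay_network1(k // 2) + 1

def log2_exact(k):
    """log2 of a power of two, computed with integer arithmetic."""
    return k.bit_length() - 1

def closed_forms_hold(max_exp=10):
    """Compare both recurrences with (k/2) log2 k and log2 k for k = 2 .. 2**max_exp."""
    return all(
        cost_network1(k) == (k // 2) * log2_exact(k)
        and delay_network1(k) == log2_exact(k)
        for k in (2 ** e for e in range(1, max_exp + 1))
    )
```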
13
Parallel Prefix Sums Network II (Brent-Kung)
14
Parallel Prefix Sums Network II: Cost (Area) Analysis
$C(2) = 1$
$\text{Cost} = C(k) = C(k/2) + k - 1 = [C(k/4) + k/2 - 1] + k - 1 = C(k/4) + 3k/2 - 2 = \cdots = C(2) + \left(2k - \frac{2k}{2^{\log_2 k - 1}}\right) - (\log_2 k - 1) = 2k - 2 - \log_2 k$
Example: $C(16) = C(8) + 16 - 1 = [C(4) + 8 - 1] + 16 - 1 = C(2) + 4 - 1 + 24 - 2 = 1 + 28 - 3 = 26 = 2 \cdot 16 - 2 - \log_2 16$
15
Parallel Prefix Sums Network II: Delay Analysis
$D(2) = 1$
$\text{Delay} = D(k) = D(k/2) + 2 = [D(k/4) + 2] + 2 = D(k/4) + 2 + 2 = \cdots = 2 \log_2 k - 1$
Example: $D(16) = D(8) + 2 = [D(4) + 2] + 2 = D(4) + 4 = [D(2) + 2] + 4 = 7 = 2 \log_2 16 - 1$
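The cost recurrence $C(k) = C(k/2) + k - 1$ corresponds to a recursive construction: combine adjacent pairs (k/2 operators), solve the half-size prefix problem on the pair results, then fill in the remaining outputs (k/2 - 1 operators). The sketch below follows that reading and counts operator applications; it is an illustrative model under that assumption, and `prefix_network2` is a made-up name.

```python
from operator import add

def prefix_network2(xs, op):
    """Recursive prefix computation in the style of Network II.

    Returns (list of prefixes, number of operator applications).
    len(xs) is assumed to be a power of two.
    """
    k = len(xs)
    if k == 1:
        return list(xs), 0
    # Step 1: combine adjacent pairs (k/2 operators).
    pairs = [op(xs[2 * i], xs[2 * i + 1]) for i in range(k // 2)]
    count = k // 2
    # Step 2: half-size prefix problem; sub[i] is the prefix through index 2i+1.
    sub, sub_count = prefix_network2(pairs, op)
    count += sub_count
    # Step 3: odd-indexed outputs come straight from sub; the even-indexed
    # outputs (except index 0) need one more operator each (k/2 - 1 operators).
    out = [None] * k
    out[0] = xs[0]
    for i in range(k // 2):
        out[2 * i + 1] = sub[i]
    for i in range(1, k // 2):
        out[2 * i] = op(sub[i - 1], xs[2 * i])
        count += 1
    return out, count

# Example with ordinary addition: 16 inputs use 26 operators.
sums16, ops16 = prefix_network2(list(range(1, 17)), add)
```

Operand order is preserved throughout (lower indices always on the left), so the same routine also works with a non-commutative operator such as ¢.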
16
8-bit Brent-Kung Parallel Prefix Network
17
4-bit Brent-Kung Parallel Prefix Network (figure: contains a 2-bit B-K PPN; signal labels x1', x3', x5', x7' and s1', s3', s5', s7')
18
8-bit Brent-Kung Parallel Prefix Network: Critical Path
19
Critical Path
GP: $g_i = x_i y_i$, $p_i = x_i \oplus y_i$
¢: $g = g'' + g' p''$, $p = p' p''$
C: $c_{i+1} = g_{[0,i]} + c_0 \, p_{[0,i]}$
S: $s_i = p_i \oplus c_i$
Delay labels in the figure: 1 gate delay, 2 gate delays, 1 gate delay
20
Brent-Kung Parallel Prefix Graph for 16 Inputs
21
Kogge-Stone Parallel Prefix Graph for 16 Inputs
22
Comparison of architectures

Architecture             Delay(k)          Cost(k)               Delay(16)  Cost(16)  Delay(32)  Cost(32)
Brent-Kung (Network 2)   2 log_2 k - 2     2k - 2 - log_2 k      6          26        8          57
Hybrid                   log_2 k + 1       (k/2) log_2 k         5          32        6          80
Kogge-Stone              log_2 k           k log_2 k - k + 1     4          49        5          129
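The Kogge-Stone entries in the comparison can be reproduced with a small functional model. The sketch below applies the usual doubling-distance scheme (at distance d, every position i >= d combines with position i - d) and counts cells and levels; `kogge_stone` is an illustrative name, and the model ignores fan-out and wiring, which are part of the real area tradeoff.

```python
from operator import add

def kogge_stone(xs, op):
    """Kogge-Stone style prefix computation.

    Returns (prefixes, number of operator cells, number of levels).
    """
    vals = list(xs)
    cells = 0
    levels = 0
    d = 1
    while d < len(vals):
        prev = vals
        # Positions below d already hold their final prefix and pass through.
        vals = [
            prev[i] if i < d else op(prev[i - d], prev[i])
            for i in range(len(prev))
        ]
        cells += len(prev) - d
        levels += 1
        d *= 2
    return vals, cells, levels

# 16 inputs: 49 cells in 4 levels; 32 inputs: 129 cells in 5 levels.
_, cells16, levels16 = kogge_stone(list(range(16)), add)
_, cells32, levels32 = kogge_stone(list(range(32)), add)
```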
23
Latency vs. Area Tradeoff
24
Hybrid Brent-Kung/Kogge-Stone Parallel Prefix Graph for 16 Inputs
25
Conditional-Sum Adders
26
One-level k-bit Carry-Select Adder
27
Two-level k-bit Carry Select Adder
28
Conditional Sum Adder
Extension of carry-select adder
Carry-select adder:
  One-level using k/2-bit adders
  Two-level using k/4-bit adders
  Three-level using k/8-bit adders
  Etc.
Assuming k is a power of two, the extreme case has $\log_2 k$ levels using 1-bit adders
This is a conditional sum adder
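The recursive structure described above can be sketched as follows: every block produces its sum and carry-out for both possible carry-ins, and two half-size blocks are merged by letting the lower half's carry-out select the upper half's result. This is a functional model under the assumption that k is a power of two; the names `conditional_sums` and `conditional_sum_add` are hypothetical.

```python
def conditional_sums(x, y, k):
    """Return {cin: (sum, carry_out)} for a k-bit block, k a power of two."""
    if k == 1:
        # 1-bit conditional sum block: results for c = 0 and c = 1.
        return {
            c: (x ^ y ^ c, (x & y) | (c & (x ^ y)))
            for c in (0, 1)
        }
    h = k // 2
    mask = (1 << h) - 1
    lo = conditional_sums(x & mask, y & mask, h)
    hi = conditional_sums(x >> h, y >> h, h)
    merged = {}
    for c in (0, 1):
        s_lo, c_lo = lo[c]
        # The lower half's carry-out selects which upper-half result to use.
        s_hi, c_hi = hi[c_lo]
        merged[c] = (s_lo | (s_hi << h), c_hi)
    return merged

def conditional_sum_add(x, y, k, cin=0):
    """Add two k-bit integers; returns (k-bit sum, carry-out)."""
    mask = (1 << k) - 1
    return conditional_sums(x & mask, y & mask, k)[cin]
```

In hardware the selection step is a row of multiplexers per level, which is where the $\log_2 k$ levels come from.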
29
Conditional Sum Adder: Top-Level Block for One Bit Position
30
Three Levels of a Conditional Sum Adder (figure: 1-bit conditional sum blocks on inputs x_i, y_i through x_{i+3}, y_{i+3} each produce results for c=0 and c=1; labels: branch point, concatenation, block carry-in determines selection; bus widths 1+1, 2+1, 4+1)
31
16-Bit Conditional Sum Adder Example
32
Conditional Sum Adder Metrics
33
Hybrid Adders
34
A Hybrid Ripple-Carry/Carry-Lookahead Adder
35
A Hybrid Carry-Lookahead/Carry-Select Adder
36
Carry-Skip Adders Fixed-Block-Size
37
Apr. 2008, Computer Arithmetic, Addition/Subtraction, Slide 37
7.1 Simple Carry-Skip Adders
Fig. 7.1 Converting a 16-bit ripple-carry adder into a simple carry-skip adder with 4-bit skip blocks.
38
Another View of Carry-Skip Addition
Street/freeway analogy for carry-skip adder.
39
Mux-Based Skip Carry Logic
The carry-skip adder with “OR combining” works correctly only if we begin with a clean slate, where all signals are 0s at the outset; otherwise it runs into problems that do not exist in the mux-based version.
(Fig. 10.7 of arch book: a 0/1 multiplexer controlled by $p_{[4j, 4j+3]}$ produces $c_{4j+4}$.)
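A functional model of the mux-based version may help here. In the sketch below each block ripples its carry internally, and a multiplexer controlled by the block propagate signal chooses between the block's carry-in (skip) and the rippled carry-out. It models logic values only, not timing; the name `carry_skip_add` and the default parameters are assumptions for illustration.

```python
def carry_skip_add(x, y, k=16, b=4, c0=0):
    """Mux-based carry-skip addition of two k-bit integers with b-bit blocks.

    Assumes b divides k. Returns (k-bit sum, carry-out).
    """
    carry = c0
    total = 0
    for start in range(0, k, b):
        block_in = carry
        block_p = 1          # block propagate: AND of all p_i in the block
        c = block_in
        for i in range(start, start + b):
            xi = (x >> i) & 1
            yi = (y >> i) & 1
            p = xi ^ yi
            total |= (p ^ c) << i        # sum bit s_i = p_i XOR c_i
            c = (xi & yi) | (p & c)      # ripple carry inside the block
            block_p &= p
        # Skip multiplexer: select the block carry-in when every bit propagates.
        carry = block_in if block_p else c
    return total, carry
```

When the block propagate signal is 1, the rippled carry equals the block carry-in anyway, so the multiplexer changes only when the result is available, not its value.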
40
Carry-Skip Adder with Fixed Block Size
Block width b; k/b blocks to form a k-bit adder (assume b divides k)
$T_{\text{fixed-skip-add}} = (b - 1) + 0.5 + (k/b - 2) + (b - 1) = 2b + k/b - 3.5$ stages
(terms, in order: in block 0, OR gate, skips, in last block)
$dT/db = 2 - k/b^2 = 0 \Rightarrow b^{\text{opt}} = \sqrt{k/2}$
$T^{\text{opt}} = 2\sqrt{2k} - 3.5$
Example: k = 32, $b^{\text{opt}} = 4$, $T^{\text{opt}} = 12.5$ stages (contrast with 32 stages for a ripple-carry adder)
41
Fixed-Block-Size Carry-Skip Adder (1)
Notation & Assumptions
Adder size: k bits
Fixed block size: b bits
Number of stages: t
Delay of skip logic = Delay of one stage of ripple-carry adder = 1 delay unit
Latency of the carry-skip adder with fixed block width:
$\text{Latency}_{\text{fixed-carry-skip}} = (b - 1) + 0.5 + (k/b - 2) + (b - 1) = 2b + k/b - 3.5$
(terms, in order: in block 0, OR gate, skips, in last block)
42
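Fixed-Block-Size Carry-Skip Adder (2)
Optimal fixed block size:
$\frac{d\,\text{Latency}_{\text{fixed-carry-skip}}}{db} = 2 - \frac{k}{b^2} = 0 \Rightarrow b^{\text{opt}} = \sqrt{k/2}$
$t^{\text{opt}} = \frac{k}{b^{\text{opt}}} = \sqrt{2k}$
$\text{Latency}^{\text{opt}}_{\text{fixed-carry-skip}} = 2\sqrt{k/2} + \frac{k}{\sqrt{k/2}} - 3.5 = \sqrt{2k} + \sqrt{2k} - 3.5 = 2\sqrt{2k} - 3.5$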
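The optimum can be checked numerically. This sketch evaluates the latency formula $2b + k/b - 3.5$ and the closed-form optimum, treating b as a real number as the derivation does; the function names are illustrative.

```python
from math import sqrt

def latency_fixed(k, b):
    """Latency in delay units of a k-bit carry-skip adder with b-bit blocks."""
    return 2 * b + k / b - 3.5

def optimal_fixed(k):
    """Return (b_opt, t_opt, optimal latency) from the closed-form expressions."""
    b_opt = sqrt(k / 2)
    t_opt = k / b_opt
    return b_opt, t_opt, 2 * sqrt(2 * k) - 3.5

def best_integer_block(k):
    """Integer block width in 1..k that minimizes the latency formula."""
    return min(range(1, k + 1), key=lambda b: latency_fixed(k, b))
```

For k = 32 this gives b = 4, t = 8, and a latency of 12.5 delay units, matching the example on the earlier slide.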
43
Fixed-Block-Size Carry-Skip Adder (3)
Table columns: k, b opt, t opt, Latency fixed-carry-skip, Latency ripple-carry, Latency look-ahead
Cell values as extracted (row and column alignment lost): 32 16 4 64 128 8 8 16 12.5 28.5 32 128 6.5 4.5 8.5 2 3 8 5 16 5 6 13 11 18.5 64 7.5 18.5
44
Carry-Chain & Carry-Skip Adders in Xilinx FPGAs
45
Basic Cell of a Carry-Chain Adder in Xilinx FPGAs (figure: a LUT with inputs x_i and y_i produces p_i; a 0/1 multiplexer with inputs y_i and c_i produces c_{i+1}; the sum output s_i is formed from p_i and c_i)
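The basic cell can be modeled in a few lines. The sketch assumes the usual carry-chain arrangement: the LUT computes $p_i = x_i \oplus y_i$, the multiplexer passes $c_i$ when $p_i = 1$ and $y_i$ otherwise, and the sum is $s_i = p_i \oplus c_i$. The function names are made up for this example.

```python
def carry_chain_cell(x, y, c):
    """One bit position of a carry-chain adder. Returns (s_i, c_{i+1})."""
    p = x ^ y                 # LUT output: propagate signal
    c_out = c if p else y     # carry multiplexer: p=1 passes c_i, p=0 passes y_i
    s = p ^ c                 # sum bit
    return s, c_out

def carry_chain_add(x, y, k, c0=0):
    """Chain k cells to add two k-bit integers. Returns (sum, carry-out)."""
    c = c0
    total = 0
    for i in range(k):
        s, c = carry_chain_cell((x >> i) & 1, (y >> i) & 1, c)
        total |= s << i
    return total, c
```

When p = 0 the two operand bits are equal, so passing $y_i$ yields the correct generated carry (1 if both bits are 1, 0 if both are 0).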
46
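Carry-Skip Adder: b-bit block (figure: a chain of LUT and 0/1 multiplexer cells for bit positions ib, ib+1, ..., ib+(b-1); inputs x_{ib}, y_{ib}, x_{ib+1}, y_{ib+1}, ..., x_{ib+(b-1)}, y_{ib+(b-1)} and carry-in c_{ib}; propagate signals p_{ib}, p_{ib+1}, ..., p_{ib+(b-1)}; outputs s_{ib}, s_{ib+1}, ..., s_{ib+(b-1)} and block carry-out cc_{(i+1)b})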
47
Carry-Skip Adder: Carry Skip Multiplexers (figure: a chain of 0/1 multiplexers controlled by the block propagate signals p_{[0, b-1]}, p_{[b, 2b-1]}, ..., p_{[k-b, k-1]}; signals c_0, c_b, ..., c_{k-b}, c_k and block carries cc_b, cc_{2b}, ..., cc_{k-b})
48
Carry-Skip Adder: Computation of the Block Propagate Signals (figure: LUTs on the input pairs x_{ib}, y_{ib} through x_{ib+(b-1)}, y_{ib+(b-1)} drive a chain of 0/1 multiplexers with constant 0 inputs; intermediate signals p_{[ib, ib+1]}, p_{[ib, ib+3]}, ..., p_{[ib, ib+(b-3)]}; final output p_{[ib, ib+(b-1)]}, used with c_b)
49
Complete Carry-Skip Adder (figure: adder blocks on operand slices x_{b-1..0}, y_{b-1..0}; x_{2b-1..b}, y_{2b-1..b}; x_{3b-1..2b}, y_{3b-1..2b}; ...; x_{k-1..k-b}, y_{k-1..k-b} produce sums s_{b-1..0}, s_{2b-1..b}, s_{3b-1..2b}, ..., s_{k-1..k-b} and block carries cc_b, cc_{2b}, cc_{3b}, ..., cc_k; a second row of blocks on the same operand slices produces p_{[0,b-1]}, p_{[b,2b-1]}, p_{[2b,3b-1]}, ..., p_{[k-b,k-1]}; 0/1 multiplexers form c_b, c_{2b}, ..., c_{k-b}, c_k starting from c_0)
50
Carry-Skip Adders in Xilinx Spartan II FPGAs
by J-P. Deschamps, G. Bioul, G. Sutter

Delay in ns
k      b=k   b=8   b=16   b=32   Max. Frequency Increase
64     14    13    12     -      13%
96     16    14    13     -      21%
128    23    14    14     -      63%
256    38    -     16     17     141%
512    77    -     20     20     296%
1024   159   -     28     25     531%
51
Carry-Skip Adders in Xilinx Spartan II FPGAs
by J-P. Deschamps, G. Bioul, G. Sutter

       Area in CLB slices            Area Overhead
k      b=k   b=8   b=16   b=32      b=8   b=16   b=32
64     32    47    41     -         47%   28%    -
96     48    73    66     -         52%   38%    -
128    64    99    91     -         55%   42%    -
256    128   -     191    179       -     49%    40%
512    256   -     391    375       -     53%    46%
1024   512   -     791    767       -     54%    50%
52
Carry-Skip Adders Variable-Block-Size
53
Carry-Skip Adder with Variable-Width Blocks
Fig. 7.2 Carry-skip adder with variable-size blocks and three sample carry paths.
The total number of bits in the t blocks is k:
$2[b + (b + 1) + \cdots + (b + t/2 - 1)] = t(b + t/4 - 1/2) = k$
$b = k/t - t/4 + 1/2$
$T_{\text{var-skip-add}} = 2(b - 1) + 0.5 + t - 2 = 2k/t + t/2 - 2.5$
$dT/dt = -2k/t^2 + 1/2 = 0 \Rightarrow t^{\text{opt}} = 2\sqrt{k}$
$T^{\text{opt}} = 2\sqrt{k} - 2.5$ (a factor of $\sqrt{2}$ smaller than for fixed-block)
54
Most critical path to produce carry (figure: block i with b_i full-adder (FA) stages on inputs x_j, y_j through x_{j+b_i-1}, y_{j+b_i-1}, with c_in = 0; cells P produce p_j, p_{j+1}, ..., p_{j+b_i-1}, which cell CP combines into the block propagate signal; outputs c_out and s_j, s_{j+1}, ..., s_{j+b_i-1})
55
Most critical path to assimilate carry (figure: the same block i with b_i full-adder (FA) stages, now driven by the incoming carry c_in; cells P produce p_j, p_{j+1}, ..., p_{j+b_i-1}, which cell CP combines into the block propagate signal; outputs c_out and s_j, s_{j+1}, ..., s_{j+b_i-1})
56
Variable-Block-Size Carry-Skip Adder (1)
Notation & Assumptions
Adder size: k bits
Number of stages: t
Block size: variable
First and last block size: b bits
Delay of skip logic = Delay of one stage of ripple-carry adder = 1 delay unit
57
Variable-Block-Size Carry-Skip Adder (2)
Optimum block sizes
Blocks: $b_{t-1},\ b_{t-2},\ b_{t-3},\ \ldots,\ b_{t/2+1},\ b_{t/2-1},\ \ldots,\ b_2,\ b_1,\ b_0$
Sizes: $b,\ b+1,\ b+2,\ \ldots,\ b + t/2 - 1,\ b + t/2 - 1,\ \ldots,\ b+2,\ b+1,\ b$
Total number of bits:
$k = 2[b + (b+1) + (b+2) + \cdots + (b + t/2 - 1)] = t(b + t/4 - 1/2)$
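The block-size profile and the bit-count identity can be checked with a short sketch. It assumes an even number of blocks t and a first/last block width b; `block_sizes` is an illustrative name.

```python
def block_sizes(b, t):
    """Block widths b, b+1, ..., b+t/2-1, b+t/2-1, ..., b+1, b (t even)."""
    half = [b + i for i in range(t // 2)]
    return half + half[::-1]

def total_bits_formula(b, t):
    """Closed form t (b + t/4 - 1/2) for the total number of bits."""
    return t * (b + t / 4 - 0.5)

# Example: b = 1, t = 8 gives widths 1,2,3,4,4,3,2,1 for a 20-bit adder.
sizes = block_sizes(1, 8)
```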
58
Variable-Block-Size Carry-Skip Adder (3)
Number of bits in the first and last block:
$b = \frac{k}{t} - \frac{t}{4} + \frac{1}{2}$
Latency of the carry-skip adder with variable block width:
$\text{Latency}_{\text{variable-carry-skip}} = (b - 1) + 0.5 + (t - 2) + (b - 1) = 2b + t - 3.5 = 2\left(\frac{k}{t} - \frac{t}{4} + \frac{1}{2}\right) + t - 3.5 = \frac{2k}{t} + \frac{1}{2}t - 2.5$
(terms, in order: in block 0, OR gate, skips, in last block)
59
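Variable-Block-Size Carry-Skip Adder (4)
Optimal number of blocks:
$\frac{d\,\text{Latency}_{\text{variable-carry-skip}}}{dt} = -\frac{2k}{t^2} + \frac{1}{2} = 0 \Rightarrow t^{\text{opt}} = \sqrt{4k} = 2\sqrt{k}$
$b^{\text{opt}} = \frac{k}{t^{\text{opt}}} - \frac{t^{\text{opt}}}{4} + \frac{1}{2} = \frac{k}{2\sqrt{k}} - \frac{2\sqrt{k}}{4} + \frac{1}{2} = \frac{1}{2}$
Taking a whole number of bits: $b^{\text{opt}} = 1$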
60
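Variable-Block-Size Carry-Skip Adder (5)
Optimal latency:
$\text{Latency}^{\text{opt}}_{\text{variable-carry-skip}} = \frac{2k}{t^{\text{opt}}} + \frac{1}{2}t^{\text{opt}} - 2.5 = \frac{2k}{2\sqrt{k}} + \frac{2\sqrt{k}}{2} - 2.5 = 2\sqrt{k} - 2.5$
$\frac{\text{Latency}^{\text{opt}}_{\text{fixed-carry-skip}}}{\text{Latency}^{\text{opt}}_{\text{variable-carry-skip}}} \approx \sqrt{2}$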
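As a numeric check of the variable-block result and of the comparison with fixed blocks, the sketch below evaluates the latency formula $2k/t + t/2 - 2.5$, its closed-form optimum, and the ratio of the two optimal latencies. The function names are illustrative, and t is treated as a real number as in the derivation.

```python
from math import sqrt

def latency_variable(k, t):
    """Latency in delay units of a k-bit variable-block carry-skip adder with t blocks."""
    return 2 * k / t + t / 2 - 2.5

def optimal_variable(k):
    """Return (t_opt, optimal latency) from the closed-form expressions."""
    return 2 * sqrt(k), 2 * sqrt(k) - 2.5

def fixed_over_variable(k):
    """Ratio of optimal fixed-block latency to optimal variable-block latency."""
    return (2 * sqrt(2 * k) - 3.5) / (2 * sqrt(k) - 2.5)
```

For k = 64 the optimum is t = 16 blocks with a latency of 13.5 delay units, and the ratio to the fixed-block optimum approaches $\sqrt{2}$ as k grows.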