Fast Adders: Parallel Prefix Network Adders, Conditional-Sum Adders, & Carry-Skip Adders ECE 645: Lecture 5
Required Reading
Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design
  Chapter 6.4, Carry Determination as Prefix Computation
  Chapter 6.5, Alternative Parallel Prefix Networks
  Chapter 7.4, Conditional-Sum Adder
  Chapter 7.5, Hybrid Adder Designs
  Chapter 7.1, Simple Carry-Skip Adders
  Note the published errata.
Recommended Reading
J-P. Deschamps, G. Bioul, G. Sutter, Synthesis of Arithmetic Circuits: FPGA, ASIC and Embedded Systems
  Prefix Adders
  Carry-Skip Adder
  Optimization of Carry-Skip Adders
  FPGA Implementations of Adders
  Long-Operand Adders
Parallel Prefix Network Adders
Basic component: Carry operator
Two blocks B' (less significant) and B'' (more significant), with signals (g', p') and (g'', p''), merge into one block B with signals (g, p):
$g = g'' + g' p''$
$p = p' p''$
$(g, p) = (g', p') ¢ (g'', p'') = (g'' + g' p'', \; p' p'')$
The blocks B' and B'' may overlap.
Properties of the carry operator ¢
Associative: $[(g_1, p_1) ¢ (g_2, p_2)] ¢ (g_3, p_3) = (g_1, p_1) ¢ [(g_2, p_2) ¢ (g_3, p_3)]$
Not commutative: $(g_1, p_1) ¢ (g_2, p_2) \neq (g_2, p_2) ¢ (g_1, p_1)$
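The two properties can be checked by brute force over all single-bit (g, p) pairs. The sketch below is only an illustration: the function names `carry_op`, `is_associative`, and `commutativity_counterexamples` are hypothetical, and the operand order (less significant block first) follows the definition of ¢ given on the slides.

```python
from itertools import product

def carry_op(lo, hi):
    """(g, p) = (g', p') ¢ (g'', p''), with lo = (g', p') and hi = (g'', p'')."""
    g1, p1 = lo
    g2, p2 = hi
    return (g2 | (g1 & p2), p1 & p2)

# All four possible single-bit (g, p) pairs.
PAIRS = [(g, p) for g in (0, 1) for p in (0, 1)]

def is_associative():
    """Exhaustively compare (a ¢ b) ¢ c with a ¢ (b ¢ c)."""
    return all(
        carry_op(carry_op(a, b), c) == carry_op(a, carry_op(b, c))
        for a, b, c in product(PAIRS, repeat=3)
    )

def commutativity_counterexamples():
    """Operand pairs for which swapping the operands changes the result."""
    return [(a, b) for a, b in product(PAIRS, repeat=2)
            if carry_op(a, b) != carry_op(b, a)]
```

Associativity is what allows the operators to be regrouped into a tree, while the lack of commutativity means the significance order of the operands has to be preserved in any network.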
Major concept
Given: $(g_0, p_0), (g_1, p_1), (g_2, p_2), \dots, (g_{k-1}, p_{k-1})$
Find: $(g_{[0,0]}, p_{[0,0]}), (g_{[0,1]}, p_{[0,1]}), (g_{[0,2]}, p_{[0,2]}), \dots, (g_{[0,k-1]}, p_{[0,k-1]})$
Here $g_{[0,k-1]}$ is the block generate signal from index 0 to k-1.
The carries then follow from $c_i = g_{[0,i-1]} + c_0 \, p_{[0,i-1]}$.

Parallel Prefix Sum Problem
Given: $x_0, x_1, x_2, \dots, x_{k-1}$
Find: $x_0, \; x_0 + x_1, \; x_0 + x_1 + x_2, \; \dots, \; x_0 + x_1 + x_2 + \dots + x_{k-1}$

Parallel Prefix Adder Problem (similar to the Parallel Prefix Sum Problem)
Given: $x_0, x_1, x_2, \dots, x_{k-1}$, where $x_i = (g_i, p_i)$
Find: $x_0, \; x_0 ¢ x_1, \; x_0 ¢ x_1 ¢ x_2, \; \dots, \; x_0 ¢ x_1 ¢ x_2 ¢ \dots ¢ x_{k-1}$
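The analogy between the two problems can be made concrete with one generic scan routine applied first to ordinary addition and then to the carry operator. This is a minimal serial sketch (not a parallel network); the names `prefix`, `carry_op`, and `carries` are hypothetical.

```python
from itertools import accumulate
import operator

def carry_op(lo, hi):
    """(g, p) = (g', p') ¢ (g'', p''), with lo the less significant block."""
    g1, p1 = lo
    g2, p2 = hi
    return (g2 | (g1 & p2), p1 & p2)

def prefix(xs, op):
    """All prefixes x0, x0 op x1, x0 op x1 op x2, ... computed serially."""
    return list(accumulate(xs, op))

def carries(gp, c0=0):
    """Carries c_0 .. c_k from per-bit (g_i, p_i) pairs listed LSB first,
    using c_i = g[0,i-1] + c_0 p[0,i-1]."""
    blocks = prefix(gp, carry_op)
    return [c0] + [g | (c0 & p) for g, p in blocks]

# Prefix sums and prefix carry computation share the same scan structure.
sums = prefix([1, 2, 3, 4], operator.add)
# x = 1011 and y = 0110 in binary, LSB first, with g_i = x_i y_i and p_i = x_i XOR y_i.
cs = carries([(0, 1), (1, 0), (0, 1), (0, 1)])
```

Any network that solves the prefix sum problem using only associativity can be reused for carries by swapping + for ¢.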
Parallel Prefix Sums Network I

Parallel Prefix Sums Network I: Cost (Area) Analysis
$C(k) = 2\,C(k/2) + k/2$
$\quad = 2\,[2\,C(k/4) + k/4] + k/2 = 4\,C(k/4) + k/2 + k/2$
$\quad = \dots = 2^{\log_2 k - 1}\,C(2) + (k/2)(\log_2 k - 1) = (k/2)\log_2 k$, with $C(2) = 1$
Example: $C(16) = 2\,C(8) + 8 = 2\,[2\,C(4) + 4] + 8 = 4\,C(4) + 16 = 4\,[2\,C(2) + 2] + 16 = 8\,C(2) + 24 = 32 = (16/2)\log_2 16$

Parallel Prefix Sums Network I: Delay Analysis
$D(k) = D(k/2) + 1 = [D(k/4) + 1] + 1 = D(k/4) + 2 = \dots = \log_2 k$, with $D(2) = 1$
Example: $D(16) = D(8) + 1 = [D(4) + 1] + 1 = D(4) + 2 = [D(2) + 1] + 2 = 4 = \log_2 16$
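The recurrences above can be reproduced by a recursive construction that matches them: solve each half, then combine the last output of the lower half with every output of the upper half (k/2 cells, one extra level). The sketch assumes k is a power of two and that this is the structure behind the recurrences; `prefix_net1` is a hypothetical name.

```python
def prefix_net1(xs, op):
    """Divide-and-conquer prefix computation.

    Returns (prefixes, cell_count, level_count). Assumes len(xs) is a power of two.
    """
    k = len(xs)
    if k == 1:
        return list(xs), 0, 0
    lo, cells_lo, depth_lo = prefix_net1(xs[:k // 2], op)
    hi, cells_hi, depth_hi = prefix_net1(xs[k // 2:], op)
    # k/2 cells: the complete prefix of the lower half is merged into
    # every prefix of the upper half, all in one additional level.
    combined = [op(lo[-1], h) for h in hi]
    return lo + combined, cells_lo + cells_hi + k // 2, max(depth_lo, depth_hi) + 1
```

String concatenation is a convenient test operator because, like ¢, it is associative but not commutative, so a wrong operand order shows up immediately.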
Parallel Prefix Sums Network II (Brent-Kung)

Parallel Prefix Sums Network II: Cost (Area) Analysis
$C(k) = C(k/2) + k - 1$
$\quad = [C(k/4) + k/2 - 1] + k - 1 = C(k/4) + 3k/2 - 2$
$\quad = \dots = C(2) + (2k - 2k/2^{\log_2 k - 1}) - (\log_2 k - 1) = 2k - 2 - \log_2 k$, with $C(2) = 1$
Example: $C(16) = C(8) + 16 - 1 = [C(4) + 8 - 1] + 16 - 1 = C(2) + 4 - 1 + 8 - 1 + 16 - 1 = 26 = 2 \cdot 16 - 2 - \log_2 16$

Parallel Prefix Sums Network II: Delay Analysis
$D(k) = D(k/2) + 2 = [D(k/4) + 2] + 2 = D(k/4) + 4 = \dots = 2\log_2 k - 1$, with $D(2) = 1$
Example: $D(16) = D(8) + 2 = [D(4) + 2] + 2 = D(4) + 4 = [D(2) + 2] + 4 = 7 = 2\log_2 16 - 1$
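A recursive construction consistent with these recurrences first combines adjacent pairs (k/2 cells), solves a k/2-input prefix problem on the pair results, and then fills in the remaining outputs (k/2 - 1 cells). The sketch below assumes k is a power of two and counts levels exactly as the recurrence does; `prefix_net2` is a hypothetical name.

```python
def prefix_net2(xs, op):
    """Recursive Network II style prefix computation.

    Returns (prefixes, cell_count, level_count). Assumes len(xs) is a power of two.
    """
    k = len(xs)
    if k == 1:
        return list(xs), 0, 0
    if k == 2:
        return [xs[0], op(xs[0], xs[1])], 1, 1
    # First level: k/2 cells combine adjacent pairs.
    pairs = [op(xs[2 * i], xs[2 * i + 1]) for i in range(k // 2)]
    # sub[i] is the prefix through index 2i + 1.
    sub, cells_sub, depth_sub = prefix_net2(pairs, op)
    out = [None] * k
    out[0] = xs[0]
    for i in range(k // 2):
        out[2 * i + 1] = sub[i]
        if i >= 1:
            # Last level: k/2 - 1 cells produce the even-indexed outputs.
            out[2 * i] = op(sub[i - 1], xs[2 * i])
    return out, cells_sub + k // 2 + (k // 2 - 1), depth_sub + 2
```

For 16 inputs this yields 26 cells and 7 levels, matching the worked examples for C(16) and D(16).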
8-bit Brent-Kung Parallel Prefix Network
4-bit Brent-Kung Parallel Prefix Network (figure: built around a 2-bit B-K PPN)
8-bit Brent-Kung Parallel Prefix Network: Critical Path
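Critical Path
Blocks along the path: GP, ¢, C, S
GP: $g_i = x_i y_i$, $p_i = x_i \oplus y_i$
¢: $g = g'' + g' p''$, $p = p' p''$
C: $c_{i+1} = g_{[0,i]} + c_0 \, p_{[0,i]}$
S: $s_i = p_i \oplus c_i$
Delay annotations in the figure: 1 gate delay, 2 gate delays, 1 gate delay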
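The four stages on the critical path can be chained into a complete functional adder model. In the sketch below the prefix stage is computed serially with `itertools.accumulate` as a stand-in for whichever prefix network is used; `prefix_add` and `carry_op` are hypothetical names, and the model captures logic values only, not gate delays.

```python
from itertools import accumulate

def carry_op(lo, hi):
    """(g, p) = (g', p') ¢ (g'', p''), with lo the less significant block."""
    g1, p1 = lo
    g2, p2 = hi
    return (g2 | (g1 & p2), p1 & p2)

def prefix_add(x, y, k, c0=0):
    """k-bit addition via GP -> prefix (¢) -> C -> S. Returns (sum, carry_out)."""
    xb = [(x >> i) & 1 for i in range(k)]
    yb = [(y >> i) & 1 for i in range(k)]
    # GP stage: g_i = x_i y_i, p_i = x_i XOR y_i.
    g = [a & b for a, b in zip(xb, yb)]
    p = [a ^ b for a, b in zip(xb, yb)]
    # Prefix stage: (g[0,i], p[0,i]) for every i (serial stand-in for the network).
    blocks = list(accumulate(zip(g, p), carry_op))
    # C stage: c_{i+1} = g[0,i] + c_0 p[0,i].
    c = [c0] + [G | (c0 & P) for G, P in blocks]
    # S stage: s_i = p_i XOR c_i.
    s = [p[i] ^ c[i] for i in range(k)]
    return sum(bit << i for i, bit in enumerate(s)), c[k]
```

Replacing the `accumulate` call with any correct prefix network leaves the result unchanged; only cost and delay differ.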
Brent-Kung Parallel Prefix Graph for 16 Inputs
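Since the graph itself is a figure, a small model can help when counting cells and levels. The sketch below assumes the usual two-phase Brent-Kung structure (a combining tree followed by a distribution phase) and tracks the depth of every node, so cells from the two phases that can operate concurrently share a level. The name `brent_kung` is hypothetical and k is assumed to be a power of two.

```python
def brent_kung(xs, op):
    """Brent-Kung style prefix graph. Returns (prefixes, cell_count, level_count)."""
    a = list(xs)
    k = len(a)
    depth = [0] * k
    cells = 0

    def cell(i, j):
        # One ¢ cell: position i absorbs the block ending at position j < i.
        nonlocal cells
        a[i] = op(a[j], a[i])
        depth[i] = max(depth[i], depth[j]) + 1
        cells += 1

    d = 1
    while d < k:                      # combining tree (k - 1 cells)
        for i in range(2 * d - 1, k, 2 * d):
            cell(i, i - d)
        d *= 2
    d = k // 4
    while d >= 1:                     # distribution phase
        for i in range(3 * d - 1, k, 2 * d):
            cell(i, i - d)
        d //= 2
    return a, cells, max(depth)
```

For 16 inputs the model reports 26 cells and 6 levels. That is one level fewer than the recursive delay analysis, because the first cell of the distribution phase does not depend on the last cell of the combining tree.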
Kogge-Stone Parallel Prefix Graph for 16 Inputs
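The Kogge-Stone structure is regular enough to model in a few lines: at each level every position i >= d absorbs the block ending d positions below it, and d doubles from level to level. This is a sketch under that assumption, with the hypothetical name `kogge_stone`, for k a power of two.

```python
def kogge_stone(xs, op):
    """Kogge-Stone style prefix graph. Returns (prefixes, cell_count, level_count)."""
    a = list(xs)
    k = len(a)
    cells = 0
    levels = 0
    d = 1
    while d < k:
        # All positions are updated from the previous level's values.
        a = [op(a[i - d], a[i]) if i >= d else a[i] for i in range(k)]
        cells += k - d
        levels += 1
        d *= 2
    return a, cells, levels
```

For 16 inputs the model reports 49 cells in 4 levels, which illustrates the trade: minimum depth at the price of many more cells than Brent-Kung.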
Comparison of architectures

            Network 2 (Brent-Kung)    Kogge-Stone               Hybrid
Cost(k)     $2k - 2 - \log_2 k$       $k\log_2 k - k + 1$       $(k/2)\log_2 k$
Delay(k)    $2\log_2 k - 2$           $\log_2 k$                $\log_2 k + 1$
Cost(16)    26                        49                        32
Delay(16)   6                         4                         5
Cost(32)    57                        129                       80
Delay(32)   8                         5                         6
Latency vs. Area Tradeoff
Hybrid Brent-Kung/Kogge-Stone Parallel Prefix Graph for 16 Inputs
Conditional-Sum Adders
One-level k-bit Carry-Select Adder
Two-level k-bit Carry Select Adder
Conditional Sum Adder
Extension of the carry-select adder.
Carry-select adder:
  One-level, using k/2-bit adders
  Two-level, using k/4-bit adders
  Three-level, using k/8-bit adders
  Etc.
Assuming k is a power of two, the extreme case has $\log_2 k$ levels using 1-bit adders. This is a conditional sum adder.
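Carrying the carry-select idea down to 1-bit adders can be sketched recursively: every block returns its sum and carry-out for both possible carry-ins, and the lower half's carry-out selects which result of the upper half is used. This is a functional model only, assuming k is a power of two; `cond_sum` and `cond_sum_add` are hypothetical names.

```python
def cond_sum(x, y, k):
    """Return ((sum0, cout0), (sum1, cout1)) for carry-in 0 and carry-in 1."""
    if k == 1:
        # 1-bit conditional sum block.
        return ((x ^ y, x & y), (x ^ y ^ 1, x | y))
    h = k // 2
    mask = (1 << h) - 1
    lo = cond_sum(x & mask, y & mask, h)
    hi = cond_sum(x >> h, y >> h, h)
    out = []
    for s_lo, c_lo in lo:
        # The lower half's carry-out selects one of the two upper-half results.
        s_hi, c_hi = hi[c_lo]
        out.append((s_lo | (s_hi << h), c_hi))
    return tuple(out)

def cond_sum_add(x, y, k, cin=0):
    """k-bit addition: the actual carry-in makes the final selection."""
    return cond_sum(x, y, k)[cin]
```

The recursion depth is log2 k, mirroring the log2 k selection levels of the hardware.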
Conditional Sum Adder: Top-Level Block for One Bit Position
Three Levels of a Conditional Sum Adder
Figure: 1-bit conditional sum blocks for bit positions i through i+3 (inputs $x_i, y_i$ through $x_{i+3}, y_{i+3}$) each produce results for c=0 and c=1. At each branch point the block carry-in determines selection, and the selected results are concatenated into wider blocks.
16-Bit Conditional Sum Adder Example
Conditional Sum Adder Metrics
Hybrid Adders
A Hybrid Ripple-Carry/Carry-Lookahead Adder
A Hybrid Carry-Lookahead/Carry-Select Adder
Carry-Skip Adders Fixed-Block-Size
Simple Carry-Skip Adders (Computer Arithmetic, Addition/Subtraction, Apr. 2008)
Fig. 7.1 Converting a 16-bit ripple-carry adder into a simple carry-skip adder with 4-bit skip blocks.

Another View of Carry-Skip Addition
Street/freeway analogy for carry-skip adder.

Mux-Based Skip Carry Logic
The carry-skip adder with “OR combining” works correctly only if we begin with a clean slate, where all signals are 0s at the outset; otherwise it runs into problems that do not exist in the mux-based version.
Figure (Fig of arch book): a 0/1 multiplexer controlled by $p_{[4j, 4j+3]}$ produces $c_{4j+4}$.
Carry-Skip Adder with Fixed Block Size
Block width b; k/b blocks to form a k-bit adder (assume b divides k)
$T_{\text{fixed-skip-add}} = (b - 1) + 0.5 + (k/b - 2) + (b - 1) \approx 2b + k/b - 3.5$ stages
(terms, in order: in block 0, OR gate, skips, in last block)
$dT/db = 2 - k/b^2 = 0 \Rightarrow b_{\text{opt}} = \sqrt{k/2}$
$T_{\text{opt}} = 2\sqrt{2k} - 3.5$
Example: k = 32, $b_{\text{opt}} = 4$, $T_{\text{opt}} = 12.5$ stages (contrast with 32 stages for a ripple-carry adder)
Fixed-Block-Size Carry-Skip Adder (1): Notation & Assumptions
Adder size: k bits
Fixed block size: b bits
Number of stages: t = k/b
Delay of skip logic = delay of one stage of ripple-carry adder = 1 delay unit
Latency of the carry-skip adder with fixed block width:
$\text{Latency}_{\text{fixed-carry-skip}} = (b - 1) + 0.5 + (k/b - 2) + (b - 1) = 2b + k/b - 3.5$
(terms, in order: in block 0, OR gate, skips, in last block)

Fixed-Block-Size Carry-Skip Adder (2): Optimal fixed block size
$\dfrac{d\,\text{Latency}_{\text{fixed-carry-skip}}}{db} = 2 - \dfrac{k}{b^2} = 0$
$b_{\text{opt}} = \sqrt{k/2}$, $\quad t_{\text{opt}} = k / b_{\text{opt}} = \sqrt{2k}$
$\text{Latency}^{\text{opt}}_{\text{fixed-carry-skip}} = 2\sqrt{k/2} + \dfrac{k}{\sqrt{k/2}} - 3.5 = \sqrt{2k} + \sqrt{2k} - 3.5 = 2\sqrt{2k} - 3.5$
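The optimum can be cross-checked numerically by evaluating the latency expression for every block size that divides k. The sketch below uses the same delay assumptions as the slides (one unit per ripple stage and per skip, 0.5 for the OR gate); the function names are hypothetical.

```python
from math import sqrt

def latency_fixed(k, b):
    """Latency in stages: block 0 ripple + OR gate + skips + last block ripple."""
    return (b - 1) + 0.5 + (k / b - 2) + (b - 1)

def best_fixed_block(k):
    """Block size b (a proper divisor of k) that minimizes the latency."""
    candidates = [b for b in range(1, k) if k % b == 0]
    return min(candidates, key=lambda b: latency_fixed(k, b))

def latency_fixed_opt(k):
    """Closed form at the continuous optimum b = sqrt(k/2)."""
    return 2 * sqrt(2 * k) - 3.5
```

For k = 32 the search returns b = 4 with a latency of 12.5 stages, which agrees with the closed form because sqrt(k/2) happens to be an integer that divides k.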
Fixed-Block-Size Carry-Skip Adder (3)
Table: for each adder size k, the columns give $b_{\text{opt}}$, $t_{\text{opt}}$, Latency fixed-carry-skip, Latency ripple-carry, and Latency look-ahead.
Carry-Chain & Carry-Skip Adders in Xilinx FPGAs
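Basic Cell of a Carry-Chain Adder in Xilinx FPGAs
Figure: a LUT forms $p_i$ from $x_i$ and $y_i$; a 0/1 multiplexer on the carry chain, with inputs $y_i$ and $c_i$, produces the next carry; $s_i$ is formed from $p_i$ and $c_i$.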
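A functional model of this cell helps confirm that a chain of such cells adds correctly. The sketch assumes the usual arrangement: the LUT computes p_i = x_i XOR y_i, the multiplexer passes c_i when p_i = 1 and y_i otherwise, and the sum bit is p_i XOR c_i. The name `carry_chain_add` is hypothetical, and the model captures logic values only, not timing.

```python
def carry_chain_add(x, y, k, c0=0):
    """k-bit addition built from carry-chain cells. Returns (sum, carry_out)."""
    c = c0
    s = 0
    for i in range(k):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        p = xi ^ yi              # LUT output (assumed to be the XOR of the inputs)
        s |= (p ^ c) << i        # sum bit: s_i = p_i XOR c_i
        c = c if p else yi       # multiplexer: propagate c_i, or take y_i when p_i = 0
    return s, c
```

When p_i = 0 the two operand bits are equal, so selecting y_i yields a carry of 1 exactly when both bits are 1.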
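Carry-Skip Adder: b-bit block
Figure: block i is a carry chain of LUT and 0/1 multiplexer cells with inputs $x_{ib}, y_{ib}, \dots, x_{ib+(b-1)}, y_{ib+(b-1)}$ and carry-in $c_{ib}$; it produces the propagate signals $p_{ib}, \dots, p_{ib+(b-1)}$, the sum bits $s_{ib}, \dots, s_{ib+(b-1)}$, and the block carry-out $cc_{(i+1)b}$.

Carry-Skip Adder: Carry Skip Multiplexers
Figure: 0/1 multiplexers controlled by the block propagate signals $p_{[0,b-1]}, p_{[b,2b-1]}, \dots, p_{[k-b,k-1]}$ select between the block carry-outs ($cc_b, cc_{2b}, \dots$) and the block carry-ins ($c_0, c_b, \dots, c_{k-b}$) to form $c_b, \dots, c_k$.

Carry-Skip Adder: Computation of the Block Propagate Signals
Figure: a chain of LUTs and 0/1 multiplexers, each LUT taking two bit positions (for example $x_{ib}, y_{ib}, x_{ib+1}, y_{ib+1}$), accumulates $p_{[ib,ib+1]}, p_{[ib,ib+3]}, \dots, p_{[ib,ib+(b-3)]}$ and finally $p_{[ib,ib+(b-1)]}$.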
Complete Carry-Skip Adder
Figure: k/b blocks with inputs $x_{b-1..0}, y_{b-1..0}$; $x_{2b-1..b}, y_{2b-1..b}$; $x_{3b-1..2b}, y_{3b-1..2b}$; ...; $x_{k-1..k-b}, y_{k-1..k-b}$. The blocks produce the sum bits $s_{b-1..0}, s_{2b-1..b}, s_{3b-1..2b}, \dots, s_{k-1..k-b}$ and the carry-outs $cc_b, cc_{2b}, cc_{3b}, \dots, cc_k$. A second row of blocks computes $p_{[0,b-1]}, p_{[b,2b-1]}, p_{[2b,3b-1]}, \dots, p_{[k-b,k-1]}$, which control the 0/1 multiplexers forming $c_b, c_{2b}, \dots, c_k$ starting from $c_0$.
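The overall structure can be modeled functionally as follows: each block ripples its carry-in through b carry-chain cells to produce cc, computes its block propagate signal, and a multiplexer forwards either the block carry-in (skip) or cc. This is a sketch under the same cell assumptions as a plain carry-chain adder (p_i = x_i XOR y_i), with the hypothetical name `carry_skip_add`; it models values, not delays.

```python
def carry_skip_add(x, y, k, b, c0=0):
    """k-bit carry-skip addition with fixed b-bit blocks. Returns (sum, carry_out)."""
    if k % b != 0:
        raise ValueError("b must divide k")
    c = c0                                # carry into the current block
    s = 0
    for base in range(0, k, b):
        cc = c                            # carry rippling inside the block
        block_p = 1                       # block propagate p[base, base+b-1]
        for i in range(base, base + b):
            xi = (x >> i) & 1
            yi = (y >> i) & 1
            p = xi ^ yi
            s |= (p ^ cc) << i
            cc = cc if p else yi
            block_p &= p
        # Skip multiplexer: bypass the block when every position propagates.
        c = c if block_p else cc
    return s, c
```

The skip path never changes the computed value (when every position propagates, cc equals the block carry-in anyway); its purpose in hardware is to shorten the time until the next block's carry-in is valid.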
Carry-Skip Adders in Xilinx Spartan II FPGAs, by J-P. Deschamps, G. Bioul, G. Sutter
Table: delay in ns as a function of the adder size k for b=k, b=8, b=16, and b=32, with the maximum frequency increase in %.

Carry-Skip Adders in Xilinx Spartan II FPGAs, by J-P. Deschamps, G. Bioul, G. Sutter
Table: area in CLB slices as a function of the adder size k for b=k, b=8, b=16, and b=32, with the area overhead in % for b=8, b=16, and b=32. Surviving overhead entries: 28%, 38%, 42%, 40%, 46%, 50%.
Carry-Skip Adders Variable-Block-Size
Carry-Skip Adder with Variable-Width Blocks
Fig. 7.2 Carry-skip adder with variable-size blocks and three sample carry paths.
The total number of bits in the t blocks is k:
$2[b + (b + 1) + \dots + (b + t/2 - 1)] = t(b + t/4 - 1/2) = k$
$b = k/t - t/4 + 1/2$
$T_{\text{var-skip-add}} = 2(b - 1) + 0.5 + t - 2 = 2k/t + t/2 - 2.5$
$dT/dt = -2k/t^2 + 1/2 = 0 \Rightarrow t_{\text{opt}} = 2\sqrt{k}$
$T_{\text{opt}} = 2\sqrt{k} - 2.5$ (a factor of $\sqrt{2}$ smaller than for fixed-block)
Most critical path to produce carry
Figure: block i consists of $b_i$ full-adder (FA) stages on bit positions $j$ through $j + b_i - 1$, with propagate cells P producing $p_j, \dots, p_{j+b_i-1}$ and a CP cell forming the block propagate signal. With $c_{in} = 0$, the path runs through all $b_i$ stages to $c_{out}$.

Most critical path to assimilate carry
Figure: the same block of $b_i$ FA stages; the path runs from $c_{in}$ through the stages to the sum bits, up to $s_{j+b_i-1}$.
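Variable-Block-Size Carry-Skip Adder (1): Notation & Assumptions
Adder size: k bits
Number of stages: t
Block size: variable
First and last block size: b bits
Delay of skip logic = delay of one stage of ripple-carry adder = 1 delay unit

Variable-Block-Size Carry-Skip Adder (2): Optimum block sizes
Blocks:  $b_{t-1}, \; b_{t-2}, \; b_{t-3}, \; \dots, \; b_{t/2+1}, \; b_{t/2}, \; \dots, \; b_2, \; b_1, \; b_0$
Sizes:   $b, \; b+1, \; b+2, \; \dots, \; b + t/2 - 1, \; b + t/2 - 1, \; \dots, \; b+2, \; b+1, \; b$
Total number of bits:
$k = 2\,[\,b + (b+1) + (b+2) + \dots + (b + t/2 - 1)\,] = t\,(b + t/4 - 1/2)$

Variable-Block-Size Carry-Skip Adder (3)
Number of bits in the first and last block:
$b = k/t - t/4 + 1/2$
Latency of the carry-skip adder with variable block width:
$\text{Latency}_{\text{variable-carry-skip}} = (b - 1) + 0.5 + (t - 2) + (b - 1) = 2b + t - 3.5$
(terms, in order: in block 0, OR gate, skips, in last block)
$\quad = 2\,(k/t - t/4 + 1/2) + t - 3.5 = 2k/t + t/2 - 2.5$

Variable-Block-Size Carry-Skip Adder (4): Optimal number of blocks
$\dfrac{d\,\text{Latency}_{\text{variable-carry-skip}}}{dt} = -\dfrac{2k}{t^2} + \dfrac{1}{2} = 0$
$t_{\text{opt}} = \sqrt{4k} = 2\sqrt{k}$
$b_{\text{opt}} = \dfrac{k}{t_{\text{opt}}} - \dfrac{t_{\text{opt}}}{4} + \dfrac{1}{2} = \dfrac{\sqrt{k}}{2} - \dfrac{\sqrt{k}}{2} + \dfrac{1}{2} = \dfrac{1}{2}$, so $b_{\text{opt}} = 1$

Variable-Block-Size Carry-Skip Adder (5): Optimal latency
$\text{Latency}^{\text{opt}}_{\text{variable-carry-skip}} = \dfrac{2k}{t_{\text{opt}}} + \dfrac{t_{\text{opt}}}{2} - 2.5 = \dfrac{2k}{2\sqrt{k}} + \dfrac{2\sqrt{k}}{2} - 2.5 = 2\sqrt{k} - 2.5$
$\text{Latency}^{\text{opt}}_{\text{fixed-carry-skip}} \approx \sqrt{2} \cdot \text{Latency}^{\text{opt}}_{\text{variable-carry-skip}}$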
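The block-size profile and the latency expressions can be checked numerically. The sketch below builds the symmetric size sequence b, b+1, ..., b + t/2 - 1, b + t/2 - 1, ..., b+1, b, evaluates the variable-block latency, and compares the two optima for a large k. The function names are hypothetical, and t is treated as a continuous quantity in the closed forms.

```python
from math import sqrt

def block_sizes(b, t):
    """Block widths listed from block 0 to block t-1 (t even)."""
    half = [b + i for i in range(t // 2)]
    return half + half[::-1]

def latency_variable(k, t):
    """Latency in stages for t blocks, with b derived from k and t."""
    b = k / t - t / 4 + 0.5
    return (b - 1) + 0.5 + (t - 2) + (b - 1)

def latency_variable_opt(k):
    """Closed form at t = 2 sqrt(k)."""
    return 2 * sqrt(k) - 2.5

def latency_fixed_opt(k):
    """Closed form for the fixed-block design at b = sqrt(k/2)."""
    return 2 * sqrt(2 * k) - 3.5
```

For example, b = 1 and t = 8 give block widths 1, 2, 3, 4, 4, 3, 2, 1, which total 20 bits and agree with t(b + t/4 - 1/2). Rounding the optimal b from 1/2 up to 1 therefore covers slightly more than k bits for t = 2 sqrt(k), so in practice the widest blocks would be trimmed.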