Fast Adders: Parallel Prefix Network Adders, Conditional-Sum Adders, & Carry-Skip Adders ECE 645: Lecture 5.

Required Reading
Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design
Chapter 6.4, Carry Determination as Prefix Computation
Chapter 6.5, Alternative Parallel Prefix Networks
Chapter 7.4, Conditional-Sum Adder
Chapter 7.5, Hybrid Adder Designs
Chapter 7.1, Simple Carry-Skip Adders
Note errata at:

Recommended Reading
J-P. Deschamps, G. Bioul, G. Sutter, Synthesis of Arithmetic Circuits: FPGA, ASIC and Embedded Systems
Chapters on:
Prefix Adders
Carry-Skip Adder
Optimization of Carry-Skip Adders
FPGA Implementations of Adders
Long-Operand Adders

Parallel Prefix Network Adders

5 Basic component - Carry operator (1)
Blocks B' and B'' carry the signal pairs (g', p') and (g'', p''); the merged block B carries (g, p):
$g = g'' + g' p''$
$p = p' p''$
$(g, p) = (g', p') ¢ (g'', p'') = (g'' + g' p'',\ p' p'')$

6 Basic component - Carry operator (2)
The same merge applies when blocks B' and B'' overlap (overlap okay!):
$g = g'' + g' p''$
$p = p' p''$
$(g, p) = (g', p') ¢ (g'', p'') = (g'' + g' p'',\ p' p'')$

7 Properties of the carry operator ¢
Associative: $[(g_1, p_1) ¢ (g_2, p_2)] ¢ (g_3, p_3) = (g_1, p_1) ¢ [(g_2, p_2) ¢ (g_3, p_3)]$
Not commutative: $(g_1, p_1) ¢ (g_2, p_2) \neq (g_2, p_2) ¢ (g_1, p_1)$
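The two properties can be checked exhaustively over all single-bit (g, p) pairs. The following is a minimal sketch; the function name `carry_op` is a hypothetical choice for illustration, and its first argument is taken to be the lower-order block (g', p').

```python
from itertools import product

def carry_op(lower, upper):
    """Carry operator: merge (g', p') of the lower block with (g'', p'') of the upper block."""
    g1, p1 = lower
    g2, p2 = upper
    return (g2 | (g1 & p2), p1 & p2)

# All four possible (g, p) bit pairs.
pairs = list(product((0, 1), repeat=2))

# Associativity: grouping does not matter.
associative = all(
    carry_op(carry_op(a, b), c) == carry_op(a, carry_op(b, c))
    for a in pairs for b in pairs for c in pairs
)

# Commutativity fails for at least one pair, e.g. (1, 0) and (0, 0).
commutative = all(carry_op(a, b) == carry_op(b, a) for a in pairs for b in pairs)

print("associative:", associative)
print("commutative:", commutative)
```

Associativity is what allows the prefix network to regroup the merges freely; the lack of commutativity is why the operand order (lower block first) has to be preserved.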

8 Major concept
Given: $(g_0, p_0),\ (g_1, p_1),\ (g_2, p_2),\ \ldots,\ (g_{k-1}, p_{k-1})$
Find: $(g_{[0,0]}, p_{[0,0]}),\ (g_{[0,1]}, p_{[0,1]}),\ (g_{[0,2]}, p_{[0,2]}),\ \ldots,\ (g_{[0,k-1]}, p_{[0,k-1]})$
Here $g_{[0,k-1]}$ is the block generate from index 0 to k-1.
Carries: $c_i = g_{[0,i-1]} + c_0\, p_{[0,i-1]}$

9 Parallel Prefix Sum Problem
Given: $x_0,\ x_1,\ x_2,\ \ldots,\ x_{k-1}$
Find: $x_0,\ x_0 + x_1,\ x_0 + x_1 + x_2,\ \ldots,\ x_0 + x_1 + x_2 + \cdots + x_{k-1}$
Parallel Prefix Adder Problem (similar to the Parallel Prefix Sum Problem)
Given: $x_0,\ x_1,\ x_2,\ \ldots,\ x_{k-1}$, where $x_i = (g_i, p_i)$
Find: $x_0,\ x_0 ¢ x_1,\ x_0 ¢ x_1 ¢ x_2,\ \ldots,\ x_0 ¢ x_1 ¢ x_2 ¢ \cdots ¢ x_{k-1}$
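As a behavioral illustration of the prefix formulation, the sketch below computes all prefixes (g[0,i], p[0,i]) with a serial scan and then derives the carries and sum bits. It is not a parallel network; it only shows that the prefix results are exactly what an adder needs. The names `carry_op` and `prefix_add` are hypothetical, and p is assumed to be the XOR propagate.

```python
from itertools import accumulate

def carry_op(lower, upper):
    """(g', p') ¢ (g'', p'') = (g'' + g' p'', p' p'')."""
    g1, p1 = lower
    g2, p2 = upper
    return (g2 | (g1 & p2), p1 & p2)

def prefix_add(x, y, k, c0=0):
    """Add two k-bit integers via prefix (g, p) pairs; returns (sum, carry_out)."""
    xs = [(x >> i) & 1 for i in range(k)]
    ys = [(y >> i) & 1 for i in range(k)]
    gp = [(a & b, a ^ b) for a, b in zip(xs, ys)]   # (g_i, p_i)
    prefixes = list(accumulate(gp, carry_op))        # (g[0,i], p[0,i])
    # c_0 is given; c_{i+1} = g[0,i] + c_0 p[0,i]
    carries = [c0] + [g | (c0 & p) for g, p in prefixes]
    total = 0
    for i in range(k):
        total |= (gp[i][1] ^ carries[i]) << i        # s_i = p_i xor c_i
    return total, carries[k]

print(prefix_add(0b1011, 0b0110, 4))
```

Any of the prefix networks on the following slides can replace the serial `accumulate` call without changing the result, because the operator is associative.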

10 Parallel Prefix Sums Network I

11 Parallel Prefix Sums Network I – Cost (Area) Analysis
$C(k) = 2C(k/2) + k/2$
$= 2[2C(k/4) + k/4] + k/2 = 4C(k/4) + k/2 + k/2$
$= \cdots$
$= 2^{\log_2 k - 1} C(2) + (k/2)(\log_2 k - 1)$
$= (k/2) \log_2 k$, with $C(2) = 1$
Example:
$C(16) = 2C(8) + 8 = 2[2C(4) + 4] + 8 = 4C(4) + 16 = 4[2C(2) + 2] + 16 = 8C(2) + 24 = 32 = (16/2) \log_2 16$

12 Parallel Prefix Sums Network I – Delay Analysis
$D(k) = D(k/2) + 1$
$= [D(k/4) + 1] + 1 = D(k/4) + 2$
$= \cdots$
$= \log_2 k$, with $D(2) = 1$
Example:
$D(16) = D(8) + 1 = [D(4) + 1] + 1 = D(4) + 2 = [D(2) + 1] + 2 = 4 = \log_2 16$
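The recurrences for Network I can be reproduced by building the network recursively and counting operator cells. This is a sketch under the assumption that Network I is the divide-and-conquer design in which two k/2-input prefix networks are followed by k/2 cells that merge the last output of the lower half into every output of the upper half. The name `network1` and the `stats` dictionary are hypothetical.

```python
from itertools import accumulate
from operator import add

def network1(xs, op, stats):
    """Return (prefixes, depth); stats['cells'] accumulates the number of op cells."""
    k = len(xs)
    if k == 1:
        return list(xs), 0
    lo, d_lo = network1(xs[:k // 2], op, stats)
    hi, d_hi = network1(xs[k // 2:], op, stats)
    # k/2 cells: combine the full lower-half prefix with each upper-half prefix.
    hi = [op(lo[-1], h) for h in hi]
    stats["cells"] += k // 2
    return lo + hi, max(d_lo, d_hi) + 1

stats = {"cells": 0}
xs = list(range(1, 17))
out, depth = network1(xs, add, stats)
print(out == list(accumulate(xs)), stats["cells"], depth)
```

For 16 inputs the count matches C(16) = 32 and D(16) = 4 from the analysis. The base case here is a single input with zero cells, which gives C(2) = 1 and D(2) = 1 as on the slides.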

13 Parallel Prefix Sums Network II (Brent-Kung)

14 Parallel Prefix Sums Network II – Cost (Area) Analysis
$C(k) = C(k/2) + k - 1$
$= [C(k/4) + k/2 - 1] + k - 1 = C(k/4) + 3k/2 - 2$
$= \cdots$
$= C(2) + (2k - 2k/2^{\log_2 k - 1}) - (\log_2 k - 1)$
$= 2k - 2 - \log_2 k$, with $C(2) = 1$
Example:
$C(16) = C(8) + 16 - 1 = [C(4) + 8 - 1] + 16 - 1 = C(2) + 4 - 1 + 8 - 1 + 16 - 1 = 26 = 2 \cdot 16 - 2 - \log_2 16$

15 Parallel Prefix Sums Network II – Delay Analysis
$D(k) = D(k/2) + 2$
$= [D(k/4) + 2] + 2 = D(k/4) + 4$
$= \cdots$
$= 2 \log_2 k - 1$, with $D(2) = 1$
Example:
$D(16) = D(8) + 2 = [D(4) + 2] + 2 = D(4) + 4 = [D(2) + 2] + 4 = 7 = 2 \log_2 16 - 1$
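The Network II recurrences can be checked the same way. The sketch below assumes the usual recursive Brent-Kung construction: one level of k/2 cells combines adjacent input pairs, a k/2-input network of the same kind processes the pair results, and a final level of k/2 - 1 cells fills in the even-indexed outputs. The function name and the `stats` dictionary are hypothetical, and the returned depth is the bound given by the recurrence D(k) = D(k/2) + 2, not a traced critical path.

```python
def brent_kung_prefix(xs, op, stats):
    """Return (prefixes, depth_bound) for a power-of-two number of inputs."""
    k = len(xs)
    if k == 1:
        return list(xs), 0
    # First level: combine adjacent pairs (k/2 cells).
    pairs = [op(xs[i], xs[i + 1]) for i in range(0, k, 2)]
    stats["cells"] += k // 2
    # Recursive k/2-input network yields the odd-indexed prefixes.
    inner, inner_depth = brent_kung_prefix(pairs, op, stats)
    out = [xs[0]]
    for i in range(1, k):
        if i % 2 == 1:
            out.append(inner[i // 2])
        else:
            # Last level: one more cell for each even index above 0 (k/2 - 1 cells).
            out.append(op(inner[i // 2 - 1], xs[i]))
            stats["cells"] += 1
    depth_bound = 1 if k == 2 else inner_depth + 2
    return out, depth_bound

stats = {"cells": 0}
letters = list("abcdefghijklmnop")
out, depth_bound = brent_kung_prefix(letters, lambda a, b: a + b, stats)
print(out[-1], stats["cells"], depth_bound)
```

String concatenation is used as the operator because it is associative but not commutative, like the carry operator, so an ordering mistake would show up in the output.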

16 8-bit Brent-Kung Parallel Prefix Network

17 4-bit Brent-Kung Parallel Prefix Network
[Figure: inputs x1', x3', x5', x7' and outputs s1', s3', s5', s7', built around 2-bit B-K PPN blocks.]

18 8-bit Brent-Kung Parallel Prefix Network Critical Path

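19 Critical Path
GP block: $g_i = x_i y_i$, $p_i = x_i \oplus y_i$
Carry operator ¢: $g = g'' + g' p''$, $p = p' p''$
C block: $c_{i+1} = g_{[0,i]} + c_0\, p_{[0,i]}$
S block: $s_i = p_i \oplus c_i$
Delay annotations on the slide: 1 gate delay, 2 gate delays, 1 gate delay.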

20 Brent-Kung Parallel Prefix Graph for 16 Inputs

21 Kogge-Stone Parallel Prefix Graph for 16 Inputs
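For comparison with the Brent-Kung graph, a Kogge-Stone network can be modeled as a sequence of levels in which every position i >= d combines with the position d places below it, with d doubling each level. The sketch below counts cells and levels; the function name `kogge_stone_prefix` is hypothetical.

```python
def kogge_stone_prefix(xs, op):
    """Return (prefixes, cells, levels) for a Kogge-Stone style prefix network."""
    k = len(xs)
    cur = list(xs)
    cells = 0
    levels = 0
    d = 1
    while d < k:
        nxt = list(cur)
        for i in range(d, k):
            # Lower-order span first, because the operator is not commutative.
            nxt[i] = op(cur[i - d], cur[i])
            cells += 1
        cur = nxt
        d *= 2
        levels += 1
    return cur, cells, levels

out, cells, levels = kogge_stone_prefix(list("abcdefghijklmnop"), lambda a, b: a + b)
print(out[-1], cells, levels)
```

For 16 inputs this gives 15 + 14 + 12 + 8 = 49 cells in 4 levels, which shows the tradeoff against Brent-Kung: fewer levels at the price of more cells.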

22 Comparison of architectures
Architecture              Cost(k)                 Delay(k)
Network 2 (Brent-Kung)    2k - 2 - log2 k         2 log2 k - 2
Kogge-Stone               k log2 k - k + 1        log2 k
Hybrid                    (k/2) log2 k            log2 k + 1
[Table columns Cost(16), Delay(16), Cost(32), Delay(32): numeric entries missing.]
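The closed-form expressions can be evaluated for k = 16 and k = 32 with a few lines of code. The formulas in this sketch are assumptions: Brent-Kung cost 2k - 2 - log2 k and delay 2 log2 k - 2, Kogge-Stone cost k log2 k - k + 1 and delay log2 k, hybrid cost (k/2) log2 k and delay log2 k + 1. The names `FORMULAS` and `compare` are hypothetical.

```python
# Closed-form (cost, delay) expressions, assumed as stated in the lead-in.
# k is the number of inputs (a power of two) and n = log2 k.
FORMULAS = {
    "Brent-Kung": (lambda k, n: 2 * k - 2 - n, lambda k, n: 2 * n - 2),
    "Kogge-Stone": (lambda k, n: k * n - k + 1, lambda k, n: n),
    "Hybrid": (lambda k, n: k * n // 2, lambda k, n: n + 1),
}

def compare(k):
    """Return {architecture: (cost, delay)} for a power-of-two k."""
    n = k.bit_length() - 1  # exact log2 for powers of two
    return {name: (cost(k, n), delay(k, n)) for name, (cost, delay) in FORMULAS.items()}

for k in (16, 32):
    print(k, compare(k))
```

Costs are in carry-operator cells and delays in cell levels, so the numbers compare the prefix networks only and leave out the GP and sum logic.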

23 Latency vs. Area Tradeoff

24 Hybrid Brent-Kung/Kogge-Stone Parallel Prefix Graph for 16 Inputs

Conditional-Sum Adders

One-level k-bit Carry-Select Adder

Two-level k-bit Carry Select Adder

28 Conditional Sum Adder
Extension of the carry-select adder.
Carry-select adder:
  One-level, using k/2-bit adders
  Two-level, using k/4-bit adders
  Three-level, using k/8-bit adders
  Etc.
Assuming k is a power of two, the extreme case has log2 k levels using 1-bit adders.
This is a conditional sum adder.
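A behavioral sketch of the idea: every 1-bit block precomputes its (sum, carry-out) for both possible carry-ins, and each level merges adjacent blocks by letting the lower block's carry-out select among the upper block's two precomputed results. The function name `conditional_sum_add` and the dictionary representation are hypothetical; k is assumed to be a power of two.

```python
def conditional_sum_add(x, y, k, c0=0):
    """Add two k-bit integers (k a power of two); returns (sum, carry_out)."""
    # Level 0: 1-bit blocks, each holding (sum, carry_out) for carry-in 0 and 1.
    blocks = []
    for i in range(k):
        a, b = (x >> i) & 1, (y >> i) & 1
        blocks.append({c: (a ^ b ^ c, (a & b) | (c & (a ^ b))) for c in (0, 1)})
    width = 1
    # log2(k) merge levels: the block width doubles at each level.
    while len(blocks) > 1:
        merged = []
        for j in range(0, len(blocks), 2):
            lo, hi = blocks[j], blocks[j + 1]
            blk = {}
            for c in (0, 1):
                lo_sum, lo_carry = lo[c]
                hi_sum, hi_carry = hi[lo_carry]   # mux selected by the lower carry-out
                blk[c] = (lo_sum | (hi_sum << width), hi_carry)
            merged.append(blk)
        blocks = merged
        width *= 2
    # The actual carry-in selects the final result.
    return blocks[0][c0]

print(conditional_sum_add(0b1011, 0b0110, 4))
```

The real carry-in is only consulted at the very end, which is what lets all levels work without waiting for a carry to ripple through.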

29 Conditional Sum Adder: Top-Level Block for One Bit Position

30 Three Levels of a Conditional Sum Adder
[Figure: bit positions i to i+3, each with a 1-bit conditional sum block producing results for c=0 and c=1. Labels: branch point, concatenation, block carry-in determines selection.]

31 16-Bit Conditional Sum Adder Example

32 Conditional Sum Adder Metrics

Hybrid Adders

A Hybrid Ripple-Carry/Carry-Lookahead Adder

A Hybrid Carry-Lookahead/Carry-Select Adder

Carry-Skip Adders Fixed-Block-Size

Apr. 2008, Computer Arithmetic, Addition/Subtraction
Simple Carry-Skip Adders
Fig. 7.1 Converting a 16-bit ripple-carry adder into a simple carry-skip adder with 4-bit skip blocks.

38 Another View of Carry-Skip Addition
Street/freeway analogy for carry-skip adder.

39 Mux-Based Skip Carry Logic
The carry-skip adder with "OR combining" works correctly only if we begin with a clean slate, where all signals are 0s at the outset; otherwise it runs into problems that do not exist in the mux-based version.
[Figure: multiplexer with inputs labeled 0 and 1; signals $p_{[4j,4j+3]}$ and $c_{4j+4}$. Fig of arch book]

40 Carry-Skip Adder with Fixed Block Size
Block width b; k/b blocks to form a k-bit adder (assume b divides k).
$T_{\text{fixed-skip-add}} = (b - 1) + 0.5 + (k/b - 2) + (b - 1) = 2b + k/b - 3.5$ stages
The four terms are, in order: in block 0, OR gate, skips, in last block.
$dT/db = 2 - k/b^2 = 0 \Rightarrow b_{opt} = \sqrt{k/2}$
$T_{opt} = 2\sqrt{2k} - 3.5$
Example: k = 32, $b_{opt}$ = 4, $T_{opt}$ = 12.5 stages (contrast with 32 stages for a ripple-carry adder).

Fixed-Block-Size Carry-Skip Adder (1)
Notation & Assumptions
Adder size: k bits
Fixed block size: b bits
Number of stages: t
Delay of skip logic = delay of one stage of ripple-carry adder = 1 delay unit
Latency of the carry-skip adder with fixed block width:
$\text{Latency}_{\text{fixed-carry-skip}} = (b - 1) + 0.5 + (k/b - 2) + (b - 1) = 2b + k/b - 3.5$
The four terms are, in order: in block 0, OR gate, skips, in last block.

Fixed-Block-Size Carry-Skip Adder (2)
Optimal fixed block size
$d\,\text{Latency}_{\text{fixed-carry-skip}} / db = 2 - k/b^2 = 0$
$b_{opt} = \sqrt{k/2}$
$t_{opt} = k / b_{opt} = \sqrt{2k}$
$\text{Latency}^{opt}_{\text{fixed-carry-skip}} = 2\sqrt{k/2} + k/\sqrt{k/2} - 3.5 = \sqrt{2k} + \sqrt{2k} - 3.5 = 2\sqrt{2k} - 3.5$
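The optimum can be cross-checked numerically by evaluating the latency for every block width that divides k. This sketch assumes the latency model (b - 1) + 0.5 + (k/b - 2) + (b - 1), in which the 0.5 term is the OR-gate delay; the function names are hypothetical.

```python
import math

def fixed_skip_latency(k, b):
    """Latency in stages: block 0 ripple + OR gate + skips + last-block ripple."""
    return (b - 1) + 0.5 + (k / b - 2) + (b - 1)

def best_fixed_block(k):
    """Block width b (a divisor of k) with the smallest modeled latency."""
    divisors = [b for b in range(1, k + 1) if k % b == 0]
    return min(divisors, key=lambda b: fixed_skip_latency(k, b))

for k in (32, 128):
    b = best_fixed_block(k)
    print(k, b, fixed_skip_latency(k, b), 2 * math.sqrt(2 * k) - 3.5)
```

For k = 32 and k = 128 the search lands exactly on sqrt(k/2); for other k the analytic optimum is generally not an integer divisor of k, so the best achievable latency is slightly above 2 sqrt(2k) - 3.5.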

Fixed-Block-Size Carry-Skip Adder (3)
[Table columns: k, $b_{opt}$, $t_{opt}$, Latency fixed-carry-skip, Latency ripple-carry, Latency look-ahead.]

Carry-Chain & Carry-Skip Adders in Xilinx FPGAs

Basic Cell of a Carry-Chain Adder in Xilinx FPGAs
[Figure: a LUT computes $p_i$ from $x_i$ and $y_i$; a 0/1 multiplexer on the carry chain takes $y_i$ and $c_i$; outputs are the next carry and $s_i$.]

Carry-Skip Adder b-bit block
[Figure: chain of LUT and multiplexer cells for bit positions $i \cdot b$ through $i \cdot b + (b-1)$, with inputs x and y, propagate signals p, sum outputs s, block carry-in $c_{i \cdot b}$ and block carry-out $cc_{(i+1) \cdot b}$.]

Carry-Skip Adder: Carry Skip Multiplexers
[Figure: 0/1 multiplexers controlled by the block propagate signals $p_{[0,b-1]}$, $p_{[b,2b-1]}$, ..., $p_{[k-b,k-1]}$, selecting between the block carry-outs $cc_b$, $cc_{2b}$, ..., $cc_{k-b}$ and the skipped carries to produce $c_b$, ..., $c_{k-b}$, $c_k$ from $c_0$.]

Carry-Skip Adder: Computation of the Block Propagate Signals
[Figure: LUT and multiplexer chain combining $x_{i \cdot b}, y_{i \cdot b}$ through $x_{i \cdot b+(b-1)}, y_{i \cdot b+(b-1)}$ into the partial signals $p_{[i \cdot b, i \cdot b+1]}$, $p_{[i \cdot b, i \cdot b+3]}$, ..., $p_{[i \cdot b, i \cdot b+(b-3)]}$ and the block propagate $p_{[i \cdot b, i \cdot b+(b-1)]}$.]

Complete Carry-Skip Adder
[Figure: k/b blocks with operand slices $x_{b-1..0}, y_{b-1..0}$ through $x_{k-1..k-b}, y_{k-1..k-b}$, sum outputs up to $s_{k-1..k-b}$, block propagate signals $p_{[0,b-1]}$, $p_{[b,2b-1]}$, $p_{[2b,3b-1]}$, ..., $p_{[k-b,k-1]}$, and skip multiplexers linking $c_0$, $c_b$, $c_{2b}$, ..., $c_{k-b}$, $c_k$.]

Carry-Skip Adders in Xilinx Spartan II FPGAs
by J-P. Deschamps, G. Bioul, G. Sutter
[Table: delay in ns versus k for b = k, b = 8, b = 16, b = 32, with the maximum frequency increase in %.]

Carry-Skip Adders in Xilinx Spartan II FPGAs
by J-P. Deschamps, G. Bioul, G. Sutter
[Table: area in CLB slices versus k for b = k, b = 8, b = 16, b = 32, with the area overhead in % for b = 8, b = 16, b = 32. Overhead values that appear: 28%, 38%, 42%, 40%, 46%, 50%.]

Carry-Skip Adders Variable-Block-Size

53 Carry-Skip Adder with Variable-Width Blocks
Fig. 7.2 Carry-skip adder with variable-size blocks and three sample carry paths.
The total number of bits in the t blocks is k:
$2[b + (b + 1) + \cdots + (b + t/2 - 1)] = t(b + t/4 - 1/2) = k$
$b = k/t - t/4 + 1/2$
$T_{\text{var-skip-add}} = 2(b - 1) + 0.5 + t - 2 = 2k/t + t/2 - 2.5$
$dT/dt = -2k/t^2 + 1/2 = 0 \Rightarrow t_{opt} = 2\sqrt{k}$
$T_{opt} = 2\sqrt{k} - 2.5$ (a factor of $\sqrt{2}$ smaller than for fixed-block)

Most critical path to produce carry
[Figure: block i of $b_i$ stages with $c_{in} = 0$. Propagate cells P compute $p_j$ through $p_{j+b_i-1}$ from $x_j, y_j$ through $x_{j+b_i-1}, y_{j+b_i-1}$; a CP cell forms the block propagate signal; full adders FA produce $s_j$ through $s_{j+b_i-1}$ and $c_{out}$.]

Most critical path to assimilate carry
[Figure: block i of $b_i$ stages with carry-in $c_{in}$. Propagate cells P compute $p_j$ through $p_{j+b_i-1}$ from $x_j, y_j$ through $x_{j+b_i-1}, y_{j+b_i-1}$; a CP cell forms the block propagate signal; full adders FA produce $s_j$ through $s_{j+b_i-1}$ and $c_{out}$.]

Variable-Block-Size Carry-Skip Adder (1)
Notation & Assumptions
Adder size: k bits
Number of stages: t
Block size: variable
First and last block size: b bits
Delay of skip logic = delay of one stage of ripple-carry adder = 1 delay unit

Variable-Block-Size Carry-Skip Adder (2)
Optimum block sizes
Blocks: $b_{t-1},\ b_{t-2},\ b_{t-3},\ \ldots,\ b_{t/2+1},\ b_{t/2},\ \ldots,\ b_2,\ b_1,\ b_0$
Sizes: $b,\ b+1,\ b+2,\ \ldots,\ b + t/2 - 1,\ b + t/2 - 1,\ \ldots,\ b+2,\ b+1,\ b$
Total number of bits:
$k = 2[b + (b+1) + (b+2) + \cdots + (b + t/2 - 1)] = t(b + t/4 - 1/2)$
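The block-size pattern and its total can be checked directly. This is a small sketch assuming an even number of blocks t, with sizes rising by one bit from b to b + t/2 - 1 and then falling symmetrically; the function name `variable_block_sizes` is hypothetical.

```python
def variable_block_sizes(b, t):
    """Symmetric block widths b, b+1, ..., b+t/2-1, b+t/2-1, ..., b+1, b (t even)."""
    half = [b + i for i in range(t // 2)]
    return half + half[::-1]

sizes = variable_block_sizes(1, 8)
print(sizes, sum(sizes))
```

With b = 1 and t = 8 the widths sum to 20 bits, in agreement with t(b + t/4 - 1/2) = 8 × 2.5.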

Variable-Block-Size Carry-Skip Adder (3)
Number of bits in the first and last block:
$b = k/t - t/4 + 1/2$
Latency of the carry-skip adder with variable block width:
$\text{Latency}_{\text{variable-carry-skip}} = (b - 1) + 0.5 + (t - 2) + (b - 1) = 2b + t - 3.5$
The four terms are, in order: in block 0, OR gate, skips, in last block.
$= 2(k/t - t/4 + 1/2) + t - 3.5 = 2k/t + t/2 - 2.5$

Variable-Block-Size Carry-Skip Adder (4)
Optimal number of blocks
$d\,\text{Latency}_{\text{variable-carry-skip}} / dt = -2k/t^2 + 1/2 = 0$
$t_{opt} = \sqrt{4k} = 2\sqrt{k}$
$b = k/t_{opt} - t_{opt}/4 + 1/2 = \sqrt{k}/2 - \sqrt{k}/2 + 1/2 = 1/2$, so $b_{opt} = 1$

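Variable-Block-Size Carry-Skip Adder (5)
Optimal latency
$\text{Latency}^{opt}_{\text{variable-carry-skip}} = 2k/t_{opt} + t_{opt}/2 - 2.5 = 2k/(2\sqrt{k}) + (2\sqrt{k})/2 - 2.5 = 2\sqrt{k} - 2.5$
$\text{Latency}^{opt}_{\text{variable-carry-skip}} \approx \text{Latency}^{opt}_{\text{fixed-carry-skip}} / \sqrt{2}$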
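The square-root-of-two relationship can be seen numerically by evaluating both optimal latencies. The sketch assumes the models 2k/t + t/2 - 2.5 for variable-width blocks and 2 sqrt(2k) - 3.5 for the fixed-width optimum; the function names are hypothetical.

```python
import math

def var_skip_latency(k, t):
    """Modeled latency of a variable-block carry-skip adder with t blocks."""
    return 2 * k / t + t / 2 - 2.5

def opt_var_latency(k):
    """Latency at t_opt = 2 sqrt(k), which simplifies to 2 sqrt(k) - 2.5."""
    return var_skip_latency(k, 2 * math.sqrt(k))

def opt_fixed_latency(k):
    """Optimal fixed-block latency, 2 sqrt(2k) - 3.5."""
    return 2 * math.sqrt(2 * k) - 3.5

for k in (64, 1024, 2 ** 20):
    ratio = opt_fixed_latency(k) / opt_var_latency(k)
    print(k, opt_var_latency(k), round(opt_fixed_latency(k), 3), round(ratio, 4))
```

The ratio only approaches sqrt(2) as k grows, because the constant terms 2.5 and 3.5 matter more at small word lengths.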