Integer Multipliers
Multipliers: A multiplier is a must-have circuit in most DSP applications. A variety of multipliers exists, and the choice among them is based on performance: Serial, Serial/Parallel, Shift and Add, Array, Booth, Wallace Tree, and others.
[Figure: 16x16 multiplier with converters and registers RA, RB, RC]
Multiplication Algorithm
X = X_{n-1} X_{n-2} ... X_0   (Multiplicand)
Y = Y_{n-1} Y_{n-2} ... Y_0   (Multiplier)
Partial products (each row is shifted one bit position to the left of the row above it):
Y_{n-1}X_0      Y_{n-2}X_0      Y_{n-3}X_0      ...  Y_1X_0      Y_0X_0
Y_{n-1}X_1      Y_{n-2}X_1      Y_{n-3}X_1      ...  Y_1X_1      Y_0X_1
Y_{n-1}X_2      Y_{n-2}X_2      Y_{n-3}X_2      ...  Y_1X_2      Y_0X_2
...
Y_{n-1}X_{n-2}  Y_{n-2}X_{n-2}  Y_{n-3}X_{n-2}  ...  Y_1X_{n-2}  Y_0X_{n-2}
Y_{n-1}X_{n-1}  Y_{n-2}X_{n-1}  Y_{n-3}X_{n-1}  ...  Y_1X_{n-1}  Y_0X_{n-1}
--------------------------------------------------------------------------
P_{2n-1} P_{2n-2} P_{2n-3} ... P_2 P_1 P_0
1. Multiplication Algorithms: Implementing the multiplication of binary numbers boils down to how the additions are done. Consider two 8-bit numbers A and B that generate the 16-bit product P: first generate the 64 partial-product bits (one AND term for each pair of operand bits), then add them up.
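The generate-then-add idea can be sketched in a few lines of Python. This is only an illustration for unsigned operands; the function names are made up for this sketch, and the operand values 0x89 and 0xAB are borrowed from the simulation example that appears later in the deck.

```python
def partial_product_bits(a: int, b: int, n: int = 8):
    """Return the n*n AND terms as (weight, bit) pairs.

    The term a_i AND b_j carries the weight 2**(i + j).
    """
    return [(i + j, ((a >> i) & 1) & ((b >> j) & 1))
            for j in range(n) for i in range(n)]


def multiply_by_partial_products(a: int, b: int, n: int = 8) -> int:
    """Add up all weighted partial-product bits (unsigned operands)."""
    return sum(bit << weight for weight, bit in partial_product_bits(a, b, n))


terms = partial_product_bits(0x89, 0xAB)
product = multiply_by_partial_products(0x89, 0xAB)
print(len(terms), hex(product))  # 64 0x5b83
```

How the 64 bits are added (serially, row by row, or in a tree) is what distinguishes the multiplier architectures on the following slides.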
Multiplier Design [Block diagram: Storage, input register (REG IN), MU (Multiplier Unit), output register (REG OUT), Control Unit]
Serial Multiplier
[Animation sequence: 22 slides]
X: x3 x2 x1 x0
Y: y3 y2 y1 y0
Input sequences for G1:
  0 0 x3x2x1x0 0 x3x2x1x0 0 x3x2x1x0 0 x3x2x1x0
  0 0 y3y3y3y3 0 y2y2y2y2 0 y1y1y1y1 0 y0y0y0y0
Reset: 010000100001000010000
Si: the ith bit of the final result
Ci: the only carry from column i
Sij: the jth partial sum for column i
Cij: the jth partial carry from column i
Serial / Parallel Multiplier
[Animation sequence: 8 slides]
Si: the ith bit of the final result
Ci: the only carry from column i
Sij: the jth partial sum for column i
Cij: the jth partial carry from column i
Shift and Add Multiplier [Block diagram components: 8-bit Adder, MUX, REGA, REGB, REGC, CLOCK; inputs Ain (7 downto 0) and Bin (7 downto 0); outputs Result (7 downto 0) and Result (15 downto 8)]
Synchronous Shift and Add Multiplier Controller Design
Multiplication process, 5 states: Idle, Init, Test, Add, and Shift&Count.
Idle: The process starts on receiving the Start signal.
Init: The multiplicand and the multiplier are loaded into a load register and a shift register, respectively.
Test: The LSB of the shift register that holds the multiplier is tested to decide the next state.
Add: If the LSB is ‘1’, the new partial product is added to the accumulated result, and the state machine moves to the Shift&Count state.
Shift&Count: If the LSB is ‘0’ (or once the addition is done), the two shift registers shift their contents one bit to the right, and the counter counts up by one. The state machine then returns to the Test state. When the counter reaches N, a Stop signal is asserted and the state machine goes to the Idle state.
Idle: In the Idle state, a Done signal is asserted to indicate the end of the multiplication.
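The five-state controller can be mimicked in software to see the order in which the states are visited. The sketch below is a behavioral model only, not the VHDL design from the slides: the function name, the variable names, and the choice to return a state trace are assumptions made for illustration, and the operands are treated as unsigned.

```python
def shift_add_fsm(multiplicand: int, multiplier: int, n: int = 8):
    """Behavioral model of the Idle/Init/Test/Add/Shift&Count controller.

    Returns the 2n-bit product and the list of visited states.
    """
    state, trace = "Idle", []
    acc = mcand = shift_reg = count = 0
    start, done = True, False              # Start is assumed to be asserted once
    while not done:
        trace.append(state)
        if state == "Idle":
            if start:
                start, state = False, "Init"
            else:
                done = True                # second visit to Idle: Done is asserted
        elif state == "Init":
            mcand, shift_reg, acc, count = multiplicand, multiplier, 0, 0
            state = "Test"
        elif state == "Test":
            state = "Add" if shift_reg & 1 else "Shift&Count"
        elif state == "Add":
            acc += mcand                   # acc may temporarily need n + 1 bits
            state = "Shift&Count"
        elif state == "Shift&Count":
            # Both registers shift right; the LSB of acc moves into the multiplier register.
            shift_reg = (shift_reg >> 1) | ((acc & 1) << (n - 1))
            acc >>= 1
            count += 1
            state = "Idle" if count == n else "Test"
    return (acc << n) | shift_reg, trace


product, trace = shift_add_fsm(0x89, 0xAB)
print(hex(product))  # 0x5b83
```

Each multiplier bit costs either two state visits (Test, Shift&Count) or three (Test, Add, Shift&Count), so the run time of this controller depends on the number of ones in the multiplier.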
n-bit Multiplier: Q0 = 1: the multiplicand is added to register A and the result is stored in register A; then registers C, A, Q are shifted right by one bit. Q0 = 0: registers C, A, Q are shifted right by one bit.
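A register-level trace makes the rule concrete. In the sketch below the operand values (multiplicand 1011, multiplier 1101) are assumptions chosen for illustration, since the figures of the 4-bit example are not reproduced here; with this multiplier the order of add and shift steps matches the cycle titles of the example slides.

```python
def caq_multiply(m: int, q: int, n: int = 4):
    """Trace registers C (carry), A (accumulator) and Q (multiplier) over n cycles."""
    mask = (1 << n) - 1
    c, a = 0, 0
    steps = []
    for cycle in range(1, n + 1):
        if q & 1:                          # Q0 = 1: A <- A + M, carry-out goes to C
            total = a + m
            c, a = total >> n, total & mask
            steps.append((cycle, "add", c, a, q))
        # Shift C, A, Q right by one bit as a single (2n + 1)-bit register.
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
        steps.append((cycle, "shift", c, a, q))
    return (a << n) | q, steps


product, steps = caq_multiply(0b1011, 0b1101)
for cycle, op, c, a, q in steps:
    print(f"cycle {cycle} {op:5s}  C={c} A={a:04b} Q={q:04b}")
print(format(product, "08b"))  # 10001111
```

After the last shift the product sits in A (upper half) and Q (lower half), so no extra result register is needed.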
Example: 4-bit Multiplier Initial Values
Example: 4-bit Multiplier First Cycle--Add
Example: 4-bit Multiplier First Cycle--Shift
Example: 4-bit Multiplier Second Cycle--Shift
Example: 4-bit Multiplier Third Cycle--Add
Example: 4-bit Multiplier Third Cycle--Shift
Example: 4-bit Multiplier Fourth Cycle--Add
Example: 4-bit Multiplier Fourth Cycle--Shift
4*4 Synchronous Shift and Add Multiplier Design [Figures: layout design; floor plan of the 4*4 Synchronous Shift and Add Multiplier]
Comparison between Synchronous and Asynchronous Approaches
Example (simulated by Ovais Ahmed): Multiplicand = 10001001_2 = 89_16; Multiplier = 10101011_2 = AB_16; Expected Result = 101101110000011_2 = 5B83_16
Array Multiplier · Regular structure based on the add and shift algorithm. · Addition is mainly done by the carry-save algorithm. · Sign bit extension results in a higher capacitive load and slows down the circuit.
Addition with CLA
Array Multiplier with CSA
Critical Path with Array Multipliers [Figure: 4*4 array of FA and HA cells showing two of the possible critical paths for the ripple-carry based 4*4 multiplier]
Area = (N*N) AND gates + (N-1)N full adders
Delay = \tau_{HA} + (2N-1) \tau_{FA}
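The two expressions are easy to evaluate for different operand widths. The sketch below reads the delay expression as one half-adder delay plus (2N-1) full-adder delays; the function name and the delay values used in the call are placeholders, not measured figures.

```python
def array_multiplier_cost(n: int, tau_ha: float, tau_fa: float):
    """Evaluate the area and delay expressions for an N*N ripple-carry array multiplier.

    Returns (number of AND gates, number of full adders, critical-path delay).
    """
    and_gates = n * n
    full_adders = (n - 1) * n
    delay = tau_ha + (2 * n - 1) * tau_fa
    return and_gates, full_adders, delay


# Hypothetical unit delays: 1.0 for a half adder, 2.0 for a full adder.
print(array_multiplier_cost(4, 1.0, 2.0))  # (16, 12, 15.0)
```

The area grows quadratically with N while the delay grows only linearly, which is why the array multiplier stays attractive for small operand widths.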
Wallace Tree
Array Multiplier + Wallace Tree
Baugh-Wooley Algorithm: Convert negative partial products to positive representation. No sign extension required. 12/7/2018 Concordia VLSI Lab
Examples of 5-by-5 Baugh-Wooley
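The idea of replacing negative partial products by positive ones can be checked numerically. The sketch below uses the commonly taught "modified" Baugh-Wooley arrangement (complemented cross terms plus two constant ones); it is not necessarily the exact variant drawn on the slides, and the function name is invented for this example.

```python
def baugh_wooley(a: int, b: int, n: int = 5) -> int:
    """Multiply two n-bit two's-complement values using only non-negative partial products."""
    abit = [(a >> i) & 1 for i in range(n)]
    bbit = [(b >> j) & 1 for j in range(n)]
    total = 0
    # Ordinary AND terms that do not involve a sign bit.
    for i in range(n - 1):
        for j in range(n - 1):
            total += (abit[i] & bbit[j]) << (i + j)
    # Product of the two sign bits is positive.
    total += (abit[n - 1] & bbit[n - 1]) << (2 * n - 2)
    # Cross terms with one sign bit are negative; each is replaced by its complement.
    for i in range(n - 1):
        total += (1 - (abit[n - 1] & bbit[i])) << (i + n - 1)
        total += (1 - (abit[i] & bbit[n - 1])) << (i + n - 1)
    # Constant correction bits at positions n and 2n-1 compensate for the complementing.
    total += (1 << n) + (1 << (2 * n - 1))
    total &= (1 << (2 * n)) - 1            # keep 2n result bits
    # Interpret the 2n-bit pattern as a two's-complement number.
    return total - (1 << (2 * n)) if total >> (2 * n - 1) else total


print(baugh_wooley(-7, 5))    # -35
print(baugh_wooley(-16, -16)) # 256
```

Every row that is added is non-negative, so no row needs sign extension; the price is the two constant correction bits.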
Squarer using Baugh-Wooley Algorithm
Partial products of A * A with A = a7 ... a0 (each row is shifted one bit position to the left of the row above it):
a7*a0 a6*a0 a5*a0 a4*a0 a3*a0 a2*a0 a1*a0 a0*a0
a7*a1 a6*a1 a5*a1 a4*a1 a3*a1 a2*a1 a1*a1 a0*a1
a7*a2 a6*a2 a5*a2 a4*a2 a3*a2 a2*a2 a1*a2 a0*a2
a7*a3 a6*a3 a5*a3 a4*a3 a3*a3 a2*a3 a1*a3 a0*a3
a7*a4 a6*a4 a5*a4 a4*a4 a3*a4 a2*a4 a1*a4 a0*a4
a7*a5 a6*a5 a5*a5 a4*a5 a3*a5 a2*a5 a1*a5 a0*a5
a7*a6 a6*a6 a5*a6 a4*a6 a3*a6 a2*a6 a1*a6 a0*a6
a7*a7 a6*a7 a5*a7 a4*a7 a3*a7 a2*a7 a1*a7 a0*a7
-----------------------------------------------
‘0’ S15 S14 S13 S12 S11 S10 S9 S8 S7 S6 S5 S4 S3 S2 S1 S0
Example of an 8bit squarer
Array Multiplier 32bits by 32bits multiplier
Booth (Radix-4) Multiplier
· Radix-4 (3-bit recoding) reduces the number of partial products to be added by half.
· Great saving in area and increased speed.
A = -a_{n-1} 2^{n-1} + a_{n-2} 2^{n-2} + a_{n-3} 2^{n-3} + \dots + a_1 2 + a_0
B = -b_{n-1} 2^{n-1} + b_{n-2} 2^{n-2} + b_{n-3} 2^{n-3} + \dots + b_1 2 + b_0
· The base-4 redundant signed-digit representation of B is
B = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i
· K_i is calculated by the following equation:
K_i = -2 b_{2i+1} + b_{2i} + b_{2i-1},   i = 0, 1, 2, \dots, (n-2)/2
· 3 bits of the multiplier B, b_{2i+1}, b_{2i}, b_{2i-1}, are examined and the corresponding K_i is calculated.
· B is always appended on the right with a zero (b_{-1} = 0), and n is always even (B is sign extended if needed).
· The product AB is then obtained by adding n/2 partial products:
AB = P = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i A
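The recoding rule can be tried out directly. The following sketch computes the digits K_i from the three-bit windows and then forms the product as the weighted sum of n/2 partial products; the function names are invented for this illustration, and Python integers stand in for the n-bit two's-complement words.

```python
def booth_radix4_digits(b: int, n: int):
    """Return [K_0, ..., K_{n/2-1}] for an n-bit two's-complement multiplier b (n even)."""
    if n % 2:
        raise ValueError("n must be even; sign extend B by one bit first")

    def bit(k: int) -> int:
        return 0 if k < 0 else (b >> k) & 1    # b_{-1} = 0

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n // 2)]


def booth_multiply(a: int, b: int, n: int) -> int:
    """Sum the n/2 partial products K_i * A, each weighted by 2**(2i)."""
    return sum((k * a) << (2 * i) for i, k in enumerate(booth_radix4_digits(b, n)))


print(booth_radix4_digits(0b011101, 6))  # [1, -1, 2]
print(booth_multiply(3, 0b011101, 6))    # 87
```

Each digit is one of 0, +1, -1, +2, -2, so every partial product is obtained from A by at most a one-bit shift and a negation.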
Booth Algorithm: Decoding of the multiplier to generate signals for hardware use. [Table: inputs Xi+1, Xi, Xi-1; outputs OP, NEG, ZERO, TWO]
Booth Algorithm
A Booth recoded multiplier examines three bits of the multiplier at a time.
It determines whether to add zero, 1, -1, 2, or -2 times the multiplicand at that rank.
The operation to be performed is based on the current two bits of the multiplier and the previous bit.
Xi+1  Xi  Xi-1   Zi/2
 0    0    0      0
 0    0    1      1
 0    1    0      1
 0    1    1      2
 1    0    0     -2
 1    0    1     -1
 1    1    0     -1
 1    1    1      0
BIT (weights 2^1, 2^0, 2^-1)
Xi  Xi+1  Xi+2   OPERATION                                                M is multiplied by
0   0     0      add zero (no string)                                     +0
0   0     1      add multiplicand (end of string)                         +X
0   1     0      add multiplicand (a string)                              +X
0   1     1      add twice the multiplicand (end of string)               +2X
1   0     0      subtract twice the multiplicand (beginning of string)    -2X
1   0     1      subtract the multiplicand (-2X and +X)                   -X
1   1     0      subtract the multiplicand (beginning of string)          -X
1   1     1      subtract zero (center of string)                         -0
Booth Algorithm: dot notation
Multiplicand A =                    ● ● ● ●
Multiplier B =                      (●●)(●●)
Partial product bits                ● ● ● ●      (B1B0)_2 A 4^0
Partial product bits            ● ● ● ●          (B3B2)_2 A 4^1
Product P =                     ● ● ● ● ● ● ● ●
Example
The following example shows how the calculation is carried out.
Multiplicand X = 000011
Multiplier Y = 011101
Y with a 0 added on the right: 0 1 1 1 0 1 0
After Booth decoding, Y is decoded so as to multiply X by +2, -1, +1 (from the most significant group down). Each partial product is shifted two bits to the left of the previous one, and they are added together.
X * +1   000000000011
X * -1   1111111101
X * +2   00000110
---------------------
         000001010111
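The rows of this example can be regenerated as full-width two's-complement words, which makes the sign extension of the negative row visible. The sketch assumes a 12-bit result width, as in the example; the helper name is invented, and the digits are passed in least significant group first.

```python
def booth_rows(x: int, digits, width: int):
    """Return each shifted partial product and their sum as width-bit binary strings."""
    mask = (1 << width) - 1
    # Masking a negative Python integer yields its two's-complement bit pattern.
    rows = [((k * x) << (2 * i)) & mask for i, k in enumerate(digits)]
    total = sum(rows) & mask
    return [format(r, f"0{width}b") for r in rows], format(total, f"0{width}b")


rows, total = booth_rows(0b000011, [+1, -1, +2], 12)
for row in rows:
    print(row)
print(total)
# 000000000011
# 111111110100
# 000001100000
# 000001010111
```

The second row is the slide's 1111111101 followed by the two zeros introduced by the shift; the leading ones are the sign extension that the template on a later slide is meant to avoid.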
Sign Extension
Sign extension
Traditional sign-extension scheme:
· Segment the input operands based on the size of the embedded blocks.
· Multiply the segmented inputs and extend the sign bit of each partial product.
· Sum all partial products.
[Figure: segmented input operands, sign extension, partial products, final result]
Booth Algorithm Example 1
Booth Algorithm Example 2 (notice the sign extensions)
Booth Algorithm Example 3 (notice the sign extensions)
Comparison of Booth and parallel multiplier shift and Add
Template to reduce sign extensions for Booth Algorithm. Please note that each operand is 17 bits, i.e. the 17th bit is the sign bit. Also, negative numbers are entered in 1's complement, which is why the S has to be added on the right-hand side of the diagram. If 2's complement is used, the S's on the right side of the diagram can be removed.
Comparison of Template and the sign extension
Example of using the template: 25 * -35, with -35 as the multiplier, using 8-bit representation
Multiplicand 25:               0 0 0 1 1 0 0 1      (sign bit 0)
Multiplier -35, 0 appended:    1 1 0 1 1 1 0 1 0
      1 0 0 0 0 0 1 1 0 0 1    * 1     (add inverted S, then S S)
    1 0 1 1 1 0 0 1 1 1        * -1    (add inverted sign and add 1)
  1 0 0 1 1 0 0 1 0            * 2     (add inverted sign bit)
  1 1 0 0 1 1 1                * -1    (no sign bit)
---------------------------
1 1 1 1 0 0 1 0 0 1 0 1 0 1    This is a negative number. Convert it:
0 0 0 0 1 1 0 1 1 0 1 0 1 1    = 512 + 256 + 64 + 32 + 8 + 2 + 1 = 875, so the product is -875
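The template rows can be summed numerically as a sanity check. The bit strings below are the four rows as read from the slide, each paired with its left shift of 0, 2, 4 and 6 positions; treating the result as a 14-bit two's-complement word is an assumption based on the width of the sum shown on the slide.

```python
# (row bits as read from the slide, left shift in bit positions)
template_rows = [
    ("10000011001", 0),   # * 1
    ("1011100111", 2),    # * -1
    ("100110010", 4),     # * 2
    ("1100111", 6),       # * -1
]

WIDTH = 14
total = sum(int(bits, 2) << shift for bits, shift in template_rows)
total &= (1 << WIDTH) - 1                        # keep the 14 result bits
signed = total - (1 << WIDTH) if total >> (WIDTH - 1) else total

print(format(total, f"0{WIDTH}b"), signed)       # 11110010010101 -875
```

The short prefix bits on each row replace the long runs of sign-extension bits, yet the column sum still gives 25 * -35 = -875.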
Booth Multiplier Components: Multiplicand, Booth Encoder, PPU (Partial Products Unit), PPA (Partial Products Adding unit), Product
Wallace Tree and Ripple Carry Adder Structure of an 8*8 Multiplier with Pipeline
Hardware implementation of Booth with shift and add
Simulation Plan
Testing the Design
Simulation For Parallel Multipliers Signed Number: Unsigned Number:
Simulation For Signed S/P Multipliers: There is a 340 ns delay between the operands and the result because of the D flip-flop delays.
FPGA after implementation, areas of programming shown clearly
Another implementation of the above after pipelining; place and route has placed the design in different places.
Spartacus FPGA board
Testing the multiplication system
Comparison of Multipliers
Table 7. Performance comparison for two's complement multipliers. By Chen Yaoquan, M.Eng. 2005
Metric | Array Multiplier | Modified Booth Multiplier | Wallace-Tree Multiplier | Modified Booth-Wallace Tree Multiplier | Twin Pipe Serial-Parallel Multiplier | Behavioral Multiplier
Area, Total CLBs (#) | 3076.50 | 2649.50 | 3325.50 | 2672.50 | 490.00 | 2993.50
Maximum Delay D (ns) | 35.78 | 24.43 | 18.93 | 18.53 | 107.52 (3.36x32) | 49.33
Total Dynamic Power P (W) | 7.52 | 6.33 | 7.46 | 6.41 | 0.28 | 6.24
Delay·Power Product (DP) (ns W) | 268.98 | 154.64 | 141.14 | 118.76 | 30.62 | 307.58
Area·Power Product (AP) (# W) | 23128.20 | 16771.60 | 24793.93 | 17127.79 | 139.54 | 18665.07
Area·Delay Product (AD) (# ns) | 1.10E+05 | 6.47E+04 | 6.30E+04 | 4.95E+04 | 5.27E+04 | 1.48E+05
Area·Delay^2 Product (AD^2) (# ns^2) | 3.94E+06 | 1.58E+06 | 1.19E+06 | 9.18E+05 | 5.66E+06 | 7.28E+06
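The four product rows of the table are derived from the first three rows. A small sketch, using the Modified Booth-Wallace Tree column as input, shows how they are formed; the function name is invented, and small differences from the tabulated values are presumably due to rounding of the inputs.

```python
def figures_of_merit(area_clb: float, delay_ns: float, power_w: float) -> dict:
    """Compute the derived comparison metrics from area, delay and power."""
    return {
        "DP": delay_ns * power_w,          # delay-power product (ns W)
        "AP": area_clb * power_w,          # area-power product (# W)
        "AD": area_clb * delay_ns,         # area-delay product (# ns)
        "AD2": area_clb * delay_ns ** 2,   # area-delay^2 product (# ns^2)
    }


# Modified Booth-Wallace Tree column of the two's complement table.
m = figures_of_merit(2672.50, 18.53, 6.41)
for name, value in m.items():
    print(f"{name}: {value:.4g}")
```

AD^2 weights delay more heavily than area, which is why the serial-parallel design, smallest by far, still ranks poorly on that metric.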
Comparison of Multipliers
Table 7. Performance comparison for unsigned multipliers. By Chen Yaoquan, M.Eng. 2005
Metric | Array Multiplier | Modified Booth Multiplier | Wallace-Tree Multiplier | Modified Booth-Wallace Tree Multiplier | Twin Pipe Serial-Parallel Multiplier | Behavioral Multiplier
Area, Total CLBs (#) | 3280.50 | 2800.00 | 3321.50 | 2845.50 | 487.00 | 3003.00
Maximum Delay D (ns) | 37.23 | 25.33 | 18.93 | 18.33 | 107.52 | 44.50
Total Dynamic Power P (W) | 7.57 | 6.66 | 7.32 | (missing) | 0.29 | 6.26
Delay·Power Product (DP) (ns W) | 281.88 | 168.77 | 138.60 | 122.13 | 30.66 | 278.53
Area·Power Product (AP) (# W) | 24837.98 | 18656.40 | 24319.36 | 18959.57 | 138.89 | 18795.78
Area·Delay Product (AD) (# ns) | 1.22E+05 | 7.09E+04 | 6.29E+04 | 5.22E+04 | 5.24E+04 | 1.34E+05
Area·Delay^2 Product (AD^2) (# ns^2) | 4.55E+06 | 1.80E+06 | 1.19E+06 | 9.56E+05 | 5.63E+06 | 5.95E+06
Comparison of Multipliers
Effect of changing the value of “set_max_delay” in the script file:
set_max_delay (ns) | 10 | 20 | 30 | 40 | 50 | 60 | >60
Area (#) | 3014.5 | 3013.0 | 3110.0 | 3193.5 | 3019.5 | 2999.5 | 2978.5
Power (W) | 6.6499 | 6.6470 | 7.5683 | 8.1878 | 8.0645 | 8.0419 | 8.0156
Delay (ns) | 31.98 | 30.93 | 30.08 | 39.93 | 49.88 | 59.63 |
The relation of area and delay for the behavioral multiplier: the "banana curve"
Comparison of Multipliers
Multipliers compared: Array Multiplier; Modified Booth Multiplier; Wallace-Tree Multiplier; Modified Booth-Wallace Tree Multiplier; Twin Pipe Serial-Parallel Multiplier; Behavioral Multiplier
Area: Medium; Small; Large; Smallest
Critical Delay: Fast; Very Fast; Fastest; Very Large
Power Consumption:
Complexity: Simple; Complex; More Complex; Simplest
Implement: Easy; Difficult; Easiest
By Chen Yaoquan, M.Eng. 2005
Pipelining Simulation
Synthesis for Signed Multipliers [Figures: Array; Modified Booth; Wallace Tree; Modified Booth-Wallace Tree; Twin Pipe S/P; Behavioral]
Synthesis for Unsigned Multipliers [Figures: Array; Modified Booth; Wallace Tree; Modified Booth-Wallace Tree; Twin Pipe S/P; Behavioral]
Conclusion
· Modified Booth and Wallace Tree are the best techniques for high-speed multiplication.
· Wallace Tree has the best performance, but it is hard to implement.
· Booth-algorithm-based multipliers have a lower area than the other parallel multipliers.
· For behavioral multipliers, the area increases as the delay decreases.
Comparison
Metric | Array Multiplier | Modified Booth Multiplier | Wallace Tree Multiplier | Modified Booth & Wallace Tree Multiplier | Twin Pipe Serial-Parallel Multiplier
Area, Total CLBs (#) | 1165 | 1292 | 1659 | 1239 | 133
Maximum Delay (ns) | 187.87 | 139.41 | 101.14 | 101.43 | 22.58 (722.56)
Power Consumption at highest speed (mW) | 16.6506 (at 188 ns) | 23.136 (at 140 ns) | 30.95 (at 101.14 ns) | 30.862 (at 101.43 ns) | 2.089 (at 722.56 ns)
Delay Power Product (DP) (ns mW) | 3128.15 | 3225.39 | 3130.28 | 3130.33 | 1509.42
Area Power Product (AP) (# mW) | 19.397 x 10^3 | 29.891 x 10^3 | 51.346 x 10^3 | 38.238 x 10^3 | 277.837
Area Delay Product (AD) (# ns) | 218.868 x 10^3 | 180.118 x 10^3 | 167.791 x 10^3 | 125.671 x 10^3 | 96.101 x 10^3
Area Delay^2 Product (AD^2) (# ns^2) | 41.119 x 10^6 | 25.110 x 10^6 | 16.970 x 10^6 | 12.747 x 10^6 | 69.438 x 10^6
NOTICE · The rest of these slides are for extra information only and are not part of the lecture
Array Addition
Addition of 8 binary numbers using the Wallace tree principle
Baugh-Wooley two's complement multiplier:
Cluster Multipliers Divide the multiplier into smaller multipliers
Cluster Multipliers The circuit used to generate the enable signal 8-bit cluster low power multiplier
Cluster Multipliers · Divide the multiplication circuit into clusters (blocks) of smaller multipliers. · Apply clock-gating techniques to disable the blocks that are producing a zero result. Features: low power (claims 13.4% savings).
Multiplexer-Based Array Multipliers Z j xjyj
Multiplexer-Based Array Multipliers. Two types of cells: Cell 1 produces the terms Zij 2^j and includes a full adder of the carry-save adder array; Cell 2 produces the terms xjyj 2^j and includes a full adder of the carry-save adder array.
Multiplexer-Based Array Multipliers Characteristics · Faster than Modified Booth · Unlike Booth, does not require encoding logic · Requires approximately N^2/2 cells · Has a zigzag shape, thus not layout-friendly
Multiplexer-Based Array Multipliers Improvement More rectangular layout Save up to 40 percent area without penalties Outperforms the modified Booth multiplier in both speed and power by 13% to 26%
Gray-Encoded Array Multiplier
2's complement value (Dec) and its hybrid coding (Hyb):
Dec  Hyb    Dec  Hyb    Dec  Hyb    Dec  Hyb
 0   0000    4   0100   -8   1100   -4   1000
 1   0001    5   0101   -7   1101   -3   1001
 2   0011    6   0111   -6   1111   -2   1011
 3   0010    7   0110   -5   1110   -1   1010
· Having a single bit different for consecutive values
· Reducing the number of transitions, and thus power (for highly correlated streams)
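The table entries are consistent with Gray coding each 2-bit (radix-4) group of the 2's complement word separately. That rule is inferred from the table rather than stated on the slide, so the sketch below should be read as one reading of the hybrid code; the function name is invented.

```python
def hybrid_code(value: int, n: int = 4) -> str:
    """Gray-code every 2-bit group of the n-bit two's-complement word of value."""
    word = value & ((1 << n) - 1)           # two's-complement bit pattern
    out = 0
    for k in range(0, n, 2):
        d = (word >> k) & 0b11              # one radix-4 digit
        out |= (d ^ (d >> 1)) << k          # 2-bit binary-to-Gray conversion
    return format(out, f"0{n}b")


for v in (0, 1, 2, 3, 4, 7, -8, -4, -1):
    print(f"{v:3d} -> {hybrid_code(v)}")
```

Under this reading, values that differ only in their lowest radix-4 digit by one step differ in a single bit of the hybrid code.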
Gray-Encoded Array Multiplier An 8-bit wide 2’s complement radix-4 array multiplier
Gray-Encoded Array Multiplier Characteristics · Uses Gray code to reduce the switching activity of the multiplier · Uses 45.6% less power than Modified Booth · Uses 26.4% more area than Modified Booth
Ultra-high Speed Parallel Multiplier. How is ultra-high speed achieved? · Based on the Modified Booth Algorithm and a tree structure (column compression) · Chooses efficient counters (3:2 and 5:3) · Uses the new compressor (20% faster) · Uses the First Partial product Addition (FPA) algorithm (reducing the bits of the CLA by 50%)
Ultra-high Speed Parallel Multiplier · Divide into 3 rows or 5 rows only (most efficient). · Calculate the partial products as soon as possible. · The final CLA is only 16-bit instead of 32-bit. [Figure: calculation process using parallel counters in the 16x16 case] Overall, the delay is reduced by about 30%.
ULLRLF Multiplier ULLRLF stands for Upper/Lower Left-to-Right Leapfrog. Combine the following techniques: Signal flow optimization in [3:2] adder array for partial product reduction, Left-to-right leapfrog (LRLF) signal flow, Splitting of the reduction array into upper/lower parts.
ULLRLF Multiplier 1) Signal flow optimization in [3:2] adder array: PPij is always connected to pin A; Sin/Cin are connected to B/C, and most Sin signals are connected to C. For n = 32, the delay is reduced by 30 percent, and power is saved as well.
ULLRLF Multiplier 2) Left-to-Right Leapfrog (LRLF) Structure: The sum signals skip over alternate rows. The signal delays are better balanced. Low power.
ULLRLF Multiplier 3) Upper/Lower Split Structure: Only n+2 bits. The long data path is broken into parallel short paths, which saves power. The delay of the partial product reduction is reduced.
ULLRLF Multiplier: ULLRLF multipliers consume less power than optimized tree multipliers for n ≤ 32 while keeping similar delay and area. With more regularity and inherently shorter interconnects, the ULLRLF structure presents a competitive alternative to tree structures. [Figure: floorplan of ULLRLF (n = 32)]
Signed Array Multiplier
Unsigned Array Multiplier
Signed Modified Booth Multiplier
Signed Modified Booth Multiplier
Unsigned Modified Booth Multiplier
Unsigned Modified Booth Multiplier
Wallace Tree multipliers
Wallace Tree multipliers · Use 3:2 counters and 2:2 counters · Number of levels = log(32/2) / log(3/2) ≈ 8 · Irregular structure · Fast
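The number of levels can also be counted stage by stage instead of using the logarithmic estimate. The sketch below assumes that every level groups the rows in threes, that each 3:2 counter row turns three rows into two, and that leftover rows pass through unchanged; the function name is invented. The second call uses 16 rows, the number of partial products left when radix-4 Booth recoding halves the 32 rows.

```python
import math


def wallace_levels(rows: int) -> int:
    """Count 3:2 reduction levels needed to bring `rows` operands down to two."""
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3   # each full group of 3 rows becomes 2
        levels += 1
    return levels


print(wallace_levels(32), wallace_levels(16))           # 8 6
print(round(math.log(32 / 2) / math.log(3 / 2), 2))     # 6.84
```

Evaluated literally, the logarithmic expression gives about 6.8 for 32 rows; it is a lower-bound estimate because the row count is not always divisible by three, and the stage-by-stage count gives the 8 levels quoted on the slide.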
Wallace Tree multipliers 2-level hierarchical
Modified Booth-Wallace Tree Multipliers
Modified Booth-Wallace Tree Multipliers · Use 3:2 counters and 2:2 counters · Number of levels = log(16/2) / log(3/2) ≈ 6 · Irregular structure · Fast · Less area
Twin pipe serial-parallel multipliers
Signed twin pipe serial-parallel multipliers “Sign” control line and the sign-change hardware
Unsigned twin pipe serial-parallel multipliers Don’t need the “Sign” control line and the sign-change hardware