1
Integer Multipliers
2
Multipliers: a must-have circuit in most DSP applications
A variety of multipliers exist and can be chosen based on their performance: Serial, Serial/Parallel, Shift and Add, Array, Booth, Wallace Tree, ...
3
16x16 multiplier
[Block diagram: 16x16 multiplier with converter blocks, registers RA, RB, and RC, and a reset input.]
4
Multiplication Algorithm
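$X = X_{n-1} X_{n-2} \ldots X_0$ (multiplicand)
$Y = Y_{n-1} Y_{n-2} \ldots Y_0$ (multiplier)
Partial products (each row is shifted one bit position to the left of the row above it):
$$
\begin{array}{ccccc}
Y_{n-1}X_0 & Y_{n-2}X_0 & \cdots & Y_1X_0 & Y_0X_0 \\
Y_{n-1}X_1 & Y_{n-2}X_1 & \cdots & Y_1X_1 & Y_0X_1 \\
Y_{n-1}X_2 & Y_{n-2}X_2 & \cdots & Y_1X_2 & Y_0X_2 \\
\vdots & \vdots & & \vdots & \vdots \\
Y_{n-1}X_{n-2} & Y_{n-2}X_{n-2} & \cdots & Y_1X_{n-2} & Y_0X_{n-2} \\
Y_{n-1}X_{n-1} & Y_{n-2}X_{n-1} & \cdots & Y_1X_{n-1} & Y_0X_{n-1}
\end{array}
$$
Product: $P = P_{2n-1} P_{2n-2} \ldots P_1 P_0$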
5
1. Multiplication Algorithms
Implementing the multiplication of binary numbers boils down to how the additions are done. Consider two 8-bit numbers A and B that generate the 16-bit product P: first generate the 64 partial products, then add them up.
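The generate-then-add view can be sketched in a few lines of Python. This is an illustrative model only, not a hardware description; the function names `partial_products` and `sum_partial_products` are made up for this sketch.

```python
def partial_products(a: int, b: int, n: int = 8) -> list[list[int]]:
    """Return the n x n matrix of partial-product bits pp[j][i] = a_i AND b_j."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]
    return [[a_bits[i] & b_bits[j] for i in range(n)] for j in range(n)]


def sum_partial_products(pp: list[list[int]]) -> int:
    """Add every partial-product bit with its column weight 2**(i + j)."""
    return sum(bit << (i + j)
               for j, row in enumerate(pp)
               for i, bit in enumerate(row))


pp = partial_products(0x89, 0xAB)      # two 8-bit operands
product = sum_partial_products(pp)     # 16-bit product
```

For 8-bit operands the matrix holds 8 x 8 = 64 bits. The multiplier architectures in the rest of the deck differ mainly in how this summation is organized.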
6
Multiplier Design
[Block diagram: input register (REG IN), multiplier unit (MU), output register (REG OUT), storage, and control unit.]
7
Serial Multiplier
X: x3 x2 x1 x0
Y: y3 y2 y1 y0
Input sequences for G1:
X input: 00x3x2x1x0 0x3x2x1x0 0x3x2x1x0 0x3x2x1x0
Y input: 00y3y3y3y3 0y2y2y2y2 0y1y1y1y1 0y0y0y0y0
Reset: (waveform shown on the slides)
Si: the ith bit of the final result
Ci: the only carry from column i
Sij: the jth partial sum for column i
Cij: the jth partial carry from column i
[Animation, slides 7-28: the input sequences above are stepped through the serial multiplier one clock cycle at a time.]
29
Serial / Parallel Multiplier
Si: the ith bit of the final result
Ci: the only carry from column i
Sij: the jth partial sum for column i
Cij: the jth partial carry from column i
[Animation, slides 29-36: the serial/parallel multiplier is stepped through one clock cycle at a time.]
37
Shift and Add Multiplier
[Datapath diagram: inputs Ain (7 downto 0) and Bin (7 downto 0); registers REGA, REGB, and REGC; a MUX and an 8-bit adder; CLOCK; outputs Result (15 downto 8) and Result (7 downto 0).]
38
Synchronous Shift and Add Multiplier controller
Multiplication process: 5 states: Idle, Init, Test, Add, and Shift&Count.
Idle: The process starts when the Start signal is received.
Init: The multiplicand and the multiplier are loaded into a load register and a shift register, respectively.
Test: The LSB of the shift register that holds the multiplier is tested to decide the next state.
39
Synchronous Shift and Add Multiplier Controller Design
Add: If the LSB is '1', the new partial product is added to the accumulated result, and the state machine moves to the Shift&Count state.
Shift&Count: Entered directly from Test if the LSB is '0', or after Add otherwise. The two shift registers shift their contents one bit to the right, and the counter counts up by one. The state machine then returns to the Test state.
When the counter reaches N, a Stop signal is asserted and the state machine goes to the Idle state.
Idle: In the Idle state, a Done signal is asserted to indicate the end of the multiplication.
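The five-state controller can be modelled behaviourally in Python. This is a minimal sketch under stated assumptions: the names `State` and `shift_add_fsm` are hypothetical, the accumulator and the multiplier shift register are treated as one register pair that shifts right, and Start is pulsed once per run.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    INIT = auto()
    TEST = auto()
    ADD = auto()
    SHIFT_COUNT = auto()


def shift_add_fsm(multiplicand: int, multiplier: int, n: int = 8):
    """Step the five-state controller; return (product, list of visited states)."""
    mask = (1 << n) - 1
    state, trace = State.IDLE, []
    load = shift = acc = count = 0
    start, done = True, False              # Start is pulsed once for this run
    while not done:
        trace.append(state)
        if state is State.IDLE:
            if start:
                start, state = False, State.INIT
            else:
                done = True                # Done is asserted back in Idle
        elif state is State.INIT:
            load, shift = multiplicand & mask, multiplier & mask
            acc = count = 0
            state = State.TEST
        elif state is State.TEST:          # test the LSB of the multiplier
            state = State.ADD if shift & 1 else State.SHIFT_COUNT
        elif state is State.ADD:
            acc += load                    # may briefly need n + 1 bits
            state = State.SHIFT_COUNT
        else:                              # SHIFT_COUNT
            shift = ((acc & 1) << (n - 1)) | (shift >> 1)
            acc >>= 1
            count += 1
            state = State.IDLE if count == n else State.TEST   # Stop at N
    return (acc << n) | shift, trace
```

Calling `shift_add_fsm(0x89, 0xAB)` returns the 16-bit product together with the sequence of states, which makes it easy to see that a '1' multiplier bit costs one extra Add state.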
40
n-bit Multiplier:
Q0 = 1: The multiplicand is added to register A and the result is stored in register A; registers C, A, Q are then shifted one bit to the right.
Q0 = 0: Registers C, A, Q are shifted one bit to the right.
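A register-level trace of the C, A, Q scheme can be sketched as follows. The function name `caq_multiply` is hypothetical, and the operands in the usage note are an assumption for illustration, not values taken from the slides.

```python
def caq_multiply(multiplicand: int, multiplier: int, n: int = 4):
    """Return (product, trace); trace holds (C, A, Q) after every add and shift."""
    mask = (1 << n) - 1
    m = multiplicand & mask
    c, a, q = 0, 0, multiplier & mask
    trace = [(c, a, q)]                      # initial values
    for _ in range(n):
        if q & 1:                            # Q0 = 1: A <- A + M, carry goes to C
            total = a + m
            c, a = total >> n, total & mask
            trace.append((c, a, q))
        # shift C, A, Q one bit to the right as a single long register
        q = ((a & 1) << (n - 1)) | (q >> 1)
        a = (c << (n - 1)) | (a >> 1)
        c = 0
        trace.append((c, a, q))
    return (a << n) | q, trace
```

With the assumed operands `caq_multiply(0b1011, 0b1101)`, the trace has an add in the first, third, and fourth cycles and only a shift in the second, and the product ends up in the A and Q registers.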
41
Example: 4-bit Multiplier
Steps shown on slides 41-48:
· Initial values
· First cycle: add
· First cycle: shift
· Second cycle: shift
· Third cycle: add
· Third cycle: shift
· Fourth cycle: add
· Fourth cycle: shift
49
4*4 Synchronous Shift and Add Multiplier Design Layout Design
Floor plan of the 4*4 Synchronous Shift and Add Multiplier
50
Comparison between Synchronous and Asynchronous Approaches
51
Example : (simulated by Ovais Ahmed)
Multiplicand = $89_{16}$
Multiplier = $\mathrm{AB}_{16}$
Expected result = $\mathrm{5B83}_{16}$
52
Array Multiplier
· Regular structure based on the add-and-shift algorithm.
· Addition is mainly done with the carry-save algorithm.
· Sign-bit extension results in a higher capacitive load and slows down the circuit.
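The carry-save idea behind the array can be illustrated with a bit-level Python model for unsigned operands. It is a behavioural sketch, not the slide's circuit: every row is modelled at the full 2n-bit width for simplicity, and the names `full_adder` and `csa_array_multiply` are made up here.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def csa_array_multiply(x: int, y: int, n: int = 4) -> int:
    """Unsigned n x n multiply using one carry-save adder row per multiplier bit."""
    width = 2 * n
    sum_bits = [0] * width
    carry_bits = [0] * width               # carry_bits[k] has weight 2**k
    for j in range(n):
        row = [0] * width                  # partial product row j, shifted by j
        for i in range(n):
            row[i + j] = ((x >> i) & 1) & ((y >> j) & 1)
        new_sum, new_carry = [0] * width, [0] * width
        for k in range(width):
            s, c = full_adder(sum_bits[k], carry_bits[k], row[k])
            new_sum[k] = s
            if k + 1 < width:
                new_carry[k + 1] = c       # the carry is saved, not propagated
        sum_bits, carry_bits = new_sum, new_carry
    # final vector-merging adder: only here do the carries ripple
    result, carry = 0, 0
    for k in range(width):
        s, carry = full_adder(sum_bits[k], carry_bits[k], carry)
        result |= s << k
    return result
```

Inside the array no carry travels along a row; carries are only resolved in the last adder, which is why that adder is usually the part worth speeding up.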
53
Addition with CLA
54
Array Multiplier with CSA
55
Critical Path with Array Multipliers
[Figure: 4*4 ripple-carry array built from three rows of FA FA FA HA cells, showing two of the possible critical paths.]
Area = (N*N) AND gates + (N-1)N full adders
$\text{Delay} = \tau_{HA} + (2N-1)\,\tau_{FA}$
57
Wallace Tree
58
Array Multiplier + Wallace Tree
59
Baugh-Wooley Algorithm
· Convert negative partial products to a positive representation
· No sign extension required
12/7/2018 Concordia VLSI Lab
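One common way to realize this conversion is sketched below in Python. It assumes the widely used "modified" Baugh-Wooley arrangement, in which the partial products that involve exactly one sign bit are complemented and two constant ones are added; the slides' exact arrangement is not visible in this transcript, and the function name `baugh_wooley` is hypothetical.

```python
def baugh_wooley(a: int, b: int, n: int = 5) -> int:
    """Multiply two n-bit two's complement integers; return the signed product."""
    # Python's >> on negative ints is arithmetic, so these are two's complement bits
    abit = [(a >> i) & 1 for i in range(n)]
    bbit = [(b >> j) & 1 for j in range(n)]
    total = 0
    for i in range(n - 1):
        for j in range(n - 1):
            total += (abit[i] & bbit[j]) << (i + j)        # ordinary positive terms
    total += (abit[n - 1] & bbit[n - 1]) << (2 * n - 2)    # sign x sign is positive
    for i in range(n - 1):
        # terms with one sign bit are complemented instead of subtracted
        total += (1 - (abit[i] & bbit[n - 1])) << (i + n - 1)
        total += (1 - (abit[n - 1] & bbit[i])) << (i + n - 1)
    total += (1 << n) + (1 << (2 * n - 1))                 # correction constants
    total &= (1 << (2 * n)) - 1                            # keep 2n bits
    # reinterpret the 2n-bit pattern as a signed value
    return total - (1 << (2 * n)) if total >> (2 * n - 1) else total
```

Every term that is added is non-negative, so the array needs neither subtractors nor sign extension; the two constants absorb the offset introduced by complementing.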
60
Examples of 5-by-5 Baugh-Wooley
61
Squarer using Baugh-Wooley Algorithm
Partial products of a * a (each row is shifted one bit position to the left of the row above it):
a7*a0 a6*a0 a5*a0 a4*a0 a3*a0 a2*a0 a1*a0 a0*a0
a7*a1 a6*a1 a5*a1 a4*a1 a3*a1 a2*a1 a1*a1 a0*a1
a7*a2 a6*a2 a5*a2 a4*a2 a3*a2 a2*a2 a1*a2 a0*a2
a7*a3 a6*a3 a5*a3 a4*a3 a3*a3 a2*a3 a1*a3 a0*a3
a7*a4 a6*a4 a5*a4 a4*a4 a3*a4 a2*a4 a1*a4 a0*a4
a7*a5 a6*a5 a5*a5 a4*a5 a3*a5 a2*a5 a1*a5 a0*a5
a7*a6 a6*a6 a5*a6 a4*a6 a3*a6 a2*a6 a1*a6 a0*a6
a7*a7 a6*a7 a5*a7 a4*a7 a3*a7 a2*a7 a1*a7 a0*a7
Result: '0' S15, S14 S13 S12 S11 S10 S9 S8 S7 S6 S5 S4 S3 S2 S1 S0
62
Example of an 8bit squarer
63
Array Multiplier: 32-bit by 32-bit multiplier
64
Booth (Radix-4) Multiplier
· Radix-4 (3-bit recoding) reduces the number of partial products to be added by half.
· Great saving in area and increased speed.
$A = -a_{n-1}2^{n-1} + a_{n-2}2^{n-2} + a_{n-3}2^{n-3} + \cdots + a_1 2 + a_0$
$B = -b_{n-1}2^{n-1} + b_{n-2}2^{n-2} + b_{n-3}2^{n-3} + \cdots + b_1 2 + b_0$
· The base-4 redundant signed-digit representation of B is
$B = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i$
65
· $K_i$ is calculated by the following equation:
$K_i = -2b_{2i+1} + b_{2i} + b_{2i-1}, \quad i = 0, 1, 2, \ldots, (n-2)/2$
· Three bits of the multiplier B, $b_{2i+1}$, $b_{2i}$, $b_{2i-1}$, are examined and the corresponding $K_i$ is calculated.
· B is always appended on the right with a zero ($b_{-1} = 0$), and n is always even (B is sign-extended if needed).
· The product AB is then obtained by adding n/2 partial products:
$AB = P = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i A$
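The recoding rule and the product sum translate directly into Python. This is a sketch of the arithmetic only; the function names `booth_digits` and `booth_multiply` are made up for illustration.

```python
def booth_digits(b: int, n: int) -> list[int]:
    """Radix-4 Booth digits K_i of an n-bit two's complement multiplier b (n even)."""
    def bit(k: int) -> int:
        return 0 if k < 0 else (b >> k) & 1      # b_{-1} = 0 is appended on the right

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
            for i in range(n // 2)]


def booth_multiply(a: int, b: int, n: int = 8) -> int:
    """Add the n/2 partial products 2**(2i) * K_i * A."""
    return sum((1 << (2 * i)) * k * a
               for i, k in enumerate(booth_digits(b, n)))
```

For the 8-bit multiplier -35, `booth_digits(-35, 8)` gives the digits 1, -1, 2, -1 (least significant first), so only four partial products are added instead of eight.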
66
Booth Algorithm Decoding of multiplier to generate signals for hardware use
[Table: for each bit triplet $X_{i+1}$, $X_i$, $X_{i-1}$, the operation (OP) and the control signals NEG, ZERO, and TWO.]
67
Booth Algorithm: three bits of the multiplier at a time
A Booth-recoded multiplier examines three bits of the multiplier at a time. It determines whether to add zero, 1, -1, 2, or -2 times the multiplicand at that rank. The operation to be performed is based on the current two bits of the multiplier and the previous bit.
X_{i+1} X_i X_{i-1} | Z_{i/2}
0 0 0 | 0
0 0 1 | 1
0 1 0 | 1
0 1 1 | 2
1 0 0 | -2
1 0 1 | -1
1 1 0 | -1
1 1 1 | 0
68
Bit weights: $X_i = 2^1$, $X_{i+1} = 2^0$, $X_{i+2} = 2^{-1}$
X_i X_{i+1} X_{i+2} | Operation | M is multiplied by
0 0 0 | add zero (no string) | +0
0 0 1 | add multiplicand (end of string) | +X
0 1 0 | add multiplicand (a string) | +X
0 1 1 | add twice the multiplicand (end of string) | +2X
1 0 0 | subtract twice the multiplicand (beginning of string) | -2X
1 0 1 | subtract the multiplicand (-2X and +X) | -X
1 1 0 | subtract the multiplicand (beginning of string) | -X
1 1 1 | subtract zero (center of string) | -0
69
Booth Algorithm- dot notation
Multiplicand A = ● ● ● ●
Multiplier B = (●●)(●●)
Partial product bits ● ● ● ● $(B_1B_0)_2 \, A \, 4^0$
Partial product bits ● ● ● ● $(B_3B_2)_2 \, A \, 4^1$
Product P = ● ● ● ● ● ● ● ●
70
Example
The following example shows how the calculation is done.
Multiplicand X = ...
Multiplier Y = ... (annotation on the slide: added to the multiplier)
After Booth decoding, Y is decoded so that X is multiplied by +2, -1, and +1 separately; the partial products are then shifted two bits each and added together.
71
Sign Extension
72
Sign extension
Traditional sign-extension scheme:
· Segment the input operands based on the size of the embedded blocks.
· Multiply the segmented inputs and extend the sign bit of each partial product.
· Sum all partial products.
[Figure: segmented input operands are multiplied (×); the partial products are sign-extended and summed (+) to give the final result.]
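The three steps can be sketched in Python for a 16-bit signed multiply built from 8-bit segments. Everything specific here is an assumption made for illustration: the segment size, the use of (seg+1) x (seg+1) signed blocks so that the unsigned low segments fit, and the names `sign_extend` and `segmented_multiply`.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Replicate the sign bit of a from_bits-wide pattern up to to_bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value


def segmented_multiply(a: int, b: int, n: int = 16, seg: int = 8) -> int:
    """n-bit signed multiply from seg-bit segments (two segments per operand)."""
    out_bits = 2 * n
    out_mask = (1 << out_bits) - 1
    seg_mask = (1 << seg) - 1
    pp_bits = 2 * (seg + 1)                      # width of one signed block product
    a_lo, b_lo = a & seg_mask, b & seg_mask      # unsigned low segments
    a_hi, b_hi = a >> seg, b >> seg              # signed high segments
    total = 0
    for x, y, shift in ((a_lo, b_lo, 0), (a_hi, b_lo, seg),
                        (a_lo, b_hi, seg), (a_hi, b_hi, 2 * seg)):
        pp = (x * y) & ((1 << pp_bits) - 1)      # raw block output
        pp = sign_extend(pp, pp_bits, out_bits)  # extend its sign bit
        total = (total + (pp << shift)) & out_mask   # align and accumulate
    return total - (1 << out_bits) if total >> (out_bits - 1) else total
```

Each of the four block products carries its own run of sign-extension bits into the final sum, and this is the overhead that the sign-extension templates later in the deck aim to reduce.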
73
Booth Algorithm-Example 1
Example 1:
74
Booth Algorithm Example 2
Notice sign extensions
75
Booth Algorithm-Example 3
Notice the sign extensions
76
Comparison of Booth and parallel multiplier shift and Add
77
Template to reduce sign extensions for Booth Algorithm
Please note that each operand is 17 bits wide, i.e. the 17th bit is the sign bit. Also, negative numbers are entered in 1's complement, which is why you need to add the S on the right-hand side of the diagram. If you use 2's complement, the S's on the right side of the diagram can be removed.
78
Comparison of Template and the sign extension
79
Example of using the template
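25 * -35, with -35 as the multiplier, using an 8-bit representation.
[Worked template for 25 * -35. Labels on the figure: sign bit; add SS; add inverted S; add inverted sign and add 1; add inverted sign bit; no sign bit. Multiples of the multiplicand used in the rows: * 1, * -1, * 2, * -1.]
The result is a negative number; converting it gives a magnitude of 875.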
80
Booth Multiplier Components
[Block diagram: multiplicand, Booth encoder, PPU (partial products unit), PPA (partial products adding unit), product.]
81
Wallace Tree and Ripple Carry Adder Structure of an 8*8 multiplier with pipeline
82
Hardware implementation of Booth with shift and add
83
Simulation Plan
84
Testing the Design
85
Simulation For Parallel Multipliers
Signed Number: Unsigned Number:
86
Simulation For Signed S/P Multipliers
There is a 340 ns delay between the operands and the result because of the D flip-flop delays.
87
FPGA after implementation, areas of programming shown clearly
88
Another implementation of the above after pipelining; place and route has placed the design in different locations.
89
Spartacus FPGA board
90
Testing the multiplication system
91
Comparison of Multipliers
Table 7. Performance comparison for two's complement multipliers (by Chen Yaoquan, M.Eng. 2005)
Columns, in order: Array | Modified Booth | Wallace Tree | Modified Booth-Wallace Tree | Twin Pipe Serial-Parallel | Behavioral
Area, total CLBs (#): only one entry survives, 490.00 (consistent with the Twin Pipe Serial-Parallel column)
Maximum delay D (ns): 35.78 | 24.43 | 18.93 | 18.53 | 3.36 x 32 | 49.33
Total dynamic power P (W): 7.52 | 6.33 | 7.46 | 6.41 | 0.28 | 6.24
Delay-power product DP (ns W): 268.98 | 154.64 | 141.14 | 118.76 | 30.62 | 307.58
Area-power product AP (# W): only one entry survives, 139.54
Area-delay product AD (# ns): 1.10E+05 | 6.47E+04 | 6.30E+04 | 4.95E+04 | 5.27E+04 | 1.48E+05
Area-delay² product AD² (# ns²): 3.94E+06 | 1.58E+06 | 1.19E+06 | 9.18E+05 | 5.66E+06 | 7.28E+06
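The derived rows of the table are simple products of area, delay, and power. The sketch below recomputes them for one column, under the assumption that the lone area entry (490 CLBs) belongs to the twin pipe serial-parallel multiplier, which the AD row supports; the function name `figures_of_merit` is made up here.

```python
def figures_of_merit(area: float, delay_ns: float, power_w: float) -> dict[str, float]:
    """Combine area (CLBs), delay (ns), and power (W) into the table's derived metrics."""
    return {
        "DP": delay_ns * power_w,        # delay-power product (ns W)
        "AP": area * power_w,            # area-power product (# W)
        "AD": area * delay_ns,           # area-delay product (# ns)
        "AD2": area * delay_ns ** 2,     # area-delay^2 product (# ns^2)
    }


# Twin pipe serial-parallel column: 32 cycles of 3.36 ns each
twin_pipe = figures_of_merit(area=490.0, delay_ns=3.36 * 32, power_w=0.28)
```

The recomputed AD and AD² agree with the table to three significant figures; DP and AP come out slightly lower than the listed values, presumably because the power figure in the table is rounded.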
92
Comparison of Multipliers
Table 7. Performance comparison for unsigned multipliers (by Chen Yaoquan, M.Eng. 2005)
Columns, in order: Array | Modified Booth | Wallace Tree | Modified Booth-Wallace Tree | Twin Pipe Serial-Parallel | Behavioral
Area, total CLBs (#): only one entry survives, 487.00
Maximum delay D (ns): 37.23 | 25.33 | 18.93 | 18.33 | 107.52 | 44.50
Total dynamic power P (W): 7.57, 6.66, 7.32, 0.29, 6.26 (five of the six entries survive)
Delay-power product DP (ns W): 281.88 | 168.77 | 138.60 | 122.13 | 30.66 | 278.53
Area-power product AP (# W): only one entry survives, 138.89
Area-delay product AD (# ns): 1.22E+05 | 7.09E+04 | 6.29E+04 | 5.22E+04 | 5.24E+04 | 1.34E+05
Area-delay² product AD² (# ns²): 4.55E+06 | 1.80E+06 | 1.19E+06 | 9.56E+05 | 5.63E+06 | 5.95E+06
93
Comparison of Multipliers
Changing the value of "set_max_delay" in the script file:
set_max_delay (ns): 10 | 20 | 30 | 40 | 50 | 60 | >60
Area (#): 3014.5 | 3013.0 | 3110.0 | 3193.5 | 3019.5 | 2999.5 | 2978.5
Power (W): 6.6499 | 6.6470 | 7.5683 | 8.1878 | 8.0645 | 8.0419 | 8.0156
Delay (ns): 31.98, 30.93, 30.08, 39.93, 49.88, 59.63 (six of the seven entries survive)
The relation of area and delay for the behavioral multiplier: the "banana curve"
94
Comparison of Multipliers
Columns, in order: Array | Modified Booth | Wallace Tree | Modified Booth-Wallace Tree | Twin Pipe Serial-Parallel | Behavioral
Only some cells of this table survive, so the entries below are listed in their original order without column positions.
Area: Medium, Small, Large, Smallest
Critical delay: Fast, Very Fast, Fastest, Very Large
Power consumption: (entries missing)
Complexity: Simple, Complex, More Complex, Simplest
Implementation: Easy, Difficult, Easiest
By Chen Yaoquan, M.Eng. 2005
95
Pipelining Simulation
96
Synthesis for Signed Multipliers
Array Modified Booth Wallace Tree Modified Booth -Wallace Tree Twin Pipe S/P Behavioral
97
Synthesis for Unsigned Multipliers
Array Modified Booth Wallace Tree Modified Booth -Wallace Tree Twin Pipe S/P Behavioral
98
Conclusion
· Modified Booth and Wallace Tree are the best techniques for high-speed multiplication.
· Wallace Tree has the best performance, but it is hard to implement.
· Booth-algorithm-based multipliers have the lowest area among parallel multipliers.
· For behavioral multipliers, the area increases as the delay decreases.
99
Comparison
Columns, in order: Array Multiplier | Modified Booth Multiplier | Wallace Tree Multiplier | Modified Booth & Wallace Tree Multiplier | Twin Pipe Serial-Parallel Multiplier
Area, total CLBs (#): 1165 | 1292 | 1659 | 1239 | 133
Maximum delay (ns): 187.87 | 139.41 | 101.14 | 101.43 | 22.58 (722.56)
Power consumption at highest speed (mW): ... (at 188 ns) | 23.136 (at 140 ns) | 30.95 (at ... ns) | 30.862 (at ... ns) | 2.089 (at ... ns)
Delay-power product DP (ns mW): values not preserved
Area-power product AP (# mW): values not preserved (scale x 10^3)
Area-delay product AD (# ns): values not preserved (scale x 10^3)
Area-delay² product AD² (# ns²): values not preserved (scale x 10^6)
100
NOTICE
· The rest of these slides are for extra information only and are not part of the lecture.
101
Array Addition
102
Addition of 8 binary numbers using the Wallace tree principle
106
Baugh-Wooley two's complement multiplier:
108
Cluster Multipliers Divide the multiplier into smaller multipliers
109
Cluster Multipliers The circuit used to generate the enable signal
8-bit cluster low power multiplier
110
Cluster Multipliers
· Divide the multiplication circuit into clusters (blocks) of smaller multipliers.
· Apply clock-gating techniques to disable the blocks that are producing a zero result.
Features
· Low power (claims 13.4% savings)
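The clustering idea can be illustrated with a small Python model that counts how many blocks would actually need to run. The enable condition used here (both input segments non-zero) is an assumption for illustration; the slide's enable circuit is not visible in this transcript, and the function name `cluster_multiply` is hypothetical.

```python
def cluster_multiply(a: int, b: int, n: int = 8, block: int = 4):
    """Unsigned n x n multiply from block x block clusters.

    Returns (product, number of clusters that were enabled).
    """
    mask = (1 << block) - 1
    total, active = 0, 0
    for i in range(0, n, block):
        for j in range(0, n, block):
            a_seg, b_seg = (a >> i) & mask, (b >> j) & mask
            enable = a_seg != 0 and b_seg != 0   # a zero segment gives a zero result
            if enable:
                total += (a_seg * b_seg) << (i + j)
                active += 1
    return total, active
```

For small operands such as 0x0F * 0x0F only one of the four clusters is enabled, which is the situation in which gating the idle blocks can save switching power.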
111
Multiplexer-Based Array Multipliers
[Figure: multiplication expressed with the terms $Z_j$ and $x_j y_j$]
112
Multiplexer-Based Array Multipliers
Two types of cells:
· Cell 1: produces the terms $Z_{ij}2^{j}$ and includes a full adder of the carry-save adder array
· Cell 2: produces the terms $x_j y_j 2^{j}$ and includes a full adder of the carry-save adder array
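A behavioural sketch of one multiplexer-based formulation is given below. It assumes the recurrence $Z_j = x_j Y_{j-1} + y_j X_{j-1}$, where $X_{j-1}$ and $Y_{j-1}$ are the operand bits below position j, with $Z_j$ weighted by $2^j$ and $x_j y_j$ by $2^{2j}$. The definition on the slide did not survive, so this formulation and the name `mux_multiply` are assumptions.

```python
def mux_multiply(x: int, y: int, n: int = 8) -> int:
    """Unsigned multiply as the sum over j of Z_j * 2**j + x_j * y_j * 2**(2j)."""
    total = 0
    for j in range(n):
        xj, yj = (x >> j) & 1, (y >> j) & 1
        x_low = x & ((1 << j) - 1)        # X_{j-1}: bits of x below position j
        y_low = y & ((1 << j) - 1)        # Y_{j-1}: bits of y below position j
        # multiplexer-style selection replaces AND-gate partial products
        z_j = (y_low if xj else 0) + (x_low if yj else 0)
        total += (z_j << j) + ((xj & yj) << (2 * j))
    return total
```

The identity follows from writing $X_j = x_j 2^j + X_{j-1}$ and expanding $X_j Y_j$; each $Z_j$ is selected by the bit pair $(x_j, y_j)$, so no Booth-style encoding logic is needed.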
113
Multiplexer-Based Array Multipliers
Characteristics
· Faster than Modified Booth
· Unlike Booth, does not require encoding logic
· Requires approximately $N^2/2$ cells
· Has a zigzag shape, thus not layout-friendly
114
Multiplexer-Based Array Multipliers
Improvement
· More rectangular layout
· Saves up to 40 percent area without penalties
· Outperforms the modified Booth multiplier in both speed and power by 13% to 26%
115
Gray-Encoded Array Multiplier
Dec Hyb | Dec Hyb | Dec Hyb | Dec Hyb
0 0000 | 4 0100 | -8 1100 | -4 1000
1 0001 | 5 0101 | -7 1101 | -3 1001
2 0011 | 6 0111 | -6 1111 | -2 1011
3 0010 | 7 0110 | -5 1110 | -1 1010
2's complement hybrid coding
· Aims for a single-bit difference between consecutive values.
· Reduces the number of transitions, and thus power (for highly correlated streams).
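The table is consistent with Gray-coding each 2-bit group of the two's complement pattern separately. That rule is inferred from the table entries rather than stated on the slide, and the names `gray2` and `hybrid_encode` are made up for this sketch.

```python
def gray2(d: int) -> int:
    """Gray-code one 2-bit group: (b1, b0) -> (b1, b1 XOR b0)."""
    return d ^ (d >> 1)


def hybrid_encode(value: int, n: int = 4) -> int:
    """Gray-code every 2-bit group of the n-bit two's complement pattern of value."""
    pattern = value & ((1 << n) - 1)
    code = 0
    for shift in range(0, n, 2):
        code |= gray2((pattern >> shift) & 0b11) << shift
    return code


# Rebuild the 4-bit table as {decimal value: hybrid code string}
table = {v: format(hybrid_encode(v), "04b") for v in range(-8, 8)}
```

Within a group of four consecutive values (0 to 3, 4 to 7, and so on) the codes differ in exactly one bit from one value to the next, which is where the reduction in switching activity comes from.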
116
Gray-Encoded Array Multiplier
An 8-bit wide 2’s complement radix-4 array multiplier
117
Gray-Encoded Array Multiplier
Characteristics
· Uses Gray code to reduce the switching activity of the multiplier
· Uses 45.6% less power than Modified Booth
· Uses 26.4% more area than Modified Booth
118
Ultra-high Speed Parallel Multiplier
How is ultra-high speed achieved?
· Based on the Modified Booth algorithm and a tree structure (column compression)
· Chooses efficient counters (3:2 and 5:3)
· Uses the new compressor (20% faster)
· Uses the First Partial product Addition (FPA) algorithm (reduces the width of the CLA by 50%)
119
Ultra-high Speed Parallel Multiplier
· Divide into 3 rows or 5 rows only (most efficient).
· Calculate the partial products as soon as possible.
· The final CLA is only 16 bits wide instead of 32 bits.
[Figure: calculation process using parallel counters in the 16x16 case]
Overall, the delay is reduced by about 30%.
120
ULLRLF Multiplier
ULLRLF stands for Upper/Lower Left-to-Right Leapfrog. It combines the following techniques:
· Signal flow optimization in the [3:2] adder array for partial product reduction
· Left-to-right leapfrog (LRLF) signal flow
· Splitting of the reduction array into upper and lower parts
121
ULLRLF Multiplier Signal flow optimization in [3:2] adder array
· PPij is always connected to pin A.
· Sin/Cin are connected to pins B/C; most Sin signals are connected to C.
· For n = 32, the delay is reduced by 30 percent.
· Power is also saved.
122
ULLRLF Multiplier 2) Left-to-Right Leapfrog (LRLF) Structure
· The sum signals skip over alternate rows.
· Signal delays are better balanced.
· Low power.
123
ULLRLF Multiplier 3) Upper/Lower Split Structure
· Only n+2 bits
· The long data path is broken into parallel short paths, which saves power.
· The delay of the partial product reduction is reduced.
124
ULLRLF Multiplier
ULLRLF multipliers consume less power than optimized tree multipliers for n ≤ 32 while keeping similar delay and area. With more regularity and inherently shorter interconnects, the ULLRLF structure presents a competitive alternative to tree structures.
[Figure: floorplan of ULLRLF (n = 32)]
125
Signed Array Multiplier
126
Unsigned Array Multiplier
127
Signed Modified Booth Multiplier
128
Signed Modified Booth Multiplier
129
Unsigned Modified Booth Multiplier
130
Unsigned Modified Booth Multiplier
131
Wallace Tree multipliers
132
Wallace Tree multipliers
· Uses 3:2 counters and 2:2 counters
· Number of levels: log(32/2) / log(3/2), about 8 levels
· Irregular structure
· Fast
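The level count can be checked by stepping through the 3:2 reduction directly. In this Python sketch each group of three rows is replaced by two rows per level, and leftover rows pass through unchanged; the function names `wallace_levels` and `estimate_levels` are made up here.

```python
import math


def wallace_levels(rows: int) -> int:
    """Count the 3:2 reduction levels needed to bring `rows` operands down to 2."""
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3   # each full group of 3 rows becomes 2
        levels += 1
    return levels


def estimate_levels(rows: int) -> float:
    """Closed-form estimate log(rows / 2) / log(3 / 2)."""
    return math.log(rows / 2) / math.log(3 / 2)
```

The closed form is only an estimate, because a level cannot reduce rows that do not form a complete group of three: for 32 partial products it evaluates to roughly 6.8, while stepping through the reduction takes 8 levels, and for 16 partial products the step count is 6.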
133
Wallace Tree multipliers
2-level hierarchical
134
Modified Booth-Wallace Tree Multipliers
135
Modified Booth-Wallace Tree Multipliers
· Uses 3:2 counters and 2:2 counters
· Number of levels: log(16/2) / log(3/2), about 6 levels
· Irregular structure
· Fast
· Less area
136
Twin pipe serial-parallel multipliers
137
Signed twin pipe serial-parallel multipliers
“Sign” control line and the sign-change hardware
138
Unsigned twin pipe serial-parallel multipliers
Does not need the "Sign" control line or the sign-change hardware