Integer Multipliers.


Multipliers: a must-have circuit in most DSP applications
A variety of multipliers exists, and one can be chosen based on its performance: Serial, Serial/Parallel, Shift and Add, Array, Booth, Wallace Tree, ...

[Figure: 16x16 multiplier with converters and registers RA, RB, RC]

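Multiplication Algorithm
X = X_{n-1} X_{n-2} ... X_0 (multiplicand)
Y = Y_{n-1} Y_{n-2} ... Y_0 (multiplier)

Partial products (each row is shifted one position to the left of the row above it):
Y_{n-1}X_0      Y_{n-2}X_0      Y_{n-3}X_0      ...  Y_1X_0      Y_0X_0
Y_{n-1}X_1      Y_{n-2}X_1      Y_{n-3}X_1      ...  Y_1X_1      Y_0X_1
Y_{n-1}X_2      Y_{n-2}X_2      Y_{n-3}X_2      ...  Y_1X_2      Y_0X_2
...
Y_{n-1}X_{n-2}  Y_{n-2}X_{n-2}  Y_{n-3}X_{n-2}  ...  Y_1X_{n-2}  Y_0X_{n-2}
Y_{n-1}X_{n-1}  Y_{n-2}X_{n-1}  Y_{n-3}X_{n-1}  ...  Y_1X_{n-1}  Y_0X_{n-1}

Product: P_{2n-1} P_{2n-2} ... P_1 P_0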

1. Multiplication Algorithms
Implementing the multiplication of binary numbers boils down to how the additions are done. Consider two 8-bit numbers A and B that generate the 16-bit product P. First generate the 64 partial products, then add them up.
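As a rough illustration of the 8-bit case, the sketch below builds the 64 one-bit partial products and adds them at their positional weights. The function names are hypothetical, and the operands 0x89 and 0xAB are borrowed from the simulation example later in the deck.

```python
def partial_products(a: int, b: int, n: int = 8) -> list[list[int]]:
    """Return the n x n grid of 1-bit partial products; row i holds a_j AND b_i."""
    return [[((a >> j) & 1) & ((b >> i) & 1) for j in range(n)] for i in range(n)]


def add_partial_products(grid: list[list[int]]) -> int:
    """Sum every partial-product bit at its weight 2**(i + j)."""
    return sum(bit << (i + j) for i, row in enumerate(grid) for j, bit in enumerate(row))


grid = partial_products(0x89, 0xAB)
print(sum(len(row) for row in grid))     # how many partial-product bits were generated
print(hex(add_partial_products(grid)))   # the 16-bit product
```

How the 64 bits are added (serially, in an array, or in a tree) is what distinguishes the multiplier styles in the rest of the deck.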

Multiplier Design
[Block diagram: MU (Multiplier Unit), REG IN, REG OUT, Control Unit, Storage]

Serial Multiplier
[Figure sequence: serial multiplier circuit with Reset input, shown step by step]
X: x3 x2 x1 x0
Y: y3 y2 y1 y0
Input sequences for G1:
00 x3x2x1x0   0 x3x2x1x0   0 x3x2x1x0   0 x3x2x1x0
00 y3y3y3y3   0 y2y2y2y2   0 y1y1y1y1   0 y0y0y0y0

Si: the ith bit of the final result
Ci: the only carry from column i
Sij: the jth partial sum for column i
Cij: the jth partial carry from column i

Serial / Parallel Multiplier
[Figure sequence: serial/parallel multiplier operation, shown step by step]
Si: the ith bit of the final result
Sij: the jth partial sum for column i
Cij: the jth partial carry from column i

Shift and Add Multiplier
[Block diagram: 8-bit adder, MUX, inputs Ain (7 downto 0) and Bin (7 downto 0), registers REGA, REGB, REGC, CLOCK, outputs Result (7 downto 0) and Result (15 downto 8)]

Synchronous Shift and Add Multiplier Controller Design
The multiplication process has 5 states: Idle, Init, Test, Add, and Shift&Count.
· Idle: The process starts when the Start signal is received.
· Init: The multiplicand and the multiplier are loaded into a load register and a shift register, respectively.
· Test: The LSB of the shift register that holds the multiplier is tested to decide the next state.
· Add: If the LSB is '1', the new partial product is added to the accumulated result, and the state machine moves to the Shift&Count state.
· Shift&Count: If the LSB is '0' (or once the addition is done), the two shift registers shift their contents one bit to the right and the counter counts up by one. The state machine then returns to the Test state.
· When the counter reaches N, a Stop signal is asserted and the state machine goes to the Idle state.
· Idle: In the Idle state, a Done signal is asserted to indicate the end of the multiplication.
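The five states can be walked through in software. This is only a behavioural sketch for unsigned operands: the register names (`load_reg`, `shift_reg`, `acc`) are illustrative and not taken from the slides, and it assumes the accumulator's LSB is shifted into the MSB of the multiplier register on each shift.

```python
def shift_add_fsm(multiplicand: int, multiplier: int, n: int = 8):
    """Unsigned n-bit multiply driven by the five controller states.

    Returns (product, visited_states).
    """
    state, visited = "Idle", []
    acc = load_reg = shift_reg = count = 0
    start, done = True, False
    while not done:
        visited.append(state)
        if state == "Idle":
            if start:                          # Start signal received
                start, state = False, "Init"
        elif state == "Init":
            load_reg, shift_reg = multiplicand, multiplier
            acc, count, state = 0, 0, "Test"
        elif state == "Test":                  # inspect the multiplier LSB
            state = "Add" if shift_reg & 1 else "Shift&Count"
        elif state == "Add":
            acc += load_reg                    # accumulate the new partial product
            state = "Shift&Count"
        else:                                  # Shift&Count
            shift_reg = (shift_reg >> 1) | ((acc & 1) << (n - 1))
            acc >>= 1
            count += 1
            if count == n:                     # Stop: return to Idle, Done asserted
                done = True
            else:
                state = "Test"
    return (acc << n) | shift_reg, visited


product, states = shift_add_fsm(0x89, 0xAB)
print(hex(product), states.count("Add"))
```

The Add state is visited once per '1' bit in the multiplier, so the cycle count depends on the data as well as on N.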

n-bit Multiplier:
· Q0 = 1: The multiplicand is added to register A and the result is stored in register A; then registers C, A, and Q are shifted one bit to the right.
· Q0 = 0: Registers C, A, and Q are shifted one bit to the right.

Example: 4-bit Multiplier
[Figure sequence: register contents at each step]
· Initial values
· First cycle: Add
· First cycle: Shift
· Second cycle: Shift
· Third cycle: Add
· Third cycle: Shift
· Fourth cycle: Add
· Fourth cycle: Shift
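The register values in the figures did not survive, but the add/shift pattern (add, skip, add, add from the LSB upward) corresponds to a multiplier of 1101. The sketch below traces registers C, A, and Q for that multiplier; the multiplicand 1011 is a hypothetical value chosen for illustration, not one confirmed by the slides.

```python
def caq_multiply(m: int, q: int, n: int = 4):
    """Trace the C, A, Q registers of an unsigned n-bit shift-and-add multiply."""
    mask = (1 << n) - 1
    c, a = 0, 0
    trace = [("initial", c, a, q)]
    for cycle in range(1, n + 1):
        if q & 1:                               # Q0 = 1: A <- A + M, carry into C
            total = a + m
            c, a = total >> n, total & mask
            trace.append((f"cycle {cycle} add", c, a, q))
        # shift C, A, Q one bit to the right as a single long register
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
        trace.append((f"cycle {cycle} shift", c, a, q))
    return (a << n) | q, trace


product, trace = caq_multiply(0b1011, 0b1101)
for step, c, a, q in trace:
    print(f"{step:15s} C={c} A={a:04b} Q={q:04b}")
print(product)
```

The product ends up split across A (upper half) and Q (lower half), which is why no separate result register is needed.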

4*4 Synchronous Shift and Add Multiplier: Layout Design
[Figure: floor plan of the 4*4 synchronous shift and add multiplier]

Comparison between Synchronous and Asynchronous Approaches

Example (simulated by Ovais Ahmed)
Multiplicand = 89 (hexadecimal)
Multiplier = AB (hexadecimal)
Expected result = 5B83 (hexadecimal)

Array Multiplier
· Regular structure based on the add and shift algorithm.
· Addition is mainly done by the carry save algorithm.
· Sign bit extension results in a higher capacitive load and slows down the circuit.
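To make the carry save idea concrete, here is a word-level sketch for unsigned operands: each row of the array absorbs one shifted partial product with a 3:2 carry-save step, and carries are only propagated once, in the final adder. Function names are hypothetical, and the bit-level cell layout of the real array is not modelled.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save step: returns (sum word, carry word) with x + y + z == sum + carry."""
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1


def array_multiply_csa(a: int, b: int, n: int = 8) -> int:
    """Unsigned n-bit multiply: one carry-save row per multiplier bit, then one final add."""
    sum_word = carry_word = 0
    for i in range(n):
        partial = (a << i) if (b >> i) & 1 else 0   # shifted partial product for bit i
        sum_word, carry_word = carry_save_add(sum_word, carry_word, partial)
    return sum_word + carry_word                    # final carry-propagate addition


print(hex(array_multiply_csa(0x89, 0xAB)))
```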

Addition with CLA

Array Multiplier with CSA

Critical Path with Array Multipliers
[Figure: two of the possible critical paths through the HA/FA cells of the ripple-carry based 4*4 multiplier]
Area = (N*N) AND gates + (N-1)N full adders
$$\text{Delay} = \tau_{HA} + (2N-1)\,\tau_{FA}$$

Wallace Tree

Array Multiplier + Wallace Tree

Baugh-Wooley Algorithm
· Convert negative partial products to positive representation.
· No sign extension required.
12/7/2018, Concordia VLSI Lab
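One widely used form of this idea complements the partial products that involve exactly one sign bit and adds two constant '1' bits, so every term in the array is non-negative. The sketch below follows that form; it may differ in detail from the arrangement drawn on the slides, and the function name is hypothetical.

```python
def baugh_wooley_multiply(a: int, b: int, n: int = 5) -> int:
    """Signed n-bit x n-bit multiply using only non-negative partial-product bits."""
    mask = (1 << n) - 1
    a_bits = [((a & mask) >> i) & 1 for i in range(n)]   # two's complement bit patterns
    b_bits = [((b & mask) >> j) & 1 for j in range(n)]
    total = 0
    for i in range(n):
        for j in range(n):
            bit = a_bits[i] & b_bits[j]
            if (i == n - 1) != (j == n - 1):    # exactly one sign bit involved
                bit ^= 1                        # complement instead of subtracting
            total += bit << (i + j)
    total += (1 << n) + (1 << (2 * n - 1))      # constants that absorb the complements
    total &= (1 << (2 * n)) - 1                 # keep the 2n-bit result
    return total - (1 << (2 * n)) if total >> (2 * n - 1) else total


print(baugh_wooley_multiply(-7, 12))
```

Because the correction is a fixed constant, it costs two extra '1' inputs in the array rather than a sign-extended row per partial product.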

Examples of 5-by-5 Baugh-Wooley

Squarer using Baugh-Wooley Algorithm
Partial products (row k holds a_i*a_k for i = 7 down to 0 and is shifted left by k positions):
a7*a0 a6*a0 a5*a0 a4*a0 a3*a0 a2*a0 a1*a0 a0*a0
a7*a1 a6*a1 a5*a1 a4*a1 a3*a1 a2*a1 a1*a1 a0*a1
a7*a2 a6*a2 a5*a2 a4*a2 a3*a2 a2*a2 a1*a2 a0*a2
a7*a3 a6*a3 a5*a3 a4*a3 a3*a3 a2*a3 a1*a3 a0*a3
a7*a4 a6*a4 a5*a4 a4*a4 a3*a4 a2*a4 a1*a4 a0*a4
a7*a5 a6*a5 a5*a5 a4*a5 a3*a5 a2*a5 a1*a5 a0*a5
a7*a6 a6*a6 a5*a6 a4*a6 a3*a6 a2*a6 a1*a6 a0*a6
a7*a7 a6*a7 a5*a7 a4*a7 a3*a7 a2*a7 a1*a7 a0*a7
'0' S15 S14 S13 S12 S11 S10 S9 S8 S7 S6 S5 S4 S3 S2 S1 S0

Example of an 8-bit squarer
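The partial-product matrix of a squarer is symmetric (a_i*a_j appears twice), which is commonly exploited by folding it: each off-diagonal pair becomes one term at double weight, and each diagonal term a_i*a_i reduces to a_i. The sketch below shows this folding for an unsigned operand only; it is an illustration of the symmetry, not necessarily the circuit on the slide, which uses the Baugh-Wooley arrangement for signed values.

```python
def square_folded(a: int, n: int = 8) -> tuple[int, int]:
    """Unsigned n-bit square from the folded matrix; returns (square, number of terms)."""
    bits = [(a >> i) & 1 for i in range(n)]
    total = terms = 0
    for i in range(n):
        total += bits[i] << (2 * i)                       # diagonal: a_i * a_i == a_i
        terms += 1
        for j in range(i + 1, n):
            total += (bits[i] & bits[j]) << (i + j + 1)   # a_i*a_j + a_j*a_i == 2*a_i*a_j
            terms += 1
    return total, terms


value, terms = square_folded(0x89)
print(value, terms)
```

For n = 8 the folded matrix has n + n(n-1)/2 terms instead of the n*n terms of the full array.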

Array Multiplier: 32-bit by 32-bit multiplier

Booth (Radix-4) Multiplier
· Radix-4 (3-bit recoding) halves the number of partial products to be added.
· Great saving in area and increased speed.
$$A = -a_{n-1}2^{n-1} + a_{n-2}2^{n-2} + a_{n-3}2^{n-3} + \dots + a_1 2 + a_0$$
$$B = -b_{n-1}2^{n-1} + b_{n-2}2^{n-2} + b_{n-3}2^{n-3} + \dots + b_1 2 + b_0$$
· The base-4 redundant signed-digit representation of B is
$$B = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i$$
· $K_i$ is calculated by the following equation:
$$K_i = -2b_{2i+1} + b_{2i} + b_{2i-1}, \qquad i = 0, 1, 2, \dots, (n-2)/2$$
· Three bits of the multiplier B, $b_{2i+1}$, $b_{2i}$, $b_{2i-1}$, are examined and the corresponding $K_i$ is calculated.
· B is always appended on the right with a zero ($b_{-1} = 0$), and n is always even (B is sign-extended if needed).
· The product AB is then obtained by adding n/2 partial products:
$$AB = P = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i A$$
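The recoding can be checked numerically. The sketch below computes the digits K_i = -2*b_{2i+1} + b_{2i} + b_{2i-1} (with b_{-1} = 0) from the two's complement bits of B and sums the n/2 shifted partial products. Function names are hypothetical; Python integers stand in for the hardware adders.

```python
def booth_digits(b: int, n: int) -> list[int]:
    """Radix-4 Booth digits K_i (least significant first) of the n-bit signed value b."""
    assert n % 2 == 0, "n must be even (sign-extend B if needed)"
    pattern = b & ((1 << n) - 1)                 # two's complement bit pattern of B

    def bit(k: int) -> int:
        return 0 if k < 0 else (pattern >> k) & 1    # b_{-1} = 0

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n // 2)]


def booth_multiply(a: int, b: int, n: int) -> int:
    """Sum of the n/2 partial products 2^(2i) * K_i * A."""
    return sum((k * a) << (2 * i) for i, k in enumerate(booth_digits(b, n)))


print(booth_digits(6, 4))          # digits of 0110
print(booth_multiply(-9, 6, 4 * 2))
```

Every digit lies in {-2, -1, 0, 1, 2}, so each partial product is obtained from A by a shift and an optional negation.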

Booth Algorithm: decoding of the multiplier to generate signals for hardware use
[Table: bits X_{i+1}, X_i, X_{i-1} and the resulting operation (OP) with control signals NEG, ZERO, TWO]

Booth Algorithm: Three bits of the multiplier at a time
A Booth-recoded multiplier examines three bits of the multiplier at a time. It determines whether to add 0, +1, -1, +2, or -2 times the multiplicand at that rank. The operation to be performed is based on the current two bits of the multiplier and the previous bit.
[Table: X_{i+1}, X_i, X_{i-1} mapped to the recoded digit Z_{i/2}, with values 0, 1, 2, -2, -1]

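| X_i (2^1) | X_{i+1} (2^0) | X_{i+2} (2^-1) | Operation | M is multiplied by |
|---|---|---|---|---|
| 0 | 0 | 0 | add zero (no string) | +0 |
| 0 | 0 | 1 | add multiplicand (end of string) | +X |
| 0 | 1 | 0 | add multiplicand (a string) | +X |
| 0 | 1 | 1 | add twice the multiplicand (end of string) | +2X |
| 1 | 0 | 0 | subtract twice the multiplicand (beginning of string) | -2X |
| 1 | 0 | 1 | subtract the multiplicand (-2X and +X) | -X |
| 1 | 1 | 0 | subtract the multiplicand (beginning of string) | -X |
| 1 | 1 | 1 | subtract zero (center of string) | -0 |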

Booth Algorithm- dot notation
Multiplicand A = ● ● ● ●
Multiplier B = (●●)(●●)
Partial product bits ● ● ● ● (B_1 B_0)_2 A 4^0
Partial product bits ● ● ● ● (B_3 B_2)_2 A 4^1
Product P = ● ● ● ● ● ● ● ●

Example
The following example shows how the calculation is done.
[Figure: multiplicand X and multiplier Y, with a bit added to the multiplier]
After Booth decoding, Y is decoded so that X is multiplied by +2, -1, and +1 separately; each partial product is then shifted by two bits and the partial products are added together.

Sign Extension

Traditional sign-extension scheme
· Segment the input operands based on the size of the embedded blocks.
· Multiply the segmented inputs and extend the sign bit of each partial product.
· Sum all partial products.
[Figure: segmented input operands, partial products with sign extension, final result]
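The three steps of the traditional scheme can be sketched as follows, assuming a 16-bit signed multiply built from four 8x8 block products. The split into two segments per operand and all names here are assumptions for illustration; the block size on the slides depends on the embedded multipliers being targeted.

```python
def sign_extend(pattern: int, from_bits: int, to_bits: int) -> int:
    """Replicate the sign bit of a from_bits-wide pattern up to to_bits bits."""
    pattern &= (1 << from_bits) - 1
    if pattern >> (from_bits - 1):
        pattern |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return pattern


def segmented_multiply(a: int, b: int, block: int = 8) -> int:
    """Signed (2*block)-bit multiply from four block-sized partial products."""
    low, width = (1 << block) - 1, 4 * block
    a_lo, a_hi = a & low, a >> block        # low segment unsigned, high segment signed
    b_lo, b_hi = b & low, b >> block
    pieces = [                              # (block product, shift, needs sign extension)
        (a_lo * b_lo, 0, False),
        (a_hi * b_lo, block, True),
        (a_lo * b_hi, block, True),
        (a_hi * b_hi, 2 * block, True),
    ]
    total = 0
    for value, shift, signed in pieces:
        pattern = value & ((1 << (2 * block)) - 1)   # raw 2*block-bit block output
        if signed:
            pattern = sign_extend(pattern, 2 * block, width)
        total += pattern << shift
    total &= (1 << width) - 1
    return total - (1 << width) if total >> (width - 1) else total


print(segmented_multiply(-1234, 5678))
```

Each signed partial product is widened to the full result width before the sum, which is the extra load that the template shown later is meant to avoid.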

Booth Algorithm-Example 1
Example 1:

Booth Algorithm Example 2
Notice sign extensions

Booth Algorithm-Example 3
Notice the sign extensions

Comparison of Booth and parallel multiplier shift and Add

Template to reduce sign extensions for Booth Algorithm
Please note that each operand is 17 bits, i.e. the 17th bit is the sign bit. Also, negative numbers are entered in 1's complement, which is why an S must be added on the right-hand side of the diagram. If 2's complement is used, the S's on the right side of the diagram can be removed.

Comparison of Template and the sign extension

Example of using the template
25 * (-35), with -35 as the multiplier, using 8-bit representation.
[Figure: the template applied to 25 * -35. Annotations, in order: Sign bit; Add SS; Add inverted S; Add inverted sign and add 1; * 1; Add inverted sign bit; * -1; * 2; No sign bit; * -1]
The result is a negative number; converting it gives a magnitude of 875.
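The exact bit layout of the template is not recoverable from the transcript, so the sketch below only demonstrates the general principle behind such templates: a sign bit s of weight -2^k can be rewritten as (1 - s)*2^k minus the constant 2^k, so each partial product contributes an inverted sign bit and all the constants are pre-added once. It uses 2's complement partial products and hypothetical names throughout.

```python
def booth_digits(b: int, n: int) -> list[int]:
    """Radix-4 Booth digits (least significant first) of the n-bit signed value b."""
    pattern = b & ((1 << n) - 1)
    bits = [0] + [(pattern >> k) & 1 for k in range(n)]      # bits[0] is b_{-1} = 0
    return [-2 * bits[2 * i + 2] + bits[2 * i + 1] + bits[2 * i] for i in range(n // 2)]


def multiply_without_sign_extension(a: int, b: int, n: int = 8) -> int:
    """Signed n-bit multiply whose partial products are never sign-extended."""
    w, width = n + 2, 2 * n                     # w bits hold any of 0, +-A, +-2A
    total = correction = 0
    for i, k in enumerate(booth_digits(b, n)):
        pp = (k * a) & ((1 << w) - 1)           # w-bit two's complement partial product
        sign, low = pp >> (w - 1), pp & ((1 << (w - 1)) - 1)
        total += (low | ((sign ^ 1) << (w - 1))) << (2 * i)   # inverted sign, no extension
        correction += 1 << (w - 1 + 2 * i)      # constant known at design time
    total = (total - correction) & ((1 << width) - 1)
    return total - (1 << width) if total >> (width - 1) else total


print(booth_digits(-35, 8))
print(multiply_without_sign_extension(25, -35))
```

For -35 the recoded digits come out as 1, -1, 2, -1 from the least significant pair upward, matching the * 1, * -1, * 2, * -1 labels in the example.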

Booth Multiplier Components
· Multiplicand
· Booth Encoder
· PPU (partial products unit)
· PPA (partial products adding unit)
· Product

Wallace Tree and Ripple Carry Adder Structure of the 8*8 Multiplier with Pipeline

Hardware implementation of Booth with shift and add

Simulation Plan

Testing the Design

Simulation For Parallel Multipliers
Signed Number: Unsigned Number:

Simulation For Signed S/P Multipliers
There is a 340 ns delay between the operands and the result because of the delay of the D flip-flops.

FPGA after implementation, with the programmed areas clearly shown

Another implementation of the above after pipelining; place and route has placed the design in different locations.

Spartacus FPGA board

Testing the multiplication system

Comparison of Multipliers
| Metric | Array Multiplier | Modified Booth Multiplier | Wallace-Tree Multiplier | Modified Booth-Wallace Tree Multiplier | Twin Pipe Serial-Parallel Multiplier | Behavioral Multiplier |
|---|---|---|---|---|---|---|
| Area, total CLBs (#) | - | - | - | - | 490.00 | - |
| Maximum delay D (ns) | 35.78 | 24.43 | 18.93 | 18.53 | (3.36x32) | 49.33 |
| Total dynamic power P (W) | 7.52 | 6.33 | 7.46 | 6.41 | 0.28 | 6.24 |
| Delay·Power product DP (ns W) | 268.98 | 154.64 | 141.14 | 118.76 | 30.62 | 307.58 |
| Area·Power product AP (# W) | - | - | - | - | 139.54 | - |
| Area·Delay product AD (# ns) | 1.10E+05 | 6.47E+04 | 6.30E+04 | 4.95E+04 | 5.27E+04 | 1.48E+05 |
| Area·Delay^2 product AD^2 (# ns^2) | 3.94E+06 | 1.58E+06 | 1.19E+06 | 9.18E+05 | 5.66E+06 | 7.28E+06 |

Table 7. Performance comparison for two's complement multipliers. By Chen Yaoquan, M.Eng. 2005

| Metric | Array Multiplier | Modified Booth Multiplier | Wallace-Tree Multiplier | Modified Booth-Wallace Tree Multiplier | Twin Pipe Serial-Parallel Multiplier | Behavioral Multiplier |
|---|---|---|---|---|---|---|
| Area, total CLBs (#) | - | - | - | - | 487.00 | - |
| Maximum delay D (ns) | 37.23 | 25.33 | 18.93 | 18.33 | 107.52 | 44.50 |
| Total dynamic power P (W) | 7.57 | 6.66 | 7.32 | - | 0.29 | 6.26 |
| Delay·Power product DP (ns W) | 281.88 | 168.77 | 138.60 | 122.13 | 30.66 | 278.53 |
| Area·Power product AP (# W) | - | - | - | - | 138.89 | - |
| Area·Delay product AD (# ns) | 1.22E+05 | 7.09E+04 | 6.29E+04 | 5.22E+04 | 5.24E+04 | 1.34E+05 |
| Area·Delay^2 product AD^2 (# ns^2) | 4.55E+06 | 1.80E+06 | 1.19E+06 | 9.56E+05 | 5.63E+06 | 5.95E+06 |

Table 7. Performance comparison for unsigned multipliers. By Chen Yaoquan, M.Eng. 2005

Change the value of "set_max_delay" in the script file:

| set_max_delay (ns) | 10 | 20 | 30 | 40 | 50 | 60 | >60 |
|---|---|---|---|---|---|---|---|
| Area (#) | 3014.5 | 3013.0 | 3110.0 | 3193.5 | 3019.5 | 2999.5 | 2978.5 |
| Power (W) | 6.6499 | 6.6470 | 7.5683 | 8.1878 | 8.0645 | 8.0419 | 8.0156 |
| Delay (ns) | 31.98 | 30.93 | 30.08 | 39.93 | 49.88 | 59.63 | - |

The relation of area and delay for the behavioral multiplier: the "banana curve"

Qualitative comparison of the Array, Modified Booth, Wallace-Tree, Modified Booth-Wallace Tree, Twin Pipe Serial-Parallel, and Behavioral multipliers (by Chen Yaoquan, M.Eng. 2005):
· Area: Medium, Small, Large, Smallest
· Critical delay: Fast, Very Fast, Fastest, Very Large
· Power consumption
· Complexity: Simple, Complex, More Complex, Simplest
· Implementation: Easy, Difficult, Easiest

Pipelining Simulation

Synthesis for Signed Multipliers
Array Modified Booth Wallace Tree Modified Booth -Wallace Tree Twin Pipe S/P Behavioral

Synthesis for Unsigned Multipliers
Array Modified Booth Wallace Tree Modified Booth -Wallace Tree Twin Pipe S/P Behavioral

Conclusion
· Modified Booth and Wallace Tree are the best techniques for high-speed multiplication.
· Wallace Tree has the best performance, but it is hard to implement.
· Booth-algorithm-based multipliers have lower area among parallel multipliers.
· For behavioral multipliers, the area increases as the delay decreases.

Comparison

| Metric | Array Multiplier | Modified Booth Multiplier | Wallace Tree Multiplier | Modified Booth & Wallace Tree Multiplier | Twin Pipe Serial-Parallel Multiplier |
|---|---|---|---|---|---|
| Area, total CLBs (#) | 1165 | 1292 | 1659 | 1239 | 133 |
| Maximum delay (ns) | 187.87 | 139.41 | 101.14 | 101.43 | 22.58 (722.56) |
| Power consumption at highest speed (mW) | - (at 188 ns) | 23.136 (at 140 ns) | 30.95 | 30.862 | 2.089 |
| Delay·Power product DP (ns mW) | - | - | - | - | - |
| Area·Power product AP (# mW, x 10^3) | - | - | - | - | - |
| Area·Delay product AD (# ns, x 10^3) | - | - | - | - | - |
| Area·Delay^2 product AD^2 (# ns^2, x 10^6) | - | - | - | - | - |

NOTICE: The rest of these slides are for extra information only and are not part of the lecture.

Array Addition

Addition of 8 binary numbers using the Wallace tree principle
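A word-level sketch of the principle: at every level, rows are grouped in threes and each group is compressed to a sum row and a carry row, until only two rows remain for the final carry-propagate adder. Operands are assumed non-negative, and the function names are hypothetical.

```python
def compress_3_2(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 counter applied bitwise: x + y + z == sum_word + carry_word."""
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1


def wallace_sum(operands: list[int]) -> tuple[int, int]:
    """Reduce the rows in parallel groups of three; returns (total, levels used)."""
    rows, levels = list(operands), 0
    while len(rows) > 2:
        grouped = len(rows) - len(rows) % 3
        next_rows = []
        for k in range(0, grouped, 3):
            next_rows.extend(compress_3_2(*rows[k:k + 3]))
        next_rows.extend(rows[grouped:])        # leftover rows pass straight through
        rows, levels = next_rows, levels + 1
    return sum(rows), levels                    # final carry-propagate addition


total, levels = wallace_sum([23, 54, 91, 7, 120, 66, 200, 13])
print(total, levels)
```

All groups within a level work in parallel, so the depth grows with the logarithm of the number of rows rather than linearly as in the array structure.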

Baugh-Wooley two's complement multiplier:

Cluster Multipliers
· Divide the multiplication circuit into clusters (blocks) of smaller multipliers.
· Apply clock gating techniques to disable the blocks that are producing a zero result.
· Feature: low power (claims 13.4% savings).
[Figure: the circuit used to generate the enable signal]
[Figure: 8-bit cluster low power multiplier]

Multiplexer-Based Array Multipliers
[Figure labels: Z_j, x_j y_j]

Two types of cells:
· Cell 1: produces the terms Z_ij 2^j and includes a full adder of the carry save adder array.
· Cell 2: produces the terms x_j y_j 2^j and includes a full adder of the carry save adder array.

Characteristics:
· Faster than Modified Booth.
· Unlike Booth, does not require encoding logic.
· Requires approximately N^2/2 cells.
· Has a zigzag shape, thus not layout-friendly.

Improvement:
· More rectangular layout.
· Saves up to 40 percent of the area without penalties.
· Outperforms the modified Booth multiplier in both speed and power by 13% to 26%.

Gray-Encoded Array Multiplier
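2's complement decimal value (Dec) and its hybrid coding (Hyb):

| Dec | Hyb | Dec | Hyb | Dec | Hyb | Dec | Hyb |
|---|---|---|---|---|---|---|---|
| 0 | 0000 | 4 | 0100 | -8 | 1100 | -4 | 1000 |
| 1 | 0001 | 5 | 0101 | -7 | 1101 | -3 | 1001 |
| 2 | 0011 | 6 | 0111 | -6 | 1111 | -2 | 1011 |
| 3 | 0010 | 7 | 0110 | -5 | 1110 | -1 | 1010 |

· Consecutive values differ in a single bit.
· This reduces the number of transitions, and thus power (for highly correlated streams).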
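The codes in the table are consistent with Gray-coding each 2-bit group of the 2's complement word separately. That reading is inferred from the table rather than stated on the slides, so the sketch below should be taken as an assumption; the function names are hypothetical.

```python
def hybrid_encode(value: int, bits: int = 4) -> int:
    """Gray-code every 2-bit group of the bits-wide two's complement pattern of value."""
    pattern = value & ((1 << bits) - 1)
    code = 0
    for shift in range(0, bits, 2):
        digit = (pattern >> shift) & 0b11
        code |= (digit ^ (digit >> 1)) << shift     # 2-bit binary-to-Gray conversion
    return code


def transitions(codes: list[int]) -> int:
    """Count the bit flips between successive words of a stream."""
    return sum(bin(x ^ y).count("1") for x, y in zip(codes, codes[1:]))


stream = [0, 1, 2, 3]
print(transitions([v & 0xF for v in stream]))            # plain two's complement
print(transitions([hybrid_encode(v) for v in stream]))   # hybrid coding
for v in (-8, -1, 0, 7):
    print(v, format(hybrid_encode(v), "04b"))
```

Counting transitions on a slowly varying stream gives a rough feel for the switching activity that the coding is meant to reduce.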

An 8-bit wide 2’s complement radix-4 array multiplier

Characteristics
· Uses Gray code to reduce the switching activity of the multiplier
· Consumes 45.6% less power than Modified Booth
· Uses 26.4% more area than Modified Booth

Ultra-high Speed Parallel Multiplier
How is ultra-high speed achieved?
· Based on the Modified Booth algorithm and a tree structure (column compression)
· Chooses efficient counters (3:2 and 5:3)
· Uses the new compressor (20% faster)
· Uses the First Partial product Addition (FPA) algorithm (reducing the bits of the CLA by 50%)

Ultra-high Speed Parallel Multiplier
Calculation process using parallel counters in the 16x16 case:
· Divide into 3 rows or 5 rows only (most efficient).
· Calculate the partial products as soon as possible.
· The final CLA is only 16 bits instead of 32 bits.
· In total, the delay is reduced by about 30%.

ULLRLF Multiplier
ULLRLF stands for Upper/Lower Left-to-Right Leapfrog. It combines the following techniques:
· Signal flow optimization in the [3:2] adder array for partial product reduction,
· Left-to-right leapfrog (LRLF) signal flow,
· Splitting of the reduction array into upper/lower parts.

ULLRLF Multiplier 1) Signal flow optimization in [3:2] adder array
· PPij is always connected to pin A.
· Sin/Cin are connected to B/C; most Sin signals are connected to C.
· For n = 32, the delay is reduced by 30 percent.
· Power is also saved.

ULLRLF Multiplier 2) Left-to-Right Leapfrog (LRLF) Structure
· The sum signals skip over alternate rows.
· The signal delays are better balanced.
· Low power.

ULLRLF Multiplier 3) Upper/Lower Split Structure
· Only n+2 bits.
· The long data path is broken into parallel short paths, which saves power.
· The delay of partial product reduction is reduced.

ULLRLF Multiplier
ULLRLF multipliers consume less power than optimized tree multipliers for n ≤ 32 while keeping similar delay and area. With more regularity and inherently shorter interconnects, the ULLRLF structure presents a competitive alternative to tree structures.
[Figure: floorplan of ULLRLF (n = 32)]

Signed Array Multiplier

Unsigned Array Multiplier

Signed Modified Booth Multiplier

Unsigned Modified Booth Multiplier

Wallace Tree multipliers

· Uses 3:2 counters and 2:2 counters
· Number of levels: the estimate log(32/2) / log(3/2) gives about 6.8; counting the 3:2 reduction stages one by one gives 8
· Irregular structure
· Fast
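The level count can be checked by simulating the row reduction directly: at each level every full group of three rows becomes two rows and any leftover rows pass through. The sketch below compares that exact count with the logarithmic estimate; the function names are hypothetical.

```python
import math


def wallace_levels(rows: int) -> int:
    """Exact number of 3:2 reduction levels needed to bring `rows` down to 2."""
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3
        levels += 1
    return levels


def estimated_levels(rows: int) -> float:
    """Logarithmic estimate log(rows / 2) / log(3 / 2)."""
    return math.log(rows / 2) / math.log(3 / 2)


for rows in (16, 32):
    print(rows, wallace_levels(rows), round(estimated_levels(rows), 2))
```

The estimate ignores the rows that cannot be grouped at each level, which is why the exact count comes out higher.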

2-level hierarchical

Modified Booth-Wallace Tree Multipliers

· Uses 3:2 counters and 2:2 counters
· Number of levels: the estimate log(16/2) / log(3/2) gives about 5.1; counting the 3:2 reduction stages one by one gives 6
· Irregular structure
· Fast
· Less area

Twin pipe serial-parallel multipliers

Signed twin pipe serial-parallel multipliers
Use a “Sign” control line and sign-change hardware

Unsigned twin pipe serial-parallel multipliers
Do not need the “Sign” control line and the sign-change hardware
