Integer Multipliers.


Multipliers
· A must-have circuit in most DSP applications.
· A variety of multipliers exist and can be chosen based on their performance: serial, serial/parallel, shift-and-add, array, Booth, Wallace tree, ...

16x16 multiplier
[Figure: block diagram of a 16x16 multiplier with input and output converters, registers RA, RB and RC, and a reset signal]

Multiplication Algorithm

X = X_{n-1} X_{n-2} ... X_0   (multiplicand)
Y = Y_{n-1} Y_{n-2} ... Y_0   (multiplier)

Partial products (each row is shifted one position to the left of the row above):

row 0:     Y_{n-1}X_0      Y_{n-2}X_0      Y_{n-3}X_0      ...  Y_1X_0      Y_0X_0
row 1:     Y_{n-1}X_1      Y_{n-2}X_1      Y_{n-3}X_1      ...  Y_1X_1      Y_0X_1
row 2:     Y_{n-1}X_2      Y_{n-2}X_2      Y_{n-3}X_2      ...  Y_1X_2      Y_0X_2
...
row n-2:   Y_{n-1}X_{n-2}  Y_{n-2}X_{n-2}  Y_{n-3}X_{n-2}  ...  Y_1X_{n-2}  Y_0X_{n-2}
row n-1:   Y_{n-1}X_{n-1}  Y_{n-2}X_{n-1}  Y_{n-3}X_{n-1}  ...  Y_1X_{n-1}  Y_0X_{n-1}
------------------------------------------------------------------------------------
Product:   P_{2n-1}  P_{2n-2}  P_{2n-3}  ...  P_2  P_1  P_0

1. Multiplication Algorithms
Implementing multiplication of binary numbers boils down to how the additions are done. Consider two 8-bit numbers A and B that generate the 16-bit product P: first generate the 64 partial products, then add them up.
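As a rough illustration of the statement above, the following Python sketch generates the 64 one-bit partial products of two 8-bit unsigned numbers and adds them with their column weights. The function names and the operand values 0x89 and 0xAB are chosen here for illustration only; this models the arithmetic, not any particular hardware structure.

```python
def partial_products(a: int, b: int, n: int = 8) -> list[list[int]]:
    """Return the n x n matrix of 1-bit partial products a_j AND b_i."""
    a_bits = [(a >> j) & 1 for j in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    return [[a_bits[j] & b_bits[i] for j in range(n)] for i in range(n)]


def sum_partial_products(pp: list[list[int]]) -> int:
    """Add every partial-product bit with its weight 2^(i+j)."""
    total = 0
    for i, row in enumerate(pp):
        for j, bit in enumerate(row):
            total += bit << (i + j)
    return total


pp = partial_products(0x89, 0xAB)      # illustrative operands
product = sum_partial_products(pp)
print(sum(len(row) for row in pp), hex(product))
```

How the 64 bits are grouped and in what order they are added is exactly what distinguishes the multiplier architectures discussed in the rest of the slides.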

Multiplier Design
[Figure: block diagram with input register (REG IN), multiplier unit (MU), output register (REG OUT), control unit, and storage]

Serial Multiplier (animation, slides 1 to 21)

X: x3 x2 x1 x0
Y: y3 y2 y1 y0

Input sequences (the slide labels them "Input Sequence for G1"), grouped to line up with the reset pattern:
0 0x3x2x1x0 0x3x2x1x0 0x3x2x1x0 0x3x2x1x0
0 0y3y3y3y3 0y2y2y2y2 0y1y1y1y1 0y0y0y0y0
Reset: 0 10000 10000 10000 10000

Legend:
Si: the ith bit of the final result
Ci: the only carry from column i
Sij: the jth partial sum for column i
Cij: the jth partial carry from column i

Serial / Parallel Multiplier (animation, slides 1 to 8)

Legend:
Si: the ith bit of the final result
Ci: the only carry from column i
Sij: the jth partial sum for column i
Cij: the jth partial carry from column i

Shift and Add Multiplier
[Figure: datapath with inputs Ain (7 downto 0) and Bin (7 downto 0), registers REGA, REGB and REGC, a MUX, an 8-bit adder, CLOCK, and outputs Result (7 downto 0) and Result (15 downto 8)]

Synchronous Shift and Add Multiplier: Controller Design
The multiplication process uses 5 states: Idle, Init, Test, Add, and Shift&Count.
· Idle: the controller starts on receiving the Start signal.
· Init: the multiplicand and the multiplier are loaded into a load register and a shift register, respectively.
· Test: the LSB of the shift register that holds the multiplier is tested to decide the next state.
· Add: if the LSB is '1', the new partial product is added to the accumulated result, and the state machine then moves to the Shift&Count state.
· Shift&Count: if the LSB is '0' (or after Add), the two shift registers shift their contents one bit to the right and the counter counts up by one. The state machine then returns to the Test state. When the counter reaches N, a Stop signal is asserted and the state machine goes to the Idle state.
· Idle: in the Idle state, a Done signal is asserted to indicate the end of the multiplication.

n-bit Multiplier
· Q0 = 1: the multiplicand is added to register A and the result is stored in register A; registers C, A, Q are then shifted one bit to the right.
· Q0 = 0: registers C, A, Q are shifted one bit to the right.
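The controller states and the C, A, Q register behaviour described above can be sketched together in Python. This is only a behavioural model under stated assumptions: the state names follow the slides, while the variable names, the 4-bit width, and the operand values 1011 and 1101 are chosen here for illustration (the multiplier 1101 happens to produce the add, shift, shift, add, shift, add, shift pattern).

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int = 4):
    """Unsigned shift-and-add multiply; returns (product, list of visited states)."""
    mask = (1 << n) - 1
    state, trace = "IDLE", []
    c = a = q = m = count = 0          # carry, accumulator, multiplier, multiplicand
    started = False
    while True:
        trace.append(state)
        if state == "IDLE":
            if started:                # back in Idle: Done would be asserted here
                break
            started = True             # Start signal received
            state = "INIT"
        elif state == "INIT":
            m, q = multiplicand & mask, multiplier & mask
            c = a = count = 0
            state = "TEST"
        elif state == "TEST":
            state = "ADD" if q & 1 else "SHIFT_COUNT"
        elif state == "ADD":
            total = a + m
            c, a = total >> n, total & mask
            state = "SHIFT_COUNT"
        else:                          # SHIFT_COUNT: shift C, A, Q right by one bit
            q = ((a & 1) << (n - 1)) | (q >> 1)
            a = (c << (n - 1)) | (a >> 1)
            c = 0
            count += 1
            state = "IDLE" if count == n else "TEST"
    return (a << n) | q, trace


product, trace = shift_add_multiply(0b1011, 0b1101)
print(product, trace)
```

After n shifts the 2n-bit product sits in the concatenation of A and Q, which is why no separate product register appears in the model.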

Example: 4-bit Multiplier (one figure per step)
· Initial values
· First cycle: add
· First cycle: shift
· Second cycle: shift
· Third cycle: add
· Third cycle: shift
· Fourth cycle: add
· Fourth cycle: shift

4*4 Synchronous Shift and Add Multiplier Design
Layout design: floor plan of the 4*4 synchronous shift and add multiplier

Comparison between Synchronous and Asynchronous Approaches .

Example (simulated by Ovais Ahmed):
Multiplicand = 10001001_2 = 89_16
Multiplier = 10101011_2 = AB_16
Expected result = 101101110000011_2 = 5B83_16

Array Multiplier
· Regular structure based on the add and shift algorithm.
· Addition is mainly done by the carry-save algorithm.
· Sign-bit extension results in a higher capacitive load and slows down the circuit.

Addition with CLA

Array Multiplier with CSA

Critical Path with Array Multipliers
[Figure: 4*4 array of HA and FA cells showing two of the possible critical paths for the ripple-carry based 4*4 multiplier]
Area = (N*N) AND gates + (N-1)*N full adders
Delay = τ_HA + (2N-1) τ_FA
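To make the area and delay expressions concrete, here is a small Python sketch that evaluates them for a given N. It reads the slide's delay expression as one half-adder delay plus (2N-1) full-adder delays, which is an interpretation of the flattened formula; the function name and the unit delays are hypothetical.

```python
def array_multiplier_cost(n: int, t_ha: float = 1.0, t_fa: float = 1.0) -> dict:
    """Gate counts and critical-path delay for an N x N ripple-carry array multiplier."""
    return {
        "and_gates": n * n,                   # one AND gate per partial-product bit
        "full_adders": (n - 1) * n,           # adder cells, as counted on the slide
        "delay": t_ha + (2 * n - 1) * t_fa,   # assumed reading of the delay formula
    }


cost = array_multiplier_cost(4)
print(cost)
```

Both the area and the delay grow with N, quadratically and linearly respectively, which motivates the tree structures that follow.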

Wallace Tree

Array Multiplier + Wallace Tree

Baugh-Wooley Algorithm (Concordia VLSI Lab, 12/7/2018)
· Convert negative partial products to positive representation.
· No sign extension required.

Examples of 5-by-5 Baugh-Wooley
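The idea of replacing negative partial products by positive ones can be sketched in Python. The version below follows the commonly taught modified Baugh-Wooley arrangement (complemented cross terms plus two constant ones); it is an assumption that the slides use this exact variant, and the function name is chosen here for illustration.

```python
def baugh_wooley(a: int, b: int, n: int = 5) -> int:
    """Multiply two n-bit two's complement integers using only positive bit terms."""
    a_bits = [(a >> i) & 1 for i in range(n)]   # two's complement bits of a
    b_bits = [(b >> i) & 1 for i in range(n)]
    total = 0
    # Ordinary AND terms for the non-sign bits.
    for i in range(n - 1):
        for j in range(n - 1):
            total += (a_bits[i] & b_bits[j]) << (i + j)
    # Sign bit times sign bit has positive weight 2^(2n-2).
    total += (a_bits[n - 1] & b_bits[n - 1]) << (2 * n - 2)
    # Cross terms with a sign bit are complemented instead of subtracted.
    for i in range(n - 1):
        total += (1 - (a_bits[n - 1] & b_bits[i])) << (i + n - 1)
        total += (1 - (a_bits[i] & b_bits[n - 1])) << (i + n - 1)
    # Constant correction bits at positions n and 2n-1.
    total += (1 << n) + (1 << (2 * n - 1))
    total &= (1 << (2 * n)) - 1                 # keep 2n result bits
    if total >> (2 * n - 1):                    # reinterpret as a signed value
        total -= 1 << (2 * n)
    return total


print(baugh_wooley(-16, 15), baugh_wooley(-7, -3))
```

Every row added is non-negative, so no row needs its sign bit replicated across the upper columns.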

Squarer using Baugh-Wooley Algorithm

Partial products of a7..a0 * a7..a0 (each row is shifted one position to the left of the row above):

a7*a0 a6*a0 a5*a0 a4*a0 a3*a0 a2*a0 a1*a0 a0*a0
a7*a1 a6*a1 a5*a1 a4*a1 a3*a1 a2*a1 a1*a1 a0*a1
a7*a2 a6*a2 a5*a2 a4*a2 a3*a2 a2*a2 a1*a2 a0*a2
a7*a3 a6*a3 a5*a3 a4*a3 a3*a3 a2*a3 a1*a3 a0*a3
a7*a4 a6*a4 a5*a4 a4*a4 a3*a4 a2*a4 a1*a4 a0*a4
a7*a5 a6*a5 a5*a5 a4*a5 a3*a5 a2*a5 a1*a5 a0*a5
a7*a6 a6*a6 a5*a6 a4*a6 a3*a6 a2*a6 a1*a6 a0*a6
a7*a7 a6*a7 a5*a7 a4*a7 a3*a7 a2*a7 a1*a7 a0*a7
-----------------------------------------------
'0' S15 S14 S13 S12 S11 S10 S9 S8 S7 S6 S5 S4 S3 S2 S1 S0

Example of an 8bit squarer

Array Multiplier 32bits by 32bits multiplier

Booth (Radix-4) Multiplier
· Radix-4 (3-bit recoding) reduces the number of partial products to be added by half.
· Great saving in area and increased speed.

$A = -a_{n-1}2^{n-1} + a_{n-2}2^{n-2} + a_{n-3}2^{n-3} + \dots + a_1 2 + a_0$
$B = -b_{n-1}2^{n-1} + b_{n-2}2^{n-2} + b_{n-3}2^{n-3} + \dots + b_1 2 + b_0$

· The base-4 redundant signed-digit representation of B is
$B = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i$

· $K_i$ is calculated by the following equation:
$K_i = -2b_{2i+1} + b_{2i} + b_{2i-1}, \quad i = 0, 1, 2, \dots, (n-2)/2$
· Three bits of the multiplier B, $b_{2i+1}$, $b_{2i}$, $b_{2i-1}$, are examined and the corresponding $K_i$ is calculated.
· B is always appended on the right with a zero ($b_{-1} = 0$), and n is always even (B is sign-extended if needed).
· The product AB is then obtained by adding n/2 partial products:
$AB = P = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i A$
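The recoding rule above translates almost directly into code. The following Python sketch computes the digits K_i from the multiplier bits (with b_{-1} = 0) and forms the product as the sum of the n/2 partial products 2^{2i} K_i A. The function names are chosen here for illustration; the operands 25 and -35 are the ones used in the template example later in the slides.

```python
def booth_radix4_digits(b: int, n: int) -> list[int]:
    """Radix-4 Booth digits K_i of the n-bit two's complement value b (n even), LSB first."""
    assert n % 2 == 0, "n must be even (sign-extend B if needed)"

    def bit(k: int) -> int:
        return 0 if k < 0 else (b >> k) & 1     # b_{-1} is the appended zero

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n // 2)]


def booth_multiply(a: int, b: int, n: int) -> int:
    """Sum of the n/2 partial products 2^(2i) * K_i * A."""
    return sum((k * a) << (2 * i) for i, k in enumerate(booth_radix4_digits(b, n)))


digits = booth_radix4_digits(-35, 8)
print(digits, booth_multiply(25, -35, 8))
```

Each digit lies in {-2, -1, 0, 1, 2}, so every partial product is obtained from A by at most a shift and a negation.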

Booth Algorithm
Decoding of the multiplier to generate signals for hardware use.
[Table: inputs Xi+1, Xi, Xi-1; outputs OP, NEG, ZERO, TWO]

Booth Algorithm
A Booth-recoded multiplier examines three bits of the multiplier at a time. It determines whether to add 0, +1, -1, +2, or -2 times the multiplicand at that rank. The operation to be performed is based on the current two bits of the multiplier and the previous bit.

Xi+1 Xi Xi-1 | Zi/2
0 0 0 | 0
0 0 1 | 1
0 1 0 | 1
0 1 1 | 2
1 0 0 | -2
1 0 1 | -1
1 1 0 | -1
1 1 1 | 0

Bits Xi Xi+1 Xi+2 (weights 2^1 2^0 2^-1) | Operation | M is multiplied by
0 0 0 | add zero (no string) | +0
0 0 1 | add multiplicand (end of string) | +X
0 1 0 | add multiplicand (a string) | +X
0 1 1 | add twice the multiplicand (end of string) | +2X
1 0 0 | subtract twice the multiplicand (beginning of string) | -2X
1 0 1 | subtract the multiplicand (-2X and +X) | -X
1 1 0 | subtract the multiplicand (beginning of string) | -X
1 1 1 | subtract zero (center of string) | -0

Booth Algorithm: dot notation
Multiplicand A = ● ● ● ●
Multiplier B = (● ●)(● ●)
Partial product bits ● ● ● ● : (B1B0)_2 · A · 4^0
Partial product bits ● ● ● ● : (B3B2)_2 · A · 4^1
Product P = ● ● ● ● ● ● ● ●

Example
The following example shows how the calculation is done.
Multiplicand X = 000011
Multiplier Y = 011101; with a 0 added to the multiplier on the right: 0 1 1 1 0 1 0
After Booth decoding, Y is decoded so as to multiply X by +2, -1 and +1 separately; each partial product is then shifted two bits relative to the previous one and they are added together.

X * +1    000000000011
X * -1    1111111101
X * +2    00000110
--------------------------------------------
          000001010111
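The worked example can be checked with a few lines of Python. The sketch below recodes Y = 011101, prints each sign-extended partial product as a full 12-bit two's complement row (so the shifted rows carry their trailing zeros explicitly), and adds the rows modulo 2^12. Variable and function names are chosen here for illustration.

```python
def radix4_digits(y: int, n: int) -> list[int]:
    """Booth radix-4 digits of the n-bit value y, least significant digit first."""
    bits = [0] + [(y >> k) & 1 for k in range(n)]   # bits[0] is the appended zero
    return [-2 * bits[2 * i + 2] + bits[2 * i + 1] + bits[2 * i] for i in range(n // 2)]


X, Y, N = 0b000011, 0b011101, 6
WIDTH = 2 * N                                       # 12-bit product
MASK = (1 << WIDTH) - 1

digits = radix4_digits(Y, N)
rows = [format(((k * X) << (2 * i)) & MASK, f"0{WIDTH}b") for i, k in enumerate(digits)]
result = format(sum(int(r, 2) for r in rows) & MASK, f"0{WIDTH}b")

for k, r in zip(digits, rows):
    print(f"X * {k:+d}  {r}")
print("sum     ", result)
```

The leading ones of the X * -1 row are the sign extension that the following slides set out to reduce.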

Sign Extension

Sign extension: traditional sign-extension scheme
· Segment the input operands based on the size of the embedded blocks.
· Multiply the segmented inputs and extend the sign bit of each partial product.
· Sum all partial products.
[Figure: segmented input operands, multiplication (×), sign-extended partial products, addition (+), final result]

Booth Algorithm-Example 1 Example 1:

Booth Algorithm Example 2 Notice sign extensions

Booth Algorithm-Example 3 Notice the sign extensions

Comparison of Booth and parallel multiplier shift and Add

Template to reduce sign extensions for Booth Algorithm
Note that each operand is 17 bits, i.e. the 17th bit is the sign bit. Negative numbers are entered in 1's complement, which is why the S bits must be added on the right-hand side of the diagram. If 2's complement is used, the S's on the right-hand side of the diagram can be removed.

Comparison of Template and the sign extension

Example of using the template: 25 * -35 with -35 as the multiplier, using 8-bit representation

Using the template, 25 * -35:
Sign bit 0 0 0 1 1 0 0 1
Add SS 1 1 0 1 1 1 0 1 0
Add inverted S
Add inverted sign and add 1 1 0 0 0 0 0 1 1 0 0 1    * 1
Add inverted sign bit 1 0 1 1 1 0 0 1 1 1    * -1
1 0 0 1 1 0 0 1 0    * 2
No sign bit 1 1 0 0 1 1 1    * -1
Sum: 1 1 1 1 0 0 1 0 0 1 0 1 0 1 (this is a negative number; convert it)
Magnitude: 0 0 0 0 1 1 0 1 1 0 1 0 1 1 = 512 + 256 + 64 + 32 + 8 + 2 + 1 = 875

Booth Multiplier Components
· Multiplicand
· Booth encoder
· PPU (partial products unit)
· PPA (partial products adding unit)
· Product

Wallace Tree and Ripple Carry Adder Structure of an 8*8 Multiplier with Pipeline

Hardware implementation of Booth with shift and add

Simulation Plan

Testing the Design

Simulation For Parallel Multipliers Signed Number: Unsigned Number:

Simulation for Signed S/P Multipliers
There is a 340 ns delay between the operands and the result because of the D flip-flop delays.

FPGA after implementation, areas of programming shown clearly

Another implementation of the above after pipelining; place and route has placed the design in different locations.

Spartacus FPGA board

Testing the multiplication system

Comparison of Multipliers
Table 7. Performance comparison for two's complement multipliers (by Chen Yaoquan, M.Eng. 2005)

Metric | Array | Modified Booth | Wallace Tree | Modified Booth-Wallace Tree | Twin Pipe Serial-Parallel | Behavioral
Area, total CLBs (#) | 3076.50 | 2649.50 | 3325.50 | 2672.50 | 490.00 | 2993.50
Maximum delay D (ns) | 35.78 | 24.43 | 18.93 | 18.53 | 107.52 (3.36x32) | 49.33
Total dynamic power P (W) | 7.52 | 6.33 | 7.46 | 6.41 | 0.28 | 6.24
Delay·Power product DP (ns W) | 268.98 | 154.64 | 141.14 | 118.76 | 30.62 | 307.58
Area·Power product AP (# W) | 23128.20 | 16771.60 | 24793.93 | 17127.79 | 139.54 | 18665.07
Area·Delay product AD (# ns) | 1.10E+05 | 6.47E+04 | 6.30E+04 | 4.95E+04 | 5.27E+04 | 1.48E+05
Area·Delay^2 product AD^2 (# ns^2) | 3.94E+06 | 1.58E+06 | 1.19E+06 | 9.18E+05 | 5.66E+06 | 7.28E+06

Comparison of Multipliers
Table 7. Performance comparison for unsigned multipliers (by Chen Yaoquan, M.Eng. 2005)

Metric | Array | Modified Booth | Wallace Tree | Modified Booth-Wallace Tree | Twin Pipe Serial-Parallel | Behavioral
Area, total CLBs (#) | 3280.50 | 2800.00 | 3321.50 | 2845.50 | 487.00 | 3003.00
Maximum delay D (ns) | 37.23 | 25.33 | 18.93 | 18.33 | 107.52 | 44.50
Total dynamic power P (W) | 7.57 | 6.66 | 7.32 | [missing] | 0.29 | 6.26
Delay·Power product DP (ns W) | 281.88 | 168.77 | 138.60 | 122.13 | 30.66 | 278.53
Area·Power product AP (# W) | 24837.98 | 18656.40 | 24319.36 | 18959.57 | 138.89 | 18795.78
Area·Delay product AD (# ns) | 1.22E+05 | 7.09E+04 | 6.29E+04 | 5.22E+04 | 5.24E+04 | 1.34E+05
Area·Delay^2 product AD^2 (# ns^2) | 4.55E+06 | 1.80E+06 | 1.19E+06 | 9.56E+05 | 5.63E+06 | 5.95E+06

Comparison of Multipliers
Change the value of "set_max_delay" in the script file:

set_max_delay (ns) | 10 | 20 | 30 | 40 | 50 | 60 | >60
Area (#) | 3014.5 | 3013.0 | 3110.0 | 3193.5 | 3019.5 | 2999.5 | 2978.5
Power (W) | 6.6499 | 6.6470 | 7.5683 | 8.1878 | 8.0645 | 8.0419 | 8.0156
Delay (ns), six values only, column alignment not recoverable: 31.98, 30.93, 30.08, 39.93, 49.88, 59.63

The relation of area and delay for the behavioral multiplier: "banana curve"

Comparison of Multipliers (by Chen Yaoquan, M.Eng. 2005)
Columns: Array Multiplier, Modified Booth Multiplier, Wallace-Tree Multiplier, Modified Booth-Wallace Tree Multiplier, Twin Pipe Serial-Parallel Multiplier, Behavioral Multiplier
Values as they survive in the transcript (repeated entries were dropped, so the column alignment is not recoverable):
· Area: Medium, Small, Large, Smallest
· Critical delay: Fast, Very Fast, Fastest, Very Large
· Power consumption: (no values survive)
· Complexity: Simple, Complex, More Complex, Simplest
· Implementation: Easy, Difficult, Easiest

Pipelining Simulation

Synthesis for Signed Multipliers Array Modified Booth Wallace Tree Modified Booth -Wallace Tree Twin Pipe S/P Behavioral

Synthesis for Unsigned Multipliers Array Modified Booth Wallace Tree Modified Booth -Wallace Tree Twin Pipe S/P Behavioral

Conclusion
· Modified Booth and Wallace Tree are the best techniques for high-speed multiplication.
· Wallace Tree has the best performance, but it is hard to implement.
· Booth-algorithm-based multipliers have lower area among parallel multipliers.
· For behavioral multipliers, the area increases as the delay decreases.

Comparison

Metric | Array | Modified Booth | Wallace Tree | Modified Booth & Wallace Tree | Twin Pipe Serial-Parallel
Area, total CLBs (#) | 1165 | 1292 | 1659 | 1239 | 133
Maximum delay (ns) | 187.87 | 139.41 | 101.14 | 101.43 | 22.58 (722.56)
Power consumption at highest speed (mW) | 16.6506 (at 188 ns) | 23.136 (at 140 ns) | 30.95 (at 101.14 ns) | 30.862 (at 101.43 ns) | 2.089 (at 722.56 ns)
Delay·Power product DP (ns mW) | 3128.15 | 3225.39 | 3130.28 | 3130.33 | 1509.42
Area·Power product AP (# mW) | 19.397 x 10^3 | 29.891 x 10^3 | 51.346 x 10^3 | 38.238 x 10^3 | 277.837
Area·Delay product AD (# ns) | 218.868 x 10^3 | 180.118 x 10^3 | 167.791 x 10^3 | 125.671 x 10^3 | 96.101 x 10^3
Area·Delay^2 product AD^2 (# ns^2) | 41.119 x 10^6 | 25.110 x 10^6 | 16.970 x 10^6 | 12.747 x 10^6 | 69.438 x 10^6

NOTICE · The rest of these slides are for extra information only and are not part of the lecture

Array Addition

Addition of 8 binary numbers using the Wallace tree principle
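A minimal Python sketch of this principle, assuming word-level 3:2 carry-save adders: each level compresses every group of three operands into a sum word and a carry word until only two words remain, and a single carry-propagate addition finishes the job. The function names and the sample operands are chosen here for illustration.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compressor on whole words: x + y + z == s + c, with no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_sum(operands: list[int]) -> tuple[int, int]:
    """Reduce non-negative operands with 3:2 levels; return (sum, number of levels)."""
    rows = list(operands)
    levels = 0
    while len(rows) > 2:
        grouped = len(rows) - len(rows) % 3        # rows that form complete triples
        nxt = []
        for k in range(0, grouped, 3):
            nxt.extend(carry_save_add(*rows[k:k + 3]))
        nxt.extend(rows[grouped:])                 # leftover rows pass through
        rows = nxt
        levels += 1
    return sum(rows), levels                       # final carry-propagate addition


total, levels = wallace_sum([1, 2, 3, 4, 5, 6, 7, 8])
print(total, levels)
```

For 8 operands the row count goes 8, 6, 4, 3, 2, so four carry-save levels precede the final adder.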

Baugh-Wooley two's complement multiplier:

Cluster Multipliers
· Divide the multiplier into smaller multipliers.
[Figure: 8-bit cluster low-power multiplier and the circuit used to generate the enable signal]
· The multiplication circuit is divided into clusters (blocks) of smaller multipliers.
· Clock gating techniques are applied to disable the blocks that are producing a zero result.
· Feature: low power (claims 13.4% savings).
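The clustering idea can be modelled in software as follows. This Python sketch splits two unsigned 8-bit operands into 4-bit chunks and only evaluates a block when both of its input chunks are non-zero. The enable condition, the chunk size, and the function name are assumptions made here for illustration; the slides do not spell out the actual enable logic, and real clock gating happens in hardware rather than in a loop.

```python
def cluster_multiply(a: int, b: int, width: int = 8, cluster: int = 4) -> tuple[int, int]:
    """Unsigned multiply built from cluster x cluster blocks; returns (product, active blocks)."""
    mask = (1 << cluster) - 1
    total = 0
    active = 0
    for i in range(0, width, cluster):
        for j in range(0, width, cluster):
            a_chunk = (a >> i) & mask
            b_chunk = (b >> j) & mask
            enable = a_chunk != 0 and b_chunk != 0   # assumed gating condition
            if enable:
                active += 1
                total += (a_chunk * b_chunk) << (i + j)
    return total, active


print(cluster_multiply(0x0F, 0x30))   # only one of the four blocks is enabled
```

With small or sparse operands most blocks stay disabled, which is where the claimed power saving would come from.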

Multiplexer-Based Array Multipliers
[Figure: array structure built from the terms Zij and xjyj]

Two types of cells:
· Cell 1: produces the terms Zij 2^j and includes a full adder of the carry-save adder array.
· Cell 2: produces the terms xjyj 2^j and includes a full adder of the carry-save adder array.

Characteristics:
· Faster than Modified Booth.
· Unlike Booth, does not require encoding logic.
· Requires approximately N^2/2 cells.
· Has a zigzag shape, thus not layout-friendly.

Improvement:
· More rectangular layout.
· Saves up to 40 percent area without penalties.
· Outperforms the modified Booth multiplier in both speed and power by 13% to 26%.

Gray-Encoded Array Multiplier
2's complement to hybrid coding:

Dec Hyb  | Dec Hyb  | Dec Hyb  | Dec Hyb
0  0000  | 4  0100  | -8  1100 | -4  1000
1  0001  | 5  0101  | -7  1101 | -3  1001
2  0011  | 6  0111  | -6  1111 | -2  1011
3  0010  | 7  0110  | -5  1110 | -1  1010

· Having a single bit different for consecutive values.
· Reducing the number of transitions, and thus power (for highly correlated streams).
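The table entries are consistent with Gray-coding each 2-bit group of the 2's complement pattern independently. That reading is an inference from the sixteen values shown, not something the slide states, so the Python sketch below should be taken as one possible description of the hybrid code; the function name is chosen here for illustration.

```python
def hybrid_code(value: int, bits: int = 4) -> str:
    """Gray-code every 2-bit group of the two's complement pattern of value."""
    pattern = value & ((1 << bits) - 1)            # two's complement bit pattern
    out = 0
    for shift in range(0, bits, 2):
        group = (pattern >> shift) & 0b11
        out |= (group ^ (group >> 1)) << shift     # 2-bit Gray code: 00, 01, 11, 10
    return format(out, f"0{bits}b")


for v in (0, 1, 2, 3, -8, -1):
    print(v, hybrid_code(v))
```

Within each group of four consecutive values only one bit changes per step, which is the property the slide relies on to reduce switching activity.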

Gray-Encoded Array Multiplier An 8-bit wide 2’s complement radix-4 array multiplier

Gray-Encoded Array Multiplier: Characteristics
· Uses Gray code to reduce the switching activity of the multiplier.
· Uses 45.6% less power than Modified Booth.
· Uses 26.4% more area than Modified Booth.

Ultra-high Speed Parallel Multiplier
How is the ultra-high speed achieved?
· Based on the Modified Booth algorithm and a tree structure (column compression).
· Chooses efficient counters (3:2 and 5:3).
· Uses the new compressor (20% faster).
· Uses the First Partial product Addition (FPA) algorithm (reducing the bits of the CLA by 50%).

Ultra-high Speed Parallel Multiplier: calculation process using parallel counters in the 16x16 case
· Divide into 3 rows or 5 rows only (most efficient).
· Calculate the partial products as soon as possible.
· The final CLA is only 16-bit instead of 32-bit.
· In total, the delay is reduced by about 30%.

ULLRLF Multiplier
ULLRLF stands for Upper/Lower Left-to-Right Leapfrog. It combines the following techniques:
· Signal flow optimization in the [3:2] adder array for partial product reduction,
· Left-to-right leapfrog (LRLF) signal flow,
· Splitting of the reduction array into upper/lower parts.

ULLRLF Multiplier: 1) Signal flow optimization in the [3:2] adder array
· PPij is always connected to pin A.
· Sin/Cin are connected to B/C; most Sin signals are connected to C.
· For n = 32, the delay is reduced by 30 percent.
· Power is saved as well.

ULLRLF Multiplier: 2) Left-to-Right Leapfrog (LRLF) structure
· The sum signals skip over alternate rows.
· The signal delays are better balanced.
· Low power.

ULLRLF Multiplier: 3) Upper/Lower split structure
· Only n+2 bits.
· The long data path is broken into parallel short paths, which saves power.
· The delay of the partial product reduction is reduced.

ULLRLF Multiplier
· ULLRLF multipliers use less power than optimized tree multipliers for n ≤ 32 while keeping similar delay and area.
· With more regularity and inherently shorter interconnects, the ULLRLF structure presents a competitive alternative to tree structures.
[Figure: floorplan of ULLRLF (n = 32)]

Signed Array Multiplier

Unsigned Array Multiplier

Signed Modified Booth Multiplier

Signed Modified Booth Multiplier

Unsigned Modified Booth Multiplier

Unsigned Modified Booth Multiplier

Wallace Tree multipliers

Wallace Tree multipliers
· Use 3:2 counters and 2:2 counters.
· Number of levels = log (32/2) / log (3/2) ≈ 8
· Irregular structure.
· Fast.
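The level count can also be obtained by simulating the reduction directly. In the Python sketch below each level turns every group of three rows into two and passes leftover rows through, until two rows remain for the final adder. The closed-form expression is evaluated alongside for comparison; it comes out a little below the simulated count, so it is best read as an estimate. The function name is chosen here for illustration.

```python
import math


def wallace_levels(rows: int) -> int:
    """Number of 3:2 reduction levels needed to bring `rows` operands down to 2."""
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3   # each full triple becomes two rows
        levels += 1
    return levels


estimate_32 = math.log(32 / 2) / math.log(3 / 2)   # closed-form estimate
print(wallace_levels(32), round(estimate_32, 2))
print(wallace_levels(16))                           # 16 rows, as after Booth recoding
```

For 32 rows the sequence is 32, 22, 15, 10, 7, 5, 4, 3, 2, which gives 8 levels; halving the row count to 16 removes two of them.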

Wallace Tree multipliers 2-level hierarchical

Modified Booth-Wallace Tree Multipliers

Modified Booth-Wallace Tree Multipliers
· Use 3:2 counters and 2:2 counters.
· Number of levels = log (16/2) / log (3/2) ≈ 6
· Irregular structure.
· Fast.
· Less area.

Twin pipe serial-parallel multipliers

Signed twin pipe serial-parallel multipliers “Sign” control line and the sign-change hardware

Unsigned twin pipe serial-parallel multipliers Don’t need the “Sign” control line and the sign-change hardware