Multiplier Design [Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]

Slides:



Advertisements
Similar presentations
EE141 © Digital Integrated Circuits 2nd Arithmetic Circuits 1 Digital Integrated Circuits A Design Perspective Arithmetic Circuits Jan M. Rabaey Anantha.
Advertisements

Using Carry-Save Adders For Radix- 4, Can Be Used to Generate 3a – No Booth’s Slight Delay Penalty from CSA – 3 Gates.
Datapath Functional Units. Outline  Comparators  Shifters  Multi-input Adders  Multipliers.
EE141 © Digital Integrated Circuits 2nd Arithmetic Circuits 1 [Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.]
Modern VLSI Design 2e: Chapter 6 Copyright  1998 Prentice Hall PTR Topics n Multipliers.
Copyright 2008 Koren ECE666/Koren Part.6b.1 Israel Koren Spring 2008 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Digital Computer.
EECS Components and Design Techniques for Digital Systems Lec 18 – Arithmetic II (Multiplication) David Culler Electrical Engineering and Computer.
UNIVERSITY OF MASSACHUSETTS Dept
Contemporary Logic Design Arithmetic Circuits © R.H. Katz Lecture #24: Arithmetic Circuits -1 Arithmetic Circuits (Part II) Randy H. Katz University of.
EE466: VLSI Design Lecture 14: Datapath Functional Units.
Introduction to CMOS VLSI Design Datapath Functional Units
ECE 301 – Digital Electronics
Multiplication.
Lecture 18: Datapath Functional Units
Aug Shift Operations Source: David Harris. Aug Shifter Implementation Regular layout, can be compact, use transmission gates to avoid threshold.
Chapter 6-2 Multiplier Multiplier Next Lecture Divider
Review: Basic Building Blocks  Datapath l Execution units -Adder, multiplier, divider, shifter, etc. l Register file and pipeline registers l Multiplexers,
Digital Integrated Circuits Chpt. 5Lec /29/2006 CSE477 VLSI Digital Circuits Fall 2002 Lecture 21: Multiplier Design Mary Jane Irwin (
EE141 © Digital Integrated Circuits 2nd Arithmetic Circuits 1 Digital Integrated Circuits A Design Perspective Arithmetic Circuits Reference: Digital Integrated.
Advanced VLSI Design Unit 05: Datapath Units. Slide 2 Outline  Adders  Comparators  Shifters  Multi-input Adders  Multipliers.
FPGA-Based System Design: Chapter 4 Copyright  2004 Prentice Hall PTR Topics n Multipliers.
CSE477 L24 RAM Cores.1Irwin&Vijay, PSU, 2002 CSE477 VLSI Digital Circuits Fall 2002 Lecture 24: RAM Cores Mary Jane Irwin ( )
EE141 © Digital Integrated Circuits 2nd Arithmetic Circuits 1 Digital Integrated Circuits A Design Perspective Arithmetic Circuits Jan M. Rabaey Anantha.
Cost/Performance Tradeoffs: a case study
CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 19: Adder Design
Digital Integrated Circuits 2e: Chapter Copyright  2002 Prentice Hall PTR, Adapted by Yunsi Fei ECE 300 Advanced VLSI Design Fall 2006 Lecture.
Topics Multipliers..
CPEN Digital System Design
Full Tree Multipliers All k PPs Produced Simultaneously Input to k-input Multioperand Tree Multiples of a (Binary, High-Radix or Recoded) Formed at Top.
CSE477 VLSI Digital Circuits Fall 2002 Lecture 20: Adder Design
Sp09 CMPEN 411 L21 S.1 CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 21: Shifters, Decoders, Muxes [Adapted from Rabaey’s Digital Integrated Circuits,
CSE477 L21 Multiplier Design.1Irwin&Vijay, PSU, 2002 CSE477 VLSI Digital Circuits Fall 2002 Lecture 21: Multiplier Design Mary Jane Irwin (
EE141 Arithmetic Circuits 1 Chapter 14 Arithmetic Circuits Rev /12/2003 Rev /05/2003.
Logic Design (CE1111 ) Lecture 4 (Chapter 4) Combinational Logic Prepared by Dr. Lamiaa Elshenawy 1.
EE141 Arithmetic Circuits 1 Chapter 14 Arithmetic Circuits Rev /12/2003.
CSE575 Multiplication.1 © MJIrwin, PSU, 2005 Computer Arithmetic CSE 575 Computer Arithmetic Spring 2005 Mary Jane Irwin (
CSE477 L20 Adder Design.1Irwin&Vijay, PSU, 2003 CSE477 VLSI Digital Circuits Fall 2003 Lecture 20: Adder Design Mary Jane Irwin (
Combinational Circuits
CSE477 VLSI Digital Circuits Fall 2003 Lecture 21: Multiplier Design
UNIVERSITY OF MASSACHUSETTS Dept
CSE 575 Computer Arithmetic Spring 2003 Mary Jane Irwin (www. cse. psu
Homework Reading Machine Projects Labs
Multipliers Multipliers play an important role in today’s digital signal processing and various other applications. The common multiplication method is.
Instructor: Alexander Stoytchev
Chap. 8 Datapath Units: Multiplier Design
Mary Jane Irwin ( ) CSE477 VLSI Digital Circuits Fall 2002 Lecture 22: Shifters, Decoders, Muxes Mary Jane.
Digital Building Blocks
Mary Jane Irwin ( ) CSE477 VLSI Digital Circuits Fall 2003 Lecture 22: Shifters, Decoders, Muxes Mary Jane.
Unsigned Multiplication
VLSI Arithmetic Lecture 10: Multipliers
Digital Integrated Circuits A Design Perspective
CSE 370 – Winter 2002 – Comb. Logic building blocks - 1
Topics Multipliers..
CS 140 Lecture 14 Standard Combinational Modules
Review: Basic Building Blocks
Homework Reading Machine Projects Labs
Part III The Arithmetic/Logic Unit
CSE 140 Lecture 14 Standard Combinational Modules
UNIVERSITY OF MASSACHUSETTS Dept
UNIVERSITY OF MASSACHUSETTS Dept
Combinational Circuits
UNIVERSITY OF MASSACHUSETTS Dept
Lecture 9 Digital VLSI System Design Laboratory
Comparison of Various Multipliers for Performance Issues
UNIVERSITY OF MASSACHUSETTS Dept
Arithmetic Building Blocks
ΗΜΥ 307 ΨΗΦΙΑΚΑ ΟΛΟΚΛΗΡΩΜΕΝΑ ΚΥΚΛΩΜΑΤΑ Εαρινό Εξάμηνο 2019 ΔΙΑΛΕΞΕΙΣ 14-15: Κυκλώματα Αριθμητικής και Λογικής Other handouts To handout next time ΧΑΡΗΣ.
Instruction execution and ALU
Arithmetic Circuits.
UNIVERSITY OF MASSACHUSETTS Dept
Presentation transcript:

Multiplier Design [Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]

Review: Basic Building Blocks Datapath Execution units Adder, multiplier, divider, shifter, etc. Register file and pipeline registers Multiplexers, decoders Control Finite state machines (PLA, ROM, random logic) Interconnect Switches, arbiters, buses Memory Caches (SRAMs), TLBs, DRAMs, buffers

The Binary Multiplication

Multiply Operation Multiplication is just a a lot of additions N multiplicand multiplier can be formed in parallel partial product array N double precision product 2N

Multiplication Approaches Right shift and add Partial product array rows are accumulated from top to bottom on an N-bit adder After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add Time for N bits Tserial_mult = O(N Tadder) = O(N2) for a RCA Making it faster Use a faster adder Use higher radix (e.g., base 4) multiplication – O(N/2 Tadder) Use multiplier recoding to simplify multiple formation (booth) Form the partial product array in parallel and add it in parallel Right shift approach (almost) always used because left shift requires 2n bit adder Making it smaller (i.e., slower) Use serial-parallel mult Use an array multiplier Very regular structure with only short wires to nearest neighbor cells. Thus, very simple and efficient layout in VLSI Can be easily and efficiently pipelined

Serial-parallel multiplier structure One simple,small way to implement is the serial-parallel multiplier. So called because the n-bit multiplier is fed in serially while the m bit multiplicand is held in parallel.

The Array Multiplier

Booth multiplier Encoding scheme to reduce number of stages in multiplication. Performs two bits of multiplication at once—requires half the stages. Each stage is slightly more complex than simple multiplier, but adder/subtracter is almost as small/fast as adder.

Booth encoding Two’s-complement form of multiplier: y = -2nyn + 2n-1yn-1 + 2n-2yn-2 + ... (first bit is the sign bit) (example, y=18=010010 y= -18 = 101110 ) Rewrite using 2a = 2a+1 - 2a: y = 2n(yn-1-yn) + 2n-1(yn-2 -yn-1) + 2n-2(yn-3 -yn-2) + ... Consider first two terms: by looking at three bits of y, we can determine whether to add x, 2x to partial product.

Booth actions y = 2n(yn-1-yn) + 2n-1(yn-2 -yn-1) + 2n-2(yn-3 -yn-2) + ... Consider first two terms: by looking at three bits of y, we can determine whether to add x, 2x to partial product. yi yi-1 yi-2 increment 0 0 0 0 0 0 1 x 0 1 0 x 0 1 1 2x 1 0 0 -2x 1 0 1 -x 1 1 0 -x 1 1 1 0

Booth example x = 1001 (910), y = 0111 (710). P0 = 00000000 y3y2y1=011 y1y0y-1=11(0) y1y0y-1 = 110, P1 = P0 - (1001) = 11110111 x shift left for 2 bits to be 100100 y3y2y1 = 011, P2 = P1 (10*100100) = 11110111+01001000 = 001111111 (6310) An array multiplier needs N addtions, booth multiplier needs only N/2 additions

Review: A 64-bit Adder/Subtractor add/subt C0=Cin Ripple Carry Adder (RCA) built out of 64 FAs Subtraction – complement all subtrahend bits (xor gates) and set the low order carry-in RCA advantage: simple logic, so small (low cost) disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption) A0 1-bit FA S0 B0 C1 A1 1-bit FA S1 B1 C2 A2 1-bit FA S2 B2 C3 . . . C63 A63 1-bit FA S63 B63 C64=Cout

Booth structure See Wayne Wolf book the

Wallace-Tree Multiplier

Wallace-Tree Multiplier Full adder = (3,2) compressor

Making it Faster: Tree Multiplier Structure Q (‘ier) D (‘icand) D multiple forming circuits partial product array reduction tree mux + reduction tree (log N) CPA (log N) interconnect fast carry propagate adder (CPA) P (product)

(4,2) Counter Built out of two (3,2) counters (just FA’s!) all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position) the internal carry output is fed to the next higher weight position (indicated by the ) (3,2) a balanced delay tree 2 csa delays total Note: Two carry outs - one “internal” and one “external” (3,2)

Tiling (4,2) Counters Reduces columns four high to columns only two high Tiles with neighboring (4,2) counters Internal carry in at same “level” (i.e., bit position weight) as the internal carry out (3,2) (3,2) (3,2) (3,2) (3,2) (3,2) For lecture a balanced delay tree 2 csa delays total

4x4 Partial Product Array Reduction Fast 4x4 multiplication using (4,2) counters How would you lay it out? multiplicand multiplier five (4,2) counters 5-bit CPA multiplicand multiplier 8-bit product partial product array reduced pp array (to CPA) For lecture double precision product

8x8 Partial Product Array Reduction ‘icand Wallace tree multiplier ‘ier partial product array two rows of nine (4,2) counters reduced partial product array one row of thirteen (4,2) counters Completely populate (costs more in terms of (4,2) counters) – advantage is the CPA doesn’t have to be as wide, so the multiplier faster, and the reduction tree is more “regular” to a 13-bit fast CPA

An 8x8 Multiplier Layout How should it be laid out? multiplicand nine (4,2) counters thirteen (4,2) counters 13-bit CPA

Multipliers —Summary

Next Lecture and Reminders Shifters, decoders, and multiplexers Reading assignment – Rabaey, et al, 11.5-11.6