CSE477 VLSI Digital Circuits Fall 2003 Lecture 21: Multiplier Design

CSE477 VLSI Digital Circuits Fall 2003 Lecture 21: Multiplier Design
Mary Jane Irwin ( ) [Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, © J. Rabaey, A. Chandrakasan, B. Nikolic]

Review: Basic Building Blocks
Datapath Execution units Adder, multiplier, divider, shifter, etc. Register file and pipeline registers Multiplexers, decoders Control Finite state machines (PLA, ROM, random logic) Interconnect Switches, arbiters, buses Memory Caches (SRAMs), TLBs, DRAMs, buffers

Review: Adder Comparisons
CSkip best area*speed (N) MCC RCA best area* power*speed (N) KS PPA best speed (logN)

Multiply Operation Multiplication is just a a lot of additions N
multiplicand multiplier can be formed in parallel partial product array N double precision product 2N

Multiplication Approaches
Right shift and add Partial product array rows are accumulated from top to bottom on an N-bit adder After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add Time for N bits Tserial_mult = O(N Tadder) = O(N2) for a RCA Making it faster Use a faster adder Use higher radix (e.g., base 4) multiplication – O(N/2 Tadder) Use multiplier recoding to simplify multiple formation Form the partial product array in parallel and add it in parallel Right shift approach (almost) always used because left shift requires 2n bit adder Making it smaller (i.e., slower) Use an array multiplier Very regular structure with only short wires to nearest neighbor cells. Thus, very simple and efficient layout in VLSI Can be easily and efficiently pipelined

Making it Faster: Tree Multiplier Structure
Q (‘ier) D (‘icand) D multiple forming circuits partial product array reduction tree mux + reduction tree (log N) CPA (log N) interconnect fast carry propagate adder (CPA) P (product)

(4,2) Counter Built out of two (3,2) counters (just FA’s!)
all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position) the internal carry output is fed to the next higher weight position (indicated by the ) (3,2) a balanced delay tree 2 csa delays total Note: Two carry outs - one “internal” and one “external” (3,2)

Tiling (4,2) Counters Reduces columns four high to columns only two high Tiles with neighboring (4,2) counters Internal carry in at same “level” (i.e., bit position weight) as the internal carry out (3,2) (3,2) (3,2) (3,2) (3,2) (3,2) For lecture a balanced delay tree 2 csa delays total

4x4 Partial Product Array Reduction
Fast 4x4 multiplication using (4,2) counters How would you lay it out? multiplicand multiplier five (4,2) counters 5-bit CPA multiplicand multiplier 8-bit product partial product array reduced pp array (to CPA) For lecture double precision product

8x8 Partial Product Array Reduction
‘icand Wallace tree multiplier ‘ier partial product array two rows of nine (4,2) counters Completely populate (costs more in terms of (4,2) counters) – advantage is the CPA doesn’t have to be as wide, so the multiplier faster, and the reduction tree is more “regular” reduced partial product array one row of thirteen (4,2) counters to a 13-bit fast CPA

An 8x8 Multiplier Layout How should it be laid out? multiplicand
nine (4,2) counters thirteen (4,2) counters 13-bit CPA

A Better 8x8 Multiplier Layout
A better layout that focuses on interconnect multiplicand multiple generators 2 nine (4,2) counters thirteen (4,2) counters multiplier nine (4,2) counters need 4 multiples per 4,2 counter slice of same weight CPA

A 16x16 Multiplier Layout . . . multiplicand multiple generators
multiple selection signals (‘ier) 2 multiple generators (4,2) counter slice (4,2) counter slice (4,2) counter slice need 4 multiples per 4,2 counter slice of same weight CPA

Why Not Recode ? Multiplier recoding (modified Booth’s, canonical, …) recode the multiplier to allow base 4 multiplication with simple multiple formation without recoding have the base 4 multiplier digit set of 0, 1, 2, 3 with recoding have the base 4 multiplier digit set of -2, -1, 0, 1, 2 Thus, with recoding the initial partial product array is only N/2 high N But, the first level of (4,2) counters also reduces the partial product array to N/2 high N/2 2N Which is better depends on the logic delay (recoding wins) and interconnect complexity (counters win big)

Next Lecture and Reminders
Shifters, decoders, and multiplexers Reading assignment – Rabaey, et al, Reminders HW#4 due today HW#5 will (optional) due November 20th Project final reports due December 4th Final grading negotiations/correction (except for the final exam) must be concluded by December 10th Final exam scheduled Tuesday, December 16th from 10:10 to noon in 118 and 113 Thomas

CSE477 VLSI Digital Circuits Fall 2003 Lecture 21: Multiplier Design

Similar presentations

Presentation on theme: "CSE477 VLSI Digital Circuits Fall 2003 Lecture 21: Multiplier Design"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CSE477 VLSI Digital Circuits Fall 2003 Lecture 21: Multiplier Design

Similar presentations

Presentation on theme: "CSE477 VLSI Digital Circuits Fall 2003 Lecture 21: Multiplier Design"— Presentation transcript:

Similar presentations

About project

Feedback