Published by Egbert Warner. Modified over 9 years ago.
1
Digital Integrated Circuits Chpt. 5 Lec. 01 - 08/29/2006
CSE477 VLSI Digital Circuits, Fall 2002
Lecture 21: Multiplier Design
Mary Jane Irwin (www.cse.psu.edu/~mji)
www.cse.psu.edu/~cg477
[Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.]
2
Review: Basic Building Blocks
• Datapath
  – Execution units
    » Adder, multiplier, divider, shifter, etc.
  – Register file and pipeline registers
  – Multiplexers, decoders
• Control
  – Finite state machines (PLA, ROM, random logic)
• Interconnect
  – Switches, arbiters, buses
• Memory
  – Caches (SRAMs), TLBs, DRAMs, buffers
3
Review: Binary Adder Landscape
[Figure: taxonomy tree of binary adders]
• Synchronous word parallel adders
  – Ripple carry adders (RCA)
  – Carry prop min adders
    » Signed-digit adders
    » Fast carry prop adders: Manchester carry chain, carry select, parallel prefix, conditional sum, carry skip
    » Residue adders
• Time/area labels shown in the figure
  – T = O(N), A = O(N)
  – T = O(1), A = O(N)
  – T = O(log N), A = O(N log N)
  – T = O(√N), A = O(N)
  – T = O(N), A = O(N)
4
Multiply Operation
• Multiplication as repeated additions
[Figure: an N-bit multiplicand and an N-bit multiplier produce a partial product array, whose rows can be formed in parallel; summing the array gives the 2N-bit double precision product]
5
Shift & Add Multiplication
• Right shift and add
  – Partial product array rows are accumulated from top to bottom on an N-bit adder
  – After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add
  – Time for N bits: T_serial_mult = O(N · T_adder) = O(N^2) for an RCA
• Making it faster
  – Use a faster adder
  – Use higher radix (e.g., base 4) multiplication
    » Use multiplier recoding to simplify multiple formation
  – Form partial product array in parallel and add it in parallel
• Making it smaller (i.e., slower)
  – Use an array multiplier
    » Very regular structure with only short wires to nearest neighbor cells, so the VLSI layout is very simple and efficient
    » Can be easily and efficiently pipelined
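The right-shift-and-add procedure can be sketched in software. This is a minimal model, assuming unsigned N-bit operands; the function name `shift_add_multiply` and the register names `acc` and `low` are illustrative and not taken from the lecture.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Multiply two unsigned n-bit integers by right shift and add."""
    mask = (1 << n) - 1
    assert 0 <= multiplicand <= mask and 0 <= multiplier <= mask
    acc = 0           # upper half of the product register (plus the adder carry)
    low = multiplier  # lower half: multiplier bits are replaced by product bits
    for _ in range(n):
        if low & 1:
            # Current multiplier bit is 1: add the multiplicand (one N-bit add).
            acc += multiplicand
        # Right shift the combined (acc, low) register by one bit.
        low = (low >> 1) | ((acc & 1) << (n - 1))
        acc >>= 1
    return (acc << n) | low


print(shift_add_multiply(0b0110, 0b1001, 4))  # 6 * 9
```

The loop performs N additions on an N-bit adder, which is where the O(N · T_adder) estimate comes from.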
6
Tree Multiplier Structure
[Figure: multiple forming circuits (muxes selecting 0 or D, the ‘icand, under control of Q, the ‘ier) build the partial product array, which feeds a reduction tree and then a fast carry propagate adder (CPA) that produces P, the product]
• Delay: mux + reduction tree (log N) + CPA (log N)
7
(4,2) Counter
• Built out of two (3,2) counters (just FA’s!)
  – All of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position)
  – The internal output is carried to the next higher weight position (marked in the figure)
• Note: two carry outs, one “internal” and one “external”
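A bit-level sketch of this construction, assuming the common wiring in which the first full adder takes three of the external inputs and the second takes its sum, the fourth input, and the internal carry in. The names `full_adder` and `counter_4_2` are illustrative.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """(3,2) counter: returns (sum, carry) for three 1-bit inputs."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def counter_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """(4,2) counter built from two full adders.

    Returns (sum, external_carry, internal_carry_out). Both carries have
    twice the weight of the inputs.
    """
    s1, internal_carry_out = full_adder(x1, x2, x3)
    s, external_carry = full_adder(s1, x4, cin)
    return s, external_carry, internal_carry_out


print(counter_4_2(1, 1, 1, 1, 1))
```

In this wiring the internal carry out does not depend on the internal carry in, so neighboring counters do not form a ripple chain.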
8
Tiling (4,2) Counters
• Reduces columns four high to columns only two high
  – Tiles with neighboring (4,2) counters
  – Internal carry in at same “level” (i.e., bit position weight) as the internal carry out
10
4x4 Partial Product Array Reduction
• Fast 4x4 multiplication using (4,2) counters
[Figure: multiplicand and multiplier, partial product array, reduced pp array (to CPA), double precision product]
12
8x8 Partial Product Array Reduction
[Figure: ‘icand, ‘ier, partial product array]
How many (4,2) counters minimum are needed to reduce it to 2 rows?
13
8x8 Partial Product Array Reduction
[Figure: ‘icand, ‘ier, partial product array, reduced partial product array]
How many (4,2) counters minimum are needed to reduce it to 2 rows? Answer: 24
14
Alternate 8x8 Partial Product Array Reduction
[Figure: ‘icand, ‘ier, partial product array, reduced partial product array]
More (4,2) counters, so what is the advantage?
15
Array Reduction Layout Approach
[Figure: multiple generators, driven by the multiplicand and by the multiple selection signals (‘ier), feed a column of (4,2) counter slices followed by the CPA]
16
Next Lecture and Reminders
• Next lecture
  – Shifters, decoders, and multiplexers
    » Reading assignment: Rabaey, et al, 11.5-11.6
• Reminders
  – Project final reports due December 5th
  – HW5 (last one!) due November 19th
  – Final grading negotiations/correction (except for the final exam) must be concluded by December 10th
  – Final exam scheduled
    » Monday, December 16th from 10:10 to noon in 118 and 121 Thomas
17
Topics
• Adders and ALUs (§6.4, §6.5)
• Multipliers (§6.6)
  – Array multiplier
  – Baugh-Wooley multiplier
  – Booth encoding
  – Wallace tree multiplier
• Subsystem design principles (§6.2)
18
Elementary School Algorithm
        0 1 1 0   multiplicand
      × 1 0 0 1   multiplier
        0 1 1 0
    + 0 0 0 0
      0 0 1 1 0
  + 0 0 0 0
    0 0 0 1 1 0
+ 0 1 1 0
  0 1 1 0 1 1 0
(rows marked + are the shifted partial products; the rows between them are the running sums)
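The worked example can be reproduced with a few lines of code. This is a sketch for unsigned operands only; `schoolbook_rows` is a hypothetical helper name.

```python
def schoolbook_rows(multiplicand: int, multiplier: int, n: int) -> list[tuple[int, int]]:
    """Return (shifted partial product, running sum) for each multiplier bit."""
    rows = []
    running = 0
    for i in range(n):
        bit = (multiplier >> i) & 1
        # A multiplier bit of 1 contributes the multiplicand shifted left by i.
        partial = (multiplicand << i) if bit else 0
        running += partial
        rows.append((partial, running))
    return rows


for partial, running in schoolbook_rows(0b0110, 0b1001, 4):
    print(f"{partial:07b}  {running:07b}")
```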
19
Combinational Multiplier
[Figure: combinational multiplier] Each bit of the multiplier controls whether an addition occurs.
20
Array Multiplier
• Regular layout
  – An n × m cell layout
  – Easy to pipeline
  – Used frequently in FPGAs and ASICs
• Critical path
  – Less than the delay of an (n+m-1)-bit adder
• Handles unsigned multiplication ONLY
21
A 4 × 4 Unsigned Array Multiplier
[Figure: 4 × 4 unsigned array multiplier; the array is skewed for a rectangular layout]
22
Unsigned Array Multiplier
[Figure: 4 × 4 array of full-adder cells (inputs a, b, Cin; outputs Cout, Sum) that sums the partial product bits x0y0 through x3y3, with 0 on the unused inputs, to produce the product bits P0 through P7]
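A bit-level model of an unsigned array multiplier is sketched below. It assumes each row of cells adds one shifted partial product with the carry rippling along the row; the slide's figure may route carries differently (for example in carry-save fashion), so treat the wiring here as an assumption. The names `full_adder` and `array_multiply` are illustrative.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One array cell: returns (Sum, Cout)."""
    return a ^ b ^ cin, (a & b) | (b & cin) | (a & cin)


def array_multiply(x: int, y: int, n: int) -> int:
    """n x n unsigned array multiplier, modeled one cell at a time."""
    product = [0] * (2 * n)          # product bits P0 .. P(2n-1)
    for i in range(n):               # one row of cells per multiplier bit y_i
        carry = 0
        for j in range(n):           # one cell per multiplicand bit x_j
            pp = ((x >> j) & 1) & ((y >> i) & 1)   # AND gate forms x_j * y_i
            product[i + j], carry = full_adder(product[i + j], pp, carry)
        product[i + n] = carry       # row carry out lands in the next free column
    return sum(bit << k for k, bit in enumerate(product))


print(array_multiply(0b0110, 0b1001, 4))
```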
23
Signed Multiplication
• Signed number representation
  – [equation not preserved in this transcript]
• Signed n×n multiplication
  – (1110)_2 × (0011)_2 = (1010)_2, i.e., (-2) × 3 = (-6)
  – No difference from unsigned multiplication if the result has the same bit-width as the inputs
• But what if we want the result to be 2n bits?
  – Use sign-bit extension
  – Needs a 2n × 2n array multiplier
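Both claims can be checked numerically. The sketch below assumes two's-complement operands held as non-negative bit patterns; all helper names (`to_signed`, `sign_extend`, `narrow_product`, `wide_product`) are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Replicate the sign bit of a from_bits pattern up to to_bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def narrow_product(a: int, b: int, n: int) -> int:
    """Unsigned multiply, keeping only the low n bits."""
    return (a * b) & ((1 << n) - 1)


def wide_product(a: int, b: int, n: int) -> int:
    """Sign-extend both operands to 2n bits, multiply unsigned, keep 2n bits."""
    return (sign_extend(a, n, 2 * n) * sign_extend(b, n, 2 * n)) & ((1 << 2 * n) - 1)


print(format(narrow_product(0b1110, 0b0011, 4), "04b"))
print(to_signed(wide_product(0b1110, 0b0011, 4), 8))
```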
24
Baugh-Wooley Multiplier: Principle
25
Baugh-Wooley Multiplier: Structure
[Figure: array of full-adder cells (inputs a, b, Cin; outputs Cout, Sum)]
26
Booth Multiplier
• Utilizes the Booth encoding scheme
• Booth encoding scheme
  – Handles signed multiplication
  – Reduces the number of partial products by half
  – Small area and fast
  – Encoding scheme cannot be applied hierarchically
    » Often used as the first stage of partial product reduction
27
Booth Encoding: Principle
• Two’s-complement form of multiplier y
  – [equation not preserved in this transcript]
• Consider first two terms
  – [equation not preserved in this transcript]
  – By looking at three bits of y, we can determine whether to add 0, ±x, or ±2x to the partial product.
28
Booth Actions
y_i  y_(i-1)  y_(i-2)  increment
0    0        0        0
0    0        1        X
0    1        0        X
0    1        1        2X
1    0        0        -2X
1    0        1        -X
1    1        0        -X
1    1        1        0
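The action table can be written as a lookup and applied to overlapping three-bit groups of the multiplier. This sketch assumes an even word length n and an implicit bit y_(-1) = 0 below the least significant bit; `BOOTH_TABLE` and `booth_digits` are illustrative names.

```python
# (y_i, y_(i-1), y_(i-2)) -> multiple of X to add
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement pattern into n/2 digits in {-2..2}.

    Digits are returned least significant first; digit k has weight 4**k.
    """
    assert n % 2 == 0
    bits = [0] + [(y >> i) & 1 for i in range(n)]   # bits[0] is y_(-1) = 0
    return [BOOTH_TABLE[(bits[i + 2], bits[i + 1], bits[i])] for i in range(0, n, 2)]


print(booth_digits(0b101110, 6))   # the multiplier -18 as a 6-bit pattern
```

Only n/2 digits come out, which is the halving of partial products mentioned for the Booth multiplier.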
29
Booth Example
• Don’t forget the sign extension of the encoded values when adding them together
  – Only have to extend 2 bits though
• x = 011001 (25 in decimal), y = 101110 (-18 in decimal)
  – y1 y0 y-1 = 100: P1 = P0 - (10 × 011001) = 11111001110
  – y3 y2 y1 = 111: P2 = P1 + 0 = 11111001110
  – y5 y4 y3 = 101: P3 = P2 - 0110010000 = 11000111110
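The example can be traced step by step. The sketch below assumes radix-4 Booth recoding with digit = -2·y_(k+1) + y_k + y_(k-1) and uses Python integers, so sign extension is implicit; `booth_multiply` is a hypothetical name.

```python
def booth_multiply(x: int, y: int, n: int) -> tuple[int, list[tuple[int, int]]]:
    """Radix-4 Booth multiplication of signed x by signed n-bit y (n even).

    Returns the product and a trace of (digit, running product) per step.
    """
    y_bits = y & ((1 << n) - 1)      # two's-complement bit pattern of y
    prev = 0                         # y_(-1) = 0
    product = 0
    trace = []
    for k in range(0, n, 2):
        b0 = (y_bits >> k) & 1
        b1 = (y_bits >> (k + 1)) & 1
        digit = -2 * b1 + b0 + prev  # one of -2, -1, 0, 1, 2
        product += (digit * x) << k  # add the selected multiple at weight 2**k
        trace.append((digit, product))
        prev = b1
    return product, trace


product, trace = booth_multiply(25, -18, 6)
for digit, partial in trace:
    # Print each running product as an 11-bit two's-complement pattern.
    print(digit, format(partial & 0x7FF, "011b"))
```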
30
Wallace Tree
• Reduces the number of partial products
• Built from carry-save adders:
  – Three inputs: a, b, c
  – Two outputs: y, z such that y + z = a + b + c
• Carry-save equations:
  – y_i = a_i ⊕ b_i ⊕ c_i
  – z_(i+1) = a_i b_i + b_i c_i + c_i a_i
  – What’s the difference from a carry-ripple adder?
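The carry-save equations map directly onto bitwise operations on whole words. A minimal sketch for non-negative integers; `carry_save_add` is an illustrative name.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 carry-save adder on whole words.

    y holds the per-bit sums, z the per-bit carries moved one position up,
    so that y + z == a + b + c. No carry propagates between bit positions.
    """
    y = a ^ b ^ c
    z = ((a & b) | (b & c) | (c & a)) << 1
    return y, z


y, z = carry_save_add(0b0110, 0b0101, 0b0011)
print(y, z, y + z)
```

Unlike a carry-ripple adder, every bit position is computed independently, so the delay is one full-adder delay regardless of word length.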
31
Wallace Tree Structure
[Figure: two rows of full adders (FA) compared. Carry-ripple adder: inputs a0 to a2, b0 to b2, c0 to c2; outputs s0 to s2. Carry-save adder: inputs a0 to a2, b0 to b2, c0 to c2; outputs y0 to y2 and z1 to z3.]
32
Wallace Tree Operation
• n additions are reduced to (2n/3) additions after each level
  – Sum of inputs = Sum of outputs
  – Can apply the reduction hierarchically
  – More efficient design uses 4-2 adders to reduce n additions to (n/2) additions after each level
• Need final adder to add the last two numbers
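The hierarchical reduction can be sketched as repeated 3:2 compression until two rows remain. This assumes non-negative operands and a simple grouping in which leftover rows pass unchanged to the next level; `wallace_reduce` is a hypothetical name.

```python
def wallace_reduce(operands: list[int]) -> tuple[list[int], int]:
    """Reduce a list of addends to at most two rows using 3:2 carry-save levels.

    Returns the remaining rows and the number of levels used.
    """
    rows = list(operands)
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3      # rows that fit into groups of three
        next_rows = []
        for i in range(0, full, 3):
            a, b, c = rows[i:i + 3]
            next_rows.append(a ^ b ^ c)                           # sum word
            next_rows.append(((a & b) | (b & c) | (c & a)) << 1)  # carry word
        next_rows.extend(rows[full:])         # leftovers go to the next level
        rows = next_rows
        levels += 1
    return rows, levels


x, y = 201, 173
partial_products = [(x << i) if (y >> i) & 1 else 0 for i in range(8)]
rows, levels = wallace_reduce(partial_products)
print(levels, sum(rows) == x * y)   # the final sum stands in for the last adder
```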
33
A Booth-Wallace Tree Multiplier
[Figure: Booth-Wallace tree multiplier] The most commonly used high-performance multiplier.
34
Topics
• Adders and ALUs (§6.4, §6.5)
• Multipliers (§6.6)
• Subsystem design principles (§6.2)
35
Pipelining
• Pipelining can be used to reduce clock period at the expense of latency:
[Figure: combinational logic 1, combinational logic 2]
36
Cycle Time and Latency
[Figure: two plots, cycle time versus # stages and latency versus # stages]
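The trade-off can be illustrated with a simple first-order model. This is an assumption for illustration, not the model behind the plots: the logic delay divides evenly across stages and each stage adds a fixed register overhead. The function name and the numbers are hypothetical.

```python
def pipeline_timing(logic_delay: float, register_overhead: float, stages: int) -> tuple[float, float]:
    """Return (cycle_time, latency) under an idealized even-split model."""
    cycle_time = logic_delay / stages + register_overhead
    latency = stages * cycle_time
    return cycle_time, latency


for stages in (1, 2, 4):
    # Example numbers only: 12 time units of logic, 1 unit of register overhead.
    print(stages, pipeline_timing(12.0, 1.0, stages))
```

Under this model the cycle time falls as stages are added while the latency grows by one register overhead per stage.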
37
Data Paths
• A data path is a logical and physical structure:
  – bit-wise logical organization
  – bit-wise physical structure
• Data paths generally use busses to pass data between function units.
38
Bit Slice Organization
[Figure: registers, shifter, and ALU arranged as bit slices from bit n-1 to bit 0, with bus and control lines]
39
Data Path Cell Design
• Connections may be made by:
  – abutment, requiring stretching cells;
  – river routing, requiring a routing channel between function units.
40
Project
• Due 10/26
  – Schematic
  – Verilog/Spectre simulation results
  – 10/27 presentation (10-15 PowerPoint slides)
• Important (efficiency-related)
  – How to add array of instances