Published by Jonathan Taylor. Modified over 9 years ago.
1
Digital Integrated Circuits 2e: Chapter 11.4-11.6. Copyright 2002 Prentice Hall PTR. Adapted by Yunsi Fei.
ECE 300 Advanced VLSI Design, Fall 2006
Lecture 18: Adders, Multipliers, & Shifters
Yunsi Fei
[Adapted from Jan Rabaey et al.'s Digital Integrated Circuits © 2002, PSU Irwin & Vijay © 2002, and Princeton Wayne Wolf's Modern VLSI Design © 2002]
2
Review: Binary Adder Landscape
synchronous word parallel adders
- ripple carry adders (RCA): T = O(N), A = O(N)
- carry prop min adders
  - signed-digit adders, residue adders: T = O(1), A = O(N)
  - fast carry prop adders
    - Manchester carry chain: T = O(N), A = O(N)
    - carry select, carry skip: T = O(√N), A = O(N)
    - carry lookahead, conditional sum: T = O(log N), A = O(N log N)
3
Logarithmic Carry Lookahead Adders
- Define carry operator € on (G,P) signal pairs:
    (G'',P'') € (G',P') = (G,P), where G = G'' + P''G' and P = P''P'
  - € is associative, i.e., [(g''',p''') € (g'',p'')] € (g',p') = (g''',p''') € [(g'',p'') € (g',p')]
[Figure: € cell circuit with signals G', G'', P'', and !G]
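The associativity claim can be checked exhaustively, since each (G,P) pair takes only four values. The following is a minimal sketch, not taken from the slides; the function names `carry_op` and `is_associative` are hypothetical.

```python
from itertools import product

def carry_op(hi, lo):
    """(G'', P'') € (G', P') -> (G, P), with G = G'' + P''G' and P = P''P'."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

# All four possible (G, P) bit pairs.
PAIRS = list(product((0, 1), repeat=2))

def is_associative():
    """Check (a € b) € c == a € (b € c) for every combination of pairs."""
    return all(
        carry_op(carry_op(a, b), c) == carry_op(a, carry_op(b, c))
        for a in PAIRS for b in PAIRS for c in PAIRS
    )
```

Calling `is_associative()` walks all 64 combinations, which is small enough to serve as a complete check for single-bit signals.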
4
General Structure
- Given P and G terms for each bit position, computing all the carries is equal to finding all the prefixes in parallel:
    (G_0,P_0) € (G_1,P_1) € (G_2,P_2) € … € (G_{N-2},P_{N-2}) € (G_{N-1},P_{N-1})
- Since € is associative, we can group them in any order
  - but note that it is not commutative
- Measures to consider
  - number of € cells
  - tree cell depth (time)
  - tree cell area
  - cell fan-in and fan-out
  - max wiring length
  - wiring congestion
  - delay path variation (glitching)
[Figure: adder structure with P_i, G_i logic (1 unit delay), C_i parallel prefix logic tree (1 unit delay per level), and S_i logic (1 unit delay)]
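To make the prefix view concrete, here is a small sketch that computes every carry with a plain serial scan of the € operator. It assumes G_i = a_i AND b_i and P_i = a_i XOR b_i (the slides do not spell out the bit-level definitions), treats the more significant pair as the left operand, and folds the carry-in in as a generate term below bit 0. The names `carry_op` and `carries` are hypothetical.

```python
def carry_op(hi, lo):
    """(G'', P'') € (G', P'): hi is the more significant pair."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def carries(a, b, n, c_in=0):
    """Return [C_1, ..., C_n] for the n-bit addition a + b + c_in."""
    acc = (c_in, 0)          # assumption: carry-in modeled as a generate term
    out = []
    for i in range(n):
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        acc = carry_op((a_i & b_i, a_i ^ b_i), acc)   # prefix through bit i
        out.append(acc[0])                            # its G term is C_{i+1}
    return out
```

Swapping the operands changes the result, for example `carry_op((1, 0), (0, 0))` differs from `carry_op((0, 0), (1, 0))`, which is why the grouping may change but the order may not.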
5
Brent-Kung Tree
[Figure: 16-bit parallel prefix computation built from € cells; inputs (G_0,P_0) … (G_15,P_15) and C_in, outputs C_1 … C_16]
- Figure annotations: T = log_2 N, T = log_2 N - 2, A = 2 log_2 N, A = N/2
- T_add = t_setup + (2 log_2 N - 2) t_€ + t_sum
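A behavioral sketch of a Brent-Kung style prefix network follows. It is an illustration under assumptions, not the slide's circuit: it uses G_i = a_i AND b_i and P_i = a_i XOR b_i, takes C_in = 0, requires a power-of-two width, and runs the two tree phases one after the other, so it models the cell count rather than the delay figure quoted above. All names are hypothetical.

```python
def carry_op(hi, lo):
    """(G'', P'') € (G', P'): hi is the more significant pair."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def bit_gp(a, b, n):
    """Per-bit (G_i, P_i) pairs, assuming G = AND and P = XOR."""
    return [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
            for i in range(n)]

def brent_kung_prefix(gp):
    """Return (prefixes, cell_count); prefixes[i] combines gp[i] ... gp[0]."""
    n = len(gp)              # assumed to be a power of two
    x = list(gp)
    cells = 0
    d = 1
    while d < n:             # first phase: build spans of 2, 4, 8, ... bits
        for i in range(2 * d - 1, n, 2 * d):
            x[i] = carry_op(x[i], x[i - d])
            cells += 1
        d *= 2
    d = n // 4
    while d >= 1:            # second phase: fill in the remaining prefixes
        for i in range(3 * d - 1, n, 2 * d):
            x[i] = carry_op(x[i], x[i - d])
            cells += 1
        d //= 2
    return x, cells
```

With C_in = 0, the G term of `prefixes[i]` is the carry C_{i+1}, so the result can be compared directly against integer addition.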
6
Kogge-Stone PPF Adder
[Figure: 16-bit parallel prefix computation built from € cells; inputs (G_0,P_0) … (G_15,P_15) and C_in, outputs C_1 … C_16]
- Figure annotations: T = log_2 N, A = log_2 N, A = N
- T_add = t_setup + (log_2 N) t_€ + t_sum
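The log_2 N term in T_add corresponds to the number of € levels. A minimal behavioral sketch of a Kogge-Stone style network is shown below, assuming G_i = a_i AND b_i, P_i = a_i XOR b_i, and C_in = 0; the function names are hypothetical and wiring or fan-out effects are not modeled.

```python
def carry_op(hi, lo):
    """(G'', P'') € (G', P'): hi is the more significant pair."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def bit_gp(a, b, n):
    """Per-bit (G_i, P_i) pairs, assuming G = AND and P = XOR."""
    return [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
            for i in range(n)]

def kogge_stone_prefix(gp):
    """Return (prefixes, levels, cells); prefixes[i] combines gp[i] ... gp[0]."""
    n = len(gp)
    x = list(gp)
    levels = cells = 0
    d = 1
    while d < n:
        nxt = list(x)
        for i in range(d, n):        # every position at or above d gets a cell
            nxt[i] = carry_op(x[i], x[i - d])
            cells += 1
        x = nxt                      # all cells in a level switch in parallel
        d *= 2
        levels += 1
    return x, levels, cells
```

Each level doubles the span covered by every position, so a 16-bit input finishes after four levels, at the cost of many more cells than a sparser tree.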
7
DEC "Alpha" 21064 Adder
- 64-bit adder, 0.75 µm technology, 5 ns delay
- On the 8-bit level: Manchester chain
- On the 32-bit sub-block: carry lookahead
- On the 64-bit block: carry select
8
More Adder Comparisons
9
Adder Speed Comparisons
10
Adder Average Power Comparisons
11
PDP of Adder Comparisons
From Nagendra, 1996
12
Serial Adder
- May be used in signal-processing arithmetic where fast computation is important but latency is unimportant.
- Data format (LSB first): bit 0, bit 1, bit 2, bit 3, ...
13
Serial Adder Structure
- LSB control signal clears the carry shift register
[Figure: serial adder schematic]
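The role of the LSB control signal is easiest to see when two words are streamed back to back. The sketch below is a cycle-by-cycle behavioral model under assumptions: one full adder, a single carry register, and an `lsb_flags` stream that is 1 on the first bit of each word. The names `serial_add`, `to_bits`, and `from_bits` are hypothetical.

```python
def serial_add(a_bits, b_bits, lsb_flags):
    """Add two LSB-first bit streams, one bit per clock cycle."""
    carry = 0                      # carry register
    out = []
    for a, b, lsb in zip(a_bits, b_bits, lsb_flags):
        if lsb:                    # LSB control signal clears the carry
            carry = 0
        out.append(a ^ b ^ carry)                 # sum bit
        carry = (a & b) | (carry & (a ^ b))       # carry kept for next cycle
    return out

def to_bits(x, n):
    """Unsigned integer to an LSB-first list of n bits."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    """LSB-first bit list back to an unsigned integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

Streaming the 4-bit words 9 + 8 followed by 1 + 1 shows the effect: the first addition leaves a carry in the register, and without the clear on the next LSB it would leak into the second word.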
14
Multiply Operation
- Multiplication as repeated additions
[Figure: N-bit multiplicand and N-bit multiplier; the partial product array (can be formed in parallel) adds up to the 2N-bit double precision product]
15
Shift & Add Multiplication
- Right shift and add
  - Partial product array rows are accumulated from top to bottom on an N-bit adder
  - After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add
  - Time for N bits: T_serial_mult = O(N T_adder) = O(N^2) for an RCA
- Making it faster
  - Use a faster adder
  - Use higher radix (e.g., base 4) multiplication
    » Use multiplier recoding to simplify multiple formation
  - Form partial product array in parallel and add it in parallel
- Making it smaller (i.e., slower)
  - Use an array multiplier
    » Very regular structure with only short wires to nearest neighbor cells. Thus, very simple and efficient layout in VLSI
    » Can be easily and efficiently pipelined
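The right-shift-and-add procedure can be sketched in a few lines. This is an illustrative model for unsigned operands only; the name `shift_add_multiply` and the split of the running product into `acc` and `low` are assumptions made for the example.

```python
def shift_add_multiply(multiplicand, multiplier, n):
    """Right-shift-and-add multiplication of two unsigned n-bit integers."""
    acc = 0      # upper part of the running product (n bits plus adder carry)
    low = 0      # product bits that have already been shifted out
    for i in range(n):
        if (multiplier >> i) & 1:
            acc += multiplicand        # one pass through the N-bit adder
        low |= (acc & 1) << i          # bit i of the product is now final
        acc >>= 1                      # right shift to align with the next row
    return (acc << n) | low            # 2n-bit double precision product
```

The loop runs N times and each iteration uses the adder once, which is where the O(N T_adder) estimate comes from.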
16
Tree Multiplier Structure
[Figure: Q ('ier) controls the multiple forming circuits, which select 0 or D ('icand) to form the partial product array; a reduction tree feeds a fast carry propagate adder (CPA) that produces P (product)]
- Delay: mux + reduction tree (log N) + CPA (log N)
17
(4,2) Counter
- Built out of two (3,2) counters (just FA's!)
  - all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position)
  - the internal output is carried to the next higher weight position (indicated by the )
[Figure: (4,2) counter built from two (3,2) counters]
Note: Two carry outs - one "internal" and one "external"
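A bit-level sketch of this construction is given below. The wiring (which three inputs feed the first full adder) and the names `full_adder` and `counter_4_2` are assumptions for illustration; the slide's figure is not recoverable from the text.

```python
def full_adder(a, b, c):
    """(3,2) counter: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def counter_4_2(x1, x2, x3, x4, c_in):
    """(4,2) counter from two full adders.

    Returns (s, carry, c_out): s has the input weight, while carry
    (external) and c_out (internal) both have twice that weight.
    """
    s1, c_out = full_adder(x1, x2, x3)   # internal carry out, ignores c_in
    s, carry = full_adder(s1, x4, c_in)  # second FA absorbs the internal carry in
    return s, carry, c_out
```

For every input combination the counts balance: x1 + x2 + x3 + x4 + c_in equals s + 2 (carry + c_out).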
18
Tiling (4,2) Counters
- Reduces column height from four to two
  - Tiles with neighboring (4,2) counters
  - Internal carry in at same "level" (i.e., bit position weight) as the internal carry out
[Figure: row of tiled (4,2) counters, each built from (3,2) counters]
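As a sketch of the tiling idea, the code below reduces four rows to two by placing one (4,2) counter per column and passing each internal carry out to the neighboring column. It is a behavioral illustration under assumptions: unsigned rows, one extra column to absorb the last internal carry, and hypothetical names throughout.

```python
def full_adder(a, b, c):
    """(3,2) counter: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def counter_4_2(x1, x2, x3, x4, c_in):
    """(4,2) counter from two full adders: returns (s, carry, c_out)."""
    s1, c_out = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, c_in)
    return s, carry, c_out

def reduce_four_rows(rows, n):
    """Reduce four unsigned n-bit rows to (sum_row, carry_row)."""
    sum_row = carry_row = 0
    c_internal = 0                       # no internal carry into column 0
    for i in range(n + 1):               # extra column absorbs the last c_out
        x1, x2, x3, x4 = ((r >> i) & 1 for r in rows)
        s, carry, c_out = counter_4_2(x1, x2, x3, x4, c_internal)
        sum_row |= s << i
        carry_row |= carry << (i + 1)    # external carry moves up one column
        c_internal = c_out               # internal carry feeds the neighbor
    return sum_row, carry_row
```

Because the internal carry out of a tile does not depend on its internal carry in, the chain does not ripple across the row; the two output rows still need a carry propagate adder to form the final sum.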
19
4x4 Partial Product Array Reduction
- Fast 4x4 multiplication using (4,2) counters
[Figure: multiplicand, multiplier, partial product array, reduced pp array (to CPA), double precision product]
20
8x8 Partial Product Array Reduction
[Figure: 'icand, 'ier, partial product array, reduced partial product array]
- How many (4,2) counters minimum are needed to reduce it to 2 rows? Answer: 24
21
Alternate 8x8 Partial Product Array Reduction
[Figure: 'icand, 'ier, partial product array, reduced partial product array]
- More (4,2) counters, so what is the advantage?
22
Array Reduction Layout Approach
[Figure: layout with multiple generators, the multiplicand, multiple selection signals ('ier), (4,2) counter slices, and the CPA]
23
Parallel Programmable Shifters
[Figure: shifter block with Data In, Control, and Data Out]
- Control = shift amount, shift direction, shift type (logical, arith, circular)
- Shifters used in multipliers, floating point units
- Consume lots of area if done in random logic gates
24
A Programmable Binary Shifter
[Figure: shifter bit slice with inputs A_i, A_{i-1}, control signals rgt, nop, left, and outputs B_i, B_{i-1}]

A_1  A_0 | rgt  nop  left | B_1  B_0
A_1  A_0 |  0    1    0   | A_1  A_0
A_1  A_0 |  1    0    0   | 0    A_1
A_1  A_0 |  0    0    1   | A_0  0
25
4-bit Barrel Shifter
[Figure: transistor array with inputs A_0 … A_3, outputs B_0 … B_3, and control lines Sh0 … Sh3]
Example:
  Sh0 = 1: B_3 B_2 B_1 B_0 = A_3 A_2 A_1 A_0
  Sh1 = 1: B_3 B_2 B_1 B_0 = A_3 A_3 A_2 A_1
  Sh2 = 1: B_3 B_2 B_1 B_0 = A_3 A_3 A_3 A_2
  Sh3 = 1: B_3 B_2 B_1 B_0 = A_3 A_3 A_3 A_3
- Area dominated by wiring
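The example rows describe a right shift that replicates A_3 into the vacated positions. A minimal behavioral sketch is shown below; it assumes exactly one Sh line is high, and the name `barrel_shift` and the list-based interface are hypothetical.

```python
def barrel_shift(a_bits, sh):
    """Behavioral model of the example above.

    a_bits: [A0, A1, A2, A3]; sh: one-hot [Sh0, Sh1, Sh2, Sh3].
    Returns [B0, B1, B2, B3].
    """
    if sum(sh) != 1:
        raise ValueError("exactly one Sh line must be active")
    k = sh.index(1)                  # shift distance selected by the hot line
    n = len(a_bits)
    # B_i takes A_{i+k}; positions past the top bit repeat A_{n-1}.
    return [a_bits[min(i + k, n - 1)] for i in range(n)]
```

Passing symbolic labels such as `["A0", "A1", "A2", "A3"]` instead of bits makes it easy to compare the output against the rows of the example.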
26
4-bit Barrel Shifter Layout
[Figure: barrel shifter layout annotated with Width_barrel]
- Only one Sh# active at a time
- Width_barrel ~ 2 p_m N, where N = max shift distance and p_m = metal pitch
- Delay ~ 1 fet + N diff caps
27
8-bit Logarithmic Shifter
[Figure: logarithmic shifter with inputs A_3 … A_0, outputs B_3 … B_0, and control pairs !Sh1/Sh1, !Sh2/Sh2, !Sh3/Sh3]
- log N stages
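The log N stage structure can be sketched behaviorally: stage k either passes the data through or shifts it by 2^k, under control of bit k of the shift amount. The sketch assumes a right shift with a constant fill value and a power-of-two width; the function name and interface are hypothetical and do not model the pass-transistor circuit.

```python
def log_shift_right(bits, amount, fill=0):
    """Shift an LSB-first bit list right by `amount` using log2(N) stages.

    Returns (shifted_bits, number_of_stages).
    """
    n = len(bits)                    # assumed to be a power of two
    stage = 0
    while (1 << stage) < n:
        if (amount >> stage) & 1:    # this stage's control bit is set
            d = 1 << stage
            bits = bits[d:] + [fill] * d     # shift by 2**stage positions
        # otherwise the stage passes the data straight through
        stage += 1
    return bits, stage
```

For an 8-bit input the loop runs three times, with stage shift distances of 1, 2, and 4, so any shift from 0 to 7 is a combination of at most three stage shifts.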
28
8-bit Logarithmic Shifter Layout Slice
[Figure: layout slice with inputs A_0 … A_3, outputs B_0 … B_3, and stage shifts of 1, 2, and 4]
- Width_log ~ p_m (2K + (1 + 2 + … + 2^(K-1))) = p_m (2^K + 2K - 1), where K = log_2 N
- Delay ~ K fets + 2 diff caps
29
Shifter Implementation Comparisons

 N   K | Barrel width | Barrel speed | Logarithmic width   | Logarithmic speed
       | 2 N p_m      | 1 + N diffs  | p_m (2^K + 2K - 1)  | K + 2 diffs
 8   3 | 16 p_m       | 1 + 8        | 13 p_m              | 3 + 2
16   4 | 32 p_m       | 1 + 16       | 23 p_m              | 4 + 2
32   5 | 64 p_m       | 1 + 32       | 41 p_m              | 5 + 2
64   6 | 128 p_m      | 1 + 64       | 75 p_m              | 6 + 2
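The width columns follow directly from the two formulas, which the short sketch below evaluates in units of the metal pitch p_m. The function names are hypothetical, and the inner assertion only confirms that 2K + (1 + 2 + … + 2^(K-1)) collapses to 2^K + 2K - 1.

```python
def log2_exact(n):
    """Return K such that 2**K == n; n must be a power of two."""
    k = n.bit_length() - 1
    if n < 1 or (1 << k) != n:
        raise ValueError("n must be a power of two")
    return k

def barrel_width(n):
    """Barrel shifter width in metal pitches: 2 N."""
    return 2 * n

def log_width(n):
    """Logarithmic shifter width in metal pitches: 2**K + 2K - 1."""
    k = log2_exact(n)
    expanded = 2 * k + sum(1 << j for j in range(k))   # 2K + (1 + 2 + ...)
    assert expanded == 2 ** k + 2 * k - 1
    return expanded

def width_table(sizes=(8, 16, 32, 64)):
    """Rows of (N, K, barrel width, logarithmic width)."""
    return [(n, log2_exact(n), barrel_width(n), log_width(n)) for n in sizes]
```

The barrel width grows linearly in N with slope 2, while the logarithmic width grows with slope close to 1, so the gap widens as the maximum shift distance increases.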
30
Decoders
- Decodes inputs to activate one of many outputs
  - two inverters, four 2-input nand gates, four inverters plus enable logic
  - how about for a 3-to-8, 4-to-16, etc. decoder?
[Figure: 2x4 decoder block with inputs In_0, In_1, and Enable]
  Out_0 = !In_1 & !In_0
  Out_1 = !In_1 & In_0
  Out_2 = In_1 & !In_0
  Out_3 = In_1 & In_0
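The four output equations generalize to any number of inputs: exactly one output is high, the one whose index equals the binary value of the inputs. The sketch below is a behavioral model only; the active-high `enable` and the name `decode` are assumptions.

```python
def decode(inputs, enable=1):
    """n-to-2^n decoder.

    inputs: [In_0, In_1, ...] with In_0 the least significant bit.
    Returns [Out_0, Out_1, ...]; all zeros when enable is 0 (assumed
    active-high enable).
    """
    n = len(inputs)
    index = sum(bit << i for i, bit in enumerate(inputs))
    return [int(bool(enable) and k == index) for k in range(1 << n)]
```

For example, `decode([1, 0])` sets In_0 = 1 and In_1 = 0, which activates Out_1, matching Out_1 = !In_1 & In_0.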
31
Dynamic NOR Decoder
[Figure: dynamic NOR decoder between V_dd and GND with a precharge signal; inputs A_0, !A_0, A_1, !A_1 and outputs B_0 … B_3]
32
Dynamic NAND Decoder
[Figure: dynamic NAND decoder with a precharge signal and GND; inputs A_0, !A_0, A_1, !A_1 and outputs B_0 … B_3]
33
Building Big Decoders from Small
[Figure: large decoder assembled from a 1x2 decoder (A_4, enable) and 2x4 decoders (A_3, A_2 and A_1, A_0)]
- Active low enable
- Active low output
34
Multiplexers
- Selects one of several inputs to gate to the single output
  - two inverters, four 3-input nands, one 4-input nand
  - how about for an 8x1, 16x1, etc. mux?
[Figure: 4x1 mux block with inputs In_0 … In_3 and selects S_1, S_0]
  Out = In_0 & !S_1 & !S_0 | In_1 & !S_1 & S_0 | In_2 & S_1 & !S_0 | In_3 & S_1 & S_0
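The Out equation can be evaluated term by term to confirm that it selects input number 2 S_1 + S_0. The sketch below works on single bits; the name `mux4` is hypothetical.

```python
def mux4(in0, in1, in2, in3, s1, s0):
    """4x1 mux written as the sum-of-products form of the Out equation."""
    n1, n0 = 1 - s1, 1 - s0          # !S_1 and !S_0
    return ((in0 & n1 & n0) | (in1 & n1 & s0) |
            (in2 & s1 & n0) | (in3 & s1 & s0))
```

Only one product term can be active for a given (S_1, S_0), so the OR simply forwards the selected input.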
35
Review: TG 2x1 Multiplexer
[Figure: transmission-gate 2x1 multiplexer between V_DD and GND; inputs In_1, In_2, select signals S and !S, output F]
F = !((In_1 & S) | (In_2 & !S))
36
Building Big Muxes from Small
[Figure: 4x1 mux built from 2x1 muxes; inputs A_0 … A_3, selects S_0 and S_1, output Out]
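One way to read this slide is as a tree of 2x1 muxes in which S_0 controls the first level and S_1 the second. The sketch below follows that reading; the polarity (select = 0 passes the first input) and the names `mux2` and `mux_tree` are assumptions, and it models a non-inverting mux.

```python
def mux2(a, b, s):
    """2x1 mux: returns a when s == 0 and b when s == 1 (assumed polarity)."""
    return b if s else a

def mux_tree(inputs, selects):
    """Build a 2^k x 1 mux from 2x1 muxes.

    inputs: 2^k values [A_0, A_1, ...]; selects: [S_0, S_1, ...] with
    S_0 driving the first level of the tree.
    """
    level = list(inputs)
    for s in selects:
        # Each level halves the number of candidates.
        level = [mux2(level[i], level[i + 1], s)
                 for i in range(0, len(level), 2)]
    return level[0]
```

A 2^k-input mux built this way uses 2^k - 1 two-input muxes arranged in k levels, so the select-to-output path grows with k rather than with the number of inputs.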
37
Review: Datapath Bit-Sliced Organization
- Tile identical bit-slice elements
[Figure: bit slices Bit 0 … Bit 3 with control flow and data flow directions; blocks: register file, decoder, pipeline registers, adder, shifter, multiplexer; connections from I$ and to/from D$]
38
Layout of Bit-Sliced Datapaths
39
Layout of Bit-sliced Datapaths
- Without feedthroughs or pitch matching (4.2 m 2 )
- With feedthroughs (3.2 m 2 )
- With feedthroughs and pitch matching (2.2 m 2 )
40
Alpha 21264 Integer Unit Datapath
[Figure: integer unit datapath; labeled blocks: multimedia engine, shifter, intercluster bypass, adder, logic box, register file, register file decoder, load bypass, store FIFO, address drivers, tristate bus driver, bus driver; signals RC1_0, RC1_1, RC2_0, RC2_1, LSD_1, LSD_0, to D$]