Published by Gudrun Ulriksen. Modified over 5 years ago.
1
Lecture 9 Digital VLSI System Design Laboratory
Instructor: Prof. Diana Marculescu Fall 1999
2
Topics Shifters. Adders and ALUs.
3
Combinational shifters
Useful for arithmetic operations, bit field extraction, etc. Latch-based shift register can shift only one bit per clock cycle. A multiple-shift shifter requires additional connectivity.
4
Barrel shifter Can perform n-bit shifts in a single cycle.
Efficient layout. Does require transmission gates and long wires.
5
Barrel shifter structure
Accepts 2n data inputs and n control signals, producing n data outputs.
6
Barrel shifter operation
Selects arbitrary contiguous n bits out of 2n input bits. Examples:
right shift: data into top, 0 into bottom;
left shift: 0 into top, data into bottom;
rotate: data into top and bottom.
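As a rough behavioral sketch of the selection described on this slide (not the transistor-level circuit), the shifter can be modeled as picking n contiguous bits out of a 2n-bit word. The name `barrel_select` and the choice to place the "top" input in the low-order half of the 2n-bit word are assumptions made for illustration.

```python
def barrel_select(top, bottom, k, n):
    """Pick n contiguous bits, starting at bit k, from a 2n-bit word.

    Assumption: `top` occupies bits 0..n-1 and `bottom` bits n..2n-1.
    k plays the role of the single active control line (0 <= k <= n).
    """
    mask = (1 << n) - 1
    wide = ((bottom & mask) << n) | (top & mask)
    return (wide >> k) & mask


def shift_right(data, s, n):
    # Data into top, 0 into bottom.
    return barrel_select(data, 0, s, n)


def shift_left(data, s, n):
    # 0 into top, data into bottom; a left shift by s uses offset n - s.
    return barrel_select(0, data, n - s, n)


def rotate_right(data, s, n):
    # Data into both top and bottom.
    return barrel_select(data, data, s, n)
```

For example, with n = 8 and data = 0b10110011, a right shift by 3 gives 0b00010110, a left shift by 3 gives 0b10011000, and a right rotate by 3 gives 0b01110110.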
7
Barrel shifter layout
Two-dimensional array of 2n vertical × n horizontal cells. Input data travels diagonally upward. Output wires travel horizontally. Control signals run vertically. Exactly one control signal is set to 1, turning on all transmission gates in that column.
8
Barrel shifter cell
9
Barrel shifter organization
[Figure: barrel shifter organization, with control, input, and output signals labeled.]
10
Barrel shifter in action
11
Analysis Large number of cells, but each one is small.
Delay is large, considering long wires and transmission gates.
12
Adders Adder delay is dominated by carry chain.
Carry chain analysis must consider both transistor and wiring delay. Modern VLSI favors adder designs which have compact carry chains.
13
Full adder
Computes one-bit sum, carry:
$s_i = a_i \oplus b_i \oplus c_i$
$c_{i+1} = a_i b_i + a_i c_i + b_i c_i$
Ripple-carry adder: n-bit adder built from full adders. Delay of ripple-carry adder goes through all carry bits.
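The full-adder equations and the ripple-carry chain can be sketched behaviorally as follows. The function names and the integer-based bit packing are illustrative assumptions, not part of the lecture.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c
    c_out = (a & b) | (a & c) | (b & c)
    return s, c_out


def ripple_carry_add(a, b, n, c_in=0):
    """n-bit ripple-carry add; the carry passes through every bit position."""
    carry = c_in
    total = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry
```

For instance, `ripple_carry_add(11, 6, 4)` returns `(1, 1)`: the 4-bit sum is 1 with a carry out, since 11 + 6 = 17.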
14
Carry-lookahead adder
First compute carry propagate, generate:
$P_i = a_i + b_i$
$G_i = a_i b_i$
Compute sum and carry from P and G:
$s_i = c_i \oplus P_i \oplus G_i$
$c_{i+1} = G_i + P_i c_i$
15
Carry-lookahead expansion
Can recursively expand carry formula:
$c_{i+1} = G_i + P_i (G_{i-1} + P_{i-1} c_{i-1})$
$c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} (G_{i-2} + P_{i-2} c_{i-2})$
Expanded formula does not depend on intermediate carries. Allows carry for each bit to be computed independently.
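To illustrate the fully expanded formula, the sketch below computes every carry directly from P, G, and the carry input, without using any intermediate carry. It is a behavioral model only; the function names and the software loop standing in for the wide AND-OR gates are assumptions.

```python
def lookahead_carries(a, b, n, c0=0):
    """Return (P, G, carries), where carries[i] is c_i and carries[n] is carry out.

    Each carry is the fully expanded sum of products:
    c_{i+1} = G_i + P_i G_{i-1} + ... + P_i ... P_1 G_0 + P_i ... P_0 c_0
    """
    P = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    G = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    carries = [c0]
    for i in range(n):
        c = 0
        prefix = 1  # running product P_i * P_{i-1} * ... down to the current term
        for j in range(i, -1, -1):
            c |= prefix & G[j]
            prefix &= P[j]
        c |= prefix & c0
        carries.append(c)
    return P, G, carries


def cla_add(a, b, n, c0=0):
    """Sum bits use s_i = c_i XOR P_i XOR G_i."""
    P, G, c = lookahead_carries(a, b, n, c0)
    s = 0
    for i in range(n):
        s |= (c[i] ^ P[i] ^ G[i]) << i
    return s, c[n]
```

Note how the inner loop for bit i touches i + 2 terms; in hardware this corresponds to the growing fan-in of the deepest carry expansion.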
16
Depth-4 carry-lookahead
17
Analysis Deepest carry expansion requires gates with large fanin: large, slow. Carry-lookahead unit requires complex wiring between adders and lookahead unit—values must be routed back from lookahead unit to adder. Layout is even more complex with multiple levels of lookahead.
18
Carry-skip adder
Looks for cases in which carry out of a set of bits is identical to carry in. Typically organized into m-bit stages. If $a_i \neq b_i$ for every bit in stage (every bit propagates), then bypass gate sends stage's carry input directly to carry output.
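A behavioral sketch of the skip decision follows, assuming the bypass fires when every bit in a stage propagates, that is, when the two operand bits differ in every position of the stage. The function name, the stage loop, and the returned skip count are illustrative assumptions.

```python
def carry_skip_add(a, b, n, m, c_in=0):
    """n-bit add built from m-bit stages.

    Returns (sum, carry_out, skipped), where `skipped` counts the stages
    whose carry output was taken directly from the stage's carry input.
    """
    total, carry, skipped = 0, c_in, 0
    for base in range(0, n, m):
        stage_in = carry
        all_propagate = True
        for i in range(base, min(base + m, n)):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            total |= (ai ^ bi ^ carry) << i
            carry = (ai & bi) | (ai & carry) | (bi & carry)
            all_propagate = all_propagate and (ai != bi)
        if all_propagate:
            # Bypass: the stage's carry out equals its carry in.
            carry = stage_in
            skipped += 1
    return total, carry, skipped
```

In this model the bypass does not change the result, only which path supplies the carry; in hardware the benefit is that the carry does not have to ripple through the skipped stage.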
19
Two-bit carry-skip structure
[Figure: two-bit carry-skip stage, showing the skipping carry path and the internal carry path.]
20
Carry-skip group structure
[Figure: groups of full adders (FA), each group with its own skip path.]
21
Carry-select adder
Computes two results in parallel, one for each possible carry input (0 and 1). Uses actual carry in to select correct result. Reduces carry delay through a stage to that of a multiplexer.
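The selection idea can be sketched as below: each block is added twice, once per assumed carry input, and the real carry picks one of the two precomputed results. The block width `m`, the function names, and the use of Python's `+` for the per-block adders are assumptions for illustration.

```python
def add_block(a, b, width, c_in):
    """Stand-in for a small adder block: returns (sum, carry_out)."""
    total = a + b + c_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, n, m, c_in=0):
    """n-bit add from m-bit blocks, each evaluated for carry-in 0 and 1."""
    total, carry = 0, c_in
    for base in range(0, n, m):
        width = min(m, n - base)
        mask = (1 << width) - 1
        a_blk, b_blk = (a >> base) & mask, (b >> base) & mask
        # In hardware both results are computed in parallel.
        result_if_0 = add_block(a_blk, b_blk, width, 0)
        result_if_1 = add_block(a_blk, b_blk, width, 1)
        # Multiplexer controlled by the actual carry into this block.
        s, carry = result_if_1 if carry else result_if_0
        total |= s << base
    return total, carry
```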
22
Carry-select structure
23
Manchester carry chain
Precharged carry chain which uses P and G signals. Propagate signal connects adjacent carry bits. Generate signal discharges carry bit. Worst-case discharge path goes through entire carry chain.
24
Manchester carry chain circuit
[Figure: two adjacent Manchester carry chain stages, i-1 and i, with propagate signals Pi-1, Pi and generate signals Gi-1, Gi.]
25
Serial adder May be used in signal-processing arithmetic where fast computation is important but latency is unimportant. Data format (LSB first): ... bit 3 bit 2 bit 1 bit 0
26
Serial adder structure
LSB control signal clears the carry shift register:
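A minimal bit-serial sketch is shown below, assuming one bit of each operand arrives per clock, LSB first, and that an LSB flag accompanies the first bit of each word to clear the stored carry. The names `serial_add`, `to_lsb_first`, and `lsb_flags` are hypothetical.

```python
def to_lsb_first(value, n):
    """Serialize an n-bit value as a list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def serial_add(a_bits, b_bits, lsb_flags):
    """One full adder plus a one-bit carry register, one sum bit per clock.

    lsb_flags[t] == 1 marks the LSB of a new word and clears the carry.
    """
    carry = 0
    out = []
    for a, b, lsb in zip(a_bits, b_bits, lsb_flags):
        if lsb:
            carry = 0
        out.append(a ^ b ^ carry)
        carry = (a & b) | (a & carry) | (b & carry)
    return out
```

Streaming the 4-bit words 15 + 1 followed by 5 + 6 yields the bits of 0 and then 11; without the LSB clear, the carry out of the first word would corrupt the second.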
27
Power consumption of adders
Callaway and Swartzlander experiments on 16-bit adders:
constant-width carry-skip gave lowest power consumption;
ripple-carry was slightly higher;
carry-lookahead was higher;
carry-select was highest.
28
ALUs ALU computes a variety of logical and arithmetic functions based on opcode. May offer complete set of functions of two variables or a subset. ALU built around adder, since carry chain determines delay.
29
Function block circuit
[Figure: function block circuit with inputs a and b, control signals ctrl0 through ctrl3, and output out.]
30
Function blocks and ALUs
Function block may be used to compute required intermediate signals for a full-function ALU. Requires little area. Transmission gates may introduce significant delay.
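One way to picture a four-control function block is as a 4-to-1 selector in which a and b choose which control line reaches the output, so the four control values form the truth table of the function. The mapping of control lines to input combinations used here (ctrl[2a + b]) is an assumption; the actual circuit may order them differently.

```python
def function_block(a, b, ctrl):
    """ctrl is (ctrl0, ctrl1, ctrl2, ctrl3); a and b select one of them.

    Assumed mapping: ctrl0 for a=0,b=0; ctrl1 for a=0,b=1;
    ctrl2 for a=1,b=0; ctrl3 for a=1,b=1.
    """
    return ctrl[(a << 1) | b]


# Control settings for a few two-variable functions under that mapping.
AND = (0, 0, 0, 1)
OR = (0, 1, 1, 1)
XOR = (0, 1, 1, 0)
```

With four control bits there are 16 settings, which covers every Boolean function of two variables.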
31
ALU structure
32
ALU design P and G compute intermediate values from inputs. May not correspond to carry lookahead P and G for non-addition functions. Add unit is adder of choice. Output unit computes from sum, propagate signal.
33
Topics Multipliers.
34
Elementary school algorithm
[Figure: long multiplication of multiplicand by multiplier, forming one partial product per multiplier bit.]
35
Combinational multiplier
Uses n adders, eliminates registers: bit of multiplier controls whether addition occurs
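The behavior of the combinational multiplier can be sketched as a sum of shifted copies of the multiplicand, one per multiplier bit, with each bit deciding whether its addition occurs. This is an unsigned model; the function name is an assumption, and the loop stands in for the n adders that exist side by side in hardware.

```python
def combinational_multiply(x, y, n):
    """Unsigned n-bit multiply as n conditional additions."""
    product = 0
    for i in range(n):
        # Bit i of the multiplier controls whether x * 2^i is added.
        if (y >> i) & 1:
            product += x << i
    return product
```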
36
Array multiplier Array multiplier is an efficient layout of a combinational multiplier. Array multipliers may be pipelined to decrease clock period at the expense of latency.
37
Array multiplier organization
[Figure: array multiplier organization; multiplicand and multiplier feed an array that is skewed for rectangular layout and produces the product.]
38
Unsigned array multiplier
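[Figure: array of adders summing the partial-product bits (x0y0, x1y0, x2y0, ... in the first row, then x0y1, x1y1, ..., up to xnyn) to produce product bits P0 through P2n-1.]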
39
Baugh-Wooley multiplier
Algorithm for two’s-complement multiplication. Adjusts partial products to maximize regularity of multiplication array. Moves partial products with negative signs to the last steps; also adds negation of partial products rather than subtracts.
40
Booth multiplier Encoding scheme to reduce number of stages in multiplication. Performs two bits of multiplication at once—requires half the stages. Each stage is slightly more complex than simple multiplier, but adder/subtracter is almost as small/fast as adder.
41
Booth encoding
Two's-complement form of multiplier:
$y = -2^n y_n + 2^{n-1} y_{n-1} + 2^{n-2} y_{n-2} + \dots$
Rewrite using $2^a = 2^{a+1} - 2^a$:
$y = 2^n (y_{n-1} - y_n) + 2^{n-1} (y_{n-2} - y_{n-1}) + 2^{n-2} (y_{n-3} - y_{n-2}) + \dots$
Consider first two terms: by looking at three bits of y, we can determine whether to add or subtract x or 2x (or nothing) to the partial product.
42
Booth actions
y_i  y_{i-1}  y_{i-2}  increment
0    0        0        0
0    0        1        x
0    1        0        x
0    1        1        2x
1    0        0        -2x
1    0        1        -x
1    1        0        -x
1    1        1        0
43
Booth example
x = 011001 (25 decimal), y = 101110 (-18 decimal).
y1 y0 y-1 = 100: P1 = P0 - (10 × 011001) = -50
y3 y2 y1 = 111: P2 = P1 + 0 = -50
y5 y4 y3 = 101: P3 = P2 - (011001 × 2^4) = -450 = 25 × (-18)
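The two-bits-per-step recoding can be sketched as follows. Each step inspects three multiplier bits (with an implicit 0 below the LSB), looks up an increment of 0, ±x, or ±2x, and adds it at the weight of that bit pair. The table is the standard one implied by the rewritten two's-complement sum; the function name, the `width` parameter (an even number of multiplier bits), and the list of running partial products are assumptions.

```python
# (y_i, y_{i-1}, y_{i-2}) -> multiple of x to add
BOOTH_INCREMENT = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_multiply(x, y_bits, width):
    """Multiply signed x by the two's-complement pattern y_bits (width even).

    Returns (product, partial_products), one partial product per step.
    """
    def bit(i):
        return 0 if i < 0 else (y_bits >> i) & 1

    product = 0
    partial_products = []
    for i in range(1, width, 2):
        inc = BOOTH_INCREMENT[(bit(i), bit(i - 1), bit(i - 2))]
        product += inc * x * (1 << (i - 1))  # weight of this bit pair
        partial_products.append(product)
    return product, partial_products
```

With x = 25 and y = 101110 (six bits, -18), the three steps give partial products -50, -50, and -450, and only half as many steps as multiplier bits are needed.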
44
Booth structure
45
Wallace tree Reduces depth of adder chain.
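Built from carry-save adders: three inputs a, b, c produce two outputs y, z such that y + z = a + b + c.
Carry-save equations:
$y_i = \mathrm{parity}(a_i, b_i, c_i)$
$z_i = \mathrm{majority}(a_i, b_i, c_i)$
Here $z_i$ is a carry, so it enters the sum with weight $2^{i+1}$.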
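A word-level sketch of the carry-save adder follows, using bitwise operations so that every bit position is handled at once with no carry propagation between positions. The function name is an assumption; the left shift on z reflects that each majority bit is a carry into the next position.

```python
def carry_save_add(a, b, c):
    """Reduce three numbers to two with the same total: y + z == a + b + c."""
    y = a ^ b ^ c                              # per-bit parity
    z = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, moved up one position
    return y, z
```

For example, `carry_save_add(5, 6, 7)` returns `(4, 14)`, and 4 + 14 = 18 = 5 + 6 + 7.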
46
Wallace tree structure
47
Wallace tree operation
At each stage, i numbers are combined to form ceil(2i/3) sums. Final adder completes the summation. Wiring is more complex. Can build a Booth-encoded Wallace tree multiplier.
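The stage-by-stage reduction can be sketched as below: at each stage the operands are grouped in threes, each group passes through a carry-save adder, and any leftover operands are forwarded unchanged, so i numbers become ceil(2i/3). The grouping order and function names are assumptions, and Python's `sum` stands in for the final carry-propagate adder.

```python
def carry_save_add(a, b, c):
    """Three numbers in, two out, same total."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_reduce(operands):
    """Reduce a list of numbers to two, then add them.

    Returns (total, sizes), where sizes lists the operand count per stage.
    """
    operands = list(operands)
    sizes = [len(operands)]
    while len(operands) > 2:
        full = len(operands) // 3
        nxt = []
        for g in range(full):
            nxt.extend(carry_save_add(*operands[3 * g:3 * g + 3]))
        nxt.extend(operands[3 * full:])  # leftovers pass through unchanged
        operands = nxt
        sizes.append(len(operands))
    return sum(operands), sizes  # final adder completes the summation
```

Feeding in the four partial products of 13 × 11 (13, 26, 0, 104) gives a total of 143 with operand counts 4, 3, 2; nine operands shrink as 9, 6, 4, 3, 2.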
48
Serial-parallel multiplier
Used in serial-arithmetic operations. Multiplicand can be held in place by register. Multiplier is shifted into array.
49
Power consumption of multipliers
Callaway and Swartzlander experiments for bit widths of 8 through 32 compared:
array multiplier;
Wallace tree without Booth encoding;
Wallace tree with Booth encoding.
Wallace tree used significantly less power; advantage grew with word length.