Lecture 9, Digital VLSI System Design Laboratory. Instructor: Prof. Diana Marculescu. Fall 1999.
Topics: Shifters. Adders and ALUs.
Combinational shifters: Useful for arithmetic operations, bit-field extraction, and similar tasks. A latch-based shift register can shift only one bit per clock cycle; a shifter that moves data by several bits at once requires additional connectivity.
Barrel shifter: Can shift an n-bit word by any amount in a single cycle. Efficient layout. Does require transmission gates and long wires.
Barrel shifter structure: Accepts 2n data inputs and n control signals, producing n data outputs.
Barrel shifter operation: Selects an arbitrary contiguous group of n bits out of the 2n input bits. Examples: right shift: data into top, 0 into bottom; left shift: 0 into top, data into bottom; rotate: data into top and bottom.
Barrel shifter layout: Two-dimensional array of 2n vertical × n horizontal cells. Input data travel diagonally upward, output wires run horizontally, and control signals run vertically. Exactly one control signal is set to 1, turning on all transmission gates in that column.
Barrel shifter cell
Barrel shifter organization (figure: control, input, and output wiring).
Barrel shifter in action (figure).
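The window selection described above can be sketched as a behavioral Python model (not a circuit). It assumes the `bottom` input supplies the upper n bits of the 2n-bit input, which is a modeling choice rather than something the slides state; the helper names `one_hot`, `shift_right`, `shift_left`, and `rotate_right` are hypothetical.

```python
def one_hot(k: int, n: int) -> list[int]:
    """Control vector with exactly one active column, k."""
    return [1 if j == k else 0 for j in range(n)]


def barrel_shifter(top: int, bottom: int, control: list[int], n: int) -> int:
    """Behavioral model: pick n contiguous bits out of 2n input bits.

    Modeling assumption: `bottom` supplies the upper n input bits and
    `top` the lower n bits.
    """
    assert len(control) == n and sum(control) == 1, "exactly one control line high"
    k = control.index(1)              # column whose transmission gates are on
    inputs = (bottom << n) | top      # the 2n data inputs as one integer
    out = 0
    for i in range(n):                # output wire i
        # the gate in column k connects input bit i + k to output bit i
        out |= ((inputs >> (i + k)) & 1) << i
    return out


def shift_right(d: int, k: int, n: int) -> int:
    return barrel_shifter(d, 0, one_hot(k, n), n)        # data into top, 0 into bottom


def shift_left(d: int, m: int, n: int) -> int:
    assert 0 < m < n
    return barrel_shifter(0, d, one_hot(n - m, n), n)    # 0 into top, data into bottom


def rotate_right(d: int, k: int, n: int) -> int:
    return barrel_shifter(d, d, one_hot(k, n), n)        # data into top and bottom
```

With n = 8, `shift_right(0b10110011, 3, 8)` enables column 3 and returns 0b00010110. In this model a left shift by m is the same window selection with column n - m enabled, which is why one array serves all three operations.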
Analysis: Large number of cells, but each one is small. Delay is large because of the long wires and transmission gates.
Adders: Adder delay is dominated by the carry chain. Carry-chain analysis must consider both transistor and wiring delay. Modern VLSI favors adder designs that have compact carry chains.
Full adder: Computes a one-bit sum and carry: $s_i = a_i \oplus b_i \oplus c_i$, $c_{i+1} = a_i b_i + a_i c_i + b_i c_i$. Ripple-carry adder: an n-bit adder built from full adders. The delay of a ripple-carry adder goes through all the carry bits.
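A minimal Python sketch of the full-adder equations and a ripple-carry adder built from them, treating Python integers as bit vectors; the function names are our own. The loop makes the carry dependence explicit: bit i cannot finish until $c_i$ arrives from bit i-1.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    s = a ^ b ^ c                          # s_i = a_i XOR b_i XOR c_i
    c_out = (a & b) | (a & c) | (b & c)    # c_{i+1} = a_i b_i + a_i c_i + b_i c_i
    return s, c_out


def ripple_carry_add(a: int, b: int, n: int, c0: int = 0) -> tuple[int, int]:
    """Add two n-bit unsigned integers with a chain of n full adders."""
    total, c = 0, c0
    for i in range(n):
        # the carry out of bit i becomes the carry in of bit i + 1
        s, c = full_adder((a >> i) & 1, (b >> i) & 1, c)
        total |= s << i
    return total, c    # n-bit sum and final carry out
```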
Carry-lookahead adder: First compute carry propagate and generate: $P_i = a_i + b_i$, $G_i = a_i b_i$. Then compute sum and carry from P and G: $s_i = c_i \oplus P_i \oplus G_i$, $c_{i+1} = G_i + P_i c_i$.
Carry-lookahead expansion: The carry formula can be expanded recursively: $c_{i+1} = G_i + P_i (G_{i-1} + P_{i-1} c_{i-1})$, then $c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} (G_{i-2} + P_{i-2} c_{i-2})$. The expanded formula does not depend on intermediate carries, which allows the carry for each bit to be computed independently.
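The following Python sketch, again treating integers as bit vectors, computes P and G as defined above, forms every carry from the fully expanded formula so that no carry depends on another, and forms the sum as $s_i = c_i \oplus P_i \oplus G_i$. The function names are hypothetical; the test also checks the expansion against the recursive form $c_{i+1} = G_i + P_i c_i$.

```python
def propagate_generate(a: int, b: int, n: int) -> tuple[list[int], list[int]]:
    P = [((a >> i) | (b >> i)) & 1 for i in range(n)]   # P_i = a_i + b_i
    G = [((a >> i) & (b >> i)) & 1 for i in range(n)]   # G_i = a_i b_i
    return P, G


def expanded_carry(P: list[int], G: list[int], c0: int, i: int) -> int:
    """c_{i+1} from the fully expanded formula: uses only P, G and c0."""
    carry = 0
    for j in range(i + 1):
        term = G[j]                       # carry generated at bit j ...
        for k in range(j + 1, i + 1):
            term &= P[k]                  # ... propagated through bits j+1..i
        carry |= term
    term = c0
    for k in range(i + 1):
        term &= P[k]                      # carry-in propagated through bits 0..i
    return carry | term


def cla_add(a: int, b: int, n: int, c0: int = 0) -> tuple[int, int]:
    P, G = propagate_generate(a, b, n)
    # every carry is computed independently of the others
    c = [c0] + [expanded_carry(P, G, c0, i) for i in range(n)]
    total = 0
    for i in range(n):
        total |= (c[i] ^ P[i] ^ G[i]) << i    # s_i = c_i XOR P_i XOR G_i
    return total, c[n]
```

The nested loops mirror the growing AND terms of the expansion: the term for $c_{i+1}$ that starts at bit j needs a gate of fan-in i - j + 1, which is the fan-in growth discussed in the analysis of carry-lookahead adders.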
Depth-4 carry-lookahead
Analysis: The deepest carry expansion requires gates with large fan-in, which are large and slow. The carry-lookahead unit requires complex wiring between the adders and the lookahead unit; values must be routed back from the lookahead unit to the adder. Layout is even more complex with multiple levels of lookahead.
Carry-skip adder: Looks for cases in which the carry out of a set of bits is identical to the carry in. Typically organized into m-bit stages. If $a_i \neq b_i$ for every bit in a stage, every bit propagates, so a bypass gate sends the stage’s carry input directly to its carry output.
Two-bit carry-skip structure (figure: skipping carry path and internal carry path).
Carry-skip group structure (figure: two groups of three full adders, each group with a skip gate).
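A behavioral Python sketch of a carry-skip adder with m-bit ripple stages, assuming integer bit vectors and a hypothetical function name. The skip test is the condition from the slide ($a_i \neq b_i$ for every bit of the stage). Logically the bypass yields the same carry the ripple would; the gain is in timing only, which a behavioral model can record (here, by listing the skipped stages) but not measure.

```python
def carry_skip_add(a: int, b: int, n: int, m: int, cin: int = 0):
    """n-bit carry-skip adder built from m-bit ripple-carry stages.

    Returns (sum, carry_out, list of stage indices whose skip path was taken).
    """
    total, c, skipped = 0, cin, []
    for base in range(0, n, m):
        stage_cin = c
        all_propagate = True
        for i in range(base, min(base + m, n)):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            total |= (ai ^ bi ^ c) << i
            c = (ai & bi) | (ai & c) | (bi & c)     # internal ripple carry
            all_propagate &= ai != bi                # a_i != b_i: this bit propagates
        if all_propagate:
            # bypass gate: carry out equals carry in, forwarded without
            # waiting for the internal ripple (same logical value)
            c = stage_cin
            skipped.append(base // m)
    return total, c, skipped
```

For example, `carry_skip_add(0x0F, 0xF0, 8, 4, 1)` propagates in every bit, so both 4-bit stages take the skip path.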
Carry-select adder: Computes two results in parallel, each for a different carry-input assumption. Uses the actual carry in to select the correct result. This reduces the carry delay through each block to the delay of a multiplexer.
Carry-select structure
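A Python sketch of the carry-select idea: each m-bit block is added twice, once assuming carry in 0 and once assuming 1, and the real incoming carry drives a selection between the two. The block adders are modeled behaviorally with Python's `+`, m is assumed to divide n, and the function names are hypothetical.

```python
def block_add(a: int, b: int, cin: int, m: int) -> tuple[int, int]:
    """Behavioral m-bit block adder: returns (m-bit sum, carry out)."""
    t = a + b + cin
    return t & ((1 << m) - 1), t >> m


def carry_select_add(a: int, b: int, n: int, m: int, cin: int = 0) -> tuple[int, int]:
    assert n % m == 0, "sketch assumes equal-width blocks"
    mask = (1 << m) - 1
    total, c = 0, cin
    for base in range(0, n, m):
        ab, bb = (a >> base) & mask, (b >> base) & mask
        # both candidate results exist before the real carry arrives
        result0 = block_add(ab, bb, 0, m)
        result1 = block_add(ab, bb, 1, m)
        s, c = result1 if c else result0    # multiplexer driven by the actual carry
        total |= s << base
    return total, c
```

In hardware only the final selection sits on the inter-block carry path, which is the multiplexer delay mentioned above; the cost is a duplicated adder per block.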
Manchester carry chain: A precharged carry chain that uses P and G signals. The propagate signal connects adjacent carry bits; the generate signal discharges the carry bit. The worst-case discharge path goes through the entire carry chain.
Manchester carry chain circuit (figure: stages i-1 and i, with inputs $P_{i-1}$, $P_i$, $G_{i-1}$, $G_i$).
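A behavioral Python sketch of a Manchester chain that also reports the longest discharge path, measured in stages. It assumes the exclusive propagate $P_i = a_i \oplus b_i$, a common choice for Manchester chains, which differs from the inclusive $P_i = a_i + b_i$ on the carry-lookahead slide. The function names and the path-length bookkeeping are our own.

```python
def manchester_chain(a: int, b: int, n: int, c0: int = 0):
    """Return (carries c_1..c_n, longest discharge path in stages).

    Assumption: P_i = a_i XOR b_i, G_i = a_i AND b_i.
    """
    carries = []
    c, path, longest = c0, 0, 0    # path: stages the current carry has crossed
    for i in range(n):
        p = ((a >> i) ^ (b >> i)) & 1
        g = (a >> i) & (b >> i) & 1
        if g:        # generate: this stage discharges its carry node
            c, path = 1, 1
        elif p:      # propagate: pass transistor joins this node to the previous one
            path = path + 1 if c else 0
        else:        # kill: node stays precharged, meaning no carry
            c, path = 0, 0
        carries.append(c)
        longest = max(longest, path)
    return carries, longest


def chain_sum(a: int, b: int, n: int, c0: int = 0) -> tuple[int, int]:
    """Sum bits s_i = P_i XOR c_i using the chain's carries."""
    carries, _ = manchester_chain(a, b, n, c0)
    cin = [c0] + carries[:-1]      # carry into each bit
    s = sum(((((a >> i) ^ (b >> i)) & 1) ^ cin[i]) << i for i in range(n))
    return s, carries[-1]
```

With a = 0xFF and b = 0x01 (n = 8), bit 0 generates and every other bit propagates, so the discharge runs through all 8 stages: the worst case named on the slide.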
Serial adder: May be used in signal-processing arithmetic where fast computation is important but latency is unimportant. Data format (LSB first): ..., bit 3, bit 2, bit 1, bit 0.
Serial adder structure: The LSB control signal clears the carry shift register (figure).
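A cycle-by-cycle Python sketch of the serial adder: one full adder plus a stored carry, with the LSB flag clearing that carry at the start of each word. The stream helpers `to_stream` and `from_stream` are hypothetical conveniences for packing words LSB first.

```python
def serial_add(a_bits, b_bits, lsb_flags):
    """Bit-serial adder: one bit of each operand per clock cycle, LSB first."""
    carry, out = 0, []
    for a, b, lsb in zip(a_bits, b_bits, lsb_flags):
        if lsb:
            carry = 0                                  # LSB signal clears the carry register
        out.append(a ^ b ^ carry)                      # full-adder sum bit
        carry = (a & b) | (a & carry) | (b & carry)    # stored for the next cycle
    return out


def to_stream(words, n):
    """Serialize n-bit words LSB first, flagging each word's first bit."""
    bits = [(w >> i) & 1 for w in words for i in range(n)]
    flags = [1 if i == 0 else 0 for _ in words for i in range(n)]
    return bits, flags


def from_stream(bits, n):
    return [sum(bits[k + i] << i for i in range(n)) for k in range(0, len(bits), n)]
```

Adding the 4-bit word streams [5, 15, 9] and [6, 1, 3] gives [11, 0, 12]: the carry out of 15 + 1 is cleared by the next LSB flag, so it does not leak into 9 + 3.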
Power consumption of adders: Callaway and Swartzlander’s experiments on 16-bit adders found that constant-width carry-skip gave the lowest power consumption, ripple-carry was slightly higher, carry-lookahead was higher, and carry-select was highest.
ALUs: An ALU computes a variety of logical and arithmetic functions based on an opcode. It may offer the complete set of functions of two variables or a subset. The ALU is built around the adder, since the carry chain determines delay.
Function block circuit (figure: output out, data inputs a and b, control inputs ctrl0 to ctrl3).
Function blocks and ALUs: A function block may be used to compute the intermediate signals required for a full-function ALU. It requires little area, but its transmission gates may introduce significant delay.
ALU structure
ALU design: The P and G units compute intermediate values from the inputs; these may not correspond to the carry-lookahead P and G for non-addition functions. The add unit can be any adder design. The output unit computes the result from the sum and propagate signals.
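One way to see how function blocks feed an adder-based ALU is the Python sketch below. Each function block takes four control bits as the truth table of an arbitrary two-input function; the bit ordering (control bit 2a + b is the output for inputs a, b) and the opcode table are hypothetical, and the output unit is simplified to pass the sum straight through. The adder uses the carry-lookahead equations $s_i = c_i \oplus P_i \oplus G_i$ and $c_{i+1} = G_i + P_i c_i$. Setting G = 0 with carry in 0 keeps every carry at 0, so the sum equals P and the adder passes logic functions through unchanged.

```python
def function_block(a: int, b: int, ctrl: int) -> int:
    """Any function of two variables; ctrl is its 4-entry truth table.

    Assumption: ctrl bit (2*a + b) is the output for inputs a, b.
    """
    return (ctrl >> (2 * a + b)) & 1


# Hypothetical opcodes: (P truth table, G truth table, carry in)
OPCODES = {
    "add": (0b1110, 0b1000, 0),   # P = a OR b, G = a AND b
    "sub": (0b1101, 0b0100, 1),   # P = a OR NOT b, G = a AND NOT b: a + ~b + 1
    "and": (0b1000, 0b0000, 0),   # G = 0 and carry in 0: sum equals P
    "or":  (0b1110, 0b0000, 0),
    "xor": (0b0110, 0b0000, 0),
}


def alu(op: str, a: int, b: int, n: int) -> int:
    p_ctrl, g_ctrl, c = OPCODES[op]
    result = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        P = function_block(ai, bi, p_ctrl)    # P unit
        G = function_block(ai, bi, g_ctrl)    # G unit
        result |= (c ^ P ^ G) << i            # s_i = c_i XOR P_i XOR G_i
        c = G | (P & c)                       # c_{i+1} = G_i + P_i c_i
    return result                             # n-bit result, carry out dropped
```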
Topics: Multipliers.
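Elementary school algorithm: multiplicand 0110 × multiplier 1001. Each multiplier bit selects a partial product (0110 for × 1, 0000 for × 0), shifted one position further left for each bit; the running sums are 0110, 00110, 000110, and finally the product 0110110 (6 × 9 = 54).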
Combinational multiplier: Uses n adders and eliminates registers; each bit of the multiplier controls whether the corresponding addition occurs.
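A Python sketch of the combinational (shift-and-add) multiplier for unsigned n-bit operands: one addition per multiplier bit, gated by that bit. The function names are hypothetical, and the adders are modeled with Python's `+`.

```python
def partial_products(x: int, y: int, n: int) -> list[int]:
    """Partial product i is x shifted i places if multiplier bit i is 1, else 0."""
    return [(x if (y >> i) & 1 else 0) << i for i in range(n)]


def combinational_multiply(x: int, y: int, n: int) -> int:
    product = 0
    for pp in partial_products(x, y, n):   # one adder per multiplier bit
        product += pp
    return product
```

For the elementary-school example, `partial_products(0b0110, 0b1001, 4)` gives 0110, 0, 0, and 0110000, whose sum is 0110110.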
Array multiplier: An array multiplier is an efficient layout of a combinational multiplier. Array multipliers may be pipelined to decrease the clock period at the expense of latency.
Array multiplier organization (figure: the 0110 × 1001 example, with the partial-product array skewed into a rectangular layout; multiplicand, multiplier, and product labeled).
Unsigned array multiplier (figure: an array of adders summing partial-product bits $x_j y_i$ such as $x_0 y_0$, $x_1 y_0$, $x_0 y_1$, ..., producing product bits $P_0$ through $P_{2n-1}$).
Baugh-Wooley multiplier: An algorithm for two’s-complement multiplication. It adjusts the partial products to maximize the regularity of the multiplication array: partial products with negative signs are moved to the last steps, and the negation of those partial products is added rather than subtracted.
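The slide gives no equations, so the Python sketch below uses the commonly cited modified Baugh-Wooley formulation (our reconstruction, not taken from the lecture). The cross terms $a_{n-1} b_i$ and $b_{n-1} a_i$ carry negative weight; instead of subtracting them, the array adds their complements plus two constants, $2^n$ and $2^{2n-1}$, and keeps $2n$ result bits. Every term is an addition, which is what keeps the array regular.

```python
def baugh_wooley(a: int, b: int, n: int) -> int:
    """n-bit two's-complement multiply via the modified Baugh-Wooley array.

    a, b are n-bit patterns (0 <= a, b < 2**n); the result is the 2n-bit
    two's-complement pattern of the product. Every term is added.
    """
    def bit(v: int, i: int) -> int:
        return (v >> i) & 1

    s = n - 1                                      # index of the sign bit
    total = 0
    for i in range(s):                             # ordinary positive partial products
        for j in range(s):
            total += (bit(a, i) & bit(b, j)) << (i + j)
    total += (bit(a, s) & bit(b, s)) << (2 * s)    # sign x sign term is positive
    for i in range(s):
        # negative-weight cross terms, added in complemented form
        total += (1 - (bit(a, s) & bit(b, i))) << (s + i)
        total += (1 - (bit(b, s) & bit(a, i))) << (s + i)
    total += (1 << n) + (1 << (2 * n - 1))         # constant corrections
    return total % (1 << (2 * n))                  # keep 2n product bits
```

The constants come from rewriting each negative sum $-\sum x_i 2^{i}$ as $\sum \bar{x}_i 2^{i} - (2^{n-1} - 1)$ and collecting the leftover terms modulo $2^{2n}$.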
Booth multiplier: An encoding scheme that reduces the number of stages in multiplication. It handles two bits of the multiplier at once, so it requires half the stages. Each stage is slightly more complex than in a simple multiplier, but an adder/subtracter is almost as small and fast as an adder.
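Booth encoding: The two’s-complement form of the multiplier is $y = -2^n y_n + 2^{n-1} y_{n-1} + 2^{n-2} y_{n-2} + \cdots + 2^0 y_0$. Rewriting each positive term using $2^a = 2^{a+1} - 2^a$ gives $y = 2^n (y_{n-1} - y_n) + 2^{n-1} (y_{n-2} - y_{n-1}) + 2^{n-2} (y_{n-3} - y_{n-2}) + \cdots + 2^0 (y_{-1} - y_0)$, where $y_{-1} = 0$. Consider the terms two at a time: $2^{k+1} (y_k - y_{k+1}) + 2^k (y_{k-1} - y_k) = 2^k (y_{k-1} + y_k - 2 y_{k+1})$, so by looking at three bits of y, we can determine whether to add 0, ±x, or ±2x to the partial product.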
Booth actions ($y_i y_{i-1} y_{i-2}$ → increment): 000 → 0; 001 → x; 010 → x; 011 → 2x; 100 → -2x; 101 → -x; 110 → -x; 111 → 0.
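Booth example: $x = 011001$ ($25_{10}$), $y = 101110$ ($-18_{10}$). $y_1 y_0 y_{-1} = 100$: $P_1 = P_0 - (10 \times 011001) = 11111001110$. $y_3 y_2 y_1 = 111$: $P_2 = P_1 + 0 = 11111001110$. $y_5 y_4 y_3 = 101$: $P_3 = P_2 - 0110010000 = 11000111110$, where 0110010000 is x shifted left four places; $P_3$ is the 11-bit pattern for $-450_{10} = 25 \times (-18)$.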
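A Python sketch of radix-4 Booth multiplication that reproduces the example above. The recoding table is keyed by $(y_{i+1}, y_i, y_{i-1})$, which is the slide’s $(y_i, y_{i-1}, y_{i-2})$ with the index shifted by one; the function name and the requirement that n be even are assumptions of this sketch.

```python
# Radix-4 Booth recoding, keyed by (y_{i+1}, y_i, y_{i-1}): multiple of x to add
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_multiply(x: int, y: int, n: int):
    """Multiply signed x by y, given as an n-bit two's-complement pattern.

    n must be even: one partial product per pair of multiplier bits.
    Returns (product, running partial products P_1, P_2, ...).
    """
    assert n % 2 == 0

    def bit(i: int) -> int:
        return 0 if i < 0 else (y >> i) & 1    # y_{-1} = 0

    product, steps = 0, []
    for i in range(0, n, 2):
        multiple = BOOTH_TABLE[(bit(i + 1), bit(i), bit(i - 1))]   # 0, ±1 or ±2
        product += (multiple * x) << i        # weight 2^i for this bit pair
        steps.append(product)
    return product, steps
```

For x = 25 and y = 0b101110 with n = 6, the running products are -50, -50, and -450; masking them to 11 bits with `format(p & 0x7FF, "011b")` gives the bit patterns $P_1$, $P_2$, $P_3$ shown in the example.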
Booth structure
Wallace tree: Reduces the depth of the adder chain. Built from carry-save adders: each takes three inputs a, b, c and produces two outputs y, z such that $y + z = a + b + c$. Carry-save equations: $y_i = \mathrm{parity}(a_i, b_i, c_i)$, $z_{i+1} = \mathrm{majority}(a_i, b_i, c_i)$, with $z_0 = 0$.
Wallace tree structure
Wallace tree operation: At each stage, i numbers are combined to form $\lceil 2i/3 \rceil$ sums. A final adder completes the summation. Wiring is more complex. A Booth-encoded Wallace tree multiplier can also be built.
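A Python sketch of the carry-save adder and a Wallace-style reduction over non-negative integers. Each stage groups the operands in threes, passes leftovers through, and checks the $\lceil 2i/3 \rceil$ count from the slide; a single final addition (Python's `sum` standing in for the carry-propagate adder) finishes the job. The function names are hypothetical.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Returns y, z with y + z == a + b + c (non-negative integers)."""
    y = a ^ b ^ c                                  # y_i = parity(a_i, b_i, c_i)
    z = ((a & b) | (a & c) | (b & c)) << 1         # z_{i+1} = majority(a_i, b_i, c_i)
    return y, z


def wallace_sum(numbers: list[int]) -> tuple[int, int]:
    """Reduce operands with carry-save stages, then one final addition.

    Returns (sum, number of carry-save stages).
    """
    operands, stages = list(numbers), 0
    while len(operands) > 2:
        i = len(operands)
        groups = i // 3
        reduced = []
        for g in range(groups):
            reduced.extend(carry_save_add(*operands[3 * g:3 * g + 3]))
        reduced.extend(operands[3 * groups:])      # 0 to 2 leftovers pass through
        assert len(reduced) == (2 * i + 2) // 3     # ceil(2i/3) numbers remain
        operands, stages = reduced, stages + 1
    return sum(operands), stages                   # final carry-propagate adder
```

Eight partial products of an 8-bit multiply shrink 8 → 6 → 4 → 3 → 2 in four carry-save stages, compared with seven sequential additions in an adder chain.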
Serial-parallel multiplier: Used in serial-arithmetic operations. The multiplicand can be held in place by a register while the multiplier is shifted into the array.
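A cycle-level Python sketch of a serial-parallel multiplier for unsigned n-bit operands: the multiplicand stays in a register, one multiplier bit enters per cycle (LSB first), and one product bit leaves per cycle. Feeding n extra zero bits to flush the accumulator is an assumption of this sketch, as is the function name.

```python
def serial_parallel_multiply(x: int, y: int, n: int) -> int:
    """Unsigned n-bit multiply with y shifted in LSB first.

    x stays in a register; the product leaves one bit per cycle, LSB first.
    """
    acc, out_bits = 0, []
    y_stream = [(y >> i) & 1 for i in range(n)] + [0] * n   # n extra flush cycles
    for y_bit in y_stream:
        if y_bit:
            acc += x                 # add the held multiplicand
        out_bits.append(acc & 1)     # product bit emitted this cycle
        acc >>= 1                    # shift the accumulator right
    return sum(b << i for i, b in enumerate(out_bits))
```

Each cycle does one n-bit addition, so a 2n-bit product takes 2n cycles; this is the serial arithmetic trade of throughput against latency noted for the serial adder.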
Power consumption of multipliers: Callaway and Swartzlander compared, for bit widths of 8 through 32, an array multiplier, a Wallace tree without Booth encoding, and a Wallace tree with Booth encoding. The Wallace tree used significantly less power, and the advantage grew with word length.