Aug Shift Operations Source: David Harris
Shifter Implementation: Regular layout that can be made compact; uses transmission gates to avoid a threshold drop. Not amenable to synthesis, and capacitive loading is high for large arrays. Source: David Harris
Shifter Implementation: Each level shifts by a power of two (1, 2, 4, ...) or passes its input through. Amenable to synthesis, fast.
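A minimal behavioural sketch of the multi-level shifter described above, assuming level k either shifts by 2^k or passes its input through under control of bit k of the shift amount. The function name `log_shift_left`, the 8-bit default width, and the restriction to left shifts are illustrative assumptions, not part of the slides.

```python
def log_shift_left(value: int, amount: int, width: int = 8) -> int:
    """Left shift built from mux levels; level k shifts by 2**k or passes through.

    Assumes 0 <= amount < width (hypothetical interface for illustration).
    """
    mask = (1 << width) - 1
    levels = (width - 1).bit_length()      # e.g. 3 levels for an 8-bit shifter
    result = value & mask
    for k in range(levels):
        if (amount >> k) & 1:              # control bit k selects the shifted path
            result = (result << (1 << k)) & mask
    return result


if __name__ == "__main__":
    print(bin(log_shift_left(0b00000101, 3)))
```

The number of levels grows with the logarithm of the word width, which is why this structure stays fast for wide operands.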
Multiplication Source: David Harris
Array Multiplier with CPAs: Array multiplier built from carry-propagate adders (CPAs); it has multiple near-critical paths. Source: Jan Rabaey
Array Multiplier with CSAs: Only one critical path. Source: Jan Rabaey
How do CSAs work? CSA: carry-save adder. We want to add these four numbers together (the same problem as adding partial products in a multiplier). Source: David Harris
How do CSAs work? (cont.) A full-adder network can add three numbers together if we view the carry-in inputs as a bus that carries the third number. The output is a sum vector and a carry vector, which must then be added to produce the final result. Source: David Harris
How do CSAs work? (cont.) The carry vector has to be shifted left by 1 before being added to the sum vector, because each COUT bit has twice the weight of the corresponding sum bit. Source: David Harris
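The carry-save step can be sketched bitwise on Python integers. This is a behavioural illustration only: the names `carry_save_add` and `add_four` are made up here, and the final `+` stands in for the carry-propagate adder.

```python
def carry_save_add(a: int, b: int, c: int):
    """One row of full adders: three input vectors in, (sum, carry) vectors out."""
    sum_vec = a ^ b ^ c                               # per-bit sum output
    carry_vec = ((a & b) | (a & c) | (b & c)) << 1    # per-bit COUT, shifted left by 1
    return sum_vec, carry_vec


def add_four(w: int, x: int, y: int, z: int) -> int:
    """Add four numbers with two carry-save rows and one final carry-propagate add."""
    s, c = carry_save_add(w, x, y)
    s, c = carry_save_add(s, c, z)
    return s + c                                      # the only carry-propagate addition


if __name__ == "__main__":
    print(carry_save_add(0b0101, 0b0011, 0b0110))
    print(add_four(1, 2, 3, 4))
```

No carry ripples inside `carry_save_add`; every bit position is computed independently, which is the reason the rows are fast.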
CSA Multiplier: The carry is shifted left before being added. This final addition is always N/2 bits wide if the product has N bits. Large multipliers need a fast adder structure for this addition. Source: Jan Rabaey
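As a rough behavioural model of the CSA multiplier, the partial products can be accumulated in carry-save form, with a single carry-propagate addition at the end. The function name `csa_multiply` and the unsigned, `n`-bit multiplier assumption are illustrative; the sketch does not model the array's wiring or the exact width of the final adder.

```python
def csa_multiply(y: int, x: int, n: int = 8) -> int:
    """Unsigned multiply: one carry-save row per multiplier bit, then one final add.

    Assumes 0 <= x < 2**n (hypothetical interface for illustration).
    """
    sum_vec, carry_vec = 0, 0
    for i in range(n):
        pp = (y << i) if (x >> i) & 1 else 0          # AND-gate row: Y gated by bit X[i]
        new_sum = sum_vec ^ carry_vec ^ pp
        carry_vec = ((sum_vec & carry_vec) | (sum_vec & pp) | (carry_vec & pp)) << 1
        sum_vec = new_sum
    return sum_vec + carry_vec                        # final carry-propagate addition


if __name__ == "__main__":
    print(csa_multiply(13, 11))
```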
Multiplier Layout: The layout can be made rectangular. Source: David Harris
2’s Complement Multiply Definition: The MSb has negative weight. 4-bit 2’s complement example: $-5 = \mathtt{0xB} = 1011_2 = -1\cdot 2^3 + 0\cdot 2^2 + 1\cdot 2^1 + 1\cdot 2^0 = -8 + 2 + 1 = -5$. Source: David Harris
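The negative-weight rule can be checked with a few lines of Python. The helper name `twos_complement_value` and the bit-string input format are assumptions made for this illustration.

```python
def twos_complement_value(bits: str) -> int:
    """Value of a 2's complement bit string, MSb first; the MSb has weight -2**(n-1)."""
    n = len(bits)
    value = -int(bits[0]) * 2 ** (n - 1)              # negative-weight MSb
    for i, b in enumerate(bits[1:], start=1):
        value += int(b) * 2 ** (n - 1 - i)            # remaining bits have positive weight
    return value


if __name__ == "__main__":
    print(twos_complement_value("1011"))
```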
2’s Complement Multiplication Source: David Harris
Modified Baugh-Wooley Multiplier (2’s complement): Pre-compute the sums of the constant ‘1’ terms and push some terms upwards. Source: David Harris
Multiplier Layout for Two’s Complement: Shaded cells are the cells modified for Baugh-Wooley. Source: David Harris
Booth Encoding: The previous multipliers use radix 2: one bit of the multiplier is examined at a time. In general, a radix-$2^r$ multiplier produces N/r partial products (assuming an NxN multiplier). Fewer partial products lead to smaller, faster CSA arrays. A radix-4 (radix-$2^2$) multiplier produces N/2 partial products. Multiplying Y by a two-bit group $X_1X_0$ gives Y*0, Y*1, Y*2, or Y*3. Y*0, Y*1, and Y*2 are easy and fast (Y*2 is a shift). Y*3 is hard: it has to be computed as Y*3 = Y*(2+1) = 2Y + Y, which involves a carry-propagate addition.
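Radix-4 Partial Products: For $Y \times X_{N-1}X_{N-2}\ldots X_3X_2X_1X_0$, the partial products are $Y\cdot X_1X_0$, $Y\cdot X_3X_2$, ..., $Y\cdot X_{N-1}X_{N-2}$, each row shifted two bit positions left of the previous one. The number of partial products is reduced. Source: David Harris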
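A small sketch of plain (non-Booth) radix-4 partial-product generation, assuming an unsigned multiplier with an even bit count `n`. The function name `radix4_partial_products` is hypothetical. The `y * digit` term hides the hard Y*3 case that Booth encoding is meant to avoid.

```python
def radix4_partial_products(y: int, x: int, n: int) -> list:
    """Split the n-bit multiplier x into two-bit groups; return one partial product per group."""
    partial_products = []
    for i in range(0, n, 2):
        digit = (x >> i) & 0b11                       # group X[i+1] X[i], a value 0..3
        partial_products.append((y * digit) << i)     # weighted by 4**(i // 2)
    return partial_products


if __name__ == "__main__":
    pps = radix4_partial_products(5, 0b1110, 4)
    print(pps, sum(pps))
```

For an N-bit multiplier this yields N/2 rows instead of N.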
Booth Encoding (cont.): Observe that 2Y = 4Y - 2Y and 3Y = 4Y - Y. 4Y is simply Y in the next partial-product row, so use -2Y or -Y in the current row and add Y to the next row. In both cases, Y has to be added to the next partial product. Booth encoding therefore looks at the current 2 bits and the MSB of the previous 2 bits, and modifies the partial product accordingly: if the MSB of the previous pair is ‘1’, add Y to the current value.
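Booth Encoding (cont.): Partial product (PP) selected for each combination of the current two bits and the MSB of the previous pair:

  Current bits   Previous MSB   PP
  00             0              0*Y = 0
  00             1              0*Y + Y = Y
  01             0              Y + 0 = Y
  01             1              Y + Y = 2Y
  10             0              -2Y + 0 = -2Y
  10             1              -2Y + Y = -Y
  11             0              -Y + 0 = -Y
  11             1              -Y + Y = 0

Select signals: sign bit select, 2Y select, 1Y select. Negative operations are done at the bit level as complements, with +1 added to the PP to complete the 2’s complement. Source: David Harris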
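The encoding table can be exercised with a short Python model. It assumes an unsigned multiplier with an even bit count `n`, and it produces `n // 2 + 1` digits so that a final ‘1’ in the top pair still gets its extra Y. The names `BOOTH_TABLE`, `booth_digits`, and `booth_multiply` are illustrative, and the arithmetic on signed Python integers stands in for the complement-plus-one hardware trick.

```python
# Key: (current two bits, MSB of previous pair) packed as a 3-bit number.
BOOTH_TABLE = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_digits(x: int, n: int) -> list:
    """Recode an unsigned n-bit multiplier into radix-4 digits in {-2, -1, 0, 1, 2}."""
    padded = x << 1                                   # append an implicit 0 below bit 0
    digits = []
    for i in range(n // 2 + 1):                       # one extra digit for unsigned operands
        window = (padded >> (2 * i)) & 0b111          # bits X[2i+1], X[2i], X[2i-1]
        digits.append(BOOTH_TABLE[window])
    return digits


def booth_multiply(y: int, x: int, n: int) -> int:
    """Sum the Booth partial products, each weighted by 4**i."""
    return sum((d * y) << (2 * i) for i, d in enumerate(booth_digits(x, n)))


if __name__ == "__main__":
    print(booth_digits(0b0110, 4))
    print(booth_multiply(7, 0b0110, 4))
```

Every digit is 0, ±1, or ±2, so each partial product needs only a shift and an optional negation; no 3Y term is ever formed.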
Booth Selection Logic: Replaces the AND gates in the CSA array. When -Y is chosen, a ‘1’ has to be added to complete the two’s complement. Source: David Harris
Unsigned R-4 Booth Array (16 x 16). Figure annotations: sign extension (either all 1’s or all 0’s) for the -Y terms; a ‘1’ or ‘0’ needed to complete the 2’s complement; an extra PP in case the last PP needs a ‘Y’ added in (the last two X bits were either 2 or 3). Source: David Harris
Optimized R-4 Booth Array (unsigned): SSSS = S; additional reduction produces this array. Source: David Harris
Signed R-4 Booth Array (16 x 16): $e_i = M_i \oplus y_{15}$. The last partial product, PP8, is not needed for a signed multiply. Source: David Harris
Booth Speedup: Radix-4 arrays are 20 to 50% smaller than CSA arrays and up to 20% faster. Higher-radix multipliers are possible, but are not worth it except for larger multipliers (at least 64 bits).
Wallace Trees: A CSA array just adds the PPs together one at a time. (A 3,2 counter is another name for a full adder.) Source: David Harris
Wallace Trees (cont.): A Wallace tree adds the partial products in parallel. Number of levels: [formula not recoverable]. The layout is not regular, and long wires can cause delay. Source: David Harris
Compressor: Used to reduce the number of levels in a Wallace tree. Number of levels: [formula not recoverable]. The logic is more complex than a full adder, but the layout is more regular. Source: David Harris
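The level counts can be estimated by simulating the row reduction rather than quoting a closed form. The sketch below assumes each cell takes a fixed number of input rows and produces two output rows, so 3 inputs models a 3,2 counter and 4 inputs models a 4:2 compressor (the 4:2 choice is an assumption; the text above does not state the compressor size). The function name `reduction_levels` is hypothetical.

```python
def reduction_levels(rows: int, inputs_per_cell: int) -> int:
    """Count tree levels needed to reduce `rows` partial-product rows down to 2."""
    levels = 0
    while rows > 2:
        full_groups, leftover = divmod(rows, inputs_per_cell)
        # A leftover group of more than 2 rows still uses one cell (unused inputs tied to 0);
        # 1 or 2 leftover rows simply pass through to the next level.
        rows = 2 * full_groups + min(leftover, 2)
        levels += 1
    return levels


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, reduction_levels(n, 3), reduction_levels(n, 4))
```

The count only reflects the number of levels. A compressor level is slower than a full-adder level, so fewer levels does not translate one-to-one into delay.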
Multiplier Summary:
- CSAs: simple, but many partial products.
- Booth encoding: reduces the number of required PPs and achieves a speedup over CSAs.
- Wallace trees: add the PPs in parallel.