Low-Power, High-Speed Multiplier Architectures
Shawn Nicholl
ELEC-5705y
March 7, 2005
Low-Power, High-Speed Multiplier Architectures
Agenda/Overview
Design Abstraction
Numbering Systems
Addition and Subtraction
Adder Architectures
Multiplication
Traditional Multiplier Architectures
Advanced Multiplier Architectures
2005/03/07 Low-Power, High-Speed Multiplier Architectures
Levels of Abstraction in Digital ICs
Low-power, high-speed techniques can be applied at many levels of abstraction.
Levels, from most to least abstract: Systems; Modules (multiplier architectures sit at this level); Logic Gates; Circuits; Devices.
Techniques applied at higher levels of abstraction have a greater effect on overall system performance.
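Numbering Systems – A Quick Review
Some common numbering systems (n digits or bits):
Decimal: range 0 to $10^n - 1$
Unsigned binary: range 0 to $2^n - 1$
Two's complement: range $-2^{n-1}$ to $+(2^{n-1} - 1)$
Examples (n = 8):
Sign | Decimal | Unsigned binary (magnitude) | Two's complement
+ | 10 | 0000 1010 | 0000 1010 (no conversion needed)
- | 45 | 0010 1101 | 1101 0011
$45_d = 0 \cdot 2^7 + 0 \cdot 2^6 + 1 \cdot 2^5 + 0 \cdot 2^4 + 1 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0$, i.e. 0010 1101b.
E.g. to form -45d in two's complement: invert the bits of 0010 1101b to get 1101 0010b, then add 1 to get 1101 0011b.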
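The conversion rules above can be checked with a small Python sketch. The function names (`to_twos_complement`, `invert_and_add_one`, `from_twos_complement`) are illustrative helpers, not from any library, and the examples assume an 8-bit width.

```python
def to_twos_complement(value: int, n: int) -> str:
    """Return the n-bit two's-complement bit string for value."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside the {n}-bit range [{lo}, {hi}]")
    # Masking with 2^n - 1 maps a negative value to 2^n + value,
    # which equals "invert the bits of |value| and add 1".
    return format(value & ((1 << n) - 1), f"0{n}b")


def invert_and_add_one(bits: str) -> str:
    """Negate an n-bit pattern the way the slide does: invert, then add 1."""
    n = len(bits)
    inverted = "".join("1" if b == "0" else "0" for b in bits)
    return format((int(inverted, 2) + 1) % (1 << n), f"0{n}b")


def from_twos_complement(bits: str) -> int:
    """Interpret a bit string as a signed two's-complement number."""
    raw = int(bits, 2)
    # When the MSB is set, that bit carries weight -2^(n-1) instead of +2^(n-1).
    return raw - (1 << len(bits)) if bits[0] == "1" else raw


print(to_twos_complement(45, 8))         # 00101101
print(invert_and_add_one("00101101"))    # 11010011
print(from_twos_complement("11010011"))  # -45
```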
Adding and Subtracting
The two's-complement algorithm is consistent: addition and subtraction behave the same, and negative numbers are treated the same as positive numbers.
Example: add -45d to 10d.
Signed decimal method:
Step 1) Initialize: augend -45d, addend 10d.
Step 2) Compare magnitudes and swap so that the augend holds the larger number: 45d and -10d.
Step 3) Treat the operation as a subtraction.
Step 4) Do the subtraction (borrows may be required): 45d - 10d = 35d.
Step 5) Negate the result, knowing that the augend was negative: -35d.
Two's-complement method:
Step 1) Initialize: 10d = 0000 1010b, -45d = 1101 0011b.
Step 2) Add (no special rules): 0000 1010b + 1101 0011b = 1101 1101b.
Converting two's complement back to decimal: 1101 1101b = -35d.
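The "no special rules" claim can be illustrated with a minimal sketch that adds two 8-bit patterns with an ordinary unsigned adder and simply discards the carry out of the MSB. The width `N = 8` and the helper names are assumptions for illustration; overflow detection is deliberately left out.

```python
N = 8                      # word width assumed for this sketch
MASK = (1 << N) - 1


def to_signed(raw: int) -> int:
    """Interpret an N-bit pattern as a two's-complement value."""
    return raw - (1 << N) if raw >> (N - 1) else raw


def add_twos_complement(a: int, b: int) -> tuple[int, str]:
    """Add two N-bit two's-complement numbers with a plain unsigned adder.

    No sign comparison, operand swap, or result negation is needed; the
    carry out of the MSB is simply discarded. Overflow is not detected.
    """
    raw = ((a & MASK) + (b & MASK)) & MASK
    return to_signed(raw), format(raw, f"0{N}b")


print(add_twos_complement(10, -45))   # (-35, '11011101')
```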
Adding and Subtracting (Example 2)
Example 2: subtract -45d from 10d.
Signed decimal method:
Step 1) Initialize: minuend 10d, subtrahend -45d.
Step 2) The subtrahend is negative, so negate it and do an addition: 10d + 45d = 55d.
Two's-complement method:
Step 1) Initialize: 10d = 0000 1010b, -45d = 1101 0011b.
Step 2) Invert the subtrahend (1101 0011b becomes 0010 1100b), set CIN = 1, and add: 0000 1010b + 0010 1100b + 1b = 0011 0111b.
Converting two's complement back to decimal: 0011 0111b = 55d.
Subtraction logic can be shared with addition logic!
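The shared add/subtract datapath can be sketched as one adder whose B input is optionally inverted and whose carry-in is set to 1 for subtraction. The `adder` function below is a hypothetical stand-in for an N-bit hardware adder, with N assumed to be 8.

```python
N = 8                      # word width assumed for this sketch
MASK = (1 << N) - 1


def adder(a: int, b: int, cin: int) -> int:
    """Hypothetical N-bit adder: returns (a + b + cin) mod 2^N."""
    return (a + b + cin) & MASK


def add_sub(a: int, b: int, subtract: bool) -> int:
    """Shared datapath: a - b is computed as a + (inverted b) + 1."""
    b_bits = b & MASK
    if subtract:
        b_bits ^= MASK                     # invert the subtrahend
    return adder(a & MASK, b_bits, 1 if subtract else 0)  # CIN = 1 for subtraction


def to_signed(raw: int) -> int:
    """Interpret an N-bit pattern as a two's-complement value."""
    return raw - (1 << N) if raw >> (N - 1) else raw


result = add_sub(10, -45, subtract=True)
print(format(result, "08b"), to_signed(result))  # 00110111 55
```

The only extra hardware compared with a plain adder is one XOR per bit (to invert B) and control of the carry-in.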
Adder Building Blocks
Half adder: $S_n = A_n \oplus B_n$, $CO_n = A_n \cdot B_n$
Full adder: $S_n = A_n \oplus B_n \oplus CIN_n$, $COUT_n = A_n \cdot B_n + CIN_n \cdot (A_n \oplus B_n)$
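A minimal bit-level sketch of the two building blocks, using Python's `^`, `&` and `|` on single bits. The exhaustive loop checks that the full adder's (carry, sum) pair always equals the integer sum of its three inputs.

```python
from itertools import product


def half_adder(a: int, b: int) -> tuple[int, int]:
    s = a ^ b          # S = A xor B
    co = a & b         # CO = A . B
    return s, co


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    s = a ^ b ^ cin                    # S = A xor B xor CIN
    cout = (a & b) | (cin & (a ^ b))   # COUT = A.B + CIN.(A xor B)
    return s, cout


# Exhaustive check against integer addition
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
print("full adder matches a + b + cin for all 8 input combinations")
```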
Adder Architectures (CRA)
Carry Ripple Adder (CRA): N full adders in series, each passing its carry-out to the next stage's carry-in.
Gate count, area, delay and power all grow linearly with N (O(N)).
Layout friendly (low fan-in/fan-out; regular structure).
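A minimal sketch of a carry ripple adder built from the full adder above. The loop visits the N stages from LSB to MSB, and each stage must wait for the previous stage's carry, which is the source of the O(N) delay. The function name `ripple_carry_add` is illustrative.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit unsigned numbers; returns (sum mod 2^n, carry out)."""
    carry = cin
    total = 0
    for i in range(n):  # stage i needs the carry produced by stage i-1
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_carry_add(10, 211, 8))  # (221, 0)
```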
Adder Architectures (CLA)
Carry Lookahead Adder (CLA)
Generate: $G_n = A_n \cdot B_n$
Propagate: $P_n = A_n + B_n$
Recursive relationship: $CIN_n = G_{n-1} + P_{n-1} \cdot CIN_{n-1}$
If the previous stage generates a carry, then there is a carry-in to the current stage; or, if the previous stage has a carry-in and propagates that carry-in, then there is a carry-in to the current stage.
Expanding the recursion gives every carry directly from the generates, the propagates and $CIN_0$:
$CIN_n = G_{n-1} + P_{n-1}G_{n-2} + P_{n-1}P_{n-2}G_{n-3} + \cdots + P_{n-1}P_{n-2}\cdots P_1 G_0 + P_{n-1}P_{n-2}\cdots P_0 CIN_0$
Source: Patterson and Hennessy, Figure A.14
CLA: delay grows as $\log_2 N$ (if built right, i.e. as a tree of lookahead units).
Gate count and power are greater than for a CRA.
Not layout friendly (high fan-in; difficult to route).
Shows the technique of parallelism to make the circuit faster.
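The expanded carry equation can be evaluated directly, as in the sketch below: every carry is formed from G, P and $CIN_0$ only, never from another carry, which is what lets hardware compute them in parallel. The function names are illustrative; Python evaluates the product terms sequentially, so this models the logic, not the timing.

```python
def cla_carries(a: int, b: int, n: int, cin0: int = 0) -> list[int]:
    """Return [CIN_0, CIN_1, ..., CIN_n] using only G, P and CIN_0."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # G_i = A_i . B_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]  # P_i = A_i + B_i
    carries = [cin0]
    for k in range(1, n + 1):
        c = 0
        for j in range(k):              # term G_j . P_{j+1} ... P_{k-1}
            term = g[j]
            for m in range(j + 1, k):
                term &= p[m]
            c |= term
        term = cin0                     # term P_0 ... P_{k-1} . CIN_0
        for m in range(k):
            term &= p[m]
        c |= term
        carries.append(c)
    return carries


def cla_add(a: int, b: int, n: int, cin0: int = 0) -> tuple[int, int]:
    c = cla_carries(a, b, n, cin0)
    s = 0
    for i in range(n):
        s |= (((a >> i) ^ (b >> i) ^ c[i]) & 1) << i   # S_i = A_i xor B_i xor CIN_i
    return s, c[n]


print(cla_add(10, 211, 8))  # (221, 0)
```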
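Adder Architectures (CSA)
Carry Save Adder: each full adder works independently, saving its carry as a separate output instead of passing it to its neighbour, so a CSA stage is very fast and its delay does not depend on N.
The resulting sum and carry words must still be combined by a carry-propagate adder.
A pipelined architecture adds flops and control logic, which increase area and latency.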
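A word-level sketch of one carry-save stage, assuming non-negative integers: three operands are reduced to a sum word and a carry word with no carry propagation between bit positions. The name `carry_save_add` is illustrative.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Reduce three operands to (sum word, carry word) without propagating carries.

    Each bit position is an independent full adder: its sum bit stays in
    place and its carry bit (the majority of the three inputs) moves one
    position left.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


s, c = carry_save_add(10, 45, 99)
print(s, c, s + c)  # 68 86 154
```

The final `s + c` stands in for the carry-propagate adder (for example a CRA or CLA) that a real design still needs.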
Unsigned Multiplication
Example: multiply 118d (multiplicand) by 99d (multiplier) using the shift-and-add algorithm.
Step 1) Initialize
Step 2) Find the partial products
Step 3) Sum up the shifted partial products
Decimal method: 118d x 9d = 1062d; 118d x 90d = 10620d (1062d shifted one place); 1062d + 10620d = 11682d.
Binary method: 118d = 0111 0110b, 99d = 0110 0011b.
Partial products, one per multiplier bit (LSB first), each shifted left by its bit position:
bit 0 = 1: 0111 0110b
bit 1 = 1: 0111 0110b
bit 2 = 0: 0000 0000b
bit 3 = 0: 0000 0000b
bit 4 = 0: 0000 0000b
bit 5 = 1: 0111 0110b
bit 6 = 1: 0111 0110b
bit 7 = 0: 0000 0000b
Sum of the shifted partial products: 0010 1101 1010 0010b = 11682d
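The three steps above map onto a few lines of Python. This is a sketch for unsigned operands only; `partial_products` and `shift_and_add` are illustrative names.

```python
def partial_products(multiplicand: int, multiplier: int, n: int) -> list[int]:
    """One partial product per multiplier bit: the shifted multiplicand if the bit is 1, else 0."""
    return [(multiplicand << i) if (multiplier >> i) & 1 else 0 for i in range(n)]


def shift_and_add(multiplicand: int, multiplier: int, n: int) -> int:
    return sum(partial_products(multiplicand, multiplier, n))


pps = partial_products(118, 99, 8)
for i, pp in enumerate(pps):
    print(f"bit {i}: {pp >> i:08b} << {i}")   # unshifted value and its shift
p = shift_and_add(118, 99, 8)
print(p, format(p, "016b"))  # 11682 0010110110100010
```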
Shift-and-Add Multiplier
B (multiplicand) x A (multiplier) = P (product)
Takes N cycles to complete: $T_{lat} = (T_{N\text{-bit ADD}} + T_{shift}) \times N$
Requires minimal logic (most of the logic is in the adder).
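A cycle-level sketch of the sequential multiplier. The register arrangement is an assumption (one common textbook layout): a 2N-bit product register whose low half starts out holding the multiplier A and whose high half accumulates; each cycle does at most one N-bit add and one right shift, giving the N-cycle latency in the formula above.

```python
def sequential_shift_add(b: int, a: int, n: int) -> tuple[int, int]:
    """Model an n-bit unsigned shift-and-add multiplier; returns (product, cycles)."""
    mask = (1 << n) - 1
    hi, lo = 0, a & mask          # {hi, lo} is the 2n-bit product register
    cycles = 0
    for _ in range(n):
        carry = 0
        if lo & 1:                # current multiplier bit selects "add B"
            total = hi + b        # one n-bit addition
            hi, carry = total & mask, total >> n
        # one right shift of {carry, hi, lo}
        lo = (lo >> 1) | ((hi & 1) << (n - 1))
        hi = (hi >> 1) | (carry << (n - 1))
        cycles += 1
    return (hi << n) | lo, cycles


print(sequential_shift_add(118, 99, 8))  # (11682, 8)
```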
Basic Signed Multiplication
Basic idea: convert the operands to unsigned (magnitude) form, multiply them with a shift-and-add multiplier, then convert the result back to signed form (negating it if the operand signs differ).
Drawback: the conversions require extra hardware!
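A sketch of the convert-multiply-convert approach. `unsigned_multiply` is a hypothetical stand-in for any N-bit unsigned multiplier; the sign handling around it is what corresponds to the extra hardware.

```python
def unsigned_multiply(b: int, a: int, n: int) -> int:
    """Hypothetical n-bit unsigned multiplier (shift-and-add)."""
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    return sum(b << i for i in range(n) if (a >> i) & 1)


def signed_via_unsigned(b: int, a: int, n: int) -> int:
    """Signed multiply: take magnitudes, multiply unsigned, then fix the sign."""
    negative = (a < 0) != (b < 0)            # XOR of the two sign bits
    mag = unsigned_multiply(abs(b), abs(a), n)
    return -mag if negative else mag         # conditional negation on the way out


print(signed_via_unsigned(-118, -99, 8))  # 11682
```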
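Signed Multiplication: Booth Recoding
Re-codes the multiplier operand so that each string of 1s produces only two nonzero partial products (-B at its low end, +B just above its high end), which reduces the number of nonzero partial products for multipliers with long runs of 1s.
Works for signed (two's-complement) numbers.
Radix-2 recoding table ($A_n$ is the current low-order bit; $A_{n-1}$ is the last bit shifted out, initially 0):
$A_n$ $A_{n-1}$ | Partial product
0 0 | 0
0 1 | +B
1 0 | -B
1 1 | 0
Example: multiply -118d by -99d.
Recall 99d = 0110 0011b; inverting gives 1001 1100b, and adding 1b gives -99d = 1001 1101b.
Radix-2 Booth recoding of -99d (MSB to LSB): -1 0 +1 0 0 -1 +1 -1, i.e. -B, 0, +B, 0, 0, -B, +B, -B.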
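The recoding table reduces to digit = $A_{n-1} - A_n$, which the sketch below applies bit by bit with the "last bit shifted out" starting at 0. `booth_radix2_digits` is an illustrative name; digits are returned LSB first.

```python
def booth_radix2_digits(a: int, n: int) -> list[int]:
    """Radix-2 Booth recoding of an n-bit two's-complement multiplier (LSB first).

    Pair (A_i, A_{i-1}): 00 -> 0, 01 -> +1 (+B), 10 -> -1 (-B), 11 -> 0.
    """
    bits = a & ((1 << n) - 1)
    digits = []
    prev = 0                      # last bit shifted out, initially 0
    for i in range(n):
        cur = (bits >> i) & 1
        digits.append(prev - cur)
        prev = cur
    return digits


d = booth_radix2_digits(-99, 8)
print(d)                                       # [-1, 1, -1, 0, 0, 1, 0, -1]
print(sum(di << i for i, di in enumerate(d)))  # -99
```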
Radix-2 Booth Multiplication
Example: multiply -118d by -99d using radix-2 Booth.
Step 1) Initialize: B = -118d = 1000 1010b; -B = 118d = 0111 0110b; A = -99d = 1001 1101b.
Step 2) Find the partial products from the recoded multiplier (LSB first), sign-extending negative values:
bit 0: -B (0111 0110b), shift 0
bit 1: +B (1000 1010b, sign-extended), shift 1
bit 2: -B (0111 0110b), shift 2
bit 3: 0, shift 3
bit 4: 0, shift 4
bit 5: +B (1000 1010b, sign-extended), shift 5
bit 6: 0, shift 6
bit 7: -B (0111 0110b), shift 7
Step 3) Sum up the shifted partial products: 0010 1101 1010 0010b
Converting two's complement back to decimal: 0010 1101 1010 0010b = 11682d
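A sketch of the full radix-2 Booth multiply. Python integers are unbounded and sign-extend automatically when shifted, which stands in for the explicit sign extension on the slide. The function name is illustrative.

```python
def booth_radix2_multiply(b: int, a: int, n: int) -> tuple[int, list[int]]:
    """Multiply n-bit two's-complement b (multiplicand) by a (multiplier).

    Returns the product and the list of shifted partial products.
    """
    bits = a & ((1 << n) - 1)
    prev = 0                       # last bit shifted out
    pps = []
    for i in range(n):
        cur = (bits >> i) & 1
        digit = prev - cur         # 01 -> +B, 10 -> -B, 00/11 -> 0
        pps.append((digit * b) << i)
        prev = cur
    return sum(pps), pps


product_val, pps = booth_radix2_multiply(-118, -99, 8)
print(pps)  # [118, -236, 472, 0, 0, -3776, 0, 15104]
print(product_val, format(product_val & 0xFFFF, "016b"))  # 11682 0010110110100010
```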
Array Multiplier
The partial products from the radix-2 Booth example are summed by a combinational array of adder rows rather than one per clock cycle.
Combinatorial, so it is very fast: delay grows linearly with N.
Can be pipelined.
Very regular structure.
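A bit-level sketch of an unsigned array multiplier. The structure is an assumption (the simple ripple-carry array): an N x N grid of AND gates forms the partial-product bits, and each of the N-1 adder rows adds one new partial product to the running sum with N full adders. The loops only walk the grid; in hardware every cell exists at once.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def array_multiply(a: int, b: int, n: int) -> int:
    a_bits = [(a >> j) & 1 for j in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    row = [a_bits[j] & b_bits[0] for j in range(n)]   # row 0: AND gates only
    result_bits = [row[0]]          # product bit 0 is final after row 0
    upper = row[1:] + [0]           # weights 2^1 .. 2^n
    for i in range(1, n):           # one adder row per remaining multiplier bit
        pp = [a_bits[j] & b_bits[i] for j in range(n)]  # weights 2^i .. 2^(i+n-1)
        carry = 0
        new_row = []
        for j in range(n):
            s, carry = full_adder(upper[j], pp[j], carry)
            new_row.append(s)
        result_bits.append(new_row[0])   # lowest bit of this row is final
        upper = new_row[1:] + [carry]    # row's carry-out becomes the top bit
    result_bits.extend(upper)            # remaining high-order product bits
    return sum(bit << k for k, bit in enumerate(result_bits))


print(array_multiply(118, 99, 8))  # 11682
```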
Array Multiplier Structure
Source: J. Kuo and J. Lou, Low-Voltage CMOS VLSI Circuits, 1999
Radix-4 Booth Multiplication
Similar to radix-2, but looks at two low-order bits at a time (instead of one), plus the last bit shifted out.
Radix-4 recoding table:
$A_{2n+1}$ $A_{2n}$ $A_{2n-1}$ | Partial product
0 0 0 | 0
0 0 1 | +B
0 1 0 | +B
0 1 1 | +2B
1 0 0 | -2B
1 0 1 | -B
1 1 0 | -B
1 1 1 | 0
Recall 99d = 0110 0011b; inverting gives 1001 1100b, and adding 1b gives -99d = 1001 1101b.
Radix-4 Booth recoding of -99d (most significant digit first): -2, +2, -1, +1, i.e. -2B, +2B, -B, +B.
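The table above can be applied directly in a short sketch. It assumes N is even and takes $A_{-1} = 0$; each returned digit (LSB first) has weight $4^i$. `booth_radix4_digits` and `RADIX4_TABLE` are illustrative names.

```python
# (A_{2i+1}, A_{2i}, A_{2i-1}) -> multiple of B
RADIX4_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): +1, (0, 1, 0): +1, (0, 1, 1): +2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_radix4_digits(a: int, n: int) -> list[int]:
    """Radix-4 Booth recoding of an n-bit two's-complement multiplier (n even)."""
    if n % 2:
        raise ValueError("n must be even")
    bits = a & ((1 << n) - 1)

    def bit(k: int) -> int:
        return (bits >> k) & 1 if k >= 0 else 0   # A_{-1} = 0

    return [RADIX4_TABLE[(bit(2 * i + 1), bit(2 * i), bit(2 * i - 1))]
            for i in range(n // 2)]


print(booth_radix4_digits(-99, 8))  # [1, -1, 2, -2]
```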
Radix-4 Booth Multiplication
Example: multiply -118d by -99d using radix-4 Booth.
Step 1) Initialize: B = -118d = 1000 1010b; -B = 118d = 0111 0110b; 2B = -236d = 1 0001 0100b; -2B = 236d = 0 1110 1100b; A = -99d = 1001 1101b.
Step 2) Find the partial products from the recoded multiplier (least significant digit first), sign-extending negative values:
digit 0: +B (1000 1010b), shift 0 = -118d
digit 1: -B (0111 0110b), shift 2 = 472d
digit 2: +2B (1 0001 0100b), shift 4 = -3776d
digit 3: -2B (0 1110 1100b), shift 6 = 15104d
Step 3) Sum up the shifted partial products: 0010 1101 1010 0010b
Converting two's complement back to decimal: 0010 1101 1010 0010b = 11682d
Reduces the number of partial products by half (from 8 to 4 here)!
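A sketch of the radix-4 multiply: each recoded digit selects 0, ±B or ±2B, and successive partial products are shifted two places apart. Names are illustrative; Python's unbounded integers again model sign extension.

```python
RADIX4_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): +1, (0, 1, 0): +1, (0, 1, 1): +2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_radix4_multiply(b: int, a: int, n: int) -> tuple[int, list[int]]:
    """Multiply n-bit (n even) two's-complement numbers with radix-4 Booth."""
    bits = a & ((1 << n) - 1)

    def bit(k: int) -> int:
        return (bits >> k) & 1 if k >= 0 else 0

    pps = []
    for i in range(n // 2):                        # n/2 partial products
        digit = RADIX4_TABLE[(bit(2 * i + 1), bit(2 * i), bit(2 * i - 1))]
        pps.append((digit * b) << (2 * i))         # 0, +/-B or +/-2B, shifted 2i
    return sum(pps), pps


p, pps = booth_radix4_multiply(-118, -99, 8)
print(pps)  # [-118, 472, -3776, 15104]
print(p)    # 11682
```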
Tree Multiplier: Wallace Tree
[Figure panels: Original Structure; Tree Structure]
Sums the partial products in a tree instead of row by row, which reduces the number of full adders in series on the critical path.
Uses 3:2 compressors (aka full adders).
Delay grows as $\log_{3/2} N$.
Irregular structure is difficult to lay out.
Source: J. Kuo and J. Lou, Low-Voltage CMOS VLSI Circuits, 1999
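A word-level sketch of Wallace-style reduction, assuming unsigned operands: at each level, rows are grouped in threes and each group is compressed to two rows by 3:2 compressors, with leftover rows passed through, until two rows remain for a final carry-propagate add. A real Wallace tree wires compressors per bit column; this model only tracks the rows and the number of levels (8 rows reduce 8 -> 6 -> 4 -> 3 -> 2).

```python
def compress_3_2(x: int, y: int, z: int) -> tuple[int, int]:
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def wallace_reduce(rows: list[int]) -> tuple[int, int]:
    """Return (final sum, number of 3:2 reduction levels)."""
    levels = 0
    rows = list(rows)
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        next_rows = []
        for k in range(0, full, 3):
            next_rows.extend(compress_3_2(*rows[k:k + 3]))
        next_rows.extend(rows[full:])     # 0, 1 or 2 rows pass through
        rows = next_rows
        levels += 1
    return sum(rows), levels              # sum() stands in for the final adder


def wallace_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    pps = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    return wallace_reduce(pps)


print(wallace_multiply(118, 99, 8))  # (11682, 4)
```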
Twin Pipe Serial-Parallel Multiplier
One operand is fed in parallel; the other operand is fed serially.
Even data bits are processed on the rising clock edge and odd data bits on the falling clock edge.
Features: low area, low power, high latency.
Source: S. Shah, et al., “Comparison of 32-bit Multipliers for Various Performance Measures”, 2000.
Cluster Multiplication
Divide the circuit into clusters of nibble-wide (4-bit) multiplications.
If all bits in a nibble are zeros, use clock gating to disable the multiplication for that nibble.
Features: low power (claims 13% savings).
Source: A. Fayed, M. Bayoumi, “A Novel Architecture for Low-Power Design of Parallel Multipliers”, 2001.
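A rough software sketch of the gating idea, not the paper's architecture. It assumes, for illustration only, that clusters are 4-bit slices of the multiplier A. A cluster whose slice is all zeros contributes nothing, so its logic could be clock-gated; here it is simply skipped, and the count of active clusters hints at how much switching is avoided.

```python
def cluster_multiply(b: int, a: int, n: int, cluster_bits: int = 4) -> tuple[int, int]:
    """Unsigned multiply by nibble clusters; returns (product, active clusters)."""
    mask = (1 << cluster_bits) - 1
    product = 0
    active = 0
    for k in range(0, n, cluster_bits):
        nibble = (a >> k) & mask
        if nibble == 0:
            continue                  # gated cluster: no switching activity
        active += 1
        product += (nibble * b) << k  # nibble-wide multiply, shifted into place
    return product, active


print(cluster_multiply(118, 99, 8))    # (11682, 2)
print(cluster_multiply(118, 0x30, 8))  # (5664, 1)
```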
Multiplexer-Based Array Multiplier
Characteristics:
Fast (because it is array-based).
Unlike Booth, does not require encoding logic.
Processes 1 bit of the multiplier and 1 bit of the multiplicand at a time, so it is symmetric.
Has a zigzag shape, so it is not layout friendly.
Source: K. Pekmestzi, “Multiplexer-Based Array Multipliers”, 1999.
Area-Efficient Multiplexer-Based Multiplier
Characteristics:
Increases each row to N+1 cells (instead of N).
Depth is cut in half (increases “squareness”).
Source: Y. Wang, Y. Jiang, E. Sha, “On Area-Efficient Low Power Array Multipliers”, 2001.
Low-Latency Booth-Encoding-Based Pipeline Multiplier
Features:
Delay proportional to N/4.
Needs an (N + N/2)-bit addition at the end.
Uses CLAs instead of CSAs, because the longest stage (i.e. the adder at the end) determines the maximum operating frequency.
Source: X. Wu, H. Chen, S. Wei, “Design of a Low Latency High Speed Pipelining Multiplier”, 2001.
Two's-Complement Gray-Encoded Array Multiplier
Characteristics:
Uses Gray code to reduce the switching activity of the multiplier.
Claims that traditional Booth uses 45% more power.
Greater area than traditional Booth.
Source: E. Costa, et al., “A New Architecture for 2’s Complement Gray Encoded Array Multiplier”, 2002.
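A small sketch of why Gray code can lower switching activity: consecutive Gray codewords differ in exactly one bit. It counts bit toggles over an 8-bit counting sequence in binary and in Gray code. This only illustrates the encoding property; the savings in the cited multiplier depend on its operand statistics, which this sketch does not model. Function names are illustrative.

```python
def to_gray(x: int) -> int:
    """Binary-reflected Gray code."""
    return x ^ (x >> 1)


def from_gray(g: int) -> int:
    x = 0
    while g:          # x = g xor (g >> 1) xor (g >> 2) xor ...
        x ^= g
        g >>= 1
    return x


def toggles(seq: list[int]) -> int:
    """Total number of bit flips between consecutive words in seq."""
    return sum(bin(p ^ q).count("1") for p, q in zip(seq, seq[1:]))


values = list(range(256))
print(toggles(values))                        # 502 (binary counting)
print(toggles([to_gray(v) for v in values]))  # 255 (Gray counting)
```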
Project Plan
Start | End | Task
- | 03/05 | Research multiplier circuits
03/06 | 03/12 | Code multipliers in Verilog HDL
03/13 | 03/19 | Synthesize all multiplier circuits
03/20 | 03/26 | Analyze results (delay/power/area)
03/27 | 04/02 | Prepare report
04/03 | 04/09 | Prepare for final exam
04/10 | 04/16 | Complete report and submit
References
S. Shah, A.J. Al-Khalili, D. Al-Khalili, “Comparison of 32-bit Multipliers for Various Performance Measures”, Proc. 2000 Int’l Conf. on Microelectronics, pp. 75-80, 2000.
D. Patterson, J. Hennessy, Computer Architecture – A Quantitative Approach, 2nd ed., San Francisco, CA: Morgan Kaufmann Publishers, Inc., 1996.
X. Wu, H. Chen, S. Wei, “Design of a Low Latency High Speed Pipelining Multiplier”, Proc. 2001 Int’l Conf. on ASIC, pp. 551-554, 2001.
J. Wakerly, Digital Design – Principles and Practices, 2nd ed., Englewood Cliffs, NJ: Prentice Hall, 1994.
J. Kuo and J. Lou, Low-Voltage CMOS VLSI Circuits, New York, NY: John Wiley & Sons, Inc., 1999.
K. Pekmestzi, “Multiplexer-Based Array Multipliers”, IEEE Trans. on Computers, vol. 48, pp. 15-23, 1999.
A. Fayed, M. Bayoumi, “A Novel Architecture for Low-Power Design of Parallel Multipliers”, Proc. 2001 IEEE Computer Society Workshop on VLSI, pp. 149-154, 2001.
Y. Wang, Y. Jiang, E. Sha, “On Area-Efficient Low Power Array Multipliers”, Proc. 2001 IEEE Int’l Conf. on Electronics, Circuits and Systems, vol. 3, pp. 1429-1432, 2001.