Decimal Floating-Point Multiplication via Carry-Save Addition
Mark Erle, Systems & Technology Group, International Business Machines
Brian Hickmann & Mike Schulte, Electrical & Computer Engineering, University of Wisconsin at Madison

Outline
Introduction and motivation
Extensions to fixed-point design
Implementation highlights
Verification and synthesis results
Summary

Introduction
Preponderance of business data in decimal form
Inexact mapping between decimal and binary
Decimal arithmetic used/required in banking, finance, insurance, and accounting
Increasing support in the arithmetic community (IEEE P754 in ballot review process)
Multiplication is a key function

Motivation
What is involved in extending fixed-point multiplication to support floating-point?
What are the similarities and differences with binary floating-point (BFP) multiplication?

Intermediate Exponent Calculation
Preferred exponent:
  $PE = E_A + E_B - \text{bias}$
Based on the location of the decimal point (effective shift right by p digits):
  $IE_{IP} = PE + p$
After left shifting the intermediate product by SLA digits:
  $IE_{SIP} = IE_{IP} - SLA$

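The three exponent formulas can be sketched in a few lines of Python. This is an illustrative model, not the RTL: exponents are plain integers, the helper name `intermediate_exponents` is hypothetical, and the decimal64 values p = 16 and bias = 398 appear only as example parameters.

```python
# Illustrative decimal64 parameters (example values only).
P_DEC64 = 16      # digits of precision
BIAS_DEC64 = 398  # exponent bias


def intermediate_exponents(e_a: int, e_b: int, sla: int, p: int, bias: int):
    """Return (PE, IE_IP, IE_SIP) for biased operand exponents e_a and e_b.

    Hypothetical helper mirroring the three formulas on the slide.
    """
    pe = e_a + e_b - bias   # adding two biased exponents counts the bias twice
    ie_ip = pe + p          # keeping the upper p of 2p product digits = right shift by p
    ie_sip = ie_ip - sla    # each digit of left shift lowers the exponent by one
    return pe, ie_ip, ie_sip


# A = 123 x 10^2 and B = 4567 x 10^-1 with p = 4 and bias 0 (unbiased exponents).
pe, ie_ip, ie_sip = intermediate_exponents(2, -1, sla=1, p=4, bias=0)
print(pe, ie_ip, ie_sip)  # 1 5 4
```

In this example the coefficient product is 561741 at exponent PE = 1, i.e. 5,617,410. In an 8-digit (2p) field it is 00561741: its upper four digits read 0056 at IE_IP = 5, and after a left shift of SLA = 1 they read 0561 at IE_SIP = 4. Both are truncations of the exact value; the remaining leading zero is the off-by-one case the guard digit covers.
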
Intermediate Product Shifting
Based on leading zero counts of operands
SLA may be off by one; need guard digit
  $SLA = \min(LZ_A + LZ_B,\ p)$
Shift right when $IE_{IP} < E_{min}$

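A behavioral sketch of the shift-amount rule, assuming operands are nonnegative integer coefficients of at most p digits (p ≥ 2). It computes SLA from the operands' leading zero counts, applies the shift to the 2p-digit product, and extracts the upper p digits C, the guard digit g, and the round digit r. The names `leading_zeros` and `shift_product` are hypothetical, and the right-shift path for $IE_{IP} < E_{min}$ is not modeled.

```python
def leading_zeros(coeff: int, width: int) -> int:
    """Number of leading zero digits of coeff in a field `width` digits wide."""
    return width - len(str(coeff)) if coeff else width


def shift_product(a: int, b: int, p: int):
    """Return (SLA, C, g, r, msd_zero) for p-digit coefficients a and b (p >= 2)."""
    sla = min(leading_zeros(a, p) + leading_zeros(b, p), p)
    ip = a * b                       # intermediate product, at most 2p digits
    sip = ip * 10 ** sla             # left shift; still fits in 2p digits
    c = sip // 10 ** p               # upper p digits
    g = (sip // 10 ** (p - 1)) % 10  # guard digit
    r = (sip // 10 ** (p - 2)) % 10  # round digit
    # MSD of C is zero when SLA fell one digit short (or SLA was capped at p).
    msd_zero = c < 10 ** (p - 1)
    return sla, c, g, r, msd_zero


print(shift_product(456, 7890, 4))   # (1, 3597, 8, 4, False)
print(shift_product(123, 4567, 4))   # (1, 561, 7, 4, True)
```

Both calls use the same SLA, but in the second the product has one fewer significant digit than $S_A + S_B$, so C keeps a leading zero and the guard digit still holds a significant digit.
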
Sticky Bit Generation
Logically, all digits beyond the round digit must be ORed after left shifting
  $SC = S_{IP} - p - 2$, where the 2 accounts for the guard (g) and round (r) digits
Generate the sticky bit on the fly, ORing one digit at a time while decrementing SC
  $SC = \max(0,\ p - (LZ_A + LZ_B))$
  – $S_{IP} - p = ((p - LZ_A) + (p - LZ_B)) - p$
  – Calculate two cycles before it is needed

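Assuming a digit-serial multiplier retires one final low-order product digit per iteration, least significant first, the sticky bit can be accumulated as those digits appear. In this sketch SC is taken as the number of product digits that land below g and r after the shift, $p - 2 - SLA$ clamped at zero, which equals $S_{IP} - p - 2$ whenever SLA matches the product's leading-zero count. The function names are hypothetical; `sticky_reference` ORs the shifted digits directly so the two can be compared.

```python
def leading_zeros(coeff: int, width: int) -> int:
    """Number of leading zero digits of coeff in a field `width` digits wide."""
    return width - len(str(coeff)) if coeff else width


def sticky_on_the_fly(a: int, b: int, p: int) -> bool:
    """OR the product digits that fall below the round digit, one per step."""
    sla = min(leading_zeros(a, p) + leading_zeros(b, p), p)
    sc = max(0, p - 2 - sla)     # product digits below g and r after the shift
    ip = a * b
    sticky = False
    position = 0
    while sc > 0:                # one step per retired low-order digit
        digit = (ip // 10 ** position) % 10
        sticky = sticky or digit != 0
        position += 1
        sc -= 1
    return sticky


def sticky_reference(a: int, b: int, p: int) -> bool:
    """Direct definition: any nonzero digit below r in the shifted 2p-digit field."""
    sla = min(leading_zeros(a, p) + leading_zeros(b, p), p)
    return (a * b * 10 ** sla) % 10 ** (p - 2) != 0


print(sticky_on_the_fly(123, 4567, 4), sticky_on_the_fly(456, 7890, 4))  # True False
```
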
Rounding Scheme
No rounding overflow, which simplifies the scheme
Unique compound adder needed
  – The shifted intermediate product (SIP) may be in redundant form
  – Requires $C_{SIP} + 0$ and $C_{SIP} + 1$, named $C_{+0}$ and $C_{+1}$
Possible corrective left shift (cls) of one digit
  – $S_{IP} = S_A + S_B$ or $S_A + S_B - 1$
  – Adder is p digits wide
  – Concatenate g or g + 1

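Why one digit of corrective shift is enough: since $10^{S_A - 1} \le A < 10^{S_A}$ and likewise for B, $10^{S_A + S_B - 2} \le A \cdot B < 10^{S_A + S_B}$, so the product has either $S_A + S_B - 1$ or $S_A + S_B$ significant digits. The small Python check below, with hypothetical helper names, confirms this exhaustively for p = 2; an exhaustive check is only practical for small p, but the bound holds for any p.

```python
from itertools import product


def digits(x: int) -> int:
    """Significant digits of a nonnegative integer (0 for zero)."""
    return len(str(x)) if x else 0


def check_product_digits(p: int) -> bool:
    """Confirm S_IP in {S_A + S_B, S_A + S_B - 1} for all nonzero p-digit operands."""
    for a, b in product(range(1, 10 ** p), repeat=2):
        s_a, s_b, s_ip = digits(a), digits(b), digits(a * b)
        if s_ip not in (s_a + s_b, s_a + s_b - 1):
            return False
    return True


print(check_product_digits(2))  # True
```
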
Rounding Scheme (Continued)
Three cases based on the MSDs of $C_{+0}$ and $C_{+1}$
  – No leading zeros, no corrective left shift
  – Leading zeros, possible corrective left shift
  – Zero followed by all nines
Logically, select one among the following
  – $C_{+0}$, $C_{+1}$
  – $(C_{+0} \ll 1) \parallel g$, $(C_{+0} \ll 1) \parallel (g + 1)$
  – $(C_{+1} \ll 1) \parallel g$, $(C_{+1} \ll 1) \parallel (g + 1)$
  – Zero, largest finite number, infinity

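A behavioral sketch of the three cases, assuming round-half-even (the design supports all rounding modes; only one is modeled here) and ignoring exponent-range limits on the corrective shift. It computes the selected value arithmetically rather than reproducing the hardware multiplexer, so the candidate expressions appear only in comments. The function names are hypothetical, and the special results (zero, largest finite number, infinity) are not modeled.

```python
def round_up_half_even(lsd: int, rnd: int, sticky: bool) -> bool:
    """Increment decision for round-half-even, given the kept LSD and the round digit."""
    if rnd != 5:
        return rnd > 5
    return sticky or lsd % 2 == 1


def select_result(c0: int, g: int, r: int, sticky: bool, p: int):
    """Return (coefficient, cls) for a p-digit C+0 with guard g, round r, and sticky."""
    c1 = c0 + 1                        # the compound adder supplies both C+0 and C+1
    if c0 >= 10 ** (p - 1):            # case 1: no leading zero, round on g
        if round_up_half_even(c0 % 10, g, r != 0 or sticky):
            return c1, 0
        return c0, 0
    shifted = c0 * 10 + g              # (C+0 << 1) || g
    if not round_up_half_even(g, r, sticky):
        return shifted, 1
    if shifted + 1 < 10 ** p:          # (C+0 << 1) || (g+1), or (C+1 << 1) || 0 when g == 9
        return shifted + 1, 1
    return c1, 0                       # case 3: 0 followed by all 9s rounds to C+1 = 10**(p-1)


print(select_result(561, 7, 4, True, 4))  # (5617, 1)
```

The usage line continues the earlier 123 × 4567 example with p = 4: C has a leading zero, so the result is shifted left one digit, g is appended, and the discarded digits (410) round down.
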
Exception Detection & Handling
Invalid operation
  – sNaN (pass the significand of the sNaN)
  – $0 \times \infty$ (produce a qNaN with significand 0)
Overflow (and Inexact)
  – $IE_{IP} - SLA > E_{max}$
  – Increase SLA until all LZs are removed
Underflow (and possibly Inexact)
  – $IE_{IP} - SLA < E_{min}$
  – Decrease SLA until 0, then shift right
Inexact

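The overflow and underflow shift adjustments can be sketched as a small control function. It assumes integer exponents in one consistent convention, treats $LZ_A + LZ_B$ as the count of removable leading zeros (ignoring the product's possible extra leading zero), and only reports the conditions; selecting zero, the largest finite number, or infinity and raising Inexact are left out. The function name and return tuple are hypothetical.

```python
def exponent_adjust(ie_ip: int, lz_a: int, lz_b: int, p: int, e_min: int, e_max: int):
    """Return (SLA, right_shift, overflow) for the intermediate exponent ie_ip."""
    lz = lz_a + lz_b
    sla = min(lz, p)
    right_shift = 0
    overflow = False
    if ie_ip - sla > e_max:
        # Overflow side: keep shifting left while leading zeros remain.
        sla = min(lz, ie_ip - e_max)
        overflow = ie_ip - sla > e_max
    elif ie_ip - sla < e_min:
        # Underflow side: shift left less, down to zero ...
        sla = max(0, ie_ip - e_min)
        # ... then shift right if the exponent is still below e_min.
        right_shift = max(0, e_min - ie_ip)
    return sla, right_shift, overflow


print(exponent_adjust(-3, 3, 4, 16, 0, 767))  # (0, 3, False)
```

In the usage line the example range 0..767 stands in for biased exponent limits; the intermediate exponent is 3 below the minimum, so SLA drops to 0 and the product is shifted right 3 digits.
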
Implementation Highlights
Leverage the operands' leading zero counts (LZCs)
  – SC, SLA, and $IE_{SIP}$
Handle NaNs with minimal overhead
  – No dataflow modification
  – Coerce multiplicand or multiplier to 1
Support gradual underflow
  – No dataflow modification
  – Simply extend the number of iterations
Simple, control-based rounding scheme

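The NaN trick can be sketched as a choice of which coefficients enter the unchanged multiplier: forcing the other operand to 1 makes the product equal to the NaN's payload. This is a minimal model under stated assumptions: A is taken as the multiplicand and B as the multiplier, the `Operand` type and its `kind` strings are hypothetical, the priority (sNaN over qNaN, A over B on a tie) is assumed, and quieting the sNaN and the $0 \times \infty$ case are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    coeff: int          # coefficient, or the NaN payload when kind is a NaN
    kind: str = "num"   # "num", "qnan", or "snan" (hypothetical encoding)


def multiplier_inputs(a: Operand, b: Operand) -> tuple:
    """Coefficients to feed the unchanged multiplier dataflow.

    Assumed priority: an sNaN beats a qNaN, and A beats B on a tie.
    """
    for kind in ("snan", "qnan"):
        if a.kind == kind:
            return a.coeff, 1   # coerce the multiplier (B) to 1: A's payload passes through
        if b.kind == kind:
            return 1, b.coeff   # coerce the multiplicand (A) to 1: B's payload passes through
    return a.coeff, b.coeff


x, y = multiplier_inputs(Operand(7), Operand(42, "snan"))
print(x * y)  # 42
```
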
RTL Model and Verification
Verilog models for both the fixed-point and floating-point multiplier designs
All rounding modes, NaNs, and exceptions
Over 500,000 random and directed test cases
  – IBM decNumber based
  – IBM Haifa's FPgen (IEEE 754R compliance)
  – IBM dectest
Validated pre- and post-synthesis

Synthesis Results
64-bit (16-digit) operands, DPD (densely packed decimal) encoded
LSI Logic's gflxp 0.11 µm CMOS, 55 ps FO4
Synopsys Design Compiler
Results
  – Fixed-point: 119,653 µm², … FO4s
  – Floating-point: 237,607 µm², … FO4s
Critical path
  – Fixed-point: 4:2 compressor (accumulator)
  – Floating-point: 128-bit barrel shifter

Applicability to Parallel Designs
Applicable
  – IE and IP shift generation
  – Rounding scheme
  – NaN handling
  – Exception detection and handling
Not applicable
  – On-the-fly sticky bit generation

Sequential vs. Parallel
Sequential
  – Less area
  – Potentially better cycle time
Parallel
  – Less latency
  – Higher throughput

Summary
Extended a fixed-point serial multiplier to support floating-point
Leveraged the operands' LZCs
Developed an efficient rounding scheme
Verified RTL and gate-level models
Presented area and delay numbers for fixed- and floating-point designs
Discussed applicability to parallel designs

And there you have it! Long live the decimal system!

Backup Slides

No Rounding Overflow
If $S_{IP} = 2p - 1$
  – The MSD is 0
  – Incrementing will not cause rounding overflow
If $S_{IP} = 2p$
  – Rounding overflow would require the upper p digits to be a string of p 9s
  – Any value whose upper p digits are p 9s is at least $(10^p - 1) \cdot 10^p$, which exceeds the maximum product $(10^p - 1)^2 = 10^{2p} - 2 \cdot 10^p + 1$
  – No rounding overflow possible
Simplifies the rounding scheme

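The $S_{IP} = 2p$ argument reduces to one inequality whose gap is exactly $10^p - 1$. A quick numeric check in Python, with hypothetical helper names, confirms the closed-form gap for many p and an exhaustive small case.

```python
def max_product(p: int) -> int:
    """Largest product of two p-digit coefficients."""
    return (10 ** p - 1) ** 2


def smallest_all_nines_upper(p: int) -> int:
    """Smallest 2p-digit value whose upper p digits are all 9s."""
    return (10 ** p - 1) * 10 ** p


# The gap is exactly 10**p - 1, so the inequality holds for every p >= 1.
for p in range(1, 35):
    assert smallest_all_nines_upper(p) - max_product(p) == 10 ** p - 1

# Exhaustive confirmation for p = 2: the upper two digits of a 4-digit product never read 99.
assert all((a * b) // 100 != 99 for a in range(100) for b in range(100))
print("no rounding overflow for p = 2")
```
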
Decimal Storage Format