By Liang-Kai Wang and Michael J. Schulte
Joseph Schneider
March 12, 2010

Goal is to improve latency for the decimal floating-point (DFP) adder.
A number of modifications were made to achieve this, such as the implementation of a new internal format.
Overall focus is on the design of a decimal leading-zero anticipator (LZA).

Detects location of the most significant nonzero bit.
Previous designs have been for binary, not decimal.
Design of a decimal LZA is expected to improve latency.

Exponent field is uncompressed.
Significand is encoded in BCD.
New field for the leading zero count (LZC); removes leading zero detection from the critical path.

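As a rough illustration of the internal format described above, the sketch below models an operand with an uncompressed exponent, a BCD significand, and a stored leading zero count. The names `InternalOperand` and `make_operand`, the field names, and the 7-digit example are assumptions made for illustration; the slides do not give field names or widths.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class InternalOperand:
    """Hypothetical model of the internal format (all names are assumptions)."""
    sign: int                     # 0 = positive, 1 = negative
    exponent: int                 # kept uncompressed, as a plain integer
    significand: Tuple[int, ...]  # one BCD digit (0-9) per entry, most significant first
    lzc: int                      # leading zero count carried along with the operand


def make_operand(sign: int, exponent: int, digits: Sequence[int]) -> InternalOperand:
    """Build an operand and count its leading zero digits once, up front."""
    if any(not 0 <= d <= 9 for d in digits):
        raise ValueError("each significand entry must be a BCD digit 0-9")
    lzc = 0
    for d in digits:
        if d != 0:
            break
        lzc += 1
    return InternalOperand(sign, exponent, tuple(digits), lzc)


x = make_operand(0, -3, [0, 0, 1, 2, 3, 4, 5])
print(x.lzc)  # 2
```

Because the count travels with the operand, a consumer can read `x.lzc` directly instead of scanning the significand again, which is the point of taking leading zero detection off the critical path.
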
Internal format removes the need for forward and backward conversion units.
Pre-correction is moved in front of the swapping unit and duplicated; this keeps it out of the critical path.
Leading zero detection is no longer performed in the shift amount unit; the leading zero count is now an input signal.
LZA is used so that later decimal operations do not need to recalculate the leading zero count.

Needed in addition and subtraction to guarantee that the leading zero count of the output is correct.
Only needed when the result after addition or subtraction is not rounded; LZC is always zero when the result is rounded.

Preliminary LZC is the minimum number of leading zeros between the two significands being added.
If there is a carry, the final LZC is obtained by reducing the preliminary LZC by one.

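The rule for addition can be checked with a small behavioral model. This is only a sketch: it computes the sum directly to see whether a carry occurred, whereas a hardware LZA determines this in parallel with the adder. The function names and the fixed `width` parameter (number of significand digits, operands assumed already aligned) are assumptions for illustration.

```python
def leading_zero_count(value: int, width: int) -> int:
    """Count leading zero digits of a non-negative value in a fixed-width field."""
    digits = str(value).zfill(width)
    if len(digits) > width:
        raise ValueError("value does not fit in the given width")
    return len(digits) - len(digits.lstrip("0"))


def anticipated_lzc_add(a: int, b: int, width: int) -> int:
    """Leading zero count of a + b, derived from the operands' counts plus a carry check."""
    preliminary = min(leading_zero_count(a, width), leading_zero_count(b, width))
    if preliminary == 0:
        # The leading digit of the result is nonzero with or without a carry out.
        return 0
    # A carry into the digit just above the leading nonzero digit
    # removes one leading zero from the result.
    carry = (a + b) >= 10 ** (width - preliminary)
    return preliminary - 1 if carry else preliminary


print(anticipated_lzc_add(12345, 98765, 7))  # 1  (0012345 + 0098765 = 0111110)
print(anticipated_lzc_add(12345, 1, 7))      # 2  (0012345 + 0000001 = 0012346)
```
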
Requires an Encoding unit, a Correction unit, and a parallel array of decimal digit adders.
Encoding unit
◦ Converts BCD digits into strings of zeros and ones
◦ Detects position of most significant non-zero digit in the string
Correction unit
◦ Flag generation modules and correction trees determine if correction needs to be performed on Encoding unit’s result

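The two Encoding unit steps can be pictured with a heavily simplified, single-operand sketch: one flag per BCD digit, then a search for the first set flag. This is an assumption-laden illustration, not the design from the slides; the actual unit works on both operands alongside the digit adders, and the Correction unit is not modeled here. The function names are hypothetical.

```python
from typing import Sequence


def encode_nonzero_flags(digits: Sequence[int]) -> str:
    """Map each BCD digit to '1' if it is nonzero and '0' otherwise."""
    if any(not 0 <= d <= 9 for d in digits):
        raise ValueError("each entry must be a BCD digit 0-9")
    return "".join("1" if d else "0" for d in digits)


def leading_nonzero_position(flags: str) -> int:
    """Index of the first '1' from the most significant end; len(flags) if none."""
    pos = flags.find("1")
    return len(flags) if pos < 0 else pos


flags = encode_nonzero_flags([0, 0, 3, 0, 7])
print(flags)                            # 00101
print(leading_nonzero_position(flags))  # 2
```
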
IBM decNumber library version 3.56 was used to verify correctness of the adder.
Sign, exponent, and the length and value of the significand were randomly generated.
Adder successfully passed numerous random tests and the corner cases of IBM’s test suite.
Previous adder version and new adder were implemented in Verilog RTL using TSMC 45nm bulk technology.

Both designs use the same floorplan, so Area Util. Rate reflects how much area is used by each design.
New adder is 14% faster, but at the cost of 18% more area.

LZA takes up a significant amount of area, though the Kogge-Stone adder is still the largest component.

LZA synthesized alone; critical path has a maximum delay of 24 FO4 inverter delays.
Subtractor takes up over 60% of the LZA area.