Ph.D final defence1 Algorithms and Architectures for Decimal Transcendental Function Computation Ph.D Candidate: Dongdong Chen Department of Electrical and Computer Engineering University of Saskatchewan

Ph.D final defence2 Outline Research Background and Motivation Table-based First-Order Polynomial Approximation Digit-Recurrence with Selection by Rounding Function Iteration Method Conclusion

Ph.D final defence3 Research Background Why Decimal Arithmetic?

Ph.D final defence4 Objectives (Con.) Algorithm, architecture and VLSI circuit design for DFP arithmetic (Lecture 2) –DFP adder/subtractor –DFP multiplier –DFP divider –DFP transcendental function computation

Ph.D final defence5 Background "Decimal computer arithmetic went out of style 25 to 30 years ago; no one uses it now." Is that true?

Ph.D final defence6 Introduction Decimal is still essential for specific applications –Numbers in commercial databases are decimal –Extensive use of decimal in commercial applications –Survey of commercial databases report –Decimal fixed-point or floating-point numbers How to process decimal computation –Software computation –Convert back to decimal representation –Problems

Ph.D final defence7 Introduction (Con.) Errors from decimal and binary conversion –Example 1: represent 0.1 in DFP or BFP Decimal representation (BCD code): 0.0001 (exact) Binary representation: 0.00011… (non-terminating; a truncated value is only 0.09…) –Example 2: telephone billing Cost: 0.70; Tax: 5% BFP arithmetic: 0.6999…8 × 1.05 = 0.734999…, which rounds to 0.73 DFP arithmetic: 0.70 × 1.05 = 0.735, which rounds to 0.74 Decimal integer, fixed-point or floating-point? Decimal hardware or software solutions?
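The two examples on this slide can be reproduced with a short sketch. It uses Python's standard `decimal` module as a software stand-in for DFP arithmetic; the variable names and the choice of round-half-even to whole cents are assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN

CENT = Decimal("0.01")

# Example 1: 0.1 has no finite binary expansion, so the double only approximates it.
binary_tenth = Decimal(0.1)      # exact value of the binary double nearest to 0.1
decimal_tenth = Decimal("0.1")   # exact decimal value

# Example 2: a 0.70 charge with 5% tax, rounded to whole cents.
bfp_total = Decimal(0.70 * 1.05).quantize(CENT, rounding=ROUND_HALF_EVEN)
dfp_total = (Decimal("0.70") * Decimal("1.05")).quantize(CENT, rounding=ROUND_HALF_EVEN)

print(binary_tenth == decimal_tenth)
print(bfp_total, dfp_total)
```

The binary product falls just below 0.735, so it rounds down by one cent, while the decimal product is exactly 0.7350 and rounds up.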

Ph.D final defence8 Current Research DFP arithmetic is defined in IEEE 754-2008 IBM computing systems include DFP hardware –IBM Power6, z9, z10 Intel includes a DFP software solution –Intel DFP software computation library DFP arithmetic IP blocks: –Basic DFP arithmetic IPs: DFP adder/subtractor, multiplier, divider, square root etc. –Transcendental DFP arithmetic IPs: DFP CORDIC, logarithm, antilogarithm, reciprocal etc.

Ph.D final defence9 DFP Arithmetic in IEEE 754-2008 Review BFP arithmetic in IEEE 754-2008 How to define new DFP in IEEE 754-2008

Ph.D final defence10 BFP Floating-point representation Representation: –sign, exponent, significand (or mantissa): (–1)^sign × significand × 2^exponent –more bits for significand gives more accuracy –more bits for exponent increases range IEEE 754 floating point standard: –single precision: 8 bit exponent, 23 bit significand –double precision: 11 bit exponent, 52 bit significand

Ph.D final defence11 BFP floating-point Number Leading “1” bit of significand is implicit –Example: if the significand is 011010110…0, the actual significand is 1.011010110…0 This is called a normalized number; there is exactly one non-zero digit to the left of the point. –Unique representation of a number –We get a little more precision: there are 24 bits in the significand, but only 23 of them are stored.

Ph.D final defence12 Exponent Exponent is “biased” to make sorting easier –all 0s is smallest exponent, all 1s is largest –The actual exponent is e−127 for single precision, and e−1023 for double precision –Bias of 127 for single precision and 1023 for double precision –By biasing the exponent and storing it before the significand, we can compare magnitudes as if they were unsigned integers. If e = 1000 0011 (131 in decimal), the actual exponent is 131−127 = 4 If e = 0101 1101 (93 in decimal), the actual exponent is 93−127 = −34

Ph.D final defence13 BFP Floating-Point Formats

Ph.D final defence14 BFP Floating-Point Formats (Con.) Single-precision number line: negative overflow below −(2 − 2^-23) × 2^127; expressible (normalized) negative numbers from −(2 − 2^-23) × 2^127 to −2^-126; negative underflow between −2^-126 and 0; zero; positive underflow between 0 and 2^-126; expressible (normalized) positive numbers from 2^-126 to (2 − 2^-23) × 2^127; positive overflow above (2 − 2^-23) × 2^127. Positive and negative zero: sign 0 or 1, biased exponent 00000000, fraction 00000000000000000000000. Positive and negative infinity: sign 0 or 1, biased exponent 11111111, fraction 00000000000000000000000. A biased exponent of 11111111 with fraction ≠ 0 is called “not a number” or NaN.

Ph.D final defence15 Example Summary: FP representation (–1)^sign × significand × 2^(exponent – bias) Example: –decimal: −0.75 = −3/4 = −3/2^2 –binary: −0.11 = −1.1 × 2^-1 –floating point: biased exponent = −1 + 127 = 126 = 01111110 –IEEE single precision: 1 01111110 10000000000000000000000
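The single-precision example can be checked by unpacking the stored bits. This is a minimal sketch using Python's standard `struct` module; the helper name `float32_fields` is hypothetical.

```python
import struct

def float32_fields(value):
    """Split an IEEE 754 single-precision encoding into (sign, biased exponent, fraction)."""
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    sign = word >> 31
    biased_exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF          # the 23 stored bits; the leading 1 is implicit
    return sign, biased_exponent, fraction

sign, biased_exponent, fraction = float32_fields(-0.75)
print(sign, format(biased_exponent, "08b"), format(fraction, "023b"))
print("actual exponent:", biased_exponent - 127)
```

For −0.75 the three fields are 1, 01111110 and a fraction with only its top bit set, matching the encoding on the slide.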

Ph.D final defence16 DFP Number Representation Representation: –sign, exponent, significand (or mantissa): (–1)^sign × significand × 10^exponent –more digits for significand gives more accuracy –more bits for exponent increases range DFP formats: –decimal32: DFP storage format encoded in 32 bits –decimal64: DFP computational format encoded in 64 bits –decimal128: DFP computational format encoded in 128 bits

Ph.D final defence17 DFP Number format The 1-bit sign (S) is defined the same way as in the BFP format The (w+5)-bit combination field (G) is split into two subfields: –5 bits (G0…G4) encode the 2 MSBs of the exponent and the MSD of the significand, or flag Not-a-Number (NaN) or infinity (Inf) –w bits (G5…Gw+4) are appended as a suffix to the 2 MSBs derived from G0…G4, which together form the (w+2)-bit nonnegative biased exponent.

Ph.D final defence18 DFP Exponent Exponent is “biased” to make sorting easier –Binary format (not decimal) –The actual exponent is e−101 for decimal32, e−398 for decimal64, e−6176 for decimal128 –Range of exponent is (emin−q+1) ≤ e ≤ (emax−q+1), where q is the number of significand digits

Ph.D final defence19 DFP Number format (Con.) (J×10)-bit Trailing Significand (T) field: –Densely packed decimal (DPD) encoding: each 3-digit decimal number is encoded as a 10-bit binary number; DPD is converted to binary coded decimal (BCD) –Binary integer decimal (BID) encoding: the decimal number is encoded as a binary integer –Non-normalized decimal significand: (−1)^0 × 0.00900 × 10^2 and (−1)^0 × 0.09000 × 10^1 represent the same value –DFP number’s cohort

Ph.D final defence20 Parameters in DFP Format

Ph.D final defence21 Example Summary: DFP representation (–1)^sign × significand × 10^(exponent − bias) Convert −835 × 10^-2 (= −8.35) to decimal64 –Sign bit: “1” negative, “0” positive (here the sign is 1) –Biased exponent: −2 + 398 = 396 (10 bits, “0110001100”; the 2 MSBs “01” go into G0…G4 and the remaining 8 bits “10001100” follow them) –Significand: 835 (50-bit DPD coding, “0…00 02 3D” in hex) –Encoding of the 5 MSBs (G0…G4) of the combination field: “01000” –decimal64: “10100010001100…..00…1000111101” (binary), “A2 30 00 00 00 00 02 3D” (hex)
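The encoding steps of this example can be sketched in a few lines. The declet table below follows the published DPD encoding rules; the function names `dpd_encode` and `encode_decimal64` are hypothetical, and the sketch handles finite numbers only (no NaN or infinity).

```python
def dpd_encode(d2, d1, d0):
    """Encode three decimal digits (d2 most significant) into a 10-bit DPD declet."""
    a, b, c, d = (d2 >> 3) & 1, (d2 >> 2) & 1, (d2 >> 1) & 1, d2 & 1
    e, f, g, h = (d1 >> 3) & 1, (d1 >> 2) & 1, (d1 >> 1) & 1, d1 & 1
    i, j, k, m = (d0 >> 3) & 1, (d0 >> 2) & 1, (d0 >> 1) & 1, d0 & 1
    table = {  # keyed by the MSB of each digit: 1 marks a "large" digit (8 or 9)
        (0, 0, 0): (b, c, d, f, g, h, 0, j, k, m),
        (0, 0, 1): (b, c, d, f, g, h, 1, 0, 0, m),
        (0, 1, 0): (b, c, d, j, k, h, 1, 0, 1, m),
        (1, 0, 0): (j, k, d, f, g, h, 1, 1, 0, m),
        (1, 1, 0): (j, k, d, 0, 0, h, 1, 1, 1, m),
        (1, 0, 1): (f, g, d, 0, 1, h, 1, 1, 1, m),
        (0, 1, 1): (b, c, d, 1, 0, h, 1, 1, 1, m),
        (1, 1, 1): (0, 0, d, 1, 1, h, 1, 1, 1, m),
    }
    declet = 0
    for bit in table[(a, e, i)]:
        declet = (declet << 1) | bit
    return declet

def encode_decimal64(sign, coefficient, exponent):
    """Pack sign, a coefficient of up to 16 digits and an unbiased exponent (DPD decimal64)."""
    biased = exponent + 398                          # decimal64 bias
    assert 0 <= coefficient < 10 ** 16 and 0 <= biased <= 767
    digits = [int(ch) for ch in str(coefficient).zfill(16)]
    msd, exp_msbs = digits[0], biased >> 8           # leading digit, top 2 exponent bits
    if msd < 8:
        combination = (exp_msbs << 3) | msd          # G0..G4 = e e d d d
    else:
        combination = (0b11 << 3) | (exp_msbs << 1) | (msd & 1)   # G0..G4 = 1 1 e e d
    trailing = 0
    for pos in range(1, 16, 3):                      # five declets of three digits each
        trailing = (trailing << 10) | dpd_encode(*digits[pos:pos + 3])
    return (sign << 63) | (combination << 58) | ((biased & 0xFF) << 50) | trailing

print(format(dpd_encode(8, 3, 5), "010b"))
print(format(encode_decimal64(1, 835, -2), "016X"))
```

With sign 1, coefficient 835 and exponent −2 the sketch yields the hex word shown on the slide.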

Ph.D final defence22 DFP special values Not-a-Number: G0…G4 = “11111”; Infinity: G0…G4 = “11110”, with the sign of Inf given by the sign bit; Overflow: occurs if the absolute value of a DFP number is larger than the largest DFP number (|v_max| = (10^q − 1) × 10^(emax−q+1)). Underflow: occurs if the absolute value of a nonzero DFP number is less than the smallest DFP number (|v_min| = 10^(emin−q+1)). Subnormal: if the absolute value of a DFP number is less than 10^emin but at least 10^(emin−q+1), the number is subnormal. Normal number: the remaining exponent values and significands represent normal numbers.
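As a quick check of these bounds, the sketch below evaluates them for decimal64 (q = 16, emax = 384, emin = −383) with Python's `decimal` module. The context name `DECIMAL64` is an assumption; `Etop()` and `Etiny()` are the module's names for emax−q+1 and emin−q+1.

```python
from decimal import Context, Decimal

DECIMAL64 = Context(prec=16, Emax=384, Emin=-383)   # q = 16 digits

v_max = Decimal((0, (9,) * 16, DECIMAL64.Etop()))    # (10^q - 1) x 10^(emax-q+1)
v_min = Decimal((0, (1,), DECIMAL64.Etiny()))        # 10^(emin-q+1)

print(v_max, v_min)
print(v_min.is_subnormal(DECIMAL64), Decimal("1E-383").is_normal(DECIMAL64))
print(Decimal("NaN").is_nan(), Decimal("-Infinity").is_infinite())
```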

Ph.D final defence23 DFP Arithmetic Operations Basic DFP arithmetic operations Two decimal-specific DFP operations –SameQuantum(DFP1, DFP2) –Quantize(DFP1, DFP2) DFP comparison operations –do not distinguish between redundant representations of the same number DFP conversion operations –DFP to BFP conversion (correctly rounded); –DFP to integer conversion Recommended DFP operations
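The two decimal-specific operations and the comparison rule can be tried out in software. This sketch uses Python's `decimal` module, whose `same_quantum` and `quantize` methods play the roles of SameQuantum and Quantize; the operand values are chosen only for illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN

a = Decimal("8.35")     # coefficient 835, exponent -2
b = Decimal("1.20")     # coefficient 120, exponent -2
c = Decimal("1.2")      # same value as b, but exponent -1

print(a.same_quantum(b))   # operands share the exponent -2
print(b.same_quantum(c))   # equal values, different exponents
print(b == c)              # comparison ignores the redundant representation
print(a.quantize(Decimal("0.1"), rounding=ROUND_HALF_EVEN))
```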

Ph.D final defence25 DFP Number’s Cohort Because the decimal significand is not normalized, one value can have several representations; this set is the DFP number’s cohort The standard defines the preferred (required) exponent (quantum) –Exact operation results: the cohort member is selected based on the preferred exponent (quantum) for a DFP result of that operation –Inexact operation results: the cohort member with the least possible exponent is used to get the maximum number of significant digits
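Cohorts and preferred exponents can be observed directly. The sketch below uses Python's `decimal` module, which follows the same preferred-exponent rules for these operations; the 7-digit precision is an assumption chosen to mimic decimal32.

```python
from decimal import Decimal, localcontext

x = Decimal("0.00900E+2")   # 0.00900 x 10^2
y = Decimal("0.09000E+1")   # 0.09000 x 10^1
print(x == y, x.as_tuple().exponent, y.as_tuple().exponent)   # same cohort, different members

# Exact results keep the preferred exponent of the operation.
print(Decimal("1.20") * Decimal("2.0"))   # exponents add: -2 + -1 = -3
print(Decimal("2.40") / Decimal("2"))     # exponents subtract: -2 - 0 = -2

# Inexact results use the least possible exponent to keep the most digits.
with localcontext() as ctx:
    ctx.prec = 7
    third = Decimal(1) / Decimal(3)
print(third, third.as_tuple().exponent)
```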

Ph.D final defence26 DFP Rounding Modes Five rounding modes –roundTiesToEven –roundTiesToAway –roundTowardPositive –roundTowardNegative –roundTowardZero Correct rounding and faithful rounding IEEE 754-2008 requires correctly rounded results for all DFP arithmetic operations DFP operations should support all rounding modes
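The effect of the five modes is easiest to see on a tie. The sketch below maps the IEEE 754-2008 rounding attribute names onto the rounding constants of Python's `decimal` module; the mapping table `MODES` and the helper `round_all` are hypothetical names.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)

MODES = {
    "roundTiesToEven": ROUND_HALF_EVEN,
    "roundTiesToAway": ROUND_HALF_UP,
    "roundTowardPositive": ROUND_CEILING,
    "roundTowardNegative": ROUND_FLOOR,
    "roundTowardZero": ROUND_DOWN,
}

def round_all(value, quantum="1"):
    """Round value to the given quantum under each of the five modes."""
    return {name: Decimal(value).quantize(Decimal(quantum), rounding=mode)
            for name, mode in MODES.items()}

for name, rounded in round_all("2.5").items():
    print(name, rounded, round_all("-2.5")[name])
```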

Ph.D final defence27 DFP Exception Handling Invalid operation: operand is NaN; 0×Inf; square root of a negative operand; the default result is NaN. Division by zero: the dividend is a finite non-zero number and the divisor is zero; the default result is +inf or −inf. Overflow: the magnitude of a result exceeds the largest finite number representable in the format of the operation. Underflow: the magnitude of a result is below 10^emin. Inexact: the correctly rounded result of an operation differs from the infinite-precision result.
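Each exception can be provoked in a decimal64-like software context. This is a sketch with Python's `decimal` module, whose signal classes correspond to the five exceptions; the helper `run` is hypothetical, and traps are disabled so that default results are returned and only flags are raised.

```python
from decimal import (Context, Decimal, DivisionByZero, Inexact,
                     InvalidOperation, Overflow, Underflow)

def run(operation, *operands):
    """Run one operation in a fresh decimal64-like context; return (result, raised flags)."""
    ctx = Context(prec=16, Emax=384, Emin=-383, traps=[])
    result = getattr(ctx, operation)(*map(Decimal, operands))
    watched = (InvalidOperation, DivisionByZero, Overflow, Underflow, Inexact)
    return result, {flag.__name__ for flag in watched if ctx.flags[flag]}

print(run("multiply", "Infinity", "0"))     # invalid operation, result NaN
print(run("sqrt", "-1"))                    # invalid operation, result NaN
print(run("divide", "-1", "0"))             # division by zero, result -Infinity
print(run("multiply", "9E384", "10"))       # overflow (and inexact)
print(run("multiply", "1E-383", "1E-20"))   # underflow (and inexact)
print(run("divide", "1", "3"))              # inexact only
```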

Ph.D final defence28 DFP Addition/Subtraction

Ph.D final defence29 DFP Add/Sub Data flow

Ph.D final defence30 DFP Addition Step 1: equalize the exponents –add the mantissas only when the exponents are the same –the operand with the smaller exponent shifts its point to the left, and the operand with the larger exponent shifts its point to the right –rewriting the operand with the smaller exponent could result in a loss of the least significant digits –keep a guard digit, a round digit, and a sticky digit for the operand with the smaller exponent

Ph.D final defence31 DFP addition Step 2: add the mantissas 0099999 × 10^1 + 0016234 × 10^-3 = 0999990 × 10^0 + 0000016(234) × 10^0 = 1000006(234) × 10^0 Step 3: Normalize the result if necessary

Ph.D final defence32 DFP addition Step 4: Round the number if needed 1000006(234) × 10^0 ≈ 1000006 × 10^0 Step 5: Repeat step 3 if the result is no longer normalized The final result is 1000006 The exact answer is 1000006.234
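The five addition steps can be sketched for positive operands as follows. This is a software illustration under stated assumptions, not the hardware datapath: it aligns with exact Python integers instead of a fixed-width register with guard, round and sticky digits, it rounds ties to even, and the function names are hypothetical.

```python
def round_half_even(coefficient, exponent, p):
    """Round a nonnegative integer coefficient to at most p digits (ties to even)."""
    extra = len(str(coefficient)) - p
    if extra <= 0:
        return coefficient, exponent
    kept, dropped = divmod(coefficient, 10 ** extra)   # dropped = guard/round/sticky part
    half = 5 * 10 ** (extra - 1)
    if dropped > half or (dropped == half and kept % 2 == 1):
        kept += 1
    if kept == 10 ** p:                                # rounding carried out: renormalize
        kept, extra = kept // 10, extra + 1
    return kept, exponent + extra

def dfp_add(c1, e1, c2, e2, p=7):
    """Add c1*10**e1 + c2*10**e2 (both positive) and round the sum to p digits."""
    common = min(e1, e2)                                          # Step 1: equalize exponents
    total = c1 * 10 ** (e1 - common) + c2 * 10 ** (e2 - common)   # Step 2: add mantissas
    return round_half_even(total, common, p)                      # Steps 3-5: normalize, round

print(dfp_add(99999, 1, 16234, -3))
```

The call reproduces the slide's example: the aligned sum 1000006.234 is rounded to the 7-digit result 1000006 × 10^0.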

Ph.D final defence33 Guard digits To help minimize rounding problems, intermediate steps of operations store guard digits: additional internal digits that increase the precision of the operations. Previous example: add one extra digit. To meet the correct-rounding requirement of IEEE 754-2008, one guard digit, one round digit and one sticky digit are kept to make rounding more accurate.

Ph.D final defence34 DFP add/sub

Ph.D final defence35 General Description: Addition

Ph.D final defence36 Example: Addition

Ph.D final defence37 Example: Addition (Con.)

Ph.D final defence38 DFU: IBM POWER6 and Z10

Ph.D final defence39 High performance Implementation

Ph.D final defence41 High performance Implementation [12] A. Vázquez and E. Antelo, “A High-performance Significand BCD Adder with IEEE 754-2008 Decimal Rounding,” ARITH19, Portland, June 8-10, 2009

Ph.D final defence42 Evaluation Results and Comparison [Proposed]: A. Vázquez and E. Antelo, “A High-performance Significand BCD Adder with IEEE 754-2008 Decimal Rounding,” ARITH19, Portland, June 8-10, 2009

Ph.D final defence43 DFP Multiplication

Ph.D final defence44 Scheme of decimal multiplier
x = 1963, y = 8145
xy0 (y0 = 5): 5x + 0 = 9815 + 00000
xy1 (y1 = 4): 5x − x = 9815 − 1963
xy2 (y2 = 1): x + 0 = 1963 + 00000
xy3 (y3 = 8): 10x − 2x = 19630 − 3926
x × y = 15988635
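The scheme can be sketched in software. The recoding below, which splits every multiplier digit into an upper part from {0, 5, 10} and a lower part from {−2, …, 2}, is an assumption generalized from the four digits shown in the example; the function names are hypothetical, and plain integer addition stands in for the carry-save adder tree.

```python
def recode(digit):
    """Split a digit 0..9 into (upper, lower) with upper in {0, 5, 10}, lower in {-2..2}."""
    upper = 0 if digit <= 2 else 5 if digit <= 7 else 10
    return upper, digit - upper

def multiply(x, y):
    """Multiply nonnegative integers by summing the recoded partial products."""
    product, weight = 0, 1
    while y:
        y, digit = divmod(y, 10)                      # take multiplier digits from the right
        upper, lower = recode(digit)
        product += (upper * x + lower * x) * weight   # e.g. 8x = 10x - 2x
        weight *= 10
    return product

print([recode(d) for d in (5, 4, 1, 8)])
print(multiply(1963, 8145))
```

Only the multiples x, 2x, 5x and 10x are ever needed, which is the point of the recoding: each of them is cheap to generate in BCD.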

Ph.D final defence45 Partial product generation Generate X·Y_i for Y_i ∈ {1, 2, 3, …, 7, 8, 9} X·Y_i is in carry-save format

Ph.D final defence46 Partial product generation Solid Circles: BCD Sum (digit) Hollow Circles: Carry (bit) n-digit radix-10 CSA m-digit radix-10 counter

Ph.D final defence47 Carry Save Adder Tree CSA Tree to Generate Multiplication Result

48 Nov. 26, 2010 Flowchart of DFP Multiplier

49 Nov. 26, 2010 Architecture of DFP Multiplier

50 Nov. 26, 2010 Exception Detection & Handling Invalid operation –sNaN (pass significand of sNaN) –0 x ∞ (produce qNaN with significand 0) Overflow (and Inexact) –IE IP – SLA > Emax –Increase SLA until all LZs removed Underflow (and possibly Inexact) –IE IP – SLA < Emin –Decrease SLA until 0, then shift right Inexact

51 Nov. 26, 2010 Implementation Highlights Leverage operands' LZCs –SC, SLA, and IE SIP Handle NaNs with minimal overhead –No dataflow modification –Coerce multiplicand or multiplier to 1 Support gradual underflow –No dataflow modification –Simply extend number of iterations Simple, control-based rounding scheme

52 Nov. 26, 2010 Synthesis Results 64-bit (16-digit) operands, DPD encoded LSI Logic's gflxp 0.11um CMOS, 55ps FO4 Synopsys Design Compiler Results –Fixed-point: 119,653 um^2, 14.72 FO4s –Floating-point: 237,607 um^2, 15.45 FO4s Critical path –Fixed-point: 4:2 compressor (accumulator) –Floating-point: 128-bit barrel shifter

53 Nov. 26, 2010 Applicability to Parallel Designs IE and IP shift generation Rounding scheme NaN handling Exception detection and handling On-the-fly sticky bit generation... NO

54 Nov. 26, 2010 Sequential vs. Parallel Sequential –Less area –Potentially better cycle time Parallel –Less latency –Higher throughput

Ph.D final defence55 DFP Division

Ph.D final defence56 DFP Division Data Flow Unpacking Decimal Floating-Point Number → Check for zeros and infinity → Subtract exponents → Divide Mantissa → Normalize and detect overflow and underflow → Perform rounding → Replace sign → Packing

Ph.D final defence57 Unpacking and Sign Logic Step1: Unpacking Floating-Point Number Check for zeros and infinity (if F=0, Stop) Step2: Sign Process

Ph.D final defence58 Exponent Subtraction Step3: Exponent Subtract

Ph.D final defence59 Mantissa Division Step4: Mantissa Division Which algorithm to choose here? 1. Restoring division 2. Non-restoring division 3. High-radix division 4. Convergence division

Ph.D final defence60 Normalization Step5: A left shift by one digit may be needed to normalize the mantissa result; overflow and underflow must also be detected For example: “0934…2140819564” left-shifted by one digit → “934…21408195640” The exponent must be adjusted accordingly: Ea = Eb − 1

Ph.D final defence61 Rounding and Packing Step6: Truncate, round up, or round to nearest. These simple rounding policies are sometimes biased; according to the IEEE rounding standard, “round to nearest even” is better. Step7: Pack the sign bit, exponent bits and significand bits together, and detect NaN and infinity.
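Steps 2 to 6 of the division flow can be sketched on unpacked operands. This is a software illustration under stated assumptions: plain integer division stands in for the digit-recurrence or Newton-Raphson divider, rounding is to nearest with ties to even, special values and preferred-exponent selection are left out, and the function name is hypothetical.

```python
def dfp_divide(sign1, c1, e1, sign2, c2, e2, p=7):
    """Divide (-1)**sign1 * c1 * 10**e1 by (-1)**sign2 * c2 * 10**e2, rounded to p digits."""
    if c2 == 0:
        raise ZeroDivisionError("division by zero")
    sign = sign1 ^ sign2                      # Step 2: sign logic
    exponent = e1 - e2                        # Step 3: subtract exponents
    if c1 == 0:
        return sign, 0, exponent
    # Step 4: mantissa division, scaled so the quotient has at least p + 1 digits
    shift = max(0, p + 1 + len(str(c2)) - len(str(c1)))
    quotient, remainder = divmod(c1 * 10 ** shift, c2)
    exponent -= shift
    # Steps 5-6: drop the extra digits and round to nearest, ties to even
    extra = len(str(quotient)) - p
    quotient, dropped = divmod(quotient, 10 ** extra)
    half = 5 * 10 ** (extra - 1)
    sticky = remainder != 0                   # any nonzero remainder breaks a tie upward
    if dropped > half or (dropped == half and (sticky or quotient % 2 == 1)):
        quotient += 1
    exponent += extra
    if quotient == 10 ** p:                   # rounding overflowed into p + 1 digits
        quotient, exponent = quotient // 10, exponent + 1
    return sign, quotient, exponent           # Step 7 would pack these three fields

print(dfp_divide(0, 1, 0, 0, 3, 0))
print(dfp_divide(1, 835, -2, 0, 7, 0))
```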

Ph.D final defence62 High performance Implementation [1] L.-K. Wang and M. J. Schulte, “Decimal Floating-Point Division Using Newton-Raphson Iteration,” Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors, pp. 84-95, Sep. 2004.

Ph.D final defence63 High performance Implementation [2] Tomás Lang and Alberto Nannarelli, “A Radix-10 Digit-Recurrence Division Unit: Algorithm and Architecture,” IEEE Transactions on Computers, pp. 727–739, IEEE, June 2007.

Ph.D final defence65 Evaluation Results and Comparison
                     DFP Divider [1]    DFP Divider [2]
Precision (digit)    16 (decimal64)     16 (decimal64)
Cycle time (ns)      0.57               1
# of cycles          150                20
Latency (ns)         85.5               20
1 : Synthesized with a STM 90-nm standard cell library

Ph.D final defence66 DFP Transcendental Arithmetic

Ph.D final defence67 Contents Introduction Decimal Logarithmic Converter Decimal Antilogarithmic Converter Conclusions Future Work

Ph.D final defence68 32-bit DFP Logarithm The coefficient is a non-normalized decimal integer. To guarantee 32-bit DFP accuracy, the FXP logarithmic calculation needs to keep 14 digits. Example:

Ph.D final defence69 32-bit DFP Antilogarithm Here: For 32-bit DFP: Example : To guarantee 32-bit DFP accuracy, the FXP antilog calculation needs to keep 8 digits.

Ph.D final defence70 Digit-Recurrence Algorithm (Log) The corresponding recurrences: Here: e_j is selected so that the recurrence converges to 1, with e_j ∈ {−9, −8, −7, …, 0, 1, …, 7, 8, 9}

Ph.D final defence71 Digit-Recurrence Algorithm (Antilog) Any 7-digit fixed-point decimal input N: The corresponding recurrences: Here: e_j is selected so that the recurrence converges to 0, with e_j ∈ {−9, −8, −7, …, 0, 1, …, 7, 8, 9}

Ph.D final defence72 Selection By Rounding (cont.) A scaled remainder W[j] is defined, and e_j is obtained by rounding W[j]. e_1 is obtained from a look-up table; e_2 … e_j are obtained with selection by rounding. Log: Antilog:
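The recurrences themselves did not survive in this transcript, so the following sketch rests on an assumption: it uses the common multiplicative-normalization form E[j+1] = E[j](1 + e_j 10^-j), L[j+1] = L[j] − log10(1 + e_j 10^-j), with scaled remainder W[j] = 10^j (1 − E[j]). The function name, the restriction of the input to 0.5 ≤ m < 1, the search that stands in for the e_1 look-up table, and the use of `Decimal.log10` in place of a stored table of log10(1 + e_j 10^-j) are all illustrative choices, not the candidate's design.

```python
from decimal import Decimal, ROUND_HALF_EVEN, localcontext

def log10_digit_recurrence(m, iterations=10):
    """Approximate log10(m) for 0.5 <= m < 1 by multiplicative normalization (assumed form)."""
    with localcontext() as ctx:
        ctx.prec = 40                                   # generous working precision
        m = Decimal(m)
        residual, log_sum = m, Decimal(0)               # residual plays the role of E[j]
        for j in range(1, iterations + 1):
            weight = Decimal(10) ** -j
            if j == 1:
                # Stand-in for the look-up table: choose e_1 that brings m closest to 1.
                e = min(range(-9, 10), key=lambda d: abs(1 - m * (1 + d * weight)))
            else:
                # Selection by rounding of the scaled remainder W[j] = 10^j (1 - E[j]).
                scaled = (1 - residual) * Decimal(10) ** j
                e = int(scaled.to_integral_value(rounding=ROUND_HALF_EVEN))
            assert -9 <= e <= 9, "digit outside the redundant digit set"
            factor = 1 + e * weight
            residual *= factor                          # E[j+1] = E[j] (1 + e_j 10^-j)
            log_sum -= factor.log10()                   # L[j+1] = L[j] - log10(1 + e_j 10^-j)
        return log_sum

print(log10_digit_recurrence("0.835"))
print(Decimal("0.835").log10())
```

Each iteration drives the residual one decimal digit closer to 1, so roughly one digit of the logarithm is gained per step.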

Ph.D final defence73 Architecture: Decimal Log Converter

Ph.D final defence74 Implementation Results
Logic Utilization         Used    Available*    Utilization
# of Occupied Slices      2842    13696         21%
Maximum Frequency         47.7 MHz
# of Clock Cycles         17 clock cycles
*: Xilinx Virtex2p XC2VP30 with package ff1157 and speed -7
Critical Path Detail (ns): Reg2 1.188, Mux2 1.564, Mult2 9.347, Shifter 1.438, Mux5 1.350, CLA 5.519, Round 0.566, Total 20.97

Ph.D final defence75 Architecture: Dec. Antilog Converter

Ph.D final defence76 Implementation Results
Logic Utilization         Used    Available*    Utilization
# of Occupied Slices      2315    13696         17%
Maximum Frequency         51.5 MHz
# of Clock Cycles         11 clock cycles
*: Xilinx Virtex2p XC2VP30 with package ff1157 and speed -7
Critical Path Detail (ns): Reg6 1.599, Mult 7.839, Mux4 1.539, Shifter 1.100, CLA 6.794, Round 0.545, Total 19.42

Ph.D final defence77 Comparison (with Binary FXP Log and Exponential Converters) Similar dynamic range for the normalized coefficients. A binary reference is available that uses the same digit-recurrence algorithm with selection by rounding. Radix-10 is close to radix-8.

Ph.D final defence78 Comparison (cont.) (with Binary FXP Log and Exponential Converters)
                      Radix-10 Decimal (1)          Radix-8 Binary [1]
                      Log.          Exp.            Log.          Exp.
Precision (digit)     7      16     7      16       24     53     24     53
Area (fa) (2)         1630   2640   1370   2260     647    1829   627    1777
Cycle time (T) (3)    17     19     16     18       7      8      7      8
# of cycles           8      17     8      17       8      18     11     21
Latency (T) (3)       136    323    128    306      56     144    77     168
1 : Synthesized with a TSMC 0.18-um standard cell library
2 : the area of 1-bit full adder
3 : the delay of 1-bit full adder

Ph.D final defence79 Conclusions Achieved 32-bit DFP accuracy for the decimal log and antilog results. Implemented them on FPGA and ASIC. Compared them with binary converters.

Ph.D final defence80 (EE990, April 2009, Decimal Log and Antilog Converters) Future Work The 64-bit and 128-bit DFP logarithm and antilog converters. The presented architecture can be optimized to achieve a faster speed or occupy a smaller area.

Ph.D final defence81 Summary IEEE 754-2008 defines a DFP standard that specifies –number representation in several precisions –correct DFP arithmetic operations –rounding modes Implementation of DFP Adder, Multiplier, Divider, Logarithmic and Antilogarithmic Converter Implementing and programming DFP are both really hard.
