Published by주이 원 Modified over 6 years ago
1
CSE 575 Computer Arithmetic, Spring 2003, Mary Jane Irwin (www.cse.psu
These slides need more work!!
2
Division by Reciprocation
Assuming a fast multiplier is provided, another way to do division is via reciprocation: $Q = P/D = P \times (1/D)$. This can be particularly efficient if several divisions by the same divisor need to be performed.
Compute the reciprocal by either series expansion (Taylor series) or additive iteration (Newton-Raphson).
Also consider that divides happen infrequently; ideally, divide time would approximately equal multiply time (if possible).
3
Two Goals
Shoot for logarithmic convergence (e.g., double the number of "resolved" quotient bits each iteration).
Use only simple operations (e.g., add, subtract, compare, multiply).
4
Reciprocation by Series Expansion
Let $D = 1 + X$ with $\tfrac{1}{2} \le D < 1$. Then, based on the Maclaurin series,
$g(X) = 1/D = 1/(1 + X) = 1 - X + X^2 - X^3 + X^4 - \cdots$
and since $X = D - 1$, the above can be factored (for $\tfrac{1}{2} \le D < 1$) into
$1/D = (1 - X)(1 + X^2)(1 + X^4)(1 + X^8)(1 + X^{16}) \cdots$
Notice that the 2's complement of $1 + X^j$ is $1 - X^j$, since $2 - (1 + X^j) = 1 - X^j$, and that $(1 + X^j)(1 - X^j) = 1 - X^{2j}$.
(The Maclaurin series is a special case of the Taylor series.)
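The factored series above can be sketched in a few lines of Python. This is only an illustration in floating point, not a hardware model; the function name `reciprocal_by_series` and the `factors` parameter are made up for this example.

```python
def reciprocal_by_series(d, factors=5):
    """Approximate 1/d for 0.5 <= d < 1 via (1 - X)(1 + X^2)(1 + X^4)..."""
    if not 0.5 <= d < 1.0:
        raise ValueError("d must lie in [0.5, 1)")
    x = d - 1.0            # D = 1 + X, so X lies in [-0.5, 0)
    product = 1.0 - x      # first factor, (1 - X)
    power = x * x          # X^2
    for _ in range(factors):
        product *= 1.0 + power   # multiply in the factor (1 + X^(2^k))
        power *= power           # square to get the next power of X
    return product


print(reciprocal_by_series(0.75))
```

With the default of five extra factors the product equals $(1 - X^{64})/(1 + X)$, so the truncation error is far below double precision for any $D$ in the stated range.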
5
IBM 360/91 Approach
Compute $1/D$ to 32-bit precision via:
$(1 - X)(1 + X^2)(1 + X^4)$ by table look-up
$1 - X^8 = [(1 - X)(1 + X^2)(1 + X^4)](1 + X)$
$1 + X^8$ is the 2's complement of $1 - X^8$
$1 - X^{16} = (1 + X^8)(1 - X^8)$
$1 + X^{16}$ is the 2's complement of $1 - X^{16}$
$1 - X^{32} = (1 + X^{16})(1 - X^{16})$
$1 + X^{32}$ is the 2's complement of $1 - X^{32}$
This requires a $2^8 \times 8$ table look-up and three multiplies to compute the needed terms.
Start off with a ROM table look-up (for speed). We want the first 8 bits (to the right of the binary point) to be "correct", so we need a $2^8 \times 8$-bit ROM (or better, a $2^{10} \times 8$-bit ROM) to give the first 8 bits of the inverse.
6
Series Expansion Calculations
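$1/D = (1 - X)(1 + X^2)(1 + X^4)(1 + X^8)(1 + X^{16})(1 + X^{32})$
[Figure: data flow of the calculation. The table look-up value is multiplied by $(1 + X)$ to give $(1 - X^8)$, whose 2's complement is $(1 + X^8)$. Each later stage multiplies the current "minus" and "plus" terms to give $(1 - X^{16})$ and then $(1 - X^{32})$, takes the 2's complement to get $(1 + X^{16})$ and $(1 + X^{32})$, and multiplies each new factor into the product.]
Need two multipliers per iteration.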
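The chain of multiplies and 2's complements for the IBM 360/91 approach can be sketched as follows. This is an illustrative model only: it uses exact rational arithmetic, and the table look-up is replaced by computing the three-factor product directly, which is an assumption (a real ROM would hold a truncated value). The function name is hypothetical.

```python
from fractions import Fraction


def reciprocal_360_91_style(d):
    """Approximate 1/d using the (1 - X)(1 + X^2)...(1 + X^32) product."""
    d = Fraction(d)
    x = d - 1                                   # D = 1 + X
    # Stand-in for the ROM look-up of (1 - X)(1 + X^2)(1 + X^4).
    table = (1 - x) * (1 + x**2) * (1 + x**4)
    minus = table * (1 + x)                     # 1 - X^8
    result = table
    for _ in range(3):
        plus = 2 - minus                        # 2's complement: 1 + X^(2^k)
        result *= plus                          # fold the new factor in
        minus *= plus                           # 1 - X^(2^(k+1)); unused after the last pass
    return result


approx = reciprocal_360_91_style(Fraction(3, 4))
print(float(approx))
```

Each pass through the loop performs two multiplies, which matches the "two multipliers per iteration" remark if the two products are formed in parallel.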
7
Additive Iteration
The iteration must be based on a continuous and differentiable function, with the problem posed in the form $f(X) = 0$, so that finding the root gives $Q = P/D$ or $1/D$ (or something close).
We want an iterative method for finding the root in which the iterations contain only simple operations (i.e., no divide).
8
Newton-Raphson Approach
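The Newton-Raphson method determines a root of $f(X) = 0$. The tangent at $X_i$ has slope $f'(X_i) = f(X_i)/(X_i - X_{i+1})$, giving the iterative recurrence
$X_{i+1} = X_i - f(X_i)/f'(X_i)$
[Figure: plot of $f(X)$ with the tangent at $X_i$ crossing the axis at $X_{i+1}$; the iterates $X_i$, $X_{i+1}$, $X_{i+2}$ approach the root.]
This is a second-order method (i.e., it uses the first derivative) and will need 2 multiplies per iteration.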
9
Newton-Raphson Reciprocation
Newton-Raphson applied to reciprocation uses
$f(X) = 1/X - D = 0$
which has a root at the reciprocal, $X = 1/D$ ($f(X)$ is continuous and differentiable for $X > 0$, and $1/X = D$ gives $X = 1/D$). Since $f'(X) = -1/X^2$, this gives the recurrence
$X_{i+1} = X_i (2 - X_i D)$
a simple iteration requiring two multiplies.
Choose $X_0$ such that $0 < X_0 < 2/D$.
In general, for $D$ in $[1/2, 1)$, $1/D$ is in $(1, 2]$, so picking $X_0 = 1.5$ is simple and adequate (and since $|\epsilon_0| < 1/D$, convergence is guaranteed). Choosing $X_0 = 1$ also converges, approaches the root from below, and its first iteration requires at most one multiply.
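A minimal floating-point sketch of this recurrence follows. The function name, the default of five iterations, and the range check are choices made for this example rather than anything the slides prescribe.

```python
def newton_reciprocal(d, x0=1.5, iterations=5):
    """Approximate 1/d with X_{i+1} = X_i * (2 - X_i * d)."""
    # Convergence needs 0 < X0 < 2/d; test it without dividing.
    if not 0.0 < x0 * d < 2.0:
        raise ValueError("x0 must satisfy 0 < x0 < 2/d")
    x = x0
    for _ in range(iterations):
        x = x * (2.0 - x * d)   # two multiplies and one subtraction
    return x


print(newton_reciprocal(0.75))
```

Note that the loop body contains no division, which is the point of choosing $f(X) = 1/X - D$.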
10
Decimal Example
Find $1/D$ where $D = 0.75$ ($1/D = 1.33333\ldots$)
For lecture: $X_3 = \ldots$, $\epsilon_3 = \ldots$
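The slide leaves the numbers to be worked in lecture. One way to generate them is the sketch below, which traces the recurrence $X_{i+1} = X_i(2 - X_i D)$ with exact fractions. The starting value $X_0 = 1.5$ is an assumption here (the slide does not say which $X_0$ was used), and `reciprocal_trace` is a name invented for this example.

```python
from fractions import Fraction


def reciprocal_trace(d, x0, steps):
    """Return [(i, X_i, eps_i)] for i = 0..steps, with eps_i = 1/d - X_i."""
    d = Fraction(d)
    x = Fraction(x0)
    trace = []
    for i in range(steps + 1):
        trace.append((i, x, 1 / d - x))
        x = x * (2 - x * d)          # one Newton-Raphson step
    return trace


# Assumed starting point X0 = 1.5 for D = 0.75.
for i, x, eps in reciprocal_trace("0.75", "1.5", 3):
    print(f"X{i} = {float(x):.10f}   eps{i} = {float(eps):.3e}")
```

Under that assumption the iterates are $X_1 = 21/16 = 1.3125$ and $X_2 = 1365/1024 \approx 1.33301$, with the error roughly squaring at each step.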
11
Convergence Rate
Gives quadratic convergence ($\epsilon_{i+1} \le |\epsilon_i|^2$).
With $X_{i+1} = X_i (2 - X_i D)$ and $\epsilon_i = 1/D - X_i$, we have
$X_i = 1/D - \epsilon_i = (1 - D\epsilon_i)/D$
and
$\epsilon_{i+1} = 1/D - X_{i+1} = 1/D - [X_i (2 - X_i D)] = [1 - 2DX_i + (DX_i)^2]/D$
Substituting for $X_i$:
$\epsilon_{i+1} = [1 - 2D((1 - D\epsilon_i)/D) + (1 - D\epsilon_i)^2]/D = D\epsilon_i^2$
Recall that $D < 1$, so $\epsilon_{i+1} \le |\epsilon_i|^2$.
To get 32 bits of precision, we need 5 iterations, each requiring two multiplies.
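The error recurrence and the iteration count can be checked numerically. The sketch below uses exact fractions and, as an assumption, the case $D = 1/2$ with $X_0 = 1.5$, where the initial error is $1/2$; the function name is hypothetical.

```python
from fractions import Fraction


def iterations_for_bits(bits, d=Fraction(1, 2), x0=Fraction(3, 2)):
    """Count Newton-Raphson steps until |1/d - X_i| <= 2**-bits."""
    target = Fraction(1, 2 ** bits)
    x = x0
    err = 1 / d - x
    count = 0
    while abs(err) > target:
        x = x * (2 - x * d)
        new_err = 1 / d - x
        # Check the derived identity eps_{i+1} = D * eps_i**2 exactly.
        if new_err != d * err * err:
            raise ArithmeticError("error recurrence violated")
        err = new_err
        count += 1
    return count


print(iterations_for_bits(32))
```

For these inputs the errors run $2^{-1}, 2^{-3}, 2^{-7}, 2^{-15}, 2^{-31}, 2^{-63}$, so the fifth iteration is the first to reach 32 bits.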
12
Binary Example
Find $1/D$ where $D = 0.75 = 0.1100_2$
$X_1 = 2 - D = \ldots$, $\epsilon_1 \le 2^{-2}$
$X_2 = \ldots$, $\epsilon_2 \le 2^{-4}$
For lecture: $X_3 = \ldots$, $\epsilon_3 \le 2^{-8}$
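The binary iterates can be reproduced with a short script. It starts from $X_1 = 2 - D$ as the slide does (equivalent to $X_0 = 1$) and checks each error against the stated bound. The helper names and the choice of 12 fractional bits for display are assumptions made for this illustration.

```python
from fractions import Fraction


def to_binary(value, frac_bits=12):
    """Format a non-negative Fraction in binary, truncated to frac_bits."""
    scaled = int(value * 2 ** frac_bits)        # truncate toward zero
    whole, frac = divmod(scaled, 2 ** frac_bits)
    return f"{whole:b}.{frac:0{frac_bits}b}"


def binary_example(d=Fraction(3, 4), steps=3):
    """Return [(i, X_i, eps_i)] for i = 1..steps, starting at X_1 = 2 - d."""
    x = 2 - d
    rows = []
    for i in range(1, steps + 1):
        rows.append((i, x, 1 / d - x))
        x = x * (2 - x * d)
    return rows


for i, x, eps in binary_example():
    bound = Fraction(1, 2 ** (2 ** i))          # 2^-2, 2^-4, 2^-8
    print(f"X{i} = {to_binary(x)}  eps{i} <= 2^-{2 ** i}: {eps <= bound}")
```

The exact values are $X_1 = 5/4$, $X_2 = 85/64$, and $X_3 = 21845/16384$, with errors $1/12$, $1/192$, and $1/49152$.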
13
Initial Values
For $D$ in $[\tfrac{1}{2}, 1)$, a good initial value would be $X_0 = 1.5$, since it limits $|\epsilon_0|$ to a maximum of 0.5.
A better approximation would be
$X_0 = 4(\sqrt{3} - 1) - 2D \approx 2.9282 - 2D$
which can be obtained easily and quickly from $D$ by shifting and adding.
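To compare the two starting values, the sketch below samples $D$ over $[1/2, 1)$ and records the largest initial error $|1/D - X_0|$ for each choice. The sampling density and the function name are arbitrary choices for this example, and a sampled maximum only approximates the true one.

```python
import math


def max_initial_error(initial_guess, samples=10_000):
    """Largest |1/d - X0(d)| over evenly spaced d in [0.5, 1)."""
    worst = 0.0
    for k in range(samples):
        d = 0.5 + 0.5 * k / samples
        worst = max(worst, abs(1.0 / d - initial_guess(d)))
    return worst


worst_constant = max_initial_error(lambda d: 1.5)
worst_linear = max_initial_error(lambda d: 4 * (math.sqrt(3) - 1) - 2 * d)
print(f"X0 = 1.5:             max |eps0| = {worst_constant:.4f}")
print(f"X0 = 4(sqrt(3)-1)-2D: max |eps0| = {worst_linear:.4f}")
```

The linear guess costs one shift (for $2D$) and one subtraction from a constant, and in this sampling it starts with an error of about 0.1 rather than 0.5.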
14
Speeding it Up
Iterative division takes $2\log_2 n - 1$ multiplications, so with 64-bit numbers and a 5 ns multiplier, division would need $5 \times (2 \times 6 - 1) = 55$ ns.
Speedups are possible through reducing the number of multiplies by making a better initial guess (e.g., with a table look-up), using narrower multiplications, or performing the multiply faster.
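The latency estimate is simple enough to express as a function. Rounding $\log_2 n$ up for word sizes that are not powers of two is an assumption added here; the slide only covers $n = 64$.

```python
import math


def division_latency_ns(bits, multiply_ns):
    """Estimated divide time: (2 * log2(bits) - 1) multiplies back to back."""
    multiplies = 2 * math.ceil(math.log2(bits)) - 1
    return multiplies * multiply_ns


print(division_latency_ns(64, 5))   # 11 multiplies at 5 ns each
```

The same function shows how the estimate scales: halving the multiply time halves the divide time, while halving the word size removes only two multiplies.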
15
Key References
Anderson, The IBM system 360/91 floating point execution unit, IBM J. Res. Development, 11(1):34-53, 1967.
Flynn, On division by functional iteration, IEEE Trans. on Computers, C-19(8), 1970.
Oberman and Flynn, Division algorithms and implementation, IEEE Trans. on Computers, C-46(8), 1997.
Waser and Flynn, Introduction to Arithmetic for Digital Systems Designers, HRW, 1982.