1/30 Division by Convergence Instructor: Prof. 王立洋; prepared by student M9535204 蔡鐘葳.


2/30 Outline ▓ Speedup of Convergence Division ▓ Hardware Implementation ▓ Analysis of Lookup Table Size ▓ Reference

3/30 16.4. Speedup of Convergence Division

4/30 Introduction To divide $z$ by $d$, compute $y = 1/d$ and then do the multiplication $yz$. Division can be performed via $2\lceil \log_2 k \rceil - 1$ multiplications. This is not yet very impressive: with 64-bit numbers and a 5-ns multiplier, the result is a 55-ns division.
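The sketch below checks the arithmetic on this slide. Like the slide, it assumes that every multiplication takes the same fixed time and that the 2's complementations cost nothing. The function name `convergence_division_cost` is hypothetical.

```python
def convergence_division_cost(k: int, mult_ns: float) -> tuple[int, float]:
    """Multiplication count 2*ceil(log2 k) - 1 and total latency in ns."""
    m = (k - 1).bit_length()   # ceil(log2 k) for k >= 1, using exact integer arithmetic
    mults = 2 * m - 1          # m pairs, except the last pair needs only the z-multiplication
    return mults, mults * mult_ns

print(convergence_division_cost(64, 5.0))   # (11, 55.0)
```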

5/30 Three Types of Speedup Three types of speedup are possible: reducing the number of multiplications (reduce $m$); using narrower multiplications (reduce the width of some $x^{(i)}$s); performing the multiplications faster.

6/30 Initial Approximation Convergence is slow in the beginning: it takes 6 multiplications to get 8 bits of convergence and another 5 to go from 8 bits to 64 bits. Since $x^{(0)}x^{(1)}x^{(2)}$ is essentially an approximation to $1/d$, the first four multiplications can be replaced by a table-lookup step that directly supplies $x^{(0+)}$.
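The sketch below shows why convergence is slow at first. It tracks the error $1 - d^{(i)}$ exactly, using Python's `fractions.Fraction`, for the worst case $d = 1/2$. Each pair squares the error, so three pairs (six multiplications) are needed to reach 8 bits. It follows only the denominator path $d^{(i)}$, not $z^{(i)}$. The helper name is hypothetical.

```python
from fractions import Fraction

def convergence_errors(d: Fraction, pairs: int) -> list[Fraction]:
    """Error 1 - d^(i) after each step d^(i+1) = d^(i) * (2 - d^(i))."""
    errors = [1 - d]
    for _ in range(pairs):
        d = d * (2 - d)          # multiply by x^(i) = 2 - d^(i)
        errors.append(1 - d)     # new error equals the old error squared
    return errors

# Worst case d = 1/2: error goes 1/2 -> 1/4 -> 1/16 -> 1/256 (8 bits after 3 pairs)
errs = convergence_errors(Fraction(1, 2), 3)
print(errs)
```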

7/30 Initial Approximation via Table Lookup A $2^w \times w$ lookup table is necessary and sufficient for $w$ bits of convergence after the first pair of multiplications. Whereas $x^{(0)}$ is only an approximation to $1/d$, the product $x^{(0)}x^{(1)}x^{(2)}$ is a better approximation, with $d\,x^{(0)}x^{(1)}x^{(2)} = (0.1111\,1111\ldots)_{two}$. Reading this value, $x^{(0+)}$, directly from a table reduces 6 multiplications to 2.

8/30 Example with 4-bit lookup For $d = (0.1011\,xxxx\ldots)_{two}$, we have $11/16 \le d < 12/16$. The inverses of the two extremes are $16/11 \approx (1.0111)_{two}$ and $16/12 \approx (1.0101)_{two}$, so $(1.0110)_{two}$ is a good estimate for $1/d$: $(1.0110)_{two} \times (0.1011)_{two} = (11/8) \times (11/16) = 121/128 = (0.1111001)_{two}$, and $(1.0110)_{two} \times (0.1100)_{two} = (11/8) \times (3/4) = 33/32 = (1.000010)_{two}$.
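The two products from this example can be verified exactly with `fractions.Fraction`. The check also confirms that both extremes of $d \cdot 1.0110_{two}$ land within $2^{-4}$ of 1. The variable names are illustrative only.

```python
from fractions import Fraction

# d = (0.1011 xxxx...)_two lies in [11/16, 12/16); the estimate is 1.0110_two
lo, hi = Fraction(11, 16), Fraction(12, 16)
x0 = Fraction(0b10110, 16)          # 1.0110_two = 22/16 = 11/8

print(lo * x0, hi * x0)             # 121/128 33/32
# Both extremes of d * x0 lie within 2^-4 of 1
assert abs(1 - lo * x0) < Fraction(1, 16)
assert abs(hi * x0 - 1) < Fraction(1, 16)
```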

9/30 Fig. 16.3 Fig. 16.3 Convergence in division by repeated multiplications with initial table lookup. [Figure annotations: the point after the table lookup and the first pair of multiplications (replacing several iterations), and the point after the second pair of multiplications.]

10/30 Fig. 16.3 For division by repeated multiplications, we saw that convergence to 1 (for $d^{(i)}$) and to $q$ (for $z^{(i)}$) occurred from below. Suppose that at some point in the iterations, $d^{(i)}$ overshoots 1 and becomes $1 + \varepsilon$. The next multiplicative factor, $2 - d^{(i)} = 1 - \varepsilon$, then yields $d^{(i+1)} = (1 + \varepsilon)(1 - \varepsilon) = 1 - \varepsilon^2$. This value is smaller than 1 but still closer to 1.
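The overshoot case can be demonstrated with exact rational arithmetic. The sketch takes a sample $\varepsilon = 1/100$ (an arbitrary illustrative value) and shows that one step lands at $1 - \varepsilon^2$. The name `next_d` is hypothetical.

```python
from fractions import Fraction

def next_d(d: Fraction) -> Fraction:
    """One convergence step d^(i+1) = d^(i) * (2 - d^(i))."""
    return d * (2 - d)

eps = Fraction(1, 100)
d_over = 1 + eps                 # d^(i) has overshot 1
d_next = next_d(d_over)          # (1 + eps)(1 - eps) = 1 - eps^2
print(d_next)                    # 9999/10000
```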

11/30 Analysis of the Truncated Multiplicative Factor (1/2) We begin by noting that $d\,x^{(0)}x^{(1)}\cdots x^{(i)} = 1 - y^{(i)}$, so $x^{(i+1)} = 2 - (1 - y^{(i)}) = 1 + y^{(i)}$. Assume that we truncate $1 - y^{(i)}$ to an $a$-bit fraction, thus obtaining $(1 - y^{(i)})_T$ with an error $\alpha < 2^{-a}$.

12/30 Analysis of the Truncated Multiplicative Factor (2/2) With this truncated value, the multiplicative factor becomes $(x^{(i+1)})_T = 2 - (1 - y^{(i)})_T = 1 + y^{(i)} + \alpha$, where $0 \le (x^{(i+1)})_T - x^{(i+1)} = \alpha < 2^{-a}$. Thus $d\,x^{(0)}x^{(1)}\cdots x^{(i)}(x^{(i+1)})_T = (1 - y^{(i)})(1 + y^{(i)} + \alpha) = 1 - (y^{(i)})^2 + \alpha(1 - y^{(i)}) = d\,x^{(0)}x^{(1)}\cdots x^{(i)}x^{(i+1)} + \alpha(1 - y^{(i)})$.
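The sketch below checks the identity on this slide exactly. It assumes that "truncate" means chopping the binary expansion of $d^{(i)} = 1 - y^{(i)}$ toward zero to $a$ fraction bits. The helpers `truncate` and `truncated_step` are hypothetical names, and the sample value $y^{(i)} = 0.003$ is arbitrary.

```python
from fractions import Fraction
import math

def truncate(value: Fraction, a: int) -> Fraction:
    """Chop a nonnegative value to an a-bit binary fraction (toward zero)."""
    scale = 2 ** a
    return Fraction(math.floor(value * scale), scale)

def truncated_step(d_i: Fraction, a: int) -> tuple[Fraction, Fraction]:
    """Return (d_i times the truncated factor, alpha), where d_i = 1 - y."""
    d_t = truncate(d_i, a)          # (1 - y)_T <= 1 - y
    x_t = 2 - d_t                   # (x^(i+1))_T = 1 + y + alpha
    alpha = x_t - (2 - d_i)         # 0 <= alpha < 2^-a
    return d_i * x_t, alpha

d_i = 1 - Fraction(3, 1000)         # y^(i) = 0.003, roughly 8 bits of convergence
d_next, alpha = truncated_step(d_i, 16)
y = 1 - d_i
assert 0 <= alpha < Fraction(1, 2 ** 16)
assert d_next == 1 - y ** 2 + alpha * (1 - y)   # the identity on this slide
```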

13/30 Fig. 16.4 Fig. 16.4 Convergence in division by repeated multiplications with initial table lookup and the use of truncated multiplicative factors.

14/30 Fig. 16.4 The first pair of multiplications following the table lookup involves a narrow multiplier, so it may be faster than a full-width multiplication. The same applies to later multiplications if the multiplier is suitably truncated. The result is that convergence occurs from above or below.

15/30 Fig. 16.5 Fig. 16.5 One step in convergence division with truncated multiplicative factors.

16/30 Fig. 16.5 If we aim to go from $l$ bits to $2l$ bits of convergence, we can truncate the next multiplicative factor to $2l$ bits. Consider Fig. 16.5. $A$, the result of the precise iteration, is no more than $2^{-2l}$ below 1. With $a = 2l$, $B$, the result of the approximate iteration, will be no more than $2^{-2l}$ above 1, since $B = 1 - (y^{(i)})^2 + \alpha(1 - y^{(i)}) < 1 + \alpha$.
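This random spot check (not a proof) tests the claim that truncating the factor to $a = 2l$ bits keeps the result within $2^{-2l}$ of 1. It assumes $d^{(i)} = 1 - y^{(i)}$ with $0 \le y^{(i)} < 2^{-l}$, meaning the step starts from below as in Fig. 16.5, and it assumes chopping truncation. The function names and the sampling scheme are hypothetical.

```python
from fractions import Fraction
import math
import random

def truncated_step(d_i: Fraction, a: int) -> Fraction:
    """Multiply d_i by 2 - (d_i chopped to a fraction bits)."""
    scale = 2 ** a
    d_t = Fraction(math.floor(d_i * scale), scale)
    return d_i * (2 - d_t)

def check_bound(l: int, trials: int = 1000, seed: int = 1) -> bool:
    """Sample y in [0, 2^-l) and test |1 - B| <= 2^-2l with a = 2l."""
    rng = random.Random(seed)
    bound = Fraction(1, 2 ** (2 * l))
    for _ in range(trials):
        y = Fraction(rng.randrange(0, 2 ** 20), 2 ** 20) / 2 ** l   # 0 <= y < 2^-l
        b = truncated_step(1 - y, 2 * l)
        if abs(1 - b) > bound:
            return False
    return True

print(check_bound(8))   # True
```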

17/30 Example: 64-bit division Initial step: a table of size $256 \times 8$ = 2K bits. Middle steps: multiplication pairs with 9-, 17-, and 33-bit multipliers. Final step: a full $64 \times 64$ multiplication.
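The sketch below reproduces the numbers in this example. It assumes that a $2^w \times w$ table gives $w$ bits after the first pair, that each middle pair doubles the convergence with its factor truncated to $2l$ fraction bits plus the leading 1, and that the step reaching $k$ bits is a full-width multiplication. The name `width_schedule` is hypothetical.

```python
def width_schedule(k: int, w: int) -> tuple[int, list[int]]:
    """Table size in bits and multiplier widths of the middle pairs."""
    table_bits = (2 ** w) * w
    widths = [w + 1]                 # x^(0+) = 1.xxx...x with w fraction bits
    l = w                            # bits of convergence after the first pair
    while 2 * l < k:                 # stop before the final full k x k step
        widths.append(2 * l + 1)     # factor truncated to 2l fraction bits plus the 1
        l *= 2
    return table_bits, widths

print(width_schedule(64, 8))   # (2048, [9, 17, 33])
```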

18/30 16.5. Hardware Implementation

19/30 Hardware Implementation Fig. 16.6 Two multiplications fully overlapped in a 2-stage pipelined multiplier.

20/30 Fig. 16.6 As the computation of $z^{(i)}x^{(i)}$ moves from the top to the bottom pipeline stage, the next iteration begins by computing $d^{(i+1)}x^{(i+1)}$ in the top stage.
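The sketch below prints a cycle-by-cycle view of the overlap described on this slide. It is a simplified, hypothetical timing model: each stage takes one cycle, $d^{(i)}x^{(i)}$ enters first and $z^{(i)}x^{(i)}$ follows, and forming $x^{(i+1)} = 2 - d^{(i+1)}$ adds no cycle. Under these assumptions, $d^{(i+1)}x^{(i+1)}$ can enter the top stage as soon as $d^{(i)}x^{(i)}$ has left the bottom stage.

```python
def pipeline_schedule(iterations: int) -> list[tuple[int, str, str]]:
    """(cycle, top stage, bottom stage) for a hypothetical 2-stage multiplier."""
    issued = []
    for i in range(iterations):
        issued.append(f"d{i}*x{i}")   # needs x^(i), available once d^(i) is known
        issued.append(f"z{i}*x{i}")   # independent of d^(i)*x^(i), so it fills the gap
    schedule = []
    for cycle in range(len(issued) + 1):
        top = issued[cycle] if cycle < len(issued) else "-"
        bottom = issued[cycle - 1] if cycle >= 1 else "-"
        schedule.append((cycle, top, bottom))
    return schedule

sched = pipeline_schedule(2)
for row in sched:
    print(row)
```

In this model, the pipeline never idles between iterations, which matches the "fully overlapped" description.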

21/30 Implementing Division with Reciprocation In reciprocation, the multiplication pairs are data-dependent, so they cannot be pipelined or performed in parallel. In the recurrence $x^{(i+1)} = x^{(i)}(2 - x^{(i)}d)$, the second multiplication by $x^{(i)}$ needs the result of the first one. The most promising speedup method therefore relies on deriving a better starting approximation to $1/d$.
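The recurrence on this slide can be written out directly. The sketch below uses exact fractions to show the data dependence, since the second product needs `t` from the first. It also shows the quadratic convergence: the error $1 - d\,x^{(i)}$ squares at each step. The values $d = 3/4$, $x^{(0)} = 1$, and $z = 1/2$ are arbitrary illustrative choices, and `reciprocal_iterates` is a hypothetical name.

```python
from fractions import Fraction

def reciprocal_iterates(d: Fraction, x0: Fraction, iterations: int) -> list[Fraction]:
    """Reciprocation x^(i+1) = x^(i) * (2 - x^(i) * d); returns all iterates."""
    xs = [x0]
    x = x0
    for _ in range(iterations):
        t = x * d              # first multiplication
        x = x * (2 - t)        # second multiplication cannot start until t is known
        xs.append(x)
    return xs

d = Fraction(3, 4)
xs = reciprocal_iterates(d, Fraction(1), 3)
print([1 - d * x for x in xs])   # errors 1/4, 1/16, 1/256, 1/65536
z = Fraction(1, 2)
q = z * xs[-1]                   # quotient z/d approximated as z * (1/d)
```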

22/30 The Required Lookup Table The required lookup table can be made smaller, or eliminated entirely, by a variety of methods: storing the reciprocal values for fewer points; using linear or higher-order interpolation to compute the starting approximation; formulating the starting approximation as a multi-operand addition problem and using (or passing it through) the multiplier's CSA tree, suitably augmented, to compute it.
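The sketch below illustrates the interpolation idea from the list above in floating point. A hypothetical table stores $1/d$ at only a few evenly spaced points in $[1/2, 1]$, and the starting approximation is formed by linear interpolation. The table layout and function names are assumptions; a hardware version would use fixed-point values. For $f(d) = 1/d$ on $[1/2, 1)$, the standard error bound $h^2 \max|f''|/8$ gives at most $2h^2$ for spacing $h$.

```python
def build_table(points: int) -> list[float]:
    """Hypothetical table of 1/d at points+1 evenly spaced d in [1/2, 1]."""
    h = 0.5 / points
    return [1.0 / (0.5 + j * h) for j in range(points + 1)]

def approx_reciprocal(d: float, table: list[float]) -> float:
    """Linearly interpolate 1/d between the two stored neighbors (1/2 <= d < 1)."""
    points = len(table) - 1
    h = 0.5 / points
    j = min(int((d - 0.5) / h), points - 1)   # index of the segment containing d
    t = (d - (0.5 + j * h)) / h               # position within the segment, 0..1
    return table[j] + t * (table[j + 1] - table[j])

table = build_table(16)                      # 17 stored values instead of a full table
print(approx_reciprocal(0.7, table), 1 / 0.7)
```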

23/30 16.6. Analysis of Lookup Table Size

24/30 Theorem for Table Size Theorem 16.1: To get $w \ge 5$ bits of convergence after the first iteration of division by repeated multiplications, $w$ bits of $d$ (beyond the mandatory 1) must be inspected. The factor $x^{(0+)}$ read out from the table is of the form $(1.xxx\ldots xxx)_{two}$, with $w$ bits after the radix point. Based on the theorem, the required table size is $2^w \times w$ bits. The cases $w < 5$ are practically uninteresting (they allow a smaller table), so we can ignore them.

25/30 Analysis of Lookup Table Size (1/4) Recall that our objective is to have $1 - 2^{-w} \le d\,x^{(0+)} \le 1 + 2^{-w}$. Let $d = (0.1\,d_{-2}d_{-3}\cdots d_{-(w+1)}\,d_{-(w+2)}\cdots d_{-l})_{two}$, where the $w$ bits $d_{-2}$ through $d_{-(w+1)}$ are the ones to be inspected. Theorem 16.1 postulates the existence of $x^{(0+)} = (1.x^+_{-1}x^+_{-2}\cdots x^+_{-w})_{two}$ satisfying the objective inequality.

26/30 Analysis of Lookup Table Size (2/4) Let $u = (1\,d_{-2}d_{-3}\cdots d_{-(w+1)})_{two}$, so that $2^w \le u < 2^{w+1}$. We have $2^{-(w+1)}u \le d < 2^{-(w+1)}(u+1)$. Similarly, let $v = (1\,x^+_{-1}x^+_{-2}\cdots x^+_{-w})_{two} = 2^w x^{(0+)}$. The objective inequality can then be rewritten as $2^w - 1 \le dv \le 2^w + 1$.

27/30 Analysis of Lookup Table Size (3/4) We derive the following sufficient conditions: $2^w - 1 \le 2^{-(w+1)}uv$ and $2^{-(w+1)}(u+1)v \le 2^w + 1$. These conditions lead to the following restrictions on $v$: $2^{w+1}(2^w - 1)/u \le v \le 2^{w+1}(2^w + 1)/(u+1)$.

28/30 Analysis of Lookup Table Size (4/4) The latter condition is equivalent to an inequality whose validity in all cases is left as an exercise. This completes the "sufficiency" part of the proof. For the necessity part: at least $w$ bits of $d$ must be inspected, and $x^{(0+)}$ must have at least $w$ bits after the radix point.
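The sufficiency part can be spot-checked exhaustively for a few values of $w$. For every prefix $u$, the sketch below computes the integer bounds on $v$ from the restrictions $2^{w+1}(2^w - 1)/u \le v \le 2^{w+1}(2^w + 1)/(u+1)$. It then confirms that the smallest admissible $v$ is a valid $(1.x\ldots x)_{two}$ pattern, that is, $2^w \le v < 2^{w+1}$. It assumes $d$ is normalized to $[1/2, 1)$ as on these slides. The choice of the lower bound and the function names are hypothetical, and this is a check, not a replacement for the proof.

```python
def table_entry_range(w: int, u: int) -> tuple[int, int]:
    """Integer bounds on v for a prefix 2^w <= u < 2^(w+1)."""
    lo = -(-(2 ** (w + 1) * (2 ** w - 1)) // u)        # ceil(2^(w+1)(2^w - 1) / u)
    hi = (2 ** (w + 1) * (2 ** w + 1)) // (u + 1)      # floor(2^(w+1)(2^w + 1) / (u + 1))
    return lo, hi

def check_sufficiency(w: int) -> bool:
    """True if every w-bit prefix of d admits a factor with w fraction bits."""
    for u in range(2 ** w, 2 ** (w + 1)):
        lo, hi = table_entry_range(w, u)
        v = lo                                          # any integer in [lo, hi] would do
        if not (v <= hi and 2 ** w <= v < 2 ** (w + 1)):
            return False
    return True

print(all(check_sufficiency(w) for w in range(5, 13)))   # True
```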

29/30 Example Table 16.2 Sample entries in the lookup table replacing the first four multiplications in division by repeated multiplications
––––––––––––––––––––––––––––––––––––––––––––––––––
Address    d = 0.1 xxxx xxxx    x(0+) = 1. xxxx xxxx
––––––––––––––––––––––––––––––––––––––––––––––––––
  55           0011 0111            1010 0101
  64           0100 0000            1001 1001
––––––––––––––––––––––––––––––––––––––––––––––––––
Example: Consider the table entry at address 55 ($311/512 \le d < 312/512$). For 8 bits of convergence, the table entry $f$ must satisfy $(311/512)(1 + .f) \ge 1 - 2^{-8}$ and $(312/512)(1 + .f) \le 1 + 2^{-8}$. This gives $199/311 \le .f \le 101/156$, or $163.81 \le 256 \times .f \le 165.74$. There are two choices: $164 = (1010\,0100)_{two}$ or $165 = (1010\,0101)_{two}$.
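The sketch below repeats the address-55 calculation for any address. It solves both inequalities for the integer $256 \times .f$ using exact fractions and lists every admissible 8-bit entry. It assumes, as the slides do, that the address holds the 8 bits of $d$ after the leading $0.1$, and that the closed upper endpoint is used as a sufficient condition. The name `entry_choices` is hypothetical.

```python
from fractions import Fraction
import math

def entry_choices(address: int, w: int = 8) -> list[int]:
    """All f with 1 - 2^-w <= d * (1 + f/2^w) <= 1 + 2^-w across the address interval."""
    scale = 2 ** w
    d_lo = Fraction(scale + address, 2 * scale)        # (0.1 followed by the address bits)_two
    d_hi = Fraction(scale + address + 1, 2 * scale)    # upper end of the interval
    err = Fraction(1, scale)
    f_min = math.ceil(((1 - err) / d_lo - 1) * scale)  # from d_lo * x >= 1 - 2^-w
    f_max = math.floor(((1 + err) / d_hi - 1) * scale) # from d_hi * x <= 1 + 2^-w
    return list(range(f_min, f_max + 1))

for addr in (55, 64):
    print(addr, [format(f, "08b") for f in entry_choices(addr)])
```

For address 55, the admissible entries are 164 and 165, as derived above. For address 64, they are 152 and 153, and the table's entry `1001 1001` is 153.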

30/30 Reference [1] Behrooz Parhami, "Computer Arithmetic: Algorithms and Hardware Designs," Oxford University Press, 2000.
