CSE 246: Computer Arithmetic Algorithms and Hardware Design Instructor: Prof. Chung-Kuan Cheng Fall 2006 Lecture 8: Division.


1 CSE 246: Computer Arithmetic Algorithms and Hardware Design Instructor: Prof. Chung-Kuan Cheng Fall 2006 Lecture 8: Division

2 Topics: Radix-4 SRT Division; Division by a Constant; Division by Repeated Multiplication

3 Project Update: Come in to speak briefly about the final project. Status update: 2:30 – 3:00 p.m., Tuesday or Thursday.

4 Radix-4 SRT Division: $4s_{j-1} = q_j d + s_j$, where $q_j \in [-2, 2]$ and $s_{j-1} \in [-hd, +hd]$, with $h \le 2/3$. Therefore $s_{j-1} \in [-2d/3, 2d/3]$ and $4s_{j-1} \in [-8d/3, 8d/3]$. At each step, $s$ shifts to the left by 2 bits.

5 Radix-4 SRT Division [Figure: number line for $4s_{j-1}$ with binary tick marks and boundaries at $-2d/3$, $d/3$, $2d/3$, $d$, $4d/3$, $5d/3$, and $8d/3$, showing the selection regions for $q_j = 0$, $q_j = 1$, and $q_j = 2$.] The overlap regions of $q_j$ denote a choice that still allows the recursion to continue. The gap defines the precision for carry-save addition. Anything above $8d/3$ goes against our assumption and is therefore the infeasible region.

6 Radix-4 SRT Division: The value of $q_j$ determines the range of $4s_{j-1}$ it governs, namely $[(q_j - 2/3)d, (q_j + 2/3)d]$. For example, for $q_j = 1$: $1 + 2/3 = 5/3$ and $1 - 2/3 = 1/3$, so the range is $d/3$ to $5d/3$.
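The recurrence on slides 4 to 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the digit is chosen by exact comparison against $d/2$ and $3d/2$ (thresholds that sit inside the overlap regions), whereas real SRT hardware selects from a truncated carry-save estimate and a table. The names `select_digit` and `srt_radix4` are hypothetical, and $d > 0$ is assumed.

```python
from fractions import Fraction


def select_digit(shifted, d):
    """Pick q_j in {-2, ..., 2} by exact comparison (assumption: no truncation)."""
    if shifted >= 3 * d / 2:      # inside the q=1 / q=2 overlap [4d/3, 5d/3]
        return 2
    if shifted >= d / 2:          # inside the q=0 / q=1 overlap [d/3, 2d/3]
        return 1
    if shifted > -d / 2:
        return 0
    if shifted > -3 * d / 2:
        return -1
    return -2


def srt_radix4(z, d, steps):
    """Run `steps` radix-4 iterations of 4*s_{j-1} = q_j*d + s_j.

    Returns (q, s) with z == q*d + s / 4**steps and |s| <= 2d/3.
    """
    z, d = Fraction(z), Fraction(d)
    if d <= 0 or abs(z) > 2 * d / 3:
        raise ValueError("need d > 0 and |z| <= 2d/3")
    s = z
    q = Fraction(0)
    for j in range(1, steps + 1):
        shifted = 4 * s                   # shift left by 2 bits
        digit = select_digit(shifted, d)
        s = shifted - digit * d           # new partial remainder s_j
        q += Fraction(digit, 4 ** j)      # accumulate the signed digit
    return q, s
```

Because the remainder stays within $[-2d/3, 2d/3]$ after every step, the quotient error after $k$ steps is at most $(2/3) \cdot 4^{-k}$.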

7 Division by a Constant: Multiplication takes O(log n) time, but division takes linear time, which is much slower, so try to convert division into multiplication. Property: given an odd number $d$, there exists $m$ such that $d \cdot m = 2^n - 1$. Examples: $d = 3$, $m = 5$: $3 \times 5 = 2^4 - 1$; $d = 7$, $m = 9$: $7 \times 9 = 2^6 - 1$; $d = 11$, $m = 93$: $11 \times 93 = 2^{10} - 1$.
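The property on this slide is easy to check numerically. The helper below is a small sketch, not part of the lecture; the name `multiplier_for` is hypothetical. Given an odd $d$ and a candidate $n$, it returns the $m$ with $d \cdot m = 2^n - 1$ when one exists.

```python
def multiplier_for(d, n):
    """Return m with d * m == 2**n - 1, or None if d does not divide 2**n - 1."""
    if d <= 0 or d % 2 == 0:
        raise ValueError("d must be a positive odd integer")
    all_ones = (1 << n) - 1          # 2**n - 1
    return all_ones // d if all_ones % d == 0 else None
```

For the slide's examples, `multiplier_for(3, 4)`, `multiplier_for(7, 6)`, and `multiplier_for(11, 10)` give 5, 9, and 93 respectively.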

8 Division by a Constant: $1/d = m/(2^n - 1)$. $1/(1-r) = 1 + r + r^2 + r^3 + \dots = (1+r)(1+r^2)(1+r^4)(1+r^8)\cdots$. Hence $\frac{m}{2^n - 1} = \frac{m}{2^n (1 - 2^{-n})} = \frac{m}{2^n}(1+2^{-n})(1+2^{-2n})(1+2^{-4n})\cdots$. Example: $z/7 = zm/(2^n - 1)$ with $m = 9$, $n = 6$: $\frac{9z}{2^6 (1 - 2^{-6})} = \frac{9z}{2^6}(1+2^{-6})(1+2^{-12})(1+2^{-24})$; log(n/6) operations.
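A minimal sketch of the factored series, using exact rationals so the truncation error is visible. The function name `divide_by_constant` and the parameter `factors` (how many $(1 + 2^{-2^i n})$ terms to apply) are assumptions for illustration. In hardware each factor is one shift and one add.

```python
from fractions import Fraction


def divide_by_constant(z, m, n, factors):
    """Approximate z / d, where d * m == 2**n - 1, with `factors` shift-add terms."""
    r = Fraction(1, 2 ** n)
    result = Fraction(z * m, 2 ** n)   # z * m / 2**n: one multiply and one shift
    for _ in range(factors):
        result *= 1 + r                # multiply by (1 + 2**-n), (1 + 2**-2n), ...
        r *= r                         # square r for the next factor
    return result
```

Since $(1+r)(1+r^2)(1+r^4) = (1 - r^8)/(1 - r)$, three factors with $n = 6$ give $z/7$ scaled by exactly $1 - 2^{-48}$.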

9 Division by Reciprocation: Find $1/d$ by iteration. Newton-Raphson algorithm: $x_{i+1} = x_i - f(x_i)/f'(x_i)$. Set $f(x) = 1/x - d$ with $1/2 \le d < 1$, so $f'(x) = -1/x^2$. Thus $x_{i+1} = x_i(2 - x_i d)$. Let $e_i = 1/d - x_i$. Then $e_{i+1} = 1/d - x_{i+1} = 1/d - x_i(2 - x_i d) = d(1/d - x_i)^2 = d e_i^2$. The convergence rate is quadratic. For $k$ iterations, it takes $2k$ multiplications.

10 Division by Reciprocation: Example, $z/d = 3/0.7$. $x_0 = 4(\sqrt{3} - 1) - 2d = 2.9282 - 2d = 1.5282$; $e_0 = 1/d - x_0 = 1/0.7 - 1.5282 = -0.0996286$. $x_1 = x_0(2 - x_0 d) = 1.4216233$; $e_1 = 1/d - x_1 = 0.0069481$. $x_2 = x_1(2 - x_1 d) = 1.4285376$; $e_2 = 1/d - x_2 = 0.0000338$. $x_3 = x_2(2 - x_2 d) = 1.4285714$; $e_3 = 1/d - x_3 \approx 8 \times 10^{-10}$. Each error satisfies $e_{i+1} = d e_i^2$, so the convergence rate is quadratic.
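The iteration above can be reproduced with a short floating-point sketch. The function names are hypothetical; the starting value follows the slide's $x_0 = 4(\sqrt{3} - 1) - 2d$.

```python
def initial_estimate(d):
    """Linear starting approximation to 1/d from the slide, for 1/2 <= d < 1."""
    return 4.0 * (3.0 ** 0.5 - 1.0) - 2.0 * d


def reciprocal_newton(d, x0, iterations):
    """Return [x_0, x_1, ..., x_k] for the iteration x_{i+1} = x_i * (2 - x_i * d)."""
    xs = [x0]
    x = x0
    for _ in range(iterations):
        x = x * (2.0 - x * d)   # two multiplications per iteration
        xs.append(x)
    return xs


def divide(z, d, iterations=4):
    """Compute z / d as z times the Newton-Raphson reciprocal of d."""
    return z * reciprocal_newton(d, initial_estimate(d), iterations)[-1]
```

Calling `reciprocal_newton(0.7, initial_estimate(0.7), 3)` walks through the same sequence as the worked example, and `divide(3.0, 0.7)` multiplies the final reciprocal by $z$.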

11 Division by Recursive Multiplication: $q = z/d = (z/d)(x_0/x_0)(x_1/x_1)\cdots(x_{k-1}/x_{k-1})$ (eq. (a)). Let $1/2 \le d < 1$. It takes $2k$ multiplications to evaluate eq. (a). We also need $k$ operations to find the $x_i$.

12 Division by a Repeated Multiplication: $q = z/d = (z/d)(x_0/x_0)(x_1/x_1)\cdots(x_{k-1}/x_{k-1})$. Let $1/2 \le d < 1$. Set $d_0 = d$, $x_k = 2 - d_k$. 1. $d_1 = d x_0 = d(2 - d) = 1 - (1 - d)^2$. 2. $d_{k+1} = d_k x_k = d_k(2 - d_k) = 1 - (1 - d_k)^2$. 3. $1 - d_{k+1} = (1 - d_k)^2 = (1 - d)^{2^{k+1}}$: quadratic convergence. For $k$-bit operands, we need $2m - 1$ multiplications and $m$ 2's complement operations, where $m = \lceil \log_2 k \rceil$, with $\log_2 m$ extra bits for precision.

13 Division by a Repeated Multiplication: Example, $q = z/d = 3/0.7 = (z/d)(x_0/x_0)(x_1/x_1)\cdots(x_{k-1}/x_{k-1})$, with $d_0 = d = 0.7$, $x_k = 2 - d_k$, $d_{k+1} = d_k x_k$. 1. $x_0 = 2 - d_0 = 1.3$, $d_1 = d_0 x_0 = 0.7 \times 1.3 = 0.91$. 2. $x_1 = 2 - d_1 = 1.09$, $d_2 = d_1 x_1 = 0.91 \times 1.09 = 0.9919$. 3. $x_2 = 2 - d_2 = 1.0081$, $d_3 = d_2 x_2 = 0.9919 \times 1.0081 = 0.99993439$.
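The scheme on slides 11 to 13 can be sketched as follows, assuming floating-point arithmetic and a divisor already normalized to $1/2 \le d < 1$. The function name is hypothetical. The numerator and denominator are multiplied by the same $x_k$ at every step, so the denominator converges to 1 and the numerator to the quotient.

```python
def divide_by_repeated_multiplication(z, d, steps):
    """Return (q_k, d_k) after `steps` rounds of multiplying z and d by x_k = 2 - d_k."""
    if not 0.5 <= d < 1.0:
        raise ValueError("divisor must be normalized to 1/2 <= d < 1")
    q, d_k = z, d
    for _ in range(steps):
        x_k = 2.0 - d_k   # 2's complement of d_k
        q *= x_k          # numerator converges to z / d
        d_k *= x_k        # denominator converges to 1
    return q, d_k
```

With `z = 3.0`, `d = 0.7`, and `steps = 3`, the returned denominator matches $d_3$ in the worked example, and the returned numerator equals $(z/d) \cdot d_3$.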

14 Division Methods: Iteration; Memory; Arithmetic

15 Division – Iteration effort: Pencil-and-paper method ($A = QB + 2^{-n}R$ and $R < B$): 1 bit of partial quotient per iteration, $n$ iterations. $Q_i$: partial quotient; $R_i$: partial remainder; $R_i = R_{i-1} - B \times Q_i$ with $R_0 = A$. Example: $A = 0.1001$, $B = 0.1010$; $Q = A/B$. [Figure: long-division worked example showing the partial remainders $R_0 = A$, $R_1$, $R_2$, $R_3$, $R_4$.] $Q_1 = 0.1$, $Q_2 = 0.01$, $Q_3 = 0.000$, $Q_4 = 0.0001$; $Q = 0.1101 + \dots$
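A minimal sketch of the one-bit-per-iteration method, using exact fractions. The function name and the operands in the usage note are illustrative assumptions, not the slide's example. The returned remainder is the unscaled final partial remainder, which corresponds to $2^{-n}R$ in the slide's notation.

```python
from fractions import Fraction


def pencil_and_paper_divide(a, b, n):
    """Return (Q, R_n) with a == Q*b + R_n and R_n < b * 2**-n, one bit per step."""
    a, b = Fraction(a), Fraction(b)
    if not 0 <= a < b:
        raise ValueError("need 0 <= a < b so that the quotient is a fraction")
    q = Fraction(0)
    r = a                                   # R_0 = A
    for i in range(1, n + 1):
        weight = Fraction(1, 2 ** i)        # value of quotient bit i
        q_i = weight if r >= b * weight else Fraction(0)
        r -= b * q_i                        # R_i = R_{i-1} - B * Q_i
        q += q_i
    return q, r
```

For instance, dividing $0.0111_2 = 7/16$ by $0.1100_2 = 12/16$ with $n = 4$ yields $Q = 0.1001_2$ and a final remainder of $1/64$.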

16 Division – Memory effort: A lookup table is the simplest way to obtain multiple partial quotient bits in each iteration. SRT method: a lookup table stores $m$-bit partial quotients, selected by $m$ bits of the partial remainder and $m$ bits of the divisor. Table size: $2^{2m} \times m$ bits. The SRT method is limited by the memory wall.
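To see why the table size becomes a wall, the formula $2^{2m} \times m$ can be evaluated for a few values of $m$. This one-line helper is only an illustration; its name is hypothetical.

```python
def srt_table_bits(m):
    """Bits in a table indexed by m remainder bits and m divisor bits, m bits per entry."""
    return (1 << (2 * m)) * m
```

For $m = 4$ the table holds 1024 bits, for $m = 8$ it holds 524288 bits, and for $m = 16$ it already needs $16 \times 2^{32}$ bits (8 GiB), so the number of quotient bits retired per lookup cannot grow far.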

17 Division – Arithmetic effort: The partial quotient is calculated by arithmetic functions. Methods: prescaling; Taylor expansion; series expansion.

18 Division – Solution space: Modern FPGAs contain plenty of memory and built-in multipliers, which enable high-performance dividers. [Figure: solution space with axes Iteration Effort, Memory Effort, and Arithmetic Effort; labels: Memory Wall, Pencil-and-paper, SRT, Prescaling, Taylor Expansion, Low area, Series Expansion, Low latency, Our target.]

19 Division – PST algorithm: Utilize the power of series expansion, which needs a good starting point. Prescaling provides a scaled divisor close to 1. A 0-order Taylor expansion then iterates to reach the final quotient.

20 Division – PST algorithm: $E_0 = \mathrm{Table}(d^{(m)}) \approx 1/d$; $z_1 = z \times E_0$; $d_1 = d \times E_0$; $E_1 = (2 - d_1) \approx \mathrm{INV}(d_1^{(2m)})$; $Q_i = R_{i-1} \times E_1$; $R_i = R_{i-1} - Q_i \times d_1$ with $R_0 = z_1$; $Q = Q + Q_i$. Example ($m = 4$): $z = 0.1011,0110$; $d = 0.1100,1011$; $d^{(m)} = 0.1100 \Rightarrow E_0 = 1.0011$; $z_1 = z \times E_0 = 0.1101,1000,0010$; $d_1 = d \times E_0 = 0.1111,0001,0001$; $E_1 = \mathrm{INV}(d_1^{(2m)}) = 1.0000,1110$; $Q_1 = z_1 \times E_1 = 0.1110,0011$; $R_1 = z_1 - Q_1 \times d_1 = 0.0000,0010,0101,1110,1101$; $Q_2 = R_1 \times E_1 = 0.1001,1111$ and $R_2 = R_1 - Q_2 \times d_1 = 0.0000,0001,1111,1011,0001$ (both shown shifted left by 6 bits); $Q = 0.1110,0011 + 0.0000,0010,0111,11 = 0.1110,0101,0111,11$.
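The worked example can be replayed with exact fractions. This is a sketch under several assumptions that the slide does not state explicitly: the prescaling factor $E_0$ is passed in rather than read from a table (the table contents are not given), intermediate values are truncated rather than rounded, and the $i$-th partial quotient is kept to $2m + (i-1)(2m-2)$ fractional bits, a width inferred from the example alone. The names `truncate` and `pst_divide` are hypothetical.

```python
from fractions import Fraction
from math import floor


def truncate(x, bits):
    """Keep `bits` fractional bits of x, discarding the rest."""
    scale = 1 << bits
    return Fraction(floor(x * scale), scale)


def pst_divide(z, d, e0, m, iterations):
    """Return (Q, R, d1) such that z/d == Q + R/d1 after prescaling by e0."""
    z1 = z * e0                          # prescaled dividend
    d1 = d * e0                          # prescaled divisor, close to 1
    e1 = truncate(2 - d1, 2 * m)         # 0-order estimate of 1/d1
    q = Fraction(0)
    r = z1                               # R_0 = z_1
    for i in range(1, iterations + 1):
        # Assumption: each step adds 2m - 2 new quotient bit positions.
        bits = 2 * m + (i - 1) * (2 * m - 2)
        q_i = truncate(r * e1, bits)     # Q_i = R_{i-1} * E_1
        r -= q_i * d1                    # R_i = R_{i-1} - Q_i * d_1
        q += q_i
    return q, r, d1
```

With `z = Fraction(182, 256)`, `d = Fraction(203, 256)`, `e0 = Fraction(19, 16)`, `m = 4`, and two iterations, the accumulated quotient is $0.1110,0101,0111,11_2$, matching the slide.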

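21 Division – FPGA Implementation: The PST algorithm is suitable for high-performance division unit design in FPGAs. Results for 32-bit division with 5-cycle latency:
IP Core (no DSP): Fmax 50.16 MHz (period 19.935 ns); ALUTs/Memory Bits/DSP Blocks 1203840; power 381 mW (52 mW dynamic + 329 mW static); throughput 50.16 Mdiv/s
PST (DSP): Fmax 72.8 MHz (period 13.737 ns); ALUTs 213; Memory Bits 768; DSP Blocks 28; power 350 mW (23 mW dynamic + 327 mW static); throughput 24.3 Mdiv/s
PST (no DSP): Fmax 73.20 MHz (period 13.661 ns); ALUTs 1437; Memory Bits 768; DSP Blocks 0; power 378 mW (50 mW dynamic + 328 mW static); throughput 24.4 Mdiv/s
PST-pipelined (DSP): Fmax 74.15 MHz (period 13.486 ns); ALUTs 261; Memory Bits 768; DSP Blocks 40; power 344 mW (17 mW dynamic + 327 mW static); throughput 74.15 Mdiv/s
PSTp (no DSP): Fmax 76.05 MHz (period 13.150 ns); ALUTs 1940; Memory Bits 768; DSP Blocks 0; power 359 mW (31 mW dynamic + 328 mW static); throughput 76.05 Mdiv/s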

