CSE 575 Computer Arithmetic Spring 2005 Mary Jane Irwin (www. cse. psu


Remaining Lecture Schedule
Mar 15  Introduction, number repr  Dr. Irwin
Mar 17  Local project design review  Theo T.
Mar 22  Global project review  Dr. Vijay
Mar 24
Mar 29  Addition
Apr 1   Redundant repr & its uses
Apr 5   Multiplication
Apr 7   Local/Global project review
Apr 12  More multiplication
Apr 14  Division
Apr 19  More division
Apr 21  Final global project review
Apr 26  Flt point repr & operation
Apr 28

Division
At first glance division appears to be the inverse of multiplication: $Q = P/D \Leftrightarrow Q \times D = P$
Shift and add (multiplication) can generate all the $Q \times D$ terms in parallel (PP array)
Shift and subtract (division) cannot generate the next quotient digit until the present one is determined and the subtract and shift cycle is complete
Inherently a serial process
Quotient digit selection is trial and error

Lower Bound on Division
Winograd's lower bound on division of two n-digit d-valued numbers is $t \ge \log_2 n$
Division is a group operation whose output is dependent on all inputs
Division can be done as the subtraction of the log representations of two numbers, $A / B = C \Leftrightarrow \log A - \log B = \log C$, but the data representation is (again) nonstandard
Fewer bits are normally required in the log repr (e.g., $\log_2 16 = 4$, so 3 bits instead of 5 bits), so mult should be even faster than addition

Division Operation
Division as repeated shift & subtracts
[Figure: partial remainder array, with Q the quotient (n digits), P the dividend (2n digits), D the divisor (n digits), and R the remainder (n digits)]
$\tfrac{1}{2} \le Q < 1$, $P < D$, $\tfrac{1}{2} \le D < 1$
$Q = P/D$ and so [1/2, 1) / [1/2, 1) = [1/2, 2), but by restricting $P < D$ then $Q \in [1/2, 1)$ and no overflow occurs

Remember Long Division?
[Figure: long division layout with Q the quotient, D the divisor, P the dividend ($P < D$), the partial remainder array, and R the remainder]
For lecture: P = 7/16 and D = 10/16, n = 4, so Q = 7/10 (89.6/128) = 11/16 (88/128) + 1/128 + ...

Shift & Subt Division
Left shift and subtract
  Left shift the partial remainder
  Select the next quotient digit, $q_{i+1}$
  Form the product $q_{i+1} \times D$
  Subtract that product from the current partial remainder to form the next partial remainder
Inherently a serial process
Right directed (most to least significant)
Quotient digit selection is the hard part

Division Recursion
Division iterations produce the quotient digits $q_0 . q_1 q_2 q_3 \ldots q_{n-1}$, where $q_0$ is the sign bit
$P_0 = P$ (the dividend, with $P < D$)
$P_1 = rP_0 - q_1 D$ ($D$ is the normalized divisor)
...
$P_{i+1} = rP_i - q_{i+1} D$ for $i = 0, 1, 2, \ldots, n-2$
$P_{n-1}$ = the remainder after n iterations
$Q = P/D$ and since $P \in [1/2, 1)$ and $D \in [1/2, 1)$ then $Q \in [1/2, 2)$

Proof of Convergence
For $i = 0$: $P_1 = rP_0 - q_1 D$
Since $P_0 < D$, then $rP_0 < rD$. Working with the largest possible quotient digit, $q_1 = r-1$:
$|P_1| = |rP_0| - (r-1)|D| = |rP_0| - r|D| + |D| < |D|$, so $P_1 < D$
For $i = 1$: $P_2 = rP_1 - q_2 D = r^2 P_0 - r q_1 D - q_2 D = r^2 P_0 - (r q_1 + q_2) D$
By induction, convergence is guaranteed if $q_{i+1}$ is selected so that $P_{i+1} < D$ for all $i$
For $i = n-2$: $P_{n-1} = r^{n-1} P_0 - (r^{n-2} q_1 + r^{n-3} q_2 + \cdots + q_{n-1}) D$
$P/D = \sum_{i=1}^{n-1} r^{-i} q_i + r^{-n+1} P_{n-1}/D$ (the sum is Q; the last term is the scaled remainder)

Quotient Digit Selection
Quotient digit selection is the crucial step to guarantee convergence
RESTORING: $0 \le P_{i+1} < D$ so that $q_i \in \{0, \ldots, r-1\}$
NONRESTORING: $|P_{i+1}| \le |D|$ so that $q_i \in \{-(r-1), \ldots, -1, 1, \ldots, r-1\}$
SRT: $|P_{i+1}| \le k|D|$ where $\tfrac{1}{2} < k \le 1$, with $q_i \in \{-(r-1), \ldots, -1, 0, 1, \ldots, r-1\}$

Restoring Division
"Guess and correct"
$0 \le P_{i+1} < D$ so that $q_i \in \{0, \ldots, r-1\}$
Quotient digit selection rule
  subtract the divisor from the spr (shifted partial remainder) until the difference becomes negative, keeping track of the # of subs (from 1 to r)
  add back in the divisor one time to restore the spr to a positive value
  the net # of subs is the quotient digit value
Need up to r+1 add/sub to generate one quotient digit; need n such cycles

Binary Restoring Division
Basic iterations in binary are $P_{i+1} = 2P_i - q_{i+1} D$ where $q_i \in \{0, 1\}$
  $q_{i+1} = 0$ if $2P_i < D \Rightarrow P_{i+1} = 2P_i$
  $q_{i+1} = 1$ if $2P_i \ge D \Rightarrow P_{i+1} = 2P_i - D$
  stop when $2P_i = 0$ or after n iterations
Really doing $PT_{i+1} = 2P_i - D$ and
  if $PT_{i+1} < 0$ then $q_{i+1} = 0$ and $P_{i+1} = PT_{i+1} + D$ (the restore op)
  if $PT_{i+1} \ge 0$ then $q_{i+1} = 1$ and $P_{i+1} = PT_{i+1}$
Average number of operations in binary
  n subtracts (one per digit)
  n/2 adds (restores) on average
  n one-bit left shifts
  3n/2 add/subtract cycles
NO WAY TO PARALLELIZE since quotient digits are formed serially
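The trial-subtract-and-restore iteration can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name `restoring_divide` is hypothetical, exact `Fraction` arithmetic stands in for fixed-width registers, and the recursion starts from $P_0 = P$.

```python
from fractions import Fraction

def restoring_divide(p, d, n):
    """Binary restoring division sketch for fractions with 0 <= p < d.

    Returns (quotient digits q_1..q_n, final partial remainder, add/sub cycles).
    """
    digits = []
    cycles = 0
    for _ in range(n):
        trial = 2 * p - d          # shift left one bit, then trial subtract
        cycles += 1
        if trial < 0:
            digits.append(0)       # guess was wrong
            p = trial + d          # restore by adding D back
            cycles += 1
        else:
            digits.append(1)
            p = trial
    return digits, p, cycles

digits, rem, cycles = restoring_divide(Fraction(7, 16), Fraction(10, 16), 4)
print(digits, rem, cycles)
```

With the lecture operands P = 7/16, D = 10/16 and n = 4, the sketch yields the digits 1, 0, 1, 1 (Q = 11/16), a final partial remainder of 2/16, and 5 add/subtract cycles (4 trial subtracts plus 1 restore).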

Binary Restoring Divider
[Figure: block diagram with n-bit divisor register D, n-bit CPA, (partial) remainder register P, dividend/quotient register Q, quotient digit selection, and subt/add control & sequencer]
shift PQ left one bit
subtract D from P and store results back in P
  if sign is 1 (negative) set $q_{i+1}$ to 0 and add D back to P, store results back in P
  if sign is 0 (positive) set $q_{i+1}$ to 1
loop
$T_{\text{serial-divide}} = O(n(r+1) \cdot \text{CPA}_{\text{time}})$
Division time grows superlinearly with n.

Restoring QP Diagram
[Figure: $P_{i+1}$ versus $2P_i$, showing the line $P_{i+1} = 2P_i$ for $q_{i+1} = 0$ and the line $P_{i+1} = 2P_i - D$ for $q_{i+1} = 1$ inside the convergence bounding box]

Restoring PD Plot
[Figure: $2P_i$ versus D (starting at D = 1/2), bounded above by the line $2P_i = 2D$; $q_{i+1} = 1$ where $D < 2P_i < 2D$ and $q_{i+1} = 0$ where $2P_i < D$]

Restoring Division Example
[Worked example figure; the bit patterns were not preserved in the transcript]
$P_0 = \tfrac{1}{2}P$, $\tfrac{1}{2} \le |2P_0| < 1$
shift, subtract: $P_1$ positive so $q_1 = 1$
shift, subtract: $PT_2$ negative so $q_2 = 0$; restore (add)
shift, subtract: $P_3$ positive so $q_3 = 1$
shift, subtract: $P_4$ positive so $q_4 = 1$
For lecture: note that the remainder is the same sign as the dividend. Have to deal with sign corrections for RC, DRC, SM representations. How would this work in base 4? Would there be any savings?

Nonrestoring Division
Still "guess and correct"
$|P_{i+1}| \le |D|$ so that $q_i \in \{-(r-1), \ldots, -1, 1, \ldots, r-1\}$
Quotient digit selection rule
  subtract the divisor from the positive spr (shifted partial remainder) until the difference becomes negative, keeping track of the # of subs (from 1 to r-1)
  or add the divisor to the negative spr until the sum becomes positive, keeping track of the # of adds (from 1 to r-1)
  the net # of subs/adds is the quotient digit value
Need up to r-1 add/sub to generate one quotient digit; need n such cycles

Binary Nonrestoring Division
Basic iterations in binary are $P_{i+1} = 2P_i - q_{i+1} D$ where $q_i \in \{-1, 1\}$
  $q_{i+1} = 1$ if $0 \le 2P_i < 2D \Rightarrow P_{i+1} = 2P_i - D$
  $q_{i+1} = -1$ if $-2D \le 2P_i < 0 \Rightarrow P_{i+1} = 2P_i + D$
  stop when $2P_i = 0$ or after n iterations
Average number of operations in binary: n subtracts (one per digit) and n one-bit left shifts, so n add/subtract cycles
Hardware is the same as restoring (except with different quotient digit selection logic)
  shift PQ left one bit
  subtract/add D from/to P based on the sign of D and P and store results back in P
    if sign is 1 set $q_{j+1}$ to -1 (add D to P)
    if sign is 0 set $q_{j+1}$ to 1 (subtract D from P)
  loop
How do we "store" a -1 in Q? And how do we convert Q to conventional form?
NO WAY TO PARALLELIZE since quotient digits are formed serially
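A minimal Python sketch of the nonrestoring iteration follows. The name `nonrestoring_divide` is hypothetical, a positive divisor is assumed (so only the sign of the partial remainder drives the choice), and exact `Fraction` values stand in for the hardware registers.

```python
from fractions import Fraction

def nonrestoring_divide(p, d, n):
    """Binary nonrestoring division sketch with digits in {-1, +1}.

    Assumes d > 0 and |p| < d. Returns (digits q_1..q_n, final partial remainder).
    """
    digits = []
    for _ in range(n):
        if p >= 0:
            digits.append(1)       # remainder positive: subtract D
            p = 2 * p - d
        else:
            digits.append(-1)      # remainder negative: add D
            p = 2 * p + d
    return digits, p

digits, rem = nonrestoring_divide(Fraction(7, 16), Fraction(10, 16), 4)
print(digits, rem)
```

For P = 7/16 and D = 10/16 this gives the digits 1, 1, -1, 1, which sum to 1/2 + 1/4 - 1/8 + 1/16 = 11/16, using exactly one add or subtract per digit.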

Nonrestoring QP Diagram
[Figure: $P_{i+1}$ versus $2P_i$, showing the line $P_{i+1} = 2P_i + D$ for $q_{i+1} = -1$ and the line $P_{i+1} = 2P_i - D$ for $q_{i+1} = 1$ inside the convergence bounding box]

Nonrestoring PD Plot
[Figure: $2P_i$ versus D (starting at D = 1/2), bounded by the lines $2P_i = 2D$ and $2P_i = -2D$; $q_{i+1} = 1$ where $0 \le 2P_i < 2D$ and $q_{i+1} = -1$ where $-2D \le 2P_i < 0$]

Nonrestoring Division Example
[Worked example figure; the bit patterns were not preserved in the transcript]
$P_0 = \tfrac{1}{2}P$, $\tfrac{1}{2} \le |2P_0| < 1$
shift: positive so subtract and $q_1 = 1$, giving $P_1$
shift: positive so subtract and $q_2 = 1$, giving $P_2$
shift: negative so add and $q_3 = -1$, giving $P_3$
shift: positive so subtract and $q_4 = 1$, giving $P_4$
For lecture: note that the remainder is the same sign as the dividend. Have to deal with sign corrections for RC, DRC, SM representations. How would this work in base 4? Would there be any savings?

Converting “On-the-Fly”
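Form a "pseudo" quotient digit called $x_{i+1}$
  $x_{i+1} = 0$ if $q_{i+1} = -1$ (i.e., signs disagree so add)
  $x_{i+1} = 1$ if $q_{i+1} = 1$ (i.e., signs agree so subt)
The basic recursion becomes $P_{i+1} = 2P_i + (1 - 2x_{i+1}) D$
(or $2^{-i-1}P_{i+1} = 2^{-i-1} \cdot 2P_i + (2^{-i-1} - 2^{-i-1} \cdot 2x_{i+1}) D$)
For $i = 0$: $2^{-1}P_1 = P_0 + (2^{-1} - 2^{0}x_1) D$
For $i = 1$: $2^{-2}P_2 = 2^{-1}P_1 + (2^{-2} - 2^{-1}x_2) D = P_0 + ((2^{-1} + 2^{-2}) - (2^{0}x_1 + 2^{-1}x_2)) D$
...
For $i = n-2$: $2^{-n+1}P_{n-1} = P_0 + (\sum_{j=1}^{n-1} 2^{-j} - \sum_{j=1}^{n-1} 2^{-j+1}x_j) D$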

Converting “On-the-Fly”, con’t
So $P/D = [-1 + 2^{-n+1} + \sum_{j=1}^{n-1} 2^{-j+1}x_j] + (2^{-n+1}P_{n-1})/D$
The bracketed term is the true quotient, the sum inside it is the pseudo quotient, and the last term is the remainder
Note $x_1$ has weight $2^0$ (sign position)
So true Q = pseudo Q $- 1 + 2^{-n+1}$, and
  insert 0 into Q for selection $q_{j+1} = -1$
  insert 1 into Q for selection $q_{j+1} = 1$
  as a correction step add $-1 + 2^{-n+1}$ to Q
Note that the sum of $2^{-j}$ for $j = 1$ to $n-1$ is $0.11\ldots1$, which is $1 - 2^{-n+1}$
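The conversion rule can be checked numerically with a short Python sketch. The function name `true_quotient_from_pseudo` is hypothetical, and the list is assumed to hold the $n-1$ digits $q_1 \ldots q_{n-1}$, each either -1 or +1.

```python
from fractions import Fraction

def true_quotient_from_pseudo(q_digits):
    """Map digits in {-1, +1} to pseudo digits x_j and apply the correction.

    q_digits holds q_1 .. q_{n-1}; x_1 has weight 2^0 (the sign position).
    Returns (x digits, true quotient value).
    """
    m = len(q_digits)                               # m = n - 1
    x = [1 if q == 1 else 0 for q in q_digits]      # insert 1 for +1, 0 for -1
    pseudo = sum(Fraction(xj, 2 ** j) for j, xj in enumerate(x))
    true_q = pseudo - 1 + Fraction(1, 2 ** m)       # add -1 + 2^(-n+1)
    return x, true_q

q_digits = [1, 1, -1, 1]
x, true_q = true_quotient_from_pseudo(q_digits)
direct = sum(Fraction(q, 2 ** (j + 1)) for j, q in enumerate(q_digits))
print(x, true_q, direct)
```

For the digits 1, 1, -1, 1 the pseudo quotient is 1.101 in binary (13/8), and the correction brings it to 11/16, the same value obtained by summing the signed digits directly.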

Nonrestoring Division Example
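[Worked example figure; the bit patterns were not preserved in the transcript]
$P_0 = \tfrac{1}{2}P$, $\tfrac{1}{2} \le |2P_0| < 1$
shift: positive so subtract and $q_1 = 1$, giving $P_1$ and $x_1 = 1$
shift: positive so subtract and $q_2 = 1$, giving $P_2$ and $x_2 = 1$
shift: negative so add and $q_3 = -1$, giving $P_3$ and $x_3 = 0$
shift: positive so subtract and $q_4 = 1$, giving $P_4$ and $x_4 = 1$
For lecture: or could correct by adding 1 to $x_1$ and setting $x_5 = 1$ (which doesn't cost an extra add time!)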

Speeding It Up
Use log n fast adder
Avoid add back cycle when quotient bit is 0 (i.e., nonrestoring division)
Higher radix division
  Quotient digit selection
  Forming the multiples of the divisor
Form the new partial remainder using CSAs (i.e., a carry save adder)

SRT Division
With no redundancy, Q has only one representation, so the selection of each quotient digit must be exact, requiring a full precision comparison
With redundancy, Q has several representations, so $q_{i+1}$ can be more than one value, requiring only a limited precision comparison
Removes the guess and correct cycle
So the question is: how limited?

Binary SRT Division
$|P_{i+1}| \le k|D|$ where $\tfrac{1}{2} < k \le 1$
Initial conditions: $\tfrac{1}{2} \le |D| < 1$ and $\tfrac{1}{2} \le |2P_0| < 1$
Basic recursion: $P_{i+1} = 2P_i - q_{i+1} D$
Quotient digit selection
  $q_{i+1} = -1$ if $2P_i \le -\tfrac{1}{2} \Rightarrow P_{i+1} = 2P_i + D$
  $q_{i+1} = 0$ if $-\tfrac{1}{2} < 2P_i < \tfrac{1}{2} \Rightarrow P_{i+1} = 2P_i$
  $q_{i+1} = 1$ if $2P_i \ge \tfrac{1}{2} \Rightarrow P_{i+1} = 2P_i - D$
Stop when $2P_i = 0$ or after n iterations
k is the redundancy factor/coefficient, $k = \rho/(r-1)$, where $\rho$ is the maximum digit value in the digit set and $r/2 \le \rho \le r-1$; so for $r = 2$, $\rho = 1$ and $k = 1$
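The three-way selection rule above can be sketched in Python as follows. The name `srt2_divide` is hypothetical, a positive normalized divisor is assumed, and the comparison against ±1/2 is done on exact `Fraction` values rather than on the two leading bits a hardware divider would inspect.

```python
from fractions import Fraction

def srt2_divide(p, d, n):
    """Binary SRT division sketch with digits in {-1, 0, +1}.

    Assumes 1/2 <= d < 1 and |p| <= d. Returns (digits q_1..q_n, final remainder).
    """
    half = Fraction(1, 2)
    digits = []
    for _ in range(n):
        shifted = 2 * p
        if shifted >= half:
            q = 1                  # normalized positive: subtract D
        elif shifted <= -half:
            q = -1                 # normalized negative: add D
        else:
            q = 0                  # not normalized: shift only, no add
        digits.append(q)
        p = shifted - q * d
    return digits, p

digits, rem = srt2_divide(Fraction(7, 16), Fraction(10, 16), 4)
print(digits, rem)
```

For P = 7/16 and D = 10/16 the sketch produces 1, 1, 0, -1, which again represents 11/16 but needs only three add/subtract cycles because the zero digit costs just a shift.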

Redundancy Definition Review
Signed-digit representation for base r has the digit set $\{-\rho, -(\rho-1), \ldots, -1, 0, 1, \ldots, \rho-1, \rho\}$ where $\rho$ is in the range $r/2 \le \rho \le r-1$
  $\rho = r-1$ is maximally redundant; the smallest $\rho$ in the range is minimally redundant
  Minimally redundant $\rho$: r=2, $\rho$=1; r=3, $\rho$=2; r=4, $\rho$=2; r=5, $\rho$=3; r=6, $\rho$=3; r=7, $\rho$=4; r=8, $\rho$=4
And k is the redundancy coefficient, $k = \rho/(r-1)$; if k = 1, maximally redundant; if k = 1/2, no redundancy
In base 4, with two digits
  Maximally redundant: -3-3 to 33, $7^2 = 49$ reps, but 31 distinct numbers (-15 to +15)
  Minimally redundant: -2-2 to 22, $5^2 = 25$ reps, but 21 distinct numbers (-10 to +10)
  so more numbers have two (or more) reps in maximally than in minimally redundant, but some numbers have only ONE rep
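The two-digit base 4 counts can be reproduced by brute force. The helper below is a hypothetical illustration that simply enumerates every digit pair and collects the values they encode.

```python
from itertools import product

def two_digit_counts(r, rho):
    """Count representations and distinct values of two signed digits in base r."""
    digit_set = range(-rho, rho + 1)
    values = [r * hi + lo for hi, lo in product(digit_set, repeat=2)]
    return len(values), len(set(values)), min(values), max(values)

print(two_digit_counts(4, 3))   # maximally redundant base 4
print(two_digit_counts(4, 2))   # minimally redundant base 4
```

The first call reports 49 representations covering 31 distinct values from -15 to 15, and the second reports 25 representations covering 21 distinct values from -10 to 10, matching the figures quoted on the slide.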

After initialization (assuming operands are normalized, fixed point fractions; may require one right shift of the dividend)
  If $2P_i$ is not normalized choose $q_{i+1} = 0$
  If $2P_i$ is normalized choose $q_{i+1} = -1$ or $+1$ based on the signs of $2P_i$ and D
To see why these rules work, look at the PD plot: we are trying to force $P_{i+1}$ to zero

Binary SRT Division PD Plot
$P_{i+1} = 2P_i - q_{i+1} D$ and $|P_{i+1}| \le k|D|$, so $-kD \le 2P_i - jD \le kD$
so $(-k+j)D \le 2P_i \le (k+j)D$ with $k = 1$ and $j = 1, 0, -1$
  $q_{i+1} = 1$ when $0 \le 2P_i \le 2D$
  $q_{i+1} = 0$ when $-D \le 2P_i \le D$
  $q_{i+1} = -1$ when $-2D \le 2P_i \le 0$
[Figure: PD plot of $2P_i$ versus D (starting at D = 1/2) with the lines $2P_i = 2D$, $2P_i = D$, $2P_i = -D$, and $2P_i = -2D$ bounding the three selection regions]
For lecture: pose the question, "what are the selection rules?" as set up for the next slide

Binary SRT Selection Rules
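[Figure: binary SRT PD plot bounded by $2P_i = 2D$ and $2P_i = -2D$, with the regions $q_{i+1} = 1$, $q_{i+1} = 0$, and $q_{i+1} = -1$ separated by breakpoints at $2P_i = +1/2$ and $2P_i = -1/2$]
For lecture: choose the simplest selection rules (breakpoints) for quotient digit selection, +1/2 and -1/2, to span the overlap regions where two choices for $q_{j+1}$ are valid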

Binary SRT QP Diagram
[Figure: $P_{i+1}$ versus $2P_i$ (axis marked at -2D, -D, D, 2D), showing the lines $P_{i+1} = 2P_i + D$ for $q_{i+1} = -1$, $P_{i+1} = 2P_i$ for $q_{i+1} = 0$, and $P_{i+1} = 2P_i - D$ for $q_{i+1} = 1$, drawn for D = 1/2, 3/4, and 1 inside the convergence bounding box]

Binary SRT Quotient Coding
[Figure: binary SRT PD plot with the selection regions $q_{i+1} = 1, 0, -1$, annotated with the recoding of Q that results at each D: conventional, canonical, and differential]
For lecture:
  For D = 1/2, $q_{j+1} = 0$ and 1 are equal in length (conventional recoding of the quotient)
  For D = 3/4, $q_{j+1} = 1$, 0, and -1 are equal in length, so a 1 or -1 will be followed immediately by a zero (canonical recoding)
  For D = 1, $q_{j+1} = 1$ and -1 are equal in length and longer than $q_{j+1} = 0$, so a 1 (or -1) will be followed by a -1 (or 1), perhaps after some intervening zeros (differential recoding)

Binary SRT Divider
[Figure: block diagram with n-bit divisor register D, n-bit CPA, (partial) remainder register P, dividend/quotient register Q, quotient digit selection, and add/no add/subt control & sequencer]
shift PQ left one bit
subtract/add D from/to P based on the sign of D and P and the msd of P, and store results back in P
  if sign is 1 and P normalized set $q_{i+1}$ to -1 (add D to P)
  if P not normalized set $q_{i+1}$ to 0 (no op on P)
  if sign is 0 and P normalized set $q_{i+1}$ to 1 (subtract D from P)
loop
Quotient digit selection logic needs the sign bit of the divisor and the sign bit and most significant magnitude bit of the spr
How do we "store" a -1 in Q? And how do we convert Q to conventional form?
How about using a CSA and keeping the partial remainder in a stored carry form so that the add time is constant? That complicates quotient digit selection (need more bits). See Figure 14.8 in Parhami.

Binary SRT Division Example
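[Worked example figure; the bit patterns were not preserved in the transcript]
$P_0 = \tfrac{1}{2}P$, $\tfrac{1}{2} \le |2P_0| < 1$, D = 10/16
shift: pos norm so subt and $q_1 = 1$, giving $P_1$
shift: pos norm so subt and $q_2 = 1$, giving $P_2$ (negative)
shift: not norm so $q_3 = 0$, giving $P_3$
shift: not norm so $q_4 = 0$, giving $P_4$
shift: neg norm so add and $q_5 = -1$, giving $P_5$
For lecture: note that normalized in 2's complement is 0.1 for positive and 1.0 for negative D's
Restoring gave Q = [not preserved] with a remainder of 0010
Nonrestoring gave Q = [not preserved] with a remainder of 0010
SRT gives Q = [not preserved] with a remainder of -101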

Another Binary SRT Division Example
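[Worked example figure; the bit patterns were not preserved in the transcript]
$P_0 = \tfrac{1}{2}P$, $\tfrac{1}{2} \le |2P_0| < 1$, D = 3/4
shift: pos norm so subt and $q_1 = 1$, giving $P_1$
shift: not norm so $q_2 = 0$, giving $P_2$
shift: pos norm so subt and $q_3 = 1$, giving $P_3$ (negative)
shift: not norm so $q_4 = 0$, giving $P_4$
shift: neg norm so add and $q_5 = -1$, giving $P_5$
Note that normalized in 2's complement is 0.1 for positive and 1.0 for negative D's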

Speed Comparisons
# add cycles
RESTORING: 3n/2 average case (guess is wrong 1/2 the time); 2n-1 worst case (guess wrong every time); n best case (never guess wrong)
NONRESTORING: n
SRT: n worst case (for D = 1/2 and conventional encoding of Q, and all 1's); n/3 best case (for D = 3/4 and canonical encoding of Q)
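To make the cycle counts concrete, the sketch below counts add/subtract cycles for one pair of operands under the restoring and binary SRT rules. Both function names are hypothetical, positive `Fraction` operands with $P < D$ are assumed, and a single example says nothing about the average case.

```python
from fractions import Fraction

def restoring_cycles(p, d, n):
    """Add/subtract cycles used by binary restoring division (sketch)."""
    cycles = 0
    for _ in range(n):
        trial = 2 * p - d
        cycles += 1                    # trial subtract
        if trial < 0:
            cycles += 1                # restoring add
            p = trial + d
        else:
            p = trial
    return cycles

def srt2_cycles(p, d, n):
    """Add/subtract cycles used by binary SRT division (sketch)."""
    half = Fraction(1, 2)
    cycles = 0
    for _ in range(n):
        shifted = 2 * p
        q = 1 if shifted >= half else (-1 if shifted <= -half else 0)
        cycles += abs(q)               # a zero digit needs no add
        p = shifted - q * d
    return cycles

p, d, n = Fraction(7, 16), Fraction(10, 16), 4
print(restoring_cycles(p, d, n), n, srt2_cycles(p, d, n))
```

For P = 7/16, D = 10/16 and n = 4 the counts are 5 for restoring, 4 for nonrestoring (always n), and 3 for binary SRT.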

Base 4 SRT Division
$|P_{i+1}| \le k|D|$ where $\tfrac{1}{2} < k \le 1$
Initial conditions: $\tfrac{1}{2} \le |D| < 1$ and $\tfrac{1}{2} \le |4P_0| < 1$
Basic recursion: $P_{i+1} = 4P_i - q_{i+1} D$
Quotient digit selection: $q_{i+1} \in \{-3, -2, -1, 0, 1, 2, 3\}$ (these values give easy multiples of D)
$k = \rho/(r-1) = 2/3$ or 1
Stop when $4P_i = 0$ or after n/2 iterations
k is the redundancy factor/coefficient, $k = \rho/(r-1)$, where $\rho$ is the maximum digit value in the digit set and $r/2 \le \rho \le r-1$; so for $r = 4$, $\rho = 2$ and $k = 2/3$, or $\rho = 3$ and $k = 1$

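Base 4 (k=2/3) SRT PD Plot
$P_{i+1} = 4P_i - q_{i+1} D$ and $|P_{i+1}| \le k|D|$, so $-kD \le 4P_i - jD \le kD$ where $j = -2, -1, 0, 1, 2$
so $(-k+j)D \le 4P_i \le (k+j)D$ and $k = 2/3$
  $q_{i+1} = 2$ and $\tfrac{4}{3}D \le 4P_i \le \tfrac{8}{3}D$
  $q_{i+1} = 1$ and $\tfrac{1}{3}D \le 4P_i \le \tfrac{5}{3}D$
  $q_{i+1} = 0$ and $-\tfrac{2}{3}D \le 4P_i \le \tfrac{2}{3}D$
  $q_{i+1} = -1$ and $-\tfrac{5}{3}D \le 4P_i \le -\tfrac{1}{3}D$
  $q_{i+1} = -2$ and $-\tfrac{8}{3}D \le 4P_i \le -\tfrac{4}{3}D$
[Figure: PD plot of $4P_i$ versus D (starting at D = 1/2) showing these selection regions]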
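The selection intervals follow mechanically from $(-k+j)D \le rP_i \le (k+j)D$. The Python sketch below (the function names are hypothetical) tabulates the bounds as multiples of D and the width of the overlap between adjacent digits.

```python
from fractions import Fraction

def selection_intervals(r, rho):
    """Bounds on r*P_i, as multiples of D, for which each digit j is valid."""
    k = Fraction(rho, r - 1)                   # redundancy coefficient
    return {j: (j - k, j + k) for j in range(-rho, rho + 1)}

def overlap_width(r, rho):
    """Overlap (in multiples of D) between the regions for digits j-1 and j."""
    bounds = selection_intervals(r, rho)
    return bounds[0][1] - bounds[1][0]         # UP(0) - LO(1) = 2k - 1

for j, (lo, hi) in sorted(selection_intervals(4, 2).items(), reverse=True):
    print(f"q = {j:2d}: {lo} D <= 4P <= {hi} D")
print("overlap:", overlap_width(4, 2), "D")
```

For base 4 with $\rho = 2$ the table reproduces the bounds 4/3 to 8/3, 1/3 to 5/3, and so on, and the overlap between neighbouring regions is D/3; with $\rho = 3$ (k = 1) the overlap widens to a full D.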

Base 4 (k=2/3) SRT PD Plot
$P_{i+1} = 4P_i - q_{i+1} D$ so $(-k+j)D \le 4P_i \le (k+j)D$ with $k = 2/3$ and $j = -2, -1, 0, 1, 2$
[Figure: the same PD plot of $4P_i$ versus D with the overlap regions between adjacent $q_{i+1}$ selections highlighted]

Base 4 (k=1) SRT PD Plot
$P_{i+1} = 4P_i - q_{i+1} D$ and $|P_{i+1}| \le k|D|$, so $-kD \le 4P_i - jD \le kD$ where $j = -3, -2, -1, 0, 1, 2, 3$
so $(-k+j)D \le 4P_i \le (k+j)D$ and $k = 1$
  $q_{i+1} = 3$ and $2D \le 4P_i \le 4D$
  $q_{i+1} = 2$ and $D \le 4P_i \le 3D$
  $q_{i+1} = 1$ and $0 \le 4P_i \le 2D$
  $q_{i+1} = 0$ and $-D \le 4P_i \le D$
  $q_{i+1} = -1$ and $0 \ge 4P_i \ge -2D$
  $q_{i+1} = -2$ and $-D \ge 4P_i \ge -3D$
  $q_{i+1} = -3$ and $-2D \ge 4P_i \ge -4D$
[Figure: PD plot of $4P_i$ versus D (starting at D = 1/2) showing these selection regions]

Base 4 (k=1) SRT PD Plot
$P_{i+1} = 4P_i - q_{i+1} D$ so $(-k+j)D \le 4P_i \le (k+j)D$ with $k = 1$ and $j = -3, -2, -1, 0, 1, 2, 3$
[Figure: the same PD plot of $4P_i$ versus D with the overlap regions between adjacent $q_{i+1}$ selections highlighted]

Goals for QDS
Must determine the breakpoints spanning the overlap regions such that we
  have to inspect the fewest digits of $rP_i$ and D (this determines the number of inputs into the quotient digit selection (QDS) PLA, or address bits for the QDS ROM)
  have the simplest and fewest comparison constants that can be expressed in the above precision (this determines the size of the QDS PLA)

Base 4 (k=1) SRT PD ROM
[Figure: PD plot grid with $\Delta rP_i = 1/4$ (2+2 bits, $4P_i$ from 00.01 to 11.11) and $\Delta D = 1/8$ (3 bits, D from 0.100 to 0.111), showing the regions $q_{i+1} = 0, 1, 2, 3$]
If we ignore encoding the red stippled rectangles then we end up with Pentium bugs, since part of the rectangle area is in the legal region even though the encoded point (lower left corner) is not in the legal region!

[Figure: the same plot with $4P_i$ kept to one fractional bit (00.1 to 11.1) and $\Delta D = 1/8$ (3 bits)]
Easy to see that we were overly conservative with the number of bits retained for $4P_i$

[Figure: the same plot with $4P_i$ from 00.1 to 11.1 and $\Delta D = 1/4$ (2 bits, D = 0.10 and 0.11)]
Source of Pentium divider bug (points were not encoded in QDS table)
AND we were overly conservative with the number of bits retained for D

Sufficient Precision
Define the precision of D as the # of ms fractional bits of D that are kept: D = s . d d d d d d d
  $\Delta D < 2^{-(\text{bits of } D)}$ is the maximum error incurred by truncating D after that many fractional bits (always additive)
Define the precision of $rP_i$ as the # of ms fractional bits of $rP_i$ that are kept: $rP_i$ = s d d . d d d d d d d
  $\Delta rP_i < 2^{-(\text{bits of } rP_i)}$ is the maximum error incurred by truncating $rP_i$ after that many fractional bits (always additive)
So our first task is to determine these two precisions, $\Delta D$ and $\Delta rP_i$

How to Determine $\Delta D$ and $\Delta rP_i$
If the breakpoints ("steps") can be constructed in the worst case overlap region (the one requiring the most precision for selection) with precision $\Delta D$ and $\Delta rP_i$, then it is guaranteed that steps can be determined in the remaining overlap regions
The worst case is the steepest and narrowest portion of the PD plot: at D near ½ and where $q_{i+1}$ is either $\rho$ or $\rho-1$

Worst Case Overlap Region
[Figure: overlap region of the PD plot near D = 1/2 between the lines LO($\rho$): $rP_i = (-k+\rho)D$ and UP($\rho-1$): $rP_i = (k+\rho-1)D$, with the step region marked by points A, B, and C; all points in the box truncate to point C; above the breakpoint choose $\rho$]
Consider values for the truncated $rP_i$ and D that result in a quotient digit selection of $\rho$. Must guarantee that the digits ignored in $rP_i$ and D do not throw us out of the valid region for $\rho$ (as in the red box).

Defining the Break Points
$\Delta D$ and $\Delta rP_i$ define a grid of all possible "steps"
If one tread spans the entire region (as in binary SRT) then we only need to know the sign of the normalized D
May be good to optimize risers across all overlap regions to have the fewest constants to store in the PLA
Remember: steps must be representable in the precision $\Delta D$ and $\Delta rP_i$
[Figure: staircase of steps and risers, labeled "choose upper digit for grid truncating to here" and "choose lower digit for grid truncating to here"]

Starting the 1st Step
Must be able to start the 1st step (point C) between UP($\rho-1$) and LO($\rho$) at D = ½. The "distance" between them is
$(k + \rho - 1) \cdot \tfrac{1}{2} - (-k + \rho) \cdot \tfrac{1}{2} = k - \tfrac{1}{2}$
This determines the smallest value for $\Delta rP_i$ and defines the minimum precision of the steps.
Determine where the step(s) are in the overlap region and choose the "highest" step (the one closest to UP($\rho-1$)). (Keeping more bits of $rP_i$ will give more steps in the overlap region and allow you to choose a higher step, resulting in fewer bits of D.)

Starting the 1st Step Example #1
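Base 4 and k = 1
Must be able to start the 1st step between UP(2) and LO(3) at D = ½
$(k + \rho - 1) \cdot \tfrac{1}{2} - (-k + \rho) \cdot \tfrac{1}{2} = k - \tfrac{1}{2}$
$(1 + 3 - 1) \cdot \tfrac{1}{2} - (-1 + 3) \cdot \tfrac{1}{2} = \tfrac{1}{2}$
Thus, the smallest value for $\Delta rP_i$ is ½, as is the minimum precision of the steps.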
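The width of the overlap at D = ½, and the coarsest power-of-two step that still fits inside it, can be computed directly. This is a sketch under assumptions: the name `first_step_precision` is hypothetical, and "fits" is taken to mean a step size no larger than the overlap width.

```python
from fractions import Fraction

def first_step_precision(r, rho):
    """Overlap width between UP(rho-1) and LO(rho) at D = 1/2, plus the
    largest power-of-two step size that does not exceed that width."""
    k = Fraction(rho, r - 1)
    d = Fraction(1, 2)
    up = (k + rho - 1) * d          # UP(rho-1) evaluated at D = 1/2
    lo = (-k + rho) * d             # LO(rho) evaluated at D = 1/2
    width = up - lo                 # algebraically k - 1/2
    step = Fraction(1)
    while step > width:
        step /= 2
    return width, step

print(first_step_precision(4, 3))   # k = 1
print(first_step_precision(4, 2))   # k = 2/3
```

For base 4 with k = 1 the width and the step are both 1/2; with k = 2/3 the width is 1/6, so the step precision drops to 1/8.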

Stopping the 1st Step
Must be able to stop the 1st step and start the 1st riser (point B) before crossing LO($\rho$).
1st step (in $rP_i$ precision) $\ge (-k + \rho)(\tfrac{1}{2} + \Delta D)$
This determines the smallest value for $\Delta D$ and defines the minimum precision of the risers.
Determine where the riser(s) are in the overlap region and choose the "rightmost" riser (the one closest to LO($\rho$)).

Stopping the 1st Step Example #1
Must be able to stop the 1st step and start the 1st riser before crossing LO(3)
1st step (in $rP_i$ precision) $\ge (-k + \rho)(\tfrac{1}{2} + \Delta D)$
$1.5 \ge (-1 + 3)(\tfrac{1}{2} + \Delta D) = 1 + 2\Delta D$, so $\tfrac{1}{2} \ge 2\Delta D$
Thus, the smallest value for $\Delta D$ is ¼, as is the minimum precision of the risers.

Stopping the 1st Riser
Must be able to stop the 1st riser (point A) before crossing UP($\rho-1$). Risers must stop at step precisions determined by $\Delta rP_i$.
2nd step (in $rP_i$ precision) $\le (k + \rho - 1)$(1st riser (in D precision))
Once again start the 2nd step as high in the overlap region as possible.
If the 1st step, riser pair did not start high enough and stop right enough, it could be that the 1st riser cannot be stopped within the overlap region, in which case the precision of $rP_i$ or of D (or both) must be increased.

Base 4 (k=1) SRT PD PLA
[Figure: PD plot grid with $\Delta rP_i = 1/2$ (2+1 bits, $4P_i$ from 00.1 to 11.1) and $\Delta D = 1/4$ (2 bits, D = 0.10 and 0.11), showing the regions $q_{i+1} = 0, 1, 2, 3$ with truncation errors $\Delta rP_i < ½$ and $\Delta D < 1/4$]
AND we were overly conservative with the number of bits retained for D

Starting the 1st Step Example #2
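Base 4 and k = 2/3
Must be able to start the 1st step between UP(1) and LO(2) at D = ½
$(k + \rho - 1) \cdot \tfrac{1}{2} - (-k + \rho) \cdot \tfrac{1}{2} = k - \tfrac{1}{2}$
$(\tfrac{2}{3} + 2 - 1) \cdot \tfrac{1}{2} - (-\tfrac{2}{3} + 2) \cdot \tfrac{1}{2} = \tfrac{2}{3} - \tfrac{1}{2} = \tfrac{1}{6}$
Thus, the minimum value for $\Delta rP_i$ is 1/6, so the minimum precision of the steps is 1/8.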

Stopping the 1st Step Example #2
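Must be able to stop the 1st step and start the 1st riser before crossing LO(2). This distance determines the minimum $\Delta D$.
1st step (in $rP_i$ precision) $\ge (-k + \rho)(\tfrac{1}{2} + \Delta D)$
$\tfrac{3}{4} \ge (-\tfrac{2}{3} + 2)(\tfrac{1}{2} + \Delta D) = \tfrac{4}{6} + \tfrac{4}{3}\Delta D$
$\tfrac{1}{12} \ge \tfrac{4}{3}\Delta D$, so $\tfrac{1}{16} \ge \Delta D$
Thus, the minimum value for $\Delta D$ is 1/16, as is the minimum precision of the risers.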

Base 4 (k=2/3) SRT PD PLA
[Figure: PD plot grid with $\Delta rP_i = 1/8$ (2+3 bits, $4P_i$ from 00.001 to 10.110) and $\Delta D = 1/16$ (4 bits, D from 0.1000), showing the regions $q_{i+1} = 0, 1, 2$ with truncation errors $\Delta rP_i < 1/8$ and $\Delta D < 1/16$]
Pentium divider = $2^8$ ROM entries per quadrant (1024 in all)
5 bits of $rP_i$ leaves 3 bits for D (not counting the sign, which tells which quadrant)

Radix 4 SRT Divider
[Figure: block diagram with (n+2)-bit divisor register supplying D and !D, (n+2)-bit CPA, (partial) remainder register P, dividend/quotient register Q, selection among the multiples (-2D, -1D, 0, 1D, 2D), quotient digit selection fed by inputs labeled "+3 bits of P" and "+1 bits of D", and add/no add/subt control & sequencer]
Need an n+2 bit adder to handle -2D multiples (1-shift and 1-negative)
Sign extend during right shift
Shift P || Q left 2 bits each iteration
Which is cheaper, if you want a scheme that is parallel, in terms of logic, speed, power?
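The radix-4 recursion with the digit set {-2, ..., 2} can be sketched as below. Note the simplification: instead of a PLA or ROM lookup on truncated operands, this hypothetical `srt4_divide` selects each digit by a full-precision rounding of $4P_i/D$, which is an assumption made only to keep the example short.

```python
from fractions import Fraction

def srt4_divide(p, d, n_digits):
    """Radix-4 SRT recursion P_{i+1} = 4*P_i - q_{i+1}*D, digits in {-2..2}.

    Assumes d > 0 and |p| <= (2/3)*d so each partial remainder stays
    within the k = 2/3 bound. Returns (digits, final partial remainder).
    """
    digits = []
    for _ in range(n_digits):
        shifted = 4 * p                            # shift left two bits
        # Full-precision selection (sketch only): nearest integer to 4P/D,
        # clamped to the digit set. Hardware would use a small QDS table.
        q = max(-2, min(2, round(shifted / d)))
        digits.append(q)
        p = shifted - q * d                        # add/subtract a multiple of D
    return digits, p

digits, rem = srt4_divide(Fraction(7, 64), Fraction(10, 16), 4)
value = sum(Fraction(q, 4 ** (i + 1)) for i, q in enumerate(digits))
print(digits, rem, value)
```

With $P_0$ = 7/64 and D = 10/16 the sketch retires two quotient bits per iteration and produces the radix-4 digits 1, -1, -1, 1 (value 45/256) with a final partial remainder of -2/16, so that $Q \cdot D + 4^{-4} P_4$ reproduces the dividend.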

Speed Comparisons
# add cycles
RESTORING: 2n-1 to n
NONRESTORING: n
Binary SRT: n to n/3
Base 4 SRT: n/2
Restoring: 3n/2 when the guess is wrong 1/2 the time; 2n-1 worst case (guess wrong every time); n best case (never guess wrong)
SRT: n for D = 1/2 and conventional encoding of Q (and all 1's); n/3 for D = 3/4 and canonical encoding of Q

Division Review
Choose r, $\rho$, and hence k
Make sure $|P_0| < |D|$, either by ensuring that $\tfrac{1}{2} \le |D| < 1$ and $\tfrac{1}{2} \le |rP_0| < 1$, or that $1 \le |D| < 2$ and $\tfrac{1}{2} \le |P_0| < 1$
Construct the PD plot
Decide on adder type (CPA or CSA, or upper part CPA and lower part CSA)
  Using a CSA complicates QDS: have to have both sum and carry information as table inputs, and allow for one more digit of each to allow for a carry that would have propagated
Determine $\Delta D$ and $\Delta rP_i$ and construct the breakpoints in all overlap regions in that precision
Determine how to form the multiples of D
Determine how to deal with special cases (e.g., D = 0, $P_0 = 0$, or $4P_i = 0$)
Convert Q to conventional format (or convert "on-the-fly")

Key References
Atkins, Higher-radix division using estimates of the divisor and partial remainders, IEEE Trans. Computers, 17(10), 1968.
Coe, Tang, It takes six ones to reach a flaw, Proc. 12th Symp. Computer Arithmetic, July (Pentium divider bug paper).
Ercegovac, Lang, On-the-fly conversion of redundant into conventional representations, IEEE Trans. Computers, 36(7), 1987.
Oberman, Flynn, Division algorithms and implementations, IEEE Trans. Computers, 46(8), 1997.
Parhami, Computer Arithmetic, Oxford Univ. Press, 1999.
Robertson, A new class of digital division methods, IRE Trans. Electronic Computers, 7, Sept 1958.
Taylor, Radix-16 SRT dividers with overlapped quotient digit selection stages, Proc. 7th Symp. Computer Arithmetic, 1985.
Tocher, Techniques of multiplication and division for automatic binary computers, Quarterly J. Mech. and Applied Math, 11(3), 1958.
