CSE 246: Computer Arithmetic Algorithms and Hardware Design Instructor: Prof. Chung-Kuan Cheng Winter 2004 Lecture 10 Thursday 02/19/04
Topics: Rounding F.P. numbers; Ch. 11 (all)
Rounding the numbers: why we need the sticky bit, the round bit, and the guard bit
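Example: (significand) x 2^-3. Normalize according to exponent: (significand) x 2^4. Renormalize: (significand) x 2^3. Result = (significand) x 2^3. Take 5 bits after the binary point; the round bit and the sticky bit follow them.
Example: (significand) x 2^-1. Normalize according to exponent: (significand) x 2^3. Renormalize: (significand) x 2^2. Result = (significand) x 2^2. Take 5 bits after the binary point; the round bit is the bit on the boundary. Non-zero => round up.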
Theory behind it
Below the retained bits, keep a guard bit (g), a round bit (r), and a sticky bit, which is the OR of all the other bits shifted out.
When shifting right, we don't need to remember anything more than these 3 bits below the result.
This is a necessary and sufficient condition.
The most we ever normalize is by just 1 bit after a subtraction, since all numbers are exponent-normalized before the operation.
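The guard/round/sticky idea above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the significand is modeled as a plain integer, and round-to-nearest with ties-to-even is an assumed rounding mode (the slides only say that a non-zero remainder rounds up).

```python
def shift_right_grs(sig: int, shift: int) -> int:
    """Shift `sig` right by `shift` bits, appending guard, round and sticky bits."""
    ext = sig << 2                    # make room for the guard and round positions
    kept = ext >> shift               # retained bits followed by g and r
    lost = ext & ((1 << shift) - 1)   # everything shifted out below r
    sticky = 1 if lost else 0         # OR of all the other bits
    return (kept << 1) | sticky       # layout: retained bits, g, r, sticky


def round_nearest_even(ext: int) -> int:
    """Drop the 3 extra bits using round-to-nearest, ties-to-even (assumed mode)."""
    main, grs = ext >> 3, ext & 0b111
    if grs > 0b100 or (grs == 0b100 and main & 1):
        main += 1                     # above the halfway point, or a tie with odd LSB
    return main
```

The sticky bit is what separates an exact tie (g r s = 100) from a value just above the tie (101): in the second case the result must round up even though every bit that was examined individually looks the same.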
Chapter 11: Polynomial Approximation of Functions
Taylor Series
f(x) = f(x_0) + f'(x_0)(x - x_0) + f''(x_0)(x - x_0)^2/2! + …
Example: sin(x) = x – x^3/3! + x^5/5! – x^7/7! + …
Taylor Series
How to calculate the value of the function?
Given: P_N(x) = c_0 + c_1 x + c_2 x^2 + … + c_N x^N
Group common factors: P_N(x) = c_0 + x(c_1 + x(c_2 + … + x(c_{N-1} + x c_N)…))
Recursively: R(N) = c_N, R(i-1) = c_{i-1} + x R(i), …, P_N(x) = R(0)
=> N multiplies and adds
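The recursion R(i-1) = c_{i-1} + x R(i) maps directly onto a loop. The sketch below is an illustration only: the name `horner` and the coefficient list `SIN_COEFFS` (the sin(x) series truncated after x^7) are introduced here and are not part of the slides.

```python
import math


def horner(coeffs, x):
    """Evaluate c_0 + c_1 x + ... + c_N x^N, with coeffs = [c_0, ..., c_N]."""
    r = coeffs[-1]                   # R(N) = c_N
    for c in reversed(coeffs[:-1]):
        r = c + x * r                # R(i-1) = c_{i-1} + x R(i)
    return r                         # P_N(x) = R(0)


# sin(x) ~= x - x^3/3! + x^5/5! - x^7/7!  (hypothetical example data)
SIN_COEFFS = [0, 1, 0, -1 / math.factorial(3),
              0, 1 / math.factorial(5), 0, -1 / math.factorial(7)]
```

Each loop iteration is one multiply and one add, so a degree-N polynomial costs N of each, but every step depends on the previous one and nothing can run in parallel.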
Taylor Series
1 adder => do it in series
Given more components => can we go faster?
Take N = 7 as an example:
c_7 x^7 + c_6 x^6 + c_5 x^5 + c_4 x^4 + c_3 x^3 + c_2 x^2 + c_1 x + c_0
How to accelerate?
Taylor Series
c_7 x^7 + c_6 x^6 + c_5 x^5 + c_4 x^4 + c_3 x^3 + c_2 x^2 + c_1 x + c_0
[Figure: multipliers generate x, x^2, x^3, …, x^7 and the terms are summed. Labels: carry-save = constant time; log n]
But this is not much better. Still have the overhead of 3 stages to generate x^7.
Taylor Series
c_7 x + c_6, c_5 x + c_4, c_3 x + c_2, c_1 x + c_0
x^2 (c_7 x + c_6) + c_5 x + c_4, x^2 (c_3 x + c_2) + c_1 x + c_0
x^4 [x^2 (c_7 x + c_6) + c_5 x + c_4] + x^2 (c_3 x + c_2) + c_1 x + c_0
[Figure: powers needed are x, x^2, x^4]
This is a bit faster. Only 2 stages.
But what is the fastest way to produce the result? And the most energy efficient? => minimize [# of multiplies]
All this uses +'s and ×'s. Need to get rid of them. => Let's try table look-up.
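The pairwise grouping above is commonly known as Estrin's scheme. A minimal sketch for the N = 7 case follows; the function name `estrin7` and the intermediate names are hypothetical, and ordinary Python arithmetic stands in for the hardware multipliers and adders.

```python
def estrin7(c, x):
    """Evaluate a degree-7 polynomial with c = [c_0, ..., c_7] by pairwise grouping."""
    # Powers x^2 and x^4 take 2 multiplier stages and do not depend on the coefficients.
    x2 = x * x
    x4 = x2 * x2
    # First level: four independent linear terms.
    p0 = c[1] * x + c[0]
    p1 = c[3] * x + c[2]
    p2 = c[5] * x + c[4]
    p3 = c[7] * x + c[6]
    # Second level: two independent combinations using x^2.
    q0 = x2 * p1 + p0
    q1 = x2 * p3 + p2
    # Final level: combine the halves using x^4.
    return x4 * q1 + q0
```

All operations within a level are independent, so with enough multipliers the depth grows with log N rather than N. The price is more multiplications in total than the serial recursion, which matters for the energy question raised on the slide.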
Taylor Series – Table look-up
SRAM/DRAM => eat power
ROM => better option
Suppose there is a table as a binary tree.
Let x = x_H + x_L and x_0 = x_H
Example: split X into x_H and x_L, so that f(X) = f(x_H + x_L)
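Taylor Series – Table look-up
1st order: f(x_H + x_L) ~= f(x_H) + x_L f'(x_H)
=> Only 1 multiplication !!!
[Figure: x is split into x_H and x_L; Table-1 maps x_H to f(x_H) and Table-2 maps x_H to f'(x_H); a multiplier forms x_L f'(x_H) and an adder produces f(x_H + x_L)]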
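A small software model of the two-table, one-multiplier datapath may help. Everything specific here is an assumption made for illustration: f = sin, inputs in [0, 1), `K` = 8 address bits for x_H, and floating-point table entries instead of fixed-width ROM words.

```python
import math

K = 8                       # number of high-order bits used as the table address (assumed)
ENTRIES = 2 ** K

# Table-1 holds f(x_H), Table-2 holds f'(x_H), for f = sin (example function).
TABLE_F = [math.sin(i / ENTRIES) for i in range(ENTRIES)]
TABLE_DF = [math.cos(i / ENTRIES) for i in range(ENTRIES)]


def lookup_first_order(x):
    """Approximate sin(x) for 0 <= x < 1 as f(x_H) + x_L * f'(x_H)."""
    idx = int(x * ENTRIES)          # address bits: the top K bits of x
    x_l = x - idx / ENTRIES         # remaining low-order part of x
    return TABLE_F[idx] + x_l * TABLE_DF[idx]   # one multiply, one add
```

With x_L below 2^-K, the neglected second-order term is below 2^-(2K+1) for this function. Changing the function only means regenerating the two tables.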
Taylor Series
With each extra order => 1 extra table and 1 multiplier
If you wish to change the function, all you have to do is change the content of the table
Problem? => Now it's the size of the table! An L-bit address needs 2^L entries.
Taylor Series
Let's reduce x into 3 sections (instead of the previous 2, High and Low):
x = x_1 + x_2 2^-k + x_3 2^-2k
=> f(x) = f(x_1 + x_2 2^-k) + x_3 2^-2k f'(x_1) + E, with E ~= 2^-3k
A direct table for f(x) requires 2^n x v bits (n: # of bits of x, v: # of bits of f(x))
32-bit x with 32-bit f(x) => 2^32 x 32 = 2^37 bits -> HUGE!!
-> but do we really need all those #'s in the table??
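One way to read the three-section formula is as two smaller tables and an adder: one table addressed by (x_1, x_2), and one addressed by (x_1, x_3) that stores the product x_3 2^-2k f'(x_1) directly. That organization is an assumption here, as are f = sin, `K` = 4 bits per section, and floating-point entries.

```python
import math

K = 4                         # bits per section, so x has 3K bits (assumed)
SCALE = 2 ** K

# Table A: f(x_1 + x_2 2^-k), addressed by the top 2K bits of x.
TABLE_A = [math.sin(i / SCALE ** 2) for i in range(SCALE ** 2)]
# Table B: x_3 2^-2k f'(x_1), addressed by x_1 and x_3 (also 2K address bits).
TABLE_B = [[(i3 / SCALE ** 3) * math.cos(i1 / SCALE) for i3 in range(SCALE)]
           for i1 in range(SCALE)]


def three_section_sin(i):
    """Approximate sin(i / 2^(3K)) for a 3K-bit integer i: two look-ups, one add."""
    i1 = i >> (2 * K)          # section x_1 (top K bits)
    i3 = i & (SCALE - 1)       # section x_3 (bottom K bits)
    return TABLE_A[i >> K] + TABLE_B[i1][i3]
```

For K = 4 the two tables hold 2 x 2^8 = 512 entries in total, against 2^12 = 4096 entries for a direct table, and the error stays on the order of 2^-3k as the slide states.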
Taylor Series
Let E = epsilon (the bit dropped by the halving, 0 or 1), [] = floor (lower limit)
x*y = (x+y)^2/4 – (x-y)^2/4
= ([(x+y)/2] + E/2)^2 – ([(x-y)/2] + E/2)^2
= [(x+y)/2]^2 – [(x-y)/2]^2 + E*y
[Figure: x^2 table] The content of the lower bits of x determines the lower bits of the result, but not the other bits!!
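The identity can be checked exhaustively for small operands. The sketch below assumes unsigned integers of `N_BITS` bits and a hypothetical squaring table `SQ`; E is the least significant bit of x + y (which equals that of x - y), and the operands are ordered so that x - y is non-negative.

```python
N_BITS = 6                                   # operand width (assumed)
SQ = [i * i for i in range(2 ** N_BITS)]     # table of squares, 2^N_BITS entries


def mul_by_squares(x, y):
    """Multiply two N_BITS-bit unsigned integers with two table look-ups."""
    if x < y:
        x, y = y, x                          # keep x - y >= 0
    s, d = x + y, x - y
    e = s & 1                                # E: the bit dropped by the floor
    # [(x+y)/2]^2 - [(x-y)/2]^2 + E*y ; E*y is just y gated by one bit.
    return SQ[s >> 1] - SQ[d >> 1] + e * y
```

For example, x = 3 and y = 2 give [5/2]^2 - [1/2]^2 + 1*2 = 4 - 0 + 2 = 6. The correction term needs no multiplier because E is a single bit.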
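Taylor Series
Table size: 2^n x v vs. 2^n x (v-w) + 2^L x w
= 2^n x v – (2^n x w – 2^L x w)
= 2^n x v – w (2^n – 2^L)
Size of table is reduced by w (2^n – 2^L) bits
[Figure: a single table with n-bit input x and v-bit output f(x), 2^n x v bits, versus two tables: 2^n x (v-w) bits with n-bit input and (v-w)-bit output, and 2^L x w bits with L-bit input and w-bit output]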
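The size arithmetic can be tried out on a squaring table. The parameters below are assumptions chosen for illustration: n = 8 input bits, v = 16 output bits, and w = L = 4, which works for squaring because the low L bits of x^2 depend only on the low L bits of x.

```python
N, V = 8, 16          # input and output widths of the full table (assumed)
L, W = 4, 4           # low table: L address bits, W output bits (assumed W = L)

HIGH = [(x * x) >> W for x in range(2 ** N)]             # upper v-w bits, full address
LOW = [(x * x) & (2 ** W - 1) for x in range(2 ** L)]    # lower w bits, L-bit address


def square(x):
    """Reassemble x^2 from the upper-bits table and the small lower-bits table."""
    return (HIGH[x] << W) | LOW[x & (2 ** L - 1)]


full_bits = 2 ** N * V                        # 2^n x v
split_bits = 2 ** N * (V - W) + 2 ** L * W    # 2^n x (v-w) + 2^L x w
saved_bits = W * (2 ** N - 2 ** L)            # w (2^n - 2^L)
```

With these numbers the full table is 4096 bits, the split version is 3136 bits, and the difference of 960 bits matches w (2^n – 2^L).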
End of Ch. 11
Some parts of Ch. 11 (e.g. log) will be covered as part of the Ch. 12 discussion