EE 5324 – VLSI Design II (Spring 2006)
Kia Bazargan, University of Minnesota
Part VII: Floating Point Arithmetic
Floating-Point vs. Fixed-Point Numbers
- Fixed point has limitations:
  - x = 0000 0000.0000 1001₂
  - y = 1001 0000.0000 0000₂
  - Rounding? Overflow? (x^2 underflows, y^2 overflows)
- Floating point: represent numbers in two fixed-width fields, "magnitude" and "exponent"
  - Magnitude: more bits = more accuracy
  - Exponent: more bits = wider range of numbers
  - Stored as fields s | e | m, interpreted as X = ± Magnitude × 2^Exponent
Floating Point Number Representation
- Sign field: 0 means positive, 1 means negative
- Exponent: usually represented as unsigned by adding an offset
  - Example: 4 bits of exponent, offset = 8
    - Exp = 1001₂ → e = 1001₂ - 1000₂ = 0001₂ = 1
    - Exp = 0010₂ → e = 0010₂ - 1000₂ = 1010₂ = -6
- Magnitude (also called significand or mantissa)
  - Shift the number into the form 1.xxxx
  - The stored magnitude is the fractional part (the leading 1 is the "hidden 1")
  - Example: 6 bits of mantissa
    - Number = 110.0101 → shift → 1.100101 → mantissa = 100101
    - Number = 0.0001011 → shift → 1.011 → mantissa = 011000
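To make the field packing concrete, here is a minimal C sketch that encodes a value into the example format above (4 exponent bits with offset 8, 6 mantissa bits, hidden 1). The toy_encode name and the lack of rounding and special-case handling are simplifications for illustration, not part of the slides.

```c
#include <stdio.h>
#include <stdint.h>

/* Toy format from the example above: 1 sign bit, 4 exponent bits
 * (offset = 8), 6 mantissa bits with a hidden leading 1.
 * No rounding or special-case handling; assumes x != 0.          */
#define EXP_BITS 4
#define EXP_BIAS 8
#define MAN_BITS 6

uint16_t toy_encode(double x) {
    uint16_t sign = (x < 0);
    if (sign) x = -x;

    int e = 0;                               /* normalize to 1.xxxx * 2^e   */
    while (x >= 2.0) { x /= 2.0; e++; }
    while (x <  1.0) { x *= 2.0; e--; }

    uint16_t exp = (uint16_t)(e + EXP_BIAS);                 /* biased exponent */
    uint16_t man = (uint16_t)((x - 1.0) * (1 << MAN_BITS));  /* drop hidden 1   */
    return (sign << (EXP_BITS + MAN_BITS)) | (exp << MAN_BITS) | man;
}

int main(void) {
    /* 110.0101_2 = 6.3125 -> 1.100101 * 2^2: exp = 2+8 = 1010_2, mantissa 100101 */
    printf("0x%03X\n", toy_encode(6.3125));  /* prints 0x2A5 = 0 1010 100101 */
    return 0;
}
```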
Floating Point Numbers: Example
X = ± 1.m × 2^e, stored as fields s | e (+bias) | m
- X1 = + 1.0011101 × 2^2
- X2 = + 1.1000000 × 2^-6
- X3 = - 1.0000001 × 2^3
- X4 = + 1.0000000 × 2^-8 = 0
- X5 = + 1.0000000 × 2^7 = +∞
(The extreme exponent values are used to encode the special values 0 and +∞.)
Floating Point Number Range
[Figure: the real line showing the representable ranges [-max, -min] and [min, max]; values near ±min are denser, values near ±max are sparser; the gaps around 0 and beyond ±max are the underflow and overflow regions] [© Oxford U Press]
- Range: [-max, -min] ∪ [min, max]
- min = smallest magnitude × 2^(smallest exponent)
- max = largest magnitude × 2^(largest exponent)
- What happens if we increase the number of bits for the exponent? For the magnitude?
Ref: http://steve.hollasch.net/cgindex/coding/ieeefloat.html
     ftp://download.intel.com/technology/itj/q41999/pdf/ia64fpbf.pdf
Floating Point Operations
- Addition/subtraction, multiplication/division, function evaluations, ...
- Basic operations:
  - Adding exponents / magnitudes
  - Multiplying magnitudes
  - Aligning magnitudes (shifting, adjusting the exponent)
  - Rounding
  - Checking for overflow/underflow
  - Normalization (shifting, adjusting the exponent)
Floating Point Addition
- More difficult than multiplication!
- Operations:
  - Align magnitudes (so that the exponents are equal)
  - Add (and round)
  - Normalize (result in the form 1.xxx)
- Example:
    x         = + 1.0011101 × 2^3
    y         = + 1.1010011 × 2^0
    y aligned = + 0.0011010 × 2^3   (shift right by 3)
    x + y     = + 1.0110111 × 2^3   (no need to normalize in this case)
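The add path above can be sketched in a few lines of C on the same operands (positive inputs, hidden 1 plus 7 fraction bits, truncation instead of rounding); fp_add and the toyfp struct are illustrative names, not the slide's adder architecture.

```c
#include <stdio.h>
#include <stdint.h>

/* Minimal sketch of the addition steps: align, add magnitudes, normalize.
 * Magnitudes carry the hidden 1 and 7 fraction bits (value = mag * 2^(exp-7)).
 * Positive operands only; truncation instead of rounding.                   */
typedef struct { int exp; uint32_t mag; } toyfp;

toyfp fp_add(toyfp x, toyfp y) {
    if (x.exp < y.exp) { toyfp t = x; x = y; y = t; }  /* x has the larger exp */
    uint32_t ymag = y.mag >> (x.exp - y.exp);          /* align (truncating)   */
    toyfp r = { x.exp, x.mag + ymag };                 /* add magnitudes       */
    if (r.mag >> 8) {                                  /* carry-out: 1-bit     */
        r.mag >>= 1;                                   /* right shift and      */
        r.exp += 1;                                    /* adjust exponent      */
    }
    return r;
}

int main(void) {
    toyfp x = { 3, 0x9D };   /* 1.0011101 * 2^3 */
    toyfp y = { 0, 0xD3 };   /* 1.1010011 * 2^0 */
    toyfp s = fp_add(x, y);
    printf("exp=%d mag=0x%02X\n", s.exp, s.mag);  /* expect exp=3 mag=0xB7 = 1.0110111 */
    return 0;
}
```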
Floating Point Adder Architecture
[Block diagram of the adder datapath: Unpack; Subtract Exponents; Complement/Swap; Align Magnitudes; Add Magnitudes (with Cin/Cout and Sign Logic, +/-); Round/Complement; Normalize (twice, each with exponent adjustment); Pack] [© Oxford U Press]
Floating Point Adder Components
- Unpacking
  - Inserting the "hidden 1"
  - Checking for special inputs (NaN, zero)
- Exponent difference
  - Used in aligning the magnitudes
  - A few bits are enough for the subtraction
    - With a 32-bit magnitude adder and 8 bits of exponent, only 5 bits take part in the subtraction (a larger difference simply shifts the smaller operand out entirely)
  - If the difference is negative, swap the operands and use the positive difference
    - How do we compute the positive difference?
- Pre-shifting and swap
  - Shift/complement hardware is provided for one operand only
  - Swap if needed
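A small C sketch of the exponent-difference step for the numbers quoted above (32-bit magnitude adder, 8-bit exponents); align_shift is a hypothetical helper that returns the clamped shift amount and the swap decision.

```c
#include <stdio.h>

/* Exponent-difference step: compute the right-shift amount for the smaller
 * operand and report whether the operands must be swapped.  Shift amounts of
 * 32 or more all behave the same (the operand is shifted out completely),
 * which is why only a few low bits of the difference actually matter.      */
int align_shift(int e1, int e2, int *swap) {
    int d = e1 - e2;
    *swap = (d < 0);               /* negative difference -> swap operands  */
    if (d < 0) d = -d;             /* use the positive difference           */
    return d >= 32 ? 32 : d;       /* clamp: anything >= 32 is "all out"    */
}

int main(void) {
    int swap;
    printf("%d swap=%d\n", align_shift(130, 127, &swap), swap);  /* 3  swap=0 */
    printf("%d swap=%d\n", align_shift(10, 200, &swap), swap);   /* 32 swap=1 */
    return 0;
}
```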
Floating Point Adder Components (cont.)
- Rounding
  - Three extra bits are used for rounding
- Post-shifting
  - Result is in the range (-4, 4): z = C_out z_1 z_0 . z_-1 z_-2 ...
  - Right shift: 1 bit at most
    - If C_out or z_1 is set, right shift
  - Left shift: up to the number of bits in the magnitude
    - Determine the number of consecutive 0's (or 1's) in z, beginning with z_1
  - Adjust the exponent accordingly
- Packing
  - Check for special results (zero, under-/overflow)
  - Remove the hidden 1
Counting vs. Predicting Leading Zeros/Ones
[Block diagrams: (a) Magnitude Adder → Count Leading 0/1 → shift amount → Post-Shifter, with exponent adjustment; (b) Predict Leading 0/1 operates alongside the Magnitude Adder and feeds the shift amount to the Post-Shifter] [© Oxford U Press]
- Counting: simpler, but the count sits on the critical path
- Predicting: more complex architecture
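A minimal C sketch of the "count" approach: count the leading zeros of the adder result, post-shift left by that amount, and adjust the exponent. The 8-bit width and the normalize name are assumptions for illustration.

```c
#include <stdio.h>
#include <stdint.h>

#define WIDTH 8   /* hidden 1 + 7 fraction bits (illustrative width) */

/* Count the leading zeros of the adder result z, left-shift to restore the
 * 1.xxx form, and adjust the exponent by the same amount.                  */
void normalize(uint32_t z, int exp, uint32_t *mag_out, int *exp_out) {
    int lead = 0;
    while (lead < WIDTH && !((z >> (WIDTH - 1 - lead)) & 1))
        lead++;                    /* count leading 0's from the MSB down   */
    *mag_out = z << lead;          /* post-shift left by the count          */
    *exp_out = exp - lead;         /* adjust the exponent accordingly       */
}

int main(void) {
    uint32_t m; int e;
    normalize(0x13, 3, &m, &e);    /* 0001 0011 -> shift left by 3 */
    printf("mag=0x%02X exp=%d\n", m & 0xFF, e);  /* mag=0x98 exp=0 */
    return 0;
}
```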
Floating Point Multiplication
- Simpler than floating-point addition
- Operation:
  - Inputs: z1 = ± 1.m1 × 2^e1, z2 = ± 1.m2 × 2^e2
  - Output = ± (1.m1 × 1.m2) × 2^(e1+e2)
- Sign: XOR of the input signs
- Exponent:
  - Tentatively computed as e1 + e2
  - Subtract the bias (= 127) — HOW?
  - Adjusted after normalization
- Magnitude:
  - Result is in the range [1, 4) (inputs are in the range [1, 2))
  - Normalization: 1- or 2-bit right shift, depending on rounding
  - The result is 2.(1+m) bits wide and should be rounded to (1+m) bits
  - Rounding can discard bits gradually, instead of in one final stage
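A minimal C sketch of this multiply path, assuming an 8-bit significand (hidden 1 plus 7 fraction bits) and truncation in place of rounding; fp_mul and the field names are illustrative, not the slide's datapath.

```c
#include <stdio.h>
#include <stdint.h>

/* Multiply path: XOR the signs, add the biased exponents and subtract the
 * bias, multiply the significands, then normalize with a 1-bit right shift
 * if the product reached [2, 4).  mag holds 1.m in Q1.7 format.           */
typedef struct { int sign; int bexp; uint32_t mag; } toyfp;

toyfp fp_mul(toyfp a, toyfp b) {
    toyfp r;
    r.sign = a.sign ^ b.sign;              /* sign: XOR                     */
    r.bexp = a.bexp + b.bexp - 127;        /* add exponents, subtract bias  */
    uint32_t p = a.mag * b.mag;            /* Q2.14 product, in [1, 4)      */
    if (p & (1u << 15)) {                  /* product >= 2: normalize       */
        p >>= 1;
        r.bexp += 1;                       /* adjust exponent               */
    }
    r.mag = p >> 7;                        /* truncate back to Q1.7         */
    return r;
}

int main(void) {
    toyfp a = { 0, 129, 0xC0 };            /* +1.1000000 * 2^2  = 6.0   */
    toyfp b = { 0, 126, 0xA0 };            /* +1.0100000 * 2^-1 = 0.625 */
    toyfp r = fp_mul(a, b);                /* expect 3.75 = +1.1110000 * 2^1 */
    printf("sign=%d bexp=%d mag=0x%02X\n", r.sign, r.bexp, r.mag);
    return 0;
}
```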
Floating Point Multiplier Architecture
- Note: pipelining is used within the magnitude multiplier, as well as at block boundaries
[Block diagram: floating-point operands → Unpack → XOR (sign), Add Exponents, Multiply Magnitudes → Normalize → Round → Normalize → Adjust Exponent → Pack → Product] [© Oxford U Press]
Square-Rooting
- The most important elementary function
- Specified as a basic operation in the IEEE standard (alongside +, -, *, /)
- Very similar to division
- Pencil-and-paper method:
  - Radicand: z = z_2k-1 z_2k-2 ... z_1 z_0
  - Square root: q = q_k-1 q_k-2 ... q_1 q_0
  - Remainder (z - q^2): s = s_k s_k-1 s_k-2 ... s_1 s_0   (k+1 digits)
Square Rooting: Example
Example: sqrt(9 52 41)   (decimal digit pairs of z = 95241)
  q(0) = 0
  Step 1: largest q2 with q2^2 <= 9:                        q2 = 3, q(1) = 3;     9 - 9 = 0;         bring down 52
  Step 2: largest q1 with (10×(2×3) + q1) × q1 <= 52:       q1 = 0, q(2) = 30;    52 - 0 = 52;       bring down 41
  Step 3: largest q0 with (10×(2×30) + q0) × q0 <= 5241:    q0 = 8, q(3) = 308;   5241 - 4864 = 377
  Result: q = 308, remainder s = 377
  (At each step, double the partial root, append the trial digit, and multiply by that digit.)
Square Rooting: Example (cont.)
- Why double the partial root?
  - The partial root after step 2 is q(2) = 30
  - Appending the next digit q0 gives 10 × q(2) + q0
  - Its square is 100 × (q(2))^2 + 20 × q(2) × q0 + q0^2
  - The term 100 × (q(2))^2 has already been subtracted
  - So find the largest q0 such that (10 × (2 × q(2)) + q0) × q0 <= partial remainder
- The binary case:
  - The square of 2 × q(2) + q0 is 4 × (q(2))^2 + 4 × q(2) × q0 + q0^2
  - Find q0 such that (4 × q(2) + q0) × q0 <= partial remainder
  - For q0 = 1, the expression becomes 4 × q(2) + 1 (i.e., append "01" to the partial root)
Square Rooting: Example Base 2
Example: sqrt(01110110₂) = sqrt(118₁₀)   (digit pairs: 01 11 01 10)
  q(0) = 0
  Step 1: 01 >= 1?         Yes → q3 = 1, q(1) = 1;     remainder 00;                    bring down 11 → 0011
  Step 2: 0011 >= 101?     No  → q2 = 0, q(2) = 10;    remainder 0011;                  bring down 01 → 01101
  Step 3: 01101 >= 1001?   Yes → q1 = 1, q(3) = 101;   remainder 01101 - 1001 = 0100;   bring down 10 → 010010
  Step 4: 010010 >= 10101? No  → q0 = 0, q(4) = 1010;  remainder 10010
  Result: q = 1010₂ = 10₁₀, s = 10010₂ = 18₁₀
  (Each trial value is twice the partial root with "01" appended.)
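The same digit-by-digit recurrence can be written as a small C routine: bring down a pair of radicand bits, compare against the partial root doubled with "01" appended, and set the next root bit accordingly. isqrt is a hypothetical helper name, not from the slides.

```c
#include <stdio.h>
#include <stdint.h>

/* Pencil-and-paper (restoring) binary square root of a 32-bit radicand. */
uint32_t isqrt(uint32_t z, uint32_t *rem) {
    uint32_t q = 0, r = 0;
    for (int i = 15; i >= 0; i--) {              /* 16 digit pairs of z         */
        r = (r << 2) | ((z >> (2 * i)) & 3);     /* bring down the next pair    */
        uint32_t trial = (q << 2) | 1;           /* 4*q + 1, i.e. append "01"   */
        if (trial <= r) {                        /* next root digit is 1        */
            r -= trial;
            q = (q << 1) | 1;
        } else {                                 /* next root digit is 0        */
            q = q << 1;                          /* (nothing was subtracted)    */
        }
    }
    *rem = r;
    return q;
}

int main(void) {
    uint32_t r;
    uint32_t q = isqrt(118, &r);
    printf("sqrt(118) = %u, remainder %u\n", q, r);   /* 10, remainder 18 */
    return 0;
}
```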
Sequential Shift/Subtract Square Rooter Architecture
[Block diagram: an (l+2)-bit adder forms the trial difference from the shifted partial remainder 2s(j-1) and the complemented trial value (Cin/Cout); the Select Root Digit logic uses the MSB of the difference to choose q_-j and to load/subtract into the partial remainder register, which holds z - 1 at the outset] [© Oxford U Press]
Other Methods for Square Rooting
- Restoring vs. non-restoring
  - We looked at the restoring algorithm (after a subtraction, restore the partial remainder if the result is negative)
  - Non-restoring: use a different digit encoding ({-1, 1} instead of {0, 1}) to avoid restoring
- High-radix
  - Similar to modified Booth encoding in multiplication: handle more bits at a time
  - More complex circuit, but faster
Other Methods for Square Rooting (cont.)
- Convergence methods
  - Use Newton's method to find a root of a function:
    - f(x) = x^2 - z: the root approximates x = √z, OR
    - f(x) = 1/x^2 - z: the root approximates x = 1/√z; multiply by z to get √z
  - Iteratively improve the accuracy
  - Can use a lookup table for the first iteration
Square Rooting: Abstract Notation
      z
    - q3 × (q(0) 0 q3) × 2^6
    - q2 × (q(1) 0 q2) × 2^4
    - q1 × (q(2) 0 q1) × 2^2
    - q0 × (q(3) 0 q0) × 2^0
    ---------------------------
      s
  (Here (q(j) 0 q) denotes the partial root q(j) with the bits 0 and q appended.)
- Floating point format:
  - Shift left (not right)
  - Powers of 2 decreasing
Restoring Floating-Point Square Root Calculation
  z                       =  01.110110    (118/64)
  s(0) = z - 1            =  00.110110    q0 = 1 → q(0) = 1.
  2s(0)                   = 001.101100
  -[2×(1.) + 2^-1]        =  10.1
  s(1)                    = 111.001100    negative → q_-1 = 0, q(1) = 1.0; restore: s(1) = 2s(0) = 001.101100
  2s(1)                   = 011.011000
  -[2×(1.0) + 2^-2]       =  10.01
  s(2)                    = 001.001000    q_-2 = 1, q(2) = 1.01
  2s(2)                   = 010.010000
  -[2×(1.01) + 2^-3]      =  10.101
  s(3)                    = 111.101000    negative → q_-3 = 0, q(3) = 1.010; restore: s(3) = 2s(2) = 010.010000
  2s(3)                   = 100.100000
  -[2×(1.010) + 2^-4]     =  10.1001
  (continued on the next slide)
[© Oxford U Press]
Restoring Floating-Point Sq. Root Calc. (cont.)
  s(4)                    = 001.111100    q_-4 = 1, q(4) = 1.0101
  2s(4)                   = 011.111000
  -[2×(1.0101) + 2^-5]    =  10.10101
  s(5)                    = 001.001110    q_-5 = 1, q(5) = 1.01011
  2s(5)                   = 010.011100
  -[2×(1.01011) + 2^-6]   =  10.101101
  s(6)                    = 111.101111    negative → q_-6 = 0, q(6) = 1.010110; restore: s(6) = 2s(5) = 010.011100   (156/64)
  s (true remainder)      = 0.000010011100   (156/64^2)
  q                       = 1.010110   (86/64)
[© Oxford U Press]
Nonrestoring Floating-Point Square Root Calculation
  z                       =  01.110110    (118/64)
  s(0) = z - 1            =  00.110110    q0 = 1 → q(0) = 1.
  2s(0)                   = 001.101100    positive → q_-1 = 1, q(1) = 1.1
  -[2×(1.) + 2^-1]        =  10.1
  s(1)                    = 111.001100    negative → q_-2 = -1, q(2) = 1.01
  2s(1)                   = 110.011000
  +[2×(1.1) - 2^-2]       =  10.11
  s(2)                    = 001.001000    positive → q_-3 = 1, q(3) = 1.011
  2s(2)                   = 010.010000
  -[2×(1.01) + 2^-3]      =  10.101
  s(3)                    = 111.101000    negative → q_-4 = -1, q(4) = 1.0101
  2s(3)                   = 111.010000
  +[2×(1.011) - 2^-4]     =  10.1011
  s(4)                    = 001.111100    positive → q_-5 = 1, q(5) = 1.01011
  2s(4)                   = 011.111000
  -[2×(1.0101) + 2^-5]    =  10.10101
  (continued on the next slide)
Nonrestoring FP Square Root Calc. (cont.)
  s(5)                    = 001.001110    positive → q_-6 = 1, q(6) = 1.010111
  2s(5)                   = 010.011100
  -[2×(1.01011) + 2^-6]   =  10.101101
  s(6)                    = 111.101111    negative   (-17/64)
  Correct: add back 10.101101 → s(6) corrected = 2s(5) = 010.011100   (156/64)
  s (true remainder)      = 0.000010011100   (156/64^2)
  q (signed-digit)        = 1. 1 -1 1 -1 1 1   (87/64)
  q (corrected binary)    = 1.010110   (86/64)
  Rule: if the final remainder is negative, drop the last '1' in q and restore the remainder to the last positive value.
Square Root Through Convergence
- Newton-Raphson method: choose f(x) = x^2 - z
  - x(i+1) = x(i) - f(x(i)) / f'(x(i))
  - x(i+1) = 0.5 (x(i) + z / x(i))
- Example: compute the square root of z = (2.4)₁₀
    x(0) read out from a table = 1.5             accurate to 10^-1
    x(1) = 0.5 (x(0) + 2.4/x(0)) = 1.550000      accurate to 10^-2
    x(2) = 0.5 (x(1) + 2.4/x(1)) = 1.549193548   accurate to 10^-4
    x(3) = 0.5 (x(2) + 2.4/x(2)) = 1.549193338   accurate to 10^-8
  [Par00] p. 354
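A short C sketch of this iteration, seeded from a crude two-entry "table"; newton_sqrt and the seed values are illustrative assumptions, not from the slides.

```c
#include <stdio.h>

/* Newton-Raphson square root: x(i+1) = 0.5 * (x(i) + z / x(i)),
 * seeded from a small lookup table for the first estimate.      */
double newton_sqrt(double z, int iters) {
    double x = (z < 2.0) ? 1.0 : 1.5;   /* crude "table lookup" seed */
    for (int i = 0; i < iters; i++)
        x = 0.5 * (x + z / x);          /* each step roughly doubles the
                                           number of correct digits     */
    return x;
}

int main(void) {
    /* reproduces the example above: sqrt(2.4) = 1.549193338... */
    printf("%.9f\n", newton_sqrt(2.4, 3));
    return 0;
}
```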
Non-Restoring Parallel Square Rooter
[Array diagram: rows of identical cells, each cell a full adder plus an XOR gate; radicand bits z_-1 ... z_-8 enter the array, root bits q_-1 ... q_-4 are produced row by row, and remainder bits s_-1 ... s_-8 emerge at the bottom] [© Oxford U Press]
Function Evaluation
- We looked at square root calculation
  - Direct hardware implementation (binary, BSD, high-radix)
    - Serial
    - Parallel
  - Approximation (Newton's method)
- What about other functions?
  - Direct implementation
    - Example: log2(x) can be implemented directly in hardware (using square root as a sub-component)
  - Polynomial approximation
  - Table lookup
    - Either as part of the calculation or for the full calculation
Table Lookup
[Diagrams: (a) direct table-lookup implementation, where u operand bits address a 2^u × v table producing the v result bits; (b) table lookup with pre- and post-processing, where preprocessing logic, smaller table(s), and post-processing logic sit between the operand and result bits] [© Oxford U Press]
Linear Interpolation Using Four Subintervals
[Diagram: the operand range [min, max] of x is split into four subintervals; the top 2 bits of x form the address into two 4-entry coefficient tables a and b, and the output is formed with a multiplier and an adder as a(i) + b(i)·x, i.e. f(x) is approximated by a(0)+b(0)x, a(1)+b(1)x, a(2)+b(2)x, or a(3)+b(3)x on the respective subinterval] [© Oxford U Press]
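A sketch of the same idea in C for f(x) = 2^x on [0, 1): the top two bits of x select an entry from two 4-entry coefficient tables, and the result is a(i) + b(i)·x. The coefficient values are endpoint fits chosen for illustration; they are not from the slide.

```c
#include <stdio.h>
#include <math.h>

/* 4-entry coefficient tables for a piecewise-linear fit of 2^x on [0,1),
 * one (a, b) pair per quarter-interval (endpoint interpolation).         */
static const double a_tbl[4] = { 1.0000, 0.9642, 0.8791, 0.7272 };
static const double b_tbl[4] = { 0.7568, 0.9000, 1.0703, 1.2728 };

double interp_exp2(double x) {           /* x in [0, 1) */
    int i = (int)(x * 4.0);              /* 2-bit subinterval address */
    return a_tbl[i] + b_tbl[i] * x;      /* linear approximation a(i) + b(i)*x */
}

int main(void) {
    for (double x = 0.0; x < 1.0; x += 0.125)
        printf("x=%.3f  approx=%.4f  exact=%.4f\n", x, interp_exp2(x), exp2(x));
    return 0;
}
```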
Piecewise Table Lookup
[Diagrams: a b-bit input z is split into high and low fields that address smaller tables (Table 1, Table 2); adders, a -p constant, a sign bit, and a mux combine the d-bit table outputs to form the d-bit result (e.g. z mod p)] [© Oxford U Press]
Accuracy vs. Lookup Table Size Trade-off
[Plot: worst-case absolute error (10^-2 down to 10^-9) versus number of address bits (h), with curves for linear, 2nd-degree, and 3rd-degree approximations] [© Oxford U Press]
Useful Links
M. E. Phair, "Free Floating-Point Madness!", http://www.hmc.edu/chips/