Published by June Cross. Modified over 9 years ago.
1
Floating Point Operations - Part II
2
Multiplication
Do unsigned multiplication on the mantissas, including the hidden bits.
Add the true exponents, or unbias one of the exponents (subtract 127 from it) and then perform 2's complement addition.
Normalize the result.
Set the sign bit of the result.
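The four steps above can be sketched in Python as follows. This is a minimal illustration, not a full IEEE implementation: the helper names (fields, build, fp_multiply) are made up for this sketch, it assumes both operands and the result are normalized single-precision values, and it truncates the product instead of applying a rounding rule.

```python
import struct

BIAS = 127

def fields(x):
    """Split a value stored as IEEE single precision into (S, E, F)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def build(sign, exp, frac):
    """Reassemble (S, E, F) into a Python float."""
    bits = (sign << 31) | (exp << 23) | frac
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_multiply(a, b):
    sa, ea, fa = fields(a)
    sb, eb, fb = fields(b)
    # Step 1: unsigned multiplication of the 24-bit mantissas (hidden bit restored).
    product = ((1 << 23) | fa) * ((1 << 23) | fb)
    # Step 2: unbias one exponent, then add.
    exp = ea + (eb - BIAS)
    # Step 3: normalize. The product lies in [1, 4), so at most one right shift is needed.
    if product >> 47:
        product >>= 1
        exp += 1
    # Keep the top 23 fraction bits (truncation, an assumption of this sketch).
    frac = (product >> 23) & 0x7FFFFF
    # Step 4: the sign bit is the XOR of the operand signs.
    sign = sa ^ sb
    if not 1 <= exp <= 254:
        raise OverflowError("result exponent outside the normalized range")
    return build(sign, exp, frac)

print(fp_multiply(18.0, 9.5))
```

With the operands 18.0 and 9.5 the sketch returns 171.0.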
3
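Setting the Sign Bit
The sign bit of the result is the exclusive OR of the operand sign bits, as the following table shows:

Sign of A   Sign of B   Sign of result
0           0           0
0           1           1
1           0           1
1           1           0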
4
Example 12.5
18.0 x 9.5

  18.0 = 0 1000 0011 (1)001 0000
   9.5 = 0 1000 0010 (1)001 1000
 171.0 = 0 1000 0110 (1)010 1011
(the least significant 16 bits of each mantissa are eliminated)

Mantissas:
        10010000
      x 10011000
 ---------------
     10010000
    10010000
 10010000
 ---------------
 101010110000000   (14 fraction bits; already normalized)

Exponents: unbias (subtract 127) one of the exponents, then perform 2's complement addition:
   1000 0011
 + 0000 0011
 -----------
   1000 0110
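The bit patterns in this example can be checked with a short Python sketch. The helper name show is hypothetical; it prints all 23 fraction bits, whereas the slide shows only the first 7.

```python
import struct

def show(x):
    """Format a value as 'S EEEEEEEE (1)F...F' using its single-precision encoding."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    return f"{s} {e:08b} (1){f:023b}"

for value in (18.0, 9.5, 18.0 * 9.5):
    print(value, show(value))

# Mantissa product and exponent sum from the example, done on plain integers.
mantissa_product = 0b10010000 * 0b10011000
exponent_sum = 0b10000011 + (0b10000010 - 127)
print(f"{mantissa_product:b}", f"{exponent_sum:08b}")
```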
5
Division
Do unsigned division of the mantissas.
Subtract the exponent of the divisor from the exponent of the dividend.
Normalize the result.
Set the sign bit of the result.
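A sketch of the division steps above, under the same kind of assumptions: the names fields, build and fp_divide are invented for illustration, operands and result are taken to be normalized single-precision values, and the quotient is truncated, not rounded. Subtracting two biased exponents cancels the bias, so the sketch adds 127 back.

```python
import struct

BIAS = 127

def fields(x):
    """Split a value stored as IEEE single precision into (S, E, F)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def build(sign, exp, frac):
    """Reassemble (S, E, F) into a Python float."""
    bits = (sign << 31) | (exp << 23) | frac
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_divide(a, b):
    sa, ea, fa = fields(a)
    sb, eb, fb = fields(b)
    ma = (1 << 23) | fa          # dividend mantissa with hidden bit
    mb = (1 << 23) | fb          # divisor mantissa with hidden bit
    # Step 1: unsigned division of the mantissas, scaled to keep 24 extra bits.
    q = (ma << 24) // mb
    # Step 2: subtract the exponents; the biases cancel, so restore one bias.
    exp = ea - eb + BIAS
    # Step 3: normalize. The mantissa ratio lies in (0.5, 2).
    if ma >= mb:
        q >>= 1                  # ratio in [1, 2): already of the form 1.xxx
    else:
        exp -= 1                 # ratio in (0.5, 1): scale the mantissa up by 2
    frac = q & 0x7FFFFF
    # Step 4: the sign bit is the XOR of the operand signs.
    sign = sa ^ sb
    if not 1 <= exp <= 254:
        raise OverflowError("result exponent outside the normalized range")
    return build(sign, exp, frac)

print(fp_divide(171.0, 9.5))
```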
6
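Setting the sign bit of the quotient
The sign bit of the quotient is the exclusive OR of the operand sign bits, as the following table shows:

Sign of dividend   Sign of divisor   Sign of quotient
0                  0                 0
0                  1                 1
1                  0                 1
1                  1                 0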
7
Rounding
In floating point operations, some results may not be representable.
Rounding such a result to a representable value incurs a small error.
Errors tend to accumulate over time.
Operations performed in a different order might give different results.
Exact comparison of two floating point variables is unreliable.
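A small Python sketch of the accumulation and comparison points above. Python floats are double precision, not single precision, but the same effect appears: adding 0.1 ten times does not give exactly 1.0, so a tolerance-based comparison such as math.isclose is a common workaround.

```python
import math

total = 0.0
for _ in range(10):
    total += 0.1             # 0.1 is not exactly representable in binary

exactly_equal = (total == 1.0)
close_enough = math.isclose(total, 1.0)   # default relative tolerance 1e-09

print(total, exactly_equal, close_enough)
```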
8
Example 13.1
Suppose x = -1.5 x 10^38, y = 1.5 x 10^38 and z = 1.0, and suppose these are single-precision numbers.
x + (y + z) = -1.5 x 10^38 + (1.5 x 10^38 + 1.0) = -1.5 x 10^38 + 1.5 x 10^38 = 0.0
(x + y) + z = (-1.5 x 10^38 + 1.5 x 10^38) + 1.0 = 0 + 1.0 = 1.0
In the first case, 1.0 is far too small to change 1.5 x 10^38 at single precision, so y + z rounds back to y.
Floating point addition is not associative.
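This example can be reproduced in Python. Because Python floats are double precision, the sketch rounds every intermediate result to single precision with a hypothetical helper f32 built on the struct module.

```python
import struct

def f32(v):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]

x = f32(-1.5e38)
y = f32(1.5e38)
z = f32(1.0)

left = f32(x + f32(y + z))    # x + (y + z)
right = f32(f32(x + y) + z)   # (x + y) + z

print(left, right)
```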
9
Rounding Rules
Round to nearest. Same as taught in school, except for ties: in case of a tie, if the lsb is 1 add 1; if the lsb is 0 truncate. After a tie, the lsb of the result is always 0.
Round toward zero. Truncate the magnitude to the correct number of bits.
Round toward positive infinity. The smallest representable value that is not arithmetically less than the unrounded value is chosen.
Round toward negative infinity. The largest representable value that is not arithmetically greater than the unrounded value is chosen.
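The four rules can be sketched on a sign-and-magnitude value whose low bits are being dropped. The function name round_magnitude and the mode strings are invented for this illustration, and drop is assumed to be at least 1.

```python
def round_magnitude(sign, mag, drop, mode):
    """Drop the `drop` low bits of magnitude `mag` (sign: 0 = +, 1 = -).

    Returns the rounded magnitude, expressed in units of 2**drop.
    """
    kept = mag >> drop
    rem = mag & ((1 << drop) - 1)     # the bits being discarded
    half = 1 << (drop - 1)            # exactly halfway between two candidates
    if mode == "nearest":
        # Ties go to the candidate whose lsb is 0.
        round_up = rem > half or (rem == half and kept & 1 == 1)
    elif mode == "zero":
        round_up = False              # truncate the magnitude
    elif mode == "+inf":
        round_up = rem != 0 and sign == 0
    elif mode == "-inf":
        round_up = rem != 0 and sign == 1
    else:
        raise ValueError(f"unknown mode: {mode}")
    return kept + 1 if round_up else kept

# 0b10110 with two bits dropped is a tie between 0b101 and 0b110.
print(bin(round_magnitude(0, 0b10110, 2, "nearest")))
```

Rounding toward an infinity increases the magnitude only when the sign points in that direction, which is why the two directed modes test the sign bit.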
10
Overflow
Overflow occurs when the exponent of the normalized result is outside the range of representable values.
The smallest number that can be represented normally has an exponent of e = -126, i.e. E = 1 = 0000 0001, and the largest number has an exponent of e = 127, i.e. E = 254 = 1111 1110.
11
The IEEE FPS assigns special meaning to extreme values of the exponent:
-∞ (S=1, E=255, F=0)
+∞ (S=0, E=255, F=0)
NaN (E=255, F≠0)
0 (E=0, F=0)
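The encodings listed above can be turned into a small classifier over 32-bit patterns. The function names classify and bits_of are hypothetical; patterns with E=0 and F≠0 are not covered by the list above, so the sketch only labels them as "other".

```python
import struct

def classify(bits):
    """Classify a 32-bit single-precision pattern by its S, E and F fields."""
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e == 255:
        if f != 0:
            return "NaN"
        return "-inf" if s else "+inf"
    if e == 0:
        return "zero" if f == 0 else "other (E=0, F!=0)"
    return "normal"                   # 1 <= E <= 254

def bits_of(x):
    """Return the single-precision bit pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

print(classify(bits_of(1.0)), classify(0x7F800000), classify(0x7FC00000))
```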
12
Underflow
Underflow occurs when the result is too close to zero to be represented.
Repeatedly dividing a number by a positive constant mathematically gives values that approach zero but never reach it, e.g. 1 divided by 10 repeatedly.
In these cases, floating point operations will eventually return zero after some number of iterations.
13
Until underflow occurs, the computation is reversible up to rounding error, i.e. if we multiply the current result by the constant the same number of times we have divided it, we get back (approximately) the original number.
Once underflow occurs, any number of multiplications will still produce zero.
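A minimal Python sketch of this behaviour. Python floats are double precision, so underflow to zero happens after many more divisions than it would in single precision; the function name divide_until_zero is an invention of this sketch.

```python
import math

def divide_until_zero(x, c):
    """Count how many divisions by c it takes for x to become exactly 0.0."""
    steps = 0
    while x != 0.0:
        x /= c
        steps += 1
    return steps

# Before underflow: dividing and then multiplying back recovers the start value
# up to rounding error.
x = 1.0
for _ in range(20):
    x /= 10.0
for _ in range(20):
    x *= 10.0
recovered = math.isclose(x, 1.0)

# After underflow: the value is zero and stays zero however often we multiply.
steps = divide_until_zero(1.0, 10.0)
z = 0.0
for _ in range(steps):
    z *= 10.0

print(recovered, steps, z)
```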