Presentation is loading. Please wait.

Presentation is loading. Please wait.

Floating Point Numbers

Similar presentations


Presentation on theme: "Floating Point Numbers"— Presentation transcript:

1 Floating Point Numbers
Chapters 4 and 6 Floating Point Numbers

2 Which bit patterns for reals selected? Answer: use scientific notation
Floating Point Numbers Registers for real numbers usually contain 32 or 64 bits, allowing 232 or 264 numbers to be represented. Which reals to represent? There are an infinite number between 2 adjacent integers. (or two reals!!) Which bit patterns for reals selected? Answer: use scientific notation Consider: A x 10B, where A is one digit Scientific notation in binary Standard: IEEE 754 Floating-Point A B A x 10B 0 any 0

3 S is one bit representing the sign of the number
Floating Point Representation: S E F S is one bit representing the sign of the number E is an 8 bit biased integer representing the exponent F is an unsigned integer The true value represented is: (-1)S x f x 2e e = E – bias f = F/2n + 1 for single precision numbers n=23, bias=127

4 Just a sign bit for signed magnitude E is the exponent field
Floating Point S, E, F all represent fields within a representation. Each is just a bunch of bits. S is the sign bit (-1)S  (-1)0 = +1 and (-1)1 = -1 Just a sign bit for signed magnitude E is the exponent field The E field is a biased-127 representation. True exponent is (E – bias) The base (radix) is always 2 (implied). Some early machines used radix 4 or 16 (IBM) F (or M) is the fractional or mantissa field. It is in a strange form. There are 23 bits for F. A normalized FP number always has a leading 1. No need to store the one, just assume it. This MSB is called the HIDDEN BIT.

5 Get a binary representation for 64.2
Floating Point: 64.2 into IEEE SP Get a binary representation for 64.2 Binary of left of radix point 64 is: Binary of right of radix .2 x 2 = .4 x 2 = .8 x 2 = .6 x 2 = Binary for .2: 64.2 is: Normalize binary form Produces: Turn true exponent into bias-127 Put it together: 23-bit F is: S E F is: In hex:

6 What numbers cannot be represented because of this?
Floating Point Since floating point numbers are always stored in normal form, how do we represent 0? 0x and 0x represent 0. What numbers cannot be represented because of this?

7 0/0 or + + - =NaN (Not a number) NaN ? 11111111 ?????…
Floating Point Other special values: + 5 / 0 = + … (0x7f ) -7/0 = - … (0xff ) 0/0 or =NaN (Not a number) NaN ? ?????… (S is either 0 or 1, E=0xff, and F is anything but all zeroes) Also de-normalized numbers (beyond scope)

8 What is 0x4228 0000 if in a SP FP notation?
Floating Point What is 0x if in a SP FP notation?

9 Floating Point What is as a SP FP?

10 What do floating-point numbers represent?
Rational numbers with non-repeating expansions in the given base within the specified exponent range. They do not represent repeating rational or irrational numbers, or any number too small (0 < |x|<Bemin) – underflow, or too large (|x| > Bemax) – overflow IEEE Double Precision is similar to SP 52-bit M 53 bits of precision with hidden bit 11-bit E, excess 1023, representing –1022:1023 One sign bit Always use DP unless memory/file size is important SP ~ … 1038 DP ~ … 10308 Be very careful of these ranges in numeric computation

11 Floating Point operations include ?
Floating Point: Chapter 6 Floating Point operations include ? They are complicated because…

12 Often already normalized Otherwise move one digit 1.0001631 x 103
Floating Point Addition 9.997 x 102 x 10-1 Align decimal points Add Normalize the result Often already normalized Otherwise move one digit x 103 Round result x103 9.997 x 102 x 102 x 102

13 Shifting F left by 1 bit, decreasing e by 1
Floating Point Addition .25 = 100 = Or with hidden bit .25 = 100 = Align radix points Shifting F left by 1 bit, decreasing e by 1 Shifting F right by 1 bit, increasing e by 1 Shift mantissa right so least significant bits fall off Which should we shift?

14 Shift the .25 to increase its exponent
Floating Point Addition Shift the .25 to increase its exponent – (127) = – (127) = Shift smaller by: Bias cancels with subtraction, so Carefully shifting, (original value) (shifted by 1) (shifted by 2) (shifted by 3) (shifted by 4) (shifted by 5) (shifted by 6) (shifted by 7) (shifted by 8)

15 Get a ‘1’ back in hidden bit Already normalized 4. Round result
Floating Point Addition 2. Add with hidden bit (100) (.25) 3. Normalize the result Get a ‘1’ back in hidden bit Already normalized 4. Round result Renormalize if needed Post-normalization example: s exp h frtn  discarded

16 Mantissa’s are sign-magnitude Watch out when the numbers are close
Floating Point Subtraction Mantissa’s are sign-magnitude Watch out when the numbers are close x 102 x 102 A many-digit normalization is possible This is why FP addition is in many ways more difficult than FP multiplication

17 Perform sign-magnitude operand swap
Floating Point Subtraction Align radix points Perform sign-magnitude operand swap Compare magnitudes (with hidden bit) Change sign bit if order of operands is changed. Subtract Normalize Round Renormalize if needed s exp h frtn smaller bigger switch and make difference negative bigger smaller switch sign

18 Multiplication Example 3.0 x 101 x 5.0 x 102 Multiply mantissas 3.0
Floating Point Multiplication Multiplication Example 3.0 x 101 x 5.0 x 102 Multiply mantissas 3.0 x 5.0 15.00 Add exponents 1 + 2 = 3 15.00 x 103 Normalize if needed 1.50 x 104

19 Multiplication in binary (4-bit F)
Floating Point Multiplication Multiplication in binary (4-bit F) x Multiply mantissas 1.0100 x 00000 10100

20 Add exponents, subtract extra bias. Biased: 10000100 + 00111100
Floating Point Multiplication Add exponents, subtract extra bias. Biased: Check by converting to true exponents and adding Renormalize, correcting exponent Becomes Drop the hidden bit

21 Unsigned, full-precision division on mantissas
Floating Point Division Division True division Unsigned, full-precision division on mantissas This is much more costly (e.g. 4x) than mult. Subtract exponents Faster division Newton’s method to find reciprocal Multiply dividend by reciprocal of divisor May not yield exact result without some work Similar speed as multiplication


Download ppt "Floating Point Numbers"

Similar presentations


Ads by Google