Download presentation
Presentation is loading. Please wait.
1
CMPE12cGabriel Hugh Elkaim 1 What do floating-point numbers represent? Rational numbers with non-repeating expansions in the given base within the specified exponent range. They do not represent repeating rational or irrational numbers, or any number too small or too large. Floating Point Format
2
CMPE12cGabriel Hugh Elkaim 2 IEEE Double Precision FP IEEE Double Precision is similar to SP –52-bit M 53 bits of precision with hidden bit –11-bit E, excess 1023, representing –1022 1023 –One sign bit Always use DP unless memory/file size is important –SP ~ 10 -38 … 10 38 –DP ~ 10 -308 … 10 308 Be very careful of these ranges in numeric computation
3
CMPE12cGabriel Hugh Elkaim 3 Floating Point Arithmetic Floating Point operations include Addition Subtraction Multiplication Division They are complicated because…
4
CMPE12cGabriel Hugh Elkaim 4 Floating Point Addition 1.Align decimal points 2.Add 3.Normalize the result Often already normalized Otherwise move one digit 1.0001631 x 10 3 4.Round result 1.000 x 10 3 9.997x 10 2 +4.631x 10 -1 9.997x 10 2 + 0.004631x 10 2 10.001631 x 10 2 Decimal Review How do we do this?
5
CMPE12cGabriel Hugh Elkaim 5 Floating Point Addition First step: get into SP FP if not already.25 = 0 01111101 00000000000000000000000 100 =0 10000101 10010000000000000000000 Or with hidden bit.25 =0 01111101 1 00000000000000000000000 100 =0 10000101 1 10010000000000000000000 Example:0.25 + 100 in SP FP Hidden Bit
6
CMPE12cGabriel Hugh Elkaim 6 Second step: Align radix points –Shifting F left by 1 bit, decreasing e by 1 –Shifting F right by 1 bit, increasing e by 1 –Shift F right so least significant bits fall off –Which of the two numbers should we shift? Floating Point Addition
7
CMPE12cGabriel Hugh Elkaim 7 Floating Point Addition Shift the.25 to increase its exponent so it matches that of 100. 0.25’s e:01111101 – 1111111 (127) = 100’s e:10000101 – 1111111 (127) = Shift.25 by 8 then. Easier method: Bias cancels with subtraction, so Second step: Align radix points cont. 10000101 - 01111101 00001000 100’s E 0.25’s E
8
CMPE12cGabriel Hugh Elkaim 8 Carefully shifting the 0.25’s fraction S E HB F 0 01111101 1 00000000000000000000000 (original value) 0 01111110 0 10000000000000000000000 (shifted by 1) 0 01111111 0 01000000000000000000000 (shifted by 2) 0 10000000 0 00100000000000000000000 (shifted by 3) 0 10000001 0 00010000000000000000000 (shifted by 4) 0 10000010 0 00001000000000000000000 (shifted by 5) 0 10000011 0 00000100000000000000000 (shifted by 6) 0 10000100 0 00000010000000000000000 (shifted by 7) 0 10000101 0 00000001000000000000000 (shifted by 8) Floating Point Addition
9
CMPE12cGabriel Hugh Elkaim 9 Floating Point Addition Third Step: Add fractions with hidden bit 0 10000101 1 10010000000000000000000 (100) +0 10000101 0 00000001000000000000000 (.25) 0 10000101 1 10010001000000000000000 Fourth Step: Normalize the result Get a ‘1’ back in hidden bit Already normalized most of the time Remove hidden bit and finished
10
CMPE12cGabriel Hugh Elkaim 10 Normalization example S E HB F 001111100 +001111011 0011110111 Need to shift so that only a 1 in HB spot 0100110111 discarded Floating Point Addition
11
CMPE12cGabriel Hugh Elkaim 11 Floating Point Example 0xD4F80000 + 0x56B00000
12
CMPE12cGabriel Hugh Elkaim 12
13
CMPE12cGabriel Hugh Elkaim 13 Another SP FP Example 0xD5D00000 + 0x54600000
14
CMPE12cGabriel Hugh Elkaim 14
15
CMPE12cGabriel Hugh Elkaim 15 Floating Point Subtraction Mantissa’s are sign-magnitude Watch out when the numbers are close 1.23455x 10 2 -1.23456x 10 2 A many-digit normalization is possible This is why FP addition is in many ways more difficult than FP multiplication
16
CMPE12cGabriel Hugh Elkaim 16 Floating Point Subtraction 1.Align radix points 2.Perform sign-magnitude operand swap if needed Compare magnitudes (with hidden bit) Change sign bit if order of operands is changed. 3.Subtract 4.Normalize 5.Round Steps to do subtraction
17
CMPE12cGabriel Hugh Elkaim 17 S E HB F 001111011smaller -001111101bigger switch order and make result negative 001111101bigger -001111011smaller 101100010 100010000 switched sign Floating Point Subtraction Simple Example:
18
CMPE12cGabriel Hugh Elkaim 18 Floating Point Multiplication 1.Multiply mantissas 3.0 x 5.0 15.00 2.Add exponents 1 + 2 = 3 3. Combine 15.00 x 10 3 4.Normalize if needed 1.50 x 10 4 Decimal example: 3.0 x 10 1 x5.0 x 10 2 How do we do this?
19
CMPE12cGabriel Hugh Elkaim 19 Floating Point Multiplication Multiplication in binary (4-bit F) 0 10000100 0100 x1 00111100 1100 Step 1: Multiply mantissas (put hidden bit back first!!) 1.0100 x 1.1100 00000 10100 + 10100 1000110000 10.00110000
20
CMPE12cGabriel Hugh Elkaim 20 Floating Point Multiplication Second step: Add exponents, subtract extra bias. 10000100 +00111100 Third step: Renormalize, correcting exponent 10100000110.00110000 Becomes 1010000101.000110000 Fourth step: Drop the hidden bit 101000010000110000 11000000 11000000 - 01111111 (127) 01000001
21
CMPE12cGabriel Hugh Elkaim 21 Multiply these SP FP numbers together 0x49FC0000 x0x4BE00000 Floating Point Multiplication
22
CMPE12cGabriel Hugh Elkaim 22
23
CMPE12cGabriel Hugh Elkaim 23
24
CMPE12cGabriel Hugh Elkaim 24 Another SP FP Example 0xC9F4 × 0x484F
25
CMPE12cGabriel Hugh Elkaim 25
26
CMPE12cGabriel Hugh Elkaim 26 Floating Point Division True division Unsigned, full-precision division on mantissas This is much more costly (e.g. 4x) than mult. Subtract exponents Faster division Newton’s method to find reciprocal Multiply dividend by reciprocal of divisor May not yield exact result without some work Similar speed as multiplication
27
CMPE12cGabriel Hugh Elkaim 27 Questions?
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.