Ellen Spertus MCS 111 October 11, 2001 Floating Point Arithmetic.


1 Ellen Spertus MCS 111 October 11, 2001 Floating Point Arithmetic

2 Decimal addition (1)
Problem: 9.999×10^1 + 1.610×10^-1
Estimate answer:

3 Decimal addition (2)
Problem: 9.999×10^1 + 1.610×10^-1
Calculate answer:
  9.999×10^1
+ 1.610×10^-1

4 Decimal addition (3)
Problem: 9.999×10^1 + 1.610×10^-1
How should we add them?

5 Floating point addition
Adjust numbers to have same exponent
Add the significands
Normalize the sum
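The three steps on this slide can be sketched in Python and applied to the decimal problem from the earlier slides. This is only an illustration: the function name `fp_add` and the use of exact `Fraction` arithmetic are assumptions made here, and no rounding to a fixed number of digits is performed.

```python
from fractions import Fraction

def fp_add(sig_a, exp_a, sig_b, exp_b, base=10):
    """Add sig_a*base**exp_a and sig_b*base**exp_b exactly.

    Returns (significand, exponent) with 1 <= |significand| < base.
    Hypothetical helper for illustration; it does not round the result.
    """
    # Step 1: adjust both numbers to the larger exponent.
    exp = max(exp_a, exp_b)
    sig_a = sig_a / Fraction(base) ** (exp - exp_a)
    sig_b = sig_b / Fraction(base) ** (exp - exp_b)
    # Step 2: add the significands.
    total = sig_a + sig_b
    # Step 3: normalize the sum.
    while abs(total) >= base:
        total /= base
        exp += 1
    while total != 0 and abs(total) < 1:
        total *= base
        exp -= 1
    return total, exp

# 9.999 x 10^1 + 1.610 x 10^-1
sig, exp = fp_add(Fraction("9.999"), 1, Fraction("1.610"), -1)
print(float(sig), exp)  # exact sum is 1.00151 x 10^2
```

Keeping only four significant digits, as the operands do, would then require a separate rounding step.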

6 Binary addition (1)
Problem: 1.01×2^2 + 1.101×2^-1
Adjust numbers to have same exponent:
Add the significands
Normalize the sum

7 Binary addition (2)
Problem: 1.11×2^1 + 1.01×2^3
Adjust numbers to have same exponent:
Add the significands
Normalize the sum
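One way to check hand-worked answers to the two binary problems is to evaluate each operand exactly. The helper `fp` below is a hypothetical name introduced for this sketch; it turns a binary significand string and an exponent into an exact fraction.

```python
from fractions import Fraction

def fp(significand_bits, exponent):
    """Exact value of a binary significand such as '1.101' times 2**exponent."""
    whole, _, frac = significand_bits.partition(".")
    digits = int(whole + frac, 2)
    return Fraction(digits, 2 ** len(frac)) * Fraction(2) ** exponent

# Problem (1): align 1.101 x 2^-1 to 0.001101 x 2^2, then add to 1.01 x 2^2.
sum1 = fp("1.01", 2) + fp("1.101", -1)
print(sum1 == fp("1.011101", 2))   # candidate answer: 1.011101 x 2^2

# Problem (2): align 1.11 x 2^1 to 0.0111 x 2^3, then add to 1.01 x 2^3.
sum2 = fp("1.11", 1) + fp("1.01", 3)
print(sum2 == fp("1.1011", 3))     # candidate answer: 1.1011 x 2^3
```

In both cases the aligned sum already has a single leading 1, so the normalization step leaves it unchanged.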

8 8-bit floating-point format (2)
Exponent (3 bits) is biased by 3
The leading one of significand is implicit
Zero is represented by all zeros
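A decoder for this toy format might look like the sketch below. The slide gives only the exponent width, the bias, the implicit leading one, and the all-zeros encoding of zero; the field order (1 sign bit, then 3 exponent bits, then 4 fraction bits) and the absence of infinities, NaNs, and denormals are assumptions made here, and `decode8` is a hypothetical name.

```python
def decode8(byte):
    """Decode an 8-bit pattern under the assumed layout s | eee | ffff."""
    if byte == 0:
        return 0.0                      # zero is represented by all zeros
    sign = (byte >> 7) & 0b1            # assumed: top bit is the sign
    exponent = (byte >> 4) & 0b111      # 3-bit exponent field, biased by 3
    fraction = byte & 0b1111            # assumed: 4 stored fraction bits
    value = (1 + fraction / 16) * 2.0 ** (exponent - 3)  # implicit leading one
    return -value if sign else value

print(decode8(0b0_011_0000))  # 1.0000 x 2^0
print(decode8(0b1_100_1000))  # -1.1000 x 2^1
print(decode8(0b0_111_1111))  # largest magnitude: 1.1111 x 2^4
```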

9 Practice
Add two numbers from previous slide

10 Problem

11 Rounding (1)
Round 1.00011 to have one fewer digit
Modes
– Always round up (IRS)
– Always round down
– Truncate
– Round to nearest even

12 Rounding (2)
Round -1.00011 to have one fewer digit
Modes
– Always round up (IRS)
– Always round down
– Truncate
– Round to nearest even
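The four modes can be compared side by side with exact arithmetic. The sketch below assumes the slide values are binary (so 1.00011 is 35/32), reads "round up" as toward positive infinity and "round down" as toward negative infinity, and uses a hypothetical helper named `round_modes`.

```python
from fractions import Fraction
import math

def round_modes(value, frac_bits=4):
    """Round an exact value to frac_bits binary fraction digits in each mode."""
    scaled = value * 2 ** frac_bits          # count in units of the last kept bit
    unit = Fraction(1, 2 ** frac_bits)
    return {
        "up": math.ceil(scaled) * unit,        # toward +infinity
        "down": math.floor(scaled) * unit,     # toward -infinity
        "truncate": math.trunc(scaled) * unit, # toward zero
        "nearest_even": round(scaled) * unit,  # Fraction rounds ties to even
    }

positive = round_modes(Fraction(0b100011, 32))    # 1.00011 in binary
negative = round_modes(-Fraction(0b100011, 32))   # -1.00011 in binary
for mode in positive:
    print(mode, positive[mode], negative[mode])
```

Under these assumptions the dropped bit makes each value an exact tie, so the positive and negative cases differ for "up" and "down" but agree in magnitude for "truncate" and "round to nearest even".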

13 Ensuring accurate results
Our significands are 4 bits wide.
We use 6 bits when adding two significands.
– Guard bit
– Round bit
Purpose: Accurate rounding

14 Adding large numbers
What if we add 1.1111×2^4 + 1.1111×2^4?

15 How can we get underflow?
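Both the large-number question and the underflow question can be explored by checking whether an exact result still fits the exponent range of the 8-bit format. The sketch below assumes every 3-bit exponent pattern encodes a normal number (true exponents -3 through 4) and ignores that the all-zeros pattern is reserved for zero. The subtraction shown is only one possible way to produce underflow, chosen here for illustration, and `classify` is a hypothetical helper.

```python
from fractions import Fraction

# Assumption: 3-bit exponent field biased by 3, all patterns usable,
# so true exponents run from 0 - 3 to 7 - 3.
E_MIN, E_MAX = -3, 4

def classify(value):
    """Say whether an exact result fits the toy format's exponent range."""
    if value == 0:
        return "zero"
    magnitude, exponent = abs(value), 0
    while magnitude >= 2:       # shift right until the form is 1.xxxx
        magnitude /= 2
        exponent += 1
    while magnitude < 1:        # shift left until the form is 1.xxxx
        magnitude *= 2
        exponent -= 1
    if exponent > E_MAX:
        return "overflow"
    if exponent < E_MIN:
        return "underflow"
    return "ok"

big = Fraction(0b11111, 16) * 2**4        # 1.1111 x 2^4
small_a = Fraction(0b10001, 16) / 2**3    # 1.0001 x 2^-3
small_b = Fraction(0b10000, 16) / 2**3    # 1.0000 x 2^-3

print(classify(big + big))           # the sum needs exponent 5
print(classify(small_a - small_b))   # the difference is 2^-7
```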

16 Associativity of arithmetic
(x+y)+z = x+(y+z)
When is this true?

17 Breakdown of associativity
Values
– x = 1.0000
– y = 0.00001
– z = 0.00001
Assume rounding by truncation.
Compare (x+y)+z with x+(y+z)
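The comparison on this slide can be reproduced with exact fractions and an explicit truncation step. The sketch assumes the values are binary and that every intermediate result is truncated to 5 significant bits (matching the width of x = 1.0000); neither detail is stated on the slide, and `truncate` is a hypothetical helper.

```python
from fractions import Fraction
import math

def truncate(value, bits=5):
    """Truncate a non-negative exact value to `bits` significant binary digits."""
    if value == 0:
        return value
    exp = 0
    while value >= Fraction(2) ** (exp + 1):   # find exp with 2^exp <= value
        exp += 1
    while value < Fraction(2) ** exp:
        exp -= 1
    ulp = Fraction(2) ** (exp - (bits - 1))    # weight of the last kept bit
    return math.floor(value / ulp) * ulp

x = Fraction(1)        # 1.0000
y = Fraction(1, 32)    # 0.00001 in binary
z = Fraction(1, 32)    # 0.00001 in binary

left = truncate(truncate(x + y) + z)    # (x+y)+z: each small term is lost
right = truncate(x + truncate(y + z))   # x+(y+z): the small terms combine first
print(left, right, left == right)
```

Under these assumptions the left grouping stays at 1.0000 while the right grouping reaches 1.0001, so the two orders disagree.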

18 MIPS floating point
32 floating-point registers (32 bits each)
Instructions
– Addition: add.s, add.d
– Subtraction: sub.s, sub.d
– Multiplication: mul.s, mul.d
– Division: div.s, div.d
– Comparison: c.x.s and c.x.d, where x is one of eq, neq, lt, le, gt, ge
– Conditional branch: bc1t, bc1f

19 Summary
Computers aren’t limited to integers
Floating-point arithmetic is quirky
– Loss of precision due to rounding
– Underflow
– Overflow
Big picture: Floating point arithmetic can be implemented with enough ______________________.


