CSE1301 Computer Programming Lecture 33: Real Number Representation

1 CSE1301 Computer Programming Lecture 33: Real Number Representation

2 Topics
Terminology; IEEE standard for floating-point representation; Floating point arithmetic; Limitations

3 Some Terminology
All digits in a number following any leading zeros are significant digits.

4 Some Terminology (cont)
The scientific notation for real numbers is: mantissa × base^exponent. In C, a constant written with the exponent suffix e-2 means its mantissa × 10^-2.

5 Some Terminology (cont)
The mantissa is always normalized between 1 and the base, i.e., 1 ≤ mantissa < base (exactly one significant digit before the point).
[Table: unnormalized and normalized forms of example numbers; surviving fragments: B1.39FC, B.139FC, × 2^-3]

6 Some Terminology (cont)
The precision of a number is how many digits (or bits) we use to represent it. For example:

7 Representing Numbers
A real number n is represented by a floating-point approximation n*. The computer uses 32 bits (or more) to store each approximation. It needs to store: the mantissa; the sign of the mantissa; the exponent (with its sign).

8 Representing Numbers (cont)
The standard way to allocate 32 bits (specified by IEEE Standard 754) is: 23 bits for the mantissa; 1 bit for the mantissa's sign; 8 bits for the exponent.

12 Representing the Mantissa
The mantissa has to be in the range 1 ≤ mantissa < base. Therefore, if we use base 2, the digit before the point must be a 1, so we don't have to worry about storing it. We get 24 bits of precision using 23 bits.

13 Representing the Mantissa (cont)
24 bits of precision are equivalent to a little over 7 decimal digits: 24 × log10(2) ≈ 7.2

14 Representing the Mantissa (cont)
Suppose we want to represent a number with more significant digits than that. We can only store an approximation of it: one value if we truncate, and possibly a different one if we round.

15 Representing the Mantissa (cont)
Even if the computer appears to represent more than 7 decimal places, only the first 7 places are meaningful. For example:
#include <math.h>
#include <stdio.h>
int main(void) { float pi = 2 * asin(1); printf("%.35f\n", pi); return 0; }
Prints out: 3.14159274101257324218750000000000000

16 Representing the Exponent
The exponent is represented as excess-127. E.g.,
Actual Exponent    Stored Value
-127               00000000
-126               00000001
...                ...
0                  01111111
+1                 10000000
...                ...
i                  (i+127) in binary
...                ...
+128               11111111

17 Representing the Exponent (cont)
The IEEE standard restricts exponents to the range: -126 ≤ exponent ≤ +127. The exponents -127 and +128 have special meanings: if exponent = -127, the stored value is 0; if exponent = +128, the stored value is ∞.

18 Representing Numbers -- Example 1
What is 01011011 (8-bit machine)?
Fields: sign 0, exponent 101, mantissa 1011
Mantissa: 1.1011
Exponent (excess-3 format): 5 - 3 = 2
Value: 1.1011 × 2^2 = 110.11 (binary) = 6.75

19 Representing Numbers -- Example 2
Represent -10.375 (32-bit machine).
10.375 (decimal) = 1010.011 (binary) = 1.010011 × 2^3
Sign: 1
Mantissa: 01001100000000000000000
Exponent (excess-127 format): 3 + 127 = 130 = 10000010

20 Floating Point Overflow
Floating point representations can overflow, e.g., when an operation on numbers of magnitude 2^127 produces a result that would need an exponent of 128.

21 Floating Point Underflow
Floating point numbers can also get too small, e.g., dividing a number of magnitude 2^-126 can produce a result that would need an exponent of -127, which is stored as 0.

22 Floating Point Addition
Five steps to add two floating point numbers: 1. Express the numbers with the same exponent (denormalize). 2. Add the mantissas. 3. Adjust the mantissa to one digit/bit before the point (renormalize). 4. Round or truncate to required precision. 5. Check for overflow/underflow.

23 Floating Point Addition -- Example 1 (Assume precision 4 decimal digits)
x = […] × 10^7, y = […] × 10^6

24 Floating Point Addition -- Example 1 (cont) (Assume precision 4 decimal digits)
1. Use the same exponents: x = […] × 10^7, y = […] × 10^7

25 Floating Point Addition -- Example 1 (cont) (Assume precision 4 decimal digits)
2. Add the mantissas: x = […] × 10^7, y = […] × 10^7, x+y = […] × 10^7

26 Floating Point Addition -- Example 1 (cont) (Assume precision 4 decimal digits)
3. Renormalize the sum: x = […] × 10^7, y = […] × 10^7, x+y = […] × 10^8

27 Floating Point Addition -- Example 1 (cont) (Assume precision 4 decimal digits)
4. Truncate or round: x = […] × 10^7, y = […] × 10^7, x+y = […] × 10^8

28 Floating Point Addition -- Example 1 (cont) (Assume precision 4 decimal digits)
5. Check overflow and underflow: x = […] × 10^7, y = […] × 10^7, x+y = […] × 10^8

29 Floating Point Addition -- Example 2 (Assume precision 4 decimal digits)
x = […], y = […] × 10^-5

30 Floating Point Addition -- Example 2 (cont) (Assume precision 4 decimal digits)
1. Use the same exponents: x = […] × 10^-5, y = […] × 10^-5

31 Floating Point Addition -- Example 2 (cont) (Assume precision 4 decimal digits)
2. Add the mantissas: x = […] × 10^-5, y = […] × 10^-5, x+y = […] × 10^-5

32 Floating Point Addition -- Example 2 (cont) (Assume precision 4 decimal digits)
3. Renormalize the sum: x = […] × 10^-5, y = […] × 10^-5, x+y = […] × 10^-8

33 Floating Point Addition -- Example 2 (cont) (Assume precision 4 decimal digits)
4. Truncate or round: x = […] × 10^-5, y = […] × 10^-5, x+y = […] × 10^-8 (no change)

34 Floating Point Addition -- Example 2 (cont) (Assume precision 4 decimal digits)
5. Check overflow and underflow: x = […] × 10^-5, y = […] × 10^-5, x+y = […] × 10^-8

35 Floating Point Multiplication
Five steps to multiply two floating point numbers: 1. Multiply the mantissas. 2. Add the exponents. 3. Renormalize the mantissa. 4. Round or truncate to required precision. 5. Check for overflow/underflow.

36 Floating Point Multiplication -- Example (Assume precision 4 decimal digits)
x = […] × 10^5, y = […] × 10^-3

37 Floating Point Multiplication -- Example (cont) (Assume precision 4 decimal digits)
1&2. Multiply mantissas and add exponents: x = […] × 10^5, y = […] × 10^-3, x × y = […] × 10^2

38 Floating Point Multiplication -- Example (cont) (Assume precision 4 decimal digits)
3. Renormalize the mantissa: x = […] × 10^5, y = […] × 10^-3, x × y = […] × 10^3

39 Floating Point Multiplication -- Example (cont) (Assume precision 4 decimal digits)
4. Truncate or round: x = […] × 10^5, y = […] × 10^-3, x × y = […] × 10^3

40 Floating Point Multiplication -- Example (cont) (Assume precision 4 decimal digits)
4. Truncate or round: x = […] × 10^5, y = […] × 10^-3, x × y = […] × 10^3

41 Floating Point Multiplication -- Example (cont) (Assume precision 4 decimal digits)
5. Check overflow and underflow: x = […] × 10^5, y = […] × 10^-3, x × y = […] × 10^3

42 Limitations
Floating-point representations only approximate real numbers. The normal laws of arithmetic don't always hold, e.g., associativity is not guaranteed.

43 Limitations -- Example (Assume precision 4 decimal digits)
x = […] × 10^3, y = […] × 10^3, z = […] × 10^0

44 Limitations -- Example (cont) (Assume precision 4 decimal digits)
x = […] × 10^3, y = […] × 10^3, z = […] × 10^0, x+y = […] × 10^0

45 Limitations -- Example (cont) (Assume precision 4 decimal digits)
x+y = […] × 10^0, z = […] × 10^0, (x+y)+z = […] × 10^0

46 Limitations -- Example (cont) (Assume precision 4 decimal digits)
x = […] × 10^3, y = […] × 10^3, z = […] × 10^0

47 Limitations -- Example (cont) (Assume precision 4 decimal digits)
x = […] × 10^3, y = […] × 10^3, z = […] × 10^0, y+z = […] × 10^3

48 Limitations -- Example (cont) (Assume precision 4 decimal digits)
x = […] × 10^3, y+z = […] × 10^3, x+(y+z) = […] × 10^3

49 Limitations -- Example (cont) (Assume precision 4 decimal digits)
x = […] × 10^3, y+z = […] × 10^3, x+(y+z) = […] × 10^0

50 Limitations -- Example (cont) (Assume precision 4 decimal digits)
x+(y+z) = […] × 10^0, (x+y)+z = […] × 10^0

51 Limitations -- Exercise: Laws of Arithmetic
Consider the laws of arithmetic: commutativity (additive and multiplicative); associativity; distributivity; identity (additive and multiplicative). Try to work out which ones always hold for floating-point numbers.

52 Reading (for the Very Keen)
Goldberg, D., What Every Computer Scientist Should Know About Floating-Point Arithmetic, ACM Computing Surveys, Vol. 23, No. 1, March 1991.
Knuth, D.E., The Art of Computer Programming (Vol 2) -- Seminumerical Algorithms, Section 4.4 (ed 3).

