1
CSE1301 Computer Programming Lecture 33: Real Number Representation
2
Topics
Terminology
IEEE standard for floating-point representation
Floating point arithmetic
Limitations
3
Some Terminology
All digits in a number following any leading zeros are significant digits:
4
Some Terminology (cont)
The scientific notation for real numbers is: mantissa × base^exponent. In C, a literal written with the suffix e-2 means that the preceding mantissa is multiplied by 10^-2.
5
Some Terminology (cont)
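The mantissa is always normalized to lie between 1 and the base (i.e., exactly one significant digit before the point).
Unnormalized vs. normalized (only fragments of the slide's table survive): B1.39FC (unnormalized) becomes B.139FC (normalized); another entry ends in × 2^-3.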
6
Some Terminology (cont)
The precision of a number is how many digits (or bits) we use to represent it. For example:
7
Representing Numbers
A real number n is represented by a floating-point approximation n*. The computer uses 32 bits (or more) to store each approximation. It needs to store:
the mantissa
the sign of the mantissa
the exponent (with its sign)
8
Representing Numbers (cont)
The standard way to allocate 32 bits (specified by IEEE Standard 754) is:
23 bits for the mantissa
1 bit for the mantissa's sign
8 bits for the exponent
12
Representing the Mantissa
The mantissa has to be in the range 1 ≤ mantissa < base. Therefore, if we use base 2, the digit before the point must be a 1, so we don't have to worry about storing it. We get 24 bits of precision using 23 bits.
13
Representing the Mantissa (cont)
24 bits of precision are equivalent to a little over 7 decimal digits: 24 × log10(2) ≈ 7.2
14
Representing the Mantissa (cont)
Suppose we want to represent a number that has more than 7 significant decimal digits. That means that we can only represent it approximately: as one value if we truncate, and possibly a different one if we round.
15
Representing the Mantissa (cont)
Even if the computer appears to represent more than 7 decimal places, only the first 7 places are meaningful. For example:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* asin(1) is pi/2, so this is pi rounded to float precision */
        float pi = 2 * asin(1);

        printf("%.35f\n", pi);
        return 0;
    }

Prints out 35 decimal places, of which only about the first 7 significant digits agree with pi.
16
Representing the Exponent
The exponent is represented as excess-127. E.g.,
Actual Exponent    Stored Value
-127               00000000
...                ...
0                  01111111
i                  (i+127) in binary
+128               11111111
17
Representing the Exponent (cont)
The IEEE standard restricts exponents to the range: -126 ≤ exponent ≤ +127. The exponents -127 and +128 have special meanings: if exponent = -127, the stored value is 0; if exponent = 128, the stored value is ∞.
18
Representing Numbers -- Example 1 What is 01011011 (8-bit machine) ?
sign: 0   exp: 101   mantissa: 1011
Mantissa: 1.1011
Exponent (excess-3 format): 5 - 3 = 2
1.1011 × 2^2 = 110.11 (binary) = 6.75
19
Representing Numbers -- Example 2 Represent -10.375 (32-bit machine)
10.375 (decimal) = 1010.011 (binary) = 1.010011 × 2^3
Sign: 1
Mantissa: 01001100000000000000000
Exponent (excess-127 format): 3 + 127 = 130 = 10000010 (binary)
20
Floating Point Overflow
Floating point representations can overflow, e.g., adding two numbers with exponent 2^127 can give a sum that needs exponent 2^128, which is outside the allowed range.
21
Floating Point Underflow
Floating point numbers can also get too small, e.g., dividing a number with exponent 2^-126 can give a quotient that needs exponent 2^-127, which is stored as 0.
22
Floating Point Addition
Five steps to add two floating point numbers:
1. Express the numbers with the same exponent (denormalize)
2. Add the mantissas
3. Adjust the mantissa to one digit/bit before the point (renormalize)
4. Round or truncate to required precision
5. Check for overflow/underflow
23
Floating Point Addition -- Example 1 (Assume precision 4 decimal digits)
x = … × 10^7
y = … × 10^6
24
Floating Point Addition -- Example 1 (cont) (Assume precision 4 decimal digits)
1. Use the same exponents: x = … × 10^7, y = … × 10^7
25
Floating Point Addition -- Example 1 (cont) (Assume precision 4 decimal digits)
2. Add the mantissas: x = … × 10^7, y = … × 10^7, x+y = … × 10^7
26
Floating Point Addition -- Example 1 (cont) (Assume precision 4 decimal digits)
3. Renormalize the sum: x = … × 10^7, y = … × 10^7, x+y = … × 10^8
27
Floating Point Addition -- Example 1 (cont) (Assume precision 4 decimal digits)
4. Truncate or round: x = … × 10^7, y = … × 10^7, x+y = … × 10^8
28
Floating Point Addition -- Example 1 (cont) (Assume precision 4 decimal digits)
5. Check overflow and underflow: x = … × 10^7, y = … × 10^7, x+y = … × 10^8
29
Floating Point Addition -- Example 2 (Assume precision 4 decimal digits)
x = … × 10^-5
y = … × 10^-5
30
Floating Point Addition -- Example 2 (cont) (Assume precision 4 decimal digits)
1. Use the same exponents: x = … × 10^-5, y = … × 10^-5
31
Floating Point Addition -- Example 2 (cont) (Assume precision 4 decimal digits)
2. Add the mantissas: x = … × 10^-5, y = … × 10^-5, x+y = … × 10^-5
32
Floating Point Addition -- Example 2 (cont) (Assume precision 4 decimal digits)
3. Renormalize the sum: x = … × 10^-5, y = … × 10^-5, x+y = … × 10^-8
33
Floating Point Addition -- Example 2 (cont) (Assume precision 4 decimal digits)
4. Truncate or round: x = … × 10^-5, y = … × 10^-5, x+y = … × 10^-8 (no change)
34
Floating Point Addition -- Example 2 (cont) (Assume precision 4 decimal digits)
5. Check overflow and underflow: x = … × 10^-5, y = … × 10^-5, x+y = … × 10^-8
35
Floating Point Multiplication
Five steps to multiply two floating point numbers:
1. Multiply the mantissas
2. Add the exponents
3. Renormalize the mantissa
4. Round or truncate to required precision
5. Check for overflow/underflow
36
Floating Point Multiplication -- Example (Assume precision 4 decimal digits)
x = … × 10^5
y = … × 10^-3
37
Floating Point Multiplication -- Example (cont) (Assume precision 4 decimal digits)
1&2. Multiply mantissas and add exponents: x = … × 10^5, y = … × 10^-3, x × y = … × 10^2
38
Floating Point Multiplication -- Example (cont) (Assume precision 4 decimal digits)
3. Renormalize the mantissa: x = … × 10^5, y = … × 10^-3, x × y = … × 10^3
39
Floating Point Multiplication -- Example (cont) (Assume precision 4 decimal digits)
4. Truncate or round: x = … × 10^5, y = … × 10^-3, x × y = … × 10^3
40
Floating Point Multiplication -- Example (cont) (Assume precision 4 decimal digits)
4. Truncate or round: x = … × 10^5, y = … × 10^-3, x × y = … × 10^3
41
Floating Point Multiplication -- Example (cont) (Assume precision 4 decimal digits)
5. Check overflow and underflow: x = … × 10^5, y = … × 10^-3, x × y = … × 10^3
42
Limitations
Floating-point representations only approximate real numbers. The normal laws of arithmetic don't always hold, e.g., associativity is not guaranteed.
43
Limitations -- Example (Assume precision 4 decimal digits)
x = … × 10^3, y = … × 10^3, z = … × 10^0
44
Limitations -- Example (cont) (Assume precision 4 decimal digits)
x = … × 10^3, y = … × 10^3, z = … × 10^0
x+y = … × 10^0
45
Limitations -- Example (cont) (Assume precision 4 decimal digits)
x+y = … × 10^0, z = … × 10^0
(x+y)+z = … × 10^0
46
Limitations -- Example (cont) (Assume precision 4 decimal digits)
x = … × 10^3, y = … × 10^3, z = … × 10^0
47
Limitations -- Example (cont) (Assume precision 4 decimal digits)
x = … × 10^3, y = … × 10^3, z = … × 10^0
y+z = … × 10^3
48
Limitations -- Example (cont) (Assume precision 4 decimal digits)
x = … × 10^3, y+z = … × 10^3
x+(y+z) = … × 10^3
49
Limitations -- Example (cont) (Assume precision 4 decimal digits)
x = … × 10^3, y+z = … × 10^3
x+(y+z) = … × 10^0
50
Limitations -- Example (cont) (Assume precision 4 decimal digits)
x+(y+z) = … × 10^0
(x+y)+z = … × 10^0
51
Limitations -- Exercise
Laws of Arithmetic
Consider the laws of arithmetic:
Commutativity (additive and multiplicative)
Associativity
Distributivity
Identity (additive and multiplicative)
Try to work out which ones always hold for floating-point numbers.
52
Reading (for the Very Keen)
Goldberg, D., What Every Computer Scientist Should Know About Floating-Point Arithmetic, ACM Computing Surveys, Vol.23, No.1, March 1991
Knuth, D.E., The Art of Computer Programming (Vol 2) -- Seminumerical Algorithms, Section 4.4, pp (ed 3)