CSE1301 Computer Programming Lecture 33: Real Number Representation

Uploaded: 2017-10-03T15:28:17+00:00

Topics
- Terminology
- IEEE standard for floating-point representation
- Floating-point arithmetic
- Limitations

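Some Terminology
All digits in a number following any leading zeros are significant digits. For example, 12.345, -0.12345 and 0.000012345 each have five significant digits (1, 2, 3, 4 and 5).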

Some Terminology (cont)
Lecture 34: Numerical 2
The scientific notation for real numbers is $\text{mantissa} \times \text{base}^{\text{exponent}}$. In C, the e in a floating-point constant introduces a base-10 exponent: for example, 1.5e-2 means $1.5 \times 10^{-2} = 0.015$.

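The mantissa is always normalized so that $1 \le \text{mantissa} < \text{base}$, i.e., there is exactly one nonzero digit before the point. Normalizing moves the point and adjusts the exponent to compensate: for example, the unnormalized hexadecimal mantissa B1.39FC becomes the normalized B.139FC, and the exponent of 16 increases by 1.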

Some Terminology (cont)
The precision of a number is how many digits (or bits) we use to represent it. For example, writing π as 3.14 uses 3 significant digits, while 3.14159 uses 6.

Representing Numbers
A real number n is represented by a floating-point approximation n*. The computer uses 32 bits (or more) to store each approximation. It needs to store:
- the mantissa
- the sign of the mantissa
- the exponent (with its sign)

Representing Numbers (cont)
The standard way to allocate 32 bits (specified by IEEE Standard 754) is:
- 23 bits for the mantissa (bits 22 to 0)
- 1 bit for the mantissa's sign (bit 31, the most significant bit)
- 8 bits for the exponent (bits 30 to 23)


Representing the Mantissa
The mantissa has to be in the range $1 \le \text{mantissa} < \text{base}$. Therefore, if we use base 2, the digit before the point must be a 1, so we don't need to store it. This gives us 24 bits of precision using only 23 stored bits.

Representing the Mantissa (cont)
24 bits of precision are equivalent to a little over 7 decimal digits:
$24 \log_{10} 2 \approx 24 \times 0.30103 \approx 7.22$

Representing the Mantissa (cont)
Suppose we want to represent a number with more significant digits than that, such as π = 3.14159265358979... With about 7 significant digits, we can only represent it as 3.141592 (if we truncate) or 3.141593 (if we round).

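Representing the Mantissa (cont)
Even if the computer appears to represent more than 7 decimal places, only about the first 7 significant digits are meaningful. For example:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        float pi = 2 * asin(1);   /* asin(1) = pi/2; the result is rounded to the nearest float */
        printf("%.35f\n", pi);    /* the float is promoted to double when passed to printf */
        return 0;
    }

With glibc this prints 3.14159274101257324218750000000000000 (other C libraries may print the trailing digits differently). Only 3.141592 agrees with π = 3.14159265...; the remaining digits belong to the nearest float to π, not to π itself.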

Representing the Exponent
The exponent is represented in excess-127 format: the stored value is the actual exponent plus 127, written as an 8-bit unsigned binary number. E.g.,

    Actual exponent   Stored value
    -127              00000000
    ...               ...
    0                 01111111
    ...               ...
    i                 (i + 127) in binary
    ...               ...
    +128              11111111

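Representing the Exponent (cont)
The IEEE standard restricts the exponents of normalized numbers to the range $-126 \le \text{exponent} \le +127$. The exponents -127 and +128 (stored as 00000000 and 11111111) have special meanings:
- If exponent = -127 and the mantissa bits are all 0, the stored value is 0. (With nonzero mantissa bits, the value is a subnormal number.)
- If exponent = +128 and the mantissa bits are all 0, the stored value is ∞. (With nonzero mantissa bits, the value is NaN, "not a number".)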

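Representing Numbers -- Example 1
What is 01011011 on an 8-bit machine (1 sign bit, 3 exponent bits, 4 mantissa bits)?

    sign  exp  mantissa
    0     101  1011

Sign: 0, so the number is positive.
Mantissa: $1.1011_2 = 1 + \frac{1}{2} + \frac{1}{8} + \frac{1}{16} = 1.6875$ (the leading 1 is implicit).
Exponent (excess-3 format): $101_2 = 5$, and $5 - 3 = 2$.
Value: $+1.1011_2 \times 2^{2} = 110.11_2 = 6.75$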

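Representing Numbers -- Example 2
Represent -10.375 on a 32-bit machine:
$10.375 = 8 + 2 + \frac{1}{4} + \frac{1}{8} = 1010.011_2 = 1.010011_2 \times 2^{3}$
Sign: 1 (negative)
Mantissa: 01001100000000000000000 (the 23 bits after the point; the leading 1 is not stored)
Exponent (excess-127 format): $3 + 127 = 130 = 10000010_2$
Result: 1 10000010 01001100000000000000000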

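Floating Point Overflow
Floating-point representations can overflow. E.g., doubling the largest value, $1.11\ldots1_2 \times 2^{127}$, gives $1.11\ldots1_2 \times 2^{128}$: the exponent +128 is outside the allowed range, so the result cannot be represented (IEEE 754 returns ∞).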

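Floating Point Underflow
Floating-point numbers can also get too small. E.g., $1.0_2 \times 2^{-126} \div 2 = 1.0_2 \times 2^{-127}$, whose exponent is below the allowed minimum of -126, so in the simple scheme above the result is stored as 0. (IEEE 754 softens this with subnormal numbers, which use the exponent field 00000000 with a nonzero mantissa, down to $2^{-149}$ before results become 0.)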

Floating Point Addition
Five steps to add two floating-point numbers:
1. Express the numbers with the same exponent (denormalize).
2. Add the mantissas.
3. Adjust the mantissa to one digit/bit before the point (renormalize).
4. Round or truncate to the required precision.
5. Check for overflow/underflow.

Floating Point Addition -- Example 1 (assume a precision of 4 decimal digits)
$x = \ldots \times 10^{7}$, $y = \ldots \times 10^{6}$
1. Use the same exponents: shift y's mantissa one place to the right, so that $x = \ldots \times 10^{7}$ and $y = \ldots \times 10^{7}$.
2. Add the mantissas: $x + y = \ldots \times 10^{7}$.
3. Renormalize the sum: the new mantissa has two digits before the point, so move the point one place left and add 1 to the exponent: $x + y = \ldots \times 10^{8}$.
4. Truncate or round to 4 digits: $x + y = \ldots \times 10^{8}$.
5. Check for overflow and underflow: the exponent 8 is within range, so neither occurs.

Floating Point Addition -- Example 2 (assume a precision of 4 decimal digits)
$x = \ldots \times 10^{-5}$, $y = \ldots \times 10^{-5}$
1. Use the same exponents: both numbers are written with exponent -5, $x = \ldots \times 10^{-5}$ and $y = \ldots \times 10^{-5}$.
2. Add the mantissas: $x + y = \ldots \times 10^{-5}$.
3. Renormalize the sum: the mantissas nearly cancel, leaving leading zeros, so move the point three places right and subtract 3 from the exponent: $x + y = \ldots \times 10^{-8}$.
4. Truncate or round: $x + y = \ldots \times 10^{-8}$ (no change).
5. Check for overflow and underflow: $x + y = \ldots \times 10^{-8}$ is within range.

Floating Point Multiplication
Five steps to multiply two floating-point numbers:
1. Multiply the mantissas.
2. Add the exponents.
3. Renormalize the mantissa.
4. Round or truncate to the required precision.
5. Check for overflow/underflow.

Floating Point Multiplication -- Example (assume a precision of 4 decimal digits)
$x = \ldots \times 10^{5}$, $y = \ldots \times 10^{-3}$
1 & 2. Multiply the mantissas and add the exponents: $x \times y = \ldots \times 10^{5 + (-3)} = \ldots \times 10^{2}$.
3. Renormalize the mantissa: the product of the mantissas is at least 10, so move the point one place left and add 1 to the exponent: $x \times y = \ldots \times 10^{3}$.
4. Truncate or round to 4 digits: $x \times y = \ldots \times 10^{3}$.
5. Check for overflow and underflow: the exponent 3 is within range, so neither occurs.

Limitations
Floating-point representations only approximate real numbers. The normal laws of arithmetic don't always hold; e.g., addition is not guaranteed to be associative: $(x + y) + z$ may differ from $x + (y + z)$.

Limitations -- Example (assume a precision of 4 decimal digits)
$x = \ldots \times 10^{3}$, $y = \ldots \times 10^{3}$, $z = \ldots \times 10^{0}$

Adding left to right:
- $x + y = \ldots \times 10^{0}$ (x and y nearly cancel)
- $(x + y) + z = \ldots \times 10^{0}$

Adding right to left:
- $y + z = \ldots \times 10^{3}$ (z is small compared with y, so some of z's digits are lost when the sum is cut to 4 digits)
- $x + (y + z) = \ldots \times 10^{3}$, which renormalizes to $\ldots \times 10^{0}$

Comparing the two results, $x + (y + z)$ and $(x + y) + z$ differ, so the order of the additions matters.

Limitations -- Exercise: Laws of Arithmetic
Consider the laws of arithmetic:
- Commutativity (additive and multiplicative)
- Associativity
- Distributivity
- Identity (additive and multiplicative)
Try to work out which ones always hold for floating-point numbers.

Reading (for the Very Keen)
- Goldberg, D., What Every Computer Scientist Should Know About Floating-Point Arithmetic, ACM Computing Surveys, Vol. 23, No. 1, March 1991.
- Knuth, D.E., The Art of Computer Programming (Vol. 2): Seminumerical Algorithms, Section 4.4, pp (3rd ed.).
