Dr Damian Conway Room 132 Building 26

Uploaded: 2017-12-23T09:26:36+00:00

Real Number Representation (Lecture 25 of the Introduction to Computer Programming series) Dr Damian Conway Room 132 Building 26

Some Terminology
All digits in a number following any leading zeros are significant digits. For example, 0.00123 has three significant digits (1, 2 and 3).

The scientific notation for real numbers is: $\text{mantissa} \times \text{base}^{\text{exponent}}$

The mantissa is always normalized so that $1 \le \text{mantissa} < \text{base}$ (i.e. exactly one significant digit before the point). For example, in hexadecimal the mantissa B.139FC is normalized, whereas B1.39FC is unnormalized.

The precision of a number is how many digits (or bits) we use to represent it. For example, 3.142 represents π with a precision of four decimal digits.

Representing numbers
A real number $n$ is represented by a floating-point approximation $n^*$. The computer uses 32 bits (or more) to store each approximation. It needs to store the mantissa, the sign of the mantissa, and the exponent (with its sign).

So it has to allocate some of its 32 bits to each task. The standard way to do this (specified by IEEE Standard 754) is:
- 23 bits for the mantissa;
- 1 bit for the mantissa's sign (i.e. the mantissa is stored in signed-magnitude form);
- the remaining 8 bits for the exponent.

Representing the mantissa
Since the mantissa has to be in the range 1 ≤ mantissa < base, if we use base 2 the digit before the binary point has to be a 1. So we don't have to store it at all! That way we get 24 bits of precision using only 23 bits.

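Those 24 bits of precision are equivalent to a little over 7 decimal digits, since each bit is worth $\log_{10} 2 \approx 0.301$ decimal digits: $24 \times \log_{10} 2 \approx 7.22$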

Suppose we want to represent : That means that we can only represent it as: (if we truncate) (if we round)

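Even if the computer appears to give you more than seven significant digits, only the first seven or so are meaningful. For example:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        float pi = 2 * asin(1);   /* asin(1) is pi/2, so this is pi rounded to a float */
        printf("%.35f\n", pi);    /* ask for far more decimal places than a float holds */
        return 0;
    }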

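On my machine this prints out a long string of digits, but only the leading ones mean anything. On an IEEE 754 machine the float nearest to π is exactly 3.1415927410125732421875, which matches π (3.14159265...) in only its first seven significant digits.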

Representing the exponent
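The exponent is represented as an excess-127 number: the 8-bit pattern stored is the true exponent plus 127. That is:

    00000000  represents  −127
    00000001  represents  −126
       ...
    01111111  represents     0
    10000000  represents    +1
       ...
    11111111  represents  +128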

Representing the exponent
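However, the IEEE standard restricts the exponents of normalized numbers to the range $-126 \le \text{exponent} \le +127$. The exponents −127 and +128 (stored as 00000000 and 11111111) have special meanings: basically, zero and infinity respectively (the standard also uses them for denormalized numbers and for NaN, "not a number").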

Floating point overflow
Just like the integer representations in the previous lecture, floating point representations can overflow: if a result would need an exponent larger than +127, it cannot be represented, and the result becomes ∞.

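Floating point underflow
But floating point numbers can also get too small: dividing a very small number by a large one can give a result whose exponent is below the smallest allowed exponent. Such a result underflows, losing precision and, if it is small enough, becoming zero.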

Floating point addition
Five steps to add two floating point numbers:
1. Express them with the same exponent (denormalize).
2. Add the mantissas.
3. Adjust the mantissa to one digit/bit before the point (renormalize).
4. Round or truncate to the required precision.
5. Check for overflow/underflow.

Floating point addition example
$x = \ldots \times 10^{7}$, $y = \ldots \times 10^{6}$

1. Same exponents: $x = \ldots \times 10^{7}$, $y = \ldots \times 10^{7}$
2. Add mantissas: $x + y = \ldots \times 10^{7}$
3. Renormalize sum: $x + y = \ldots \times 10^{8}$
4. Truncate or round: $x + y = \ldots \times 10^{8}$
5. Check overflow and underflow: $x + y = \ldots \times 10^{8}$

Floating point addition example 2
$x = \ldots \times 10^{-5}$, $y = \ldots \times 10^{-5}$

1. Same exponents: $x = \ldots \times 10^{-5}$, $y = \ldots \times 10^{-5}$
2. Add mantissas: $x + y = \ldots \times 10^{-5}$
3. Renormalize sum: $x + y = \ldots \times 10^{-8}$
4. Truncate or round: $x + y = \ldots \times 10^{-8}$ (no change)
5. Check overflow and underflow: $x + y = \ldots \times 10^{-8}$

Question: renormalizing shifted the mantissa of $x + y$ three places left, filling it with trailing zeroes. Should we believe these zeroes?

Floating point multiplication
Five steps to multiply two floating point numbers:
1. Multiply the mantissas.
2. Add the exponents.
3. Renormalize the mantissa.
4. Round or truncate to the required precision.
5. Check for overflow/underflow.

Floating point multiplication example
$x = \ldots \times 10^{5}$, $y = \ldots \times 10^{-3}$

1 & 2. Multiply mantissas/add exponents: $x \times y = \ldots \times 10^{2}$
3. Renormalize product: $x \times y = \ldots \times 10^{3}$
4. Truncate or round: $x \times y = \ldots \times 10^{3}$
5. Check overflow and underflow: $x \times y = \ldots \times 10^{3}$

Limitations
Floating-point representations only approximate real numbers. The normal laws of arithmetic don't always hold (they fail even more often than for integer representations). For example, associativity is not guaranteed:

Limitations x =  103 y =  103 z =  100

Limitations x = 3.002  103 x+y = 2.000  100 y = -3.000  103
z =  100

Limitations x = 3.002  103 x+y = 2.000  100 y = -3.000  103
(x+y)+z =  100 z =  100

Limitations x =  103 y =  103 z =  100

Limitations x = 3.002  103 y = -3.000  103 y+z = -2.993  103

Limitations x = 3.002  103 x+(y+z) = 0.009  103 y = -3.000  103

Limitations x = 3.002  103 x+(y+z) = 9.000  100 y = -3.000  103

Limitations
Consider the other laws of arithmetic:
- Commutativity (additive and multiplicative)
- Associativity
- Distributivity
- Identity (additive and multiplicative)

Spend some time working out which ones (if any!) always hold for floating-point numbers.

Reading (for the very keen)
Goldberg, D., "What Every Computer Scientist Should Know About Floating-Point Arithmetic", ACM Computing Surveys, Vol. 23, No. 1, March 1991.
