Floating Point Arithmetic – Part I

Motivation

Floating point representation and manipulation are a key aspect of computer design. FLOPS (Floating Point Operations Per Second) gives a rough performance estimate for computers that must perform precise mathematical operations. Floating point operations are inherently more complex than integer operations. In addition or subtraction, the exponents must be made equal before the operation. In multiplication the exponents are added (in division they are subtracted), and the result must then be normalized.

All floating point arithmetic can be performed by treating the individual parts of the representation (sign, exponent, and fraction) as integers. The IEEE Floating Point Standard (IEEE 754) is widely accepted and is the representation used in this lecture. A hardware implementation of floating point arithmetic provides dedicated circuits that perform the operations. A software implementation requires less hardware and performs the operations in program code.

Addition and Subtraction
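A floating point number N can be expressed as

$$N = (-1)^{s} \cdot m \cdot 2^{e}$$

The stored fields S, F, E and the values s, m, e convert as follows, where n is the number of fraction bits (n = 23 in single precision):

$$F = (m - 1)\,2^{n}, \qquad m = 1 + F / 2^{n}$$
$$E = e + 127, \qquad e = E - 127$$
$$S = s, \qquad s = S$$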

To add two floating point numbers A and B, we must first align their radix points. Let A be the number whose exponent is smaller than B's. Aligning the radix points means shifting the fraction of the number with the smaller exponent: A's exponent is incremented until it equals B's, and at the same time A's mantissa, including the hidden bit, is shifted right by the same number of positions. The mantissas of A and B are then added.
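A minimal sketch of the alignment step, assuming mantissas are held as 24-bit integers M that include the hidden bit (so the mantissa value is M / 2^23) and that bits shifted off the right end are simply truncated. The function name `align` is hypothetical.

```python
def align(EA: int, MA: int, EB: int, MB: int) -> tuple[int, int, int]:
    """Align A's radix point with B's, where A has the smaller exponent.

    MA and MB are 24-bit integer mantissas that include the hidden bit,
    so the mantissa value is M / 2**23. Bits shifted out on the right are
    dropped (truncation); guard and round bits are not modelled.
    """
    if EA > EB:
        raise ValueError("A must be the operand with the smaller exponent")
    shift = EB - EA          # how many times A's exponent is incremented
    MA >>= shift             # shift A's mantissa, hidden bit included, right
    return EB, MA, MB        # both mantissas now share exponent EB

# A = 1.5 * 2^0, B = 1.0 * 2^2
E, MA, MB = align(127, 3 << 22, 129, 1 << 23)
print(E, MA / 2**23, (MA + MB) / 2**23)   # 129 0.375 1.375
```

Here A's mantissa 1.5 becomes 0.375 after two right shifts, and the aligned sum 1.375 × 2^2 equals 5.5 = 1.5 + 4, as expected.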

Example 12.1. Note that this result is already normalized.

In general, when adding two positive mantissas, the range of the resulting mantissa is $1 \le m < 4$, since the unshifted mantissa lies in [1, 2) and the shifted one in [0, 2).
If $m < 2$, it is already normalized. If $m \ge 2$, it must be normalized. Only a single shift is required, since m can never be as large as 4. To normalize, add one to the exponent of the result and shift the mantissa right by 1 bit position.
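The normalization rule for a sum of two positive mantissas can be sketched as follows, using the same hypothetical integer encoding (M = m · 2^23, hidden bit included). The name `add_positive` is illustrative, and exponent overflow (E reaching 255) is deliberately not handled.

```python
HIDDEN = 1 << 23   # weight of the hidden bit: m = M / 2**23

def add_positive(E: int, MA: int, MB: int) -> tuple[int, int]:
    """Add two positive mantissas that are already aligned to exponent E.

    The sum satisfies 1 <= m < 4, so at most one right shift normalizes it.
    Exponent overflow (E reaching 255) is not checked in this sketch.
    """
    M = MA + MB
    if M >= 2 * HIDDEN:      # m >= 2: mantissa must be normalized
        M >>= 1              # shift right one bit position...
        E += 1               # ...and add one to the exponent
    return E, M

print(add_positive(127, 3 << 22, 3 << 22))   # 1.5 + 1.5 -> (128, 12582912), i.e. 1.5 * 2^1
```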

Example 12.2. The sum of the mantissas has integer part 10 (binary), so m ≥ 2. To normalize, add 1 to the exponent and shift the mantissa 1 bit to the right. The answer is marked as an "overflow".

The exponents can be positive or negative.
If both exponents are negative, the "smaller exponent" is the more negative one. In the biased-127 representation, the more negative exponent always has the smaller stored value E; note that E itself is unsigned. Negative mantissas can also be handled by the same algorithm: first convert the negative mantissa to 2's complement, add, and then convert the result back to sign magnitude.
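One way to sketch the 2's complement step, assuming a hypothetical 26-bit adder width (24 mantissa bits, one carry bit and one sign bit). Masking every value to that width plays the role of sign extension. The names `to_twos`, `from_twos` and `add_signed` are illustrative only.

```python
WIDTH = 26                      # assumed adder width: 24 mantissa bits + carry + sign
MASK = (1 << WIDTH) - 1

def to_twos(sign: int, M: int) -> int:
    """Sign-magnitude mantissa -> WIDTH-bit 2's complement (sign extended)."""
    return (-M if sign else M) & MASK

def from_twos(pattern: int) -> tuple[int, int]:
    """WIDTH-bit 2's complement pattern -> (sign, magnitude)."""
    if pattern >> (WIDTH - 1):          # top bit set: the value is negative
        return 1, (-pattern) & MASK
    return 0, pattern

def add_signed(sA: int, MA: int, sB: int, MB: int) -> tuple[int, int]:
    """Add two aligned sign-magnitude mantissas via 2's complement."""
    total = (to_twos(sA, MA) + to_twos(sB, MB)) & MASK   # carry out of the top bit is dropped
    return from_twos(total)

print(add_signed(0, 3 << 22, 1, 1 << 23))   # +1.5 + (-1.0) -> (0, 4194304), i.e. +0.5
print(add_signed(1, 3 << 22, 0, 1 << 23))   # -1.5 + (+1.0) -> (1, 4194304), i.e. -0.5
```

Both results have magnitude 0.5, which is below 1, so they still need to be normalized by shifting left.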

Example 12.3. Sign extend the mantissas, add them, and convert the sum back to obtain the answer.

Subtraction can be achieved by adding the additive inverse of a number: A − B = A + (−B), where −B is B with its sign bit flipped.
The exponents are aligned, the mantissas are converted to 2's complement, and the mantissas are added. The result is then normalized if necessary.
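Putting the steps together, a simplified sketch of addition and subtraction on (s, E, M) triples might look like the following. It assumes the same hypothetical encoding (M includes the hidden bit, m = M / 2^23), relies on Python's unbounded integers to stand in for 2's complement arithmetic, truncates shifted-out bits, and ignores zero results, overflow and denormals. The names `fp_add` and `fp_sub` are illustrative.

```python
HIDDEN = 1 << 23   # m = M / 2**23 for a 24-bit mantissa M (hidden bit included)

def fp_add(a: tuple[int, int, int], b: tuple[int, int, int]) -> tuple[int, int, int]:
    """Add two (s, E, M) triples: align, add signed mantissas, normalize.

    Python ints behave like 2's complement integers of unlimited width,
    so -M stands in for the 2's complement of M. Shifted-out bits are
    truncated; zero, overflow and denormals are not handled here.
    """
    if a[1] > b[1]:                  # let a be the operand with the smaller exponent
        a, b = b, a
    (sa, ea, ma), (sb, eb, mb) = a, b
    ma >>= eb - ea                   # align the radix points
    total = (-ma if sa else ma) + (-mb if sb else mb)
    s, m, e = int(total < 0), abs(total), eb   # back to sign magnitude
    if m >= 2 * HIDDEN:              # carry past the hidden bit: one right shift
        m >>= 1
        e += 1
    while 0 < m < HIDDEN:            # cancellation: shift left until normalized
        m <<= 1
        e -= 1
    return s, e, m

def fp_sub(a: tuple[int, int, int], b: tuple[int, int, int]) -> tuple[int, int, int]:
    """A - B computed as A + (-B): flip B's sign bit, then add."""
    sb, eb, mb = b
    return fp_add(a, (1 - sb, eb, mb))

five_and_a_half = (0, 129, 11534336)   # 1.375 * 2^2
one_and_a_half = (0, 127, 3 << 22)     # 1.5 * 2^0
print(fp_sub(five_and_a_half, one_and_a_half))   # (0, 129, 8388608) = 4.0
print(fp_sub(one_and_a_half, five_and_a_half))   # (1, 129, 8388608) = -4.0
```

Reusing `fp_add` for subtraction mirrors the idea in the text: only the sign of B changes, and the same align, add and normalize path does the rest.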

Example 12.4. After the mantissas are subtracted, the unnormalized result must be normalized: the mantissa is shifted left 12 positions, and 12 is subtracted from the exponent.
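Normalizing after a subtraction means shifting left until the leading 1 reaches the hidden-bit position and lowering the exponent by the same amount. A sketch, assuming a 24-bit magnitude M whose hidden bit is bit 23; the name `normalize_left` is hypothetical, and Python's built-in `int.bit_length` is used to locate the leading 1.

```python
def normalize_left(E: int, M: int, width: int = 24) -> tuple[int, int]:
    """Normalize a non-zero mantissa whose leading 1 is below the hidden bit.

    M is an unsigned magnitude with M < 2**width; the hidden-bit position
    is bit width - 1. Shift left until the leading 1 reaches it, and lower
    the exponent by the same number of positions.
    """
    if M == 0:
        raise ValueError("zero mantissa: must be detected explicitly")
    if M >= 1 << width:
        raise ValueError("mantissa too wide: needs a right shift instead")
    shift = width - M.bit_length()   # leading zeros above the first 1
    return E - shift, M << shift
```

For example, `normalize_left(140, 1 << 11)` returns `(128, 1 << 23)`: the leading 1 sits 12 positions below bit 23, so the mantissa moves left 12 places and 12 is subtracted from the exponent. These inputs are made up for illustration and are not the operands of Example 12.4.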

If the two operands are identical, subtracting them produces a mantissa of zero. No amount of shifting can move a 1 into the hidden-bit position, so this condition must be detected explicitly and the result set to E = F = 0. In subtraction, if the exponents differ by more than the precision of the mantissa (24 bits), aligning the smaller number shifts all of its bits out, so it becomes zero.
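Both special cases can be folded into a small subtraction sketch. It assumes positive operands with A ≥ B (so the difference is never negative), the same hypothetical 24-bit integer mantissas with the hidden bit included, and truncation; the name `subtract_magnitudes` is hypothetical, and results that would need a denormal exponent are not handled.

```python
PRECISION = 24   # mantissa bits, hidden bit included (single precision)

def subtract_magnitudes(EA: int, MA: int, EB: int, MB: int) -> tuple[int, int]:
    """Return (E, M) for A - B, assuming A >= B > 0 (so the result is never negative).

    (0, 0) encodes an exact zero, i.e. E = F = 0. Truncation only;
    results that would need a denormal exponent are not handled.
    """
    shift = EA - EB
    if shift > PRECISION:
        MB = 0                         # every bit of B is shifted out: B acts as zero
    else:
        MB >>= shift                   # align B with A
    M = MA - MB
    if M == 0:
        return 0, 0                    # no shift can bring a 1 into the hidden bit
    lead = PRECISION - M.bit_length()  # left shift needed to normalize
    return EA - lead, M << lead
```

The explicit zero check is what keeps the normalization loop from searching forever for a leading 1 that does not exist.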
