CENG536 Computer Engineering department Çankaya University
CENG Spring Dr. Yuriy ALYEKSYEYENKOV
The problem with fixed-point representation is illustrated by two examples, a small number x and a large number y. The relative representation error due to truncation is quite significant for x, while it is much less severe for y. On the other hand, both $x^2$ and $y^2$ are unrepresentable, because their computation leads to underflow (number too small) and overflow (number too large), respectively.
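The underflow and overflow of the squares can be reproduced with a minimal sketch. The 16-bit format (8 integer bits, 8 fraction bits), the helper `to_fixed`, and the sample values 9/256 and 144 are assumptions chosen for illustration; they are not taken from the slide.

```python
import math
from fractions import Fraction

INT_BITS, FRAC_BITS = 8, 8  # assumed unsigned fixed-point format, 8.8 bits


def to_fixed(value: Fraction) -> int:
    """Return the raw fixed-point integer for a non-negative value, truncating extra bits."""
    raw = math.trunc(value * 2**FRAC_BITS)  # truncation, as on the slide
    if raw >= 2 ** (INT_BITS + FRAC_BITS):
        raise OverflowError("overflow: number too large for the format")
    if raw == 0 and value != 0:
        raise ArithmeticError("underflow: number too small for the format")
    return raw


x = Fraction(9, 256)  # assumed small value, 0000 0000.0000 1001 in binary
y = Fraction(144)     # assumed large value, 1001 0000.0000 0000 in binary

print(to_fixed(x), to_fixed(y))  # both operands fit in the format
for name, square in (("x^2", x * x), ("y^2", y * y)):
    try:
        to_fixed(square)
    except ArithmeticError as err:  # OverflowError is a subclass of ArithmeticError
        print(name, "->", err)
```

Both operands fit, but neither square does: one falls below the smallest nonzero step of the format and the other exceeds its largest value.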
These numbers can be represented with an explicit exponent, as a significand multiplied by a power of 2. The exponent, -5 or +7, essentially indicates the direction and amount by which the radix point must be moved to produce the corresponding fixed-point representation shown above. Hence the designation "floating-point numbers".
A floating-point number has four components: the sign, the significand (mantissa) s, the exponent base b, and the exponent e, giving the value $x = \pm s \times b^{e}$. The exponent base is usually a power of two, except for decimal arithmetic, where it is 10.
A typical floating-point format. A key point to observe is that two signs are involved in a floating-point number: the sign of the number itself (the significand sign) and the sign of the exponent.
The use of a biased exponent format has virtually no effect on the speed or cost of exponent arithmetic (addition / subtraction), given the small number of bits involved. It does, however, facilitate zero detection (zero can be represented with the smallest biased exponent of 0 and an all-zero significand) and magnitude comparison (we can compare normalized floating-point numbers as if they were integers).
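The magnitude-comparison property can be checked with a short sketch. It assumes the IEEE 754 single precision layout and uses Python's standard `struct` module to reinterpret each value as a 32-bit unsigned integer; the helper name `float_bits` is hypothetical.

```python
import struct


def float_bits(x: float) -> int:
    """Return the 32-bit pattern of x, rounded to single precision, as an unsigned integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits


# Non-negative values in increasing order.
values = [0.0, 1.5e-5, 0.75, 1.0, 3.25, 1.0e10]
patterns = [float_bits(v) for v in values]

# With a biased exponent stored above the significand, the integer order of
# the bit patterns matches the numeric order of the values.
print(patterns == sorted(patterns))
print(float_bits(0.0))  # zero is the all-zero pattern
```

The check covers non-negative values only; with a signed-magnitude sign bit, negative numbers need the comparison reversed.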
The range of values in a floating-point number representation is composed of the intervals [-max, -min] and [min, max].
Number distribution pattern and subranges in floating-point representations. There are three special or singular values: $-\infty$, 0, and $+\infty$. Zero is special because it cannot be represented with a normalized mantissa (significand).
Overflow occurs when a result is less than -max or greater than +max. Underflow, on the other hand, occurs for results in the range (-min, 0) or (0, min).
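The overflow and underflow regions can be sketched by classifying an exact result against max and min. As an assumption, the sketch takes max and min from Python's own double precision floats (`sys.float_info`) and uses exact rationals so that the classification is not disturbed by rounding; `classify` is a hypothetical helper.

```python
import sys
from fractions import Fraction

FMAX = Fraction(sys.float_info.max)  # largest finite double
FMIN = Fraction(sys.float_info.min)  # smallest positive normalized double


def classify(exact: Fraction) -> str:
    """Place an exact result in the zero, underflow, in-range, or overflow region."""
    magnitude = abs(exact)
    if magnitude == 0:
        return "zero"
    if magnitude > FMAX:
        return "overflow"
    if magnitude < FMIN:
        return "underflow"
    return "in range"


x = Fraction(1, 10**200)
y = Fraction(10**200)
print(classify(x * x), classify(y * y), classify(x * y))
# prints: underflow overflow in range
```

"In range" here only means that the magnitude lies in [min, max]; the value may still need rounding to the nearest representable number.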
The equation for the value of a floating-point number suggests that the range [-max, max] increases if we choose a larger exponent base b. A larger b also simplifies arithmetic operations on the exponents, since for a given range, smaller exponents must be dealt with. However, if the significand is to be kept in normalized form, effective precision decreases for larger b. In the past, machines with b = 2, 8, 16, or 256 were built.
The exponent sign is almost always encoded in a biased format. As for the sign of a floating-point number, alternatives to the currently dominant signed-magnitude format include the use of 1's- or 2's-complement representation. Several variations have been tried in the past, including the complementation of the significand part only and the complementation of the entire number (including the exponent part) when the number to be represented is negative.
The two representation formats in the IEEE standard for binary floating-point numbers (ANSI/IEEE Std 754), single precision (32 bits) and double precision (64 bits), are depicted:
The standard defines extended formats that allow implementations to carry higher precision internally to reduce the effect of accumulated errors. Two extended formats are defined: single extended and double extended.
Value: $N = (-1)^{S} \times 2^{E-127} \times (1.M)$
The decimal number 0.75 is to be represented in the IEEE 754 single precision format:
$0.75 = 0.11_2$ (converted to a binary number) $= 1.1_2 \times 2^{-1}$ (normalized binary number; the leading 1 is hidden)
The mantissa is positive, so the sign S is given by S = 0.
The biased exponent E is given by $E = e + 127 = -1 + 127 = 126 = 01111110_2$.
Fractional part of the mantissa: M = .1000…000 (in 23 bits)
The IEEE 754 single precision representation is given by:
Sign (1 bit) | Exponent (8 bits) | Mantissa (23 bits)
0 | 01111110 | 10000000000000000000000
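The encoding of $1.1_2 \times 2^{-1} = 0.75$ can be cross-checked with a short sketch that extracts the three fields from the 32-bit pattern. It relies on Python's standard `struct` module; the helper name `ieee754_single_fields` is hypothetical.

```python
import struct


def ieee754_single_fields(x: float) -> tuple[int, int, int]:
    """Split x, rounded to single precision, into (sign, biased exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 bits, hidden 1 not stored
    return sign, exponent, fraction


sign, exponent, fraction = ieee754_single_fields(0.75)
print(sign, format(exponent, "08b"), format(fraction, "023b"))
# prints: 0 01111110 10000000000000000000000
```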
A negative decimal number is to be represented in the IEEE 754 single precision format. Converted to binary and normalized, it takes the form $-(1.M) \times 2^{11}$, where the leading 1 is hidden.
The mantissa is negative, so the sign S is given by S = 1.
The biased exponent E is given by $E = e + 127 = 11 + 127 = 138 = 10001010_2$.
The fractional part of the mantissa, M, is stored in 23 bits.
The IEEE 754 single precision representation is given by:
Sign (1 bit) | Exponent (8 bits) | Mantissa (23 bits)
Basic arithmetic on floating-point numbers is conceptually simple. However, care must be taken in hardware implementation to ensure correctness and avoid undue loss of precision; in addition, it must be possible to handle any exceptions. Addition and subtraction are the most difficult of the elementary operations for floating-point operands. Here, we deal only with addition, since subtraction can be converted to addition by flipping the sign of the subtrahend.
Consider the addition $(\pm s_1 \times b^{e_1}) + (\pm s_2 \times b^{e_2}) = \pm s \times b^{e}$. Assuming $e_1 \ge e_2$, we begin by aligning the two operands through right-shifting the significand (mantissa) of the number with the smaller exponent.
If the exponent base b and the number representation radix r are the same, we simply shift $s_2$ to the right by $e_1 - e_2$ digits. When $b = r^{a}$, the shift amount, which is computed through direct subtraction of the biased exponents, is multiplied by a. In either case, this step is referred to as the alignment shift, or preshift (in contrast to the normalization shift, or postshift, which is needed when the resulting significand s is unnormalized).
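We then perform the addition as follows:
$(\pm s_1 \times b^{e_1}) + (\pm s_2 \times b^{e_2}) = (\pm s_1 \times b^{e_1}) + \left(\pm \frac{s_2}{b^{e_1 - e_2}}\right) \times b^{e_1} = \left(\pm s_1 \pm \frac{s_2}{b^{e_1 - e_2}}\right) \times b^{e_1} = \pm s \times b^{e}$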
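The preshift, add, and postshift steps can be sketched for b = 2 as follows. The toy format is an assumption made for illustration: a number is a pair `(sig, exp)` with value `sig * 2**exp`, where `sig` is a signed integer whose magnitude has exactly `P = 8` bits when normalized. Bits shifted out are truncated, and rounding and exponent range checks are omitted.

```python
P = 8  # significand width in bits, including the leading 1 (assumed toy format)


def fp_add(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two toy floating-point numbers (sig, exp) with value sig * 2**exp."""
    (s1, e1), (s2, e2) = a, b
    if s1 == 0:
        return b
    if s2 == 0:
        return a
    if e1 < e2:  # make operand 1 the one with the larger exponent
        (s1, e1), (s2, e2) = (s2, e2), (s1, e1)

    # Alignment shift (preshift): move s2 right by e1 - e2 bit positions.
    mag2 = abs(s2) >> (e1 - e2)
    s = s1 + (mag2 if s2 >= 0 else -mag2)
    e = e1
    if s == 0:
        return (0, 0)

    # Normalization shift (postshift).
    sign, mag = (-1, -s) if s < 0 else (1, s)
    if mag >= 1 << P:            # carry-out: one right shift is enough
        mag >>= 1
        e += 1
    while mag < 1 << (P - 1):    # cancellation: shift left as far as needed
        mag <<= 1
        e -= 1
    return (sign * mag, e)


# 0.75 + 0.125 = 0.875, i.e. 192 * 2**-8 + 128 * 2**-10 = 224 * 2**-8
print(fp_add((192, -8), (128, -10)))  # prints: (224, -8)
```

The carry-out case needs at most a one-bit right shift, while cancellation between operands of unlike signs can require a left shift of many positions.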
Floating-point multiplication is simpler than floating-point addition; it is performed by multiplying the significands and adding the exponents:
$(\pm s_1 \times b^{e_1}) \times (\pm s_2 \times b^{e_2}) = \pm (s_1 \times s_2) \times b^{e_1 + e_2}$
Postshifting may be needed, since the product $s_1 \times s_2$ of the two significands can be unnormalized. For example, with IEEE significands in the range [1, 2), we have $1 \le s_1 \times s_2 < 4$, leading to the possible need for a single-bit right shift. Also, the computed exponent needs adjustment if the exponents are biased or if a normalization shift is performed. Overflow/underflow is possible during multiplication if $e_1$ and $e_2$ have like signs. Overflow is also possible due to normalization.
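The multiply-and-postshift step can be sketched for b = 2. As an assumption for illustration, a number is a triple `(sign, s, e)` with value $(-1)^{sign} \times s \times 2^{e}$, unbiased exponent, and significand s in [1, 2) held as an exact `Fraction`, so rounding and range checks are left out.

```python
from fractions import Fraction

Number = tuple[int, Fraction, int]  # (sign bit, significand in [1, 2), exponent)


def fp_mul(a: Number, b: Number) -> Number:
    """Multiply two toy floating-point numbers and renormalize the product."""
    (sign1, s1, e1), (sign2, s2, e2) = a, b
    sign = sign1 ^ sign2   # product is negative when the signs differ
    s = s1 * s2            # 1 <= s < 4
    e = e1 + e2
    if s >= 2:             # single-bit right shift restores 1 <= s < 2
        s /= 2
        e += 1
    return (sign, s, e)


# 0.75 * 3.0 = 2.25, i.e. (1.5 * 2**-1) * (1.5 * 2**1) = 1.125 * 2**1
print(fp_mul((0, Fraction(3, 2), -1), (0, Fraction(3, 2), 1)))
```

With biased exponents, adding the two stored exponents counts the bias twice, so one bias must be subtracted from the sum.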
Similarly, floating-point division is performed by dividing the significands and subtracting the exponents:
$(\pm s_1 \times b^{e_1}) / (\pm s_2 \times b^{e_2}) = \pm (s_1 / s_2) \times b^{e_1 - e_2}$
Here, the problems to be dealt with are similar to those of multiplication. The ratio of the significands may have to be normalized. For example, with IEEE significands in the range [1, 2), we have $1/2 < s_1 / s_2 < 2$, and a single-bit left shift is always adequate. The computed exponent needs adjustment if the exponents are biased or if a normalizing shift is performed. Overflow/underflow is possible during division if $e_1$ and $e_2$ have unlike signs. Underflow due to normalization is also possible.
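The divide-and-postshift step can be sketched in the same spirit for b = 2. The triple `(sign, s, e)` with an exact `Fraction` significand in [1, 2) and an unbiased exponent is an assumed toy format; rounding, range checks, and division by zero are not handled.

```python
from fractions import Fraction

Number = tuple[int, Fraction, int]  # (sign bit, significand in [1, 2), exponent)


def fp_div(a: Number, b: Number) -> Number:
    """Divide two toy floating-point numbers and renormalize the quotient."""
    (sign1, s1, e1), (sign2, s2, e2) = a, b
    sign = sign1 ^ sign2   # quotient is negative when the signs differ
    s = s1 / s2            # 1/2 < s < 2
    e = e1 - e2
    if s < 1:              # single-bit left shift restores 1 <= s < 2
        s *= 2
        e -= 1
    return (sign, s, e)


# 1.0 / 1.5 = 2/3, i.e. (4/3) * 2**-1 after the normalizing left shift
print(fp_div((0, Fraction(1), 0), (0, Fraction(3, 2), 0)))
```

With biased exponents, subtracting the two stored exponents cancels the bias, so it must be added back to the difference.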
To extract the square root of a positive floating-point number, we first make its exponent even. This may require subtracting 1 from the exponent and multiplying the significand by b. We then use the following:
$\sqrt{s \times b^{e}} = \sqrt{s} \times b^{e/2}$
In the case of IEEE floating-point numbers, the adjusted significand will be in the range $1 \le s < 4$, which leads directly to a normalized significand for the result. Square-rooting never produces overflow or underflow.
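The exponent adjustment can be sketched for b = 2. As an assumption for illustration, the operand is given as a pair `(s, e)` with significand s in [1, 2) and an unbiased integer exponent, and the significand root is delegated to `math.sqrt` rather than computed digit by digit.

```python
import math


def fp_sqrt(s: float, e: int) -> tuple[float, int]:
    """Square root of s * 2**e for 1 <= s < 2; returns (significand, exponent)."""
    if e % 2 != 0:   # odd exponent: make it even
        s *= 2.0     # adjusted significand now lies in [1, 4)
        e -= 1
    # The root of a value in [1, 4) lies in [1, 2), so it is already normalized.
    return math.sqrt(s), e // 2


print(fp_sqrt(1.125, 1))  # sqrt(2.25) = 1.5 * 2**0, prints: (1.5, 0)
```

Because the result exponent is half the operand exponent, it always stays within the operand's exponent range, which is why no overflow or underflow can occur.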