COMP201 Computer Systems Floating Point Numbers
Floating Point Numbers Representations considered so far have a limited range that depends on the number of bits: 16 bits give 0 to 65,535 unsigned, or -32,768 to +32,767 in 2's complement; 32 bits give 0 to 4,294,967,295 unsigned, or -2,147,483,648 to +2,147,483,647 in 2's complement. How do we represent the following numbers? Mass of an electron ≈ 9.11 x 10^-31 kg; mass of the earth ≈ 5.97 x 10^24 kg. Both numbers have very few significant digits but lie well beyond the ranges above.
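These limits follow directly from the bit count: an n-bit unsigned integer spans 0 to 2^n - 1, and an n-bit 2's complement integer spans -2^(n-1) to 2^(n-1) - 1. For n = 16, 2^16 - 1 = 65,535 and 2^15 = 32,768; for n = 32, 2^32 - 1 = 4,294,967,295 and 2^31 = 2,147,483,648.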
Floating Point Numbers We normally use scientific notation: mass of electron ≈ 9.11 x 10^-31 kg, mass of earth ≈ 5.97 x 10^24 kg. These numbers can be split into two components: a mantissa and an exponent. e.g. 9.11 x 10^-31 kg: mantissa = 9.11, exponent = -31.
Floating Point Numbers We also need to be able to represent negative numbers. The same approach can be taken with binary numbers, i.e. a x r^e, where a is the mantissa, r is the base (radix) and e is the exponent. Notice that we are trading off precision, and possibly making computation slower, in exchange for being able to represent a much larger range of numbers.
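For example, in base 2 the value 101.101 (binary, = 5.625 decimal) can be written as 1.01101 x 2^2, so the mantissa a is 1.01101, the radix r is 2 and the exponent e is 2.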
Defining a floating point number Several things must be defined in order to specify a floating point number: the size of the mantissa, the sign of the mantissa, the size of the exponent, the sign of the exponent, and the number base (radix) in use. e.g. for 9.11 x 10^-31: the mantissa is positive, the exponent (-31) is negative, and the base is 10.
Floating Point Numbers Consider the following format using eight digits and radix 10, laid out from MSD to LSD: sign (1 digit), exponent (2 digits), mantissa (5 digits). The high-order digit is the sign digit (0 for + and 1 for -).
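For example, if the mantissa is read as a fraction with the decimal point at its left (as in the textbook examples that follow), the eight digits 0 03 24680 represent +0.24680 x 10^3 = 246.80.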
Floating Point Numbers But this has serious limitations! The range of exponents which can be represented is only 10^0 to 10^99, since just two digits are devoted to the exponent. Precision is only 5 digits. There is no provision for negative exponents.
Floating Point Numbers (Continued) One solution for the limitations of range and negative exponents is to decrease the positive exponent range to 49 by applying an offset of 50 (excess-50 notation): stored exponents 00-99 then represent true exponents -50 to +49. The range of numbers which can be expressed becomes roughly 0.00001 x 10^-50 to 0.99999 x 10^+49. Implementations often require that the most significant digit of the mantissa not be zero, which restricts the smallest nonzero magnitude to 0.10000 x 10^-50, and the all-0's pattern then represents 0.
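To illustrate the offset: a true exponent of +3 is stored as 3 + 50 = 53, and a true exponent of -6 is stored as -6 + 50 = 44; conversely, a stored exponent field of 47 means 47 - 50 = -3.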
From the text: excess-50 notation and the range of represented numbers (figure).
Examples (from textbook) Problem: convert a decimal value to the standard format. Write it as a fraction x 10^3, truncate the mantissa at 5 digits, then attach the sign digit and the excess-50 exponent to give the final 8-digit number. Problem: convert a short decimal value to FP format. Write it as a fraction times a power of 10, zero-fill the mantissa to 5 digits, and add the offset of 50 to the exponent to give the final number. NOTE: plus is represented by 0 and minus by 5 in this format.
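As an illustrative worked example (not necessarily the textbook's own numbers): 246.8035 = 0.2468035 x 10^3; truncating the mantissa to 5 digits gives 0.24680 x 10^3; the excess-50 exponent is 3 + 50 = 53, and with sign digit 0 the final number is 05324680. Similarly, -0.00000075 = -0.75000 x 10^-6; the exponent is -6 + 50 = 44 and the sign digit is 5, giving 54475000.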
Floating point in the computer Within the computer we work in base 2 and use a format such as: sign bit | exponent field | mantissa field. NOTE: Bit order depends upon the particular computer.
Numbers represented Using 32 bits to represent a number, with 1 sign bit, 8 bits for the exponent, and the remaining 23 bits for the mantissa. In order to represent negative exponents, use excess-128 notation. The range of numbers represented is approximately 10^-39 to 10^+38 (in decimal terms).
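This range follows from the exponent field: with excess-128, the 8-bit stored exponent 0-255 maps to true exponents -128 to +127, and 2^-128 ≈ 2.9 x 10^-39 while 2^127 ≈ 1.7 x 10^38, which gives the decimal range quoted above.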
But this is not the end! The precision of the mantissa can be improved from 23 bits to 24 bits by noticing that the MSB of the mantissa in a normalized binary number is always 1, and so it can be implied rather than stored directly. The added complications (see below) are felt to be a good tradeoff for the added precision. The complication is that certain numbers are too small to be normalized, and the number 0.0 cannot be represented at all! The solution is to reserve certain codes for these special cases.
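For example, 0.15625 = 0.00101 in binary = 1.01 x 2^-3; since every normalized value begins with "1.", only the fraction bits 0100...0 need to be stored and the leading 1 is implied.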
Floating point numbers – the IEEE standard IEEE Standard 754. Most (but not all) computer manufacturers use the IEEE-754 format. Number represented: (-1)^S * (1.M) * 2^(E - Bias). There are 2 main formats, single and double. Single: S (1 bit) | Exponent (8 bits) | Mantissa (23 bits), Bias = 127. Double: S (1 bit) | Exponent (11 bits) | Mantissa (52 bits), Bias = 1023.
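As a minimal sketch (not from the slides, and assuming float is the 32-bit IEEE-754 single format, as it is on virtually all modern machines), the three fields can be extracted in C by reinterpreting the float's bits; the variable names and the test value 23.0 are my own illustration:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        float f = 23.0f;                           /* arbitrary test value */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);            /* reinterpret the 32 bits */

        uint32_t sign     = bits >> 31;            /* 1 bit                      */
        uint32_t exponent = (bits >> 23) & 0xFF;   /* 8 bits, biased by 127      */
        uint32_t mantissa = bits & 0x7FFFFF;       /* 23 bits, implied leading 1 */

        /* A normalized value is (-1)^S * 1.M * 2^(E - 127). */
        printf("sign=%u exponent=%u (unbiased %d) mantissa=0x%06X\n",
               (unsigned)sign, (unsigned)exponent,
               (int)exponent - 127, (unsigned)mantissa);
        return 0;
    }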
And there are special cases to be considered: zero, represented by +/- 0; denormalized numbers, +/- 0.M x 2^-126; normalized numbers, +/- 1.M x 2^(E-127); +/- infinity; and NaN. Special condition / Exponent / Mantissa: +/- 0: exponent 0, mantissa 0; denormalized: exponent 0, mantissa not 0; normalized: exponent 1-254, mantissa any; +/- infinity: exponent 255, mantissa 0; NaN: exponent 255, mantissa not 0.
Special cases (continued) The thing to remember about the special cases is that they are SPECIAL, that is, not expected to occur. They cover numbers outside the expected range by using some unlikely-to-be-used codes: a special code for 0.0, special codes for numbers too small to be normalized, and special codes for infinity. What you are expected to remember is that these special codes exist, and that they limit the range of numbers that can be represented by the standard.
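A short C sketch (my own illustration, not from the slides, and again assuming IEEE-754 single precision) showing how the exponent and mantissa fields identify each special case:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Classify a single-precision value from its raw exponent and mantissa fields. */
    static const char *classify(float f) {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        uint32_t exponent = (bits >> 23) & 0xFF;
        uint32_t mantissa = bits & 0x7FFFFF;

        if (exponent == 0)   return (mantissa == 0) ? "+/- zero" : "denormalized";
        if (exponent == 255) return (mantissa == 0) ? "+/- infinity" : "NaN";
        return "normalized";
    }

    int main(void) {
        printf("%s\n", classify(0.0f));      /* +/- zero     */
        printf("%s\n", classify(1.0e-42f));  /* denormalized */
        printf("%s\n", classify(23.0f));     /* normalized   */
        return 0;
    }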
Final ranges represented Single precision: approximately 1.2 x 10^-38 to 3.4 x 10^+38. Similarly, double precision uses sixty-four bits and represents a range of approximately 2.2 x 10^-308 to 1.8 x 10^+308.
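These limits come from the normalized forms: the largest single-precision value is (2 - 2^-23) x 2^127 ≈ 3.4 x 10^38 and the smallest normalized value is 1.0 x 2^-126 ≈ 1.2 x 10^-38; the corresponding double-precision limits are roughly 2 x 2^1023 ≈ 1.8 x 10^308 and 2^-1022 ≈ 2.2 x 10^-308.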
Floating Point Numbers IEEE Standard 754 Example: what decimal number does the following IEEE floating point number represent? Sign = 0 (positive). Exponent = 2 (the stored field is 0x81, i.e. 129, and 129 - 127 = 2, where 127 is the bias). Mantissa = the stored fraction bits with the implied leading 1 added. Final answer: 1.M (binary) x 2^2.
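As an illustrative pattern with the same exponent field (not necessarily the slide's number): 0 10000001 01000000000000000000000 has sign 0, exponent field 0x81 = 129 (true exponent 2) and fraction 0.01 (binary), so the value is +1.01 (binary) x 2^2 = 101 (binary) = 5.0.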
More… What is a given IEEE f.p. number converted to decimal? Begin by dividing the 32 bits into the sign, exponent and mantissa (23 bits) fields. Then convert the exponent field (131 in this example) and subtract the bias (127) to get the shift (4). Add in the "implied one" to the mantissa, and shift the binary point 4 places to the right to get the final binary number, which is then read off in decimal.
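For instance (my own value, chosen to give a shift of 4): 0x41B80000 = 0 10000011 01110000000000000000000; the exponent field is 131, the shift is 131 - 127 = 4, and with the implied one the mantissa is 1.0111 (binary); shifting 4 places gives 10111 (binary) = 23.0.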
And more… What is a given decimal number converted to IEEE fp format? First, convert the number to binary. Then normalize it into 1.xxxx format (a shift of 5 in this example). Add the bias (127) to the shift and convert the total (132) to binary (1000 0100, or 0x84). Assemble the final number from the sign, exponent and mantissa fields (spaces added between fields for emphasis).
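For instance (again my own value, chosen to need a shift of 5): 40.5 = 101000.1 (binary) = 1.010001 x 2^5; the exponent field is 5 + 127 = 132 = 1000 0100 (0x84), so the assembled number is 0 10000100 01000100000000000000000, i.e. 0x42220000.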
Floating Point Applications Floating point arithmetic is very important in certain types of application, primarily scientific computation. Many applications use no floating point arithmetic at all: communications, operating systems, and most graphics.
Floating point implementation Floating point arithmetic may be implemented in software or hardware. It is much more complex than integer arithmetic. Hardware instruction implementations require more gates than the equivalent integer instructions and are often slower; software implementations are generally very slow. A CPU's floating point performance can therefore be very different from its integer performance.