Floating Point Math & Representation Intro to Computer Org. Floating Point Math & Representation
Motivation Integers are a very limited subset of the real numbers, even if we could represent all integers. Much of the time, we wish to use fractional numbers, be they rational or irrational.
A Human Perspective Consider a standard fractional number such as 105.8712. The decimal indicates the fractional component. We could try storing this one as an integer and mark the decimal place…
A Human Perspective But consider a number such as 1.058712 * 1046. This one can’t be easily represented by integers due to its magnitude. What about 1.058712 * 10-46? All of these are fairly easy to represent in our writing system.
A Human Perspective Note that the last two numbers presented were written in scientific notation. Commonly used shorthand that preserves the most significant details of any number.
Binary Scientific Notation Just as we have scientific notation in the decimal number system, we can have it in a binary system. Ex: 1.11010001 * 23 = 111010.001 = 25 + 24 + 23 + 21 + 2-3 = 32 + 16 + 8 + 2 + .125 = 58.125
Decimal -> Binary We already learned how to convert integers to binary. Divide the number by two. Take the remainder as the next digit, working right-to-left. If quotient = 0, stop. Otherwise, take the quotient as the new number, return to step 1.
Decimal -> Binary To convert the fractional part from decimal to binary: Multiply the number by two. Remove the integer part and save it as the next digit, working left-to-right. If remainder of number = 0, stop. Otherwise, take the remainder as the new number, return to step 1.
Decimal -> Binary Note that the process is almost a direct inversion of the process for whole numbers.
Decimal -> Binary 0.625 * 2 = 1.250 => .1___ 0.625 * 2 = 1.250 => .1___ 0.250 * 2 = 0.500 => .10__ 0.500 * 2 = 1.000 => .101_ 0 is left, so we’re done!
Decimal -> Binary Because we’re using base 2, some fractions become repeating. 0.2 * 2 = 0.4 => .0______ 0.4 * 2 = 0.80 => .00_____ 0.8 * 2 = 1.60 => .001____ 0.6 * 2 = 1.20 => .0011___ 0.2 * 2…
Decimal -> Binary Because we’re using base 2, some fractions become repeating. 0.2 * 2 = 0.4 => .0______ 0.4 * 2 = 0.80 => .00_____ 0.8 * 2 = 1.60 => .001____ 0.6 * 2 = 1.20 => .0011___ 0.2 * 2…
Decimal -> Binary For conversions in this class, it’s highly advised to handle each side of a decimal point separately.
Decimal -> Binary Let’s now consider 14.625. 14 => 11102. 0.625 => 0.1012. So, 14.625 => 1110.1012. Final step – normalization 14.625 = 1.1101012 * 23. Scientific notation allows only one leading digit.
Binary Scientific Notation All the basic mathematical operations operate in a similar fashion to their decimal scientific notation counterparts. Read starting in page 195 for more details on the operations. Part of section 3.6, the introduction to floating point.
Binary Scientific Notation Notable exception – sometimes division is performed by taking the reciprocal of the divisor, then performing multiplication.
Floating Point The floating point specification has three pieces. A sign bit The fractional part of the number An exponent field The entire number fits within 32 bits if single-precision 64 bits if double-precision
Floating Point The floating point specification has three pieces. A sign bit The fractional part of the number An exponent field
Floating Point – Sign Bit Floating point represents sign in a manner identical to that of the sign-magnitude representation of integers. Leading bit = 1 if negative, 0 if positive. Denotable as S in formulas, where (-1)S gives the sign of the floating-point number.
Floating Point The floating point specification has three pieces. A sign bit The fractional part of the number Also called significand. An exponent field
Floating Point - Significand With one exception, all binary numbers in scientific notation share the same leading digit – 1. Definition of the notation says leading digit must be non-zero. As a result, the leading 1 is not stored – only the parts after the “binary point” are.
Floating Point - Significand Thus, to obtain the part of the number that represents the significand, (1. + Fraction) signifies the need to prefix the bits appropriately.
Floating Point The floating point specification has three pieces. A sign bit The fractional part of the number An exponent field
Floating Point – Exponents Floating point numbers store their exponents in yet another representation – one that is biased. Take desired exponent and add the bias to it. Single-precision: Add 127 (27-1). Double-precision: Add 1023 (210-1).
Floating Point – Exponents Represent the result as an unsigned integer. Why might one wish to do this?
Floating Point – Exponents Special cases After biasing the exponent, two values indicate special conditions. The minimum possible value (zero) The maximum possible value (all ones) 255 in single-precision 2047 in double-precision The corresponding unbiased exponents are thus out-of-range.
Floating Point – Exponents Before covering the special cases, let’s take a look at standard floating-point numbers. We’ll use single-precision for our examples.
Single-Precision Floats Sign Exponent Fraction/Significand 1 bit 8 bits 23 bits Let’s take a look at 14.625. = 1.1101012 * 23. Sign = positive, so we’d represent it with a zero.
Single-Precision Floats Sign Exponent Fraction/Significand 8 bits 23 bits Let’s take a look at 14.625. = 1.1101012 * 23. Exponent (unbiased) = 3. Exponent (biased) = 3 + 127 = 130. 130 = 100000102. Sidenote – 2(8-1) – 1 = 127. Pattern holds for double-precision.
Single-Precision Floats Sign Exponent Fraction/Significand 10000010 23 bits Let’s take a look at 14.625. = 1.1101012 * 23. We ignore the leading 1. The part to the right of the binary point is 110101, so we store this.
Single-Precision Floats Sign Exponent Fraction/Significand 10000010 1101 0100 0000 0000 0000 000 Let’s take a look at 14.625. = 1.1101012 * 23. We ignore the leading 1. The part to the right of the binary point is 110101, so we store this and append zeros to the end to fill out the remaining bits. 31
Single-Precision Floats How about (136.111…) * 2-13? Let’s work this one on the board.
Floating Point - Zero One of the special cases of floating point is when all bits of the float/double are zero. A very convenient representation of zero!
Imprecision of Floating Point All floating-point numbers are stored in a scientific notation format with limited bits. This gives a tradeoff between range of numbers representable and the precision on those numbers. Single-precision gives 22 bits for the significant, giving accuracy to a little over seven decimal digits.
Imprecision of Floating Point What is the smallest possible difference between any two numbers with the same (binary) magnitude? That is, the same power of two in binary scientific notation. It depends on the power of two!
Imprecision of Floating Point Sign Exponent Fraction/Significand 3 (unbiased) 22 bits Let’s take a look at numbers between 8 and 16. (16 not included.) Unbiased exponent =3. What would be the smallest possible difference, looking at the bit patterns?
Imprecision of Floating Point Sign Exponent Fraction/Significand 3 (unbiased) 0000 0000 0000 0000 0000 01 The answer, in floating point format, is all zeros with a one at the end. Since the exponent is the same for both numbers, the difference is only in the significand. The above difference is the smallest non-zero difference possible. Warning – the above is before normalization – there is no leading 1! 37
Imprecision of Floating Point Sign Exponent Fraction/Significand 3 (unbiased) 0000 0000 0000 0000 0000 01 What would this translate to in decimal? (-1)0 * (1. + 2-22) * 23 = 2-22 + 3 = 2-19 Note that we don’t add the 1 to the fraction because we’re dealing with a difference between two floating point numbers here before normalization. 2-19 = 0.0000019073486328125, or 1.9073486328125 * 10-6. 38
Imprecision of Floating Point Note that this number is not precise to seven actual decimal places when the number is not in scientific notation! Numerical Analysis studies the consequences of discrete representations of real numbers, equations, and the like.
Other Special Cases Special case – biased exponent is all ones. Is the significand all zero bits? If so, the bit pattern represents ±∞. If not, it represents NaN. (Not a number) Useful for when illegal operations are performed, such as dividing by zero.
Other Special Cases Last special case – biased exponent is zero, but the significand is non-zero. This represents a denormalized number. Allows a gradual loss of precision at the limits of the exponent range. For more details, read section 3.6 of the text. We won’t worry about it much here.
More examples (On board) 6,706,993,152 – approximate population of the world. 1100011111100010010011000000000002. 1110001.012 + 101011.10012 Answer in single-precision + decimal 1110001.012 * 101011.10012 Answer in single-precision Answer in decimal notation