Number Systems II Prepared by Dr P Marais (Modified by D Burford)
Floating point Numbers
Fixed point numbers have very limited range (determined by bit length)
A 32-bit value can hold integers from -2^31 to 2^31 - 1 (signed), or a smaller range of fixed point fractional values
Solution: use floating point (scientific notation)
Thus a very small value can be written compactly, e.g. 9.76*10^-14
Floating point Numbers
Consists of two parts: mantissa & exponent
–Mantissa: the number multiplying the base
–Exponent: the power
The significand is the part of the mantissa after the decimal point
Floating point Numbers
Example: 9.76*10^-14
–exponent: -14
–mantissa: 9.76
–significand: 0.76
Floating point Numbers
Range is very large
Accuracy limited by significand
So, for 8 digits of precision, a value with more than 8 significant digits is cut down to an 8-digit mantissa *10^11, and we lose accuracy (truncation error)
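A minimal sketch of the truncation error, using Python's decimal module to keep only 8 significant digits. The 12-digit value is a hypothetical input chosen for illustration, not the number from the slide.

```python
from decimal import Decimal, Context, ROUND_DOWN

# Hypothetical 12-digit value (an assumption, not the slide's own example).
value = Decimal("123456789012")

# Keep only 8 significant digits and truncate (do not round) the rest.
eight_digits = Context(prec=8, rounding=ROUND_DOWN)
stored = eight_digits.plus(value)  # unary plus applies the context's precision

print(stored)          # the 8-digit mantissa with a power-of-ten exponent
print(value - stored)  # the digits that were lost: the truncation error
```

The stored value has an 8-digit mantissa and an exponent of 11, and the difference is exactly the part that no longer fits.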
Floating point Numbers
Can normalise any floating point number: 34.34*10^12 = 3.434*10^13
Shift point until only one non-zero digit is to the left
–add 1 to exponent for each left shift
–subtract 1 for each right shift
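The shifting rule can be sketched as a small loop. The function name normalise and the use of Decimal (to keep the decimal digits exact) are illustrative choices, not part of the slides.

```python
from decimal import Decimal

def normalise(mantissa, exponent):
    """Shift the point until exactly one non-zero digit is to its left."""
    if mantissa == 0:
        return mantissa, 0
    # Point moves left: mantissa shrinks by 10, exponent goes up by 1.
    while abs(mantissa) >= 10:
        mantissa = mantissa.scaleb(-1)
        exponent += 1
    # Point moves right: mantissa grows by 10, exponent goes down by 1.
    while abs(mantissa) < 1:
        mantissa = mantissa.scaleb(1)
        exponent -= 1
    return mantissa, exponent

print(normalise(Decimal("34.34"), 12))     # mantissa 3.434, exponent 13
print(normalise(Decimal("0.003434"), 0))   # mantissa 3.434, exponent -3
```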
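Floating point Numbers
Can use the same notation for binary (base of 2!)
A binary mantissa *2^-3 can be renormalised to a mantissa *2^-4, and the exponent itself can be written in binary (2's complement exponent)
For binary FP numbers, normalise to: 1.xxx…xxx*2^(yy…yy)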
Floating point Numbers
Problems with FP:
–Many different floating point formats; problems exchanging data
–FP arithmetic is not associative: in general x + (y + z) != (x + y) + z
IEEE 754 format introduced:
–single (32-bit)
–double (64-bit)
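A short demonstration of the non-associativity, assuming Python floats are IEEE 754 doubles (true on common platforms). The three values are chosen for illustration only.

```python
x, y, z = 1e20, -1e20, 1.0

left = (x + y) + z   # x and y cancel first, then z is added
right = x + (y + z)  # z is too small to change y, so it is lost

print(left)   # 1.0
print(right)  # 0.0
```

Both groupings are mathematically equal to 1, but the second loses z because it falls below the precision available next to 1e20.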
Floating point Numbers
Single precision number represented internally as
–1 sign-bit
–exponent (8 bits)
–significand (fractional part of normalised number) (23 bits)
The leading 1 of the mantissa is implied; not stored
Floating point Numbers
Double precision
–1 sign-bit
–11 bit exponent
–52 bit significand
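The field layouts can be inspected with Python's struct module. This is a sketch; fields_single and fields_double are illustrative helper names, and the exponent returned is the raw stored field.

```python
import struct

def fields_single(x):
    """Split a value, stored as a 32-bit float, into (sign, exponent, significand)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                      # 1 bit
    exponent = (bits >> 23) & 0xFF         # 8 bits, as stored
    significand = bits & ((1 << 23) - 1)   # 23 bits, leading 1 not stored
    return sign, exponent, significand

def fields_double(x):
    """Split a value, stored as a 64-bit float, into (sign, exponent, significand)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                      # 1 bit
    exponent = (bits >> 52) & 0x7FF        # 11 bits, as stored
    significand = bits & ((1 << 52) - 1)   # 52 bits, leading 1 not stored
    return sign, exponent, significand

print(fields_single(1.0))   # (0, 127, 0)
print(fields_double(1.0))   # (0, 1023, 0)
```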
Floating point Numbers
The exponent is "biased": no explicit negative number
Bias for single precision: 127, double precision: 1023
So, for single prec:
–If exponent is 128, represent as 128 + 127 = 255
–If exponent is -127, represent as -127 + 127 = 0
–Can't be symmetric, because of zero
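The bias arithmetic, written out as a sketch. The helper names to_stored and from_stored are hypothetical.

```python
BIAS_SINGLE = 127
BIAS_DOUBLE = 1023

def to_stored(exponent, bias):
    """True exponent -> unsigned value kept in the exponent field."""
    return exponent + bias

def from_stored(stored, bias):
    """Unsigned exponent field -> true exponent."""
    return stored - bias

print(format(to_stored(128, BIAS_SINGLE), "08b"))   # 11111111
print(format(to_stored(-127, BIAS_SINGLE), "08b"))  # 00000000
print(from_stored(126, BIAS_SINGLE))                # -1
```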
Floating point Numbers
Most positive exponent: 11…111, most negative: 00…000
Makes some hardware/logic easier for exponents (easy sorting/compare)
Numeric value of stored IEEE FP is actually:
(-1)^S * (1 + significand) * 2^(exponent - bias)
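The value formula can be applied directly to the three stored fields. A minimal sketch for normalised single precision numbers only (it ignores the all-zero and all-one exponent codes); decode_single is an illustrative name.

```python
def decode_single(sign, stored_exponent, significand_bits):
    """Apply (-1)^S * (1 + significand) * 2^(exponent - bias) with bias 127.

    significand_bits is a string of 23 '0'/'1' characters.
    """
    fraction = int(significand_bits, 2) / 2**23  # bits read as 0.xxx… in binary
    return (-1) ** sign * (1 + fraction) * 2.0 ** (stored_exponent - 127)

print(decode_single(0, 127, "0" * 23))        # 1.0
print(decode_single(1, 126, "1" + "0" * 22))  # -0.75
```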
Example: -0.75 to IEEE754 Single
Sign is negative: so S = 1
Binary fraction:
–0.75*2 = 1.5 (IntPart = 1)
–0.50*2 = 1.0 (IntPart = 1), so 0.75 = 0.11 (binary)
Normalise: 0.11*2^0 = 1.1*2^-1
Exponent: -1, add bias of 127 = 126 = 01111110
Answer: [1] [01111110] [100…0] (sign, 8 bits, 23 bits)
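The repeated-doubling step can be sketched in code and the final pattern checked against the machine's own encoding. This assumes the example value is -0.75 (negative sign, magnitude 0.75, as the steps indicate); fraction_to_binary is an illustrative name.

```python
import struct

def fraction_to_binary(frac, max_bits=23):
    """Double the fraction repeatedly; each integer part is the next bit."""
    bits = ""
    while frac and len(bits) < max_bits:
        frac *= 2
        int_part = int(frac)
        bits += str(int_part)
        frac -= int_part
    return bits

print(fraction_to_binary(0.75))  # 11, i.e. 0.75 = 0.11 in binary

# Check against the actual 32-bit encoding of -0.75.
raw = struct.unpack(">I", struct.pack(">f", -0.75))[0]
pattern = format(raw, "032b")
print(pattern[0], pattern[1:9], pattern[9:])
```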
What is the value of this FP num? [1] [10000001] [ ]
1. Negative number (S=1)
2. Biased exponent: 10000001 = 129; unbiased exponent = 129 - 127 = 2
3. Significand: the 23 stored bits, read as a binary fraction
Value = (-1) * (1 + significand) * 2^2
Floating point Numbers
IEEE 754 has special codes for zero, errors
–Zero: exp and significand are zero
–Infinity: exp = 11…11 (all ones), significand = 0
–NaN (not a number, e.g. 0/0): exp = 11…11 (all ones), significand != 0
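The special codes can be seen by printing the single precision bit fields of zero, infinity and NaN. A sketch; single_pattern is an illustrative helper. Python raises an exception for 0/0, so the NaN here is produced as infinity minus infinity instead.

```python
import math
import struct

def single_pattern(x):
    """Return (sign, exponent, significand) bit strings of x as a 32-bit float."""
    raw = struct.unpack(">I", struct.pack(">f", x))[0]
    s = format(raw, "032b")
    return s[0], s[1:9], s[9:]

print(single_pattern(0.0))        # exponent and significand all zero
print(single_pattern(math.inf))   # exponent all ones, significand zero
print(single_pattern(-math.inf))  # same, with the sign bit set

nan = math.inf - math.inf         # an invalid operation yields NaN
print(single_pattern(nan))        # exponent all ones, significand non-zero
```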
Range of floating point
–Single precision range (normalised): 1.0*2^-126 to (2 - 2^-23)*2^127
–Approx. 1.2*10^-38 to 3.4*10^38
–Double range (normalised): 1.0*2^-1022 to (2 - 2^-52)*2^1023
–Approx. 2.2*10^-308 to 1.8*10^308
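The limits of the normalised range can be computed from the field widths and compared with what the platform reports. A sketch, assuming Python floats are IEEE 754 doubles.

```python
import struct
import sys

# Single precision: smallest and largest normalised magnitudes.
smallest_single = 2.0 ** -126
largest_single = (2 - 2 ** -23) * 2.0 ** 127

# The same limits read back from their bit patterns.
assert struct.unpack(">f", bytes.fromhex("00800000"))[0] == smallest_single
assert struct.unpack(">f", bytes.fromhex("7f7fffff"))[0] == largest_single

# Double precision limits match the values Python reports for its floats.
smallest_double = 2.0 ** -1022
largest_double = (2 - 2 ** -52) * 2.0 ** 1023

print(smallest_single, largest_single)
print(smallest_double, largest_double)
```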
Floating point Numbers
Addition/Subtraction: normalise, match to larger exponent, then add, normalise again
Underflow/overflow conditions:
–Exponent Overflow: exponent bigger than max permissible size; may be set to "infinity"
–Exponent Underflow: negative exponent, smaller than minimum size; may be set to zero
–Significand Underflow: alignment may cause loss of significant digits
–Significand Overflow: addition may cause carry overflow; realign significands
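The addition steps can be sketched on (mantissa, exponent) pairs, where the value is mantissa * 2^exponent and the mantissa is a non-negative integer of at most 24 bits. This is a simplified model under stated assumptions: it truncates instead of rounding, handles positive operands only, and fp_add is a hypothetical name. The last two lines show exponent overflow and underflow on real doubles.

```python
PRECISION = 24  # mantissa bits, counting the leading 1 (as in single precision)

def fp_add(a, b):
    """Add two (mantissa, exponent) pairs: align, add, renormalise."""
    (ma, ea), (mb, eb) = a, b
    # 1. Match to the larger exponent.
    if ea < eb:
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    mb >>= ea - eb  # bits shifted out are lost: significand underflow
    # 2. Add the aligned mantissas.
    m, e = ma + mb, ea
    # 3. A carry beyond PRECISION bits is a significand overflow: realign.
    while m.bit_length() > PRECISION:
        m >>= 1
        e += 1
    return m, e

print(fp_add((1 << 23, 0), (1 << 23, -24)))   # small operand vanishes
print(fp_add(((1 << 24) - 1, 0), (1, 0)))     # carry forces a realignment

print(1e308 * 10)   # exponent overflow: inf
print(5e-324 / 2)   # exponent underflow: 0.0
```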