Binary Real Numbers
Introduction
Computers must be able to represent real numbers (numbers with fractional parts).
Two different ways:
  Fixed-point
  Floating-point
NOTE: Everything in binary uses powers of two.
Decimal Review
Digits to the right of the decimal point correspond to negative powers of 10 (1/10, 1/100, 1/1000, ...).
Binary Fractions
Digits to the right of the radix point correspond to negative powers of 2: 1/2, 1/4, 1/8, 1/16, 1/32, 1/64
Fixed Point Notation
To convert a binary fixed-point number to decimal:
1. Multiply each 1 by the corresponding power of 2
2. Add up the resulting powers of 2
Example: = ¼ = = ¼ =
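The two steps above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name `fixed_point_to_decimal` and the sample inputs are assumptions chosen for the example, and only unsigned values are handled.

```python
def fixed_point_to_decimal(bits: str) -> float:
    """Convert an unsigned binary fixed-point string such as '10.01' to a float."""
    whole, _, frac = bits.partition(".")
    total = 0.0
    # Bits left of the radix point weigh 2^0, 2^1, ... reading right to left.
    for power, bit in enumerate(reversed(whole)):
        if bit == "1":
            total += 2.0 ** power
    # Bits right of the radix point weigh 2^-1, 2^-2, ... reading left to right.
    for position, bit in enumerate(frac, start=1):
        if bit == "1":
            total += 2.0 ** -position
    return total


print(fixed_point_to_decimal("10.01"))   # 2 + 1/4
print(fixed_point_to_decimal("1100.1"))  # 8 + 4 + 1/2
```

Every weight here is a power of two, so these particular sums are exact in a Python float.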
Floating-Point Notation
Floating-point notation is essentially the computer’s way of storing a number that has been normalized.
3 different parts of any number:
  Mantissa: the normalized number
  Exponent: the power to which the base is raised
  Sign: of both the mantissa and the exponent
Decimal Example: 12.5 = .125 x 10^2 (normalized!)
Normalization Steps
1. Begin with a fixed-point number
2. Normalize the number so that the radix point (decimal point) is all the way to the left (this produces the mantissa)
3. Multiply the resulting number by the base raised to an exponent
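A minimal sketch of these steps for base 2, assuming a positive input. The name `normalize` is hypothetical. The loop shifts the radix point one place at a time, by dividing or multiplying by 2, until the mantissa lies in the range 0.5 <= mantissa < 1, which is the binary form of "radix point all the way to the left".

```python
def normalize(value: float) -> tuple[float, int]:
    """Return (mantissa, exponent) with value == mantissa * 2**exponent."""
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    mantissa, exponent = value, 0
    # Move the radix point left: each division by 2 raises the exponent by 1.
    while mantissa >= 1.0:
        mantissa /= 2.0
        exponent += 1
    # Move the radix point right for small values: each doubling lowers the exponent.
    while mantissa < 0.5:
        mantissa *= 2.0
        exponent -= 1
    return mantissa, exponent


mantissa, exponent = normalize(12.5)
print(mantissa, exponent)  # 0.78125 is .11001 in binary; the exponent is 4
```

Python's standard `math.frexp` performs the same decomposition, so it can be used to cross-check the result.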
Floating-Point Example
What is 12.5 in floating-point representation?
1. Convert 12.5 to binary fixed point: 12.5 = 1100.1
2. Normalize the number by moving the radix point, producing the mantissa: 1100.1 = .11001 * 2^4
3. Fill in the bits for each of the three parts of any real number:
  Sign (2 bits: one for the mantissa, one for the exponent)
  Mantissa (# bits varies)
  Exponent (# bits varies)
NOTE: 2’s complement may be applied to the mantissa or the exponent if either is negative
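Step 1 can be sketched as repeated doubling of the fractional part: each doubling pushes the next binary digit to the left of the radix point. The function name `to_binary_fixed_point` and the `max_frac_bits` cutoff are assumptions for this illustration, and it handles non-negative values only.

```python
def to_binary_fixed_point(value: float, max_frac_bits: int = 8) -> str:
    """Render a non-negative number as a binary fixed-point string."""
    whole = int(value)
    frac = value - whole
    digits = []
    # Doubling the fraction exposes one binary digit per iteration.
    while frac > 0 and len(digits) < max_frac_bits:
        frac *= 2
        bit = int(frac)
        digits.append(str(bit))
        frac -= bit
    whole_bits = bin(whole)[2:]  # strip the '0b' prefix
    return whole_bits + ("." + "".join(digits) if digits else "")


print(to_binary_fixed_point(12.5))  # 1100.1
```

The cutoff matters for values whose binary expansion never terminates; without it the loop would only stop once floating-point precision ran out.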
Placing the Bits
Assume you have the following:
  1 bit for the mantissa sign
  8 bits for the mantissa
  1 bit for the exponent sign
  6 bits for the exponent
Layout: S M M M M M M M M S E E E E E E
Example: .11001 * 2^4
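One way to picture the placement is the sketch below. It rests on assumptions the slide does not state: a sign bit of 0 means non-negative, the mantissa is left-aligned and padded with zeros on the right, and the exponent is stored as a magnitude next to its own sign bit. The name `place_bits` is hypothetical.

```python
MANTISSA_BITS = 8
EXPONENT_BITS = 6


def place_bits(mantissa: str, exponent: int, negative: bool = False) -> str:
    """Lay out sign, mantissa, exponent sign, and exponent as a bit string."""
    if len(mantissa) > MANTISSA_BITS or abs(exponent) >= 2 ** EXPONENT_BITS:
        raise ValueError("value does not fit in this layout")
    mantissa_sign = "1" if negative else "0"       # assumed: 0 = non-negative
    exponent_sign = "1" if exponent < 0 else "0"   # assumed: 0 = non-negative
    mantissa_field = mantissa.ljust(MANTISSA_BITS, "0")        # pad on the right
    exponent_field = format(abs(exponent), f"0{EXPONENT_BITS}b")  # pad on the left
    return " ".join([mantissa_sign, mantissa_field, exponent_sign, exponent_field])


print(place_bits("11001", 4))  # 0 11001000 0 000100
```

The mantissa is padded on the right because trailing zeros after the radix point do not change its value, while the exponent is padded on the left because it is an ordinary integer.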
Another Example
Convert -12.5 to binary:
1. Convert 12.5 to fixed point: 1100.1
2. Normalize: .11001 * 2^4
3. Convert the exponent to binary: 4 = 100
4. 2's complement the mantissa by flipping its bits and adding 1
5. Assemble the final number
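The flip-and-add-1 step can be sketched as follows. The 8-bit mantissa width is an assumption carried over from the layout used earlier in the deck, and `twos_complement` is a hypothetical helper name.

```python
def twos_complement(bits: str) -> str:
    """Flip every bit, add 1, and keep the result at the original width."""
    width = len(bits)
    flipped = "".join("1" if b == "0" else "0" for b in bits)
    # The modulo discards any carry out of the top bit.
    value = (int(flipped, 2) + 1) % (1 << width)
    return format(value, f"0{width}b")


mantissa = "11001000"             # .11001 padded to 8 bits (assumed width)
print(twos_complement(mantissa))  # 00111000
```

Applying the operation twice returns the original bits, which is a quick sanity check on any hand-worked result.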
Upper & Lower Bounds
Assume you have the following:
  1 bit for the mantissa sign
  8 bits for the mantissa
  1 bit for the exponent sign
  6 bits for the exponent
What is the upper bound for the floating-point number?
What is the lower bound for the floating-point number?
What happens if we convert a floating-point number to an integer?
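One possible reading of these questions, sketched under stated assumptions: the mantissa and exponent fields hold plain magnitudes with separate sign bits, so the largest value uses an all-ones mantissa and an all-ones positive exponent. Under that sign-and-magnitude reading the most negative value is simply its negation; a format that stores negatives in 2's complement would differ slightly. The last lines show what Python's `int()` does to a floating-point value.

```python
from fractions import Fraction

MANTISSA_BITS = 8
EXPONENT_BITS = 6

# All-ones mantissa: .11111111 in binary, which is 255/256.
largest_mantissa = Fraction(2**MANTISSA_BITS - 1, 2**MANTISSA_BITS)
# All-ones exponent: 111111 in binary, which is 63.
largest_exponent = 2**EXPONENT_BITS - 1

upper_bound = largest_mantissa * 2**largest_exponent
lower_bound = -upper_bound  # assumption: sign-and-magnitude mantissa

print(upper_bound)
print(lower_bound)

# Converting to an integer discards the fractional part (truncation toward zero).
print(int(12.5))   # 12
print(int(-12.5))  # -12
```

`Fraction` is used so the bound is computed exactly rather than through the host machine's own floating-point format.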
Integers vs. Floating-point
Integers:
  smaller range than floating-point
  all numbers within the range are 100% accurate
Floating-point:
  large range of numbers
  not all numbers within the range can be represented accurately
  Example: a repeating fraction
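The contrast can be observed directly in Python, whose `float` is a 64-bit binary floating-point type on typical platforms (an assumption about the host, not something the slides state). Decimal 0.1 is a convenient test value because its binary expansion repeats forever, so only an approximation is stored.

```python
from fractions import Fraction

# Integer arithmetic is exact.
big = 2**53
print(big + 1 - big)  # 1

# The same step in floating point is lost: 2**53 + 1 has no exact float form.
print(float(big) + 1.0 == float(big))  # True

# 0.1 repeats in binary, so the stored value is only close to 1/10.
print(Fraction(0.1) == Fraction(1, 10))  # False
print(0.1 + 0.2 == 0.3)                  # False
```

`Fraction(0.1)` exposes the exact value the float actually holds, which makes the gap between the intended and stored numbers visible.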
Possible Errors
Truncation error and round-off error:
  arise when using floating-point numbers, because not all real numbers can be represented accurately
Overflow error:
  attempting to represent a number that is greater than the upper bound for the given number of bits
Underflow error:
  attempting to represent a number that is less than the lower bound for the given number of bits
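These errors can be provoked on purpose. The sketch below assumes Python floats are IEEE 754 doubles, which holds on typical platforms. In that format, underflow means a magnitude too close to zero to represent, which may not match the slide's "lower bound" wording exactly.

```python
import math
import sys

# Round-off: ten copies of 0.1 do not add up to exactly 1.0.
total = sum([0.1] * 10)
print(total == 1.0)  # False

# Truncation: converting to an integer drops the fractional part.
print(int(12.9))  # 12

# Overflow: going past the largest double yields infinity.
overflowed = sys.float_info.max * 2
print(math.isinf(overflowed))  # True

# Underflow: halving the smallest positive double rounds to zero.
smallest = 5e-324
print(smallest / 2 == 0.0)  # True
```

Python reports neither the overflow nor the underflow as an exception here, so code that must detect them has to check the result explicitly.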