CS500 Introduction To Computer Science, Lecture 3
Floating-Point Representation
Numbers used in scientific calculations are designated by a sign, by the magnitude of the number, and by the position of the radix point. The position of the radix point is required to represent fractions, integers, or mixed integer-fraction numbers. There are two ways of specifying the position of the radix point:
1- Fixed-point representation (which we have studied up to now).
2- Floating-point representation.
Floating-Point Representation (2)
In floating-point representation, there are two ways of positioning the radix point:
1- Putting the radix point at the extreme left of the number.
2- Putting the radix point at the extreme right of the number.
According to the first way, any number can be expressed as a fraction (mantissa) and an exponent (e.g., 255.489 = 0.255489 × 10^(+3), where the exponent is positive).
According to the second way, any number can be expressed as an integer and an exponent (e.g., 255.489 = 255489.0 × 10^(-3), where the exponent is negative).
Floating-Point Representation (3)
In the decimal system, the floating-point notation (also called "scientific notation") of any real number can be written as:
(N)_10 = .F × 10^E
where:
(N)_10 is the real number in decimal,
F is a fraction (mantissa),
E is an exponent.
Example 1: The real number +6132.789 can be expressed using floating-point notation as:
+6132.789 = +.6132789 × 10^(+4), with F = 6132789 and E = +4.
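As an illustration of the .F × 10^E form, the following Python sketch moves the radix point of a decimal string to the extreme left. The function name to_fraction_form is hypothetical (not part of the lecture); the sketch relies only on decimal.Decimal.as_tuple() from the standard library.

```python
from decimal import Decimal

def to_fraction_form(text):
    """Return (sign, F, E) such that the number equals sign .F x 10**E.

    Hypothetical helper for illustration only.
    """
    sign, digits, exp = Decimal(text).as_tuple()
    fraction_digits = "".join(str(d) for d in digits)
    # Moving the radix point to the left of all digits adds len(digits)
    # to the power of ten that Decimal stores.
    exponent = len(digits) + exp
    return ("-" if sign else "+", fraction_digits, exponent)

print(to_fraction_form("+6132.789"))  # ('+', '6132789', 4)
print(to_fraction_form("255.489"))    # ('+', '255489', 3)
```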
Floating-Point Representation (4)
There are 4 different formats for representing floating-point binary numbers:
1- Signed-Magnitude Method
2- 2’s Complement Method
3- Excess Method
4- IEEE Standard Method
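Signed-Magnitude Method
Example 2: Using the 14-bit signed-magnitude method, the binary number +1001.11 can be represented with a 6-bit exponent and an 8-bit fraction (each including its own sign bit) as:
+1001.11 = +(.100111) × 2^(+4)
Therefore, the floating-point representation is:
Sign of exponent (1 bit) | Exponent (5 bits) | Sign of fraction (1 bit) | Fraction (7 bits)
0 | 00100 | 0 | 1001110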
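Signed-Magnitude Method (2)
Example 3: Using the 14-bit signed-magnitude method, the binary number -0.001101101 can be represented with an 8-bit fraction and a 6-bit exponent (each including its own sign bit) as:
-0.001101101 = -(.1101101) × 2^(-2)
Therefore, the floating-point representation is:
Sign of exponent (1 bit) | Exponent (5 bits) | Sign of fraction (1 bit) | Fraction (7 bits)
1 | 00010 | 1 | 1101101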
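The encoding steps of Examples 2 and 3 can be sketched in Python. This is a minimal sketch, not the lecture's own procedure: the function name encode_signed_magnitude is hypothetical, the standard-library math.frexp is used because it returns a mantissa m with 0.5 <= |m| < 1 (the same .1xxx normalization used above), and fraction bits beyond the field width are assumed to be truncated.

```python
import math

def encode_signed_magnitude(value, exp_bits=5, frac_bits=7):
    """Fields: exponent sign | exponent magnitude | fraction sign | fraction.

    Hypothetical helper; truncation of extra fraction bits is an assumption.
    """
    # value == mantissa * 2**exponent with 0.5 <= |mantissa| < 1
    mantissa, exponent = math.frexp(value)
    if abs(exponent) >= 2 ** exp_bits:
        raise OverflowError("exponent does not fit in the exponent field")
    fraction = int(abs(mantissa) * 2 ** frac_bits)  # keep frac_bits bits
    return " ".join([
        "1" if exponent < 0 else "0",
        format(abs(exponent), f"0{exp_bits}b"),
        "1" if value < 0 else "0",
        format(fraction, f"0{frac_bits}b"),
    ])

print(encode_signed_magnitude(9.75))          # +1001.11     -> 0 00100 0 1001110
print(encode_signed_magnitude(-0.212890625))  # -0.001101101 -> 1 00010 1 1101101
```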
Signed-Magnitude Method (3)
Example 4: Using the 14-bit signed-magnitude method with an 8-bit fraction and a 6-bit exponent, calculate the following:
(a) The allowed decimal range for signed floating-point binary numbers.
(b) The forbidden decimal range for signed floating-point binary numbers.
Solution:
(a) The allowed decimal range for signed floating-point binary numbers is the range extending from the lowest negative decimal value to the highest positive decimal value that can be represented by the method.
Signed-Magnitude Method (5)
(b) The forbidden decimal range for signed floating-point binary numbers is the range extending from the highest negative decimal value to the lowest positive decimal value that can be represented by the method.
Signed-Magnitude Method (6)
In the allowed decimal range for signed floating-point binary numbers, the lowest negative decimal value is the negative image of the highest positive decimal value.
In the forbidden decimal range for signed floating-point binary numbers, the highest negative decimal value is the negative image of the lowest positive decimal value.
2’s Complement Method
In this method, only the exponent part is expressed using 2’s complement notation. The mantissa has a single sign bit (0 for +, 1 for -); if the exponent is negative, it is stored in its 2’s complement form.
A 32-bit 2’s complement floating-point binary number with a 1-bit sign, 7-bit exponent and 24-bit fraction has the format:
Sign of fraction (1 bit) | Exponent (7 bits) | Fraction (24 bits)
Example 5: Using the 14-bit 2’s complement method, the binary number -0.001101101 can be represented with a 1-bit sign, 6-bit exponent and 7-bit fraction as:
-0.001101101 = -(.1101101) × 2^(-2)
Therefore, the floating-point representation is:
1 111110 1101101
where the leading 1 is the sign bit of the fraction and the exponent 111110 (-2) is the 2’s complement of 000010 (+2).
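Example 5 can be reproduced with a short Python sketch. The function name encode_twos_complement is hypothetical, math.frexp supplies the .1xxx normalization, and truncating extra fraction bits is an assumption rather than something the lecture states.

```python
import math

def encode_twos_complement(value, exp_bits=6, frac_bits=7):
    """Fields: fraction sign | 2's complement exponent | fraction.

    Hypothetical helper for illustration only.
    """
    mantissa, exponent = math.frexp(value)  # 0.5 <= |mantissa| < 1
    if not -(2 ** (exp_bits - 1)) <= exponent < 2 ** (exp_bits - 1):
        raise OverflowError("exponent does not fit in the exponent field")
    # Masking to exp_bits bits yields the 2's complement pattern of a
    # negative exponent and leaves a non-negative exponent unchanged.
    exp_field = exponent & (2 ** exp_bits - 1)
    fraction = int(abs(mantissa) * 2 ** frac_bits)  # truncate extra bits
    sign = "1" if value < 0 else "0"
    return f"{sign} {exp_field:0{exp_bits}b} {fraction:0{frac_bits}b}"

print(encode_twos_complement(-0.212890625))  # -0.001101101 -> 1 111110 1101101
```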
2’s Complement Method (2)
Example 6: Using the 32-bit 2’s complement representation of floating-point binary numbers with a 1-bit sign, 7-bit exponent and 24-bit fraction, find the following:
(a) The allowed decimal range for signed floating-point binary numbers.
(b) The forbidden decimal range for signed floating-point binary numbers.
Solution:
(a) The allowed decimal range:
2’s Complement Method (3)
(b) The forbidden decimal range:
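One way to work out the limits asked for in Examples 4 and 6 is sketched below. It is a sketch under stated assumptions: the highest positive value uses an all-ones fraction with the largest exponent, and the lowest positive value uses the smallest exponent. Whether that lowest value must have a normalized fraction (.1000...) or may use the smallest non-zero fraction (.000...1) is an assumption, so both are returned. The function name magnitude_limits is hypothetical.

```python
from fractions import Fraction

def magnitude_limits(e_min, e_max, frac_bits):
    """Return (highest, lowest_normalized, lowest_any) positive magnitudes.

    Hypothetical helper; the negative limits are the mirror images.
    """
    highest = (1 - Fraction(1, 2 ** frac_bits)) * Fraction(2) ** e_max
    lowest_normalized = Fraction(1, 2) * Fraction(2) ** e_min
    lowest_any = Fraction(1, 2 ** frac_bits) * Fraction(2) ** e_min
    return highest, lowest_normalized, lowest_any

# Example 4: sign + 5-bit exponent magnitude (-31..+31), 7 fraction bits.
print(magnitude_limits(-31, 31, 7))
# Example 6: 7-bit 2's complement exponent (-64..+63), 24 fraction bits.
print(magnitude_limits(-64, 63, 24))
```

Under these assumptions the allowed range runs from -highest to +highest, and the forbidden range runs between the negative and positive copies of the chosen lowest value (excluding zero itself).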
Excess Method
In this method, only the exponent part is expressed using excess notation. The fraction has a single sign bit (0 for +, 1 for -), and the exponent is converted using the excess notation.
Example 7: Using the 14-bit excess method with a 1-bit sign, 4-bit exponent and 9-bit fraction, the binary number +1001.11 can be represented as:
+1001.11 = +(.100111) × 2^(+4)
Because the exponent takes 4 bits, it is in excess-8 notation. Thus, 8 + (+4) = 12 → 1100.
Therefore, the excess-based floating-point representation is:
0 1100 100111000
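The excess encoding can be sketched in Python as follows. The function name encode_excess is hypothetical; the bias is taken as 2^(k-1) for a k-bit exponent, which matches the excess-8 notation used for 4 bits in Example 7, and extra fraction bits are assumed to be truncated.

```python
import math

def encode_excess(value, exp_bits, frac_bits):
    """Fields: fraction sign | excess-coded exponent | fraction.

    Hypothetical helper for illustration only.
    """
    bias = 2 ** (exp_bits - 1)              # excess-8 for a 4-bit exponent
    mantissa, exponent = math.frexp(value)  # 0.5 <= |mantissa| < 1
    exp_field = exponent + bias
    if not 0 <= exp_field < 2 ** exp_bits:
        raise OverflowError("exponent does not fit in the exponent field")
    fraction = int(abs(mantissa) * 2 ** frac_bits)  # truncate extra bits
    sign = "1" if value < 0 else "0"
    return f"{sign} {exp_field:0{exp_bits}b} {fraction:0{frac_bits}b}"

print(encode_excess(9.75, exp_bits=4, frac_bits=9))  # +1001.11 -> 0 1100 100111000
```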
Excess Method (2)
Example 8: Using the 14-bit excess method with a 1-bit sign, 3-bit exponent and 10-bit fraction, the binary number -0.0001011 can be represented as:
-0.0001011 = -(.1011) × 2^(-3)
Because the exponent takes 3 bits, it is in excess-4 notation. Thus, 4 + (-3) = +1 → 001.
Therefore, the excess-based floating-point representation is:
1 001 1011000000
Example 9: Using the 10-bit excess method with a 1-bit sign, 4-bit exponent and 5-bit fraction, find the following:
(a) The allowed decimal range for signed floating-point binary numbers.
(b) The forbidden decimal range for signed floating-point binary numbers.
Excess Method (3)
Solution:
(a) The allowed decimal range:
(b) The forbidden decimal range:
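Because the format of Example 9 has only 2^10 bit patterns, its limits can be checked by brute force. The sketch below decodes every pattern (sign, 4-bit excess-8 exponent, 5-bit fraction) and reports the extremes. The names are hypothetical, and the lowest positive value is reported both for any non-zero fraction and for normalized fractions only (first fraction bit 1), since which one is intended is an assumption.

```python
def decode(bits):
    """Decode a 10-bit string: sign | 4-bit excess-8 exponent | 5-bit fraction."""
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:5], 2) - 8     # remove the excess of 8
    fraction = int(bits[5:], 2) / 2 ** 5 # radix point at the extreme left
    return sign * fraction * 2.0 ** exponent

patterns = [format(n, "010b") for n in range(2 ** 10)]
values = [decode(p) for p in patterns]
normalized = [decode(p) for p in patterns if p[5] == "1"]

lowest, highest = min(values), max(values)
smallest_positive = min(v for v in values if v > 0)
smallest_normalized = min(v for v in normalized if v > 0)

print(lowest, highest)  # -124.0 124.0
print(smallest_positive, smallest_normalized)
```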
IEEE Standard Method
In this method, a single-precision floating-point number is expressed using 32 bits: a 1-bit sign, an 8-bit exponent and a 23-bit fraction.
The exponent part is expressed using 8 bits in excess-127 notation.
The fraction part is taken to be 1.f, where the leading 1 is hidden (not stored) and is restored only during conversion to decimal. E.g., if the stored fraction is .01101, then the value used is 1.01101.
If the exponent is (+4), the stored exponent is (+4) + 127 = 131 = (10000011).
IEEE Standard Method (2)
Example 10: Code the decimal value +.001275 in binary IEEE single-precision floating-point format.
Solution: Convert the fraction to binary by repeated multiplication by 2; the integer part of each product is the next bit:
Product: .00255  .0051  .0102  .0204  .0408  .0816  .1632  .3264  .6528  1.3056  0.6112  1.2224
Bit:     0       0      0      0      0      0      0      0      0      1       0       1
Stopping after 12 bits (so the binary value approximates the decimal one):
+(.001275) ≈ +(.000000000101) = +(.101) × 2^(-9) = +(1.01) × 2^(-10)
The exponent (-10) in excess-127 is -10 + 127 = 117 = (01110101).
Since the fraction part is taken to be 1.f, with the leading 1 hidden, f = .01.
Sign of fraction (1 bit) | Exponent e (8 bits) | Fraction f (23 bits)
0 | 01110101 | 01000000000000000000000
The floating-point bit pattern of (+.001275) is (0 01110101 01000000000000000000000) = (3AA00000)_16.
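The repeated-doubling conversion and the resulting bit pattern of Example 10 can be checked with the sketch below. The function name fraction_bits is hypothetical. fractions.Fraction keeps the decimal value exact, and struct.pack(">f", ...) from the standard library produces the IEEE single-precision word. Note that packing 0.001275 directly gives a different word, because the hardware format rounds to 24 significant bits rather than stopping after 12 bits as the worked solution does.

```python
from fractions import Fraction
import struct

def fraction_bits(x, nbits):
    """Binary digits of 0 <= x < 1 by repeated doubling, truncated to nbits."""
    bits = []
    for _ in range(nbits):
        x *= 2
        bit = int(x >= 1)  # integer part of the product is the next bit
        bits.append(str(bit))
        x -= bit
    return "".join(bits)

bits12 = fraction_bits(Fraction("0.001275"), 12)
truncated = Fraction(int(bits12, 2), 2 ** 12)  # value actually encoded

# Pack the truncated value as a big-endian IEEE single and read it as an integer.
word = struct.unpack(">I", struct.pack(">f", float(truncated)))[0]
full_word = struct.unpack(">I", struct.pack(">f", 0.001275))[0]

print(bits12, f"{word:08X}")  # 000000000101 3AA00000
print(word == full_word)      # False
```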
IEEE Standard Method (3)
Example 11: Decode the binary IEEE bit pattern (C3D1E000)_16 to its decimal value.
Solution: (C3D1E000)_16 = 1100 0011 1101 0001 1110 0000 0000 0000
Sign of fraction (1 bit) | Exponent e (8 bits) | Fraction f (23 bits)
1 | 10000111 | 10100011110000000000000
The sign bit is 1, so the number is negative. Since the fraction part is taken to be 1.f, with the hidden 1 restored during conversion to decimal, the significand is -1.10100011110000000000000.
e = (10000111) = 135 in excess-127; subtracting the excess gives 135 - 127 = +8.
Value = -(1.10100011110000000000000) × 2^(+8) = -(110100011.11) = -419.75
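The decoding steps of Example 11 can be sketched in Python. The function name decode_ieee_single is hypothetical, and the sketch handles only normalized numbers (stored exponent between 1 and 254); zero, denormalized numbers, infinities and NaN are outside its scope. The result is compared with struct.unpack from the standard library.

```python
import struct

def decode_ieee_single(word):
    """Decode a 32-bit IEEE single-precision word (normalized numbers only)."""
    sign = word >> 31
    exponent = (word >> 23) & 0xFF        # 8-bit excess-127 exponent
    fraction = word & 0x7FFFFF            # 23 stored fraction bits
    significand = 1 + fraction / 2 ** 23  # restore the hidden 1
    value = significand * 2.0 ** (exponent - 127)
    return -value if sign else value

value = decode_ieee_single(0xC3D1E000)
reference = struct.unpack(">f", (0xC3D1E000).to_bytes(4, "big"))[0]
print(value, reference)  # -419.75 -419.75
```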
Floating-Point Representation Questions
Example 12: Assume a binary pattern of length 14 bits is used to store signed numbers as floating-point values with 1 sign bit, 5 exponent bits, and 8 fraction bits. Find the following:
(a) Decode the bit pattern 00010110110101 into its equivalent decimal value.
(b) Code the decimal value -12.75 as a binary pattern in floating-point format.
Answer each part for each of the following floating-point representations:
1- Signed-Magnitude Method
2- 2’s Complement Method
3- Excess Method
Floating-Point Representation Questions (2)
Solution: Decode the bit pattern 00010110110101 into its equivalent decimal value when the signed-magnitude method is used.
Sign of exponent (1 bit) | Exponent (5 bits) | Sign of fraction (1 bit) | Fraction (7 bits)
0 | 00101 | 1 | 0110101
e = +(00101) = +5
f = -(.0110101)
f × 2^e = -(.0110101) × 2^(+5) = -(01101.01) = -13.25
Floating-Point Representation Questions (3)
Solution: Decode the bit pattern 00010110110101 into its equivalent decimal value when the 2’s complement method is used.
Sign of fraction (1 bit) | Exponent e (5 bits) | Fraction f (8 bits)
0 | 00101 | 10110101
e = (00101) = +5
f = +(.10110101)
f × 2^e = +(.10110101) × 2^(+5) = +(10110.101) = +22.625
Floating-Point Representation Questions (4)
Solution: Decode the bit pattern 00010110110101 into its equivalent decimal value when the excess method is used.
Sign of fraction (1 bit) | Exponent e (5 bits) | Fraction f (8 bits)
0 | 00101 | 10110101
e = (00101) in excess-16; subtracting the excess gives 5 - 16 = -11
f = +(.10110101)
f × 2^e = +(.10110101) × 2^(-11) = +(.0000000000010110101) ≈ +0.000345
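The three decodings of the pattern 00010110110101 can be compared side by side with the sketch below. The function name decode14 and the method labels are hypothetical; the field layouts follow the solutions above (signed-magnitude splits the 14 bits as 1 + 5 + 1 + 7, the other two methods as 1 + 5 + 8).

```python
def decode14(pattern, method):
    """Decode a 14-bit floating-point pattern under one of three methods."""
    bits = pattern.replace(" ", "")
    if method == "signed-magnitude":
        # exponent sign | 5-bit exponent | fraction sign | 7-bit fraction
        e_sign, e_field, f_sign, frac = bits[0], bits[1:6], bits[6], bits[7:]
        exponent = int(e_field, 2) * (-1 if e_sign == "1" else 1)
    else:
        # fraction sign | 5-bit exponent | 8-bit fraction
        f_sign, e_field, frac = bits[0], bits[1:6], bits[6:]
        raw = int(e_field, 2)
        if method == "twos-complement":
            exponent = raw - 32 if raw >= 16 else raw
        elif method == "excess":
            exponent = raw - 16  # excess-16 for a 5-bit exponent
        else:
            raise ValueError(f"unknown method: {method}")
    magnitude = int(frac, 2) / 2 ** len(frac) * 2.0 ** exponent
    return -magnitude if f_sign == "1" else magnitude

for method in ("signed-magnitude", "twos-complement", "excess"):
    print(method, decode14("00010110110101", method))
```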
Floating-Point Representation Questions (5)
Solution: Code the decimal value -12.75 as a binary pattern in floating-point format when the signed-magnitude method is used.
-12.75 = -(1100.11) = -(.110011) × 2^(+4)
The exponent magnitude 4 in 5 bits = 00100, with exponent sign 0 (+).
Sign of exponent (1 bit) | Exponent (5 bits) | Sign of fraction (1 bit) | Fraction (7 bits)
0 | 00100 | 1 | 1100110
The floating-point bit pattern of (-12.75) is 0 00100 1 1100110.
Floating-Point Representation Questions (6)
Solution: Code the decimal value -12.75 as a binary pattern in floating-point format when the 2’s complement method is used.
-12.75 = -(1100.11) = -(.110011) × 2^(+4)
(+4) = (00100) in 5-bit 2’s complement.
Sign of fraction (1 bit) | Exponent e (5 bits) | Fraction f (8 bits)
1 | 00100 | 11001100
The floating-point bit pattern of (-12.75) is 1 00100 11001100.
Floating-Point Representation Questions (7)
Solution: Code the decimal value -12.75 as a binary pattern in floating-point format when the excess method is used.
-12.75 = -(1100.11) = -(.110011) × 2^(+4)
(+4) in 5-bit excess-16 = (+4) + 16 = 20 = (10100).
Sign of fraction (1 bit) | Exponent e (5 bits) | Fraction f (8 bits)
1 | 10100 | 11001100
The floating-point bit pattern of (-12.75) is 1 10100 11001100.