Faculty of Computer Science © 2006 CMPUT 229 Floating Point Representation Operating with Real Numbers.

© 2006 Department of Computing Science CMPUT 229
Reading Material
- This set of slides is based on the texts by Patt and Patel and by Patterson and Hennessy.
- The topics covered in these slides are presented in Section 4.9 of Clements’ textbook.

Representing Large and Small Numbers
How would you represent a number such as $6.023 \times 10^{23}$ in binary? The range ($10^{23}$) of this number is greater than the range of the 32-bit representation that we have used for integers ($2^{31} \approx 2.15 \times 10^{9}$). However, the precision (6023) of this number is quite small, and it can be expressed in a small number of bits. The solution is to use a floating point representation, which allocates some bits for the range of the value, some bits for its precision, and one bit for its sign. (Patt and Patel, p. 32)
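A quick check of the numbers on this slide, written as a Python sketch. It assumes that Python's `struct` module with the `'>f'` format is an acceptable model of a 32-bit IEEE 754 float: packing rounds a Python float (a 64-bit double) to the nearest single precision value. The helper name `to_float32` is hypothetical.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value of a 32-bit two's-complement integer

def to_float32(x: float) -> float:
    """Round x to the nearest 32-bit IEEE 754 float and return it as a Python float."""
    return struct.unpack('>f', struct.pack('>f', x))[0]

avogadro = 6.023e23
print(avogadro > INT32_MAX)   # True: far outside the 32-bit integer range
stored = to_float32(avogadro)
print(f"{stored:.4g}")        # 6.023e+23: the four significant digits survive
```

A 32-bit float keeps roughly seven significant decimal digits, which is more than enough for the four digits of 6.023.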

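Floating Point Representation
The standard IEEE 754 32-bit floating point representation uses 1 bit for the sign (positive or negative), 8 bits for the range (exponent field), and 23 bits for the precision (fraction field). From the most significant bit down, the layout is: S (1 bit) | exponent (8 bits) | fraction (23 bits). For a normalized number, the value is $(-1)^{S} \times 1.\mathit{fraction} \times 2^{\mathit{exponent} - 127}$. (Patt and Patel, p. 33)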
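The field layout can be made concrete with a short Python sketch that stores a value as a 32-bit float and then extracts the three fields from the raw bit pattern with shifts and masks. It assumes `struct`'s big-endian `'>f'` (single precision float) and `'>I'` (unsigned 32-bit integer) formats; the function name `float32_fields` is hypothetical.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Return the (sign, exponent, fraction) fields of x stored as a 32-bit float."""
    (word,) = struct.unpack('>I', struct.pack('>f', x))  # raw 32-bit pattern
    sign = word >> 31                 # bit 31
    exponent = (word >> 23) & 0xFF    # bits 30..23
    fraction = word & 0x7FFFFF        # bits 22..0
    return sign, exponent, fraction

s, e, f = float32_fields(6.5)         # 6.5 = 1.101 (binary) x 2^2
print(s, format(e, '08b'), format(f, '023b'))
# 0 10000001 10100000000000000000000
```

The exponent field holds 2 + 127 = 129, and the fraction field holds the bits that follow the leading 1 of 1.101.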

Floating Point Representation (example)
What is the decimal value of the following floating point number (fields S | exponent | fraction)?
0 01111011 00000000000000000000000
exponent = 64 + 32 + 16 + 8 + 2 + 1 = (128 - 8) + 3 = 120 + 3 = 123
value = $+1.0 \times 2^{123 - 127} = 2^{-4} = 0.0625$
(Patt and Patel, p. 34)

Floating Point Representation (example)
What is the decimal value of the following floating point number (fields S | exponent | fraction)?
0 10000011 00101000000000000000000
exponent = 128 + 2 + 1 = 131
value = $+1.00101_2 \times 2^{131 - 127} = 1.15625 \times 16 = 18.5$
(Patt and Patel, p. 35)

Floating Point Representation (example)
What is the decimal value of the following floating point number (fields S | exponent | fraction)?
1 10000010 00101000000000000000000
exponent = 128 + 2 = 130
value = $-1.00101_2 \times 2^{130 - 127} = -1.15625 \times 8 = -9.25$
(Patt and Patel, p. 35)
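The worked examples can be checked mechanically. The Python sketch below evaluates a normalized bit pattern with the formula $(-1)^{S} \times 1.\mathit{fraction} \times 2^{\mathit{exponent} - 127}$ and compares the result with what `struct` produces when it reinterprets the same 32 bits as a float. Both helper names are hypothetical, and the formula version deliberately handles only exponent fields 1 to 254.

```python
import struct

def decode_by_formula(bits: str) -> float:
    """Evaluate a normalized 32-bit pattern as (-1)^S * 1.fraction * 2^(exponent - 127)."""
    assert len(bits) == 32
    sign = int(bits[0], 2)
    exponent = int(bits[1:9], 2)
    fraction = int(bits[9:], 2)
    assert 0 < exponent < 255, "only normalized numbers are handled here"
    significand = 1 + fraction / 2**23   # restore the hidden leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

def decode_with_struct(bits: str) -> float:
    """Reinterpret the same 32 bits as a float using the struct module."""
    return struct.unpack('>f', int(bits, 2).to_bytes(4, 'big'))[0]

examples = [
    "00111101100000000000000000000000",
    "01000001100101000000000000000000",
    "11000001000101000000000000000000",
]
for bits in examples:
    print(decode_by_formula(bits), decode_with_struct(bits))
# 0.0625 0.0625
# 18.5 18.5
# -9.25 -9.25
```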

Floating Point
What is the largest number that can be represented in 32-bit floating point using the IEEE 754 format above?
0 11111110 11111111111111111111111
exponent = 254, actual exponent = 254 - 127 = 127
value = $1.11111111111111111111111_2 \times 2^{127} = (2 - 2^{-23}) \times 2^{127} \approx 3.40 \times 10^{38}$
(Patt and Patel, p. 35)
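A short Python check of the largest value, again assuming that `struct` models a 32-bit float: it decodes the pattern 0x7F7FFFFF, compares it with $(2 - 2^{-23}) \times 2^{127}$, and shows that a somewhat larger value cannot be packed into 32 bits. The constant name `LARGEST_BITS` is hypothetical.

```python
import struct

LARGEST_BITS = 0x7F7FFFFF   # 0 11111110 11111111111111111111111
largest = struct.unpack('>f', LARGEST_BITS.to_bytes(4, 'big'))[0]

by_formula = (2 - 2**-23) * 2.0**127   # 1.111...1 (binary) x 2^(254 - 127)
print(largest == by_formula)           # True
print(f"{largest:.7e}")                # 3.4028235e+38

try:
    struct.pack('>f', 3.5e38)          # beyond the largest 32-bit float
except OverflowError:
    print("3.5e38 is too large for a 32-bit float")
```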

Floating Point
What is the smallest number (closest to zero) that can be represented in 32-bit floating point using the IEEE 754 format above?
0 00000000 00000000000000000000001
An exponent field of 0 marks a denormalized number: there is no hidden leading 1, and the actual exponent is fixed at 1 - 127 = -126. The value is therefore $0.\mathit{fraction} \times 2^{-126} = 2^{-23} \times 2^{-126} = 2^{-149} \approx 1.4 \times 10^{-45}$.
(Patt and Patel, p. 35)
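The same kind of Python check for the smallest values, assuming `struct` models a 32-bit float: the pattern 0x00000001 is the smallest denormalized number, $2^{-149}$, and for comparison 0x00800000 (exponent field 1, fraction 0) is the smallest normalized number, $2^{-126}$. The helper `from_bits` is hypothetical.

```python
import struct

def from_bits(word: int) -> float:
    """Reinterpret a 32-bit unsigned integer as a 32-bit float."""
    return struct.unpack('>f', word.to_bytes(4, 'big'))[0]

smallest_denormal = from_bits(0x00000001)   # 0 00000000 00000000000000000000001
smallest_normal = from_bits(0x00800000)     # 0 00000001 00000000000000000000000

# Denormalized: no hidden 1 and the exponent is fixed at -126, so 2^-23 * 2^-126
print(smallest_denormal == 2.0**-23 * 2.0**-126)   # True
print(smallest_normal == 2.0**-126)                # True
print(f"{smallest_denormal:.3e}")                  # 1.401e-45
```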

Special Floating Point Representations
The 8-bit exponent field can represent numbers from 0 to 255, and so far we have studied how to read numbers with exponents from 0 to 254. What value is represented when the exponent is 255 (i.e., $11111111_2$)? An exponent equal to 255 in a floating point representation indicates a special value. When the exponent is 255 and the fraction is 0, the value represented is ±infinity, with the sign bit selecting + or -. When the exponent is 255 and the fraction is non-zero, the value represented is Not a Number (NaN).
(Hennessy and Patterson, p. 301)
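The special patterns can be inspected the same way. This Python sketch, assuming `struct` as the 32-bit float model, builds patterns with an all-ones exponent field and shows which decode to infinity and which to NaN; `from_bits` is a hypothetical helper.

```python
import math
import struct

def from_bits(word: int) -> float:
    """Reinterpret a 32-bit unsigned integer as a 32-bit float."""
    return struct.unpack('>f', word.to_bytes(4, 'big'))[0]

print(from_bits(0x7F800000))               # inf   (S = 0, exponent 255, fraction 0)
print(from_bits(0xFF800000))               # -inf  (S = 1, exponent 255, fraction 0)
print(math.isnan(from_bits(0x7FC00000)))   # True  (exponent 255, fraction non-zero)
print(math.isnan(from_bits(0x7F800001)))   # True  (any non-zero fraction gives NaN)
```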

Double Precision
The 32-bit floating point representation is usually called single precision. A double precision floating point representation requires 64 bits, divided as follows: 1 sign bit, 11 bits for the exponent, and 52 bits for the fraction (also called the significand).
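A Python sketch of the 64-bit layout, analogous to the 32-bit case. Python's own `float` is already a double precision value, so `struct`'s `'>d'` and `'>Q'` formats expose its bits directly. The exponent bias of 1023 used below is not stated on the slide; it is the IEEE 754 double precision bias, the counterpart of 127 in single precision. The name `float64_fields` is hypothetical.

```python
import struct

def float64_fields(x: float) -> tuple[int, int, int]:
    """Return the (sign, exponent, fraction) fields of x as a 64-bit double."""
    (word,) = struct.unpack('>Q', struct.pack('>d', x))  # raw 64-bit pattern
    sign = word >> 63                    # 1 bit
    exponent = (word >> 52) & 0x7FF      # 11 bits
    fraction = word & ((1 << 52) - 1)    # 52 bits
    return sign, exponent, fraction

s, e, f = float64_fields(-9.25)          # -9.25 = -1.00101 (binary) x 2^3
print(s, e, e - 1023)                    # 1 1026 3
```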

Floating Point Addition (Decimal)
How do we perform the following addition, assuming that the significand holds four decimal digits?
$9.999_{10} \times 10^{1} + 1.610_{10} \times 10^{-1}$
Step 1: Align the decimal point of the number with the smaller exponent (notice the loss of precision): $1.610_{10} \times 10^{-1} = 0.01610_{10} \times 10^{1}$, which is stored as $0.016_{10} \times 10^{1}$
Step 2: Add the significands: $9.999_{10} \times 10^{1} + 0.016_{10} \times 10^{1} = 10.015_{10} \times 10^{1}$
Step 3: Renormalize the result: $10.015_{10} \times 10^{1} = 1.0015_{10} \times 10^{2}$
Step 4: Round off the result to the available representation: $1.0015_{10} \times 10^{2} \rightarrow 1.002_{10} \times 10^{2}$
(Hennessy and Patterson, p. 281)
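The four steps can be simulated in Python with integer significands. This is a sketch under stated assumptions: a four-digit significand, positive normalized operands only, digits shifted out during alignment simply discarded (as on the slide), and round-half-up when the normalized sum has one digit too many. A number is written as a pair (significand, exponent), so $9.999 \times 10^{1}$ is `(9999, 1)`; the function name `fp_add_decimal` is hypothetical.

```python
DIGITS = 4  # significand digits the hypothetical representation can store

def fp_add_decimal(a_sig: int, a_exp: int, b_sig: int, b_exp: int) -> tuple[int, int]:
    """Add two positive numbers given as (d.ddd digits as an int, power-of-ten exponent)."""
    # Step 1: align the number with the smaller exponent; shifted-out digits are lost
    if a_exp < b_exp:
        a_sig, a_exp, b_sig, b_exp = b_sig, b_exp, a_sig, a_exp
    b_sig //= 10 ** (a_exp - b_exp)
    # Step 2: add the significands
    sig, exp = a_sig + b_sig, a_exp
    # Steps 3 and 4: the sum of two 4-digit significands has at most 5 digits;
    # if so, renormalize (exponent + 1) and round half up to 4 digits
    if sig >= 10 ** DIGITS:
        sig, exp = (sig + 5) // 10, exp + 1
    return sig, exp

sig, exp = fp_add_decimal(9999, 1, 1610, -1)
print(sig, exp)   # 1002 2, i.e. 1.002 x 10^2
```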

Floating Point Addition (Example)
Convert the numbers $0.5_{10}$ and $-0.4375_{10}$ to binary floating point representation, and then perform the binary floating point addition of these numbers. Which number should have its significand adjusted?
(Hennessy and Patterson, p. 283)
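One way to work this exercise, sketched in Python: $0.5 = 1.000_2 \times 2^{-1}$ and $-0.4375 = -1.110_2 \times 2^{-2}$, so $-0.4375$ has the smaller exponent and is the number whose significand is shifted right, to $-0.111_2 \times 2^{-1}$. The sum $0.001_2 \times 2^{-1}$ renormalizes to $1.000_2 \times 2^{-4} = 0.0625$. The code assumes Python floats hold these few-bit significands exactly, so it omits the rounding step; the helper names are hypothetical.

```python
import math

def to_normalized(x: float) -> tuple[float, int]:
    """Write non-zero x as m * 2**e with 1 <= |m| < 2 (the 1.fraction form)."""
    m, e = math.frexp(x)          # frexp returns 0.5 <= |m| < 1
    return m * 2, e - 1

def fp_add_binary(x: float, y: float) -> tuple[float, int]:
    """Add x and y by the align / add / renormalize steps (no rounding)."""
    (mx, ex), (my, ey) = to_normalized(x), to_normalized(y)
    # Step 1: shift right the significand of the number with the smaller exponent
    if ex < ey:
        (mx, ex), (my, ey) = (my, ey), (mx, ex)
    my /= 2 ** (ex - ey)
    # Step 2: add the significands, which now share the exponent ex
    m = mx + my
    if m == 0:
        return 0.0, 0
    # Step 3: renormalize the sum back to 1.fraction form
    m, shift = to_normalized(m)
    return m, ex + shift

m, e = fp_add_binary(0.5, -0.4375)
print(m, e, m * 2**e)   # 1.0 -4 0.0625
```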

Floating Point Multiplication (Decimal)
Assume that we can only store four digits of the significand and two digits of the exponent in a decimal floating point representation. How would you multiply $1.110_{10} \times 10^{10}$ by $9.200_{10} \times 10^{-5}$ in this representation?
Step 1: Add the exponents: new exponent = 10 + (-5) = 5
Step 2: Multiply the significands: $1.110 \times 9.200 = 10.212000$ (partial products 0000, 0000, 2220 and 9990, each shifted one place further left, then added)
Step 3: Normalize the product: $10.212_{10} \times 10^{5} = 1.0212_{10} \times 10^{6}$
Step 4: Round off the product: $1.0212_{10} \times 10^{6} \rightarrow 1.021_{10} \times 10^{6}$
(Hennessy and Patterson, p. 286)
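The multiplication steps can be simulated the same way in Python. Assumptions for this sketch: positive operands (the sign would be handled separately), significands stored as four-digit integers so that $1.110 \times 10^{10}$ is `(1110, 10)`, round-half-up in Step 4, and an exponent limited to two decimal digits as on the slide. `fp_mul_decimal` is a hypothetical name.

```python
DIGITS = 4        # significand digits we can store
MAX_EXP = 99      # two decimal digits of exponent

def fp_mul_decimal(a_sig: int, a_exp: int, b_sig: int, b_exp: int) -> tuple[int, int]:
    """Multiply two positive numbers given as (d.ddd digits as an int, exponent)."""
    # Step 1: add the exponents
    exp = a_exp + b_exp
    # Step 2: multiply the significands; prod carries 2*(DIGITS-1) fractional digits
    prod = a_sig * b_sig
    # Step 3: normalize; a product of 10.0 or more moves the point one place left
    scale = 10 ** (DIGITS - 1)
    if prod >= 10 * 10 ** (2 * (DIGITS - 1)):
        scale *= 10
        exp += 1
    # Step 4: round half up to DIGITS digits (rounding may carry into a new digit)
    sig = (prod + scale // 2) // scale
    if sig >= 10 ** DIGITS:
        sig, exp = sig // 10, exp + 1
    if abs(exp) > MAX_EXP:
        raise OverflowError("exponent does not fit in two decimal digits")
    return sig, exp

print(fp_mul_decimal(1110, 10, 9200, -5))   # (1021, 6), i.e. 1.021 x 10^6
```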
