1
CSCI206 - Computer Organization & Programming
Introduction to Floating Point Numbers zyBook: 10.5
2
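Real Numbers in Binary
Recall the equation describing a positional number system: $v = \sum_{i=0}^{n-1} d_i \times b^i$, where $b$ is the base and $d_i$ is the digit in position $i$.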
3
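Real Numbers in Binary
Recall the equation describing a positional number system: $v = \sum_{i=0}^{n-1} d_i \times b^i$, where $b$ is the base and $d_i$ is the digit in position $i$.
This can be extended to real numbers by letting the index run below zero: $v = \sum_{i=-m}^{n-1} d_i \times b^i$
The digits with a negative index sit to the right of the decimal point!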
4
Examples
0.1 in binary is $1 \times 2^{-1}$ = 0.5 (base 10)
5
Real Numbers in Binary
Fractions of a power of 2 are easy: 1/2 is 0.1 and 1/4 is 0.01 in binary.
Other values are represented using the sum of fractions with a power of 2 denominator: 3/4 = 1/2 + 1/4 is 0.11 in binary.
6
Algorithm to Convert Decimal to Binary
For decimal to binary integers we divide by 2 and record the remainder.
For decimal to binary real numbers we multiply by 2 and record the integer part.
Example: convert a decimal fraction to binary.
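The multiply-by-2 rule can be sketched in C as follows. This is a minimal illustration, not code from the course: the function name `frac_to_binary` and its parameters are hypothetical, and it assumes the input is a `double` in the range [0, 1).

```c
#include <assert.h>
#include <stddef.h>

/* Convert the fractional value x (0 <= x < 1) to binary digits.
 * Writes at most max_digits characters ('0' or '1') plus a terminating
 * NUL into out, which must hold at least max_digits + 1 bytes.
 * Returns 1 if the fraction part reached zero (conversion finished),
 * 0 if the digits were cut off at max_digits. */
int frac_to_binary(double x, char *out, size_t max_digits)
{
    assert(x >= 0.0 && x < 1.0);
    size_t n = 0;
    while (x != 0.0 && n < max_digits) {
        x *= 2.0;               /* multiply by 2 */
        if (x >= 1.0) {         /* integer part is 1: record it */
            out[n++] = '1';
            x -= 1.0;           /* continue with the fractional part only */
        } else {                /* integer part is 0: record it */
            out[n++] = '0';
        }
    }
    out[n] = '\0';
    return x == 0.0;
}
```

With a 33-byte buffer `buf`, calling `frac_to_binary(0.09375, buf, 32)` leaves the digits `00011` in `buf`, meaning 0.09375 is 0.00011 in binary.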
7
Worked example: convert to binary
Each multiplication by 2 produces one binary digit. The digits are written beginning immediately to the right of the decimal point.
After recording the integer part, only work with the fractional part!
12
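Fraction part is zero, stop!
It appears magic, but the reason behind the algorithm is simple. The value v is expressed as $v = b_0 \times 2^{-1} + b_1 \times 2^{-2} + \dots$, so multiplying by 2 gives $2v = b_0 + b_1 \times 2^{-1} + \dots$. The integer part of the product is the next binary digit, and the fractional part holds the digits still to come. When the fraction part is zero, every digit has been produced.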
13
Convert binary to decimal
Convert 0.00011 (binary) to decimal:
$0.00011_2 = 2^{-4} + 2^{-5} = 1/16 + 1/32 = 3/32 = 0.09375$
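The reverse direction adds up the place value of every 1 digit. A short C sketch follows. The function name `binary_frac_to_double` is made up for this illustration, and it assumes the digits to the right of the point are passed as a string.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the place values of the binary digits to the right of the point.
 * digits is a string of '0' and '1' characters, e.g. "00011" for 0.00011. */
double binary_frac_to_double(const char *digits)
{
    assert(digits != NULL);
    double value = 0.0;
    double weight = 0.5;        /* the first digit has weight 2^-1 */
    for (size_t i = 0; digits[i] != '\0'; i++) {
        assert(digits[i] == '0' || digits[i] == '1');
        if (digits[i] == '1')
            value += weight;    /* add 2^-(i+1) for every 1 digit */
        weight /= 2.0;          /* the next digit is worth half as much */
    }
    return value;
}
```

For the string `"00011"` only the fourth and fifth digits contribute, giving 1/16 + 1/32 = 0.09375.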
14
Number Representation in Computing
For a given range of integers, there is a corresponding range of (exact) binary representations Example: the range [0-15] corresponds to the 4-bit binary numbers 0000 through 1111.
15
Number Representation in Computing
Within a range of real numbers, there is no way to encode all possible values.
Example: even a small range of real numbers has an infinite number of points, so we would need an infinite number of bits to represent all of the possible values!
As a result, in computing, real numbers are approximate.
Activity 24, question 1 - 2
16
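An Observation
Many common base 10 real numbers generate an infinite number of binary digits. For example, 0.1 (base 10) is 0.000110011001100... in binary, with the group 0011 repeating forever.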
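To see the repetition without any rounding getting in the way, the digits can be generated from an exact fraction using integer arithmetic. The sketch below is only an illustration. The name `rational_to_binary` is hypothetical, and it assumes the denominator is small enough that doubling the numerator does not overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Generate binary digits of the exact fraction num/den (0 <= num < den)
 * with the multiply-by-2 rule, using integers so no rounding occurs.
 * out must hold at least max_digits + 1 bytes.
 * Returns 1 if the expansion terminated, 0 if it was cut off. */
int rational_to_binary(unsigned long num, unsigned long den,
                       char *out, size_t max_digits)
{
    assert(den != 0 && num < den);
    size_t n = 0;
    while (num != 0 && n < max_digits) {
        num *= 2;               /* multiply the fraction by 2 */
        if (num >= den) {       /* integer part is 1 */
            out[n++] = '1';
            num -= den;         /* keep only the fractional part */
        } else {
            out[n++] = '0';
        }
    }
    out[n] = '\0';
    return num == 0;
}
```

Asking for 12 digits of 1/10 yields `000110011001` and a return value of 0: the remainder never reaches zero, so no finite number of bits stores 0.1 exactly. By contrast, 3/32 terminates after the five digits `00011`.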
17
Fixed vs. Floating Point Approximations
18
Floating Point Representation
Decimal notation      Scientific notation
2                     $2 \times 10^{0}$
300                   $3 \times 10^{2}$
321.7                 $3.217 \times 10^{2}$
−53,000               $-5.3 \times 10^{4}$
6,720,000,000         $6.72 \times 10^{9}$
0.2                   $2 \times 10^{-1}$
A floating point number uses the same idea in base 2: $x = (-1)^{S} \times M \times 2^{E}$
Where
S is the sign bit
M is a fixed point number (precision of numbers)
E is a signed integer (range of numbers)
Layout of the 32 or 64 bit word: S | Exponent | Mantissa or Fraction
19
IEEE 754 Standard (1985)
S | Exponent | Mantissa
One bit for sign
Single precision float (32 bits): 8 bit exponent, 23 bit mantissa
Double precision float (64 bits): 11 bit exponent, 52 bit mantissa
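The single precision field widths can be checked by pulling a `float` apart with shifts and masks. This is a sketch under the assumption that `float` is an IEEE 754 32-bit value on the target machine (true on mainstream platforms). The names `float_fields` and `split_float` are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The three fields of a single precision word (hypothetical helper type). */
struct float_fields {
    uint32_t sign;      /* 1 bit   */
    uint32_t exponent;  /* 8 bits, still in biased form */
    uint32_t mantissa;  /* 23 bits, without the implicit leading 1 */
};

struct float_fields split_float(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);   /* assumes a 32-bit float */
    memcpy(&bits, &f, sizeof bits);    /* copy the raw bit pattern */

    struct float_fields r;
    r.sign     = bits >> 31;           /* top bit */
    r.exponent = (bits >> 23) & 0xFFu; /* next 8 bits */
    r.mantissa = bits & 0x7FFFFFu;     /* low 23 bits */
    return r;
}
```

For `-0.75f` this gives sign 1, exponent field 126, and mantissa field 0x400000. `memcpy` is used instead of a pointer cast because it is the portable way to reinterpret an object's bytes in C.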
20
IEEE 754 Standard (1985)
S | Exponent | Mantissa
The mantissa is normalized, meaning it is a fixed point number in the form 1.xxxxxx. To save one bit, the leading 1. is implicit (not represented).
The exponent is represented in biased form: the stored value is the true exponent plus a bias B.
B = 127 for single
B = 1023 for double
21
IEEE 754 Standard (1985)
$x = (-1)^{S} \times 1.M \times 2^{E-B}$ (normalized)
E is the exponent field, M is the mantissa field, and B is the bias.
S, E, and M are encoded in the binary word.
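As a sketch of how a normalized single precision word maps back to a value, the function below applies the usual IEEE 754 rule: sign, implicit leading 1, and bias of 127. The name `decode_single` is hypothetical. It handles only normalized numbers (exponent field 1 through 254) and returns a `double` so the result is exact.

```c
#include <assert.h>
#include <stdint.h>

/* Value of a normalized single precision number from its three fields:
 * (-1)^s * 1.m * 2^(e - 127). */
double decode_single(uint32_t s, uint32_t e, uint32_t m)
{
    assert(s <= 1);
    assert(e >= 1 && e <= 254);        /* normalized numbers only */
    assert(m < (1u << 23));

    /* Restore the implicit leading 1: 1.M = 1 + M / 2^23 */
    double significand = 1.0 + (double)m / 8388608.0;

    /* Scale by 2^(e - 127) by repeated doubling or halving */
    int exponent = (int)e - 127;
    double scale = 1.0;
    for (; exponent > 0; exponent--) scale *= 2.0;
    for (; exponent < 0; exponent++) scale /= 2.0;

    double value = significand * scale;
    return s ? -value : value;
}
```

For example, `decode_single(1, 126, 0x400000)` returns -0.75: the significand is 1.5 and the scale is 2^-1.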
22
IEEE754 - Reserved Values
Not a Number (NaN) = exponent field of all ones with a nonzero mantissa
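In IEEE 754 the all-zeros and all-ones exponent fields are set aside for special values: zero and denormalized numbers use exponent field 0, while infinity and NaN use exponent field 255 (single precision). The C sketch below classifies a single precision word from its fields. The enum and function names are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

enum float_class {
    FC_ZERO, FC_DENORMALIZED, FC_NORMALIZED, FC_INFINITY, FC_NAN
};

/* Classify a single precision value from its 8-bit exponent field e
 * and 23-bit mantissa field m (the sign bit does not matter here). */
enum float_class classify_fields(uint32_t e, uint32_t m)
{
    assert(e <= 0xFFu && m <= 0x7FFFFFu);
    if (e == 0)                     /* exponent field all zeros */
        return m == 0 ? FC_ZERO : FC_DENORMALIZED;
    if (e == 0xFFu)                 /* exponent field all ones */
        return m == 0 ? FC_INFINITY : FC_NAN;
    return FC_NORMALIZED;           /* everything else is an ordinary number */
}
```

So the word 0x7f800000 (exponent 255, mantissa 0) is +infinity, and any word with exponent 255 and a nonzero mantissa is a NaN.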
23
IEEE754 - Example
Show 3.14 as a single precision float
24
3.14 - step 1 write in binary
3.14 == 3 + 0.14, and 3 is 11 in binary
0.14*2 = 0.28 (digit 0)
0.28*2 = 0.56 (digit 0)
0.56*2 = 1.12 (digit 1)
0.12*2 = 0.24 (digit 0)
......
25
3.14 - step 1 write in binary
need 24 bits for single (53 for double)
3.14 == 11.0010001111010111000011 (rounded to 24 bits)
26
3.14 - step 2 normalize binary
Normalized form is 1.yyyyy
3.14 == 11.0010001111010111000011 == 1.10010001111010111000011 × $2^{1}$
Note a total of 24 bits.
27
3.14 - step 3 write mantissa & sign
3.14 == 1.10010001111010111000011 × $2^{1}$
M = 10010001111010111000011
S = 0 (positive)
Note that the mantissa keeps only 23 bits: the leading bit is always 1, so it is omitted from the representation only. It is still part of the value.
28
3.14 - step 4 encode exponent
3.14 == 1.10010001111010111000011 × $2^{1}$
Exponent = 1, B = 127
E (biased exponent) = 1 + 127 = 128 = 10000000 (8 bits)
29
3.14 - step 5 write result
S = 0 (positive)
E = 10000000
M = 10010001111010111000011
Binary word: 0 10000000 10010001111010111000011
to hex = 0x4048f5c3
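The five steps can be traced in C with a small encoder. This is a sketch, not a full IEEE 754 implementation. The name `encode_single` is hypothetical, and it assumes a finite, nonzero input that lands in the normalized single precision range. It rounds by adding one half and truncating, which differs from the standard's round-to-nearest-even rule only on exact ties.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 32-bit single precision word for x by hand. */
uint32_t encode_single(double x)
{
    assert(x != 0.0 && x - x == 0.0);  /* finite and nonzero */

    uint32_t sign = 0;                 /* sign bit */
    if (x < 0.0) { sign = 1; x = -x; }

    int exponent = 0;                  /* normalize to 1.yyyy * 2^exponent */
    while (x >= 2.0) { x /= 2.0; exponent++; }
    while (x < 1.0)  { x *= 2.0; exponent--; }

    /* Keep 23 fraction bits; the leading 1 is dropped (implicit). */
    uint32_t mantissa = (uint32_t)((x - 1.0) * 8388608.0 + 0.5);
    if (mantissa == (1u << 23)) {      /* rounding carried into the leading 1 */
        mantissa = 0;
        exponent++;
    }

    int biased = exponent + 127;       /* add the bias B = 127 */
    assert(biased >= 1 && biased <= 254);

    return (sign << 31) | ((uint32_t)biased << 23) | mantissa;
}
```

`encode_single(3.14)` returns 0x4048f5c3, matching the hand-worked result. Truncating instead of rounding would give a mantissa ending in ...010 and the word 0x4048f5c2.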
30
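Endianness
On a little-endian system (Intel, etc.), the bytes of the IEEE754 value are stored least significant byte first, so the value appears byte & word swapped when memory is read in address order:
0x4048 f5c3 (big endian)
0xc3f5 4840 (little endian)
Swap bytes and words!
/* inside main(), with #include <stdio.h> */
float f = 3.14;
unsigned char* p = (unsigned char*)&f;
printf("%02x%02x %02x%02x\n", *p, *(p+1), *(p+2), *(p+3));
// result on a little-endian machine: c3f5 4840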
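A program can find out which byte order it is running on before interpreting a dump like this. The sketch below is an illustration with invented helper names (`is_little_endian`, `float_bytes`), and it assumes a 4-byte IEEE 754 `float`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if the lowest-addressed byte of an integer holds its
 * least significant byte (little endian), 0 otherwise. */
int is_little_endian(void)
{
    uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);    /* look at the byte at the lowest address */
    return first == 1;
}

/* Copy the bytes of f into out in memory (address) order. */
void float_bytes(float f, unsigned char out[4])
{
    assert(sizeof f == 4);      /* assumes a 32-bit float */
    memcpy(out, &f, 4);
}
```

On a little-endian machine `float_bytes(3.14f, b)` fills `b` with c3 f5 48 40. On a big-endian machine the same call gives 40 48 f5 c3.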