Integers & Floating Point Numbers: Limits of Representation CSE 351 Autumn 2016 Section 3
Key Points
Remember that there are limitations!
- Memory is finite; numbers/data are not
  - We can only represent so much
  - With w bits we have 2^w distinct bit patterns
- Design decisions
  - Efficient/fast and easy to implement
  - Accuracy
  - Range
  - Precision
Unsigned Integers
- Unsigned values follow the base 2 system
- Converting from base 2 to base 10:
    b7 b6 b5 b4 b3 b2 b1 b0 = b7*2^7 + b6*2^6 + … + b1*2^1 + b0*2^0
- Benefit: add and subtract using the normal "carry" and "borrow" rules, just in binary

      63      00111111
    +  8    + 00001000
    ----    ----------
      71      01000111
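The positional formula above can be sketched in C. This is a minimal illustration, not course-provided code: `bits_to_unsigned` is a hypothetical helper name, and it assumes the input string holds only '0' and '1' characters, at most 32 of them.

```c
#include <stddef.h>
#include <stdint.h>

/* Interpret a string of '0'/'1' characters as an unsigned base-2 number.
 * Hypothetical helper for illustration; assumes valid input of at most
 * 32 binary digits. */
uint32_t bits_to_unsigned(const char *bits) {
    uint32_t value = 0;
    for (size_t i = 0; bits[i] != '\0'; i++) {
        /* Shifting left doubles the running total, so by the end the
         * leftmost digit carries the highest power of two. */
        value = (value << 1) | (uint32_t)(bits[i] - '0');
    }
    return value;
}
```

Calling it on the three patterns from the addition example gives 63, 8, and 71, so ordinary integer addition and binary addition agree.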
Signed Integers: Two's Complement
- Bit b_(w-1) has weight -2^(w-1); the other bits b_i have their usual weights +2^i
- [Figure: bits b_(w-1) b_(w-2) … b_0; the two's complement range (TMin, …, -2, -1, 0, …, TMax) shown alongside the unsigned range (0, …, TMax, TMax + 1, …, UMax - 1, UMax)]
- Benefits:
  - Roughly the same number of (+) and (-) numbers
  - Positive number encodings match unsigned
  - Single representation of zero
  - All-zeros encoding (000000…) = 0
  - Negation is easy: ~x + 1 == -x
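To make the bit weights and the negation rule concrete, here is a small sketch for w = 8. The function names are made up for this example. The negation is done on an unsigned 8-bit value so that wraparound is well defined in C.

```c
#include <stdint.h>

/* Value of an 8-bit pattern under two's complement:
 * bit 7 has weight -2^7, bits 6..0 have weights +2^i. */
int tc_value8(uint8_t pattern) {
    int value = 0;
    for (int i = 0; i < 7; i++) {
        value += ((pattern >> i) & 1) << i;
    }
    value -= ((pattern >> 7) & 1) << 7;
    return value;
}

/* Negate with ~x + 1, truncated back to 8 bits. */
uint8_t tc_negate8(uint8_t pattern) {
    return (uint8_t)(~pattern + 1u);
}
```

One thing this makes visible: negating TMin (0x80) gives 0x80 again, because +128 has no 8-bit two's complement encoding.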
Values To Remember!
Unsigned values
  UMin = 0              000…0
  UMax = 2^w - 1        111…1
Two's complement values
  TMin = -2^(w-1)       100…0
  TMax = 2^(w-1) - 1    011…1
  Negative one (-1)     111…1   0xF...F
Values for w = 32
        Decimal          Hex           Binary
  UMax  4,294,967,295    FF FF FF FF   11111111 11111111 11111111 11111111
  TMax  2,147,483,647    7F FF FF FF   01111111 11111111 11111111 11111111
  TMin  -2,147,483,648   80 00 00 00   10000000 00000000 00000000 00000000
  -1    -1               FF FF FF FF   11111111 11111111 11111111 11111111
Values for w = 64
  LONG_MIN  = -9223372036854775808
  LONG_MAX  = 9223372036854775807
  ULONG_MAX = 18446744073709551615
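The w = 32 values can be double-checked against the limit macros in `<stdint.h>`. The sketch below uses made-up helper names (`pattern32`, `print_row32`, `print_values32`). It relies on the fact that signed-to-unsigned conversion in C is defined modulo 2^32, which exposes the two's complement bit pattern.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Bit pattern of a 32-bit two's complement value.
 * Signed-to-unsigned conversion is defined modulo 2^32. */
uint32_t pattern32(int32_t x) {
    return (uint32_t)x;
}

/* Print one table row: name, decimal value, hex bit pattern. */
void print_row32(const char *name, int64_t value, uint32_t pattern) {
    printf("%-5s %12" PRId64 "  0x%08" PRIX32 "\n", name, value, pattern);
}

/* Print the w = 32 rows using the <stdint.h> limit macros. */
void print_values32(void) {
    print_row32("UMax", (int64_t)UINT32_MAX, UINT32_MAX);
    print_row32("TMax", INT32_MAX, pattern32(INT32_MAX));
    print_row32("TMin", INT32_MIN, pattern32(INT32_MIN));
    print_row32("-1", -1, pattern32(-1));
}
```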
Floating Point Numbers: The Vision
What do we want?
- Large range of values: both large numbers and very small numbers
- Precise values
- Arithmetic that reflects real arithmetic
- Support for values such as +∞, -∞, and Not-a-Number (NaN)
- An encoding similar to two's complement
Floating Point Numbers
Numerical form: V = (-1)^s * M * 2^E
- Sign bit s determines whether the number is negative or positive
- Significand (mantissa) M is normally a fractional value in the range [1.0, 2.0)
- Exponent E weights the value by a (possibly negative) power of two
Representation in memory: | s | exp | frac |
- The MSB is the sign bit s
- The exp field encodes E (but is not equal to E); remember the bias
- The frac field encodes M (but is not equal to M)
Floating Point Numbers
- Value: ±1 × Mantissa × 2^Exponent
- Bit fields: (-1)^s × 1.frac × 2^(exp + bias)
- Bias
  - Read the exp field as unsigned, then add a bias of -(2^(w-1) - 1) = -127 (w = 8 exponent bits in single precision)
  - Representable exponents are roughly ½ positive and ½ negative
  - An exponent of 0 (E = 0) is stored as exp = 0b 0111 1111
- Why?
  - Floating point arithmetic is easier
  - Somewhat compatible with two's complement
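A short sketch of pulling the three fields out of a single-precision value and undoing the bias. It assumes `float` is the 32-bit IEEE 754 format, which is true on typical machines but not guaranteed by C. The struct and function names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of a 32-bit float, each stored as an unsigned number. */
typedef struct {
    uint32_t sign; /* 1 bit   */
    uint32_t exp;  /* 8 bits  */
    uint32_t frac; /* 23 bits */
} float_fields;

/* Copy the float's bytes into an integer, then mask out each field.
 * memcpy avoids the undefined behavior of pointer type punning. */
float_fields split_float(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    float_fields out = { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
    return out;
}

/* Actual exponent E of a normalized value: exp field plus the bias of -127. */
int unbiased_exponent(float_fields ff) {
    return (int)ff.exp - 127;
}
```

For 1.0f the exp field comes out as 127 (0b 0111 1111), which matches the slide's statement that an exponent of 0 is stored as that pattern.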
Floating Point Numbers: Denormalized
- No leading 1: the value is (-1)^s × 0.frac × 2^-126
- Remember! The implicit exponent is -126 (not -127) even though exp = 0x00
- Why? To represent very small numbers that are close to 0
Floating Point Representation Summary
  Exponent      Mantissa   Meaning
  0x00          0          ± 0
  0x00          Non-zero   ± denorm num
  0x01 – 0xFE   Anything   ± norm num
  0xFF          0          ± ∞
  0xFF          Non-zero   NaN
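The summary table translates almost directly into a classifier. This is a sketch assuming 32-bit IEEE 754 `float`. The enum and function names are invented for the example, and `float_from_bits` is a convenience for building test values from raw patterns.

```c
#include <stdint.h>
#include <string.h>

typedef enum { KIND_ZERO, KIND_DENORM, KIND_NORM, KIND_INF, KIND_NAN } float_kind;

/* Build a float from a raw 32-bit pattern. */
float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Classify a float by its exponent and mantissa fields; the sign is ignored. */
float_kind classify_float(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    uint32_t exp = (bits >> 23) & 0xFFu;
    uint32_t frac = bits & 0x7FFFFFu;

    if (exp == 0x00) {
        return frac == 0 ? KIND_ZERO : KIND_DENORM;
    }
    if (exp == 0xFF) {
        return frac == 0 ? KIND_INF : KIND_NAN;
    }
    return KIND_NORM; /* exp in 0x01..0xFE */
}
```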
Floating Point Limitations: Math Properties
- Exponent overflow yields +∞ or -∞
- Floats with value +∞, -∞, and NaN can be used in operations
  - The result is usually still +∞, -∞, or NaN; sometimes intuitive, sometimes not
- Floating point ops do not work like real math, due to rounding!
  - Not associative: (3.14 + 1e100) - 1e100 != 3.14 + (1e100 - 1e100)
  - Not distributive: 100 * (0.1 + 0.2) != 100 * 0.1 + 100 * 0.2
  - Not cumulative: repeatedly adding a very small number to a large one may do nothing
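The associativity and "adding does nothing" points can be tried directly. This sketch assumes IEEE 754 arithmetic with default rounding and no value-changing compiler options such as `-ffast-math`. The helper names are illustrative.

```c
/* Same three operands, grouped two different ways. */
double sum_left(double a, double b, double c) {
    return (a + b) + c;
}

double sum_right(double a, double b, double c) {
    return a + (b + c);
}

/* Add `step` to `start` a given number of times, rounding to float
 * after every addition. */
float add_repeatedly(float start, float step, int times) {
    float total = start;
    for (int i = 0; i < times; i++) {
        total += step;
    }
    return total;
}
```

With a = 3.14, b = 1e100, c = -1e100, `sum_left` returns 0.0 because 3.14 is absorbed when it is added to 1e100, while `sum_right` returns 3.14. Likewise, `add_repeatedly(16777216.0f, 1.0f, 1000)` returns 16777216.0f: at 2^24 neighboring floats are 2 apart, so each + 1.0f rounds back to the starting value.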
Distribution of Values
What can't we get?
- Between the largest norm and infinity: overflow
- Between zero and the smallest denorm: underflow
- Between norm numbers: rounding
Problems Problems Problems!
a. Consider the decimal number 1.25. Give the IEEE-754 representation of this number as a 32-bit floating-point number.
   Answer: 1.25 is 1.01 in base 2, i.e. 1.01 × 2^0, so exp = 0 + 127 = 0111 1111 and frac = 0100…0:
   0 0111 1111 0100 0000 0000 0000 0000 000
b. Convert 1.1 × 2^-128 to IEEE 754 single precision (note: 1.1 × 2^-128 is in base 2).
   Answer: the exponent is below -126, so shift the radix point left by 2 to get 0.011 × 2^-126, which is denormalized (exp = 0000 0000, no leading 1):
   0 0000 0000 0110 0000 0000 0000 0000 000
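One way to check answers like these is to print a float's bits in the same "s exp frac" layout. The sketch assumes 32-bit IEEE 754 `float`, and `format_float_bits` is a made-up helper name.

```c
#include <stdint.h>
#include <string.h>

/* Write the bits of f as "s eeeeeeee fffffffffffffffffffffff".
 * That is 1 + 8 + 23 bits plus 2 spaces = 34 characters, plus the NUL. */
void format_float_bits(float f, char out[35]) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    int pos = 0;
    for (int i = 31; i >= 0; i--) {
        out[pos++] = ((bits >> i) & 1u) ? '1' : '0';
        if (i == 31 || i == 23) {
            out[pos++] = ' '; /* separator after the sign and after exp */
        }
    }
    out[pos] = '\0';
}
```

For part b, the C99 hexadecimal literal `0x1.8p-128f` is 1.1 (base 2) × 2^-128, so passing it to `format_float_bits` should show an all-zero exp field and a frac field starting with 011.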
Problems Problems Problems!
If x and y have type float, give two different reasons that (x+2*y)-y == x+y might evaluate to 0 (i.e., false). (See the Winter 2016 midterm.)
- Overflow: If x and y are certain (very large) values, then x+2*y might produce the special value "infinity" (negative infinity is possible too), and subtracting y still produces infinity, while x+y is still small enough not to overflow.
- Rounding error: If x and y are the right number of orders of magnitude apart, rounding might leave x+y equal to x while (x+2*y)-y is slightly more than x.
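Both reasons can be reproduced with concrete inputs. This sketch assumes 32-bit IEEE 754 `float` evaluated in single precision (as on x86-64) with default rounding. The input values are my own picks, not taken from the midterm.

```c
/* Returns 1 if (x + 2*y) - y == x + y holds in float arithmetic, else 0. */
int identity_holds(float x, float y) {
    float lhs = (x + 2 * y) - y;
    float rhs = x + y;
    return lhs == rhs;
}
```

- Overflow: `identity_holds(0.0f, 2e38f)` is 0, since 2 * 2e38f exceeds the largest float and becomes +∞, while 0 + 2e38f is finite.
- Rounding: `identity_holds(16777216.0f, 0.75f)` is 0. Floats near 2^24 are 2 apart, so x + y rounds down to x, but x + 1.5 rounds up to x + 2 and stays there after subtracting 0.75.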
Problems Problems Problems!
What is the largest positive number we can represent with a 10-bit signed two's complement integer? Bit pattern? Decimal value?
- Bit pattern: 01 1111 1111
- Decimal value: 2^8 + … + 2^0 = 2^9 - 1 = 512 - 1 = 511
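The same calculation works for any width. A small sketch with hypothetical helper names, restricted to 1 <= w <= 63 so the shift fits in a 64-bit signed integer:

```c
#include <stdint.h>

/* Largest w-bit two's complement value: 2^(w-1) - 1. Assumes 1 <= w <= 63. */
int64_t tmax_for_width(int w) {
    return ((int64_t)1 << (w - 1)) - 1;
}

/* Smallest w-bit two's complement value: -2^(w-1). Assumes 1 <= w <= 63. */
int64_t tmin_for_width(int w) {
    return -((int64_t)1 << (w - 1));
}
```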
Problems Problems Problems!
- Assuming unsigned integers, what is the result when you compute UMAX+1?
  Answer: 0 (the sum wraps around modulo 2^w)
- Assuming two's complement signed representation, what is the result when you compute TMAX+1?
  Answer: TMIN (0x80000000 for w = 32)
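A sketch of both computations for w = 32. One caveat beyond the slide: in C, overflow of a signed integer is undefined behavior, so the TMax + 1 case below does the addition on the unsigned bit pattern and then reinterprets the bits, which mirrors what the hardware adder does. The function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* UMax + 1 on 32-bit unsigned values: defined to wrap modulo 2^32. */
uint32_t umax_plus_one(void) {
    uint32_t u = UINT32_MAX;
    return u + 1u;
}

/* TMax + 1 computed on the bit pattern, then reinterpreted as a
 * two's complement value (int32_t is guaranteed to be two's complement). */
int32_t tmax_plus_one(void) {
    uint32_t bits = (uint32_t)INT32_MAX + 1u; /* 0x80000000 */
    int32_t result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```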
Problems Problems Problems!
Is the '==' operator a good test of equality for floating point values? Why or why not?
- No, since floating point results suffer from rounding error.
- Instead, take the difference of the two values and use the <= or >= operators to test whether it falls within a small range around zero.
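A minimal sketch of the "take a difference, then test a range" idea. `nearly_equal` and its `tolerance` parameter are names chosen for this example. A fixed absolute tolerance is the simplest option, and an appropriate value depends on the magnitude of the numbers being compared.

```c
#include <stdbool.h>

/* True if a and b differ by at most `tolerance` (an absolute bound).
 * Comparisons involving NaN are false, so NaN is never "nearly equal". */
bool nearly_equal(double a, double b, double tolerance) {
    double diff = a - b;
    if (diff < 0) {
        diff = -diff;
    }
    return diff <= tolerance;
}
```

For example, `0.1 + 0.2 == 0.3` is false in double precision, but `nearly_equal(0.1 + 0.2, 0.3, 1e-9)` is true.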
Problems Problems Problems!
Give an example of three floating-point numbers x, y, and z such that the distributive property x(y + z) = xy + xz does not hold.
- x = large number
- y = small number
- z = large number
- We might lose y when we add it to z, so x(y + z) can round differently from xy + xz
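Here is one concrete way to test candidate triples, assuming IEEE 754 double precision with default rounding. The function name and the specific values are my own choices for illustration.

```c
#include <stdbool.h>

/* True if x*(y + z) and x*y + x*z round to the same double. */
bool distributes(double x, double y, double z) {
    double left = x * (y + z);
    double right = x * y + x * z;
    return left == right;
}
```

With x = 3, y = 1, z = 9007199254740992 (2^53), y is lost when added to z, so the left side is 3 × 2^53, while on the right side the separately computed 3 × 1 nudges the sum up to the next representable value. `distributes(3.0, 1.0, 9007199254740992.0)` is therefore false, and so is `distributes(100.0, 0.1, 0.2)`, the example from the Math Properties slide.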
How to use GDB
- Download calculator.c from the class webpage
- For debugging, we need to compile the file with debugging symbols. This can be done using the -g flag in GCC:
    gcc -Wall -std=gnu99 -g calculator.c -o calculator
- To load the binary into GDB, use the following command:
    gdb calculator
  You should see a bunch of information, including version and license information
- To run the binary in GDB, use the run command (type run or just r). This executes your program until it hits a breakpoint, an error occurs, or it exits. If you want to start stepping through main(), use the start command instead.
- Passing command line arguments in GDB: list them after run, without repeating the program name
    run 3 4 +
- Viewing source code while debugging: use the list command
  - To look at the main function, type list main
  - To list the source around line 45, type list 45
  - To display a range of lines such as lines 10-15, use list 10,15
How to use GDB (continued)
- Setting breakpoints
  - The break command creates a breakpoint (example: break main)
  - Each breakpoint is associated with a number
  - To enable or disable a breakpoint, use the enable or disable command with that number
  - To see a summary of all breakpoints, use the info command (example: info break)
  - To continue execution after a breakpoint, use the continue or c command
- Stepping through source code in GDB
  - To step one line of source code at a time, stepping over function calls, use the next or n command
  - To step into functions, use the step or s command
  - To step out of the current function, use the finish command
- Printing values while debugging
  - Use the print command (example: print x, where x stands for any variable in scope)
- Exiting GDB
  - Press Ctrl-D, or type quit or q