Recitation 4&5 and review 1 & 2 & 3 9-26/28-2017
Recitation 4
IEEE Floating Point Representation
V = (-1)^s * M * 2^E
s: sign bit; s = 0 is positive, s = 1 is negative.
M: significand; 1 ≤ M ≤ 2 - ε for normalized values, 0 ≤ M ≤ 1 - ε for denormalized values.
ε: the smallest fraction value greater than 0 (the weight of the last fraction bit).
E: exponent; it may be negative.
IEEE Floating Point Representation
Field layout: s (1 bit) | exponent (k bits) | fraction (n bits)
32 bit (single precision): 1 sign bit, k = 8, n = 23
64 bit (double precision): 1 sign bit, k = 11, n = 52
IEEE Floating Point Representation (32 bits)
Category       Exponent field     Fraction field
Normalized     ≠ 0 and ≠ 255      any
Denormalized   = 0                any
Infinity       = 255              = 0
NaN            = 255              ≠ 0
IEEE Floating Point Representation
V = (-1)^s * M * 2^E
s: sign bit
E = exponent - Bias (normalized)
  = 1 - Bias (denormalized)
Bias = 2^(k-1) - 1
M = 1 + Fraction (normalized)
  = Fraction (denormalized)
Fraction = (f_(n-1) f_(n-2) … f_1 f_0)_2 * 2^-n = (0.f_(n-1) f_(n-2) … f_1 f_0)_2
Field layout: s (1 bit) | exponent (k bits) | fraction (n bits)
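The formulas on this slide can be turned into a small decoder. The following is a minimal sketch, not a reference implementation: the function name `decode_ieee` and its argument order are assumptions made for illustration, and the bit pattern is passed as a plain integer.

```python
def decode_ieee(bits: int, k: int, n: int) -> float:
    """Decode a (1 + k + n)-bit IEEE-style pattern given as an integer.

    Hypothetical helper: k is the number of exponent bits,
    n is the number of fraction bits.
    """
    sign = (bits >> (k + n)) & 1
    exp = (bits >> n) & ((1 << k) - 1)
    frac_bits = bits & ((1 << n) - 1)
    bias = (1 << (k - 1)) - 1
    fraction = frac_bits / (1 << n)       # (f_(n-1) ... f_0)_2 * 2^-n

    if exp == (1 << k) - 1:               # exponent all ones: infinity or NaN
        if frac_bits != 0:
            return float("nan")
        return float("-inf") if sign else float("inf")
    if exp == 0:                          # denormalized: E = 1 - Bias, M = Fraction
        e, m = 1 - bias, fraction
    else:                                 # normalized: E = exp - Bias, M = 1 + Fraction
        e, m = exp - bias, 1 + fraction
    return (-1) ** sign * m * 2.0 ** e


# Single precision uses k = 8, n = 23.
print(decode_ieee(0b0_01111111_00000000000000000000000, 8, 23))  # 1.0
```

The same function works for small teaching formats by changing `k` and `n`, for example `decode_ieee(pattern, 4, 5)` for a 10-bit format with Bias = 7.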
Normalized
32 bit (single precision): 1 sign bit, k = 8, n = 23
  Bias = 2^(8-1) - 1 = 127
  Exponent field ranges over 1 to 254 (0 and 255 are excluded)
  E = exponent - Bias = -126 to +127
64 bit (double precision): 1 sign bit, k = 11, n = 52
  Bias = 2^(11-1) - 1 = 1023
  Exponent field ranges over 1 to 2046 (0 and 2047 are excluded)
  E = exponent - Bias = -1022 to +1023
Normalized
M = 1 + Fraction
Fraction = (0.f_(n-1) f_(n-2) … f_1 f_0)_2
The largest possible fraction is (0.11…1)_2 = 1 - ε < 1, so M = 1 + Fraction < 2.
Therefore 1 ≤ M ≤ 2 - ε.
Denormalized
Exponent field = 0 (all k bits are 0)
32 bit (single precision): 1 sign bit, k = 8, n = 23
  Bias = 2^(8-1) - 1 = 127
  E = 1 - Bias = -126
64 bit (double precision): 1 sign bit, k = 11, n = 52
  Bias = 2^(11-1) - 1 = 1023
  E = 1 - Bias = -1022
Denormalized
M = Fraction
Fraction = (0.f_(n-1) f_(n-2) … f_1 f_0)_2
The largest possible fraction is (0.11…1)_2 = 1 - ε < 1, so M = Fraction < 1.
Therefore 0 ≤ M ≤ 1 - ε.
Example
FP representation of (40.15625)_10 in 32 bits
Sign bit: s = 0
(40.15625)_10 = (101000.00101)_2
Normalize: 1.0100000101 * 2^5
Convert the exponent to biased form: 127 + 5 = 132, and (132)_10 = (10000100)_2
Result: 0 10000100 01000001010…0 (sign bit, k = 8 exponent bits, n = 23 fraction bits)
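The hand conversion above can be cross-checked with Python's standard `struct` module, which packs a float into its 32-bit single-precision encoding. This is only a checking sketch; the helper name `float32_fields` is made up for this example.

```python
import struct


def float32_fields(x: float):
    """Return (sign, exponent, fraction) bit strings of x as a 32-bit float."""
    # Pack as big-endian single precision, then reinterpret as a 32-bit integer.
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{word:032b}"
    return bits[0], bits[1:9], bits[9:]


sign, exponent, fraction = float32_fields(40.15625)
print(sign, exponent, fraction)
# 0 10000100 01000001010000000000000
```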
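Example
0 10001001 11010001101000010100110 (sign bit, k = 8 exponent bits, n = 23 fraction bits)
s = 0, so the value is positive
exp = (10001001)_2 = 137, which is neither 0 nor 255, so the value is normalized
E = exp - Bias = 137 - (2^(8-1) - 1) = 137 - 127 = 10
M = 1 + Fraction = 1 + (0.f_(n-1) f_(n-2) … f_1 f_0)_2 = (1.11010001101000010100110)_2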
Example (contd.)
V = (-1)^s * M * 2^E
  = (-1)^0 * M * 2^10
  = M * 2^10
  = (1.11010001101000010100110)_2 * 2^10
  = (11101000110.1000010100110)_2
  = 1862.520263671875
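The decoding in this example can be verified in the opposite direction: reinterpret the 32-bit pattern as a single-precision float. The sketch below uses the standard `struct` module; the helper name `bits_to_float32` is an assumption for illustration.

```python
import struct


def bits_to_float32(bit_string: str) -> float:
    """Interpret a 32-character bit string (spaces allowed) as a 32-bit float."""
    word = int(bit_string.replace(" ", ""), 2)
    (value,) = struct.unpack(">f", struct.pack(">I", word))
    return value


value = bits_to_float32("0 10001001 11010001101000010100110")
print(value)  # 1862.520263671875

# The same value from the formula V = M * 2^E with E = 10:
m = 1 + int("11010001101000010100110", 2) / 2 ** 23
print(m * 2 ** 10 == value)  # True
```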
Examples of Positive Floating Point Numbers
Description        Exponent    Fraction
Smallest denorm.   000…000     000…001
Largest denorm.    000…000     111…111
Smallest norm.     000…001     000…000
Largest norm.      111…110     111…111
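For single precision (k = 8, n = 23), the four boundary patterns in this table can be evaluated directly. The sketch below builds each pattern from its exponent and fraction fields; `float32_from_fields` is a hypothetical helper name, and the closed-form values in the comments follow from the E and M formulas on the earlier slides.

```python
import struct


def float32_from_fields(exp_field: int, frac_field: int, sign: int = 0) -> float:
    """Assemble a 32-bit float from its sign, exponent and fraction fields."""
    word = (sign << 31) | (exp_field << 23) | frac_field
    (value,) = struct.unpack(">f", struct.pack(">I", word))
    return value


ALL_ONES_FRAC = (1 << 23) - 1                                # fraction 111...111

smallest_denorm = float32_from_fields(0, 1)                  # 2^-23 * 2^-126
largest_denorm = float32_from_fields(0, ALL_ONES_FRAC)       # (1 - 2^-23) * 2^-126
smallest_norm = float32_from_fields(1, 0)                    # 1.0 * 2^-126
largest_norm = float32_from_fields(254, ALL_ONES_FRAC)       # (2 - 2^-23) * 2^127

for name, v in [("smallest denorm.", smallest_denorm),
                ("largest denorm.", largest_denorm),
                ("smallest norm.", smallest_norm),
                ("largest norm.", largest_norm)]:
    print(f"{name:18s}{v!r}")
```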
Rounding Modes
Mode                 1.40   1.60   1.50   2.50   -1.50
Round-to-even        1      2      2      2      -2
Round-toward-zero    1      1      1      2      -1
Round-down           1      1      1      2      -2
Round-up             2      2      2      3      -1
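The four rounding modes map onto Python built-ins: the one-argument `round()` rounds halfway cases to the even integer, `math.trunc` rounds toward zero, `math.floor` rounds down, and `math.ceil` rounds up. The following sketch reproduces the table; the function name `rounding_modes` is made up for this example.

```python
import math


def rounding_modes(x: float) -> dict:
    """Round x to an integer under each of the four rounding modes."""
    return {
        "round-to-even": round(x),          # ties go to the even integer
        "round-toward-zero": math.trunc(x),
        "round-down": math.floor(x),
        "round-up": math.ceil(x),
    }


for x in (1.40, 1.60, 1.50, 2.50, -1.50):
    print(x, rounding_modes(x))
```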
Recitation 5
Rounding Binary Numbers
Binary fractional numbers
– “Even” when the least significant bit is 0
– “Halfway” when the bits to the right of the rounding position are (100…)_2
Examples
– Round to nearest 1/4 (2 bits to the right of the binary point)
Value     Binary          Rounded     Action          Rounded value
2 3/32    (10.00011)_2    (10.00)_2   (< 1/2, down)   2
2 3/16    (10.00110)_2    (10.01)_2   (> 1/2, up)     2 1/4
2 7/8     (10.11100)_2    (11.00)_2   (= 1/2, up)     3
2 5/8     (10.10100)_2    (10.10)_2   (= 1/2, down)   2 1/2
Round to Even
When rounding to even, consider the two possible choices and choose the one with a 0 in the final position.
Example: round to even at the 1/4 position.
Lower choice    Value                Upper choice    Rounds to
1.10 (1 1/2)    1.1010000 (1 5/8)    1.11 (1 3/4)    1.10 (1 1/2)
1.11 (1 3/4)    1.1110000 (1 7/8)    10.00 (2.0)     10.00 (2.0)
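The rounding rule can be written out explicitly with exact rational arithmetic. This is a minimal sketch assuming non-negative inputs; the function name `round_to_quarter_even` is hypothetical, and `fractions.Fraction` is used so that halfway cases are detected exactly.

```python
from fractions import Fraction


def round_to_quarter_even(value: Fraction) -> Fraction:
    """Round a non-negative value to the nearest 1/4, ties to even."""
    scaled = value * 4                                # measure in units of 1/4
    lower = scaled.numerator // scaled.denominator    # floor of scaled
    remainder = scaled - lower
    half = Fraction(1, 2)
    # Round up when above halfway, or exactly halfway with an odd lower choice
    # (an odd lower choice means its last kept bit is 1).
    if remainder > half or (remainder == half and lower % 2 == 1):
        lower += 1
    return Fraction(lower, 4)


print(round_to_quarter_even(Fraction(13, 8)))   # 1 5/8 -> 3/2
print(round_to_quarter_even(Fraction(15, 8)))   # 1 7/8 -> 2
```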
Rounding practice
Floating Point Representation
An IEEE floating point representation uses 4 exp bits and 5 frac bits.
Review 1 & 2 & 3
Endianness
Bitwise and shift operations
Conversion between binary, decimal, and hexadecimal
Recitation 1: 10 problems
Problem 2: Convert to binary: 0x3A6B
0x3A6B = (0011 1010 0110 1011)b
Recitation 1: 10 problems
Problem 3: Convert from decimal to binary: 935
Problem 4: Convert from decimal to hexadecimal: 935
935 = a*16^2 + b*16^1 + c
    = a*256 + b*16 + c
    = 3*256 + 10*16 + 7
    = 0x3A7
    = (0011 1010 0111)b
Recitation 1: 10 problems
Problem 5: Convert from binary to hexadecimal: 1011011101
(1011011101)b = (10 1101 1101)b = (2 D D)h = 0x2DD
Problem 6: Convert from binary to decimal: 1011011101
(1011011101)b = 0x2DD = 2*16^2 + 13*16 + 13 = 733
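The answers to Problems 2 through 6 can be checked with Python's built-in base conversions (`int(text, base)` and format specifiers). The helper `to_binary_groups` below is a made-up convenience for printing binary in 4-bit groups, matching the notation on the slides.

```python
def to_binary_groups(value: int) -> str:
    """Binary digits of value, zero-padded to whole nibbles and grouped by 4."""
    digits = f"{value:b}"
    width = -(-len(digits) // 4) * 4        # round length up to a multiple of 4
    digits = digits.zfill(width)
    return " ".join(digits[i:i + 4] for i in range(0, width, 4))


print(to_binary_groups(0x3A6B))        # Problem 2: 0011 1010 0110 1011
print(to_binary_groups(935))           # Problem 3: 0011 1010 0111
print(f"0x{935:X}")                    # Problem 4: 0x3A7
print(f"0x{int('1011011101', 2):X}")   # Problem 5: 0x2DD
print(int("1011011101", 2))            # Problem 6: 733
```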
Little Endian vs Big Endian
Little endian: store the least significant byte at the smallest address and the most significant byte at the largest address.
Big endian: store the most significant byte at the smallest address and the least significant byte at the largest address.
For example, the 4-byte value 0x01234567 is stored in memory as follows:
Address          1000   1001   1002   1003
Little endian    67     45     23     01
Big endian       01     23     45     67
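The two byte orders can be reproduced with `int.to_bytes`, which takes the byte order as an argument. In this sketch the function name `memory_layout` and the starting address 1000 (taken from the slide) are illustrative assumptions; real addresses depend on where the value is stored.

```python
def memory_layout(value: int, byteorder: str, base_address: int = 1000) -> dict:
    """Map each address to the byte (two hex digits) of a 4-byte value."""
    data = value.to_bytes(4, byteorder=byteorder)
    return {base_address + i: f"{byte:02x}" for i, byte in enumerate(data)}


print(memory_layout(0x01234567, "little"))
# {1000: '67', 1001: '45', 1002: '23', 1003: '01'}
print(memory_layout(0x01234567, "big"))
# {1000: '01', 1001: '23', 1002: '45', 1003: '67'}
```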
Two’s Complement Addition
n = 5 bits: (-3)_10 + (-10)_10
N = (3)_10 = (00011)_2, so N* = -N = (11101)_2
N = (10)_10 = (01010)_2, so N* = -N = (10110)_2
    11101
  + 10110
  -------
  1 10011   (the carry out of the 5-bit result is ignored)
Result: N* = (10011)_2 = (-13)_10, because N = (01101)_2 = (13)_10
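The 5-bit addition can be simulated by masking every result to n bits, which is what "ignoring the carry" amounts to. This is a sketch with made-up helper names (`to_twos_complement`, `from_twos_complement`, `add_n_bits`); Python integers are unbounded, so the mask supplies the fixed width.

```python
def to_twos_complement(value: int, n: int) -> int:
    """n-bit two's complement pattern of value, as a non-negative integer."""
    return value & ((1 << n) - 1)


def from_twos_complement(pattern: int, n: int) -> int:
    """Signed value of an n-bit two's complement pattern."""
    if pattern & (1 << (n - 1)):            # sign bit set: value is negative
        return pattern - (1 << n)
    return pattern


def add_n_bits(a: int, b: int, n: int) -> int:
    """Add two signed values in n bits, discarding the carry out."""
    total = to_twos_complement(a, n) + to_twos_complement(b, n)
    return total & ((1 << n) - 1)


result = add_n_bits(-3, -10, 5)
print(f"{to_twos_complement(-3, 5):05b} + {to_twos_complement(-10, 5):05b} = {result:05b}")
# 11101 + 10110 = 10011
print(from_twos_complement(result, 5))  # -13
```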
Bitwise Operators
Or: A|B = 1 when either A=1 or B=1
And: A&B = 1 when both A=1 and B=1
Not: ~A = 1 when A=0
Exclusive-Or (Xor): A^B = 1 when either A=1 or B=1, but not both
Bitwise Operators
Operate on bit vectors; operations are applied bitwise.
  01101001 & 01010101 = 01000001
  01101001 | 01010101 = 01111101
  01101001 ^ 01010101 = 00111100
           ~ 01010101 = 10101010
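The four results can be reproduced with Python's bitwise operators. Python integers have no fixed width, so this sketch masks each result to 8 bits to imitate a byte-sized bit vector; `bitwise_table` is a hypothetical name.

```python
MASK = 0xFF  # keep only the low 8 bits, imitating an 8-bit vector


def bitwise_table(a: int, b: int) -> dict:
    """Apply &, |, ^ to a and b, and ~ to b, as 8-bit strings."""
    return {
        "a & b": f"{a & b & MASK:08b}",
        "a | b": f"{(a | b) & MASK:08b}",
        "a ^ b": f"{(a ^ b) & MASK:08b}",
        "~b": f"{~b & MASK:08b}",
    }


for op, bits in bitwise_table(0b01101001, 0b01010101).items():
    print(f"{op:6s}{bits}")
```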
Shift Operations
Left shift: x << y
– Shift bit-vector x left y positions
– Throw away extra bits on the left
– Fill with 0’s on the right
Right shift: x >> y
– Shift bit-vector x right y positions
– Throw away extra bits on the right
– Logical shift: fill with 0’s on the left
– Arithmetic shift: replicate the most significant bit on the left
Undefined behavior
– Shift amount < 0 or ≥ word size
Argument x     01100010    10100010
<< 3           00010000    00010000
Log. >> 2      00011000    00101000
Arith. >> 2    00011000    11101000
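The three shifts can be simulated on 8-bit vectors as follows. This is a sketch under stated assumptions: the function names and the 8-bit default width are illustrative, and because Python integers are unbounded, the width is enforced with a mask. Where the slide says the behavior is undefined, this sketch chooses to raise an error instead.

```python
def _check(y: int, width: int) -> None:
    # The slides call these shift amounts undefined; here they are rejected.
    if y < 0 or y >= width:
        raise ValueError("shift amount must satisfy 0 <= y < word size")


def shift_left(x: int, y: int, width: int = 8) -> int:
    _check(y, width)
    return (x << y) & ((1 << width) - 1)     # drop bits shifted out on the left


def shift_right_logical(x: int, y: int, width: int = 8) -> int:
    _check(y, width)
    return (x & ((1 << width) - 1)) >> y     # zeros enter on the left


def shift_right_arithmetic(x: int, y: int, width: int = 8) -> int:
    _check(y, width)
    mask = (1 << width) - 1
    x &= mask
    if x & (1 << (width - 1)):               # reinterpret as a negative value
        x -= 1 << width
    return (x >> y) & mask                   # Python's >> replicates the sign


for x in (0b01100010, 0b10100010):
    print(f"{x:08b}",
          f"{shift_left(x, 3):08b}",
          f"{shift_right_logical(x, 2):08b}",
          f"{shift_right_arithmetic(x, 2):08b}")
# 01100010 00010000 00011000 00011000
# 10100010 00010000 00101000 11101000
```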