1
Recitations 4 & 5 and Review of Recitations 1, 2 & 3
9-26/
2
Recitation 4
3
IEEE Floating Point Representation
V = (-1)^s × M × 2^E
s: sign bit; s = 0 for positive, s = 1 for negative.
M: significand; 1 ≤ M ≤ 2 − ε for normalized values, 0 ≤ M ≤ 1 − ε for denormalized values, where ε is the smallest representable fraction greater than 0 (ε = 2^-n for an n-bit fraction).
E: exponent, which may be negative.
4
IEEE Floating Point Representation
Field layout: s (1 bit) | exponent (k bits) | fraction (n bits)
32 bit (single precision): 1 sign bit, k = 8, n = 23
64 bit (double precision): 1 sign bit, k = 11, n = 52
5
IEEE Floating Point Representation
Categories of 32-bit values, by field contents (s | exponent | fraction):
Normalized: exponent ≠ 0 and ≠ 255, any fraction
Denormalized: exponent = 0, any fraction
Infinity: exponent = 255, fraction = 0
NaN: exponent = 255, fraction ≠ 0
6
IEEE Floating Point Representation
V = (-1)^s × M × 2^E
s: sign bit
E = exponent − Bias (normalized)
  = 1 − Bias (denormalized)
Bias = 2^(k-1) − 1
M = 1 + Fraction (normalized)
  = Fraction (denormalized)
Fraction = (f_{n-1} f_{n-2} … f_1 f_0)_2 × 2^-n, that is, the binary fraction 0.f_{n-1} f_{n-2} … f_1 f_0
Field layout: s (1 bit) | exponent (k bits) | fraction (n bits)
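The formulas above can be sketched as a small decoder. This is a minimal illustration, not a reference implementation: the function name `decode` and its parameters are hypothetical, and the bit pattern is assumed to be passed as an unsigned integer of width 1 + k + n.

```python
def decode(bits: int, k: int, n: int) -> float:
    """Decode a (1 + k + n)-bit IEEE-style pattern using V = (-1)^s * M * 2^E."""
    sign = (bits >> (k + n)) & 1
    exponent = (bits >> n) & ((1 << k) - 1)   # k-bit exponent field
    frac_bits = bits & ((1 << n) - 1)         # n-bit fraction field
    bias = (1 << (k - 1)) - 1                 # Bias = 2^(k-1) - 1
    fraction = frac_bits / (1 << n)           # (f_{n-1}...f_0)_2 * 2^-n

    if exponent == (1 << k) - 1:              # all ones: infinity or NaN
        value = float("inf") if frac_bits == 0 else float("nan")
    elif exponent == 0:                       # denormalized: E = 1 - Bias, M = Fraction
        value = fraction * 2.0 ** (1 - bias)
    else:                                     # normalized: E = exponent - Bias, M = 1 + Fraction
        value = (1 + fraction) * 2.0 ** (exponent - bias)
    return -value if sign else value


print(decode(0x3F800000, k=8, n=23))  # single-precision pattern for 1.0
```

Passing k and n as arguments keeps the same code usable for single precision (8, 23), double precision (11, 52), and small teaching formats.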
7
Normalized
32 bit (single precision): 1 sign bit, k = 8, n = 23
Bias = 2^7 − 1 = 127
Exponent field ranges over 0 to 255, excluding 0 and 255 (that is, 1 to 254)
E = exponent − Bias = −126 to +127
64 bit (double precision): 1 sign bit, k = 11, n = 52
Bias = 2^10 − 1 = 1023
Exponent field ranges over 0 to 2047, excluding 0 and 2047 (that is, 1 to 2046)
E = exponent − Bias = −1022 to +1023
8
Normalized
M = 1 + Fraction
Fraction = (f_{n-1} f_{n-2} … f_1 f_0)_2 × 2^-n
The largest fraction is 0.11…1 = 1 − 2^-n < 1, so M = 1 + Fraction < 2.
Hence 1 ≤ M ≤ 2 − ε, with ε = 2^-n.
9
Denormalized
Exponent field = 0 (all k bits are 0)
32 bit (single precision): 1 sign bit, k = 8, n = 23
Bias = 2^7 − 1 = 127
E = 1 − Bias = −126
64 bit (double precision): 1 sign bit, k = 11, n = 52
Bias = 2^10 − 1 = 1023
E = 1 − Bias = −1022
10
Denormalized
M = Fraction
Fraction = (f_{n-1} f_{n-2} … f_1 f_0)_2 × 2^-n
The largest fraction is 0.11…1 = 1 − 2^-n < 1, so M = Fraction < 1.
Hence 0 ≤ M ≤ 1 − ε, with ε = 2^-n.
11
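Example
FP representation of (40.15625)10 in 32 bit
Sign bit: s = 0
(40.15625)10 = (101000.00101)2
Normalize: 1.0100000101 × 2^5
Convert the exponent to biased form: 5 + 127 = 132
(132)10 = (10000100)2
Result: s = 0 | exponent = 10000100 | fraction = 01000001010000000000000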
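The encoding of 40.15625 can be checked against the machine's own single-precision format. The sketch below uses Python's standard `struct` module; the helper name `float_to_fields` is hypothetical.

```python
import struct


def float_to_fields(x: float):
    """Return the (sign, exponent, fraction) bit strings of x as a 32-bit float."""
    # Pack as a big-endian single-precision float, then reread the same 4 bytes
    # as an unsigned 32-bit integer to get at the raw bit pattern.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = format(bits, "032b")
    return s[0], s[1:9], s[9:]   # 1 sign bit, 8 exponent bits, 23 fraction bits


sign, exponent, fraction = float_to_fields(40.15625)
print(sign, exponent, fraction)
# 0 10000100 01000001010000000000000
```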
12
Example
s = 0, so the value is positive
exp = (10001001)2 = 137 (normalized)
E = exp − Bias = 137 − 127 = 10
M = 1 + Fraction = 1 + (f_{n-1} f_{n-2} … f_1 f_0)_2 × 2^-n
13
Example (Contd.)
V = (-1)^s × M × 2^E = (-1)^0 × M × 2^10 = M × 2^10
14
Examples of Positive Floating Point Numbers
Description        Exponent   Fraction
Smallest denorm.   000…000    000…001
Largest denorm.    000…000    111…111
Smallest norm.     000…001    000…000
Largest norm.      111…110    111…111
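For the 32-bit format, the four bit patterns in this table can be turned into actual values. This is a sketch that assumes single precision (k = 8, n = 23); the helper name `bits_to_float` is hypothetical.

```python
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


smallest_denorm = bits_to_float(0x00000001)  # exponent 000...000, fraction 000...001
largest_denorm = bits_to_float(0x007FFFFF)   # exponent 000...000, fraction 111...111
smallest_norm = bits_to_float(0x00800000)    # exponent 000...001, fraction 000...000
largest_norm = bits_to_float(0x7F7FFFFF)     # exponent 111...110, fraction 111...111

for name, value in [("smallest denorm", smallest_denorm),
                    ("largest denorm", largest_denorm),
                    ("smallest norm", smallest_norm),
                    ("largest norm", largest_norm)]:
    print(f"{name}: {value!r}")
```

The largest denormalized value sits just below the smallest normalized one, so there is no gap between the two ranges.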
15
Rounding Modes
Mode                 1.40   1.60   1.50   2.50   -1.50
Round-to-even        1      2      2      2      -2
Round-toward-zero    1      1      1      2      -1
Round-down           1      1      1      2      -2
Round-up             2      2      2      3      -1
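The four modes map onto standard Python functions, which gives a quick way to reproduce the table. This sketch assumes the built-in `round` (which rounds halves to even in Python 3) and the `math` module; the dictionary names are illustrative only.

```python
import math

values = [1.40, 1.60, 1.50, 2.50, -1.50]

# Each rounding mode paired with a standard-library function that implements it
# for rounding to the nearest integer.
modes = {
    "Round-to-even": round,           # ties go to the even neighbour
    "Round-toward-zero": math.trunc,  # drop the fractional part
    "Round-down": math.floor,         # toward negative infinity
    "Round-up": math.ceil,            # toward positive infinity
}

table = {name: [fn(v) for v in values] for name, fn in modes.items()}
for name, row in table.items():
    print(f"{name:<18}", row)
```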
16
Recitation 5
17
Rounding Binary Numbers
Binary Fractional Numbers
– “Even” when least significant bit is 0
– “Half way” when bits to right of rounding position = (100…)2
Examples
– Round to nearest 1/4 (2 bits right of binary point)
Value    Binary      Rounded   Action          Rounded Value
2 3/32   10.00011    10.00     (<1/2 - down)   2
2 3/16   10.00110    10.01     (>1/2 - up)     2 1/4
2 7/8    10.11100    11.00     (1/2 - up)      3
2 5/8    10.10100    10.10     (1/2 - down)    2 1/2
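The rounding examples can be checked with exact rational arithmetic. The sketch below assumes Python's `fractions.Fraction`, whose `round()` sends halfway cases to the even integer; scaling by 4 turns "nearest 1/4" into "nearest integer". The function name `round_to_quarter` is hypothetical.

```python
from fractions import Fraction


def round_to_quarter(x: Fraction) -> Fraction:
    """Round x to the nearest multiple of 1/4, sending ties to the even choice."""
    # x * 4 counts quarters; round() on a Fraction rounds half to even,
    # which matches choosing the result whose last kept bit is 0.
    return Fraction(round(x * 4), 4)


examples = [Fraction(67, 32),   # 2 3/32
            Fraction(35, 16),   # 2 3/16
            Fraction(23, 8),    # 2 7/8
            Fraction(21, 8)]    # 2 5/8
rounded = [round_to_quarter(x) for x in examples]
print(rounded)
```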
18
Round to Even
When rounding to even, consider the two possible choices and choose the one with a 0 in the final position.
Example: round to even at the 1/4 position.
19
Rounding practice
20
Floating Point Representation
21
An IEEE floating point representation uses 4 exp bits and 5 frac bits.
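For a small format like this one, the key parameters follow directly from the Bias, E, and M definitions. The sketch below only assumes k = 4 exponent bits and n = 5 fraction bits as stated; the variable names are illustrative.

```python
k, n = 4, 5                                      # exponent bits, fraction bits

bias = 2 ** (k - 1) - 1                          # Bias = 2^(k-1) - 1
e_min = 1 - bias                                 # E for denormalized and smallest normalized
e_max = (2 ** k - 2) - bias                      # largest exponent field that is not all ones

smallest_denorm = 2.0 ** -n * 2.0 ** e_min       # M = 2^-n, E = 1 - Bias
smallest_norm = 1.0 * 2.0 ** e_min               # M = 1, exponent field = 1
largest_norm = (2 - 2.0 ** -n) * 2.0 ** e_max    # M = 2 - 2^-n, exponent field = 2^k - 2

print(bias, e_min, e_max, smallest_denorm, smallest_norm, largest_norm)
```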
23
Review 1 & 2 & 3
Endianness
Bitwise & shift operations
Conversion between binary, decimal, hexadecimal
25
Recitation 1: 10 problems
Problem 2: Convert to binary: 0x3A6B
0x3A6B = (0011 1010 0110 1011)b
26
Recitation 1: 10 problems
Problem 3: Convert from decimal to binary: 935
Problem 4: Convert from decimal to hexadecimal: 935
935 = a*16^2 + b*16^1 + c
    = a*256 + b*16 + c
    = 3*256 + 10*16 + 7
    = 0x3A7
    = (0011 1010 0111)b
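Finding a, b and c amounts to repeated division by the base, collecting remainders from least to most significant. A minimal sketch, with the hypothetical helper `to_digits`:

```python
def to_digits(value: int, base: int) -> list:
    """Return the digits of a non-negative integer in the given base, most significant first."""
    digits = []
    while value > 0:
        value, remainder = divmod(value, base)  # peel off the lowest digit
        digits.append(remainder)
    return digits[::-1] or [0]


hex_digits = to_digits(935, 16)   # [3, 10, 7], i.e. 0x3A7
bin_digits = to_digits(935, 2)    # the same value, one bit per digit
print(hex_digits, "".join(map(str, bin_digits)))
```

Python's built-in `format(935, "X")` and `format(935, "b")` give the same results directly.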
27
Recitation 1: 10 problems
Problem 5: Convert from binary to hexadecimal: 1011011101
(1011011101)b = (10 1101 1101)b = (2 D D)h = 0x2DD
Problem 6: Convert from binary to decimal: 1011011101
(1011011101)b = 0x2DD = 2*16^2 + 13*16 + 13 = 733
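Grouping bits in fours from the right, as done by hand here, can be sketched as follows. The helper name `bin_to_hex` is hypothetical; `int(text, 2)` is the standard way to parse a binary string.

```python
def bin_to_hex(bits: str) -> str:
    """Convert a binary digit string to hexadecimal by grouping four bits at a time."""
    width = -(-len(bits) // 4) * 4           # round the length up to a multiple of 4
    padded = bits.zfill(width)               # pad with leading zeros on the left
    groups = [padded[i:i + 4] for i in range(0, width, 4)]
    return "".join("0123456789ABCDEF"[int(g, 2)] for g in groups)


hex_text = bin_to_hex("1011011101")          # groups: 0010 1101 1101
decimal = int("1011011101", 2)
print(hex_text, decimal)
# 2DD 733
```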
28
Little Endian vs Big Endian
Little endian: store the least significant byte at the smallest address and the most significant byte at the largest address.
Big endian: store the most significant byte at the smallest address and the least significant byte at the largest address.
So we know how 0x01234567 is stored in memory:
Address         1000   1001   1002   1003
Little endian   67     45     23     01
Big endian      01     23     45     67
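The two byte orders can be reproduced with `int.to_bytes`. This sketch assumes the 4-byte value 0x01234567, which is consistent with the byte listing on this slide.

```python
value = 0x01234567  # assumed example value, matching the bytes 01 23 45 67

# Index 0 of each result corresponds to the smallest address (1000 on the slide).
little = value.to_bytes(4, byteorder="little")
big = value.to_bytes(4, byteorder="big")

print("little:", [f"{b:02x}" for b in little])
print("big:   ", [f"{b:02x}" for b in big])
# little: ['67', '45', '23', '01']
# big:    ['01', '23', '45', '67']
```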
29
Two’s Complement Addition
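n = 5 bits: (-3)10 + (-10)10
N = (3)10 = (00011)2, so N* = -N = (11101)2
N = (10)10 = (01010)2, so N* = -N = (10110)2
  11101
+ 10110
-------
1 10011   (the carry out of the 5-bit word is ignored)
N* = (10011)2, so N = (01101)2 = (13)10 and N* = -N = (-13)10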
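The same 5-bit addition can be sketched with integer masking. The names `N_BITS`, `to_pattern` and `to_signed` are hypothetical; masking with 2^5 − 1 plays the role of ignoring the carry out.

```python
N_BITS = 5
MASK = (1 << N_BITS) - 1                 # 0b11111 keeps only the low 5 bits


def to_pattern(x: int) -> int:
    """Two's complement bit pattern of x in N_BITS bits."""
    return x & MASK


def to_signed(pattern: int) -> int:
    """Interpret an N_BITS-wide pattern as a signed two's complement value."""
    if pattern & (1 << (N_BITS - 1)):    # top bit set: the value is negative
        return pattern - (1 << N_BITS)
    return pattern


a = to_pattern(-3)                       # 0b11101
b = to_pattern(-10)                      # 0b10110
total = (a + b) & MASK                   # the carry out of bit 4 is dropped
print(format(a, "05b"), format(b, "05b"), format(total, "05b"), to_signed(total))
# 11101 10110 10011 -13
```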
30
Bitwise Operators
Or: A|B = 1 when either A=1 or B=1 (or both)
And: A&B = 1 when both A=1 and B=1
Not: ~A = 1 when A=0
Exclusive-Or (Xor): A^B = 1 when either A=1 or B=1, but not both
31
Bitwise Operators
Operate on bit vectors
Operations are applied bitwise: & | ^ ~
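A short sketch of the four operators applied to 8-bit vectors. The operand values are chosen here for illustration and are not taken from the slide. Python integers are unbounded, so `~` is masked to 8 bits to mimic a fixed-width word.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1        # 0xFF, keeps results within 8 bits

a = 0b01101001                 # illustrative operands (assumed, not from the slide)
b = 0b01010101

results = {
    "a & b": a & b,            # 1 where both bits are 1
    "a | b": a | b,            # 1 where at least one bit is 1
    "a ^ b": a ^ b,            # 1 where the bits differ
    "~a": ~a & MASK,           # every bit flipped, truncated to 8 bits
}
for name, value in results.items():
    print(f"{name:<6}= {value:08b}")
```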
32
Shift Operations
Left shift: x << y
– Shift bit-vector x left y positions
– Throw away extra bits on left
– Fill with 0’s on right
Right shift: x >> y
– Shift bit-vector x right y positions
– Throw away extra bits on right
– Logical shift: fill with 0’s on left
– Arithmetic shift: replicate most significant bit on left
Undefined behavior
– Shift amount < 0 or ≥ word size
Examples (table on slide): Argument x, x << 3, Log. >> 2, Arith. >> 2
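The three shifts can be emulated on a fixed-width word. This is a sketch under stated assumptions: an 8-bit word, illustrative argument values that are not taken from the slide, and hypothetical helper names `shl`, `shr_logical` and `shr_arith`. Python integers have no fixed width, so the word size is imposed by masking.

```python
WORD = 8
MASK = (1 << WORD) - 1


def shl(x: int, y: int) -> int:
    """Left shift: bits pushed past the top of the word are thrown away."""
    return (x << y) & MASK


def shr_logical(x: int, y: int) -> int:
    """Logical right shift: vacated positions on the left are filled with 0s."""
    return (x & MASK) >> y


def shr_arith(x: int, y: int) -> int:
    """Arithmetic right shift: the most significant bit is replicated on the left."""
    signed = x - (1 << WORD) if x & (1 << (WORD - 1)) else x
    return (signed >> y) & MASK          # >> on a negative Python int sign-extends


for x in (0b01100010, 0b10100010):       # illustrative arguments (assumed)
    print(f"{x:08b}  << 3: {shl(x, 3):08b}  "
          f"log >> 2: {shr_logical(x, 2):08b}  arith >> 2: {shr_arith(x, 2):08b}")
```

The two right shifts agree when the top bit of x is 0 and differ only when it is 1.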