Download presentation
Presentation is loading. Please wait.
1
CSE 246: Computer Arithmetic Algorithms and Hardware Design Instructor: Prof. Chung-Kuan Cheng Winter 2004 Lecture 9
2
CSE 2462 Topics: Floating Point Numbers (IEEE P754) Standard Operations Exceptional Situations Rounding Modes
3
CSE 2463 Standard 2 32 Typically Goal: Dynamic Range: largest #/ smallest # If too large, holes between # ’ s
4
CSE 2464 Standard ulp (unit in the last place) Difference between two consecutive values of the significand. 3 Parts x = s b e Sign Bit 8-bit exponent Significand
5
CSE 2465 Standard a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 b 1 b 2 b 3 b 22 b 23 1.* normalized number 0.* denormalized number 0 0.b 1 b 2 b 3 b 22 b 23 2 -126 1--------------------------------- 1. b 1 b 2 b 3 b 22 b 23 2 -126 2. 253 254 ------------------------------- 1. b 1 b 2 b 3 b 22 b 23 2 127 255 if b i = 0 for all i = 1,2, …,23, NaN otherwise NaN Not a Number
6
CSE 2466 Standard 0.01x2 -3 = 0.00x2 -2 Same number, so normalize to remove redundancy Smallest Number 0.00 … 01x2 -126 = 1.0x2 -23 x2 -126 = 1x2 -149 1.1101111001110011100101 Difference between 2 # ’ s small for normalized 0.0001 2 times compared to magnitudes 0.0010
7
CSE 2467 Standard - Example s. eeeeeeee nnnnnnnnnnnnnnnnnnnnnnn 0.00000000 00000000000000000000000 = 0.000 … 0x2 -126 1.00000000 00000000000000000000000 = 0 0.00000001 00000000000000000000000 = 1.000 … 0x2 -126 - minimal normalized # 0.00000001 00000000000000000000001 = 1.000 … 1x2 -126. 0.01111111 00000000000000000000001 = 1.000 … 1x2 0 0.10000000 00000000000000000000001 = 1.000 … 0x2 1
8
CSE 2468 Standard – Example Cont. 0.11111110 00000000000000000000001 = 1.000 … 1x2 127 0.11111110 11111111111111111111111 = 1.111 … 1x2 127 - Normalized Maximum 0.11111111 00000000000000000000000 = N min = 1.0 x 2 -126 N max = (2 – 2 -23 )2 127
9
CSE 2469 Double Floating Point a 1 a 2 … a 11 b 1 b 2 … b 52 000 … 00 0. b 1 b 2 … b 52 x 2 -1023 000 … 01 1. b 1 b 2 … b 52 x 2 -1022. 011 … 11 1. b 1 b 2 … b 52 x 2 0 100 … 00 1. b 1 b 2 … b 52 x 2 1. 111 … 10 1. b 1 b 2 … b 52 x 2 1023 111 … 11 = if b i = 0 for all i = 1,2, …,52
10
CSE 24610 Overflow/Underflow N max N min SparserDenser Overflow Underflow
11
CSE 24611 Addition/Multiplication s 1 xb e1 + ( s 2 xb e2 ) = sxb e = s 1 xb e1 + s 2 /b e1-e2 x b e1 = ( s 1 s 2 /b e1-e2 ) x b e1 ( s 1 xb e1 ) x ( s 2 xb e2 ) = (s 1 xs 2 )b e1+e2
12
CSE 24612 Exceptions a/0 = if a > 0 a/ = 0if a != 0 a · 0 = 0 a · = if a > 0 0 · = invalid operation (NaN) 0/0 = invalid operation (NaN) NaP op a = NaN a + = - = NaN
13
CSE 24613 Rounding Mode Adder Output = Cout z 1 z 0.z -1 z -2 … z - l GRS Guard Bit Round Bit Sticky Bit, OR of all bits below bit R 1.101 x 2 3 +1.110 x 2 3 11.011 x 2 3 1.1011x2 4 Normalize – need to round or
14
CSE 24614 Rouding 1.110 x 2 3 - 1.101 x 2 3 0.001 x 2 3 1.000 x 2 0 normalize 1.101 x 2 3 - 1.111 x 2 2 1.101 x 2 3 - 0.1111 x 2 3 0.1101 x 2 3 1.101 x 2 2 Guard bit
15
CSE 24615 Rounding Round to the nearest even toward 0 1.1011 Toward + 1.1100 Toward - 1.1011
16
CSE 24616 Conventional Rounding Error Rounding Error 1.10100 1.101= 0 1.10101 1.101= -0.25 1.10110 1.110 = +0.5 1.10111 1.110 = +0.25 Average Error = 0.5/4 = 0.125
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.