Download presentation
Presentation is loading. Please wait.
Published byTodd Hoover Modified over 9 years ago
1
Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?
2
Representing fractions – Fixed point a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 a 9 a 10 A number with 10 bits
3
Representing fractions – Fixed point a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 a 9 a 10 A number with 10 bits a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 8. a 9 a 10 Fixing the point
4
Representing fractions – Fixed point NumberFraction Signed (2 complement) -2 7 …2 7 -1Quanta of 0.25 Unsigned0…2 8 -1Quanta of 0.25 Range of representation:
5
Fixed point : the problem Cannot represent wide ranges of numbers. In scientific applications.
6
Representing Fractions – Floating point 10 1 * 10 1 -1.23 * 10 -2 Base (radix) - r -0.123
7
Representing Fractions – Floating point 10 1 * 10 1 -1.23 * 10 -1 exponent -0.123
8
Representing Fractions – Floating point 10 1 * 10 1 -1.23 * 10 -1 Number (Mantissa) -0.123
9
Representing Fractions – Floating point 10 -0.123 (-1) 0 *1 * 10 1 (-1) 1 *1.23 * 10 -1 Sign bit
10
Problem of uniqueness 0.1 100*10 -4 0.001*10 2 Representation is not Unique
11
Problem of uniqueness - Normalization 0.61 610*10 -4 0.0061*10 2 Standardization 6.1*10 -1 One digit to the Left of the point
12
Normalized Binary Floating point D = (-1) a 0 * (1.a 1 a 2 a 3 …)*2 b 1 b 2 b 3 … a 0 b 1 b 2 …b n a 1 a 2 a 3 …a m String of bits
13
Floating point - Questions Representing the (signed) exponent How to represent zero? And Nan, infinity ? How to add, subtract and multiply? Rounding Errors.
14
Floating point – Representing the exponent How to represent singed number ? Sign bit 2-Complement
15
Floating point – Representing the exponent How to represent singed number ? Sign bit 2-Complement Neither
16
Floating point – Representing the exponent We want the exponent to be binary ordered: 0000 < 0001 < …. < 1000 < … < 1111
17
Floating point – Representing the exponent Number = Number - B 000…0001 111…1110 e min e max Usually B = 2 n-1 -1 We define the following sizes like this:
18
Floating point – Representing zero,NAN, ± ExponentMantissaRepresent e = e min -1M = 0±0 e = e min -1M ≠ 00.M*2 e min e min ≤ e ≤ e max 1.M*2 e e = e max +1M = 0 ±± e = e max +1M ≠ 0NaN IEEE754 special values Denormalized number normalized number
19
IEEE 754 ParameterSingleDouble Mantissa2453 e max 1271023 e min -126-1022 Exponent width 811 Format width 3264 (Including the sign Bit)
20
What is NaN (not a number) OperationNaN produced by + + (- ) * 0 * / 0/0, / Partial list Operation X±NaN X*NaN X/NaN
21
Infinity Provide a safe was to continue calculation when overflow is encountered.
22
Calculations with Floating Point numbers Addition: Equalize the exponents (smaller larger exponent) Sum the mantissa Renormalize if necessary
23
Calculations with Floating Point numbers Example (in base 10): 91 9.10*10 1 |E| = 1, |M| = 3 9.7 9.70*10 0
24
Calculations with Floating Point numbers 9.10*10 1 + 9.70*10 0 Not The same Order.
25
Calculations with Floating Point numbers 9.10*10 1 + 9.70*10 0 9.10*10 1 + 0.97*10 1 10.7*10 1 renormalize 1.07*10 2
26
Calculations with Floating Point numbers Example II (in base 10): 91 9.10*10 1 |E| = 1, |M| = 3 9.75 9.75*10 0
27
Calculations with Floating Point numbers 9.10*10 1 + 9.75*10 0 Not The same Order.
28
Calculations with Floating Point numbers 9.10*10 1 + 9.75*10 0 9.10 *10 1 + 0.975*10 1 10.75*10 1 renormalize 1.07*10 2 5 (rounding error)
29
Rounding Errors The Problem: Squeezing infinite many real numbers into a finite number of bits
30
Measuring Rounding Errors Units in last place (Ulps) Relative Error
31
Measuring Rounding Errors – ULP error = |d.dddd – (z/r e )|*r p-1 If d.dddd*r e represent z p digits
32
Measuring Rounding Errors – ULP Example I: r = 10, p = 3 The number 3.14*10 -2 represents 0.0314159 Error = 0.159
33
Measuring Rounding Errors – ULP What is the maximum ULP if the rounding is toward the nearest number? 0.5 ULP
34
Measuring Rounding Errors – Relative Error Relative error = |d.dddd*r e – z|/z If d.dddd*r e represent z p digits
35
Measuring Rounding Errors – Relative errors Example I: r = 10, p = 3 The number 3.14*10 -2 represents 0.0314159 Relative Error ~ 0.0005
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.