Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Representing fractions – Fixed point a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 a 9 a 10 A number with 10 bits

Representing fractions – Fixed point a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 a 9 a 10 A number with 10 bits a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 8. a 9 a 10 Fixing the point

Representing fractions – Fixed point NumberFraction Signed (2 complement) -2 7 …2 7 -1Quanta of 0.25 Unsigned0…2 8 -1Quanta of 0.25 Range of representation:

Fixed point : the problem Cannot represent wide ranges of numbers. In scientific applications.

Representing Fractions – Floating point 10 1 * 10 1 -1.23 * 10 -2 Base (radix) - r -0.123

Representing Fractions – Floating point 10 1 * 10 1 -1.23 * 10 -1 exponent -0.123

Representing Fractions – Floating point 10 1 * 10 1 -1.23 * 10 -1 Number (Mantissa) -0.123

Representing Fractions – Floating point 10 -0.123 (-1) 0 *1 * 10 1 (-1) 1 *1.23 * 10 -1 Sign bit

Problem of uniqueness 0.1 100*10 -4 0.001*10 2 Representation is not Unique

Problem of uniqueness - Normalization 0.61 610*10 -4 0.0061*10 2 Standardization 6.1*10 -1 One digit to the Left of the point

Normalized Binary Floating point D = (-1) a 0 * (1.a 1 a 2 a 3 …)*2 b 1 b 2 b 3 … a 0 b 1 b 2 …b n a 1 a 2 a 3 …a m String of bits

Floating point - Questions Representing the (signed) exponent How to represent zero? And Nan, infinity ? How to add, subtract and multiply? Rounding Errors.

Floating point – Representing the exponent How to represent singed number ? Sign bit 2-Complement

Floating point – Representing the exponent How to represent singed number ? Sign bit 2-Complement Neither

Floating point – Representing the exponent We want the exponent to be binary ordered: 0000 < 0001 < …. < 1000 < … < 1111

Floating point – Representing the exponent Number = Number - B 000…0001 111…1110 e min e max Usually B = 2 n-1 -1 We define the following sizes like this:

Floating point – Representing zero,NAN,  ±  ExponentMantissaRepresent e = e min -1M = 0±0 e = e min -1M ≠ 00.M*2 e min e min ≤ e ≤ e max 1.M*2 e e = e max +1M = 0 ±± e = e max +1M ≠ 0NaN IEEE754 special values Denormalized number normalized number

IEEE 754 ParameterSingleDouble Mantissa2453 e max 1271023 e min -126-1022 Exponent width 811 Format width 3264 (Including the sign Bit)

What is NaN (not a number) OperationNaN produced by +  + (-  ) * 0 *  / 0/0,  /  Partial list Operation X±NaN X*NaN X/NaN

Infinity Provide a safe was to continue calculation when overflow is encountered.

Calculations with Floating Point numbers Addition: Equalize the exponents (smaller  larger exponent) Sum the mantissa Renormalize if necessary

Calculations with Floating Point numbers Example (in base 10): 91  9.10*10 1 |E| = 1, |M| = 3 9.7  9.70*10 0

Calculations with Floating Point numbers 9.10*10 1 + 9.70*10 0 Not The same Order.

Calculations with Floating Point numbers 9.10*10 1 + 9.70*10 0 9.10*10 1 + 0.97*10 1 10.7*10 1 renormalize 1.07*10 2

Calculations with Floating Point numbers Example II (in base 10): 91  9.10*10 1 |E| = 1, |M| = 3 9.75  9.75*10 0

Calculations with Floating Point numbers 9.10*10 1 + 9.75*10 0 Not The same Order.

Calculations with Floating Point numbers 9.10*10 1 + 9.75*10 0 9.10 *10 1 + 0.975*10 1 10.75*10 1 renormalize 1.07*10 2 5 (rounding error)

Rounding Errors The Problem: Squeezing infinite many real numbers into a finite number of bits

Measuring Rounding Errors Units in last place (Ulps) Relative Error

Measuring Rounding Errors – ULP error = |d.dddd – (z/r e )|*r p-1 If d.dddd*r e represent z p digits

Measuring Rounding Errors – ULP Example I: r = 10, p = 3 The number 3.14*10 -2 represents 0.0314159 Error = 0.159

Measuring Rounding Errors – ULP What is the maximum ULP if the rounding is toward the nearest number? 0.5 ULP

Measuring Rounding Errors – Relative Error Relative error = |d.dddd*r e – z|/z If d.dddd*r e represent z p digits

Measuring Rounding Errors – Relative errors Example I: r = 10, p = 3 The number 3.14*10 -2 represents 0.0314159 Relative Error ~ 0.0005

Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Similar presentations

Presentation on theme: "Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Similar presentations

Presentation on theme: "Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?"— Presentation transcript:

Similar presentations

About project

Feedback