Representing fractions – Fixed point. The problem: how to represent fractions with a finite number of bits?



Representing fractions – Fixed point. A number with 10 bits: $a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9 a_{10}$

Representing fractions – Fixed point. A number with 10 bits: $a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9 a_{10}$. Fixing the point: $a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 \,.\, a_9 a_{10}$

Representing fractions – Fixed point. Range of representation:
                             Number                        Fraction
Signed (two's complement)    $-2^{7} \ldots 2^{7} - 1$     Quanta of 0.25
Unsigned                     $0 \ldots 2^{8} - 1$          Quanta of 0.25
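The 10-bit layout above (8 bits before the point, 2 after) can be sketched in Python. This is a minimal illustration, not part of the original slides; the function names and the choice to round to the nearest quantum are assumptions.

```python
FRACTION_BITS = 2    # two bits after the point, so the quantum is 2**-2 = 0.25
TOTAL_BITS = 10      # a1 ... a10

def encode_unsigned(value):
    """Return the 10-bit pattern (as an int) closest to a non-negative value."""
    raw = round(value * (1 << FRACTION_BITS))
    if not 0 <= raw < (1 << TOTAL_BITS):
        raise OverflowError("value does not fit in 10 unsigned bits")
    return raw

def decode_unsigned(raw):
    """Interpret a 10-bit pattern as an unsigned fixed-point number."""
    return raw / (1 << FRACTION_BITS)

def decode_signed(raw):
    """Interpret a 10-bit pattern as two's complement fixed point."""
    if raw & (1 << (TOTAL_BITS - 1)):   # top bit set: negative value
        raw -= 1 << TOTAL_BITS
    return raw / (1 << FRACTION_BITS)

print(format(encode_unsigned(5.75), "010b"))   # 0000010111, read as 00000101.11
print(decode_unsigned(0b1111111111))           # 255.75
print(decode_signed(0b1000000000))             # -128.0
```

Every representable value is a multiple of 0.25, which is why the range is narrow however the ten bits are used.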

Fixed point – the problem: it cannot represent wide ranges of numbers, such as those that occur in scientific applications.

Representing Fractions – Floating point. A number is written as mantissa * base^exponent, for example a mantissa times $10^{1}$:
Base (radix): r
Exponent
Number (mantissa)

Representing Fractions – Floating point. Sign bit: $(-1)^{0} \times 1 \times 10^{1}$ is positive; $(-1)^{1} \times 1.23 \times \ldots$ is negative.

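Problem of uniqueness: the same value can be written with more than one mantissa and exponent pair (one form in the slide's example uses the exponent $10^{2}$). The representation is not unique.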

Problem of uniqueness – Normalization. Standardization: keep exactly one nonzero digit to the left of the point, for example $6.1 \times 10^{-1}$.
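A small sketch of normalization in base 10, using Python's `decimal` module. The function name `normalize` and the three input spellings are illustrative assumptions; only the target form 6.1*10^-1 comes from the slide.

```python
from decimal import Decimal

def normalize(text):
    """Return (mantissa, exponent) with one nonzero digit left of the point."""
    d = Decimal(text)
    if d == 0:
        raise ValueError("zero has no normalized form")
    exponent = d.adjusted()           # power of ten of the leading digit
    mantissa = d.scaleb(-exponent)    # move the point next to that digit
    return mantissa, exponent

# Three spellings of the same value collapse to a single normalized form.
for text in ("0.61", "0.0061E+2", "61E-2"):
    print(text, "->", normalize(text))
```

All three inputs give a mantissa of 6.1 and an exponent of -1, so normalization restores uniqueness.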

Normalized Binary Floating point: $D = (-1)^{a_0} \times (1.a_1 a_2 a_3 \ldots) \times 2^{\,b_1 b_2 b_3 \ldots}$
String of bits: $a_0 \; b_1 b_2 \ldots b_n \; a_1 a_2 a_3 \ldots a_m$

Floating point – Questions:
How to represent the (signed) exponent?
How to represent zero? And NaN, infinity?
How to add, subtract and multiply?
Rounding errors.

Floating point – Representing the exponent. How to represent a signed number? Sign bit? Two's complement?

Floating point – Representing the exponent. How to represent a signed number? Sign bit? Two's complement? Neither.

Floating point – Representing the exponent. We want the exponent to be binary ordered, so that a larger exponent always has a larger bit pattern: 0000 < 0001 < … < 1000 < … < 1111

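Floating point – Representing the exponent. We define the encoding as follows: exponent = stored number − B, where B is the bias. The stored pattern 000…01 corresponds to $e_{min}$ and 111…10 corresponds to $e_{max}$. Usually $B = 2^{n-1} - 1$ for an exponent field of $n$ bits.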
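The biased encoding can be sketched as follows. The helper names are assumptions made for illustration; the bias formula B = 2^(n-1) - 1 is the one given on the slide, and the all-zeros and all-ones patterns are treated as reserved.

```python
def bias(n):
    """Usual bias for an n-bit exponent field: B = 2**(n-1) - 1."""
    return 2 ** (n - 1) - 1

def encode_exponent(e, n):
    """Map a signed exponent to the unsigned number stored in the field."""
    stored = e + bias(n)
    # The all-zeros and all-ones patterns are kept free for special values.
    if not 1 <= stored <= 2 ** n - 2:
        raise ValueError("exponent outside e_min .. e_max")
    return stored

def decode_exponent(stored, n):
    """Recover the signed exponent: exponent = stored number - B."""
    return stored - bias(n)

n = 8
print(bias(n))                    # 127
print(encode_exponent(-126, n))   # 1, the pattern 00000001 (e_min)
print(encode_exponent(127, n))    # 254, the pattern 11111110 (e_max)
```

Because the stored number grows with the exponent, comparing the raw bit patterns as unsigned integers orders the exponents correctly.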

Floating point – Representing zero, NaN, ±∞. IEEE 754 special values:
Exponent                      Mantissa    Represents
e = e_min − 1                 M = 0       ±0
e = e_min − 1                 M ≠ 0       0.M * 2^(e_min)   (denormalized number)
e_min ≤ e ≤ e_max             any M       1.M * 2^e         (normalized number)
e = e_max + 1                 M = 0       ±∞
e = e_max + 1                 M ≠ 0       NaN

IEEE 754
Parameter                                   Single    Double
Mantissa (bits, counting the hidden bit)    24        53
e_max                                       +127      +1023
e_min                                       −126      −1022
Exponent width (bits)                       8         11
Format width (bits, including sign bit)     32        64
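To see these fields in a real single-precision number, the standard `struct` module can reinterpret a float as 32 raw bits. The function name `decode_single` and the label strings are assumptions for this sketch; the field widths (1 sign bit, 8 exponent bits, 23 stored mantissa bits) follow the table above.

```python
import struct

def decode_single(x):
    """Split a value, stored as IEEE 754 single, into its fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    stored_exp = (bits >> 23) & 0xFF    # 8-bit biased exponent
    fraction = bits & 0x7FFFFF          # 23 stored mantissa bits
    if stored_exp == 0:                 # corresponds to e = e_min - 1
        kind = "zero" if fraction == 0 else "denormalized"
    elif stored_exp == 0xFF:            # corresponds to e = e_max + 1
        kind = "infinity" if fraction == 0 else "NaN"
    else:
        kind = "normalized"
    return sign, stored_exp, fraction, kind

for value in (1.0, -0.0, 1e-45, float("inf"), float("nan")):
    print(value, decode_single(value))
```

For 1.0 the stored exponent is 127, that is, a true exponent of 0 plus the bias.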

What is NaN (not a number)?
Operation    NaN produced by
+            ∞ + (−∞)
*            0 * ∞
/            0/0, ∞/∞
(Partial list.)
Operations on a NaN also produce NaN: X ± NaN, X * NaN, X / NaN
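The rows of this table can be checked with Python floats, which are IEEE 754 doubles on common platforms. This is only a quick sketch; note that Python raises `ZeroDivisionError` for `0.0 / 0.0` instead of returning NaN, so that case is left out.

```python
import math

inf = float("inf")
nan = float("nan")

results = {
    "inf + (-inf)": inf + (-inf),
    "0 * inf": 0.0 * inf,
    "inf / inf": inf / inf,
    "1 + nan": 1.0 + nan,
    "2 * nan": 2.0 * nan,
    "3 / nan": 3.0 / nan,
}

for expression, value in results.items():
    print(expression, "->", value, math.isnan(value))
```

A NaN also compares unequal to everything, including itself, so `math.isnan` is the reliable test.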

Infinity provides a safe way to continue a calculation when overflow is encountered.
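A short sketch of that behaviour with Python floats, assuming the usual IEEE 754 double underneath. The variable names are illustrative.

```python
import math

big = 1e308
overflowed = big * 10          # too large for a double, so the result is infinity

print(overflowed)              # inf
print(math.isinf(overflowed))  # True
print(overflowed > big)        # True: ordering still behaves sensibly
print(1.0 / overflowed)        # 0.0: later arithmetic can continue
```

The calculation does not stop at the overflow, and the infinite value remains visible in the results that depend on it.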

Calculations with Floating Point numbers. Addition:
1. Equalize the exponents (bring the smaller exponent up to the larger one).
2. Sum the mantissas.
3. Renormalize if necessary.

Calculations with Floating Point numbers. Example (in base 10), with |E| = 1 exponent digit and |M| = 3 mantissa digits: 91 → 9.10*10^1 and 9.7 → 9.70*10^0

Calculations with Floating Point numbers. 9.10*10^1 + 9.70*10^0: the operands are not of the same order (their exponents differ).

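Calculations with Floating Point numbers. 9.10*10^1 + 9.70*10^0 = 9.10*10^1 + 0.97*10^1 = 10.07*10^1. Renormalize: 1.007*10^2, which is stored as 1.01*10^2 with three mantissa digits.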

Calculations with Floating Point numbers. Example II (in base 10), with |E| = 1 exponent digit and |M| = 3 mantissa digits: 91 → 9.10*10^1 and 9.75 → 9.75*10^0

Calculations with Floating Point numbers. 9.10*10^1 + 9.75*10^0: the operands are not of the same order (their exponents differ).

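Calculations with Floating Point numbers. 9.10*10^1 + 9.75*10^0 = 9.10*10^1 + 0.975*10^1, but 0.975 does not fit in three mantissa digits and must be rounded before the sum. Renormalize: the exact sum is 1.0075*10^2, while the stored result is 1.01*10^2 (rounding error).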
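The three addition steps can be sketched for base 10 with three mantissa digits. This is a simplified model, not a full implementation: it assumes positive operands, stores each number as an integer mantissa plus an exponent, and rounds to nearest with halves rounded up. The names `round_div` and `fp_add` are hypothetical.

```python
def round_div(n, d):
    """Divide non-negative n by d, rounding to nearest (halves round up)."""
    return (n + d // 2) // d

def fp_add(a, b, p=3):
    """Add two positive decimal floats given as (mantissa, exponent).

    The mantissa is an integer of p digits: (910, 1) means 9.10 * 10**1.
    """
    (ma, ea), (mb, eb) = a, b
    if ea < eb:                         # keep the larger exponent in `a`
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    # Step 1: equalize the exponents by shifting the smaller operand right.
    mb = round_div(mb, 10 ** (ea - eb))
    # Step 2: sum the mantissas.
    m, e = ma + mb, ea
    # Step 3: renormalize if the sum needs more than p digits.
    if m >= 10 ** p:
        m, e = round_div(m, 10), e + 1
    return m, e

print(fp_add((910, 1), (970, 0)))   # (101, 2), that is 1.01 * 10**2
print(fp_add((910, 1), (975, 0)))   # (101, 2); 0.975 was first rounded to 0.98
```

Under these rounding assumptions both examples give 1.01*10^2, although the exact sums are 100.7 and 100.75. Digits are lost at the alignment step, the renormalization step, or both.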

Rounding Errors. The problem: squeezing infinitely many real numbers into a finite number of bits.

Measuring Rounding Errors:
Units in the last place (ULPs)
Relative error

Measuring Rounding Errors – ULP. If $d.dddd \times r^{e}$ represents $z$ with $p$ digits, then error $= |d.dddd - (z / r^{e})| \times r^{p-1}$ units in the last place.

Measuring Rounding Errors – ULP. Example I: r = 10, p = 3. The number 3.14*10^-2 represents 0.0314159. Error = |3.14 − 3.14159| * 10^2 = 0.159 ULP

Measuring Rounding Errors – ULP. What is the maximum error in ULPs if rounding is toward the nearest representable number? 0.5 ULP
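The ULP formula can be evaluated exactly with Python's `decimal` module. This sketch assumes the represented value in the example is z = 0.0314159, which is the value consistent with the stated error of 0.159; the function name `ulp_error` is hypothetical.

```python
from decimal import Decimal

def ulp_error(digits, e, z, r=10, p=3):
    """Error in units in the last place when digits * r**e represents z."""
    digits, z = Decimal(digits), Decimal(z)
    return abs(digits - z / Decimal(r) ** e) * Decimal(r) ** (p - 1)

error = ulp_error("3.14", -2, "0.0314159")
print(error)                      # numerically equal to 0.159
print(error <= Decimal("0.5"))    # True: within the round-to-nearest bound
```

Passing the inputs as strings keeps the arithmetic in exact decimal, so the binary floating point of the host machine does not disturb the result.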

Measuring Rounding Errors – Relative Error. If $d.dddd \times r^{e}$ represents $z$ with $p$ digits, then relative error $= |d.dddd \times r^{e} - z| \,/\, z$

Measuring Rounding Errors – Relative error. Example I: r = 10, p = 3. The number 3.14*10^-2 represents 0.0314159. Relative error = 0.0000159 / 0.0314159 ≈ 0.0005
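The relative error can be computed the same way. This sketch again assumes z = 0.0314159 for the example; the function name `relative_error` is hypothetical.

```python
from decimal import Decimal

def relative_error(digits, e, z, r=10):
    """Relative error when digits * r**e represents the positive value z."""
    approx = Decimal(digits) * Decimal(r) ** e
    z = Decimal(z)
    return abs(approx - z) / z

rel = relative_error("3.14", -2, "0.0314159")
print(round(rel, 4))    # 0.0005
```

Unlike the ULP measure, the relative error does not depend on the number of digits p, only on the two values being compared.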