Sources of Computational Errors


Sources of Computational Errors
FLP arithmetic approximates exact computation with real numbers. Two sources of errors to understand and counteract:
1. Representation errors — e.g., no exact machine representation for 1/3, √2, or π
2. Arithmetic errors — e.g., (1 + 2⁻¹²)² = 1 + 2⁻¹¹ + 2⁻²⁴ is not representable in the IEEE single format
Errors due to finite precision can lead to disasters in life-critical applications.
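The arithmetic-error example above is stated for the IEEE single format; the same effect can be reproduced in Python, whose float is an IEEE double with a 53-bit significand, by using the analogous value (1 + 2⁻²⁷)². A minimal sketch:

```python
from fractions import Fraction

x = 1.0 + 2.0**-27                  # exactly representable in an IEEE double
square = x * x                      # the hardware rounds the exact product
exact = (1 + Fraction(2)**-27)**2   # true value: 1 + 2**-26 + 2**-54

print(square == 1 + 2**-26)       # True: the 2**-54 term was rounded away
print(Fraction(square) == exact)  # False: the computed square is not exact
```

The exact square needs 55 significand bits (1 plus bits at positions 26 and 54), so the low-order term is lost in rounding.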

Example of Representation and Computational Errors
Compute 1/99 − 1/100 (decimal floating-point format, 4-digit significand in [1, 10), single-digit signed exponent):
Precise result = 1/9900 ≅ 1.010 × 10⁻⁴ (error ≅ 10⁻⁸ or 0.01%)
x = 1/99 ≅ 1.010 × 10⁻² (error ≅ 10⁻⁶ or 0.01%)
y = 1/100 = 1.000 × 10⁻² (error = 0)
z = x −fp y = 1.010 × 10⁻² − 1.000 × 10⁻² = 1.000 × 10⁻⁴ (error ≅ 10⁻⁶ or 1%)
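This 4-digit decimal example can be replayed with Python's decimal module, which implements exactly this kind of radix-10 floating-point arithmetic; a sketch, with prec=4 giving the 4-digit significand:

```python
from decimal import Decimal, getcontext

getcontext().prec = 4            # 4-digit decimal significand

x = Decimal(1) / Decimal(99)     # 0.01010 (representation error ~0.01%)
y = Decimal(1) / Decimal(100)    # 0.01    (exact)
z = x - y                        # 0.00010 = 1.000 * 10**-4

print(x, y, z)
# the exact answer is 1/9900 = 1.0101... * 10**-4, so z is off by about 1%
```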

Notation for a Floating-Point System
FLP(r, p, A):
Radix r (assumed to be the same as the exponent base b)
Precision p in terms of radix-r digits
Approximation scheme A ∈ {chop, round, rtne (round to nearest even), chop(g), ...}
Let x = rᵉ · s be an unsigned real number, normalized such that 1/r ≤ s < 1, and let x_fp be its representation in FLP(r, p, A):
x_fp = rᵉ · s_fp = (1 + η)x
A = chop:  −ulp < s_fp − s ≤ 0, so −r × ulp < η ≤ 0
A = round: −ulp/2 < s_fp − s ≤ ulp/2, so |η| ≤ r × ulp/2
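The η bounds above can be checked empirically. A sketch of FLP(r, p, A) for positive values (the helper to_flp is hypothetical, built here for illustration; exact rational arithmetic avoids any noise from the host machine's own floating point):

```python
import math
import random
from fractions import Fraction

def to_flp(x, r=10, p=4, mode="chop"):
    """Map exact rational x > 0 to FLP(r, p, mode): x_fp = r**e * s_fp."""
    s, e = Fraction(x), 0
    while s >= 1:                  # normalize so that 1/r <= s < 1
        s /= r; e += 1
    while s < Fraction(1, r):
        s *= r; e -= 1
    scaled = s * r**p
    coef = math.floor(scaled) if mode == "chop" else math.floor(scaled + Fraction(1, 2))
    return Fraction(coef, r**p) * Fraction(r)**e

random.seed(1)
r, p = 10, 4
ulp = Fraction(1, r**p)
for _ in range(500):
    x = Fraction(random.randint(1, 10**6), random.randint(1, 10**6))
    eta = (to_flp(x, r, p, "chop") - x) / x
    assert -r * ulp < eta <= 0            # chop:  -r*ulp < eta <= 0
    eta = (to_flp(x, r, p, "round") - x) / x
    assert abs(eta) <= r * ulp / 2        # round: |eta| <= r*ulp/2
```

Dividing the significand bound by s ≥ 1/r is what introduces the extra factor of r in the η bounds.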

Arithmetic in FLP(r, p, A)
Ideally: obtain an infinite-precision result, then chop, round, etc. Real machines approximate this process by keeping g > 0 guard digits, thus doing arithmetic in FLP(r, p, chop(g)).

Error Analysis for FLP(r, p, A)
Consider multiplication, division, addition, and subtraction for positive operands x_fp and y_fp in FLP(r, p, A). Due to representation errors, x_fp = (1 + σ)x and y_fp = (1 + τ)y; let η denote the error of the arithmetic operation itself.
x_fp ×fp y_fp = (1 + η)x_fp y_fp
  = (1 + η)(1 + σ)(1 + τ)xy
  = (1 + η + σ + τ + ησ + ητ + στ + ηστ)xy
  ≅ (1 + η + σ + τ)xy
x_fp /fp y_fp = (1 + η)x_fp / y_fp
  = (1 + η)(1 + σ)x / [(1 + τ)y]
  = (1 + η)(1 + σ)(1 − τ)(1 + τ²)(1 + τ⁴)( ... )x/y
  ≅ (1 + η + σ − τ)x/y
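The multiplication bound ≅ 1 + η + σ + τ can be checked numerically; a sketch using decimal for FLP(10, 4, round) and Fraction for exact reference values (σ and τ are measured representation errors, and |η| ≤ r⁻ᵖ⁺¹/2 for rounding):

```python
from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 4                    # FLP(10, 4, round)

x, y = Fraction(1, 3), Fraction(2, 3)    # exact operand values
x_fp = Decimal(1) / Decimal(3)           # 0.3333
y_fp = Decimal(2) / Decimal(3)           # 0.6667
z_fp = x_fp * y_fp                       # 0.2222 after rounding

sigma = Fraction(x_fp) / x - 1           # representation error of x
tau = Fraction(y_fp) / y - 1             # representation error of y
total = Fraction(z_fp) / (x * y) - 1     # total relative error of product
eta_max = Fraction(1, 2) * Fraction(10) ** -3   # |eta| <= r**(-p+1)/2

# the total error is (to first order) bounded by |sigma| + |tau| + |eta|
assert abs(total) <= abs(sigma) + abs(tau) + eta_max
```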

Error Analysis for FLP(r, p, A), Continued
x_fp +fp y_fp = (1 + η)(x_fp + y_fp)
  = (1 + η)(x + σx + y + τy)
  = (1 + η)[1 + (σx + τy)/(x + y)](x + y)
Since |σx + τy| ≤ max(|σ|, |τ|)(x + y), the magnitude of the worst-case relative error in the computed sum is roughly bounded by |η| + max(|σ|, |τ|).
x_fp −fp y_fp = (1 + η)(x_fp − y_fp)
  = (1 + η)(x + σx − y − τy)
  = (1 + η)[1 + (σx − τy)/(x − y)](x − y)
The term (σx − τy)/(x − y) can be very large if x and y are both large but x − y is relatively small. This is known as cancellation or loss of significance.
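Cancellation is easy to trigger with ordinary doubles: when x and y agree in their leading digits, the subtraction itself is exact, but it promotes the operands' earlier rounding errors into the leading digits of the result. A sketch, with an algebraic rewrite that avoids the subtraction:

```python
import math

x = 1e10
naive = math.sqrt(x * x + 1) - x         # catastrophic cancellation
stable = 1 / (math.sqrt(x * x + 1) + x)  # algebraically identical, no subtraction

print(naive)    # 0.0 -- every significant digit cancelled
print(stable)   # 5e-11 -- correct to full precision
```

Here x² + 1 rounds to x² (the 1 falls below half an ulp at 10²⁰), so the naive form subtracts two equal numbers.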

Fixing the Problem
The part of the problem that is due to |η| being large can be fixed by using guard digits.
Theorem 19.1: In the floating-point system FLP(r, p, chop(g)) with g ≥ 1 and −x < y < 0 < x, we have:
x +fp y = (1 + η)(x + y)  with  −r⁻ᵖ⁺¹ < η < r⁻ᵖ⁻ᵍ⁺²
Corollary: In FLP(r, p, chop(1)):
x +fp y = (1 + η)(x + y)  with  |η| < r⁻ᵖ⁺¹
So a single guard digit is sufficient to make the relative arithmetic error in floating-point addition/subtraction comparable to the representation error with truncation.

Example With and Without Use of Guard Digit
Decimal floating-point system (r = 10) with p = 6 and no guard digit:
x = 0.100 000 000 × 10³    y = −0.999 999 456 × 10²
x_fp = 0.100 000 × 10³     y_fp = −0.999 999 × 10²
x + y = 0.544 × 10⁻⁴ and x_fp + y_fp = 10⁻⁴, but:
x_fp +fp y_fp = 0.100 000 × 10³ −fp 0.099 999 × 10³ = 0.100 000 × 10⁻²

Example With and Without Use of Guard Digit, Continued
Relative error = (10⁻³ − 0.544 × 10⁻⁴)/(0.544 × 10⁻⁴) ≅ 17.38 (i.e., the result is 1738% larger than the correct sum!)
With 1 guard digit, we get:
x_fp +fp y_fp = 0.100 000 0 × 10³ −fp 0.099 999 9 × 10³ = 0.100 000 × 10⁻³
Relative error ≅ 83.8% relative to the exact sum x + y, but the error is 0% with respect to x_fp + y_fp.
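The guard-digit mechanics can be simulated with decimal by truncating the aligned operand before the (then exact) addition. A sketch — the helper fp_add is hypothetical, it assumes |x| ≥ |y| and chop-style truncation, and the operand values are the reconstructed 6-digit example:

```python
from decimal import Decimal, ROUND_DOWN

def fp_add(x, y, p, guard):
    """Add y to x in FLP(10, p, chop(guard)), assuming |x| >= |y|.

    The aligned y is truncated to p + guard significand digits relative
    to x's exponent before the addition, modeling hardware that keeps
    only `guard` digits beyond the significand."""
    e = x.adjusted() + 1                        # exponent with s in [0.1, 1)
    quantum = Decimal(1).scaleb(e - p - guard)  # weight of last kept digit
    y_trunc = y.quantize(quantum, rounding=ROUND_DOWN)
    return x + y_trunc

x_fp = Decimal("100.000")    # 0.100000 * 10**3, p = 6
y_fp = Decimal("-99.9999")   # -0.999999 * 10**2

print(fp_add(x_fp, y_fp, 6, 0))   # 0.001  -- no guard digit: 10x too large
print(fp_add(x_fp, y_fp, 6, 1))   # 0.0001 -- one guard digit: equals x_fp + y_fp
```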

Invalidated Laws of Algebra
Many laws of algebra do not hold for floating-point arithmetic. Associative law of addition: a + (b + c) = (a + b) + c
a = 0.123 41 × 10⁵    b = −0.123 40 × 10⁵    c = 0.143 21 × 10¹
a +fp (b +fp c) = 0.123 41 × 10⁵ +fp (−0.123 40 × 10⁵ +fp 0.143 21 × 10¹)
  = 0.123 41 × 10⁵ −fp 0.123 39 × 10⁵
  = 0.200 00 × 10¹
(a +fp b) +fp c = (0.123 41 × 10⁵ −fp 0.123 40 × 10⁵) +fp 0.143 21 × 10¹
  = 0.100 00 × 10¹ +fp 0.143 21 × 10¹
  = 0.243 21 × 10¹
The two results differ by about 20%!
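The associativity failure can be reproduced with decimal; a sketch with a 5-digit significand and operand values reconstructed for this example (rounding rather than chopping, per decimal's default context):

```python
from decimal import Decimal, localcontext

a = Decimal("12341")      #  0.12341 * 10**5
b = Decimal("-12340")     # -0.12340 * 10**5
c = Decimal("1.4321")     #  0.14321 * 10**1

with localcontext() as ctx:
    ctx.prec = 5              # 5-digit decimal significand
    left = a + (b + c)        # b + c rounds to -12339, so left = 2
    right = (a + b) + c       # a + b = 1 exactly, so right = 2.4321

print(left, right)            # the two results differ by about 20%
```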

Possible Remedy: Unnormalized Arithmetic
a +fp (b +fp c) = 0.123 41 × 10⁵ −fp 0.123 39 × 10⁵ = 0.000 02 × 10⁵
(a +fp b) +fp c = 0.000 01 × 10⁵ +fp 0.143 21 × 10¹ = 0.000 02 × 10⁵
Not only are the two results the same, but they also carry with them a kind of warning about the extent of potential error.

Examples of Other Laws of Algebra That Do Not Hold
Associative law of multiplication: a × (b × c) = (a × b) × c
Cancellation law (for a > 0): a × b = a × c implies b = c
Distributive law: a × (b + c) = (a × b) + (a × c)
Multiplication canceling division: a × (b / a) = b
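Each of these failures is a one-liner in decimal; for instance, multiplication does not cancel division in FLP(10, 4, round). A sketch:

```python
from decimal import Decimal, getcontext

getcontext().prec = 4       # 4-digit decimal significand

a, b = Decimal(3), Decimal(1)
print(a * (b / a))          # 0.9999, not 1: a *fp (b /fp a) != b
```

The quotient 1/3 is chopped to 0.3333, and multiplying back by 3 yields 0.9999 exactly.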

Error Distribution and Expected Errors
MRRE = maximum relative representation error
MRRE(FLP(r, p, chop)) = r⁻ᵖ⁺¹
MRRE(FLP(r, p, round)) = r⁻ᵖ⁺¹/2
From a practical standpoint, however, the distribution of errors and their expected values may be more important. Limiting ourselves to positive significands, we define:
ARRE(FLP(r, p, A)) = average relative representation error
See the figure for an example probability density function.
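ARRE can be estimated by simulation; a sketch assuming significands follow the logarithmic (reciprocal) distribution commonly used in this kind of analysis, comparing the average chop error against MRRE = r⁻ᵖ⁺¹:

```python
import math
import random

random.seed(42)
r, p, n = 10, 4, 100_000
mrre = r ** (-p + 1)

total = 0.0
for _ in range(n):
    # logarithmic density 1/(s ln r) on [1/r, 1): s = r**(u - 1), u uniform
    s = r ** (random.random() - 1)
    s_fp = math.floor(s * r**p) / r**p      # chop to p radix-r digits
    total += (s - s_fp) / s                 # relative representation error

arre = total / n
print(arre, mrre)   # the average error is well below the maximum
```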