© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 1 Floating-Point Considerations.

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 2 Motivation High speed floating point arithmetic has become a standard feature for microprocessors. –Important for application programmers to understand and take advantage of floating point arithmetic in developing applications Focus on –Accuracy of floating point arithmetic –Precision of floating point number representation How the accuracy and precision of floating point number representation should be taken into consideration in parallel computation.

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 3 Objectives To understand the fundamentals of floating-point representation To know the IEEE-754 Floating Point Standard GeForce 8800 CUDA Floating-point speed, accuracy and precision –Deviations from IEEE-754 –Accuracy of device runtime functions –-fastmath compiler option –Future performance considerations

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 4 What is IEEE floating-point format? A floating point binary number consists of three parts: –sign (S), exponent (E), and mantissa (M). –Each (S, E, M) pattern uniquely identifies a floating point number. For each bit pattern, its IEEE floating-point value is derived as: –value = (-1) S * M * {2 E }, where 1.0 B ≤ M < 10.0 B = (2.0) 10 The interpretation of S is simple: S=0 results in a positive number and S=1 a negative number.

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 5 Normalized Representation Specifying that 1.0 B ≤ M < 10.0 B makes the mantissa value for each floating point number unique. –For example, the only one mantissa value allowed for 0.5 D is M =1.0 0.5 D = 1.0 B * 2 -1 –Neither 10.0 B * 2 -2 nor 0.1 B * 2 0 qualifies Because all mantissa values are of the form 1.XX…, one can omit the “1.” part in the representation. –The mantissa value of 0.5 D in a 2-bit mantissa is 00, which is derived by omitting “1.” from 1.00.

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 6 Exponent Representation In an n-bits exponent representation, 2 n-1 -1 is added to its 2's complement representation to form its excess representation. –See Table for a 3-bit exponent representation A simple unsigned integer comparator can be used to compare the magnitude of two FP numbers Symmetric range for +/- exponents (111 reserved) 2’s complementActual decimalExcess-3 0000011 0011100 0102101 0113110 100(reserved pattern) 111 101-3000 110-2001 111010

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 7 A simple, hypothetical 5-bit FP format Assume 1-bit S, 2-bit E, and 2-bit M –0.5 D = 1.00 B * 2 -1 –0.5 D = 0 00 00, where S = 0, E = 00, and M = (1.)00 2’s complementActual decimalExcess-1 00001 110 (reserved pattern) 11 00

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 8 Representable Numbers The representable numbers of a given format is the set of all numbers that can be exactly represented in the format. See Table for representable numbers of an unsigned 3- bit integer format 0000 0011 0102 0113 1004 1015 1106 1117 574210 36 98

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 9 Representable Numbers of a 5-bit Hypothetical IEEE Format: No-zero 0 No-zero Abrupt underflowGradual underflow EM S=0S=1 S=0S=1S=0S=1 00 2 -1 -(2 -1 ) 0000 01 2 -1 +1*2 -3 -(2 -1 +1*2 -3 ) 001*2 -2 -1*2 -2 10 2 -1 +2*2 -3 -(2 -1 +2*2 -3 ) 002*2 -2 -2*2 -2 11 2 -1 +3*2 -3 -(2 -1 +3*2 -3 ) 003*2 -2 -3*2 -2 0100 2020 -(2 0 ) 2020 2020 01 2 0 +1*2 -2 -(2 0 +1*2 -2 ) 2 0 +1*2 -2 -(2 0 +1*2 -2 )2 0 +1*2 -2 -(2 0 +1*2 -2 ) 10 2 0 +2*2 -2 -(2 0 +2*2 -2 ) 2 0 +2*2 -2 -(2 0 +2*2 -2 )2 0 +2*2 -2 -(2 0 +2*2 -2 ) 11 2 0 +3*2 -2 -(2 0 +3*2 -2 ) 2 0 +3*2 -2 -(2 0 +3*2 -2 )2 0 +3*2 -2 -(2 0 +3*2 -2 ) 1000 2121 -(2 1 ) 2121 2121 01 2 1 +1*2 -1 -(2 1 +1*2 -1 ) 2 1 +1*2 -1 -(2 1 +1*2 -1 )2 1 +1*2 -1 -(2 1 +1*2 -1 ) 10 2 1 +2*2 -1 -(2 1 +2*2 -1 ) 2 1 +2*2 -1 -(2 1 +2*2 -1 )2 1 +2*2 -1 -(2 1 +2*2 -1 ) 11 2 1 +3*2 -1 -(2 1 +3*2 -1 ) 2 1 +3*2 -1 -(2 1 +3*2 -1 )2 1 +3*2 -1 -(2 1 +3*2 -1 ) 11Reserved pattern Mmmmmmmmmmmmmmmmmm m mmmmmmmmmmmmmmmmmm Mmmmmmmmmmmmmmmmmm m Mmmmmmmmmmmmmmmmm Cannot represent Zero! Actual decimalExcess-1 001 110 (reserved pattern) 11 00 Exponent Values

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 10 Representable Numbers: 5 Observations 1.3 major intervals of representable numbers (defined by the number of different exponent values) –E = 00 B representable numbers: {0.5, 0.625, 0.750, 0.875} 10 –E = 01 B representable numbers: {1.00, 1.250, 1.500, 1.750} 10 –E = 10 B representable numbers: {2.00, 2.500, 3.000, 3.500} 10 1234 0

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 11 Representable Numbers: 5 Observations (cont.) 2.Mantissa bits define the number of representable numbers in every range; with N bits mantissa, 2 N representable numbers in each range –A value falling in an interval will be rounded to a representable number –The larger the number of representable numbers in an interval, the more precisely we can represent the value –Hence number of mantissa bits determine the precision of the representation 3.0 is not represented 4.Numbers less than 0.5 and greater than 0 are not represented (a gap in representable numbers) 5.Representable numbers closer to each other towards 0 1234 0

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 12 Flush to Zero Treat all bit patterns with E=0 as 0.0 –This takes away several representable numbers near zero and lump them all into 0.0 –For a representation with large M, a large number of representable numbers will be removed. 0 1234 0

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 13 Flush to Zero 0 No-zero Flush to Zero Denormalized EMS=0S=1 S=0S=1 S=0S=1 00 2 -1 -(2 -1 ) 00 00 012 -1 +1*2 -3 -(2 -1 +1*2 -3 ) 00 1*2 -2 -1*2 -2 102 -1 +2*2 -3 -(2 -1 +2*2 -3 ) 00 2*2 -2 -2*2 -2 112 -1 +3*2 -3 -(2 -1 +3*2 -3 ) 00 3*2 -2 -3*2 -2 01002020 -(2 0 ) 2020 2020 012 0 +1*2 -2 -(2 0 +1*2 -2 ) 2 0 +1*2 -2 -(2 0 +1*2 -2 ) 2 0 +1*2 -2 -(2 0 +1*2 -2 ) 102 0 +2*2 -2 -(2 0 +2*2 -2 ) 2 0 +2*2 -2 -(2 0 +2*2 -2 ) 2 0 +2*2 -2 -(2 0 +2*2 -2 ) 112 0 +3*2 -2 -(2 0 +3*2 -2 ) 2 0 +3*2 -2 -(2 0 +3*2 -2 ) 2 0 +3*2 -2 -(2 0 +3*2 -2 ) 10002121 -(2 1 ) 2121 2121 012 1 +1*2 -1 -(2 1 +1*2 -1 ) 2 1 +1*2 -1 -(2 1 +1*2 -1 ) 2 1 +1*2 -1 -(2 1 +1*2 -1 ) 102 1 +2*2 -1 -(2 1 +2*2 -1 ) 2 1 +2*2 -1 -(2 1 +2*2 -1 ) 2 1 +2*2 -1 -(2 1 +2*2 -1 ) 112 1 +3*2 -1 -(2 1 +3*2 -1 ) 2 1 +3*2 -1 -(2 1 +3*2 -1 ) 2 1 +3*2 -1 -(2 1 +3*2 -1 ) 11Reserved pattern Mmmmmmmm

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 14 Denormalized Numbers The actual method adopted by the IEEE standard is called denromalized numbers or gradual underflow. –The method relaxes the normalization requirement for numbers very close to 0. –whenever E=0, the mantissa is no longer assumed to be of the form 1.XX. Rather, it is assumed to be 0.XX. –In general, if the n-bit exponent is 0, the value is 0.M * 2 {-2^(n-1) + 2} e.g. if a 3-bit exponent is 0, the value is 0.M * 2 -2 0 1 2 3

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 15 Denormalization 0 No-zeroFlush to Zero Denormalized EMS=0S=1S=0S=1 S=0S=1 00 2 -1 -(2 -1 )00 00 012 -1 +1*2 -3 -(2 -1 +1*2 -3 )00 1*2 -2 -1*2 -2 102 -1 +2*2 -3 -(2 -1 +2*2 -3 )00 2*2 -2 -2*2 -2 112 -1 +3*2 -3 -(2 -1 +3*2 -3 )00 3*2 -2 -3*2 -2 01002020 -(2 0 )2020 2020 012 0 +1*2 -2 -(2 0 +1*2 -2 )2 0 +1*2 -2 -(2 0 +1*2 -2 ) 2 0 +1*2 -2 -(2 0 +1*2 -2 ) 102 0 +2*2 -2 -(2 0 +2*2 -2 )2 0 +2*2 -2 -(2 0 +2*2 -2 ) 2 0 +2*2 -2 -(2 0 +2*2 -2 ) 112 0 +3*2 -2 -(2 0 +3*2 -2 )2 0 +3*2 -2 -(2 0 +3*2 -2 ) 2 0 +3*2 -2 -(2 0 +3*2 -2 ) 10002121 -(2 1 )2121 2121 012 1 +1*2 -1 -(2 1 +1*2 -1 )2 1 +1*2 -1 -(2 1 +1*2 -1 ) 2 1 +1*2 -1 -(2 1 +1*2 -1 ) 102 1 +2*2 -1 -(2 1 +2*2 -1 )2 1 +2*2 -1 -(2 1 +2*2 -1 ) 2 1 +2*2 -1 -(2 1 +2*2 -1 ) 112 1 +3*2 -1 -(2 1 +3*2 -1 )2 1 +3*2 -1 -(2 1 +3*2 -1 ) 2 1 +3*2 -1 -(2 1 +3*2 -1 ) 11Reserved pattern

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 16 Special Bit Patterns All numbers are normalized floating point numbers except when All E bits are 0s –If M= 0 ; The number represents Zero –If M ≠ 0 ; The number is denormalized All E bits are 1s –If M= 0 ; The number represents ∞ All representable numbers fall between -∞ and + ∞ Divide by 0 results ∞ (overflow) Divide by ∞ results in 0 –If M ≠ 0 ; The number is “Not a Number”:NaN

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 17 Special Bit Patterns: NaN Generated by operations with input values that do not make sense: e.g. 0/0; 0 x ∞; ∞ / ∞; ∞ - ∞ Signaling NaNs (SNaN): represented with MSB of M 0 –Cause exception if used as input of arithmetic operation; execution is interrupted Something wrong with the execution of the code Making uninitialized data SNaNs allows detection of the use of such data during execution Current GPUs do not support SNaN due to the difficulty of supporting accurate signaling during massively parallel execution

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 18 Special Bit Patterns: Quiet NaNs Quiet NaNs: represented with MSB of M 1 (1.0 + quiet NaN) generates a quiet NaN A quite NaN is printed as “NaN” –User can review the output and decide if the application should be re-run with a different input for more valid results.

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 19 Precision Single-precision(SP) 32 bits: 1-bit S, 8-bit E, 23-bit M Double-precision(DP) 64 bits: 1-bit S, 11-bit E, 52-bit M –M has 29 bits more than SP: largest representation error is reduced to (1/2 29 ) of that of the SP format –E has 3 bits more than SP: # of intervals of representable numbers is extended Very large and very small values are representable

Arithmetic Accuracy and Rounding Accuracy of a floating point arithmetic operation is measured by maximal error introduced by the operation –source of error is when the result cannot be exactly represented and requires rounding mantissa of the result needs too many bits to be represented exactly Cause of rounding is pre-shifting in floating point arithmetic –Example: two operands to a floating-point addition have different exponents Mantissa of operand with the smaller exponent is right shifted until exponents are equal Final result has more bits than the format can accommodate © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 20

Rounding Error Example Assume 5-bit representation (E: Excess-1; Denormilized M) 1.00*2 1 + 1.00*2 -2 { (0 10 00) + (0 00 01) } –Right shift mantissa of 1.00*2 -2 (0 00 01) so that the exponent becomes 1 instead of -2 –1.00*2 1 + 0.001*2 1 = 1.001 * 2 1 (ideally) –Result is not representable using 5-bit format (requires 3 bit mantissa) –Generate a result = closest representable number, 1.01 * 2 1 Introduces an error, 0.001 *2 1 = Half the place value of the least significant place: 0.5 ULP (Units in the Last Place) The most error introduced by properly designed hardware is no more than 0.5 ULP. –This is the accuracy achieved by addition and subtraction operations in G80/GT200. –Inversion introduces 2ULP errors © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 21

Algorithm Considerations Numerical algorithms need to sum up a large number of values Order of summing values can affect accuracy of final result Example: Using 5-bit representation (E: Excess 1; M: Denormilized) {1.00*2 0 = (0 01 00); 1.00*2 -2 = (0 00 01) } Then sequential order of performing the following addition yields: (1.00*2 0 ) + ( 1.00*2 0 ) + (1.00*2 -2 ) + (1.00*2 -2 ) = (1.00*2 1 ) + (1.00*2 -2 ) + (1.00*2 -2 ) = (1.00*2 1 ) + (1.00*2 -2 ) = 1.00*2 1 © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 22

Algorithm Considerations (cont.) Using a parallel algorithm, order of performing the addition yields: [(1.00*2 0 ) + ( 1.00*2 0 )] + [(1.00*2 -2 ) + (1.00*2 -2 )] = (1.00*2 1 ) + (1.00*2 -1 ) = 1.01*2 1 Some other parallel algorithm might produce less accurate results than the serial algorithm © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 23 Pre-sorting of operands is often used to maximize floating-point arithmetic accuracy [(1.00*2 -2 ) + (1.00*2 -2 )] + [(1.00*2 0 ) + ( 1.00*2 0 )] = (1.00*2 -1 ) + (1.00*2 1 ) = 1.01*2 1

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 24 Arithmetic Instruction Throughput int and float add, shift, min, max and float mul, mad: 4 cycles per warp –int multiply (*) is by default 32-bit requires multiple cycles / warp –Use __mul24() / __umul24() intrinsics for 4-cycle 24-bit int multiply Integer divide and modulo are expensive –Compiler will convert literal power-of-2 divides to shifts –Be explicit in cases where compiler can’t tell that divisor is a power of 2 ! –Useful trick: foo % n == foo & (n-1) if n is a power of 2 Example: foo = 11; n=8; n-1=7; 11%8 = 3 = (1011)&(0111)=0011 SP Float is 32 bits; 23-bit M

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 25 Arithmetic Instruction Throughput Reciprocal, reciprocal square root, sin/cos, log, exp: 16 cycles per warp –These are the versions prefixed with “__” –Examples:__rcp(), __sin(), __exp() Other functions are combinations of the above –y / x == rcp(x) * y == 20 cycles per warp –sqrt(x) == rcp(rsqrt(x)) == 32 cycles per warp

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 26 Runtime Math Library There are two types of runtime math operations –__func(): direct mapping to hardware ISA Fast but low accuracy (see prog. guide for details) Examples: __sin(x), __exp(x), __pow(x,y) –func() : compile to multiple instructions Slower but higher accuracy (5 ulp, units in the least place, or less) Examples: sin(x), exp(x), pow(x,y) The -use_fast_math compiler option forces every func() to compile to __func()

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 27 Make your program float-safe! Future hardware will have double precision support –G80 is single-precision only –Double precision will have additional performance cost –Careless use of double or undeclared types may run more slowly on G80+ Important to be float-safe (be explicit whenever you want single precision) to avoid using double precision where it is not needed –Add ‘f’ specifier on float literals: foo = bar * 0.123; // double assumed foo = bar * 0.123f; // float explicit –Use float version of standard library functions foo = sin(bar); // double assumed foo = sinf(bar); // single precision explicit

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 28 Deviations from IEEE-754 Addition and Multiplication are IEEE 754 compliant –Maximum 0.5 ulp (units in the least place) error However, often combined into multiply-add (FMAD) –Intermediate result is truncated Division is non-compliant (2 ulp) Not all rounding modes are supported Denormalized numbers are not supported No mechanism to detect floating-point exceptions

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 29 GPU Floating Point Features G80SSEIBM AltivecCell SPE PrecisionIEEE 754 Rounding modes for FADD and FMUL Round to nearest and round to zero All 4 IEEE, round to nearest, zero, inf, -inf Round to nearest only Round to zero/truncate only Denormal handlingFlush to zero Supported, 1000’s of cycles Flush to zero NaN supportYes No Overflow and Infinity support Yes, only clamps to max norm Yes No, infinity FlagsNoYes Some Square rootSoftware onlyHardwareSoftware only DivisionSoftware onlyHardwareSoftware only Reciprocal estimate accuracy 24 bit12 bit Reciprocal sqrt estimate accuracy 23 bit12 bit log2(x) and 2^x estimates accuracy 23 bitNo12 bitNo

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 1 Floating-Point Considerations.

Similar presentations

Presentation on theme: "© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 1 Floating-Point Considerations."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 1 Floating-Point Considerations.

Similar presentations

Presentation on theme: "© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign 1 Floating-Point Considerations."— Presentation transcript:

Similar presentations

About project

Feedback