Published by Victoria Henry. Modified over 9 years ago.
1
Victor Kuliamin, Institute for System Programming, Russian Academy of Sciences, Moscow
2
Astrophysics, Geosciences, Biosciences, Social Sciences
Confidence?
2 / 25 PSI 2009, June 19
3
- Floating-point numbers + arithmetic
- Basic math functions
  - Elementary: sqrt, pow, exp, log, sin, atan, cosh, …
  - Special: erf, tgamma, j0, y1, …
- Specialized libraries
  - Linear algebra
  - Number theory
  - Numeric calculus
  - Dynamic systems
  - Optimization
  - …
4
Bit layout (n bits in total, k exponent bits): sign S (bit 0), exponent E (bits 1 to k), mantissa M (bits k+1 to n-1)
Bias: $B = 2^{k-1} - 1$
- Normal, $0 < E < 2^k - 1$: $X = (-1)^S \cdot 2^{E-B} \cdot (1 + M/2^{n-k-1})$
- Denormal, $E = 0$: $X = (-1)^S \cdot 2^{-B+1} \cdot (M/2^{n-k-1})$
- Exceptional, $E = 2^k - 1$: $M = 0$ gives $+\infty$ or $-\infty$; $M \ne 0$ gives NaN
Example (single precision): 0 01111110 10100000000000000000000 = $2^{-1} \cdot 1.101_2 = 13/16 = 0.8125$
Special values: 0, -0; $1/0 = +\infty$, $(-1)/0 = -\infty$; $0/0 = \mathrm{NaN}$
Formats:
- n = 32, k = 8: float (single precision)
- n = 64, k = 11: double
- n = 79, k = 15: extended double
- n = 128, k = 15: quadruple
1 ulp: $1/2^{n-k-1}$
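The field formulas above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the helper names `decode_single` and `value_from_fields` are hypothetical, and the slide's example 0.8125 is used as input.

```python
import struct
from fractions import Fraction

def decode_single(x):
    """Split x, stored as an IEEE 754 single, into the fields (S, E, M)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    n, k = 32, 8
    s = bits >> (n - 1)                          # sign bit
    e = (bits >> (n - k - 1)) & ((1 << k) - 1)   # k exponent bits
    m = bits & ((1 << (n - k - 1)) - 1)          # n-k-1 mantissa bits
    return s, e, m

def value_from_fields(s, e, m, n=32, k=8):
    """Rebuild the exact value of a finite number from its fields."""
    bias = (1 << (k - 1)) - 1
    frac = Fraction(m, 1 << (n - k - 1))
    if e == (1 << k) - 1:
        raise ValueError("exceptional value (infinity or NaN)")
    if e == 0:                                   # denormal: no implicit leading 1
        mag = Fraction(2) ** (1 - bias) * frac
    else:                                        # normal: implicit leading 1
        mag = Fraction(2) ** (e - bias) * (1 + frac)
    return -mag if s else mag

s, e, m = decode_single(0.8125)
print(s, e, bin(m), value_from_fields(s, e, m))
```

Exact rationals (`Fraction`) are used so that the reconstruction itself introduces no rounding.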
5
Correct rounding, 4 rounding modes:
- to +∞
- to -∞
- to 0
- to the nearest
Exception flags:
- INVALID: incorrect arguments (NaN result)
- DIVISION-BY-ZERO: infinite result (precise ±∞)
- OVERFLOW: too big result (approximate ±∞)
- UNDERFLOW: too small (or denormal) result
- INEXACT: inexact result
6
| ID     | Processor arch | Library          | OS           |
| x86    | i686           | glibc 2.5        | Linux Fedora |
| ia64   | ia64           | glibc 2.4        | Linux Debian |
| x86_64 | x86_64         | glibc 2.3.4      | Linux RHEL   |
| s390   | s390           | glibc 2.4        | Linux Debian |
| ppc64  | ppc64          | glibc 2.7        | Linux Debian |
| ppc32  | ppc32          | glibc 2.3.5      | Linux SLES   |
| sparc  | UltraSparc III | Solaris libc     | Solaris 10   |
| VC8    | x86_64         | MS Visual C 2005 | Windows XP   |
| VC6    | i686           | MS Visual C 6.0  | Windows XP   |
7
Result categories:
- Exact
- 1 ulp errors*
- 2 to 5 ulp errors
- 6 to $2^{10}$ ulp errors
- $2^{10}$ to $2^{20}$ ulp errors
- More than $2^{20}$ ulp errors
- Errors in exceptional cases
- Errors for denormals
- Completely buggy
- Unsupported
Rounding modes: to nearest, to -∞, to +∞, to 0
Examples of erroneous results:
- rint(262144.25)↑ = 262144
- logb($2^{-1074}$) = -1022
- expm1(2.2250738585072e-308) = 5.421010862427522e-20
- exp(-6.453852113757105e-02) = 2.255531908873594e+15
- sinh(29.22104351584205) = -1.139998423128585e+12
- cosh(627.9957549410666) = -1.453242606709252e+272
- sin(33.63133354799544) = 7.99995094799809616e+22
- sin(-1.793463141525662e-76) = 9.801714032956058e-2
- acos(-1.0) = -3.141592653589794
- cos(917.2279304172412) = -13.44757421002838
- erf(3.296656889776298) = 8.035526204864467e+8
- erfc(-5.179813474865007) = -3.419501182737284e+287
- exp(553.8042397037792) = -1.710893968937284e+239
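The categories above are expressed in ulps. As a hedged illustration of how such an error could be measured for doubles, the sketch below compares a computed double against an exact rational reference. The helper names `ulp` and `ulp_error` are hypothetical, and in practice the reference would come from a multiprecision tool rather than from a hand-written fraction.

```python
import math
from fractions import Fraction

def ulp(y):
    """Spacing of doubles around a finite nonzero double y, as an exact rational."""
    _, e = math.frexp(abs(y))        # abs(y) = m * 2**e with 0.5 <= m < 1
    e = max(e, -1021)                # denormals share the smallest normal exponent
    return Fraction(2) ** (e - 53)

def ulp_error(computed, reference):
    """Distance between a computed double and an exact reference, in ulps."""
    return abs(Fraction(computed) - reference) / ulp(computed)

# The double nearest to 1/10 is off by 2/5 of an ulp.
print(ulp_error(0.1, Fraction(1, 10)))

# cos never goes below -1, so the reported cos(917.2279304172412) = -13.44757421002838
# is wrong by at least this many ulps.
print(ulp_error(-13.44757421002838, Fraction(-1)) > 2**20)
```

Measuring against the computed value's own ulp is one of several common conventions; a different convention (ulp of the reference) gives slightly different numbers near powers of two.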
8
9
10
11
Standards
- IEEE 754 (Floating-point arithmetic): FP numbers, basic operations
- ISO 9899 (C language and libraries): 56 real + 16 complex functions
- IEEE 1003.1 (POSIX): 63 real + 22 complex functions
- ISO 10967.1-3 (Language independent arithmetic): elementary real and complex functions
12
Operations: type conversions, +, -, *, /, sqrt, remainder, fma (2008)
- Correctly rounded results
  - 4 rounding modes
- Infinite results in overflow and precise infinity cases
  - In overflow, rounding to 0 returns the biggest finite number
- NaN results outside of function domain (and for NaN args)
- Exception flags: INVALID, DIVISION-BY-ZERO, OVERFLOW, UNDERFLOW, INEXACT
13
ISO/IEC 9899 (C language): 54 real functions
- Exact values: sin(0) = 0, log(1) = 0, …
- DIVISION-BY-ZERO flag: log(0), atanh(1), pow(0,x), Γ(-n)
- NaN results and INVALID flag outside of domains
IEEE 1003.1 (POSIX): 63 real + 22 complex
- All IEEE 754 flags (except for INEXACT) for real functions
- errno setting
  - Domain error ~ INVALID or DIVISION-BY-ZERO
  - Range error ~ OVERFLOW or UNDERFLOW
- If x is denormal, f(x) = x for each f with f(x) ~ x at 0 (sin, asin, sinh, expm1, …)
- In overflow HUGE_VAL should be returned (value of HUGE_VAL unspecified)
Notes:
- Inconsistency with rounding modes
- Source of non-interoperability, HUGE_VAL differs between libraries:
  - glibc: +∞
  - MSVCRT: max double (1.797693134862316e+308)
  - Solaris libc: max float (3.402823466385289e+38)
14
- Real and complex elementary functions (no erf, gamma, j0, y1, …)
- Only symmetric rounding modes (no rounding to +∞ or to -∞)
- Preservation of sign
- Preservation of monotonicity
- Inaccuracy 0.5 to 2.0 ulp (for sin, cos, tan: small arguments only)
- Evenness and oddness
- Exact values: cosh(0) = 1, log(1) = 0, …
- Asymptotics near 0: cos(x) ~ 1, sin(x) ~ x, …
- Relations: expm1 = sinh, atan <= ↓(π/2), …
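Several of these requirements (oddness, monotonicity, exact values) can be phrased as executable checks. The sketch below is only an illustration under assumptions: the checker names are hypothetical, the sample points are arbitrary, and Python's `math` module stands in for the C library under test.

```python
import math

def odd_violations(f, xs):
    """Arguments x for which f(-x) != -f(x)."""
    return [x for x in xs if f(-x) != -f(x)]

def monotonicity_violations(f, xs):
    """Adjacent pairs (a, b) with a < b where a supposedly increasing f decreases."""
    xs = sorted(xs)
    return [(a, b) for a, b in zip(xs, xs[1:]) if f(a) > f(b)]

def exact_value_violations(cases):
    """Cases (f, x, expected) where f(x) is not exactly the expected value."""
    return [(f.__name__, x, want) for f, x, want in cases if f(x) != want]

samples = [1e-300, 1e-10, 0.5, 1.0, 1.5]
print(odd_violations(math.sin, samples))
print(monotonicity_violations(math.atan, samples))
print(exact_value_violations([(math.cosh, 0.0, 1.0), (math.log, 1.0, 0.0)]))
```

Each checker returns the violating inputs rather than a boolean, so a failing property points directly at the argument to investigate.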
15
- Domain boundaries and poles (+ flags)
- Exact values, limits and asymptotics
- Preservation of sign and monotonicity
- Symmetries
  - Evenness, periodicity, others: Γ(1+x) = x·Γ(x)
- Relations and range boundaries
- Precision
  - Correct rounding (according to mode)
    - Computational accuracy
    - Interoperability and portability of libraries and applications
    - Feasible: ~ia64 (Intel), crlibm (INRIA)
16
- Correct rounding
- Oddness (symmetry with -x, 1/x)
- Range boundaries
- POSIX: f(x) = x for denormal x and f(x) ~ x at 0
- POSIX: HUGE_VAL instead of +∞
17
Extension of IEEE 754 to all library functions
- Correctly rounded results for 4 modes
  - Except for ones contradicting range boundaries
- Infinite results in overflow and precise infinity cases
  - In overflow, rounding to 0 returns the biggest finite number
- NaN results outside of function domain (and for NaN args)
- Exception flags
  - INVALID (and EDOM for errno): incorrect arguments
  - DIVISION-BY-ZERO (and ERANGE for errno): infinite result
  - OVERFLOW (and ERANGE for errno): too big result
  - UNDERFLOW (and ERANGE for errno): too small result (+ denormal)
  - INEXACT: inexact result
18
- Bit structure of FP numbers
  - Boundaries
    o 0, -0, +∞, -∞, NaN
    o Least and greatest positive and negative, normal and denormal
  - Mantissa patterns: FFFFFFFFFFFFF_16, FFFFF11110000_16, 555550000FFFF_16
  - Both arguments and values of a function
- Intervals of uniform function behavior
- Points where the correctly rounded result is hard to compute
Example: rint(262144.25)↑ = 262144
0100000100010000000000000000000100000000000000000000000000000000
x10000010001xxxxxxxxxxxxxxxxxx0100000000000000000000000000000000
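As a hedged sketch of how test arguments might be built from such bit patterns, the code below assembles doubles from a sign, an 11-bit exponent field, and a 52-bit mantissa. The function names and the choice of exponent fields are assumptions for illustration; only the three hexadecimal patterns come from the slide.

```python
import struct

# The three mantissa patterns listed on the slide (13 hex digits = 52 bits).
PATTERNS = [0xFFFFFFFFFFFFF, 0xFFFFF11110000, 0x555550000FFFF]

def make_double(sign, exponent_field, mantissa):
    """Assemble a double from its raw IEEE 754 fields."""
    bits = (sign << 63) | (exponent_field << 52) | mantissa
    (x,) = struct.unpack(">d", struct.pack(">Q", bits))
    return x

def pattern_arguments(exponent_fields):
    """All pattern mantissas combined with the given exponent fields and both signs."""
    return [make_double(s, e, m)
            for e in exponent_fields
            for m in PATTERNS
            for s in (0, 1)]

# The rint example: exponent field 1041 (2**18) and a single mantissa bit of weight 2**-20.
print(make_double(0, 1041, 1 << 32))
print(pattern_arguments([0, 1, 1023, 2046])[:4])
```

Exponent fields 0, 1 and 2046 cover denormals and the least and greatest normal exponents; 1023 corresponds to values in [1, 2).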
19
- Neighbourhoods of 0, ±∞
- Poles and overflow points
- Zeroes and extremes
- Tangents and asymptotes (horizontal and diagonal)
20
Examples of hard points (here $1^{78}$ denotes 78 consecutive 1 bits following the mantissa, and similarly $0^{67}$):
- $\tan(1.1101111111111111111111111111111111111111111100011111_2 \cdot 2^{-22}) = 1.1110000000000000000000000000000000000000000101010001\;0\,1^{78}\,010\ldots_2 \cdot 2^{-22}$
- $\sin(1.1110000000000000000000000000000000000000011100001000_2 \cdot 2^{-19}) = 1.1101111111111111111111111111111111111100000010111000\;0^{67}\,11101\ldots_2 \cdot 2^{-19}$
- $j1(1.1000000000000000000000000000000000000000000000000011_2 \cdot 2^{-23}) = 1.0111111111111111111111111111111111111111111111101000\;0^{94}\,11001\ldots_2 \cdot 2^{-22}$
Rounding to the nearest (result close to 0.5 ulp):
- f = x.xxxxxxxxxx|011111111...1xx...
- f = x.xxxxxxxxxx|100000000...0xx...
Rounding to 0, +∞, -∞:
- f = x.xxxxxxxxxx|00000000...0xx...
- f = x.xxxxxxxxxx|11111111...1xx...
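The bit patterns above can be detected mechanically once an exact (or sufficiently precise) reference value is available. The following sketch assumes the reference is given as a `Fraction`; the names `tail_bits` and `hard_run_lengths` are hypothetical, and the input is a synthetic value constructed to have the "0111…" shape rather than a real function value.

```python
import math
from fractions import Fraction

def tail_bits(value, mant_bits=52, extra=64):
    """Bits that follow the first `mant_bits` fraction bits of an exact positive value."""
    # Find e with 2**e <= value < 2**(e + 1).
    e = value.numerator.bit_length() - value.denominator.bit_length()
    if Fraction(2) ** e > value:
        e -= 1
    scaled = value / Fraction(2) ** e              # now in [1, 2)
    q = math.floor(scaled * 2 ** (mant_bits + extra))
    return format(q & ((1 << extra) - 1), "0{}b".format(extra))

def hard_run_lengths(tail):
    """Return (directed, nearest) run lengths at the start of the tail.

    directed: leading run of equal bits (000... or 111...).
    nearest:  length of a leading 0111... or 1000... prefix, 0 if the shape is absent.
    """
    directed = len(tail) - len(tail.lstrip(tail[0]))
    rest = tail[1:]
    nearest = 0
    if rest and rest[0] != tail[0]:
        nearest = 1 + (len(rest) - len(rest.lstrip(rest[0])))
    return directed, nearest

# Synthetic value just below 1 + 0.5 ulp: mantissa 0...0 followed by 0 1111111 0000.
v = 1 + Fraction(1, 2**53) - Fraction(1, 2**60)
tail = tail_bits(v, mant_bits=52, extra=12)
print(tail, hard_run_lengths(tail))
```

For real functions the exact value is not rational, so the reference would have to be an approximation with enough guard bits beyond `extra` for the tail to be trustworthy.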
21
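Probabilistic evaluation
- Uniform independent bits distribution
- Total $N = 2^{n-k-1}$ values
- $\sim N \cdot 2^{-m}$ have m consecutive equal bits
Real data for sin on exponent -16 (m: number of consecutive equal bits):
| m  | Eval. | 0, +∞, -∞ | Nearest (N) |
| 54 | 0.5   | 0         | 1           |
| 53 | 1     | 1         | 2           |
| 52 | 2     | 4         | 4           |
| 51 | 4     | 6         | 6           |
| 50 | 8     | 10        | 12          |
| 49 | 16    | 19        | 21          |
| 48 | 32    |           | 37          |
| 47 | 64    | 70        | 67          |
| 46 | 128   | 142       | 106         |
| 45 | 256   | 280       | 239         |
| 44 | 512   | 547       | 518         |
| 43 | 1024  | 1073      | 996         |
| 42 | 2048  | 2103      | 1985        |
| 41 | 4096  | 4187      | 4040        |
| 40 | 8192  | 8325      | 8142        |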
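The estimate can be reproduced with a few lines of exact arithmetic. This sketch rests on an assumption that is not stated on the slide: the factor $2^{1-m}$ (one free bit, then m-1 bits equal to it) is used instead of $2^{-m}$ because it matches the "Eval." figures, for example 8192 for 40 bits with a 52-bit mantissa. The function name is hypothetical.

```python
from fractions import Fraction

def expected_hard_points(mantissa_bits, m):
    """Expected number of values whose m bits after the mantissa are all equal,
    assuming independent, uniformly distributed bits (an idealization)."""
    total = 2 ** mantissa_bits            # N = 2**(n-k-1) values per exponent
    return Fraction(total, 2 ** (m - 1))  # N * 2**(1-m): first bit free, m-1 bits match it

for m in (54, 50, 45, 40):
    print(m, expected_hard_points(52, m))
```

Comparing these figures with measured counts, as the slide does for sin, is a quick sanity check on the uniform-bits model.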
22
- Exhaustive search (feasible only for single precision numbers)
- Continued fractions (Kahan, 1983)
- Dyadic method (Tang, 1989; Kahan, 1994)
- Reduced search (Lefevre, 1997)
- Lattice reduction (Gonnet, 2002; Stehle, Lefevre, Zimmermann, 2003)
- Integer secants method (2007)

Continued fractions:
$X \approx N \cdot \pi$; $X = M \cdot 2^m$; $2^{n-k-1} \le M < 2^{n-k}$; $\pi \approx (2^m \cdot M)/N$
Example (the four groups of digits form one integer):
3386417804515981120643892082331156599120239393299838035242121518428537 5540647742216209302675834747096020680456860263629892718144118637084998 6972132271594662263430201169763297290792255889271083061603403854134215 4669787134871905353772776431251615694251273653 $\cdot\ \pi/2 = 1.0110101011000101101100100110001011001010000111111110\;1^{857}\,011\ldots_2 \cdot 2^{849}$
$\sin(1.0110101011000101101100100110001011001010000111111111_2 \cdot 2^{849}) = 1.11111111111111111111111111111111111111111111111111\;1^{69}\,0110\ldots_2 \cdot 2^{-1}$

Dyadic method for sqrt:
$\sqrt{N \cdot 2^m} \approx M + \tfrac{1}{2}$; $2^{n-k-1} \le M, N < 2^{n-k}$
$2^{m+2} \cdot N = (2M + 1)^2 - j$
$(2M + 1)^2 \equiv j \pmod{2^{m+2}}$
Example with $j = 15$:
$\sqrt{1.0010010101100101011001011100101011011100101111110100_2} = 1.0001001000001111100110011001111010011001001101110100\;0\,1^{50}\,000\ldots_2$

Integer secants method:
$F(x) = f(x) - a \cdot x - b = c_1 x^2 + c_2 x^3 + c_3 x^4 + \ldots$
$F(x) = c_1 (G(x))^2$, $G(x) = x + d_1 x^2 + d_2 x^3 + \ldots$
$G(x) = y \Rightarrow x = H(y)$, where H is the reversed series
$x_m = H\left(\sqrt{m/(c_1 2^z)}\right)$
$f(x_m) - a \cdot x_m - b = m/2^z$
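The continued-fraction idea can be illustrated on a small scale: each convergent p/q of π gives an integer p that is unusually close to q·π. The sketch below is only an illustration under stated assumptions. It expands the double `math.pi` rather than π itself, so only the first few convergents are meaningful, and a real search for floating-point arguments near multiples of π would need a multiprecision value of π. The function name `convergents` is hypothetical.

```python
import math
from fractions import Fraction

def convergents(x, count):
    """First `count` continued-fraction convergents of a positive Fraction x."""
    out = []
    h0, k0 = 0, 1                    # h_{-2}, k_{-2}
    h1, k1 = 1, 0                    # h_{-1}, k_{-1}
    for _ in range(count):
        a = math.floor(x)            # next partial quotient
        h0, h1 = h1, a * h1 + h0
        k0, k1 = k1, a * k1 + k0
        out.append(Fraction(h1, k1))
        frac = x - a
        if frac == 0:                # expansion of a rational terminates
            break
        x = 1 / frac
    return out

for c in convergents(Fraction(math.pi), 4):
    # numerator ~ denominator * pi, so the numerator lies close to a multiple of pi
    print(c, float(c.numerator - c.denominator * Fraction(math.pi)))
```

The printed differences shrink quickly, which is why arguments derived from convergents stress argument reduction in sin, cos and tan.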
23
Hard points
- double
  o Some hard points with ≥ 48 additional bits can be found in crlibm tests: http://lipforge.ens-lyon.fr/projects/crlibm
  o Calculated (some) hard points with ≥ 40 additional bits for sqrt, cbrt, sin, asin, cos, acos, tan, atan, sinh, asinh, cosh, tanh, atanh, exp, log, exp2, expm1, log1p, erf, erfc, j0, j1
- float (single precision)
  o All hard points with ≥ 17 additional bits for sqrt, cbrt, exp, sin, cos
- extended double
  o All with ≥ 53 additional bits for sqrt, some for sin, exp
Test suites developed
- double: all 37 single real variable POSIX functions
- Correct values calculated by Maple and MPFR

|             | sqrt   | exp    | sin    | atan   | lgamma | j1     |
| Boundary    | 20     | 20     | 20     | 20     | 20     | 20     |
| Intervals   | 106    | 1622   | 3674   | 4242   | 11680  | 24538  |
| Patterns    | 141009 | 138451 | 331744 | 155008 | 121502 | 109036 |
| Hard points | 170170 | 28587  | 62342  | 95512  | 0      | 29436  |
| Other       | 84820  | 0      | 4616   | 0      | 229    | 5664   |
| Total       | 396125 | 168680 | 402396 | 254782 | 133431 | 168694 |
24
- No adequate standards for math libraries
  - Several standards, sometimes inconsistent, highly incomplete
- Correct rounding is needed for interoperability
- Test suites are useful even without a standard
Open questions:
- Complete set of hard points for some function?
- Multiple variable functions?
25
Contact
E-mail: kuliamin@ispras.ru
Web: www.ispras.ru/~kuliamin
Thank you! Questions?