Victor Kuliamin, Institute for System Programming, Russian Academy of Sciences, Moscow


• Astrophysics
• Geosciences
• Biosciences
• Social Sciences
Confidence?
PSI 2009, June 19

FFloating-point numbers + arithmetics BBasic math functions EElementary SSpecial SSpecialized libraries LLinear algebra NNumber theory NNumeric calculus DDynamic systems OOptimization …… 3 / 25 : sqrt, pow, exp, log, sin, atan, cosh, … : erf, tgamma, j0, y1, … PSI 2009, June 19

Floating-point number: n bits in total, k of them for the exponent
Layout: sign S (1 bit) | exponent E (k bits) | mantissa M (n-k-1 bits); bias B = 2^(k-1) - 1
• Normal: E > 0 and E < 2^k - 1
  X = (-1)^S · 2^(E-B) · (1 + M/2^(n-k-1))
• Denormal: E = 0
  X = (-1)^S · 2^(-B+1) · (M/2^(n-k-1))
• Exceptional: E = 2^k - 1
  - M = 0: +∞, -∞
  - M ≠ 0: NaN
Example (n = 32, k = 8): 0 01111110 10100000000000000000000 = (-1)^0 · 2^(-1) · 1.101_2 = 13/16 = 0.8125
Special values: 0, -0; 1/0 = +∞, (-1)/0 = -∞, 0/0 = NaN
Formats:
• n = 32, k = 8: float (single precision)
• n = 64, k = 11: double
• n = 79, k = 15: extended double
• n = 128, k = 15: quadruple
1 ulp (unit in the last place) = 1/2^(n-k-1), in units of the leading mantissa bit
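The field layout above can be checked directly on a machine double. The following is a minimal sketch for the double format only (n = 64, k = 11); the helper names decode_double, classify and value_from_fields are hypothetical and not part of the talk.

```python
import struct

N_BITS, K_BITS = 64, 11                  # double: n = 64, k = 11
M_BITS = N_BITS - K_BITS - 1             # 52 mantissa bits
BIAS = 2 ** (K_BITS - 1) - 1             # B = 2^(k-1) - 1 = 1023


def decode_double(x: float):
    """Return the fields (S, E, M) of a double as integers."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    s = bits >> (N_BITS - 1)
    e = (bits >> M_BITS) & ((1 << K_BITS) - 1)
    m = bits & ((1 << M_BITS) - 1)
    return s, e, m


def classify(x: float) -> str:
    """Name the class of x according to its exponent and mantissa fields."""
    _, e, m = decode_double(x)
    if e == 2 ** K_BITS - 1:
        return "infinity" if m == 0 else "NaN"
    if e == 0:
        return "zero" if m == 0 else "denormal"
    return "normal"


def value_from_fields(s: int, e: int, m: int) -> float:
    """Rebuild a finite value from its fields using the two formulas above."""
    frac = m / 2 ** M_BITS               # M / 2^(n-k-1), exact in a double
    if e == 0:                           # denormal (or zero)
        return (-1) ** s * 2.0 ** (1 - BIAS) * frac
    return (-1) ** s * 2.0 ** (e - BIAS) * (1 + frac)


if __name__ == "__main__":
    print(decode_double(0.8125), classify(0.8125))
    print(decode_double(5e-324), classify(5e-324))
```

The same value 0.8125 that the slide encodes in single precision is used here in double precision, so the exponent field differs (1022 instead of 126) while the leading mantissa bits 101 are the same.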

• Correct rounding: 4 rounding modes
  - to +∞
  - to -∞
  - to 0
  - to the nearest
• Exception flags
  - INVALID: incorrect arguments (NaN result)
  - DIVISION-BY-ZERO: infinite result (precise ±∞)
  - OVERFLOW: too big result (approximate ±∞)
  - UNDERFLOW: too small (or denormal) result
  - INEXACT: inexact result
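The four rounding modes and the exception flags can be tried out without touching the hardware rounding mode. The sketch below uses Python's decimal module, which works in decimal rather than binary arithmetic, so it is only an analogy to IEEE 754 binary formats; the function names and the 3-digit precision are assumptions made for illustration.

```python
from decimal import (Context, Decimal, ROUND_CEILING, ROUND_DOWN,
                     ROUND_FLOOR, ROUND_HALF_EVEN)

# Decimal counterparts of the four rounding modes listed above.
MODES = {
    "to +inf": ROUND_CEILING,
    "to -inf": ROUND_FLOOR,
    "to 0": ROUND_DOWN,
    "to nearest": ROUND_HALF_EVEN,
}


def divide_in_all_modes(a: int, b: int, digits: int = 3):
    """Divide a by b with `digits` significant digits in each rounding mode."""
    results = {}
    for name, mode in MODES.items():
        ctx = Context(prec=digits, rounding=mode, traps=[])
        results[name] = str(ctx.divide(Decimal(a), Decimal(b)))
    return results


def flags_raised(a: int, b: int, digits: int = 3):
    """Return the names of the flags set by a single division a / b."""
    ctx = Context(prec=digits, rounding=ROUND_HALF_EVEN, traps=[])
    ctx.divide(Decimal(a), Decimal(b))   # traps are off, so flags are only recorded
    return {signal.__name__ for signal, raised in ctx.flags.items() if raised}


if __name__ == "__main__":
    print(divide_in_all_modes(-2, 3))
    print(flags_raised(2, 3), flags_raised(1, 0), flags_raised(0, 0))
```

A negative quotient is used on purpose: for -2/3 the modes "to 0" and "to +inf" agree with each other and differ from "to -inf", which makes the direction of each mode visible.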

ID       Processor arch   Library            OS
x86      i686             glibc 2.5          Linux Fedora
ia64     ia64             glibc 2.4          Linux Debian
x86_64   x86_64           glibc 2.3.4        Linux RHEL
s390     s390             glibc 2.4          Linux Debian
ppc64    ppc64            glibc 2.7          Linux Debian
ppc32    ppc32            glibc 2.3.5        Linux SLES
sparc    UltraSparc III   Solaris libc       Solaris 10
VC8      x86_64           MS Visual C 2005   Windows XP
VC6      i686             MS Visual C 6.0    Windows XP

[Figure: test results per function and platform for rounding to nearest, to -∞, to +∞, to 0]
Legend:
• Exact
• 1 ulp errors*
• 2-5 ulp errors
• 6 to 2^10 ulp errors
• 2^10 to 2^20 ulp errors
• > 2^20 ulp errors
• Errors in exceptional cases
• Errors for denormals
• Completely buggy
• Unsupported
Examples of results found:
• rint(262144.25)↑ = 262144
• logb(2^-1074) = -1022
• expm1(2.2250738585072e-308) = 5.421010862427522e-20
• exp(-6.453852113757105e-02) = 2.255531908873594e+15
• exp(553.8042397037792) = -1.710893968937284e+239
• sinh(29.22104351584205) = -1.139998423128585e+12
• cosh(627.9957549410666) = -1.453242606709252e+272
• sin(33.63133354799544) = 7.99995094799809616e+22
• sin(-1.793463141525662e-76) = 9.801714032956058e-2
• cos(917.2279304172412) = -13.44757421002838
• acos(-1.0) = -3.141592653589794
• erf(3.296656889776298) = 8.035526204864467e+8
• erfc(-5.179813474865007) = -3.419501182737284e+287
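The error classes in the legend are measured in ulps. As a minimal sketch of such a measurement, the hypothetical helper ulp_error below compares a computed double with an exact rational reference; it assumes Python 3.9 or later for math.ulp, and it takes the ulp of the double nearest to the exact value, which is one of several possible conventions.

```python
import math
from fractions import Fraction


def ulp_error(computed: float, exact: Fraction) -> Fraction:
    """Distance between `computed` and `exact`, in ulps of the double nearest to `exact`."""
    reference = float(exact)             # Fraction -> float is rounded to nearest
    return abs(Fraction(computed) - exact) / Fraction(math.ulp(reference))


if __name__ == "__main__":
    # The rint example from the slide: rounding 262144.25 upward should give 262145.
    print(ulp_error(262144.0, Fraction(262145)))
    # A value one representable step away from the exact result 1.
    print(ulp_error(1.0 + 2.0 ** -52, Fraction(1)))
```

For the rint example the distance is a whole unit at a magnitude where one ulp is 2^-34, which is why a seemingly small absolute error lands in the largest error class.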

Standards
• IEEE 754 (Floating-point arithmetic): FP numbers, basic operations
• ISO 9899 (C language and libraries): 56 real + 16 complex functions
• IEEE 1003.1 (POSIX): 63 real + 22 complex functions
• ISO/IEC 10967-1 to 10967-3 (Language independent arithmetic): elementary real and complex functions

IEEE 754 basic operations: type conversions, +, -, *, /, sqrt, remainder, fma (2008)
• Correctly rounded results
• 4 rounding modes
• Infinite results in overflow and precise infinity cases
• In overflow, rounding to 0 returns the biggest finite number
• NaN results outside of function domain (and for NaN arguments)
• Exception flags: INVALID, DIVISION-BY-ZERO, OVERFLOW, UNDERFLOW, INEXACT

• ISO/IEC 9899 (C language): 54 real functions
  - Exact values: sin(0) = 0, log(1) = 0, …
  - DIVISION-BY-ZERO flag: log(0), atanh(1), pow(0,x), Γ(-n)
  - NaN results and INVALID flag outside of domains
• IEEE 1003.1 (POSIX): 63 real + 22 complex
  - All IEEE 754 flags (except for INEXACT) for real functions
  - errno setting
    o Domain error ~ INVALID or DIVISION-BY-ZERO
    o Range error ~ OVERFLOW or UNDERFLOW
  - If x is denormal, f(x) = x for each f with f(x) ~ x near 0 (sin, asin, sinh, expm1, …)
  - In overflow, HUGE_VAL should be returned (value of HUGE_VAL unspecified)
Notes:
• Inconsistency with rounding modes
• Source of non-interoperability: HUGE_VAL is
  - glibc: +∞
  - MSVCRT: max double (1.797693134862316e+308)
  - Solaris libc: max float (3.402823466385289e+38)
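The domain and range error categories above can be observed from a scripting language as well. The sketch below uses CPython's math module only as a convenient front end to the platform's C library: CPython reports domain errors as ValueError and overflow range errors as OverflowError instead of exposing errno or HUGE_VAL. The helper name classify_call is hypothetical, and the behaviour shown is that of CPython, not a statement about any particular libm.

```python
import math


def classify_call(func, x: float):
    """Call func(x) and report whether it ended in a domain or a range error."""
    try:
        return ("ok", func(x))
    except ValueError:                   # CPython's report for a domain error
        return ("domain error", None)
    except OverflowError:                # CPython's report for an overflow range error
        return ("range error", None)


if __name__ == "__main__":
    smallest_denormal = 5e-324           # 2^-1074, the least positive double
    for func, arg in [(math.log, -1.0), (math.atanh, 1.0), (math.exp, 1000.0),
                      (math.sin, smallest_denormal), (math.expm1, smallest_denormal)]:
        print(func.__name__, arg, classify_call(func, arg))
```

The last two calls exercise the POSIX rule f(x) = x for denormal x; in the default round-to-nearest mode a correctly rounded sin or expm1 gives the same answer, so the rule and correct rounding can only disagree in the directed modes.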

ISO/IEC 10967 (Language independent arithmetic)
• Real and complex elementary functions (no erf, gamma, j0, y1, …)
• Only symmetric rounding modes (no rounding to +∞ or to -∞)
• Preservation of sign
• Preservation of monotonicity
• Inaccuracy 0.5-2.0 ulp
• Evenness and oddness
• Exact values: cosh(0) = 1, log(1) = 0, …
• Asymptotics near 0: cos(x) ~ 1, sin(x) ~ x, …
• Relations: expm1 = sinh, atan <= ↓(π/2), …
Note: for sin, cos, tan, small arguments only

DDomain boundaries and poles (+ flags) EExact values, limits and asymptotics PPreservation of sign and monotonicity SSymmetries Evenness, periodicity, others : Г(1+x) = x·Г(x) RRelations and range boundaries PPrecision Correct rounding (according to mode) CComputational accuracy IInteroperability and portability of libraries and applications FFeasible – ~ia64 (Intel), crlibm (INRIA) 15 / 25 PSI 2009, June 19

[Diagram; labels: Correct rounding; Oddity (symmetry with -x, 1/x); Range boundaries; POSIX: f(x) = x for denormal x and f(x) ~ x near 0; POSIX: HUGE_VAL instead of +∞]

Extension of IEEE 754 to all library functions
• Correctly rounded results for 4 modes
  - Except for those contradicting range boundaries
• Infinite results in overflow and precise infinity cases
• In overflow, rounding to 0 returns the biggest finite number
• NaN results outside of function domain (and for NaN arguments)
• Exception flags
  - INVALID (and EDOM for errno): incorrect arguments
  - DIVISION-BY-ZERO (and ERANGE for errno): infinite result
  - OVERFLOW (and ERANGE for errno): too big result
  - UNDERFLOW (and ERANGE for errno): too small result (+ denormal result)
  - INEXACT: inexact result

• Bit structure of FP numbers
  - Boundaries
    o 0, -0, +∞, -∞, NaN
    o Least and greatest positive and negative, normal and denormal
  - Mantissa patterns: FFFFFFFFFFFFF_16, FFFFF11110000_16, 555550000FFFF_16
  - Both arguments and values of a function
• Intervals of uniform function behavior
• Points where the correctly rounded result is hard to compute
Example: rint(262144.25)↑ = 262144
  262144.25 = 0100000100010000000000000000000100000000000000000000000000000000
  pattern:    x10000010001xxxxxxxxxxxxxxxxxx0100000000000000000000000000000000
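Test data of this kind can be generated by assembling doubles from bit fields. The sketch below builds the boundary values and the three mantissa patterns listed above for the double format; the helper names make_double, bits_of, boundary_values and pattern_values are hypothetical, as is the choice of exponents to combine with the patterns.

```python
import struct

# The three 52-bit mantissa patterns listed above.
MANTISSA_PATTERNS = [0xFFFFFFFFFFFFF, 0xFFFFF11110000, 0x555550000FFFF]


def make_double(sign: int, exponent: int, mantissa: int) -> float:
    """Assemble a double from its sign, 11-bit exponent and 52-bit mantissa fields."""
    bits = (sign << 63) | (exponent << 52) | mantissa
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def bits_of(x: float) -> str:
    """Return the 64 bits of a double as a string of 0s and 1s."""
    return format(struct.unpack(">Q", struct.pack(">d", x))[0], "064b")


def boundary_values():
    """Zeroes, infinities, NaN, and least/greatest normal and denormal numbers."""
    values = []
    for sign in (0, 1):
        values += [
            make_double(sign, 0, 0),                   # +0 or -0
            make_double(sign, 0, 1),                   # least denormal
            make_double(sign, 0, 0xFFFFFFFFFFFFF),     # greatest denormal
            make_double(sign, 1, 0),                   # least normal
            make_double(sign, 2046, 0xFFFFFFFFFFFFF),  # greatest normal
            make_double(sign, 2047, 0),                # infinity
        ]
    values.append(make_double(0, 2047, 1 << 51))       # a NaN
    return values


def pattern_values(exponents=(1, 1023, 2046)):
    """Combine each mantissa pattern with a few exponents (an arbitrary choice)."""
    return [make_double(0, e, m) for e in exponents for m in MANTISSA_PATTERNS]


if __name__ == "__main__":
    print(bits_of(262144.25))
    for value in pattern_values():
        print(value, bits_of(value))
```

The first printed line can be compared with the bit string of 262144.25 shown in the rint example.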

• Neighbourhoods of 0, ±∞
• Poles and overflow points
• Zeroes and extrema
• Tangents and asymptotes: horizontal and diagonal

Examples (b^r denotes the bit b repeated r times):
• tan(1.1101111111111111111111111111111111111111111100011111_2 · 2^-22) = 1.1110000000000000000000000000000000000000000101010001 0 1^78 010…_2 · 2^-22
• sin(1.1110000000000000000000000000000000000000011100001000_2 · 2^-19) = 1.1101111111111111111111111111111111111100000010111000 0^67 11101…_2 · 2^-19
• j1(1.1000000000000000000000000000000000000000000000000011_2 · 2^-23) = 1.0111111111111111111111111111111111111111111111101000 0^94 11001…_2 · 2^-22
Rounding to the nearest (exact value close to the 0.5 ulp midpoint):
  f = x.xxxxxxxxxx|011111111...1xx...
  f = x.xxxxxxxxxx|100000000...0xx...
Rounding to 0, +∞, -∞:
  f = x.xxxxxxxxxx|00000000...0xx...
  f = x.xxxxxxxxxx|11111111...1xx...
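The number of additional bits behind the mantissa can be measured exactly for sqrt, because its bits can be computed with integer arithmetic alone. The sketch below does this for doubles in [1, 4); the helper names sqrt_bits and additional_bits, the 120-bit window and the example argument 1 + 2^-52 are assumptions chosen for illustration and are not taken from the talk. It requires Python 3.8 or later for math.isqrt.

```python
import math

MANTISSA_BITS = 52


def sqrt_bits(x: float, extra: int = 120) -> str:
    """Fraction bits of sqrt(x), truncated to 52 + extra bits, for x in [1, 4)."""
    if not 1.0 <= x < 4.0:
        raise ValueError("this sketch only handles arguments in [1, 4)")
    num, den = x.as_integer_ratio()      # x = num / den, den is a power of two
    p = MANTISSA_BITS + extra
    root = math.isqrt((num << (2 * p)) // den)   # floor(sqrt(x) * 2^p)
    return bin(root)[3:]                 # drop '0b' and the integer bit 1


def additional_bits(x: float, extra: int = 120):
    """Return (first bit after the mantissa, following bit, length of its run)."""
    tail = sqrt_bits(x, extra)[MANTISSA_BITS:]
    first, rest = tail[0], tail[1:]
    run = len(rest) - len(rest.lstrip(rest[0]))
    return first, rest[0], run


if __name__ == "__main__":
    x = 1.0 + 2.0 ** -52                 # sqrt(x) = 1 + 2^-53 - 2^-107 + ...
    print(sqrt_bits(x)[MANTISSA_BITS:])
    print(additional_bits(x))
```

For this argument the bits after the mantissa start with a 0 followed by a long run of 1s, which is the first of the two patterns shown for rounding to the nearest.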

Probabilistic evaluation
• Uniform independent bits distribution
• Total N = 2^(n-k-1) values
• ~N·2^-m have m consecutive equal bits
Real data for sin on exponent -16:
N     Eval.   nearest   0, +∞, -∞
54    0.5     0         1
53    1       1         2
52    2       4         4
51    4       6         6
50    8       10        12
49    16      19        21
48    32                37
47    64      70        67
46    128     142       106
45    256     280       239
44    512     547       518
43    1024    1073      996
42    2048    2103      1985
41    4096    4187      4040
40    8192    8325      8142

• Exhaustive search: feasible only for single precision numbers
• Continued fractions (Kahan, 1983)
    X ≈ N·π; X = M·2^m; 2^(n-k-1) <= M < 2^(n-k)  ⟹  π ≈ (2^m·M)/N
    Example:
    3386417804515981120643892082331156599120239393299838035242121518428537554064774221620930267583474709602068045686026362989271814411863708499869721322715946622634302011697632972907922558892710830616034038541342154669787134871905353772776431251615694251273653 · π/2 = 1.0110101011000101101100100110001011001010000111111110 1^857 011…_2 · 2^849
    sin(1.0110101011000101101100100110001011001010000111111111_2 · 2^849) = 1.11111111111111111111111111111111111111111111111111 1^69 0110…_2 · 2^-1
• Dyadic method (Tang, 1989; Kahan, 1994)
    sqrt(N·2^m) ≈ M + ½; 2^(n-k-1) <= M, N < 2^(n-k)  ⟹  2^(m+2)·N = (2·M + 1)^2 - j  ⟹  (2·M + 1)^2 = j (mod 2^(m+2))
    Example, j = 15:
    sqrt(1.0010010101100101011001011100101011011100101111110100_2) = 1.0001001000001111100110011001111010011001001101110100 0 1^50 000…_2
• Reduced search (Lefevre, 1997)
• Lattice reduction (Gonnet, 2002; Stehle, Lefevre, Zimmermann, 2003)
• Integer secants method (2007)
    F(x) = f(x) - a·x - b = c_1·x^2 + c_2·x^3 + c_3·x^4 + …
    F(x) = c_1·(G(x))^2, G(x) = x + d_1·x^2 + d_2·x^3 + …
    G(x) = y  ⟹  x = H(y), where H is the reversed series
    x_m = H(sqrt(m/(c_1·2^z)))  ⟹  f(x_m) - a·x_m - b = m/2^z
(b^r denotes the bit b repeated r times)
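The continued fractions approach rests on the fact that the convergents of π are its best rational approximations. The sketch below only computes the first convergents from a 50-digit decimal value of π; it does not implement the search for numerators of the special form M·2^m with a bounded mantissa M that the hard-point search additionally needs. The names PI_50 and convergents are hypothetical.

```python
from fractions import Fraction

# pi truncated to 50 decimal places; enough for the first few convergents.
PI_50 = Fraction("3.14159265358979323846264338327950288419716939937510")


def convergents(x: Fraction, count: int):
    """Return up to `count` continued-fraction convergents of x."""
    result = []
    h_prev, h = 0, 1                     # numerators h_(n-2), h_(n-1)
    k_prev, k = 1, 0                     # denominators k_(n-2), k_(n-1)
    for _ in range(count):
        a = x.numerator // x.denominator # next partial quotient, floor(x)
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append(Fraction(h, k))
        remainder = x - a
        if remainder == 0:               # x was rational and is exhausted
            break
        x = 1 / remainder
    return result


if __name__ == "__main__":
    for c in convergents(PI_50, 5):
        print(c, float(abs(c - PI_50)))
```

Each convergent p/q gives an integer pair with p ≈ q·π; arguments that are hard for trigonometric functions arise when such a p, or a multiple of π/2 as in the example above, happens to be representable as a floating-point number.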

• Hard points
  - double
    o Some hard points with ≥ 48 additional bits can be found in crlibm tests http://lipforge.ens-lyon.fr/projects/crlibm
    o Calculated (some) hard points with ≥ 40 additional bits for sqrt, cbrt, sin, asin, cos, acos, tan, atan, sinh, asinh, cosh, tanh, atanh, exp, log, exp2, expm1, log1p, erf, erfc, j0, j1
  - float (single precision)
    o All hard points with ≥ 17 additional bits for sqrt, cbrt, exp, sin, cos
  - extended double
    o All with ≥ 53 additional bits for sqrt, some for sin, exp
• Test suites developed
  - double: all 37 single real variable POSIX functions
  - Correct values calculated by Maple and MPFR

              sqrt     exp      sin      atan     lgamma   j1
Boundary      20       20       20       20       20       20
Intervals     106      1622     3674     4242     11680    24538
Patterns      141009   138451   331744   155008   121502   109036
Hard points   170170   28587    62342    95512    0        29436
Other         84820    0        4616     0        229      5664
Total         396125   168680   402396   254782   133431   168694

• No adequate standards for math libraries: several standards, sometimes inconsistent, highly incomplete
• Correct rounding is needed for interoperability
• Test suites are useful even without a standard
Open questions:
• Complete set of hard points for some function?
• Functions of multiple variables?

Contact
• E-mail: kuliamin@ispras.ru
• Web: www.ispras.ru/~kuliamin
Thank you! Questions?
