Rational Arithmetic Mathematica Functions to Evaluate the Two-Sided Two Sample Kolmogorov-Smirnov Cumulative Sampling Distribution

J. Randall Brown, Milton E Harvey, Ceyhun Ozgur

Kolmogorov-Smirnov Test
A widely used nonparametric goodness-of-fit test, designed to determine whether two populations are the same or whether one population is greater than the other. Five formulae are identified that can evaluate the two-sided two sample Kolmogorov-Smirnov cumulative sampling distribution (that is, determine a p-value). All five formulae are implemented in rational arithmetic, and Mathematica is used to determine which formula is fastest under various conditions.

Kolmogorov-Smirnov two sample two-sided Test
Following the notation in Pratt and Gibson (1981), take two independent random samples of sizes m and n, measured on at least an ordinal scale, from continuous cumulative distributions F(t) and G(t), respectively. Let the observations from F(t) be designated by X1, X2, X3, …, Xm and those from G(t) by Y1, Y2, Y3, …, Yn, where no X value equals any Y value. We define nXleYj as the number of X's less than or equal to Y(j), the jth smallest Y.
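The following is a minimal sketch of the counting notation. It assumes Python and a hypothetical helper named `n_x_le_y`. The counts nXleYj are obtained by sorting the X's and locating each ordered Y(j) with a binary search:

```python
from bisect import bisect_right

def n_x_le_y(xs, ys):
    """Return [nXleY1, ..., nXleYn]: for each ordered Y(j), the number of X's <= Y(j)."""
    xs_sorted = sorted(xs)
    # bisect_right returns how many sorted X values are <= y.
    return [bisect_right(xs_sorted, y) for y in sorted(ys)]

print(n_x_le_y([1.2, 3.4], [0.5, 2.0, 4.1, 5.0]))  # [0, 1, 2, 2]
```

Sorting once and then using binary search keeps the cost at O((m + n) log m). This is a choice made for illustration, not something taken from the source.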

Alternate ways to express the two-sided upper random variable are shown below.

The formula to calculate the test statistic d+ for a specific data set is shown below.

The test statistic d for the two-sided two sample K–S test is the maximum absolute difference between the two empirical distribution functions. We assume that no X value equals a Y value, so at least one of d+ and d- must be positive and d cannot be zero. As a result, d must take one of the values 1/mn, 2/mn, …, (mn-2)/mn, (mn-1)/mn, 1. Note that two X values can be equal and two Y values can be equal, but we assume that no X value equals a Y value. However, in the real world, an X value may equal a Y value, either because they are actually equal or because they are equal to the accuracy of the measurement technique.
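The exact values of d+, d- and d can be sketched in rational arithmetic with Python's `fractions.Fraction`. The sketch assumes the usual definitions:

- d+ is the maximum over t of Fm(t) - Gn(t).
- d- is the maximum over t of Gn(t) - Fm(t).

Both are evaluated at the pooled observations. The function name `ks_two_sample` is hypothetical, and this is not the authors' Mathematica code.

```python
from fractions import Fraction

def ks_two_sample(xs, ys):
    """Exact (d_plus, d_minus, d) for two samples in which no X equals any Y."""
    if set(xs) & set(ys):
        raise ValueError("an X value equals a Y value (tie across samples)")
    m, n = len(xs), len(ys)
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    i = j = 0                         # X's and Y's at or below the current point
    d_plus = d_minus = Fraction(0)    # both ECDFs are 0 below every observation
    for _, label in pooled:
        if label == 0:
            i += 1
        else:
            j += 1
        diff = Fraction(i, m) - Fraction(j, n)   # Fm(t) - Gn(t), kept exact
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    return d_plus, d_minus, max(d_plus, d_minus)

d_plus, d_minus, d = ks_two_sample([1.2, 3.4], [0.5, 2.0, 4.1, 5.0])
print(d_plus, d_minus, d)   # 1/2 1/4 1/2
print(d * 2 * 4)            # 4: k = mn*d is an integer
```

Ties within one sample are harmless here: the intermediate difference computed between two tied values always lies between the true values before and after them.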

The smallest value of the test statistic d+ = k/mn is zero; this follows from the j = n term of equation (1). In general, if i is the greatest common divisor of m and n, then every possible value of k must be an integer multiple of i. The reason is that mn(Fm(t) - Gn(t)) = an - bm for integers a and b, and any such combination is divisible by i. For both sample size pairs m = 2, n = 4 and m = 4, n = 6, the greatest common divisor is i = 2, and the possible values of k in Table 2 are integer multiples of 2.
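For small samples, the divisibility claim can be checked by brute force. Under the null hypothesis, all C(m+n, m) placements of the X's among the pooled ranks are equally likely, so enumerating them also gives an exact p-value in rational arithmetic. This is a sketch in Python with hypothetical function names. It is a swapped-in enumeration technique, not one of the five formulae, and it is practical only for small m and n.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, gcd

def k_statistics(x_ranks, m, n):
    """(k_plus, k_minus) for one arrangement: d+ = k_plus/(m*n), d- = k_minus/(m*n)."""
    x_ranks = set(x_ranks)
    i = j = k_plus = k_minus = 0
    for r in range(m + n):
        if r in x_ranks:
            i += 1
        else:
            j += 1
        diff = i * n - j * m          # m*n*(Fm - Gn); always a multiple of gcd(m, n)
        k_plus = max(k_plus, diff)
        k_minus = max(k_minus, -diff)
    return k_plus, k_minus

def arrangements(m, n):
    """Every placement of the m X's among the m + n pooled ranks."""
    return combinations(range(m + n), m)

def attainable_k_plus(m, n):
    return sorted({k_statistics(a, m, n)[0] for a in arrangements(m, n)})

def exact_p_two_sided(k, m, n):
    """P(D >= k/(m*n)) under H0, counted exactly over all arrangements."""
    hits = sum(1 for a in arrangements(m, n) if max(k_statistics(a, m, n)) >= k)
    return Fraction(hits, comb(m + n, m))

for m, n in [(2, 4), (4, 6)]:
    ks = attainable_k_plus(m, n)
    assert all(k % gcd(m, n) == 0 for k in ks)
    print(m, n, ks)

print(exact_p_two_sided(8, 2, 4))   # 2/15: only XXYYYY and YYYYXX give d = 1
```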

Kolmogorov-Smirnov Test Implementation issues
Using current computational software, the formulae can be implemented using rational arithmetic, arbitrary precision arithmetic, or machine precision arithmetic. Rational arithmetic stores every number as a ratio of two integers (a rational number), where each integer can have as many decimal digits as needed to express the number exactly. Rational arithmetic slows down as the number of digits in the numerator and denominator increases, but it introduces no error as long as no irrational numbers are involved.
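Here is a short Python illustration of both points. Python is an assumption here, since the authors work in Mathematica.

- Rational arithmetic is exact where binary floating point is not.
- The exact numerators and denominators grow as the computation proceeds.

```python
from fractions import Fraction

# Ten additions of one tenth: exact with rationals, not with binary floats.
exact = sum(Fraction(1, 10) for _ in range(10))
approx = sum(0.1 for _ in range(10))
print(exact == 1, approx == 1)   # True False

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact rational number."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

# The price of exactness: the number of digits grows with n.
for n in (10, 50, 100):
    h = harmonic(n)
    print(n, len(str(h.numerator)), len(str(h.denominator)))
```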

Machine precision arithmetic uses a fixed number of decimal digits in computations, usually fewer than twenty and determined by the computer hardware. It is therefore subject to round-off error and catastrophic cancellation. Catastrophic cancellation occurs when one number is subtracted from another number of about the same value. For example, suppose two numbers each have nine decimal digits of precision and agree in their first seven digits. Their difference then has only two decimal digits of precision. Although machine precision is fast, it can significantly degrade accuracy and, even worse, the user may not be aware that the accuracy has been reduced.
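The effect can be reproduced with Python's `decimal` module limited to nine significant digits. The numbers are chosen only for illustration and are not those of the original example. First, 2/3 is rounded to nine digits. Then a value that agrees with it in its first eight digits is subtracted, and the result is compared with the exact rational answer.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

with localcontext() as ctx:
    ctx.prec = 9                          # nine significant decimal digits
    x = Decimal(2) / Decimal(3)           # rounded to 0.666666667
    y = Decimal("0.666666660")            # agrees with x in its first eight digits
    computed = x - y                      # the leading digits cancel

exact = Fraction(2, 3) - Fraction(y)      # 1/150000000, about 6.67E-9
rel_error = abs(Fraction(computed) - exact) / exact
print(computed, exact, rel_error)         # 7E-9 1/150000000 1/20
```

Nothing in the computed value `7E-9` signals that its relative error is 5 percent. This silent loss of accuracy is the danger described above.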

Arbitrary precision arithmetic is like machine precision arithmetic, except that the number of decimal digits of precision does not depend on the computer hardware; the user specifies it. Arbitrary precision is slower than machine precision but faster than rational arithmetic. In addition, Mathematica keeps track of the resulting precision rp, so for the example above it would also know that the result has a precision of rp = 2.

The trick in using arbitrary precision arithmetic is choosing the precision used in internal calculations (the internal precision ip) so that the final answer has at least a specified desired precision dp. In other words, the user must specify both ip and dp. The value of ip must be large enough that the resulting precision rp of the final answer is greater than or equal to dp.
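Python's `decimal` module does not track resulting precision the way Mathematica does. The sketch below therefore substitutes a common heuristic: double the internal precision ip until two successive results agree to roughly dp significant digits. The function names and the starting value ip = dp + 5 are hypothetical choices. Agreement between successive runs is evidence, not proof, that dp digits are correct.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

def cancellation_example(ip):
    """2/3 - 0.66666666 evaluated with ip significant digits (illustrative only)."""
    with localcontext() as ctx:
        ctx.prec = ip
        return Decimal(2) / Decimal(3) - Decimal("0.66666666")

def evaluate_with_dp(f, dp, ip=None, max_ip=1000):
    """Double the internal precision ip until two successive values of f(ip)
    agree to about dp significant digits; return (value, ip)."""
    ip = ip or dp + 5
    previous = f(ip)
    while ip < max_ip:
        ip *= 2
        current = f(ip)
        with localcontext() as ctx:
            ctx.prec = ip    # compare at the same precision as the results
            if abs(current - previous) <= abs(current) * Decimal(10) ** -dp:
                return current, ip
        previous = current
    raise ArithmeticError("desired precision dp not reached below max_ip")

value, ip = evaluate_with_dp(cancellation_example, dp=10)
print(value, ip)
```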

In terms of accuracy, rational arithmetic gives the exact probability (no error) as long as the test statistic d can be expressed exactly as a rational number; it cannot be an irrational number. The only way d+ can be irrational is if the hypothesized distribution F(x) is irrational for some x. By definition, the empirical distribution Fn(x) is always rational (i/n for i = 0, 1, 2, …, n). In such cases, d+ can be approximated arbitrarily closely by rational numbers above and below it. Because the p-value P(D ≥ d) does not increase as d increases, the p-values at these two bounds bracket the exact p-value to any desired accuracy.
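Suppose, hypothetically, that d+ equals the irrational value √2/100. Exact integer square roots then give rational bounds of any desired tightness. The helper name below is illustrative only.

```python
from fractions import Fraction
from math import isqrt

def bracket_sqrt2_over_100(digits):
    """Rationals lo < sqrt(2)/100 < hi with hi - lo = 10**-(digits + 2)."""
    scale = 10 ** digits
    root_floor = isqrt(2 * scale * scale)   # floor(sqrt(2) * scale), exact integer math
    lo = Fraction(root_floor, 100 * scale)
    hi = Fraction(root_floor + 1, 100 * scale)
    return lo, hi

lo, hi = bracket_sqrt2_over_100(20)
# Since P(D >= d) is non-increasing in d: P(D >= hi) <= P(D >= sqrt(2)/100) <= P(D >= lo).
print(float(lo), float(hi), hi - lo)
```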

Kolmogorov-Smirnov Test (formulae)
Gnedenko and Korolyuk (GK)
Hodges
Steck
Steck Determinant (SteckDet)
Pratt and Gibson (PG)

Kolmogorov-Smirnov Test (software)
IMSL (Two-sided KS): calculates p-values based on a recursive formula in Kim and Jennrich
IMSL (One-sided KS): one-half the corresponding two-sided p-value
Numerical Recipes (Two-sided KS): calculates p-values based on an approximation in Stephens
R Development Core Team (Two-sided KS): calculates p-values based on a table in Birnbaum and Hall

Kolmogorov-Smirnov Test (software)
S Plus (Two-sided KS): limiting distribution in Smirnov and table in Massey
S Plus (One-sided KS): one-half the corresponding two-sided p-value
SPSS (Two-sided KS): modification of the limiting distribution in Smirnov

Most software packages do not calculate the one-sided two-sample p-value directly. They just use one-half the two-sided p-value, which is an approximation of unknown accuracy.
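For tiny samples, it is easy to check how far the one-half shortcut can fall from the exact one-sided p-value. The check enumerates all C(m+n, m) arrangements, which are equally likely under the null hypothesis. This brute-force Python sketch uses hypothetical function names, is practical only for small m and n, and is not one of the formulae discussed here.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def k_plus_minus(x_ranks, m, n):
    """(k_plus, k_minus) for one arrangement: d+ = k_plus/(m*n), d- = k_minus/(m*n)."""
    x_ranks = set(x_ranks)
    i = j = k_plus = k_minus = 0
    for r in range(m + n):
        if r in x_ranks:
            i += 1
        else:
            j += 1
        diff = i * n - j * m            # m*n*(Fm - Gn) after pooled rank r
        k_plus, k_minus = max(k_plus, diff), max(k_minus, -diff)
    return k_plus, k_minus

def exact_p_values(k, m, n):
    """(P(D+ >= k/mn), P(D >= k/mn)) under H0, counted over all arrangements."""
    one = two = 0
    for a in combinations(range(m + n), m):
        kp, km = k_plus_minus(a, m, n)
        one += kp >= k
        two += max(kp, km) >= k
    total = comb(m + n, m)
    return Fraction(one, total), Fraction(two, total)

one_sided, two_sided = exact_p_values(2, 2, 4)
print(one_sided, two_sided / 2)   # 4/5 1/2
```

Take m = 2, n = 4 and d = 2/8. The exact one-sided p-value is 4/5, while halving the two-sided p-value (which is 1) gives 1/2.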

All formulae were implemented in Mathematica using rational arithmetic. When m = n, the GK formula is faster than the Korolyuk, Steck, SteckDet, and PG formulae. When n is an integer multiple of m, Korolyuk is much faster than Steck, SteckDet, and PG. When n is not an integer multiple of m, Steck is faster than both SteckDet and PG.
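For the m = n case, the sketch below uses the form of the Gnedenko-Korolyuk result usually quoted in the literature:

P(D ≥ c/n) = 2 Σ_{j=1}^{⌊n/c⌋} (-1)^{j+1} C(2n, n - jc) / C(2n, n)

It is evaluated with exact integers and `Fraction`. We assume this matches the GK formula referred to here. A brute-force cross-check for n ≤ 6 is included so that this assumption can be tested, and no timing comparison is attempted.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def gk_p_value(c, n):
    """P(D >= c/n) when m = n, for 1 <= c <= n, in exact rational arithmetic."""
    total = sum((-1) ** (j + 1) * comb(2 * n, n - j * c) for j in range(1, n // c + 1))
    return Fraction(2 * total, comb(2 * n, n))

def brute_force_p_value(c, n):
    """The same probability, by enumerating all C(2n, n) equally likely arrangements."""
    hits = 0
    for x_ranks in combinations(range(2 * n), n):
        xs = set(x_ranks)
        walk = worst = 0
        for r in range(2 * n):
            walk += 1 if r in xs else -1    # n*(Fn - Gn) after pooled rank r
            worst = max(worst, abs(walk))
        hits += worst >= c
    return Fraction(hits, comb(2 * n, n))

for n in range(1, 7):
    for c in range(1, n + 1):
        assert gk_p_value(c, n) == brute_force_p_value(c, n)

print(gk_p_value(2, 4))   # 27/35
```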

Kolmogorov-Smirnov Test Research progression
Rational arithmetic
Rational arithmetic to evaluate approximations
Arbitrary precision
Arbitrary precision to evaluate approximations for large samples
