Evaluating Kolmogorov’s Distribution

Kolmogorov's goodness-of-fit measure, Dn, for a sample CDF has consistently been set aside for methods such as the D+n or D−n of Smirnov, primarily, it seems, because of the difficulty of computing the distribution of Dn. As far as we know, no easy way to compute that distribution has ever been provided in the 70+ years since Kolmogorov's fundamental paper. We provide one here, a C procedure that provides Pr(Dn < d) with 13-15 digit accuracy for n ranging from 2 to at least 16000. We assess the (rather slow) approach to limiting form, and because computing time can become excessive for probabilities > .999 with n's of several thousand, we provide a quick approximation that gives accuracy to the 7th digit for such cases.


Introduction
For an ordered set x1 < · · · < xn of purported uniform [0,1) variates, Kolmogorov [5] suggested

Dn = max( 1/n − x1, 2/n − x2, . . . , n/n − xn, x1 − 0/n, x2 − 1/n, . . . , xn − (n−1)/n )

as a goodness-of-fit measure. The distribution of Dn is difficult. It has been discussed extensively in the literature, but to date no easily-applied method has been made available. We offer one here. The alternatives proposed by Smirnov, either D+n, the maximum of the first half of the above list, or D−n, the maximum of the second half, have a common, easier, distribution. They are widely used, particularly in statistical computing, because of Knuth's recommended use of K+n = √n·D+n and K−n = √n·D−n on the grounds that they "seem most convenient for computer use", [4] p. 57.
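To make the definitions concrete, the three statistics can be computed from a sorted sample in a few lines of C. This is an illustrative sketch with our own function names, not code from this article:

```c
/* Illustrative helpers (ours, not from this article). Given sorted
   purported uniform [0,1) values x[0..n-1], compute Smirnov's
   one-sided statistics and Kolmogorov's Dn = max(D+n, D-n). */
double d_plus(const double *x, int n) {          /* max of i/n - x_i */
    double m = 0.0;
    for (int i = 0; i < n; i++) {
        double v = (double)(i + 1) / n - x[i];   /* i runs 1..n in the text */
        if (v > m) m = v;
    }
    return m;
}
double d_minus(const double *x, int n) {         /* max of x_i - (i-1)/n */
    double m = 0.0;
    for (int i = 0; i < n; i++) {
        double v = x[i] - (double)i / n;
        if (v > m) m = v;
    }
    return m;
}
double d_n(const double *x, int n) {             /* Dn = max(D+n, D-n) */
    double p = d_plus(x, n), q = d_minus(x, n);
    return p > q ? p : q;
}
```

For the sample {.1, .4, .7}, for example, D+3 = max(.2333, .2667, .3) = .3 and D−3 = max(.1, .0667, .0333) = .1, so D3 = .3.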
Concerning the distribution of Dn, Drew, Glen and Leemis report in a recent article that after an extensive review, "There appears to be no source that produces exact distribution functions for any distribution where n > 3 in the literature", [2] p. 3. They then undertake to provide such by extending Birnbaum's development [1] of Pr(Dn < d) as a spline function: polynomials of degree n between knots at 1/(2n), 2/(2n), . . . , 1, using multiple integrals. They succeed in reducing the required successive integrations of Birnbaum's method (for example, from 444540 to 800 when n = 10) and provide the polynomials to n = 6 with a comment that they had found all such polynomials up to n = 30, available on request at www.math.wm.edu/∼leemis. (Our request yielded "Access not authorized" and an email request went unanswered.) We provide here a relatively small C procedure, K(n,d), that will provide Pr(Dn < d) with far greater precision than is needed in practice. The method expresses d in the form d = (k − h)/n with k a positive integer and 0 ≤ h < 1. The C procedure K(n,d) uses numerical values for h, but with just the symbol h, one can, for example in Maple or Mathematica, easily derive polynomials in h that, with the substitution h = k − nd, yield the polynomials that make up the CDF between knots 1/(2n), 2/(2n), . . . , 1.
Evaluating Pr(Dn < d)

The method we use is based on a succession of developments that started with Kolmogorov's viewing the steps of the sample CDF as a Poisson process and culminated in the masterful treatment by Durbin [3]. His monograph summarizes and extends the results of numerous authors who had made progress on the problem in the years 1933-73. The result is a method that expresses the required probability as a certain element in the nth power of an easily formed matrix. History of the development is available through the monograph's 136 references.
We want to evaluate Pr(Dn < d). Write d = (k − h)/n with k a positive integer and 0 ≤ h < 1.

Then Pr(Dn ≤ d) = (n!/n^n)·t_kk, where t_kk is the (k, k) element of the matrix T = H^n, and H is an m × m matrix, m = 2k − 1, whose general form is easily inferred from this particular case when m = 6 and h ≤ 1/2:

    | (1−h)/1!     1           0           0           0           0        |
    | (1−h^2)/2!   1/1!        1           0           0           0        |
    | (1−h^3)/3!   1/2!        1/1!        1           0           0        |
    | (1−h^4)/4!   1/3!        1/2!        1/1!        1           0        |
    | (1−h^5)/5!   1/4!        1/3!        1/2!        1/1!        1        |
    | (1−2h^6)/6!  (1−h^5)/5!  (1−h^4)/4!  (1−h^3)/3!  (1−h^2)/2!  (1−h)/1! |

The first column has elements (1 − h^i)/i!, and the bottom row of the matrix reflects the first column in reverse order; the bottom-left corner is (1 − 2h^m + max(0, 2h − 1)^m)/m!, the max(0, 2h − 1)^m term vanishing here because h ≤ 1/2. Aside from the first column and last row, the (i, j) element is 1/(i − j + 1)! if i − j + 1 ≥ 0, and 0 otherwise.

Example: Suppose n = 10 and we want Pr(D10 ≤ .274). Express d = .274 as .274 = (3 − h)/10, so that k = 3, m = 2k − 1 = 5 and h = 3 − 10 × .274 = .26.

Limiting Forms
The limiting form for the distribution function of Kolmogorov's Dn is

L(x) = lim Pr(√n·Dn ≤ x) = 1 − 2·Σ_{i≥1} (−1)^(i−1) e^(−2i^2 x^2) = (√(2π)/x)·Σ_{i≥1} e^(−(2i−1)^2 π^2 / (8x^2)),

the first representation given by Kolmogorov, the second coming from a standard relation for theta functions and better suited for small x. The moments come from easily-integrated terms of x·L′(x) and x^2·L′(x). The mean and variance of √n·Dn approach

µ = √(π/2)·ln(2) = .8687311605 · · · and σ^2 = π^2/12 − µ^2 = .0677732044 · · · , σ = .2603328723 · · · .

Since the mean and standard deviation of Dn are, roughly, .8687/√n and .26/√n, we may compare distributions and their approaches to limiting form by plotting Pr(Dn ≤ x/√n) − L(x) for, say, n = 64, 256, 1024, 4096, with x over an effective range for L(x), say .2 < x < 2.5 (−2.6 to 6.3 sigmas). Such plots are in Figure 1.
Approach to the limit is rather slow, with maximum error of about .278/√n near the 33rd percentile. The Anderson-Darling statistic An is the current favorite for Diehard, but new versions will include both An and Dn.
In practice (at least in our practice), we have a randomly produced Dn which we wish to convert to a uniform (0,1) variate (p-value) by means of the probability transformation p = K(n, Dn). The C procedure below lets us do this very accurately, as well as quickly, except for p's near 1 and n's of several thousand.
If K(n, Dn) is used in the Diehard tests, we might encounter some bad RNGs that return values up to 10 σ's from the mean, for which conversion to a p-value by means of K(n, Dn) might require minutes. For that reason, we include an optional line in the C program:

s=d*d*n; if(s>7.24||(s>3.76&&n>99)) return 1.-2.*exp(-(2.000071+.331/sqrt(n)+1.409/n)*s);

(As d√n exceeds about 1.94, K(n, d) will exceed .999 and is approximately 1 − 2e^(−2nd^2), which can be improved to 1 − 2e^(−(2.000071 + .331/√n + 1.409/n)nd^2), with maximum error less than .0000005.) Use of that line provides more than adequate accuracy for K(n, d) > .999 and n ≥ 100 (roughly d√n > 1.94), as well as protection from possible long computing time for any n when K(n, d) > .999999 (roughly, d√n > 2.69). That extra line can be commented out for users who need the full 13-15 digit accuracy at the extreme right (and are willing to contend with potentially long running times). The extreme left causes no problems.
In computing H^n, the required number of matrix multiplications is only log2(n) plus the number of 1's in the binary representation of n. A straightforward implementation encounters floating-point exponent overflow around n = 714. Detailed inspection shows that the elements of H^n grow quickly as n increases. Their magnitudes are not too diversified, though, with the largest values around the center of the matrix. To maintain floating-point exponents within their allowable range, we keep a special matrix exponent. When the (k, k) element of a current matrix becomes greater than 10^140, we divide every element by 10^140 and increase the matrix exponent by 140. The final matrix exponent is used to adjust the value of (n!/n^n)·t_kk, where T = H^n. The following C program contains the procedure K(n,d), as well as supporting procedures for multiplying and exponentiating matrices. It is in compact form to save space. To use K(n,d), you need only add a main program to a cut-and-paste version of the code listed below, then make calls to K(n,d) from an int main(){ }. You should also lead with the usual #include <stdio.h>, #include <math.h> and #include <stdlib.h>.