A Fortran 90 Program for Evaluation of Multivariate Normal and Multivariate t Integrals Over Convex Regions

Let X' = (X1,X2, ... ,Xk) have the multivariate normal distribution f(X) = MVN(μ, ∑σ2) where ∑ is a known positive definite matrix, and σ2 is a constant. There are many problems in statistics which require the evaluation of f(x) over some convex region A. That is P = ∫A f(X) dX. If σ2 is known, then without loss of generality, set μ = 0, σ =1 and let ∑ be the correlation matrix. For the case where the region A is rectangular, the problem has been addressed by many authors. They include Gupta (1963), Milton (1972), Schervish (1984), Deak (1986), Wang and Kennedy (1990,1992), Olson and Weissfeld (1991), Drezner (1992) and Genz (1992,1993). However, regions of integration for many statistical applications, for example multiple comparisons, are not rectangular.


Introduction
Let X' = (X 1 ,X 2 , ... ,X k ) have the multivariate normal distribution f(X) = MVN(µ µ µ µ, Σσ 2 ) where Σ is a known positive definite matrix, and σ 2 is a constant. There are many problems in statistics which require the evaluation of f(x) over some convex region A. That is If σ 2 is known, then without loss of generality, set µ µ µ µ = 0 , σ =1 and let Σ be the correlation matrix. For the case where the region A is rectangular, the problem has been addressed by many authors. They include Gupta (1963), Milton (1972), Schervish (1984), Deak (1986), Kennedy (1990,1992), Olson and Weissfeld (1991), Drezner (1992) and Genz (1992Genz ( ,1993. However, regions of integration for many statistical applications, for example multiple comparisons, are not rectangular. If σ 2 is estimated by s 2 with ν degrees of freedom such that ν s 2 / σ 2 is a chi-square variate, then without loss of generality we may assume f(X) has the central multivariate-t distribution with correlation matrix Σ. For the case where the correlations have the product form ρ ij = λ i λ j (i ≠ j ) , Dunnett (1989) developed an algorithm to evaluate integrals of the multivariate t over rectangular regions.
The Fortran 90 program MVI3.FOR (which combines and extends two previous programs, MVI and MVIB) can be used to evaluate multivariate normal or multivariate-t integrals over any region which is bounded by hyperplanes (and thus convex). Three decimal accuracies may be obtained in seconds for 20 dimensional integrals using a 486 processor.
The original programs MVI and MVIB could be used only if the convex region contained the origin. The present program MVI3 has no such limitation. However, if the region does not contain the origin, MVI3 uses "crude Monte Carlo" for the evaluation. In addition, the user of the program may elect to do the evaluation using "crude Monte Carlo" even when the convex region contains the origin.

Methodology
The following is an overview of the methodology involved.
With no loss of generality, we can assume that the mean is at the origin of the coordinate system. A coordinate transformation (Cholesky) is made so that in the new coordinate system we have k uncorrelated normal or spherically symmetric t random variables with unit variances. The region will be bounded by the transformed hyperplanes.
We discuss first the case where the origin is inside the convex region. Consider a randomly selected point in the k dimensional space. The square of the distance to the point is the sum of the squares of the individual coordinates of the point. Thus if σ 2 is known, it is distributed as χ 2 with k degrees of freedom. If σ 2 is not known, the distance squared divided by k has the F distribution with k degrees of freedom for the numerator and degrees of freedom for the denominator the same as those for the estimate of σ 2 .
Select at random a direction in the k dimensional space and obtain the distance from the origin to the boundary. An unbiased estimate of the value of the integral is the probability of obtaining a distance less than or equal to the distance to the boundary.
Using the program (MVI), a prespecified number of random directions, say mocar, are obtained. The average of the corresponding probabilities estimates the value of the integral. To obtain a standard error, repeat the process of obtaining an estimate using mocar random directions a specified number of times (say irep). The final integral estimate is the average of the irep estimates, and the standard error of this estimate is obtained from the standard error of the irep individual estimates.
The program (MVIB) uses a somewhat more efficient procedure (called binning). Instead of calculating a probability corresponding to each individual random direction, the distances are "binned" and an empirical frequency step function of the distances is obtained. Gauss Legendre quadrature is then used to obtain a single estimate of the value of the integral for the mocar random directions. The boundaries of the bins are chosen so as to optimize the quadrature.
The binning procedure has been shown to be especially useful, Somerville (1997), in the calculation of percentage points for a statistic, using an iterative procedure. Using the "binning" procedure the random directions need only be generated for the first iteration. The empirical distribution function generated by the first iteration is used for subsequent iterations and the time for subsequent iterations is negligible.
The functions of the two programs MVI and MVIB have been combined in the program MVI3. In addition, the program has been extended to include the case where the origin is not in the region. For this case, "crude Monte Carlo" is used. Random points in the k space are selected and a determination is made as to whether the point is in or out of the region. To increase the efficiency of the Monte Carlo procedure, the boundaries are sorted and ordered so that the "signed distances" from the origin to the boundaries are non-decreasing. The efficiency of the Monte Carlo procedure is greatest for small integral values since the variance of the estimate decreases as the value ot the integral decreases toward zero The user may elect to use "crude Monte Carlo", regardless of whether or not the region contains the origin.
A comprehensive description of MVI and MVIB is given in Somerville and Wang (1994) and Somerville (1998). The latter paper also contains a comparison of the efficiencies of the methods. A summary is given in the APPENDIX.

User instructions and examples for MVI3
The program requires that two files exist before the program is run. QCALC.in contains the input requirements and QCALC.out is used for the program output. QCALC.in can contain the instructions for several integral evaluations, in which case the evaluations are done sequentially, with QCALC.out containing the results for each of the evaluations.
The instructions for a given evaluation consists of three parts for a total of k + mm +1 lines in QCALC.in.
Part 1 One line consisting of 6 (integer) items (in order): k, iseed, ndenom, mocar, irep, mm. k dimension of the integral iseed integer in range 1 to 2 31 -1 for a 32 bit machine ndenom degrees of freedom for variance estimate (use -1 if variance known) mocar # of random directions used for each individual estimate irep # of individual estimates mm # of hyperplanes boundaries for the convex region Part 2 k lines one line is used for each of the k rows of the lower triangular portion of either the variance covariance or the correlation matrix.
Part 3 mm rows of k + 1 constants one row is used for each hyperplane. Each row contains the coefficients of the expression l' x < d, where l and x are vectors.
The output (in QCALC.out) for the above example would be: date and time 19981008102335.520 elapsed time in seconds is 1.000 no of pops is 3 seed is 457 df for var est is 30 no of ran dir. is 1000 value of integral is calculated 10 times mean value is 5.003167E-01 standard error of mean is 5.807164E-04 --------------------

Additional Examples
Two possible replacements for the first line might be : 3 457 30 10000 1 5 or 3 457 30 100 100 5. The first replacement would use 10000 x 1 = 10000 random directions to produce a single estimate of the 3 dimensional integral. There would be no estimate of the accuracy of the calculation for the integral. Using the second replacement, we would use 100 x 100 = 10000 random direction for the estimate of the value of the integral. In practice, using the same seed, the estimate of the value of the integral would be nearly identical for all 3 cases. The program input (in QCALC.in) for the two replacement inputs follows:  Somerville (1998). Defining efficiency to mean the ratio of running times to achieve the same standard error of the estimate, MVIB was found to be more efficient than MVI by ratios of 1.26 to 2.25, depending on the dimension of the integral and the number of boundaries. Except for small probabilities, and possibly some "extreme" boundaries, "crude Monte Carlo" is not efficient.
A large number of runs were made to determine the proper number of bins for the binning procedure (MVIB). Except for cases of "extreme" regions, nq = 25 was found to be more than a sufficient number. By "extreme" regions we mean regions departing drastically from a "spheroid with center at the origin", and especially those with a boundary close to the origin. The proper number of bins required to obtain accuracies consistent with the obtained standard error of the estimate was found to be unrelated to the degrees of freedom for the estimate of the variance, but highly dependent on the minimum distance r* of the boundary from the origin. The following empirical formula was developed relating the value of the number of bins to the minimum distance: nq* = -14.9 -39.4*log10(r*) + 21.8*(log10(r*))**2. Using the "binning" procedure, the program uses nq = 25 whenever nq* < 25. This correspond to a distance of .20 (approximately). For nq* > 25, the program uses nq = nq* +10, unless the distance r* is less than .01, in which case the binning procedure is abandoned and the calculation is done using the MVI methodology.
A comment is in order where one of the boundaries goes through the origin (e.g. l'x < 0 ). The program automatically makes the evaluation using Monte Carlo whenever a boundary passes through the origin. An alternate (and recommended) solution would be to change the boundary from l'x < 0 to l'x < .000,000,1. Since the integral of the standard normal variable from 0 to .000,000,1 is 4 x 10 -8 , the additional error in the estimate of the integral value can increase by no more than that amount.
To generate random normal deviates, MVI3 uses an efficient generator recently developed by Marsaglia (1998).
A question which arises is how many random directions should be chosen. Using 10,000 random directions should always result in an estimate whose standard error is less than .005 and may be sufficient for many applications. The standard errors for integral values near 0, and especially for integral values near 1 are much smaller. Three decimal accuracies are obtained in seconds for 20 dimensional integrals on a 486 processor.

APPENDIX
Let X' = (X 1 ,X 2 , ... ,X k ) have the multivariate normal distribution f(X) = MVN(µ µ µ µ, Σ σ 2 ) where Σ is known and σ 2 is a constant. We may assume without loss of generality that X is MVN(0, Σ σ 2 ). We wish to evaluate where A is the convex region, containing the origin, and bounded by mm ( > 1) hyperplanes described by Lx < d where L' = (l 1 , l 2 , ... ,l k ). The j th hyperplane is given by l j ' x = d j .
Let Σ =T T' (Cholesky decomposition) and set X = T W. The random variables W 1 , W 2 , ... ,W k are independent standard normal or spherically symmetric t variables. With the transformation, the region becomes Gw < d where G = L T. Setting G' = (g 1 , g 2 , ... .g k ), the j th hyperplane becomes g j ' w = d j .
Consider a randomly selected point in the k dimensional space. The square of the distance to the point is the sum of the squares of the individual cordinates (random N(0,1) variables) of the point on the boundary. Thus if σ 2 is known, it is distributed as χ 2 with k degrees of freedom. If σ 2 is not known, the distance squared divided by k has the F distribution with k degrees of freedom for the numerator and degrees of freedom for the denominator the same as those for the estimate of σ 2 .
Select at random a direction in the k dimensional space and obtain the distance from the origin to the boundary. An unbiased estimate of the value of the integral is the probability of obtaining a distance less than or equal to the distance to the boundary. Then, for a random direction, if σ 2 is unknown, an unbiased estimate of the integral P is Prob [ F < r 2 /k]. If σ 2 is known, the unbiased estimate is Prob [ χ 2 < r 2 ]. To implement the procedure, successive random directions are chosen and corresponding estimates obtained. The value of the integral is the arithmetic mean of the individual estimates.

Binning procedure (MVIB)
Let r* be the minimum distance from the origin to the boundary of A. Divide A into two regions, the portion inside the hypersphere of radius r* and centered at the origin, and the region outside (say A 2 ). For σ 2 unknown, the probability content of the hypersphere is P 1 = Prob [ F < r* 2 /k]. For σ 2 known the probability is P 1 = Prob [ χ 2 < r* 2 ].
As before let r be the distance from the origin to a randomly selected point in the k-space. The relative frequency function of r, h(r) may be readily derived from the chi-square or F distributions depending on whether σ 2 is known or must be estimated. Let R be the distance from the origin to the boundary of the region in a random direction. Let G(R) be the cumulative distribution function of R. Then the probability content of the region is P 2 = ∫ [1 -G(r)] h(r) dr where the integration is from r* to infinity. The cumulative distribution function G(R) can be estimated using Monte Carlo.
It will be convenient to use reciprocal distance instead of distance. Putting v = 1/r, we obtain P 2 = ∫ E(v) g(v) dv where the integration is from 0 to 1/r*. The relative frequency function g(v) may be derived from h(r). E(v) = 1 -G(r) is the cumulative distribution function of the reciprocal distance from the origin to the boundary of the region.
Let a i be the i th abscissa and wt i be the corresponding weight for the q-point Gauss-Legendre quadrature. For each random direction c, the corresponding distance d is calculated and by means of a binning process, estimates are obtained for E(a i ), i = 1 to q. The estimate of P 2 is given by P 2 = Σ E(a i ) g(a i ) wt i where the summation is from i = 1 to q.