Ratios of Normal Variables

This article extends and ampliﬁes on results from a paper of over forty years ago. It provides software for evaluating the density and distribution functions of the ratio z/w for any two jointly normal variates z, w , and provides details on methods for transforming a general ratio z/w into a standard form, ( a + x ) / ( b + y ) , with x and y independent standard normal and a, b non-negative constants. It discusses handling general ratios when, in theory, none of the moments exist yet practical considerations suggest there should be approximations whose adequacy can be veriﬁed by means of the included software. These approximations show that many of the ratios of normal variates encountered in practice can themselves be taken as normally distributed. A practical rule is developed: If a < 2 . 256 and 4 < b then the ratio ( a + x ) / ( b + y ) is itself approximately normally distributed with mean µ = a/ (1 . 01 b − . 2713) and variance σ 2 = ( a 2 + 1) / ( b 2 + . 108 b − 3 . 795) − µ 2 .


Introduction
In the early 60's, I encountered an application in medicine in which estimates of the life span of red cells, (normally around 110-120 days), depended on the intercept of a line whose coefficients were taken to be normally distributed, and hence the distribution of the intercept was that of a ratio of normal variables.
I undertook study of such ratios and later wrote a paper for Journal of the American Statistical Association (Marsaglia 1965) in which I remarked that an arbitrary ratio of jointly normal numerator and denominator could be linearly transformed to a ratio (a+x)/(b+y) with x and y independent standard normal and a and b non-negative constants, gave a compact expression for the density of (a+x)/(b+y) , and reported results of numerical studies showing that the density was unimodal for certain choices of the pair (a, b), and bimodal for the others.
The numerical work was difficult in those early days of computers-a FORTRAN program had to be punched, one statement per card, then taken, with two ten-inch stacks of cards for the two-pass FORTRAN compiler, to an early IBM computer half a mile away, then, a few days later, the stacks fetched with line printer results that you hoped had no errors requiring re-submission. Then numerical results from those studies led to lists of coordinates for points on densities for (a+x)/(b+y) that were sent to an Air Force plotter the size of a billiard table.
Over the intervening years, I have had many requests for further detail on how a general ratio can be transformed to the particular form (a + x)/(b + y) , as well as requests for practical handling of distributions for ratios that seem to be well-behaved, in spite of theory indicating that no moments exist. Those further details for transforming a general ratio of jointly normal variates z/w into the ratio (a+x)/(b+y) will be developed here, as well as software for evaluating the wide variety of densities and distribution functions that arise, plus simple approximations for moments of ratios when the denominator is conditioned by never being zero, and examples where the resulting ratios may themselves be taken as approximately normally distributed.

Transforming to the standard form (a+x)/(b+y)
Proposition: For any two jointly normal variates z and w with means µ z , µ w , variances σ 2 z , σ 2 w and correlation ρ, the distribution of z/w is, after translation and change of scale, the same as that of (a + x)/(b + y) with x, y independent standard normal and a, b non-negative constants. Specifically, for a given ratio z/w, there are constants r and s such that Thus, for suitable choices of constants r and s, r(z/w) − rs becomes (a + x)/(b + y).
, so that z − sw and w will be independent. Thus (z − sw)/w is the ratio of two independent normal variates. To put the ratio (z − sw)/w = z/w − s into the required form, we need only divide numerator and denominator by their respective sigmas, that is, multiply by r, the inverse ratio of the two standard deviations. Now (z − sw) has variance σ 2 z (1 − ρ 2 ), and the denominator, w, has variance σ 2 w . Thus But what are a and b? With h = ±σ z 1 − ρ 2 , the mean of (z − sw)/h is (µ z − sµ w )/h, and that of w/σ w is µ w /σ w . Since (−a + x)/(−b + y) has the same distribution as (a + x)/(b + y), we need only choose the sign of h so that the resulting a and b have the same sign: Example: Suppose we have jointly normal variates z and w with correlation ρ = .8, means and standard deviations µ z = 30.5, µ w = 32, σ z = 5, σ w = 4. We will develop the r and s that make r(z/w) − rs distributed as (a+x)/(b+y) , then find that the resulting distribution is among a class of ratios of normals that are themselves approximately normal (a < 2.5, b > 4, see Section 4). Using, from that section, approximations for moments that do not exist, we will find the mean and variance for a normal distribution that is quite close to the distribution of z/w, then verify that closeness by plotting the true and approximate densities.
For this ratio z/w, we have h = ±σ z 1 − ρ 2 = ±3, so that r = 4/h and s = 1. Then b = µ w σ w = 32 4 = 8, a = ±(µ z − sµ w )/h = ± 30.5 − 32 3 = ±.5, so we choose "−" for h (and r) to make a = .5 have the same sign as b = 8, and thus Since (.5 + x)/(8 + y) falls in the range a < 2.5, 4 < b for which the ratio of normals is itself approximately normal (Section 4), this z/w should be nearly normal. The approximating-or what might be called the practical-mean and variance of (.5 + x)/(8 + y) are, (via Sec. 4), so that z/w is itself approximately normal(.952,.0959), (mean= 1 − (3/4).064 = .952, sigma= (3/4).1279 = .0959). Figure 1 compares the normal density with mean .952, sigma .0959, with the true density of z/w, based on transforming the density of (.5 + x)/(8 + y) obtained by the methods in Section 3. Figure 1 also plots the density obtained in Section 5 by simply assuming that w is never negative. In this case, the b in the standard form for z/w is large enough, 8, that the easy and the true densities, both displayed here, are virtually indistinguishable.
The fit of the normal distribution is good enough that 10,000 variates produced as Φ((z/w − .952)/.0959) will likely pass a test for uniformity, but that little gap near the peaks is not likely to get past samples of millions produced as Φ((z/w − .952)/.0959).

Distribution of the standard form (a+x)/(b+y)
As developed in Marsaglia (1965), the density of (a+x)/(b+y) can be written which has 1/π/(1 + t 2 ), the Cauchy density for x/y, as a factor. The density f (t) may be evaluated by a simple procedure similar to that for Phi(x) in Marsaglia (2004), but direct evaluation of the distribution function, say However, values can be easily obtained by numerical integration of the density function, and small C procedures for both the density and the distribution are in the attached files for this article (available at http://www.jstatsoft.org/v16/i04/) where we make numerical integration practical by avoiding the need to integrate over tails. With Φ(z), the standard normal integral available as Phi(z) in the attached files, As we shall see, pictured in Figure 4 below, the density f of (a+x)/(b+y) is unimodal if the point (a, b) is in a tall, narrow region of the positive (a, b) quadrant, else f is bimodal. The reason for this can perhaps be explained by observing that the density f (t) is a mixture of two densities, one of them the Cauchy density, the other a density that must be bimodal. We have The Cauchy component, f 1 (t), is fixed, but the f 2 component changes with a and b, as do the proportions p and 1 − p. The f 2 part is always bimodal, with one mode to the left, and one mode to the right, of the point where q = 0, that is t = −b/a. If p = e − 1 2 (a 2 +b 2 ) is large enough in the mixture f (t) = pf 1 (t) + (1 − p)f 2 (t), then the contribution from the Cauchy part of that mixture may dominate, resulting in a mixture with a single mode.  gives examples where all three curves, the unimodal pf 1 (t), the bimodal (1 − p)f 2 (t) and their sum, p 1 f 1 (t) + (1 − p)f 2 (t), are plotted together. It indicates that sometimes the bimodal component falls just short of being able to make the total curve bimodal, as with(1.5+x)/(.35+y), or succeeds, as with (1.5+x)/(.5+y). The top curve is p 1 f 1 +(1−p)f 2 , the sum of the two lower curves, and is unimodal for the left plot, bimodal for the right.
As described above, the density of (a+x)/(b+y) is a mixture, f = pf 1 + (1 − p)f 2 , with f 1 the Cauchy density and f 2 a bimodal density with one mode to the left of t = −b/a, the other mode to the right. For a ≤ 1 the density of (a+x)/(b+y) is unimodal. There is a curve, starting at (a, b) = (1, 0) then quickly rising to become asymptotic to a = 2.256058904 · · ·, such that points (a, b) to the left of the curve lead to a unimodal density for (a+x)/(b+y) , and points (a, b) to the right yield a bimodal density. Points (a, b) on that curve may be defined implicitly by the condition that f (t) = f (t) = 0, but numerical studies seem necessary to find them. Results here were obtained using Maple with 50 digits of accuracy, then a fifth degree polynomial fitted to points (a, (2.256058904 − a)b).
However, one should keep in mind that the left modes of many of the bimodal densities may be ignored in practical applications, as they are likely to occupy only a tiny fraction of the total area under the full density function-typically 10 −6 to 10 −10 or less.
For many general ratios z/w, the density after shifts and changes of scale from that of the standard form (a+x)/(b+y) may be theoretically bimodal, but in practice that second mode may be so remotely located and small as to be insignificant. Two examples are in Figure 5, where the left mode is greatly magnified to show its shape, and moved to get into the picture. Both densities are bimodal, but the left modes are infinitesimal. For (3 + x)/(15 + y), the left mode height is 4.e-53 , shown from -50 to -5, magnified. The left mode of (20 + x)/(20 + y) has height 6.e-96 and is shown from -700 to -1, magnified.
Although the derivative of f (t) may be evaluated by a series method similar to the one developed for f , it seems difficult, except by successive evaluations, to find places where f (t) = 0, that is, modal points for f (t). But extensive numerical work seems to support the above unimodal-bimodal classification. Specific values (a, b) on that dividing curve have only been determined numerically. For example, the density of (2.25605 + x)/(500 + y) is unimodal, yet that of (2.25606 + x)/(500 + y) is bimodal, with a left mode near t = −376 having height f (−376) = .588 × 10 −54293 . (Ignoring the constant factor e −.5(a 2 +b 2 ) in the density of (a + x)/(b + y) makes it possible to find local extrema in the neighborhood of t = −376 without exceeding the floating point limits on some platforms.) That asymptotic value a = 2.256058904 · · · is supported by evaluations in Maple, where setting b = 10000, then 100000, then 1000000 all show f (t) = f (t) = 0 for a = 2.256058904 to ten places, in the neighborhood of, respectively, t = −7520, −75200, −752000.

Means and variances when the moments do not exist
As with the simpler Cauchy variate x/y, none of the moments of T = (a + x)/(b + y) exist, in the sense that integrals ∞ −∞ t i f (t) dt for i = 1, 2. . . . are infinite. Yet practical applications, in which the denominator b + y is not expected to approach zero, suggest that there may be some value in assuming that the denominator b + y is a normal variate conditioned by, say, −4 < y. In that case, all the moments of (a + x)/(b + y) exist, and it turns out that the mean and variance of (a + x)/(b + y), while difficult to find numerically, have remarkably simple yet accurate approximation formulas, as well as the surprising result that, given the means and variances determined by those formulas, (and a not too big, say a < 2.25), the ratio (a + x)/(b + y) seems to be itself close to normally distributed with mean and variance given by approximating formulas for E[1/(b + y)] and E[1/(b + y) 2 ].
If the denominator of (a + x)/(b + y) is conditioned by the assumption that −4 < y, then Here are the formulas: Under the assumption that the denominator b + y is a normal variate conditioned by −4 < y, the mean and variance of (a + x)/(b + y) are approximately The plots in Figure  The fit seems quite good. Furthermore, now that we have a mean and variance for the conditioned ratio (a+x)/(b+y) , it seems reasonable to see if the ratio is itself nearly normaland it often is, at least when a < 2.5 and 4 < b. Figure 7 shows examples of the true density of (a+x)/(b+y) compared to the normal density that results from conditioning the denominator and using the means and variances from the above approximating formulas.
We saw that for the z/w example of Section 2, where the closeness of z/w to normal was confirmed by generating ten thousand z/w's, then converting to uniform variates by means of Φ((z/w−.959)/.0959), with mean and sigma via the above formulas. Figure 7 compares the approximating normal density with the true density for that case, (.5 + x)/(8 + y), as well as for two other (a, b) cases where the normal ratio is itself nearly normal, (2, 8) and (1.5, 5). When b is large enough that we can ignore the possibility of b + y < 0, we may write and thus the density of (a+x)/(b+y) should be very close to When a > 2.256 · · · the true densities will still be bimodal, but the left mode will be insignificant, and, as Figure 8 shows, that right mode is virtually indistinguishable from φ((bt − a)/ √ 1 + t 2 )(b + at)/(1 + t 2 ) 3/2 when both are plotted.
Both the true right modes and the approximating densities are shown, but their closeness exceeds the resolution of the postcript plot that produced them. This provides some reassurance that, for many practical situations, the densities of ratios of normals z/w may be considered to have the familiar kinds of shapes often seen in Statistics-normal, t, chi, etc.

Is transforming to (a+x)/(b+y) necessary?
All the fuss may be unnecessary if one is willing to assume the denominator of z/w is always positive. With that assumption, Pr(z/w < t) = Pr(z < tw). If c = cov(z, w) = ρσ z σ w , then approximating distributions G and densities g are Thus, in the example of Section 2, with z/w having µ z = 30.5, µ w = 32, σ z = 5, σ w = 4 and ρ = .8, plotting the above g(t) together with the normal density with µ = .952 and σ = .0959 yielded two of the three curves in Figure 1. The indistinguishable pair: g and the true density.
The suitability of G serving as the distribution of z/w can be assessed by simulation: generate one million z/w's using the routine in the attachments, then see if G(z/w) is satisfactorily uniform in [0,1). Dividing the unit interval into 0-.01,.01-.02,.02-.03, etc., and letting k i be the number of times that G(w/z) falls in the ith interval, the chisquare value, (k i − 10 4 ) 2 /10 4 should average around 99 with sigma=14.07. The first run gave 96.756, a second gave 104.31.

Software attachments
Programs in C are provided in the attached files. They include procedures for evaluating F (t) = Pr((a + x)/(b + y) < t), as well as the density, f (t) = F (t). They also include the Phi function from Marsaglia (2004), needed to avoid integrating tail areas of f (t), as well as a procedure for generating a pair of jointly normal variates z, w, given µ z , µ w , σ z , σ w and correlation ρ. The latter is included for two reasons: firstly, to encourage use of large quantities of random variables in order to investigate properties of their distributions or confirm/refute the adequacy of various approximations, and secondly, to include my polar method for generating pairs of normal variates, often attributed to Box and Muller (1958).
In the early days of the cold war I was consulted by an Operations Research group seeking methods for choosing normally distributed radar sites for simulated bombing attacks. The method I developed required an average of 2(4/π) = 2.546 uniforms: choose random points x, y uniformly from the square −1 < x < 1, −1 < y < 1 until s = x 2 + y 2 < 1. Then s will itself be uniform in [0, 1), and is independent of the point (x/ √ s, y/ √ s), which is uniform on the circumference x 2 + y 2 = 1.
Thus the point −2 ln(s)(x/ √ s, y/ √ s) has the standard normal distribution in the plane; simplification follows by forming r = −2 ln(s)/s and returning rx and ry.
To get zero means, unit variances and correlation ρ, use x and (x 1 − ρ 2 + yρ), as in the zoverw() procedure of the attachments. Other parts of the attachments include procedures for f and F , the density and distribution of (a+ x)/(b + y) , for g and G, the approximate density and distribution for z/w when w is assumed > 0. With them, comparisons can be made after getting the true density of z/w by means of the true density of (a+x)/(b+y) , and approximate moments can be used to see if a normal distribution for z/w is appropriate.
Because the parameters of z/w : µ z , µ w , σ z , σ w , ρ, as well as the r, s, a, b used to exress z/w as a linear function of the standard form (a + x)/(b + y): z/w = s + [(a + x)/(b + y)]/r, may be needed in various routines, they are declared in a single initial statement, static double muz,muw,sigz,sigw,rho,a,b,r,s; and are thus available to all the routines as needed.
The main routine merely makes successive calls such as: zwviaxy (30.5,32,5,4,0.8); to provide details of the conversion of z/w to (a+x)/(b+y), given the parameters µ z , µ w , σ z , σ w , ρ for the jointly distributed z and w, which provides the distribution of z/w as well as tests for approximations: G when b > 4 and normality for z/w itself when b > 4 and a < 2.5.

Notes added in press
A referee has pointed out a recent article, Nadarajah (2006), that considers a more general ratio X/Y for "elliptically symmetric" variates, and a Google Scholar search will provide references for articles by Nadarajah on ratios of gamma, beta, t, bessel and other important kinds of distributions for X and Y .
That referee also commented on a point that, to avoid embarrassment, I had not included in my original submission. It involves an article, Hinkley (1969), that claims it is wrong to think that z/w is equivalent to a linear transformation of (a + x)/(b + y), because the former involves five parameters, µ z , µ w , σ z , σ w , ρ, while the latter involves only four: r, s, a, b.
It is nonsense to make conclusions based on the number of parameters involved when dealing with a ratio. Surely (3z)/(3w) has the same distribution as z/w.
The present article spells out details of the transformation of z/w to (a + x)/(b + y), which I had left to the reader in the original article, apparently assuming too much. Perhaps the present article, together with attached C files to facilitate numerical evaluation of distributions and densites-both exact and approximate-will help to clear up uncertainties that have led to comments in articles, queries and misinterpretations over the intervening forty years.