Calculation of critical values for Somerville's FDR procedures

A Fortran 95 program has been written to calculate critical values for the step-up and step-down FDR procedures developed by Somerville (2004). The program allows for arbitrary selection of number of hypotheses, FDR rate, one- or two-sided hypotheses, common correlation coeﬃcient of the test statistics and degrees of freedom. An MCV (minimum critical value) may be speciﬁed, or the program will calculate a speciﬁed number of critical values or steps in an FDR procedure. The program can also be used to eﬃciently ascertain an upper bound to the number of hypotheses which the procedure will reject, given either the values of the test statistics, or their p values. Limiting the number of steps in an FDR procedure can be used to control the number or proportion of false discoveries (Somerville and Hemmelmann 2007). Using the program to calculate the largest critical values makes possible eﬃcient use of the FDR procedures for very large numbers of hypotheses.


Introduction
Recent developments in diverse fields, including genomics, have resulted in very large data sets where it may be desired to simultaneously test many thousands of null hypotheses. Traditional procedure to control the probability of rejecting true hypotheses in multiple hypotheses testing has been to control the family-wise error rate (FWER). For a large number of hypotheses, however, extremely low power for testing single hypotheses may result.
Recently, procedures have been proposed which control the false discovery rate (FDR). The false discovery rate is defined to be the expected value of the proportion of rejected hypotheses which are actually true, with the false discovery rate defined to be zero when no hypotheses are rejected. Using an FDR procedure, the hypotheses H 1 , H 2 , . . . , H m , are simultaneously tested using the corresponding test statistics t 1 , t 2 , . . . , t m . Let T (1) ≤ T (2) ≤ . . . ≤ T (m) be the ordered test statistic values, and denote by H (i) the hypotheses corresponding to T (i) . Then critical constants d 1 ≤ d 2 ≤ . . . ≤ d m , are used to compare T (i) with d i .
For a step-down FDR procedure, for i = m, m − 1, . . ., compare T (i) with d i until for the first time T (i) < d i , rejecting in turn all the hypotheses for which T (i) ≥ d i . For a step-up FDR procedure, for i = 1, 2, . . . , compare T (i) with d i until for the first time T (i) ≥ d i , rejecting all the hypotheses, except those for which the comparisons already made have shown T (i) ≤ d i .
The critical values d i are chosen such that the expected value of the FDR is always less than some value q (or α). FDR procedures may also be defined in terms of critical p values. Benjamini and Hochberg (1995) proposed an FDR step-up procedure valid for independent test statistics. Benjamini and Lui (1999) presented a step-down FDR procedure valid under the same conditions. Sarkar (2002) showed that both procedures were valid under positive dependency. Troendle (2000) developed both step-up and step-down FDR procedures which asymptotically control the FDR when the test statistics have a multivariate t distribution. , assuming a multivariate t distribution and common correlation of the test statistics used least favorable configurations to develop step-down and step-up FDR procedures. His step-down procedure yielded the same critical values as those of Troendle. It should be stated that, although extensive calculations strongly suggested the validity of the assumed location of the least favorable locations, a rigorous proof was not realized. Also, instead of setting negative critical values equal to zero, Somerville used a minimum critical value (MCV). The step-down FDR procedures of Troendle and Somerville have been shown by Horn and Dunnett (2004) and Somerville (2003) to have superior power, under the assumption that the test statistics have a multivariate t distribution.
While FDR procedures control the expected value of the proportion of true hypotheses which are rejected, it is possible that a very large number of true hypotheses could be rejected. To control the number of true hypotheses, which are falsely rejected, van der Laan, Dudoit, and Pollard (2004) proposed the generalized family-wise error rate gFWER(k). It is defined as the probability of at least (k +1) Type 1 errors (k = 0 for the usual FWER), i.e. P[U ≤ k] ≥ 1−α where U is the number of true hypotheses which are rejected. Augmentation procedures proposed by van der Laan et al. (2004) and Dudoit, van der Laan, and Birkner (2004) can then be used to control PFP (proportion of false positives), where P[PFP ≤ γ] ≥ 1−α for some pre-specified γ and α . More recently Lehmann and Romano (2005) suggested new methods of controlling gFWER(k) (called by them k-FWER) and PFP (called by them FDP -False Discovery Proportion). Both single-step and step-down procedures were derived which control the k-FWER. The procedures make no assumptions concerning the dependence structure or the p values of the individual tests. The step-down procedure is simple to apply, and cannot be improved without violation of control of the k-FWER. Lehmann and Romano (2005) also proposed two methods for controlling the PFP. The first holds under "mild conditions on the dependence structure of p values" while the second requires no dependence assumptions.  noticed that by limiting the number of steps in his procedures, the FDR could be reduced with a small subsequent loss of power. (An s step procedure is equivalent to an m step procedure where the m − s smallest critical values are replaced with the value d m−s+1 , also called MCV. Thus, only the largest s critical values are used in the comparisons). Somerville and Hemmelmann (2007), by limiting the maximum number of steps in stepdown and step-up procedures, developed new procedures to control k-FWER. The procedures require only an arbitrary set of critical constants d 1 ≤ d 2 ≤ . . . ≤ d m , and need not be related to any step-down or step-up procedure. However, by starting with a set of critical values which satisfy an FDR requirement, the procedures can be used to simultaneously control FDR and k-FWER requirements. Using data from the literature, the method of Somerville and Hemmelmann (2007) was compared with those of Lehmann and Romano (2005), and under the assumption of multivariate normality and a common correlation of the test statistics, considerable improvement in the reduction of false positives, in control of PFP, and in power, were accomplished while still controlling the FDR. The Fortran 95 program seq.f95 can be used in four different modes to calculate the critical values for the procedure of . In mode 1, the program calculates the critical values for step-down or step-up FDR procedures for one-or two-sided hypotheses, arbitrary degrees of freedom, arbitrary FDR levels and arbitrary common correlation coefficients. Using mode 3, the program calculates the largest critical values (number of critical values is specified by the user). Using mode 4, the Fortran 95 program, given the test statistic values, efficiently calculates an upper bound to the number of hypotheses which will be rejected. Given p values for the test statistics, the program, using mode 5, converts the p values to test statistic values and also obtains an upper bound to the number of rejected hypotheses. This paper has a number of objectives: 1. Describe the ways in which the Fortran program seq can be used in hypotheses testing, when the number of hypotheses is large.
2. Describe how the program obtains the constants used in Somerville's FDR procedures.

Present instructions and examples
, which make it relatively easy for a user, using the Fortran program seq to solve many hypotheses testing problems involving a large number of hypotheses.

Calculation of critical values
The procedure for calculating the critical values is described in . Let H 1 , H 2 , . . . , H m be m hypotheses to be simultaneously tested using the test statistics T 1 , T 2 , . . . , T m . We require that the false discovery rate (FDR), be less than a pre-specified value q (or α). If Q is the proportion of rejected hypotheses which are true, then FDR = E(Q). We wish to calculate the m critical values d 1 ≤ d 2 ≤ . . . ≤ d m , such that E(Q) ≤ q. We outline here the procedure for the step-down case.
Let A i be the probability that exactly i hypotheses are rejected. Then m) . To obtain the critical values, use is made of m least favorable configurations of the location parameters of the test statistics. Define LFC i as the configuration where i of the location parameters are zero and the remainder are infinite.
(The case where all m hypotheses are true corresponds to LFC m ). To obtain d 1 , use LFC 1 . Then, with probability 1, T (m) , . . . , T (2) are infinite, and A 0 , . . . , A m−2 are zero, and To maximize power considerations, we choose the smallest value d i which satisfies the equation. To obtain d i , (1 < i ≤ m), given the values of d 1 , . . . , d i−1 , use LFC i . With probability 1, T (m) , T (m−1) , . . . , T (i+1) are infinite and A 0 , . . . , A m−i−1 are zero. Under LFC i , when exactly r(≥ m − i) hypotheses are rejected, the proportion of true hypotheses rejected equals (r − (m − i)/r) with probability 1. We may then write, using the basic expectation algorithm, , and the corresponding FDR value F 3 calculated. If F 1 is closer to q than F 2 , then the values x 2 and F 2 are replaced with the values x 3 and F 3 , and likewise the values x 3 and F 3 replace the values x 1 and F 1 if F 2 is closer. Again linear interpolation using the new pair, x 1 and x 2 is used to obtain a new x 3 and the iteration continues until the process is defined to converge. The program attempts to obtain an estimate for d i with an FDR within .0001 of q.

Subroutine getFDR
The subroutine getFDR (getFDRup for the step-up case) uses Monte Carlo to get estimates for A m−i+1 to A m (step-down case) given the previously obtained values d 1 , . . . , d i−1 and an estimate for d i . The estimate of A j is the proportion of the n vectors, whose elements are the randomly obtained values for the m test statistics, for which the FDR procedure rejects exactly j hypotheses under LFC i . The FDR is then obtained as Subroutine bound(X,L) The subroutine bound(X,L) replaces boundXU with x i and boundFL with FDR i if boundFL < F i < q, and replaces boundXL with x i , and boundFU with F DR i if q < F i < boundFU.

Mode 3
In mode 3, the program calculates the pre-specified "ic""unique" values (number of steps s).
The values d 1 , . . . , d m−ic+1 are equal and trial values x 1 and x 2 are chosen to calculate the corresponding FDR values F 1 and F 2 . As in mode 1, linear interpolation is used to obtain x 3 and iteration is employed to obtain the common value of the m−ic+1 smallest critical values. seq.f95 uses the procedures of mode 1 to calculate the remaining ic − 1 critical values.

Mode 4
Calculating all of the critical values for Somerville's FDR procedure would present an almost insurmountable task when the number of hypotheses to be simultaneously tested is very large. The object of mode 4 is, given the values of all of the test statistics (or their p values), is to estimate an upper bound to the number of hypotheses that the procedure would reject.
Having this upper bound, a user could use an s-step version of the procedure. (The value of s would be equal to the upper bound). This would require the calculation of only s critical values, which should then present a manageable task. The program is designed so that using mode 4 estimates both an upper bound to the number of test statistics to be rejected, and also calculates the needed s critical values for a corresponding s-step procedure.
Starting with i = 1, the program assumes that MCV equals the i th largest test statistic and estimates the probability that at least i test statistics will each be greater than or equal to MCV. The program finds the largest value of i (ub) such that the probability is less than or equal to q. Mode 3 can be used to calculate the largest s (= ub) critical values. The values of nCV and nCVend in line 2 are not used in the program. For a quick estimate, the value of n need not be large, with n = 1000 usually adequate.

Mode 5
seq.f95 converts the p values to test statistic values and mode 4 is used.

Errors in the estimates
The iterative procedure is complicated by the fact that the FDR calculations are Monte Carlo based. To reduce the effect of such errors, the same random number seeds are used for each FDR calculation. For fixed critical values, the error in the calculated value of FDR can be approximated by (FDR(1 − FDR)/n). Numerous calculation to estimate the partial Values of n greater than 10,000 are recommended when calculating critical values. Figure 1 illustrates the effect on the accuracy of the critical values as n increases. Values used are step-down FDR, 1-sided hypotheses with m = 1000, ρ = 0, q = .05, and four values for n, namely 10 3 , 10 4 , 10 5 and 10 6 , and using exactly 31 "unique" critical values.

Examples
The program was used in mode 4 to obtain an upper bound to the number of rejected hypotheses for the data of Hedenfalk et al. (2001). Using n = 10, 000 or 100, 000, q = .05, ρ = 0, df = ∞, 2-sided hypotheses, seq.f95 has given an upper bound to the number of hypotheses to be rejected as 153.
An example for each of the four modes is given. The input file crit.in is given for each, as are the two output files crit.out and seq.out.

Summary and conclusions
Using the program seq.f95 to obtain a specified limited number of critical values for the FDR procedure is simple and efficient and the results can be used to control the number and proportion of false positives. The FDR procedure requires that the test statistics have a multivariate t distribution with a common correlation. Limited studies suggest that the procedure has robust qualities with regard to the assumption of the multivariate t distribution, and is conservative when correlations are underestimated. It is not difficult to show that d 1 and d m , the smallest and largest critical values are decreasing functions of ρ, the common correlation coefficient.

A. User manual
A.1. Capabilities of the program Some of the capabilities are: • The program can be used to calculate all or a specified number of the largest critical values needed to simultaneously test m null hypotheses. The hypotheses may all be one-sided or all two-sided.
• If all of the values of the test statistics (or all of the p values) are available, the program can be used to estimate an upper bound to the number of hypotheses which will be rejected, and a corresponding MCV will be given. The program can then be used to calculate that number of critical values or the user can specify the MCV. This is an especially useful capability of the program, and eliminates the practically insurmountable task of calculating all of the m critical values when m is very large.
• A minimum critical value (MCV) can be specified. That is, the user decides that a hypothesis will never be rejected if the test statistic value is less than MCV. The program will then calculate all of the necessary critical values.
• Enables user to calculate critical values for an implementation of the procedure of Somerville and Hemmelmann (2007) (controlling the number of false positives, given a set of critical values).
In all cases, the user will need to specify all of the following parameters: m the number of hypotheses, df the common number of degrees of freedom for all of the test statistics (or p values), q the false discovery rate (also often denoted by α), upordown for step-down, a negative integer, for step-up, a positive integer or zero, nbrsided 1 for one-sided hypotheses, 2 for two-sided hypotheses, rho common correlation between the test statistics.
In addition, the user will need to specify values for iseed, n, nCV and nCVend: iseed a positive integer for the random number generator, e.g., 757.
n the number of random m-dimensional vector to be generated. The size of n determines the accuracy of the critical values generated. A value less than 10,000 is almost never recommended. A value of 1,000,000 will usually be sufficient to produce near 3 decimal accuracy.
nCV, nCVend d nCV is the first value and d nCV is the last value to be calculated. It is assumed that the values d 1 to d nCV −1 have already been calculated. Note: if the mode is not 1, arbitrary values for nCV and nCVend may be input.
Three files must exist prior to the running of the program. crit.in consists of three lines and contains the necessary input for the program. The two files crit.out and seq.out will contain the output of the program at the completion of running the program seq.f95.

Mode 1
In this mode, the program calculates all of the critical values from d nCV to d nCV end . It is assumed that the values from d 1 to d nCV −1 are known. If all of the critical values are to be calculated, set nCV = 1, and set nCVend = m.
This mode can also be used when an MCV is given. Set nCV = 2 and nCVend = m, and in line 3, make both the first and second values equal to MCV. This could be very inefficient if m is large unless a small value of n can be used. If one is able to make a close underestimate of the number critical values which will be equal to MCV (say ub), then setting nCV equal to ub and nCVend equal to m, and inserting ub consecutive values of MCV in line 3 of crit.in will be much more efficient. A method to make an estimate of ub might be to make several runs using guessed values for ub, using a smaller value for n (say 1000). See also the note in Mode 3.

Mode 3
This mode is used when only the largest critical values need to be calculated. (See modes 4 and 5.) When using mode 3, the values for nCV and nCVend are ignored. If ub is the number of critical values to be calculated, line 3 should contain only the value ub. Note: The smallest critical value for an s-step procedure is also the MCV for the procedure.

Mode 4
If the set of m test statistic values are given, the program efficiently calculates an upper bound (say ub) to the number of hypotheses that will be rejected, and a corresponding MCV. Mode 3 can then be used to calculate only the largest ub critical values. The remaining critical values can then be set equal to the smallest of the calculated critical values, constituting the set of critical values needed for an s-step FDR procedure.

Mode 5
Given the p values corresponding to the m hypotheses, the p values are converted to test statistic values, and the methodology used in mode 4 is used to determine an upper bound to the number of hypotheses which will be rejected.

Contents of the input file crit.in
crit.in contains three lines. The first line contains the digit 1, 3, 4 or 5, corresponding to the desired mode. The second line contains ten elements: n, iseed, m, nCV, nCVend, df, q, nbrsided, upordown, rho. Note that unless the mode is 1, arbitrary integers may be used for nCV and nCVend.

A.4. Calculating the s largest critical values
For m = 1000, I would like to calculate the 10 largest critical value (use a 10-step procedure). I wish to use 2 sided FDR, assume a common correlation of 0.0. Each test statistic has 20 df. Arbitrary values may be given for nCV and nCVend. The following may be used as input for crit.in.

A.5. Specifying a minimum critical value
For any set of parameters, m, rho, upordown, df, nbrsided, q there is a relationship between an MCV and the number of "unique" critical values (number of steps in the corresponding s-step procedure). Knowing the approximate relationship may assist in determining whether to use a "unique" number of critical values or choose an MCV. A quick way to explore this is to choose several numbers of "unique" critical values and find the corresponding MCVs. If m = 5000, df = 30, q = .05, rho = 0, and 1-sided step-down, choosing 20 "unique" values (a small value for n will be sufficient here, and values for nCV and nCVend are arbitrary), crit.in could be the following: Note that after the program calculates the first critical value, the following statement occurs: MCV = 3.887 for exactly 20 "unique" critical values. Abort the program (using CTRL BREAK) and replace 20 in line 3 with another number, and repeat the process. Using