A Visual Basic Software for Computing Fisher’s Exact Probability

Fisher’s exact test (FET) is an important statistical method for testing association between two groups. However, the computations involved in FET are tedious and time-consuming because they require multi-step factorial calculations over a series of 2×2 tables, the number of which depends on the smallest cell value. A Visual Basic computer program, CalcFisher, has been developed to handle the complexities of FET by means of looping subroutines and logarithmic conversions. The software automatically calculates the P-value once the respective cell values are entered and has been validated with a wide range of frequencies (tens to several thousands). Important features of the program include easy data entry, tail selection, a comprehensive report format and the facility to print and save results.


INTRODUCTION
Fisher's exact test (FET) calculates the exact probability value for the relationship between two dichotomous variables (Campbell and Machin 1996; Siegel 1956; Armitage and Berry 1994; Kramer 1988). It is an extremely useful non-parametric method for analyzing statistical association between two independent sample groups and is commonly used for analyzing clinical and experimental data in biomedical research. The results of FET are expressed as an exact probability (P-value) between 0 and 1. The difference between two groups is considered statistically significant if the P-value is less than the chosen significance level, which is quite often 0.05. The data for FET are conveniently represented by a 2×2 table, made of 2 rows and 2 columns. The two rows represent two independent groups and the two columns represent the two effects or conditions.
Although the calculations required for FET are fairly straightforward, the construction of additional 2×2 tables and the computation of their respective probabilities using the factorial formula entail considerable time and effort, especially when the lowest cell value is high (Siegel 1956; Armitage and Berry 1994; Kramer 1988). The aim of this study was to develop a computer program to handle the complexities of the factorial computations involved in data analysis using FET.

Standard Method
The 2×2 contingency table provides a comprehensive view of the data to be analyzed, as shown in Table 1. There are four input parameters (frequencies) belonging to two different groups. The top row is one group and the bottom row is the other, whereas the '+' and '-' signs above the two columns indicate the presence or absence of a certain condition, respectively. The standard formula (Formula 1) for calculating the P-value (Campbell and Machin 1996; Siegel 1956; Armitage and Berry 1994; Kramer 1988) is shown in Table 1. If the smallest cell value in the contingency table is 0, only one exact probability has to be calculated, which is the simplest form of FET. However, if none of the cell frequencies is 0, more extreme deviations from the distribution could occur with the same marginal totals; all those possible deviations must therefore be considered and their respective probabilities summed to test the null hypothesis. For instance, if the smallest cell value is 2, then three exact probabilities (using smallest cell values 2, 1 and 0) must be determined and then summed to get the exact P-value.
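
As a worked illustration of this summation, the following Python sketch (not the authors' Visual Basic code; the function names are hypothetical) assumes that Formula 1 is the usual factorial expression, the product of the factorials of the four marginal totals divided by the product of the factorials of X and the four cell frequencies. It sums the probabilities of the observed table and of each more extreme table obtained by lowering the smallest cell to 0.

```python
from math import factorial


def table_probability(x1, x2, x3, x4):
    """Exact probability of a single 2x2 table (assumed factorial formula)."""
    t1, t2 = x1 + x2, x3 + x4          # row totals
    t3, t4 = x1 + x3, x2 + x4          # column totals
    total = t1 + t2                    # X, total number of subjects
    numerator = factorial(t1) * factorial(t2) * factorial(t3) * factorial(t4)
    denominator = (factorial(total) * factorial(x1) * factorial(x2)
                   * factorial(x3) * factorial(x4))
    return numerator / denominator


def one_tailed_p(x1, x2, x3, x4):
    """Sum the probabilities for smallest-cell values x_min, ..., 1, 0."""
    cells = [x1, x2, x3, x4]
    i = cells.index(min(cells))        # position of the smallest cell
    opposite = 3 - i                   # its diagonal partner (0<->3, 1<->2)
    p = 0.0
    while True:
        p += table_probability(*cells)
        if cells[i] == 0:
            break
        # Lower the smallest cell and its diagonal partner by 1 and raise
        # the other two cells by 1, so that all marginal totals stay fixed.
        for j in range(4):
            cells[j] += -1 if j in (i, opposite) else 1
    return p


# Smallest cell is 2, so three tables (2, 1 and 0) are summed.
print(one_tailed_p(2, 5, 6, 3))        # about 0.157
```

Python integers do not overflow, so this direct form works here for any table size; the range limit discussed in the next section applies to floating-point factorials.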

Modified Procedure
Our preliminary efforts while developing this Visual Basic application showed that Formula 1 could only be used for up to a total of 113 subjects (X = 113); beyond that, the output of the factorial computations exceeds the range of Visual Basic. The use of Stirling's approximation formula (Diem and Lentner 1975) was also avoided for the sake of exactness of the resulting P-values. Consequently, a modified procedure (Formula 2, Table 1) based on logarithmic conversions was used to perform FET for a wider range of frequencies.

Table 1. Construction of 2×2 contingency table and formulae for P-value calculation.

2×2 table format
             Present (+)      Absent (-)       Total
Group 1      x1               x2               t1 = x1 + x2
Group 2      x3               x4               t2 = x3 + x4
Total        t3 = x1 + x3     t4 = x2 + x4     X

x1, x2, x3 and x4 are the four frequencies, t1 and t2 are the row totals, t3 and t4 are the column totals and X is the total number of subjects. xmin is the smallest cell value in the 2×2 table.
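
The two formulae that Table 1 refers to are not reproduced in the text as it stands. Based on the verbal description given in this section, they can plausibly be written as follows. This is a reconstruction rather than the authors' typeset version: the base of the logarithm is not stated in the text, and the superscript (i) is notation introduced here to mark the table in which the smallest cell equals i.

```latex
% Formula 1: exact probability of a single 2x2 table
P = \frac{t_1!\; t_2!\; t_3!\; t_4!}{X!\; x_1!\; x_2!\; x_3!\; x_4!}

% Formula 2: log-based form, summed over the tables in which the
% smallest cell takes the values 0, 1, ..., x_min (margins held fixed)
P = \sum_{i=0}^{x_{\min}} \operatorname{antilog}\!\left[
      \sum_{j=1}^{4} \log t_j!
      \;-\; \log X!
      \;-\; \sum_{j=1}^{4} \log x_j^{(i)}!
    \right]
```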
According to this procedure, the program computes the probability of the original table as the antilogarithm of the value obtained by subtracting the logarithm of the factorial of the total number of subjects (X) and the sum of the logarithms of the factorials of the individual frequencies (x1, x2, x3 and x4) from the sum of the logarithms of the factorials of the row and column totals (t1, t2, t3 and t4). The program then identifies the minimum frequency in the 2×2 table, subtracts 1 from it and adjusts the remaining frequencies so that the row totals (t1 and t2) and column totals (t3 and t4) remain constant. The probability of the resulting table is computed in the same way. This cycle of subtracting 1 from the current minimum frequency, adjusting the remaining three frequencies and computing the probability is repeated until the minimum frequency reaches 0. The probabilities obtained with minimum frequencies 0, 1, 2, ..., xmin are then summed to give the exact P-value (Formula 2, Table 1).
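
The loop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the CalcFisher source: the function names are hypothetical, natural logarithms are used, and `math.lgamma(n + 1)` stands in for the logarithm of n! (the program itself may instead sum the logarithms of 1 to n).

```python
from math import exp, lgamma


def log_factorial(n):
    """Natural log of n!, via the identity ln(n!) = lgamma(n + 1)."""
    return lgamma(n + 1)


def log_table_probability(cells):
    """Log of the exact probability of one 2x2 table [x1, x2, x3, x4]."""
    x1, x2, x3, x4 = cells
    totals = (x1 + x2, x3 + x4, x1 + x3, x2 + x4)   # t1, t2, t3, t4
    grand_total = x1 + x2 + x3 + x4                  # X
    return (sum(log_factorial(t) for t in totals)
            - log_factorial(grand_total)
            - sum(log_factorial(x) for x in cells))


def log_based_p(x1, x2, x3, x4):
    """One-tailed P-value summed over smallest-cell values x_min down to 0."""
    cells = [x1, x2, x3, x4]
    i = cells.index(min(cells))      # position of the smallest cell
    opposite = 3 - i                 # its diagonal partner (0<->3, 1<->2)
    p = 0.0
    while True:
        p += exp(log_table_probability(cells))   # antilogarithm
        if cells[i] == 0:
            break
        # Subtract 1 from the smallest cell and adjust the other three
        # cells so that the row and column totals remain constant.
        for j in range(4):
            cells[j] += -1 if j in (i, opposite) else 1
    return p


print(log_based_p(2, 5, 6, 3))               # about 0.157
print(log_based_p(1200, 1800, 1500, 1500))   # thousands of subjects, no overflow
```

Only logarithms are ever added or subtracted, so no intermediate value approaches the floating-point limit even when the frequencies run into the thousands.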

Hardware and System Specification
The hardware used was a Pentium III computer with a 20 GB hard disk and 64 MB RAM. The program was developed in Visual Basic 6.0 and runs as an executable file (48 KB) in any Windows environment, even if Visual Basic is not installed.

Design
The program comprises two windows (forms), which appear together on the screen when the program is run. The design and special features of both forms are given below:

Form 1 (data input)
Form 1, whose configuration is shown in Fig. 1, is the main window for data input and computations. Its controls include 4 text boxes for data entry, 2 option buttons for tail selection, 4 command buttons for different applications and 2 labels, one displaying the P-value and the other serving as a counter.

Form 2 (report)
The report window comprises one rich-text box and two command buttons, 'Print' and 'Save', which are controlled by a common-dialog application. The report shows the test number, the P-value, the input frequencies of the two groups and the tail type.

PROGRAM VALIDATION
The software was validated for proper functioning and accuracy using representative frequencies, and the results were compared with those of standard statistical programs, SPSS and EPI-INFO (Table 2). The results confirmed the efficiency of the CalcFisher program for computing Fisher's exact P-values for a wide range of frequencies.

DISCUSSION
Earlier studies from our center (Khan 2003) and by other investigators (Todd and Wang 1996; Martin et al 1997; Runciman et al 1998) have demonstrated the potential utility of Visual Basic software for various biomedical applications. Visual Basic programs are user-friendly because of their object-oriented features and the familiarity of most users with the Microsoft Windows environment. Unfortunately, the use of Visual Basic for large factorial computations is greatly hampered by its upper numeric limit (1.79E308), which lies just above the factorial of 170 (7.26E306); the factorial of 171 exceeds the range of Visual Basic. This problem was resolved by a logarithmic approach, which avoids very large exponents by converting numeric values into their respective log values. Moreover, the log-based factorial formula involves fewer operators, which simplifies the Visual Basic code and has a direct impact on the software's efficiency.
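
The range limit can be illustrated with a short Python sketch, since Python floats are IEEE double-precision values with the same upper limit. The function names are hypothetical, and one behavior differs from Visual Basic: an out-of-range product here becomes `inf` instead of raising an overflow error.

```python
import math


def float_factorial(n):
    """n! accumulated as a double-precision product."""
    result = 1.0
    for k in range(2, n + 1):
        result *= k
    return result


def log10_factorial(n):
    """log10(n!) as a sum of logarithms; stays small for any practical n."""
    return sum(math.log10(k) for k in range(2, n + 1))


print(float_factorial(170))    # about 7.26e306, still within range
print(float_factorial(171))    # inf: the product no longer fits in a double
print(log10_factorial(171))    # about 309.09, an ordinary number
```

Working with the sum of logarithms keeps every intermediate value to a few hundred or a few thousand, which is why the log-based formula has no practical ceiling on the number of subjects.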
The results of software validation clearly demonstrated the ability of the CalcFisher program to accurately compute Fisher's exact P-values for a wide range of frequencies (Table 2). Commonly used statistical packages, including SPSS and EPI-INFO, can also compute FET, but only under certain conditions. The former calculates Fisher's exact P-value only when the total number of subjects is twenty or less (SPSS for Windows 1999), whereas the latter performs FET when any of the expected values is less than five (EPI-INFO). CalcFisher, on the other hand, computes P-values irrespective of cell frequencies and can therefore be used to apply FET to any data set. Both SPSS and EPI-INFO basically compute the χ² statistic for the 2×2 table when the cell values are high. In fact, the χ² test is an approximation to FET and, when applied with an appropriate continuity correction, gives a fair approximation to the exact probability (Campbell and Machin 1996). However, the estimate of probability in the χ² test may not be very accurate if the marginal totals are very uneven or if one of the values is very small (Cochran 1954). FET, in contrast, is a valid procedure for any number of frequencies and can easily be performed using CalcFisher.
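
For reference, the continuity-corrected χ² approximation mentioned above can be sketched in Python as follows. Yates' correction is assumed here as one common form of continuity correction; the source does not state which correction SPSS or EPI-INFO applies, and the function name is hypothetical. For one degree of freedom, the upper-tail χ² probability equals erfc(sqrt(χ²/2)), which avoids any dependence on a statistics library.

```python
import math


def yates_chi_square(x1, x2, x3, x4):
    """Yates-corrected chi-square statistic and its two-sided P-value (1 df)."""
    t1, t2 = x1 + x2, x3 + x4          # row totals
    t3, t4 = x1 + x3, x2 + x4          # column totals
    total = t1 + t2                    # X, total number of subjects
    # The correction subtracts X/2 from |x1*x4 - x2*x3|, floored at zero.
    deviation = max(abs(x1 * x4 - x2 * x3) - total / 2.0, 0.0)
    chi2 = total * deviation ** 2 / (t1 * t2 * t3 * t4)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p = yates_chi_square(3, 1, 1, 3)
print(chi2, p)    # chi-square = 0.5, P about 0.48
```

This P-value is two-sided, so it should be compared with a two-tailed exact probability and not with a one-tailed sum.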
In conclusion, the complexity of factorial computations can be greatly reduced by using a logarithmic methodology. Log-based computations are highly suitable for developing Visual Basic applications, as they involve fewer operations and keep the output of intermediate steps within the permissible range of Visual Basic. The operational simplicity and integrated report format of CalcFisher make it a handy tool for performing Fisher's exact test.