Closing the Gap between Methodologists and End-Users: R as a Computational Back-End

The R environment provides a natural platform for developing new statistical methods due to the mathematical expressiveness of the language, the large number of existing libraries, and the active developer community. One drawback to R, however, is the learning curve; programming is a deterrent to non-technical users, who typically prefer graphical user interfaces (GUIs) to command-line environments. Thus, while statisticians develop new methods in R, practitioners are often behind in terms of the statistical techniques they use, as they rely on GUI applications. Meta-analysis is an instructive example: cutting-edge meta-analysis methods are often ignored by the overwhelming majority of practitioners, in part because they have no easy way of applying them. This paper proposes a strategy to close the gap between the statistical state of the science and what is applied in practice. We present open-source meta-analysis software that uses R as the underlying statistical engine and Python for the GUI. We present a framework that allows methodologists to implement new methods in R that are then automatically integrated into the GUI for use by end-users, so long as the programmer conforms to our interface. Such an approach provides an intuitive interface for non-technical users while leveraging the latest statistical methods implemented by methodologists.


Introduction
In statistically-driven sciences there is often a disparity between cutting-edge methodologies and the techniques used by practitioners in their daily work. This disparity is due in part to a dearth of technical know-how. Methodologists (usually statisticians) tend to implement advanced methods in statistical programming languages such as Stata and SAS, and perhaps most commonly now, in R (R Development Core Team 2012). Non-technical researchers, meanwhile, are often deterred by programming and rely on available graphical user interface (GUI) driven tools. Such specialized software is generally easy to use, but usually lags with respect to the statistical methods available. The result is that practitioners continue to use inappropriate or suboptimal methods because they are restricted to what is made available via GUIs. The problem is compounded by the fact that statistical programmers generally do not implement custom GUIs for new methods they develop.
The example of meta-analysis, described at length in the following section, motivates our work. Most meta-analysts are non-technical clinical researchers without experience in R or any other statistical programming language. They therefore tend to rely on available (often commercial) GUI software, e.g., Comprehensive Meta-Analysis (CMA, Borenstein, Rothstein, and Cohen 2005). Such programs are user-friendly, but it is difficult for the maintainers to keep up-to-date with the latest methodological developments. For example, despite their user-friendly interfaces, no existing GUI-driven specialized meta-analysis software that we are aware of accommodates multivariate datasets with multiple correlated outcomes, or longitudinal data on multiple treatments (in a network of treatments). Furthermore, most analytic choices are presented to the user as defaults with few, if any, options for more advanced analyses. Finally, graphs are often of low quality and the software architecture cannot easily accommodate new graphing capabilities.
Indeed, we encountered the above difficulties firsthand when implementing and maintaining our previous version of meta-analysis software, Meta-Analyst (Wallace, Schmid, Lau, and Trikalinos 2009). The program was implemented in C#, atop the .NET platform. Although the .NET platform made it easy to develop a user-friendly interface (spreadsheet-driven, etc.), it practically restricted us to the Windows platform. More importantly, however, the language is not ideal for statistical programming; we found ourselves porting existing code from R to C#, an error-prone and inefficient process.
In this work, we propose a novel strategy for making advanced methods available to non-technical end-users. We apply this strategy to meta-analysis as an example, but we believe it to be relatively general. We use a software architecture where an 'agnostic' front-end GUI (in our case written in Python) interfaces with a calculation and graphics engine (in R) in the back-end, using relatively simple standards. The GUI is agnostic in that it can render inputs to and results from arbitrary routines implemented in the calculations engine, so long as they conform to the predefined standards. The architecture lends itself to a 'two-tiered' distribution of statistical methods; advanced/technical users may opt to use the underlying R package(s) directly, ignoring the GUI, while non-technical users may only use the GUI, unaware even that R is the back-end.

Figure 1: Growth of the meta-analysis literature in the biomedical sciences. Based on a MEDLINE search (using the keywords 'meta-analysis[Publication Type]' and 'meta-analysis as topic[MeSH Terms]'), we obtained estimates of the crude number (left panel) and the proportion (per 100,000 publications, right panel) of published papers related to meta-analysis over the last 3 decades. The left panel represents the growth in the absolute number of meta-analysis-related publications, whereas the right panel demonstrates the growth of the meta-analysis field normalized to the overall growth of the literature.

Meta-analysis has a long tradition in many scientific disciplines, including the natural sciences, such as physics, astronomy (Airy 1875) and ecology (Osenberg, Sarnelle, and Goldberg 1999), as well as medicine (Simpson and Pearson 1904; Fisher 1925), epidemiology (Stroup et al. 2000), psychology (Glass 1976), and education (Kulik and Kulik 1989).
As the number of primary studies in all scientific fields has increased, so too has the need to synthesize this evidence using meta-analytic methods. For example, Figure 1 shows the exponential growth of meta-analyses and related publications in the medical literature over the last 3 decades. At the same time, methodological advances have made it possible to synthesize complex data, including multivariate data on multiple correlated outcomes, data assessed over successive time points (longitudinal data), and data on multiple treatments (networks of treatments).

Existing software for meta-analysis
The tasks involved in evidence synthesis are complex, and in order to accommodate this complexity, meta-analysis software should ideally:
Allow data entry for complex datasets, including multiple outcomes reported in each study, simultaneous comparisons of multiple interventions across studies (networks of contrasts), and reporting of results over multiple timepoints (longitudinal datasets).
Facilitate back-calculation of sufficient statistics for meta-analysis based on the data reported by each study. For example, the software could back-calculate a variance (a statistic that is often necessary for meta-analysis) from the corresponding p-value, confidence interval, or other information. Automating such calculations can reduce the reviewers' work burden and minimize data extraction errors (Gotzsche, Hrobjartsson, Maric, and Tendal 2007).
Include most commonly used meta-analysis methods and, importantly, allow the addition of new statistical techniques as they become available.
Implement Bayesian meta-analysis methods through Markov chain Monte Carlo simulations.
Generate publication quality graphs for data presentation and exploration and allow the rendering of new graph types based on the particular requirements of novel statistical methods.
Be open-source to ensure availability in resource-limited settings and to allow community involvement in development and testing.
Combine the above listed technical requirements with ease of use, ideally through a GUI.
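To make the back-calculation requirement above concrete, a standard error can be recovered from a reported confidence interval under a normal approximation. The sketch below is our own illustrative Python fragment (the function name is invented; OpenMeta-Analyst performs such computations in R):

```python
from statistics import NormalDist

def se_from_ci(lower, upper, conf_level=0.95):
    """Back-calculate a standard error from a reported confidence interval,
    assuming a symmetric, normal-approximation interval."""
    # Two-sided z quantile; approximately 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return (upper - lower) / (2 * z)

# The variance needed for meta-analysis is then simply se_from_ci(...) ** 2.
```

For instance, a reported 95% interval of (-0.98, 0.98) implies a standard error of roughly 0.5.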
Currently, meta-analysis methods are implemented in many statistical packages, including Stata, and in GUI-driven programs such as RevMan (Clarke and Oxman 2000). In addition, there is dedicated software for specialized types of meta-analysis, such as meta-analysis of diagnostic test data; examples are Meta-Test (Lau 1997) and Meta-Disc (Zamora, Abraira, Muriel, Khan, and Coomarasamy 2006). We have provided a more exhaustive overview elsewhere (Wallace et al. 2009).
The methodologies that are implemented in software end up being widely used, because they are readily available. A striking example is the almost universal use of the DerSimonian-Laird method for random-effects univariate meta-analysis (DerSimonian and Laird 1986), the default or only method in the aforementioned programs, despite good evidence that alternative random-effects methods have better statistical properties (Turner, Omar, Yang, Goldstein, and Thompson 2000; Viechtbauer 2005).
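For concreteness, the DerSimonian-Laird procedure can be sketched in a few lines. This is our illustrative Python transcription of the published method-of-moments estimator, not code from any of the packages mentioned:

```python
def dersimonian_laird(y, v):
    """DerSimonian-Laird random-effects meta-analysis: a method-of-moments
    estimate of the between-study variance tau^2, then inverse-variance
    pooling with the inflated weights. y: per-study estimates, v: variances."""
    k = len(y)
    w = [1.0 / vi for vi in v]                                 # fixed-effect weights
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw           # fixed-effect pooled mean
    Q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))     # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (Q - (k - 1)) / c)                         # truncated at zero
    wr = [1.0 / (vi + tau2) for vi in v]                       # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(wr, y)) / sum(wr)
    se = (1.0 / sum(wr)) ** 0.5
    return pooled, se, tau2
```

With perfectly homogeneous studies the estimator returns tau^2 = 0 and collapses to the fixed-effect result; with heterogeneous estimates it inflates the study variances accordingly.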
In the following sections we describe the proposed software architecture by way of an applied example, namely our new meta-analysis program, which we call OpenMeta-Analyst.

Why R and Python?
We will briefly explain our reasons for electing to use the R and Python languages for OpenMeta-Analyst. R has become the lingua franca amongst methodologists and statisticians. We argue that Python is in many ways ideal for implementing a GUI: it is cross-platform, has a large open-source community, and the rpy library (Moreira and Warnes 2004) facilitates easy communication with R. Moreover, there are many mature, rich GUI toolkits available in Python. We feel that general-purpose programming languages provide a better framework for writing GUIs than specialized statistical languages such as R.
There are several major benefits to using R as a statistical engine. First, doing so allows one to exploit the wealth of methods already written in R. By the same token, it allows us to incorporate newly written methods with relatively little effort (i.e., we need only 'wrap' them so that they conform to our interface, as explicated in Section 4.2). Second, R is both cross-platform and open-source, desirable traits in any scientific software.
Our approach is similar in spirit to the R Commander package (Rcmdr), which provides an "easy-to-use, cross-platform, extensible GUI" with enough functionality to span a basic statistics course (Fox 2005). Like Rcmdr, we hope to provide an intuitive GUI that exploits the power of the R environment. However, rather than creating a general statistics program, we aimed to develop an intuitive, easy-to-use program for conducting meta-analyses that leverages the existing R code for this task. Moreover, we wanted this architecture to be extensible, so that new statistical routines can be developed by methodologists and plugged into the GUI with minimal effort.
For the GUI, we elected to use Python rather than a GUI toolkit native to R, such as the tcltk package (as the author of Rcmdr did). We did so for a few reasons. First, we were able to use the PyQT library (River Bank Computing 2012), a mature GUI toolkit written in Python. We feel that PyQT provides a richer set of GUI widgets than does tcltk. Second, Python is a cross-platform, open-source, object-oriented programming language, which we feel is more conducive to writing GUI code than is R. In any case, we note that had we opted to use tcltk, our approach to communicating between the GUI and the underlying methods would remain as outlined in the following section.
The GUI we built with PyQT is shown in Figure 2. Because PyQT is a modern, mature graphics package, we were able to include such features as a full-fledged spreadsheet component, unlimited undo/redo functionality, and other modern interface features with relative ease. Moreover, QT widgets emulate the native look of the target platforms (e.g., a button will appear as a standard Windows button on Windows, and a standard OS X button on OS X), which is aesthetically pleasing.

Communication between statistical methods and the GUI
We will now describe the interface between the Python and R elements that comprise OpenMeta-Analyst. The key idea is that by exploiting assumptions about naming conventions and argument passing, methods written in R can be plugged in seamlessly to the GUI, meaning that they will be available to end-users via the GUI without the R coder having to write any GUI code. We will first outline our general framework for accomplishing this, and then walk through a concrete example.

The general framework
The main tenet of the design of OpenMeta-Analyst is that the user interface must be entirely divorced from the underlying statistical routines. In particular, the Python interface is to know only that it is calling an abstract type of method (e.g., a meta-analysis method), and not the concrete method that it is actually invoking. We accomplish this through the dual strategy of assuming certain naming standards and exploiting R's introspective capabilities (i.e., the ability to dynamically examine the current run-time environment).

Figure 3: Python requests a list of available methods for a particular data type from R. The parameter names and types (p_1, p_2, ..., p_k) required by these methods are also requested. Knowing the type of each parameter allows an interface to be generated on the fly.
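The dispatch-by-naming-convention idea can be illustrated with a small, purely hypothetical Python mock. The real lookup happens inside R (via rpy) over the openmetar namespace; the registry and function names below are invented for illustration:

```python
# A mock 'namespace' standing in for the function names exported by an
# R package such as openmetar (entries here are hypothetical).
R_NAMESPACE = {
    "binary.random": "<meta-analysis function>",
    "binary.fixed": "<meta-analysis function>",
    "binary.random.parameters": "<parameter-spec function>",
    "continuous.random": "<meta-analysis function>",
}

def available_methods(data_type):
    """List meta-analysis methods for a data type purely by naming
    convention: names starting with the data-type prefix, excluding the
    companion .parameters specification routines."""
    prefix = data_type + "."
    return sorted(
        name for name in R_NAMESPACE
        if name.startswith(prefix) and not name.endswith(".parameters")
    )
```

Given the mock namespace above, asking for the "binary" data type yields only the two analysis routines, not the parameter-specification helper.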
The meta-analysis methods themselves are contained in our R package openmetar. This is a stand-alone package that in the future may be useful on its own for users interested in bypassing the GUI. At present, the package serves as an adapter that 'wraps' many of the methods found in the metafor package (Viechtbauer 2010) so that they are consistent with our interface, which we explain in detail in the following subsection. In the (near) future, we plan to expand our package to include routines for network meta-analysis and other advanced meta-analytic methods.

Standardized API
Figure 4: The openmetar package is a liaison between the UI and the underlying methods, which themselves may live in arbitrary (existing or newly developed) R packages. openmetar provides a coherent interface (i.e., an API) for communication between the 'front'- and 'back'-ends; new R methods can be 'plugged in' to the program by wrapping them in the openmetar package, as demonstrated in Section 4.2.
In meta-analysis, there are a few basic data outcome types that can be pooled quantitatively. These include: binary outcomes, in which one of two possibilities is realized (e.g., dead/alive); continuous outcomes, which are arbitrary scalars capturing some measured quality (e.g., pain, as expressed on some scale); and diagnostic outcomes, which assess the performance of a diagnostic test (e.g., the sensitivity and specificity of a test). These data types are associated with different types of 'raw data' from which outcomes are calculated. Characterizing the space of data types over which analyses can be performed allows methodologists to write new methods for particular types of data, making assumptions about the parameters these methods will receive.
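As a rough Python analogue of these data types (the field names mirror the BinaryData S4 slots used later in the paper; the dataclass itself is our illustrative sketch, not part of the software):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BinaryData:
    """Per-study sufficient statistics for a binary-outcome meta-analysis."""
    y: List[float]            # point estimates (e.g., log odds ratios)
    SE: List[float]           # corresponding standard errors
    study_names: List[str]    # labels used, e.g., in forest plots
```

A method written against this type can assume these fields exist, which is exactly the contract the framework relies on.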
The interface side uses introspection to retrieve a list of available functions in the openmetar package for the current data type. It then assumes that method names beginning with, e.g., 'binary' are meta-analysis methods for binary data. We plan to augment this for more advanced data types (e.g., network data) in the future. The interface also assumes that for each of these methods there is a function that returns a list of the parameter names and types the method requires. These are then used to generate a GUI for the user to input the required values. The GUI elicits these parameters from the user and passes them back to the method in a dictionary with names corresponding to those given by the .parameters method. Figure 3 depicts the described interaction.
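The way a parameter specification drives interface generation can be sketched as follows. This is a hypothetical Python fragment; the widget names are illustrative stand-ins for the PyQT widgets actually drawn:

```python
def widget_for(type_spec):
    """Choose an input widget from a parameter's type specification.
    A list of values denotes an enumerated (categorical) parameter,
    rendered as a combo box; named scalar types get simple editors."""
    if isinstance(type_spec, (list, tuple)):
        return "combo box"
    return {"float": "spin box", "string": "line edit"}.get(type_spec, "line edit")

def build_form(spec):
    """Return (parameter, widget, default) triples in the order the
    method author requested via 'var_order'."""
    return [
        (name, widget_for(spec["parameters"][name]), spec["defaults"].get(name))
        for name in spec["var_order"]
    ]
```

Feeding this the specification returned by a .parameters routine (cf. Figure 7) would yield a combo box for the estimation method and numeric editors for the confidence level and digits.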

An example
We will now walk through an example of how a methodologist can implement a new meta-analysis method in R; so long as it conforms to our interface, this method will automatically be available to end-users via the (Python) GUI.
In particular, we require that each meta-analysis method that operates on binary data takes as arguments a BinaryData S4 object and a named list of parameters (similarly, continuous meta-analysis methods are assumed to accept a ContinuousData object and a parameters list). By distilling the elemental arguments sufficient to run meta-analyses, the GUI can remain agnostic with respect to what method it is invoking, as it needs only to know that it must pass these two arguments. The parameters, of course, will be specific to the meta-analysis method being implemented; we next discuss how we accommodate this requirement by way of an example.

 0  binary.random <- function(binary.data, params) {
 1    if (!("BinaryData" %in% class(binary.data))) stop("Binary data expected.")
 2    res <- rma.uni(yi = binary.data@y, sei = binary.data@SE,
                     slab = binary.data@study.names, method = params$rm.method,
                     level = params$conf.level, digits = params$digits)
 3    degf <- res$k.yi - 1
 4    model.title <- paste("Binary Random-Effects Model (k = ", res$k, ")", sep = "")
 5    summary.disp <- create.summary.disp(res, params, degf, model.title)
 6    forest.path <- paste(params$fp_outpath, sep = "")
 7    plot.data <- create.plot.data.binary(binary.data, params, res)
 8    forest.plot(plot.data, outpath = forest.path)
 9    images <- c("Forest Plot" = forest.path)
10    plot.names <- c("forest plot" = "forest_plot")
11    results <- list("images" = images, "Summary" = summary.disp, "plot_names" = plot.names)
12    results
13  }

Figure 5: Code for our implementation of the binary random-effects meta-analysis. The main point is that we are able to wrap existing code so as to conform to our interface; the method is then automatically made available via the UI. We provide detailed exposition below. Line 1 asserts that the data passed in is of the right type. Line 2 is the call out to the random-effects routine as implemented by the metafor package (Viechtbauer 2010). We parse out the parameters required by this method in order to pass them along to the method we are wrapping. The first three of these parameters are assumed to belong to every BinaryData object; the latter three are passed in via the params object because the associated .parameters method (below) requested them, and so the GUI solicited the user for appropriate values. Lines 3-5 assemble some output for the user; in particular, Line 5 invokes a method to construct a pretty-printed summary table. This will be passed back to the GUI, which will display it to the user. Lines 6-8 call out to a routine that constructs an image associated with this type of analysis; in particular, this is a 'forest plot', a popular graphic in meta-analysis. The forest.plot method generates this figure and writes it out to disk. We then return a named list mapping the name of the plot to the path of the generated image file. This image will subsequently be displayed by the GUI (in Python, we will read the image from disk). In this case there is only one image; however, we could generate (and return) an arbitrary number of figures in the same way. Finally, in Lines 9-11, we package the results in a dictionary (technically, a named list) to be returned and consumed by the GUI. In particular, there are two fields that must be returned: the aforementioned dictionary of images (mapping titles to image paths) and a list of texts (mapping titles to pretty-printed text). The GUI will take care of displaying these to the user.
Let us suppose that we are interested in implementing the random-effects models we previously discussed. In fact, these methods have already been implemented in the metafor package (Viechtbauer 2010), and so our implementation of random-effects meta-analysis for binary data (Figure 5) simply wraps the pre-existing routine contained in the metafor package. This example illustrates how using R as an analytic back-end allows us to exploit the wealth of existing statistical packages, thereby freeing us from having to reinvent the proverbial wheel. Moreover, contrary to most existing (GUI-based) meta-analysis software, the metafor implementation of random-effects meta-analysis includes many alternative estimators of the between-study variance, beyond the popular DerSimonian-Laird estimator (see Figure 8).
The code itself is relatively straightforward (the reader is encouraged to read through the commented code). We parse out the elements required by the rma.uni function, namely the individual effect estimates (binary.data@y) and the corresponding standard errors (binary.data@SE). Both of these fields are assumed to exist in all BinaryData objects. We pass along some other parameters, as provided in the params argument. The rest of the routine is concerned with parsing the output of interest from the rma.uni function and packaging it for consumption by the GUI. We also generate a forest plot, which is saved to disk at a user-specified path.
The method must return a named list. This list is assumed to contain an 'images' entry, which is a dictionary that maps image names to their paths on disk. The GUI will render these for the end user.

0  binary.random.parameters <- function() {
1    rm_method_ls <- c("HE", "DL", "SJ", "ML", "REML", "EB")
2    params <- list("rm.method" = rm_method_ls, "conf.level" = "float", "digits" = "float")
3    defaults <- list("rm.method" = "DL", "conf.level" = 95, "digits" = 3)
4    var_order <- c("rm.method", "conf.level", "digits")
5    parameters <- list("parameters" = params, "defaults" = defaults, "var_order" = var_order)
6    parameters
7  }

Figure 7: The .parameters method associated with the binary.random function (Figure 5). This method specifies the arguments required by said function, as well as their types. Line 1 specifies the different available estimators of tau^2, discussed above; the user will be able to choose which of these the software uses. Line 2 assembles a complete list of parameters the binary.random method requires (and that the user must provide): the estimation method, the confidence level, and the number of digits to be displayed. Crucially, the code here must specify what type of values each of these parameters may take; this allows the interface code to draw suitable input widgets. In Line 3 the defaults for these are provided. The GUI will initially set the corresponding parameters to these values; the user can then change them. In Line 4 we specify the order in which these variables are to appear (vertically) in the GUI (see Figure 8). Finally, in Line 5 we package these up for consumption by the Python interface code, which will be responsible for rendering an interface that allows the user to impart values for these parameters to the system.
Other entries in the returned list are assumed to correspond to textual output; in this case we have only a 'Summary' field, which contains information about the analysis and the overall (pooled) estimate, etc. An arbitrary number of text fields may be returned. The text will be rendered in the GUI exactly as it would appear at the console, so the programmer is responsible for 'pretty-printing' the output.
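On the Python side, consuming such a returned structure amounts to a small dispatch loop. The following is our schematic rendering of that logic (the real GUI displays these in QT widgets; the function is illustrative):

```python
def render_results(results):
    """Dispatch the named list returned by a meta-analysis routine:
    the 'images' entry maps titles to image paths on disk, and every
    other entry (except internal bookkeeping such as 'plot_names')
    is pre-formatted text to be displayed verbatim."""
    rendered = []
    for title, path in results.get("images", {}).items():
        rendered.append(("image", title, path))
    for key, text in results.items():
        if key not in ("images", "plot_names"):
            rendered.append(("text", key, text))
    return rendered
```

Because the GUI only understands this generic structure, any routine that returns it, regardless of the statistics inside, renders without GUI-specific code.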
There remains the issue that different meta-analysis methods will require different parameters. For example, a random-effects meta-analysis requires the specification of the random effects method to be used, whereas no such parameter is required for fixed-effects analysis. We therefore need a way for R programmers who are implementing meta-analysis methods to communicate to the GUI what parameters are required for their routine. The GUI will then be responsible for rendering an interface that solicits these parameters from users and, ultimately, passes them to the analytic routine.
We accomplish this again by making assumptions about naming conventions: every meta-analysis method is assumed to have a corresponding .parameters routine, whose only purpose is to return a named list specifying the argument names and types required by the associated function. For example, binary.random.parameters (Figure 7) shows the .parameters routine for the random-effects meta-analysis routine for binary data implemented in Figure 5. Note that the type of each argument must also be specified. For example, the conf.level parameter is a float, whereas the rm.method argument is a categorical string. In the latter case, the values the argument can assume are enumerated in a list, which will be presented to the user as a standard combo box (see Figure 8).

Figure 8: The GUI that is automatically built and presented to the end-user using the values specified by binary.random.parameters (Figure 7).
We allow the author of the method to optionally specify default values for their parameters using the defaults list, as shown in Figure 7; here, for example, we set the default confidence level to 95. We also allow the author to optionally specify the order in which the parameters are presented to the user (e.g., in Figure 7, the rm.method argument will appear at the top of the GUI, as shown in Figure 8).
An example of the GUI rendering the output returned in the last line of the routine shown in Figure 5 is displayed in Figure 6. Note also that in the bottom panel of the results window there is a small console; this provides direct access to the R environment. More advanced users may examine their results via the console, if they would prefer, and non-technical users can simply ignore it.
An additional part of the architecture not demonstrated by this example is the ability of the method author to express necessary conditions for their function. For example, a meta-analysis method may require at least three studies (i.e., individual point estimates) to run. The author of such a method can enforce this requirement by implementing a .is.feasible function associated with their method. This function is passed the current data object, which it can inspect to determine whether the corresponding meta-analysis routine is feasible given the provided data; it returns TRUE if the routine can be run and FALSE otherwise. Furthermore, we allow the implementer of methods to optionally provide more informative names for both the method and its parameters, as well as descriptions of both. If these are available, the GUI will retrieve and display them. To recapitulate: the GUI requests the methods available for the current data type by using introspection and making assumptions about naming conventions, and generates an interface for soliciting user parameters for these methods based on the corresponding .parameters function. Methods are assumed to return pointers to image files and pretty-printed text as output.
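In Python pseudo-terms, such a feasibility check is simply a predicate over the current data object; the signature below is our illustrative analogue of an R .is.feasible routine, not code from the package:

```python
def is_feasible(data, min_studies=3):
    """Decline to run when fewer point estimates are available than the
    method requires; mirrors the role of a .is.feasible routine. Here
    'data' is assumed to expose the per-study estimates under 'y'."""
    return len(data["y"]) >= min_studies
```

The GUI can call this before offering the method to the user, greying out analyses that cannot run on the currently loaded dataset.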

Discussion
In meta-analysis, the methodologists who invent and implement cutting-edge meta-analysis methods often do so in R. Meta-analysts are often non-technical and therefore tend to use GUI-driven programs instead of R to perform their analyses. Thus there is a gap between the statistical methods that ought to be used and the methods that are used due to the limited availability of the former in GUI-based programs. It is difficult for software developers to continuously implement and integrate the latest statistical methods into specialized programs, particularly when the methods are fast-evolving. This problem is compounded by the fact that such specialized GUI-based programs are written in general programming languages, not in specialized statistical languages like R.
To address this issue for the case of meta-analysis software, we have presented a framework for using R as the underlying computational/statistical engine, while the GUI is written in a different language more suited to GUI programming (in our case, Python). We have outlined our framework for extending the methods available in the GUI, which does not require directly writing GUI code. Thus new methods developed by methodologists can be made available to a more general, non-technical set of meta-analysts with minimal effort; the methods need only be wrapped so as to conform to our architecture. In our case, we have opted to use this strategy in developing a desktop application with the QT toolkit. However, because of our strict separation of the analytic and UI components, one could re-implement the UI generation (following the described API) in, e.g., a web interface with relative ease.
While meta-analysis was the driving example in this case, we believe this framework of using R as the underlying computational engine and a different language for the GUI could be of use in many mathematically driven application areas, particularly when the target users of new methods may be non-technical. Such an approach is suitable whenever the problem space (e.g., meta-analysis methods) is more conveniently expressed in a statistical language such as R but there is a need for an intuitive GUI for end-users. Some examples of such areas include: decision theory, machine learning, and network analysis. Given that this strategy is likely widely applicable, it would perhaps be useful in future work to extract the UI generating functionality into a stand-alone Python library, which could attempt to build UIs (automatically) for any R package that conformed to the API. Admittedly, however, such a general package would require careful design.

Code
OpenMeta-Analyst is a completely open source project. The source code (both Python and R) is available on GitHub at: http://github.com/bwallace/OpenMeta-analyst-.
Compiled 'binary' releases of the program are available for Windows and OS X users at http://tuftscaes.org/open_meta/.