bamdit : An R Package for Bayesian Meta-Analysis of Diagnostic Test Data

In this paper we present the R package bamdit. The name of the package stands for "Bayesian meta-analysis of diagnostic test data". bamdit was developed with the aim of simplifying the use of meta-analysis models that, up to now, have demanded great statistical expertise in Bayesian meta-analysis. The package implements a series of innovative statistical techniques, including: the Bayesian summary receiver operating characteristic curve, the use of prior distributions that avoid boundary estimation problems for the variances and correlations of random effects, analysis of conflict of evidence, and robust estimation of model parameters. In addition, the package comes with several published examples of meta-analysis that can be used for illustration or further research in this area.


Introduction
One of the most important decisions in the presence of illness is the correct medical diagnosis. Ideally, for a particular diagnostic problem we should have a collection of studies which indicate the best way to proceed. However, this is not the case in clinical and other areas of empirical research. Instead, researchers have to face heterogeneous and fragmented evidence which has to be analyzed.
Meta-analysis is a branch of statistical techniques that helps researchers to combine evidence from a multiplicity of sources. In particular, meta-analysis of diagnostic test data differs from other types of meta-analysis in several aspects: First, the diagnostic summaries that we aim to combine (e.g., sensitivity and specificity) could be interdependent, and a marginal combination by pooling these quantities might be misleading (Irwig, Macaskill, Glasziou, and Fahey 1995). Second, diagnostic studies are usually performed under slightly different diagnostic setups and they can be applied to different patient populations. Hence, we can expect high heterogeneity between studies' results. In addition, the number of studies included might be small and of varying quality (e.g., they might have different study designs) (Lijmer et al. 1999; Lijmer, Bossuyt, and Heisterkamp 2002; Westwood, Whiting, and Kleijnen 2005). Hence, conducting meta-analysis and combining results from diagnostic studies may become a challenge.
In this paper we present the R package bamdit (Verde 2018). The name of the package stands for "Bayesian meta-analysis of diagnostic test data". The development of the package started with the following question: "How can we make complex meta-analysis in an automatic fashion?" The initial release of bamdit was version 1.0, in the summer of 2011. This version was an experimental package whose aim was to investigate different statistical software architectures for fitting complex meta-analysis models. During the last years we have rewritten and updated the package several times with the intention of making it more user-friendly. The current release, version 3.2.1 of September 2018, is presented in this paper.
The package may be helpful to practitioners who are not familiar with complex Bayesian modeling and who do not have the skills to implement these models in Bayesian software such as WinBUGS/OpenBUGS (Lunn, Spiegelhalter, Thomas, and Best 2009) or JAGS (Plummer 2003).
For more than a decade, meta-analysis of diagnostic tests has been an active area of research; a gentle introduction is given by Gatsonis and Paliwal (2006) and more recently by Takwoingi, Riley, and Deeks (2015). Statistical methods have fallen into two main approaches: On the one hand we have techniques that focus on making a meta-analysis summary by recovering an underlying receiver operating characteristic (ROC) curve. This is the case of the summary ROC (SROC) curve introduced by Moses, Shapiro, and Littenberg (1993) and the hierarchical ROC (HROC) curve presented in Rutter and Gatsonis (2001); Macaskill (2004).
On the other hand we have approaches that directly model the diagnostic outcomes as a bivariate meta-analysis (Reitsma, Glas, Rutjes, Scholten, Bossuyt, and Zwinderman 2005;Chu and Guo 2009). The relationships between these two approaches have been investigated by Harbord, Deeks, Egger, Whiting, and Sterne (2007) and Arends, Hamza, Van Houwelingen, Heijenbrik, Hunink, and Stijnen (2008) from the classical perspective and by Novielli, Cooper, Sutton, and Abrams (2010) from the Bayesian perspective.
An extensive list with a comprehensive description of these packages is presented in the CRAN task view "Meta-Analysis" (Dewey 2018). In particular, the following R packages have been developed for meta-analysis of diagnostic test data: mada (Doebler 2017) implements the bivariate method of Reitsma et al. (2005) by using the normal approximation of the observed diagnostic rates on the logit scale. This package also offers bivariate meta-regression functionality. HSROC (Schiller and Dendukuri 2015) provides a full Bayesian implementation of the hierarchical summary receiver operating characteristic (HSROC) method of Rutter and Gatsonis (2001). Metatron (Huang 2014) includes an implementation of the Reitsma et al. (2005) model by fitting a bivariate generalized linear mixed-effects model (GLMM); the package includes the case of diagnostic tests with an imperfect reference standard. metamisc (Debray 2017) implements the method of Riley et al. (2008), which estimates a common within- and between-study correlation when the within-study correlations are unknown. Approximate Bayesian methods using INLA (integrated nested Laplace approximation) can be found in Paul, Riebler, Bachmann, Rue, and Held (2010). This approach is implemented in the package meta4diag (Guo and Riebler 2017). In Section 5 we give more detailed information about these R packages.
Implementation of different Bayesian meta-analysis models for diagnostic test data in BUGS software is discussed in Rutter and Gatsonis (2001), Verde (2008, 2010) and Novielli et al. (2010).
The rest of the paper is organized as follows: In Section 2 we describe the software implementation of bamdit. In Section 3 we present methodological details of the Bayesian statistical model. In Section 4 we show how to use bamdit in practice. In Section 5 we compare bamdit with other R packages for meta-analysis of diagnostic test data. Finally, in Section 6 we give a brief summary of the work and we discuss future developments of the bamdit package.

Software implementation
In the implementation of bamdit we have considered that the package should be easy to use for practitioners familiar with R, but without a Bayesian statistical background.
We also considered that the package has to be portable between different operating systems. bamdit uses JAGS for MCMC (Markov chain Monte Carlo) computations, therefore the main system requirement is that JAGS (≥ 3.4.0) is installed on your computer (see http://mcmc-jags.sourceforge.net/). It is important to note that R 3.3.0 introduced a major change in the toolchain used for Windows. This new toolchain is incompatible with older packages written in C++. As a consequence, if the installed version of JAGS does not match the R installation, then the rjags package will spontaneously crash. Therefore, if a user works with R ≥ 3.3.0, then JAGS must be installed with the installation program JAGS-4.2.0-Rtools33.exe. For users who continue using R 3.2.4 or an earlier version, the installation program for JAGS is the default installer JAGS-4.2.0.exe.
A single function called metadiag() performs the meta-analysis. This function allows the user to fit bivariate normal random effects or bivariate scale mixtures of normal distributions. The default link function is the logistic link, but the user can choose between the three classical link functions for binomial data: logistic, complementary log-log, or probit.
Internally, this function writes the BUGS script and sends the script to JAGS where MCMC computations are performed and returned to R.
The metadiag() function is a generic function implemented in S3 object-oriented programming in R. The output of the function is an object of the class metadiag, which contains the results of the MCMC computations, the data used for the analysis and further information from the fitted model. Results from a metadiag object can be displayed with its print, summary and plot functions. Further statistical details of the model behind bamdit are presented in Section 3.
Convergence of the MCMC computations can be analyzed using the R package coda (Plummer, Best, Cowles, and Vines 2006). In addition, we have implemented a series of graphical functions that can be used to summarize results and to compare results between models. We demonstrate this software's functionality in Section 4.

Some statistical advantages of using bamdit
From the statistical point of view, bamdit reduces the risk of boundary problems in the estimation of the variances and the correlation between random effects of the meta-analysis model. In this regard it can be applied to problems where classical approaches fail (see Section 4 and Section 5).
In addition, bamdit is equipped with an automatic analysis of conflict of evidence (Verde 2014), which allows the user to detect studies with unusual results that have been included in the meta-analysis. The user does not need to exclude these studies: the heavy-tailed distributions for random effects implemented in bamdit automatically down-weight conflicting studies, which results in a robust Bayesian technique.
The statistical approach implemented in bamdit is fully Bayesian (see Section 3). Therefore, the variability of all parameters in the model is propagated to their posteriors. This contrasts with packages that use classical GLMMs, such as Metatron, where standard deviations and correlations are calculated by fixing parameter values at their estimates.
The likelihood contributions of the models implemented in bamdit are exact and normal approximations are not required. Packages such as mada and metamisc use normal approximations of the likelihood contributions. These approximations can be very misleading in studies with a small number of patients or in meta-analyses of high-technology diagnostic tests, where we expect zero outcomes in true positives or true negatives.
A particular value of bamdit is that it calculates the marginal and joint posterior predictive distribution of the sensitivity and specificity. As discussed by Higgins, Thompson, and Spiegelhalter (2009), these predictions are the most important summaries in meta-analysis involving random effects, which can be used to predict results in a new study.
With respect to predictions, most of the R packages for meta-analysis of diagnostic tests provide a graphical summary of the predictive contours at given confidence levels (e.g., 50% and 95%). These contours are calculated by assuming that the predictive distribution of random effects follows a bivariate normal distribution. The plot function of bamdit implements this option by using the argument smooth.par = TRUE. For smooth.par = FALSE a nonparametric smoothing of the bivariate distribution of the predictive sensitivity and specificity is displayed. This last option is very useful when the normal distribution of random effects is not plausible.

A data model for diagnostic test results
We assume that the pieces of evidence that we aim to combine are the results of N diagnostic studies, where the results of the ith study (i = 1, . . . , N) are summarized in a 2 × 2 table as given in Table 1, where tp_i and fn_i are the numbers of positive and negative diagnostic results from the n_{i,1} patients with the disease, and fp_i and tn_i are the positive and negative diagnostic results from the n_{i,2} patients without the disease.
Assuming that n_{i,1} and n_{i,2} have been fixed by design, we model the tp_i and fp_i outcomes with two independent binomial distributions:

tp_i ~ Binomial(TPR_i, n_{i,1}),  fp_i ~ Binomial(FPR_i, n_{i,2}),

where TPR_i is the true positive rate or sensitivity of study i and FPR_i is the false positive rate or complementary specificity (1 − specificity).
At face value, the diagnostic performance of each study is summarized by the empirical true positive rate (sensitivity) and true negative rate (specificity),

tp_i / n_{i,1}  and  tn_i / n_{i,2},

and the complementary empirical rates of false positive and false negative diagnostic results,

fp_i / n_{i,2}  and  fn_i / n_{i,1}.

The main question in meta-analysis of diagnostic test data is: How can we combine the multiplicity of diagnostic accuracy rates in a single coherent model? In this work we recognize that in order to combine results of different studies we have to explicitly model the variability between studies, which is the topic of the next section.
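As a small illustration of these empirical rates, consider a single hypothetical 2 × 2 table (the counts below are made up for illustration and are not taken from any data set in the package):

```r
# Hypothetical 2x2 table of one diagnostic study:
# tp, fn come from the n1 diseased patients; fp, tn from the n2 healthy ones.
tp <- 25; fn <- 3    # n1 = 28 patients with the disease
fp <- 2;  tn <- 40   # n2 = 42 patients without the disease

n1 <- tp + fn
n2 <- fp + tn

TPR <- tp / n1  # empirical sensitivity
TNR <- tn / n2  # empirical specificity
FPR <- fp / n2  # 1 - specificity
FNR <- fn / n1  # 1 - sensitivity

round(c(TPR = TPR, TNR = TNR, FPR = FPR, FNR = FNR), 3)
```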

Random effects model
We model the between-study variability with the following random components:

D_i = g(TPR_i) − g(FPR_i),
S_i = g(TPR_i) + g(FPR_i),

where g(·) corresponds to a link function which maps the diagnostic rates to the real line (−∞, ∞). The canonical link function used in this work is the logistic link g(p) = log(p/(1 − p)), but other links are also possible (e.g., the complementary log-log link function g(p) = log(−log(1 − p))).
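A quick sketch of these transformations in plain R (the study rates below are illustrative values, not estimates):

```r
logit   <- function(p) log(p / (1 - p))   # canonical (logistic) link
cloglog <- function(p) log(-log(1 - p))   # alternative link

tpr <- 0.89; fpr <- 0.05                  # illustrative study rates

D <- logit(tpr) - logit(fpr)  # discriminatory power: log diagnostic odds ratio
S <- logit(tpr) + logit(fpr)  # threshold effect

c(D = D, S = S)
```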
The random component D_i represents the study effect associated with the diagnostic discriminatory power. For the logistic link function, D_i corresponds to the diagnostic odds ratio (DOR) on the logarithmic scale:

D_i = log[ (TPR_i / (1 − TPR_i)) / (FPR_i / (1 − FPR_i)) ] = log(DOR_i).

Meta-analysis based on odds ratios is a common practice for therapeutic outcomes, and one could also follow this approach for diagnostic studies. However, diagnostic results are sensitive to diagnostic settings (e.g., the use of different thresholds) and to the populations where the diagnostic procedure under investigation is applied. These issues are associated with the external validity of diagnostic results. Following the footsteps of Moses et al. (1993), Verde (2008) introduced the random effect S_i. This random effect quantifies the variability produced by patients' characteristics, study design and diagnostic setup, which may produce a correlation between the observed TPRs and FPRs.
In short, we call S_i the threshold effect of study i; it represents an adjustment for external validity in the meta-analysis.
Conditionally on a study weight q_i, the study effects D_i and S_i are modeled as exchangeable between studies, and they follow a scale mixture of bivariate normal distributions with mean and variance

(D_i, S_i) | q_i ~ N_2( (μ_D, μ_S), (ν/q_i) Σ ),  Σ = ( σ_D^2, ρ σ_D σ_S ; ρ σ_D σ_S, σ_S^2 ),

and scale mixing density q_i ~ p(q_i). The inclusion of the random weights q_i into the model was proposed by Verde (2010), where p(q_i) allows for great flexibility in modeling the marginal distribution of D_i and S_i. Two important cases are: q_i ~ χ²(ν), which corresponds to a marginal bivariate t distribution with known degrees of freedom ν, and p(q_i = 1) = 1, which corresponds to a bivariate normal distribution.
In the case of the bivariate t distribution, when the degrees of freedom parameter ν is fixed to a constant, the marginal distribution of (D_i, S_i) is obtained by integrating q_i out of the conditional distribution. In this case, we have to restrict ν > 2 in order to have finite marginal variances of the random effects. However, in the scale mixture of normal distributions we can fix ν = 1 and obtain a bivariate Cauchy distribution with infinite marginal variances for a particular study. This is equivalent to excluding a study from the meta-analysis. Another generalization of the random effects distribution arises when we put a prior on the degrees of freedom parameter ν (see Section 3.4). This corresponds to an adaptive robust distribution of the random effects.

The use of scale mixtures of normal distributions as a robust statistical technique has a long history in Bayesian statistics. There is a substantial literature in this area and a good starting point is the review by O'Hagan and Pericchi (2012).

Figure 1: DAG for the model which combines diagnostic accuracy results.

Figure 1 displays the directed acyclic graph (DAG) of the model presented in this section. In the usual DAG notation, elliptical nodes represent random variables (parameters and data), rectangular nodes represent fixed parameters, single arrows correspond to stochastic dependencies between nodes and double arrows correspond to deterministic relationships. Model parameters with priors are depicted with dashed ellipses. Repeated structures of the graph are represented by the central plate, where each 2 × 2 table is modeled as the result of diagnostic parameters (TPR_i and FPR_i), which are in turn the result of random study effects (D_i and S_i). The model of interest is framed with a rectangle containing the hyperparameters of the model (μ_D, μ_S, σ_D, σ_S, ρ, ν).

The directed acyclic graph of the model
The DAG of Figure 1 links the statistical model to the MCMC computations implemented in JAGS. Using an automated theorem-proving algorithm, JAGS factorizes the joint posterior distribution into a set of conditional distributions which are used for Gibbs sampling. In addition, the DAG representation helps to understand how to extend the model of interest. For example, the pooled sensitivity and the pooled specificity are functional parameters of the hyperparameters (see Section 3.7).

Priors for hyperparameters
The formulation of the model for aggregate data is completed by specifying the priors for the hyperparameters μ_D, μ_S, σ_D, σ_S and ρ. We assume that the parameters are independent and we use the following set of priors:

μ_D ~ Normal(m_1, v_1),  μ_S ~ Normal(m_2, v_2),

and

σ_D ~ Uniform(0, u_1),  σ_S ~ Uniform(0, u_2).

The correlation parameter ρ is transformed by using the Fisher transformation,

z = log( (1 + ρ) / (1 − ρ) ),

and a normal prior is used for z:

z ~ Normal(m_r, v_r).

Modeling priors in this way guarantees that in each MCMC iteration the variance-covariance matrix of the random effects is positive definite. The values of the constants m_1, v_1, m_2, v_2, u_1, u_2, m_r and v_r have to be given. They can be used to include valid prior information which might be empirically available, or they could be the result of expert elicitation. If such information is not available, we recommend setting these parameters to values that represent weakly informative priors. In this work, we use m_1 = m_2 = m_r = 0, v_1 = v_2 = 1 and v_r = √1.7 as a weakly informative prior setup.
These values are fairly conservative, in the sense that they induce uniform prior distributions for TPR_i and FPR_i. They give locally uniform distributions for μ_D and μ_S, uniform distributions for σ_D and σ_S, and a symmetric distribution for ρ centered at 0. In our experience, the most difficult parameter to estimate in this model is ρ. Therefore, we recommend a prior-to-posterior sensitivity analysis, giving different values to m_r and v_r in order to understand their influence on the analysis. Taking v_r = √1.7 gives an approximately uniform distribution for ρ between −0.9 and 0.9, and less than 1.5% probability that ρ is less than −0.95 or greater than 0.95. This setup protects the computations from being trapped in impossible values of ρ.
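The implied prior for ρ can be inspected by simulation; the short sketch below assumes the transformation z = log((1 + ρ)/(1 − ρ)) written above, so that ρ = tanh(z/2):

```r
set.seed(123)
vr  <- sqrt(1.7)                      # prior standard deviation of z
z   <- rnorm(1e5, mean = 0, sd = vr)  # z = log((1 + rho)/(1 - rho))
rho <- tanh(z / 2)                    # back-transform to the correlation scale

# Prior mass in the extreme tails of rho (small, cf. the 1.5% bound above):
mean(rho < -0.95 | rho > 0.95)

# hist(rho, breaks = 50)  # roughly flat shape on (-0.9, 0.9)
```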
Finally, in the current implementation of bamdit we give the following prior to the degrees of freedom parameter ν:

ν = 1/U,  U ~ Uniform(a, b),

with a = 1/df.upper and b = 1/df.lower. The default values in bamdit are df.lower = 3 and df.upper = 30; this setup allows the exploration of random effects distributions ranging from a t distribution with 3 degrees of freedom to a normal distribution. This prior is designed to favor long-tailed distributions and to explore conflict of evidence in meta-analysis. In addition, we provide the option to give a fixed value of ν, where the default is ν = 4, or to disable the scale mixture random effects and to use a bivariate normal distribution in the meta-analysis.

Interpretation of the studies' weights as conflict-of-evidence measures
An important aspect of q_i is its interpretation as an estimated bias correction. A priori, all studies included in the review have a mean of E(q_i) = ν. We can expect that studies which are unusually heterogeneous will have posteriors substantially less than ν.
In bamdit we use w_i = ν/q_i and we report the posterior probability Pr(w_i > 1 | Data) to indicate a study's heterogeneity. In our working experience, a study could be atypical if this posterior probability is greater than 0.7.
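The computation of this posterior probability can be sketched as follows; here simulated draws stand in for real MCMC output, and the deflation factor is an artificial device to mimic an atypical study:

```r
set.seed(1)
nu <- 4
# Simulated posterior draws of q_i for one study (placeholder for MCMC output);
# the 0.6 factor artificially deflates q_i to mimic an atypical study.
q.draws <- rchisq(10000, df = nu) * 0.6

w.draws <- nu / q.draws   # study weight w_i = nu / q_i
mean(w.draws > 1)         # posterior Pr(w_i > 1 | Data)
```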
In addition, if the model is not corrected for the influence of unusual study results, then the meta-analysis may produce biased results. The use of scale mixtures for the random effects automatically down-weights the influence of outliers in the meta-analysis and produces a robust estimation of the fixed effects.
Unusual studies' results could be produced by factors that may affect the quality of the study, such as errors in recording diagnostic results, confounding factors, loss to follow-up, etc. For that reason, the studies' weights w i can be interpreted as an adjustment of studies' internal validity bias.

Splitting the studies' weights
In Verde (2014) I conjectured that one way to perform a conflict of evidence analysis in a multi-parameter meta-analysis model was to extend the random effects distribution by using one scale mixture of normal distributions per random effect. I have called this technique "splitting the studies' weights" and it is implemented in the bamdit function metadiag() by using the argument split.w = TRUE. The study weight w_i = ν/q_i is now "split" into two components, w_{i,1} = ν/q_{i,1} and w_{i,2} = ν/q_{i,2}; these weights measure individual conflict for the components D_i and S_i, respectively. For example, if the sources of conflict are studies with unusual specificity, the posteriors of w_{i,2} will be further away from the prior mean E(w_{i,2}) = 1, while the corresponding posteriors of w_{i,1} will be concentrated around the prior mean.
It is worth mentioning that the mixture of normal distributions introduced by the splitting argument changes the distribution of the random effects. For example, if this splitting option is used with a t distribution with ν = 4, then we look for outliers in two orthogonal directions: the directions of D and S. If the splitting option is not used, then the multivariate t distribution looks for outliers in any direction of the space (D, S). We report the posteriors Pr(w_{i,1} > 1 | Data) and Pr(w_{i,2} > 1 | Data) to indicate the direction of the studies' heterogeneity. We illustrate how to use this technique in the examples of Section 4.
Conditionally on q_{i,1} and q_{i,2}, the study effects D_i and S_i are modeled as exchangeable between studies. As a common scale mixing density, we use a χ² distribution conditional on the degrees of freedom ν:

q_{1,1}, . . . , q_{N,1}, q_{1,2}, . . . , q_{N,2} ~ χ²(ν).

Pooled and predictive summaries
In meta-analysis of diagnostic data we are interested in summarizing the overall accuracy of the test in terms of the pooled sensitivity and the pooled specificity.
These quantities are calculated as functions of μ_D and μ_S as follows:

Sensitivity_pooled = g^{−1}( (μ_D + μ_S)/2 )  and  Specificity_pooled = 1 − g^{−1}( (μ_S − μ_D)/2 ).

In Figure 1 these quantities are represented as logical nodes; statistical inference is based on sampling from their marginal posterior distributions:

p(Sensitivity_pooled | Data)  and  p(Specificity_pooled | Data).
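Under the logistic link these formulas reduce to inverse-logit transforms; a plain-R sketch with illustrative values of μ_D and μ_S (not real estimates):

```r
inv_logit <- function(x) 1 / (1 + exp(-x))

mu.D <- 3.0    # illustrative hyperparameter values, not estimates
mu.S <- -1.0

sens.pooled <- inv_logit((mu.D + mu.S) / 2)      # pooled sensitivity
spec.pooled <- 1 - inv_logit((mu.S - mu.D) / 2)  # pooled specificity

round(c(sens = sens.pooled, spec = spec.pooled), 3)
```

In practice these transformations would be applied to every MCMC draw of (μ_D, μ_S), yielding a sample from each marginal posterior.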
Another important summary is the predicted pair of rates (FPR_new, TPR_new) for a study that has not been included in the meta-analysis. Statistical inference for these quantities is based on sampling from the bivariate predictive posterior

p(TPR_new, FPR_new | Data).

In Figure 1 we display how this posterior is built by defining a stochastic node (D_new, S_new) which is used to calculate (TPR_new, FPR_new) in each MCMC iteration.
This predictive posterior can be used graphically to report the predictive surface at a given credibility level (e.g., 95%). We call this summary the Bayesian predictive surface. Data prediction can be extended to generate N studies with the same numbers n_{i,1} and n_{i,2} as the original ones (i = 1, . . . , N). The resulting predictive data can be compared with the observed data to assess model misfit.

Conditional summaries
The most common statistical technique used by practitioners to summarize meta-analysis of diagnostic data is the summary receiver operating characteristic (SROC) curve introduced by Moses et al. (1993). The model presented in Section 3 allows us to build the Bayesian version of the SROC curve introduced by Verde (2008).
An alternative representation of the marginal model presented in Section 3.2 is based on the conditional distribution of (D_i | S_i = x) and the marginal distribution of S_i. The conditional mean of (D_i | S_i = x) is given by

E(D_i | S_i = x) = A + B x,

where the functional parameters A and B are

A = μ_D − B μ_S  and  B = ρ σ_D / σ_S.

We define the Bayesian SROC curve (BSROC) by transforming results back from (S, D) to (FPR, TPR) with

TPR = g^{−1}[ (A + (1 + B) g(FPR)) / (1 − B) ].

The BSROC curve is obtained by calculating TPR on a grid of values of FPR, which gives a posterior conditional on each value of FPR. Therefore, it is straightforward to give credibility intervals for the BSROC for each value of FPR.
One important aspect of the BSROC is that it incorporates the variability of the model's parameters, which influences the width of its credibility intervals. In addition, given that FPR is modeled as a random variable, the curve is corrected by measurement error bias in FPR.
Finally, we can define a Bayesian area under the SROC curve (BAUC) by numerically integrating the BSROC over a range of values of the FPR:

BAUC = ∫ from fpr_0 to fpr_1 of BSROC(x) dx.

We recommend using limits fpr_0 and fpr_1 within the observed values of the FPRs.
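A plain-R sketch of these conditional summaries; the hyperparameter values below are illustrative, not estimates from any data set:

```r
logit     <- function(p) log(p / (1 - p))
inv_logit <- function(x) 1 / (1 + exp(-x))

# Illustrative hyperparameter values:
mu.D <- 3.0; mu.S <- -0.5
sigma.D <- 0.8; sigma.S <- 1.0; rho <- -0.3

B <- rho * sigma.D / sigma.S
A <- mu.D - B * mu.S

# BSROC: TPR as a function of FPR on the logit scale.
bsroc.curve <- function(fpr) inv_logit((A + (1 + B) * logit(fpr)) / (1 - B))

# BAUC: numerical integration over a plausible observed FPR range.
bauc <- integrate(bsroc.curve, lower = 0.01, upper = 0.4)$value
bauc
```

Repeating this calculation for each MCMC draw of (μ_D, μ_S, σ_D, σ_S, ρ) would yield posterior samples of the curve and of the BAUC.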
We have implemented these conditional summaries in the function bsroc(), which plots the study results with the fitted BSROC curve, its credibility intervals and the posterior distribution of the BAUC. We illustrate this functionality in Section 4.

Further parametrization of random effects
In the bivariate normal distribution case, the random effects distribution is similar to the bivariate model introduced by Reitsma et al. (2005), where the authors modeled random effects on the logit-transformed sensitivities (se_i) and specificities (sp_i). From the definitions of D_i and S_i we have

g(TPR_i) = (D_i + S_i)/2  and  g(FPR_i) = (S_i − D_i)/2.

Taking g(·) = logit(·) gives logit(se_i) = (D_i + S_i)/2 and logit(sp_i) = (D_i − S_i)/2, i.e., the same random effects distribution as in Reitsma et al. (2005). However, the likelihood contributions of each study in the Reitsma model are assumed to be approximately normal, while in our model the likelihood contributions are exactly binomial. Moreover, given that our model is a full Bayesian hierarchical model with priors on the hyperparameters (see Section 3.4), the resulting estimates could differ (see Section 5).
The model implemented in bamdit generalizes the Reitsma model in the following aspects: by allowing the random effects to be non-normal; by relaxing the normality assumption of the likelihood contributions; and by introducing priors on hyper-parameters, which reduces the risk of having numerical problems in the estimation of the random effects distributions (e.g., variances equal to zero or correlations equal to one).
The argument re.model in the function metadiag() allows the user to choose between two parametrizations of the random effects: re.model = "SeSp" parametrizes the model in terms of (g(se_i), g(sp_i)), while the default, re.model = "DS", parametrizes the model as presented in Section 3.2. The hyperpriors are automatically adapted according to the choice of parametrization.

Example: Diagnostic accuracy of telomerase for bladder cancer

Glas, Lijmer, Prins, Bonsel, and Bossuyt (2003) performed a systematic review to investigate diagnostic procedures for tumor markers used for diagnosing bladder cancer. One of these markers was telomerase, a ribonucleoprotein enzyme, which was evaluated in 10 studies. Riley, Abrams, Sutton, Lambert, and Thompson (2007) used this example to present issues regarding boundary problems in the estimation of the correlation between random effects. Paul et al. (2010) illustrate the use of INLA computations in this example as well.

Looking at the data
The data of this meta-analysis can be found in the glas data frame in bamdit. We can have a quick view of the different subgroups of markers by using the function plotdata(); here we present some of its functionality:

R> set.seed(2017)
R> library("bamdit")
R> data("glas")
R> head(glas)

Fitting Bayesian meta-analysis models
A single function called metadiag() is used to fit different types of Bayesian meta-analysis models. Below we illustrate some of the arguments of this function. For example, to fit a model with a bivariate normal random effects distribution, a logistic link function, and random effects on D_i and S_i, type:

R> glas.m1 <- metadiag(glas.t, re = "normal", re.model = "DS",
+                      link = "logit", sd.Fisher.rho = 1.7)

Figure 3: Display of the meta-analysis results of studies with the telomerase marker in the data frame glas.

To see the results of these computations, print the object:

R> glas.m1

For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat = 1).
DIC info (using the rule pD = var(deviance)/2): pD = 14.9 and DIC = 95.0. DIC is an estimate of expected predictive error (lower deviance is better).
We can see that the hyper-parameters, such as the variance components (σ_D and σ_S) and the correlation between random effects (ρ), are estimated without boundary problems.
If we need to directly calculate the correlation between the pooled sensitivity and the pooled specificity, then we can attach the object glas.m1 by using the package R2jags and calculate the correlation directly:
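The code below sketches this step; the names of the attached posterior vectors (se.pool, sp.pool) are assumptions for illustration and may differ from the names actually exported by the package:

```r
library("R2jags")

# attach.jags() exports the posterior draws stored in glas.m1
# into the workspace as named vectors/matrices.
attach.jags(glas.m1)

# Correlation between the posterior draws of the pooled summaries
# (vector names are hypothetical):
cor(se.pool, sp.pool)
```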

Displaying meta-analysis summaries
It is very useful to display the Bayesian predictive surface by contours at different credibility levels and to compare these curves with the observed data. The function plot() displays parametric or non-parametric predictive contours:

R> plot(glas.m1, level = c(0.5, 0.75, 0.95), parametric.smooth = TRUE)

The function plotsesp() is a user-friendly function in bamdit which displays the posterior distributions of the pooled sensitivity and specificity and their predictive posteriors. We can display these posteriors as follows:

R> plotsesp(glas.m1)

Figure 5 shows the output. Clearly, the low number of studies influences the ability to predict the result of a future study.
The BSROC curve and its area under the curve are useful summaries of a meta-analysis; we can easily display these summaries by using the function bsroc() as follows:
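The original call is not preserved in this copy of the text; a minimal invocation, using only the fitted object and the function name documented above, would be:

```r
# Minimal invocation; bsroc() plots the study results with the fitted BSROC
# curve, its credibility intervals, and the posterior of the BAUC.
# Additional arguments (e.g., for plotting limits) may exist in the package.
bsroc(glas.m1)
```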

Hyper-parameters posteriors
If we are interested in visualizing the posterior distributions of all hyper-parameters simultaneously, we can use one of the matrix plot functions available in R. For example, we can use the ggpairs() function from the package GGally as follows:

R> library("ggplot2")
R> library("GGally")
R> library("R2jags")
R> attach.jags(glas.m1)
R> hyper.post <- data.frame(mu.D, mu.S, sigma.D, sigma.S, rho)
R> ggpairs(hyper.post, title = "Hyper-Posteriors",
+          lower = list(continuous = "density"))

In the lower diagonal panels of Figure 7 we can see the correlation structure of this multivariate posterior. The main diagonal of this matrix plot contains the posterior densities of each hyper-parameter.

Conflict of evidence analysis by using scale mixtures random effects
We can fit a model with scale mixtures as random effects to investigate if there is conflict of evidence between the studies included in the systematic review. The following code gives an example:

R> glas.m2 <- metadiag(glas.t, re = "sm", link = "logit",
+                      sd.Fisher.rho = 1.7, df.estimate = TRUE, split.w = TRUE,
+                      nr.burnin = 10000, nr.iterations = 100000, nr.chains = 1,
+                      r2jags = TRUE)

The results are printed as usual:

R> glas.m2
Inference for Bugs model at "4", fit using jags, 1 chains, each with 1e+05 iterations (first 10000 discarded), n.sims = 90000 iterations saved. [Posterior summary table omitted.]

Although this model shows results similar to the model with bivariate normal random effects, there is a reduction of about 5% in the standard deviations of the pooled summaries, and we have the additional information coming from the posterior weights. The posterior probability that w_2 is greater than one is 0.7 for observations 5, 7 and 10. This indicates that these studies may contain unusual results. The function plotw() displays the posteriors of the weights w_1 and w_2:

R> plotw(m = glas.m2)

Figure 8 summarizes the results of the component weights w1 and w2. If the normal random effects assumption were correct, we would expect the posteriors of w1 and w2 to be centered at 1. Studies 5 and 7 show a moderate deviation and Study 10 a clear deviation. Printing the original data explains these results: Studies 5 and 7 have a very low false positive rate, maybe too low to be true! Study 10 has a false positive rate above 75%, which is extreme for these data. We can use the function plotcompare() to display the differences between the two models with respect to their predictive posterior contours:

R> plotcompare(m1 = glas.m1, m2 = glas.m2, m1.name = "Binomial + Normal",
+    m2.name = "Binomial + Scale mixtures", level = 0.95)

Figure 9 shows that the model with scale mixture random effects extends the predictive contours towards lower sensitivity and a higher false positive rate.
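The contours drawn by plotcompare() are level curves of a bivariate distribution on the logit scale mapped into the ROC space. The following base R sketch constructs one such 95% contour from made-up values of the mean vector and covariance matrix (mu and Sigma below are invented, not the fitted glas.m1 values):

```r
# 95% contour of a bivariate normal on the logit (TPR, FPR) scale,
# mapped to the ROC space. mu and Sigma are made-up illustration values.
mu    <- c(qlogis(0.70), qlogis(0.10))          # logit(TPR), logit(FPR)
Sigma <- matrix(c(0.5, -0.3, -0.3, 0.8), 2, 2)  # invented covariance
r     <- sqrt(qchisq(0.95, df = 2))             # Mahalanobis radius of the contour
theta <- seq(0, 2 * pi, length.out = 200)
circ  <- r * rbind(cos(theta), sin(theta))      # circle of radius r
pts   <- t(mu + t(chol(Sigma)) %*% circ)        # ellipse on the logit scale
plot(plogis(pts[, 2]), plogis(pts[, 1]), type = "l",
     xlim = c(0, 1), ylim = c(0, 1),
     xlab = "False positive rate", ylab = "Sensitivity")
```

plotcompare() overlays two such regions, one per fitted model, so the effect of the scale-mixture random effects on the predictive region is visible at a glance.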

Example: Computer tomography in the diagnosis of appendicitis
This example refers to a meta-analysis of 51 studies investigating the diagnostic performance of computer tomography (CT) scans in the diagnosis of appendicitis (Verde 2008).
One characteristic of this meta-analysis is the combination of disparate data: of the 51 studies, 22 were retrospective and 29 were prospective. Verde (2008) analyzed this characteristic and found that retrospective studies had substantially more heterogeneity than prospective ones, which led to the structural dispersion model of Verde (2010). Recently, Zhou and Dendukuri (2014) used these data to illustrate measurement heterogeneity in a bivariate random effects meta-analysis.

Looking at the data
The data of this meta-analysis can be found in the ct data frame in bamdit. In addition to the test performance results, this data frame contains information about study characteristics, patient characteristics, study design, and diagnostic setup.

Analyzing conflict of evidence of studies with different design
We analyze these data to show how to compare the posterior weights for different groups of studies. In the following example we compare these posteriors by using the function plotw(): we pass the factor variable gr to the argument group, which indicates whether a study has a prospective or a retrospective design.
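A sketch of this call, assuming bamdit (and JAGS) are available and that a scale-mixture model with split.w = TRUE is fitted to the ct data (the object name ct.m2 and the exact metadiag() settings below are assumptions):

```r
# Sketch only: the model fit requires JAGS and takes some time to run,
# so everything is guarded on the bamdit package being installed.
if (requireNamespace("bamdit", quietly = TRUE)) {
  library("bamdit")
  data("ct")                               # CT in appendicitis meta-analysis
  ct.m2 <- metadiag(ct, re = "sm", split.w = TRUE, r2jags = TRUE)
  plotw(m = ct.m2, group = ct$gr)          # weight posteriors split by design
}
```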
In summary, retrospective studies are less specific and more uncertain than prospective ones.
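These comparisons reduce to tail probabilities of the weight posteriors. With fabricated draws standing in for the real MCMC output of a fitted model (the vector w below is simulated, not taken from the ct analysis), the probability of conflict for a study is simply:

```r
set.seed(123)
# Fabricated posterior draws of a scale weight w for one study; in a real
# analysis these come from the MCMC output of the fitted metadiag object.
w <- rlnorm(90000, meanlog = 0.2, sdlog = 0.6)
p.conflict <- mean(w > 1)   # Pr(w > 1 | data): high values flag conflict
round(p.conflict, 2)
```

Weights concentrated above 1 flag studies whose results are in conflict with the rest of the evidence.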

Comparison with other R packages
The aim of this section is to present a brief comparison of results between different R packages that can be used for meta-analysis of diagnostic tests. For this aim we use the example of Glas et al. (2003) presented in Section 4.
Different packages implement different parametrizations of the random effects distribution, and they use different estimation techniques, numerical procedures and inferential approaches. Therefore, in order to make results comparable, we harmonize them in the following way: we use the logistic link function for sensitivity and specificity, a bivariate normal distribution for the random effects, and a parametrization of the random effects based on sensitivity and specificity on the logistic scale. We apply the default settings of all functions and present results with three significant decimal digits.
For example, metatron uses sensitivity and the false positive rate; therefore the estimated correlation was multiplied by −1 to obtain the correlation between sensitivity and specificity on the logistic scale. The same was done for mada and metamisc. The R script from Kuss et al. (2014) presents results on the probability scale, so we use the delta method to back-transform the variances to the logistic scale. The package HSROC implements the calculations in the …

Figure 12: Comparative predictive posterior contours at the 95% credibility level: two models with normal random effects are fitted to studies with retrospective (blue points) and prospective (red points) design.

Table 2 summarizes the results of our analysis. Of course, the results are not conclusive and it is not our intention to make a systematic comparison between packages, but we can give the following remarks:

• Estimation of the pooled means: The parameters µ_SE and µ_SP represent the pooled sensitivity and specificity on the logistic scale, respectively. All packages estimate µ_SE similarly; the results for µ_SP are also similar, but metatron tends to overestimate this parameter. On the probability scale the pooled sensitivity and specificity are approximately similar across packages, with the exception of metatron.
• Estimation of the standard deviations: The parameters σ_SE and σ_SP are the standard deviations of the random effects of sensitivity and specificity on the logistic scale. Looking across the results, we clearly see that metatron gives implausible values close to zero for both parameters. As we saw in Section 4, there is an outlier in the direction of the specificity, which makes the estimates of σ_SP more variable across packages: bamdit gives the largest value, 2.207, and HSROC the smallest, 1.422.
• Estimation of the correlations: As mentioned in Section 3.4, the correlation parameter ρ is the most difficult parameter to estimate. The package mada gives the impossible value −1, and metatron has a similar problem with a value close to 0. The other packages estimate ρ between −0.857 and −0.615.
• Bayesian approaches: There are similarities between the results of meta4diag and bamdit. This is not by chance: both packages implement the same model for the random effects. The differences come from the priors used for the variance-covariance matrix: meta4diag uses a Wishart distribution, while bamdit uses a conditional model to implement the priors. HSROC delivers similar results for the pooled estimates of sensitivity and specificity, with posterior intervals similar to those of meta4diag and bamdit.
• Predictions: Only two packages, HSROC and bamdit, present the posterior predictive intervals of sensitivity and specificity. The predictive interval for sensitivity reported by bamdit is close to the observed data, which range from 0.54 to 0.84. The predictive interval for specificity reported by HSROC excludes one observed specificity of 0.24, indicating that the model over-predicts specificity in this example.
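The two harmonization steps described at the start of this comparison, flipping the sign of a correlation reported on the (sensitivity, false positive rate) scale and the delta-method transformation of variances between the probability and logit scales, are one-liners in base R (all numbers below are made up for illustration):

```r
# 1) Sign flip: logit(specificity) = -logit(FPR), so a correlation reported
#    between logit-sensitivity and logit-FPR changes sign on the
#    (sensitivity, specificity) scale.
rho.se.fpr <- 0.75                # made-up reported correlation
rho.se.sp  <- -rho.se.fpr

# 2) Delta method: if a proportion p has variance v on the probability
#    scale, logit(p) has approximate variance v / (p * (1 - p))^2.
p <- 0.90                         # made-up pooled specificity
v <- 4e-04                        # made-up variance, probability scale
v.logit  <- v / (p * (1 - p))^2   # approximate variance, logit scale
sd.logit <- sqrt(v.logit)
```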
In this example, we can highlight that the packages which implement a classical approach based on GLMM, or its approximation, have problems with the estimation of the variances and correlations of the random effects. The package metamisc is the exception: it gives results similar to meta4diag and bamdit.
A casual practitioner may only look at the pooled sensitivity and specificity and find no difference between packages. However, the correlation structure gives important information about the heterogeneity of the studies included in the meta-analysis and the prediction of future studies. Therefore, packages with problems in the estimation of this part of the model will predict impossible results.
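The last point can be made concrete by simulation: with a boundary estimate such as ρ = −1, the two random effects are perfectly linearly related, so the joint predictive distribution collapses onto a line even though both marginal summaries look reasonable (the means and standard deviations below are invented):

```r
set.seed(42)
# Invented pooled values on the logit scale and a boundary correlation of -1.
mu   <- c(1.2, 2.0)     # logit-sensitivity, logit-specificity
sdev <- c(0.6, 1.5)
# With rho = -1 the covariance matrix is singular: a single standard normal
# draw z determines BOTH coordinates, so there is no joint variation left.
z <- rnorm(5000)
draws <- cbind(mu[1] + sdev[1] * z, mu[2] - sdev[2] * z)
cor(draws)[1, 2]        # numerically -1: all predictions lie on one line
```

Every "predicted study" from such a model trades sensitivity against specificity in an exact, deterministic way, which is the kind of impossible prediction referred to above.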

Conclusions
When developing bamdit, our aim was to make accessible to practitioners a meta-analysis model which up to now had required a large amount of statistical expertise. The package implements a series of innovative statistical techniques: prior distributions that avoid boundary estimates of parameters, analysis of conflict of evidence, and robust estimation of model parameters.
The first example in Section 4 shows that the MCMC algorithm implemented in bamdit outperforms a classical bivariate random effects approach based on REML estimation, which can be unreliable when the meta-analysis contains a small number of studies with large heterogeneity (Riley et al. 2007). Moreover, the flexible random effects distribution used in bamdit helps to better understand the studies' results by pointing out unusual ones.
The conflict-of-evidence assessment is the deconstructionist side of meta-analysis, where each piece of evidence is set aside from the full model and compared to the rest of the evidence. One possibility for this type of analysis is to embed the meta-analysis model in a more general model in which the non-conflict situation is a particular case. Both examples presented above demonstrated that we can apply a double scale mixture of bivariate normal distributions and perform conflict diagnostics by direct interpretation of the scale weights.
One important set of features currently not implemented in bamdit is meta-regression and the indirect comparison of several diagnostic procedures. These topics are linked to the problem of ecological bias and are subjects of current research. However, we plan to update bamdit to include this functionality soon.
At present, no statistical software offers a range of possibilities for meta-analysis of diagnostic tests comparable to that of R. Unfortunately, R is not the most popular software for meta-analysis of diagnostic test data. Recently, we reviewed 68 meta-analyses of diagnostic test data published in medical journals between October 2015 and February 2016. We found that 29 papers chose Stata for fitting the bivariate meta-analysis of Reitsma et al. (2005), 9 used Meta-Disc alone, and 19 combined Meta-Disc with Stata. Among the remaining 11 papers, 3 used SAS and 5 used R. We also found that practitioners publish statistical results in medical journals with convergence errors, variances equal to zero, integrated AUCs outside the empirically plausible range, and so on. There is an imperative need to improve statistical practice in this area, and we hope that bamdit can help with this.