Conducting Meta-Analyses in R with the metafor Package

The metafor package provides functions for conducting meta-analyses in R . The package includes functions for ﬁtting the meta-analytic ﬁxed-and random-eﬀects models and allows for the inclusion of moderators variables (study-level covariates) in these models. Meta-regression analyses with continuous and categorical moderators can be conducted in this way. Functions for the Mantel-Haenszel and Peto’s one-step method for meta-analyses of 2 × 2 table data are also available. Finally, the package provides various plot functions (for example, for forest, funnel, and radial plots) and functions for assessing the model ﬁt, for obtaining case diagnostics, and for tests of publication bias.


Introduction
Science is a cumulative process (Shoemaker et al. 2003). Therefore, it is not surprising that one can often find dozens and sometimes hundreds of studies addressing the same basic question. Researches trying to aggregate and synthesize the literature on a particular topic are increasingly conducting meta-analyses (Olkin 1995). Broadly speaking, a meta-analysis can be defined as a systematic literature review supported by statistical methods where the goal is to aggregate and contrast the findings from several related studies (Glass 1976).
In a meta-analysis, the relevant results from each study are quantified in such a way that the resulting values can be further aggregated and compared. For example, we may be able to express the results from a randomized clinical trial examining the effectiveness of a medication in terms of an odds ratio, indicating how much higher/lower the odds of a particular outcome (e.g., remission) were in the treatment compared to the control group. The set of odds ratios from several studies examining the same medication then forms the data which is used for further analyses. For example, we can estimate the average effectiveness of the medication models. Functions for the Mantel-Haenszel and Peto's one-step method are also available. Finally, the package provides various plot functions (for example, for forest, funnel, and radial plots) and functions for assessing the model fit, for obtaining case diagnostics, and for tests of publication bias.
The purpose of the present article is to provide a general overview of the metafor package and its current capabilities. Not all of the possibilities and options are described, as this would require a much longer treatment. The article is therefore a starting point for those interested in exploring the possibility of conducting meta-analyses in R with the metafor package. Plans for extending the package are described at the end of the article.

Meta-analysis models
In this section, the meta-analytic fixed-and random/mixed-effects models are briefly described (e.g., Hedges and Olkin 1985;Berkey et al. 1995;van Houwelingen et al. 2002;Raudenbush 2009). These models form the basis for most meta-analyses and are also the models underlying the metafor package. We start with i = 1, . . . , k independent effect size estimates, each estimating a corresponding (true) effect size. We assume that where y i denotes the observed effect in the i-th study, θ i the corresponding (unknown) true effect, e i is the sampling error, and e i ∼ N (0, v i ). Therefore, the y i 's are assumed to be unbiased and normally distributed estimates of their corresponding true effects. The sampling variances (i.e., v i values) are assumed to be known. Depending on the outcome measure used, a bias correction, normalizing, and/or variance stabilizing transformation may be necessary to ensure that these assumptions are (approximately) true (e.g., the log transformation for odds ratios, Fisher's r-to-z transformation for correlations; see Section 3.1 for more details).

Random-effects model
Most meta-analyses are based on sets of studies that are not exactly identical in their methods and/or the characteristics of the included samples. Differences in the methods and sample characteristics may introduce variability ("heterogeneity") among the true effects. One way to model the heterogeneity is to treat it as purely random. This leads to the random-effects model, given by where u i ∼ N (0, τ 2 ). Therefore, the true effects are assumed to be normally distributed with mean µ and variance τ 2 . The goal is then to estimate µ, the average true effect and τ 2 , the (total) amount of heterogeneity among the true effects. If τ 2 = 0, then this implies homogeneity among the true effects (i.e., θ 1 = . . . = θ k ≡ θ), so that µ = θ then denotes the true effect.

Mixed-effects model
Alternatively, we can include one or more moderators (study-level variables) in the model that may account for at least part of the heterogeneity in the true effects. This leads to the mixed-effects model, given by where x ij denotes the value of the j-th moderator variable for the i-th study and we assume again that u i ∼ N (0, τ 2 ). Here, τ 2 denotes the amount of residual heterogeneity among the true effects, that is, variability among the true effects that is not accounted for by the moderators included in the model. The goal of the analysis is then to examine to what extent the moderators included in the model influence the size of the average true effect.

Fixed-effects vs. random/mixed-effects models
The random/mixed-effects models need to be carefully distinguished from fixed-effects models. When using fixed-effects models, the goal is to make a conditional inference only about the k studies included in the meta-analysis (Hedges and Vevea 1998). For example, a fixed-effects model without moderators provides an answer to the question: How large is the average true effect in the set of k studies included in the meta-analysis?
To be precise, the question addressed by a fixed-effects model depends on the type of estimation method used. If weighted least squares is used to fit the model, then the fixed-effects model provides an estimate ofθ the weighted average of the true effects, where the weights are typically set equal to w i = 1/v i . On the other hand, unweighted least squares provides an estimate of the simple (unweighted) average of the true effects (Laird and Mosteller 1990).
In contrast to the fixed-effects model, random/mixed-effects models provide an unconditional inference about a larger set of studies from which the k studies included in the meta-analysis are assumed to be a random sample (Hedges and Vevea 1998). We typically do not assume that this larger set consists only of studies that have actually been conducted, but instead envision a hypothetical population of studies that comprises studies that have been conducted, that could have been conducted, or that may be conducted in the future. The random-effects model then addresses the question: How large is the average true effect in this larger population of studies (i.e., how large is µ)?
Therefore, contrary to what is often stated in the literature, it is important to realize that the fixed-effects model does not assume that the true effects are homogeneous. In other words, fixed-effects models provide perfectly valid inferences under heterogeneity, as long as one is restricting these inferences (i.e., the conclusions about the size of the average effect) to the set of studies included in the meta-analysis. 1 On the other hand, the random-effects model provides an inference about the average effect in the entire population of studies from which the included studies are assumed to be a random selection.
In the special case that the true effects are actually homogeneous, the distinction between the various models disappears, since homogeneity implies that µ =θ w =θ u ≡ θ. However, since there is no infallible method to test whether the true effects are really homogeneous or not, a researcher should decide on the type of inference desired before examining the data and choose the model accordingly. For more details on the distinction between fixed-and random-effects models, see Hedges and Vevea (1998) and Laird and Mosteller (1990).

Model fitting
In essence, the various meta-analytic models are just special cases of the general linear (mixedeffects) model with heteroscedastic sampling variances that are assumed to be known. The random/mixed-effects models can therefore be fitted using a two step approach (Raudenbush 2009). First, the amount of (residual) heterogeneity (i.e., τ 2 ) is estimated with one of the various estimators that have been suggested in the literature, including the Hunter-Schmidt estimator (Hunter and Schmidt 2004), the Hedges estimator (Hedges and Olkin 1985;Raudenbush 2009), the DerSimonian-Laird estimator (DerSimonian and Laird 1986;Raudenbush 2009), the Sidik-Jonkman estimator (Sidik and Jonkman 2005a,b), the maximum-likelihood or restricted maximum-likelihood estimator (Viechtbauer 2005;Raudenbush 2009), or the empirical Bayes estimator (Morris 1983;Berkey et al. 1995). Next, µ or β 0 , . . . , β p are estimated via weighted least squares with weights equal to w i = 1/(v i +τ 2 ), whereτ 2 denotes the estimate of τ 2 .
Once the parameter estimates have been obtained, Wald-type tests and confidence intervals (CIs) are then easily obtained for µ or β 0 , . . . , β p under the assumption of normality. For models involving moderators, subsets of the parameters can also be tested in the same manner.
Based on the fitted model, we can also obtain fitted/predicted values, residuals, and the best linear unbiased predictions (BLUPs) of the study-specific true effects. The null hypothesis H 0 : τ 2 = 0 in random-and mixed-effects models can be tested with Cochran's Q-test (Hedges and Olkin 1985). A confidence interval for τ 2 can be obtained with the method described in Viechtbauer (2007a).
Fixed-effects models can be fitted with either weighted or unweighted least squares, again taking into consideration the heteroscedastic sampling variances. 2 As mentioned above, this provides either an estimate of the weighted or unweighted average of the true effects when not including moderators in the model. With moderators in the model, weighted estimation provides an estimate of the weighted least squares relationship between the moderator variables and the true effects, while unweighted estimation provides an estimate of the unweighted least squares relationship.

The metafor package
The metafor package provides functions for fitting the various models described above. The package is available via the Comprehensive R Archive Network (CRAN) at http://CRAN. R-project.org/package=metafor, the author's website at http://www.wvbauer.com/, or can be directly installed within R by typing install.packages("metafor") (assuming an internet connection and appropriate access rights on the computer). The current version number is 1.4-0. Once the package has been installed, it should be possible to replicate the analyses described in this paper.

Calculating outcome measures
Before beginning with a meta-analysis, one must first obtain a set of effect size estimates with their corresponding sampling variances. If these have been calculated already, for example by hand or with the help of other software, then these can be read-in from a text file with the read.table() function (see help("read.table") for details).
The metafor package also provides the escalc() function, which can be used to calculate various effect size or outcome measures (and the corresponding sampling variances) that are commonly used in meta-analyses. There are two different interfaces for using this function, a default and a formula interface. For the default interface, the arguments of the function are escalc (measure, ai, bi, ci, di, n1i, n2i, m1i, m2i, sd1i, sd2i, xi, mi, ri, ni, data = NULL, add = 1/2, to = "only0", vtype = "LS", append = FALSE) where measure is a character string specifying which outcome measure should be calculated (see below for the various options), arguments ai through ni are used to supply the needed information to calculate the various measures (depending on the outcome measure specified under measure, different arguments need to be supplied), data can be used to specify a data frame containing the variables given to the previous arguments, add and to are arguments needed when dealing with 2 × 2 table data that may contain cells with zeros, and vtype is an argument specifying the sampling variance estimate that should be calculated (see below). When setting append = TRUE, the data frame specified via the data argument is returned together with the effect size estimates and corresponding sampling variances.

Outcome measures for 2 × 2 table data
Meta-analyses in the health/medical sciences are often based on studies providing data in terms of 2 × 2 tables. In particular, assume that we have k tables of the form: where ai, bi, ci, and di denote the cell frequencies and n1i and n2i the row totals in the i-th study. For example, in a set of randomized clinical trials, group 1 and group 2 may refer to the treatment and placebo/control group, with outcome 1 denoting some event of interest (e.g., remission) and outcome 2 its complement. In a set of case-control studies, group 1 and group 2 may refer to the group of cases and the group of controls, with outcome 1 denoting, for example, exposure to some risk factor and outcome 2 non-exposure. The 2 × 2 tables may also be the result of cross-sectional (i.e., multinomial) sampling, so that none of the table margins (except the total sample size n1i + n2i) are fixed by the study design.
Depending on the type of design (sampling method), a meta-analysis of 2 × 2 table data can be based on one of several different outcome measures, including the odds ratio, the relative risk (also called risk ratio), the risk difference, and the arcsine transformed risk difference (e.g., Fleiss and Berlin 2009). For these measures, one needs to supply either ai, bi, ci, and di or alternatively ai, ci, n1i, and n2i. The options for the measure argument are then: "RR": The log relative risk is equal to the log of (ai/n1i)/(ci/n2i).
"OR": The log odds ratio is equal to the log of (ai*di)/(bi*ci).
Note that the log is taken of the relative risk and the odds ratio, which makes these outcome measures symmetric around 0 and helps to make the distribution of these outcome measure closer to normal.
Cell entries with a zero can be problematic especially for the relative risk and the odds ratio. Adding a small constant to the cells of the 2 × 2 tables is a common solution to this problem. When to = "all", the value of add is added to each cell of the 2 × 2 tables in all k tables. When to = "only0", the value of add is added to each cell of the 2 × 2 tables only in those tables with at least one cell equal to 0. When to = "if0all", the value of add is added to each cell of the 2 × 2 tables in all k tables, but only when there is at least one 2 × 2 table with a zero entry. Setting to = "none" or add = 0 has the same effect: No adjustment to the observed table frequencies is made. Depending on the outcome measure and the presence of zero cells, this may lead to division by zero inside of the function (when this occurs, the resulting Inf values are recoded to NA).

Raw and standardized mean differences
The raw mean difference and the standardized mean difference are useful effect size measures when meta-analyzing a set of studies comparing two experimental groups (e.g., treatment and control groups) or two naturally occurring groups (e.g., men and women) with respect to some quantitative (and ideally normally distributed) dependent variable (e.g., Borenstein 2009). For these outcome measures, m1i and m2i are used to specify the means of the two groups, sd1i and sd2i the standard deviations of the scores in the two groups, and n1i and n2i the sample sizes of the two groups.
"MD": The raw mean difference is equal to m1i -m2i.
"SMD": The standardized mean difference is equal to (m1i -m2i)/spi, where spi is the pooled standard deviation of the two groups (which is calculated inside of the function). The standardized mean difference is automatically corrected for its slight positive bias within the function (Hedges and Olkin 1985). When vtype = "LS", the sampling variances are calculated based on a large sample approximation. Alternatively, the unbiased estimates of the sampling variances can be obtained with vtype = "UB".

Raw and transformed correlation coefficients
Another frequently used outcome measure for meta-analyses is the correlation coefficient, which is used to measure the strength of the (linear) relationship between two quantitative variables (e.g., Borenstein 2009). Here, one needs to specify ri, the vector with the raw correlation coefficients, and ni, the corresponding sample sizes.
"COR": The raw correlation coefficient is simply equal to ri as supplied to the function. When vtype = "LS", the sampling variances are calculated based on the large sample approximation. Alternatively, an approximation to the unbiased estimates of the sampling variances can be obtained with vtype = "UB" (Hedges 1989).
"UCOR": The unbiased estimate of the correlation coefficient is obtained by correcting the raw correlation coefficient for its slight negative bias (based on equation 2.7 in Olkin and Pratt 1958). Again, vtype = "LS" and vtype = "UB" can be used to choose between the large sample approximation or approximately unbiased estimates of the sampling variances.
"ZCOR": Fisher's r-to-z transformation is a variance stabilizing transformation for correlation coefficients with the added benefit of also being a rather effective normalizing transformation (Fisher 1921). The Fisher's r-to-z transformed correlation coefficient is equal to 1/2 * log((1 + ri)/(1 -ri)).

Proportions and transformations thereof
When the studies provide data for single groups with respect to a dichotomous dependent variable, then the raw proportion, the logit transformed proportion, the arcsine transformed proportion, and the Freeman-Tukey double arcsine transformed proportion are useful outcome measures. Here, one needs to specify xi and ni, denoting the number of individuals experiencing the event of interest and the total number of individuals, respectively. Instead of specifying ni, one can use mi to specify the number of individuals that do not experience the event of interest.
"PR": The raw proportion is equal to xi/ni. "PLO": The logit transformed proportion is equal to the log of xi/(ni -xi).
"PAS": The arcsine transformation is a variance stabilizing transformation for proportions and is equal to asin(sqrt(xi/ni)).
Again, zero cell entries can be problematic. When to = "all", the value of add is added to xi and mi in all k studies. When to = "only0", the value of add is added only for studies where xi or mi is equal to 0. When to = "if0all", the value of add is added in all k studies, but only when there is at least one study with a zero value for xi or mi. Setting to = "none" or add = 0 again means that no adjustment to the observed values is made.

Formula interface
The escalc() function also provides an alternative formula interface to specify the data structure. The arguments for this interface are escalc(measure, formula, weights, data, add = 1/2, to = "only0", vtype = "LS") As above, the argument measure is a character string specifying which outcome measure should be calculated. The formula argument is then used to specify the data structure as a multipart formula (based on the Formula package; see Zeileis and Croissant 2010) together with the weights argument for the group sizes or cell frequencies. The data argument can be used to specify a data frame containing the variables in the formula and the weights variable.
The add, to, and vtype arguments work as described above.
For 2×2 table data, the formula argument takes the form outcome~group | study, where group is a two-level factor specifying the rows of the tables, outcome is a two-level factor specifying the columns of the tables, and study is a k-level factor specifying the studies. The weights argument is then used to specify the frequencies in the various cells. An example to illustrate the default and the formula interface is given in the following section.

Example
The metafor package provides the data set object dat.bcg with the results from 13 studies on the effectiveness of the BCG vaccine against tuberculosis (Colditz et al. 1994).
R> library("metafor") R> data("dat.bcg", package = "metafor") R> print(dat.bcg, row.names = FALSE) trial author year tpos tneg cpos cneg ablat alloc 1 Aronson 1948  Besides the trial number, author(s), and publication year, the data set includes information about the number of treated (vaccinated) subjects that were tuberculosis positive and negative (tpos and tneg, respectively) and similarly for the control (non-vaccinated) subjects (cpos and cneg, respectively). In addition, the absolute latitude of the study location (in degrees) and the treatment allocation method (random, alternate, or systematic assignment) are indicated for each trial.
The results of the studies can be expressed in terms of 2 × 2 tables, given by TB+ For interpretation purposes, it is important to note that the log relative risks were calculated in such a way that values below 0 indicate a lower infection risk for the vaccinated group. Except for two cases, this is also the direction of the findings in these 13 studies.

Specifying the data
The function can be used in conjunction with any of the usual effect size or outcome measures used in meta-analyses (e.g., log odds ratios, standardized mean differences, correlation coefficients). One simply needs to supply the observed outcomes via the yi argument and the corresponding sampling variances via the vi argument (or the standard errors, the square root of the sampling variances, via the sei argument). When specifying the data in this way, one must set measure = "GEN" (which is the default).
Alternatively, the function takes as input the same arguments as the escalc() function and then automatically calculates the values for the chosen effect size or outcome measure (and the corresponding sampling variances) when supplied with the needed data. The measure argument is then used to specify the desired outcome measure (arguments add, to, and vtype have the same meaning as for the escalc() function).

Specifying the model
Assuming the observed outcomes and corresponding sampling variances are supplied via yi and vi, the random-effects model is fitted with rma(yi, vi, data = dat). Restricted maximum-likelihood estimation is used by default when estimating τ 2 (the REML estimator is approximately unbiased and quite efficient; see Viechtbauer 2005). The various (residual) heterogeneity estimators that can be specified via the method argument are the "HS": Hunter-Schmidt estimator.
One or more moderators can be included in the model via the mods argument. A single moderator can be given as a (row or column) vector of length k specifying the values of the moderator. Multiple moderators are specified by giving an appropriate design matrix with k rows and p columns (e.g., using mods = cbind(mod1, mod2, mod3), where mod1, mod2, and mod3 correspond to the names of the variables for the three moderator variables). The intercept is included in the model by default unless the user sets intercept = FALSE.
Many R user will be familiar with the formula syntax used to specify the desired model in functions such as lm() and glm() (see help("formula") for details). One can also specify the desired meta-analytic model in this way by setting the mods argument equal to a one-sided formula of the form~model (e.g., mods =~mod1 + mod2 + mod3). Interactions, polynomial terms, and factors can be easily added to the model in this manner. When specifying a model formula via the mods argument, the intercept argument is ignored. Instead, the inclusion/exclusion of the intercept term is controlled by the specified formula (e.g., mods = mod1 + mod2 + mod3 -1 would remove the intercept term). A fixed-effects model can be fitted with rma(yi, vi, data = dat, method = "FE"). Here, one must consider carefully whether weighted or unweighed least squares should be used (the default is weighted = TRUE). Again, moderators can be included in the model via the mods argument.

Omnibus test of parameters
For models including moderators, an omnibus test of all the model coefficients is conducted that excludes the intercept β 0 (the first coefficient) if it is included in the model (i.e., a test of H 0 : β 1 = . . . = β p = 0). If no intercept is included in the model, then the omnibus test includes all of the coefficients in the model including the first. Alternatively, one can manually specify the indices of the coefficients to test via the btt argument. For example, btt = c(3, 4) would be used to include only the third and fourth coefficient from the model in the test (if an intercept is included in the model, then it corresponds to the first coefficient in the model).

Categorical moderator variables
Categorical moderator variables can be included in the model in the same way that appropriately (dummy) coded categorical independent variables can be included in linear models in general. One can either do the dummy coding manually or use a model formula together with the factor() function to let R handle the coding automatically. An example to illustrate these different approaches is provided below.

Knapp and Hartung adjustment
By default, the test statistics of the individual coefficients in the model (and the corresponding confidence intervals) are based on the normal distribution, while the omnibus test is based on a χ 2 distribution with m degrees of freedom (m being the number of coefficients tested). The Knapp and Hartung (2003) method (knha = TRUE) is an adjustment to the standard errors of the estimated coefficients, which helps to account for the uncertainty in the estimate of τ 2 and leads to different reference distributions. Individual coefficients and confidence intervals are then based on the t-distribution with k − p degrees of freedom, while the omnibus test statistic then uses an F-distribution with m and k − p degrees of freedom (p being the total number of model coefficients including the intercept if it is present). The Knapp and Hartung adjustment is only meant to be used in the context of random-or mixed-effects model.

Random-effects model
We will now start by fitting a random-effects model to the BCG data. Both of the commands R> res <-rma(yi, vi, data = dat) R> res and R> res <-rma(ai = tpos, bi = tneg, ci = cpos, di = cneg, data = dat, + measure = "RR") R> res yield the same output, namely Random-Effects Model (k = 13; tau^2 estimator: REML) tau^2 (estimate of total amount of heterogeneity): 0.3132 (SE = 0.1664) tau (sqrt of the estimate of total heterogeneity): 0.5597 I^2 (% of total variability due to heterogeneity): 92.22% H^2 (total variability / within-study variance): 12.86 Test for Heterogeneity: Q(df = 12) = 152.2330, p-val < .0001 Model Results: indicating that the estimated average log relative risk is equal toμ = −0.7145 (95% CI: −1.0669 to −0.3622). For easier interpretation, it may be useful to transform these values back to the relative risk scale through exponentiation (i.e., exp(μ) = 0.49 with 95% CI: 0.34 to 0.70). The results therefore suggest that the risk of a tuberculosis infection in vaccinated individuals is on average half as large as the infection risk without the vaccination. The null hypothesis H 0 : µ = 0 can be clearly rejected (z = −3.97, p < 0.0001).
The amount of heterogeneity in the true log relative risks is estimated to beτ 2 = 0.3132. Various measures for facilitating the interpretation of the estimated amount of heterogeneity were suggested by Higgins and Thompson (2002). The I 2 statistic estimates (in percent) how much of the total variability in the effect size estimates (which is composed of heterogeneity and sampling variability) can be attributed to heterogeneity among the true effects (τ 2 = 0 therefore implies I 2 = 0%). The H 2 statistic is the ratio of the total amount of variability in the observed outcomes to the amount of sampling variability (τ 2 = 0 therefore implies H 2 = 1). It is important to realize, however, thatτ 2 , I 2 , and H 2 are often estimated imprecisely, especially when the number of studies is small. With

R> confint(res)
we  which are quite wide and therefore indicate that we should not give too much credence to the exact point estimates. However, even the lower bound values of the confidence intervals are quite large and the test for heterogeneity (Q = 152.23, df = 12, p < 0.0001) suggests considerable heterogeneity among the true effects.
A graphical overview of the results so far can be obtained by creating a forest plot (Lewis and Clarke 2001) with the forest() function. While forest(res) would be sufficient, a more appealing figure can be produced with some extra code (see Figure 1). By default, the observed effects are drawn proportional to the precision of the estimates. The summary estimate based on the random-effects model is automatically added to the figure (with the outer edges of the polygon indicating the confidence interval limits). The results are shown using a log scale for easier interpretation. The figure was created with the following code.

Mixed-effects model
At least part of the heterogeneity may be due to the influence of moderators. For example, the effectiveness of the BCG vaccine may depend on the study location, as the increased abundance of non-pathogenic environmental mycobacteria closer to the equator may provide a natural protection from tuberculosis (Ginsberg 1998). Moreover, the effectiveness of the vaccine may have changed over time. We can examine these hypotheses by fitting a mixedeffects model including the absolute latitude and publication year of the studies as moderators.
R> res <-rma(yi, vi, mods = cbind(ablat, year), data = dat) R> res and R> res <-rma(yi, vi, mods =~ablat + year, data = dat) R> res produce the same results, namely The estimated amount of residual heterogeneity is equal toτ 2 = 0.1108, suggesting that (0.3132 − 0.1108)/.3132 = 65% of the total amount of heterogeneity can be accounted for by including the two moderators in the model. However, while we can reject H 0 : β 1 = β 2 = 0 based on the omnibus test (Q M = 12.20, df = 2, p < 0.01), only absolute latitude appears to have a significant influence on the effectiveness of the vaccine (i.e., for H 0 : β 1 = 0, we find z = −2.74 with p < 0.01, while for H 0 : β 2 = 0, we find z = 0.13 with p = 0.90). The test for residual heterogeneity is significant (Q E = 28.33, df = 10, p < 0.01), possibly indicating that other moderators not considered in the model are influencing the vaccine effectiveness.
The results indicate that a one degree increase in absolute latitude corresponds to a change of −0.03 (95% CI: −0.01 to −0.05) units in terms of the average log relative risk. To facilitate the interpretation of the latitude moderator, we can obtain predicted average relative risks for various absolute latitude values, holding the year constant, for example, at 1970 (one could also consider dropping the year moderator from the model altogether). For this, we use the predict() function, using the newmods argument to specify the values of the moderators and the transf argument to specify a function for transforming the results (here, we use the exp() function for exponentiation of the predicted log relative risks). By setting addx = TRUE, the results are printed together with the moderator values used to obtain the predicted values.
R> predict(res, newmods = cbind(seq(from = 10, to = 60, by = 10), 1970), + transf = exp, addx = TRUE) The results show the predicted average relative risks (pred) and the bounds of the corresponding 95% confidence intervals (ci.lb and ci.ub). The standard errors of the predicted values (se) are only provided when not using the transf argument. 3 The average relative risk is not significantly different from 1 at 10 • absolute latitude (95% CI: 0.58 to 1.50), indicating an equal infection risk on average for vaccinated and non-vaccinated individuals close to the equator. However, we see increasingly larger effects as we move further away from the equator. At 40 • , the average infection risk is more than halved (95% CI: 0.30 to 0.55) for vaccinated individuals. At 60 • , the risk of an infection in vaccinated individuals is on average only about a quarter as large (95% CI: 0.12 to 0.44).
More generally, Figure 2, shows a plot of the relative risk as a function of absolute latitude. The observed relative risks are drawn proportional to the inverse of the corresponding standard errors. The predicted effects with corresponding confidence interval bounds are also shown. For reasons to be discussed later, four studies (i.e., studies 4, 7, 12, and 13) are labeled with their study numbers in the figure. The figure (except for the labeling of the four studies) was created with the following code.

Categorical moderator variables
Moderators are often categorical, either inherently or because the information provided in articles does not allow for more fine-grained coding. Therefore, subgrouping the studies based on the levels of a categorical moderator is a frequent practice in meta-analyses. One possibility is to fit a random-effects model separately within each level. For example, we can fit separate models within each level of the treatment allocation moderator with R> rma(yi, vi, data = dat, subset = (alloc=="random")) R> rma(yi, vi, data = dat, subset = (alloc=="alternate")) R> rma(yi, vi, data = dat, subset = (alloc=="systematic")) which illustrates the use of the subset argument (which can either be a logical vector as used here or a numeric vector indicating the indices of the observations to include). However, unless differences in the amount of heterogeneity are of interest or suspected to be present within the different levels, this is not an ideal approach (as τ 2 needs to be estimated separately within each level based on an even smaller number of studies). Instead, we can fit a single mixed-effects model to these data with a dummy coded factor. First, we create the necessary dummy variables with R> dat$a.random <-ifelse(dat$alloc == "random", 1, 0) R> dat$a.alternate <-ifelse(dat$alloc == "alternate", 1, 0) R> dat$a.systematic <-ifelse(dat$alloc == "systematic", 1, 0) and then estimate separate effects for each factor level with R> rma(yi, vi, mods = cbind(a.random, a.alternate, a.systematic), + intercept = FALSE, data = dat) Instead of doing the coding manually, we can use the factor() function to handle the coding for us.

Knapp and Hartung adjustment
The next example illustrates the use of the Knapp and Hartung adjustment in the context of a model that includes both the allocation factor and the continuous absolute latitude moderator.
R> rma(yi, vi, mods =~factor(alloc) + ablat, data = dat, knha = TRUE) The omnibus test (H 0 : β 1 = β 2 = β 3 = 0) is now based on an F-distribution with m = 3 and k − p = 9 degrees of freedom, while a t-distribution with k − p = 9 degrees of freedom is now used as the reference distribution for tests and confidence intervals for the individual coefficients. Adding btt = c(2, 3) to the call would provide a test of H 0 : β 1 = β 2 = 0, that is, an omnibus test only of the allocation factor (while controlling for the influence of absolute latitude).
Usually, the Knapp and Hartung adjustment will lead to more conservative p values, although this is not guaranteed in any particular case. In general though, the Type I error rate of the tests and the coverage probability of the confidence intervals will be closer to nominal when the adjustment is used. We can easily check this for the given data by repeatedly simulating a random moderator (which is unrelated to the outcomes by definition), recording the corresponding p value, and then calculating the empirical Type I error rate. Note that the DerSimonian-Laird estimator is used, because it is non-iterative and therefore faster and guaranteed to provide an estimate for τ 2 (the REML estimator requires iterative estimation and can occasionally fail to converge; see Section 3.6 for some more technical details). The resulting empirical Type I error rates are approximately equal to 0.09 and 0.06, respectively. While the difference is not striking in this particular example, it illustrates how the Knapp and Hartung adjustment results in test statistics with closer to nominal properties. Table 1 provides an overview of the various functions and methods that can be used after fitting a model with the rma() function. Some of these additional functions will now be discussed.

Fitted/predicted values
The fitted() function can be used to obtain the fitted values for the k studies. The predict() function provides the fitted values in addition to standard errors and confidence interval bounds. As illustrated earlier, one can also use the newmods argument together with the predict() function to obtain predicted values for selected moderator values based on the fitted model. Note that for models without moderators, the fitted values are the same for all k studies (e.g.,μ in the random-effects model). The predict() function then only provides the fitted value once instead of repeating it k times.
For example, we can obtain an estimate of the average relative risk by first fitting a randomeffects model to the log relative risks and then transforming the estimated average log relative risk (i.e.,μ) back through exponentiation.

Raw and standardized residuals
Many meta-analyses will include at least a few studies yielding observed effects that appear to be outlying or extreme in the sense of being well separated from the rest of the data. Visual inspection of the data may be one way of identifying unusual cases, but this approach Function Description print() standard print method summary() alternative print method that also provides fit statistics coef() extracts the estimated model coefficients, corresponding standard errors, test statistics, p values, and confidence interval bounds vcov() extracts the variance-covariance matrix of the model coefficients fitstats() extracts the (restricted) log likelihood, deviance, AIC, and BIC fitted() fitted values predict() fitted/predicted values (with confidence intervals), also for new data blup() best linear unbiased predictions (BLUPs) of the true outcomes residuals() raw residuals rstandard() internally standardized residuals rstudent() externally standardized (studentized deleted) residuals hatvalues() extracts the diagonal elements of the hat matrix weights() extracts the weights used for model fitting influence() various case and deletion diagnostics leave1out() leave-one-out sensitivity analyses for fixed/random-effects models forest() forest plot funnel() funnel plot radial() radial (Galbraith) plot qqnorm() normal quantile-quantile plot plot() general plot function for model objects addpoly() function to add polygons to a forest plot ranktest() rank correlation test for funnel plot asymmetry regtest() regression tests for funnel plot asymmetry trimfill() trim and fill method confint() confidence interval for the amount of (residual) heterogeneity in random-and mixed-effects models (confidence intervals for the model coefficients can also be obtained) cumul() cumulative meta-analysis for fixed/random-effects models anova() model comparisons in terms of fit statistics and likelihoods permutest() permutation tests for model coefficients may be problematic especially when dealing with models involving one or more moderators. Moreover, the studies included in a meta-analysis are typically of varying sizes (and hence, the sampling variances of the y i values can differ considerably), further complicating the issue. A more formal approach is based on an examination of the residuals in relation to their corresponding standard errors.
Various types of residuals have been defined in the context of linear regression (e.g., Cook and Weisberg 1982), which can be easily adapted to the meta-analytic models. Most importantly, rstandard() and rstudent() provide internally and externally standardized residuals, respectively (residuals() provides the raw residuals). If a particular study fits the model, its standardized residual follows (asymptotically) a standard normal distribution. A large stan-dardized residual for a study therefore may suggest that the study does not fit the assumed model (i.e., it may be an outlier).
For example, Figure 2 indicates that studies 7, 12, and 13 have observed outcomes that deviate noticeably from the model. However, the size of a residual must be judged relative to the precision of the predicted average effect for the corresponding study, while taking the amount of residual heterogeneity and the amount of sampling variability into consideration. Clearly, this is difficult to do by eye. On the other hand, the externally standardized residuals for this model can be easily obtained with R> res <-rma(yi, vi, mods = cbind(ablat, year), data = dat) R> rstudent(res) suggesting that only studies 7 and 13 have relatively 'large' residuals (even though the log relative risk for study 12 deviates considerably from the corresponding predicted average effect under the fitted model, the estimate for study 12 is also the least precise one of all 13 studies).

Influential case diagnostics
An outlying case may not be of much consequence if it exerts little influence on the results. However, if the exclusion of a study from the analysis leads to considerable changes in the fitted model, then the study may be considered to be influential. Case deletion diagnostics known from linear regression (e.g., Belsley et al. 1980;Cook and Weisberg 1982) can be adapted to the context of meta-analysis to identify such studies. The influence() function provides the following diagnostic measures for the various meta-analytic models: externally standardized residuals, DFFITS values, Cook's distances, covariance ratios, DFBETAS values, the estimates of τ 2 when each study is removed in turn, the test statistics for (residual) heterogeneity when each study is removed in turn, metafor: Conducting Meta-Analyses in R the diagonal elements of the hat matrix, and the the weights (in %) given to the observed outcomes during the model fitting.
For example, for the mixed-effects model with absolute latitude and publication year as moderators, we can obtain these diagnostic measures with R> res <-rma(yi, vi, mods = cbind(ablat, year), data = dat) R> inf <-influence(res) R> inf Instead of printing these results, we can use

R> plot(inf, plotdfb = TRUE)
to obtain two plots, the first with the various diagnostic measures except the DFBETAS values and the second with the DFBETAS values. Figure 3 shows the first of these two plots, which suggests that studies 7 and 13 introduce some additional residual heterogeneity into the model (i.e., removing these studies in turn would yield considerably smaller estimates of τ 2 ), but only have a modest influence on the fit of the model (the plot of the Cook's distances shows this most clearly). On the other hand, removing study 4 would yield little change in the amount of residual heterogeneity, but its influence on the model fit is more considerable. Due to its large Cook's distance and hat value, it is also colored in red in the figure (a plot of the absolute latitudes against the publication years for the 13 studies (i.e., plot(dat$ablat, dat$year)) reveals the reason for the large influence of the 4th study).
For models without moderators, one can also use the leave1out() function to repeatedly fit the model, leaving out one study at a time. For example, R> res <-rma(yi, vi, data = dat) R> leave1out(res, transf = exp, digits = 3)  shows the results from the random-effects model, leaving out one study at a time (the transf argument can again be used to specify a function which transforms the model estimate and the confidence interval bounds; the standard errors are then NA).
Plot functions (forest, funnel, radial, and Q-Q normal plots) The metafor package provides several functions for creating plots that are frequently used in meta-analyses. Several examples are given in this section to illustrate how such plots can be created.
The use of the forest() function for creating forest plots from fitted model objects was already illustrated earlier ( Figure 1). An additional example of a forest plot is shown in Figure 4 which shows how to create forest plots from individual effect size estimates and the corresponding sampling variances and illustrates the use of the addpoly() function for adding (additional) polygons to such plots. This is particularly useful to indicate the estimated effects for representative sets of moderator values or for subgroups of studies. The figure was created with the following code.
R> res <-rma(yi, vi, data = dat) R> funnel(res, main = "Random-Effects Model") R> res <-rma(yi, vi, mods = cbind(ablat), data = dat) R> funnel(res, main = "Mixed-Effects Model")  Radial (or Galbraith) plots were suggested by Rex Galbraith (1988aGalbraith ( ,b, 1994 as a way to assess the consistency of observed outcomes that have differing precisions (e.g., due to heteroscedastic sampling variances). For a fixed-effects model, the radial() function creates a plot showing the inverse of the standard errors on the horizontal axis (i.e., 1/ √ v i ) against the individual observed outcomes standardized by their corresponding standard errors on the vertical axis (i.e., y i / √ v i ). On the right hand side of the plot, an arc is drawn. A line projected from (0, 0) through a particular point within the plot onto this arc indicates the value of the observed outcome for that point. For a random-effects model, the function uses 1/ √ v i +τ 2 for the horizontal and y i / √ v i +τ 2 for the vertical axis. Figure 6 shows two examples of radial plots, one for a fixed-and the other for a random-effects model. The figures were created with the following code.
R> res <-rma(yi, vi, data = dat, method = "FE") R> radial(res, main = "Fixed-Effects Model") R> res <-rma(yi, vi, data = dat, method = "REML") R> radial(res, main = "Random-Effects Model") The qqnorm() function creates Q-Q normal plots which can be a useful diagnostic tool in meta-analyses (Wang and Bushman 1998). The plot shows the theoretical quantiles of a normal distribution on the horizontal axis against the observed quantiles of the (externally) standardized residuals on the vertical axis. For reference, a line is added to the plot with a slope of 1, going through the (0, 0) point. By default, a pseudo confidence envelope is also added to the plot. The envelope is created based on the quantiles of sets of pseudo residuals simulated from the given model (for details, see Cook and Weisberg 1982). between 0 and 10, with higher numbers indicating increasing smoothness (bass = 0 by default). Figure 7 shows two examples of Q-Q normal plots, the first for a random-effects model and the second for a mixed-effects model with absolute latitude as a moderator. The figures were created with the following code.

Tests for funnel plot asymmetry
As mentioned earlier, funnel plots can be a useful graphical device for diagnosing certain forms of publication bias. In particular, if studies with small and/or non-significant findings remain unpublished (and therefore are less likely to be included in a meta-analysis), then this may result in an asymmetric funnel plot (Light and Pillemer 1984;Sterne and Egger 2001;Rothstein et al. 2005). One may be able to detect such asymmetry by testing whether the observed outcomes (or residuals from a model with moderators) are related to their corresponding sampling variances, standard errors, or more simply, sample sizes.
Various tests for funnel plot asymmetry of this form have been suggested in the literature, including the rank correlation test by Begg and Mazumdar (1994) and the regression test by Egger et al. (1997). Extensions, modifications, and further developments of the regression test are described (among others) by Macaskill et al. (2001), Sterne and Egger (2005), Harbord et al. (2006), Peters et al. (2006), Rücker et al. (2008), and Moreno et al. (2009). The various versions of the regression test differ in terms of the model (either a weighted regression with a multiplicative dispersion term or one of the meta-analytic models is used), in terms of the independent variable that the observed outcomes are hypothesized to be related to when publication bias is present (suggested predictors include the standard error, the sampling variance, the sample size, and the inverse of the sample size), and in terms of the suggested outcome measure to use for the test (e.g., for 2 × 2 table data, one has the choice between various outcome measures, as described earlier).
The ranktest() and regtest() functions can be used to carry out the rank correlation and the regression tests. For the regression test, the arguments of the function are regtest(x, model = "rma", predictor = "sei", ni = NULL, ...) where x is a fitted model object. One can choose the model used for the test via the model argument, with model = "lm" for weighted regression with a multiplicative dispersion term or model = "rma" for the standard meta-analytic model (the default). In the latter case, arguments such as method, weighted, and knha used during the initial model fitting are also used for the regression test. Therefore, if one wants to conduct the regression test with a mixed-effects model, one should first fit a model with, for example, method = "REML" and then use the regtest() function on the fitted model object.
The predictor is chosen via the predictor argument, with predictor = "sei" for the standard error (the default), predictor = "vi" for the sampling variance, predictor = "ni" for the sample size, and predictor = "ninv" for the inverse of the sample size. The fitted model object will automatically contain information about the sample sizes when measure was not equal to "GEN" during the initial model fitting. The sample sizes can also be supplied via the ni argument when measure = "GEN" during the initial model fitting.
For example, to carry out the regression test with a weighted regression model using the standard error as the predictor, we would use the following code.
R> res <-rma(yi, vi, data = dat) R> regtest(res, model = "lm") Regression Test for Funnel Plot Asymmetry model: weighted regression with multiplicative dispersion predictor: standard error t = -1.4013, df = 11, p = 0.1887 We may also want to control for the influence of potential moderators and use a meta-analytic mixed-effects model together with the sample size of the studies as a potential predictor. The regression test could then be carried out as follows.

Trim and fill method
The trim and fill method is a nonparametric (rank-based) data augmentation technique proposed by Duval and Tweedie (2000a;2000b; see also Duval 2005). The method can be used to estimate the number of studies missing from a meta-analysis due to the suppression of the most extreme results on one side of the funnel plot. The method then augments the observed data so that the funnel plot is more symmetric. The trim and fill method can only be used in the context of the fixed-or random-effects model (i.e., in models without moderators). The method should not be regarded as a way of yielding a more "valid" estimate of the overall effect or outcome, but as a way of examining the sensitivity of the results to one particular selection mechanism (i.e., one particular form of publication bias).
After fitting either a fixed-or a random-effects model, one can use the trimfill() function to carry out the trim and fill method on the fitted model object. The syntax of the function is given by trimfill(x, estimator = "L0", side = NULL, maxit = 50, verbose = FALSE, ...) where x again denotes the fitted model object, estimator is used to choose between the "L0" or "R0" estimator for the number of missing studies (see references), side is an argument to indicate on which side of the funnel plot the missing studies should be imputed (if side = NULL, the side is chosen within the function depending on the results of the regression test), maxit denotes the maximum number of iterations to use for the trim and fill method, and verbose can be set to TRUE to obtain information about the evolution of the algorithm underlying the trim and fill method.
To illustrate the use of the function, we can fit a fixed-effects model to the BCG vaccine data and then use the trim and fill method to obtain the estimated number of missing studies. The model object is automatically augmented with the missing data and can then be printed.
R> res <-rma(yi, vi, data = dat, method = "FE") R> rtf <-trimfill(res) R> rtf Even though the estimated effect of the vaccine is smaller with the missing studies filled in, the results still indicate that the effect is statistically significant. A funnel plot with the filled-in studies can now be obtained with R> funnel(rtf) Figure 8 shows the resulting funnel plot with the filled-in data.
As a final note to the issue of funnel plot asymmetry and publication bias, it is worth mentioning the copas package ), which can be used together with the meta package (Schwarzer 2010) and provides additional methods for modeling and adjusting for bias in a meta-analysis via selection models (Copas 1999;Copas andShi 2000, 2001).

Cumulative meta-analysis
In a cumulative meta-analysis, an estimate of the average effect is obtained sequentially as studies are added to the analysis in (typically) chronological order (Chalmers and Lau 1993;Lau et al. 1995). Such analyses are usually conducted retrospectively (i.e., after all of the studies have already been conducted), but may be planned prospectively (Whitehead 1997 Figure 9: Forest plot, showing the results from a cumulative meta-analysis of 13 studies examining the effectiveness of the BCG vaccine for preventing tuberculosis. Cumulative meta-analyses can also be carried out with the metafor package using the cumul() function. The function takes as its first argument a fitted model object (either a fixed-or a random-effects model) and then refits the same model k times adding one study at a time.
The order argument of the function can be set equal to a vector with indices giving the desired order for the cumulative meta-analysis. The results can either be printed or passed on to the forest() function, which then creates a cumulative forest plot. For example, R> res <-rma(yi, vi, data = dat, slab = paste(author, year, sep=", ")) R> rcu <-cumul(res, order = order(dat$year)) R> forest(rcu, xlim = c(-6, 3), atransf = exp, + at = log(c(0.10, 0.25, 0.5, 1, 2))) R> text (-6, 15, "Author(s) and Year", pos = 4, font = 2) R> text(3, 15, "Relative Risk [95% CI]", pos = 2, font = 2) results in the forest plot shown in Figure 9. Note that the study labels must already be specified via the rma() function (via argument slab), so that they can be properly ordered by the cumul() function. Although the effectiveness of the vaccine appears to be decreasing over time, this finding is related to the fact that the more recent studies were conducted closer to the equator.

Model fit statistics
Model fit statistics can be obtained via the fitstats() function. In particular, the (restricted) log likelihood, deviance (−2 times the log likelihood), AIC, and BIC are provided.
The unrestricted log likelihood is computed (and used for calculating the deviance, AIC, and BIC), unless REML estimation is used to estimate τ 2 (in which case the restricted log likelihood is used). Note that τ 2 is counted as an additional parameter in the calculation of the AIC and BIC in random/mixed-effects models.

Likelihood ratio tests
As an alternative to Wald tests, one can conduct full versus reduced model comparisons via likelihood ratio tests with the anova() function. The function provides information about the fit statistics of two models and the corresponding results from a likelihood ratio test. Obviously, the two models must be based on the same set of data and should be nested for the likelihood ratio test to make sense. Also, likelihood ratio tests are not meaningful when using REML estimation and the two models differ with respect to their fixed effects. Therefore, to test moderator variables via likelihood ratio tests, one must switch to maximum likelihood (ML) estimation.
To illustrate the use of the anova() function, suppose that we would like to conduct a likelihood ratio test of the absolute latitude moderator.
R> res1 <-rma(yi, vi, mods = cbind(ablat), data = dat, method = "ML") R> res2 <-rma(yi, vi, data = dat, method = "ML") R> anova(res1, res2) The null hypothesis H 0 : β 1 = 0 is rejected (LRT = 9.96, df = 1, p < 0.002), again suggesting that absolute latitude does have an influence on the average effectiveness of the vaccine. The function also provides the estimates of τ 2 from both models and indicates how much of the (residual) heterogeneity in the reduced model is accounted for in the full model (i.e., 100% × (τ 2 R −τ 2 F )/τ 2 R , whereτ 2 F andτ 2 R are the estimated values of τ 2 in the full and reduced model, respectively). In this example, approximately 88% of the total heterogeneity in the true effects is accounted for by the absolute latitude moderator.
In principle, one can also consider likelihood ratio tests for (residual) heterogeneity (i.e., for testing H 0 : τ 2 = 0) in random-and mixed-effects models. The full model should then be fitted with either method = "ML" or method = "REML" and the reduced model with method = "FE" (while keeping the fixed effects the same in both models). The p value from the test would then be based on a χ 2 distribution with 1 degree of freedom, but actually needs to be adjusted for the fact that the parameter (i.e., τ 2 ) falls on the boundary of the parameter space under the null hypothesis. However, the Q-test usually keeps better control of the Type I error rate anyway and therefore should be preferred (see Viechtbauer 2007b for more details). Follmann and Proschan (1999) and Higgins and Thompson (2004) have suggested permutation tests of the model coefficients in the context of meta-analysis as an alternative approach to the standard (Wald and likelihood ratio) tests which assume normality of the observed effects (as well as the true effects in random/mixed-effects models) and rely on the asymptotic behavior of the test statistics.

Permutation tests
For models without moderators, the permutation test is carried out by permuting the signs of the observed effect sizes or outcomes. The (two-sided) p value of the permutation test is then equal to twice the proportion of times that the test statistic under the permuted data is as extreme or more extreme than under the actually observed data.
For models with moderators, the permutation test is carried out by permuting the rows of the design matrix. The (two-sided) p value for a particular model coefficient is then equal to twice the proportion of times that the test statistic for the coefficient under the permuted data is as extreme or more extreme than under the actually observed data. Similarly, for the omnibus test, the p value is the proportion of times that the test statistic for the omnibus test is more extreme than the actually observed one.
Permutation tests can be carried out with the permutest() function. The function takes as its first argument a fitted model object. If exact = TRUE, the function will try to carry out an exact permutation test. An exact permutation test requires fitting the model to each possible permutation once. However, the number of possible permutations increases rapidly with the number of outcomes/studies (i.e., k), especially with respect to the possible number of permutations of the design matrix. For example, for k = 5, there are only 120 possible permutations of the design matrix (32 possible permutations of the signs). For k = 8, there are already 40,320 (256). And for k = 10, there are 3,628,800 (1024). Therefore, going through all possible permutations may become infeasible.
Instead of using an exact permutation test, one can set exact = FALSE (which is also the default). In that case, the function approximates the exact permutation-based p value(s) by going through a smaller number (as specified by the iter argument) of random permutations (iter = 1000 by default). Therefore, running the function twice on the same data will then yield (slightly) different p values. Setting iter sufficiently large ensures that the results become stable. For example, R> res <-rma(yi, vi, data = dat) R> permutest(res, exact = TRUE) Running 8192 iterations for exact permutation test. shows the results for an approximate permutation test with 10000 iterations (an exact test is not feasible here, as it would require more than 6 × 10 9 model fits). Although the conclusions are unchanged for the random-and the mixed-effects models, we see that the permutationbased p values are larger (i.e., more conservative) when compared to the corresponding results presented earlier.
By setting retpermdist = TRUE, the permutation distributions of the test statistics for the individual coefficients and the omnibus test are returned together with the object. These elements are named zval.perm and QM.perm and can be used to examine the permutation distributions. For example, Figure 10 shows a histogram of the permutation distribution of the test statistic for absolute latitude, together with the standard normal density (in red) and a kernel density estimate of the permutation distribution (in blue). The figure shows that the tail area under the permutation distribution is larger than under the standard normal density (hence, the larger p value in this case). Figure 10 was created with the following code (leaving out the code for the annotations): R> hist(permres$zval.perm[,2], breaks = 140, freq = FALSE, xlim = c(-5, 5), + ylim = c(0, 0.4), main = "", xlab = "Value of Test Statistic") R> abline(v = res$zval[2], lwd = 2, lty = "dashed") R> abline(v = 0, lwd = 2) R> curve(dnorm, from = -5, to = 5, add = TRUE, lwd = 2, + col = rgb(1, 0, 0, alpha = 0.7)) R> lines(density(permres$zval.perm[,2]), lwd = 2, + col = rgb(0, 0, 1, alpha = 0.7)) Like the Knapp and Hartung adjustment, permutation tests lead to test statistics with better control of the Type I error rate. To examine this, a simulation was conducted like in Section 3.4, randomly generating values for an unrelated moderator and then testing its significance via a permutation test. This was repeated 10000 times with 5000 iterations for the

Best linear unbiased predictions
For random/mixed-effects models, the blup() function calculates the best linear unbiased predictions (BLUPs) of the true outcomes by combining the fitted values (which are based only on the fixed effects of the model) and the estimated contributions of the random effects (e.g. Morris 1983;Raudenbush and Bryk 1985;Robinson 1991). Corresponding standard errors and prediction interval bounds are also provided. 5 There are various ways of interpreting these values. Essentially, the BLUPs are the predicted θ i values in the random-and mixedeffects models (Equations 2 and 3) for the k studies included in the meta-analysis. From a Bayesian perspective, the BLUPs can also be regarded as the means (and due to normality also the modes) of the posterior distributions of the θ i values under vague priors on the model coefficients.
What is most notable about the BLUPs is their "shrinkage" behavior, which is most easily illustrated in the context of a random-effects model. Suppose that τ 2 = 0 and we want to obtain estimates of the study-specific θ i values. Then the best estimates would all be equal toμ ≡θ, since homogeneity implies that there is only one true effect. On the other hand, when τ 2 is very large, thenμ contains very little information about the location of the studyspecific true effects. Instead, we should then place more emphasis on the observed estimates, especially for studies with small sampling variances (where the observed estimates will tend to be close to the corresponding θ i values). In general, the BLUPs will fall somewhere in between y i andμ, depending on the size of τ 2 and the amount of sampling variability. In that sense, the observed data are "shrunken" towardsμ. A plot of the observed log relative risks and the corresponding BLUPs is shown in Figure 11, clearly demonstrating the shrinkage behavior of the BLUPs, especially for the smaller studies (i.e., studies with larger sampling variances).

Technical details
This section concludes with some technical details about the algorithms and methods used in the metafor package, in particular with respect to the rma() function. Some issues related to the assumptions of the model underlying the rma() function are also touched on.
While the HS, HE, DL, and SJ estimators of τ 2 are based on closed-form solutions, the ML, REML, and EB estimators must be obtained numerically. For this, the rma() function makes use of the Fisher scoring algorithm, which is robust to poor starting values and usually converges quickly (Harville 1977;Jennrich and Sampson 1976). By default, the starting value is set equal to the value of the Hedges estimator and the algorithm terminates when the change in the estimated value of τ 2 is smaller than 10 −5 from one iteration to the next. The maximum number of iterations is 100 by default (which should be sufficient in most case). A different starting value, threshold, and maximum number of iterations can be specified via the control argument by setting control = list(tau2.init = value, threshold = value, maxiter = value) when calling the rma() function. The step length of the Fisher scoring algorithm can also be manually adjusted by a desired factor with control = list(stepadj = value) (values below 1 will reduce the step length). Information on the evolution of the algorithm can be obtained with control = list(verbose = TRUE).
All of the heterogeneity estimators except SJ can in principle yield negative estimates for the amount of (residual) heterogeneity. However, negative estimates of τ 2 are outside of the parameter space. For the HS, HE, and DL estimators, negative estimates are therefore truncated to zero. For ML, REML, and EB estimation, the Fisher scoring algorithm makes use of step halving to guarantee a non-negative estimate. For those brave enough to step into risky territory, there is the option to set the lower bound of τ 2 equal to some other value besides zero with control = list(tau2.min = value). 6 The Hunter-Schmidt estimator for the amount of heterogeneity is defined in Hunter and Schmidt (2004) only in the context of the random-effects model when analyzing correlation coefficients. A general version of this estimator for the random-effects model not specific to any particular outcome measure is described in Viechtbauer (2005). The same idea can be easily extended to the mixed-effects model and is implemented in the rma() function.
Outcomes with non-positive sampling variances are problematic. If a sampling variance is equal to zero, then its weight will be 1/0 for fixed-effects models when using weighted estimation. Switching to unweighted estimation is a possible solution then. For random/mixedeffects model, some estimators of τ 2 are undefined when there is at least one sampling variance equal to zero. Other estimators may work, but it may still be necessary to switch to unweighted model fitting, especially when the estimate of τ 2 turns out to be zero.
A "singular matrix" error when using the function indicates that there is a linear relationship between the moderator variables included in the model. For example, two moderators that correlated perfectly would cause this error. Deleting (redundant) moderator variables from the model as needed should solve this problem.
Finally, some words of caution about the assumptions underlying the models are warranted: The sampling variances (i.e., the v i values) are treated as if they were known constants. Since this is usually only asymptotically true, this implies that the distributions of the test statistics and corresponding confidence intervals are only exact and have nominal coverage when the within-study sample sizes are large (i.e., when the error in the sampling variance estimates is small). Certain outcome measures (e.g., the arcsine transformed risk difference and Fisher's r-to-z transformed correlation coefficient) are based on variance stabilizing transformations that also help to make the assumption of known sampling variances more reasonable.
When fitting a mixed/random-effects model, τ 2 is estimated and then treated as a known constant thereafter. This ignores the uncertainty in the estimate of τ 2 . As a consequence, the standard errors of the parameter estimates tend to be too small, yielding test statistics that are too large and confidence intervals that are not wide enough. The Knapp and Hartung adjustment can be used to counter this problem, yielding test statistics and confidence intervals whose properties are closer to nominal.
Most effect size measures are not exactly normally distributed as assumed under the various models. However, the normal approximation usually becomes more accurate for most effect size or outcome measures as the within-study sample sizes increase. Therefore, sufficiently large within-study sample sizes are (usually) also needed to be certain that the tests and confidence intervals have nominal levels/coverage. Again, certain outcome measures (e.g., Fisher's r-to-z transformed correlation coefficient) may be preferable from this perspective as well.
These concerns apply in particular to the standard (i.e., Wald and likelihood ratio) tests (and the corresponding confidence intervals). Permutation tests may provide better control of the Type I error rate when some of these assumptions are violated, but more research is needed to determine the properties of such tests for different outcome measures.

Validation of the package
The functions in the metafor package have been validated to the extent possible (i.e., when corresponding analyses could be carried out) by comparing the results provided by the metafor package with those provided by other software packages for several data sets (including the BCG vaccine data set described in the present article).
In particular, results were compared with those provided by the metan, metareg, metabias, and metatrim commands in Stata (StataCorp. 2007; see Sterne 2009 for more details on these commands). Results were also compared with those provided by SAS (SAS Institute Inc. 2003) using the proc mixed command (van Houwelingen et al. 2002), by SPSS (SPSS Inc. 2006) using the macros described in Lipsey and Wilson (2001), and by the meta (Schwarzer 2010) and rmeta (Lumley 2009) packages in R (R Development Core Team 2010). Results either agreed completely or fell within a margin of error expected when using numerical methods.

Comparison between packages
Several packages for conducting meta-analyses are currently available for R via CRAN. These include the metafor (Viechtbauer 2010), meta (Schwarzer 2010), and rmeta (Lumley 2009) packages, all of which could be considered "general purpose" meta-analysis packages (i.e., they can be used for arbitrary effect size or outcome measures). 7 This section provides a brief metafor meta rmeta  comparison between these packages in terms of their current capabilities.
All three packages allow the user to fit fixed-and random-effects models (without moderators). The user can either specify the observed outcomes and the corresponding sampling variances (or standard errors) directly (via the metagen() function in meta and the meta.summaries() function in rmeta) or can provide the necessary information (e.g., 2 × 2 table data) so that the outcomes (e.g., log relative risks) and sampling variances are automatically computed inside of the functions (metabin(), metacont(), and metaprop() in meta; meta.DSL() and meta.MH() in rmeta). For random-effects models, the meta and rmeta packages allow estimation of τ 2 only via the DerSimonian-Laird estimator, while the metafor package provides several estimator choices (see Section 3.3).
Forest and funnel plots can be created with all three packages. Radial plots are implemented in the metafor and meta packages. The meta package also provides L'Abbé plots (L'Abbé et al. 1987) (which will be added to the metafor package in the future). Q-Q normal plots can be obtained with the metafor package.
The meta package allows the user to specify a categorical moderator (via the byvar argument), which is then used for a moderator analysis based on a fixed-effects model. Mixed-effects models (involving a single or multiple categorical and/or continuous moderators) can only be fitted with the metafor package. Advanced methods for testing model coefficients and obtaining confidence intervals (i.e., the Knapp and Hartung adjustment and permutation tests) are also implemented only in this package.
A more detailed comparison of the capabilities of the packages can be found in Table 2.

Conclusions
The present article is meant to provide a general overview of the capabilities of the metafor package for conducting meta-analyses with R. The discussion was focused primarily on the rma() function, which allows for fitting fixed-and random/mixed-effects models with or without moderators via the usual mechanics of the general linear (mixed-effects) model.
Alternative methods for fitting the fixed-effects model for 2 × 2 table data are the Mantel-Haenszel and Peto's one-step method (Mantel and Haenszel 1959;Yusuf et al. 1985). The Mantel-Haenszel method is implemented in the rma.mh() function. It can be used to obtain an estimate of the overall odds ratio, relative risk, or risk difference. The method is particularly advantageous when aggregating a large number of tables with small sample sizes (the so-called sparse data or increasing strata case). When analyzing odds ratios, the Cochran-Mantel-Haenszel test (Mantel and Haenszel 1959;Cochran 1985) and Tarone's test for heterogeneity (Tarone 1985) are also provided.
Yet another method that can be used in the context of a meta-analysis of 2 × 2 tables is Peto's method (Yusuf et al. 1985), implemented in the rma.peto() function. The method provides a fixed-effects model estimate of the overall odds ratio. In sparse data situations and under certain conditions, the method has been shown to produce the least biased results with the most accurate confidence interval coverages (Bradburn et al. 2007), but can also be quite biased in other situations (Greenland and Salvan 1990).
It is important to note that all of these model fitting functions assume that the observed outcomes (or tables) are independent. At the very least, this implies that a particular participant should only contribute data once when calculating the observed outcomes. More complex models are necessary to deal with correlated outcomes and multivariate analyses.
Functions to handle such situations are currently under development and will be included in the package at a later point.
Although the present article discusses only some of the functions and options of the metafor package, it should provide a starting point for those interested in exploring the capabilities of the package in more detail.