beanz : An R Package for Bayesian Analysis of Heterogeneous Treatment Eﬀects with a Graphical User Interface

In patient-centered outcomes research, it is essential to assess the heterogeneity of treatment eﬀects (HTE) when making health care decisions for an individual patient or a group of patients. Nevertheless, it remains challenging to evaluate HTE based on information collected from clinical studies that are often designed and conducted to evaluate the eﬃcacy of a treatment for the overall population. The Bayesian framework oﬀers a principled and ﬂexible approach to estimate and compare treatment eﬀects across sub-groups of patients deﬁned by their characteristics. In this paper, we describe the package beanz which facilitates the conduct of Bayesian analysis of HTE by allowing users to explore a wide range of Bayesian HTE analysis models and produce posterior inferences about HTE. The package beanz also provides a web-based graphical user interface (GUI) for users to conduct the Bayesian analysis of HTE in an interactive and user-friendly manner. With the GUI feature, package beanz can also be used by analysts not familiar with the R environment. We demonstrate package beanz using data from a randomized controlled trial on angiotensin converting enzyme inhibitor for treating congestive heart failure ( N = 2569).


Introduction
Clinical trials typically evaluate the efficacy of a medical intervention based on its overall treatment effect, the average benefit or harm the enrolled patients will achieve. However, individuals respond to medical interventions in different ways. Even for a treatment that is effective on average, some individuals may derive substantial benefit; some little benefit; others could actually be harmed. Thus, understanding this heterogeneity of treatment effect (HTE) is critical in evaluating how well a treatment can be expected to work for an individual patient. Such understanding is especially important in patient-centered outcomes research (PCOR) that aims to help patients and stakeholders make informed personalized health care decisions (Varadhan, Segal, Boyd, Wu, and Weiss 2013).
Although the importance of understanding HTE is obvious, reliable identification of HTE is challenging. Ideally, we would like to be able to estimate individual treatment effects (ITE). However, ITE estimation is not feasible because it is impossible to observe the outcomes of both the treatment and control for the same individual. This is the fundamental problem of causal inference (Holland 1986). Consequently, subgroup analysis is commonly used to evaluate HTE, but it is challenging due to the need to specify the groups and control for multiplicity in a low-information context.
The Bayesian approach provides a principled way to formulate treatment effect heterogeneity across patient characteristics. The approach includes a prior distribution that allows expression of optimism, skepticism or agnosticism concerning the differences in treatment effect. The Bayesian approach also allows a great deal of flexibility in making inferences on a variety of questions such as "what is the probability that women derive more benefit from the treatment than men?" A frequentist approach, on the other hand, would ask: "what is the probability of observing a difference in treatment effect between men and women, if in truth there were no difference?" This way of framing the question implies a premature dichotomization: the treatment effect is either the same in men and women or it is not, based on arbitrary conventions (e.g., the significance level is 0.05). While the Bayesian approach is principled and flexible, there are several challenges in conducting Bayesian HTE analysis, including the need to specify prior distributions, little guidance, and the lack of easy-to-use software to implement the HTE models.
There are a few software packages for subgroup analysis available for the statistical software environment R (R Core Team 2018) on the Comprehensive R Archive Network (CRAN), most of which implement frequentist methods. Package quint (Dusseldorp, Doove, and van Mechelen 2015) applies a tree-based clustering method for inducing subgroups that are involved in qualitative treatment-subgroup interactions. Package subgroup (Schou and Marschner 2014) considers order statistics-based measures to evaluate the magnitude and the nature of the variation in treatment effects and provides non-inferential visual aids. Package FINDIt (Egami, Ratkovic, and Imai 2018) applies machine learning methods to identify subsets of the population who benefit or are harmed by a treatment and to select the most or least efficacious treatment from many alternative treatments. Package DSBayes (Varadhan and Yao 2014) implements the Bayesian Dixon and Simon model (Dixon and Simon 1991) for subgroup analysis. To the best of our knowledge, there is no statistical software package for comprehensive Bayesian HTE analysis, let alone one with a graphical user interface (GUI).
To this end, we describe the R package beanz (Wang, Varadhan, and Trustees of Columbia University 2018) that facilitates conducting Bayesian HTE analysis. The package gives users the ability to explore a wide range of Bayesian HTE analysis models and obtain posterior inferences related to HTE. Specifically, we develop a web-based GUI for package beanz that allows users to apply functions in package beanz in an interactive and user-friendly manner. With the GUI feature, package beanz can also be used by analysts not familiar with the R environment. Package beanz is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=beanz.
In this paper, we use the data from the SOLVD trial (SOLVD Investigators et al. 1991) for the demonstration of package beanz. SOLVD was a randomized, double-blind, placebo-controlled study evaluating the efficacy of the drug Enalapril compared to placebo on mortality and hospitalization in patients with congestive heart failure (CHF). From 1986 to 1989, a total of 2569 patients with CHF and ejection fraction ≤ 0.35 were enrolled and randomly assigned to the Enalapril arm or the placebo arm in a 1:1 randomization ratio. At the end of the study, 735 patients had died or were hospitalized in the placebo group as compared with 613 in the Enalapril group. The overall treatment effect on mortality was statistically significant with log-rank test p value < 0.0001. The SOLVD protocol pre-specified examination of subgroup effects for clinically important baseline factors. For the current demonstration we focus on baseline covariates sodium level, usage of vasodilators other than angiotensin-convertingenzyme inhibitors, and ejection fraction.
We emphasize that subgroup analysis are mainly exploratory, unless they were pre-specified in the study protocol at the design stage (Assmann, Pocock, Enos, and Kasten 2000;Pocock, Assmann, Enos, and Kasten 2002;Jones, Ohlssen, Neuenschwander, Racine, and Branson 2011). Moreover, although the software makes it easy to conduct sophisticated Bayesian modeling for subgroup analysis, there are intricate issues involved in such modeling. Chiefly, specifying the prior distribution for parameters in the model is challenging, and there is no automatic method that will handle all situations. In Henderson, Louis, Wang, and Varadhan (2016), we discussed the issue of prior selection and sensitivity analysis for Bayesian subgroup analysis in greater detail.
The reminder of the paper is organized as follows. In Section 2, we introduce the Bayesian HTE analysis models implemented in package beanz. In Section 3, we demonstrate the beanz package, used in the R interactive mode, with a comprehensive example using data from SOLVD. In Section 4, we describe the details of the GUI of package beanz, including its architecture and user manual. We revisit the SOLVD example to illustrate the GUI of package beanz in Section 5. Section 6 is devoted to the discussion.

Notation
Consider a randomized two-arm clinical trial. For patient i, let Y i denote the response and Z i denote treatment arm assignment (Z i ∈ {0, 1}). For subgroup analysis, assume there are P baseline covariates of interest, X 1 , X 2 , . . . , X P , that are binary, ordinal with numerical values, or nominal. If a covariate is ordinal with character labels, then we assume the covariate can be re-coded with numerical values. For j = 1, . . . , P , assume X j ∈ {x j,1 , . . . , x j,n j } with n j ≥ 2. If X j is binary or ordinal with numerical values, x j,k (k = 1, . . . , n j ) denotes the value of the k-th level of X j . If X j is a nominal variable, x j,k denotes the vector of dummy variables with dimension n j − 1 that corresponds to the k-th level of X j . For example, if X j is patients' race with three levels: "white", "black" and "other", we may have x j,1 = (0, 0) for "white", x j,2 = (0, 1) for "black" and x j,1 = (1, 0) for "other".
Let G i ∈ Ω denote the subgroup for patient i, and X g = (X g,1 , . . . , X g,P ) be the covariates for subgroup G = g. Let θ * g denote the treatment effect in subgroup G = g. Note that θ * g may be measured on different scales. For example, if the response is continuous, θ * g may be measured as the mean difference by If the response is binary, θ * g may be measured as the odds ratio by If the response is time-to-event, θ * g may be measured as the hazard ratio by where λ z,g is the hazard rate of arm Z = z in subgroup G = g.
We define θ g = h(θ * g ) where h is a link function that allows the support of θ g to be unrestricted. For example, we may let h be the identity link function for the mean difference of continuous responses and let h be the log link function for the odds ratio of binary and hazard ratio of time-to-event responses.
Lastly, we let θ g be the estimated θ g with σ 2 g the estimated variance associated with θ g .

Bayesian models
We consider the subgroup modeling strategy and selection of Bayesian hierarchical models suggested by Jones et al. (2011). First, assume that the estimated treatment effect θ g in subgroup G = g approximately follows a normal distribution, Next, we consider assigning an informative prior to σ g . For example, we can let Alternatively, we may let In both cases, ∆ is a non-negative parameter specified by the user. These priors are "centered" at the point estimate of the variance σ g corresponding to θ g . They account for the uncertainty in σ g by means of ∆. Note that in Jones et al. (2011), σ g was assumed known, which is equivalent to setting ∆ = 0, and assuming θ g |θ g ∼ N (θ g , σ 2 g ). Next, we consider a set of models together with the priors for parameters in θ g . These models are listed in the following.

No subgroup effect model:
where B is large in relation to the magnitude of the treatment effect size so that the prior for µ is essentially non-informative.
Note. This model assumes that patients in all the subgroups are common with similar characteristics. That is, all the subgroups are statistically identical with regard to the treatment effect and there is no subgroup effect. Information about treatment effects can be directly combined from all subgroups for inference.

Full stratification model:
Note. The subgroups are fully distinguished from each other with regard to the treatment effect. There is no information about treatment effects shared between any subgroups.

Simple regression model:
where C is large in relation to the magnitude of γ so that the priors for γ are essentially non-informative.
Note. When X j is nominal, X g,j and γ j are vectors with dimension n j − 1 and 1 is the (n j − 1) × (n j − 1) identity matrix. Otherwise, X g,j and γ j are scalars and 1 is equal to the scalar 1. Unless otherwise specified, the same notation and modeling strategy also apply to the following models: Dixon and Simon model (7), simple regression and shrinkage model (9) and extended Dixon and Simon model (10).
The model introduces a first-order, linear regression structure for θ g . This model takes into account the information that the subgroups are formulated based on the set of baseline covariates X 1 , . . . , X P . The coefficients γ j 's are shared among subgroups, i.e., these coefficients are assumed to arise from the same distribution. Information about treatment effects are shared between subgroups with similar baseline covariates through these coefficients.

Dixon and Simon model (Dixon and Simon 1991):
where Half-N denotes the Half-normal distribution with variance D. By definition, when ω ∼ N (0, D), ω = |ω | follows a Half-N (D) distribution.
Note. This model assumes that the coefficients γ j 's are shared across the subgroups, which allows information about the treatment effect to be shared between the subgroups. This model has exactly the same structure as the simple regression model (6). However, the simple regression model assumes that all the γ j 's are independent, whereas this model allows them to be shared across the subgroups, leading to shrinkage of the subgroup-specific treatment effects towards each other.
Basic shrinkage model: Note. Here, φ g denotes a random subgroup effect for subgroup G = g and, directly estimated subgroup treatment effects are shrunken towards the overall mean µ. This approach assumes all subgroups are similar with regards to the treatment effect.
Simple regression and shrinkage model: Note. This model combines basic regression with shrinkage, with a linear regression structure and a random effect term. Direct estimates are shrunken towards the regression surface.

Extended Dixon and Simon model (Jones et al. 2011):
where ξ (k) denotes the set of k-th order interaction terms among X 1 , . . . , X P and X ξ (k) ,j and 1 is the same as X ξ (k) ,j .
Note. This approach extends the Dixon and Simon model (7) by introducing the higher-order interactions and sharing the interaction effects of the same order. The model may be highly sensitive to the choices of priors for ω k when k is large.

Inference
Treatment effect evaluation and treatment effect comparison among subgroups are based on the posterior distribution p(θ g | θ g , σ 2 g , ∆).

Installation
The beanz package is available from CRAN at http://CRAN.R-project.org/package=beanz. To install and load package beanz, type the following in R: R> install.packages("beanz") R> library("beanz") The major functions provided in package beanz are listed in Table 1.

Data format
There are two types of data structures that package beanz recognizes: • Patient level raw data. Each row represents a patient with covariates that define the subgroup to which the patient belongs, treatment indicator and outcome. The outcome can be binary, continuous, or time to event.
• Summary treatment effect data. Each row represents a subgroup with covariates that define the subgroup, estimated treatment effect in the subgroup and variance for the estimation.
Package beanz provides dataset solvd.sub from the SOLVD trial as an example of patient level raw data. Get the predictive distribution of the subgroup treatment effects. bzRptTbl Overall summary of Bayesian HTE analysis. bzShiny Run package beanz using web-based GUI.  If patient level raw data is provided, package beanz provides function bzGetSubgrpRaw for estimating subgroup effects for each subgroup. The return value from bzGetSubgrpRaw is a data frame with the format of summary treatment effect data.
For continuous and binary response data, bzGetSubgrpRaw calls glm with model where g(·) is the identity or logit link function. For time to event data, package beanz calls the coxph function in package survival (Therneau 2017) with proportional hazard model The model parameter estimation θ g and the associated variance σ 2 g are further used in the Bayesian HTE analysis.

Bayesian analysis
For Bayesian inference, the function bzCallStan calls the sampling function in package rstan (Carpenter, Gelman, Hoffman, Lee, Goodrich, Betancourt, Brubaker, Guo, Li, and Riddell 2017) to draw MCMC samples for Bayesian models introduced in Section 2. The parameter mdls allows the following modeling options: "nse" for no subgroup effect model, "fs" for full stratification model, "sr" for simple regression model, "ds" for Dixon and Simon model, "bs" for basic shrinkage model, "srs" for simple regression with shrinkage model and "eds" for extended Dixon and Simon model. Note that the results obtained are platform-dependent and will thus vary slightly for different platforms.
Priors of the parameters in the Bayesian models are passed to the function bzCallStan by parameter par.pri as a vector. The return value is a list object of class 'beanz.stan' containing the name of the Bayesian HTE model, raw rstan results, the posterior sample matrix, the posterior samples of the subgroup treatment effects (θ g ), the deviance information criterion (DIC, Spiegelhalter, Best, Carlin, and Van Der Linde 2002) and the leave-one-out cross-validation information criteria (LOOIC, Vehtari, Gelman, and Gabry 2017).
Gradient evaluation took 3.1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.31 seconds.
Adjust your expectations accordingly!
The function bzSummary creates a data frame with the summary of the posterior subgroup treatment effects. The summary statistics include the posterior mean, standard deviation (SD), 2.5% (Q025), 25% (Q25), Median, 75% (Q75), 97.5% (Q975) quantiles of the treatment effect in each selected subgroup and the posterior probability that the subgroup treatment effect is less than a given cut off value (ProbLT0), which can be specified by parameter cut.
The following example shows the posterior subgroup treatment effect summary table for the simple regression model.
R> bzPlot(rst.sr) R> bzForest(rst.sr) The beanz package provides functions for treatment effect comparisons between subgroup. The function bzSummaryComp creates a data frame with the summary of the posterior subgroup treatment effect differences between subgroups. The parameter sel.grps allows to select specific subgroups for the comparison.

Model selection
We consider the basic shrinkage model and compare its LOOIC with the simple regression model.

Gail-Simon qualitative interaction test
Qualitative interaction is a type of treatment effect heterogeneity, where one or more of the subgroups have a treatment effect in the opposite direction relative to the overall treatment effect. It is critically important to detect such interactions, if they are present. Package beanz provides the frequentist Gail-Simon qualitative interaction test (Gail and Simon 1985) for examining qualitative interactions by function bzGailSimon. The Gail-Simon test examines the null hypothesis that all the subgroup treatment contrasts lie within the positive orthant O + = {G g : θ g ≥ d} or in the negative orthant, O − = {G g : θ g ≤ −d}, where d ≥ 0 denotes the threshold for clinically meaningful qualitative interaction.
The Gail-Simon test is to reject the null of no qualitative interaction if both of the following conditions are met: The critical value c is calculated such that when Ω ⊂ O − or Ω ⊂ O + , the probability that both Equations 13 and 14 hold is no greater than the significance level α.

Overall architecture
The beanz GUI is web-based and developed in R using the shiny (Chang, Cheng, Allaire, Xie, and McPherson 2017) web application framework. Figure 3 shows the architecture of the beanz GUI.
The GUI can be accessed within R using the function bzShiny:

R> bzShiny()
The function bzShiny calls runApp function in package shiny. To stop the GUI application, one may click the Exit button on the main title banner.

Tabpanel description
The main tabpanels that can be accessed through the menubar in the beanz GUI include Start, Upload Data, Subgroup Specification, Configuration, Bayesian Analysis, Toolbox and Report. The details of each panel are given as follows: Start panel: The Start panel serves as an introduction page for the software. The sections on this panel include: Figure 3: Architecture of the beanz GUI.
• What does beanz do? Presents the background introduction to package beanz and the basic steps of using the software.
• What does beanz need? Explains the different formats of the data files that package beanz accepts.
• What does beanz provide? Introduces the results generated with an package beanz analysis.

• Warnings
Warnings that emphasize that subgroup analysis are mostly exploratory and sensitivity analysis are necessary to evaluate the robustness of the findings with regards to different prior choices.

Upload Data panel:
The Upload Data panel provides an interface for users to upload the data to be analyzed. The sections and items within each section on this panel include: • Upload Data

Choose File
Clicking the Browse... button will load local data files in csv or plain text format.

Data Format
Summary treatment effect data or patient level raw data.

Separator
Field separating character.

Quote
Quoting character.

Other
There are two additional options: Header. Checkbox indicating if the first line of the file contains the names of the columns. Show Data. Checkbox indicating whether to present the uploaded data in the Review Data section on this panel.

• Try An Example
Clicking the Try it button will load the example SOLVD Patient level raw data dataset. Details of the example dataset are also given in this section.

• Review Data
Presents the uploaded dataset in a table view.

Subgroup Specification panel:
The Subgroup Specification panel is designed to identify the columns from the uploaded dataset for formulating subgroups. This panel is only available after a data file has been successfully uploaded. The sections and items within each section on this panel include: • Select Variables

Treatment Effect Estimates
Dropdown list for specifying the column for treatment effect estimation. Only available for summary treatment effect data.

Treatment Effect Variance
Dropdown list for specifying the column for the variance associated with the treatment effect estimation. Only available for summary treatment effect data.

Treatment
Dropdown list for specifying the column for treatment. Only available for patient level raw data.

Response
Dropdown list for specifying the column for response. Only available for patient level raw data.

Censoring
Dropdown list for specifying the column for censoring. Only available for patient level raw data with response that is time to event with treatment effect measured by log hazard ratio.

Get Subgroups
Button for generating subgroups based on the column specifications. A specification validation check will be first conducted. If there are errors in the specifications, error messages will be shown in this section.

• Subgroups
Provides a table view of the subgroups. Each row represents a subgroup and the columns include covariates, treatment effect estimation and the variance associated with the treatment effect estimation.

Configuration panel:
The Configuration panel contains options for Bayesian model priors, MCMC sampling and results presentation. The sections and items within each section on this panel include:

• Statistical Models and Priors
Checkboxes for selecting the Bayesian models to include in the analysis. Note that the no subgroup effect model is always included.
For each model, numeric inputs are provided for specifying the parameters in the priors. See Section 2 for details of the priors.

• Prior Of Variance
Prior of log SD Choose normal distribution or uniform distribution as the prior distribution for log σ. See Section 2 for details.

Uncertainty of log SD
Numeric input for specifying parameter ∆.

Number of iterations
Stan parameter specifying how many iterations including burn-in for posterior sampling.
Number of burn-in Stan parameter specifying how many burn-in for posterior sampling.
Number of thinning Stan parameter specifying the period for saving posterior samples.

Number of Chains
Stan parameter specifying the number of MCMC chains for sampling.

Random seed
Stan parameter for random number generation.

Number of cores
Stan parameter for the number of cores for parallel computation.

Rhat Warning
Threshold for generating warnings of problematic convergence based on the Gelman and Rubin potential scale reduction statistic Rhat (Gelman and Rubin 1992) in the Stan posterior samples.

Initial
Step-size Stan parameters that affect the MCMC convergence.

Target Metropolis Acceptance Rate
Stan parameters that affect the MCMC convergence.

• Display Parameters
Cut off for treatment effects The subgroup treatment effect result tables will present the probability that the subgroup effect is less than this cut off value.

Cut off for comparison
The treatment effect comparison between subgroups result tables will present the probability that the subgroup treatment effect difference is less than this cut off value.

Digits
Integer indicating the number of decimal places in the result tables.

Maximum subgroups for comparison plots
When the number of subgroups is large, the treatment effect comparison density plots will not be presented. This integer specifies the maximum number of subgroups for the plot to display.

Organize results
Different ways to present the analysis results: By results. In the results, each row represents a statistical model. The columns include treatment effect, subgroup comparison and Stan diagnosis.
By model. In the results, each column represents a statistical model. The rows include treatment effect, subgroup comparison and Stan diagnosis.

Transformation
Checkbox indicating if the treatment effect should be measured by log-odds ratio or odds ratio. Only effective for patient level raw data and time to event with treatment effect measured by log hazard ratio.

Reference
Checkbox indicating whether the results from the no subgroup effect model should be included in the result tables and figures for the other models as a reference.

Select subgroups to display
A table for selecting a subset of subgroups to be included in the result presentation. The selected subgroups are highlighted in the table.
Bayesian Analysis panel: The Bayesian Analysis panel allows users to conduct the Bayesian analysis by clicking the Conduct Bayesian Analysis button and presents the analysis results following the display configurations on the Configuration panel. This panel is only available after subgroups have been successfully specified.
The sections and items within each section on this panel include: • Effect Table  Presents the posterior mean, standard deviation, 2.5%, 25%, 50%, 75%, 97.5% quantiles of the treatment effect in each selected subgroup (see Select subgroups to display) and the posterior probability that the subgroup treatment effect is less than a given cut off value (see Cut off for treatment effects).

Density
Presents the posterior density plots of the subgroup treatment effects for the selected subgroups.

Forest plot
Presents the posterior forest plots of the subgroup treatment effects for the selected subgroups.

Predictive plot
Presents the posterior predictive distribution of the median, standard deviation, minimum and maximum of the subgroup treatment effects and compare them to the observed statistics.
• Comparison Table  Presents the posterior mean, standard deviation, 2.5%, 25%, 50%, 75%, 97.5% quantiles of the treatment effect difference in each pairwise comparison of the selected subgroups and the posterior probability that the subgroup treatment effect difference is less than a given cut off value (see Cut off for comparison).

Density
Presents the posterior density plots of the treatment effect difference for each pairwise comparison of the selected subgroups.

Forest plot
Presents the posterior forest plots of the treatment effect difference for each pairwise comparison of the selected subgroups.

• Stan Diagnosis
Raw output Presents the outputs from Stan.

Information criteria
Presents the deviance information criterion (DIC) and the leaveone-out cross-validation information criterion (LOOIC).

Trace plot
Presents trace plots of the posterior samples for evaluating the convergence of the Monte-Carlo sampling chains.

Rhat
Presents the distribution of the Rhat statistics and lists the parameters with possible convergence issues based on the Rhat warning threshold (see Rhat warning).
Toolbox panel: The Toolbox panel provides tools that implement frequentist methods for the evaluation of HTE. The sections on this panel include: • ANOINT By clicking the Conduct Analysis button, package beanz conducts frequentist subgroup analysis using the anoint package and presents the outputs generated in R in this section. The details of package anoint can be found in Varadhan and Kovalchik (2015).
This section is only available for patient level raw data and when subgroups have been successfully formulated.

• Gail-Simon Test for Qualitative Interactions
Provides the p value for the Gail-Simon test for qualitative interactions. See Section 3.5 for details.
Clinically meaningful threshold allows the user to specify the threshold for defining the positive and negative orthants.
Report panel: The Report panel presents a summary of the Bayesian analysis results and provides a link for downloading a final report. This panel is only available after the Bayesian analysis has been successfully conducted. The sections on this panel include: • Summary Presents a summary result table for the selected subgroups. The table contains columns of covariates, treatment effect posterior mean and standard deviation, and the posterior probability that the treatment effect is smaller than a cut off value. The results are generated by the Bayesian model with the smallest LOOIC with the given set of priors.
• Download the analysis report The final report can be downloaded by clicking the Download button. The available document formats for the report include PDF, HTML and Word.

Demonstration of beanz GUI
We demonstrate the beanz GUI using the SOLVD clinical trial data in this section.
Step 1. We first upload the patient-level SOLVD data file to package beanz from the Upload Data panel (Figure 4). One can also load the data from package beanz by clicking the Try it button.   Step 2. Next, on the Subgroup Specification panel we first select the Type of response to be time-to-event. We then specify the columns for Treatment, Response and Censoring and then include the baseline covariates sodium, vasodilator and LVEF indicators. By clicking the Get Subgroups button, package beanz checks the validity of the subgroup specification, estimates the treatment effect that is measured by log hazard ratio for each subgroup and reports them in the Subgroups section on this panel ( Figure 5).
We can see that there are a total of 8 subgroups. The subgroup with ejection fraction > 29%, sodium > 141 and no vasodilator (subgroup 7) shows the least Enalapril drug effect with θ 7 = 0.15.
Step 3. After the subgroups are formulated, on the Configuration panel, for this demonstration, we select the Simple regression and the Basic shrinkage models with their default priors (B = 1000, C = 1000, D = 1) for the Bayesian analysis ( Figure 6). We use the default MCMC parameters: 4000 iterations, 2000 burn-ins and 2 thinning iterations. We choose the normal distribution with ∆ = 0 for log σ g . We require 4 MCMC chains and set the target Metropolis acceptance rate to be 0.95. Lastly, we set the random seed for Stan to be 1000. For reporting results, we set the cut off value for subgroup treatment effects to be 0 to report the posterior probability that the log hazard ratio is smaller than 0. We use the default value 0 for the cut off value for subgroup treatment effect differences. We also set the format of the result tables with default value 3 digits. We choose the By model option that organizes results with each column presenting a model. Lastly, we choose the Display no subgroup effect outcome option to present the results from   Step 4. Next, we move to the Bayesian Analysis panel. A progress bar will appear indicating the progress of the Stan analysis after clicking the Conduct Bayesian Analysis button. When the analysis is completed, the results will be presented according to the configuration options from the Configuration panel.    Next, for patients with sodium ≤ 141 and no vasodilator, we evaluate the treatment effect difference between the subgroups with ejection fraction ≤ 29% (subgroup 1) and > 29% (subgroup 5). Figure 13 shows how to select the subgroups of interest in the Display parameter section on the Configuration panel. Figure 14 presents the comparison forest plot based on the Basic shrinkage model. We observe evidence that suggests there may exist treatment by ejection fraction interaction: the Enalapril drug seems to be more effective for the patients with ejection fraction ≤ 29%, sodium ≤ 141 and no vasodilator, compared to the patients with ejection fraction > 29%, sodium ≤ 141 and no vasodilator, but the difference is not significant.
Step 5. At the end, on the Report panel, a summary result table is provided based on the model with the smallest LOOIC. The analysis report can be downloaded as a PDF, HTML, or Word document ( Figure 15).
The report contains sections Data Summary, Subgroup Definition, Analysis Results for each selected model, and Summary. Figure 16 shows the contents page of an example report.

Discussion
In this era of precision medicine, it is not sufficient to evaluate the efficacy of a medical intervention based only on its average treatment effect. It is critical to be able to better understand the nature and sources of heterogeneity in treatment effects in order to help patients and stakeholders make informed health care decisions.
The Bayesian approach offers a principled and flexible way for evaluating the differences in treatment effects across patient characteristics. It is a formal approach to combine evidence from the study at hand with external information or subjective beliefs to answer decisiondriven questions from the varied perspectives of different stakeholders.
A major impediment to the conduct of Bayesian subgroup analysis is the lack of user-friendly software. We attempt to solve this problem by providing, for the first time, an R package software tool with web-based GUI, which makes it easy to carry out Bayesian subgroup analysis. Our software readily facilitates integral aspects of a Bayesian analysis including prior specification, model specification (choosing from a suite of models), control of MCMC simulation settings, visualization of posterior densities, tabular presentation of numerical summaries, and report generation.
We also provide a toolbox for conducting frequentist analysis via fitting regression models with interactions between treatment and baseline covariates, by linking to the R package anoint, which provides an extensive array of subgroup analysis techniques and interaction models. The toolbox also provides a likelihood ratio test for detecting qualitative interactions (Gail and Simon 1985).
For Bayesian analysis, specifying a prior distribution is challenging. There is no default method that will handle all situations. In the absence of prior (empirical) information, we recommend low-information priors, ones that provide support for a broad range of values. Even here care is needed, especially with respect to the scale of a parameter. For example, a N (0, 10000) can be either low information or high information depending on the scale of the regressor and consequently the slope. Similarly, a Uniform(0, 10) prior for a sampling standard deviation can be either low or high information depending on the measurement scale. Specifically, Neuenschwander, Capkun-Niggli, Branson, and Spiegelhalter (2010) introduced a prior maximum sample size concept that can be used to quantify the information level in such priors. We recommend careful attention to these issues. If empirical information is available, we recommend summarizing it by a probability distribution, for example the normalized likelihood, or if the estimate and standard error (SE) are given, by N (estimate, c 2 ×SE 2 ), with c > 1, a "power prior" penalizing to reflect that the external data are not completely exchangeable with the current data. If the analyst also determined that the external data were biased relative to the current study, a N (estimate + offset, c 2 ×SE 2 ) prior can be used. Potential bias can also be captured by selecting an appropriate value for c. For all important analyses, we recommend sensitivity analyses relative to these very explicit analytic choices.
Given the exploratory nature of the subgroup analysis, it is important to evaluate whether a particular Bayesian model provides a good fit to the data (Gelman 2003;Rubin et al. 1984). The software currently provides LOOIC and DIC criteria that can be the basis for comparing the goodness-of-fit. The software also provides posterior predictive checks for checking if the modeled summaries of subgroup-specific treatment effects such as the median, minimum, maximum, and standard deviation are consistent with the corresponding observed data summaries. Other techniques such as cross-validation may also be considered for this task. We plan to include this in a future version of the software.
A potential limitation of the models represented in our software is that the subgroup treatment effects are assumed to follow a normal distribution. For certain scenarios such as analysis of rare events, the normal approximation may not be appropriate (Jones et al. 2011). Nevertheless, maximum likelihood estimates with large information will be approximately normal at least in some scale for most cases. Moreover, the normal approximation allows for relatively straightforward interpretation of the models considered for the mean of θ g . In practice, one can transform for improved normality although such transformation may move the analysis away from the clinically relevant scale. This is the first release of the software; we plan to include other distributions in subsequent versions of the software.