msBP: An R Package to Perform Bayesian Nonparametric Inference Using Multiscale Bernstein Polynomials Mixtures

msBP is an R package that implements a new method to perform Bayesian multiscale nonparametric inference introduced by Canale and Dunson (2016). The method, based on mixtures of multiscale beta dictionary densities, overcomes the drawbacks of Pólya trees and inherits many of the advantages of Dirichlet process mixture models. The key idea is that an infinitely-deep binary tree is introduced, with a beta dictionary density assigned to each node of the tree. Using a multiscale stick-breaking characterization, stochastically decreasing weights are assigned to each node. The result is an infinite mixture model. The package msBP implements a series of basic functions to deal with this family of priors, such as random density and random number generation and the creation and manipulation of binary tree objects, along with generic functions to plot and print the results. In addition, it implements the Gibbs samplers for posterior computation to perform multiscale density estimation and multiscale testing of group differences described in Canale and Dunson (2016).


Introduction
Multiscale methods have received abundant attention in the statistical literature, owing to several appealing characteristics that have pushed their use in many applications. With the term "multiscale model" we refer to a model in which multiple sub-models at different scales are used simultaneously. A notable example is represented by wavelets, which are routinely used in signal and image processing, nonparametric regression, and density estimation (Donoho, Johnstone, Kerkyacharian, and Picard 1996). From the Bayesian perspective, however, multiscale density estimation is surprisingly understudied. Indeed, most of the approaches rely on single-scale kernel mixtures. Among these, Dirichlet process (DP; Ferguson 1973, 1974) mixtures of Gaussians (Lo 1984; Escobar and West 1995) are the gold standard in many applications. An exception is represented by Pólya trees (Mauldin, Sudderth, and Williams 1992; Lavine 1992a,b), which unfortunately have some unappealing characteristics. For example, they tend to produce extremely spiky densities even when the true density is fairly smooth, and they are sensitive to the prior specification. This sensitivity can be overcome within a mixture approach (Hanson and Johnson 2002), but in this case there is a price to pay in terms of computation. Both the DP and Pólya tree mixture models are implemented in the well-known DPpackage (Jara, Hanson, Quintana, Müller, and Rosner 2011), an R package that represents the de facto standard software for Bayesian nonparametric inference under a variety of settings. Canale and Dunson (2016) recently proposed a Bayesian multiscale method that inherits some advantages of the DP mixture and avoids the disadvantages of Pólya trees. The key idea lies in introducing an infinitely-deep binary tree, with a beta dictionary density assigned to each node of the tree. 
Using a multiscale stick-breaking (Sethuraman 1994) characterization, the authors define a stochastically decreasing sequence of weights assigned to the nodes of the tree. This formulation implies that, within a level of the tree, the densities are equivalent to Bernstein polynomials (Petrone 1999a,b). Extensions to deal with data on unbounded domains are also discussed. A similar idea also appeared in Chen, Hanson, and Zhang (2014). The DP-like characteristics are derived from the formulation of a multiscale generalization of the stick-breaking process, which can be exploited to build an efficient slice sampling algorithm. The same multiscale stick-breaking process has also been used by Wang, Canale, and Dunson (2016) to learn the joint density in massive-dimensional settings, using geometric multiresolution analysis to estimate the dictionary densities over the binary tree at a first stage.
The R package msBP (Canale 2017) implements the multiscale stick-breaking process and its applications to density estimation and to testing of group differences as discussed in Canale and Dunson (2016), along with a series of basic R functions to deal with this family of nonparametric priors, such as random density and number generation, creation and manipulation of binary trees, and generic functions to plot and print the results. The package's core is written in C++ by means of a specific 'bintree' data structure and is called from R via the .C function. The package is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=msBP. The rest of the paper is organized as follows: in the next section we outline the theoretical framework, with particular emphasis on the multiscale stick-breaking process. Section 3 describes the main features of the C++ implementation, while Section 4 demonstrates the main features of the package.

Basic formulation
Let x ∈ X ⊂ R be a random variable, g an unknown density, and x ∼ g. Under a Bayesian perspective, g_0 is assumed to be a prior guess for g, with G_0 and G_0^{-1} the corresponding cumulative distribution function (CDF) and inverse CDF, respectively. A prior for g centered on g_0 can be introduced through a prior for the density f of y = G_0(x) ∈ (0, 1). The CDFs F and G corresponding to the densities f and g, respectively, are related through

G(x) = F{G_0(x)}.   (1)

A similar construction also appeared in . The density f is assumed to have the following structure:

f(y) = Σ_{s=0}^∞ Σ_{h=1}^{2^s} π_{s,h} Be(y; h, 2^s − h + 1),   (2)

where Be(a, b) denotes the beta density with mean a/(a + b). The sequence of random weights {π_{s,h}} is constructed via the multiscale stick-breaking process described below. We will refer to this construction as the multiscale Bernstein polynomial (msBP) model.
To build a multiscale stick-breaking process, an infinite sequence of scales s = 0, 1, . . . , labelling the levels of an infinitely-deep binary tree, is introduced. At each scale s there are 2^s different nodes. A cartoon of the binary tree is reported in Figure 1. To describe a stochastic path from the root node to the leaves, at each scale s and node h within the scale the following independent random variables are introduced:

S_{s,h} ∼ Be(1, a),   R_{s,h} ∼ Be(b, b),   (3)

corresponding to the probability of stopping at node (s, h) and of taking the right path after node (s, h) conditionally on not stopping, respectively. This formulation generalizes the stick-breaking process representation of Sethuraman (1994). Each time the stick is broken, it is then randomly divided in two parts (one for the probability of going right, the remainder for the probability of going left) before the next break. Hence, similarly to Sethuraman (1994), the infinite sequence of weights can be defined as

π_{s,h} = S_{s,h} Π_{r<s} {1 − S_{r,g_{shr}}} T_{shr},   (4)

where g_{shr} = ⌈h/2^{s−r}⌉ is the node travelled through at scale r on the way to node h at scale s, T_{shr} = R_{r,g_{shr}} if (r + 1, g_{sh,r+1}) is the right daughter of node (r, g_{shr}), and T_{shr} = 1 − R_{r,g_{shr}} if (r + 1, g_{sh,r+1}) is the left daughter of (r, g_{shr}). Note that the generic node (s, h) is associated with the Be(h, 2^s − h + 1) density.
The above construction leads to a meaningful sequence of weights, i.e., Σ_{s=0}^∞ Σ_{h=1}^{2^s} π_{s,h} = 1 almost surely for any a, b > 0, as proved in Lemma 1 of Canale and Dunson (2016). An appealing aspect of this formulation is that it produces a multiscale clustering of the subjects. In particular, two subjects with similar observations may have the same cluster allocation up to some scale s, while not being clustered together at finer scales.
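The stick-breaking recursion above can be made concrete in a few lines of code. The following Python sketch (an illustration only; the package itself works in R and C++) draws the stopping and right-turn variables for a tree truncated at scale max_s, forces the stopping probability to one at the last scale, propagates the weights down the tree, and checks that they sum to one, in line with Lemma 1. It assumes S_{s,h} ∼ Be(1, a) and R_{s,h} ∼ Be(b, b), as in the msBP(a, b) prior.

```python
import random

def msbp_weights(a, b, max_s, rng=random):
    """Draw S and R over a tree truncated at scale max_s and
    propagate the stick-breaking weights down the tree."""
    weights = {}
    mass = {(0, 1): 1.0}  # probability mass reaching each node
    for s in range(max_s + 1):
        for h in range(1, 2 ** s + 1):
            S = 1.0 if s == max_s else rng.betavariate(1, a)  # stop at (s, h)
            R = rng.betavariate(b, b)                         # go right if not stopping
            m = mass[(s, h)]
            weights[(s, h)] = S * m
            if s < max_s:
                mass[(s + 1, 2 * h)] = m * (1 - S) * R            # right daughter
                mass[(s + 1, 2 * h - 1)] = m * (1 - S) * (1 - R)  # left daughter
    return weights

random.seed(1)
w = msbp_weights(a=5, b=1, max_s=4)
print(abs(sum(w.values()) - 1) < 1e-9)  # truncated weights sum to one
```

Since the mass entering each node is split exactly between stopping, going left, and going right, the truncation with S = 1 at the last scale guarantees a proper set of weights; this mirrors how msBP.rtree truncates the infinite tree.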

Bayesian multiscale inference on group differences
A promising feature of this multiscale stick-breaking process is its ease of generalization to more complex settings than mere density estimation. For example, the sequence of random variables defined in Equation 3 can be generalized to include predictors or other forms of dependence (e.g., spatial or temporal). Motivated by epigenetic data, Canale and Dunson (2016) also proposed a multiscale testing procedure for group differences. Let y_i be a bounded (between zero and one) outcome for subject i with y_i ∼ f_{d_i} and d_i ∈ {0, 1}. The label d_i denotes a subject's group (e.g., cases/controls, drug/placebo). Using the msBP representation, the hypothesis f_0 = f_1 is true if the groups share the same weights over the binary tree. If f_0 ≠ f_1, we may still have the same weights on the dictionary elements up to a given scale, so that the densities are equivalent up to that scale but not at finer scales. Thus, one can also test H_0^s : f_0^s = f_1^s, i.e., no difference between the two groups at scale s. Clearly H_0^0 is true with probability one, and thus a further modification of (3) consists in setting S_{0,1} = 0. The subjects surviving up to scale s can stop or progress to the next scale. Let N_s denote these actions, with N_s(d) denoting the actions in group d. Conditionally on N_s, the posterior probability of the null being true at scale s can be written as

P(H_0^s | N_s) = P_0^s P(N_s | H_0^s) / [ P_0^s P(N_s | H_0^s) + (1 − P_0^s) P{N_s(0) | H_1^s} P{N_s(1) | H_1^s} ],   (5)

where P_0^s is our prior guess for the null being true at scale s and P(N_s | H_0^s) is the probability of the possible actions if the null is true at scale s. To compute the latter, we can use

P(N_s | H_0^s) = Π_{h=1}^{2^s} { B(1 + n_{s,h}, a + v_{s,h} − n_{s,h}) / B(1, a) } × { B(b + r_{s,h}, b + v_{s,h} − n_{s,h} − r_{s,h}) / B(b, b) },   (6)

P{N_s(d) | H_1^s} = Π_{h=1}^{2^s} { B(1 + n_{s,h}^{(d)}, a + v_{s,h}^{(d)} − n_{s,h}^{(d)}) / B(1, a) } × { B(b + r_{s,h}^{(d)}, b + v_{s,h}^{(d)} − n_{s,h}^{(d)} − r_{s,h}^{(d)}) / B(b, b) },   (7)

where n_{s,h}^{(d)} is the number of subjects in group d stopping at node (s, h), r_{s,h}^{(d)} is the number of subjects in group d that continue to the right after passing through node (s, h), and v_{s,h}^{(d)} is the number of subjects in group d passing through node (s, h), with d = 0, 1, the unsuperscripted counts referring to the pooled sample, and B(p, q) denoting the Beta function. The posterior probability of the global null is the cumulative product of Equation 5 across scales.
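Equation 5 is a standard Bayes update at each scale: the prior null probability is combined with the marginal likelihoods of the observed actions under the null (pooled counts) and under the alternative (group-specific counts). The Python sketch below is an illustration, not package code; it assumes beta-function marginal likelihoods implied by Be(1, a) stopping and Be(b, b) right-turn priors, with per-node counts (n, r, v) for subjects stopping, continuing right, and passing through.

```python
from math import lgamma, exp

def lbeta(p, q):
    """Log of the Beta function B(p, q)."""
    return lgamma(p) + lgamma(q) - lgamma(p + q)

def log_marg(counts, a, b):
    """Log marginal probability of the actions at one scale;
    counts is a list of (n, r, v) triples, one per node."""
    out = 0.0
    for n, r, v in counts:
        out += lbeta(1 + n, a + v - n) - lbeta(1, a)
        out += lbeta(b + r, b + v - n - r) - lbeta(b, b)
    return out

def post_null(counts0, counts1, a, b, p0=0.5):
    """Posterior probability of the scale-s null, as in Equation 5."""
    pooled = [(n0 + n1, r0 + r1, v0 + v1)
              for (n0, r0, v0), (n1, r1, v1) in zip(counts0, counts1)]
    l0 = log_marg(pooled, a, b)
    l1 = log_marg(counts0, a, b) + log_marg(counts1, a, b)
    return p0 * exp(l0) / (p0 * exp(l0) + (1 - p0) * exp(l1))

# toy counts: the two groups stop very differently at the first node
g0 = [(8, 1, 10), (0, 0, 0)]
g1 = [(1, 4, 10), (0, 0, 0)]
p = post_null(g0, g1, a=1.0, b=1.0)
print(0.0 < p < 0.5)  # groups differ, so the null gets low posterior probability
```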
Motivated by a DNA methylation arrays application, Canale and Dunson (2016) generalized the latter formulation to the case in which y_i = (y_{i1}, . . . , y_{ip})′. To deal with p-dimensional arrays, a prior for P_0^s is assumed in order to borrow information across sites and to learn the joint null probability P_0^s. This feature is not yet implemented in the msBP package. A similar multiscale approach to two-sample comparison has been proposed and successfully applied in the multivariate context by Ma and Wong (2011), extending the optional Pólya tree prior of Wong and Ma (2010). The latter approach is able to jointly test for a two-sample difference and learn the underlying structure of the difference. Another proposal connected to Pólya trees, dealing with more than two groups, censored, and multivariate data, is discussed in ; see also Holmes, Caron, Griffin, and Stephens (2015) for a related approach.

The C++ implementation
All the main functions of the msBP package are written in C++ and most of them rely on the 'bintree' data structure, i.e.,

struct bintree {
  double data;
  struct bintree *left;
  struct bintree *right;
};

A 'bintree' consists of a root (or parent) node, which stores a data value and the two links to its daughter nodes. Each daughter in turn links to two further nodes and is itself the root of a new, nested binary tree. The binary tree is a well-known data structure with appealing characteristics in computer science; for example, data can easily be accessed and inserted using search and insert functions called recursively on successive nodes. This data structure is used to store the random variables S_{s,h} and R_{s,h}, the weights of the mixture, and other sample statistics. Basic functions to handle the 'bintree' data structure, such as creating a tree, writing and extracting the data of a given node, and so forth, have also been implemented. Among them, the functions

void tree2array(struct bintree *node, double *array, ...)
void array2tree(double *a, int maxScale, struct bintree *node)

allow for the conversion of a binary tree structure into an array and vice versa, and have been written to allow the input-output communication between R and C++ via the .C function. The first two arguments of the tree2array function are the pointers to the binary tree and to the array in which the values of the tree are written. Note that the array needs to be initialized before calling tree2array and needs to have length at least 2^{s+1} − 1, the number of nodes of a tree of maximum depth s. The tree2array function writes the array as described in Figure 2. The arguments of array2tree, instead, are the pointer to the array, an integer denoting the maximum scale of the binary tree, and the pointer to the binary tree structure to populate. In this latter case the binary tree structure only needs to be initialized and the function takes care of growing the tree up to the desired depth.
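The tree-to-array mapping can be mimicked in a few lines. The Python sketch below is purely illustrative and assumes the level-order layout in which node (s, h) occupies position 2^s + h − 2 of the array (our reading of Figure 2); the actual C++ routines operate on pointers rather than dictionaries.

```python
def tree2array(tree):
    """Flatten {(s, h): value} level by level: node (s, h) -> index 2**s + h - 2."""
    max_s = max(s for s, _ in tree)
    arr = [None] * (2 ** (max_s + 1) - 1)
    for (s, h), val in tree.items():
        arr[2 ** s + h - 2] = val
    return arr

def array2tree(arr):
    """Inverse mapping: index i -> scale s = bit length of i+1 minus 1,
    node h = i + 2 - 2**s."""
    tree = {}
    for i, val in enumerate(arr):
        s = (i + 1).bit_length() - 1
        h = i + 2 - 2 ** s
        tree[(s, h)] = val
    return tree

tree = {(0, 1): 1, (1, 1): 2, (1, 2): 3, (2, 1): 4, (2, 2): 5, (2, 3): 6, (2, 4): 7}
print(tree2array(tree))                       # [1, 2, 3, 4, 5, 6, 7]
print(array2tree(tree2array(tree)) == tree)   # True
```

The round trip is lossless for a fully populated tree, which is what enables R and C++ to exchange trees as flat vectors through the .C interface.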
In addition to the basic functions already described, the msBP package also features more complex functions. However, most of them are then wrapped into R scripts and define the working functions of the package itself. Thus we do not further describe them here.

Usage of the msBP package
The main functions of msBP are msBP.Gibbs, which performs nonparametric density estimation using the Gibbs sampler, and msBP.test, which performs Bayesian multiscale testing of group differences. In this section of the article we provide examples of how to use these functions. In Section 4.1, basic and generic functions to handle the multiscale prior, to sample from an msBP process, and to plot the results are described. Then, in Sections 4.2 and 4.3, the functions msBP.Gibbs and msBP.test are discussed in detail.

Basic and generic functions
The msBP package introduces two new R object classes implemented in S3. The first is the 'binaryTree' class. An object of the 'binaryTree' class represents a finite-depth binary tree.
It consists of a list containing T and max.s, the binary tree itself and an integer denoting its depth, respectively. Specifically, T is a list where each element is a vector containing the values of the nodes at a given scale. A binary tree of depth 3 containing the integers from 1 to 15 can be obtained with

R> tree <- structure(list(T = list(1, c(2, 3), c(4, 5, 6, 7),
+   c(8, 9, 10, 11, 12, 13, 14, 15)), max.s = 3), class = "binaryTree")

The tree structure can be converted into a vector using the tree2vec function, while vec2tree(x) populates a binary tree with the values contained in the vector x. The latter function is ideally constructed for vectors of length 2^n − 1, where n ∈ N. However, if the length l ≠ 2^n − 1 for any n, the function creates a tree up to scale ⌈log2(l/2 + 1)⌉ with the last leaves populated with NA. This object class is widely used by the higher-level functions, since the approach described in Section 2 deals with several binary trees, such as those in Equations 3 and 4. A generic plot function is available for the 'binaryTree' object class. The result of plot(tree, ...) is a cartoon of a binary tree with the root node at the top. As additional arguments, the function features value, size, and white. If value = TRUE, the numerical value of each node appears inside the node (up to the number of digits specified by precision); if size = TRUE, the sizes of the nodes are proportional to their values; if white = TRUE, the background color of the nodes is white, otherwise it is in color scale (default gray.colors). Figure 3 shows the output of some combinations.
The second S3 object class implemented in the msBP package is the 'msBPTree' class. An object of class 'msBPTree' is a list of five elements that represent a random draw from an msBP(a, b) process. The first two elements are the trees of the stopping and descending-to-the-right probabilities described by Equation 3. Both are objects of the class 'binaryTree' with the same max.s. The third and fourth elements are the hyperparameters of the msBP prior, namely a and b. The last element is an integer giving the maximum depth of both trees.
To simulate a random density from an msBP(a, b) prior truncated at scale 3, the msBP.rtree function can be used as

R> set.seed(17012014)
R> draw <- msBP.rtree(a = 5, b = 1, max.s = 3)

Note that the last scale has S_{s,h} = 1. The induced trees of probabilities, calculated by means of (4), can be obtained with the msBP.compute.prob function as

R> weights <- msBP.compute.prob(draw)

and the results can be plotted using plot, as it is an object of class 'binaryTree'. An additional argument root = FALSE sets S_{0,1} = 0. This can be used, for example, in the settings of Section 2.2. The induced random density can be drawn on a finite grid of length n.points over its domain with the function msBP.pdf, i.e.,

R> density <- msBP.pdf(weights, n.points = 100)
R> plot(dens ~ y, data = density, xlab = "y", ylab = "Density", type = "l")

Given a random density from an msBP(a, b) process, it is also possible to generate a sample of size n from that density. To this end we use Algorithm 1, implemented in C++ and wrapped into R via the msBP.rsample function. The function msBP.rsample needs only two arguments: the sample size n and an object of class 'msBPTree'.
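The logic behind sampling from such a random density is simple: pick a node with probability π_{s,h}, then draw from the corresponding beta dictionary density. The Python sketch below is an illustration of this mechanism (Algorithm 1 instead traverses the tree with stopping and right-turn coin flips, which is equivalent once the weights have been computed); the package does this in C++ via msBP.rsample.

```python
import random

def rsample_msbp(n, weights, rng=random):
    """Draw n values from the mixture sum_{s,h} pi_{s,h} Be(h, 2^s - h + 1);
    weights is a dict {(s, h): pi_{s,h}} summing to one."""
    nodes, probs = zip(*weights.items())
    out = []
    for _ in range(n):
        s, h = rng.choices(nodes, weights=probs)[0]      # pick a node of the tree
        out.append(rng.betavariate(h, 2 ** s - h + 1))   # draw from its dictionary density
    return out

random.seed(1)
# a toy tree of weights truncated at scale 1
y = rsample_msbp(5, {(0, 1): 0.5, (1, 1): 0.25, (1, 2): 0.25})
print(all(0 < v < 1 for v in y))  # True: draws live on (0, 1)
```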

Density estimation
Posterior density estimation under the msBP setup relies on a Markov chain Monte Carlo (MCMC) sampling algorithm. We briefly recall the algorithm proposed by Canale and Dunson (2016) in what follows. It basically consists of two steps: (i) multiscale cluster allocation, and (ii) update of the tree parameters conditionally on the allocations.

Algorithm 1: Generating a random sample from a random density having an msBP prior.

The cluster allocation uses a modification of the slice sampler of Kalli, Griffin, and Walker (2011), based on the total and normalized scale weights π̄_s = Σ_{h=1}^{2^s} π_{s,h} and π̃_{s,h} = π_{s,h}/π̄_s, and is reported in Algorithm 2.
Algorithm 2 is implemented in the msBP.postCluster function. It requires two arguments: the sample of observations y and a binary tree of weights weights. The function makes a call to the postCluster C++ subroutine. The output of the function is a matrix with length(y) rows and two columns of cluster labels: the first denoting the scale and the second denoting the node within a given scale. The same C++ subroutine called by msBP.postCluster is called at each iteration of the MCMC sampler as described below.
Conditionally on the cluster allocations, the stopping and descending-right probabilities can be updated from their full conditional posteriors, namely

S_{s,h} | − ∼ Be(1 + n_{s,h}, a + v_{s,h} − n_{s,h}),   R_{s,h} | − ∼ Be(b + r_{s,h}, b + v_{s,h} − n_{s,h} − r_{s,h}),   (8)

where n_{s,h} is the number of subjects stopping at node (s, h), r_{s,h} is the number of subjects that continue to the right after passing through node (s, h), and v_{s,h} is the number of subjects passing through node (s, h). These quantities can be computed with the msBP.nrvTrees function, whose only argument is the output of msBP.postCluster, i.e., a matrix with 2 columns and number of rows equal to the sample size. The output of msBP.nrvTrees is a list containing three objects of the class 'binaryTree'.
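Given the counts, the conjugate update of Equation 8 amounts to a pair of beta draws per node. A minimal Python sketch (an illustration, not package code; the notation matches Equation 8):

```python
import random

def update_node(n_sh, r_sh, v_sh, a, b, rng=random):
    """Draw S_{s,h} and R_{s,h} from the full conditionals of Equation 8:
    S | - ~ Be(1 + n, a + v - n),  R | - ~ Be(b + r, b + v - n - r)."""
    S = rng.betavariate(1 + n_sh, a + v_sh - n_sh)
    R = rng.betavariate(b + r_sh, b + v_sh - n_sh - r_sh)
    return S, R

random.seed(1)
# 10 subjects reach the node, 4 stop there, 2 of the remaining 6 go right
S, R = update_node(n_sh=4, r_sh=2, v_sh=10, a=5, b=1)
print(0 < S < 1 and 0 < R < 1)  # True
```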
The main function implemented in the msBP package is msBP.Gibbs, which performs the actual MCMC simulation from the posterior. The function basically iterates between cluster allocation, using the postCluster C++ subroutine, and parameter updating, first computing the quantities n_{s,h}, r_{s,h}, and v_{s,h} by means of the allTree C++ subroutine and then using Equation 8. The Markov chain sampling is written in C++, but additional R code is used to initialize the function.
To describe the use of msBP.Gibbs, we now walk the reader through the entire process of density estimation under the msBP setup. We start by showing how to elicit prior information, how to run the sampler, and how to analyze the output of the analysis. We do this using the famous Galaxy dataset (Roeder 1990). The dataset contains the velocities of 82 galaxies. The histogram of the speeds reveals that the data are clearly multimodal. This feature supports the Big Bang theory, as the different modes of the density can be thought of as clusters of galaxies moving at different speeds.
R> data("galaxy", package = "DPpackage")
R> speeds <- galaxy$speed / 1000

We start by discussing prior elicitation. In Section 2.1 we assumed that g_0 is our prior guess for g, the density of x, and we want to center our msBP process on this guess. We discussed that, if G_0 and G_0^{-1} are the corresponding CDF and inverse CDF, we can first transform the data with y = G_0(x) ∈ (0, 1) and then model f ∼ msBP(a, b). It can be shown (see Canale and Dunson 2016) that E{F(A)} = λ(A), where λ(A) is the Lebesgue measure of the set A and F(A) = ∫_A f(y) dy. Since the prior for f is centered on the uniform, the prior on g is automatically centered on g_0. From a practical viewpoint this is achieved via the argument g0 of the msBP.Gibbs function. The package features four different prior guesses for g_0, i.e., g0 = c("uniform", "normal", "gamma", "empirical"), for the uniform, normal, and gamma densities and, following an empirical approach, the kernel density estimate. As default choice the function implements the "empirical" specification. For "normal" and "gamma", the parameters can be fixed or an additional prior distribution can be assumed. The former approach is adopted using g0par, a vector of size two corresponding to mean and standard deviation of the normal, or shape and rate parameters of the gamma, respectively. The latter approach is adopted setting hyper$hyperpriors$g0 = TRUE. In this case the model becomes

y = G_0(x; θ),   f ∼ msBP(a, b),   θ ∼ π(θ),

and thus an additional Gibbs sampling step to simulate θ is necessary. The full conditional posterior of θ is simply

p(θ | −) ∝ π(θ) Π_{i=1}^n f{G_0(x_i; θ)} g_0(x_i; θ),

and to sample from this full conditional posterior distribution the package uses a Metropolis-Hastings step (Hastings 1970) with proposal equal to the prior. Currently only g0 = "normal" is allowed, with a normal-inverse-gamma prior. One then has to specify the values of a and b, the hyperparameters of the msBP prior. The hyperparameter a controls the decline in the probabilities across scales. Let S^{(i)} denote the scale of the i-th observation. 
It can be shown that E(S^{(i)}) = a, which means that for small a high probability is placed on coarse scales, leading to smoother densities, while as a increases finer scales are weighted more, leading to spikier realizations. Additional hyperpriors for a and b can be assumed. Clearly, this leads to additional sampling steps in the Gibbs sampling algorithm. In the msBP.Gibbs function this can be achieved by setting hyper$hyperpriors$a = TRUE and hyper$hyperpriors$b = TRUE, respectively. Specifically, the algorithm implements a ∼ Ga(β, γ) and b ∼ Ga(δ, λ). This leads to the following conditional posterior distributions:

a | − ∼ Ga(β + 2^{s'+1} − 1, γ − Σ_{s=0}^{s'} Σ_{h=1}^{2^s} log(1 − S_{s,h}))

and

p(b | −) ∝ b^{δ−1} exp(−λb) Π_{s=0}^{s'} Π_{h=1}^{2^s} {R_{s,h}(1 − R_{s,h})}^{b−1} / B(b, b),

where s' is the maximum occupied scale and B(p, q) is the Beta function. To sample from the conditional posterior distribution of b, a Griddy-Gibbs approach over the grid defined by hyper$hyperpar$gridB is used (see Ritter and Tanner 1992). For the sake of illustration we run and discuss msBP.Gibbs under the following different prior specifications:

R> hyper1 <- list(hyperprior = list(a = TRUE, b = TRUE, g0 = FALSE),
+    hyperpar = list(beta = 50, gamma = 5, delta = 10, lambda = 1,
+    gridB = seq(0, 20, length = 30)))
R> g0_1 <- "empirical"
R> hyper2 <- list(hyperprior = list(a = TRUE, b = TRUE, g0 = FALSE),
+    hyperpar = list(beta = 50, gamma = 5, delta = 10, lambda = 1,
+    gridB = seq(0, 20, length = 30)))
R> g0_2 <- "normal"
R> g0par_2 <- c(21, 2.5)
R> hyper3 <- list(hyperprior = list(a = TRUE, b = TRUE, g0 = TRUE),
+    hyperpar = list(beta = 50, gamma = 5, delta = 10, lambda = 1,
+    gridB = seq(0, 20, length = 30), mu0 = 21, kappa0 = 0.1,
+    alpha0 = 1, beta0 = 20))
R> g0_3 <- "normal"
R> g0par_3 <- c(21, 2.5)

These correspond to: (i) g_0 equal to the empirical kernel density estimate, (ii) g_0 normal with mean 21 and standard deviation 2.5, and (iii) g_0 normal with random mean and variance with prior (µ, σ²) ∼ N(µ; µ_0, κ_0 σ²) × I-Ga(σ²; α_0, β_0). 
In all cases the parameters of the msBP prior are assumed to be random with hyperprior distributions a ∼ Ga(50, 5), and b ∼ Ga(10, 1), with the prior for b evaluated on a grid from 0 to 20 of length 30.
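The Griddy-Gibbs step for b (Ritter and Tanner 1992) reduces to evaluating the unnormalized full conditional at each grid point and sampling a point with the corresponding probabilities. A generic Python sketch follows; the log-density used here is a placeholder, not the actual full conditional of b, which the package evaluates internally.

```python
import math
import random

def griddy_gibbs(log_dens, grid, rng=random):
    """Sample one value from a density known up to a constant on a finite grid."""
    logs = [log_dens(x) for x in grid]
    m = max(logs)                              # stabilize before exponentiating
    w = [math.exp(l - m) for l in logs]        # unnormalized probabilities
    return rng.choices(grid, weights=w)[0]

random.seed(1)
grid = [0.5 + 19.5 * i / 29 for i in range(30)]   # a 30-point grid over (0, 20]
b = griddy_gibbs(lambda x: -0.5 * (x - 10) ** 2,  # placeholder log-density
                 grid)
print(b in grid)  # True: the draw is always one of the grid points
```

The discretization error is controlled by the grid resolution, which is why gridB is exposed to the user.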
The number of iterations of the MCMC chain can be set via the function argument mcmc, a list including nb, the number of burn-in iterations, nrep, the total number of iterations (including nb), and ndisplay, the multiple of iterations at which progress is displayed on screen while the C++ routine is running:

R> mcmc <- list(nrep = 10000, nb = 5000, ndisplay = 1000)

To obtain a posterior estimate of the density, the grid argument needs to be fixed. It consists of a named list giving the parameters for plotting the posterior mean density over a finite grid of points. It must include low and upp, giving the lower and upper bound of the grid, respectively, and n.points, an integer giving the number of evaluation points.
R> grid <- list(n.points = 150, low = 5, upp = 38)

Additional arguments to be set are maxS and printing. The former is an upper bound for the depth of the binary trees involved in the MCMC sampling, and the latter is a control argument: if printing = TRUE, the C++ routine prints on standard output what it is doing every ndisplay iterations. The default choice is printing = FALSE. With the following code we run the MCMC algorithm:

R> set.seed(17012014)
R> fit.msbp.1 <- msBP.Gibbs(speeds, a = 10, b = 10, g0 = g0_1,
+    mcmc = mcmc, hyper = hyper1, maxS = 5, grid = grid)
R> fit.msbp.2 <- msBP.Gibbs(speeds, a = 10, b = 10, g0 = g0_2,
+    g0par = g0par_2, mcmc = mcmc, hyper = hyper2, maxS = 5, grid = grid)
R> fit.msbp.3 <- msBP.Gibbs(speeds, a = 10, b = 10, g0 = g0_3,
+    g0par = g0par_3, mcmc = mcmc, hyper = hyper3, maxS = 5, grid = grid)

The function output is a named list containing four objects:

• density: a named list containing postMeanDens, the posterior mean density estimate evaluated over xDens, and the related lower and upper pointwise 95% credible bands (postLowDens and postUppDens).

• mcmc: a named list containing the MCMC chains: dens is a (nrep − nb) × n.grid matrix containing the values of the density at each MCMC iteration; a and b are vectors containing the MCMC replicates of the two msBP parameters (if hyperprior$a or hyperprior$b are set to TRUE); scale is a matrix where each column is an MCMC sample of the total mass of each scale; R and S are matrices where each column is the tree2vec form of the corresponding tree; weights is a matrix where each column is the tree2vec form of the corresponding tree of weights; s and h are matrices where each column is the MCMC chain of the node labels of a subject; mu and sigma are vectors containing the MCMC replicates of the two parameters of the normal transformation of the data (if hyper$hyperprior$g0 was set to TRUE).
• postmean: a named list containing the posterior means, over the MCMC samples, of a, b, and of all binary trees. If hyper$hyperprior$g0 was set to TRUE, the list also contains the posterior means of the two parameters of the normal transformation of the data.
• fit: a named list containing the LPML, mean, and median of the log CPO (conditional predictive ordinate).
The histogram of the raw data and the plot of the posterior mean density and the related 95% credible bands for the first specification are reported in Figure 4.
To assess the convergence of the MCMC, one can visually inspect the traceplots of the chains of the parameters of interest. In general, fit.msbp.1$mcmc contains the MCMC chains of all the model's parameters. For example, if hyperpriors on the msBP prior parameters have been assumed, one can monitor the convergence of the chains of a and b with

R> plot(fit.msbp.1$mcmc$a, type = "l", main = "Traceplot for a", ylab = "")
R> plot(fit.msbp.1$mcmc$b, type = "l", main = "Traceplot for b", ylab = "")

and test for convergence using, for example, the Geweke (1992) diagnostic. We finally compare the fit obtained with the different prior specifications with the fit obtained running DPdensity and PTdensity from package DPpackage. As prior specification for the latter models we rely on the specifications described in the documentation of the package, reported for the sake of completeness in the online supplements of this article. The comparison is based on the log pseudo-marginal likelihood (LPML) criterion. The LPML is a leave-one-out cross-validatory measure based on the predictive densities; see Green and Richardson (2001) and Gelfand and Dey (1994) for details. The three msBP specifications have an LPML of −215, −298, and −265, respectively. The best performance is obtained using the first prior specification, which centers the prior expected density on the kernel density estimate of the raw data. The latter is practically equivalent to the best fits obtained with DPdensity and PTdensity, where the LPML is equal to −210 and −216, respectively.
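The LPML used in this comparison can be computed from MCMC output: the conditional predictive ordinate (CPO) of observation i is the harmonic mean, across iterations, of the predictive density evaluated at y_i, and the LPML is the sum of the log CPOs. The Python sketch below is a generic illustration assuming one has the per-observation density values f^{(t)}(y_i) for each iteration t; msBP.Gibbs reports the LPML directly in its fit element.

```python
from math import log

def lpml(dens):
    """dens: T x n matrix, dens[t][i] = density of observation i at iteration t.
    CPO_i = harmonic mean over t of dens[t][i]; LPML = sum_i log CPO_i."""
    T, n = len(dens), len(dens[0])
    out = 0.0
    for i in range(n):
        inv_mean = sum(1.0 / dens[t][i] for t in range(T)) / T
        out += log(1.0 / inv_mean)
    return out

dens = [[0.5, 1.2], [0.4, 1.0]]  # toy chain: 2 iterations, 2 observations
print(round(lpml(dens), 4))      # -0.7239
```

Larger (less negative) LPML values indicate better out-of-sample predictive performance, which is how the specifications above are ranked.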

Inference in group differences
The function msBP.test performs multiscale hypothesis testing of a difference in the distributions of two groups. It exploits the closed-form expression for the conditional posterior probability of H_0^s in Equation 5. However, since the latter cannot be used directly due to its dependence on the unknown multiscale clustering structure, the function relies on a Gibbs sampling algorithm. Again, the algorithm is made of two steps: multiscale cluster allocation and update of the trees of weights. For node h at scale s, let π_{s,h}^{(0)} denote the common weights under H_0^s and π_{s,h}^{(1,d)}, for d = 0, 1, the group-specific weights under H_1^s. The allocation of subject i in group d is made, at each iteration, via msBP.postCluster using the set of weights

π_{s,h}^{(d)} = P(H_0^s | −) π_{s,h}^{(0)} + {1 − P(H_0^s | −)} π_{s,h}^{(1,d)}.

Then, at a given iteration, the quantities in Equations 6-7 can be calculated explicitly and used to update the stopping and descending probabilities.
We describe the parameters and the behavior of the function via the Indian Liver Patient dataset, available at the UCI Machine Learning Repository (Bache and Lichman 2013). This dataset contains data on 580 subjects, of whom 413 are liver patients and 167 are not. Subjects with liver problems typically register higher levels of bilirubin in their blood, and thus we want to test whether there is a difference in the distribution of the relative direct bilirubin, calculated as the ratio of the direct bilirubin to the total bilirubin. A histogram of the raw data is reported in Figure 6(a).
The msBP.test function, in addition to a vector of observations and a vector of group labels, requires prior values for a, b, and for the probability of H_0. The choice of the hyperparameters a and b can be made using prior information. In what follows, however, the choice is made with a two-step procedure. First, the density of the pooled dataset is fitted with the msBP.Gibbs function, assuming hyperpriors for a and b:

R> mcmc.test <- list(nrep = 8000, nb = 4000, ndisplay = 1000)
R> hyper.test <- list(hyperprior = list(a = TRUE, b = TRUE),
+    hyperpar = list(beta = 5, gamma = 0.5, delta = 1, lambda = 1))
R> set.seed(17012014)
R> dens.liver <- msBP.Gibbs(liver$dirbil, a = 10, b = 10, g0 = "unif",
+    mcmc = mcmc.test, hyper = hyper.test, maxScale = 5)

Then the posterior means of a and b are used as plug-in estimates for the testing. We fix the prior probability of H_0 to 0.5, in order to give equal weight to the null and the alternative, and we fix the upper bound for the scales to 5. The function can be executed via

R> test.liver <- msBP.test(liver$dirbil, a = dens.liver$postmean$a,
+    b = dens.liver$postmean$b, group = liver$group, priorH0 = 0.5,
+    mcmc = mcmc.test, plot.it = TRUE, maxScale = 5)

The function's output is a list containing all the MCMC replicates of P(H_0^s | −), along with their posterior means and the global Bayes factor BF = P(H_0 | −) / P(H_1 | −). The differences between the two groups are minimal at the first scales but become evident at finer scales.
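Given the posterior means of P(H_0^s | −) at each scale, the global quantities reported by msBP.test can be assembled by the cumulative-product rule described in Section 2.2. A small Python sketch for illustration (the input probabilities below are made up):

```python
def global_bayes_factor(post_null_by_scale):
    """Combine per-scale posterior null probabilities into the global
    posterior null probability and BF = P(H0 | -) / P(H1 | -)."""
    p_global = 1.0
    for p in post_null_by_scale:
        p_global *= p
    return p_global, p_global / (1.0 - p_global)

# H0 sure at scale 0 (by construction), doubt growing at finer scales
p, bf = global_bayes_factor([1.0, 0.9, 0.7, 0.4])
print(round(p, 3), round(bf, 3))  # 0.252 0.337
```

Values of BF below one favor a difference between the groups, consistent with the finer-scale discrepancies found in the bilirubin example.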

Conclusions
We have presented a detailed introduction to the R package msBP, which implements a recently introduced multiscale stick-breaking prior and allows the user to perform density estimation and to test for differences in the distributions of two groups. The package also implements basic and generic functions to handle the multiscale tree structures involved.