bsamGP : An R Package for Bayesian Spectral Analysis Models Using Gaussian Process Priors

The Bayesian spectral analysis model (BSAM) is a powerful tool to deal with semipara-metric methods in regression and density estimation based on the spectral representation of Gaussian process priors. The bsamGP package for R provides a comprehensive set of programs for the implementation of fully Bayesian semiparametric methods based on BSAM. Currently, bsamGP includes semiparametric additive models for regression, generalized models and density estimation. In particular, bsamGP deals with constrained regression models with monotone, convex/concave, S-shaped and U-shaped functions by modeling derivatives of regression functions as squared Gaussian processes. bsamGP also contains Bayesian model selection procedures for testing the adequacy of a parametric model relative to a non-speciﬁc semiparametric alternative and the existence of the shape restriction. To maximize computational eﬃciency, we carry out posterior sampling algorithms of all models using compiled Fortran code. The package is illustrated through Bayesian semiparametric analyses of synthetic data and benchmark data.


Introduction
Gaussian processes provide a natural method for specifying prior distributions on the space of functions in Bayesian inference. Since the seminal results on Gaussian process priors (Leonard 1978;O'Hagan 1978) in regression and density estimation, Gaussian processes have been widely used as nonparametric priors for unknown random functions. For instance, Gelfand, Kottas, and MacEachern (2005), Duan, Guindani, and Gelfand (2007), and Banerjee, Gelfand, Finley, and Sang (2008) developed models based on Gaussian processes for spatial statistics; Lenk (1999), Shi, Murray-Smith, and Titterington (2005), and Banerjee, Dunson, computing environment and utilizes compiled Fortran 90 code to maximize computational efficiency when generating random samples from posterior distributions.
Although Bayesian semiparametric models are extremely powerful, their applicability depends on the availability of user-friendly software. Because semiparametric models are typically based on complex representations, the tools to fit semiparametric models in regression and density estimation may not be easily accessible to non-experts or practitioners. Currently, there are a few good command-driven software packages in R for semiparametric modeling. Package DPpackage (Jara, Hanson, Quintana, Müller, and Rosner 2011) is perhaps the most popular R package that includes many semiparametric models; it mostly uses Dirichlet process mixture models and B-spline functions, and it provides either flexible regression or density estimation for data analysis. Moreover, there are several general-purpose programs available for Bayesian inference, including BUGS (Lunn, Spiegelhalter, Thomas, and Best 2009), JAGS (Plummer 2003), and Stan (Carpenter et al. 2016), which can be tailored for fitting semiparametric models for regression and density estimation. For example, BUGS can handle Dirichlet process priors and nonparametric regression with spline and radial basis functions as well as the basic BSAM of Lenk (1999) with cosine series (see, e.g., Congdon 2006, Marley and Wand 2010, and Congdon 2014. Wood (2016) developed the jagam function in package mgcv (Wood 2017), which automatically generates JAGS code for the generalized additive model (GAM) supported by mgcv, utilizing spline-like basis expansions. The brms (Bürkner 2017) package implements Bayesian multilevel models in R using Stan, which also includes non-linear models. In addition, Gaussian process models can be built using Stan for regression and classification (Stan Development Team 2017).
Although a number of fully Bayesian programs and user-friendly software packages continue to be developed, as mentioned earlier, these programs and packages generally do not include semiparametric models with shape restrictions, with the exceptions of the monotonicity constraints in GPstuff (Vanhatalo et al. 2013) for Gaussian process regression, in bisoreg (Curtis and Ghosh 2011) with Bernstein polynomials, and in bnpmr (Bornkamp and Ickstadt 2009) with mixtures of cumulative distributions, in spite of recent results using Bayesian shape constraints for semiparametric models (see, e.g., Meyer, Hackstadt, and Hoeting 2011, Lin and Dunson 2014, Golchi, Bingham, Chipman, and Campbell 2015, Wang and Berger 2016 and the references therein). To the best of our knowledge, bsamGP is the first user-friendly software developed for the purpose of Bayesian shape-restricted function estimation for semiparametric generalized additive models, handling monotonicity, convexity, S-shaped functions and U-shaped functions.
Note that it is possible to incorporate the basic BSAM without shape restrictions into the general programs either through direct coding with BUGS and Stan or by interfacing JAGS and mgcv developed in Wood (2016). However, it would be rather unfriendly, time-consuming, and tedious for practitioners to fit the semiparametric models with shape restrictions. In addition, BSAM with the shape restrictions of Lenk and Choi (2017) involves numerical integration and quadratic forms to be evaluated for modeling the derivatives of the unknown function as squared Gaussian processes. This would be computationally inefficient and cumbersome when using the general purpose MCMC sampling implementations, particularly for estimating S-shaped and U-shaped functions.
bsamGP currently supports various parametric and semiparametric models with nonparametric components using the Gaussian process priors: linear regression models with Gaussian and non-Gaussian errors, partial linear regression models with Gaussian and non-Gaussian errors, partial linear quantile regression models based on asymmetric Laplace distributions and Dirichlet mixtures of asymmetric Laplace distributions, generalized partial linear regression models (probit and logistic regression models for binary data and Poisson and negative binomial regression models for count data), and density estimation models using a logistic Gaussian process prior. For non-Gaussian errors, Dirichlet process mixture models are exploited for either non-Gaussian or unknown error distributions. In particular, bsamGP models unknown regression functions with monotonic, monotonic convex or concave, S-shaped, or U-shaped shape constraints. As explained previously, both applications use novel squared Gaussian process priors that assume that the derivatives of the functions are the squares of Gaussian processes. In addition, bsamGP contains model selection procedures for testing the existence of a shape restriction in a regression model as well as the adequacy of a parametric model relative to a non-specific alternative in a partial regression model based on the computation of marginal likelihoods.
The rest of this paper is organized as follows. Section 2 briefly reviews the approach to partial linear models using BSAM, as implemented in bsamGP, and explains how shape constraints are imposed on the unknown function. Then, we present basic procedures for partial linear quantile models, generalized partial linear models, and density estimation using BSAM. Section 3 illustrates the basic usage of the main functions and methods of bsamGP using synthetic data sets and demonstrates how shape restrictions are specified on the semiparametric regression models with various options in bsamGP using benchmark data sets. Section 4 concludes with a discussion on a possible extension to the bsamGP package.

Bayesian spectral analysis models
In this section, we briefly review a Gaussian process and its spectral representation, and then describe various semiparametric models using a spectral analysis of a Gaussian process as a prior for nonparametric components. Gaussian processes are widely used to define prior distributions over functions with continuous domains (e.g., see Rasmussen and Williams 2006, Murphy 2012, and Gelman, Carlin, Stern, Dunson, Vehtari, and Rubin 2013. Let Z = (Z(x), x ∈ X ) and X be an arbitrary index set, such as time or space. A stochastic process Z is called a Gaussian process if the marginal distribution of any finite-dimensional Z(x 1 ), . . . , Z(x m ), where x 1 , . . . , x m ∈ X , is multivariate normal. This process is clearly determined by the mean function µ(x) = E [Z(x)] and the covariance function A spectral analysis model of a Gaussian process is defined by linearizing its covariance matrix with the Karhunen-Loève representation (see, e.g., Grenander 1981). Let Z be a second-order Gaussian process with a mean function equal to zero and the covariance function COV(s, t). Let ϕ 0 (x), ϕ 1 (x), ϕ 2 (x), . . . be orthonormal basis functions for all x ∈ [0, 1]. Then, a spectral analysis model represents the Gaussian process Z as an infinite series expansion with the Karhunen-Loève representation and its covariance function is given by All of the methods implemented in bsamGP are based on the spectral representation in (1) of Gaussian processes, in particular using a Fourier series expansion based on the cosine basis functions (Lenk 1999(Lenk , 2003Lenk and Choi 2017): Note that the cosine functions in (2) form an orthonormal basis for piece-wise continuous functions on [0, 1] (see, e.g., Kreider, Kuller, Ostberg, and Perkins 1966), and that the spectral analysis models facilitate the computation of the posterior distribution of the Gaussian process. If the support of x is S, then the orthonormal basis can be defined as ϕ 0 (x) = q(x) and ϕ j (x) = 2q(x) cos[πjQ(x)], where Q is a cumulative distribution function with support S, and q is its density.

The BSAR model
We first consider a partial linear additive model for the mean regression. Let y i and w i be the response and the vector of parametric predictors, respectively. Further, let x i,k be the covariate related to the response through an unknown nonlinear function. The partial linear additive model is given as follows: where f k is an unknown nonparametric function of the scalar x i,k ∈ [0, 1], which is estimated using the spectral analysis model of the Gaussian process given in (1) and (2), and the error terms { i } are a random sample from a normal distribution, N (0, σ 2 ). For the spectral coefficients of each function f k , we assign the scale-invariant prior : With regard to the priors of the hyperparameters τ 2 k and γ k , we consider the following two specifications: To complete the model specification, we choose the conjugate priors for β and σ 2 : We refer to this model as the Bayesian spectral analysis regression (BSAR) in the remainder of this paper. The MCMC sampling steps for BSAR are not complicated because all parameters except γ k have conjugate priors for T smoother, whereas all the parameters other than τ 2 k and γ k have conjugate priors for Lasso smoother. Specifically, we generate the posterior samples of β, σ 2 and θ from the explicit full conditional distributions, and we use slice sampling methods (e.g., Damien, Wakefield, andWalker 1999 andNeal 2003) in order to generate γ k for T smoother and both τ 2 k and γ k for Lasso smoother. The further details of sampling procedures with posterior distributions can be found in Lenk (1999) and Lenk and Choi (2017).
Additionally, we consider modeling the error terms { i } nonparametrically as a random sample from the Dirichlet process mixture models, which can provide a flexible and robust regression analysis of data. Specifically, the error distributions are assumed to follow one of two mixture models as follows: Here DP(·) denotes the Dirichlet process (Ferguson 1973), M > 0 is the precision parameter, and G 0 (·) is the centering measure. We implement the Dirichlet process mixture models with the "no-gaps" algorithm of MacEachern and Müller (1998). Lenk and Choi (2017) developed the BSAR models for the shape-restricted regression functions by assuming that the derivatives of the functions are the squares of Gaussian processes. That is, the qth (q ≥ 1) derivative of the unknown function f , f (q) is modeled as the square of a zero-mean, second-order Gaussian process, depending on the a priori shape constraint imposed on the unknown function. Specifically, for monotonicity and convexity/concavity, we consider

The BSAR model with shape restrictions
• Monotone: • Convex/concave: where δ can be either 1 (non-decreasing) or −1 (non-increasing) as given by the user. Note that the f is assumed to be mean centered and orthogonal to the constant function for identification and that the constant of integration is included in each of integral representations of f with α for mean-centering constraints (see Lenk and Choi 2017 for further detailed discussions). Because Z uses a spectral representation Z(x) = ∞ j=1 θ j ϕ j (x) with the same cosine basis functions as in (1) and (2), the BSAR model with a monotone restriction (5) is given by The infinite series representation of Z is approximated by a finite sum where J denotes the truncation point, and the BSAR model with monotone restrictions can be written in matrix notation as where y = (y 1 , . . . , y n ) , Similarly, the BSAR model with monotone convexity or concavity in (6) is given by the spectral representation where Φ b J (x) is the (J + 1) × (J + 1) matrix with (j, k) entry ϕ b j,k (x). Thus, the scale-invariant prior distributions of the spectral coefficients of the shape-restricted functions basically inherit those of the BSAR model described in Section 2.1 but are modified as The other prior distributions are the same as those for the BSAR model described in Section 2.1 except for the truncated normal prior distribution of α, which ensures that f is positive or negative.
For the U-shaped and S-shaped restrictions, we additionally include a decreasing logistic function h(x) scaled between −1 and 1 in (5) and (6), • U-shaped: • S-shaped: where ζ is given by the user and ξ is also considered for mean-centering constraints, similarly to α. Note that there are two additional parameters ω and ψ in h(x). ω is a unique zero of h(x) and ψ is a slope of h(x), controlling its steepness at ω. Truncated normal prior distributions are assigned to these parameters as For the shape restricted models, the full conditional distributions for the parameters θ, σ 2 , ω and ψ do not have closed-form expressions. For the MCMC sampling steps, we therefore use adaptive Metropolis algorithms for θ, slice sampling algorithms for σ 2 and additional hyperparameters for θ, and numerical integration using the trapezoidal rule to obtain the basis functions, ϕ a (·) and ϕ b (·). Additional details can be found in Lenk and Choi (2017). Further, as shown in the representations of U-shaped (8) and S-shaped functions (9), all of the parameters other than two additional parameters, ω and ψ for h(x), and the integral Z 2 h in computing f (x) are the same as in the BSAR models with monotonicity and convexity. Using a finite approximation of Z J , the same MCMC sampling steps can be used with additional adaptive Metropolis steps for ω and ψ using truncated normal distributions and numerical integrations to compute Z 2 J h over a fine grid. Further details are also given in the supplementary materials in the online appendix of Lenk and Choi (2017).

The BSAQ model
We next consider a partial linear additive model for quantile regression. A Bayesian analysis of a quantile regression assumes that the error term has an asymmetric Laplace distribution (e.g., see Moyeed 2001 andSriram, Ramamoorthi, andGhosh 2013), with density where σ 2 is a positive scale parameter and p is a fixed value in (0, 1). Based on the asymmetric Laplace distribution in (10), we have the following partial linear additive quantile regression model, using the aforementioned spectral analysis approach: where η 1 = (1−2p)/{p(1−p)}, η 2 2 = 2/{p(1−p)}, and π(u) is a density function of an exponential distribution with mean σ 2 , π(u) = σ −2 e −u/σ 2 . As in the BSAR model in Section 2.1, we use the Bayesian spectral representation for the unknown additive regression functions f k (·), k = 1, . . . , K. We refer to this model as the Bayesian spectral analysis quantile regression (BSAQ) model.
Note that the joint posterior distribution for the BSAQ is proportional to where we use a location-scale mixture representation of exponential and normal distributions for AL p , which is commonly used to facilitate the posterior sampling algorithm. Thus, the MCMC sampling steps are the same as those described in Sections 2.1 and 2.2 for the BSAR model with or without shape restrictions, with the exception of an additional sampling step that updates latent mixing parameters from generalized inverse Gaussian distributions (see, e.g., Kozumi and Kobayashi 2011, Alhamzawi and Yu 2013 for detailed sampling procedures with posterior distributions). That is, we update the latent variables u = (u 1 , . . . , u n ) from The shape restrictions in (5)-(9) are also incorporated into the BSAQ model, and the posterior sampling steps are identical to those of BSAR with shape restrictions with the previous additional sampling step for the latent mixing parameters u.
Furthermore, we consider error terms { i } as a random sample from the Dirichlet process scale mixture of an asymmetric Laplace distribution for flexible nonparametric modeling for the error distribution (see, e.g., Jo et al. 2016 andKrnjajić 2009). Specifically, we model the error distribution as follows: for which the "no-gaps" algorithm of MacEachern and Müller (1998) is also used.

The GBSAR model
Generalized linear models (GLMs), introduced by Nelder and Wedderburn (1972), provide a unified framework for the regression analysis of normal and non-normal response variables, assuming that the mean of the response variable is linearly associated with the predictors by a known smooth function, called a link function. A GAM (Hastie and Tibshirani 1990) combines the properties of GLMs with those of additive non-/semiparametric models by relaxing the linearity assumption on the actual relationship between response variable and predictors. Here, we consider the GAM approach using a spectral analysis regression, referred to as the generalized linear BSAR (GBSAR) model. Specifically, we deal with semiparametric generalized additive models for discrete responses, including binary, Poisson and negative binomial data.

GBSAR: Binary response regression
Consider a binary response variable y i ∈ {0, 1}, i = 1, . . . , n for a collection of n subjects associated with p linear effect predictors w i = (w i1 , . . . , w ip ) and K nonlinear effect predictors . The Bayesian generalized partial linear model for binary response is given as follows: where g(·) is a link function, β is a p-dimensional vector of coefficients for linear effects, and f k (·) is an unknown nonlinear function of the scalar x i,k ∈ [0, 1], which is modeled using spectral analysis models of Gaussian processes. Typically, the link function g(·) in the binary regression is chosen as the cumulative distribution function (CDF) of some continuous distributions defined on the real line, such as, for example, the standard normal distribution and the logistic distribution. In the former case, the binary regression model is referred to as the probit model, and the latter is called the logit model (see, e.g., McCullagh and Nelder 1989, Hastie and Tibshirani 1990and Dey, Ghosh, and Mallick 2000.

GBSAR with the probit regression.
For the probit link, g −1 (µ) = Φ(µ), where Φ(µ) denotes the CDF of the standard normal distribution, Albert and Chib (1993) proposed a simple data augmentation approach that renders the full conditional distributions of the model parameters equivalent to those under the Bayesian normal regression model. Thus, the GBSAR with the probit link represents the model in (11) using auxiliary variables, as follows: The main advantage of Albert and Chib (1993)'s approach in (12) is that it enables the use of the Gibbs sampling algorithm to simulate posterior samples of the unknown parameters in the GBSAR with the probit regression model. That is, assuming K = 1 for simplicity, the joint posterior of all unknown parameters is proportional to and, thus, the MCMC sampling steps are the same as those in BSAR with the Gibbs sampling and slice sampling methods, except for an additional step to update the latent variable z i from a form of the truncated normal posterior distribution, GBSAR with the logistic regression. As an alternative to the probit link, the logit link, , is also widely used. From the missing-data mechanism of Albert and Chib (1993), the GBSAR model (11) with the logit link is defined by where Lo(0, 1) denotes a standard logistic distribution (Devroye 1986) with a density function given by Lo( ; 0, 1) = e (1 + e ) 2 . The binary regression model with a logit link in (13), called the logistic regression model, is typically preferred to the probit regression model (12) for most statistical applications because of the easy interpretation of the regression coefficients. In the implementation of GBSAR logistic regression, we exploit the hierarchical representation of the logistic distribution and the data-augmentation approach for the posterior sampling proposed by Holmes and Held (2006). Specifically, Holmes and Held (2006) represented the logistic distribution as a scale mixture of a normal distribution (Andrew and Mallows 1974): where KS denotes the Kolmogorov-Smirnov distribution (see, e.g., Devroye 1986; Holmes and Thus, the MCMC sampling procedures are the same as those of probit regression using the algorithm of Albert and Chib (1993) except for the additional steps for updating z i and ψ i from the logistic distribution as a scale mixture of the normal distribution.

bsamGP: Bayesian Spectral Analysis Models Using Gaussian Process Priors in R
Alternatively, because the full conditional distributions of the unknown parameters do not have standard forms, the Metropolis-Hastings algorithm, which is typically used in the Bayesian logistic regression model, can be directly applied to posterior sampling without using the missing-data mechanism of Albert and Chib (1993) and Holmes and Held (2006). Specifically, we generate posterior samples from the joint posterior distribution based on the adaptive Metropolis algorithms (e.g., Haario, Saksman, andTamminen 2001 andRoberts andRosenthal 2009) using normal distributions with adaptively updated variances as proposal distributions. The shape restrictions in Section 2.2 are also incorporated into the GBSAR for binary regression, and the same posterior sampling algorithms are applied as in BSAR with shape restrictions.

GBSAR: Count response regression
When the response is an unbounded count, y = 0, 1, 2 . . ., we can use count response regression models. In this case, two regression models for counts are commonly considered: the Poisson regression and the negative binomial regression. The Poisson regression model assumes that the response variables y = (y 1 , . . . , y n ) are independently Poisson distributed with the rate parameters λ i , i = 1, . . . , n and that the logarithm of λ i can be modeled by a regression function of predictors. The negative binomial regression model extends the Poisson regression model by introducing an additional parameter that is known to easily handle the over-dispersion problem (see, e.g., Hilbe 2011).

GBSAR with Poisson regression.
We work with the following GBSAR for the Poisson additive model: where β is a p-dimensional vector of the coefficients for the linear effects, w i is the corresponding design vector, and the {f k (·)} are unknown nonlinear functions of the scalars x i,k ∈ [0, 1], modeled using spectral analysis models of Gaussian processes, as before. Note that because the full conditional distributions of the unknown parameters in (14) do not have standard forms, we also use the adaptive Metropolis algorithms.
That is, we generate posterior samples from the joint posterior distribution based on the adaptive Metropolis algorithms using normal distributions with adaptively updated variances as proposal distributions. Specifically, we illustrate the process for generating candidates for β: • Generate the candidates of β t with a proposal distribution given at iteration t by where d is the dimension of β, and Σ t is the empirical covariance estimate of β, sequentially computed as • Then, the candidates are accepted with the product of the prior distributions in Sections 2.1 and 2.2 for BSAR with or without shape restrictions and the Poisson likelihood from (14).
Note that the remaining parameters with spectral coefficients and other hyperparameters are updated as before.
GBSAR with negative binomial regression. The Poisson regression model can be of limited use in several disciplines because empirical count data sets typically exhibit overdispersion, usually caused by dependencies, variation between responses or violations in the distributional assumptions. The negative binomial distribution can better deal with such overdispersed count data and, thus, the negative binomial regression model is often preferred to the Poisson regression model (see, e.g., Hilbe 2014 andFahrmeir, Kneib, Lang, andMarx 2013).
The Bayesian negative binomial additive model with GBSAR is given as follows: where κ > 0 is a dispersion parameter, assigned a gamma distribution with shape parameter r 0,κ and the rate parameter s 0,κ as a prior. The negative binomial model (15) can also be expressed as a continuous mixture of Poisson distributions, where the mixing distribution of the Poisson rate is a gamma distribution (see, e.g., Koop, Poirier, andTobias 2007 andLuts andWand 2015): Here, the parameters κ and β are updated using the adaptive Metropolis algorithm, and the posterior sampling procedures of the remaining unknown parameters are the same as before.
Note that the negative binomial distribution was originally derived as a limiting case of the Poisson-gamma mixture distribution (Greenwood and Yule 1920), and that, in the current version of bsamGP, both the (15) and (16) approaches are available for implementing the GBSAR negative binomial regression. The shape restrictions in Section 2.2 are also incorporated into the GBSAR for count regression, and the same posterior sampling algorithms are applied as in BSAR with shape restrictions, using the joint posterior distribution proportional to the product of either the Poisson or negative binomial likelihood and the prior distributions described in Sections 2.1 and 2.2.

The BSAD model
In addition to regression models, we include a Bayesian semiparametric density estimation procedure using a spectral representation of Gaussian processes as in Lenk (2003). The semiparametric density model consists of a parametric component, specified by an exponential family, and a nonparametric component, specified by a logistic Gaussian process (see, e.g., Leonard 1978and Lenk 1988. The logistic Gaussian process, defined as the logistic transformation of a Gaussian process, provides a flexible model for estimating unknown densities using the covariance structure (see, e.g., Lenk 1991;Tokdar and Ghosh 2007;Riihimäki and Vehtari 2014).
Specifically, consider the situation in which the observations y 1 , . . . , y n constitute a random sample from an absolutely continuous probability density f with respect to a known, dominating measure G on the support Y. A semiparametric density estimation model using a logistic Gaussian process prior (Lenk 2003), which we refer to as the Bayesian spectral analysis density estimation (BSAD) hereafter, is given as follows: where w(y) = [w 1 (y), . . . , w p (y)] is a p-vector of non-constant functions specifying an exponential family, β is a p-vector of unknown coefficients, and Z is a zero-mean, secondorder Gaussian process using the spectral representation Z(y) = ∞ j=1 θ j ϕ j (y), ϕ j (y) = √ 2 cos(πjG(y)), as before. A normal prior for {θ j } similar to that of BSAR in (4) is considered However, two possible parameterizations of c j are considered, namely c j = j for the geometric smoother and c j = log(j + 1) for the algebraic smoother (Lenk 1999(Lenk , 2003. Other unknown parameters have the same prior specifications as in the BSAR with a normal distribution for β and in the T smoother for τ 2 and γ. In order to generate posterior samples in BSAD, given the logistic transformation in (17), the MCMC sampling procedures of BSAR are employed to estimate β, spectral coefficients, and additional hyperparameters by transforming the problem from density estimation into the semiparametric regression based on a discrete version of the likelihood. Additional details can be found in Lenk (2003).

Basic implementation of bsamGP
The bsamGP package for R provides tools that implement the Bayesian spectral analysis models described in the previous section. In this section, we demonstrate the general usage of the main functions and methods of bsamGP using two synthetic data sets: one with a normal semiparametric regression for the BSAR and the other with a semiparametric density estimation for the BSAD. To assess the adequacy of the semiparametric model against the parametric model, we also provide the log marginal likelihoods for both cases. More importantly, we demonstrate how to specify shape restrictions on the semiparametric regression models, the BSAR, BSAQ, and GBSAR, with various options in bsamGP using benchmark data sets.

The BSAR model with synthetic data
The main function in bsamGP for the analysis of normal regression data is a function bsar() that fits a semiparametric regression model using the BSAR approach described in Section 2.1.
To illustrate the use of the bsar function for the BSAR model, we generate synthetic data with n = 100 observations from where w i follows a normal distribution with mean 0 and standard deviation 0.5 and f ( Lenk (1999). Here, the predictor values x i are generated from a uniform distribution, x i iid ∼ Unif(0, 1). The data are simulated using the following code: R> set.seed(1) R> n <-100 R> f <-function(x) { + 5 -10 * x + 8 * exp(-100 * (x -0.3)^2) -+ 8 * exp(-100 * (x -0.7)^2) + } R> w <-rnorm(n, sd = 0.5) R> x <-runif(n) R> y <-2 * w + f(x) + rnorm(n, sd = 1) To fit the data with the BSAR, we first need to specify the number of basis functions and the hyperparameters as follows: R> nbasis <-50 R> prior <-list(beta_m0 = numeric(2), beta_v0 = diag(100, 2), w0 = 2, + tau2_m0 = 1, tau2_v0 = 100, sigma2_m0 = 1, sigma2_v0 = 1000) In total, 60,000 draws from the MCMC sampling method implemented in the bsar function were completed. A transition (burn-in) period of 10,000 samples was considered, and the chain was sub-sampled at every tenth iteration to obtain a final posterior sample size of 5,000. The following code illustrates the MCMC specification: R> mcmc <-list(nblow = 10000, nskip = 10, smcmc = 5000, ndisp = 5000) Then, the BSAR model is fitted using the following commands: R> fout <-bsar(y~w + fs(x), nbasis = nbasis, mcmc = mcmc, prior = prior, + shape = "Free", marginal.likelihood = TRUE, spm.adequacy = TRUE) bsamGP facilitates an intuitive formula interface to describe the semiparametric additive model where fs() indicates that nonparametric smoothing is applied to a variable x with spectral analysis regression in BSAM. Note that no shape constraint in the regression function is considered with option shape = "Free" in the specification of the BSAR model. Furthermore, we use the logarithm of the marginal likelihoods and the Bayes factor to test the adequacy of the semiparametric model with the options marginal.likelihood = TRUE and spm.adequacy = TRUE.
Note that the coda (Plummer, Best, Cowles, and Vines 2006) package can be used to perform convergence diagnostics based on output objects in fout and produce trace and density plots with the plot method in coda. For illustration, we display these plots of β and σ 2 in Figure 1 using the following commands. R> library("coda") R> post <-as.mcmc(data.frame(beta = fout$mcmc.draws$beta, + sigma2 = fout$mcmc.draws$sigma2)) R> plot(post) Alternatively, the generic plot method for fout can be used to produce trace plots for each parameter using the following command.

R> plot(fout)
The posterior summary is produced from the object fout of the bsar function using either the print or the summary command. The code using the summary command and some of the results for the posterior summary are given as follows: R> summary(fout)  Gelfand and Dey (1994) and Newton and Raftery (1994).
The left panel of Figure 2 displays a scatter plot of the simulated data along with the true regression function, the estimated regression function, and point-wise 95% equal-tail credible intervals. The posterior mean estimate is very close to the true mean function (i.e., the "double-normal" function). The following code obtains the estimated mean function under the BSAR model and the true function and is used to draw the plot displayed in the left panel of Figure 2.
R> fit <-fitted(fout, HPD = FALSE) R> plot(fit, ggplot2 = FALSE) R> lines(fit$xgrid, f(fit$xgrid), lwd = 3, lty = 2) bsamGP also provides the predict method to easily assess an out-of-sample posterior prediction. Specifically, the predict function for objects created by bsamGP is provided to allow the drawing of samples from the posterior predictive distributions and, thus, to obtain the unobserved response predicted by the model after the MCMC sampling is completed.
The following code illustrates the use of the predict method in an out-of-sample prediction with 100 test data points. The type = "response" option by default in predict provides the posterior predictive estimates of the future (unobserved) observations given the current observations by averaging the conditional distribution of the future observations over all possible parameters weighted by their posterior distributions, whereas type = "mean" returns posterior predictive regression curves. The right panel of Figure 2 portrays the posterior predictive mean of regression curves with point-wise 95% equal-tail intervals for posterior predictive estimates of future observations, specifically, out-of-sample test data, as well as those for posterior predictive regression curves superimposed with the test data. Note that the latter for the regression curves, that is, the noise-free observations, is narrower than the former for the predicted value ofỹ, the future observation with noise.
R> set.seed(2) R> n.test <-100 R> w.test <-rnorm(n.test, sd = 0.5) R> x.test <-runif(n.test) R> fit.pred.tobs <-predict(fout, newp = w.test, newnp = x.test, + type = "response") R> fit.pred.tmean <-predict(fout, newp = w.test, newnp = x.test, + type = "mean") Note that similar plots to those in Figure 2 can be drawn by the plot method using the ggplot2 (Wickham 2009) package with the option ggplot2 = TRUE option by default. Alternatively, the simulated data set in (18) can be fitted based on the assumption of an unknown error distribution using the function bsardpm with a Dirichlet process mixture model. By default, the function bsardpm utilizes a location-scale mixture, as described in Section 2.1, and the option of a scale mixture can be specified as location = FALSE in bsardpm. To complete the model specification of bsardpm, we additionally specify hyperparameters on κ and M as default values. Then, the BSAR model with a Dirichlet process mixture is fitted using the following R code, and the abridged output of fout.dpm is given as follows: R> prior <-append(prior, list(kappa_r0 = 1, kappa_s0 = 100, tmass_a = 2, + tmass_b = 4)) R> mcmc <-list(nblow = 20000, nskip = 10, smcmc = 5000) R> fout.dpm <-bsardpm(y~w + fs(x), nbasis = nbasis, mcmc = mcmc, + prior = prior, shape = "Free") R> summary(fout.dpm) . With the plot command on the fitted object, bsamGP provides a density plot of residuals for diagnostics as well as a fitted mean curve with point-wise 95% credible intervals, as shown in Figure 3.

Electricity demand data: BSAR with shape restriction
To illustrate the usage of the argument shape for the BSAR with shape restrictions, we first consider a data set on demand for electricity, which consists of 288 quarterly observations in Ontario for the period from 1971 to 1994, taken from Yatchew (2003) and available from https://www.economics.utoronto.ca/yatchew/. This data set has been analyzed using a cubic spline model (Engle, Granger, Rice, and Weiss 1986), a partial linear model (Yatchew 2003), and shape-restricted regression models , among others. The R package bsamGP includes the data set Elec.demand, which contains six variables: electricity demand (enerm), gross domestic product (gdp), price of electricity (pelec), price of natural gas (pgas), and the number of heating (hddqm) and cooling (cddqm) degree days relative to a reference temperature. We use Elec.demand to show how the results in Lenk and Choi (2017) can be reproduced.

R> xrange <-xmax -xmin
Following the approach of Lenk and Choi (2017), we fit a model to Elec.demand using the bsar function in bsamGP with shape restrictions. We use the logarithm of the ratio of electricity demand to gross domestic product as the response variable and the log price ratio of electricity to natural gas and temperature, which is the number of heating and cooling degree days relative to 68 • F, as the linear and nonlinear fixed effects, respectively. For the fixed effect of temperature, we enforce shape restrictions on its nonlinear regression function.
R> fout1 <-bsar(y~w + fs(x), nbasis = 50, mcmc = mcmc, prior = prior, + shape = "Decreasing", marginal.likelihood = TRUE, verbose = TRUE) R> fit1 <-fitted(fout1) R> plot(fit1) R> summary(fout1) Note that we specify the verbose option to be active, verbose = TRUE, so that the bsar function displays its running procedure with progress bars, acceptance rates in adaptation, burn-in periods, and sampling phases. The output of fout1 given by the summary method provides the log-integrated likelihood (LIL) and the fit statistics that are summarized in Table 1 as well as the posterior estimates of all unknown parameters. Figure 4 shows the parametric residuals y i − β w i and the posterior mean estimates f versus temperature (number of cooling or heating degree days) for the BSAR with a monotone decreasing restriction, obtained from the plot command, as in the previous R code. The plots also include the point-wise 95% highest posterior density (HPD) intervals for the estimated model.
In addition to the restriction of monotonicity, using the same data set, we consider two other types of shape restrictions, namely convexity and an S-shape, by specifying the argument shape, as required. The following R code illustrates how the BSAR models are fitted with these shape restrictions.
R> fout2 <-bsar(y~w + fs(x), nbasis = 50, mcmc = mcmc, prior = prior, + shape = "DecreasingConvex", marginal.likelihood = TRUE) R> fout3 <-bsar(y~w + fs(x), nbasis = 50, mcmc = mcmc, prior = prior, + shape = "DecreasingS", marginal.likelihood = TRUE) We summarize the fit statistics and estimated parameters of all three models in Table 1. The monotone decreasing model has a marginally better LIL than those of the model with convexity and the S-shaped model, as discussed in Lenk and Choi (2017). All of the models indicate that demand decreases as the price of electricity increases relative to the price of natural gas, as shown in Figure 5.

Micronutrient blood plasma data: BSAQ with shape restrictions
As a real data example of the BSAQ model with shape restrictions, we consider data on n = 314 healthy patients from a study of micronutrient levels in blood plasma (Nierenberg, Stukel, Baron, Dain, and Greenberg 1989), available from http://lib.stat.cmu.edu/ datasets/Plasma_Retinol. This data set was originally used to investigate the relationship between personal characteristics and plasma concentrations of micronutrients such as retinol or beta-carotene. Recently, Meyer et al. (2011) analyzed the same data set for the effects of age, body mass index (BMI), and smoking status on the log of blood plasma beta-carotene using semiparametric additive normal regression models. They considered a shape-constrained regression model to fit the data, with a decreasing function of BMI and an increasing function of age, based on shape-restricted splines.
The R package bsamGP includes the data set plasma, which contains 14 variables: 12 personal characteristics (age, sex, smoking status, etc.) and two variables (beta-carotene and retinol) for the plasma concentrations of micronutrients. For a more detailed description of the data, see Nierenberg et al. (1989). Following the work of Meyer et al. (2011), we focus on the relationship between the log of blood plasma beta-carotene (betaplasma) and smoking status (current or not), age, and body mass index (bmi), which is defined as weight/height 2 in the units kg/m 2 . Here, we are interested in quantile regression functions rather than the mean regression of Meyer et al. (2011), and, thus, we apply the BSAQ by imposing the same shape restrictions as Meyer et al. (2011) for illustration purposes. Specifically, in order to evaluate the effects of age, BMI, and smoking status on the blood plasma concentration of beta-carotene, we consider the bsaq function, estimating the model given by where smoke denotes a design matrix containing a dummy variable for smoking status ("current smoker" or "not"), and the functions f 1 and f 2 are assumed to be decreasing and increasing, respectively.
Other quantile regression functions can be considered by specifying the argument p with different quantile levels of interest in the function bsaq. As an illustration, we provide the quantile regression functions of bmi and age with shape restrictions using BSAQ models at p = {0.1, 0.5, 0.9} (see Figure 7). In each panel, the black line denotes the quantile regression function for current non-smokers, and the red line indicates the function for current smokers. All of the quantile regression function estimates for bmi and log(betaplasma) show overall negative trends, whereas the estimates for age and log(betaplasma) demonstrate increasing relationships, as enforced by the argument shape in bsaq. In both relationships, the level of log(betaplasma) for current non-smokers is higher than that for current smokers, which indicates that the level of betaplasma is influenced by the smoking status.
Alternatively, the data set plasma can be fitted based on the assumption of an unknown error distribution using the function bsaqdpm with a Dirichlet process scale mixture model for BSAQ, as described in Section 2.3. The following R code illustrates how to fit the additive BSAQ model with shape restrictions using bsaqdpm for plasma data. For an object returned by bsaqdpm, the plot method provides estimated quantile curves as well as an estimated density plot of residuals for diagnostics, as shown in Figure 8. Age log(bpbc) -f1(bmi) Figure 7: Posterior mean estimates for the quantile regression function of bmi and age on log(betaplasma) levels. The top panels show the relationship between log(betaplasma) and bmi, and the bottom panels display the relationship between log(betaplasma) and age.

Wage-union membership data: GBSAR binary regression with shape restrictions
In order to illustrate the GBSAR model for binary responses with shape restrictions, we consider the data on wages and union membership of Berndt (1991), as discussed in Ruppert, Wand, and Carroll (2003) and Crainiceanu, Ruppert, and Wand (2005), among others. The data consists of a random sample of 534 US workers on 11 variables from the 1985 Current Population Survey (CPS), available from http://lib.stat.cmu.edu/datasets/CPS_85_Wages and also from the R package SemiPar (Wand 2018).
The R package bsamGP includes the data set wage.union, which has one dependent variable (an indicator of union membership, union), and ten predictor variables, including six dummy variables for southern residence (south), race, occupational status (occupation), sector, sex, and marital status (married), and four continuous variables for experience, wages, age, and education. Ruppert et al. (2003) and Crainiceanu et al. (2005) analyzed the wageunion membership data using a semiparametric additive logistic regression model with splines because the standard linear, quadratic and cubic regressions were not adequate for identifying the features of union membership probability. They used wages, age, education, race, gender, and south as predictor variables and modeled the first three predictors (wages, age, and education) with penalized splines. They found that the effect of wages appears to be non-monotonic and that the probability of union membership tends to increase with age and decrease with education. Here, we re-analyze the same data set with the generalized semiparametric additive logistic regression, using the gbsar function of bsamGP. We assume that age and education are monotonically related to the probability of union membership. The R code illustrates how to fit the GBSAR model with a logit link for the binary data using the arguments family = "bernoulli", link = "logit" and the default prior distributions in gbsar. R> data("wage.union", package = "bsamGP") R> attach(wage.union) R> race <-ifelse(race == 3, 1, 0) R> mcmc <-list(nblow0 = 10000, nblow = 10000, nskip = 10, smcmc = 5000) R> foutLogit <-gbsar(union~race + sex + south + fs(wage) + fs(age) + + fs(education), family = "bernoulli", link = "logit", mcmc = mcmc, + nbasis = 50, shape = c("Free", "Decreasing", "Increasing")) R> fitLogit <-fitted(foutLogit, HPD = FALSE) Note that the gbsar function for the logistic regression uses adaptive Metropolis algorithms (Haario et al. 2001;Atchadé and Rosenthal 2005;Roberts and Rosenthal 2009) to generate posterior samples by default. To draw posterior samples from the algorithm of Holmes and Held (2006), the user should utilize the gbsar function with the option algorithm = "KS".
The analysis results using bsamGP are shown in Figure 9. The plots show the effects of the three predictors (wage, age and education) modeled nonparametrically. The Y-axes are the estimated probabilities of union membership. Each plot displays the posterior mean estimates and 95% point-wise credible intervals. The non-monotonic behavior of the probability of union membership is captured similarly to that found in Ruppert et al. (2003) and Crainiceanu et al. (2005). In addition, the monotonic (increasing with age and decreasing with education) relationship between the probability of union membership and age and education is captured reasonably well, as shown in Figure 9. In particular, the probability of union membership nearly linearly increases with age and decreases with education.
Note that the same data can also be fitted with a probit regression using the gbsar function by replacing the argument link = "logit" with link = "probit", as in the R code below.

Mortality-temperature data: GBSAR count response regression with shape restrictions
The health effects of exposure to extreme temperatures have been frequently investigated, and many epidemiological studies (e.g., Curriero, Heiner, Samet, Zeger, Strug, and Patz 2002;Anderson and Bell 2009;Gasparrini et al. 2015) have confirmed that when temperatures are extremely high or low, there is a significant association with increased mortality. Furthermore, the mortality rate has an exposure-response curve with respect to temperature with a specific shape. Consequently, we illustrate the shape-restricted semiparametric count response regression models for Poisson and negative binomial data using the arguments family and link in gbsar with a benchmark data set on the mortality-temperature relationship.
The data set is based on real applications from Gasparrini et al. (2015) for assessing the effect of mean temperature on mortality. In particular, the data include regional daily death occurrences in England and Wales in the United Kingdom (UK), which are publicly available from the Office for National Statistics, and daily mean temperature estimated from minimum and maximum temperatures, which are available from the British Atmospheric Data Centre. These data are described in detail in related epidemiological studies and recent publications (see, e.g., Armstrong et al. 2011;Gasparrini, Armstrong, Kovats, and Wilkinson 2012;Gasparrini et al. 2015). Specifically, the data set is part of that collected in the UK and used for a multi-country observation study in Gasparrini et al. (2015) based on a first-stage time-series quasi-Poisson regression model with Bsplines. Details on the data set are given in the supplementary appendix of Gasparrini et al. (2015), and the data set can be obtained from the supplementary web page at https: //github.com/gasparrini/2015_gasparrini_Lancet_Rcodedata. In order to illustrate gbsar, we use 5,113 daily observations from London for the period from 1993 to 2006, included as the data set London.Mortality in bsamGP. The data set London.Mortality consists of 5,113 observations on the number of deaths (death), mean daily temperature (tmean in • C), and other related variables, such as the relative humidity, maximum/minimum temperatures, and others (rh, tmax, tmin, date, dewp). Here, for simplicity, we seek only the effect of temperature on the mortality rate in order to demonstrate the gbsar function. We use a Poisson semiparametric regression model with a count response variable death and a continuous predictor tmean, and compare the fitted results, with or without appropriate shape restrictions. In particular, the U-shaped mortality-temperature relationship (e.g., Muggeo 2008;Anderson and Bell 2009;Muggeo 2010) can be identified and tested using the argument shape in the gbsar function.
The following R code illustrates the use of gbsar() in the bsamGP package for a Poisson semiparametric regression with shape restrictions. We specify the Poisson family with the log link function, with and without shape restrictions. We set the number of basis functions to 50 and use the default hyperparameters, as before. Additional hyperparameters for ω and ψ in the U-shaped restriction are specified by default as m 0,ψ = 100, v 2 0,ψ = 100 2 , m 0,ω = Q 2 , and v 2 0,ω = R/8, where Q 2 is the midpoint and R is the range of tmean, respectively. Note that we set the adaptation iterations to 10,000 (only applicable for shape restricted cases) and obtain 1,000 MCMC samples at every ten steps after 10,000 burn-in iterations.
The adequacy of the shape restrictions can be tested by computing the marginal likelihoods. The marginal likelihoods (LIL) given in the abridged output of fout.free and fout.U below show that the unrestricted Poisson semiparametric regression has a larger LIL and would be favored over that with a U-shaped restriction by the observed data set, in terms of model selection. However, note that the U-shaped model is not always correctly identified by the marginal likelihood and that when the two models are consistent with the true function, such as the unrestricted model versus the U-shaped model with additional parameters, as in the current exposure-response curve, the marginal likelihood may prefer a simpler model, as discussed in Lenk and Choi (2017). Further, the mortality rate from the shape-restricted model clearly shows the U-shaped exposure-response curve estimates and confirms the empirical findings on the shape of the curve in previous mortality-temperature studies (e.g., Curriero et al. 2002;Muggeo 2008;Anderson and Bell 2009). In addition, an estimate of ω, the abscissa of the minimum of the U-shaped curve, is provided as approximately 13.1 • C, the temperature that minimizes mortality, as shown in the abridged output of fout.U.  In addition, we check the adequacy of the Poisson regression versus the negative binomial regression with a U-shaped restriction for the data set by computing the marginal likelihoods for model selection. To fit the negative binomial regression, we specify the arguments in gbsar as family = "negative.binomial" and link = "log", as illustrated in the following R code and abridged output: R> fout.U.nb <-gbsar(death~fs(tmean), family = "negative.binomial", + link = "log", nbasis = 50, shape = "Ushape", mcmc = mcmc, + prior = prior) R> fit.U.nb = fitted ( Table 2 summarizes the computed marginal likelihoods of the two candidate models (U-shaped Poisson, and U-shaped negative binomial). Based on the results in Table 2, the U-shaped negative binomial semiparametric regression model provides the better fit for the data set than the U-shaped Poisson model.
Note that the data set was originally used in statistical analyses based on the over-dispersed Poisson distribution (Gasparrini et al. 2012) and the time-series quasi-Poisson model (Gasparrini et al. 2015) associated with different time lags. However, the current analysis considers only the mean temperature as a predictor of the mortality rate in order to illustrate the U-shape Poisson U-shape negative binomial LIL (Gelfand & Dey) −26965.14 −23668.52 LIL (Newton & Raftery) −26803.36 −23434.95 Table 2: LIL of London mortality data.
gbsar function of bsamGP. Thus, the negative binomial distribution could be favored over the Poisson distribution for the observed data, which is probably because over-dispersion is better explained by the negative binomial distribution.
Here, we use a default prior specification in the negative binomial regression with an additional hyperparameter, that is, the dispersion parameter κ with the shape parameter r 0,κ and the rate parameter s 0,κ , to be weakly informative by setting a prior mean of 1 and a prior variance of 100. Note that the negative binomial distribution can be represented as a Poisson-gamma mixture and that bsamGP also implements the mixture approach to the negative binomial regression by specifying family = "poisson.gamma".

Density estimation with the BSAD model
In addition to regression modeling, the BSAM provides a semiparametric density estimation procedure for the BSAD model, as described in Section 2.4. Here, we explain how to use the bsamGP package for the BSAD model as the final illustration of bsamGP using synthetic data. The data used in this analysis are based on a random sample of size n = 500 from a semiparametric density f (y) ∝ exp {h 1 (y)β 1 + h 2 (y)β 2 + z(y)}. Here, the parametric component has a truncated gamma distribution with h 1 (y) = log(y), β 1 = 3, h 2 (y) = y, β 2 = −1, defined on (0, 10), and the nonparametric component has a sigmoid function z(y) = tanh(y − 5). In order to generate samples from the semiparametric model, we utilize rejection sampling with a uniform envelope on (0, 10). The following R code illustrates the process of generating the synthetic data from the semiparametric model.
R> fit <-fitted(fout, HPD = FALSE) R> plot(fit) The posterior summary is also produced for the object returned by the bsad function using the summary command. The code and partial output of the posterior probability of each model and the natural logarithm of the marginal likelihoods are given as follows: R> summary(fout) .  Figure 11 and by the marginal likelihood in the posterior summary, the semiparametric model is more appropriate for fitting the simulated data than the parametric model is and, thus, provides a reasonable estimate of the probability density function.

Conclusion
This article has presented a user-friendly new R package called bsamGP for fitting semiparametric models using Gaussian process priors. The bsamGP package provides easy access to fitting Bayesian semiparametric models for regression and density estimation. The bsamGP package utilizes spectral representations in order to simplify the posterior computation of Gaussian processes through the linearization of the covariance structure. bsamGP contains popular parametric linear regression models, semiparametric partial linear regression models, semiparametric density estimation models, semiparametric quantile regression models, and generalized semiparametric models. All semiparametric regression models in bsamGP allow for additive models and nonlinear regression functions with, for example, monotonic, monotonic convex or concave, S-shaped, and U-shaped constraints. bsamGP uses compiled Fortran 90 code to maximize computational efficiency. Thus, the implementation is fast and reliable, making it viable for various Bayesian semiparametric models with shape restrictions to be applied more widely in practice.
Our package can be expanded in several directions. The current version allows for normal prior distributions for the coefficients of the linear regression and two smoothing prior distributions (T-smoother and Lasso smoother) for the nonparametric function. Thus, it could be expanded to allow spike-and-slab prior distributions for variable selection in the linear regression as well as for the nonparametric function with shape restrictions, which improves the model fit when the true unknown function is on the boundary of the constraints for the shape restrictions, as discussed in Jo et al. (2016) and Lenk and Choi (2017). For longitudinal data or clustered data, semiparametric additive mixed effects models can be considered in the BSAM framework by incorporating random effects and measurement errors in the covariates (e.g., Choi, Jo, Park, and Lenk 2017), and such models can be extended to and implemented by the bsamGP package. Furthermore, a companion software application is being developed as either a Python-based (van Rossum et al. 2011) graphical user interface (GUI) or an interactive web application using shiny (Chang, Cheng, Allaire, Xie, and McPherson 2019) in the R package bsamGP. These applications will have the same functionality as that of bsamGP but will allow practitioners to use bsamGP easily by means of a GUI instead of a command-line user interface in R.