Shrinkage in the Time-Varying Parameter Model Framework Using the R Package shrinkTVP

Time-varying parameter (TVP) models are widely used in time series analysis to flexibly deal with processes which gradually change over time. However, the risk of overfitting in TVP models is well known. This issue can be dealt with using appropriate global-local shrinkage priors, which pull time-varying parameters towards static ones. In this paper, we introduce the R package shrinkTVP (Knaus, Bitto-Nemling, Cadonna, and Fr\"uhwirth-Schnatter 2019), which provides a fully Bayesian implementation of shrinkage priors for TVP models, taking advantage of recent developments in the literature, in particular that of Bitto and Fr\"uhwirth-Schnatter (2019). The package shrinkTVP allows for posterior simulation of the parameters through an efficient Markov Chain Monte Carlo (MCMC) scheme. Moreover, summary and visualization methods, as well as the possibility of assessing predictive performance through log predictive density scores (LPDSs), are provided. The computationally intensive tasks have been implemented in C++ and interfaced with R. The paper includes a brief overview of the models and shrinkage priors implemented in the package. Furthermore, core functionalities are illustrated, both with simulated and real data.


Introduction
Time-varying parameter (TVP) models are widely used in time series analysis, because of their flexibility and ability to capture gradual changes in the model parameters over time. The popularity of TVP models in macroeconomics and finance is based on the fact that, in most applications, the influence of certain predictors on the outcome variables varies over time (Primiceri 2005;Dangl and Halling 2012;Belmonte, Koop, and Korobolis 2014). TVP models, while capable of reproducing salient features of the data in a very effective way, present a concrete risk of overfitting, as often only a small subset of the parameters are time-varying. Hence, in the last decade, there has been a growing need for models and methods able to discriminate between time-varying and static parameters in TVP models. A key contribution in this direction was the introduction of the non-centered parameterization of TVP models in arXiv:1907.07065v1 [econ.EM] 16 Jul 2019 Frühwirth-Schnatter and Wagner (2010), which recasts the problem of variance selection and shrinkage in terms of variable selection, thus allowing any tool used to this end in multiple regression models to be used to perform selection or shrinkage of variances. Frühwirth-Schnatter and Wagner (2010) employ a spike and slab prior, while continuous shrinkage priors have been utilised as a regularization alternative in, e.g., Belmonte et al. (2014) and Bitto and Frühwirth-Schnatter (2019). For an excellent review of shrinkage priors, with a particular focus on high dimensional regression, the reader is directed to Bhadra, Datta, Polson, and Willard (2017).
In this paper, we describe the shrinkTVP package (Knaus et al. 2019) for Bayesian TVP models with shrinkage. The package is available under the general public license (GPL ≥ 2) from the Comprehensive R Archive Network (CRAN) at https://cran.r-project. org/web/packages/shrinkTVP. The package efficiently implements recent developments in the Bayesian literature, in particular the ones presented in Bitto and Frühwirth-Schnatter (2019). The computationally intensive algorithms in the package are written in C++ and interfaced with R (R Core Team 2017) via the Rcpp (Eddelbuettel and Balamuta 2017) and the RcppArmadillo (Eddelbuettel and Sanderson 2014) packages. This approach combines the ease-of-use of R and its underlying functional programming paradigm with the computational speed of C++. The goal is to provide an easy entry point for fitting TVP models with shrinkage priors, while also giving more experienced users the option to adapt the model to their needs. This is achieved by providing a robust baseline model that can be estimated by only passing the data, while also allowing the user to specify more advanced options. Coupled with intuitive summary and plot methods, this leads to a package that is both easy to use while remaining highly flexible.
shrinkTVP is, to our knowledge, the only R package that combines TVP models with shrinkage priors. In the TVP models context, the most popular R package is dlm (Petris 2010), which provides routines for maximum likelihood estimation, Kalman filtering and smoothing, and Bayesian analysis for dynamic linear models (DLMs), of which TVP models are a subset. The package bsts (Scott 2019) performs Bayesian analysis for the closely related class of structural time series models. Moreover, a number of R packages providing regularization and shrinkage methods are available. For example, shrink (Dunkler, Sauerbrei, and Heinze 2016) implements various shrinkage methods for linear, generalized linear, or Cox regressions, biglasso (Zeng and Breheny 2017) aims at very fast lasso-type models for highdimensional linear regression and glmnet (Friedman, Hastie, and Tibshirani 2010) provides efficient procedures for implementing elastic-net regularization for a variety of models. With regards to Bayesian shrinkage, the normal-beta prime shrinkage prior is implemented in the package NormalBetaPrime (Bai and Ghosh 2019) and the popular horseshoe prior in the package horseshoe (van der Pas, Scott, Chakraborty, and Bhattacharya 2016). Both of these packages focus on high-dimensional regression models, and do not provide shrinkage for TVP models.
The remainder of the paper is organized as follows. Section 2 briefly introduces TVP models and the normal-gamma shrinkage priors, and describes the Markov Chain Monte Carlo (MCMC) algorithm for posterior simulation. The package shrinkTVP is introduced in Section 3. In particular, we illustrate how to run the MCMC sampler using the main function shrinkTVP, how to choose a specific model, and how to conduct posterior inference using the return object of shrinkTVP. Section 4 explains how to assess the performance of the model by calculating log predictive density scores (LPDSs), and how to use LPDSs to compare the predictive performances of different priors. This is illustrated using the usmacro.update data set from the bvarsv (Krueger 2015) package. Finally, Section 6 concludes the paper.

TVP models
Let us recall the state space form of a TVP model. For t = 1, . . . , T , we have that where y t is a univariate response variable and x t = (x t1 , x t2 , . . . , x td ) is a d-dimensional row vector containing the regressors at time t, with x t1 corresponding to the intercept. For simplicity, we assume here that Q = Diag(θ 1 , . . . , θ d ) is a diagonal matrix, implying that the state innovations are conditionally independent. Moreover, we assume the initial value follows a normal distribution, i.e. β 0 ∼ N d (β, Q) , with initial mean β = (β 1 , . . . , β d ). Model (1) can be rewritten equivalently in the non-centered parametrization as shrinkTVP is capable of modelling the observation error both homoscedastically, i.e. σ 2 t ≡ σ 2 for all t = 1, . . . , T and heteroscedastically, via a stochastic volatility (SV) specification. In the latter case, the log-volatility h t = log σ 2 t follows an AR(1) model (Jacquier, Polson, and Rossi 1994;Kastner and Frühwirth-Schnatter 2014;Kastner 2016). More specifically, . The stochastic volatility model on the errors can prevent the detection of spurious variations in the TVP coefficients (Nakajima 2011;Sims 2001) by capturing some of the variability in the error term.

Shrinkage priors on variances and model parameters
We place conditionally independent normal-gamma priors (Griffin and Brown 2010) both on the standard deviations of the innovations, that is the θ j 's, and on the means of the initial value β j , for j = 1, . . . , d. Note that, in the case of the standard deviations, this can equivalently be seen as a double gamma prior on the innovation variances θ j , for j = 1, . . . , d. This prior can be represented as a conditionally normal distribution, with the component specific variance following a gamma distribution, that is The prior variances ξ 2 j and τ 2 j are referred to as local shrinkage parameters, in that they control the strength with which each individual parameter √ θ j and β j is pulled toward zero. The parameters κ 2 and λ 2 are dubbed global shrinkage parameters, as they determine how strongly all parameters are pulled to zero. Since E(θ j |a ξ , κ 2 ) = 2/κ 2 and E(β 2 j |a τ , λ 2 ) = 2/λ 2 , the larger κ 2 and λ 2 , the stronger this effect. Finally, we refer to a ξ and a τ as shrinkage adaption parameters. As a ξ and a τ decrease, marginally more mass is placed around zero and jointly more mass is put on sparse specifications of the model. In particular, setting the local adaption parameters, a ξ and a τ , equal to one results in a Bayesian Lasso (Park and Casella 2008) prior on the √ θ j 's and the β j 's, respectively.
The parameters κ 2 , λ 2 , a ξ , a τ can be learned from the data through appropriate prior distributions. As priors for the global shrinkage parameters, we use Moreover, in order to learn the shrinkage adaption parameters, we generalize the approach taken in Bitto and Frühwirth-Schnatter (2019) and place the following gamma distributions as priors: which corresponds to the exponential prior used in Bitto and Frühwirth-Schnatter (2019) when ν ξ = 1 and ν τ = 1. The parameters ν ξ and ν τ act as degrees of freedom and allow the prior to be bounded away from zero.

Prior on the volatility parameter
In the homoscedastic case we employ a hierarchical prior, where the scale of an inverse gamma prior for σ 2 follows a gamma distribution, that is, with hyperparameters c 0 , g 0 , and G 0 .
In the case of stochastic volatility, the priors on the parameters µ, φ and σ 2 η in Equation (3) are chosen as in Kastner and Frühwirth-Schnatter (2014), that is with hyperparameters b µ , B µ , a φ , b φ , and B σ .

MCMC sampling algorithm
The package shrinkTVP implements an MCMC Gibbs sampling algorithm with Metropolis-Hasting steps to obtain draws from the posterior distribution of the model parameters.
Here, we roughly sketch the sampling algorithm and refer the interested reader to Bitto and Frühwirth-Schnatter (2019) for further details. 6. Sample the error variance σ 2 from an inverse gamma distribution in the homoscedastic case or, in the SV case, sample the level µ, the persistence φ, the volatility of the volatility σ 2 η and the latent log-volatilities h = (h 0 , . . . , h T ) using stochvol (Kastner 2016). When fitting the model under the full hierarchical shrinkage prior defined in Equations (4)-(7), all steps in Algorithm 1 are performed, while steps 4(a), 4(b), 5(a) and 5(b) are skipped in certain prior setups.
One key feature of the algorithm is the joint sampling of the time-varying parametersβ t , for t = 0, . . . , T in step 1 of Algorithm 1. We employ the procedure described in McCausland, Miller, and Pelletier (2011) which exploits the sparse, block tri-diagonal structure of the precision matrix of the full conditional distribution ofβ = (β 0 , . . . ,β T ), to speed up computations.
Moreover, as described in Bitto and Frühwirth-Schnatter (2019), in step 3 we make use of the ancillarity-sufficiency interweaving strategy (ASIS) introduced by Yu and Meng (2011). ASIS is well known to improve mixing by sampling certain parameters both in the centered and non-centered parameterization. This strategy has been successfully applied to univariate SV models (Kastner and Frühwirth-Schnatter 2014), multivariate factor SV models (Kastner, Frühwirth-Schnatter, and Lopes 2017) and dynamic linear state space models (Simpson, Niemi, and Roy 2017).

Running the model
The core function of the package shrinkTVP is the function shrinkTVP, which serves as an R-wrapper for the actual sampler coded in C++. The function works out-of-the-box, meaning that estimation can be performed with minimal user input. With default settings, the TVP model in Equation (1) is estimated in a Bayesian fashion with priors (4) to (7) and the following choice for the hyperparameters: d 1 = d 2 = e 1 = e 2 = 0.001, ν ξ = ν τ = 5 and b ξ = b τ = 10, implying a prior mean of E(a ξ ) = E(a τ ) = 0.1. The error is assumed to be homoscedastic, with prior defined in Equation (8) and hyperparameters c 0 = 2.5, g 0 = 5, and The only compulsory argument is an object of class "formula", which most users will be familiar with (see, for example, the use in the function lm in the package stats (R Core Team 2017)). The second argument is an optional data frame, containing the response variable and the covariates. Exemplary usage of this function is given in the code snippet below, along with the default output. All code was on run on a personal computer with an Intel i5-8350U CPU.
Converting results to coda objects and summarizing draws... Done! Note that the data in the example is generated by the function simTVP, which can create synthetic datasets of varying sizes for illustrative purposes. The inputs theta and beta can be used to specify the true θ 1 , . . . , θ d and β 1 , . . . , β d used in the data generating process, in order to evaluate how well shrinkTVP recaptures these true values. The values correspond to the ones used in the synthetic example of Bitto and Frühwirth-Schnatter (2019).
The user can specify the following MCMC algorithm parameters: niter, which determines the number of MCMC iterations including the burn-in, nburn, which equals the number of MCMC iterations discarded as burn-in, and nthin, indicating the thinning parameter, meaning that every nthin-th draw is kept and returned. The default values are niter = 10000, nburn = round(niter/2) and nthin = 1.
Hierarchical normal-gamma prior fixed G(d 1 , d 2 ) fixed G(e 1 , e 2 ) Normal-gamma prior fixed fixed fixed fixed Bayesian Lasso fixed at 1 fixed fixed at 1 fixed Table 1: Overview of different possible model specifications The user is strongly encouraged to check convergence of the produced Markov chain, especially for a large number of covariates. The output is made coda (Plummer, Best, Cowles, and Vines 2006) compatible, so that the user can utilize the tools provided by the excellent package coda to assess convergence.

Specifying the priors
More granular control over the prior setup can be exercised by specifying all or a subset of the parameters. In addition to changing the hyperparameters given in Section 2.2, the user can fix one or both values of the global shrinkage parameters (κ 2 , λ 2 ) and the shrinkage adaption parameters (a τ , a ξ ), instead of learning them from the data as done in the default setting. The benefit of this is twofold: on the one hand, desired degrees of sparsity and global shrinkage can be achieved through fixing the hyperparameters; on the other hand, interesting special cases arise from setting certain values of hyperparameters. For example, setting the local adaption parameters, a ξ and a τ , equal to one results in a Bayesian Lasso (Park and Casella 2008) prior on the √ θ j 's and the β j 's, respectively. If the user desires a higher degree of sparsity, this can be achieved by setting the shrinkage adaption parameters to a value closer to zero. Table 1 gives an overview of different model specifications. Note that separate prior choices can be made for the variances and the means of the initial values.
In the following, we give some examples of models that can be estimated with the shrink-TVP package. In particular, we demonstrate how certain combinations of input arguments correspond to different model specifications. Note that in the following snippets of code, the argument display_progress is always set to FALSE, in order to suppress the progress bar and other outputs.

Stochastic volatility specification
The stochastic volatility specification defined in Equation (3) can be used by setting the option sv to TRUE. This is made possible by a call to the update_sv function exposed by the stochvol package. The code below fits a model in which all the parameters are learned and the observation equation errors are modeled through stochastic volatility: R> res_sv <-shrinkTVP(y~x1 + x2, data = data, sv = TRUE, + display_progress = FALSE) The priors on the SV parameters are the ones defined in Equation 9, with hyperparameters fixed to b µ = 0 , B µ = 1, a φ = 5, b φ = 1.5 , and B σ = 1.

Specifying the hyperparameters
Beyond simply switching off parts of the hierarchical structure of the prior setup, users can also modify the hyperparameters governing the prior distributions. This can be done through the arguments hyperprior_param and sv_param, which both have to be named lists.
In addition to user defined hyperparameters, unspecified parameters will be set to default values, as defined in Section 3.2.

Posterior inference: Summarize and visualize the posterior distribution
The return value of shrinkTVP is an object of type shrinkTVP_res, which is a named list containing a variable number of elements, depending on the prior specification. For the default model, the values are: 1. the parameter draws of σ 2 in sigma2, 2. the parameter draws of ( √ θ 1 , . . . , √ θ d ) in theta_sr, 3. the parameter draws of β = (β 1 , . . . , β d ) in beta_mean, 4. a list holding d mcmc.tvp objects (one for each β j = (β j0 , . . . , β jT )) containing the parameter draws in beta, 5. the parameter draws of ξ 2 1 , . . . , ξ 2 d , in xi2, 6. the parameter draws of a ξ in a_xi, 7. some acceptance statistics for the Metropolis Hastings step for a ξ in a_xi_acceptance, 8. the parameter draws of τ 2 1 , . . . , τ 2 d in tau2, 9. the parameter draws of a τ in a_tau, 10. some acceptance statistics for the Metropolis Hastings step for a τ in a_tau_acceptance, 11. the parameter draws for κ 2 in kappa2, 12. the parameter draws for λ 2 in lambda2, 13. the parameter draws of C 0 in C0,

the prior hyperparameters in priorvals,
15. the design matrix and the response in model, and 16. summary statistics for the parameter draws in summaries.
When some parameters are fixed by the user, the corresponding output value is omitted. In the SV case, the draws for the parameters of the SV model on the errors are contained in sv_mu, sv_phi and sv_sigma. For details, see Kastner (2016). The two main tools for summarizing the output of shrinkTVP are the summary and plot methods implemented for shrinkTVP_res objects. summary has two arguments beyond the mandatory shrinkTVP_res object itself, namely digits and showprior, which control the output displayed. digits indicates the number of decimal places to round the posterior summary statistics to, while showprior determines whether or not to show the prior distributions resulting from the user input. In the example below, the default digits value of 3 is used, while the prior specification is omitted. The output of summary consists of the mean, standard deviation, median, 95% highest posterior density region and effective sample size (ESS) for the non time-varying parameters.
Statistics of posterior draws of parameters (thinning = 1): The plot method can be used to visualize the posterior distribution estimated by shrinkTVP.
Aside from the shrinkTVP_res object, its arguments are the mandatory pars, a character vector containing the names of the parameters to visualize, and in the case of time-varying parameters nplot, which controls the number of plots to display per page. The names supplied in pars have to coincide with the names of the list elements of the shrinkTVP_res object. The default value is c("beta"), i.e. empirical quantiles of the posterior of β t over time are shown. If there are too many plots to fit on one page, plot will cycle through all parameters to display. It will call either plot.mcmc from the coda package, if the parameter is non time-varying, or plot.mcmc.tvp from the shrinkTVP package for time-varying parameters, passing all additional arguments specified via ... to the respective plotting functions. See the code below for an example and Figure 1 for the corresponding output.

R> plot(res)
To visualize other parameters via the plot method, the user has to change the pars argument. pars can either be set to a single character object or to a vector of characters containing the names of the parameter draws to display. In the latter case, the plot method will display groups of plots at a time, prompting the user to move on to the next series of plots, similarly to how coda handles long plot outputs. Naturally, as all parameter draws are converted to coda objects, any method from this package that users are familiar with (e.g. to check convergence) can be applied to the parameter draws contained in a shrinkTVP_res object. An example of this can be seen in Figure 2, where pars = "theta_sr", changes the output to a graphical summary of the parameter draws of  Figure 3: Visualization of the evolution of the time-varying parameter β jt over time t = 100, . . . , 200 for j = 1, . . . , 3, as provided by the plot method. plot is in turn calling plot.mcmc.tvp on the individual mcmc.tvp objects. Arbitrary arguments can be passed to plot, in this example the x-axis of the plot was restricted with xlim and the label of the y-axis was changed with ylab. The median is always displayed as a black line, and the red dotted lines indicate the pointwise 2.5%, 25%, 75% and 97.5% quantiles, unless otherwise specified.
The plot method will pass any additional arguments on to the respective plotting functions, allowing for some flexibility in the displayed plots. An example of this behavior is demonstrated in the code below and the resulting Figure 3, where the x-axis is truncated with the xlim argument and the label for the y-axis is changed by the ylab argument.
R> plot(res, pars = "beta", xlim = c(100, 200), + ylab = expression(beta[c]), xlab = "t") The plot.mcmc.tvp method displays empirical posterior quantiles of a single time-varying parameter over time. Instead of being called indirectly via plot, the user can also directly call it by calling plot on a single mcmc.tvp object within the shrinkTVP_res object. plot.mcmc.tvp takes one additional mandatory argument, probs, which is a vector of quantiles to plot. plot.mcmc.tvp will automatically add 0.5, i.e. the median, to probs if the user does not. It will pass on any additional arguments specified via ..., allowing the user some flexibility concerning the final plot. In the code below, this behavior is exploited to add a title to the plot, resulting in Figure 4.

Predictive performances and model comparison
Within a Bayesian framework, a natural way to predict a future observation is through its posterior predictive density. For this reason, log-predictive density scores (LPDSs) provide a means of assessing how well the model performs in terms of prediction on real data. The log-predictive density score for time t 0 + 1 is obtained by evaluating at y t 0 +1 the log of the posterior predictive density obtained by fitting the model to the previous t 0 data points. Given the data up to time t 0 , the posterior predictive density at time t 0 + 1 is given by where ψ is the set of model parameters and latent variables up to t 0 + 1. For a TVP model with homoscedastic errors, ψ = (β 0 , . . .β t 0 +1 , √ θ 1 , . . . , √ θ d , β 1 , . . . , β d , σ 2 ), whereas for a TVP model with SV errors, ψ = (β 0 , . . .β t 0 +1 , √ θ 1 , . . . , √ θ d , β 1 , . . . , β d , σ 2 1 , . . . , σ 2 t 0 +1 ). Given M samples from the posterior distribution of the parameters and latent variables, p(ψ|y 1 , . . . , y t 0 ), Monte Carlo integration could be applied immediately to approximate (10). However, Bitto and Frühwirth-Schnatter (2019) propose a more efficient approximation of the predictive density, the so-called conditionally optimal Kalman mixture approximation which is obtained by analytically integrating outβ t 0 +1 from the likelihood at time t 0 + 1.
In the homoscedastic error case, given M samples from the posterior distribution of the parameters and the latent variables up to t 0 , the predictive density approximated by Monte Carlo integration is given by  Figure 4: Visualization of the evolution of the stochastic volatility σ 2 t over time t = 1, . . . , T , as provided by the plot.mcmc.tvp method. The median is always displayed as a black line, and the red dotted lines indicate the pointwise 2.5%, 25%, 75% and 97.5% quantiles, unless otherwise specified.
where the conditional predictive densities of y t 0 +1 are Gaussian with the mean and the variance depending on the MCMC draws. These moments are computed for the mth MCMC iteration from F t 0 +1 = x t 0 +1 Diag( √ θ 1 , . . . , θ d ) and the mean m t 0 and the covariance matrix Σ t 0 of the posterior distribution ofβ t 0 . These quantities can be obtained by iteratively calculating Σ t and m t up to time t 0 , as described in McCausland et al. (2011): The quantities c t , Ω tt and Ω t−1,t for t = 1, . . . , t 0 are given in Appendix A.
For the SV case, it is still possible to analytically integrate outβ t 0 +1 from the likelihood at time t 0 + 1 conditional on a known value of σ 2 t 0 +1 , however it is not possible to integrate the likelihood with respect to both latent variablesβ t 0 +1 and σ 2 t 0 +1 . Hence, at each MCMC iteration a draw is taken from the predictive distribution of σ 2 t 0 +1 = exp(h t 0 +1 ), derived from Equation (3), and used to calculate the conditional predictive density of y t 0 +1 . The approximation of the one step ahead predictive density can then be obtained through the following steps: 1. for each MCMC draw of (µ, φ, σ 2 η ) (m) and h (m) t 0 , obtain a draw of (σ 2 t 0 +1 ) (m) ; 2. calculate the conditionally optimal Kalman mixture approximation as , m t 0 and Σ t 0 are defined as above.
The shrinkTVP package provides a way to calculate the LPDSs using the function shrinkTVP. When LPDS is set to TRUE, the data provided via data or the formula interface are assumed to contain the covariates and response up to time t 0 . The values of x t 0 +1 are then supplied through x_test, while y t 0 +1 is passed to y_test, as shown in the following snippet of code.

Predictive exercise: usmacro dataset
In the following, we provide a brief demonstration on how to use the shrinkTVP package on real data and compare different prior specifications via LPDSs. Specifically, we consider the usmacro.update dataset from the bvarsv package (Krueger 2015). The dataset usmacro.update contains the inflation rate, unemployment rate and treasury bill interest rate for the United States, from 1953:Q1 to 2015:Q2, that is T = 250. The same dataset up to 2001:Q3 was used by Primiceri (2005). The response variable is the inflation rate inf, while the predictors are the lagged inflation rate inf_lag, the lagged unemployed rate une_lag and the lagged treasury bill interest tbi_lag. We construct our dataset as follows: R> library(bvarsv) R> data("usmacro.update") R> # Create matrix of lags and create final data set R> lags <-usmacro.update[1:(nrow(usmacro.update) -1), ] R> colnames(lags) <-paste0(colnames(lags), "_lag") R> us_data <-data.frame (inf = usmacro.update[2:nrow(usmacro.update), "inf"], + lags) In the snippet of code below, we run the default TVP model with the full hierarchical shrinkage prior for 60000 iterations, with a thinning of 10 and a burn-in of 10000, hence keeping 5000 posterior draws.
Statistics of posterior draws of parameters (thinning = 10): It appears clear by looking at Figure 5 that the intercept and the parameter associated with the lagged inflation rate are time-varying, while the parameters associated with the lagged treasury bill interest rate and the lagged unemployment rate are relatively constant. This can be confirmed by looking at the posterior distributions of the corresponding standard deviations, displayed in Figure 6. The posterior densities of the standard deviations associated with the intercept and the lagged inflation are bimodal, with very little mass around zero. This bimodality results from the non-identifiability of the sign of the standard deviation. As a convenient side effect, noticeable bimodality in the density plots of the posterior distribution p( √ θ j |y) of the standard deviations √ θ j is a strong indication of time variability in the associated parameter β jt . Conversely, the posterior densities of the standard deviations associated with the lagged unemployment and the lagged treasury bill interest rate have a pronounced spike at zero, indicating strong model evidence in favor of constant parameters. Moreover, the path of the parameter of the treasury bill interest rate is centered at zero, indicating that this parameter is neither time-varying nor significant.
In order to compare the predictive performances of different shrinkage priors, we calculate one step ahead LPDSs for the last 50 points in time for five different prior choices: (1) the full hierarchical shrinkage prior, (2) the hierarchical normal-gamma prior with fixed a ξ = a τ = 0.1, (3) the normal-gamma prior with a ξ = a τ = 0.1 and κ 2 = λ 2 = 20, (4) the hierarchical Bayesian Lasso, and (5) the Bayesian Lasso with κ 2 = λ 2 = 20. Figure 7 shows the cumulative LPDSs for the last 50 quarters of the usmacro.update dataset. The default prior, that is the fully hierarchical shrinkage prior on both the β j 's and the θ j 's, performs the best in terms of prediction amongst the five fitted models. In Appendix B we show how to obtain LPDSs for different models and points in time, using the packages foreach (Microsoft and Weston 2017) and doParallel (Microsoft and Weston 2018).

Conclusions
The goal of this paper was to introduce the reader to the functionality of the R package shrinkTVP (Knaus et al. 2019). This R package provides a fully Bayesian approach for statistical inference in TVP models with shrinkage priors.
On the one hand, the package provides an easy entry point for users who want to pass on only their data in a first step of exploring TVP models for their specific application context. Running the function shrinkTVP under the default model with a fully hierarchical shrinkage prior with predefined hyperparameters, estimation of a TVP model becomes as easy as using the well-known function lm for a standard linear regression model. On the other hand, exploiting numerous advanced options of the package, the more experienced user can also explore alternative model specifications such as the Bayesian Lasso and use log-predictive density scores to compare various model specifications.  Various examples of the usage of shrinkTVP were given, and the summary and plot methods for straightforward posterior inference were illustrated. Furthermore, a predictive exercise with the dataset usmacro.updade from the package bvarsv was performed, with a focus on the calculation of LPDSs using shrinkTVP. The default model in shrinkTVP showed better performance than its competitors in terms of cumulative LPDSs. While these examples were confined to univariate responses, the package can also be applied in a multivariate context, for instance to the sparse TVP Cholesky SV model considered in Bitto and Frühwirth-Schnatter (2019), exploiting a representation of this model as a system of independent TVP models with univariate responses.
A plan for further versions of the package is to implement additional shrinkage priors for TVP models such as the well-known horseshoe prior (Bhadra et al. 2017).