Modeling Univariate and Multivariate Stochastic Volatility in R with stochvol and factorstochvol

Stochastic volatility (SV) models are nonlinear state-space models that enjoy increasing popularity for fitting and predicting heteroskedastic time series. However, due to the large number of latent quantities, their efficient estimation is non-trivial and software that allows to easily fit SV models to data is rare. We aim to alleviate this issue by presenting novel implementations of four SV models delivered in two R packages. Several unique features are included and documented. As opposed to previous versions, stochvol is now capable of handling linear mean models, heavy-tailed SV, and SV with leverage. Moreover, we newly introduce factorstochvol which caters for multivariate SV. Both packages offer a user-friendly interface through the conventional R generics and a range of tailor-made methods. Computational efficiency is achieved via interfacing R to C++ and doing the heavy work in the latter. In the paper at hand, we provide a detailed discussion on Bayesian SV estimation and showcase the use of the new software through various examples.


Introduction
Time dependent variance is an indispensable ingredient of financial and economic time series modeling. Already Markowitz (1952) concerns himself with methods that take into account heteroskedasticity in a better way than a rolling window estimation. By 1982, two fundamentally different approaches had been developed to cater to these needs. On the one hand, Engle (1982) lays the groundwork for a family of time varying volatility models, most notably the generalized autoregressive conditional heteroskedasticity model (GARCH, Bollerslev 1986). These models feature conditionally deterministic changes in the variance. Taylor (1982), on the other hand, addresses heteroskedasticity in his seminal work with a non-linear latent state space model, later coined the stochastic volatility (SV) model. There, the volatility process evolves in a stochastic manner. Despite some empirical evidence in favor of SV models over their corresponding GARCH counterparts (Jacquier, Polson, and Rossi 1994;Ghysels, Harvey, and Renault 1996;Kim, Shephard, and Chib 1998;Nakajima 2012), SV and its variants enjoy little publicity among practitioners. As Bos (2012) underlines, one reason for this might be the lack of standard software. In response, Kastner (2016) provides a first version of the R (R Core Team 2020) package stochvol but fails to feature conditional non-Gaussianity, asymmetry (the so-called leverage effect), and multivariate generalizations.
We address these shortcomings in the manuscript at hand. First, we extend stochvol (Hosszejni and Kastner 2021) with several practically relevant univariate methods. Second, we introduce the new companion package factorstochvol  which focuses on the multivariate case. The extended stochvol now provides the means for the Bayesian estimation of vanilla SV, heavy-tailed SV, SV with leverage, and heavy-tailed SV with leverage (Harvey and Shephard 1996;Omori, Chib, Shephard, and Nakajima 2007;Nakajima and Omori 2012). Moreover, the package also handles these models naturally when embedded into a linear model or an autoregressive (AR) context. The factorstochvol package implements an efficient method for the Bayesian estimation of the factor SV model (Kastner, Frühwirth-Schnatter, and Lopes 2017). Among other features, the package provides several automatic factor identification schemes, hierarchical shrinkage priors (variations of the normal gamma prior, Griffin and Brown 2010), and an array of intuitive visualization methods for the highdimensional posteriors.
The remainder of this paper is structured as follows. We formally introduce the univariate and the multivariate models in Sections 2 and 3, respectively, including a discussion about prior distributions and a brief overview of the estimation methods. In Section 4, we unveil the new samplers of the stochvol package through three example models. We describe the factorstochvol package in Section 5, and then we conclude.

Univariate SV models
We begin by introducing the vanilla SV model with linear regressors, henceforth simply called the SV model. This is a minor but important extension of the SV model without regressors. We also settle the notation and establish a baseline model that we generalize and reuse throughout the manuscript. Consequently, we proceed with three generalized models: the SV model with Student's t errors (SVt), the SV model with leverage (SVl), and their combination, the SV model with Student's t errors and leverage (SVtl). Finally, we close the section after discussing prior distributions and Markov chain Monte Carlo (MCMC) sampling.

Model specifications
The key feature of the SV model is its stochastic and time-varying specification of the variance evolution. In particular, the log-variance is assumed to follow an AR(1) process. This feature unites the following models.

Vanilla SV with linear regressors
Let y = (y 1 , . . . , y n ) denote a vector of observations. The SV model assumes the following structure for y, where N (b, B) denotes the normal distribution with mean b ∈ R and variance B ∈ R + , and ε t and η t are independent. The log-variance process h = (h 1 , . . . , h n ) is initialized by h 0 ∼ N (µ, σ 2 /(1 − ϕ 2 )). X = (x 1 , . . . , x n ) is an n × K matrix containing in its tth row the vector of K regressors at time t. The K regression coefficients are collected in β = (β 1 , . . . , β K ) . We refer to ϑ = (µ, ϕ, σ) as the SV parameters: µ is the level, ϕ is the persistence, and σ (also called volvol) is the standard deviation of the log-variance.

SV with Student's t errors
Several authors have suggested to use non-normal conditional residual distributions for stochastic volatility modeling. Examples include the Student's t distribution (Harvey, Ruiz, and Shephard 1994), the extended generalized inverse Gaussian (Silva, Lopes, and Migon 2006), (semi-)parametric residuals (Jensen and Maheu 2010;Delatola and Griffin 2011), or the generalized hyperbolic skew Student's t distribution (Nakajima and Omori 2012). We implement Student's t errors for the observation equation in stochvol. Formally, ε t ∼ t ν (0, 1), where ε t and η t are independent. t ν (a, b) is the Student's t distribution with ν degrees of freedom, mean a, and variance b. The single difference between Equation 1 and Equation 2 is that here the observations are conditionally t distributed. Hence, Equation 2 generalizes Equation 1 through the new parameter ν as the Student's t distribution converges in law to the standard normal distribution when ν goes to infinity.

SV with leverage
Propositions for asymmetric innovations include non-parametric distributions (Jensen and Maheu 2014), skewed distributions (Nakajima and Omori 2012), and distributions featuring correlation with the variance process, also called the leverage effect (Harvey and Shephard 1996;Jacquier, Polson, and Rossi 2004). We implement the leverage effect in the stochvol package. Formally, where the correlation matrix of (ε t , η t ) is The vector ζ = (µ, ϕ, σ, ρ) collects the SV parameters. The new parameter compared to Equation 1 is a correlation term ρ which relates the residuals of the observations to the innovations of the variance process. Equation 1 is therefore a special case of Equation 3 with a pre-fixed ρ = 0.

SV with Student's t errors and leverage
Some authors have proposed the combination of t errors with the leverage effect (Jacquier et al. 2004;Omori et al. 2007;Nakajima and Omori 2009). We implement the common generalization of Equation 2 and Equation 3. Formally, where the correlation matrix of (ε t , η t ) is Σ ρ as in Equation 4.

Prior distributions
We a priori assume β ∼ N K (b β , B β ), where N l (b, B) is the l-dimensional normal distribution with mean vector b and variance-covariance matrix B. For small values in the diagonal of B β , this prior enforces shrinkage towards b β ; for large values in the diagonal, the prior turns rather uninformative. By setting b β to the zero vector and B β to a scaled identity matrix, the prior distribution becomes the Bayesian analogue to ridge regression (see, e.g., Park and Casella 2008, for a discussion of this and other shrinkage priors).
The level µ ∈ R is unrestricted, hence we can apply the common µ ∼ N (b µ , B µ ) prior. Depending on the application, a fairly uninformative distribution is the usual choice, e.g., setting b µ = 0 and B µ ≥ 100 for daily asset log returns. In our experience, the exact values of the prior mean and prior variance of µ do not strongly affect the estimation results unless B µ is small. To achieve stationarity in the variance process, a restricted persistence ϕ ∈ (−1, 1) is needed.
To this end, we assume (ϕ + 1)/2 ∼ B(a ϕ , b ϕ ), where B(a ϕ , b ϕ ) is the beta distribution with shape parameters a ϕ and b ϕ . The selection of the shape parameters may be relatively influential with many data sets. In financial applications with daily asset log returns, the variance tends to be highly persistent, i.e., ϕ ≈ 1. Such domain knowledge can be used as prior information by allocating more probability to positive high values of ϕ, e.g., by setting a ϕ 5 and b ϕ ≈ 1.5. As an alternative, when stationarity is not assumed, the untruncated prior ϕ ∼ N (b ϕ , B ϕ ) can also be applied.
The volvol is positive but we would like allow σ to approach 0 as closely as neededthat allows us to be less informative and to improve the estimates. Following Frühwirth-Schnatter and Wagner (2010) and Kastner and Frühwirth-Schnatter (2014), we advocate σ ∼ |N (0, B σ )| instead, where |N (0, B σ )| denotes the half normal distribution. It corresponds to σ 2 ∼ G(1/2, 1/(2B σ )), where G(a, b) is the gamma distribution with shape parameter a and rate parameter b. As an alternative, the commonly applied and convenient conjugate gamma prior on σ −2 can be assumed. However, it bounds σ away from 0 and it is therefore in our view an unsatisfactory choice.
As a last step in fully specifying the vanilla SV model in Equation 1, the variance process is initialized a priori with its stationary distribution, i.e., h 0 ∼ N (µ, σ 2 /(1 − ϕ 2 )). This consistently extends our prior assumptions about h following a stationary AR(1) process.
As an alternative, when stationarity is not assumed, h 0 ∼ N (µ, B h ) can be applied with a constant variance B h .
The SV models with Student's t errors additionally require the prior specification of the degrees of freedom parameter ν. To ascertain interpretability of the scaling exp(h t /2), we ensure finite second moments of y by enforcing ν > 2. As a reviewer recommended, we follow (Geweke 1993) and equip ν with an exponential prior ν − 2 ∼ E(λ ν ), where λ ν is the rate of the exponential distribution.
Finally, in the case of the SV models with leverage, we employ the translated and scaled beta distribution for ρ ∈ (−1, 1) as in Omori et al. (2007), i.e., (ρ + 1)/2 ∼ B(a ρ , b ρ ). We find that the posterior estimates of ρ can be sensitive to its prior distribution, thus, some care is needed when setting the hyperparameters in practice. In our experience, slightly informative choices such as a ρ = b ρ ≈ 4 work well in financial applications.

Estimation
All methods implemented in stochvol and factorstochvol rely on the Bayesian paradigm. 1 Bayesian analysis aims to estimate model parameters through Bayesian updating. By using probability distributions to represent information, Bayes' theorem can be employed to update the prior information to the posterior information by incorporating the observations. This approach has the advantage of providing full uncertainty quantification in a probabilistic framework without relying on asymptotic results; moreover, so-called shrinkage priors can be used to regularize the posterior and guard against overfitting. For an introductory textbook on Bayesian statistics, see, for instance, McElreath (2015).
When the posterior distribution is not available analytically, one customarily resorts to approximations such as perfect simulation (Huber 2015), approximate Bayesian computation (Sisson, Fan, and Beaumont 2018), adaptive Monte Carlo methods (Roberts and Rosenthal 2007), or MCMC methods. When computationally feasible, MCMC is a valuable tool that provides draws from the posterior distribution in question. That way, MCMC approximates the posterior distribution similarly to a histogram approximating a density. For a more in-depth introduction on MCMC methods, see, for instance, Brooks, Gelman, Jones, and Meng (2011).
The estimation algorithm of SV, SVt, SVl, and SVtl all resemble the original methodology developed in Kastner and Frühwirth-Schnatter (2014) for the vanilla SV model. Namely, to draw from the posterior distribution of h efficiently, the MCMC sampler resorts to approximate mixture representations of Equations 1, 2, 3 and 5 similar to the ones in Kim et al. (1998) and Omori et al. (2007). Doing so yields a conditionally Gaussian state space model for which efficient sampling methods are available (Frühwirth-Schnatter 1994;Carter and Kohn 1994). Following Rue (2001) and McCausland, Miller, and Pelletier (2011), we draw the full vector h "all without a loop" (AWOL).
When Student's t errors with unknown degrees of freedom are used, we handle the added complication through the well-known representation of the t distribution as a scale mixture of Gaussians. This requires additional Gibbs and independence Metropolis-Hastings steps documented in Kastner (2015). Furthermore, we deal with the increased complexity in the posterior space of the leverage case by repeated ancillarity-sufficiency interweaving strategies (ASIS, Yu and Meng 2011) steps in the sampling scheme, see Hosszejni and Kastner (2019) for details.
To verify the correctness of the implementation, unit tests are included in the package which can be run by devtools::test() (Wickham, Hester, and Chang 2020). In particular, a variant of Geweke's test (Geweke 2004) is part of the test suite. In this test, we exploit that the sampling distribution of the model parameters during the Geweke test is identical to their preset prior distribution. Therefore, the cumulative distribution function maps the sample to a uniform distribution, which in turn is mapped to a normal distribution using the normal distribution's quantile function. If the user chooses to execute the automated unit tests in stochvol, the system evaluates the thinned and transformed sample using the shapiro.test() function, where the thinning of the sample is done to approximate independent sampling.

Multivariate SV models
A key difficulty accompanying dynamic covariance estimation is the relatively high number of unknowns compared to the number of observations. More precisely, letting m denote the cross-sectional dimension, the corresponding covariance matrix Σ t contains m(m + 1)/2 degrees of freedom, a quadratic term in m. Table 1 illustrates the "curse of dimensionality" for various values of m. One way to break this curse is to use latent factors and thereby achieve a sparse representation of Σ t .

The factor SV model
Latent factor models embody the idea that even high dimensional systems are driven by only a few sources of randomness. These few sources of randomness control a few factors, which in turn account for the interactions between the observations. Moreover, latent factor models provide an efficient tool for dynamic covariance matrix estimation. They allow for a reduction in the number of unknowns. A conventional latent factor model with r factors implies the m free elements of Σ t free elements of Σ t per data point 1 1 1 10 55 5.5 100 5050 50.5 1000 500500 500.5 where rank(Σ t ) = r < m, andΣ t is the diagonal matrix containing the variances of the idiosyncratic errors. The rank assumption on the symmetricΣ t gives rise to the factorizatioň Σ t = ΨΨ , where Ψ ∈ R m×r contains mr − r(r − 1)/2 free elements (see, e.g., the pivoted Cholesky algorithm in Higham 1990). Hence, m(r + 1) − r(r − 1)/2 free elements remain in Σ t , now only linear in m. Table 2 illustrates the "broken curse of dimensionality" for various values of m and r = 4.
In the following, we describe the factor SV model employed in the factorstochvol package. We model the observations y t = (y t1 , . . . , y tm ) as follows.
where f t = (f t1 , . . . , f tr ) is the vector of factors, β = (β 1 , . . . , β m ) is an observation-specific mean, and Λ ∈ R m×r is a tall matrix holding the factor loadings. The covariance matrices Σ t andΣ t are both diagonal representing independent vanilla SV processes.
For a more theoretical treatment of factor SV from a Bayesian point of view, the reader is referred to, e.g., Pitt and Shephard (1999), Aguilar and West (2000), Chib, Nardari, and Shephard (2006), and Han (2006).

Based on Equation 7, we can reformulate Equation 6 as
from which several identification issues are apparent: the order, the sign, and the scale of the factors is unidentified. More specifically, for any generalized permutation matrix 3 P of size r × r, we find another valid decomposition We resolve the ambiguity in the scale of the factors by fixing the level of their log-variance to zero, i.e.,μ j = 0 for j = 1, . . . , r. Sign and order identification can be enforced through restrictions on the factor loadings matrix Λ. Several options are available in factorstochvol for restricting Λ, for details see Section 5.2.

Prior distributions
Priors need to be specified for the mean, the latent log-variance processes, and for the factor loadings matrix Λ. We choose β j ∼ N (b β , B β ), independently for j = 1, . . . , m. For small values of B β , this shrinks β j toward b β ; for large values of B β , the prior is fairly uninformative.
The log-variance processes have the same prior specification as in the univariate case in Section 2.2. For Λ, three types of priors are currently implemented in factorstochvol. All three can be written in the form Λ ij ∼ N (0, τ 2 ij ) independently for each applicable i ∈ {1, . . . m} and j ∈ {1, . . . , r}. First, one can fix all the τ 2 ij s -not necessarily to the same value -a priori. This results in a normal prior for each element of the loadings matrix.
The second type is a hierarchical prior which has been developed to induce more flexible and potentially stronger shrinkage, This distribution is termed normal gamma prior by Griffin and Brown (2010) and implies a conditional variance V(Λ ij | λ 2 i ) of 2/λ 2 i and an unconditional excess kurtosis of 3/a. The value of a is treated as a structural parameter to be fixed by the user, where choosing a small ( 1) enforces strong shrinkage towards zero, while choosing a large ( 1) imposes little shrinkage. The case a = 1 is a special case termed the Bayesian Lasso prior (Park and Casella 2008). The parameter λ 2 i is estimated from the data with λ 2 i ∼ G(c, d). The third type is a slight modification of the second. Because variances in each row of the factor loadings matrix Λ can be seen as "random effects" from the same underlying distribution, the prior in Equation 10 induces row-wise shrinkage with element-wise adaption. Analogously, one could also consider column-wise shrinkage with element-wise adaption, i.e., with the corresponding prior λ 2 j ∼ G(c, d).

Estimation
Bayesian estimation in the factor SV model builds on the univariate vanilla SV implementations in stochvol and features several levels of efficiency boosting. To alleviate the problem of potentially slow convergence in high dimensions, it is carried out via a sampler that utilizes several variants of ASIS. The sampling details implemented in factorstochvol are described in Kastner et al. (2017, using Gaussian priors for the factor loadings) as well as Kastner (2019, using hierarchical shrinkage priors for the factor loadings).
Similarly to stochvol and in an attempt to make computation time bearable even in higher dimensions, factorstochvol's main sampler is written in C++. It uses the R package Rcpp to ease communication between R and C++. The univariate SV parts are borrowed from stochvol and interfaced through its C/C++-level updating function update_fast_sv(). In doing so, moving between interpreted R code and compiled C++ code at each MCMC iteration is avoided.

The stochvol package
The stochvol package provides means for fitting univariate SV, SVt, SVl, and SVtl models via its sampling routines svsample(), svtsample(), svlsample(), and svtlsample(), respectively. In the following, we describe a recommended workflow with stochvol. First, we discuss estimation, visualization, and prediction using default settings. Then, we show how to adapt the values of the prior hyperparameters and how to configure the sampling mechanism.

Preparing the data and running the MCMC sampler
We estimate three models that exemplify the features and the user interface of stochvol. Using the exrates data found in the package, we model the EURCHF exchange rate (the price of 1 euro in Swiss franc) in the period between March 1, 2008 and March 1, 2012 (1028 data points) in three different ways.

AR(1) model with SV residuals
The first example is an AR(1) model with SV residuals, i.e., Equation 1 turns into Using this model, we test whether the exchange rate follows a random walk with SV. In this case, we expect the posteriors of β 0 and β 1 to concentrate around 0 and 1, respectively.

Constant mean model with SVt residuals
The second example is a constant mean model with SVt residuals, i.e., Equation 2 becomes If the returns are heavy-tailed, most of the posterior mass of ν concentrates on low values, e.g., smaller than 20. Otherwise, there is little evidence for high kurtosis.

Multiple regression with SVl residuals
The third example is a multiple regression model with an intercept, two regressors, and SVl residuals; that is, Equation 3 turns into For illustration, we regress EURCHF log returns onto the contemporaneous log returns on EURUSD and EURJPY, the value of 1 euro per US dollar and Japanese yen, respectively.
To estimate a multiple regression model using stochvol, we need to prepare a numeric matrix X of dimension n × K, where rows correspond to time points and columns to covariates. We create an intercept as the first column of X, and we set the second and the third columns to the EURUSD log returns and the EURJPY log returns, respectively; finally, we use the columns of X as covariates in the multiple regression.

Visualizing the results
Often, the joint posterior distribution of model parameters and latent quantities mark the goal of a Bayesian analysis. To inspect it, one can look at summary statistics and various types of visualizations of marginal posterior distributions. Also, it is recommended to examine the through its median (black) and 5% and 95% quantiles (gray). The remaining panels summarize the Markov chains of the parameters µ, ϕ, σ, and ρ. In particular, the middle row presents trace plots and the bottom row shows prior (gray, dashed) and posterior (black, solid) densities.
Markov chain for possible convergence issues -this happens usually by investigating trace plots of posterior quantities. For this reason, inspired by the coda package, stochvol provides its own instances of the R generic functions plot() and summary(). In order to introduce the tools that stochvol provides for analyzing MCMC output, we briefly examine the results of the third example (multiple regression with SVl errors) in the remaining part of the section.
First, we plot the output of the estimation.

R> plot(res_svl, showobs = FALSE, dates = exrates$date[ind[-1]])
The result is shown in Figure 1. We see in the first row a summary of the posterior density of the volatility. Apart from its median, we also receive a quantification of the uncertainty through the 5% and the 95% quantiles at each time point. In the second row, we can follow the evolution of the Markov chain of the SV parameters. In this example, they are µ, ϕ, σ, and ρ. Lastly, we see prior and posterior density plots of the parameters in the third row in gray and black, respectively. They show high persistence and significant leverage.
Next, we observe the AR coefficients.

+ }
The result is shown in Figure 2. On the left hand side, we do not spot any signs of convergence or mixing problems in the trace plots. On the right hand side, we see that none of the posterior densities of β 0 , β 1 , and β 2 concentrate around 0, hence the covariates seem to have an impact on the dependent variable.
As the final step, we print a numeric summary of the estimation results.
R> summary(res_svl, showlatent = FALSE) ## ## Summary of svdraws object ## ## Prior distributions: ## mu~Normal(mean = 0, sd = 100) ## (phi+1)/2~Beta(a = 5, b = 1.5) ## sigma^2~Gamma(shape = 0.5, rate = 0. For brevity, we set showlatent = FALSE in order not to print all the 1027 latent states. The output shows the length of the burn-in and the number of draws, the prior specification of the parameters, and a concise summary of the marginal posterior distributions of the parameters µ, ϕ, σ, and ρ, and additionally of the level of the volatility exp(µ/2) and of σ 2 , and of the vector of regression coefficients β. This posterior summary is a table consisting of columns for the posterior mean and standard deviation, the 5%, 50%, and 95% quantiles. The user can influence the shown quantiles by passing a sequence of values between 0 and 1 to svsample(), svtsample(), svlsample(), or svtlsample() via the argument quantiles.
The last column in the table depicts the so-called effective sample size (ESS), a measure of the quality of a converged MCMC chain. Formally, ESS of a Markov chain C is defined through M/(1 + 2 ∞ s=1 ρ eff (s)), where M is the length of C and ρ eff (s) denotes the autocorrelation function for lag s among the elements of C. In principle, ESS is the sample size of a serially uncorrelated chain bearing the same Monte Carlo error as our (marginal) chain. Intuitively speaking, this means that ESS is the number of independent and identically distributed draws that were acquired and gives a sense of how well our chain has explored the posterior space. Higher values of ESS indicate better mixing.

Prediction with stochvol
We employ our estimated model to predict log returns for the remaining days in the data set. To do so, we first prepare the covariates for the next 24 days and pass them via the argument newdata of the generic predict() function along with the estimation output. Note that we need 25 days of price data to obtain 24 returns.

Rolling window estimation
Inspired by ugarchroll() in the R package rugarch (Ghalanos 2020), we introduce the suite of wrapper functions svsample_roll(), svtsample_roll(), svlsample_roll(), and svtlsample_roll(), built around their corresponding routines svsample(), svtsample(), svlsample(), and, respectively, svtlsample(), to simplify rolling window estimation of SV models. In this estimation method, either a fixed width time window is moving through the time series or a sequence of expanding time windows with the same starting time point covers larger and larger chunks of the observations, and the same model is estimated in all time windows independently. Next, each estimated model is employed for out-of-sample prediction, typically one day to one week ahead of the time window. Lastly, the set of predicted values might be used to evaluate the model fit.
In Bayesian statistics, a natural approach for assessing the predictive power of a model is through its posterior predictive distribution. Its density, also called the predictive density, is defined as where κ collects all unobserved variables, i.e., κ = (µ, ϕ, σ, ρ, ν, h, β) in the most general case of SVtl, and the domain of integration K is the set of all possible values for κ. We follow Geweke and Amisano (2010) in our notation by using a superscript o for the vector of observed values y o [1:t] = (y 1 , y 2 , . . . , y t ) . Equation 12 can be seen as the integration of the predictive likelihood over the posterior distribution of all parameters and therefore it accounts for posterior parameter uncertainty for the predicted values.
The integral in Equation 12 has no closed form and its dimensionality increases with t; it is intractable. Hence, we rely on Monte Carlo integration and we simulate from the posterior predictive distribution. For the evaluation of the predictive density at an observation x = y o t+1 , called the predictive likelihood, we apply the computation where κ (m) denotes the mth posterior sample from the estimation procedure of the SV model. For other applications, the quantiles of the posterior predictive distribution, henceforth the predictive quantiles, might be of interest. We estimate the q% quantile through random variates simulated from y t+1 ∼ p(y t+1 | y o [1:t] ), which we acquire by repeating two steps for m = 1, . . . , M : Step 1. Simulate κ (m) from the SV posterior p(κ | y o [1:t] ), and Step 2. Simulate y Lastly, we take the q% quantile of the sample vector (y t+1 ) as the approximate q% quantile of the predictive density. We implement the estimation of both the predictive likelihood and predictive quantiles in stochvol.
All four rolling window routines svsample_roll(), svtsample_roll(), svlsample_roll(), and svtlsample_roll() bear the same programming interface. They expect as their first argument the input data y [1:L] , which is of length L. For estimating the SV model in each time window j = 1, . . . , J in the moving or expanding window scheme, the sub-vector y [j:(t+j−1)] , or, respectively, y [1:(t+j−1)] , is taken as data and is used to predict n ahead ≥ 1 time steps ahead. The width t of the first time window can be determined from L, J, and n ahead . The following example demonstrates how the rolling window sampling routines can be called in stochvol.
R> set.seed(5) R> res <-svsample_roll(CHF_logret, n_ahead = 1, forecast_length = 30, + refit_window = "moving", calculate_quantile = c(0.01, 0.05), + calculate_predictive_likelihood = TRUE) Argument n_ahead is used to set n ahead , forecast_length is used to set J, and refit_window expects either "moving" or "expanding" to set the rolling window scheme to moving or expanding, respectively. Argument calculate_quantile expects a vector of numbers between 0 and 1; the numbers are interpreted as the quantiles to be predicted. Furthermore, if calculate_predictive_likelihood is set to TRUE, the function estimates the predictive likelihood. Lastly, the output res is a list of length J, i.e., one element for each time window. It contains the respective posterior quantile and predictive likelihood results together with all posterior parameter draws for κ.

Specifying the prior hyperparameters
As discussed in Section 2.2, the prior distributions need to be specified before the estimation process can start. Concerning the common model parameters µ, ϕ, and σ, all of svsample(), svtsample(), svlsample(), and svtlsample() expect through their input arguments priormu, priorphi, and priorsigma values for (b µ , B µ ), (a ϕ , b ϕ ), and B σ , respectively. Furthermore, all sampling functions accept the argument priorbeta to set an independent prior for the regression coefficients by providing (b β , s β ), where b β and s β are the common mean and, respectively, the common standard deviation. For a general multivariate normal distribution, the specify_priors() interface exists, which we detail later in this Section. The prior for ν can be influenced in svtsample() and svtlsample() by passing λ ν as the argument priornu. Finally, svlsample() and svtlsample() take the numeric sequence (a ρ , b ρ ) through the input argument priorrho.
The code snippet below shows all the default values of the prior hyperparameters.
All input arguments for specify_priors are optional, their default values and how they are used is seen below.

Setting up the Markov chain
When conducting Bayesian inference using an MCMC sampling scheme, the number of draws from the posterior distribution, the length of the so-called burn-in phase, the initial values of the Markov chain, and the various strategies of storing the results are all of general interest. The input arguments draws and burnin settle the first two points. A sample size of burnin + draws is acquired from the posterior distribution out of which the first burnin number of draws are thrown away. The default is to draw 10000 elements after a burnin of 1000 for SV models without leverage, and draw 20000 elements after a burnin of 2000 for SV models with leverage, which in our experience is enough for most applications.
As for the initial values, startpara and startlatent provide a way to set them. The argument startpara is expected to be a named list mapping parameter names to starting values, and startlatent must be a sequence of length m that contains starting values for h. Default values are set to be the prior mean for ϕ, σ, ν, and ρ, these have only minor influence on the Markov chain. The default value for β is the ordinary least squares estimator (X X) −1 X y, where X denotes the regression design matrix and y denotes the vector of observations. After setting β, the level of log-variance µ is initialized according to the Bayesian linear regression where ξ t ∼ N (−1.27, 4.934). Equation 14 results from the first line of Equation 1 by fixing h t at its stationary expected value µ and then taking x → log(x 2 ) of both sides. The homoskedastic error term ξ t is acquired as the Laplace approximation to log(ε 2 t ) (Harvey and Shephard 1996). At the end, by default all values of the vector startlatent are set to the initial value of µ.
It is customary to start independent Markov chains in parallel and stochvol provides facilities for that in all of its sampling procedures. The argument n_chains is expected to be a positive integer, it sets the number of independent chains. Additionally, arguments parallel, n_cpus, and cl can be used to control parallelism used by stochvol. To overwrite the default sequential execution strategy, parallel is to be set either to "snow", to employ the so-called "SNOW" clusters, or to "multicore" to use the "multicore" type computation (R Core Team 2020). Next, argument n_cpus should be set to the physical number of parallel processing units to be used. Finally, in case "SNOW" is applied, the sampling routines optionally accept an already running "SNOW" cluster through argument cl.
As mentioned earlier, the sampling algorithms for the latent states h in stochvol rely on a Gaussian mixture approximation as in Omori et al. (2007) and Kastner and Frühwirth-Schnatter (2014). The approximation tends to be very good, therefore the default setting is not to correct for model misspecification. However, this correction can be enabled in all of the sampling routines through the expert argument as shown for svsample() in the following.
Lastly, stochvol provides three ways to economize storage during and after the execution of the sampler. Setting the integer argument thinpara to ι tells the sampler to store only every ιth draw of the vector of parameters, and supplying a value for thinlatent does the same for h. Finally, one has the opportunity not to store the full vector h but only its last value by setting keeptime = "last". The default behavior is to store every draw after the burn-in phase. The most common workflow of using factorstochvol for fitting multivariate factor SV models consists of the following steps: (1) Prepare the data, (2) decide on an identification structure, (3) specify the prior hyperparameters, (4) run the sampler, (5) investigate the output and visualize the results, and (6) predict (if required). These steps are described in detail in the following sections.

Preparing the data
The workhorse in factorstochvol is the sampling function fsvsample(). It expects the data to come in form of a matrix Y = (y 1 , . . . , y n ) with n rows and m columns. For illustration, we use the exchange rate data set in stochvol which contains 3140 daily observations of exchanges rates for 23 currency pairs against EUR, ranging from March 3, 2000 to April 4, 2012. To keep the analysis simple and computation times moderate, we however only model the last 1001 days of the first six series in alphabetical order (Australian dollar, Canadian dollar, Swiss franc, Czech koruna, Danish krone, Great British pound) for further analysis. Instead of using the nominal exchange rates we compute log returns. This leaves us with a data set of size n = 1000 and m = 6. The data is prepared using the code snippet below and visualized in Figure 4 using the zoo package (Zeileis and Grothendieck 2005).

Deciding on an identification structure
The likelihood in factor models is invariant to certain factor transformations such as reordering of factors and their loadings or sign switches thereof. In addition to this, it is often multimodal. Consequently, identifying the factor loadings is far from trivial. The most common way to address this issue in factor SV models is to impose a lower-diagonal factor loadings matrix where all elements above the diagonal are set to zero (e.g., Aguilar and West 2000;Chib et al. 2006;Han 2006;Zhou, Nakajima, and West 2014). To use this constraint in factorstochvol, the argument restrict = "upper" can be passed to the main sampling function fsvsample(). Evidently, this practice imposes an order dependence, as, e.g., the first variable is not allowed to load on anything else but the first factor.
A rather ad hoc method for automatically ordering the data is implemented in the helper function preorder(). After a maximum likelihood factor model fit to the data (using factanal() from the stats package with the default varimax rotation), the series are ordered as follows: The variable with the highest loading on factor 1 is placed first, the variable with the highest loading on factor 2 second (unless this variable is already placed first, in which case the variable with the second highest loading is taken), et cetera. For the data set at hand, this would imply the following ordering for a two-factor model.

R> preorder(y, factors = 2)
## [1] 2 3 1 4 5 6 According to this algorithm, the second series should be placed first and the third series should be placed second. Thereafter, the alphabetical ordering remains.
To achieve this effect without reordering the data, a logical matrix of size m×r can be passed to fsvsample() via restrict, where the entry TRUE means that this element is restricted to zero; FALSE means that it is to be estimated from the data. Similarly to preorder(), the function findrestrict() tries to automate this procedure. Again, the maximum likelihood estimates from a static factor analysis are used; however, findrestrict() uses a slightly different algorithm than the one above: The variable with the lowest absolute loadings on factors 2, 3, . . . , r (relative to factor 1) is determined to lead the first factor, the variable with the lowest absolute loadings on factors 3, 4, . . . , r (relative to factors 1 and 2) is placed second, et cetera. Below is the result for the data set at hand.
R> findrestrict(y, factors = 2) If fsvsample() is called with the argument restrict = "auto", it automatically invokes findrestrict() with the appropriate number of factors. Using restrict = "none" (the default) causes the sampler not to place any constraints on the loadings matrix; thus, the resulting posterior draws may be unstable or suffer from multiple local modes. If, however, inference on the factor loadings themselves is not the primary concern of the analysis, leaving the factor loadings unidentified may be the preferred option. This is in particular the case when inference for the covariance matrix is sought, as this only depends on Λ through the rotation-invariant transformation of Equation 9. For a more elaborate discussion of these issues, we refer the reader to Sentana and Fiorentini (2001) who discuss automatic identification through heteroskedasticity. A comparison of log predictive scores under different identification schemes for factor SV models is given in Kastner et al. (2017); see also Frühwirth-Schnatter and Lopes (2018) for related issues in static factor models. To continue with the current example, we chose not to place any a priori restrictions on the factor loadings matrix while using a row-wise normal-gamma shrinkage prior on the factor loadings matrix (cf. Kastner 2019).

Specifying prior hyperparameters
Apart from the obvious prior choice about the number of factors and the identification scheme discussed above, a number of hyperparameter choices are available in factorstochvol. Regarding the log-variance processes, the interface is analogous to that of svsample(). In the following, i = 1, . . . , m and j = 1, . . . , r index the idiosyncratic and the factor log-variance processes, respectively. The pair of common prior hyperparameters (b β , B β ) can be passed as a sequence of length two to priorbeta. The common prior ofμ i can be set by passing a sequence of length two -the mean and the standard deviation of the normal distribution -to priormu; the common priors ofφ i andφ j can also be set by passing sequences of length twothe parameters of the corresponding beta distribution -to priorphiidi and to priorphifac, respectively; similarly, the common priors ofσ i andσ j can be specified via the arguments priorsigmaidi and priorsigmafac, respectively, that accept as positive numbers the scale B σ of the corresponding gamma distribution.
As discussed in Section 3.2, factorstochvol offers three specifications as priors for Λ, controlled through the argument priorfacloadtype. To use the first option (priorfacloadtype = "normal"), one needs to fix the values of τ ij a priori. The user can pass these fixed values to fsvsample() via the argument priorfacload, either as an m × r matrix with positive entries or as a single positive number which will be recycled accordingly. For the second option, the normal gamma prior with row-wise or column-wise shrinkage (priorfacloadtype = "rowwiseng" and priorfacloadtype = "colwiseng", respectively), the value of argument priorfacload is then interpreted as the shrinkage parameter a. Both specifications of the normal gamma prior need the values c and d. They can be set as a two-element vector passed to the argument priorng.

Running the MCMC sampler
Running the sampler corresponds to invoking fsvsample(). Apart from the prior settings discussed above, its most important arguments are listed below with the default value in brackets. For a complete list of all arguments and more details, see ?fsvsample.
• y: the data; • factors [1]: the number of factors; • draws [1000]: the number of MCMC samples to be drawn after burnin; • thin [1]: the amount of thinning (every thin th draw is kept); • burnin [1000]: the length of the burn-in period, i.e., the number of MCMC draws to be discarded before the samples are considered to emerge from the stationary distribution, • zeromean [TRUE]: a logical value indicating whether β is to be estimated from the data or whether β is set to zero (the default); • keeptime ["last"]: either "all", meaning that all latent log volatilities are being monitored at all points in time, or "last", meaning that the latent log volatility draws are only stored at t = n, the last point in time; the latter setting is the default to avoid excessive memory usage in higher dimensions; • heteroskedastic [TRUE]: indicator(s) to turn off stochastic volatility for the idiosyncratic variances, the factor variances, or both; • samplefac [TRUE]: indicator to turn off sampling of the factors; useful to work with observed instead of latent factors (see Kastner 2019, for a use case of this); • runningstore [6]: to avoid having to store all MCMC draws, fsvsample's default is to compute and store the first two ergodic moments of some interesting quantities (namely log variances, factors, volatilities, covariance matrices, correlation matrices, communalities) only; the default (runningstore = 6) is to compute and store everything; however, one can set runningstore to a lower number to save computation time; the argument runningstoremoments [2] can further be used to modify the number of moments to be stored; • runningstorethin [10]: indicates how often ergodic moments should be calculated, where 1 means that this should be done at every iteration and higher numbers lessen both runtime as well as accuracy; • quiet [FALSE]: a logical indicator determining the verbosity of fsvsample.
For our illustrative example, most settings are left at their default values. The number of factors is increased from one to two, instead of 1000 we sample 10000 draws, we estimate a constant mean, a thinning of 10 is used, and quiet is set to TRUE.

Investigating the output and visualizing the results
The resulting object • draws of certain posterior quantities such as the factors f , the factor loadings Λ, the various factor and idiosyncratic SV parameters, the latent factor and idiosyncratic log variancesh andh, and the intercept β, • configuration settings such as the number of draws, potential restrictions on the loadings matrix, prior hyperparameters, etc., • running moments (such as means and standard deviations) of quantities of interest, depending on the values of runningstore and runningstoremoments specified when calling fsvsample(), • the data input y.
For more details, please investigate str(res) and/or ?fsvsample.
Using covmat(), one can extract the MCMC draws of the implied covariance matrices for all points in time which have been stored during sampling. By default, this is the last point in time (keeptime = "last"), and thus R> dim(cov_n <covmat(res)) ## [1] 6 6 1000 1 shows that we have stored 1000 posterior draws of a 6 × 6 covariance matrix at one point in time, t = n = 1000. To check convergence, one can take a look at the trace plot and the autocorrelation function of the log determinant, i.e.,
To assess the mixing speed of each individual covariance matrix element, one can check, e.g., the estimated effective sample size (out of 1000 draws kept) which is implemented in coda. Again, no major convergence problems are apparent.  Assuming that runningstore was set sufficiently high when sampling, several convenience functions can be used for quick visualizations without having to post-process the MCMC draws. For example, to visualize the time-varying correlation matrices, consider R> corimageplot(res, these = seq(1, n, length.out = 3), plotCI = "circle", + plotdatedist = 2, date.cex = 1.1) which produces the three estimated posterior correlation matrices depicted in Figure 6. Setting plotCI = "circle" visualizes posterior uncertainty -inner and outer radii correspond to the posterior mean plus/minus two standard deviations, respectively.
To get an idea about how the marginal volatilities evolve over time, voltimeplot() can be used. To exemplify, R> palette(RColorBrewer::brewer.pal(6, "Dark2")) R> cortimeplot(res, 1) R> cortimeplot(res, 2) yields the estimated pairwise correlations in Figure 8. While, relative to EUR, the estimated correlation between AUD and CAD appears to be relatively stable over time, correlations with CHF can become negative at times. To visualize the communalities, i.e., the proportions of variances explained through the latent factors, invoke R> comtimeplot(res, maxrows = 6) which yields the estimated communalities in Figure 9.
To gain an even deeper understanding of the estimated model, we now turn towards examining the latent factors and their variances themselves. To visualize the loadings, the functions facloadpairplot(), facloadcredplot(), facloadpointplot(), facloadtraceplot(), and facloaddensplot() are available; the former two are exemplified in Figure 10. Moreover, we can see the factor log variances produced through logvartimeplot(res, show = "fac"). Similarly, logvartimeplot(res, show = "idi") produces plots of the idiosyncratic log variances which are displayed in Figure 11.
In order to provide some guidance when it comes to selecting the number of factors, factorstochvol ships with the function evdiag(). It computes and visualizes the eigenvalues of Λ Λ which can be used as a rough guide in analogy to a scree plot for static factor models.
To use it, one can fit a model with a relatively large number of factors and assess the importance of each of these (in descending order). For example, the code below produces Figure 12, hinting at a model with no more than four factors.

Predicting covariances, correlations, and future observations
One of the main use cases of factorstochvol might be to predict covariance and correlation matrices of time series. To this end, predcov() and predcor() yield draws from the posterior predictive distribution of these, defined in analogy to Equation 12. For instance, the code below can be used to obtain one-step-ahead posterior predictive means and standard deviations for the correlation matrix on April 5, 2012 (using data up to April 4 only).

Eigenvalues of crossprod(facload)
Posterior draws Posterior means Figure 12: Eigenvalues of Λ Λ which can be used as a rough guide to selecting the number of factors.
To obtain draws from the posterior predictive distribution of new data points, one can simply draw from the corresponding mixture of multivariate normals. In Figure 13, these draws are visualized via heatpairs() from LSD (Schwalb, Tresch, Torkler, Duemcke, Demel, Ripley, and Venables 2018).

Summary and discussion
We extended the work of Kastner (2016) to other SV models, including the univariate heavytailed SV, the SV model with leverage, and the multivariate factor SV model. We showcased the features that are the most important to end users in R: estimation through the sampler functions, visualization, summary, and prediction methods. Due to its more involved nature, however, we did not include the description of the C++ interface. Two functions called update_fast_sv() and update_general_sv() are exported and programmers have the possibility to access the samplers in stochvol directly from C++ after linking to the compiled package. For usage examples, see the implementations of factorstochvol or shrinkTVP (Knaus, Bitto-Nemling, Cadonna, and Frühwirth-Schnatter 2020). Source code is available in the GitHub repositories https://github.com/gregorkastner/stochvol and https://github.com/gregorkastner/factorstochvol.