Dimension Reduction for Time Series in a Blind Source Separation Context Using R

Multivariate time series observations are increasingly common in multiple fields of science but the complex dependencies of such data often translate into intractable models with a large number of parameters. An alternative is given by first reducing the dimension of the series and then modelling the resulting uncorrelated signals univariately, avoiding the need for any covariance parameters. A popular and effective framework for this is blind source separation. In this paper we review the dimension reduction tools for time series available in the R package tsBSS. These include methods for estimating the signal dimension of second-order stationary time series, dimension reduction techniques for stochastic volatility models and supervised dimension reduction tools for time series regression. Several examples are provided to illustrate the functionality of the package.


Dimension reduction and BSS for time series
In many fields of applied science multivariate time series, x_t = (x_{t,1}, . . . , x_{t,p})', t = 1, . . . , T, are collected. Examples include geophysical time series, telecommunications data, financial time series and biomedical time series, e.g., EEG, MEG and fMRI. Such data are often characterized by two main features, which both pose problems for multivariate time series modelling. First, the p components in multivariate time series data are often correlated and therefore the individual time series cannot be modelled using univariate time series models. Models such as the multivariate autoregressive moving average (ARMA) and multivariate generalized autoregressive conditional heteroskedasticity (GARCH) models include huge numbers of parameters, and model fitting becomes computationally impractical unless the models are substantially simplified. Second, the data can be high-dimensional and might contain an unknown, possibly large, number of noise components. As an example, consider current biomedical datasets where the number of time series components varies from hundreds to millions, and the main aim of the analysis is to separate the signals of interest from noise. For such high-dimensional data, fitting multivariate time series models might become impossible, and is at the very least unreasonable, as it is not sensible to assign a huge number of model parameters to noise components. These two features often encountered in multivariate time series data motivate us to consider dimension reduction methods suitable for time series.
In this paper we illustrate how blind source separation (BSS) methods (see for example Comon and Jutten 2010; Nordhausen and Oja 2018) can be used for dimension reduction in a time series context. Recall first that a (linear) BSS model assumes that the observable p-variate time series x = (x_t)_{t=0,±1,±2,...} satisfy

x_t = µ + Ω z_t, t = 0, ±1, ±2, . . . , (1)

where µ ∈ R^p is a p-variate location vector, Ω ∈ R^{p×p} is a full-rank mixing matrix and z = (z_t)_{t=0,±1,±2,...} is a p-variate latent time series satisfying E(z_t) = 0, Cov(z_t) = I_p, and Cov_τ(z_t) = E(z_t z'_{t+τ}) = Λ_τ is a diagonal matrix for all τ = 1, 2, . . . .
The p time series in z are thus weakly stationary and uncorrelated, and the model is often referred to as a second-order source separation (SOS) model. If we make the stronger assumption that the p time series in z are independent, then the model is called the independent time series model. The BSS approach to dimension reduction is often preferred since the obtained components can be treated as independent, allowing univariate fitting and prediction methods. For example, Van der Weide (2002) and Broda and Paolella (2009) have used this idea for multivariate GARCH modelling. Furthermore, with BSS one can also decide componentwise whether an estimated component is relevant or not. For related research, see Matteson and Tsay (2011).
There are two different approaches to defining signal and noise in BSS. One commonly used noisy BSS model includes an additive external noise vector in model (1), that is, we assume that x_t = µ + Ω z_t + ε_t, t = 0, ±1, ±2, . . . , where ε_t ∈ R^p is a vector of white noise (Comon and Jutten 2010). The other approach, and the one we prefer, differs from the noisy BSS model in that it assumes that q of the p sources are signal and p − q are internal noise components. In particular, we assume that the source vector z_t can be partitioned into two parts, z_t = (s_t', w_t')', so that the first part s_t ∈ R^q includes the sources of interest while w_t ∈ R^{p−q} represents the noise components. To be more specific, we assume the following BSS model

x_t = µ + Ω z_t = µ + Ω_1 s_t + Ω_2 w_t, t = 0, ±1, ±2, . . . , (2)

where µ ∈ R^p is a location vector, Ω = (Ω_1, Ω_2) ∈ R^{p×p} is a full-rank mixing matrix with Ω_1 ∈ R^{p×q} and Ω_2 ∈ R^{p×(p−q)}, and the p-variate latent time series z_t = (s_t', w_t')' satisfies

(A1) E(z_t) = 0 and Cov(z_t) = I_p,
(A2) s_t ∈ R^q and w_t ∈ R^{p−q} are independent,
(A3) Cov_τ(z_t) = [Λ_τ, 0; 0, 0], a block-diagonal matrix where Λ_τ ∈ R^{q×q} is a diagonal matrix for all τ = 1, 2, . . . .

Notice that in model (2) no distributional assumptions are made for the noise components w_t; we simply require that they do not show any linear autocorrelations. Examples of such noise components include white noise as well as components exhibiting volatility clustering. The latter components have zero linear autocorrelations but non-zero quadratic autocorrelations. Notice also that in the BSS model (2) the components of s_t can be identified at most up to signs and permutations, and Ω_1 at most up to the signs and permutation of its columns. The matrix Ω_2 is identifiable only up to post-multiplication by an orthogonal (p − q) × (p − q) matrix.
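To make the signal-plus-noise model concrete, the following base-R sketch (our own illustration, not part of tsBSS) simulates a series of the form (2) with q = 2 autocorrelated signal components and one internal white-noise component:

```r
# Simulate a series from a model of form (2): two latent AR(1) signal
# components s_t, one white-noise component w_t, mixed by a full-rank
# matrix Omega. All names here are illustrative.
set.seed(1)
n  <- 500
s1 <- as.numeric(arima.sim(list(ar = 0.8),  n = n))  # signal 1
s2 <- as.numeric(arima.sim(list(ar = -0.5), n = n))  # signal 2
w  <- rnorm(n)                                       # internal noise
Z  <- scale(cbind(s1, s2, w))      # approximately E(z_t) = 0, Cov(z_t) = I_3
Omega <- matrix(c(2, 1, 0,
                  1, 3, 1,
                  0, 1, 2), nrow = 3, byrow = TRUE)  # full-rank mixing matrix
X <- Z %*% t(Omega)                # observed series; rows are x_t' = (Omega z_t)'
```

Since w_t has no linear autocorrelation, the methods discussed below should assign it an "eigenvalue" close to zero.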
We note that other approaches besides BSS are frequently used for the dimension reduction of multivariate time series data. We next list some of the most well-known of these and discuss the benefits of BSS over them. The simplest options are arguably classical principal component analysis (PCA) and factor analysis, which however treat the data as independent observations. As the ability to use temporal information is crucial in analyzing time series, a number of extensions of the two methods that allow for this have been suggested in the literature. The most prominent example is given by dynamic factor models, see Stock and Watson (2010); Ensor (2013) and the references therein, where the latent factors are most often assumed to obey a vector autoregressive structure. This is in strict contrast to the BSS model (1) where no structural assumptions on the latent variables are made (beyond temporal uncorrelatedness). As an example of this versatility, consider the methods SOBI and vSOBI discussed in Section 2 and Section 3, respectively, which both comply with the BSS framework but the former assumes the sources to be standard linear processes and the latter that they are series exhibiting stochastic volatility. For a more recent extension of PCA, see the time series PCA (Chang, Guo, and Yao 2018) which is actually based on a slight modification of the model (1) and thus constitutes a form of BSS. Other contenders include, e.g., non-negative matrix factorizations (Cheung, Devarajan, Severini, Turolla, and Bonato 2015; Févotte, Smaragdis, Mohammadiha, and Mysore 2018) and autoencoders (Bianchi, Livi, Mikalsen, Kampffmeyer, and Jenssen 2019). The involved complicated constraints and structures make such models difficult to analyze theoretically, whereas the relatively straightforward form of the BSS model makes it possible to obtain convergence rates and other theoretical guarantees (Miettinen, Illner, Nordhausen, Oja, Taskinen, and Theis 2016).
To summarize, the advantages of BSS over its competitors lie in its ability to model temporal data while making minimal assumptions on the latent variables and in the BSS model which is simultaneously complicated enough to be useful and simple enough to study theoretically.
This paper is organized as follows. In Section 2 we recall two classical second-order source separation methods and show how they can be used to estimate the dimension of the interesting source components in model (2). In Section 3 we consider dimension reduction in a case where the time series exhibit stochastic volatility and our main interest is finding out whether some of the components lack stochastic volatility features. In Section 4 several supervised dimension reduction methods for multivariate time series are discussed. Section 5 reviews the existing packages available in R (R Core Team 2021) for multivariate time series and BSS as well as describes the functionality of the R package tsBSS (Matilainen, Croux, Miettinen, Nordhausen, Oja, and Taskinen 2021). In Section 6 several examples are given to illustrate the functionality of the tsBSS package. The package tsBSS is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=tsBSS.

Dimension estimation using AMUSE and SOBI
In this section we illustrate how two classical second-order source separation methods, AMUSE (algorithm for multiple unknown signals extraction) by Tong, Soon, Huang, and Liu (1990) and SOBI (second-order blind identification) by Belouchrani, Abed-Meraim, Cardoso, and Moulines (1997), can be used to estimate the dimension of important components in the BSS model. Let us start by recalling the two methods. Hence, assume for a moment that x t follows a BSS model (2) with q = p, that is, there are no noise components included in the model. Assume also, without loss of generality, that µ = 0. The aim of second-order source separation is to find an unmixing matrix W ∈ R p×p such that the component time series in W x t are standardized and mutually uncorrelated, that is, W x t = z t up to location shifts, sign changes and permutations of the components.
Write now x_t^st = Cov(x_t)^{-1/2} x_t for a standardized time series. Cardoso and Souloumiac (1993) showed that the standardized series then satisfy x_t^st = U z_t, where U ∈ R^{p×p} is an unknown orthogonal matrix. The standardization thus solves the BSS problem up to rotation, that is, the unmixing matrix is given by W = U' Cov(x_t)^{-1/2}. In AMUSE, the rotation matrix U is found simply using the eigendecomposition of the autocovariance matrix with a given lag τ, as, by assumption, Cov_τ(x_t^st) = U Cov_τ(z_t) U' = U Λ_τ U' for any lag τ. For the statistical properties of AMUSE estimators, see Miettinen, Nordhausen, Oja, and Taskinen (2012).
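For intuition, a minimal AMUSE can be written in a few lines of base R. This is a sketch of the idea only, not the tsBSS implementation; in practice the lag-τ autocovariance matrix is symmetrized before the eigendecomposition, as below, and all names are ours:

```r
# Minimal AMUSE sketch: whiten the series, then rotate with the
# eigenvectors of the symmetrized lag-tau autocovariance of the
# whitened series.
amuse_sketch <- function(X, tau = 1) {
  X  <- sweep(X, 2, colMeans(X))                        # center
  e0 <- eigen(cov(X), symmetric = TRUE)
  W0 <- e0$vectors %*% diag(1 / sqrt(e0$values)) %*% t(e0$vectors)  # Cov(x_t)^(-1/2)
  Xw <- X %*% W0                                        # whitened series x_t^st
  n  <- nrow(Xw)
  M  <- crossprod(Xw[1:(n - tau), , drop = FALSE],
                  Xw[(tau + 1):n, , drop = FALSE]) / (n - tau)  # lag-tau autocovariance
  M  <- (M + t(M)) / 2                                  # symmetrize
  U  <- eigen(M, symmetric = TRUE)$vectors
  list(W = t(U) %*% W0,                                 # unmixing matrix estimate
       S = Xw %*% U)                                    # estimated sources (up to sign/order)
}
```

Applied to a mixture of two AR(1) series with distinct lag-1 autocorrelations, the recovered sources match the true ones up to sign and permutation.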
The drawback of the AMUSE procedure is that, in order to separate all p source components, we need to assume that for the chosen lag τ the eigenvalues in Λ_τ are distinct. As this may not hold in practice, it is often better to use approximate joint diagonalization of several autocovariance matrices as suggested in Belouchrani et al. (1997). Their SOBI method is a natural extension of AMUSE, the solution being given by the orthogonal matrix U which maximizes Σ_{τ∈T} ||diag(U' Cov_τ(x_t^st) U)||², that is, we find a rotation that makes the autocovariance matrices defined by a set of lags T = {τ_1, . . . , τ_K} "as diagonal as possible". Different algorithms for approximate joint diagonalization naturally yield different SOBI solutions. The statistical properties of various SOBI estimators are discussed in Miettinen, Nordhausen, Oja, and Taskinen (2014); Illner et al. (2015); Miettinen et al. (2016), where the performances of the estimators are also compared. In most cases, the SOBI method outperforms the AMUSE method.
For more details about joint diagonalization in general, and especially in R, see Miettinen, Nordhausen, and Taskinen (2017) and the references therein. In the following, joint diagonalization will be a recurring theme in most of the discussed methods and we will always use for this purpose an algorithm based on Givens (or Jacobi) rotations as suggested by Clarkson (1988) and implemented in R, for example, in the package JADE (Miettinen et al. 2017). The joint diagonalization of K symmetric matrices of size p × p is an operation of complexity O(K p³) (whereas the ordinary eigendecomposition has K = 1 and complexity O(p³)), and as such some care has to be taken when applying the methods to data sets where the number of variables is measured in hundreds rather than in tens. Moreover, as is common with eigendecomposition-type algorithms, the possibility of numerical issues should be taken into account when working with data sets having p in the high hundreds.
Consider then the BSS model (2) with q < p, that is, the model consists of mutually uncorrelated and stationary signal components as well as of noise components which are not of interest. When using AMUSE, the eigendecomposition of an autocovariance matrix of the standardized vectors takes the form Cov_τ(x_t^st) = (U_1, U_2) [Λ_τ, 0; 0, 0] (U_1, U_2)' = U_1 Λ_τ U_1', where U_1 ∈ R^{p×q}, U_2 ∈ R^{p×(p−q)}, and Λ_τ ∈ R^{q×q} is a diagonal matrix consisting of the q marginal τ-th autocovariances of the latent series. As the noise components do not show any linear autocorrelations, the last p − q diagonal values, those corresponding to U_2, are equal to zero. The aim is thus to find a rotation matrix U_1 so that the latent source components are given by s_t = U_1' x_t^st up to sign changes and permutation. In Matilainen, Nordhausen, and Virta (2018), the noise components are identified simply by investigating the diagonal elements of Λ̂_τ², where the squaring is used for ordering the sources in decreasing order of interestingness. The noise components are then the last p − q components having zero eigenvalues. For finite samples the eigenvalues are naturally never exactly zero, and we look instead for p − q components having "small enough" eigenvalues. In Matilainen et al. (2018) a bootstrap-based test for the null hypothesis H_0k: q = k is proposed, and an asymptotic version of the test is also available. The signal dimension can then be estimated by testing successively H_0k for all possible dimensions k = 0, . . . , p − 1 and using the resulting chain of p-values to pinpoint the correct dimension. To avoid testing all p hypotheses, one can also use, e.g., a divide-and-conquer approach, see Matilainen et al. (2018) for details. Similar approaches have previously been used in the context of independent component analysis (ICA) in Nordhausen, Oja, and Tyler (2017a) and Nordhausen, Oja, Tyler, and Virta (2017b).
AMUSE-based dimension reduction again suffers from one major drawback: the q components of interest must have non-zero autocovariances at the chosen lag so that they can be separated from the noise components. To overcome this problem, the AMUSE method can be replaced with the SOBI method, which uses K autocovariance matrices in estimation. In order to find all latent source components, it is only required that each source has a non-zero autocovariance for at least one of the lags τ ∈ T. In Matilainen et al. (2018) and subsequent work, bootstrap and asymptotic tests based on the sum of squared "eigenvalue matrices", Σ_{τ∈T} Λ̂_τ², are used. Notice that as the SOBI procedure is based on joint diagonalization, the matrices Λ̂_τ are not exact eigenvalue matrices, but are obtained as Λ̂_τ = diag(U' Cov_τ(x_t^st) U). Nevertheless, the testing procedure is similar to that used in the case of AMUSE. For a comparison of different approaches for testing the dimension of the source subspace, see Matilainen et al. (2018).
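The quantity these tests are built on can be sketched in a few lines of base R. Given a p × K matrix holding the estimated diagonals of Λ̂_τ for each lag (rows sorted by decreasing sum of squares), the statistic for H_0k simply accumulates the squared "eigenvalues" of the candidate noise components. The function name and the toy numbers are ours, not the tsBSS implementation:

```r
# Squared "eigenvalues" of the last p - k components, summed over all lags;
# under H_0k this quantity should be close to zero. lam is a p x K matrix
# whose (i, j) element is the estimated lag-tau_j autocovariance of source i.
noise_stat <- function(lam, k) {
  p <- nrow(lam)
  if (k >= p) return(0)
  sum(lam[(k + 1):p, , drop = FALSE]^2)
}

lam <- rbind(c(0.80, 0.50),
             c(0.40, 0.30),
             c(0.01, -0.02))   # toy example with q = 2 signals, K = 2 lags
noise_stat(lam, 2)             # small: the third component looks like noise
noise_stat(lam, 1)             # large: the second component is still signal
```

The actual tests then compare such a statistic to a bootstrap or asymptotic null distribution.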
As already stated, the choice of the lag set T is of prime importance in the SOBI-based tests as they only recognize signals having autocorrelation at at least one lag in T. As such, T should be chosen to contain enough lags to identify all the signals, but at the same time be kept small, as including too many unnecessary lags can lead to slower computations and noisy estimation in practice. The default set {1, . . . , 12} used in tsBSS reflects this compromise. The above holds even more strongly for AMUSE, which allows only a single lag τ (yielding faster performance in return); unless expert knowledge on an appropriate lag is available, SOBI should be used instead of AMUSE. The choice of lags for SOBI is discussed, for example, in Tang, Liu, and Sutherland (2005).
The above idea of using the magnitude of the "eigenvalues" to determine which of the components are noise components and which are interesting source components can also be investigated visually with a scree plot. In such a plot it is, however, often difficult to decide when an "eigenvalue" is small enough to be related to a noise component. Recently, Luo and Li (2016) extended the scree plot into the ladle plot where, in addition to the information contained in the eigenvalues, information from the corresponding eigenvectors is also used. The main idea in the ladle plot is that the eigenvectors related to the noise components are not individually well-defined, only the subspace they span is, whereas the eigenvectors related to distinct signal eigenvalues are unique up to sign. Measuring this variability using bootstrap sampling and combining it with the information from the eigenvalues thus gives a better picture of the number of noise components.
The ladle plots based on AMUSE and SOBI were introduced in Nordhausen and Virta (2018). The plots are constructed using stationary time series bootstrapping ideas as follows. Let λ²_0, λ²_1, . . . , λ²_{p−1} be the (re-indexed) squared "eigenvalues" in decreasing order and define the normalized function φ_n(k) = λ²_k / (1 + Σ_{i=0}^{p−1} λ²_i), k = 0, . . . , p − 1. Next, let B̂_k contain the first k corresponding columns of the (joint) diagonalizer of AMUSE (SOBI) computed for the original data sample and denote by B*_{k,j}, j = 1, . . . , m, the corresponding quantity for the j-th bootstrap sample. The average variation (around B̂_k) of the space spanned by the first k vectors can then be measured as f_n(k) = f⁰_n(k) / (1 + Σ_{i=1}^{p−1} f⁰_n(i)), where f⁰_n(k) = (1/m) Σ_{j=1}^{m} (1 − |det(B̂_k' B*_{k,j})|), for k = 1, . . . , p − 1 with f_n(0) := 0. The ladle function g_n(k) is now obtained simply by summing the functions φ_n and f_n, and visualizing the ladle function then yields the ladle plot. The ladle estimator for the source subspace dimension is the value of k for which g_n(k) is minimal. For further details, the interested reader is referred to Luo and Li (2016) and Nordhausen and Virta (2018).
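The mechanics of the ladle estimator can be sketched in base R as follows. Here the bootstrap eigenvector-variation values are supplied precomputed (in practice they come from resampling), and the function name and toy numbers are our own illustration:

```r
# Ladle sketch: combine the scaled squared "eigenvalues" (phi) with a
# bootstrap eigenvector-variation measure (f, supplied here precomputed)
# and locate the minimizer of their sum. Indexing follows the text:
# k = 0, ..., p - 1, so R's 1-based index is shifted by one.
ladle_sketch <- function(lambda2, f) {   # lambda2: decreasing squared eigenvalues
  phi <- lambda2 / (1 + sum(lambda2))    # normalized eigenvalue part phi_n(k)
  g   <- phi + f                         # ladle function g_n(k)
  which.min(g) - 1                       # estimated signal dimension q
}

lambda2 <- c(0.50, 0.30, 0.002, 0.001)   # two clear signals, two noise components
f <- c(0, 0.01, 0.02, 0.40)              # variation jumps once noise vectors enter
ladle_sketch(lambda2, f)                 # returns 2
```

At k = 2 both parts are small simultaneously: the third squared eigenvalue is near zero and the span of the first two eigenvectors is stable across bootstrap samples.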

Dimension reduction in the context of volatility clustering
As discussed earlier, AMUSE and SOBI use linear autocovariances in estimation and have proven to be powerful methods when separating uncorrelated linear processes from noise. However, as these two methods treat components exhibiting volatility clustering as noise, they cannot be used for estimating such time series. If these components are our main interest, then extensions of second-order source separation methods are needed. Matilainen, Nordhausen, and Oja (2015) considered the independent component model for time series (which assumes that the component series in z_t are independent) and proposed two methods which use fourth-order cumulants in estimation. In gFOBI (generalized FOBI), the rotation U is found by jointly diagonalizing the matrices B_τ(x_t^st) = E(x_{t+τ}^st x_t^st' x_t^st x_{t+τ}^st'), τ ∈ T. The gFOBI estimator is then given by W = U' Cov(x_t)^{-1/2}. Notice that by using τ = 0 we obtain the classical FOBI (fourth-order blind identification) method which was proposed already in Cardoso (1989) for solving the independent component analysis problem.
The gJADE (generalized JADE) method, in turn, extends the classical JADE (joint approximate diagonalization of eigenmatrices) method by Cardoso and Souloumiac (1993) to a time series context by using a larger class of fourth-order cumulants in estimation. The cross-cumulant matrices C^{jk}_τ(x_t^st), j, k = 1, . . . , p, are defined using the matrices E^{jk} = e_j e_k', where e_i denotes the vector with a 1 in the i-th coordinate and 0's elsewhere. The gJADE solution is then given by the orthogonal matrix which jointly diagonalizes these cross-cumulant matrices over the lags τ ∈ T. Again, the choice τ = 0 reduces to the classical JADE solution.
Another branch of methods designed for separating independent time series exhibiting volatility clustering is based on the use of nonlinearity functions in the estimation. An extension of the classical SOBI method was proposed in Matilainen, Miettinen, Nordhausen, Oja, and Taskinen (2017b). Their vSOBI (variant of SOBI) estimate is given by the orthogonal matrix U = (u_1, . . . , u_p) maximizing

Σ_{τ∈T} Σ_{i=1}^{p} (E[G(u_i' x_t^st) G(u_i' x_{t+τ}^st)] − E[G(u_i' x_t^st)] E[G(u_i' x_{t+τ}^st)])²,

where G can be any twice continuously differentiable function. Commonly used choices for G are G(y) = y² and G(y) = log(cosh(y)). The vSOBI method also extends the two FixNA (fixed-point algorithms for maximizing nonlinear autocorrelation) methods proposed by Hyvärinen (2001) and Shi, Jiang, and Zhou (2009). For comparisons of gFOBI, gJADE, FixNA and vSOBI, see Matilainen et al. (2017b).
The methods recalled above are designed for separating time series which do not show any linear autocorrelations. However, if our interest is in separating uncorrelated stationary time series from components exhibiting volatility clustering, that is, we consider the BSS model (2) with the p − q noise sources actually being components with volatility clustering, the above methods are not applicable. Miettinen, Matilainen, Nordhausen, and Taskinen (2019) therefore combined a second-order source separation method (SOBI) with vSOBI using G(x) = x². Their gSOBI (generalized SOBI) solution is given by the orthogonal matrix U = (u_1, . . . , u_p) maximizing

b Σ_{τ∈T_1} Σ_{i=1}^{p} (E[u_i' x_t^st u_i' x_{t+τ}^st])² + (1 − b) Σ_{τ∈T_2} Σ_{i=1}^{p} (E[(u_i' x_t^st)² (u_i' x_{t+τ}^st)²] − E[(u_i' x_t^st)²] E[(u_i' x_{t+τ}^st)²])².

Here T_1 and T_2 are the lag sets for the linear and the quadratic parts, respectively, and b ∈ [0, 1] assigns the weights for the two parts. The first part of the objective function is designed to find components which are linear processes and the second part components with volatility clustering. In Miettinen et al. (2019) the statistical properties of the gSOBI estimator are given, and its performance under different values of b is studied via simulations. The value of b should be larger than 0.5, and b = 0.9 is suggested for general use. Miettinen et al. (2019) also propose a test for identifying whether there is volatility clustering in a component or not. In order to do this, possible linear autocorrelation of the component needs to be removed first.
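The structure of such a weighted criterion can be illustrated with a short base-R sketch that evaluates, for an already-extracted candidate source matrix S with roughly standardized columns, the weighted sum of squared linear autocovariances (the SOBI part) and squared autocovariances of the squared series (the vSOBI part with G(x) = x²). The function name and defaults are ours, not from tsBSS:

```r
# Schematic gSOBI-type objective for candidate sources S (columns u_i' x_t^st):
# b weights the squared linear autocovariances over lags T1, and (1 - b)
# weights the squared autocovariances of the squared series over lags T2.
gsobi_objective <- function(S, T1 = 1:12, T2 = 1:3, b = 0.9) {
  n <- nrow(S)
  auto_d <- function(Z, tau) {       # diagonal of the lag-tau autocovariance of Z
    Zc <- sweep(Z, 2, colMeans(Z))
    colSums(Zc[1:(n - tau), , drop = FALSE] *
            Zc[(tau + 1):n, , drop = FALSE]) / (n - tau)
  }
  lin  <- sum(sapply(T1, function(tau) sum(auto_d(S, tau)^2)))    # SOBI part
  quad <- sum(sapply(T2, function(tau) sum(auto_d(S^2, tau)^2)))  # vSOBI part
  b * lin + (1 - b) * quad
}
```

The full method searches over rotations U for the sources maximizing this quantity; linear processes drive the first term up, volatility-clustered components the second.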

Supervised dimension reduction for time series
The methods described above can be considered as unsupervised dimension reduction methods as no specific response, which should be modelled based on x t , was assumed. In supervised dimension reduction it is assumed that we have observed a univariate response time series y t and that the dimension of x t should be reduced without losing information about y t and without knowing the functional relationship between y t and x t . Supervised dimension reduction methods are well established for iid data, the most popular methods being sliced inverse regression (SIR) (Li 1991) and sliced average variance estimation (SAVE) (Cook 2000). These and other iid supervised dimension reduction methods are implemented, for example, in R in the package dr (Weisberg 2002).
Supervised dimension reduction in the case of time series data is, however, much more difficult as the dependence between y_t and x_t may also lag in time. To address this problem Matilainen, Croux, Nordhausen, and Oja (2017a) and Matilainen, Croux, Nordhausen, and Oja (2019) suggested time series versions of SIR and SAVE as well as a weighted linear combination of the two methods. The three approaches are denoted TSIR, TSAVE and TSSH (time series SIR SAVE hybrid), respectively, and are recalled in the following. Let now y = (y_t)_{t=0,±1,±2,...} and x = (x_t)_{t=0,±1,±2,...} be (weakly and jointly) stationary univariate and p-variate time series, respectively. We then assume that y_t depends on x only through a lower-dimensional latent process, y_t = f(z_t^(1), z_{t−1}^(1), . . . , ε_t), where the function f is unspecified, ε = (ε_t)_{t=0,±1,±2,...} is an unspecified stationary noise process independent of x, and the explaining time series x_t follow the BSS model x_t = µ + Ω z_t, where again µ ∈ R^p is a location vector and Ω ∈ R^{p×p} is a full-rank mixing matrix. The stationary p-variate source time series z_t can be partitioned into z_t = (z_t^(1)', z_t^(2)')' with z_t^(1) ∈ R^q. The dimension q denotes the smallest value which fulfills the conditions (B1) E(z_t) = 0 and Cov(z_t) = I_p and (B2) (y, z^(1)) ⊥⊥ z^(2).
All the information needed to model y_t is therefore contained in the process z_t^(1), and one can write y_t = f_0(W_1 x_t, W_1 x_{t−1}, . . . , ε_t) with another unspecified function f_0, possibly depending on Ω and µ, where W_1 denotes the first q rows of an unmixing matrix W. As in the unsupervised case, this model is ill-defined in the sense that both z_t^(1) and z_t^(2) can be multiplied by orthogonal matrices and still fulfill conditions (B1) and (B2). The goal is therefore to find an estimate of an unmixing matrix W such that W x_t = z_t up to these indeterminacies. TSIR and TSAVE use approximate joint diagonalization similar to the unsupervised methods considered in the previous sections. For TSIR, Matilainen et al. (2017a) suggest to use the matrices G_{0,τ}(z_t, y_t) = Cov(E(z_t | y_{t+τ})), and for TSAVE the matrices G_{1,τ}(z_t, y_t) = E((I_p − Cov(z_t | y_{t+τ}))²) are suggested in Matilainen et al. (2019). The solutions are then given by the orthogonal matrix jointly diagonalizing the matrices G_{s,τ}(x_t^st, y_t), τ ∈ T, with s = 0 for TSIR and s = 1 for TSAVE, respectively. To compute the matrices G_{s,τ} in practice, the response time series y_t is ordered and divided into H slices which are then used to approximate the conditional expectations in G_{0,τ} and G_{1,τ}.
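The slicing step for the TSIR matrices can be sketched in base R as follows, approximating G_{0,τ} by the weighted covariance of the within-slice means of the standardized series. This is a hypothetical helper of our own, not the tsBSS implementation:

```r
# Slicing approximation of G_{0,tau} = Cov(E(z_t | y_{t+tau})):
# align z_t with the lagged response y_{t+tau}, cut the response into
# H slices, and compute the weighted covariance of the slice means.
G0_tau <- function(Zst, y, tau, H = 10) {
  n  <- nrow(Zst)
  Zl <- Zst[1:(n - tau), , drop = FALSE]        # z_t
  yl <- y[(tau + 1):n]                          # y_{t+tau}: dependence may lag
  slice  <- cut(rank(yl, ties.method = "first"), H, labels = FALSE)
  counts <- tabulate(slice, H)
  mu <- rowsum(Zl, slice) / counts              # within-slice means of z
  cm <- sweep(mu, 2, colMeans(Zl))              # centered slice means
  crossprod(sqrt(counts / nrow(Zl)) * cm)       # weighted Cov of the slice means
}
```

If y_t depends on only one latent direction, the corresponding diagonal entry of the resulting matrix dominates, which is exactly what the joint diagonalization exploits.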
The weighted combination of TSIR and TSAVE, TSSH, is obtained by finding an orthogonal matrix U ∈ R^{p×q} which maximizes a weighted sum of the TSIR and TSAVE criteria, with weight 1 − b on the TSIR part and weight b on the TSAVE part, where b ∈ [0, 1]. Notice that b = 0 now gives the TSIR solution and b = 1 the TSAVE solution.
In practice the number of sources, that is, the value of q, is unknown and needs to be estimated, and the important lags for the sources also need to be identified. So far no statistical tests for these are available; however, Matilainen et al. (2017a, 2019) used the pseudo-eigenvalues λ_ij = u_i' G_{s,τ_j}(x_t^st, y_t) u_i for determining the value of q as follows. Notice that now T = {τ_1, . . . , τ_K}. Consider the matrix L = (l_ij), where the l_ij are the pseudo-eigenvalues scaled so that the elements of L add up to 1. Denote the marginal sums of the "eigenvalues" as λ_i· = Σ_{j=1}^{K} λ_ij and assume that the unmixing matrix is such that the latent sources are ordered with respect to λ_1· ≥ . . . ≥ λ_p· (that is, the unmixing matrix is first computed with q = p). Matilainen et al. (2017a) then suggested four different strategies for finding the appropriate number of sources and the lags corresponding to the sources by, similarly to principal component analysis (PCA), trying to explain 100 × P% of the dependence between the latent sources and the response series. The recommended strategy according to Matilainen et al. (2019) is the so-called "biggest value" strategy, which finds the smallest number r of elements (i_1, j_1), . . . , (i_r, j_r) of L such that Σ_{k=1}^{r} l_{i_k j_k} ≥ P. For details about the other strategies see Matilainen et al. (2017a). For the number of slices, Matilainen et al. (2019) recommend using H = 10 for TSIR and H = 5 for TSAVE.
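The "biggest value" strategy is simple enough to sketch directly in base R; the helper below is our own illustration, not the tsBSS implementation:

```r
# "Biggest value" strategy sketch: given the matrix L of scaled
# pseudo-eigenvalues (elements summing to 1), pick the smallest set of
# (component, lag) cells whose values add up to at least P.
biggest_values <- function(L, P = 0.8) {
  o <- order(L, decreasing = TRUE)       # cells by decreasing value
  r <- which(cumsum(L[o]) >= P)[1]       # smallest r reaching the threshold
  cells <- arrayInd(o[1:r], dim(L))      # back to (component, lag) indices
  list(r = r, components = unique(cells[, 1]), lags = unique(cells[, 2]))
}

L <- matrix(c(0.50, 0.05,
              0.25, 0.05,
              0.10, 0.05), nrow = 2)     # 2 sources, 3 lags, elements sum to 1
biggest_values(L, P = 0.75)   # source 1 with lags 1 and 2 already reach 75%
```

The selected rows then indicate the relevant sources and the selected columns the relevant lags.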

Existing packages
In R, many classes are available for time series data and numerous packages implement multivariate time series modeling, see, for example, the CRAN task view "Time Series Analysis" (https://CRAN.R-project.org/view=TimeSeries). Possibly the most comprehensive and general package for multivariate time series modeling is the MTS package (Tsay and Wood 2021), offering methods for fitting, among others, multivariate MA, AR, ARMA and multivariate stochastic volatility models, but also dimension reduction methods for time series data. Further R packages that implement unsupervised dimension reduction for time series, mainly through PCA and factor models, include the packages gdpc (Peña, Smucler, and Yohai 2021), PCA4TS (Chang, Guo, and Yao 2015), freqdom (Hormann and Kidzinski 2017) and tsfa (Gilbert and Meijer 2005).
R implementations of numerous blind source separation methods are also available, although most of the packages contain mainly BSS methods for iid data. These include, for example, fICA (Miettinen, Nordhausen, and Taskinen 2018).

The package tsBSS
The tsBSS package (Matilainen et al. 2021), which we introduce here, is the most comprehensive package for BSS methods for vector-valued time series. The package tsBSS depends on the packages boot (Canty and Ripley 2021), forecast (Hyndman and Khandakar 2008), ICtest (Nordhausen, Oja, Tyler, and Virta 2021), JADE, parallel, Rcpp (Eddelbuettel and François 2011) and RcppArmadillo (Eddelbuettel and Sanderson 2014). Most functions in the package assume that the input multivariate time series is either a matrix with p columns and T rows or a corresponding object of class 'mts', 'xts' or 'zoo'.
We next describe the functions implementing the methods discussed in Sections 2, 3 and 4. The AMUSE- and SOBI-based signal dimension testing and estimation methods described in Section 2 are implemented as the functions:

• AMUSEasymp, AMUSEboot, and AMUSEladle
• SOBIasymp, SOBIboot, and SOBIladle

In each function, the user supplies the multivariate time series and the lag(s) to be used. In the hypothesis testing functions (*asymp and *boot) the signal dimension to be tested is specified using the argument k, whereas in the *ladle functions the user supplies the maximum number of components to be evaluated as possible signals via the ncomp argument. All bootstrapping functions use by default only one core, but offer the possibility of doing parallel computations. Examples of this are shown in Section 6. For the bootstrap hypothesis testing, the user can choose between one parametric and three nonparametric bootstrap versions; for details, see the corresponding help files. For the ladle functions, the user can choose between fixed block bootstrapping and stationary bootstrapping.
The testing functions return objects of class 'ictest', which is introduced in the package ICtest and inherits from class 'htest'; print, components, plot, ggplot and screeplot methods are available for the class. The ladle functions return objects of class 'ladle', again introduced in ICtest, with print, summary, components and plot methods available. The actual ladle plot is obtained by calling ladleplot or ggladleplot. The functions which implement BSS methods for multivariate time series exhibiting volatility clustering, as described in Section 3, are named after the corresponding methods. Hence, the package has the functions FixNA, gFOBI, gJADE, PVC and vSOBI. All five functions let the user specify the set of lags to be used; the default is to use the first 12 lags, which is fairly standard in the BSS literature. The functions return the unmixing matrix, the estimated sources, the vector of lags used and the location of the original series. As there is no natural order of the components, the order will usually differ between the methods. For financial time series, however, the components of main interest are those which exhibit volatility clustering; therefore all the functions also have the arguments ordered, acfk and original, functioning as follows. When setting ordered = TRUE, the components are ordered according to their degree of volatility clustering as measured by quadratic correlation. For this purpose, acfk specifies a vector of lags which will be used to test for serial correlations. If the marginal tests of linear correlation detect dependencies at level α (the default is α = 0.05, but the user can change it), the corresponding components are replaced by their residuals from marginal ARMA fits obtained using auto.arima.
For these components, tests of quadratic correlations are performed and the components are ordered according to their test statistics, so that the components having the strongest quadratic correlations come first. The argument original is used to indicate whether the user wants the original sources in the desired order or the ordered sources after removing the serial linear correlations. A function for testing the linear and quadratic correlations is also available in the package under the name lbtest; see the corresponding help page for details.
In the case of ordered = TRUE, the output of the stochastic volatility BSS functions contains many additional features, such as details of the marginal ARMA fits as well as test statistic values and p-values for the marginal linear and quadratic correlation tests. If original = FALSE, the source component object S naturally no longer corresponds to the original sources, which are then returned as Sraw. The returned object for all the stochastic volatility BSS functions is of class 'bssvol', which inherits from class 'bss'. Many useful methods such as components or plot are available for this class.
The BSS methods designed for time series with volatility clustering discussed above, FixNA, gFOBI, gJADE, PVC and vSOBI, can deal with linear processes, but the sixth method described in Section 3, gSOBI, is clearly a more efficient choice when there are both linear process components and components with volatility clustering. The gSOBI function works basically in the same way as the other BSS functions mentioned above. In gSOBI, however, the user has to specify two sets of lags, one for the "SOBI part" and one for the "vSOBI part". In our experience, the lag set for the vSOBI part can be much smaller than the lag set for the SOBI part. Furthermore, the user can control, via the argument b, how much weight is given to the SOBI part. The default is b = 0.9.
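A hypothetical gSOBI call illustrating the two lag sets and the weight might look as follows (the series X and the specific lag sets are assumptions for illustration):

```r
# k1: lags for the SOBI part, k2: (smaller) lag set for the vSOBI part,
# b: weight given to the SOBI part of the objective function.
res <- gSOBI(X, k1 = 1:12, k2 = 1:3, b = 0.9)
```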
For the supervised dimension reduction methods described in Section 4, the function tssdr is available. Via the argument algorithm, the user can choose between "TSIR" (default), "TSAVE" and the hybrid method "TSSH". The other main input arguments are the univariate response series y and the multivariate explaining series X. The argument k controls the lags to be used and the argument H the number of slices. Note that for the TSSH method, different numbers of slices can be used for the TSIR and TSAVE parts by providing a vector of length two. The argument weight specifies the weight given to TSAVE; by default, both parts get equal weight. The returned object is of class 'tssdr' and the function returns, among other things, the unmixing matrix, the estimated sources and the matrix L as described in Section 4. The most useful methods for an object of this class are plot and summary. In the summary call one can specify, using the argument type, one of four methods to get an indication of which sources and which lags are relevant. The threshold value P needed for this is by default 0.8, but can be changed via the argument thres.

Table 2: Classes in the tsBSS package and their methods.

  Class             Methods
  'bssvol'          bss.components*, coef*, plot, print
  'lbtest'          print
  'tssdr'           components, plot, print
  'summary.tssdr'   coef, components, plot, print

  * inherited from class 'bss' (JADE)
The functions and the classes and their methods in tsBSS package are summarized in Tables 1 and 2, respectively.
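To illustrate the supervised interface, a hypothetical call of the hybrid method, with separate numbers of slices for the TSIR and TSAVE parts, might look as follows (the response y and predictor series X are assumed to exist):

```r
# TSSH with 10 slices for the TSIR part and 5 for the TSAVE part,
# equal weights for both parts, lags 1-12.
res <- tssdr(y, X, algorithm = "TSSH", k = 1:12, H = c(10, 5), weight = 0.5)
summary(res, type = "big", thres = 0.8)
```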

Dimension estimation using AMUSE and SOBI
Following the structure of the paper, we first consider the case where the signals are assumed to be components with second-order dependencies, and the goal of the analysis is to test hypotheses about, and thereby estimate, the number of signals.
For the example, the packages tsBSS and tuneR (Ligges, Krey, Mersmann, and Schnackenberg 2018) are needed, which we load as follows.

R> library("tsBSS")
R> library("tuneR")
We create an artificial 20-variate time series by mixing three sound signals available in the JADE package with 17 Gaussian white noise processes.
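The construction can be sketched as follows. The actual example uses three sound signals from the JADE package; purely for illustration, three simulated autocorrelated signals stand in for them here, and the mixing matrix is a random one.

```r
# Sketch: three autocorrelated "signal" series plus 17 Gaussian white noise
# series, mixed with a random full-rank 20 x 20 matrix A.
set.seed(662)
n  <- 10000
s1 <- arima.sim(list(ar = 0.9),  n)
s2 <- arima.sim(list(ar = -0.5), n)
s3 <- arima.sim(list(ma = c(0.5, 0.3)), n)
S  <- cbind(s1, s2, s3, matrix(rnorm(17 * n), ncol = 17))
A  <- matrix(rnorm(20 * 20), 20, 20)  # random mixing matrix
Xt <- ts(S %*% t(A))                  # observed 20-variate time series
```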
R> SOBIasymp(Xt, k = 3, tau = 1:10)

	SOBI test for white noise processes

data:  Xt
T = 1441.1, df = 1530, p-value = 0.9482
alternative hypothesis: there are fewer than 17 white noise components

Thus, the null hypothesis cannot be rejected. Experimenting with different choices of tau (not shown here) gave the same conclusion. For demonstration purposes we test the hypothesis H02: q = 2 using a nonparametric bootstrap with the third bootstrap variant "np3", which allows for dependence between the noise components. For further details about the different bootstrap strategies see the help files of AMUSEboot and SOBIboot.
To speed up the bootstrap computations we use three cores and, for reproducibility, fix the seeds using the iseed argument.
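A call along these lines could be used; the number of bootstrap samples and the seed value are illustrative assumptions only.

```r
# Test H0: q = 2 with the "np3" nonparametric bootstrap on 3 cores.
SOBIboot(Xt, k = 2, tau = 1:10, n.boot = 200, s.boot = "np3",
         ncores = 3, iseed = 321)
```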
To directly estimate the number of signals, the ladle plot is a natural starting point. We use again SOBI with lags 1 to 10. As SOBIladle uses tsboot from the boot package (Canty and Ripley 2021), this too can be computed in parallel, by passing the argument parallel = "multicore" and specifying the number of cores via ncpus. However, as bootstrapping is done separately for each suspected number of dimensions, it is advisable to set the maximum number of components that should be evaluated in order to save computation time. Here we set the maximum to 12 via the ncomp argument. The argument l = 40 specifies that we use stationary bootstrap with expected block lengths of 40.
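The corresponding call might be sketched as follows, with the argument values as described above (the exact passthrough of the parallelization arguments to tsboot is assumed here):

```r
# Ladle estimator based on SOBI with lags 1-10, evaluating at most 12
# components, stationary bootstrap with expected block length 40, 3 cores.
SL <- SOBIladle(Xt, tau = 1:10, ncomp = 12, l = 40, sim = "geom",
                parallel = "multicore", ncpus = 3)
```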

The ladle plot is then obtained by

R> ladleplot(SL)

and is shown in Figure 1, where the position of the minimum again verifies that there are three signals in this time series. To see the three signals one can use

R> plot(SL, which = "k")

where which = "k" specifies that one does not want to plot all sources but only those corresponding to the estimated number of signals. The time courses of the three signals are shown in Figure 2.

Dimension reduction in the context of stochastic volatility
In financial time series the second-order correlations are often not of interest and the focus is on the stochastic volatility features of the data. Our preferred method in this context is gSOBI, and we demonstrate its functionality using the exchange rate dataset from the stochvol package (Kastner 2016), which contains daily bilateral prices of one Euro in 23 currencies. The daily measurements are from January 3, 2000, until April 4, 2012.
In addition to the tsBSS package we also use for this example the packages stochvol (Kastner 2016) and zoo (Zeileis and Grothendieck 2005). For reproducibility, a seed is set for this example as well.

R> library("tsBSS")
R> library("zoo")
R> library("stochvol")
R> data("exrates", package = "stochvol")
R> set.seed(1234)

We apply the gSOBI method to the logarithmic returns of the original variables using the preferred weight b = 0.9 for the "SOBI" part. The resulting sources are ordered according to their level of volatility clustering, which we evaluate using quadratic autocorrelations after removing componentwise second-order autocorrelations by automatically fitting ARMA models to those components which exhibit linear autocorrelation. The argument acfk = 1:5 specifies that the first five lags should be used to test for linear and quadratic autocorrelation, respectively.
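The preprocessing and the gSOBI call can be sketched as follows; the object name gSOBIwrd is taken from the output shown below, while the construction of the log-return matrix exrlogr is an assumption based on the description of the data.

```r
# Daily log-returns of the 23 exchange-rate series (dropping the date column)
exrlogr <- apply(log(exrates[, 1:23]), 2, diff)
rownames(exrlogr) <- as.character(exrates$date[-1])
# gSOBI with weight 0.9 for the SOBI part; sources ordered by volatility
# clustering, tested via (quadratic) autocorrelations at lags 1-5
gSOBIwrd <- gSOBI(exrlogr, b = 0.9, ordered = TRUE, acfk = 1:5)
```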
R> round(gSOBIwrd$linP, 4)
R> ExpH_gSOBI <- apply(gSOBIwrd$S[, c(1:3, 21:23)], 2, svsample_exph)
R> ExpH_gSOBI <- zoo(ExpH_gSOBI, order.by = as.Date(rownames(exrlogr)))
R> plot(ExpH_gSOBI, ylim = c(0, 8), main = "Volatility series of components
+    with the most and the least volatility clustering")

Based on these results all the components seem to have significant volatility clustering. It is, however, obvious that the degree of volatility clustering differs considerably between the components and that the first components are much more interesting than the last components.
As a comparison, we perform the same analysis using the (modified) principal volatility component (PVC) analysis. We first look at the p values for the linear and quadratic autocorrelations.

Figure 4: Volatility series of components with the most volatility clustering and three components with the least volatility clustering extracted using PVC.
Finally, we compare the gSOBI and the PVC results to corresponding principal components computed with standard principal component analysis (PCA) which ignores the time structure and is thus not targeting volatility clustering effects. As gSOBI and PVC standardize the components to have variance one, we do so here for the obtained PCs as well.
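This step might be sketched as follows; the use of princomp and the object name PCst2 (which the code below refers to) are assumptions for illustration.

```r
# Principal components of the log-returns, standardized to unit variance
PC    <- princomp(exrlogr)$scores
PCst2 <- sweep(PC, 2, apply(PC, 2, sd), "/")
```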
R> vol <- lbtest(PCst2, 1:5, "squared")
R> ordered <- order(vol$TS, decreasing = TRUE)
R> PCst3 <- PCst2[, ordered]

Figure 5: Volatility series of components with the most volatility clustering and three components with the least volatility clustering extracted using PCA.
R> ExpH_PCA <- apply(PCst3[, c(1:3, 21:23)], 2, svsample_exph)
R> ExpH_PCA <- zoo(ExpH_PCA, order.by = as.Date(rownames(exrlogr)))
R> plot(ExpH_PCA, ylim = c(0, 8), main = "Volatility series of components
+    with the most and the least volatility clustering")

As before, all the components have volatility clustering, but here it seems to be higher in the components with the least volatility clustering than in the corresponding volatility series based on gSOBI or PVC. Note that the ordering by volatility clustering is not related to the order of the PCs.

Supervised dimension reduction for time series
To conclude the illustration of the tsBSS package, we demonstrate how to perform TSIR in a supervised dimension reduction context. For that purpose we use the PRSA data set (Liang et al. 2015), which is available from the UCI Machine Learning Repository (Dheeru and Karra Taniskidou 2017) and can be loaded into R as

R> PATH <- paste0("https://archive.ics.uci.edu/ml/machine-learning",
+    "-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv")
R> PRSA <- read.csv(PATH)

The dataset includes hourly measurements of fine particulate matter (PM2.5) concentrations, taken at the US Embassy in Beijing, and some meteorological data from Beijing Capital International Airport. The whole dataset covers January 2010 to December 2014. The goal here is to predict the PM2.5 concentration using as predicting time series the hourly measurements of DEWP (dew point; ℃), TEMP (temperature; ℃), PRES (air pressure; hPa), Iws (cumulated wind speed; m/s), Is (cumulated hours of snow) and Ir (cumulated hours of rain). For a complete description of the data set, see Liang et al. (2015) and https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data. The response time series in Figure 6 can be plotted as

R> plot(ts(response), ylab = "PM 2.5 concentration", xlab = "Time in hours")

As a preparation step we make the predicting time series more stationary by taking first-order differences. For the variables Iws, Is and Ir this is done after taking logarithms, where one was added beforehand where needed to make all values positive. All the time series are then centered and scaled.
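These preparation steps might be sketched as follows; the column names are taken from the public data set, while the handling of the response alignment is an assumption for illustration.

```r
response <- PRSA$pm2.5
Xraw <- PRSA[, c("DEWP", "TEMP", "PRES", "Iws", "Is", "Ir")]
# logarithms (after adding one) for the cumulated variables
Xraw[, c("Iws", "Is", "Ir")] <- log(Xraw[, c("Iws", "Is", "Ir")] + 1)
# first-order differences, then centering and scaling
X <- scale(apply(Xraw, 2, diff))
y <- response[-1]  # align the response with the differenced predictors
```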
R> library("tsBSS")
R> TSIR <- tssdr(y, X, H = 10, algorithm = "TSIR", k = 1:12)

We could check all the estimated sources using components(TSIR). However, the most convenient way to work with the TSIR object, which is of class 'tssdr', is to use the summary function, where one can specify the method and threshold to investigate which directions and lags might be relevant. Among the strategies for choosing the directions and the lags, we choose the biggest values method with a threshold of 0.8,

R> TSIRbiggest <- summary(TSIR, thres = 0.8, type = "big")

To see the most important results we can just use

R> TSIRbiggest
Summary of TSIR for response y and predictors X

The signal separation matrix W is:

Following the biggest value strategy, one linear combination of the predicting time series seems sufficient to model the response. The weights that the linear combination of interest gives to each variable can be seen from the matrix W, which can be extracted using coef(TSIRbiggest).
The weights corresponding to the differentiated variables of Iws, Is and Ir are very small, whereas the differentiated variables of DEWP and PRES have the largest impact on the linear combination.
From the last part of the summary output it also seems that the most relevant information is what happened 6-8 hours before, as these lags have the largest information content. If the air pressure has gone up and the dew point has gone down during the last 6-8 hours, this might be an indication of drier weather, which in turn would affect how long the pollution particles stay in the air.
The response and the chosen sources can be obtained using bss.components(TSIRbiggest), from which the first few values of the chosen source can then be printed.
Using plot one obtains a figure with the response time series and the chosen interesting components as shown in Figure 8.

R> plot(TSIRbiggest)
In practice now a good model for the source components should be found and the relationship between the response and the source component should be explored. This is however beyond the scope of this example.

Summary and discussion
We presented the R package tsBSS, containing implementations of several BSS methods aimed at time series data. Dimension reduction through BSS is an attractive option for the first step in multivariate data analysis, helping to avoid the cumbersome modelling of complex dependencies and, in the best case, ridding the data altogether of noise. In its current form the package contains tools for dimension reduction both of second-order stationary time series and of time series exhibiting stochastic volatility. It is also the first package providing tools for supervised dimension reduction in a time series context.