Wavelet-based and Fourier-based multivariate Whittle estimation: multiwave

Multivariate time series with long-dependence are observed in many applications such as finance , geophysics or neuroscience. Many packages provide estimation tools for univariate settings but few are addressing the problem of long-dependence estimation for multivariate settings. The package multiwave is providing efficient estimation procedures for multivariate time series. Two semi-parametric estimation methods of the long-memory exponents and long-run covariance matrix of time series are implemented. The first one is the Fourier-based estimation proposed by [18] and the second one is a wavelet-based estimation described in [4]. The objective of this paper is to provide an overview of the R package multiwave with its practical application perspectives.


Introduction
Time series with defined autocovariance functions are said to present long-memory or long-range dependency when their autocovariance function is decreasing very slowly, slower than an exponential decay. More precisely, let g(·) be the autocovariance function of a time series X. X is said to be long-memory if there exists α, 0 < α < 1, such that g(t) is asymptotically equivalent to |t| −α when t → +∞ (see [5] and references therein). This definition implies that the covariance function is not summable. Equivalently, the spectral density f (·), if it exists, is such that f (λ) is equivalent up to a constant to |λ| 1−α when λ → 0 + . In this case, when trying to estimate the expectation using the empirical mean of long-memory time series, the variance of the estimator is not decreasing to 0 as N −1 (where N is the sample size). Hence it is crucial to take into account the presence of longmemory for defining good estimators [5]. In the case of univariate time series, several very efficient approaches have been developed and validated. A web page entitled "Time Series Analysis" 1 from the R software is providing a very exhaustive list of methods and softwares dealing with long-memory time series. Among others, we can cite the packages fracdiff [7], arfima and FGN [23], longmemo [5] and forecast [8]. For example, fracdiff is dedicated to simulation of fractional ARIMA time series and to estimation using regression of the periodogram. longmemo provides real data examples of time series with long-memory properties.
Approaches for multivariate long-memory time series are less developed. When dealing with multivariate time series, an important quantity to estimate is the covariance or correlation between pairs of time series. The effect of the presence of long-memory on this estimation is obvious, as stated by [14]. One R package, waveslim, is dedicated to the wavelet correlation analysis for pairs of random variables [24] but long-range dependence properties are not considered. [17] provide R code 2 for bivariate longrange dependent time series with parametric estimations. The objective of this paper is to provide an efficient R package, called multiwave, to estimate the long-memory parameters and covariance matrices for multivariate time series. The estimation procedures are based on a semi-parametric approach, which is robust to model misspecification.
The procedures are also suited for dealing with more than two dimensional data. Indeed they are based on Whittle approximation which provides a simple function to optimize. This function can be used for any dimension of the problem. In comparison, regression of the scalogram or periodogram [3] is based on a linear fit of pairs of time series, and thus there does not exist an easy way to extend to more than two dimensions.
multiwave package is based on [4], where we developed a wavelet-based approach using Whittle approximation for an efficient estimation of the long-memory parameters and the long-run covariance matrices. In addition, multiwave proposes an implementation of an alternative method using Fourier decomposition as described in [18] 3 .
As it is described in this paper, multiwave is a very versatile package and opens the way to estimation of the long-memory parameters and the long-run covariance matrices using multivariate data sets. It is in particular not necessary to assume the stationarity of the time series as it is the case when using Fourier decomposition [6]. The Whittle approximation is computed using either the coefficients of wavelet decomposition or the coefficients of Fourier decomposition when the time series are stationary.
The package multiwave is divided in three parts. A first group of functions is dedicated to the simulation of multivariate long-memory time series; the main function is fivarma. A second group of functions is implementing the wavelet decomposition, through DWTexact and associated functions. Finally the computation of the estimators are coded using the Fourier decomposition in mfw and its derivatives and using the wavelet decomposition in mww and its derivatives.
The mathematical background is detailed in a separate Section 2. The rest of the paper is dedicated to the description of the package multiwave. Simple examples of parametric models and real data are presented in Section 3 with corresponding functions of multiwave ready to apply. Core estimation functions using wavelets and Fourier transform are detailed in Section 4 along with pieces of code using simulated time series. Finally, practical considerations are discussed in the three last sections. Section 5 is discussing the practical choices of parameters. Comparaisons between wavelets and Fourier approaches are described in Section 6. And an application to real data in neuroscience is concluding the paper, Section 7.

Theoretical background
As in the univariate case, the definition of long-memory for a p-vector process is based on the asymptotic behaviour of the cross-spectral density in the neighbourhood of zero [12]. We consider N observations of a long-memory p-vector process X = {X (k), k ∈ Z, = 1, . . . , p}, namely X(1), . . . X(N ). X is said to be a multivariate M (d) process when for each = 1, . . . , p there exists D ∈ N such that the D -th order difference ∆ D X is covariance stationary. In addition, let us assume that for any , m = 1, . . . , p the generalized cross-spectral density of X and X m is with Ω = (Ω ,m ) ,m=1,...,p an Hermitian matrix. The functions f S ,m (·) correspond to the short-memory dynamics of the process. The parameters d satisfies −1/2 < d − D < 1/2. More generally, the wavelet-based procedure is available for cross-spectral density satisfying an approximation where the exponent * denotes the conjugate transpose operator. Here and subsequently ∼ means that the ratio of the left-and right-hand sides converges to one. Note that, the process X is not necessarily stationary.
The long-range dependence parameter measures the power-like rate of decay of the autocovariance function. The long-run covariance matrix Ω can be seen as the covariance at low frequencies between the time series. It gives a quantification of the link between the components of the multivariate time series. The long-run covariance parameter of the model is free from the difference in the autocorrelation behaviour of each component. It is linked with long-run correlations Ω ,m / Ω , Ω m,m ,m=1,...,p , which are also encountered in literature as power-law coherencies between two time series [17] or as fractal connectivities [3].

A parametric example: FIVARMA
Fractionally Integrated Vector Auto Regressive Moving Average (FIVARMA) processes are parametric models with a spectral density satisfying approximation (2). They correspond to Model A of [9]. We refer to this paper for a detailed mathematical description.
is the σ-field generated by {u(s), s < t}, and Σ is a positive definite matrix. Let (A k ) k∈N be a sequence of R p×p -valued matrices with A 0 the identity matrix and We assume that all the roots of |A(L)| are outside the closed unit circle, where L denotes the lag operator. Let also (B k ) k∈N be a sequence in R p×p with B 0 the identity matrix and ∞ k=0 B k 2 < ∞. As defined for A, B(·) denotes the discrete Fourier transform of the sequence, The spectral density satisfies with Then X is called a F IV ARM A(d, q) process and satisfies approximation (2).

Limits of the model
Note that in definition (3) the operators are applied in a given order, where the lag operator is taken first. Changing the order of the lag operator and autoregression corresponds to model B of [9] and V ARF I models of [17] where X is obtained with equation diag(1 1 − L) d A(L) X(t) = B(L)u(t), with similar notations than above. That is, X is obtained by fractional integration after autoregression, which is also called cointegration. The spectral density still satisfies the approximation (2) however the matrix Ω may no longer be Hermitian. Clearly multiwave package is not built to deal with such cases. We refer to alternative methods in literature, among others [15,17,19]. Taking into account cointegration is a difficult problem that exceeds the scope of this paper. Future work is needed to handle this particular case.

Fourier-based estimation (MFW)
The discrete Fourier transform and the periodogram of X evaluated at frequency λ are defined as in [18]'s procedure Let λ j = 2πj/N , j = 1, . . . , m, be the Fourier frequencies used in estimation, m ∈ N.
The solution satisfies with Ω The dynamics of the frequencies at the neighbourhood of the origin is given by the dynamics of the spectral density around the zero frequency. The form of the criterion is justified by a second-order approximation of the spectral density matrix (1), rather than the approximation (2). For an estimation based on the first-order approximation (2), one should replace established the theoretical performance of this estimation procedure, for both the long-range dependence parameters and the long-run covariance matrix. It is shown that the variance for the estimation of the vector d is decreased for the multivariate procedure with respect to a univariate one. It is worth mentioning that [10] developed a similar estimation procedure, based on a rougher approximation of the cross-spectral density, Interestingly, the quality of estimation for the vector d is similar. Nevertheless, the estimation of the long-run covariance matrix Ω is biased since it does not take into account the phase-shift appearing in Λ F j (d). We refer to [10] and to [18] for a more detailed study of these estimators and their consistency.

Wavelet-based estimation (MWW)
Wavelets are providing a very efficient tool because of their high flexibility to deal with nonstationary time series which is particularly useful for real data applications. Their good performances in comparison to Fourier have already been shown for example in univariate settings [6]. Let (φ(·), ψ(·)) be respectively a father and a mother wavelets, satisfying regularity conditions, as stated in [4]. At a given resolution j 0, for k ∈ Z, we define the dilated and translated functions φ j,k (·) = 2 −j/2 φ(2 −j · −k) and ψ j,k (·) = 2 −j/2 ψ(2 −j · −k). The wavelet coefficients of the process X are defined by where X(t) = k∈Z X(k)φ(t − k). For given j 0 and k ∈ Z, W j,k is a p-dimensional vector W jk = W j,k (1) W j,k (2) . . . W j,k (p) where W j,k ( ) = R X (t)ψ j,k (t)dt. For any j 0, the process (W j,k ) k∈Z is covariance stationary [4]. Let θ ,m (j) denote the wavelet covariance at scale j between processes X and X m , i.e. θ ,m (j) = Cov(W j,k ( ), W j,k (m)) for any position k. Let us introduce the wavelet scalogram The wavelet scalogram is the equivalent of the Fourier periodogram. Yet the scalogram is not normalized, on the contrary of the periodogram. We also introduce the function K(·), defined as The wavelet Whittle procedure is described in [4].
The estimators ( d The estimation is here based on a first-order approximation of the spectral density matrix around 0. The solutions of this problem satisfy where I W (j) is the wavelet scalogram at scale j defined in (7). The long-run covariance matrix can then be estimated by This second step in the estimation of the long-run covariance matrix Ω is needed because the wavelets used in this paper are real and cannot correct the phase-shift (given by (12)). This is not the case for the Fourier Whittle estimation described in [18]. Fortunately, the phase-shift can be expressed as a multiplicative cosine term in the covariance of the wavelet coefficients and a correction is still possible. [4] established that the MWW estimators (10) and (12) are consistent under non-restrictive conditions. The rate of convergence for the estimation of the long-range parameters d is similar to the MFW estimator and is minimax. We refer to [4] for the detailed study of the asymptotic behaviour of MWW estimation.

Examples of multivariate long-memory time series
This section is describing specific functions of multiwave for the user to be able to simulate multivariate long-memory processes. Parametric models are defined and implemented. In addition a data sets of real data from neuroimaging is provided.

Simulations of FIVARMA
multiwave package proposes simulation functions for time series with a spectral density satisfying approximation (2). The main function is fivarma which computes a parametric FIVARMA process defined in Section 2.1.
The input parameters of a F IV ARM A(q, d, r) process are the covariance matrix Σ of the innovation process u, the vector AR (AutoRegressive) (A k ) k=0,...,q , A k ∈ R p×p , the vector MA (MovingAverage) (B k ) k=0,...,r , B k ∈ R p×p , and the vector of long-range parameters d ∈ R p . FIVAR model of [16] is a subcase, corresponding to MA coefficients equal to zero. The parameters of fivarma are thus, in order, (N, d, cov_matrix, VAR, VMA) where cov_matrix= Σ and VAR and VMA denote respectively the sequences of matrices (A k ) k=0,...,q and (B k ) k=0,...,r .
The output of the function fivarma is a list with first the values X(1), X(2), . . . , X(N ) obtained by equation (3) with u(t) white noise with centered Gaussian distribution and covariance Σ. The second element of the list is the value of the matrix Ω defined in (4).
fivarma is based on two other functions: • fracdiff applies a vectorial fractional differencing procedure and corresponds to a FIVARMA(0,d,0).
• varma computes a realisation of a multivariate ARMA process and corresponds to the case d = 0.
Similar functions can be found in other packages (e.g. fracdiff and MST [22]) but were re-implemented in multiwave package. Example.

A real data set
In order to describe how parameters can be chosen in a practical point of view, we provide a real data example (see Section 7). Noninvasive data recorded from the brain are an example where the proposed methodology is efficient. The data consist of time series recording signals from the brain: electroencephalography (EEG) for the electrical signals, magnetoencephalography (MEG) for the magnetic signals or functional Magnetic Renonance Imaging (fMRI) for the Blood Oxygen Level Dependent (BOLD) signals. These data are intrinscally correlated because of the known interactions of the brain areas (also called regions of interest). Furthermore, it has already been shown that these time series present long-memory features [11]. Other data sets presenting similar features are coming from finance e.g. [20], where time series are correlated because of links between companies for example, and they also present long-memory characteristics. In this section, we observed time series extracted using fMRI facilities. The whole description of this data sets is detailed in [21]. The data set called brainHCP contains the time series of 1200 points in time and 89 regions of the brain. Figure 1 displays 6 arbitrary signals from one subject in this data set.

R> data(brainHCP) R> dim(brainHCP)
[1] 1200 89 The two functions mfw and mww are implementing the semiparametric Whittle estimation using respectively Fourier decomposition and wavelet decomposition in order to estimate d and Ω.
A fast execution of these two functions is

Multivariate Fourier Whittle estimation
As a first implementation, it is natural to use Fourier decomposition to approximate the spectral density of time series.
Package multiwave proposes functions to compute MFW estimators: • mfw computes the multivariate Fourier Whittle estimators of both the long-range dependence parameters and the long-run covariance matrix.
• mfw_cov_eval computes the multivariate Fourier-based Whittle estimator for the long-run covariance matrix for a given value of the long-range dependence d.
• mfw_eval returns the value of the multivariate Fourier Whittle criterion with respect to d at a given value of d.
The functions mfw_cov_eval and mfw_eval are internal functions of mfw. In mfw, we apply first a minimum search of mfw_eval with respect to d, and mfw_cov_eval is returning the estimation of Ω for the estimated value d.
We only detail function mfw hereafter and refer to the package description for other functions.
Let X be the p × N -matrix of observations, with general term x ,i = X (i), = 1, . . . , p and i = 1, . . . , N . Let m be the number of frequencies used in MFW procedure. Given x and m, the function mfw computes the MFW estimators defined by (5) and (6), with the frequencies λ j = 2πj/N , j = 1, . . . , m. The optimization in equation (5) is done using optimize function of R in one-dimensional settings and a Newton-type algorithm through nlm function of R otherwise. The initialization of the algorithm is set equal to the vector of univariate Fourier-based Whittle estimations. Even if it increases the computational time, such an initialization is important in high-dimensional settings. For example, in the MEG data set studied in [4], the optimization is done in R 274 . An initialization at the origin may not be able to reach the minimum even with a high number of iterations, due to the high dimension.
Function mfw returns a list with first the p-dimensional vector d The quality of estimation is depending on the parameter m. Theoretical results show that m must be small enough so the short-range properties of the time series do not bias the estimation. On the contrary a too small value will introduce variance in estimation since it decreases the number of frequency used in the procedure. [10], [18] or [13] propose a default value m = N 0.65 . This choice is discussed in the simulation study in Section 6. Example.

Multivariate Wavelet Whittle estimation
The functions applying MWW estimation in package multiwave are the following: • mww computes the multivariate wavelet Whittle estimators of the long-range dependence parameters and the long-run covariance matrix.
• mww_cov_eval computes the multivariate wavelet-based Whittle estimator for the long-run covariance matrix for a given value of the long-range dependence d.
• mww_eval returns the value of the multivariate wavelet Whittle criterion with respect to d at a given value of d.
mww_cov_eval and mww_eval are internal functions of mww. In mww, we apply first a minimum search of mww_eval with respect to d, and mww_cov_eval is returning the estimation of Ω for the estimated value of d. MWW estimation is based on the wavelet transform of time series and mww needs the definition of a wavelet filter. The computation of a filter and of a wavelet transform are described below.

Wavelet transform and scalogram
The wavelet decomposition in package multiwave is implemented using an exact discrete wavelet transform.
• computenj computes the number of wavelet coefficients for each individual scale.
• DWTexact provides the wavelet transform of the data.
• psi_hat_exact gives the Fourier transform of the wavelet function.
Example. To obtain the wavelet filter of a Daubechies' wavelet of order 4, that is with 2 vanishing moments, one should write: R> res_filter <-scaling_filter('Daubechies',4); R> filter <-res_filter$h R> filter [1] 0.4829629 0.8365163 0.2241439 -0.1294095 Next, given an N dimensional vector x, the wavelet coefficients of x are given by function DWTexact: Finally it is useful to compute the quantity K(δ) defined in equation (8). Thus one needs to recover the Fourier transform of the wavelet, ψ(·). This is done using the function psi_hat_exact. Its inputs are the filter defined previously and an index of precision. It returns ( ψ(u i )) i=1,...,q * 2 J where q is the length of the filter and u i are equally spaced points on the interval where res_psi$grid returns the values of the grid (u i ) and res_psi$psih returns the corresponding values of ψ(u i ). It is recommended to take J 15 in practice and the default value is J = 10. Indeed, a large value of J is increasing the computational time.
Given the function ψ(·), we are now able to evaluate K(d) for a given value of d: R> K <-K_eval(psih,gridh,d0)

Estimation
mww is now described in detail. Let X be the p × N -matrix of observations, with general entries x ,i = X (i), = 1, . . . , p and i = 1, . . . , N . Let LU be the bivariate vector giving the lowest scale j 0 and the upper scale j 1 of the wavelet coefficients used in estimation. Given X and LU , the function mww computes the MWW estimators defined by (10) and (12). As previously, the optimization in equation (10) is done using optimize function of R in one-dimensional settings and a Newton-type algorithm through nlm function of R otherwise. The initialization of the algorithm is set equal to the vector of univariate wavelet-based Whittle estimations. The reasons are identical to the ones given for the function mfw. . The quality of estimation is depending on the parameters j 0 and j 1 appearing in (10) and (11). Default value for j 0 is set to 2 and for j 1 to the highest integer lower than log 2 (N ). The critical value to choose for estimation is j 0 , as it can be seen in the theoretical conditions for consistency [4] and in simulations studies. Similarly to the choice of the parameter m for MFW procedure, a compromise exists between choosing a small value of j 0 , which would introduce a bias due to the short-range properties of the time series, and a high value that will reduce the number of frequencies and thus increase variance.

Example.
R> # Simulation of the data R> N <-2^8 R> d0 <-c(0.2,0.4) R> rho <-0.8 R> cov <-matrix(c(1,rho,rho,1),2,2) R> VMA <-diag(c(0.4,0.7)) R> VAR <-array(c(0.8,0.2,0,0.6),dim=c(2,2)) R> resp <-fivarma (N, d0,  If one wants to apply several times the estimation on the same data set, or modifying the parameters of estimation, it is useful to separate the wavelet transform and the estimation scheme. Wavelet-based estimation can be evaluated directly on the wavelet transform of the data using the following functions: • mww_wav computes the multivariate wavelet Whittle estimators of the long-range dependence parameters and the lon-run covariance matrix, given the wavelet transform of the data.
• mww_wav_cov_eval computes the MWW estimator for the long-run covariance matrix for a given value of the long-range dependence d, given the wavelet transform of the data.
• mww_wav_eval returns the value of the multivariate wavelet-based Whittle criterion with respect to d at a given value of d, for a specific wavelet transform of the data.
We refer to the description of the functions in the package for more details. Example.

Practical choices of parameters for MWW estimation
MWW procedure implemented in mww depends on mainly two parameters: the choice of the wavelet bases filter and the choice of the wavelet scales LU. These parameters are not fixed in the package as the performances of the estimation may be improved by a careful choice. The possibility to choose the wavelet scales is particularly of interest when dealing with short-range dependence. We show that a simple graphical representation of the scalogram is able to guide the user in the choice of the wavelet scales.

Choice of the wavelet bases
Actually multiwave only proposes Daubechies' wavelets, which satisfy theoretical properties of [4]. Other bases are possible but not implemented. The wavelet bases is imputed via the parameter filter in function mww, where filter is obtained by filter <-scaling_filter('Daubechies',2*M)$h. The main parameter characterizing the Daubechies' bases is thus the number of vanishing moments, M . MWW estimation presents the advantage to be available even if the time series are nonstationary or with polynomial trends, as soon as max d M . For example, for stationary time series, a parameter M = 1 is sufficient (which is equivalent to considering Haar bases). For real data application, when nonstationarity or trends are suspected, a higher value of M is necessary.
As discussed in [6], when M increases, the quality of estimation (slightly) decreases. Depending on the data, a compromise is then needed between choosing a large enough number of vanishing moments M to handle nonstationarity in the data and the quality of estimation. On the contrary, MFW estimators are only suited to stationary time series. Some extensions of Fourier-based estimation were proposed in univariate setting such as tapered Fourier (see e.g. [6] and references therein). For multivariate estimation [13] proposes an extension of [18] based on the transform defined in [1]. However, this approach gives satisfactory results only for d < 1.5 and we decided not to implement it in the package for simplicity.

Choice of wavelet scales
The second parameter we need to tune is LU, which corresponds to the range of scales used in estimation. LU is a two-dimensional vector, that is LU<-c(j0, j1), with j0 the lowest scale and j1 the upper scale. Parameters j0 and j1 are respectively j 0 and j 1 defined in equations (10) and (11) in the estimation procedure.
One advantage of wavelets is to be able to qualitatively evaluate the choice of wavelet scales to estimate the long-memory parameters and correlation by inspection of wavelet scalogram. As mentioned in [2,6] for univariate settings, the first and last scales may have to be discarded from the analysis. The first scale may be affected by the presence of short-memory phenomena. In the example of FIVARMA model, this is driven by the AR and MA coefficients. For the last scales, the impact is different and it comes from the finite length of the time series. Indeed, as derived in [24], the variance of the estimator is increasing with the wavelet scales. The usual log-scalogram diagram used in univariate settings is showing the linear behaviour of the log variance with respect to the wavelet scales [2]. This is also true for the covariances as shown in Proposition 2 of [4]. For all k ∈ Z, Cov(W j,k ( ), W j,k (m)) is equivalent to 2 j(d +dm) G ,m (d) when j goes to infinity, with G ,m (d) defined in equation (9). This property is illustrated in Figure 3 with a bivariate FIVARMA processes, as described in Section 2.1. This figure represents the boxplots of the variance of the wavelet coefficients of each component of the time series at each scales (subfigures 3.(a) and 3.(b)). These plots correspond to the usual log-scalogram diagram [2]. Subfigure 3.(c) displays the analog representation of the covariance between the components. For both variance and covariance, the points satisfying the above approximation are aligned. Scales corresponding to non-aligned points should be removed from estimation as it can be seen on subfigure 3.(a) and 3.(c) where the highest frequencies are modified by the presence of short-range dependence. Thus, one may discard the first scale from estimation to improve its quality. Using the results on the wavelet variance and covariance in terms of scales, we showed that the wavelet correlation is asymptotically constant with respect to the wavelet scales. Indeed, for all , m = 1, . . . , p, for all k ∈ Z, Cor(W j,k ( ), W j,k (m)) is equivalent to G ,m (d)/ G , (d)G m,m (d) when j goes to infinity [4]. As for the log-scalogram diagram, the correlation between wavelet coefficients with respect to the scales can be plotted and scales where the observed correlation is not equal to the value obtained for the majority of scales should be removed from estimation. The wavelet correlation spectrum is a complementary way to qualitatively evaluate the range of scales where the analysis should be carried out. Figure 4 illustrates this on four different data sets. Four different simulations of bivariate processes are applied using finite difference processes, FIVAR and FIVARMA processes, as described in Section 2.1. Figure 4 represents the boxplots of the correlation between wavelet coefficients of the two components of the time series at each scales. Again the presence of short-range dependence alters the highest frequencies (subfigures 4.(b) and 4.(c)). This is also observed with nonstationarity (subfigure 4.(d)). The first scales should then be removed from estimation. This free parameter of the package is particularly useful with the presence of short-range dependence or nonstationarity. Visual comparison to constant values may be easier for selection of the correct range of wavelet scale to use in the estimation. A similar discussion is detailed in Section 7 for a real neuroscience data set. With real data sets a bootstrap procedure is necessary to obtain boxplots, as it will be explained in Section 7. For the Fourier procedure, the equivalent parameter is the number of frequencies m. However, wavelets are providing a graphical way to choose the upper and lower scales. To our knowledge, no equivalent qualitative evaluation for Fourier procedure is available.

Numerical examples
In order to quantify the quality of the choice of the parameters, numerical examples are provided. The first tables illustrate the quality of the estimators for a long-memory process with no short-range dependence. Then, the simulations are complexified by adding short-range behaviour or nonstationarity. In each example we simulated 1000 Monte-Carlo replications of N = 512 observations of FIVARMA(q,d,r),  The quality of estimation is measured via the bias, the standard deviation (std) and the RMSE which is equal to √ bias 2 + std 2 . For clarity, all tables are displayed at the end of the paper.

A reference example
We first consider a simple example, with neither short-range dependence, nor nonstationarity. Time series were simulated using a bivariate F IV ARM A(0, d, 0) with a long-run correlation matrix Ω =  As shown in Figure 4(a), and discussed in Section 5.2, all scales can be kept for estimation. Table 2 displays results for the MWW estimation of d. This illustrates that multivariate estimation improves the quality of estimation for d. Indeed, the last column gives the ratio between the RMSEs of the multivariate wavelet-based estimation and of the univariate wavelet-based estimation (ratio M/U). This ratio is always smaller than 1, that is, multivariate RMSE is always lower than univariate RMSE.

Short-range dependence
Consider a F IV ARM A (1, d, 0) obtained with the model described in section 2.1 and given by the function fivarma. This case corresponds to a F IV AR model of [17]. The AR coefficient is taken equal to A = 0.8 0 0.2 0.6 and the correlation between the innovation processes equal to ρ = 0.8. More precisely let ε be a bivariate white noise process with covariance matrix Σ = 1 ρ ρ 1 and let u be the AR(1) process defined by u(t) + Au(t − 1) = ε(t). The time series observation X(t) at time t satisfies The matrix Ω in equation (4) is equal to The corresponding long-run correlation is thus equal to 0.754. As explained above, the finest scales are influenced by the short-range dependence and they have to be discarded from the estimation. As it can be seen in Figure 4(c), the first two scales should be removed. We obtained accordingly that the lowest RMSE in estimation is obtained taking j 0 = 3.
Numerical results for estimation are given in Table 4 and Table 5. We can observe that estimation of d and Ω are still satisfactory. The RMSE is very similar to the previous case with no short-range behaviour.

Nonstationarity
Nonstationary examples are simulated using values of d higher than 0.5. We consider order 1 or 2 of nonstationarity, that is d ∈ [0.5, 2.5) 2 . The behaviour of the wavelet correlations at each scale is illustrated in Figure 4(d). Contrary to stationary simulations where the optimal choice of j 0 was equal to j 0 = 1, the optimal choice of the parameter j 0 is j 0 = 2. Results are given in Table 6 and Table 7.
Comparing Table 2 and Table 6, the quality of estimation of d is still accurate in nonstationary settings, with similar values for the RMSE. As for the estimation of Ω, Table 3 and Table 7 indicate that MWW still provides a good quality estimation of the long-run covariance matrix. The quality is slightly lower but still satisfactory.

Discussion on identifiability
In practical applications, it seems natural to assume that time series have the same order of stationarity. However, when two time series have long-memory parameters d and d m satisfying d − d m = 1 the long-run covariance matrix Ω is no longer identifiable with the wavelet-based procedure. Indeed, Proposition 2 in [4] states that in this particular case the covariance Cov(W j,k ( ), W j,k (m)) tends to 0 when the scale j tends to infinity. Figure 5 illustrates this approximation for a bivariate  When d − d m = 1, the estimator (12) is no longer defined. In practice, the quantity d − d m cannot be exactly equal to 1. Nevertheless, as dividing by a cosine function of this difference, a small error in the estimation of (d , d m ) will lead to an important bias in the estimation of Ω ,m . As it can be seen in Figure 6, the resulting bias increases in the neighbourhood of the non-identifiable lines d − d m = ±1.
When this situation occurs, say when the difference between d − d m is between 0.75 and 1.25, the estimation of d is not affected. But the user must be careful for the estimation of Ω. One solution is to differentiate or integrate one of the two processes. For example, Table 1 illustrates the non-identifiability of Ω in a bivariate F IV ARM A(0, 0.2 1.2 , 0). When differentiating the second component (with d 2 = 1.2) the estimator has again good performances.

MFW estimation and comparison with MWW
The comparison between Fourier-based and wavelet-based approach is presented now. Time series were simulated using a bivariate F IV ARM A(0, d, 0) with a long-run correlation matrix Ω = 1 ρ ρ 1 and ρ = 0.8. The bivariate vector d is chosen in [0, 0.5) 2 , such that the time series are stationary. In such a setting, Fourier-based estimators are available.
For Fourier-based approach, the parameter to choose is m corresponding to the number of frequencies taken into account in the estimation. The default value in [18] is m = N 0.65 . We also make comparisons with an optimal value computed by minimizing the RMSE.  Table 8 gives the results obtained for the estimation of d using MFW procedure (to be compared to Table 2 for wavelets). Comparison between Fourier and wavelet-based procedures is summarized by the ratio between the RMSE given by MWW estimation and the RMSE given by MFW estimation, denoted by ratio W/F. Taking the same number of frequencies as [18], that is, m = N 0.65 , Table 8 shows that the quality of MWW end MFW procedures are comparable, even if wavelet-based estimation slightly improves Fourier-based estimation with such a choice of m.

Estimation of the long-memory parameters
Next we also consider the number of frequencies leading to the minimal RMSE for MFW estimation. As it can be seen in Table 8, qualities of both procedures are very similar but MFW estimation then (slightly) surpasses MWW estimation.
Very precise comparisons of Fourier-based and wavelet-based approaches are described in [6] for a univariate setting. In particular, it is shown that since the time series are stationary, the use of Haar bases should improve MWW quality. The authors indeed obtained better results with the Haar-based  [6]. They quantify the influence of the regularity of the wavelet bases and discuss the comparison with (tapered) Fourier bases. Similar results are expected to be obtained in the case of multivariate time series, however, this topic exceeds the scope of this paper.

Estimation of the long-run covariance
Finally, Table 9 and Table 10 display results for the estimation of Ω with MFW method (to be compared to Table 3 for wavelets). When MFW estimation is applied with the usual number m = N 0.65 of frequencies, one can see that the wavelet-based procedure still estimates better the long-run covariance and the long-run correlation, with a ratio W/F always lower than 0.6 for the estimation of Ω terms. When the number of frequencies in Fourier-based estimation is chosen optimally, MFW and MWW procedures behave similarly and none appears significantly better than the other.
To conclude, MFW and MWW estimation procedures give very similar results. The slight improvement of Fourier-based procedure for the estimation of d can be explained by the choice of the wavelet bases, however wavelets are efficient for a large set of applications, including time series with trends and nonstationarity features.

Application on real neuroscience data
As already shown in Figure 4 for simulated data, the advantage of representing the wavelet correlation in terms of scale is to qualitatively determine the scales necessary to estimate the long-memory parameters and long-range covariance matrix. When dealing with real data, bootstrap is providing a way to assess the variability of the estimators. Using the real data described in Section 3.2, sliding overlapping window of the time series were extracted containing 512 points and we repeated the estimation until reaching the final point of the time series. This is illustrated in Figure 7, where an example of four pairs of fMRI data from one subject is presented. Boxplots are constructed using the sliding window extractions. From these plots, and taking into account neuroscientific hypothesis stating that the signal of interest for resting state is occurring for frequency below 0.1Hz, we chose to compute the long-memory parameters between scales 3 and 6.

Conclusion
The R package multiwave provides a versatile wavelet-based approach, as well as a Fourier-based approach, for estimating long-memory parameters and long-run covariance matrices of multivariate time series. The two estimation procedures are based on semi-parametric approaches of [18] and [4]. The added value of the package is to provide estimations in long-range dependence multidimensional settings, which is not proposed presently by any R package to our knowledge. This paper describes the functions of the package multiwave and discusses some practical points for applications, including on a real data set. A simulation study shows first that multivariate estimation improves univariate estimation. The advantage of the wavelet-based procedure with respect to the Fourier-based estimation is its flexibility, allowing to take into account trends or nonstationarity.          Table 9: Fourier Whittle estimation of Ω for a bivariate ARF IM A(0, d, 0) with ρ = 0.8, N = 512 with 1000 repetitions. The number of frequencies is m = n η with η = 0.65 as chosen in [18].