cts : An R Package for Continuous Time Autoregressive Models via Kalman Filter

We describe an R package cts for fitting a modified form of continuous time autoregressive model, which can be particularly useful with unequally sampled time series. The estimation is based on the application of the Kalman filter. The paper provides the methods and algorithms implemented in the package, including parameter estimation, spectral analysis, forecasting, model checking and Kalman smoothing. The package contains R functions which interface with underlying Fortran routines. The package is applied to geophysical and medical data for illustration.


Introduction
The discrete time autoregressive model of order p, the AR(p), is a widely used tool for modeling equally spaced time series data. It can be fitted in a straightforward and reliable manner and has the capacity to approximate a second order stationary process to any required degree of precision by choice of a suitably high order. Various information criteria, such as the AIC (Akaike 1974) or the BIC (Schwarz 1978), can in practice be used to determine a suitable order when fitting to a finite data set. This model is clearly not appropriate for irregularly sampled data, for which various authors have advocated the use of the continuous time autoregressive model of order p, the CAR(p). For example, Jones (1981) developed an estimation method for the CAR(p) fitted to discretely sampled data through the application of the Kalman filter. However, the CAR(p) model does not share with its discrete counterpart the same capacity to approximate any second order stationary continuous process. Belcher, Hampton, and Tunnicliffe Wilson (1994) modified the parameterization and structure of the CAR(p) model in such a way that this general capacity for approximation was restored, requiring only a relatively minor modification of the estimation procedure of Jones. More recently (Tunnicliffe Wilson and Morton 2004), this modified CAR(p) model has been named the CZAR(p) model and expressed in autoregressive form using the generalized continuous time shift operator, with the alternative parameterization appearing in a natural manner. Wang, Woodward, and Gray (2009) utilized the methods in Belcher et al. (1994) for fitting time varying nonstationary models. Since the technical details of the Kalman filter for CAR models are scattered in the literature, we give a thorough presentation in this paper, which provides a foundation for the implementations in the R (R Development Core Team 2013) package cts (Tunnicliffe Wilson and Wang 2013) for fitting CAR models with discrete data.
The cts package contains typical time series applications including spectral estimation, forecasting, diagnostics, smoothing and signal extraction. The application is focused on unequally spaced data although the techniques can be applied to equally spaced data as well. The paper is organized as follows. Section 2 summarizes the methods from which the cts package was developed. Section 3 outlines the implementations in the package. Section 4 illustrates the capabilities of cts with two data sets. Finally, Section 5 concludes the paper.

The CAR and CZAR model
Suppose we have data x_k observed at times t_k for k = 1, 2, ..., n. We assume noise-affected observations x_k = y(t_k) + η_k, where y(t_k) follows a pth order continuous time autoregressive process Y(t) satisfying

  Y^(p)(t) + α_1 Y^(p−1)(t) + ... + α_p Y(t) = ε(t),   (1)

where Y^(i)(t) is the ith derivative of Y(t) and ε(t) is the formal derivative of a Brownian process B(t) with variance parameter σ² = var{B(t + 1) − B(t)}. In addition, it will be assumed that η_k is a normally distributed random variable representing observational error, uncorrelated with ε(t), with E(η_k) = 0, E(η_j η_k) = 0 for j ≠ k, and E(η_k²) = γσ². In operator notation, model (1) becomes

  α(D) Y(t) = ε(t),   (2)

where D is the derivative operator. The corresponding characteristic equation is then given by

  α(s) = s^p + α_1 s^(p−1) + ... + α_(p−1) s + α_p = 0.   (3)
To assure the stability of the model, a parameterization was constructed on the zeros r_1, ..., r_p of α(s) (Jones 1981), i.e.,

  α(s) = (s − r_1)(s − r_2)...(s − r_p),  with Re(r_i) < 0.   (4)

The model in the cts package follows the modified structure (Belcher et al. 1994):

  α(D) Y(t) = (1 + D/κ)^(p−1) ε(t),   (5)

with scaling parameter κ > 0. This introduces a prescribed moving average operator of order p − 1 into the model, which makes model selection convenient, along with other theoretical benefits described in Belcher et al. (1994). In practice model (5) has been found to fit data quite well without the need for an observation error term (Belcher et al. 1994). Indeed, for large orders p, the observation error variance may become confounded with the parameters of model (5), leading to some difficulty in estimation (Belcher et al. 1994). As argued by these authors, model (5) can be sufficient unless there is substantial variability between measurements at the same, or very nearly coincident, times.
The power spectrum of the pth order continuous process (5) is defined by

  G(f) = σ² {1 + (2πf/κ)²}^(p−1) / |α(2πif)|².   (6)

This may also be expressed in a form which defines the alternative set of parameters φ_1, ..., φ_p, by using the transformation g = arctan(2πf/κ)/π of the frequency f. The link between the α and φ parameters is given in Belcher et al. (1994), equation (9) and the line which follows. The new parameter space is identical to that of the stationary discrete time autoregressive model. The CZAR model of Tunnicliffe Wilson and Morton is identical except that the right-hand side of (5) is modified to (κ + D)^(p−1) ε(t), so that σ²/κ^(2p−2) is re-scaled to σ².
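To make the spectral form concrete, the following is a minimal numerical sketch (illustrative Python, not the package's Fortran implementation; the function name is hypothetical) evaluating a spectrum of the form (6) from the α coefficients:

```python
import numpy as np

def car_spectrum(freqs, alpha, kappa, sigma2=1.0):
    """Spectrum of form (6): sigma2 * {1 + (2*pi*f/kappa)^2}^(p-1) / |alpha(2*pi*i*f)|^2.

    alpha lists the coefficients [a1, ..., ap] of the monic polynomial
    alpha(s) = s^p + a1 s^(p-1) + ... + ap.
    """
    p = len(alpha)
    f = np.asarray(freqs, dtype=float)
    s = 2j * np.pi * f
    # evaluate the monic characteristic polynomial at s = 2*pi*i*f
    poly = np.polyval([1.0] + list(alpha), s)
    # moving average factor of order p-1 from the (1 + D/kappa)^(p-1) operator
    ma = (1.0 + (2.0 * np.pi * f / kappa) ** 2) ** (p - 1)
    return sigma2 * ma / np.abs(poly) ** 2
```

For p = 1 the moving average factor disappears and the spectrum reduces to the familiar CAR(1) form σ²/{a² + (2πf)²}.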
The system frequencies are determined by the roots of (4). In fact, the representation (4) breaks a pth order autoregressive operator into its irreducible linear factors and quadratic factors with complex zeros. A quadratic factor (s − r_(2k−1))(s − r_(2k)) with complex zeros is associated with a component of the data having the nature of a stochastic cycle, with approximate frequency given by f = |Im(r_(2k))|/(2π), where |Im(r_(2k))| is the absolute value of the imaginary part of r_(2k). This will be reflected as a component of the autocorrelations with the same frequency, decaying in amplitude at a rate equal to the absolute value of the real part of the same zero. The model can represent a strong cycle in the data if this decay rate is very low. A linear factor with zero at r_k is associated with a component of the data having the nature of a first order autoregression, with autocorrelation function exponentially decaying at rate |r_k|. If |r_k| is very large, this can give the appearance of a white noise component.
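Reading off system frequencies and decay rates from the zeros can be sketched numerically; the helper below (an illustration, not part of cts) extracts |Im(r)|/(2π) and |Re(r)| for each root of α(s):

```python
import numpy as np

def system_components(alpha):
    """For each root r of alpha(s) = s^p + a1 s^(p-1) + ... + ap,
    return (approximate cycle frequency |Im(r)|/(2*pi), decay rate |Re(r)|)."""
    roots = np.roots([1.0] + list(alpha))
    return [(abs(r.imag) / (2.0 * np.pi), abs(r.real)) for r in roots]
```

For example, the quadratic factor (s + 1)² + (2π)² has zeros −1 ± 2πi, corresponding to a cycle of frequency 1 decaying at rate 1.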

Kalman filtering
This section deals with the details of applying the Kalman filter to estimate the parameters of model (5), following Jones (1981) and Belcher et al. (1994). To apply the Kalman filter, model (5) must be rewritten in a state space form, which may be found in Wiberg (1971). Let the unobservable state vector be θ(t) = (z(t), z'(t), z''(t), ..., z^(p−1)(t))' and θ'(t) its first derivative. The state equation is then given by

  θ'(t) = A θ(t) + R ε(t),

where A is the companion matrix with ones on the superdiagonal and last row (−α_p, −α_(p−1), ..., −α_1), and R = (0, ..., 0, 1)'. The observation equation is given by

  y(t_k) = H θ(t_k),

where the elements of the 1 × p vector H are determined by the moving average operator in (5). Suppose that A can be diagonalized as A = U D U^(−1), where r_1, r_2, ..., r_p, the roots of α(s), are the eigenvalues of A, and D is a diagonal matrix with these roots as its diagonal elements. In this case, we let θ = U ψ, and the state equation becomes

  ψ'(t) = D ψ(t) + J ε(t),

where J = U^(−1) R. Consequently, the observation equation becomes

  y(t_k) = G ψ(t_k),

where G = HU. The necessary and sufficient condition for the diagonalization of A is that A has distinct eigenvalues. The diagonal form not only provides computational efficiency, but also provides an interpretation of unobserved components. The evaluation of T_(θ,k) = e^(Aδ_k) (standard form) is required, where δ_k = t_k − t_(k−1). For a review of computations related to the exponential of a matrix, see Moler and Loan (2003). For the diagonal form, T_(ψ,k) = e^(Dδ_k) is diagonal with elements e^(r_i δ_k). When a diagonal form is not available, a numerical matrix exponential evaluation is needed.
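The equivalence of the standard and diagonal transition matrices can be verified numerically. The sketch below (illustrative Python, using a deliberately naive truncated-Taylor matrix exponential rather than a production routine) compares e^(Aδ) with U e^(Dδ) U^(−1) for a CAR(2) companion matrix:

```python
import numpy as np

def companion(alpha):
    """Companion matrix A of the CAR(p) state equation: superdiagonal ones,
    last row (-a_p, ..., -a_1), for alpha = [a1, ..., ap]."""
    p = len(alpha)
    A = np.zeros((p, p))
    A[:-1, 1:] = np.eye(p - 1)
    A[-1, :] = -np.asarray(alpha, dtype=float)[::-1]
    return A

def expm_taylor(M, terms=60):
    """Truncated Taylor series for e^M; adequate for small well-scaled M."""
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

def transition_diagonal(alpha, delta):
    """T = U diag(exp(r_i * delta)) U^{-1}, valid for distinct eigenvalues."""
    r, U = np.linalg.eig(companion(alpha))
    return (U * np.exp(r * delta)) @ np.linalg.inv(U)
```

With α(s) = s² + 3s + 2 (roots −1 and −2), both routes produce the same transition matrix for any step δ, up to numerical error.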
To start the Kalman filter recursions, initial conditions are needed. For a stationary model, the unconditional covariance matrix of the state vector θ(t) is known (Doob 1953) and is used in Jones (1981) and Harvey (1990, §9.1). The initial state for both the standard and the diagonalized version can be set as θ_0 = 0 and ψ_0 = 0, respectively. The stationary covariance matrix Q satisfies

  A Q + Q A' + σ² R R' = 0.

When A can be diagonalized, it is straightforward to show that the stationary covariance matrix Q_ψ of ψ has elements

  (Q_ψ)_jk = −σ² J_j J̄_k / (r_j + r̄_k),   (16)

where J̄_k and r̄_k are the complex conjugates of J_k and r_k, respectively.
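For small p the stationary covariance equation can be solved directly by vectorization; the following illustrative Python sketch (not the package's approach, which uses the closed form (16)) solves A Q + Q A' + σ² R R' = 0 and can be checked against the CAR(1) value σ²/(2a):

```python
import numpy as np

def stationary_cov(A, sigma2=1.0):
    """Solve the Lyapunov equation A Q + Q A^T + sigma2 * R R^T = 0
    with R = (0, ..., 0, 1)^T, via Kronecker vectorization (small-p sketch)."""
    p = A.shape[0]
    R = np.zeros((p, 1))
    R[-1, 0] = 1.0
    I = np.eye(p)
    # vec(A Q + Q A^T) = (kron(I, A) + kron(A, I)) vec(Q)
    lhs = np.kron(I, A) + np.kron(A, I)
    q = np.linalg.solve(lhs, -sigma2 * (R @ R.T).reshape(-1))
    return q.reshape(p, p)
```

The result is symmetric, and substituting it back into the Lyapunov equation recovers −σ² R R' exactly.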
The scale parameter κ can be chosen approximately as the reciprocal of the mean time between observations. The Kalman filter algorithm for the diagonal form is presented below. Starting with the initial stationary state vector ψ_0 = ψ(0|0) = 0 and the initial stationary state covariance matrix Q_ψ of (16), the recursion proceeds as follows:

1. Predict the state vector at time t_k: ψ(t_k|t_(k−1)) = T_(ψ,k) ψ(t_(k−1)|t_(k−1)), where T_(ψ,k) = e^(Dδ_k) is a diagonal matrix with elements e^(r_i δ_k).
2. Calculate the covariance matrix of this prediction: P(t_k|t_(k−1)) = T_(ψ,k) P(t_(k−1)|t_(k−1)) T̄'_(ψ,k) + Q_k, where Q_k has elements −σ² J_j J̄_l {1 − e^((r_j + r̄_l)δ_k)} / (r_j + r̄_l).
3. Predict the observation at time t_k: x̂_k = G ψ(t_k|t_(k−1)).
4. Calculate the prediction error e_k = x_k − x̂_k and its variance v_k = G P(t_k|t_(k−1)) Ḡ' + γσ².
5. Update the estimate of the state vector: ψ(t_k|t_k) = ψ(t_k|t_(k−1)) + P(t_k|t_(k−1)) Ḡ' e_k / v_k.
6. Update the covariance matrix: P(t_k|t_k) = P(t_k|t_(k−1)) − P(t_k|t_(k−1)) Ḡ' G P(t_k|t_(k−1)) / v_k.
7. The unknown scale factor σ² can be concentrated out by temporarily setting σ² = 1; −2 log-likelihood is then calculated as

  −2 log L = Σ_k { log v_k + e_k² / v_k }.   (25)

The log-likelihood function (25) can thus be evaluated by a recursive application of the Kalman filter, and a nonlinear numerical optimization routine is then used to determine the parameter estimates. The unknown scale factor can then be estimated by

  σ̂² = (1/n) Σ_k e_k² / v_k.

When a diagonal form is not stable, a standard form Kalman filter recursion may be found in Belcher et al. (1994) or Wang (2004). However, the computational load is reduced dramatically with the diagonal form, since the matrix D is diagonal.
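For intuition, the recursion collapses neatly in the CAR(1) case. The sketch below (illustrative Python, not the package's Fortran code; the function name is hypothetical) runs the scalar Kalman recursion on irregularly spaced observations and accumulates the −2 log-likelihood of form (25):

```python
import numpy as np

def car1_neg2loglik(times, x, a, sigma2=1.0, gamma=0.0):
    """-2 log-likelihood (up to an additive constant) of a CAR(1) process
    dz = -a z dt + dB observed at irregular times, via the Kalman recursion."""
    psi, P = 0.0, sigma2 / (2.0 * a)          # stationary prior mean and variance
    out = 0.0
    for k in range(len(times)):
        if k > 0:
            delta = times[k] - times[k - 1]
            T = np.exp(-a * delta)            # scalar transition e^{r delta}, r = -a
            psi = T * psi                     # state prediction
            # prediction variance: decayed filtered variance plus accumulated noise
            P = T * P * T + sigma2 * (1.0 - np.exp(-2.0 * a * delta)) / (2.0 * a)
        e = x[k] - psi                        # innovation
        v = P + gamma * sigma2                # innovation variance
        out += np.log(v) + e * e / v
        psi += (P / v) * e                    # state update
        P -= (P / v) * P                      # covariance update
    return out
```

Note how the gap δ_k between observations enters only through the transition e^(−aδ_k) and the accumulated state noise, which is exactly what makes the filter natural for unequally sampled data.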
When the nonlinear optimization is successfully completed, in addition to the maximum likelihood estimation of the parameters and error variances, the Kalman filter returns the optimal estimate of the state and the state covariance matrix at the last time point. The forecasting of the state, state covariance matrix and observation can be continued into future desired time points using equations from (17) to (20).

Model selection
To identify a model order, Belcher et al. (1994) proposed a strategy based on the reparameterization. Start with a model of large order, obtain the parameter vector φ and its covariance matrix V_φ, make a Cholesky decomposition V_φ^(−1) = L_φ L_φ', where L_φ is a lower triangular matrix, define the vector of t statistics t_φ = L_φ' φ, and construct from these the criterion AIC_d. The index of the minimum value of AIC_d suggests a preferred model order. In addition, if the true model order p is less than the large value used for model estimation, then for i > p the t statistics may be treated as standard normally distributed variables, so that their deviation from their true values of 0 will be small. However, Belcher et al. used a 33 MHz maths co-processor, and with the speed of present day computers the best practice is to compute the classical AIC or BIC (Akaike 1974; Schwarz 1978) by fitting models of increasing order p to the series. The AIC is defined as n log SS(p) + 2p and the BIC as n log SS(p) + p log(n), where SS is the sum of squares function given by Belcher et al. (1994), equation 15. The AIC and BIC can easily be modified if the mean of the series is also estimated. The package cts implements the relevant functions for model selection, and a data example is illustrated below.
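Once the sums of squares are available, the classical order selection above is a one-line computation; a hypothetical Python sketch (names illustrative):

```python
import numpy as np

def select_order(ss, n, criterion="AIC"):
    """Pick a model order from sum-of-squares values ss[p-1] for p = 1..P,
    using AIC = n log SS(p) + 2p or BIC = n log SS(p) + p log(n)."""
    pens = {"AIC": lambda p: 2.0 * p, "BIC": lambda p: p * np.log(n)}
    vals = [n * np.log(s) + pens[criterion](p) for p, s in enumerate(ss, 1)]
    return int(np.argmin(vals)) + 1, vals
```

As usual, the heavier log(n) penalty means the BIC tends to choose a lower order than the AIC when the fit improvement at higher orders is marginal.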

Diagnostics
The assumptions underlying the model (7) and (10) are that the disturbances ε(t) and η_k are normally distributed and serially independent with constant variances. Under these assumptions, the standardized one-step forecast errors are also normally distributed and serially independent with unit variance. Hence, in addition to inspection of the time plot, a normal Q-Q plot can be used to compare the ordered residuals against their theoretical quantiles. For a white noise sequence, the sample autocorrelations are approximately independently and normally distributed with zero mean and variance 1/n. Note that for a purely random series, the cumulative periodogram should follow the line y = 2x, where x is frequency. A standard portmanteau test statistic for serial correlation, such as the Ljung-Box statistic, can be used as well. This calculation of scaled innovations is the same as in classical discrete-time models: a sequence of n numbers is obtained, and their autocorrelations and discrete-time spectrum are calculated, i.e., the time between observations does not enter these calculations.
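As an illustration of the portmanteau check, a minimal Python implementation of the Ljung-Box statistic applied to the standardized one-step forecast errors might look as follows (a sketch; cts provides this check through its tsdiag function):

```python
import numpy as np

def ljung_box(res, nlags=10):
    """Ljung-Box statistic Q = n(n+2) * sum_h acf(h)^2 / (n-h) on residuals;
    under the white-noise hypothesis, approximately chi-squared with nlags df."""
    res = np.asarray(res, dtype=float) - np.mean(res)
    n = len(res)
    denom = np.sum(res ** 2)
    acf = np.array([np.sum(res[h:] * res[:-h]) / denom
                    for h in range(1, nlags + 1)])
    return n * (n + 2) * np.sum(acf ** 2 / (n - np.arange(1, nlags + 1)))
```

Because the statistic is built from the sequence of standardized innovations alone, the irregular observation times play no role here, exactly as noted above.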

Kalman smoothing
For a structural time series model, it is often of interest to estimate the unobserved components at all points in the sample; estimation of smoothed trend and cyclical components provides an example. The purpose of smoothing at time t is to find the expected value of the state vector conditional on the information made available after time t. In this section, a fixed-interval smoothing algorithm (Harvey 1990, §3.6.2) is implemented, with modifications for the model considered, though a more efficient approach is possible; see the discussion in Durbin and Koopman (2001, §4.3). Estimating unobserved components relies on the diagonal form, which associates the state components with the corresponding roots r_1, ..., r_p. The smoothed state and covariance matrix are given by the backward recursion

  ψ_s(t_k|t_n) = ψ(t_k|t_k) + P*_k { ψ_s(t_(k+1)|t_n) − T_(ψ,k+1) ψ(t_k|t_k) },
  P_s(t_k|t_n) = P(t_k|t_k) + P*_k { P_s(t_(k+1)|t_n) − P(t_(k+1)|t_k) } P̄*_k',

where

  P*_k = P(t_k|t_k) T̄'_(ψ,k+1) P(t_(k+1)|t_k)^(−1)

and T_(ψ,k+1) = e^(D(t_(k+1) − t_k)); here T̄_(ψ,k+1) and P̄*_k denote complex conjugates. To start the recursion, the initial values are given by ψ_s(t_n|t_n) = ψ(t_n|t_n) and P_s(t_n|t_n) = P(t_n|t_n). The observed value x_k, in the absence of measurement error, is the sum of contributions from the diagonalized state variables ψ, i.e., x_k = Σ_j G_j ψ_j(t_k). Therefore, the original data may be partitioned, as in Jenkins and Watts (1968, §7.3.5). Any pair of complex conjugate zeros of (4) is associated with two corresponding state variables whose combined contribution to x_k represents a source of cyclical variation. Real zeros correspond to exponential decay; if a real zero is very large in absolute value, this can give the appearance of a white noise component. Hence, the contributions G_j ψ_j at every time point can be estimated from all the data using the Kalman smoother described above.
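A single backward step of the fixed-interval smoother, specialized to one real scalar state component for clarity, can be sketched as follows (illustrative Python; the package works with the full complex diagonal form, and the argument names here are hypothetical):

```python
def smooth_step(psi_f, P_f, psi_p, P_p, psi_s_next, P_s_next, T):
    """One backward smoothing step for a real scalar state component.

    psi_f, P_f: filtered state and variance at t_k, i.e., psi(t_k|t_k), P(t_k|t_k)
    psi_p, P_p: one-step prediction at t_{k+1}, i.e., psi(t_{k+1}|t_k), P(t_{k+1}|t_k)
    psi_s_next, P_s_next: smoothed state and variance at t_{k+1}
    T: scalar transition e^{r * (t_{k+1} - t_k)}
    """
    Pstar = P_f * T / P_p                         # smoother gain P*_k
    psi_s = psi_f + Pstar * (psi_s_next - psi_p)  # note psi_p = T * psi_f
    P_s = P_f + Pstar * (P_s_next - P_p) * Pstar
    return psi_s, P_s
```

A useful sanity check: if the smoothed quantities at t_(k+1) coincide with the one-step prediction (no extra information flows backward), the smoothed values at t_k reduce to the filtered ones.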

Implementation
The cts package utilizes the Fortran program developed by the authors of Belcher et al. (1994), with substantial additional Fortran and R routines. In this process, two Fortran subroutines used in Belcher et al. (1994) had to be substituted, since they belong to the commercial NAG Fortran Library developed by the Numerical Algorithms Group. One subroutine computed the approximate solution of a set of complex linear equations with multiple right-hand sides, using an LU (lower-upper) factorization with partial pivoting. The other found all roots of a real polynomial equation, using a variant of Laguerre's method. In the cts package, these subroutines have been replaced by their publicly available counterparts in the LAPACK and BLAS Fortran libraries. All Fortran programs are written in double precision.
If a constant term is estimated with the default setting ccv="CTES" in the car function, it represents the mean µ, and the model is formulated in terms of (x − µ). With the setting ccv="MNCT", the series is not actually mean corrected; the sample mean is simply used to estimate µ in the above model formulation. Several supporting R functions are available in the cts package that extract or calculate useful statistics based on the fitted CAR model, such as the model summary, predicted values and the model spectrum. In particular, the function car returns objects of class car, for which the following methods are available: print, summary, plot, predict, AIC, tsdiag, spectrum and kalsmo. A detailed description of these functions is available in the online help files; here a brief introduction is given, and usage is illustrated in the next section. Specifying trace=TRUE in car_control triggers an annotated printout of information during the fitting process and of the major results for the fitted model. The model fitting results can be graphically displayed with the plot function: with the argument type equal to "spec", "pred" or "diag", a figure is plotted for the spectrum, the predicted values or model diagnostic checking, respectively. Three types of prediction exist: forecasting past the end of the series, forecasting the last L steps, and forecasting the last L steps with updating; these are selected with the argument fty=1, 2 or 3, respectively. For instance, ctrl=car_control(fty=1, n.ahead=10) predicts 10 steps past the end. The function AIC generates both the t statistics and AIC values following Section 2.3. The function tsdiag follows Section 2.4 to provide model diagnostic checking; indeed, this function provides the backbone for plot with argument type="diag". The function kalsmo implements the Kalman smoothing described in Section 2.5.
The source version of the cts package is freely available from the Comprehensive R Archive Network (http://CRAN.R-project.org). The reader can install the package directly from the R prompt via

R> install.packages("cts")

All analyses presented below are contained in a package vignette. The rendered output of the analyses is available by the R command

R> library("cts")
R> vignette("kf", package = "cts")

To reproduce the analyses, one can invoke the R code

R> edit(vignette("kf", package = "cts"))

Data examples
Two data examples from Belcher et al. (1994) are used to illustrate the capabilities of cts; a detailed description of the data can be found in the original paper. Since some of the analysis here reproduces the results in Belcher et al. (1994), lengthy discussion is omitted for brevity. These analyses were done using R version 3.0.0. We first fit a model of order 14 to the data, following Belcher et al. (1994); the scale parameter is likewise chosen to be 0.2. The estimation algorithm converges rather quickly, as demonstrated in the following printout, which shows the sum of squares and the value of φ_14 at each iteration. The results are similar to Table 1 of Belcher et al. (1994), whose computation took 30 minutes on a PC386/387 machine. Those authors expected that simple improvements to the program's code could substantially speed up the procedure. Although the current cts package makes no attempt at such improvements, running the car function took only 0.3 seconds on an ordinary desktop PC (Intel Core 2 CPU, 1.86 GHz). Such a dramatic efficiency gain is driven less by software changes than by the hardware advances of the last 20 years.

Geophysical application
R> V22174.car14 <- car(V22174, scale = 0.2, order = 14)
R> tab1 <- cbind(V22174.car14$tnit, V22174.car14$ss, V22174.car14$bit[, 14])
R> colnames(tab1) <- c("Iteration", "Sum of Squares", "phi_14")
R> print(as.data.frame(round(tab1, 5)), row.names = FALSE, print.gap = 8)

Following Section 2.3, model selection was conducted with the AIC, which generates exactly the same results as Table 2 of Belcher et al. (1994). The first-order value of the AIC shows the most rapid drop from the baseline of 0. A large t value of 3.20 suggests order 7, while the minimum AIC implies order 9. For illustration, model order 7 was selected, as in Belcher et al. (1994). The estimated spectra for the models of order 14 and 7 are displayed on a logarithmic (base 10) scale in Figure 2. Both models indicate three peaks, but for the model of order 14 the resolution is much improved. In the spectrum calculation, the default frequency range runs from zero to the scale parameter κ in n.freq=500 intervals. It is convenient to plot the spectrum over a new range of frequencies with the argument frmult, which can be used to multiply the frequency range.

Figure 2: Spectra from the fitted CAR(14) and CAR(7) models for the oxygen isotope series.
To check the model assumptions as described in Section 2.4, Figure 3 displays a plot of the standardized residuals, the ACF of the residuals, the cumulative periodogram of the standardized residuals, and the p values associated with the Ljung-Box statistic. Visual inspection of the time plot of the standardized residuals in Figure 3 shows no obvious patterns, although one outlier extends beyond 3 standard deviations. The ACF of the standardized residuals shows no apparent departure from the model assumptions, i.e., the autocorrelations at lags > 0 are approximately independently and normally distributed with zero mean and variance 1/n. The cumulative periodogram of the standardized residuals follows the line y = 2x reasonably well, and the Ljung-Box statistic is not significant at the lags shown.

Medical application
Belcher et al. (1994) analyzed 209 measurements of the lung function of an asthma patient. The time series is measured mostly at 2 hour intervals, but with irregular gaps, as demonstrated by the unequal spacing of the tick marks in Figure 4.

R> data("asth")
R> plot(asth, type = "l", xlab = "Time in hours", ylab = "")
R> rug(asth[, 1], col = "red")

To apply cts, a scale parameter of 0.25 was chosen and a model of order 4 was fitted to the data (Belcher et al. 1994). The log-spectrum (base 10) of the fitted model is shown in Figure 5. The spectral peak indicates a strong diurnal cycle in the data.
It is possible to fit a model with an observation error term by setting vri=TRUE in the parameter control statement; the following code shows how to fit such a model. The estimated observation error variance can be found with the summary command, and the corresponding spectrum in Figure 5 is compared with that of the model without an observation error.
Finally, we predict 10 steps past the end of the time series in Figure 7.

Conclusion
In this article we have outlined the methods and algorithms for fitting continuous time autoregressive models through the Kalman filter. The theoretical ingredients of the Kalman filter have their counterparts in the R package cts, which can be particularly useful with unequally sampled time series data.