State Space Methods in RATS

This paper uses several examples to show how the econometrics program RATS can be used to analyze state space models. It demonstrates Kalman filtering and smoothing, estimation of hyperparameters, unconditional and conditional simulation. It also provides a more complicated example where a dynamic simultaneous equations model is transformed into a proper state space representation and its unknown parameters are estimated.


Introduction
RATS (Estima 2010) is a general-purpose econometrics and programming package, with a specialty in time series analysis.The instruction in RATS for handling state space models is called DLM (for dynamic linear models).This was introduced with version 5.00 in 2001.Since then, there have been many improvements.With version 8, the instruction has the following features: All component matrices can be fixed matrices, or time-varying formulas, or can be computed using functions of arbitrary complexity.
Multiple observables (y t with dimension > 1) are permitted, with proper handling of situations where some components are missing, but some are present.
Non-stationary roots in the transition matrix are treated with the "exact" (limit) methods of Koopman (1997) and Durbin and Koopman (2001).The transition matrix is analyzed automatically for stationary and non-stationary roots.
The ergodic variance for the stationary (linear combinations of) states is computed using the efficient Schur decomposition method described in Doan (2010).
The calculations of the Kalman filter and smoother can switch (under user control) to use the last calculated values of the Kalman gain and predictive variances.This can save time in large models with time-invariant component matrices.
With the RATS distribution, we include the worked examples from several textbooks devoted to space-space models, including Durbin and Koopman (2001), Commandeur and Koopman (2007), West and Harrison (1997) and Harvey (1989).These are also posted on our web site at http://www.estima.com/textbookindex.shtml.
The aim of RATS is to combine the best features of statistical and "math" packages.It has almost all of the capabilities of the commonly used matrix languages1 , but also includes carefully written, highly-optimized instructions for estimating and analyzing the most important types of models.DLM is a particularly good example of this.While it is not hard to write in matrix form Kalman filtering and Kalman smoothing for small, well-behaved special cases, it is much harder to handle (correctly) missing data and initialization and to hook the calculations up with optimization code.DLM handles those issues and more.
In addition to the "built-in" instructions like DLM and GARCH (which handles a wide variety of univariate and multivariate GARCH models), RATS includes hundreds of procedures which are usually short sequences of special-purpose code.Because these are plain text files, they can be modified easily by the user.For instance, in our examples in this paper, we use the procedure STAMPDIAGS for diagnostics on state space models.If you want to change what this reports, you can just modify the procedure.This paper is organized as follows.Section 2 describes the structure of state space models as handled through the DLM instruction.Section 3 introduces the model used through most of this paper.For that model, Section 3.1 demonstrates Kalman smoothing, given values for the variances.Section 3.2 shows the various ways to estimate the hyperparameters (variances) with Section 3.3 discussing several types of calculated or graphed diagnostics for those estimates.Section 3.4 shows how to forecast out-of-sample.Section 3.5 offers an example of unconditional simulation and 3.6 uses conditional simulations to do Gibbs sampling.We look at a more complicated model in Section 4. In all cases, we are providing only the segment of code needed to demonstrate a technique.The full running examples are available along with this manuscript and on our web site at http://www.estima.com/resources_articles.shtml.

The DLM instruction
Our state space structure takes a bit broader form than the one described in the introduction paper to this volume.Because the components are input to the DLM instruction using short alphabetical names based upon our own description of the state space model, we will use that from this point on in this article: Table 1 shows the translations between the notation in the introduction paper and ours.The addition of the µ t term to the measurement equation is only a minor matter of convenience, Table 1: Notation for state space models.
since the identical model can be produced by subtracting µ t from both sides of the equation.However, the enhanced form of the state equation with the Z t state shift can not so easily be accommodated in the simpler form, particularly when the state shift component is timevarying.
Given a state space model, you can choose to: Kalman filter, computing the likelihood function assuming Gaussian errors; this also computes predictions for the states and the observables, the prediction errors and variances for the predictions and states.
Kalman smooth, with calculations of smoothed states and their variances, disturbances and their variances.
Simulate unconditionally, with random draws for the state and measurement shocks, producing simulated states and observables.
Simulate conditional on observed data, producing simulated states and shocks.
There are as many as ten inputs to DLM when you count the pre-sample mean and variance, and even more potential outputs.However, most applications will not need all, or even most, of them.Most of the information is supplied to the instruction using options.Each option is given a name, which for DLM generally matches with the notation from (1) and (2), so the A matrix is put in using something like A = 1.0 or A = ADLM.By relying on these options, we make it easier to do simpler models, since little-used inputs like µ t and Z t just default to zero and can be left out of the specification.

The local level model
The model that we will use for the example in this section is the local level model, applied to the Nile flow data, annual from 1871 to 1970.The model is (with our timing on the state equation) where α t is the unobservable local level.The model has time-invariant components A = C = F = 1, Z = µ = 0.These are the default values for all but C. The measurement error variance σ 2 ε is input using the SV option, while the state shock variance σ 2 ξ comes in through the SW option.
The data are read from a plain text file3 into a time series called NILE, and the option Y = NILE is used to provide the y t information for the measurement equation.As with other unobserved components (UC) models, the state has non-stationary dynamics.To handle the initial conditions, we can use the option PRESAMPLE = DIFFUSE, which indicates that the initial condition for the state is fully diffuse.This is implemented using the "exact" method of Koopman (1997) and Durbin and Koopman (2001).The same outcome will be obtained using the more flexible PRESAMPLE = ERGODIC, which analyzes the transition matrix and determines its roots.

Kalman smoothing
For now, we will take the component variances as given, and discuss estimation in Section 3.2.We will peg them at σ SET is the main RATS instruction for creating and transforming time series.The %SCALAR function selects the first element out of a vector or matrix, so the series A will be the time series of estimated states, and P the time series of estimated variances.GRAPH is the time series graphing instruction; the / 3 on the last two lines forces the upper and lower bounds to use the same color or pattern.The graph produced by this is Figure 1.

Estimation of hyperparameters
The DLM instruction will always, as a side effect, compute the log likelihood of the model given the input variances.This can be maximized, with a wide range of choices for optimization, allowing for both derivative-based hill-climbing techniques, and slower but more flexible search methods.It also has the ability to (easily) incorporate equality or inequality constraints.
One way to estimate the two variances in the local level model is: nonlin psi compute psi = 0.0 dlm(a = 1.0, c = 1.0, sv = 1.0, sw = exp(psi), y=nile,$ presample = diffuse, method = bfgs, var = concentrate) The NONLIN instruction declares the set of free parameters to be estimated-here, that is The measurement error variance is concentrated out, which can sometimes be helpful in improving the behavior of difficult estimation problems.The estimation method being used here is the hill-climbing method BFGS.The output is shown in Table 2.5 Note that, while there are 100 data points, the likelihood is calculated using only the final 99 of them.This is done automatically here because of the diffuse initial conditions-the predictive variance for observation 1 is infinite, and so it is dropped from the calculation of the likelihood.DLM has an additional option CONDITION which can control the number of data points which are included in the filtering calculations, but omitted from the likelihood used for estimation.This is generally not needed, since DLM handles the diffuse states automatically, but is useful when the number of non-stationary states is not known a priori, if, for instance, autoregressive parameters are being estimated.Direct estimation of the variances requires a bit more care with guess values.This uses scalings of the series sample variance, which should get the order of magnitude correct.The output is in Table 3.
While not important in this case, the NONLIN instruction can also handle various constraints on the parameters, either equality or inequality.With no change to the setup, we could estimate this with σ 2 ξ pegged to zero (which here gives a model with a fixed mean) using nonlin sigsqeps sigsqxi=0.0dlm(a = 1.0, c = 1.0, sv = sigsqeps, sw = sigsqxi, y = nile,$ method = bfgs, presample = diffuse) 1871:1 1970:1 In a more complex model, where there is some chance that a component variance might be zero, the NONLIN instruction can be used to set an inequality constraint: This uses a penalty function variation on BFGS.Since it is quite a bit slower than standard BFGS, we generally do not recommend using it unless the simpler unconstrained estimates fail to provide values in range.The equality constraints from the previous case, on the other hand, are done by taking the constrained parameter out of the parameter set and using standard BFGS, so it actually runs faster than unconstrained BFGS.

Diagnostics
The most straightforward diagnostics come from the standardized residuals.These can be computed with the help of the VHAT and SVHAT options.VHAT is used to fetch the prediction errors and SVHAT the predictive variance. 6Again, these will be in the form of a VECTOR (for VHAT) and a SYMMETRIC matrix (for SVHAT) to allow for the possibility of multiple observables.
The following generates standardized predictive errors (into the series EHAT), graphs them (Figure 2) and does a standard set of diagnostics on the recursive residuals (output in Table 4): dlm(a = 1.0, c = 1.0, sv = sigsqeps, sw = sigsqxi, y = nile,$ method = bfgs, presample = diffuse,$ vhat = vhat, svhat = svhat) 1871:1 1970:1 set ehat = %scalar(vhat)/sqrt(%scalar(svhat)) graph(footer = "Standardized residual", vgrid = ||-2.0,2.0||) # ehat @STAMPDiags(ncorrs = 9) ehat The VGRID = ||-2.0,2.0|| option on the GRAPH adds the horizontal lines at ±2.Note that, because of the diffuse prior, the first standardized error is omitted.This is handled automatically in the code because the SVHAT for 1871:1 is a missing value.procedure7 also produces the graph of autocorrelations seen in Figure 3. Durbin and Koopman (2001) recommend also computing auxiliary residuals, which are the Kalman smoothed estimates for the measurement errors and state disturbances.Large values for these can help identify outliers (in the measurement errors) or structural shifts (in the state disturbances).These can be obtained using the VHAT and WHAT options when Kalman smoothing.The results returned from those are standardized to mean zero, unit variance.The following graphs both of these.This uses SPGRAPH instructions to create a graph page with two panes.The result is Figure 4.

Forecasts
Out-of-sample forecasts can be generated by simply running a Kalman filter past the end of the data set.When the Y value is missing, DLM does the Kalman "update" step but not the "correction".This is how embedded missing values are handled.For out-of-sample forecasts, however, it is generally more straightforward to Kalman filter through the observed data, then run a separate filter into the forecast range.This next code segment uses the X0 and SX0 options to feed in the final estimated mean and variance for the states (from Kalman filtering over the sample) into the Kalman filter for the forecast range.The YHAT and SVHAT options are used to get the prediction and the predictive error variance for the dependent variable.You can also get the predicted value of the state and its predictive variance using the standard state parameters.The following organizes a graph of the forecasts with their 50% confidence interval.Only forty years of actual data are included to give the forecast range enough space.This produces Figure 5.

Simulations
There are two choices for random simulations of a model: TYPE = SIMULATE chooses unconditional simulation, where shocks for the states and measurements are drawn independently, and TYPE = CSIMULATE, where they are drawn subject to the requirement that the observed data are produced.TYPE = SIMULATE will generally be used in out-of-sample operations.
In this section, we will demonstrate unconditional simulation, with conditional simulations described in Section 3.6.
To simulate out-of-sample, Kalman filtering is used through the observed range of the data to get the end-of-period estimates of the mean and variance of the state.Here, we will generate 10000 realizations for the process over the next fifty periods.The maximum flow for each realization is recorded.The percentiles are computed once the simulations are done.This could be used, for instance, to estimate the level for 50-year or 100-year floods.The results are shown in Table 5.

Conditional simulations
Drawing the states and disturbances conditional on the observed data is more complicated, but more useful than the unconditional draws.The Kalman smoother gives us the mean and covariance matrix of the states and disturbances individually, but that is not enough to allow us to do draws, since conditional on the data, the states are highly correlated.Conditional simulation can be done with a pair of Kalman smoothing passes, one on simulated data, as described in Durbin and Koopman (2002), which is the algorithm used in RATS.
The main use of conditional simulation is in Bayesian techniques such as Markov Chain Monte Carlo.In fact, it is sometimes called (inaccurately) "Carter-Kohn", after the (alternative) algorithm described in Carter and Kohn (1994) as a step in a Gibbs sampler.For the Gibbs sampler, the states and/or disturbances are added to the parameter set, and conditional simulation gives a draw for those given the data and the underlying model parameters.The next step would then be to produce a draw for those model parameters conditional on the states and disturbances, which often takes quite a simple form once the latent information from the state space model can be treated as known.
We now examine the model estimated in Section 3.2.The two hyperparameters are modeled as the precision h of the measurement error and the relative variance ψ of the state shock to the measurement error.These are given very loose priors, with h being inverse gamma with 1 degree of freedom and ψ being gamma with 1 degree of freedom.We start each at the mean of its prior.The hyperparameters are then drawn conditional on the just-created draws for the disturbances.For this, we need the sums of squares for each of the two disturbances.This is easily done with the SSTATS instruction, which can compute sums, means, and other types of statistics on general expressions.
This computes and graphs (Figure 7) estimated density functions for the two parameters:

A larger model
A recent addition to RATS is the instruction DSGE (for dynamic stochastic general equilibrium (model)), which takes a model with expectational terms and solves it symbolically for a backwards-looking state space representation.If necessary, it will linearize or log-linearize the model to create an approximate state space form.The combination of DSGE and DLM can be used to evaluate the likelihood (for Gibbs sampling) or directly estimate by maximum likelihood the deep parameters in a DSGE.As an example, we include with RATS a program for replicating the estimation of the model from Ireland (2004).
The model that we will show here does not have expectational terms, but instead, is a dynamic model in a non-standard form.8DSGE can also help here by solving out for a standard state space representation.The original model is from Sargent (1977).The two observables are money growth (µ t ) and inflation (x t ).With two observables, we need two shock processes: one will be to money demand, the other to money supply.The model (reduced to growth rates) can be written as: The underlying shocks are ε t and η t , which are assumed to be independent.Since those are both tangled up in the definitions of a 1 and a 2 , we will use separate equations to put them into the model.The model set up is9  The DSGE instruction solves symbolically for a state space representation (returned by the A and F options) given a particular set of deep parameters (here λ and α).It requires the user to list the endogenous variables-the order of listing determines the order of placement within the state space model.
Note that x t and µ t are cointegrated with x t − µ t being stationary-the process has mixed stationary and non-stationary dynamics.As a result, a fully diffuse prior is not technically correct.However, with the option PRESAMPLE = ERGODIC, the DLM instruction can handle this automatically, analyzing the transition matrix to isolate the unit roots and create a partially diffuse prior.
If we want to estimate the free parameters of this model (λ, α and the two variances), we need to use the DSGE instruction at the start of every function evaluation to get the revised system matrices.The most convenient way to do this is to define a FUNCTION, which does the calculation and creates the required matrices (called here ADLM and FDLM).
function EvalModel dsge(model = cagan, a = adlm, f = fdlm) x mu a1 a2 eps eta end EvalModel This technique is applied with many models, since the transition matrix is often very sparse when there are more than a few states.In this case, we compute the full matrices, but you can also use a function to "poke" values which depend upon the parameters into the proper locations in a larger matrix.
The shocks are both in the state equation, so there is no direct measurement error, thus no SV option.The model can be estimated with: nonlin alpha lambda sig_eta sig_eps dlm(start = %(EvalModel(), sw = %diag(||sig_eps^2, sig_eta^2||)),$ a = adlm, f = fdlm, y = ||x, mu||, c = cdlm, sw = sw,$ presample = ergodic, pmethod = simplex, piters = 5, method = bfgs) At the start of each function evaluation, the EvalModel function is called to solve for state space matrices, and the (diagonal) matrix SW is created from the standard deviations.The option y = ||x, mu|| describes the 2-vector of observables.This uses an option available on DLM (and other estimation instructions in RATS) to use two separate optimization methods.PMETHOD = SIMPLEX (PMETHOD for preliminary method) uses a small number of derivative-free simplex iterations at the start of the estimation, to refine guess values before switching to the derivative-based BFGS.This can be helpful in many types of models where the log likelihood function may be ill-behaved.

Conclusion
This paper has used several examples to give a taste of how RATS can be used with state space models.The DLM instruction has many options, allowing it to handle a wide range of tasks.Its internal calculations for filtering, smoothing and simulation have been highly optimized.
When combined with the programming flexibility of the RATS package, many models which are quite cumbersome when done with matrix languages or less flexible packages can be done simply and quickly.We invite you to check our web site or e-mail us for more information.
2 ε = 15099 and σ 2 ξ = 1469.1,which are the maximum likelihood values.The instruction for Kalman smoothing with the Nile data is: 4 dlm(a = 1.0, c = 1.0, sv = 15099.0,sw = 1469.1,presample = diffuse,$ y = nile, type = smooth) / xstates vstates TYPE = SMOOTH chooses Kalman smoothing.The default analysis is Kalman filtering-the extra calculations for Kalman smoothing are not done unless requested.The XSTATES parameter gets the smoothed state estimates and VSTATES gets the smoothed state variances.Since the state vector is (in almost all cases) bigger than a single element, XSTATES is a time series of vectors and VSTATES is a time series of (symmetric) matrices.Code for generating 90% confidence intervals and graphing them is given next: set a = %scalar(xstates) set p = %scalar(vstates) set lower = a + sqrt(p)*%invnormal(.05) set upper = a + sqrt(p)*%invnormal(.95) graph(footer = "Figure 1. Smoothed state and 90% confidence intervals"

Table 2 :
Estimation with concentrated variance.

Table 3 :
Estimation with both variances.

Table 4 :
Table 4 include a Ljung-Box Q test for serial correlation, a Jarque-Bera normality test and a Goldfeld-Quandt style test for heteroscedasticity.The STAMPDiags State space model diagnostics.

Table 5 :
Percentiles from maximum simulated flows.