The STAMP Software for State Space Models

This paper reviews the use of STAMP (Structural Time Series Analyser, Modeler and Predictor) for modeling time series data using state-space methods with unobserved components. STAMP is a commercial, GUI-based program that runs on Windows, Linux and Macintosh computers as part of the larger OxMetrics System. STAMP can estimate a wide-variety of both univariate and multivariate state-space models, provides a wide array of diagnostics, and has a batch mode capability. The use of STAMP is illustrated for the Nile river data which is analyzed throughout this issue, as well as by modeling a variety of oceanographic and climate related data sets. The analyses of the oceanographic and climate data illustrate the breadth of models available in STAMP , and that state-space methods produce results that provide new insights into important scientiﬁc problems.

1. STAMP STAMP (Structural Time Series Analyser, Modeler and Predictor) is a commercial, graphical user interface (GUI)-based package for the analysis of both univariate and multivariate state-space models written by Koopman, Harvey, Doornik, and Shephard (2009). It runs on Windows, Macintosh and Linux operating systems as part of the larger OxMetrics System, a software system for (econometric) data analysis and forecasting (Doornik 2009). The authors are some of the leading researchers in the field of state-space modeling, and as such the algorithms used in STAMP, as well as the features available for analysis of model fit, tend to stay current with the state of the art in the literature.
A strength of STAMP is the large variety of models that can be analyzed, both univariate and multivariate. The univariate models include structural component models as well as timevarying regressions, and the multivariate models include "common trends", "common cycles", "common seasons" and "seemingly unrelated time series equations" (SUTSE), with exoge-In Section 2 the Nile river data, that are analyzed in all the articles in this special issue, are modeled using STAMP. For purposes of "full disclosure" the Nile river data are analyzed extensively in the manual that comes with STAMP 8.2, and the analysis in Section 2 follows the analysis therein. Even so, all the steps in the analysis are straightforward given the tools provided by STAMP and the end result would be the same with or without the analysis in the manual. All the analyses in Section 2 were done using STAMP 8.2 on a 2.2 GHz Intel Core 2 Duo Macbook Pro, and all graphics in that section are either unretouched from what is produced in STAMP 8.2 or use features for editing the graphics that are supplied by OxMetrics. In Section 3 oceanographic and climate indices are analyzed using state-space models in an area where the uses of these methods are just beginning to come to fore. In Section 4 state-space models are combined with subspace identification techniques to analyze a large-scale space-time dataset.

Nile river
OxMetrics can import a variety of formats including Excel and Lotus spreadsheets, comma separated (.csv) files, Gauss and Stata files, a variety of text formats as well as its own native format. The Nile river volume data were provided in one of the text file formats that OxMetrics supports, making import of the data easy. As part of the data import, the time and time interval of the data are defined, and that information is carried along in the rest of the analyses and graphics. STAMP is started within OxMetrics.
The initial STAMP dialog (Figure 1) allows the user to select other dialogs that change options or settings for the various stages of data modeling. Selecting "Formulate" to define the initial model, a screen comes up allowing the user to choose which variables to include in the model, and what role they will play ("X" or "Y"), the latter being necessary for certain classes of models, such as a state-space model with a fixed effect or a time-dependent regression ( Figure 2).
Having chosen which variables to use in the model, a dialog comes up with options for what components to include in the model (Figure 3).
A nice feature of STAMP is the wide range of models that can be estimated, and the fact that it is kept up-to-date with new research, such as allowing higher "orders" in both the level and the cycle (see Harvey and Trimbur 2003). Kitagawa and Gersch (1996, for example) have long advocated that many series are better fit with a higher order of differencing in the level    Table 1: Maximum likelihood estimates of the variances in the local level model for the Nile river data. The q ratio is the relative variance of the level and irregular, that is, the signal-to-noise ratio.
term. The default model is the "basic structural model" (BSM), a model which includes a stochastic level and slope as well as a stochastic seasonal component, and a BSM without the seasonal component was used as the initial model to estimate the Nile river data. In the notation of equation (4) in Commandeur, Koopman, and Ooms (2011) the initial model is: For this model the maximum likelihood estimate of the slope was zero (with no variance), so the model was re-estimated with just a stochastic local level (Table 1).
An important feature of STAMP is that although it is GUI based, it also can be run in batch mode using a scripting language that is part of OxMetrics. Even more so, as you select a model in the GUI, STAMP automatically generates the code needed to run that model in batch mode (Figure 4). This is a feature used in Section 4.
The estimated local level model ( Figure 5) suggests a mean level that is relatively constant over long periods with several possible shifts, and this impression is reinforced in the plots of the standardized residuals ( Figure 6) and even more so in the auxiliary residuals ( Figure 7). STAMP has a feature to automatically detect possible points of intervention in a series, and the local level model was re-estimated with this feature. A level break in 1899 and an outlier break in 1913 are identified by this automatic feature, the former corresponding to the building of the Aswan dam, and the latter to a year of extremely low flow in several African rivers following the high-latitude Katmai eruption.
Re-estimating the local level model to include the interventions produces a model where the level changes only due to the interventions (Table 2) but the residual graphs show significant autocorrelations and other signs of periodicity in the residuals. Based on this, a final model was fit with a fixed level, a stochastic cycle, and the two interventions (Table 3). A stochastic cycle, in the notation of equation (6) in Commandeur et al. (2011) is defined as: where ψ t and ψ * t are the states, λ c is the frequency, in radians, in the range 0 < λ c ≤ π, κ t and κ * t are two mutually uncorrelated white noise disturbances with zero means and common variance σ 2 κ , and ρ is a damping factor. The damping factor ρ in (1)  Nile River Volume Level +/-2SE   Table 2: Maximum likelihood estimates of the variance and regression terms in the local level model with interventions for the Nile river data. RMSE is the residual mean square error and the q ratio is the relative variance of the level and irregular, that is, the signal-to-noise ratio.  which a higher amplitude event (consider this to be a "shock" to the series) in the stochastic cycle will contribute to subsequent cycles. A stochastic cycle has changing amplitude and phase, and becomes a first order autoregression if λ c is 0 or π. Moreover, it can be shown that as ρ → 1, then σ 2 κ → 0 and the stochastic cycle reduces to the stationary deterministic cycle: The cycle has a period of 6.65 years and a damping factor of 0.26034 (Figure 8), and all of the test statistics (Table 4), residual diagnostics (Figures 9), and auxiliary residual diagnostics ( Figure 10 ) for this model suggest a good fit both to the data and to the assumptions of the model. The roughly seven year cycle is consistent with a number of processes in the atmosphere and ocean, and conditions in the Nile have been correlated with drought conditions in the Southwest United States, for example, based on these teleconnections (Cayan, Dettinger, Diaz, and Graham 1998;Jiang, Mendelssohn, Schwing, and Fraedrich 2002 Table 4: Test statistics for the cycle with interventions model for the Nile river data. T is the number of time periods, p the number of parameters, std.error is the square root of the prediction error variance, N ormality is the Bowman-Shenton statistic of the third and fourth moments of the residuals, H(32) is a measure of heteroskedasticity, DW is the Durbin-Watson test, r(1) is the estimated residual lag 1 autocorrelation, Q(q, q − p) is the Box-Ljung statistic based on the first q autocorrelations, r(q) is the lag q autocorrelation, and R 2 is a goodness of fit measure.

Ocean and climate indices
New statistical methods are often demonstrated to produce results that differ from those of previous methods, but it is unclear that these differences are of any importance to the subject matter at hand. In this section, we use state-space methods to examine oceanographic and climate related indices, and show that the use of these models produce significant scientific insight. All of the univariate modeling was done using STAMP, while the subspace identification procedure was performed in MATLAB on the output from univariate models estimated by STAMP.

Pacific decadal oscillation
The Pacific decadal oscillation (PDO) is defined as the first principal component (or as called in the oceanography literature "Empirical orthogonal function" or "EOF") of monthly 5-degree sea surface temperature (SST) anomalies after removal of the monthly global mean of the anomalies (Mantua, Hare, Zhang, Wallace, and Francis 1997). There has been much discussion as to what the PDO represents, with some authors calling any decadal-scale dynamic in the ocean the PDO, while others say the PDO is simply red noise (autoregressive) and is consistent with an oceanographic model where the atmosphere is white and the ocean is red (see for example Pierce 2001;Rudnick and Davis 2003). If the latter is true, the PDO is stationary and does not represent any shifts in the climate dynamics.
To examine this question, a series of models were estimated for the PDO, including models with a fixed and a stochastic level, models with a fixed and a stochastic seasonal component, and each of those in combination with an AR(1) component and a stochastic cycle. This type of analysis can be readily carried out in STAMP given the large class of models and model statistics it provides and by utilizing the batch language to automate the estimation of the different models. The "best" model, based on AIC and BIC statistics as well as examination of the residuals, was a model with a stochastic level, a stochastic cycle, and a fixed seasonal component ( Figure 11). What is clear is that while the largest amount of the variance is contained in the higher frequency stochastic cycle term, there is a significant trend in the series.
The same set of analyses that were carried out for the PDO were also done for an index of the north Pacific high (NPH) pressure center, with the same model (stochastic level, stochastic cycle, and fixed seasonal) being selected as the best model. The estimated levels in the two series ( Figure 11) as well as the estimated stochastic cycles (not shown) are almost identical. So while there is a significant trend and cyclic component in the PDO, these components are capturing the dynamics in the region of the north Pacific influenced by the NPH. An interesting question is to what degree the feedback occurs in each direction (the ocean influencing the atmosphere or the atmosphere influencing the ocean).

El Niño
This section follows Mendelssohn, Bograd, Schwing, and Palacios (2005). There has been recent speculation that El Niño events, at least measured by two indices, the southern oscillation index (SOI) (see for example Philander 1990) and the NINO3 index (Mann, Gille, Bradley, Hughes, Overpeck, Keimig, and Gross 2000), have been increasing both in frequency and intensity over time (see for example Trenberth and Hoar 1997). The SOI is defined as  1990 1980 1970 1960 1950 1940 1930 1920 1910 1900 PDO Trend PDO Cycle Figure 11: Estimated cycle and level for the Pacific decadal oscillation (PDO) and comparison with the estimated north Pacific high (NPH) level. the anomalies of the difference in the pressure at Darwin, Australia and Tahiti. The relative change in the pressures is believed to be associated with shifts in the wind fields, and a negative anomaly signifying El Niño periods while a positive anomaly signifying La Niña periods. The NINO3 series is defined as the average sea surface temperature over a large area of the tropical Pacific, viewed as a measure of the overall surface heat in the tropical Pacific waters. It has been suggested that the NINO3 is exhibiting an increased frequency of warm events.
To examine these questions, the same estimation process used in Section 3.1 was used to estimate a "best fit" model for the NINO3 series, for the SOI, and for the component series of  the SOI (i.e., the Darwin and Tahiti pressures). Again the local level model with a stochastic cycle term was the best fit to the data, except for the Tahiti pressure series where a fixed level model was preferred over a model with a stochastic level. The trends in the series are very similar (Figure 12a,b), in particular the Darwin pressure trend and the SOI trend, which is not surprising since Tahiti pressure has a constant mean. What is more surprising is that the stochastic cycles are very close (Figure 12c), and the El Niño events can be discerned equally as well if not better just from Darwin pressures rather than from the more complicated index (Figure 12d). The smoothed residuals for the stochastic cycle suggest that more recent events are in the tails of the distributions, but the overall impression from the residual and auxiliary residual plots is that this model provides a reasonable fit to the data series. Since the stochastic cycle is stationary, this analysis shows no significant increase in the frequency of El Niño events. Rather there has been warming trend in tropical SST, and this has caused SST anomalies to similarly increase.

Large space-time datasets
The univariate methods of the previous sections in theory extend to multivariate problems by defining "common trends" or "common cycles", parameterized the same way but of lower dimension than the observed data (see for example Harvey 1989, Chapter 8, andDurbin andKoopman 2001, Chapter 3) . These methods work for small to medium sized problems, but after that the Kalman filter/smoother calculations become impractical, although recent advances in dynamic factor models may change this (see for example Jungbacker and Koopman 2008;Koopman and Massmann 2009). Large problems can occur in oceanography and climate and atmospheric sciences, where data are on a space-time grid. Even a 5-degree grid, such as was used to calculate the PDO, can have several hundred series with over 1000 time periods. We have developed an approximate but useful method of dealing with these types of problems by combining state-space decompositions with subspace identification techniques (descriptions of these methods can be found in Akaike 1975;Larimore 1983;Van Overschee and De Moor 1996) .
Subspace identification techniques are used to derive approximate maximum likelihood estimates for the more general dynamic linear Gaussian model of the form: where the observation equation (6a) has y t a p × 1-vector of the observed data, Z t is a p × m matrix which relates the data to the unobserved components α t , which is a vector of dimension m × 1, and all the error terms are zero mean Gaussian random variables, as given in the introductory paper of this volume.
The evolution of the unobserved components or states α t is governed by the initial value α 1 and the state equation (6b). The matrix T t is a m×m transition matrix and the m×1-vector η t is another independent, identically distributed Gaussian random variable; see Shumway and Stoffer (2006, Chapter 6) for further details on the state-space model and how the unknown parameters can be estimated using the EM algorithm.
There are many ways to derive the results for subspace identification techniques, but one follows Akaike (1975) and Larimore (1983) by looking at the canonical variate analysis between the past y − t and the future y + t at a given time t defined as: . . .
where t 0 is for a given lag p the time period t − p and T for a given lead q the time period t + q, by solving the generalized singular value decomposition for the matrices J, L defined as: where Σ pp = cov(y − t , y − t ), Σ f f = cov(y + t , y + t ), and Σ pf = cov(y − t , y + T ). Letting Λ(k) = cov(y t+k , y t ) be the lag k covariance matrix this can be shown to be the equivalent of performing a regular singular value decomposition on the system Hankel matrix defined as: In linear dynamic models the Hankel matrix plays a central role in determining the dimension of the system, and the system matrices can be estimated from the singular value decomposition. When the generalized singular value decomposition is used, which is computationally more stable and efficient, the system state is then estimated as Jy − (t) and the system matrices of the dynamic linear Gaussian model are estimated either by multivariate regression or through algebraic identities that relate the matrices of the model with those obtained from the decomposition. The relationship between subspace identification methods and maximum likelihood based methods of estimating state-space models is discussed in Ninness and Gibson (2000) and Smith and Robinson (2000).
Subspace identification techniques are designed to operate on raw, stationary data, and will not necessarily produce the types of components used in state-space decompositions, in par-ticular the higher frequency terms will tend to mask lower frequency modes. To deal with this problem, we use the following procedure: 1. Estimate a univariate model at each location, in the case using the batch mode ability in STAMP.
2. Calculate the partial residual series by removing the estimated seasonal and cyclic terms for "common trend", the estimated trend and seasonal terms for "common cycles" etc.
3. Perform the subspace identification procedure on the partial residual series.
This procedure was carried out on the raw 5-degree data used in the calculation of the PDO. Interestingly, in this analysis, the fourth common trend is identical to the trend of the PDO, showing that EOFs do not necessarily capture the dominant dynamic modes of a system. The second common trend (Figure 13) is the most interesting. As can be seen, it is almost identical to the estimated trend for the NINO3 series as well as for the trend estimated for Arctic sea ice extent. Here we have identified a global warming-like trend that extends from the tropics to the extra-tropics to the arctic, a sobering result.

Conclusions
STAMP provides a powerful, flexible, easy to use and up-to-date environment to perform state-space analysis of time series data, and can help produce real scientific results. STAMP shares an algorithmic base with SsfPack in Ox, and with the S+FinMetrics in the S-PLUS environment. The mix of GUI based analysis and batch mode allows it to be used for a wide assortment of analyses, and the integration into OxMetrics provides additional features, including data transformations, data arithmetic, graphics and other analyses.
STAMP is a commercial product, which may be a drawback for some users. However other "free" packages, such as SsfPack, require a fee for government and industrial users, and are only free to academic users. I have found STAMP simple to use and very handy when you want to quickly but deeply explore a dataset, examine different models, and have an array of information to help guide the model fitting process. Now that STAMP runs on Windows, Macintosh, and Linux operating systems, it is an even more appealing software package. If there is one feature that needs to be added to STAMP it is the ability to use the simulation smoother to estimate certain classes of non-Gaussian and nonlinear models, a feature available in the companion SsfPack package (Koopman, Shephard, and Doornik 1999).