The glarma Package for Observation-Driven Time Series Regression of Counts

We review the theory and application of generalized linear autoregressive moving average observation-driven models for time series of counts with explanatory variables and describe the estimation of these models using the R package glarma. Forecasting, diagnostic and graphical methods are also illustrated by several examples.


Introduction
Increasingly, in fields as diverse as financial econometrics, public policy assessment, environmental science, modeling of disease incidence and musicology, there is a need for modeling discrete response time series in terms of covariates. In many of these applications the counts are often quite small, at least for a significant portion of the observation times, rendering traditional Gaussian continuous-valued response time series regression methods inappropriate. In response, over the past twenty or so years there has been substantial development of models for discrete-valued time series. For regression models, where interest is focused on making correct inferences about the impact of covariates on the response series, accounting for serial dependence is critical, particularly for obtaining correct standard errors for regression coefficient estimates. Assessing and modeling dependence when the outcomes are discrete random variables can be challenging, since traditional methods for detecting serial dependence in regression residuals are often not effective. In this paper we consider the GLARMA (generalized linear autoregressive moving average) subclass of observation-driven models in detail. Compared to the parameter-driven class of models (see Section 1.1) GLARMA models are relatively easy to fit and provide an accessible and rapid way to detect and account for serial dependence in regression modeling of time series.
The R (R Core Team 2015) glarma package (Dunsmuir, Li, and Scott 2015) provides functionality for estimating regression relationships between a vector of regressors (covariates, predictors) and a discrete-valued response and is available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=glarma. In time series regression modeling it is typically the case that there is serial dependence. This package models the serial dependence using the GLARMA class of observation-driven models and provides valid inference for the regression model components. Forecasting for this class of models is straightforward using simulation. Dunsmuir (2015) provides a review of GLARMA models and their applications and extensions to analysis of multiple independent time series. GLARMA models have found applications in many disciplines including: financial modeling, epidemiological assessments, clinical management, analysis of crime statistics, and primate behavior. Other areas of application and examples where GLARMA models could readily be applied include those mentioned in the references cited in Section 2, where we review observation-driven models more widely.

Generalized state space models
The GLARMA models considered here form a subclass of generalized state space models for non-Gaussian time series described in Davis, Dunsmuir, and Wang (1999), Brockwell and Davis (2010) and Durbin and Koopman (2012) for example. A generalized state space model for a time series {Y t } consists of an observation variable and a state variable. The model is expressed in terms of conditional probability distributions for the observation and state variables. Following Cox (1981) such models can be loosely characterized as either parameter driven or observation driven. The observation specification is the same for both models.
For parameter driven models the state equation commonly consists of a regression component and a latent, usually stationary, time series that cannot be observed directly and which evolves independently of past and present values of the observed responses or the covariates. On the other hand, as the name implies, in observation driven models the random component of the state variable W_t depends on past observations {Y_s, s < t} as well as other covariates. Davis and Dunsmuir (2015) review estimation for parameter driven models including generalized method of moments, maximum likelihood and composite likelihood methods. In particular, maximum likelihood estimation requires very high dimensional integrals to be evaluated or approximated using asymptotic expansions, simulation methods, numerical integration or all three. Because of this, parameter driven models can be difficult to fit, and for routine model building, in which many potential regressors need to be considered and evaluated for significance, the parameter driven models for count time series are not yet ready for general use.
On the other hand, the observation-driven models considered here are much easier to fit because the likelihood is conditionally specified as a product of conditional distributions which belong to the exponential family and for which the natural parameter is readily calculated via recursion. As a result they are relatively straightforward to apply in practical settings with numerous regressors and long time series.
The outline of the remainder of the paper is as follows. Section 2 briefly reviews observation-driven models in order to place the GLARMA models into context. Section 3 provides the necessary theoretical background for GLARMA models. It describes the various combinations of model elements (response distributions, dependence structures and predictive residuals) that are currently supported in the glarma package. Options for initializing and obtaining maximum likelihood estimates using a form of Fisher scoring or Newton-Raphson iterations are described. Issues of parameter identifiability and convergence properties of GLARMA models and the associated maximum likelihood estimates are also reviewed to guide users in the specification of these models. The forecasting methodology for GLARMA models is also described. Section 4 describes the various modeling functions available in the package. Section 5 describes the built-in model diagnostic procedures and plotting functions. Section 6 provides several examples illustrating the use of the package on real data sets. Section 7 reviews other R packages that are available for fitting observation-driven time series models and Section 8 concludes with discussion of some potential future enhancements to the package.

Observation-driven models
Particularly over the past twenty or so years, following the early contributions of Jacobs and Lewis (1978a,b), Cox (1981), Zeger and Qaqish (1988) and McKenzie (1988) for instance, there has been substantial development of a range of observation-driven models for time series. A general reference is the book by Kedem and Fokianos (2005). Without aiming to be exhaustive we mention some of the main models that have been developed. For count responses, often modeled using a Poisson or negative binomial distribution conditional on the past of the process and regression variables, reviews can be found in Davis et al. (1999), Tremayne (2011), Jung, Kukuk, and Liesenfeld (2006) and Fokianos and Tjøstheim (2011) for example. The main models discussed include the autoregressive conditional Poisson (ACP) model, the integer-valued autoregressive model (INAR), the integer-valued generalized autoregressive conditional heteroscedastic model (INGARCH), the conditional linear autoregressive process, and the dynamic ordered probit model. For binary responses the autologistic model proposed by Cox (1958) has been extended to the binary autoregressive moving average (BARMA) model proposed by Li (1994).
There is no single model class that covers all of these observation driven models. However Benjamin, Rigby, and Stasinopoulos (2003) propose a quite general class of models, called generalized autoregressive moving average (GARMA) models, which we briefly summarize as follows. Let there be n consecutive times at which the response and regressor series are observed. The response series is comprised of observations on the random variables {Y_t : t = 1, . . . , n} and associated with these are K-dimensional vectors of regressors x_t also observed for t = 1, . . . , n. We let F_t = {Y_s : s = 1, . . . , t − 1; x_s : s = 1, . . . , t} and note that this summarizes, relative to any time point t, the past responses and the past and current regressors. The distribution of the current observed response Y_t conditional on F_t is assumed to be a member of the exponential family of distributions (McCullagh and Nelder 1989) with density

f(y_t | W_t) = exp{(y_t W_t − b(W_t))/κ + c(y_t, κ)},   (1)

where W_t is the canonical parameter, which we call the "state variable", κ is the scale parameter, and the functions b(·) and c(·) define the specific member of the exponential family of interest. Then, see McCullagh and Nelder (1989),

µ_t := E(Y_t | F_t) = ḃ(W_t)  and  σ²_t := VAR(Y_t | F_t) = κ b̈(W_t)

are the conditional mean and variance of Y_t given F_t. Here ḃ and b̈ refer to first and second derivatives of b(·) with respect to its argument. A link function, g(·), is used to relate the conditional mean to W_t so that g(µ_t) = W_t. In the GARMA model (and in the GLARMA model of this paper) focus is on the case where the state variable in (1) is

W_t = x_t β + Z_t.   (2)

If Z_t were not present in W_t, the above model would be the standard generalized linear model (GLM) of McCullagh and Nelder (1989) in which, given the regressors, observations Y_t are independent. In the most general form of the GARMA model Benjamin et al. (2003) define Z_t as

Z_t = Σ_{j=1}^p φ_j A(y_{t−j}, x_{t−j}, β) + Σ_{j=1}^q θ_j M(y_{t−j}, µ_{t−j}),   (3)

where A and M are functions for the autoregressive and moving average components with unknown parameters (in addition to the regression parameters β) φ for the autoregressive and θ for the moving average components. Choices for M suggested in Benjamin et al. (2003) are deviance residuals, Pearson residuals, residuals on the original scale (y_t − µ_t) or on the predictor scale (g(y_t) − W_t). Benjamin et al. (2003) state that the GARMA model (3) "is too general for practical application" and concentrate on a simplified version of it. In particular they define their GARMA(p, q) model as a model which satisfies (1) and

Z_t = Σ_{j=1}^p φ_j {g(y_{t−j}) − x_{t−j} β} + Σ_{j=1}^q θ_j {g(y_{t−j}) − W_{t−j}}.   (4)

Note that the GARMA class of models consists of models on the predictor scale. Benjamin et al. (2003) then present a series of examples such as the Poisson GARMA model and the binomial logistic GARMA model amongst others. However, because the observed responses enter into (4) after transformation by the link function, there is a need to introduce an additional arbitrary threshold quantity c to cater for y_t = 0 in the Poisson or binomial cases and y_t = 1 in the binary case so that g(y_t) is properly defined. For example, for Poisson responses with log link, g(y_t) = log(y_t) is not defined when y_t = 0, which arises often in low count series, the very situation for which these types of models are most required. The threshold quantity c is not a parameter that is estimated along with the other parameters, and this is a major drawback of the model with the state variable evolving according to (4) when the canonical link is used. One alternative discussed in Benjamin et al. (2003) is the identity link, in which case the arbitrary threshold parameter c is not needed. However, care must be taken with scaling the residuals in the moving average part of (4) as well as the regression variables - see Davis et al. (1999) for details. Some R packages exist for estimation of some GARMA models and these are reviewed in Section 7.
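To make the threshold issue concrete, here is a minimal Python sketch (the glarma software itself is R; this is purely illustrative, and the helper name link_log and the value c = 0.1 are our own choices, not part of any package) of the log link applied to counts:

```python
import math

def link_log(y, c=0.1):
    """Hypothetical threshold workaround used in GARMA-type models with the
    log link: replace y by max(y, c) so that log(y) is defined when y = 0.
    The threshold c is fixed in advance; it is not estimated with the model."""
    return math.log(max(y, c))

counts = [0, 1, 4, 0, 2]
# log(0) is undefined, so zero counts are mapped to log(c) instead
transformed = [link_log(y) for y in counts]
```

The arbitrariness of c is exactly the drawback noted above: different choices of c change the fitted dynamics for low-count series.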
A variant of the GARMA and GLARMA models for binary series is the binary autoregressive moving average (BARMA) model suggested by Li (1994), and discussed more recently in Wang and Li (2011). This has a state variable defined by

W_t = x_t β + Σ_{j=1}^p φ_j y_{t−j} + Σ_{j=1}^q θ_j (y_{t−j} − µ_{t−j}),   (5)

where µ_t is the conditional probability of a success at time point t given F_t. For binary time series good reviews of modeling approaches are provided in Startz (2008) and Kauppi and Saikkonen (2008). Their focus is on predicting U.S. quarterly recessions and they demonstrate the utility of dynamics in the model for W_t, a feature of GARMA and GLARMA models. Models of this type without a moving average term could be fit using standard GLM procedures by defining the regressors x_t to include lagged values of the response series without mean adjustment or scaling. Cross product terms between y_{t−j} and y_{t−k} or x_{t−k} can also be included - see Startz (2008) or Kauppi and Saikkonen (2008) for details. Models of this type have also been considered by Rydberg and Shephard (2003) as a component of a within-day stock price movement model, but they suggest that these are numerically unstable and recommend using a GLARMA specification instead. The glarma package can be used to fit BARMA models (see below). Note that the model considered by Kauppi and Saikkonen (2008) in their Equation 4 is not a BARMA model, because there the recursions for W_t include the regression term, whereas in GLARMA and BARMA models the regression term and the dynamics are additively combined.

Theory of GLARMA models
GLARMA models are similar in structure to GARMA models, with some important differences which will be apparent once we give a complete description of the former. In general the conditional distribution of Y_t given F_t can be a member of the exponential family (1). However, to describe the GLARMA model as currently implemented in glarma we define a simplified exponential family form in which the dispersion parameter κ = 1 and, in order to accommodate the binomial distribution in a simple way, we define the b function to be scaled by a factor a_t. The simplified version of (1) is

f(y_t | W_t) = exp{y_t W_t − a_t b(W_t) + c_t},   (6)

where a_t and c_t are sequences of constants possibly depending on the observations y_t. Details of the specific distributions currently available in glarma are provided in Section 3.1.
Note that (6) is not the fully general form of the exponential family (see McCullagh and Nelder 1989) in that it does not include an over-dispersion parameter and the canonical link is used. It follows from (6), as before for the GARMA specification, that the conditional means and variances of the responses are µ_t := E(Y_t | W_t) = a_t ḃ(W_t) and σ²_t := VAR(Y_t | W_t) = a_t b̈(W_t). The negative binomial case is not a member of the exponential family when the scale parameter is not fixed and therefore needs to be treated in a special way - see below.
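As a concrete check of these relations, in the Poisson case of (6) we have a_t = 1 and b(W) = exp(W), so the conditional mean ḃ(W_t) and variance b̈(W_t) both equal exp(W_t). The short Python sketch below (illustrative only, not package code) verifies this by numerical differentiation of b:

```python
import math

# Poisson member of the simplified exponential family (6): a_t = 1, b(W) = exp(W)
def b(w):
    return math.exp(w)

w = 0.7
h = 1e-5
# central differences for the first and second derivatives of b
mu = (b(w + h) - b(w - h)) / (2 * h)            # b_dot(w): conditional mean
var = (b(w + h) - 2 * b(w) + b(w - h)) / h ** 2  # b_ddot(w): conditional variance
# both approximate exp(w): the familiar Poisson equality of mean and variance
```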
For GLARMA models as implemented in the glarma package the state variable is defined as

W_t = O_t + x_t β + Z_t,   (7)

in which an additional offset term O_t is included, because for many applications this may be important. For example, in modeling motor vehicle driver deaths, disease counts or crime counts using a Poisson distribution, inclusion of the logarithm of population as an offset term is important and provides a model for the incidence per unit of population. The offset term enters the model without regression parameters.

Response distributions
Response distributions currently supported in the glarma package are Poisson and binomial (which includes Bernoulli as a special case), both of which can be put in the form of (6), and the negative binomial distribution with density in the form given by

f(y_t | W_t; α) = [Γ(α + y_t) / (Γ(α) Γ(y_t + 1))] [α/(α + µ_t)]^α [µ_t/(α + µ_t)]^{y_t},   (8)

where µ_t = exp(W_t). This cannot be put into the one parameter exponential family form when α is not known, which, however, is the case of most practical interest. As α → ∞ the negative binomial density converges to the Poisson density. Table 1 specifies the distributions and associated link functions currently supported in the glarma package.
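The following Python sketch (an illustration only, not the package's implementation; the function names are ours) evaluates the negative binomial probability in the (α, µ) parameterization of (8) on the log scale for numerical stability, and checks that it approaches the Poisson probability as α grows:

```python
import math

def dnbinom(y, alpha, mu):
    """Negative binomial probability in the (alpha, mu) parameterization:
    Gamma(alpha + y) / (Gamma(alpha) y!) * (alpha/(alpha+mu))^alpha * (mu/(alpha+mu))^y.
    Computed via log-gamma to avoid overflow for large alpha or y."""
    log_p = (math.lgamma(alpha + y) - math.lgamma(alpha) - math.lgamma(y + 1)
             + alpha * math.log(alpha / (alpha + mu))
             + y * math.log(mu / (alpha + mu)))
    return math.exp(log_p)

def dpois(y, mu):
    """Poisson probability, also on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

# As alpha -> infinity the negative binomial converges to the Poisson
gap = abs(dnbinom(3, 1e6, 2.5) - dpois(3, 2.5))
```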

GLARMA dependence structure
Serial dependence in the response process is introduced, in a similar vein to Benjamin et al. (2003), by specifying the component Z_t of the state variable W_t given by (7) as an autoregressive moving average recursion of the form

Z_t = Σ_{j=1}^p φ_j (Z_{t−j} + e_{t−j}) + Σ_{j=1}^q θ_j e_{t−j},   (9)

in which the predictive residuals are defined as

e_t = (Y_t − µ_t) / ν_t   (10)

for some scaling sequence {ν_t} - see Section 3.3 for choices currently supported.
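The recursions (9) and (10) can be run in a few lines. The Python function below is an illustrative sketch only (the function name and structure are ours, not the glarma package's internals): a Poisson GLARMA filter with the canonical log link, Pearson scaling ν_t = √µ_t, and zero initial conditions for t ≤ 0.

```python
import math

def glarma_poisson_filter(y, x_beta, phi, theta):
    """Run the GLARMA recursions for a Poisson response with Pearson scaling,
    initialized with Z_t = e_t = 0 for t <= 0.
    y      : observed counts
    x_beta : the regression (plus offset) part of the state at each t
    phi    : autoregressive coefficients phi_1..phi_p
    theta  : moving average coefficients theta_1..theta_q
    Returns the states W_t and the predictive residuals e_t."""
    p, q, n = len(phi), len(theta), len(y)
    Z, e, W = [0.0] * n, [0.0] * n, [0.0] * n
    for t in range(n):
        z = 0.0
        for j in range(1, p + 1):          # AR part acts on Z + e
            if t - j >= 0:
                z += phi[j - 1] * (Z[t - j] + e[t - j])
        for j in range(1, q + 1):          # MA part acts on e
            if t - j >= 0:
                z += theta[j - 1] * e[t - j]
        Z[t] = z
        W[t] = x_beta[t] + z
        mu = math.exp(W[t])                # canonical (log) link
        e[t] = (y[t] - mu) / math.sqrt(mu)  # Pearson residual
    return W, e
```

With phi and theta empty the filter reduces to an ordinary Poisson GLM with W_t = x_t β.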
This specification allows recursive calculation (in t) of the state equation. The model is referred to as a GLARMA model in Davis, Dunsmuir, and Streett (2003). Shephard (1995) provides the first example of such a model class. Rydberg and Shephard (2003) define the GLARMA model using a form in which the autoregressive terms apply to Z_{t−j} alone rather than to Z_{t−j} + e_{t−j}; this can be made equivalent to (9) by suitable redefinition of the degrees p and q and the autoregressive and moving average coefficients, as shown in Section 3.4.
The name GLARMA, which we prefer to GARMA, was originally used in Shephard (1995), an unpublished paper, and subsequently in Rydberg and Shephard (2003) and Davis et al. (2003). This terminology captures the essence of the models we consider which are generalizations of both generalized linear (GL) and autoregressive moving average (ARMA) models.
We can also write Z_t given by (9) using linear combinations of past predictive residuals e_t (Davis et al. 1999, 2003), i.e.,

Z_t = Σ_{j=1}^∞ γ_j e_{t−j},   (12)

where the γ_j are given as coefficients in the power series

θ(ζ)/φ(ζ) = 1 + Σ_{j=1}^∞ γ_j ζ^j,

with φ(ζ) = 1 − φ_1 ζ − ··· − φ_p ζ^p and θ(ζ) = 1 + θ_1 ζ + ··· + θ_q ζ^q being the respective autoregressive and moving average polynomials of the ARMA filter, each having all zeros outside the unit circle. It follows that the {Z_t} defined in this way can be thought of as the best linear predictor of a stationary invertible ARMA process with driving noise specified by the sequence {e_t} of scaled deviations of count responses from their conditional mean given the past responses and the past and current regressors.
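The coefficients γ_j can be computed by the standard power series recursion γ_j = θ_j + Σ_k φ_k γ_{j−k} (with γ_0 = 1 and θ_j = 0 for j > q), obtained by matching coefficients in θ(ζ) = φ(ζ)(1 + Σ γ_j ζ^j). A Python sketch (illustrative; in R the analogous computation is done by ARMAtoMA):

```python
def glarma_ma_infinity(phi, theta, n_coef):
    """First n_coef coefficients gamma_j of theta(z)/phi(z) = 1 + sum_j gamma_j z^j,
    via the recursion gamma_j = theta_j + sum_k phi_k * gamma_{j-k}, gamma_0 = 1."""
    gamma = [1.0]  # gamma_0
    for j in range(1, n_coef + 1):
        g = theta[j - 1] if j <= len(theta) else 0.0
        for k in range(1, len(phi) + 1):
            if j - k >= 0:
                g += phi[k - 1] * gamma[j - k]
        gamma.append(g)
    return gamma[1:]
```

For example, with φ(ζ) = 1 − 0.5ζ and θ(ζ) = 1 + 0.3ζ the leading coefficients are γ_1 = 0.8, γ_2 = 0.4, γ_3 = 0.2, decaying geometrically as invertibility requires.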

Types of GLARMA residuals
GLARMA predictive residuals are of the form (10), where ν_t = ν(W_t) is a scaling function. Several choices for this are currently supported.
Pearson residuals: These are scaled by the standard deviation, σ t (defined in Table 1) of the predictive distribution ν t = ν P,t = σ t .
Score-type residuals: These replace the conditional standard deviation by the conditional variance ν t = ν S,t = σ 2 t as suggested in Creal, Koopman, and Lucas (2008).
Identity residuals: These use no scaling so that ν t = ν I,t = 1.
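The three scalings above can be summarized in a small Python sketch (illustrative only; the function name is ours):

```python
import math

def predictive_residual(y, mu, sigma2, kind="Pearson"):
    """The three residual scalings: Pearson (nu_t = sigma_t),
    score-type (nu_t = sigma_t^2), identity (nu_t = 1)."""
    if kind == "Pearson":
        nu = math.sqrt(sigma2)
    elif kind == "score":
        nu = sigma2
    else:  # identity
        nu = 1.0
    return (y - mu) / nu

# Poisson example with mu = sigma^2 = 4 and observed count y = 6
res = {k: predictive_residual(6, 4.0, 4.0, k)
       for k in ("Pearson", "score", "identity")}
# Pearson: 1.0, score: 0.5, identity: 2.0
```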
The use of identity residuals allows BARMA models considered in Wang and Li (2011) to be fit using the glarma package. For the Poisson response distribution GLARMA model, failure to scale by the variance or standard deviation function will lead to unstable Poisson means (that diverge to infinity or collapse to zero as an absorbing state for instance) and existence of stationary and ergodic solutions to the recursive state equation is not assured -see Davis et al. (1999), Davis et al. (2003) and Davis, Dunsmuir, and Streett (2005) for details. For the binomial situation this lack of scaling should not necessarily lead to instability in the success probability as time evolves since the success probabilities, π t , and observed responses, Y t , are both bounded between 0 and 1. Thus degeneracy can only arise if the regressors x t become unbounded from below or above. As recommended in Davis et al. (1999) temporal trend regressors should be scaled using a factor relating to sample size n.
Note that for the distributions specified in Table 1, and provided the regressors are such that 0 < σ_t < ∞, then for any choice of scaling ν_t > 0 the e_t have zero mean and are uncorrelated (Davis et al. 2003). When ν_t is set to the conditional standard deviation of Y_t, the e_t also have unit variance, hence are weakly stationary white noise.
Note on BARMA models
These can be put into the GLARMA framework by including lagged values of the response variable in the regression matrix as additional columns and specifying moving average terms with identity residuals. Startz (2008) cautions that in BARMA models it is possible to get parameter estimates such that π_t = 0 or π_t = 1 numerically, in which case the information matrix for the model is singular and estimates of parameter covariances are not defined.
The above residuals provide a reasonable range of possible choices. Dunsmuir, Leung, and Liu (2004) investigate the use of parametric scaling for the Poisson response case, where e_t = (y_t − µ_t)/µ_t^λ, which gives Pearson residuals when λ = 0.5 and score residuals when λ = 1. They estimate λ by profiling the GLARMA likelihood. For the Polio data of Zeger (1988) they obtain the estimate λ̂ = 1.14, which is not significantly different from 1 (score residuals) but is significantly different from 0.5 (Pearson residuals). This method has not been implemented in the glarma package because it is cumbersome to control convergence and has limited application (mainly to Poisson responses). However, the models using Pearson and score residuals can be fitted and their likelihoods compared to choose between λ = 0.5 and λ = 1. Another alternative is to use Anscombe residuals, which, although discrete, are claimed to be better matched moment-wise to the normal distribution, particularly in regard to skewness. These have been discussed in Davis et al. (1999) for developing methods for testing for serial dependence. Including them in the glarma package would require only a modest amount of work for the Poisson and negative binomial response distributions, but it would be substantially more involved for the binary or binomial response since, in these cases, the Anscombe residuals require a computation using the incomplete beta function, which is defined in terms of the model parameters through the conditional probability of success. Hence the computation of derivatives of the likelihood would be quite challenging.

Parameter identifiability
The GLARMA component Z_t of the state variable given in (9) can be rewritten as

Z_t = Σ_{j=1}^{q̃} (φ̃_j Z_{t−j} + θ̃_j e_{t−j}),   (13)

where q̃ = max(p, q), φ̃_j = φ_j for j = 1, . . . , p (and φ̃_j = 0 for j > p), and θ̃_j = φ_j + θ_j (with φ_j = 0 for j > p and θ_j = 0 for j > q). When pre-observation period values are set to zero (that is, Z_t = 0 for t ≤ 0 and e_t = 0 for t ≤ 0) then, if and only if θ̃_j = 0 for j = 1, . . . , q̃, the recursion (13) results in Z_t = 0 for all t and hence there is no serial dependence in the GLARMA model. This is equivalent to φ_j = −θ_j for j = 1, . . . , p and θ_j = 0 for j = p + 1, . . . , q̃.
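This reparameterization and the no-dependence condition can be checked mechanically. The Python sketch below (illustrative, with hypothetical function names) maps the coefficients of (9) to those of the combined recursion (13):

```python
def tilde_coefficients(phi, theta):
    """Map (phi, theta) of the GLARMA recursion to the combined recursion:
    q_tilde = max(p, q), phi_tilde_j = phi_j (0 for j > p),
    theta_tilde_j = phi_j + theta_j (terms beyond p or q taken as 0)."""
    q_tilde = max(len(phi), len(theta))
    phi_t = [phi[j] if j < len(phi) else 0.0 for j in range(q_tilde)]
    theta_t = [(phi[j] if j < len(phi) else 0.0) +
               (theta[j] if j < len(theta) else 0.0) for j in range(q_tilde)]
    return phi_t, theta_t

def no_dependence(phi, theta):
    """Z_t vanishes identically iff all theta_tilde_j = 0,
    i.e. phi_j = -theta_j for j <= p and theta_j = 0 beyond."""
    _, theta_t = tilde_coefficients(phi, theta)
    return all(abs(v) < 1e-12 for v in theta_t)
```

For instance, φ_1 = 0.4 with θ_1 = −0.4 produces no serial dependence, whereas φ_1 = 0.4 with θ_1 = 0 does not, illustrating the "ridge" along φ_j = −θ_j discussed below.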
Consequently the null hypothesis of no serial dependence requires only these constraints on the θ and φ parameters. In some situations this means that under the null hypothesis of no serial dependence there are nuisance parameters which cannot be estimated. This has implications for the convergence behavior, i.e., the number of iterations required to optimize the likelihood, and for testing that there is no serial dependence in the observations (other than that induced by the regression component x_t β).
When p > 0 and q = 0 (equivalent, via (13), to a specification with θ̃_j = φ̃_j), or when p = 0 and q > 0 (a pure moving average), identification issues do not arise and the hypothesis of no serial dependence corresponds to the hypothesis that φ_j = 0 for j = 1, . . . , p in the first case and θ_j = 0 for j = 1, . . . , q in the second case. The likelihood ratio and Wald tests (see Section 5.1 for further details) provided in the glarma package will have an asymptotic chi-squared distribution with the correct degrees of freedom.
In cases where p > 0 and q > 0 some caution is advised when fitting models and testing that serial dependence is not present. To simplify the discussion we focus on the case where p = q.
1. If there is no serial dependence in the observations but p = q > 0 is specified then there is a strong possibility that the likelihood optimization for this overspecified model will not converge because the likelihood surface will be "ridge-like" along the line where φ j = −θ j . This issue is classical for standard ARMA models. Similarly if the degree of serial dependence is of lower order than that specified for the GLARMA model, identifiability issues and lack of convergence of the likelihood optimizing recursions are likely to occur. Lack of identifiability typically manifests itself in the matrix of second derivatives D NR or the approximate Fisher scoring version D FS becoming close to singular or even non-positive definite. The state variable W t can also degenerate to ±∞ in which case an error code in the output of the glarma() call is provided.
2. The likelihood ratio test, that there is no serial dependence, versus the alternative, that there is GLARMA-like serial dependence with p = q > 0, will not have a standard chi-squared distribution, because the parameters φ j for j = 1, . . . , p are nuisance parameters which cannot be estimated under the null hypothesis. Testing methods such as proposed in Hansen (1996) for example need to be developed for this situation.

The GLARMA likelihood
Given n successive observations {y_t : t = 1, . . . , n} of the response series, the likelihood, conditional on initial values for the recursions used to calculate Z_t, is constructed as the product of conditional densities of Y_t given F_t. The state variable W_t at each time point embodies these conditioning variables as well as the unknown parameters δ = (β, φ, θ), and in order to stress this we will use the notation W_t(δ) where appropriate. Then, the log-likelihood is given by

l(δ) = Σ_{t=1}^n log f(y_t | W_t(δ)).   (14)

Strictly speaking this is a conditional (on initial values) log-likelihood but for brevity we will henceforth use the term log-likelihood. For the Poisson and binomial response distributions the log-likelihood (14) is

l(δ) = Σ_{t=1}^n {y_t W_t(δ) − a_t b(W_t(δ)) + c_t}.   (15)

For the negative binomial response distribution the log-likelihood is based on (8) and the shape parameter α also has to be estimated along with β, φ and θ. We then let δ = (β, φ, θ, α).
Note that the e t in (10) and the Z t in (9) are also functions of the unknown parameter δ and hence need to be recomputed for each iteration of the likelihood optimization. Thus in order to calculate the likelihood and its derivatives, recursive expressions are needed to calculate e t , Z t and W t as well as their first and second partial derivatives with respect to δ. Expressions for these recursive formulae are available in Davis et al. (2005) for the Poisson case. Corresponding formulae for the binomial case were derived in Lu (2002) and for the negative binomial case in Wang (2004). The essential computational cost is in the recursions for Z t and W t and their first and second derivatives with respect to δ. Fortunately, these require identical code for the various response distributions and definitions of predictive residuals e t .
The likelihood is maximized from a suitable starting value of the parameter δ using a version of Fisher scoring iteration or by Newton-Raphson iteration. For a given value of δ let the vector of first derivatives of the log-likelihood (14) with respect to δ be

d(δ) = ∂l(δ)/∂δ,   (16)

and the second derivative matrix be D_NR(δ) = ∂²l(δ)/∂δ∂δ⊤, where the matrix of second derivatives of the log-likelihood is (in the Poisson and binomial response cases) given by

D_NR(δ) = Σ_{t=1}^n [y_t − a_t ḃ(W_t)] ∂²W_t/∂δ∂δ⊤ − Σ_{t=1}^n a_t b̈(W_t) (∂W_t/∂δ)(∂W_t/∂δ)⊤.   (17)

Using the fact that, at the true parameter value δ, E[y_t − a_t ḃ(W_t) | F_t] = 0, the expected value of the first summation in (17) is zero and hence the expected value of the matrix of second derivatives equals that of

D_FS(δ) = −Σ_{t=1}^n a_t b̈(W_t) (∂W_t/∂δ)(∂W_t/∂δ)⊤.   (18)

While these expectations cannot be computed in closed form, expression (18) requires first derivatives only and is used in the glarma package as the basis for the approximate Fisher scoring method. Thus, if δ^(k) is the parameter vector at the current iterate k, the Newton-Raphson updates proceed using

δ^(k+1) = δ^(k) − D_NR(δ^(k))^{−1} d(δ^(k)),   (19)

and the approximate Fisher scoring updates use D_FS in place of D_NR. Given a specified tolerance TOL, iterations continue until the largest gradient of the log-likelihood satisfies max_i |d_i(δ^(k))| ≤ TOL or a maximum number of iterations MAXITER is surpassed. At termination we let δ̂ = δ^(k+1) and call this the "maximum likelihood estimate" of δ.
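The Newton-Raphson update (19) can be illustrated in the simplest special case: a Poisson model with W_t = β and no ARMA terms, where the score is d(β) = Σ(y_t − e^β), the second derivative is D_NR(β) = −n e^β (which coincides with the Fisher scoring version), and the MLE is log ȳ. The Python sketch below is our own illustration, not glarma code:

```python
import math

def newton_raphson_poisson_intercept(y, beta0=0.0, tol=1e-8, maxiter=50):
    """Scalar Newton-Raphson iteration of the form (19) for an
    intercept-only Poisson model: should converge to log(mean(y))."""
    n = len(y)
    beta = beta0
    for _ in range(maxiter):
        mu = math.exp(beta)
        d = sum(y) - n * mu        # score (gradient of the log-likelihood)
        if abs(d) <= tol:          # stopping rule: gradient below TOL
            break
        D = -n * mu                # second derivative (= Fisher version here)
        beta = beta - d / D        # Newton-Raphson update
    return beta
```

In the full GLARMA setting d(δ) and D_NR(δ) are vector and matrix valued and the derivatives of W_t are themselves computed recursively, but the update has exactly this form.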
Selecting initial values of the state variable
For calculation of the Z_t in (9), initializing conditions for the recursions must be used. The current implementation in glarma is to set e_t = 0 and Z_t = 0 for t ≤ 0, ensuring that the conditional and unconditional expected values of e_t are zero for all t. These would seem to be natural specifications for GLARMA models and are used in Rydberg and Shephard (2003) and Davis et al. (2003). In BARMA models (5) the recursions for W_t are in terms of past Y_t and past unscaled deviations Y_t − µ_t. While setting pre-period values of the latter to zero may be natural, there remains the question of how best to initialize Y_t for t ≤ 0. For example, Kauppi and Saikkonen (2008) suggest using a method that can be "interpreted as estimates of the unconditional mean" of W_t, and Startz (2008) suggests setting µ_t to zero, or to the sample mean of Y_t, or π_t = 0.5, or initial values of y_t − µ_t to zero. Currently, to use the glarma package to fit BARMA models, y_t − µ_t = 0 for t ≤ 0 is assumed, and users are free to specify values of y_t for t ≤ 0 in whatever way they wish.
Choosing lags for the ARMA components
Selection of appropriate lags for the AR and MA components in the GLARMA model is usually considerably more difficult than it is for Gaussian series, where residuals from least squares model fits can provide quite useful information about the serial dependence structure through the autocorrelation and partial autocorrelation functions. In the discrete-valued response setting, residuals from a GLM fit often do not give good guidance on choosing the p and q needed to specify the GLARMA model, particularly if the dependence is weak or moderate. We recommend that the autocorrelation and partial autocorrelation functions be estimated using GLM residuals and, if strong patterns are observed, that these be reflected in the choice of p and q. However, following the discussion on parameter identifiability in Section 3.4, it is also highly recommended that users start with low orders for p and q and initially avoid specifying them to be equal. Once stability of estimation is achieved for a lower order specification, increasing the values of p or q can be attempted.
Initializing the MLE recursions
By default, the iterations in (19) are initialized using the GLM estimates for β and zero initial values for the autoregressive and moving average terms. For the negative binomial case, β and α are initialized using a call to glm.nb() from package MASS (see Venables and Ripley 2002). While this is often a successful strategy, for which convergence to the MLE is typically rapid, it does not always lead to convergence, or can take quite a few iterations. In some cases the Fisher scoring method can be more robust to starting values; it can therefore be used to initialize the iterative algorithm, with a switch to Newton-Raphson once the procedure is reasonably stable, because Newton-Raphson is typically faster once the iterations are close to the optimum. Users may optionally specify initial parameter values of their own choice. This can be important particularly when strong serial dependence needs to be modeled or when the model may be overspecified and hence poorly identified because the likelihood surface is flat in some directions in parameter space. In the former case, the autocorrelation function of the residuals from an initial GLM fit can guide selection of AR and MA terms and initial values for these. Alternatively, the autocorrelation of the randomized probability integral transformation residuals might provide guidance. When the model is poorly identified it is particularly difficult to obtain convergence of the likelihood iterations, as discussed in more detail in Section 3.4. Our recommendation is to start with moving average terms at small order lags and seasonal lags (if appropriate) and build up a sequence of models with the moving average order q increasing. Examination of the moving average coefficient estimates θ̂_j can then sometimes suggest a simpler autoregressive or mixed autoregressive moving average specification.

Stochastic properties of GLARMA models
Sufficient conditions for the mean and variance of the predictive residuals to exist are, for example, that the autoregressive polynomial φ(ζ) has all zeros outside the unit circle, the regressors x_t are such that 0 < σ_t < ∞, and the recursions in (9) are initialized with Z_t = e_t = 0 for t ≤ 0. Assuming then that σ_t < ∞ in the Poisson and negative binomial cases, and that 0 < π_t < 1 and m_t < ∞ in the binomial case, the three types of residuals considered will have finite mean and variance at any time point t. In that case they form a martingale difference sequence (Rydberg and Shephard 2003; Davis et al. 2003) and as a result are uncorrelated. Further, the Pearson residuals (and only those) have unit variance and therefore are zero mean, unit variance weakly stationary white noise. Means, variances and autocovariances for the state process {W_t} can be readily derived using the definition of Z_t in (12) - see Davis et al. (1999). For the Poisson response case the corresponding means, variances and autocovariances for the count response series {Y_t} can be derived approximately.
Additionally, an approximate interpretation of the regression coefficients β can be given; see Davis et al. (2003). Similar results could be derived for the negative binomial response case. For binomial and Bernoulli responses, calculation of means, variances and autocovariances for the response series, and interpretation of the regression coefficients, is not straightforward. This is a typical issue for interpretation of random effects models and transition models in the binomial or Bernoulli case; see Diggle, Heagerty, Liang, and Zeger (2002) for example.
To date the stationarity and ergodicity properties of the GLARMA model are only partially understood. These properties are important for ensuring that the process is capable of generating sample paths that neither degenerate to zero nor explode as time progresses, as well as for establishing the large sample distributional properties of the parameter estimates. Davis et al. (2003) provide partial results for perhaps the simplest of all possible models for Poisson responses, specified with p = 0, q = 1 and x t β = β. Results for simple examples of the stationary Bernoulli case are given in Streett (2000). More recently, Davis and Liu (2012) have considered a general class of observation-driven models for one-parameter exponential family response distributions. As they point out, GLARMA processes do not necessarily satisfy the contraction condition required for the existence of a stationary distribution, except in the simple cases of Davis et al. (2003). These cases exclude regression terms in W t .
Increasingly, asymptotic results for various types of observation-driven models without covariates are becoming available. Tjøstheim (2012) reviews the ergodic and stationarity properties of some observation-driven models, primarily in the Poisson response context. However, these results do not apply to the specific form of GLARMA models presented here. Wang and Li (2011) discuss the BARMA model and present some asymptotic results for that case. However, the state equation for the BARMA model differs structurally from that for the binary GLARMA model, because past observations of Y t also enter the autoregressive terms without scaling or centering, whereas in the GLARMA model they only enter through the residuals. Woodard, Matteson, and Henderson (2011) present some general results on stationarity and ergodicity for the form of the GARMA model of Benjamin et al. (2003). However, these results are not directly applicable to the GLARMA model form considered here, because the state equation recursions again involve applying the link function to both the responses and the mean responses. None of these recent results consider the case of covariates, and hence they are not, as yet, applicable to likelihood estimation for regression models for discrete outcome time series.

Fitted values
There are two concepts of fitted values currently supported for the GLARMA model. The first is defined as the estimated conditional mean function μ̂ t at time point t, calculated using the maximum likelihood estimates δ̂; that is, μ̂ t is the conditional mean evaluated at the estimated state Ŵ t = x t β̂ + O t + Ẑ t (20), where the Ẑ t are calculated using δ̂ in (9). These fitted values combine the regression fit (fixed effects) together with the contribution from weighted sums of past estimated predictive residuals.
Because for GLARMA models the unconditional mean function is difficult to obtain exactly in all cases, an estimated unconditional mean function is not provided. Instead, for the second concept of fitted values, the fitted value from the regression term only is suggested as a guide to the fit without the effect of random variation due to Z t . This is defined to be μ̃ t , the conditional mean evaluated with Ẑ t set to zero, so that only the regression term x t β̂ (plus any offset) contributes. We refer to this as the "fixed effects fit" in plotting functions below. Note that this is not an estimate of the unconditional mean, even in the Poisson case (arguably the most tractable for this calculation). The theoretical unconditional mean for this case is approximated by exp(x t β + ν 2 /2), where ν 2 = γ 2 1 + γ 2 2 + · · · is the sum of the squared coefficients γ l ; see Davis et al. (2003) for details. A similar calculation for the binomial case is not available. Hence, in view of these theoretical constraints, the use of the fixed effects fit seems a simple and sensible alternative to the conditional mean μ̂ t given by (20).
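To make the distinction concrete, in the Poisson case with log link the GLARMA fit is exp(x t β̂ + Ẑ t ) while the fixed effects fit is exp(x t β̂). The following is an illustrative Python sketch (not the package's R code); the linear predictor and Ẑ t values are hypothetical:

```python
import math

def glarma_fit(x_beta, Z):
    """GLARMA fitted values: regression term plus serial-dependence
    term Z_t (Poisson response, log link)."""
    return [math.exp(xb + z) for xb, z in zip(x_beta, Z)]

def fixed_effects_fit(x_beta):
    """Fixed effects fit: regression term only, with Z_t set to zero."""
    return [math.exp(xb) for xb in x_beta]

# hypothetical linear predictors x_t beta-hat and estimated Z_t values
x_beta = [1.0, 1.2, 0.9]
Z = [0.3, -0.1, 0.0]
print(glarma_fit(x_beta, Z))      # conditional-mean (GLARMA) fit
print(fixed_effects_fit(x_beta))  # fixed effects fit
```

The two fits coincide exactly when all Ẑ t are zero, which is why the fixed effects fit isolates the regression contribution.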

Distribution theory for likelihood estimation
For inference in the GLARMA model it is assumed that the central limit theorem holds, so that δ̂ is approximately multivariate normal with mean δ (22), where the approximate covariance matrix Ω̂ is estimated by the inverse of D NR evaluated at δ̂ (23) in the case of Newton-Raphson, and similarly with D NR replaced by D FS in the case of Fisher scoring. Thus a standard error for the maximum likelihood estimate of the ith component of δ is computed as the square root of Ω̂ ii . There have been a number of claims in the literature concerning a central limit theorem for models of this type. However, all of these make assumptions concerning convergence of key quantities which require ergodicity to be established, and this has not yet been done in generality. The central limit theorem for the maximum likelihood parameter estimates is rigorously established only in the stationary Poisson response case in Davis et al. (2003) and in the stationary Bernoulli case in Streett (2000). Simulation results are also reported in Davis et al. (1999, 2003) for non-stationary Poisson models. Other simulations not reported in the literature support the supposition that the estimates δ̂ have a multivariate normal distribution for large samples for a range of regression designs and for the various response distributions considered here. Hence, while a central limit theorem for the maximum likelihood estimators is currently not available for the general model, it seems plausible, since for these models the log-likelihood is a sum of elements in a triangular array of martingale differences. For nested models, likelihood ratio test statistics and Wald statistics can be calculated. Whenever the central limit theorem (22) holds, an approximate chi-squared distribution with appropriate degrees of freedom can be used to assess significance. Let δ (1) specify a subset of δ that is hypothesized to take a specific value δ (1) 0 . The Wald statistic is constructed as W 2 = (δ̂ (1) − δ (1) 0 ) ⊤ [Ω̂ (1) ] −1 (δ̂ (1) − δ (1) 0 ) (24), where Ω̂ (1) is the submatrix corresponding to δ (1) 0 of the estimated asymptotic covariance matrix of (23).
Further details on implementation of these tests in glarma are given in Section 5.1.
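For illustration, the Wald statistic W 2 is just a quadratic form in the estimates and the relevant covariance submatrix. The following Python sketch (the package reports these tests via likTests() in R; the parameter and covariance values here are hypothetical) computes W 2 for a two-parameter subset tested against zero:

```python
def wald_statistic(est, null, cov):
    """W^2 = (est - null)' cov^{-1} (est - null) for a 2-parameter
    subset, with the 2x2 inverse written out explicitly."""
    d = [est[0] - null[0], est[1] - null[1]]
    (a, b), (c, e) = cov
    det = a * e - b * c
    inv = [[e / det, -b / det], [-c / det, a / det]]
    # quadratic form d' inv d
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))

# hypothetical ARMA coefficient estimates tested against zero
est = [0.25, 0.10]
cov = [[0.010, 0.002], [0.002, 0.008]]
w2 = wald_statistic(est, [0.0, 0.0], cov)
print(round(w2, 2))  # compare with a chi-squared quantile, 2 degrees of freedom
```

Under the null, W 2 is referred to the chi-squared distribution with degrees of freedom equal to the number of parameters tested.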
Associated with testing the need for GLARMA terms in the model is the use of residuals to investigate serial dependence not accounted for by the model fitted so far. If the residuals from a GLM fit (without GLARMA terms) are used to calculate estimated autocorrelations and partial autocorrelations in the usual way, then it is straightforward to show that, under the conditions on the regressors given in Davis and Wu (2009), these will be asymptotically normally distributed with the usual standard error of 1/ √ n. This provides the usual 95% intervals in the diagnostic plots provided in glarma. If the residuals are calculated after fitting GLARMA terms in the model, the asymptotic theory needed to establish the corresponding result is not yet available. However, the first author's experience with simulations suggests that the autocorrelations estimated using residuals from the GLARMA model fit are also asymptotically normal with the usual standard error. Because the residuals from an adequately specified GLARMA model are martingale differences, they are uncorrelated at the true parameters, and their estimated autocorrelations have standard error 1/ √ n as usual. Asymptotic normality of the estimated autocorrelations is also likely. Experience with simulations suggests that, for example, the usual Box-Ljung portmanteau test for serial dependence has the correct asymptotic chi-squared distribution.
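This diagnostic check can be sketched directly: compute the sample autocorrelations of the residuals and compare them with ±1.96/√n bands. A Python sketch (not the package's R implementation; the residual series here is simulated stand-in white noise):

```python
import math
import random

def acf(x, max_lag):
    """Sample autocorrelations of x at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    out = []
    for k in range(1, max_lag + 1):
        ck = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n
        out.append(ck / c0)
    return out

random.seed(1)
resid = [random.gauss(0, 1) for _ in range(500)]  # stand-in for model residuals
band = 1.96 / math.sqrt(len(resid))               # usual 95% limits
exceedances = sum(abs(r) > band for r in acf(resid, 10))
print(exceedances)  # for white noise, few lags should fall outside the bands
```

For an adequately specified model, roughly 5% of lags are expected to fall outside the bands by chance alone.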

Forecasting GLARMA models
Methods for forecasting future values of the observed time series using observation-driven models are not as well developed as they are for the traditional ARMA or ARMAX models for continuous responses. The conditional specification of the observation-driven model means that one-step ahead forecasts are simple. However for multi-step ahead forecasts the conditional specification means that all possible future sample paths over the forecast horizon need to be considered either theoretically or via simulation. We discuss these aspects here.
For one-step ahead forecasts of Y n+1 from the last observation at time point n, provided values of the regressors x n+1 at time point n + 1 are available, either by knowledge or by separate forecasting, the conditional distribution of Y n+1 may be forecast simply by forecasting the state variable W n+1 . This is done using the parameter estimates in the observation-driven model. We illustrate this for the GLARMA model; similar formulae apply for GARMA and BARMA models.
Note that Ẑ n+1 can be completely determined using the values of Ẑ t and ê t for t ≤ n. The predictive distribution for Y n+1 is then estimated to be f (y|Ŵ n+1 ), where the density f is given by (6). This estimated density provides a complete description of the forecast: it could be presented as a bar chart, or features such as its mean, mode or median could be used for a single point forecast. Additionally, one could determine prediction intervals, but due to the discreteness of the distribution these will not correspond exactly to commonly selected prediction coverage probabilities such as 95%.
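For the Poisson case this is easy to sketch: given a forecast state Ŵ n+1 , the predictive distribution is Poisson with mean exp(Ŵ n+1 ), from which mean, mode and median point forecasts can be read off. An illustrative Python sketch (the package itself works in R; the Ŵ n+1 value used here is hypothetical):

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability mass function."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def one_step_forecast(W_next, max_y=30):
    """Predictive distribution of Y_{n+1} given the forecast state
    W_{n+1} for a Poisson response with log link."""
    mu = math.exp(W_next)
    pmf = [poisson_pmf(y, mu) for y in range(max_y + 1)]
    mode = max(range(max_y + 1), key=lambda y: pmf[y])
    # median: smallest y with cumulative probability >= 0.5
    cum, median = 0.0, 0
    for y, p in enumerate(pmf):
        cum += p
        if cum >= 0.5:
            median = y
            break
    return mu, mode, median, pmf

mu, mode, median, pmf = one_step_forecast(1.1)  # hypothetical W_{n+1}
print(round(mu, 3), mode, median)
```

The full pmf could be displayed as a bar chart, while mu, mode and median are candidate point forecasts.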
Multi-step ahead forecasts are more involved, as has been noted by several authors. Rydberg and Shephard (2003) give a general formula which applies for lead times L > 1: the conditional density of Y n+L given F n is obtained by summing, over all possible values of y n+1 , . . . , y n+L−1 , the products of the one-step conditional densities f (y n+j |F n+j−1 ) for j = 1, . . . , L (26).
Here the information sets F n+j are defined to combine the actual information in F n , which is available, with knowledge of the regressors x n+1 , . . . , x n+j and the values of y n+1 , . . . , y n+j−1 used in the summations on the right hand side of (26). The summation is then over all possible projected sample paths over n + 1, . . . , n + L − 1, the number of which grows at least exponentially with L. The state variables W n+j required to specify the conditional densities in (26) need to be updated recursively; however, the GLARMA updating equations are rapid to compute. Rydberg and Shephard (2003) note that complete enumeration of the sums and products is generally impossible, and suggest either truncating the state space for the responses Y t (relevant in the Poisson and negative binomial cases) or using simulation.
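The simulation alternative is straightforward to sketch. For a Poisson GLARMA-type model with a single moving average term, W t = x t β + θ e t−1 with Pearson residuals e t , future paths can be simulated forward and the two-step ahead conditional mean estimated by Monte Carlo. The following Python sketch uses hypothetical parameter values (the glarma package implements forecasting by simulation via its R forecast method):

```python
import math
import random

def poisson_draw(mu):
    """Inverse-cdf Poisson sampler (adequate for small mu)."""
    u, y = random.random(), 0
    p = math.exp(-mu)
    cum = p
    while u > cum:
        y += 1
        p *= mu / y
        cum += p
    return y

def simulate_two_step(x_beta, theta, e_last, n_paths=2000, seed=42):
    """Simulate Y_{n+2} under W_t = x_t beta + theta * e_{t-1},
    Y_t | W_t ~ Poisson(exp(W_t)), with e_t the Pearson residual."""
    random.seed(seed)
    draws = []
    for _ in range(n_paths):
        e_prev = e_last
        for step in range(2):
            mu = math.exp(x_beta[step] + theta * e_prev)
            y = poisson_draw(mu)
            e_prev = (y - mu) / math.sqrt(mu)  # Pearson residual
        draws.append(y)
    return draws

# hypothetical regression values for n+1 and n+2, MA coefficient, last residual
draws = simulate_two_step(x_beta=[1.0, 1.0], theta=0.3, e_last=0.2)
print(sum(draws) / len(draws))  # Monte Carlo estimate of E(Y_{n+2} | F_n)
```

Each simulated path updates the state recursion with its own sampled residuals, so the Monte Carlo average approximates the sum over all projected sample paths in (26).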
The binomial, and particularly the binary, case may be amenable to complete enumeration and, indeed, such an approach is suggested in Kauppi and Saikkonen (2008). In the binary case, Startz (2008), Klingenberg (2008) and, in an appendix, Kauppi and Saikkonen (2008) discuss implementation of (26) for binary dynamic models, including the BARMA type structure, and provide formulae for the analogous terms on the right hand side by summing over all possible paths of zeros and ones over the forecast period. For binary data this provides non-simulation based forecasts of the future probabilities of success conditional on available data. Extension of this method to binomial, Poisson or negative binomial distributions would be computationally and algebraically complex. Because of this, simulation is the approach that has been implemented in the glarma package for all distributions currently supported. Benjamin et al. (2003) also utilize the simulation approach and suggest that it is feasible for short-range forecasting. Again, they note that the regressors need to be known in the future or, if stochastic, must be capable of being forecast.
Other recent work on forecasting includes McCabe, Martin, and Harris (2011), who study multi-period ahead forecasts within the context of the integer autoregressive class of models, and Freeland and McCabe (2004), for the Poisson autoregression model. These models are not in the GLARMA class of models.

Modeling functions in glarma
There are seven modeling functions for fitting GLARMA models, falling into three groups according to the response distribution (Poisson, binomial and negative binomial); for example, the two Poisson fitting functions are glarmaPoissonPearson() and glarmaPoissonScore().
The second component of the name indicates the distribution used for the counts and the third component the residuals used in the fitting routine. A call to glarma() results in a call to the appropriate fitting routine, as determined by the values of the arguments type and residuals supplied to the glarma() call. Pearson residuals are used by default. Two iterative methods are available for the optimization of the log-likelihood, Fisher scoring (method = "FS") and Newton-Raphson (method = "NR"), the default method being Fisher scoring. The object returned by any of the fitting routines is of class 'glarma'.
To specify the model in a call to glarma(), the response variable is given by the argument y, and the matrix of predictors for the regression part of the model is given by the argument X. The matrix X must include a column of ones to enable the fitting of a mean term in the regression component of the model. If required for W t in (7), an offset term O t must be specified also. Initial values can be given for the coefficients in the regression component using the argument beta. If no initial values are provided, a call is made to the corresponding GLM to obtain initial regression coefficient values.
The ARMA component of the model is specified using the arguments phiLags and phiInit (for the AR terms) and thetaLags and thetaInit (for the MA terms). For both the AR and MA terms, the first argument of the pair of arguments specifies the orders of the lags which are to be included in the model, and the second argument the initial values of the coefficients for those lags.
When the counts are modeled using the negative binomial distribution, there is an additional parameter, the shape parameter of the negative binomial, designated as α in the GLARMA model. This parameter is called θ in the function glm.nb() from package MASS, but for GLARMA models θ refers to the moving average terms in the ARMA component of the model. An initial value for α can be provided using the argument alphaInit. If no initial value is provided, a call is made to glm.nb() from MASS. An initial value for the call to glm.nb() can be supplied by giving a value to the argument alpha of glarma(). The default value for alpha is 1.
Because the GLARMA model is fitted using numerical non-linear optimization, it is possible that non-convergence occurs. Two error codes are included in the object returned by glarma() to alert users to numerical problems which occurred during fitting. If the Fisher scoring or Newton-Raphson iterations fail to converge, errCode will be set to 1. This can result from non-identifiability of the ARMA component of the model such as when the degrees and lags of both the AR and MA components are specified to be the same, as discussed in Section 3.4. It is possible that for certain values of the ARMA parameters the recursions calculating {W t } diverge to ±∞. In that case the value of WError will be set to 1 allowing the user to check for this condition when the likelihood optimization fails to converge. Once a fitted model object has been obtained, there are accessor functions available using S3 methods to extract the coefficients (coef(), or the alias coefficients()), the fitted values (fitted() or the alias fitted.values()), the residuals (residuals() or the alias resid()), the model frame (model.frame()), the number of observations (nobs()), the log-likelihood (logLik()), and the AIC (extractAIC()). These are standard implementations of the methods with the exception of coef(). This method takes an argument types which allows the extraction of the ARMA coefficients (types = "ARMA"), or the regression coefficients (types = "beta"), or both sets of coefficients (types = "all"), the default.
For an object of class 'glarma' the package includes S3 print, summary, and plot methods as well as a print method for the object returned by summary.

Likelihood ratio and Wald tests
In glarma, the likelihood ratio test and the Wald test both test that the serial dependence parameters ψ = (φ, θ) are all equal to zero (that is, H 0 : ψ = 0 versus H a : ψ ≠ 0). These tests are provided by the function likTests(), which operates on an object of class 'glarma'. The likelihood ratio test compares the likelihood of the fitted GLARMA model with the likelihood of the GLM model with the same regression structure. The same null hypothesis applies to the Wald test, which is based on the Wald statistic defined in (24). Values of both statistics are compared to the chi-squared distribution with degrees of freedom given by the number of ARMA parameters. These degrees of freedom and the associated chi-squared p values are correct under the situations discussed in Section 3.4.
Package users may also construct their own tailor-made likelihood ratio tests by using the reported log-likelihood (logLik()) for the two models under comparison and Wald tests W 2 in (24) using the appropriate submatrix of the reported estimated covariance matrix in (23) available as glarmamod$cov.

Probability integral transformation
To examine the validity of the assumed distribution in the GLARMA model a number of authors have suggested the use of the probability integral transformation (PIT), see for example Czado, Gneiting, and Held (2009). Although PIT applies to continuous distributions and the distributions in GLARMA models are discrete, Czado et al. (2009) have provided a non-randomized approach which has been implemented in the glarma package. There are four functions involved: glarmaPredProb() calculates conditional predictive probabilities; glarmaPIT() calculates the non-randomized PIT; histPIT() plots a histogram of the PIT; and qqPIT() draws a Q-Q plot of the PIT. If the distribution selected for the model is correct, then the histogram and Q-Q plot should resemble the histogram and Q-Q plot obtained when sampling from the uniform distribution on [0, 1]. Of the two plots, the histogram is generally more revealing. Deviations from the expected form of the Q-Q plot are often difficult to discern.
To calculate the conditional predictive probabilities and the PIT, the following formulae from Czado et al. (2009) are used. Given the counts {y t }, the conditional predictive probability function F (t) (·|y t ) is defined piecewise by F (t) (u|y t ) = 0 for u ≤ F (y t − 1); F (t) (u|y t ) = (u − F (y t − 1))/(F (y t ) − F (y t − 1)) for F (y t − 1) < u < F (y t ); and F (t) (u|y t ) = 1 for u ≥ F (y t ). Here F (y t ) and F (y t − 1) are the upper and lower conditional predictive probabilities respectively.
Then the non-randomized PIT is defined as the mean F̄ (u) = (1/n) Σ n t=1 F (t) (u|y t ). To draw the PIT histogram, the number of bins, I, is chosen, and the height of the ith bin is then proportional to F̄ (i/I) − F̄ ((i − 1)/I), scaled by I so that the heights are on the density scale. The default number of bins in histPIT() is 10. To help with assessment of the distribution, a horizontal line is drawn on the histogram at height 1, representing the density function of the uniform distribution on [0, 1].
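The calculation can be sketched directly from these formulae. The following Python sketch (not the package's R implementation) computes the non-randomized PIT histogram heights from the upper and lower conditional predictive probabilities; the probability values shown are hypothetical:

```python
def pit_value(u, F_lower, F_upper):
    """Non-randomized conditional PIT F(t)(u | y_t), given the lower and
    upper conditional predictive probabilities F(y_t - 1) and F(y_t)."""
    if u <= F_lower:
        return 0.0
    if u >= F_upper:
        return 1.0
    return (u - F_lower) / (F_upper - F_lower)

def pit_histogram(lowers, uppers, n_bins=10):
    """Density-scale heights of the PIT histogram; heights near 1
    everywhere indicate the assumed response distribution is adequate."""
    n = len(lowers)
    heights = []
    for i in range(1, n_bins + 1):
        Fbar_hi = sum(pit_value(i / n_bins, lo, up)
                      for lo, up in zip(lowers, uppers)) / n
        Fbar_lo = sum(pit_value((i - 1) / n_bins, lo, up)
                      for lo, up in zip(lowers, uppers)) / n
        heights.append(n_bins * (Fbar_hi - Fbar_lo))
    return heights

# hypothetical lower and upper conditional predictive probabilities
lowers = [0.12, 0.40, 0.05]
uppers = [0.35, 0.70, 0.20]
print(pit_histogram(lowers, uppers, n_bins=5))
```

Because the bin heights telescope, their average is always 1, matching the uniform reference line drawn by histPIT().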
The Q-Q plot of the PIT plots F̄ (u) against u, for u ∈ [0, 1]. The quantile function of the uniform distribution on [0, 1] is also drawn on the plot for reference.
Jung and Tremayne (2011) employ the above diagnostics as well as the randomized version of PIT residuals to compare alternative competing count time series models for several data sets.
One can also define the normalized conditional (randomized) quantile residuals of Dunn and Smyth (1996) as r t = Φ −1 (u t ), where Φ −1 is the inverse standard normal distribution function and u t is a uniform random variable on the interval [F (y t − 1), F (y t )] defined as in (27). Benjamin et al. (2003) advocate the use of the autocorrelation and partial autocorrelation functions and associated portmanteau statistics (e.g., the Box-Ljung statistic) of these r t to assess the adequacy of the serial dependence components in a GARMA model. This idea can easily be implemented using glarmaPredProb() described above. Earlier, Berkowitz (2001) suggested using a likelihood ratio test that the r t are mean zero, variance one and independent in particular and, more generally, tests for other ways in which dependence may not have been modeled adequately.
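These randomized residuals are simple to compute once the conditional predictive probabilities are available. A Python sketch (the package provides this via normRandPIT() in R; the probability values here are hypothetical):

```python
import random
from statistics import NormalDist

def randomized_quantile_residual(F_lower, F_upper, rng=random.random):
    """r_t = Phi^{-1}(u_t), with u_t uniform on [F(y_t - 1), F(y_t)]."""
    u = F_lower + (F_upper - F_lower) * rng()
    return NormalDist().inv_cdf(u)

random.seed(7)
# hypothetical (lower, upper) conditional predictive probability pairs
pairs = [(0.12, 0.35), (0.40, 0.70), (0.05, 0.20)]
r = [randomized_quantile_residual(lo, up) for lo, up in pairs]
print([round(v, 3) for v in r])
```

If the model is adequate, the r t behave like independent standard normal draws, so standard ACF plots and portmanteau tests can be applied to them directly.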

Plots
The plot method for objects of class 'glarma' produces six plots by default: a time series plot with the observed values of the dependent variable, the fixed effects fit, and the GLARMA fit; a plot of the residuals against time; a histogram of the uniform PIT values; a histogram of the normal randomized residuals; a Q-Q plot of the normal randomized residuals; and the autocorrelation of the normal randomized residuals. Four additional plots can be produced: the autocorrelation of the residuals; a Q-Q plot of the residuals; a Q-Q plot of the uniform PIT values; and the partial autocorrelation of the normal randomized residuals. Any subset of these ten plots can be obtained using the which argument; the default value is which = c(1L, 3L, 5L, 7L, 8L, 9L). Arguments to the plot method are also provided to change properties of the lines in these plots, namely line types, widths, and colors.

Examples
There are several example data sets included in the glarma package which cover binary, binomial, Poisson and negative binomial responses. Sample analyses for all these data sets are provided in either the help pages for the data sets or for the glarma() function.
GLARMA models with Poisson counts have appeared previously in the literature, however analyses using the binomial and negative binomial distributions are novel, so we concentrate on those cases in this section.

Asthma data
This data set arose from a single hospital (at Campbelltown), as part of a larger study into the relationship between atmospheric pollution and the number of asthma cases presenting at emergency departments in the South West region of Sydney, Australia, see Davis et al. (2003). A description of the columns in the data set is given in Table 2.
We fit a model with a moving average term at lag 7 with negative binomial counts. The initial values of the regression coefficients are found by fitting the corresponding GLM model, and the initial value of the shape parameter, α, of the negative binomial distribution is taken as 0. Pearson residuals are used and fitting is by Newton-Raphson.
The plot method for an object of class 'glarma' shows six plots by default, as explained previously. As an example, in Figure 1, we show just four of these plots. Since the default title for the PIT histogram is too long for the available space we use the titles argument to abbreviate it.
R> plot(glarmamod, which = c(1, 2, 3, 5),
+       titles = list(NULL, NULL, NULL, "PIT for GLARMA (Neg. Binomial)"))

The ACF plot indicates that the model has dealt adequately with any serial correlation present, and the PIT histogram suggests that the negative binomial distribution provides a suitable model for the counts.

Court conviction data
This data set records monthly counts of charges laid and convictions made for armed robbery in the Local Court and Higher Court in New South Wales, Australia, from 1995-2007; see Dunsmuir, Tran, and Weatherburn (2008). A description of the columns in the data set is given in Table 3.
The first step is to set up dummy variables for the months. Similar analyses can be carried out for both the Lower Court and the Higher Court data.
Here we consider only the Lower Court data. The ARMA component of the model is chosen to be AR(1) and the model for the conviction counts is binomial. A GLM is fitted first to obtain initial values for the regression coefficients. The initial value of the AR parameter is set at 0. Pearson residuals are used with Newton-Raphson iteration.
First of all the data is prepared for fitting a binomial model and the initial GLM fit is obtained. The GLARMA model is then fitted and summarized:

R> glarmamod <- glarma(Y, X, phiLags = 1, type = "Bin", method = "NR",
+                      maxit = 100, grad = 1e-6)
R> summary(glarmamod)

We observe that the regression coefficients for the GLARMA model are quite similar to those for the GLM model. In particular, the step change in 2001 is highly significant. The likelihood ratio and Wald tests both suggest the need to deal with autocorrelation. Figure 2 is produced using plot(glarmamod).
In these diagnostic plots, the ACF plot of the randomized residuals shows little residual autocorrelation, and the Q-Q plot of the randomized residuals shows reasonable conformity with normality. However the PIT histogram suggests that the binomial model for the counts is not appropriate for this data.

Example of diagnostics based on normalized randomized residuals
We illustrate the normalized randomized quantile residuals on the court conviction data below. The function normRandPIT() produces the randomized PIT residuals. Figure 3 shows the utility of these residuals when the serial dependence term is omitted from the model. We first fit the model and obtain the randomized residuals.

Example of forecasting with GLARMA models
Forecasting for GLARMA models is implemented in the glarma package via the forecast method for 'glarma' objects. In the following example we first produce a series of one-step ahead forecasts of µ t and plot them along with the estimated values of µ t obtained by using all the data. We first obtain the estimates and predicted values.
For forecasting more than one-step ahead, sampling is required as explained previously. We continue the example above by producing a sample of possible values of µ t for the last time period obtained by predicting two steps ahead.

Other packages for observation-driven models
A number of other packages on CRAN are available which use various approaches to fit observation-driven models for the type of data modeled using the glarma package. We here discuss the packages detected using searches for functions via the package sos (Graves, Dorai-Raj, and Francois 2013). The keywords used for these searches were "GLARMA", "GARMA" and "polio", the last of these because the polio data set of Zeger (1988) is the classical example considered by most researchers concerned with analyzing this type of data.
The package VGAM has a function garma() which fits a GARMA model. The help file indicates that the current version of the function is only preliminary, viz. "Currently, this function is rudimentary and can handle only certain continuous, count and binary responses only". Also, "This function is unpolished and is (sic) requires lots of improvements. In particular, initialization is very poor. Results appear very sensitive to quality of initial values." Fitting is by Fisher scoring, and Pearson residuals are produced. There is no plot method. One example is given, the interspike data from Zeger and Qaqish (1988); the model fitted is an AR(2), with no regression component other than the overall mean.
The package gamlss.util, which is an extension to gamlss (Rigby and Stasinopoulos 2005; Stasinopoulos and Rigby 2007), has a garmaFit() function. As an example, the polio data of Zeger (1988) is modeled. The word "GARMA" does not appear in the main references to the gamlss package, so further information about the function must be sought in Benjamin et al. (2003) and in the code of the function itself. The work in Benjamin et al. (2003) has already been discussed.
The garmaFit() function has no offset argument, although it seems that an offset can be included using the formula specification of the model. The ARMA specification only allows for specification of the maximum AR and MA orders, so specifying that the model has MA terms 1, 2 and 5 only (as in the GLARMA polio analysis) does not appear possible. The range of possible response distributions is large and different link functions can also be used. The residuals used are identity residuals. There are print, summary, fitted, coef, residuals, update, plot, deviance, and formula methods for the object returned by garmaFit(). There is a predict method, but the help page states it is under development. Overall the function garmaFit() and associated methods provide a useful modeling capability, although we have misgivings about the methodology, as we have outlined in Section 3.
The package gsarima only provides code for simulating some GLMs which have an autoregressive component.
Searching on "polio" produced two further packages on CRAN which analyze the polio data: acp (autoregressive conditional Poisson, Vasileios 2015); and gcmr (Gaussian copula marginal regression, Masarotto and Varin 2015). The theory behind these two packages may be found in Heinen (2003) and Masarotto and Varin (2012), respectively. A further package found in this search was mr (marginal regression, Masarotto and Varin 2010). This package is archived, but appears to have been superseded by gcmr.
Heinen (2003) only deals with the ARMA(1,1) structure for modeling the lack of independence. Besides the Poisson distribution, two versions of the double Poisson distribution are considered for the marginal distribution of the counts. Heinen (2003) also examines models where the variance has an ARMA(1,1) structure. There are examples in the paper for these models, but package acp only fits the ACP model, not the double Poisson or other enhancements.
Masarotto and Varin (2012) use multivariate Gaussian copula regression to model GLMs with time series errors. They use maximum likelihood to fit the parameters of their models, but due to the complexity of their approach, a Monte Carlo approximation of the likelihood with importance sampling is used for model fitting. In the package gcmr, fitting uses the function gcmr(). Diagnostics for the model are available, including examination of the residuals. Analysis of the polio data is carried out as an example of the package's capability. The model specification only allows all lags up to the specified error orders to be included. Marginal distributions can be beta, binomial, gamma, normal, negative binomial, Poisson or Weibull.

Future enhancements and applications
The glarma package can be further developed to expand the range of response distributions and forms of state equation. In relation to the state equation the development of transfer function type models (along the lines of Box, Jenkins, and Reinsel 2013) and the inclusion of intervention models would require replacing the linear regression component x t β by a transfer function which is non-linear in the parameters. Such an enhancement would create useful models for policy assessment in fields such as criminology.
As regards increasing the range of response distributions available, three suggest themselves as being of considerable practical utility. The first is the gamma response (for continuous outcomes) or the generalized gamma. These distributions would allow modeling of durations between events, as arise in financial modeling. We have developed the GLARMA methods for these; they are currently not implemented in the glarma package, but could be added easily enough. The second is the inclusion of zero-inflated distributions. This could currently be accommodated to a limited extent by modeling the probability of a zero occurring with a binary process, and then modeling the probability of a positive response with a separate count distribution. The model could be conditionally specified in a similar vein to that used for modeling transaction-level stock price movements in Rydberg and Shephard (2003). A more general model would require both the zero-inflated component and the count response component to be modeled jointly in terms of possibly coupled state equations or bivariate state equations. This would require substantial additional development of the current glarma package.
The third is the multinomial response distribution which would be useful in financial modeling with, for example, the hurdle model used in Liesenfeld, Nolte, and Pohlmeier (2006) and in modeling listener responses to music as in Dean, Bailes, and Dunsmuir (2014). Again development of this in a complete way would require a multivariate state equation to be used and this would, as in the zero-inflated case, require substantial development.
The glarma package has already been utilized in modeling multiple independent time series of binary responses (Dean et al. 2014) and in modeling the impact of changes in legal blood alcohol levels on driver deaths (Dunsmuir 2015). Data sets of this type, which we call "long longitudinal data", increasingly need to be analyzed. Dunsmuir (2015) outlines a fixed effects approach and a random effects approach (random effects being used to explain differences between regression parameters in the individual series). Currently, additional R code is required to implement these, but it is planned to make this functionality available in future releases of the glarma package.