cquad: An R and Stata Package for Conditional Maximum Likelihood Estimation of Dynamic Binary Panel Data Models

We illustrate the R package cquad for conditional maximum likelihood estimation of the quadratic exponential (QE) model proposed by Bartolucci and Nigro (2010) for the analysis of binary panel data. The package also allows us to estimate certain modified versions of the QE model, which are based on alternative parametrizations. It also includes a function for the pseudo-conditional likelihood estimation of the dynamic logit model, as proposed by Bartolucci and Nigro (2012). We also illustrate a reduced version of this package that is available in Stata. The use of the main functions of this package is illustrated through examples based on labor market data.


Introduction
With the growing number of panel datasets available to practitioners and the recent development of related statistical and econometric models, ready-to-use software to estimate non-linear models for binary panel data is now essential in applied research. In particular, the panel structure allows for formulations that include both unobserved heterogeneity (i.e., time-constant individual intercepts) and the lagged response variable, which accounts for the so-called state dependence (i.e., how the experience of a certain event affects the probability of experiencing the same event in the future), as defined in Heckman (1981a).
A simple and, at the same time, interesting approach for the analysis of binary panel data is based on the dynamic logit (DL) model, which includes individual-specific intercepts and state dependence. This model may be estimated under either a random-effects or a fixed-effects formulation. In the first case, the individual intercepts are treated as random parameters. In the second, each intercept is a fixed parameter to be estimated. The fixed-effects approach attracts considerable attention because it requires fewer assumptions than the random-effects formulation. The latter relies on independence between the individual unobserved effects and the observable covariates, and on a normality assumption.
For the static fixed-effects logit model (i.e., the DL model without the lagged response variable among the covariates), it is possible to eliminate the individual intercepts by conditioning on simple sufficient statistics (Andersen 1970; Chamberlain 1980). In general, the estimator based on this method is known as the conditional maximum likelihood (CML) estimator. The full DL model, however, does not admit simple sufficient statistics for the individual intercepts and, therefore, cannot be estimated by CML as simply as the static logit model. This drawback is overcome by Bartolucci and Nigro (2010), who develop a model for dynamic binary panel data based on a quadratic exponential (QE) formulation (Cox 1972). This formulation has the advantage of admitting sufficient statistics for the unobserved heterogeneity parameters, so the model parameters can easily be estimated by the CML method. Further extensions of the approach of Bartolucci and Nigro (2010) have also been proposed recently. In particular, Bartolucci and Nigro (2012) propose a QE model that closely approximates the DL model. Finally, Bartolucci, Nigro, and Pigini (2017) derive a test for state dependence that is more powerful than the one based on the standard QE model.
In this paper we illustrate cquad, a comprehensive R (R Core Team 2017) package for the CML estimation of fixed-effects binary panel data models. In particular, cquad contains functions for the estimation of the static logit model (Chamberlain 1980). It also estimates the dynamic QE models recently proposed by Bartolucci and Nigro (2010, 2012) and by Bartolucci, Nigro, and Pigini (2017). A version of the R package cquad, including its main functionalities, is also available for Stata (StataCorp. 2015; Bartolucci 2015) and is illustrated here.
As it implements fixed-effects estimators of non-linear panel data models for binary dependent variables, cquad complements the existing array of R packages for panel data econometrics. Above all, it is closely related to the plm package (see Croissant and Millo 2008), which provides a wide set of functions for the estimation of linear panel data models in both static and dynamic formulations. In addition, cquad shares with plm the data frame structure, the formula interface processed by model.matrix, and the object class panelmodel. cquad is also related to package nlme (Pinheiro, Bates, DebRoy, Sarkar, and R Core Team 2017), which implements non-linear mixed-effects models that can be estimated with longitudinal data.
The Stata module cquad adds to the many existing commands and modules for panel data econometrics available in this software, such as xtreg and xtabond2 for linear models. It also complements the available routine for the CML and ML estimation of the static logit model, namely the native xtlogit. In addition, it is related to the routines and modules for the estimation of static random-effects binary panel data models. These include the built-in xtprobit and the module gllamm (2011) for the estimation of generalized linear mixed models (see Rabe-Hesketh, Skrondal, and Pickles 2005). It is likewise related to the modules redprob and redpace, which implement dynamic models (see Stewart 2006). Finally, a package with similar functionalities for binary panel data models is the DPB function package for gretl (see Lucchetti and Pigini 2015, for details), which implements the CML estimator for the QE model of Bartolucci and Nigro (2010). A related package, which however uses a different approach for parameter estimation, is the R package panelMPL described in Bartolucci, Bellio, Salvan, and Sartori (2016).

The paper is organized as follows. In the next section we briefly review the basic definition of the DL model and of the different versions of the QE model considered here, together with CML and pseudo-CML estimation of these models. Then, in Section 3 we describe the main functionalities of package cquad for R and of the corresponding module for Stata. Finally, Section 4 illustrates the packages through examples.
For the purpose of describing cquad functionalities, we use data on unionized workers extracted from the U.S. National Longitudinal Survey of Youth. In particular, to illustrate the R package, we use the same data as in Wooldridge (2005), whereas for the Stata module we employ similar data already available in the Stata repository.

Preliminaries
We consider a binary panel dataset referring to a sample of $n$ units observed at $T$ consecutive time occasions. We adopt a common notation in which $y_{it}$ is the response variable for unit $i$ at occasion $t$, with $i = 1, \ldots, n$ and $t = 1, \ldots, T$, and $\boldsymbol{x}_{it}$ is the corresponding column vector of covariates. In the following, we first describe the CML method applied to the logit model. We then illustrate the DL and QE models for dynamic binary panel data and inference based on the CML method.

Conditional maximum likelihood estimation
To outline the CML method of Andersen (1970), we describe the derivation of the conditional likelihood for the static logit model (Chamberlain 1980). This model is the basic framework for the QE models described later in this section.
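Consider the static logit formulation based on the assumption

$$p(y_{it} = 1 \mid \alpha_i, \boldsymbol{x}_{it}) = \frac{\exp(\alpha_i + \boldsymbol{x}_{it}'\boldsymbol{\beta})}{1 + \exp(\alpha_i + \boldsymbol{x}_{it}'\boldsymbol{\beta})}, \tag{1}$$

where $\alpha_i$ is the individual-specific intercept and vector $\boldsymbol{\beta}$ collects the regression parameters associated with the explanatory variables $\boldsymbol{x}_{it}$. For the joint probability of $\boldsymbol{y}_i = (y_{i1}, \ldots, y_{iT})'$, this model implies that

$$p(\boldsymbol{y}_i \mid \alpha_i, \boldsymbol{X}_i) = \frac{\exp\left(y_{i+}\alpha_i + \sum_t y_{it}\boldsymbol{x}_{it}'\boldsymbol{\beta}\right)}{\prod_t \left[1 + \exp(\alpha_i + \boldsymbol{x}_{it}'\boldsymbol{\beta})\right]},$$

where $\boldsymbol{X}_i$ collects the covariate vectors $\boldsymbol{x}_{i1}, \ldots, \boldsymbol{x}_{iT}$. The sum $\sum_t$ and the product $\prod_t$ range over $t = 1, \ldots, T$, and $y_{i+} = \sum_t y_{it}$ is called the total score. It can be shown that $y_{i+}$ is a sufficient statistic for the individual intercept $\alpha_i$ (Andersen 1970). Consequently, the joint probability of $\boldsymbol{y}_i$, conditional on $y_{i+}$, does not depend on $\alpha_i$. In fact, we have

$$p(\boldsymbol{y}_i \mid \alpha_i, \boldsymbol{X}_i, y_{i+}) = \frac{p(\boldsymbol{y}_i \mid \alpha_i, \boldsymbol{X}_i)}{p(y_{i+} \mid \alpha_i, \boldsymbol{X}_i)}.$$

The denominator is the sum of the probabilities of observing each possible configuration $\boldsymbol{z} = (z_1, \ldots, z_T)'$ of binary responses such that $z_+ = y_{i+}$, where $z_+ = \sum_t z_t$, that is,

$$p(y_{i+} \mid \alpha_i, \boldsymbol{X}_i) = \sum_{\boldsymbol{z}:\, z_+ = y_{i+}} \frac{\exp\left(y_{i+}\alpha_i + \sum_t z_t\boldsymbol{x}_{it}'\boldsymbol{\beta}\right)}{\prod_t \left[1 + \exp(\alpha_i + \boldsymbol{x}_{it}'\boldsymbol{\beta})\right]}.$$

Therefore, the conditional distribution of the vector of responses $\boldsymbol{y}_i$ is

$$p(\boldsymbol{y}_i \mid \boldsymbol{X}_i, y_{i+}) = \frac{\exp\left(\sum_t y_{it}\boldsymbol{x}_{it}'\boldsymbol{\beta}\right)}{\sum_{\boldsymbol{z}:\, z_+ = y_{i+}} \exp\left(\sum_t z_t\boldsymbol{x}_{it}'\boldsymbol{\beta}\right)},$$

in which the individual intercepts $\alpha_i$ have cancelled out.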
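The cancellation of $\alpha_i$ can be checked numerically. The following base-R sketch uses a single scalar covariate for brevity. The functions joint_logit, configs, cond_logit, and cond_logit_closed are illustrative names, not functions of cquad. The sketch enumerates every binary vector with the same total score and divides the joint probability by their sum. It then verifies that the result is the same for two very different intercepts and matches the closed form above.

```r
# Joint probability of y under the static logit model (Equation 1), scalar covariate xv
joint_logit <- function(y, alpha, xv, beta) {
  p <- plogis(alpha + xv * beta)                    # P(y_it = 1) at each occasion
  prod(p^y * (1 - p)^(1 - y))
}

# All binary vectors of length TT with total score s (one per row)
configs <- function(TT, s) {
  Z <- as.matrix(expand.grid(rep(list(0:1), TT)))
  Z[rowSums(Z) == s, , drop = FALSE]
}

# p(y | alpha, x, y_+) by brute force: joint probability over p(y_+ | alpha, x)
cond_logit <- function(y, alpha, xv, beta) {
  Z <- configs(length(y), sum(y))
  den <- sum(apply(Z, 1, joint_logit, alpha = alpha, xv = xv, beta = beta))
  joint_logit(y, alpha, xv, beta) / den
}

# Closed form in which alpha_i no longer appears
cond_logit_closed <- function(y, xv, beta) {
  Z <- configs(length(y), sum(y))
  exp(sum(y * xv) * beta) / sum(exp(Z %*% xv * beta))
}

y  <- c(0, 1, 1, 0)
xv <- c(-0.5, 0.2, 1.0, 0.3)
p_low    <- cond_logit(y, alpha = -2,  xv = xv, beta = 0.8)
p_high   <- cond_logit(y, alpha = 1.5, xv = xv, beta = 0.8)
p_closed <- cond_logit_closed(y, xv, beta = 0.8)
```

The three values coincide up to rounding error. The factor $\prod_t[1 + \exp(\alpha_i + \boldsymbol{x}_{it}'\boldsymbol{\beta})]$ does not depend on the configuration, and $\exp(z_+\alpha_i)$ is the same for every $\boldsymbol{z}$ in the sum.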
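The conditional log-likelihood based on the above distribution can be written as

$$\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n} I(0 < y_{i+} < T)\,\log p(\boldsymbol{y}_i \mid \boldsymbol{X}_i, y_{i+}),$$

where the indicator function $I(\cdot)$ accounts for the fact that observations whose total score is $0$ or $T$ do not contribute to the likelihood. For these units, only one configuration is compatible with the total score, so their conditional probability equals 1. This conditional log-likelihood can be maximized with respect to $\boldsymbol{\beta}$ by a Newton-Raphson algorithm, giving the CML estimator $\hat{\boldsymbol{\beta}}$. Expressions for the score vector and the information matrix can be derived using the standard theory on the regular exponential family (Barndorff-Nielsen 1978).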
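To make the Newton-Raphson step concrete, here is a minimal base-R sketch of CML estimation for the static logit model on simulated data. It is not the cquad implementation: the data-generating values, the helper cml_logit, and the starting point are assumptions chosen for illustration. The score is the observed sufficient statistic $\sum_t y_{it}\boldsymbol{x}_{it}$ minus its conditional mean given $y_{i+}$. The Hessian is minus its conditional variance, computed by enumerating the configurations with the same total score.

```r
set.seed(1)
n <- 1000; TT <- 4; beta_true <- c(1, -0.5)
Z_all <- as.matrix(expand.grid(rep(list(0:1), TT)))   # all 2^T binary vectors

# Hypothetical simulated panel: unit-specific intercepts, two covariates
units <- lapply(seq_len(n), function(i) {
  X <- matrix(rnorm(TT * 2), TT, 2)
  alpha <- rnorm(1)
  list(y = rbinom(TT, 1, plogis(alpha + X %*% beta_true)), X = X)
})

# Conditional log-likelihood, score, and Hessian of the static logit model
cml_logit <- function(beta, units) {
  k <- length(beta); lk <- 0; sc <- numeric(k); H <- matrix(0, k, k)
  for (u in units) {
    s <- sum(u$y)
    if (s == 0 || s == nrow(u$X)) next               # I(0 < y_+ < T)
    Z <- Z_all[rowSums(Z_all) == s, , drop = FALSE]
    S <- Z %*% u$X                                   # sum_t z_t x_t for each z
    w <- exp(S %*% beta); w <- as.vector(w / sum(w)) # conditional probabilities of each z
    sy <- drop(crossprod(u$X, u$y))                  # observed sum_t y_t x_t
    m <- colSums(S * w)                              # conditional mean of the statistic
    lk <- lk + sum(sy * beta) - log(sum(exp(S %*% beta)))
    sc <- sc + sy - m
    H <- H - (crossprod(S * w, S) - tcrossprod(m))   # minus the conditional variance
  }
  list(lk = lk, score = sc, hessian = H)
}

# Newton-Raphson iterations starting from beta = 0
beta <- c(0, 0)
for (it in 1:50) {
  f <- cml_logit(beta, units)
  step <- solve(f$hessian, f$score)
  beta <- beta - step
  if (max(abs(step)) < 1e-8) break
}
beta   # CML estimate, close to beta_true
```

The intercepts alpha are used only to generate the data and never enter the estimation. This is the practical payoff of conditioning on $y_{i+}$.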

Dynamic logit model
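The DL model (Hsiao 2005) is an interesting dynamic approach for binary panel data. Besides the observable covariates, it includes both individual-specific intercepts and the lagged response variable. Its formulation is a simple extension of Equation 1 in which $y_{i,t-1}$ is also included among the covariates. Consider a sequence of binary responses $y_{it}$, $t = 1, \ldots, T$, referred to the same unit $i$, with corresponding covariate vectors $\boldsymbol{x}_{it}$. The conditional distribution of a single response is

$$p(y_{it} \mid \alpha_i, \boldsymbol{X}_i, y_{i0}, \ldots, y_{i,t-1}) = \frac{\exp\left[y_{it}\left(\alpha_i + \boldsymbol{x}_{it}'\boldsymbol{\beta} + y_{i,t-1}\gamma\right)\right]}{1 + \exp\left(\alpha_i + \boldsymbol{x}_{it}'\boldsymbol{\beta} + y_{i,t-1}\gamma\right)}, \tag{2}$$

where $\gamma$ is the regression coefficient for the lagged response variable, which measures the true state dependence.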
The inclusion of the individual intercept α i for the unobserved heterogeneity in a dynamic model raises the so-called "initial conditions" problem (Heckman 1981b), which concerns the correlation between time-invariant effects and the initial realization of the outcome, y i0 .
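However, with a fixed-effects approach, the individual unobserved effects are treated as fixed parameters and the initial observation can be considered as given. The distribution of the vector of responses $\boldsymbol{y}_i$ conditional on $y_{i0}$ is

$$p(\boldsymbol{y}_i \mid \alpha_i, \boldsymbol{X}_i, y_{i0}) = \frac{\exp\left(y_{i+}\alpha_i + \sum_t y_{it}\boldsymbol{x}_{it}'\boldsymbol{\beta} + y_{i*}\gamma\right)}{\prod_t \left[1 + \exp\left(\alpha_i + \boldsymbol{x}_{it}'\boldsymbol{\beta} + y_{i,t-1}\gamma\right)\right]}, \tag{3}$$

where $y_{i*} = \sum_t y_{i,t-1}y_{it}$.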
Unlike the static logit model in Equation 1, the full DL model does not admit sufficient statistics for the individual parameters $\alpha_i$. The denominator of Equation 3 depends on the lagged responses, so conditioning on $y_{i+}$ does not eliminate $\alpha_i$. Therefore, CML inference is not viable in a simple form and can only be derived in the special case of $T = 3$ without explanatory variables (Chamberlain 1985). Honoré and Kyriazidou (2000) extend this approach to include covariates in the regression model, estimating the parameters by maximizing a weighted conditional log-likelihood. However, their approach has some limitations. Mainly, discrete covariates cannot be included in the model specification and, although the estimator is consistent, its rate of convergence to the true parameter value is slower than $\sqrt{n}$.
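A small numerical check helps show why CML breaks down for the DL model. The base-R sketch below evaluates Equation 3 with a scalar covariate. The function names joint_dl and cond_dl and all parameter values are illustrative assumptions. The sketch then conditions on the total score by brute-force enumeration. With $\gamma \neq 0$ the result changes with $\alpha_i$, while with $\gamma = 0$ (the static logit) it does not.

```r
# Joint probability of y given alpha and y0 under the DL model (Equation 3)
joint_dl <- function(y, alpha, xv, beta, gamma, y0) {
  ylag <- c(y0, y[-length(y)])                 # y_{i,t-1}, t = 1, ..., T
  eta  <- alpha + xv * beta + ylag * gamma     # linear index of Equation 2
  exp(sum(y * eta)) / prod(1 + exp(eta))       # denominator depends on the lagged responses
}

# Probability of y given its total score, enumerating all z with z_+ = y_+
cond_dl <- function(y, alpha, xv, beta, gamma, y0) {
  Z <- as.matrix(expand.grid(rep(list(0:1), length(y))))
  Z <- Z[rowSums(Z) == sum(y), , drop = FALSE]
  den <- sum(apply(Z, 1, joint_dl, alpha = alpha, xv = xv,
                   beta = beta, gamma = gamma, y0 = y0))
  joint_dl(y, alpha, xv, beta, gamma, y0) / den
}

y <- c(0, 1, 1, 0); xv <- c(0.2, -0.3, 0.5, 0.1)
dyn  <- sapply(c(-1, 1), function(a) cond_dl(y, a, xv, beta = 1, gamma = 1, y0 = 0))
stat <- sapply(c(-1, 1), function(a) cond_dl(y, a, xv, beta = 1, gamma = 0, y0 = 0))
dyn    # two different values: alpha_i has not cancelled
stat   # two identical values: static case
```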

Quadratic exponential models
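The shortcomings of the fixed-effects DL model can be overcome by the approximating QE model defined in Bartolucci and Nigro (2010). This model is based on the family of distributions for multivariate binary data formulated by Cox (1972). It is referred to as QEext hereafter and directly formulates the conditional distribution of $\boldsymbol{y}_i$ as follows:

$$p(\boldsymbol{y}_i \mid \delta_i, \boldsymbol{X}_i, y_{i0}) = \frac{\exp\left[y_{i+}\delta_i + \sum_t y_{it}\boldsymbol{x}_{it}'\boldsymbol{\eta}_1 + y_{iT}\left(\phi + \boldsymbol{x}_{iT}'\boldsymbol{\eta}_2\right) + y_{i*}\psi\right]}{\sum_{\boldsymbol{z}} \exp\left[z_{+}\delta_i + \sum_t z_t\boldsymbol{x}_{it}'\boldsymbol{\eta}_1 + z_T\left(\phi + \boldsymbol{x}_{iT}'\boldsymbol{\eta}_2\right) + z_{i*}\psi\right]}, \tag{4}$$

Here $\delta_i$ is the individual-specific intercept, the sum $\sum_{\boldsymbol{z}}$ ranges over all possible binary response vectors $\boldsymbol{z}$, and $z_{i*} = y_{i0}z_1 + \sum_{t>1} z_{t-1}z_t$. The parameter $\psi$ measures the true state dependence, and vector $\boldsymbol{\eta}_1$ collects the regression parameters associated with the covariates. Here we consider $\phi$ and $\boldsymbol{\eta}_2$ as nuisance parameters. We refer the reader to Bartolucci and Nigro (2010) for a discussion of the interpretation of these parameters.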
The QE model allows for state dependence and unobserved heterogeneity, in addition to the effect of observable covariates, some of which may be discrete. Moreover, it shares several properties with the DL model. In particular, for $t = 1, \ldots, T$, the conditional log-odds ratio for $(y_{i,t-1}, y_{it})$ is constant and equal to $\psi$, whereas in the DL model it is constant and equal to $\gamma$.
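The constant log-odds ratio can be verified directly from Equation 4, since the normalizing constant cancels in the ratio. In the base-R sketch below, the log-odds ratio for each pair $(y_{i,t-1}, y_{it})$ is taken conditional on $\delta_i$, $\boldsymbol{X}_i$, and the remaining responses. This conditioning set is an assumption made for the illustration. The helper names and parameter values are hypothetical. Every main effect (including the last-occasion term $\phi + \boldsymbol{x}_{iT}'\boldsymbol{\eta}_2$) and every neighbouring interaction appears once in the numerator and once in the denominator. Only $\psi$ survives.

```r
# Unnormalized probability (numerator of Equation 4), scalar covariate xv
qe_ext_weight <- function(y, delta, xv, eta1, phi, eta2, psi, y0) {
  TT <- length(y)
  ystar <- sum(c(y0, y[-TT]) * y)              # y_{i*} = y0 y_1 + sum_{t>1} y_{t-1} y_t
  exp(sum(y) * delta + sum(y * xv) * eta1 +
      y[TT] * (phi + xv[TT] * eta2) + ystar * psi)
}

# Log-odds ratio for (y_{t-1}, y_t), keeping all other responses fixed
log_odds_ratio <- function(t, y, pars) {
  w <- function(a, b) {
    y[c(t - 1, t)] <- c(a, b)                  # local copy with (y_{t-1}, y_t) = (a, b)
    do.call(qe_ext_weight, c(list(y = y), pars))
  }
  log(w(1, 1) * w(0, 0) / (w(1, 0) * w(0, 1)))
}

pars <- list(delta = 0.3, xv = c(0.5, -1, 0.2, 0.8), eta1 = 0.7,
             phi = -0.4, eta2 = 0.5, psi = 1.2, y0 = 1)
y <- c(1, 0, 1, 1)
sapply(2:4, log_odds_ratio, y = y, pars = pars)   # each equals psi = 1.2
```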
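Unlike the DL model, the QE model admits a sufficient statistic for each individual intercept $\delta_i$. The parameters for the unobserved heterogeneity are removed by conditioning on the total score $y_{i+}$. In particular, following the same derivation as in Section 2.1, we obtain

$$p(\boldsymbol{y}_i \mid \boldsymbol{X}_i, y_{i0}, y_{i+}) = \frac{\exp\left[\sum_t y_{it}\boldsymbol{x}_{it}'\boldsymbol{\eta}_1 + y_{iT}\left(\phi + \boldsymbol{x}_{iT}'\boldsymbol{\eta}_2\right) + y_{i*}\psi\right]}{\sum_{\boldsymbol{z}:\, z_+ = y_{i+}} \exp\left[\sum_t z_t\boldsymbol{x}_{it}'\boldsymbol{\eta}_1 + z_T\left(\phi + \boldsymbol{x}_{iT}'\boldsymbol{\eta}_2\right) + z_{i*}\psi\right]}. \tag{5}$$

The parameter vector $\boldsymbol{\theta} = (\boldsymbol{\eta}_1', \phi, \boldsymbol{\eta}_2', \psi)'$ can be estimated by maximizing the conditional log-likelihood based on Equation 5, that is,

$$\ell(\boldsymbol{\theta}) = \sum_{i=1}^{n} I(0 < y_{i+} < T)\,\log p(\boldsymbol{y}_i \mid \boldsymbol{X}_i, y_{i0}, y_{i+}).$$

As for the static logit model, this maximization may simply be performed by a Newton-Raphson algorithm. The resulting estimator $\hat{\boldsymbol{\theta}} = (\hat{\boldsymbol{\eta}}_1', \hat{\phi}, \hat{\boldsymbol{\eta}}_2', \hat{\psi})'$ is $\sqrt{n}$-consistent and has an asymptotic normal distribution. For the derivation of the score vector, the information matrix, and the standard errors, we refer the reader to Bartolucci and Nigro (2010).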
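A unit's contribution to the QEext conditional log-likelihood can be sketched in base R by enumerating the configurations with the same total score. The names log_cond_qe_ext and cond_loglik_qe_ext, and the stacking of theta as (eta1, phi, eta2, psi), are assumptions made for this illustration. This is not how cquad organizes its code. The result could be passed to a generic optimizer such as optim with fnscale = -1.

```r
# Log conditional probability (Equation 5) for one unit; X is a T x k matrix and
# theta stacks (eta1, phi, eta2, psi), so length(theta) = 2k + 2
log_cond_qe_ext <- function(y, X, y0, theta) {
  TT <- length(y); k <- ncol(X)
  eta1 <- theta[1:k]; phi <- theta[k + 1]
  eta2 <- theta[(k + 2):(2 * k + 1)]; psi <- theta[2 * k + 2]
  expo <- function(z) {                         # exponent of Equation 5 (delta_i has cancelled)
    zstar <- y0 * z[1] + sum(z[-1] * z[-TT])    # z_{i*} = y0 z_1 + sum_{t>1} z_{t-1} z_t
    sum(z * (X %*% eta1)) + z[TT] * (phi + sum(X[TT, ] * eta2)) + zstar * psi
  }
  s <- sum(y)
  if (s == 0 || s == TT) return(0)              # I(0 < y_+ < T): no contribution
  Z <- as.matrix(expand.grid(rep(list(0:1), TT)))
  Z <- Z[rowSums(Z) == s, , drop = FALSE]
  expo(y) - log(sum(exp(apply(Z, 1, expo))))
}

# Conditional log-likelihood: sum of the unit contributions
cond_loglik_qe_ext <- function(theta, units) {
  sum(vapply(units, function(u) log_cond_qe_ext(u$y, u$X, u$y0, theta), numeric(1)))
}

set.seed(2)
X <- matrix(rnorm(8), nrow = 4, ncol = 2)
theta <- c(0.5, -0.3, 0.2, 0.1, 0.4, 1.0)       # hypothetical values, k = 2
units <- list(list(y = c(0, 1, 1, 0), X = X, y0 = 1),
              list(y = c(0, 0, 0, 0), X = X, y0 = 0))   # second unit contributes 0
cond_loglik_qe_ext(theta, units)
```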
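A simplified version of the QEext model can be derived by assuming that the regression parameters are the same for all time occasions. The joint probability of the individual outcomes under this model, which we refer to as QEbasic hereafter, is

$$p(\boldsymbol{y}_i \mid \delta_i, \boldsymbol{X}_i, y_{i0}) = \frac{\exp\left(y_{i+}\delta_i + \sum_t y_{it}\boldsymbol{x}_{it}'\boldsymbol{\eta} + y_{i*}\psi\right)}{\sum_{\boldsymbol{z}} \exp\left(z_+\delta_i + \sum_t z_t\boldsymbol{x}_{it}'\boldsymbol{\eta} + z_{i*}\psi\right)}. \tag{6}$$

As for the QEext model, a $\sqrt{n}$-consistent estimator of $\boldsymbol{\theta} = (\boldsymbol{\eta}', \psi)'$ can be obtained by maximizing the conditional log-likelihood based on Equation 6 with a Newton-Raphson algorithm.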
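Finally, Bartolucci, Nigro, and Pigini (2017) introduce a test for state dependence based on a modified version of the QEbasic model, referred to as QEequ hereafter. The joint probability of $\boldsymbol{y}_i$ is defined as

$$p(\boldsymbol{y}_i \mid \delta_i, \boldsymbol{X}_i, y_{i0}) = \frac{\exp\left(y_{i+}\delta_i + \sum_t y_{it}\boldsymbol{x}_{it}'\boldsymbol{\eta} + \tilde{y}_{i*}\psi\right)}{\sum_{\boldsymbol{z}} \exp\left(z_+\delta_i + \sum_t z_t\boldsymbol{x}_{it}'\boldsymbol{\eta} + \tilde{z}_{i*}\psi\right)}. \tag{7}$$

The difference from the QE models described earlier lies in how the association between the response variables is formulated. This modified version is based on the statistic $\tilde{y}_{i*} = \sum_t I(y_{i,t-1} = y_{it})$, with $\tilde{z}_{i*}$ defined analogously. Unlike $y_{i*}$, this statistic equals the number of consecutive pairs of outcomes that are equal to each other, regardless of whether they are 0 or 1. This allows us to use more information than the QEext and QEbasic models when testing for state dependence.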
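The difference between the two association statistics is easiest to see on a concrete sequence. In this base-R sketch, the function names are illustrative. Both statistics are assumed to count pairs starting from $(y_{i0}, y_{i1})$, as for $z_{i*}$ in Equation 4.

```r
# y_{i*}: consecutive pairs (y_{t-1}, y_t) that are both equal to 1 (QEbasic, QEext)
y_star <- function(y, y0) sum(c(y0, head(y, -1)) * y)
# y~_{i*}: consecutive pairs that are equal, whether 0-0 or 1-1 (QEequ)
y_tilde_star <- function(y, y0) sum(c(y0, head(y, -1)) == y)

y_star(c(0, 1, 1, 0), y0 = 0)        # 1: only (y_2, y_3) = (1, 1)
y_tilde_star(c(0, 1, 1, 0), y0 = 0)  # 2: (y_0, y_1) = (0, 0) and (y_2, y_3) = (1, 1)
y_star(c(0, 0, 0, 0), y0 = 0)        # 0
y_tilde_star(c(0, 0, 0, 0), y0 = 0)  # 4: persistence in state 0 also counts
```

The last two lines show the extra information used by QEequ. A unit that stays in state 0 throughout carries no information on $\psi$ through $y_{i*}$, but it does through $\tilde{y}_{i*}$.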
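Conditioning on the total score $y_{i+}$, the joint probability becomes

$$p(\boldsymbol{y}_i \mid \boldsymbol{X}_i, y_{i0}, y_{i+}) = \frac{\exp\left(\sum_t y_{it}\boldsymbol{x}_{it}'\boldsymbol{\eta} + \tilde{y}_{i*}\psi\right)}{\sum_{\boldsymbol{z}:\, z_+ = y_{i+}} \exp\left(\sum_t z_t\boldsymbol{x}_{it}'\boldsymbol{\eta} + \tilde{z}_{i*}\psi\right)}. \tag{8}$$

As for the QEext and QEbasic models, $\boldsymbol{\theta} = (\boldsymbol{\eta}', \psi)'$ can be consistently estimated by CML, that is, by maximizing the conditional log-likelihood based on Equation 8. This gives $\hat{\boldsymbol{\theta}}_e = (\hat{\boldsymbol{\eta}}_e', \hat{\psi}_e)'$.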
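Once the parameters in Equation 7 are estimated, the absence of state dependence can be tested with the t-statistic

$$W = \frac{\hat{\psi}_e}{\mathrm{se}(\hat{\psi}_e)}, \tag{9}$$

where $\mathrm{se}(\cdot)$ is the standard error derived using the sandwich estimator. See Bartolucci, Nigro, and Pigini (2017) for the complete derivation of the score, the information matrix, and the variance-covariance matrix.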
Under the DL model, and provided that the null hypothesis $H_0: \gamma = 0$ holds, the test statistic $W$ has an asymptotic standard normal distribution as $n \to \infty$. If $\gamma \neq 0$, $W$ diverges to $+\infty$ or $-\infty$ according to whether $\gamma$ is positive or negative.
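Given an estimate $\hat{\psi}_e$ and its sandwich standard error, the two-sided test based on Equation 9 reduces to a comparison with the standard normal distribution. The helper state_dependence_test and the input numbers below are hypothetical and serve only to illustrate the computation.

```r
# Two-sided test of H0: gamma = 0 based on W = psi_hat_e / se(psi_hat_e) (Equation 9)
state_dependence_test <- function(psi_hat, se_psi, level = 0.05) {
  W <- psi_hat / se_psi
  p_value <- 2 * pnorm(-abs(W))      # asymptotic N(0, 1) reference distribution under H0
  list(W = W, p_value = p_value, reject = p_value < level)
}

state_dependence_test(psi_hat = 0.45, se_psi = 0.15)  # W = 3, p about 0.0027: reject
state_dependence_test(psi_hat = 0.10, se_psi = 0.20)  # W = 0.5, p about 0.617: do not reject
```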

Pseudo-conditional maximum likelihood estimation
In order to estimate the structural parameters of the DL model, Bartolucci and Nigro (2012) propose a pseudo-CML estimator based on approximating this model by a QE model of the type described in Section 2.3. The approximating model also admits a simple sufficient statistic for each individual intercept, and its parameters have the same interpretation as those of the true DL model.
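The approximating model is derived from a linearization of the log-probability of the DL model defined in Equation 3, that is,

$$\log p(\boldsymbol{y}_i \mid \alpha_i, \boldsymbol{X}_i, y_{i0}) = y_{i+}\alpha_i + \sum_t y_{it}\boldsymbol{x}_{it}'\boldsymbol{\beta} + y_{i*}\gamma - \sum_t \log\left[1 + \exp\left(\alpha_i + \boldsymbol{x}_{it}'\boldsymbol{\beta} + y_{i,t-1}\gamma\right)\right].$$

The non-linear component is approximated by a first-order Taylor series expansion around $\alpha_i = \bar{\alpha}_i$, $\boldsymbol{\beta} = \bar{\boldsymbol{\beta}}$, and $\gamma = 0$:

$$\sum_t \log\left[1 + \exp\left(\alpha_i + \boldsymbol{x}_{it}'\boldsymbol{\beta} + y_{i,t-1}\gamma\right)\right] \approx \sum_t \left\{\log\left[1 + \exp\left(\bar{\alpha}_i + \boldsymbol{x}_{it}'\bar{\boldsymbol{\beta}}\right)\right] + \bar{q}_{it}\left[\alpha_i - \bar{\alpha}_i + \boldsymbol{x}_{it}'(\boldsymbol{\beta} - \bar{\boldsymbol{\beta}}) + y_{i,t-1}\gamma\right]\right\},$$

where $\bar{q}_{it} = \exp(\bar{\alpha}_i + \boldsymbol{x}_{it}'\bar{\boldsymbol{\beta}}) / [1 + \exp(\bar{\alpha}_i + \boldsymbol{x}_{it}'\bar{\boldsymbol{\beta}})]$. Among the linearized terms, only $\bar{q}_{it}y_{i,t-1}\gamma$ with $t > 1$ depends on the response configuration. The remaining terms involve only the covariates and the given $y_{i0}$, so they cancel in the normalized probability. Under this approximating model, referred to as QEpseudo hereafter, the joint probability of $\boldsymbol{y}_i$ is

$$p^*(\boldsymbol{y}_i \mid \alpha_i, \boldsymbol{X}_i, y_{i0}) = \frac{\exp\left(y_{i+}\alpha_i + \sum_t y_{it}\boldsymbol{x}_{it}'\boldsymbol{\beta} - \gamma\sum_{t>1}\bar{q}_{it}y_{i,t-1} + y_{i*}\gamma\right)}{\sum_{\boldsymbol{z}}\exp\left(z_+\alpha_i + \sum_t z_t\boldsymbol{x}_{it}'\boldsymbol{\beta} - \gamma\sum_{t>1}\bar{q}_{it}z_{t-1} + z_{i*}\gamma\right)}. \tag{10}$$

Given $\alpha_i$ and $\boldsymbol{X}_i$, the above model corresponds to a quadratic exponential model (Cox 1972). Its second-order interactions are equal to $\gamma$ for consecutive response variables and to 0 otherwise. Under the approximating model, each $y_{i+}$ is a sufficient statistic for the incidental parameter $\alpha_i$. By conditioning on the total scores, the joint probability of $\boldsymbol{y}_i$ becomes

$$\tilde{p}(\boldsymbol{y}_i \mid \boldsymbol{X}_i, y_{i0}, y_{i+}) = \frac{\exp\left(\sum_t y_{it}\boldsymbol{x}_{it}'\boldsymbol{\beta} - \gamma\sum_{t>1}\bar{q}_{it}y_{i,t-1} + y_{i*}\gamma\right)}{\sum_{\boldsymbol{z}:\, z_+ = y_{i+}}\exp\left(\sum_t z_t\boldsymbol{x}_{it}'\boldsymbol{\beta} - \gamma\sum_{t>1}\bar{q}_{it}z_{t-1} + z_{i*}\gamma\right)}, \tag{11}$$

where the individual intercepts $\alpha_i$ cancel out.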
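The conditional probability in Equation 11 can be evaluated for one unit by enumeration, as in the sketch below. It uses a scalar covariate and takes the $\bar{q}_{it}$, $t = 2, \ldots, T$, as given. The values in qbar are hypothetical, and pseudo_expo and cond_pseudo are illustrative names. Two sanity checks follow: the probabilities of all configurations with the same total score sum to 1, and with $\gamma = 0$ the expression reduces to the static logit conditional probability.

```r
# Exponent of Equation 11 for a configuration z (scalar covariate xv)
pseudo_expo <- function(z, xv, y0, beta, gamma, qbar) {
  zlag <- c(y0, head(z, -1))                 # z_{t-1}, with z_0 = y_i0
  sum(z * xv) * beta -
    gamma * sum(qbar * zlag[-1]) +           # - gamma * sum_{t>1} qbar_t z_{t-1}
    gamma * sum(zlag * z)                    # + gamma * z_{i*}
}

# Conditional probability of y given its total score (Equation 11)
cond_pseudo <- function(y, xv, y0, beta, gamma, qbar) {
  Z <- as.matrix(expand.grid(rep(list(0:1), length(y))))
  Z <- Z[rowSums(Z) == sum(y), , drop = FALSE]
  e <- apply(Z, 1, pseudo_expo, xv = xv, y0 = y0, beta = beta, gamma = gamma, qbar = qbar)
  exp(pseudo_expo(y, xv, y0, beta, gamma, qbar)) / sum(exp(e))
}

y <- c(0, 1, 1, 0); xv <- c(-0.5, 0.2, 1.0, 0.3)
qbar <- c(0.35, 0.62, 0.44)                  # hypothetical qbar_it for t = 2, 3, 4
cond_pseudo(y, xv, y0 = 1, beta = 0.8, gamma = 1, qbar = qbar)
```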
A pseudo-CML estimator based on the approximating model in Equation 11 is introduced by Bartolucci and Nigro (2012). The estimator is based on the following two-step procedure:

1. A preliminary estimate $\bar{\boldsymbol{\beta}}$ of the regression parameter $\boldsymbol{\beta}$ is computed by maximizing the conditional log-likelihood of the static logit model described in Section 2.1. In addition, the probabilities $\bar{q}_{it}$, for $i = 1, \ldots, n$ and $t = 2, \ldots, T$, are computed with $\boldsymbol{\beta}$ equal to this preliminary estimate and $\bar{\alpha}_i$ equal to its maximum likelihood estimate under the static logit model.
2. The parameter vector $\boldsymbol{\theta} = (\boldsymbol{\beta}', \gamma)'$ is estimated by maximizing the conditional log-likelihood

$$\ell_p(\boldsymbol{\theta} \mid \bar{\boldsymbol{\beta}}) = \sum_{i=1}^{n} I(0 < y_{i+} < T)\,\log \tilde{p}(\boldsymbol{y}_i \mid \boldsymbol{X}_i, y_{i0}, y_{i+}).$$

The maximization of $\ell_p(\boldsymbol{\theta} \mid \bar{\boldsymbol{\beta}})$ can be performed by a simple Newton-Raphson algorithm. This yields the pseudo-CML estimator $\hat{\boldsymbol{\theta}}_p = (\hat{\boldsymbol{\beta}}_p', \hat{\gamma}_p)'$ of the structural parameters of the DL model. For asymptotic results and the computation of standard errors, we refer the reader to Bartolucci and Nigro (2012).
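The second half of step 1 can be sketched in base R. Given $\bar{\boldsymbol{\beta}}$, the maximum likelihood estimate of $\alpha_i$ under the static logit model solves the score equation $\sum_t \Pr(y_{it} = 1) = y_{i+}$. The sketch below solves it with uniroot and then computes $\bar{q}_{it}$ for $t = 2, \ldots, T$. The helper alpha_ml, the search interval, and the input values are assumptions for the illustration, not the cquad code.

```r
# ML estimate of alpha_i given beta_bar under the static logit model:
# solves sum_t plogis(alpha + x_t' beta_bar) = y_+
alpha_ml <- function(y, X, beta_bar) {
  s <- sum(y)
  if (s == 0 || s == length(y)) return(NA_real_)   # estimate would be -Inf or +Inf; unit drops out
  f <- function(a) sum(plogis(a + X %*% beta_bar)) - s
  uniroot(f, lower = -30, upper = 30, tol = 1e-12)$root
}

y <- c(0, 1, 1, 0)
X <- matrix(c(-0.5, 0.2, 1.0, 0.3), ncol = 1)      # T x k covariate matrix, k = 1
beta_bar <- 0.8                                    # hypothetical preliminary CML estimate
a_bar <- alpha_ml(y, X, beta_bar)
q_bar <- plogis(a_bar + X %*% beta_bar)[-1]        # qbar_it for t = 2, ..., T
q_bar
```

Units with $y_{i+} = 0$ or $y_{i+} = T$ have no finite estimate of $\alpha_i$. This is harmless because they do not contribute to $\ell_p(\boldsymbol{\theta} \mid \bar{\boldsymbol{\beta}})$.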

Package description
Here we describe the main functionalities of the R package cquad and then the corresponding commands of the cquad module implemented in Stata.

The R package
The cquad interface

Package cquad includes several functions, most of which are called by the main interface cquad. The first argument of the cquad function is a formula with the same syntax as in the plm package. For instance, using the sample data on unionized workers, Union.RData, a simple function call is

R> cquad(union~married, Union)
where the dependent variable must be a numeric binary vector. In general, as in plm and unlike lm, the formula also recognizes the operators lag, log, and diff. These can be supplied directly without additional transformations of the covariates.
The second argument supplied to cquad is the data frame. As in plm, the data must have a panel structure, that is, the data frame has to contain an individual identifier and a time variable as its first two columns. For instance, the data frame Union has the following structure:

R> head(Union[c(1, 2)])
  nr year
1 13 1980
2 13 1981
3 13 1982
4 13 1983
5 13 1984
6 13 1985

where nr is the individual identifier and year is the time variable. As Union already has a panel structure, cquad can be called directly. If, instead, the dataset does not contain the individual and time indicators, cquad sets the panel structure and automatically creates the first two variables. This requires supplying index, namely the number of cross-sectional units in the data. As an example, the dataset Wages, supplied by plm and containing 595 individuals observed over 7 periods, does not have a panel structure. This structure is, however, created by cquad as follows:

R> cquad(union2~married, Wages, index = 595)

Package cquad uses the same function as plm, called plm.data, to impose the panel structure on a data frame. Indeed, this function can also be used to set the panel structure of the data frame beforehand. The result can then be supplied to cquad without the index argument. When plm.data is used in this way, the factors id and time are created and added to the data frame.

In the examples above, both data frames refer to balanced panels. Nevertheless, cquad also handles unbalanced panels.
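The effect of supplying index for a balanced panel can be mimicked in base R. The sketch below assumes the rows are sorted by unit and then by time, with the same number of occasions per unit. It prepends the two factor columns described above. The helper add_panel_index is hypothetical: it illustrates the resulting structure and is not the code used by cquad or plm.data.

```r
# Hypothetical helper: prepend unit and time factors to a balanced panel whose rows
# are sorted by unit and then by time, given the number of units n
add_panel_index <- function(data, n) {
  TT <- nrow(data) / n
  stopifnot(TT == round(TT))                  # balanced panel required
  cbind(id   = factor(rep(seq_len(n), each = TT)),
        time = factor(rep(seq_len(TT), times = n)),
        data)
}

df <- data.frame(y = c(0, 1, 1, 0, 0, 1), x = c(0.1, 0.4, -0.2, 0.3, 0.5, 0.0))
panel <- add_panel_index(df, n = 2)           # 2 units observed at 3 occasions
panel
```

For Wages, the analogous call would use n = 595 on its 595 x 7 = 4165 rows. Unbalanced panels need explicit identifiers instead.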
Each of the models described in Section 2 is estimated by cquad by supplying a dedicated string to the function argument model. In particular, we can estimate:
• the fixed-effects static logit model by Chamberlain (1980) (model = "basic", the default);
• the simplified QE model, QEbasic (model = "basic", dyn = TRUE);
• the QEext model proposed by Bartolucci and Nigro (2010) (model = "extended");
• the modified version of the QE model, QEequ, proposed in Bartolucci, Nigro, and Pigini (2017) (model = "equal");
• the DL model by pseudo-CML, following the approach of Bartolucci and Nigro (2012) (model = "pseudo").
As an optional argument, the cquad function can also be supplied with an n-dimensional vector of individual weights, w; the default value is rep(1, n).
The results of the calls to cquad are stored in an object of class panelmodel. The returned object shares only some elements with a panelmodel object and contains additional ones due to the peculiarities of CML inference.
The elements in common with the panelmodel object, as described in plm, are coefficients, vcov, and call. The vector coefficients contains the estimates of:
• the $k$-dimensional vector $\boldsymbol{\beta}$ for the static logit model;
• the $(k+1)$-dimensional vector $\boldsymbol{\theta} = (\boldsymbol{\eta}', \psi)'$ for the dynamic models QEbasic and QEequ, whose probabilities are defined in Equations 6 and 7, respectively;
• the $(2k+2)$-dimensional vector $\boldsymbol{\theta} = (\boldsymbol{\eta}_1', \phi, \boldsymbol{\eta}_2', \psi)'$ for the QEext model in Equation 4;
• the $(k+1)$-dimensional vector $\boldsymbol{\theta} = (\boldsymbol{\beta}', \gamma)'$ in Equation 10 for the pseudo-CML estimator of the DL model.

The matrix vcov contains the corresponding asymptotic variance-covariance matrix of the parameter estimates. Finally, call contains the call to the subroutine used to fit each model, namely cquad_basic, cquad_ext, cquad_equ, or cquad_pseudo. The output of cquad provides neither fitted values nor residuals. As discussed in Section 2, the CML approach eliminates the individual intercepts, which prevents the computation of predicted probabilities. Similarly, residuals are not a viable tool for standard inference. Instead, the returned object contains estimated quantities useful for inference and diagnostics within the CML approach.
The asymptotic standard errors associated with the estimated coefficients are collected in the vector se and the robust standard errors (White 1980) in vector ser. For the pseudo-CML estimator, the standard errors contained in the vector ser are corrected for the presence of estimated regressors (see Bartolucci and Nigro 2012, for the detailed derivation of the two-step variance-covariance matrix). The function output also provides the matrix scv containing the individual scores and the matrix J containing the Hessian of the log-likelihood function. In addition, cquad returns the conditional log-likelihood at convergence (lk) for each of the fitted models. Finally, it contains the n-dimensional vector Tv of the number of observations for each unit.
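The robust standard errors follow the usual sandwich construction from individual scores and the Hessian. The base-R sketch below computes them from a matrix of individual scores (one row per unit) and a Hessian, next to the model-based standard errors. The toy matrices are hypothetical. The model-based formula assumes a negative definite Hessian. For objects returned by cquad, the vectors se and ser already contain these quantities, so this is only a sketch of the formula.

```r
# Sandwich (White) standard errors: J^{-1} (sum_i s_i s_i') J^{-1}; the sign of J cancels
sandwich_se <- function(scv, J) {
  Jinv <- solve(J)
  sqrt(diag(Jinv %*% crossprod(scv) %*% Jinv))
}
# Model-based standard errors from the inverse observed information (assumes J negative definite)
model_se <- function(J) sqrt(diag(solve(-J)))

scv <- matrix(c(1, 1, 0, 2), nrow = 2)   # hypothetical scores: 2 units (rows), 2 parameters
J   <- diag(c(-3, -4))                   # hypothetical Hessian
sandwich_se(scv, J)   # sqrt(2/9) and 0.5
model_se(J)           # sqrt(1/3) and 0.5
```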

Simulate data from the DL model
Package cquad also contains the function sim_panel_logit, which allows the user to generate a binary response vector from a DL data-generating process. This function requires as input the unit identifiers, collected in vector id, whose length equals the overall number of observations $r = n \times T$. The function also requires two further inputs. The first is the n-dimensional vector al of individual-specific intercepts, which must be generated beforehand, for instance by drawing them from a standard normal distribution. The second is the matrix of covariates X (if any), which has dimension $r \times k$, where $k$ is the number of covariates. Each row of this matrix contains a covariate vector $\boldsymbol{x}_{it}$, arranged according to vector id. Finally, the function requires the vector of structural parameters, eta, that is, $\boldsymbol{\beta}$ for the static logit model and $(\boldsymbol{\beta}', \gamma)'$ for the DL model. The model of interest is specified by the optional argument dyn.
As output, function sim_panel_logit returns a list containing two vectors, pv and yv. The first contains the success probability computed under the DL model for each row of matrix X, accounting for the corresponding individual intercept in al. Vector yv contains the binary responses randomly drawn from these probabilities.
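To make the data-generating process explicit, here is a hypothetical base-R re-implementation with the same inputs and outputs as described above. It is not the package code. In particular, it assumes that the lagged response is set to 0 at each unit's first occasion, which may differ from what sim_panel_logit does.

```r
# Hypothetical sketch of a DL data-generating process (not the cquad source)
sim_dl <- function(id, al, X, eta, dyn = FALSE) {
  k <- ncol(X)
  beta <- eta[seq_len(k)]
  gamma <- if (dyn) eta[k + 1] else 0
  ids <- unique(id)
  pv <- yv <- numeric(length(id))
  for (j in seq_along(ids)) {
    ylag <- 0                                      # assumption: no lagged response at the first occasion
    for (r in which(id == ids[j])) {
      pv[r] <- plogis(al[j] + sum(X[r, ] * beta) + gamma * ylag)   # Equation 2
      yv[r] <- rbinom(1, 1, pv[r])
      ylag <- yv[r]
    }
  }
  list(pv = pv, yv = yv)
}

set.seed(123)
n <- 100; TT <- 6
id <- rep(seq_len(n), each = TT)                   # length r = n * T
al <- rnorm(n)                                     # individual intercepts
X  <- matrix(rnorm(n * TT), ncol = 1)              # r x k covariate matrix, k = 1
data <- sim_dl(id, al, X, eta = c(1, 1), dyn = TRUE)
yv <- data$yv
```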

The Stata module
The cquad module in Stata consists of four Mata routines for the CML estimation of the QE models described in Section 2.3. It contains four commands with the syntax

cquadcmd depvar id [indepvars]

where cmd has to be substituted with the string corresponding to the type of model to be estimated. In particular:
• cquadext fits the QEext model of Bartolucci and Nigro (2010) defined in Equation 4;
• cquadbasic estimates the parameters of the simplified QE model, QEbasic, the probability of which is defined in Equation 6. Unlike the R package, cquadbasic fits only the dynamic QE model, as the static logit model can be estimated by xtlogit;
• cquadequ fits the modified QE model defined in Equation 7, proposed by Bartolucci, Nigro, and Pigini (2017);
• cquadpseudo computes the pseudo-CML estimator proposed by Bartolucci and Nigro (2012) for the parameters in Equation 10.
Here, depvar is the variable containing the binary dependent variable, and id is the variable that uniquely identifies the individuals in the panel dataset. Optionally, a list of covariates [indepvars] can be supplied.
The four commands return an eclass object with the estimation results. Scalar e(lk) contains the final conditional log-likelihood and macro e(cmd) holds the function call. Moreover, matrix e(be) contains the estimated coefficients and it is of dimension (2k + 2) × 1 for cquadext, or of dimension (k + 1) × 1 for cquadbasic, cquadequ, and cquadpseudo. Matrices e(se) and e(ser) contain the corresponding estimated asymptotic and robust standard errors, respectively. Finally, matrices e(tstat) and e(pv) collect the t test statistics and the corresponding p values.

Examples
In the following, we illustrate package cquad by means of three applications. In particular, we show how to compute the CML estimators for the QE models and the pseudo-CML estimator in R and Stata. We use longitudinal data on unionized workers extracted from the U.S. National Longitudinal Survey of Youth, which have been employed in several applied works to illustrate dynamic binary panel data models (Wooldridge 2005; Stewart 2006; Lucchetti and Pigini 2015). Moreover, we propose a simulation example using the function sim_panel_logit provided in the R package.

Use of the Union dataset in R
To illustrate the R package, we use the dataset employed in Wooldridge (2005), which is available in the Journal of Applied Econometrics data archive. The dataset refers to 545 male workers interviewed for eight years, from 1980 to 1987. As in the empirical application of Wooldridge (2005), our example uses two binary variables. The first is equal to 1 if the worker's wage is set by a union and is used as the dependent variable. The second describes his marital status and is used as the covariate. The original dataset also contains information on race and years of schooling, which, however, cannot be employed in our example since they are time-invariant. Notice that the panel structure required by cquad is already imposed.
Then, in order to fit the static logit model to these data by the CML method, we call cquad with the following syntax:

R> out1 <-cquad(union~married + year, Union)
This estimates a logit model with union as the dependent variable and with married and the time dummies as covariates. The output of summary(out1) displays the function call, the value of the log-likelihood at convergence, and the estimated coefficients with the corresponding asymptotic standard errors and t test results. Including variable year among the covariates in the formula leads cquad to include the time dummies automatically, except for year1980 because of collinearity. This happens even though variable year is numeric in the original data frame:

R> str(Union$year)
 int [1:4360] 1980 1981 1982 1983 1984 1985 1986 1987 1980 1981 ...

This happens because cquad recognizes the second variable in the data frame as the time variable. Through the calls to plm.data and model.matrix, this numeric time variable is then transformed into a factor.
To estimate the dynamic specification of the QEbasic model, cquad needs to be called with the dyn = TRUE option. In addition, as we are working with a balanced panel, an additional time dummy must be excluded. The lag of the dependent variable is included in the conditioning set, so the initial time occasion is lost. In this case, we perform this operation outside the cquad interface:

R> year2 <- Union$year
R> year2[year2 == 1980 | year2 == 1981] <- 0
R> year2 <- as.factor(year2)
R> out2 <- cquad(union ~ married + year2, Union, dyn = TRUE)
R> summary(out2)

In the code above, we store the numeric time variable from the original data frame in year2. We then set the variable to 0 for two of its values, because we lose one time occasion due to the dynamic specification and one time effect due to the collinearity of the remaining dummies. In order to estimate the model with time dummies, we need to convert year2 into a factor. cquad will not recognize year2 as the time variable since it is not in the data frame. If, instead, we leave year in the formula, a warning message is given after convergence and the results are obtained using the generalized inverse of the Hessian matrix. To fit the QEext model, we need to further exclude the last time value (i.e., 1987). Since there is an intercept term $\phi$ in Equation 5, the effect associated with the last time dummy is not identified with balanced panels:

R> year3 <- Union$year
R> year3[year3 == 1980 | year3 == 1981 | year3 == 1987] <- 0
R> year3 <- as.factor(year3)
R> out3 <- cquad(union ~ married + year3, Union, model = "extended")

By typing summary(out3) we obtain

Call:
cquad_ext(id = id, yv = yv, X = X, w = w)

Notice that the estimation results are in agreement with those obtained by fitting the QEext or the QEbasic models. However, they exhibit some differences because the pseudo-CML estimator is based on the conditional probability in Equation 11, which contains the parameters of the true DL model.
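The effect of these recodings on the number of time dummies can be checked on a synthetic balanced panel, since the Union data are not reproduced here. The sketch below applies the same recoding as above to the years 1980 to 1987 for three hypothetical units. It then counts the dummy columns that model.matrix creates, with the first factor level as the baseline.

```r
years <- rep(1980:1987, times = 3)        # three hypothetical units observed 1980-1987

year2 <- years                            # QEbasic: drop 1980 and 1981 (lost occasion + baseline)
year2[year2 == 1980 | year2 == 1981] <- 0
year2 <- as.factor(year2)

year3 <- years                            # QEext: additionally drop 1987
year3[year3 == 1980 | year3 == 1981 | year3 == 1987] <- 0
year3 <- as.factor(year3)

levels(year2)                             # "0" is the baseline level
ncol(model.matrix(~ year2)) - 1           # 6 dummies: 1982, ..., 1987
ncol(model.matrix(~ year3)) - 1           # 5 dummies: 1982, ..., 1986
```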
Nevertheless, these results confirm the presence of a high degree of state dependence in union participation.
In the first part of the script, inside the for loops, we generate the identifier id as an n-dimensional vector, the $n \times T$ vector for the single covariate X, and the n-dimensional vector of individual intercepts alpha. The intercepts are computed in a similar manner as in Honoré and Kyriazidou (2000). Lastly, we generate the binary response variable using the function sim_panel_logit described in Section 3.1. As the function returns both the binary variable and the response probabilities, the dependent variable needs to be retrieved by yv <- data$yv.
Once the data have been generated, we estimate the QEequ model using cquad with model = "equal". This fits the modified QE model in Equation 7 by CML, and we store the results for the t test in Equation 9. Finally, we display the average value of the test statistic over the 100 samples and the rejection rate of a two-sided test at the 0.05 significance level. The last part of the script produces the following output: ... where the iteration logs from cquad have been omitted. Under the null hypothesis $\gamma = 0$, the rejection rate is very close to the nominal size of 0.05. Under the alternative hypothesis $\gamma = 1$, the test exhibits good power. These results are close to those found by Bartolucci, Nigro, and Pigini (2017) in their simulation study, to which we refer the reader for an extension of this simple design to several other scenarios.

Analysis of union data in Stata
In the following, we illustrate the Stata module cquad that contains the four commands to fit the QE models described in Section 2.3 by an example based again on data about unionized workers. The dataset to replicate this example is already available in the Stata online data repository and is contained in file union.dta.
The three commands reported below load the dataset, describe the panel structure, which is already in place, and list the variables in the dataset:

webuse union
xtdes
descr

The output generated by these command lines is:

. webuse union
(NLS Women 14-24 in 1968)

Bartolucci & Nigro (2010), Econometrica

First the iteration logs are reported. Then the estimation output is displayed in the standard fashion: the estimated coefficients for the QEbasic model, along with the asymptotic standard errors, the related t test statistics, and the p values. Notice that the estimate associated with $\psi$ in Equation 6 reflects a high degree of positive state dependence, in line with well-known results in other applied works.
The extended version of the QE model, QEext, can be fitted in a similar manner by using the command cquadext:

cquadext union idcode age grade south not_smsa _Iyear_72 _Iyear_73 _Iyear_77 _Iyear_78 _Iyear_80 _Iyear_82 _Iyear_83 _Iyear_85 _Iyear_87

Notice that here we use neither the xi: prefix nor the factor i.year as an explanatory variable. Instead, we list the time dummies separately in order to exclude the dummy for 1988. In the QEext model, not all the effects associated with the time dummies can be identified. This is due to the intercept term, $\phi$, among the regressors referred to the observation at time $T$ (see Equation 4).

(output omitted)
where the iteration logs have been omitted for brevity. If the time dummy associated with the last observation is not dropped beforehand, a warning message is printed, and the results are obtained using the generalized inverse of the Hessian.

The estimation results differ from those obtained by cquadbasic because Equation 7 specifies the association between $y_{it}$ and $y_{i,t-1}$ in a different way. The test for absence of state dependence is the t test associated with the lagged dependent variable reported in the output above.
Finally, command cquadpseudo computes the pseudo-CML estimator of the parameters of the DL model described in Section 2.4. The input line is

xi: cquadpseudo union idcode age grade south not_smsa i.year

and produces the following output:

. xi: cquadpseudo union idcode age grade south not_smsa i.year
i.year _Iyear_70-88 (naturally coded; _Iyear_70 omitted)