DPB: Dynamic Panel Binary Data Models in gretl

This paper presents the gretl function package DPB for estimating dynamic binary models with panel data. The package contains routines for the estimation of the random-effects dynamic probit model proposed by Heckman (1981b) and its generalisation by Hyslop (1999) and Keane and Sauer (2009) to accommodate AR(1) disturbances. The fixed-effects estimator by Bartolucci and Nigro (2010) is also implemented. DPB is available on the gretl function packages archive.


Introduction
Non-linear dynamic models for binary dependent variables with longitudinal data are nowadays quite common in microeconometric applications, especially given the increasing availability of panel datasets. One of the most attractive features of models of the type
$$y^*_{it} = \gamma y_{it-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it}, \qquad y_{it} = 1\{y^*_{it} \geq 0\}, \qquad i = 1, \dots, n, \quad t = 2, \dots, T,$$
where $1\{\cdot\}$ is an indicator function, is that they lend themselves very naturally to an interpretation in terms of true state dependence, that is, a situation in which the realization of an event affects the probability of the same event occurring in the future, as opposed to simple time persistence in $y^*_{it}$, which may be due to the covariates $x_{it}$ and/or to the time-invariant unobserved heterogeneity (Heckman 1981a).
This feature is particularly attractive when the problem at hand displays some form of path dependence. Therefore, these models have been found to be extremely useful in the analysis of several microeconomic topics, such as labor force participation and, more specifically, female labor supply.

The initial conditions problem can also be circumvented via a fixed-effects (FE) approach, which makes it possible to estimate the regression parameters consistently without having to make distributional assumptions on the unobserved heterogeneity. The key idea is to condition the joint distribution of $y_i$ on a suitably defined sufficient statistic for $\alpha_i$. For static logit models, it is possible to define a conditional maximum likelihood (CML) estimator. Its dynamic extension, however, has not gained widespread adoption in empirical work, since it cannot be easily generalized to every time-configuration of the panel and requires strong restrictions on the model specification. Moreover, these models generally require that at least one transition between the states 0 and 1 is observed for an individual to contribute to the likelihood. As a result, the number of usable observations often shrinks drastically compared to the sample size, especially when strong persistence in the dependent variable is exactly the reason why a dynamic model is needed. Nevertheless, estimators based on a FE approach always represent a useful complement to the practitioner's toolkit, as they provide reliable inference in cases where the unobserved heterogeneity may be correlated with the individual covariates.
The first proposal of a FE logit model can be found in Chamberlain (1985): Estimation relies on conditional inference and, therefore, is rather simple to perform. Exogenous covariates, however, cannot be included, and the proposed sufficient statistic for the incidental parameters needs to be determined on a case-wise basis according to the time-series length. Honoré and Kyriazidou (2000) extended Chamberlain's formulation in order to include explanatory variables; this approach, however, implies the use of semi-parametric techniques that require a substantial computational effort. Moreover, time dummies (a customary addition to practically all empirical models) cannot be handled either. Recently, Bartolucci and Nigro (2010) defined a dynamic model belonging to the quadratic exponential (QE) family. Their proposed model closely resembles a dynamic logit model, overcomes most of the difficulties encountered by Honoré and Kyriazidou's estimator, and can be implemented by suitably adapting ordinary static FE logit software.

In this work, we present the gretl (gretl Team 2016) implementation of the available set of tools to estimate dynamic models for binary dependent variables in panel datasets by both FE and RE approaches, collected in the DPB (dynamic panel binary) function package. The RE models contained in DPB are the dynamic probit with linearized-index initial condition proposed in Heckman (1981b) and the generalizations by Hyslop (1999) and Keane and Sauer (2009). Compared to the other available estimators based on a RE approach, Heckman's estimator has been shown to suffer from remarkably little small-sample bias (Miranda 2007; Akay 2012) and is widely used in microeconomic applications. On the contrary, the estimator proposed by Wooldridge (2005) is not included in the package per se, despite its common use by practitioners, since gretl already provides suitable functions natively: Therefore, we just provide an example showing how to implement it via a simple script. Finally, DPB also contains the software for estimating the QE model of Bartolucci and Nigro (2010).
The DPB function package represents a more comprehensive toolbox for estimating dynamic binary RE models than other available software. Stewart (2006) implemented the Stata (StataCorp. 2015) commands redprob and redpace for the dynamic probit models proposed by Heckman (1981b) and Hyslop (1999), respectively. Compared with Stewart's module, apart from including the latest proposal by Keane and Sauer (2009), DPB handles unbalanced panel datasets, which is a fundamental feature for practitioners working with real-life datasets. As for the available software to estimate QE models, routines are available in the R (R Core Team 2017) package and Stata module cquad. These packages include the QE model of Bartolucci and Nigro (2010), as well as some extensions that will be included in future versions of DPB.
The rest of the paper is organized as follows: Section 2 lays out the methodological background for the estimators provided in the package, while in Section 3 we discuss the main computational issues of their implementation; Section 4 describes in detail the features of DPB; Section 5 provides an empirical application based on a dataset of unionized workers extracted from the US National Longitudinal Survey of Youth; Section 6 concludes.

Methodological background
A general model for the conditional probability of a binary response variable $y_{it}$ can be written as
$$p\left(y_{it} \mid \mathcal{F}_t, \alpha_i; \psi_0\right),$$
where $\mathcal{F}_t$ is the information set at time $t$ available for individual $i$, which, in general, may consist of a set of individual covariates $X_i = [x_{i1}, \dots, x_{iT}]$ as well as the lags of the response variable $y_i^{t-1} = [y_{i1}, \dots, y_{it-1}]$; $\alpha_i$ is the individual time-invariant unobserved effect, which is assumed to be a continuous random variable; $\psi_0$ is the vector of model parameters. In the simple case where the relevant conditioning information set is $X_i$, i.e., $y_{it}$ does not depend on $y_i^{t-1}$, the joint probability of $y_i = [y_{i1}, \dots, y_{iT}]$, conditional on $X_i$ and $\alpha_i$, factorizes as
$$p\left(y_i \mid X_i, \alpha_i; \psi_0\right) = \prod_{t=1}^{T} p\left(y_{it} \mid X_i, \alpha_i; \psi_0\right).$$

In a dynamic model, $y_{it}$ is allowed to depend on its past history. When unobserved effects are present, the process for the response variable needs to be initialized in order to account for how $y_i$ relates to the process before the observations started being available. The dependence of $y_{it}$ on the past is usually likely to be limited to its first lag, so that the relevant subset of $\mathcal{F}_t$ is $[y_{it-1}, X_i]$. Therefore, the joint probability of $y_i$, conditional on $X_i$, becomes
$$p\left(y_i \mid X_i, \alpha_i; \psi_0\right) = p\left(y_{i1} \mid X_i, \alpha_i; \psi_0\right) \prod_{t=2}^{T} p\left(y_{it} \mid y_{it-1}, X_i, \alpha_i; \psi_0\right).$$

The rest of the section describes the RE approach proposed by Heckman (1981b) and its generalizations, and the FE approach put forward by Bartolucci and Nigro (2010). In the first case, a reduced-form equation for $p(y_{i1} \mid X_i, \alpha_i; \psi_0)$ is specified. In the second case, the unobserved effect $\alpha_i$ is eliminated by conditioning $p(y_i \mid X_i, \alpha_i; \psi_0)$ on a suitable sufficient statistic and, as a result, the initial condition does not need to be dealt with, so that the joint probability can be written as $p(y_i \mid X_i, y_{i1}, \alpha_i; \psi_0)$.

Random-effects approach
The estimator proposed by Heckman (1981b) is based on a standard formulation of a dynamic RE binary choice model with an additional equation for the initial observation $y_{i1}$:
$$y_{it} = 1\{\gamma y_{it-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it} \geq 0\}, \quad t = 2, \dots, T, \qquad (1)$$
$$y_{i1} = 1\{z_{i1}'\pi + \theta\alpha_i + \varepsilon_{i1} \geq 0\}, \qquad (2)$$
where $y_{it}$ is the binary response variable, $1\{\cdot\}$ is an indicator function, $x_{it}$ is a vector of individual covariates and $z_{i1}$ contains the values of $X_i$ in the first period plus pre-sample information that can be used as exclusion restrictions. The assumptions on $\alpha_i$ and $\varepsilon_i$ are
$$\alpha_i \sim N(0, \sigma^2_\alpha), \qquad \varepsilon_{it} \sim N(0, 1),$$
where $\sigma^2_\alpha = V(\alpha_i)$. In this paper, this model will be referred to as the "DP" model (dynamic probit). Note that the distributional assumption implies independence between $\alpha_i$ and $\varepsilon_i$ and absence of autocorrelation in $\varepsilon_i$, i.e., $E[\varepsilon_{it}\varepsilon_{is}] = 0$ for $t, s = 1, \dots, T$ and $t \neq s$. Hence, collecting the compound disturbances as $v_{i1} = \theta\alpha_i + \varepsilon_{i1}$ and $v_{it} = \alpha_i + \varepsilon_{it}$ for $t = 2, \dots, T$, their covariance matrix $\Sigma$ has elements
$$\Sigma_{ts} = \begin{cases} \theta^2\sigma^2_\alpha + 1 & t = s = 1 \\ \sigma^2_\alpha + 1 & t = s > 1 \\ \theta\sigma^2_\alpha & t = 1 < s \\ \sigma^2_\alpha & 1 < t \neq s \end{cases} \qquad (3)$$
Under these assumptions, the parameter vector $\psi = (\beta', \gamma, \pi', \theta, \sigma_\alpha)'$ can be estimated by ML, and the $i$th contribution to the likelihood is given by the following expression:
$$\ell_i = \int_{-\infty}^{+\infty} \Phi\left[q_{i1}\left(z_{i1}'\pi + \theta\alpha\right)\right] \prod_{t=2}^{T} \Phi\left[q_{it}\left(\gamma y_{it-1} + x_{it}'\beta + \alpha\right)\right] \frac{1}{\sigma_\alpha}\,\phi\!\left(\frac{\alpha}{\sigma_\alpha}\right) d\alpha, \qquad q_{it} = 2y_{it} - 1,$$
where $\Phi(\cdot)$ is the standard normal c.d.f. and $\phi(\cdot)$ its density. Since the unobserved effects are normally distributed, the integral over $\alpha_i$ can be evaluated by means of Gauss-Hermite quadrature (Butler and Moffitt 1982).

Hyslop (1999) considered an interesting generalization of Heckman's approach by allowing for autocorrelation in $\varepsilon_i$. In terms of interpretation, this setting makes it possible to further disentangle two different sources of time persistence: In Heckman (1981b), the true state dependence captured by $\gamma$ in Equation 1 is isolated from the persistence induced by the time-invariant unobserved effects $\alpha_i$; with autocorrelated errors, the persistence in the time-varying unobserved effects is also parametrized. The error terms $\varepsilon_{it}$ follow the AR(1) process
$$\varepsilon_{it} = \rho\,\varepsilon_{it-1} + \eta_{it},$$
where $|\rho| \leq 1$ and $\eta_{it} \sim N(0, 1 - \rho^2)$. Therefore, the variance-covariance matrix of the error components needs to be modified as follows:
$$\Sigma_{ts} = \begin{cases} \theta^2\sigma^2_\alpha + 1 & t = s = 1 \\ \sigma^2_\alpha + 1 & t = s > 1 \\ \theta\sigma^2_\alpha + \rho^{s-1} & t = 1 < s \\ \sigma^2_\alpha + \rho^{|t-s|} & 1 < t \neq s \end{cases} \qquad (4)$$
Note that Equation 4 reduces to Equation 3 for $\rho = 0$; hence, we will call this the "ADP" model (AR(1) dynamic probit).
Recently, Keane and Sauer (2009) introduced a more general version of $\Sigma$, in which an additional parameter is defined. The starting point is to modify Equation 2 as
$$y_{i1} = 1\{z_{i1}'\pi + \theta\alpha_i + u_i \geq 0\},$$
with $E(u_i\,\varepsilon_{i2}) = \tau$. Note that, since both $u_i$ and $\varepsilon_{it}$ need to be normalized for identification, $\tau$ is effectively a correlation coefficient, so $|\tau| \leq 1$ holds by the Cauchy-Schwarz inequality. Therefore, Equation 4 becomes
$$\Sigma_{1s} = \theta\sigma^2_\alpha + \tau\rho^{s-1}, \quad s = 2, \dots, T, \qquad (5)$$
with all the other elements as in Equation 4. Notice that Equation 5 is equal to Equation 4 for $\tau = 1$ and equal to Equation 3 for $\rho = 0$. The acronym we will use henceforth for this model is "GADP" (generalized ADP).
In theory, its greater generality makes model (5) more appealing for applied work than its restricted counterpart Equation 4 and, a fortiori, Equation 3. However, there are two factors that may make it advisable to opt for the more restrictive versions of the model. The first one is computational complexity: The specification of $\varepsilon_{it}$ as an AR(1) process makes it impossible to evaluate the joint probability $p(\cdot)$ by integrating out the random effect $\alpha_i$. Instead, $T$-variate normal probabilities must be evaluated by simulation via the GHK algorithm (Geweke 1989; Hajivassiliou and McFadden 1998; Keane 1994) in order to compute the likelihood function; the key ingredients are $C$, the lower-triangular Cholesky factor of $\Sigma$ defined in Equation 4 or Equation 5, and $r$, the number of random draws used in the simulation. The second one is that, in finite samples, the ADP and GADP models (especially the latter) may suffer from serious identification problems in certain regions of the parameter space; in some cases, this could cause serious numerical problems for inference, because of insufficient curvature of the log-likelihood function. These issues will be described in greater detail in Section 3.4.
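As a schematic illustration of the simulator (our notation, not the package's internal code), the GHK approach replaces the $T$-variate probability with an average over $r$ draws of products of univariate conditional probabilities:
$$\hat{p}(y_i \mid X_i; \psi) = \frac{1}{r}\sum_{j=1}^{r}\,\prod_{t=1}^{T} \Pr\left(v_{it} \in A_{it} \,\middle|\, v_{i1}^{(j)}, \dots, v_{i,t-1}^{(j)}\right),$$
where $A_{it}$ is the half-line implied by the observed $y_{it}$ (for instance, $A_{it} = [-\gamma y_{it-1} - x_{it}'\beta, +\infty)$ when $y_{it} = 1$), the conditional probabilities are univariate normal thanks to the Cholesky factorization $v_i = C\eta_i$, and the $v_{it}^{(j)}$ are draws from the appropriate truncated distributions.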

Fixed-effects approach
With the exception of linear probability models, FE approaches to the estimation of dynamic binary choice models are usually based on the dynamic logit formulation. In this case, model (1) is subject to strict exogeneity conditional on $\alpha_i$, and it is assumed that the error terms $\varepsilon_{it}$, $t = 1, \dots, T$, are i.i.d. standard logistic random variables. Therefore, the probability of observing $y_{it}$ can be written as
$$p\left(y_{it} \mid X_i, \alpha_i, y_{it-1}\right) = \frac{\exp\left[y_{it}\left(\gamma y_{it-1} + x_{it}'\beta + \alpha_i\right)\right]}{1 + \exp\left(\gamma y_{it-1} + x_{it}'\beta + \alpha_i\right)}.$$
As is well known (Chamberlain 1985), in the static case ($\gamma = 0$) the total score $y_{i+} \equiv \sum_t y_{it}$ is a sufficient statistic for $\alpha_i$, on which the distribution of $y_i$ can be conditioned, so as to remove the unobserved individual effect $\alpha_i$; unfortunately, in the dynamic logit model ($\gamma \neq 0$) equivalent sufficient statistics may exist, but they must be derived on a case-wise basis. Moreover, the inclusion of explanatory variables requires restrictions on their distribution and a non-negligible computational effort (Honoré and Kyriazidou 2000).
The joint probability of observing the response configuration $y_i$, conditional on the initial observation $y_{i1}$, is
$$p\left(y_i \mid X_i, \alpha_i, y_{i1}\right) = \frac{\exp\left[\sum_t y_{it}\left(x_{it}'\beta + \alpha_i\right) + \gamma\sum_t y_{it-1}y_{it}\right]}{\prod_t\left[1 + \exp\left(\gamma y_{it-1} + x_{it}'\beta + \alpha_i\right)\right]}, \qquad (6)$$
where sums and products go from $t = 2$ to $T$.
The QE model proposed by Bartolucci and Nigro (2010) directly defines the joint probability of $y_i$ as
$$p\left(y_i \mid X_i, \alpha_i, y_{i1}\right) = \frac{\exp\left[\sum_t y_{it}\left(x_{it}'\beta_1 + \alpha_i\right) + \gamma\,y_{i*} + y_{iT}\left(\mu + x_{iT}'\beta_2\right)\right]}{\sum_{b \in B} \exp\left[\sum_t b_t\left(x_{it}'\beta_1 + \alpha_i\right) + \gamma\,b_* + b_T\left(\mu + x_{iT}'\beta_2\right)\right]}, \qquad (7)$$
where $B$ is the set of all the binary configurations $b = (b_2, \dots, b_T)$, $y_{i*} = \sum_t y_{it-1}y_{it}$, $b_* = y_{i1}b_2 + \sum_{t=3}^{T} b_{t-1}b_t$ and sums run from $t = 2$ to $T$. This probability closely resembles the one in Equation 6 and, under Equations 6 and 7, Bartolucci and Nigro (2010) show that $\gamma$ has the same interpretation in terms of the log-odds ratio between each pair of consecutive $y_{it}$s. It can be shown that the total score $y_{i+}$ is a sufficient statistic for the unobserved effect $\alpha_i$ in Equation 7. The conditional distribution based on the total score is
$$p\left(y_i \mid X_i, y_{i1}, y_{i+}\right) = \frac{\exp\left[\sum_t y_{it}\,x_{it}'\beta_1 + \gamma\,y_{i*} + y_{iT}\left(\mu + x_{iT}'\beta_2\right)\right]}{\sum_{b:\, b_+ = y_{i+}} \exp\left[\sum_t b_t\,x_{it}'\beta_1 + \gamma\,b_* + b_T\left(\mu + x_{iT}'\beta_2\right)\right]}, \qquad (8)$$
so that the denominator contains only those vectors $b \in B$ such that $b_+ = y_{i+}$. By using the above conditional probability, the conditional log-likelihood can be written as
$$\ell(\psi) = \sum_i \log p\left(y_i \mid X_i, y_{i1}, y_{i+}\right)$$
and maximized with respect to $\psi = (\gamma, \beta_1', \mu, \beta_2')'$. We refer the reader to Bartolucci and Nigro (2010) for details on the score and Hessian of $\ell(\psi)$.

Treatment of missing values
The DPB package can handle unbalanced panels, as long as there are enough consecutive observations for each longitudinal unit. If we use $T_i$ to indicate the maximum time span of consecutive observations for individual $i$, the requirements for a cross-sectional unit $i$ to be included in the sample are

• $T_i \geq 2$ for the DP, ADP and GADP models;
• $T_i > y_{i+} > 0$ for the QE model.
For each unit, the longest available consecutive set of observations is used. If more than one such sequence is available, we take the most recent one to compute $T_i$. Of course, the choices above rest on the assumption that the observability of $y_{it}$ and/or $x_{it}$ is totally independent of the random variables included in the models. For further details, see Albarrán, Carrasco, and Carro (2015).
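As a rough illustration of the second rule on a balanced panel (our own sketch, which ignores the search for the longest consecutive run), one could flag the units usable by the QE model as follows:

    series Ti = pnobs(y)            # valid observations per unit
    series yplus = pmean(y) * Ti    # total score y_i+ per unit
    series usable = (yplus > 0) && (yplus < Ti)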

Computation of the log-likelihood and its derivatives
The two most important algorithms employed for computing the log-likelihood of the four models handled by the package are Gauss-Hermite quadrature for the DP model and the GHK algorithm for the ADP and GADP models. As for the QE model, the main problem lies in the efficient computation of the denominator in Equation 8, which does not include problematic functions, but is made tricky by the multiplicity of cases to consider. Fortunately, a recursive approach is possible, which we describe in Section 3.3.
For Gauss-Hermite quadrature, the DPB function package uses the gretl native function quadtable, which guarantees good speed and accuracy. The number of quadrature points can be chosen by the user, with a default of 24. The DPB package does not include, at present, the option of using adaptive quadrature methods as recommended, for example, in Rabe-Hesketh, Skrondal, and Pickles (2002).
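In symbols, denoting by $L_i(\alpha)$ the product of probit terms inside the integral of Section 2.1 and by $(x_j, w_j)$, $j = 1, \dots, m$, the Gauss-Hermite nodes and weights, the approximation reads (a standard textbook formula, in our notation):
$$\ell_i = \int_{-\infty}^{+\infty} L_i(\alpha)\,\frac{1}{\sigma_\alpha}\,\phi\!\left(\frac{\alpha}{\sigma_\alpha}\right) d\alpha \;\approx\; \frac{1}{\sqrt{\pi}}\sum_{j=1}^{m} w_j\, L_i\!\left(\sqrt{2}\,\sigma_\alpha\,x_j\right),$$
which follows from the change of variable $\alpha = \sqrt{2}\,\sigma_\alpha x$.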
As for simulation-based methods, DPB relies on the gretl function ghk, which natively provides analytic derivatives. This function does not implement optimization techniques such as the pivoting method by Genz (1992), because Genz's method may introduce discontinuities that would make numerical differentiation problematic. On the other hand, it automatically switches to a parallel implementation on a multi-core machine in a shared-memory environment, which gives a noticeable performance boost on modern CPUs. The default method for feeding the uniform sequence to the GHK algorithm is by using Halton sequences (Halton 1964), but the user can switch to the uniform generator used by default in gretl (the SIMD-oriented implementation of the Mersenne Twister algorithm described in Saito and Matsumoto 2008) if so desired.
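As a self-contained illustration of this building block (not of DPB internals), the fragment below computes a bivariate normal orthant probability via ghk; we assume the signature ghk(C, A, B, U), with C the Cholesky factor of the covariance matrix, A and B the lower and upper integration bounds, and U a matrix of uniform draws:

    matrix Sigma = {1, 0.5; 0.5, 1}
    matrix C = cholesky(Sigma)
    matrix A = {-$huge, -$huge}   # lower bounds
    matrix B = {0, 0}             # upper bounds: P(v1 <= 0, v2 <= 0)
    matrix U = muniform(2, 256)   # 2 dimensions, 256 draws
    eval ghk(C, A, B, U)          # should be close to 1/3 for rho = 0.5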
The method of choice for optimization is BFGS with analytical derivatives for the probit models DP, ADP and GADP; this has generally proven quite effective, and remarkably more robust and efficient than Newton-Raphson. Alternatively, its limited-memory variant described in Byrd, Lu, Nocedal, and Zhu (1995) can also be used. Two of the parameters in the covariance matrix $\Sigma$ are in fact maximized over via an invertible transformation to help numerical stability: $\sigma^2_\alpha$ is expressed through its natural logarithm and $\rho$ through the inverse hyperbolic tangent, so that the parameters that enter the actual maximization routine are unbounded. Initial values for $\beta$, $\gamma$, and $\pi$ are obtained by straightforward linear probability models, while $\sigma_\alpha$ and $\theta$ are both set to 1.
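The idea of the transformation, in a short Hansl sketch (illustrative only; the names psi1 and psi2 are ours):

    scalar psi1 = -0.3            # example values of the unbounded
    scalar psi2 = 0.8             # parameters seen by the optimizer
    scalar sigma2_a = exp(psi1)   # guarantees sigma2_a > 0
    scalar rho = tanh(psi2)       # guarantees |rho| < 1
    print sigma2_a rho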
For the QE model, instead, the method of choice is Newton-Raphson, which takes advantage of the fact that the computation of the Hessian matrix is remarkably inexpensive once the analytical score has been obtained. Initial values for $\gamma$, $\beta_1$, $\mu$ and $\beta_2$ are simply 0.

Computation of the denominator term in the QE model
The peculiar nature of the QE model makes it impossible to compute its denominator in a manner akin to the algorithm used in several packages for the ordinary conditional logit model, which is in turn a variation of the recursive algorithm by Krailo and Pike (1984).
The algorithm we implement has the virtue of relegating the recursive computation of the relevant combinations (b vectors) to the initialization stage; by memorizing the relevant information into an array of matrices, we avoid recursion at each computation of the likelihood. Our algorithm can be briefly described as follows.
The denominator in Equation 8 can be easily written as a function of a matrix $Q$, whose size and elements are functions of three scalars: $T_i$, $y_{i+}$ and $y_{i*}$, where $y_{i*} = \sum_t y_{it}y_{it-1}$ is the number of consecutive ones in $y_i$.
Define an injective index function $j = j(T, y_+, y_*)$ (where we omit the $i$ subscript for conciseness). Obviously, all individuals in the sample with the same values of $T$, $y_+$ and $y_*$ can share the same $j$ index. The idea is to pre-generate all the needed $Q_j$ matrices and store them in an array; since $T_i \geq y_{i+} > y_{i*}$ holds, the array can have at most $T^2_{\max}(T_{\max} - 1)$ elements, although in practice the number of distinct elements in the array will be much smaller. In a typical panel data setting, it is unlikely that more than a few hundred distinct $Q_j$ matrices will have to be computed and stored in memory.
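For concreteness, one possible injective index (our own example; DPB's actual choice may differ) treats the three scalars as digits in base $T_{\max} + 1$:

    # map (T, yplus, ystar) to a unique 1-based array position
    scalar Tmax = 6
    scalar T = 4
    scalar yplus = 2
    scalar ystar = 1
    scalar j = ((T-1)*(Tmax+1) + yplus)*(Tmax+1) + ystar + 1
    print j

Since $0 \leq y_+ \leq T_{\max}$ and $0 \leq y_* \leq T_{\max}$, distinct triplets map to distinct values of $j$.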
The internal structure of $Q_j$ is
$$Q_j = \left[\; Q_1(T, y_+) \;\; q_2 \;\right], \qquad (9)$$
where

1. $Q_1(T, y_+)$ is a matrix with $\frac{T!}{y_+!(T-y_+)!}$ rows and $T$ columns, whose rows are the vectors $b$ such that $b_+ = y_{i+}$ (see Section 2.2);
2. $q_2$ is a column vector holding the number of times in which consecutive ones are obtained in the corresponding row of $Q_1$.
For example, consider an individual for whom $y_i = [0, 1, 1, 0]$; in this case we have $T_i = 4$, $y_+ = 2$ and $y_* = 1$. The corresponding $Q$ matrix would be
$$Q = \left[\begin{array}{cccc|c} 0 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 \end{array}\right],$$
which lists all the possibilities, with the final column holding the consecutive-ones counts. We call the "relevant" row $k_i$ the one which contains the sequence $y_i$ actually observed (the third row of $Q$ in this example, so $k_i = 3$).
The $Q_1(T, y_+)$ matrices can be computed recursively by noting that
$$Q_1(T, y_+) = \left[\begin{array}{cc} 0 & Q_1(T-1, y_+) \\ 1 & Q_1(T-1, y_+ - 1) \end{array}\right],$$
where 0 and 1 denote conformable column vectors, with the special cases $Q_1(T, 0) = [0, \dots, 0]$ and $Q_1(T, T) = [1, \dots, 1]$. Since none of the necessary $Q_j$ depend on the parameters to be estimated, they are pre-computed in a preliminary loop and stored in an array of matrices. The algorithm employed can therefore be given the following schematic description (a Hansl sketch of the recursion follows the list):

1. Initialize an empty array of matrices $Q$.

2. For each individual $i$:
(a) Compute the $j(i)$ index as a function of $T_i$, $y_{i+}$ and $y_{i*}$.
(b) If $Q_j$ has already been computed, stop and go to the next individual $i$; else
   i. compute $Q_1(T_i, y_{i+})$ via the recursive method described above;
   ii. store the relevant row $k_i$ for individual $i$;
   iii. compute $Q_j$ via Equation 9;
   iv. store $Q_j$ into the array at position $j$.
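As anticipated, here is a minimal stand-alone Hansl version of the recursion (the function name Q1 and the q2 computation are our own illustration, not DPB's internal code):

    function matrix Q1 (int T, int yplus)
        # base cases: a single all-zeros or all-ones sequence
        if yplus == 0
            return zeros(1, T)
        elif yplus == T
            return ones(1, T)
        endif
        # sequences starting with 0, then sequences starting with 1
        matrix a = Q1(T-1, yplus)
        matrix b = Q1(T-1, yplus-1)
        return (zeros(rows(a), 1) ~ a) | (ones(rows(b), 1) ~ b)
    end function

    # example: T = 4, y+ = 2, as in the example above
    matrix B = Q1(4, 2)
    matrix q2 = sumr(B[,1:3] .* B[,2:4])   # consecutive-ones count per row
    print B q2

Running the example reproduces the six rows of the $Q$ matrix above, with q2 matching its last column.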
Once this is done, the likelihood for individual i in Equation 8 becomes a simple function of k i and Q j , which do not need to be recomputed during the Newton-Raphson iterations.

Identification of ρ and τ for RE probit models with AR(1) errors
When estimating RE probit models with autocorrelated errors, maximizing the log-likelihood may be quite challenging from a numerical point of view because of weak identification of some of the parameters. Consider first the ADP model, in which the covariance matrix of the compound disturbance terms for a unit is given in Equation 4. It is quite obvious that some combinations of the parameters give rise to a very badly conditioned matrix. This happens, in particular, when $\theta \simeq 1$ and $\rho = 1 - \epsilon$ (for "small" $\epsilon$). In those cases, $\Sigma$ will be very close to singular, especially for those units for which the time dimension is short (note that none of these three circumstances is uncommon in practice). The typical outcome is BFGS taking many iterations to converge or, worse, not converging at all.
The situation is even worse for the GADP model, whose covariance matrix is given in Equation 5: First, since the covariance matrix depends on four parameters, an elementary order condition dictates that only the observations for which $T_i \geq 3$ will provide enough curvature to the log-likelihood to separately identify all four parameters, which is not an overly restrictive requirement. However, note that the parameter $\tau$ only appears in products of the form $\tau\rho^t$, with $1 \leq t \leq T_i - 1$. This implies that, in order to obtain enough curvature in the objective function in the direction of $\tau$, the sequence $\rho, \rho^2, \rho^3, \dots$ must be noticeably different (from a numerical point of view) from a constant sequence. In practice, this implies that estimation is likely to fail whenever (i) $\rho$ is very close to 0 or to 1 and (ii) $T_{\max}$ is small, unless $n$ is truly enormous. In the limit, if $\rho = 0$ or $\rho = 1$ the model is under-identified even when $n \to \infty$ and $T \to \infty$.

The DPB function package
In gretl, function packages are collections of user-defined functions made available to other users. The user can download and install the function packages available on the gretl server by menu or script. A function package can be downloaded and installed simply by invoking the install command:

    Gretl console: type help for a list of commands
    ? install DPB.zip
    Installed DPB.zip
Then, in each work session, the function package needs to be loaded by:

    ? include DPB.gfn

We refer the reader to the gretl user guide for an illustration of how to download and install function packages from the menu. The function package DPB includes five functions that handle model set-up, option management, estimation and printing of results. A summary of these functions is given in Table 1 and a detailed description is contained in the section "List of Public Functions" of the DPB documentation file. It is possible to call the public functions from the list window of function packages, with the exception of DPB_printout.

The entry for the set-up function in Table 1, for instance, shows that it returns a bundle and takes as its first argument a string mod selecting the model: "DP" for the dynamic probit model (Heckman 1981b), "ADP" for the AR(1) dynamic probit model (Hyslop 1999), "GADP" for the generalized AR(1) dynamic probit model (Keane and Sauer 2009) and "QE" for the quadratic exponential model (Bartolucci and Nigro 2010). As for DPB_printout, after estimation this function is called automatically with a double click on the bundle icon in the icon window, where the model bundle is stored.
In the rest of this section, we first illustrate how to estimate the RE models DP, ADP and GADP described in Section 2.1; we then show how to perform estimation of the QE model proposed by Bartolucci and Nigro (2010), by means of code files written in gretl's scripting language, Hansl (Cottrell and Lucchetti 2017). All the code used in this section, as well as the required data, is available in the supplementary material for replication accompanying the paper. The minimum version of gretl required to use the package is 2016a.

Random-effects dynamic probit
We start by describing how DPB handles the RE approach proposed by Heckman (1981b), using an artificial dataset generated following Equations 1 and 3, with the parameter values collected in Equation 10, for $i = 1, \dots, 4096$ and $t = 1, \dots, 6$. We also assume that $\rho = 0$, so $\varepsilon_{it}$ is not autocorrelated. The supplementary material contains the code for generating the data: It generates dynamic binary data under normality with the error structure in Equation 3, sets the parameter values in Equation 10, and stores the artificial data in DP_artdata.gdtb.
Once the DPB function package has been downloaded from the gretl server, a simple Hansl script to estimate the DP model is:

    ? set echo off
    ? set messages off
    ? include DPB.gfn
    ? open DP_artdata.gdtb
    ? list X = const x
    ? list Z = const x z
    ? b = DPB_setup("DP", y, X, Z)
    ? DPB_estimate(&b)

The first two lines of code just prevent gretl from echoing unnecessary output, while the include command loads the function package. Notice that here we are not setting any panel structure on the data after opening them, although one is required for DPB to work; this is because it has already been imposed on the sample datasets upon creation. If the data did not already have such a structure, it should be set by the user with the instruction:

    ? setobs id time --panel-vars
The lists of explanatory variables are then created. In the next line, the call to the public function DPB_setup initializes the model and returns a bundle: The first argument is a string containing the name of the model to be estimated ("DP" in this case). The remaining arguments y, X, Z are the dependent variable, the list of explanatory variables for the primary equation and the list for the initial condition equation.
The function DPB_estimate takes as its argument a pointer to the model bundle and fills the bundle with the estimated quantities. The bundle elements can be listed by simply typing print b, and a detailed description of them is given in the DPB documentation. Bundle elements can be accessed via the syntax npar = b["npar"] or npar = b.npar. The returned bundle can also be stored as an XML file by using the gretl built-in function bwrite(b, "mod") and reloaded, if necessary, by the companion function b = bread("mod").

Estimation results are displayed by DPB_printout, where coefficients, standard errors and the related z-tests are reported for the state dependence parameter $\gamma$, the coefficients of the main equation $\beta$, the coefficients of the initial condition equation $\pi$, and the covariance matrix parameters $\theta$ and $\sigma_\alpha$. At the end of the output table, a Wald test for the joint significance of the explanatory variables in the main equation is reported.

One of the reasons why RE models are often preferred to FE models by practitioners is the possibility of computing partial effects (PE), which have a meaningful economic interpretation in terms of probability variations. While the computation of PE is straightforward in models for cross-sectional data, additional sources of complication arise when dealing with panel data, mainly because the individual unobserved effect $\alpha_i$ enters the index function. Therefore, in addition to the estimation results, DPB provides the user with the possibility of printing average PE by calling the public function DPB_printape, in the same manner as DPB_printout; the average partial effects are computed following Wooldridge (2010, p. 485) and the associated standard errors are derived via the delta method.
As shown in the output headings, the model has been estimated using Gauss-Hermite quadrature with 24 quadrature points, and the variance-covariance matrix of the parameters has been estimated by a sandwich formula. These are the default settings, which can be altered by the user by calling the public function DPB_setoption before DPB_estimate. For instance:

    ? err = DPB_setoption(&b, "nrep", 32)
    ? DPB_estimate(&b)

Options are set by passing a string opt and a scalar value (see Table 2):

"method": 0 = Gauss-Hermite quadrature (GHQ), the default choice for the DP model; 1 = GHK algorithm. For the ADP and GADP models, method is forced to 1. For the QE model a warning message is printed.
"nrep" Number of quadrature points or GHK draws. Default is 24 for the DP model with GHQ, 128 for the DP model with GHK and for the ADP and GADP models. For the QE model a warning message is printed.
"verbose" Degree of output verbosity. 0 = No output is printed, 1 = The log-likelihood at each iterations is printed (default), 2 = Log-likelihood, parameters and gradient at each iteration are printed.
"draws" Type sequence for the GHK algorithm. 0 = Halton (default), 1 = Uniform with seed 31415927. For the DP model with GHQ and the QE model a warning message is printed. DPB_setoption returns a scalar (called err in the above code lines) containing 0 if the option is valid and successfully set, an error code otherwise. For a detailed description of the available options, see Table 2.
In the example above, the string "nrep" and the scalar 32 are used to set the number of quadrature points to be used during estimation.
The DP model can also be estimated by computing the multivariate normal probabilities via GHK instead of numerical integration by Gauss-Hermite quadrature. After the call to DPB_setup, the user can switch the method from GHQ (the default) to GHK by calling DPB_setoption as follows:

    ? err = DPB_setoption(&b, "nrep", 100)
    ? err = DPB_setoption(&b, "method", 1)

When the GHK method is invoked, a Halton sequence is used by default. The user may instead use random draws from a uniform distribution by setting "draws" to 1, as described in Table 2. The number of Halton points used by the GHK algorithm has a default value of 128, which can also be modified via the function DPB_setoption with the string "nrep" and the number of points, as illustrated earlier in this section.
In some cases, estimation can be computationally quite intensive. In particular, the GHK algorithm is quite sensitive to the time dimension of the panel, as $T$ sets the dimension of the covariance matrix given in Equation 3. Table 3 reports the CPU time required by the package for different numbers of quadrature points and GHK replications, to give the user an idea of the trade-off between algorithm quality and time (results obtained on a system with two Intel Pentium CPU G640 @ 2.80GHz processors). The table was obtained by the following command lines:

    ? b = DPB_setup("DP", y, X, Z)
    ? err1 = DPB_setoption(&b, "verbose", 2)
    ? err2 = DPB_setoption(&b, "vcv", 1)
    ? loop foreach i 16 24 32
    > set stopwatch
    > err_$i = DPB_setoption(&b, "nrep", $i)
    > DPB_estimate(&b)
    > DPB_printout(&b)
    > t_$i = $stopwatch
    > print t_$i
    > endloop

The verbose option with value 2 results in the printing of the values of the parameters and the gradient at each iteration of the log-likelihood maximization. In addition, we use the vcv option with value 1 to set the covariance matrix estimator to OPG, which is the least computationally demanding. Since the method option was left unmodified, the above code estimates the dynamic probit model by GHQ, with 16, 24, and 32 quadrature points. If, instead, one wishes to inspect the performance of the GHK algorithm, the method must be changed via err = DPB_setoption(&b, "method", 1) after the set-up function; in Table 3 we display the results obtained by setting the number of GHK draws to 128, 192, and 256. The estimates obtained via GHQ do not exhibit remarkable differences from each other. Conversely, the GHK algorithm obviously takes longer to yield results, given its simulation-based nature. In particular, the CPU time increases roughly linearly with the number of replications.
As described in Section 2.1, the ADP and GADP models are obtained by setting the variance-covariance matrix of the error terms as in Equations 4 and 5, respectively. Such a structure does not allow the use of GHQ to evaluate the relevant normal integral, and GHK has to be used instead. In order to illustrate how DPB handles these extensions, we use data simulated from the DGP in Equation 10 with $\rho = 0.3$, $\tau = 1$ (stored in ADP_artdata.gdtb) and with $\rho = 0.3$, $\tau = 0.6$ (stored in GADP_artdata.gdtb), so as to match Equations 4 and 5, respectively. The ADP model can be estimated by setting the first argument of DPB_setup to "ADP":

    ? open ADP_artdata.gdtb
    ? list X = const x
    ? list Z = const x z
    ? b = DPB_setup("ADP", y, X, Z)
    ? DPB_estimate(&b)
    ? DPB_printout(&b)

In this case, if the user tries to set GHQ as the estimation method, a warning message is printed and the estimation proceeds with GHK using 128 Halton points.
Similarly, the GADP model can be estimated by setting the first argument of DPB_setup to "GADP":

    ? open GADP_artdata.gdtb
    ? list X = const x
    ? list Z = const x z
    ? b = DPB_setup("GADP", y, X, Z)

Running the two examples above produces the results shown in Table 4.

Quadratic exponential model
We exemplify the FE estimator proposed in Bartolucci and Nigro (2010) by means of an artificial dataset, generated in a similar way as in Equation 10: $\alpha_i$ is generated as in Equation 10, the regressors are standard normal random variables and the error terms $\varepsilon_{it}$ are logistically distributed with zero mean and variance $\pi^2/3$. The supplementary material also contains the Hansl code for generating the artificial data, which is then stored in QE_artdata.gdtb. A simple script to estimate the quadratic exponential model described in Section 2.2 is:

    ? include DPB.gfn
    ? open QE_artdata.gdtb
    ? list X = x1 x2 x3
    ? b = DPB_setup("QE", y, X)
    ? DPB_setoption(&b, "vcv", 2)
    ? DPB_estimate(&b)
    ? DPB_printout(&b)

After the required panel structure is set and the list of explanatory variables has been created, the script calls the set-up function with the string "QE" as its first argument. For illustrative purposes, in the above example the covariance matrix estimator is set to the Hessian by setting the vcv option to 2. Notice that, since only the units with $0 < y_{i+} < T_i$ are used, DPB reports in the output headings the number of actual contributions to the log-likelihood. The output reports coefficients, standard errors and z statistics for the parameters in Equation 8: the coefficient associated with the lagged dependent variable, $\gamma$; the parameters $\beta_1$ for the explanatory variables; and, finally, the parameters associated with the last observation, $\mu$ and $\beta_2$.
The handling of time dummies in the model specification deserves a special mention. Consider, for instance, the case of a balanced dataset: In a panel of $T$ periods, the QE model identifies $T - 3$ time effects. Two dummies are dropped because of the initial condition and the rank condition in the main equation; a third one is dropped because the observation at time $T$ is handled separately in the model specification (see Equation 8), which includes its own intercept term.

Examples: The union dataset
In this section, we illustrate the DPB function package by means of two empirical applications.
In the first exercise, we replicate the example proposed by Stewart (2006), used to present the software components to estimate RE dynamic probit models.
In the second, we show how to implement in Hansl the estimator proposed by Wooldridge (2005) and how to compute predicted probabilities, and we compare the results with those of the DP model available in DPB. As a side product, we show how to implement the popular correlated random effects (CRE) approaches (Mundlak 1978; Chamberlain 1980).
Both examples use data extracted from the US National Longitudinal Survey of Youth on unionized workers, a very popular dataset often used as a benchmark for RE (dynamic) models for binary dependent variables with longitudinal data. The code and the dataset used for these examples are available in the supplementary material for replication.

Example 1
In the following, we replicate the empirical example provided in Stewart (2006), where the union dataset is used to illustrate the Stata software components for the estimation of the RE dynamic probit models of Heckman (1981b), DP, and Hyslop (1999), ADP. Unsurprisingly, differently from DPB, the software provided by Stewart (2006) does not allow for the implementation of the estimator proposed by Keane and Sauer (2009), which postdates it. The results reported in this section can be replicated using the code and the data file union_full.gdtb available in the supplementary material.
Using the variables given in the union dataset, a model is specified for the binary dependent variable at time $t$ and for the initial condition as in Heckman (1981b):
$$\mathrm{union}_{it} = 1\{\gamma\,\mathrm{union}_{it-1} + \beta_0 + \beta_1\,\mathrm{age}_{it} + \beta_2\,\mathrm{grade}_{it} + \beta_3\,\mathrm{south}_{it} + \alpha_i + \varepsilon_{it} \geq 0\},$$
$$\mathrm{union}_{i1} = 1\{\pi_0 + \pi_1\,\mathrm{age}_{i1} + \pi_2\,\mathrm{grade}_{i1} + \pi_3\,\mathrm{south}_{i1} + \pi_4\,\mathrm{not\_smsa}_{i1} + \theta\alpha_i + \varepsilon_{i1} \geq 0\},$$
for $i = 1, \dots, n$ and $t = 2, \dots, T$, where union is the binary dependent variable, age is the age at time $t$, grade is the years of schooling and south is a dummy variable for living in the South. For the initial condition, not_smsa, a dummy for living outside a standard metropolitan statistical area, is used as an exclusion restriction.
The dataset is an unbalanced panel which starts in 1970 and ends in 1988; a few years do not appear in the dataset, so several gaps are present. The number of units is 4434 and the maximum time length is $T = 12$, for a total of 26200 observations. Simple descriptive statistics on the dataset are given by the summary command, followed by the variable names:

    ? summary union age grade south not_smsa --simple

In order to replicate the example provided in Stewart (2006), the dataset needs to be sub-sampled: Only the years from 1978 onwards are kept, 1983 is dropped and, in addition, the panel is balanced by keeping the units that are present for six consecutive waves. The code fragment that sub-samples the dataset accordingly, and stores it for convenience in the gretl data file union.gdtb, is included in the supplementary material. The resulting dataset is a balanced panel of 799 units observed for $T = 6$ periods, for a total of 3995 observations.
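A possible Hansl sketch of this sub-sampling step is the following (our own illustration, assuming the time variable is named year; the actual code is in the supplementary material):

    # keep the years from 1978 onwards, excluding 1983
    smpl year >= 1978 && year != 1983 --restrict
    # balance the panel: keep units observed in all six remaining waves
    series nvalid = pnobs(union)
    smpl nvalid == 6 --restrict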
The models estimated in Stewart (2006) are a pooled probit, a RE probit (with no special treatment for the initial conditions), the DP and the ADP. Table 5 shows the estimation results of the pooled probit and of the RE probit. In the first case, the estimates of both the coefficients and the standard errors are identical to those obtained by Stewart (2006), whereas the estimates of the RE probit (with 24 quadrature points) exhibit very small differences (the coefficient associated with union$_{t-1}$ is 1.1507 in Stewart 2006 compared to our 1.1509, and so on).
The first column of Table 6 reports the estimation results for the DP model, using 24 quadrature points. In this case, the estimated coefficients, the standard errors and the value of the log-likelihood at convergence are identical. The second and third columns of Table 6 show the estimation results of the DP model using GHK instead of GHQ, and of the ADP, both using 500 Halton points. While substantially similar, the results do present some discrepancies from those obtained by Stewart (2006). Such differences can probably be ascribed to the use of Halton sequences in DPB, as opposed to random draws from the uniform distribution in the command written by Stewart (2006). Nevertheless, both sets of estimates show that ignoring the autocorrelation in the unobservables leads to underestimating the state dependence effect in these data. Finally, Table 7 reports the estimation results for the GADP and the QE model. In the first case, the results are very similar to those obtained by estimating the ADP model, which is expected since the additional correlation coefficient $\tau$ is positive and close to unity. In the second, the values of the estimated coefficients are coherent with those of the RE models with no autocorrelation.
Since DPB handles unbalanced datasets, we repeat the exercise keeping all the usable observations. Starting from the full dataset at our disposal (union_full.gdtb), the DPB_setup function for RE models automatically sub-samples the dataset as described in Section 3: Out of the 4434 units, 3790 are kept, while 400 are dropped because they are observed only once and 244 because they are never observed for at least two consecutive periods. Table 8 reports the estimation results based on this dataset, for the models implemented in DPB only. Retaining all usable units in the dataset considerably increases the estimate of the state dependence parameter in all models, while reducing the estimated persistence due to time-invariant unobserved heterogeneity. (The reader should take into account that we directly estimate $\sigma_\alpha$, while Stewart (2006) adopts a different parametrization.)

Example 2
In the following, we show how to implement in gretl the popular RE estimator for dynamic binary models proposed by Wooldridge (2005), henceforth CDP (conditional DP). To this aim, we use the same dataset, available in the data archive of the Journal of Applied Econometrics as well as in the gretl data archive, automatically included with the installation of gretl and accessible from "File, Open data, Sample file, Gretl, union_wooldridge". The data are a balanced panel of workers, extracted from the US National Longitudinal Survey of Youth, which comprises 545 individuals observed for $T = 7$ periods, from 1981 to 1987.
The method proposed by Wooldridge (2005) relies on specifying a distribution for the individual unobserved heterogeneity conditional on the initial value of the dependent variable and on the observed history of the strictly exogenous explanatory variables. Following the paper, to which we refer the reader for details, the distribution of the individual unobserved effect is specified as
$$\alpha_i \mid y_{i1}, w_i \sim N\!\left(\delta_0 + \delta_1 y_{i1} + w_i'\delta_2,\; s^2_\alpha\right),$$
where $y_{i1}$ is the initial observation and $w_i$ contains the whole sequence of the strictly exogenous covariate $w_{it}$, as in the approach of Chamberlain (1980). Notice that we use a different notation for the variance parameter, as we will use $\sigma^2_\alpha$ to denote the variance of the unconditional distribution.

Estimation of the CDP probit
As is well known, the main virtue of the CDP model is that it can easily be estimated by means of the standard routines used for the static RE probit, with additional regressors. In Section 6 of Wooldridge (2005), the CDP model is estimated with union$_{it}$ as the dependent variable, and a dummy variable married$_{it}$ and time dummies as explanatory variables. In addition, the whole history of married and the initial observation union$_{i80}$ are included in the set of regressors, to account for the conditional distribution of $\alpha_i$. A second specification is also used, where the set of regressors also contains educ (years of schooling) and the black dummy variable. The practice of including lags and leads of a time-varying strictly exogenous variable is often adopted by practitioners as a preventive measure against possible correlation between the random effect and the covariates. For this reason, a static/dynamic model with this feature is often referred to as a CRE (correlated random effects) approach.
The Hansl script replicating the results in Table I of Wooldridge (2005) is contained in the replication material; after creating the lists of explanatory variables, two probit commands with the --random-effects flag estimate the CDP models. The number of quadrature points has to be set to 12 to obtain an exact replication of Wooldridge's original results.
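A minimal sketch of such a script, assuming the variable and list names used in the replication material (union_1 for the lagged dependent variable, union80 for the initial observation, MARR for the lags and leads of married, TIME for the time dummies), is:

    ? list X1 = const union_1 union80 married MARR TIME
    ? probit union X1 --random-effects --quadpoints=12
    ? list X2 = X1 educ black
    ? probit union X2 --random-effects --quadpoints=12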
For comparison purposes, we use the same specifications to estimate the DP model with the CRE approach. The estimation results are displayed in Table 9. The results in the first two columns of the table perfectly replicate the results in Wooldridge (2005). In addition, the same models estimated as DP models produce very similar results, as to be expected from the simulation study in Akay (2012).
The Mundlak version (Mundlak 1978) of the CRE approach is also very popular among practitioners: Instead of lags and leads, the regression is augmented with the within-group means of one or more time-varying strictly exogenous covariates. This estimator can easily be implemented in gretl by means of the dedicated panel data functions, available to the user after setting the panel data structure. In the replication code file, the within-group mean of the variable married is readily created by the line m_marr = pmean(married). Using the baseline specification in Wooldridge (2005), we estimate the CDP and DP models with Mundlak's CRE specification. The results are displayed in Table 10.
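In sketch form, again under the naming assumptions above:

    ? series m_marr = pmean(married)   # within-group mean of married
    ? list X = const union_1 union80 married m_marr TIME
    ? probit union X --random-effects --quadpoints=12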

Computation of predicted probabilities
In Section 4.1, we illustrated how to have DPB report average PE and the associated standard errors by calling DPB_printape after the estimation of the dynamic probit models. Here we also show how to compute predicted probabilities by Hansl scripting, which can then be used to derive quantities of interest such as average partial effects.

As an example, Wooldridge (2005) computes the estimated probability of being in a union in 1987, conditional on being or not being in a union in 1986 and on being married or not. After estimating the RE model and saving the relevant quantities, the predicted probabilities can easily be computed by means of a few lines of code. For instance, the probability of being in a union in 1987, conditional on being in a union in 1986 and being married, $P(\mathrm{union}_{i1987} = 1 \mid \mathrm{union}_{i1986} = 1, \mathrm{married}_{it} = 1, x_{it}, w_i)$, can be written as
$$\Phi\left(\frac{\hat\gamma + \hat\delta_0 + \hat\delta_1\,\mathrm{union}_{i1980} + \hat\beta + w_i'\hat\delta_2 + \hat\beta_{1987}}{\sqrt{1 + \hat{s}^2_\alpha}}\right)$$
and is computed as follows:

    ? list X = const union_1 union80 married MARR TIME
    ? probit union X --random-effects --quadpoints=12 --quiet
    ? k = nelem(X)
    ? bw = $coeff

After the model estimation command is invoked on the second line, we retrieve the estimated parameters by means of the accessor $coeff, which contains the parameter $\ln(s_\alpha)$ at the end. Then, we proceed to the computation of the index function, where bw[1] is $\hat\delta_0$, bw[2] is $\hat\gamma$, bw[3] is $\hat\delta_1$, bw[4] is the coefficient that multiplies the dummy married, lincomb(MARR, bw[5:11]) is $w_i'\hat\delta_2$ and bw[17] is the coefficient of the 1987 time dummy. After the index function is normalized by $\sqrt{1 + \hat{s}^2_\alpha}$, the predicted probability is computed by the function cnorm, which returns the standard normal c.d.f. Similarly, the estimated probabilities $P(\mathrm{union}_{i1987} = 1 \mid \mathrm{union}_{i1986} = 0, \mathrm{married}_{it} = 1, x_{it}, w_i)$, and those for married$_{it} = 0$, can be computed by appropriately excluding bw[2] and bw[4] from the index function calculation.
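The remaining lines might look as follows (a sketch under the coefficient ordering just described; the names ndx and P11 are ours):

    ? scalar s_a = exp(bw[k+1])    # $coeff stores ln(s_alpha) last
    ? series ndx = bw[1] + bw[2] + bw[3]*union80 + bw[4] + lincomb(MARR, bw[5:11]) + bw[17]
    ? series P11 = cnorm(ndx / sqrt(1 + s_a^2))
    ? scalar avgP11 = mean(P11)    # average predicted probability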
The top panel of Table 11 reports the average estimated probabilities in the four cases and is the equivalent of Table II on page 52 in Wooldridge (2005). APEs can readily be derived by taking cell differences: For instance, the state dependence effect for married individuals is 0.182, and so on. The lower panel of Table 11 reports the results of the same exercise performed with the DP model. This time, after estimating the model, the estimated coefficients need to be extracted from the bundle b, where they are stored in the vector coeff. The statements bh = b.coeff[1:k+1] and sig_a = b.coeff[k+z+3] extract the needed estimates according to the order in which they are stored. The index function and the estimated probabilities are then computed as in the previous exercise. The script file contains the code to replicate the full Table 11.

Conclusions
The aim of the DPB function package is to provide the practitioner with an intuitive and simple-to-use tool for the estimation of panel data dynamic binary choice models, whose adoption is often called for in applied microeconometrics.
Compared to existing pre-packaged software, we believe that DPB offers several improvements, the most important being, in our opinion: the ability to work with unbalanced or even "gappy" panel datasets; a relatively broad variety of available estimation techniques; and an extremely efficient implementation of certain numerical techniques, so that the estimation of realistically-sized models becomes feasible even on ordinary PCs.
Hopefully, DPB will allow practitioners, who might otherwise be discouraged by the complexity of implementing these estimators in-house, to employ them easily in standard research problems.