A Panel Data Toolbox for MATLAB

Panel Data Toolbox is a new package for MATLAB that includes functions to estimate the main econometric methods of balanced and unbalanced panel data analysis. The package includes code for the standard ﬁxed, between and random eﬀects estimation methods, as well as for the existing instrumental panels and a wide array of spatial panels. A full set of relevant tests is also included. This paper describes the methodology and implementation of the functions and illustrates their use with well-known examples. We perform numerical checks against other popular commercial and free software to show the validity of the results.


Introduction
Panel data econometrics has grown in importance over the past decades due to increase in the availability of data related to units that are observed over a long period of time. Panel data econometric methods are available in Stata (StataCorp 2015) and R (R Core Team 2016), but there is a lack of a full set of functions for MATLAB (The MathWorks Inc. 2015).
The Panel Data Toolbox introduces such set of functions, including estimation methods for the standard fixed, between and random effects models, both balanced and unbalanced, as well as instrumental panel data models, including the error components by Baltagi (1981), and, finally, recently introduced spatial panels, (Kapoor, Kelejian, and Prucha 2007;Baltagi and Liu 2011). Numerical checks against Stata and R using well-known classical examples show that the estimated coefficients and t statistics are consistent with those obtained with the new MATLAB toolbox. 1 A full set of corresponding tests is included for poolability of the data, individual effects, fixed and random effects, serial correlation, and cross-sectional dependence. An overidentification test is also available for instrumental panels, as well as tests for spatial autocorrelation.
Spatial econometrics in MATLAB can be estimated using the Econometrics Toolbox (LeSage and Pace 2009), which uses maximum likelihood and Bayesian methods, and using maximum likelihood methods (Elhorst 2014a). In this new Panel Data Toolbox we use a generalized spatial two stage least squares (GS2SLS) estimator for spatial panels following Kapoor et al. (2007) and Baltagi and Liu (2011).
Panel Data Toolbox is available as free software, under the GNU General Public License version 3, and can be downloaded from http://www.paneldatatoolbox.com/, with all the supplementary material (data, examples and source code) to replicate all the results presented in this paper. The toolbox is also hosted on an open source repository on GitHub at https: //github.com/javierbarbero/PanelDataMATLAB.
The paper is organized as follows. Section 3 presents the estimation methods for panel data models. Testing procedures are shown in Section 4. Numerical checks against Stata and R are presented in Section 5. Section 6 concludes.

Data structures
Panel data contains units (individuals, firms, countries, etc.) that are observed over a long period of time. Units are usually denoted by i = 1, 2, . . . , n, and T i is the number of time periods for which unit i is observed. This toolbox handles both balanced and unbalanced panel data, without any previous sorting required, as the toolbox orders the data internally. The total number of observations is N = n i=1 T i , and simplifies to N = nT in case of a balanced panel where T i = T ∀i.
Data are managed as regular MATLAB vectors and matrices, constituting the inputs of the estimation functions. All estimation functions return a structure estout that contains fields with the estimation results as well as the input of the estimation function. Fields can be accessed directly using the dot notation and the whole structure can be used as an input to other functions that print results (e.g., estdisp) or perform postestimation tests.
Some of the fields of the estout structure are the following: 2 • y and X: Contain the dependent and the independent variables, respectively.
• n, T and N: Number of entities, time periods, and total number of observations.
• k and l: Number of explanatory variables and instruments.
• coef, varcoef and stderr: Estimated coefficients, estimated covariance matrix, and estimated standard errors.
• yhat and res: Fitted values and residuals.
Testing functions take as input a estout structure and return as output a testout structure with the results of the test. The common fields of the testout structure are the following: • test: Name of the test performed.
• value: Value/score of the test.
• df: Degrees of freedom.
• p: Associated p value.

Model estimation
The starting formulation is the panel data model with specific individual effects: where µ i represents the ith invariant time individual effect and v it the disturbance,
Panel data models are estimated using the panel(id, time, y, X, method, options) function, where id and time are vectors of unit and time indexes, y is the vector of the dependent variable, X is the matrix of explanatory variables, and method is a string that specifies the panel data estimation method to be used among the following: • po: For a pooling estimation.
• fe: For a fixed effects (within) estimation.
• be: For a between estimation.
• re: For a random effects GLS estimation.
These estimation methods are explained in the following sections. options is an optional list of parameter-value pairs to specify advanced estimating options.

Fixed effects
Under typical specifications, individual effects are correlated with the explanatory variables: COV(X it , µ i ) = 0, which motivates the use of the fixed-effects (within) estimation, so as to capture unobserved heterogeneity (Baltagi 2008).
In this context, including individual effects on the error component while performing OLS (ordinary least squares) results into a biased estimation. In order to extract these effects, the within estimator of the parameters is computed using OLS: whereỹ = y −ȳ andX = X −X are the transformed variables in deviations from the group means,ȳ andX. It is called "within" estimator because it takes into account the variations in each group. This estimator is unbiased and consistent for n → ∞. Statistical inference is generally based on the asymptotic variance-covariance matrix: where S 2 denotes the residual variance: S 2 = (e e)/(N −n−k), with residuals e =ỹ−(Xβ fe ). Finally, inference can be performed using the standard t and F tests.
The function estdisp is used to display the estimation results taking the names of the variables specified in the fields ynames and xnames of the estout structure that is returned from the panel function. 4 The individual effects, with their standard errors and significance test, can be recovered with the ieffects command, and conveniently displayed with the ieffectsdisp function. They are computed as follows:μ ieff = ieffects(fe); ieffectsdisp(fe);  An "overall constant term", computed as the mean of the individual effects, can be calculated and displayed adding the parameter 'overall' to the ieffects or ieffectsdisp functions. ieffOver = ieffects(fe, overall ); ieffectsdisp(fe, overall ); The between estimation is performed by applying OLS to the transformed variables: whereȳ andX are the group means of the variables. It is called "between" estimator because it takes into account the variation between groups. Again, statistical inference is based on the asymptotic variance-covariance matrix: where S 2 denotes the residual variance: S 2 = (e e)/(n − k), with residuals e =ȳ −Xβ be .

Random effects model
In the panel data model (1) the loss of degrees of freedom can be avoided if the individual effects can be assumed random, where the error component u it = µ i + v it includes the ith invariant time individual effects µ i and the disturbance v it .
The individual effect µ i is assumed independent of the disturbance v it . In addition, individual effects and disturbances are independent of the explanatory variables; i.e., COV(X it , µ i ) = 0 and COV(X it , v it ) = 0 for all i and t. For this reason, the random effects model is an appropriate specification in the analysis of n individuals randomly drawn from a large population.
In this context, n is usually large and a fixed effects model would lead to a loss of degrees of freedom.
Following the formalization of Wallace and Hussain (1969), as stated in Baltagi (2008), the composed error component has the following properties: This results in a block-diagonal covariance matrix with serial correlation over time, only between disturbances of the same individual and zero otherwise: This implies the following correlation coefficient between disturbances: Therefore, the covariance matrix can be computed as follows: where J T is a matrix of ones of size T and the homoscedastic variance is VAR(u it ) = σ 2 µ + σ 2 v for all i and t. In this case, the GLS (generalized least squares) method yields an efficient estimator of the parameters,β with Ω −1 = 1/σ 2 1 P +1/σ 2 v Q, where σ 2 1 = T σ 2 µ +σ 2 v , and P and Q are the matrices that compute the group means and the differences with respect to the group means, respectively. In order to obtain the GLS estimator of the regression coefficients, it is necessary to estimate the Ω −1 matrix of dimension nT × nT . Battese (1973, 1974) suggest premultiplying the model by σ v Ω −1/2 , which is equivalent to computing a quasi-time demeaning of the variables Then, the random effects GLS estimation is computed aŝ Now the question is how to obtain estimates of σ 2 v , σ 2 µ and σ 2 1 . Among the different methods proposed in the literature, Swamy and Arora (1972) suggest using the within regression residuals to computeσ 2 v and the residuals from the between regression to computeσ 2 1 . From these estimatesσ 2 µ is calculated as: 5σ whereT is the harmonic mean of T in case of an unbalanced panel, and simple T if the panel is balanced. The random effects estimator (16) is a weighted average of the within and the between estimators. In this case, the asymptotic variance-covariance matrix for statistical inference is: The estimation output displays the estimatedσ µ ,σ v ,σ 1 , andθ, as well as rho_mu, which is the fraction of variance due to the individual effects computed asρ µ =σ 2 µ /(σ 2 µ +σ 2 v ).

Confidence intervals
5 If the estimated σ 2 µ is negative, which occurs when the true value is close to zero (Baltagi 2008, p. 20), it may be replaced by zero as suggested by Maddala and Mount (1973).
Confidence intervals at the desired significance level can be computed with the estci functions, and appropriately displayed with the estcidisp function. Both functions take as input an estimation output structure estout and the desired significance level, which defaults to 0.05 if not specified.

Robust standard errors
If we suspect that there exists heteroskedasticity in the residuals, we can compute a robust standard error estimation of the fixed and random effects models. Liang and Zeger (1986) and Arellano (1987) propose an extension of the White (1980) sandwich estimator for panel data models, whose asymptotic properties are studied by Hansen (2007) and Stock and Watson (2008). The correct standard errors should be computed as a clustered-robust standard errors using the observation groups as the different clusters.
where, in the fixed effects estimation,X is the within transformation of the explanatory variables, e are the residuals from the within regression, and the degrees of freedom correction n/(n − 1) × N/(N − k) is usually applied. In a random effects estimation,X is the quasitime demeaning transformation of the explanatory variables, e the residuals from the random effects regression, and the degrees of freedom correction is n/ The panel function allows robust standard errors estimation, both for fixed and random effects, by setting the option vartype to robust.
Standard errors can be adjusted to a different cluster by setting the option vartype to cluster, and specifying the cluster variable to the option clusterid. 6

Instrumental panels
The assumption of strict exogeneity of the independent variables, X, when they are uncorrelated with the disturbance, E(X it , v it ) = 0, implies that the basic panel data methods we have shown remain valid. However, there are many applications in which this assumption is untenable. In this case, when some of the regressors are endogenous, the fixed effects, between, and random effects estimators lose consistency and unbiasedness. Consequently, we can apply an instrumental variables (IV) two stage estimation to the fixed effects, between, and random effects models (Wooldridge 2010).
To apply this estimation method, we need a set of variables that are strictly exogenous, uncorrelated with the disturbance in all time periods, and relevant; i.e., correlated with the endogenous independent variables. These variables constitute the set of instrumental variables (IV).
For an application of instrumental panel data, we follow Baltagi and Levin (1992) and Baltagi, Griffin, and Xiong (2000) who estimate the demand for cigarettes using data from 46 U.S. states over the period 1963-1992. 7 We estimate the consumption, log(c), measured as per capita sales, which depends on the price per pack, log(price), per capita disposable income, log(ndi), and the minimum price in neighbor states, log(pimin). 8 We believe the log(price) is potentially endogenous, and use as instrumental variables the lags of the disposable income, log(ndi_1) and the lag of the minimum price log(pimin_1). load( CigarData ) y = log(c); X = [log(price), log(ndi), log(pimin)]; Z = [log(ndi_1), log(pimin_1)]; ynames = { lc }; xnames = { lprice , lndi , lpimin }; znames = { lndi_1 , lpimin_1 }; 6 In fact, setting vartype to robust is equivalent to setting vartype to cluster and clusterid to id. 7 The data is available in MATLAB format in the supplementary file CigarData.mat. 8 The equation we estimate differs from the original one, which corresponds to a dynamic panel data model.
Instrumental panel models are estimated using the ivpanel(id, time, y, X, Z, method, options) function, where Z is the matrix of instruments -excluding the exogenous variables in X that are instruments of themselves and are automatically added by the function. A vector of indexes corresponding to the endogenous variables must be set in the endog option. method is a string that specifies the choice of instrumental panel data estimation method, among the following: • po: For a pool estimation.
• fe: For a fixed effects (within) estimation.
• be: For a between effects estimation.
• re: For a random effects estimation.

Two stage least squares
Instrumental panel data models are estimated by two stage least squares (2SLS). The first stage of the 2SLS estimation consists of estimating the independent variables,X, by an OLS estimation ofX overH = [X * ,Z], whereX * are the exogenous variables inX, which are instruments of themselves, andZ is the matrix of new instruments. For simplification, the tilde over the variables denotes the corresponding within, between or quasi-time demeaning transformation.X =H(H H ) −1H X .
The second stage consists in estimating the coefficients,β, using the predictedX: WhereverX andH correspond to the within, between, or quasi-time demeaning transformation of the variables, we are computing the corresponding fixed effects 2SLS (FE2SLS), between 2SLS (BE2SLS), and random effects 2SLS (RE2SLS).
Regarding statistical inference, the statistic of individual significance is normally distributed, while the statistic of joint significance is distributed as a χ 2 distribution with the corresponding degrees of freedom.

Baltagi's error components estimator
whereH corresponds to the within transformation of the instruments H, andH are the group means of the instruments. Then, the EC2SLS estimation is performed using A as the matrix of instruments in a random effects context. 9 Consequently, EC2SLS incorporates more instruments than RE2SLS. Baltagi and Li (1992) show that both estimators are consistent and have the same limiting distributions, although it is worth noting that for small samples EC2SLS shows gains in efficiency. More recently, Baltagi and Liu (2009) present proofs to obtain the EC2SLS asymptotic properties with respect to RE2SLS.
The error components two stage least squares (EC2SLS) estimation can also be performed with the ivpanel by specifying the 'ec' method:

Spatial panels
In recent years the econometrics literature has grown with topics related to the analysis of spatial relations using panel data models. The main reason is the availability of more complete data sets in which units characterized by spatial features are observed over time. In general, a spatial panel data set contains more information and less multicollinearity among the variables than a cross-section spatial counterpart -see Anselin (1988Anselin ( , 2010, Elhorst (2014b) and Arbia (2014) for an introduction to this literature.
In the context of cross-sectional models Kelejian and Prucha (1998) introduce a generalized spatial two-stage least squares estimator, Kelejian and Prucha (1999) 10 propose a generalized moments (GM) estimation method that is feasible for large n, while Anselin (1988) provides the ML (maximum likelihood) estimator. Drukker, Egger, and Prucha (2013) extend the model allowing for endogenous regressors. Most recently, Elhorst (2003Elhorst ( , 2010 and Lee and Yu (2010) present the ML estimators of the spatial lag model as well as the error model extended to include fixed and random effects, solving the computational problems when the number of cross sectional units n is large. Kapoor et al. (2007), Mutl and Pfaffermayr (2011), and Piras (2013) generalize the GM procedure from cross-section to panel data and derive its properties.
In order to compute different estimators in spatial panel models, we consider the general spatial panel model: A spatial panel data model can include a spatial lag of the dependent variable, W y it , a spatial lag in the error structure, W it , and a spatial lag in the explanatory variables, W X it , whose coefficients are λ, ρ, and β λ , respectively. Depending on the spatial lags they include the model receives a different name.
Procedures for estimating spatial panel data models in MATLAB are already available in LeSage and Pace (2009), using Bayesian methods, and in Elhorst (2014a), by maximum likelihood. In this toolbox, we implement the GM procedure for spatial panels, which allows the inclusion of additional endogenous covariates, and it is integrated with the rest of the toolbox, both regarding estimation and testing functions. 11 In the case where only the spatial lag of the dependent variable is included, this spatial lag is endogenous and the estimation of the spatial model is performed as an instrumental variables estimation using the instruments suggested by Kelejian and Prucha (1998), H = [X, W X, W 2 X]. If the model contains a spatial lag of the error structure, the estimation method is a GM estimation, and we refer the reader to Kapoor et al. (2007), Mutl and Pfaffermayr (2011), and Piras (2013) for a full explanation of the estimation methods and the corresponding moments conditions.
The application is based on the Munnell (1990) and Baltagi (2008)  Spatial panel data models are estimated using the spanel(id, time, y, X, W, method, options), where W is the n×n spatial weight matrix. 13 method can be one of the following: 14 • fe: For a spatial fixed effects (within) estimation.
• re: For a spatial random effects estimation.
11 These three packages work by taking the data as input and returning a structure with the results of the estimation as output. Although LeSage and Pace (2009) and Elhorst (2014a) use different functions for estimating models with different spatial lags, here all are condensed in a single spanel function which allows to estimate models by selecting which spatial lags to include. Despite this small difference, the user will find no difficulty in using the three packages if he wants to compare results using different estimation procedures.
12 The Munnell (1990) data is available in MATLAB format in the supplementary file MunnellData.mat, while the W matrix comes from Millo and Piras (2012) and is available in the file MunnellW.mat. 13 The function transforms the W matrix into a sparse matrix to take advantage of the computational speed improvements of MATLAB when working with sparse matrices.
14 As for now, spatial panels are only available for balanced panels, since the methods for unbalanced ones are still in their early stages.
• ec: For the Baltagi and Liu (2011) spatial error components estimation of the model with a spatial lag of the dependent varaible.
The different spatial lags can be included by setting the following options: • slagy: If set to 1 includes a spatial lag of the dependent variables.
• slagerror: If set to 1 includes a spatial lag of the error structure.
• slagX: A vector of indexes specifying the explanatory variables for which a spatial lag should be added.
Estimating a model with a spatial lag in the dependent variable and a spatial lag in the error structure, usually denoted as SARAR (spatial autoregressive with additional autoregressive error structure), is straightforwardly performed with the spanel function: The spanel function also allows to perform spatial panel estimation when one of the explanatory variables is endogenous. This is performed by including a vector of indexes of the endogenous variables in the option endog, and passing the matrix of new instruments to the option inst. For example, if we assume that the public capital log(pcap) is exogenous and we want to instrument it using the highway and the water components of the public capital, log(hwy) and log(water): Z = [log(hwy), log(water)]; sarfe = spanel(id, year, y, X, W, fe , slagy , 1, slagerror , 1,... endog , 1, inst , Z); sarfe.ynames = ynames; sarfe.xnames = xnames; estdisp(sarfe);

Tests
In this section we describe the implementation of several canonical tests for the panel data regression models presented previously. Specification tests in panel data involves testing for poolability, individual effects and the Hausman test to select the efficient estimator between fixed and random effects models. In addition, we provide a suite of serial correlation and cross-sectional dependence tests. Finally, we consider as the usual diagnostic checks an overidentification test for validity of instruments in instrumental panels and tests for spatial autocorrelation in spatial panels. Appropriate corrections for heteroskedasticity and unbalanced panels for these tests are applied when available.
All test functions require as input an estimation output structure, estout, from a panel estimation and return a testout structure, described in Section 2, that can be displayed in a suitable way using the testdisp function.

Testing linear hypotheses
Linear hypotheses of the form H 0 : Rβ = r can be tested with the standard Wald joint significance test, using the waldsigtest function and specifying the R and r matrices of the null hypothesis to be tested.

Testing poolability
pooltest tests the hypothesis that the population parameters are the same across individuals. Therefore we want to test the stability of the coefficients, H 0 : β i = β for all i, in Equation 1. It is a standard F test based on a comparison between the model estimated for the complete sample and a model that estimates an equation for each individual (Baltagi 2008). pool = pooltest(re); testdisp(pool); Test of poolability H0: Stability of coefficients F(282,528) = 33.829171 p-value = 0.0000

Testing individual effects
The test for individual effects contrasts the existence of different time invariant specific effects based on the results of the pooling model. effectsftest performs the Chow F test for individual effects as in Baltagi (2008). Under the null hypothesis that there are no individual effects, µ i = 0 ∀i, the restricted model comes from an OLS pooling estimation, while the unrestricted model follows the fixed effects estimation. effF = effectsftest(fe); testdisp(effF) F test of individual effects H0: All mu_i = 0 F(47,764) = 75.820406 p-value = 0.0000 bpretest implements the Baltagi and Li (1990) version of the Lagrange multiplier (LM) test of individual effects proposed by Breusch and Pagan (1980). This test contrasts the existence of individual effects by checking its variance that under the null hypothesis of no individual effects is equal to zero, and the LM statistic is distributed as a χ 2 1 .

Testing fixed vs. random effects
In order to determine the correct specification of the model, fixed versus random effects, it is necessary to check the correlation between the individual effects and the regressors. When the individual effects and the explanatory variables are correlated: COV(X it , µ i ) = 0, the fixed effects model provides an unbiased estimator, otherwise a feasible GLS estimator in a random effects model is an efficient estimator.
hausmantest computes the Hausman test (Hausman 1978) that compares the GLS estimator of the random effects model,β re , and the within estimator in the fixed effects model,β f e , both of which are consistent under the null hypothesis. Under the alternative, only the GLS estimator of random effects is consistent. Therefore, the statistics is based on the difference between both estimators H 0 : β f e − β re = 0, and it is computed as: where, under the assumption of homoskedasticity: VAR(β f e −β re ) = VAR(β f e ) − VAR(β re ).
For n fixed and T large, both estimators tend to similar values, with their difference converging to zero, and Hausman's test is unnecessary. However, in applications where n is relatively large with respect to T , it can be used to choose between estimators.
The input of the hausmantest function requires the output structures of the two estimations to be compared. In case of a spatial panel data model with a spatial lag in the error structure, the spatial Hausman test described in Mutl and Pfaffermayr (2011) is performed by passing the spatial estimation output structures to the hausmantest function.
The Mundlak (1978) approach suggests estimating the following regression by GLS: whereX i are the group means of the variables. Then, a test can be performed by computing a Wald joint significance test on γ, under the null hypothesis of random effects, H 0 : γ = 0. This approach is computationally more stable in finite samples and can be estimated with robust standard errors (Wooldridge 2010). mundlak = mundlakvatest(fe); testdisp(mundlak); Mundlak s variable addition test for fixed or random effects H0: Group means are zero. Random effects. Chi2(4) = 9.718105 p-value = 0.0455

Testing serial correlation
In linear panel data models it is necessary to identify serial correlation in the error term because it biases the standard errors and causes loss of efficiency. We present tests for serial correlation in random and fixed effects models.
woolserialtest performs the Wooldridge's test (Wooldridge 2010) for the null hypothesis of no serial correlation in the error term of a fixed effects model. Under the null hypothesis of no serial correlation in the errors, v it , the time demeaned errors of a within regression are negatively serially correlated, with correlation ρ = −1/(T − 1). Thus, a test of serial correlation can be performed by regressing the within estimation residuals,v it , over their lag, v i,t−1 :v it = α + ρv it + it , and testing whetherρ = −1/(T − 1), using a Wald test with clustered standard errors. In the context of a random effects model blserialtest performs the Lagrange multiplier test for first-order serially correlated errors and random effects proposed by Baltagi and Li (1990), as an extension to Breusch and Pagan (1980). This test contrasts the joint null hypothesis of serial correlated and random individual effects. The LM test is based on the OLS residuals and it is asymptotically distributed as a χ 2 2 .

Testing cross-sectional dependence
Cross-sectional dependence in the errors may arise because of the presence of common shocks or when the estimated models present spatial dependence in the disturbances. Cross-sectional dependence results in the inefficiency of the usual estimators and an invalid inference when using the standard covariance matrix. This indicates that testing for cross-sectional dependence is important in fitting panel data models.

Testing overidentification
To evaluate the validity of the instruments in instrumental panels we perform an overidentification test. The function sarganoitest performs the Sargan (1958) test of overidentification restrictions regressing the residuals of the instrumental estimation on all the instruments, including the exogenous variables that are instruments of themselves. Under the null hypothesis that instruments are uncorrelated with the error term, validity of the overidentifying restrictions, the statistic is distributed as a χ 2 r where r is the number of overidentifying restrictions. The input of the sarganoitest function must be an estimation output structure from an instrumental panel. sargan = sarganoitest(ivfe); testdisp(sargan); Sargan s test of overidentification H0: Instruments are uncorrelated with the error term Score = 25.520199~Chi2(1) p-value = 0.0000

Testing spatial autocorrelation
The function bsjksatest implements the join Lagrange multiplier test for testing serial correlation, spatial autocorrelation and random effects in spatial panels by Baltagi, Song, Jung, and Koh (2007). The test is based on the OLS residuals and the W matrix and under the null hypothesis of no spatial autocorrelation, no serial error correlation and no random effects, it is distributed as a χ 2 3 . The input of the bsjksatest function must be an estimation output structure from a spatial panel.  Results for the basic panel data models -fixed, between and random -estimations using the MATLAB panel function, and the results reported by Stata xtreg function, and the plm function from the R package plm by Croissant and Millo (2008), are reported in Table 1.
Results show that there are no differences in the estimated coefficients and t statistics between the three programs.
Numerical checks for the instrumental variables panel data models of fixed effects, random effects, and Baltagi's error components using the MATLAB ivpanel function, the Stata xtivreg function, and plm function from the R package plm are reported in Table 2. Again, results are equal regardless of the software, although there is a slightly difference in the last decimal between Stata and the other two.
Spatial panel estimations using the MATAB function spanel are checked against the R package splm by Millo and Piras (2012), using the spgm function, which performs a GM implementation. Since a large variety of models can be computed for spatial panels depending on the  Table 3: Comparison of estimated coefficients and t statistics for spatial panel data against R.
spatial lags we assume, we perform the numerical checks of a spatial SARAR model, which includes a spatial lag of the dependent variable and a spatial lag of the error structure, both with fixed and random effects. Although different interpretations of the literature as well as on the choice of techniques when implementing spatial econometrics lead to some differences in the results (Bivand and Piras 2015), results in Table 3 reveal no differences in the estimated coefficients and t statistics between MATLAB and R.

Conclusions
The new Panel Data Toolbox covers a wide variety of balanced and unbalanced panel data models in an organized environment for MATLAB. Estimation methods include fixed, between and random effects, as well as instrumental and spatial panels, and the full set of relevant tests for testing poolability, individual effects, serial correlation, cross-sectional dependence, overidentification and spatial autocorrelation.
Numerical checks show the consistency of the results, as the estimated coefficients and t statistics are equal to those reported by Stata and R for panel, instrumental panels and spatial panel data methods. This positions the new toolbox as a valid self-contained package for panel data econometrics in MATLAB.
Since the code is freely available in an open source repository on GitHub at https://github. com/javierbarbero/PanelDataMATLAB, under the GNU General Public License version 3, users will benefit from the review, collaboration and contributions from the community, and can check the syntax to learn how the theoretical formulas of econometrics can be translated into code.