phtt: Panel Data Analysis with Heterogeneous Time Trends in R

The R package phtt provides estimation procedures for panel data with large dimensions n, T, and general forms of unobservable heterogeneous effects. Specifically, it implements the estimation procedures of Bai (2009) and Kneip, Sickles, and Song (2012), which complement one another very well: both models assume the unobservable heterogeneous effects to have a factor structure. Kneip et al. (2012) consider the case in which the time-varying common factors have relatively smooth patterns, including strongly positively auto-correlated stationary as well as non-stationary factors, whereas the method of Bai (2009) focuses on stochastically bounded factors such as ARMA processes. Additionally, the phtt package provides a wide range of dimensionality criteria in order to estimate the number of unobserved factors jointly with the remaining model parameters.


Introduction
One of the main difficulties, and at the same time one of the appealing features, of panel models is the need to deal with unobserved heterogeneity. Classical panel models, such as fixed effects or random effects models, capture unobserved heterogeneity using dummy variables or structural assumptions on the error term (see, e.g., Baltagi 2005). In both cases the unobserved heterogeneity is assumed to remain constant over time within each cross-sectional unit, apart from a possible common time trend. This assumption may be reasonable for panel data with a fairly small temporal dimension T; for panel data with large T, however, it very often becomes implausible.
Nowadays, the availability of panel data with large cross-sectional dimension n and large time dimension T has triggered the development of a new class of panel data models. Recent discussions by Ahn, Lee, and Schmidt (2013), Pesaran (2006), Bai (2009), Bai, Kao, and Ng (2009), and Kneip et al. (2012) have focused on advanced panel models for which the unobservable individual effects are allowed to have heterogeneous (i.e., individual specific) time trends that can be approximated by a factor structure. The basic form of this new class of panel models can be presented as follows:

$$y_{it} = \sum_{j=1}^{P} x_{itj}\,\beta_j + \nu_{it} + \epsilon_{it} \quad \text{for } i \in \{1,\dots,n\} \text{ and } t \in \{1,\dots,T\}, \qquad (1)$$

where $y_{it}$ is the dependent variable for each individual $i$ at time $t$, $x_{itj}$ is the $j$th element of the vector of explanatory variables $x_{it} \in \mathbb{R}^P$, and $\epsilon_{it}$ is the idiosyncratic error term. The time-varying individual effects $\nu_{it} \in \mathbb{R}$ of individual $i$ for the time points $t \in \{1,\dots,T\}$ are assumed to be generated by $d$ common time-varying factors. The following two specifications of the time-varying individual effects $\nu_{it}$ are implemented in our R package phtt:

$$\nu_{it} = \sum_{l=1}^{d} \lambda_{il}\, f_{lt} \quad \text{for the model of Bai (2009)}, \qquad v_i(t) = \sum_{l=1}^{d} \lambda_{il}\, f_l(t) \quad \text{for the model of Kneip et al. (2012)}. \qquad (2)$$

Here, $\lambda_{il}$ are unobserved individual loadings parameters, $f_{lt}$ are the unobserved common factors for the model of Bai (2009), $f_l(t)$ are the unobserved common factors for the model of Kneip et al. (2012), and $d$ is the unknown factor dimension.
Note that the explicit consideration of an intercept in model (1) is not necessary but may facilitate interpretation. If $x_{it}$ includes an intercept, the time-varying individual effects $\nu_{it}$ are centered around zero. If $x_{it}$ does not include an intercept, the time-varying individual effects $\nu_{it}$ are centered around the overall mean.
Model (1) includes the classical panel data models with additive time-invariant individual effects and common time-specific effects. This model is obtained by choosing $d = 2$ with a first common factor $f_{1t} = 1$ for all $t \in \{1,\dots,T\}$ that has individual loadings parameters $\lambda_{i1}$, and a second common factor $f_{2t}$ that has the same loadings parameter $\lambda_{i2} = 1$ for all $i \in \{1,\dots,n\}$; in this case $\nu_{it} = \lambda_{i1} \cdot 1 + 1 \cdot f_{2t} = \alpha_i + \theta_t$.
An intrinsic problem of factor models lies in the fact that the true factors are only identifiable up to rotation. In order to ensure the uniqueness of these parameters, $d^2$ restrictions are required. The usual normalization conditions are given by

(a) $\frac{1}{T}\sum_{t=1}^{T} f_{lt}^2 = 1$ for all $l \in \{1,\dots,d\}$,
(b) $\sum_{t=1}^{T} f_{lt}\, f_{kt} = 0$ for all $l, k \in \{1,\dots,d\}$ with $k \neq l$, and
(c) $\sum_{i=1}^{n} \lambda_{il}\, \lambda_{ik} = 0$ for all $l, k \in \{1,\dots,d\}$ with $k \neq l$;

see, e.g., Bai (2009) and Kneip et al. (2012). For the model of Kneip et al. (2012), $f_{lt}$ in conditions (a) and (b) has to be replaced by $f_l(t)$. As usual in factor models, a certain degree of indeterminacy remains, because the factors can only be determined up to sign changes and different orderings. Kneip et al. (2012) consider the case in which the common factors $f_l(t)$ show relatively smooth patterns over time. This includes strongly positively auto-correlated stationary as well as non-stationary factors. The authors propose to approximate the time-varying individual effects $v_i(t)$ by smooth nonparametric functions, say, $\vartheta_i(t)$. In this way (1) becomes a semiparametric model, and its estimation is done using a two-step procedure, which we explain in more detail in Section 2. The asymptotic properties of this method rely, however, on independent and identically distributed errors.
Alternatively, Bai (2009) allows for weak forms of heteroskedasticity and dependence in both the time and cross-section dimensions and proposes an iterated least squares approach to estimate (1) for stationary time-varying individual effects $\nu_{it}$, such as ARMA processes, or non-stationary deterministic trends. However, Bai (2009) rules out a large class of non-stationary processes, such as integrated stochastic processes.
Moreover, Bai (2009) assumes the factor dimension d to be a known parameter, which is usually not the case. Therefore, the phtt package uses an algorithmic refinement of Bai's method proposed by Bada and Kneip (2014) in order to estimate the number of unobserved common factors d jointly with the remaining model parameters; see Section 4 for more details.
Besides the implementations of the methods proposed by Kneip et al. (2012), Bai (2009), and Bada and Kneip (2014), the R package phtt comes with a wide range of criteria (16 in total) for estimating the factor dimension d. The main functions of the phtt package are the following:

KSS(): Computes the estimators of the model parameters according to the method of Kneip et al. (2012); see Section 2.
Eup(): Computes the estimators of the model parameters according to the method of Bai (2009) and Bada and Kneip (2014); see Section 4.
OptDim(): Allows for a comparison of the estimated factor dimensions $\hat d$ obtained from many different (in total 16) criteria; see Section 3.

For the returned objects the usual print(), summary(), plot(), coef(), and residuals() methods are provided.
Standard methods for estimating models for panel and longitudinal data are also implemented in the R (R Core Team 2014) packages plm (Croissant and Millo 2008), nlme (Pinheiro, Bates, DebRoy, Sarkar, and R Core Team 2014), and lme4 (Bates, Maechler, Bolker, and Walker 2014); see Croissant and Millo (2008) for an exhaustive comparison of these packages. Recently, Millo and Piras (2012) published the R package splm for spatial panel data models. The phtt package further extends this toolbox for statisticians and econometricians: it provides methods for analyzing panel data with large dimensions n and T in the presence of time-varying unobserved heterogeneity.
To the best of our knowledge, our phtt package (Bada and Liebl 2014) is the first software package that offers the estimation methods of Bai (2009) and Kneip et al. (2012). Regarding the different dimensionality criteria that can be accessed via the function OptDim(), only those of Bai and Ng (2002) are publicly available, as MATLAB code (The MathWorks Inc. 2012) from the homepage of Serena Ng (http://www.columbia.edu/~sn2294/).
To demonstrate the use of our functions, we re-explore the well-known Cigar dataset, which is frequently used in the panel data literature. The panel contains the per capita cigarette consumption of n = 46 American states from 1963 to 1992 (T = 30) as well as data on per capita income and cigarette prices (see, e.g., Baltagi and Levin 1986, for more details on the dataset).
We follow Baltagi and Li (2004), who estimate the following panel model:

$$\ln(\text{Consumption}_{it}) = \mu + \beta_1 \ln(\text{Price}_{it}) + \beta_2 \ln(\text{Income}_{it}) + \nu_{it} + \epsilon_{it}. \qquad (3)$$

Here, $\text{Consumption}_{it}$ represents the sales of cigarettes (packs of cigarettes per capita), $\text{Price}_{it}$ is the average real retail price of cigarettes, and $\text{Income}_{it}$ is the real disposable income per capita. The index $i \in \{1,\dots,46\}$ denotes the single states and the index $t \in \{1,\dots,30\}$ denotes the year. We revisit this model, but allow for a multidimensional factor structure such that $\nu_{it} = \sum_{l=1}^{d} \lambda_{il}\, f_{lt}$. The Cigar dataset can be obtained from the phtt package using the call data("Cigar", package = "phtt"). Figure 1 shows the panels of the variables $\ln(\text{Consumption}_{it})$, $\ln(\text{Price}_{it})$, and $\ln(\text{Income}_{it})$.
Section 2 is devoted to a short introduction of the method of Kneip et al. (2012), which is appropriate for relatively smooth common factors f l (t). Section 3 presents the usage of the function OptDim(), which provides access to a wide range of panel dimensionality criteria recently discussed in the literature on factor models. Section 4 deals with the explanation as well as application of the panel method proposed by Bai (2009), which is basically appropriate for stationary and relatively unstructured common factors f lt .

Panel models for heterogeneity in time trends
The panel model proposed by Kneip et al. (2012) can be presented as follows:

$$y_{it} = \sum_{j=1}^{P} x_{itj}\,\beta_j + v_i(t) + \epsilon_{it}, \quad v_i(t) = \sum_{l=1}^{d} \lambda_{il}\, f_l(t), \qquad (4)$$

where the time-varying individual effects $v_i(t)$ are parametrized in terms of the common nonparametric basis functions $f_1(t), \dots, f_d(t)$. The asymptotic properties of this method rely on second order differences of $v_i(t)$, which apply to continuous functions as well as to classical discrete stochastic time series processes such as (S)AR(I)MA processes. Therefore, the functional notation of the time-varying individual effects $v_i(t)$ and their underlying common factors $f_1(t), \dots, f_d(t)$ does not restrict them to a purely functional interpretation. The main idea of this approach is to approximate the time series of individual effects $v_i(t)$ by smooth functions $\vartheta_i(t)$.
The estimation approach proposed by Kneip et al. (2012) relies on a two-step procedure: first, estimates of the common slope parameters β j and the time-varying individual effects v i (t) are obtained semi-parametrically. Second, functional principal component analysis is used to estimate the common factors f 1 (t), . . . , f d (t), and to re-estimate the time-varying individual effects v i (t) more efficiently. In the following we describe both steps in more detail.
Step 1: The unobserved parameters $\beta_j$ and $v_i(t)$ are estimated by minimizing

$$\sum_{i=1}^{n} \sum_{t=1}^{T} \Big( y_{it} - \sum_{j=1}^{P} x_{itj}\,\beta_j - \vartheta_i(t) \Big)^2 + \kappa \sum_{i=1}^{n} \int \big( \vartheta_i^{(m)}(t) \big)^2 \, dt \qquad (6)$$

over all $\beta_j \in \mathbb{R}$ and all $m$-times continuously differentiable functions $\vartheta_i(t)$, where $\vartheta_i^{(m)}(t)$ denotes the $m$th derivative of the function $\vartheta_i(t)$. A first approximation of $v_i(t)$ is then given by $\tilde v_i(t) := \hat\vartheta_i(t)$. Spline theory implies that any solution $\hat\vartheta_i(t)$ possesses an expansion in terms of a natural spline basis $z_1(t), \dots, z_T(t)$ such that $\hat\vartheta_i(t) = \sum_{s=1}^{T} \hat\zeta_{is}\, z_s(t)$; see, e.g., de Boor (2001). Using the latter expression, we can rewrite (6) to formalize the following objective function:

$$\sum_{i=1}^{n} \Big\{ (Y_i - X_i \beta - Z \zeta_i)^\top (Y_i - X_i \beta - Z \zeta_i) + \kappa\, \zeta_i^\top R\, \zeta_i \Big\}, \qquad (7)$$

where $Y_i = (y_{i1}, \dots, y_{iT})^\top$, $X_i = (x_{i1}, \dots, x_{iT})^\top$, $\beta = (\beta_1, \dots, \beta_P)^\top$, and $\zeta_i = (\zeta_{i1}, \dots, \zeta_{iT})^\top$. $Z$ and $R$ are $T \times T$ matrices, where $Z$ has elements $\{z_s(t)\}_{s,t=1,\dots,T}$ and $R$ has elements $\{\int z_s^{(m)}(t)\, z_k^{(m)}(t)\, dt\}_{s,k=1,\dots,T}$. The preselected smoothing parameter $\kappa$ controls the smoothness of $\hat\vartheta_i(t)$. We follow the usual choice of $m = 2$, which leads to cubic smoothing splines.
In contrast to Kneip et al. (2012), we do not specify a common time effect in model (4), but the vector of explanatory variables is allowed to contain an intercept. This means that the time-varying individual effects $v_i(t)$ are not centered around zero for each specific time point t, but around a common intercept term. The separate estimation of the common time effect, say $\theta_t$, is also possible with our phtt package; we discuss this in detail in Section 5.
The solutions are given by

$$\hat\zeta_i = (Z^\top Z + \kappa R)^{-1} Z^\top (Y_i - X_i \hat\beta) \quad \text{and} \quad \hat\beta = \Big( \sum_{i=1}^{n} X_i^\top (I_T - \mathcal{Z}_\kappa)^\top (I_T - \mathcal{Z}_\kappa) X_i \Big)^{-1} \sum_{i=1}^{n} X_i^\top (I_T - \mathcal{Z}_\kappa)^\top (I_T - \mathcal{Z}_\kappa) Y_i,$$

where $\mathcal{Z}_\kappa = Z (Z^\top Z + \kappa R)^{-1} Z^\top$ denotes the smoother matrix.

Step 2: The common factors are obtained from the first $d$ eigenvectors $\hat\gamma_1, \dots, \hat\gamma_d$ that correspond to the largest eigenvalues $\hat\rho_1, \dots, \hat\rho_d$ of the empirical covariance matrix

$$\hat\Sigma = \frac{1}{n} \sum_{i=1}^{n} (Y_i - X_i \hat\beta)(Y_i - X_i \hat\beta)^\top.$$

The estimator of the common factor $f_l(t)$ is then defined by the $l$th scaled eigenvector,

$$\hat f_l(t) = \sqrt{T}\, \hat\gamma_{lt},$$

where $\hat\gamma_{lt}$ is the $t$th element of the eigenvector $\hat\gamma_l$. The scaling factor $\sqrt{T}$ ensures that $\hat f_l(t)$ satisfies the normalization condition $\frac{1}{T} \sum_{t=1}^{T} \hat f_l(t)^2 = 1$ listed above in Section 1. The estimates of the individual loadings parameters $\lambda_{il}$ are obtained by ordinary least squares regressions of $Y_i - X_i \hat\beta$ on $\hat f_l$, where $\hat f_l = (\hat f_l(1), \dots, \hat f_l(T))^\top$. Recall from conditions (a) and (b) that $\hat\lambda_{il}$ can then be calculated as

$$\hat\lambda_{il} = \frac{1}{T} \sum_{t=1}^{T} \hat f_l(t) \big( y_{it} - x_{it}^\top \hat\beta \big).$$

A crucial part of the estimation procedure of Kneip et al. (2012) is the re-estimation of the time-varying individual effects $v_i(t)$ in Step 2 by $\hat v_i(t) := \sum_{l=1}^{d} \hat\lambda_{il}\, \hat f_l(t)$, where the factor dimension $d$ can be determined, e.g., by the sequential testing procedure of Kneip et al. (2012) or by any other dimensionality criterion; see also Section 3. This re-estimation leads to more efficiently estimated time-varying individual effects. Kneip et al. (2012) derive the consistency of the estimators as $n, T \to \infty$ and show that the common slope estimators are asymptotically normal, with an asymptotic covariance matrix that depends on the error variance $\sigma^2$. A consistent estimator of $\sigma^2$ can be obtained by

$$\hat\sigma^2 = \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} \Big( y_{it} - x_{it}^\top \hat\beta - \hat v_i(t) \Big)^2, \qquad (15)$$

where $\hat v_i(t)$ is computed with a sufficiently large factor dimension. To determine the optimal smoothing parameter $\kappa_{opt}$, Kneip et al. (2012) propose the following cross validation (CV) criterion:

$$CV(\kappa) = \sum_{i=1}^{n} \sum_{t=1}^{T} \Big( y_{it} - x_{it}^\top \hat\beta_{-i} - \sum_{l=1}^{d} \hat\lambda_{-i,l}\, \hat f_{-i,l}(t) \Big)^2, \qquad (16)$$

where $\hat\beta_{-i}$, $\hat\lambda_{-i,l}$, and $\hat f_{-i,l}$ are estimates of the parameters $\beta$, $\lambda_{il}$, and $f_l$ based on the dataset without the $i$th observation. Unfortunately, this criterion is computationally very costly and requires determining the factor dimension $d$ in advance. To overcome this disadvantage, we propose a plug-in smoothing parameter that is discussed in more detail in Section 2.1.
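To fix ideas, the following is a minimal R sketch of Step 2 under the notation above, not the package's internal implementation. The inputs are assumptions: W is the T x n matrix whose ith column is the partial residual $Y_i - X_i\hat\beta$, and d is a given factor dimension.

## Sketch of Step 2: functional principal component analysis of the
## partial residuals (W: T x n matrix, d: given factor dimension).
fpca.step2 <- function(W, d) {
  T <- nrow(W)
  Sigma.hat  <- W %*% t(W) / ncol(W)                        # empirical covariance
  eig        <- eigen(Sigma.hat, symmetric = TRUE)
  F.hat      <- sqrt(T) * eig$vectors[, 1:d, drop = FALSE]  # scaled eigenvectors
  Lambda.hat <- t(W) %*% F.hat / T                          # OLS loadings (conditions a, b)
  V.hat      <- F.hat %*% t(Lambda.hat)                     # re-estimated effects v.hat_i(t)
  list(factors = F.hat, loadings = Lambda.hat, effects = V.hat)
}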

Computational details
Theoretically, it is possible to determine κ by the CV criterion in (16); however, cross validation is computationally very costly. Moreover, Kneip et al. (2012) do not explain how the factor dimension d is to be specified during the optimization process, which is critical since the estimate $\hat d$ is influenced by the choice of κ.
In order to get a quick and effective solution, we propose to determine the smoothing parameter κ by generalized cross validation (GCV). However, we cannot apply the classical GCV formulas as proposed, e.g., in Craven and Wahba (1978), since we do not know the parameters β and $v_i(t)$. Our computational algorithm for determining the GCV smoothing parameter $\kappa_{GCV}$ is based on the method of Cao and Ramsay (2010), who propose optimizing objective functions of the form (7) by updating the parameters iteratively in a functional hierarchy. Formally, the iteration algorithm can be described as follows:

1. For given κ and β, we optimize (7) with respect to $\zeta_i$ to get
$$\hat\zeta_i(\kappa, \beta) = (Z^\top Z + \kappa R)^{-1} Z^\top (Y_i - X_i \beta). \qquad (17)$$

2. By using (17), we minimize (7) with respect to β to get
$$\hat\beta(\kappa) = \Big( \sum_{i=1}^{n} X_i^\top (I_T - \mathcal{Z}_\kappa)^\top (I_T - \mathcal{Z}_\kappa) X_i \Big)^{-1} \sum_{i=1}^{n} X_i^\top (I_T - \mathcal{Z}_\kappa)^\top (I_T - \mathcal{Z}_\kappa) Y_i. \qquad (18)$$

3. Once (17) and (18) are obtained, we minimize the following GCV criterion to calculate $\kappa_{GCV}$:
$$GCV(\kappa) = \frac{\frac{1}{nT} \sum_{i=1}^{n} \big\| (I_T - \mathcal{Z}_\kappa)(Y_i - X_i \hat\beta(\kappa)) \big\|^2}{\big( 1 - \operatorname{tr}(\mathcal{Z}_\kappa)/T \big)^2}. \qquad (19)$$

The program starts with initial estimates of β and κ and proceeds with Steps 1, 2, and 3 in recurrence until convergence of all parameters. The initial value $\hat\beta_{start}$ is defined in (49), and the initial value $\kappa_{start}$ is the GCV smoothing parameter of the residuals $Y_i - X_i \hat\beta_{start}$.
The advantage of this approach is that the inversion of the P × P matrix in (18) does not have to be updated during the iteration process. Moreover, the determination of the GCV minimizer in (19) can be easily performed in R using the function smooth.spline(), which calls a rapid C routine.
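The delegation to smooth.spline() can be illustrated with a minimal sketch for a single residual series; the series resid.i below is a synthetic stand-in, not data from the application.

## Sketch: GCV smoothing of one residual series via stats::smooth.spline().
## With cv = FALSE, the smoothing parameter is chosen by generalized
## cross validation and returned in the components $spar and $lambda.
T <- 30
resid.i <- cumsum(rnorm(T))                  # stand-in for Y_i - X_i %*% beta.hat
fit <- smooth.spline(x = 1:T, y = resid.i, cv = FALSE)
fit$lambda                                   # GCV-selected smoothing parameter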
Note, however, that the GCV smoothing parameter $\kappa_{GCV}$ in (19) does not explicitly account for the factor structure of the time-varying individual effects $v_i(t)$ as formalized in (2). In fact, if the assumption of a factor structure is true, the goal should not be to obtain optimal estimates of $v_i(t)$ but rather optimal estimates of the common factors $f_l(t)$, which implies that the optimal smoothing parameter $\kappa_{opt}$ will be smaller than $\kappa_{GCV}$; see Kneip et al. (2012).
If the goal is to obtain optimal estimates of $f_l(t)$, $\kappa_{GCV}$ can be used as an upper bound when minimizing the CV criterion (16) (via setting the argument CV = TRUE), which, however, can take some time. Note that this optimal smoothing parameter $\kappa_{opt}$ depends on the unknown factor dimension d. Therefore, we propose to, first, estimate the dimension based on the smoothing parameter $\kappa_{GCV}$ and, second, to use the estimated dimension $\hat d$ (via explicitly setting the dimension argument factor.dim) in order to determine the dimension-specific smoothing parameter $\kappa_{opt}$ (via setting the argument CV = TRUE); see the sketch below.
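A sketch of this two-stage procedure, using the Cigar variables constructed in the application below; the accessor $used.dim is our assumption about where the fitted object stores the estimated dimension.

## Two-stage choice of the smoothing parameter (sketch):
fit.gcv <- KSS(l.Consumption ~ l.Price + l.Income)     # kappa chosen by GCV
d.hat   <- fit.gcv$used.dim                            # dimension under kappa_GCV (assumed accessor)
fit.cv  <- KSS(l.Consumption ~ l.Price + l.Income,     # dimension-specific kappa_opt
               factor.dim = d.hat, CV = TRUE)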

Application
This section is devoted to the application of the method of Kneip et al. (2012) discussed above. The method is accessible through the function KSS(), which has the following arguments:

KSS(formula, additive.effects = c("none", "individual", "time", "twoways"),
    consult.dim.crit = FALSE, d.max = NULL, sig2.hat = NULL,
    factor.dim = NULL, level = 0.01, spar = NULL, CV = FALSE,
    convergence = 1e-06, restrict.mode = c("restrict.factors",
    "restrict.loadings"), ...)

The argument formula is compatible with the usual R-specific symbolic specification of the model. The only specificity here is that the variables have to be defined as T × n matrices, where T is the temporal dimension and n is the number of cross-sectional units. The argument additive.effects makes it possible to extend model (4) with additional additive individual, time, or twoways effects as discussed in Section 5.
If the logical argument consult.dim.crit is set to TRUE, all dimensionality criteria discussed in Section 3 are computed and the user is asked to choose one of their results.
The arguments d.max and sig2.hat are required for the computation of some of the dimensionality criteria discussed in Section 3. If their default values are maintained, the function internally computes d.max = $\lfloor \min\{\sqrt{n}, \sqrt{T}\} \rfloor$ and sig2.hat as in (15), where $\lfloor x \rfloor$ denotes the integer part of $x$. The argument level allows adjusting the significance level for the dimensionality testing procedure (21) of Kneip et al. (2012); see Section 3.
CV is a logical argument. If it is set to TRUE, the cross validation criterion (16) of Kneip et al. (2012) is computed. In the default case, the function uses the GCV method discussed in Section 2.1.
The factor dimension d can be pre-specified by the argument factor.dim. Recall from restriction (a) that $\frac{1}{T} \sum_{t=1}^{T} \hat f_l(t)^2 = 1$. Alternatively, it is possible to standardize the individual loadings parameters such that $\frac{1}{n} \sum_{i=1}^{n} \hat\lambda_{il}^2 = 1$, which can be done by setting restrict.mode = "restrict.loadings". As an illustration, we estimate the cigarettes model (3) introduced in Section 1,

$$\ln(\text{Consumption}_{it}) = \mu + \beta_1 \ln\big(\text{Price}_{it}\big) + \beta_2 \ln\big(\text{Income}_{it}\big) + v_i(t) + \epsilon_{it}, \quad \text{with} \quad v_i(t) = \sum_{l=1}^{d} \lambda_{il}\, f_l(t). \qquad (20)$$

In the following lines of code we load the Cigar dataset and take logarithms of the three variables $\text{Consumption}_{it}$, $\text{Price}_{it}/\text{cpi}_t$, and $\text{Income}_{it}/\text{cpi}_t$, where $\text{cpi}_t$ is the consumer price index. The variables are stored as T × n matrices. This is necessary because the formula argument of the KSS() function takes the panel variables as matrices in which the number of rows equals the temporal dimension T and the number of columns equals the individual dimension n.
R> library("phtt")
R> data("Cigar", package = "phtt")
R> n <- 46; T <- 30
R> l.Consumption <- log(matrix(Cigar$sales, T, n))
R> cpi           <- matrix(Cigar$cpi, T, n)
R> l.Price       <- log(matrix(Cigar$price, T, n) / cpi)
R> l.Income      <- log(matrix(Cigar$ndi, T, n) / cpi)

The model parameters $\beta_1$ and $\beta_2$, the factors $f_l(t)$, the loadings parameters $\lambda_{il}$, and the factor dimension d can be estimated by the KSS() function with its default arguments, and inferences about the slope parameters can be obtained using the summary() method. The effects of the log-real prices for cigarettes and the log-real incomes on the log-sales of cigarettes are highly significant and in line with results in the literature. The summary output reports an estimated factor dimension of $\hat d = 6$. In order to get a visual impression of the six estimated common factors $\hat f_1(t), \dots, \hat f_6(t)$ and the estimated time-varying individual effects $\hat v_1(t), \dots, \hat v_n(t)$, we provide a plot() method for objects of class 'summary.KSS', which are returned by the summary() method for 'KSS' objects.
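The estimation and summary calls are presumably of the following minimal form (Cigar.KSS.summary is the object plotted next):

R> Cigar.KSS <- KSS(formula = l.Consumption ~ l.Price + l.Income)
R> (Cigar.KSS.summary <- summary(Cigar.KSS))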

R> plot(Cigar.KSS.summary)
The left panel of Figure 2 shows the six estimated common factors $\hat f_1(t), \dots, \hat f_6(t)$, and the right panel of Figure 2 shows the n = 46 estimated time-varying individual effects $\hat v_1(t), \dots, \hat v_n(t)$. The common factors are ordered according to the decreasing sequence of their eigenvalues. Obviously, the first common factor is nearly time-invariant; this suggests extending the model (20) by additive individual (time-invariant) effects; see Section 5 for more details.
By setting the logical argument consult.dim.crit = TRUE, the user can choose from other dimensionality criteria, which are discussed in Section 3. Note that the consideration of different factor dimensions d would not alter the results for the slope parameters β since the estimation procedure of Kneip et al. (2012) for the slope parameters β does not depend on the dimensionality parameter d.

Panel criteria for selecting the number of factors
In order to estimate the factor dimension d, Kneip et al. (2012) propose a sequential testing procedure based on a test statistic (21) that measures whether the residual variability remaining after subtracting d estimated factors can be explained by the error variance alone; under $H_0: \dim = d$ the standardized statistic is asymptotically standard normal. The selection method can be described as follows: choose a significance level α (e.g., α = 1%) and begin with $H_0: d = 0$. If the null hypothesis is rejected, go on with d = 1, 2, 3, … until $H_0$ cannot be rejected. The estimated dimension $\hat d$ is then the smallest dimension d for which $H_0$ cannot be rejected; see the sketch below.
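The selection rule itself can be sketched in a few lines of R; pval.KSS() is a hypothetical helper returning the p-value of the test of $H_0: \dim = d$, not a function of the package.

## Sequential dimension selection (sketch):
select.dim <- function(pval.KSS, alpha = 0.01, d.max = 10) {
  for (d in 0:d.max) {
    if (pval.KSS(d) > alpha) return(d)   # first dimension not rejected
  }
  d.max                                  # fallback if all d are rejected
}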
The dimensionality criterion of Kneip et al. (2012) can be used for stationary as well as nonstationary factors. However, this selection procedure has a tendency to ignore factors that are weakly auto-correlated. As a result, the number of factors can be underestimated.
More robust against this kind of underestimation are the criteria of Bai and Ng (2002). The basic idea of their approach is to find a suitable penalty term $g_{nT}$ that counteracts the undesired variance reduction caused by an increasing number of factors. Formally, $\hat d$ is obtained by minimizing the criterion

$$PC(l) = \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} \big( y_{it} - \hat y_{it}(l) \big)^2 + l\, g_{nT}$$

for all $l \in \{1, 2, \dots\}$, where $\hat y_{it}(l)$ is the fitted value for a given factor dimension l. To estimate the dimension of stationary factors consistently, Bai and Ng (2002) propose specifying $g_{nT}$ by one of the following penalty terms:

$$g_{nT}^{(PC1)} = \hat\sigma^2 \frac{n+T}{nT} \ln\Big(\frac{nT}{n+T}\Big), \quad g_{nT}^{(PC2)} = \hat\sigma^2 \frac{n+T}{nT} \ln\big(\min\{n, T\}\big), \quad g_{nT}^{(PC3)} = \hat\sigma^2 \frac{\ln\big(\min\{n, T\}\big)}{\min\{n, T\}}, \quad g_{nT}^{(BIC3)} = \hat\sigma^2 \frac{(n + T - l)\ln(nT)}{nT},$$

where $\hat\sigma^2$ is the sample variance estimator of the residuals $\hat\epsilon_{it}$. The corresponding criteria are denoted by PC1, PC2, PC3, and BIC3, respectively. Note that only the first three penalties satisfy the requirements of Theorem 2 in Bai and Ng (2002), i.e., (i) $g_{nT} \to 0$ and (ii) $\min\{n, T\}\, g_{nT} \to \infty$ as $n, T \to \infty$. These conditions ensure consistency of the selection procedure without imposing additional restrictions on the proportional behavior of n and T. Requirement (i) is not always fulfilled for BIC3, especially when n is very large relative to T or T is very large relative to n (e.g., n = exp(T) or T = exp(n)). In practice, however, BIC3 seems to perform very well, especially when the idiosyncratic errors are cross-correlated. A sketch of the PC1 criterion is given below.
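The following self-contained R sketch implements the PC1 criterion for a T x n data matrix W (e.g., regression residuals); all names are our own, and the variance estimate uses the d.max fit as described next.

## PC1 criterion of Bai and Ng (2002), via eigenvalues of W W' / (nT):
PC1.dim <- function(W, d.max = 8) {
  T <- nrow(W); n <- ncol(W)
  rho <- eigen(W %*% t(W) / (n * T), symmetric = TRUE)$values
  V   <- c(sum(rho), sum(rho) - cumsum(rho)[1:d.max])  # V(0), V(1), ..., V(d.max)
  sigma2 <- V[d.max + 1]                               # variance estimate at d.max
  g <- sigma2 * (n + T) / (n * T) * log(n * T / (n + T))
  which.min(V + (0:d.max) * g) - 1                     # minimizing dimension
}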
The variance estimator $\hat\sigma^2$ can be obtained by

$$\hat\sigma^2 = \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} \big( y_{it} - \hat y_{it}(d_{max}) \big)^2,$$

where $d_{max}$ is an arbitrary maximal dimension that is larger than d. This kind of variance estimation can, however, be inappropriate in some cases, especially when $\hat\sigma^2(d_{max})$ underestimates the true variance. To overcome this problem, Bai and Ng (2002) propose three additional criteria (IC1, IC2, and IC3) of the form

$$IC(l) = \ln\Big( \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} \big( y_{it} - \hat y_{it}(l) \big)^2 \Big) + l\, g_{nT},$$

in which the penalty terms correspond to those of PC1, PC2, and PC3 without the scaling factor $\hat\sigma^2$. In order to improve the finite sample performance of IC1 and IC2, Alessi, Barigozzi, and Capasso (2010) propose to multiply the penalties $g_{nT}^{(IC1)}$ and $g_{nT}^{(IC2)}$ by a positive constant c and to apply the calibration strategy of Hallin and Liška (2007). The choice of c is based on inspecting the behavior of the criterion over J different tuples of n and T, i.e., $(n_1, T_1), \dots, (n_J, T_J)$, and over different values of c in a pre-specified grid. We denote the refined criteria in our package by ABC.IC1 and ABC.IC2, respectively. Note that such a modification does not affect the asymptotic properties of the dimensionality estimator.
Under similar assumptions, Ahn and Horenstein (2013) propose selecting d by maximizing the ratio of adjacent eigenvalues (or the ratio of their growth rates). The criteria are referred to as the eigenvalue ratio (ER) and the growth ratio (GR) and are defined as follows:

$$ER(l) = \frac{\hat\rho_l}{\hat\rho_{l+1}} \quad \text{and} \quad GR(l) = \frac{\ln\big( V(l-1)/V(l) \big)}{\ln\big( V(l)/V(l+1) \big)}, \quad \text{where } V(l) = \sum_{j = l+1} \hat\rho_j.$$

Note that the theory of the above dimensionality criteria PC1, PC2, PC3, BIC3, IC1, IC2, IC3, ABC.IC1, ABC.IC2, KSS.C, ER, and GR is developed for stochastically bounded factors. In order to estimate the number of unit-root factors, Bai (2004) proposes modified panel criteria, denoted IPC1, IPC2, and IPC3, in which the penalty terms are adjusted to the non-stationary setting. Alternatively, Onatski (2010) introduced a threshold approach based on the differences of adjacent eigenvalues of the sample covariance matrix, which can be used for both stationary and non-stationary factors. The estimated dimension is obtained by

$$\hat d = \max\big\{ l \le d_{max} : \hat\rho_l - \hat\rho_{l+1} \ge \delta \big\},$$

where δ is a positive threshold, estimated iteratively from the data. We refer to this criterion as ED, which stands for eigenvalue differences.
As an illustration, imagine that we are interested in estimating the factor dimension of the variable $\ln(\text{Consumption}_{it})$ with the dimensionality criterion "PC1". The function OptDim() requires a T × n matrix as input variable; a call is sketched below. In order to help users choose the most appropriate dimensionality criterion for the data, 'OptDim' objects are provided with a plot() method. This method displays, in descending order, the magnitudes of the eigenvalues in percent of the total variance and indicates where the selected criteria detect the dimension; see Figure 3. After entering a number of factors, e.g., 6, the corresponding feedback is printed.
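Calls of the following form (the argument names Obj and criteria, and the combination of criteria in the second call, are our assumptions about the interface):

R> OptDim(Obj = l.Consumption, criteria = "PC1")
R> plot(OptDim(Obj = l.Consumption, criteria = c("PC1", "ER", "ED")))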
The asymptotic properties of Bai's method rely, among others, on the following assumption:

$$\frac{1}{T} \sum_{t=1}^{T} f_t f_t^\top \longrightarrow \Sigma_F \quad \text{as } T \to \infty, \qquad (42)$$

where $\Sigma_F$ is a fixed positive definite d × d matrix. This allows the factors to follow a deterministic time trend such as $f_t = t/T$ or to be stationary dynamic processes such that $f_t = \sum_{j=1}^{\infty} C_j e_{t-j}$, where the $e_t$ are i.i.d. zero mean stochastic components. It is, however, important to note that such an assumption rules out a large class of non-stationary factors such as I(p) processes with p ≥ 1. Bai (2009) proposes to estimate the model parameters β, F, and $\Lambda$ by minimizing the following least squares objective function:

$$S(\beta, F, \Lambda) = \sum_{i=1}^{n} \big( Y_i - X_i \beta - F \lambda_i \big)^\top \big( Y_i - X_i \beta - F \lambda_i \big), \qquad (43)$$

where $F = (f_1, \dots, f_T)^\top$ is the T × d matrix of common factors and $\lambda_i$ is the d × 1 vector of individual loadings.

Model with known number of factors d
For each given F, the OLS estimator of β can be obtained by

$$\hat\beta(F) = \Big( \sum_{i=1}^{n} X_i^\top M_F X_i \Big)^{-1} \sum_{i=1}^{n} X_i^\top M_F Y_i, \qquad (44)$$

where $M_F = I_T - F (F^\top F)^{-1} F^\top$ projects on the orthogonal complement of F. The idea of Bai (2009) is to start with initial values for β or F and to calculate the estimators iteratively, alternating between the least squares step (44) and a principal component step for F; a sketch is given below. The method requires, however, the factor dimension d to be known, which is usually not the case in empirical applications.
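A compact R sketch of this alternation under an assumed data layout (Y a T x n matrix, X a T x n x P array); all names are our own, and this is not the package's implementation.

## Iterated least squares for a known factor dimension d (sketch):
iterLS <- function(Y, X, d, tol = 1e-6, max.iter = 500) {
  T <- dim(Y)[1]; n <- dim(Y)[2]; P <- dim(X)[3]
  beta <- rep(0, P)                                  # naive starting value
  for (iter in 1:max.iter) {
    ## Factor step: principal components of the partial residuals
    W <- Y
    for (p in 1:P) W <- W - X[, , p] * beta[p]
    eig   <- eigen(W %*% t(W) / (n * T), symmetric = TRUE)
    F.hat <- sqrt(T) * eig$vectors[, 1:d, drop = FALSE]
    M.F   <- diag(T) - F.hat %*% t(F.hat) / T        # annihilator of F
    ## Beta step: OLS after projecting out the factor space
    A <- matrix(0, P, P); b <- numeric(P)
    for (i in 1:n) {
      Xi <- matrix(X[, i, ], nrow = T)
      A  <- A + t(Xi) %*% M.F %*% Xi
      b  <- b + t(Xi) %*% M.F %*% Y[, i]
    }
    beta.new <- drop(solve(A, b))
    if (max(abs(beta.new - beta)) < tol) break
    beta <- beta.new
  }
  list(beta = beta.new, factors = F.hat)
}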
A feasible estimator of (44) can be obtained by using an arbitrary large dimension $d_{max}$ greater than d. The factor dimension can subsequently be estimated by applying the criteria of Bai and Ng (2002) to the residuals $Y_i - X_i \hat\beta(\hat F(d_{max}))$, as suggested by Bai (2009). This strategy can lead, however, to inefficient estimation and spurious interpretation of β due to over-parametrization.

Model with unknown number of factors d
In order to estimate d jointly with β, F, and $\Lambda$, Bada and Kneip (2014) propose to integrate a penalty term into the objective function to be globally optimized. In this case, the optimization criterion can be defined as a penalized least squares objective function of the form

$$S_{pen}(\beta, F, \Lambda, l) = \frac{1}{nT} \sum_{i=1}^{n} \big( Y_i - X_i \beta - F \lambda_i \big)^\top \big( Y_i - X_i \beta - F \lambda_i \big) + l\, g_{nT}. \qquad (45)$$

The role of the additional term $l\, g_{nT}$ is to pick up the dimension $\hat d$ of the unobserved factor structure. The penalty $g_{nT}$ can be chosen according to Bai and Ng (2002). The estimation algorithm is based on the parameter cascading strategy of Cao and Ramsay (2010), which in this case can be described as follows:

1. Minimizing (45) with respect to $\lambda_i$ for each given β, F, and d, we get
$$\hat\lambda_i(\beta, F, d) = \frac{1}{T} F^\top (Y_i - X_i \beta). \qquad (46)$$

2. Introducing (46) in (45) and minimizing with respect to F for each given β and d, we get
$$\hat F(\beta, d) = \sqrt{T}\, \hat\gamma(\beta, d), \qquad (47)$$
where $\hat\gamma(\beta, d)$ is a T × d matrix that contains the first d eigenvectors corresponding to the first d eigenvalues $\rho_1, \dots, \rho_d$ of the covariance matrix $\hat\Sigma = (nT)^{-1} \sum_{i=1}^{n} w_i w_i^\top$ with $w_i = Y_i - X_i \beta$.

3. Reintegrating (47) and (46) in (45) and minimizing with respect to β for each given d, we get
$$\hat\beta(d) = \Big( \sum_{i=1}^{n} X_i^\top X_i \Big)^{-1} \sum_{i=1}^{n} X_i^\top \big( Y_i - \hat F \hat\lambda_i \big). \qquad (48)$$

4. Optimizing (45) with respect to l given the results in (46), (47), and (48) allows us to select $\hat d$ as the minimizer of the penalized criterion.

The final estimators are obtained by alternating between an inner iteration to optimize $\hat\beta(d)$, $\hat F(d)$, and $\hat\lambda_i(d)$ for each given d and an outer iteration to select the dimension $\hat d$. The updating process is repeated in its entirety until convergence of all parameters. This is why the estimators are called entirely updated estimators (Eup). In order to avoid over-estimation, Bada and Kneip (2014) propose to re-scale $g_{nT}$ at each iteration stage with the current residual variance estimate. Simulations show that such a calibration can improve the finite sample properties of the estimation method.
It is notable that the objective functions (43) and (45) are not globally convex, and there is no guarantee that the iteration algorithm converges to the global optimum. Therefore, it is important to choose reasonable starting values $\hat d_{start}$ and $\hat\beta_{start}$. We propose to select a large dimension $d_{max}$ and to start the iteration with the following estimate of β:

$$\hat\beta_{start} = \Big( \sum_{i=1}^{n} X_i^\top (I_T - G G^\top) X_i \Big)^{-1} \sum_{i=1}^{n} X_i^\top (I_T - G G^\top) Y_i, \qquad (49)$$

where G is the T × $d_{max}$ matrix of the eigenvectors corresponding to the first $d_{max}$ eigenvalues of the augmented covariance matrix computed from both the dependent variable and the regressors. The intuition behind these starting estimates relies on the fact that the unobserved factors cannot escape from the space spanned by the eigenvectors G. The projection of $X_i$ on the orthogonal complement of G in (49) eliminates the effect of a possible correlation between the observed regressors and the unobserved factors, which can heavily distort the value of β if it is neglected. Greenaway-McGrevy, Han, and Sul (2012) give conditions under which (49) is a consistent estimator of β. In order to avoid misspecifying the model by identifying factors that exist only in $X_i$ and not in $Y_i$, Bada and Kneip (2014) recommend under-scaling the starting common factors $G_l$ that are highly correlated with $X_i$.
According to Bai (2009), the slope estimator $\hat\beta(d)$ for known d is asymptotically normal, where the form of the asymptotic covariance matrix depends on the assumed structure of the idiosyncratic errors: for example, if heteroskedasticity in the time dimension exists and T/n → 0; if correlation and heteroskedasticity in the time dimension exist and T/n → 0; or (Case 6) if heteroskedasticity in both the time and cross-section dimensions exists with T/n² → 0 and n/T² → 0.
In the presence of correlation and heteroskedasticity in panels with proportional dimensions n and T, i.e., n/T → c > 0, the asymptotic distribution of $\hat\beta(d)$ will not be centered at zero. This can lead to false inference when the usual test statistics, such as the t and χ² statistics, are used.
To overcome this problem, Bai (2009) proposes to estimate the asymptotic bias and to correct the estimator as follows:

$$\hat\beta^* = \hat\beta - \frac{1}{n}\hat B - \frac{1}{T}\hat C,$$

where $\hat B$ and $\hat C$ are estimators of the bias terms B and C. In a similar context, Bada and Kneip (2014) prove that estimating d jointly with the remaining model parameters does not affect the asymptotic properties of $\hat\beta(d)$: the asymptotic distribution of $\hat\beta = \hat\beta(\hat d)$ coincides with that of $\hat\beta(d)$, and under Cases 7–8 the same holds for the bias-corrected estimator $\hat\beta^* = \hat\beta^*(\hat d)$.
The asymptotic variance of $\hat\beta$ and the bias terms B and C can be estimated by replacing F, $\Lambda_i$, $Z_{it}$, and $\epsilon_{it}$ with $\hat F$, $\hat\Lambda_i$, $\hat Z_{it}$, and $\hat\epsilon_{it}$, respectively.
In the presence of serial correlation (Cases 5 and 8), consistent estimators for $D_Z$ and C can be obtained by using the usual heteroskedasticity and autocorrelation (HAC) robust limiting covariance. In the presence of cross-sectional correlation, an analogous robust covariance estimator in the cross-section dimension is used.
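The estimation procedure of this section is accessible through the function Eup(). A sketch of its interface, with the argument names used in the following discussion; the exact ordering and defaults shown here are assumptions:

Eup(formula, additive.effects = c("none", "individual", "time", "twoways"),
    dim.criterion = "PC1", d.max = NULL, sig2.hat = NULL, factor.dim = NULL,
    double.iteration = TRUE, start.beta = NULL, max.iteration = 500,
    convergence = 1e-6, restrict.mode = c("restrict.factors",
    "restrict.loadings"), ...)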
Setting the argument double.iteration = FALSE may speed up computations, because the updates of $\hat d$ are then performed simultaneously with those of $\hat F$, without waiting for their inner convergence. However, in this case the convergence of the parameters is less stable than in the default setting.
The argument start.beta allows the user to provide a vector of starting values for the slope parameters $\beta_{start}$. The maximal number of iterations and the convergence condition can be controlled by max.iteration and convergence.
In our application, we take first-order differences of the observed time series, because some factors show temporal trends that can violate the stationarity condition (42); see Figure 2. We consider the following modified cigarettes model:

$$\Delta \ln(\text{Consumption}_{it}) = \beta_1 \Delta \ln\big(\text{Price}_{it}\big) + \beta_2 \Delta \ln\big(\text{Income}_{it}\big) + v_{it} + \epsilon_{it},$$

where, in order to avoid notational mess, we use the same notation for the unobserved time-varying individual effects $v_{it} = \sum_{l=1}^{d} \lambda_{il}\, f_{lt}$ as above in (20). The transformation can easily be performed in R using the standard diff() function:

R> d.l.Consumption <- diff(l.Consumption)
R> d.l.Price       <- diff(l.Price)
R> d.l.Income      <- diff(l.Income)

As previously mentioned for the KSS() function, the formula argument of the Eup() function takes balanced panel variables as T × n dimensional matrices, where the number of rows has to equal the temporal dimension T and the number of columns has to equal the individual dimension n. Inferences about the slope parameters can be obtained by using the summary() method. The type of correlation and heteroskedasticity in the idiosyncratic errors can be specified by choosing one of the corresponding Cases 1–8 described above via the argument error.type = c(1, 2, 3, 4, 5, 6, 7, 8). In the presence of serial correlation (Cases 5 and 8), the kernel weights required for estimating the long-run covariance can be specified externally by passing a vector of weights to the argument kernel.weights. By default, the function internally uses the linearly decreasing weights of Newey and West (1987) with a truncation at $\lfloor \min\{\sqrt{n}, \sqrt{T}\} \rfloor$. If Case 7 or 8 is chosen, the summary() method calculates the realization of the bias-corrected estimators and gives the appropriate inferences. The bias-corrected coefficients can be extracted by applying the coef() method to the object produced by summary(). In our application (see the calls sketched below), the summary output reports that "PC3" detects 5 common factors. The effect of the differenced log-real prices on the differenced log-sales is negative and amounts to −0.31. The estimated effect of the differenced real disposable log-income per capita is 0.16.
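The application calls are presumably of the following form; the use of -1 (reflecting that the intercept is differenced out) and the object name Cigar.Eup are our assumptions, while dim.criterion = "PC3" matches the summary output discussed above:

R> Cigar.Eup <- Eup(d.l.Consumption ~ -1 + d.l.Price + d.l.Income,
+                   dim.criterion = "PC3")
R> summary(Cigar.Eup)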
The estimated factors $\hat f_{lt}$ as well as the individual effects $\hat v_{it}$ can be plotted using the plot() method for 'summary.Eup' objects. The corresponding graphics are shown in Figure 4.

Models with additive and interactive unobserved effects
Even though the classical additive "individual", "time", and "twoways" effects can be absorbed by the factor structure, there are good reasons to model them explicitly. On the one hand, if such effects are present in the true model, neglecting them results in inefficient estimators; see Bai (2009). On the other hand, additive effects can be very useful for interpretation.
Consider now the following model:

$$y_{it} = \mu + \alpha_i + \theta_t + \sum_{j=1}^{P} x_{itj}\,\beta_j + \nu_{it} + \epsilon_{it}. \qquad (51)$$

In order to ensure identification of the additional additive effects $\alpha_i$ and $\theta_t$, we need the following further restrictions:

(d) $\sum_{t=1}^{T} f_{lt} = 0$ for all $l \in \{1, \dots, d\}$, and
(e) $\sum_{t=1}^{T} \theta_t = 0$.

By applying the classical within transformations to the observed variables, we can eliminate the additive effects $\alpha_i$ and $\theta_t$, such that

$$\dot y_{it} = \sum_{j=1}^{P} \dot x_{itj}\,\beta_j + \nu_{it} + \dot\epsilon_{it}, \quad \text{where} \quad \dot y_{it} = y_{it} - \frac{1}{T}\sum_{t=1}^{T} y_{it} - \frac{1}{n}\sum_{i=1}^{n} y_{it} + \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} y_{it},$$

and $\dot x_{itj}$ and $\dot\epsilon_{it}$ are defined analogously; a sketch in R is given below. Note that restrictions (d) and (e) ensure that the transformation does not affect the time-varying individual effects $\nu_{it}$. The parameters μ, $\alpha_i$, and $\theta_t$ can easily be estimated in a second step once an estimate of β is obtained. Because of restrictions (d) and (e), the solution has the same form as in the classical fixed effects model.
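For a T x n data matrix, the two-way within transformation is a one-liner in R; this is a generic sketch, not a package function.

## Two-way within transformation of a T x n matrix Y:
within.twoways <- function(Y) {
  Y - rowMeans(Y) - rep(colMeans(Y), each = nrow(Y)) + mean(Y)
}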
The parameters β and $\nu_{it}$ can be estimated by the estimation procedures introduced above. All possible variants of model (51) are implemented in the functions KSS() and Eup(). The appropriate model can be specified by the argument additive.effects = c("none", "individual", "time", "twoways"):

"none": no additional additive effects,
"individual": additional individual effects $\alpha_i$,
"time": additional time effects $\theta_t$,
"twoways": additional individual and time effects $\alpha_i + \theta_t$.

The presence of μ can be controlled through -1 in the formula argument: a formula with -1 refers to a model without intercept. For identification purposes, however, the -1 in the formula is ignored when a twoways model is specified.
As an illustration, we continue with the application of the KSS() function from Section 2. The left panel of Figure 2 shows that the first common factor is nearly time-invariant. This motivates us to augment the model (20) with time-constant additive effects $\alpha_i$. In this case, it is convenient to use an intercept μ, which yields the following model:

$$\ln(\text{Consumption}_{it}) = \mu + \alpha_i + \beta_1 \ln\big(\text{Price}_{it}\big) + \beta_2 \ln\big(\text{Income}_{it}\big) + v_i(t) + \epsilon_{it}, \qquad (52)$$

where $v_i(t) = \sum_{l=1}^{d} \lambda_{il}\, f_l(t)$. The estimation of the augmented model (52) can be done using the following lines of code.
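The calls are presumably of the following form (the object Cigar2.KSS is reused in the interpretation section below):

R> Cigar2.KSS <- KSS(formula = l.Consumption ~ l.Price + l.Income,
+                    additive.effects = "individual")
R> (Cigar2.KSS.summary <- summary(Cigar2.KSS))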

Specification tests
Model specification is an important step for any empirical analysis. The phtt package is equipped with two types of specification tests: the first is a Hausman-type test appropriate for the model of Bai (2009). The second one examines the existence of a factor structure in Bai's model as well as in the model of Kneip et al. (2012).

Testing the sufficiency of classical additive effects
For the case in which the estimated number of factors amounts to one or two (1 ≤ $\hat d$ ≤ 2), it is interesting to check whether or not these factors can be interpreted as classical "individual", "time", or "twoways" effects. Bai (2009) considers the testing problem

$H_0$: the unobserved effects are classical additive effects vs. $H_1$: the unobserved effects have a factor structure as in (2).

The model with factor structure, as described in Section 4, is consistent under both hypotheses. However, it is less efficient under $H_0$ than the classical within estimator, while the latter is inconsistent under $H_1$ if $x_{it}$ and $v_{it}$ are correlated. These conditions are favorable for applying a Hausman test:

$$J_{Bai} = nT \big( \hat\beta - \hat\beta_{within} \big)^\top \hat\Delta^{-1} \big( \hat\beta - \hat\beta_{within} \big) \xrightarrow{d} \chi^2_P,$$

where $\hat\beta_{within}$ is the classical within least squares estimator, Δ is the asymptotic variance of $\sqrt{nT}(\hat\beta - \hat\beta_{within})$, P is the dimension of the vector β, and $\chi^2_P$ is the χ²-distribution with P degrees of freedom.
Under i.i.d. errors, $J_{Bai}$ can be calculated by replacing Δ with a consistent estimator $\hat\Delta$ based on the residual variance estimator $\hat\sigma^2$. The residual variance estimator $\hat\sigma^2$ is chosen here because it is supposed to be consistent under the null as well as under the alternative hypothesis. The idea behind this trick is to avoid a negative definite $\hat\Delta$. But notice that even with this construction, the possibility of obtaining a negative definite variance estimator cannot be excluded. As an illustration, consider the case in which the true number of factors is greater than the number of factors used under the alternative hypothesis, i.e., the true d > 2. In such a case, the favorable conditions for applying the test can be violated, since the iterated least squares estimator $\hat\beta$ is computed with $\hat d$ ≤ 2 and can be inconsistent under both hypotheses. To avoid such a scenario, we recommend calculating $\hat\beta$ with a large dimension $d_{max}$ instead of $\hat d$ ≤ 2.
The test is implemented in the function checkSpecif(), which takes the following arguments:

R> checkSpecif(obj1, obj2, level = 0.05)

The argument level specifies the significance level. The arguments obj1 and obj2 both take objects of class 'Eup' produced by the function Eup():

obj1: Takes an 'Eup' object from an estimation with "individual", "time", or "twoways" effects and a factor dimension equal to 0, specified as factor.dim = 0.
obj2: Takes an 'Eup' object from an estimation with "none" effects and a large factor dimension $d_{max}$, specified with the argument factor.dim.
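A sketch of the two required fits and the test call; the object names and the choice of $d_{max} = 8$ are our own:

R> within.Eup <- Eup(d.l.Consumption ~ -1 + d.l.Price + d.l.Income,
+                    additive.effects = "twoways", factor.dim = 0)
R> factor.Eup <- Eup(d.l.Consumption ~ -1 + d.l.Price + d.l.Income,
+                    factor.dim = 8)
R> checkSpecif(obj1 = within.Eup, obj2 = factor.Eup, level = 0.05)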
If the test statistic is negative (due to a negative definite $\hat\Delta$), checkSpecif() prints an error message.
Notice that the Hausman test of Bai (2009) assumes the within estimator to be inconsistent under the alternative hypothesis, which requires $x_{it}$ to be correlated with $v_{it}$. If this assumption is violated, the test can suffer from low power to reject the null hypothesis, since the within estimator becomes consistent under both hypotheses. Bai (2009) discusses in his supplementary material another way to check whether a classical panel model with fixed additive effects is sufficient to describe the data. His idea consists of estimating the factor dimension after eliminating the additive effects as described in Section 5. If the obtained estimate of d is zero, the additive model can be considered a reasonable alternative to the model with factor structure. But note that this procedure cannot be considered a formal testing procedure, since information about the significance level of the decision is not provided.
An alternative test for the sufficiency of a classical additive effects model can be given by manipulating the test proposed by Kneip et al. (2012) as described in the following section.

Testing the existence of common factors
This section is concerned with testing the existence of common factors. In contrast to the Hausman-type statistic discussed above, the goal of this test is not merely to decide which model specification is more appropriate for the data, but rather to test in general the existence of common factors, beyond the possible presence of additional classical "individual", "time", or "twoways" effects in the model. The test relies on the dimensionality criterion proposed by Kneip et al. (2012) to test the hypothesis

$H_0: d = 0$ vs. $H_1: d > 0$

after eliminating eventual additive "individual", "time", or "twoways" effects. Under $H_0$ the slope parameters β can be estimated by the classical within estimation method. In this simple case, the dimensionality test of Kneip et al. (2012) reduces to a test statistic $J_{KSS}$ based on the trace of the covariance matrix $\hat\Sigma_w$ of the within residuals, standardized by the variance estimator $\hat\sigma^2$. The reason for this simplification is that under $H_0$ there is no need for smoothing, which allows us to set κ = 0.
We reject $H_0: d = 0$ at a significance level α if $J_{KSS} > z_{1-\alpha}$, where $z_{1-\alpha}$ is the (1−α)-quantile of the standard normal distribution. It is important to note that the performance of the test depends heavily on the accuracy of the variance estimator $\hat\sigma^2$. We propose to use the variance estimators (15) or (55), which are consistent under both hypotheses as long as $\hat d$ is greater than the unknown dimension d. Internally, the test procedure sets $\hat d$ = d.max and $\hat\sigma^2$ as in (55).
This test can be performed for 'Eup' as well as for 'KSS' objects using the function checkSpecif(), leaving the second argument obj2 unspecified. In the following, we apply the test to both models.
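Sketched calls, reusing the fitted objects from the previous sections:

R> checkSpecif(Cigar.Eup, level = 0.01)
R> checkSpecif(Cigar2.KSS, level = 0.01)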
The null hypothesis $H_0: d = 0$ is rejected for both the model of Bai (2009) and the model of Kneip et al. (2012) at a significance level of α = 0.01.

Interpretation
This section outlines an exemplary interpretation of the panel model (52), which was estimated by the function KSS() in Section 5. The interpretation of models estimated by the function Eup() can be done accordingly. For convenience, we restate model (52):

$$\ln(\text{Consumption}_{it}) = \mu + \alpha_i + \beta_1 \ln\big(\text{Price}_{it}\big) + \beta_2 \ln\big(\text{Income}_{it}\big) + v_i(t) + \epsilon_{it}, \quad \text{where } v_i(t) = \sum_{l=1}^{d} \lambda_{il}\, f_l(t).$$
A researcher, who chooses the panel models proposed by Kneip et al. (2012) or Bai (2009), will probably find them attractive due to their ability to control for very general forms of unobserved heterogeneity. Beyond this, a further great advantage of these models is that the time-varying individual effects v i (t) provide a valuable source of information about the differences between the individuals i. These differences are often of particular interest as, e.g., in the literature on stochastic frontier analysis.
The left panel of Figure 5 shows that the different states i have considerably different time-constant levels $\hat\alpha_i$ of cigarette consumption. A classical further econometric analysis could be to regress the additive individual effects $\hat\alpha_i$ on other time-constant variables, such as the states' population compositions, cigarette taxes, etc.
The right panel of Figure 5 shows the five estimated common factors $\hat f_1(t), \dots, \hat f_5(t)$. It is good practice to start the interpretation of the single common factors with an overview of their importance in describing the differences between the $\hat v_i(t)$'s, which is reflected in the variances of the individual loadings parameters $\hat\lambda_{il}$. A convenient summary is given by the shares of the variances of the individual loadings parameters in the total variance of all loadings parameters, which are shown for the common factors $\hat f_1(t), \dots, \hat f_5(t)$ in Table 1.

[Table 1: Variance shares of the common factors $\hat f_1(t), \dots, \hat f_5(t)$.]
The values in Table 1 suggest focusing on the first two common factors, which together explain about 90% of the total variance of the time-varying individual effects $\hat v_i(t)$.
The first two common factors, coef(Cigar2.KSS)$Common.factors[, 1] = $\hat f_1(t)$ and coef(Cigar2.KSS)$Common.factors[, 2] = $\hat f_2(t)$, are plotted as black and red lines in the middle panel of Figure 5. Figure 6 visualizes the differences of the time-varying individual effects $\hat v_i(t)$ in the direction of the first common factor (i.e., $\hat\lambda_{i1}\, \hat f_1(t)$) and in the direction of the second common factor (i.e., $\hat\lambda_{i2}\, \hat f_2(t)$). As for the time-constant individual effects $\hat\alpha_i$, a further econometric analysis could be to regress the individual loadings parameters $\hat\lambda_{i1}$ and $\hat\lambda_{i2}$ on other explanatory time-constant variables.
Generally, for both models proposed by Kneip et al. (2012) and Bai (2009), the estimated common factors and individual loadings parameters can be interpreted as is usually done in the literature on factor models. An important topic that is not covered in this section is the rotation of the common factors. Often, the common factors $f_l$ can be interpreted economically only after the application of an appropriate rotation scheme to the set of factors $\hat f_1, \dots, \hat f_{\hat d}$. This can be done, e.g., using the function varimax() from the stats package; a sketch is given below. Many other rotation schemes can be found in the GPArotation package (R Core Team 2014; Bernaards and Jennrich 2005). Sometimes it is also preferable to standardize the individual loadings parameters instead of the common factors, as is done, e.g., in Ahn, Lee, and Schmidt (2001). This can be achieved by choosing restrict.mode = "restrict.loadings" in the functions KSS() and Eup(), respectively.
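A minimal sketch of such a rotation, reusing the fitted object Cigar2.KSS and the Common.factors component shown above:

## Varimax rotation of the estimated common factors (stats::varimax):
F.hat <- coef(Cigar2.KSS)$Common.factors     # T x d matrix of factors
vm    <- varimax(F.hat, normalize = FALSE)   # orthogonal rotation
F.rot <- unclass(vm$loadings)                # rotated common factors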

Summary
This paper introduces the R package phtt for the new class of panel models proposed by Bai (2009) and Kneip et al. (2012). The two main functions of the package are the Eup() function for the estimation procedure proposed by Bai (2009) and the KSS() function for the estimation procedure proposed by Kneip et al. (2012). Both main functions are supported by the usual print(), summary(), plot(), coef(), and residuals() methods. While parts of the method of Bai (2009) are available for commercial software packages, the estimation procedure proposed by Kneip et al. (2012) is not available elsewhere. A further remarkable feature of our phtt package is the OptDim() function, which provides easy access to many different dimensionality criteria proposed in the literature on factor models. The usage of the functions is demonstrated by a real data application.