Copula regression spline sample selection models: the R package SemiParSampleSel

Sample selection models deal with the situation in which an outcome of interest is observed for a restricted non-randomly selected sample of the population. The estimation of these models is based on a binary equation, which describes the selection process, and an outcome equation, which is used to examine the substantive question of interest. Classic sample selection models assume a priori that continuous covariates have a linear or pre-specified non-linear relationship to the outcome, and that the distribution linking the two equations is bivariate normal. We introduce the R package SemiParSampleSel which implements copula regression spline sample selection models. The proposed implementation can deal with non-random sample selection, non-linear covariate-response relationships, and non-normal bivariate distributions between the model equations. We provide details of the model and algorithm and describe the implementation in SemiParSampleSel. The package is illustrated using simulated and real data examples.


Introduction
The sample selection model was introduced by Gronau (1974), Lewis (1974) and Heckman (1976) to deal with the situation in which the observations available for statistical analysis are not from a random sample of the population; the model was discussed by Heckman (1990) among others. This issue occurs when individuals have selected themselves into (or out of) the sample based on a combination of observed and unobserved characteristics. Estimates based on models that ignore such a non-random selection may be biased and inconsistent.
To fix ideas, let us consider the RAND Health Insurance Experiment (RHIE), a study conducted in the United States between 1974 and 1982 (Newhouse 1999), which will also be analyzed in Section 5. The aim was to quantify the relationship between several socio-economic characteristics and annual health expenditures. Non-random selection arises if the sample of individuals who used health care services differs in important characteristics from the sample of individuals who did not use them. If the link between the decision to use the services and health expenditure is through observables, then selection bias can be avoided by accounting for these variables. However, if the link is also through unobservables, then inconsistent parameter estimates are obtained when using a classic univariate equation method. Two further aspects may complicate modeling the relationship between covariates and annual health expenditure. Variables such as age and education are likely to have a non-linear relationship with both the decision to use health services and the amount spent on them, since they embody productivity and life-cycle effects. Imposing a priori a linear relationship (or a non-linear one obtained, for example, by simply using quadratic polynomials) could mean failing to capture the true, more complex relationships. Finally, the (often criticized) assumption of bivariate normality between the decision to use health services and expenditure, employed in many sample selection models, may be too restrictive for applied work and is typically made for mathematical convenience.
The literature on sample selection models is vast and many variants of such models have been proposed. Chib, Greenberg, and Jeliazkov (2009) and Wiesenfarth and Kneib (2010) introduced two estimation methods to deal with non-linear covariate effects. Specifically, the approach of the former authors is based on Markov chain Monte Carlo simulation techniques and uses a simultaneous equation system that incorporates Bayesian versions of penalized smoothing splines. The latter further extended this approach by introducing a Bayesian algorithm based on low rank penalized B-splines for non-linear and varying-coefficient effects and Markov random-field priors for spatial effects. Recently, Marra and Radice (2013) proposed a frequentist counterpart which has the advantage of being computationally fast and might be especially appealing to practitioners already familiar with traditional frequentist techniques.
Under the assumption of bivariate normality, Heckman (1979) proposed a two-step estimator. However, because the estimator is inconsistent under distributional misspecification, various methods that relax the assumption of normality have been proposed over the years; these include semiparametric (e.g., Gallant and Nychka 1987; Powell, Stock, and Stoker 1989; Ahn and Powell 1993; Lee 1994a,b; Powell 1994; Andrews and Schafgans 1998; Newey 2009) and nonparametric methods (e.g., Das, Newey, and Vella 2003; Lee 2008; Chen and Zhou 2010). Another way to relax the normality assumption is to use non-normal parametric distributions. Recently, Marchenko and Genton (2012) and Ding (2014) extended the sample selection model to deal with heavy-tailedness by using the bivariate Student-t distribution. Another parametric method, which includes the above-mentioned Student-t approach as a special case, is copula modeling. This allows for a great deal of flexibility in specifying the joint distribution of the selection and outcome equations (e.g., Smith 2003; Prieger 2002; Hasebe and Vijverberg 2012; Schwiebert 2013).
In summary, the numerous estimation approaches that relax the assumption of normality in the sample selection model can be divided into two large groups: semi/non-parametric and flexible parametric estimators. The first relaxes the assumption of bivariate normality by using a general bivariate density function, whereas the second offers the possibility of replacing bivariate normality with an alternative parametric stochastic structure. There are advantages and disadvantages to both approaches. The strongest point of the semi/non-parametric approach is that such estimators remain consistent even when distributional assumptions are, in part or altogether, dispensed with. In some cases, simplified versions of these methods are easy to implement (e.g., Das et al. 2003). However, these estimators do have shortcomings. Specifically, semi/non-parametric methods are usually restricted when it comes to including a large set of covariates in the model, and the resulting estimates are inefficient relative to fully parametrized models (e.g., Bhat and Eluru 2009). To date, packages implementing semi/non-parametric procedures are CPU-intensive and the set of options provided is often quite limited. In addition, convergence problems are likely to occur when using models which include, for instance, many discrete variables and interactions. As for the parametric approach, many scholars agree upon its greater computational feasibility as compared to semi/non-parametric approaches, which allows for the use of familiar tools such as maximum likelihood without requiring simulation methods or numerical integration. As pointed out by Smith (2003), maximum likelihood techniques allow for the simultaneous estimation of all model parameters, and such methods, if the usual regularity conditions hold and the model is correctly specified, yield consistent, efficient and asymptotically normal estimators.
In addition, when using copulas the practitioner has the possibility of a piece-wise model specification. This is because marginal distributions are not constrained to belong to the same family as the chosen bivariate copula distribution. Moreover, Genius and Strazzera (2008) argue that copula modeling allows for direct estimation of the dependence structure in the sample selection model, while non-parametric methods do not. However, a crucial point is the correct specification of these models: maximum likelihood estimators are not consistent when the distributional assumption is incorrect. Also, testing the distributional assumption is not straightforward. In the context of Heckman's two-step estimator, Lee (1982, 1984) presented misspecification tests based on bivariate Edgeworth expansions. Recently, Montes-Rojas (2011) proposed a similar methodology for testing normality in sample selection models. Specifically, he proposed Lagrange multiplier and Neyman's C(α) tests for the marginal normality and linearity of the conditional expectation of the error terms for the two-step estimator. Although these tests provided encouraging results, more research is necessary to construct likelihood ratio and Wald tests. As for the maximum likelihood approach, to date, all that can be done is a posteriori model selection using, for instance, traditional information criteria. Finally, while a fully parametric copula approach is less flexible than semi/non-parametric approaches, it still allows the user to assess the sensitivity of results to different modeling assumptions. Some of the methods described above are implemented in popular software packages like SAS (SAS Institute Inc. 2011), Stata (StataCorp. 2011) and R (R Core Team 2016). For example, the conventional Heckman sample selection model can be fitted in SAS using proc qlim and in Stata using heckman.
The non-parametric method by Lee (2008) can be employed using the Stata package leebounds, and the bivariate Student-t Heckman model using heckt. In R, available sample selection packages are sampleSelection (Toomet and Henningsen 2008), bayesSampleSelection (Wiesenfarth and Kneib 2010), available from the first author's webpage, ssmrob (Zhelonkin, Genton, and Ronchetti 2013) and SemiParBIVProbit (Marra and Radice 2015). sampleSelection and bayesSampleSelection make the assumption of bivariate normality between the model equations. sampleSelection and ssmrob assume a priori that continuous regressors have linear or pre-specified non-linear relationships to the responses; ssmrob, however, relaxes the assumption of bivariate normality by providing a robust two-stage estimator based on Heckman's approach. sampleSelection and SemiParBIVProbit support binary responses for the outcome equation, with the latter allowing for non-linear covariate effects and non-Gaussian bivariate distributions. It is also worth mentioning the packages censReg (Henningsen 2012), which deals with censored dependent variables, and intReg (Toomet 2012), which implements interval regression models.
We introduce the R package SemiParSampleSel (Marra, Radice, Wojtyś, and Wyszynski 2016) to deal simultaneously with non-random sample selection, non-linear covariate effects and non-normal bivariate distributions between the model equations. The problem of non-random sample selection is addressed using the conventional system of two equations: a binary selection equation, which determines whether a particular statistical unit is observed in the outcome equation, and the outcome equation itself. Covariate-response relationships are flexibly modeled using a spline approach, whereas non-normal distributions are dealt with by using copula functions. The core algorithm is based on the penalized maximum likelihood framework proposed by Marra and Radice (2013) for the bivariate normal case. We extend this framework by allowing for non-normal bivariate distributions using copulas. Note that if a normal copula is chosen and linear or pre-specified covariate effects are assumed then, similarly to sampleSelection, SemiParSampleSel fits the classical Heckman sample selection model using the maximum likelihood approach. We believe that, when a practitioner faces a non-normality problem in the sample selection model, the option offered by the copula approach is worth pursuing whenever the accuracy of structural parameter estimates is the priority. Well motivated conjectures on the stochastic structure of the phenomenon may lead to specifications that fit the data better than the traditional sample selection model. Moreover, varying the assumptions on the bivariate distribution, as copulas allow, leaves the specification of the conditional mean intact. This is crucial for the interpretability of the model parameters.
The paper is organized as follows. In the next section, we present the model, describe the algorithm used to estimate the model parameters and discuss inferential and numerical issues. Section 3 provides details on the implementation of the model in SemiParSampleSel. In Section 4, we illustrate the usage of the package on various simulated data sets, whereas Section 5 is devoted to an illustrative real data example.

Model definition
In the sample selection problem, our aim is to fit a regression model when some observations of the outcome variable are missing not at random. Thus, assuming that y*_2i, for i = 1, . . . , n, is a random variable of primary interest, we can represent the random sample using a pair of variables (y_1i, y_2i) such that y_1i ∈ {0, 1} and y_2i = y*_2i y_1i. The variable y_1i governs whether or not an observation of the variable of primary interest is generated, and the unobserved values of the variable of interest are coded as 0. In the model statement, a latent continuous variable y*_1i such that y_1i = 1(y*_1i > 0) is used, where 1 is the indicator function. Let F_i denote the joint cumulative distribution function (cdf) of (y*_1i, y*_2i) and let F_1i and F_2i be the marginal cdfs pertaining to y*_1i and y*_2i, respectively. We assume normality of the marginal distributions, whilst the relationship between them is modeled using a copula approach. That is, y*_1i ~ N(µ_1i, 1) (which yields a probit model for y_1i) and y*_2i ~ N(µ_2i, σ), where µ_1i, µ_2i ∈ R are linear predictors defined in the next section and σ > 0, the standard deviation, is unknown. F_1i relates to the selection equation and F_2i to the outcome equation. The model is then defined using the copula representation

F_i(y*_1i, y*_2i) = C(F_1i(y*_1i), F_2i(y*_2i); θ),    (1)

for some two-place function C, which is unique since the margins are continuous, and where θ is an association parameter measuring the dependence between the two marginal cdfs. In the package, the families currently implemented are normal, Clayton, Joe, Frank, Gumbel, Farlie-Gumbel-Morgenstern (FGM) and Ali-Mikhail-Haq (AMH); these are listed in Table 1. Rotations by 90, 180 and 270 degrees for Clayton, Joe and Gumbel can be obtained using the results reported in Brechmann and Schepsmeier (2013); these will be available in future releases. As can be seen from Table 1, θ may be difficult to interpret in some cases.
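The copula families in Table 1 have simple closed-form cdfs. As a language-neutral illustration (the package itself is written in R), the following Python sketch encodes three of the families; the function names are ours, and the formulas are the standard ones reported in, e.g., Nelsen (2006):

```python
import math

# C(u, v; theta) couples the marginal cdf values u = F_1i and v = F_2i.
# Three of the families listed in Table 1, in their standard parametrizations.

def clayton(u, v, theta):
    # Clayton copula, theta > 0; theta -> 0 recovers independence C(u, v) = u v
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def frank(u, v, theta):
    # Frank copula, theta != 0; permits both positive and negative dependence
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta

def fgm(u, v, theta):
    # Farlie-Gumbel-Morgenstern copula, theta in [-1, 1]; weak dependence only
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))
```

Each function satisfies the uniform-margins property C(u, 1; θ) = u and C(1, v; θ) = v, a quick sanity check on any copula implementation. Note also how the range and scale of θ differ across families, which is what makes direct interpretation of θ awkward.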
To this end, we can use Kendall's τ coefficient, a measure of association that lies in the customary range [−1, 1]. This is generally defined as τ = P((y*_11 − y*_12)(y*_21 − y*_22) > 0) − P((y*_11 − y*_12)(y*_21 − y*_22) < 0) for independent pairs (y*_1j, y*_2j), j = 1, 2, that are copies of (y*_1, y*_2). Testing the null hypothesis of absence of selection bias is an important issue. If the null hypothesis cannot be rejected, then joint estimation of the two model equations can be avoided and consistent estimates for the parameters of the equation of interest can be obtained using a univariate equation model. In the context of the copula regression spline sample selection model, the absence of sample selection bias is equivalent to the condition that Kendall's τ equals 0. Thus the null hypothesis can, for instance, be tested by checking whether the confidence interval for Kendall's τ includes 0. The problem of testing for sample selection bias is further addressed in Section 4.3. For a comprehensive introduction to the theory of copulas and their properties, see the monographs of Nelsen (2006) and Joe (1997).
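For several families, Kendall's τ is available in closed form as a function of θ; for the FGM copula, τ = 2θ/9 (Nelsen 2006). The identity τ = 4 E[C(U, V)] − 1, with (U, V) distributed according to C, can be used to verify such formulas numerically. A Python sketch, illustrative only (function names are ours):

```python
def fgm_cdf(u, v, theta):
    # Farlie-Gumbel-Morgenstern copula cdf, theta in [-1, 1]
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def fgm_pdf(u, v, theta):
    # FGM copula density c(u, v; theta) = d^2 C / (du dv)
    return 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)

def kendall_tau_numeric(theta, n=400):
    # tau = 4 E[C(U, V)] - 1, evaluated by the midpoint rule on the unit square
    h = 1.0 / n
    acc = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            acc += fgm_cdf(u, v, theta) * fgm_pdf(u, v, theta)
    return 4.0 * acc * h * h - 1.0
```

For θ = 0.9 the numerical value agrees with 2 × 0.9/9 = 0.2 to several decimal places, which also illustrates that the FGM family can only generate weak dependence (|τ| ≤ 2/9).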

Copula likelihood
The log-likelihood function for the sample selection model can be expressed as a sum over two disjoint subsets of the sample: one for the observations with a missing value of the response of interest and the other for the remaining observations. In the first case, the likelihood for the ith observation takes the simple form P(y_1i = 0), which is equivalent to F_1i(0). In the second case, the joint likelihood can be expressed, using the multiplication rule, as P(y*_1i > 0) f_2|1,i(y_2i | y*_1i > 0), where f_2|1,i denotes the probability density function of y*_2i given y*_1i > 0. After substituting the conditional density f_2|1,i(y_2i | y*_1i > 0) using (1), we then have

ℓ = Σ_{i: y_1i = 0} log F_1i(0) + Σ_{i: y_1i = 1} log{f_2i(y_2i)(1 − z_i)},    (2)

where z_i = ∂C(F_1i(0), v; θ)/∂v |_{v = F_2i(y_2i)}. The normality of the margins implies that F_1i(0) = Φ(−µ_1i) and f_2i(y_2i) = σ^{−1} φ((y_2i − µ_2i) σ^{−1}), where Φ and φ are used throughout the paper to denote the standard normal distribution and density functions, respectively.
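To make the two types of likelihood contribution concrete, the following Python sketch (illustrative only; the package implements these computations in R, for all families, with analytical derivatives) evaluates the ith log-likelihood term for the FGM copula, for which ∂C(u, v; θ)/∂v = u{1 + θ(1 − u)(1 − 2v)}:

```python
import math

def Phi(x):
    # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def fgm_dC_dv(u, v, theta):
    # z_i = dC(u, v; theta)/dv for the FGM copula
    return u * (1.0 + theta * (1.0 - u) * (1.0 - 2.0 * v))

def loglik_i(y1, y2, mu1, mu2, sigma, theta):
    u = Phi(-mu1)                    # F_1i(0)
    if y1 == 0:
        return math.log(u)           # not selected: log P(y_1i = 0)
    v = Phi((y2 - mu2) / sigma)      # F_2i(y_2i)
    f2 = phi((y2 - mu2) / sigma) / sigma
    z = fgm_dC_dv(u, v, theta)       # z_i in the log-likelihood
    return math.log(f2) + math.log(1.0 - z)
```

At θ = 0 the copula is the independence copula, z_i reduces to F_1i(0), and the selected-sample contribution factorizes into log f_2i(y_2i) + log Φ(µ_1i), as it should.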

Linear predictor specification
We assume that the expected values µ_1i and µ_2i of y*_1i and y*_2i, respectively, are linked with the predictors, i.e., µ_1i = η_1i and µ_2i = η_2i, where the linear predictor of the selection equation can be written as

η_1i = u_1i^T α_1 + Σ_{k_1=1}^{K_1} s_1k_1(z_1k_1 i),    (3)

and that of the outcome equation as

η_2i = u_2i^T α_2 + Σ_{k_2=1}^{K_2} s_2k_2(z_2k_2 i),    (4)

where vector u_1i = (1, u_12i, . . . , u_1P_1 i)^T is the ith row of U_1 = (u_11, . . . , u_1n)^T, the n × P_1 model matrix containing P_1 parametric model components (e.g., intercept, dummy and categorical variables), α_1 is a parameter vector, and the s_1k_1 are unknown smooth functions of the K_1 continuous covariates z_1k_1 i. Similarly, u_2i = (1, u_22i, . . . , u_2P_2 i)^T is the ith row vector of the n_s × P_2 model matrix U_2 = (u_21, . . . , u_2n_s)^T, where n_s is the size of the selected sample, α_2 is a parameter vector, and the s_2k_2 are unknown smooth terms of the K_2 continuous regressors z_2k_2 i. The smooth functions are subject to the centering (identifiability) constraint Σ_i s_k(z_ki) = 0 (Wood 2006). The smooth functions are represented using regression splines, where, in the one-dimensional case, a generic s_k(z_ki) is approximated by a linear combination of known spline basis functions b_kj(z_ki) and regression parameters β_kj, i.e., s_k(z_ki) = Σ_{j=1}^{J_k} β_kj b_kj(z_ki) = β_k^T B_k(z_ki), where J_k is the number of spline bases used to represent s_k, B_k(z_ki) is the ith vector of dimension J_k containing the basis functions evaluated at the observation z_ki, i.e., B_k(z_ki) = {b_k1(z_ki), b_k2(z_ki), . . . , b_kJ_k(z_ki)}^T, and β_k is the corresponding parameter vector. The subscript indicating which equation each smooth component belongs to has been suppressed for simplicity.
Calculating B_k(z_ki) for each i yields J_k curves (encompassing different degrees of complexity) which, multiplied by the parameter vector β_k and then summed, give a (linear or non-linear) estimate of s_k(z_k) (see, for instance, Marra and Radice 2010, for a more detailed overview). Basis functions should be chosen to have convenient mathematical and numerical properties. B-splines, cubic regression splines and low rank thin plate regression splines are supported in our implementation (see Wood 2006, for full details on these spline bases). Our implementation also supports varying-coefficient models, obtained by multiplying one or more smooth terms by some predictor(s), smooth functions of two or more (e.g., spatial) covariates, and random effect and Markov random field smooth functions, to name but a few (Wood 2006). These cases follow a construction similar to that described above. For instance, in the case of a smooth of two variables z_1i and z_2i we would have s_12(z_1i, z_2i) = Σ_{j=1}^{J_12} β_j b_j(z_1i, z_2i), where the specification of the basis functions depends again on the kind of spline chosen (Wood 2006). Linear predictors (3) and (4) can therefore be written as η_1 = X_1 δ_1 and η_2 = X_2 δ_2, where the design matrices X_1 and X_2 contain the parametric model components and the spline bases evaluated at the observations, and δ_1 = (α_1^T, β_1^T)^T and δ_2 = (α_2^T, β_2^T)^T. In principle, the parameters of the sample selection model are identified even if the same regressors appear in both linear predictors (e.g., Wiesenfarth and Kneib 2010). However, better estimation results are generally obtained when the set of regressors in the selection equation contains at least one regressor (usually known as an exclusion restriction) that is not included in the outcome equation (e.g., Marra and Radice 2013).
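As an illustration of the regression spline construction, the Python sketch below evaluates B-spline basis functions b_j via the standard Cox-de Boor recursion and forms s(x) = Σ_j β_j b_j(x) (illustrative only; in the package the bases are built through mgcv):

```python
def bspline_basis(j, d, knots, x):
    # Cox-de Boor recursion: jth B-spline basis function of degree d
    if d == 0:
        return 1.0 if knots[j] <= x < knots[j + 1] else 0.0
    left = 0.0
    if knots[j + d] > knots[j]:
        left = ((x - knots[j]) / (knots[j + d] - knots[j])
                * bspline_basis(j, d - 1, knots, x))
    right = 0.0
    if knots[j + d + 1] > knots[j + 1]:
        right = ((knots[j + d + 1] - x) / (knots[j + d + 1] - knots[j + 1])
                 * bspline_basis(j + 1, d - 1, knots, x))
    return left + right

def smooth_eval(beta, knots, d, x):
    # s(x) = sum_j beta_j b_j(x): a linear combination of spline basis functions
    return sum(b * bspline_basis(j, d, knots, x) for j, b in enumerate(beta))
```

On the interior of the knot range the basis functions sum to one (partition of unity), so a constant coefficient vector yields a constant smooth; non-constant coefficients produce linear or non-linear estimates depending on the data.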

Estimation approach
Denote the log-likelihood function by ℓ(δ), where δ = (δ_1^T, δ_2^T, σ, θ)^T and δ_v = (α_v^T, β_v^T)^T for v = 1, 2. Given the flexible structure of the linear predictors considered here, unpenalized estimation can result in smooth term estimates that are too rough to produce practically useful results. This issue is dealt with by using the penalty term Σ_{v=1}^{2} Σ_{k_v=1}^{K_v} λ_vk_v ∫ s''_vk_v(z_vk_v)² dz_vk_v, which measures the (typically, second-order) roughness of the smooth terms in the model. For a smooth of two variables, generically written as s_12(z_1, z_2) and represented using thin plate regression splines, the integral would look like ∫∫ {(∂²s_12/∂z_1²)² + 2(∂²s_12/∂z_1∂z_2)² + (∂²s_12/∂z_2²)²} dz_1 dz_2, where the subscripts have been dropped to avoid clutter. The λ_vk_v are smoothing parameters controlling the trade-off between fit and smoothness. Since regression splines are linear in their model parameters, the overall penalty can be written as β^T S_λ β, where β = (β_1^T, β_2^T)^T, S_λ = Σ_{v=1}^{2} Σ_{k_v=1}^{K_v} λ_vk_v S_vk_v, and the S_vk_v are positive semi-definite known square matrices expanded with zeros everywhere except for the elements which correspond to the coefficients of the vk_vth smooth term. Because of the restrictions on the values that θ can take, we use a proper transformation of it, θ*, in order to avoid the use of a constraint when estimating this parameter (see Table 2 for the list of transformations used). Similarly, since σ can only take positive real values, we use σ* = log(σ). So, in optimization, we use δ* = (δ_1^T, δ_2^T, σ*, θ*)^T ∈ R^p, where p is the total number of parameters. Therefore, the function to maximize is

ℓ_p(δ*) = ℓ(δ*) − (1/2) β^T S*_λ β,    (5)

where S*_λ is S_λ enlarged with zeros so that its dimension matches that of δ*. Given a value for the smoothing parameter vector λ = (λ_11, . . . , λ_1K_1, λ_21, . . . , λ_2K_2)^T, we seek to maximize (5). The issues with this maximization problem are that ℓ_p(δ*) is not globally concave and the penalized Hessian may be non-positive definite on some occasions (Toomet and Henningsen 2008; Marra and Radice 2013).
To this end, we use a trust region approach which is typically believed to be more stable than its line-search counterparts, particularly for functions that are, for example, non-concave and/or exhibit regions that are close to flat (Nocedal and Wright 2006, Chapter 4).
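Before elaborating on the trust region iteration, the role of the penalty β^T S_λ β introduced above can be illustrated with a discrete analogue (Python for illustration; the actual S_vk_v matrices are derived from the chosen spline bases, not from the difference matrix used here). A second-order difference penalty S = D^T D penalizes curvature in the coefficient sequence but leaves its null space, linear coefficient patterns, unpenalized:

```python
def second_difference_penalty(p):
    # S = D'D, where D takes second differences of adjacent coefficients;
    # a discrete analogue of the integrated squared second derivative
    S = [[0.0] * p for _ in range(p)]
    for r in range(p - 2):
        d = [0.0] * p
        d[r], d[r + 1], d[r + 2] = 1.0, -2.0, 1.0
        for a in range(p):
            for b in range(p):
                S[a][b] += d[a] * d[b]
    return S

def roughness(beta, S, lam):
    # lam * beta' S beta: the quantity (times 1/2) subtracted from the log-likelihood
    p = len(beta)
    return lam * sum(beta[a] * S[a][b] * beta[b]
                     for a in range(p) for b in range(p))
```

A linearly increasing coefficient vector incurs zero penalty, while an oscillating one is penalized in proportion to the smoothing parameter, which is exactly the fit-versus-smoothness trade-off described above.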
Consider a minimization problem and let a be an iteration index. Intuitively speaking, line search methods choose a direction to move from, say, m_a to m_{a+1} and find the distance along that direction which gives the best improvement in the objective function. If the function is, for instance, non-convex or has long plateaus, the optimizer may search far away from m_a but choose an m_{a+1} that is close to m_a and that offers marginal improvement in the objective function. In some cases, the function will be evaluated so far away from m_a that it will not be finite and the algorithm will fail. Trust region methods choose a maximum distance for the move from m_a to m_{a+1}, defining a "trust region" around m_a with that maximum distance as its radius, and then let a candidate for m_{a+1} be the minimum of a quadratic approximation of the objective function over the region. Since points outside of the trust region are not considered, the algorithm never runs too far and/or too fast from the current iterate.

The trust region is shrunken if the proposed point in the region is worse than, or no better than, the current point. The new problem with the smaller region is then solved. If a point close to the boundary of the trust region is accepted and it gives a large enough improvement in the objective function, then the region for the next iteration is expanded. If a point along a search path causes the objective function to be undefined or indeterminate, most implementations of line search methods will fail and user intervention is required. In the trust region approach, the search for m_{a+1} is always a solution to the trust region problem; if the function at the proposed m_{a+1} is not finite or not better than the value at m_a, then the proposal is rejected and the trust region shrunken. Finally, a line search approach requires repeated evaluation of the objective function along the search direction, while trust region methods evaluate the objective function only after solving the trust region problem. Hence, trust region methods can be considerably faster when the objective function is expensive to compute. Full details can be found in Nocedal and Wright (2006, Chapter 4).
At each iteration a of the algorithm, −ℓ_p(δ*^[a]) is minimized subject to the constraint that the solution falls within a trust region with radius r^[a]. The proposed solution is then accepted or rejected, and the trust region expanded or shrunken, based on the ratio between the improvement in the objective function when going from δ*^[a] to δ*^[a+1] and that predicted by the quadratic approximation. Note that, near the solution, the trust region algorithm typically behaves like a classic Newton algorithm.

Table 2: Transformations, θ*, of the dependence parameter, θ, used in optimization. The quantity used in the transformations is set to the machine smallest positive floating-point number multiplied by 10^6 and ensures that the dependence parameters lie in the ranges reported in Table 1.
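The acceptance/rejection and radius-update logic can be illustrated with a minimal one-dimensional trust region Newton iteration in Python (the thresholds 0.25 and 0.75 and the shrink/expand factors below are common textbook choices, cf. Nocedal and Wright 2006, and are not necessarily those used in the package):

```python
def trust_region_minimize(f, grad, hess, x, radius=1.0, max_iter=100):
    # Minimize f by repeatedly minimizing a local quadratic model of f
    # within a trust region of the current radius.
    for _ in range(max_iter):
        g, h = grad(x), hess(x)
        if h > 0.0:
            step = -g / h                          # unconstrained model minimizer
        else:
            step = -radius if g > 0.0 else radius  # model unbounded: go to boundary
        step = max(-radius, min(radius, step))     # stay inside the region
        predicted = -(g * step + 0.5 * h * step * step)
        actual = f(x) - f(x + step)
        rho = actual / predicted if predicted > 0.0 else -1.0
        if rho > 0.0:
            x = x + step                  # accept only if the objective improved
        if rho > 0.75:
            radius = min(2.0 * radius, 1e8)   # good model fit: expand the region
        elif rho < 0.25:
            radius *= 0.25                    # poor fit: shrink the region, retry
        if abs(g) < 1e-10:
            break
    return x
```

Started at x = 2 on the non-convex function f(x) = x^4 − 2x^2, the iteration settles into the local minimum at x = 1, even though the Hessian 12x^2 − 4 is negative near the origin, which is the kind of situation where an unsafeguarded Newton line search can misbehave.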

Smoothing parameter estimation
Multiple smoothing parameter estimation by direct grid search optimization of, for instance, a prediction error criterion can be computationally expensive, especially if the model has more than one smooth term per equation. This section briefly describes the automatic approach employed by Marra and Radice (2013) to estimate λ. Note that joint estimation of δ* and λ via maximization of (5) would clearly lead to overfitting, since the highest value of ℓ_p(δ*) would be obtained when λ = 0. Instead, λ̂ is the solution to the problem of minimizing the approximate prediction error criterion

||z − A_λ z||² / n* − 1 + 2 tr(A_λ) / n*,    (6)

where z = (z_1^T, . . . , z_n^T)^T is the working pseudo-data vector of the working linear model obtained from a quadratic approximation to (5), each z_i is 4-dimensional so that n* = 4n, A_λ = X(X^T W X + S*_λ)^{−1} X^T W is the hat matrix, and tr(A_λ) is the estimated degrees of freedom (edf) of the penalized model. The iteration index has been dropped to avoid clutter. Note that the working linear model quantities are constructed for a given estimate of δ*. Iteration (6) will produce an updated estimate for λ which will then be used to obtain a new parameter vector estimate for δ*. The two steps, one for δ* and the other for λ, are iterated until convergence.

Confidence intervals, variable selection and model selection
Inferential theory for penalized estimators is complicated by the presence of smoothing penalties which undermines the usefulness of classic frequentist results for practical modeling.
As shown in Marra and Radice (2013), reliable pointwise confidence intervals for the terms of a regression spline sample selection model can be constructed using the Bayesian large sample approximation

δ* | y ~ N(δ̂*, V_δ*),    (7)

where y refers to the response vectors, δ̂* is an estimate of δ* and V_δ* = (−H + S*_λ)^{−1}. The structure of V_δ* is such that it includes both a bias and a variance component in a frequentist sense, which is why such intervals exhibit close to nominal coverage probabilities (Marra and Wood 2012). Given (7), confidence intervals for linear and non-linear functions of the model parameters can be easily obtained. For instance, for a generic ŝ_k(z_ki) these can be obtained using

ŝ_k(z_ki) ± c_{α/2} √(B_k(z_ki)^T V_δ*,k B_k(z_ki)),    (8)

where c_{α/2} is the appropriate normal quantile and V_δ*,k is the submatrix of V_δ* corresponding to the regression spline parameters associated with the kth function. Intervals for non-linear functions of the estimated model coefficients (i.e., σ, θ) can be conveniently obtained by simulation from the posterior distribution of δ*.
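The simulation step for non-linear functions of the coefficients can be sketched as follows (Python for illustration; the mean and standard deviation below are made-up values standing in for the estimated posterior mean and standard deviation of σ* = log σ):

```python
import math
import random

def sim_interval(transform, mean, sd, n_sim=10000, level=0.95, seed=1):
    # Draw from the approximate normal posterior of a working parameter,
    # transform each draw, and report empirical quantiles of the result.
    rng = random.Random(seed)
    draws = sorted(transform(rng.gauss(mean, sd)) for _ in range(n_sim))
    lo = draws[int(n_sim * (1.0 - level) / 2.0)]
    hi = draws[int(n_sim * (1.0 + level) / 2.0) - 1]
    return lo, hi

# interval for sigma = exp(sigma*) given a hypothetical posterior N(0.5, 0.1^2)
low, high = sim_interval(math.exp, mean=0.5, sd=0.1)
```

Because the transformation is applied draw by draw, the same routine works unchanged for θ = t(θ*) under any of the transformations in Table 2.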
As for the parametric model components, using (7) is equivalent to using classic likelihood results because such terms are not penalized.
Result (8) can be used to find intervals for ŝ_k(z_ki) for each k and i, but cannot be used to test whether smooth terms are equal to zero (e.g., Ruppert, Wand, and Carroll 2003, Chapter 6). For this purpose, p values or shrinkage methods may be employed. To test smooth components for equality to zero we use the results of Wood (2013). Define ŝ_k = B_k(z_k) β̂_k, where B_k(z_k) denotes a full column rank matrix whose rows are the B_k(z_ki)^T, z_k = (z_k1, z_k2, . . . , z_kn)^T, and V_s_k = B_k(z_k) V_δ*,k B_k(z_k)^T. It is then possible to obtain approximate p values for testing smooth components for equality to zero based on the statistic T_k = ŝ_k^T V_s_k^{r_k−} ŝ_k, where V_s_k^{r_k−} is the rank-r_k Moore-Penrose pseudoinverse of V_s_k. Parameter r_k is selected using the established notion of edf used in (6). Because the edf is not an integer, it is rounded as suggested by Wood (2013); this approach proved effective in semiparametric bivariate probit models (Marra 2013).
As an alternative, the shrinkage single penalty approach presented in Marra and Wood (2011) can be adopted. Specifically, the generic second-order smoothing penalty matrix S_k can be decomposed as U_k Λ_k U_k^T, where U_k is an eigenvector matrix associated with the kth smooth function and Λ_k the corresponding diagonal eigenvalue matrix. Because a part of the spline basis deals with the penalty null space, Λ_k contains zero eigenvalues. So, even if λ_k goes to infinity, the smooth term of a nuisance variable may still be estimated as non-zero, because the function component in the null space (i.e., the linear term) is unpenalized. This can be fixed by replacing Λ_k with Λ̃_k, where the latter is the same as the former except that the zero eigenvalues are set to a small proportion, typically 0.1, of the smallest strictly positive eigenvalue of S_k. This forces the eigenvalues of the new penalty matrix, S̃_k, associated with the penalty null space to be different from zero. Hence a smooth component can, in principle, be removed from the model altogether.
Copula models with a single dependence parameter can be thought of as non-nested models. As suggested by Zimmer and Trivedi (2006) among others, one approach for choosing between copula models is to use either the Akaike or (Schwarz) Bayesian information criterion (AIC and BIC, respectively). In our case, AIC = −2 ℓ(δ̂*) + 2 edf and BIC = −2 ℓ(δ̂*) + log(n) edf, where the log-likelihood is evaluated at the penalized parameter estimates and edf = tr(A_λ̂).
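Computing the two criteria is immediate once the penalized log-likelihood and the edf are available; a trivial Python sketch with made-up inputs:

```python
import math

def aic(loglik, edf):
    # AIC = -2 * loglik + 2 * edf, the edf playing the role of a parameter count
    return -2.0 * loglik + 2.0 * edf

def bic(loglik, edf, n):
    # BIC replaces the factor 2 with log(n), penalizing complexity more
    # than AIC whenever n > exp(2), i.e., for essentially all sample sizes
    return -2.0 * loglik + math.log(n) * edf

# comparing two hypothetical fitted copulas: the smaller value is preferred
```

Since the candidate copula models share the same margins and differ only in C, the criteria compare like with like, which is what makes this a reasonable a posteriori selection device.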

Numerical considerations
As explained in Section 2.2, a trust region Newton algorithm is a more reliable choice to estimate the model parameters. As for the initial values, they are provided by using an extension of the Heckman (1979) procedure detailed in Appendix B of Marra and Radice (2013). The adopted approach proved to be fast and reliable in most cases, with occasional convergence failure for small values of n and n s .
As the analytical expressions for the gradient g and Hessian H of the copula log-likelihood functions are very complicated, numerical issues may be encountered in some cases when certain quantities take values near their boundaries. Firstly, this may occur when the dependence between the margins is very strong or very weak, i.e., when θ takes extreme values (for example, association tending to 1 implies θ → ∞ for a number of copulas). This leads to expressions which evaluate to Inf, especially for the Frank copula, where the exponential transformation of θ appears in the expressions for the gradient and Hessian. Secondly, data points which lie in the tails of F_1i and F_2i will lead to values equal to 0 or 1. Also, the value of z_i appearing in the log-likelihood (2) may be approximately equal to 1, hence producing -Inf. These numerical problems are dealt with by truncating the values of F_1i, F_2i, f_2i and z_i to the interval (ε, 1 − ε) with ε = 10^−10. Moreover, the ratio φ(x)/Φ(x) appearing in the expressions for g and H is defined using the approximation φ(x)/Φ(x) ≈ −x for x < −35 in order to avoid NaN. If a given model cannot be fitted due to numerical issues, then the user receives the message Ill-conditioned task. It is worth noting that numerical problems arising when fitting a model may also be a hint that the chosen model is not appropriate for the data at hand.
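The two safeguards, truncation of probabilities and the asymptotic treatment of the inverse Mills ratio φ(x)/Φ(x), can be sketched as follows (Python for illustration; here Φ is computed through the complementary error function, which avoids premature underflow in the lower tail):

```python
import math

EPS = 1e-10

def clamp(p):
    # truncate cdf/density values to (EPS, 1 - EPS) before taking logarithms
    return min(max(p, EPS), 1.0 - EPS)

def mills_ratio(x):
    # phi(x) / Phi(x); for very negative x the direct ratio is numerically
    # unstable, so the asymptotic approximation phi(x)/Phi(x) ~ -x is used
    if x < -35.0:
        return -x
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * math.erfc(-x / math.sqrt(2.0))
    return phi / Phi
```

With this guard, log terms such as log F_1i(0) and log(1 − z_i) stay finite even for observations deep in the tails, at the cost of a truncation error that is negligible relative to the likelihood scale.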

Overview of the package
The SemiParSampleSel package is available from the Comprehensive R Archive Network (CRAN). The main function in SemiParSampleSel is SemiParSampleSel(), which fits copula regression spline sample selection models as described in the previous section. The function can be called using the following syntax:

SemiParSampleSel(list(formula.eq1, formula.eq2), data = list(), BivD = "N", margins = c("N", "N"), infl.fac = 1, ...)

The first argument is a list made up of formula.eq1 and formula.eq2, the formulas for the selection and outcome equations, respectively. These are glm-like formulas, except that smooth terms can be included in the equations as for gam in mgcv. For instance, the selection equation may look like:

y.sel ~ as.factor(x1) + s(x2, bs = "cr", k = 10, m = 2) + s(x3, x4) + ...
where y.sel represents the binary selection variable, x1 is a categorical predictor, and the s terms are used to specify smooth functions of the continuous predictors x2, x3 and x4. Argument bs specifies the spline basis; possible choices include "cr" (cubic regression spline), "cs" (shrinkage version of "cr"), "tp" (thin plate regression spline) and "ts" (shrinkage version of "tp"). Bivariate smoothing, e.g., s(x3, x4), is achieved using bs = "tp". k is the basis dimension (default 10) and m the order of the penalty (default 2). More details and options on smooth term specification can be found in the documentation of mgcv. SemiParSampleSel does not currently support the use of tensor product smooths. Optional arguments of the function SemiParSampleSel() include data, a data frame, list or environment containing the variables in the model, and infl.fac, an inflation factor for the model degrees of freedom used in the smoothing step. Smoother models can be obtained by setting this parameter to a value greater than 1; infl.fac = 1.4 typically achieves this and was found to work well by Kim and Gu (2004), albeit in a different context. The type of bivariate copula linking the two model equations can be specified through BivD. Possible choices are "N", "C0", "J0", "FGM", "F", "AMH" and "G0", which stand for the bivariate normal, Clayton, Joe, Farlie-Gumbel-Morgenstern, Frank, Ali-Mikhail-Haq and Gumbel copulas, respectively. The argument margins specifies the marginal distributions of the selection and outcome equations, given as a two-dimensional vector which is equal to c("N", "N") for normal margins. Details on all the other arguments, including starting value and control options, and on the fitted-object list that the function returns can be found in Marra et al. (2016).
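Putting these arguments together, a call fitting a Clayton copula model with normal margins might look as follows; the data frame dat and the variable names are hypothetical.

```r
library(SemiParSampleSel)

# Hypothetical data frame 'dat' containing y.sel, y.out, x1, x2, x3, x4
eq.sel <- y.sel ~ as.factor(x1) + s(x2, bs = "cr", k = 10) + s(x3, x4)
eq.out <- y.out ~ as.factor(x1) + s(x2, bs = "cr", k = 10)

fit <- SemiParSampleSel(list(eq.sel, eq.out), data = dat,
                        BivD = "C0",            # Clayton copula
                        margins = c("N", "N"))  # normal margins
```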
Other available functions are:
• plot(x, eq, pages = 0, scale = -1, shade = FALSE, seWithMean = FALSE, ...). This function takes a fitted object x as produced by SemiParSampleSel() and plots the component smooth functions that make it up on the scale of the linear predictor. eq denotes the equation from which smooth terms should be considered for plotting, pages is the number of pages over which to produce the plots (e.g., if pages = 1 then all terms are plotted on one page), and scale is the y-axis scale to use for each plot (scale = 0 gives a different axis for each plot). If shade is set to TRUE then shaded regions representing confidence bands for the smooth terms are produced. Of particular interest is the argument seWithMean, which indicates whether the component smooths should be shown with confidence intervals that include the uncertainty about the overall mean. Marra and Wood (2012) showed that seWithMean = TRUE results in intervals with better nominal frequentist coverage probabilities. This function is based on the plot method for 'gam' objects in mgcv, to which the reader is referred for full details.
• aver(x, sig.lev = 0.05, ...). This function takes an object fitted with SemiParSampleSel() and calculates the overall estimated average, corrected for non-random sample selection, with the corresponding confidence interval obtained using the delta method.
• summary(object, n.sim = 1000, s.meth = "svd", prob.lev = 0.05, ...). This function produces some summaries of an object fitted with SemiParSampleSel(). n.sim indicates the number of simulated coefficient vectors from the posterior distribution of the estimated model parameters, which are used to calculate "confidence" intervals for σ and θ, for instance. s.meth is the matrix decomposition used to determine the matrix root of the covariance matrix (see the documentation of mvtnorm for further details). prob.lev is the probability of the left and right tails of the posterior distribution used for interval calculations. The object list returned includes, for instance, summary tables for the selection and outcome equations for the parametric and nonparametric components, and the estimated standard deviation and association coefficient.
• conv.check(x). This function produces some diagnostic information about the fitting procedure for an object fitted with SemiParSampleSel().
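Assuming a model has been fitted and stored in fit, a typical post-estimation workflow using the functions above might be:

```r
# fit is assumed to be an object returned by SemiParSampleSel()
conv.check(fit)                      # diagnostics for the fitting procedure
summary(fit, n.sim = 1000)           # parametric/smooth tables, sigma, theta
plot(fit, eq = 2, pages = 1,         # smooths of the outcome equation
     shade = TRUE, seWithMean = TRUE)
aver(fit)                            # selection-corrected average outcome
```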

Simulations
In this section, we conduct a Monte Carlo simulation study to evaluate the empirical effectiveness of the copula regression spline sample selection models implemented in the package. The simulation study was performed using version 1.1 of the package SemiParSampleSel. For convenience, all the tables and figures of results are given in Appendix B.
In Tables 4-10 the association parameter used to generate the data is expressed in terms of Kendall's τ coefficient. For each combination of parameter settings, the number of simulated datasets was set to 250. We also explored the performance of the models in the absence of an exclusion restriction as detailed in Section 4.2.

Main results
Since the selection equation is not in principle affected by non-random sample selection bias, we focus on the estimation results for the outcome equation only. Tables 4-10 report the percentage relative bias and root mean squared error (RMSE) for the estimators of α 21 , α 22 , σ and τ , as well as the RMSE for the estimator of s 21 (z 1 ). The results presented in the tables show overall that the model employing the true copula achieves the lowest bias and/or RMSE of the estimators of all considered parameters in most cases. We can particularly observe this for data generated using the Clayton copula (see Table 5), where the estimators of α 21 , α 22 , σ, τ and s 21 obtained from the Clayton model outperform, in terms of bias and RMSE, those yielded by the other copula models. Using the right model is particularly important for estimating τ when its true value falls outside the dependence range covered by a given copula, as some copulas allow only for a restricted interval of dependence (here, this is the case for AMH and FGM). The results also show that, for data generated using the Frank or normal copulas, both models yield comparably good results, reflecting the similarity between these two copulas (see Tables 4 and 9 for τ = 0.7). We observe a similar effect for data generated using the Joe and Gumbel copulas.
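The RMSE for the smooth term estimate takes the standard form, written here with evaluation at the observed covariate values (a fine grid over the range of z 1 could be used instead):

```latex
\mathrm{RMSE}\left(\hat{s}_{21}\right) =
  \sqrt{\frac{1}{n}\sum_{i=1}^{n}
        \left\{ \hat{s}_{21}(z_{1i}) - s_{21}(z_{1i}) \right\}^{2}}
```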
The findings also suggest that in some cases for small values of τ the choice of the correct copula model does not seem to play an important role in estimation (see Table 4 for τ = 0.1, and Tables 7 and 8), and often the Clayton and Gumbel models yield estimators with a relatively low bias and RMSE for such data regardless of the true copula.
As for copula model selection, the two criteria work well overall. The case of very weak dependence is the most difficult one, as the underlying distribution converges to the normal product distribution when τ → 0; in this situation all copulas entail very similar distributions. As an example, Figure 2 presents contour plots of the FGM, Clayton and Joe copulas with normal margins for small values of the dependence parameter. For those distributions the choice of the correct copula based on an empirical sample is extremely difficult, and the selection criteria appear to select an arbitrary model, as can be seen in Table 7. At the same time, the finite sample performance of the estimators is unaffected by the wrong choice of a copula in those borderline cases as, again, all copulas tend to the same (normal product) distribution. Even in this difficult situation, AIC is successful for some copulas (see Tables 5, 6 and 9). For medium and large values of τ, the true copula model is the most frequent choice under all model selection criteria, with AIC performing much better than BIC and achieving a hit rate of more than 90% in some cases (see Table 5). It is also worth noting that, in general, the accuracy of the copula choice improves with the sample size, as can be seen in Tables 15 and 16 in Appendix B, where the experiment was repeated for samples of size n = 3000 and n = 5000 pertaining to the bivariate normal distribution. There we can also observe the consistency of the estimators when the right copula is chosen; when a wrong copula is chosen the estimators are inconsistent.

Absence of an exclusion restriction
Sometimes the same regressors have to be used in both the selection and outcome equations because an exclusion restriction is not available. To investigate the performance of the copula sample selection models in this situation, the simulation study described above was repeated for the case in which system (9) did not include s 12 (z 2i ). The sampling experiments were based on system (10), where the functions s 11 and s 21 and the parameters α 11 , α 12 , α 21 , α 22 were the same as given at the beginning of Section 4, and the predictors u i and z 1i were generated in the same way.
Following a reviewer's suggestion, we also considered the harder scenario in which the same functional form of the effect of variable z 1 was present in both model equations; the simulated data in this case were based on equations (11). We observe that the quality of the estimator of σ is practically unaffected by the lack of an exclusion restriction in both scenarios considered, in terms of root mean squared error and bias. For the remaining parameters, we observe that removing s 12 (z 2 ) from the selection equation increases the bias and RMSE of the estimators in most of the cases considered. We also observe a larger variance and more cases of lack of convergence when the exclusion restriction is not available. The lack of an exclusion restriction leads to particularly unstable estimators of Kendall's τ in terms of relative bias in cases where this parameter is close to zero, as can be seen in Figures 5(a) and 6(b), where the relative percentage bias exceeds 160% and 110%, respectively, while the RMSEs of the estimator of τ in those cases do not indicate any particularly bad performance. These values of relative bias imply, however, that the average estimated values of τ equal approximately −0.06 and −0.018, respectively, which is of major importance when testing for the absence of sample selection bias as it affects the size and power of the test. This issue is further discussed in Section 4.3. For scenario (10) the influence of the lack of an exclusion restriction is usually much less pronounced than for the more difficult scenario (11). Moreover, in some cases the differences between the RMSEs of the model parameters for the cases with and without the exclusion restriction are rather negligible (see Figures 5(b) and 6(a)).

Testing the absence of sample selection bias
A key issue when fitting a sample selection model is testing the null hypothesis of absence of selection bias. If the variables y * 1i and y * 2i are associated, then sample selection bias occurs and it is necessary to estimate the outcome and selection equations jointly, accounting for the dependence structure between them. Otherwise, the model can be simplified considerably by dropping the selection equation (and consequently the copula function) from the analysis.
In general, the approach to testing for sample selection bias depends on the specific sample selection model assumed. In Heckman's two-step procedure (Heckman 1979), sample selection bias is tested using a t test for the significance of the inverse Mills ratio term. Dubin and Rivers (1989) considered likelihood ratio, Wald and Lagrange multiplier tests in the context of a censored probit model, and Vella (1992) proposed a conditional moment test.
In the context of copula regression spline sample selection models, testing for sample selection bias can be based on the dependence parameter θ, as absence of sample selection bias is equivalent to the condition θ = 0 for the normal, Frank, FGM and AMH copulas and θ = 1 for the Gumbel copula (note that the Clayton and Joe copulas do not allow for independence). Because of the restrictions on the values of the copula association parameters, the use of classic testing approaches may yield unreliable results in some copula cases. As a practical alternative, Kendall's τ coefficient can be employed: the null hypothesis can be tested by checking whether the confidence interval for τ includes 0. In this section, the results of a Monte Carlo study of the finite-sample performance of such an approach are presented.
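In practice, this check can be scripted from the summary() output; a hedged sketch follows, in which the name of the component holding the interval for τ (CItau) is an assumption rather than a documented part of the returned list.

```r
# Given an object fitted with SemiParSampleSel(), check whether the
# interval for Kendall's tau covers zero; 'CItau' is an assumed name
# for the component of the summary list holding this interval
no.selection.bias <- function(fit, prob.lev = 0.05) {
  s  <- summary(fit, prob.lev = prob.lev)
  ci <- range(s$CItau)
  ci[1] <= 0 && 0 <= ci[2]  # TRUE: interval covers 0, no evidence of bias
}
```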
For data sets generated using (9), the null rejection probabilities of the test for the absence of selectivity bias were calculated based on 99%, 95% and 90% confidence intervals for the parameter τ. As before, for every data set different copulas were considered when fitting the spline sample selection models (normal, FGM, AMH, Frank and Gumbel). The Clayton and Joe copulas are not considered in the study as they allow only strictly positive dependence, implying that the test would always reject the null hypothesis in these cases. For the case of absence of sample selection bias (τ = 0), the results of the Monte Carlo simulations based on 250 repetitions and sample sizes n = 1000, 3000, 5000 are presented in Table 11(a).
For n = 1000 the test using the Frank, normal and Gumbel copulas suffers from a high rejection frequency, whereas the test using FGM has a low rejection frequency. The rejection probabilities for the test using the AMH copula are close to the nominal values. As n increases, the null rejection probabilities converge to the nominal values; the AMH and Frank copulas perform best, achieving rejection probabilities very close to the theoretical values for n = 5000. The poorest performance occurs in the case of the Gumbel copula, for which the rejection probabilities converge to the theoretical values at a much slower rate as the sample size increases. Note, however, that the null hypothesis τ = 0 involves the boundary value of τ allowed under the Gumbel copula, which makes testing a difficult issue in this case.
Tables 11(b) and (c) present the null rejection probabilities of the sample selection bias test for data without an exclusion restriction, generated using (10) and (11), respectively. In both cases, a negative influence of the lack of the exclusion restriction can be observed, as the rejection probabilities are larger than those presented in Table 11(a), where the exclusion restriction is used. Moreover, the effect is more severe for data generated using (11), where the variable z 1 enters both the selection and the outcome equations in the same functional form. However, Tables 11(b) and (c) display the same tendencies regarding the comparison between the different copulas, with FGM, AMH and Frank having rejection probabilities close to the nominal values and the Gumbel copula displaying the worst performance.
A study of the power of the test for sample selection bias has also been conducted for the copulas for which the null rejection probabilities were reasonable (FGM, AMH and Frank). Results are reported in Table 12. Using the Frank copula leads to the most powerful tests. Poor performance can be observed when using the FGM and AMH copulas, with the FGM copula performing worst. These are the copulas allowing only a very limited range for τ (τ ∈ [−2/9, 2/9] for FGM and τ ∈ [−0.1817, 1/3] for AMH), which makes them perform poorly when strong dependence holds.
Tables 13 and 14 present the power of the test for sample selection bias in the absence of an exclusion restriction. Overall, the power of the test is smaller than when the exclusion restriction is present with the best performance observed, as before, when using the Frank copula and the worst when using the FGM copula.

Real data example
The copula regression spline sample selection models presented in this paper are illustrated using data from the RAND Health Insurance Experiment (RHIE), a comprehensive study of health care cost, utilization and outcome conducted in the United States between 1974 and 1982 (Newhouse 1999). As explained in the introductory section, the aim was to quantify the relationship between various covariates and annual health expenditures in the population as a whole. In this context, non-random sample selection arises because the sample consisting of individuals who used health care services differ in important characteristics from the sample of individuals who did not use them. Because some characteristics cannot be observed, traditional regression modeling is likely to deliver inconsistent estimates, hence the need to correct parameter estimates for non-random sample selection. We use the same subsample as in Cameron and Trivedi (2005); the variables used in the analysis are described in Table 3. Additional information can be found in Cameron and Trivedi (2005, Table 20.4) and Newhouse (1999).
Following Cameron and Trivedi (2005), the outcome and the selection equations include the same set of regressors. As in Marra and Radice (2013), the two equations include logc, idp, fmde, physlm, disea, hlthg, hlthf, hlthp, female, child, fchild and black as parametric components, and smooth functions of pi, inc, fam, educdec and xage, represented using thin plate regression splines with basis dimensions equal to 10 and penalties based on second-order derivatives (the default options in the package). Specifically, after reading the dataset, called ND, we load the package and specify the selection and outcome equations.
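The specification just described can be written down as follows; the response names y.sel (indicator of health service use) and y.exp (annual health expenditure) are placeholders for the actual variables in ND.

```r
library(SemiParSampleSel)

# Placeholder responses y.sel / y.exp; the covariates are those listed above
sel.eq <- y.sel ~ logc + idp + fmde + physlm + disea + hlthg + hlthf +
  hlthp + female + child + fchild + black +
  s(pi) + s(inc) + s(fam) + s(educdec) + s(xage)
out.eq <- y.exp ~ logc + idp + fmde + physlm + disea + hlthg + hlthf +
  hlthp + female + child + fchild + black +
  s(pi) + s(inc) + s(fam) + s(educdec) + s(xage)

# AMH copula with (default) normal margins
out_AMH <- SemiParSampleSel(list(sel.eq, out.eq), data = ND, BivD = "AMH")
```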
As for the selection equation, the results show that all variables which enter the model parametrically are statistically significant at the 10% level, except for fmde. The p values for the smooth terms, calculated as discussed in Section 2.3, indicate that fam does not have an impact on the response. Regarding the outcome equation, health status variables (such as physlm and disea) have an effect on annual health expenses, whereas the health insurance variable logc does not appear to determine medical expenses. The p values for the estimated smooths indicate that the effects of inc, fam and xage are significantly different from zero. The estimate of σ is 1.51 and is significantly different from zero. The estimate of θ is positive and statistically different from zero, indicating that the unobserved factors which affect the use of health services also affect medical expenses. The estimated degrees of freedom (total edf) of the penalized model, calculated as described in Section 2.2, are 65.349.
Using plot(), we produce the smooth function estimates for the outcome equation obtained from the AMH copula model; these are displayed in Figure 3.
R> plot(out_AMH, eq = 2, pages = 1, scale = 0, shade = TRUE,
+      seWithMean = TRUE, cex.axis = 1.6, cex.lab = 1.6)

The shaded regions represent 95% confidence bands calculated from the posterior distribution, as described in Section 2.3. The "rug plot" at the bottom of each graph shows the covariate values. The numbers shown on the y-axis of each plot indicate the estimated degrees of freedom (edf). Due to the identifiability constraints, the estimated curves are centered around zero. The results for xage and fam are consistent with the interpretation that health expenditure increases non-linearly as people become older, and that individual health expenditure decreases as family size increases.
We re-fit the AMH copula regression model by using the shrinkage option bs = "ts" in s().
Finally, we use aver() to calculate the overall estimated average from the fitted copula sample selection model, with the corresponding confidence interval obtained using the delta method.

Discussion
We introduced flexible continuous response sample selection models and discussed the R package SemiParSampleSel which implements them. The package can be used to fit models where the linear predictors are flexibly specified using parametric and non-parametric components, and the dependence between the selection and outcome equations is modeled through the use of copulas. The developments and implementation proposed here extend and complement previous R implementations of sample selection models. Allowing for non-normal bivariate distributions between the model equations is important since the assumption of bivariate normality is often criticized.
A large number of copulas have been proposed in the literature and our selection aims to reflect the most commonly used bivariate copulas in empirical applications as well as different types of dependence in the data. Copulas such as normal and Frank allow for equal degrees of positive and negative dependence and are comprehensive. On the other hand, copulas such as Clayton, Joe and Gumbel only account for positive dependence but capture a type of structure which is not reflected by Frank or normal. Specifically, the Clayton copula exhibits a strong left tail dependence and a relatively weak right tail dependence, and vice versa for Gumbel and Joe.
In order to address the issue of testing for the absence of sample selection bias, an approach based on a confidence interval for the Kendall's τ association coefficient has been explored. The empirical study has indicated that this approach performs well when using the Frank copula. However, for other copulas a significantly poorer small sample performance of the test has been observed. As every copula function has different characteristics and poses different issues in testing, each of them requires a separate study. It is also unclear whether a unique test that performs equally well for a wide range of copulas can be designed. Thus a further detailed study of the problem of testing for sample selection bias in such a general context will be another direction of future research.
The reader is cautioned that the class of models presented here is not intended to be exhaustive; as with the majority of methods, under model misspecification the proposed approach does not provide consistent estimates. For example, if the margins are non-normal (e.g., they exhibit heavy-tailed behavior or should be modeled using skewed, contaminated or mixture distributions), biased estimates should be expected. The extent of the bias cannot be predicted a priori and depends on the application at hand. In light of this, possible generalizations of the methods implemented in SemiParSampleSel are to extend the scope of the marginal distributions for the outcome equation, using for instance the gamma and Poisson distributions, and that of the copulas available in the package, using for example the Plackett and rotated copulas. Future research will also concern the development of model checking tools. Finally, a next release of the package will allow the user to model σ and θ as functions of linear predictors like those defined in Section 2.1; the theoretical and computational framework remains essentially unchanged.

A. Analytical expressions for g and H
In this section, we present expressions for the gradient vector and Hessian matrix of the sample selection log-likelihood function (2) for the Clayton, Joe, FGM, AMH, Frank and Gumbel copulas, with normal margins. The expressions for the normal case can be found in Marra and Radice (2013). Throughout, Φ and φ denote the standard normal distribution and density functions, respectively.
The elements of the gradient vector and of the Hessian matrix (the latter denoted h 11 , . . . , h 44 ) are defined below for each copula.

B. Tables and figures
For convenience, in this section we report all the tables and figures of results commented on in Sections 4.1 and 4.2, as well as further supplementary tables.

Table 5: Percentage biases and RMSEs of the estimators of α 21 , α 22 , σ, τ and s 21 (z 1 ), and percentage frequency at which each copula model was selected by AIC and BIC, for data simulated using a bivariate Clayton copula with normal margins, when employing the normal, Clayton, Joe, FGM, AMH, Frank and Gumbel copula regression spline sample selection models. The number of simulated datasets is equal to 250. τ denotes the association between the selection and outcome equations. See Section 4 for further details.

Table 7: Percentage biases and RMSEs of the estimators of α 21 , α 22 , σ, τ and s 21 (z 1 ), and percentage frequency at which each copula model was selected by AIC and BIC, for data simulated using a bivariate FGM copula with normal margins, when employing the normal, Clayton, Joe, FGM, AMH, Frank and Gumbel copula regression spline sample selection models. The number of simulated datasets is equal to 250. τ denotes the association between the selection and outcome equations. See Section 4 for further details.

Table 9: Percentage biases and RMSEs of the estimators of α 21 , α 22 , σ, τ and s 21 (z 1 ), and percentage frequency at which each copula model was selected by AIC and BIC, for data simulated using a bivariate Frank copula with normal margins, when employing the normal, Clayton, Joe, FGM, AMH, Frank and Gumbel copula regression spline sample selection models. The number of simulated datasets is equal to 250. τ denotes the association between the selection and outcome equations. See Section 4 for further details.

(a) Random samples generated using (9) and a normal product distribution (τ = 0). (b) Random samples generated using (10) and a normal product distribution (τ = 0).
Table 14: Null rejection probabilities (%) for testing the null hypothesis H 0 : τ = 0 based on (100 − α)% confidence intervals for τ, when fitting (a) FGM, (b) AMH, (c) Frank copula regression spline sample selection models. Random samples were generated using (11), without an exclusion restriction.

  n     α(%)   τ = 0.2   τ = 0.5   τ = 0.7
  1000    1      22        78        94
          5      38        86        97
         10      51        92        98
  3000    1      40        95       100
          5      59        97       100
         10      67        99       100
  5000    1      62        99       100
          5      80       100       100
         10      85       100       100

Table 15: Percentage biases and RMSEs of the estimators of α 21 , α 22 , σ, τ and s 21 (z 1 ), and percentage frequency at which each copula model was selected by AIC and BIC, for data simulated using a normal bivariate distribution and (9), when employing the normal, Clayton, Joe, FGM, AMH, Frank and Gumbel copula regression spline sample selection models. Sample size equals n = 3000.

Table 17: Percentage biases and RMSEs of the estimators of α 21 , α 22 , σ, τ and s 21 (z 1 ), and percentage frequency at which each copula model was selected by AIC and BIC, for data simulated using a normal bivariate distribution and (10), when employing the normal, Clayton, Joe, FGM, AMH, Frank and Gumbel copula regression spline sample selection models. The number of simulated datasets is equal to 250. τ denotes the association between the selection and outcome equations. See Section 4.2 for further details.

Table 19: Percentage biases and RMSEs of the estimators of α 21 , α 22 , σ, τ and s 21 (z 1 ), and percentage frequency at which each copula model was selected by AIC and BIC, for data simulated using a bivariate Joe copula with normal margins and (10), when employing the normal, Clayton, Joe, FGM, AMH, Frank and Gumbel copula regression spline sample selection models. The number of simulated datasets is equal to 250. τ denotes the association between the selection and outcome equations. See Section 4.2 for further details.
Table 21: Percentage biases and RMSEs of the estimators of α 21 , α 22 , σ, τ and s 21 (z 1 ), and percentage frequency at which each copula model was selected by AIC and BIC, for data simulated using a bivariate AMH copula with normal margins and (10), when employing the normal, Clayton, Joe, FGM, AMH, Frank and Gumbel copula regression spline sample selection models. The number of simulated datasets is equal to 250. τ denotes the association between the selection and outcome equations. See Section 4.2 for further details.

Table 23: Percentage biases and RMSEs of the estimators of α 21 , α 22 , σ, τ and s 21 (z 1 ), and percentage frequency at which each copula model was selected by AIC and BIC, for data simulated using a bivariate Gumbel copula with normal margins and (10), when employing the normal, Clayton, Joe, FGM, AMH, Frank and Gumbel copula regression spline sample selection models. The number of simulated datasets is equal to 250. τ denotes the association between the selection and outcome equations. See Section 4.2 for further details.

Table 25: Percentage biases and RMSEs of the estimators of α 21 , α 22 , σ, τ and s 21 (z 1 ), and percentage frequency at which each copula model was selected by AIC and BIC, for data simulated using a bivariate Clayton copula with normal margins and (11), when employing the normal, Clayton, Joe, FGM, AMH, Frank and Gumbel copula regression spline sample selection models. The number of simulated datasets is equal to 250. τ denotes the association between the selection and outcome equations. See Section 4.2 for further details.
Table 27: Percentage biases and RMSEs of the estimators of α 21 , α 22 , σ, τ and s 21 (z 1 ), and percentage frequency at which each copula model was selected by AIC and BIC, for data simulated using a bivariate FGM copula with normal margins and (11), when employing the normal, Clayton, Joe, FGM, AMH, Frank and Gumbel copula regression spline sample selection models. The number of simulated datasets is equal to 250. τ denotes the association between the selection and outcome equations. See Section 4.2 for further details.

Table 28: Percentage biases and RMSEs of the estimators of α 21 , α 22 , σ, τ and s 21 (z 1 ), and percentage frequency at which each copula model was selected by AIC and BIC, for data simulated using a bivariate AMH copula with normal margins and (11), when employing the normal, Clayton, Joe, FGM, AMH, Frank and Gumbel copula regression spline sample selection models. The number of simulated datasets is equal to 250. τ denotes the association between the selection and outcome equations. See Section 4.2 for further details.

Table 29: Percentage biases and RMSEs of the estimators of α 21 , α 22 , σ, τ and s 21 (z 1 ), and percentage frequency at which each copula model was selected by AIC and BIC, for data simulated using a bivariate Frank copula with normal margins and (11), when employing the normal, Clayton, Joe, FGM, AMH, Frank and Gumbel copula regression spline sample selection models. The number of simulated datasets is equal to 250. τ denotes the association between the selection and outcome equations. See Section 4.2 for further details.