Robust Analysis of Sample Selection Models through the R Package ssmrob

The aim of this paper is to describe the implementation of, and to provide a tutorial for, the R package ssmrob, which is developed for robust estimation and inference in sample selection and endogenous treatment models. The sample selectivity issue occurs in practice in various fields when a non-random sample of a population is observed, i.e., when observations are present according to some selection rule. It is well known that the classical estimators introduced by Heckman (1979) are very sensitive to small deviations from the distributional assumptions (typically the normality assumption on the error terms). Zhelonkin, Genton, and Ronchetti (2016) investigated the robustness properties of these estimators and proposed robust alternatives to the estimator and the corresponding test. We briefly discuss the robust approach and demonstrate its performance in practice through several empirical examples. The package can be used both to produce a complete robust statistical analysis of these models, which complements the classical one, and as a set of useful tools for exploratory data analysis. Specifically, robust estimators and standard errors of the coefficients of both the selection and the regression equations are provided, together with a robust test of selectivity. The package therefore provides additional useful information to practitioners in different fields of application by enhancing their statistical analysis of these models.


Introduction
The present paper has three purposes. First, we introduce the R package ssmrob (Zhelonkin, Genton, and Ronchetti 2021) for the robust analysis of data with sample selection; second, we discuss some practical aspects of the use of robust methods in general and in sample selection models in particular; third, we propose a robust estimator for the endogenous treatment model. As advocated by several authors (see, e.g., Athey and Imbens 2017), in order to increase the transparency and credibility of research, it is reasonable to complement the reported results by a supplementary analysis. Several measures have been proposed; see Andrews, Gentzkow, and Shapiro (2017) and Athey and Imbens (2015). We believe that reporting the results of a robust analysis should become normal practice whenever parametric estimators are used. This will not only safeguard the statistical analysis from misleading implications possibly due to deviations from the assumed model, but also enrich the analysis by offering additional useful information on the structure of the data.
The paper can be read in different ways. Readers already familiar with robust statistics and interested in applying the new robust analysis directly can check the models covered in Section 2 and then go directly to the implementation and the use of the package in Sections 5 and 6, respectively. Those who would like a short introduction to the basic concepts of robust statistics and a general discussion of its role in this setup, including its relationship to parametric and semiparametric methods, can read Sections 2 and 3. Section 4 contains the description of the robust estimators and tests, which is useful for understanding their structure and the options of the functions presented in Sections 5 and 6; a user willing to use only the default options of the package can skip this section. Finally, the practical use of the package is discussed in Sections 5 and 6, where two empirical applications are discussed in detail by comparing the classical statistical analysis with its full robust counterpart.

Sample selection models and main functions
We consider three models, but for simplicity of exposition we will focus on the standard Heckman (1979) framework.

Heckman's model (Tobit-2 model)
The model is specified by

y*_{1i} = x_{1i}^T β_1 + e_{1i}, y_{1i} = I(y*_{1i} > 0), (1)

y_{2i} = x_{2i}^T β_2 + e_{2i} if y_{1i} = 1, and y_{2i} = NA if y_{1i} = 0, (2)

where x_{ji} is a vector of explanatory variables, β_j is a p_j × 1 vector of parameters, j = 1, 2, the e_{ji} are error terms which follow a bivariate normal distribution with variances σ_1^2 = 1 and σ_2^2 and correlation ρ, and I(·) is the indicator function. The variance parameter σ_1^2 is set equal to 1 to ensure identifiability. Here (1) is the selection equation, defining the observability rule, and (2) is the equation of interest, or outcome equation. Notice that sometimes zeros are used instead of NA (not available), although this notational practice can be misleading. The system (1)-(2) is also known as the Tobit-2 model in the classification of Amemiya (1984). In the analysis of the dataset in Section 6.2 we use this model, where y_{1i} in the selection equation defines whether or not the ith individual enters the labor force and y_{2i} in the outcome equation represents her log-wage; x_{1i} and x_{2i} are vectors of covariates which include age, education status, experience, and squared experience.
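For concreteness, the Tobit-2 data generating process can be sketched in a few lines of base R (an illustrative simulation, not package code; all names and parameter values below are arbitrary choices):

```r
# Illustrative draw from the Tobit-2 model: probit selection (1),
# outcome (2) observed only when y1 = 1; rho links the two error terms.
set.seed(1)
N <- 500
x1 <- cbind(1, rnorm(N)); x2 <- cbind(1, rnorm(N))
beta1 <- c(0.5, 1); beta2 <- c(1, 2); rho <- 0.7; sigma2 <- 1
e1 <- rnorm(N)                                  # Var(e1) fixed at 1
e2 <- sigma2 * (rho * e1 + sqrt(1 - rho^2) * rnorm(N))
y1 <- as.numeric(x1 %*% beta1 + e1 > 0)         # selection indicator
y2 <- ifelse(y1 == 1, x2 %*% beta2 + e2, NA)    # outcome, NA if unselected
mean(y1)                                        # share of observed outcomes
```

The NA entries of y2 make the observability rule explicit, matching the convention of equation (2).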

Switching regressions model (Tobit-5 model)
It is a natural extension of Heckman's model. In this case, instead of NA in (2), we have a second regime. The selection equation (1) remains the same. The outcome equation can be written as

y_{2i} = x_{21i}^T β_{21} + e_{21i} if y_{1i} = 0, and y_{2i} = x_{22i}^T β_{22} + e_{22i} if y_{1i} = 1, (3)

where x_{2ji} are the vectors of explanatory variables, β_{2j} are q_j × 1 vectors of parameters, j = 1, 2, and the error terms e_{21i} and e_{22i} together with e_{1i} follow a multivariate normal distribution.

Endogenous treatment model (ETM)
It has the same selection equation (1), but the outcome equation becomes

y_{2i} = x_{2i}^T β_2 + α y_{1i} + e_{2i}, (4)

where the dependent variable y_1 of the selection equation appears as an explanatory variable. The error terms follow a bivariate normal distribution with the same covariance structure as in the Tobit-2 model. Because of the non-zero correlation between the errors, y_1 becomes endogenous. The parameter α is the average treatment effect. In this case y_1 is a treatment variable, for instance the decision to enroll in a job training program.
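A minimal sketch of an ETM draw in base R (illustrative values, not package code); note that, unlike in the Tobit-2 model, the outcome is observed for every unit:

```r
# Illustrative draw from the endogenous treatment model: probit treatment
# assignment, outcome observed for all units, alpha = treatment effect.
set.seed(2)
N <- 500
x1 <- cbind(1, rnorm(N)); x2 <- cbind(1, rnorm(N))
rho <- 0.5; alpha <- 1.5
e1 <- rnorm(N)
e2 <- rho * e1 + sqrt(1 - rho^2) * rnorm(N)
y1 <- as.numeric(x1 %*% c(0.2, 1) + e1 > 0)   # treatment indicator
y2 <- x2 %*% c(1, 2) + alpha * y1 + e2        # outcome for every unit
# naive OLS of y2 on (x2, y1) is biased because cor(y1, e2) != 0
```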
These three models are the central tools for the analysis of data with non-random sampling and play an important role in policy evaluation and treatment effect estimation in observational studies.
The package ssmrob (Zhelonkin et al. 2021) is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=ssmrob. The main functions in the package are ssmrob() and etregrob(). The function etregrob() is the estimator of the ETM, which returns an object of class 'etregrob'. The function ssmrob() works as a wrapper.
In the current version (version 1.0) there are two options: Heckman's selection model (Tobit-2) and the switching regressions model with probit selection mechanism (Tobit-5). If the Tobit-2 model is chosen, then heckitrob() is called; if the Tobit-5 model is chosen, then heckit5rob() is called. The function ssmrob() returns an object of class 'heckitrob' or 'heckit5rob' for Tobit-2 or Tobit-5, respectively.

Estimation
In this section we briefly review different estimation approaches (for a thorough review, see, e.g., Vella 1998) and discuss situations where each method is preferable from the robust statistics perspective. We focus on the Tobit-2 model, but the general arguments hold for the other two models as well.

Parametric estimation
Without any doubt, the parametric approach is currently the most popular for the estimation of sample selection models in practice. It is straightforward to write the likelihood function, which nowadays can be maximized relatively easily, although this was computationally difficult in the seventies, when the model was introduced. Heckman (1979) proposed an appealing two-step estimator, which became standard because of its simplicity and easy interpretation. Consider the conditional expectation of y_{2i} given x_{2i} and the selection rule:

E(y_{2i} | x_{2i}, y_{1i} = 1) = x_{2i}^T β_2 + E(e_{2i} | e_{1i} > −x_{1i}^T β_1). (5)

The expectation on the right hand side of (5) is in general not equal to zero, which leads to the following regression:

y_{2i} = x_{2i}^T β_2 + β_λ λ(x_{1i}^T β_1) + ν_i, (6)

where λ(u) = φ(u)/Φ(u) is, up to the factor β_λ = ρσ_2, the conditional expectation on the right hand side of (5), called the inverse Mills ratio (IMR), ν_i is a zero expectation error term, and φ(·) and Φ(·) denote the density and cumulative distribution function of the standard normal distribution, respectively. Heckman suggested to estimate β_1 by the probit maximum likelihood estimator (MLE) and to use ordinary least squares (OLS) in regression (6), where λ(·) corrects for the selection bias. A similar two-step structure can be used for the switching regressions model and the ETM: the first step is probit MLE, and the second step is OLS with the corresponding conditional expectation as a selection bias correction.
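The two-step recipe can be sketched with base R on simulated data (an illustrative reimplementation, not the package's code): a probit fit gives the linear index, the IMR is formed from it, and OLS on the selected subsample includes the IMR as an extra regressor.

```r
# Heckman's two-step estimator, sketched with base R:
# step 1: probit MLE of the selection equation;
# step 2: OLS of the observed outcomes on x2 and the inverse Mills ratio.
set.seed(3)
N <- 2000
w <- rnorm(N); x <- rnorm(N)                  # w: exclusion restriction
e1 <- rnorm(N); e2 <- 0.7 * e1 + sqrt(1 - 0.49) * rnorm(N)
y1 <- as.numeric(0.5 + w + x + e1 > 0)
y2 <- ifelse(y1 == 1, 1 + 2 * x + e2, NA)

step1 <- glm(y1 ~ w + x, family = binomial(link = "probit"))
xb  <- predict(step1)                         # fitted linear index x1'b1
imr <- dnorm(xb) / pnorm(xb)                  # inverse Mills ratio lambda
step2 <- lm(y2 ~ x + imr, subset = y1 == 1)
coef(step2)                                   # slope near 2, imr coefficient near rho * sigma2
```

An OLS fit of y2 on x alone over the selected subsample would be biased; the IMR term absorbs the selection effect.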
Nowadays both Heckman's two-step estimator and the full information maximum likelihood estimator (FIML) are standard methods implemented in many software packages, including SAS (SAS Institute Inc. 2014) via proc qlim, Stata (StataCorp 2017) via heckman and etregress, and R (R Core Team 2021) via the package sampleSelection by Toomet and Henningsen (2008).
However, in spite of their simplicity and appealing interpretation, both estimators are very sensitive to the normality assumption on the error terms. The two-step estimator is considered to be slightly more robust than FIML (Cameron and Trivedi 2009, p. 544). Although the distributional assumptions for both estimators are the same, when there are measurement errors in the outcome equation the two-step estimator remains consistent while FIML does not (Stapleton and Young 1984). Another interesting case is when there are outliers only in the outcome equation. Then, with the two-step estimator, at least the selection equation is estimated correctly and only the second step can become biased, whereas with FIML the parameters of both equations can become biased. In general, as we show below, both estimators can be arbitrarily biased even when the assumed model F is approximately correct, e.g., when 99% of the observations come from F and 1% from some arbitrary distribution G.
A natural way to mitigate the sensitivity problem is to use a more flexible parametric family of distributions. For instance, Lee (1983) and Marchenko and Genton (2012) proposed to use the t distribution, which allows longer tails and their adjustment through the degrees of freedom, Smith (2003) proposed a copula-based approach, and Ogundimu and Hutton (2016) proposed a skew-normal selection model. Although these approaches add flexibility to the standard normal model, they do not provide full protection against possible deviations from the central model, i.e., when the true data generating distribution lies in a full neighborhood of the assumed parametric model; see Section 3.2.

Robust estimation
The robust approach offers a reasonable compromise between the fully parametric approach explained in Section 3.1 and the semiparametric approach discussed in Section 3.3. We still assume the classical normal model F as the central model, but we believe that the true data generating distribution lies in a neighborhood of it, i.e., F_ε = (1 − ε)F + εG, where G is some unknown arbitrary distribution and ε is (typically) small. We then derive (robust) estimators that are consistent at the central model F and remain stable when the true data generating distribution is an F_ε lying in a neighborhood of F, i.e., their bias is bounded no matter the distribution F_ε. Notice that the classical estimators (FIML and Heckman's two-step) are also consistent at the central model, but can have an infinite bias when the underlying distribution of the data lies in an arbitrarily small neighborhood of the central model.
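A one-line simulation illustrates such an ε-neighborhood: under 1% contamination of a standard normal sample by a point mass at 10, the sample mean drifts by roughly ε times the outlier location, while a robust summary such as the median barely moves (illustrative values):

```r
# Draw from the eps-contaminated normal F_eps = (1 - eps) N(0,1) + eps G,
# with G a point mass at 10; the mean shifts by about eps * 10 = 0.1,
# while the median stays close to 0.
set.seed(4)
n <- 1e5; eps <- 0.01
z <- ifelse(runif(n) < eps, 10, rnorm(n))
c(mean = mean(z), median = median(z))
```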
The robust estimators identify the same parameters as the classical parametric estimators and therefore retain the same interpretation as in the classical case. The price to pay for robustness is some loss of efficiency at the central model. In the default tuning implemented in the package, this loss is approximately 20%. One might argue that the two-step estimator is inefficient in itself; however, even under minor contamination (1%), the robust estimator becomes more efficient than the classical one. We demonstrate this in the examples in Section 6. Details about the robust approach are provided in Section 4, and a complete discussion can be found in Zhelonkin et al. (2016).

Semi-and nonparametric estimation
If the parametric assumptions are not satisfied even approximately, or if we are completely uncertain about the data generating distribution, the natural alternative is to use semi- and nonparametric estimators. The goal of robust methods is to estimate the parameters of the central model F, in order to retain their interpretation, in spite of the fact that the data were generated by some F_ε. In a fully nonparametric setup, the goal would instead be to estimate characteristics of the distribution F_ε, such as its expectation, rather than those of the central model F.
The literature on semi- and nonparametric estimation of models with sample selectivity is large; see Ahn and Powell (1993), Newey (2009), and Chapter 8 of the book by Pagan and Ullah (1999). Here we only briefly discuss the estimators that preserve linearity in the parameters and relax the distributional assumptions. For the treatment of nonlinear predictors we refer to the paper by Wojtyś, Marra, and Radice (2016) and their package SemiParSampleSel; see also Das, Newey, and Vella (2003) and references therein.
Similarly to parametric estimation, there are one-step FIML-like estimators and semiparametric two-step estimators. An example of a one-step estimator is the method proposed by Gallant and Nychka (1987), based on a Hermite series approximation of the true density. The authors provide a consistency result, but not the (asymptotic) distribution theory.
Another strategy is to use a two-step approach. Newey (2009) proposed to first estimate the selection equation as a semiparametric single index model (Klein and Spady 1993) and then to use a series expansion to correct for sample selection in the second step with OLS. However, there is a possible identification problem. The single index model does not identify β_1: the intercept is not identified, and the other components of β_1 are identified only up to a proportionality factor. For the identification of the parameters in the outcome equation we need to assume exclusion restrictions, i.e., there must be a variable in x_1 that is not included in x_2. If x_1 = x_2, then the identification of the model depends entirely on the functional form and the distributional assumptions (Manski 1989). In practice it is often difficult to find such a variable and, as mentioned by Vella (1998, p. 135), some economic models require the same explanatory variables to appear in both equations. Moreover, the intercept in β_2 is absorbed into the series expansion and requires separate estimation. Heckman (1990) and Andrews and Schafgans (1998) proposed methods to estimate the intercept, which require so-called identification at infinity.
If the practitioner switches to semi-nonparametric methods, then the exclusion restriction must be imposed, and not all the structural parameters can be identified. This creates a delicate trade-off between the parametric assumptions and the nonparametric identification assumptions. Which is more restrictive in practice is a complicated question, beyond the scope of this paper. It must also be mentioned that the exclusion restriction is desirable (though formally not compulsory) even in the fully parametric setup, since the inverse Mills ratio is quasi-linear over a wide range of its support. The FIML estimator also suffers from this problem; see Leung and Yu (2000) for a thorough discussion.
Most empirical work is based on the parametric approach, and there are several reasons for that. The simplest one is that nonparametric methods are technically demanding. Heckman and Vytlacil (2007) mentioned the sensitivity of semiparametric estimators to the choices of smoothing parameters, trimming parameters, and bandwidths. Another reason is that the parametric framework allows one to estimate the structural parameters corresponding to the economic model and hence to evaluate the entire model, not only to estimate some of its parameters.

Quantile regression approach
Quantile regression (QR) (Koenker 2005) is often considered a robust alternative to OLS in linear regression. In the original paper by Koenker and Bassett (1978), the robustness issue was one of the central points motivating the introduction of QR, together with the advantage (compared to standard OLS) of providing estimates of all the quantiles (and not just the expectation) of the conditional distribution of the response variable given the covariates.
In the sample selection literature several QR methods have been proposed (Buchinsky 1998; Huber and Melly 2015; Arellano and Bonhomme 2017). The main focus of these papers is the estimation of quantile effects; the robustness properties of the estimators are not discussed. Although the QR approach is a powerful tool for statistical modeling, its robustness is not automatically guaranteed, at least by the currently available methods. Indeed, there are situations where the QR estimator can break down in the presence of even a very small deviation from the assumed model. We return to this point in more detail at the end of Section 4.1.

A general robust approach
In this section we describe a general way to obtain robust estimators and tests for models with sample selectivity.

Robust estimation
The robustness issues with Heckman's two-stage estimator can be naturally described by studying the behavior of the estimator when the true error generating distribution is not the central normal model F but some perturbation F_ε = (1 − ε)F + εG, where G is an unknown arbitrary distribution and ε is the contamination proportion. A robust estimator in this framework is consistent at the model F and only slightly biased at F_ε, any distribution in an ε-neighborhood of F. Notice that the classical estimator can have an infinite bias at such an F_ε; see below.
The robust approach controls the worst possible bias that can occur when the data generating distribution belongs to that neighborhood. It turns out that the worst possible bias over all G can be linearly approximated by

sup_G ||T(F_ε) − T(F)|| ≈ ε sup_z ||IF(z; T, F)||,

where IF(z; T, F) is the so-called influence function (IF), the derivative of the estimator viewed as a functional of the underlying distribution; see Hampel (1974) and Hampel, Ronchetti, Rousseeuw, and Stahel (1986). If the estimator has an unbounded influence function, the corresponding worst-case bias in the ε-neighborhood of F is infinite.
To check the IF of Heckman's estimator, recall that β_1 is estimated in the first stage by MLE in a probit model, whereas β_2 is estimated by OLS in regression (6), with λ(·) correcting for the selection bias.
The population versions of the estimating equations defining the two-stage estimator are given by

E[Ψ_1{(x_1, y_1); S(F)}] = 0, (7)
E[Ψ_2{(x_2, y_2); S(F), T(F)}] = 0, (8)

where

Ψ_1{(x_1, y_1); S(F)} = {y_1 − Φ(x_1^T β_1)} φ(x_1^T β_1) / [Φ(x_1^T β_1){1 − Φ(x_1^T β_1)}] x_1, (9)
Ψ_2{(x_2, y_2); S(F), T(F)} = [y_2 − x_2^T β_2 − λ{(x_1, y_1); S(F)} β_λ] (x_2^T, λ)^T, (10)

are the score functions of the first and second stage estimators, respectively, S(F) and T(F) are the population counterparts of the estimators of the parameters β_1 and β_2, respectively, and λ{(x_1, y_1); S(F)} denotes the dependence of λ on S(F), so that T(F) depends on F both directly and indirectly through S(F).
The two-stage estimator is the solution of the empirical counterpart of the above system of estimating equations. In particular, (7) with (9) corresponds to the estimating equation of the MLE in a probit model, whereas (8) with (10) is the estimating equation of OLS in the regression model (6).
It can be shown that the IF of T is proportional to the score functions Ψ_1 and Ψ_2, i.e., it grows without bound in the errors and in the covariates. It is therefore unbounded, and this causes the non-robustness of the estimator; see Proposition 1 in Zhelonkin et al. (2016).
It is then clear that, in order to robustify the classical Heckman two-stage estimator, we need to bound the two score functions, which amounts to performing a robust probit estimation in the first stage and a robust regression in the second stage. To do this, a thresholding of the quantities mentioned above is necessary, and it is achieved by "huberizing" (i.e., applying the Huber function (12) to) the errors {y_1 − Φ(x_1^T β_1)} and (y_2 − x_2^T β_2 − λβ_λ), and by downweighting x_1 and x_2. This operation defines new score functions of the so-called Mallows type. For the first stage, following Cantoni and Ronchetti (2001), we define

Ψ_1^R{z_1; S(F)} = ψ_{c_1}(r_1) w(x_1) μ'(x_1^T β_1) / V^{1/2}(μ) − α(β_1), (11)

where r_1 = {y_1 − μ(x_1^T β_1)} / V^{1/2}(μ) is the Pearson residual, ψ_c is the Huber function defined by

ψ_c(r) = r min(1, c/|r|), (12)

and w(x_1) is a weight based on the inverse of the robust Mahalanobis distance, computed by means of high breakdown robust estimators of location and scatter of the x_{1i}. Several options are implemented in the package; see the argument heckitrob.control() in Section 5. The constant α(β_1) ensures that the modified estimating equation remains unbiased, i.e., E[Ψ_1^R{z_1; S(F)}] = 0. The modification of the classical score function entails an efficiency loss at the normal model. This can be viewed as an insurance premium providing protection against possible deviations from the model and their consequences on the bias of the estimator. The tuning constant c_1 can then be chosen to ensure a given level of asymptotic efficiency at the normal model. A typical value is 1.345.
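The Huber function referred to in (12) is one line of R (a direct transcription, with the typical tuning constant as default):

```r
# Huber function psi_c(r) = r * min(1, c / |r|): identity in the core,
# truncated at +/- c in the tails; c = 1.345 is the typical tuning value.
huber_psi <- function(r, c = 1.345) r * pmin(1, c / abs(r))
huber_psi(c(-3, -1, 0.5, 2))   # -1.345 -1.000 0.500 1.345
```

Small residuals pass through unchanged, while any residual larger than c in absolute value is capped, which is exactly what bounds the influence of outlying errors.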
For the second stage, we have a Mallows-type robust score function

Ψ_2^R{z_2; S(F), T(F)} = ψ_{c_2}(y_2 − x_2^T β_2 − λβ_λ) ω(x_2, λ) (x_2^T, λ)^T, (13)

where c_2 = 1.345, z_2 = (x_2, y_2), and the weight function ω(·, ·) is based on the robust Mahalanobis distance d(x_2, λ), e.g.,

ω(x_2, λ) = min{1, c_m / d^2(x_2, λ)}, (14)

with c_m the 95% quantile of the χ²-distribution. When an exclusion restriction is not available, an additional modification of the weights is used in order to reduce the loss of efficiency: we split the covariate space into two subspaces, calculate the weights separately, and finally combine them. Details are given in Zhelonkin et al. (2016), p. 814.
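Such leverage weights can be sketched with MASS (the exact weight form and the options used inside the package may differ; the min{1, c_m/d²} form below is one standard choice and is stated here as an assumption):

```r
# Mallows-type leverage weights: downweight design points whose robust
# (squared) Mahalanobis distance exceeds the 95% chi-square quantile.
library(MASS)                       # for cov.rob (minimum volume ellipsoid)
set.seed(5)
X <- matrix(rnorm(200), ncol = 2)
X[1, ] <- c(8, 8)                   # one leverage outlier
rc <- cov.rob(X)                    # high-breakdown location and scatter
d2 <- mahalanobis(X, rc$center, rc$cov)
cm <- qchisq(0.95, df = ncol(X))
w  <- pmin(1, cm / d2)              # weights in (0, 1]
w[1]                                # the outlier receives a small weight
```

Because the center and scatter are estimated robustly, the outlier does not mask itself: its distance, and hence its downweighting, is computed relative to the bulk of the data.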
Notice that the original QR estimator (Koenker and Bassett 1978) has a bounded IF with respect to the dependent variable, but its IF is unbounded in the space of the explanatory variables. In a sample selection setting there is also the first estimation stage. If one uses the probit MLE, then the final QR is not robust in any case. If a robust probit is used, then the introduction of robustness weights in the covariate space is required and, to the best of our knowledge, this is still an open question. Buchinsky (1998) and Huber and Melly (2015) used the semiparametric single index model in the first step and the original QR in the second step. This leads to the identification issues discussed in Section 3.3, and the procedure is in general not robust to outliers in the covariate space. Moreover, it is less efficient than our robust estimator (Zhelonkin et al. 2016). To summarize, the QR approach is a very useful tool for modeling quantile effects. However, one should be careful about using it as a robust alternative, since a fully robust QR estimator for the sample selection model is not yet available.

Robust inference
The robust two-stage estimator defined by (7)-(8), with the score functions given by (11)-(13), is an M-estimator, and its asymptotic variance is readily available (see Zhelonkin et al. 2016, p. 814, Formula (19)); it can be consistently estimated by means of the heteroscedasticity-consistent variance estimator; see Eicker (1967), Huber (1967), and White (1980). This allows us to construct a t test for sample selection bias, i.e., H_0: β_λ = 0 vs. H_A: β_λ ≠ 0, based on the robust estimator of β_λ and the corresponding estimator of its standard error.
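The heteroscedasticity-consistent (White) covariance at the heart of this inference can be sketched in base R for a plain lm fit; note that the package's variance formula additionally accounts for first-stage estimation, so this is only the second-stage ingredient:

```r
# White's (HC0) heteroscedasticity-consistent covariance for an lm fit:
# (X'X)^{-1} X' diag(u^2) X (X'X)^{-1}, with u the OLS residuals.
hc0_vcov <- function(fit) {
  X <- model.matrix(fit)
  u <- residuals(fit)
  bread <- solve(crossprod(X))
  meat  <- crossprod(X * u)          # X' diag(u^2) X
  bread %*% meat %*% bread
}
set.seed(6)
x <- rnorm(100)
y <- 1 + x + rnorm(100) * (1 + abs(x))       # heteroscedastic errors
fit <- lm(y ~ x)
se <- sqrt(diag(hc0_vcov(fit)))
tstat <- coef(fit) / se                      # robust t statistics
```

A t test of a single coefficient, such as the one for β_λ in the selection bias test, is then the estimate divided by its robust standard error.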

Robust ETM
The two-step estimator of the ETM consists of a probit MLE for the selection equation (1) and OLS for the following regression:

y_{2i} = x_{2i}^T β_2 + α y_{1i} + β_{λ_C} λ_C{(x_{1i}, y_{1i}); β_1} + ν_i, (15)

where ν_i is a zero mean error term and λ_C is the inverse Mills ratio for the complete sample, defined by λ_C = φ(x_1^T β_1)/Φ(x_1^T β_1) if y_1 = 1 and λ_C = −φ(x_1^T β_1)/{1 − Φ(x_1^T β_1)} if y_1 = 0.
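The complete-sample inverse Mills ratio can be coded directly; the two-branch form below is the standard one for treated and untreated units and is stated here as an assumption about the paper's notation:

```r
# Inverse Mills ratio for the complete sample, as used in (15):
# phi/Phi for treated units (y1 = 1), -phi/(1 - Phi) for untreated (y1 = 0).
lambda_C <- function(u, y1) {
  ifelse(y1 == 1, dnorm(u) / pnorm(u), -dnorm(u) / (1 - pnorm(u)))
}
lambda_C(0, c(1, 0))   # 0.7978846 -0.7978846
```

Unlike the Tobit-2 correction, this term is defined for every unit, which is why the second-step OLS in (15) runs on the full sample.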
The second option is to represent (15) in the form of switching regressions and to estimate the average treatment effect α as the difference between the intercepts of the two states. Both estimators have unbounded IFs and are not robust. The IF of the former is derived in Appendix A.
The IF of the switching regressions estimator is discussed in the supplementary material of Zhelonkin et al. (2016).
The structure of the IF is similar to that of the Tobit-2 estimator. Hence, we apply the same principles as in Section 4.1 to construct the robust estimator for the ETM. The first step is a robust probit with the score function (11). The second step is a Mallows-type M-estimator with a score function as in (13), where ψ_{c_2}(·) is the Huber function defined by (12) and ω(·, ·) is the weight function defined by (14). The combination of these score functions bounds the IF, making the estimator (locally) robust. A simulation study illustrating the performance of the robust estimator, as well as of the classical two-step and FIML estimators, is presented in Appendix B.

Implementation and description of the functions
The package is written completely in R. We give a description of the functions implemented in the package and discuss some aspects of the implementation.
ssmrob(outcome, selection, data, control = heckitrob.control())

This function is a wrapper which, depending on the type of the arguments selection and outcome, chooses the model and calls the corresponding estimator. The argument outcome is a single formula in the case of Tobit-2, or a list of two formulas in the case of switching regressions. The argument selection is a formula for the selection equation. The argument control defines the accuracy and the robustness tuning parameters.
If "hat" is chosen, then weights on the design of the form √ 1 − h ii are used, where h ii are the diagonal elements of the hat matrix.If "robCov" is chosen, then weights based on the robust Mahalanobis distance of the design matrix are used, where the covariance matrix is estimated by the rob.cov() method from package MASS (Venables and Ripley 2002) using the minimum volume ellipsoid estimator (Rousseeuw 1985).Similarly, if "covMcd" is chosen, the covariance is estimated by the minimum covariance determinant estimator (Rousseeuw and Van Driessen 1999).The arguments tcc and t.c are the tuning constants c 1 and c 2 for the Huber-functions of the first (11) and second (13) stage estimators, respectively.heckitrob(outcome, selection, data, control = heckitrob.control()) This function presents the robust two-stage estimator of the Tobit-2 model.The arguments outcome and selection must be formulas.Note that, if tcc and t.c (tuning constants c 1 and c 2 ) are large and the leverage weights are ones, then the estimator converges to the classical Heckman's two-stage estimator.
heckit5rob(outcome1, outcome2, selection, data, control = heckitrob.control())

Similar to the previous function, but heckit5rob() estimates the switching regressions model. The same set of tuning parameters for the second step estimator is used for both states.
etregrob(outcome, selection, data, control = heckitrob.control())

This function implements the two-step estimator for the ETM, described in Section 4.3. It uses the same control function as the Tobit-2 and Tobit-5 models.
The computation of the asymptotic variance matrices is performed by the functions heck2steprobVcov(), heck5twosteprobVcov(), and etreg2steprobVcov() for the Tobit-2, Tobit-5, and ETM models, respectively. They are used inside the corresponding estimator functions. The output can be obtained by applying the vcov() method to an object of class 'heckitrob', 'heckit5rob', or 'etregrob'.
The package provides the usual generic functions. The function print() prints the estimation results. The function summary() calculates and prints a summary of the estimation with standard errors, t values, and p values, and returns an object of class 'summary.heckitrob', 'summary.heckit5rob', or 'summary.etregrob' for Tobit-2, Tobit-5, and ETM, respectively. The function coef() extracts the estimated coefficients; the function vcov() returns a list of two or three variance-covariance matrices, one for the selection equation and one or two (depending on the model) for the outcome equation. The functions fitted() and residuals() return one vector, or a list containing two vectors, of fitted values or residuals, again depending on the model (one vector for Tobit-2 and ETM, two vectors for Tobit-5). The function model.matrix() has an argument part with "outcome" as default value, which produces the design matrix of the outcome equation for Tobit-2 and ETM and a list of two matrices for Tobit-5. If part = "selection", the design matrix of the selection equation is returned. Finally, the function nobs() returns the number of observations.

Using the ssmrob package
In this section we provide illustrative examples. First, we treat a simulated example for the sample selection model (1)-(2). It illustrates the behavior of the classical and robust estimators when the data generating process (DGP) is known. Second, we examine two empirical applications.
Both have already been analyzed in the literature and are well known. In the example about wage offers there is no robustness problem, and the results of the classical and robust estimation procedures are close. In the second example (ambulatory expenditures), on the contrary, the distributional assumptions are violated and the robust estimates differ from the classical ones. The behavior of the robust and classical estimators for the Tobit-5 and endogenous treatment models is similar, and their discussion can be found in the Appendix.
To start, we load the package ssmrob.
In order to study the robustness of the estimator we introduce a contamination. With probability 0.01 we generate outliers from the degenerate distribution putting mass one at the point x_1 = (−2, −2, −1), x_2 = (−2, −2, −1), (y_1, y_2) = (1, 0). This generates leverage outliers, which also affect the outcome equation since y_1 = 1. This contamination corresponds to the situation where some observations are extremely unlikely to be observed, but somehow appear in the sample.
R> uni <- runif(N, 0, 1)
R> for (i in 1:N) {
+    if (uni[i] <= 0.01) {
+      x1[i, ] <- c(-2, -2, -1)
+      x2[i, ] <- c(-2, -2, -1)
+      y1[i] <- 1
+      y2[i] <- 0
+    }
+  }

The results of the robust estimation are as follows:

R> summary(ssmrob(y1 ~ x1, y2 ~ x2, control = rob.ctrl))

Call:
ssmrob(selection = y1 ~ x1, outcome = y2 ~ x2, control = rob.ctrl)

The robust estimator is stable and does not deviate much from the true values of the parameters. For comparison, we present the output of the classical estimator from the sampleSelection package.
R> summary(selection(y1 ~ x1, y2 ~ x2, method = "2step"))

The parameter estimates are clearly biased. It is important to notice that both estimation stages are affected, and the biases are clearly larger than those of the robust estimator. In practice such a divergence between the estimators should indicate a problem with the data. Moreover, the classical estimate of the inverse Mills ratio coefficient increases to 1.76, while the robust estimate is 0.83, much closer to the true value of 0.7. Note that rho is not a sample correlation: it is not bounded between −1 and 1 and can easily exceed these limits; see Greene (1981) for details. This effect can also occur due to contamination, as we see here with rho equal to 1.145. Hence, one should treat this quantity with caution when using the two-step estimator.
When the exclusion restriction is not available, the influence of the contamination is stronger: the biases are larger, as is the loss of efficiency. One can verify this by repeating the analysis above after erasing the line

R> x2[, 3] <- rnorm(N, 1, 1)

in the data generation step. From this we can conclude that the combination of contamination and the absence of an exclusion restriction makes the classical estimator particularly unstable.
We can also obtain the classical estimates with the ssmrob() function by using large values, e.g., 1000, of the tuning parameters tcc and t.c, and by setting the leverage weights weights.x1 and weights.x2 to "none".

Wage offer data
The first dataset is an example from Wooldridge (2002). We consider Example 17.6 (p. 565) about the wage offer for married women, with potential selectivity bias into the labor force. A Heckman model is used, as presented in Section 2. The dataset consists of 753 observations, of which 325 (43.2%) are truncated. The selection equation defining labor force participation includes the following variables: age, education status (educ), non-wife income (nwifeinc), experience (exper), squared experience (expersq), number of children less than 6 years old (kidslt6), and number of children older than 6 years (kidsge6).

Ambulatory expenditures data
The second example is taken from the book by Cameron and Trivedi (2009). In this case the robust estimates are considerably different from the classical ones. The classical estimator returns an inverse Mills ratio coefficient of −0.48 with a p value of 0.099, which is significant only at the 10% level. Cameron and Trivedi (2009) raised concerns about robustness issues in this case (p. 544) and noted that the conclusion of no selection bias was doubtful. The robust estimator returns β_λ = −0.68 with a p value of 0.009, which is significant at the 1% level. If we do not reject the hypothesis of no selection bias, then OLS should be preferred. Note that if one uses the FIML estimator, then the likelihood ratio test has a p value of 0.36, which would clearly indicate no selection bias. The robust estimator is more reliable and should be preferred in such situations.
We repeat the analysis, but now include the income variable as an exclusion restriction.
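The analysis with the exclusion restriction can be sketched as follows, using the same MEPS2001 variable names as before; income enters only the selection equation:

```r
library(sampleSelection)
library(ssmrob)
data(MEPS2001)

## income appears in the selection equation only (exclusion restriction)
selectEq  <- dambexp ~ age + female + educ + blhisp + totchr + ins + income
outcomeEq <- lnambx ~ age + female + educ + blhisp + totchr + ins

## Classical two-step estimator
summary(selection(selectEq, outcomeEq, data = MEPS2001, method = "2step"))

## Robust two-step estimator
summary(ssmrob(selectEq, outcomeEq, data = MEPS2001,
               control = heckitrob.control(tcc = 1.345)))
```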

Conclusion
The package ssmrob extends the data analyst's toolbox by providing instruments for a robust analysis of specific models with sample selectivity. Robust methods allow one to deal with deviations from the assumed model. Given the well-documented sensitivity of the parametric estimators, practitioners using them should ideally also verify that the classical and robust estimators do not diverge considerably. If they do, then the robust estimators are (typically) more reliable. In any case, such a divergence should be a signal for a thorough examination of the data in order to prevent misleading conclusions.
The package contains estimators for three very popular models. However, the list of models with sample selectivity is larger (see Maddala 1983, for a collection of examples). The two-step robust estimation framework discussed here provides the necessary background for the construction of robust alternatives for other models. Moreover, the estimators discussed in this paper are used as first steps in the evaluation of various treatment effects (Heckman, Tobias, and Vytlacil 2003). The robustness of treatment effect estimators is investigated by Naghi, Váradi, and Zhelonkin (2021), and the implementation of robust alternatives is planned for future extensions of the package.

A. Endogenous treatment model: IF of the two-step estimator
We derive the IF of the classical two-step estimator (probit MLE/OLS) of the endogenous treatment model. The first estimation step is the probit MLE. Let S be its estimating functional. Then its IF is given by

IF(z; S, F) = M(S, F)^{-1} \frac{\varphi(x_1^\top \theta)}{\Phi(x_1^\top \theta)\{1 - \Phi(x_1^\top \theta)\}} \{y_1 - \Phi(x_1^\top \theta)\} x_1,

where M(S, F) = E\left[ \frac{\varphi(x_1^\top \theta)^2}{\Phi(x_1^\top \theta)\{1 - \Phi(x_1^\top \theta)\}} x_1 x_1^\top \right]. The inverse Mills ratio for the complete sample is defined by

\lambda^\ast(x_1^\top \theta) = y_1 \frac{\varphi(x_1^\top \theta)}{\Phi(x_1^\top \theta)} - (1 - y_1) \frac{\varphi(x_1^\top \theta)}{1 - \Phi(x_1^\top \theta)}.

The OLS score function is

\Psi\{z; \theta, (\beta, \alpha, \beta_\lambda)\} = \{y_2 - x_2^\top \beta - \alpha y_1 - \beta_\lambda \lambda^\ast(x_1^\top \theta)\} \, \{x_2^\top, y_1, \lambda^\ast(x_1^\top \theta)\}^\top.

Using the result for two-step M-estimators (Zhelonkin, Genton, and Ronchetti 2012), we get the IF of the second step

IF(z; T, F) = M(\Psi, F)^{-1} \left[ \Psi\{z; S(F), T(F)\} + E\left\{\frac{\partial}{\partial \theta} \Psi\right\} IF(z; S, F) \right],

where M(\Psi, F) = E\left[ \{x_2^\top, y_1, \lambda^\ast(x_1^\top \theta)\}^\top \{x_2^\top, y_1, \lambda^\ast(x_1^\top \theta)\} \right].

B. Endogenous treatment model: Simulation study
We illustrate the performance of the proposed estimator in a simulation study. For the data generating process we use a modification of the setup of Section 6.1. We generate y_{1i} = I(x_{1i}(1, 1, 0.75)^T + e_{1i} > 0), where x_{1i} ~ N{(0, −1, 1)^T; diag(1, 0.25, 1)}. The outcome equation is y_{2i} = x_{2i}(1.5, 1, 0.5)^T + 1.25 y_{1i} + e_{2i}, where x_2 = x_1 if the exclusion restriction is not available; if it is available, then the last explanatory variable of x_2 is generated independently of x_1 from the same distribution. The errors e_1 and e_2 follow a zero mean bivariate normal distribution with σ_1 = σ_2 = 1 and ρ = −0.7. The sample size is 1000 and we repeat the experiment 500 times. For other sample sizes (500 or 5000) the results are similar.
We estimate the model without contamination and with two types of contamination. In the first scenario we have outliers in the control group, i.e., x_1 is contaminated when y_1 = 0. To generate the outliers, we replace the original observations with probability 0.01 by a point mass at (y_1, y_2) = (0, 1) and x_1 = x_2 = (2, 0, 3). In this case the observation is unlikely to be in the control group but appears there. In the second scenario we have outliers in the treatment group, i.e., y_1 = 1. The mechanism of contamination is the same, but the point mass is at (y_1, y_2) = (1, 0) and x_1 = x_2 = (−2, −2, −1). In this case the observation should be in the control group but appears in the treatment group.
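The data generating process above, including the first contamination scenario, can be sketched as follows; this is illustrative helper code written for this description, not code from the package:

```r
library(MASS)   # for mvrnorm
set.seed(123)

N <- 1000
## Errors: zero-mean bivariate normal, sigma1 = sigma2 = 1, rho = -0.7
e <- mvrnorm(N, mu = c(0, 0), Sigma = matrix(c(1, -0.7, -0.7, 1), 2, 2))

## Selection-equation design: mean (0, -1, 1), covariance diag(1, 0.25, 1)
x1 <- cbind(rnorm(N, 0, 1), rnorm(N, -1, 0.5), rnorm(N, 1, 1))
y1 <- as.numeric(x1 %*% c(1, 1, 0.75) + e[, 1] > 0)

## Exclusion restriction: last column of x2 regenerated independently
x2 <- x1
x2[, 3] <- rnorm(N, 1, 1)
y2 <- drop(x2 %*% c(1.5, 1, 0.5)) + 1.25 * y1 + e[, 2]

## Scenario 1: outliers in the control group, probability 0.01
out <- runif(N) < 0.01
y1[out] <- 0
y2[out] <- 1
x1[out, ] <- matrix(c(2, 0, 3), nrow = sum(out), ncol = 3, byrow = TRUE)
x2[out, ] <- x1[out, ]
```

The second scenario is obtained in the same way, with the point mass moved to (y_1, y_2) = (1, 0) and x_1 = x_2 = (−2, −2, −1).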
We compare the classical two-step estimator, the full information maximum likelihood (FIML) estimator, and our robust two-step estimator. In Table 1 and Figure 1 we report the estimation results for the treatment effect parameter α. All three estimators perform well at the model: the biases are close to zero, FIML is the most efficient, and the robust two-step estimator is slightly less efficient than the classical two-step estimator. When the contamination is added, both FIML and the classical two-step estimator break down. They are outperformed by the robust estimator in terms of bias and efficiency. Notice that the biases are negative, which in the context of treatment effects can lead to declaring an effect non-significant when it is present. The robust estimator allows some bias (clearly visible when the exclusion restriction is not available), but it is small relative to that of the classical estimators. In Table 2 (with exclusion restriction) and Table 3 (without exclusion restriction) we report the estimation results for the other parameters. Similarly, all estimators perform well without contamination: FIML is the most efficient, followed by the classical two-step and then the robust two-step estimator, as expected from the theory. Under contamination both FIML and the classical two-step estimator are biased with inflated variances, while the robust estimator remains stable.

C. Tobit-5 model: Example
We generate the data using the same algorithm as for the Tobit-2 model. For comparison, we give below the output of the classical estimator obtained with our package.

Figure 1: Boxplots of the estimated average treatment effect (ATE), parameter α in (4). Left panels correspond to the case with the exclusion restriction; right panels correspond to the case without it.
The package imports sampleSelection (Toomet and Henningsen 2008), robustbase (Todorov and Filzmoser 2009; Maechler et al. 2021), and MASS (Venables and Ripley 2002). The package mvtnorm (Genz et al. 2020) is used for the examples.

The data on ambulatory expenditures (Cameron and Trivedi 2009, pp. 544-547) come from the 2001 Medical Expenditure Panel Survey. The sample size is 3328 observations, with 526 (15.8%) zero expenditures. The set of explanatory variables contains age, gender (female), education status (educ), ethnicity (blhisp), number of chronic diseases (totchr), insurance status (ins), and income, where income is used only in the selection equation as an exclusion restriction variable. The other variables enter both the selection equation and the outcome equation. First we carry out the analysis without the exclusion restriction. We apply the classical estimator and obtain the following output:

R> selectEq <- dambexp ~ age + female + educ + blhisp + totchr + ins
R> outcomeEq <- lnambx ~ age + female + educ + blhisp + totchr + ins
R> summary(selection(selectEq, outcomeEq, data = MEPS2001, method = "2step"))

Table 1 :
Bias and mean-squared error (MSE) of the FIML, classical two-step and robust two-step estimators of the treatment effect parameter α at the model and under two types of contamination, when x is contaminated and the corresponding y_1 = 0 (columns 4 and 5) or y_1 = 1 (columns 6 and 7).

Table 2 :
Bias and mean-squared error (MSE) of the FIML, classical two-step and robust two-step estimators of the endogenous treatment model parameters at the model and under two types of contamination, when x is contaminated and the corresponding y_1 = 0 (columns 4 and 5) or y_1 = 1 (columns 6 and 7). The exclusion restriction is available.

Table 3 :
Bias and mean-squared error (MSE) of the FIML, classical two-step and robust two-step estimators of the endogenous treatment model parameters at the model and under two types of contamination, when x is contaminated and the corresponding y_1 = 0 (columns 4 and 5) or y_1 = 1 (columns 6 and 7). The exclusion restriction is not available.

The estimates are close to the true values of the parameters: the estimated treatment effect α is 1.269, and the estimated parameter of the control function (CIMR, the complete inverse Mills ratio) is −0.734, with a true value equal to −0.7.

R> summary(ssmrob(y1 ~ x1, list(y2 ~ x21, y2 ~ x22), control = rob.ctrl))

Now we add the contamination: with probability 0.01 we introduce leverage outliers in the selection equation, such that they appear in the equation of interest in the second regime. We fit the model on the contaminated sample and obtain the following output: