Discrete Choice Models with Random Parameters in R: The Rchoice Package

Rchoice is a package in R for estimating models with individual heterogeneity for both cross-sectional and panel (longitudinal) data. In particular, the package allows binary, ordinal and count response, as well as continuous and discrete covariates. Individual heterogeneity is modeled by allowing the parameter associated with each observed variable (e.g., its coefficient) to vary randomly across individuals according to some pre-specified distribution. Simulated maximum likelihood method is implemented for the estimation of the moments of the distributions. In addition, functions for plotting the conditional individual-specific coefficients and their confidence interval are provided. This article is a general description of Rchoice and all functionalities are illustrated using real databases.


Introduction
Discrete choice models or qualitative choice models are intended to describe, explain and predict choices between two or more discrete alternatives, such as buying a car or not, choosing between heating systems, or choosing among different occupations among other applications (Train 2009). One of the main feature of these models is that the choice made by each individual can be derived under the assumption of utility-maximization behavior. 1 Essentially, the decision maker chooses the alternative that has the higher satisfaction utility given a set of attributes of the person, the attributes of the alternatives, plus a random term intended to capture the factors that affect utility but are not included. For example, the utility that individual i obtains from alternative j = 1, . . . , J can be written as U ij = V ij + ij , where V ij is the deterministic part of the utility that depends upon observable characteristics and ij is the random part. Individual i chooses the alternative j if U ij > U ik for all k = j. Once the distribution of ij and the nature of the observable output decision are specified, a probabilistic model can be used in order to estimate the parameters of the behavioral process and the probability of choosing some alternative. This formulation known as the random utility model (RUM) has been the standard approach in order to derived the conditional logit Model (McFadden 1974). However, it can also be applied to describe the relation of explanatory variables and the decision when the observed output is binary, ordered or count data. For these types of dependent variables, the traditional binary logit/probit model, ordered probit/logit, and Poisson model for count data can been applied. For a general overview of these models see for example Long (1997) and Winkelmann and Boes (2006).
One important modeling shortcoming of these methods is the inherent assumption of a fixed and unique coefficient for all individuals in the sample, which might not be realistic given that individuals are intrinsically heterogeneous. As an example, consider the effect of the number of young children on scientists' productivity measured as the number of published articles. Since having more kids may imply fewer hours available to work, we would expect, on average, a negative correlation between the number of children and the level of publications. This negative relationship is global, but it neglects the fact that for some scientists having more kids might imply having a more organized life-style and hence increase their productivity (see for example Krapf et al. 2014). Thus, the negative coefficient might hide significant individual heterogeneity resulting in misleading inferences for a subgroup of individuals with a positive coefficient. Similarly, we might assume that the effect is negative for all scientists, but the magnitude of the detrimental effect might vary across the sample. In this case, there exists individual heterogeneity but only in the negative domain of the coefficient. Finally, we might also find that the coefficient is zero (not significant). In such case, we would conclude that having young children is not important for productivity. Nonetheless, this may be due to the fact that heterogeneity among scientists in the sample cancels out positive and negative effects. Under these circumstances a model that allows for individual heterogeneity may be more appropriate.
Thanks to the emergence of more powerful computers and the development of simulationbased models, much of the recent progress in regression models with limited dependent variables has focused on more realistic behavioral models that allow individual heterogeneity in the parameters. The modeling strategy to accommodate such heterogeneity is assuming that coefficients vary randomly across individuals according to some continuous distribution, denoted by g (θ). Since the distribution is unknown and must be specified a priori by the researcher, this type of individual heterogeneity is usually labeled as 'unobserved heterogeneity' in the literature. All the information of the unobserved heterogeneity is capture by the parameters of the distribution θ, which usually represent the mean and variance of the coefficient. The goal is to estimate those parameters in order to get a profile of individual heterogeneity.
Although the random parameter approach has been widely applied to the Multinomial logit model (Train 2009;Hensher and Greene 2003), there are just a few applications for ordinal (Falco et al. 2015;Greene and Hensher 2010b) and count models (Gourieroux et al. 1984;Greene 2007;Anastasopoulos and Mannering 2009). The scarcity of applications to models other than the multinomial logit appears to be driven by the lack of statistical software that enables to estimate random parameter models for binary, ordinal, and count response. To my knowledge, only the commercial programs LIMDEP (Greene 2015a) and NLOGIT (Greene 2015b) are able to estimate those types of models in a concise and flexible manner. The Rchoice (Sarrias 2016) package for R (R Core Team 2016) is intended to make these estimation methods available to the general public and practitioners in a friendly and flexible way.
The aim of this paper is to present the functionalities of Rchoice for estimating ordered, count and binary choice models with random parameters. Its current version 0.3-1 allows estimating cross-sectional and panel (longitudinal) data models and includes also new functionalities to obtain the standard errors of the variance-covariance matrix of the random parameters. All models in Rchoice are estimated using simulated maximum likelihood (SML), which is very flexible for estimating models with a large number of random parameters (Train 2009). An additional characteristic of Rchoice is the ability to retrieve the individual conditional estimates of either the random parameters or compensating variations.
Rchoice is also intended to complement other related packages in R. For example, there exist several packages to estimate binary, count and ordered models with fixed parameters. The glm function (R Core Team 2016) allows to estimate different kind of discrete choice models such as Poisson and binary models. The function probit from the micEcon (Henningsen 2014) package allows to estimate probit model. Moreover, the function polr from the package MASS (Venables and Ripley 2002) allows to estimate ordered probit and logit models. The advantage of Rchoice is that allows more flexibility in the optimization routines, which might improve the convergence speed. Furthermore, Rchoice offers an alternative approach to fitting random effects models in the context of panel data to those already programmed in pglm (Croissant 2013b) and lme4 (Doran et al. 2007). Regarding random parameter models, mlogit (Croissant 2013a), RSGHB (Dumont and Keller 2015) and gmnl (Sarrias and Daziano 2015) allow estimating models with individual heterogeneity in the context of multinomial logit model in a very similar fashion to Rchoice. Other packages such as FlexMix (Grün and Leisch 2008) and PReMiuM (Liverani et al. 2015) allow to estimate models with individual heterogeneity by assuming that g(θ) is discrete or a mixture of distributions. All these packages available in R cover almost all possibilities of estimating discrete choice models with random parameters or individual heterogeneity. The Rchoice is available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=Rchoice. This paper is organized as follows. Section 2 briefly explains the methodology for models with random parameters. Section 3 gives a general description of the functions in Rchoice. Section 4 shows how Rchoice handles the random coefficients. All functionalities of Rchoice using real databases are presented in Section 5. Section 6 explains some computational issues that may arise when estimating random parameter models using SML. Finally, Section 7 concludes.

Models with random parameters
In order to develop and motivate the idea behind random parameter models, consider the following latent process where y * it is a latent (unobserved) process for individual i in period t, x it is a vector of covariates, and it is the error term. Note that the conditional probability density function (PDF) of the latent process f (y * it |x it , β i ) is determined once the nature of the observed y it and the population PDF of it is known: if y it is binary and it is distributed as normal, then the latent process becomes the traditional probit model; if y it is an ordered categorical variable and it is logistically distributed, then the traditional ordered logit model arises. Formally, the PDF for binary, ordered, and Poisson model are, respectively For the binary and ordered models, F (·) represents the cumulative distribution function (CDF) of the error term, which F ( ) = Φ( ) for probit and F ( ) = Λ( ) for logit. 2 For the ordered model, κ j represents the threshold for alternative j = 1, . . . , J −1, such that κ 0 = −∞ and κ J = +∞.
In the structural model given by Equation 1, we allow the vector coefficient β i to be different for each individual in the population. In other words, the marginal effect on the latent dependent variable is individual-specific. Nevertheless, we do not know how these parameters vary across individuals. All we know is that they vary according to the population PDF g(β i |θ), where θ represents the moments of the distribution such as the mean and the variance, which must be estimated. A fully parametric model arises once g(β i |θ) and the distribution of are specified. In Section 4 I detail the distributions allowed by Rchoice.
For simplicity in notation, assume that the coefficient vector is independent normal distributed, so that β k ∼ N (β k , σ 2 k ) for the k-th element in β i . Note that each coefficient can be written as β ki = β k + σ k ω i , where ω i ∼ N (0, 1), or in vector form as β i = β + Lω i , where L is a diagonal matrix that contains the standard deviation parameters, σ k . All the information about the individual heterogeneity for each individual attribute is captured by the standard deviation parameter σ k . If σ k = 0, then the model is reduced to the fixed parameter model, but if it is indeed significant then it would reveal that the relationship between x itk and y it is heterogeneous and focusing just on the central tendency β k alone would veil useful information. It is useful to note that the random effect model is a special case in which only the constant is random (see Section 5.3).

Extensions: observed heterogeneity and correlation
One important and straightforward extension of the random parameter model is to allow the coefficients to be correlated. In this case, L is a lower triangular matrix (also known as the Cholesky matrix), which produces the covariance matrix of the random parameters, LL = Σ. Furthermore, observed heterogeneity can be also accommodated by allowing the parameter heterogeneity to be partly systematic in terms of observed variables. This type of model is also known as a hierarchical model (Greene and Hensher 2010a). Formally, the parameter vector can be written as where Π is a matrix of parameters, s i is a vector of covariates that do not vary across time, and ω ∼ N (0, I). Then, the mean of the parameters is E(β i ) = β + Πs i + LE(ω) = β + Πs i and its covariance is VAR(β i ) = E(Lω(ωL) ) = LE(ωω )L = LIL = LL = Σ.
As an illustration of the formulation above, suppose that we are interested in modeling some latent process, which is explained by two variables. The parameters associated with each observed variable, β 1i and β 2i , are not fixed, but rather they vary across individuals and are correlated. Furthermore, suppose that the means of the random parameters depend upon some observed individual socio-economic variables: S i , B i and C i . Then, the parameters can be written as: or in vector form: Since the mean of the random parameter is a function of observed variables, this specification relaxes the assumption of homogeneity across individuals in terms of the means. The unobserved part accounts for all other individual-specific factors that cannot be captured for the observed variables. Note that the variance-covariance matrix of the random parameters is: , and the conditional mean vector is E(β i |s i ) = β +Πs i , which varies across individuals because of s i . Illustrations with real data for the model with correlated random parameters and with observed heterogeneity are presented in Section 5.2 and Section 5.4, respectively.

Estimation
In this section, I explain the simulated maximum likelihood (SML) procedure used by Rchoice to estimate models with random parameters. For a more complete treatment of SML see for example Gourieroux and Monfort (1997); Lee (1992); Hajivassiliou and Ruud (1994) or Train (2009).
Let y i = {y i1 , y i2 , . . . , y iT i } be the sequence of choices made by individual i. Assuming that individuals are independent across time, the joint PDF, given β i , can be written as where f (y * it |x it , β i ) is given in 2 for each model. Since β i is unobserved, we need to integrate it out of the joint density. The unconditional PDF will be the weighted average of the conditional probability 4 evaluated over all possible values of β, which depends on the parameters of the distribution of β i : The probability in 5 has no close-form solution, that is, it is difficult to integrate out the random parameter and hence it is difficult to perform maximum likelihood (ML) estimation. However, ML estimation may still be possible if we instead use a good approximation P i (θ) of P i (θ) to form a likelihood function.
But, how can we obtain P i (θ)? A good approximation can be obtained by Monte Carlo integration. This procedure provides an alternative to deterministic numerical integration.
Here we can 'simulate' the integration using random draws from the distribution g(β i ). 3 For a given value of the parameters θ, a value of β i is drawn from its distribution. Using this draw of β i , P i (θ) from Equation 5 is calculated. This process is repeated for many draws, and the average over the draws is the simulated probability. Formally, the simulated probability for individual i is where P(y i |X i , β ir ) is the simulated probability for individual i in period t evaluated at the r-th draw of β i , and R is the total number of draws. Given independence over i, the SML estimator is the value θ that maximizes: log P i (θ). Lee (1992) and Hajivassiliou and Ruud (1994) show that under regularity conditions, the SML estimator is consistent and asymptotically normal. When the number of draws, R, rises faster than the square root of the number of observations, the estimator is asymptotically equivalent to the maximum likelihood estimator.
Rchoice uses maxLik package (Henningsen and Toomet 2011) to perform ML and SML estimation procedures. All models with random parameters are estimated using SML, while models with fixed parameters are estimated by ML. Furthermore, Rchoice uses analytical gradients and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm (the default) to iteratively solve the SML. For further details about the computation of SML see Section 6.

An overview of Rchoice package
After installation, Rchoice is loaded by typing:

R> library("Rchoice")
The main function in the package is Rchoice, which enables us to estimate the models. The arguments for this function are the following:

.) NULL
The formula argument is a symbolic description of the model to be estimated and consists of two parts. The first one is reserved for standard variables with fixed and random parameters. The second part is reserved for variables that enter the mean of the random parameters (s i from Equation 3). The usage of the second part is further explained in Section 5.4. The models are specified by the argument family. The models estimated by Rchoice and the corresponding family function are presented in Table 1.
The family for binary and Poisson models comes from stats package (R Core Team 2016), while those for ordered models are included in Rchoice.
The main arguments for the control of the random parameters are ranp, R, haltons, seed, correlation and mvar. The argument ranp is a vector that specifies the distribution of the random parameters. The distributions allowed by Rchoice are presented in Section 4 and illustrated in Section 5.2. R specifies the number of draws used for the simulation of the probabilities. haltons indicates the type of draws used in the simulation procedure and it is further explained in Section 4. The argument seed sets the seed for the pseudo-random draws, being 10 the default. If correlation = TRUE, the correlated random parameter model presented in Section 2.2 is estimated. For an example see Section 5.2. Finally, the argument mvar is a named list that indicates how the variables that modify the mean of the random parameters enter in each random parameter. This feature is illustrated in Section 5.4.
The main arguments for panel or longitudinal data bases are panel and index. If panel = TRUE, then the panel structure of the data is taken into account. index is a string variable indicating the 'id' for individuals in the data. Models with panel structure are presented in Section 5.3.
Finally, the arguments print.init, init.ran and gradient are intended to control the optimization procedure. If print.init = TRUE, then the initial values for the optimization procedure are printed. The argument init.ran sets the initial values for the standard deviation of the random parameters, σ k , ∀k. Numerical instead of analytical gradients are used in the optimization of either ML or SML if gradient = FALSE. Other arguments that control the optimization can be passed to maxLik package. A more detail discussion of these arguments are presented in Section 6.

Drawing from densities
As mentioned in Section 2.1, the distribution of the random parameters g(β i ) can take any shape, but it must be chosen a priori by the researcher. This is a critical step in applied work. As stated by Hensher and Greene (2003), "distributions are essentially arbitrary approximations to the real behavioral profile. The researcher chooses a specific distribution because he has a sense that the "empirical truth" is somewhere in their domain". Therefore, some prior theoretical knowledge about the expected domain of the coefficients may lead to a more appropriate choice of the distributions. The distributions allowed by Rchoice are the following: normal, triangular, uniform, log-normal, truncated normal and Johnson S b . These distributions are explained in further detail below.
Another important issue is that good performance of SML requires a very large number of draws. The main drawback to this approach is that with large samples and complex models, the maximization of SML can be very time consuming. Researchers have gained speed with no degradation in simulation performance through the use of smaller number of Halton draws (Bhat 2001;Train 2000). 4 The Halton sequence is constructed based on a deterministic method that uses prime numbers as its base (see Train 2009, for further details). The idea is that, instead of taking independent random draws, simulation can potentially be improved by selecting evaluation points more systematically and with better coverage (Sándor and Train 2004). Figure 1 shows two sequences of 1,000 Halton draws (Panel A) and two sequences of 1,000 pseudo-random draws (Panel B). The Halton draws are based on prime numbers 2 and 3. It can be observed that the Halton draws have better coverage of the unit square than the pseudo-random draws. This characteristic of the Halton sequence ensures a better coverage of the multidimensional area of integration and reduces the computation time of the SML.
Rchoice handles Halton or pseudo-random draws in the following way. Suppose that there are K random parameters. Then, the K elements of ω ir are drawn as follows. We begin with a K random vector ω ir , that is: 4 There is no consensus about the number of draws that has to be used in applied work. Bhat (2001)'s Monte Carlo analysis found that the precision of the estimated parameters was smaller using 100 Halton draws than 1000 pseudo-random number in the context of mixed logit. However, as suggested by Hensher and Greene (2003), the best approach is to estimate models over a range of draws (e.g., 25, 50, 100, 250, 1000, 2000 draws) and analyze the stability and precision of the parameters. In general, and depending in the application, the results should stabilize after about 500 draws.  • K independent draws from the standard uniform (0, 1) distribution or, • K independent draws from the m-th Halton sequence, where m is the m-th prime number in the sequence of K prime numbers.
An important attribute of the Halton values is that they are also distributed in the (0, 1) interval as shown in Figure 1. Then, the primitive draw (pseudo or Halton draws) is then transformed to the distribution specified by the user as follows: • u k,ir ∼ U (0, 1): primitive draw from Halton or pseudo-random number generator.
Using these two primitive draws, Rchoice creates the random parameters as follows: 1. Normal distribution: where β k and σ k are estimated. Then, β k,i ∼ N (β k , σ 2 k ). Since the domain of the normal distribution is (−∞, +∞), assuming a given coefficient to follow a normal distribution is equivalent to making an a priori assumption that there is a proportion of individuals with a positive coefficient and another proportion with negative ones (see Panel A of Figure 2). For example, the proportion of positive coefficients can be computed as Φ( β k / σ k ). The main disadvantage of the normal distribution is that it has infinite tails, which may result in some individuals having implausible extreme coefficients. If this is the case, the triangular or uniform distribution may be more appropriate.

Triangular distribution:
where 1(·) is an indicator function that takes the value 1 when the statement in brackets is true and 0 when is false; and β k and σ k are the parameters estimated. The PDF of this distribution looks like a triangle, and its main advantage is that it has a definite upper and lower limits resulting in shorter tails than the normal distribution. See for example panel B of Figure 2. 3. Uniform distribution: where β k and σ k are estimated. In this case, the parameter for each individual is equally likely to take on any value in some interval (see panel C of Figure 2). Note also that the uniform distribution with bounds 0 and 1 is very suitable when there exists individual heterogeneity in a dummy variable. For this case, the restriction β k = σ k = 1/2 can be applied.

Log-Normal distribution:
where β k and σ k are estimated. Then, β k,i ∼ log N (β k , σ 2 k ). The support of the lognormal distribution is (0, +∞), therefore the coefficient is allowed to have individual heterogeneity in the positive domain only (see panel D of Figure 2). If the some coefficient is expected a priori to be negative for all the individuals, one can create the negative of the variable and then include this new variable in the estimation. This allows the coefficient to be negative without imposing a sign change in the estimation procedure (Train 2009). 5. Truncated (at 0) normal distribution: where β k and σ k are estimated. Then, β k,i ∼ N (β k , σ 2 k ) with the share below zero massed at zero. This distribution is useful when the researcher has a priori belief that for some individuals the marginal latent effect of the variable is null. See panel F of Figure 2.
6. Johnson's S b distribution: where β k and σ k are estimated. This distribution gives coefficients between 0 and 1, which is also very suitable for dummy variables. If the researcher needs the coefficient to be between 0 and d, then the variable can be multiplied by d before estimation. The main advantage of the Johnson S b is that it can be shaped like log-normal distribution, but with thinner tails below the bound. See panel E of Figure 2.
Rchoice allows to the user to specify both types of random draws by the argument haltons: pseudo-random draws (haltons = NULL) and Halton draws (haltons = NA) as default. For the Halton draws, the default is to use the first K prime numbers starting with 3. Within each series, the first 100 draws are discarded, as the first draws tend to be highly correlated across different draws. The user can also change the prime number and the number of elements dropped for each series. For example, if K = 2, and the user wants to use prime numbers 5 and 31 along with dropping the first 10 draws, he could specify haltons = list("prime" = c(5, 31), "drop" = c(10, 10)).

Applications using Rchoice
In this section I present some of the capabilities of Rchoice. Section 5.1 shows how to estimate Poisson, binary and ordered response models with fixed parameters. Examples of how to estimate models with random parameters are presented in Section 5.2. Section 5.3 explains how to estimate models with random parameters with panel or longitudinal data. In Section 5.4, models with both observed and unobserved heterogeneity are estimated. Finally, Section 5.5 shows how the conditional estimates for each individual can be plotted.

Standard models
In this section, I show the capabilities of Rchoice to estimate Poisson, binary and ordered regression models without random parameters. The main objective of this section is to show how Rchoice can interact with other packages in R.
To show how to estimate Poisson regression models using Rchoice, I will use data on scientific productivity (Long 1990(Long , 1997. We load the data using To see more information about the data, one can use:

R> help(Articles)
The work by Long (1990)  The output shows that the log-likelihood function is estimated using the Newton-Raphson algorithm in 7 iterations. In terms of interpretation, we can say that, being a female scientist decreases the expected number of articles by a factor of 0. where the correction is n/(n − 1).
Rchoice also interacts with the linearHypothesis and deltaMethod functions from car (Fox and Weisberg 2011) and the lrtest and waldtest functions from the lmtest package (Zeileis and Hothorn 2002). For example, we can test the nonlinear hypothesis that the ratio between the phd and ment coefficient is zero 5 , H 0 : phd/ment = 0, by: R> library("car") R> deltaMethod(poisson.fixed, "phd/ment") Estimate SE phd/ment 0.5020048 1.043031 The main argument to estimate other models is family (see Table 1). This provides a convenient way to specify the details of the models used by Rchoice. For probit models, the user should specify family = binomial("probit"), and for logit family = binomial("logit"). In the following example, I use the Workmroz data base to estimate a binary probit model, where the dependent variable lfp equals 1 if wife is in the paid labor force, and 0 otherwise. R> data("Workmroz") R> oprobit <-Rchoice(lfp~k5 + k618 + age + wc + hc + lwg + linc, + data = Workmroz, family = binomial("probit")) R> summary (

Random parameters models with cross-sectional data
The main advantage of Rchoice over other packages is that it allows estimating models with random parameters. In this section, I show how to estimate those kinds of models for count data using cross-sectional data. For binary and ordered models the syntax is the same provided that the family argument is correctly specified.
Continuing with the previous Poisson model, now I will assume that the effect of kid5, phd and ment are not fixed, but rather heterogeneous across the population. Specifically, I will assume that the coefficients for those variables are independent normally distributed, that is, I will not allow correlation among them. In particular, where ω k,ir ∼ N (0, 1). Then, the Poisson model with random parameter is estimated using the following syntax: R> poisson.ran <-Rchoice(art~fem + mar + kid5 + phd + ment, + data = Articles, family = poisson, + ranp = c(kid5 = "n", phd = "n", ment = "n")) It is important to discuss the arguments for the Rchoice function. First, the argument ranp indicates which variables are random in the formula and their distributions. In this case, I have specified that all of them are normal distributed using "n". The shorthand for the remaining distributions that can be used are: 7 • Log-Normal = "ln", • Truncated Normal = "cn", • Uniform = "u", • Triangular = "t", • Johnson's S b = "sb".
The number of draws are not specified. Therefore, Rchoice will set R = 40 as default. The user can change this by changing the R argument. The type of draws are Halton draws as default, but if users want pseudo-random draws they can specify haltons = NULL (see Section 4). Call: Rchoice(formula = art~fem + mar + kid5 + phd + ment, data = Articles, family = poisson, ranp = c(kid5 = "n", phd = "n", ment = "n"), method = "bfgs", iterlim = 2000) The 24% of the individuals have a positive coefficient for kid5. In other words, for about 76% of PhD students, having children less than 6 years old reduces their productivity. Note also that the mean coefficient for phd is 0 (not significant). This is due to the fact that the unobserved heterogeneity among scientists in the sample cancel out positive and negative effects. These observations are not possible with a Poisson regression with fixed parameters.

R> Ttable
Calls: model 1: Rchoice(formula = art~fem + mar + kid5 + phd + ment, data = Articles, family = poisson, ranp = c(kid5 = "n", phd = "n", ment = "n"), method = "bfgs", iterlim = 2000) model 2: Rchoice(formula = art~fem + mar + kid5 + phd + ment, data = Articles, family = poisson, ranp = c(kid5 = "u", phd = "t", ment = "cn"), method = "bfgs", iterlim The previous model specifies the coefficients to be independently distributed while one would expect correlation. For example, the effect of the prestige of PhD department could be positive correlated with the number of publications by mentor. Now, I estimate the model poisson.ran, but assuming that the random parameters are correlated: β i ∼ N (β, Σ) for a general matrix Σ. The main argument for this model is correlation = TRUE: R> poissonc.ran <-Rchoice(art~fem + mar + kid5 + phd + ment, + data = Articles, ranp = c(kid5 = "n", phd = "n", ment = "n"), + family = poisson, correlation = TRUE) R> summary(poissonc.ran) Call: Rchoice(formula = art~fem + mar + kid5 + phd + ment, data = Articles, family = poisson, ranp = c(kid5 = "n", phd = "n", ment = "n"), correlation = TRUE, method = "bfgs", iterlim = 2000) The estimation took: 0h:0m:57s The output prints the mean of the random parameters along with the lower-triangular Cholesky factor L. We can extract the variance-covariance matrix, Σ, and the correlation matrix of the random parameters using S3 method vcov in the following way: R> vcov(poissonc.ran, what = "ranp", type = "cov") The argument what indicates which covariance matrix has to be extracted. The default is coefficient, and the vcov behaves as usual. If what = "ranp" the covariance matrix of the random parameters is returned as default. The argument type indicates what matrix of the random parameters should be returned. Among other things, the output shows that ment is negatively related with phd and kid5. Specifically, we can see that the correlation between phd and ment is around -0.4. We can also test whether the variances of the random parameters are significant using Delta Method. To do so, we can use the se = TRUE argument in the vcov function, which is a wrapper of deltamethod function from msm (Jackson 2011) package: R> vcov(poissonc.ran, what = "ranp", type = "cov", se = TRUE) Elements of the variance-covariance matrix Note that the covariance between ment and phd even though is negative is not significant. To get the standard errors of the standard deviations for the random parameters, we might use: R> vcov(poissonc.ran, what = "ranp", type = "sd", se = TRUE)

Random parameters models with panel data
The current version of this package also handles panel data by estimating random effect (RE) models. Suppose that the i observation y i has unconditional joint density f (y * i |X i , α i , β) and the random effect has density α i ∼ g(α i |θ), where g(α i |θ) does not depend on observables. Thus, the unconditional joint density for the i-th observation is This model can be estimated using the package pglm (Croissant 2013b), which uses Gauss-Hermite quadrature to approximate the integration in the probability. However, note that this model can be seen as a random parameter model where the constant is random. Therefore, users might estimate a simple RE model with Rchoice by typing ranp = (constant = "n").
In the following example I will estimate a binary probit model with RE and random parameters using Unions database from the pglm package.
R> data("Unions", package = "pglm") R> Unions$lwage <-log(Unions$wage) The model is estimated using the following syntax: R> union.ran <-Rchoice(union~age + exper + rural + lwage, + data = head (Unions, 2000), family = binomial("probit"), + ranp = c(constant = "n", lwage = "t"), R = 10, panel = TRUE, + index = "id", print.init = TRUE) In this case, I assumed that lwage is distributed as triangular, while the constant is assumed to be normal distributed. This is the same as assuming that α i ∼ N (0, σ 2 α ). There are two main arguments for the panel estimation. The argument panel = TRUE indicates that the data is a panel. This implies that users should indicate the variable that corresponds to the 'id' of the individuals in the index argument. In the this example, the 'id' is given by the variable "id".
Finally, the argument print.init = TRUE indicates that the initial values used by Rchoice will be displayed. This argument is useful if one wants to see the magnitude of the initial values for the parameters. Call: Rchoice(formula = union~age + exper + rural + lwage, data = head (Unions, 2000), family = binomial("probit"), ranp = c(constant = "n", lwage = "t"), R = 10, panel = TRUE, index = "id", print.init = TRUE, method = "bfgs", iterlim = 2000) Frequencies The results indicate that σ α =1.2 and is significant. Furthermore, the finding of a significant standard deviation yet insignificant mean for lwage attests to the existence of substantial heterogeneity; positive and negative coefficients in the sample compensate for each other, such that the coefficient of the mean is not significant.

Random parameter model with observed heterogeneity
In this section I illustrate how to estimate a Poisson random parameter model with observed heterogeneity. In the following example, I assume that there exists not only unobserved heterogeneity in the coefficients for phd and ment, but also observed heterogeneity in the mean of the coefficients. Specifically, I assume that the coefficients vary across individuals according to: β kid5,ir = β kid5 + σ kid5 ω kid5,ir β phd,ir = β phd + π phd,fem fem + σ phd ω phd,ir β ment,ir = β ment + π ment,fem fem + π ment,phd phd + σ ment ω ment,ir The formulation above implies that, for example, the ment's coefficient (or the marginal effect on latent productivity) varies by gender and phd.

Plotting conditional means
It is important to note that the estimates of the model parameters provide the unconditional estimates of the parameter vector, but we can derive an individual-specific conditional estimator (see Train 2009;Greene 2012). The estimator of the conditional mean of the distribution of the random parameters, conditioned on the person-specific data, is: 8 where: β ir = β + Πs i + Lω ir .
Note that these are not actual estimates of β i , but are estimates of the conditional mean of the distribution of the random parameters (Greene et al. 2014). We can also estimate the standard deviation of this distribution by estimating: and then computing the square root of the estimated variance, With the estimates of the conditional mean and conditional variance, we can then compute the limits of an interval that resembles a confidence interval as the mean plus and minus two estimated standard deviation. This will construct an interval that contains at least 95 percent of the conditional distribution of β i (Greene 2012).
Rchoice allows plotting the histogram and kernel density of conditional means of random parameters using the function plot. For instance, the kernel of the conditional mean of β lwage,i for union.ran model can be obtained by typing: The graph produced using this command is visualized in Figure 3. We may also plot the individual confidence interval for the conditional means for the first 20 individuals using: R> plot(union.ran, par = "lwage", ind = TRUE, id = 1:20, col = "blue") R> bi.wage <-effect.Rchoice(union.ran, par = "lwage", effect = "ce") The argument effect is a string indicating what conditional mean should be computed. In this example, we are requiring the conditional expectation of the individual coefficients effect = "ce". effect.Rchoice returns two list. The first one with the estimated conditional means for all the individuals, and the second one with the estimated standard errors of the conditional means. One might also get the individual's conditional mean of the compensating variations using both plot and effect.Rchoice. Compensating variation is the variation in two regressors such that the latent variable does not change. Let x il denote the l-th elment in x i and β l the corresponding parameter, and let m index the m-th elements in both vectors x i and β, respectively. Now consider a change in x il and x im at the same time, such that y * i = 0. This requires where β im is a random coefficient. This ratio (without the minus sign) is computed or plotted if the argument effect = "cv" in any of the two functions. The argument par is the variable whose coefficient goes in the numerator (β im ), and the argument wrt is a string indicated which coefficient goes in the denominator (β l ). Note that since β im is random, the ratio of the coefficient is random and its distribution follows from the joint distribution of the coefficients.

Computational issues
The estimated parameters in any model estimated using SML depend on at least four factors. The first of them is the random number seed. If the random draws used in the estimation are pseudo-random draws, instead of Halton draws, then the parameter might change if the seed is changed. As default, Rchoice sets seed = 10. Second, the number of draws are very important for the asymptotic properties of the SML (see Section 2.3).
In the examples provided in this document, I used just a few draws for time restrictions. However, in real applied work, researchers must use a greater number of draws, especially if pseudo-random draws are used in the estimation. A good starting point is 500 draws (see for example Bhat 2001;Sándor and Train 2004).
Starting values are also crucial to achieve convergence, specially for the correlated random parameter model with deterministic variation in the mean. Note that version 0.1 of Rchoice sets the initial values at 0, while versions 0.2 and 0.3 set them at 0.1 as default. Users might change this with the argument init.ran or setting a new vector of starting values with the argument start.
Finally, the choice of optimization method is another important factor that influences model convergence. As explained before, Rchoice uses the function maxLik in order to maximize the log-likelihood function, which permits to estimate models by the Newton-Raphson (NR), BGFS and Berndt-Hall-Hall-Hausman (BHHH) procedures. I have found that BFGS often works best in the sense it is more likely to converge than the alternative algorithm. That is the main reason why this algorithm is set as default. If this method does not converge, users might re-estimate the model using another algorithm. For example, the user might type method = "nr" for the NR method or method = "bhhh" for the BHHH method in the Rchoice function. Rchoice uses the numerical Hessian if method = "nr" and the model is estimated with random parameters, thus it can be very slow compared to the other methods. It is worth mentioning that BHHH is generally faster than the other procedures, but it can failure if the variables have very different scale. The larger the ratio between the largest standard deviation and the smallest standard deviation, the more problems the user will have with the estimation procedure. Given this fact, users should check the variables and re-scale or recode them if necessary. Furthermore, it is also convenient to use the argument print.level = 2, for example, to trace the optimization procedure in real time. For more information about arguments for optimization type help(maxLik).

Conclusions
The Rchoice package contains most of the newly developed models in binary, count and ordered models with random parameters. The current version of Rchoice handles cross-sectional an panel data with observed and unobserved individual heterogeneity. Allowing parameter values to vary across the population according to some pre-specified distribution overcomes the problem of having a fixed-representative coefficient for all individuals. The distribution supported by Rchoice are normal, log-normal, uniform, truncated normal, triangular distribution and Johnson's S b . It also allows the user to choose between Halton draws and pseudo-random numbers and correlated parameters.
The Rchoice package intends to make available those estimation methods to the general public and practitioners in a friendly and flexible way. In future versions, I expect to add functions that allow estimating latent class models. I also hope to include functions for marginal effects.