Multinomial Logit Models with Continuous and Discrete Individual Heterogeneity in R : The gmnl Package

This paper introduces the package gmnl in R for estimation of multinomial logit models with unobserved heterogeneity across individuals for cross-sectional and panel (longitudinal) data. Unobserved heterogeneity is modeled by allowing the parameters to vary randomly over individuals according to a continuous, discrete, or mixture distribution, which must be chosen a priori by the researcher. In particular, the models supported by gmnl are the multinomial or conditional logit, the mixed multinomial logit, the scale heterogeneity multinomial logit, the generalized multinomial logit, the latent class logit, and the mixed-mixed multinomial logit. These models are estimated using either the Maximum Likelihood Estimator or the Maximum Simulated Likelihood Estimator. This article describes and illustrates with real databases all functionalities of gmnl, including the derivation of individual conditional estimates of both the random parameters and willingness-to-pay measures.


Introduction
Modeling individual choices has been a very important avenue of research in diverse fields such as marketing, transportation, political science, and environmental, health, and urban economics.In all these areas the most widely used method to model choice among mutually exclusive alternatives has been the Conditional or Multinomial Logit model (MNL) (McFadden 1974), which belongs to the family of Random Utility Maximization (RUM) models.The main advantage of the MNL model has been its simplicity in terms of both estimation and interpretation of the resulting choice probabilities and elasticities.On the one hand, the MNL has a closed-form choice probability and a likelihood function that is globally concave.MNL estimation is thus straightforward using the Maximum Likelihood Estimator (MLE).On the other hand, it has been recognized that MNL not only imposes constant competition across alternatives -as a consequence of the independence of irrelevant alternatives (IIA) property -but also lacks the flexibility to allow for individual-specific preferences.
With the advent of more powerful computers and the improvement of simulation-aided inference in the last decades, researchers are no longer constrained to use models with closed-form solutions that may lead to unrealistic behavioral specifications.In fact, much of recent work focuses on extending MNL to allow for random-parameter models that accommodate unobserved preference heterogeneity.
The most popular MNL extension is the Mixed Logit Model (MIXL).MIXL allows parameters to vary randomly over individuals by assuming some continuous heterogeneity distribution a priori while keeping the MNL assumption that the error term is independent and identically distributed (i.i.d) extreme value type 1 (McFadden and Train 2000;Train 2009;Hensher and Greene 2003).MIXL is a very flexible model that can approximate any RUM model, and it does not exhibit the IIA property encountered in MNL.Furthermore, using the parametric heterogeneity distribution that describes how preferences vary in the population it is possible to derive conditional estimates of the parameters at the individual-level.
Latent Class (LC) discrete choice models offer an alternative to MIXL by replacing the continuous distribution assumption with a discrete distribution in which preference heterogeneity is captured by membership in distinct classes or segments (Boxall and Adamowicz 2002;Greene and Hensher 2003;Shen 2009).The standard LC specification is useful if the assumption of preference homogeneity holds within segments.In effect, all individuals in a given class have the same parameters (fixed parameters within a class), but the parameters vary across classes (heterogeneity across classes).Bujosa, Riera, and Hicks (2010), and more recently Greene and Hensher (2013), have extended the LC model to allow for unobserved heterogeneity both within and across segments.The cross-group variation is modeled with the LC model, whereas the within-group variation is modeled as a continuous variation.An important characteristic of this model is that it nests both the MIXL and LC model in a double mixture specification.This model is also known as Mixed-Mixed Logit (MM-MNL) (Keane and Wasi 2013).
Other researchers have focused on MNL extensions that allow for a more flexible representation of heteroskedasticity.For example, Fiebig, Keane, Louviere, and Wasi (2010) proposed two new models, namely the Scale Heterogeneity (S-MNL) model and the Generalized Multinomial Logit (G-MNL) model.S-MNL extends the MNL by letting the scale of errors vary across individuals (via a parametric specification of heteroskedasticity), whereas the G-MNL nests the S-MNL, MIXL, and MNL models.For a discussion of confounding effects between scale and preference heterogeneity, see Hess and Rose (2012) and Hess and Stathopoulos (2013).
There exist different packages in R (R Core Team 2015) in order to estimate models with multinomial responses.Some packages that allow the estimation of Multinomial Logit model with fixed parameters are mlogit (Croissant 2012), RSGHB (Dumont, Keller, and Carpenter 2014), mnlogit (Hasan, Zhiyu, and Mahani 2015), the function multinom function from nnet package (Venables and Ripley 2002), VGAM (Yee 2010), and package bayesm (Rossi. 2012).The Multinomial Probit (MNP) model is fitted in MNP (Imai and Dyk 2005) and mlogit package.Models with random parameters are supported by mlogit and RSGHB.In terms of models with latent classes bayesm, RSGHB, flexmix (Leisch 2004), and poLCA (Linzer and Lewis 2011) offer alternative estimation procedures.Among all these packages, mlogit is probably the most user-friendly R package for the estimation of models with multinomial responses.Table 1 presents a more complete overview of the models supported by each package and the estimation procedure used to estimate the parameters.
The gmnl package (Sarrias and Daziano 2015) is intended to consolidate in a single R package the whole range of discrete choice models with random parameters for the use of researchers MIXL generalizes the MNL model by allowing the preference or taste parameters to be different for each individual (McFadden and Train 2000;Train 2009).MIXL is basically a random parameter logit model with continuous heterogeneity distributions.The random utility of person i for alternative j and for choice occasion t is: where x ijt is a K × 1 vector of observed alternative attributes; ijt is the idiosyncratic error term or taste shock, and is i.i.d.extreme value type 1; the parameter vector β i is unobserved for each i and is assumed to vary in the population following the continuous density f (β i |θ), where θ are the parameters of this distribution.This mixing distribution can in principle take any shape.For example, when assuming that the parameters are distributed multivariate normal, β i ∼ MVN(β, Σ), the vector β i can be re-written as: where η i ∼ N (0, I), and L is the lower-triangular Cholesky factor of Σ such that LL = VAR(β i ) = Σ.If the off-diagonal elements of L are zero, then the parameters are independently normally distributed.Observed heterogeneity (deterministic taste variations) can also be accommodated in the random parameters by including individual-specific covariates (see for example Greene 2012).Specifically, the vector of random coefficients is: where z i is a set of M characteristics of individual i that influence the mean of the preference parameters; and Π is a K × M is a matrix of additional parameters.
Unlike the MIXL model, LC uses a discrete mixing distribution, where individual i belongs to class q with probability w iq , i.e.,: where q w iq = 1 and w iq > 0. The discrete mixing distribution (or class assignment probability) is unknown to the analyst.The most widely used formulation for w iq is the semiparametric Multinomial Logit format (Greene and Hensher 2003;Shen 2009): where h i denotes a set of socio-economic characteristics that determine assignment to classes.The parameters of the first class are normalized to zero for identification of the model.Note that one could omit any socio-economic covariate as a determinant of the class assignment probability.Under this scenario, the class probabilities simply become: where γ q is a constant (Scarpa and Thiene 2005).
Let y ijt = 1 if individual i chooses j on occasion t, and 0 otherwise.Then, the unconditional probabilities of the sequence of choices by individual i for MIXL and LC are respectively given by: Both MIXL and LC are widely used in practice to accommodate preference heterogeneity across respondents.As discussed above, in the MIXL approach parameters are assumed to vary across the population according to some prespecified statistical distribution in a way that defines continuous segmentation of preferences.In the LC model a discrete number of separate classes or segments, each with different fixed parameters, recover preference heterogeneity.In addition to differentiation in terms of continuous versus discrete consumer segments, there exist further differences between MIXL and LC.For example, compared with the MIXL approach, the LC model has the advantage of being "relatively simple, reasonably plausible and statistically testable" (Shen 2009).In addition, because LC is a semiparametric specification that depends only on the prespecified number of classes, it avoids misspecification problems in the distribution of individual heterogeneity.In fact, the main disadvantage of MIXL is that the researcher has to choose the distribution of the random parameters a priori.Nevertheless, LC is less flexible than MIXL precisely because the parameters in each class are fixed.Another important difference between these two models is the estimation procedure.The MIXL requires the use of the maximum simulated likelihood estimator -which can be very costly in terms of computational time -but no simulation is required for LC. 1 gmnl implements the Maximum Likelihood Estimator for both LC and MIXL, using analytical expressions for the appropriate gradient.
To take advantage of the benefits of both models, recent empirical papers have derived a mixture of LC and MIXL.This double-mixture model is known as the 'Mixed-Mixed' Logit model (MM-MNL) (Keane and Wasi 2013).Consider the case where the heterogeneity distribution is generalized to a discrete mixture of 1 For an empirical comparison between these two models, see for example Greene and Hensher (2003), Shen (2009) and Hess, Ben-Akiva, Gopinath, and Walker (2011).
2 Train (2008) refers to this model as 'discrete mixture of continuous distributions', whereas Greene and Hensher (2013) label it 'LC-MIXL'.multivariate normal distributions.In this case we have: q ) with probability w iq for q = 1, ..., Q. (3) The appeal of using a Gaussian mixture for the heterogeneity distribution is that any continuous distribution can be approximated by a discrete mixture of normal distributions (Train 2008).Note that the MM-MNL with only one class is equivalent to the MIXL model.Furthermore, if Σ q → 0 for all q, the model in Equation 3 becomes a LC-MNL model (Bujosa et al. 2010;Keane and Wasi 2013).Thus, MM-MNL nests both MIXL and LC.
The choice probabilities for the MM-MNL are given by: where f (β i ) = N (β q , Σ q ).gmnl implements the maximum likelihood estimator for the MM-MNL parameters with the Monte-Carlo approximation of this choice probability and the analytical expression of the gradient.

Generalized Multinomial Logit model
where σ i is the individual-specific scale of the idiosyncratic error term, and γ is a scalar parameter that controls how the variance of residual taste heterogeneity Lη i varies with scale.To better understand this specification, it is useful to note that differing sub-models arise when some structural parameters in the G-MNL model are constrained: • G-MNL-I: If γ = 1, then β i = σ i β + Lη i .In this model, the residual taste heterogeneity is independent of the scaling of β.
• G-MNL-II: If γ = 0, then In this model, the residual taste heterogeneity is proportional to σ i .
• S-MNL: If VAR(η i ) = 0, then β i = σ i β.As pointed out by Fiebig et al. (2010), this model is observationally equivalent to the particular type of heterogeneity in which the parameters increase or decrease proportionally across individuals by the scaling factor σ i .S-MNL provides a more parsimonious representation of continuous heterogeneity than MIXL, because βσ i is a simpler object than β + Lη i (Fiebig et al. 2010).
• MIXL: • MNL: Fiebig et al. (2010) note that some restrictions need to be considered to estimate the G-MNL model.First, the domain of σ i should be the positive real line.A positive scale parameter is ensured by assuming that σ i is distributed log-normal with standard deviation τ and mean σ Fiebig et al. (2010): where υ ∼ N (0, 1).Fiebig et al. (2010) also note that when τ is too large, numerical problems arise for extreme draws of υ i .To avoid this numerical issue, the authors suggest to use a truncated normal distribution for υ i with truncation at ±2, so that υ ∼ T N [−2, +2].Greene and Hensher (2010) found that constraining υ i at −1.96 and +1.96 maintains the smoothness of the estimator.Specifically, the authors used υ ir = Φ −1 (0.025 + 0.95u ir ), where u ir is a draw from the standard uniform distribution.gmnl allows the user to choose between these two ways of drawing from υ i , using the argument typeR (see Section 3.1).
Note that the parameters σ, τ , and β are not separately identified.Fiebig et al. (2010) suggest that one can normalize the mean σ by setting: Another important issue in G-MNL is the domain of γ.Initially, Fiebig et al. (2010) imposed γ ∈ [0, 1].To constrain γ in this interval, the authors used the logistic transformation: and estimated γ * .However, Keane and Wasi (2013) pointed out that both γ < 0 and γ > 1 still have meaningful behavioral interpretations.Thus, these authors estimate γ directly.gmnl allows to estimate γ using both procedures.
Finally, one can allow the mean of the scale to differ across individuals by including individualspecific characteristics.In this case the scale parameter can be written as: where s i is a vector of attributes of individual i.
In terms of computation, all models, except for the LC and the MNL model, are estimated in gmnl using the maximum simulated likelihood estimator (MSLE).For a complete derivation of the asymptotic properties of the MSLE and a more comprehensive review of how to implement this estimator, see Train (2009), Lee (1992), Gourieroux and Monfort (1997) or Hajivassiliou and Ruud (1986).

A general overview of gmnl
All the models are estimated using the function gmnl, which has the following arguments.
• data: The data used for estimation, which must be of class mlogit.data.This is further explained in Section 3.2.
• model: A character string indicating which model will be estimated.The options are given in Table 2.
Options for "model" Model Table 2: Models supported by gmnl.
• start: A vector of starting values provided by the user.
• ranp: This is a named vector whose names are the random parameters and values the continuous distributions.This argument is valid only if the MIXL, G-MNL or MM-MIXL model is estimated.The distributions supported by gmnl are presented in Table 3.
It is worth mentioning that given how the random parameters of the G-MNL model are constructed (see Equation 4), the distributions allowed when model = "gmnl" are the normal, uniform, and triangular.Similarly, when the model is estimated with correlated random parameters, only the normal distribution and its transformations -log-normal and truncated normal-are allowed.
• R: The number of draws used to simulate the probability if ranp is not NULL.
• Q: The number of classes for LC-MNL or MM-MNL model.

Shorthands Distributions
"n" Normal distribution "ln" Log-normal distribution "cn" Truncated (at zero) normal distribution "t" Triangular distribution "u" Uniform distribution "sb" Johnson S b distribution Table 3: Continuous distributions supported by gmnl.
• haltons: This argument is relevant if ranp is not NULL.If haltons = NULL, pseudorandom draws are used instead of Halton sequences.If haltons = NA, the first K primes are used to generate the Halton draws, where K is the number of random parameters, and 15 of the initial sequence of elements are dropped.Otherwise, haltons should be a list with elements prime and drop.For a further explanation of Halton draws see Train (2009).
• mvar: This argument is only valid if the model has observed heterogeneity in the mean of the random parameters.A more detailed discussion of this argument is presented in Section 3.5.
• seed: This is the seed for the random number generator if haltons = NULL.
• correlation: This argument is valid if ranp is not NULL.If TRUE, the correlation across random parameters is taken into account.
• bound.err:This argument is only relevant if the S-MNL or G-MNL model is estimated.It indicates at which values the draws for the scale parameter σ i are truncated.By default bound.err= 2. Therefore, a truncated normal distribution with truncation at ±2 for υ i is used.
• panel: If TRUE a panel (longitudinal) data model is estimated.
• init.tau:Initial value for the τ parameter in Equation 4. The default is 0.1.
• init.gamma:Initial value for the γ parameter in Equation 4. The default is 0.1.
• notscale: This argument is relevant if the model being estimated is either S-MNL or G-MNL.It is a vector indicating which variables should not be scaled.See Section 3.4 for an illustration.
• print.init:If TRUE, then the initial values for the optimization procedure are displayed.
• gradient: If TRUE, then the analytical gradient is used for the optimization procedure.Otherwise, numerical approximation for the gradient is used.
• typeR: If TRUE, truncated normal draws are used for the scale parameter.In this case, the function rtruncnorm of truncnorm (Trautmann, Steuer, Mersmann, and Bornkamp 2014) is used.If typeR = FALSE the procedure suggested by Greene and Hensher (2010) is used.See Section 2.2.

Format of data
The function mlogit.datafrom mlogit is very useful to handle multinomial data formats.gmnl thus uses the same class of data for estimation.If the user forgets to set the data in the mlogit.dataformat, gmnl will give an error message and the estimation process will stop.
For illustration purposes, we use the This data base is in "long" format and can be transformed into the structure needed by gmnl using the mlogit.data in the following way: library("mlogit") TM <-mlogit.data(TravelMode,choice = "choice", shape = "long", alt.levels = c("air", "train", "bus", "car")) The argument choice indicates the choice made by the individuals; shape specifies the original format of the data; and alt.levels is a character vector that contains the name of the alternatives.We show how to transform other kinds of data in the examples below.For a more complete treatment of the data using mlogit.datafunction see Croissant (2012).

Formula interface
The specification of Multinomial Logit models using gmnl is similar to that of mlogit and mnlogit.In particular, we use the R package Formula (Zeileis and Croissant 2010), which is able to handle multi-part formulae.
Consider the TravelMode data and suppose that we want to estimate a Multinomial Logit model where the variables wait and vcost are alternative-specific variables with a generic coefficient β; income is an individual-specific variable with an alternative specific coefficient γ j ; and the variables travel and gcost are alternative-specific variables with an alternative specific coefficient δ j .This is done using the following 3-part formula: By default, the alternative-specific constants (ASC) for each alternative are included.They can be omitted by adding +0 or -1 in the second part of the formula.For example: Some parts may be omitted when there is no ambiguity.For instance, a model with only individual specific variables can be specified as follows: Similarly, a Conditional Logit model, that is, a model with alternative-specific variables with a generic coefficient β, can be specified using either of the following formula objects: For other models, such as the MIXL, S-MNL, LC-MNL and MM-MNL model, we require to use the fourth and fifth part of the formula.As explained in Section 2.1, gmnl allows incorporating observed heterogeneity in the mean of the random parameters.This can be achieved by including individual-specific characteristics (income and size) in the fourth part of the formula: and then use the mvar argument to indicate how these two variables modify the mean of the random parameters.For a more complete example see Section 3.5.
The fifth part of the formula is reserved for either models with heterogeneity in the scale parameter or models with latent classes.For example, an S-MNL or G-MNL model where the scale varies across individuals by individual-specific characteristics can be specified as follows: The same formulation can be used if a model with latent classes is estimated and both income and size determine the class assignment.

Estimating S-MNL models
In this example, we estimate an S-MNL model using the TravelMode data where the ASCs are fixed and not scaled.Fiebig et al. (2010) found that in a model where all attributes are scaled -including the ASCs -the estimates often show a explosive behavior and the model actually produces a worse fit.The basic syntax for estimation is the following: ## ## The following variables are not scaled: ## [1] "train:(intercept)" "bus:(intercept)" "car:(intercept)" ## Estimating SMNL model The component | 1 in the formula means that the model is fitted using ASCs for the J − 1 alternatives.The main argument in the model is model = "smnl", which indicates to the function that the user wants to estimate the S-MNL model (without random parameters).R = 30 indicates that 30 draws are used to simulate the probabilities.Another important argument in this example is notscale.This is a vector that indicates which variables will not be scaled (1 = not scaled and 0 = scaled).Since the ASCs are always the first variables entering in the model (if they are specified using | 1 in the second part of formula) and only J −1 = 3 ASCs are created, notscale = c(1, 1, 1, rep(0, 4)) implies that the constants will not be scaled.summary(smnl.nh)## ## Model estimated on: Thu Jun 04 13:14:54 2015 The results report the point estimates for each variable and τ , which represents the standard deviation of σ i .The output also gives useful estimation information.The model is estimated using the BFGS procedure.Other optimization procedures such as the BHHH and Newton Raphson (NR) can be called using the argument method passed to the maxLik function. 3nother important point is that the number of observations reported by gmnl corresponds to N/J if cross-sectional data is used, or N × T /J if panel data (repeated choice situations) is used.Finally, it is always important to check all the details in the estimation output.In our example, the output informs us that the convergence was achieved successfully.
In the next example, we allow the scale to differ across individuals according to their income.Basically, we assume that: The syntax is very similar to our previous example, with minor changes in the formula argument: The following variables are not scaled: ## [1] "train:(intercept)" "bus:(intercept)" "car:(intercept)" ## Estimating SMNL model The fifth part of the formula is reserved for individual-specific variables that affect scale.In this example, we specify that the variable income and no constant are included in σ i .,  ## 1, 1, 0, 0, 0, 0), method = "bfgs") The results are very similar to those of the previous example.All the parameters for the variables that enter in the scale are preceded by the string het.Thus, the coefficient het.income corresponds to δ income .
Suppose now that we want to test the null hypothesis H 0 : δ income = 0.This test can be performed using the function waldtest or lrtest from the package lmtest (Zeileis and Hothorn

Estimating MIXL models
In the following examples we show how to estimate MIXL models using gmnl.The package mlogit is very efficient in estimating MIXL models.However, one advantage of using gmnl is the inclusion of individual-specific variables to explain the mean of the random parameters (see Equation 2).Other important expansions include the possibility of producing point and interval estimates at the individual level, and the consideration of Johnson S b heterogeneity distributions.
If we assume that the coefficients of travel and wait vary across individuals according to: where η 1i is triangular and η 2i ∼ N (0, 1), the corresponding MIXL model is estimated by typing: mixl.hier<-gmnl(choice ~vcost + gcost + travel + wait | 1 | 0 | income + size -1, data = TM, model = "mixl", ranp = c(travel = "t", wait = "n"), mvar = list(travel = c("income","size"), wait = c("income")), R = 50, haltons = list("primes"= c(2, 17), "drop" = rep(19, 2))) ## Estimating MIXL model The argument model = "mixl" indicates that the MIXL model will be estimated.The distribution of the random coefficients are specified by the argument ranp (See Table 3 for shorthands of other continuous distributions allowed by gmnl).Note also that the fourth part of the formula is reserved for all the variables that enter the mean of the random parameters.The argument mvar indicates which variables enter each specific random parameter.For example travel = c("income","size") indicates that the mean of the travel coefficient varies according to income and size.Finally, haltons indicates the prime numbers used for the Halton draws and how many elements to drop for each random parameter.2)), mvar = list(travel = c("income", "size"), wait = c("income")), ## method = "bfgs") The output shows the estimates in the following order: fixed parameters, mean of the random parameters, effect of the variables that affect the mean of the random parameters, and finally the standard deviation/spread of the random parameters.Note that travel.incomecorresponds to π 11 , travel.sizecorresponds to π 12 , and wait.incomecorresponds to π 21 .
We now estimate a correlated random parameter model using the Electricity data from the mlogit package, which is a panel dataset.Given time compilation restrictions, in this example we will use just a subsample of this database (subset = 1:3000).The user may want to use the whole sample to reproduce this case study.data("Electricity", package = "mlogit") Electr <-mlogit.data(Electricity,id.var = "id", choice = "choice", varying = 3:26, shape = "wide", sep = "") In this example, two arguments are especially relevant in the gmnl function.First, panel = TRUE indicates that the data is a panel.When using panel data the user needs to specify a variable in the id.var argument of the mlogit.datafunction.Second, to estimate correlated random parameters correlation = TRUE needs to be indicated in the gmnl function.The syntax is the following:
By default, the initial values for the mean of the random parameters come from an MNL, and the standard deviations or spread are set at 0.1.However, the starting values from an MNL model may not be the best guess, since the G-MNL model is not globally concave.The best starting values for a G-MNL model with correlated parameters might be: 1) G-MNL with uncorrelated parameters, 2) MIXL with correlated parameters, or 3) GMNL with correlated parameters with γ fixed at 0. One can first get these initial parameters and then use the start argument of gmnl to indicate the vector of appropriate starting values (see Section 3.8 for an example of how to use the start argument).

Estimating LC and MM-MNL models
The next example shows how an LC model with two classes can be estimated: Note that for the LC model, one needs to specify at least a constant in the fifth part of the formula.If the class assignment w iq is also determined by socio-economic characteristics, those covariates can also be included in the fifth part.The LC model is estimated by typing model = "lc", and the prespecified number of classes is indicated with the argument Q.
summary(Elec.lc)## ## Model estimated on: Thu Jun 04 13:16:39 2015   The specification is similar to that of the LC model, but we now allow the parameters in each class to be normally distributed using the argument ranp.It is worth mentioning that the number of iterations required for this model is greater than that for previous models.For that reason we have set the maximum of iterations at 500 using the argument iterlim.

Willingness-to-pay space
Willingness-to-pay space models reparameterize the parameter space in such a way that the marginal WTP for each attribute is directly estimated rather than the marginal utility (preference parameters).Train and Weeks (2005) and Sonnier, Ainslie, and Otter (2007) extend the WTP-space approach by allowing random parameters (and, consequently, random willingnessto-pay measures).The WTP-space approach is very appealing because it allows the analyst to estimate the WTP heterogeneity distribution directly (Scarpa, Thiene, and Train 2008).
To illustrate the concept of WTP space, and how it can be estimated using gmnl, we will first show the case without random parameters.The standard procedure to derive willingness-topay measures is to start with a model in preference space.For example, consider the simple conditional logit model,  The argument wrt = "pf" indicates that all the parameters should be divided by the parameter of the attribute pf.
Another way to estimate the same WTP coefficients is to use the S-MNL model.We need first to compute the negative of the price coefficient using the mlogit.datafunction: ElectrO <-mlogit.data(Electricity,id = "id", choice = "choice", varying = 3:26, shape = "wide", sep = "", opposite = c("pf")) Next, we need to set the values for the price parameter and τ at 1 and 0, respectively.The fixed argument is used to set these values.Each value in the output represents the WTP estimates for each respective attribute.Note that these WTP estimates are the same as those obtained using the wtp.gmnl function.The price coefficient can be obtained using the following transformation:

Individual parameters
Similarly to the Rchoice package (Sarrias 2015), gmnl also allows the analyst to get the conditional estimates for each individual in the sample (see for example Train 2009;Greene 2012).Using Bayes' theorem we obtain where f (β i |y i , X i , θ) is the distribution of the individual parameters β i conditional on the observed sequence of choices, and g(β i |θ) is the unconditional distribution.The conditional expectation of β i is thus given by: The expectation in Equation 5gives us the conditional mean of the distribution of the random parameters, which can also be interpreted as the posterior distribution of the individual parameters.Simulators for this conditional expectation are presented below, for both the continuous and discrete cases: t f (y it |x it , β ir , θ) βi = E [β i |y i , X i , θ q ] = Q q=1 β q w iq t f (y it |x it , β ir , θ q ) Q q=1 t f (y it |x it , β q , θ q ) In order to construct the confidence interval for βi , we can derive an estimator of the conditional variance from the point estimates as follows (Greene 2012, chap. 15): An approximate normal-based 95% confidence interval can be then constructed as βi ± 1.96 × V 1/2 i .
The gmnl package uses these formulae to compute the individual parameters along with their 95% confidence interval.As an illustration, we can plot the kernel density of the individuals' conditional mean for the loc parameter using Elec.cormodel by typing the following: plot(Elec.cor,par = "loc", effect = "ce", type = "density", col = "grey") Figure 1 displays the distribution of the individuals' conditional mean for the parameter of loc.The gray area gives us the proportion of individuals with a positive conditional mean.The 95% confidence interval of the conditional mean for the first 30 individuals is shown in Figure 2, which was plotted using the following syntax:5 plot(Elec.cor,par = "loc", effect = "ce", ind = TRUE, id = 1:30) Another important function in gmnl is effect.gmnl.This function allows the users to get the individuals' conditional mean of both the preference parameters and the willingness-to-pay measures.For example, one can plot the individual conditional mean and standard errors (Figure 2) by typing: bi.loc <-effect.gmnl(Elec.cor,par = "loc", effect = "ce") summary(bi.loc$mean)The conditional mean of the willingness to pay for "loc" (wtp = β i,loc /β pf ) for all individuals in the sample can be obtained using: wtp.loc <-effect.gmnl(Elec.cor,par = "loc", effect = "wtp", wrt = "pf") Note that the argument par is the variable whose parameter goes in the numerator, and the argument wrt is a string indicating which parameter goes in the denominator.

Conclusions
The package gmnl implements the maximum likelihood estimator of random parameter logit models with heterogeneity distributions that can be continuous, discrete, or discrete-continuous mixtures.In this paper we have shown how gmnl can fit several extensions to the standard multinomial logit model, including the recently derived mixed-mixed multinomial logit (MM-MNL).To our knowledge there is no other widely available statistical package that has implemented the maximum simulated likelihood estimator of MM-MNL, and we want to highlight that gmnl makes use of analytical expressions of the gradient.gmnl is also the first implementation in R of the estimator of the scale heterogeneity multinomial logit (S-MNL), the generalized multinomial logit (G-MNL), and the latent class logit (LC).Whereas there are other packages in R for the estimation of MIXL, gmnl allows for the inclusion of individualspecific variables to explain the mean of the random parameters for a mixture of deterministic taste variations and unobserved preference heterogeneity.In addition, gmnl also implements Johnson S b heterogeneity distributions.
Another key post-estimation functionality of gmnl that we have illustrated in this paper is the derivation of conditional point and interval estimates of either the random parameters or willingness-to-pay measures at the individual level.Random parameter models can be used to make inference on the preference parameters of each individual in the sample, but most packages that estimate MIXL models lack a command to produce individual-level estimates.gmnl is able to compute individual parameters for all generalized logit models that are implemented in the package, including G-MNL, MIXL, and LC.
Additional functionalities that we expect to incorporate in the future are the consideration of different choice sets for each individual and the implementation of different methods for the construction of confidence intervals of willingness-to-pay measures.

2
Bujosa et al. (2010), andGreene and Hensher (2013) developed this MM-MNL model by extending the LC model to allow for random parameters within each class.

Figure 2 :
Figure 1: Kernel density of the individuals' conditional mean.

Table 1 :
Packages available in R for models with multinomial response.and practitioners.It shares similar functionalities with mlogit and mnlogit in terms of the formula interface.Furthermore, since gmnl is able to estimate G-MNL models, it also allows the user to estimate models in willingness-to-pay space with a minimal extra reformulation.Our package also provides the ability of constructing the conditional estimates for the individual parameters and willingness-to-pay.gmnl is available from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/package=gmnl.
The paper is organized as follows: Section 2 presents a brief overview of the models supported by gmnl.Section 3 discusses the functionalities of the package.Section 4 concludes the paper.
The estimates from sd.cl.cl to sd.seas.seasare the elements of the lower triangular matrix L. If the user is interested in the standard errors of the variance-covariance matrix of the random parameters LL = Σ or the standard deviations, the S3 function vcov can be used for finding these elements.The syntax for both cases is the following: 4 vcov(Elec.cor,what= 'ranp', type = 'cov', se = 'true')The correlation matrix of the random parameters can be recovered using the following syntax: vcov(Elec.cor,what = 'ranp', type = 'cor') To estimate the willingness to pay for each attribute, one needs to divide each attribute parameter by that of price pf.This ratio can be easily retrieved using the function wtp.gmnl: