cdfquantreg: An R Package for CDF-Quantile Regression

The CDF-quantile family of two-parameter distributions with support (0, 1), described in Smithson and Merkle (2014) and recently elaborated by Smithson and Shou (2017), considerably expands the variety of distributions available for modeling random variables on the unit interval. This family is especially useful for modeling quantiles, and sometimes outperforms other distributions on the unit interval. The distributions are very tractable, with a location and a dispersion parameter and explicit probability density functions, cumulative distribution functions, and quantiles. They enable a wide variety of quantile regression models with predictors for the location and dispersion parameters, and simple interpretations of those parameters. The R package cdfquantreg (Shou and Smithson 2019) presented in this paper requires at least R 3.2.0 and includes 36 distributions from the CDF-quantile family. Separate submodels may be specified for the location and for the dispersion parameters, with different or overlapping sets of predictors in each. The package offers maximum likelihood, Bayesian MCMC, and bootstrap estimation methods. Model diagnostics, including the gradient, three types of residuals, and the dfbeta influence measures, are available for evaluating models. The package also provides pseudo-random generators for all of its distributions. Many of its functions and their usage have forms familiar to R users, and the documentation is extensive. We also present a SAS macro for generalized linear models using the CDF-quantile family that includes many of the same capabilities as the cdfquantreg package. The paper provides examples of applications to real data sets.


Introduction
The most popular two-parameter distribution for modeling random variables on the (0, 1) interval is the beta distribution (e.g., Ferrari and Cribari-Neto 2004; Smithson and Verkuilen 2006). Less commonly used are the Kumaraswamy (1980), lambda, logit-logistic, and simplex distributions.

The background to the CDF-quantile family begins with Tadikamalla and Johnson (1982) replacing the standard normal distribution in Johnson's (1949) SB distribution with the standard logistic distribution, thus producing the logit-logistic distribution. A natural extension of this approach is to employ other transformations from (0, 1) to either the real line or the nonnegative half of the real line, and to expand the variety of standard distributions as well. Alzaatreh, Lee, and Famoye (2013) defined the so-called T-X family as follows:

$$G(x) = \int_a^{W(S(x))} r(t)\,dt, \tag{1}$$

where r(t) is the probability density function (PDF) of a random variable T ∈ [a, b], for −∞ ≤ a < b ≤ ∞; S(x) is the CDF of a random variable X; and W(S(x)) satisfies three properties:
1. W(S(x)) ∈ [a, b],
2. W(S(x)) is differentiable and monotonically non-decreasing, and
3. W(S(x)) → a as x → −∞ and W(S(x)) → b as x → ∞.

The cumulative distribution function (CDF) in Equation 1 can be written in terms of the CDF of T, denoted R:

$$G(x) = R(W(S(x))). \tag{2}$$

3. The family enables a wide variety of quantile regression models for random variables on the (0, 1) interval with predictors for both the location and scale parameters.
4. The family can model four distinct varieties of distribution shapes, with different skew and kurtosis coverage from the beta or the Kumaraswamy.
5. Explicit quantiles render random generation of variates straightforward.
All of the CDF-quantile distributions in the cdfquantreg package have this form.
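For reference, the form referred to here can be sketched as follows. This is a reconstruction from the PDF and quantile expressions used later in the paper, and it assumes that both F and H are standard CDFs supported on the whole real line:

```latex
G(x, \mu, \sigma) = F\!\left[\frac{H^{-1}(x) - \mu}{\sigma}\right],
\qquad
G^{-1}(\gamma, \mu, \sigma) = H\!\left[\mu + \sigma F^{-1}(\gamma)\right],
\qquad x, \gamma \in (0, 1),\ \sigma > 0 .
```

Under this reading, H^{-1} maps the unit interval onto the real line, the result is shifted by µ and scaled by σ, and F maps it back to a probability.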
Explications and examples of members with other combinations of $D_1$ and $D_2$ are provided in Smithson and Shou (2017). The distributions in Equation 3 are related to the T-X family in Equation 2 by setting F = R and $U[H^{-1}(S(x)), \mu, \sigma] = W(S(x))$, with H differentiable and x ∈ (0, 1). This reduces to Equation 3 by restricting S to be the uniform CDF, so that S(x) = x. One way of interpreting the CDF-quantile family is that G redistributes X by first transforming it via $H^{-1}$ to a random variable whose domain is $D_1$; that variable is then rescaled via µ and σ and fitted by a location-scale distribution, F. Because X and G both share the unit interval as their domain, G provides a redistribution of X. Indeed, if H and F are identical, and µ = 0 and σ = 1, then G simply returns X. Lemonte and Bazán (2015) also describe a family of distributions with support (0, 1) as an extension of the Johnson SB family. Theirs is a special case of the CDF-quantile family with H restricted to the logistic CDF (details available from the first author). Lemonte and Bazán (2015) do not cite Alzaatreh et al. (2013) or any related papers, so their research efforts seem to have been unconnected with that group of researchers and with Smithson and his co-authors.
If F is invertible, then the distribution has an explicit quantile. If G is differentiable, then it has an explicit PDF. All of the distributions in this package share both properties. There is a relation between pairs of these distributions in which F and H exchange roles. These pairs are "quantile-duals" of one another in the sense that one's CDF is the other's quantile, with the appropriate parameterization. We name these distributions with the nomenclature F-H (e.g., Cauchit-logistic and logit-Cauchy). Other relevant properties shared by the members of the family included in this package are as follows (proofs are provided in Smithson and Shou 2017):
1. The PDFs g(x, µ, σ) are self-dual in this respect: g(x, µ, σ) = g(1 − x, −µ, σ).
2. When H = F the distribution includes the uniform distribution as a special case. Otherwise, all distributions are symmetrical at x = 1/2 when µ = 0.
3. The median is a function solely of the location parameter µ.
4. Simple functions of the median and other particular quantiles yield expressions solely in the scale parameter σ.
5. The likelihood function is explicit, as are the gradient and Hessian.

Example distribution
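An example is the arcsinh-Cauchy distribution. This distribution employs the hyperbolic arcsinh CDF

$$F(z) = \frac{1}{e^{-\sinh^{-1}(z)} + 1} \tag{5}$$

and the Cauchy CDF

$$H(z) = \frac{\tan^{-1}(z)}{\pi} + \frac{1}{2}. \tag{6}$$

Inverting H and applying it and F to the equation above for G(x, µ, σ) gives the CDF

$$G(x, \mu, \sigma) = \frac{1}{\exp\left[-\sinh^{-1}\left(\dfrac{\tan(\pi(x - 1/2)) - \mu}{\sigma}\right)\right] + 1},$$

and differentiating it gives the PDF

$$g(x, \mu, \sigma) = \frac{\pi \csc^2(\pi x)}{2\sigma\sqrt{1 + u^2}\,\left(1 + \sqrt{1 + u^2}\right)}, \qquad u = \frac{\tan(\pi(x - 1/2)) - \mu}{\sigma}.$$

Inverting F and making the appropriate substitutions gives the quantile:

$$G^{-1}(\gamma, \mu, \sigma) = \frac{1}{\pi}\tan^{-1}\left[\mu + \sigma\sinh\left(\log\frac{\gamma}{1 - \gamma}\right)\right] + \frac{1}{2}.$$

As described in property 3 above in Section 1.1, the median is a function of just the location parameter µ:

$$Q(0.5) = \frac{\tan^{-1}(\mu)}{\pi} + \frac{1}{2},$$

and therefore

$$\mu = \tan\left[\pi\left(Q(0.5) - 1/2\right)\right],$$

where Q(γ) denotes the quantile at γ. Likewise, as in property 4, the scale parameter σ is a simple function of selected quantiles. For instance, because sinh(log 3) = 4/3, the quartiles satisfy

$$\tan\left[\pi\left(Q(0.75) - 1/2\right)\right] - \tan\left[\pi\left(Q(0.25) - 1/2\right)\right] = \frac{8\sigma}{3},$$

so that

$$\sigma = \frac{3}{8}\left\{\tan\left[\pi\left(Q(0.75) - 1/2\right)\right] - \tan\left[\pi\left(Q(0.25) - 1/2\right)\right]\right\}.$$

It also can be shown that this distribution has a finite density in the limits at 0 and 1, unlike most distributions on the unit interval:

$$\lim_{x \to 0} g(x, \mu, \sigma) = \lim_{x \to 1} g(x, \mu, \sigma) = \frac{\pi\sigma}{2}.$$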
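The properties of the arcsinh-Cauchy distribution can be checked numerically. The following is a minimal base-R sketch, not the package's implementation: the helper names (`p_ac`, `q_ac`, `d_ac`, and so on) are hypothetical, the CDF is assumed to have the form F((H^{-1}(x) − µ)/σ), and the PDF is written in one algebraically convenient form derived from that CDF.

```r
# Base-R sketch of the arcsinh-Cauchy distribution (all helper names are hypothetical).
F_asinh      <- function(z) 1 / (exp(-asinh(z)) + 1)   # parent CDF F
F_asinh_inv  <- function(p) sinh(log(p / (1 - p)))     # inverse of F
H_cauchy     <- function(z) atan(z) / pi + 0.5         # child CDF H (standard Cauchy)
H_cauchy_inv <- function(x) tan(pi * (x - 0.5))        # inverse of H

# CDF: G(x, mu, sigma) = F((H^{-1}(x) - mu) / sigma)
p_ac <- function(x, mu, sigma) F_asinh((H_cauchy_inv(x) - mu) / sigma)

# Quantile: G^{-1}(gamma, mu, sigma) = H(mu + sigma * F^{-1}(gamma))
q_ac <- function(gamma, mu, sigma) H_cauchy(mu + sigma * F_asinh_inv(gamma))

# PDF: derivative of the CDF with respect to x
d_ac <- function(x, mu, sigma) {
  u <- (H_cauchy_inv(x) - mu) / sigma
  r <- sqrt(1 + u^2)
  pi / (sin(pi * x)^2 * 2 * sigma * r * (1 + r))
}

mu <- 0.4
sigma <- 0.8
gam <- c(0.1, 0.25, 0.5, 0.75, 0.9)

# The CDF and the quantile function are inverses of each other.
roundtrip <- p_ac(q_ac(gam, mu, sigma), mu, sigma)

# Property 3: mu is recovered from the median alone.
mu_from_median <- tan(pi * (q_ac(0.5, mu, sigma) - 0.5))

# Property 4: sigma is recovered from the quartiles.
sigma_from_quartiles <- 3 * (H_cauchy_inv(q_ac(0.75, mu, sigma)) -
                             H_cauchy_inv(q_ac(0.25, mu, sigma))) / 8

# The PDF agrees with a central-difference derivative of the CDF.
h <- 1e-6
num_deriv <- (p_ac(0.3 + h, mu, sigma) - p_ac(0.3 - h, mu, sigma)) / (2 * h)

# The density near the boundary approaches pi * sigma / 2.
dens_near_zero <- d_ac(1e-6, mu, sigma)
```

Comparing `dens_near_zero` with `pi * sigma / 2` illustrates the finite boundary density, and `num_deriv` against `d_ac(0.3, mu, sigma)` gives an independent check of the PDF.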

CDF-quantile regression
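Maximum likelihood inference can be performed for this distribution family, and for all members where the gradient also has an explicit expression. For all of the distributions in this package, the PDF may be written as

$$g(x, \mu, \sigma) = \frac{q(x)}{\sigma}\, f\!\left(\frac{H^{-1}(x) - \mu}{\sigma}\right),$$

where f is the PDF corresponding to F, and q is the quantile density function corresponding to $H^{-1}$, that is, $q(x) = dH^{-1}(x)/dx$. Differentiating the log of g with respect to µ and σ drops q and gives the following, with $u = (H^{-1}(x) - \mu)/\sigma$ and $f'$ the derivative of f with respect to its argument:

$$\frac{\partial \log g}{\partial \mu} = -\frac{f'(u)}{\sigma f(u)}, \qquad \frac{\partial \log g}{\partial \sigma} = -\frac{1}{\sigma}\left[1 + \frac{u f'(u)}{f(u)}\right].$$

Thus, the requirement for an explicit gradient is that f is differentiable with respect to µ and σ. The resulting regression model in our framework has two submodels, the "location submodel" for µ and the "dispersion submodel" for σ:

$$L_\mu(\mu) = x^\top\beta, \qquad L_\sigma(\sigma) = z^\top\delta,$$

where x and z are vectors of predictors and β and δ are vectors of coefficients. The location submodel link function $L_\mu$ is the identity, and the dispersion submodel link function $L_\sigma$ is the log. The sets of predictors in x and z may or may not overlap.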
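To make the two-submodel likelihood concrete, the following is a minimal base-R sketch of maximum likelihood estimation for one member of the family, the logit-logistic distribution (F and H both logistic). It is an illustration under stated assumptions rather than the package's estimation routine: the data are simulated, the variable names are hypothetical, and `optim()` with numerical derivatives stands in for whatever optimizer cdfquantreg uses internally.

```r
set.seed(1)
n <- 2000
x <- runif(n)               # predictor in the location submodel
z <- rbinom(n, 1, 0.5)      # predictor in the dispersion submodel
beta_true  <- c(0.5, 1.0)   # mu = beta0 + beta1 * x          (identity link)
delta_true <- c(-0.5, 0.5)  # log(sigma) = delta0 + delta1 * z (log link)

mu    <- beta_true[1] + beta_true[2] * x
sigma <- exp(delta_true[1] + delta_true[2] * z)

# Inverse-CDF sampling: y = H(mu + sigma * F^{-1}(u)), with F = H = logistic CDF.
y <- plogis(mu + sigma * qlogis(runif(n)))

# Negative log-likelihood: log g = log f(u) - log(sigma) + log q(y),
# where q(y) = 1 / (y * (1 - y)) is the quantile density of the logit.
negloglik <- function(par) {
  mu_i    <- par[1] + par[2] * x
  sigma_i <- exp(par[3] + par[4] * z)
  u <- (qlogis(y) - mu_i) / sigma_i
  -sum(dlogis(u, log = TRUE) - log(sigma_i) - log(y) - log(1 - y))
}

fit <- optim(c(0, 0, 0, 0), negloglik, method = "BFGS", hessian = TRUE)
est <- fit$par                          # (beta0, beta1, delta0, delta1)
se  <- sqrt(diag(solve(fit$hessian)))   # approximate standard errors
```

The estimates in `est` should land near `c(beta_true, delta_true)`, and the inverse Hessian gives the usual approximate standard errors.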
Note that because this is maximum likelihood estimation, the parameter estimates may be seen as determining the shape of both the PDF (or CDF) and the quantile function G −1 . Although this was illustrated in our example distribution, a more general understanding of this point is available by rewriting Equation 3 by assigning a quantile value G(y, µ(x), σ(z)) = γ, and rearranging it as in Equation 18.
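A rearrangement consistent with this description can be sketched as follows. It is a reconstruction that assumes the CDF has the form F[(H^{-1}(y) − µ)/σ], and the equation number of the original is not reproduced:

```latex
\begin{aligned}
\gamma &= F\!\left[\frac{H^{-1}(y) - \mu(x)}{\sigma(z)}\right], \\
F^{-1}(\gamma) &= \frac{H^{-1}(y) - \mu(x)}{\sigma(z)}, \\
H^{-1}(y) &= \mu(x) + \sigma(z)\,F^{-1}(\gamma), \\
y &= H\!\left[\mu(x) + \sigma(z)\,F^{-1}(\gamma)\right].
\end{aligned}
```

In the third line the parameters enter linearly on the $H^{-1}$ scale, and setting γ = 0.5 with $F^{-1}(0.5) = 0$ leaves only µ(x).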
Note that the third line in this rearrangement also shows why this model is a generalized linear model (GLM); the parameters are in a linear equation with the inverses of F and H acting as link functions. Furthermore, if F is a symmetric distribution around 0 (which it always is in cdfquantreg) then F −1 (0.5) = 0. So the median always is a function only of µ and the location submodel provides a GLM for the median. Appropriate combinations with the dispersion submodel provide GLMs for other quantiles. Thus, the predictors in the location submodel influence the locations of all quantiles, whereas predictors in the dispersion submodel influence all of them except the median.
As detailed in Smithson and Shou (2017), the maximum likelihood estimators are well-behaved for the CDF-quantile family. Their sampling distributions closely approximate the normal distribution for modest sample sizes, and they are relatively stable in the presence of outliers. In these respects they compare favorably with the beta and Kumaraswamy distributions.

The R package cdfquantreg
Despite the fact that doubly-bounded variables are commonplace in many scientific disciplines, relatively few distributions currently are available in software for fitting models to such data. Beta distribution models for modeling both location and dispersion parameters are available in the well-developed betareg package (Cribari-Neto and Zeileis 2010; Grün, Kosmidis, and Zeileis 2012) and the Stata (StataCorp. 2015) package betafit (Buis, Cox, and Jenkins 2003). The R package gamlss (Rigby and Stasinopoulos 2005; Stasinopoulos and Rigby 2008) also provides beta distribution models, including 0- and 1-inflated models. A recent addition to these resources is the simplexreg package (Zhang, Qiu, and Shi 2016), which also estimates two-parameter models, using the simplex distribution.
The cdfquantreg package expands the variety of available distributions for such models by including 36 members of the two-parameter CDF-quantile family of distributions for modeling random variables on the (0, 1) interval. Separate submodels may be specified for the location and for the dispersion parameters, with different or overlapping sets of predictors in each. The package offers maximum likelihood, Bayesian MCMC, and bootstrap estimation methods, on a par with the aforementioned packages, except for the regression tree and mixture distribution models available in betareg. Package cdfquantreg is available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=cdfquantreg.

General usage
The main function is cdfquantreg(), and the basic usage is:

cdfquantreg(formula, fd, sd, data, start, control)

The formula has two parts, using the Formula package (Zeileis and Croissant 2010). An example formula is y ~ X1 + X2 | Z1 + Z2, where y on the left of ~ is the dependent variable to be modeled. The two parts separated by | on the right-hand side include predictors or independent variables in the model. The X1 + X2 on the left-hand side of | specifies the location submodel, which is linked to the location parameter of y, µ. The Z1 + Z2 on the right-hand side of | specifies the dispersion submodel, which is linked to the dispersion parameter of y, log(σ). A null model part (i.e., an intercept-only model) can be represented by 1. For example, y ~ 1 | Z1 + Z2 specifies the location submodel as a null model.
The dependent variable y must have numerical values within the (0, 1) interval. For variables that are on different scales, the function scaleTR() can be used to linearly transform the variable into the (0, 1) interval. The user can specify the extent to which the variable's values are pushed away from the boundaries (i.e., 0 or 1). scaleTR() employs the method suggested by Smithson and Verkuilen (2006), applying a linear transformation that maps the values into the open interval (0, 1). It first transforms the values from their original scale by taking y' = (y − a)/(b − a), where a is the lowest possible value of that variable and b is the highest possible value of that variable. Next, it compresses the range to avoid zeros and ones by taking y'' = (y'(N − 1) + c)/N, where N is the sample size and c is the compression parameter. The smaller the value of c, the closer the extreme values are to 0 or 1, and the greater the impact they may have on the estimation of the dispersion parameter in the model. Typically, c is chosen to be 1/2.
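The two-step transformation just described can be sketched in a few lines of base R. The function name `squeeze01` and its argument names are hypothetical and are not the interface of scaleTR(); the sketch only reproduces the arithmetic given in the text.

```r
# Hypothetical helper illustrating the two-step transformation (not scaleTR() itself).
squeeze01 <- function(y, a = min(y), b = max(y), comp = 0.5) {
  N  <- length(y)
  y1 <- (y - a) / (b - a)         # step 1: rescale from [a, b] to [0, 1]
  (y1 * (N - 1) + comp) / N       # step 2: compress into the open interval (0, 1)
}

ratings <- c(0, 2, 5, 7, 10)      # e.g., responses on a 0-10 scale
scaled  <- squeeze01(ratings, a = 0, b = 10)
```

With five observations and a compression parameter of 1/2, the endpoints 0 and 10 map to 0.1 and 0.9, so no value sits on the boundary.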
A CDF-quantile distribution can be specified by using the arguments fd and sd, where fd refers to the parent distribution while sd specifies the child distribution. Specifications of available distributions using fd and sd are available via the cdfqrFamily(shape = "all") command. The help display also describes the shape of each distribution. We briefly review them here; for more details see Smithson and Shou (2017). We have found that these distributions exhibit four kinds of characteristic shapes, which can be described by their density's tail behavior at the boundaries of the (0, 1) interval.

The first group is typified by the logit-logistic distribution, and thus is labeled "logit-logistic". Its tail behavior depends on σ and is governed by some value of s > 0 (depending on the distribution): for s < 1, the densities at 0 and 1 go to 0; for s > 1, both densities go to infinity; and for s = 1, the density at 0 is exp(−µ) while at 1 it is exp(µ).

The second group is the "bimodal" distributions, which are capable of having two modes within the (0, 1) interval because the densities at 0 and 1 always are 0.

The third group is labeled "finite-tailed" distributions, which tend to be unimodal but with finite, identical densities at 0 and 1 that are a function of σ. The arcsinh-Cauchy distribution described in Section 1.2 is a member of this subfamily, with a density of πσ/2 at 0 and 1.

Finally, the fourth group contains the "trimodal" distributions, which have one mode in the interior of (0, 1) and modes at 0 and 1 with infinite densities at the limit. Examples of the four subfamilies are displayed in Figure 1 (reproduced from Smithson and Shou 2017, Figures 1-4).
As shown in Table 1, data takes the data set, including both the dependent and predictor variables, in either matrix or data frame format. The variable (or corresponding column) names in the data object should correspond to the variable names in the formula. In some cases, users might use start to specify starting values for the mean and dispersion to improve convergence. By default, the empirical median and σ of y are used as the starting values for the intercepts of the location and dispersion submodels. Finally, the control argument can be used to specify a list of parameters in the optimization procedure, such as the maximal number of iterations.

Argument | Description
formula | A formula object, with the dependent variable (DV) on the left of a ~ operator, and predictors on the right. For the part on the right of '~', the location and dispersion submodels are separated by '|'. So y ~ X1 + X2 | Z1 + Z2 specifies that the DV is y, X1 + X2 specifies the location submodel, and Z1 + Z2 specifies the predictors in the dispersion submodel.
fd, sd | Arguments that specify the distribution. fd indicates the parent distribution, while sd indicates the child distribution.
data | Specifies the data object, which is in a data.frame format. The columns are variables while the rows are observations.
start | User-specified starting values for estimation of the distribution mean and dispersion.
control | Other specifications for the estimation.

The function cdfquantreg() returns a model object of S3 class 'cdfqr'. Model outputs can be extracted via common generic functions, which are described in Table 2.

Function | Description
summary(), print() | Display the main results of the model fit, including model coefficients, log-likelihood, and gradient.
coef() | Extracts the coefficient values of the model object.
deviance(), logLik() | Extract common model fit indices.

Examples
We present five examples applying the cdfquantreg package to real data analyses. The first example compares the performance of the CDF-quantile model with a beta-regression model. The second example illustrates the incorporation of a continuous predictor into a CDF-quantile model with a categorical moderator. The third example demonstrates how multivariate models of dependent observations on the (0, 1) interval can be constructed using the CDF-quantile family and copulas. The fourth applies CDF-quantile distributions to modeling data from an experiment, and this is reprised using the SAS (SAS Institute Inc. 2013) macro in Appendix A. The fifth example demonstrates the use of the cdfquantreg package's random-sampling capabilities for a simulation problem.

Interpretation of uncertainty phrases in the IPCC report
Budescu, Broomell, and Por (2009) conducted an experimental study of lay interpretations of verbal phrases such as "likely" and "unlikely" to describe uncertainties. They used 13 sentences from the Intergovernmental Panel on Climate Change (IPCC) report (e.g., "The Greenland ice sheet and other Arctic ice fields likely contributed no more than 4 m of the observed sea level rise."). They asked participants to provide lower, "best", and upper numerical estimates of the probabilities to which they believed each sentence referred.
The IPCC data set includes the lower, best, and upper estimates for the phrases "likely" and "unlikely" in six IPCC report sentences. Half of the six sentences used the term "likely" and the remaining half used "unlikely". The "likely" sentences are categorized as having a "positive" term while the "unlikely" sentences are categorized as having a "negative" term. A dummy variable named valence codes the responses in the positive term condition as 1, and those in the negative term condition as 0.
Using beta regression, Smithson, Budescu, Broomell, and Por (2012) reported two main findings. The first was that the "best" estimates were nearer to the middle of the [0, 1] interval in the negative-term condition than in the positive-term condition (estimated coefficient in the mean submodel = 0.150, p = .004, indicating that the mean is further from 1/2 in the positive-term condition). In addition, the responses were more variable (i.e., there was less consensus among respondents) in the negative-term than in the positive-term conditions (estimated coefficient in the precision submodel = 0.603, p < .001, indicating less variability in the positive-term condition).
We retest both findings by modeling the data with members of the CDF-quantile distribution family. The raw estimates themselves are in the variable named prob. The estimates for the negative-valence sentences were subtracted from 1 to render them directly comparable to the estimates for the positive-valence sentences. This variable was then transformed into a new variable named probm for shifting values away from the boundary values of 0 and 1.
A CDF-quantile model can be fitted in much the same way as a regression model fitted with lm(). Here, we estimate a model using the t2-t2 distribution (where both parent and child distributions are t-distributions with 2 degrees of freedom). The t2-t2 CDF is

$$G(x, \mu, \sigma) = \frac{1}{2} + \frac{u}{2\sqrt{2 + u^2}}, \qquad u = \frac{1}{\sigma}\left(q\sqrt{\frac{(1 - 2x)^2}{2x(1 - x)}} - \mu\right),$$

where x ∈ (0, 1) and q = −1 when x < 1/2 and q = 1 when x ≥ 1/2. The t2-t2 distribution is a member of the finite-tailed subfamily and it has density σ² at 0 and 1.

R> fit <- cdfquantreg(probm ~ valence | valence, data = dataipcc, fd = "t2", sd = "t2")
R> summary(fit)
Family: t2 t2
Call: cdfquantreg(formula = probm ~ valence | valence, data = dataipcc, fd = "t2", sd = "t2")
Gradient: -0.0114 -0.01 0.0294 0.0203

The first part of the output, labeled "Mu coefficients", displays the parameter estimation results for the location submodel. The results show that valence had a significant influence on the median of participants' probability estimates. The median of the probability estimates in the positive-term condition is found to be lower than in the negative-term condition (estimated coefficient = −0.186, p < .001). This is the opposite of the finding in Smithson et al. (2012) (for a detailed comparison of their model with this one, see Smithson and Shou 2017). The second part of the output shows the estimation results for the dispersion submodel. Our model agrees with the finding reported by Smithson et al. (2012), because the probability estimates in the positive-term condition have less dispersion (i.e., more consensus) than those in the negative-term condition (estimated coefficient = −0.4207, p < .001). Also noteworthy is that the log-likelihood for the beta regression model is 264.7 whereas the log-likelihood for the t2-t2 model is 435.3, a substantially better fit than the beta.

R> coefs <- coef(fit)  # assumed definition: coefficients of the fitted model
R> xx <- seq(0.005, 0.995, 0.005)
R> cdf_neg <- pq(xx, coefs[1], exp(coefs[3]), "t2", "t2")
R> cdf_pos <- pq(xx, sum(coefs[1:2]), exp(sum(coefs[3:4])), "t2", "t2")
R> fitqtot <- quantreg::rq(probm ~ valence, tau = xx, data = dataipcc)

Figure 4 shows that the noticeable difference between the two models is in the lower half of the distribution when valence is negative. The quantile regression model captures a "bump" in the distributions that cannot be accounted for with the limited number of parameters in the CDF-quantile model. But this is because, with its arbitrarily many parameters, the quantile regression model in the two-sample case simply repeats the empirical CDFs for the two samples. Moreover, there is very close agreement between the two models for the upper half of the distribution when valence is negative and for the entire distribution when valence is positive. All this is achieved with just two parameters in each of the CDF-quantile submodels. The user must then decide whether to prefer the more parsimonious parametric model or the slightly better-fitting so-called non-parametric model.

A continuous predictor and a categorical moderator
To supplement the preceding example, here we present a model that involves a continuous predictor whose effect is moderated by a categorical covariate. The data are, as in the previous example, people's interpretations of the IPCC report verbal uncertainty phrase "likely" in the sentence "Temperatures of the most extreme hot nights, cold nights and cold days are likely to have increased due to anthropogenic forcing." However, these data are from a different study involving samples from 27 countries reported by Budescu, Por, Broomell, and Smithson (2014). Our illustration uses the Australian sample of 393 respondents. The (reduced) dataset contains each participant's age, gender (0 = male, 1 = female), the probability that they consider corresponds to their most typical use of "likely", and their best estimate of the probability intended in the sentence from the IPCC report.
Our model investigates the influence that a person's own probability that they associate with "likely" may have had on the probability they nominated for the IPCC report sentence containing that term. The hypothesis being tested is that the personal probability is positively related to the nominated probability. The first model tested (mod0 below) fits a conditional logit-logistic distribution and verifies that this relationship exists and is positive (the location submodel coefficient for cfprob, the personal probability, is 2.1666).

Multivariate models of the IPCC data using copulas
An attractive approach to constructing multivariate distributions uses copulas, which are functions of CDFs and quantile functions. The CDF-quantile family has explicit CDFs, so copulas may be used to construct multivariate models of dependent doubly-bounded random variables. The most direct method is to select a copula, C (e.g., T, Clayton, or Frank), and derive an explicit expression for $C(G_1, G_2, \ldots)$, where C is the appropriate copula function and $G_j$ is the jth CDF. If C is differentiable in all of its parameters, then it has an explicit log-likelihood function and thereby (in principle) is amenable to maximum likelihood estimation of its parameters. In R the copula package (Hofert, Kojadinovic, Maechler, and Yan 2017; Yan 2007; Kojadinovic and Yan 2010) obtains maximum likelihood estimates of the marginal distribution and copula parameters simultaneously. Moreover, user-defined marginal distributions can be used in copula, so long as the CDF, PDF, and quantile functions (vectorized) are available. The cdfquantreg package provides all three functions.
We illustrate the application of cdfquantreg and copula to modeling multivariate distributions of variables with support on the unit interval by constructing a trivariate copula for the IPCC example data. The trivariate copula models the data from questions 4-6 (the positive-valence probability expression "likely"). The copula package offers two alternative estimation methods: a one-stage procedure in which the package estimates the marginal distribution and association parameters simultaneously, and a two-stage procedure in which the marginal distribution parameters are estimated first and the association parameters thereafter. In the latter procedure, the CDFs $G_j$ are applied to the observations using the marginal parameter estimates to generate pseudo-observations with uniform marginals, and the association parameters are then estimated from those pseudo-observations. We compare both procedures, using cdfquantreg to estimate the marginal parameters in the two-stage process. The marginal distributions are t2-t2 and the copula is the T copula, with degrees of freedom as a free parameter.
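The logic of the two-stage procedure can be sketched in base R without either package. This is an illustration under several assumptions that differ from the analysis in the paper: the data are simulated, there are two variables rather than three, the margins are logit-logistic rather than t2-t2, and the dependence is a Gaussian copula whose association parameter is estimated by the correlation of normal scores rather than by full maximum likelihood. All names are hypothetical.

```r
set.seed(42)
n <- 1500
rho_true <- 0.6

# Simulate two dependent variables on (0, 1): Gaussian copula, logit-logistic margins.
z1 <- rnorm(n)
z2 <- rho_true * z1 + sqrt(1 - rho_true^2) * rnorm(n)
u  <- cbind(pnorm(z1), pnorm(z2))                 # dependent uniforms
# Logit-logistic quantile function: G^{-1}(u) = plogis(mu + sigma * qlogis(u))
y <- cbind(plogis( 0.3 + 0.8 * qlogis(u[, 1])),
           plogis(-0.2 + 1.2 * qlogis(u[, 2])))

# Stage 1: maximum likelihood estimates of (mu, log sigma) for one margin.
fit_margin <- function(yj) {
  nll <- function(par) {
    s <- exp(par[2])
    v <- (qlogis(yj) - par[1]) / s
    -sum(dlogis(v, log = TRUE) - log(s) - log(yj) - log(1 - yj))
  }
  optim(c(0, 0), nll, method = "BFGS")$par
}
par1 <- fit_margin(y[, 1])
par2 <- fit_margin(y[, 2])

# Stage 2: apply the fitted CDFs G_j to obtain pseudo-observations with
# (approximately) uniform margins, then estimate the association parameter.
G <- function(yj, par) plogis((qlogis(yj) - par[1]) / exp(par[2]))
udat <- cbind(G(y[, 1], par1), G(y[, 2], par2))
rho_hat <- cor(qnorm(udat[, 1]), qnorm(udat[, 2]))
```

The pseudo-observations in `udat` play the same role as the uniform pseudo-observations passed to the copula fit in the two-stage analysis, and `rho_hat` should be close to `rho_true`.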
The dialog below displays the way to form a trivariate t-copula that allows three association parameters.

R> lrtest(copfitPV1, copfitPV2)
Likelihood ratio test

The next dialog shows the two-stage estimation procedure. It begins with the cdfquantreg estimates of the marginal parameters, generates the uniformly-distributed pseudo-observations, and then estimates a one-parameter trivariate copula using the pseudo-observations.

R> fit4 <- cdfquantreg(Q4 ~ 1 | 1, fd = "t2", sd = "t2", data = IPCC_Wide)
R> fit5 <- update(fit4, Q5 ~ 1 | 1)
R> fit6 <- update(fit4, Q6 ~ 1 | 1)
R> udat <- cbind(pqt2 (IPCC_Wide[, (fit6)[2])))
R> copfitudatPV <- fitCopula(tCop2, udat, start = c(0.5, 2.5))
R> loglikMvdc(c(coef (fit4)
[1] 334.5089

Table 3 displays the estimates from the one- and two-stage models, along with their respective log-likelihoods. The ρ estimate denotes the Spearman's correlation coefficient and the ψ estimate denotes the degrees of freedom in the T copula. Both models have similar log-likelihoods and fairly similar parameter estimates. The standard errors for the cdfquantreg estimates are somewhat larger than they are for the copula estimates. However, differences such as these are not surprising given that these two packages are using different estimation algorithms as well as different methods.

Table 3: Estimates for one-stage and two-stage models.

Probability estimates under ambiguity and conflict
Smithson, Priest, Shou, and Newell (2018) conducted a study to examine lay people's probability judgments when receiving ambiguous or conflicting information. In each of four scenarios, participants were presented with two pairs of expert forecasts regarding the number of days on which it would rain or the probability of rain. For example, one expert predicts that 4 to 6 days out of the next 7 days will have rain, while another predicts that 2 to 5 days out of the next 7 days will have rain. Based on this pair of estimates (i.e., [4, 6] and [2, 5]), participants were requested to provide their own estimates of how many days out of the 7 days will have rain.
There were four scenarios in the study, in which the first pair of estimates in Scenarios 1, 3 and 4 had identical intervals. One of the research questions was whether the identical intervals yielded identical distributions of best estimates by participants even though they were presented in different contexts.
To answer this question, we fitted the response variable using cdfquantreg() with several distributions. We examined whether the responses in different scenarios had significantly different location and/or dispersion parameters. The outputs below show the likelihood-ratio tests assessing these effects via model comparisons. For all of the distributions fitted, the location parameters did not significantly differ across the scenarios, but the dispersion parameters did. We illustrate these findings with three of the distributions and their respective models.
The results of model comparison for models using t2-t2 distribution are displayed below. The likelihood-ratio tests show that the model with a scenario effect in the location submodel does not improve model fit over the null model, but a model with scenario effects in both the location and dispersion submodels does. This is due to the effect of scenario on dispersion.

Likelihood ratio tests
  Resid. Df  -2Loglik  Df  LR stat  Pr(>Chi)
1      4708     -5800
2      4706     -5801   2     1.01       0.6
3      4704     -5830   2    29.31   4.3e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

However, the three distribution models provide different conclusions regarding where the effect in the dispersion submodel occurred (as shown below). The results of the t2-t2 model suggest that the dispersion is similar in Scenarios 3 and 1 (Scenario 1 is the base comparison group), whereas the dispersion in Scenario 4 is significantly smaller than in Scenario 1. The logit-logistic model, on the other hand, finds that Scenarios 4 and 1 have similar dispersion whereas the dispersion in Scenario 3 is significantly greater than in Scenario 1. Finally, the arcsinh-t2 model suggests that the estimates in both Scenarios 3 and 4 have significantly less dispersion than Scenario 1. Which models should be trusted more? Returning to the log-likelihoods in the output above, we can see that the arcsinh-t2 and t2-t2 models fit the data considerably better than the logit-logistic model, with the arcsinh-t2 model best of the three. The pattern of the coefficients is similar for the arcsinh-t2 and t2-t2 models as well, whereas this pattern is not to be found in the logit-logistic model. We should therefore regard the arcsinh-t2 and t2-t2 models as more trustworthy.
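The p-values in the likelihood-ratio table for the t2-t2 models can be checked from the reported statistics, since each LR statistic is referred to a chi-square distribution whose degrees of freedom equal the number of added parameters. The short base-R sketch below uses only the numbers printed in that table; the variable names are hypothetical.

```r
# Values copied from the likelihood-ratio table for the t2-t2 models.
resid_df <- c(4708, 4706, 4704)
lr_stat  <- c(1.01, 29.31)

# Each comparison adds as many parameters as the drop in residual df.
df_added <- -diff(resid_df)

# Upper-tail chi-square probabilities for the two model comparisons.
p_values <- pchisq(lr_stat, df = df_added, lower.tail = FALSE)
```

The first p-value rounds to 0.6 and the second to about 4.3e-07, matching the table: adding scenario to the location submodel does not help, while adding it to the dispersion submodel does.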

Simulating polarized attitudes
Suppose we wish to simulate populations in which an attitude on a controversial issue is equally polarized, and we are interested in the effects of the extremity of this polarization. In this case, the attitude in question is the proportion of one's investment portfolio that should be invested in high-risk but potentially very high-return shares. A conventional approach would be to use a mixture of two distributions (e.g., two beta distributions), but a convenient and simple method is to employ one of the bimodal distributions from the family in the cdfquantreg package. The logit-Cauchy distribution, for instance, has a simple quantile function and we may use it to control the degree of polarization in the simulated population. The 25th and 75th percentiles for populations with µ = 0 (and therefore a median of .5) are

$$G^{-1}(.25, 0, \sigma) = \frac{\tan^{-1}(-\sigma\log(3))}{\pi} + \frac{1}{2} \quad \text{and} \quad G^{-1}(.75, 0, \sigma) = \frac{\tan^{-1}(\sigma\log(3))}{\pi} + \frac{1}{2}.$$

Solving their difference for σ gives

$$\sigma = \frac{\tan(\pi q/2)}{\log(3)},$$

where q is the difference between these quantiles. To simulate from three populations in which the 25th and 75th percentiles are separated by q = .25, .5, and .75, we assign to σ values 0.377033, 0.910239, and 2.19751, respectively. These yield a unimodal and two bimodal distributions. The second distribution has modes at approximately .15 and .85, while the third has modes at about .05 and .95. Figure 7 illustrates the shapes of the three distributions using the PDF dq().
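The calculation of σ and the simulation itself can be sketched in base R using the quantile function directly. This sketch does not call the cdfquantreg generators; the helper names below are hypothetical, and sampling is done by the inverse-CDF method.

```r
# sigma that separates the quartiles by q when mu = 0
sigma_for_gap <- function(q) tan(pi * q / 2) / log(3)

# Logit-Cauchy quantile function: Cauchy CDF applied to mu + sigma * logit(p)
q_logit_cauchy <- function(p, mu, sigma) atan(mu + sigma * qlogis(p)) / pi + 0.5

# Inverse-CDF random generation
r_logit_cauchy <- function(n, mu, sigma) q_logit_cauchy(runif(n), mu, sigma)

gaps   <- c(0.25, 0.5, 0.75)
sigmas <- sigma_for_gap(gaps)

# Exact interquartile gaps implied by these sigma values
iqr_check <- q_logit_cauchy(0.75, 0, sigmas) - q_logit_cauchy(0.25, 0, sigmas)

# Simulate three populations of 10,000 and compare the sample quartile gaps
set.seed(123)
sims <- sapply(sigmas, function(s) r_logit_cauchy(10000, 0, s))
sample_gaps <- apply(sims, 2, function(v) diff(quantile(v, c(0.25, 0.75))))
```

The values in `sigmas` reproduce the three σ values quoted in the text, and the sample quartile gaps should be close to .25, .5, and .75.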

Conclusion
The cdfquantreg package and SAS macro offer researchers the ability to construct and test GLMs of random variates on the unit interval using the CDF-quantile family of distributions, which provides a viable alternative to well-known distributions such as the beta and Kumaraswamy. As demonstrated by Smithson and Shou (2017), members of the CDF-quantile family can model a variety of shapes unavailable to the beta or Kumaraswamy, and in the data-fitting examples presented in their paper members of this family out-perform those distributions. The family includes the logit-logistic distribution, and others that have not appeared before in the literature.
The package and macro also present a framework for systematically modeling quantiles of variates on the (0, 1) interval, thanks to the explicit quantile functions possessed by this family. Like beta regression, the GLMs in this family have a location and a dispersion submodel. Unlike beta regression, dispersion and location may be modeled independently of one another. Smithson and Shou (2017) show that the location parameter determines the median, and the dispersion parameter determines how far other quantiles are from the median.
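The separate roles of the two parameters can be illustrated with a short base R sketch. It assumes that the logit-Cauchy quantile function has the general form G^{-1}(γ, µ, σ) = H(µ + σ F^{-1}(γ)), with H the standard Cauchy CDF and F^{-1} the logit, which is consistent with the µ = 0 percentiles in the polarization example; the helper q_lc and the parameter values are hypothetical and chosen only for illustration.

```r
# Assumed general form of the logit-Cauchy quantile function
# (illustrative helper, not part of cdfquantreg):
# G^{-1}(p, mu, sigma) = pcauchy(mu + sigma * qlogis(p))
q_lc <- function(p, mu, sigma) pcauchy(mu + sigma * qlogis(p))

p <- c(0.25, 0.50, 0.75)

# Same location, increasing dispersion: the median stays put,
# while the outer quartiles move away from it
low_disp  <- q_lc(p, mu = 0.5, sigma = 0.5)
high_disp <- q_lc(p, mu = 0.5, sigma = 2.0)
print(rbind(low_disp, high_disp))

# Same dispersion, different locations: only the median's position changes
medians <- q_lc(0.5, mu = c(-1, 0, 1), sigma = 1)
print(medians)
```

Under this assumed form the median equals H(µ) whatever the value of σ, because F^{-1}(.5) = 0.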
The cdfquantreg package also enables researchers to venture beyond classical maximum likelihood inference, by providing bootstrap and Bayesian MCMC options for model estimation.

Model diagnostics, including the gradient, three types of residuals, and the dfbeta influence measures, are available for evaluating models. The package also provides pseudo-random generators for all of its distributions. Many of its functions and their usage have forms that are familiar to R users, and the documentation is extensive. The package does not depend extensively on other R packages, and (as demonstrated in the example involving the copula package) is easy to work with in conjunction with other packages in the R environment.
A. The SAS macro: %cdfquantreg
CDF-quantile regression is also implemented in the SAS macro %cdfquantreg, which uses the NLMIXED (nonlinear mixed models) procedure to estimate model parameters. Other optimization techniques, such as BFGS, can be chosen, but the macro's default is trust region optimization. Successful convergence yields parameter estimates along with their approximate standard errors based on the Hessian matrix.
The macro employs the following input statement:
%cdfquantreg(DATA, DV, FD, SD, LMIV, DMIV, INIT);
where:
• DATA is the name of a data set that includes the dependent variable and independent variables. The dependent variable must lie within the (0, 1) interval.
• DV is the name of the dependent variable.
• FD, SD are the abbreviations (without quotes) for the parent and child distributions. The family and usage are similar to those in R as outlined in Section 2.1.
• LMIV are the names of the independent variables in the location submodel. Dummy-coded variables should be used to represent any categorical independent variables.
• DMIV are the names of the independent variables in the dispersion submodel.
• INIT are user-specified starting values for parameters, including intercepts. Starting values are separated by '|' and must be given in the same order as the names in LMIV and DMIV.
An example using the t2-t2 distribution to fit a null model is as follows:
%cdfquantreg(data, y, "t2", "t2");
The macro returns the usual NLMIXED model outputs, including iteration history, convergence results, fit statistics, parameter estimates, Hessian matrix, and covariance matrix of the parameter estimates. The fit statistics are the common metrics −2 log-likelihood, AIC, AICC, and BIC. A new data-set called "dataout" is generated and stored in the working directory. In addition to the original data, "dataout" includes the fitted µ_i, σ_i, and y_i values, raw residuals, and Pearson residuals. The user can use these for further model diagnostics.
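For readers who wish to cross-check the reported fit statistics by hand, the following R sketch computes them from a maximized log-likelihood using the textbook definitions. The input values are hypothetical and serve only as an illustration; the exact conventions applied by NLMIXED (for instance, which sample size enters BIC) are an assumption here and should be confirmed in the SAS documentation.

```r
# Hypothetical inputs, for illustration only (not results from the paper)
loglik <- 35.2  # maximized log-likelihood
k <- 2          # number of estimated parameters (null model: two intercepts)
n <- 100        # number of observations

# Textbook definitions of the four fit statistics
m2ll <- -2 * loglik                          # -2 log-likelihood
aic  <- m2ll + 2 * k                         # Akaike information criterion
aicc <- aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample corrected AIC
bic  <- m2ll + k * log(n)                    # Bayesian information criterion

print(c(m2ll = m2ll, AIC = aic, AICC = aicc, BIC = bic))
```

Smaller values indicate better fit on all four measures, so nested or competing models fitted to the same data can be compared directly.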

A.1. Ambiguity and conflict study with SAS
We replicate the example analyses from the ambiguity and conflict study by using the SAS macro. We first run the t2-t2 models to examine the model fit for an intercept-only model, a model with location predictors, and a model with both location and dispersion predictors. Finally, we replicate the arcsinh-t2 distribution run, and verify that SAS produces similar parameter estimates and log-likelihood values to those from cdfquantreg.