Holistic Generalized Linear Models

Holistic linear regression extends the classical best subset selection problem by adding constraints designed to improve model quality. These constraints include sparsity-inducing constraints, sign-coherence constraints and linear constraints. The $\textsf{R}$ package $\texttt{holiglm}$ provides functionality to model and fit holistic generalized linear models. By making use of state-of-the-art conic mixed-integer solvers, the package can reliably solve GLMs for Gaussian, binomial and Poisson responses with a multitude of holistic constraints. The high-level interface simplifies the constraint specification and can be used as a drop-in replacement for the $\texttt{stats::glm()}$ function.


Introduction
Selecting a sensible model from the set of all possible models is an important but typically time-consuming task in the data analytic process. To simplify this process, Bertsimas and King (2015); Bertsimas and Li (2020) introduce the holistic linear model (HLM). The HLM is a constrained linear regression model in which the constraints aim to automate the model selection process. In particular, this can be achieved by utilizing quadratic mixed-integer optimization, where the integer constraints are used to place cardinality constraints on the linear regression model.
Cardinality constraints are used to introduce sparsity in the statistical model, a desirable property that leads to more interpretability, an increase in computational efficiency and a reduction in the variance of the estimates (at the cost of introducing some bias). Placing a cardinality constraint on the total number of variables allowed in the final model leads to the classical best subset selection problem (Miller 2002) which, given a response vector $y \in \mathbb{R}^n$, a predictor matrix $X \in \mathbb{R}^{n \times (p+1)}$ and a subset size $k$ between 0 and $\min\{n, p\}$, finds the subset of at most $k$ predictors $\beta$ that produces the best fit in terms of squared error, solving the non-convex problem
$$\underset{\beta}{\text{minimize}} \quad \frac{1}{2}\,\|y - X\beta\|_2^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \mathbb{1}(\beta_j \neq 0) \leq k.$$
Several algorithms have been proposed to solve the best subset selection problem (among the most recent ones see e.g., Hazimeh and Mazumder 2020; Zhu, Wen, Zhu, Zhang, and Wang 2020). Other approaches which aim to achieve global sparsity mostly rely on different types of penalties (such as LASSO-type penalties, Tibshirani 1996; Sun and Zhang 2021). Moreover, recent work considers similar constraints which can be specified with the goal of introducing sparsity in the model at a group level, i.e., ensuring that certain sets of predictors are jointly included in (or excluded from) the model. Hu, Li, Meng, Qin, and Yang (2017) introduce sparsity in a group structure via the $\ell_{p,q}$ norm by using a group restricted eigenvalue condition, while Zhang, Zhu, Zhu, and Wang (2022b) propose a group-splicing algorithm that iteratively detects the relevant groups and excludes the irrelevant ones. Further cardinality constraints placed on user-defined groups of predictors can be used to, e.g., limit the pairwise multicollinearity, select the best (non-linear) transformation for the predictors or include expert knowledge in the model estimation (such as forcing specific predictors to stay in the model).
In addition to cardinality constraints, upper or lower bounds as well as linear constraints on the coefficients are also relevant for improving interpretability along with introducing sparsity (e.g., see Slawski and Hein 2013, who introduce non-negativity constraints in linear regression and show performance similar to other sparsity-inducing methods).
In this paper, we introduce the holiglm package, an R package for formulating and fitting holistic generalized linear models (HGLMs). The package supports holistic constraints such as mixed-integer-type constraints related to sparsity (e.g., limits on the number of predictors to be included in the model, group sparsity constraints), linear constraints and constraints related to multicollinearity. The contribution of the paper is three-fold. First, to the best of our knowledge, we are the first to suggest the use of conic optimization to extend the results presented for linear regression by Bertsimas and King (2015); Bertsimas and Li (2020) to the class of generalized linear models (GLMs) and to formulate the most common ones as conic optimization problems. Secondly, we survey the literature related to holistic linear regression and, more generally, to constrained regression and provide an extensive survey of what should be considered as holistic constraints. Finally, we provide a ready-to-use implementation of holistic GLMs in the package holiglm. We exemplify the use of the package for a variety of statistical problems with the hope that this will encourage the statistical community in further exploiting the recent advances in mixed-integer conic optimization.
The holiglm package provides a flexible infrastructure for automatically translating constrained generalized linear models into conic optimization problems. The optimization problems are solved by utilizing the R optimization infrastructure package ROI (Theußl, Schwendinger, and Hornik 2020). Additionally, a high-level interface, which can be used as a drop-in replacement for the stats::glm() function, is provided. Using ROI makes it possible for the user to choose between a wide range of commercial and open source optimization solvers. With recent advancements in conic optimization, it is now possible to routinely solve conic problems with a large number of variables and/or constraints to proven optimality. Using conic optimization instead of iteratively reweighted least squares (IRLS) has the advantages that no starting values are needed, the results are more reliable and the solvers are designed to handle different types of constraints. These advantages come at the cost of a longer runtime; however, as shown by Schwendinger, Grün, and Hornik (2021), for some GLMs the speed of the conic formulation is similar to that of the IRLS implementation. Nevertheless, problems with (very) large n and/or (very) large p can currently pose a marked computational burden on the software. It should be noted that the computational burden depends on both the conic formulation itself (in Appendix A one can see the dimension of the underlying optimization problems for different family and link functions) as well as the computational ability of the solvers.
At the time of writing, there exists no ready-to-use package or library for fitting HGLMs or HLMs. However, for R (R Core Team 2023) several packages allow the estimation of (generalized) linear models under linear or sparsity constraints. Package abess (Zhu et al. 2022) can fit generalized linear models with canonical link functions while enforcing global and group sparsity constraints. The CMLS (Helwig 2018) package can fit linear regression models under linear constraints by translating the problem into a linear constrained quadratic optimization problem, which is solved by quadprog (Turlach, Weingessel, and Moler 2019). Package colf (Boutaris 2017) can fit linear regression models with lower and upper bounds on the coefficients. The glmc (Chaudhuri, Handcock, and Rendall 2006) package can fit generalized linear models while imposing linear constraints on the parameters. Similarly, package restriktor (Vanbrabant 2022) provides functionalities for estimation, testing and evaluation of linear equality and inequality constraints about parameters and effects for generalized linear models, where the constraints can be specified by a text-based description. Finally, ConsReg (Sallés 2020) provides a similar functionality to restriktor. Package holiglm (Schwendinger, Schwendinger, and Vana 2024) is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=holiglm.

This paper is structured as follows: Section 2 introduces the GLM class and presents the representation of specific family-link combinations as conic optimization models. In Section 3, we present a list of holistic constraints implemented in the package. Section 4 introduces package holiglm for R. In Section 5 we compare our package regarding runtimes with existing packages for fitting constrained GLMs. Section 6 shows the usage of the package in two applications: fairness constraints in logistic regression and model selection in log-binomial regression. Section 7 concludes.

Generalized linear models
A generalized linear model is a model for a response variable whose conditional distribution belongs to a one-dimensional exponential family. A GLM consists of a linear predictor and two functions, namely a twice continuously differentiable and invertible link function $g$ that describes how the mean, $\mathrm{E}(y) = \mu$, depends on the linear predictor (one has $g(\mu) = \eta$), and a variance function $V$ that describes how the variance depends on the mean, $\mathrm{VAR}(y) = \phi V(\mu)$ (where the dispersion parameter $\phi > 0$ is a constant).
The most common distributions for the response y that can be accommodated in the GLM setting include the normal, binomial or Poisson. The density of members of the exponential family can be written as
$$f(y; \lambda, \phi) = \exp\left\{\frac{y\lambda - b(\lambda)}{\phi} + c(y, \phi)\right\},$$
where $\lambda$ is called the canonical or natural parameter, and $b(\cdot)$ and $c(\cdot)$ are known real-valued measurable functions that vary from one exponential family to another.

Table 1: GLMs as exponential families with conic reformulations. The objective functions to be maximized are proportional to $\sum_{i=1}^{n} y_i \lambda_i(\beta) - b(\lambda_i(\beta))$. The conic constraints with subscript $i$ should hold for all $i = 1, \ldots, n$. Superscript a marks family-link combinations with an objective modeled by second-order cones. Superscript b marks family-link combinations with an exponential cone representation. Note that for the probit link an approximation is used by scaling the coefficients obtained from the logit link by $\sqrt{\pi/8}$, which is equivalent to using a simple approximation (Page 1977) for the standard normal distribution.
It can be shown that $\mu = \mathrm{E}(y) = b'(\lambda)$ and $\mathrm{VAR}(y) = \phi\, b''(\lambda)$, and the natural parameter is connected to the linear predictor via $\lambda = h(\eta)$ with $h = (b')^{-1} \circ g^{-1}$. In case $h$ is the identity, the link $g$ is called canonical, i.e., $b'(\cdot)$ is the inverse of the canonical link function.
The parameters $\beta$ can be estimated by maximizing the likelihood function corresponding to the different types of response. Using the distribution of the exponential family, the log-likelihood for a sample $y = (y_1, \ldots, y_n)^\top$ is given by:
$$\ell(\beta; y) = \sum_{i=1}^{n} \frac{y_i \lambda_i - b(\lambda_i)}{\phi_i} + c(y_i, \phi_i), \tag{1}$$
where $\phi_i = \phi / a_i$ for user-defined observation weights $a_i$.
In this paper, we leverage the fact that the maximization of the objective function in Equation 1 can be reformulated as a conic optimization problem for most common family-link combinations. A conic optimization problem is designed to model convex problems by optimizing a linear objective function over the intersection of an affine hyperplane and a nonempty closed convex cone. For GLMs, the types of cones relevant for maximum likelihood estimation include the exponential cone $\mathcal{K}_{\exp}$, which can be used to model a variety of objectives and constraints involving exponentials and logarithms, and the second-order cone $\mathcal{K}_{\mathrm{soc}}$, which can be used to model problems involving, directly or indirectly, quadratic terms. In holiglm we provide functionality for estimating GLMs with Gaussian, binomial and Poisson families with various link functions. Table 1 contains information on the family-link combinations implemented in package holiglm. The second column contains, for each $i$, the $\lambda_i(\cdot)$ and $b_i(\cdot)$ as functions of the vector of coefficients $\beta$. The third column presents the objective function obtained by reformulating the maximization of the log-likelihood as a conic program, while the fourth column contains the corresponding conic constraints. Note that all reformulations necessitate the specification of auxiliary variables $\zeta \in \mathbb{R}$ and/or $\delta \in \mathbb{R}^n$, which will be optimized together with the vector of regression coefficients. More details on the reformulation of the log-likelihood maximization problem for the presented family-link combinations can be found in Appendix A.
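To give a flavor of such a reformulation, consider, as a sketch, the Poisson family with log link, where $\lambda_i(\beta) = x_i^\top \beta$ and $b(\lambda_i) = \exp(\lambda_i)$. Using the convention $\mathcal{K}_{\exp} = \{(x_1, x_2, x_3) : x_1 \geq x_2 \exp(x_3 / x_2),\ x_2 > 0\}$, the nonlinear term $\exp(x_i^\top \beta)$ can be bounded from above by an auxiliary variable $\delta_i$:

```latex
% Sketch: Poisson regression with log link as an exponential cone program.
% At the optimum, delta_i = exp(x_i' beta), so the conic program attains
% the same objective value as the original log-likelihood maximization.
\begin{aligned}
\underset{\beta,\ \delta}{\text{maximize}} \quad
  & \sum_{i=1}^{n} y_i\, x_i^\top \beta - \delta_i \\
\text{subject to} \quad
  & (\delta_i,\ 1,\ x_i^\top \beta) \in \mathcal{K}_{\exp},
    \qquad i = 1, \ldots, n,
\end{aligned}
```

since $(\delta_i, 1, x_i^\top \beta) \in \mathcal{K}_{\exp}$ encodes $\exp(x_i^\top \beta) \leq \delta_i$, and maximization pushes each $\delta_i$ down onto this bound. The exact formulations used by the package for all family-link combinations are given in Appendix A.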
Furthermore, we provide a package vignette which discusses how the user can decide whether a problem is solvable via conic optimization, i.e., whether the objective (and desired constraints) can be represented using convex cones. Note that the package currently does not allow for inverse Gaussian, gamma or negative binomial families, as at the time of writing no appropriate conic representation could be found for these families. More information about conic optimization in general can be found in Boyd and Vandenberghe (2004) and MOSEK ApS (2022a); more information about conic optimization in R can be found in Theußl et al. (2020).

Bertsimas and King (2015) and Bertsimas and Li (2020) introduced a set of constraints designed to improve the quality of linear regression models. These constraints are crafted based on a survey of modeling recommendations from statistical textbooks and articles. In this section we survey different constraints suggested by the literature and provide an extensive overview which includes but is not limited to constraints in Bertsimas and King (2015) and Bertsimas and Li (2020). Moreover, we provide their formulations as optimization constraints.

Holistic constraints
Most of the constraints introduced in holiglm are cardinality constraints. For modeling cardinality constraints we need $p$ binary variables:
$$z_j \in \{0, 1\}, \quad j = 1, \ldots, p. \tag{2}$$
These binary variables are only added to the model if needed. Here, the binary variable $z_j$ represents the selection of variable $X_j$; hence, $z_j = 0$ implies $\beta_j = 0$. Problems which involve such binary variables are called mixed-integer optimization problems. This type of cardinality constraint can be modeled with a so-called big-M constraint
$$-M z_j \leq \beta_j \leq M z_j,$$
where $M$ is a positive constant. For the big-M constraint it is important to choose a good constant $M$. A good $M$ can be characterized by two properties:

• It is chosen big enough that it does not impact the magnitude of the parameter $\beta_j$.

• It is chosen as small as possible to improve the speed of the underlying optimization solver and to ensure stability.

Global sparsity
The global sparsity constraint can be used to model the classical best subset selection problem (Miller 2002). The best subset selection problem is a combinatorial optimization problem designed to select a predefined number $k_{\max}$ of predictors out of all available predictors. This can be accomplished by maximizing the likelihood function while restricting the number of predictors via integer constraints:
$$\sum_{j=1}^{p} z_j \leq k_{\max}.$$
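As a sketch of the corresponding package usage (assuming the holiglm package is installed; the constraint constructor k_max() is described in Section 4), a best subset fit on the built-in mtcars data could look as follows:

```r
# Sketch: best subset selection with at most 3 active predictors.
# Unlike penalized methods, the non-selected coefficients are exactly zero.
library(holiglm)
fit <- hglm(mpg ~ ., data = mtcars, constraints = k_max(3))
coef(fit)
```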

Group sparsity
Contrary to global sparsity constraints, group sparsity constraints do not restrict all predictors but only specific local groups of predictors. Group sparsity constraints can again be subdivided into excluding and including constraints:

• Excluding constraints enforce a cardinality constraint on a subset of predictors.

• Including constraints enforce that either all predictors of a group are selected or none.

Bertsimas and King (2015) suggest to use excluding constraints for modeling pairwise multicollinearity and selecting the best (non-)linear transformation. Including constraints can be used for modeling categorical variables via one-hot encoding or splines via spline basis functions. In the following we present the formulation of the excluding group sparsity constraints for restricting pairwise multicollinearity and for selecting the best variable transformation.

Limited pairwise multicollinearity
A desired property for a good model is that the predictors exhibit no multicollinearity. Collinear predictors can cause numerical problems during the estimation, less interpretable coefficients and misleading standard errors.
The group sparsity constraint can be used to restrict pairwise multicollinearity by excluding pairs of predictors that exhibit strong pairwise correlation.
Using the previously introduced variables $z$, the pairwise multicollinearity constraint can be formulated as
$$z_j + z_k \leq 1 \quad \text{for all } (j, k) \in PC,$$
where $PC = \{(j, k) \mid \rho_{\max} < |\rho(j, k)|\}$ denotes the set of pairs of highly correlated predictors. The general rule of thumb is excluding pairs of predictors where the absolute value of the pairwise correlation exceeds the threshold $\rho_{\max} = 0.7$, but other (less as well as more restrictive) choices are also advocated for in the literature (see Dormann et al. 2013, for a review).
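As a quick base-R illustration, independent of the package, the set $PC$ can be computed from the empirical correlation matrix; here on a subset of the built-in mtcars data:

```r
# Compute the set PC of predictor pairs whose absolute empirical correlation
# exceeds rho_max = 0.7; each pair (j, k) in PC yields the constraint
# z_j + z_k <= 1.
rho_max <- 0.7
X <- mtcars[, c("cyl", "disp", "hp", "drat", "wt")]
R <- abs(cor(X))
R[upper.tri(R, diag = TRUE)] <- 0          # keep each unordered pair once
idx <- which(R > rho_max, arr.ind = TRUE)
PC <- data.frame(j = rownames(R)[idx[, 1]], k = colnames(R)[idx[, 2]])
PC    # contains, e.g., the highly correlated pair (disp, cyl)
```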

Non-linear transformations
In cases where the true relationship between the response and a covariate is non-linear, using an appropriate transformation can improve model quality. Commonly, non-linear transformations like $\log(x)$, $\sqrt{x}$ or $x^2$ are used. When using non-linear transformations, it is often desirable to include either only the non-transformed variable or only one of the transformed variables. A group sparsity constraint of the form
$$\sum_{j \in NL} z_j \leq 1$$
can be used to select at most one of the transformations. Here $NL$ denotes the set of indices of applied transformations, including the identity mapping, on a certain covariate.

Including group constraints
In cases where all predictors within a specified group $G$ should be included in the model (as in the case of dummy-encoded categorical variables), the including group sparsity constraint (also referred to here as in-out constraint) is translated to equality of all corresponding binary variables:
$$z_j = z_k \quad \text{for all } j, k \in G.$$
Bertsimas and King (2015) point out that, in some situations, the modeler knows the true importance of a particular covariate, be it some business requirement or other more profound expert knowledge. In these cases, the modeler can choose to force the inclusion of a particular covariate. This might be necessary when the automated covariate selection process decides to discard the covariate due to other global or group sparsity constraints. To achieve this inclusion, the following constraint can be used:
$$z_j = 1.$$

Bounded domains for predictors
In certain applications, it is of interest to set constraints (bounds) on the value of particular regression coefficients. For example, McDonald and Diamond (1990) consider the problem of finding maximum likelihood estimates of a generalized linear model when some or all regression parameters are constrained to be non-negative and motivate their approach with applications in cancer death prediction and demography. Similarly, Slawski and Hein (2013) propose the use of non-negative least squares in high-dimensional linear models. They show that this can have similar effects to explicit regularization like LASSO (Tibshirani 1996) or ridge regression (Hoerl and Kennard 1970) regarding predictive power and sparsity when mild conditions on the design matrix are met. Moreover, there exist problems where the sign of the effect of a covariate is known beforehand. This allows the restriction of coefficient domains to only non-negative or only non-positive values (Carrizosa, Olivares-Nadal, and Ramírez-Cobo 2020). In our conic modeling approach, for a coefficient $\beta_j$, we can add bounds (sometimes also called box constraints) $l_j$ and $u_j$ by the following constraint:
$$l_j \leq \beta_j \leq u_j.$$

Linear constraints
As Lawson and Hanson (1995) point out, it might also be desirable to enforce linear constraints of the type $L\beta \leq c$ for certain predictors. Such constraints might arise in applied mathematics, physics and economics and usually convey additional information about a problem. Adding additional convex constraints to a problem will only tighten the set of solutions. One still has to be careful not to add constraints that make the underlying optimization problem infeasible. If any infeasibility arises, the underlying optimization solver will detect it.
Note that such linear constraints can also be imposed on the binary variables. For example, constraints on the binary variables can be used to model if-then types of relations. As an example, consider a constraint of the type "if the $j$-th predictor is included in the model then the $k$-th predictor should also be included in the model". This would translate to the constraint $z_j \leq z_k$. Note also that the global and group sparsity constraints introduced above are special cases of linear constraints on the binary variables.
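Such an if-then restriction could be sketched with the package's linear() constructor applied to the binary variables (the exact argument format of linear() is an assumption here; consult the holiglm manual):

```r
# Sketch: "if disp is included then wt must be included", i.e. z_disp <= z_wt,
# expressed as z_disp - z_wt <= 0 on the binary selection variables.
# The signature of linear() used below is an assumption.
library(holiglm)
L <- c(disp = 1, wt = -1)
fit <- hglm(mpg ~ cyl + disp + hp + wt, data = mtcars,
            constraints = c(k_max(2), linear(L, "<=", 0, on_big_m = TRUE)))
coef(fit)
```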

Sign coherence constraint
Sign coherence can be a desirable model property which improves interpretability, especially in the presence of highly correlated predictors. Carrizosa et al. (2020) suggest using a sign coherence constraint instead of the group sparsity constraint to restrict pairwise multicollinearity. They argue that enforcing sign coherence (forcing strongly positively correlated variables to have the same sign and strongly negatively correlated variables to have opposite signs) is less restrictive than using a group sparsity constraint, since it allows strongly correlated variables to be jointly in the model without lowering interpretability due to inconsistent signs. Without this restriction, inconsistent signs of strongly correlated variables are often caused by compensating coefficients (Hahs-Vaughn 2016). Given the set $PC$ of pairs of indices of highly correlated predictors, we can enforce equal signs of their respective coefficients by adding, for each positively correlated pair $(j, k) \in PC$, the constraints
$$-M(1 - u_{jk}) \leq \beta_j \leq M u_{jk}, \qquad -M(1 - u_{jk}) \leq \beta_k \leq M u_{jk}$$
(with the bounds on $\beta_k$ mirrored for negatively correlated pairs), where again $M$ is a large enough constant and $u_{jk}$ is a newly introduced binary variable. One can see that for $u_{jk} = 0$ and $\rho_{jk} > 0$ we have $-M \leq \beta_j, \beta_k \leq 0$, while for $u_{jk} = 1$, we have $0 \leq \beta_j, \beta_k \leq M$ for an arbitrary pair $(j, k)$ in $PC$.
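In the package, this constraint is available via pairwise_sign_coherence() (see Table 4). A minimal sketch, assuming the package is installed:

```r
# Sketch: instead of excluding one predictor of each highly correlated pair,
# force coefficients of pairs with |correlation| > 0.8 to be sign-coherent.
library(holiglm)
fit <- hglm(mpg ~ cyl + disp + hp + wt, data = mtcars,
            constraints = pairwise_sign_coherence(0.8))
coef(fit)
```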

The holiglm package
Package holiglm allows one to reliably and conveniently fit generalized linear models under constraints. We aimed to allow for as many family-link combinations and constraints as possible without reducing the reliability of the solution. To accomplish these goals, the package uses state-of-the-art mixed-integer (conic) solvers. Using conic optimization, we can reliably solve convex non-linear mixed-integer problems, given that there exists a combination of cones that can express the non-linear problem at hand. Luckily, the (log-)likelihood of most GLMs can be expressed by combinations of the linear (non-negative) cone, the second-order cone and the exponential cone. Table 2 gives an overview of the cones used to express specific GLMs. Currently, the solvers ECOS (Domahidi, Chu, and Boyd 2013), MOSEK (MOSEK ApS 2022b), CPLEX (IBM ILOG 2022), GUROBI (Gurobi Optimization LLC 2022) and the NEOS (Czyzyk, Mesnier, and Moré 1998) Server via ROI.plugin.neos (Hochreiter and Schwendinger 2020) can be used if the model contains mixed-integer constraints. Table 3 provides additional information on the solver licenses and on the different types of cones supported by these solvers. All the solvers listed in Table 3 internally aim to prove the optimality of the solution by checking criteria based on the Karush-Kuhn-Tucker optimality conditions. Consequently, if these solvers signal optimality, the user can be certain that the maximum likelihood estimate (MLE) was obtained, except for the special case where the MLE does not exist. In theory, these solvers should be able to provide a certificate of unboundedness if the MLE does not exist; however, in practice this is often not the case for the exponential cone. Fortunately, package detectseparation (Kosmidis, Schumacher, and Schwendinger 2022) can be used to verify the existence of the MLE before the estimation. More information on conic optimization in general can be found in Boyd and Vandenberghe (2004).
In R, the packages ROI and CVXR (Fu, Narasimhan, and Boyd 2020) provide access to multiple conic solvers. The CVXR package provides a domain-specific language (DSL) which allows formulating the optimization problems close to their mathematical formulation. To accomplish this, it first translates the problem into an abstract syntax tree and uses disciplined convex programming (DCP, Grant, Boyd, and Ye 2006) to verify that the problems are indeed convex. After the convexity is verified, the problem is transformed into its matrix representation and dispatched to the selected solver. Here it is important to note that the DCP rules are only sufficient for convexity, meaning that even if the DCP rules cannot verify convexity, the problem can still be convex. Based on these properties, from our perspective the CVXR package is especially well-suited for users unfamiliar with conic optimization or for fast prototyping. The ROI package, on the other hand, offers unified access to many solvers, with a simple modeling language designed for users already familiar with R. For the holiglm package, we chose to rely on the infrastructure of package ROI since we already know the convexity properties of the likelihood functions and we want to be able to control the transformation of the likelihood into the optimization problem. Additionally, ROI has fewer dependencies and, last but not least, the authors are familiar with package ROI.

Model fitting
Function hglm() is the main function for fitting HGLMs within package holiglm. Its main arguments are the following:

constraints: The constraints imposed on the GLM. All constraints are of class "hglmc", so the constraints argument expects a list of "hglmc" constraints.

scaler: Especially for constraints which rely on the big-M formulation, scaling is very important, as it helps to make a reliable choice for the big-M value.

scale_response: Whether the response shall be standardized. Scaling the response is only used for family gaussian(), where the default is also to scale it.

big_m: An upper bound M for the coefficients, needed for the big-M constraint.

solver: By default, the best available optimization solver is chosen automatically. However, it is possible to force the use of a particular solver by providing the solver name. The argument is then passed along to ROI::ROI_solve().

control: The control argument can be used to pass additional arguments along to ROI::ROI_solve().

dry_run: The dry_run argument allows the user to obtain the underlying optimization problem. This can be useful if the user wants to impose additional constraints, which are currently not implemented, directly on the ROI optimization object.

object_size: The object_size argument controls whether additional information, such as the ROI solution as well as the hglm_model, is stored in the fitted object.
By making use of the formula system provided by the stats package, we ensure that all the typical operations work as in stats::glm. We illustrate the usage of hglm without constraints for the log-binomial regression model. This particular family-link combination was chosen as a first example as it can be shown (by simulation) that using conic programming instead of the iteratively reweighted least squares (IRLS) employed by glm is not only more reliable but potentially faster (Schwendinger et al. 2021).

R> summary(fit)
Call: hglm(formula = cbind(n1, n0) ~ RISK + NPLAN + ANTIB, binomial(link = "log"),
    data = Caesarian, constraints = NULL, solver = "ecos")

In the above, the number of iterations is the one needed by the solver to reach convergence. Note that the summary function also returns the typical coefficients table which contains information on the standard errors of the coefficients, the z score and the p value corresponding to the hypothesis test $H_0: \beta_j = 0$ vs. $H_1: \beta_j \neq 0$. The user should employ these values with care, as in the case of constrained models, the usual asymptotic theory does not necessarily hold. More information is provided in Section 4.4. Instead of directly using function hglm for estimating the model, it is also possible to take a different approach, especially if the modeler wants to access the underlying optimization problem directly:

1. Generate the model with hglm_model.
2. Generate the underlying optimization problem with as.OP.

Constraints
The most distinctive feature of the holiglm package is that it allows for the usage of many predefined constraints. The constraints were introduced in Section 3; in this section we focus on their usage in R.

linear()^b (Linear constraint): Imposes linear constraints on the coefficients or on the z variables from the big-M constraint.

group_equal(vars)^b (Linear constraint): Forces all predictors in the specified group to have the same coefficient.

sign_coherence(vars)^a (Sign coherence): Ensures that the coefficients of the specified predictors have the same sign.

pairwise_sign_coherence(rho)^a (Sign coherence): Ensures that coefficients of predictors that exhibit pairwise correlation more than rho (in absolute value) have the same sign.

Table 4: Overview of the pre-implemented constraints. Here only the main arguments are shown; all arguments can be found in the holiglm manual. The constraints marked with a are of mixed-integer type, while the ones marked with b are of linear type.

Combining constraints
Constraints can be combined by the combine operator c(). In the following example, we ensure that at most 5 predictors are in the final model and that the pairwise correlation of the active variables is less than or equal to 0.8 by combining the two constraints:

R> c(k_max(5), rho_max(0.8))
List of Holistic GLM Sparsity Constraints

Global sparsity
The global sparsity constraint k_max can be used to enforce an upper bound on the number of non-zero predictors contained in the model. The upper bound is enforced on the number of columns in the model matrix without taking into account the intercept.
1.319768    0.000000   -1.774368

As one can see by looking at the coefficients, in contrast to other sparsity-inducing methods, the coefficients are not just close to zero but exactly zero if not included in the model. The information about the active coefficients can be obtained by using the active_coefficients method or the coef function with type = "selected".

Group sparsity
The group sparsity constraint can be used to restrict the number of predictors to be included from a particular subgroup. The vars argument is a vector containing the names of the variables for which the constraint should be applied. Note that the names of the variables should correspond to the column names of the model matrix. As previously mentioned, a possible use case for this type of constraint is the selection of the best out of different transformations. For illustration, we employ a Poisson regression with log link to model the number of apprentices moving to Edinburgh from different counties. Data is available in package GLMsData (Dunn and Smyth 2018).
R> data("apprentice", package = "GLMsData")

We specify the following model with group constraints. For the variables distance Dist and population Pop also the log transformations are considered for inclusion in the model. Indeed, the logarithmically transformed variables are selected for inclusion in the model.
In-out constraints implemented in group_inout force the coefficients of all variables in argument vars to be jointly zero or different from zero. An application for this type of constraint is dummy-encoded categorical features, where either all dummy variables should be included in the model or none. The following Poisson regression model is restricted to include three predictors, whereas all dummies corresponding to the variable Locn (location relative to Edinburgh) should be jointly included in or excluded from the model:

R> fo <- Apps ~ log(Dist) + log(Pop) + Urban + Locn
R> constraints <- c(k_max(3), group_inout(c("LocnSouth", "LocnWest")))
R> fit <- hglm(fo, constraints = constraints,
+              family = poisson(), data = apprentice)

To restrict the pairwise collinearity, we provide the constraint rho_max which sets an upper bound on the pairwise collinearity. For the mtcars data set in R we specify the following linear model, where the empirical correlation of the variables cyl and disp is more than 0.9:

R> fit <- hglm(mpg ~ cyl + disp + hp + drat + wt,
+              data = mtcars, constraints = rho_max(0.9))
R> coef(fit)
(Intercept)         cyl        disp          hp        drat          wt
34.49587578 -0.76229201  0.00000000 -0.02088543  0.81771262 -2.97331332

The variable disp is excluded from the model.

Modeler expertise
The function include can be used to force the inclusion of a specific variable in the model. This can be useful if a modeler is interested in setting an upper bound on the number of active variables while ensuring that a certain variable stays in the model. For example, the following model on the Caesarian data set will include two variables and one of them will be NPLAN:
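A hedged sketch of such a call, using the Caesarian data as in the earlier examples (and assuming the holiglm package is installed):

```r
# Sketch: select at most 2 predictors while forcing NPLAN into the model.
library(holiglm)
fit <- hglm(cbind(n1, n0) ~ RISK + NPLAN + ANTIB, binomial(link = "log"),
            constraints = c(k_max(2), include("NPLAN")),
            data = Caesarian)
coef(fit)   # NPLAN is guaranteed to be among the active variables
```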

Bounded domains
The functions lower and upper can be used to set lower and upper bounds on the values of the coefficients. This can be used, for example, to impose non-negativity constraints or any other box constraints on the coefficients.
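As an illustrative sketch, non-negativity of selected coefficients could be imposed as below; note that the assumption that lower() accepts a named vector of bounds is ours, and the exact signature may differ.
R> fit <- hglm(mpg ~ cyl + hp + wt, data = mtcars,
+             constraints = lower(c(hp = 0, wt = 0)))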
The standard errors are corrected as described in the vignettes.
R> coef(fit)
To impose both constraints at the same time, we can either combine the linear constraints or use the matrix notation of function linear.
R> coef(fit)
A special case of linear constraint is group_equal, which ensures that the coefficients of the variables in vars are equal:
R> fit <- hglm(cbind(n1, n0) ~ RISK + NPLAN + ANTIB, binomial(link = "log"),
+             constraints = group_equal(c("RISK", "NPLAN", "ANTIB")),
+             data = Caesarian)
R> coef(fit)
(Intercept)        RISK       NPLAN       ANTIB
 -0.9710750  -0.1750262  -0.1750262  -0.1750262
Finally, we showcase how linear constraints can be imposed on the binary inclusion variables z by setting on_big_m = TRUE in function linear. This can be useful for modeling restrictions of the type "if variable j is included, variable k should also be included". An application of this type of constraint is the estimation of models with polynomial features, where, if a higher order of the polynomial is included in the model, it is desirable that all lower orders are also included. The following linear model example uses the data set heatcap from package GLMsData, where the relation between the heat capacity of a type of acid and temperature is investigated.
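A hypothetical construction of the corresponding constraint matrix is sketched below; the column names of the polynomial terms and the exact form of the matrix accepted by linear() are assumptions on our part.
R> data("heatcap", package = "GLMsData")
R> fo <- Cp ~ poly(Temp, 5, raw = TRUE)
R> nms <- paste0("poly(Temp, 5, raw = TRUE)", 1:5)
R> # one row per implication: z_d - z_{d-1} <= 0 for d = 2, ..., 5
R> L <- matrix(0, 4, 5, dimnames = list(NULL, nms))
R> for (d in 2:5) { L[d - 1, nms[d]] <- 1; L[d - 1, nms[d - 1]] <- -1 }
R> fit <- hglm(fo, data = heatcap,
+             constraints = linear(L, rep("<=", 4), rep(0, 4), on_big_m = TRUE))
Each row of L encodes that the inclusion indicator of a higher polynomial degree cannot exceed that of the next lower degree.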

Sign coherence constraint
We estimate a Poisson model with identity link for the apprentice data, where we use interactions of the location with the other predictors. In this case it can be desirable that, for each covariate, the slopes for all locations have the same sign. In the resulting fit, the slope parameters for distance are all negative, while the slope parameters for population and degree of urbanization are all positive.
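A sketch of such a specification is given below; the interaction column names and the constraint function sign_coherence() are assumptions based on the dummy names used earlier in this section.
R> fo <- Apps ~ Locn + Locn:Dist + Locn:Pop + Locn:Urban
R> fit <- hglm(fo, family = poisson(link = "identity"), data = apprentice,
+             constraints = sign_coherence(c("LocnSouth:Dist", "LocnWest:Dist")))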
Finally, we showcase on the mtcars data set how highly correlated predictors can be restricted to have the same sign using pairwise_sign_coherence:
R> fit <- hglm(formula = mpg ~ cyl + disp + hp + drat + wt,
+             data = mtcars, constraints = pairwise_sign_coherence(0.9))
R> coef(fit)
  (Intercept)           cyl          disp            hp          drat            wt
 3.449560e+01 -7.622500e-01 -1.535748e-08 -2.088604e-02  8.177559e-01 -2.973326e+00
Unlike in the unrestricted model, the variables cyl and disp now both have a negative sign. Note also that, in this example, the coefficient of disp is close to zero, which is in line with the result obtained with the rho_max constraint.

Estimate a sequence of models
Function hglm_seq can be used to estimate the sequence of models containing k = 1, ..., p predictors. The print method for the resulting "hglm_seq" object includes the AIC, BIC, EBIC_γ (Chen and Chen 2008) and SIC (Zhu et al. 2020) as well as the vector of coefficients for each model. For the polynomial regression example introduced before, we can estimate the sequence of models:
R> fit_seq <- hglm_seq(formula = Cp ~ poly(Temp, 5), data = heatcap)
We observe that the model with polynomial features up to degree 4 is the one preferred by AIC, BIC and EBIC_1.

Group duplicates
For binomial regression models in which all predictors are categorical or discrete variables with few possible values, it is often the case that the model matrix contains duplicated rows. In this case, it is convenient to aggregate the original data into a data frame containing the unique values of the predictors in the rows and, for each row, the number of successes and failures of the response. Function agg_binomial can be used for this purpose. For the Heart data set the number of rows reduces from originally 16949 to 74:
R> data("Heart", package = "lbreg")
R> heart <- agg_binomial(Heart ~ ., data = Heart, as_list = FALSE)
R> c(nrow(Heart), nrow(heart))
[1] 16949    74
This function should be used whenever possible, as it drastically reduces the size of the underlying optimization problem and therefore the solving time. After the reduction, the model can be fitted as follows:
R> hglm(cbind(success, failure) ~ ., binomial(link = "log"),
+      data = heart, constraints = NULL)

A note on the output of the summary function
The summary function in holiglm returns information on the standard errors of the coefficients, the associated z scores and the p values corresponding to the hypothesis test H0: β_j = 0 vs. H1: β_j ≠ 0 using the normal approximation. However, in the case of constrained models, the usual asymptotic theory does not necessarily hold and the p values can be misleading when testing for significance.
We use the following strategy for computing the standard errors of the regression coefficients in holiglm:
a. For the case of sparsity constraints, the standard errors are computed ex post using the Fisher information matrix only for the non-zero coefficients. While this practice is commonly used for model selection procedures, the reader should note that it does not take into account the uncertainty of the selection process itself. The issue of assessing significance and effect sizes on a data set after the same data set has been used to identify the relevant predictors is also referred to as selective inference (Taylor and Tibshirani 2015). A survey of strategies which can be used to mitigate this problem is presented in Zhang, Khalili, and Asgharian (2022a). Possible approaches include adjusting the p values (see e.g., Taylor and Tibshirani 2015; Lee, Sun, Sun, and Taylor 2016), using a randomized response (Tian and Taylor 2018) or employing different kinds of bootstrap strategies.
b. For all other constraint types we apply a correction to the standard errors of the coefficients. More specifically, the variance-covariance matrix of the regression coefficients is computed as the corresponding portion of the covariance matrix of the parameters, Lagrange multipliers and all auxiliary parameters of the optimization problem of the augmented likelihood (see e.g., Schoenberg 1997, for computational details).
Note that the approach in b. does not distinguish between binding inequality constraints and equality constraints; for inference, however, the difference is relevant. In the presence of binding inequality constraints the normal approximation may no longer be valid and confidence intervals obtained from the usual t statistics can be misleading (Silvapulle and Sen 2005). For more information on constrained statistical inference we refer the readers to the monographs by Robertson, Wright, and Dykstra (1988) and Silvapulle and Sen (2005), which provide a broad overview of inference under inequality constraints. In such cases, alternative strategies should be employed to derive confidence intervals for the parameters of interest (such as the nonparametric or residual bootstrap). For example, Self and Liang (1987) derive large sample properties of the likelihood in the case where the true value lies on the boundary of the parameter space, while Grömping (2010) implements methods for performing inference under linear equality and inequality constraints in normal models in the ic.infer R package.
In addition to performing the correction of the standard errors, the holiglm package returns a warning whenever the employed inequality constraints are binding. This warning informs the user to proceed with caution when interpreting the output.

Comparison with existing R packages
In this section, we showcase the versatility of our package holiglm in comparison with existing R packages for fitting GLMs under linear and/or sparsity constraints.
Table 5 gives an overview of which constraints are supported by the evaluated packages holiglm, abess, ConsReg, bestglm and restriktor.
Table 6: Overview of family-link combinations supported by R packages for fitting constrained GLMs. A bracketed check mark (✓) indicates that starting values are needed to find valid solutions to constrained problems.
Table 6 displays, for each package, which family-link combinations can be fitted, while also making a distinction between software that is able to estimate the models directly and software that needs additional information such as, e.g., starting values.
In the following, we aim to investigate by means of simulation studies how holiglm compares to other packages in terms of runtime and reliability of the solution for the best subset selection problem and for non-negative GLMs. Finally, we also investigate the memory consumption of the models fitted with holiglm and compare it to that of stats::glm() for an unconstrained problem. It should be noted that, for the various log-likelihoods, the time for solving the underlying optimization problems with holiglm depends heavily on their conic representations and on the solver used. As outlined in Table 1, these representations include the second-order cone as well as the primal exponential cone for linear, logistic and Poisson regression.
In setting up the exercises, we follow a simulation setting similar to the one in Hofmann, Gatu, Kontoghiorghes, Colubi, and Zeileis (2020). To generate the predictor matrix X ∈ R^{n×(p+1)} (including an intercept term), we sample n observations from a p-dimensional multivariate normal distribution with zero mean and covariance matrix Σ := (δ_ij), where δ_ij denotes the Kronecker delta (i.e., Σ is the identity matrix). Given p features, we choose the first q = ⌈p/2⌉ as true features and set their respective coefficients to 1. Additionally, we set the intercept term to 1. The remaining coefficients are set to 0. Therefore, we have µ_i = g^{-1}(x_i^⊤ β), where g is the link function. For the different distributions of the response, we simulate y_i ~ Normal(µ_i, 1), y_i ~ Binomial(1, µ_i) and y_i ~ Poisson(µ_i), respectively.
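The data generating process can be sketched in base R as follows; the variable names are ours, and we show the Gaussian case with identity link.
R> set.seed(1)
R> n <- 1000; p <- 20; q <- ceiling(p / 2)
R> X <- cbind(1, matrix(rnorm(n * p), n, p))   # Sigma is the identity matrix
R> beta <- c(1, rep(1, q), rep(0, p - q))      # intercept and true features set to 1
R> mu <- drop(X %*% beta)                      # identity link: mu_i = x_i' beta
R> y <- rnorm(n, mean = mu, sd = 1)
For the binomial and Poisson responses, mu would instead be passed through the respective inverse link before sampling with rbinom() or rpois().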
In the different exercises, we use different n and p configurations. For each configuration, we carry out 10 runs leading to 10 different data sets, for which we report the mean runtimes together with the standard deviation of the single runs. All computational experiments were conducted on a Dell XPS 15 laptop with 32 GB of RAM and an Intel Core i7-8750H CPU @ 2.20GHz × 12 processor. As optimization solvers in holiglm we used GUROBI 9.1.2 (Gurobi Optimization LLC 2022), ECOS 2.0.5 (Domahidi et al. 2013) and MOSEK 10.0.34 (MOSEK ApS 2022b). For all examined examples, the open-source ECOS solver can be used, which is also the default solver in holiglm. However, regarding runtimes, we also included GUROBI (for linear regression) and MOSEK (for logistic and Poisson regression) to show which runtimes are achievable with commercial solvers.

Fitting GLMs under global sparsity constraints
For the first study, we fit GLMs under global sparsity constraints, leading to the best subset selection problem. In this setting, we compare holiglm with the R packages abess (Zhu et al. 2022) and bestglm (McLeod, Xu, and Lai 2020).
We are interested in finding models with minimal AIC. For a fixed support size, the AIC is monotonic in the log-likelihood, that is, AIC(β) = −2ℓ(β) + 2|β|, where |β| denotes the number of non-zero coefficients in β. In holiglm, we estimate the sequence of models with global sparsity constraints to find the model with minimal AIC. Out of this sequence, containing the models with maximum log-likelihood for each support size, we select the model with minimal AIC. In total, this leads to solving p mixed-integer conic optimization problems.
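The selection step over the sequence can be sketched as follows; the helper name is ours, and we assume each fitted model in the sequence supports the generic stats::AIC, as glm-like fits do.
R> # pick the minimal-AIC fit from a list of per-support-size fits
R> best_fit <- function(fits) fits[[which.min(vapply(fits, AIC, numeric(1)))]]
Because the AIC is monotonic in the log-likelihood for fixed |β|, the per-support-size maximum-likelihood fits are sufficient for finding the global AIC minimum.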
We consider different data configurations with n = 1000 observations and varying numbers of p = 5, 10, 15, 20, 25, 30, 35, 40 features. For logistic and Poisson regression, and for linear regression with ECOS as solver, we only benchmark up to 20 features in the best subset selection study due to long solver runtimes. Moreover, since bestglm uses exhaustive enumeration for non-Gaussian GLMs, it is limited to data sets containing 15 features or fewer.
The runtime results for the best subset selection problem in our first simulation are presented in Table 7 and Figures 4 to 6. We observe that in this scenario abess is orders of magnitude faster than the compared packages. In terms of scaling, Table 7 indicates that, for linear and logistic regression, holiglm scales better than bestglm with increasing dimension.
We compare the different packages not only with respect to runtime but also based on their accuracy in finding the global AIC minimum. To account for numerical rounding errors, we consider the selection of the model coefficients with the lowest AIC as a success. Table 8 shows that holiglm (with GUROBI respectively MOSEK as solver) and bestglm are always able to find the model with the minimal AIC. As a result, they solve the best subset selection problem exactly with regard to the information criterion.

Fitting GLMs under non-negativity constraints
The second study deals with estimating non-negative GLMs, i.e., finding models under linear constraints, specifically under non-negativity constraints on the coefficients. We again consider different data configurations with n = 1000 observations and varying numbers of p = 5, 10, 15, 20, 25, 30, 35, 40 features. Here we compare holiglm with the packages ConsReg and restriktor.
In the case of linear constraints, we encountered the problem that, for ConsReg, the default optimizer solnp returns results where coefficients can be negative, contrary to the specified constraints. Package ConsReg relies on non-linear general-purpose optimization solvers. This has two main disadvantages: • Starting values need to be provided, which must reside in the interior of the feasible region.
• General-purpose solvers are less reliable than conic optimization solvers in the sense that they are more likely to find local rather than global optima.
Hence, a general-purpose solver may signal success although the returned solution is not a global optimum. Since conic solvers internally verify the solution using the Karush-Kuhn-Tucker conditions, they are unlikely to signal success without obtaining the global optimum. We also tried using a vector of zeros as the initial solution (which lies on the boundary of the feasible region), but this also led to solutions with negative coefficients. To mitigate this problem, we first find a solution to the unconstrained optimization problem by fitting a full model. We set the negative coefficients of this full model to zero and use the result as the initial solution for solnp. This procedure increased the number of times ConsReg could find a valid solution satisfying the non-negativity constraints. The time for fitting the full model is included in the reported runtimes of package ConsReg.
The results presented in Table 9 and Figure 7 show that, when 10 or more features are used, hglm with GUROBI as optimization solver is faster than the other methods for fitting non-negative GLMs with Gaussian responses. Moreover, the timings imply that, in this case, holiglm scales better with higher dimensions. Table 10 displays the number of times each method obtained a valid solution; it is worth noting that, for linear regression, all methods were able to fit non-negative regression models.
For the case of logistic regression, both Table 9 and Figure 8 show that holiglm is faster than the other packages for higher dimensions (i.e., 20 or more covariates). This holds for both conic solvers ECOS and MOSEK, and we observe better scaling behavior as the dimension increases. However, unlike in the linear regression case, there are instances where ConsReg cannot find a valid solution, as indicated in Table 10.
Table 9 summarizes the runtimes for Poisson regression, and Figure 9 shows them visually.
Using ECOS as solver, the runtimes of ConsReg and holiglm are similar for lower dimensions, but ConsReg becomes faster once we reach 30 or more dimensions. However, Table 10 indicates that ConsReg does not always find valid non-negative Poisson regression models. The most efficient combination appears to be hglm with MOSEK as solver. Moreover, the benchmarks indicate that this combination also has the best scaling behavior in the Poisson regression case.
In all evaluated scenarios restriktor is the slowest at fitting non-negative GLMs.

Memory usage in unconstrained logistic regression
Finally, we take a look at the memory used by model objects produced by holiglm for different data sizes in an unconstrained logistic regression. Both fitting functions, hglm and hglm_fit, provide the argument object_size, which can take the values "normal" and "big". Notably, the "big" object is primarily intended for development purposes and includes additional components such as the hglm_model and the ROI optimization problem. Figure 1 compares the object.size of hglm(object_size = "normal"), hglm(object_size = "big") and glm objects for logistic regression. To provide a comprehensive evaluation, we vary the number of observations (n = 100, 1200, 2300, 3400, 4500, 5600, 6700, 7800, 8900, 10000) and features (p = 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60). Our results show that hglm objects with object_size = "big" occupy more memory than their object_size = "normal" counterparts, while objects with object_size = "normal" occupy slightly less memory than comparable glm objects.
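The comparison can be reproduced in spirit as follows; this is a sketch assuming hglm() accepts the object_size argument as described, using the mtcars data for brevity.
R> f <- am ~ mpg + hp + wt
R> fit_normal <- hglm(f, family = binomial(), data = mtcars, object_size = "normal")
R> fit_big    <- hglm(f, family = binomial(), data = mtcars, object_size = "big")
R> fit_glm    <- glm(f, family = binomial(), data = mtcars)
R> sapply(list(normal = fit_normal, big = fit_big, glm = fit_glm), object.size)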

Use cases
This section illustrates the usage of the holiglm package on more realistic real-world examples.

Fairness in logistic regression
We illustrate how the holiglm package can be employed for estimating the coefficients of a logistic regression model with the fairness constraints proposed in Zafar, Valera, Gomez-Rodriguez, and Gummadi (2019). In the fairness literature, one is typically interested in building classification models free of so-called disparate impact. A decision-making process suffers from disparate impact if it grants a disproportionately large fraction of beneficial (or positive) outcomes to certain groups of a sensitive feature, such as gender or race. Assuming a binary response variable y and a binary sensitive feature w, a binary classifier is free of disparate impact if the probability that the classifier assigns an observation to the positive class is the same for both values of the sensitive feature w: P(ŷ = 1 | w = 0) = P(ŷ = 1 | w = 1). If w has more than two classes, this relation should hold for all values of w. Note that, even if the sensitive variable w is not included in the model, the classifier can still suffer from disparate impact (mainly due to the association between the non-sensitive variables x included in the model and the sensitive variable).
However, for many classifiers these probabilities are non-convex functions of the parameters, which complicates estimation. To provide a more general framework, Zafar et al. (2019) propose a covariance measure of decision boundary unfairness. This is given by the covariance between a sensitive binary attribute w and the signed distance from the (non-sensitive) feature vectors x_i to the decision boundary, which for a linear boundary is proportional to

Cov(w, x^⊤ β) ≈ (1/N) ∑_{i=1}^N (w_i − w̄) x_i^⊤ β.   (5)

The framework can also accommodate several binary sensitive variables w^(k), k = 1, ..., K, where a limit is imposed on the above covariance measure for each of the K sensitive variables.
For the logistic regression problem, where the signed distance to the decision boundary is proportional to x_i^⊤ β, the fairness constrained model is given by:

maximize_β ℓ(β) subject to |(1/N) ∑_{i=1}^N (w_i^(k) − w̄^(k)) x_i^⊤ β| ≤ c_k, k = 1, ..., K,   (6)

where c_k ∈ R_+ for k = 1, ..., K is a given constant which trades off accuracy and decision boundary unfairness.
As an illustrative example we employ the AdultUCI data set in package arules (Hahsler, Buchta, Grün, and Hornik 2021), which contains data extracted from the 1994 U.S. census bureau database.The response variable in this data set is the binary variable income which indicates whether the income of the respondents exceeds USD 50 000 per year.
R> data("AdultUCI", package = "arules")
Before estimation, we eliminate all observations with missing values:
R> AdultUCI <- na.omit(AdultUCI)
R> dim(AdultUCI)
[1] 30162    15
The data is rather unbalanced in terms of the distribution of the response. We will therefore consider both race and gender as sensitive variables in our use case. We create one binary (i.e., dummy) variable w^(k) for each factor level combination.
R> W <- model.matrix(~ 0 + sex:race, data = AdultUCI)
We consider the following predictive model for the analysis (note that the sensitive variables are not included in the model formula):
R> form <- "income ~ age + relationship + `marital-status` + workclass +
+   `education-num` + occupation + I(`capital-gain` - `capital-loss`) +
+   `hours-per-week`"
The variable native-country is not included in the model, as it was eliminated in a preliminary stepwise selection procedure.
In order to assess the trade-off between fairness and prediction accuracy, we follow Zafar et al. (2019) and choose a grid of values α k ∈ [0, 1] which are used to compute c k = α k s (k) .
Here s^(k) is the empirical covariance (in absolute value) between the sensitive variable w^(k) and the linear predictor x_i^⊤ β from the unconstrained regression; s^(k) serves as an upper bound on the disparate impact covariance measure. A value α_k = 0 represents the case where the empirical covariance in Equation 5 is restricted to zero, while α_k = 1 represents the case with no fairness constraints. We first estimate the unconstrained model using the stats::glm function and then calculate s^(k):
R> m0 <- glm(as.formula(form), family = binomial(), data = AdultUCI)
R> s <- apply(W, 2, function(w) abs(cov(w, m0$linear.predictors)))
The left-hand side of the constraints from Equation 6 can be specified in matrix form Lβ, where L is constructed by:
R> xm <- model.matrix(m0)
R> L <- t(apply(W, 2, function(w) colMeans((w - mean(w)) * xm)))
For a grid of α_k values, we estimate the fairness constrained logistic regressions. We set the big_m argument to 5, which means that the standardized coefficients (see scaler = "center_standardization") are constrained to be at most 5 in absolute value. A value of 5 for big_m is small enough to ensure fast convergence of the algorithm and, at the same time, sufficiently large to be non-restrictive for the problem at hand (in none of the estimated models was this bound reached). For each model we store the in-sample predictions, which will be used for assessing the accuracy versus fairness trade-off.
R> library("holiglm")
R> K <- nrow(L)
R> alpha <- seq(0, 1, by = 0.05)
R> pred_prob_race_sex <- sapply(alpha, function(ak) {
+     ck <- ak * s
+     model_constrained <- hglm(as.formula(form),
+         family = binomial(), data = AdultUCI,
+         scaler = "center_standardization", big_m = 5,
+         constraints = c(linear(L, rep(">=", K), -ck),
+                         linear(L, rep("<=", K), ck)))
+     phat <- predict(model_constrained, type = "response")
+     phat
+ })
For each model we compute the in-sample accuracy and the area under the ROC curve (using package pROC, Robin et al.
2011). Note that we consider a cut-off probability threshold of 0.5, i.e., a label of ŷ = 1 is assigned if the predicted probability exceeds 50%. To assess the disparate impact of the different models we use the measure DI = P(ŷ = 1 | w = 0) / P(ŷ = 1 | w = 1).
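Given the stored prediction matrix, the DI computation can be sketched as follows; the column name of the privileged-group indicator in W is a hypothetical assumption on our part.
R> w <- W[, "sexMale:raceWhite"]   # hypothetical column name
R> di <- apply(pred_prob_race_sex, 2, function(phat) {
+     yhat <- as.integer(phat > 0.5)
+     mean(yhat[w == 0]) / mean(yhat[w == 1])
+ })
Each entry of di corresponds to one value of alpha, so the fairness versus accuracy trade-off can be traced along the grid.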
We only consider the dummy variable w^(k) corresponding to the White:Male factor combination when computing the DI. The resulting plot can be seen in Figure 2. We observe that the model with α = 0 has a DI measure of around 0.6, so it still exhibits disparate impact in favor of the privileged group, even though the covariance measure is restricted to zero in the model building phase. To get a more accurate picture, one could repeat the analysis by computing the above measures on a test set. The cut-off values for the prediction could also be chosen differently. Finally, one can see that the trade-off between fairness and predictive power is not very pronounced, as the loss in the predictive measures is not dramatic as α_k decreases. Such a trade-off analysis can help modelers and decision-makers in choosing a setting which is acceptable for the use case at hand.

Model selection in log-binomial regression
In the following example we show how the holiglm package can be used for model selection for log-binomial regression.For this example we use the icu data set from the aplore3 package (Braglia 2016).In this application, the goal is to model the status of patients at hospital discharge (i.e., alive versus dead) using a series of patient-specific predictors.
R> data("icu", package = "aplore3")
R> dim(icu)
[1] 200  21
First, we check whether the MLE exists by making use of the detectseparation package. For log-binomial regression the existence criterion differs slightly from that for logistic regression; for more information we refer to Schwendinger et al. (2021).
R> library("detectseparation")
R> icu <- icu[, setdiff(colnames(icu), "id")]
R> glm(sta ~ ., family = binomial("log"), data = icu,
+     method = "detect_separation")
For the purpose of model selection, we estimate a sequence of models of sizes k = 1, ..., 21 using function hglm_seq. In Figure 3 we present, for each model size k, the AIC and BIC of the optimized model together with the selected coefficients (which, for log-binomial regression, can be interpreted in terms of relative risks).
R> fits <- hglm_seq(formula = sta ~ ., family = binomial("log"),
+     data = icu, solver = "ecos")
In terms of AIC, models of size 7-9 are preferred, while the size suggested by BIC is around 3-5. Looking at the estimated set of coefficients for each k can provide additional insight into the model behavior.
For example, the modeler can look at the stability of the magnitude of the coefficients and how they change when the number of predictors decreases.Big changes in the magnitude of the coefficients or sign changes can be an indication of multicollinearity problems within the data.

Conclusion
The holiglm package can be used to fit generalized linear models which are optimal with respect to a set of specified constraints. The package provides different types of constraints which are relevant for statistical modeling purposes, such as: (i) different requirements for sparsity (e.g., setting an upper limit on the number of predictors included in the model globally or from a group of pre-specified variables, or forcing a group of predictors to be jointly included in or excluded from the model), (ii) linear constraints, (iii) constraints on the pairwise correlation among predictors and (iv) constraints on the signs of the coefficients. The interface of the package is very similar to the one provided by stats::glm, making it easy for users to specify the desired GLMs. Using the fact that the most common GLMs can be represented as conic programs, holiglm leverages the tremendous advancements in the field of conic programming to provide efficient estimation of constrained GLMs. The package relies on ROI as the underlying infrastructure. Through the ROI plugins, problems with conic objectives and conic and mixed-integer constraints, such as the ones implemented in holiglm, can be solved reliably within R. Since conic optimization is an active field of research, the runtime of estimating constrained GLMs with the holiglm package will be further reduced as more mature solvers or faster algorithms for conic optimization become available. Future work includes extending the package with further family-link combinations by finding appropriate conic representations (if possible) for, e.g., the binomial family with complementary log-log link, the inverse Gaussian or the gamma families. Package holiglm can be used for automatic model selection. However, we recommend using it complementary to the modeler's expertise, as an exploratory tool which provides guidance on the structure of the final model.

A. Conic formulations
In this section we reformulate the log-likelihoods of the different family-link combinations in terms of Equation 1. Let x_i^⊤ β = η_i. The family-link combinations implemented in holiglm can be expressed using the zero and free cones

K_zero = {0},  K_free = R = K*_zero,

where K* = {y ∈ R^n | y^⊤ x ≥ 0 for all x ∈ K} denotes the dual cone of a closed convex cone K, the linear cone

K_lin = {x ∈ R^n | x ≥ 0},

the second-order cone

K_soc = {(t, x) ∈ R × R^n | ∥x∥_2 ≤ t},

and the primal exponential cone

K_exp = cl{(x, y, z) ∈ R^3 | y e^{x/y} ≤ z, y > 0}.

Identity link
For the identity link we have g^{-1}(η_i) = η_i. Using an auxiliary variable ζ, an inequality of the form ζ ≥ w^2 can be expressed by a second-order cone,

ζ ≥ w^2  ⟺  ((1 + ζ)/2, w, (1 − ζ)/2) ∈ K_soc,

subject to (δ_i, 1, 1 + γ_i) ∈ K_exp for all i ∈ {1, ..., n}.

Log link
For the log link we have: Using Equation 8, the objective can be reformulated as:

Identity link
For the identity link we have: Imposing η i ≥ 0 for all i = 1, . . ., n and using Equation 8, the objective can be reformulated as:

Square root link
For the square root link we have: Using Equations 7 and 8, the objective can be reformulated as:

B. Runtime plots
This appendix contains the beeswarm plots for the runtime benchmarks in Section 5.The data described by these plots along with scripts to generate and plot the results can be found in the replication material of this paper.

Figure 3 :
Figure 3: AIC, BIC and relative risks (exp(β_j)) for the icu data from the log-binomial regression with constraints on the maximum number of predictors k. RR_0 corresponds to the baseline relative risk exp(β_0).

Figure 4 :
Figure 4: Runtimes for best subset selection for linear regression across 10 data sets.

Figure 5 :
Figure 5: Runtimes for best subset selection for logistic regression across 10 data sets.

Table 3 :
Overview of the solvers implemented as plugins in ROI that can handle mixed-integer constraints in combination with either a quadratic objective with linear constraints or a linear objective with linear, second-order and exponential conic constraints.
After the model is set up, the underlying optimization problem can be built using the generic function as.OP exported from ROI and solved using function ROI_solve from ROI. Alternatively, the model can be fitted with hglm_fit, where the solver output is already converted into a form similar to the output of glm.fit.
Table 4 gives an overview of all the available constraints in holiglm.

Table 7 :
Comparison of R packages on best subset selection with respect to runtime. The average runtimes are given in seconds together with the standard deviations (in parentheses) across 10 data sets.

Table 8 :
Comparison of R packages on best subset selection with respect to the number of times the method selected the correct coefficients of the model with minimal AIC for 10 data sets.

Table 9 :
Comparison of R packages on fitting GLMs with linear constraints with respect to runtime. The average runtimes are given in seconds together with the standard deviations (in parentheses) across 10 data sets.

Table 10 :
Comparison of R packages for fitting GLMs with linear constraints with respect to the number of times the method found a valid solution for 10 data sets.
R> table(AdultUCI$income)
Moreover, this data set is known to be skewed in terms of race and gender. Given the sparsely populated race groups Amer-Indian-Eskimo, Asian-Pac-Islander and Other, we merge them into the Other group.
Other approaches which take into account all w^(k)'s are also available. The DI measure considers White:Male as the privileged group in this data set, and a value less than one indicates that the privileged group is favored in the prediction. As a rule of thumb, values greater than 0.8 can be considered acceptable. Figure 2 displays the disparate impact, accuracy and area under the ROC curve for different degrees of fairness given by α ∈ [0, 1], where an α of zero implies a fair model in terms of the specified covariance measure.
Package detectseparation signals that the data has the separation property; however, for log-binomial regression the estimates are still all expected to be finite. Note that for logistic regression the MLE does not exist (i.e., it has infinite components); to verify this, detectseparation can again be used.