CVXR: An R Package for Disciplined Convex Optimization

CVXR is an R package that provides an object-oriented modeling language for convex optimization, similar to CVX, CVXPY, YALMIP, and Convex.jl. It allows the user to formulate convex optimization problems in a natural mathematical syntax rather than the restrictive form required by most solvers. The user specifies an objective and set of constraints by combining constants, variables, and parameters using a library of functions with known mathematical properties. CVXR then applies signed disciplined convex programming (DCP) to verify the problem's convexity. Once verified, the problem is converted into standard conic form using graph implementations and passed to a cone solver such as ECOS or SCS. We demonstrate CVXR's modeling framework with several applications.


Introduction
Optimization plays an important role in fitting many statistical models. Some examples include least squares, ridge and lasso regression, isotonic regression, Huber regression, support vector machines, and sparse inverse covariance estimation. Koenker and Mizera (2014) discuss the role of convex optimization in statistics and provide a survey of packages for solving such problems in R (R Core Team 2018). Our package, CVXR, solves a broad class of convex optimization problems, which includes those noted above as well as many other models and methods in statistics. Similar systems already exist, such as CVX (Grant and Boyd 2014) and YAMLIP (Lofberg 2004) in MATLAB, CVXPY (Diamond and Boyd 2016) in Python, and Convex.jl (Udell, Mohan, Zeng, Hong, Diamond, and Boyd 2014) in Julia. CVXR brings these capabilities to R, providing a domain-specific language (DSL) that allows users to easily formulate and solve new problems for which custom code does not exist. As an illustration, suppose we are given X ∈ R m×n and y ∈ R m , and we want to solve the ordinary least squares (OLS) problem minimize β y − Xβ 2 2 with optimization variable β ∈ R n . This problem has a well-known analytical solution, which can be determined using lm in the default stats package . In CVXR, we can solve for β using the code beta <-Variable(n) obj <-sum((y -X %*% beta)^2) arXiv:1711.07582v3 [stat.CO] 25 Oct 2018 prob <-Problem(Minimize(obj)) result <-solve(prob) The first line declares our variable, the second line forms our objective function, the third line defines the optimization problem, and the last line solves this problem by converting it into a second-order cone program and sending it to one of CVXR's solvers. The results are retrieved with result$value # Optimal objective result$getValue(beta) # Optimal variables result$metrics # Runtime metrics This code runs slower and requires additional set-up at the beginning. So far, it does not look like an improvement on stats::lm. However, suppose we add a constraint to our problem: minimize β y − Xβ 2 2 subject to β j ≤ β j+1 , j = 1, . . . , n − 1.
This is a special case of isotonic regression. Now, we can no longer use stats::lm for the optimization. We would need to find another R package tailored to this type of problem such as nnls or write our own custom solver. With CVXR though, we need only add the constraint as a second argument to the problem: prob <-Problem(Minimize(obj), list(diff(beta) >= 0)) Our new problem definition includes the coefficient constraint, and a call to solve will produce its solution. In addition to the usual results, we can get the dual variables with

result$getDualValue(constraints(prob)[[1]])
This example demonstrates CVXR's chief advantage: flexibility. Users can quickly modify and re-solve a problem, making our package ideal for prototyping new statistical methods. Its syntax is simple and mathematically intuitive. Furthermore, CVXR combines seamlessly with native R code as well as several popular packages, allowing it to be incorporated easily into a larger analytical framework. The user can, for instance, apply resampling techniques like the bootstrap to estimate variability, as we show in Section 3.2.3.
DSLs for convex optimization are already widespread on other application platforms. In R, users have access to the packages listed in the CRAN Task View for Optimization and Mathematical Programming (Theussl and Borchers 2014). Packages like optimx (Nash and Varadhan 2011) and nloptr (Johnson 2008) provide access to a variety of general algorithms, which can handle nonlinear and certain classes of nonconvex problems. CVXR, on the other hand, offers a language to express convex optimization problems using R syntax, along with a tool for analyzing and restructuring them for the solver best suited to their type. ROI (Theußl, Schwendinger, and Hornik 2017) is perhaps the package closest to ours in spirit. It offers an object-oriented framework for defining optimization problems, but still requires users to explicitly identify the type of every objective and constraint, whereas CVXR manages this process automatically.
In the next section, we provide a brief mathematical overview of convex optimization. Interested readers can find a full treatment in Boyd and Vandenberghe (2004). Then we give a series of examples ranging from basic regression models to semidefinite programming, which demonstrate the simplicity of problem construction in CVXR. Finally, we describe the implementation details before concluding. Our package and the example code for this paper are available on CRAN and the official CVXR site.

Disciplined Convex Optimization
The general convex optimization problem is of the form where v ∈ R n is our variable of interest, and A ∈ R m×n and b ∈ R n are constants describing our linear equality constraints. The objective and inequality constraint functions f 0 , . . . , f M are convex, i.e., they are functions f i : R n → R that satisfy for all u, v ∈ R n and θ ∈ [0, 1]. This class of problems arises in a variety of fields, including machine learning and statistics.
A number of efficient algorithms exist for solving convex problems (Wright 1997;Boyd, Parikh, Chu, Peleato, and Eckstein 2011;Andersen, Dahl, Liu, and Vandenberghe 2011;Skajaa and Ye 2015). However, it is unnecessary for the CVXR user to know the operational details of these algorithms. CVXR provides a DSL that allows the user to specify the problem in a natural mathematical syntax. This specification is automatically converted into the standard form ingested by a generic convex solver. See Section 4 for more on this process.
In general, it can be difficult to determine whether an optimization problem is convex. We follow an approach called disciplined convex programming (DCP) (Grant, Boyd, and Ye 2006) to define problems using a library of basic functions (atoms), whose properties like curvature, monotonicity, and sign are known. Adhering to the DCP rule, • g i is convex and f is increasing in argument i, or • g i is concave and f is decreasing in argument i, we combine these atoms such that the resulting problem is convex by construction. Users will need to become familiar with this rule if they wish to define complex problems.
The library of available atoms is provided in the documentation. It covers an extensive array of functions, enabling any user to model and solve a wide variety of sophisticated optimization problems. In the next section, we provide sample code for just a few of these problems, many of which are cumbersome to prototype or solve with other R packages.

Examples
In the following examples, we are given a dataset (x i , y i ) for i = 1, . . . , m, where x i ∈ R n and y i ∈ R. We represent these observations in matrix form as X ∈ R m×n with stacked rows x T i and y ∈ R m . Generally, we assume that m > n.

Robust (Huber) Regression
In Section 1, we saw an example of OLS in CVXR. While least squares is a popular regression model, one of its flaws is its high sensitivity to outliers. A single outlier that falls outside the tails of the normal distribution can drastically alter the resulting coefficients, skewing the fit on the other data points. For a more robust model, we can fit a Huber regression (Huber 1964) instead by solving minimize for variable β ∈ R n , where the loss is the Huber function with threshold M > 0, This function is identical to the least squares penalty for small residuals, but on large residuals, its penalty is lower and increases linearly rather than quadratically. It is thus more forgiving of outliers.
In CVXR, the code for this problem is beta <-Variable(n) obj <-sum(huber(y -X %*% beta, M)) prob <-Problem(Minimize(obj)) result <-solve(prob) Note the similarity to the OLS code. As before, the first line instantiates the n-dimensional optimization variable, and the second line defines the objective function by combining this variable with our data using CVXR's library of atoms. The only difference this time is we call the huber atom on the residuals with threshold M, which we assume has been set to a positive scalar constant. Our package provides many such atoms to simplify problem definition for the user.

Quantile Regression
Another variation on least squares is quantile regression (Koenker 2005). The loss is the tilted l 1 function, where τ ∈ (0, 1) specifies the quantile. The problem as before is to minimize the total residual loss. This model is commonly used in ecology, healthcare, and other fields where the mean alone is not enough to capture complex relationships between variables. CVXR allows us to create a function to represent the loss and integrate it seamlessly into the problem definition, as illustrated below.
By default, the solve method automatically selects the CVXR solver most specialized to the given problem's type. This solver may be changed by passing in an additional solver argument. For instance, the following line fits our quantile regression with SCS (O'Donoghue, Chu, Parikh, and Boyd 2016).
To solve this problem in CVXR, we first define a function that calculates the regularization term given the variable and penalty weights.
loss <-sum((y -X %*% beta)^2)/(2*m) obj <-loss + elastic_reg(beta, lambda, alpha) prob <-Problem(Minimize(obj)) result <-solve(prob) The advantage of this modular approach is that we can easily incorporate elastic net regularization into other regression models. For instance, if we wanted to run regularized Huber regression, CVXR allows us to reuse the above code with just a single changed line, loss <-huber(y -X %*% beta, M)

Logistic Regression
Suppose now that y i ∈ {0, 1} is a binary class indicator. One of the most popular methods for binary classification is logistic regression (Cox 1958;Freedman 2009). We model the conditional response as y|x ∼ Bernoulli(g β (x)), where g β (x) = 1 1+e −x T β is the logistic function, and maximize the log-likelihood function, yielding the optimization problem CVXR provides the logistic atom as a shortcut for f (z) = log(1 + e z ), so our problem is succinctly expressed as obj <--sum(X[y == 0,] %*% beta) -sum(logistic(-X %*% beta)) prob <-Problem(Maximize(obj)) result <-solve(prob) The user may be tempted to type log(1 + exp(X %*% beta)) as in conventional R syntax. However, this representation of f (z) violates the DCP composition rule, so the CVXR parser will reject the problem even though the objective is convex. Users who wish to employ a function that is convex, but not DCP compliant should check the documentation for a custom atom or consider a different formulation.
We can retrieve the optimal objective and variables just like in OLS. More interestingly, we can evaluate various functions of these variables as well by passing them directly into result$getValue. For instance, the log-odds are log_odds <-result$getValue(X %*% beta) This will coincide with the ratio we get from computing the probabilities directly: beta_res <-result$getValue(beta) y_probs <-1 /(1 + exp(-X %*% beta_res)) log(y_probs / (1 -y_probs)) We illustrate with a logistic regression fit from a credit scoring example (Mathworks Inc 2018). The nine regression coefficients other than the intercept are constrained to be in the unit interval. To reflect the correlation between two of the covariates, customer age (x 2 ) and customer income (x 6 ), an additional constraint is placed on the respective coefficients β 2 and β 6 : The code below demonstrates how the latter constraint can be specified by seamlessly combining familiar R functions such as abs with standard indexing constructs.   Figure 1: Logistic regression with constraints using data from Mathworks Inc (2018). The addition of constraint 1 moves the coefficients for customer age and customer income closer to each other. Figure 1 compares the unconstrained and constrained fits and shows that the addition of constraint 1 pulls the coefficient estimates for customer age and customer income towards each other.
Many other classification methods belong to the convex framework. For example, the support vector classifier is the solution of a l 2 -norm minimization problem with linear constraints, which we have already shown how to model. Support vector machines are a straightforward extension. The multinomial distribution can be used to predict multiple classes, and estimation via maximum likelihood produces a convex problem. To each of these methods, we can easily add new penalties, variables, and constraints in CVXR, allowing us to adapt to a specific dataset or environment.

Sparse Inverse Covariance Estimation
Assume we are given i.i.d. observations x i ∼ N (0, Σ) for i = 1, . . . , m, and the covariance matrix Σ ∈ S n + , the set of symmetric positive semidefinite matrices, has a sparse inverse T be our sample covariance. One way to estimate Σ is to maximize the log-likelihood with an l 1 -norm constraint (Yuan and Lin 2007;Banerjee, El Ghaoui, and d'Aspremont 2008;Friedman, Hastie, and Tibshirani 2008), which amounts to the optimization problem The parameter α ≥ 0 controls the degree of sparsity. Our problem is convex, so we can solve it with S <-Semidef(n) obj <-log_det(S) -matrix_trace(S %*% Q) constr <-list(sum(abs(S)) <= alpha) prob <-Problem(Maximize(obj), constr) result <-solve(prob) The Semidef constructor restricts S to the positive semidefinite cone. In our objective, we use CVXR functions for the log-determinant and trace. The expression matrix_trace(S %*% Q) is equivalent to sum(diag(S %*% Q)), but the former is preferred because it is more efficient than making nested function calls. However, a standalone atom does not exist for the determinant, so we cannot replace log_det(S) with log(det(S)) since det is undefined for a Semidef object. Figure 2 depicts the solutions for a particular dataset with m = 1000, n = 10, and S containing 26% non-zero entries represented by the black squares in the top left image. The sparsity of our inverse covariance estimate decreases for higher α, so that when α = 1, most of the off-diagonal entries are zero, while if α = 10, over half the matrix is dense. At α = 4, we achieve the true percentage of non-zeros.

Saturating Hinges
The following example comes from work on saturating splines in Boyd, Hastie, Boyd, Recht, and Jordan (2016). Adaptive regression splines are commonly used in statistical modeling, but the instability they exhibit beyond their boundary knots makes extrapolation dangerous. One way to correct this issue for linear splines is to require they saturate: remain constant outside their boundary. This problem can be solved using a heuristic that is an extension of lasso regression, producing a weighted sum of hinge functions, which we call a saturating hinge.
For simplicity, consider the univariate case with n = 1. Assume we are given knots t 1 < t 2 < · · · < t k where each t j ∈ R. Let h j be a hinge function at knot t j , i.e., h j (x) = max(x − t j , 0), and define f (x) = w 0 + k j=1 w j h j (x). We want to solve The function : R × R → R is the loss associated with every observation, and λ ≥ 0 is the penalty weight. In choosing our knots, we set t 1 = min(x i ) and t k = max(x i ) so that by construction, the estimatef will be constant outside [t 1 , t k ].
We demonstrate this technique on the bone density data for female patients from Hastie, Tibshirani, and Friedman (2001, Section 5.4). There are a total of m = 259 observations. Our response y i is the change in spinal bone density between two visits, and our predictor x i is the patient's age. We select k = 10 knots about evenly spaced across the range of X and fit a saturating hinge with squared error loss (y In R, we first define the estimation and loss functions: q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q qq q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q

} loss_obs <-function(y, f) { (y -f)^2 }
This allows us to easily test different losses and knot locations later. The rest of the set-up is similar to previous examples. We assume that knots is a R vector representing (t 1 , . . . , t k ).
We plot the fitted saturating hinges in Figure 3a. As expected, when λ increases, the spline exhibits less variation and grows flatter outside its boundaries. The squared error loss works well in this case, but as we saw in Section 3.1.1, the Huber loss is preferred when the dataset contains large outliers. We can change the loss function by simply redefining and passing an extra threshold parameter in when initializing loss. In Figure 3b, we have added 50 randomly generated outliers to the bone density data and plotted the re-fitted saturating hinges. For a Huber loss with M = 0.01, the resulting spline is fairly smooth and follows the shape of the original data, as opposed to the spline using squared error loss, which is biased upwards by a significant amount.

Log-Concave Distribution Estimation
Let n = 1 and suppose x i are i.i.d. samples from a log-concave discrete distribution on {0, . . . , K} for some K ∈ Z + . Define p k := P(X = k) to be the probability mass function. One method for estimating (p 0 , . . . , p K ) is to maximize the log-likelihood function subject to a log-concavity constraint (Dümbgen and Rufibach 2009), i.e., where p ∈ R K+1 is our variable of interest and M k represents the number of observations equal to k, so that K k=0 M k = m. The problem as posed above is not convex. However, we can transform it into a convex optimization problem by defining new variables u k = log p k and relaxing the equality constraint to K k=0 p k ≤ 1, since the latter always holds tightly at an optimal solution. The result is If counts is the R vector of (M 0 , . . . , M K ), the code for our convex problem is Once the solver is finished, we can retrieve the probabilities directly with The above line transforms the variables u k to e u k before calculating their resulting values. This is possible because exp is a member of CVXR's library of atoms, so it can operate directly on a Variable object such as u.
As an example, we consider the reliability data from Dümbgen and Rufibach (2011) that was collected as part of a consulting project at the Institute for Mathematical Statistics and Actuarial Science, University of Bern (Dümbgen and Rufibach 2009). The dataset consists of n = 786 observations, and the goal is to fit a suitable distribution to this sample that can be used for simulations. For various reasons detailed in the paper, the authors chose a log-concave estimator, which they implemented in the R package logcondens. Figure 4 shows that the curve obtained from the CVXR code above matches their results exactly.

Survey Calibration
Calibration is a widely used technique in survey sampling. Suppose m sampling units in a survey have been assigned initial weights d i for i = 1, . . . , m, and furthermore, there are n auxiliary variables whose values in the sample are known. Calibration seeks to improve the initial weights d i by finding new weights w i that incorporate this auxiliary information while perturbing the initial weights as little as possible, i.e., the ratio g i = w i /d i must be close to one. Such reweighting improves precision of estimates (Lumley 2010, Chapter 7).
Let X ∈ R m×n be the matrix of survey samples, with each column corresponding to an auxiliary variable. Reweighting can be expressed as the optimization problem with respect to g ∈ R m , where φ : R → R is a strictly convex function with φ(1) = 0, r ∈ R n are the known population totals of the auxiliary variables, and A ∈ R m×n is related to X by A ij = d i X ij for i = 1, . . . , m and j = 1, . . . , n. A common technique is raking, which uses the penalty function φ(g i ) = g i log(g i ) − g i + 1.
We illustrate with the California Academic Performance Index data in the survey package (Lumley 2018), which also supplies facilities for calibration via the function calibrate. Both the population dataset (apipop) and a simple random sample of m = 200 (apisrs) are provided. Suppose that we wish to reweight the observations in the sample using known totals for two variables from the population: stype, the school type (elementary, middle or high) and sch.wide, whether the school met the yearly target or not. This reweighting would make the sample more representative of the general population. The code below solves the problem in CVXR, where we have used a model matrix to generate the appropriate dummy variables for the two factor variables.

Nearly-Isotonic and Nearly-Convex Fits
Given a set of data points y ∈ R m , Tibshirani, Hoefling, and Tibshirani (2011) fit a nearlyisotonic approximation β ∈ R m by solving where λ ≥ 0 is a penalty parameter and x + = max(x, 0). Our CVXR formulation follows directly as shown below. The pos atom evaluates x + elementwise on the input expression.
neariso_fit <-function(y, lambda) { m <-length(y) beta <-Variable(m) penalty <-sum(pos(diff(beta))) obj <-0.5 * sum((y -beta)^2) + lambda * penalty prob <-Problem(Minimize(obj)) result <-solve(prob) result$getValue(beta) } We demonstrate this technique on the global warming data provided by the Carbon Dioxide Information Analysis Center (CDIAC). Our data points are the annual temperature anomalies relative to the 1961-1990 mean. Combining near_fit with the boot library, we can obtain the standard errors and confidence intervals for our estimate in just a few lines of code.
For a smoother curve, we can solve for the nearly-convex fit described in the same paper: This replaces the first difference term with an approximation to the second derivative at β i+1 .
In CVXR, the only change necessary is the penalty line in near_fit, penalty <-sum(pos(diff(x, differences = 2))) The resulting curve is depicted in Figure 5b with 95% confidence bands generated from R = 100 samples. Note the jagged staircase pattern has been smoothed out. We can easily extend this example to higher-order differences or lags by modifying the arguments to diff.

Worst Case Covariance
Suppose we have i.i.d. samples x i ∼ N (0, Σ) for i = 1, . . . , m and want to determine the maximum covariance of y = w T x = m i=1 w i x i , where w ∈ R m is a given vector of weights. We are provided limited information on the elements of Σ. For example, we may know the specific value or sign of certain Σ jk , which are represented by upper and lower bound matrices L and U ∈ R n×n , respectively (Boyd and Vandenberghe 2004, pp. 171-172). This situation can arise when calculating the worst-case risk of an investment portfolio (Lobo and Boyd 2000). Formally, our optimization problem is maximize Σ w T Σw subject to Σ ∈ S n + , L jk ≤ Σ jk ≤ U jk , j, k = 1, . . . , n.
Consider the specific case where a + means the element is nonnegative, a − means the element is nonpositive, and a ± means the element can be any real number. In CVXR, this semidefinite program is It is necessary to specify a quadratic form with quad_form rather than the usual t(w) %*% Sigma %*% w because the latter will be interpreted by the CVXR parser as a product of two affine terms and rejected for not being DCP.

Catenary Problem
We consider a discretized version of the catenary problem in Griva and Vanderbei (2005). A chain with uniformly distributed mass hangs from the endpoints (0, 1) and (1, 1) on a 2-D plane. Gravitational force acts in the negative y direction. Our goal is to find the shape of the chain in equilibrium, which is equivalent to determining the (x, y) coordinates of every point along its curve when its potential energy is minimized.
To formulate this as an optimization problem, we parameterize the chain by its arclength and divide it into m discrete links. The length of each link must be no more than h > 0. Since mass is uniform, the total potential energy is simply the sum of the y-coordinates. Therefore, our problem is with variables x ∈ R m and y ∈ R m . This basic catenary problem has a well-known analytical solution (Gelfand and Fomin 1963), which we can easily verify with CVXR.

constr) result <-solve(prob)
A more interesting situation arises when the ground is not flat. Let g ∈ R m be the elevation vector (relative to the x-axis), and suppose the right endpoint of our chain has been lowered by ∆y m = 0.5. The analytical solution in this case would be difficult to calculate. However, we need only add two lines to our constraint definition,

constr[[4]] <-(y[m] == 0.5) constr <-c(constr, y >= g)
to obtain the new result. Figure 6 depicts the solution of this modified catenary problem for m = 101 and h = 0.04. The chain is shown hanging in blue, bounded below by the red staircase structure, which represents the ground.

Portfolio Optimization
In this example, we solve the Markowitz portfolio problem under various constraints (Markowitz 1952;Roy 1952;Lobo, Fazel, and Boyd 2007). We have n assets or stocks in our portfolio and must determine the amount of money to invest in each. Let w i denote the fraction of our budget invested in asset i = 1, . . . , m, and let r i be the returns (i.e., fractional change in price) over the period of interest. We model returns as a random vector r ∈ R n with known mean E[r] = µ and covariance Var(r) = Σ. Thus, given a portfolio w ∈ R n , the overall return is R = r T w.
Portfolio optimization involves a trade-off between the expected return E[R] = µ T w and associated risk, which we take as the return variance Var(R) = w T Σw. Initially, we consider only long portfolios, so our problem is where the objective is the risk-adjusted return and γ > 0 is a risk aversion parameter.
w <-Variable(n) ret <-t(mu) %*% w risk <-quad_form(w, Sigma) obj <-ret -gamma * risk constr <-list(w >= 0, sum(w) == 1) prob <-Problem(Maximize(obj), constr) result <-solve(prob) We can obtain the risk and return by directly evaluating the value of the separate expressions: result$getValue(risk) result$getValue(ret) Figure 7a depicts the risk-return trade-off curve for n = 10 assets and µ and Σ 1/2 drawn from a standard normal distribution. The x-axis represents the standard deviation of the return. Red points indicate the result from investing the entire budget in a single asset. As γ increases, our portfolio becomes more diverse (Figure 7b), reducing risk but also yielding a lower return.
Many variations on the classical portfolio problem exist. For instance, we could allow long and short positions, but impose a leverage limit w 1 ≤ L max by changing constr <-list(p_norm(w,1) <= Lmax, sum(w) == 1) An alternative is to set a lower bound on the return and minimize just the risk. To account for transaction costs, we could add a term to the objective that penalizes deviations of w from the previous portfolio. These extensions and more are described in Boyd, Busseti, Diamond, Kahn, Koh, Nystrup, and Speth (2017). The key takeaway is that all of these convex problems can be easily solved in CVXR with just a few alterations to the code above.

Kelly Gambling
In Kelly gambling (Kelly 1956), we are given the opportunity to bet on n possible outcomes, which yield a random non-negative return of r ∈ R n + . The return r takes on exactly K values r 1 , . . . , r K with known probabilities π 1 , . . . , π K . This gamble is repeated over T periods. In a given period t, let b i ≥ 0 denote the fraction of our wealth bet on outcome i. Assuming the nth outcome is equivalent to not wagering (it returns one with certainty), the fractions must satisfy n i=1 b i = 1. Thus, at the end of the period, our cumulative wealth is w t = (r T b)w t−1 . Our goal is to maximize the average growth rate with respect to b ∈ R n : In the following code, rets is the K by n matrix of possible returns with row r i , while ps is the vector of return probabilities (π 1 , . . . , π K ). b <-Variable(n) obj <-t(ps) %*% log(rets %*% b) constr <-list(b >= 0, sum(b) == 1) prob <-Problem(Maximize(obj), constr) result <-solve(prob) We solve the Kelly gambling problem for K = 100 and n = 20. The probabilities π j ∼ Uniform(0, 1), and the potential returns r j ∼ Uniform(0.5, 1.5) except for r n = 1, which represents the payoff from not wagering. With an initial wealth of w 0 = 1, we simulate the growth trajectory of our Kelly optimal bets over P = 100 periods, assuming returns are i.i.d. over time.
bets <-result$getValue(b) idx <-sample.int(K, size = P, probs = ps, replace = TRUE) winnings <-rets[idx,] %*% bets wealth <-w0 * cumprod(winnings) For comparison, we also calculate the trajectory for a naïve betting scheme, which holds onto 15% of the wealth at the beginning of each period and divides the other 85% over the bets in direct proportion to their expect returns.
Growth curves for five independent trials are plotted in Figure 8. Red lines represent the wealth each period from the Kelly bets, while cyan lines are the result of the naïve bets. Clearly, Kelly optimal bets perform better, producing greater net wealth by the final period. However, as observed in some trajectories, wealth tends to drop by a significant amount before increasing eventually. One way to reduce this drawdown risk is to add a convex constraint as proposed in Busseti, Ryu, and Boyd (2016, Section 5.3), where λ ≥ 0 is the risk-aversion parameter. With CVXR, this can be accomplished in a single line using the log_sum_exp atom. Other extensions like wealth goals, betting restrictions, and VaR/CVaR bounds are also readily incorporated.

Channel Capacity
The following problem comes from an exercise in Boyd and Vandenberghe (2004, pp. 207-208). Consider a discrete memoryless communication channel with input X(t) ∈ {1, . . . , n} and output Y (t) ∈ {1, . . . , m} for t = 1, 2, . . .. The relation between the input and output is given by a transition matrix P ∈ R m×n + with P ij = P(Y (t) = i|X(t) = j), i = 1, . . . , m, j = 1, . . . , n Assume that X has a probability distribution denoted by x ∈ R n , i.e., x j = P(X(t) = j) for j = 1, . . . , n. A famous result by Shannon and Weaver (1949) states that the channel capacity is found by maximizing the mutual information between X and Y , where y = P x is the probability distribution of Y . Since I is concave, this is equivalent to solving the convex optimization problem x for x ∈ R n and y ∈ R m . The associated code in CVXR is x <-Variable(n) y <-P %*% x c <-apply(P * log2(P), 2, sum) obj <-c %*% x + sum(entr(y)) constr <-list(sum(x) == 1, x >= 0) prob <-Problem(Maximize(obj), constr) result <-solve(prob) The channel capacity is simply the optimal objective, result$value.

Fastest Mixing Markov Chain
This example is derived from the results in Boyd, Diaconis, and Xiao (2004, Section 2). Let G = (V, E) be a connected graph with vertices V = {1, . . . , n} and edges E ⊆ V × V. Assume that (i, i) ∈ E for all i = 1, . . . , n, and (i, j) ∈ E implies (j, i) ∈ E. Under these conditions, a discrete-time Markov chain on V will have the uniform distribution as one of its equilibrium distributions. We are interested in finding the Markov chain, i.e.constructing the transition probability matrix P ∈ R n×n + , that minimizes its asymptotic convergence rate to the uniform distribution. This is an important problem in Markov chain Monte Carlo (MCMC) simulations, as it directly affects the sampling efficiency of an algorithm.
The asymptotic rate of convergence is determined by the second largest eigenvalue of P , which in our case is µ(P ) := λ max (P − 1 n 11 T ) where λ max (A) denotes the maximum eigenvalue of A. As µ(P ) decreases, the mixing rate increases and the Markov chain converges faster to equilibrium. Thus, our optimization problem is minimize P λ max (P − 1 n 11 T ) subject to P ≥ 0, P 1 = 1, P = P T P ij = 0, (i, j) / ∈ E.
The element P ij of our transition matrix is the probability of moving from state i to state j. Our assumptions imply that P is nonnegative, symmetric, and doubly stochastic. The last constraint ensures transitions do not occur between unconnected vertices. The function λ max is convex, so this problem is solvable in CVXR. For instance, the code for the Markov chain in Figure 9a is P <-Variable(n,n) ones <-matrix(1, nrow = n, ncol = 1) obj <-Minimize(lambda_max(P -1/n)) constr1 <-list(P >= 0, P %*% ones == ones, P == t(P)) constr2 <-list(P[1,3] == 0, P[1,4] == 0) prob <-Problem(obj, c(constr1, constr2)) result <-solve(prob) where we have set n = 4. We could also have specified P 1 = 1 with sum_entries(P,1) == 1, which uses the sum_entries atom to represent the row sums. It is easy to extend this example to other Markov chains. To change the number of vertices, we would simply modify n, and to add or remove edges, we need only alter the constraints in constr2. For instance, the bipartite chain in Figure 9b is produced by setting n = 5 and

Implementation
CVXR represents the atoms, variables, constraints, and other parts of an optimization problem using S4 class objects. S4 enables us to overload standard mathematical operations so CVXR combines seamlessly with native R script and other packages. When an operation is invoked on a variable, a new object is created that represents the corresponding expression tree with the operator as the root node and the arguments as leaves. This tree grows automatically as more elements are added, allowing us to encapsulate the structure of an objective function or constraint.
Once the user calls solve, DCP verification occurs. CVXR traverses the expression tree recursively, determining the sign and curvature of each sub-expression based on the properties of its component atoms. If the problem is deemed compliant, it is transformed into an equivalent cone program using graph implementations of convex functions (Grant et al. 2006). Then, CVXR passes the problem's description to the CVXcanon C++ library (Miller, Quigley, and Zhu 2015), which generates data for the cone program, and sends this data to the solverspecific R interface. The solver's results are returned to the user in a list. This object-oriented design and infrastructure were largely borrowed from CVXPY.
CVXR interfaces with the open-source cone solvers ECOS (Domahidi, Chu, and Boyd 2013) and SCS (O'Donoghue et al. 2016) through their respective R packages. ECOS is an interiorpoint solver, which achieves high accuracy for small and medium-sized problems, while SCS is a first-order solver that is capable of handling larger problems and semidefinite constraints. As noted by Domahidi et al. (2013, Section I.A), first-order methods can be slow if the problem is not well conditioned or if it has a feasible set that does not allow for an efficient projection, while interior-point methods have a convergence rate that is independent of the problem data and the particular feasible set. Both solvers run single-threaded at present. Support for multi-threaded SCS will be added in the future. Furthermore, CVXR version 0.99 provides support for the commercial solvers MOSEK (Andersen and Andersen 2000) and GUROBI (Gurobi Optimization, Inc 2016) through binary R packages published by the respective vendors. It is not difficult to connect additional solvers so long as the solver has an API that can communicate with R. Users who wish to employ a custom solver may obtain the canonicalized data for a problem and solver combination directly with get_problem_data(problem, solver). When more than one solver is capable of solving a problem, the solver argument to the solve function can be used to indicate a preference. Available solvers, depending on installed packages in a session, are returned via installed_solvers(). Interested users should consult tutorial examples on the CVXR site for further guidance.
We have provided a large library of atoms, which should be sufficient to model most convex optimization problems. However, it is possible for a sophisticated user to incorporate new atoms into this library. The process entails creating a S4 class for the atom, overloading methods that characterize its DCP properties, and representing its graph implementation as a list of linear operators that specify the corresponding feasibility problem. For instance, the absolute value function f (x) = |x| is represented by the Abs class, which inherits from Atom. We defined its curvature by overloading the S4 method is_atom_convex, used in the DCP verification step, to return TRUE when called on an Abs object. Then, we derived the graph form of the absolute value to be f (x) = inf{t| − t ≤ x ≤ t}. This form's objective and constraints were coded into lists in the atom's graph_implementation function. A full mathematical exposition may be found in Grant et al. (2006, Section 10). In general, we suggest users try to reformulate their optimization problem first before attempting to add a novel atom.

Speed Considerations
Usually, CVXR will be slower than a direct call to a solver, because in the latter case, the user would have already done the job of translating a mathematical problem into code and constraints ingestible by the solver. CVXR performs this translation for the user starting from a DCP formulation of the problem by walking the abstract syntax tree, which represents the canonicalized objectives and constraints, and building appropriate matrix structures for the solver. The matrix data are passed to a compatible solver using either Rcpp (Eddelbuettel and François 2011) or calls to a solver-specific R package. CVXR stores data in sparse matrices, thereby allowing large problems to be specified. However, the restrictions imposed by R on sparse matrices (Bates and Maechler 2018) still apply: each dimension cannot exceed the integer limit of 2 31 − 1.
Currently, the canonicalization and construction of data in R for the solver dominates computation time, particularly for complex expressions that access individual elements by indexing into matrices or vectors. Using available CVXR functions for vectorized operations provides substantial speed improvements. Gao and Shi (2018) present speed and accuracy comparisons between different convex optimization packages on the classifier-Lasso (C-Lasso) highdimensional estimator proposed by Su, Shi, and Phillips (2016). Of the included packages nloptr, Rmosek (MOSEK ApS 2017), CVX, and CVXR -the authors note that only the latter three obtained correct results. The timing results in Gao and Shi (2018 , Table 1

) show
Rmosek to be the fastest, with a hundred-fold speedup over CVXR. However, if we reformulate the objective in Gao (2018) using matrix operations and CVXR atoms, this advantage is reduced to a factor between 2.5 and 3.5. In fact, since the problem is known to be convex, following our Getting Faster Results tutorial further decreases the advantage to between 1.5 and 1.8, as shown in Table 2.  Gao and Shi (2018) for computing the C-Lasso estimator of Su et al. (2016). For CVXR, the objective was reformulated using matrix functions. The "nodcp" version skips DCP checks and directly constructs the matrix data structures. The parameters are n, the number of cross sectional units and T the time length. The results for the (n, T ) = (200, 50) could not be replicated. We used 100 replications rather than the original 500. The experiments were run on a Macbook Pro equipped with a 4-core Intel Core i7 processor and 16 GB of RAM.

Conclusion
Convex optimization plays an essential role in many fields, particularly machine learning and statistics. CVXR provides an object-oriented language with which users can easily formulate, modify, and solve a broad range of convex optimization problems. While other R packages may perform faster on a subset of these problems, CVXR's advantage is its flexibility and simple intuitive syntax, making it an ideal tool for prototyping new models for which custom R code does not exist. For more information, see the official CRAN site and documentation. p ∈ R n + , p = 0 log (det(X)) cvxr_norm(X, "nuc") . .
In CVXR, * and / are affine because expr1 * expr2 and expr1 %*% expr2 are allowed only when one of the expressions is constant and expr1 / expr2 is allowed only when expr2 is a scalar constant.
The transpose of any expression can be obtained using t(expr). Transpose is an affine function. The construct expr^p is equivalent to the function power(expr, p).

A.2. Indexing and Slicing
All non-scalar expressions can be indexed using expr [i, j]. Indexing is an affine function. Non-scalar expressions can also be sliced using the standard R slicing syntax. For example, expr[i:j, r] selects rows i through j of column r and returns a vector. CVXR supports advanced indexing using lists of indices or boolean arrays. The semantics are the same as in R. Any time R might return a numeric vector, CVXR returns a column vector.

A.3. Scalar Functions
CVXR provides the scalar functions displayed in Tables 3 and 4, which take in one or more scalars, vectors, or matrices as arguments and return a scalar.
For a vector expression x, cvxr_norm(x) and cvxr_norm(x, 2) give the Euclidean norm. For a matrix expression X, however, cvxr_norm(X) and cvxr_norm(X, 2) give the spectral norm.
The function cvxr_norm(X, "fro") gives the Frobenius norm and cvxr_norm(X, "nuc") the nuclear norm. The nuclear norm can also be defined as the sum of the singular values of X.
The functions max_entries and min_entries give the largest and smallest entry, respectively, in a single expression. These functions should not be confused with max_elemwise and min_elemwise (see Section A.4). The functions max_elemwise and min_elemwise return the maximum or minimum of a list of scalar expressions.
The function sum_entries sums all the entries in a single expression. The built-in R sum should be used to add together a list of expressions. For example, the following code sums three expressions.
sum(expr1, expr2, expr3) Some functions such as sum_entries, cvxr_norm, max_entries, and min_entries can be applied along an axis. Given an m by n expression expr, the line func(expr, axis = 1) applies func to each row, returning an m by 1 expression. The line func(expr, axis = 2) applies func to each column, returning a 1 by n expression. For example, the following code sums along the columns and rows of a matrix variable: X <-Variable(5, 4) x > 0 x 2 x ∈ R + convex for x ≥ 0, for x ≤ 0  X (1) · · · X (k) affine kronecker(C, X) affine depends on C reshape_expr(X, n', m') X ∈ R m ×n X ∈ R m×n m n = mn same as X affine vec(X) x ∈ R mn X ∈ R m×n same as X affine vstack(X1,..., Xk)