ManifoldOptim: An R Interface to the ROPTLIB Library for Riemannian Manifold Optimization

Manifold optimization appears in a wide variety of computational problems in the applied sciences. In recent statistical methodologies such as sufficient dimension reduction and regression envelopes, estimation relies on the optimization of likelihood functions over spaces of matrices such as the Stiefel or Grassmann manifolds. Recently, Huang, Absil, Gallivan, and Hand (2016) have introduced the library ROPTLIB, which provides a framework and state of the art algorithms to optimize real-valued objective functions over commonly used matrix-valued Riemannian manifolds. This article presents ManifoldOptim, an R package that wraps the C++ library ROPTLIB. ManifoldOptim enables users to access functionality in ROPTLIB through R so that optimization problems can easily be constructed, solved, and integrated into larger R codes. Computationally intensive problems can be programmed with Rcpp and RcppArmadillo, and otherwise accessed through R. We illustrate the practical use of ManifoldOptim through several motivating examples involving dimension reduction and envelope methods in regression.


Introduction
This article presents ManifoldOptim (Adragni et al., 2016a), an R (R Core Team, 2016) package for the optimization of real-valued functions on Riemannian manifolds. ManifoldOptim is a wrapper for ROPTLIB, a C++ optimization library by Huang et al. (2016), which provides a variety of cutting edge algorithms for optimization on manifolds and a framework for constructing such optimization problems.
There is a growing literature on manifolds and manifold optimization. Manifolds are generalizations of smooth curves and surfaces within higher dimensions. Riemannian manifolds are a class of manifolds which have rules to compute distances and angles. Some manifolds are especially abundant in the literature because of their wide applicability; for example, the differential geometry of Grassmann manifolds has been well studied (Wong, 1967;Edelman et al., 1998;Chikuse, 2003;Absil et al., 2008). Mathematical introductions to manifolds can be found in Lee (2000Lee ( , 2003 and Tu (2011).
Formulating optimization algorithms on manifolds requires endowing the manifolds with a differentiable structure. The gradient and Hessian are often used in Euclidean optimization to find the direction for the next iterate, but these notions must be extended to honor the curved structure of the manifold within Euclidean space. Each manifold is unique with respect to its differential geometry and requires some individual consideration. In this article, we focus on the following manifolds: Euclidean, Grassmann, low-rank manifold, manifold of n-dimensional vectors on the unit sphere, Stiefel, and space of positive definite matrices. We also consider spaces formed by taking Cartesian products of these manifolds. Available optimization algorithms include the: Riemannian trust-region Newton (Absil et al., 2007), Riemannian symmetric rank-one trust-region (RTRSR1; Huang et al., 2015a), limited-memory (RTRSR1; Huang et al., 2015a), Riemannian trust-region steepest descent (Absil et al., 2008), Riemannian line-search Newton (Absil et al., 2008), Riemannian Broyden family (Huang et al., 2015b), Riemannian Broyden-Fletcher-Goldfarb-Shannon (RBFGS; Ring and Wirth, 2012;Huang et al., 2015b), limited memory (RBFGS; Huang et al., 2015b), Riemannian conjugate gradients (Nocedal and Wright, 1999;Absil et al., 2008;Sato and Iwai, 2013), and Riemannian steepest descent (Absil et al., 2008).
Consider the problem of minimizing f (U) for U ∈ M where M is a Riemannian manifold. For example, let U be a p × d semi-orthogonal matrix where U T U = I d and d < p. This problem naturally lies in a Stiefel manifold, where orthonormality of the argument U is intrinsic to the manifold. When f is invariant under right orthogonal transformations of U, so that f (U) = f (UO) for any d × d orthogonal matrix O, then the problem lies in a Grassmann manifold. If U is a p × p symmetric positive definite (SPD) matrix, such as a covariance matrix, the problem is naturally defined on the manifold of SPD matrices. Methods abound in the literature for optimization problems with linear constraints (Fletcher, 1987;Nocedal and Wright, 1999).
Constrained problems sometimes become unconstrained by an appropriate transformation. The constraints in the above examples are naturally honored by restricting to the respective manifolds.
In a typical use of ManifoldOptim, the user provides an objective function, its gradient and Hessian, and a specification of the manifold, the solver, and the optimization method to be used. Numerical gradient and Hessian functions are provided, as defaults, for problems where closed-form expressions are not easily programmed. ManifoldOptim users can construct problems in plain R for convenience and quick prototyping.
If performance becomes a concern, users can implement the objective, gradient, and Hessian functions in C++ and otherwise interact with ManifoldOptim through R. We make use of the Rcpp (Eddelbuettel and Francois, 2011) and RcppArmadillo (Eddelbuettel and Sanderson, 2014) packages to reduce the burden of C++ programming for R users, and to facilitate seamless integration with R. The ROPTLIB library itself is written in C++ and uses standard linear algebra libraries such as BLAS and LAPACK (Anderson et al., 1999) to ensure generally good performance.
ROPTLIB can be readily used in Matlab through its included mex-based wrapper. Interfaces to ROPTLIB for other high level languages, such as Julia, have also been developed (Huang et al., 2016). Other manifold optimization software packages are found in the literature: sg min written by Lippert (2000) which was adapted from Edelman et al. (1998), and Manopt of Boumal et al. (2014) are available in Matlab. In R, Adragni and Wu (2010) developed GrassmannOptim specifically for the Grassmann manifold. To our knowledge, there are no other publicly available R packages for manifold optimization.
The following notations are adopted throughout this manuscript: G d,p represents the Grassmann manifold, the set of all d-dimensional subspaces in R p ; S d,p represents the Stiefel manifold, the set of all d-dimensional orthonormal matrices in R p ; S + p represents the manifold of all p × p symmetric positive definite matrices, and S p is the unit sphere manifold.
The remainder of this article is organized as follows. Section 2 gives a brief description of the algorithms that guided the programming of ManifoldOptim. Section 3 demonstrates usage of the package through brief examples. Section 4 presents several applications where manifold optimization is useful in statistics. R code for these applications is somewhat lengthy and therefore is provided as supplementary material. Discussions and conclusions follow in section 5. The ManifoldOptim package is available from the Comprehensive R Archive Network at https://CRAN.R-project.org/package=ManifoldOptim.

Optimization on Manifolds
Manifold optimization appears in a wide variety of computational problems in the applied sciences. It has recently gained prominence in the statistics literature. Optimization over a Grassmann manifold is used in several models for sufficient dimension reduction (SDR) in regression (Cook, 1998(Cook, , 2007, including covariance reducing models (Cook and Forzani, 2008) and likelihood acquired directions (Cook and Forzani, 2009). Adragni and Wu (2010) provide some relevant examples. Essentially, a manifold optimization is characterized by symmetry or invariance properties of the objective function.
Consider the Brockett problem (Absil et al., 2008, section 4), which amounts to minimizing the objective function f (X) = Trace(X T BXD), X ∈ R p×d (1) subject to X T X = I p . Here, D = Diag(µ 1 , . . . , µ p ) with µ 1 ≥ · · · ≥ µ p ≥ 0 and B ∈ R n×n is a given symmetric matrix. The optimization can be carried out over a Stiefel manifold S d,p . It is known that a global minimizer of f is the matrix whose columns are the eigenvectors of B corresponding to the d smallest eigenvalues µ p−d+1 , . . . , µ p . This is essentially an eigenvalue problem reminiscent of the generalized Rayleigh quotient for discriminant analysis (Adragni and Wu, 2010). It will be later used in section 3 to illustrate practical use of the package.
We next present three algorithms in statistics that can be formulated as manifold optimization problems.
The first is the recently developed Minimum Average Deviance Estimation method for SDR (Adragni et al., 2016b). Second is Cook's Principal Fitted Components model for SDR (Cook, 2007). Third is an envelope method for regression initially proposed by Cook et al. (2010). These problems will be explored further in section 4.

Minimum Average Deviance Estimation
Minimum Average Deviance Estimation (MADE) is a dimension reduction method based on local regression (Adragni et al., 2016b). This approach was first proposed by Xia et al. (2002) under the assumption of additive errors. Suppose Y ∈ R is a response, X is a p-dimensional predictor, and the distribution of Y | X is given by an exponential family distribution of the form with dispersion parameter φ. The canonical parameter θ(X) possesses the main information that connects Y to X; it relates to the mean function E(Y | X) through a link function g so that g(E(Y | X)) = θ(X). Let (Y i , X i ), i = 1, . . . , n, represent an independent sample from the distribution of (Y, X) so that Y i | X i has the distribution (2). If there exists a semi-orthogonal B ∈ R p×d with d < p and θ(X) = ϑ(B T X) for some function ϑ, then X ∈ R p can be replaced by B T X ∈ R d in the regression of Y on X, and the subspace S B is a dimension reduction subspace.
Regression based on the local log-likelihood evaluated at a given X 0 ∈ R p can be written as using the first-order Taylor expansion taking α 0 = ϑ(B T X 0 ) and γ 0 = ∇ϑ(B T X 0 ). The 0 subscript in α 0 , γ 0 denotes that the parameters vary with the choice of X 0 . The weights w(X 1 , X 0 ), . . . , w(X n , X 0 ) represent the contribution of each observation toward L X0 (α 0 , γ 0 , B). A local deviance for the jth observation can be defined as The term max ϑ log f (Y j | ϑ) is the maximum likelihood achievable for an individual observation. Consider minimizing the average deviance n −1 n j=1 D(Y j , ϑ(B T X j )) with respect to (α j , γ j ) ∈ R d+1 for j = 1, . . . , n and B ∈ R p×d such that B T B = I. This is equivalent to maximizing over the same space. The kernel weight function is taken to be where K(u) denotes one of the usual multidimensional kernel density functions and the bandwidth H is a p × p symmetric and positive definite matrix. We may also consider refined weights w(B T X i , B T X 0 ) which make use of the unknown B.
For any orthogonal matrix O ∈ R d×d , γ T B T = γ T OO T B T , which implies that γ and B are not uniquely determined but obtained up to an orthogonal transformation. Furthermore, refined weights based on the Gaussian kernel with H = hI depend on B only through BB T = BOO T B T . In this setting, the MADE problem is invariant to orthogonal transformation of B in the sense that The joint parameter space of (α, γ, B) is a product manifold R n(d+1) × G d,p . However, in their initial work, Adragni et al. (2016b) used an iterative algorithm to maximize the objective function (4): 1. Fix B and maximize over (α j , γ j ) ∈ R d+1 using Newton-Raphson for j = 1, . . . , n.
2. Fix α and γ and the weights w ij , and maximize over B ∈ S d,p .
3. Update the weights w ij = w(B T X i , B T X j ) if refined weights are desired.
These iterations are started from an initial estimate for B, and repeated until changes in B are smaller than some specified threshold.

Principal Fitted Components
The principal fitted components (PFC) model was initially proposed by Cook (2007) as a likelihood-based method for sufficient dimension reduction in regression. Let X be a p-vector of random predictors and Y be the response. Using the stochastic nature of the predictors, Cook (2007) proposed the model Here, X y denotes the conditional X given Y = y, f y ∈ R r is a user-selected function of the response, which helps capture the dependency of X on Y . The other parameters are µ ∈ R p , Γ ∈ R p×d is a semi-orthogonal matrix and β ∈ R d×r . The error term ε ∈ R p is assumed to be normally distributed with mean 0 and variance ∆.
This model can be seen as a way to partition the information in the predictors given the response into two parts: one part Γβf y = E(X y − X) that is related to the response and another part µ + ε that is not. The form of the related part suggests that the translated conditional means E(X y − X) fall in the d-dimensional Cook (2007) showed that a sufficient reduction of X is η T X, where η = ∆ −1 Γ, so that X can be replaced by η T X without loss of information about the regression of Y on X. However, as η T X is a sufficient reduction, Thus Γ is not estimable but the subspace spanned by its columns is estimable. The goal of an analysis is then to estimate the subspace S Γ spanned by the columns of Γ.
We now focus on maximum likelihood estimation in PFC under the heterogeneous and general unstructured forms of ∆. Assuming that a set of n data points is observed, letX be the sample mean of X and let X denote the n × p matrix with rows (X y −X) T . Let F denote the n × r matrix with rows (f y −f) T and set P F to denote the linear operator that projects onto the subspace spanned by the columns of F. Also let X = P F X denote the n × p matrix of the fitted values from the multivariate linear regression of X on f y . Let Σ = X T X/n and Σ fit = X T X/n. For the heterogeneous and the unstructured ∆, the log-likelihood functions are respectively These functions are real-valued, with parameter spaces expressed as Cartesian products of manifolds (Ω, Ω 0 , Γ) ∈ With multiple parameters constrained to different spaces, a common approach to maximum likelihood estimation is to optimize one parameter at a time while keeping the others fixed, cycling through such steps until meeting convergence criteria for the overall problem. For these situations, ManifoldOptim supports a product space of manifolds which are composed from simpler manifolds and can include Euclidean spaces. Here, optimization algorithms work over all parameters jointly rather than cycling through partial optimizations.
In our experience, joint optimization has produced comparable or superior results to the cycling approach with a reduced programming burden.

Envelope Method for Regression
Enveloping is a novel and nascent method initially introduced by Cook et al. (2010) that has the potential to produce substantial gains in efficiency for multivariate analysis. The initial development of the envelope method was in terms of the standard multivariate linear model where µ ∈ R r , the random response Y ∈ R r , the fixed predictor vector X ∈ R p is centered to have sample mean zero, and the error vector ε ∼ N (0, Σ). In this context some linear combinations of Y are immaterial to the regression because their distribution does not depend on X, while other linear combinations of Y do depend on X and are thus material to the regression. In effect, envelopes separate the material and immaterial parts of Y , and thereby allow for gains in efficiency.
Suppose that we can find a subspace S ∈ R r so that where ∼ means identically distributed, P S projects onto the subspace S and Q S = I r − P S . For any S with those properties, P S Y carries all of the material information and perhaps some immaterial information, while Q S carries just immaterial information. Denoting B := span(β), expressions (11) hold if and only if B ⊆ S The subspace S is not necessarily unique nor minimal, because there may be infinitely many subspaces that satisfy these relations in a particular problem. Cook et al. (2010) showed that S is a reducing subspace of Σ if and only if Σ = Σ S + Σ S ⊥ , and defined the minimal subspace to be the intersection of all reducing subspaces of Σ that contain B, which is called the Σ-envelope of B and denoted as E Σ (B). Let u = dim{E Σ (B)}. Then where E Σ (B) is shortened to E for subscripts. These relationships establish a unique link between the coefficient matrix β and the covariance matrix Σ of model (10), and it is this link that has the potential to produce gains in the efficiency of estimates of β relative to the standard estimator of β. Su and Cook (2011), Su and Cook (2012), and Cook and Su (2013) among others expanded the applications of this envelope method.
Cook and Zhang (2015) recently further expanded the envelope method to generalized linear regression models. The response belongs to an exponential family of the form (2). Let α+β T E(W X) so that a and β are orthogonal. The asymptotic variance ofβ is then obtained as avar( With a reparameterization β = Γη, the parameter η is estimated using a Fisher scoring method fitting the GLM of Y on Γ T X. The partially maximized log-likelihood for Γ is where S X is the sample covariance of X. This function is now optimized over a Grassmann manifold to obtain Γ.

Using ManifoldOptim
We now describe use of the ManifoldOptim package. The core function is manifold.optim, which has the following interface.
The problem encapsulates the objective function to be minimized, and, optionally, the gradient and Hessian functions. Analytical expressions of the gradient and Hessian are usually desirable but may be either tedious to compute or intractable. Numerical approximations to the gradient and Hessian via finite differences are used by default. There are three options for constructing a problem with ManifoldOptim: RProblem, VectorManifoldOptimProblem, and MatrixManifoldOptimProblem.
Generally, ManifoldOptim treats the optimization variable as a single vector, and the user must reshape the vector into the matrices and vectors specific to the problem at hand. This approach is similarly taken by the standard optim function in the R package stats (R Core Team, 2016).
The method argument specifies the algorithm to be used in the optimization; LRBFGS is the default method.
A list of possible values for method is provided in Table 1.
ROPTLIB provides solvers for at least nine commonly encountered manifolds at the time of this writing.
In the first version of ManifoldOptim, we have focused on six manifolds: unconstrained Euclidean space, the Stiefel manifold, the Grassmann manifold, the unit sphere, the orthogonal group, and the manifold of symmetric positive definite matrices. As ROPTLIB continues development, and as the need arises in the R community, support for more solvers and manifolds will be added to ManifoldOptim.
The focus of the initial release ManifoldOptim has been on ease of use. ROPTLIB provides additional constructs to assist C++ programmers in avoiding redundant computations and redundant memory usage, which can significantly improve run time of the optimization (Huang et al., 2016). This functionality is currently not exposed in ManifoldOptim, but may be considered in future versions.

Solving the Brockett Problem
We revisit the Brockett problem described in section 2 to illustrate the use of ManifoldOptim. The gradient for the objective function can be obtained in closed form as ∇f (X) = 2BXD.
We set the matrices B and D to prepare a concrete instance of the problem.
tx <-function(x) { matrix(x, n, p) } f <-function(x) { X <-tx(x); Trace( t(X) %*% B %*% X %*% D ) } g <-function(x) { X <-tx(x); 2 * B %*% X %*% D } The tx function reshapes an np-dimensional point from the solver into an n × p matrix, which can then be evaluated by the f and g functions. We will initially select a solver that does not make use of a user-defined Hessian function.
We first consider an RProblem, where all functions are programmed in R. This provides a convenient way to code a problem, but may face some performance limitations. In general, functions coded in R tend to be much slower than similar functions written in C++, especially if they cannot be vectorized. With an RProblem, each evaluation of the objective, gradient, and Hessian by the solver incurs the overhead of a call from C++ to a function defined in R. Having noted the potential performance issues, we now proceed with construction of an RProblem. mod <-Module("ManifoldOptim_module", PACKAGE = "ManifoldOptim") prob <-new(mod$RProblem, f, g) The Module construct (Eddelbuettel and François, 2016) is useful for the internal implementation of ManifoldOptim. When constructing a module in R, the user implicitly creates a C++ problem which can be accessed by the C++ solver.
The Brockett function is to be optimized over a Stiefel manifold, so we use get.stiefel.defn to create the specification. Additionally, we set software parameters for the manifold and solver via the get.manifold.params and get.solver.params functions.

Max_Iteration = 1000, IsCheckParams = TRUE)
The argument IsCheckParams = TRUE requests useful information to be printed on the console for either the manifold or solver, depending where it is placed. DEBUG is an integer that sets the verbosity of the solver during iterations; The lowest verbosity is DEBUG = 0, which prints no debugging information. Max Iteration and Tolerance set the maximum iteration and tolerance to determine when the solver will halt. More extensive descriptions for the arguments are given in the ManifoldOptim manual.
A starting value for the optimization can be specified by the argument x0 in the call of the function manifold.optim. If available, a good initial value can assist the solver by reducing the time to find a solution and improving the quality of the solution when multiple local optima are present. If no initial value is given, the optimizer will select one at random from the given manifold. In this case of the Brockett problem, we consider the following initial value.
x0 <-as.numeric(diag(n)[,1:p]) Now that we have specified the problem, manifold definition, software parameters for the manifold and solver, an algorithm, and an initial value, we can invoke the optimizer through the manifold.optim function.
> names(res) [1] "xopt" "fval" "normgf" "normgfgf0" [5] "iter" "num.obj.eval" "num.grad.eval" "nR" [9] "nV" "nVp" "nH" "elapsed" [13] "funSeries" "gradSeries" "timeSeries" The ManifoldOptim manual gives a description of each of these items. The most important output is the solution xopt, which optimizes the objective function. Using tx to reshape it into a matrix, its first rows are given below. a Hessian function is required by the solver but not provided by the user, a numerical approximation will be used. This provides a convenient default, but can be slow and potentially inaccurate. We briefly illustrate the use of a Hessian function for the Brockett RProblem.
Treating the optimization variable vec(X) as a q-dimensional vector, the q×q Hessian function ∇ 2 f (vec(X)) is programmed by coding the action [∇ 2 f (vec(X))]η for a given η in the tangent space to the manifold at X.

Further detail on the implementation of this action for either line search or trust region solver algorithms is
given in (Huang et al., 2016) and (Absil et al., 2008). For the Brockett problem, this becomes the matrix multiplication 2(D ⊗ B)η. For an RProblem, this expression can be specified as the third argument of the constructor. We note that the efficiency of the h function can be greatly improved by avoiding direct computation of D ⊗ B, but proceed with this simple coding to facilitate the demonstration.
ROPTLIB provides a diagnostic to check correctness of the gradient and Hessian. It produces two outputs: the first uses the starting value, and the second uses the solution obtained from the optimizer.
If the quantity (fy-fx)/<gfx,eta> is approximately 1 for some interval within the first output, it is an indication that the gradient function is correct.

Coding a Problem in C++
For optimization problems which require intensive computation or will be evaluated repeatedly, it may be worth investing some additional effort to write the problem in C++. We now illustrate by converting the Brockett problem to C++. To proceed, we extend the VectorManifoldOptimProblem class within Mani-foldOptim. A VectorManifoldOptimProblem has objective, gradient, and Hessian functions coded in C++ using RcppArmadillo. The user implements this problem type by extending the abstract VectorManifoldOptimProblem class.
The class BrockettProblem can be written as follows.  .constructor<mat,mat>() .method("objFun", &BrockettProblem::objFun) .method("gradFun", &BrockettProblem::gradFun) .method("hessEtaFun", &BrockettProblem::hessEtaFun) .method("GetB", &BrockettProblem::GetB) .method("GetD", &BrockettProblem::GetD) ; } ') Note that we have used sourceCpp to compile the source code as a string from within R. It is also possible to write the C++ code in a standalone file and compile it using sourceCpp, or to have it compiled within the build of your own custom package; refer to the Rcpp documentation for more details. In addition to defining the BrockettProblem class, our source code defines a module which allows BrockettProblem objects to be constructed and manipulated from R. Specifically, the constructor, objective, gradient, Hessian, and accessor functions for B and D can be called from R.
The tx function is responsible for reshaping the vector x into variables which can be used to evaluate the objective, gradient, and Hessian functions. In this case, the optimization variable is a single matrix, so we simply reshape vector x into the matrix X using the Armadillo reshape function. If x were to contain multiple variables, the tx function could be modified to have multiple output arguments in its definition.
To define a BrockettProblem we must specify the B and D matrices via the constructor. Using the same B and D as in the previous section, we can invoke the constructor through R.

prob <-new(BrockettProblem, B, D)
Setting the manifold definition, manifold configuration, and solver configuration is done in the same way as before.

Max_Iteration = 1000, IsCheckParams = TRUE)
We use the same initial value as before. Thanks to our Rcpp module, we can call the problem functions from R, which makes them easier to test. We may now invoke the solver.

tx <-function(x) { matrix(x, n, p) } head(tx(res$xopt))
The result is close to the one obtained in the previous section and is therefore not shown here.

An Extra C++ Convenience for Functions of a Single Matrix
For problems where the optimization variable is a single matrix (e.g. optimization on a Grassmann manifold), users may extend the class MatrixManifoldOptimProblem. Doing this is very similar to extending VectorManifoldOptimProblem, but gives the convenience of having the optimization variable presented to problem functions as a matrix, partially avoiding the need for reshaping code.

Product Manifold
A product of manifolds can be used to optimize jointly over multiple optimization variables. As a demonstration, consider observing a random sample Y 1 , . . . , Y n from a p-variate normal distribution with mean µ constrained to the unit sphere and symmetric positive definite variance Σ. The maximum likelihood estima-tors are obtained as (μ, Σ) = arg max To code this problem, let us first generate an example dataset.
set.seed (1234) n <-400 p <-3 mu.true <-rep(1/sqrt(p), p) Sigma.true <-diag(2,p) + 0.1 y <-mvtnorm::rmvnorm(n, mean = mu.true, sigma = Sigma.true) Next we define the objective, and a function tx which reshapes a dimension p+p 2 vector into a p-dimensional vector and a p × p matrix. We have used an RProblem so that the objective function can be specified as an R function. The negative of the log-likelihood has been given as the objective to achieve a maximization. We have not specified gradient or Hessian functions so that numerical approximations will be used by the solver. A product manifold definition is now constructed for the problem using unit sphere manifold S p and SPD manifold S + p definitions. We also specify options for the manifold and solver.

Customizing Line Search
Line search algorithms are analogous to the method of gradient descent. As described in detail in (Absil et al., 2008), they are based on an update where η k ∈ R n is the search direction and t k ∈ R is the step size. Determining a step size on the curved surface of a manifold requires special consideration beyond what is needed for Euclidean space. The concept of a retraction mapping is employed to move in the direction of a tangent vector while staying on the manifold.
Several line search algorithms are available in the ROPTLIB library, along with options to customize the search. Some of these options can be configured through the R interface.
As an example, consider the product manifold example from Section 3.4. Several line search parameters are specified below. In particular, LineSearch LS = 1 corresponds to the Wolfe line search method.
> res <-manifold.optim(prob, mani.defn, method = "LRBFGS", mani.params = mani.params, solver.params = solver.params, x0 = x0) Observe that the selected line search options are reported in the output. The full list of available line search options can be found in Huang et al. (2016). At this time, a user-defined line search algorithm (LineSearch LS = 5) cannot be specified via R, but support will be considered for a future release of ManifoldOptim.

Examples with Statistical Methods
This section demonstrates the use of ManifoldOptim in the three main methodologies presented in section 2: MADE, PFC, and envelope regression.

Minimum Average Deviance Estimation
We consider an application of MADE, which was summarized in section 2.1, on a fishing dataset. The data are adapted from Bailey et al. (2009), who were interested in how certain deep-sea fish populations were impacted when commercial fishing began in locations with deeper water than in previous years. The dataset has 147 observations and three continuous predictors: density, log(meandepth), and log(sweptarea) which are, respectively, foliage density index, the natural logarithm of mean water depth per site, and the natural logarithm of the adjusted area of the site. The response totabund is the total number of fish counted per site. Our goal is to find a sufficient reduction of the three predictors to capture all regression information between the response and the predictors. To apply MADE to the fishing data, we assume a Poisson family for (2).
A full MADE program is included in the supplemental material.
Step 2(b) of the iterative algorithm from section 2.1 is a Stiefel manifold optimization and can be carried out with ManifoldOptim. We make use of the RCG solver along with code for the objective function and its derivative with respect to B; these codes are based on the overall MADE objective (4).
The observed response is plotted against the fitted MADE response with d = 1 in Figure 1(a), and using the Poisson generalized linear regression model in Figure 1(b). Ideally, these points would follow a 0-intercept unit-slope line (plotted as the dashed line). The observation with y = 1230 appears to be an outlier under the estimated model. q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q

Principal Fitted Components
We now consider a simulation study using the PFC model from section 2.2 to evaluate the performance of ManifoldOptim. The datasets were generated using n = 300 observations with response Y ∈ R n from the normal distribution with mean 0 and variance 9. The matrix of basis functions F ∈ R n×2 was obtained as (Y, |Y|) and is column-wise centered to have sample mean 0. We obtained Γ as two eigenvectors of a p × p generated positive definite matrix. The data matrix of the predictors X ∈ R n×p was obtained as X = FΓ T +E, with error term E ∈ R n×p generated from the multivariate normal distribution with mean 0 and variance ∆.
Two structures were used for the covariance ∆. The first was an unstructured covariance matrix ∆ u = U T U where U is a p × p matrix with entries from the standard normal distribution. The second was an envelope structure ∆ e = ΓΩΓ T + Γ 0 Ω 0 Γ 0 with Ω ∈ R 2×2 and Ω 0 ∈ R (p−2)×(p−2) . The value of Γ is obtained from the first two eigenvectors of the unstructured ∆. The remaining eigenvectors are used for the value of Γ 0 .
These two structures correspond to the two models discussed in section 2.2. The likelihood (7) corresponds to the unstructured ∆ u while the likelihood function (6) corresponds to the envelope structure ∆ e . The parameters are then (Γ, ∆) in the unstructured PFC case, and (Γ, Ω, Ω 0 ) for the envelope PFC. We note that in the case of the envelope structure, Γ 0 is the orthogonal completion of Γ so that ΓΓ T + Γ 0 Γ T 0 = I. Estimation methods for these two PFC models exist. Cook and Forzani (2009) provide closed-form solutions for estimation of Γ and ∆ in the unstructured case. For the envelope PFC, Cook (2007) provided a maximum likelihood estimation over a Grassmann manifold for Γ while Ω and Ω 0 have a closed-form that depends on Γ and Γ 0 . The goal here is to evaluate the optimization and compare its performance to the closed-form solutions. Moreover, a product manifold is used so that the estimation of (Γ, ∆) is done once instead of a typical alternating procedure that alternates between holding certain values constant and optimizing over them.
To proceed, we coded the objective and gradients functions under both setups. For each dataset generated in the simulation, the closed-form solutions were obtained and compared to the solutions obtained via ManifoldOptim. We compared the true parameter Γ to the estimate Γ using the subspace distance ρ(Γ, Γ) = (I − Γ Γ T )Γ suggested by Xia et al. (2002). The true and estimated covariance were compared using d(∆, ∆) = Trace{(∆ − ∆) T (∆ − ∆)}. One hundred replications were used for the simulation. The estimated mean distances d(∆, ∆) and ρ(Γ, Γ) are reported in Tables 2 and 3 with their standard errors in parentheses.
The results indicate that both forms of optimization perform comparably. Thus, the product manifold in ManifoldOptim provides an alternative form of optimization which can ease implementation. Also note that the difference between the true and estimated covariance is significantly less for the envelope-∆ PFC, since there are less parameters to estimate in this case. Optimization was carried out using Σ and Σ res as initial values for ∆; the results were similar to the classical closed form method. The full code is provided as a supplement to this article.

Envelope Models
We consider a simulation in (Cook and Zhang, 2015) to compare the results using ManifoldOptim to results using the three algorithms in Cook and Zhang (2015) during a logistic regression. We generated 150 independent observations using Y i | X i ∼ Bernoulli(logit −1 (β T X i )), with β = (β 1 , β 2 ) T = (0.25, 0.25) T and X i ∼ N (0, Σ X ). We let v 1 = span(β) be the principal eigenvector of Σ X with eigenvalue 10 and let the second eigenvector v 2 be the orthogonal completion with eigenvalue of 0.1 such that v 1 v T 1 + v 2 v T 2 = I. This example is compelling because of the collinearity during construction of Σ X . Consequently, this causes poor estimates when using a standard generalized linear model (GLM). Three algorithms are considered in this section, namely GLM, Algorithm 1 in Cook and Zhang (2015), and a product manifold optimization. Both Algorithm 1 and the product manifold are constructed using functionality in ManifoldOptim. In the case of Algorithm 1, a Grassmann optimization is performed along with Fisher scoring in an iterative fashion. During the product manifold optimization the Grassmann manifold is optimized for Γ; η and α are estimated over the Euclidean manifold. Numerical derivatives are calculated with ManifoldOptim instead of using the analytical expression from Cook and Zhang (2015) since the latter pertains only to Grassmann manifold optimization. In all cases the LRBFGS (Huang et al., 2015b) algorithm is used for optimizing over manifolds with the objective function given by equation 3.7 in Cook and Zhang (2015). To avoid poor local maxima, the optimization was repeated several times from random starting points. Out of the many estimates at convergence, the one with the largest likelihood value was selected.
The full code is provided as a supplement to this article.
By examining the results shown in Table 4 it is immediately clear that the collinearity causes serious issues with classical methods of estimation such as GLM. The estimates in this case are not even close to the true values. However, both of the other methods perform well. Similar to Section 4.2, also note that the alternating optimization method shown in Cook's Algorithm 1 provides comparable results to the product manifold generated using ManifoldOptim. Since the product manifold is easier to implement, it can be a viable alternative.

Conclusions
We have presented ManifoldOptim, an R package for optimization on Riemannian Manifolds. ManifoldOptim fills a need for manifold optimization in R by making the functionality of ROPTLIB accessible to R users. The package so far deals with optimization on the Stiefel manifold, Grassmann manifold, unit sphere manifold, the manifold of positive definite matrices, the manifold of low rank matrices, the orthogonal group, and Euclidean space. We discussed basic usage of the package and demonstrated coding of problems in plain R or with C++. Furthermore, we demonstrated the product space of manifolds to optimize over multiple variables where each is constrained to its own manifold. Optimization over manifold-constrained parameters is needed for emerging statistical methodology in areas such as sufficient dimension reduction and envelope models.
We have discussed several applications of such models, and demonstrated ways in which ManifoldOptim could practically be applied. We hope that availability of this package will facilitate exploration of these methodologies by statisticians and the R community.