GPareto : An R Package for Gaussian-Process Based Multi-Objective Optimization and Analysis

The GPareto package for R provides multi-objective optimization algorithms for expensive black-box functions and an ensemble of dedicated uncertainty quantiﬁcation meth-ods. Popular methods such as Eﬃcient Global Optimization in the mono-objective case rely on Gaussian processes or Kriging to build surrogate models. Driven by the prediction uncertainty given by these models, several inﬁll criteria have also been proposed in a multi-objective setup to select new points sequentially and eﬃciently cope with severely limited evaluation budgets. They are implemented in the package, in addition with Pareto front estimation and uncertainty quantiﬁcation visualization in the design and objective spaces. Finally, it attempts to ﬁll the gap between expert use of the corresponding meth-ods and user-friendliness, where many eﬀorts have been put on providing graphical post-processing, standard tuning and interactivity.


Introduction
Numerical modeling of complex systems is now an essential process in fields as diverse as natural sciences, engineering, quality or economics.Jointly with modeling efforts, methods have been developed for the exploration and analysis of corresponding simulators, in particular when runs are time consuming.A popular approach in this case is to rely on surrogate models to alleviate the computational expense.Many surrogate models are used in practice: polynomials, splines, support vector regression, radial basis functions, random forests or Gaussian processes (GP).They may be integrated in various optimization strategies, see e.g., Wang and Shan (2007), Santana-Quintero, Montano, and Coello (2010), Tabatabaei, Hakanen, Hartikainen, Miettinen, and Sindhya (2015) and references therein.We focus here on GP-based strategies, which have been recognized as very well-suited for sequential designs of experiments, and in particular in an optimization context (Jones, Schonlau, and Welch 1998;Jones 2001).
The GPareto package proposes Gaussian-Process based sequential strategies to solve multiobjective optimization (MOO) problems in a black-box, numerically expensive simulator context.More precisely, it considers the case of models with multiple outputs, y (1) (x), . . ., y (q) (x) (where y (i) : X ⊂ R d → R), that are optimized simultaneously over a box-constrained domain X.Typically, outputs (or objectives) are conflicting (e.g., quality versus quantity, etc.), so there exists no solution where all objectives are minimized at once.The goal is then to iden-tify the set of optimal compromise solutions, called a Pareto set (Collette and Siarry 2003).Defining that a point x * dominates another point x if all its objectives are better (which we denote by x ⪯ x * in the following), the Pareto set X * is the subset of the non-dominated points in X: ∀x * ∈ X * , ∀x ∈ X, ∃k ∈ {1, . . ., q} such that y (k) (x * ) ≤ y (k) (x).
The image of the Pareto set in the objective space, y (1) (X * ), . . ., y (q) (X * ), is called the Pareto front, which is useful to practitioners to select solutions (see Figure 3 for an illustration).In practice, the Pareto set is usually not finite, and optimization strategies aim at providing a finite set that represents X * well.
In general, numerical optimization has motivated a substantial activity in the R community: see for instance the CRAN Task View on optimization and Mathematical Programming (Theussl and Borchers 2015) or the recent special Journal of Statistical Software issue (Varadhan 2014).However, most works are dedicated to mono-objective optimization with large budgets.For small budgets, the packages DiceOptim (Roustant, Ginsbourger, and Deville 2012;Ginsbourger, Picheny, and Roustant 2015) and tgp (Gramacy 2007;Gramacy and Taddy 2010) propose GP-based techniques, but for mono-objective problems only.There are a few packages on MOO in general: nsga2R (Tsou 2013), emoa (Mersmann 2012), mopsocd (Naval 2013), goalprog (Novomestky 2008) and mco (Mersmann 2014), which provide tools and algorithms such as NSGA-II (nondominated sorting genetic algorithm II) implementations (Deb, Pratap, Agarwal, and Meyarivan 2002) or hypervolume computations (see Section 2.2).As for methods available for expensive black-box functions optimization, the package SPOT (Bartz-Beielstein and Zaefferer 2012) seems to be the only alternative to GPareto.On the other hand, GP-based MOO has recently generated a substantial activity in the statistical and optimization communities, with focuses either on sampling strategies (Ponweiser, Wagner, Biermann, and Vincze 2008;Wagner, Emmerich, Deutz, and Ponweiser 2010;Svenson 2011;Emmerich, Deutz, and Klinkenberg 2011;Picheny 2015;Zuluaga, Sergent, Krause, and Püschel 2013) or on uncertainty quantification (Bhardwaj, Dasgupta, and Deb 2014;Calandra, Peters, and Deisenroth 2014;Binois, Ginsbourger, and Roustant 2015a).GPareto aims at filling this gap by making most of the recent approaches available in a unified implementation to both MOO experts and end-users.To this end, a substantial effort has been given to provide graphical visualization and standard tuning, and many entry-points ranging from high-level interfaces to specific method tuning have been made available.
GPareto is built upon the DiceKriging (Roustant et al. 2012) package dedicated to Gaussian process modeling.Several associated packages deal with various related problems, in particular DiceOptim (mono-objective optimization) and KrigInv (algorithms for inversion problems) (Chevalier, Picheny, and Ginsbourger 2014a,b).GPareto shares many aspects with those packages.This document is also available as a vignette in the package, which is available from the Comprehensive R Archive Network at https://CRAN.R-project.org/package=GParetoalong with the full PDF documentation.
The remainder of this paper reviews briefly the methods available in the package, describes important implementation aspects and functionalities, and provides illustrations through a few examples.

Principles of Gaussian-process based optimization
We recall here very briefly the scheme common to most GP-based (mono-or multi-objective) optimization, as in the famous EGO algorithm (efficient global optimization) proposed in the seminal article of Jones et al. (1998).

The mono-objective case
Let y be the output of the numerical model of interest and x ∈ R d the inputs to be optimized over.Considering for now that y is a scalar, it is assumed to be a realization of a Gaussian process F (x) with mean µ(x) and covariance c(x, x ′ ) known up to some parameters.
Step 1 Generate an initial set of n observations: y 1 = y(x 1 ), . . ., y n = y(x n ).Typically, {x 1 , . . ., x n } are chosen using a space-filling design.A classical rule-of-thumb is to use n = 10 × d.
Step 2 Fit the GP model to the data, by estimating the mean µ(x) and covariance c(x, x ′ ).Typically, a parametric form is assumed for those functions, whose parameters are adjusted, e.g., by using maximum likelihood estimation.The GP model is the distribution of Y (x) conditional on the observations y 1 , . . ., y n , with plugged-in mean µ and covariance c.
Step 3 A new point x n+1 is chosen as the maximizer of a so-called infill criterion which is based on the GP model.This step requires running an inner optimization loop to find the best point over R d .
Step 4 A new observation y n+1 = y(x n+1 ) is obtained by running the simulator and the GP model is updated by conditioning on y n+1 .At this step, the estimates of µ and c might be updated.
Steps 3 and 4 are repeated until the simulation budget is exhausted or when a stopping criterion is met.
There are many R packages to perform Step 1, see for instance planor (Kobilinsky, Bouvier, and Monod 2015), DiceDesign (Dupuy, Helbert, and Franco 2015), or lhs (Carnell 2016).For Step 2, GPareto relies on the DiceKriging package, which offers a choice of mean and covariance functions.The model parameters estimation is based on maximum likelihood, see Roustant et al. (2012) for details.
Step 3 defines the sampling strategy, as the infill criterion determines the balance between exploration (search for new solutions) and exploitation (local improvement around existing observations).The EGO algorithm is based on the so-called expected improvement (EI) criterion.The improvement is defined as the difference between the current minimum of the observations and the new function value, such that for a GP model, EI is the conditional expectation of the improvement provided by a new observation Y (x): which has a closed form expression (see Jones et al. 1998, for calculations).

Noisy objectives
In many optimization problems, the objective cannot be evaluated exactly but through a "noisy" procedure, that is, one only has access to measurements of the form f i = y(x i ) + ε i .A classical hypothesis, adopted here, is to assume independent Gaussian centered noise, that is: . GP modeling naturally adapts to this case (see for instance Ankenman, Nelson, and Staum 2010), and the package DiceKriging offers options to take noise into account.
However, the EGO algorithm may not be used directly; Picheny, Wagner, and Ginsbourger (2013) provides a review of the extensions that have been proposed to handle noisy objectives.Within those, the reinterpolation approach of Forrester, Keane, and Bressloff (2006) is attractive, since it amounts to building a secondary noiseless GP that can be directly used with EGO.As showed in Koch, Wagner, Emmerich, Bäck, and Konen (2015), this approach can be readily applied to the multi-objective case, and is implemented in GPareto.

The multi-objective case
When multiple objectives are considered (y has values in R q ), Steps 2 and 3 need to be modified.Let us remark first that it is possible to go back to a scalar problem and apply standard methods, for instance by relying on objectives aggregation (Knowles 2006;Zhang, Liu, Tsang, and Virginas 2010) or modeling desirability functions (Henkenjohann and Kunert 2007).However, these have been found to be relatively poor solutions in practice (Henkenjohann and Kunert 2007;Svenson 2011).
GPareto focuses on approaches where GP models are fitted independently to each objective.Although it is possible to account for correlation between the different objectives, for instance using co-kriging models (e.g., Álvarez, Rosasco, and Lawrence 2011), experimental results in Svenson (2011); Kleijnen and Mehdad (2014) suggested that it provides little benefit compared to the additional complexity.
Choosing infill points from a set of GP models is a complex question (see Section 2.2).Within GPareto, we focus on approaches that compute a single infill criterion from the list of models.Hence, Step 3 is identical to the mono-objective case, provided that an adequate infill criterion is used.

Review of surrogate-based and Bayesian multi-objective optimization
In the mono-objective case, the expected improvement criterion evaluates the potential gain of an additional point in terms of the expected decrease over the best observation so far.In a similar fashion, a multi-objective improvement function can be defined by estimating the expected "progress" brought by a new observation (relatively to the current set of nondominated observations P n ).This leaves room to put the focus either on a good coverage, on extremities, or on convergence toward the actual Pareto front, for which specific metrics, such as the hypervolume or epsilon indicators, have been proposed (see e.g., Svenson 2011;Emmerich et al. 2011).Specifically, the hypervolume improvement is the increment of the volume contained between the Pareto front and a reference point in the objective space, when a non-dominated point is added.The epsilon increment is the smallest scalar that must be added to components of a new point (in the objective space) such that it is dominated by the current Pareto front.An illustrative example is given in Figure 1.In terms of epsilon improvement, the green point is more interesting as it is farther away from the Pareto front, but the blue point is better in terms of volume increment.
These indicators, among others, have been used to define generalizations of the expected improvement.Empirical comparisons showed the clear superiority of some approaches to others (Svenson 2011;Wagner et al. 2010), but no global consensus on a particular improvement function.In GPareto, two infill criteria derived from this point of view are available: the expected hypervolume improvement (EHI, Emmerich et al. 2011) and expected maximin improvement (EMI, Svenson and Santner 2016, related to the epsilon indicator).See the corresponding references for the technical details.
Two alternatives have been included in GPareto as well.First, in the SMS-EGO approach (S-metric selection EGO, Ponweiser et al. 2008;Wagner et al. 2010), the improvement is calculated as the hypervolume added to the current Pareto front by the lower confidence bound of the prediction at x, hence it is closely related, but not equal to the EHI.To avoid large plateaus of zero improvement, an adaptive penalization is provided in regions where the lower confidence bound is dominated.
Finally, the stepwise uncertainty reduction (SUR) criterion of Picheny (2015) is in turn concerned with the probability of non-domination (also called probability of improvement), that is, the probability of a point not to be dominated by the current Pareto set: P [x ̸ ⪯ X n ].Intuitively, regions in the design space with non-null probabilities indicate a potential improvement for the Pareto front, and the improvement considered is the reduction of the average of this probability over the design space.
These sequential infill criteria share the common trait that they do not provide a continuous representation of the Pareto front but only consider the current set of non-dominated observations.This point is addressed in the following with a quantification of the uncertainty on both the Pareto set and front.

Uncertainty quantification
With limited evaluation budgets, the non-dominated solutions in the objective and variable spaces may not give a very precise or dense approximation of the Pareto front and set.However, the Gaussian process framework allows us to overcome this limitation by providing an uncertainty quantification of the optimization results.

Pareto front (objective space)
One straightforward idea is to use the surrogate models to give an estimate of the Pareto front, as is done e.g., in Calandra et al. (2014).While being fast, this approach is very dependent on the quality of the surrogates and there is no measure of uncertainty associated.
In Binois et al. (2015a), an alternative relying on conditional simulations of Gaussian process models is detailed, which provides an estimate of the Pareto front and an associated measure of uncertainty.
In short, it exploits the capacity of the GP models to generate different possible realizations N , . . ., S (q) N for the outputs conditioned by the observations, i.e., conditional simulations, see Figure 2.For each path, a Pareto front is obtained (say, P (1) , . . ., P (N ) ).Then, the set of fronts are used to define an average set P estimating the true Pareto front while the deviation from this set is used as a measure of uncertainty.Note that handling sets of conditional Pareto fronts as performed in Binois et al. (2015a) requires the use of random closed sets theory (Molchanov 2005); in particular, the estimator and uncertainty measure used are the Vorob'ev expectation and deviation, respectively.Visually, representing the deviation for each random Pareto front directly illustrates which parts of the Pareto front are precisely known or not (see Figure 5).As described in Binois et al. (2015a), the current version of this approach requires conditional simulations on discrete sets of inputs (for instance, a grid or a space-filling sample, which is the solution adopted in GPareto, see Section 3.3).This set must be large to ensure that no important potential solution is missed, which makes this approach computationally intensive.

Pareto set (variable space)
In a similar fashion, returning a smooth estimate of the entire Pareto set X * may be useful to practitioners.We propose here to rely on two complementary approaches.
First, conditional simulations can be used here: From each set of GP realizations, the Pareto set X * i can be obtained.Then, the sets X * 1 , . . ., X * N can be used to estimate a density, e.g., using kernel density estimation.
A complementary measure is the probability for a given point in the variable space to be nondominated by the current set of observations, P [x ̸ ⪯ X n ].This probability can be expressed in closed form (Keane 2006), so that it can be computed on a grid for instance to display the dominated and non-dominated regions in the variable space.The amount of intermediate probability values (not zero or one) quantifies the uncertainty on the Pareto set (Picheny 2015).
Note that both approaches require extensive sampling over the design space, which makes them computationally intensive.

Architecture
The structure of the package reflects its main orientations: multi-objective optimization and associated quantification of uncertainty.In particular, readers familiar with the DiceOptim and KrigInv packages will find a very similar set of functions ranging from high-level interfaces to lower level criteria.Additional helper functions are also provided as well as test functions.

Functions related to sequential design of experiments
As described in Section 2.1, Gaussian-process based optimization can be separated in four steps.Depending on the characteristics of the problem at hand, several levels of control are available.For the sake of clarity, we start by describing the highest-level functionalities before detailing routines that enable more control on the optimization process or may be integrated in other procedures.

User-friendly wrapper: easyGParetoptim
This is a simple interface to multi-objective optimization that perform all steps described in Section 2.1, which does not require much knowledge on the specificities of Gaussian process based optimization.If no additional control parameters are set, all Steps 1-4 are performed according to default values.
The minimal arguments for easyGParetoptim are the following, common with many optimization methods in R such as optim: • fn, this is the multi-objective function that returns the values of the objectives at a given design; • budget, the maximal number of evaluations of the expensive black-box function fn; • lower, upper, vectors giving the limits of the domain for optimization.A design of experiments may be passed using the argument par and corresponding values provided with values; otherwise a maximin LHS design is constructed from DiceDesign.
Noisy objectives can be handled with the argument noise.var,which stands for the noise variance.We assume here that the user has prior knowledge of the variance.The two main options are to provide a vector of size q (constant noise) or a function (same arguments as fn) if the noise depends on x.Additional tuning of the inner procedures are available using the control list, in particular the criterion (method) and the optimization routine of the acquisition function (inneroptim).By default, easyGParetoptim uses SMS as criterion, with pso as inner-optimization routine.Both choices have been made to favor speed while ensuring robustness.They are also the default choice for the following GParetoptim routine.

GParetoptim
This function handles Steps 3 and 4, hence assuming that users have performed a design of experiments and built surrogate models at their convenience, which they provide in the argument model.Besides fn, lower, upper and noise.varshared with easyGParetoptim, more parameters are directly exposed, such as crit for selecting the infill criteria or cov.reestim to decide whether or not hyperparameters are updated after adding new observations.More flexibility is given using control parameters, optim_control for the optimization of the infill criterion and crit_control for parameters of this latter, that are useful for the following crit_optimizer function.

crit_optimizer
Optimizing the criteria, a.k.a.acquisition functions, is quite complicated due to their multimodality: see Figure 5 for an illustration.Besides, in general, no derivative expressions are available and there are large plateaus.On top of that, the attraction basin of the global optimum of the infill criterion may have a very small volume in the variable space (see Roustant et al. 2012, for the illustration of this problem).Nonetheless, acquisition functions are typically much cheaper to evaluate than the objective functions and intensive optimization can be carried out.
Three solutions to perform this inner optimization are provided in GPareto: 1. the user can provide a set of candidate points with optimcontrol in crit_optimizer and GParetoptim (hence reducing the problem to a discrete search); 2. the default optimization routine is genoud (Mebane and Sekhon 2011), a genetic algorithm; 3. the psoptim optimization method (Bendtsen 2012), a particle swarm algorithm is also provided; and the corresponding tuning parameters may be passed to optimcontrol.Passing any other optimization method is also possible, given that it works as the standard optim method in R from package stats (R Core Team 2015).

Criteria functions
Four criteria are available in GPareto 1.1.6: • crit_SMS for the SMS-EGO criterion (Ponweiser et al. 2008;Wagner et al. 2010) (based on the MATLAB source code of the authors); • crit_EHI for the expected hypervolume improvement criterion (Emmerich et al. 2011) (based on the MATLAB source code of the authors for the bi-objective case); • crit_EMI for the expected maximin improvement criterion (Svenson and Santner 2016;Svenson 2011); • crit_SUR for the expected excursion volume Reduction criterion (Picheny 2015).
The crit_SMS criterion has an analytical expression for any number of objectives while the one for crit_EHI is only for the bi-objective case.There is a semi-analytical1 formula for crit_EMI for two objectives.Note that the formula for crit_EHI is coded using Rcpp (Eddelbuettel and François 2011; Eddelbuettel 2013), which offers considerable speed-up over an R implementation.
With m > 2, computations of crit_EMI and crit_EHI rely on sample average approximation (SAA) (Shapiro 2003), as proposed e.g., in Svenson (2011).The principle is to take samples from the posterior distribution of Y(x), i.e., Y(x) (1) , . . ., Y(x) (p) , and take the average of the improvement function over these samples: ).Note that a large sample size p is often needed to obtain a good approximation, which is at the cost of computational time.By default, the number of SAA samples nb.samp is set to 50.
crit_SUR requires integrating some quantities over the design space X, which must be done numerically, making this criterion computationally intensive.Similarly to the KrigInv package (Chevalier et al. 2014a), several alternatives to select integration points are provided using the function integration_design_optim, including uniformly distributed random points, quasi Monte Carlo sequences, as well as importance sampling schemes (as described in Picheny 2015).For now crit_SUR is available for two and three objectives.
In terms of complexity, both crit_EHI with m > 2 and crit_SMS use hypervolume computations provided in the emoa package (much more frequently for the first one, which is thus slower).Those have an exponential complexity in the number of objectives and also depend on the number of points in the Pareto front.For crit_EMI the complexity mainly depends on the number of sample points for the SAA approximation and linearly in the number of objectives, it is more affordable than crit_EHI for more than two objectives.For crit_SUR, the complexity is essentially related to the integration over the input domain which can become cumbersome with many variables.
Importantly, except for crit_SUR, these criteria depend on the relative scaling of the objectives, i.e., multiplying one objective by a constant modifies the results.Scaling may be performed by the user, e.g., from the maximum and minimum values observed for each objective as in Parr (2012) or Svenson (2011).In addition, crit_EHI and crit_SMS need a reference point for bounding hypervolume computations.If no reference point is given by the user, with refPoint, we set it to R i = max

User-friendly wrapper: plotGPareto
Results given by easyGParetoptim or GParetoptim can be visualized using the plotGPareto function.The default output of this function is to display only the points visited during optimization along with optimal points.Depending on the number of objectives, the Pareto front approximation is a simple plot (two objectives), a perspective view of the Pareto front (three) or a representation in parallel coordinates (Inselberg 2009) (more than three).
Then, three different outputs are possible to improve insight on the algorithm results.These can be obtained either by setting some options of plotGPareto or directly by calling the corresponding functions: • an estimation of the Vorob'ev expectation giving the expected location of the Pareto front along with a visualization of the corresponding uncertainty (option UQ_PF = TRUE or with CPF and plotSymDevFun); • an estimation of the density of Pareto optimal points in the variable space (option UQ_dens = TRUE or with ParetoSetDensity); • a visualization of the probability of non-domination in the variable space (option UQ_PF = TRUE or with plot_uncertainty).

Uncertainty quantification on Pareto front
The entry function is the creator of the 'CPF' class (for conditional Pareto front), which deals with computing the probability for a target in the objective space to be dominated, also known as the attainment function, Vorob'ev expectation (VE) and Vorob'ev deviation (VD), from a grid discretization.It takes as main arguments: • fun1sims, fun2sims, the sets of conditional simulations for both objectives, that can be computed for instance using the simulate function of DiceKriging; • response, the known objective values.
The empirical attainment function is calculated on a grid in the objective space from the CPF sets given by the conditional simulations.Taking advantage of the regularity of the grid to compute volumes, the Vorob'ev expectation is computed quickly by dichotomy.Then the Vorob'ev deviation is a sum of hypervolume indicator values.The plot method applied to CPF objects displays the attainment function in gray-scale, and possibly the VE.In addition, the plotSymDefFun function can be used to display the spread of conditional simulations of Pareto fronts around the Vorob'ev expectation.See Binois et al. (2015a) for details.

Uncertainty quantification on Pareto set
The function plot_uncertainty, based on the print_uncertainty_nd function of the Krig-Inv package Chevalier et al. (2014a), draws contour lines of the probability of non-domination.
In dimension larger than two, contour lines are drawn for each couple of two variables representing either the average, maximum or minimum of the probability over the other variables.
The function ParetoSetDensity relies from one end on conditional simulations of the objectives given by the simulate function of DiceKriging, and on the other end on a kernel density estimation of the probability of belonging to the Pareto set.It returns an object of class 'kde' from the package ks (Duong 2016).This object can be displayed in small dimension (which is done by plotGPareto), or may be used to sample points.

Search for target designs
Finally, GPareto allows the user to search for additional points corresponding to a particular target in the objective space.Given a target point (for instance, a location along the estimated Pareto front based on the Vorob'ev expectation), the function getDesign returns the closest design, that is, the design that maximizes the probability of dominating the target in the variable space.This step requires running an optimization algorithm, which can be tuned similarly to crit_optimizer using an optimcontrol argument.

Fast objectives
Motivated by applications where some objective functions are computable at a negligible cost compared to other objectives, GPareto offers an option for MOO in case of co-existing cheapand expensive-to-evaluate objectives.As an example, in structural mechanics one objective is typically the mass (which is directly derived from the design variables) and the other depends on the response of the system, hence involving a finite element model.To ensure compatibility with the infill criteria, fast objectives are wrapped in the 'fastfun' class which mimics the behavior of methods such as predict or update.Then predicting the value at a new point amounts to evaluating the fast function, which returns the corresponding value with a zero prediction variance, exactly like what happens for already evaluated points.They may be used with the cheapfn argument in easyGParetoptim, GParetoptim and crit_optimizer.

Numerical stability
Another computational challenge with kriging, discussed, e.g., in Roustant et al. (2012), is the numerical non invertibility of covariance matrices.It usually happens whenever design points are too close.This is especially troublesome in optimization since, when converging, points are likely to be added close to each other2 .In GPareto, preventing this problem is achieved with the checkPredict function.Before evaluating the selected criterion, checkPredict tests whether the new point x is too close to existing ones, with a tunable threshold that can be passed as argument.Three options are available to define when designs are considered as "too close": • minimal Euclidean distance in the input space: min • ratio of the predictive variance s n (x) 2 over the variance parameter for stationary kernels; • minimal canonical distance coupled with k n : min The first two options are also used in KrigInv and DiceOptim respectively.The first one is less computationally demanding but also less robust.
Moreover, to improve stability of the update of already existing models with new observations, it is possibly attempted twice.First, an update with re-estimation of the hyperparameters is performed.Then, if it has failed, a new update is tested with the old hyperparameters.If this is still insufficient to train the model with all observations, the user may try to remove some points or apply the jitter technique consisting in adding a small constant to the diagonal of the covariance matrix to improve its condition number, see e.g., Roustant et al. (2012).
Replacing two close observations by one observation and its estimated directional derivative as proposed in Osborne ( 2010) is another appealing solution.

Illustrating examples using GPareto
This section shows the different functionalities of GPareto on three classical toy examples.

Two objectives, unidimensional example
We consider the following simple 1-dimensional bi-objective optimization problem from the literature, see e.g., Van Veldhuizen and Lamont (1999), re-scaled to [0, 1], to illustrate the different steps of the procedure and the key concepts of GP-based multi-objective optimization: We first define the initial design of experiments (design.init,six points evenly spaced between zero and one) and compute the corresponding set of observations response.init,which we use to build two kriging models with DiceKriging's km function and put them into a single list (model): R> design.init<-matrix(seq(0, 1, length.out= 6), ncol = 1) R> response.init<-MOP2(design.init)R> mf1 <-km(~1, design = design.init,response = response.init[,1]) R> mf2 <-km(~1, design = design.init,response = response.init[,2]) R> model <-list(mf1, Then, we call the main function GParetoptim to perform seven optimization steps using the EHI criterion.Note that EHI requires a reference point as a parameter, which corresponds to an upper bound for each objective (here [2, 2], if not provided, it is estimated at each iteration, see Section 3.2.4).The other mandatory inputs are the GP models model, the objective function fn, number of steps (nsteps) and the design bounds (lower and upper).
We use this example to show three important features of the package: • the possibility to access different steps of the EGO strategy, • the use of 'fastfun' objects and, • the post-processing functionalities.
On this analytical example, it is possible to display the true Pareto front and set using the plotParetoGrid function: R> plotParetoGrid(P1) The graphical output is shown in Figure 4. Now, we call directly the function crit_optimizer to choose the next point to evaluate using the SUR criterion.Here, the optimcontrol input is used to choose the genoud algorithm for the criterion optimization.The critcontrol input allows us to choose the integration points for the criterion, here a regular 21 × 21 grid.
In Figure 5, we show the initial set of observations and the next point to evaluate according to each setup.For illustration purposes, the contour lines of the criteria are also computed.We see that using the 'fastfun' object (hence, additional information), the SMS criterion points clearly to a narrower region, which is in addition quite different from the ones given by the other setup.On both cases, the inner optimization loops successfully find the global maxima of the criteria surfaces.The red crosses show the optimal sampling points according to the criteria, found using genoud (left) and pso (right), respectively.Now, we apply two (for vignette building speed) steps of SUR, first with two regular objectives, then with the fastfun setting: R> sol <-GParetoptim(model = model, fn = fun, crit = "SUR", nsteps = 2, + lower = c(0, 0), upper = c(1, 1), optimcontrol = list(method = "pso"), + critcontrol = list(SURcontrol = list(distrib = "SUR", n.points = 40))) R> solFast <-GParetoptim(model = list(mf1), fn = fun1, cheapfn = fun2, + crit = "SUR", nsteps = 2, lower = c(0, 0), upper = c(1, 1), + optimcontrol = list(method = "pso"), + critcontrol = list(SURcontrol = list(distrib = "SUR", n.points = 40))) Then, we generate the post-treatment processes using plotGPareto.The graphical outputs are given in Figure 6.Optional parameters f1lim and f2lim are used to fix bounds for the top graphs to allow better comparison.First, we see the interest of using the 'fastfun' class when some objectives are cheap to compute: The Pareto front obtained this way is much more accurate (Figure 6, top), in particular for low values of the second objective.
Interestingly, the two Vorob'ev expectations are similar, and provide a very good prediction of the actual Pareto front (Figure 4), except for the lowest values of the first objective.However, the Vorob'ev deviations (gray areas) show a higher local uncertainty for this part of the front.Overall the Vorob'ev deviation values (394 and 296, respectively) indicate a substantially better confidence on the predicted Pareto front using fastfun.
The probability and density plots (Figure 6, second and third rows, respectively) provide complementary information on the Pareto set (input space).The probability plots indicate interesting (white) and uninteresting (black) regions, as well as uncertain ones (gray), but do not provide a clear insight on the Pareto set.Here, on both cases, the large gray areas show that additional observations may be beneficial, which is consistent with the large difference between the current Pareto front and the Vorob'ev expectation (Figure 6,top).On the other hand, the densities provide a rather accurate estimates of the Pareto set, in particular for the fastfun setup.
Finally, one may want to extract points from the Vorob'ev expectation of the Pareto front (that is, the input realizing a particular trade-off) that have not been observed yet.To this end, the getDesign function returns the most probable design given a target in the objective space, and can be called as follow: Here, we have chosen a target [55, −30] that is on the Vorob'ev expectation, where the uncertainty is small but where no observation is near (Figure 6, top left).The getDesign output is a list with the value of the design (par), the value of the criterion, i.e., the probability that the newPoint objective is not dominated by the target) (value, here 90%) and the GP prediction of each objective with the associated uncertainty ((mean), (sd) and confidence intervals).Here, the value of the second objective reaches the target with large confidence, but the first objective value is quite uncertain.

Four variables, three objectives
Here we consider the DTLZ2 optimization problem (Deb et al. 2005)  This time we simply use easyGParetoptim to solve the problem without having to train or prepare models.
R> res <-easyGParetoptim(fn = DTLZ2, budget = 50, lower = rep(0, 4), + upper = rep(1, 4)) Then, we visualize the output using plotGPareto.Note that with dimensions larger than two and more than two objectives, only the Pareto front visualization and the probability plots are available.For the latter, we changed the grid size parameter (resolution) and the number of integration points (nintegpoints) to avoid overly costly figures.
Then, we visualize the output using plotGPareto.Note that with dimensions larger than two and more than two objectives, only the Pareto front visualization and the probability plots are available.For the latter, we changed the grid size parameter(resolution) and the number of integration points (nintegpoints) to avoid overly costly figures.
The graphical outputs are shown in Figure 7. From the definition of DTLZ2, the optimal value for both x 3 and x 4 is 1/2.This is clearly visible on the probability of non-domination graphs: The (x 3 , x 4 ) surface (bottom right) is unimodal with its maximum at (0.5, 0.5), the other graphs show a ridge at 0.5 for one of the variables.From this representation, optimal sets for x 1 and x 2 are more difficult to observe.

Figure 1 :
Figure 1: Comparison of additive-epsilon (left, arrows) and hypervolume (right, filled areas) improvements for two possible new observations (green and blue) to the current Pareto front (red points).The reference point for hypervolume computations is the black crossed circle.In terms of epsilon improvement, the green point is more interesting as it is farther away from the Pareto front, but the blue point is better in terms of volume increment.

Figure 2 :
Figure 2: Left and center: three conditional (i.e., interpolating at observations) simulations of objectives f 1 and f 2 , respectively, based on GP modeling.Right: corresponding images in the objective space.Pareto sets and fronts are shown in bold.

Figure 3 :
Figure 3: Summary of the optimization procedure on the 1-dimensional example.Top: objective functions are in black, with design points in blue.The red points show the Pareto set.The right figure shows the problem in the objective space (f 1 vs. f 2 for all x).The red line shows all the Pareto-optimal solutions of the problem and the blue line is the current Pareto front based on the six observations.Middle: GP models corresponding to both objectives based on the initial observations and corresponding acquisition criterion (expected hypervolume improvement) that is maximized to select the next observation.Bottom: GP models at the end of the optimization process and Pareto front returned by the method.

Figure 4 :
Figure 4: Actual Pareto front and set for (P1).As in the previous example, we first build an initial set of (ten) observations and a list of two GP models:

Figure 7 :
Figure 7: Perspective view of the Pareto front (left) and uncertainty in the variable space (right) for example 3.

Table 1 :
Svenson (2011)(2010)08)d objectives the method ofPonweiser et al. (2008)and references therein.The scaling and additional parameters are some of the drawbacks of multi-objective infill criteria, as discussed inWagner et al. (2010)andSvenson (2011).Summary of the characteristics of infill criteria available in GPareto.The computational costs are given for a bi-objective example.Note that the cost of crit_EHI is low in this case but increase exponentially with the output dimension.SURcontrol is a list of parameters depending on the integration strategy chosen.