DEoptim: An R Package for Global Optimization by Differential Evolution

This article describes the R package DEoptim, which implements the Differential Evolution algorithm for global optimization of a real-valued function of a real-valued parameter vector. The implementation of Differential Evolution in DEoptim interfaces with C code for efficiency. The utility of the package is illustrated by case studies in fitting a Parratt model for X-ray reflectometry data and a Markov-Switching Generalized AutoRegressive Conditional Heteroskedasticity model for the returns of the Swiss Market Index.


Introduction
Optimization algorithms inspired by the process of natural selection have been in use since the 1950s (Mitchell 1998), and are often referred to as evolutionary algorithms. The genetic algorithm is one such method, and was invented by John Holland in the 1960s (Holland 1975). Genetic algorithms apply logical operations, usually on bit strings of fixed or variable length, in order to perform crossover, mutation, and selection on a population. Over the course of successive generations, the members of the population are more likely to represent a minimum of an objective function. Genetic algorithms have proven themselves to be useful heuristic methods for global optimization, in particular for combinatorial optimization problems. Evolution strategies are another variety of evolutionary algorithm, in which DEoptim: Global Optimization by Differential Evolution in R members of the population are represented with floating point numbers, and the population is transformed over successive generations using arithmetic operations. See Price, Storn, and Lampinen (2006, Section 1.2.3) for a detailed overview of evolutionary algorithms.
The DEoptim implementation of DE was motivated by our desire to extend the set of algorithms available for global optimization in the R language and environment for statistical computing (R Development Core Team 2009). R enables rapid prototyping of objective functions, access to a wide array of tools for statistical modeling, and ability to generate customized plots of results with ease (which in many situations makes use of R preferable over the use of programs in languages like Java, MS Visual C++, Fortran 90 or Pascal). Furthermore, R is released in open-source form under the terms of the GNU General Public License, meaning that packages implemented for it do not require the purchase of commercial software. R also has a large and growing user base interested in optimization. DEoptim has been published on the Comprehensive R Archive Network and is available at http://CRAN.R-project.org/package=DEoptim. Since becoming publicly available it has been used by a variety of authors, e.g., Börner, Higgins, Kantelhardt, and Scheiter (2007), Higgins, Kantelhardt, Scheiter, and Boerner (2007), Cao, Vilar, andDevia (2009), Opsina Arango (2009), and Ardia, Boudt, Carl, Mullen, and Peterson (2010) to solve optimization problems arising in diverse domains.
In the remainder of this manuscript we elaborate on DEoptim's implementation and use.
In Section 1.1, the package is introduced via a simple example. Section 2 describes the underlying algorithm. Section 3 describes the R implementation and serves as a user manual. DEoptim is then illustrated via two cases studies, involving fitting a Parratt recursion model for X-ray reflectometry data (in Section 4) and a Markov-switching generalized autoregressive conditional heteroscedasticity (MSGARCH) model for log-returns of the Swiss Market Index (in Section 5).

An introductory example
Minimization of the Rastrigin function in x ∈ D f (x) = D j=1 x 2 j − 10 cos (2πx j ) + 10 for D = 2 is a common test for global optimization algorithms.
In order to minimize this function using DEoptim, the R interpreter is invoked, and the package is loaded with the command

R> library("DEoptim")
DEoptim package Differential Evolution algorithm in R Authors: David Ardia and Katharine Mullen  The DEoptim function of the package DEoptim searches for minima of the objective function between lower and upper bounds on each parameter to be optimized. Therefore in the call to DEoptim we specify vectors that comprise the lower and upper bounds; these vectors are the same length as the parameter vector. The call to DEoptim can be made as R> est.ras <-DEoptim(rastrigin, lower = c(-5, -5), upper = c(5, 5), + control = list(storepopfrom = 1, trace = FALSE)) Note that the vector of parameters to be optimized must be the first argument of the objective function fn passed to DEoptim. The above call specifies the objective function to minimize, rastrigin, the lower and upper bounds on the the parameters, and, via the control argument, that we want to store intermediate populations from the first generation onwards (storepopfrom = 1), and do not want to print out progress information each generation (trace = FALSE). Storing intermediate populations allows us to examine the progress of the optimization in detail. Upon initialization, the population is comprised of 50 vectors x of length two (50 being the default value of NP), with x i a random value drawn from the uniform distribution over the values defined by the associated lower and upper bound. The operations of crossover, mutation, and selection explained in Section 2 transform the population so that the members of successive generations are more likely to represent the global minimum of the objective function. The members of the population generated by the above call are plotted at the end of different generations in Figure 2. DEoptim consistently finds the minimum of the function within 200 generations using the default settings. We have observed that DEoptim solves the Rastrigin problem more efficiently than the simulated annealing method found in the R function optim.
We note that as the dimensionality of the Rastrigin problem increases, DEoptim may not be able to find the global minimum in the default number of generations. Heuristics to help ensure that the global minimum is found include re-running the problem with a larger population size (value of NP), and increasing the maximum allowed number of generations.

Problems suitable for DE
Differential evolution does not require derivatives of the objective function. It is therefore useful in situations in which the objective function is stochastic, noisy, or difficult to differentiate. DE, however, may be inefficient on smooth functions, where derivative-based methods generally are most efficient.
In the example below, a generalized Rosenbrock function is considered. This function is differentiable and has a single local minima. It is often more efficient to apply methods other than DE to optimization of such functions. Functions that are smooth but have many local minima, however, may still be good candidates for optimization with DE, since alternative algorithms for local, as opposed to global, optimization may converge to a sub-optimal solution.

The differential evolution algorithm
We sketch the classical DE algorithm here and refer interested readers to the work of Storn and Price (1997) and Price et al. (2006) for further elaboration. The algorithm is an evolutionary technique which at each generation transforms a set of parameter vectors, termed the population, into another set of parameter vectors, the members of which are more likely to minimize the objective function. In order generate a new parameter vector, DE disturbs an old parameter vector with the scaled difference of two randomly selected parameter vectors.
The variable NP represents the number of parameter vectors in the population. At generation 0, NP guesses for the optimal value of the parameter vector are made, either using random values between upper and lower bounds for each parameter or using values given by the user. Each generation involves creation of a new population from the current population members x i,g , where i indexes the vectors that make up the population and g indexes generation. This is accomplished using differential mutation of the population members. A trial mutant parameter vector v i,g is created by choosing three members of the population, x r0,g , x r1,g and x r2,g , at random. Then v i,g is generated as where F is a positive scale factor. Effective values of F are typically less than 1.
After the first mutation operation, mutation is continued until either length(x) mutations have been made or rand > CR, where CR is a crossover probability CR ∈ [0, 1], and where here and throughout rand is used to denote a random number from U(0, 1). The crossover probability CR controls the fraction of the parameter values that are copied from the mutant. CR approximates but does not exactly represent the probability that a parameter value will be inherited from the mutant, since at least one mutation always occurs. Mutation is applied in this way to each member of the population.
If an element v j of the parameter vector is found to violate the bounds after mutation and crossover, it is reset, where here and throughout we use j to index into a parameter vector. In the implementation of DEoptim, if v j > upper j , it is reset as v j . = upper j − rand · (upper j − lower j ), and if v j < lower j , it is reset as v j . = lower j + rand · (upper j − lower j ). This ensures that candidate population members found to violate the bounds are set some random amount away from them, in such a way that the bounds are guaranteed to be satisfied. Then the objective function values associated with the children v are determined. If a trial vector v i,g has equal or lower objective function value than the vector x i,g , v i,g replaces x i,g in the population; otherwise x i,g remains. The algorithm stops after some set number of generations, or after the objective function value associated with the best member has been reduced below some set threshold.
Variations on this theme are possible, some of which are described in the following section. Values of NP and CR that have been found to be most effective for a variety of problems are described in Price et al. (2006, Section 2). Reasonable default values for many problems are given in the following section.

Implementation
DEoptim was first published on the Comprehensive R Archive Network (CRAN) in 2005 by David Ardia. Early versions were written in pure R. Since version 2.0-0 (published to CRAN in 2009 by Katharine Mullen) the package has relied on an interface to a C implementation of DE, which is significantly faster on most problems as compared to the implementation in pure R. Since version 2.0-3 the C implementation dynamically allocates the memory required to store the population, removing limitations on the number of members in the population and length of the parameter vectors that may be optimized.
The implementation is used by calling the R function DEoptim, the arguments of which are: fn: The objective function to be minimized. This function should have as its first argument the vector of real-valued parameters to optimize, and return a scalar real result.
lower, upper: Vectors specifying scalar real lower and upper bounds on each parameter to be optimized, so that the ith element of lower and upper applies to the ith parameter. The implementation searches between lower and upper for the global optimum of fn.
control: A list of control parameters, discussed below.
...: allows the user to pass additional arguments to the function fn.
The control argument is a list, the following elements of which are currently interpreted: VTR: The value to reach. Specify the global minimum of fn if it is known, or if you wish to cease optimization after having reached a certain value. The default value is -Inf.
strategy: This defines the differential evolution strategy used in the optimization procedure, described below in the terms used by Price et al. (2006): -1: DE / rand / 1 / bin (classical strategy). This strategy is the classical approach described in Section 2.
-2: DE / local-to-best / 1 / bin. In place of the classical DE mutation given in (1) is used, where old i,g and best g are the ith member and best member, respectively, of the previous population. This strategy is currently used by default.
-3: DE / best / 1 / bin with jitter. In place of the classical DE mutation given in (1), the expression is used, where jitter is defined as 0.0001 · rand + F .
-4: DE / rand / 1 / bin with per vector dither. In place of the classical DE mutation given in (1), the expression is used, where dither is calculated as dither .
-5: DE / rand / 1 / bin with per generation dither. The strategy described for 4 is used, but dither is only determined once per-generation. DEoptim: Global Optimization by Differential Evolution in R any value not above: variation to DE / rand / 1 / bin: either-or algorithm. In the case that rand < 0.5, the classical strategy described for 1 is used. Otherwise, the expression is used.
bs: If FALSE then every mutant will be tested against a member in the previous generation, and the best value will survive into the next generation. This is the standard trial vs. target selection described in Section 2. If TRUE then the old generation and NP mutants will be sorted by their associated objective function values, and the best NP vectors will proceed into the next generation (this is best-of-parent-and-child selection). The default value is FALSE. initialpop: An initial population used as a starting population in the optimization procedure, specified as a matrix in which each row represents a population member. May be useful to speed up convergence. Defaults to NULL, so that the initial population is generated randomly within the lower and upper boundaries. The default value of control is the return value of DEoptim.control(), which is a list with the above elements and specified default values.
The return value of the DEoptim function is a member of the S3 class DEoptim. Members of this class have a plot method that accepts the argument plot.type. When retVal is an object returned by DEoptim, calling plot(retVal, plot.type = "bestmemit") results in a plot of the parameter values that represent the lowest value of the objective function each generation. Calling plot(retVal, plot.type = "bestvalit") plots the best value of the objective function each generation. Calling plot(retVal, plot.type = "storepop") results in a plot of stored populations (which are only available if these have been saved by setting the control argument of DEoptim appropriately). A summary method for objects of S3 class DEoptim also exists, and returns the best parameter vector, the best value of the objective function, the number of generations optimization ran, and the number of times the objective function was evaluated.
A note on recommended settings: We have set the default values to the methods recommended by Price et al. (2006) as starting points. We use strategy = 2 by default; the user should consider trying as alternatives strategy = 6 and strategy = 1, though the best method will be highly problem-dependent. Generally, the user should set the lower and upper bounds to exploit the full allowable numerical range, i.e., if a parameter is allowed to exhibit values in the range [-1, 1] it is typically a good idea to pick the initial values from this range instead of unnecessarily restricting diversity. Increasing the value for NP will mean greater likelihood of finding the minimum, but run-time will be longer.

Application I: X-ray reflectometry
X-ray reflectometry (XRR) is a measurement method that uses the interference of X-rays (i.e., photons with a wavelength in the approximate range of 0.01 nm-10 nm) caused by changes in a material's electron density to characterize thin films or other layered structures at the nanometer to micrometer scale. The data collected consists of pairs of incident/scattered angle and scattered X-ray intensities, {(θ r , I r )}, typically over a range of about 5 degrees. Information regarding the density and thickness of each layer, and on the roughness of the interface between layers and at the surface of the material is extracted by fitting a parametric model to the measurements.
In the supplementary information we provide the full description of a model function used by DEoptim to obtain physically realistic parameter estimates from the data shown in Figure 3. This model is based on the Parratt recursion (Parratt 1954), which, as Als-Nielsen and Mc-Morrow (2001) describe in detail, is often used to model each of the layers in multilayered materials. For the data here, the Parratt recursion is used to describe reflection and transmission of X-rays from two thin layers of Pt (with each layer having a possibly distinct thickness, density, and roughness at the interface) atop an infinitely thick layer of SiO 2 . Figure 4 is a schematic description of the model for this multilayered material.
The free parameters of the applied Parratt recursion model are the thicknesses d 1 and d 2 of each Pt layer, the density α 1 and α 2 of each Pt layer, terms ρ 1 , ρ 2 and ρ 3 descriptive of the roughness of the interfaces between layers and at the surface, a parameter b describing a linear background, and a multiplicative scaling parameter m.   Figure 4: Schematic description of two layers of Pt on a substrate of SiO 2 . A Parratt recursion model representing this structure will be fit to the XRR measurements, with free parameters including the thickness of the Pt layers (d 1 and d 1 ), and terms (ρ 1 , ρ 2 , and ρ 3 ) describing the roughness of the interfaces between layers. position of the abrupt drop-off in scattered intensity after the initial plateau is determined by the density of the layer. The period of the subsequent oscillation fringes is set by the thickness of the layer, whereas the decay of the oscillations is a function of the roughness of the layer. Because the amplitudes of reflected and transmitted waves interfere, this qualitative view cannot be extended to multilayered systems and model fitting is a necessity.
The objective function R to minimize is formulated as the sum of the squared differences between the log of the data and the log of the Parratt recursion model function. The surface of objective function values in the 9-dimensional parameter space contains many local minima. Discovery of parameter estimates that represent a qualitatively good fit requires a global optimization algorithm such as DE. Treatment of global optimization problems such as these For comparison, the model has also been evaluated at the lower and upper bounds on the parameters used in the call to DEoptim (solid and dashed grey, respectively).
have been successfully addressed for many years in the XRR community using DE as, e.g., Wormington, Panaccione, Matney, and Bowen (1999), Taylor, Wall, Loxley, Wormington, and Lafford (2001), and Bowen and Tanner (2006) describe. Special purpose programs, e.g., the GenX program developed by Björck and Andersson (2007) and the MOTOFIT program developed by Nelson (2006), have been implemented for XRR model fitting problems of this sort.
The XRR measurements shown in Figure 3 are included in DEoptim as the dataset xrrData, with the vector of data to be fit represented by the vector counts. We have encoded the objective function R as the function rss. Using knowledge of the physical system underlying the measurements in order to set plausible lower and upper bounds on the parameters to optimize, and to set fixed values for beta, wavelength, theta_r and delta, the objective function is minimized with the call R> parrattFit <-DEoptim(lower = c(d_1 = 5.5e-10, d_2 = 1.5e-08, + rho_1 = 2.1e-10, rho_2 = 5.0e-12, rho_3 = 2.2e-10, alpha_1 = 10, + alpha_2 = 10, b = 40, m = .90e7), upper = c(d_1 = 5.5e-09, + d_2 = 1.5e-07, rho_1 = 2.1e-09, rho_2 = 5.0e-11, rho_3 = 2.2e-09, + alpha_1 = 21.46, alpha_2 = 21.46, b = 55, m = 1.1e7), + fn = rss, theta_r = theta_r, delta = delta, beta = beta, + wavelength = wavelength, data = counts, + control = list(itermax = 1500, NP = 90)) Table 2 gives parameter estimates arrived at via the above call, along with the associated lower and upper bounds. The resulting fit of the model to the data is shown in Figure 5. The upper bound for the density of the Pt layers was set at 21.46 g·cm −3 , the density of Pt in bulk. The estimates for the densities (19.6 g·cm −3 and 20.9 g·cm −3 ) are slightly lower than for the bulk material. The remaining parameter estimates are also plausible from physical first principles, though the evaluation of the ability of the model to describe the material underlying the XRR measurements is beyond the scope of this paper. As shown in Figure 5,  Table 2: Parameter estimates (bestmem) and lower and upper bounds associated with the call to DEoptim that results in the fit of Parratt recursion model to XRR data shown in Figure 5. Parameters d 1 and d 2 represent the thickness of the Pt layers, parameters ρ 1 , ρ 2 , ρ 3 describe the roughness of the interfaces between layers, and parameters α 1 and α 2 represent the density of the Pt layers. Parameter b represents an additive background term, and parameter m represents a multiplicative scaling factor for the intensity. Estimates are reported to two significant figures, except for t 1 and t 2 , which are reported to three.
the model fit captures the qualitative features of the dataset well. The robustness of the estimates has been validated via initialization of DE using a variety of starting populations; the estimates presented in Table 2 reliably represent the best results obtained.
The function rss encoding the objective function can easily be customized to the dataset at hand, allowing, for instance, inclusion of more or fewer free parameters. Note that in this example, the population size, NP, was set to 90 since in practice it has been observed that convergence to the global optimum is facilitated if NP is at least ten times the number of parameters being optimized (Price et al. 2006). Determination of whether the best member returned by DEoptim with this call represents a unique global minimum is beyond the scope of this paper, but would be interesting to check for the purpose of developing a physical interpretation of the model fit.

Application II: Log-returns of the Swiss Market Index
Volatility plays a central role in empirical finance and financial risk management. Research on changing volatility (i.e., conditional variance) using time series models has been active since the creation of the original ARCH (autoregressive conditional heteroscedasticity) and GARCH (generalized ARCH) models. Since then, GARCH type models grew rapidly into a rich family of empirical models for volatility forecasting during the last twenty years. They are now widespread and essential tools in financial econometrics.
In the GARCH(p, q) specification introduced by Bollerslev (1986), the conditional variance at time t of the log-return y t (of a financial asset or a financial index), denoted by σ 2 t , is postulated to be a linear function of the squares of past q log-returns and past p conditional variances. More precisely: where the parameters α 0 > 0, α i ≥ 0 (i = 1, . . . , q) and β j ≥ 0 (j = 1, . . . , p) in order to ensure a positive conditional variance. In most empirical applications it turns out that the simple specification p = q = 1 is able to reproduce the volatility dynamics of financial data. This has led the GARCH(1, 1) model to become the workhorse model by both academics and practitioners.
Numerous extensions and refinements of the GARCH(1, 1) model have been proposed to mimic additional stylized facts observed in financial markets, such as nonlinearity, asymmetry, and long memory properties in the volatility process; see Bollerslev, Chou, and Kroner (1992) and Bollerslev, Engle, and Nelson (1994) for a review. Among them, the class of Markov-switching GARCH (MSGARCH) has gained particular attention in recent years. In these models, the parameters of the scedastic function (2) can change over time according to a latent (i.e., unobservable) variable taking values in the discrete space {1, . . . , K} where K is an integer defining the number of regimes or states. The interesting feature of these models lies in the fact that they provide an explanation of the high persistence in volatility, i.e., nearly unit root process for the conditional variance, observed with single-regime GARCH models (Lamoureux and Lastrapes 1990). Furthermore, these models are apt to react quickly to changes in the volatility level (unconditional volatility) which leads to significant improvements in volatility forecasts as shown by Dueker (1997) or Klaassen (2002) for instance. These features make the models attractive for various applications in financial modeling, such as risk management.
While MSGARCH models are attractive for the description of a variety of phenomena, we face practical difficulties when attempting to fit their parameters to data. The maximization of the likelihood function is a constrained optimization problem since some (or all) of the model parameters must be positive to ensure a positive conditional variance. It is also common to require that the covariance stationarity condition holds; this leads to additional non-linear inequality constraints which render the optimization procedure cumbersome. Optimization results are often sensitive to the choice of starting values. Finally, convergence is hard to achieve if the true parameter values are close to the boundary of the parameter space and if the underlying process is nearly non-stationary. For these reasons, a robust optimizer is required. DE offers an adequate approach to finding the maximum likelihood parameter estimates in this framework.
In order to illustrate the robustness of DEoptim compared to traditional estimation techniques, we consider a two-state asymmetric MSGARCH model investigated in Ardia (2008, Chapter 7). The author illustrated the poor performance of traditional local optimizers when estimating such sophisticated models. Only computationally demanding Markov chain Monte Carlo techniques were able to provide meaningful results.
A two-state Markov-switching asymmetric GARCH(1, 1) model with Student-t innovations for the log-returns {y t } may be written as where ω i > 0, α + i , α − i , β i ≥ 0 (i = 1, 2) and ν > 2. The restriction on the degrees of freedom parameter ν ensures that the conditional variance σ 2 i,t remains finite; the restrictions on the GARCH parameters ω i , α + i , α − i and β i guarantee its positivity. t is the time index and T denotes the total number of observations. 1 {·} denotes the indicator function which is equal to one if the constraint holds and zero otherwise. The sequence {s t } is assumed to be a stationary, irreducible Markov process with discrete state space {1, 2} and transition matrix P .
is the transition probability of moving from state i to state j (i, j ∈ {1, 2}). Finally, S(0, 1, ν) denotes the standard Student-t density with ν degrees of freedom and (ν − 2)/ν is a scaling factor which ensures that the conditional variance of y t is σ 2 st,t . Model specification (3) allows reproduction of the so-called volatility clustering observed in financial returns, i.e., the fact that large changes tend to be followed by large changes (of either sign) and small changes tend to be followed by small changes. Moreover, it allows for sudden changes in the unconditional variance of the process; in the ith regime, the unconditional variance isσ provided that (α + i + α − i )/2 + β i < 1 (i.e., the process is covariance stationary); see Bollerslev (1986, page 310). Finally, it allows determination of whether or not an asymmetric response is present (i.e., α − i > α + i for at least one i) and is different between the regimes (i.e., α − i = α − i ). This asymmetric response, referred to as the leverage effect in the financial literature, reflects the fact that the volatility tends to rise more in response to bad news (i.e., y t−1 < 0) than to good news (i.e., y t−1 ≥ 0).
The conditional density of y t in state s t = i given θ and the information set I t−1 .
where Γ(·) denotes the Gamma function. This stems from the fact that in state i, y t follows a Student-t distribution with mean zero, variance σ 2 i,t and degrees of freedom ν. By integrating out the state variable s t , we can obtain the density of y t given θ and I t−1 only. The (discrete) integration is obtained as follows: where η i,t−1 . = P(s t−1 = i | θ, I t−1 ) is the filtered probability of state i at time t − 1 and where we recall that p ij denotes the transition probability of moving from state i to state j. The filtered probabilities {η i,t ; i = 1, 2; t = 1, . . . , T } are obtained by an iterative algorithm similar in spirit to a Kalman filter; we refer the reader to Hamilton (1989) and Hamilton (1994, Chapter 22) for details.
Finally, the log-likelihood function corresponding to model specification (3) is obtained from (5) as follows: ln f (y t | θ, I t−1 ) .
The maximum likelihood estimatorθ is obtained by maximizing (6) (or minimizing its negative value).
To illustrate the utility of DEoptim, we fit the MSGARCH model (3) to daily log-returns of the Swiss Market Index (SMI), displayed in Figure 6. The sample period is from November 12, 1990, to October 20, 2000, for a total of 2500 observations and the log-returns are expressed in percent. The data set was downloaded from http://finance.yahoo.com/ and is available when DEoptim is loaded using the command data("SMI"). Note that the two-regime specification is used for illustrative purposes only; checking for possible model misspecification is beyond the scope of the present paper.
In addition to the positivity constraints on the model parameters, we require covariance stationarity to hold in the two regimes, i.e., (α + i + α − i )/2 + β i < 1 for i = 1, 2 and that the unconditional variance (4) in state 1 is smaller than in state 2, i.e.,σ 2 1 <σ 2 2 . We also require the transition probabilities p 11 and p 22 of the state variable to lie within the [0, 1] interval. The constraints on the domain are set using the arguments lower and upper of DEoptim, while the covariance stationarity and unconditional variance constraints are tested within the (1) function optim with method "Nelder-Mead", (2) method "BFGS", (3) method "CG", (4) method "L-BFGS-B", (5) method "SANN" with control parameter itermax = 1e5, (6) function nlminb, (7) function constrOptim, (8) function constrOptim.nl of the package alabama, (9) function solnp of the package Rsolnp, (10) function DEoptim with control parameters NP = 110 and itermax = 500. Starting values are generated randomly in the feasible parameter set. Lower boundaries were set to 0.0 (2.0 for ν) and upper boundaries to 1.0 (50 for ν). Bottom: Percentage of non-convergent runs of the different optimization methods (either NaN output or convergence flag indicating non-convergence).
objective function which then returns a very large value (in our case 1e10) if not satisfied. In the DE optimization, we set the control parameters of DEoptim to itermax = 500 and NP = 110. Note that the objective function (6) is implemented in C to speed up the optimization procedure. We refer the reader to the Appendix for details on the R implementation.
For comparison, the objective function (6) is also optimized using various unconstrained and constrained optimization routines available in R. More specifically, we use the function optim with methods "Nelder-Mead" (unconstrained optimization), "BFGS" (unconstrained), "CG" (unconstrained), "SANN" (unconstrained), "L-BFGS-B" (box constrained), the function nlminb (box constrained). More details on the underlying algorithms for these methods can be found in the optim and nlminb manuals. We also consider optimization routines which handle more complicated constraints such as constrOptim, constrOptim.nl of the package alabama (Varadhan 2010) and solnp of the package Rsolnp (Ghalanos and Theussl 2011). The former relies on an adaptive barrier algorithm and handles linear inequality constraints. The two latter belong to the class of indirect solvers and implement the augmented Lagrange multiplier methods, in which linear and non-linear equality and inequality constraints are allowed; see Ye (1987) for details. For all methods we use ten times the default values of the control parameters related to the maximum number of iterations and function evaluations. For optim with method "SANN" we set itermax = 1e5. For unconstrained methods, the constraints on the model parameters were tested within the function which returns 1e10 is not satisfied, as for DEoptim. For functions which handled inequality constraints, we implement the constraints explicitly in the inputs of the functions; see the Appendix for an example with solnp.
We run the estimation 50 times for all optimization routines (including DEoptim) and use random starting values in the feasible parameter set when needed (using the same random starting values for the various methods). Boxplots of the negative log-likelihood value (NLL) at optimum for convergent estimations is displayed in the upper part of Figure 7. The lower part reports the percentage of non-convergent optimizations (defined as NaN output or non-convergent flag, usually convergence > 0). We notice that the unconstrained and box constrained methods 1-6 perform poorly compared to the optimizers which can handle more complicated constraints. In particular, we note that nlminb does not converge over the 50 runs. Overall, the methods solnp and constrOptim are the best performers in terms of lowest NLL values reached over the 50 runs. However, we notice that for both the convergence is not achieved in about 20% of the time. DEoptim compares favorably with the two best competitors in terms of NLL and is more stable over the runs. It does not however reach the lowest value obtained by solnp after 500 iterations. The tradeoff for the stability afforded by DEoptim is in runtime; it requires considerably more CPU time than either solnp or constrOptim (though significantly less runtime than the other stochastic optimization algorithm tested, optim with method "SANN"). We notice that increasing the number of generated population in DEoptim leads to the convergence toward the global optimum. Figure 9 displays the boxplots of the parameter values obtained with the 50 runs of DEoptim together with the parameter values corresponding to its best run (in blue squares), i.e., the run leading to the minimum NLL, and the parameter values corresponding to the best run of solnp (in red dots). The parameters at the global optimum (NLL = 3350.6979) obtained by solnp and DEoptim (after a longer run with itermax = 2500) areω 1 = 0.2062,ω 2 = 0.0930, α + 1 = 0.0000,α + 2 = 0.0043,α − 1 = 0.2123,α − 2 = 0.1566,β 1 = 0.5295,β 2 = 0.8717,p 11 = 0.9981,p 22 = 0.9969 andν = 9.2480. The parameter estimates clearly indicate two different regimes for the conditional variance process. More precisely, the values ofω i andβ i are far apart between the regimes. We note the presence of leverage effect in both regimes (i.e., α + i <α − i for i = 1, 2), with similar levels. The estimated transition probabilitiesp 11 andp 22 very close to one indicate infrequent mixing between states. Finally, the estimated degrees of freedom parameter suggests heavy tails for the conditional distribution of the log-returns.
Finally, Figure 10 displays the estimated filtered probabilities of the second state (high unconditional volatility state), {P(s t = 2 |θ, I t ); t = 1, . . . , T }, implied by the best model parameters   Figure 9: Boxplot of the parameters obtained over the 50 runs of DEoptim, together with the parameters of its best run (blue squares), i.e., the run leading to the lowest NLL, and the parameters corresponding to the best run of solnp (red dots). of solnp (in red solid line) together with the log-returns (in small circles). In addition, we report in dashed blue lines, the 50% area of the paths obtained over the 50 runs of DEoptim. The parameters obtained with DEoptim and solnp lead to a clear separation of regimes in the filtering probabilities. The beginning of year 1991 is associated with the high unconditional volatility state. Then, from the second half of 1991 to 1997, the returns are clearly associated with the low unconditional volatility regime, with the exception of 1994. From 1997 to 2000, the model remains in the high unconditional volatility regime with a transition during the second semester 2000 to the low unconditional volatility state.

Summary and conclusions
Differential evolution is a heuristic evolutionary method for global optimization that is effective on many problems of interest in science and technology. By implementing the package DEoptim we have made this algorithm possible to easily apply in the R language and environment. As Section 3 details, we have also made available many variations on the classical DE strategy. These variations as well as the classical strategy are due to Price, Storn and Lampinen, and we have referred the interested reader to their textbook (Price et al. 2006) on DE for details.
We have described herein the use of the package for fitting the Parratt recursion models for X-ray reflectometry and an MSGARCH model for the log-returns of the Swiss Market Index. These case studies showcase the power of the DE algorithm underlying DEoptim. We hope that readers will find the package to be a valuable tool for optimization. If you use R or DEoptim, please cite the software in publications.
Current work in extension of the package includes the addition of a framework for adaptive differential evolution (Zhang and Sanderson 2009). Future work will also be directed at parallelization of the implementation. The DEoptim project hosted on R-Forge (http:// R-Forge.R-project.org/projects/deoptim/) links to development versions of the package.

Computational details
The results in this paper were obtained using R 2.10.0 (R Development Core Team 2009) with the package DEoptim version 2.0-4 (Ardia and Mullen 2010). Computations were performed on a Genuine Intel dual core CPU T2400 1.83Ghz processor and on a quad core Intel Xeon Processor E5410.
DEoptim relies on repeated evaluation of the objective function in order to move the population toward a global minimum. Users interested in making DEoptim run as fast as possible should ensure that evaluation of the objective function is as efficient as possible. Using pure R code, this may often be accomplished using vectorization. Writing parts of the objective function in a lower-level language like C or Fortran may also increase speed.
National Institute of Standards and Technology, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose. DEoptim: Global Optimization by Differential Evolution in R A. Implementation of the MSGARCH model We described below the major steps for implementing and estimating in R the Markovswitching asymmetric GARCH(1, 1) model described in Section 5.
First, we define the function NLL which computes the negative value of the log-likelihood function (6). This function will be minimized in order to find the maximum likelihood estimatorθ. The function NLL has input parameters theta, a 11-dim vector containing the MSGARCH(1, 1) parameters, y the (T × 1) vector of log-returns y . = (y 1 , . . . , y T ) and checkConstraints, a boolean indicating if the constraints should be checked within the function (which is TRUE by default). The function outputs 1e10 when the constraints are not fulfilled. In the implementation below, two constraints must be fulfilled: i) the covariancestationarity condition of the conditional variance is satisfied in the two states; ii) the unconditional variance in state 1 is smaller than in state 2. The constraints on the lower and upper bounds on the parameters will be checked by the optimization function itself (below we consider DEoptim and solnp which handle this case). For optimization functions which cannot handle box constraints (e.g., optim with method Nelder-Mead), lower and upper bounds should also be checked within the function NLL. If the two constraints are satisfied, the function NLL calls a C routine which computes the negative log-likelihood value. The C implementation is needed for speed purposes, since for each input theta, two conditional variance processes need to be computed as well as the filtering and updating steps of a Kalman-like algorithm (Hamilton 1989). Since DEoptim evaluates the objective function for a large number of theta values, this C implementation is recommended. We do not document the C implementation here and refer the reader to file MSGJR.c attached for details. if (isTRUE(constraintsOK)) { + n <-length(y) + outC <-.C(name = "MSGJRS", theta = as.double(theta), + y = as.double(y), n = as.integer(n), nll = as.double(0), + f = vector('double', 2 * (n-1)), u = vector('double', 2 * (n-1))) + nll <-outC$nll + } + if (is.nan(nll)) { + nll <-1e10 + } + return(nll) + } Second, we load the package, the data set and the complied C function MSGJR.c (under Linux,