CollocInfer: Collocation Inference in Differential Equation Models

This monograph details the implementation and use of the CollocInfer package in R for smoothing-based estimation of continuous-time nonlinear dynamic systems. These routines extend the generalized profiling methods of Ramsay, Hooker, Campbell, and Cao (2007) for estimating parameters in nonlinear ordinary differential equations. An interface to the fda package is included, and discrete-time systems are also supported. We describe the methodological and computational framework and the steps necessary to use the software. Equivalent functionality is available in MATLAB.


Background, notation and overview
This paper details the CollocInfer (Hooker, Xiao, and Ramsay 2016) package, which uses basis expansion methods to estimate parameters in differential equation models in both the R (R Core Team 2016) and MATLAB (The MathWorks, Inc. 2011) programming languages. The CollocInfer package supports the generalized profiling (GP) methods of Ramsay et al. (2007). It also provides some diagnostic plots and testing procedures, as well as access to gradient matching (GM) methods. The CollocInfer package automatically installs and works with the functional data analysis (FDA) package fda (Ramsay, Wickham, Graves, and Hooker 2014). In this paper we will use squared-error criteria to fit data; however, more general criteria are also accommodated by the package.
We assume that the reader is familiar with the basic theory of ordinary differential equations (ODEs) as well as with basis expansion systems and nonparametric smoothing. We will therefore give these topics only a cursory introduction; the reader requiring more background is directed to Ramsay and Silverman (2005) or Ramsay, Hooker, and Graves (2009).
All of the example analyses in this paper are presented in R code. However, code in the MATLAB language is available as well in the CollocInfer package and is structured to look as much as possible like the R code. We provide some comments on the differences between the languages in Section 7; we expect that MATLAB users can easily translate the R examples into MATLAB.
A typical differential equation model involves one or several functions, which we designate by x_j, j = 1, ..., d. The literature on dynamical systems often refers to these functions as states, and to values of the functions over a range of time values t as trajectories. We use the notation Dx for the first derivative of a function x rather than the more classical notation dx/dt, and Dx(t) for the value of the derivative at time t. Higher order derivatives, where needed, are indicated by D²x, D³x, and so on. We note that higher order ordinary differential equations can always be re-expressed as systems of first order differential equations.
We begin with the problem set-up. The system under study is assumed to be governed by the model

Dx(t) = f(x(t); t, θ).   (1)
Here x is a d-dimensional state vector of functions that describes the system as it changes over time. The vector f of d functions f j maps x to its first derivative Dx, and this map may depend on t directly. Map f also depends on a p-dimensional parameter vector θ.
We assume that the system is measured at times t_1, ..., t_n, which by default are assumed to be common to all observed states. The values of the measurements are a set of n vectors y_1, ..., y_n, each of size d, and from the measurements we wish to obtain an estimate of the parameter vector θ. The measurements are subject to additive measurement errors

y_i = x(t_i) + ε_i,   (2)

where ε_i = (ε_{i1}, ..., ε_{id}) and the errors ε_{ij} are assumed to have Gaussian distributions that are independent between time points and independent of the values of x. The data vectors y_i are allowed to have missing measurements, represented as NA in R. Usually only a subset of the d states x_j are measured; for the unmeasured states we provide NA values for the corresponding entries in y_i. CollocInfer can also account for indirect measurements of states, the accommodation of which will be detailed in Section 4.2.
The methods described here are the generalized profiling methods of Ramsay et al. (2007), and to our knowledge CollocInfer provides the only implementation of these methods, although it builds on the earlier MATLAB software released at the time of that paper. Relevant stochastic (i.e., probabilistically evolving) methods in R can be found in the packages pomp (King, Nguyen, and Ionides 2016) and smfsb (Wilkinson 2013) using maximum likelihood, or in sl (Wood 2010) using synthetic likelihood. In other software environments, the System Identification Toolbox in MATLAB (see Ljung 1995) can be employed for some systems.
The IPOPT package (Wächter and Biegler 2006) utilizes basis expansions and works with AMPL (Fourer, Gay, and Kernighan 2003); a detailed description of the parameter estimation methods is available in Tjoa and Biegler (1991) and Hooker and Biegler (2007). Similar ideas based on Runge-Kutta methods are implemented in the VPLAN package (Körkel 2002); see the multiple shooting methods in Bock (1983) for details. The DEDiscover software described in Wu, Miao, Warnes, Wu, LeBlanc, Dykes, and Demeter (2008) also provides tools for exploring differential equation solutions and parameter estimation. However, these all assume either that the ODE provides an exact description of the trajectory of the system, or that an explicit probabilistic description of the system's dynamics is available. CollocInfer allows for additional flexibility by assuming that the ODE's description of the dynamics is only approximately correct, without requiring an explicitly stochastic model of system evolution.
The rest of this section provides background material on the numerical methods employed by CollocInfer, particularly with reference to the methods in the deSolve package and its lsoda function (Soetaert, Petzoldt, and Setzer 2010). Section 1.2 describes the FitzHugh-Nagumo equations as a specific example of ODEs and some functionality for working with them in R, Section 1.3 details the basis functions that we will use to approximate trajectories x(t), and Section 1.4 illustrates their use in spline smoothing as an initial step. With these tools, Section 2 presents methods of parameter estimation, and in particular the gradient matching and profiling methods implemented in CollocInfer. Section 3 describes how to apply these methods using CollocInfer, Section 4 illustrates some already-implemented extensions with a model from experimental ecology, and Section 5 discusses further ways in which CollocInfer's functionality can be extended. An example of this extensibility is given in Section 6, which illustrates the application of these methods to discrete-time dynamics. Appendices discuss equivalent functionality in MATLAB as well as providing technical details.

An example ODE: The FitzHugh-Nagumo spike potential equations
ODEs are not commonly part of a standard statistics curriculum, so this subsection is intended to serve as a brief exposition of the class of models for which CollocInfer is designed, as well as presenting functionality in R, particularly in the package deSolve (Soetaert et al. 2010), for solving them. In particular, we will illustrate these methods using the FitzHugh-Nagumo equations (FitzHugh 1961; Nagumo, Arimoto, and Yoshizawa 1962; see also Wilson 1999), which will be employed to demonstrate CollocInfer later. The equations describe a two-state system which is relatively simple but illustrates many practical aspects of fitting differential equation models to data. We provide code to generate and analyze data as we go along. Code to run this analysis can also be found in the fhn demo in the package.
Neurons communicate by sending "pulses" of voltage down their axons to other neurons. These pulses can be described by the FitzHugh-Nagumo equations, a simplification of the Hodgkin-Huxley equations (Hodgkin and Huxley 1952):

DV = c(V − V³/3 + R)
DR = −(V − a + bR)/c   (3)

Here V is the voltage. The pulse is generated by the interplay of several chemical ions that cross the membrane boundary, and R (for "recovery") represents the combined flow of two of these ions that act to counter the build-up of voltage. Typically R cannot be measured, but we will start off assuming that both states are measured. The parameters (a, b, c) in the equations are left unspecified and need to be estimated.
Equations (3) uniquely define the functions V(t) and R(t) so long as the initial conditions, or initial states, V(t_0) = V_0 and R(t_0) = R_0 are known. However, solutions (V(t), R(t)) to (3) cannot be written down algebraically and have to be approximated by numerical methods, often Runge-Kutta or linear multistep methods, which, in R, can be carried out by functions in the package deSolve, such as lsoda. See Deuflhard and Bornemann (2000) for an overview of numerical methods for solving ODEs.
Before we begin, we define a couple of vectors of names for the indices of arrays containing parameters and values of the trajectories:

R> FhNvarnames <- c("V", "R")
R> FhNparnames <- c("a", "b", "c")

To use lsoda, we specify the initial conditions (the values of V and R at time 0) to be, respectively,

R> x0 <- c(-1, 1)
R> names(x0) <- FhNvarnames

In the above we have added names to the elements of x0 so as to keep track of them easily. We also need initial values for the parameter vector θ = (a, b, c):

R> FhNpars <- c(0.2, 0.2, 3)
R> names(FhNpars) <- FhNparnames

Next, we need a function to evaluate the FitzHugh-Nagumo equations. The deSolve package requires that such a function returns a list whose first member contains the evaluated derivatives:

R> fhn.ode <- function(times, x, p) {
+   dx <- x
+   dimnames(dx) <- dimnames(x)
+   dx["V"] <- p["c"] * (x["V"] - x["V"]^3/3 + x["R"])
+   dx["R"] <- -(x["V"] - p["a"] + p["b"] * x["R"]) / p["c"]
+   return(list(dx))
+ }

We define a vector of time values at which to evaluate the solution, for example

R> FhNplottimes <- seq(0, 20, 0.05)

and can now call the function lsoda to compute the approximation to the solution at these time values:

R> out <- lsoda(x0, times = FhNplottimes, fhn.ode, FhNpars)

The output of lsoda is an n × 3 matrix where the first column has the evaluation times. If we plot these solutions against time we obtain a wave:

R> matplot(out[, 1], out[, 2:3], type = "l")
R> legend("bottomleft", c("V", "R"))

Or, in the phase plane (V versus R), we obtain a square-ish orbit:

R> plot(out[, 2], out[, 3], type = "l", xlab = "V", ylab = "R")

The plots produced by these commands are given in Figure 1.

Basis functions for representing trajectories
This subsection presents basis expansions as the numerical tool underlying CollocInfer, which will be used as a means of representing the trajectories of the state variables x(t). The multistep class of numerical methods employed in deSolve is based on Taylor expansions of (V(t + δ), R(t + δ)) at t for some small step δ. In CollocInfer, we use an alternative set of methods, known as collocation methods, which are based on approximating (V(t), R(t)) by basis expansions:

V(t) ≈ Σ_{k=1}^{K_V} c_{vk} φ_{vk}(t),   R(t) ≈ Σ_{k=1}^{K_R} c_{rk} φ_{rk}(t).   (4)

We write (4) in matrix notation as x(t) = Φ(t)C by defining Φ_v(t) = (φ_{v1}(t), ..., φ_{vK_V}(t)) and Φ_r(t) = (φ_{r1}(t), ..., φ_{rK_R}(t)) and collecting all the coefficients c_{vk} (c_{rk}) into the column vector c_v (c_r). In this and many other situations we can use the same basis system for all variables, and if so we can simplify notation by designating a K × 2 coefficient matrix C, where K = K_V = K_R, that has c_v and c_r in its first and second columns, respectively.
We now represent the trajectory in terms of this basis system. To do this, collocation methods seek to match the basis expansion with the differential equation at a pre-specified set of points. That is, they require

DΦ(t_q)C = f(Φ(t_q)C; t_q, θ),   q = 1, ..., K,   (5)

where the collocation points t_q are generally spaced so as to cover the support of the basis functions. K collocation points are employed (including 0) so that the equations above can be satisfied exactly. We use this idea only indirectly and will often employ a few more collocation points. The interested reader is directed to Deuflhard and Bornemann (2000) for a more detailed presentation.
Although CollocInfer allows any choice of basis functions Φ(t), here we use the B-spline basis functions available through the fda library. B-splines are a set of functions that are polynomial between break points and are constrained to be smooth across the break points. We first define the range of the basis and the breaks. For the purpose of visualizing the basis expansion, here we use one break every four time units, but we will use eight times as many when analyzing the data:

R> FhNrange <- c(0, 20)
R> breaks <- seq(0, 20, 4)

Then we specify the order of the polynomial segments, which is one more than the highest power in the polynomial. Here we use the default order four splines, which correspond to cubic polynomial segments. We now call a function in the fda package to assemble the basis:

R> ExampleBasis <- create.bspline.basis(FhNrange, norder = 4,
+    breaks = breaks)

The basis can be visualized by

R> plot(ExampleBasis, xlab = "time", ylab = "Phi(t)")

resulting in the plot in Figure 2.
The resulting object ExampleBasis holds the specifications of the basis system. We can use the function eval.basis to obtain values of Φ(t) and its derivatives at specific t-values.
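For example, the following calls evaluate the basis and its first derivative on a grid of points (the grid here is illustrative):

R> tpts <- seq(0, 20, 0.5)
R> Phi <- eval.basis(tpts, ExampleBasis)       # matrix of basis function values
R> DPhi <- eval.basis(tpts, ExampleBasis, 1)   # matrix of first derivative values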
CollocInfer only requires the values of this basis at particular time points, but it also works directly with a basis object such as ExampleBasis; we use this functionality in our initial examples. These functions are supported on a span of 5 knots and are made up of piecewise cubic polynomials between breaks with continuous second derivatives at breaks. Note that for realistic analyses, many more breaks than used here will be employed.

Basis expansions for nonparametric smoothing
As a final piece of background material, we briefly review nonparametric smoothing, in particular as applied in the fda package and extensively described in Ramsay et al. (2009). These methods also employ basis expansions, and the commonalities between them and collocation methods provided much of the inspiration for the methods in CollocInfer.
Smoothing data provides us a starting estimate of the matrix of coefficients C. A classical means of smoothing is to estimate c_v by minimizing a penalized least squares criterion

PENSSE_λ(c_v) = Σ_{i=1}^n [y_{i1} − V(t_i)]² + λ ∫ [D²V(t)]² dt,   (6)

and c_r can be similarly estimated. The first term in (6) measures agreement with the data, just as in linear regression, while the second imposes smoothness in the sense of low curvature (D²x(t) is the second derivative of x(t)). This second term is needed because Φ(t) is very flexible and can fit almost any data very closely, even when the data have a lot of noise. The multiplier λ is a smoothing parameter that controls the trade-off between data fitting and smoothness. There are many ways of selecting λ automatically, but for the moment we recommend trying a few values and adjusting by eye. A more detailed treatment of these ideas can be found in Ramsay and Silverman (2005) and Ramsay, Hooker, and Graves (2009).
Let the 41 × 2 R matrix object FhNdata contain the observations of the two state variables at the 41 equally spaced observation times FhNtimes <- seq(0, 20, 0.5). We now define a richer basis using breaks at every observation point:

R> breaks <- seq(0, 20, 0.5)
R> FhNbasis <- create.bspline.basis(range = FhNrange, norder = 4,
+    breaks = breaks)

We can obtain initial estimates of C by employing the smooth.basis function in fda. The following command sets up an fdPar object FhNfdPar which specifies that the second derivative will be penalized and that the smoothing parameter λ in (6) has value 1:

R> FhNfdPar <- fdPar(FhNbasis, int2Lfd(2), 1)

Now we are ready to smooth the data:

R> DEfd0 <- smooth.basis(FhNtimes, FhNdata, FhNfdPar)$fd

We call the resulting function x̂(t), returned in the functional data object DEfd0, and we examine the fit to the data using the plotfit.fd function in the fda package:

R> plotfit.fd(FhNdata, FhNtimes, DEfd0)

[Figure 3: The result of plotfit.fd after smoothing data from the FitzHugh-Nagumo system.]

These, of course, are not solutions to (3), but rather just smooths of the data. The coefficients of this smooth, which we will use later, can be extracted by

R> coefs0 <- DEfd0$coef

Parameter estimation methods and basis expansions
In this section we introduce the methods implemented in CollocInfer and in the next section we will demonstrate their usage in R. We begin with a method called "gradient matching", a useful means of finding initial parameter estimates.
First we briefly comment on an obvious approach, which searches for the set of parameter values such that the output of lsoda matches the observed data as well as possible; this might also involve searching over the initial conditions x_0. Such an approach, often referred to as "nonlinear least squares" or "NLS", can be problematic for three reasons (a sketch of the approach is given after this list):

1. The computational cost of repeatedly approximating the solution of the differential equation within an optimization routine can be considerable.
2. Frequently, the optimization over θ involves a criterion with many local minima in which it is easy to get stuck, necessitating stochastic optimization methods that increase the computational cost further.
3. Often, we know that the systems under study only approximately follow the dynamic model, hence we may not want to insist on each x_j being an exact solution to the jth differential equation.
Methods based on basis expansions and smoothing, which we will present later, avoid many of these problems.
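For concreteness, the NLS approach can be sketched using the FitzHugh-Nagumo objects defined in Section 1.2; the objective function FhN.NLS and the call to optim below are illustrative assumptions rather than part of CollocInfer:

R> FhN.NLS <- function(p, x0, times, data) {
+   out <- lsoda(x0, times, fhn.ode, p)        # solve the ODE at candidate parameters
+   sum((data - out[, -1])^2, na.rm = TRUE)    # squared error to the observations
+ }
R> NLSres <- optim(FhNpars, FhN.NLS, x0 = x0, times = FhNtimes,
+    data = FhNdata)

Every evaluation of FhN.NLS requires a full numerical solution of the ODE, which is the computational cost referred to in point 1 above.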

Gradient matching with estimated derivatives
The idea behind what we term gradient matching is quite simple: suppose that we have produced a smooth x̂ of the data following (6). It is natural to find parameters θ so that the estimated Dx̂ matches the right hand side of (1), f(x̂; t, θ), as well as possible. Thus it makes sense to choose θ to minimize the integrated sum of squared errors

ISSE(θ, x̂) = Σ_{j=1}^d ∫ [Dx̂_j(t) − f_j(x̂(t); t, θ)]² dt.   (7)

This approach, or variants of it, has been studied by numerous authors (Brunel 2008; Ellner, Seifu, and Smith 2002; Gugushvili and Klaassen 2012; Wu, Xue, and Kumar 2012) in the recent statistical literature, but the idea goes back to Varah (1982) and Bellman and Roth (1971) as well as several other early re-inventions, and we would be frankly unsurprised to discover that Newton also thought of it.
In practice, we cannot evaluate the integral in ISSE directly, so instead we rely on an approximation of the form

ISSE(θ, x̂) ≈ Σ_{j=1}^d Σ_{q=1}^Q w_q [Dx̂_j(t_q) − f_j(x̂(t_q); t_q, θ)]²,

where t_q, q = 1, ..., Q, are quadrature points and w_q are the corresponding quadrature weights.
Simpson's rule is often employed here, but simply taking a large set of equally spaced t q and giving them equal weight is usually also reasonable.
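As an illustration, this approximation can be computed directly from the smooth DEfd0 of Section 1.4, assuming equally spaced quadrature points with equal weights and a right-hand-side function fhn.fun in the format introduced later in Section 3.1:

R> tq <- seq(0, 20, length = 401)        # equally spaced quadrature points
R> wq <- rep(20/400, 401)                # equal quadrature weights
R> xhat <- eval.fd(tq, DEfd0)            # estimated trajectory values
R> colnames(xhat) <- FhNvarnames
R> Dxhat <- eval.fd(tq, DEfd0, 1)        # estimated derivative values
R> ISSE <- function(p) sum(wq * (Dxhat - fhn.fun(tq, xhat, p, NULL))^2)
R> ISSE(FhNpars)

Gradient matching then amounts to minimizing ISSE over p, which requires no calls to an ODE solver.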
The above gradient matching approach has several advantages; in particular, we do not need to repeatedly call lsoda to find values of θ. Frequently f(x; t, θ) is closer to linear in θ than the solution to (1), which makes ISSE easier to work with. Finally, this framework allows Dx̂ to deviate somewhat from f(x̂; t, θ), hopefully without affecting the estimate of θ too much.
There are also some downsides. First, we must be able to estimate x̂_j for every state variable, which means that we must have enough high quality observations of each variable to put a smooth through it. This is often not the case in practice, as some state variables may have no observations at all: in the FitzHugh-Nagumo example, although we have simulated observations of both V and R, only V would be available in real neural experiments. Another downside is that the criterion (6) explicitly tries to reduce curvature. In Figure 3 we observe that the trajectories resulting from the FitzHugh-Nagumo equations can have quite sharp turns, which will probably be smoothed over. Sharp curvature is common in nonlinear ODE models, and bias in estimating the curves may induce bias in the parameter estimates.
Nonetheless, gradient matching is often a useful means of cheaply obtaining initial parameter estimates and these estimates can be used to start the generalized profiling procedure, which we describe below.

Generalized profiling: A data/equation fitting compromise
The generalized profiling methods in Ramsay et al. (2007) can alleviate the problems associated with gradient matching. First we observe that the penalty term in (6) tends to make the estimated x̂ look like a straight line satisfying D²x = 0. We might therefore think that, if we knew the value of θ, a better way to smooth would be via the criterion

PENSSE_λ(x; θ) = Σ_{j∈C_O} Σ_{i=1}^n [y_{ij} − x_j(t_i)]² + λ ISSE(θ, x),   (8)

where the set C_O contains the indices j corresponding to observed states. Here the second penalty term, ISSE defined in (7), tries to make x satisfy the differential equation (1), while the first term tries to make it look like the data. Since x(t) = Φ(t)C, this is still really a problem of choosing the coefficients C. We might hope that, since ISSE(θ, x) ought to be close to zero at the true trajectory, this will induce smaller bias. In particular, when x changes abruptly, (1) ought to model the change and hence helps replicate it when we minimize PENSSE.
Note, importantly, that we do not require all of the x_j to have measurements to implement (8). A state variable that is unmeasured enters the criterion only through ISSE(θ, x); in this way we make the trajectories of the unmeasured state variables look as much like solutions of the ODE as possible.
So far, we have minimized PENSSE(x; θ) over x, holding θ fixed. We now consider the estimation of θ. We begin by pointing out an important distinction between the smoothing criterion (6) and PENSSE(x; θ). The penalty in (6) is intended to smooth out "wiggles" but otherwise let the curve follow the data. In contrast, we expect ISSE(θ, x) to carry information about x: if (1) holds, it should be zero at the correct θ. As a consequence, our estimated x should change considerably with different values of θ. Therefore, while in (6) we choose λ only large enough to stabilize our smooth, in PENSSE(x; θ) we expect λ to be large, enforcing close agreement with the differential equation, and reduce λ only to accommodate evident departures from it.
To proceed with smoothing, we denote by x̂(t, θ) the trajectory that minimizes PENSSE(x; θ) for each value of the parameter θ. We think of x̂(t, θ) as being like a solution of the ODE at θ. We can then choose θ to minimize the squared error

H(θ) = Σ_{j∈C_O} Σ_{i=1}^n [y_{ij} − x̂_j(t_i, θ)]².   (9)

That is, we think of this method as nearly solving the ODE to obtain x̂(t, θ) and then fitting the "solution" to the data. This allows us to gain the advantages of gradient matching without incurring the same level of bias or requiring that all the components of x(t) have high quality observations. The methods here have been used and extended in Cao, Ramsay, and Fussmann (2008), Hooker (2009), Hooker, Ellner, de Vargas Roditi, and Earn (2011), and Campbell, Hooker, and McAuley (2012); Qi and Zhao (2010) provide a theoretical analysis, and Cao and Ramsay (2009) employ similar ideas in a mixed model framework.
In the next section we describe how to get the generalized profiling method running in CollocInfer.

An introduction to using the CollocInfer package
In order to obtain parameter estimates from CollocInfer, we set up some preliminary information. In particular, we define:

• A basis system Φ.
• A function to evaluate the right hand side f of (1).
• Initial values to start off the iterative estimation of θ and C.
We described how to obtain a B-spline basis system in Section 1.3. We will walk through the rest of the set-up process in the next two subsections.

Setting up the FitzHugh-Nagumo dynamic model
We first set up the function to evaluate f(x; t, θ). The package requires that f is a function with the following arguments:

times: a vector of evaluation times.
x: a matrix of evaluations of x(t) at the times in times, with rows corresponding to time points and columns to state variables.
p: a vector of parameters, θ.
more: a list of additional inputs that f might require.
The more argument allows the user to specify other inputs to f. It can be unused, but should still be listed as an argument. The function should return a matrix of derivative evaluations with the same dimensions as x.
For the FitzHugh-Nagumo equations, the function looks like

R> fhn.fun <- function(times, x, p, more) {
+   dx <- x
+   dx[, "V"] <- p["c"] * (x[, "V"] - x[, "V"]^3/3 + x[, "R"])
+   dx[, "R"] <- -(x[, "V"] - p["a"] + p["b"] * x[, "R"]) / p["c"]
+   return(dx)
+ }

The way we organize the evaluation function is somewhat different from the function fhn.ode that lsoda requires. Here the evaluation times are given by the vector times and x is a matrix of trajectory values corresponding to those times. The reason for this is that we examine all the values of x(t) at once, rather than sequentially, and employing the whole collection of evaluation points allows us to be more computationally efficient. We also output only the matrix dx of evaluations rather than a list.
We assume that we have created a basis system FhNbasis as described in Section 1.3. We require the following inputs:

data: an n × d matrix of data values, with missing values given as NA.
times: a vector of n observation times.

Now we can set up the structures that CollocInfer needs. There is a simple function for the common choice of the sum of squared errors criterion:

R> lambda <- 1000
R> profile.obj <- LS.setup(pars = FhNpars, fn = fhn.fun, lambda = lambda,
+    times = FhNtimes, coefs = coefs0, basisvals = FhNbasis)

We have chosen lambda <- 1000 for convenience for the moment; we will discuss some ways to choose this value later.
LS.setup returns a single list object with two named members, of the list classes lik and proc, that define the map from x to the observations y and the map from x to Dx, respectively.
R> proc <- profile.obj$proc
R> lik <- profile.obj$lik

We will not examine the internal structures of these objects here, although we use them implicitly; see Section 5 for details. Note that by default CollocInfer employs finite differencing to estimate the derivatives needed in its optimization routines. However, finite differencing can be numerically unstable, and the user can instead provide a named list of functions in place of fhn.fun above that also evaluates the first and second derivatives of the right hand side function with respect to x and p. See Section 8.1 for details. Given this framework, we can now start estimating parameters.

Obtaining initial parameter values via gradient matching
We showed how to obtain initial coefficients by minimizing (6) using the smoothing commands from the fda package. Next we use gradient matching to get an initial estimate of the parameters. The function ParsMatchOpt minimizes ISSE(θ, x̂) in (7) and returns a named list containing the optimized parameter values:

R> Pres0 <- ParsMatchOpt(FhNpars, coefs0, proc)
R> pars1 <- Pres0$pars
R> pars1

        a         b         c
0.2777251 0.1784665 1.4677403

The list returned by ParsMatchOpt also contains a member Pres0$res, the output of the minimization routine used to find the parameter values. By default ParsMatchOpt employs the R optimization function nlminb; several other options are also available in CollocInfer. Note here that while a and b are fairly close to their true value of 0.2, c is estimated at less than half its true value of 3. This is because c controls the relative speed of V and R, and smoothing tends to dampen the V dynamics. The profiling methods employed below improve the precision of all of these estimates.

Running generalized profiling to obtain parameter estimates
A first step in the generalized profiling routines is to find the x that minimizes the inner criterion PENSSE(x; θ) in (8). This is achieved through the function inneropt, which is called as follows:

R> Ires1 <- inneropt(FhNdata, times = FhNtimes, pars1, coefs0, lik, proc)
R> coefs1 <- Ires1$coefs

Strictly speaking, this step is not necessary, but it can be a useful check on both how long the overall procedure is likely to take and how well the model fits (see Section 3.6 below). The output of inneropt is a matrix of coefficients in Ires1$coefs and the result of the optimization routine in Ires1$res. By default it employs nlminb, but a number of other routines can be selected via the in.meth argument.
We now carry out the profiling itself:

R> Ores2 <- outeropt(FhNdata, FhNtimes, pars1, coefs1, lik, proc)

Again the output of outeropt is a list with members pars, coefs and res giving the optimal parameters, the coefficients and whatever other information the optimization routine produces. Here again, nlminb is used by default.
The estimated parameter vector can be examined via Ores2$pars. It is also possible to use the functions smooth.LS and profile.LS to perform profiling directly. These functions both call LS.setup and return lik and proc objects. Because these two objects have enough utility in other functions, we recommend producing them independently.

Using FPE to choose the smoothing parameter λ
How to choose the smoothing parameter λ in (8) is an important problem. In fact, CollocInfer provides the possibility of having a different λ for each state variable by setting λ to be a vector. One approach is to examine visual diagnostics for when the fit to the data starts to degrade; Figure 4 provides an example of the fits obtained for different values of λ. An alternative is to keep increasing λ until the parameter estimates no longer change much (remember that the larger λ is, the closer x̂(t, θ) is to being a solution of (1)). The second of these can start to produce bias when the basis expansion cannot approximate the solutions of the differential equation adequately; this source of bias is readily checked by applying profiling to a numerical solution of (1). Neither of these strategies is based on formal rules.
Ellner (2007) suggested the following idea: choose the λ that predicts best using the differential equation (1). We can unpack this "best" with a recipe. For each value of λ:

1. Conduct profiling to obtain θ̂ and x̂(t, θ̂).

2. For a set of starting times s_ℓ and ending times e_ℓ, ℓ = 1, ..., L, obtain solutions to (1) with parameters θ̂ on the time interval [s_ℓ, e_ℓ], with initial conditions x̂(s_ℓ, θ̂).

3. Measure the squared error between these solutions and the observations in the interval [s_ℓ, e_ℓ].
This was labelled forwards prediction error (FPE) because it describes how well exact solutions of (1) can be used for prediction over short-ish time intervals.
In CollocInfer, this recipe is implemented in the function forward.prediction.error. Besides the results of profiling, we need to define starting and ending times for this function; these are given as a matrix of indices into the vector of observation times. In the case of the FitzHugh-Nagumo example, there are 41 observations and we will start from each time point and predict 10 observation times ahead. This means that we define the object

R> whichtimes <- cbind(1:31, 11:41)

where the first column of whichtimes contains the indices of the observation times to start at, and the second column the indices of the observation times to end at.
We now call

R> FPE <- forward.prediction.error(FhNtimes, FhNdata, Ores2$coefs,
+    lik, proc, Ores2$pars, whichtimes)

Generally, we would want to compute this for a selection of λ values and also examine the fit by eye. The code below cycles through values of λ increasing as powers of 10, reports the resulting parameter estimates and overlays the estimated trajectory on a plot of the data. In order to speed up computation we update the starting values for the parameters and coefficients to the estimates obtained with the previous value of λ. The code also waits for you to confirm before going on to the next λ (the function fd creates a functional data object that combines basis functions with their coefficients and allows easy plotting). The resulting plot is given in Figure 4; for clarity, we have reproduced trajectories for only three values of λ.
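A sketch of such a loop follows; the range of λ values, the plotting commands and the use of readline are illustrative choices:

R> pars <- pars1
R> coefs <- coefs1
R> for (lambda in 10^(0:8)) {
+   obj <- LS.setup(pars = FhNpars, fn = fhn.fun, lambda = lambda,
+     times = FhNtimes, coefs = coefs, basisvals = FhNbasis)
+   Ores <- outeropt(FhNdata, FhNtimes, pars, coefs, obj$lik, obj$proc)
+   pars <- Ores$pars                 # warm-start the next value of lambda
+   coefs <- Ores$coefs
+   print(pars)
+   matplot(FhNtimes, FhNdata, pch = c("V", "R"))
+   lines(fd(coefs, FhNbasis))        # overlay the estimated trajectory
+   readline("Press <Enter> for the next value of lambda")
+ }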

Using IntegrateForward to approximate an ODE solution
We have used lsoda to solve differential equations to generate data, but it is somewhat frustrating to have to define separate versions of the differential equation function depending on what solver is used. To overcome this, CollocInfer includes a function IntegrateForward which takes functions in its own format and passes them to lsoda.
We employ this function as an alternative to using lsoda. It returns a list with members times and states giving the times and the values of the state variables respectively. We use this to compare the data to a solution of the ODE at the estimated parameters in Figure 5. (Here we assume the initial conditions are known, although they could be extracted from the profiling solution.)

R> x0 <- c(-1, 1)
R> names(x0) <- FhNvarnames
R> sol1 <- IntegrateForward(x0, FhNtimes, Ores2$pars, proc)
R> matplot(FhNtimes, FhNdata, pch = c("V", "R"))
R> matplot(sol1$times, sol1$states, type = "l", add = TRUE)

It should be noted that a reason we did not employ IntegrateForward to generate the data originally is that it requires setting up a proc object, which makes less sense before data are available. Of course, when there are experimental data, one will not need to solve an ODE first and can start from LS.setup and use this utility directly.

Using CollocInferPlots for diagnostic plots
We now assess how well the profiling procedure is doing. There are two things that we could look at:

• How well the estimated trajectory x̂(t, θ) fits the data.

• How closely the estimated trajectory satisfies the differential equation, that is, how well Dx̂(t, θ) matches f(x̂(t, θ); t, θ).
The function CollocInferPlots produces basic diagnostic plots for these and can be called as follows:

R> out1 <- CollocInferPlots(Ores2$coefs, Ores2$pars, lik, proc,
+    times = FhNtimes, data = FhNdata)

This produces the plots in Figure 6. The first plot displays the data and the estimated trajectory, which enables us to observe whether (1) pulls the fit too far from the data. The second plot gives Dx̂(t, θ) as dashed lines and the value of f(x̂(t, θ); t, θ) as solid lines. In this case they are nearly on top of each other, but when they are further apart this provides a good indication of which parts of the trajectory do not fit the equations. This is made more explicit in the third plot, where the upper panel gives the difference between Dx̂(t, θ) and f(x̂; θ) and the lower panel gives x̂(t, θ), enabling a rough comparison of the conditions under which the mismatch between derivatives is large. In this case there are hints that the mismatch between the derivative of R and its equation may be associated with the current value of V, but the overall level of mismatch is very small.

Using Profile.covariance to estimate the sample variance
Finally, it is useful to evaluate the statistical uncertainty in our parameter estimates. This is obtained through a Newey-West estimate (Newey and West 1987) of the covariance of the score function for the outer optimization; the details are somewhat technical and are given in Hooker et al. (2011).
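A sketch of its use with the objects from the profiling run above (argument names assumed from the package documentation):

R> covar <- Profile.covariance(pars = Ores2$pars, times = FhNtimes,
+    data = FhNdata, coefs = Ores2$coefs, lik = lik, proc = proc)
R> CIs <- cbind(Ores2$pars - 2 * sqrt(diag(covar)),
+    Ores2$pars + 2 * sqrt(diag(covar)))   # rough 95% confidence limits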

Using FitMatchOpt when only some state variables are observed
So far we have used an example where both V and R have observations. However, for many systems some components of the model cannot be measured; indeed, in neural experiments observations are generally given only for V. Our methods work in this situation as well. However, more work needs to be done to obtain initial coefficient and parameter estimates.
We set up some data with only V by simply converting the data for R to take the value NA:

R> data2 <- FhNdata
R> data2[, 2] <- NA

We then go back to the original nonparametric smooth from which we obtained coefs0 and set all the coefficients for R to zero:

R> coefs0.2 <- coefs0
R> coefs0.2[, 2] <- 0

The first task is to compute some decent starting values for coefs0.2[, 2]. To do this, we assume that we have initial parameter values; here we use pars1, the result of gradient matching. In doing this we remind the reader that gradient matching could not actually be performed without measurements of R; this is just a convenient set of values to use.
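This is the task of the function FitMatchOpt, which optimizes the coefficients of the chosen states so that their trajectories match the differential equation while the remaining coefficients are held fixed. A sketch of the call, with which indicating the state to be filled in:

R> Fres <- FitMatchOpt(coefs = coefs0.2, which = 2, pars = pars1,
+    proc = proc)
R> coefs2 <- Fres$coefs

The resulting coefficients can then be passed to inneropt and outeropt along with data2 in place of FhNdata.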

Real world experience: A chemostat experiment
Our methods, like most, work very well when tested on data simulated from the model the method is designed for. Working with real-world data can be a very different experience. In this section we deal with data generated from a laboratory-based ecological experiment. In this system, an alga of the genus Chlorella, C, is grown in a chemostat: a large glass test tube to which a nutrient-rich medium is continuously added, and from which the contents (including the algae) are removed at a constant rate. The growth of the algal population is limited by the available nutrients and by predation by rotifers, Brachionus, B, a genus of microscopic animals. The rotifers reproduce according to how much algae they consume and die either from natural causes or when they are removed from the tank. Data from a run of this experiment are taken from Becks, Ellner, Jones, and Hairston (2010) and plotted in Figure 7. We choose to model these in terms of C* = log(C) and B* = log(B), so that DC* represents the per capita growth rate of the algal population (see Section 4 for doing this automatically). We employ the Rosenzweig-MacArthur model (Rosenzweig and MacArthur 1969), which describes the growth of algae in terms of reproduction, in which we think of each alga reproducing at rate ρ(1 − C/κ_C) (note that we use C rather than C*) so that as the population reaches its carrying capacity κ_C the growth rate slows down due to resource limitations, and predation by rotifers, which we parameterize by γB/(κ_B + C): the predation rate increases as there are more rotifers, but the denominator ensures that the rate of predation per alga decreases as the number of algae increases, reflecting the limit of the rotifers' capacity to consume. We model per capita rotifer dynamics in terms of increase following algal consumption, χγC/(κ_B + C), and a death rate δ. This gives us the following differential equations (expressed in the logarithmic terms of C* and B*):

DC* = ρ(1 − C/κ_C) − γB/(κ_B + C)
DB* = χγC/(κ_B + C) − δ

We instantiate this in R in a model function sketched below. Note that we exponentiate both x (since the dynamics are all in terms of the non-logged quantities) and the parameter vector p. In fact we know that p is positive and the parameters have very different magnitudes; estimating their log quantities both ensures that they take sensible values and helps to make the optimization more numerically stable.
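A sketch of this model in CollocInfer's format follows; the function name chemo.fun and the parameter names are assumptions for illustration:

R> chemo.fun <- function(times, x, p, more) {
+   p <- exp(p)        # parameters are estimated on the log scale
+   x <- exp(x)        # states are C* = log C and B* = log B
+   dx <- x
+   dx[, "C"] <- p["rho"] * (1 - x[, "C"] / p["kappaC"]) -
+     p["gamma"] * x[, "B"] / (p["kappaB"] + x[, "C"])
+   dx[, "B"] <- p["chi"] * p["gamma"] * x[, "C"] / (p["kappaB"] + x[, "C"]) -
+     p["delta"]
+   return(dx)
+ }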

Some further details and a more complex model
So far we have mainly dealt with systems of differential equations that are directly observed, although in practice we may be missing observations for some state variables. CollocInfer can deal with more complicated data designs and modeling situations, which can be useful in some settings. In particular, we will discuss three cases here:

1. We know that all the state variables are positive, and often it is useful to use an ODE representation for the log state variables z = log x. This is especially important for problems where data values vary by orders of magnitude from one variable to another, as is often the case with models for the spread of disease and for population dynamics.
2. We do not observe the system directly, but rather we measure some transformation of it, or some aggregation of certain variables.
3. We have replicated observations of the system.
These situations arise frequently in practice, and can all be dealt with by using the basic setup utilities in CollocInfer.
To illustrate these cases, we use an extension of the Rosenzweig-MacArthur system described previously. In this system, we propose two types of algae C 1 and C 2 , each of which has similar dynamics to the single algal species in the system above. However, they have different reproduction rates and suffer different predation risks.
The system is of interest because at some parameter values we can maintain two types of algae, even though they face the same pressure and one ought to out-compete the other. This is because we set the first algae to have a defense against predation, but make it pay a cost for that defense in terms of the efficiency of its reproduction. This means that when there are no rotifers around, the second algae has an advantage, but when there are rotifers, the first does and, in fact, the system produces oscillations that keep all three species present.
We model the system with the following equations:

DC_1 = ρ_1 C_1 (1 − C_1/κ_C1 − C_2/κ_C2) − πγ C_1 B / (κ_B + πC_1 + C_2)
DC_2 = ρ_2 C_2 (1 − C_1/κ_C1 − C_2/κ_C2) − γ C_2 B / (κ_B + πC_1 + C_2)
DB = χγ (πC_1 + C_2) B / (κ_B + πC_1 + C_2) − δB

Here ρ_1 and ρ_2 are differential rates of growth of the two algae, and the nutrients (hence algal growth) are exhausted when C_1/κ_C1 + C_2/κ_C2 = 1. By setting κ_C1 < κ_C2, we assume that the overall carrying capacity for C_1 is less than for C_2. Note that we no longer model log quantities (we will do this automatically below), so the right hand side equations are multiplied by C_1, C_2 and B respectively. We will convert to log rates again below, but will use CollocInfer's internal conversion for this.
For predation, the rotifers have food given by πC_1 + C_2, where π represents the fraction of the defended algae C_1 that is available for predation. The term for C_2 gives a predation rate γB/(κ_B + πC_1 + C_2), where κ_B is the amount of available algae at which the predation rate is halved; for the C_1 equation the relevant quantity of algae is πC_1.
In the final equation, the rotifers turn consumed algae into new rotifers with efficiency χ, but also die at rate δ. These equations are coded in a manner suitable for CollocInfer; a sketch is given below. Here we know that all the parameters should be positive, so we have hard-coded that we will estimate the log parameters. This also helps by reducing the difference in the scaling of the parameters so that their numerical values (after taking logs) are comparable. To generate data, we set the parameters to

R> RMpars <- c(0.2, 0.025, 0.125, 2.2e4, 1e5, 5e6, 1, 1e9, 0.3)
R> RMParnames <- c("pi", "rho1", "rho2", "kappaC1", "kappaC2",
+    "gamma", "chi", "kappaB", "delta")
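The right-hand-side function itself might look like the following sketch; the function name RosMac2 matches its use in LS.setup later, while the body is reconstructed from the equations above:

R> RosMac2 <- function(times, x, p, more) {
+   p <- exp(p)                                  # parameters arrive on the log scale
+   dx <- x
+   nut <- 1 - x[, "C1"] / p["kappaC1"] - x[, "C2"] / p["kappaC2"]
+   food <- p["pi"] * x[, "C1"] + x[, "C2"]      # algae available to rotifers
+   dx[, "C1"] <- p["rho1"] * x[, "C1"] * nut -
+     p["pi"] * p["gamma"] * x[, "C1"] * x[, "B"] / (p["kappaB"] + food)
+   dx[, "C2"] <- p["rho2"] * x[, "C2"] * nut -
+     p["gamma"] * x[, "C2"] * x[, "B"] / (p["kappaB"] + food)
+   dx[, "B"] <- p["chi"] * p["gamma"] * food * x[, "B"] / (p["kappaB"] + food) -
+     p["delta"] * x[, "B"]
+   return(dx)
+ }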

R> logpars <- log(RMpars)
R> names(logpars) <- RMParnames
We start with some initial conditions:

R> RMVarnames <- c("C1", "C2", "B")
R> x0 <- c(50, 50, 2)
R> names(x0) <- RMVarnames

There are a few things that we need to know about this system:

1. The state variables are all population sizes and hence always positive. However, the populations can become very small, which creates numerical instability near x_j = 0. Consequently, it will be useful to model an ODE for z = log x. This also helps to reduce numerical instability caused by the very different scales of C and B.
2. As in many real-world systems, we cannot distinguish C_1 from C_2 (in fact, a problem of current research is to detect whether C_2 is there at all), so we really only measure C_1 + C_2.
3. However, we can repeat the experiment several times so that we have a replicated time series.
Methods for accounting for each of these within CollocInfer are demonstrated in turn in the subsections below.

Dynamics on the log scale
The first observation is that all the state variables are strictly positive, so it may be useful to model z = log x. By the chain rule, we obtain a new ordinary differential equation on the log scale:

Dz = Dx/x = f(x; t, θ)/x = f(exp(z); t, θ)/exp(z),

where the ratio is taken elementwise. The Rosenzweig-MacArthur model (Rosenzweig and MacArthur 1969) incorporating this log transform is implemented in a function RosMac2ODE in a form suitable for use with lsoda (a sketch is given at the end of this subsection). With this function we generate data by solving the ordinary differential equation and adding noise. For the moment, we pretend that we can measure each of the three species in the chemostat independently and that the experiment lasts 200 days and is sampled daily; the outcome is plotted in Figure 9. We then employ the same commands as before to set up basis expansions and smooth the data

R> rr <- range(time)
R> breaks <- seq(rr[1], rr[2], by = 1)
R> RMbasis <- create.bspline.basis(rr, breaks = breaks)

and obtain an initial set of coefficients (initial parameters are the logpars defined above):

R> coef0 <- smooth.basis(time, data, fdPar(RMbasis, int2Lfd(2), 10))$fd$coef
R> colnames(coef0) <- RMVarnames

To set up this equation on the log scale, we could transform RosMac2ODE into a form appropriate for CollocInfer. However, this requires writing out a new function; instead, we employ the posproc = TRUE option in LS.setup, which makes the transformation automatically. Because our data are already measured on the log scale (we added noise to the log solutions of RosMac2ODE), we do not need the corresponding poslik = TRUE option. This is implemented by

R> out <- LS.setup(pars = logpars, coefs = coef0, basisvals = RMbasis,
+    fn = RosMac2, lambda = 1e5, times = time, posproc = TRUE)
R> lik <- out$lik
R> proc <- out$proc

Now we employ calls to the functions ParsMatchOpt and outeropt:

R> res1 <- ParsMatchOpt(logpars, coef0, proc)
R> res3 <- outeropt(data, time, res1$pars, coef0, lik, proc)

The resulting parameter estimates can be inspected via res3$pars, and we call CollocInferPlots to examine goodness of fit in Figure 10:

R> out3 <- CollocInferPlots(res3$coefs, res3$pars, lik, proc,
+    times = time, data = data)

The same option can be used with smooth.LS and profile.LS.
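For reference, the lsoda-format function RosMac2ODE used above to generate the data might be sketched as follows, reusing the RosMac2 sketch given earlier; the noise level and the construction of time and data are illustrative assumptions:

R> RosMac2ODE <- function(times, z, p) {
+   x <- matrix(exp(z), 1, length(z))       # back to the original scale
+   colnames(x) <- names(z)
+   dx <- RosMac2(times, x, p, NULL)
+   return(list(as.vector(dx) / exp(z)))    # chain rule: D log x = Dx / x
+ }
R> time <- 0:200                            # 200 days, sampled daily
R> out <- lsoda(log(x0), time, RosMac2ODE, logpars)
R> data <- out[, 2:4] + 0.2 * matrix(rnorm(3 * length(time)), length(time), 3)
R> colnames(data) <- RMVarnames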

Indirectly observed systems: the sum of C_1 and C_2
In chemostat experiments it is often impossible to distinguish C_1 from C_2, so we only get to observe their sum. We mimic this by exponentiating the data we generated for these states (remember that they are currently on the log scale), taking the sum and re-logging:

R> data2 <- cbind(log(exp(data[, "C1"]) + exp(data[, "C2"])), data[, "B"])

See Figure 11. Notice that these data are no longer directly comparable to the states of the ordinary differential equation. In this case we can write a function RMobsfn to map states to observations by exactly repeating the process above; the structure of this function looks exactly like the functions we employ for the right hand side of an ODE (a sketch is given at the end of this subsection). In this case, the observation model does not change with time t or parameters p, and we have no additional information to add in more. We add this function as the likfn argument to LS.setup, smooth.LS or profile.LS:

R> out <- LS.setup(pars = logpars, coefs = coef0, basisvals = RMbasis,
+    fn = RosMac2, lambda = 1e5, times = time, posproc = TRUE,
+    likfn = RMobsfn)
R> lik2 <- out$lik
R> proc2 <- out$proc

In this case it is unreasonable to assume that we can start from coef0 as a pre-smooth, since we do not have separate data for C_1 and C_2, so we set the corresponding coefficients to zero

R> coef02 <- coef0
R> coef02[, 1:2] <- 0

but keep the coefficients for B. From this we start by using FitMatchOpt as above and then profile:

R> Fres3 <- FitMatchOpt(coef02, 1:2, res1$pars, proc2)
R> res32 <- outeropt(data2, time, res1$pars, Fres3$coefs, lik2, proc2)
R> exp(res32$pars)

       pi      rho1      rho2   kappaC1   kappaC2
 3.03e-01  5.84e-02  1.91e-01  2.84e+08  1.35e+03
    gamma       chi    kappaB     delta
 6.17e+06  1.09e+00  8.26e+08  3.24e-01

Finally, we examine the usual diagnostic plots; see Figure 12.
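A sketch of the observation function RMobsfn used above; its body simply repeats the transformation used to construct data2, applied to the log-scale states:

R> RMobsfn <- function(times, x, p, more) {
+   y <- cbind(log(exp(x[, 1]) + exp(x[, 2])),   # log of C1 + C2
+     x[, 3])                                    # B is observed directly
+   return(y)
+ }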

Replicated experiments
In some instances it is possible to repeat experimental runs and observe how different some results might be. These can be accommodated in CollocInfer, although most situations will require the user to work more carefully with the structure of lik and proc objects. See Section 5 for more details.
However, LS.setup, smooth.LS and profile.LS will work if each repetition of the experiment has the same observation times and the same time domain. This means that we can use the same basis expansion for all experimental repeats. If observation times do not match up across repeats, we can still achieve this structure by using the union of all observation times with NA entries for unobserved combinations, although this approach will require more numerical effort than necessary if, for example, the experiments run for very different durations.
We mimic this by running a new chemostat experiment starting from different initial conditions. For the sake of simplicity, we continue to pretend that we measure all the state variables.

User-defined fitting criteria and other extensions
The previous sections of this paper were focussed on functions that will allow using the profiling methodology quickly and with a minimum set-up cost. However, they can only be used with objective functions based on squared error and with certain structures for your data. Therefore, some of the functions employ defaults that may not be optimal for your system and data. CollocInfer allows profiling in a much more flexible class of models, but employing such flexibility requires more work on the part of the user.
In this package we allow the user to choose other measures to define the quality of fit for both terms. In the lower or inner optimization step, where θ is held fixed, C(θ) minimizes

J(C; θ, λ) = Σ_{i=1}^n F(y_i, x(t_i); θ) + λ ∫ P(x(t); t, θ) dt,   with x(t) = Φ(t)C,

while θ is then chosen to minimize

H(θ) = Σ_{i=1}^n F(y_i, x̂(t_i, θ); θ),   where x̂(t, θ) = Φ(t)C(θ).

This formulation allows us to use the following measures of fit:

1. A fit function F that measures fit to the data. It will often be a sum of negative log density values, F(y_i, x(t_i); θ) = −Σ_j log p_j(y_{ij} | x_j(t_i), θ). In this case, F is the negative log likelihood of the observations associated with the ith time value, and the sum over i is the total negative log likelihood of the data. Note that we only sum over the state variables that are observed.
2. A fit function P in the second term that quantifies failure to fit the differential equation; it represents a negative log likelihood for x if x is thought of as being generated by a random process. More general roughness measures, using higher order derivatives or spatial co-ordinates, are also allowed in CollocInfer.
Notice that in both F and P we are no longer obliged to define fit in terms of the size of a difference; we could, for example, use differences between logarithms, differences between other transforms, or make no use of differences at all.
In CollocInfer, the lik and proc objects are designed to allow the evaluation of F and P respectively, along with their derivatives with respect to the parameters θ and the coefficients C. Functions to evaluate these must be set up by the user; Section 8 details how they should be specified, describes some utilities that provide shortcuts, and gives an example.

The Hénon map: A discrete system
Another flexibility that this framework allows is demonstrated in this section: the use of CollocInfer for discrete-time difference equations

x(t + 1) = f(x(t); t, θ),   t = 1, ..., T.

The discrete option in the Profile.LS function sets up a basis expansion for such systems in which φ_j(t) = I(t = j); that is, the jth row of C exactly records the value of the state vector x(j) at time t = j. If we make the unusual specification that Dφ_j(t) = I(t = j + 1), then ISSE(θ, x) measures

Σ_{j=1}^d Σ_{t=1}^{T−1} [x_j(t + 1) − f_j(x(t); t, θ)]².

In order to demonstrate this option, we employ the Hénon map

x(t + 1) = 1 − a x(t)² + y(t)
y(t + 1) = b x(t).

Parameter values for a and b that yield chaotic behavior are

R> hpars <- c(1.4, 0.3)

First, we generate some data, leaving off the first 20 time points as transients:

R> ntimes <- 200
R> x <- c(-1, 1)
R> X <- matrix(0, ntimes + 20, 2)
R> X[1, ] <- x
R> for (i in 2:(ntimes + 20)) {
+   X[i, ] <- make.Henon()$ode(i, X[i - 1, ], hpars, NULL)
+ }
R> X <- X[20 + 1:ntimes, ]
R> Y <- X + 0.05 * matrix(rnorm(ntimes * 2), ntimes, 2)
R> t <- 1:ntimes

This model creates a complex pattern, as can be seen in Figure 14. In the generalized profiling framework, we treat the Hénon map as describing the data only approximately, and we smooth the data Y to create a time series x̂_t(θ) via the idea analogous to smoothing-based models for differential equations. More specifically, we re-define the penalty ISSE to be

ISSE(θ, x) = Σ_{t=1}^{T−1} ‖x(t + 1) − f(x(t); t, θ)‖².

To fit within the profiling paradigm, we have to employ a basis expansion. In this case we just use an indicator function for the current time step being time k, φ_k(t) = I(t = k), so that row t of the coefficient matrix C gives exactly the value of x(t). In place of evaluating the derivative of x, we evaluate x at the next time point via Dφ_k(t) = I(t = k + 1), so that ISSE takes the form defined above.
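Estimation can then proceed through Profile.LS with the discrete option. The sketch below assumes that the list returned by make.Henon supplies the right-hand-side functions, and that the noisy observations Y can serve as starting coefficients, since here the coefficients are the state values themselves; the exact argument list may differ:

R> res <- Profile.LS(make.Henon(), data = Y, times = t, pars = hpars + 0.1,
+    coefs = Y, lambda = 1000, discrete = TRUE)
R> res$pars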

The MATLAB version of CollocInfer
A parallel version of the CollocInfer package is maintained in the MATLAB language and is, at the time of writing, also distributed within the R package.
Both versions of the package are written so as to maintain their names and functionality as closely as possible within the syntax of their respective languages. The most noticeable change within the CollocInfer naming conventions is the replacement of '.' in R with '_' in MATLAB within variable and function names. Without providing an exhaustive list of notational differences, we note some that are particularly relevant to the functionality presented here:

• List objects in R are replaced by struct objects in MATLAB, in which the named elements are accessed by the . operator rather than $; thus one accesses lik.fn rather than lik$fn.
• MATLAB does not employ named dimensions so that in the FitzHugh-Nagumo example, x[, "V"] must be replaced by x(:,1). Higher dimensional arrays are indexed by further indices in each language.
• An important detail is the default behavior of each language when extracting a particular dimension from an array. For example, if x is n × p, R will treat x[, 1] as a vector, removing the singleton dimension, whereas MATLAB will treat x(:, 1) as a matrix of dimension n × 1. Each default can produce errors, although we have found that R's choice is more likely to do so in CollocInfer.
• Functions in MATLAB may return multiple arguments, whereas R requires that these be concatenated into a list, so that MATLAB's call to outeropt appears as

[pars_opt, ncoefs, value, gradient] = ...
    outeropt(times, data, coefs, allpars, lik, proc, active)

where, if we do not wish to record value, we can replace it with []. In the command above, active could also be replaced by []. Note that the ordering of arguments and outputs matters in MATLAB, whereas arguments can be named in R.
A more detailed discussion of notational differences between the two languages can be found in Ramsay et al. (2009). Many of these syntactic differences can be caught by opening a .R file in the MATLAB editor, which will highlight syntax errors.
We will briefly examine some differences in performance between the two implementations.
In general, we find that the MATLAB implementation of CollocInfer provides a speed-up by a factor of around 10 on any given task, due to MATLAB's implementation of both optimization routines and sparse matrix calculations. It should be noted, however, that for numerically solving ODEs, the deSolve package (Soetaert et al. 2010) in R has proved to be considerably more accurate than the ode45 function in MATLAB. MATLAB also requires a license, which can be quite expensive, and does not provide access to the plethora of statistical modeling packages available in R.

Low-level details
This section describes the structure of lik and proc objects and the ways in which they can be constructed to incorporate a wide range of objective criteria and potential extensions of profiling methods. These are demonstrated at the end of this section by low-level code that produces these objects for our Rosenzweig-MacArthur model. As an initial step, below we demonstrate how right-hand side functions work.

Right-hand functions f_j(t, x|θ) and their derivatives
A differential equation is defined by the right sides f_j(t, x|θ), j = 1, ..., d, of the differential equations. We will refer to these functions as right-hand functions in this paper. In the FitzHugh-Nagumo equations (3), for example, f_V(t, x|θ) = c(V − V³/3 + R) and f_R(t, x|θ) = −(V − a + bR)/c. The user must supply a set of R functions that evaluate the values of certain mathematical functions and their derivatives at times of observation. We begin with the R functions associated with the right-hand functions f_j(t, x|θ). These functions are required to run CollocInfer, but we will see that they can be replaced with generic functions based on finite differencing so that, at the risk of adding numerical error, the user need only provide the original right-hand function.
The functions that evaluate right-hand functions and their derivatives are supplied to CollocInfer in a named list object, and the names used for these list members must be as shown below in typewriter font in order for the CollocInfer functions to be able to locate them. Other named list members may also be included for purposes specific to an application, but the following members are required:

fn: evaluates the right-hand functions f_j themselves.
dfdx: evaluates the derivatives of f with respect to x.
dfdp: evaluates the derivatives of f with respect to the parameters θ.
d2fdx2: evaluates the second derivatives of f with respect to x.
d2fdxdp: evaluates the mixed second derivatives of f with respect to x and θ.

Each of these five functions has four arguments:

times: either a vector of times of observations, or a vector of quadrature points, depending on which CollocInfer function is calling the function.
x: a matrix of process values corresponding to the times argument. The matrix has d columns corresponding to state variables and n rows corresponding to times. This argument contains state values x(t) rather than data values, and therefore all columns should contain only numeric data.
p: a vector of length p of parameter values.
more: an optional argument containing any other information required to compute the results. Of particular importance are any constant or functional input variables with values u(t), called forcing terms. In the event that several types of additional input are required, the more object will be a list object.
It is wise to include argument checks, at least in the fn function. For example, the number of columns in argument x should equal the number of state variables that the function is designed to handle, and checks may be needed for the contents of the more argument as well. Also, if there is the possibility of certain derivatives taking inadmissible values such as Inf or zero, this should be checked. Coding the derivative functions by hand can require considerable effort if the differential equations either have a large number of states or involve complicated right-hand functions f_j. Based on our experience, it is strongly suggested that symbolic computation software be used for even seemingly simple systems, and that the programmer also run checks on these functions before using them in CollocInfer.
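For instance, a guard at the top of the FitzHugh-Nagumo function of Section 3.1 might look as follows; the checks are hypothetical additions, not part of the package:

R> fhn.fun.checked <- function(times, x, p, more) {
+   if (ncol(x) != 2) stop("fhn.fun expects exactly two state variables")
+   if (!all(c("a", "b", "c") %in% names(p))) stop("parameters a, b, c required")
+   fhn.fun(times, x, p, more)
+ }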
As an option, CollocInfer can instead employ a finite differencing scheme applied column by column to the output of the fn function. Even if the intention is to program the computation of the necessary partial derivatives, finite differencing can be a helpful debugging tool in the early stages of an analysis.
Finite difference approximations to partial derivatives can be obtained by calling function make.findif.ode() which creates a list with members fn, dfdx, dfdp, d2fdx2, d2fdxdp that calculate derivatives by finite differencing.
This is a situation where having the more member is useful. In this case the more list should itself contain the following members (a usage sketch follows the list):
fn: the function whose derivatives are to be approximated by differencing.
eps: the step size δ to be used in the differencing.
more: any further objects to be passed on to the fn function.
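For example, under these assumptions a finite-differenced version of the FitzHugh-Nagumo right-hand functions sketched above could be set up as:

R> fhn.rhs <- make.findif.ode()
R> fhn.rhs$more <- list(fn = fhn.fn, eps = 1e-6, more = NULL)

The dfdx member of fhn.rhs then approximates the state Jacobian ∂f/∂x by differencing, with no analytic derivative code required.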
Internally, for example, the dfdx member is written as

R> findif.ode$dfdx <- function(times, x, p, more) {
+   fx <- more$fn(times, x, p, more$more)
+   dfx <- array(0, c(dim(fx), ncol(x)))
+   for (i in 1:ncol(x)) {
+     tx <- x
+     tx[, i] <- x[, i] + more$eps
+     dfx[, , i] <- (more$fn(times, tx, p, more$more) - fx) / more$eps
+   }
+   return(dfx)
+ }

Here more$fn in the first statement is the fn object that defines the right-hand functions (e.g., the original fhn.fn). For each column of x, we increment the whole column by more$eps and use this to calculate finite differences for every value in that column simultaneously. This makes use of R's vectorization and is very fast. The other member functions in findif.ode behave similarly.
By default we set eps <- 1e-6, but the user should be aware that finite differencing can introduce numerical error, especially if the values of the right-hand functions f_j vary greatly from one variable to another. It is sound practice, here as elsewhere in computational methodology, to scale variables so as to keep their values within an order of magnitude of one another. This is one reason we used the log transformation in the chemostat example.

Cascading more arguments for layered functions
The example in the previous section demonstrates how the more argument is used to compose functions. We need the original fn right-hand function, but it is called within the finite differencing functions and is passed to them as a member of the list in the more argument.
This type of passing can be used, for example, to specify a log transformation within finite differencing, and we will use this strategy extensively. Suppose we wish to use the transform z = log x to fit the dynamics on the log scale along with finite differencing. Since Dz = Dx/x by the chain rule, we can write the function

R> logtrans.fun <- function(times, x, p, more) {
+   x <- exp(x)
+   dx <- more$fn(times, x, p, more$more)
+   dx <- dx / x
+   return(dx)
+ }

which requires more$fn to be our original right-hand side function. This can be nested within a finite differencing method so that we have

R> rhs <- make.findif.ode()
R> rhs$more <- list(fn = logtrans.fun, eps = 1e-6)
R> rhs$more$more <- list(fn = RosMac2, more = NULL)

This will be employed further within the lik and proc objects that we define below. The system allows very flexible composition of functions, but the multiple levels of nested more arguments can become confusing, and some careful thinking through may be necessary.
With these preliminaries, we continue to the definition of objects describing the fit to data and to process.

Defining lik objects for assessing data fit
The lik object contains a function to compute the value of the fit between state estimates and data, along with functions for evaluating some of its partial derivatives. The lik object is a named list whose member names must be exactly as shown below in order to allow the information associated with these names to be accessed by other functions in the CollocInfer package. Some of the names for these evaluator functions are also used in the list object containing the evaluator functions for the right-hand functions f_j(t, x, θ) described above in Section 8.1, and they will also be used in the proc list object described below in Section 8.5, as well as in other named list objects.
The required names of the list members and their contents for the lik named list object are:
fn: a function that calculates the data fit measure F for each of d states at n times t_i, taken over the observed state variables, such as the negative log likelihood of the residuals. This function returns an n by d matrix.
dfdx: a function that calculates the partial derivatives of fn with respect to the values of the state variables x. It returns an array of size n × d × d.
dfdp: a function that calculates partial derivatives with respect to parameters. It returns an array of size n × d × p, where p is the length of the parameter vector θ.
d2fdx2: a function that calculates second partial derivatives with respect to the values of the state variables x. It returns an array of size n × d × d × d.
d2fdxdp: a function that calculates cross partial derivatives for state variable values and parameters. It returns an array of size n × d × d × p.
bvals: an n × K matrix giving the values of the basis at the observation times.
more: This is an optional member that contains any additional inputs that the functions may require. The more member is typically itself a named list with members that can be referenced by various functions described later in the paper.
The approximation of confidence regions for parameter estimates will also require that the user-defined functions in the lik object contain the following partial derivatives with respect to the data argument y:
dfdy: a function that calculates the partial derivatives of fn with respect to data values. It returns an array of size n × d × d, but the values returned for the unobserved state variables are not used.
d2fdy2: a function that calculates second partial derivatives with respect to the data values. It returns an array of size n × d × d × d, but entries corresponding to unobserved variables are not used.
d2fdxdy: a function that calculates cross partial derivatives with respect to the data values and the process values. It returns an array of size n × d × d × d, but entries corresponding to unobserved variables are not used.
The arguments for the evaluator functions fn, dfdx, dfdy, dfdp, d2fdx2, d2fdxdy, d2fdy2, d2fdxdp and d2fdydp include those for the right-hand function evaluators f_j(t, x, θ) described in Section 8.1 above, but they are augmented by a first argument y specifying the data and by a matrix of basis function values. That is, they are
y: a matrix or array of observation values. The first dimension has length n, and the second dimension has length equal to the number of observed states.
times: a vector of n times of observations.
x: an n by d matrix of state values corresponding to the times values.
p: a vector of length p of parameter values.
more: an optional argument containing any other information required to compute the results.
These functions are returned in a named list object. The names for the members or entries in the list are the same as those for the right-hand functions.
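As a hedged sketch, a hand-coded fn member for a simple unweighted squared-error fit, assuming the states are observed directly so that the residuals are just y − x, might be:

R> SSE.fn <- function(y, times, x, p, more) {
+   res <- y - x          # residuals at the observation times
+   res[is.na(res)] <- 0  # missing observations contribute nothing
+   return(res^2)         # n x d matrix of pointwise squared errors
+ }

In practice, the constructor functions described next remove the need to write such functions by hand, and in particular they supply all of the derivative members.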
While calculating the likelihood is fairly straightforward for most distributions, it can be considerably more cumbersome to write down functions to calculate the various derivatives. A number of constructor functions have therefore been created to make these calculations easier: lik objects can be created by calls to make... functions that produce a list with the appropriate members. For each of these, the member more is required to have specific entries that are detailed below.

Error sum of squares function make.SSElik
The sum of squared errors criterion employed in the first sections of this paper, and often used as the default loss function, may also be viewed as the negative log likelihood for the Gaussian distribution. Function make.SSElik() creates a list with entries fn, dfdx, dfdy, dfdp, d2fdx2, d2fdxdy, d2fdy2, d2fdxdp and d2fdydp. These functions calculate the pointwise weighted squared errors

w_{ij} (y_{ij} − f_j(t_i, x(t_i), θ))^2

and their corresponding derivatives, where f here denotes the transformation in more$fn that maps states to predicted observations. They require more to contain functions fn, dfdx, dfdp, d2fdx2 and d2fdxdp, and more$more to hold any further objects needed to evaluate the f_j. These functions take the arguments times, x, p, more, which are the same as the corresponding entries in the lik constructions. However, the function output should have an extra dimension: fn is a vector-valued function and returns an n × d matrix; similarly, dfdp returns an array of dimension n × d × p, and so forth. The dimensions of each array go in the order: element of f, derivatives with respect to x, derivatives with respect to p.
In addition, more should contain a member weights. This should be a matrix of the weights w_{ij} with the same dimensions as Y; alternatively, it can be a vector whose length matches either dimension of Y, in which case it is interpreted as being constant across the other dimension. more may also contain names, giving the names of the states if these are used in fn, and similarly parnames to give the names of the parameters.
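As a hedged sketch, and assuming that make.id() supplies identity-transformation members fn, dfdx and so on, and that n, d, the observation times time and the basis RMbasis are defined as in the Rosenzweig-MacArthur example, a complete lik object could be assembled as:

R> lik <- make.SSElik()
R> lik$more <- c(make.id(), list(weights = matrix(1, n, d)))
R> lik$bvals <- eval.basis(time, RMbasis)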
Programmers interested in setting up their own make... functions might find make.SSElik useful as a prototype.

Multivariate normal loss function make.multinorm
This set of functions calculates the log multivariate normal density for each observation, with mean f(t, x, θ) and covariance V(t, x, θ). This is a generalization of the SSElik functions to correlated processes whose correlation may vary over time and across the state variables. These functions are most useful for defining proc objects.
Function make.multinorm returns a lik object with members fn, dfdx, dfdy, dfdp, d2fdx2, d2fdxdy, d2fdy2, d2fdxdp and d2fdydp, which calculate the multivariate normal log likelihood. These functions require more to contain fn, dfdx, dfdp, d2fdx2 and d2fdxdp to calculate the f_j and their derivatives, as in SSElik. It must also contain members var.fn, var.dfdx, var.dfdp, var.d2fdx2 and var.d2fdxdp to provide the same set of derivatives for V(t, x, p).
Since V (t, x, p) is matrix-valued, the output of these functions must have an extra dimension; giving var.dfdp dimension n × d × d × p, for example.
In addition, more contains entries f.more and v.more holding any additional objects to be passed to f and to V, respectively. It may also contain names, giving the names of the states if these are used in fn and var.fn, and similarly parnames to give the names of the parameters.
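As a minimal hedged sketch, a constant diagonal covariance could be supplied by a var.fn such as the following, where the parameter name sigma is an assumption for illustration; the corresponding derivative members would then simply return zero arrays of the appropriate dimensions:

R> var.fn <- function(times, x, p, more) {
+   n <- nrow(x)
+   d <- ncol(x)
+   V <- array(0, c(n, d, d))  # one d x d covariance matrix per time point
+   for (j in 1:d) V[, j, j] <- p["sigma"]^2
+   return(V)
+ }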

Finite differencing with function make.findif.loglik
This function creates a straightforward finite difference approximation to the various required derivatives of the log likelihood. Since d and p are not expected to be overly large, this is not very computationally intensive unless the likelihood itself is expensive to evaluate. These functions are also helpful in providing a check on user-defined likelihoods and their derivatives.
Function make.findif.loglik produces a lik object with members fn, dfdx, dfdy, dfdp, d2fdx2, d2fdxdy, d2fdy2, d2fdxdp and d2fdydp that calculate the appropriate derivatives by fixed-step finite differencing. These functions require more to be a list containing the following entries (a usage sketch follows the list):
fn: the function that lik$fn would use if derivatives were given to it explicitly.
eps: the discretization parameter to be used in the finite difference scheme.
more: any additional inputs to more$fn.
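For instance, a hedged sketch in which a hypothetical user-coded log likelihood my.loglik, taking the standard arguments y, times, x, p and more, has all of its derivatives supplied by differencing:

R> lik <- make.findif.loglik()
R> lik$more <- list(fn = my.loglik, eps = 1e-6, more = NULL)

As with any lik object, lik$bvals must still be set to the matrix of basis values at the observation times.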

Defining proc objects for assessing equation fit
The proc object stores functions to calculate the second term, the roughness penalty P. It is structured in a similar manner to the lik objects. An examination of both the least squares version of the inner optimization criterion J in PENSSE(x, θ) and its more general version in (12) indicates that we
• replace the summation over n discrete time points t_i by integration over continuous t, and
• replace the noisy observed data values y_{ij} by the current derivative estimates Dx_j(t), which, like the data, are not expected to be exactly equal to their fitted values f_j(t, x, θ).
An important difference is that while lik$fn returns a goodness-of-fit measure at each observation time point, proc$fn returns a single number evaluating the goodness of fit for the entire process. This distinction is largely invisible within inneropt and outeropt, but it allows the user to extend the CollocInfer functionality to spatial processes and to processes that involve delays or integro-differential systems.

Quadrature points t_q and weights w_q
At the computational level, the integral defining P is necessarily approximated by numerical quadrature. This involves a judicious discretization of t, replacing the integral by a summation over quadrature points t_q with quadrature weights w_q, so that in the least squares case

P(x | θ) ≈ Σ_j Σ_q w_{qj} [Dx_j(t_q) − f_j(t_q, x(t_q), θ)]^2,

with the analogous sum over quadrature points replacing the integral in the more general setting (12).
Numerical quadrature plays an essential role in the collocation approach, as indeed in any method of approximating the solution to a differential equation. The user must supply these quadrature points and weights. The reader is warned that more difficult dynamic systems involving sharp local curvature in their solutions will demand more sophisticated rules based on the expected behavior of the ODE.
However, when solutions have only mild curvature, the quadrature points t_q may be equally spaced and sufficient in number to capture the required detail in the solution. The weights w_q may then be set equal to the spacing δ = t_{q+1} − t_q everywhere except at the end points, where they are δ/2. This simple approach to quadrature is the trapezoidal rule.
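As a small sketch, trapezoidal points and weights over an assumed observation interval [0, 20] can be computed as:

R> Q <- 401
R> qpts <- seq(0, 20, len = Q)  # equally spaced quadrature points
R> delta <- qpts[2] - qpts[1]   # spacing delta between points
R> qwts <- rep(delta, Q)        # interior weights equal to delta
R> qwts[c(1, Q)] <- delta / 2   # half weights at the two end points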

Defining the functions and their arguments
As in the definition of the lik object, the proc object is a named list; some of the names are specifically required by the CollocInfer package, and the majority of these members are user-defined functions. However, many useful defaults are provided.
The required names and their contents for the proc list are
fn: a function that calculates P(Dx, f(x; θ)), the regularization penalty (or log probability) of the process. This returns a scalar.
dfdc: a function that calculates the derivatives of fn with respect to coefficients in C, returns a vector of length dK.
dfdp: a function that calculates the derivatives with respect to the parameters in θ, and returns a vector of length p.
d2fdc2: a function that calculates the second derivatives with respect to coefficients, and returns a matrix of size dK × dK.
d2fdcdp: a function that calculates the cross derivatives of coefficients and parameters, and returns a matrix of size dK × p.
more: usually a named list object whose members provide additional information defining these functions. Two members that may optionally be provided are
names: d names for the state variables.
parnames: p names for the parameters.
bvals: a named list whose member bvals is a Q × K matrix of values of the basis functions at the quadrature points t_q, and whose member dbvals is a Q × K matrix of values of the first derivatives of the basis functions at the quadrature points t_q.
Some applications may require members of list bvals containing higher derivatives of the basis functions at the quadrature points.
All of the functions above take the following arguments
coefs: the current estimate of the coefficients.
bvals: as given in the bvals member in proc.
pars: current parameter values.
more: a named list containing additional information that may be required. In particular, it must specify quadrature points and weights with the names
qpts: a vector of length Q containing the quadrature points t_q at which the penalty is to be evaluated.
weights: a matrix of dimension Q × d containing the quadrature weights.
Named list more may also optionally have the members
names: a list of d names for the state variables. These enable the functions defined above to refer to state variables by name rather than by index.
parnames: a list of p names for the parameters.
These rather general definitions for the proc functions imply somewhat more effort for the user in defining them. The payoff is a very general framework that can encompass both discrete- and continuous-time systems, along with higher-order and spatial systems.
However, as with lik objects, a number of pre-defined functions have been constructed to create special proc objects. These may have different definitions for the bvals member of the proc list, as well as for the more member.

Error sum of squares proc function make.SSEproc
This is the analogue of make.SSElik and is used by default. It is based on a quadrature approximation to the integrated squared error version of the roughness penalty

P(x | θ) = Σ_j ∫ [Dx_j(t) − f_j(t, x(t), θ)]^2 dt.

The CollocInfer pre-specified function make.SSEproc() creates a proc named list with functional members fn, dfdc, dfdp, d2fdc2 and d2fdcdp. In addition, member bvals needs to be defined as a named list holding (an assembly sketch follows the list):
bvals: a Q × K array giving the values of the basis functions at the quadrature points.
dbvals: a Q × K array giving the values of the first derivatives of the basis functions at the quadrature points.
The more member must in turn specify, as for any proc object:
qpts: a vector of quadrature points t_q at which the penalty is to be evaluated.
weights: a matrix giving the quadrature weights.
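As a hedged sketch of how these pieces fit together, and assuming that rhs is the finite-differenced right-hand side list built earlier and that qpts, qwts and RMbasis are defined as in the Rosenzweig-MacArthur example below, an SSE-based proc object could be assembled as:

R> proc <- make.SSEproc()
R> proc$bvals <- list(bvals = eval.basis(qpts, RMbasis),
+   dbvals = eval.basis(qpts, RMbasis, 1))
R> proc$more <- rhs           # right-hand functions and their derivatives
R> proc$more$qpts <- qpts     # quadrature points
R> proc$more$weights <- qwts  # quadrature weights, with lambda folded in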

Function Cproc
Function Cproc generalizes SSEproc to allow any negative log likelihood of Dx given x, evaluated as a weighted sum over the quadrature points:

P(x | θ) = Σ_q w_q ℓ(Dx(t_q) | x(t_q), θ),

where ℓ is the chosen log density. It can be used to take any of the negative log likelihoods defined for lik objects and convert them into the corresponding proc objects, provided the derivatives with respect to y are defined in the lik object.

Function Dproc
The Dproc functions provide similar functionality to Cproc, but for discrete-time processes. These calculate the discrepancy between each state value x(t_{i+1}) and its one-step prediction f(t_i, x(t_i), θ), accumulated over successive time points. We often define Φ to be a sequence of step functions with breaks at the midpoints between the t_i. This is then equivalent to estimating the discrete states of the system; a sketch of such a basis follows.
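As a hedged sketch, such a step-function basis can be constructed with order-1 B-splines from the fda package, assuming the observation times are held in a vector time:

R> mids <- c(time[1], time[-length(time)] + diff(time) / 2,
+   time[length(time)])
R> Phi <- create.bspline.basis(rangeval = range(time), norder = 1,
+   breaks = mids)

With n + 1 breaks and order 1, this basis has exactly n basis functions, and so is saturated in the sense used below.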
The arguments for function Dproc are mildly different:
bvals: contains a single n × K matrix giving the basis values at the n evaluation times. For a saturated basis, n = K and bvals is just the identity matrix of order n.
more$qpts is a vector of n − 1 times.

Manual set-up of the Rosenzweig-MacArthur model
In order to put all this in context, we give here the code that produces the lik and proc objects for the Rosenzweig-MacArthur model example above. Instead of using the LS.setup function to generate the lik and proc objects, we go through the process step by step.
We assume that we have defined RosMac2, RMbasis, the data, RMpars, RMParnames and RMVarnames. In other words, we assume that the code associated with the Rosenzweig-MacArthur model has already been run and we are just manually producing the lik and proc objects here.

We begin by creating matrices of the values of the basis functions at the time points we wish to use. Because only the values of the basis functions are required, the user may employ their own basis system if desired. We need the values of the basis system at the observation time points:

R> bvals.obs <- eval.basis(time, RMbasis)

and at the quadrature times. Here we use the midpoints between breaks plus the end points:

R> qpts <- c(breaks[1], breaks[1:(length(breaks) - 1)] + diff(breaks) / 2,
+   breaks[length(breaks)])

The quadrature weights are all equal, but we have multiplied them by the λ that we are using:

R> qwts <- 1e5 * matrix(1, length(qpts), 3) / length(qpts)

For proc objects, the basis values are a list containing two matrices providing the values of the basis functions at the quadrature points and the values of the first derivatives of the basis functions (note the derivative order 1 in the second call):

R> bvals.quad <- list(bvals = eval.basis(qpts, RMbasis),
+   dbvals = eval.basis(qpts, RMbasis, 1))

With these defined, we create the lik object. The command

R> lik.m <- make.SSElik()

sets up the squared error criterion for the data. This is a list of functions and their derivatives, to which we attach the values of the basis expansion at the observation time points with the command

R> lik.m$bvals <- bvals.obs

We now need to specify the transformation of the state variables that is to be compared with the data, and functions to calculate the various derivatives. In this case we use finite differencing to compute the derivatives that we need; we can achieve this by first employing