MAGI: A Package for Inference of Dynamic Systems from Noisy and Sparse Data via Manifold-constrained Gaussian Processes

This article presents the MAGI software package for the inference of dynamic systems. The focus of MAGI is on dynamics modeled by nonlinear ordinary differential equations with unknown parameters. While such models are widely used in science and engineering, the available experimental data for parameter estimation may be noisy and sparse. Furthermore, some system components may be entirely unobserved. MAGI solves this inference problem with the help of manifold-constrained Gaussian processes within a Bayesian statistical framework, whereas unobserved components have posed a significant challenge for existing software. We use several realistic examples to illustrate the functionality of MAGI. The user may choose to use the package in any of the R, MATLAB, and Python environments.


Introduction
Ordinary differential equations (ODEs) are widely used as models for dynamic systems in science and engineering, including gene regulation (Bolouri 2008), chemical reactions (Walas 1991;Wong, Yang, and Kou 2023), epidemiology and ecology (Busenberg and Cooke 1981), economics (Tu 2012), etc.We focus here on the case where the ODEs are nonlinear with unknown parameters governing their behavior.The problem of interest is to recover the unobserved system trajectories as well as to estimate the parameters from experimental or observational data, where the observations taken from the system may be subject to measurement noise and may only be available at a sparse number of time points.Further, some components in the system may be entirely unobserved.This paper introduces the magi software package, named after its corresponding method (MAnifold-constrained Gaussian process Inference; Yang, Wong, and Kou 2021) which provided fast and accurate inference for this statistical problem on a variety of examples, including the case when there are unobserved system components.Specifically, magi handles parameter estimation for models where the system components are governed by a set of ODEs, which we denote by ẋ(t) = dx(t) dt = f (x(t), θ, t), (1) where x(t) is the D-dimensional system trajectory over time 0 ≤ t ≤ T (i.e., x : [0, T ] → R D ), and ẋ(t) is shorthand for the vector of derivatives dx(t)/dt, which are specified via the known function f .The vector θ denotes the model parameters to be estimated, which govern the behavior of the system.We let y(τ ) denote the observed data, namely the noisy measurements taken from the system at observation time points τ .Throughout this article, we use τ = (τ 1 , τ 2 , . . ., τ D ) to denote the collection of observation time points, where τ d is the vector of time points at which component d is observed, d = 1, . . ., D. Each system component can have its own set of observation times τ d , and some components may not be observed at all (for which τ d = ∅).We assume that the noise is additive and Gaussian, i.e., y(τ ) = x(τ ) + ϵ(τ ), where the error term ϵ has noise level σ (which may be known or unknown).The key feature of magi is to infer x(t) and θ from y(τ ) without the need for any numerical integration, even when there are unobserved system components. 1 This is achieved by taking a Gaussian process (GP) as a prior for x(t) and constraining it to a manifold that satisfies the ODE system.Inference is then carried out within a principled Bayesian statistical framework, that is, we condition on all known information and quantities and apply Bayesian techniques to the resulting posterior distribution.
magi is available for R, MATLAB, and Python which enable practitioners to input and work with custom ODE systems in their preferred computing environment.The packages share a common C++ (Stroustrup 2013) code base, which ensures a consistent method implementation across all three environments.The main text of this article will provide code examples in R (R Core Team 2024).Equivalent code for the examples in MATLAB (The Math-Works Inc. 2021) and Python (Van Rossum et al. 2011) are provided in the replication materials, along with usage instructions in the Appendices A and B. Note that even with the same random seed, the exact numerical results from the replication script may exhibit slight differences depending on the versions of the LAPACK (Anderson et al. 1999) and BLAS (Blackford et al. 2002) libraries present on the system.R package magi (Yang and Wong 2024) is available from the Comprehensive R Archive Network (CRAN) at https: //CRAN.R-project.org/package=magi.

Illustrative example: Oscillation of Hes1 mRNA and protein levels
To begin with a concrete example, consider the three-component dynamic system, which for short we write as X = (P, M, H), introduced in Hirata et al. (2002), governed by the ODEs where P and M are the protein and messenger ribonucleic acid (mRNA) levels in cultured cells.In experimental data, P and M levels exhibit oscillatory cycles approximately every 2 hours, and the H component is a Hes1-interacting factor that helps regulate this oscillation via a negative feedback loop.The parameters of this system are θ = (a, b, c, d, e, f, g), where a and b can be interpreted as synthesis rates; c, d and g as decomposition rates; and e and f as inhibition rates.
The remainder of this subsection describes a realistic sample dataset that is simulated from this system.The key features of the dataset are: (i) M and P are measured at different sets of time points, (ii) the observations for M and P are noisy (i.e., have measurement error), (iii) H is never observed.In Section 3, we will then demonstrate how to use magi to recover the system trajectories and parameters from the dataset, without the use of any numerical solvers.
We first define a function that computes f for this ODE system.Its inputs are a vector of parameters θ, a matrix for X (with columns corresponding to components), and a vector of time points tvec.The function then returns a matrix of values of f , with rows corresponding to the time points in tvec and columns corresponding to the components of X: Following the real experimental setup in Hirata et al. (2002), measurements of P and M are taken every 15 minutes over a four-hour period, but asynchronously: P is observed at t = 0, 15, 30, . . ., 240 minutes, while M is observed at t = 7.5, 22.5, . . ., 232.5 minutes, and H is never observed.
We shall simulate from this system using the parameter values studied theoretically in Hirata et al. (2002): a = 0.022, b = 0.3, c = 0.031, d = 0.028, e = 0.5, f = 20, g = 0.3; the initial conditions P (0) = 1.439,M (0) = 2.037, H(0) = 17.904 are taken to mimic the authors' setting, where the system is initialized at the point in the stable oscillation cycle where P is at its minimum.The observation noise in the experiment is approximately 15% of both the P and M levels, which we treat as multiplicative noise following a log-normal distribution with known standard deviation 0.15.For convenience, we setup a list containing these input values for the simulation: R> param.true<-list(theta = c(0.022, 0.3, 0.031, 0.028, 0.5, 20, 0.3), + x0 = c(1.439, 2.037, 17.904), sigma = c(0.15,0.15, NA)) Since H is never observed, measurement noise is not applicable to that component.Next, to simulate the data for our analysis, we use a numerical solver to construct the system trajectories implied by these values.In R, we can utilize the ODE solvers available in the True system trajectories (solid curves) and sample noisy observations (points) from the Hes1 system.The H component is never observed.In Section 3, we demonstrate how magi recovers the system trajectories and parameters from this sample dataset of noisy observations.
deSolve package (Soetaert, Petzoldt, and Setzer 2023), by defining a wrapper that satisfies the syntax of its ode function.We emphasize that the numerical ODE solver is only used here for generating the data; throughout the MAGI method for inferring the system trajectories and parameters, no numerical solver is ever needed.

Overview of related software
magi handles the so-called inverse problem for ODEs, namely to recover the system trajectories and parameters from a set of observational or experimental data.Existing software for this problem can be broadly categorized into methods that rely on using numerical solvers for ODEs and those that do not.
In Equation 1, the system trajectory x(t) may be determined for a given set of parameter values θ and initial conditions x(0) by integration.For nonlinear functions f , numerical integrators (e.g., Runge-Kutta) are often needed for solving the ODEs in this way.We may denote this numerical solution as x(t; x(0), θ) to indicate its deterministic relationship with θ and x(0).A simple approach can thus repeatedly solve for x(t; x(0), θ) to optimize a likelihood or least-squares criterion for the data y(τ ), as a function of θ (and also x(0) if the initial conditions are unknown).A least-squares criterion to minimize would take the form ∥y(τ ) − x(τ ; x(0), θ)∥ 2 where ∥•∥ is the usual Euclidean norm, i.e., by evaluating x (which is determined for all t) at the observation time points τ .This can be viewed as a non-linear least squares (NLS) problem, which could be handled in R using nls from stats (R Core Team 2024) together with one of the numerical integrators in deSolve.In MATLAB, the System Identification toolbox (Ljung 1995) and the Data2Dynamics modeling environment (Raue et al. 2015) also provide functionality for inverse problems with the help of numerical solvers.Bayesian approaches for parameter estimation using numerical solvers are also available, such as in the deBInfer package (Boersch-Supan, Ryan, and Johnson 2017) in R. In this case, the numerical solution is used to construct a likelihood p(y(τ ) | x(τ ; θ, x(0)), σ) and priors are placed on all the unknown quantities among θ, x(0), and σ.While methods based on numerical solvers are generally applicable for ODE inverse problems (including when system components are unobserved), they may encounter significant computational bottlenecks: The numerical solver must be invoked repeatedly for values of θ and x(0) used in the estimation procedure.
As a result, the second broad category consists of methods designed to estimate θ without the need for numerical integration; magi belongs to this category.These methods use various techniques to curve-fit the observations while following the ODE system dynamics (e.g., by gradient matching).We provide an overview of some representative methods with software available in R in the following: • A pioneering collocation approach proposed a B-spline basis to fit the system trajectories, where x(t) is represented by x(t) = c ⊤ Φ(t) for basis functions Φ(t) and co-efficients c.Then, a penalized likelihood of the form ∥y(τ 2 dt is optimized, where the first term measures fit to the data and the second term (with penalty parameter λ and using B-spline derivatives for ẋ(t)) ensures fidelity to the ODEs (Ramsay, Hooker, Campbell, and Cao 2007).This method is available in the packages CollocInfer (Hooker, Ramsay, and Xiao 2016) and pCODE (Wang and Cao 2022).
• Reproducing kernel Hilbert spaces (RKHS) have also been used for gradient matching (Niu, Rogers, Filippone, and Husmeier 2016).The where k(t, •) is a kernel function from the Hilbert space, t 1 , . . ., t n are the observation times, and b d are kernel coefficients.
A criterion of the form ∥y(τ is then minimized, where the two terms have similar interpretation as in CollocInfer, and the regularization parameter λ may be obtained by cross-validation.Different variants of RKHS methods, including transformations on t to better accommodate potential time inhomogeneity of the ODE solutions, are implemented in KGode (Niu, Wandy, Daly, Rogers, and Husmeier 2021).
• Inference based on a separable integral-matching approach is implemented in simode (Dattner and Yaari 2024).First, x(t) is approximated via fitting a spline representation x(t) to the data, to bypass numerical integration of the ODEs.Noting that the true ODE solution may be expressed as dt as a function of θ and x(0).In systems where f is linear or semi-linear in θ, the estimates can be obtained efficiently by taking advantage of the separable parameters in the optimization procedure (otherwise, non-linear optimization will be required).
• A Bayesian approach using GPs for fitting the trajectories is available in the deGradInfer package (Macdonald and Dondelinger 2020).This implements the gradient matching method of Dondelinger, Husmeier, Rogers, and Filippone (2013), which places a GP prior on x(t) (with hyper-parameters ϕ) so that y, x, ẋ have a joint GP specification.
While some of these methods can handle unobserved system components in theory, the available software implementations tend to lack this functionality in general.Only CollocInfer and pCODE can accommodate an unobserved component in the estimation procedure; however, substantial manual input is required to carry out the analysis.This is subsequently demonstrated in Section 5, where we carry out an illustrative comparison between methods.Thus, one distinct contribution of the magi package is that it provides a ready-made solution for systems with unobserved components, in addition to its principled inference framework that is rooted in Bayesian statistics.

Manifold-constrained Gaussian process inference
This section explains the key points of manifold-constrained Gaussian process inference (MAGI) for inferring the system trajectories x(t) and parameters θ given the observed data y(τ ).The interested reader may refer to Yang et al. (2021) for additional details.
The MAGI method places a GP prior on x(t), so that ẋ(t) conditional on x(t) also has a convenient GP form for facilitating inference without the need for numerical integration.Previous authors adopting this basic idea, e.g., Calderhead, Girolami, and Lawrence (2009); Dondelinger et al. (2013); Barber and Wang (2014); Wenk, Gotovos, Bauer, Gorbach, Krause, and Buhmann (2019), have noted that this setup can cause ẋ(t) to be conceptually specified in two incompatible ways: First via the function f in the ODE (see Equation 1), and second via the GP, e.g., as seen above in the setup of deGradInfer.The MAGI method addressed this conceptual difficulty by conditioning the GP on a manifold constraint that satisfies the ODEs specified by f .This manifold constraint can be described as follows.First, let D denote the number of system components, with x d (t) and f (x(t), θ, t) d denoting the d-th component of x(t) and f respectively, d = 1, . . ., D, and let C 1 [0, T ] be the set of differentiable functions on [0, T ].
Then for x : [0, T ] → R D and a given parameter space Ω θ , we define a manifold X on which the derivative ẋ satisfies the dynamics specified by the ODE: i.e., x ∈ X lies on the manifold of the ODE solutions.Second, to incorporate this manifold as a constraint in the GP, we define a variable W according to i.e., W quantifies the maximum discrepancy between a derivative trajectory and the dynamics implied by the ODEs.Thus, W = 0 if and only if a realization X = x of the GP satisfies x ∈ X .Under a Bayesian paradigm, the joint posterior distribution of θ and X(t) is then conditioned on W = 0 and the observed data y(τ ), i.e., the ideal posterior of interest is p(θ, x(t) | W = 0, y(τ )).However to be practically computable, W = 0 needs to be approximated by finite discretization.Let I = {t 1 , t 2 , . . ., t n } be a set of discretization points in [0, T ], with τ ⊂ I; then as a discretized analogue to W , we define a variable W I according to This allows us to approximate W = 0 by setting W I = 0, i.e., Ẋ(t) from the GP is constrained to equal f (X(t), θ, t) from the ODE for each component d = 1, . . ., D at each time point t ∈ I.
With W I = 0 as the manifold constraint in practice, the corresponding posterior distribution is p(θ, x(I) | W I = 0, y(τ )).As we shall see below, this construction induces a closed-form posterior such that standard techniques of Bayesian inference can be applied, while ensuring that posterior samples of x(I) respect the ODE dynamics.Our construction also contrasts with penalized likelihood or regularization-based approaches (e.g., Ramsay et al. 2007;Wang and Cao 2022;Niu et al. 2021); applying the rules of conditional probability with W I = 0  In panel (a), the GP is conditioned on four noisy observations (solid black dots).The colored curves are five random trajectories drawn from the resulting GP posterior, and the gray shaded area represents the 95% credible interval.In panel (b), the GP is conditioned on the same four noisy observations and also the ODE manifold constraint.The colored curves are five random trajectories drawn from the resulting GP posterior, and the gray shaded area represents the 95% credible interval with the manifold constraint.
ensures that the ODE dynamics are exactly followed for all time points in I, without the need to construct any penalty or regularization term.
To illustrate this concept of manifold constraint, we guide the reader through a simple example in Figure 2. We begin with a selected GP prior on a 1-dimensional x(t) over the interval [0, 10].For practical computation, the GP will be evaluated at I = {0, 0.05, 0.1, . . ., 10}.Suppose four noisy observations y(τ ) are taken at τ = {1, 3, 4, 5}; these are shown as black points in the panels of Figure 2. Then the GP posterior conditional on these four observations is visualized in Figure 2a.Five sample trajectories drawn from this GP posterior are shown via the colored curves, and the gray bands in Figure 2a represent 95% credible intervals of this GP posterior.Next, as an example suppose x(t) satisfies the linear ODE ẋ(t) = f (x(t), θ, t) = θ 1 x(t) + θ 2 with parameters θ = (θ 1 , θ 2 ), which has analytic solution x(t) = c exp(θ 1 t) − θ 2 /θ 1 for some constant c.We now proceed to condition the GP on both y(τ ) and the ODE manifold constraint W I = 0. Five draws of x(I) and θ from this manifold-constrained GP posterior are visualized via the colored curves in Figure 2b.We see that they each obey the functional form of the known analytic solution x(t) in this case, namely x(t) = c exp(θ 1 t) − θ 2 /θ 1 with different values of c, θ 1 , and θ 2 , i.e., ẋ(t) from the GP satisfies the ODE specified by f .The gray shaded area in Figure 2b represents 95% credible intervals of this manifold-constrained GP posterior.Contrasting the two panels of Figure 2, this analytical example provides an intuitive depiction for the key idea of the ODE manifold constraint.The R code to produce Figure 2 is given in the replication script.
We return to our discussion of the joint posterior distribution on x(I) (i.e., the system trajectories at t 1 , . . ., t n ) and θ, given y(τ ) and W I = 0, which is expressed using Bayes' rule and factorized according to: Next, we note that both the GP prior for X and the observations are independent of Θ, so we have the simplifications p(X( To simplify the last term, we substitute the definition of W I = 0 and use the fact that the GP derivative Ẋ(I) given X(I) has a multivariate normal distribution that is conditionally independent of Θ and the observations: Thus, we finally obtain where the four terms are: π(θ) the prior density of the parameters, p(x(I)) the multivariate normal density for the GP prior on x(t) evaluated at the points in I, p(y(τ ) | x(I)) the likelihood of the noisy observations, and p( Ẋ(I) = f (x(I), θ, t I ) | x(I)) the multivariate normal density for the conditional distribution of Ẋ(I) given X(I) evaluated at Ẋ(I) = f (x(I), θ, t I ).Equation 3 is the basis of inference for magi.
Note that the GP prior on x(t) involves hyper-parameters that govern the mean and covariance functions of the GP; we denote these hyper-parameters by ϕ.The likelihood p(y(τ ) | x(I)) may additionally depend on noise parameters, we denote these by σ, which may be known or unknown depending on the application.
The MAGI method obtains Markov chain Monte Carlo (MCMC) samples for x(I) and θ from Equation 3. The method proceeds according to the following steps: 1.For system components that have observations, obtain values of ϕ and σ by finding the optimal hyper-parameters and noise level based on fitting a GP to the data.(If the noise level σ is known or given, optimization is applied to ϕ only.)The σ obtained is used to initialize MCMC sampling when the noise level is unknown.MCMC sampling for x(I) is initialized by linearly interpolating the observations y(τ ).
2. For system components that are unobserved, obtain values of ϕ and x(I) together with θ by a joint optimization of Equation 3, treating (ϕ, σ, x(I)) for the observed components obtained in step 1 as fixed.If there are no unobserved components, only θ is optimized in Equation 3.This step provides values at which MCMC sampling for θ is initialized, along with (ϕ, x(I)) for unobserved components.
3. Hamiltonian Monte Carlo (HMC, Neal 2011) is used as the MCMC sampling algorithm for obtaining joint draws of x(I) and θ (together with σ if the noise parameters are unknown).During this posterior sampling, the GP hyper-parameters ϕ are fixed at the values obtained in steps 1 and 2. Our implementation of HMC automatically tunes the leapfrog step sizes of HMC during burn-in to achieve an acceptance rate of 60-90%.The theoretical optimal acceptance rate for HMC is 65%, as suggested in Neal (2011).
At the conclusion of MCMC sampling (and after discarding the burn-in iterations), we may treat the posterior means of x(I) and θ as estimates of the trajectories and parameters respectively.The MCMC samples can also be used to characterize the uncertainty in these estimates, by computing credible intervals (CIs).
More methodological details of the MAGI method are discussed in Yang et al. (2021); the remainder of this paper focuses on practical usage of magi and the functionalities of the software package.

Using the magi package
This section illustrates the main functionalities of the magi package by analyzing the sample dataset for the Hes1 model discussed in the introduction.
After installing the magi package from CRAN, we load it into R:

R> library("magi")
The overall method is carried out via the core function MagiSolver, which initializes the GP hyper-parameters ϕ and then carries out MCMC sampling for the parameters together with the system trajectories.The basic syntax is where y is a data matrix that includes a column named time for the time points, odeModel is a list that specifies the ODE functions and its parameters, and control is a list used to provide any additional control settings.We describe each of these in turn, as we set up MagiSolver for the Hes1 dataset.Since the Hes1 observations for P and M are positive and subject to multiplicative error, we may apply a log-transform to the data and equations for the analysis, which then satisfies the framework for additive noise ϵ.Thus, continuing the data setup from Section 1, we create y.tilde as the log-transformed version of the observations y (i.e., excluding the time column): R> y.tilde <-y R> y.tilde[, names(y.tilde)!= "time"] <-+ log(y.tilde[,names(y.tilde)!= "time"]) Recall that any unobserved values of y.tilde (in this case, the entire H component column and every other value for P and M ) are assigned NaN.The data matrix y.tilde for input to MagiSolver is now prepared, where the discretization set is I = {0, 7.5, 15, . . ., 240} minutes and corresponds to the time points where either P or M are observed.Note that with |I| denoting the cardinality of I, the dimensions of the input data matrix are |I| × (D + 1), to include a column for time and each system component.To set up the input data matrix with different choices of I, a helper function setDiscretization is provided; see Section 4.1 for a discussion of its usage and general guidelines for setting up I in practice.
Next we set up the functions required for the odeModel list.First, we apply a log-transform to each component of the ODE system by defining X = (log(P ), log(M ), log(H)), then applying the relation where P , M , and H are the components in Equation 2. We may code this via a function which follows the same format as the function for the original (non-transformed) hes1modelODE: Second, to facilitate MCMC sampling via HMC we also supply functions for the gradients of the ODEs with respect to the system components x and the parameters theta.With respect to x, we have the matrix of gradients as follows, which are specified via a function that outputs a 3-D array with dimensions |I|×D ×D, where the array slice [, i, j] is the partial derivative of the ODE for the j-th system component with respect to the i-th system component: With respect to theta, we have the matrix of gradients as follows, which are specified via a function that outputs a 3-D array with dimensions |I|×|θ|×D, where the array slice [, i, j] is the partial derivative of the ODE for the j-th system component with respect to the i-th parameter in θ: At this point, it can be worthwhile to check that the gradients have been coded correctly.This can be done using testDynamicalModel, which tests the provided analytic gradients for correctness using numerical differentiation (via a finite difference approximation).To illustrate, we generate some test values for the data and theta as input into testDynamicalModel along with our ODE functions, which indicate the numerical and analytic gradients match for both hes1logmodelDx and hes1logmodelDtheta: R> yTest <-matrix(runif(nrow(y.tilde)* (ncol(y.tilde)-1)), + nrow = nrow(y.tilde),ncol = ncol(y.tilde)-1) R> thetaTest <-runif(7) R> testDynamicalModel(hes1logmodelODE, hes1logmodelDx, hes1logmodelDtheta, + "Hes1 log", yTest, thetaTest, y.tilde[, "time"]) Hes1 log model, with derivatives Dx and Dtheta appear to be correct Third, odeModel must specify the upper and lower bounds on the parameters theta.In this example, all of the seven parameters (a, b, c, d, e, f, g) are non-negative, so we may set the corresponding bounds as 0 and Inf.
We are now ready to define the required list containing the three ODE model functions and parameter bounds: Finally, additional settings can be supplied to MagiSolver via the list control, which may include any number of the following optional inputs.Brief descriptions are provided here, along with references to subsequent sections for further details.
• Settings related to the MCMC sampling setup and initialization of σ, θ and x(I).
sigma: A numeric vector of length D, specifies the noise levels σ (i.e., standard deviations of observation noise) at which to initialize MCMC sampling.By default, MagiSolver assumes that σ is unknown and initializes it via fitting a GP to the data.If the noise levels are known, supply sigma together with the option useFixedSigma = TRUE, which will then omit σ from MCMC sampling.
the known values of σ must be supplied via the sigma control setting.Default is FALSE.-xInit: A numeric matrix with dimension |I| × D, specifies values for the system trajectories at which to initialize MCMC sampling.Default is linear interpolation between the observed (non-missing) values of y to match the resolution of the discretization set I. An optimization routine is applied to Equation 3 (as a function of θ, ϕ and x(I) for unobserved system components) to initialize any unobserved system components.
theta: A numeric vector of the same length as θ, specifies values for the parameters θ at which to initialize MCMC sampling.By default, MagiSolver optimizes Equation 3 as a function of θ only (with xInit fixed) to initialize theta; if there are unobserved system components, theta is initialized together with them (see xInit).-priorTemperature: Numeric, a tempering factor by which to scale the contribution of the GP prior, to control the influence of the GP prior relative to the likelihood of the observations.Effectively, the log of the GP prior is divided by priorTemperature.Default is the total number of observations divided by the total number of discretization points, aggregated over all components; a more complete discussion is provided in Section 4.1.
• Settings related to the GP prior and its hyper-parameters.These options are discussed in detail in Section 4.2.
kerneltype: String, specifies the type of GP covariance function to use.-bandSize: Integer, specifies the size of the diagonal band matrix approximation used to speed up matrix operations.Default bandSize is 20, can be increased if MagiSolver returns an error indicating numerical instability.
• Settings related to the Hamiltonian Monte Carlo (HMC) sampling algorithm that is used to obtain MCMC draws from the posterior.A description of HMC and the role of these options is provided in Section 4.3.
-niterHmc: Integer, specifies the number of HMC iterations to run.Default is 20000.
-nstepsHmc: Integer, specifies the number of leapfrog steps per HMC iteration.Default is 200.
-burninRatio: Numeric, specifies the proportion of HMC iterations to be discarded as burn-in.Default is 0.5, which discards the first half of the MCMC samples.
-stepSizeFactor: Numeric, initial leapfrog step size factor for HMC.Default is 0.01, and the leapfrog step size is automatically tuned during burn-in to achieve an acceptance rate between 60-90%.
• Other miscellaneous settings for specialized situations.
-skipMissingComponentOptimization: Logical, set to TRUE to override automatic optimization for unobserved components.If skipMissingComponentOptimization = TRUE, values for xInit and phi must be supplied for all system components.Default is FALSE.
-positiveSystem: Logical, set to TRUE to enforce the constraint that x(I) is nonnegative for all system components.Default is FALSE.
verbose: Logical, set to TRUE to output diagnostic and sampling progress messages to the console.Default is FALSE.
For most settings, the defaults are generally recommended as a reasonable starting point for using MagiSolver.The examples provided in this paper will illustrate some cases where it is necessary to override the defaults.
In the Hes1 example, the noise standard deviations are known.We supply these values via sigma and set useFixedSigma = TRUE in the control list, as otherwise σ is treated as a parameter that is sampled within each HMC iteration.We use the defaults for the remaining settings and run MagiSolver on the Hes1 dataset as follows, which stores the output in hes1result: R> hes1result <-MagiSolver(y.tilde,hes1logmodel, + control = list(sigma = param.true$sigma,useFixedSigma = TRUE)) In R, the output of MagiSolver is an S3 object of class 'magioutput' which contains the following list elements: • theta: Matrix of MCMC samples for θ after burn-in.
• xsampled: Array of MCMC samples for the system trajectories x(I) after burn-in.
• sigma: Matrix of MCMC samples for σ after burn-in.
• lp: Vector of log-posterior values at each HMC iteration, after burn-in.
• y, tvec, odeModel: The data matrix, time vector, and odeModel specification from the inputs to MagiSolver.
For convenience in R, 'magioutput' objects have the following associated methods to provide basic inferences from the MCMC samples: • print(): Displays a brief summary of the settings used for the MagiSolver run.
• summary(): Generates a table of parameter estimates and credible intervals.
• plot(): Visualizes the inferred trajectories and credible bands for each component, or generates diagnostic traceplots (i.e., plots of MCMC sampled values vs. the number of iterations) for the parameters.
We have allowed the hyper-parameters ϕ to be automatically estimated in this example (including for the unobserved H component), which is often sufficient in our experience.Further guidelines for setting the GP prior and hyper-parameters are discussed in Section 4.2.
Turning to the MCMC samples, Figure 3 shows the traceplots of theta and lp (sigma is omitted since it is treated as known in this example) as an informal check for convergence, produced using the plot() convenience function with type = "trace": R> theta.names<-c("a", "b", "c", "d", "e", "f", "g") R> plot(hes1result, type = "trace", par.names= theta.names,nplotcol = 4) The MCMC samples of each parameter randomly scatter around their posterior means (red horizontal lines), which visually indicate that convergence has occurred.The 95% credible intervals (via 2.5 to 97.5 percentiles of the MCMC samples) are shown via the green horizontal lines.We generally suggest taking the posterior mean as the parameter estimate for θ. 2 The numerical values of these parameter estimates and credible intervals can be extracted using the convenience summary() method: R> summary(hes1result, par.names= theta.names)a b c d e f g Mean 0.0220 0.300 0.0291 0.0336 0.634 13.50 0.1380 2.5% 0.0119 0.224 0.0185 0.0277 0.459 6.82 0.0694 97.5% 0.0358 0.398 0.0414 0.0403 0.851 24.80 0.2200 The true parameter values are well contained in the 95% credible intervals, with the exception of g, which only governs the unobserved H component as seen in Equation 4.
Next, we can extract and visualize the sampled system trajectories x(I).We treat the posterior means as the inferred trajectories, and use the 2.5 to 97.5 percentiles of the MCMC samples to provide 95% credible intervals at each time point in I.These are the default settings of the convenience plot() method, which we use to generate Figure 4: R> plot(hes1result, lwd = 2, col = "forestgreen", comp.names= compnames, + xlab = "Time", ylab = "log(Level)") In this example it is helpful to do some further post-processing, and generate a customized plot on the original scale with the true trajectory overlaid.We exponentiate to convert each component to the original scale of measurement: The inferred trajectories (green curves) and blue shaded areas representing the 95% credible intervals are shown in Figure 5, with the noisy observations and true trajectories overlaid as black points and red curves, respectively.The system trajectories are recovered well: The green curves for P and M are consistent with the observed data points and the truth for the entirely unobserved H component is correctly inferred.The plotting code for Figure 5 is given in the replication script.

Finer control of inference: Features and examples
This section presents two additional dynamic system examples to illustrate the role of the discretization set I, the GP prior and its hyper-parameters ϕ, and the HMC algorithm.We discuss how to use these features to obtain finer control over the inference results.The second example (in Section 4.2) also demonstrates a system with equations that explicitly depend on time.

Choice of discretization set
MAGI constrains the GP to satisfy the ODE system derivatives at the points in the discretization set I. Therefore, increasing the denseness of I may lead to more accurate inference in some cases, with the trade-off being longer computation time.In practice, it can be a useful strategy to consider running MagiSolver with an increasing sequence of discretization sets I 0 ⊂ I 1 ⊂ • • • to ensure that the estimates obtained are stable.
For the initial run, we recommend taking I 0 as the smallest evenly-spaced set (or approximately so) that includes the observation time points τ .This can help ensure that the system dynamics are adequately captured throughout the modeled time interval -note that this is not a requirement for running MagiSolver itself, which can handle any kind of spacing between time points.Then, we can construct subsequent sets I j , j ≥ 1 by inserting one equally-spaced point between each pair of adjacent time points in I j−1 .
The function setDiscretization can be used to prepare data matrices y according to this strategy.The command setDiscretization(y, level = j) returns a data matrix with 2 j − 1 equally-spaced points inserted between each observation of y (i.e., j = 0 returns the original matrix y); this works well when y are evenly spaced.The alternative syntax setDiscretization(y, by = incr) can be useful when the observations in y are unevenly spaced: It returns a data matrix with time points inserted (as needed) to form an equallyspaced discretization set from the first to last observations of y, with interval incr between successive discretization points.
Mathematically, as the discretization set becomes more dense, the contributions of the terms in Equation 3 , where our is the total number of discretization points divided by the total number of observations (aggregated over all components).For example if |I| is doubled, then tempering effectively reduces the GP contribution on the log-scale by half to compensate.A custom value for β can be set by the user via priorTemperature in the optional control list to MagiSolver, but this is not generally recommended.
We illustrate this idea of increasing discretization sets, on a dataset of noisy observations simulated from the classic FitzHugh-Nagumo (FN) equations for X = (V, R) that model spike potentials of neurons (FitzHugh 1961): where V and R are the voltage and recovery variables, and θ = (a, b, c) are the parameters to be estimated.
We begin by loading the dataset and setting a random seed for reproducibility: R> data("FNdat", package = "magi") R> set.seed(12321) The observation time points of FNdat are t = 0, 0.5, 1, . . ., 10 at intervals of 0.5, along with t = 11, 12, 13, 14, 15, 17, 20.Following the suggested strategy above, we create a data matrix corresponding to the first discretization set I 0 , taken to be the 41 equally-spaced points {0, 0.5, 1, . . ., 20}: R> y_I0 <-setDiscretization(FNdat, by = 0.5) We also create data matrices corresponding to the denser sets I 1 , I 2 , I 3 by successively inserting one equally-spaced time point between existing ones: R> y_I1 <-setDiscretization(y_I0, level = 1) R> y_I2 <-setDiscretization(y_I0, level = 2) R> y_I3 <-setDiscretization(y_I0, level = 3) The gradients of the ODEs with respect to X and θ are as follows: We can now run MagiSolver with the discretization sets we constructed and 10000 HMC iterations.Since the noise level is unknown in this dataset, σ will also be inferred via MCMC sampling.Note that the dimensionality of the variables being sampled effectively doubles with each successive discretization set.Thus, the outputs of the HMC iterations can become increasingly "sticky" (i.e., having higher autocorrelation) with denser discretization sets.For more detail on this point, see Section 4.3 for a discussion of HMC and its settings.One way to circumvent this is to increase the number of leapfrog steps per HMC iteration.We have illustrated that below, by setting nstepsHmc = 1000 for our densest set I 3 (otherwise the default is 200 leapfrog steps).To compare the estimates, we make use of summary() to extract the posterior means and 95% credible intervals for both θ and σ from each model fit:

R>
R> FNpar.names<-c("a", "b", "c", "sigmaV", "sigmaR") R> FNsummary <-lapply(list(FNres0, FNres1, FNres2, FNres3), + function(x) summary(x, sigma = TRUE, par.names= FNpar.names)) We then plot these posterior summaries for each parameter and discretization set: The panels of Figure 6 show that the posterior means (points) and credible intervals (vertical bars) visibly shift for b and σ V , as we use the successive discretization sets from I 0 to I 1 to I 2 .Meanwhile comparing I 2 and I 3 for all the parameters, the posterior means are fairly similar and the credible intervals largely overlap, indicating that the inference results are stable.

Setting of hyper-parameters
The GP prior on x(t) has two ingredients: The mean function µ(t) and covariance kernel/function K.In applications of GPs, it is customary to set µ(t) = 0 in the absence of specific prior information (see, e.g., Chapter 2 of Williams and Rasmussen 2006).The reason this can work well in practice is that the GP, once conditioned on observations, will have a mean function that tends to follow the observations.In magi, the GP is additionally conditioned on the ODE manifold constraint, which further aligns the GP mean function with the dynamics specified by the ODEs.Thus, our general recommendation is to assume a zero-mean GP prior for simplicity; all of the examples in the paper take this approach and have good inference results.The user may input a custom prior mean function to magi by evaluating µ(I) and μ(I), i.e., µ(t) and its derivative evaluated at the discretization set I, and providing them to MagiSolver via the optional arguments mu and dotmu.One potential scenario where this might be helpful is to provide prior guidance for unobserved components, though as demonstrated by the recovery of the unobserved H component in the Hes1 example (Section 3), this kind of input is not generally needed.
For a given covariance function K, the GP hyper-parameters ϕ control the overall prior variance level and prior smoothness/bumpiness of each component's trajectory.Specifically, the default choice and our recommendation for general usage in magi is a Matern covariance function K of the form where r = |s − t| is the absolute difference between two time points s and t, ν = 2.01 is the smoothness parameter (which ensures twice-differentiable curves), Γ is the gamma function, and B ν is the modified Bessel function of the second kind.Larger values of ϕ 1 therefore favor curves with higher variance, and larger values of ϕ 2 favor curves with more time-dependence between nearby time points.Each system component has its own set of (ϕ 1 , ϕ 2 ) values to ensure that the GP has sufficient flexibility to model its dynamics.
Other covariance functions are available in magi and may be selected using the kerneltype optional argument to MagiSolver.See Appendix F for their specification and details.Note that for a covariance function to be compatible with magi, the corresponding GP prior on x(t) must satisfy two conditions: (i) the GP derivative ẋ(t) exists, as implied by the definition of the ODE structure; (ii) ẋ(t) is also a GP.Together, this requires the covariance function K to be twice-differentiable (i.e., ∂ 2 ∂s∂t K(s, t) exists).The number of times that K is differentiable (with respect to r) relates to the smoothness of the GP; higher-order differentiability corresponds to smoother curves.Taking the Matern covariance in Equation 5 specifically, K is k-times differentiable if and only if ν > k; thus, smaller values of ν are more capable of modeling rough or bumpy trajectories (see, e.g., pp.84-85 of Williams and Rasmussen 2006).This motivates our default choice of generalMatern (i.e., Equation 5 with ν = 2.01, so that the kernel meets the twice-differentiable requirement and is capable to model relatively rough curves), which in our experience gives the best inference results over the widest range of system behavior (whether rough or smooth).Other covariance functions commonly used in GP applications (such as the Matern covariance with ν = 5/2 or the radial basis function kernel) encode stronger smoothness assumptions that hinder the GP from capturing sharp changes in x(t), which may in turn lead to bias in the parameter estimates.3By default, MagiSolver automatically estimates ϕ for each system component, depending on the availability of observed data.Briefly, for components with observations, a GP with the selected covariance function is fitted to the data via maximum a posteriori (MAP) estimation with a weakly informative Normal prior for ϕ 2 (and flat priors otherwise), which provides ϕ and a value of σ to initialize MCMC sampling (if σ is unknown).Then to handle unobserved components, optimization is applied to the full posterior in Equation 3 as a function of θ and ϕ, x(I) for unobserved components, with the previously initialized values of σ, ϕ, and x(I) for observed components held fixed.
The values of ϕ are held fixed during MCMC sampling for θ, σ, and x(I).This may be contrasted with a full Bayesian approach for handling ϕ, where ϕ would also be sampled.MCMC sampling for ϕ is however expensive as each update requires recomputing the covariance matrices associated with x(I) (e.g., see Titsias, Rattray, and Lawrence 2011) and ẋ(I) as needed in magi.Thus, we follow the approach of estimating ϕ based on its marginal likelihood and holding it fixed (e.g., see Chapter 5 of Williams and Rasmussen 2006); one potential disadvantage is that this approach does not account for uncertainty in ϕ.In practice, credible intervals for θ tend to be fairly stable so long as ϕ lies within a range that is appropriate for the data; an empirical assessment for this point is provided at the end of this section.
Fixing ϕ also allows us to leverage techniques to speed up calculations on the covariance matrices associated with the GPs.Specifically, let K d (s, t) be the fitted GP covariance function for component d, then the following |I| × |I| matrices are involved in its GP multivariate normal distribution at the discretization points in I: where Here, C d is the covariance matrix of x d (I), Ψ d is the covariance matrix of ẋd (I), and m d is the projection matrix that maps x d (I) to the mean function of ẋd (I).With the GP structure, the covariance between two time points tends to be non-negligible only when they are nearby; thus the off-diagonal entries of C d , Ψ d , and their inverses quickly decay to zero.Likewise, the relation between x d (I) is its derivative ẋd (I) is local, so off-diagonal entries of m d also decay to zero quickly.Since ϕ is fixed, the matrices C −1 d , Ψ −1 d , and d needed in the multivariate normal densities (Equation 3) can be pre-computed.Furthermore, we may approximate each of them with sparse band matrices (i.e., non-zero only within bandSize diagonals on either side of the main diagonal), which reduces the matrix multiplication complexity in Equation 3from O(|I| 2 ) to O(|I|).This band matrix approximation works best when I is evenly-spaced, as recommended in Section 4.1.In our experience, a band size of 20 works for most problems and is the default in magi.If the approximation fails (i.e., the quadratic form diverges), a warning message that suggests a larger band size will be automatically shown.A different band size can be chosen by setting bandSize in the control list to MagiSolver.
Often, the automatic estimates of ϕ will be within a reasonable range that permits accurate inference of θ; however, this is not guaranteed for all datasets, in which case we may manually override their values for better control over the inference results.This is done by supplying phi in the control list to MagiSolver.magi includes the gpsmoothing function for fitting a GP to data, along with the gpmean and gpcov functions to compute the resulting mean vector and covariance matrix conditioned on the data.This allows the user to examine and assess the estimated values of ϕ prior to running MagiSolver.
The example in this section demonstrates how these functions can be used, and how the appropriateness of hyper-parameter choices can be assessed, in the context of a system with equations that explicitly depend on time.We consider the three-component system X = (T U , T I , V ) for HIV infection described in the simulation study of Liang, Miao, and Wu (2010), where T U , T I are the concentrations of uninfected and infected cells, and V is the viral load: In this system, η(t) = 9 × 10 −5 × (1 − 0.9 cos(πt/1000)) is an oscillating infection rate over time (in days), and the parameters to be estimated are θ = (λ, ρ, δ, N, c).Functions for these ODEs (hivtdmodelODE) and their gradients (hivtdmodelDx and hivtdmodelDtheta) are shown in Appendix D. Then the odeModel list for input to MagiSolver is as follows: R> hivtdmodel <-list( + fOde = hivtdmodelODE, + fOdeDx = hivtdmodelDx, + fOdeDtheta = hivtdmodelDtheta, + thetaLowerBound = rep(0, 5), + thetaUpperBound = rep(Inf, 5)) We define the component names and labels for later use: R> compnames <-c("TU", "TI", "V") R> complabels <-c("Concentration", "Concentration", "Load") We also create a list with the simulation inputs (parameter values theta, initial conditions x0, noise levels sigma, and observation times) that mimic those in the referenced paper: R> param.true<-list(theta = c(36, 0.108, 0.5, 1000, 3), + x0 = c(600, 30, 1e5), sigma = c(sqrt(10), sqrt(10), 10), + times = seq(0, 20, 0.2)) Next we invoke the numerical solver and simulate noisy observations from this system over the specified time interval (t = 0 to 20) with measurements at intervals of 0.2 and noise SD param.true$sigma,again with a random seed set for reproducibility: These noisy observations are plotted as the black points in Figure 8a for each component.The system dynamics are characterized by rapid changes from t = 0 to t = 5, with the V component exhibiting a particularly steep decline during the first day.
We can use gpsmoothing to perform a preliminary GP fit and obtain estimates of ϕ and σ for each component.The inputs to gpsmoothing are the noisy observations and vector of time points (ODE information is not used at this stage), and the function returns a list with phi and sigma elements.Its usage is demonstrated as follows, where we create phiEst and sigmaInit to store the results, then perform GP fitting for each system component and extract the estimates: R> phiEst <-matrix(0, nrow = 2, ncol = ncol(y) -1) R> sigmaInit <-rep(0, ncol(y) -1) R> for (j in   Next, we can visualize the GP fit implied by these values of ϕ and σ, with the help of the gpmean and gpcov functions.These compute the GP mean vector and covariance matrix conditioned on the observations, ϕ, and σ.To plot a smooth curve of the GP fit, we carry out this computation on a fairly dense set of time points (tOut), and the diagonal of the covariance matrix can be used to produce credible bands (e.g., ±1.96 SDs): R> tOut <-seq(0, 20, by = 0.025) R> for (j in , cex = 0.5) + lines(tOut, fitMean, type = "l", col = "steelblue") + } The resulting blue curves in Figure 8a show the GP means of each component, and the gray bands are 95% intervals (i.e., mean ± 1.96 SD, where the SDs are extracted from the corresponding GP covariance matrix).We see that the automatically estimated GP hyperparameters ϕ 1 and ϕ 2 allow the mean curves to follow the observations reasonably well for components T U and T I ; however, the sharp trend in component V is not captured.The fitted mean curve for V is close to zero, or equivalently, the observations for V are incorrectly attributed as noise.Recall that at this stage, the mean curves are conditional on the observations only (i.e., the ODE manifold constraint is not yet included), so some wiggliness is not unusual, as seen for components T U and T I .The key visual indicator for a reasonable ϕ estimate is a mean curve that roughly captures the trajectory implied by the observations.The phenomenon seen in the V component might be explained by the fact that GP fitting tends to prefer smoother curves (due to the twice-differentiable requirement, see Section 4.2), which has the effect here of "oversmoothing" the trajectory to a near-constant mean function (see also Footnote 3).Using these default hyper-parameters could in turn lead to incorrect inference of θ.To address this issue, we could (i) choose a smaller value of ϕ 2 so that the GP can model sharper changes, and (ii) adjust ϕ 1 according to the overall scale of the component.We examine the automatic estimates of ϕ and σ so that we can make adjustments: R> colnames(phiEst) <-compnames R> phiEst TU TI V [1,] 37538.16828 11083.870501 14299.089897 [2,] 3.91358 2.755717 1.731937 R> sigmaInit [1] 3.378930 3.757803 14158.670863 These estimates appear reasonable for the T U and T I components, with a fairly small sigmaInit that corroborates Figure 8a, i.e., the GP fit follows the observed data closely.This is further evidenced by the ϕ 1 values for T U and T I , where we see that √ ϕ 1 resembles the scale of those components: ∼600 for T U and ∼100 for T I .However, the very large value σ ≈ 14000 for V confirms that the sharp decline in its trajectory is being incorrectly fitted as noise.This is also reflected in its ϕ estimates: ϕ 1 ≈ 14300 is too small to adequately capture the variation in V , while the bandwidth ϕ 2 ≈ 1.7 is too large to model the sharp initial decline.
Following the strategy described above, we use the control list to manually specify more suitable values of ϕ for the V component.In practice, inference is relatively insensitive to ϕ over a reasonable range of values, so a high level of precision is not required for this step.We increase ϕ 1 to 10 7 and decrease ϕ 2 to 0.5: Since the noise levels σ are treated as unknown, the values obtained by GP fitting are only used to initialize the MCMC sampler.We may initialize σ at a more reasonable (smaller) value for V that corresponds with the update to ϕ, which can also help obtain faster MCMC convergence:

R> sigmaInit[3] <-100
To assess whether these manual adjustments to ϕ and σ for V are reasonable, we can recalculate the GP mean and covariance with these new values: We add the updated mean curve (plotted in orange) and 95% intervals to Figure 8a: R> gp_UB <-fitMean + 1.96 * sqrt(diag(fitCov)) R> gp_LB <-fitMean -1.96 * sqrt(diag(fitCov)) R> polygon(c(tOut, rev(tOut)), c(gp_UB, rev(gp_LB)), + col = "gray80", border = NA) R> lines(tOut, fitMean, col = "darkorange") We see that the mean curve now follows the overall trajectory of the V observations.The 95% bands are very narrow and not visible in the plot.This visually confirms that the new values of ϕ for component V are reasonable as input to MagiSolver for carrying out further inference.
We proceed to create a discretization set I that adds one equally-spaced time point between observations:

R> y_I <-setDiscretization(y, level = 1)
Then we run MagiSolver, supplying phi and sigma using the values we specified (recall that phi is fixed at its supplied value, while sigma will be sampled via MCMC starting from its supplied value): R> HIVresult <-MagiSolver(y_I, hivtdmodel, control = list( + phi = phiEst, sigma = sigmaInit)) We compute posterior means and 95% credible intervals for θ and σ: R> summary(HIVresult, sigma = TRUE, par.names= c( + "lambda", "rho", "delta", "N", "c", "sigma_TU", "sigma_TI", "sigma_V")) The estimates for λ, ρ, δ are very close to the true values, while N and c have some slight bias.The estimates of σ likewise reflect their true simulation values, including σ V which we had initialized at 100.We also extract and plot the inferred trajectories (green) in Figure 8b together with 95% credible bands (light blue) and the truth (red) superimposed: In Figure 8b, the inferred trajectories are indistinguishable from the truth, and the 95% credible bands are very narrow.The bands are only visible for portions of the T I trajectory.Thus, good inference results can be obtained in this example by ensuring that the choice of the hyper-parameters ϕ is reasonable.
Lastly, we provide an empirical assessment for the sensitivity of the parameter inference to the GP hyper-parameters ϕ.Since ϕ is held fixed during MCMC sampling, estimates and credible intervals for θ can have some dependence on the specific value of ϕ used.In this HIV example, we used ϕ values that were automatically fit for T U and T I , and manually specified for the V component.To assess the effect of ϕ, we considered randomly sampling new values for each ϕ 1 and ϕ 2 , and re-running MagiSolver with those values.Each entry of ϕ was drawn uniformly as [2/3, 3/2] times its original value, which could be considered a reasonable range of variation: We repeated this for 100 different random seeds.Figure 9 plots the posterior means (points) and 95% credible intervals (vertical bars) for the five parameters over all these random ϕ.The upper and lower limits of the 95% credible intervals inferred using the original values of ϕ are shown via the dashed horizontal lines.For all five parameters and 99 of 100 repetitions, the 95% credible intervals overlap with the original ones.The intervals for λ, ρ, δ are especially robust to the choice of ϕ.Overall, this experiment empirically demonstrates fairly stable inference for θ, provided that ϕ lies within a range that is appropriate for the data.

Hamiltonian Monte Carlo
HMC is an MCMC sampling algorithm that leverages Hamiltonian dynamics (see, e.g., Leimkuhler and Reich 2004) to obtain draws from a target probability distribution.Via its joint consideration of "position" and "momentum" variables, samples generated by HMC can explore the target distribution more effectively than those of random walks (Neal 2011).In this subsection, we review the key concepts of HMC and its implementation options available in magi.
The ingredients of HMC are setup as follows.Let q be a set of "position" variables with negative log-density U (q) (up to an additive constant), so that the target probability density can be written as π(q) = 1 Z exp(−U (q)) with Z being the normalizing constant.Under Hamiltonian dynamics, U (q) is interpreted as the potential energy at position q.Further, let ∇U (q) denote the gradient vector of U (q), with respect to q.In magi, q consists of all the variables to be sampled: The trajectories x(I) and the parameters θ (which also includes the noise levels σ if they are unknown); the function U (•) is the negative log of their joint posterior density, as shown in Equation 3. Next, HMC introduces "momentum" variables p with the same dimension as q, and we define their kinetic energy as K(p) = p ⊤ p/2.The Hamiltonian then combines the kinetic and potential energies, defined as H(q, p) = U (q) + K(p).
With this setup, exp[−H(q, p)] specifies a joint density (up to a multiplicative constant) of q and p, which can be interpreted as follows: (i) q and p are independent random variables; (ii) the marginal distribution of q is the target posterior of interest in Equation 3; (iii) the marginal distribution of p is multivariate standard normal.Intuitively, HMC works on this joint density so that we obtain the samples of interest for q, and the role of p is to facilitate the sampling efficiency.
From a current state for q, one iteration of the HMC algorithm generates the next state for q as follows, where L is a positive integer and ϵ > 0 is a vector of step sizes: 1. Draw p from a standard multivariate normal distribution, then set (q 0 , p 0 ) = (q, p).
(b) Take a full-step for the position by setting q l = q l−1 + ϵ • p.4 (c) Take a half-step for the momentum (using the updated position) by setting p l = p l−1 − (ϵ/2) • ∇U (q l ).
3. The proposed state is (q * , p * ) ≡ (q L , p L ).Accept q * as the next state of q with the usual Metropolis acceptance probability, i.e., min[1, exp(−H(q * , p * ) + H(q, p))].If the proposed state is rejected, the next state of q is set to be the same as its Two main aspects of the HMC algorithm can thus be tuned: The step size vector ϵ, and the number of leapfrog steps L. In magi, these can be supplied by the user in the list of optional inputs to MagiSolver: L is specified via nstepsHmc and a starting factor for ϵ is specified via stepSizeFactor.Two other options in MagiSolver related to MCMC sampling are also worth mentioning: The number of HMC iterations to run is specified via niterHmc, and the proportion of MCMC samples to discard as an initial burn-in period is specified via burninRatio.We discuss each of these and our practical recommendations below.The step-size vector ϵ controls the accuracy of the Hamiltonian approximation: If ϵ is too large, the HMC proposals will have a high rejection rate; while if ϵ is too small, the HMC proposals will move slowly around the target posterior distribution.Thus ϵ needs to be carefully tuned to achieve good HMC performance.As suggested in Neal (2011), the optimal acceptance rate of HMC is 65%; moreover, the tuning of ϵ can be done independently of L. magi handles these aspects of ϵ via automatic tuning during the burn-in period (i.e., the first burninRatio * niterHmc iterations).Briefly, magi uses a moving window of 100 iterations to monitor: (i) the acceptance rate, so that ϵ is increased (or decreased) if the acceptance rate is above 90% (or below 60%); (ii) the SD of each variable in q, so that the individual step sizes in ϵ are adapted to follow the scale of each variable. 5The optional stepSizeFactor input can be a scalar (applied to all the variables in q) or a vector (with the same length as q), that specifies the starting value of ϵ.In our experience, magi's automatic tuning can quickly identify the range of ϵ needed for efficient HMC sampling, so that there is usually no need to manually specify stepSizeFactor.If the acceptance rate of the first many (e.g., thousands) HMC iterations is 0% or 100%, overriding the default stepSizeFactor = 0.01 with a value that is several orders of magnitude smaller (if the acceptance rate is 0%) or larger (if the acceptance rate is 100%) could help speed up convergence.
Since automatic tuning of ϵ is limited to the burn-in period, the default burninRatio = 0.5 usually provides a good balance between samples used for convergence/tuning and samples for final inference.The number of HMC iterations (default niterHmc = 20000) can be set to balance computational time constraints with the Monte Carlo variance of the resulting estimates.To monitor sampling progress when running MagiSolver, setting verbose = TRUE will print a message to the console every 100 HMC iterations.
In magi, the number of leapfrog steps L is fixed for all HMC iterations.The default value of nstepsHmc is 200 and should work well in many cases. 6We recommend using traceplots (which can be generated via the convenience plot() function on the output of MagiSolver) to visually diagnose whether a larger L might be needed.The overall cost per HMC iteration is roughly proportional to L, so L should only be increased if necessary.As the discretization set I is taken to be increasingly dense (see Section 4.1), the increased dimensionality of q may lead to "sticky" HMC samples with high autocorrelation.To illustrate, let us revisit the FN dataset and run MagiSolver on y_I3 (which has 321 discretization points per component) with the default L = 200: R> data("FNdat", package = "magi") R> set.seed(12321)R> y_I0 <-setDiscretization(FNdat, by = 0.5) R> y_I3 <-setDiscretization(y_I0, level = 3) R> FNres3b <-MagiSolver(y_I3, fnmodel, control = list(niterHmc = 10000)) Then, we examine the traceplots of the parameters, shown in Figure 10: R> plot(FNres3b, type = "trace", par.names= FNpar.names,sigma = TRUE) We see that the MCMC samples exhibit some non-stationary patterns, rather than appearing as random scatters for each parameter.This is most evident for parameters b and c, where the Markov chain can remain in one region of their distribution for several hundred iterations at a time (hence "sticky").This issue can usually be alleviated by increasing L, for example by running MagiSolver with nstepsHmc = 1000 as in Section 4.1, then the re-drawn traceplots can be seen to indicate good convergence (not shown for brevity).

Benchmark comparisons with other methods
As noted in the introduction, unobserved system components pose a challenge to most other methods of ODE inference and their software implementations.In this section, we take the Hes1 example where component H is unobserved (as presented in the introduction and Section 3), and compare the inference accuracy and run time of different software packages that can handle this problem in R.
Alongside magi, we consider the deBInfer and CollocInfer packages (see Section 1.2).The deBInfer package represents a Bayesian approach to parameter estimation with the help of numerical ODE solvers, and hence is generally applicable to systems with unobserved components.The CollocInfer package is a collocation-based penalized likelihood method that uses a B-spline basis.While CollocInfer (along with pCODE, which is based on the same underlying methodology) can be used to perform inference with an unobserved component, an estimate of the B-spline basis must be supplied by the user, as we detail below.The other R packages described in Section 1.2 do not have the capacity for unobserved components.
To generate simulated datasets for this comparison, we follow the same procedure described in Section 1.1, with 100 different random seeds.Since deBInfer and CollocInfer do not directly provide estimates of the inferred trajectories, for fair comparison we use each method to obtain estimates of the parameters θ = (a, b, c, d, e, f, g) and initial conditions P (0), M (0), H(0).Using these estimated parameters and initial conditions, we run the numerical solver to reconstruct the trajectory implied by the estimates.For a given method, we then compute the RMSE between this reconstructed trajectory and the true trajectory (i.e., the solid curves in Figure 1) for each system component, at the 33 observation time points.We call this the trajectory RMSE metric, as in Yang et al. (2021).Thus, on each simulated dataset, the three methods are run as follows to obtain estimates of θ and P (0), M (0), H(0): • magi is run as described in Section 3, which infers the parameters and system trajectories.The posterior means of θ and P (0), M (0), H(0) from the MagiSolver output are taken as the estimates.• For deBInfer, we set up a normal likelihood for the observations of P and M on the log-scale, with known SD from the simulation setup.The priors for θ were set to be uniform over restricted ranges: a, b, c, d, e are taken to be uniform on [0, 2], f is uniform on [0, 100], and g is uniform on [0, 10].Without imposing these informative priors on θ (i.e., which contrast with the uniform priors over all positive real numbers, as used in magi), the method would often fail to converge at a reasonable result.The priors for the initial conditions were uniform on the log-scale.We ran 20000 MCMC iterations, to match the number of iterations run in magi.The posterior means of θ and P (0), M (0), H(0) from the de_mcmc output are taken as the estimates, after discarding the first 10000 iterations as burn-in.
• For CollocInfer, the method begins by computing initial estimates of the B-spline basis.However, the package does not have the ability to compute such estimates when there are unobserved system components.Therefore, manual input is needed in this case to supply these estimates, which we do as follows.First, we use the package functions to fit the B-spline basis given the true values of all the system components at the 33 observation time points.In real data analyses, such true values would not be available, so CollocInfer is given an additional advantage by taking this approach.Second, we take the B-spline fit for the unobserved component obtained from the first step, together with the actual noisy observations of P and M , and run the main CollocInfer method with its default settings to obtain the final estimates of θ and P (0), M (0), H(0).
The full code to benchmark each method is provided in the replication materials.
The results of the three methods over the 100 simulated datasets are summarized in Table 1.
For each method, the average run time and trajectory RMSE of each system component is shown.For the unobserved H component, magi can reliably recover its trajectory, while deBInfer and CollocInfer cannot, as indicated by the large RMSEs.For the protein (P ) and mRNA (M ) components, where observations are available, the results are relatively close, though magi still outperforms the other two methods.In terms of run time, magi is the fastest of the three methods, averaging 6.2 minutes per dataset.CollocInfer, as a frequentistbased optimization method, is also relatively fast; deBInfer, which relies on the numerical solver at each iteration to compute the likelihood, was significantly slower than the other two methods.All of the computations were carried out using a single CPU core of an Intel Xeon 3.7 GHz processor.Overall, these results illustrate the favorable performance of magi on this inference problem.

Conclusion and discussion
The inference problem for dynamic systems is a vital task in science and engineering, which widely use ODE models.In practice, the experimental data collected from these systems may often be noisy and sparse.Furthermore, some components of the system may be entirely unobserved.These features pose challenges for estimating the unknown parameters and reconstructing the system trajectories without numerical integration.Existing software packages for this task, to the best of our knowledge, cannot readily handle unobserved components without substantial manual input.This paper presented our package for the MAGI method (Yang et al. 2021), which capably handles these inference problems in a principled Bayesian framework using manifold-constrained Gaussian processes.The user may choose any of R, MATLAB, and Python to carry out the analyses.Scripts that demonstrate the equivalent functionality in all three environments are included in the replication materials.
We believe the methodological approach of magi is quite extensible.We discuss some interesting directions for future developments to the software package in the following.
• Extending the inference framework to allow for time-varying parameters.The current implementation of magi assumes time-constant parameters θ, which covers a broad range of dynamic systems.In some cases, a more flexible time-varying specification θ(t) is needed, e.g., for pharmacokinetic parameters (Li, Brown, Lee, and Gupta 2002) and disease transmission rates (Keller, Zhou, Kaplan, Anderson, and Zhou 2022).GP priors for θ(t) could potentially be incorporated into the magi framework as well to handle this situation.
• Incorporating more flexible choices for the measurement error model and priors for the parameters.The magi package currently assumes the noise term can be adequately modeled as additive and Gaussian.Multiplicative noise can be handled by taking a log-transformation, as demonstrated in the Hes1 example.If the measurement noise is significantly non-Gaussian, allowing a custom specification for the likelihood p(y(τ ) | x(I)) could be a useful feature.Further, the priors for θ are assumed to be flat (uniform) between the user-provided bounds thetaLowerBound and thetaUpperBound.While this assumption may often be adequate (as transformations may also be applied to parameters as needed), custom priors π(θ) could allow more specific prior knowledge of the parameters to be incorporated into the inference.
• Functions to help set up the ODE gradients with respect to θ and X. Supplying the analytic gradients ∂f (X, θ, t)/∂X and ∂f (X, θ, t)/∂θ enables magi to draw MCMC samples efficiently via its HMC implementation for the target posterior density.The ability to easily generate code for these gradients from the ODEs (e.g., with the help of symbolic differentiation) or have them automatically computed (i.e., automatic differentiation or autograd) could potentially reduce the time needed to set up a dynamic system in magi.
In R, support for autograd is currently experimental and computationally inefficient compared with supplying the analytic gradients; a proof-of-concept that uses autograd with magi is provided in Appendix E for the interested reader.

A. magi usage in MATLAB
To use magi in MATLAB, first clone the repository at https://github.com/wongswk/magi,e.g., by executing on the command-line git clone https://github.com/wongswk/magi.On Linux-compatible systems, build the core C++ magi library by running the build.shshell script, then run the MATLAB_build.shshell script in the MATLABmagi directory to generate the compiled MEX files.For Windows systems, pre-compiled versions of the magi library and MEX files are provided in the MATLABmagi/windows directory.Ensure that libcmagi.so(or libcmagi.dll in Windows), along with the accompanying .mroutines and compiled MEX files of the package, are located in either the working directory or the path of MATLAB.
The replication package (supplementary .zipfile provided with this article) includes the MAT-LAB directory which contains the materials for running the three examples discussed in this paper in MATLAB.Specifically: • The models/ subdirectory contains the functions for the ODE systems, since the MAT-LAB convention is one function per file.For example, for the FitzHugh-Nagumo (FN) equations, -fnmodelODE.mcodes the ODEs, -fnmodelDx.m codes their gradients with respect to X -fnmodelDtheta.mcodes their gradients with respect to θ -fnmodelODEsolve.m codes the system in a form suitable for invoking ODE solvers such as ode45.
• replication.m is the MATLAB replication script that carries out the same analyses described in the main paper.Please note that the random seeds between R and MATLAB are not interchangeable so their corresponding numerical results are expected to have slight differences attributable to random-number generation.Otherwise, the functionalities of the R and MATLAB replication scripts are identical, so that users can easily follow the equivalent syntax in MATLAB for the examples given in this paper.
We briefly point out two main differences of note between the syntax in R and MATLAB: • In MATLAB, struct arrays are used in place of 'list' objects in R. Taking the FN equations as an example, to set up odeModel for input to MagiSolver we create a struct with the five required elements, where fOde, fOdeDx, and fOdeDtheta are assigned their corresponding function handles: fnmodel.fOde = @fnmodelODE; fnmodel.fOdeDx= @fnmodelDx; fnmodel.fOdeDtheta= @fnmodelDtheta; fnmodel.thetaLowerBound= [0 0 0]; fnmodel.thetaUpperBound= [Inf Inf Inf]; Similarly, the control list is set up by creating a struct in MATLAB, e.g., config.niterHmc= 10000; to run 10000 HMC iterations, and then config can be passed as the control argument to MagiSolver.
The output of MagiSolver is also a struct; e.g., if FNres0 contains the output of a MagiSolver run, the matrix of MCMC samples for θ would be accessed as FNres0.theta.
The convenience functions summaryMagiOutput() and plotMagiOutput() are provided (analogous to summary() and plot() in R) to generate a table of parameter estimates (with credible intervals) and visualize the inferred trajectories from the output of MagiSolver.
• In MATLAB, optional function arguments need to be explicitly skipped by passing [].
For example, MagiSolver allows the time vector to be passed as a separate argument from the data matrix y.When the time vector is included as the first column in y, we would skip the third argument as follows: FNres0 = MagiSolver(y, fnmodel, [], config);

B. magi usage in Python
To use magi in Python, first clone the repository at https://github.com/wongswk/magi,e.g., by executing on the command-line git clone https://github.com/wongswk/magi.Build the core C++ magi library by running the build.shshell script, then run the py_build.shshell script in the pymagi directory to build the pymagi.soPython library.Ensure that this pymagi directory is contained in Python's path.
The replication package (supplementary .zipfile provided with this article) includes the python directory which contains the Python script replication.pythat carries out the same analyses for the three examples discussed in the main paper.Please note that the random seeds between R and Python are not interchangeable so their corresponding numerical results are expected to have slight differences attributable to random-number generation.Otherwise, the functionalities of the R and Python replication scripts are identical, so that users can easily follow the equivalent syntax in Python.
We briefly point out two main differences of note between the syntax in R and Python: • In Python, we construct the odeModel input to MagiSolver by using the helper function ode_system, rather than setting up an R 'list'.Taking the FN equations as an example, suppose fnmodelOde, fnmodelDx, and fnmodelDtheta respectively are the functions for the ODEs, gradients with respect to X, and gradients with respect to θ.Then we can call ode_system as follows: fn_system = ode_system("FN-python", fnmodelOde, fnmodelDx, fnmodelDtheta, thetaLowerBound = np.array([0,0, 0]), thetaUpperBound = np.array([np.inf, np.inf, np.inf])) where the first argument can be any string that provides a name for the system.
• In Python, the dictionary data type (dict) is used in place of the R 'list' for the control argument to MagiSolver.For example, with fn_system as defined above, we can provide the control argument in the call to MagiSolver as follows: FNres3 = MagiSolver(y = y_I3, odeModel = fn_system, control = dict(niterHmc = 10000, nstepsHmc = 1000)) The output of MagiSolver is also a dict; e.g., if FNres0 contains the output of a MagiSolver run, the matrix of MCMC samples for θ would be accessed as The convenience functions summaryMagiOutput() and plotMagiOutput() are provided (analogous to summary() and plot() in R) to generate a table of parameter estimates (with credible intervals) and visualize the inferred trajectories from the output of MagiSolver.

R>
Note that the original R array implementation hes1logmodelODE must still be passed to the MagiSolver function, as magi does not currently support direct use of torch tensors.A complete R script that demonstrates this approach of using magi with autograd is provided in the replication package.
Although autograd offers a convenient method for calculating derivative information, its computational speed is slower than hand-coded analytical gradients.For optimal performance, we recommend using hand-coded analytical gradients, as discussed in the main text of this paper.

F. Other covariance functions available in magi
As discussed in Section 4.2, the default and recommended GP covariance function for use in magi is the Matern (Equation 5) with ν = 2.01.Several other covariance kernels are also available in the package, which include some of the common choices discussed in Chapter 4 of Williams and Rasmussen (2006).Their specification and features are presented below.In each case, r is the absolute difference between two time points and ϕ are the hyper-parameters for the kernel; larger values of ϕ 1 favor curves with higher variance, and larger values of ϕ 2 favor curves with more time-dependence between nearby time points.They may be selected for use in gpsmoothing and MagiSolver by specifying their corresponding kerneltype string.
Figure1: True system trajectories (solid curves) and sample noisy observations (points) from the Hes1 system.The H component is never observed.In Section 3, we demonstrate how magi recovers the system trajectories and parameters from this sample dataset of noisy observations.

Figure 2 :
Figure2: Example visualization of the ODE manifold constraint.In panel (a), the GP is conditioned on four noisy observations (solid black dots).The colored curves are five random trajectories drawn from the resulting GP posterior, and the gray shaded area represents the 95% credible interval.In panel (b), the GP is conditioned on the same four noisy observations and also the ODE manifold constraint.The colored curves are five random trajectories drawn from the resulting GP posterior, and the gray shaded area represents the 95% credible interval with the manifold constraint.

Figure 3 :
Figure 3: MCMC traceplots for the seven parameters of the Hes1 system and the log-posterior values.The horizontal lines in the plots indicate the posterior mean (red) and limits of the 95% credible interval (green) for each parameter.

Figure 4 :Figure 5 :
Figure4: Inferred trajectories from magi for the three components of the Hes1 system on the log-scale (green curves), generated using the plot() method.The blue shaded areas represent 95% credible intervals.The asynchronous noisy observations of P and M are plotted as black circles.
associated with I, i.e., the GP prior p(x(I)) and p( Ẋ(I) = f (x(I), θ, t I ) | x(I)), would become larger relative to the likelihood of the observations.This is because the likelihood term in Equation 3 simplifies as p(y(τ ) | x(I)) = p(y(τ ) | x(τ )) for any τ ⊂ I and does not change as I becomes more dense, i.e., only the points in I corresponding to observations have an associated likelihood contribution.Therefore, magi automatically uses a tempering hyper-parameter β to maintain the balance between the GP prior and the likelihood across different discretization sets.This helps ensure that parameter inference reaches a stable result over an increasing sequence of discretization sets.Specifically, p(x(I))p( Ẋ(I) = f (x(I), θ, t I ) | x(I)) is tempered as p(x(I))p( Ẋ(I) = f (x(I), θ, t I ) | x(I))

Figure 6 :
Figure 6: Posterior means (points) and 95% credible intervals (vertical bars) for each parameter, from using MagiSolver on the simulated FN dataset with the discretization sets I 0 ⊂ I 1 ⊂ I 2 ⊂ I 3 .

Figure 7 :
Figure 7: Reconstructed system trajectories (solid curves) based on the estimated parameters and initial conditions from the simulated FN dataset using MagiSolver with the discretization sets I 0 ⊂ I 1 ⊂ I 2 ⊂ I 3 .The noisy observations (points) are also plotted.
inferred trajectories (green curves) for the HIV model with 95% credible bands (light blue).The true trajectories are superimposed in red.Note that the credible bands are very narrow, and are only visible for the T I component.

Figure 8 :
Figure 8: Hyperparameter setup and inference for a sample dataset simulated from the HIV model.Panel (a) provides a visual assessment of the GP hyper-parameters and their initial fit to the data.Panel (b) shows the final inferred trajectories from magi, which closely follow the true trajectories.

Figure 9 :
Figure 9: Effect of hyper-parameter values ϕ on the posterior means (points) and 95% credible intervals (vertical bars) for each parameter on the HIV simulated dataset.Each bar represents one random set of ϕ values.The upper and lower limits of the 95% credible intervals inferred using the original values of ϕ are shown via the dashed horizontal lines.

Figure 10 :
Figure 10: MCMC traceplots that indicate relatively high autocorrelation, based on an example run for the FN system with a dense discretization set and 200 leapfrog steps per HMC iteration.The horizontal lines in the plots indicate the posterior mean (red) and limits of the 95% credible interval (green) for each parameter.
dotmu: A numeric matrix with dimension |I|×D, specifies values for the derivatives of the GP prior mean function for each component.Default is zero.

Table 1 :
Performance of the three compared methods over 100 simulated datasets from the Hes1 model.For each method, the mean run time and trajectory RMSE of each component is reported.The H component is never observed.