Dynamic Modeling, Parameter Estimation and Uncertainty Analysis in R

In a wide variety of research ﬁelds, dynamic modeling is employed as an instrument to learn and understand complex systems. The diﬀerential equations involved in this process are usually non-linear and depend on many parameters whose values decide upon the characteristics of the emergent system. The inverse problem, i.e. the inference or estimation of parameter values from observed data, is of interest from two points of view. First, the existence point of view, dealing with the question whether the system is able to reproduce the observed dynamics for any parameter values. Second, the identiﬁability point of view, investigating invariance of the prediction under change of parameter values, as well as the quantiﬁcation of parameter uncertainty. In this paper, we present the R package dMod providing a framework for dealing with the inverse problem in dynamic systems. The particularity of the approach taken by dMod is to provide and propagate accurate derivatives computed from symbolic expressions wherever possible. This derivative information highly supports the convergence of optimization routines and enhances their numerical stability, a requirement for the applicability of soﬁsticated uncertainty analysis methods. Computational eﬃciency is achieved by automatic generation and execution of C code. The framework is object oriented (S3) and provides a variety of functions to set up dynamic models, observation functions and parameter transformations for multi-conditional parameter estimation. The key elements of the framework and the methodology implemented in dMod are highlighted by an application on a three-compartment transporter model.


Introduction
Dynamic models are found in several research fields, such as physics, biology or finance.In all these fields, models link theoretical concepts and empirical evidence.Single mechanisms or single processes of a complex system are represented by respective terms in the equations of a dynamic model.Parameter estimation can then identify those processes which are crucial to explain the observation.In that sense, parameter estimation can be employed as an instrument to understand complex systems.Once, the link between observation and model is established by the estimated parameters, questions about their identifiability arise.The parameter space needs to be explored to analyze whether the estimate is unique and to determine confidence bounds.
Although the problem of parameter estimation in non-linear dynamic systems is highly relevant and at the heart of statistical computing, to this day there are not more than four R packages published on the topic on the Comprehensive R Archive Network (CRAN), namely FME (Soetaert and Petzoldt 2010), nlmeODE (Tornoe 2012), mkin (Ranke, Lindenberger, and Lehmann 2016) and scaRabee (Bihorel 2014).All packages have in common that they are built upon the deSolve package (Soetaert, Petzoldt, and Setzer 2010).The packages mkin and FME support ODEs defined by compiled code where mkin also provides tools to autogenerate the C code and compile it using the inline package.For model fitting and uncertainty analysis, mkin fully resorts to the functionality of FME.Concerning model fitting, FME and nlmeODE (with nlme in the background) use deterministic derivative-based optimizers by default, i.e. either Levenberg-Marquardt or Newton methods provided by nls.lm(), optim() or nlmimb().Although all these optimizers support gradient or Hessian information as input, only the nlmeODE package provides an option to augment the ODE by its sensitivity equations to generate derivates for the residuals.By default, sensitivities are computed by finite differences.Last but not least, the scaRabee package uses the Nelder-Mead optimization algorithm which is derivative-free but needs in general more iterations until convergence compared to derivative-based methods.In all packages, uncertainty analysis is by default based on the variance-covariance matrix, i.e. on the inverse Hessian matrix of the least squares function.For non-linear models, this method provides a good approximation only if parameters are identifiable and the data is highly informative.If these conditions are not met, more sophisticated methods like e.g.Markov chain Monte Carlo (MCMC) sampling, implemented in FME, are required.The strength of scaRabee and especially nlmeODE is multi-conditional fitting.This means that the same model with the same parameters but different forcings to reflect experimental conditions is fitted simultaneously to the condition-specific data sets.In the context of mixed-effects modeling as provided by nlmeODE, parameters can be grouped in fixed effects (same parameters over all conditions) and random effects (parameters are different between conditions).
In this paper we present dMod, an R package on dynamic modeling and parameter estimation.The aim and core functionality of dMod is (i) facilitated set-up of dynamic models with automated C code generation for fast simulation of model predictions and model sensitivities, (ii) flexible set-up of general parameter transformations (explicit or implicit) and observation functions, allowing for the implementation of multiple experimental conditions similar to mixed-effects modeling, (iii) parameter estimation based on trust-region optimization of the negative log-likelihood, making use of the sensitivity equations of the dynamic system and of symbolic derivatives of the observation-and parameter transformation functions, (iv) identifiability and uncertainty analysis based on the profile-likelihood method to determine confidence intervals for parameters and predictions.
The core functionality is extended by two symbolic methods implemented in Python and interfaced via the rPython package: identifiability and observability analysis based on Lie-group symmetries (Merkt, Timmer, and Kaschek 2015) and steady-state constraints for parameter estimation (Rosenblatt, Timmer, and Kaschek 2016).
The consequent implementation of capabilities to generate and propagate derivatives on a compiled-code level distinguishes dMod from other modeling frameworks as discussed above.
Most of the standard optimization routines implemented in R need derivative information.However, the computation of derivatives in the context of ODE models holds some pitfalls.
The accuracy of sensitivities obtained from finite differences can be insufficient because the step control of the integrator presents an additional source of numeric inaccuracy.Even for numeric methods circumventing this problem, e.g.complex-step derivatives (Squire and Trapp 1998), the use of sensitivity equations is still beneficial judging from the accuracy vs. computational cost ratio.See (Raue, Schilling, Bachmann, Matteson, Schelker, Kaschek, Hug, Kreutz, Harms, Theis, Klingmüller, and Timmer 2013b) for a comparison of methods.
Another distinguishing feature of dMod is the handling of non-identifiability of parameters, a phenomenon occurring frequently in the context of parameter estimation in dynamic systems.In some cases, non-identifiability has structural reasons.The differential equations bear certain symmetries and these can or cannot be broken, depending on the structure of the observation.A functional relationship between parameters that leaves the observation invariant is the consequence of the latter.In other cases, the data allows to determine a unique optimum but other solutions, although worse, cannot be statistically rejected.The dMod package deals with parameter identifiability and parameter uncertainty by the profilelikelihood method (Murphy and Van der Vaart 2000;Raue, Kreutz, Maiwald, Bachmann, Schilling, Klingmüller, and Timmer 2009;Kreutz, Raue, Kaschek, and Timmer 2013).The method has proven especially useful in the case of non-identifiable parameters where results obtained from both, the quadratic approximation by the variance-covariance matrix and MCMC sampling can be misleading (Raue, Kreutz, Theis, and Timmer 2013a).Besides parameter uncertainties, the profile likelihood method allows to estimate prediction uncertainty (Kreutz, Raue, and Timmer 2012).It therefore supports the planning of new informative experiments (Raue, Kreutz, Maiwald, Klingmüller, and Timmer 2011) or suggests possible model reductions (Maiwald, Hass, Steiert, Vanlier, Engesser, Raue, Kipkeew, Bock, Kaschek, Kreutz, and Timmer 2016).
The key methods implemented in dMod are illustrated in great detail on a dynamic model of bile acid flow.The example is a show-case of how modeling is a dynamic process, using the analysis tools implemented in dMod to predict, plan new experiments, combine data from different experiments and include non-linear parameter constraints to improve parameter identifiability.Thereby, we show that dMod is a full-grown, flexible modeling environment, fast, thanks to compiled code, reliable, thanks to symbolic derivatives, and accurate, thanks to advanced statistical methods.
The paper is organized as follows: Section 2 introduces the mathematical set-up of dynamic modeling, symmetries in dynamic systems, parameter estimation by the maximum-likelihood method and the profile likelihood.Section 3 discusses the implementation and design principles behind the dMod software.The functionality of dMod is presented in Section 4 on the example of bile acid flow in a three-compartment model and finally, Section 5 discusses the two Python extensions shipped with dMod.
The dMod package is available on the Comprehensive R Archive Network at http://CRAN.R-project.org/package=dMod.The project is hosted on GitHub, https://github.com/dkaschek/dMod, where the most recent development version is available.dMod is licensed under the GPL-3 license.

Dynamic models and model sensitivities
Dynamic models describe systems with states x, usually quantifying the involved species, their interaction and evolution over time.The time evolution of the states is expressed via a set of ODEs, ẋ = f (x).Although constituting quite a special class of dynamic systems, chemical reaction networks formulated by the law of mass action as considered here allow for surprisingly general applications.Typical ODE examples derived from the law of mass action are: each generally written as where X 1 , . . ., X n are the n species involved, ν and ν encode the stoichiometry of the reaction and k 0 is the reaction rate.When several reactions are involved, eq. ( 1) generalizes to with the stoichiometry matrix S obtained from the collection of vectors ν − ν for reactions 1, . . ., r, and the flux vector v(x, k) ∈ R r .Eq. ( 2) also holds for general ODEs that do not follow from mass action kinetics.Moreover the system may be explicitly time-dependent and may contain forcings u(t) which e.g.describe the external stimulation of the system.A general form of the dynamic model is therefore given by ẋ The solution x(t, θ) for a given parameter vector θ = (k, x 0 ) is called model prediction.It is generally assumed that the stoichiometry matrix S of the system is known.Besides the model predition itself, also the sensitivity of the prediction to changes in the parameter values is of interest.The sensitivities s i = ∂x ∂θ i satisfy the sensitivity equations a system of ordinary differential equations that in general depend on the states x and, therefore, need to be solved jointly with the original ODE.
Often, the experimentally observed quantities y do not directly correspond to the species x described by the ODE, but are obtained via an observation function g : with the observation parameters c.Examples for observation functions are scaling and offset transformations, y = c s •x+c o , or the measurement of a superposition of species, y i = j c j x j .The model prediction for the observed states is obtained by evaluating the observation function on the solution of the ODE.Following the chain rule of differentation also the sensitivities of the observed states ∂y ∂θ i = ∂g ∂x ∂x ∂θ i + ∂g ∂θ i are obtained.Here, the parameter vector θ has been augmented by the observation parameters, θ = (k, x 0 , c).
The estimation of the parameters θ given the observation y(t) is addressed in the next section.

Maximum-likelihood method
Parameter estimation is a common task in statistics.It describes the process of inferring parameter values or parameter ranges of a statistical model based on observed data.Over decades, appropriate estimators have been developed for different problem classes.The principle of maximum likelihood allows to derive an estimator which is especially suited for applications where the distribution of the measurement noise is known.This knowledge about the structure of the noise constitutes additional information that makes the maximum-likelihood estimator (MLE) efficient, i.e. from all unbiased estimators the MLE has the lowest variance.Other properties of maximum-likelihood estimation are consistency, i.e. the estimated parameter value approaches the true parameter value in the limit of infinite data-sample size, and asymptotic normality, i.e. in the limit of infinite data the MLE follows a multivariate normal distribution.See (Azzalini 1996) for an introduction to likelihood theory.
Maximum-likelihood estimation is based on the maximization of the likelihood function L(θ) = φ(y D |θ).Here, φ is the joint probability density for a vector of observations y, c.f. eq. ( 5), evaluated at the point y = y D , the vector of data points.The distribution φ depends parametrically on model parameters θ.The maximum-likelihood estimator θ is defined as meaning that θ is an extremum estimator.Depending on the model class and the probability distribution φ, maximization of the likelihood can be a challenge beyond the scope of analytical methods.Numerical optimization methods help to solve the maximization problem in practical applications.
The easiest and one of the most frequent situations is when φ follows a normal distribution, φ ∼ N (y(θ), Σ).When measurements are statistically independent, the variance-covariance matrix Σ = diag(σ 2 1 , . . ., σ 2 N ) is diagonal and φ factorizes.The likelihood function reads Taking twice the negative logarithm, eq. ( 6) turns into thus converting the maximization of L(θ) to a minimization of l(θ).Assuming that the data uncertainties σ i are known, eq. ( 7) is the weighted least-squares function shifted by a constant.For unknown σ i , eq. ( 7) can be optimized with respect to θ and σ jointly, yielding the MLE θ = ( θ, σ).

Non-linear optimization
Numerical optimization is a diverse field with as many algorithms as there are optimization problems around.For our application, derivative-based methods, and in particular the Newton method, play a major role.
Optimization by the Newton method attempts to iteratively find the root of the gradient ∇l(θ) by the recursion where Hl(θ) denotes the Hessian, i.e. the matrix of second derivatives of l(θ).The fact that for normally distributed noise the log-likelihood is a least squares function, has a big advantage: gradient and (approximate) Hessian can be computed from first-order derivatives (Press, Teukolsky, Vetterling, and Flannery 1996).Since the second-order contributions to the Hessian can be neglected, gradient and Hessian read with the weighted residual vector r i (θ) = and its first derivative, the Jacobian matrix As indicated by eqs.( 9)-( 10), first order model sensitivities ∂y i ∂θ j are sufficient to compute gradient and Hessian in good approximation.
The Newton recursion, eq. ( 8), converges in one iteration if l(θ) is a quadratic form.However, l(θ) is only quadratic if the model y(θ) is linear in θ which is usually not the case for ODE models.On the other hand, each smooth function, no matter if quadratic or not, can be approximated by a quadratic function based on its Taylor series.Thus, l(θ) can be approximated by a quadratic function in at least a small region around θ.The idea of trust-region optimization is to confine a Newton step to a region, the trust region, where the quadratic approximation holds (Wright and Nocedal 1999).In each iteration, the trust-region radius is adjusted and the optimization problem restricted to the trust region is solved.For parameter estimation in ODE models this has the additional advantage that parameter changes are constricted and optimization steps into parameter regions where the ODE cannot be numerically solved any more become less frequent.This make trust-region optimization the method of choice for ODEs.

Uncertainty analysis
Parameter estimates θ are obtained by non-linear optimization of the log-likelihood function l(θ).Although being a function of the parameters, the log-likelihood depends on the data, too.Consequently, "equivalent" random realizations of the data would lead to different parameter estimates θ.Given θ for one random realization of the data, the question is, at which significance level we can reject other parameters θ.This leads to the related question of parameter convidence intervals.
A useful tool to derive confidence intervals beyond the scope of Fisher Information Matrix is the profile likelihood.Consider a parameter of interest, θ i , being one of the parameters in the vector θ.Furthermore, let c τ (θ) = θ i − τ = 0 be a parameter constraint.Then, extending the likelihood with the constraint via a Lagrange multiplier λ yields which is called the profile likelihood of the i th parameter.The path τ → ( θτ , λτ ) along the minimizing parameters is called the profile likelihood path.Hence, the profile likelihood pl i returns the minimal log-likelihood under the constraint θ i != τ .By construction, the inequality D i := pl i (τ )−pl i ( θi ) ≥ 0 holds for all τ , assuming equality at least for τ = θi .Interpreting the unconstrained model as the null model, which we assume to be true, and that with θ i fixed to τ as the alternative model, the value D is twice the log of the likelihood-ratio between those.Hence, a likelihood-ratio test can be performed to accept or reject the alternative model at a given confidence level.To compute the corresponding threshold, we assume Wilks' theorem according to which the thresholds are the quantiles of the χ 2 distribution with one degree of freedom.Consequently, the 68%/90%/95%-confidence intervals of θ i are those values of τ for which D i (τ ) ≤ 1, 2.71 and 3.84, respectively.
The reason to formulate the profile likelihood by Lagrangian multipliers is that we can directly use eq.( 11) to derive the profile likelihood path ( θτ , λτ ) with respect to τ : Eq. ( 12) forms the basis of what is implemented in dMod for computing the profile likelihood.
The profile likelihood approach can be applied to compute confidence intervals for model predictions, too, see (Kreutz et al. 2012;Hass, Kreutz, Timmer, and Kaschek 2016).

Implementation and design principles
The guiding idea behind the implementation of dMod is to provide a class structure for models, predictions, observations and parameter transformations that allows a flexible combination of many experimental conditions in one objective function to fit models to data.This flexibility is achieved by two concepts: (1) concatenation of functions by the "*" operator and (2) stacking of functions representing different conditions by the "+" operator.The handling and propagation of derivatives is part of the classes and happens in the background.Methods for the generic print(), plot() and summary() functions are implemented for most outputs.

Model formulation
A system of ODEs ẋ = f (x, p, u, t) is represented by a named character vector of symbolic expressions, the right-hand side of the ODE, involving symbols for the states x, the parameters p, the forcings u and the keyword time for t.The names of the vector are the state names.
The class provided by dMod for such objects is the eqnvec class which checks whether the character can be parsed and thus, can be interpreted as an equation.
dMod provides several ways to define the differential equations.An eqnvec of equations can be explicitly formulated, analogously to a c() command.Especially for chemical reactions it can be tedious to keep track of all gain and loss terms.Therefore, dMod provides also the eqnlist class which encodes the ODE by a list of (1) state names, (2) rate expressions, (3) compartment volumes and (4) the stoichiometric matrix.Each reaction flux, c.f. eq. ( 2), occurs only once and the gain and loss is represented by the coefficients in the stoichiometric matrix.dMod supports the read-in of a csv file with the stoichiometric matrix and rate expressions and provides a function addReaction() to construct an eqnlist object step-by-step or add further reactions to an already existing eqnlist.An eqnlist is converted to (differential) equations, i.e. an eqnvec, by as.eqnvec().The conversion does all the bookkeeping of gain and loss terms and volume ratios due to compartment transitions.
dMod makes use of derivatives wherever possible.It is one of the core functionalities of the cOde package upon which dMod is based to augment a system of ODEs by its sensitivity equations.If r parameters are involved and the system consists of n states, the number of equations grows as quick as n 2 + nr.Solving the equations can be considerably accelerated by utilizing compiled code.dMod provides the odemodel class.An object of class odemodel is generated from an eqnlist or eqnvec.In the background, C code for the ODE and the combined system of ODE and sensitivity equations is generated, written to the working directory and compiled.The odemodel object keeps track of the shared objects.It is the basis of a prediction function x(t, p) generated by the Xs() command.

Prediction functions
dMod seeks to stay close to the mathematical formulation, i.e. x <-Xs(myodemodel) will indeed return an R function, an object of class prdfn, which expects arguments times and pars and turns them into a model prediction.The letter "s" in "Xs" refers to sensitivities, i.e. the solution x(t) is returned, the sensitivities ∂x ∂p (t) are returned, too.Besides parameters, the prediction might also depend on forcings u(t) and sudden events, e.g.setting states to 0 at a predefined point in time.Since forcings and events are by default fixed, they are defined together with the prediction function, x <-Xs(odemodel, forcings = myforcings, events = myevents).The definitions of both, forcings and events, are compatible with the way they are defined in deSolve.

Observation functions
An observation function is a function g(x, p obs , t) that evaluates the solution x(t) together with additional (observation) parameters p obs .The function can explicitly depend on time t.Similar to the ODE case, an observation function can be expressed as a character vector with names corresponding to the names of the observables and equations involving symbols for the states, parameters and the keyword time.An observation function is defined as an eqnvec and turned into an R function via Y().
It is one of the fundamental concepts of dMod to allow concatenation of functions via the "*" operator.The mathematical formulation y = g • x, i.e. y(t, p) = (g • x)(t, p) := g(x(t, p), p, t) becomes y = g*x in R. To obtain derivatives ∂y ∂p , the chain rule is applied: ∂y ∂p = ∂g ∂x ∂x ∂p + ∂g ∂p .This means that Y() needs to be informed which symbols are states and parameters to generate the corresponding expressions ∂g ∂x and ∂g ∂p .The observation function g takes care of computing theses derivatives from the symbolic expressions and doing the matrix multiplication with the sensitivities ∂x ∂p from the prediction function.Evaluation of symbolic expressions can become inefficient in R. Therefore, the observation function is usually translated into a C code and compiled.
When observation-and prediction functions are concatenated, the result is a prediction function, e.g.y = g*x is the R function computing values y(t) from the arguments times and pars via evaluation of the ODE and subsequent evaluation of the observation function.Two observation functions can be concatenated, too, again yielding an observation function.

Parameter transformations
Parameter transformations are the key element of dMod to formulate different kinds of constraints and allow the combination of several experimental/modeling conditions in one parameter vector.
In principle, a parameter transformation p = Φ(θ) is a (differentiable) function connecting inner parameters p with outer parameters θ.The rationale behind the distinction of inner and outer parameters is that the vector p usually desribes those parameters defined in the model equations.The outer parameters θ refer to a convenient parameterization by which the model parameters are computed.Examples are a log-transform of the inner parameters, An R function of class parfn is produced by the P() command.Transformations can either be formulated explicitly or implicitly.In the explicit case, the function p = Φ(θ) corresponds to an eqnvec whose names are the names of the inner parameters and entries are equations with symbols for the outer parameters.An implicit transformation has the form f p = Φ(θ), θ = 0.In this case, f is expressed by an eqnvec with equations containing symbols for p and θ and the names of the eqnvec are the symbols for p.
Similar to prediction-and observation functions, parameter functions not only return parameter values but the Jacobian of the transformation, too.Exploiting the chain rule, the derivatives are propagated, allowing to define y = g*x*p which is a function returning y(t, θ) and ∂y ∂θ (t), i.e. g*x*p is a prediction function.
Let g, x, p1 and p2 be an observation-, a prediction-and two parameter transformation functions.Then p1*p2 is a parameter transformation function, x*p1 is a prediction function and g*p1 is an observation function.

Multi-conditional prediction
A set of parameter values, forcings and events capture a certain condition in which we find the modeled system.Manipulating the system, single model parameters, forcings or events need to be changed to account for the manipulation.It is a typical approach in systems inference to systematically perturb small parts of a modeled system to reveal information about the processes.
The aim of dMod is to allow for "simultaneous" predictions under several conditions and compare these predictions to the corresponding experimental data sets to estimate model parameters.Different experimental conditions are typically expressed by that fact that some parameters are different between conditions whereas others are common to all conditions.This situation occurs e.g. if perturbation experiments are performed, affecting only few parts of a system.Mathematically speaking, we want to construct a parameter transformation where i = 1, . . ., n corresponds to the different conditions.If all parameters are shared throughout all conditions, then p = q.If, however, all parameters are distinct, then p = n • q.
Perturbation experiments correspond to a situation where p q.
All symbols from eqnvec1 and eqnvec2 are collected and their union constitutes the symbols of θ.The evaluation p(theta) returns a list of length n (in the example n = 2) with inner parameters.The "+" operator can be applied consecutively to add conditions.
Prediction-and observation functions are generalized to multiple conditions by the "+" operator, too.Mathematically speaking, they become In words, if prediction-or observation functions are defined for different conditions then they expect condition-specific inputs which are evaluated by the matching functions.Examples for condition-specific prediction functions typically involve different forcings or events.Observation functions can e.g.differ between different measurement techniques.In all these cases, the different prediction functions x 1 , . . ., x n or observation functions g 1 , . . ., g n are defined, referencing the condition, and combined by the "+" operator analogously to p.
All commands, P(), Xs(), Y(), etc. can be executed with condition = NULL.In that case, the corresponding returned function is generic and, if called for different conditions, the same identical function is evaluated with the condition-specific input.The other way round, if a function, say, x is defined for several conditions, its prediction can be evaluated only for a subset of conditions by x(times, pars, conditions = myconditions).

The data structure
In dMod different experimental conditions are handled by lists.Parameter transformations, prediction-and observation functions stacked by the "+" operator return list objects.On the other hand, data.framesas they are used for linear modeling, mixed-effects modeling or plotting with ggplot2 are highly convenient to organize the data.The class datalist provides the interface between dMod's list structures and data.frameobjects.A datalist is a list of data frames with identical structure: observable names, time points, measured values and measurement uncertainty.
Objects of class datalist are usually generated by the as.datalist() command from a data.frame.The factor variables in the data frame to be used as generators for the unique condition names can be passed by the split.byargument.The resulting list of data frames has an additional attribute "condition.grid", a translation table between the condition names and the original factor variables which can be used for specification of parameter transformations or augmentation of predictions by descriptive columns.

Objective function
The aim of dMod is parameter estimation.The objective function to be minimized for this purpose is the weighted least squares function which is produced by the command normL2(data, prdfn) where data is a datalist object and prdfn is a prediction function.
The objective function is the final link connecting the chain of parameter transformations, prediction-and observation functions to observations.It collects all derivative information and besides the objective value it also computes gradient and Hessian.The standard optimizer employed within dMod is the trust() optimizer from the trust package.
Objective functions can be added by the "+" operator, meaning the objective values, gradients and Hessians are accumulated by summation.Besides the standard function normL2(), dMod provides several other functions returning objective functions, e.g.constraintL2() or datapointL2().They allow to define parameter priors or treat data points as parameters, as shown in Section 4. Thus, a typical objective function used for parameter estimation could be obj <-normL2(data, g*x*(p1+p2+p3)) + constraintL2(mu, sigma).

Three-compartment model of bile acid transport
The following model is a simplified dynamic model on bile acid transport.Bile acids are produced in the liver.They are necessary for the digestion of fat and oil.In the liver, bile acids are taken up in hepatocytes (liver cells) by specific transporter molecules.Clearance occurs either via canalicular or sinusoidal export, i.e. export to the bile transportation system of bile canaliculi or the intercellular space.
To study bile clearance, experiments with hepatocytes or hepatocyte-derived cell lines are performed in-vitro.Cells in a Petri dish stick together to a monolayer of cells, forming bile canaliculi in between cells.They adhere to the dish and are surrounded by a buffer providing the cells with nutrients.A radioactive label allows to measure the bile acid taurocholic acide (TCA).
For the mathematical description of bile flow, a three-compartment differential equation model is used.TCA is pipetted into the buffer compartment (TCA_buffer) from where it is transported into the cells (TCA_cell).Intracellular TCA is exported back to the buffer or the canalicular compartment (TCA_cana).Finally, canalicular TCA flows back into the buffer compartment.These processes give rise to the differential equations and corresponding flowchart presented in Figure 1.

Observation function and simulated data
In experiments, the three dynamic states, TCA_buffer, TCA_cell and TCA_cana cannot be directly measured.Rather, the radioactivity can only be measured separately for two compartments, namely the buffer and the cellular compartment, where the latter contains cells The scaling factor s translates amounts of TCA into radioactive counts.The observation function is expressed in dMod as follows.
First, observables are defined by an eqnvec object from which an observation function g is generated by the Y() command.The Y() command needs to be informed which of the symbols are variables (dynamic states) or parameters.Conveniently, Y() can parse an eqnvec or eqnlist such as reactions to retrieve this information.
Adding noise to the prediction, data is simulated.In the following, we will simulate the outcome of an efflux experiment.The experiment start with all TCA concentrations in steady state, such as shown in Figure 2 after t = 50.To initiate the efflux, the buffer is replaced by TCA-free buffer, i.e.TCA_buffer = 0.This translates into the following initial parameter values: # Reset parameter values pars["TCA_cell"] <-0.3846154 pars["TCA_cana"] <-0.1538462 pars["TCA_buffer"] <-0 pars["s"] <-1e3 The predicted dynamics of the system's internal and observable states is obtained by evaluation of the concatenated prediction function g • x, formulated as g*x in dMod.The scaling parameter s is set to 1000.
out <-(g*x)(times, pars, conditions = "standard") Since g and x have been generated as generic functions, i.e. condition = NULL, we can assign the output to a condition of our choice, in thas case "standard".The predicted noiseless observation is obtained by considering the observable states only at the time point of observation timesD.
# Simuate data timesD <-c(0.1, 1, 3, 7, 11, 15, 20, 41) datasheet <-subset(wide2long(out), time %in% timesD & name %in% names(observables)) Data uncertainties σ are derived by the Poisson nature of radioactive count experiments, i.e. σ x = √ x.To avoid division by 0, the minimal σ-value is set to 1. Random values are added to the predicted values to simulate observation noise.In the end, the data.frame is converted into a datalist object.datasheet$sigma <-sqrt(datasheet$value + 1) datasheet$value <-rnorm(nrow(datasheet), datasheet$value, datasheet$sigma) data <-as.datalist(datasheet)plot(out, data) Both, the simulated data and the model prediction from which the data is derived are shown in Figure 3.The data reflects a typical time course of an efflux experiment, showing decreasing cellular TCA levels and increasing levels of TCA in the buffer.

Parameter transformation
Parameter transformations play a crucial role in the set-up of dMod.They can have several purposes such as fixing paramer values, implementing parameter bounds, including steadystate constraints or mapping parameters to different conditions.While being conceptually the First, we use parameter transformations to constrain all parameters to be positive or zero because all our parameters are either amounts or rate parameters.The parameter transformation is generated by the P() command taking an eqnvec object.Parameter transformations explicitly state the relation between the inner parameters, i.e. the parameter values that are evaluated within the model, and the outer parameters, i.e. the parameter values provided by the user or by an optimizer.In our case, we imply positivity of inner parameters using the exp() function on outer parameters.The corresponding code reads: p <-P( trafo = eqnvec( TCA_buffer = "0", TCA_cell = "exp(TCA_cell)", TCA_cana = "exp(TCA_cana)", import = "exp(import)", export_sinus = "exp(export_sinus)", export_cana = "exp(export_cana)", reflux = "exp(reflux)", s = "exp(s)" ), condition = "standard" ) outerpars <-getParameters(p) pouter <-structure(rep(-1, length(outerpars)), names = outerpars) plot((g*x*p)(times, pouter), data) The vector outerpars is the collection of all symbols on the right-hand side of trafo.It coincides with names(pars) except for TCA_buffer, which is fixed to a constant expression, here 0, by the transformation.However, the interpretation of the parameters has changed since now their values are on a log-scale.All three functions, the observation function, prediction function and parameter transformation can be concatenated to one new prediction function, g*x*p which takes times values of the outer parameters to predict internal and observable states.The model prediction generated by pouter is shown in Figure 4.

Objective function and model fitting
The objective of model fitting is to find parameter values such that the corresponding model prediction matches the observation.In Figure 4, the graphs of buffer and cellular should match the observation within the error.For normally distributed measurement noise, maximum-likelihood estimation is equivalent to least-squares estimation.A least-squares objective function can be generated by the normL2() command which requires a datalist object, in our case data, and a prediction function, in our case g*x*p.
Frequently, non-identifiable parameters are encountered in non-linear dynamic systems.To prevent the optimizer from selecting extreme parameter values for which the ODE solver aborts, a general quadratic prior can be entailed on all parameters by calling constraintL2().
The objective functions returned by normL2() and constraintL2() are objects of class objfn and can be added by the "+" operator.
The following code illustrates the implementation of the objective function and how it is used with the trust() optimizer from the trust package to obtain a model fit, shown in Figure 5.Besides non-identifiability of parameters, local optima constitute another pit-fall when optimizing non-linear functions.The trust() optimizer employs derivative information and therefore, if starting within a certain region around a local optimum, is very efficient in finding it back.Once an optimum is found, we can be confident that there is no deeper point around.However, to be confident that an optimum is the globally best solution, we might want to scatter starting points for optimization runs all over the parameter space.The dMod package provides the mstrust() function based on trust() to do a multi-start search: out_mstrust <-mstrust(obj, pouter, rinit = 1, rmax = 10, iterlim = 500, sd = 4, Here, we have searched according to p i = p 0 + ∆ p i where p 0 is the center, in our case pouter, ∆ p i ∼ N (0, σ 2 ) is a random parameter vector taken from a normal distribution, in our case σ = 4, and the index i runs from 1 to fits = 50.The mclapply() command from the parallel package is used internally to run fits in parallel, here cores = 4.The result of mstrust() is a list of all returned values of trust().To extract the final objective value, parameter values, convergence information and the number of iterations, as.parframe() is used.The multi-start approach identifies four local optima, see Figure 6, which yield almost the same objective value, Figure 6A.Despite the similar objective value, the optima are not close to each other in parameter space, as being illustrated by Figure 6B, and lead to different predictions, Figure 6C. Figure 6B suggests that the two initial values TCA_cell and TCA_cana are connected in the sense that if one takes a large value, the other takes a small value and vice versa.This is not surprising because the observed cellular TCA amount is the sum of both.A new experiment needs to be designed to distinguish one situation from another.

Working with several conditions
In practice, the canaliculi only form a closed compartment if Ca 2+ /Mg 2+ ions are present in the buffer.Therefore, if the experiment is repeated with Ca 2+ /Mg 2+ -free efflux buffer, the contents of the canaliculi escapes quickly into the buffer compartment.Under this condition, the buffer measurement reflects what was formerly the total TCA content in buffer and canaliculi whereas the cellular measurement reflects what was fomerly the TCA content of the cells.Mathematically, two experimental conditions which differ only by the reflux parameter need to be combined in one objective function.
Like before, we simulate a data set.Then, a parameter transformation for the additional condition is set up and the parameter space is explored by a multi-start fit.
To simulate the new experimental condition, the "reflux" parameter is modified.The new data set is combined with the original data by the "+" operator.datasheet <-subset(wide2long(out), time %in% timesD & name %in% names(observables)) datasheet$sigma <-sqrt(datasheet$value + 1) datasheet$value <-rnorm(nrow(datasheet), datasheet$value, datasheet$sigma) data <-data + as.datalist(datasheet) To add a condition to the parameter transformation function, we use the equations of the standard condition as a template for the "open" condition.Parameter transformation functions for different conditions are combined by the "+" operator.trafo <-summary(p)$standard$equations trafo["reflux"] <-"exp(reflux_open)" p <-p + P(trafo, condition = "open") Both transformations "standard" and "open" now possess the outer parameters reflux and reflux_open.However, the value of reflux is mapped to an inner parameter only by trans-formation "standard".Accordingly, transformation "open" only uses reflux_open.Thus, both transformations return the same values for all but the reflux parameter.The prediction function g*x is generic in the sense that condition = NULL whereas the concatenation g*x*p has the conditions "standard" and "open", evaluating the identical function g*x on two parameter vectors.
We define an updated objective function: outerpars <-getParameters(p) pouter <-structure(rep(-1, length(outerpars)), names = outerpars) obj <-normL2(data, g*x*p) + constraintL2(pouter, sigma = 10) Then we start 50 fits around pouter.The list of fits is simplified to a parframe and by the as.parvec() function, the parameter vector (of the best fit) is extracted from the parframe.The best fit is used to make a prediction which is plotted together with the simulated data.All results are shown in Figure 7.

Parameter uncertainty and identifiability
One might wonder why the optimum is unique as for any choice of the scaling parameter s we find appropriate values of the TCA initial value parameters that give rise to exactly the same prediction of the observables.The reason for the uniqueness is the parameter L 2constraint that we have added to the objective function.Nonetheless, we will see, that the non-identifiability is still visible in the profile likelihood.
The profile likelihood is computed by the profile() command.There are several options to control the step-size and accuracy.For convenience the method option can be used to select between the presets "integrate" and "optimize".

The code
profiles <-profile(obj, bestfit, names(bestfit), limits = c(-5, 5), cores = 4) plotProfile(profiles) plotPaths(profiles, whichPar = "s") gives us the result shown in Figure 8. Computing the profile likelihood, the sum of data contribution, normL2, and prior contribution, constraintL2, are optimized under the constraint of a given parameter value for the profiled parameter.In the optimum, data and prior contribution are evaluated separately giving rise to the dashed and dotted lines in Figure 8A.As we had expected, the data contribution to the initial value parameters TCA_cell, TCA_cana and the scaling parameter s is constantly zero.The parameters are structurally non-identifiable.
Each profile corresponds to a certain path in parameter space.The path for the profile of the non-identifiable scaling parameter s is shown in Figure 8B.It shows a clear coupling of the scaling parameter s and the initial value parameters TCA_cell and TCA_cana: both initial value parameters have to be decreased by the same extent as the scaling parameter is increased to keep the prediction unchanged.
The profiles of the parameters export_sinus and reflux_open exceed the 95% confidence threshold only to one side.Given the data, the export_sinus parameter could equally be −∞ (corresponding to an export rate of 0) without changing the likelihood significantly for the worse.A similar statement holds for the reflux_open parameter which could equally be ∞ meaning that we could assume instantaneous draining of the canaliculi for the "open" condition.The two parameters are practically non-identifiable.
Finally, the parameters import, export_cana and reflux exceed the 95% confidence threshold in both directions meaning that the parameters have finite confidence intervals.However, the confidence intervals are rather large and we might ask if there is further information that  we could use to improve parameter identifiability without generating new data.

Steady-state constraints and implicit transformations
So far we have estimated both initial concentrations, TCA_cell and TCA_cana, independently.However, we know that the efflux experiment was just started after completion of the uptake process.Our system runs into a steady state, the buffer is exchanged and the measurement begins.Hence, we can use the steady-state condition as an additional information for the modeling process.We get all the information about the transformations from the summary() command.The equation for TCA_cana is substituted by our steady-state constraint.By the "+" operator, a new parameter transformation function pSS is iteratively constructed for all conditions.
Alternatively, we want to implement the steady-state constraint by an implicit parameter transformation, as opposed to the explicit transformation shown above.Let ẋ = f (x, p) be our dynamic system.Then, under certain conditions, we find a function g(p) such that f g(p), p = 0 for all p, i.e. x S = g(p) is a steady state of f .The parameter transformation we want to generate is the function p → p, g(p) .Here, the set of outer parameters is the set of the reaction rates p whereas the set of inner parameters contains these rates and the corresponding steady states as initial value parameters.The root of f must be determined numerically to which end multiroot() from the rootSolve package is used.
The following code is a reimplementation of the example above.The condition "open" is special in the sense that the reflux rate is modified compared to the reflux rate by which the steady state is computed.This is solved by an event, flipping a switch variable from 0 to 1 and thereby replacing the rate reflux by reflux_open.
# Set up prediction function with events event.buffer<-data.frame(var= "TCA_buffer", time = 0, value = 0, method = "replace") event.open<-data.frame(var= "switch", time = 0, value = 1, method = "replace") x <-Xs(mymodel, events = event.buffer,condition = "standard") + Xs(mymodel, events = rbind(event.buffer,event.open),condition = "open") Although the observables have not changed compared to the set-up with purely explicit transformations, the observation function must be generated again because the new parameter TCA_tot has appeared.This parameter was not contained in the original reactions.Therefore, we use the states and parameters arguments to explicitly inform the observation function about the states and parameters involved.Otherwise, parameter sensitivities are not propagated correctly.
# Generate objective function outerpars <-getParameters(p) pouter <-structure(rep(-1, length(outerpars)), names = outerpars) obj <-normL2(data, g*x*pSS*p) + constraintL2(pouter, sigma = 10) The same simulated data set has been fitted by the fully explicit and the implicit/explicit model implementations.In both cases the global optimum is unique.Parameter profiles have been computed with both model implementations, shown in Figure 9.In addition, the original profiles without steady-state constraints are plotted.The optima found by all three approaches, no steady-state, analytic steady-state and numeric steady-state, are statistically compatible.The two implementations using the steady-state information show exactly the same profiles.Since one formulation is parameterized by TCA_cell whereas the other is parameterized by TCA_tot, the plot highlights one of the fundamental properties of the profile likelihood: invariance under reparameterization.In comparison to the profiles without steadystate information, the new profiles are narrower, meaning that the parameters have smaller confidence intervals.This was to be expected because we have reduced the dimension of the parameter space by one.parameter s such that both are structurally non-identifiable.The parameters export_sinus and reflux_open are practically non-identifiable since both parameters cannot be constrained to a finite interval with 95% confidence.

Prediction uncertainty and validation profiles
Next, we investigate the possibility to predict cellular amounts of TCA, TCA_cell, despite the non-identifiability of parameters.The amount of TCA_cell certainly depends on the total amount of TCA in the system.This total amount must be fixed in which case the parameters TCA_cell, TCA_cana and s become identifiable.The prediction uncertainty is assessed by a prediction profile which is computed based on a virtual data point for cellular TCA, measured at time point t = 41 under the "standard" condition.The dMod formulation reads as: obj.validation <-datapointL2(name = "TCA_cell", time = 41, value = "d1", sigma = .002,condition = "standard") The uncertainty σ = 0.002 is set to a small value, i.e. below 1% of the prediction value.The datapointL2() command returns an objective function which evaluates the model prediction 1 and computes the least-squares function of the virtual datapoint, returning derivatives for the data-point parameter d1.Its value "d1" is yet to be determined.By fitting the objective function obj together with obj.validation, "d1" equals the value of the TCA_cell at t = 41, as only then its contribution to the objective value is zero.Using the derivative information provided by datapointL2, a prediction profile around d1 is calculated.
profile_prediction <-profile(obj + obj.validation, myfit$argument, "d1", limits = c(-5, 5), fixed = c(TCA_tot = log(1))) The result is shown in Figure 10A.The interpretation of such a prediction profile is, that a measurement yielding a value for the cellular TCA level at time point t = 41 outside of the interval [0.19, 0.21] does not conform to our model with 95% confidence.More precisely, changing the data-point parameter d1, the model is quickly forced to follow the new data point.This is apparent from the "validation" contribution to the total objective value, Figure 10A, since it remains small with respect to the "data" contribution originating from all other data points.Forcing the model prediction to deviate more than 0.01 from the original one, the profile exceeds the 95% confidence threshold providing a confidence interval for the prediction itself.By calculating prediction profiles for several time points, a confidence 1 transformation(s) found: variable : infinitesimal : transformation -----------------------------------------------------#1: Type: scaling TCA_buffer : -TCA_buffer : TCA_buffer*exp(-epsilon) TCA_cana : -TCA_cana : TCA_cana*exp(-epsilon) TCA_cell : -TCA_cell : TCA_cell*exp(-epsilon) s : s : s*exp(epsilon) In agreement with the identifiability analysis by the profile-likelihood method, the parameters s, TCA_buffer, TCA_cana and TCA_cell are found to be non-identifiable due to a scaling symmetry.The corresponding scaling transformation, last column, leaves the observation invariant for any choice of epsilon.The parameter non-identifiability can be resolved choosing one representative from the orbit of the transformation.In our case, the scaling parameter could e.g.be fixed to 1.

Analytical steady-state constraints
While for the the present model, the steady-state could be explicitly calculated by hand, this might be much more challenging for models with a large number of states and parameters.For many of these models, the steadyStates() command outputs an analytical steady-state solution that can be incorporated in the model as an additional parameter transformation.
Based on (Rosenblatt et al. 2016), the steady-state constraint is solved for a combination of state variables and kinetic parameters while positivity of the solution is ensured.As the paper states, the approach outperforms common methods of steady-state implementation with respect to reliability and performance of the optimization process.For our example, the code reads steadyStates(reactions, file = "SS.Rds) yielding the output TCA_cana = TCA_buffer*export_cana*import_baso/(reflux*(export_cana + export_sinus)) TCA_cell = TCA_buffer*import_baso/(export_cana + export_sinus) TCA_buffer = TCA_buffer The solution is stored in an Rds file.After loading the file by readRDS(), the equations can be used when defining the parameter transformation.Squire W, Trapp G (1998)

Figure 1 :
Figure 1: Differential equations and flowchart of the reaction network.Taurocholic acid, TCA, is transported between three compartments by four different processes.(A) Assuming mass-action kinetics, the three dynamic states satisfy a set of coupled differential equations.(B) The equations are visualized in a flowchart.

Figure 2 :
Figure 2: Output of the prediction function.(A) Prediction of the TCA states.(B) Sensitivities of the three TCA states for the rate parameters only.

Figure 3 :
Figure 3: Model prediction of the observable and internal states.Simulated data is shown as dots with error bars.

Figure 4 :
Figure 4: Prediction of internal and observed states.All values of the outer parameters have been set to −1.Simulated data points are shown as dots with error bars.

Figure 5 :
Figure 5: Prediction of internal and observed states after optimization of the objective function.Simulated data points are shown as dots with error bars.

Figure 6 :
Figure 6: Result of multi-start fitting procedure.(A) Fits have been sorted by increasing objective value.Four optima were found with almost identical objective value.(B) The parameter values for different optima are shown in different colors.(C) Each local optimum corresponds to a different model prediction, shown in different colors.The observed states are pracitically undistinguishable although the internal states show different behavior.

Figure 7 :
Figure 7: Result of multi-start fitting procedure with two experimental conditions.(A) Model prediction of the best fit and the simulated data are shown in different colors.(B) Fits have been sorted by increasing objective value.The lowest value clearly separates from the second plateau.(C) Plotting the parameter values for each of the fits reveals that the second plateau consists of two optima.The lowest plateau however corresponds to a unique optimum.

Figure 8 :
Figure 8: Profile likelihood.(A) Profiles of all parameters.Data-and prior contribution to the total objective value are distinguished by line-type.(B) Parameter paths for the scaling parameter s.

Figure 10 :
Figure 10: Validation profile and confidence bands for the model prediction.(A) The profile likelihood for the data point parameter d1 describing TCA_cell at time point t = 41 is shown.(B) Computing the data parameter profile for different time points yields 95% confidence bands on the prediction of TCA_cell.
Parameter profiles for three different model implementations.The profile likelihood around the global optimum for the models without steady-state constraints, explicit steady-state constraints and implicit implementation of steady states is visualized by different colors.To illustrate that explicit (red) and implicit (blue) steady-state implementations yield the same result, the corresponding profiles are highlighted by red plus and blue cross signs, respectively.