Continuous Time Individual-Level Models of Infectious Disease: a Package EpiILMCT

This paper describes the R package EpiILMCT, which allows users to study the spread of infectious disease using continuous time individual level models (ILMs). The package provides tools for simulation from continuous time ILMs that are based on either spatial demographic, contact network, or a combination of both of them, and for the graphical summarization of epidemics. Model fitting is carried out within a Bayesian Markov Chain Monte Carlo (MCMC) framework. The continuous time ILMs can be implemented within either susceptible-infected-removed (SIR) or susceptible-infected-notified-removed (SINR) compartmental frameworks. As infectious disease data is often partially observed, data uncertainties in the form of missing infection times - and in some situations missing removal times - are accounted for using data augmentation techniques. The package is illustrated using both simulated and an experimental data set on the spread of the tomato spotted wilt virus (TSWV) disease.


Introduction
Innovative mathematical and mechanistic approaches to the modelling of infectious diseases are continuing to emerge in the literature. These can be used to understand the spread of disease through a population -whether homogeneous or heterogeneous -and enable researchers to construct predictive models to develop control strategies to disrupt disease transmission. For example, Deardon et al. (2010) introduced a class of discrete time individual-level models (ILMs) which incorporate population heterogeneities by modelling the transmission of disease given various individual-level risk factors. The general framework of ILMs have already been successfully applied to a broad range of epidemic data, eg., the 2001 UK foot-and-mouth outbreak (Deardon et al. 2010;Deeth and Deardon 2016;Malik et al. 2016), tomato spotted wilt virus (TSWV) disease Deardon 2014, 2016), the spread of 1-18-4 genotype of the porcine reproductive and respiratory syndrome in Ontario swine herds (Kwong et al. 2013), and influenza transmission within households in Hong Kong during 2008to 2010(Malik et al. 2014. Equivalent continuous time ILMs which capture the complex interactions between susceptible and infected individuals through spatial and contact networks can also be considered. The inference and fitting of such models is generally considered within a Bayesian framework using Markov chain Monte Carlo (MCMC).
However, infectious disease epidemiologists have previously found it difficult to apply these individual-level models to real life problems. This is due to a dearth of readily available software products. The applicability of the aforesaid continuous time ILMs is implemented in an R (R Core Team 2019) package, EpiILMCT (Almutiry et al. 2020) and is available from Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/ package=EpiILMCT. In this article, we describe the package, EpiILMCT which allows users to simulate and fit epidemic data using distance-and/or network-based models (Bifolchi et al. 2013;Deardon et al. 2010;Jewell et al. 2009), and can also incorporate risk factors associated with both susceptible and infectious individuals. EpiILMCT also uses data augmentation techniques to carry out inference when the infection and/or removal times are unknown or censored, as is usually the case. To the extent of our knowledge, this feature is not available in any existing R packages that permit epidemic data analysis and modelling. Tools for the graphical summarization of epidemic data sets and outcomes are also provided. The statistical inferences made in EpiILMCT are set in a Bayesian framework and are carried out using Markov Chain Monte Carlo (MCMC). The main aim here is to provide a fast implementation of continuous time ILMs under different epidemic modelling frameworks. Because of the computationally intensive nature of MCMC for such models, we have coded functions, including MCMC, in Fortran to speed up computation.
There are several R packages that permit a range of different modelling tools that allow for fitting spatial-temporal epidemic data. For example, the packages splancs (Rowlingson and Diggle 2017), and lgcp (Taylor et al. 2013(Taylor et al. , 2015 provides methods for analyzing epidemic data as spatial and space-time point patterns. Also, the package surveillance (Meyer et al. 2017) implements a spatio-temporal point process model for epidemic data through the function twinstim. Other packages fit a range of autocorrelation regression spatio-temporal models (e.g., CARBayesST (Lee et al. 2018), spdep (Bivand et al. 2013;Bivand and Piras 2015), and spTimer Sahu 2017, 2015)). Further packages are mentioned in the Handling and Analyzing Spatio-Temporal Data CRAN task view (Pebesma 2018). The R Epidemics Consortium (2018) provides further useful resources for disease outbreak analysis related R software packages.
However, in each case, the functionality (e.g., models available) of the packages above is quite different to that of EpiILMCT. The models of the EpiILMCT package are "mechanistic" in that they attempt to more directly model the mechanisms of transmission between individuals. Specifically, they take into account the spatial interactions between individuals with differing disease status (e.g., susceptible, infected, notified, removed) at continuous time points of the epidemic process. Those spatial interactions between susceptible and infectious individuals are incorporated as distance-based effects on the infectivity rate of individuals through an infection kernel function (power-law or Cauchy). The infectivity rates can also depend upon various susceptibility and transmissibility covariates at the individual level. Additionally, and of key importance, none of the aforementioned packages account for uncertainty in the event times using Bayesian data augmentation MCMC method.
There are several R packages that provide for the visualization, simulation and modelling the spread of epidemics through networks. The package EpiModel (Jenness et al. 2018) allows epidemic simulation from mathematical models of infectious disease through stochastic contact networks based on exponential-family random graph models (ERGMs). Some packages assume observed contact network or networks when fitting the specified model; for example, ergm (Handcock et al. 2017;Hunter et al. 2008), Bergm (Caimo and Friel 2014), and hergm (Schweinberger et al. 2018). Those packages implement Bayesian analyses for fitting exponential-family transmission network models to observed contact network data.
A recently developed package, epinet (Groendyke and Welch 2018), allows users to infer transmission networks from time-series epidemic data by modelling the contact network using a generalization of the ERGMs. This package make use of time-series epidemic data as the input assuming unknown contact network in their functionality, and producing parameter estimates of the epidemic model as well as the contact and transmission networks. The transmission model can contain various covariates that captures important features (summary statistics) of the contact network as well as epidemic transmission.
However, once again these packages have different approaches to that implemented in EpiILMCT. We focus here on incorporating a contact network as a covariate in the implemented ILMs in EpiILMCT. The response in the ILMs is the event (e.g., infection) time, rather than the transmission network (the transmission network can be inferred later via posterior predictive simulation, of course, but we do not address this here). This is different to epinet, for example, which models the transmission network directly. The EpiILMCT package allows for any pre-user specified contact networks, including various special cases such as spatial or random unweighted (binary) (un)directed contact networks or weighted contact network.
As both spatio-temporal and contact network-based mechanisms can be key to understanding the dynamics of infectious disease spread, the ILMs in EpiILMCT allow for the incorporation of both contact network and distance-based effects jointly in the infectivity rate of individuals. None of the aforementioned packages have this feature in their functionalities.
The use of individual level data in more mechanistic epidemic models has been implemented in only a few other R packages. The most established of these is surveillance (Salmon et al. 2016;Meyer et al. 2017), a package for temporal and spatio-temporal disease modelling. It provides tools for outbreak detection in routinely collected surveillance data, as well as a range of models for infectious disease data. The most closely related model in surveillance to those of EpiILMCT is the additive endemic-epidemic multivariate temporal point process model. These models are implemented in the twinSIR function for modelling the susceptible-infectious-recovered (SIR) event history of a fixed population in continuous time using individual level data. However, not only is the underlying model framework different to that considered in the EpiILMCT package, but the twinSIR function does not allow for uncertainty in event times to be taken into account via data augmentation techniques. The function does not allow for only the epidemic terms of the model to be considered, as can be done in EpiILMCT; both endemic (e.g., seasonal) and epidemic terms must be included in the analysis. In addition, the distance kernel used in the epidemic part of the twinSIR function is represented by a linear combination of non-negative basis functions and is thus different from the distance kernels used in the EpiILMCT package.
The EpiILM package (Warriyar KV and Deardon 2018) that has recently been made available in R, provides similar utility to EpiILMCT, but for discrete-time ILMs. The models it contains provide options to include susceptible individual covariate information, as well as a choice to describe population heterogeneity. However, the package is limited to discretetime distance-based or network-based infection kernels and requires known event histories (i.e., there is no data augmentation feature).
As stated previously, inference for the models of EpiILMCT is carried out in a Bayesian MCMC framework. Although there are packages available in R to implement MCMC algorithms such as MCMCpack (Martin et al. 2011) and adaptMCMC (Scheidegger 2018), all are based on random walk Metropolis-Hastings (M-H) algorithm. The data augmented MCMC algorithm used in the EpiILMCT package to fit various models uses random walk and independence sampler (within Gibbs) steps within a M-H algorithm. The independence sampler algorithm in our package appears to be essential for updating the missing data efficiently (event times and infectious periods), and the authors having not found it possible to achieve well-mixing MCMC chains if purely random walk M-H algorithms are used (even if tuned adaptedly).
Our main purpose of developing this package is to make the use of continuous time ILMs available to epidemiologists and statisticians, through R, one of the most commonly used statistical software packages. Overall, EpiILMCT offers greatly increased flexibility for analyzing complex disease data. The remainder of this paper is laid out as follows. In the next section, we describe the general continuous individual-level model implemented in EpiILMCT. We also discuss the different infection kernel functions implemented in the package. Sections 3 and 4 discuss the functions contained within the package and the underlying Bayesian inference, respectively. Section 5 illustrates the application of EpiILMCT to simulated and real data, while Section 6 concludes the paper with a short summary of the software package and its implications.

Model
The EpiILMCT package allows for the implementation of continuous time equivalents, and extensions, of the discrete-time individual-level models (ILMs) of Deardon et al. (2010). The compartmental frameworks considered are the susceptible-infectious-removed (SIR) and susceptible-infectious-notified-removed (SIN R). In both frameworks, each individual is assumed to be in one of these states at any point in time, t ∈ R + . In the SIR framework, infected individuals transition between states, susceptible to infectious and from infectious to removed. Individuals are assumed to be in the susceptible (S) state until they become infected at which point they become immediately infectious (I), then being able to transmit the disease for the duration of their infectious periods before entering the removed (R) state. In the SIN R framework, infectious individuals are assumed to move from the infectious state (I) to a notified (N ) state. The latter represents a state in which individuals have been identified as having the disease, and may be subjected to various restrictions (e.g., government-imposed movement constraints in the 2001 UK Foot-and-Mouth disease (FMD) outbreak). The N -state infectivity rate is often assumed to be lower than that of I-state. As infectious individuals enter the R-state, they are removed from the infectious population (e.g., because of recovery and acquired immunity, death or quarantine) and from thereon play no role in transmitting the disease.
A full epidemic history consists of all transition event times for all individuals, and defines the state of all individuals at each point in time. For example for the SIN R framework, S(t), I(t), N (t) and R(t) at time t for t ∈ [0, t obs ] is defined by all infection, notification and removal times. Here, t obs is the maximum removal time; i.e., the time that the last notified individual enters the removed state. We assume that each susceptible individual j at time t has an infectivity rate 1 with a given infectious individual i: where where Ω S (j) and Ω T (i) are the susceptibility and transmissibility functions, respectively. They are defined as: where S and T are the (coefficient) parameter vectors of the susceptibility and transmissibility covariates with sizes equal to the number of susceptibility (p S ) and transmissibility (p T ) covariates, respectively; X φ .j and Z ξ .i are the j th and i th columns of the susceptibility and transmissibility risk factor matrices X φ ∈ R + p S ×n and Z ξ ∈ R + p T ×n , respectively; and φ and ξ are vectors of the power parameters of the susceptibility and transmissibility functions with sizes equal to p S and p T , respectively. Note that, X φ and Z ξ are constrained to be positive. These power parameters allow for non-linearity between the susceptibility and transmissibility risk factors and the infection rate (Deardon et al. 2010). The notification effect parameter γ is used to measure the risk of infection after notification that can be reduced or increased depending on the disease type. For example, the transmissibility has been observed to increase after symptoms in SARS (Pitzer et al. 2007), whereas, it can be lower for the 2001 UK FMD (Jewell et al. 2009). The latter stated this effect parameter in their general model as a control measure parameter that accounts only the reduction in the risk of infection. In the case of γ = 1, notification has no effect on infectivity.
So, the total rate of infectivity of each susceptible individual j at time t is given by: where N − (t) is the set of infectious individuals at time t who have been infected but have not reached the notified state; and N + (t) is the corresponding set for notified individuals (Jewell et al. 2009).
The nomenclature is the same for the SIR framework, but without the N (t) state, there is not need to compartmentalize infectious individuals into pre-and post-notification sets. Therefore, the total rate of infectivity of each susceptible individual j at time t is given by: where I(t) is the set of infectious individuals at time t (i.e., they have been infected, but not yet removed).
The infectivity rate λ j (t) also contains a spark function that is denoted by (j, t) which allows for random infections otherwise unexplained by the model. This might represent, for example, the infection of a susceptible individual from a source outside of the observed population. In this model, we fix the spark term (j, t) such that (j, t) = ; ≥ 0.
The infection kernel κ(i, j) represents shared risk factors between pairs of infected and susceptible individuals. In the EpiILMCT package we consider three kernel types: distancebased, network-based, and combined distance and network-based. Two sub-types of distancebased kernel are also considered: Cauchy and power-law. The infection kernel functions are given in Table 1. In the distance-based ILMs, the kernel function is based on the distances d ij between individuals generally, but not always, spatial Euclidean distance. In the Model Kernel type Kernel function Combined distance and network-based ILMs network-based ILMs, the kernel function is based on the connections between individuals in a contact network that are represented by binary connections c ij = 0 or 1, or weighted connections w ij ∈ [0, ∞). In the combined ILMs the kernel consists of a linear function of both.

Likelihood function
We label the m infected individuals i = 1, 2, . . . , m with corresponding infection (I i ) and removal (R i ) times such that I 1 ≤ I 2 ≤ · · · ≤ I m . The N − m individuals who remain uninfected after t obs are labeled i = m + 1, m + 2, . . . , N with I i = R i = ∞. We then denote infection and removal time vectors for the population as I = {I 1 , . . . , I m } and R = {R 1 , . . . , R m }, respectively. We assume that infectious periods follow a gamma distribution with a fixed shape δ a and rate δ b , δ = (δ a , δ b ) (Jewell et al. 2009). The likelihood function can be divided into two independent components: the infectious and the removed components. As we assumed earlier that each susceptible individual j has a total infectivity rate λ j (I j ) (their total specific infectious pressure) at the time of being infected (I j ) from infectious individuals i ∈ I(I j ), the infectious component under the SIR continuous time ILMs can be written as: where the product term represents the total specific infectious pressure that each infected individual receives from infectious individuals at the time of being infected, and the exponential integral represents the total person-to-person infectious pressure during the course of the epidemic.
The removed component then contains the contribution of the infectious periods to the likelihood function via their densities. As the infectious period of an infected individual i (D i = R i − I i ) is independent of others, the removed component is simply: The likelihood function of the general SIR continuous time ILMs can then be formed by combining the infectious and removal parts given as follows: where the wedge symbol ∧ denotes the minimum operator; θ is the vector of unknown parameters; f ( · ; δ) indicates the density of the infectious period distribution; and D i is , which represents the total person-to-person infectious pressure through the course of the epidemic, can be written as the double sum in the lower equation (Britton and O'Neill 2002;Jewell et al. 2009). The integral is transformed by discretizing it into a sum over the successive events of the epidemic and is substituted by the double sum. The likelihood function of the general SIN R continuous time ILMs can be formed in a very similar manner (see Appendix A).

Function Usage contactnet
Generates undirected unweighted (binary) contact network matrices from spatial (powerlaw, or Cauchy), or random, network models.

plot.contactnet
Provides plot of a contact network of class 'contactnet'.
datagen Generates epidemics from distance/network-based individual level models.

as.epidat
Generates objects of class 'datagen' that contain the individual event history of an epidemic along with other individual level information.
plot.datagen Provides different plots summarizing an epidemic of class 'datagen'.
epictmcmc Runs a Bayesian data augmented MCMC algorithm for fitting specified models (SIR or SIN R).

print.epictmcmc
Prints the contents of 'epictmcmc' object to the console.
loglikelihoodepiILM Calculates the log likelihood for a given compartmental framework and kernel type of the continuous time ILMs.

Contents of the EpiILMCT package
The EpiILMCT package can be used to simulate and graphically summarize epidemics, and, for a given model, carry out Bayesian inference and calculate log-likelihood. Most of the main package functions are written in Fortran 95 (called from within the R wrapper), since they are computationally intensive tasks. The functions contained in the package are reviewed in Table 2.

Contact network
Various types of contact network can be considered. First, we consider unweighted (binary) contact networks which can be directed or undirected. In an undirected unweighted contact network, each pair of individuals share the same symmetric connection such that c ij = c ji for i = j; i, j = 1, . . . , N ; and each network is defined by N 2 elements where c ij = 1 if a connection exists between individuals i and j, and 0 otherwise. In a directed unweighted contact network, it is not necessary for individuals to share the same symmetrical relationship so that c ij = c ji for i = j; i, j = 1, . . . , N . This leads to a non-symmetric contact network matrix. Weighted contact networks can also be considered in the EpiILMCT package in which the connections between individuals are not described as present or absent but are weighted according to their strength. These too can be directed or undirected.
A function (contactnet) is included to generate undirected unweighted contact networks. It can simulate both spatial networks where connections are more likely to occur between individuals closer in space ("spatial contact networks"), as well as random contact networks. The function contactnet has three available options ("powerlaw", "Cauchy", and "random") for the network model, where the first two options simulate spatial contact networks in which the probability of connections between individuals are based on required XY coordinate input.
The inclusion of the two options "powerlaw" and "Cauchy" in the argument type is to allow the user to choose between two commonly assumed spatial forms to describe the underlying population. For example, the power-law network model is taken from Bifolchi et al. (2013) who use this network to test how well purely spatial power-law ILMs can approximate disease spread through networks. The Cauchy model was used by Jewell et al. (2009) to model the 2001 UK foot-and-mouth outbreak in Cumbria; they found this kernel the most appropriate for predicting transmission of these tested.
We now describe the three model options in detail. First, in the power-law contact network model of Bifolchi et al. (2013) the probability of a connection between individual i and j is given by: where d ij is the Euclidean distance between individuals i and j; β is the spatial parameter; and ν is the scale parameter.
Under the Cauchy contact network model, as used in Jewell et al. (2009), the probability of a connection between individual i and j is given by: where d ij is the Euclidean distance between individuals i and j; and β is the spatial parameter.
Finally, under the random contact network model, the probability of a connection is simply generated from a Bernoulli distribution with probability equal to β.
Let us now consider some examples. To create the above undirected unweighted contact networks, the function requires the network model to be specified ("powerlaw", "Cauchy", or "random") via the type argument. If "powerlaw" or "Cauchy" are selected, the XY coordinates of individuals (location) have to be specified through the argument location.
The function contactnet produces a list which includes the contact network matrix in a class, 'contactnet'.
To obtain a plot of the contact network, we introduce an S3 method plot.contactnet function, which uses as its input an object of the class 'contactnet'. The plot.contactnet function uses code internal to EpiILMCT for the layout when plotting power-law or Cauchy network models, but depends on the package igraph (Csardi and Nepusz 2006) when plotting random network model.
The following code generates the three types of contact networks for a population of 50 individuals, with a uniformly distributed spatial layout for the spatial network models.

Epidemic simulation
The function datagen allows the user to generate epidemics from the continuous time ILMs under the SIR or SIN R compartmental frameworks. Which framework is to be used is specified through the type argument. Each infected individual in a simulated epidemic has an infection life history defined by their time of infection and the length of time spent in the infectious state. We assume the conditional intensity functions stay constant between events, such that the time to the next infection, given that the last infection occurred at time t, follows W j ∼ Exp(λ j (t)). Here, W j represents the "waiting time" for susceptible individual j becoming infected.
Under the SIR framework, and using the chosen distribution of the infectious period, an epidemic is simulated starting with a randomly chosen initial infected individual k at time I 1 = 0, or with initial infected individual(s) specified via the argument initialepi. This argument requires a vector or matrix containing the id number(s), removal time(s), infec- The individual with the minimum W is taken as the next infected individual and assigned an infection time I s+1 = I s + min(W ); an infection period D j (generated from f (D j ; δ)); and a removal time R s+1 = I s+1 + D j . The process is repeated until no infectives remain in the population or I s+1 > t max , where t max is the time at where the epidemic simulation is set to end. t max can be then specified via the option tmax.
Under the SIN R framework, each infected individual is considered to have an incubation period comprising the time from infection to notification, and a delay period comprising the time from notification to removal. Together the incubation and delay periods constitute the infectious period. An epidemic is simulated in the same manner described above for the SIR framework, except that the infection period is replaced by incubation and delay periods D (inc) and D (delay) (generated from f (D (inc) j ; δ (inc) ) and f (D (delay) j ; δ (delay) ), respectively); and notification and removal times are assigned as In this function, the infectious, incubation and delay periods are assumed to follow either exponential or gamma distributions. These distributions can be specified through the delta argument. Under the SIR framework, delta is a vector containing the shape and rate parameters of a gamma distribution, whereas under the SIN R framework it is a 2×2 matrix where each row represents the parameters of the incubation and delay period distributions. Note that -as is often done -an exponential distribution can be assigned to any of these distributions by setting shape parameter equal to one.
The epidemic data structure output of the datagen function is used throughout the Epi-ILMCT package. Under an SIR ILM, it returns a matrix with four columns representing: the id numbers of the individuals, removal times, infectious periods, and infection times. Under an SIN R ILM, it returns a matrix with six columns: the id numbers of the individuals, removal times, delay periods, notification times, incubation periods, and infection times. Uninfected individuals are assigned infinity values (Inf ) for both their removal and infection times. Epidemic data from other modelling packages can be extracted and modified to be used in EpiILMCT. For example, we show how this can be done using the individual level models from the surveillance package in Appendix B.
The choice of kernel function κ(i, j) is specified using the kerneltype argument. This takes one of three options: "distance" for distance-based, "network" for network-based, or "both" for distance and network-based. The appropriate kernel matrix must also be provided via the kernelmatrix argument. If "distance" is chosen as the kerneltype, the user must choose a spatial kernel ("powerlaw" or "Cauchy") through the distancekernel argument. The distance matrix can be obtained from XY coordinate data using the dist function from the stats package (R Core Team 2019). Otherwise the distance matrix can be specified by the user. Other arguments in the datagen function require the data and coefficient parameters for the susceptibility and transmissibility risk factors as explained in Section 2.
We define an object of class 'datagen' to take a list of values needed for the use of other functions, such as, plot.datagen and epictmcmc. This list contains: type, kerneltype, epidat (event times), location (XY coordinates of individuals), and network (contact network matrix). In the case of setting the kerneltype to "distance", a NULL value will be assigned to the network option. The package has also a separate function as.epidat that generates an object of class 'datagen' for a given epidemic data set (Appendix B contains a brief example of using this function).
The package also contains an S3 method plot.datagen function, which illustrates disease spread through the epidemic timeline. This function can be used for either distance-based or network-based ILMs. The object of this function has to be of class 'datagen'. If the plottype argument is set to "history", the function produces epidemic curves of infection and removal times. Example plots are shown in Figure 3. Conversely, setting this argument to "propagation" produces plots of the epidemic propagation over time.
With the latter option, exactly which plots are output varies by kernel. With the network kernel, the function plots all the connections between individuals and overlays these with the epidemic pathway direction over time. This path direction consists of directed edges from all infectious individuals connected to a given newly infected individual i with infection time I i (one per plot). Thus, this produces directed networks showing possible pathways of the disease propagation. With the distance kernel, the function plots the spatial epidemic dispersion over time. It shows the changes in the individual status that related to the chosen compartmental framework. To avoid displaying too many plots, the time.index argument allows user to obtain propagation plots at specific infection time points rather than at every infection time.

Bayesian inference
Prior distributions of the model parameters are selected from one of three options: gamma, positive half normal or uniform distribution. Then, Metropolis-Hastings MCMC is performed to estimate the joint posterior of the model parameters and latent variables (the latter if various event times are assumed unknown). This is achieved using the function epictmcmc. The parameters of the susceptibility and transmissibility functions, infection kernel and spark term (collectively denoted θ) are updated using the random-walk proposals. The user is required to tune the proposal variances to achieve good mixing properties. Thus, the user must provide a vector of initial values, a prior distribution ("gamma", "uniform", or "halfnormal"), the prior parameters, and the variance of the normal proposal distribution for each parameter as shown in Figure 2. In case of running multiple MCMC chains, the user should provide a vector of initial values of the model parameters. Note that, setting the variance of the normal proposal distribution to zero fixes a parameter at its initial value. This option allows the user to fix such a parameter in the model while updating others (i.e., conditioning on the parameters).
Using the datatype argument, the epictmcmc function allows for three scenarios in terms of event time uncertainty: "known epidemic" can be used to model a fully observed epidemic with known infection and removal times; "known removal" can be used to model a partially observed epidemic where the infection times are unknown; and "unknown removal" can be used to model a partially observed epidemic where removal and infection times are unknown. The latter option is only available for the SIN R continuous time ILMs where notification times are assumed correctly known. When the datatype argument is set to "known epidemic", the infectious periods are fixed by default. When infection times are unknown, the rate(s) of the infectious, incubation and/or delay period distributions are assigned gamma prior distributions with shape a and rate b. Thus, the rate parameters have conditional distributions with a standard form following the gamma distribution. For the SIR continuous time ILMs, this is as follows: where δ is the rate of the infectious period distribution; M = m i=1 (R i − I i ); and a δ and b δ are the prior parameters of the infectious period rate. For the SIN R continuous time ILMs, the distribution of the incubation rate and delay parameters are as follows: ; and a δ (inc) and b δ (inc) are the prior parameters of incubation period rate; and ; and a δ (delay) and b δ (delay) are the prior parameters of delay period rate.
A Gibbs update (i.e., sampling from the conditional posterior distribution) is used for the infectious period rate (for the SIR continuous time ILMs) or the incubation and/or delay period rates (for the SIN R continuous time ILMs). The required information for each period distribution are entered via the delta argument. We assume each period type follows a gamma distribution with fixed shape and unknown rate. Thus, to update the rate parameter of each period we specify delta, a list containing a vector of the fixed shape value(s), a vector (matrix) of the initial values of the rate(s), and a vector (matrix) for the parameters of the prior distribution of the rate parameter(s). In the case of incubation and delay periods being estimated, the input of the initial values is a 2× nchains matrix, and the prior parameters is a 2 × 2 matrix where each row contains the required information for each period rate.
An independence sampler is then used to update the infection times/infectious periods (for the SIR continuous time ILMs), or the infection times/incubation periods and/or the removal times/delay periods (for the SIN R continuous time ILMs). For the SIR continuous time ILMs, the i th infection time I i is updated by generating an infectious period D * i from a gamma proposal distribution such that D * i ∼ Γ(a, b). Then, the new infection time is the difference between the observed removal time and the new infectious period of the i th individual. The same procedure is used for updating the missing event times, infectious periods and corresponding parameters for the SIN R continuous time ILMs. The parameter values of the gamma proposal distribution could be provided through the periodproposal argument. If they are not provided, the parameters of the gamma proposal distribution are then based on the fixed shape and updated rate values from the argument delta. Computationally, it may be more efficient to apply a block update for the periods and event times. This can be implemented using the blockupdate argument, which requires that the user specifies m (assuming removal and infection times are known for the first m individuals), and the size of each block.
The epictmcmc function allows for sampling from multiple MCMC chains. This is done by providing the number of chains to be run via the option nchains. Additionally, multiple Figure 2: A diagram of the input structure for the arguments control.sus, control.trans, kernel.par, spark.par, gamma.par and delta in the function epictmcmc.
chains can be run in parallel by setting parallel = TRUE. This implies the use of the parLapply function from the parallel package (R Core Team 2019). The number of cores to be used is set to the minimum of the number of chains and the available cores on the user's computer. Note that, if parallel is set to FALSE and nchains>1, multiple MCMC chains are run sequentially. When parallel is set to TRUE, the clusterSetRNGStream function from the parallel package (R Core Team 2019) is used to distribute the setting seed value by the set.seed function (R Core Team 2019) to each core to reproduce the same results, otherwise each core sets its seed value from the current seed of the master process.
The output of this function is an object of class 'epictmcmc'. There are S3 methods: print.epictmcmc, summary.epictmcmc and plot.epictmcmc that depend on the coda package (Plummer et al. 2006). The latter function has a plottype argument to specify which samples need to be plotted. This argument has three options: "parameter" to produce trace plots of the posterior distributions of the model parameters, and "inf.times" ("rem.times") to produce plots of the average posterior and 95% CI of the unobserved infection (removal) times when datatype set to "known removal" ("unknown removal"). The S3 plot.epictmcmc method has the same options as the plot.mcmc function in the coda package, for example, start, thin, and density.
The class 'epictmcmc' contains the MCMC samples of the model parameters and the missing information (if datatype is not set to "known epidemic") as an mcmc matrix, and other useful information to be used in other functions, such as the above S3 methods. So standard summary methods from coda, such as summary.mcmc and plot.mcmc, can be employed using these MCMC samples as inputs.
Posterior predictive checks of the fitted model can be performed using the datagen function. This requires that the user supplies the model parameter values with a combined sample of the MCMC model parameter outputs. If desired, the simulation can be constrained to the first m infected individuals and their event times. This can be achieved by appending this information to the initialepi option.
To limit the number of plots, we assign the time.index option to be a vector containing time points for plots to be generated as shown in the following code: R> plot(NetworkData[[1]], plottype = "propagation", + time.index = seq_len (6)) We can also produce density plots of the infection and removal times, and a plot of the infectious periods, by specifying the argument plottype to "history" as shown in the following code: R> plot(NetworkData[[1]], plottype = "history") Figure 3 shows the densities of the infection and removal times, and the infectious periods; while Figure 4 shows the epidemic propagation plot. To illustrate fitting continuous time ILMs to data, we analyze the epidemic using the epictmcmc function. This is done under two observation scenarios: "known epidemic" and "known removal". For the former analysis, we assign Γ(1, 0.1) gamma prior distributions to the model parameters α 0 and α 1 and use normal MCMC proposals with variances equal to 0.5 and 1, respectively. As we have two susceptibility parameters, the argument control.sus is then a list that contains: 1) a list of a vector of initial values of α 0 and α 1 , and a 2 × 4 matrix in which each row represents the required information for updating each parameter; and 2) a 50 × 2 matrix of the covariates representing the unity intercept and the binary covariate z. Now, we run the MCMC using the epictmcmc function for sampling a single chain of 150,000 iterations using the following code: The estimates of the model parameters can be obtained through the S3 summary function of epictmcmc. The posterior means and 95% credible intervals of these parameters can be obtained via the following command: R> summary(mcmc1, start = 10000, thin = 10) ********************************************************* Model: SIR network-based continuous-time ILM Method: Markov chain Monte Carlo (MCMC) Data assumption: fully observed epidemic number.chains : 1 chains number.iteration : 140000 iterations number.parameter : 2 parameters ********************************************************* The MCMC trace plots for the model parameters can be produced using the S3 method plot.epictmcmc.
For the known removal times analysis, EpiILMCT uses data augmented MCMC to infer infection times and the infectious period rate. Here, we assume that the infectious period follows a gamma distribution with shape 4 and unknown rate parameter δ; so D i ∼ Γ(4, δ).
As the posterior samples of the model parameters are stored in the epictmcmc object as an 'mcmc' object of the type used in the coda package, the standard summary methods from coda can be employed, inserting mcmc1$parameter.samples as the input of this function. This is illustrated in the following command:  To check the fit of the model, we consider the posterior predictive distribution of four statistics. Specifically, we consider: T 1 , the total number of infected individuals; T 2 , the average removal time; T 3 , the variance of the removal times; and T 4 , the length of the epidemic. Here, we simulate 10,000 epidemics based on random draws of the model parameters from the MCMC output (excluding burnin) of the known removal times analysis (i.e., unknown infection times). We condition our simulation on the first ten infected individuals, then calculated the four statistics for each simulation. This simulation procedure is implemented in parallel using the future_lapply function from the future.apply package (Bengtsson 2019) as follows: R> posterior.pred <-function(x) { + epi <-datagen(type = "SIR", kerneltype = "network", kernelmatrix = +  R> plan(multiprocess, workers = 4) ## Parallelize using R> datmcmc <-future_lapply(mb, FUN = posterior.pred, future.seed = TRUE) R> summary.results <-sapply(datmcmc, unlist, simplify = TRUE) The posterior predictive distributions of the four statistics are shown in Figure 8. We can see that each distribution captures the observed statistics well.

Case study: Tomato spotted wilt virus (TSWV) data
We further illustrate the EpiILMCT package by analyzing the TSWV data as described in Hughes et al. (1997) and analyzed with spatial ILMs by Deardon (2014, 2016). These data represent the results of an experiment designed to study the spread of the disease amongst 520 pepper plants raised in a greenhouse. Plants were evenly distributed across a 10×26 meter area as shown in Figure 9. The experiment began on   May 26, 1993 and finished on August 16, 1993. Plants were checked for the disease every 14 days, and ultimately 327 were infected. Following Deardon (2014, 2016) these observation points are recorded to t = 1, 2, . . . , 7. We set the initial infection time to t = 2 in line with the original data set.
We here analyze the epidemic under two data availability scenarios. First, we assume that the event times of the TSWV disease are fully observed. Here, the infectious period was fixed at three time points (42 days) following Deardon (2014, 2016). Additionally, the last observed time point was at t = 7. Second, we assume the epidemic is partially observed. Specifically, we assume that the infection and removal times are unknown, and treat the reported infection times as the notified time points. This entails considerable uncertainty and makes the MCMC analysis much more time consuming (more than 13 times longer than the computation time of the first analysis), because it is necessary to estimate both incubation and delay periods along with the infection and removal times.
The data is stored in the data file tswv, available in the EpiILMCT package. It contains a list of the TSWV epidemic data set for the two compartmental frameworks (SIR and SIN R) structured as a 'datagen' class. The following code shows how the TSWV data set can be extracted and the associated Euclidean distance matrix built.
In the second analysis (i.e., where infection and removal times are treated as unknown), we assume notified times were observed for all infected individuals. Consequently, an SIN R distance-based continuous time ILM is used where the infectivity rate given in Equation 2 becomes: We here assume the risk of infection does not reduce after notification, and set the notification effect parameter to γ = 1. The infectious period here is divided into the incubation and delay periods. We assume the total infectious period to be within three time points (42 days) following Deardon (2014, 2016); Brown et al. (2005). Thus, we assumed the incubation periods to follow an exponential distribution such that D Exp(δ (inc) ) with initial value of δ (inc) = 1, whereas the delay periods is assumed to follow a gamma distribution such that D (delay) i ∼ Γ(10, δ (delay) ) with initial value of δ (delay) = 5. We assign gamma prior distributions for both rates such that δ (inc) ∼ Γ(10, 10) and δ (delay) ∼ Γ(60, 12). These choices are to cover the support of our assumptions about the infectious periods. For simplicity, we assume the infection time of the first infected plant is known. We set its incubation period to one time point.
Note that, infection and removal times are updated here in blocks of 10 (via the blockupdate argument) for reasons of computational efficiency. The epictmcmc function had a run time of 9.51 hours using the parallel method with 4 cores, but this was computationally much more efficient than if single updates were used (≈ 124 hours, calculated based on ten MCMC iterations).  Table 3: The posterior means and 95% credible intervals (CIs) of the model parameters, with a burn-in of 5,000 iterations and thinning interval of 10 for each of the four MCMC chains, for fitting the TSWV using the SINR distance-based continuous time ILM under the assumption of unknown removal and infection times.

Conclusion
This paper introduces the R software package EpiILMCT, which facilitates the use of a broad range of continuous time ILMs under two compartmental frameworks (SIR and SIN R). It also allows for the analysis of partially observed infectious diseases data, achieved using data augmented MCMC within a Bayesian framework. We illustrated the package by fitting continuous time ILMs on simulated and real epidemic data. The paper did not cover all functionality of the package. For instance, we did not illustrate incorporating both distance and network in the kernel function, or allowing for nonlinearity between the susceptibility and transmissibility risk factors and the infection rate. However, implementation of such facets is simple. Additional functionality that was not covered in Sections 5 can be found via help(package = "EpiILMCT").
Also, it is possible to use EpiILMCT to test the efficacy of disease control strategies (eg. vaccination or culling) via simulation study. This can be done by simulating epidemics in small time steps and then manipulating infection and/or removal times according to a given control policy, before simulating the next step of the epidemic simulation conditional upon the manipulated epidemic history just determined. We illustrate this via a simple ring-culling strategy in Appendix C.
In terms of future developments, the authors intend to expand the modelling framework to allow for latent periods (i.e., susceptible-exposed-infectious-removed (SEIR) and susceptible-exposed-infectious-notified-removed (SEIN R)). This would be useful for many disease systems in which the time between infection (exposure) and infectiousness cannot be reasonably ignored. Additionally, expanding the compartmental frameworks to allow for reinfection would also be useful for diseases such as influenza. That is, we could allow for frameworks: susceptible-infectious-susceptible (SIS), susceptible-exposed-infectioussusceptible (SEIS), etc.
Another development could involve incorporating more data uncertainty into the analyses, especially under the network-based model, is an option for future development of this package EpiILMCT. For example, networks are often only partially observed. However, the data augmentation could easily make the computation time for data analyses prohibitive. Various strategies for mitigating this might be available. For example, approximate forms of inference such as Gaussian process emulation (Pokharel and Deardon 2016), approximate Bayesian computation (Beaumont et al. 2009), machine learning based model classification (Pokharel and Deardon 2014), data-sampled likelihood approximation (Malik et al. 2016), or data-aggregation (Deeth and Deardon 2016) could all prove useful for overcoming these computational issues. Finally, it would be possible to extend our modelling framework to allow for multiple, interacting disease strains or pathogens (Romanescu and Deardon 2016). (OMAFRA), and the Natural Sciences and Engineering Research Council of Canada (NSERC). Almutiry was also funded by Qassim University through the Saudi Arabian Cultural Bureau in Canada. Warriyar was funded by the University of Calgary Eyes High Post Doctoral Scholarship scheme. We thank the editor and referees for their valuable suggestions and comments, which greatly improved both the software package and this manuscript.
A. The likelihood function of the general SIN R continuous time ILMs:

B. R code for extracting individual level data from surveillance
Here, we illustrate the extraction of individual level data from the surveillance package for use in the EpiILMCT package. We consider the toy data set (fooepidata) representing a population of 100 individuals that is used in the twinSIR examples of the surveillance package (Höhle et al. 2018).
The data can be found in surveillance via data("fooepidata", package = "surveillance").
R> library("surveillance") R> data("fooepidata") R> names(fooepidata) [1] "BLOCK" "id" "start" "stop" "atRiskY" "event" "Revent" [8] "x" "y" "z1" "z2" "B1" "B2" The fooepidata event history consists of 178 time BLOCKs of 100 rows, where each row describes the state of individual id during the corresponding time interval (start, stop). The start and stop variables represent the start and end of interval time points (in continuous time) that indicate the waiting time between consequence event times (infection and removal times). The binary variables event and Revent are used to indicate the occurrence of newly infected or removed individuals at the stop time of each time interval (BLOCK), respectively. Thus, the stop time is taken to be the infection or removal times of the infected or removed individuals in each time interval. The coordinates of individuals is represented in columns x and y. The fooepidata data set contains also endemic and epidemic covariates. Endemic covariates are represented by the columns named z1 and z2 (the exact interpretation of these covariates is not given). Epidemic covariates are represented by the columns named B1 and B2, and they indicate the count of currently infective individuals for each individual within, and greater than one unit distance, respectively. See (help(epidata, package= "surveillance")) for more details about the data structure. From this data set, we extract only the event times and XY coordinates of each individual, ignoring the purely spatial epidemic covariates which are directly modelled by the distance kernel in EpiILMCT.
R> fit1 <-twinSIR(~B1 + B2, data = fooepidata) R> summary(fit1) Call: twinSIR(formula =~B1 + B2, data = fooepidata) Coefficients: Estimate Std. Error z value Pr(>|z|) B1 0.023960 0.004208 5.693 1.25e-08 *** B2 0.003395 0.001119 3.034 0.00241 ** cox(logbaseline) -6.010580 0.659257 -9.117 < 2e-16 *** ---Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Total number of infections: 88 One-sided AIC: 474.05 Log-likelihood: -235.2 Number of log-likelihood evaluations: 26 The posterior means of the ILM parameters (α, β) are 0.009 and 0.945, respectively. Figure 14 shows the ILM power-law distance kernel function under the posterior mean, along with the distance function suggested by the MLEs of the model parameters from the twinSIR analysis. We can see broad agreement, although the step function assumption of the twinSIR seems less reasonable than the continuous decay of the ILM kernel for short distances (< 1 distance unit). C. R code for implementing ring-based control strategy Here, we illustrate the use of the EpiILMCT package in testing the efficacy of a ringbased control strategy for mitigating the spread of disease. We consider an example in which an infectious disease is transmitted between 625 individuals located in a square area of 50×50 units. These individuals could be thought to represent farms or trees, say. We implement control measures upon all individuals within a circle of r radius of newly infected individuals. This control strategy essentially places these individuals in the removed set. These measures could be thought to represent vaccination or quarantine, but here we assume it is a culling strategy.
To illustrate we first simulate the XY coordinates of individuals from a uniform distribution. This is done as follows: R> library("EpiILMCT") R> set.seed(101) R> n <-625 R> loc <-matrix(cbind(runif(n, 0, 50), runif(n, 0, 50)), ncol = 2, nrow = n) We assume that the epidemic starts with an initial infected individual k = 386, who has an infection time I 1 = 0 and an infectious period of 3 days. We then implement the culling policy within an epidemic simulation study using the datagen command to simulate epidemics in a specified small time steps (e.g., a day at a time). This is done by setting the option tmax, and starting each new simulation step with initially infected and removed individuals set according to the epidemic history, and the culling policy implemented at the current time step. This is done using the initialepi option. We build a control.strategy function to implement the above culling policy using an SIR distance-based continuous time ILM with power-law kernel and no covariates, in which the infectivity rate given in Equation 3 becomes: with infectious periods assumed to follow a gamma distribution such that γ i ∼ Γ(6, δ). R> plot(rr, apply(ninfected, 1, mean), type = "o", ylab = "Number of + individuals", xlab = "radius", ylim = c(0, n), pch = 19) R> lines(rr, apply(numb.culled, 1, mean), type = "o", pch = 19, col = "red") R> legend("topright", c("Average number of infected individuals", "Average + number of culled individuals"), col = c("black", "red"), lty = c(1, 1), + pch = c(19, 19)) R> plot(rr, apply(len.infection, 1, mean), type = "o", ylab = "Length of + epidemic", xlab = "radius", pch = 19) Figure 15 shows the average number of infected and culled individuals at each radius. We can see that the number of infected individuals tends to decrease dramatically as the radius of the ring increases, levelling off once we have to get around r = 5 units. However, the number of culled individuals also increases quite dramatically with increasing the radius of the ring, also levelling off around r = 7 units. We can also see from Figure 16 increasing the radius r tends to decrease the length of the epidemic. Of course, the control.strategy function can be easily modified to impose other control strategies. For example, instead of culling within a time step as in the case here, we could allow for (stochastic) delays between infection and culling for surrounding individuals, or allow for only a probability of failure regarding each cull or vaccination event.

D. Comparing the computation times of running different models
Here, we compare the effect of population size and the number of infected individuals on the computation time for running the epictmcmc function. We considered five population sizes (50, 250, 450, 650 and 850 individuals), and generated three different epidemics using SIR distance-based continuous time ILMs, via the datagen function, resulting in different numbers of infected individuals. These epidemics are categorized into three levels as: small, medium, large defined as epidemics in which the number of infected individuals are less than 25%, between 25% and 50%, or greater than 50% of the population, respectively. Then, we run the epictmcmc three times assuming datatype = "known epidemic", "known removal" with single chain, and "known removal" with three chains, updating the infection times in blocks of size five. Figure 17 shows the computation times in hours for running the epictmcmc function on the above epidemics to obtain 150,000 MCMC samples. The computation times were approximated on the basis of running ten iterations, as our concern here is just to see to estimate the effect of population size and number of infected individuals upon computation time. We observed strong correlation between the population sizes and number of infected individuals in all of the considered analysis scenarios.
We see that under the fully observed epidemic assumption (datatype= "known epidemic"), the function epictmcmc can be performed in reasonable time for all scenarios. However,

datatype = "known epidemic"
The size of the number of infected computation time becomes an issue for partially observed epidemics (datatype= "known removal") when updating the infection times in turn in a single chain. Larger epidemics with larger population sizes were estimated to take more than four weeks to obtain 150,000 MCMC samples. Computation time is greatly reduced by running epictmcmc over multiple chains and updating infection times in blocks of size five. For example, with an epidemic in a population of size was 850 individuals, and almost all of individuals infected, the computation time was reduced from approximately 1548 hours (≈65 days) to 157 hours (≈7 days).