New frontiers in Bayesian modeling using the INLA package in R

The INLA package provides a tool for computationally efficient Bayesian modeling and inference for various widely used models, more formally, the class of latent Gaussian models. It is a non-sampling-based framework which provides approximate results for Bayesian inference, using sparse matrices. The swift uptake of this framework for Bayesian modeling is rooted in the computational efficiency of the approach and catalyzed by the demands of the big data era. In this paper, we present new developments within the INLA package that aim to provide a computationally efficient mechanism for Bayesian inference in relevant challenging situations.


Introduction to the R-INLA project
The R-INLA project is an evolving platform that hosts various projects, all interlinked with respect to the INLA package (Rue, Martino, and Chopin 2011) in R (R Core Team 2021). This package is based on the INLA methodology developed by Rue, Martino, and Chopin (2009). This development revolutionized the availability and applicability of Bayesian modeling approaches, even in high dimensions, to practitioners and statisticians alike. The INLA methodology ensures computational efficiency by using sparse representations of high dimensional matrices used in latent Gaussian models (LGMs). The computational efficiency of the method offers great appeal to different fields of science and for various applications. In ecology, Quintero and Jetz (2018) studied bird diversity by using R-INLA, while Braga, Ter Braak, Thuiller, and Dray (2018) investigated environmental relationships by incorporating phylogenetic information. In genomics, Dalongeville, Benestan, Mouillot, Lobreaux, and Manel (2018) used R-INLA to detect genes associated with salinity. Air pollution was assessed for the purpose of disease assessment by Shaddick et al. (2018), while Rodríguez de Rivera, López-Quílez, and Blangiardo (2018) used R-INLA to determine forest species distributions. A study into fire occurrences was conducted by Podschwit, Larkin, Steel, Cullen, and Alvarado (2018) to develop a forecasting system with the use of R-INLA. The effect of coral bleaching in the Great Barrier Reef on the marine ecosystem was investigated by Stuart-Smith, Brown, Ceccarelli, and Edgar (2018). In social studies, R-INLA has been applied to study the state of education and child growth in Africa. These aforementioned works are but a few of many recent applications of R-INLA. The pertinence of R-INLA is clear. We believe that the new developments presented here will enable more applications in an even broader context.
We present a brief conceptual framework of the INLA methodology.
LGMs represent a specific subset of hierarchical Bayesian additive models. This class comprises well-known models such as mixed models as well as temporal and spatial models. An LGM is defined as a model having a specific hierarchical structure, as follows. The likelihood is conditionally independent given the likelihood parameters (hyper parameters) θ and the linear predictors η_i, such that the complete likelihood can be expressed as

$$\pi(\boldsymbol{Y} \mid \boldsymbol{\mathcal{X}}, \boldsymbol{\theta}) = \prod_{i=1}^{N} \pi\left(Y_i \mid \eta_i(\boldsymbol{\mathcal{X}}), \boldsymbol{\theta}\right).$$

The linear predictor is formulated as

$$\eta_i = \beta_0 + \boldsymbol{\beta}^\top \boldsymbol{X}_i + \sum_j u_j(z_{ji}), \qquad (1)$$

where β represents the linear fixed effects of the covariates X, and the unknown non-linear functions u of the covariates z are the structured random effects. These include spatial effects, temporal effects, non-separable spatio-temporal effects, frailties, subject- or group-specific intercepts and slopes, etc. This class of models includes most models used in practice, since time series models, spline models and spatial models, amongst others, are all included within this class. The main assumption is that the data Y are conditionally independent given the partially observed latent field 𝒳 and some hyper parameters θ_1. The latent field 𝒳 is formed from the structured predictor as (β_0, β, u, η), which forms a Gaussian Markov random field with sparse precision matrix Q(θ_2), i.e., 𝒳 ∼ N(0, Q^{-1}(θ_2)). A prior π(θ) can then be formulated for the set of hyper parameters θ = (θ_1, θ_2). The joint posterior distribution is then given by

$$\pi(\boldsymbol{\mathcal{X}}, \boldsymbol{\theta} \mid \boldsymbol{Y}) \propto \pi(\boldsymbol{\theta})\, \pi(\boldsymbol{\mathcal{X}} \mid \boldsymbol{\theta}) \prod_{i=1}^{N} \pi\left(Y_i \mid \boldsymbol{\mathcal{X}}, \boldsymbol{\theta}\right). \qquad (2)$$

The goal is to approximate the joint posterior density (2) and subsequently compute the marginal posterior densities π(𝒳_j | Y) and π(θ | Y). Due to the possibility of a non-Gaussian likelihood, the Laplace approximation is used to approximate this analytically intractable joint posterior density. The sparseness assumption on the precision matrix which characterizes the latent Gaussian field ensures efficient computation (Rue and Held 2005).
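To make the LGM structure concrete, the following minimal sketch (with simulated data; all variable names are ours) fits a Poisson model with one fixed effect and one iid random effect, a simple instance of (1):

R> library(INLA)
R> set.seed(1)
R> n <- 100
R> x <- rnorm(n)                    # fixed-effect covariate
R> group <- rep(1:10, each = 10)    # index for the structured random effect
R> u <- rnorm(10, sd = 0.5)         # true group effects
R> y <- rpois(n, exp(0.5 + 0.3 * x + u[group]))
R> fit <- inla(y ~ x + f(group, model = "iid"),
+    family = "poisson", data = data.frame(y, x, group))
R> summary(fit)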
In this paper we present some new developments within the INLA package in the fields of complex survival models, spatio-temporal models and high performance computing. See Rue et al. (2009), Martino, Akerkar, and Rue (2011), Lindgren and Rue (2015), Rue, Riebler, Sørbye, Illian, Simpson, and Lindgren (2017), Bakka et al. (2018) and Krainski et al. (2019) for more details on some main features of INLA. In Section 2 we discuss the implementation of complex survival models including joint longitudinal-survival models, competing risks models and multi-state models. Each of these could incorporate spline, spatial, temporal or clustering elements, to mention a few. We then present the new extensions in the spatio-temporal domain, non-separable space-time models, in Section 3. Finally, we discuss in Section 4 how the INLA package is adapted to a high performance computing environment using the PARDISO package (https://www.pardiso-project.org/; Alappat et al. 2020; Bollhöfer, Schenk, Janalik, Hamm, and Gullapalli 2020; Bollhöfer, Eftekhari, Scheidegger, and Schenk 2019).

Complex (joint) survival models
Survival models are used extensively in clinical studies where the time to a certain event is of interest. The hazard function, the instantaneous risk of experiencing the event, is most often the target of estimation. More importantly, the effects of covariates on the hazard function are of interest for causal inference. Parametric and non-parametric approaches have been proposed to model the hazard function, most of which are available in the INLA package. In this section, we focus on more complex survival models and will not discuss standard survival models (see Martino et al. 2011). We present joint longitudinal-survival models in this section; for other complex survival models like competing risks models using INLA, see Van Niekerk, Bakka, and Rue (2021a).
A basic joint model comprises two different likelihoods that are joined by shared random effects (see Wulfsohn and Tsiatis 1997; Hu and Sale 2003; Guo and Carlin 2004). Extensions of linear joint models, like spatial random effects and non-linear trajectories, are used in the context of joint models to address certain practical challenges (see Zhou, Lawson, Hebert, Slate, and Hill 2008; Ratcliffe, Guo, and Ten Have 2004; Andrinopoulou, Eilers, Takkenberg, and Rizopoulos 2018). Each of these new joint models is still an LGM and thus no special implementation package is needed for each one (for more details see Van Niekerk, Bakka, and Rue 2019). Most longitudinal likelihoods and hazard assumptions can be accommodated in this framework, leaving no need to develop a new implementation for each set of assumptions.

Joint models as LGMs
In this section, we present relevant details of the joint model as an LGM as defined in Section 1; full details are available in Van Niekerk et al. (2019). We first present details of the two submodels that form the joint model, a survival and a longitudinal submodel, respectively, and then focus on the joint model in its entirety. Suppose the hazard rate for individual i, i = 1, ..., N_S, at time s is defined by

$$h_i(s) = h_0(s)\exp\left(\eta_i^S(s)\right),$$

where h_0(s) is the baseline hazard function, which can be parametrically or non-parametrically specified, and η_i^S(s) is the linear predictor (possibly time-dependent), based on covariates, for individual i. Currently, the exponential, Weibull, log-Gaussian and log-Logistic survival distributions are included in the INLA package under the parametric hazard function assumption. The Cox proportional hazards model is included as a semi-parametric model resulting from a non-parametric constant baseline hazard in each of many time partitions (see Cox 1972). In this case, the random walk prior is adopted for the logarithm of the piecewise constant baseline hazard function, achieving a non-parametric estimate of the baseline hazard function. Now define the density function of the time to event as

$$f(s) = h(s)\exp\left(-\int_0^s h(v)\,dv\right),$$

then the likelihood function for the survival submodel is

$$\pi\left(\boldsymbol{D}^S \mid \boldsymbol{\theta}\right) = \prod_{i=1}^{N_S} f(s_i)^{c_i}\, S(s_i)^{1-c_i},$$

where S(s) is the survival function and c_i = I(non-censored observation) indicates if observation i is not censored. An observation is censored when the exact event time is not observed, but rather the most informative non-event time. Right, left or interval censoring are common censoring schemes and can all be accommodated in our approach. The observations are thus a mixture of event times and censored times, depending on the status of each individual. Now, for the longitudinal data Y, suppose that each individual has N_i, i = 1, ..., N_S, observations, for a total longitudinal data set size of N_L = Σ_{i=1}^{N_S} N_i. We specify the linear predictor η_l^L(t), based on covariates at time t, and a conditional density function g(Y_l | η_l^L(t)) for observation l, resulting in the likelihood for the longitudinal submodel

$$\pi(\boldsymbol{Y} \mid \boldsymbol{\theta}) = \prod_{l=1}^{N_L} g\left(Y_l \mid \eta_l^L(t)\right).$$

Now consider the linear predictors of the joint model,

$$\eta_i^S(s) = \tilde{\eta}_i^S(s) + g\left(\eta_i^L(s)\right),$$

where η^S and η^L are of the form (1) and g : ℝ → ℝ is a smooth function of η_i^L(t). The function g facilitates the joint estimation of the models and can assume various forms. A common approach is to use the entire longitudinal linear predictor (see Ibrahim, Chu, and Chen 2010), while traditionally only the subject-specific intercept and slope of the time effect have been used, i.e., g(η_i^L(s)) = ν_1 w_{1i} + ν_2 w_{2i} s. In the latter case we assume the structure specified by Henderson, Diggle, and Dobson (2000), where w_{1i} and w_{2i} are the subject-specific random intercept and random slope, respectively, and ν_1, ν_2 are the association parameters. Note that either ν_1 or ν_2 can be defined to be zero if desired.
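For reference, the survival side on its own can be fitted directly; a minimal sketch (simulated data; all variable names are ours) of a Weibull survival model using inla.surv is:

R> library(INLA)
R> set.seed(1)
R> n <- 50
R> x <- rnorm(n)
R> time <- rexp(n, rate = exp(0.3 * x))   # simulated event times
R> event <- rbinom(n, 1, 0.7)             # 1 = observed event, 0 = censored
R> fit <- inla(inla.surv(time, event) ~ x, family = "weibullsurv",
+    data = data.frame(time, event, x))
R> summary(fit)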
Based on this reconstruction of the joint model, it was demonstrated by Van Niekerk et al. (2019) that the joint model is indeed an LGM and can be successfully fitted with the INLA package.
To this end, we present two illustrative examples. Firstly, we use data from a randomized clinical trial investigating the efficacy of two antiretroviral drugs in HIV patients, available in the JMbayes package (Rizopoulos 2016), where g(η_i^L(s)) = ν_1 w_{1i} + ν_2 w_{2i} s. Secondly, we present an example with a non-linear trajectory and an informative dropout event process with g(η_i^L(s)) = ν η_i^L(s), from a prostate cancer study using post-treatment PSA levels as a longitudinal biomarker.

Example 1: HIV antiretroviral treatments efficacy
In this example the efficacy and safety of two antiretroviral treatments, Didanosine and Zalcitabine, are investigated, as presented in Guo and Carlin (2004). This randomized trial includes N_S = N = 467 patients who had failed or were intolerant to Zidovudine (AZT) therapy. The longitudinal response is CD4, the CD4 cell count (higher implies healthier), at timepoint obstime for each patient, such that N_L = 1405. Covariates are the gender, prevOI, which denotes whether the patient had been diagnosed with AIDS at study entry or not, and AZT, which indicates whether there is an intolerance to AZT or whether AZT has failed in treatment. The survival response is the Time until death (death = 1) or censoring (death = 0). In the joint model, we use the same association structure as in Guo and Carlin (2004), i.e.,

$$g\left(\eta_i^L(s)\right) = \nu_1 w_{1i} + \nu_2 w_{2i} s.$$

This model estimates the treatment effect on survival as well as on the CD4 count jointly. We can then evaluate the treatments for efficacy in both endpoints by including the treatment as a covariate in both submodels. The specific submodels are then

$$\eta_l^L(t) = \beta_0 + \beta_1 t + \beta_2\,\text{drug}_l + \beta_3\,\text{gender}_l + \beta_4\,\text{prevOI}_l + \beta_5\,\text{AZT}_l + w_{1i} + w_{2i} t,$$
$$\eta_i^S(s) = \gamma_1\,\text{drug}_i + \gamma_2\,\text{gender}_i + \gamma_3\,\text{prevOI}_i + \gamma_4\,\text{AZT}_i + \nu_1 w_{1i} + \nu_2 w_{2i} s. \qquad (6)$$

The data is loaded and visualized by the following code (see Figure 1).
R> inla.setOption(short.summary = TRUE)
R> data("aids", package = "JMbayes")
R> par(mfrow = c(1, 2))
R> interaction.plot(aids$obstime[1:100], aids$patient[1:100],
+    aids$CD4[1:100], xlab = "Time(years)", ylab = "CD4 count",
+    legend = FALSE, col = 1:100)
R> hist(aids$CD4, main = "", xlab = "CD4 count")

In Guo and Carlin (2004) the CD4 counts were transformed with the square root function so that the Gaussian distribution could be used for the response model. In this example we use the original counts and assume a Poisson distribution instead. From Figure 1 it is clear that no zero inflation is evident, although such a phenomenon could be incorporated into the model using a zero-inflated Poisson distribution for the longitudinal response (available as zeroinflatedpoisson0 or zeroinflatedpoisson1 for types 0 and 1, respectively). The individual CD4 trajectories are very different from one another, and the need for individual-specific models is clear. This motivates the inclusion of subject-specific intercepts and slopes in the longitudinal submodel. In this example we assume the Weibull distribution for the survival times, although the exponential, log-Gaussian, log-Logistic or Cox proportional hazards assumptions could be used as well. We also rescale the time axis to the unit interval using the maximum time for this data set; all times hereafter should thus be rescaled back to (0, 21.4) for interpretation.
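The rescaling step itself is not shown above; a minimal sketch, assuming (as stated) a maximum time of 21.4, could look as follows:

R> max.time <- max(aids$Time)                  # 21.4 for this data set
R> aids$Time <- aids$Time / max.time            # survival times on (0, 1)
R> aids$obstime <- aids$obstime / max.time      # longitudinal time points on (0, 1)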

Preprocessing for joint modeling in INLA
Some preprocessing of the data is required to perform the joint analysis. The full details are omitted here, but the concept is illustrated in (7) and the structuring is evident from the forthcoming example code. For each submodel, the responses and covariates of the other submodel are stacked with NA's or 0's. For the current example define

$$\boldsymbol{Y} = \begin{pmatrix} \boldsymbol{y}^L & \text{NA} \\ \text{NA} & \boldsymbol{y}^S \end{pmatrix}, \qquad \boldsymbol{X} = \begin{pmatrix} \boldsymbol{X}^L & \text{NA} \\ \text{NA} & \boldsymbol{X}^S \end{pmatrix}. \qquad (7)$$

Then the joint model in (6) is an LGM similar to (1).
For this example, the construction of the responses and covariates for the joint model is done in the following manner.
First we create the joint fixed effects fixed.covariate by padding the covariates with NA's or 0's, for factors or numeric covariates, respectively. The covariates for the joint random effects random.covariate are constructed as a list of the padded random covariates. Here all covariates are padded with NA. Since we use model (6), random.covariate contains the patient identifiers patient for the random intercept and slope effects.
Finally, we construct a list of the longitudinal responses y.long and the survival responses y.surv, which forms our joint response object, Yjoint. Then, the new dataset jointdataCD4 is constructed by column binding fixed.covariate and random.covariate, and defining the response Y to be the list Yjoint we created.
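To make the stacking concrete, a minimal sketch of this construction follows, using the aids and aids.id data from JMbayes (one row per patient in aids.id); the variable names are ours and the covariate set is abbreviated:

R> data("aids.id", package = "JMbayes")
R> nl <- nrow(aids); ns <- nrow(aids.id)
R> # Responses: longitudinal padded below with NA's, survival padded above
R> y.long <- c(aids$CD4, rep(NA, ns))
R> y.surv <- inla.surv(time = c(rep(NA, nl), aids.id$Time),
+    event = c(rep(NA, nl), aids.id$death))
R> Yjoint <- list(y.long, y.surv)
R> # Fixed covariates, padded per submodel (abbreviated to the drug covariate)
R> fixed.covariate <- data.frame(
+    l.drug = c(as.character(aids$drug), rep(NA, ns)),
+    s.drug = c(rep(NA, nl), as.character(aids.id$drug)))
R> # Random-effect covariates: patient identifiers, padded with NA's
R> random.covariate <- list(
+    U11 = c(as.numeric(aids$patient), rep(NA, ns)),   # random intercept index
+    U21 = c(as.numeric(aids$patient), rep(NA, ns)))   # random slope index
R> jointdataCD4 <- c(as.list(fixed.covariate), random.covariate)
R> jointdataCD4$Y <- Yjoint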

Patient-specific predictions
To use the model for patient-specific predictions we extract the necessary components from the latent field of the longitudinal and survival submodels. We use the data in dataH to calculate the survival functions (corresponding to indices (nl+1):(nl+ns)) and dataL1 to illustrate the observed and estimated longitudinal trajectories (corresponding to indices 1:nl). The fitted values are extracted using JointmodelCD4$summary.fitted.values$mean.
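For instance, assuming the stacking order above (longitudinal rows first, then survival rows), the fitted values can be split as follows:

R> fitted.all <- JointmodelCD4$summary.fitted.values$mean
R> fitted.long <- fitted.all[1:nl]                # longitudinal trajectories
R> fitted.surv <- fitted.all[(nl + 1):(nl + ns)]  # survival components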

Example 2: PSA levels and informative dropout
Here we use the prostate and dropout datasets from the JointModel package (Kim 2016) for N_S = 100 patients with N_L = 697 longitudinal observations. The longitudinal response is the logarithm of the PSA levels after radiation therapy, logPSA.postRT, at time VisitTime for the patient identified by ID. The covariate is the entry PSA level logPSA.base. In the dropout dataset, the time of dropout is given by DropTime for patient ID2, with informative dropout indicator Status = 1, and also the entry PSA level logPSA.base2 as a covariate.
We follow Hu and Sale (2003) and Kim, Zeng, and Taylor (2017) to estimate the longitudinal trajectory while correcting for the bias introduced by the informative dropout. Since the main objective of the analysis is to estimate the non-linear longitudinal trajectories, we use the entire longitudinal predictor as the shared random effect, i.e., g(η_l^L(s)) = ν η_l^L(s). To model the non-linear trajectory we use a random walk of order two over time, α(t). The model is thus

$$\eta_l^L(t) = \beta_0 + \beta_1\,\text{logPSA.base}_l + \alpha(t),$$
$$\eta_i^S(s) = \gamma_1\,\text{logPSA.base2}_i + \nu\,\eta_i^L(s),$$

where we assume a Weibull model for the dropout process. Again, we preprocess the original data from the JointModel package. The resulting data set is available in the INLA package as exampledata/psa/jointdataPSA.rds.
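Assuming the file ships at the stated location within the installed package, the preprocessed data can be loaded as:

R> jointdataPSA <- readRDS(system.file("exampledata", "psa",
+    "jointdataPSA.rds", package = "INLA"))
R> str(jointdataPSA, max.level = 1)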

Inference for the joint model
We extract the results as follows:

R> summary(JointmodelPSA)

Both the intercepts and the logPSA.base covariate are significant. From the hyper parameters, the spline is significant in its departure from linearity, as deduced from the precision of the spline, Precision for inla.group(V1, n = 50) = 5.643. The shape parameter for the Weibull model is estimated as alpha parameter for weibullsurv[3] = 0.814, which implies a decreasing hazard of dropout over time.
The association parameter ν = Beta for b.eta = 1.167 is significant and it is thus clear that the joint model approach is necessary for this data set, to account for the informative dropout.
For illustration, we can extract the spline component with JointmodelPSA$summary.random$`inla.group(V1, n = 50)` and the model-based PSA levels with JointmodelPSA$summary.fitted.values, as in Example 1. The estimated non-linear longitudinal average trajectory is illustrated in Figure 3 by the red line, the model-based PSA levels are given by the blue circles, and the grey circles represent the observed PSA levels.

Non-separable space-time models
The INLA package has been very successful in space and space-time modeling by representing spatial models with sparse matrices using the stochastic partial differential equations (SPDE) approach (Lindgren, Rue, and Lindström 2011;Bakka et al. 2018;Krainski et al. 2019). Space-time models are usually constructed as Kronecker products, resulting in separable models, where the space-time covariance function is a product of a spatial and a temporal covariance function. In INLA this is coded using the group and control.group arguments of the function f.
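As a reminder of the separable idiom, a formula of the following shape defines a Kronecker space-time model with an AR(1) structure in time; this is a sketch, where spde, the index names and the data stack are assumed to be set up as in Krainski et al. (2019):

R> formula <- y ~ -1 + intercept +
+    f(spatial.field, model = spde,
+      group = spatial.field.group,
+      control.group = list(model = "ar1"))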
Instead of constructing a space-time model as an interaction between a spatial and a temporal model, Bakka, Krainski, Bolin, Rue, and Lindgren (2020) are developing a class of space-time models directly from the principles of diffusion processes in space-time. The basic building block is a Matérn model in space, which is smoothed by a space-time diffusion process. The spatial Matérn model is a natural starting point due to its wide use in spatial modeling in general, and in INLA in particular. Define the spatial differential operator

$$L_s = \gamma_s^2 - \Delta,$$

where γ_s^2 is a constant and Δ = ∂²/∂x² + ∂²/∂y² is the Laplacian. The space-time diffusion process is governed by the differential operator

$$\gamma_t \frac{\partial}{\partial t} + L_s,$$

known as a reaction-diffusion operator in physics, and used in many physical models. This operator is used in systems where mass (which can represent mass, energy, individuals, disease counts, or other characteristics) changes in time due to diffusion and replication.
In this section we discuss an implementation of the new models using the computational methods based on Gaussian Markov random fields (GMRFs), in INLA, by writing R code for the inference explicitly instead of using the inla function call. The main purpose of this section is to show clearly how the GMRF framework can be used to code inference in the context of complex spatio-temporal random effects. Further, the reader can study the sparsity structure of the matrices we present to see how well this approach fits with the research on parallel computations presented in Section 4. The code herein is meant for understanding the software implementation and is not suitable for applications; for those we refer to the online code examples in the Supplementary Material of Bakka et al. (2020).

Define precision matrices
We use simple temporal and spatial meshes as follows. The spatial mesh can be plotted by plot(mesh) and is described in Krainski et al. (2019).
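One possible construction of such meshes is sketched below; the domain and resolution here are our assumptions, not those of the original study:

R> library(INLA)
R> set.seed(1)
R> mesh.time <- inla.mesh.1d(loc = 1:6)        # six yearly time points
R> loc <- matrix(runif(40, 0, 10), ncol = 2)   # arbitrary points in a 10 x 10 square
R> mesh <- inla.mesh.2d(loc = loc, max.edge = c(1.5, 3), offset = c(1, 3))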
Before we can define the separable and the non-separable models, we need to decide the hyper parameters for our two space-time models. We choose the following hyper parameters (γ's) to give reasonable random fields, and to make the separable and non-separable models as similar as possible when it comes to scale parameters, see Bakka et al. (2020).
We select the following hyper parameters of the random effects for the separable model

R> range.time <- 20
R> range.space <- 6
R> sigma.u <- 1

and for the non-separable model

R> gt <- 2.23
R> gs2 <- 0.2222
R> ge2 <- 0.0805

We use the finite element method (FEM) from Lindgren et al. (2011), adapted by Bakka et al. (2020) to the following M-notation. We note that the temporal model is first order Markov, and that a higher order Markov structure would be used for models with a higher smoothness in time (Bakka et al. 2020).
We first preprocess the spatial and temporal FEM matrices.
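A sketch of this preprocessing, and of the separable (Kronecker product) precision matrix built from it, follows. The mapping from range.time to the AR(1) correlation rho is a placeholder assumption, and the scaling by sigma.u is schematic; the non-separable precision is assembled from the same FEM blocks following the M-notation of Bakka et al. (2020) and is omitted here.

R> library(Matrix)
R> fem.space <- inla.mesh.fem(mesh, order = 2)  # c0 (mass), g1, g2 (stiffness)
R> fem.time <- inla.mesh.fem(mesh.time, order = 2)
R> # Spatial Matern precision (alpha = 2) from the FEM matrices
R> kappa.s <- sqrt(8) / range.space
R> Q.space <- kappa.s^4 * fem.space$c0 + 2 * kappa.s^2 * fem.space$g1 +
+    fem.space$g2
R> # Temporal AR(1) precision; rho from range.time is a placeholder mapping
R> rho <- exp(-2 / range.time)
R> Q.ar1 <- function(n, rho) {
+    bandSparse(n, k = c(-1, 0, 1),
+      diagonals = list(rep(-rho, n - 1),
+        c(1, rep(1 + rho^2, n - 2), 1),
+        rep(-rho, n - 1))) / (1 - rho^2)
+  }
R> Q.time <- Q.ar1(mesh.time$n, rho)
R> Q.sep <- kronecker(Q.time, Q.space) / sigma.u^2  # separable space-time precision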

Prior simulation
We simulate a Matérn field for t = 1. This can be done through the separable or the non-separable model, since they are both Matérn marginally for t = 1. We use the seed and num.threads = 1 arguments to get reproducible simulations. Further, we add small noise to the observations to give a more realistic inference problem. The dataframe df is represented for all of space and time, but we replace the observations by NA after year 1.
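A sketch of this simulation step, assuming the separable precision Q.sep from above and a data frame df with one row per space-time node and a year column (the column names are ours, except df, which the text references):

R> u <- inla.qsample(n = 1, Q = Q.sep, seed = 1, num.threads = 1)[, 1]
R> df$y <- u + rnorm(length(u), sd = 0.01)  # small observation noise
R> df$y[df$year > 1] <- NA                  # keep observations only for year 1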

Inference in R
We follow Rue and Held (2005) and compute posterior precision matrices and means by conditioning on df$y. The following code chunk replaces the call to inla in this example, showing how the sparsity of the precision matrices impacts the main computational part of the INLA inference procedure.
First the precision matrix for observation noise Qeps, the projection matrix for the observed latent field A.observe and the posterior/conditional precision matrix post.Q are specified.
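Following Rue and Held (2005), this conditioning step can be sketched as follows, where y.obs and the prior precision Q.prior stand for the observed data and the model's prior precision (these two names are ours; Qeps, A.observe and post.Q are referenced in the text):

R> tau.eps <- 1 / 0.01^2                       # observation noise precision
R> Qeps <- Diagonal(nrow(A.observe), x = tau.eps)
R> post.Q <- Q.prior + t(A.observe) %*% Qeps %*% A.observe
R> post.mu <- as.vector(solve(post.Q, t(A.observe) %*% Qeps %*% y.obs))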
We plot the point predictions (posterior mean) in space-time in Figure 4. Note that in year 1 the field is conditioned on data at nearby locations; hence the separable and the non-separable models give very similar results. Years 2 and 3, however, represent forecasts based on the data observed in year 1. The plots shown here are for the first three years, but the for loop can be extended to show all six. In the figure we see a clear difference between the separable and the non-separable models. The separable model forecasts a simple decay to the mean of the current observations, while the non-separable model produces smoother forecasts. Due to the very long temporal range, the decay to the mean is hard to spot visually. We argue that the non-separable forecast is more appropriate in most applied situations. When forecasting, e.g., the temperature at a location in the future, the model should use not just the temperature at the same location today, but also the temperature at nearby locations, resulting in a smoother forecast. A classical example of this is hot water poured into cold water; we expect the two temperatures to regress to the mean by mixing and smoothing out differences.

Posterior simulations
Finally, we show how to simulate from the posterior, in Figure 5. As before, the first year is very similar because we conditioned on data there, while years 2 and 3 show different simulations into the future. This code is a replacement for inla.posterior.sample, showing the similarity between posterior simulations for the separable and non-separable models. In this code we used the option reordering = "identity" in the inla.qsample function. The purpose of this is to use the same random noise, and the same reordering, to get a close comparison between the simulations. In general, we recommend using inla.qsample with a seed to get deterministic and reproducible behavior, but using the default reordering scheme to speed up computations.
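A sketch of this posterior sampling step, with post.Q and post.mu as above:

R> sims <- inla.qsample(n = 2, Q = post.Q, mu = post.mu,
+    seed = 1, reordering = "identity", num.threads = 1)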

High performance and parallel computing with INLA
The widespread acceptance of the INLA approach and the R-INLA software, manifested as the INLA package, was not foreseen when the INLA methodology was originally developed. Hence, the INLA package has continuously evolved from research code started more than 15 years ago, with designs made for single-core execution in mind. Today, there is a growing demand for analyzing much larger models: typically, either a large number of observations and/or a large number of latent variables (read space-time models, for simplicity). We have already started to provide better support for the increasingly large statistical models of today running on the computational platforms of tomorrow (typically multicore or manycore, and possibly hardware-accelerated).
At the core of the INLA algorithm is numerical linear algebra for large sparse matrices. The tasks required are, for a symmetric positive definite matrix Q of dimension n, the ability to repeatedly
• compute the Cholesky factorization Q = LL^⊤, where L is a lower triangular matrix,
• solve linear systems like Lx = b, L^⊤x = b and LL^⊤x = b, and
• compute selected elements of the inverse of Q, (Q^{-1})_{ij}, for all ij where Q_{ij} is non-zero.
Additionally, we also need log|Q|, but since the Cholesky factor is available, it is simply log|Q| = 2 Σ_i log L_{ii}. During the entire INLA algorithm, the non-zero pattern of Q is the same, which simplifies some of the initial procedures, like finding a good reordering scheme.
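These operations can be illustrated on a toy precision matrix using the Matrix package and INLA's inla.qinv; this is a small stand-in, not the PARDISO code path itself:

R> library(Matrix); library(INLA)
R> # Toy 3 x 3 sparse SPD precision matrix (lower triangle specified)
R> Q <- sparseMatrix(i = c(1, 2, 3, 2, 3), j = c(1, 2, 3, 1, 2),
+    x = c(4, 4, 4, -1, -1), symmetric = TRUE)
R> L <- chol(Q)                      # sparse Cholesky factor
R> x <- solve(Q, c(1, 0, 0))         # linear solve via the factorization
R> logdetQ <- 2 * sum(log(diag(L)))  # log|Q| = 2 * sum_i log L_ii
R> Qinv <- inla.qinv(Q)              # selected inverse on the non-zero pattern of Q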
For smaller n, like n ∼ 10^4 to 10^5 for a spatial model, the serial algorithms for these tasks run fine, as we have parallelized (using OpenMP) at a higher level, for example by factorizing several matrices at once. For larger n, like n ∼ 10^5 to 10^6, this approach is no longer practical. The type of model considered also plays a role here: space-time models are O(√n) more costly, and require more memory, than spatial ones, hence the dimension at which the serial sparse matrix algorithms become impractical is lower.
The need for parallel numerical methods for large sparse matrices on shared-memory and distributed-memory multiprocessors has been evident for quite some time. While there is a vast literature on the development of efficient algorithms for the direct solution of sparse linear systems of equations, only a few software packages are available, such as MUMPS (Amestoy, Duff, Koster, and L'Excellent 2001; Amestoy, Guermouche, L'Excellent, and Pralet 2006), WSMP (Gupta 2002), SuperLU (Li 2005) and CHOLMOD (Davis 2006). None of these libraries provides parallel algorithms for all the required matrix operations listed above, as they do not have a parallel implementation of the algorithm to compute selected elements of the inverse. (CHOLMOD supports a serial version of this algorithm.) How to efficiently compute selected elements of the inverse of a sparse matrix has been known for quite some time (Takahashi, Fagan, and Chen 1973; Erisman and Tinney 1975), but a parallel version of this algorithm was not available in a major sparse matrix library before the work of Verbosio, Coninck, Kourounis, and Schenk (2017) was made available in the PARDISO package (Schenk and Gärtner 2004; Kuzmin, Luisier, and Schenk 2013; Petra, Schenk, Lubin, and Gärtner 2014). According to Gould, Scott, and Hu (2007), PARDISO is one of the best performing parallel libraries for numerical computations with large sparse matrices.
A collaboration between the PARDISO and INLA projects was initiated in early 2018, resulting in a special version of the PARDISO package for INLA, which was integrated into INLA and released in May 2018. With this new tool, we are now able to successfully run statistical models with n in the millions on KAUST computational servers. The parallelization strategy currently supported, using the argument control.compute = list(openmp.strategy = "pardiso.parallel"), is to process one matrix at a time using a parallel algorithm to factorize, solve and compute selected entries of the inverse. Future plans for this collaboration include improving the integration with the INLA algorithm, including nested parallelism, and extending the PARDISO interface so that we can exploit the parallel computing support in PARDISO for distributed and accelerated execution of the main numerical tasks required in the INLA algorithm.
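In practice, enabling this code path could look as follows; the license-file path is a placeholder, and inla.pardiso.check reports whether the PARDISO library is usable:

R> inla.setOption(pardiso.license = "~/pardiso.lic")  # placeholder path
R> inla.pardiso.check()
R> fit <- inla(formula, data = df,
+    control.compute = list(openmp.strategy = "pardiso.parallel"))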
To illustrate the ability of the PARDISO package to work with huge matrices, we ran a series of tests on our computational server, with 512 Gb of RAM, 2 sockets with 16 cores per socket, and Intel Xeon Gold 6130 CPUs @2.10GHz. The test matrix is constructed to be very challenging, mimicking a large space-time model with the same non-zero structure as the 3-dimensional Laplace equation on an n × n × n cube (which is the worst configuration). Additionally, we added 25 dense rows/columns to mimic the presence of fixed effects in the model. The resulting (n³ + 25) × (n³ + 25) sparse matrix has about 56 neighbors for each node.
The storage required for its non-zero elements is about 0.22 Gb for n = 100 and 1.72 Gb for n = 200; additionally, we need to store their (relative) locations within the matrix. Figure 6 shows the results for n = 100, 120, 140, 160, 180 and 200, using nc = 1, 2, 4, 8, 12, 16 and 32 cores, for the Cholesky factorization (left) and the partial inverse (right). The results demonstrate consistent behaviour of the running time with varying n and nc. The computational cost reduces nicely from nc = 1 to 2 and to 4, but then the speedup tails off; we do not gain much going beyond 16 cores for this example, and the partial inverse is somewhat more expensive to compute than the Cholesky factorization. The results are very encouraging, as they show that PARDISO can handle sparse matrices of this size and structure without problems. The integration of INLA and PARDISO will be further improved, and we are currently working on this issue.

Discussion
Bayesian modeling is ever present and still increasing in popularity in applied fields of science. Initially, the inference was performed using sampling-based methods like Gibbs sampling. These methods, however, are often time-consuming and computationally inefficient. From this impediment, approximate Bayesian inference approaches sprouted. One of the most popular non-sampling-based Bayesian inference approaches is the INLA methodology, facilitated through the INLA package. INLA is developed for the class of LGMs, which contains most well-known statistical models. Since the inception of INLA in 2009 through the seminal paper of Rue et al. (2009), the INLA methodology has been cited more than 3000 times.
The success of INLA as a computational inferential framework for Bayesian modeling is partly attributed to the continual development and expansion of the INLA package. As is evident in this paper, relevant statistical methodology is continually developed and implemented in INLA so as to provide scientists with a computational platform for state-of-the-art Bayesian models.
The specific developments presented herein address some current Bayesian modeling demands.
In biomedical applications, the use of joint models for survival and longitudinal data is imperative. The efficacy of treatments as measured on multiple endpoints is a crucial step in drug design and necessitates the joint modeling of the endpoints. In this paper, we presented the implementation of joint models with one survival and one longitudinal endpoint. Future developments in this field are under way, and the need for a unified interface for these joint models, based on the INLA architecture, is clear. The potential for further developments in this realm based on INLA is encouraging. In the same vein, the extension to spatial joint models, joint models with competing risks or recurrent events, and generalized multiple-endpoint modeling are some examples of models that could be implemented in INLA based on the approach presented herein. Multi-state models and competing risks models are also of major interest in the biomedical field, and with their implementation in INLA, the extensions to spatial or smoothing-spline random effects would be trivial.
The innovative SPDE approach for space and space-time models as used in INLA serves as a gateway for extensions in the field of space-time modeling. The development of a class of non-separable space-time models is motivated by current needs in the analysis of complex real space-time data, and is based on physical diffusion processes. This extension is based on the definition of a particular SPDE which is then solved using finite element methods, and contrasts with more common attempts at generalizing the covariance matrix or the spectrum. This approach is unique to INLA (within software for Bayesian modeling, as far as we know) and ensures unequivocal computational efficiency, without additional approximations, compared to other methods in the literature.
Given the generalization to non-separable space-time models and the increasing computational demands of the big data era, INLA must perform well in high performance computing environments; this necessitates the development of tools within INLA that can optimally handle the computational burden on high performance computing architectures.
To this end, we present the current and future collaborative work on this front using the PARDISO package in conjunction with INLA. This project promotes the use of INLA to an even wider audience and ensures the applicability of INLA for Bayesian inference in the future.
INLA equips the user with powerful Bayesian modeling tools that are computationally efficient and relevant. The ongoing research and development of INLA ensures congruence with state-of-the-art statistical methodology and places the user at the vanguard of their field.