D-STEM v2: A Software for Modelling Functional Spatio-Temporal Data

Functional spatio-temporal data naturally arise in many environmental and climate applications where data are collected in a three-dimensional space over time. The MATLAB D-STEM v1 software package was first introduced for modelling multivariate space-time data and has been recently extended to D-STEM v2 to handle functional data indexed across space and over time. This paper introduces the new modelling capabilities of D-STEM v2 as well as the complexity reduction techniques required when dealing with large data sets. Model estimation, validation and dynamic kriging are demonstrated in two case studies, one related to ground-level air quality data in Beijing, China, and the other one related to atmospheric profile data collected globally through radio sounding.


Introduction
With the increase of multidimensional data availability and modern computing power, statistical models for spatial and spatio-temporal data are developing at a rapid pace. Hence, there is a need for stable and reliable, yet updated and efficient, software packages. In this section, we briefly discuss multidimensional data in climate and environmental studies as well as statistical software for space-time data.

Multidimensional data
Large multidimensional data sets often arise when climate and environmental phenomena are observed at the global scale over extended periods. In climate studies, relevant physical variables are observed on a three-dimensional (3D) spherical shell (the atmosphere) while time is the fourth dimension. For instance, measurements are obtained by radiosondes flying from ground level up to the stratosphere, by interferometric sensors aboard satellites (Finazzi et al. 2018) or by laser-based methods, such as Light Detection and Ranging (LIDAR). In this context, statistical modelling of multidimensional data requires describing and exploiting the spatio-temporal correlation of the underlying phenomenon or data-generating process. This is done using explanatory variables and multidimensional latent variables with covariance functions defined over a convenient spatio-temporal support. When considering 3D×T data (4D for brevity), covariance functions defined over the 4D support may be adopted. However, these covariance functions often have a complex form (Porcu et al. 2018). Moreover, when estimating the model parameters or making inferences, very large covariance matrices (though they may be sparse) are implied. In large climate and environmental applications, 4D data are rarely collected at high frequency in all spatial and temporal dimensions. Often, only one dimension is sampled at high frequency while the remaining dimensions are sampled sparsely. Radiosonde data, for instance, are sparse over the Earth's sphere, but they are dense along the vertical dimension, providing atmospheric profiles. This suggests that handling all spatial dimensions equally (e.g. using a 3D covariance function) may not be the best option from a modelling or computational perspective, and a data reduction technique may be useful instead. In this paper, the functional data analysis (FDA) approach (Ramsay and Silverman 2007) is adopted to model the relationship between measurements along the profile, while the remaining dimensions are handled following the classic spatio-temporal data modelling approach using only 2D spatial covariance functions.

Statistical software
Various software programmes are available for handling data on a plane or in a two-dimensional (2D) Euclidean space. The choice is more restricted when considering multidimensional or non-Euclidean spaces arising from atmospheric or remote sensing spatio-temporal data observed on the surface of a sphere and over time.
For example, Figure 1 depicts the spatial locations of measurements collected globally in a single day through radio sounding, as discussed in Section 5. Space is three-dimensional, and measurements are repeated over time at the same spatial locations over the Earth's surface but at different pressure values.
The spBayes package (Finley et al. 2015) handles large spatio-temporal data sets, but space is only 2D. The documentation of the spacetime (Pebesma 2012) and gstat (Pebesma and Heuvelink 2016) packages does not explicitly address the multidimensional case, but, according to Gasch et al. (2015), both packages have some capabilities to handle the 3D×T case. However, we want to avoid working with 3D spatial covariance functions or sample spatio-temporal variograms. Fixed rank kriging (Cressie and Johannesson 2008), implemented in the R package FRK (Zammit-Mangion 2018), handles spatial and spatio-temporal data both on the Euclidean plane and on the surface of the sphere. FRK implements a set of tools for data gridding and basis function computation, resulting in efficient dimension reduction, allowing it to handle large satellite data sets (Cressie 2018). It is based on a spatio-temporal random effects (SRE) model estimated by the expectation-maximisation (EM) algorithm. Recent extensions to FRK include the use of multi-resolution basis functions (Tzeng and Huang 2018).
A second package based on SRE and the EM algorithm is D-STEM v1 (Finazzi and Fassò 2014). This package implements an efficient state-space approach for handling the temporal dimension and a heterotopic multivariate response approach that is useful when correlating heterogeneous networks (Fassò and Finazzi 2011; Calculli et al. 2015).
D-STEM v1 has been successfully used in various medium-to-large applications, proving that the EM algorithm implementation, being mainly based on closed-form iterations, is quite stable. These applications include air quality assessment in the metropolitan areas of Milan, Teheran and Beijing (Fassò 2013; Taghavi-Shahri et al. 2019; Wan et al. 2020); multivariate spatio-temporal modelling at the country and continental levels in Europe (Finazzi et al. 2013; Fassò et al. 2016); time series clustering; emulators of atmospheric dispersion modelling systems (Finazzi et al. 2019); and near real-time prediction of earthquake parameters (Finazzi 2020).
A brief, non-exhaustive list of other models and/or software packages for advanced spatial data modelling is presented below, organised according to the principal technique allowing the handling of large data sets. In general, these techniques aim at avoiding the Cholesky decomposition of large and dense covariance matrices. Some approaches, including FRK and D-STEM v1, leverage sparse variance-covariance matrices. Others exploit the sparsity of the precision matrix, thanks to a spatial Markovian assumption. This class includes the R packages LatticeKrig (Nychka et al. 2015, 2016) and INLA (Blangiardo et al. 2013; Lindgren and Rue 2015; Bivand et al. 2015; Rue et al. 2014), as well as the multi-resolution approximation approach of Katzfuss (2017), which uses the predictive process and the state-space representation (Jurek and Katzfuss 2018) to model spatio-temporal data. Low-rank models are another popular approach, used by spBayes. Finally, the R package laGP (Gramacy 2016), based on a machine learning approach, implements an efficient nearest-neighbour prediction-oriented method. Heaton et al. (2018) present an interesting spatial prediction competition considering a large data set and involving the above-mentioned approaches.
We observe that, although some of the software packages mentioned above consider both space and time, to the best of our knowledge, none of them handles a spatio-temporal FDA approach for data sets of the kind discussed in Section 1.1.
In this paper, we present D-STEM v2, a MATLAB package extending D-STEM v1. The new version introduces the modelling of functional data indexed over space and time. Moreover, new complexity-reduction techniques have been added for both model estimation and dynamic mapping, which are especially useful for large data sets. The rest of the paper is organised as follows. Section 2 introduces the methodology adopted in this paper and, in particular, the data modelling approach and the complexity-reduction techniques. Section 3 describes the D-STEM v2 software in terms of the MATLAB classes used for data structure definition, model fitting, diagnostics and kriging. This is followed by an illustration of the software through two case studies. The first one, discussed in Section 4, considers high-frequency spatio-temporal ozone data in Beijing. The second one, in Section 5, considers the modelling of global atmospheric temperature profiles and exploits the complexity-reduction capabilities of the new package. Finally, concluding remarks are provided in Section 6.

Methodology
This section discusses the methodology behind the modelling and the complexity-reduction techniques implemented in D-STEM v2 when dealing with functional space-time data sets. Moreover, model estimation, validation and dynamic kriging are briefly discussed.

Model equations
Let s = (s_lat, s_lon) be a generic spatial location on the Earth's sphere S², and t ∈ N a discrete time index. It is assumed that the function of interest, f(s, h, t), with domain H = [h₁, h₂] ⊂ R, can be observed at any (s, t) and h ∈ H through noisy measurements y(s, h, t) according to the following model:

  y(s, h, t) = μ(s, h, t) + ε(s, h, t),   (1)
  μ(s, h, t) = x(s, h, t)' β(h) + φ(h)' z(s, t),   (2)
  z(s, t) = G z(s, t − 1) + η(s, t).   (3)

This model is referred to as the functional hidden dynamic geostatistical model (f-HDGM).
In Equation (1), ε is a zero-mean Gaussian measurement error, independent in space and time, with functional variance σ²_ε(h), implying that ε is heteroskedastic across the domain H. The variance is modelled as log(σ²_ε(h)) = φ(h)' c_ε, where φ(h) is a p × 1 vector of basis functions evaluated at h, while c_ε is a vector of coefficients to be estimated. In Equation (2), x(s, h, t) is a b × 1 vector of covariates, while β(h) = (β₁(h), ..., β_b(h))' is the vector of functional parameters, each modelled as β_j(h) = φ(h)' c_{β_j}, so that c_β = (c'_{β_1}, ..., c'_{β_b})' is a pb × 1 vector of coefficients that needs to be estimated. Additionally, z(s, t) is a p × 1 latent space-time variable with Markovian dynamics given in Equation (3). The matrix G is a diagonal transition matrix with diagonal elements in the p × 1 vector g. The innovation vector η is obtained from a multivariate Gaussian process that is independent in time but correlated across space, with matrix spatial covariance function given by

  Γ(s, s') = diag(v₁ ρ(s, s'; θ₁), ..., v_p ρ(s, s'; θ_p)),

where v = (v₁, ..., v_p)' is a vector of variances and ρ(s, s'; θ_j) is a valid spatial correlation function for locations s, s' ∈ S², parametrised by θ_j, with θ = (θ₁, ..., θ_p)'. The unknown model parameter vector is given by ψ = (c'_ε, c'_β, g', v', θ')'.
Note that, in order to ease the notation, the same p-dimensional vector of basis functions φ(h) is used to model σ²_ε(h), β_j(h) and φ(h)' z(s, t) in Equations (1)-(3). In practice, D-STEM v2 allows one to specify a different number of basis functions for each model component. Also note that ε is not a pure measurement error, since it also accounts for model misspecification. Finally, the covariates x(s, h, t) are assumed to be known without error for any s, h and t, and thus they do not need a basis function representation.

Basis function choice
Choosing basis functions essentially means choosing the basis type and the number of basis functions. D-STEM v2 currently supports Fourier bases and B-spline bases. The former guarantee that the function is periodic in the domain H, while the latter are not (in general) periodic but are more flexible in describing functions with a complex shape. Whichever basis type is adopted, the number p of basis functions must be fixed before model estimation. Usually, a high p implies a better model R², but over-fitting may be an issue. Moreover, special care must be taken when choosing the number of basis functions for φ(h)' z(s, t). The classic FDA approach suggests fixing a high number of basis functions and adopting penalisation to avoid over-fitting. In our context, this is not viable since the covariance matrices involved in model estimation have dimension np × np, with a manipulation cost of order n³p³. Since n is usually large, a large p would make model estimation unfeasible, especially if the number of time points T is also high. When using B-spline bases, a small p implies that the location of the knots along the domain H also matters and may affect the model fitting performance. Ideally, p and the knot locations are chosen using a model validation technique (see Section 2.7) by trying different combinations. If, due to time constraints, this is not possible, equally spaced knots are a convenient option.
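As an illustration of these choices, the following minimal MATLAB sketch creates the two basis types with the fda package (Ramsay et al. 2018) on which D-STEM v2 relies; the domain, the value of p, the spline order and the knots are purely illustrative.

  H = [0 24];                                    % functional domain H = [h1, h2]
  fourier_basis = create_fourier_basis(H, 5);    % p = 5 Fourier basis functions, periodic in H
  knots = linspace(H(1), H(2), 5);               % equally spaced knots
  order = 4;                                     % order-4 (cubic) B-splines
  p = numel(knots) + order - 2;                  % resulting number of B-spline basis functions
  bspline_basis = create_bspline_basis(H, p, order, knots);
  Phi = eval_basis((0:0.5:24)', bspline_basis);  % evaluates the p basis functions phi(h) at points h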

Model estimation
The estimation of ψ and the latent space-time variable z(s, t) is based on the maximum likelihood approach considering profile data observed at spatial locations S = {s_i, i = 1, ..., n} and time points t = 1, ..., T.
At a specific location s_i and time t, q_{i,t} measurements are taken at the points h_{s_i,t} = (h_{i,1,t}, ..., h_{i,q_{i,t},t}) and collected in the vector y_{s_i,t} = (y(s_i, h_{i,1,t}, t), ..., y(s_i, h_{i,q_{i,t},t}, t))', here called the observed profile.
Although D-STEM v2 allows for varying q_{i,t}, for ease of notation it is assumed here that all profiles include exactly q measurements, although h_{s_i,t} may differ across profiles. Profiles observed at time t across the spatial locations S are then stored in the nq × 1 vector y_t = (y'_{s_1,t}, ..., y'_{s_n,t})'. Applying model (1)-(3) to the data defined above, we have the following matrix representation:

  y_t = X̃_t c_β + Φ_{z,t} z_t + ε_t,
  z_t = G̃ z_{t−1} + η_t,

where X̃_t = X_t Φ_{β,t} is an nq × bp matrix, with X_t the matrix of covariates and Φ_{β,t} the basis matrix for β. Φ_{z,t} is the nq × np basis matrix for the latent np × 1 vector z_t = (z(s_1, t)', ..., z(s_n, t)')'. η_t = (η(s_1, t)', ..., η(s_n, t)')' is the np × 1 innovation vector, while ε_t is the nq × 1 vector of measurement errors. Additionally, G̃ = I_n ⊗ G is the np × np diagonal transition matrix.
The complete-data likelihood function L(ψ; Y, Z) can be written as

  L(ψ; Y, Z) = p(z_0; ψ_{z_0}) ∏_{t=1}^{T} p(y_t | z_t; ψ_y) p(z_t | z_{t−1}; ψ_z),

where ψ_y = (c'_ε, c'_β)', ψ_z collects the remaining parameters (g', v', θ')' and z_0 is the Gaussian initial vector with parameter ψ_{z_0}. Maximum likelihood estimation is based on an extension of the EM algorithm detailed in Calculli et al. (2015). The model parameter set ψ is initialised with starting values ψ_0 and then updated at each iteration ι of the EM algorithm.
The algorithm terminates if any of the following conditions is satisfied:

  max_l |ψ_l^ι − ψ_l^{ι−1}| / |ψ_l^{ι−1}| < ε₁,
  L(ψ^ι; Y) − L(ψ^{ι−1}; Y) < ε₂,
  ι = ι*,

where ψ_l^ι is the generic element of ψ^ι at the ι-th iteration, L(ψ^ι; Y) is the observed-data likelihood function evaluated at ψ^ι, 0 < ε₁ ≪ 1 and 0 < ε₂ ≪ 1 are small positive numbers (e.g. 10⁻⁴), while ι* is a user-defined positive integer (e.g. 100) that limits the number of iterations in the case of convergence failure of the EM algorithm.
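Schematically, the stopping rule can be sketched in MATLAB as follows, where e_step and m_step are hypothetical placeholders for the EM updates (not part of the D-STEM API) and psi0 collects the starting values ψ_0:

  epsilon1 = 1e-4; epsilon2 = 1e-4; iter_max = 100;
  psi = psi0; logL_old = -Inf;
  for iter = 1:iter_max
      [E_z, logL] = e_step(psi, Y);            % E-step: smoothed latent states and log-likelihood
      psi_new = m_step(E_z, Y);                % M-step: closed-form parameter updates
      rel_change = max(abs(psi_new - psi) ./ abs(psi));
      if rel_change < epsilon1 || (logL - logL_old) < epsilon2
          psi = psi_new; break;                % one of the convergence criteria is met
      end
      psi = psi_new; logL_old = logL;
  end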
Note that S is not time-varying, which means that the spatial locations are fixed. This could be a limitation in applications where spatial locations change at each t. On the other hand, missing profiles are allowed; that is, y_{s_i,t} may be a vector of q missing values at some t. In the extreme case, a given spatial location s_i has only one profile over the entire period (if all the profiles are missing, the spatial location can be dropped from the data set). Shumway and Stoffer (2017, p. 348) explain how the likelihood function of a state-space model changes in the case of a missing observation vector and how the EM estimation formulas are derived. Missing data handling in D-STEM v2 is based on the same approach.

Partitioning
At each iteration of the EM algorithm, the computational complexity of the E-step is O(Tn³p³), which may be unfeasible if n is large. When necessary, D-STEM v2 allows one to use a partitioning approach (Stein 2013) for model estimation. The spatial locations S are divided into k partitions {S_1, ..., S_k}, and z_t is partitioned conformably, namely z_t = (z'_{1,t}, ..., z'_{k,t})'. From the EM algorithm point of view, this implies that the E-step is independently applied to each partition, possibly in parallel. When all partitions are equal in size, the computational complexity reduces to O(Tkr³p³), with r the partition size.
Geographical partitioning, constructed by aggregating proximal locations, is a natural choice for environmental applications. Given the number of partitions k, the k-means algorithm applied to the spatial coordinates provides a geographical partitioning of S. However, the number of points in each partition is not controlled, and a heterogeneous partitioning may arise. If some subsets are very large and others are small, the reduction in computational complexity given above is far from being achieved. This can easily happen, for example, when S is a global network constrained by continent shapes.
For this reason, D-STEM v2 provides a heuristically modified k-means algorithm that encourages partitions with similar numbers of elements. The algorithm minimises the following objective function:

  ∑_{j=1}^{k} ∑_{s ∈ S_j} d(s, c_j)² + λ ∑_{j=1}^{k} (r_j − n/k)²,   (4)

where λ ≥ 0, S_j ⊂ S is the set of coordinates in the j-th partition, d is the geodesic distance on the sphere S², and c_j and r_j are the centroid and the number of elements of the j-th partition, respectively. The second term in (4) accounts for the variability of the partition sizes and acts as a penalisation for heterogeneous partitionings. Clearly, when λ = 0, the above objective function gives the classic k-means algorithm. For high values of λ, solutions with similarly sized partitions are favoured. Unfortunately, an optimality theory for this algorithm has not yet been developed, and the choice of λ is left to the user. Nonetheless, it may be a useful tool to define a partitioning that is appropriate for the application at hand with regard to computing time and geographical properties.
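To make Equation (4) concrete, the following MATLAB sketch evaluates the objective for a given partitioning. It assumes the size penalty takes the squared-deviation form written above and uses the Mapping Toolbox distance function; it is an illustration under these assumptions, not D-STEM's internal code.

  % coords: n x 2 matrix of (lat, lon) in degrees; labels: n x 1 vector of
  % partition indices in 1..k; lambda: penalisation weight in Equation (4).
  function J = partition_objective(coords, labels, k, lambda)
      n = size(coords, 1);
      J = 0;
      for j = 1:k
          in_j = (labels == j);
          c_j = mean(coords(in_j, :), 1);      % centroid (arithmetic mean, an approximation on the sphere)
          r_j = sum(in_j);                     % partition size
          % great-circle distances (in degrees of arc) from sites to the centroid
          d_j = distance(coords(in_j, 1), coords(in_j, 2), c_j(1), c_j(2));
          J = J + sum(d_j.^2) + lambda * (r_j - n/k)^2;
      end
  end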

Variance-covariance matrix estimation
The EM algorithm provides a point estimate of the parameter vector ψ but no uncertainty information. Building on Shumway and Stoffer (2017, p. 408), D-STEM v2 estimates the variance-covariance matrix Σ_{ψ,T} = V(ψ̂ | Y) by means of the observed Fisher information matrix I_T, namely

  Σ̂_{ψ,T} = I_T(ψ̂)⁻¹.

To understand its computational cost, note that the information matrix given above may be written as the sum I_T = ∑_{t=1}^{T} i_t. For large data sets, each matrix i_t may be expensive to compute, and the total computational cost is linear in T, provided missing data are evenly distributed in time. This results in a time-consuming task, with a computational burden even higher than that of model estimation. For this reason, D-STEM v2 makes it possible to approximate Σ̂_{ψ,T} using a truncated information matrix, namely

  Σ̂_{ψ,t*} = ( (T/t*) ∑_{t=1}^{t*} i_t )⁻¹,   (5)

which reduces the computational burden by a factor of 1 − t*/T. Since Σ̂_{ψ,t*} → Σ̂_{ψ,T} for t* → T, the truncation time t* is chosen to control the approximation error in Σ̂_ψ. In particular, t* is the first integer t such that

  ‖Σ̂_{ψ,t} − Σ̂_{ψ,t−1}‖_F / ‖Σ̂_{ψ,t−1}‖_F < δ,   (6)

where ‖·‖_F is the Frobenius norm and δ may be defined by the user. Generally speaking, the behaviour of Σ̂_{ψ,T} for large T and, hence, the behaviour of Σ̂_{ψ,t} relies on the stationarity and ergodicity of the underlying stochastic process; see, for example, Shumway and Stoffer (2017, Property P6.4) and references therein.
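A schematic MATLAB sketch of the truncation rule follows; info_contribution(t) is a hypothetical placeholder for the (not shown) computation of the observed information term i_t, and n_par is the number of parameters.

  delta = 1e-3;
  I_cum = zeros(n_par, n_par);                 % running sum of the i_t terms
  Sigma_old = [];
  for t_star = 1:T
      I_cum = I_cum + info_contribution(t_star);
      Sigma = inv((T / t_star) * I_cum);       % truncated approximation of Equation (5)
      if ~isempty(Sigma_old)
          rel = norm(Sigma - Sigma_old, 'fro') / norm(Sigma_old, 'fro');
          if rel < delta, break; end           % Frobenius-norm criterion (6) met
      end
      Sigma_old = Sigma;
  end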
To give the user some operative guidance, let us assume first that no missing values are present, the information matrix is well-conditioned and the covariates have no isolated outliers or extreme trends. In this case, away from the borders t ≅ 1 and t ≅ T, the observed conditional information i_t has a relatively smooth stochastic behaviour, and the approximation in (5) is expected to be satisfactory at the level defined by δ. Conversely, if some data are missing at time t, the information i_t is reduced accordingly. If the missing pattern is random over time, this is not an issue. However, in the unfavourable case of a high percentage of missing data concentrated mostly at the end of the time series, t ≅ T, the above approximation may over-estimate the information and under-estimate the variances of the parameter estimates.

Dynamic kriging
In this paper, dynamic kriging refers to evaluating the quantities

  f̂(s, h, t) = E[f(s, h, t) | Y],   (7)
  σ̂²(s, h, t) = V[f(s, h, t) | Y],   (8)

for any s ∈ S², h ∈ H and t = 1, ..., T. A common approach is to map the kriging estimates on a regular pixelation S* = {s*_1, ..., s*_m}. This may be a time-consuming task when m and/or n and/or T are large. To tackle this problem, D-STEM v2 allows one to exploit a nearest-neighbour approach, where the conditioning term in Equations (7) and (8) is not Y, but the data at the spatial locations S_∼j, where S_∼j ⊂ S is the set of the ñ spatial locations nearest to s*_j. The use of the nearest-neighbour approach is justified by the so-called screening effect: even when the spatial correlation function exhibits long-range dependence, y at spatial location s can be assumed to be nearly independent of spatially distant observations when conditioned on nearby observations (see Stein 2002; Furrer et al. 2006, for more details).
For computational efficiency, D-STEM v2 performs kriging for blocks of pixels. To do this, S* is partitioned into u blocks, S* = {S*_1, ..., S*_u}, and kriging is done on each block S*_l, l = 1, ..., u, with u ≪ m controlled by the user. For each target block S*_l, the conditioning term in Equations (7) and (8) is given by the data observed at S̃_l = ∪_{j∈J_l} S_∼j, with J_l = {j : s*_j ∈ S*_l}. Note that, if S*_l is dense and S is sparse (namely n ≪ m), then S̃_l is not much larger than S_∼j, since most of the spatial locations in S*_l tend to have the same neighbours.
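A hedged illustration of how the conditioning set S̃_l for a block of pixels could be built is given below; the variable names are ours, not D-STEM's, and Euclidean distance on the coordinates is used for simplicity (a great-circle distance would be more appropriate on the sphere). It requires the Statistics and Machine Learning Toolbox.

  % S: n x 2 matrix of station coordinates; S_star_l: m_l x 2 matrix of
  % pixel coordinates in block l.
  nn_size = 10;                                % the ñ nearest stations per pixel
  idx = knnsearch(S, S_star_l, 'K', nn_size);  % m_l x nn_size matrix of neighbour indices
  J_l = unique(idx(:));                        % union over the block, giving S̃_l
  % Kriging for all pixels in the block then conditions only on the data
  % observed at the stations S(J_l, :) rather than on the full data set Y.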

Validation
D-STEM v2 allows one to implement an out-of-sample validation by partitioning the original spatial locations S into subsets S_est and S_val. Data at S_est are used for model estimation, while data at S_val are used for validation. Once the model is estimated, the kriging formula in Equation (7) is used to predict at S_val for all times t and heights h. The following validation mean squared errors are then computed:

  MSE(s) = (1/P₁) ∑_t ∑_h (y(s, h, t) − ŷ(s, h, t))², for s ∈ S_val,
  MSE(t) = (1/P₂) ∑_{s∈S_val} ∑_h (y(s, h, t) − ŷ(s, h, t))², for t = 1, ..., T,
  MSE(h) = (1/P₃) ∑_{s∈S_val} ∑_t (y(s, h, t) − ŷ(s, h, t))², for h ∈ H,

where ŷ(s, h, t) is obtained from Equation (7), while P₁, P₂ and P₃ are the numbers of terms in each sum.
When h_{s,t} varies across the profiles, D-STEM v2 provides a binned MSE by splitting the continuous domain H into B equally spaced intervals. Let H*_r be the set of observation points in the r-th interval, let n_r be the corresponding number of observations and let h̄_r = (1/n_r) ∑_{h∈H*_r} h be the mean of the points in the r-th interval. Then, MSE(h̄_r) is computed as

  MSE(h̄_r) = (1/P₄) ∑_{s∈S_val} ∑_t ∑_{h∈H*_r} (y(s, h, t) − ŷ(s, h, t))²,

where P₄ is the total number of observations in the r-th interval.
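The binned MSE can be sketched in a few lines of MATLAB; the column vectors h_obs, y_obs and y_hat (collecting the observation points h, the validation observations and the predictions of Equation (7) over all (s, t)) are our own illustrative names.

  B = 20;                                      % number of bins
  edges = linspace(min(h_obs), max(h_obs), B + 1);
  bin = discretize(h_obs, edges);              % bin index r of each observation point
  h_bar = accumarray(bin, h_obs, [B 1], @mean);            % bin means, the h̄_r
  mse_r = accumarray(bin, (y_obs - y_hat).^2, [B 1], @mean);  % MSE(h̄_r) per bin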
D-STEM v2 also provides the validation R² with respect to time,

  R²(t) = 1 − MSE(t) / Var(y_t),

where Var(y_t) is the sample variance of the validation observations at time t, as well as the analogous validation R² with respect to location s and to h̄_r.

Software
This section starts by briefly describing the modelling capabilities that D-STEM v2 inherits from the previous version for dealing with spatio-temporal data sets. Then, it focuses on the D-STEM v2 classes and methods, which implement estimation, validation and dynamic mapping of the model presented in Section 2. Although some of the classes are already available in D-STEM v1, they are listed here for completeness.

Software description
D-STEM v1 implemented a substantial number of models. The dynamic coregionalisation model (DCM, Finazzi and Fassò 2014) and the hidden dynamic geostatistical model (HDGM, Calculli et al. 2015) are suitable for modelling and mapping multivariate space-time data collected from unbalanced monitoring networks. Model-based clustering (MBC, Finazzi et al. 2015) has been introduced for clustering time series, and it is suitable for large data sets with spatially registered time series. Moreover, the emulator model (Finazzi et al. 2019) is based on a Gaussian emulator, and it is exploited for modelling the multivariate output of a complex physical model.
In addition, D-STEM v2 (available at github.com/graspa-group/d-stem) provides the functional version of HDGM, denoted by f-HDGM, which handles modelling and mapping of functional space-time data, following the methodology of Section 2. For implementing f-HDGM, D-STEM v2 relies on the MATLAB version of the fda package (Ramsay et al. 2018), which is automatically downloaded and installed by D-STEM v2.

Data format
Two data formats are available to define observations for the f-HDGM. One is the internal format used by the D-STEM v2 classes, and the other one is the user format based on the more user-friendly MATLAB table type. At the table row corresponding to location s_i and time t, the elements related to y and x are vectors with q_{i,t} elements. Vectors related to y may include missing data (NaN). If y is entirely missing for a given (s, t), the row must be removed from the table. Since the spatial locations S are fixed in time and their number n is determined by the number of unique coordinates in the table, profiles observed at different time points but at the same spatial location s must have the same coordinates. An example of building such a table is sketched below.
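The following hedged sketch shows a user-format table with made-up values; the column names are illustrative, not prescribed by D-STEM v2. Each row holds one profile: the coordinates, the time stamp and the q_{i,t}-element vectors of observations and covariates.

  Y_coordinate  = [39.9; 39.9];                % latitude of s_i, repeated over time
  X_coordinate  = [116.4; 116.4];              % longitude of s_i
  Time          = [datetime(2015,1,1); datetime(2015,1,2)];
  Y             = {[35.1; 40.2; NaN]; [33.0; 38.7; 41.5]};   % observed profiles, NaN allowed
  X_temperature = {[2.1; 2.4; 2.6]; [1.8; 2.0; 2.2]};        % covariate profiles
  profile_table = table(Y_coordinate, X_coordinate, Time, Y, X_temperature);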

Software structure
In D-STEM, a hierarchical structure of object classes and methods is used to handle data definition, model definition and estimation, validation, dynamic kriging and the related plotting capabilities. The structure is schematically given below. Further details on the use of each class are given within the two case studies in this paper, while class constructors, methods and property details can be obtained in MATLAB using the command doc <class_name>.

Data handling
The stem_data class allows the user to define the data used in f-HDGM models, mainly through the following objects and methods.
• Objects of stem_data
  - stem_modeltype: model type (DCM, HDGM, MBC, Emulator or f-HDGM); note that the model type is needed here because the data structure varies among the different models;
  - stem_fda: basis function specification;
  - stem_validation (optional): definition of the learning and testing data sets for model validation.
• Methods and properties of stem_data
  - kmeans_partitioning: data partitioning for the parallel EM computations of Section 2.4; this method is applied to a stem_data object, and its output is used by the EM_estimate method of the stem_model class below;
  - shape (optional): structure with geographical borders used for mapping.
In addition, stem_misc.data_formatter is a helper method that is useful for building stem_varset objects starting from data tables. Its class, stem_misc, is a miscellaneous static class implementing other methods for various intermediate tasks, which are not discussed here for brevity.

Model building
The stem_model class is used to define, estimate, validate and output an f-HDGM, mainly through the following objects and methods, all of which are used in the case studies below.

• Objects of stem_model
  - stem_EM_result: estimation results, filled by the EM_estimate method;
  - stem_validation_result: validation statistics of Section 2.7, available when a stem_validation object has been provided.

• Methods of stem_model
  - EM_estimate: maximum likelihood estimation through the EM algorithm of Section 2.3;
  - set_varcov: approximation of the parameter variance-covariance matrix of Section 2.5;
  - set_logL: computation of the observed-data log-likelihood;
  - beta_Chi2_test: χ² significance tests on the fixed effects;
  - print, plot_par and plot_validation: textual and graphical output of the estimation and validation results.

Kriging
Kriging is handled by two classes. The first is the stem_krig class, which implements the kriging spatial interpolation.
The second is the stem_krig_result class, which stores the kriging output and implements the methods for plotting the kriging output.
• Methods of stem_krig_result
  - surface_plot: mapping of the kriging estimates and their standard deviations for a fixed h;
  - profile_plot: plotting of the kriged function and its variance-covariance matrix for a fixed location and time.
Although, at first reading, the user might prefer a single object for both the input and the output of kriging, these objects may be quite large, and keeping them separate makes the current approach more flexible.

Case study on ozone data
This section illustrates how to make inferences on an f-HDGM for ground-level high-frequency air quality data collected by a monitoring network. In particular, hourly ozone (O 3 , in µg/m 3 ) measured in Beijing, China, is considered.

Air quality data
Ground-level O₃ is of increasing public concern due to its essential role in air pollution and climate change. In China, O₃ has become one of the most severe air pollutants in recent years (Wang et al. 2017). To describe the diurnal cycle of O₃, which peaks in the afternoon and reaches a minimum at night-time, the 24 hours of the day are used as the domain H of the basis functions, while the time index t is on the daily scale. Moreover, due to the circularity of time, Fourier basis functions are adopted, which implies that β_j(h) and σ²_ε(h) are periodic functions. The measurement equation for O₃ is

  y(s, h, t) = β₀(h) + x_temp(s, h, t) β_temp(h) + x_uvb(s, h, t) β_uvb(h) + φ(h)' z(s, t) + ε(s, h, t),

where s is the generic spatial location, h ∈ [0, 24) is the time within the day expressed in hours and t = 1, ..., 1096 is the day index over the period 2015-2017. Based on a preliminary analysis, the numbers of basis functions for β_j(h), σ²_ε(h) and φ(h)' z(s, t) are chosen to be 5, 5 and 7, respectively.

Software implementation
This paragraph details the implementation of D-STEM v2 in three aspects: model estimation, validation and kriging. The relevant scripts are, respectively, demo_section4_model_estimate.m, demo_section4_validation.m and demo_section4_kriging.m, which are available in the supplementary material. All the scripts can be executed by choosing an option number from 1 to 3 in the demo_menu_user.m script.

Model estimation
This paragraph describes the demo_section4_model_estimate.m script devoted to the estimation of the model parameters and of their variance-covariance matrix.
The data set needed to perform this case study is stored as a MATLAB table in the user format of Section 3.2 and named Beijing_O3. It can be loaded from the corresponding file as follows:

  load ../Data/Beijing_O3.mat;

In the Beijing_O3 table, each row refers to a fixed space-time point and gives a 24-element hourly ozone profile with the corresponding conformable covariates, which are a constant, temperature and UVB.
  o_model.EM_estimate(o_EM_options);
  delta = 0.001;
  o_model.set_varcov(delta);
  o_model.set_logL();

All the relevant estimation results are found in the internal stem_EM_result object, which can be accessed as a property of the o_model object as follows:

  o_model.plot_par;
  o_model.beta_Chi2_test;
  o_model.print;

Figure 4 is produced by calling the plot_par method and shows the estimated β₀(h), β_temp(h), β_uvb(h) and σ²_ε(h), with 90%, 95% and 99% confidence bands shown through different shades. Thanks to the use of a Fourier basis, the functions are periodic with a period of one day. The plot of σ²_ε(h) shows that the unexplained portion of the O₃ variance is small during daylight hours, which is consistent with the results of Dohan and Masschelein (1987).
When the confidence bands of plot_par contain zero, it may be useful to test the significance of the covariates. By calling the beta_Chi2_test method, the results of the χ² tests are obtained; they are reported in Table 1. Although β_uvb(h) is close to zero in the morning, all fixed effects are highly significant overall. The model output is shown in the MATLAB command window by calling the print method.

Validation
This paragraph describes the script demo_section4_validation.m, which implements validation. Compared to the code in demo_section4_model_estimate.m, it differs only in providing an object of class stem_validation.
To create the object called o_validation, the name of the validation variable is needed, as well as the indices of the validation stations. Moreover, if the size of the nearest-neighbour set for each kriging site (nn_size) is not provided as the third input argument of the stem_validation class constructor, D-STEM v2 uses all the remaining stations. For example, a validation data set with three stations is constructed as follows:

  S_val = [1, 7, 10];
  input_data.stem_validation = stem_validation('O3', S_val);

The validation statistics, computed by EM_estimate, are saved in the internal stem_validation_result object, which can be accessed as a property of the o_model object. The stem_validation_result object contains the estimated O₃ residuals for the above-mentioned validation stations, as well as the validation mean squared errors and R², as defined in Section 2.7.

Kriging
This paragraph describes the demo_section4_kriging.m script, which applies the approach of Section 2.6 to the estimated model to map the O 3 concentrations over Beijing city.
The first step is to create an object of class stem_grid, which collects the information about the regular grid of pixels to be used for mapping. Then, an object of class stem_krig_data is created, where the o_krig_grid object is passed as an input argument (a hedged sketch of these two constructor calls is given at the end of this paragraph). Two comments follow. First, since the grid in the o_krig_grid object is regular, the dimensions of the grid (size(lat_mat), that is, 35 × 43) must be provided, as well as the shape of the pixels and the spatial resolution of the grid, which is 0.05° × 0.05°. Second, the step involving the stem_krig_data constructor may appear redundant at first glance. Indeed, it is needed for compatibility with other model types for which, in addition to the stem_grid object, other information is also necessary for the stem_krig_data constructor. Next, the stem_krig_options class provides some options for kriging. By default, the output is back-transformed to the original unit of measure if the observations have been log-transformed and/or standardised; the back_transform property enables handling this. Moreover, the no_varcov property must be set to 1 to avoid the time-consuming computation of the kriging variance. Eventually, the block_size property is used to define the number of spatial locations in S*_l.
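The two constructor calls referred to above could read as follows; the argument order of stem_grid (grid coordinates, unit, grid type, site type, grid size, pixel shape and pixel resolution) is an assumption based on the surrounding description and should be checked with doc stem_grid, and the grid bounds are illustrative.

  lat = 39.4:0.05:41.1;                        % 35 latitude values (illustrative bounds)
  lon = 115.4:0.05:117.5;                      % 43 longitude values
  [lon_mat, lat_mat] = meshgrid(lon, lat);     % 35 x 43 regular grid
  krig_coordinates = [lat_mat(:), lon_mat(:)];
  o_krig_grid = stem_grid(krig_coordinates, 'deg', 'regular', 'pixel', ...
                          size(lat_mat), 'square', 0.05, 0.05);
  o_krig_data = stem_krig_data(o_krig_grid);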
  o_krig_options = stem_krig_options();
  o_krig_options.back_transform = 0;
  o_krig_options.no_varcov = 0;
  o_krig_options.block_size = 30;

After storing the map of the Beijing boundaries in the o_model object, the latter is used with o_krig_data to create an object of class stem_krig. This object and o_krig_options together contain all the information needed for kriging, which is obtained by calling the corresponding kriging method:

  o_model.stem_data.shape = shaperead('../Maps/Beijing_adm1.shp');
  o_krig = stem_krig(o_model, o_krig_data);
  o_krig_result = o_krig.kriging(o_krig_options);

Note that this task may be time-consuming for large grids. The kriging output saved in the o_krig_result object gives the latent process estimate ẑ_t and its variance. The surface_plot and profile_plot methods may be used to obtain and plot f̂(s, h, t) of Equation (7). In this case, the user has to provide the corresponding covariates (X_beta) for the scalar or vector h, time t or location s (lon0, lat0) of interest.
Specifically, the surface_plot method is used to display the O₃ map using h, t and X_beta as input arguments. If X_beta is unavailable, the mapping concerns the component φ(h)' ẑ(s, t) only. Note that the prediction in Equation (7) and the variance in Equation (8) are stored in the output arguments y_hat and diag_Var_y_hat, respectively.

Case study on climate data
In order to show the complexity-reduction capabilities of D-STEM v2, a data set of temperature vertical profiles collected by the radiosondes of the Universal Radiosonde Observation Program (RAOB) is now considered. The profiles are observed over the Earth's sphere, and they are misaligned; that is, each profile differs in terms of the number of observations and of the altitude above the ground of each observation. Additionally, the computational burden is higher due to the larger number of spatial locations at which the profiles are observed.

RAOB data
Radiosondes are routinely launched from stations all over the world to measure the state of the upper troposphere and lower stratosphere. Data collected by radio sounding have applications in weather prediction and climate studies.
Temperature data from 200 globally distributed stations, collected daily during January 2015 at 00:00 and 12:00 UTC, are considered here. Each profile consists of a given number of measurements taken at different pressure levels. Since the weather balloon carrying the radiosonde usually bursts at an unpredictable altitude, the profile measurements are misaligned across the profiles and have different pressure ranges. A functional data approach is natural in this case, since the underlying temperature profile can be seen as a continuous function sampled at some pressure levels. Figure 1 depicts the spatial locations of the temperature measurements taken on 1 January 2015 at 00:00 UTC (left), together with the corresponding temperature vertical profiles collected through radio sounding (right). This demo data set, which only covers one month, includes around 10⁵ data points. When the full data set is used in climate studies, the number of data points grows to around 10⁸. In this case, a recent server machine with multiple CPUs and at least 256 GB of RAM is required for model estimation and kriging.
The focus of the case study is on the difference between the radiosonde measurement and the output of the ERA-Interim global atmospheric reanalysis model provided by ECMWF. In particular, the aim is to study the spatial structure of this difference in the 4D space, where the dimensions are latitude, longitude, altitude and time.
The model for the temperature y is

  y(s, h, t) = x(s, h, t)' β(h) + φ(h)' z(s, t) + ε(s, h, t),

where the ERA-Interim output enters through the covariates x, h ∈ [50, 925] hPa is the pressure level and t = 1, ..., 62 is a discrete time index for January 2015 (two launches per day). Figure 7 shows the temperature measurements at a given station, where 50 and 925 hPa correspond approximately to 25 km and 1.3 km, respectively.

Software implementation
This section details the software implementation of the case study described above, as given in the script demo_section5.m, which can also be executed from the demo_menu_user.m script. To avoid repetition, only the parts of the script that differ from the case study of Section 4 are reported and commented on here. In particular, data loading and the instantiation of the stem_model object are not described.

Model estimation
The problem of vertical misalignment of the measurements is completely transparent to the user and is handled by the internal stem_misc.data_formatter method when creating the stem_data object. Note that the dimension of the matrices in o_varset depends on q, the maximum number of measurements in each profile. To prevent out-of-memory problems, it is advisable to avoid data sets in which only a few profiles have a large number of measurements, which could result in large matrices in o_varset, with most of the elements set to NaN.
B-spline bases are used since, in this application, the vertical profiles are not periodic with respect to the pressure domain. The corresponding object of class stem_fda is created through its constructor (a hedged sketch of a possible call is shown below). Note that the knots are equally spaced along the functional range. In general, however, non-equally spaced knots can be provided, and each model component (i.e. σ²_ε, β_j and φ(h)' z(s, t)) can have a different set of knots. This is obtained using spline_order and spline_knots with the additional suffixes _sigma, _beta and _z. Although this data set is not large, the demo shows how to enable the partitioning discussed in Section 2.4. First, the spatial locations are partitioned using the modified k-means algorithm:

  k = 5;
  trials = 100;
  lambda = 5000;
  partitions = o_data.kmeans_partitioning(k, trials, lambda);

where k is the number of partitions, trials is the number of times the k-means algorithm is executed starting from randomised centroids and lambda is λ in Equation (4).
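The stem_fda creation step referred to above could be sketched as follows; the input structure fields and their names are assumptions based on the spline_* properties mentioned in the text and should be checked with doc stem_fda, and the knot values are illustrative.

  input_fda.spline_type = 'Bspline';
  input_fda.spline_range = [50 925];               % pressure domain H in hPa
  input_fda.spline_order = 3;
  input_fda.spline_knots = linspace(50, 925, 7);   % equally spaced knots
  o_fda = stem_fda(input_fda);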
At the end of the k-means algorithm, the data are internally re-ordered for parallel computing. Model estimation is done after creating and setting an object of class stem_EM_options. To do this, the output of the kmeans_partitioning method is passed to the partitioning_block_size property of the o_EM_options object (a hedged sketch is given below). Additionally, for parallel computing, the number of workers must be set to a value higher than 1; in general, this could be any number up to the number of cores available on the machine. The three validation MSEs defined in Section 2.7 are shown in Figures 8 and 9. To generate these figures, the plot_validation method is called with vertical = 1, which provides "atmospheric profile" plots with h on the vertical axis:

  vertical = 1;
  o_model.plot_validation(vertical);
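Going back to the estimation options described above, a possible sketch follows; the property names are taken from the text, while the values are illustrative and the workers property name is an assumption.

  o_EM_options = stem_EM_options();
  o_EM_options.partitioning_block_size = partitions;  % output of kmeans_partitioning
  o_EM_options.workers = 4;                            % parallel workers (> 1), assumed property name
  o_model.EM_estimate(o_EM_options);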

Kriging
Interpolation across space and over time is done as in Section 4.2.3. However, complexity reduction is enabled by adopting the nearest neighbour approach detailed in Section 2.6.
To do this, the class constructor is first called, where block_size is used to define the number of spatial locations in S*_l, and then nn_size is used to define ñ. Additionally, setting o_krig_options.workers makes it possible to do the kriging over the u blocks in parallel, using up to the allocated number of workers:

  o_krig_options = stem_krig_options();
  o_krig_options.block_size = 150;
  o_krig_options.nn_size = 10;
  o_krig_options.workers = 2;

Finally, the kriging predictions and standard errors are mapped for a given h ∈ H and time t:

  h = 875.3;
  t = 12;
  o_krig_result.surface_plot(h, t);

Since covariates are not provided to the surface_plot method, the plots concern the component φ(h)' ẑ(s, t), namely, the difference between RAOB and ERA-Interim, and its standard deviation. The output of the above code is depicted in Figures 10 and 11.

Concluding remarks
This paper introduced the package D-STEM v2 through two case studies of spatio-temporal modelling of functional data. It was shown that, in addition to maximum likelihood estimation, Hessian approximation and kriging for large data sets, D-STEM v2 also provides several data-handling capabilities, allows for the automatic construction of the relevant objects and provides graphical output. In particular, it produces high-quality global maps and two kinds of functional plots: the traditional x-y plot and the vertical profile plot, which is popular, for example, in atmospheric data analysis. In this regard, model validation and kriging are straightforward.
D-STEM v2 fills a gap in functional geostatistics. In fact, although statistical methods for georeferenced functional data have been recently developed (e.g. Ignaccolo et al. (2014)), standard geostatistical packages do not consider functional data, especially in the spatiotemporal context.
The successful use of D-STEM v1 in a number of applications proved that the EM algorithm implementation is quite stable. Now, due to improvements in computational efficiency, the new D-STEM v2 has the capability to handle large data sets. Moreover, thanks to the approximated variance-covariance matrix, it is possible to compute standard errors for all model parameters relatively fast, avoiding the large number of iterations typically required by an MCMC approach for making inferences. However, a drawback of the EM algorithm is its limited flexibility with respect to changes in the model equations. Indeed, changes in the parametrisation or in the latent variable structure usually require deriving new closed-form estimation formulas and changing the software accordingly. Moreover, changes in the covariance functions are not easy to handle. Computationally, the main limit of D-STEM v2 is the number p of basis functions that can be handled. Even if partitioning is exploited in k blocks of size r, the computational complexity is O(Tkr³p³), meaning that p cannot be large.
Currently, the authors are working on a new version that makes it possible to handle multivariate functional space-time data and user-defined spatial covariance functions, which will make D-STEM a valid and comprehensive alternative to the Gaussian process regression models (fitrgp) implemented in the Statistics and Machine Learning Toolbox of MATLAB.