Analysis of Random Fields Using CompRandFld

Statistical analysis based on random fields has become a widely used approach for understanding real processes in many fields, such as engineering and the environmental sciences. Data analysis based on random fields can sometimes be problematic to carry out from the inferential perspective; examples include large datasets, count or binary responses, and extreme-value data. This article explains how to perform, with the R package CompRandFld, challenging statistical analyses based on Gaussian, binary and max-stable random fields. The software provides tools for statistical inference based on the composite likelihood in complex problems where standard likelihood methods are difficult to apply. The principal features are illustrated by means of simulation examples and an application to Irish daily wind speeds.


Introduction
Today, statistical analyses of one or more variables observed in space and time are useful in different areas, such as the environmental sciences and engineering (Cressie and Wikle 2011). Random fields (RFs) are the mathematical foundation for statistical spatial analyses; basic introductions to this topic can be found in Wackernagel (1998), Diggle and Ribeiro (2007) and Cressie and Wikle (2011), to name a few. One of the primary aims of spatio-temporal analyses, together with marginal information, is to assess the dependence structure of the underlying random field. With spatial or spatio-temporal applications, fitting models with the maximum likelihood (ML) method may be troublesome for several reasons. We discuss two causes: when the analysis concerns large datasets, and when the analysis concerns datasets that are not normally distributed.
In the first case, suppose that the dataset of interest, of size m, comes from a Gaussian random field. For most of the well-known covariance models, the ML estimator is not available in closed form and hence the parameters are estimated by numerically maximizing the likelihood. Each likelihood evaluation requires the computation of the determinant and the inverse of the variance-covariance matrix, and these tasks are generally performed with algorithms based on the Cholesky factorization, demanding a computational burden of order O(m^3), see Bevilacqua, Gaetan, Mateu, and Porcu (2012) and the references therein. This can be very time consuming.
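To make the O(m^3) bottleneck concrete, the evaluation of a Gaussian log-likelihood can be sketched in a few lines of base R. The helper `gauss_loglik` below is illustrative and not part of CompRandFld; the Cholesky factorization is the cubic-cost step.

```r
# Illustrative sketch (not a CompRandFld routine): each evaluation of the
# Gaussian log-likelihood needs log|Sigma| and a quadratic form, obtained
# here from the Cholesky factor -- the O(m^3) step discussed above.
gauss_loglik <- function(y, mu, Sigma) {
  U <- chol(Sigma)                            # upper-triangular factor, O(m^3)
  z <- backsolve(U, y - mu, transpose = TRUE) # solves t(U) %*% z = y - mu
  logdet <- 2 * sum(log(diag(U)))             # log|Sigma| from the factor
  -0.5 * (length(y) * log(2 * pi) + logdet + sum(z * z))
}

m <- 5
Sigma <- exp(-as.matrix(dist(seq(0, 1, length.out = m))) / 0.3) # exponential model
y <- rep(0.5, m)
gauss_loglik(y, rep(0, m), Sigma)
```

Repeating this evaluation inside a numerical optimizer is what makes full ML estimation expensive for large m.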
In the second case, two interesting types of data are widely discussed in the literature: categorical data and extreme values. One way to model spatial binary or categorical data is to dichotomize or categorize the outcome of a Gaussian RF into several classes depending on some threshold values, see for instance Lele (1998) and de Leon (2005). By doing this, the joint probability model depends on the underlying latent process. For a dataset of size m, the expression of the joint distribution involves a sum of q^m m-dimensional normal integrals, where q is the number of categories. The closed form of the ML estimator is unknown; as an alternative, the parameter estimation can be done numerically, but at a high computational cost.
In order to model spatial extremes, one way is to work with max-stable RFs (see e.g., de Haan and Ferreira 2006, Chapter 9). These were introduced by de Haan (1984) as an extension of the block maxima approach from finite-dimensional to infinite-dimensional spaces. Also in this case, likelihood-based inference is challenging: for most of the known models (e.g., Davison, Padoan, and Ribatet 2012), although the closed form of the joint distribution is known, with a dataset of size m the derivation of the likelihood function requires computing a number of derivatives equal to the mth Bell number (Davison and Gholamrezaee 2012). Thus, for most spatial applications this calculation is not feasible in practice.
Alternative approaches have been proposed in order to perform parametric statistical inference with spatial and spatio-temporal datasets. For instance, in the last decade, strategies have been proposed based on simplifications of the covariance structure (e.g., Cressie and Johannesson 2008) or on likelihood approximations (e.g., Stein, Chi, and Welty 2004; Fuentes 2007; Kaufman, Schervish, and Nychka 2008). Inference based on the composite likelihood function (Lindsay 1988) has been shown to be a valid competitor, for the following reasons. Firstly, the composite likelihood generalizes the ordinary likelihood and contains many interesting alternatives. Secondly, a number of theoretical results exist on the consistency and asymptotic distribution of the related estimator and statistical tests. Thirdly, the computational cost is substantially lower than that of the likelihood. Finally, the approach is applicable to different types of data, such as Gaussian, binary and extreme values. Examples of statistical analyses based on the composite likelihood are Nott and Rydén (1999) and Heagerty and Lele (1998) for binary data, Padoan, Ribatet, and Sisson (2010) and Davison and Blanchet (2011) for extreme values, and Bevilacqua et al. (2012) for Gaussian responses.
For Gaussian, binary and max-stable RFs, several routines for performing the statistical inference based on the composite likelihood have been put together and implemented in an R (R Core Team 2014) package named CompRandFld (Padoan and Bevilacqua 2015). This article describes how to use this package to perform statistical analysis of random fields. The article is organized as follows: Section 2 summarizes basic notions on Gaussian, binary and max-stable RFs, Section 3 summarizes the principal concepts of the composite likelihood approach with some simulation examples that explain the main capabilities of CompRandFld. Section 4 provides a real data example. Section 5 provides the conclusions.

Random fields

Gaussian random fields
Gaussian RFs are useful for describing the random behavior of the mean levels of a real process. Here, a few basic notions are recalled.
Let {Y(s, t), (s, t) ∈ I} be a real-valued random process defined on I = S × T, with S ⊆ IR^d and T ⊆ IR. When T = {t_0}, then I ≡ S and Y(s) ≡ Y(s, t_0) is a purely spatial random field; when S = {s_0}, then I ≡ T and Y(t) ≡ Y(s_0, t) is a purely temporal random field. Let I := {1, . . . , l} and J := {1, . . . , k} be sets of indices and I* = I × J be their Cartesian product, with cardinality m = k · l. It is assumed for simplicity that Y(s, t) is a second-order stationary spatio-temporal Gaussian RF, meaning that IE{Y(s, t)} = µ and var{Y(s, t)} = ω^2 are finite constants for all (s, t) ∈ I, and that the covariance

cov{Y(s, t), Y(s', t')} = IE{Y(s, t) Y(s', t')} − µ^2,   (s, t), (s', t') ∈ I,

can be expressed as c(h, u), where c(·, ·) is a positive definite function that depends only on the separations h = s − s' and u = t − t' between the points (Cressie and Wikle 2011, Chapter 6). The covariance is then said to be translation-invariant. Since c(0, 0) = var{Y(s, t)}, ρ(h, u) = c(h, u)/c(0, 0) is the correlation function of Y. In order to take into account the local variation of real phenomena, the overall process is defined by Y(s, t) = Y_0(s, t) + ε(s, t), where ε(s, t) ∼ N(0, τ^2) is a white noise independent of Y_0. The covariance then takes the functional form

c(h, u) = τ^2 1I_{(0,0)}(h, u) + σ^2 ρ_0(h, u),   (1)

where 1I(·) is the indicator function, σ^2 > 0 and ρ_0 are the common variance and correlation function of the RF, whereas τ^2 > 0 is the local variance that describes the so-called nugget effect (e.g., Diggle and Ribeiro 2007; Cressie and Wikle 2011, Chapter 4). The variance of Y(s, t) is equal to ω^2 = τ^2 + σ^2 and, for (h, u) ≠ (0, 0), the covariance function is σ^2 ρ_0(h, u). Since Y(s, t) is also intrinsically stationary (Cressie and Wikle 2011, Chapter 6), its variability can equivalently be represented as 2γ(h, u) = var{Y(s + h, t + u) − Y(s, t)}.
The function γ(·, ·) is called the semivariogram, while the quantity 2γ(·, ·) is known as the variogram; under second-order stationarity, γ(h, u) = c(0, 0) − c(h, u). A stationary space-time covariance is called spatially isotropic if c(h, u_0) depends only on (‖h‖, u_0) for any fixed temporal lag u_0, see Porcu, Montero, and Schlather (2012).
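As a quick numerical illustration of the identity γ(h, u) = c(0, 0) − c(h, u), the following base-R sketch uses an assumed separable double-exponential model with a nugget (the parameter values are illustrative):

```r
# Separable double-exponential covariance with nugget (illustrative values):
# c(h, u) = tau2 * 1{(h,u) = (0,0)} + sill * exp(-h/psi_s) * exp(-u/psi_t)
cov_st <- function(h, u, sill = 1, tau2 = 0.1, psi_s = 0.5, psi_t = 2)
  tau2 * (h == 0 & u == 0) + sill * exp(-h / psi_s) * exp(-u / psi_t)

# Semivariogram via the second-order stationarity identity
semivario_st <- function(h, u, ...) cov_st(0, 0, ...) - cov_st(h, u, ...)

semivario_st(0, 0)      # zero at the origin
semivario_st(100, 100)  # approaches the total variance sill + tau2 = 1.1
```

The semivariogram vanishes at the origin and levels off at the total variance ω^2 = σ^2 + τ^2 as the lags grow, as the identity implies.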
There is an extensive literature on parametric spatial covariance models, see Cressie and Wikle (2011, Chapter 4) and the references therein, and in the past years many parametric models for space-time covariances have been proposed. A possible covariance model is obtained as the product of any valid spatial covariance and any valid temporal covariance. This kind of covariance model, called a separable model, has been criticized for its lack of flexibility. For this reason, different classes of non-separable covariance models have recently been proposed in order to capture possible space-time interactions (e.g., Porcu, Mateu, and Christakos 2009; Cressie and Wikle 2011, Chapter 6). Some examples of covariance models implemented in CompRandFld and used later in Sections 3 and 4 are found in Table 1. Observe that the first model is a separable space-time stable (or power-exponential) covariance, the second is a special case of the class proposed in Gneiting (2002), the third and fourth are special cases of the classes proposed in De Iaco, Myers, and Posa (2002) and Porcu et al. (2009), and the fifth and sixth are spatial correlation models (e.g., Gneiting and Schlather 2004).

[Table 1: Parametric families of isotropic correlation models. The first four models are spatio-temporal and the last two are spatial. K_α is the modified Bessel function of order α, Γ is the gamma function, η is a separability parameter, α, κ_s, κ_t are smoothing parameters and ψ_s, ψ_t are scale parameters; the subscripts s and t stand for space and time.]

A list of all the correlation functions implemented in CompRandFld is available in the online help of the routine CovMatrix.

Binary random fields
In some applications the focus is on modeling the presence or absence of a certain phenomenon observable across space and time. To describe the stochastic behavior of such a process, a binary RF is needed.
An approach for defining a spatio-temporal binary random field is the following (e.g., Heagerty and Lele 1998). Let {Y(s, t), (s, t) ∈ I} be a second-order stationary spatio-temporal Gaussian RF with correlation function ρ(h, u) that we assume known up to a vector of parameters ϑ. Then, a binary spatio-temporal RF is defined by

Z(s, t) = 1I{Y(s, t) + ε(s, t) > b},   (2)

where, for all (s, t) ∈ I, ε(s, t) ∼ N(0, τ^2) is a white noise independent of Y and b ∈ IR is a threshold. With this definition, for all (s, t) ∈ I, the univariate marginal probability of success is

p_1 = Pr{Z(s, t) = 1} = Φ{(µ − b)/ω},

where µ is the mean of Y, ω^2 = σ^2 + τ^2, and Φ is the univariate standard Gaussian distribution. In addition, for any (s, t), (s', t') ∈ I with (s, t) ≠ (s', t'), the bivariate joint probability of success is

p_11(h, u) = Pr{Z(s, t) = 1, Z(s', t') = 1} = Φ_2{(µ − b)/ω, (µ − b)/ω; σ^2 ρ(h, u)/ω^2},   (3)

where Φ_2 is the bivariate standard Gaussian distribution. In our package, Φ_2 is computed using a Fortran routine written by Alan Genz, see Genz (1992). Note that the marginal and joint probabilities of success, p_1 and p_11(h, u), completely characterize the bivariate distribution. Higher-order joint probabilities are derived similarly, using the inclusion-exclusion principle. Thus, given m spatio-temporal points, the computation of the probabilities of all the related success and failure events requires the evaluation of high-dimensional normal integrals, up to order m.
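A minimal sketch of this probit construction, with illustrative helper names and a Monte Carlo check of the pairwise probability (the parametrization assumes the threshold form Z = 1 when Y + ε exceeds b, as described above):

```r
# Sketch of the probit construction above (helper names are illustrative):
# marginal success probability when Y + eps is Gaussian with mean mu,
# total variance omega^2 = sigma^2 + tau^2 and threshold b.
p1 <- function(mu, sigma2, tau2, b) pnorm((mu - b) / sqrt(sigma2 + tau2))

# Monte Carlo check of the pairwise success probability for two sites whose
# latent Gaussian components have correlation rho.
p11_mc <- function(mu, sigma2, tau2, b, rho, n = 1e5) {
  set.seed(1)
  omega <- sqrt(sigma2 + tau2)
  rho_star <- sigma2 * rho / omega^2  # correlation of the pair Y + eps
  w1 <- rnorm(n)
  w2 <- rho_star * w1 + sqrt(1 - rho_star^2) * rnorm(n)
  mean(mu + omega * w1 > b & mu + omega * w2 > b)
}

p1(0, 0.6, 0, 0)  # 0.5 when the threshold sits at the mean
```

With rho = 0 the two sites are independent, so p11_mc should be close to p1 squared.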

Max-stable random fields
In some applications the focus is on modeling exceptionally high (or low) levels of a real process. One way of describing the random behavior of spatial or spatio-temporal extremes is provided by max-stable RFs. For simplicity, let us focus on the spatial case. Let {Y(s)}_{s ∈ S} be a stationary RF with continuous sample paths and let Y_1, . . . , Y_n be n independent copies of it. Consider the pointwise maximum process M_n(s) := max_{i=1,...,n} Y_i(s), s ∈ S. If there exist continuous functions a_n(s) > 0 and b_n(s) ∈ IR, for all s ∈ S, such that a_n^{-1}(s){M_n(s) − b_n(s)} converges weakly to a process Z(s) with non-degenerate marginal distributions for all s ∈ S, then Z is a max-stable RF (e.g., de Haan and Ferreira 2006, Chapter 9). For any s ∈ S, max-stable random fields have univariate marginal generalized extreme value distributions and, for a finite set of locations s_1, . . . , s_k, the joint distribution with unit Fréchet marginal distributions is

Pr{Z(s_1) ≤ z_1, . . . , Z(s_k) ≤ z_k} = exp{−V(z_1, . . . , z_k)},   V(z_1, . . . , z_k) = k ∫_{Δ_{k−1}} max_{j=1,...,k} (w_j/z_j) υ(dw),   (4)

where υ is a probability distribution defined on the (k − 1)-dimensional unit simplex Δ_{k−1} and V is the so-called exponent dependence function (e.g., de Haan and Ferreira 2006, Chapters 1 and 6). Examples of max-stable RFs are the Brown-Resnick (Kabluchko, Schlather, and de Haan 2009; Kabluchko 2010), the extremal-Gaussian (Schlather 2002) and the extremal-t (Opitz 2013). In the first case, the exponent dependence function of the bivariate distribution is

V(z_1, z_2; h) = (1/z_1) Φ{λ(h)/2 + log(z_2/z_1)/λ(h)} + (1/z_2) Φ{λ(h)/2 + log(z_1/z_2)/λ(h)},   (5)

where λ(h) = {2γ(h)}^{1/2}, with γ denoting the semivariogram of the underlying Gaussian RF.
Equation (5) was initially derived by Hüsler and Reiss (1989) and is also the exponent function of the max-stable RF described by Smith (1990), but with a different form for λ(h). For these processes, extensions to the spatio-temporal case with dependence function λ(h, u) exist, see Kabluchko (2009) and Davis, Klüppelberg, and Steinkohl (2013). In the second case, the bivariate distribution depends on the exponent function

V(z_1, z_2; h) = (1/2)(1/z_1 + 1/z_2) [1 + {1 − 2(1 + ρ(h)) z_1 z_2 / (z_1 + z_2)^2}^{1/2}],   (6)

where ρ(h) is the correlation function of the underlying Gaussian RF. Finally, in the third case, the bivariate distribution depends on the exponent function

V(z_1, z_2; h) = (1/z_1) T_{ν+1}[{(z_2/z_1)^{1/ν} − ρ(h)} {(ν + 1)/(1 − ρ(h)^2)}^{1/2}] + (1/z_2) T_{ν+1}[{(z_1/z_2)^{1/ν} − ρ(h)} {(ν + 1)/(1 − ρ(h)^2)}^{1/2}],   (7)

where T_{ν+1} is the Student-t distribution function with ν + 1 degrees of freedom, ρ(h) is the scaling function and ν the degrees of freedom of the underlying Student-t random field.
A summary for describing the extremal dependence is ξ(h) = V(1, 1; h), named the pairwise extremal coefficient function (Smith 1990). It holds that 1 ≤ ξ(h) ≤ 2, where the lower bound represents the complete-dependence case and the upper bound the independence case. For the above-mentioned processes, the extremal coefficients are equal to

ξ(h) = 2Φ{λ(h)/2},   ξ(h) = 1 + {(1 − ρ(h))/2}^{1/2},   ξ(h) = 2 T_{ν+1}[{(ν + 1)(1 − ρ(h))/(1 + ρ(h))}^{1/2}],   (8)

respectively.
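These three closed forms are straightforward to code and check against the bounds 1 ≤ ξ(h) ≤ 2; the sketch below uses the parametrizations given above (λ for Brown-Resnick, ρ and ν for the extremal-Gaussian and extremal-t) with illustrative helper names:

```r
# Extremal coefficient functions for the three max-stable models above:
extc_br <- function(lambda) 2 * pnorm(lambda / 2)   # Brown-Resnick
extc_eg <- function(rho) 1 + sqrt((1 - rho) / 2)    # extremal-Gaussian
extc_et <- function(rho, nu)                        # extremal-t
  2 * pt(sqrt((nu + 1) * (1 - rho) / (1 + rho)), df = nu + 1)

extc_br(0)      # complete dependence at lag zero: 1
extc_et(1, 3)   # perfect correlation: 1
```

For the Brown-Resnick model, ξ grows from 1 towards 2 as λ(h), and hence the semivariogram, increases with distance.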

Composite likelihood approach
Composite likelihood (CL) is a general class of estimating functions based on the likelihood of marginal or conditional events (e.g., Varin, Reid, and Firth 2011). Given a set of marginal or conditional events {A_r, r = 1, . . . , R} on a random vector Y, a composite log-likelihood takes the form

ℓ_C(θ) := Σ_{r=1}^{R} w_r ℓ(θ; A_r),   (9)

where ℓ(θ; A_r) is the log-likelihood related to the density f(y ∈ A_r; θ), obtained by considering only the random variables in A_r, and the w_r are suitable non-negative weights that do not depend on θ. The maximum composite likelihood (MCL) estimator is given by θ̂ = argmax_θ ℓ_C(θ). The CL estimation method has nice properties (e.g., Varin et al. 2011): among them, if n independent replicates of the m-dimensional vector are available and n goes to infinity, then θ̂ is weakly consistent and asymptotically normally distributed with variance-covariance matrix G(θ)^{-1}, where

G(θ) = H(θ) J(θ)^{-1} H(θ),   H(θ) = IE{−∇^2 ℓ_C(θ)},   J(θ) = var{∇ℓ_C(θ)},   (10)

is the Godambe information matrix and ∇ denotes the vector differential operator. On the other hand, if n is small relative to m, or if dependent data are available such as a single spatial sequence or a univariate time series, then the above-mentioned asymptotic results do not necessarily hold, see Cox and Reid (2004); consistency and asymptotic normality of the estimator, as m goes to infinity, must then be shown case by case. Likelihood-based statistics such as the Wald, score and likelihood ratio statistics, together with their asymptotic distributions, are also available in the CL setting (Chandler and Bate 2007; Varin et al. 2011; Pace, Salvan, and Sartori 2011). In addition, the Akaike information criterion used for model selection assumes, in the CL context, the following form:

CLIC := −2 ℓ_C(θ̂) + 2 tr{J(θ̂) H(θ̂)^{-1}},   (11)

where tr(A) is the trace of the matrix A, see Varin and Vidoni (2005). The CL is a wide class of functions that also includes the standard likelihood and other alternatives as special cases; the former is obtained by letting A_r = Y, R = 1 and w_r = 1. Depending on the type of likelihood specified on the right-hand side of (9), different types of CL can be obtained. One of the most explored cases is the pairwise likelihood: considering the marginal pairs A_r = (Y_i, Y_j), we obtain the pairwise marginal log-likelihood.
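Once estimates of H and J are available (FitComposite returns such matrices when standard errors are requested), the information criterion is a one-liner; the sketch below is a generic base-R version with illustrative H and J inputs:

```r
# Sketch: composite likelihood information criterion,
# CLIC = -2 * logCL + 2 * tr(J %*% solve(H)), with illustrative inputs.
clic <- function(logcl, H, J) -2 * logcl + 2 * sum(diag(J %*% solve(H)))

H <- diag(c(2, 4))  # hypothetical sensitivity matrix
J <- diag(c(3, 4))  # hypothetical variability matrix
clic(-100, H, J)    # 205 = 200 + 2 * (3/2 + 1)
```

When J = H, as for a correctly specified full likelihood, the penalty reduces to twice the number of parameters and the criterion coincides with the AIC.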
If we let A_r = (Y_i | Y_j), we obtain the pairwise conditional log-likelihood, and finally, setting A_r = (Y_i − Y_j), we obtain the pairwise difference log-likelihood. In these special cases, the log-likelihoods for a single pair contribution are, respectively,

ℓ_ij(θ) = log f(y_i, y_j; θ),   (12)
ℓ_{i|j}(θ) = log f(y_i | y_j; θ),   (13)
ℓ_{i−j}(θ) = log f(y_i − y_j; θ).   (14)

For these likelihoods the computational burden is O(m^2). In principle, the role of the weights is to improve the statistical efficiency and to save computational time, see for example Padoan et al. (2010), Bevilacqua et al. (2012), Bevilacqua and Gaetan (2014) and Sang and Genton (2014). The diagram in Figure 1 summarizes the types of likelihoods belonging to the CL class that have been implemented in the package. For comparison purposes we have also implemented other variants of the ordinary likelihood, such as the restricted likelihood of Harville (1977) and the tapered likelihood of Kaufman et al. (2008).
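For a Gaussian pair with common mean µ, variance s^2 and correlation r, the three contributions have simple closed forms. The sketch below (the helper names are illustrative) also lets us verify the factorization log f(y_i, y_j) = log f(y_i | y_j) + log f(y_j):

```r
# Illustrative Gaussian pair contributions for the three pairwise likelihoods.
pair_marginal <- function(yi, yj, mu, s2, r) {  # log f(yi, yj)
  q <- ((yi - mu)^2 - 2 * r * (yi - mu) * (yj - mu) + (yj - mu)^2) /
       (s2 * (1 - r^2))
  -log(2 * pi) - log(s2) - 0.5 * log(1 - r^2) - 0.5 * q
}
pair_conditional <- function(yi, yj, mu, s2, r) # log f(yi | yj)
  dnorm(yi, mu + r * (yj - mu), sqrt(s2 * (1 - r^2)), log = TRUE)
pair_difference <- function(yi, yj, s2, r)      # log f(yi - yj)
  dnorm(yi - yj, 0, sqrt(2 * s2 * (1 - r)), log = TRUE)
```

Note that the difference contribution depends only on the variogram-type quantity 2 s^2 (1 − r), not on the mean, which is why the difference likelihood cannot estimate µ.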

The Gaussian case
Assume that y = {y(s_1, t_1), . . . , y(s_k, t_l)}^T is a realisation from a spatio-temporal Gaussian RF (see Section 2.1), so that y has an m-dimensional normal density with parameter vector θ = (µ, τ^2, σ^2, ϑ)^T, where ϑ is a vector of correlation parameters (see for example the models in Table 1). In the Gaussian case, three estimation methods are available. Standard ML fitting of the Gaussian model can be performed by setting the parameters model = "Gaussian", likelihood = "Full" and type = "Standard" of the FitComposite routine; this method is recommended for small datasets. θ is estimated by maximization of the full log-likelihood

ℓ(θ) = −(1/2) {m log(2π) + log |Σ| + (y − µ)^T Σ^{-1} (y − µ)},   (15)

where µ = µ 1_m and Σ = {τ^2 1I_{(r,q)}((j, i)) + σ^2 ρ_st(‖s_j − s_r‖, |t_i − t_q|); (i, j), (q, r) ∈ I*} is the m × m covariance matrix defined by (1). The restricted version is also available and can be used by setting type = "Restricted". Under increasing domain, the asymptotic variance-covariance matrix of the ML estimator is given by the inverse of the Fisher information matrix I(θ), see Mardia and Marshall (1984).
The second estimation method is based on the tapered likelihood (Kaufman et al. 2008), recommended for large datasets, and is performed by setting type = "Tapering". The tapered likelihood is given by

ℓ_tap(θ) = −(1/2) [m log(2π) + log |Σ̃| + (y − µ)^T {Σ̃^{-1} ∘ Ω_c} (y − µ)],   (16)

where Ω_c = {ρ(‖s_j − s_r‖, |t_i − t_q|; c); (i, j), (q, r) ∈ I*} is the m × m taper matrix, Σ̃ = Σ ∘ Ω_c is the m × m tapered covariance matrix and ∘ is the Schur product. Specifically, ρ(·, ·; c) is a compactly supported correlation function (taper function) depending on a parameter vector c, which defines the compact support in space, in time, or both (Wendland 1995). Hence, tapering in likelihood (15) both the covariance matrix Σ and the inverse Σ̃^{-1} yields the tapered likelihood (16). The parameter taper of FitComposite allows us to set different taper functions; a list of the models implemented in the package is reported in Table 2. The first two are spatial tapers, while the last is a spatio-temporal taper. The compact support parameters for space and time can be set by providing two positive values c_s and c_t through maxdist and maxtime (see Table 2). Likelihood (16) is implemented in CompRandFld, and in the optimization procedure it calls specific routines of the package spam (Furrer and Sain 2010; Furrer 2014) for handling the sparse matrix Σ̃. Under increasing domain, the maximizer of (16) is a consistent estimator with an asymptotic Gaussian distribution, with variance given by the inverse of the Godambe matrix of the form (10), with H and J computed from the tapered score; see Shaby and Ruppert (2012). The tapering can have an effect on the assessed correlation structure, depending on the choice of the compact support. For example, the top-left panel of Figure 2 reports a simulation from a zero-mean, unit-variance, spatial Gaussian random field with exponential correlation and scale parameter equal to 0.2 on the unit square. Simulations are repeated by tapering the covariance matrix with the first Wendland function of Table 2, using the compact supports 0.6, 0.4 and 0.2.
When the compact support is greater than or equal to the practical range (0.6), the simulation in the top-right panel shows no difference from the original simulation, while for decreasing values of the compact support, the simulations in the two bottom panels show a decreasing range of the correlation structure.
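The mechanism can be sketched in base R: a Wendland-type taper with compact support c_s is applied elementwise (Schur product) to an exponential covariance matrix, producing exact zeros beyond the support. The function below is one member of the Wendland family, used here only for illustration:

```r
# A Wendland-type taper (one member of the family in Wendland 1995):
# positive definite, equal to 1 at the origin and exactly 0 beyond c_s.
wendland <- function(h, c_s)
  ifelse(h < c_s, (1 - h / c_s)^4 * (1 + 4 * h / c_s), 0)

h <- as.matrix(dist(seq(0, 1, length.out = 40)))  # distances on a 1-d transect
Sigma <- exp(-h / 0.2)                            # exponential model, scale 0.2
Sigma_tap <- Sigma * wendland(h, 0.4)             # Schur product with the taper

mean(Sigma_tap == 0)  # fraction of exactly-zero entries, exploitable by spam
```

The exact zeros are what allow sparse-matrix algorithms, such as those in spam, to factorize Σ̃ far more cheaply than the dense Σ.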
The third estimation method is based on the pairwise likelihoods (12)-(14), also recommended for large datasets. Bevilacqua and Gaetan (2014) have shown, by means of a simulation study under increasing-domain asymptotics and for a fixed compact support, that the pairwise likelihood is the fastest estimation method. For instance, Figure 3 shows that the ratio between the computational time required for a single numerical evaluation of the full and the pairwise likelihood, and of the tapered and the pairwise likelihood, is greater than one, stressing that the pairwise likelihood is the fastest. The ratios have been computed for different sample sizes: for increasing sample size, the computational convenience of the pairwise likelihood becomes increasingly prominent. CL estimation is performed by setting likelihood = "Marginal" and type = "Difference" in FitComposite. This is based on the difference pairwise likelihood (17), where D is the set of indices identifying the distinct pairs of observations and γ_ijqr(σ^2, τ^2, ϑ) = τ^2 + σ^2 {1 − ρ_st(‖s_j − s_r‖, |t_i − t_q|; ϑ)} is the variogram. The marginal pairwise or conditional pairwise likelihoods (12) and (13) can be used by setting likelihood = "Marginal" and type = "Pairwise", or likelihood = "Conditional". Binary weights can be set by providing two positive values c_s and c_t through maxdist and maxtime. Under increasing domain, the maximizer of the CL is consistent and asymptotically normal, with variance-covariance matrix as in (10) (Bevilacqua et al. 2012). Standard errors can be obtained by setting varest = TRUE. For the three likelihood methods, they are computed as the square roots of the diagonal elements of the inverse of the Fisher and Godambe information matrices; in the Gaussian case the analytical expressions of these matrices are known.
In the composite likelihood case, however, since the expression of J is unwieldy, its numerical evaluation can be computationally demanding. To obtain standard errors quickly, two options are implemented: if a single replication of the data is available, the matrix is computed with a sub-sampling technique, specified with vartype = "SubSamp"; if data replications are available, the Godambe matrix is computed using its sampling version (Varin et al. 2011), specified with vartype = "Sampling". Here is a practical example.
R> pml <- FitComposite(data, coordx = coords, coordt = times,
+    corrmodel = "exp_exp", likelihood = "Marginal", type = "Pairwise",
+    start = list(scale_s = 2, scale_t = 3, sill = 1),
+    fixed = list(mean = 1, nugget = 0), varest = TRUE, maxdist = 0.5,
+    maxtime = 3)
R> pml$param

 scale_s  scale_t     sill
2.244840 5.739743 1.392082

With start, users can specify a starting value for any parameter that needs to be estimated, while with fixed, users can fix the value of a parameter that does not need to be estimated; starting values are admitted only for parameters that are not fixed. Note that the routine CorrelationParam returns the list of parameters of any correlation model. Estimates are obtained by fitting the marginal pairwise likelihood. Binary weights are used, so that only pairs of observations with spatial distances and temporal intervals smaller than or equal to 0.5 and 3, respectively, are retained. print(pml) provides a summary of the fitting, omitted here for brevity. The collected results are: the parameter estimates with related standard errors and variance-covariance matrix, the maximized log pairwise likelihood, the value of the CLIC (11) and other information. The fitting results can be used for estimating the correlation or variogram function, computing the practical range (Diggle and Ribeiro 2007) with the Covariogram routine, or, as the following code shows, for predicting the process at ungauged locations and future times.
R> times_to_pred <- 21:23
R> x <- seq(0, 1, 0.02)
R> loc_to_pred <- as.matrix(expand.grid(x, x))
R> param <- as.list(c(pml$param, pml$fixed))
R> pr <- Kri(data, coords, coordt = times, loc = loc_to_pred,
+    time = times_to_pred, corrmodel = "exp_exp", param = param)

The Kri routine performs simple kriging (or ordinary kriging, Diggle and Ribeiro 2007) at the new set of locations loc_to_pred and times times_to_pred. A dataset, the spatio-temporal coordinates and a correlation model with the estimated parameters need to be specified. The routine returns (together with other information) two matrices, one with the predicted values and one with their associated standard errors. With standard R code, the results can easily be visualized. Figure 4 shows, in the top and bottom panels, the predicted values and their standard errors on a 51 × 51 grid of points in [0, 1]^2 and for three consecutive times (left to right). Due to the strong dependence structure, a noticeable dependence remains over time (although decreasing), and the lowest variabilities are recorded at the points where the data have been simulated. In CompRandFld, simple and tapered kriging are implemented; the default option is simple kriging. This is performed working with the full covariance matrix, which is computationally intensive for large spatial or spatio-temporal data. A remedy is the tapered kriging proposed by Furrer, Genton, and Nychka (2006), an approximation of classical kriging that is faster to compute and hence useful when dealing with large datasets: the tapered covariance matrix replaces the full covariance matrix in the simple kriging systems, so that the solution is obtained using fast algorithms for sparse matrices. The routine Kri calls specific routines of the package spam (Furrer and Sain 2010) in order to handle sparse matrices.
Tapered kriging can be done by setting type = "Tapering" and specifying values for the parameters taper, maxdist and maxtime; for performing tapered kriging see also Nychka, Furrer, and Sain (2014).

Binary case
Assume that z(s_1, t_1), . . . , z(s_k, t_l) is a sample from a spatio-temporal binary random field (see Section 2.2), where z(s_j, t_i) = 1 (z(s_j, t_i) = 0) represents a success (failure) event. In this case, the ML estimator is unknown in closed form and the numerical maximization of the full likelihood is too computationally demanding, even for moderate sample sizes. For this reason, Heagerty and Lele (1998) proposed performing the parameter estimation of the probit model (2) using the pairwise marginal likelihood

ℓ_P(θ) = Σ_{(i,j,q,r) ∈ D} w_ijqr log p_{z(s_j,t_i) z(s_r,t_q)}(‖s_j − s_r‖, |t_i − t_q|),   (18)

where D is as in (17) and the probabilities p_xy(·, ·), x, y = 0, 1, are described in (3). MCL fitting can be obtained with the routine FitComposite by specifying model = "BinaryGauss", likelihood = "Marginal" and type = "Pairwise". The conditional likelihood (13) is also available and can be used by setting likelihood = "Conditional".
R> set.seed(11)
R> x <- seq(0, 10, 1)
R> y <- seq(0, 10, 1)
R> times <- seq(1, 15, 1)
R> param <- list(nugget = 0, mean = 0, scale_t = 1.5, scale_s = 2,
+    sill = 0.6)
R> data <- RFsim(x, y, times, corrmodel = "exp_exp", model = "BinaryGauss",
+    param = param, grid = TRUE)$data

This code shows how to simulate a binary random field with the routine RFsim. Since the process is based on a latent Gaussian RF, a correlation function and its parameters need to be specified. Figure 5 shows the simulation for the first four temporal instants (top-left to bottom-right), performed on an 11 × 11 grid in [0, 10]^2 and for 15 times. The dependence can be estimated using the lorelogram function (Heagerty and Zeger 1998). The lorelogram is defined as the logarithm of the pairwise odds ratio,

L(h, u) = log [ p_11(h, u) p_00(h, u) / {p_10(h, u) p_01(h, u)} ].   (19)

In the binary case it provides a better measure of dependence than the covariance function; for instance, it allows ease of interpretation and its upper and lower bounds do not depend on the mean. The following code shows how to compute the empirical estimator of (19) using the routine EVariogram.
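At a single lag, the empirical lorelogram is just the log odds ratio of the four joint frequencies of the observed pairs; `lorelogram_hat` below is an illustrative helper, not a CompRandFld routine:

```r
# Illustrative empirical lorelogram at a single lag: the log odds ratio of
# the four joint frequencies of a collection of binary pairs (z1[i], z2[i]).
lorelogram_hat <- function(z1, z2) {
  p11 <- mean(z1 == 1 & z2 == 1)
  p00 <- mean(z1 == 0 & z2 == 0)
  p10 <- mean(z1 == 1 & z2 == 0)
  p01 <- mean(z1 == 0 & z2 == 1)
  log((p11 * p00) / (p10 * p01))
}

# All four cells equally frequent: no association, lorelogram 0
lorelogram_hat(c(1, 1, 0, 0), c(1, 0, 1, 0))
```

Positive values indicate positive association between the paired binary outcomes, zero indicates independence.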

Max-stable case
Assume that independent copies, z_1, . . . , z_m, of the vector of spatial componentwise maxima z = {z(s_1), . . . , z(s_k)}^T are available. Examples of real data analyses using this type of extremes are Padoan et al. (2010) and Davison and Blanchet (2011). Suppose for simplicity that the random behavior of the spatial maxima is well described by one of the classes of spatial max-stable RFs described in Section 2.3. For those families, although the k-dimensional distribution is known (Hüsler and Reiss 1989; Opitz 2013), deriving the density from (4) requires the computation of a number of derivatives equal to the kth Bell number (Davison and Gholamrezaee 2012). Nevertheless, the bivariate density is still feasible to compute in practice and has the form

f(z_1, z_2) = exp{−V(z_1, z_2)} {V_1(z_1, z_2) V_2(z_1, z_2) − V_12(z_1, z_2)},   (20)

where V_i denotes the partial derivative of V with respect to z_i and V_12 the mixed second-order partial derivative. Taking the first and second order derivatives of (5)-(7) allows us to obtain the Brown-Resnick, extremal-Gaussian and extremal-t bivariate densities. Therefore, for inference, the full likelihood can be replaced by the pairwise marginal likelihood (12). Details on CL inference for max-stable random fields are provided by Padoan et al. (2010). Here, practical examples are illustrated.
R> fextg <- FitComposite(extg, coords, corrmodel = "stable",
+    model = "ExtGauss", replicates = 30, varest = TRUE,
+    vartype = "Sampling", margins = "Frechet", fixed = list(sill = 1),
+    start = list(scale = 1.5, power = 1.2))
R> fextg$param

    power     scale
0.9156696 1.7135199

With print(fextg) the summary of the fitting results is returned (omitted here for brevity), and fextg$param displays only the parameter estimates. The standard errors of the estimates are computed and, specifying vartype = "Sampling", the matrices H and J in (10) are computed using their sampling versions. Note that other correlation models, such as the Whittle-Matérn or the generalized Cauchy (see Table 1), can also be fitted. The next code shows how to simulate and fit the Brown-Resnick and extremal-t models as well.
Max-stable RFs can also be fitted using the least squares method proposed by Smith (1990). The following code shows an example for the Brown-Resnick case.
R> wlbr <- WLeastSquare(bres, coords, corrmodel = "stable",
+    model = "BrowResn", replicates = 30, fixed = list(sill = 1))

The extremal coefficient functions in (8) can be computed on the basis of the fitting results; these are useful for summarizing the extremal dependence. The next code shows how to use the routine Covariogram for computing the extremal coefficients in the Brown-Resnick case.
R> extcbr <- Covariogram(fbres, answer.extc = TRUE)
R> extcbrw <- Covariogram(wlbr, answer.extc = TRUE)

Non-parametric estimators of the extremal coefficient are also available. A useful estimator is obtained by means of the madogram and the F-madogram introduced by Cooley, Naveau, and Poncet (2006). For instance, the empirical version of the F-madogram is

ν̂(h) = {1 / (2|P_h|)} Σ_{P_h} |F{z^(i)(s_j)} − F{z^(i)(s_r)}|,

where F denotes the unit Fréchet distribution, z^(i)(s_j) is the ith sample at location s_j (similarly for z^(i)(s_r)), P_h is the set of all sample pairs {z(s_j), z(s_r)} whose distance ‖s_j − s_r‖ falls in the interval (h − ε, h + ε), with ε > 0, and |P_h| is the cardinality of this set. The binned version of the estimator is computed for a sequence of distances h that are the centers of a series of equally spaced intervals dividing the range of the pairwise distances. The unbinned or cloud version is computed with the sequence of observed distances; in this case |P_{‖s_j − s_r‖}| = 1 for every observed distance ‖s_j − s_r‖. An estimate of the extremal coefficient is obtained by ξ̂(h) = {1 + 2ν̂(h)}/{1 − 2ν̂(h)}. The next code shows how to obtain a non-parametric estimate of the extremal coefficient with the routine EVariogram.
R> fmcbr <- EVariogram(bres, coords, type = "Fmadogram",
+    replicates = 30, cloud = TRUE)
R> fmbbr <- EVariogram(bres, coords, type = "Fmadogram",
+    replicates = 30, numbins = 10)

The pairwise (cloud) extremal coefficients (first call) and the binned version with 10 equally spaced bins (second call) are computed. With standard R commands, the parametric and non-parametric estimates of the extremal coefficient can be graphically represented. Figure 7 presents the results. The panels report, for the three max-stable RFs, the evolution of the estimated extremal coefficients versus the spatial distances. We see that the estimates provided by the binned F-madogram, the CL and the least squares methods are quite similar, as expected.
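To make the F-madogram construction concrete, the following Python sketch (an illustration only, not the package code) computes the empirical F-madogram for a single pair of sites, assuming exactly unit Fréchet margins so that F(z) = exp(−1/z), and converts it into an extremal coefficient:

```python
import numpy as np

def fmadogram_pair(z_j, z_r):
    """Empirical F-madogram for one pair of sites from n replicates,
    assuming unit Frechet margins: F(z) = exp(-1/z)."""
    F = lambda z: np.exp(-1.0 / z)
    return 0.5 * np.mean(np.abs(F(z_j) - F(z_r)))

def extremal_coeff(nu):
    """Extremal coefficient implied by the F-madogram value nu."""
    return (1.0 + 2.0 * nu) / (1.0 - 2.0 * nu)

# Perfect dependence: identical samples at the two sites give nu = 0,
# hence an extremal coefficient of 1 (complete dependence).
u = np.random.default_rng(1).uniform(size=30)
z = 1.0 / -np.log(u)                     # unit Frechet draws via the quantile map
print(extremal_coeff(fmadogram_pair(z, z)))  # 1.0
```

Independent sites would instead give a value close to 2, the upper bound of the pairwise extremal coefficient.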

Real data example: Irish wind speed data
We consider daily wind speeds collected over 18 years, from 1961 to 1978, at 12 sites in Ireland (Figure 8). These were originally analyzed in Haslett and Raftery (1989). Following Gneiting, Genton, and Guttorp (2007), we omit the Rosslare station, take the square root transformation of the data and remove the seasonal component. The latter is estimated by calculating the average of the square roots of the daily means over all years and stations and regressing this on a set of annual harmonics. The resulting transformed data, {z(s_j, t_i), j = 1, . . . , 11, i = 1, . . . , 6574}, are assumed to be a realization from a zero-mean Gaussian spatio-temporal RF. For comparison purposes we also perform ML estimation and, in order to reduce the computational time, we restrict our analysis to the first half of the year 1962. This leaves 11 × 183 = 2013 observations, for which ML estimation is still feasible, although very slow. We start the analysis by estimating the dependence structure non-parametrically. The next code shows how to estimate the empirical spatio-temporal variogram with the routine EVariogram.
R> data("irishwinds")
R> coords <- as.matrix(winds.coords[-3, 5:4], ncol = 2)
R> times <- (365 + 1):(365 + 183)
R> data <- winds[times, ]
R> evar <- EVariogram(data, coords, coordt = times, maxtime = 7,
+    distance = "Chor")

If the coordinates of the location sites are given in latitude and longitude, as in this case, it is possible to compute the chordal or the geodesic distance in km (Banerjee 2005; Nychka et al. 2014) by setting distance = "Chor" or distance = "Geod". Note that only some of the correlation models described in Table 1 remain valid when the geodesic distance is used instead of the Euclidean distance. For instance, the Matérn and the Cauchy models are still valid with the geodesic distance under some restrictions on the parameters (Gneiting 2013).
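For intuition, the two distances can be sketched as follows in Python (haversine formula, a mean Earth radius of 6371 km and approximate station coordinates are assumed; the package's internal computation may differ):

```python
import math

R_EARTH = 6371.0  # mean Earth radius in km (assumed)

def geodesic_km(lon1, lat1, lon2, lat2):
    """Great-circle (geodesic) distance via the haversine formula."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * R_EARTH * math.asin(math.sqrt(a))

def chordal_km(lon1, lat1, lon2, lat2):
    """Straight-line (chordal) distance cutting through the sphere."""
    return 2 * R_EARTH * math.sin(geodesic_km(lon1, lat1, lon2, lat2) / (2 * R_EARTH))

# Dublin to Valentia (approximate coordinates): the chord is always
# slightly shorter than the arc along the surface.
d_geo = geodesic_km(-6.26, 53.35, -10.25, 51.94)
d_cho = chordal_km(-6.26, 53.35, -10.25, 51.94)
```

At the scale of Ireland the two distances differ very little, but the distinction matters for the validity of the correlation models mentioned above.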
For simplicity, by setting maxtime we restrict the computation of the spatio-temporal variogram to pairs of times whose intervals are smaller than or equal to 7 days. Figure 9 reports an illustration of the estimated variogram. The code to reproduce the plots will be discussed in the next paragraph, for comparison with the parametric estimate of the variogram. The top-left panel shows, with points, the estimated empirical spatio-temporal variogram (vertical axis) versus the spatial distances (horizontal axis) and time intervals (oblique axis). Estimates are in matrix format, where each row (column) is a temporal (spatial) variogram for a given spatial distance (temporal instant). The bottom-left and right panels show, with dots, respectively a spatial and a temporal variogram for the fixed temporal interval equal to 0 days and the spatial distance equal to 0 km. Both are simply sections of the spatio-temporal variogram.
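A binned empirical spatio-temporal variogram of the kind just described can be sketched as follows (a simplified Python illustration, not the algorithm used by EVariogram):

```python
import numpy as np

def st_variogram(data, coords, max_lag_t, bins_s):
    """Binned empirical spatio-temporal variogram.
    data: (T, S) array of observations; coords: (S, 2) site coordinates;
    bins_s: spatial bin edges. Returns a (len(bins_s) - 1, max_lag_t + 1)
    matrix of semivariances, one row per spatial bin, one column per time lag."""
    T, S = data.shape
    nbin = len(bins_s) - 1
    acc = np.zeros((nbin, max_lag_t + 1))
    cnt = np.zeros((nbin, max_lag_t + 1))
    # pairwise spatial distances and the bin index of each pair
    d = np.hypot(coords[:, None, 0] - coords[None, :, 0],
                 coords[:, None, 1] - coords[None, :, 1])
    b = np.digitize(d, bins_s) - 1
    for j in range(S):
        for r in range(S):
            if not (0 <= b[j, r] < nbin):
                continue  # pair outside the requested spatial bins
            for u in range(max_lag_t + 1):
                diffs = data[:T - u, j] - data[u:, r]  # increments at time lag u
                acc[b[j, r], u] += np.sum(diffs ** 2)
                cnt[b[j, r], u] += diffs.size
    return 0.5 * acc / np.where(cnt > 0, cnt, np.nan)

# Sanity check: a constant field has zero semivariance at every lag
coords4 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
gamma = st_variogram(np.ones((30, 4)), coords4, max_lag_t=3, bins_s=[0.0, 0.75, 1.5])
```

The rows of the returned matrix correspond to the temporal variograms at fixed spatial distances plotted in Figure 9, and the columns to the spatial variograms at fixed time lags.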
We now consider a parametric estimate of the variogram. We focus on the Gneiting model (Gneiting 2002) and the double exponential model, see Table 1. The first is non-separable for 0 < η ≤ 1 and separable for η = 0; the second is separable. The aim is to investigate whether, for these data, the dependence structure is separable. Observe that, when jointly estimating all the parameters, the likelihood surface is multimodal; in order to avoid this we keep η fixed and estimate the correlation parameters for different values of the separability parameter η in [0, 1]. Using the FitComposite routine, the covariance models are fitted with the maximum composite, tapered and standard likelihood estimators. The next code illustrates the case η = 0.5. The code for the other values of η and for the double exponential covariance is similar.
R> cormod <- "gneiting"
R> fixed <- list(mean = 0, nugget = 0, power_s = 1, power_t = 1, sep = 0.5)
R> start <- list(scale_s = 600, scale_t = 1, sill = var(c(data)))
R> wls <- WLeastSquare(data, coords, coordt = times, corrmodel = cormod,
+    maxdist = 400, maxtime = 7, fixed = fixed, start = start,
+    distance = "Chor", weighted = TRUE)
R> start <- as.list(wls$param)
R> CL <- FitComposite(data, coords, coordt = times, corrmodel = cormod,
+    likelihood = "Marginal", type = "Pairwise", maxdist = 400, maxtime = 4,
+    fixed = fixed, start = start, distance = "Chor", varest = TRUE)
R> start <- as.list(CL$param)
R> TP <- FitComposite(data, coords, coordt = times, corrmodel = cormod,
+    likelihood = "Full", type = "Tapering", taper = "Wendland2_Wendland2",
+    maxtime = 4, maxdist = 400, fixed = fixed, start = start,
+    distance = "Chor", varest = TRUE)
R> start <- as.list(TP$param)
R> ML <- FitComposite(data, coords, coordt = times, corrmodel = cormod,
+    likelihood = "Full", type = "Standard", fixed = fixed, start = start,
+    distance = "Chor", varest = TRUE)

A preliminary estimate is first obtained with the weighted least squares method (Diggle and Ribeiro 2007); it is used as a starting value for the optimization of the likelihood-based methods. The first call to FitComposite performs the marginal pairwise likelihood estimation: with maxdist and maxtime, pairs of observations that are more than 400 km apart or separated by more than four days are excluded. The second call performs the tapered likelihood estimation, using a Wendland space-time separable taper function (Table 2), with the same settings as the CL; this leads to a tapered covariance matrix with 4.4% of non-zero values. The last call performs ordinary ML estimation. Table 3 presents the estimation results. The three estimation methods provide similar results and the standard ML method exhibits the lowest standard errors, as expected. However, the CL method is computationally the fastest.
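For a zero-mean Gaussian RF, the marginal pairwise likelihood maximized above is a sum of bivariate normal log-densities over the retained pairs. The following purely spatial Python sketch (exponential correlation and hypothetical data assumed; not the package implementation) shows the role of the maxdist cutoff:

```python
import numpy as np

def pairwise_loglik(theta, z, coords, maxdist):
    """Gaussian pairwise log-likelihood: sum of bivariate normal
    log-densities over all pairs closer than maxdist.
    theta = (sill, scale); an exponential correlation is assumed."""
    sill, scale = theta
    ll = 0.0
    n = len(z)
    for j in range(n):
        for r in range(j + 1, n):
            h = np.linalg.norm(coords[j] - coords[r])
            if h > maxdist:
                continue  # maxdist drops distant, weakly informative pairs
            rho = np.exp(-h / scale)
            det = sill ** 2 * (1 - rho ** 2)                      # det of 2x2 covariance
            q = (z[j] ** 2 - 2 * rho * z[j] * z[r] + z[r] ** 2) / (sill * (1 - rho ** 2))
            ll += -0.5 * (2 * np.log(2 * np.pi) + np.log(det) + q)
    return ll

# Hypothetical data at three sites: tightening maxdist keeps fewer pairs
coords3 = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
z3 = np.array([0.5, -0.2, 0.1])
ll_all = pairwise_loglik((1.0, 2.0), z3, coords3, maxdist=10.0)
ll_cut = pairwise_loglik((1.0, 2.0), z3, coords3, maxdist=1.5)
```

Dropping distant pairs is what makes the CL evaluation so much cheaper than the full likelihood, at a small cost in statistical efficiency.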
Indeed, with these data the numerical evaluation of the standard, tapered and composite likelihoods requires an elapsed time of 3.4, 1.1 and 0.05 seconds respectively (the times are computed with the system.time routine of R on a machine with a 2.27 GHz processor and 6 GB of memory). Therefore, it is 68 (22) times faster to evaluate the composite likelihood than the standard (tapered) likelihood. In the sixth column of Table 3, the AIC/CLIC values suggest that, for this dataset, the separable model provides the best fit. This indicates that the underlying RF can be considered as the product of a purely spatial and a purely temporal RF. The next code shows how to compute and graphically represent the spatio-temporal variogram with the Covariogram routine.
R> Covariogram(CL, vario = evar, fix.lagt = 1, fix.lags = 1,
+    show.vario = TRUE, pch = 19)

Both the parametric and non-parametric estimates of the variogram are represented. The top-right panel of Figure 9 displays the variogram estimated with the CL. The bottom-left and right panels show, with solid lines, estimates of the spatial and temporal variograms conditional on the fixed temporal interval equal to 0 days and the spatial distance equal to 0 km. From the top and bottom panels, comparing the dots with the lines, we can see that the empirical and model-based variogram estimators provide similar results.

Table 3: Fits of the spatio-temporal Gaussian process with Gneiting models to the Irish winds data. Results are reported for the separability parameter η = 0, 0.5, 1. The first two columns give the estimation method and the correlation function used. From the third to the sixth column the parameter estimates (standard errors) are reported; σ², ψ_s and ψ_t are the sill, the spatial scale and the temporal scale parameters. AIC/CLIC is the information criterion or composite information criterion.
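Separability, as diagnosed by the AIC/CLIC comparison, means that the space-time correlation factorizes into a purely spatial and a purely temporal term. The following Python sketch illustrates this property numerically, using one common parametrization of the Gneiting correlation (assumed here purely for illustration; the exact form implemented in the package may differ):

```python
import numpy as np

def gneiting_corr(h, u, scale_s=600.0, scale_t=1.0,
                  power_s=1.0, power_t=1.0, sep=0.5):
    """One common parametrization of the Gneiting space-time correlation
    (assumed for illustration): the temporal term psi(u) enters the
    spatial decay through the separability parameter sep."""
    psi = 1.0 + (u / scale_t) ** power_t
    return np.exp(-(h / scale_s) ** power_s / psi ** (0.5 * sep * power_s)) / psi

h, u = 200.0, 3.0
# With sep = 0 the correlation factorizes into spatial and temporal parts
sep0 = gneiting_corr(h, u, sep=0.0)
prod = gneiting_corr(h, 0.0, sep=0.0) * gneiting_corr(0.0, u, sep=0.0)
```

With sep = 0 the two quantities coincide, while for sep > 0 the product form no longer holds, which is exactly what the fitted values of η in Table 3 are probing.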
Finally, spatio-temporal simple kriging is performed using the estimated Gneiting correlation model (with η = 0). The prediction set consists of the ungauged sites within the coastline of Ireland at two consecutive times, namely days 549 and 550. The next code shows how to obtain the predictions with the Kri routine.
R> lon <- seq(min(coords[, 1]) - 0.3, max(coords[, 1]) + 0.85, 0.06)
R> lat <- seq(min(coords[, 2]) - 0.3, max(coords[, 2]) + 0.13, 0.06)
R> pr.s <- as.matrix(expand.grid(lon, lat))
R> pr.t <- (times[length(times)] + 1):(times[length(times)] + 2)
R> par <- as.list(c(CL$param, fixed))
R> pr <- Kri(data, coords, coordt = times, corrmodel = cormod,
+    distance = "Chor", loc = pr.s, param = par, time = pr.t)

Figure 10 displays the prediction maps for the two selected days. The plots have been produced with standard R code and the map routine of the package mapproj (McIlroy, Brownrigg, Minka, and Bivand 2014). Specifically, the top panels show the prediction maps for the two consecutive days. We can see that high wind speeds are concentrated from the north-east to the center of the island, decreasing when approaching the south and the west coast. The bottom panels show the prediction standard error maps. As expected, the standard errors decrease in the parts of the island where the gauging weather stations are located and increase from the first to the second predicted day.
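For a zero-mean field, the simple kriging predictor behind Kri is ẑ(s0) = c0'C⁻¹z with prediction variance C(0) − c0'C⁻¹c0, where C is the covariance matrix of the observations and c0 the vector of covariances between the observations and the prediction location. A minimal purely spatial Python sketch (hypothetical exponential covariance, not the fitted Gneiting model):

```python
import numpy as np

def simple_kriging(obs_coords, z, pred_coords, cov):
    """Zero-mean simple kriging: predictor c0' C^{-1} z and
    prediction variance cov(0) - c0' C^{-1} c0 at each new location."""
    def cmat(a, b):
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
        return cov(d)
    C = cmat(obs_coords, obs_coords)        # covariance among observations
    c0 = cmat(obs_coords, pred_coords)      # covariances to prediction sites
    w = np.linalg.solve(C, c0)              # kriging weights, one column per site
    pred = w.T @ z
    var = cov(0.0) - np.sum(w * c0, axis=0)
    return pred, var

# Illustration with an assumed exponential covariance and hypothetical data
cov = lambda d: np.exp(-np.asarray(d) / 2.0)
obs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
z = np.array([1.2, -0.5, 0.3])
pred, var = simple_kriging(obs, z, obs, cov)  # predict back at the data sites
```

Predicting back at the observed sites reproduces the data with zero prediction variance, a standard sanity check; this also explains why the standard error maps in Figure 10 shrink near the gauged weather stations.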

Conclusion
The composite likelihood is a wide class of estimating functions that generalizes the standard likelihood and includes different alternatives to it. The wide range of likelihood versions it provides makes it a very flexible estimation method that has attracted growing interest in recent years. It has been successfully used in many statistical fields and applications, see Varin et al. (2011); spatial and spatio-temporal applications are among these.
In this paper we have discussed the R package CompRandFld for applying composite likelihood inferential methods to the analysis of spatial and spatio-temporal data. At the moment, the code available in this package can serve the community interested in the statistical analysis of spatio-temporal Gaussian and binary data, and of spatial extremes. We aim to expand the package in different directions, for example by including other types of composite likelihoods such as the triplewise marginal likelihood, mixtures of pairwise marginal and univariate likelihoods, and marginal likelihoods of blocks. In addition, we intend to implement composite likelihoods for categorical and count data and to extend the code to the estimation of multivariate Gaussian RFs.