spTest: An R Package Implementing Nonparametric Tests of Isotropy

An important step of modeling spatially-referenced data is appropriately specifying the second order properties of the random field. A scientist developing a model for spatial data has a number of options regarding the nature of the dependence between observations. One of these options is deciding whether or not the dependence between observations depends on direction, or, in other words, whether or not the spatial covariance function is isotropic. Isotropy implies that spatial dependence is a function of only the distance and not the direction of the spatial separation between sampling locations. A researcher may use graphical techniques, such as directional sample semivariograms, to determine whether an assumption of isotropy holds. These graphical diagnostics can be difficult to assess, subject to personal interpretation, and potentially misleading as they typically do not include a measure of uncertainty. In order to escape these issues, a hypothesis test of the assumption of isotropy may be more desirable. To avoid specification of the covariance function, a number of nonparametric tests of isotropy have been developed using both the spatial and spectral representations of random fields. Several of these nonparametric tests are implemented in the R package spTest, available on CRAN. We demonstrate how graphical techniques and the hypothesis tests programmed in spTest can be used in practice to assess isotropy properties.


Introduction
An important step in modeling a spatial process is choosing the form of the covariance function. The form of the covariance function will have an effect on kriging as well as parameter estimates and the associated uncertainty (Cressie 1993, pp. 127-135). A common simplifying assumption about the spatial covariance function is that it is isotropic, that is, the dependence between sampling locations depends only on the distance between locations and not on their relative orientation. This assumption may not always be reasonable; for example, wind may lead to directional dependence in environmental monitoring data. Misspecification of the second order properties can lead to misleading inference. Sherman (2011, pp. 87-90) and Guan, Sherman, and Calvin (2004) summarize the effects of incorrectly specifying isotropy properties on kriging estimates through numerical examples. In order to choose an appropriate model, a statistician must first assess the nature of the spatial variation of his or her data. To check for anisotropy (directional dependence) in spatially-referenced data, a number of graphical techniques are available, such as directional sample variograms or rose diagrams (Matheron 1961;Isaaks and Srivastava 1989, pp. 149-154). Spatial statisticians may have an intuition about the interpretation and reliability of these diagnostics, but a less experienced user may desire evaluation of assumptions via a hypothesis test.
A number of nonparametric tests of isotropy, which avoid the choice of a parametric covariance function, have been developed (see, e.g., Lu and Zimmerman 2001;Guan et al. 2004;Lu and Zimmerman 2005;Maity and Sherman 2012). In Weller and Hoeting (2016), we provide a review of the different nonparametric methods available for testing isotropy and symmetry properties, including an extensive simulation study. In the current work, we aim to showcase the functionality of our R (R Core Team 2017) package spTest (Weller 2018), which implements several of the aforementioned methods. Package spTest is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=spTest. We use two real data examples to illustrate how the nonparametric hypothesis tests available in package spTest can be used to assess isotropy properties in spatially-referenced data. The examples also demonstrate how graphical techniques and hypothesis tests can be used in a complementary role. The remainder of the paper is organized as follows: Section 2 establishes notation and definitions; Section 3 describes the functionality of the spTest package; Section 4 demonstrates how to use the functions in package spTest in conjunction with graphical techniques on two different data sets; Section 5 concludes the paper with a discussion.

Spatial statistics
In Section 2.1 we review key definitions required for describing and performing nonparametric tests of isotropy. In particular we review the concepts of stationarity and isotropy in random spatial processes. Readers who are familiar with these concepts may wish to proceed to Section 2.2. For additional background, see Weller and Hoeting (2016) or Schabenberger and Gotway (2004). and/or predict Y with associated uncertainty at new locations. To achieve these goals, we must specify the distributional properties of the spatial process. A common assumption is that the finite-dimensional joint distribution of {Y (s) : s ∈ D ⊆ 2 } is multivariate normal, in which case we call the RF a Gaussian random field (GRF).
A RF is strictly stationary if its distribution is invariant under coordinate translation, P(Y (s 1 ) < y 1 , · · · , Y (s k ) < y k ) = P(Y (s 1 + h) < y 1 , · · · , Y (s k + h) < y k ) for all spatial lags h = (h 1 , h 2 ) ∈ 2 and k ∈ Z + . A RF is weakly, or second order, stationary if E(Y (s)) = E(Y (s + h)) = µ and where C(h) < ∞ is called the covariance function or covariogram. Henceforth we assume µ = 0. A RF that is weakly stationary (2) has a constant mean, and the covariance between values at two spatial locations depends only on the spatial lag. Absolute coordinates are irrelevant for the mean, variance, and covariance of a weakly stationary RF. A RF is intrinsically stationary if E(Y (s + h) − Y (h)) = 0 and where 2γ(h) is called the variogram function or variogram, and γ(h) = 1 2 VAR(Y (s+h)−Y (s)) is the semivariogram function or semivariogram. Note that intrinsic stationarity is defined in terms of the increments Y (s + h) − Y (s).
The relationship between the different types of stationarity is given by strict =⇒ weak =⇒ intrinsic. (4) In general the converse in (4) is not true; however, the converse in (4) holds under the assumption that the RF is a GRF. From (2) and (3) it follows that γ(h) = C(0) − C(h) when the RF is weakly stationary. This relationship implies that the second order properties of a weakly stationary RF can be viewed from the perspective of either the (semi)variogram or covariance function.
A common simplifying assumption about the spatial dependence structure is that it is isotropic.
Definition 1 A weakly stationary RF is isotropic if the covariance function, C(h), (or equivalently, the semivariogram function) of the RF depends on the lag vector h only through its Euclidean length, h , i.e., C(h) = C 0 ( h ) for some function C 0 (·) of a univariate argument.
Isotropy implies that the dependence between any two observations depends only on the Euclidean distance between their sampling locations and not on their relative orientation. A spatial process that is not isotropic is called anisotropic. Figure 1 displays data simulated from two stationary RFs, one which is isotropic and one that is anisotropic. The methods in the R package spTest are designed to assist in determining whether or not an assumption of isotropy is reasonable.
When modeling a RF, a typical assumption is that the spatial covariance, C(h), or semivariogram function, γ(h), can be described by a parametric model. A number of methods are available for estimating the parameters of these models, for example, maximum likelihood

Stationary and Anisotropic Spatial Process
Longitude Latitude (b) Figure 1: Heat maps displaying simulated data from two stationary RFs. Figure 1a displays data simulated from an isotropic covariance function. Figure 1b shows data simulated from an anisotropic covariance function. The existence of anisotropy in Figure 1b is evidenced by bands of similar values running the northwest to southeast directions. These bands indicate that spatial dependence is stronger in the NW-SE direction than the SW-NE direction.
or least squares (Cressie 1993, pp. 90-97). Parametric models ensure that the covariance function is valid and provide parameters that can be interpreted as describing characteristics of the random field, such as the range of dependence or the direction of anisotropy (Schabenberger and Gotway 2004, pp. 141-152). The methods in package spTest are nonparametric tests of isotropy, which circumvent the choice of a parametric form for the covariance (semivariogram) function. A nonparametric test of isotropy escapes potential misspecification of a parametric covariance function and avoids the potential for having to estimate the covariance function twice (e.g., under the null and alternative hypothesis for a likelihood-ratio test).
A nonparametric test of isotropy requires a nonparametric estimator of second order properties of the RF. We discuss nonparametric estimation of the semivariogram function here and note that similar techniques can be used for nonparametric estimation of the covariance function. The classical moment-based estimator of the semivariogram (Matheron 1962) is the sample semivariogram given by where the sum is over D(h) = {s : s, s + h ∈ D} and |D(h)| is the number of elements in D(h). The set D(h) is the set of sampling location pairs that are separated by spatial lag h. There are two important modifications to the estimator in (5) that are pertinent to the methods described in this paper. First, for non-gridded sampling locations, the estimator needs to be modified to account for the fact that very few or no pairs of locations will be separated by a specific spatial lag, h. One solution to this challenge is to specify a distance tolerance, , such that lags having length h ± are included in estimating the semivariogram at lag h. Second, directional sample semivariograms can be estimated by using only observations that are separated by spatial lags in a specific direction. For example, to investigate potential anisotropy, we can compare sample semivariograms between the horizontal and vertical directions. For non-gridded sampling locations, very few pairs of locations will lie at a specific distance and directional lag, so we need to allow for both a distance and a directional tolerance when estimating the semivariogram. A common method for doing this is by using a product kernel smoother that smoothes over both the horizontal (h 1 ) and vertical (h 2 ) components of the spatial lag h = (h 1 , h 2 ) .
Spatial RFs and their second order properties can also be expressed in the spectral (or frequency) domain using Fourier transforms. The spectral representation of RFs and their second order properties provide alternative methods for testing second order properties. For brevity we focus only on the methods in the spatial domain and refer the interested reader to Weller and Hoeting (2016) or Fuentes and Reich (2010). We note that that, in addition to the methods from the spatial domain, the nonparametric spectral methods from Lu and Zimmerman (2005) are also implemented in package spTest. Lu (1994) and Lu and Zimmerman (2001) pioneered a popular approach to testing secondorder properties when they used the joint asymptotic normality of the sample semivariogram computed at different spatial lags to evaluate symmetry and isotropy properties. The subsequent works of Guan et al. (2004); Guan, Sherman, and Calvin (2007) and Maity and Sherman (2012) built on these ideas and are the primary methods programmed in package spTest. Here we give an overview of the tests in Guan et al. (2004) and Maity and Sherman (2012).

Nonparametric tests of isotropy
Under the null hypothesis that the RF is isotropic, it follows that the values of γ(·) evaluated at any two spatial lags that have the same norm are equal, independent of the direction of the lags. For example, under the assumption of isotropy, γ((−6, 0)) = γ √ 3, √ 3 . To completely specify the null hypothesis of isotropy, theoretically, one would need to compare semivariogram values for an infinite set of lags. In practice, a small number of lags are specified. Then it is possible to test a hypothesis consisting of a set of linear contrasts of the form H 0 : Aγ(·) = 0 (6) as a proxy for the null hypothesis of isotropy, where A is a full row rank matrix (Lu and Zimmerman 2001). For example, a set of lags, denoted Λ, commonly used in practice on gridded sampling locations with unit spacing is and the corresponding A matrix under H 0 : Aγ(Λ) = 0 is In other words, using (7) and (8) Guan et al. (2004), LZ: Lu and Zimmerman (2005), and MS: Maity and Sherman (2012). The column "spTest function" lists the name of the function used to implement the test in the spTest package. "Design" indicates the spatial sampling design for which the test is valid. "Estimator" describes the method used to estimate second order properties. Note that a "uniform" design implies sampling locations must be uniformly distributed on the domain.
lags h 3 = (1, 1) and h 4 = (−1, 1). An important step in detecting anisotropy is the choice of lags, Λ, as the test results will only hold for the set of lags considered (Guan et al. 2004). While this choice is somewhat subjective, there are several considerations for determining the set of lags. First, in terms of Euclidean distance, short lags should be chosen. Short lags are preferred because estimates of the semivariogram at long lags tend to be more variable. Second, pairs of orthogonal lags should be contrasted because, for an anisotropic process, the directions of strongest and weakest spatial correlation will typically be orthogonal. For other considerations and more detailed guidelines regarding the choice of lags, see Weller and Hoeting (2016), Lu and Zimmerman (2001), and Guan et al. (2004).
The tests in Guan et al. (2004) and Maity and Sherman (2012) require estimating the semivariogram and covariance function values, respectively, at the set of chosen lags. We denote the vector of point estimates of the semivariogram/covariance function at the chosen lags as G n , the true values as G, and the asymptotic variance-covariance matrix of G n as Σ. Under increasing domain asymptotics, a central result for both methods is that where a n is a normalizing constant. This result holds assuming stationarity of the RF and mild conditions on the RF's mixing and moments. See Section 5 for additional discussion regarding the assumptions for the tests. The test statistic is a quadratic form where Σ is an estimate of the asymptotic variance-covariance matrix and b n is a normalizing constant. A p value can be obtained from the asymptotic χ 2 distribution with degrees of freedom given by the row rank of A. Important differences between these works regard the distribution of sampling locations, shape of the sampling domain, and estimation of G and Σ. Table 1 outlines the primary differences between these methods; we refer the interested reader to Weller and Hoeting (2016) for more details.

Nonparametric tests implemented in spTest
The R package spTest includes functions for implementing the tests developed in Guan et al. (2004), Lu and Zimmerman (2005), and Maity and Sherman (2012). The spTest functions for implementing these tests are listed in Table 1. For example, the test from Guan et al. (2004) for data observed at non-gridded, but uniformly distributed, sampling locations is implemented in the function GuanTestUnif, which takes the following arguments: GuanTestUnif(spdata, lagmat, A, df, h = 0.7, kernel = "norm", truncation = 1.5, xlims, ylims, grid.spacing = c(1, 1), window.dims = c(2, 2), subblock.h, sig.est.finite = TRUE) There are several necessary inputs. spdata must include the coordinates of sampling locations and the corresponding data values. This input can be a matrix or an object created by either the sp package (Pebesma and Bivand 2005;Bivand, Pebesma, and Gomez-Rubio 2013) or the geoR package (Ribeiro Jr. and Diggle 2001). The spatial lags used to estimate the semivariogram, denoted Λ, are specified in the matrix lagmat. The matrix A in (10) is specified by A and provides the contrasts of the semivariogram estimates, and its row rank is indicated by the parameter df (the degrees of freedom for the asymptotic χ 2 distribution). The values h and kernel provide the bandwidth (smoothing) parameter and form of the kernel smoother, respectively, used to smooth over spatial lags when estimating the semivariogram. If a normal smoothing kernel is used, then the truncation parameter indicates where to truncate the normal kernel (i.e., zero weight for spatial lags larger than this value). The parameters xlims and ylims give the horizontal and vertical limits of the sampling region (a rectangular sampling region is assumed). When performing a nonparametric test of isotropy for non-gridded sampling locations, we must place a grid on the sampling domain and choosing a moving window or block size to estimate Σ in (9) (see Section 4.2). A grid is placed over the sampling region according to the width and height specified by grid.spacing. The dimensions of the moving window, given in the units of the underlying grid, are determined by the values in window.dims. The bandwidth of the smoothing kernel used to estimate the semivariogram on the subblocks of data created by the moving window is indicated by subblock.h and a finite sample adjustment to the estimate of the asymptotic variance-covariance matrix is made by setting sig.est.finte = TRUE. For more information about the different arguments and guidelines on how to choose them, see Weller and Hoeting (2016), the spTest manual, and the original works (Guan et al. 2004(Guan et al. , 2007Maity and Sherman 2012).

Applications: Using spTest to check for anisotropy
We demonstrate the functionality of the spTest package on two data sets: the first containing data at gridded sampling locations; the second containing data collected via a non-gridded sampling design. For more details on the functions and examples using simulated data, see the spTest manual. The spTest package can be used independently of other packages built for analyzing spatial data, but it works nicely with two other packages loaded into R: fields (Nychka, Furrer, Paige, and Sain 2017) and geoR (Ribeiro Jr. and Diggle 2001). We also load the splines (R Core Team 2017), MASS (Venables and Ripley 2002), and rgdal (Bivand, Keitt, and Rowlingson 2017) packages, which we use to estimate mean functions, compute studentized residuals, and calculate map projection coordinates, respectively.
R> library("spTest") R> library("fields") R> library("geoR") R> library("splines") R> library("MASS") R> library("rgdal") For the two examples given below, we use graphical diagnostics and the hypothesis tests implemented in package spTest to determine whether or not an assumption of isotropy is reasonable for spatially-referenced data. The general strategy will be to first do exploratory data analysis (EDA) of the original data and create a model for the mean of the spatial process using appropriate covariates. After estimating a model for the mean, we extract residuals and again use EDA to check for remaining spatial dependence and utilize graphical diagnostics and hypothesis tests to investigate potential anisotropy. For brevity, we have not included the full version of EDA code and plots; instead, we include only the most relevant code to demonstrate the functionality of the spTest package. The complete version of the code is available in the supplementary material.

Gridded sampling locations
The gridded data used in this section come from the North American Regional Climate Change Assessment Program (NARCCAP; Mearns et al. 2009). The data set WRFG in package spTest includes coordinates and a 24-year average of yearly average temperatures from runs of the Weather Research and Forecasting -Grell configuration (WRFG) regional climate model (RCM) using boundary conditions from the National Centers for Environmental Prediction (NCEP). The original data are available on the NARCCAP website and the R code used to create the yearly averages is available on GitHub (Weller 2016). The data set contains both latitude and longitude and universal transverse mercator (UTM) coordinates. The UTM coordinates specify the regular grid for 14,606 grid boxes along with average temperature at surface at each grid box. Figure 2 displays a heat map of all of the data and was created using the image.plot function from the fields package. Due to computational considerations and because the methods in package spTest assume stationarity, for our analysis we use a 20 × 20 subset of the grid boxes defined by the UTM coordinates over the central col = two.colors(n = 256, start = "blue3", end = "red3", + middle = "gray60"), legend.lab = "Temp (K)", legend.cex = 0.8, + legend.line = 2.2, xlab = "Longitude", ylab = "Latitude", + main = "Mean WRFG-NCEP Temperatures") R> world(add = TRUE) R> left <-seq(1, 400, by = 20)  To investigate potential anisotropy in the relevant subset of these data, we can examine two graphical diagnostics: a heat map and directional sample semivariograms. We use the function variog4 from the geoR package to estimate directional semivariograms to visually assess isotropy properties. indicates that the dependence between observations is stronger in the east-west direction than the north-south direction. The directional dependence is also evidenced by the differences between the directional sample semivariograms (Figure 3b).
R> tas.mat <-matrix(tas, nrow = nx, ncol = ny, byrow = FALSE) R> image.plot(x.coord, y.coord, tas.mat, col = two.colors(n = 256, + start = "blue3", end = "red3", middle = "gray60"), + legend.lab = "Temp (K)", legend.cex = 0.8, legend.line = 2.2, + ylab = "Northing", xlab = "Easting", main = "Subset of Temperatures") R> tas.geodat <-as.geodata(cbind(coords[sub, 1], coords[sub, 2], tas)) R> plot(variog4(tas.geodat), xlab = "distance (meters)", + ylab = "estimated semivariogram") R> title("Directional Sample Semivariograms") The heat map in Figure 3a indicates that the spatial process is anisotropic, having a stronger spatial dependence in the horizontal direction than the vertical direction. Intuitively, northing coordinates are an important factor in determining average temperature, and we need to include its effect in a model for these data. We also notice non-linear trends in temperature as a function of easting coordinates in Figure 3a. Thus, the anisotropy can be attributed, at least in part, to the fact that we have not modeled important covariates related to the process. The directional sample semivariograms in Figure 3b reaffirm the notion that the data exhibit anisotropy as the 90 • sample semivariogram appears much different than the other three. Before modeling the effects of northing and easting coordinates, we use the GuanTestGrid function from package spTest to affirm our understanding that these data exhibit anisotropy.
Necessary conditions for the asymptotic properties of the nonparametric tests to hold are typically met when the data are Gaussian (see Section 5). A quantile-quantile (QQ) plot (not shown) of the relevant subset of WRFG temperatures indicates that a Gaussian assumption is reasonable. To implement the nonparametric test in Guan et al. (2004) via the function GuanTestGrid, we need to specify the spatial lags that will be used to test for differences in the semivariogram. For this test we choose the lag set (7) and use the matrix A in (8) to contrast the semivariogram estimates. With the first row of A and the first two entries of Λ, we are contrasting the estimated dependence structure in the 0 • (h 1 ) and 90 • (h 2 ) directions for data separated by one horizontal or vertical sampling location. The second row of A and second two entries of Λ contrast the estimated dependence structure in the 45 • (h 3 ) and 135 • (h 4 ) directions for data separated by one diagonal sampling location. Because the grid spacing between sampling locations is 50,000 meters, we set the the scaling parameter delta = 50,000. To create subblocks of data used to estimate Σ in (10), we choose a moving window with a size of 4 × 4 grid cells. The moving window dimensions should be chosen so that the window has the same shape (i.e., square or rectangle) and orientation as the sampling domain.
To maximize the amount of data used to estimate Σ, the dimensions of the window should evenly divide the number of columns and rows, respectively, of the entire region. The window dimensions should also be compatible with the spatial lags in Λ. For example, if sampling locations are on the integer grid Z 2 , a window with dimensions of 2 × 2 grid cells cannot be used to estimate the variability of the semivariogram at a lag with Euclidean distance longer than √ 2, the maximum distance between locations in the moving window. For this example there are 20 rows and columns, and we are using lags with spacings of one or two sampling locations; hence, window dimensions of 2 × 2 or 4 × 4 grid cells are reasonable choices. We run the test using window dimensions of 4 × 4 grid cells via the following code, suppressing some of the output for brevity.
As previously mentioned, we need to model the effects of northing and easting UTM coordinates on average temperature. We fit temperature as a nonparametric additive function of both the northing and easting coordinates via least-squares using cubic splines. The cubic splines can be specified using the function ns from the splines package and the least squares fit is computed via the lm function.

R> m1 <-lm(tas~ns(coords[sub, 1], df = 3) + ns(coords[sub, 2], df = 3)) R> summary(m1)
After removing the mean effects of the coordinates, we can check for any remaining (unaccounted for) spatial dependence and evidence of anisotropy in the residuals using graphical diagnostics and a hypothesis test. A QQ plot of the studentized residuals (not shown) indicates that a Gaussian assumption is reasonable.
R> resid.mat <-matrix(studres(m1), nrow = nx, ncol = ny, byrow = FALSE) R> image.plot(x.coord, y.coord, resid.mat, col = two.colors(n = 256, + start = "blue3", end = "red3", middle = "gray60"), xlab = "Easting", + ylab = "Northing") R> title("Heat Map of Studentized Residuals") R> resid.geo <-as.geodata(as.matrix(cbind(coords[sub, 1:2], + studres(m1)))) R> plot(variog4(resid.geo), xlab = "distance (meters)", + ylab = "estimated semivariogram") R> title("Directional Sample Semivariograms") The clusters of similar values in the heat map of Figure 4a, and the increase, followed by a leveling off, of the semivariogram values as the distance increases in the directional sample semivariograms in Figure 4b indicate that the residuals are still spatially dependent. However, the plots in Figure 4 do not clearly illustrate whether or not the residuals exhibit anisotropy. There appears to be directional dependence along the NW to SE direction in the northern parts of the heatmap (Figure 4a). The directional sample semivariograms do not appear to be different until the distance is greater than 200,000 meters. Semivariogram estimates at large distances can be unreliable and a general recommendation is to compute the semivariogram for lags with distance that is less than half of the maximum observed distance separating sampling locations (Schabenberger and Gotway 2004, p. 155). Additionally, directional semivariograms are less reliable than a uni-directional semivariogram because fewer pairs of sampling locations are used at each distance for directional estimation. The unreliability of the sample semivariograms at the larger distances, coupled with the lack of a measure of uncertainty, make it difficult to determine whether or not an assumption of isotropy is reasonable using a plot of the directional sample semivariogams. To gain more insight into the isotropy properties, we perform a nonparametric hypothesis test of isotropy using the residuals with the same choices for Λ, A, and the window dimensions.
R> tr <-GuanTestGrid(spdata = resid.geo, delta = my.delta, + lagmat = mylags, A = myA, df = 2, window.dims = c(4, 4)) R> tr$p.value.finite  Figure 4: Graphical assessments of isotropy in the studentized residuals from the WRFG temperature data. The appearance of elongated areas of similar residual values in the heat map ( Figure 4a) indicates that the process may be anisotropic. The directional semivariograms ( Figure 4b) do not appear to exhibit differences, indicating that the process is isotropic. A nonparametric test of isotropy can assist in determining whether or not an assumption of isotropy is reasonable.
p.value.finite 0.2 Here the residuals do not provide evidence for anisotropy (p value > 0.05). These results suggest that it may be appropriate to choose an isotropic covariance function to model the residuals. However, it is important to note that we have not included the effect of other potentially influential covariates such as elevation or water cover in the model for temperature. Additionally, although we examined a 20 × 20 subset of the data, the grid boxes still cover a large geographic region of the U.S., and thus an assumption of stationarity, which is needed for the asymptotic properties of the hypothesis test to hold, may not be reasonable.

Non-gridded sampling locations
The non-gridded data set used in this section describes monthly surface meteorology in a region of the state of Colorado and comes from the National Center for Atmospheric Research (NCAR). The data are available in the R package fields. For this example, our variable of interest is the log (mm) of the 30-year average of average yearly precipitation at 344 station locations during the time period 1968-1997.
of residuals in Figure 6b indicates that a Gaussian assumption is reasonable. We will use the residuals from this model to check for remaining spatial dependence and potential anisotropy. We use variog4 to estimate directional sample semivariograms. irectional Sample Semivariograms Figure 7: A graphical assessment of isotropy in the studentized residuals from the model relating log(precipitation) to elevation generated by using the geoR function variog4. The directional sample semivariogram in the 0 • direction appears to be different from the other three at distances less than 2. Because there is no measure of uncertainty, it can be difficult to determine whether or not an assumption of isotropy is reasonable.
R> plot(variog4(precip.geo), xlab = "distance (100s of km)", + ylab = "estimated semivariogram", legend = FALSE) R> legend("bottomright", legend = c(expression(0 * degree), + expression(45 * degree), expression(90 * degree), + expression(135 * degree)), col = 1:4, lty = 1:4) R> title("Directional Sample Semivariograms") The increase, followed by a leveling off, of the semivariogram values as distance increases in Figure 7 indicates that there is spatial dependence remaining in the data. We notice that the 0 • semivariogram appears to be slightly different than the other three, but there is no measure of uncertainty, so we cannot determine if the differences are statistically significant. The sample semivariograms also suggest the presence of a large amount of small scale variation, often called a nugget effect, in the data. This can be seen by noting the ratio of semivariogram values at the shortest observed distance to the semivariogram values at the longest reliable distance. The large jumps and decrease in the estimated semivariogram values in Figure 7 indicate that the semivariogram estimates become unreliable beyond a distance of two. The sample semivariogram values in Figure 7 are approximately 0.7 at the shortest distance and approximately 1.1 at a distance of two. This suggests that approximately 63% ≈ (0.7/1.1)100% of the variability in the data is due to small scale variation. We note the apparently large nugget effect because this small scale variation is detrimental to the size and power of nonparametric tests of isotropy (Weller and Hoeting 2016). Despite the small scale variation, we will proceed with nonparametric hypothesis tests to assist in determining if an assumption of isotropy is reasonable.
There are two procedures for testing isotropy in non-gridded data available in package spTest as suggested by Guan et al. (2004) and Maity and Sherman (2012). To choose between these two, we need to decide whether or not it is reasonable to assume that sampling locations are uniformly distributed on the sampling domain. The method for non-gridded data from Guan et al. (2004) relies on the assumption that sampling locations are uniformly distributed while the method proposed by Maity and Sherman (2012) can be used on any general sampling design. To check this assumption, we can turn to methods from the spatial point process literature to perform a test of complete spatial randomness (CSR) (i.e., a uniform spatial distribution) for the sampling locations. Methods for testing CSR are available in the R package spatstat (Baddeley and Turner 2005). For brevity, we do not display the results of the CSR test here, but note that they do not provide evidence against the assumption of CSR for these sampling locations so either test of isotropy can be used.
For both methods suggested by Guan et al. (2004) and Maity and Sherman (2012), we need to choose the lag set, Λ, and the contrast matrix, A. Because semivariogram estimates appear to be unreliable at distances greater than two, we should choose lags having Euclidean distance less than this distance. We choose the lag set The next step in implementing the methods from Guan et al. (2004) and Maity and Sherman (2012) is to determine the size of the moving windows and the block size, respectively, used to estimate the asymptotic variance-covariance matrix, Σ. The moving window is shifted over the sampling region, creating subblocks of data used to estimate Σ. Likewise, for the test in Maity and Sherman (2012), the block size is used to implement the grid-based block bootstrap (GBBB; Lahiri and Zhu 2006).
There are two steps in determining the appropriate window/block size for non-gridded sampling locations. First, we place a grid over the sampling domain; second, we specify scaling parameters that will define the window/block size in terms of that grid. We should complete this two step process while keeping three goals in mind: (1) the number of sampling locations per window/block, denoted n b , should be approximately √ n (Weller and Hoeting 2016); (2) the windows/blocks should have have the same orientation (i.e., square or rectangle) as the entire sampling domain; and (3) the scaling parameters should be compatible with the dimensions of the underlying grid.
For the Colorado precipitation data, recall that one unit of distance equals 100 km. The dimensions of the sampling region are approximately 7.3 × 5.5 (width × height), providing a total area of 40.15. For n = 344 uniformly distributed sampling locations, we expect approximately 344/40.15 = 8.6 sampling locations per unit area. Recalling goal (1), we seek to construct windows/blocks with n b ≈ √ 344 = 18.5 sampling locations, or equivalently, windows/blocks with an area of approximately 18.5/8.6 = 2.15. Goal (2) indicates we want to create rectangular windows/blocks with slightly larger width than height, and (3) says that if our grid divides the x-axis into 12 grid cells, then the scaling parameter defining the width of the window/block should be 3 or 4 because those numbers evenly divide 12. For the CO precipitation data, if we choose our grid to divide the x-axis into 16 cells and the y-axis into 12 cells, we have a grid with (x, y) spacing of roughly (7.3/16, 5.5/12) ≈ (0.46, 0.46). The resulting grid is plotted in Figure 8. Then, choosing our scaling parameters to be 4 × 3, we have windows/blocks with dimensions of approximately (4)(0.46) × (3)(0.46) = 1.84 × 1.38 and area of (1.84)(1.38) ≈ 2.54, or equivalently with an expected number of points per block of n b = (2.54)(8.6) ≈ 21.8.
R> my.color <-two.colors(n = 256, start = "blue3", end = "red3", + middle = "gray60") R> quilt.plot (precip.resid[, 1:2] For the functions GuanTestUnif and MaityTest, the upper and lower limits of the sampling region in the x and y directions are given by the xlims and ylims arguments. Note that the values defining the upper and lower limits should be slightly larger than the minimum and maximum observed x and y coordinates. The horizontal and vertical spacing, respectively, of the grid laid on the sampling region is defined by the two values in grid.spacing. The horizontal and vertical scaling parameters that define the size of the moving windows in GuanTestUnif and blocks in MaityTest in terms of the underlying grid are given by the window.dims and block.dims arguments, respectively. We recommend using visualizations of different grid choices and algebraic calculations, as done above, to assist in choosing a grid and the window/block dimensions. When the scaling parameters defining the moving window or block dimensions are not compatible with the number of rows or columns of gridded sampling locations or the dimensions of the grid laid on the sampling region for non-gridded locations, the functions in package spTest will print a warning message because they do not currently handle partial (incomplete) blocks. Likewise, if the chosen window or block dimensions for non-gridded sampling locations creates (sub)blocks of data with few or no sampling locations, the functions GuanTestUnif and MaityTest will discard (sub)blocks that do not have enough sampling locations and print a warning message. The p value of the hypothesis test will be sensitive to the choice of moving window and block dimensions. See Weller and Hoeting (2016) and the original works (Guan et al. 2004;Maity and Sherman 2012) for more recommendations on choosing these values. The next step for implementing the test in Guan et al. (2004) is choosing the smoothing  Figure 8: Quilt plot of the studentized residuals from the model relating elevation to log(precipitation) for the weather station locations. The grid placed on the region that is used to define the moving windows in Guan et al. (2004) and block size in Maity and Sherman (2012) is also shown. Because the sampling locations are not gridded, it can be difficult to assess isotropy properties via a quilt plot.

Quilt Plot of Residuals and Grid Used for Subsampling
(bandwidth) parameters for smoothing over lags on the entire domain and within each subblock created by the moving windows. The smoothing parameters should be chosen based on the number and density of the sampling locations with larger values of the smoothing parameter inducing higher levels of smoothing, i.e., allowing a greater distance and direction tolerance. In our experience, smoothing parameter values between 0.6 and 0.9 tend to produce reasonable results when using a standard normal Gaussian smoothing kernel truncated at 1.5. However, the p value of the hypothesis test will change with the bandwidth. For this example, we choose a bandwidth of h = 0.70 for smoothing over lags on the entire domain, and a bandwidth of subblock.h = 0.85 for smoothing over lags on on the subblocks of data created by the moving window. Choosing a larger bandwidth for the subblocks equates to allowing for a larger lag distance and direction tolerance, which is needed for subblocks that have few sampling locations. We also use the default Gaussian smoothing kernel (kernel = "norm") truncated at 1.5 (truncation = 1.5). Because the sample size is less than 500, we use a finite sample adjustment to approximate the p value (Guan et al. 2004;Weller and Hoeting 2016).
Finally, for the test in Maity and Sherman (2012) we need to choose the number of bootstrap resamples that will be used in the GBBB procedure to estimate Σ. We recommend using at least 50 bootstrap samples; however, the bootstrapping procedure is computationally intensive. We choose nBoot = 100 bootstrap samples for our example, and we note that the number of bootstrap resamples does not affect the precision of the p value, which is computed via the asymptotic χ 2 distribution. Having determined values for the different options, we can now perform the hypothesis tests. For reproducibility of the bootstrap procedure in the MaityTest function, we set the random seed.
Thus, an anisotropic model may be appropriate for the residuals. Additionally, the apparent anisotropy may also be present due to unaccounted for covariates (e.g., northing/easting coordinates).

Discussion
Choosing a covariance function is an important step in modeling spatially-referenced data and a variety of choices for the covariance function are available (e.g., anisotropy, nonstationarity, parametric forms). The R package spTest implements several nonparametric tests for checking isotropy properties which avoid specifying a parametric form for the covariance function. Weller and Hoeting (2016) perform a simulation study comparing the empirical size and power of the methods for different degrees of anisotropy. They find that methods from Guan et al. (2004) tend to outperform the competitors for gridded and non-gridded data.
One concern regarding the methods in package spTest is that they tend to have low power when the anisotropy is weak and the data are not gridded (Weller and Hoeting 2016;Guan et al. 2004;Maity and Sherman 2012). A second concern is that the results of the tests are potentially sensitive to user choices, for example, the moving window size and bandwidth in the method by Guan et al. (2004). The optimal choice of these values is still an open question. Weller and Hoeting (2016) offer further recommendations on how to choose the user defined values, such as the window size and bandwidth, based on simulated data. Finally, as noted earlier, the size and power of the methods are adversely affected by the presence of small scale variation (nugget effect). Because of these concerns, we recommend using the nonparametric methods in conjunction with graphical techniques.
An implicit assumption of the methods discussed in this paper is ergodicity of the spatial process, an assumption that is difficult to verify (Cressie 1993, pp. 57-58). However, there are two important assumptions which practitioners should consider. The first is an assumption of strict stationarity, (1). While this assumption is difficult to check, it follows from assuming that the RF is weakly stationary and Gaussian (4). The assumption of Gaussian data lies at the heart of many spatial analyses (Gelfand and Schliep 2016) and is easily checked with a QQ plot. The assumption of weak stationarity may be questionable for spatial data over large geographic regions and methods have been developed for testing this assumption (see, e.g., Corstanje, Grunwald, and Lark 2008;Fuentes 2005;Jun and Genton 2012;Bandyopadhyay and Subba Rao 2017). The second assumption required is a mixing condition that states that the dependence between observations goes to 0 at large distances (Hall and Patil 1994;Sherman and Carlstein 1994). In the case of a stationary GRF, this condition is met when the covariance function, C(h), is 0 for sufficiently large h , which also implies ergodicity (Cressie 1993, p. 58). One way to check this assumption in practice is to look for a leveling off of the sample covariogram or semivariogram values as distance increases, indicating that the data are nearly independent at large distances.
After determining whether or not an assumption of isotropy is reasonable, we can choose a parametric or nonparametric model for the covariance function. Weller and Hoeting (2016) further illustrate the role of nonparametric tests of isotropy in the process of modeling spatially-referenced data. We have demonstrated how graphical techniques and the functions available in the R package spTest can be used in a complementary role to check for anisotropy. Future work includes extending the functionality of package spTest to handle non-rectangular sampling domains and improving computational efficiency by programming functions in C++.