Introducing localgauss , an R Package for Estimating and Visualizing Local Gaussian Correlation

Quantifying non-linear dependence structures between two random variables is a challenging task. There exist several bona-ﬁde dependence measures able to capture the strength of the non-linear association, but they typically give little information about how the variables are associated. This problem has been recognized by several authors and has given rise to the concept of local measures of dependence. A local measure of dependence is able to capture the “local” dependence structure in a particular region. The idea is that the global dependence structure is better described by a portfolio of local measures of dependence computed in diﬀerent regions than a one-number measure of dependence. This paper introduces the R package localgauss which estimates and visualizes a measure of local dependence called local Gaussian correlation. The package provides a function for estimation, a function for local independence testing and corresponding functions for visualization purposes, which are all demonstrated with examples.


Introduction
The Pearson correlation coefficient is by far the most widely used measure of dependence between two stochastic variables. For variables with a linear relationship it measures both the strength and the direction of the (linear) dependence, and for jointly Gaussian variables it completely characterizes the dependence structure. However, it is well known that it is not adequate in many cases, and one can easily construct examples where there is complete dependence, but where the correlation coefficient is 0 (e.g., take Y = X 2 where X is any symmetric variable with mean 0 and existing third moment). Other valid measures of dependence exist (see e.g., Rosenblatt and Wahlen 1992;Skaug and Tjøstheim 1993;Tjøstheim 1996;Székely, Rizzo, and Bakirov 2007;Székely and Rizzo 2009), but unlike the Pearson correlation, these measures typically do not distinguish between positive and negative dependencies. Indeed, in general a signed one-number measure cannot properly capture non-linear dependence (see e.g., Embrechts, McNeil, and Straumann 2002). The issue of defining positive and negative dependence for variables in a non-linear relationship is a rather delicate one (see e.g., Lehmann 1966). For such variables there can be regions within their support where there is stronger dependence, positive or negative, and other regions where it is weaker. For example, it has been observed that the dependence between financial variables becomes stronger during an economic downturn, leading to a stronger association between large losses. It may be more suitable to describe such situations by a local measure of dependence that can characterize the dependence structure within different regions of the data, rather than using a one-number measure.
This paper accompanies the R package localgauss, publicly available for Linux, Mac and Windows at the Comprehensive R Archive Network (CRAN, http://CRAN.R-project.org/ package=localgauss), which estimates and visualizes a novel local measure of dependence called local Gaussian correlation, introduced by Tjøstheim and Hufthammer (2013). See also Støve, Tjøstheim, and Hufthammer (2012), Berentsen, Støve, Tjøstheim, and ø (2013), Støve and Tjøstheim (2014) and Berentsen and Tjøstheim (2014). R is a free software environment for statistical computing and graphics (R Core Team 2013), and the R language is widely used among statisticians.
The rest of the article is organized as follows. A brief review of a competing local measures of dependence is presented below. An introduction to local Gaussian correlation and the package localgauss is then described in Section 2. Finally, in Section 3, we demonstrate the usage of the package on several examples.

Local measures of dependence
For two random variables X 1 and X 2 , a local dependence measure is allowed to vary within the support of (X 1 , X 2 ). Besides this property, local measures of dependence proposed in the statistical literature have quite different properties and motivations (see e.g., Berentsen and Tjøstheim 2014, and references therein). The local measure with most similarities to our methodology is perhaps the "local dependence function" proposed by Holland and Wang (1987). This measure is derived by generalizing the concept of cross-product ratios in discrete two-way contingency tables to the case of a continuous bivariate density and is given by where f is the bivariate density of (X 1 , X 2 ). Further motivation for γ is given in Jones (1996) along with an empirical counterpart based on kernel estimates of the density and its derivatives. Properties of γ are given in Jones (1996) and a method for visualizing the dependence structure in bivariate data based on γ can be found in Jones and Koch (2003). Elements of this methodology are adopted in our software to visually present the local dependence measure described in Section 2, and in many situations the two approaches give broadly similar results (see Berentsen and Tjøstheim 2014). Software for the competing methodology can be obtained from the second author of Jones and Koch (2003) upon request.

Local Gaussian correlation and package localgauss
The idea of local Gaussian correlation is very simple. Let X = (X 1 , X 2 ) be a random variable with a general bivariate density f . Then in a neighborhood of each point x = (x 1 , x 2 ) we fit a bivariate Gaussian density . (1) where v = (v 1 , v 2 ) is the running variable in the Gaussian distribution and with µ i (x), i = 1, 2 the local means, σ i (x), i = 1, 2 the local standard deviations and ρ(x) the local correlation at the point x. The population values of the local parameters θ b (x) = θ(x) are obtained by minimizing the local penalty function where ) is a product kernel with bandwidth b = (b 1 , b 2 ), and we define the "local Gaussian correlation" ρ b (x) = ρ(x) as the last element of the vector θ(x) that minimizes q. Moving to another point x , q can be used to obtain a new Gaussian approximation density ψ(v, θ(x )), which approximates f in a neighborhood of x .
In this way f may be represented by a family of Gaussian bivariate densities as x varies, and in each specific neighborhood of x, the local dependence properties are described by ρ(x). We may define the (local) dependence to be positive if ρ(x) > 0 and negative if ρ(x) < 0.
The penalty function q was used in Hjort and Jones (1996) for density estimation purposes, and later by Tjøstheim and Hufthammer (2013) in the development of local Gaussian correlation. In general, q can be used to fit any parametric distribution to f locally, and in Hjort and Jones (1996) it is argued that q can be interpreted as a locally weighted Kullback-Leibler criterion for measuring the distance between f (·) and the chosen parametric distribution (in our case ψ(·, θ(x))).
Given observations X i = (X i1 , X i2 ), i = 1, . . . , n, from f the corresponding estimatesθ(x) are obtained by maximizing the local log-likelihood function (see Hjort and Jones 1996) To see that the local likelihood function (3) is consistent with the penalty function q, first observe that when θ(x) is chosen to minimize q, it satisfies the set of equations Using the notation and assuming E{K b (X i − x)u j (X i , θ(x))} < ∞, the law of large numbers gives almost surely as n → ∞ and we see that (6) can be identified with (4). Note that as b → ∞ (3) reduces to the ordinary log-likelihood for a Gaussian distribution ψ plus a constant, and hence ρ(x) reduces to the ordinary global Gaussian correlation. For more details about the local parameters and estimation using local likelihood, including limiting behavior as b i → 0, i = 1, 2, and estimation of standard errors, we refer to Tjøstheim and Hufthammer (2013).

Function localgauss
The equations ∂L ∂θ j = 0, j = 1, . . . , 5, with ∂L ∂θ j given by (6) do not in general have closed form solutions, hence (3) must be maximized directly. In the R package localgauss the function localgauss() maximizes (3) for different values of x using a modified Newton's method with line-search and returns an S3 object of class 'localgauss'. The function (3) is not everywhere concave, and eigenvalue modification is therefore employed to ensure that the scaling matrix is positive definite (Nocedal and Wright 1999, Chapter 6). The optimizer and objective function are written in Fortran 90 (Metcalf and Reid 1999), and the source code for the gradient and Hessian of (3) are generated using the automatic differentiation tool TAPENADE (Hascoët and Pascual 2004).
The number M of points x = (x 1 , x 2 ) for which we want to estimate the local Gaussian correlation can be manually specified by the argument xy.mat which is an M times 2 matrix. Alternatively, the selection of these points can be done using the methodology of Jones and Koch (2003). First a regular N × N grid is placed across the area of interest. The regular grid is then screened by selecting the grid points x 1 , . . . , x M satisfyingf (x j ) ≥ C, for some constant C and a density estimatorf . For simple and quick implementation this is done using the R package MASS (Venables and Ripley 2002). In localgauss() the values of N and C are set by the arguments gsize and hthresh, respectively. If xy.mat is not specified it will be selected internally by the method described above. Since (3) has to be optimized for every estimation point, the computational time increases proportionally with the size of xy.mat. Calling localgauss() with a 500 times 2 xy.mat, even for a rather large sample size (n ≈ 1000), should not take longer than a few seconds with a personal computer of today's standard. An overview of the other arguments and output of localgauss() is given in Table 1. Note that we have used the notation x and y for the marginal data vectors rather than x 1 and x 2 .
Choosing the bandwidth b = (b 1 , b 2 ) in localgauss() is left to the user. In this way, the local Gaussian correlation can be estimated with different bandwidths, reflecting the dependence structure on different scales of locality increasing to the global correlation as b → ∞. The population value ρ b (x) = ρ(x) is defined as the minimizer of (2) with the bandwidth b fixed. If one wants a more objective choice for estimating the limiting value ρ(x) as b → 0, note that Tjøstheim and Hufthammer (2013) propose a bandwidth algorithm that aims to balance the variance of the estimated local parametersθ(x) versus the bias of the resulting density estimate ψ(·,θ(x)). An alternative method is described in Berentsen et al. (2013) in the context of Arguments Description Default x, y The two data vectors. b1, b2 The bandwidth in the x-direction and y-direction, respectively. gsize The gridsize (only used if xy.mat is not specified).

hthresh
Grid points where a non-parametric density estimate is lower than hthresh are omitted (only used if xy.mat is not specified). 0.001 xy.mat A M times 2 matrix of points where the local parameters are to be estimated. NULL

Results
Description par.est M times 5 matrix of parameter estimates, with columns µ 1 , µ 2 , σ 1 , σ 2 and ρ. eflag M -vector of exit flags from the optimizer. Estimations with exit flags other than 0 should not be trusted. hessian The negative Hessian of the objective function (3). copula goodness-of-fit testing. A method of general applicability is likelihood cross-validation as explained in Berentsen and Tjøstheim (2014). It has worked well in cases tested by us, but may require screening of outlying observations (see Berentsen and Tjøstheim 2014).

Function localgauss.indtest
In an independence testing situation it may be of interest to pinpoint possible discrepancies between the data and the null-hypothesis of independence. Such discrepancies can be investigated more closely by doing several "local tests of independence" as described in Berentsen and Tjøstheim (2014). This idea is similar but not identical to the idea of local permutation testing described in Jones and Koch (2003).
The local test of independence is based on estimating the null-distribution of the estimated local Gaussian correlation by re-sampling from the product of the empirical marginal distributions. This is a standard way of estimating the null-distribution for any statistic suitable for independence testing. The difference in the present case is that we may have a portfolio of test statisticsρ(x i ) for several different points x i , and that we need to produce the null distribution for each of them. In the R package localgauss this can be done by passing a 'localgauss' object, produced by the function localgauss(), to the function localgauss.indtest(). The null-distribution is estimated for each point specified in xy.mat in the 'localgauss' object. The function localgauss.indtest() then returns an S3 object of class 'localgauss.indtest', and the result of the test can be found in the vector test.results: An estimated local correlation for the original data significantly larger than the null-distribution is indicated with +1; an estimated local correlation for the original data insignificant with respect to the null-distribution is indicated with 0; an estimated local correlation for the original data significantly smaller than the null-distribution is indicated with −1. Recall that when b → ∞, the local log-likelihood (3) reduces to the ordinary loglikelihood for Gaussian variables. Note therefore that all the results in test.results will coincide with that of a bootstrap test of the global Gaussian correlation when b → ∞.
In general, the choice of bootstrap replicas R is a compromise between reducing the effect of  random sampling error and available computing power and time. Our experience suggests that using R = 500 bootstrap replicas is adequate, with larger values having little effect on the results. This means that the computational time can be significant even in situations where the original 'localgauss' object only takes a few seconds to produce. The re-sampling loop is constructed using the R package foreach (Kane, Emerson, and Weston 2013; Revolution Analytics and Weston 2013b) which supports parallel execution for users with multiple processors/cores on their computer or access to multiple nodes of a cluster. The function localgauss.ind() will automatically run in parallel if a "parallel backend" compatible with the foreach package is registered. Registering a parallel backend in R can be done via e.g., the R package doParallel (Revolution Analytics and Weston 2013a) which supports both Unix-like systems and Windows. When no parallel backend is registered localgauss.ind() will issue a warning that it is running sequentially. Note that the computational time depends to a large degree on the size of the xy.mat in the 'localgauss' object. An overview of the other arguments and output of localgauss.indtest() is given in Table 2.

Graphics
Graphics are important for proper interpretation and presentation of the results given by the functions localgauss() and localgauss.indtest() described above. The following plotting routines are based on the R package ggplot2 (Wickham 2009).
The S3 method for graphically displaying a 'localgauss' object is invoked by applying plot() to such an object. The function displays the estimated local Gaussian correlationρ(x i ) in tiles for each point in xy.mat. The numerical value ofρ(x i ) is indicated by a color gradient between highcol (which represents large values) and lowcol (which represents low values). This color gradient may be taken to be continuous (divergent.col.grad = FALSE) or divergent with 0 as midpoint (divergent.col.grad = TRUE). We recommend the former choice when the range of the values ofρ(x i ) is small. For the latter choice, values close to 0 are indicated with the color white and we recommend this choice in the presence of both positive and negative values ofρ(x i ). If plot.text = TRUE the numerical value is added to each tile. An overview of the other arguments of the S3 plot method for 'localgauss' objects are given in Table 3.

plot.text
If TRUE, the numerical values of the estimated TRUE local correlation are added to each tile.

plot.points
If TRUE, the original observations are overlain.

FALSE tsize
The font size used if plot.text is TRUE.

lowcol
The color used to indicate small values ofρ(x i ). "cyan" highcol The color used to indicate large values ofρ(x i ). "magenta" point.col The color used for observation points if "black" plot.points is TRUE. point.size The size of observation points if plot.points is 1 TRUE. xlab, ylab The label of the x-axis and y-axis, respectively. divergent.col.grad If TRUE, a divergent color gradient between lowcol TRUE and highcol with 0 as midpoint is used. If FALSE an ordinary color gradient between lowcol and highcol is used. The label of the x-axis and y-axis, respectively. Similarly, an S3 plot method for graphically displaying a 'localgauss.indtest' object is available. The test.results from the 'localgauss.indtest' object are displayed in tiles for each point in xy.mat. The values of test.results are indicated with poscol for +1, negcol for −1 and white for 0. The resulting plot is an analogue to the "dependence map" proposed in Jones and Koch (2003) for the local dependence function. It can be seen as a compromise between the very fine details given by the estimated local Gaussian correlation and the crude characterization of dependence given by a single global measure. An overview of the other arguments of the S3 plot method for 'localgauss.indtest' objects are given in Table 4.

Usage examples
In the following examples, all R code demonstrated assumes that the package has been loaded and we set the random seed to make all the examples reproducible: R> library("localgauss") R> set.seed(1)
The following code creates a 'localgauss' object where xy.mat is generated internally using the arguments gsize and hthresh: R> lg.out2 <-localgauss(x = x, y = y, b1 = 1, b2 = 1, gsize = 15, + hthresh = 0.01) Typically, 'localgauss' objects where the xy.mat is generated internally are well suited for plotting using their corresponding S3 plot method. To check that the value of hthresh is suitable we recommend overlaying the observations in the plot to visually check that estimation is done only in points with a neighborhood containing a "reasonable" amount of observations (otherwise the value of hthresh should be adjusted). For the previous 'localgauss' object this is done by R> plot(lg.out2, plot.text = FALSE, plot.points = TRUE)    which produces the plot shown in Figure 1. Indeed, we see that the neighborhood of all points in the area of estimation (colored pink) contains observations. By default the S3 plot method will include the numerical value of the estimated local correlation. Thus we need only write R> plot(lg.out2) to obtain Figure 2. As expected, the local Gaussian correlation is constant equal to 0.6 everywhere modulo the sampling variation.

Example 2: Wikipedia model
Several simulated examples where the ordinary Pearson correlation fails to capture non-linear dependence structure can be found at Wikipedia (2012). One of these examples can be simulated in the following way R> x <-runif(1000, -1, 1) R> y <-(x^2 + runif(1000, 0, 1/2)) * sample(c(-1, 1), 1000, replace = TRUE) A scatter-plot of these variables is displayed in Figure 3, where we clearly see a strong nonlinear relationship between the two variables. However, the sample correlation is R> cor(x, y) [1] -0.02702666 which is due to the symmetry between positive and negative dependence in the data. The following code creates a 'localgauss' object and plots it without including numerical values in each tile: R> lg.out <-localgauss(x = x, y = y, b1 = 0.5, b2 = 0.5, gsize = 100, + hthresh = 0.15) R> plot (lg.out, plot.text = FALSE) This results in Figure 4. Note that the initial gridsize is gsize × gsize = 100 × 100, which after the screening procedure resulted in a 4684 times 2 xy.mat. This means that (3) must be maximized for 4684 different points which on the first author's laptop (with 2.4 GHz Intel Core i5 CPU and 8 GB memory running Linux) took about one minute. Such an amount of estimation points is hardly necessary in practice, and the neighboring estimates in this example are extremely close in value. This also means that the corresponding plot produced by the S3 plot method will be very smooth.

Example 3: Uranium exploration data set
The Uranium exploration data set of Cook and Johnson (1986) consists of 655 chemical analyzes from water samples collected from the Montrose quad-rangle of Western Colorado. These data can be obtained via the R package copula (Yan 2007): R> data("uranium", package = "copula") R> attach(uranium)  Note that package copula requires that the GNU Scientific Library (GSL) is installed on the computer (see Galassi, Davies, Theiler, Jungman, Alken, Booth, and Rossi 2009). Amongst several other elements, concentrations of cesium (Cs) and scandium (Sc) were measured. We are interested in the relation between the concentration of these two elements. Point estimates of local Gaussian parameters at the points (1.8, 0.7) (the lower tail) and (2.3, 1.2) (the upper tail) are obtained by R> xy.mat <-rbind(c(1.8, 0.7), c(2.3, 1.2)) R> lg.out <-localgauss(x = Cs, y = Sc, xy.mat = xy.mat, b1 = 0.6, b2 = 0.4) R> lg.out$par.est The following code creates a 'localgauss' object suitable for plotting: R> lg.out <-localgauss(x = Cs, y = Sc, b1 = 0.6, b2 = 0.4, gsize = 15, + hthresh = 0.1) To check if we have chosen a suitable value of hthresh we include the observations in the first plot: R> plot(lg.out, plot.text = FALSE, plot.points = TRUE, + xlab = "Cs", ylab = "Sc", divergent.col.grad = FALSE) which results in Figure 5. We see that the neighborhood of all points in the area of estimation contains observations indicating a good choice of hthresh. Note that with the choice divergent.col.grad = FALSE a continuous color gradient is used, which is the better choice for discriminating between values when the range of the values is small. Including numerical values of the estimated local Gaussian correlation can be done by R> plot(lg.out, xlab = "Cs", ylab = "Sc", divergent.col.grad = FALSE) as displayed in Figure 6. Figure 6 indicates that small concentrations of cesium are typically associated with small values of scandium, while large values of cesium are associated with medium to large values of scandium.

Example 4: Non-linear regression
Finally we illustrate the usage of plot() for 'localgauss.ind' objects and localgauss.ind() when X 1 and X 2 are given by the non-linear regression model X 2 = X 2 1 + , where X 1 and are independent standard normal. This is yet another example where the ordinary Pearson's correlation is zero even though X 1 and X 2 are dependent. We start by creating a 'localgauss' object with estimates of the local parameters in the points (−1, 1), (0, 0) and (1, 1) for n = 500 observations simulated from the model:  R> x <-rnorm(500) R> y <-x^2 + rnorm(500) R> xy.mat <-rbind(c(-1, 1), c(0, 0), c(1, 1)) R> lg.out1 <-localgauss(x, y, b1 = .5, b2 = 1, xy.mat = xy.mat) Performing the local test of independence in the same points can be done by passing the 'localgauss' object to localgauss.ind(): R> ind.out1 <-localgauss.indtest(lg.out1, R = 500) R> ind.out1$test.results [1] -1 0 1 We see that the local Gaussian correlation is significantly negative in (−1, 1), significantly positive in (1, 1) and neither in (0, 0) at a (default) 10% significance level. In the previous computation no parallel backend is registered and the bootstrapping is done sequentially. Nevertheless, since the null-distribution ofρ(x i ) is constructed for only three points, the computational time is quite reasonable (about 10 seconds on the first author's laptop). The following code creates a 'localgauss' object where the xy.mat is generated internally and plots it with overlain observations: R> lg.out2 <-localgauss(x, y, b1 = .5, b2 = 1, hthresh = 0.015, gsize = 30) R> plot(lg.out2, plot.text = FALSE, plot.points = TRUE) This results in Figure 7. Applying localgauss.indtest() on this 'localgauss' object can be quite time consuming since the new xy.mat contains 118 points. The computational time can be shortened by registering a parallel backend if the user has multiple cores/processors on his/hers computer. On a Unix-like system with 28 cores (say) a parallel backend can be registered using the doParallel package in the following way: R> library("doParallel") R> registerDoParallel(cores = 28) The function localgauss.indtest() will then automatically run in parallel: R> ind.out2 <-localgauss.indtest(lg.out2, R = 500) On a computer with 4 AMD Opteron 6128 CPUs (a total of 32 cores) and 128 GB of memory this took about 25 seconds, while using just one core on the same computer (sequential run) took about 11 minutes. The result can then be passed to the plot method for 'localgauss.indtest' objects for a graphical display of the test results:

R> plot(ind.out2)
This results in Figure 8 which displays the regions whereρ(x i ) is significantly positive and negative.

Conclusions
This paper introduces the R package localgauss which estimates and visualizes local Gaussian correlation. The package enables users to easily utilize the methodology of local Gaussian  correlation without constructing their own optimization routines. The package can be used for exploring the dependence structure in datasets and can be seen as a supplement to the raw information given by a scatter-plot. Other applications of the package can be found in Berentsen and Tjøstheim (2014) and Berentsen et al. (2013). The former article illustrates how the local Gaussian correlation can be used to construct both global and local tests of independence, while the latter article discusses how the local Gaussian correlation can be used to describe and recognize the dependence structure in various copula models. The package has also been used for parts of the work in Støve and Tjøstheim (2014).