Journal of Statistical Software http://www.jstatsoft.org/rss Wed, 04 Mar 2015 18:43:09 GMT
Most recent publications from the Journal of Statistical Software

Structured Additive Regression Models: An R Interface to BayesX http://www.jstatsoft.org/v63/i21/paper Vol. 63, Issue 21, Feb 2015

Abstract:

Structured additive regression (STAR) models provide a flexible framework for modeling possible nonlinear effects of covariates: They contain the well established frameworks of generalized linear models and generalized additive models as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is a standalone software package for fitting a general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using R’s formula language (with some extended terms), fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R by a comprehensive toolbox comprising in particular more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.

Mon, 16 Feb 2015 08:00:00 GMT http://www.jstatsoft.org/v63/i21
Spatial Data Analysis with R-INLA with Some Extensions http://www.jstatsoft.org/v63/i20/paper Vol. 63, Issue 20, Feb 2015

Abstract:

The integrated nested Laplace approximation (INLA) provides an interesting way of approximating the posterior marginals of a wide range of Bayesian hierarchical models. This approximation is based on conducting a Laplace approximation of certain functions, and numerical integration is used extensively to integrate some of the model parameters out.
The R-INLA package offers an interface to INLA, providing a suitable framework for data analysis. Although the INLA methodology can deal with a large number of models, only the most relevant have been implemented within R-INLA. However, many other important models are not available for R-INLA yet.
In this paper we show how to fit a number of spatial models with R-INLA, including its interaction with other R packages for data analysis. Secondly, we describe a novel method to extend the number of latent models available for the model parameters. Our approach is based on conditioning on one or several model parameters and fitting these conditioned models with R-INLA. Then these models are combined using Bayesian model averaging to provide the final approximations to the posterior marginals of the model.
Finally, we show some examples of the application of this technique in spatial statistics. It is worth noting that our approach can be extended to a number of other fields, and not only spatial statistics.
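
The conditioning-and-averaging idea in this abstract can be sketched in a few lines. This is a minimal illustration in Python, not R-INLA code: the grid `rho_grid`, the log marginal likelihoods and the conditional posterior means are made-up numbers standing in for what a set of conditioned model fits might return, and a flat prior on the conditioned parameter is assumed.

```python
import math

def bma_weights(log_marginal_likelihoods, log_prior=None):
    """Posterior weights for models fitted conditional on fixed parameter values."""
    n = len(log_marginal_likelihoods)
    if log_prior is None:
        log_prior = [0.0] * n  # assumption: flat prior over the grid values
    log_post = [l + p for l, p in zip(log_marginal_likelihoods, log_prior)]
    shift = max(log_post)  # subtract the maximum for numerical stability
    unnormalised = [math.exp(v - shift) for v in log_post]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

def bma_mean(weights, conditional_means):
    """Model-averaged posterior mean of a quantity of interest."""
    return sum(w * m for w, m in zip(weights, conditional_means))

# Hypothetical results of four fits, each conditional on one value of rho.
rho_grid = [0.0, 0.25, 0.5, 0.75]
log_mliks = [-105.2, -101.0, -100.3, -103.9]
beta_means = [1.40, 1.25, 1.18, 1.10]

weights = bma_weights(log_mliks)
beta_bma = bma_mean(weights, beta_means)
```

The same weights could be used to average whole marginal densities evaluated on a common grid, rather than only their means.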

Mon, 16 Feb 2015 08:00:00 GMT http://www.jstatsoft.org/v63/i20
Bayesian Spatial Modelling with R-INLA http://www.jstatsoft.org/v63/i19/paper Vol. 63, Issue 19, Feb 2015

Abstract:

The principles behind the interface to continuous domain spatial models in the R-INLA software package for R are described. The integrated nested Laplace approximation (INLA) approach proposed by Rue, Martino, and Chopin (2009) is a computationally effective alternative to MCMC for Bayesian inference. INLA is designed for latent Gaussian models, a very wide and flexible class of models ranging from (generalized) linear mixed to spatial and spatio-temporal models. Combined with the stochastic partial differential equation approach (SPDE, Lindgren, Rue, and Lindström 2011), one can accommodate all kinds of geographically referenced data, including areal and geostatistical ones, as well as spatial point process data. The implementation interface covers stationary spatial models, non-stationary spatial models, and also spatio-temporal models, and is applicable in epidemiology, ecology, environmental risk assessment, as well as general geostatistics.

Mon, 16 Feb 2015 08:00:00 GMT http://www.jstatsoft.org/v63/i19
Comparing Implementations of Estimation Methods for Spatial Econometrics http://www.jstatsoft.org/v63/i18/paper Vol. 63, Issue 18, Feb 2015

Abstract:

Recent advances in the implementation of spatial econometrics model estimation techniques have made it desirable to compare results, which should correspond between implementations across software applications for the same data. These model estimation techniques are associated with methods for estimating impacts (emanating effects), which are also presented and compared. This review constitutes an up-to-date comparison of generalized method of moments and maximum likelihood implementations now available. The comparison uses the cross-sectional US county data set provided by Drukker, Prucha, and Raciborski (2013d). The comparisons will be cast in the context of alternatives using the MATLAB Spatial Econometrics toolbox, Stata's user-written sppack commands, Python with PySAL and R packages including spdep, sphet and McSpatial.

Mon, 16 Feb 2015 08:00:00 GMT http://www.jstatsoft.org/v63/i18
GWmodel: An R Package for Exploring Spatial Heterogeneity Using Geographically Weighted Models http://www.jstatsoft.org/v63/i17/paper Vol. 63, Issue 17, Feb 2015

Abstract:

Spatial statistics is a growing discipline providing important analytical techniques in a wide range of disciplines in the natural and social sciences. In the R package GWmodel we present techniques from a particular branch of spatial statistics, termed geographically weighted (GW) models. GW models suit situations when data are not described well by some global model, but where there are spatial regions where a suitably localized calibration provides a better description. The approach uses a moving window weighting technique, where localized models are found at target locations. Outputs are mapped to provide a useful exploratory tool for examining the nature of spatial heterogeneity in the data. Currently, GWmodel includes functions for: GW summary statistics, GW principal components analysis, GW regression, and GW discriminant analysis; some of which are provided in basic and robust forms.
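
The moving window weighting idea can be illustrated with the simplest GW summary statistic, a locally weighted mean. The sketch below is written in Python and is not the GWmodel API: the function name `gw_mean`, the Gaussian kernel, the fixed bandwidth and the toy data are all assumptions made for illustration.

```python
import math

def gw_mean(target, coords, values, bandwidth):
    """Geographically weighted mean at `target` using a Gaussian distance kernel."""
    weights = []
    for x, y in coords:
        d = math.hypot(x - target[0], y - target[1])
        # Nearby observations receive weights close to 1, distant ones close to 0.
        weights.append(math.exp(-0.5 * (d / bandwidth) ** 2))
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Toy data: low values in the south-west, high values in the north-east.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
values = [1.0, 1.2, 0.8, 5.0, 5.4]

local_west = gw_mean((0.0, 0.0), coords, values, bandwidth=2.0)
local_east = gw_mean((10.0, 10.0), coords, values, bandwidth=2.0)
global_mean = sum(values) / len(values)
```

Evaluating `gw_mean` over a grid of target locations and mapping the results gives the kind of exploratory surface the abstract describes: the global mean hides the contrast that the two local means reveal.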

Mon, 16 Feb 2015 08:00:00 GMT http://www.jstatsoft.org/v63/i17
SPODT: An R Package to Perform Spatial Partitioning http://www.jstatsoft.org/v63/i16/paper Vol. 63, Issue 16, Feb 2015

Abstract:

Spatial cluster detection is a classical question in epidemiology: Are cases located near other cases? In order to classify a study area into zones of different risks and determine their boundaries, we have developed a spatial partitioning method based on oblique decision trees, which is called spatial oblique decision tree (SpODT). This non-parametric method is based on the classification and regression tree (CART) approach introduced by Leo Breiman. Applied to epidemiological spatial data, the algorithm recursively searches among the coordinates for a threshold or a boundary between zones, so that the risks estimated in these zones are as different as possible. While the CART algorithm leads to rectangular zones, providing perpendicular splits of longitudes and latitudes, the SpODT algorithm provides oblique splitting of the study area, which is more appropriate and accurate for spatial epidemiology. Oblique decision trees can be considered as non-parametric regression models. Beyond the basic function, we have developed a set of functions that enable extended analyses of spatial data, providing: inference, graphical representations, spatio-temporal analysis, adjustments on covariates, spatial weighted partition, and the gathering of similar adjacent final classes. In this paper, we propose a new R package, SPODT, which provides an extensible set of functions for partitioning spatial and spatio-temporal data. The implementation and extensions of the algorithm are described. Function usage examples are proposed, looking for clustering malaria episodes in Bandiagara, Mali, and samples showing three different cluster shapes.
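
The difference between perpendicular (CART-style) and oblique splits can be sketched as follows. This is a simplified illustration in Python, not the SPODT implementation: it searches a fixed grid of directions, whereas the package's own search strategy may differ, and the function names, the between-group sum of squares criterion and the toy data are assumptions.

```python
import math

def between_group_ss(left, right):
    """Between-group sum of squares for a two-way split of the response."""
    n_l, n_r = len(left), len(right)
    mean_l, mean_r = sum(left) / n_l, sum(right) / n_r
    mean_all = (sum(left) + sum(right)) / (n_l + n_r)
    return n_l * (mean_l - mean_all) ** 2 + n_r * (mean_r - mean_all) ** 2

def best_split(points, z, angles_deg):
    """Return (score, angle, threshold) of the best split over the given directions."""
    best_score, best_angle, best_cut = -1.0, None, None
    for angle in angles_deg:
        theta = math.radians(angle)
        # Project every location onto the direction; round to avoid float noise.
        proj = [round(x * math.cos(theta) + y * math.sin(theta), 9) for x, y in points]
        levels = sorted(set(proj))
        for lo, hi in zip(levels, levels[1:]):
            cut = (lo + hi) / 2.0
            left = [v for p, v in zip(proj, z) if p <= cut]
            right = [v for p, v in zip(proj, z) if p > cut]
            if not left or not right:
                continue
            score = between_group_ss(left, right)
            if score > best_score:
                best_score, best_angle, best_cut = score, angle, cut
    return best_score, best_angle, best_cut

# Toy study area: a 5 x 5 grid whose high-risk zone lies beyond a diagonal boundary.
points = [(float(x), float(y)) for x in range(5) for y in range(5)]
z = [1.0 if x + y > 4 else 0.0 for x, y in points]

axis_parallel = best_split(points, z, angles_deg=[0, 90])       # CART-like splits
oblique = best_split(points, z, angles_deg=range(0, 180, 15))   # oblique splits
```

On this toy example the oblique search recovers the diagonal boundary in a single split, while the best axis-parallel split separates the two zones only partially.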

Mon, 16 Feb 2015 08:00:00 GMT http://www.jstatsoft.org/v63/i16
spTimer: Spatio-Temporal Bayesian Modeling Using R http://www.jstatsoft.org/v63/i15/paper Vol. 63, Issue 15, Feb 2015

Abstract:

Hierarchical Bayesian modeling of large point-referenced space-time data is increasingly becoming feasible in many environmental applications due to the recent advances in both statistical methodology and computation power. Implementation of these methods using the Markov chain Monte Carlo (MCMC) computational techniques, however, requires development of problem-specific and user-written computer code, possibly in a low-level language. This programming requirement is hindering the widespread use of the Bayesian model-based methods among practitioners, and hence there is an urgent need to develop high-level software that can analyze large data sets rich in both space and time. This paper develops the package spTimer for hierarchical Bayesian modeling of stylized environmental space-time monitoring data as a contributed software package in the R language that is fast becoming a very popular statistical computing platform. The package is able to fit, and to spatially and temporally predict, large amounts of space-time data using three recently developed Bayesian models. The user is given control over many options regarding covariance function selection, distance calculation, prior selection and tuning of the implemented MCMC algorithms, although suitable defaults are provided. The package has many other attractive features such as on-the-fly transformations and an ability to spatially predict temporally aggregated summaries on the original scale, which saves the problem of storage when using MCMC methods for large datasets. A simulation example, with more than a million observations, and a real life data example are used to validate the underlying code and to illustrate the software capabilities.

Mon, 16 Feb 2015 08:00:00 GMT http://www.jstatsoft.org/v63/i15
spate: An R Package for Spatio-Temporal Modeling with a Stochastic Advection-Diffusion Process http://www.jstatsoft.org/v63/i14/paper Vol. 63, Issue 14, Feb 2015

Abstract:

The R package spate implements methodology for modeling of large space-time data sets. A spatio-temporal Gaussian process is defined through a stochastic partial differential equation (SPDE) which is solved using spectral methods. In contrast to the traditional geostatistical way of relying on the covariance function, the spectral SPDE approach is computationally tractable and provides a realistic space-time parametrization.
This package aims at providing tools for simulating and modeling of spatio-temporal processes using an SPDE based approach. The package contains functions for obtaining parametrizations, such as propagator or innovation covariance matrices, of the spatio-temporal model. This allows for building customized hierarchical Bayesian models using the SPDE based model at the process stage. The functions of the package then provide computationally efficient algorithms needed for doing inference with the hierarchical model. Furthermore, an adaptive Markov chain Monte Carlo (MCMC) algorithm implemented in the package can be used as an algorithm for doing inference without any additional modeling. This function is flexible and allows for application specific customizing. The MCMC algorithm supports data that follow a Gaussian or a censored distribution with point mass at zero. Spatio-temporal covariates can be included in the model through a regression term.

Fri, 13 Feb 2015 08:00:00 GMT http://www.jstatsoft.org/v63/i14
spBayes for Large Univariate and Multivariate Point-Referenced Spatio-Temporal Data Models http://www.jstatsoft.org/v63/i13/paper Vol. 63, Issue 13, Feb 2015

Abstract:

In this paper we detail the reformulation and rewrite of core functions in the spBayes R package. These efforts have focused on improving computational efficiency, flexibility, and usability for point-referenced data models. Attention is given to algorithm and computing developments that result in improved sampler convergence rate and efficiency by reducing parameter space; decreased sampler run-time by avoiding expensive matrix computations; and increased scalability to large datasets by implementing a class of predictive process models that attempt to overcome computational hurdles by representing spatial processes in terms of lower-dimensional realizations. Beyond these general computational improvements for existing model functions, we detail new functions for modeling data indexed in both space and time. These new functions implement a class of dynamic spatio-temporal models for settings where space is viewed as continuous and time is taken as discrete.

Fri, 13 Feb 2015 08:00:00 GMT http://www.jstatsoft.org/v63/i13
Model-Based Geostatistics the Easy Way http://www.jstatsoft.org/v63/i12/paper Vol. 63, Issue 12, Feb 2015

Abstract:

This paper briefly describes geostatistical models for Gaussian and non-Gaussian data and demonstrates the geostatsp and diseasemapping packages for performing inference using these models. Making use of R’s spatial data types, and raster objects in particular, makes spatial analyses using geostatistical models simple and convenient. Examples using real data are shown for Gaussian spatial data, binomially distributed spatial data, a log-Gaussian Cox process, and an area-level model for case counts.

Fri, 13 Feb 2015 08:00:00 GMT http://www.jstatsoft.org/v63/i12
geoCount: An R Package for the Analysis of Geostatistical Count Data http://www.jstatsoft.org/v63/i11/paper Vol. 63, Issue 11, Feb 2015

Abstract:

We describe the R package geoCount for the analysis of geostatistical count data. The package performs Bayesian analysis for the Poisson-lognormal and binomial-logitnormal spatial models, which are subclasses of the class of generalized linear spatial models proposed by Diggle, Tawn, and Moyeed (1998). The package implements the computationally intensive tasks in C++ using an R/C++ interface, and has parallel computation capabilities to speed up the computations. geoCount also implements group updating, Langevin-Hastings algorithms and a data-based parameterization, algorithmic approaches proposed by Christensen, Roberts, and Sköld (2006) to improve the efficiency of the Markov chain Monte Carlo algorithms. In addition, the package includes functions for simulation and visualization, as well as three geostatistical count datasets taken from the literature. One of those is used to illustrate the package capabilities. Finally, we provide a side-by-side comparison between geoCount and the R packages geoRglm and INLA.

Tue, 10 Feb 2015 08:00:00 GMT http://www.jstatsoft.org/v63/i11
Software for Spatial Statistics http://www.jstatsoft.org/v63/i01/paper Vol. 63, Issue 1, Feb 2015

Abstract:

We give an overview of the papers published in this special issue on spatial statistics of the Journal of Statistical Software. 21 papers address issues covering visualization (micromaps, links to Google Maps or Google Earth), point pattern analysis, geostatistics, analysis of areal aggregated or lattice data, spatio-temporal statistics, Bayesian spatial statistics, and Laplace approximations. We also point to earlier publications in this journal on the same topic.

Tue, 10 Feb 2015 08:00:00 GMT http://www.jstatsoft.org/v63/i01
micromap: A Package for Linked Micromaps http://www.jstatsoft.org/v63/i02/paper Vol. 63, Issue 2, Feb 2015

Abstract:

The R package micromap is used to create linked micromaps, which display statistical summaries associated with areal units, or polygons. Linked micromaps provide a means to simultaneously summarize and display both statistical and geographic distributions by linking statistical summaries to a series of small maps. The package contains functions dependent on the ggplot2 package to produce a row-oriented graph composed of different panels, or columns, of information. These panels at a minimum typically contain maps, a legend, and statistical summaries, with the color-coded legend linking the maps and statistical summaries. We first describe the layout of linked micromaps and then the structure required for both the spatial and statistical datasets. The function create_map_table in the micromap package converts the input of an sp SpatialPolygonsDataFrame into a data frame that can be linked with the statistical dataset. Highly detailed polygons are not appropriate for display in linked micromaps so we describe how polygon boundaries can be simplified, decreasing the time required to draw the graphs, while retaining adequate detail for detection of spatial patterns. Our worked examples of linked micromaps use public health data as well as environmental data collected from spatially balanced probabilistic surveys.

Tue, 10 Feb 2015 08:00:00 GMT http://www.jstatsoft.org/v63/i02
micromapST: Exploring and Communicating Geospatial Patterns in US State Data http://www.jstatsoft.org/v63/i03/paper Vol. 63, Issue 3, Feb 2015

Abstract:

The linked micromap graphical design uses color to link each geographic unit’s name with its statistical graphic elements and map location across columns in a single row. Perceptual grouping of these rows into smaller chunks of data facilitates local focus and visual queries. Sorting the geographic units (the rows) in different ways can reveal patterns in the statistics, in the maps, and in the association between them. This design supports both exploration and communication in a multivariate geospatial context. This paper describes micromapST, an R package that implements the linked micromap graphical design specifically formatted for US state data, a common geographic unit used to display geographic patterns of health and other factors within the US. This package creates a graphic for the 51 geographic units (50 states plus DC) that fits on a single page, with states comprising the rows and state names, graphs and maps the columns. The graphical element for each state/column combination may represent a single statistical value, e.g., by a dot or horizontal bar, with or without an uncertainty measure. The distribution of values within each state, e.g., for counties, may be displayed by a boxplot. Two values per state may be represented by an arrow indicating the change in values, e.g., between two time points, or a scatter plot of the paired data. Categorical counts may be displayed as horizontal stacked bars, with optional standardization to percents or centering of the bars. Layout options include specification of the sort order for the rows, the graph/map linking colors, a vertical reference line and others. Output may be directed to the screen but is best displayed on a printer (or as a print image saved to any file format supported by R). 
The availability of a pre-defined linked micromap layout specifically for the 51 US states with graphical displays of single values, data distributions, change between two values, scatter plots of paired values, time series data and categorical data, facilitates quick exploration and communication of US state data for most common data types.

Tue, 10 Feb 2015 08:00:00 GMT http://www.jstatsoft.org/v63/i03
RgoogleMaps and loa: Unleashing R Graphics Power on Map Tiles http://www.jstatsoft.org/v63/i04/paper Vol. 63, Issue 4, Feb 2015

Abstract:

The RgoogleMaps package (1) provides an R interface to query the Google and the OpenStreetMap servers for static maps in the form of PNGs, and (2) enables the user to overlay plots on those maps within R. The loa package provides dedicated panel functions to integrate RgoogleMaps within the lattice plotting environment.
In addition to solving the generic task of plotting on a map background in R, we introduce several specific algorithms to detect and visualize spatio-temporal clusters. This task can often be reduced to detecting over-densities in space relative to a background density. The relative density estimation is framed as a binary classification problem. An integrated hotspot visualizer is presented which allows the efficient identification and visualization of clusters in one environment. Competing clustering methods such as the scan statistic and the density scan offer higher detection power at a much larger computational cost. Such clustering methods can then be extended using the lattice trellis framework to provide further insight into the relationship between clusters and potentially influential parameters. While there are other options for such map ‘mashups’ we believe that the integration of RgoogleMaps and lattice using loa can in certain circumstances be advantageous, e.g., by providing a highly intuitive working environment for multivariate analysis and flexible testbed for the rapid development of novel data visualizations.
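
Framing relative density estimation as binary classification can be sketched as follows: label the events of interest 1 and the background points 0, estimate the class probability at a location, and convert the resulting odds into a density ratio. The Python sketch below swaps in a plain k-nearest-neighbour classifier purely for illustration; it is not the classifier or the code used by the packages, and the function names and toy data are assumptions.

```python
import math

def knn_case_probability(query, cases, background, k=5):
    """Fraction of the k nearest points (cases and background pooled) that are cases."""
    labelled = [(math.dist(query, p), 1) for p in cases]
    labelled += [(math.dist(query, p), 0) for p in background]
    labelled.sort(key=lambda pair: pair[0])
    return sum(label for _, label in labelled[:k]) / k

def relative_density(query, cases, background, k=5):
    """Estimated ratio of case density to background density at `query`."""
    p = knn_case_probability(query, cases, background, k)
    p = min(max(p, 1e-6), 1.0 - 1e-6)       # keep the odds finite
    prior_odds = len(cases) / len(background)
    return (p / (1.0 - p)) / prior_odds      # posterior odds divided by prior odds

# Toy data: a regular background grid, evenly spread cases, and one cluster near (2, 2).
background = [(float(x), float(y)) for x in range(10) for y in range(10)]
spread = [(x + 0.5, y + 0.5) for x in (1, 4, 7) for y in (1, 4, 7)]
cluster = [(2.1, 2.1), (2.2, 1.9), (1.9, 2.2), (1.8, 1.8), (2.3, 2.3), (2.1, 1.8)]
cases = spread + cluster

inside_cluster = relative_density((2.0, 2.0), cases, background)
outside_cluster = relative_density((8.0, 2.0), cases, background)
```

Locations where the ratio is well above 1 are candidate over-densities relative to the background.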

Tue, 10 Feb 2015 08:00:00 GMT http://www.jstatsoft.org/v63/i04