Journal of Statistical Software http://www.jstatsoft.org/rss Mon, 20 Apr 2015 00:45:26 GMT Most recent publications from the Journal of Statistical Software
SAVE: An R Package for the Statistical Analysis of Computer Models http://www.jstatsoft.org/v64/i13/paper Vol. 64, Issue 13, Apr 2015

Abstract:

This paper introduces the R package SAVE, which implements statistical methodology for the analysis of computer models. Specifically, the package includes routines that perform emulation, calibration and validation of such models. The methodology is Bayesian and is essentially that of Bayarri, Berger, Paulo, Sacks, Cafeo, Cavendish, Lin, and Tu (2007). The package is available through the Comprehensive R Archive Network. We illustrate its use with a real data example and with a simulated example.

Wed, 08 Apr 2015 07:00:00 GMT http://www.jstatsoft.org/v64/i13
GPfit: An R Package for Fitting a Gaussian Process Model to Deterministic Simulator Outputs http://www.jstatsoft.org/v64/i12/paper Vol. 64, Issue 12, Apr 2015

Abstract:

Gaussian process (GP) models are commonly used statistical metamodels for emulating expensive computer simulators. Fitting a GP model can be numerically unstable if any pair of design points in the input space are close together. Ranjan, Haynes, and Karsten (2011) proposed a computationally stable approach for fitting GP models to deterministic computer simulators. They maximized the likelihood with a genetic-algorithm-based approach that is robust but computationally intensive. This paper implements a slightly modified version of the model proposed by Ranjan et al. (2011) in the R package GPfit. A novel parameterization of the spatial correlation function and a clustering-based, multi-start, gradient-based optimization algorithm yield robust optimization that is typically faster than the genetic-algorithm-based approach. We present two examples with R code to illustrate the usage of the main functions in GPfit. Several test functions are used for performance comparison with the popular R package mlegp. We also use GPfit in a real application: emulating the tidal kinetic energy model for the Bay of Fundy, Nova Scotia, Canada. GPfit is free software, distributed under the General Public License and available from the Comprehensive R Archive Network.
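
The instability caused by nearby design points can be seen in the smallest possible case, a correlation matrix for two points. The following Python sketch is not GPfit code. It assumes a Gaussian correlation exp(-theta * d^2) and uses a small nugget on the diagonal as one common stabilizing device; the names `condition_number`, `theta` and `nugget` are chosen here for illustration only.

```python
import math

def condition_number(distance, theta=1.0, nugget=0.0):
    """Condition number of the 2x2 matrix [[1 + nugget, r], [r, 1 + nugget]].

    r = exp(-theta * distance**2) is an assumed Gaussian correlation.
    The eigenvalues are (1 + nugget) + r and (1 + nugget) - r.
    """
    one_minus_r = -math.expm1(-theta * distance ** 2)  # 1 - r, computed accurately
    r = 1.0 - one_minus_r
    return (1.0 + nugget + r) / (one_minus_r + nugget)

# The matrix becomes nearly singular as the two points approach each other;
# the nugget bounds the condition number from above.
for d in (1.0, 1e-2, 1e-4):
    plain = condition_number(d)
    stabilized = condition_number(d, nugget=1e-6)
    print(f"distance={d:g}  no nugget: {plain:.3g}  nugget=1e-6: {stabilized:.3g}")
```

The trade-off in this toy setting is that a larger nugget improves conditioning but moves the fitted surface away from exact interpolation of a deterministic simulator.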

Wed, 08 Apr 2015 07:00:00 GMT http://www.jstatsoft.org/v64/i12
BatchJobs and BatchExperiments: Abstraction Mechanisms for Using R in Batch Environments http://www.jstatsoft.org/v64/i11/paper Vol. 64, Issue 11, Mar 2015

Abstract:

Empirical analysis of statistical algorithms often demands time-consuming experiments. We present two R packages which greatly simplify working in batch computing environments. The package BatchJobs implements the basic objects and procedures to control any batch cluster from within R. It is structured around cluster versions of the well-known higher-order functions Map, Reduce and Filter from functional programming. Computations are performed asynchronously and all job states are persistently stored in a database, which can be queried at any point in time. The second package, BatchExperiments, is tailored for the still very general scenario of analyzing arbitrary algorithms on problem instances. It extends package BatchJobs by letting the user define an array of jobs of the kind “apply algorithm A to problem instance P and store results”. It is possible to associate statistical designs with parameters of problems and algorithms and therefore to systematically study their influence on the results.

The packages’ main features are: (a) Convenient usage: All relevant batch system operations are either handled internally or mapped to simple R functions. (b) Portability: Both packages use a clear and well-defined interface to the batch system which makes them applicable in most high-performance computing environments. (c) Reproducibility: Every computational part has an associated seed to ensure reproducibility even when the underlying batch system changes. (d) Abstraction and good software design: The code layers for algorithms, experiment definitions and execution are cleanly separated and enable the writing of readable and maintainable code.
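
The Map, Filter and Reduce pattern and the per-job seeds described above can be sketched in a few lines. This is a plain, single-process Python illustration, not the BatchJobs or BatchExperiments API; the job layout and the function `run_job` are hypothetical.

```python
import random
from functools import reduce

def run_job(job):
    """Run one hypothetical job: draw `size` uniform numbers with the job's own seed."""
    rng = random.Random(job["seed"])  # per-job generator, independent of global state
    sample = [rng.random() for _ in range(job["size"])]
    return {"id": job["id"], "mean": sum(sample) / len(sample)}

jobs = [{"id": i, "seed": 1000 + i, "size": 50} for i in range(1, 6)]

results = list(map(run_job, jobs))                               # Map: one result per job
above_half = list(filter(lambda r: r["mean"] > 0.5, results))    # Filter: keep a subset
total = reduce(lambda acc, r: acc + r["mean"], results, 0.0)     # Reduce: aggregate

print(len(results), len(above_half), total / len(results))
```

Because every job carries its own seed, rerunning any single job in any order gives the same result, which is the property that makes results independent of how a scheduler happens to distribute the work.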

Fri, 20 Mar 2015 07:00:00 GMT http://www.jstatsoft.org/v64/i11
gems: An R Package for Simulating from Disease Progression Models http://www.jstatsoft.org/v64/i10/paper Vol. 64, Issue 10, Mar 2015

Abstract:

Mathematical models of disease progression predict disease outcomes and are useful epidemiological tools for planners and evaluators of health interventions. The R package gems is a tool that simulates disease progression in patients and predicts the effect of different interventions on patient outcome. Disease progression is represented by a series of events (e.g., diagnosis, treatment and death), displayed in a directed acyclic graph. The vertices correspond to disease states and the directed edges represent events. The package gems allows simulations based on a generalized multistate model that can be described by a directed acyclic graph with continuous transition-specific hazard functions. The user can specify an arbitrary hazard function and its parameters. The model includes parameter uncertainty, does not need to be a Markov model, and may take the history of previous events into account. Applications are not limited to the medical field and extend to other areas where multistate simulation is of interest. We provide a technical explanation of the multistate models used by gems, explain the functions of gems and their arguments, and show a sample application.
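
To make the idea of simulating along a directed acyclic graph concrete, here is a minimal Python sketch. It is not the gems interface: it assumes constant (exponential) hazards on every edge, whereas the package described above accepts arbitrary hazard functions, and the state names, rates and `simulate_patient` helper are invented for illustration.

```python
import random

# Hypothetical three-state DAG: each edge carries a constant hazard (events per year).
HAZARDS = {
    "diagnosis": {"treatment": 0.8, "death": 0.1},
    "treatment": {"death": 0.05},
    "death": {},  # absorbing state
}

def simulate_patient(rng, start="diagnosis", horizon=30.0):
    """Return the (time, state) history of one simulated patient up to `horizon`."""
    time, state = 0.0, start
    history = [(time, state)]
    while HAZARDS[state]:
        # Competing risks: draw a latent time for every outgoing edge, keep the earliest.
        draws = {nxt: rng.expovariate(rate) for nxt, rate in HAZARDS[state].items()}
        nxt = min(draws, key=draws.get)
        time += draws[nxt]
        if time > horizon:
            break  # censored at the end of follow-up
        state = nxt
        history.append((time, state))
    return history

rng = random.Random(2015)
cohort = [simulate_patient(rng) for _ in range(1000)]
died = sum(1 for h in cohort if h[-1][1] == "death")
print(f"{died} of {len(cohort)} simulated patients died within the horizon")
```

With constant hazards the process is Markov; history-dependent or time-varying hazards would require replacing the exponential draws with draws from the corresponding waiting-time distributions.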

Fri, 20 Mar 2015 07:00:00 GMT http://www.jstatsoft.org/v64/i10
nparcomp: An R Software Package for Nonparametric Multiple Comparisons and Simultaneous Confidence Intervals http://www.jstatsoft.org/v64/i09/paper Vol. 64, Issue 9, Mar 2015

Abstract:

One-way layouts, i.e., a single factor with several levels and multiple observations at each level, frequently arise in various fields. Usually not only a global hypothesis is of interest but also multiple comparisons between the different treatment levels. In most practical situations, the distribution of the observed data is unknown and there may be a number of atypical measurements and outliers. Hence, the use of parametric and semiparametric procedures that impose restrictive distributional assumptions on the observed samples becomes questionable. This, in turn, emphasizes the demand for statistical procedures that enable us to accurately and reliably analyze one-way layouts with minimal conditions on the available data. Nonparametric methods offer such a possibility and are thus of particular practical importance. In this article, we introduce a new R package, nparcomp, which provides easy and user-friendly access to rank-based methods for the analysis of unbalanced one-way layouts. It provides procedures for performing multiple comparisons and computing simultaneous confidence intervals for the estimated effects, which can be easily visualized. The special case of two samples, the nonparametric Behrens-Fisher problem, is included. We illustrate the implemented procedures with examples from biology and medicine.
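
For the two-sample case, the effect usually studied in the nonparametric Behrens-Fisher problem is the relative effect p = P(X < Y) + 0.5 P(X = Y). The Python sketch below estimates it by direct pairwise comparison. It assumes this is the effect of interest and does not reproduce nparcomp's functions, tests or confidence intervals; `relative_effect` and the data are illustrative.

```python
def relative_effect(x, y):
    """Estimate p = P(X < Y) + 0.5 * P(X = Y) by comparing all pairs."""
    wins = sum(1.0 for a in x for b in y if a < b)
    ties = sum(0.5 for a in x for b in y if a == b)
    return (wins + ties) / (len(x) * len(y))

control = [1.2, 2.4, 2.4, 3.1]
treated = [2.4, 3.5, 4.0]
print(relative_effect(control, treated))
```

A value near 0.5 indicates no tendency toward larger values in either sample, while values above 0.5 indicate that the second sample tends to be larger. The estimate depends only on the ordering of the observations, so it makes no distributional assumption and is unaffected by monotone transformations of the data.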

Fri, 20 Mar 2015 07:00:00 GMT http://www.jstatsoft.org/v64/i09
R Package multgee: A Generalized Estimating Equations Solver for Multinomial Responses http://www.jstatsoft.org/v64/i08/paper Vol. 64, Issue 8, Mar 2015

Abstract:

The R package multgee implements the local odds ratios generalized estimating equations (GEE) approach proposed by Touloumis, Agresti, and Kateri (2013), a GEE approach for correlated multinomial responses that circumvents theoretical and practical limitations of the GEE method. A main strength of multgee is that it provides GEE routines for both ordinal (ordLORgee) and nominal (nomLORgee) responses, while other relevant software in R and SAS is restricted to ordinal responses under a marginal cumulative link model specification. In addition, multgee offers a marginal adjacent categories logit model for ordinal responses and a marginal baseline category logit model for nominal responses. Further, utility functions are available to ease the selection of the local odds ratios structure (intrinsic.pars) and to perform a Wald-type goodness-of-fit test between two nested GEE models (waldts). We demonstrate the application of multgee through a clinical trial with clustered ordinal multinomial responses.

Fri, 20 Mar 2015 07:00:00 GMT http://www.jstatsoft.org/v64/i08
PReMiuM: An R Package for Profile Regression Mixture Models Using Dirichlet Processes http://www.jstatsoft.org/v64/i07/paper Vol. 64, Issue 7, Mar 2015

Abstract:

PReMiuM is a recently developed R package for Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, nonparametrically linking a response vector to covariate data through cluster membership (Molitor, Papathomas, Jerrett, and Richardson 2010). The package allows binary, categorical, count and continuous responses, as well as continuous and discrete covariates. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may also be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection.

Fri, 20 Mar 2015 07:00:00 GMT http://www.jstatsoft.org/v64/i07
SDD: An R Package for Serial Dependence Diagrams http://www.jstatsoft.org/v64/c02/paper Vol. 64, Code Snippet 2, Mar 2015

Fri, 20 Mar 2015 07:00:00 GMT http://www.jstatsoft.org/v64/c02
Building a Nomogram for Survey-Weighted Cox Models Using R http://www.jstatsoft.org/v64/c01/paper Vol. 64, Code Snippet 1, Mar 2015

Fri, 20 Mar 2015 07:00:00 GMT http://www.jstatsoft.org/v64/c01
NHPoisson: An R Package for Fitting and Validating Nonhomogeneous Poisson Processes http://www.jstatsoft.org/v64/i06/paper Vol. 64, Issue 6, Mar 2015

Abstract:

NHPoisson is an R package for the modeling of nonhomogeneous Poisson processes in one dimension. It includes functions for data preparation, maximum likelihood estimation, covariate selection and inference based on asymptotic distributions and simulation methods. It also provides specific methods for the estimation of Poisson processes resulting from a peaks-over-threshold approach. In addition, the package supports a wide range of model validation tools and functions for generating nonhomogeneous Poisson process trajectories. This paper describes the package and aims to help those interested in modeling data using nonhomogeneous Poisson processes.
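
One standard way to generate a trajectory of a nonhomogeneous Poisson process is thinning: simulate a homogeneous process at a rate that bounds the intensity, then keep each point with probability intensity(t) / bound. The Python sketch below illustrates that technique under the stated assumption. It is not taken from NHPoisson, whose own simulation method may differ, and `simulate_nhpp` and `seasonal` are hypothetical names.

```python
import math
import random

def simulate_nhpp(intensity, lambda_max, t_end, rng):
    """Simulate event times on [0, t_end] by thinning a rate-`lambda_max` process.

    Requires intensity(t) <= lambda_max for every t in [0, t_end].
    """
    events, t = [], 0.0
    while True:
        t += rng.expovariate(lambda_max)  # next candidate from the homogeneous process
        if t > t_end:
            return events
        # Accept the candidate with probability intensity(t) / lambda_max.
        if rng.random() * lambda_max <= intensity(t):
            events.append(t)

def seasonal(t):
    """Hypothetical intensity with a yearly cycle; never exceeds 5."""
    return 3.0 + 2.0 * math.sin(2.0 * math.pi * t / 365.0)

rng = random.Random(64)
times = simulate_nhpp(seasonal, lambda_max=5.0, t_end=365.0, rng=rng)
print(len(times), "events; the expected count is", 3.0 * 365.0)
```

The expected number of events equals the integral of the intensity over the interval, here 3 × 365 because the sine term integrates to zero over a full period. A tighter bound `lambda_max` wastes fewer candidates but must never fall below the intensity.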

Fri, 20 Mar 2015 07:00:00 GMT http://www.jstatsoft.org/v64/i06
Constructing and Modifying Sequence Statistics for relevent Using informR in R http://www.jstatsoft.org/v64/i05/paper Vol. 64, Issue 5, Mar 2015

Abstract:

The informR package greatly simplifies the analysis of complex event histories in R by providing user-friendly tools to build sufficient statistics for the relevent package. Historically, building sufficient statistics to model event sequences (of the form a'b) using the egocentric generalization of Butts’ (2008) relational event framework for modeling social action has been cumbersome. The informR package simplifies the construction of the complex list of arrays needed by the rem() model-fitting function for a variety of cases involving egocentric event data, multiple event types, and/or support constraints. This paper introduces these tools using examples from real data extracted from the American Time Use Survey.

Fri, 20 Mar 2015 07:00:00 GMT http://www.jstatsoft.org/v64/i05
fitdistrplus: An R Package for Fitting Distributions http://www.jstatsoft.org/v64/i04/paper Vol. 64, Issue 4, Mar 2015

Abstract:

The package fitdistrplus provides functions for fitting univariate distributions to different types of data (continuous censored or non-censored data and discrete data) and supports different estimation methods (maximum likelihood, moment matching, quantile matching and maximum goodness-of-fit estimation). Outputs of the fitdist and fitdistcens functions are S3 objects, for which specific methods are provided, including summary, plot and quantile. The package also provides various functions to compare the fit of several distributions to the same data set and can bootstrap parameter estimates. Detailed examples are given in food risk assessment, ecotoxicology and insurance contexts.
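
The difference between estimation methods is easiest to see for a one-parameter family. The Python sketch below fits an exponential rate in two ways: by maximum likelihood (which for this distribution coincides with moment matching) and by quantile matching at the median. It is an illustration of the estimation principles only, not of the fitdist interface; both function names are hypothetical.

```python
import math
import random
import statistics

def fit_exponential_mle(data):
    """Maximum likelihood estimate of the rate; equals the moment-matching estimate."""
    return 1.0 / statistics.fmean(data)

def fit_exponential_median(data):
    """Quantile-matching estimate: equate the sample median with ln(2) / rate."""
    return math.log(2.0) / statistics.median(data)

rng = random.Random(4)
sample = [rng.expovariate(2.0) for _ in range(5000)]
print(fit_exponential_mle(sample), fit_exponential_median(sample))
```

Both estimators target the same parameter, but the median-based one is less sensitive to a few very large observations, at the cost of higher variance when the exponential model is correct.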

Fri, 20 Mar 2015 07:00:00 GMT http://www.jstatsoft.org/v64/i04
Exploring Diallelic Genetic Markers: The HardyWeinberg Package http://www.jstatsoft.org/v64/i03/paper Vol. 64, Issue 3, Mar 2015

Abstract:

Testing genetic markers for Hardy-Weinberg equilibrium is an important issue in genetic association studies. The HardyWeinberg package offers the classical tests for equilibrium, functions for power computation and for the simulation of marker data under equilibrium and disequilibrium. Functions for testing equilibrium in the presence of missing data by using multiple imputation are provided. The package also supplies various graphical tools such as ternary plots with acceptance regions, log-ratio plots and Q-Q plots for exploring the equilibrium status of a large set of diallelic markers. Classical tests for equilibrium and graphical representations for diallelic marker data are reviewed. Several data sets illustrate the use of the package.
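
Among the classical tests for equilibrium, the simplest is the chi-square test on the three genotype counts of a diallelic marker. The Python sketch below computes it without continuity correction. It is a stand-alone illustration, not a call into the HardyWeinberg package, and the function name `hwe_chisq` and the example counts are invented.

```python
import math

def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square test (1 df, no continuity correction) for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square survival function for 1 df
    return stat, p_value

print(hwe_chisq(298, 489, 213))
```

The sketch assumes both alleles are observed; for a monomorphic marker an expected count is zero and the statistic is undefined. With small expected counts an exact test is generally preferable to this asymptotic one.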

Fri, 20 Mar 2015 07:00:00 GMT http://www.jstatsoft.org/v64/i03
Fitting Heavy Tailed Distributions: The poweRlaw Package http://www.jstatsoft.org/v64/i02/paper Vol. 64, Issue 2, Mar 2015

Abstract:

Over the last few years, the power law distribution has been used as the data generating mechanism in many disparate fields. However, at times the techniques used to fit the power law distribution have been inappropriate. This paper describes the poweRlaw R package, which makes fitting power laws and other heavy-tailed distributions straightforward. This package contains R functions for fitting, comparing and visualizing heavy-tailed distributions. Overall, it provides a principled approach to power law fitting.
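
As a point of reference for what fitting a power law means, the following Python sketch computes the maximum likelihood estimate of the exponent for a continuous power law, alpha = 1 + n / sum(ln(x_i / xmin)), assuming the lower cutoff xmin is known. It is not the poweRlaw interface; `fit_power_law` is a hypothetical helper and the data are simulated.

```python
import math
import random

def fit_power_law(data, xmin):
    """MLE of alpha for the continuous density proportional to x**(-alpha), x >= xmin."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

rng = random.Random(2)
alpha_true, xmin = 2.5, 1.0
# Inverse-transform sampling from the power law with exponent alpha_true.
sample = [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha_true - 1.0)) for _ in range(10000)]
print(fit_power_law(sample, xmin))
```

A frequent alternative, fitting a straight line to a log-log histogram by least squares, yields a different estimate from this likelihood-based one and is sensitive to the choice of bins.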

Fri, 20 Mar 2015 07:00:00 GMT http://www.jstatsoft.org/v64/i02
iqLearn: Interactive Q-Learning in R http://www.jstatsoft.org/v64/i01/paper Vol. 64, Issue 1, Mar 2015

Abstract:

Chronic illness treatment strategies must adapt to the evolving health status of the patient receiving treatment. Data-driven dynamic treatment regimes can offer guidance for clinicians and intervention scientists on how to treat patients over time in order to bring about the most favorable clinical outcome on average. Methods for estimating optimal dynamic treatment regimes, such as Q-learning, typically require modeling nonsmooth, nonmonotone transformations of data. Thus, building well-fitting models can be challenging and in some cases may result in a poor estimate of the optimal treatment regime. Interactive Q-learning (IQ-learning) is an alternative to Q-learning that only requires modeling smooth, monotone transformations of the data. The R package iqLearn provides functions for implementing both the IQ-learning and Q-learning algorithms. We demonstrate how to estimate a two-stage optimal treatment policy with iqLearn using a generated data set bmiData which mimics a two-stage randomized body mass index reduction trial with binary treatments at each stage.

Fri, 20 Mar 2015 07:00:00 GMT http://www.jstatsoft.org/v64/i01