Journal of Statistical Software http://www.jstatsoft.org/rss Thu, 18 Sep 2014 23:43:06 GMT
Most recent publications from the Journal of Statistical Software

structSSI: Simultaneous and Selective Inference for Grouped or Hierarchically Structured Data http://www.jstatsoft.org/v59/i13/paper Vol. 59, Issue 13, Sep 2014

Abstract:

The R package structSSI provides an accessible implementation of two recently developed simultaneous and selective inference techniques: the group Benjamini-Hochberg and hierarchical false discovery rate procedures. Unlike many multiple testing schemes, these methods specifically incorporate existing information about the grouped or hierarchical dependence between hypotheses under consideration while controlling the false discovery rate. Doing so increases statistical power and interpretability. Furthermore, these procedures provide novel approaches to the central problem of encoding complex dependency between hypotheses.
We briefly describe the group Benjamini-Hochberg and hierarchical false discovery rate procedures and then illustrate them using two examples, one a measure of ecological microbial abundances and the other a global temperature time series. For both procedures, we detail the steps associated with the analysis of these particular data sets, including establishing the dependence structures, performing the test, and interpreting the results. These steps are encapsulated by R functions, and we explain their applicability to general data sets.
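
For orientation, the classical (ungrouped) Benjamini-Hochberg step-up rule that both procedures build on can be sketched as follows. This is a minimal Python illustration, not the structSSI interface and not the group or hierarchical variant; the function name and the p-values are made up for the example.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Classical BH step-up rule: returns one rejection flag per hypothesis.

    Hypothetical helper for illustration only; it ignores any group or
    tree structure among the hypotheses.
    """
    m = len(pvalues)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose p-value lies below its threshold k/m * alpha.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k = rank
    # Reject every hypothesis ranked at or below k, even if it missed its own threshold.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Made-up p-values: 0.03 misses its own threshold (0.025) but is still
# rejected because 0.035 passes the threshold at rank 3 (0.0375).
flags = benjamini_hochberg([0.5, 0.03, 0.001, 0.035], alpha=0.05)
```

The step-up behaviour is the point of the example: the decision depends on the largest rank that passes, not on each p-value in isolation.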

Fri, 12 Sep 2014 07:00:00 GMT http://www.jstatsoft.org/v59/i13
Capabilities of R Package mixAK for Clustering Based on Multivariate Continuous and Discrete Longitudinal Data http://www.jstatsoft.org/v59/i12/paper Vol. 59, Issue 12, Sep 2014

Abstract:

The R package mixAK originally implemented routines primarily for Bayesian estimation of finite normal mixture models for possibly interval-censored data. The functionality of the package was considerably enhanced by implementing methods for Bayesian estimation of mixtures of multivariate generalized linear mixed models proposed in Komárek and Komárková (2013). Among other things, this allows for a cluster analysis (classification) based on multivariate continuous and discrete longitudinal data that arise whenever multiple outcomes of a different nature are recorded in a longitudinal study. The package also allows for a data-driven selection of the number of clusters, because methods for selecting the number of mixture components were implemented. We give an overview of a model and clustering methodology for multivariate continuous and discrete longitudinal data. Further, a step-by-step cluster analysis based jointly on three longitudinal variables of different types (continuous, count, dichotomous) is given, which serves as a user manual for applying the package to similar problems.

Fri, 12 Sep 2014 07:00:00 GMT http://www.jstatsoft.org/v59/i12
hmmm: An R Package for Hierarchical Multinomial Marginal Models http://www.jstatsoft.org/v59/i11/paper Vol. 59, Issue 11, Sep 2014

Abstract:

In this paper we show how complete hierarchical multinomial marginal (HMM) models for categorical variables can be defined, estimated and tested using the R package hmmm. Models involving equality and inequality constraints on marginal parameters are needed to define hypotheses of conditional independence, stochastic dominance or notions of positive dependence, or when the parameters are allowed to depend on covariates. The hmmm package also serves the need of estimating and testing HMM models under equality and inequality constraints on marginal interactions.

Fri, 12 Sep 2014 07:00:00 GMT http://www.jstatsoft.org/v59/i11
Tidy Data http://www.jstatsoft.org/v59/i10/paper Vol. 59, Issue 10, Sep 2014

Abstract:

A huge amount of effort is spent cleaning data to get it ready for analysis, but there has been little research on how to make data cleaning as easy and effective as possible. This paper tackles a small, but important, component of data cleaning: data tidying. Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table. This framework makes it easy to tidy messy datasets because only a small set of tools is needed to deal with a wide range of untidy datasets. This structure also makes it easier to develop tidy tools for data analysis, tools that both input and output tidy datasets. The advantages of a consistent data structure and matching tools are demonstrated with a case study free from mundane data manipulation chores.
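
The tidy structure described above (one variable per column, one observation per row) can be illustrated with a small reshaping sketch. It is written in plain Python rather than with the paper's R tools; the `melt` helper, the column names, and the values are all hypothetical.

```python
def melt(rows, id_vars, var_name, value_name):
    """Turn wide records into long (tidy) records.

    Hypothetical helper: every column not listed in id_vars is treated as
    a measured value whose column name is itself a variable.
    """
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key in id_vars:
                continue
            # Copy the identifying columns, then add one (variable, value) pair.
            record = {k: row[k] for k in id_vars}
            record[var_name] = key
            record[value_name] = value
            tidy.append(record)
    return tidy


# Made-up messy table: the treatment is encoded in the column headers.
messy = [
    {"person": "A", "treatment_a": 1, "treatment_b": 2},
    {"person": "B", "treatment_a": 16, "treatment_b": 11},
]

# After melting, each row is one observation: (person, treatment, result).
tidy = melt(messy, id_vars=["person"], var_name="treatment", value_name="result")
```

In the long form every row answers the same question, namely which person received which treatment with what result, which is what lets one small set of tools handle many differently shaped inputs.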

Fri, 12 Sep 2014 07:00:00 GMT http://www.jstatsoft.org/v59/i10
A Kenward-Roger Approximation and Parametric Bootstrap Methods for Tests in Linear Mixed Models – The R Package pbkrtest http://www.jstatsoft.org/v59/i09/paper Vol. 59, Issue 9, Sep 2014

Abstract:

When testing for reduction of the mean value structure in linear mixed models, it is common to use an asymptotic χ² test. Such tests can, however, be very poor for small and moderate sample sizes. The pbkrtest package implements two alternatives to such approximate χ² tests: (1) a Kenward-Roger approximation for performing F tests for reduction of the mean structure and (2) parametric bootstrap methods for achieving the same goal. The implementation is focused on linear mixed models with independent residual errors. In addition to describing the methods and aspects of their implementation, the paper also contains several examples and a comparison of the various methods.

Fri, 12 Sep 2014 07:00:00 GMT http://www.jstatsoft.org/v59/i09
dglars: An R Package to Estimate Sparse Generalized Linear Models http://www.jstatsoft.org/v59/i08/paper Vol. 59, Issue 8, Sep 2014

Abstract:

dglars is a publicly available R package that implements the method proposed in Augugliaro, Mineo, and Wit (2013), developed to study the sparse structure of a generalized linear model. This method, called dgLARS, is based on a differential geometrical extension of the least angle regression method proposed in Efron, Hastie, Johnstone, and Tibshirani (2004). The core of the dglars package consists of two algorithms implemented in Fortran 90 to efficiently compute the solution curve: a predictor-corrector algorithm, proposed in Augugliaro et al. (2013), and a cyclic coordinate descent algorithm, proposed in Augugliaro, Mineo, and Wit (2012). The latter algorithm, as shown here, is significantly faster than the predictor-corrector algorithm. For comparison purposes, we have implemented both algorithms.

Fri, 12 Sep 2014 07:00:00 GMT http://www.jstatsoft.org/v59/i08
SNSequate: Standard and Nonstandard Statistical Models and Methods for Test Equating http://www.jstatsoft.org/v59/i07/paper Vol. 59, Issue 7, Sep 2014

Abstract:

Equating is a family of statistical models and methods that are used to adjust scores on two or more versions of a test, so that the scores from different tests may be used interchangeably. In this paper we present the R package SNSequate, which implements both standard and nonstandard statistical models and methods for test equating. The package construction was motivated by the need for modular, simple, yet comprehensive and general software that carries out traditional and new equating methods. SNSequate currently implements the traditional mean, linear and equipercentile equating methods, as well as the mean-mean, mean-sigma, Haebara and Stocking-Lord item response theory linking methods. It also supports the newest methods such as local equating, kernel equating, and item response theory parameter linking methods based on asymmetric item characteristic functions. Practical examples are given to illustrate the capabilities of the software. A list of other programs for equating is presented, highlighting the main differences between them. Future directions for the package are also discussed.
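
As a rough illustration of the two simplest traditional methods named above, mean equating shifts a score by the difference in form means, and linear equating additionally rescales by the ratio of standard deviations. The Python sketch below uses textbook formulas with made-up score vectors; the function names are hypothetical and do not reflect the SNSequate interface.

```python
from statistics import mean, pstdev


def mean_equate(x, scores_x, scores_y):
    """Mean equating: map score x on form X to the scale of form Y by a shift."""
    return x - mean(scores_x) + mean(scores_y)


def linear_equate(x, scores_x, scores_y):
    """Linear equating: match both the mean and the standard deviation of form Y."""
    slope = pstdev(scores_y) / pstdev(scores_x)
    return mean(scores_y) + slope * (x - mean(scores_x))


# Made-up score samples for two test forms.
form_x = [10, 12, 14, 16, 18]   # mean 14
form_y = [20, 24, 28, 32, 36]   # mean 28, twice the spread of form X

shifted = mean_equate(16, form_x, form_y)    # 16 - 14 + 28
scaled = linear_equate(16, form_x, form_y)   # 28 + 2 * (16 - 14)
```

The two results differ because form Y is more spread out than form X: a score two points above the mean on X corresponds to four points above the mean on Y once the scale is matched.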

Fri, 12 Sep 2014 07:00:00 GMT http://www.jstatsoft.org/v59/i07
phtt: Panel Data Analysis with Heterogeneous Time Trends in R http://www.jstatsoft.org/v59/i06/paper Vol. 59, Issue 6, Sep 2014

Abstract:

The R package phtt provides estimation procedures for panel data with large dimensions n and T and general forms of unobservable heterogeneous effects. In particular, the estimation procedures are those of Bai (2009) and Kneip, Sickles, and Song (2012), which complement one another very well: both models assume the unobservable heterogeneous effects to have a factor structure. Kneip et al. (2012) consider the case in which the time-varying common factors have relatively smooth patterns, including strongly positively auto-correlated stationary as well as non-stationary factors, whereas the method of Bai (2009) focuses on stochastically bounded factors such as ARMA processes. Additionally, the phtt package provides a wide range of dimensionality criteria in order to estimate the number of unobserved factors simultaneously with the remaining model parameters.

Fri, 12 Sep 2014 07:00:00 GMT http://www.jstatsoft.org/v59/i06
mediation: R Package for Causal Mediation Analysis http://www.jstatsoft.org/v59/i05/paper Vol. 59, Issue 5, Sep 2014

Abstract:

In this paper, we describe the R package mediation for conducting causal mediation analysis in applied empirical research. In many scientific disciplines, the goal of researchers is not only to estimate causal effects of a treatment but also to understand the process through which the treatment causally affects the outcome. Causal mediation analysis is frequently used to assess potential causal mechanisms. The mediation package implements a comprehensive suite of statistical tools for conducting such an analysis. The package is organized around two distinct approaches. Using the model-based approach, researchers can estimate causal mediation effects and conduct sensitivity analysis under the standard research design. The design-based approach, in turn, provides several analysis tools that are applicable under different experimental designs. This approach requires weaker assumptions than the model-based approach. We also implement a statistical method for dealing with multiple (causally dependent) mediators, which are often encountered in practice. Finally, the package offers a methodology for assessing causal mediation in the presence of treatment noncompliance, a common problem in randomized trials.

Tue, 02 Sep 2014 07:00:00 GMT http://www.jstatsoft.org/v59/i05
Nestedness for Dummies (NeD): A User-Friendly Web Interface for Exploratory Nestedness Analysis http://www.jstatsoft.org/v59/c03/paper Vol. 59, Code Snippet 3, Aug 2014

Wed, 13 Aug 2014 07:00:00 GMT http://www.jstatsoft.org/v59/c03
General Purpose Convolution Algorithm in S4 Classes by Means of FFT http://www.jstatsoft.org/v59/i04/paper Vol. 59, Issue 4, Aug 2014

Abstract:

Object orientation provides a flexible framework for the implementation of the convolution of arbitrary distributions of real-valued random variables. We discuss an algorithm which is based on the fast Fourier transform. It directly applies to lattice-supported distributions. In the case of continuous distributions an additional discretization to a linear lattice is necessary and the resulting lattice-supported distributions are suitably smoothed after convolution.
We compare our algorithm, with respect to accuracy and speed, to other approaches aiming at a similar generality. In situations where the exact results are known, several checks confirm a high accuracy of the proposed algorithm, which is also illustrated for approximations of non-central χ² distributions.
By means of object orientation this default algorithm is overloaded by more specific algorithms where possible, in particular where explicit convolution formulae are available. Our focus is on the R package distr, which implements this approach, overloading operator + for convolution; based on this convolution, we define a whole arithmetic of mathematical operations acting on distribution objects, comprising operators +, -, *, /, and ^.
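
For the lattice-supported case, the idea of convolving two distributions through the fast Fourier transform can be sketched in a few lines. The following Python example is a minimal illustration under simple assumptions (both probability vectors live on the same unit-spaced lattice, a textbook radix-2 FFT is used); it is not the algorithm or interface of the distr package, and it omits the discretization and smoothing steps needed for continuous distributions.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def convolve_pmf(p, q):
    """Probabilities of X + Y for independent X ~ p and Y ~ q on a unit lattice."""
    size = len(p) + len(q) - 1
    n = 1
    while n < size:            # zero-pad to a power of two
        n *= 2
    fp = fft([complex(x) for x in p] + [0j] * (n - len(p)))
    fq = fft([complex(x) for x in q] + [0j] * (n - len(q)))
    # Pointwise product in the frequency domain equals convolution in the original domain.
    inv = fft([x * y for x, y in zip(fp, fq)], invert=True)
    # Undo the FFT scaling and clip tiny negative rounding errors.
    return [max(v.real / n, 0.0) for v in inv[:size]]


# Two fair dice: index 0 is face 1, so index i of the result is the sum i + 2.
die = [1 / 6] * 6
two_dice = convolve_pmf(die, die)
```

Here `two_dice[5]` is the probability of a total of 7, which agrees with the exact value 6/36 up to floating-point error.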

Wed, 13 Aug 2014 07:00:00 GMT http://www.jstatsoft.org/v59/i04
ART: A Data Aggregation Program for the Behavioral Sciences http://www.jstatsoft.org/v59/i03/paper Vol. 59, Issue 3, Aug 2014

Abstract:

Today, many experiments in the field of behavioral sciences are conducted using a computer. While there is a broad choice of computer programs facilitating the process of conducting experiments as well as programs for statistical analysis, there are relatively few programs facilitating the intermediate step of data aggregation. ART has been developed in order to fill this gap and to provide a computer program for data aggregation that has a graphical user interface, so that aggregation can be done more easily and without any programming. All “rules” that are necessary to extract variables can be seen “at a glance”, which helps the user to conduct even complex aggregations with several hundred variables and makes aggregation less error-prone. ART runs on Windows XP, Vista, 7, and 8 and is free. Copies (executable and source code) are available at http://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/art.html.

Wed, 13 Aug 2014 07:00:00 GMT http://www.jstatsoft.org/v59/i03
The R Package survsim for the Simulation of Simple and Complex Survival Data http://www.jstatsoft.org/v59/i02/paper Vol. 59, Issue 2, Aug 2014

Abstract:

We present an R package for the simulation of simple and complex survival data. It covers different situations, including recurrent events and multiple events. The main simulation routine allows the user to introduce an arbitrary number of distributions, each corresponding to a new event or episode, with its parameters, choosing between the Weibull (and exponential as a particular case), log-logistic and log-normal distributions.
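
As a simple illustration of the single-event case, Weibull event times can be drawn by inverse-transform sampling and then right-censored at a fixed follow-up time. The Python sketch below is an assumption-laden toy, not the survsim interface: the function names, the parameterization (shape and scale), and the fixed censoring time are all chosen for the example.

```python
import math
import random


def weibull_time(u, shape, scale):
    """Inverse Weibull CDF: the event time at cumulative probability u in [0, 1)."""
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)


def simulate(n, shape, scale, censor_time, seed=0):
    """Draw n (observed_time, event_observed) pairs with fixed right censoring."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = weibull_time(rng.random(), shape, scale)
        # Subjects whose event falls after censor_time are recorded as censored.
        data.append((min(t, censor_time), t <= censor_time))
    return data


# shape = 1 reduces the Weibull to the exponential distribution.
sample = simulate(n=100, shape=1.0, scale=2.0, censor_time=5.0)
```

Setting `shape=1.0` recovers the exponential special case mentioned in the abstract; recurrent or multiple events would require drawing a sequence of such times per subject, which this sketch does not attempt.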

Wed, 13 Aug 2014 07:00:00 GMT http://www.jstatsoft.org/v59/i02
cancerclass: An R Package for Development and Validation of Diagnostic Tests from High-Dimensional Molecular Data http://www.jstatsoft.org/v59/i01/paper Vol. 59, Issue 1, Aug 2014

Abstract:

Progress in molecular high-throughput techniques has led to the opportunity of a comprehensive monitoring of biomolecules in medical samples. In the era of personalized medicine, these data form the basis for the development of diagnostic, prognostic and predictive tests for cancer. Because of the high number of features that are measured simultaneously in a relatively low number of samples, supervised learning approaches are sensitive to overfitting and performance overestimation. Bioinformatic methods were developed to cope with these problems, including control of accuracy and precision. However, there is demand for easy-to-use software that integrates methods for classifier construction, performance assessment and development of diagnostic tests. To help fill this gap, we developed a comprehensive R package for the development and validation of diagnostic tests from high-dimensional molecular data. An important focus of the package is a careful validation of the classification results. To this end, we implemented an extended version of the multiple random validation protocol, a previously introduced validation method. The package includes methods for continuous prediction scores. This is important in a clinical setting, because scores can be converted to probabilities and help to distinguish between clear-cut and borderline classification results. The functionality of the package is illustrated by the analysis of two cancer microarray data sets.

Wed, 13 Aug 2014 07:00:00 GMT http://www.jstatsoft.org/v59/i01
runmixregls: A Program to Run the MIXREGLS Mixed-Effects Location Scale Software from within Stata http://www.jstatsoft.org/v59/c02/paper Vol. 59, Code Snippet 2, Aug 2014

Wed, 13 Aug 2014 07:00:00 GMT http://www.jstatsoft.org/v59/c02