Journal of Statistical Software http://www.jstatsoft.org/rss Wed, 29 Jul 2015 21:57:57 GMT
Most recent publications from the Journal of Statistical Software

An Improved Evaluation of Kolmogorov’s Distribution http://www.jstatsoft.org/v65/c03/paper Vol. 65, Code Snippet 3, Jun 2015

Sun, 21 Jun 2015 07:00:00 GMT http://www.jstatsoft.org/v65/c03
ergm.graphlets: A Package for ERG Modeling Based on Graphlet Statistics http://www.jstatsoft.org/v65/i12/paper Vol. 65, Issue 12, Jun 2015

Abstract:

Exponential-family random graph models are probabilistic network models that are parametrized by sufficient statistics based on structural (i.e., graph-theoretic) properties. The ergm package for the R statistical computing environment is a collection of tools for the analysis of network data within an exponential-family random graph model framework. Many different network properties can be employed as sufficient statistics for exponential-family random graph models by using the model terms defined in the ergm package; this functionality can be expanded by the creation of packages that code for additional network statistics. Here, our focus is on the addition of statistics based on graphlets. Graphlets are classes of small, connected, induced subgraphs that can be used to describe the topological structure of a network. We introduce an R package called ergm.graphlets that enables the use of graphlet properties of a network within the ergm package of R. The ergm.graphlets package provides a complete list of model terms that allow statistics of any 2-, 3-, 4-, and 5-node graphlets to be incorporated into exponential-family random graph models. The new model terms of the ergm.graphlets package enable both exponential-family random graph modeling of global structural properties and investigation of relationships between node attributes (i.e., covariates) and local topologies around nodes.

Sun, 21 Jun 2015 07:00:00 GMT http://www.jstatsoft.org/v65/i12
DiceDesign and DiceEval: Two R Packages for Design and Analysis of Computer Experiments http://www.jstatsoft.org/v65/i11/paper Vol. 65, Issue 11, Jun 2015

Abstract:

This paper introduces two R packages available on the Comprehensive R Archive Network. The main application concerns the study of computer code output. Package DiceDesign is dedicated to numerical design of experiments, from the construction to the study of the design properties. Package DiceEval deals with the fit, the validation and the comparison of metamodels. After a brief presentation of the context, we focus on the architecture of these two packages. A two-dimensional test function serves as a running example to illustrate the main functionalities of these packages, and an industrial case study in five dimensions is also detailed.

Sun, 21 Jun 2015 07:00:00 GMT http://www.jstatsoft.org/v65/i11
remote: Empirical Orthogonal Teleconnections in R http://www.jstatsoft.org/v65/i10/paper Vol. 65, Issue 10, Jun 2015

Abstract:

In climate science, teleconnection analysis has a long-standing history as a means for describing regions that exhibit above-average capability of explaining variance over time within a certain spatial domain (e.g., global). The most prominent example of a global coupled ocean-atmosphere teleconnection is the El Niño Southern Oscillation. There are numerous signal decomposition methods for identifying such regions, the most widely used of which are (rotated) empirical orthogonal functions. First introduced by van den Dool, Saha, and Johansson (2000), empirical orthogonal teleconnections (EOT) denote a regression-based approach that allows for straightforward interpretation of the extracted modes. In this paper we present the R implementation of the original algorithm in the remote package. To highlight its usefulness, we provide three examples of potential use-case scenarios for the method, including the replication of one of the original examples from van den Dool et al. (2000). Furthermore, we highlight the algorithm’s use for cross-correlations between two different geographic fields (identifying sea surface temperature drivers for precipitation), as well as statistical downscaling from coarse to fine grids (using Normalized Difference Vegetation Index fields).
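
The regression-based idea behind EOT can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the remote package's API: the function names (`eot`, `explained_variance`) and the toy field are hypothetical, and the selection rule used here is the commonly described one, namely picking the grid point whose time series explains the most variance summed over all grid points, then removing its influence by regression and repeating on the residuals.

```python
from statistics import mean

def center(series):
    """Subtract the temporal mean from one grid point's time series."""
    m = mean(series)
    return [v - m for v in series]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def explained_variance(base, field):
    """Sum of squares explained when every series in `field` is regressed on `base`."""
    bb = dot(base, base)
    if bb == 0:
        return 0.0
    return sum(dot(base, s) ** 2 / bb for s in field)

def eot(field, n_modes=2):
    """Return (base point index, base series) for each extracted mode.
    `field` is a list of time series, one per grid point (hypothetical layout)."""
    resid = [center(s) for s in field]
    modes = []
    for _ in range(n_modes):
        # Base point: the series that explains the most variance overall.
        best = max(range(len(resid)),
                   key=lambda i: explained_variance(resid[i], resid))
        base = resid[best][:]
        bb = dot(base, base)
        if bb == 0:
            break
        # Remove the base point's influence from every series by regression.
        resid = [[v - dot(base, s) / bb * b for v, b in zip(s, base)]
                 for s in resid]
        modes.append((best, base))
    return modes

# Toy field: four grid points, six time steps, built from two orthogonal signals.
a = [1, -1, 1, -1, 1, -1]
b = [1, 1, -1, -1, 0, 0]
field = [
    [2 * ai + bi for ai, bi in zip(a, b)],  # point 0: mixture of both signals
    [2 * ai for ai in a],                   # point 1
    list(a),                                # point 2
    list(b),                                # point 3
]
modes = eot(field, n_modes=2)
first_base_point = modes[0][0]
```

Because each mode is tied to one identifiable grid point, the extracted series can be read directly as "the time series at that location", which is the interpretability advantage the abstract refers to.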

Sun, 21 Jun 2015 07:00:00 GMT http://www.jstatsoft.org/v65/i10
Mann-Whitney Type Tests for Microarray Experiments: The R Package gMWT http://www.jstatsoft.org/v65/i09/paper Vol. 65, Issue 9, Jun 2015

Abstract:

We present the R package gMWT, which is designed for the comparison of several treatments (or groups) for a large number of variables. The comparisons are made using certain probabilistic indices (PIs). The PIs computed here tell how often pairs or triples of observations coming from different groups appear in a specific order of magnitude. Classical two- and several-sample rank test statistics such as the Mann-Whitney-Wilcoxon, Kruskal-Wallis, or Jonckheere-Terpstra test statistics are simple functions of these PIs. New test statistics for directional alternatives are also provided. The package gMWT can be used to calculate the variable-wise PI estimates, to illustrate their multivariate distribution and mutual dependence with joint scatterplot matrices, and to construct several classical and new rank tests based on the PIs. The aim of the paper is first to briefly explain the theory that is necessary to understand the behavior of the estimated PIs and the rank tests based on them. Second, the use of the package is described and illustrated with simulated and real data examples. It is stressed that the package provides a new flexible toolbox to analyze large gene or microRNA expression data sets, collected on microarrays or by other high-throughput technologies. The testing procedures can be used in an eQTL analysis, for example, as implemented in the package GeneticTools.
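
To make the notion of a probabilistic index concrete, the following minimal Python sketch estimates how often pairs (or triples) of observations from different groups appear in a given order, and recovers a Mann-Whitney-type count from the pair index. It is not the gMWT interface: the function names and data are hypothetical, and counting ties as one half in the pair index is an assumed convention.

```python
from itertools import product

def probabilistic_index(x, y):
    """Estimate P(X < Y) as the share of (x_i, y_j) pairs with x_i < y_j.
    Ties contribute one half (an assumed convention)."""
    score = 0.0
    for xi, yj in product(x, y):
        if xi < yj:
            score += 1.0
        elif xi == yj:
            score += 0.5
    return score / (len(x) * len(y))

def triple_index(x, y, z):
    """Estimate P(X < Y < Z) as the share of strictly ordered triples."""
    count = sum(1 for xi, yj, zk in product(x, y, z) if xi < yj < zk)
    return count / (len(x) * len(y) * len(z))

def mann_whitney_u(x, y):
    """Number of pairs with x_i < y_j: the pair index rescaled by the pair count."""
    return probabilistic_index(x, y) * len(x) * len(y)

control = [1.2, 2.3, 3.1]
treated = [2.9, 3.5, 4.0, 4.4]
pi_hat = probabilistic_index(control, treated)  # 11 of the 12 pairs are ordered
u_stat = mann_whitney_u(control, treated)
```

The rescaling in `mann_whitney_u` is what is meant by a rank statistic being a simple function of the PI: the index is the statistic divided by the number of pairs.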

Sun, 21 Jun 2015 07:00:00 GMT http://www.jstatsoft.org/v65/i09
PCovR: An R Package for Principal Covariates Regression http://www.jstatsoft.org/v65/i08/paper Vol. 65, Issue 8, Jun 2015

Abstract:

In this article, we present PCovR, an R package for performing principal covariates regression (PCovR; De Jong and Kiers 1992). PCovR was developed for analyzing regression data with many and/or highly collinear predictor variables. The method simultaneously reduces the predictor variables to a limited number of components and regresses the criterion variables on these components. The flexibility, interpretational advantages, and computational simplicity of PCovR make the method stand out among many other regression methods. The PCovR package offers data preprocessing options, new model selection procedures, and several component rotation strategies, some of which were not available in R until now. The use and usefulness of the package are illustrated with a real dataset, called psychiatrists.

Sun, 21 Jun 2015 07:00:00 GMT http://www.jstatsoft.org/v65/i08
MF Calculator: A Web-Based Application for Analyzing Similarity http://www.jstatsoft.org/v65/c02/paper Vol. 65, Code Snippet 2, Jun 2015

Mon, 01 Jun 2015 07:00:00 GMT http://www.jstatsoft.org/v65/c02
LazySorted: A Lazily, Partially Sorted Python List http://www.jstatsoft.org/v65/c01/paper Vol. 65, Code Snippet 1, Jun 2015

Mon, 01 Jun 2015 07:00:00 GMT http://www.jstatsoft.org/v65/c01
DTR: An R Package for Estimation and Comparison of Survival Outcomes of Dynamic Treatment http://www.jstatsoft.org/v65/i07/paper Vol. 65, Issue 7, Jun 2015

Abstract:

Sequentially randomized designs, more recently known as sequential multiple assignment randomized trial (SMART) designs, are widely used in biomedical research, particularly in clinical trials, to assess and compare the effects of various treatment sequences. In such designs, patients are initially randomized to one of the first-stage therapies. Then patients meeting some criteria (e.g., no relapse of disease) participate in the second-stage randomization to one of the second-stage therapies. The advantage of such a design is that it allows the investigator to study various treatment sequences where the patients' second-stage therapies can be adjusted based on their responses to the first-stage therapies. In the past few years, substantial improvement has been made in the statistical methods for analyzing the data from SMARTs. Many of the proposed statistical approaches focus on estimating and comparing the survival outcomes of treatment sequences embedded in the SMART designs. In this article, we introduce the R package DTR, which provides a set of functions that can be used to estimate and compare the effects of different treatment sequences on survival outcomes using the newly proposed statistical approaches. The proposed package is also illustrated using simulated data from SMARTs.

Mon, 01 Jun 2015 07:00:00 GMT http://www.jstatsoft.org/v65/i07
The VGAM Package for Capture-Recapture Data Using the Conditional Likelihood http://www.jstatsoft.org/v65/i05/paper Vol. 65, Issue 5, Jun 2015

Abstract:

It is well known that using individual covariate information (such as body weight or gender) to model heterogeneity in capture-recapture (CR) experiments can greatly enhance inferences on the size of a closed population. Since individual covariates are only observable for captured individuals, complex conditional likelihood methods are usually required, and these do not constitute a standard generalized linear model (GLM) family. Modern statistical techniques such as generalized additive models (GAMs), which allow a relaxing of the linearity assumptions on the covariates, are readily available for many standard GLM families. Fortunately, a natural statistical framework for maximizing conditional likelihoods is available in the Vector GLM and Vector GAM classes of models. We present several new R functions (implemented within the VGAM package) specifically developed to allow the incorporation of individual covariates in the analysis of closed population CR data using a GLM/GAM-like approach and the conditional likelihood. As a result, a wide variety of practical tools are now readily available in the VGAM object-oriented framework. We discuss and demonstrate their advantages, features and flexibility using the new VGAM CR functions on several examples.

Mon, 01 Jun 2015 07:00:00 GMT http://www.jstatsoft.org/v65/i05
frbs: Fuzzy Rule-Based Systems for Classification and Regression in R http://www.jstatsoft.org/v65/i06/paper Vol. 65, Issue 6, Jun 2015

Abstract:

Fuzzy rule-based systems (FRBSs) are a well-known family of methods within soft computing. They use fuzzy concepts to address complex real-world problems. We present the R package frbs, which implements the most widely used FRBS models, namely the Mamdani and Takagi-Sugeno-Kang (TSK) models, as well as some common variants. In addition, a host of learning methods for FRBSs, in which the models are constructed from data, are implemented. In this way, accurate and interpretable systems can be built for data analysis and modeling tasks. In this paper, we also provide some examples of the usage of the package and a comparison with other common classification and regression methods available in R.

Mon, 01 Jun 2015 07:00:00 GMT http://www.jstatsoft.org/v65/i06
kml and kml3d: R Packages to Cluster Longitudinal Data http://www.jstatsoft.org/v65/i04/paper Vol. 65, Issue 4, Jun 2015

Abstract:

Longitudinal studies are essential tools in medical research. In these studies, variables are not restricted to single measurements but can be seen as variable-trajectories, either single or joint. Thus, an important question concerns the identification of homogeneous patient trajectories. kml and kml3d are R packages providing an implementation of k-means designed to work specifically on trajectories (kml) or on joint trajectories (kml3d). They provide various tools to work on longitudinal data: imputation methods for trajectories (nine classic and one original), methods to define starting conditions in k-means (four classic and three original) and quality criteria to choose the best number of clusters (four classic and one original). In addition, they offer graphic facilities to “visualize” the trajectories, either in 2D (single trajectory) or 3D (joint-trajectories). The 3D graph representing the mean joint-trajectories of each cluster can be exported through LaTeX in a 3D dynamic rotating PDF graph (Figures 1 and 9).
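
As a rough illustration of k-means applied to trajectories, the sketch below treats each patient's trajectory as a vector of measurements taken at common time points and clusters those vectors with squared Euclidean distance. It is not the kml implementation: the function names and data are hypothetical, starting centers are simply drawn at random from the trajectories (one of many possible starting conditions), and missing values are assumed to have been imputed beforehand.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two trajectories of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_trajectory(trajs):
    """Pointwise mean of a list of trajectories."""
    n = len(trajs)
    return [sum(vals) / n for vals in zip(*trajs)]

def kmeans_trajectories(trajs, k, n_iter=50, seed=0):
    """Plain k-means on trajectories; returns (labels, cluster mean trajectories)."""
    rng = random.Random(seed)
    centers = [list(t) for t in rng.sample(trajs, k)]  # assumed starting condition
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(t, centers[c]))
                      for t in trajs]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [t for t, lab in zip(trajs, labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = mean_trajectory(members)
    return labels, centers

# Hypothetical data: three rising and three flat trajectories over four visits.
trajs = [
    [1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [0.9, 1.8, 3.2, 3.9],
    [5.0, 5.0, 5.0, 5.0], [5.2, 4.9, 5.1, 5.0], [4.8, 5.1, 4.9, 5.2],
]
labels, centers = kmeans_trajectories(trajs, k=2)
```

The cluster centers returned here are themselves trajectories, which is what makes plotting a mean trajectory per cluster natural.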

Mon, 01 Jun 2015 07:00:00 GMT http://www.jstatsoft.org/v65/i04
simPH: An R Package for Illustrating Estimates from Cox Proportional Hazard Models Including for Interactive and Nonlinear Effects http://www.jstatsoft.org/v65/i03/paper Vol. 65, Issue 3, Jun 2015

Abstract:

The R package simPH provides tools for effectively communicating results from Cox proportional hazard (PH) models, including models with interactive and nonlinear effects. The Cox PH model is a popular tool for examining event data. However, previously available computational tools have not made it easy to explore and communicate quantities of interest and associated uncertainty estimated from them. This is especially true when the effects are interactions or nonlinear transformations of continuous variables. These transformations are especially useful with Cox PH models because they can be employed to correctly specify models that would otherwise violate the proportional hazards assumption. Package simPH makes it easy to simulate and then plot quantities of interest for a variety of effects estimated from Cox PH models, including interactive effects, nonlinear effects, and standard linear effects. Package simPH employs visual weighting in order to effectively communicate estimation uncertainty. There are options to show either the standard central interval of the simulation's distribution or the shortest probability interval, which can be useful for asymmetrically distributed estimates. This paper uses hypothetical and empirical examples to illustrate package simPH's syntax and capabilities.
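
The difference between a central interval and a shortest probability interval can be illustrated with a short Python sketch. It does not use or reproduce simPH: the function names, the nearest-rank quantile rule, and the simulated right-skewed draws are assumptions made for the example.

```python
import math
import random

def central_interval(draws, level=0.95):
    """Interval between the (1 - level)/2 and (1 + level)/2 empirical quantiles,
    using a simple nearest-rank rule (an assumed convention)."""
    s = sorted(draws)
    n = len(s)
    lo = int(math.floor((1 - level) / 2 * (n - 1)))
    hi = int(math.ceil((1 + level) / 2 * (n - 1)))
    return s[lo], s[hi]

def shortest_interval(draws, level=0.95):
    """Narrowest window of consecutive sorted draws that holds `level` of them."""
    s = sorted(draws)
    n = len(s)
    # Number of draws the window must contain (small guard against float error).
    k = max(1, math.ceil(level * n - 1e-9))
    best = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[best], s[best + k - 1]

random.seed(1)
# Hypothetical right-skewed simulations, e.g. hazard ratios on the exp scale.
sims = [math.exp(random.gauss(0.5, 0.6)) for _ in range(2000)]
c_lo, c_hi = central_interval(sims)
s_lo, s_hi = shortest_interval(sims)
```

For a symmetric distribution the two intervals roughly coincide; for a skewed one the shortest interval shifts toward the dense side of the distribution and is never wider than the central one at the same level.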

Mon, 01 Jun 2015 07:00:00 GMT http://www.jstatsoft.org/v65/i03
CompPD: A MATLAB Package for Computing Projection Depth http://www.jstatsoft.org/v65/i02/paper Vol. 65, Issue 2, Jun 2015

Abstract:

Since the seminal work of Tukey (1975), depth functions have proved extremely useful in robust data analysis and inference for multivariate data. Many notions of depth have been developed in the last decades. Among others, projection depth appears to be very favorable. It turns out that, with appropriate choices of univariate location and scale estimators, the projection depth induced estimators usually possess very high breakdown point robustness and infinite sample relative efficiency (Zuo 2003; Zuo, Cui, and He 2004; Zuo 2006). However, the computation of the projection depth seems hopeless and intimidating, if not impossible. This hinders the further development of inference procedures in practice. Algorithms exist sporadically in individual papers, but a unified computation package for projection depth has not been documented. To fill the gap, a MATLAB package entitled CompPD is presented in this paper, which is in fact an implementation of the latest developments (Liu, Zuo, and Wang 2013; Liu and Zuo 2014). Illustrative examples are also provided to guide readers through step-by-step usage of package CompPD and to demonstrate its utility.

Mon, 01 Jun 2015 07:00:00 GMT http://www.jstatsoft.org/v65/i02
The R Package groc for Generalized Regression on Orthogonal Components http://www.jstatsoft.org/v65/i01/paper Vol. 65, Issue 1, Jun 2015

Abstract:

The R package groc for generalized regression on orthogonal components contains functions for the prediction of q responses using a set of p predictors. The primary building block is the grid algorithm used to search for components (projections of the data) which are most dependent on the response. The package offers flexibility in the choice of the dependence measure, which can be user-defined. The components are found sequentially. A first component is obtained and a smooth fit produces residuals. Then, a second component orthogonal to the first is found which is most dependent on the residuals, and so on. The package can handle models with more than one response. A panoply of models can be achieved through package groc: robust multiple or multivariate linear regression, nonparametric regression on orthogonal components, and classical or robust partial least squares models. Functions for predictions and cross-validation are available and helpful in model selection. The merit of a fit through cross-validation can be assessed with the predicted residual error sum of squares or with the predicted residual error median absolute deviation, the latter being more appropriate in the presence of outliers.
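
The two cross-validation criteria mentioned at the end of the abstract can be illustrated with a small Python sketch. It does not use groc or its grid algorithm: an ordinary least-squares line with one predictor stands in for the fitted model, leave-one-out is the assumed cross-validation scheme, the function names are hypothetical, and the median absolute deviation is computed without a consistency constant.

```python
from statistics import mean, median

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def loo_residuals(x, y):
    """Prediction error for each observation when it is left out of the fit."""
    res = []
    for i in range(len(x)):
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        res.append(y[i] - (intercept + slope * x[i]))
    return res

def press(res):
    """Predicted residual error sum of squares."""
    return sum(r * r for r in res)

def premad(res):
    """Median absolute deviation of the predicted residuals (unscaled)."""
    m = median(res)
    return median(abs(r - m) for r in res)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_clean = [2 * xi + 1 for xi in x]   # exact line y = 2x + 1
y_outlier = y_clean[:-1] + [40.0]    # last response replaced by an outlier

res_clean = loo_residuals(x, y_clean)
res_outlier = loo_residuals(x, y_outlier)
```

Comparing `press(res_outlier)` with `premad(res_outlier)` shows the point made in the abstract: a single aberrant response enters the sum of squares with its full squared prediction error, whereas the median-based criterion is driven by the bulk of the residuals.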

Mon, 01 Jun 2015 07:00:00 GMT http://www.jstatsoft.org/v65/i01