Journal of Statistical Software http://www.jstatsoft.org/rss Sun, 14 Mar 2010 08:46:49 GMT
Most recent publications from the Journal of Statistical Software
A Generalization of the Dirichlet Distribution http://www.jstatsoft.org/v33/i11/paper Vol. 33, Issue 11, Feb 2010

Abstract:

This paper discusses a generalization of the Dirichlet distribution, the ‘hyperdirichlet’, in which various types of incomplete observations may be incorporated. It is conjugate to the multinomial distribution when some observations are censored or grouped. The <b>hyperdirichlet</b> R package is introduced and examples given. A number of statistical tests are performed on the example datasets, which are drawn from diverse disciplines including sports statistics, the sociology of climate change, and psephology.

Tue, 23 Feb 2010 08:00:00 GMT /v33/i11 Robin K. S. Hankin
Inference with Linear Equality and Inequality Constraints Using R: The Package ic.infer http://www.jstatsoft.org/v33/i10/paper Vol. 33, Issue 10, Feb 2010

Abstract:

In linear models and multivariate normal situations, prior information in linear inequality form may be encountered, or linear inequality hypotheses may be subjected to statistical tests. R package <b>ic.infer</b> has been developed to support inequality-constrained estimation and testing for such situations. This article gives an overview of the principles underlying inequality-constrained inference that are far less well-known than methods for unconstrained or equality-constrained models, and describes their implementation in the package.

Tue, 23 Feb 2010 08:00:00 GMT /v33/i10 Ulrike Grömping
Solving Differential Equations in R: Package deSolve http://www.jstatsoft.org/v33/i09/paper Vol. 33, Issue 9, Feb 2010

Abstract:

In this paper we present the R package <b>deSolve</b> to solve initial value problems (IVP) written as ordinary differential equations (ODE), differential algebraic equations (DAE) of index 0 or 1 and partial differential equations (PDE), the latter solved using the method of lines approach. The differential equations can be represented in R code or as compiled code. In the latter case, R is used as a tool to trigger the integration and post-process the results, which facilitates model development and application, whilst the compiled code significantly increases simulation speed. The methods implemented are efficient, robust, and well-documented public-domain Fortran routines. They include four integrators from the <b>ODEPACK</b> package (LSODE, LSODES, LSODA, LSODAR), DVODE and DASPK2.0. In addition, a suite of Runge-Kutta integrators and special-purpose solvers to efficiently integrate 1-, 2- and 3-dimensional partial differential equations are available. The routines solve both stiff and non-stiff systems, and include many options, e.g., to deal efficiently with the sparsity of the Jacobian matrix, or to find the roots of equations. In this article, our objectives are threefold: (1) to demonstrate the potential of using R for dynamic modeling, (2) to highlight typical uses of the different methods implemented and (3) to compare the performance of models specified in R code and in compiled code for a number of test cases. These comparisons demonstrate that, if the use of loops is avoided, R code can efficiently integrate problems comprising several thousands of state variables. Nevertheless, the same problem may be solved 2 to more than 50 times faster using compiled code than with an implementation using only R code. Still, amongst the benefits of R are a more flexible and interactive implementation, better readability of the code, and access to R’s high-level procedures.
<b>deSolve</b> is the successor of package <b>odesolve</b>, which will be deprecated in the future; it is free software and distributed under the GNU General Public License, as part of the R software project.

Tue, 23 Feb 2010 08:00:00 GMT /v33/i09 R. Woodrow Setzer, Karline Soetaert, Thomas Petzoldt
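
The method of lines mentioned in the deSolve abstract above can be illustrated with a minimal sketch. This is not deSolve code and does not use its API: it is plain Python, written under the assumption of a 1-D diffusion equation u_t = D u_xx on [0, 1] with zero boundary values, discretized in space by central differences and integrated in time with a hand-written classical Runge-Kutta (RK4) step. All function names, the grid size and the step size are illustrative choices.

```python
import math


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def diffusion_rhs(D, dx):
    """Method of lines: replace u_xx by central differences on the interior
    grid, which turns the PDE into a system of ODEs, one per grid point.
    Boundary values are assumed to be zero."""
    def f(t, u):
        n = len(u)
        du = []
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            du.append(D * (left - 2 * u[i] + right) / dx ** 2)
        return du
    return f


def solve_diffusion(n_interior=19, D=1.0, t_end=0.1, dt=0.001):
    """Integrate from the initial profile sin(pi * x) up to t_end.
    dt is an assumed value, small enough for explicit RK4 to be stable here."""
    dx = 1.0 / (n_interior + 1)
    u = [math.sin(math.pi * (i + 1) * dx) for i in range(n_interior)]
    f = diffusion_rhs(D, dx)
    t = 0.0
    for _ in range(round(t_end / dt)):
        u = rk4_step(f, t, u, dt)
        t += dt
    return u
```

For this initial profile the exact solution is exp(-D * pi^2 * t) * sin(pi * x), so the midpoint value returned by `solve_diffusion()` can be compared against `math.exp(-math.pi ** 2 * 0.1)`. An explicit scheme like this one is only suitable for small, non-stiff problems; stiff systems call for implicit solvers of the kind the abstract lists.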
grofit: Fitting Biological Growth Curves with R http://www.jstatsoft.org/v33/i07/paper Vol. 33, Issue 7, Feb 2010

Abstract:

The <b>grofit</b> package was developed to fit many growth curves obtained under different conditions in order to derive a conclusive dose-response curve, for instance for a compound that potentially affects growth. <b>grofit</b> fits data to different parametric models and in addition provides a model-free spline method to circumvent systematic errors that might occur when parametric methods are applied. This addition increases the reliability of the characteristic parameters (e.g., lag phase, maximal growth rate, stationary phase) derived from a single growth curve. By relating the obtained parameters to the respective condition (e.g., concentration of a compound), a dose-response curve can be derived that enables the calculation of descriptive pharma-/toxicological values like the half maximum effective concentration (EC50). Bootstrap and cross-validation techniques are used for estimating confidence intervals of all derived parameters.

Wed, 17 Feb 2010 08:00:00 GMT /v33/i07 Guido Hasenbrink, Matthias Kahm, Hella Lichtenberg-Fraté, Jost Ludwig, Maik Kschischo
Simple Algorithms to Calculate Asymptotic Null Distributions of Robust Tests in Case-Control Genetic Association Studies in R http://www.jstatsoft.org/v33/i08/paper Vol. 33, Issue 8, Feb 2010

Abstract:

The case-control study is an important design for testing association between genetic markers and a disease. The Cochran-Armitage trend test (CATT) is one of the most commonly used statistics for the analysis of case-control genetic association studies. The asymptotically optimal CATT can be used when the underlying genetic model (mode of inheritance) is known. However, for most complex diseases, the underlying genetic models are unknown. Thus, tests robust to genetic model misspecification are preferable to the model-dependent CATT. Two robust tests, MAX3 and the genetic model selection (GMS), were recently proposed. Their asymptotic null distributions are often obtained by Monte-Carlo simulations, because they either have not been fully studied or involve multiple integrations. In this article, we study how components of each robust statistic are correlated, and find a linear dependence among the components. Using this new finding, we propose simple algorithms to calculate asymptotic null distributions for MAX3 and GMS, which greatly reduce the computing intensity. Furthermore, we have developed the R package <b>Rassoc</b> implementing the proposed algorithms to calculate the empirical and asymptotic <i>p</i> values for MAX3 and GMS as well as other commonly used tests in case-control association studies. For illustration, <b>Rassoc</b> is applied to the analysis of case-control data of the 17 most significant SNPs reported in four genome-wide association studies.

Wed, 17 Feb 2010 08:00:00 GMT /v33/i08 Yong Zang, Wing Kam Fung, Gang Zheng
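
As a rough illustration of the statistics named in the abstract above, the sketch below computes the Cochran-Armitage trend statistic for a 2 x 3 table of genotype counts with scores (0, x, 1). It is not the Rassoc API and does not reproduce the paper's algorithms for the null distribution. The abstract does not define MAX3; the code assumes the commonly cited definition, the largest absolute CATT over x = 0, 1/2 and 1 (recessive, additive and dominant models). Function names and the example counts are hypothetical.

```python
import math


def catt_z(cases, controls, scores):
    """Cochran-Armitage trend statistic (standard normal under the null).

    cases, controls: genotype counts, one entry per genotype.
    scores: one numeric score per genotype, e.g. (0, 0.5, 1).
    Assumes the variance is non-zero (no empty margin, scores not all equal).
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    # Trend numerator: positive when cases concentrate on high scores.
    T = sum(t * (S * r - R * s) for t, r, s in zip(scores, cases, controls))
    sum_t_n = sum(t * n for t, n in zip(scores, totals))
    sum_t2_n = sum(t * t * n for t, n in zip(scores, totals))
    var_T = (R * S / N) * (N * sum_t2_n - sum_t_n ** 2)
    return T / math.sqrt(var_T)


def two_sided_p(z):
    """Two-sided p value of a single CATT from the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def max3(cases, controls):
    """Assumed definition: max |CATT| over the three genetic models.
    Its null distribution is NOT standard normal, so two_sided_p does
    not apply to this value."""
    return max(abs(catt_z(cases, controls, (0.0, x, 1.0)))
               for x in (0.0, 0.5, 1.0))
```

For example, `catt_z([10, 20, 30], [30, 20, 10], (0, 0.5, 1))` gives the additive-model statistic for made-up counts in which cases carry more risk alleles than controls.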
Categorical Inputs, Sensitivity Analysis, Optimization and Importance Tempering with tgp Version 2, an R Package for Treed Gaussian Process Models http://www.jstatsoft.org/v33/i06/paper Vol. 33, Issue 6, Feb 2010

Abstract:

This document describes the new features in version 2.x of the <b>tgp</b> package for R, implementing treed Gaussian process (GP) models. The topics covered include methods for dealing with categorical inputs and excluding inputs from the tree or GP part of the model; fully Bayesian sensitivity analysis for inputs/covariates; sequential optimization of black-box functions; and a new Monte Carlo method for inference in multi-modal posterior distributions that combines simulated tempering and importance sampling. These additions extend the functionality of <b>tgp</b> across all models in the hierarchy: from Bayesian linear models, to classification and regression trees (CART), to treed Gaussian processes with jumps to the limiting linear model. It is assumed that the reader is familiar with the baseline functionality of the package, outlined in the first vignette (Gramacy 2007).

Wed, 17 Feb 2010 08:00:00 GMT /v33/i06 Robert B. Gramacy, Matthew Alan Taddy
Measures of Analysis of Time Series (MATS): A MATLAB Toolkit for Computation of Multiple Measures on Time Series Data Bases http://www.jstatsoft.org/v33/i05/paper Vol. 33, Issue 5, Feb 2010

Abstract:

In many applications, such as physiology and finance, large time series data bases must be analyzed, requiring the computation of linear, nonlinear and other measures. Such measures have been developed and implemented in commercial and freeware software rather selectively and independently. The Measures of Analysis of Time Series (<b>MATS</b>) MATLAB toolkit is designed to handle an arbitrarily large set of scalar time series and compute a large variety of measures on them, allowing for the specification of varying measure parameters as well. The variety of options, with added facilities for visualization of the results, supports different settings of time series analysis, such as the detection of dynamics changes in long data records, resampling (surrogate or bootstrap) tests for independence and linearity with various test statistics, and discrimination power of different measures and for different combinations of their parameters. The basic features of <b>MATS</b> are presented and the implemented measures are briefly described. The usefulness of <b>MATS</b> is illustrated on some empirical examples along with screenshots.

Wed, 17 Feb 2010 08:00:00 GMT /v33/i05 Dimitris Kugiumtzis, Alkiviadis Tsimpiris
clues: An R Package for Nonparametric Clustering Based on Local Shrinking http://www.jstatsoft.org/v33/i04/paper Vol. 33, Issue 4, Feb 2010

Abstract:

Determining the optimal number of clusters appears to be a persistent and controversial issue in cluster analysis. Most existing R packages targeting clustering require the user to specify the number of clusters in advance. However, if this subjectively chosen number is far from optimal, clustering may produce seriously misleading results. In order to address this vexing problem, we develop the R package clues to automate and evaluate the selection of an optimal number of clusters, which is widely applicable in the field of clustering analysis. Package clues uses two main procedures, shrinking and partitioning, to estimate an optimal number of clusters by maximizing an index function, either the CH index or the Silhouette index, rather than relying on guessing a pre-specified number. Five agreement indices (Rand index, Hubert and Arabie’s adjusted Rand index, Morey and Agresti’s adjusted Rand index, Fowlkes and Mallows index and Jaccard index), which measure the degree of agreement between any two partitions, are also provided in clues. In addition to numerical evidence, clues also supplies a deeper insight into the partitioning process with trajectory plots.

Wed, 03 Feb 2010 08:00:00 GMT /v33/i04 Xiaogang Wang, Fang Chang, Weiliang Qiu, Ruben H. Zamar, Ross Lazarus
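
Two of the agreement indices listed in the clues abstract above, the Rand index and the Jaccard index, can be sketched by counting pairs of observations. This is plain Python rather than the clues package interface; the function names are illustrative, and the sketch assumes at least two observations and, for the Jaccard index, at least one pair placed together by one of the partitions.

```python
from itertools import combinations


def pair_counts(a, b):
    """Classify every pair of observations by whether partitions a and b
    (sequences of cluster labels) put the pair in the same cluster."""
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(a)), 2):
        same_a = a[i] == a[j]
        same_b = b[i] == b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def rand_index(a, b):
    """Share of pairs on which the two partitions agree."""
    both, only_a, only_b, neither = pair_counts(a, b)
    return (both + neither) / (both + only_a + only_b + neither)


def jaccard_index(a, b):
    """Like the Rand index, but ignoring pairs separated by both partitions."""
    both, only_a, only_b, _ = pair_counts(a, b)
    return both / (both + only_a + only_b)
```

Both functions depend only on which observations share a label, so relabeling the clusters of either partition leaves the result unchanged.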
Inverse Modelling, Sensitivity and Monte Carlo Analysis in R Using Package FME http://www.jstatsoft.org/v33/i03/paper Vol. 33, Issue 3, Feb 2010

Abstract:

Mathematical simulation models are commonly applied to analyze experimental or environmental data and eventually to acquire predictive capabilities. Typically these models depend on poorly defined, unmeasurable parameters that need to be given a value. Fitting a model to data, so-called inverse modelling, is often the sole way of finding reasonable values for these parameters. There are many challenges involved in inverse model applications, e.g., the existence of non-identifiable parameters, the estimation of parameter uncertainties and the quantification of the implications of these uncertainties on model predictions.

The R package FME is a modeling package designed to confront a mathematical model with data. It includes algorithms for sensitivity and Monte Carlo analysis, parameter identifiability, model fitting and provides a Markov-chain based method to estimate parameter confidence intervals. Although its main focus is on mathematical systems that consist of differential equations, FME can deal with other types of models. In this paper, FME is applied to a model describing the dynamics of the HIV virus.

Tue, 02 Feb 2010 08:00:00 GMT /v33/i03 Thomas Petzoldt, Karline Soetaert
MCMC Methods for Multi-Response Generalized Linear Mixed Models: The MCMCglmm R Package http://www.jstatsoft.org/v33/i02/paper Vol. 33, Issue 2, Feb 2010

Abstract:

Generalized linear mixed models provide a flexible framework for modeling a range of data, although with non-Gaussian response variables the likelihood cannot be obtained in closed form. Markov chain Monte Carlo methods solve this problem by sampling from a series of simpler conditional distributions that can be evaluated. The R package <b>MCMCglmm</b> implements such an algorithm for a range of model fitting problems. More than one response variable can be analyzed simultaneously, and these variables are allowed to follow Gaussian, Poisson, multi(bi)nomial, exponential, zero-inflated and censored distributions. A range of variance structures are permitted for the random effects, including interactions with categorical or continuous variables (i.e., random regression), and more complicated variance structures that arise through shared ancestry, either through a pedigree or through a phylogeny. Missing values are permitted in the response variable(s) and data can be known up to some level of measurement error as in meta-analysis. All simulation is done in C/C++ using the <b>CSparse</b> library for sparse linear systems.

Tue, 02 Feb 2010 08:00:00 GMT /v33/i02 Jarrod Hadfield
Regularization Paths for Generalized Linear Models via Coordinate Descent http://www.jstatsoft.org/v33/i01/paper Vol. 33, Issue 1, Feb 2010

Abstract:

We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include ℓ<sub>1</sub> (the lasso), ℓ<sub>2</sub> (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.

Tue, 02 Feb 2010 08:00:00 GMT /v33/i01 Trevor Hastie, Jerome H. Friedman, Rob Tibshirani
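
The cyclical coordinate descent named in the abstract above can be sketched for the linear regression case. This is a minimal pure-Python illustration and not the authors' implementation: it assumes the objective (1/2n)·||y − Xb||² + lam·(alpha·||b||₁ + (1 − alpha)/2·||b||₂²), no intercept, no standardization, a fixed number of sweeps instead of a convergence check, and a single value of lam rather than a full regularization path. The names `soft_threshold`, `elastic_net_cd`, `lam` and `alpha` are illustrative.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, lam, alpha=1.0, n_sweeps=200):
    """Cyclical coordinate descent for the elastic net (alpha=1: lasso,
    alpha=0: ridge). X is a list of rows; columns are assumed non-zero."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Inner product of column j with the partial residual, i.e. the
            # residual with coordinate j's own contribution added back.
            rho = (sum(X[i][j] * resid[i] for i in range(n)) / n
                   + col_sq[j] * beta[j])
            # Closed-form minimizer over coordinate j, others held fixed.
            new_bj = (soft_threshold(rho, lam * alpha)
                      / (col_sq[j] + lam * (1.0 - alpha)))
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual in sync instead of recomputing it.
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta
```

With `alpha=1.0` a coefficient whose partial-residual correlation falls below `lam` is set exactly to zero, which is how the lasso penalty produces sparse fits. Updating the residual in place keeps each coordinate update at O(n) cost.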
Contiguity-Constrained Hierarchical Agglomerative Clustering Using SAS http://www.jstatsoft.org/v33/c02/paper Vol. 33, Code Snippet 2, Feb 2010

Tue, 02 Feb 2010 08:00:00 GMT /v33/c02 Anthony Recchia
R Functions to Symbolically Compute the Central Moments of the Multivariate Normal Distribution http://www.jstatsoft.org/v33/c01/paper Vol. 33, Code Snippet 1, Feb 2010

Tue, 02 Feb 2010 08:00:00 GMT /v33/c01 Kem Phillips
Computation of Multivariate Normal and t Probabilities http://www.jstatsoft.org/v33/b02/paper Vol. 33, Book Review 2, Feb 2010

Computation of Multivariate Normal and t Probabilities
Alan Genz and Frank Bretz
Springer-Verlag, 2009
ISBN: 978-3-642-01688-2

Tue, 02 Feb 2010 08:00:00 GMT /v33/b02 Joakim Ekström
Hands-On Intermediate Econometrics Using R http://www.jstatsoft.org/v33/b01/paper Vol. 33, Book Review 1, Feb 2010

Hands-On Intermediate Econometrics Using R
Hrishikesh D. Vinod
World Scientific Publishing Co. Pte. Ltd., 2008
ISBN: 978-981-281-885-0

Tue, 02 Feb 2010 08:00:00 GMT /v33/b01 A. Talha Yalta