Abstract:

Object orientation provides a flexible framework for the implementation of the convolution of arbitrary distributions of real-valued random variables. We discuss an algorithm based on the fast Fourier transform. It applies directly to lattice-supported distributions. In the case of continuous distributions, an additional discretization to a linear lattice is necessary, and the resulting lattice-supported distributions are suitably smoothed after convolution.

We compare our algorithm, with respect to accuracy and speed, to other approaches aiming at a similar generality. In situations where the exact results are known, several checks confirm the high accuracy of the proposed algorithm, which is also illustrated for approximations of noncentral χ² distributions.

By means of object orientation, this default algorithm is overloaded by more specific algorithms where possible, in particular where explicit convolution formulae are available. Our focus is on the R package distr, which implements this approach, overloading the operator + for convolution; based on this convolution, we define a whole arithmetic of mathematical operations acting on distribution objects, comprising the operators +, -, *, /, and ^.
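The FFT-based convolution of two lattice-supported distributions can be sketched as follows. This is an illustrative Python sketch of the underlying idea, not the package's R implementation; the probability vectors are hypothetical example data on a common lattice of step 1:

```python
import numpy as np

# Probability masses of two lattice-supported distributions
# (hypothetical example data on the lattice 0, 1, 2, ...).
p = np.array([0.2, 0.5, 0.3])   # P(X = 0), P(X = 1), P(X = 2)
q = np.array([0.6, 0.4])        # P(Y = 0), P(Y = 1)

# Zero-pad both vectors to the length of the support of X + Y, then
# convolve via the FFT: the DFT of a convolution is the pointwise
# product of the DFTs.
n = len(p) + len(q) - 1
conv = np.fft.irfft(np.fft.rfft(p, n) * np.fft.rfft(q, n), n)

# conv now holds P(X + Y = 0), ..., P(X + Y = 3): 0.12, 0.38, 0.38, 0.12
```

For continuous distributions, as the abstract notes, the same step is applied after discretizing each density to such a lattice, followed by smoothing of the result.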

Abstract:

Today, many experiments in the behavioral sciences are conducted using a computer. While there is a broad choice of computer programs facilitating the process of conducting experiments, as well as programs for statistical analysis, there are relatively few programs facilitating the intermediate step of data aggregation. ART has been developed to fill this gap: it provides a computer program for data aggregation with a graphical user interface, so that aggregation can be done more easily and without any programming. All “rules” that are necessary to extract variables can be seen “at a glance”, which helps the user conduct even complex aggregations with several hundred variables and makes aggregation more resistant to errors. ART runs under Windows XP, Vista, 7, and 8, and it is free. Copies (executable and source code) are available at http://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/art.html.

Abstract:

We present an R package for the simulation of simple and complex survival data. It covers different situations, including recurrent events and multiple events. The main simulation routine allows the user to introduce an arbitrary number of distributions, each corresponding to a new event or episode, with its parameters, choosing between the Weibull (and exponential as a particular case), log-logistic and log-normal distributions.
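The core simulation idea, generating event times from a chosen parametric distribution and applying right censoring, can be sketched in Python. This is a generic illustration under assumed Weibull parameters and a fixed censoring time, not the package's actual interface:

```python
import numpy as np

rng = np.random.default_rng(42)

def sim_weibull_survival(n, shape, scale, cens_time):
    """Simulate right-censored Weibull survival data (illustrative sketch).

    Latent event times follow a Weibull(shape) distribution rescaled by
    `scale`; observation stops at the fixed censoring time `cens_time`.
    """
    t = scale * rng.weibull(shape, size=n)      # latent event times
    time = np.minimum(t, cens_time)             # observed follow-up time
    status = (t <= cens_time).astype(int)       # 1 = event observed, 0 = censored
    return time, status

time, status = sim_weibull_survival(n=1000, shape=1.5, scale=2.0, cens_time=3.0)
```

For recurrent or multiple events, as described in the abstract, this step would be repeated per episode, each time with its own distribution and parameters.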

Abstract:

Progress in molecular high-throughput techniques has led to the opportunity of comprehensively monitoring biomolecules in medical samples. In the era of personalized medicine, these data form the basis for the development of diagnostic, prognostic and predictive tests for cancer. Because of the high number of features that are measured simultaneously in a relatively low number of samples, supervised learning approaches are sensitive to overfitting and performance overestimation. Bioinformatic methods were developed to cope with these problems, including control of accuracy and precision. However, there is demand for easy-to-use software that integrates methods for classifier construction, performance assessment and development of diagnostic tests. To contribute to filling this gap, we developed a comprehensive R package for the development and validation of diagnostic tests from high-dimensional molecular data. An important focus of the package is a careful validation of the classification results. To this end, we implemented an extended version of the multiple random validation protocol, a previously introduced validation method. The package includes methods for continuous prediction scores. This is important in a clinical setting, because scores can be converted to probabilities and help to distinguish between clear-cut and borderline classification results. The functionality of the package is illustrated by the analysis of two cancer microarray data sets.
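The structure of a multiple random validation protocol, repeated random train/test splits with the classifier refit on each training split, can be sketched in Python. The nearest-centroid rule and the synthetic data below are chosen purely for illustration and are not part of the package:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_validation(X, y, n_splits=50, train_frac=0.7):
    """Repeated random train/test splits (sketch of a multiple random
    validation protocol). The classifier is a simple nearest-centroid
    rule producing a continuous prediction score."""
    n = len(y)
    accs = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        k = int(train_frac * n)
        tr, te = idx[:k], idx[k:]
        # class centroids estimated on the training split only
        c0 = X[tr][y[tr] == 0].mean(axis=0)
        c1 = X[tr][y[tr] == 1].mean(axis=0)
        # continuous score: difference of distances to the two centroids
        score = (np.linalg.norm(X[te] - c0, axis=1)
                 - np.linalg.norm(X[te] - c1, axis=1))
        accs.append(np.mean((score > 0).astype(int) == y[te]))
    return np.array(accs)

# synthetic two-class data: 120 samples, 20 features
X = np.vstack([rng.normal(0, 1, (60, 20)), rng.normal(1, 1, (60, 20))])
y = np.array([0] * 60 + [1] * 60)
accs = random_validation(X, y)
```

The distribution of `accs` over splits, rather than a single accuracy value, is what guards against the performance overestimation the abstract warns about.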

Abstract:

We apply the cyclic coordinate descent algorithm of Friedman, Hastie, and Tibshirani (2010) to the fitting of a conditional logistic regression model with lasso (ℓ1) and elastic net penalties. The sequential strong rules of Tibshirani, Bien, Hastie, Friedman, Taylor, Simon, and Tibshirani (2012) are also used in the algorithm, and we show that these offer a considerable speed-up over the standard coordinate descent algorithm with warm starts.

Once implemented, the algorithm is used in simulation studies to compare the variable selection and prediction performance of the conditional logistic regression model against that of its unconditional (standard) counterpart. We find that the conditional model performs admirably on datasets drawn from a suitable conditional distribution, outperforming its unconditional counterpart at variable selection. The conditional model is also fit to a small real-world dataset, demonstrating how we obtain regularization paths for the parameters of the model and how we apply cross-validation for this method, where natural unconditional prediction rules are hard to come by.
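The cyclic coordinate descent idea with soft-thresholding can be sketched for the plain lasso on a linear model; the paper applies the same scheme to the conditional logistic likelihood, which this simplified Python sketch does not attempt:

```python
import numpy as np

def soft_threshold(z, g):
    """Soft-thresholding operator S(z, g) = sign(z) * max(|z| - g, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the lasso on a linear model
    (illustrative sketch; objective (1/2n)||y - Xb||^2 + lam * ||b||_1).
    Each pass updates one coordinate at a time against the partial
    residual, the core step the cited algorithm accelerates with
    warm starts and strong rules."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta                      # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]        # restore coordinate j's contribution
            zj = X[:, j] @ r / n          # univariate least-squares coefficient
            beta[j] = soft_threshold(zj, lam) / (X[:, j] @ X[:, j] / n)
            r -= X[:, j] * beta[j]
    return beta
```

The sequential strong rules mentioned in the abstract act on top of such a loop, screening out coordinates whose gradient is far below the threshold so that most updates can be skipped.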

Abstract:

We describe an R package for determining the optimal price of an asset which is “perishable” in a certain sense, given the intensity of customer arrivals and a time-varying price sensitivity function which specifies the probability that a customer will purchase an asset offered at a given price at a given time. The package deals with the case of customers arriving in groups, with a probability distribution for the group size being specified. The methodology and software allow for both discrete and continuous pricing. The class of possible models for price sensitivity functions is very wide, and includes piecewise linear models. A mechanism for constructing piecewise linear price sensitivity functions is provided.
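A piecewise linear price sensitivity function of the kind mentioned can be sketched as a small constructor. The function name, knot values, and clipping behavior below are all illustrative assumptions, not the package's API:

```python
import numpy as np

def make_price_sensitivity(knot_prices, knot_probs):
    """Hypothetical constructor for a piecewise linear price sensitivity
    function: purchase probability as a function of price, linearly
    interpolated between knots and clipped to [0, 1]. Outside the knot
    range, np.interp holds the endpoint values constant."""
    knot_prices = np.asarray(knot_prices, dtype=float)
    knot_probs = np.asarray(knot_probs, dtype=float)

    def s(price):
        return np.clip(np.interp(price, knot_prices, knot_probs), 0.0, 1.0)

    return s

# Illustration: purchase probability falls linearly from 0.9 at price 10
# to 0.1 at price 50.
s = make_price_sensitivity([10, 50], [0.9, 0.1])
```

Such a function, combined with an arrival intensity and a group-size distribution, is the kind of input the pricing methodology described above consumes.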

Abstract:

Finite mixtures of von Mises-Fisher distributions allow model-based clustering methods to be applied to data of standardized length, i.e., where all data points lie on the unit sphere. The R package movMF contains functionality to draw samples from finite mixtures of von Mises-Fisher distributions and to fit these models using the expectation-maximization algorithm for maximum likelihood estimation. Special features include the possibility of using sparse matrix representations for the input data, different variants of the expectation-maximization algorithm, different methods for determining the concentration parameters in the M-step, and the option to impose constraints on the concentration parameters across the components.

In this paper, we describe the main fitting function of the package and illustrate its application. In addition, we compare the clustering performance of finite mixtures of von Mises-Fisher distributions to spherical k-means. We also discuss the resolution of several numerical issues which occur when estimating the concentration parameters and when determining the normalizing constant of the von Mises-Fisher distribution.
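The spherical k-means baseline used in the comparison can be sketched in Python. This is a generic sketch of the algorithm (with an assumed farthest-point initialization), not the implementation the paper benchmarks against:

```python
import numpy as np

def init_centers(X, k):
    """Farthest-point initialization (an illustrative choice): greedily
    pick points with minimal cosine similarity to centers chosen so far."""
    centers = [X[0]]
    for _ in range(1, k):
        sims = np.max(X @ np.stack(centers).T, axis=1)
        centers.append(X[np.argmin(sims)])
    return np.stack(centers)

def spherical_kmeans(X, k, n_iter=50):
    """Spherical k-means sketch: rows of X are projected onto the unit
    sphere, points are assigned by maximal cosine similarity, and each
    centroid is re-estimated as the normalized mean of its cluster."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    centers = init_centers(X, k)
    for _ in range(n_iter):
        labels = np.argmax(X @ centers.T, axis=1)
        for j in range(k):
            m = X[labels == j].sum(axis=0)
            norm = np.linalg.norm(m)
            if norm > 0:                  # skip empty clusters
                centers[j] = m / norm
    return labels, centers
```

Unlike the mixture model, this assigns points hard to the nearest direction; the von Mises-Fisher mixture generalizes it with soft assignments and per-component concentration parameters.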

Statistical Software (R, SAS, SPSS, and Minitab) for Blind Students and Practitioners, version varies

R, SAS, SPSS, and Minitab

Growth Curve Analysis and Visualization Using R

Daniel Mirman

Chapman & Hall/CRC, 2014

ISBN: 9781466584327

Analyzing Spatial Models of Choice and Judgment with R

David A. Armstrong III, Ryan Bakker, Royce Carroll, Christopher Hare, Keith T. Poole, Howard Rosenthal

CRC Press, 2014

ISBN: 978-1-4665-1715-8

Abstract:

The use of copula-based models in EDAs (estimation of distribution algorithms) is currently an active area of research. In this context, the copulaedas package for R provides a platform where EDAs based on copulas can be implemented and studied. The package offers complete implementations of various EDAs based on copulas and vines, a group of well-known optimization problems, and utility functions to study the performance of the algorithms. Newly developed EDAs can be easily integrated into the package by extending an S4 class with generic functions for their main components. This paper presents copulaedas by providing an overview of EDAs based on copulas, a description of the implementation of the package, and an illustration of its use through examples. The examples include running the EDAs defined in the package, implementing new algorithms, and performing an empirical study to compare the behavior of different algorithms on benchmark functions and a real-world problem.
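The sample / select / re-estimate loop common to all EDAs can be sketched with the simplest possible probabilistic model, an independent Gaussian per coordinate. The EDAs in the package replace this model with copula and vine models; everything below (function names, parameters, the sphere benchmark) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(3)

def gaussian_eda(f, dim, pop=100, elite=20, n_gen=60):
    """Minimal continuous EDA: sample a population from the current
    model, select the elite fraction by fitness, and re-estimate the
    model from the elites. Here the model is an independent Gaussian
    per coordinate; copula-based EDAs swap in a richer joint model."""
    mu, sigma = np.zeros(dim), np.full(dim, 5.0)
    for _ in range(n_gen):
        X = rng.normal(mu, sigma, size=(pop, dim))        # sample population
        order = np.argsort([f(x) for x in X])             # rank by fitness
        best = X[order[:elite]]                           # select elites
        mu = best.mean(axis=0)                            # re-estimate model
        sigma = best.std(axis=0) + 1e-12
    return mu

# Minimization of the sphere benchmark function.
sphere = lambda x: float(np.sum(x ** 2))
x_opt = gaussian_eda(sphere, dim=5)
```

The independence assumption is exactly what copulas relax: the copula captures the dependence structure among variables of the selected solutions while leaving the marginals free.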

Abstract:

Generalized linear mixed models (GLMMs) comprise a class of widely used statistical tools for data analysis with fixed and random effects when the response variable has a conditional distribution in the exponential family. GLMM analysis also has a close relationship with actuarial credibility theory. While readily available programs such as the GLIMMIX procedure in SAS and the lme4 package in R are powerful tools for using this class of models, these programs are not able to handle models with thousands of levels of fixed and random effects. By using sparse-matrix and other high-performance techniques, procedures such as HPMIXED in SAS can easily fit models with thousands of factor levels, but only for normally distributed response variables. In this paper, we present the %HPGLIMMIX SAS macro that fits GLMMs with large, sparsely populated design matrices using the doubly iterative linearization (pseudo-likelihood) method, in which the sparse-matrix-based HPMIXED is used for the inner iterations with the pseudo-variable constructed from the inverse-link function and the chosen model. Although the macro does not have the full functionality of the GLIMMIX procedure, time and memory savings can be large with the new macro. In applications in which design matrices contain many zeros and there are hundreds or thousands of factor levels, models can be fitted without exhausting computer memory, and a 90% or better reduction in running time can be observed. Examples with Poisson, binomial, and gamma conditional distributions are presented to demonstrate the usage and efficiency of this macro.
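The pseudo-variable construction at the heart of the linearization can be illustrated for a Poisson response with log link. This is a generic sketch of the pseudo-likelihood expansion, not the macro's SAS code:

```python
import numpy as np

def poisson_pseudo_response(y, eta):
    """Pseudo-variable for one linearization step of the pseudo-likelihood
    (doubly iterative) method. For a Poisson response with log link,
    mu = exp(eta) and d(eta)/d(mu) = 1/mu, so a first-order expansion of
    the link around the current linear predictor eta gives
        z = eta + (y - mu) / mu.
    A weighted linear mixed model is then fit to z in the inner
    iterations (the step HPMIXED performs in the macro)."""
    mu = np.exp(eta)
    return eta + (y - mu) / mu
```

The outer iterations recompute `eta` from the fitted mixed model and rebuild the pseudo-variable until the fit stabilizes, which is why the method is called doubly iterative.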
