Abstract:

This article surveys currently available implementations in R for continuous global optimization problems. A new R package, globalOptTests, is presented that provides a set of standard test problems for continuous global optimization based on C functions by Ali, Khompatraporn, and Zabinsky (2005). Forty-eight of the objective functions contained in the package are used in an empirical comparison of 18 R implementations in terms of the quality of the solutions found and speed.
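A minimal sketch of how the package's test problems can be queried and evaluated follows; the function names reflect the globalOptTests interface as documented, but argument details should be checked against the current manual, and the choice of the "Branin" problem and trial point is ours:

    library(globalOptTests)
    d <- getProblemDimen("Branin")              # dimension of the "Branin" test problem
    b <- getDefaultBounds("Branin")             # default box constraints for the search
    goTest(par = rep(1, d), fnName = "Branin")  # evaluate the objective at a trial point
    getGlobalOpt("Branin")                      # known global minimum value, for scoring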

Abstract:

Convex optimization now plays an essential role in many facets of statistics. We briefly survey some recent developments and describe some implementations of these methods in R. Applications of linear and quadratic programming are introduced, including quantile regression, the Huber M-estimator and various penalized regression methods. Applications to additively separable convex problems subject to linear equality and inequality constraints, such as nonparametric density estimation and maximum likelihood estimation of general nonparametric mixture models, are described, as are several cone programming problems. We focus throughout primarily on implementations in the R environment that rely on solution methods linked to R, such as MOSEK via the package Rmosek. R code is provided to illustrate several of these problems. Other applications are available in the R package REBayes, dealing with empirical Bayes estimation of nonparametric mixture models.
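As a hedged sketch of the kind of problem such solvers handle, the following sets up a tiny linear program in Rmosek's list-based problem format (the field names follow the Rmosek documentation; the toy LP is ours, and the exact format may vary across Rmosek versions):

    library(Rmosek)
    ## minimize x1 + 2*x2 subject to x1 + x2 >= 1, x >= 0
    prob <- list(sense = "min")
    prob$c  <- c(1, 2)
    prob$A  <- Matrix::Matrix(c(1, 1), nrow = 1, sparse = TRUE)
    prob$bc <- rbind(blc = 1, buc = Inf)                # constraint bounds
    prob$bx <- rbind(blx = c(0, 0), bux = c(Inf, Inf))  # variable bounds
    r <- mosek(prob)
    r$sol$itr$xx                                        # optimal point, approximately c(1, 0)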

Abstract:

Trust region algorithms are nonlinear optimization tools that tend to be stable and reliable when the objective function is non-concave, ill-conditioned, or exhibits regions that are nearly flat. Additionally, most freely available optimization routines do not exploit the sparsity of the Hessian when such sparsity exists, as in log posterior densities of Bayesian hierarchical models. The trustOptim package for the R programming language addresses both of these issues. It is intended to be robust, scalable and efficient for a large class of nonlinear optimization problems that are often encountered in statistics, such as finding posterior modes. The user must supply the objective function, gradient and Hessian. However, when used in conjunction with the sparseHessianFD package, the user does not need to supply the exact sparse Hessian, as long as the sparsity structure is known in advance. For models with a large number of parameters, but for which most of the cross-partial derivatives are zero (i.e., the Hessian is sparse), trustOptim offers dramatic performance improvements over existing options, in terms of computational time and memory footprint.
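The calling pattern can be sketched on a toy convex objective with a diagonal (hence sparse) Hessian; the objective and its derivatives below are ours, the method name "Sparse" follows the package documentation, and the names of the returned components should be checked with str():

    library(trustOptim)
    library(Matrix)
    n  <- 5
    fn <- function(x) sum((x - 1)^2)   # toy convex objective
    gr <- function(x) 2 * (x - 1)      # analytic gradient
    hs <- function(x) sparseMatrix(i = 1:n, j = 1:n, x = rep(2, n))  # sparse Hessian
    res <- trust.optim(rep(0, n), fn = fn, gr = gr, hs = hs, method = "Sparse")
    res$solution                       # should be near rep(1, 5)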

Abstract:

Over the last two decades, it has been observed that using the gradient vector as a search direction in large-scale optimization may lead to efficient algorithms. The effectiveness relies on choosing the step lengths according to novel ideas that are related to the spectrum of the underlying local Hessian rather than to the standard decrease in the objective function. A review of these so-called spectral projected gradient methods for convex constrained optimization is presented. To illustrate the performance of these low-cost schemes, an optimization problem on the set of positive definite matrices is described.
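In R, a method of this family is available as spg() in the BB package; a minimal sketch on a toy convex quadratic with a nonnegativity constraint (the problem and projection are ours):

    library(BB)
    fn   <- function(x) sum((x - c(2, -3))^2)  # convex quadratic objective
    proj <- function(x) pmax(x, 0)             # projection onto the nonnegative orthant
    res  <- spg(par = c(1, 1), fn = fn, project = proj)
    res$par                                    # expect approximately c(2, 0)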

Abstract:

R (R Core Team 2014) provides a powerful and flexible system for statistical computations. It has a default-install set of functionality that can be expanded by the use of several thousand add-in packages as well as user-written scripts. While R is itself a programming language, it has proven relatively easy to incorporate programs in other languages, particularly Fortran and C. Success, however, can lead to its own costs:

- Users face a confusion of choice when trying to select packages in approaching a problem.
- A need to maintain workable examples using early methods may mean some tools offered as a default may be dated.
- In an open-source project like R, how to decide what tools offer "best practice" choices, and how to implement such a policy, present a serious challenge.

We discuss these issues with reference to the tools in R for nonlinear parameter estimation (NLPE) and optimization, though for the present article "optimization" will be limited to function minimization of essentially smooth functions with at most bounds constraints on the parameters. We will abbreviate this class of problems as NLPE. We believe that the concepts proposed are transferable to other classes of problems seen by R users.
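As a concrete instance of this problem class, bounds-constrained minimization of a smooth function can be carried out with base R's optim(); a minimal sketch using the Rosenbrock function:

    fr <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2  # Rosenbrock function
    optim(c(-1.2, 1), fr, method = "L-BFGS-B",
          lower = c(-2, -2), upper = c(2, 2))                 # bounded minimization; minimum at c(1, 1)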

Abstract:

Numerical optimization is often an essential aspect of mathematical analysis in science, technology and other areas. The function optim() provides basic optimization capabilities and is among the most widely used functions in R. In addition, there are many packages and functions for solving particular types of optimization problems (the optimization task view on the Comprehensive R Archive Network provides a comprehensive list of the options available for solving optimization problems in R). In this special volume, four papers are presented which discuss areas of numerical optimization where significant developments have recently been made to enhance the capabilities in R. This introduction provides a brief overview of the volume.

Abstract:

The R package structSSI provides an accessible implementation of two recently developed simultaneous and selective inference techniques: the group Benjamini-Hochberg and hierarchical false discovery rate procedures. Unlike many multiple testing schemes, these methods specifically incorporate existing information about the grouped or hierarchical dependence between hypotheses under consideration while controlling the false discovery rate. Doing so increases statistical power and interpretability. Furthermore, these procedures provide novel approaches to the central problem of encoding complex dependency between hypotheses.

We briefly describe the group Benjamini-Hochberg and hierarchical false discovery rate procedures and then illustrate them using two examples, one a data set of ecological microbial abundances and the other a global temperature time series. For both procedures, we detail the steps associated with the analysis of these particular data sets, including establishing the dependence structures, performing the test, and interpreting the results. These steps are encapsulated by R functions, and we explain their applicability to general data sets.
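For orientation, the classical (ungrouped) Benjamini-Hochberg adjustment that these procedures extend is available in base R; the grouped and hierarchical variants themselves are provided by structSSI's own functions:

    ## classical BH baseline; structSSI extends this to grouped and hierarchical settings
    pvals <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205)
    p.adjust(pvals, method = "BH")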

Abstract:

The R package mixAK originally implemented routines primarily for Bayesian estimation of finite normal mixture models for possibly interval-censored data. The functionality of the package was considerably enhanced by implementing methods for Bayesian estimation of mixtures of multivariate generalized linear mixed models proposed in Komárek and Komárková (2013). Among other things, this allows for cluster analysis (classification) based on multivariate continuous and discrete longitudinal data that arise whenever multiple outcomes of a different nature are recorded in a longitudinal study. The package also allows for data-driven selection of the number of clusters, since methods for selecting the number of mixture components have been implemented. We give an overview of the model and the clustering methodology for multivariate continuous and discrete longitudinal data, and then present a step-by-step cluster analysis based jointly on three longitudinal variables of different types (continuous, count, dichotomous), which serves as a user manual for applying the package to similar problems.

Abstract:

In this paper we show how complete hierarchical multinomial marginal (HMM) models for categorical variables can be defined, estimated and tested using the R package hmmm. Models involving equality and inequality constraints on marginal parameters are needed to define hypotheses of conditional independence, stochastic dominance or notions of positive dependence, or when the parameters are allowed to depend on covariates. The hmmm package also allows for estimating and testing HMM models under equality and inequality constraints on marginal interactions.

Abstract:

A huge amount of effort is spent cleaning data to get it ready for analysis, but there has been little research on how to make data cleaning as easy and effective as possible. This paper tackles a small, but important, component of data cleaning: data tidying. Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table. This framework makes it easy to tidy messy datasets because only a small set of tools are needed to deal with a wide range of un-tidy datasets. This structure also makes it easier to develop tidy tools for data analysis, tools that both input and output tidy datasets. The advantages of a consistent data structure and matching tools are demonstrated with a case study free from mundane data manipulation chores.
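A minimal sketch of the central tidying operation, melting a messy wide table so that each row is one observation, using reshape2 (the toy table mirrors the kind of example discussed in the paper):

    library(reshape2)
    messy <- data.frame(name = c("John Smith", "Jane Doe"),
                        treatmenta = c(NA, 16),
                        treatmentb = c(2, 11))
    tidy <- melt(messy, id.vars = "name",
                 variable.name = "treatment", value.name = "result")
    tidy   # one row per (name, treatment) observation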

Abstract:

When testing for reduction of the mean value structure in linear mixed models, it is common to use an asymptotic χ2 test. Such tests can, however, be very poor for small and moderate sample sizes. The pbkrtest package implements two alternatives to such approximate χ2 tests: (1) a Kenward-Roger approximation for performing F tests for reduction of the mean structure and (2) parametric bootstrap methods for achieving the same goal. The implementation is focused on linear mixed models with independent residual errors. In addition to describing the methods and aspects of their implementation, the paper also contains several examples and a comparison of the various methods.
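A minimal sketch of both tests on the sleepstudy data shipped with lme4 (the model pair is ours; KRmodcomp() and PBmodcomp() are the package's documented entry points):

    library(lme4)
    library(pbkrtest)
    large <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = TRUE)
    small <- lmer(Reaction ~ 1 + (1 | Subject), sleepstudy, REML = TRUE)
    KRmodcomp(large, small)               # Kenward-Roger F test
    PBmodcomp(large, small, nsim = 200)   # parametric bootstrap test (small nsim for speed)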

Abstract:

dglars is a publicly available R package that implements the method proposed in Augugliaro, Mineo, and Wit (2013), developed to study the sparse structure of a generalized linear model. This method, called dgLARS, is based on a differential geometric extension of the least angle regression method proposed in Efron, Hastie, Johnstone, and Tibshirani (2004). The core of the dglars package consists of two algorithms implemented in Fortran 90 to efficiently compute the solution curve: a predictor-corrector algorithm, proposed in Augugliaro et al. (2013), and a cyclic coordinate descent algorithm, proposed in Augugliaro, Mineo, and Wit (2012). Both algorithms are implemented to allow direct comparison; as shown here, the cyclic coordinate descent algorithm is significantly faster than the predictor-corrector algorithm.
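A sketch of how the two algorithms might be selected on simulated logistic data follows; the algorithm option of the control list reflects our reading of the package documentation and should be verified against the current manual:

    library(dglars)
    set.seed(1)
    n <- 50; p <- 10
    X <- matrix(rnorm(n * p), n, p)
    y <- rbinom(n, 1, plogis(X[, 1] - X[, 2]))
    ## "pc" = predictor-corrector, "ccd" = cyclic coordinate descent
    fit_pc  <- dglars(y ~ X, family = "binomial", control = list(algorithm = "pc"))
    fit_ccd <- dglars(y ~ X, family = "binomial", control = list(algorithm = "ccd"))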

Abstract:

Equating is a family of statistical models and methods used to adjust scores on two or more versions of a test, so that scores from the different versions may be used interchangeably. In this paper we present the R package SNSequate, which implements both standard and nonstandard statistical models and methods for test equating. The package construction was motivated by the need for modular, simple, yet comprehensive and general software that carries out both traditional and new equating methods. SNSequate currently implements the traditional mean, linear and equipercentile equating methods, as well as the mean-mean, mean-sigma, Haebara and Stocking-Lord item response theory linking methods. It also supports more recent methods such as local equating, kernel equating, and item response theory parameter linking methods based on asymmetric item characteristic functions. Practical examples are given to illustrate the capabilities of the software. A list of other programs for equating is presented, highlighting the main differences between them. Future directions for the package are also discussed.
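To fix ideas, the traditional mean and linear equating transformations are closed-form score maps; the following base R illustration of the formulas themselves (not the SNSequate interface) uses simulated scores:

    set.seed(7)
    x <- rnorm(200, mean = 25, sd = 5)   # scores on form X
    y <- rnorm(200, mean = 27, sd = 6)   # scores on form Y
    mean_eq   <- function(s) s - mean(x) + mean(y)                      # mean equating
    linear_eq <- function(s) mean(y) + (sd(y) / sd(x)) * (s - mean(x))  # linear equating
    linear_eq(30)   # form-Y equivalent of an X score of 30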

Abstract:

The R package phtt provides estimation procedures for panel data with large dimensions n and T, and general forms of unobservable heterogeneous effects. In particular, the estimation procedures are those of Bai (2009) and Kneip, Sickles, and Song (2012), which complement one another very well: both models assume the unobservable heterogeneous effects to have a factor structure. Kneip et al. (2012) consider the case in which the time-varying common factors have relatively smooth patterns, including strongly positively autocorrelated stationary as well as non-stationary factors, whereas the method of Bai (2009) focuses on stochastically bounded factors such as ARMA processes. Additionally, the phtt package provides a wide range of dimensionality criteria in order to estimate the number of unobserved factors simultaneously with the remaining model parameters.
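A sketch of the calling pattern with simulated T x n data matrices follows; the formula interface on matrices and the estimator name KSS() follow the package documentation, but details should be checked:

    library(phtt)
    T <- 30; n <- 40
    X <- matrix(rnorm(T * n), T, n)   # regressor, a T x n matrix
    f <- cumsum(rnorm(T))             # one non-stationary common factor
    lambda <- rnorm(n)                # individual factor loadings
    Y <- 0.5 * X + outer(f, lambda) + matrix(rnorm(T * n, sd = 0.5), T, n)
    fit <- KSS(Y ~ X)                 # Kneip-Sickles-Song estimator; factor dimension chosen internally
    summary(fit)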

Abstract:

In this paper, we describe the R package mediation for conducting causal mediation analysis in applied empirical research. In many scientific disciplines, the goal of researchers is not only to estimate the causal effects of a treatment but also to understand the process by which the treatment causally affects the outcome. Causal mediation analysis is frequently used to assess potential causal mechanisms. The mediation package implements a comprehensive suite of statistical tools for conducting such an analysis. The package is organized into two distinct approaches. Using the model-based approach, researchers can estimate causal mediation effects and conduct sensitivity analysis under the standard research design. Furthermore, the design-based approach provides several analysis tools that are applicable under different experimental designs; this approach requires weaker assumptions than the model-based approach. We also implement a statistical method for dealing with multiple (causally dependent) mediators, which are often encountered in practice. Finally, the package offers a methodology for assessing causal mediation in the presence of treatment noncompliance, a common problem in randomized trials.
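A minimal sketch of the model-based approach on simulated data follows (the data and variable names are ours; mediate() with its treat and mediator arguments is the package's documented interface):

    library(mediation)
    set.seed(2)
    n <- 200
    dat <- data.frame(treat = rbinom(n, 1, 0.5))
    dat$med <- 0.5 * dat$treat + rnorm(n)                  # mediator depends on treatment
    dat$out <- 0.4 * dat$med + 0.3 * dat$treat + rnorm(n)  # outcome depends on both
    model.m <- lm(med ~ treat, data = dat)
    model.y <- lm(out ~ treat + med, data = dat)
    fit <- mediate(model.m, model.y, treat = "treat", mediator = "med", sims = 200)
    summary(fit)   # ACME, ADE, and total effect estimates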
