Journal of Statistical Software

Local Influence Diagnostics for Nonlinear Mixed Models under the Case-Weight Perturbation Scheme in SAS

2024-04-15T14:39:25+00:00

The nonlinear mixed model is a popular tool for analyzing continuous longitudinal data. This paper is primarily concerned with gauging the sensitivity of nonlinear mixed models to influential observations through local influence, which assesses the impact of small perturbations of the likelihood function. Unlike case deletion, local influence requires the model to be fitted only once, which makes it much more appealing computationally. The methodology is illustrated with two datasets, establishing that the local influence diagnostic can easily be applied to nonlinear mixed models through the NLMIXED procedure in the SAS software as a tool to identify influential individuals.

singleRcapture: An R Package for Single-Source Capture-Recapture Models

2025-02-13T11:40:05+00:00

Population size estimation is a major challenge in official statistics, social sciences, and natural sciences. The problem can be tackled by applying capture-recapture methods, which vary depending on the number of sources used, particularly on whether a single or multiple sources are involved. This paper focuses on the first group of methods and introduces a novel R package: singleRcapture. The package implements state-of-the-art single-source capture-recapture (SSCR) models (e.g., zero-truncated one-inflated regression) together with new developments proposed by the authors, and provides a user-friendly application programming interface (API). This self-contained package can be used to produce point estimates and their variances, and it implements several bootstrap variance estimators as well as diagnostics to assess quality and conduct sensitivity analysis. It is intended for users interested in estimating the size of populations, particularly those that are difficult to reach or measure, for which information is available only from one source and dual/multiple system estimation is not applicable. Our package serves to bridge a significant gap, as the SSCR methods are either not available at all or are only partially implemented in existing R packages and other open-source software.
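
As an illustration of the single-source idea, the following is a minimal Python sketch of the simplest model in this family: an intercept-only zero-truncated Poisson model combined with a Horvitz-Thompson-type population size estimate. It is not the singleRcapture API; the function names `fit_zero_truncated_poisson` and `estimate_population_size` and the example counts are hypothetical, and the fixed-point iteration is just one convenient way to solve the likelihood equation.

```python
import math


def fit_zero_truncated_poisson(counts, tol=1e-10, max_iter=1000):
    """Maximum likelihood estimate of lambda for a zero-truncated Poisson sample.

    The likelihood equation is lambda / (1 - exp(-lambda)) = sample mean,
    solved here by the fixed-point iteration lambda <- mean * (1 - exp(-lambda)).
    """
    if any(c < 1 for c in counts):
        raise ValueError("zero-truncated data must contain only counts >= 1")
    mean = sum(counts) / len(counts)
    if mean <= 1.0:
        raise ValueError("sample mean must exceed 1 for a positive lambda")
    lam = mean  # start above the solution; the iteration decreases towards it
    for _ in range(max_iter):
        new_lam = mean * (1.0 - math.exp(-lam))
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


def estimate_population_size(counts):
    """Horvitz-Thompson-type estimate: observed units / P(unit observed at least once)."""
    lam = fit_zero_truncated_poisson(counts)
    prob_observed = 1.0 - math.exp(-lam)
    return len(counts) / prob_observed, lam


# Hypothetical data: number of times each observed unit appears in the single source.
example_counts = [1] * 60 + [2] * 30 + [3] * 10
n_hat, lam_hat = estimate_population_size(example_counts)
print(f"lambda = {lam_hat:.3f}, estimated population size = {n_hat:.1f}")
```

Units that were never captured do not appear in the data, so the model is fitted to counts of at least one and the observed sample size is then scaled up by the estimated probability of being captured at least once. Regression covariates, one-inflation, and variance estimation, all mentioned in the abstract, are deliberately left out of this sketch.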

sdmTMB: An R Package for Fast, Flexible, and User-Friendly Generalized Linear Mixed Effects Models with Spatial and Spatiotemporal Random Fields

2024-05-24T10:56:49+00:00

Geostatistical spatial or spatiotemporal data are common across scientific fields. However, appropriate models to analyze these data, such as generalized linear mixed effects models (GLMMs) with Gaussian Markov random fields (GMRFs), are computationally intensive and challenging for many users to implement. Here, we introduce the R package sdmTMB, which extends the flexible interface familiar to users of lme4, glmmTMB, and mgcv to include spatial and spatiotemporal latent GMRFs using the stochastic partial differential equation (SPDE) approach. SPDE matrices are constructed with fmesher, and estimation is conducted via maximum marginal likelihood with TMB or via Bayesian inference with tmbstan and rstan. We describe the model and explore case studies that illustrate sdmTMB's flexibility in implementing penalized smoothers, non-stationary processes (time-varying and spatially varying coefficients), hurdle models, cross-validation, and anisotropy (directionally dependent spatial correlation). Finally, we compare the functionality, speed, and interfaces of related software, demonstrating that sdmTMB can be an order of magnitude faster than R-INLA. We hope sdmTMB will help open this useful class of models to more geostatistical analysts.

MixtureMissing: An R Package for Robust and Flexible Model-Based Clustering with Incomplete Data

2024-07-15T15:46:33+00:00

The R package MixtureMissing performs model-based clustering on data sets with values missing at random, aiming to identify homogeneous groups of observations. In model-based clustering, the data within each cluster follow a specific distribution. In the package, 13 distributions are available, including the contaminated normal distribution, the generalized hyperbolic distribution (GHD), and 11 special or limiting cases of GHD. Notably, eight out of these 11 cases have not been formulated at the time of writing. Given a list of candidate distributions, the package can recommend the optimal distribution to employ based on a specified information criterion. In this paper, the methodological foundations and computational aspects of the package are discussed. Furthermore, important features of model fitting, model summary, and available visualization tools are thoroughly illustrated using real data sets.

watson: An R Package for Fitting Mixtures of Watson Distributions

2024-11-20T11:23:26+00:00

In this paper we present and showcase the R package watson, which provides a computational framework for fitting and random sampling of the Watson distribution on a p-dimensional sphere. We first introduce the random sampling scheme of the package, which offers two sampling algorithms that are based on the results of Sablica, Hornik, and Leydold (2025). Moreover, the package offers a tool that combines these two methods: based on the selected parameters, it approximates the relative sampling speed of both methods and picks the faster one. In addition, we describe the main fitting function for mixtures of Watson distributions, which uses the expectation-maximization (EM) algorithm. Special features are the possibility to use multiple variants of the E-step and M-step, sparse matrices for the data representation, and a control parameter that dynamically eliminates small clusters whose overall contribution is smaller than this parameter. We also discuss the numerical issues of the whole fitting procedure and describe how they are handled in the package. Finally, we demonstrate the package on multiple examples involving a simulation study under misspecification, estimation for the New Zealand earthquake data, and depth image clustering.

dynamite: An R Package for Dynamic Multivariate Panel Models

2024-06-06T08:18:23+00:00

dynamite is an R package for Bayesian inference of intensive panel (time series) data comprising repeated measurements of multiple individuals over time. The package supports joint modeling of multiple response variables, time-varying and time-invariant effects, a wide range of discrete and continuous distributions, group-specific random effects, latent factors, and customization of prior distributions of the model parameters. Models in the package are defined via a user-friendly formula interface, and estimation of the posterior distribution of the model parameters takes advantage of state-of-the-art Markov chain Monte Carlo methods. The package enables efficient computation of both individual-level and aggregated predictions and offers a comprehensive suite of tools for visualization and model diagnostics.

dbnR: Gaussian Dynamic Bayesian Network Learning and Inference in R

2023-08-29T12:31:49+00:00

Dynamic Bayesian networks are a type of multivariate time series forecasting model that offers a degree of interpretability thanks to its graphical representation. They have been reported extensively in the literature in a variety of areas, but their application has usually involved an ad hoc implementation or an adaptation of existing Bayesian network software to the dynamic case. In this paper, we present dbnR, an R package that encapsulates the whole process of learning the model and parameters from data and performing inference. The package provides three different structure learning algorithms, exact and approximate inference, and a visualization tool that allows inspection of the graphical structure of the networks. The aim of dbnR is to provide a tool that enables fast deployment of dynamic Bayesian network models and to make them readily available as general-purpose forecasting models.

skewlmm: An R Package for Fitting Skewed and Heavy-Tailed Linear Mixed Models

2024-04-22T12:42:40+00:00

Longitudinal data are commonly analyzed using linear mixed models, which, for mathematical convenience, usually assume that both the random effects and the errors follow normal distributions. However, these restrictive assumptions may result in a lack of robustness against departures from the normal distribution and in invalid statistical inferences. Schumacher, Lachos, and Matos (2021) developed a flexible extension of linear mixed models based on the scale mixture of skew-normal class of distributions from a frequentist point of view, accommodating skewness and heavy tails. Their robust model formulation also accounts for possible within-subject serial dependence through several useful dependence structures. This paper presents the R package skewlmm, which implements the method proposed by Schumacher et al. (2021) and provides a user-friendly tool to fit robust linear mixed models to longitudinal data, including model-fit tests, residual analyses, and plot functions to support model selection and evaluation. Two data sets and a synthetic example are analyzed to illustrate the methodology and software implementation.

SMLE: An R Package for Joint Feature Screening in Ultrahigh-Dimensional GLMs

2023-02-10T17:26:16+00:00

Sparsity-restricted maximum likelihood estimation (SMLE) has received considerable attention for feature screening in ultrahigh-dimensional regression. SMLE is a computationally convenient method that naturally incorporates the joint effects among features in the screening process. We develop a publicly available R package SMLE, which provides a user-friendly environment to carry out the SMLE method in generalized linear models. In particular, the package includes functions to conduct SMLE-screening and the related post-screening selection with popular selection criteria such as AIC and (extended) BIC. The package gives users flexibility in controlling a series of screening parameters and accommodates both numerical and categorical feature input. The usage of SMLE is illustrated with extensive numerical examples, which demonstrate the promising performance of the package.

counterfactuals: An R Package for Counterfactual Explanation Methods

2023-06-29T15:13:38+00:00

Counterfactual explanation methods provide information on how feature values of individual observations must be changed to obtain a desired prediction. Despite the increasing number of methods proposed in research, only a few implementations exist, and their interfaces and requirements vary widely. In this work, we introduce the counterfactuals R package, which provides a modular and unified R6-based interface for counterfactual explanation methods. We implement three existing counterfactual explanation methods and propose some optional methodological extensions to generalize these methods to different scenarios and to make them more comparable. We explain the structure and workflow of the package using real use cases and show how to integrate additional counterfactual explanation methods into the package. In addition, we compare the implemented methods for a variety of models and datasets with regard to the quality of their counterfactual explanations and their runtime behavior.

TrendLSW: Trend and Spectral Estimation of Nonstationary Time Series in R

2024-05-08T20:10:45+00:00

The TrendLSW R package has been developed to provide users with a suite of wavelet-based techniques to analyze the statistical properties of nonstationary time series. The key components of the package are (a) two approaches for the estimation of the evolutionary wavelet spectrum in the presence of trend; (b) wavelet-based trend estimation in the presence of locally stationary wavelet errors via both linear and nonlinear wavelet thresholding; and (c) the calculation of associated pointwise confidence intervals. Lastly, the package directly implements boundary handling methods that allow the techniques to be applied to data of arbitrary length, not just dyadic length as is common for wavelet-based methods, so that no preprocessing of the data is necessary. The key functionality of the package is demonstrated through two data examples, arising from biology and activity monitoring.

equateMultiple: An R Package to Equate Multiple Forms

2024-10-18T08:42:36+00:00

Item response theory (IRT) provides a framework for modeling the responses given to a test or questionnaire, which are assumed to depend on an underlying latent variable and on some item parameters. Due to identifiability issues, when the parameters are estimated separately on different datasets, the estimates of the item parameters and the predicted values of the latent variable are not directly comparable. Equating is a statistical procedure that can be used to convert these values to a common metric and to obtain comparable test scores. The R package equateMultiple implements methods to link the parameters estimated on many different datasets. After briefly reviewing the IRT models and the equating methods, this article illustrates the use of the package.
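
To make the idea of converting estimates to a common metric concrete, here is a minimal Python sketch of the classical mean-sigma method for linking two forms through their common items under a two-parameter logistic model. It is shown purely for illustration: it handles only two forms, it is not the equateMultiple API, and the abstract does not state whether the package offers this particular method. The function names and the item parameter values are hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma_coefficients(b_from, b_to):
    """Linking coefficients (A, B) such that b_to is approximately A * b_from + B.

    b_from and b_to are difficulty estimates of the same common items,
    obtained from two separate calibrations.
    """
    A = pstdev(b_to) / pstdev(b_from)
    B = mean(b_to) - A * mean(b_from)
    return A, B


def to_common_scale(a, b, A, B):
    """Transform discriminations and difficulties onto the target scale."""
    a_new = [a_i / A for a_i in a]
    b_new = [A * b_i + B for b_i in b]
    return a_new, b_new


# Hypothetical common-item difficulties from two separate calibrations.
b_form_x = [-1.0, 0.0, 1.0]
b_form_y = [-0.7, 0.5, 1.7]
A_hat, B_hat = mean_sigma_coefficients(b_form_x, b_form_y)

# Hypothetical discriminations for the same items on form X.
a_form_x = [0.8, 1.0, 1.2]
a_linked, b_linked = to_common_scale(a_form_x, b_form_x, A_hat, B_hat)
print(f"A = {A_hat:.3f}, B = {B_hat:.3f}")
```

The same coefficients convert latent variable values as A times the original value plus B, so scores from the two calibrations become comparable. With more than two forms, linking coefficients have to be estimated for every form at once, which is the setting the package addresses.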