Journal of Statistical Software

anomaly: Detection of Anomalous Structure in Time Series Data

2022-03-04T10:24:22+00:00

One of the contemporary challenges in anomaly detection is the ability to detect, and differentiate between, both point and collective anomalies within a data sequence or time series. The anomaly package has been developed to provide users with a choice of anomaly detection methods and, in particular, provides an implementation of the recently proposed collective and point anomaly family of anomaly detection algorithms. This article describes the methods implemented whilst also highlighting their application to simulated data as well as real data examples contained in the package.

An Extendable Python Implementation of Robust Optimization Monte Carlo

2023-07-03T15:36:25+00:00

Performing inference in statistical models with an intractable likelihood is challenging, and most likelihood-free inference (LFI) methods consequently encounter accuracy and efficiency limitations. In this paper, we present the implementation of the LFI method robust optimization Monte Carlo (ROMC) in the Python package elfi. ROMC is a novel and efficient (highly parallelizable) LFI framework that provides accurate weighted samples from the posterior. Our implementation can be used in two ways. First, a scientist may use it as an out-of-the-box LFI algorithm; we provide an easy-to-use API harmonized with the principles of elfi, enabling effortless comparisons with the rest of the methods included in the package. Second, we have carefully split ROMC into isolated components to support extensibility: a researcher may experiment with novel method(s) for solving part(s) of ROMC without reimplementing everything from scratch. In both scenarios, the ROMC parts can run in a fully parallelized manner, exploiting all CPU cores. We also provide helpful functionalities for (i) inspecting the inference process and (ii) evaluating the obtained samples. Finally, we test the robustness of our implementation on some typical LFI examples.
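
To make the idea of "weighted samples from the posterior" concrete, the following is a minimal, standard-library-only sketch of an ROMC-style procedure on a one-dimensional toy problem. It does not use the elfi API: the simulator, the ternary-search optimizer, the fixed-width acceptance region, and every function name below are illustrative assumptions rather than the package's implementation. The sketch fixes the random seed of the simulator, minimizes the resulting deterministic distance, builds a region around each optimum, and weights uniform draws from that region by prior density over proposal density.

```python
import random


def simulator(theta, seed):
    """Toy simulator (assumption): fixing the seed makes the output deterministic."""
    rng = random.Random(seed)
    return theta + rng.gauss(0.0, 1.0)


def distance(theta, seed, y_obs):
    """Distance between simulated and observed data for one fixed seed."""
    return abs(simulator(theta, seed) - y_obs)


def minimize_1d(f, lo, hi, iters=60):
    """Ternary search; adequate here because the toy distance is unimodal."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def romc_toy(y_obs, n_seeds=300, n_per_region=5, eps=0.1, prior=(-10.0, 10.0)):
    """Return (samples, weights) under a uniform prior on the interval `prior`."""
    sampler = random.Random(12345)
    prior_density = 1.0 / (prior[1] - prior[0])
    samples, weights = [], []
    for seed in range(n_seeds):
        def d(theta, seed=seed):
            return distance(theta, seed, y_obs)

        # Step 1: one deterministic optimization problem per seed.
        theta_star = minimize_1d(d, prior[0], prior[1])
        if d(theta_star) > eps:
            continue  # no acceptable solution for this seed

        # Step 2: region around the optimum. The half-width eps is a
        # simplification that works because the toy distance has unit slope.
        lo, hi = theta_star - eps, theta_star + eps
        proposal_density = 1.0 / (hi - lo)

        # Step 3: uniform draws from the region, weighted by prior / proposal
        # and zeroed outside the acceptance set.
        for _ in range(n_per_region):
            theta = sampler.uniform(lo, hi)
            accepted = prior[0] <= theta <= prior[1] and d(theta) <= eps
            samples.append(theta)
            weights.append(prior_density / proposal_density if accepted else 0.0)
    return samples, weights


def weighted_mean(samples, weights):
    """Self-normalized importance-sampling estimate of the posterior mean."""
    total = sum(weights)
    return sum(s * w for s, w in zip(samples, weights)) / total


if __name__ == "__main__":
    samples, weights = romc_toy(y_obs=2.0)
    print("approximate posterior mean:", weighted_mean(samples, weights))
```

Because each seed defines an independent optimization problem, the loop over seeds is the part that could be distributed across CPU cores.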

makemyprior: Intuitive Construction of Joint Priors for Variance Parameters in R

2023-09-11T11:02:14+00:00

Priors allow us to robustify inference and to incorporate expert knowledge in Bayesian hierarchical models. This is particularly important when there are random effects that are hard to identify based on observed data. The challenge lies in understanding and controlling the joint influence of the priors for the variance parameters, and makemyprior is an R package that guides the formulation of joint prior distributions for variances. A joint prior distribution is constructed based on a hierarchical decomposition of the total variance in the model along a tree, and takes the entire model structure into account. Users input their prior beliefs or express ignorance at each level of the tree. Prior beliefs can be general ideas about reasonable ranges of variance values and need not be detailed expert knowledge. The constructed priors lead to robust inference and guarantee proper posteriors. A graphical user interface facilitates construction and assessment of different choices of priors through visualization of the tree and joint prior. The package aims to expand the toolbox of applied researchers and make priors an active component in their Bayesian workflow.

fairadapt: Causal Reasoning for Fair Data Preprocessing

2022-04-28T14:28:26+00:00

Machine learning algorithms are useful for various prediction tasks, but they can also learn to discriminate based on gender, race or other sensitive attributes. This realization gave rise to the field of fair machine learning, which aims to recognize, quantify and ultimately mitigate such algorithmic bias. This manuscript describes the R package fairadapt, which implements a causal inference preprocessing method. By making use of a causal graphical model alongside the observed data, the method can be used to address hypothetical questions of the form "What would my salary have been, had I been of a different gender/race?". Such individual-level counterfactual reasoning can help eliminate discrimination and help justify fair decisions. We also discuss appropriate relaxations which assume that certain causal pathways from the sensitive attribute to the outcome are not discriminatory.

bayesnec: An R Package for Concentration-Response Modeling and Estimation of Toxicity Metrics

2023-07-03T16:36:45+00:00

The bayesnec package has been developed for R to fit concentration (dose)-response curves (CR) to toxicity data for the purpose of deriving no-effect-concentration (NEC), no-significant-effect-concentration (NSEC), and effect-concentration (of specified percentage "x", ECx) thresholds from non-linear models fitted using Bayesian Hamiltonian Monte Carlo (HMC) via R packages brms and rstan or cmdstanr. In bayesnec it is possible to fit a single model, custom model-set, specific model-set or all of the available models. When multiple models are specified, the bnec() function returns a model-weighted average estimate of predicted posterior values. A range of support functions and methods is also included for working with the returned single-model or multi-model objects; these allow extraction of raw or model-averaged predicted NEC, NSEC and ECx values, and interrogation of the fitted model or model-set. By combining Bayesian methods with model averaging, bayesnec provides a single estimate of toxicity and associated uncertainty that can be directly integrated into risk assessment frameworks.

sparsegl: An R Package for Estimating Sparse Group Lasso

2022-11-29T00:50:51+00:00

The sparse group lasso is a high-dimensional regression technique that is useful for problems whose predictors have a naturally grouped structure and where sparsity is encouraged at both the group and individual predictor level. In this paper we discuss a new R package for computing such regularized models. The intention is to provide highly optimized solution routines enabling analysis of very large datasets, especially in the context of sparse design matrices.
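
As an illustration of how sparsity is encouraged at both levels, the sketch below evaluates one common form of the sparse group lasso penalty, lam * [(1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2 + alpha * ||beta||_1], where p_g is the size of group g. The function name, argument names, and the sqrt(p_g) group weights are assumptions made for this example, and the code is plain Python rather than the package's optimized routines.

```python
import math


def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """Evaluate an (assumed) standard sparse group lasso penalty.

    beta   : list of coefficients
    groups : list of group labels, one per coefficient
    lam    : overall regularization strength (>= 0)
    alpha  : mixing weight in [0, 1]; 1 gives the lasso, 0 the group lasso
    """
    # Individual-level sparsity: L1 norm of all coefficients.
    l1_term = sum(abs(b) for b in beta)

    # Collect coefficients by group label.
    by_group = {}
    for b, g in zip(beta, groups):
        by_group.setdefault(g, []).append(b)

    # Group-level sparsity: size-weighted sum of (unsquared) L2 norms.
    group_term = sum(
        math.sqrt(len(coefs)) * math.sqrt(sum(c * c for c in coefs))
        for coefs in by_group.values()
    )

    return lam * ((1.0 - alpha) * group_term + alpha * l1_term)


if __name__ == "__main__":
    beta = [3.0, 4.0, 0.0, 0.0, 1.0]
    groups = [0, 0, 1, 1, 2]
    print(sparse_group_lasso_penalty(beta, groups, lam=0.5, alpha=0.25))
```

The unsquared group norms are what allow an entire group to be set to zero, while the L1 term can additionally zero out individual coefficients inside a group that remains active.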

cubble: An R Package for Organizing and Wrangling Multivariate Spatio-Temporal Data

2022-06-27T14:05:38+00:00

Multivariate spatio-temporal data refers to multiple measurements taken across space and time. For many analyses, spatial and time components can be separately studied: for example, to explore the temporal trend of one variable for a single spatial location, or to model the spatial distribution of one variable at a given time. However, for some studies, it is important to analyze different aspects of the spatio-temporal data simultaneously, for instance, temporal trends of multiple variables across locations. In order to facilitate the study of different portions or combinations of spatio-temporal data, we introduce a new class, cubble, with a suite of functions enabling easy slicing and dicing on different spatio-temporal components. The proposed cubble class ensures that all the components of the data are easy to access and manipulate while providing flexibility for data analysis. In addition, the cubble package facilitates visual and numerical explorations of the data while easing data wrangling and modelling. The cubble class and the tools implemented in the package are illustrated with examples from climate data analysis.

Weighted scoringRules: Emphasizing Particular Outcomes When Evaluating Probabilistic Forecasts

2023-05-15T16:06:26+00:00

When predicting future events, it is common to issue forecasts that are probabilistic, in the form of probability distributions over the range of possible outcomes. Such forecasts can be evaluated using proper scoring rules. Proper scoring rules condense forecast performance into a single numerical value, allowing competing forecasters to be ranked and compared. To facilitate the use of scoring rules in practical applications, the scoringRules package in R provides popular scoring rules for a wide range of forecast distributions. This paper discusses an extension to the scoringRules package that additionally permits the implementation of popular weighted scoring rules. Weighted scoring rules allow particular outcomes to be targeted during forecast evaluation, recognizing that certain outcomes are often of more interest than others when assessing forecast quality. This introduces the potential for very flexible, user-oriented evaluation of probabilistic forecasts. We discuss the theory underlying weighted scoring rules, and describe how they can readily be implemented in practice using scoringRules. Functionality is available for weighted versions of several popular scoring rules, including the logarithmic score, the continuous ranked probability score, and the energy score. Two case studies are presented to demonstrate this, whereby weighted scoring rules are applied to univariate and multivariate probabilistic forecasts in the fields of meteorology and economics.
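
To illustrate how a weighted scoring rule can target particular outcomes, the sketch below computes a threshold-weighted continuous ranked probability score for an ensemble forecast. It assumes the indicator weight w(z) = 1{z > t}, for which the score can be obtained by applying the transformation v(z) = max(z, t) to both the ensemble members and the observation before evaluating the usual sample-based CRPS. The function names are hypothetical and the code is plain Python, not the scoringRules interface.

```python
def crps_ensemble(ensemble, y):
    """Sample-based CRPS: E|X - y| - 0.5 * E|X - X'| over ensemble members."""
    m = len(ensemble)
    accuracy = sum(abs(x - y) for x in ensemble) / m
    spread = sum(abs(a - b) for a in ensemble for b in ensemble) / (2.0 * m * m)
    return accuracy - spread


def tw_crps_ensemble(ensemble, y, threshold):
    """Threshold-weighted CRPS for the weight w(z) = 1{z > threshold}.

    Values at or below the threshold are mapped onto the threshold, so only
    forecast behaviour above it contributes to the score.
    """
    def v(z):
        return max(z, threshold)

    return crps_ensemble([v(x) for x in ensemble], v(y))


if __name__ == "__main__":
    ensemble = [0.0, 1.0, 2.0, 3.0]
    observation = 2.5
    print("CRPS:", crps_ensemble(ensemble, observation))
    print("twCRPS above 2:", tw_crps_ensemble(ensemble, observation, 2.0))
```

With a very low threshold the weighted score coincides with the unweighted CRPS, and with a threshold above every member and the observation it is zero, since no outcome of interest is involved.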