Journal of Statistical Software 2023-01-18T22:04:03+00:00 Editorial Office Open Journal Systems The Journal of Statistical Software publishes articles on statistical software along with the source code of the software itself and replication code for all empirical results.

cglasso: An R Package for Conditional Graphical Lasso Inference with Censored and Missing Values 2022-06-10T09:31:35+00:00 Luigi Augugliaro Gianluca Sottile Ernst C. Wit Veronica Vinciotti <p>Sparse graphical models have revolutionized multivariate inference. With the advent of high-dimensional multivariate data in many applied fields, these methods are able to detect a much lower-dimensional structure, often represented via a sparse conditional independence graph. There have been numerous extensions of such methods in the past decade. Many practical applications have additional covariates or suffer from missing or censored data. Despite the development of these extensions of sparse inference methods for graphical models, there have so far been no implementations for, e.g., conditional graphical models. Here we present the general-purpose package cglasso for estimating sparse conditional Gaussian graphical models with potentially missing or censored data. The method employs an efficient expectation-maximization estimation of an ℓ<sub>1</sub>-penalized likelihood via a block-coordinate descent algorithm. The package has a user-friendly data manipulation interface. It estimates a solution path and includes various automatic selection algorithms for the two ℓ<sub>1</sub> tuning parameters, associated with the sparse precision matrix and sparse regression coefficients, respectively. The package pays particular attention to the visualization of the results, both by means of marginal tables and figures, and of the inferred conditional independence graphs.
This package provides a unique and computationally efficient implementation of a conditional Gaussian graphical model that is able to deal with the additional complications of missing and censored data. As such, it constitutes an important contribution for empirical scientists wishing to detect sparse structures in high-dimensional data.</p> 2023-01-18T00:00:00+00:00 Copyright (c) 2023 Luigi Augugliaro, Gianluca Sottile, Ernst C. Wit, Veronica Vinciotti

deepregression: A Flexible Neural Network Framework for Semi-Structured Deep Distributional Regression 2022-03-11T13:27:09+00:00 David Rügamer Chris Kolb Cornelius Fritz Florian Pfisterer Philipp Kopper Bernd Bischl Ruolin Shen Christina Bukas Lisa Barros de Andrade e Sousa Dominik Thalmeier Philipp F. M. Baumann Lucas Kook Nadja Klein Christian L. Müller <p>In this paper we describe the implementation of semi-structured deep distributional regression, a flexible framework to learn conditional distributions based on the combination of additive regression models and deep networks. Our implementation encompasses (1) a modular neural network building system based on the deep learning library TensorFlow for the fusion of various statistical and deep learning approaches, (2) an orthogonalization cell to allow for an interpretable combination of different subnetworks, as well as (3) pre-processing steps necessary to set up such models. The software package allows users to define models in a user-friendly manner via a formula interface that is inspired by classical statistical model frameworks such as mgcv. The package's modular design and functionality provide a unique resource for both scalable estimation of complex statistical models and the combination of approaches from deep learning and statistics.
This allows for state-of-the-art predictive performance while simultaneously retaining the indispensable interpretability of classical statistical models.</p> 2023-01-18T00:00:00+00:00 Copyright (c) 2023 David Rügamer, Chris Kolb, Cornelius Fritz, Florian Pfisterer, Philipp Kopper, Bernd Bischl, Ruolin Shen, Christina Bukas, Lisa Barros de Andrade e Sousa, Dominik Thalmeier, Philipp F. M. Baumann, Lucas Kook, Nadja Klein, Christian L. Müller

spsurvey: Spatial Sampling Design and Analysis in R 2022-04-29T07:29:20+00:00 Michael Dumelle Tom Kincaid Anthony R. Olsen Marc Weber <p>spsurvey is an R package for design-based statistical inference, with a focus on spatial data. spsurvey provides the generalized random-tessellation stratified (GRTS) algorithm to select spatially balanced samples via the grts() function. The grts() function flexibly accommodates several sampling design features, including stratification, varying inclusion probabilities, legacy (or historical) sites, minimum distances between sites, and two options for replacement sites. spsurvey also provides a suite of data analysis options, including categorical variable analysis (cat_analysis()), continuous variable analysis (cont_analysis()), relative risk analysis (relrisk_analysis()), attributable risk analysis (attrisk_analysis()), difference in risk analysis (diffrisk_analysis()), change analysis (change_analysis()), and trend analysis (trend_analysis()). In this manuscript, we first provide background for the GRTS algorithm and the analysis approaches and then show how to implement them in spsurvey. We find that the spatially balanced GRTS algorithm yields more precise parameter estimates than simple random sampling, which ignores spatial information.</p> 2023-01-18T00:00:00+00:00 Copyright (c) 2023 Michael Dumelle, Tom Kincaid, Anthony R.
Olsen, Marc Weber

jumpdiff: A Python Library for Statistical Inference of Jump-Diffusion Processes in Observational or Experimental Data Sets 2022-04-13T14:02:18+00:00 Leonardo Rydin Gorjão Dirk Witthaut Pedro G. Lind <p>We introduce a Python library, called jumpdiff, which includes all necessary functions to assess jump-diffusion processes. This library includes functions that compute a set of non-parametric estimators of all contributions composing a jump-diffusion process, namely the drift, the diffusion, and the stochastic jump strengths. Given a set of measurements from a jump-diffusion process, jumpdiff is able to retrieve the evolution equation producing data series statistically equivalent to the series of measurements. The back-end calculations are based on second-order corrections of the conditional moments expressed from the series of Kramers-Moyal coefficients. Additionally, the library is able to test whether stochastic jump contributions are present in the dynamics underlying a set of measurements. Finally, we introduce a simple iterative method for deriving second-order corrections of any Kramers-Moyal coefficient.</p> 2023-01-18T00:00:00+00:00 Copyright (c) 2023 Leonardo Rydin Gorjão, Dirk Witthaut, Pedro G. Lind

Regression Modeling for Recurrent Events Possibly with an Informative Terminal Event Using R Package reReg 2021-11-05T18:22:48+00:00 Sy Han Chiou Gongjun Xu Jun Yan Chiung-Yu Huang <p>Recurrent event analyses have found a wide range of applications in biomedicine, public health, and engineering, among others, where study subjects may experience a sequence of events of interest during follow-up. The R package reReg offers a comprehensive collection of practical and easy-to-use tools for regression analysis of recurrent events, possibly with the presence of an informative terminal event.
The regression framework is a general scale-change model which encompasses the popular Cox-type model, the accelerated rate model, and the accelerated mean model as special cases. Informative censoring is accommodated through a subject-specific frailty without any need for parametric specification. Different regression models are allowed for the recurrent event process and the terminal event. Also included are visualization and simulation tools.</p> 2023-01-28T00:00:00+00:00 Copyright (c) 2023 Sy Han Chiou, Gongjun Xu, Jun Yan, Chiung-Yu Huang

ergm 4: New Features for Analyzing Exponential-Family Random Graph Models 2022-08-17T14:53:42+00:00 Pavel N. Krivitsky David R. Hunter Martina Morris Chad Klumb <p>The ergm package supports the statistical analysis and simulation of network data. It anchors the statnet suite of packages for network analysis in R introduced in a special issue of the Journal of Statistical Software in 2008. This article provides an overview of the new functionality in the 2021 release of ergm version 4. This functionality includes more flexible handling of nodal covariates, term operators that extend and simplify model specification, new models for networks with valued edges, improved handling of constraints on the sample space of networks, and estimation with missing edge data. We also identify the new packages in the statnet suite that extend ergm's functionality to other network data types and structural features and the robust set of online resources that support the statnet development process and applications.</p> 2023-01-27T00:00:00+00:00 Copyright (c) 2023 Pavel N. Krivitsky, David R.
Hunter, Martina Morris, Chad Klumb

Expanding Tidy Data Principles to Facilitate Missing Data Exploration, Visualization and Assessment of Imputations 2021-08-02T08:59:37+00:00 Nicholas Tierney Dianne Cook <p>Despite the large body of research on missing value distributions and imputation, there is comparatively little literature with a focus on how to make it easy to handle, explore, and impute missing values in data. This paper addresses this gap. The new methodology builds upon tidy data principles, with the goal of integrating missing value handling as a key part of data analysis workflows. We define a new data structure, and a suite of new operations. Together, these provide a connected framework for handling, exploring, and imputing missing values. These methods are available in the R package naniar.</p> 2023-02-03T00:00:00+00:00 Copyright (c) 2023 Nicholas Tierney, Dianne Cook

Additive Bayesian Network Modeling with the R Package abn 2021-07-27T13:11:32+00:00 Gilles Kratzer Fraser Lewis Arianna Comin Marta Pittavino Reinhard Furrer <p>The R package abn is designed to fit additive Bayesian network models to observational datasets and contains routines to score Bayesian networks based on Bayesian or information-theoretic formulations of generalized linear models. It is equipped with exact search and greedy search algorithms to select the best network, and supports continuous, discrete and count data in the same model and input of prior knowledge at a structural level. The Bayesian implementation supports random effects to control for one-layer clustering.
In this paper, we give an overview of the methodology and illustrate the package's functionality using a veterinary dataset concerned with respiratory diseases in commercial swine production.</p> 2023-01-28T00:00:00+00:00 Copyright (c) 2023 Gilles Kratzer, Fraser Lewis, Arianna Comin, Marta Pittavino, Reinhard Furrer

Bayesian Structure Learning and Sampling of Bayesian Networks with the R Package BiDAG 2021-09-06T09:37:18+00:00 Polina Suter Jack Kuipers Giusi Moffa Niko Beerenwinkel <p>The R package BiDAG implements Markov chain Monte Carlo (MCMC) methods for structure learning and sampling of Bayesian networks. The package includes tools to search for a maximum a posteriori (MAP) graph and to sample graphs from the posterior distribution given the data. A new hybrid approach to structure learning enables inference in large graphs. In the first step, we define a reduced search space by means of the PC algorithm or based on prior knowledge. In the second step, an iterative order MCMC scheme proceeds to optimize the restricted search space and estimate the MAP graph. Sampling from the posterior distribution is implemented using either order or partition MCMC. The models and algorithms can handle both discrete and continuous data. The BiDAG package also provides an implementation of MCMC schemes for structure learning and sampling of dynamic Bayesian networks.</p> 2023-01-28T00:00:00+00:00 Copyright (c) 2023 Polina Suter, Jack Kuipers, Giusi Moffa, Niko Beerenwinkel