Journal of Statistical Software
Feed: https://www.jstatsoft.org/index.php/jss/issue/feed (updated 2023-03-23)
Editorial Office (editor@jstatsoft.org), Open Journal Systems

The Journal of Statistical Software publishes articles on statistical software along with the source code of the software itself and replication code for all empirical results.

Elastic Net Regularization Paths for All Generalized Linear Models
J. Kenneth Tay (kjytay@stanford.edu), Balasubramanian Narasimhan (naras@stanford.edu), Trevor Hastie (hastie@stanford.edu)
https://www.jstatsoft.org/index.php/jss/article/view/v106i01
Submitted 2021-08-18.

The lasso and elastic net are popular regularized regression models for supervised learning. Friedman, Hastie, and Tibshirani (2010) introduced a computationally efficient algorithm for computing the elastic net regularization path for ordinary least squares regression, logistic regression, and multinomial logistic regression, while Simon, Friedman, Hastie, and Tibshirani (2011) extended this work to Cox models for right-censored data. We further extend the reach of elastic net-regularized regression to all generalized linear model families, Cox models with (start, stop] data and strata, and a simplified version of the relaxed lasso. We also discuss convenient utility functions for measuring the performance of these fitted models.

Published 2023-03-23. Copyright (c) 2023 J. Kenneth Tay, Balasubramanian Narasimhan, Trevor Hastie

netmeta: An R Package for Network Meta-Analysis Using Frequentist Methods
Sara Balduzzi (balduzzi.sara@gmail.com), Gerta Rücker (gerta.ruecker@uniklinik-freiburg.de), Adriani Nikolakopoulou (adriani.nikolakopoulou@uniklinik-freiburg.de), Theodoros Papakonstantinou (theodoros.papakonstantinou@uniklinik-freiburg.de), Georgia Salanti (georgia.salanti@ispm.unibe.ch), Orestis Efthimiou (orestis.efthimiou@ispm.unibe.ch), Guido Schwarzer (sc@imbi.uni-freiburg.de)
https://www.jstatsoft.org/index.php/jss/article/view/v106i02
Submitted 2021-12-03.

Network meta-analysis compares different interventions for the same condition by combining direct and indirect evidence derived from all eligible studies. Network meta-analysis has been increasingly used by applied scientists and it is a major research topic for methodologists. This article describes the R package netmeta, which adopts frequentist methods to fit network meta-analysis models. We provide a roadmap to perform network meta-analysis, along with an overview of the main functions of the package.
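The core idea of combining direct and indirect evidence can be sketched in a few lines. This is a toy illustration, not the netmeta interface: the effect sizes and variances below are invented mean differences with known variances, and a real network meta-analysis pools many studies jointly.

```python
# Toy sketch: pooling direct and indirect evidence on a comparison
# A vs C through a common comparator B (all numbers invented).

def inverse_variance_pool(est1, var1, est2, var2):
    """Combine two independent estimates by inverse-variance weighting."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    return (w1 * est1 + w2 * est2) / (w1 + w2), 1.0 / (w1 + w2)

direct_AC, var_direct = 1.0, 0.25        # head-to-head A vs C trials

d_AB, var_AB = 0.6, 0.20                 # A vs B
d_BC, var_BC = 0.5, 0.30                 # B vs C
indirect_AC = d_AB + d_BC                # consistency: d_AC = d_AB + d_BC
var_indirect = var_AB + var_BC           # variances add (independence)

network_AC, var_network = inverse_variance_pool(
    direct_AC, var_direct, indirect_AC, var_indirect)
```

Note that the pooled variance is smaller than either source's variance, which is precisely why networks of evidence are attractive.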
We present three worked examples covering different types of outcomes and different data formats to help researchers aiming to conduct network meta-analysis with netmeta.

Published 2023-03-23. Copyright (c) 2023 Sara Balduzzi, Gerta Rücker, Adriani Nikolakopoulou, Theodoros Papakonstantinou, Georgia Salanti, Orestis Efthimiou, Guido Schwarzer

MLGL: An R Package Implementing Correlated Variable Selection by Hierarchical Clustering and Group-Lasso
Quentin Grimonprez (quentin.grimonprez@inria.fr), Samuel Blanck (samuel.blanck@univ-lille.fr), Alain Celisse (alain.celisse@univ-lille.fr), Guillemette Marot (guillemette.marot@univ-lille.fr)
https://www.jstatsoft.org/index.php/jss/article/view/v106i03
Submitted 2021-03-17.

The R package MLGL, standing for multi-layer group-Lasso, implements a new variable selection procedure for settings with redundancy among explanatory variables, a situation common in high-dimensional data. A sparsity assumption postulates that only a few variables are relevant for predicting the response. In this context, the performance of classical Lasso-based approaches deteriorates sharply as the redundancy increases. The proposed approach combines variable aggregation and selection in order to improve interpretability and performance. First, a hierarchical clustering procedure provides, at each level, a partition of the variables into groups. Then, the set of groups of variables from the different levels of the hierarchy is given as input to the group-Lasso, with weights adapted to the structure of the hierarchy. At this step, the group-Lasso outputs sets of candidate groups of variables for each value of the regularization parameter. The versatility MLGL offers in choosing groups at different levels of the hierarchy induces, a priori, a high computational complexity. MLGL, however, exploits the structure of the hierarchy and the weights used in the group-Lasso to greatly reduce the final computational cost.
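The group-wise selection at the heart of the group-Lasso can be illustrated by its block soft-thresholding step: a whole group of coefficients is either shrunk jointly or set to zero together. This is a toy sketch, not MLGL's implementation; the weight `w` merely plays the role of the hierarchy-adapted weights mentioned above, and all numbers are invented.

```python
import math

def group_soft_threshold(z, lam, w):
    """Proximal operator of b -> lam * w * ||b||_2 at the point z:
    argmin_b 0.5 * ||b - z||^2 + lam * w * ||b||_2.
    The whole group is dropped when ||z||_2 <= lam * w, which is what
    makes the selection group-wise rather than coordinate-wise."""
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam * w:
        return [0.0] * len(z)            # group eliminated entirely
    scale = 1.0 - lam * w / norm
    return [scale * v for v in z]        # group kept, shrunk jointly

kept = group_soft_threshold([3.0, 4.0], lam=1.0, w=2.0)     # ||z|| = 5
dropped = group_soft_threshold([0.3, 0.4], lam=1.0, w=2.0)  # ||z|| = 0.5
```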
The final choice of the regularization parameter, and therefore the final choice of groups, is made by a multiple hierarchical testing procedure.

Published 2023-03-23. Copyright (c) 2023 Quentin Grimonprez, Samuel Blanck, Alain Celisse, Guillemette Marot

drda: An R Package for Dose-Response Data Analysis Using Logistic Functions
Alina Malyutina (alina.malyutina@helsinki.fi), Jing Tang (jing.tang@helsinki.fi), Alberto Pessia (academic@albertopessia.com)
https://www.jstatsoft.org/index.php/jss/article/view/v106i04
Submitted 2022-12-02.

Analysis of dose-response data is an important step in many scientific disciplines, including but not limited to pharmacology, toxicology, and epidemiology. The R package drda is designed to facilitate the analysis of dose-response data by implementing efficient and accurate functions with a familiar interface. With drda it is possible to fit models by the method of least squares, perform goodness-of-fit tests, and conduct model selection. Compared to other similar packages, drda generally provides more accurate estimates in the least-squares sense. This result is achieved by a careful choice of the starting point for the optimization algorithm and by implementing Newton's method with a trust region, using analytical gradients and Hessian matrices. In this article, drda is presented through a description of its methodological components and examples of its user-friendly functions. Performance is evaluated using both synthetic data and a real, large-scale drug sensitivity screening dataset.

Published 2023-03-23. Copyright (c) 2023 Alina Malyutina, Jing Tang, Alberto Pessia

RecordTest: An R Package to Analyze Non-Stationarity in the Extremes Based on Record-Breaking Events
Jorge Castillo-Mateo (jorgecm@unizar.es), Ana C. Cebrián (acebrian@unizar.es), Jesús Asín (jasin@unizar.es)
https://www.jstatsoft.org/index.php/jss/article/view/v106i05
Submitted 2022-10-10.

The study of non-stationary behavior in the extremes is important for analyzing data in environmental sciences, climate, finance, or sports. As an alternative to classical extreme value theory, this analysis can be based on the study of record-breaking events. The R package RecordTest provides a useful framework for non-parametric analysis of non-stationary behavior in the extremes, based on the analysis of records. The underlying idea of all the non-parametric tools implemented in the package is to use the distribution of the record occurrence in series of independent and identically distributed continuous random variables to analyze whether the observed records are compatible with that behavior. Two families of tests are implemented. The first only requires the record times of the series, while the second includes more powerful tests that combine the information from different types of records: upper and lower records in the forward and backward series. The package also offers functions covering all the steps of this type of analysis, such as data preparation, identification of the records, exploratory analysis, and complementary graphical tools. The applicability of the package is illustrated by analyzing the effect of global warming on the extremes of the daily maximum temperature series in Zaragoza, Spain.

Published 2023-03-23. Copyright (c) 2023 Jorge Castillo-Mateo, Ana C. Cebrián, Jesús Asín

gfpop: An R Package for Univariate Graph-Constrained Change-Point Detection
Vincent Runge (vincent.runge@univ-evry.fr), Toby Dylan Hocking (toby.hocking@nau.edu), Gaetano Romano (g.romano@lancaster.ac.uk), Fatemeh Afghah (fafghah@clemson.edu), Paul Fearnhead (p.fearnhead@lancaster.ac.uk), Guillem Rigaill (guillem.rigaill@inrae.fr)
https://www.jstatsoft.org/index.php/jss/article/view/v106i06
Submitted 2021-06-29.

In a world with data that change rapidly and abruptly, it is important to detect those changes accurately.
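The basic shape of this detection problem can be sketched with an unconstrained toy version: penalized least-squares segmentation by optimal partitioning. This is only an illustration of the kind of objective involved, not the package's algorithm (gfpop adds graph constraints, several loss functions, and functional pruning for speed); the data and penalty below are invented.

```python
# Toy penalized change-point detection by optimal partitioning:
# minimize within-segment squared error plus a penalty beta per change.

def optimal_partition(y, beta):
    """Return the optimal change-point locations for squared-error loss."""
    n = len(y)
    s = [0.0] * (n + 1)   # prefix sums of y
    s2 = [0.0] * (n + 1)  # prefix sums of y^2
    for i, v in enumerate(y):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a, b):  # residual sum of squares of segment y[a:b]
        m = b - a
        mu = (s[b] - s[a]) / m
        return (s2[b] - s2[a]) - m * mu * mu

    best = [0.0] * (n + 1)  # best[t]: optimal penalized cost of y[:t]
    last = [0] * (n + 1)    # last[t]: start of the final segment
    for t in range(1, n + 1):
        best[t], last[t] = min(
            (best[a] + cost(a, t) + (beta if a > 0 else 0.0), a)
            for a in range(t))
    changes, t = [], n
    while t > 0:            # backtrack through segment starts
        t = last[t]
        if t > 0:
            changes.append(t)
    return sorted(changes)

y = [0.0, 0.1, -0.1, 5.0, 5.2, 4.9]      # abrupt up change at index 3
changes = optimal_partition(y, beta=1.0)  # -> [3]
```

Raising the penalty suppresses change-points entirely; graph constraints of the kind gfpop supports would instead restrict which *sequences* of changes (e.g., up then down) are admissible.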
In this paper we describe an R package implementing a generalized version of an algorithm recently proposed by Hocking, Rigaill, Fearnhead, and Bourque (2020) for penalized maximum likelihood inference of constrained multiple change-point models. This algorithm can be used to pinpoint the precise locations of abrupt changes in large data sequences. There are many application domains for such models, such as medicine, neuroscience, or genomics. Often, practitioners have prior knowledge about the changes they are looking for. For example, in genomic data, biologists sometimes expect peaks: up changes followed by down changes. Taking advantage of such prior information can substantially improve the accuracy with which we can detect and estimate changes. Hocking et al. (2020) described a graph framework to encode many examples of such prior information and a generic algorithm to infer the optimal model parameters, but implemented the algorithm for just a single scenario. We present the gfpop package, which implements the algorithm in a generic manner in R/C++. gfpop works with a user-defined graph that can encode prior assumptions about the types of changes that are possible, and implements several loss functions (Gauss, Poisson, binomial, biweight, and Huber). We then illustrate the use of gfpop on isotonic simulations and several applications in biology. For a number of graphs the algorithm runs in a matter of seconds or minutes for 10^5 data points.

Published 2023-03-27. Copyright (c) 2023 Vincent Runge, Toby Dylan Hocking, Gaetano Romano, Fatemeh Afghah, Paul Fearnhead, Guillem Rigaill

Broken Stick Model for Irregular Longitudinal Data
Stef van Buuren (stef.vanbuuren@tno.nl)
https://www.jstatsoft.org/index.php/jss/article/view/v106i07
Submitted 2022-11-02.

Many longitudinal studies collect data that have irregular observation times, often requiring the application of linear mixed models with time-varying outcomes.
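The "broken stick" idea of reading off each subject's values at shared breakpoints can be sketched with plain piecewise-linear interpolation. This toy ignores what makes the package's approach a model: brokenstick estimates these values with a linear mixed model (linear B-spline basis), pooling information across subjects, whereas the sketch below interpolates one subject in isolation. Data and breakpoints are invented.

```python
# Toy version of the "connected straight lines" idea: one subject's
# irregular observations evaluated at shared breakpoints.

def broken_stick_values(times, values, breakpoints):
    """Piecewise-linear interpolation of (times, values) at breakpoints.
    Assumes times are sorted and each breakpoint lies in their range."""
    out = []
    for b in breakpoints:
        for (t0, v0), (t1, v1) in zip(zip(times, values),
                                      zip(times[1:], values[1:])):
            if t0 <= b <= t1:  # bracketing pair of observations
                frac = 0.0 if t1 == t0 else (b - t0) / (t1 - t0)
                out.append(v0 + frac * (v1 - v0))
                break
    return out

ages = [0.2, 0.9, 2.1, 3.5]       # irregular visit times (years)
weight = [4.0, 8.0, 12.0, 14.0]   # measurements at those visits
est = broken_stick_values(ages, weight, breakpoints=[1.0, 2.0, 3.0])
```

Once every subject is represented by values at the same breakpoints, downstream analyses can treat the data as ordinary repeated measures.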
This paper presents an alternative that splits the quantitative analysis into two steps. The first step converts irregularly observed data into a set of repeated measures through the broken stick model. The second step estimates the parameters of scientific interest from the repeated measures at the subject level. The broken stick model approximates each subject's trajectory by a series of connected straight lines. The breakpoints, specified by the user, divide the time axis into consecutive intervals common to all subjects. Specification of the model requires just three variables: time, measurement, and subject. The model is a special case of the linear mixed model, with time as a linear B-spline and subject as the grouping factor. The main assumptions are: subjects are exchangeable, trajectories between consecutive breakpoints are straight, random effects follow a multivariate normal distribution, and unobserved data are missing at random. The R package brokenstick v2.5.0 offers tools to calculate, predict, impute, and visualize broken stick estimates. The package supports two optimization methods, including options to constrain the variance-covariance matrix of the random effects. We demonstrate six applications of the model: detection of critical periods, estimation of time-to-time correlations, profile analysis, curve interpolation, multiple imputation, and personalized prediction of future outcomes by curve matching.

Published 2023-03-23. Copyright (c) 2023 Stef van Buuren

Probabilistic Estimation and Projection of the Annual Total Fertility Rate Accounting for Past Uncertainty: A Major Update of the bayesTFR R Package
Peiran Liu (prliu@uw.edu), Hana Ševčíková (hanas@uw.edu), Adrian E. Raftery (raftery@uw.edu)
https://www.jstatsoft.org/index.php/jss/article/view/v106i08
Submitted 2021-02-08.

The bayesTFR package for R provides a set of functions to produce probabilistic projections of total fertility rates for all countries, and is widely used, including as part of the basis for the United Nations official population projections for all countries. Liu and Raftery (2020) extended the theoretical model by adding a layer that accounts for past uncertainty in total fertility rate estimates. A major update of bayesTFR implements this new extension. Moreover, a new feature producing annual total fertility rate estimates and projections extends the existing functionality of estimating and projecting over five-year time periods. An additional autoregressive component has been developed to account for the larger autocorrelation in the annual version of the model. This article summarizes the updated model, describes the basic steps to generate probabilistic estimates and projections under different settings, compares performance, and provides instructions on how to summarize, visualize, and diagnose the model results.

Published 2023-03-26. Copyright (c) 2023 Peiran Liu, Hana Ševčíková, Adrian E. Raftery

intRinsic: An R Package for Model-Based Estimation of the Intrinsic Dimension of a Dataset
Francesco Denti (francesco.denti@unicatt.it)
https://www.jstatsoft.org/index.php/jss/article/view/v106i09
Submitted 2022-12-27.

This article illustrates intRinsic, an R package that implements novel state-of-the-art likelihood-based estimators of the intrinsic dimension of a dataset, an essential quantity for most dimensionality reduction techniques. To make these novel estimators easily accessible, the package contains a small number of high-level functions that rely on a broader set of efficient, low-level routines. Generally speaking, intRinsic encompasses models that fall into two categories: homogeneous and heterogeneous intrinsic dimension estimators.
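The homogeneous case can be illustrated with a minimal likelihood-based sketch built on two-nearest-neighbor distance ratios: under mild assumptions the ratio of each point's second- to first-neighbor distance is Pareto distributed with shape equal to the intrinsic dimension, giving a closed-form maximum likelihood estimate. This is a bare-bones illustration of the idea, not the package's interface, and the dataset is simulated.

```python
import math
import random

def twonn_mle(points):
    """Likelihood-based intrinsic dimension from two-NN distance ratios:
    modeling mu_i = r2_i / r1_i as Pareto(d) gives the maximum
    likelihood estimate d_hat = n / sum(log mu_i)."""
    logs = []
    for i, p in enumerate(points):
        d2 = sorted(sum((a - b) ** 2 for a, b in zip(p, q))
                    for j, q in enumerate(points) if j != i)
        r1, r2 = math.sqrt(d2[0]), math.sqrt(d2[1])
        logs.append(math.log(r2 / r1))
    return len(points) / sum(logs)

random.seed(1)
# 500 points on a 2-D square embedded in a 5-D ambient space:
pts = [[random.random(), random.random(), 0.0, 0.0, 0.0]
       for _ in range(500)]
d_hat = twonn_mle(pts)  # close to 2, the intrinsic dimension
```

The estimate recovers the dimension of the manifold the data live on (2), not of the ambient space (5), which is exactly what makes intrinsic dimension useful for dimensionality reduction.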
The first category contains the two-nearest-neighbors estimator, a method derived from the distributional properties of the ratios of the distances between each data point and its two closest neighbors. The functions dedicated to this method carry out inference under both the frequentist and Bayesian frameworks. In the second category, we find the heterogeneous intrinsic dimension algorithm, a Bayesian mixture model for which an efficient Gibbs sampler is implemented. After presenting the theoretical background, we demonstrate the performance of the models on simulated datasets, where the validity of the results can be assessed immediately. Then, we employ the package to study the intrinsic dimension of the Alon dataset, obtained from a famous microarray experiment. Finally, we show how the estimation of homogeneous and heterogeneous intrinsic dimensions allows us to gain valuable insights into the topological structure of a dataset.

Published 2023-04-06. Copyright (c) 2023 Francesco Denti

Application of Equal Local Levels to Improve Q-Q Plot Testing Bands with R Package qqconf
Eric Weine (ericweine15@gmail.com), Mary Sara McPeek (mcpeek@uchicago.edu), Mark Abney (abney@uchicago.edu)
https://www.jstatsoft.org/index.php/jss/article/view/v106i10
Submitted 2023-01-20.

Quantile-quantile (Q-Q) plots are often difficult to interpret because it is unclear how large the deviation from the theoretical distribution must be to indicate a lack of fit. Most Q-Q plots could benefit from the addition of meaningful global testing bands, but such bands unfortunately remain rare because of the drawbacks of current approaches and packages. These drawbacks include an incorrect global type I error rate, lack of power to detect deviations in the tails of the distribution, relatively slow computation for large data sets, and limited applicability.
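The miscalibration problem can be made concrete with a toy Monte Carlo construction of *pointwise* bands for uniform order statistics. Each plotted point gets its own (1 - local level) interval, so the global chance that at least one point escapes its band is much larger than the local level; equal local levels testing corrects this by solving for the common local level that yields the desired global level. This sketch is not how qqconf computes its bands and the parameters are invented.

```python
import random

def pointwise_bands(n, local_level=0.05, sims=2000, seed=7):
    """Monte Carlo pointwise bands for the order statistics of n
    uniform(0, 1) draws: each order statistic gets its own
    (1 - local_level) interval from simulated quantiles."""
    rng = random.Random(seed)
    draws = [sorted(rng.random() for _ in range(n)) for _ in range(sims)]
    k_lo = int(sims * local_level / 2)
    k_hi = int(sims * (1 - local_level / 2)) - 1
    lo, hi = [], []
    for i in range(n):
        col = sorted(d[i] for d in draws)  # i-th order statistic draws
        lo.append(col[k_lo])
        hi.append(col[k_hi])
    return lo, hi

lo, hi = pointwise_bands(n=20)
```

With 20 points at local level 0.05, the probability that a perfectly uniform sample strays outside *some* band is far above 0.05, which is the incorrect global type I error rate criticized above.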
To solve these problems, we apply the equal local levels global testing method, which we have implemented in the R package qqconf, a versatile tool to create Q-Q plots and probability-probability (P-P) plots in a wide variety of settings, with simultaneous testing bands rapidly created using recently developed algorithms. qqconf can easily be used to add global testing bands to Q-Q plots made by other packages. In addition to being quick to compute, these bands have a variety of desirable properties, including accurate global levels, equal sensitivity to deviations in all parts of the null distribution (including the tails), and applicability to a range of null distributions. We illustrate the use of qqconf in several applications: assessing normality of residuals from regression, assessing the accuracy of p values, and the use of Q-Q plots in genome-wide association studies.

Published 2023-04-16. Copyright (c) 2023 Eric Weine, Mary Sara McPeek, Mark Abney

disaggregation: An R Package for Bayesian Spatial Disaggregation Modeling
Anita K. Nandi (anita.k.nandi@gmail.com), Tim C. D. Lucas (timcdlucas@gmail.com), Rohan Arambepola (rohan.arambepola@bdi.ox.ac.uk), Peter Gething (Peter.Gething@telethonkids.org.au), Daniel J. Weiss (daniel.weiss@bdi.ox.ac.uk)
https://www.jstatsoft.org/index.php/jss/article/view/v106i11
Submitted 2021-09-21.

Disaggregation modeling, or downscaling, has become an important discipline in epidemiology. Surveillance data, aggregated over large regions, are becoming more common, leading to an increasing demand for modeling frameworks that can deal with such data to understand spatial patterns. Disaggregation regression models use response data aggregated over large heterogeneous regions to make fine-scale predictions over the region, using fine-scale covariates to inform the heterogeneity.
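The structural trick of disaggregation regression can be sketched directly: the model lives at pixel level through a log-linear link to a fine-scale covariate, but counts are only observed per region, so each region's Poisson intensity is the sum of its pixels' rates. This is a toy likelihood evaluation with invented names and numbers, not the package's interface; disaggregation fits such models with spatial random effects in a Bayesian framework.

```python
import math

def region_log_likelihood(beta0, beta1, pixel_cov, region_of_pixel,
                          region_counts):
    """Poisson log-likelihood of region-level counts when pixel p has
    rate exp(beta0 + beta1 * x_p) and each region's intensity is the
    sum over its pixels."""
    intensity = [0.0] * len(region_counts)
    for x, r in zip(pixel_cov, region_of_pixel):
        intensity[r] += math.exp(beta0 + beta1 * x)
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1)
               for y, lam in zip(region_counts, intensity))

x = [0.1, 0.4, 0.2, 1.0, 1.2, 0.9]   # fine-scale covariate per pixel
region = [0, 0, 0, 1, 1, 1]          # pixel-to-region lookup
counts = [4, 11]                     # observed counts per region
ll = region_log_likelihood(0.0, 1.0, x, region, counts)
```

Because the likelihood is written at region level while the linear predictor is written at pixel level, maximizing it over the coefficients yields fine-scale rate surfaces that remain consistent with the coarse observations.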
This paper presents the R package disaggregation, which provides functionality to streamline the process of running a disaggregation model for fine-scale predictions.

Published 2023-05-01. Copyright (c) 2023 Anita K. Nandi, Tim C. D. Lucas, Rohan Arambepola, Peter Gething, Daniel J. Weiss

bootUR: An R Package for Bootstrap Unit Root Tests
Stephan Smeekes (s.smeekes@maastrichtuniversity.nl), Ines Wilms (i.wilms@maastrichtuniversity.nl)
https://www.jstatsoft.org/index.php/jss/article/view/v106i12
Submitted 2022-08-18.

Unit root tests form an essential part of any time series analysis. We provide practitioners with a single, unified framework for comprehensive and reliable unit root testing in the R package bootUR. The package's backbone is the popular augmented Dickey-Fuller test paired with a union-of-rejections principle, which can be applied directly to single time series or multiple (including panel) time series. Accurate inference is ensured through the use of bootstrap methods. The package addresses the needs of both novice users, by providing user-friendly and easy-to-implement functions with sensible defaults, and expert users, by giving full user control to adjust the tests to one's desired settings. Our parallelized C++ implementation ensures that all unit root tests scale to datasets containing many time series.

Published 2023-05-08. Copyright (c) 2023 Stephan Smeekes, Ines Wilms
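The logic of bootstrap unit root testing can be sketched with the simplest Dickey-Fuller regression (no constant, no lagged differences) and an iid residual bootstrap under the null. This is a deliberately stripped-down toy, not bootUR's interface or its recommended procedure: the package implements augmented tests with sieve and block bootstraps and a union-of-rejections principle. All series below are simulated.

```python
import random

def df_tstat(y):
    """t statistic of rho in dy_t = rho * y_{t-1} + e_t: the simplest
    Dickey-Fuller regression (H0: unit root corresponds to rho = 0)."""
    x = y[:-1]
    dy = [b - a for a, b in zip(y, y[1:])]
    sxx = sum(v * v for v in x)
    rho = sum(a * b for a, b in zip(x, dy)) / sxx
    resid = [d - rho * v for d, v in zip(dy, x)]
    s2 = sum(e * e for e in resid) / (len(dy) - 1)
    return rho / (s2 / sxx) ** 0.5

def bootstrap_pvalue(y, reps=499, seed=42):
    """Bootstrap the null distribution of the DF statistic by rebuilding
    random walks from resampled, centered first differences, then
    compare with the observed statistic (left-tailed test)."""
    rng = random.Random(seed)
    stat = df_tstat(y)
    dy = [b - a for a, b in zip(y, y[1:])]
    mean_dy = sum(dy) / len(dy)
    d_centered = [d - mean_dy for d in dy]
    hits = 0
    for _ in range(reps):
        yb = [0.0]
        for _ in dy:
            yb.append(yb[-1] + rng.choice(d_centered))
        if df_tstat(yb) <= stat:
            hits += 1
    return (1 + hits) / (1 + reps)

rng = random.Random(0)
walk, ar = [0.0], [0.0]
for _ in range(199):
    walk.append(walk[-1] + rng.gauss(0, 1))    # unit root: H0 true
    ar.append(0.2 * ar[-1] + rng.gauss(0, 1))  # stationary: H0 false
p_rw = bootstrap_pvalue(walk)   # typically not small
p_ar = bootstrap_pvalue(ar)     # small: unit root rejected
```

Simulating the null distribution instead of relying on asymptotic critical values is what makes the bootstrap approach robust to features like heteroskedasticity that distort classical Dickey-Fuller inference.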