Published by the Foundation for Open Access Statistics
Editors-in-chief: Bettina Grün, Torsten Hothorn, Edzer Pebesma, Achim Zeileis
ISSN 1548-7660; CODEN JSSOBK
Authors: Hannes Mühleisen, Alexander Bertram, Maarten-Jan Kallen
Title: Database-Inspired Optimizations for Statistical Analysis
Abstract: Computing complex statistics on large amounts of data is no longer a corner case, but a daily challenge. However, current tools such as GNU R were not built to efficiently handle large data sets. We propose to vastly improve the execution of R scripts by interpreting them as a declaration of intent rather than an imperative order set in stone. This allows us to apply optimization techniques from the columnar data management research field. We have implemented several of these optimizers in Renjin, an open-source execution environment for R scripts targeted at the Java virtual machine. We demonstrate our approach using a series of micro-benchmarks and experiments on complex survey analysis, which show orders-of-magnitude improvements in analysis cost.

Page views: 803. Submitted: 2016-03-18. Published: 2018-11-28.
Paper: Database-Inspired Optimizations for Statistical Analysis (PDF; Downloads: 228)
Supplements:
renjin-relational-optimizations.zip: Source code (Downloads: 11; 6MB)
v87i04-replication.zip: Replication materials (Downloads: 10; 15KB)
acs3yr.rds: Supplementary data in R binary format (Downloads: 3; 1GB)
alabama.rds: Supplementary data in R binary format (Downloads: 8; 7MB)
california.rds: Supplementary data in R binary format (Downloads: 5; 144MB)

DOI: 10.18637/jss.v087.i04

This work is licensed under the following licenses:
Paper: Creative Commons Attribution 3.0 Unported License
Code: GNU General Public License (at least one of version 2 or version 3) or a GPL-compatible license.