Simulating Complex Cross-Sectional and Longitudinal Data Using the simDAG R Package

Robin Denz, Nina Timmesfeld

Abstract

Generating artificial data is a crucial step in Monte Carlo simulation studies. Depending on the planned study, complex data generation processes (DGPs) may be required, containing multiple, possibly time-varying, variables with various forms of dependencies and data types. Simulating data from such DGPs can be difficult and time-consuming. The simDAG R package offers a standardized approach to generating data from simple and complex DGPs, based on the definition of structural equations in directed acyclic graphs using arbitrary functions or regression models. The package offers a clear syntax with an enhanced formula interface and directly supports generating binary, categorical, count, and time-to-event data with arbitrary dependencies, possibly non-linear relationships, and interactions. It additionally includes a framework for conducting discrete-time simulations, which allows the generation of longitudinal data on a semi-continuous time scale. This approach may be used to generate time-to-event data with recurrent or competing events and possibly multiple time-varying covariates, which may themselves have arbitrary data types. In this article, we demonstrate the wide range of features included in simDAG by replicating the DGPs of multiple real Monte Carlo simulation studies.
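
To make the idea of generating data from structural equations in a directed acyclic graph concrete, the following is a minimal sketch in plain Python. It does not use or reproduce the simDAG API. The three-node graph (age -> bmi, with both age and bmi -> death), the function name simulate_dag, and all coefficients are hypothetical and chosen only for illustration. Each node is drawn after its parents, so every draw can condition on values that already exist.

```python
import math
import random


def simulate_dag(n, seed=0):
    """Draw n rows from a toy DAG: age -> bmi, and (age, bmi) -> death.

    All distributions and coefficients are illustrative assumptions,
    not values taken from the article or the simDAG package.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Root node: no parents, drawn from a marginal distribution.
        age = rng.gauss(50, 4)
        # Continuous child node: linear regression on its parent plus noise.
        bmi = 12 + 0.2 * age + rng.gauss(0, 2)
        # Binary child node: logistic regression on both parents.
        linear_predictor = -15 + 0.1 * age + 0.3 * bmi
        p_death = 1 / (1 + math.exp(-linear_predictor))
        death = rng.random() < p_death
        rows.append({"age": age, "bmi": bmi, "death": death})
    return rows


if __name__ == "__main__":
    for row in simulate_dag(3, seed=42):
        print(row)
```

Sampling nodes in topological order is what lets one node's structural equation depend on any previously generated node. A discrete-time simulation could extend the same loop by repeating the draws at each time step while carrying the previous state forward.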
