# Summary

Below are instructions for replicating:
Niehaus, Zhou, Cook, and Jun. "bizicount: Bivariate Zero-Inflated Count Copula Regression using R", published in JSS v109/i01.
The full replication takes roughly 1 hour on Debian GNU/Linux 6.6.15-amd64 using 7 CPUs of an Intel i7-10510U @ 1.80 GHz with 16 GB RAM.
Alternatively, it takes about 30 minutes on Windows 10 Pro with an Intel i7-12700K (20 cores @ 4.5 GHz) and 32 GB DDR4 RAM.
Note that the exact numerical results may differ slightly depending on the platform; see below for details.

* The `Figures/` folder contains the tables and figures produced by the replication. These will be overwritten if users run the replication scripts.

* All printed console output from the manuscript can be found in the `v109i01-replication.txt` file. 

* To re-run the replication materials, source the `v109i01-replication.R` script. If desired, the script can be changed to exclude the simulation results.

# Table of Contents

1. [Dependencies](#dependencies)
2. [File descriptions](#file-descriptions)
3. [Replicating the paper](#replication)

***

# Dependencies

R packages
 
* `bizicount`
* `dplyr`
* `tidyr`
* `ggplot2`
* `copula`
* `doParallel`
* `doRNG`
* `RhpcBLASctl`

These can be installed manually; however, if the replication instructions found below are followed, they will be installed automatically.
See the session information printed below, or the `session_info.txt` file for exact versions of these packages.
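
For readers who prefer to check the dependencies by hand before sourcing anything, the following is a minimal sketch. The helper name `missing_packages` is hypothetical and is not part of the replication scripts; the installation call is left commented out because it needs network access.

```r
# Packages listed in the Dependencies section of this README.
required <- c("bizicount", "dplyr", "tidyr", "ggplot2",
              "copula", "doParallel", "doRNG", "RhpcBLASctl")

# Return the subset of `pkgs` that cannot be loaded from the current library paths.
# `missing_packages` is a hypothetical helper, not part of the replication scripts.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!found]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # Uncomment to install from CRAN (needs network access):
  # install.packages(to_install)
} else {
  message("All dependencies are already installed.")
}
```

Note that this installs the current CRAN versions, which may differ from the versions recorded in the session information.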

**Platform and R package version dependencies**

This replication was done on 64-bit Debian GNU/Linux with the default BLAS/LAPACK 
installation using R version 4.3.3. Other platforms may use different C++ compiler versions that could
lead to differences in numerical results. Using different linear algebra 
libraries may also give different results due to different algorithms for solving 
linear systems and differences in numerical precision among them. 
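
To see which linear algebra libraries a given R session is linked against, and to compare reproduced numbers against reference values without demanding bit-for-bit equality, something like the sketch below may help. It uses only base R. The helper `close_enough`, its default tolerance, and the example values are assumptions made for illustration; they are not taken from the paper or the replication scripts.

```r
# Report which linear algebra libraries this R session is linked against.
# These are base R functions; the output is machine-specific.
cat("R version:     ", R.version.string, "\n")
cat("BLAS library:  ", extSoftVersion()[["BLAS"]], "\n")
cat("LAPACK library:", La_library(), "\n")
cat("LAPACK version:", La_version(), "\n")

# Compare two numeric results up to a tolerance instead of exactly.
# `close_enough` and its default tolerance are hypothetical choices.
close_enough <- function(x, y, tol = 1e-6) {
  isTRUE(all.equal(x, y, tolerance = tol))
}

reference  <- c(0.1234567, 2.3456789)       # made-up values, not from the paper
reproduced <- reference + c(1e-9, -2e-9)    # simulated platform-level noise

print(close_enough(reference, reproduced))  # tolerance-based comparison
print(identical(reference, reproduced))     # exact comparison
```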

The versions of each package are found below:

```
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux trixie/sid

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Vienna
tzcode source: system (glibc)

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] RhpcBLASctl_0.23-42 doRNG_1.8.6         rngtools_1.5.2     
 [4] doParallel_1.0.17   iterators_1.0.14    foreach_1.5.2      
 [7] copula_1.1-3        ggplot2_3.5.0       tidyr_1.3.1        
[10] dplyr_1.1.4         bizicount_1.3.2    

loaded via a namespace (and not attached):
 [1] utf8_1.2.4          generics_0.1.3      lattice_0.22-5     
 [4] digest_0.6.35       lme4_1.1-35.1       magrittr_2.0.3     
 [7] grid_4.3.3          mvtnorm_1.2-4       Matrix_1.6-5       
[10] Formula_1.2-5       httr_1.4.7          purrr_1.0.2        
[13] fansi_1.0.6         scales_1.3.0        stabledist_0.7-1   
[16] pbivnorm_0.6.0      codetools_0.2-19    numDeriv_2016.8-1.1
[19] cli_3.6.2           rlang_1.1.3         pspline_1.0-19     
[22] texreg_1.39.3       gsl_2.1-8           munsell_0.5.0      
[25] splines_4.3.3       withr_3.0.0         tools_4.3.3        
[28] nloptr_2.0.3        minqa_1.2.6         colorspace_2.1-0   
[31] boot_1.3-30         vctrs_0.6.5         R6_2.5.1           
[34] stats4_4.3.3        lifecycle_1.0.4     ADGofTest_0.3      
[37] MASS_7.3-60.0.1     pcaPP_2.0-4         pkgconfig_2.0.3    
[40] pillar_1.9.0        gtable_0.3.4        glue_1.7.0         
[43] Rcpp_1.0.12         tibble_3.2.1        tidyselect_1.2.1   
[46] nlme_3.1-164        DHARMa_0.4.6        compiler_4.3.3     
```

***

# File descriptions

* `v109i01-replication.R` -- Primary replication script. 
* `install_dependencies.R` -- Installs the [R dependencies](#dependencies). This script will be executed automatically if the instructions in [Replication](#replication) are followed.
* `montes_small.R` -- Script used for simulation results presented in appendix. **If you are on Windows**, the script will most likely result in a prompt asking for R to have network access. This is required for the parallel processing to work, as the processing is done over a PSOCK cluster on Windows.
* `plots_small.R` -- Script for producing plots from the Monte Carlo results. 
* `empirical_replication.R` -- The script that generates the tables and figures in the main text and appendix, with the exception of the theoretical copula functions.
* `plots.R` -- Produces the plots for the theoretical copula functions that are found in Figure 1.
* `Figures/` -- Folder containing all figures and tables produced by the above scripts, including those from the simulations and in the appendix. It also contains the `console_output.txt` file, which is the output of all scripts printed to a text file.
* `output_montes_small.RData` -- Data produced from running the simulations on our machine (overwritten if simulations are re-run). 

***

# Replication

_Note: The Monte Carlo simulations can take about an hour to run. Because of this, users can easily set the `run_simulations` variable in the
`v109i01-replication.R` script to `FALSE`. In that case, the `output_montes_small.RData` file has the results as produced for the paper._
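
As a rough illustration of how a flag such as `run_simulations` can gate the expensive step, consider the sketch below. It is hypothetical: the function `choose_simulation_step` and its return labels are invented here, and the actual logic lives in `v109i01-replication.R`, which may be organized differently.

```r
# Hypothetical sketch of flag-based gating; not the replication script's own code.
# Returns a label describing which branch would be taken.
choose_simulation_step <- function(run_simulations, results_file) {
  if (run_simulations) {
    "rerun"     # here one would source("montes_small.R")
  } else if (file.exists(results_file)) {
    "load"      # here one would load(results_file)
  } else {
    "missing"   # stored results not found in the working directory
  }
}

run_simulations <- FALSE
step <- choose_simulation_step(run_simulations, "output_montes_small.RData")
message("Simulation step: ", step)
```

With the flag set to `FALSE`, the outcome depends on whether the stored results file is present in the working directory, which is one reason the working directory must point at the script location.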

If you choose to re-run the simulations and you are on Windows, there is a chance that you will be
prompted for elevated network privileges. This is because we utilize a PSOCK cluster to run the 
simulations in parallel. Denying these privileges will cause the replication to fail. 

1. Open the `v109i01-replication.R` script.
2. Set working directory to the script location.
3. Source the opened `v109i01-replication.R` script, either by pressing `ctrl/cmd` + `shift` + `enter/return` or by clicking
   `Code --> Source with Echo` at the top of RStudio. Alternatively, type `source("v109i01-replication.R", echo = TRUE)` into the console.
4. Output will be in `Figures/`, and raw console output will be in `v109i01-replication.txt`. 

**Note:** The simulations will use all but one of the available CPUs. 
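
The idea of running seeded simulations on a PSOCK cluster with all but one CPU can be sketched as follows. This is an illustration under stated assumptions, not the code in `montes_small.R`: it uses only the base `parallel` package instead of the `doParallel` and `doRNG` packages that the replication depends on, and the names `n_workers`, `toy_task`, and `run_toy_sims`, as well as the toy Poisson task, are hypothetical.

```r
library(parallel)

# Number of workers: all detected CPUs but one, never fewer than one.
# `n_workers` is a hypothetical helper for illustration.
n_workers <- function(total = detectCores()) {
  if (is.na(total)) return(1L)
  max(1L, as.integer(total) - 1L)
}

# Toy stand-in for one simulation replicate (not the paper's simulation).
toy_task <- function(i) mean(rpois(50, lambda = 2))

# Run the toy task on a PSOCK cluster with a fixed seed.
run_toy_sims <- function(seed, workers = n_workers(), reps = 20) {
  cl <- makePSOCKcluster(workers)         # socket-based workers, also used on Windows
  on.exit(stopCluster(cl), add = TRUE)    # always shut the workers down
  clusterSetRNGStream(cl, iseed = seed)   # reproducible per-worker random streams
  parSapply(cl, seq_len(reps), toy_task)
}

first  <- run_toy_sims(seed = 123)
second <- run_toy_sims(seed = 123)
print(identical(first, second))
```

Because each worker receives its own random number stream, results from this kind of setup are generally reproducible only for a fixed seed and a fixed number of workers.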
