Experiment

All the R scripts used to obtain the experimental results of this paper are in the folder Code, and the resulting experimental output is in the folder experiment-results. Note that the scripts automatically download the datasets from wsrf-test-data@GitHub before running the experiments.

Because the experiments carried out in the paper take a very long time to complete, we include the smaller weather dataset from the R package rattle as an example. Its result is in experi.log.weather. The total time taken for the weather dataset, without testing the scalability of wsrf (which is disabled by default), is less than 20 minutes on machines with Ubuntu 12.04.1 LTS, GCC version 4.8.1, 16 cores and 32.9GB RAM on Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, which is the same environment as in the paper.

To run the experiments on a specific dataset, change the working directory to Code and run the shell command below:

## Please change "weather" to the name of the dataset one wants to
## test.  Or one can use `make help` for available tests.  Please
## make sure all the required *R* packages are installed before
## execution.

make weather.log
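To queue several datasets one after another, a small shell helper along the following lines could be used. This is only a sketch: `run_experiments` and the `MAKE_CMD` override are hypothetical names that are not part of the folder Code, and `weather` is the only dataset name confirmed here (use `make help` for the others).

```shell
#!/bin/sh
# Sketch: run `make <dataset>.log` for each dataset name given.
# MAKE_CMD is a hypothetical override; set MAKE_CMD=echo to preview
# the targets without running anything.
MAKE_CMD="${MAKE_CMD:-make}"

run_experiments() {
  for dataset in "$@"; do
    # Stop at the first failing experiment so the error is not buried.
    $MAKE_CMD "${dataset}.log" || return 1
  done
}

# Usage (from inside the folder Code):
#   run_experiments weather
```

Stopping at the first failure is a design choice for long runs; drop the `|| return 1` to continue with the remaining datasets instead.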

Since the experiments involve distributed computing, the scripts need corresponding modifications (providing the name list of the cluster) before all of them can be run as presented. However, distributed computing is disabled by default in the scripts. To run the experiments related to distributed computing, follow the instructions in the lines prefixed with __"#*#"__ in the scripts.

If the error or warning below occurs when running the script for the experiments, decrease the timing interval of Rprof in the file experi.R; the cause is issue PR#16395 of R summaryRprof:

Error in memcounts[, 1L:2L] : incorrect number of dimensions

In max(summaryRprof(tf, memory = "both")$by.total[, "mem.total"]) : no non-missing arguments to max; returning -Inf
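Before changing the interval, it can help to see where experi.R calls Rprof. The helper below is only a sketch: `find_rprof_calls` is a hypothetical name, not something shipped with the scripts, and the exact form of the call in experi.R is not reproduced here. In base R the sampling interval is the `interval` argument of `Rprof`, so that is the value to look for and lower.

```shell
#!/bin/sh
# Sketch: list the lines of an R script that call Rprof(), with line
# numbers. The pattern requires a non-identifier character (or the
# start of the line) before "Rprof(", so summaryRprof() is skipped.
find_rprof_calls() {
  grep -n -E '(^|[^A-Za-z0-9._])Rprof\(' "$1"
}

# Usage (from the folder that contains experi.R):
#   find_rprof_calls experi.R
```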

Note that the time and memory results of the experiments are unlikely to be replicated exactly. However, the differences should not be significant, and the observations obtained in the paper still hold. The accuracy results should be the same if the same environment and package versions described in the paper are used.

Notes for the experimental results:

Manuscript

The R scripts for generating the tables and figures in the manuscript are in the folders Code/manuscript/tables and Code/manuscript/figures. The file section5.R in the folder Code/manuscript/ contains the script from Section 5 of the manuscript.

To run the scripts, type the shell command below:

## Please change "table3-ntree.R" to the corresponding script.
## Please make sure all the required *R* packages are installed
## before execution.

Rscript table3-ntree.R

The figures, or the LaTeX code for the tables, are then output.
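To regenerate every table or figure in one pass, the scripts in a folder can be run in a loop. The following is a sketch under stated assumptions: `run_r_scripts` and the `RSCRIPT_CMD` override are hypothetical names, and it assumes each script is meant to be run from inside its own folder, as the single-script command above suggests.

```shell
#!/bin/sh
# Sketch: run every *.R script in a folder with Rscript, from inside
# that folder. RSCRIPT_CMD is a hypothetical override; set
# RSCRIPT_CMD=echo to preview which scripts would be run.
RSCRIPT_CMD="${RSCRIPT_CMD:-Rscript}"

run_r_scripts() {
  dir="$1"
  (
    # The subshell keeps the caller's working directory unchanged.
    cd "$dir" || exit 1
    for script in *.R; do
      # If nothing matches, the pattern stays literal; skip it.
      [ -e "$script" ] || continue
      $RSCRIPT_CMD "$script" || exit 1
    done
  )
}

# Usage (from the repository root):
#   run_r_scripts Code/manuscript/tables
#   run_r_scripts Code/manuscript/figures
```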