Instructions on how to replicate the results from the paper with the given code.

# Initial setup of the environment

Installation and set up of Julia:
1. Download Julia 1.9 from https://julialang.org/downloads/ and install it. Make sure that the `julia` executable is added to your `PATH`.
2. Start Julia in the /Code folder in the interactive mode using the `julia --project` command.
3. In the interactive prompt, type `]` (which switches to package manager mode) and then run the `instantiate` command.
   This operation automatically installs all required Julia packages in the versions used in the paper.
4. Exit package manager mode by pressing Backspace on an empty input line (or `Ctrl+C`), then exit Julia by executing the `exit()` command.
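
If you prefer to skip the interactive prompt, steps 2 to 4 above can be sketched as a single shell function. This is a minimal sketch, not part of the original replication material: `Pkg.instantiate()` is the function form of the `instantiate` package manager command, while the function name `instantiate_project` and the `JULIA` override variable are hypothetical names introduced here.

```shell
# Sketch: install the project's packages without opening the Julia prompt.
# Run from the /Code folder so that --project picks up its environment files.
instantiate_project() {
    # JULIA is a hypothetical override for the executable name (default: julia)
    julia_cmd=${JULIA:-julia}
    if ! command -v "$julia_cmd" >/dev/null 2>&1; then
        echo "$julia_cmd not found on PATH" >&2
        return 1
    fi
    # Pkg.instantiate() does the same work as `instantiate` in package manager mode
    "$julia_cmd" --project -e 'using Pkg; Pkg.instantiate()'
}
```

After defining the function, call `instantiate_project` from the /Code folder; it returns a non-zero status if the Julia executable cannot be found.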

To set up R and Python with the required packages, follow the instructions specific to your runtime environment.
The tests used the following setup:
* R 4.2.2, `data.table` 1.14.8 and `arrow` 12.0.0 (using `dplyr` to specify queries).
* Python 3.11.3, `polars` 0.17.13 and `pandas` 2.0.1 (using the `pyarrow` engine).
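
To compare your installation with the versions listed above, a sketch like the following may help. It assumes the executables are called `julia`, `Rscript` and `python` on your system (on some systems Python is `python3`); the helper names `have` and `report_versions` are hypothetical and not part of the original material.

```shell
# Sketch: print the versions of the runtimes and packages used in the benchmarks.
# have NAME succeeds when NAME can be found on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

report_versions() {
    if have julia; then
        julia --version
    else
        echo "julia: not found on PATH"
    fi
    if have Rscript; then
        # packageVersion() reports the installed version of an R package
        Rscript -e 'cat("data.table", as.character(packageVersion("data.table")), "\n")' \
                -e 'cat("arrow", as.character(packageVersion("arrow")), "\n")'
    else
        echo "Rscript: not found on PATH"
    fi
    if have python; then
        python -c 'import polars, pandas, pyarrow; print("polars", polars.__version__); print("pandas", pandas.__version__); print("pyarrow", pyarrow.__version__)'
    else
        echo "python: not found on PATH"
    fi
}
```

Calling `report_versions` prints one line per runtime or package; an import or library error at this stage usually means a required package is missing.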

# Replication of results presented in section 5 and section 9

Steps to take:
1. Start Julia in the /Code folder in the interactive mode using the `julia --project` command.
2. Run the following commands for code from section 5:
```
using DataFrames
df = DataFrame(a = 1:3)
combine(df, :a => sum => :a_sum)
select(df, :a => sum => :a_sum)

df = DataFrame(x1 = 1:3, x2 = 4:6)
combine(df, [:x1, :x2] .=> [sum minimum maximum])

[:x1, :x2] .=> [sum minimum maximum]

df = DataFrame(key1 = ["a", "b", "a", "b"],
               key2 = [1, 2, 1, 2],
               value = 1:4)
gdf = groupby(df, [:key1, :key2])
gdf[("b", 2)]

combine(gdf, :value => sum)
select(gdf, :value => sum)
```
3. Run the following commands for code from section 9:
```
using CSV, DataFrames, DataFramesMeta, Chain,
      Dates, HTTP, Plots, Statistics
input = "https://raw.githubusercontent.com/Rdatatable" *
        "/data.table/master/vignettes/flights14.csv"
flights = CSV.read(HTTP.get(input).body, DataFrame)
select!(flights, :year, :month, :origin, :dest, :dep_delay)
@chain flights begin
    groupby(:month, sort = true)
    combine(:dep_delay => mean)
    transform(:month => ByRow(monthname) => :month_name)
    @aside show(sort(_, :dep_delay_mean))
    plot(_.month_name, _.dep_delay_mean, label = nothing,
         xlabel = "Month", ylabel = "Mean delay", xrotation = 15)
    savefig("flights.pdf")
end;
@chain flights begin
    groupby(:month, sort = true)
    @combine(:dep_delay_mean = mean(:dep_delay))
    @rtransform(:month_name = monthname(:month))
    @aside show(sort(_, :dep_delay_mean))
    plot(_.month_name, _.dep_delay_mean, label = nothing,
         xlabel = "Month", ylabel = "Mean delay", xrotation = 15)
    savefig("flights.pdf")
end;
```
4. Exit Julia by executing the `exit()` command.

# Replication of benchmarks presented in section 8 (code is given in the Appendix)

Note that the timings will vary from the ones reported in the paper, as they
depend on the specific user setup and computing environment.

Steps to take:
1. Go to the /Code folder.
2. Run the following command: `julia -t auto --project julia_bench.jl`
3. Run the following command: `Rscript datatable_bench.r`
4. Run the following command: `Rscript arrow_bench.r`
5. Run the following command: `python polars_bench.py`
6. Run the following command: `python pandas_bench.py`
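
The steps above can also be run in one go while keeping the output of each benchmark for later comparison with the paper. The following is a minimal sketch under stated assumptions: the R runner is invoked as `Rscript`, the Python scripts carry a `.py` extension (adjust the names if your copy differs), and the function names and `.log` file names are hypothetical.

```shell
# Sketch: run all benchmarks from the /Code folder, one log file per run.
# usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    log=$1
    shift
    echo ">>> $*"
    # Capture stdout and stderr; report a failure and return non-zero status
    "$@" > "$log" 2>&1 || { echo "FAILED: $* (see $log)" >&2; return 1; }
}

# Script and log names below are assumptions; the chain stops at the first failure.
run_all_benchmarks() {
    run_logged julia_bench.log     julia -t auto --project julia_bench.jl &&
    run_logged datatable_bench.log Rscript datatable_bench.r &&
    run_logged arrow_bench.log     Rscript arrow_bench.r &&
    run_logged polars_bench.log    python polars_bench.py &&
    run_logged pandas_bench.log    python pandas_bench.py
}
```

Calling `run_all_benchmarks` from the /Code folder runs the five scripts in sequence. Running them one after another rather than in parallel keeps the benchmarks from competing for CPU and memory, which would distort the timings.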
