Integration of R and Scala Using rscala

The rscala software is a simple, two-way bridge between R and Scala that allows users to leverage the unique strengths of both languages in a single project. Scala classes can be instantiated from R and Scala methods can be called. Arbitrary Scala code can be executed on-the-fly from within R and callbacks to R are supported. R packages can be developed based on Scala. Conversely, rscala also enables R code to be embedded within a Scala application. The rscala package is available from the Comprehensive R Archive Network (CRAN) and has no dependencies beyond base R and the Scala standard library.


Introduction
This paper introduces rscala (Dahl 2020), a software package that provides a bridge between R (R Core Team 2019) and Scala (Odersky et al. 2004). The goal of rscala is to allow users to leverage the unique strengths of Scala and R in a single program. For example, R packages can implement computationally intensive algorithms in Scala and, conversely, Scala applications can take advantage of the vast array of statistical packages in R. Callbacks from embedded Scala into R are supported. The rscala package is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=rscala. Also, R can be embedded within a Scala application by adding a one-line dependency declaration in Scala Build Tool (SBT).
Scala is a general-purpose programming language that strikes a balance between execution speed and programmer productivity. Scala programs run on the Java virtual machine (JVM) at speeds comparable to Java. Scala features object-oriented, functional, and imperative programming paradigms, affording developers flexibility in application design. Scala code can be concise, thanks in part to type inference, higher-order functions, multiple inheritance through traits, and a large collection of libraries. Scala also supports pattern matching, operator overloading, optional and named parameters, and string interpolation. Scala encourages immutable data types and pure functions (i.e., functions without side effects) to simplify parallel processing and unit testing. In short, the Scala language implements many of the most productive ideas in modern computing. To learn more about Scala, we suggest Programming in Scala (Odersky, Spoon, and Venners 2016) as an excellent general reference.
Because Scala is flexible, concise, and quick to execute, it is emerging as an important tool for scientific computing. For example, Spark (Zaharia et al. 2016) is a cluster-computing framework for massive datasets written in Scala. Several books have been published recently on using Scala for data science (Bugnion 2016), scientific computing (Jancauskas 2016), machine learning (Nicolas 2014; Karim and Alla 2017), and probabilistic programming (Pfeffer 2016). We believe that Scala deserves consideration when looking for an efficient and convenient general-purpose programming language to complement R.
R is a scripting language and environment developed by statisticians for statistical computing and graphics. Like Scala, R supports a functional programming style and provides immutable data types. Scala programmers who learn R will find many familiar concepts, despite the syntactical differences. R has a large user base and over 13,000 actively maintained packages on CRAN. Hence, the Scala community has a lot to gain from an integration with R.
R code can be very concise and expressive, but may run significantly slower than compiled languages. In fact, computationally intensive algorithms in R are typically implemented in compiled languages such as C, C++, Fortran, and Java. The rscala package adds Scala to this list of high-performance languages that can be used to write R extensions. The rscala package is similar in concept to Rcpp (Eddelbuettel and François 2011), an R integration for C and C++, and rJava (Urbanek 2019a), an R integration for Java. Though the rscala integration is not as comprehensive as Rcpp and rJava, it provides the following important features to blend R and Scala. First, rscala allows arbitrary Scala snippets to be included within an R script and Scala objects can be created and referenced directly within R code. These features allow users to integrate Scala solutions in an existing R workflow. Second, rscala supports callbacks to R from Scala, which allows developers to implement general, high-performance algorithms in Scala (e.g., root finding methods) based on user-supplied R functions. Third, rscala supports developing R packages based on Scala which allows Scala developers to make their work available to the R community. Finally, the rscala software makes it easy to incorporate R in a Scala application without even having to install the R package. In sum, rscala's feature-set makes it easy to exploit the strengths of R and Scala in a single project.
We now discuss the implementation of rscala and some existing work. Since Scala code compiles to Java byte code and runs on the JVM, one could access Scala from R via rJava and then benefit from the speed of shared memory. We originally implemented our Scala bridge using this technique, but later moved to a custom TCP/IP protocol for the following reasons. First, rJava and Scala both use custom class loaders which, in our experience, conflict with each other in some cases. Second, since rJava links to a single instance of the JVM, one rJava-based package can configure the JVM in a manner that is not compatible with a second rJava-based package. The rscala package creates a new instance of the JVM for each bridge to avoid such conflicts. Third, the simplicity of no dependencies beyond Scala's standard library and base R is appealing from a user's perspective. Finally, callbacks in rJava are provided by the optional JRI (Java/R interface) component, which is only available if R is built as a shared library. While this is the case on many platforms, it is not universal and therefore callbacks could not be a guaranteed feature of rscala software if it were based on rJava's JRI.
The discussion of the design of rscala has so far focused on accessing Scala from R. The rscala software also supports accessing R from Scala using the same TCP/IP protocol. This ability is an offshoot of the callback functionality. Since Scala can call Java libraries, those who are interested in accessing R from Scala should also consider the Java libraries Rserve (Urbanek 2019b) and RCaller (Satman 2014). Rserve is also "a TCP/IP server which allows other programs to use facilities of R" (https://www.rforge.net/Rserve). Rserve clients are available for many languages including Java. Rserve is fast and provides a much richer API than rscala. Like rJava, however, Rserve also requires that R be compiled as a shared library. Also, Windows has some limitations such that Rserve users are advised not to "use Windows unless you really have to" (https://www.rforge.net/Rserve/doc.html).
The paper is organized as follows. Section 2 describes using Scala from R. Some of the more important topics presented there include the data types supported by rscala, embedding Scala snippets in an R script, executing methods of Scala references, and calling back into R from Scala. We also discuss how to develop R packages based on Scala. Section 3 describes using R from Scala. In both Sections 2 and 3, concise examples are provided to help describe the software's functionality. Section 4 provides a case study to show how Scala can easily be embedded in R to significantly reduce computation time for a simulation study. We conclude in Section 5 with potential features for future work.

Accessing Scala in R
This section provides a guide to accessing Scala from R. Those interested in the reverse (accessing R from Scala) will also benefit from understanding the ideas presented here.

Installation
The rscala package is available on the Comprehensive R Archive Network (CRAN) and can be installed by executing the following R expression.

R> install.packages("rscala")
The rscala package requires Scala, which itself requires Java. System administrators can install Scala and Java using their operating system's software management system (e.g., "sudo apt install scala" on Ubuntu-based systems). Administrators and users can also do a manual installation. To get the currently supported major versions of Scala, use:

R> names(rscala::scalaVersionJARs())
[1] "2.11" "2.12" "2.13.0-M5"

The simplest way to satisfy these dependencies, however, is with the scalaConfig function:

R> rscala::scalaConfig()

This function tries to find Scala and Java on the user's computer and, if needed, downloads and installs Scala and Java in the user's ~/.rscala directory. Because this is a user-level installation, administrator privileges are not required.

Instantiating a Scala bridge
Load and attach the rscala package in an R session with the library function:

R> library("rscala")
Create a Scala bridge using the scala function:

R> s <- scala()

The scala function takes several arguments to control how Scala is run, including options to add JAR files to the classpath and control the memory usage. Details on this and all other functions are provided in the R documentation for the package (e.g., help("scala")).
A Scala session is only valid during the R session in which it is created and cannot be saved and restored through, for example, the save and load functions. Multiple Scala bridges can be created in the same R session. Each Scala bridge runs independently with its own memory and classpath. A Scala bridge cannot be shared across multiple R processes/threads.

Evaluating Scala snippets
Snippets of Scala code can be compiled and executed within an R session using several operators. The most basic operator is the + operator which runs code in Scala's global namespace and always returns NULL. Consider, for example, computing the binomial coefficient $\binom{n}{k} = \prod_{i=1}^{k} (n - i + 1)/i$. The code below uses Scala's def statement to define the function. The expression 1 to k creates a range and the higher-order map method of the range applies the expression (n - i + 1) / i.toDouble to each element i in the range. Finally, the results are multiplied together by the product method.
Notice the side effect of printing 120 to the console. The behavior for console printing is controlled by arguments of the scala function. Default values are set such that console output is displayed in typical environments.
Scala snippets can also be evaluated with the * operator. Whereas the + operator evaluates in Scala's global namespace and returns NULL, the * operator evaluates in a local block and always returns the result of the last expression:

R> choose(10, 3) == s * 'binomialCoefficient(10, 3)'
[1] TRUE
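As a rough, standalone illustration of the definition described above, the Scala sketch below computes the binomial coefficient with a range, map, and product. It runs without R or rscala. The object name BinomialDemo, the Long return type, and the final rounding step (added to guard against floating-point error in the running product) are assumptions of this sketch rather than details taken from the package.

```scala
object BinomialDemo {
  // Multiply the k factors (n - i + 1) / i as doubles, then round to the
  // nearest integer to guard against floating-point error in the product.
  def binomialCoefficient(n: Int, k: Int): Long =
    math.round((1 to k).map(i => (n - i + 1) / i.toDouble).product)

  def main(args: Array[String]): Unit = {
    // Mirrors the console side effect mentioned in the text.
    println("10 choose 3 is " + binomialCoefficient(10, 3) + ".")
  }
}
```

When k is zero the range is empty, so the product is 1, which matches the usual convention for the binomial coefficient.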

Scalar and copyable types
A Scala result of type Byte, Int, Double, Boolean, or String is passed back to R as a length-one vector of raw, integer, double, logical, or character, respectively. We refer to these as the scalar types supported by the rscala package. Further, Scala arrays and rectangular arrays of arrays of the scalar types are passed to R as vectors and matrices of the equivalent R types. We call copyable types those types that are scalar types, arrays of scalar types, and rectangular arrays of arrays of the scalar types. The name emphasizes the fact that these data structures are serialized and copied between Scala and R. This may be costly for large data. Table 1 shows the mapping of Scala and R types using code examples. The example below shows how the Scala and R expressions produce the same result.

Passing data to Scala
It was shown previously that data of copyable types is returned to R when evaluating Scala snippets using the * operator. Conversely, data of copyable types can be passed to Scala snippets. A Scala bridge is represented in R as a function. Arguments passed to a Scala bridge are made available to the associated Scala snippet:

The previous example demonstrates using a single named argument, but any number of named or unnamed arguments can be used:

R> names <- c("Hannah", "David", "Reinier")
R> s(names, convertToUpperCase = TRUE) * '
+   val x = if ( convertToUpperCase ) names.map(_.toUpperCase) else names
+   x.map { y => y == y.reverse }
+ '
[1] TRUE FALSE TRUE
Note that, for unnamed arguments, the identifiers (e.g., names in the previous example) are used as Scala variable names. Since Scala has different rules for variable names than does R, only the intersection of valid variable names in both Scala and R can be used. For example, use.upper and _useUpper would be invalid arguments to a Scala bridge, the first being an invalid identifier in Scala and the second being invalid in R.
The previous example also illustrates that vectors are typically passed to Scala as arrays (e.g., names in the previous example), except that vectors of length one are passed not as arrays but as scalars (e.g., in the previous example, convertToUpperCase is a scalar). If the user wants to ensure that a vector is passed as an array, R's "as-is" function I is used. In the example below, the length of x is random but the Scala code is valid because x is wrapped in I to guarantee that it is passed as an array.

R> x <- letters[sample(length(letters), rbinom(1, size = 2, prob = 0.5))]
R> s(x = I(x)) * 'x.map(_.toUpperCase).mkString'
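To make the Scala side of the palindrome example concrete, the following standalone sketch wraps the same two lines in an ordinary method. The array and the Boolean flag play the roles of the vector and the length-one argument that a bridge would supply. The names PalindromeDemo and isPalindrome are hypothetical and are used only for illustration.

```scala
object PalindromeDemo {
  // names arrives as an array; convertToUpperCase arrives as a scalar.
  def isPalindrome(names: Array[String], convertToUpperCase: Boolean): Array[Boolean] = {
    val x = if (convertToUpperCase) names.map(_.toUpperCase) else names
    // A string is a palindrome when it equals its own reverse.
    x.map { y => y == y.reverse }
  }

  def main(args: Array[String]): Unit = {
    val names = Array("Hannah", "David", "Reinier")
    println(isPalindrome(names, convertToUpperCase = true).mkString(" "))
  }
}
```

Without the upper-case conversion, "Hannah" and "Reinier" no longer equal their reverses because of the leading capital letter.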

Scala references
If the result of a Scala expression is not a copyable type, the * operator returns a reference to a Scala object that can be used in subsequent evaluations. If a Scala reference is desired, even when working with copyable types, use the ^ operator.
In the next example, an instance of the class 'scala.util.Random' is created and, because the result is not a copyable type, a Scala reference is returned. Second, a Scala reference to an array of integers is returned, despite the fact that this is a copyable type, because the ^ operator is used.

Scala references can also be passed as arguments to a Scala bridge:

R> s(rng, len = 15L) * 'rng.alphanumeric.take(len).mkString'
[1] "Gz8SJOu3tgSeMVR"
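For readers less familiar with Scala, the snippet evaluated through the bridge above corresponds to the plain Scala method below, where the random number generator is an ordinary object rather than a reference held by R. The names ReferenceDemo and randomToken and the seed value are assumptions made for this sketch.

```scala
import scala.util.Random

object ReferenceDemo {
  // Take the first len characters of the generator's alphanumeric stream.
  def randomToken(rng: Random, len: Int): String =
    rng.alphanumeric.take(len).mkString

  def main(args: Array[String]): Unit = {
    val rng = new Random(123L) // hypothetical seed, chosen only for repeatability
    println(randomToken(rng, 15))
  }
}
```

Because the generator carries mutable state, repeated calls with the same object give different strings, which is one reason such objects are handled by reference rather than copied.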

Accessing methods and variables of Scala objects
Taking inspiration from rJava's high-level $ operator, methods associated with Scala references can be called directly using the $ operator:

As with arguments to a Scala bridge, variables of copyable types and Scala references may be used as arguments when employing the $ operator. If the result of a method call on a Scala reference is not a copyable type, then a Scala reference is returned. If a Scala reference is desired even when working with copyable types, add a dot immediately after the $ operator:

R> rng$.nextInt(10L)
rscala reference of type Int

The value of an instance variable may be accessed as if there were a method of the same name taking no arguments. For example, the value self in an instance of 'scala.util.Random' is accessed as:

R> rng$self()
rscala reference of type java.util.Random

In an interactive R session, the rscala package provides rudimentary tab-completion for method names of Scala references.

Other uses of the $ operator
There are several other uses of the $ operator. In the next example, the following are generated with the $ operator: an instance of the class 'scala.util.Random', an instance of a mutable hash map, and a null reference of type String.
R> seed <- 123L
R> rng <- s$.new_java.util.Random(seed)
R> map <- s$".new_scala.collection.mutable.HashMap[String, Double]"()
R> nullString <- s$.null_String()

Note the use of quotes for the hash map in the previous example. Scala has type parameterization which is similar to (but arguably more advanced than) generics in Java and templates in C++. In many instances, the Scala compiler infers the type parameter, but the user may need or want to explicitly provide it. When using the $ operator, quoting may be needed since the type involves characters that are not allowed in R identifiers (e.g., [ and ]). Likewise, names of Scala methods may not be valid identifiers in R and may also need to be quoted to avoid parsing errors in R. For example, note that the List's append method :+ is quoted here:

R> myList <- s$List(1L, 2L, 3L)
R> augmentedList <- myList$":+"(100L)
R> paste0(augmentedList$toString(), " now contains 100.")
[1] "List(1, 2, 3, 100) now contains 100."

The next example shows usage of the $ operator to access a previously defined function (e.g., binomialCoefficient), a method of a companion object (e.g., Array's range method), a factory method of a companion object (e.g., List's implied apply method), and a method of a singleton object (e.g., 'scala.util.Properties's versionNumberString method).
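For comparison, the Scala constructs that the $ operator reaches in this subsection look as follows when written directly in Scala. This is only a sketch of the Scala side; the object name DollarOperatorDemo and the particular arguments to Array.range are assumptions for illustration.

```scala
object DollarOperatorDemo {
  // Explicit type parameters, as in the quoted hash map example.
  val map = new scala.collection.mutable.HashMap[String, Double]()
  // A null reference of type String.
  val nullString: String = null
  // Factory method of a companion object: List(...) is List.apply(...).
  val myList: List[Int] = List(1, 2, 3)
  // The append method is literally named :+, which is not a valid R identifier.
  val augmentedList: List[Int] = myList :+ 100
  // A method of a companion object.
  val range: Array[Int] = Array.range(1, 5)

  def main(args: Array[String]): Unit = {
    println(augmentedList.toString + " now contains 100.")
    // A method of a singleton object; the value depends on the Scala installation.
    println(scala.util.Properties.versionNumberString)
  }
}
```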

Interfacing with Java
Scala runs on the JVM and since it supports instantiating Java classes and calling object and static methods, the rscala package automatically provides this support as well. For example, we can find the system's time zone through a chain of calls using the standard Java library:

R> s$java.util.TimeZone.getDefault()$getDisplayName()
[1] "Mountain Standard Time"

Callbacks to R from Scala
When the scala function creates a Scala bridge, an instance of 'org.ddahl.rscala.RClient' is bound to the identifier R within Scala. It is through this instance that callbacks to the R interpreter are possible. The 'RClient' class is thread-safe. Its source code and Scaladoc are located on GitHub: https://github.com/dbdahl/rscala/.
All of the evaluation methods of this class take the same arguments. The first argument is a template for an R expression, where %- is a placeholder for items that are provided as variable arguments. The result type is indicated by the suffix of the method name evalXY, where X ∈ {R, I, D, L, S} and Y ∈ {0, 1, 2}. The value of X indicates whether the result from R should be interpreted as raw, integer, double, logical, or character, respectively. The value of Y indicates whether the result should be interpreted as a scalar, an array, or a rectangular array of arrays, respectively. The method evalObject returns a Scala reference to an arbitrary R object which can be passed as an argument to another evaluation method. Several examples are below.

[1] TRUE
A more interesting use case is calling a user-supplied R function from Scala. First, consider an R function that computes f(n, α), the expectation of the Ewens(n, α) distribution, i.e., the expected number of clusters when sampling n observations from a discrete random measure obtained from a Dirichlet process with mass parameter α.

R> f <- function(n, alpha) sapply(alpha, function(a) sum(a / (1:n + a - 1)))
R> f(100, 1.0)
[1] 5.187378

In a Bayesian analysis, the Ewens distribution is a prior distribution in random partition models and α is a hyperparameter. In the prior elicitation process, practitioners may want to find the value of α that corresponds to the expert's anticipated number of clusters. Thus, the task is to numerically solve f(n, α) = µ for α, given fixed values for n and µ. To be specific, suppose n = 1000 and µ = 10. The value α can be obtained using root finding methods.
Here, we demonstrate the bisection method implemented in Scala. Note that the function's first argument, func, is a user-defined R function.

[1] 1.443818

The most important aspect of the previous example is in the first line of the Scala snippet, where the evalD0 method calls the R function func and returns the result as a Double.
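The root-finding task can also be sketched entirely in Scala, which may help in following the bisection logic without the callback machinery. In the sketch below the function f is reimplemented in Scala instead of being called back in R through evalD0, so it is not the code from the package. The bracket [0.1, 10], the tolerance, and the names BisectionDemo and bisection are assumptions of this illustration.

```scala
object BisectionDemo {
  // Expected number of clusters under the Ewens(n, alpha) distribution.
  def f(n: Int, alpha: Double): Double =
    (1 to n).map(i => alpha / (i + alpha - 1)).sum

  // Bisection: repeatedly halve [lower, upper], keeping the half on which
  // g changes sign, until the interval is shorter than epsilon.
  def bisection(g: Double => Double, lower: Double, upper: Double,
                epsilon: Double = 1e-9): Double = {
    require(lower < upper, "lower must be less than upper")
    require(g(lower) * g(upper) <= 0.0, "g must change sign on [lower, upper]")
    var lo = lower
    var hi = upper
    while (hi - lo > epsilon) {
      val mid = (lo + hi) / 2.0
      if (g(lo) * g(mid) <= 0.0) hi = mid else lo = mid
    }
    (lo + hi) / 2.0
  }

  def main(args: Array[String]): Unit = {
    // Solve f(1000, alpha) = 10 for alpha, as in the text.
    val alpha = bisection(a => f(1000, a) - 10.0, 0.1, 10.0)
    println(alpha)
  }
}
```

In the rscala version, the function g would instead evaluate the user-supplied R function, so each bisection step incurs one callback over the bridge.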

Speed considerations
Section 4 considers the speed and ease of implementing a simulation study in R, C++ via Rcpp, and Scala via rscala. It is not a comprehensive comparison of the performance of these languages. For that, we refer readers to benchmarks available on the web. Here we simply highlight performance characteristics of rscala itself.
All calls into Scala require compilation before invocation. Subsequent uses of the same code skip the time-consuming compilation due to caching. Consider, for example, two calls to the method nextGaussian of an instance of 'java.util.Random':

R> rng_rscala <- s$.new_java.util.Random()
R> first <- system.time(rng_rscala$nextGaussian())["elapsed"]
R> second <- system.time(rng_rscala$nextGaussian())["elapsed"]
R> c(first = first, second = second, ratio = first / second)
 first.elapsed second.elapsed  ratio.elapsed
         0.120          0.001        120.000

By way of comparison, rJava provides two means to call the nextGaussian method. Suppose that rngRJava is the result of instantiating an object of class 'scala.util.Random' using rJava. The high-level $ operator of rJava can call this method using rngRJava$nextGaussian(). Alternatively, rJava's low-level interface provides the .jcall function. The next example and Table 2 compare the speed of rscala's rng$nextGaussian() and rJava's two ways of calling the same method.

Developing packages based on rscala
The rscala package enables developers to use Scala in their own R packages to implement computationally intensive algorithms. For example, the sdols (Dahl and Müller 2019), shallot (Dahl 2019b), and bamboo (Dahl 2019a) packages on CRAN use Scala via rscala to implement statistical methodology of their associated journal articles (Dahl, Day, and Tsai 2017; Li, Dahl, Vannucci, Joo, and Tsai 2014). Readers are encouraged to study those examples in addition to our description here.
An R package based on rscala should include the rscala package in the Imports field of the package's DESCRIPTION file. Also, add import(rscala) to the NAMESPACE file. Typically a package based on rscala will instantiate a Scala bridge in the package's .onLoad function. To make the bridge available to the other functions in the package, the author should assign the bridge to the package environment. The .onLoad function may be as simple as:

R> .onLoad <- function(libname, pkgname) {
+   assign("s", scala(), envir = parent.env(environment()))
+ }

If the package is to access precompiled code from a JAR file, we suggest cross compiling against the major versions supplied by:

R> names(rscala::scalaVersionJARs())
[1] "2.11" "2.12" "2.13.0-M5"

This is done in part by adding a line to SBT's build.sbt file, like:

crossScalaVersions := Seq("2.11.12", "2.12.7", "2.13.0-M5")

The JAR files should be copied to directories inst/java/scala-X.XX relative to the package root, where X.XX represents a major version of Scala (e.g., 2.12). The cross compiling and copying of JAR files is automated by the rscala::scalaSBT function. If JAR files of compiled Java code are to be included in the package, they should be placed directly in the inst/java directory of the source package. Finally, to make the JAR file available to the package's R functions, the name of the package should be passed as the first argument to the scala function, e.g.:

R> .onLoad <- function(libname, pkgname) {
+   assign("s", scala(pkgname), envir = parent.env(environment()))
+ }

It is common in the .onLoad function to define global imports, classes, objects, and functions using the + operator. We recommend, however, that this be accomplished through the scalaLazy function to delay the evaluation until necessary. This gives the Scala bridge the chance to start up without blocking R's read-eval-print loop.
For example, the .onLoad function of the bamboo package is:

R> .onLoad <- function(libname, pkgname) {
+   s <- scala(pkgname)
+   scalaLazy(function(s) s + 'import org.ddahl.bamboo._')
+   assign("s", s, envir = parent.env(environment()))
+ }

Since packages should not leave external processes (in this case, Scala) running when the package is unloaded, the package should close the Scala bridge in the .onUnload function, e.g.:

R> .onUnload <- function(libpath) {
+   close(s)
+ }
Finally, a package can piggy-back on another package by using its Scala bridge. For example, the shallot package uses the Scala bridge from the sdols package and registers an additional JAR file using the scalaJARs function. That is, the .onLoad function for the shallot package might be:

R> .onLoad <- function(libname, pkgname) {
+   s <- sdols:::s
+   scalaJARs(pkgname, s)
+   assign("s", s, envir = parent.env(environment()))
+ }

In this case, since the shallot package is not the original owner of the Scala bridge, the shallot package should not call close(s) in an .onUnload function.

Accessing R in Scala
So far we have demonstrated accessing Scala from R. Conversely, rscala can also embed an R interpreter in a Scala application via the 'org.ddahl.rscala.RClient' class. In this case, however, there is not an existing instance of the R interpreter. The R client spawns an R instance, immediately starts the embedded R server, and connects R to Scala.
The 'RClient' class is thread-safe. Source code and Scaladoc are located on GitHub: https://github.com/dbdahl/rscala/. As a convenience, rscala's JAR file is available in standard repositories for use by dependency management systems. To use 'RClient' in a Scala application, simply add the following line to SBT's build.sbt file:

libraryDependencies += "org.ddahl" %% "rscala" % "(VERSION)"

where (VERSION) is replaced with the current package version. Note that, since the necessary R code is bundled in the JAR file, the rscala package does not need to be installed in R. An embedded R interpreter is instantiated as follows:

scala> val R = org.ddahl.rscala.RClient()

This assumes that the registry keys option was not disabled during the R installation on Windows. On other operating systems, R is assumed to be in the search path. If these assumptions are not met or a particular installation of R is desired, the path to the R executable may be specified explicitly (e.g., org.ddahl.rscala.RClient("/path/to/R_HOME/bin/R")). Console output from R is not automatically serialized back to Scala.
The rscala package can be an easy and convenient way to access statistical functions, facilitate calculations, manage data, and produce plots in a Scala application. Consider, for example, wrapping R's qnorm function to define a method in Scala by the same name:

The next example uses R's dataset eurodist to compute the European city that is closest, on average, to all other European cities. While this statistical calculation is easily implemented in R, one can imagine a Scala application that needs to perform a more taxing calculation that leverages R's rich data-processing functions.

Europe's central city is Lyons.
Spark, a cluster-computing framework for massive datasets, is another example of a Scala application that might benefit from access to R. Spark provides an application programming interface to Scala, Java, R, and Python. R users who are not already familiar with Scala would be best served by accessing Spark from R using a dedicated package such as sparklyr (Luraschi, Kuo, Ushey, Allaire, and The Apache Software Foundation 2019) or SparkR (Venkataraman, Meng, Cheung, and The Apache Software Foundation 2019). Scala developers, however, might prefer to program directly with Spark's machine learning library (MLlib) in Scala and to supplement its functionality with R through rscala. Recall that every 'RClient' object has its own workspace, so several instances can be used to overcome the single-threaded nature of R. One could, for example, use software to manage a pool of 'RClient' objects on each worker node. One potential limitation is the cost of pushing large datasets over the TCP/IP bridge.
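One way to picture the pool idea mentioned above is a small blocking pool that lends out one resource at a time. The sketch below is generic and does not use rscala: the Worker class is a hypothetical stand-in for an interpreter handle such as an 'RClient' instance, and the names Pool, withResource, and PoolDemo are assumptions of this illustration.

```scala
import java.util.concurrent.LinkedBlockingQueue

// A minimal blocking pool: borrow a resource, run a task, always return it.
class Pool[A](resources: Seq[A]) {
  private val queue = new LinkedBlockingQueue[A]()
  resources.foreach(r => queue.put(r))

  def withResource[B](task: A => B): B = {
    val r = queue.take() // blocks until some resource is free
    try task(r) finally queue.put(r)
  }

  def available: Int = queue.size()
}

object PoolDemo {
  // Hypothetical stand-in for a handle to a single-threaded interpreter.
  final class Worker(val id: Int) {
    def eval(x: Double): Double = math.sqrt(x)
  }

  def main(args: Array[String]): Unit = {
    val pool = new Pool(Seq(new Worker(1), new Worker(2)))
    println(pool.withResource(w => w.eval(9.0)))
  }
}
```

With such a pool, at most as many tasks run concurrently as there are handles, and no single handle is ever used by two threads at once.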

Case study: Simulation study accelerated with rscala
While the previously mentioned sdols, shallot, and bamboo packages demonstrate the ability to develop packages based on rscala, we demonstrate in this section the ease with which computationally intensive statistical procedures can be implemented by embedding Scala code in an R script. The algorithm is embarrassingly parallel and we consider two means of parallelization: one using Scala's 'Future' class and the other using R's parallel package. By way of comparison, we include a pure R implementation of the same algorithm, and also an implementation that uses inline C++ code via the Rcpp package. All four implementations define a function that takes an arbitrary R function for sampling.
We investigate a simulation study of the coverage probability of a bootstrap confidence interval procedure. Consider a population parameter $\beta_1/\beta_2$, where $\beta_1$ and $\beta_2$ are population quantiles associated with probabilities $p_1$ and $p_2$, respectively. Based on a sample of $n$ observations, a point estimator of the parameter is the ratio of the corresponding sample quantiles, and the following bootstrap procedure can be used to find a confidence interval when the population distribution is unspecified. The sample estimate is recorded for each of nSamples bootstrap samples. A bootstrap confidence interval is given by $(l, u)$, where $l$ and $u$ are quantiles of the bootstrap sampling distribution associated with $\alpha/2$ and $1 - \alpha/2$, respectively. Although the nominal coverage is $1 - \alpha$, interest lies in computing the actual coverage probability of this bootstrap confidence interval procedure using a Monte Carlo simulation study. A total of nIntervals samples from the population are obtained from a user-supplied sampling function. Although the code is general, we sample $n = 100$ observations from the standard normal distribution and set $p_1 = 0.75$ and $p_2 = 0.35$, making $\beta_1/\beta_2 \approx -1.75$. We use nIntervals = 10,000 Monte Carlo replicates, each having nSamples = 10,000 bootstrap samples.
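The procedure just described can be sketched in sequential Scala as follows. This is not the code from the supplementary material: the quantile rule (linear interpolation between order statistics, as in R's default quantile type), the object and method names, the seed, and the small replicate counts in main are all assumptions chosen so that the sketch runs quickly.

```scala
import scala.util.Random

object BootstrapDemo {
  private def sortedCopy(x: Array[Double]): Array[Double] = {
    val s = x.clone()
    java.util.Arrays.sort(s)
    s
  }

  // Sample quantile by linear interpolation between order statistics.
  def quantile(sorted: Array[Double], p: Double): Double = {
    val h = (sorted.length - 1) * p
    val lo = math.floor(h).toInt
    val hi = math.min(lo + 1, sorted.length - 1)
    sorted(lo) + (h - lo) * (sorted(hi) - sorted(lo))
  }

  // Point estimate: ratio of the sample quantiles at p1 and p2.
  def ratio(x: Array[Double], p1: Double, p2: Double): Double = {
    val s = sortedCopy(x)
    quantile(s, p1) / quantile(s, p2)
  }

  // Percentile bootstrap interval (l, u) from nSamples resamples of x.
  def interval(x: Array[Double], p1: Double, p2: Double, nSamples: Int,
               alpha: Double, rng: Random): (Double, Double) = {
    val stats = sortedCopy(Array.fill(nSamples) {
      val resample = Array.fill(x.length)(x(rng.nextInt(x.length)))
      ratio(resample, p1, p2)
    })
    (quantile(stats, alpha / 2), quantile(stats, 1 - alpha / 2))
  }

  // Monte Carlo estimate of the coverage probability.
  def coverage(sampler: () => Array[Double], truth: Double, p1: Double,
               p2: Double, nIntervals: Int, nSamples: Int, alpha: Double,
               rng: Random): Double = {
    val hits = (1 to nIntervals).count { _ =>
      val (l, u) = interval(sampler(), p1, p2, nSamples, alpha, rng)
      l <= truth && truth <= u
    }
    hits.toDouble / nIntervals
  }

  def main(args: Array[String]): Unit = {
    val rng = new Random(1L)
    // Standard normal quantiles at 0.75 and 0.35; their ratio is about -1.75.
    val truth = 0.6744898 / -0.3853205
    val sampler = () => Array.fill(100)(rng.nextGaussian())
    println(coverage(sampler, truth, 0.75, 0.35, 200, 200, 0.05, rng))
  }
}
```

In the rscala implementations the sampler would be a user-supplied R function reached through a callback, which is where the cost of crossing the bridge enters.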
The four implementations are included in the supplementary material. (The code is also available in the package: system.file("doc/bootstrap-coverage.R", package = "rscala")). The R implementation is the shortest and the rscala implementations are somewhat more concise than that of Rcpp. The Rcpp implementation is written in a C style. All but one implementation use the parallel package to harness all available cores; the first rscala implementation uses Scala's 'Future' class for parallelism and, when sampling the data, a single instance of 'RClient' is used by multiple JVM threads to call back to R. On machines with many cores, having each thread wait its turn to access the one R instance will likely slow down the execution. In the second rscala implementation, each CPU core has a separate R instance with a corresponding 'RClient' object.
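As a hedged illustration of the 'Future'-based style of parallelism mentioned above, the sketch below runs independent replicates on Scala's global thread pool and gathers the results in order. It is not the implementation from the supplementary material; the names FutureDemo and parallelReplicates are assumptions, and the task here is a trivial placeholder.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object FutureDemo {
  // Start one Future per replicate index and wait for all of them to finish.
  def parallelReplicates[A](n: Int)(task: Int => A): Seq[A] = {
    val futures = (1 to n).map(i => Future(task(i)))
    Await.result(Future.sequence(futures), Duration.Inf)
  }

  def main(args: Array[String]): Unit = {
    // Placeholder task; a real replicate would compute one bootstrap interval.
    println(parallelReplicates(4)(i => i * i))
  }
}
```

If every task had to call back through one shared, single-threaded interpreter, the threads would queue on that resource, which is the slowdown anticipated in the text for machines with many cores.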
We tested on machines running Ubuntu 16.04 with 4 and 56 cores, Mac High Sierra with 8 cores, and Windows 10 with 8 cores. R was installed from CRAN binaries for all machines except the 4-core Ubuntu machine, where R was compiled from source. All machines used R 3.5.1, Scala 2.12, Java 8, Rcpp 0.12.19, and rscala 3.2.1.
Elapsed times (in seconds) for 10 replications of the simulation study are found in Table 3.  For the sake of time, the pure R implementation was only run on the 56-core Ubuntu machine. The pure R implementation ran about 23 times slower than the fastest implementation. The Rcpp implementation and the two rscala implementations were similar in terms of speed on the 56-core Ubuntu machine. The second rscala implementation (which uses the parallel package) was the fastest overall on the 56-core machine, and the first rscala implementation shows a performance penalty from sharing a single instance of 'RClient' when many cores are present. On the machines with fewer cores, the first rscala implementation was the fastest and both rscala implementations were somewhat faster than the Rcpp implementation.

Conclusion
This paper introduced the rscala software to bridge R and Scala, which allows a user to leverage their skills in both languages and to exploit strengths in each language. For example, R users can implement computationally intensive algorithms in Scala, write R packages based on Scala, and access Scala libraries from R. Scala programmers can take advantage of R's tools for data analysis and graphics from within a Scala application.
We are exploring possible improvements for our software. First, we are exploring a mechanism to allow the R user to interrupt Scala computations without destroying the TCP/IP bridge. Second, we are exploring support for transcompiling a subset of R syntax into Scala code to avoid the overhead of callbacks from Scala to R. Experimental support has already been implemented. For example, s ^ function(x = stD1) sd(x) / mean(x) returns a Scala reference of type Array[Double] => Double which computes the coefficient of variation without calling back to R.
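To give a sense of what a function of type Array[Double] => Double for the coefficient of variation looks like, here is a hand-written Scala equivalent. It is not the output of the experimental transcompiler. The helper names are hypothetical, and the standard deviation uses the n - 1 denominator to match R's sd.

```scala
object CvDemo {
  def mean(x: Array[Double]): Double = x.sum / x.length

  // Sample standard deviation with the n - 1 denominator, as in R's sd.
  def sd(x: Array[Double]): Double = {
    val m = mean(x)
    math.sqrt(x.map(v => (v - m) * (v - m)).sum / (x.length - 1))
  }

  // A function value of type Array[Double] => Double.
  val cv: Array[Double] => Double = x => sd(x) / mean(x)

  def main(args: Array[String]): Unit = {
    println(cv(Array(1.0, 2.0, 3.0)))
  }
}
```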