%PROC_R : A SAS Macro that Enables Native R Programming in the Base SAS Environment

In this paper, we describe %PROC_R , a SAS macro that enables native R language to be embedded in and executed along with a SAS program in the base SAS environment under Windows OS. This macro executes a user-deﬁned R code in batch mode by calling the unnamed pipe method within base SAS . The R textual and graphical output can be routed to the SAS output window and result viewer, respectively. Also, this macro automatically converts data between SAS datasets and R data frames such that the data and results from each statistical environment can be utilized by the other environment. The objective of this work is to leverage the strength of the R programming language within the SAS environment in a systematic manner. Moreover, this macro helps statistical programmers to learn a new statistical language while staying in a familiar environment.


Introduction
As the number of data analysis languages and platforms rapidly grows, it is becoming increasingly desirable for programmers to call a language from a different environment. Numerous efforts in this direction have empowered statistical analysts with the ability to use their favorite language within their operating environment. The R project, a leading open-source statistical computing language and environment (R Development Core Team 2011b), has packages to integrate native C++ (Eddelbuettel and Francois 2011) and SQL code (Grothendieck 2011). Python programmers have developed various methods that allow rich statistical collections of the R project to be accessible within the Python environment (Xia, McClelland, and Wang 2010). In the proprietary arena, SAS is a widely used system for data management and statistical analysis. PROC SQL is SAS's product to enable SQL programmers to efficiently perform data manipulation within the SAS platform without much base SAS knowledge (SAS Institute Inc. 2004). In light of increasing momentum of R in statistical computing, the SAS institute recently released the interface between R and SAS/IML (Wicklin 2009), the latter being a specialized SAS module oriented towards matrix computation. However, base SAS, which is the foundation of all SAS modules and has a much bigger user community than SAS/IML, lacks a well documented and user-friendly integration for R programming. A programming interface of base SAS and R is highly desirable because base SAS and SAS/STAT users frequently need to access the novel statistical/visualization tools that have been developed in the open source R project and not yet been implemented in SAS. In this paper, we present a SAS programming interface in which R code can be written and executed within a SAS session and the resulting textual and graphical output of R is accessible from a SAS terminal. This SAS macro constructs a functional R script based upon a built-in R function and a customer-defined R script. Upon submission, the R code is executed via a batch mode session by a SAS pipe. From the pipe, textual and graphical R output can be rendered to the proper SAS terminal. We name this interface %PROC_R in the same spirit as the commercialized SAS products such as PROC SQL.

Interface
sqldf (Grothendieck 2011) is an R package from which SQLite syntax can be submitted in R code to manipulate R data frame. The following code demonstrates the select clause of SQL is used in R environment to compute summary statistics of R data frame testdata. sqldf(" select avg ( value ) mean from testdata group by treatment") Similarly, the syntax of PROC SQL from base SAS is as follows: proc sql; Select mean ( value ) as mean from testdata group by treatment; quit; We design our SAS/R interface in the similar fashion to sqldf and PROC SQL. Following a regular SAS program, a R script can be written and embedded in the following approach: %include "C:\Proc_R.sas"; %Proc_R (SAS2R =, R2SAS =); Cards4; ****************************** ***Please Enter R Code Here*** ****************************** ;;;; %Quit; Following a chunk of regular SAS code, %include statement is invoked to execute the source code of %Proc_R that prepares the proper environment in the current SAS session for R code construction and execution. Note that the macro %PROC_R has two macro variables SAS2R and R2SAS as input parameters that define the data transfer mechanism between SAS and R. SAS2R specifies the names of SAS datasets to be used in R session after being converted to R data frame. R2SAS defines the R data frames resulted from the R session that needs to be converted to SAS dataset because they may be needed in the remaining SAS session for additional analysis. Next we use four examples to demonstrate how the workflow of SAS and R could be seamlessly glued together by this interface.

Examples
3.1. Compute eigenvalue and eigenvectors for a numeric SAS dataset R is a matrix oriented language while the base SAS is designed more toward the handling of tabular dataset. In this example, we first use SAS to create a SAS dataset "test" which mimics a 10 X 10 numeric matrix. In the %PROC_R macro argument, we specify this test SAS dataset as the input data for R session. This SAS dataset is converted to an R data frame on the backend of %PROC_R. Since data frame is the default R object to which SAS dataset is converted, one needs to explicitly convert the data frame to a matrix object in the user-defined R script. Finally eigen() function of R is invoked to compute the eigenvalue and vectors. The R log and result page are displayed on the SAS output window (Figure 1).

Creation of R animation in base SAS environment
R wiki page provides a simple piece of code using the caTools package (Tuszynski 2011) that produces a flashy animation for Mandelbrot set (Tuszynski 2010). This code can be executed from %PROC_R interface without any modification and the resulting gif animation can be viewed from SAS result viewer ( Figure 3).

Programmatically integrate SAS and R output
In this example, I demonstrate that an R graphs is first produced by %PROC_R interface and then seamlessly written into an HTML report created by SAS ODS. The following ODS statement produce a sample HTML report: ODS HTML FILE="c:\example.html" STYLE=minimal gpath= "c:\" GTITLE GFOOTNOTE; proc print data=sashelp.company(obs=10); run; ods _all_ close; What follows is the R code that we copied from Ruckstuhl and Locher (2011), using packages IDPmisc  and SwissAir (Locher 2011), and submitted in the %PROC_R interface.

Implementation
The %PROC_R program accomplishes R analysis in the base SAS via six steps: Write a draft R program based on user supplied R code and system built-in R function.
The draft R script is interrogated and modified by SAS macro so that it can be executed in SAS environment The data exchange mechanism between SAS and R is implemented in both SAS code and R code.
The completed R script is submitted to a backend R session via batch mode by base SAS pipes.
The automatically produced text file containing R statistical output and log is transformed to SAS data and printed on the the base SAS output window.
The R graphics is either explicitly or implicitly saved to a default or customer specified directory and routed to SAS graph window or result viewer via HTML format.

R code construction
This SAS program enables users to enter R script via infile and cards4 statement in SAS data step. %PROC_R is a very simple macro that only specifies that R code is saved at the default SAS workspace and provide the first several lines of data step. Cards4 (or datalines4) statement is used here because it allows semicolon as the part of input data when semicolon is recognized as the end of statement by both SAS and R. The four consecutive semicolons at the end mark the end of user define R scripts. While this manuscript is under development, we noticed that this technique has also been discussed in some techinical blogs (Xie 2011). The run statement of this data step is provided in the %Quit macro which is the main body of this interface and will be described in more details below.

R code refinement
In the %Quit macro, SAS program will inspect the resulting R program to determine what change needs to be made in order for the R program to be submitted from SAS environment.
First of all, the users are encouraged to define their own user space to store the log, graphics and intermediate datasets by setwd() function. If setwd() is not detected in user defined R code or is commented out, SAS program will add one line of setwd() function in the R code that set the R workspace to the current SAS temporary directory. Secondarily, if users do not explicitly write the graphics as a permanent file and instead intend to view them on fly, SAS program would add the lines to start R graphics device, save the figure and close the device. Later I will show how the saved R graphics can be viewed from SAS end.

Data exchange between R and SAS using CSV as intermediate format
We made conscious decision to make the macro arguments as simple as possible in order to allow less experienced SAS user to utilize this interface when they need to operate in SAS environment. The only arguments for this macro are to define the SAS and R datasets that are to be utilized by their counterpart. One or multiple SAS dataset names can be assigned to macro variable SAS2R. The presence of these SAS dataset names triggers a compilation of SAS do loop to sequentially write the listed SAS datasets to CSV files saved under user specified folder or SAS temporary directory. Then the proper R function like read.csv will be added to the current R codes to read these tab delimited files into R data frames with the same names. Therefore, users can directly call the corresponding R data frames using the SAS names defined in SAS2R argument without any change. Finally, if the value of macro variable R2SAS is not null, then its value, a list of names for R data frames, will be written into csv files by write.csv function which will be appended to the end of R codes. These csv files from R data frames are again read into SAS so that they can be used in the remaining the SAS program. Note that the naming convention for SAS datasets and variables need to be enforced throughout this process because not all R data or variable names are legitimate in SAS.

R submission in batch mode via pipe
Unnames pipe is a powerful SAS tool that execute a foreign program outside of SAS. Here we used it to execute the batch mode R scripts in parallel to the current SAS session (Wei 2009; R Development Core Team 2011a). As shown below, this pipe defines both the source R codes and export output/log file in ASCII format. NOXWAIT options force SAS session to suspend until the completion of the piped R process. I create a SAS macro variable &rpath to store the full path for R executable files. Currently, the paths for various R versions from 2.11.0 to 2.14.0 are provided so that users could easily adopt this macro to their own R version.

Display of R log and graphics on SAS terminal
One primary motivation for SAS user to call R function from within SAS environment is probably the state-of-art graphical capability of R. Therefore, it is priority of this interface to have a mechanism that channels the saved R graphics back to SAS session so that users can conveniently view the graphical output in real time. This can be achieved in two ways. First is to use the GSLIDE procedure to display the saved R figure on the SAS graph editor as shown in the following code: goptions reset = all device = win iback = "&rdirec\&fgname" imagestyle = fit; proc gslide; run; quit; (Holland 2008) proposed another rather creative solution that is to create an HTML file with the link to the saved R graphics. The following SAS code creates an empty HTML, adds a link to the saved R graphics and render the HTML file to SAS result viewer. This means that one does not even need SAS/GRAPH installed to produce high quality graphics in SAS environment. We adopt this approach in our implementation. However, I do notice rare occasions where the graphics is not properly displayed in HTML page due to the excessively long string of file path and name particularly when the R results are stored in SAS temporary workspace. The remedy of this problem is to use setwd function in R script to save R result to a user defined destination with short path or simply SAS graph viewer instead of HTML reviewer to display R graphics.

File management of SAS/R working space
The source R script, R log message and R graphical output are all saved under SAS temporary folder unless users defined otherwise. It is preferable to assign session specific names to these files so that the code and output from previous session would not be mistakenly called by the current session which may abort and produce no result due to syntax error. This can be achieved by appending a 12 digit number representing SAS system time to the end of file name. In one SAS session the users may explicitly write graphics to a permanent file or let SAS program do that on the backend. Thus the name of graphics could be either user defined or system assigned. In order to intelligently determine what the proper file is to visualize from SAS end, we again use SAS pipe function to scan the working space of the current SAS/R session. The name of files in the directory is listed in a SAS dataset according to the descending order of the creation time, as demonstrated by the following code (Wei 2009).

Discussion
Nowadays data analyst very often needs access to various tools/languages to accomplish complex task. Perl and Python is widely used to pre-process dataset prior to the advanced statistical analysis and visualization in R. Many SAS users are also very keen to learn R language in order to test novel statistical methodology developed in open source R that is outside the horizon of SAS/STAT. It is not rare for a multilingual data analyst to perform some analysis in one environment, save the intermediate datasets and continue the rest of analysis in a different environment. However, it would be difficult to keep track of analysis workflow if multiple computing environments are involved. Switching platform also adds to difficulties for others to repeat one's research. In this work, we developed the SAS macro that allows R analysis to proceed along with SAS program without moving out of SAS environment. We also show the way to integrate R analysis result into a SAS statistical report programmatically. There are several issues worth of note. First of all, it is not possible to embed this interface into other SAS macro because cards or datalines statement for R code input is not allowed within a SAS macro. We also believe that it is a good practice to keep SAS and R code block clearly separate even when they are in one program. Secondly, the whole chunk of R scripts written by users will be executed from start to end in one R batch mode. It is not feasible to test the R code piece by piece in this interface because the intermediate results would be lost at the end of each batch mode. Therefore this interface is suitable to test or run a small task or execute a large project that has been well developed in a more suitable R interactive development environment.