WHY WRITE STATISTICAL SOFTWARE? THE CASE OF ROBUST STATISTICAL METHODS

ABSTRACT. Robust statistical methods are designed to work well when classical assumptions, typically normality and/or the lack of outliers, are violated. Almost everyone agrees on the value of robust statistical procedures. Nonetheless, after more than 40 years and thousands of papers, few robust methods were available in standard statistical software packages until very recently. This paper argues that one of the primary reasons for the lack of robust statistical methods in standard statistical software packages is the fact that few developers of statistical methods are willing to write user-friendly and readable software for the methods they develop, regardless of the usefulness of the method. Recent changes in academic statistics make it highly desirable for all developers of statistical methods to provide usable code for their statistical methods.

The author thanks Jan de Leeuw and Nicholas Cox for advice that significantly improved the paper.

Developers of statistical methods have numerous excuses for not writing more user-friendly code. An incomplete list includes:
• Code isn't publishable.
• Software isn't fundable and doesn't help your career.
• Writing code is no fun.
• You don't want to embarrass yourself with your poor programming skills.
• No one will use it.
• It takes extra time.
The remainder of this paper addresses these excuses, discusses the case of robust regression and concludes with further comments.

CODE ISN'T PUBLISHABLE
That is no longer true. In particular, as this article shows, statistical software can be published in the electronic Journal of Statistical Software (JSS), which can be found at www.jstatsoft.org. Abstracts are published in the Journal of Computational and Graphical Statistics (JCGS). Quoting from the JSS website, JSS will publish:
(1) Manuals, user's guides, and other forms of description of statistical software, together with the actual software in human-readable form.
(3) Special issues on topics in statistical computing.
(4) A yearly special issue documenting progress of major statistical software projects.
(5) Reviews and comparisons of statistical software.
(6) Reviews of books using statistical software. (Recently added.)

The typical JSS paper will have a section explaining the statistical technique, a section explaining the code, a section with the actual code, and a section with examples. All sections will be made browsable as well as downloadable. The papers and code should be accessible to a broad community of practitioners, teachers, and researchers in the field of statistics and beyond.

SOFTWARE ISN'T FUNDABLE AND DOESN'T HELP MY CAREER
In reality, it is theoretical research in statistics that rarely gets funded. For example, in the United States, the National Science Foundation (NSF) wants work that is "useful and innovative," to quote an NSF Statistics and Probability Program Director. "Useful" can be taken in several ways, but clearly user-friendly code helps argue for usefulness. If no one can use the method, then it could be argued that the method isn't useful. A great way to show that statistical methods are useful is to have collaborators who can apply your methods.
One advantage of collaborators is that they have "real" problems, which at least doubles your funding options. They also at least double your publication options. These collaborators will also support you and your department. "Innovative" means that NSF, and presumably other funding agencies, are not overly interested in funding incremental steps forward in any particular area. They would much rather see innovative approaches to new problems. Of course, innovative does not mean crazy. One must still convince the funding agency that the project is doable. User-friendly code is one excellent way to do that.
Historically, many junior academics thought that the best way to get funding was to collaborate with a funded senior colleague. Although it occasionally works, there are serious disadvantages of this strategy. Many funding agencies give priority to junior, not senior, faculty. Having a senior colleague as a collaborator may actually decrease the likelihood of funding. More importantly, when junior academics are evaluated, there is a tendency to think that the junior faculty member got the grant because he or she collaborated with a senior colleague.
Other funding agencies want medical applications or military applications.
Often junior faculty are discouraged from being Co-Principal Investigators (Co-PIs) on such grants because "it's just consulting" and the academic should be "writing papers instead." If this consulting takes away from research, this point is well taken. On the other hand, this funding can be used to reduce teaching loads, thus providing more time for research. Consulting papers also provide breadth to one's resume. Being a Co-PI also often leads to interesting real research problems, for which the statistician can be the PI.

WRITING CODE IS NO FUN
Writing code may not be fun, but seeing researchers use your methods is lots of fun! If your method is useful, researchers will use it if they know about it and have the tools. Statistical software applications must be published in a variety of journals in order to have an impact.

I DON'T WANT TO EMBARRASS MYSELF WITH MY POOR PROGRAMMING SKILLS
What counts is code that works, not how pretty it is or even how fast it is.
Obviously, making code readable prior to publication is important, but it is far more important to get useful code out so it can be used. In fact, it seems far more valuable to spend time on robustifying the code so it crashes less frequently than on making the code pretty.
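
As one illustration of what "robustifying" code can mean in practice, the following minimal sketch checks the inputs to a regression routine and fails with an informative message rather than crashing somewhere deep inside the computation. It is written in Python purely for illustration; the function name check_regression_inputs and the specific checks are hypothetical choices, not taken from any particular package.

```python
import math


def check_regression_inputs(x, y):
    """Validate paired data before fitting (hypothetical helper).

    Returns a list of (x, y) float pairs, or raises ValueError with a
    message that tells the user what is wrong with the data.
    """
    if len(x) != len(y):
        raise ValueError(
            f"x and y must have the same length (got {len(x)} and {len(y)})"
        )
    if len(x) < 3:
        raise ValueError("at least 3 observations are required")

    pairs = []
    for i, (xi, yi) in enumerate(zip(x, y)):
        try:
            fx, fy = float(xi), float(yi)
        except (TypeError, ValueError):
            raise ValueError(
                f"observation {i} is not numeric: ({xi!r}, {yi!r})"
            ) from None
        # Reject NaN and infinite values explicitly instead of letting
        # them propagate silently into the estimates.
        if not (math.isfinite(fx) and math.isfinite(fy)):
            raise ValueError(
                f"observation {i} contains a missing or infinite value"
            )
        pairs.append((fx, fy))

    # A slope cannot be estimated if every x value is identical.
    if len({fx for fx, _ in pairs}) < 2:
        raise ValueError("x must take at least two distinct values")
    return pairs
```

A fitting function that calls such a check first will stop with a readable error on mismatched lengths, missing values, or constant predictors, which are among the most common reasons that code written for the author's own simulations fails on a user's data.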

NO ONE WILL USE MY SOFTWARE
No one has a chance to use your method if you don't provide code, which is worse than no one using the code at all. If you write code in a readable and user-friendly way, and the statistical method is itself useful, my experience is that several groups will be interested. Researchers interested in comparing their new method with yours will use it (and reference it). Anyone reading about your method in a methods journal will be thrilled to have a chance to try the method on their data. They will reference your papers and code for others to use. User-friendly code that stands the test of time may then be incorporated into standard statistical packages.

WRITING CODE TAKES EXTRA TIME
It is true that writing user-friendly, readable code takes more time than writing code that can only be used by its author. Since code must be functional in order for authors to run simulations and examples, it is frequently only a bit more effort to make the code user-friendly. The availability of R to everyone makes it a good platform, other factors being equal.

EXAMPLE: THE CASE OF ROBUST REGRESSION
Multiple linear regression is one of the most used of all statistical methods. Suppose you have an idea for a new robust regression method. How should you proceed? First, you have to find out if the idea is right.
There are two choices: start with the theory or start with the simulations. If you start with the theory and the idea is wrong, you are going to waste a lot of time trying to prove something that isn't true. It's better to start with simulations in SAS, Matlab, S-PLUS, R, or other vectorizable software. If the idea works, work on the grant application and the proof. The simulations convince the funding agency that the idea is good. The theory will follow with appropriate assumptions. Once the theory and simulations are done, apply the method to your collaborators' data and submit the paper to a statistical methods journal.
While that paper is under review, work on user-friendly software for the method. Submit the code and documentation to the Journal of Statistical Software. In the meantime, your collaborator is writing a methods paper for their field showing off this new method. One good idea, three publications and a grant. How could life be any better?
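
The advice to start with simulations can be sketched as follows. This is only an illustration under stated assumptions: it is written in Python rather than in one of the packages named above, it uses simple (one-predictor) regression, and it takes the Theil-Sen estimator (the median of all pairwise slopes) as a stand-in for "the new robust method." The contamination scheme, sample sizes, and function names are all hypothetical choices made for the example.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope for simple linear regression."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def theil_sen_slope(x, y):
    """Theil-Sen slope: the median of all pairwise slopes."""
    n = len(x)
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(n)
        for j in range(i + 1, n)
        if x[j] != x[i]
    ]
    return statistics.median(slopes)


def simulate(n_reps=200, n=30, true_slope=2.0, outlier_frac=0.1, seed=1):
    """Mean absolute slope error of each estimator on contaminated data."""
    rng = random.Random(seed)
    n_out = int(outlier_frac * n)
    errors = {"ols": [], "theil_sen": []}
    for _ in range(n_reps):
        x = sorted(rng.uniform(0.0, 10.0) for _ in range(n))
        y = [true_slope * xi + rng.gauss(0.0, 1.0) for xi in x]
        # Assumed contamination: shift the responses at the largest
        # x values far below the true line (high-leverage outliers).
        for i in range(n - n_out, n):
            y[i] -= 30.0
        errors["ols"].append(abs(ols_slope(x, y) - true_slope))
        errors["theil_sen"].append(abs(theil_sen_slope(x, y) - true_slope))
    return {name: statistics.fmean(vals) for name, vals in errors.items()}


if __name__ == "__main__":
    print(simulate())
```

If the robust estimator's average error is not clearly smaller than that of least squares under contamination, the idea probably needs rethinking before any time is spent on proofs. Because the simulation already requires working functions, turning them into code that others can use is mostly a matter of adding documentation and input checks.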

CONCLUSION
Reviewing the excuses for not writing usable code found in the Introduction to this paper, it is clear that most of them are baseless:
• Code is publishable.
• Software is fundable and helps your career.
• Writing code is fun, especially when others use it.
• Your programming skills may be better than you think.
• Everyone will use useful code.
• It takes little extra time.

DEPARTMENT OF STATISTICS, UNIVERSITY OF KENTUCKY
E-mail address: astro@ms.uky.edu