Introduction to the Special Volume on "Ecology and Ecological Modeling in R"

The third special volume in the "Foometrics in R" series of the Journal of Statistical Software collects a number of contributions describing statistical methodology and corresponding implementations related to ecology and ecological modelling. The scope of the papers ranges from theoretical ecology and ecological modelling to statistical methodology relevant for data analyses in ecological applications.


Introduction
As one of the special volumes of the Journal of Statistical Software compiled under the heading "Foometrics in R", this special volume collects a number of contributions on statistical methodology and corresponding implementations related to ecology and ecological modelling. It is the third special volume in this series, following its successful predecessors on "Spectroscopy and Chemometrics in R" and "Psychometrics in R", and the first to be filed under Foometrics without an explicit "metrics" in its title (although the term ecolometrics is used from time to time to describe the field covered by this special volume). It nevertheless fits well in the range of foometrics special volumes, which are intended to summarise the data analysis techniques and mathematical models available in R (R Development Core Team 2007) for different types of sciences "foo" (cf. Eastlake 3rd, Manros, and Raymond 2001, for an etymology of "foo"). As reflected by the background of the two co-editors, such a summary requires a combination of knowledge of statistical methodology, R, and ecology, and one intention of this special volume is therefore also to promote communication between statisticians and ecologists.
The scope of this special volume has been chosen to be as broad as possible, ranging from theoretical ecology and ecological modelling to statistical computation and data analysis in ecology. Both ends of this range are well represented in the selection of published papers. The concluding section gives some references to additional work that may be of interest to our readership.
What links the contributions to this special volume from the diverse fields of ecology is their focus on R as a software framework that enables the rapid yet reliable development and distribution of software for ecologists. As a positive side effect, R can also serve as an interface that bridges the gap between different ecological schools. Two very nice examples in this special volume are the contributions by Dray and Dufour (2007) and Calenge (2007), which make methods developed in the French school of analyse des données (a special branch of multivariate statistics) available to the non-French-speaking ecological community. In particular, the ade4 package also forms a connection to the previous special volume of JSS on Psychometrics, where it is mentioned as one possibility for performing correspondence analyses (de Leeuw and Mair 2007).

Outline of the contributions
The volume starts with an application of R to analyse subfossil remains of organisms preserved in aquatic sediments. The package analogue (Simpson 2007) contains functions to perform modern analogue technique (MAT) transfer functions, which can be used in palaeoecology to reconstruct past changes in the environment, such as lake-water pH or climate change. In addition to this, analogue matching (AM) is concerned with identifying modern sites that are floristically and faunistically similar to fossil samples and can therefore be used to define reference conditions in conservation biology. Taking a related direction, the second paper by Yuan (2007) describes the package bio.infer that implements functionality for inferring environmental conditions from assemblage composition using maximum likelihood predictions.
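The core idea behind the modern analogue technique mentioned above can be illustrated with a minimal sketch in plain Python. It is not the interface of the analogue package: the function names, the choice of the squared chord distance, and the toy training set of four lakes are assumptions made purely for illustration. The environmental value of a fossil sample is estimated as the mean of the values observed at its k most similar modern sites.

```python
import math

def squared_chord(p, q):
    """Squared chord distance between two vectors of taxon proportions."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

def mat_predict(fossil, modern_assemblages, modern_env, k=3):
    """Estimate the environment of a fossil sample as the mean over its
    k closest modern analogues (illustrative helper, not a package API)."""
    ranked = sorted(
        range(len(modern_assemblages)),
        key=lambda i: squared_chord(fossil, modern_assemblages[i]),
    )
    nearest = ranked[:k]
    return sum(modern_env[i] for i in nearest) / len(nearest)

# Hypothetical training set: proportions of three taxa in four modern lakes
# and the lake-water pH measured at each of them.
modern = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
]
ph = [5.0, 5.4, 7.2, 7.0]

# A fossil assemblage that resembles the first two (acidic) lakes.
fossil = [0.65, 0.25, 0.10]
estimate = mat_predict(fossil, modern, ph, k=2)
print(round(estimate, 2))
```

In practice the number of analogues k and the dissimilarity coefficient would be chosen by cross-validation on the modern training set rather than fixed in advance as here.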
A group of papers centred on the French school of analyse des données follows. This school embraces various techniques of multivariate statistics, such as principal components or correspondence analysis. Although the corresponding statistical theory has been available for quite a while, it has been largely overlooked by the (non-French) ecological community due to the lack of non-French references. Dray and Dufour (2007) provide both a review of the methodology, making it available to a larger community, and an introduction to the ade4 package. On top of ade4, a graphical user interface is available in the package ade4TkGUI, making it even easier for first-time users and R novices to access ade4 features (Thioulouse and Dray 2007). Finally, as a third ade4-related contribution, Calenge (2007) presents the package adehabitat as a further add-on. Besides demonstrating the exploration of habitat selection data using ade4 functionality, the paper discusses further adehabitat features such as home range estimation and visualisation tools.
The contribution by Goslee and Urban (2007) introduces the ecodist package for investigating ecological data based on dissimilarities. Several variants of the Mantel test for inspecting the relationship between dissimilarity measures are introduced, in particular partial tests and tests that allow for nonlinear spatial structures.
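To make the idea of the simple Mantel test concrete, the following minimal sketch in plain Python correlates the upper triangles of two dissimilarity matrices and assesses significance by jointly permuting the rows and columns of one of them. It is not the ecodist interface: all function names and the toy data (six sites along a transect with one environmental variable) are assumptions for illustration, and the partial and nonlinear variants discussed in the paper are not covered.

```python
import math
import random

def upper_triangle(d):
    """Entries above the diagonal of a square matrix, as a flat list."""
    n = len(d)
    return [d[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(d1, d2, n_perm=999, seed=0):
    """Simple one-sided Mantel test (illustrative helper).

    Returns the observed correlation and a permutation p-value for the
    alternative that the two dissimilarities are positively related."""
    rng = random.Random(seed)
    n = len(d1)
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    hits = 1  # the observed arrangement counts as one permutation
    idx = list(range(n))
    for _ in range(n_perm):
        rng.shuffle(idx)
        # Permute rows and columns of d2 with the same site ordering.
        permuted = [[d2[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= r_obs:
            hits += 1
    return r_obs, hits / (n_perm + 1)

# Hypothetical data: site positions and an environmental variable.
position = [0, 1, 2, 3, 4, 5]
env = [0.1, 0.9, 2.2, 2.8, 4.1, 5.3]
d_space = [[abs(a - b) for b in position] for a in position]
d_env = [[abs(a - b) for b in env] for a in env]

r, p = mantel(d_space, d_env)
print(round(r, 3), p)
```

Permuting whole sites rather than individual matrix entries is what distinguishes the Mantel test from an ordinary correlation test: the entries of a dissimilarity matrix are not independent, so only relabellings of the sites yield a valid reference distribution.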
Jachner, van den Boogaart, and Petzoldt (2007) focus on a theme of general relevance when comparing results from ecological simulation models with real data: the assessment of similarity between time series. The paper discusses several deviance measures that allow one to ignore location, scale or distance of the measured values as well as exact time and speed or inequality constraints and time continuity. Jachner et al. (2007) provide a systematic overview of dissimilarity measures, and the corresponding implementations are made available in the package qualV.
Moving from statistically oriented papers to papers with a focus on ecological modelling, Petzoldt and Rinke (2007) describe an object-oriented approach that allows one to implement and simulate ecological models in a fairly general framework. The relevant infrastructure is included in the package simecol and demonstrated through a series of examples ranging from cellular automata through differential equation models to individual-based simulations of age-structured populations.
The in-depth analysis of age-structured populations and population models is the focus of the following two contributions. In the first of them, Jones (2007) presents his package demogR, which includes tools for the construction and analysis of matrix population models. In addition to the standard analyses commonly used in evolutionary demography and conservation biology, demogR contains a variety of tools from classical demography. These include the construction of period life tables and the generation of model mortality and fertility schedules for human populations. The tools in demogR are generally applicable to age-structured populations but are particularly useful for analyzing problems in human ecology.
The related package popbio from Stubben and Milligan (2007) supports both the construction and the analysis of projection matrix models from a demography study of marked individuals classified by age or stage. The package covers methods described in Matrix Population Models by Caswell (2001) and Quantitative Conservation Biology by Morris and Doak (2002). It also includes methods to estimate vital rates and construct projection matrix models from census data typically collected in plant demography studies.
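The basic analysis of an age-structured matrix population model, as addressed by both packages, can be sketched in a few lines of plain Python. This is not the interface of demogR or popbio: the helper names and the three-age-class Leslie matrix are hypothetical. The sketch projects the population repeatedly and reads off the asymptotic growth rate (the dominant eigenvalue) and the stable age distribution by power iteration.

```python
def project(matrix, n):
    """One time step: multiply the projection matrix by the age-class vector."""
    return [sum(a * x for a, x in zip(row, n)) for row in matrix]

def growth_rate(matrix, n0, steps=200):
    """Approximate the dominant eigenvalue and the stable age distribution
    by power iteration (illustrative helper, assumes a primitive matrix)."""
    total = sum(n0)
    n = [x / total for x in n0]
    lam = 0.0
    for _ in range(steps):
        nxt = project(matrix, n)
        lam = sum(nxt)              # n sums to one, so this is the growth factor
        n = [x / lam for x in nxt]  # renormalise to proportions
    return lam, n

# Hypothetical Leslie matrix: the first row holds age-specific fertilities,
# the sub-diagonal holds survival probabilities between age classes.
leslie = [
    [0.0, 1.0, 2.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0],
]

lam, stable = growth_rate(leslie, [10.0, 10.0, 10.0])
print(round(lam, 6), [round(x, 4) for x in stable])
```

For this particular matrix the characteristic polynomial has the root 1, so the population is asymptotically stationary; a growth rate above or below one would indicate long-term growth or decline.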
The volume is completed by the very first paper submitted to it, which describes the untb package of Hankin (2007a). The package can be used to analyze ecosystem data in the context of Hubbell's striking "neutral theory of biodiversity" (Hubbell 2001), which states that observed population dynamics may be explained on the assumption of per capita equivalence amongst individuals. The package provides a comprehensive set of R routines for numerical simulation of neutral ecosystem dynamics, analysis of field data, and visualization of datasets. Moreover, it is worth noting that the untb package circumvents numerical problems with large numbers by using either the logarithmic representation employed by the Brobdingnag package provided by the same author (Hankin 2007b) or, alternatively, the PARI/GP computer algebra system (Batut, Belabas, Bernardi, Cohen, and Olivier 2006).
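What per capita equivalence means in a simulation can be shown with a minimal zero-sum sketch in plain Python. It does not reproduce the untb routines: the function names, parameter values, and the particular replacement rule (the replacing parent is drawn from the whole community) are assumptions for illustration. At every step one randomly chosen individual dies and is replaced either by a new species, with a small speciation probability nu, or by the offspring of a randomly chosen individual, irrespective of its species.

```python
import random

def simulate_neutral(size=50, steps=2000, nu=0.01, seed=1):
    """Zero-sum neutral drift with point speciation (illustrative helper).

    The community is a list of species labels of fixed length."""
    rng = random.Random(seed)
    community = [0] * size  # start from a monoculture
    next_species = 1
    for _ in range(steps):
        dead = rng.randrange(size)
        if rng.random() < nu:
            # Speciation: the vacated site is taken by a brand-new species.
            community[dead] = next_species
            next_species += 1
        else:
            # Every individual is equally likely to be the parent.
            community[dead] = community[rng.randrange(size)]
    return community

def abundances(community):
    """Species abundances, sorted from the most to the least common."""
    counts = {}
    for species in community:
        counts[species] = counts.get(species, 0) + 1
    return sorted(counts.values(), reverse=True)

community = simulate_neutral()
counts = abundances(community)
print(len(counts), counts)
```

Because species identity never enters the replacement rule, any differences in abundance arise from drift and speciation alone, which is the null expectation against which field data can be compared.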

Conclusion and outlook
The range of contributions compiled in this special volume gives an impression of the vast potential of the R framework and, even more importantly, of the R community and its open source platform in bringing together high-level data analysis and scientific knowledge across the borders of disciplines. We agree that "many biologists do not want to spend the rest of their lives as programmers", but this is not necessarily required to benefit from the increasing availability of relevant R packages. Typically it is sufficient to be an attentive reader and to have a thorough understanding of the methodological background to use R packages on a specific problem.
As already mentioned in the introduction, this special volume is also intended to promote interaction between ecologists and statisticians. This is particularly important since statistical methods are evolving continuously, and according to Hobbs, Twombly, and Schimel (2006) "reliance on traditional methods presents a major barrier to understanding fundamental ecological patterns and processes". For example, recent developments supplement traditional H0-based tests by likelihood-based model-selection techniques (Hobbs and Hilborn 2006; Johnson and Omland 2004) and, as demonstrated in this volume, special procedures are tailored explicitly to address ecological questions.
Of course, this special volume is not intended to be the exclusive source for employing R in ecology, and there are already numerous books and online resources available demonstrating both the power and the practical use of R for ecological applications. There are forthcoming books (e.g., Bolker 2007), new manuals and online tutorials, e.g., about the vegan package for multivariate statistics (Oksanen 2007) or the use of R in plant pathology (Sparks, Esker, and Garrett 2007), and new software packages for microscopic image analysis built on top of R (Grosjean and Denis 2007), to mention only a few. Almost surely, the future will see several additional exciting developments and possibilities for utilising R in ecology.
Much thought has already been given to the question of why R is so successful (Narasimhan 2004; Tierney 2005; Mullen and van Stokkum 2007), and de Leeuw (2005) argued that the reasons are not solely technical. But why is the user base so large and why do we have more than 1140 contributed packages today? We are certain that one of the main non-technical reasons is the culture in which scientific open source software is developed: an active community interested primarily in research results, publication credits and exchange of ideas. Everything is transparent and everything can be reproduced without unreasonable financial and technical demands. Documentation, reproducibility and hands-on experience are key to understanding the increasingly complex ecological data analyses ourselves. Rigorous testing, user feedback and continuous improvements are required to establish credibility and, finally, transparency and reliability should help to convince environmental decision makers.