Rcapture : Loglinear Models for Capture-Recapture in R

This article introduces Rcapture , an R package for capture-recapture experiments. The data for analysis consists of the frequencies of the observable capture histories over the t capture occasions of the experiment. A capture history is a vector of zeros and ones where one stands for a capture and zero for a miss. Rcapture can ﬁt three types of models. With a closed population model, the goal of the analysis is to estimate the size N of the population which is assumed to be constant throughout the experiment. The estimator depends on the way in which the capture probabilities of the animals vary. Rcapture features several models for these capture probabilities that lead to diﬀerent estimators for N . In an open population model, immigration and death occur between sampling periods. The estimation of survival rates is of primary interest. Rcapture can ﬁt the basic Cormack-Jolly-Seber and Jolly-Seber model to such data. The third type of models ﬁtted by Rcapture are robust design models. It features two levels of sampling; closed population models apply within primary periods and an open population model applies between periods. Most models in Rcapture have a loglinear form; they are ﬁtted by carrying out a Poisson regression with the R function glm . Estimates of the demographic parameters of interest are derived from the loglinear parameter estimates; their variances are obtained by linearization. The novel feature of this package is the provision of several new options for modeling capture probabilities heterogeneity between animals in both closed population models and the primary periods of a robust design. It also implements many of the techniques developed by R. M. Cormack for open population models.


Introduction
The goal of a classical capture-recapture experiment is to study the demographic characteristics of an animal population. It is carried out by capturing animals, marking them with an

Closed populations
A population is said to be closed if no mortality nor immigration can occur within the population. Hence, the size of a closed population, noted N , does not vary during the experiment. This assumption is reasonable for capture-recapture experiments held over a short period of time. To estimate this population size, a model is fitted to the data. Following Otis, Burnham, White, and Anderson (1978), the model can incorporate up to three sources of variation among capture probabilities: a temporal effect (subscript t), a heterogeneity between units (subscript h) and a behavioral effect (subscript b). A temporal effect causes the capture probabilities to vary among capture occasions; heterogeneity causes the capture probabilities to

Analysis Steps
Rcapture function Fits various loglinear models for a closed population: M0, Mt, Mh (Chao,Poisson2,Darroch), Mth (Chao,Poisson2,Darroch), Mb and Mbh Fits a loglinear model given a design matrix mX closedp.mX Fits a Mh or Mth model for which the form of the column for heterogeneity in the design matrix is determined by the user Produces descriptive statistics for capture-recapture data Produces fit statistics concerning the ui, i.e. the numbers of first captures at each capture occasion, for the models in closedp.
Applies a bias correction to the abundance estimations obtained by the closedp function Computes the multinomial profile likelihood for the adundance of some closed population capture-recapture models plot.descriptive

boxplot.closedp
Produces boxplots of the pearson residuals of the loglinear models in closedp. The analysis of data from a closed population capture-recapture experiment amounts to finding the best fitting model and estimating the population size from the chosen model. Here we propose steps to follow for such an analysis. Figure 1 schematizes these steps and links them to relevant functions of the Rcapture package. The first step is to explore the data with descriptive statistics. This helps to identify the factors associated to the variability of the capture probabilities. Next, several models are fitted and compared based on standard criteria such as the deviance of the model and the AIC. Ultimately, a model is chosen and the population abundance N is estimated from this model. The following paragraphs describe the Rcapture functions associated to each steps.
First note that a capture history will be expressed as a t × 1 vector ω = (ω 1 , . . . , ω t ), where ω j = 1 if the unit is captured at the jth occasion and 0 if not. There are two accepted formats for a capture-recapture data set in the package Rcapture. The first one is an R matrix or data frame whose rows are the capture histories of each animal caught. The number of columns in the data matrix is then the number of capture occasions in the experiment (noted t). In the alternative format, the data matrix contains one row per capture history followed by its frequency. In that case, it has t+1 columns. The first t columns identify the capture histories. They must contain only zeros and ones. In Rcapture functions, the format of the data set is specified with the dfreq argument. This argument is set to FALSE for the first format; it is set to TRUE for the alternative format. The function histpos.t generates the (2 t − 1) × t matrix of the observable capture histories in a capture recapture experiment; it also defines the order of the capture histories for the Poisson regression. This order is relevant when using the closedp.mX function and the keep option of the openp functions.

Descriptive Statistics
The descriptive function of the Rcapture package computes basic capture-recapture frequency statistics. It displays, for i = 1, . . . , t, the number of units captured i times (f i ), the number of units captured for the first time on occasion i (u i ), the number of units captured for the last time on occasion i (v i ) and the number of units captured on occasion i (n i ). If the n i statistics vary among capture occasions, there is a temporal effect. The descriptive function also gives the m-array matrix, which contains recapture frequencies for units released on each occasion.
An interesting tool to explore a possible heterogeneity in the capture probabilities are the graphs of log f i / t i and log(u i ) versus i generated by the plot.descriptive function. Table 1 gives the form of the two graphs for some models. Some elements of Table 1 are easy to justify. For model M 0 , the number of captures follows the Binomial(t, p) distribution and the number of capture occasions before the first capture follows the Geometric(p) distribution, where p is the capture probability of a unit at any capture occasion. This latter result also holds under model M b . So, in these cases, where N is the population abundance we want to estimate. Therefore, the graphs produced by plot.descriptive are linear. Moreover, Rivest (2007) shows that the f i graph should be concave downward when there is a temporal effect. This effect is typically small and the graph of the f i stays almost linear for model M t . Furthermore, from the work of Lindsay (1986) on mixing distributions in an exponential family, the f i graph for model M h and the u i graph for models M h and M bh should be convex, up to sampling errors. The shape of the f i graph for model M th depends on the relative importance of the temporal effect and the heterogeneity. So the plot.descriptive function can bring out heterogeneity among capture probabilities in a data set through graphs with a convex shape.

Models fitting
The main Rcapture function for fitting a model to a closed population data set is closedp.
It fits M 0 , M t , M h , M th , M b , and M bh through Poisson regressions. Since M tb and M tbh do not have a loglinear form, closedp does not produce abundance estimations for these models. All models are fitted using the glm function; it produces maximum likelihood estimates of the loglinear parameters. The maximization is done through an iteratively reweighed leastsquares algorithm which is simple and numerically stable. An estimate of the population size N is then derived from the loglinear parameters.
The estimator of N is obtained by maximizing a Poisson loglikelihood. Cormack and Jupp (1991) showed that this Poisson estimator is almost identical to the conditional multinomial estimator. A variance, valid under multinomial sampling, is derived in Sandland and Cormack (1984). It is given by var m (N ) = var p (N ) − N where subscripts m and p refer to multinomial and Poisson sampling.
To illustrate the use of a loglinear model in a closed population experiment, let's detail the case of model M 0 . This is the simplest model; it has a single capture probability p common to all units, at every capture occasion, which does not change after a first capture. For an experiment including t capture occasions, 2 t − 1 capture histories ω are observable. The probability for a unit to experience a capture history ω is Pr where ω j is the number of times the unit is caught. Therefore, the expected number of units in the population having capture history ω is µ ω = N (1 − p) t− P ω j p P ω j . This expected frequency can be reexpressed in the form of a loglinear model as Thus, model M 0 is fitted in closedp by fitting a loglinear model E(Y ) = exp(Xβ) with Y equal to the (2 t − 1) × 1 vector of the observed frequencies n ω (including zero frequencies), X is a (2 t − 1) × 2 design matrix with a first column of ones and a second column defined by ω j , and β = (γ, β) t . Then, the abundance is estimated asN = n + exp(γ) where n is the total number of units caught during the experiment. This is indeed an estimator of the population size because exp(γ) = exp(log(N (1 − p) t )) = N (1 − p) t = N × Pr(ω 0 ) = µ 0 where ω 0 is the unobservable capture history of zero capture and µ 0 is the expected number of units never captured. A loglinear presentation of the other models fitted by closedp can be found in Rivest and Lévesque (2001) and Rivest and Baillargeon (2007). Note that in closedp model M b is as presented in (Cormack 1989) while M bh allows the probability of first capture at occasion 1 to differ from the probability of first capture after occasion 1. It is suitable when the u i plot of descriptive is linear except for occasion 1.
The Rcapture package specializes in modeling heterogeneity. The closedp function suggests three types of models for M h and M th : Chao, Darroch and Poisson2. Chao's models estimate a lower bound for the abundance. The estimate obtained under M h Chao is Chao's (1987) moment estimator. Rivest and Baillargeon (2007) exhibit a loglinear model underlying this estimator and provide a generalization to M th . Some loglinear parameters of Chao's models, the η parameters, should theoretically be greater or equal to zero. So when the argument neg of the function closedp is set to TRUE (the default), negative η parameters are fixed to zero. For Darroch's models, a column defined as ( ω j ) 2 /2 is added to the design matrix for either M 0 or M t . These models for M h and M th are considered by Darroch et al. (1993) andAgresti (1994). For Poisson2 models, the column for heterogeneity in the design matrix is 2 P ω j − 1. For these two models, the logits of the individual capture probabilities are assumed to be random variables. These variables are distributed according to a mixed normal distribution under Darroch's model or to a mixed Poisson distribution under a Poisson model. Details can be found in Rivest and Baillargeon (2007). The Poisson model typically yields smaller corrections for heterogeneity than Darroch's model since the capture probabilities are bounded from below under this model.
In addition to Chao, Darroch and Poisson2 heterogeneity models, other M h and M th models can be fitted with the closedp.h function. This function can fit general Poisson models with heterogeneity columns equal to a P ω j − 1. When a is large, the Poisson estimator is close to the one obtained under models M 0 or M t and as a goes to 1, the Poisson estimator becomes close to Darroch's estimator. The family of Poisson estimator for M h and M th provides a wide range of corrections for heterogeneity. The function closedp.h can also fit models with the form of the column for heterogeneity in the design matrix defined by the user. For the log-gamma model of Rivest and Baillargeon (2007), this column is − log(λ+ ω j )+log(λ) for some λ > 0. Rcapture also features a function, closedp.mX, that has a user defined design matrix. closedp.mX allows, for instance, fitting a model with an interaction between two capture occasions. Adding interactions between successive occasions results in a trap effect because the probability of being captured at occasion i depends on the capture at occasion i − 1. The function closedp.mX estimates the population size asN = n + exp(γ), whereγ is the estimated intercept. Therefore, it is not suited for models with behavioral effects.

Model selection
When several models have been fitted, they must be compared and one has to be selected. The functions closedp, closedp.h and closedp.mX generate deviances, degrees of freedom and Akaike Information Criteria (AIC). These statistics are useful tools to compare models and to assess the goodness of their fit. Under the assumption of a good fit, the deviance of a model follows a χ 2 distribution with the model's degrees of freedom. Also, likelihood ratio tests can be constructed to compare nested models and a smaller AIC indicates a better model. Note however that for model M h Chao and M th Chao a small deviance means that there is a heterogeneity in capture probabilities; it does not mean that the lower bound estimates calculated for these models are unbiased.
The fit of a model can also be judged through its residuals. The functions boxplot.closedp and boxplot.closedp.custom produces boxplots of the Pearson residuals for the different fitted models. These graphs bring out badly fitted data.
Rcapture also contains a function aimed at studying the model's fit from the u i statistics. The uifit function focuses on what is most important to model accurately, i.e. the number of new captures at each occasion. It displays the observed u i statistics and the u i predicted by each model in closedp. It also forecasts the u i for 5 additional hypothetical capture occasions for models M 0 , M h Poisson2 , M h Darroch and M b . The predicted and observed u i -statistics are compared using χ 2 statistics. Moreover, the mean and variance of the day of first capture are calculated with the predicted u i for each model. All these statistics generated by uifit are further tools to assess the fit of a model.

Abundance estimation
The functions closedp, closedp.h and closedp.mX give an estimate for the abundance and its standard error. For small samples, the estimation can be improved by a bias correction. The function closedp.bc performs, for the models in closedp, a bias correction through frequency modifications as presented in Rivest and Lévesque (2001) and Rivest and Baillargeon (2007). These frequency modifications also stabilize the the standard errors estimates forN .
Abundance can also be estimated through confidence intervals. A naive 100(1−α)% confidence interval assuming asymptotic normality isN ± Z α/2 se(N ) . Better confidence intervals are obtained using a profile loglikelihood. This can be done with the profileCI function which follows the methodology of Cormack (1992). This function calculates the value of N that maximizes the multinomial likelihood. It also plots the the profile likelihood for N and calculates a 100(1 − α)% profile likelihood confidence interval. It works for every model fitted by closedp, closedp.h or closedp.mX, except models M b and M bh .

Snowshoe hare example
We now fit closed population models to the snowshoe hare data considered in Cormack (1989) and Agresti (1994). This data set is included in the Rcapture package. It has the default format, i.e. each row represents the capture history of one animal. Hence, the argument dfreq of Rcapture functions doesn't have to be specified as it is set to FALSE by default.
R> data("hare") R> desc <-descriptive(hare) R> plot(desc) R> closedp(hare) The f i plot of the function descriptive in Figure 2 shows that the two animals caught on all occasions create some heterogeneity in the capture probabilities. Therefore, it is not surprising that the best fitting model is heterogenous. Indeed, the model with the smallest  . The estimate for M th Darroch is equal to that reported in Agresti (1994).
Another approach to take care of the heterogeneity would be to remove the 2 hares caught 6 times, as Cormack (1989) did. With Rcapture, the best way to discard these hares is to add a column to the design matrix for M t taking the value 1 for the capture history (1, 1, 1, 1, 1, 1) and 0 otherwise. The upper bound of the confidence interval for N depends on the interpretation given to the two hares caught at all occasions. It is large when they are assumed to be associated with a small heterogeneity in the capture probabilities. It is small when the two trap happy hares are assumed to be unrepresentative of the unsampled part of the population.

HIV example
We now analyze epidemiological capture-recapture data on HIV in Abeni et al. (1994). The capture histories are obtained by linking the records of four reporting centers in Rome, Italy. The data set's format is the alternative one, i.e. each row represents an observed capture history followed by its frequency. Therefore, the argument dfreq of the Rcapture functions has to be set to TRUE. The function descriptive shows that 1774 out of 1896 individuals (94%) appear on one list only. The f i plot in Figure 3 is linear showing that heterogeneity is not a problem; the u i plot is not interpretable since it depends on the arbitrary ordering of the 4 centers. The model with a time (or a list) effect and the six possible pairwise dependencies between lists is fitted.

Meadow vole period 3 example
The last closed population example concerns the third primary sampling period of the meadow vole data set presented in Chapter 19 of Williams, Nichols, and Conroy (2002). The data is in columns 11 to 15 of the data set mvole included in the Rcapture package. The complete data set will be analyzed with a robust design model in Section 4.1. Descriptives statistics are not presented here, but they suggest that heterogeneity is present in the data for the third period. There is not much ground for discriminating between the M h Poisson2 and the M h Darroch estimator; still the M h Poisson2 predicted values for u i are somewhat closer to the observed u i than those for M h Darroch . One can also wonder whether to predict that 1.86 new unmarked animals will be caught on a hypothetical 10th day of capture is realistic. In the model selection process, it might also be useful to look at the models' Pearson residual.
The boxplots in Figure 5

Open populations
Open population models apply when animals are released and recaptured or resighted at future capture occasions. Typically the capture occasions are distant in time and mortality occurs between them. When the animals released are not a random sample of the animals in the population at a given capture occasion, the analysis focuses on the estimation of survival rates of the animals that were released. The Cormack-Jolly-Seber model applies in such situations. When marked and unmarked animals undergo the same sampling process, both the population sizes and the survival rates can be estimated. This is the Jolly-Seber model. Open population models are often used for capture-recapture experiments held over a long period of time. Therefore, the capture occasions are called periods; they are indexed by the subscript i ranging from 1 to I.
The function openp of Rcapture fits both the Cormack-Jolly-Seber and the Joly-Seber model following the loglinear approach of Cormack (1985Cormack ( , 1989, see also Rivest and Daigle (2004).
If the interest focuses only on estimating survival rates, the abundance estimators are simply discarded. Besides the survival rates φ 1 to φ I−1 , these functions estimate the capture probabilities p * 1 to p * I , the population sizes N 1 to N I , the number of new units entering the population B 1 to B I−1 and the total number or units who ever inhabited the survey area N tot. In some applications of the Jolly Seber model, births are arrivals to the colony and deaths are departures (see Schwarz and Stobo 1997). In those cases, the total number of visitors to the colony N tot, is the parameter of interest. By default, the argument m of the function openp is set to "up"; which means that the capture probabilities vary between periods (up = unconstrained probabilities). Because of the well known lack of identifiability for the Jolly-Seber model (see Pollock, Nichols, Brownie, and Hines 1990), the parameters p * 1 , p * I , the survival rate φ I−1 between periods I − 1 and I, N 1 and N I are not estimable with the function openp.up. On the other hand, all the parameters are estimable when m is given the value "ep" because it sets the capture probabilities equal to a common value (ep = equal probabilities). The function openp insures that the estimated survival probabilities belong to [0, 1] and that the births B i are positive by imposing constraints to the loglinear parameters. Setting the argument neg of this function to FALSE removes these constraints.
The steps we propose to follow in the analysis of an open population data set differ from the ones for a closed population data set. A single model is fitted; refitting the model to a subset of the data can be attempted if it doesn't fit well. Figure 6 summarizes the procedure. First, if the experiment follows a robust design, the data matrix must be converted to between primary session data. This is done with the function periodhist which pools the capture histories for   several occasions into a single entry having the value 1 for a unit caught at least once during these occasions and 0 otherwise. Next, descriptive statistics can be produced to explore the data; the m.array matrix output by descriptive is of interest. The Cormack-Jolly-Seber or the Jolly-Seber model is fitted with openp which produces estimates of the demographic parameters. The presence of a trap effect is tested by including additional loglinear parameters in the model. Unfortunately, estimates of demographic parameters accounting for a significant trap effect cannot be calculated at this time. The model's quality of fit is assessed with the deviance of the standard model (without a trap effect) and with plot.openp. This function plots the Pearson residuals versus the number of captures. Large residuals bring out badly fitted data. To pursue its analysis, the user can choose to remove some units from the data set and refit the model. This is done with the keep argument of openp. Typically, one wishes to omit the units caught too often or, on the contrary, the units caught only once which can be considered as transients. The removal of units is a good diagnostic tool to assert the stability of the results, but it is not advised as a general strategy. When some units are omitted, Rcapture brings them back into the final abundance estimates with a correction proposed by (Cormack 1989, p.410). The difference between observed and expected frequencies for the omitted groups is added to the model's estimations of abundance. To complete the analysis, it is possible to construct tests on the parameters under the assumption of multivariate normality of the estimators as will be shown in the following examples. For this, the covariance matrix of all the demographic parameters estimates is used. This matrix is returned by the function openp under the name cov. As noted before, the abundance output presents standard errors adjusted to be valid under multinomial sampling. However, the code matrix contains unadjusted variances of abundance estimators. So these variances are valid under Poisson sampling and not multinomial sampling.

Lazuli bunting example
Let's analyze the lazuli bunting data treated in Cormack (1993a The residuals plot in Figure 7 shows large residuals for the birds caught twice or more while the residuals are small for birds caught once. The Jolly-Seber model does not fit well and the likely presence of transients might cause that. To remove the birds caught only once from the analysis, one uses the keep argument as follows. q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q  The deviance drop of 94 for 6 degrees of freedom is highly significant. The residual plot for this model is not presented here; it still has some Pearson residuals larger than 4 that might influence the survival estimates. The next commands use the object keep3 to identify capture histories with more than one capture and with residuals smaller than 4. R> keep3p <-residuals(op.m2$glm, type = "pearson") < 4 R> num3 <- ((1:255) The two sets of survival estimates are similar; the large residuals have a small impact. Tables  3 and 4 of Cormack (1993a, p.46) present estimates obtained by fitting the first two models of this Section. They report estimates that are identical to those presented here.
The survival estimates are quite similar between periods. We would like to test the equality of the survival probabilities and estimate their common value. In softwares such as Mark and M-Surge, this test is easily performed by fitting a model with constant survival probabilities. This model is not loglinear. An asymptotically equivalent to these homogeneity tests can be calculated with Rcapture using the output from openp.

Eider duck example
This example shows that Rcapture can reproduce the analysis of eider duck data set presented in Cormack (1989  [1] 0.001592682 This is less than 2%. The residual plot in Figure 8 shows a large residual for the 13 ducks captured all the times. We redo the analysis without them. R> keep2 <-apply(histpos.t (6) [1] 0.03427131 The fit is still not satisfactory. The residual plot has the convex shape characteristic of heterogeneity in the capture probabilities. We also remove the individuals caught at 5 periods out of 6.
R> keep3 <-apply(histpos.t (6) It gives a much larger deviance; so the hypothesis of equal capture probabilities is rejected. In the end, the best model is the one fitted without the animals captured 5 or 6 times. The abundances obtained from that model are the following. These abundances and previously shown deviances reproduce the results in Table 6 of Cormack (1989, p.408).
We now investigate models for the growth rate N i+1 /N i of this population using the multivariate normal distribution for the abundance estimates. If the estimated variance covariance matrix of (N 2 , . . . ,N 5 ) isΣ, then the variance of the growth rates (N 3 /N 2 , . . . ,N 5 /N 4 ) iŝ AΣÂ t where A is the 3 × 4 matrix of partial derivatives, The R code for the calculations of the the growth rates and their standard errors is as follows. 1]), 3, 4, byrow = TRUE) R> sig <-partial %*% op.m3$cov[9:12, 9:12] %*% t(partial) R> cbind(estimate = growth, stderr = sqrt(diag(sig))) estimate stderr period 3 1.2218081 0.11281160 period 4 0.8000063 0.07563975 period 5 1.2771890 0.08512487 As previously mentioned, this standard error is calculated using variances under Poisson sampling. In the same way that we calculated a common survival in Section 3.1, we can now obtain an estimate for the common growth rate.

Revisiting the snowshoe hare example
One can use the function openp to investigate whether the hare population is closed. The following commands add possible deaths and immigrations to the final model fitted with closedp.mX in Section 2.1. R> data("hare") R> keep <-rep(TRUE, 2^6 -1) R> mat <-histpos.t (6)  The new deviance of 46.2, df = 52, is not significantly smaller than the one for the closed population model, 47.9, df = 55, see Section 2.1. The assumption that the population is closed cannot be rejected. Models featuring births and deaths after a particular sampling occasion can be fitted to this data set using the function robustd.t discussed in the next section.

Robust design
The robust design is a combination of models for closed and open populations introduced by Pollock (1982). Units are captured at different periods between which the population experiences mortality and immigration. Thus, open population models apply at this first level of sampling to estimate survival rates. However, within each primary period, sampling is done more than once; that is, a short term study is conducted. Closed population models are used at this stage to estimate population sizes. By pooling the data of a series of shortterm studies, the robust design improves the estimation of the demographic characteristics of the population.
With the package Rcapture, one can fit a model for a robust design using either the function robustd.t or the function robustd.0. These functions implement the loglinear parameterizations presented in Rivest and Daigle (2004). They estimate the same demographic parameters as the openp function, without any constrain or unestimable parameters. Within the primary periods, The function robustd.t can fit closed population models M 0 , M t , M h and M th while robustd.0 only accepts models M 0 and M h . That is, the function robustd.0 doesn't fit models with a within period temporal effect. However, it is much less memory consuming than robustd.t, so it runs faster. This is so because robustd.0 codes capture histories in terms of the number of captures for each primary period. Therefore, the length of the response vector for a model fitted with robustd.t is 2 − 1 for a model fitted with robustd.0, where t i stands for the number of capture occasions at period i. For an experiment such as the one in Section 4.1, with 6 primary periods having each 5 capture occasions, this represents over 1 billion entries in the dependent vector, brought down to 46 655 by the alternative coding of the capture histories.
The function robustd uses the data matrix and a vector vt containing the numbers of capture occasions for each primary sampling period as input arguments. The closed population models for each period are specified with the arguments vm, vh and va as described in the package documentation. Negative γ parameter estimates in the open population part of the model and negative η parameter estimates for Chao's closed population models are by default set to zero. They can be unconstrained by setting the neg function to FALSE, however this also allows survival estimates to be greater than 1 and immigration parameters to be less than 0.
Open population analysis between primary periods robustd.t

Analysis Steps
Rcapture functions robustd.0 Computes various demographic parameters using a loglinear model in capture-recapture experiments.
The .0 function does not allow models with temporal effects but it can fit data from a large number of capture occasions. It is less memory consuming than robustd.t. as in Figure 1 as in Figure 2 Model fitting and parameters estimation The .t function allows models with temporal effects but it cannot be used with a large number of capture occasions because it becomes memory consuming. Figure 9: The steps of a robust design analysis linked to relevant Rcapture functions Figure 9 presents the steps in a robust design data analysis. First, closed population analyses should be performed, as described in Section 2, for the short terms study within each period. Then, after converting the data set with the function periodhist, an open population analysis is conducted (see Section 3). These preliminary analyses suggest robust design models for consideration, which are now fitted with the function robustd.t or robustd.0. The selection of closed population models within periods relies on the closed populations analyses. The model's fit is evaluated by testing for the presence of a temporary emigration. This is done by comparing the deviance of the fitted model to the deviance of the same model with temporary emigration, homogenous or not. One can simply use the AIC to do the comparison, or a likelihood ratio test can be performed. If a model with temporary emigration is significantly better than the model without temporary emigration, then the fitted model might not be appropriate. A bad fit can be associated to a temporary emigration out of the study area if the difference on the logit scale of the between period capture probabilities minus the within period capture probabilities are negative. A bad fit can also be caused by an improper modeling of the within period capture probabilities, especially if the capture probabilities display some heterogeneity. New specifications for the models M h or M th used in the primary periods might be needed. As in the open population analysis, once a final model is chosen, further analysis can be conducted assuming that the parameter estimates have a multivariate normal distribution.
In robustd.t and robustd.0, the parameter values of the closed population models change between periods. Also these functions do not have a keep argument for investigating the impact of a particular capture history on the outcome.

Meadow vole example
This example presents a study of the complete meadow vole data set in Chapter 19 of Williams et al. (2002). The third period of this data set has been analysed in Section 2.3. This data set concerns a robust design with 30 capture occasions pertaining to 6 primary periods having 5 capture occasions each. These capture occasions represent five consecutive days of trapping every month from June to December 1981 at Patuxent Wildlife Research Center, Laurel, Maryland. This data set has 10 trap deaths that are ignored in this analysis.
First, a between primary period Jolly-Seber analysis is presented. To find a suitable model within each primary period, the function closedp has been used repeatedly. Heterogeneity has been detected in all periods except the second one where the data collection was perturbed by a racoon (the last capture occasion for the second period does not have any new animal captured and is taken out of the analysis). In a robust design we use M h models for all primary periods bearing in mind the questionable fit in the second one. Since there is no time effect within primary periods, we use the function robustd.0 to fit the model. We note that it is possible not to specify any model for the second period. It would be done with the following command.

Limitations and comparison to other softwares
One limitation of Rcapture is that it does not handle trap deaths. This occurs if some captured animals are not released in the population after their capture. Animals cannot be recaptured after a trap death so that their capture histories will have zeros for the remaining capture occasions. In closed population models, trap deaths can be considered as a subpopulation with a known size. The analysis can focus on the estimation of the size of the non trap death population, using the data on the animals that did not experience a trap death. In open population models, the goodness of fit statistics of the openp functions are valid in the presence of trap deaths. Their demographic parameter estimates are however biased; alternative formulas for converting loglinear parameters into demographic parameters need to be develop to account for trap deaths. A robust design analysis is also sensitive to the occurrence of trap deaths; they might bias its conclusions. Methods to deal with death traps with this software are under investigation.
The robust design functions highlight another limitation of the Poisson regression for modeling capture recapture data. In large experiments, the number of observable capture histories can be very large. Most of them have a zero frequency; still, all these zero frequencies must appear in the dependent vector for the Poisson regression. This makes the number of cases in the Poisson regression unnecessarily large. An alternative fitting strategy discussed in Barker and White (2004) is to model the capture histories of the released animals at each capture occasion using multinomial distributions. Then, only capture histories with a positive frequency contribute to the likelihood. For an open population model, the likelihood to maximize can be written in terms of the m-array matrix for the experiment. This fitting strategy is implemented in Mark (see White and Burnham 1999;White 2005), which is the main software for analyzing capture recapture data. Since the likelihood is written in terms of aggregated data, testing the fit of the model is not straightforward under this approach. The simple tests for trap dependence and the goodness of fit diagnostics based on residuals presented in Section 3 are not available anymore.
Over the years several softwares have been written for the analysis of data from capture recapture experiments. Several of these softwares are now available within Mark (see White and Burnham 1999;White 2005). For instance it contains the package Popan of Schwarz and Arnason (1996) for the modeling of abundance in open population models, see also http://www.cs.umanitoba.ca/~popan/. Package Care for closed populations data, with an emphasis on epidemiological applications, is discussed in Chao, Tsay, Lin, Shau, and Chao (2001). Package M-Surge (see Choquet, Reboulet, Pradel, Gimenez, and Lebreton 2004) is also available to model multistate recapture data in the Cormack-Jolly-Seber setting. Bayesian methods can also be used to fit capture-recapture model (see Madigan and York 1997;Brooks, Catchpole, and Morgan 2000). Durban and Elston (2005) suggest a Bayesian approach to M h . Gibbs sampling and Markov chain Monte Carlo techniques are implemented for fitting complex models (see Gimenez, Crainiceanu, Barbraud, Jenouvrier, and Morgan 2006).
The package Rcapture covers the basic statistical models for capture-recapture experiments. It is the only package that focuses on the use of loglinear models for the analysis of closed and open population data. As illustrated in Section 3, the fit of complex models can be investigated with the maximum likelihood estimates and their asymptotic variances obtained from Rcapture. This package provides diagnostic tools and several alternatives for fitting model M h and M th to closed population data. In view of the lack of identifiability of such models pointed out by Link (2003), this flexibility is welcomed when confronting heterogeneity. Rcapture tries to emphasize that there is more to data analysis than model fitting by providing probability and residual plots to guide the analysis. It takes advantage of the flexible R programming environment which allows users to build their own R function by using multipurpose minimization functions such as optim. For instance a function closedp.Mtb for fitting closed population model M tb which does not have a loglinear form is provided with the package as an illustration of the application of the R programming language to the building of models for capture-recapture data.
to acknowledge the contribution of Emmanuelle Manouvelle who wrote the first versions of many of the functions in this package.