The VGAM Package for Capture – Recapture Data Using the Conditional Likelihood

It is well known that using individual covariate information (such as body weight or gender) to model heterogeneity in capture–recapture (CR) experiments can greatly enhance inferences on the size of a closed population. Since individual covariates are only observable for captured individuals, complex conditional likelihood methods are usually required and these do not constitute a standard generalized linear model (GLM) family. Modern statistical techniques such as generalized additive models (GAMs), which allow a relaxing of the linearity assumptions on the covariates, are readily available for many standard GLM families. Fortunately, a natural statistical framework for maximizing conditional likelihoods is available in the Vector GLM and Vector GAM classes of models. We present several new R-functions (implemented within the VGAM package) specifically developed to allow the incorporation of individual covariates in the analysis of closed population CR data using a GLM/GAM-like approach and the conditional likelihood. As a result, a wide variety of practical tools are now readily available in the VGAM object oriented framework. We discuss and demonstrate their advantages, features and flexibility using the new VGAM CR functions on several examples.


Introduction
Note: this vignette is essentially Yee et al. (2015).
Capture-recapture (CR) surveys are widely used in ecology and epidemiology to estimate population sizes. In essence they are sampling schemes that allow the estimation of both n and p in a Binomial(n, p) experiment . The simplest CR sampling design consists of units or individuals in some population that are captured or tagged across several sampling occasions, e.g., trapping a nocturnal mammal species on seven consecutive nights. In these experiments, when an individual is captured for the first time then it is marked or tagged so that it can be identified upon subsequent recapture. On each occasion recaptures of individuals which have been previously marked are also noted. Thus each observed individual has a capture history: a vector of 1s and 0s denoting capture/recapture and noncapture respectively. The unknown population size is then estimated using the observed capture histories and any other additional information collected on captured individuals, such as weight or sex, along with environmental information such as rainfall or temperature.
We consider closed populations, where there are no births, deaths, emigration or immigration throughout the sampling period (Amstrup et al. 2005). Such an assumption is often reasonable when the overall time period is relatively short. Otis et al. (1978) provided eight specific closed population CR models (see also Pollock (1991)), which permit the individual capture probabilities to depend on time and behavioural response, and be heterogeneous between individuals. The use of covariate information (or explanatory variables) to explain heterogeneous capture probabilities in CR experiments has received considerable attention over the last 30 years (Pollock 2002). Population size estimates that ignore this heterogeneity typically result in biased population estimates (Amstrup et al. 2005). A recent book on CR experiements as a whole is McCrea and Morgan (2014).
Since individual covariate information (such as gender or body weight) can only be collected on observed individuals, conditional likelihood models are employed (Pollock et al. 1984;Huggins 1989;Alho 1990;Lebreton et al. 1992). That is, one conditions on the individuals seen at least once through-out the experiment, hence they allow for individual covariates to be considered in the analysis. The capture probabilities are typically modelled as logistic functions of the covariates, and parameters are estimated using maximum likelihood. Importantly, these CR models are generalized linear models (GLMs; McCullagh and Nelder 1989;Huggins and Hwang 2011).
Here, we maximize the conditional likelihood (or more formally the positive-Bernoulli distribution) models of Huggins (1989). This approach has become standard practice to carry out inferences when considering individual covariates, with several different software packages currently using this methodology, including: MARK (Cooch and White 2012), CARE-2 (Hwang and Chao 2003), and the R packages (R Core Team 2015): mra (McDonald 2012), RMark (Laake 2013) and Rcapture Rivest 2014, 2007), the latter package uses a log-linear approach, which can be shown to be equivalent to the conditional likelihood (Cormack 1989;Huggins and Hwang 2011). These programs are quite user friendly, and specifically, allow modelling capture probabilities as linear functions of the covariates. So an obvious question is to ask why develop yet another implementation for closed population CR modelling?
Firstly, nonlinearity arises quite naturally in many ecological applications, (Schluter 1988;Yee and Mitchell 1991;Crawley 1993;Gimenez et al. 2006;Bolker 2008). In the CR context, capture probabilities may depend nonlinearly on individual covariates, e.g., mountain pygmy possums with lighter or heavier body weights may have lower capture probabilities compared with those having mid-ranged body weights (e.g., Huggins and Hwang 2007;Stoklosa and Huggins 2012). However, in our experience, the vast majority of CR software does not handle nonlinearity well in regard to both estimation and in the plotting of the smooth functions. Since GAMs (Hastie and Tibshirani 1990;Wood 2006) were developed in the mid-1980s they have become a standard tool for data analysis in regression. The nonlinear relationship between the response and covariate is flexibly modelled, and few assumptions are made on the functional relationship. The drawback in applying these models to CR data has been the difficult programming required to implement the approach.
Secondly, we have found several implementations of conditional likelihood slow, and in some instances unreliable and difficult to use. We believe our implementation has superior capabilities, and has good speed and reliability. The results of Section ?? contrast our software with some others. Moreover, the incorporation of these methods in a general, maintained statistical package will result in them being updated as the package is updated.
Standard GLM and GAM methodologies are unable to cope with the CR models considered in this article because they are largely restricted to one linear/additive predictor η. Fortunately however, a natural extension in the form of the vector generalized linear and additive model (VGLM/VGAM) classes do allow for multiple ηs. VGAMs and VGLMs are described in Yee and Wild (1996) and Yee and Hastie (2003). Their implementation in the VGAM package (Yee 2008(Yee , 2010(Yee , 2014 has become increasing popular and practical over the last few years, due to large number of exponential families available for discrete/multinomial response data. In addition to flexible modelling of both VGLMs and VGAMs, a wide range of useful features are also available: • smoothing capabilities; • model selection using, e.g., AIC or BIC (Burnham and Anderson 1999); • regression diagnostics and goodness-of-fit tools; • reduced-rank regression (Yee and Hastie 2003) for dimension reduction; • computational speed and robustness; • choice of link functions; • offsets and prior weights; and • (specifically) when using R: generic functions based on object oriented programming, e.g., fitted(), coef(), vcov(), summary(), predict(), AIC(), etc.
Our goal is to provide users with an easy-to-use object-oriented VGAM structure, where four family-type functions based on the conditional likelihood are available to fit the eight models of Otis et al. (1978). We aim to give the user additional tools and features, such as those listed above, to carry out a more informative and broader analysis of CR data; particularly when considering more than one covariate. Finally, this article primarily focuses on the technical aspects of the proposed package, and less so on the biological interpretation for CR experiments. The latter will be presented elsewhere.
An outline of this article is as follows. In Section 2 we present the conditional likelihood for CR models and a description of the eight Otis et al. (1978) models. Section 3 summarizes pertinent details of VGLMs and VGAMs. Their connection to the CR models is made in Section 4. Software details are given in Section 5, and examples on real and simulated data using the new software are demonstrated in Section 6. Some final remarks are given in Section 7. The two appendices give some technical details relating to the first and second derivatives of the conditional log-likelihood, and the means. Vector of capture histories for individual i (i = 1, . . . , n) with observed values 1 (captured) and 0 (noncaptured). Each y i has at least one observed 1 "h" Model M subscript, for heterogeneity "b"

Capture-recapture models
Model M subscript, for behavioural effects "t" Model M subscript, for temporal effects p ij Probability that individual i is captured at sampling occasion j (j = 1, . . . , τ ) z ij = 1 if individual i has been captured before occasion j, else = 0 θ Vector of regression coefficients to be estimated related to p ij η Vector of linear predictors (see Table 3 for further details) g Link function applied to, e.g., p ij . Logit by default Table 1: Short summary of the notation used for the positive-Bernoulli distribution for capture-recapture (CR) experiments. Additional details are in the text.
In this section we give an outline for closed population CR models under the conditional likelihood/GLM approach. For further details we recommend Huggins (1991) and Huggins and Hwang (2011). The notation of Table 1 is used throughout this article.

Conditional likelihood
Suppose we have a closed population of N individuals, labelled i = 1, . . . , N and τ capture occasions labelled j = 1, . . . , τ . We make the usual assumptions that individuals in the population behave independently of each other, individuals do not lose their tags, and tags are recorded correctly. Let y ij = 1 if the ith individual was caught on the jth occasion and be zero otherwise, and let n be the number of distinct individuals captured.
Let p ij denote the probability of capturing individual i on occasion j. As noted in Section 1, Otis et al. (1978) describe eight models for the capture probabilities, see Section 2.2 for further details. Label the individuals captured in the experiment by i = 1, . . . , n and those never captured by i = n + 1, . . . , N . The full likelihood is given by where K is independent of the p ij but may depend on N . The RHS of (1) requires knowledge of the uncaptured individuals and in general cannot be computed. Consequently no MLE of N will be available unless some homogeneity assumption is made about the noncaptured individuals. Instead, a conditional likelihood function based only on the individuals observed at least once is .
(2) Capture Joint probability history  Otis et al. (1978), with τ = 2 capture occasions in closed population CR experiment. Here, p cj = capture probability for unmarked individuals at sampling period j, p rj = recapture probability for marked individuals at sampling period j, and p = constant capture probability across τ = 2. Note that the "00" row is never realized in sample data.
is used. Here p † is are the p ij computed as if the individual had not been captured prior to j so that the denominator is the probability individual i is captured at least once. This conditional likelihood (2) is a modified version of the likelihood corresponding to a positive-Bernoulli distribution (Patil 1962).

The eight models
Models which allow capture probabilities to depend on one or a combination of time, heterogeneity or behavioural effects are defined using appropriate subscripts, e.g., M th depends on time and heterogeneity. These eight models have a nested structure of which M tbh is the most general. The homogeneous model M 0 is the simplest (but most unrealistic) and has equal capture probabilities for each individual H 0 : p ij = p, regardless of the sampling occasion. All eight models are GLMs, since the conditional likelihood (2) belongs to the exponential family .
To illustrate the approach, we use the following toy example throughout, consider a CR experiment with two occasions-morning and evening (i.e., τ = 2), with capture probabilities varying between the two occasions. Furthermore, suppose we have collected some individual covariates-weight and gender. The joint probabilities of all the eight models are listed in Table 2. It can be seen that all but the positive-Binomial model (M 0 /M h ) require more than one probability and hence more than one linear predictor, so that the original Nelder and Wedderburn (1972) GLM framework is inadequate. Further, there are two noteworthy points from Table 2 which apply for any τ ≥ 2: • first, for M t -type models, as τ increases so will the number of linear predictors and hence the potential number of parameters; • secondly, it is evident that there are four main categories consisting of non-heterogeneity models (M 0 , M b , M t and M tb ), which are paired with a heterogeneity sub-model (respectively M h , M bh , M th and M tbh ).
The four heterogeneity models allow for each individual to have their own probability of capture/recapture. In our toy example, the capture probabilities are dependent on an individual's weight and gender. We discuss these models further in Section 3.1. It is natural to consider individual covariates such as weight and gender as linear/additive predictors. Let x i denote a covariate (either continuous or discrete) for the ith individual, which is constant across the capture occasions j = 1, . . . , τ , e.g., for continuous covariates one could use the first observed value or the mean across all j. If there are d − 1 covariates, we write x i = (x i1 , . . . , x id ) with x i1 = 1 if there is an intercept. Also, let g −1 (η) = exp(η)/{1 + exp(η)} be the inverse logit function. Consider model M tbh , then the capture/recapture probabilities are given as [notation follows Section 3.3] where β * (1)1 is the behavioural effect of prior capture, β * (j+1)1 for j = 1, . . . , τ are time effects, and β 1[−1] are the remaining regression parameters associated with the covariates. Computationally, the conditional likelihood (2) is maximized with respect to all the parameters (denote by θ) by the Fisher scoring algorithm using the derivatives given in Appendix A.

Estimation of N
In the above linear models, to estimate N let π i (θ) = 1 − τ s=1 (1 − p † is ) be the probability that individual i is captured at least once in the course of the study. Then, if θ is known, the Horvitz-Thompson (HT; Horvitz and Thompson 1952) estimator is unbiased for the population size N and an associated estimate of the variance of N (θ) is where, following from a Taylor series expansion of N ( θ) about N (θ),

Vector generalized linear and additive models
To extend the above linear models, we use VGLMs and VGAMs which we briefly describe in this section. These models fit within a large statistical regression framework which will be described in Yee (2015). The details here are purposely terse; readers are directed to Yee (2008Yee ( , 2010 for accessible overviews and examples, and Yee and Wild (1996) and Yee and Hastie (2003) for technical details.

Basics
Consider observations on independent pairs (x i , y i ), i = 1, . . . , n. We use "[−1]" to delete the first element, e.g., . For simplicity, we will occasionally drop the subscript i and simply write x = (x 1 , . . . , x d ) . Consider a single observation where y is a Q-dimensional vector. For the CR models of this paper, Q = τ when the response is entered as a matrix of 0s and 1s. The only exception is for the M 0 /M h where the aggregated counts may be inputted, see Section 5.2. VGLMs are defined through the model for the conditional density The jth linear predictor is then where β (j)k is the kth component of β j . In the CR context, we remind the reader that, as in Table 2, we have M = 2 for M bh , M = τ for M th and M = 2τ − 1 for M tbh .
In GLMs the linear predictors are used to model the means. The η j of VGLMs model the parameters of a model. In general, for a parameter θ j we take η j = g j (θ j ), j = 1, . . . , M and we say g j is a parameter link function. Write In practice we may wish to constrain the effect of a covariate to be the same for some of the η j and to have no effect for others. In our toy example, model M th with τ = M = 2, d = 3, we have which correspond to x i2 being the individual's weight and x i3 an indicator of gender say, then we have the constraints β (1)2 ≡ β (2)2 and β (1)3 ≡ β (2)3 . Then, with " * " denoting the parameters that are estimated, and we may write We can also write this as (noting that x i1 = 1) In general, for VGLMs, we represent the models as where H 1 , H 2 , . . . , H d are known constraint matrices of full column-rank (i.e., rank ncol(H k )), β * (k) is a vector containing a possibly reduced set of regression coefficients. Then we may write as an expression of (6) concentrating on columns rather than rows. Note that with no constraints at all, all H k = I M and β * (k) = β (k) . We need both (6) and (8) since we focus on the η j and at other times on the variables x k . The constraint matrices for common models are pre-programmed in VGAM and can be set up by using arguments such as parallel and zero found in VGAM family functions. Alternatively, there is the argument constraints where they may be explicitly inputted. Using constraints is less convenient but provides the full generality of its power.

Handling time-varying covariates
Often, the covariates may be time-varying, e.g., when using temperature as a covariate, then a different value is observed and measured for each occasion j for j = 1, . . . , τ . Again, using our toy example with M = 2, d = 3, and τ = 2, suppose we have time-dependent covariates x ij , j = 1, 2. We may have the model for the linear predictor on the two occasions. Here, x ikt is for the ith animal, kth explanatory variable and tth time. We write this model as Thus to handle time-varying covariates one needs the xij facility of VGAM (e.g., see Section 6.3), which allows a covariate to have different values for different η j through the general formula where x ikj is the value of variable x k for unit i for η j . The derivation of (9), followed by some examples are given in Yee (2010). Implementing this model requires specification of the diagonal elements of the matrices X * ik and we see its use in Section 6.3. Clearly, a model may include a mix of time-dependent and time-independent covariates. The model is then specified through the constraint matrices H k and the covariate matrices X # (ik) . Typically in CR experiments, the time-varying covariates will be environmental effects. Fitting timevarying individual covariates requires some interpolation when an individual is not captured and is beyond the scope of the present work.

VGAMs
VGAMs replace the linear functions in (7) by smoothers such as splines. Hence, the central formula is is the rank of the constraint matrix for x k . Note that standard error bands are available upon plotting the estimated component functions (details at Yee and Wild (1996)), e.g., see Figure 1.

VGLMs and VGAMs applied to CR data
In this section we merge the results of Sections 2 and 3 to show how the eight models of Otis et al. (1978) can be fitted naturally within the VGLM/VGAM framework.

Linear predictors and constraint matrices
As in Section 3.1, we now write y i as the capture history vector for individual i. Written technically, y i ∈ ({0, 1}) τ \{0 τ } so that there is at least one 1 (capture). For simplicity let p c and p r be the capture and recapture probabilities. Recall that the value for M will depend on the CR model type and the number of capture occasions considered in the experiment, for example, consider model M b as in Table 2, then (η 1 , η 2 ) = (g(p c ), g(p r )) for some link function g, thus M = 2. The upper half of Table 3 gives these for the eight Otis et al. (1978) models. The lower half of Table 3 gives the names of the VGAM family function that fits those models. They work very similarly to the family argument of glm(), e.g., R> vglm(cbind(y1, y2, y3, y4, y5, y6)~weight + sex + age, + family = posbernoulli.t, data = pdata) is a simple call to fit a M th model. The response is a matrix containing 0 and 1 values only, and three individual covariates are used here. The argument name family was chosen for not necessitating glm() users learning a new argument name; and the concept of error distributions as for the GLM class does not carry over for VGLMs. Indeed, family denotes some full-likelihood specified statistical model worth fitting in its own right regardless of an 'error distribution' which may not make sense. Each family function has logit() as their default link, however, alternatives such as probit() and cloglog() are also permissible. Section 5 discusses the software side of VGAM in detail, and Section 6 gives more examples.
As noted above, constraint matrices are used to simplify complex models, e.g., model M tbh into model M th . The default constraint matrices for the M tbh (τ ) model are given in Table 4. These are easily constructed using the drop.b, parallel.b and parallel.t arguments in the family function. More generally, the H k may be inputted using the constraints argumentsee Yee (2008) and Yee (2010) for examples. It can be seen that the building blocks of the H k are 1, 0, I and O. This is because one wishes to constrain the effect of x k to be the same for capture and recapture probabilities. In general, we believe the H k in conjunction with (9) can accommodate all linear constraints between the estimated regression coefficients β (j)k .
For time-varying covariates models, the M diagonal elements x ikj in (9) correspond to the value of covariate x k at time j for individual i. These are inputted successively in order using the xij argument, e.g., as in Section 6.3.

Penalized likelihood and smoothing parameters
For each covariate x k , the smoothness of each component function f * (j)k in (10) can be controlled by the non-negative smoothing parameters λ (j)k . Yee and Wild (1994) show that, when vector splines are used as the smoother, the penalized conditional log-likelihood is maximized. Here, c is the logarithm of the conditional likelihood function (2). The penalized conditional likelihood (11) is a natural extension of the penalty approach described in Green and Silverman (1994) to models with multiple η j .
An important practical issue is to control for overfitting and smoothness in the model. The s() function used within vgam() signifies the smooth functions f * (j)k estimated by vector splines, and there is an argument spar for the smoothing parameters, and a relatively small (positive) value will mean much flexibility and wiggliness. As spar increases the solution converges to the least squares estimate. More commonly, the argument df is used, and this is known as the equivalent degrees of freedom (EDF). A value of unity means a linear fit, and the default is the value 4 which affords a reasonable amount of flexibility. ] ,

Software details for CR models in VGAM
Having presented the conditional likelihood (2) and VGLMs/VGAMs for CR models, we further discuss the fitting in VGAM. It is assumed that users are somewhat familiar with modelling in R and using glm() class objects. VGAM, authored by TWY, uses S4 classes. In order to present the new family functions developed for vglm() and vgam(), some additional preliminaries for VGAM are given below. Version 0.9-4 or later is assumed, and the latest prerelease version is available at http://www.stat.auckland.ac.nz/yee/VGAM/prerelease. Below we describe each of the eight models with their VGAM representation and their default values, we also give additional remarks. All eight models can be fit using posbernoulli.tb(), it is generally not recommended as it is less efficient in terms of memory requirements and speed.

Basic software details
All family functions except posbinomial() should have a n × τ capture history matrix as the response, preferably with column names. Indicators of the past capture of individual i, defined as z ij , are stored on VGAM objects as the cap.hist1 component in the extra slot. Also, there is a component called cap1 which indicates on which sampling occasion the first capture occurred.
As will be illustrated in Section 6.3, a fitted CR object stores the point estimate for the population size estimator ( Notice that in Table 3, the VGAM family functions have arguments such as parallel.b which may be assigned a logical or else a formula with a logical as the response. If it is a single logical then the function may or may not apply that constraint to the intercept. The formula is the most general and some care must be taken with the intercept term. Here are some examples of the syntax: • parallel.b = TRUE~x2 means a parallelism assumption is applied to variables x 2 and the intercept, since formulas include the intercept by default.
• parallel.b = TRUE~x2-1 means a parallelism assumption is applied to variable x 2 only.
• parallel.b = FALSE~0 means a parallelism assumption is applied to every variable including the intercept.  (Table 3). For example, setting posbernoulli.t(parallel.t = FALSE~0) constrains all the p j to be equal.

Models
If comparing all eight models using AIC() or BIC() then setting omit.constant = TRUE will allow for comparisons to be made with the positive-Bernoulli functions given below. The reason is that this omits the log-normalizing constant log τ τ y * i from its conditional loglikelihood so that it is comparable with the logarithm of (2).
An extreme case for M h is where p ij = p i with p i being parameters in their own right (Otis et al. 1978). While this could possibly be fitted by creating a covariate of the form factor(1:n) there would be far too many parameters for comfort. Such an extreme case is not recommended to avoid over-parameterization. p.small = 1e-04, no.warning = FALSE, type.fitted = c("probs", "onempall0")) NULL Note that for M t , capture probabilities are the same for each individual but may vary with time, i.e., H 0 : p ij = p j . One might wish to constrain the probabilities of a subset of sampling occasions to be equal by forming the appropriate constraint matrices.

Models M t /M th
Argument iprob is for an optional initial value for the probability, however all VGAM family functions are self-starting and usually do not need such input.

Models
so that the first coefficient β * (1)1 corresponds to the behavioural effect. Section 6.4 illustrates how the VGLM/VGAM framework can handle short-term and long-term behavioural effects. , drop.b = FALSE~1, type.fitted = c("likelihood.cond", "mean.uncond"), imethod = 1, iprob = NULL, p.small = 1e-04, no.warning = FALSE, ridge.constant = 1e-04, ridge.power = -4) NULL One would usually want to keep the behavioural effect to be equal over different sampling occasions, therefore parallel.b should be normally left to its default. Allowing it to be FALSE for a covariate x k means an additional τ − 1 parameters, something that is not warranted unless the data set is very large and/or the behavioural effect varies greatly over time.

Models
Arguments ridge.constant and ridge.power concern the working weight matrices and are explained in Appendix A.
Finally, we note that using fits the most general model. Its formula is effectively (5) for M = 2τ − 1, hence there are (2τ − 1)d regression coefficients in total-far too many for most data sets.

Examples
We present several examples using VGAM on both real-life and simulated CR data.

R>
Notice that the s() function was used to smooth over the weight covariate with the equivalent degrees of freedom set to 3. Plots of the estimated component functions against each covariate are given in Figure 1. In general, weight does seem to have a (positive) linear effect on the logit scale. Young deer mice appear more easily caught compared to adults, and gender seems to have a smaller effect than weight. A more formal test of linearity is R> summary(fit.bh) Call: vgam(formula = cbind(y1, y2, y3, y4, y5, y6)~s(weight, df = 3) + sex + age, family = posbernoulli.b, data = deermice) Names of additive predictors: logitlink(pcapture), logitlink(precapture) Dispersion Parameter for posbernoulli. and not surprisingly, this suggests there is no significant nonlinearity. This is in agreement with Section 6.1 of Hwang and Huggins (2011) who used kernel smoothing.
Section 6.4 reports a further analysis of the deermice data using a behavioural effect comprising of long-term and short-term memory.

Yellow-bellied Prinia
Our second example also uses a well-known and well-studied data set collected on the Yellowbellied Prinia (Prinia flaviventris), a common bird species located in Southeast Asia. A CR experiment was conducted at the Mai Po Nature Reserve in Hong Kong during 1991, where captured individuals had their wing lengths measured and fat index recorded. A total of τ = 19 weekly capture occasions were considered, where n = 151 distinct birds were captured. In previous studies, models M h and M th have both been fitted to these data, where both wing length and fat index were used as covariates. We focus our attention on the former model, and considered the posbinomial() function, with some further emphasis on demonstrating smoothing on covariates. The prinia data consists of four columns and rows corresponding to each observed individual:  The first two columns give the observed covariate values for each individual, followed by the number of times each individual was captured/not captured respectively (columns 3-4). Notice that the wing length (length) was standardized here. We considered smoothing over the wing length, and now plotted the fitted capture probabilities with and without fat content against wing length present, see Figure 2. Both the estimates for the population size and shape of the fitted capture probabilities with smoothing ( Figure 2) matched those in previous studies, e.g., see Figure 1 of Hwang and Huggins (2007). Notice that capture probabilities are larger for individuals with fat content present, also the approximate ±2 pointwise SEs become wider at the boundaries-this feature is commonly seen in smooths.
In closing, we refit model fit.tbh using Select() to illustrate the avoidance of manual specification of cumbersome formulas and response matrices with many columns. For example, suppose pdata is a data frame with columns y01, y02, . . . , y30. Then Select(pdata, "y") will return the matrix cbind(y01, y02, ..., y30) if there are no other variables beginning with "y".

Discussion
We have presented how the VGLM/VGAM framework naturally handles the conditionallikelihood and closed population CR models in a GLM-like manner. Recently, Stoklosa et al. (2011) proposed a partial likelihood approach for heterogeneous models with covariates. There, the recaptures of the observed individuals were modelled, which yielded a binomial distribution, and hence a GLM/GAM framework in R is also possible. However, some efficiency is lost, as any individuals captured only once on the last occasion are excluded. The advantage of partial likelihood is that the full range of GLM based techniques, which includes more than GAMs, are readily applicable. Huggins and Hwang (2007); Hwang and Huggins (2011) and Stoklosa and Huggins (2012) implemented smoothing on covariates for more general models, however these methods required implementing sophisticated coding for estimating the model parameters. Zwane and van der Heijden (2004) also used the VGAM package for smoothing and CR data, but considered multinomial logit models as an alternative to the conditional likelihood. We believe the methods here, based on spline smoothing and classical GAM, are a significant improvement in terms of ease of use, capability and efficiency.
When using any statistical software, the user must take a careful approach when analyzing and interpreting their output data. In our case, one must be careful when estimating the population via the HT estimator. Notice that (3) is a sum of the reciprocal of the estimated capture probabilities seen at least once, π i (θ). Hence, for very small π i (θ), the population size estimate may give a large and unrealistic value (this is also apparent when using the mra package and Rcapture which gives the warning message: The abundance estimation for this model can be unstable). To avoid this, Stoklosa and Huggins (2012) proposed a robust HT estimator which places a lower bound on π i (θ) to prevent it from giving unrealistically large values. In VGAM, a warning similar to Rcapture is also implemented, and there are arguments to control how close to 0 "very small" is and to suppress the warning entirely.
There are limitations for M h -type models, in that they rely on the very strong assumption that all the heterogeneity is explained by the unit-level covariates. This assumption is often not true, see, e.g., Rivest and Baillargeon (2014). To this end, a proposal is to add randomeffects to the VGLM class. This would result in the VGLMM class ("M" for mixed) which would be potentially very useful if developed successfully. Of course, VGLMMs would contain GLMMs (McCulloch et al. 2008) as a special case. Further future implementations also include: automatic smoothing parameter selection (via, say, generalized cross validation or AIC); including a bootstrap procedure as an alternative for standard errors.
GAMs are now a standard statistical tool in the modern data analyst's toolbox. With the exception of the above references, CR analysis has since been largely confined to a few regression coefficients (at most), and devoid of any data-driven exploratory analyses involving graphics. This work has sought to rectify this need by introducing GAM-like analyses using a unified statistical framework. Furthermore, the functions are easy to use and often can be invoked by a single line of code. Finally, we believe this work is a substantial improvement over other existing software for closed population estimation, and we have shown VGAM's favourable speed and reliability over other closed population CR R-packages.