MortalitySmooth : An R Package for Smoothing Poisson Counts with P-Splines

The MortalitySmooth package provides a framework for smoothing count data in both one- and two-dimensional settings. Although general in its purposes, the package is specifically tailored to demographers, actuaries, epidemiologists, and geneticists who may be interested in using a practical tool for smoothing mortality data over ages and/or years. The total number of deaths over a specified age- and year-interval is assumed to be Poisson-distributed, and P-splines and generalized linear array models are employed as a suitable regression methodology. Extra-Poisson variation can also be accommodated. Structured in an S3 object orientation system, MortalitySmooth has two main functions which fit the data and define two classes of objects: Mort1Dsmooth and Mort2Dsmooth. The methods for these classes (print, summary, plot, predict, and residuals) are also included. These features make it easy for users to extract and manipulate the outputs. In addition, a collection of mortality data is provided. This paper provides an overview of the design, aims, and principles of MortalitySmooth, as well as strategies for applying it and extending its use.


Introduction
In recent decades, advanced smoothing methods have been developed for analyzing and modeling data when parametric forms for describing functions are unreasonably rigid. Developers of statistical software have since implemented most of these methodologies, thereby broadening the user-base beyond the statistical circle.
These smoothing methods can be applied to demography in general, and to mortality analysis in particular. Mortality changes are crucial issues in various fields, and understanding mortality development is a key factor in actuarial studies, as well as in epidemiology and genetics.
From Section 3 onwards, we present the package itself. First, we demonstrate how users can select the data that are included in the package. Examples that show the potential of the package for smoothing counts in one and two dimensions are given in Section 4. Although most of the package's features can be used in either setting, we describe how to extrapolate, construct confidence intervals, and account for overdispersion in the unidimensional case. In Section 4.2, we illustrate interpolation and residuals in the 2D setting. The paper concludes with a critical discussion of the methodology, its performance in the package, and its possible extensions.

Introduction to P-splines
While we can easily generalize this approach for any given Poisson-distributed data, here we will formulate the model for mortality data. For a given population, at age i during year j, the total number of deaths, Y_ij, follows a Poisson distribution, Y_ij ~ P(E_ij · µ_ij) (Keiding 1990). The expected values are thus the product of exposures E_ij and the hazard (or force of mortality) µ_ij. In large populations, the size of the exposure population is typically estimated by averaging the population sizes at the beginning and at the end of the year. A fully nonparametric estimate of this hazard can be obtained by computing the actual death rates at the respective age and year: m_ij = Y_ij / E_ij.
For ease of manipulation and presentation, mortality data are commonly prepared as rectangular arrays. For each age and calendar year, we have the number of deaths and the number of exposures arranged in m × n matrices Y and E, respectively. The rows are indexed by age, and the columns are indexed by year. Mortality data can also be viewed in one dimension, such as age (or year), by fixing a specific column-year (or row-age). In this case, we will have a vector of death counts, y, and exposures, e.
MortalitySmooth employs P-splines for smoothing this type of data. Eilers and Marx (1996) developed a method which combines (fixed-knot) B-splines with a roughness penalty. Currie et al. (2004) extended the initial idea to modeling and forecasting mortality, and a comprehensive study of the methodology was presented in Currie et al. (2006). B-splines are bell-shaped curves composed of smoothly joined polynomial pieces of degree q. The points on the horizontal axis at which the pieces merge together are called "knots." We will use equally spaced knots. For details on B-splines and related algorithms, see De Boor (1978) and Dierckx (1993).
In the MortalitySmooth package, the function MortSmooth_bbase internally creates an equally-spaced B-spline basis over the abscissa of the data. The function reproduces an algorithm presented by Eilers and Marx (2010) using differences of truncated power functions. In the section "Splines and knots," Eilers and Marx (2010) noted that rounding errors in this algorithm can produce small negative entries in the basis. However, these errors are typically very small on present-day computers; in their example, the largest negative entry was of the order of 10^-10, which is practically irrelevant in a mortality analysis.

P-splines in one dimension for Poisson data
In the one-dimensional case, equally-spaced B-splines can be used as a regression basis: B ∈ R^(m×k). A coefficient, a_j, is associated with each B-spline. We then model our Poisson data as follows:

η = ln(E(y)) = ln(e) + ln(µ) = ln(e) + B a ,

where η is the linear predictor and, when dealing with Poisson data, the logarithm is used as the link function. The logarithm of the exposures, ln(e), is commonly called an offset.
Following the approach outlined by Eilers and Marx (1996), we can choose a relatively large number of B-splines with an additional penalty, P , on the regression coefficients. Whereas the large basis would result in over-fitting, the penalty forces the coefficients to vary more smoothly.
A penalized version of the iteratively reweighted least squares (IRWLS) algorithm yields estimates of the coefficients a:

(B' W̃ B + P) â = B' W̃ z̃ ,    (2)

where z̃ = (y − e μ̃)/(e μ̃) + B ã is the working dependent variable, B is again the regression matrix, μ̃ and ã denote current approximations to the solution, and W̃ is a diagonal matrix of weights. In the case of Poisson errors, as in Equation 3.7 in Currie et al. (2004), W̃ = diag(e μ̃).
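To make the update concrete, the following sketch runs the penalized IRWLS loop on simulated Poisson counts with an offset. It is illustrative NumPy code, not package internals: the random matrix B merely stands in for a B-spline basis, and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 60, 10
B = rng.random((m, k))                   # stand-in for a B-spline basis
e = np.full(m, 100.0)                    # exposures
a_true = np.linspace(-0.5, 0.5, k)
y = rng.poisson(e * np.exp(B @ a_true))  # simulated death counts

D = np.diff(np.eye(k), n=2, axis=0)      # second-order difference matrix
P = 10.0 * D.T @ D                       # penalty with lambda = 10

a = np.zeros(k)
for _ in range(100):
    mu = e * np.exp(B @ a)               # current fitted means
    z = (y - mu) / mu + B @ a            # working dependent variable
    W = mu                               # Poisson weights, diag(e * mu)
    a_new = np.linalg.solve(B.T @ (W[:, None] * B) + P, B.T @ (W * z))
    if np.max(np.abs(a_new - a)) < 1e-8:
        a = a_new
        break
    a = a_new
```

At convergence the penalized score equation B'(y − µ̂) = P â holds, which is a convenient check that the loop has been coded correctly.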
The only departure from the standard procedure for fitting a GLM with B-splines as regressors is the modification of B' W̃ B by the penalty P. This penalty is weighted by a positive regularization parameter λ and is given by

P = λ D_d' D_d .

The matrix D_d constructs dth order differences of the coefficients: D_d a = ∆^d a.
MortalitySmooth computes D_d as the (repeated) differencing of the identity matrix. For example, with k = 5, D_1 and D_2 are constructed as follows:

D_1 = | -1  1  0  0  0 |        D_2 = |  1 -2  1  0  0 |
      |  0 -1  1  0  0 |              |  0  1 -2  1  0 |
      |  0  0 -1  1  0 |              |  0  0  1 -2  1 |
      |  0  0  0 -1  1 |

By changing λ, the smoothness can be tuned. Hence, the parameter λ controls the trade-off between smoothness and model fidelity. A higher λ will lead to a higher penalty term, and, consequently, to smoother fitted values. Conversely, with a small λ, the unpenalized term gains importance and the smooth curve will be closer to the data.
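The repeated-differencing construction amounts to a couple of lines; here is an illustrative NumPy sketch of D_1, D_2, and the resulting penalty (names are ours, not package code):

```python
import numpy as np

k = 5
# D_d as the d-th order difference of the identity matrix
D1 = np.diff(np.eye(k), n=1, axis=0)   # 4 x 5, first row (-1, 1, 0, 0, 0)
D2 = np.diff(np.eye(k), n=2, axis=0)   # 3 x 5, first row (1, -2, 1, 0, 0)

# the penalty for a given smoothing parameter lambda
lam = 10.0
P = lam * D2.T @ D2
```

Note that D_2 annihilates any linear sequence of coefficients, so with a second-order penalty a linear trend in the coefficients remains unpenalized.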
It is noteworthy that, as pointed out by Eilers and Marx (2002a), (i) the number of equally spaced knots does not matter much, provided enough of them are chosen to ensure greater flexibility than is needed, and that (ii) the choice of the degree of the polynomials q for constructing the B-splines is almost irrelevant in the case of P-spline models.

P-spline models for two-dimensional mortality data
In order to model mortality data in arrays, we will seek to construct a basis for a two-dimensional regression with local support analogous to B in Equation 2. A detailed description of this generalization can be found in Eilers and Marx (2002b), Currie et al. (2004, 2006), and Eilers et al. (2006).
For the purposes of regression, we assume that the data are arranged as a column vector; that is, y = vec(Y). Accordingly, we arrange the matrix of exposures E so that e = vec(E). Let B_a, m × k_a, be the regression matrix of B-splines based on age x_a, and, similarly, let B_y, n × k_y, be the regression matrix of the explanatory variable for year x_y. The regression matrix for our two-dimensional model is the Kronecker product

B = B_y ⊗ B_a .

As in the unidimensional case, the number of columns in B_a and B_y is related to the number of knots chosen for the B-splines. Following the same concept as in the unidimensional case, we will use a relatively large number of equally spaced B-splines over both domains.
In a regression setting, the matrix B has an associated vector of regression coefficients a of length k_a k_y. Therefore, the model can be written as

η = ln(E(y)) = ln(e) + ln(µ) = ln(e) + (B_y ⊗ B_a) a = ln(e) + B a .    (5)
If we arrange the elements of a in a k_a × k_y matrix A, where a = vec(A), Equation 5 can be written as

ln(E(Y)) = ln(E) + ln(M) = ln(E) + B_a A B_y' .    (6)
From the definition of the Kronecker product, we can independently penalize the coefficients over the rows and columns of A. Let the difference matrices, D_a and D_y, act on the columns and rows of A, respectively. The penalized IRWLS in Equation 2 can still be used with the following two-dimensional penalty:

P = λ_a (I_ky ⊗ D_a' D_a) + λ_y (D_y' D_y ⊗ I_ka) ,

where λ_a and λ_y are the smoothing parameters used for age and year, respectively, and I_ka and I_ky are identity matrices of dimension k_a and k_y. More details can be found in Currie et al. (2004).
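Since a = vec(A), the quadratic form of this penalty splits into a column-wise age penalty and a row-wise year penalty on A, which can be checked numerically. An illustrative NumPy sketch (all names are ours):

```python
import numpy as np

ka, ky = 6, 5
lam_a, lam_y = 10.0, 100.0
Da = np.diff(np.eye(ka), n=2, axis=0)
Dy = np.diff(np.eye(ky), n=2, axis=0)

# P = lam_a * (I_ky kron Da'Da) + lam_y * (Dy'Dy kron I_ka)
P = (lam_a * np.kron(np.eye(ky), Da.T @ Da)
     + lam_y * np.kron(Dy.T @ Dy, np.eye(ka)))

# the quadratic form a'Pa equals the two matrix-form penalties on A
rng = np.random.default_rng(0)
A = rng.random((ka, ky))
a = A.flatten(order="F")              # a = vec(A), column-stacked
quad = a @ P @ a
quad_mat = (lam_a * np.sum((Da @ A) ** 2)    # differences down the columns
            + lam_y * np.sum((A @ Dy.T) ** 2))  # differences along the rows
```

The agreement of `quad` and `quad_mat` confirms that the Kronecker structure penalizes age and year directions independently.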
As in the unidimensional setting, B-splines provide enough flexibility to capture surface trends. The additional penalty reduces the effective dimension, leading to a rather parsimonious model with a smoothed, fitted surface. Another advantage of using two-dimensional P-splines is that doing so allows us to choose different smoothing parameters over ages and years, which makes the model very flexible.

A generalized linear array model approach
In theory, we can use Equation 2 to estimate a in a two-dimensional setting as well. This is possible when the problems are of moderate size. But with mortality data, the problems can be relatively large. For instance, with 90 ages and 70 years, and assuming B_a has 90/5 = 18 columns and B_y has 70/5 = 14 columns, the regression matrix would be of dimension (m · n) × (k_a · k_y) = 6300 × 252.
Even the computation of B' W̃ B is challenging in the context of two-dimensional smoothing, and, in general, the penalized IRWLS algorithm quickly runs into storage and computational difficulties.
In order to overcome these issues, Eilers et al. (2006) and Currie et al. (2006) propose an algorithm that takes advantage of the special structure of both the data, as a rectangular array, and the model matrix, as a tensor product. They call their model the generalized linear array model, or GLAM.
This concept can be illustrated in the computation of µ = vec(M ) in two dimensions, as in Equation 6, in which the matrix of force of mortality is constructed without using the large Kronecker product basis. In this manner, we save both space and time.
In a similar fashion, the inner product for the two-dimensional setting in Equation 2 can be obtained efficiently.
We define the row tensor function G(X) for an n × c matrix X as

G(X) = (X ⊗ 1') ⊙ (1' ⊗ X) ,

where 1 is a vector of ones of length c and ⊙ denotes element-wise multiplication.
Following Equation 2.6 in Currie et al. (2006), the inner product in Equation 2 for a two-dimensional setting can then be written as

B' W B = (B_y ⊗ B_a)' W (B_y ⊗ B_a) = G(B_a)' W̃ G(B_y) ,    (8)

where W̃ is the m × n matrix of weights, i.e., vec(W̃) = diag(W), and the equality holds after a suitable rearrangement of the right-hand side.
Equation 8 has two important computational properties. On the right-hand side, we avoid the computation of the potentially large matrix B = B y ⊗ B a . Moreover, the number of multiplications on the right-hand side is much smaller than the number on the left-hand side.
Section 4 in Currie et al. (2006) provides details of the rearranging and the re-dimensioning of the right-hand side that are required to give the left-hand side exactly. This procedure is done internally in MortalitySmooth. In the following, we present a small example in which the computing times and the object sizes are compared for the direct and the GLAM algorithms: the left-hand and the right-hand sides of Equation 8.
First we create artificial axes and associated B-spline bases:

R> a <- 1:50
R> m <- length(a)
R> y <- 1:100
R> n <- length(y)
R> Ba <- MortSmooth_bbase(x = a, xl = min(a), xr = max(a), ndx = 7, deg = 3)
R> ka <- ncol(Ba)
R> By <- MortSmooth_bbase(x = y, xl = min(y), xr = max(y), ndx = 17, deg = 3)
R> ky <- ncol(By)
R> w <- runif(m * n)

The last line creates a vector of pseudo-weights. The direct computation of the inner product in Equation 2 requires building the full Kronecker basis. The GLAM algorithm instead requires only the construction of each row tensor on the right-hand side of Equation 8, as well as the arrangement of the weights in W̃. Moreover, in MortalitySmooth the function MortSmooth_BWB recovers the left-hand side by re-dimensioning these elements, which is a very efficient operation.
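The equivalence of the two sides of Equation 8 can be verified numerically: the row-tensor product, once re-dimensioned, reproduces the direct Kronecker computation exactly. An illustrative NumPy sketch with small random matrices in place of the B-spline bases (names are ours, not package code):

```python
import numpy as np

def rowtensor(X):
    # G(X): row-wise tensor product, n x c^2; column r*c + s holds X[:, r] * X[:, s]
    n, c = X.shape
    return np.repeat(X, c, axis=1) * np.tile(X, (1, c))

rng = np.random.default_rng(2)
m, n, ka, ky = 6, 7, 3, 4
Ba, By = rng.random((m, ka)), rng.random((n, ky))
Wm = rng.random((m, n))                  # m x n matrix of weights, W-tilde

# direct: build the full Kronecker basis (mn x ka*ky)
B = np.kron(By, Ba)
w = Wm.flatten(order="F")                # vec(W-tilde) = diag(W)
direct = B.T @ (w[:, None] * B)

# GLAM: small row tensors, then re-dimension ka^2 x ky^2 -> (ka*ky) square
R = rowtensor(Ba).T @ Wm @ rowtensor(By)
glam = (R.reshape(ka, ka, ky, ky)
         .transpose(2, 0, 3, 1)          # reorder indices to (s1, r1, s2, r2)
         .reshape(ka * ky, ka * ky))
```

The two results agree to machine precision, while the GLAM side never forms the mn × (k_a k_y) matrix B.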

Smoothing parameter selection
In a P-spline approach, the trade-off between parsimony and accuracy is clearly driven by the choice of the smoothing parameter(s): we need to balance fidelity to the observed data against a small effective dimension.
In a GLM framework, a common measure of discrepancy is the deviance. For Poisson counts, it takes the following form:

Dev = 2 Σ_i { y_i ln(y_i / ŷ_i) − (y_i − ŷ_i) } .

On the other hand, we define the effective dimension, or the degrees of freedom of the model, to be ED = tr(H).
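The deviance computation can be sketched as follows (an illustrative NumPy helper, not a package function; the convention 0 · ln 0 = 0 handles cells with zero deaths):

```python
import numpy as np

def poisson_deviance(y, yhat):
    # Dev = 2 * sum{ y * ln(y / yhat) - (y - yhat) }, with 0 * ln(0) := 0
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    ratio = np.where(y > 0, y / yhat, 1.0)
    return 2.0 * np.sum(y * np.log(ratio) - (y - yhat))

dev = poisson_deviance([5, 0, 3], [4.2, 0.4, 3.1])
```

A perfect fit gives Dev = 0, and larger discrepancies between observed and fitted counts increase the deviance.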
We refer the reader to Hastie and Tibshirani (1990, p. 52) for a fuller treatment of this problem. The hat matrix H can be computed from the estimated linearized smoothing problem in Equation 2:

H = B (B' Ŵ B + P)^{-1} B' Ŵ ,

where Ŵ contains the weights of the last iteration after convergence. By fixing a value of λ or a combination (λ_a, λ_y), both the deviance and the effective dimension can be computed in the one-dimensional as well as in the two-dimensional setting.
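The way λ drives the effective dimension can be seen numerically from the hat-matrix formula. In the following illustrative NumPy sketch (a random matrix stands in for the B-spline basis), ED approaches the number of basis functions k as λ → 0 and the order of the penalty d as λ → ∞:

```python
import numpy as np

rng = np.random.default_rng(3)
m, k, d = 50, 10, 2
B = rng.random((m, k))                 # stand-in for a B-spline basis
Wh = np.diag(rng.random(m) + 0.5)      # converged IRWLS weights
D = np.diff(np.eye(k), n=d, axis=0)

def effective_dimension(lam):
    P = lam * D.T @ D
    H = B @ np.linalg.solve(B.T @ Wh @ B + P, B.T @ Wh)
    return np.trace(H)

# lambda -> 0: ED -> k (unpenalized GLM); lambda -> inf: ED -> d
ed_low, ed_high = effective_dimension(1e-8), effective_dimension(1e12)
```

The limiting value d reflects the fact that an infinitely strong d-th order penalty forces the fit toward a polynomial of degree d − 1.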
In a P-spline framework, we need some way of choosing "optimal" value(s) for the smoothing parameter(s) which can balance bias and variance in the model construction. Eilers and Marx (1996, 2002a) have suggested using the Akaike information criterion (AIC; Akaike 1973), given by

AIC = Dev + 2 · ED .

Alternatively, one can use the Bayesian information criterion (BIC; Schwarz 1978), which penalizes model complexity more heavily than AIC:

BIC = Dev + ln(m) · ED ,

where, in a two-dimensional setting, m is replaced by m · n. Once the smoothing parameter(s) has been selected, the system of equations described in (2) has a unique solution. As has already been pointed out by Currie et al. (2004, p. 285), a stiffer fit, such as the one given by the BIC, is preferred when modeling mortality data with P-splines. Therefore, MortalitySmooth has this criterion as the default option.
In practice, based on a given information criterion, two functions are employed for searching for the optimal smoothing parameter in the one-dimensional setting, and for the optimal (λ_a, λ_y) combination in the two-dimensional setting: Mort1Dsmooth_optimize and Mort2Dsmooth_optimize. Instead of conducting a plain grid search over a wide range of possible smoothing parameters, these functions make use of cleversearch from the R package svcm (Heim 2009). This approach efficiently explores a one- (or two-)dimensional space of smoothing parameters, optimizing each smoothing parameter in turn and moving at most one grid step up or down.
The smoothing parameter selection is made in two separate steps. In a unidimensional setting, two arguments control this optimization procedure: TOL2 and RANGE. The former controls the distance between adjacent smoothing parameters in the grid search, and by default is set to 0.5 on a log10 scale. RANGE contains the range of smoothing parameters over which the grid search is applied, and is set by default to (10^-4, 10^6), searched on the log-scale.
First, we search over the complete RANGE using a rough grid (four times TOL2) and the median of RANGE as an initial value of λ. Next, the procedure searches in the restricted range around the sub-optimal smoothing parameter, using a finer grid defined by TOL2. This procedure allows us to find a precise smoothing parameter in an efficient manner: we do not explore the full range of possible λ values, moving at most one grid step up or down. Furthermore, the two-step routine reduces the risk of finding a sub-optimal smoothing parameter. In two-dimensional cases, the controlling arguments are defined over both domains.
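The coarse-then-fine strategy can be sketched as follows. This is an illustrative simplification, not the package's stepwise cleversearch routine; the toy quadratic stands in for an information criterion evaluated at log10(λ):

```python
import numpy as np

def two_step_search(crit, lo=-4.0, hi=6.0, tol2=0.5):
    # Step 1: coarse grid over the full RANGE, step 4 * TOL2
    coarse = np.arange(lo, hi + 1e-9, 4 * tol2)
    best = coarse[np.argmin([crit(v) for v in coarse])]
    # Step 2: finer grid, step TOL2, restricted around the coarse optimum
    fine = np.arange(best - 4 * tol2, best + 4 * tol2 + 1e-9, tol2)
    fine = fine[(fine >= lo) & (fine <= hi)]
    return fine[np.argmin([crit(v) for v in fine])]

# toy convex criterion with a minimum at log10(lambda) = 2.3
log10_lam = two_step_search(lambda v: (v - 2.3) ** 2)
```

With the default TOL2 of 0.5 on the log10 scale, the returned value lies within half a grid step of the true minimizer while evaluating the criterion far fewer times than a full fine grid would.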
The default ranges and tolerance levels have proven adequate for many applications in mortality research. Nevertheless, two issues can arise when working with objective information criteria: (1) the information criterion may not present a clear minimum, and (2) the information criterion may not provide reasonable outcomes. The optimization procedure in MortalitySmooth reduces the risk associated with the first problem by evaluating the selected criterion over many values of the smoothing parameter(s). In the second case, neither BIC nor AIC may be suitable, mainly because the data present features that do not fit the assumptions behind these information criteria, such as strong seasonal effects. As a valid alternative, the user could design a personal grid search, or compute a specific information criterion that is more in accordance with the data at hand.

Overdispersed Poisson data
We frequently find that the Poisson assumption, which was mentioned above, can be relatively strong in demographic and actuarial data, and that, in particular situations, the presence of overdispersion cannot be neglected. Breslow (1984) and Cameron and Trivedi (1986) are among those who have recognized that counts may display extra-Poisson variation, or overdispersion, relative to a Poisson model.
However, the P-spline methods can be modified and adapted to this situation. Relaxing the classic Poisson assumption, we allow the variance to be adjusted as a function of the mean; specifically, the variance is proportional to the mean, and the additional dispersion parameter is the constant of proportionality. In the unidimensional case, we have

Var(y) = ψ² v(e µ) ,

where v(·) is the variance function and ψ² is the dispersion parameter. For the simple Poisson case, we would have v(e µ) = e µ and ψ² = 1.
Using this generalization, the penalized IRWLS algorithm is modified and a quasi-likelihood estimation is performed. The classic works in this area are by McCullagh and Nelder (1989) and Cameron and Trivedi (1998). For a recent treatment from a P-splines perspective, we refer the reader to Djeundje and Currie (2011).
It is worth pointing out here that, whereas maximizing the penalized likelihood with a B-spline basis leads to the system of equations in (2), the penalized quasi-likelihood version is given by

(B' (W̃/ψ²) B + P) â = B' (W̃/ψ²) z̃ ,

where the diagonal weight matrix is scaled by the dispersion parameter ψ². Since the working variable z̃ is not modified by ψ², the overdispersion parameter has an effect on the estimated coefficients a. For instance, with overdispersed data, i.e., ψ² > 1, the resulting fitted values will be smoother.
In a one-dimensional setting and for a fixed smoothing parameter, the dispersion parameter can be computed as follows:

ψ̂² = Dev / (m − ED) ,    (15)

but an information criterion is needed for estimating the smoothing parameter. Replacing the log-likelihood in both AIC and BIC with the quasi-likelihood, we obtain:

AIC = Dev/ψ̂² + 2 · ED ,    BIC = Dev/ψ̂² + ln(m) · ED .    (16)

For a two-dimensional setting, m is substituted with m · n in both Equation 15 and Equation 16. It is easy to see that, the larger the ψ², the smaller the first term in both AIC and BIC is. Such a result clearly leads to smoother outcomes.
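As a worked instance of Equations 15 and 16, one can plug in the deviance and effective dimension reported for the Danish example in Section 4 (Dev = 73.0779, ED = 6.358, m = 56):

```python
from math import log

dev, ed, m = 73.0779, 6.358, 56       # figures from the Danish example below
psi2 = dev / (m - ed)                 # Equation 15: about 1.472
bic_quasi = dev / psi2 + log(m) * ed  # Equation 16 with the updated psi^2
```

Note that, at the value of ψ̂² from Equation 15, the first term of the quasi-likelihood criteria reduces to m − ED, so the criteria are driven more strongly by the ED term, which favors smoother fits.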
In MortalitySmooth, overdispersion can be accommodated. In this case, the smoothing parameter(s) is first optimized with ψ 2 = 1, and then the dispersion parameter is computed by Equation 15. Given the updated ψ 2 , a criterion from Equation 16 is used for re-optimizing the smoothing parameter(s). This sequence of steps continues until convergence. Details of this iterative procedure are given in Section 3 of Djeundje and Currie (2011).

Data selection
As was mentioned above, P-splines are suitable for smoothing any kind of Poisson-distributed data. However, MortalitySmooth is specifically tailored to demographers and actuaries conducting mortality research. The package therefore provides examples and some data from this area. The user can supply external data, as well, provided they are conformable.
Data from four countries (Denmark, Japan, Sweden, and Switzerland) are contained in the object HMDdata. The data are taken from the Human Mortality Database (2011). Matrices of age-specific population, deaths, exposures, and death rates are provided.
For a given country, type of data, and sex, the data have the same number of rows (ages 0 to 110), whereas the ranges of years are subject to availability. HMDdata are accessible either manually or simply through the function selectHMDdata. In the following, we use the latter approach to select the death rates of Danish females from ages 50 to 100, and from 1950 to 2009:

R> x <- 50:100
R> y <- 1950:2009
R> mydata <- selectHMDdata(country = "Denmark", data = "Rates",
+                          sex = "Females", ages = x, years = y)

This creates an HMDdata object: an indexed matrix with its own attributes. Within the package, this object can be plotted directly:

R> plot(mydata, cut = 10, col.regions = terrain.colors(11))

The function plot recognizes the dimension(s) of the data and produces either a simple plot or a shaded contour map for one- and two-dimensional data, respectively. By default, death counts are plotted on the original scale, while death rates are presented on a log-scale. Figure 1 presents the outcome for the Danish data previously extracted.

Smoothing with MortalitySmooth
The main functions for smoothing in MortalitySmooth are Mort1Dsmooth and Mort2Dsmooth; they are designed to smooth one-and two-dimensional Poisson counts, respectively. They are both implemented in an object-oriented design using the standard S3 paradigm.
Internally, these functions use rich B-splines bases for ensuring flexibility (one knot every five data-points). By default, polynomials with q = 3 are chosen for constructing B-splines and the penalty matrix has d = 2. However, the number and the degree of the B-splines, as well as the order of differences, can be adjusted by the user.

Smoothing counts in one dimension
The function Mort1Dsmooth returns an object of class Mort1Dsmooth, which is a P-spline smoothing of the input data, with the degree of the B-splines and the order of the penalty possibly fixed by the user.
The main arguments are the abscissa of the data (x), a vector of counts (y), and the offset term (offset). In a mortality context, the first argument would be either the ages or the years under study, and death counts would be the response, y. As shown in Section 2.1, the offset is the logarithm of the exposure population.

R> fit1D$deviance
[1] 73.0779

Given these figures, the user could manually compute the overdispersion parameter ψ² from Equation 15: 73.0779/(56 − 6.358) = 1.472098. However, this value, along with others, can be read directly from the resulting summary of the model:
The MortalitySmooth package provides several support functions for a Mort1Dsmooth object. For instance, we can directly plot fit1D, obtaining the actual death rates on a log-scale along with the fitted rates:

Extrapolation
The function predict.Mort1Dsmooth can be used to compute standard errors for response and log-rates from Mort1Dsmooth objects. Furthermore, this function allows us to forecast mortality (see Currie et al. 2004). The following example takes the model previously fitted to Danish males at age 60 and uses it to produce forecasts to 2020, as well as to compute standard errors for the whole period.

R> x.new <- x[1]:2020
R> fit1Dfor <- predict(fit1D, newdata = x.new, se.fit = TRUE)

Alternatively, the user could employ the argument w and an augmented dataset to forecast mortality manually. It should be emphasized that the order of differences in the penalty term is crucial when P-splines are employed for extrapolating past trends. Specifically, the B-spline coefficients follow a polynomial sequence of degree d − 1 in the future trend; i.e., using the default order of differences d = 2, we get a linear extrapolation of the last two B-spline coefficients. This induces an approximately log-linear extrapolation of the death rates. We note that extrapolation depends on the spacing of the knots, with smaller spacing leading to a more volatile extrapolation.
To illustrate the differences when forecasting with other orders of differences, we estimate and extrapolate the same data using d = 1 and d = 3; i.e., constant and quadratic polynomials for the future log-rates, respectively.
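The polynomial behaviour of the extrapolated coefficients can be verified directly: appending coefficients that minimize the d-th order penalty alone continues the fitted ones as a polynomial of degree d − 1. An illustrative NumPy sketch with a hypothetical helper (extrapolate_coef is not a package function):

```python
import numpy as np

def extrapolate_coef(a, p, d=2):
    # append p coefficients minimizing ||D_d a_full||^2, with the fitted ones fixed
    k = len(a)
    D = np.diff(np.eye(k + p), n=d, axis=0)
    Dk, Dp = D[:, :k], D[:, k:]
    new = np.linalg.lstsq(Dp, -Dk @ a, rcond=None)[0]
    return np.concatenate([a, new])

a = np.array([0.0, 1.0, 2.5])
lin = extrapolate_coef(a, p=2, d=2)   # continues with the last slope, 1.5
con = extrapolate_coef(a, p=2, d=1)   # continues with the last value, 2.5
```

With d = 2 the continuation is linear in the last two coefficients, and with d = 1 it is constant at the last coefficient, matching the constant and (log-)linear forecasts discussed in the text.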

Confidence intervals
In a P-splines approach, standard errors can be obtained using the hat-matrix, and a predict function can be used for extracting them from a Mort1Dsmooth object. In the following, we extract the standard errors from the object fit1Dfor, and we plot actual death rates along with the fitted and forecasted ones and a 95% confidence interval. The outcomes are presented in Figure 3, along with the forecasted death rates from P-splines with d = 1 and d = 3. We use the 1945-2000 data and predict the 2001-2020 rates. As validation, we also plot the actual rates from 2001 to 2009.

Overdispersion
The level of overdispersion in the data for an optimal smoothing parameter is given by the value of ψ 2 , Equation 15. From a Mort1Dsmooth object we already know the value: 1.47.
A value of ψ² larger than one signals the presence of overdispersion. We can account for this phenomenon simply by setting the argument overdispersion equal to TRUE:

R> fit1Dover <- Mort1Dsmooth(x = x, y = y, offset = log(e),
+                            overdispersion = TRUE)

As was pointed out in Section 2.5, smoother results are obtained when we account for overdispersion. In other words, a ψ² > 1 within a quasi-likelihood approach for fitting P-splines leads to a larger smoothing parameter. In the case of the Danish data, the parameter λ more than triples, and, consequently, the effective dimension decreases:

R> c(fit1D$lambda, fit1D$df)
[1] 316.227766   6.357878
R> c(fit1Dover$lambda, fit1Dover$df)
[1] 1000.000000    4.819838

Smoothing counts in two dimensions
In order to smooth mortality over both ages and years, another function needs to be employed: Mort2Dsmooth. As in the unidimensional case, this function returns an object of class Mort2Dsmooth, which can be used by the support functions. In this generalization, the user needs to provide a series of ages and years, as well as two matrices with deaths and exposures. The arguments are named x for the abscissa (commonly the ages), y for the ordinate (here the years), and Z for the matrix of death counts over ages and years. Here, too, the logarithm of the matrix of the exposure population is needed as an offset in the model.
In the following example, we extract deaths and exposures for Swedish females, and smooth these data over age and time. The optimal combination of the two smoothing parameters (see Section 2.4) is by default obtained by minimizing the BIC.
R> x <- 10:100
R> y <- 1930:2010
R> Y <- selectHMDdata("Sweden", "Deaths", "Females", ages = x, years = y)
R> E <- selectHMDdata("Sweden", "Exposures", "Females", ages = x, years = y)
R> fit2D <- Mort2Dsmooth(x = x, y = y, Z = Y, offset = log(E))
R> fit2D
Call:
Mort2Dsmooth(x = x, y = y, Z = Y, offset = log(E))

Additional information on the fitted object can be found in its summary. In the two-dimensional setting as well, the function predict can be used for computing confidence intervals and for predicting values. A shaded contour map of both the actual and the smoothed death rates is automatically produced by plotting the Mort2Dsmooth object:

R> plot(fit2D, palette = "terrain.colors")

Figure 4 presents the outcome using the color palette terrain.colors; other color options are described in ?plot.Mort2Dsmooth. Again, P-splines demonstrate their ability to retain the signals coming from important mortality changes while leaving out random noise. This feature is particularly useful in a two-dimensional setting: the option to borrow information from neighboring years allows the model to capture the actual trends in the presence of large random fluctuations, which eases the smoothing of small populations and of cause-specific mortality. The function residuals can be applied to a Mort2Dsmooth object just as in the unidimensional setting. Deviance, Pearson, Anscombe, and working residuals can be extracted:

Residuals
R> res2D <- residuals(fit2D, type = "pearson")

Residuals can be plotted over the predictors of the model. This is done in Figure 5, in which residuals from a two-dimensional P-spline fit for the Swedish example are plotted over age and time. This two-dimensional representation makes it possible to discern where the model cannot capture the actual data. When applied to mortality data, this type of plot may reveal peculiar period shocks and cohort effects. For instance, in Figure 5 most residuals lie around zero, without showing any particular systematic features. The only exceptions are the high values during World War II, during which the model overestimates the actual death rates (the negative residuals are represented by dark pink). In R, this figure is produced by:

R> res.grid <- expand.grid(list(x = x, y = y))
R> res.grid$res <- c(res2D)
R> levelplot(res ~ y * x, data = res.grid,
+            at = c(min(res2D), -2, -1, 1, 2, max(res2D)))
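For reference, Pearson residuals for Poisson counts take the form r = (y − ŷ)/√ŷ, the Poisson variance being equal to the mean. A minimal NumPy sketch (not the package's internal code, which may additionally scale by the dispersion parameter):

```python
import numpy as np

def pearson_residuals(y, yhat):
    # r = (y - yhat) / sqrt(yhat): raw residual scaled by its estimated SD
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return (y - yhat) / np.sqrt(yhat)

r = pearson_residuals([4, 9, 1], [4.0, 8.0, 2.0])
```

Values around ±2 therefore mark cells roughly two standard deviations away from the fit, which motivates the cut points used in the levelplot call above.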

Interpolation
The weights within the estimation procedure are by default all equal to one. This argument can be modified, and can be used for extrapolation as well as for interpolation. Here, we present an example in which we randomly assign zero weights to about 80% of the data; i.e., we interpolate the complete surface using only one-fifth of the available information.
R> m <- length(x)
R> n <- length(y)
R> set.seed(1)
R> whi0 <- sample(x = 1:(m * n), size = 5900)
R> W <- matrix(1, m, n)
R> W[whi0] <- 0
R> fit2Dint <- Mort2Dsmooth(x = x, y = y, Z = Y, offset = log(E), W = W)
Warning message:
Interpolation and/or extrapolation is taking place

The warning message informs the user that part of the weights are set equal to zero; i.e., interpolation and/or extrapolation is taking place within the estimation procedure. The function predict.Mort2Dsmooth can also be used for interpolating mortality over ages and/or years when a fitted object is already available. As in the extrapolation example, the power of the penalties becomes clearly visible when interpolating: the B-spline coefficients lead to polynomial sequences of degree 2d − 1, which means cubic interpolation for the default d = 2.

Figure 6: Actual and fitted death rates from a 2D smoothing with P-splines over ages for selected years, logarithmic scale. λ_a and λ_y selected by BIC computed on less than 20% of the actual data (randomly chosen). Sweden, females, ages 10-100, 1930 to 2010.

Conclusions and outlook
In this paper, we present an R package for smoothing mortality using a P-spline approach: MortalitySmooth. While the routines can be applied to any Poisson-distributed data, the emphasis is on mortality data, and the package is structured as a user-friendly tool for demographers, actuaries, epidemiologists, and geneticists who deal with mortality research.
Smoothing death counts is crucial in many demographic analyses, and the outcomes of smoothing methods can be exploited for studies that involve changes in mortality rates, agestructure decompositions, and the construction of continuous life-tables.
We have outlined several reasons for choosing P-splines in this setting. In general, great flexibility is provided by a relatively large number of B-splines in the basis, while a reduction of the effective dimension is neatly produced by the additional penalty on the B-spline coefficients. In our mortality analysis, we modeled the death counts directly, assuming they are Poisson-distributed within each age-year interval. Because P-splines can be seen as a generalization of linear regression, this method is well suited for smoothing mortality trends.
We have also demonstrated that this methodology allows for a comprehensive analysis of the mortality developments over both ages and time with the construction of a two-dimensional basis.
The MortalitySmooth package offers two main functions, Mort1Dsmooth and Mort2Dsmooth, for fitting data in one- and two-dimensional settings. A user needs to provide only a few arguments for a default outcome: the death counts, the exposure population (if available), and the domain(s) over which the data are collected. Additional arguments can be used for setting features of the B-spline bases and the penalty term. Nevertheless, it is known that, in a P-spline approach, such choices make little difference. The only exception is the order of differences in cases of interpolation and extrapolation (see Eilers and Marx 2010).
Different criteria for selecting the degree of smoothness in the model are available. Additional support functions help the user to extract and plot outputs, as well as to compute confidence intervals and residuals from fitted objects. Extrapolation and interpolation are also allowed using predict functions. In addition, although not shown here, weights can be used to analyze mortality over an age-cohort grid. The user would first need to re-arrange both the deaths and the exposures into suitable matrices, and to construct zero weights for incomplete cohorts. The package would then smooth this structure, assuming smoothness over ages and cohorts. Richards, Kirkby, and Currie (2006) have shown that this structure may lead to a better fit than models based on age and period. Future work will include the age-cohort approach directly as an argument in the main functions.
In a relational model framework, different populations can be modeled in terms of their distances from a reference population. Biatat and Currie (2010) present a new approach for jointly modeling several populations, allowing for the comparison and classification of mortality in different countries. This model is embedded in the GLAM structure, and we are currently developing an extension of the presented package to incorporate this methodology.
Human mortality generally increases quite smoothly after about age 10, but it shows a steep decline between birth and this age due to infant and child mortality. This pattern is also observed in many other species, and global smoothers over the whole age range do not produce satisfactory outcomes. Camarda, Eilers, and Gampe (2010) present an alternative solution starting from a P-spline approach. Moreover, Kirkby and Currie (2010) have proposed a smoothing model that successfully deals with another specific feature of mortality: period shocks, such as wars, flu epidemics, hot summers, or cold winters. We plan to incorporate into the package these approaches for infant mortality and period shocks.
On the computational side, the last two model extensions can benefit from the implementation of sparse matrix methods within the GLAM algorithm. We plan to accommodate such methodology within MortalitySmooth. We also intend to include additional support functions and, to provide a more general perspective, other distributions from the exponential family, e.g., binomial and gamma. This revised package can thus be seen as a tool for estimating generalized linear array models in a comprehensive framework.