Analyzing Temperature Eﬀects on Mortality Within the R Environment: The Constrained Segmented Distributed Lag Parameterization

Here we present and discuss the R package modTempEﬀ including a set of functions aimed at modelling temperature eﬀects on mortality with time series data. The functions ﬁt a particular log linear model which allows to capture the two main features of mortality-temperature relationships: nonlinearity and distributed lag eﬀect. Penalized splines and segmented regression constitute the core of the modelling framework. We brieﬂy review the model and illustrate the functions throughout a simulated dataset.


Introduction
Health effects of air temperature are well-known. Some epidemiologic evidence may be found in Braga, Zanobetti, and Schwartz (2001) and Basu and Samet (2002) among the others. Temperature effects have been studied since a long time but in the last decades quantifying temperature effects has become quite important owing to greenhouse effect and consequent climatic changes (e.g., McGeehin and Mirabelli 2001).
The relationship between mortality and temperature is found to be V-shaped in the most of areas around the world: mortality reaches its minimum at some optimal value and increases as temperature gets colder or hotter. The temperature value where mortality reaches its minimum is sometimes referred as minimum mortality temperature and represents the threshold value beyond which mortality increases. Moreover it has been ascertained that the effect is not limited to the same day-exposure t, say, but it is extended to several next days t + 1, t + 2, . . . An in-depth analysis of temperature effects on mortality requires to account for the prolonged effects (the so-called distributed lag effect) and for nonlinearity (Armstrong 2006). Muggeo (2008a) presents a unified framework to model the temperature effects on mortality. Let E[Y t ] = µ t be the expected number of deaths for day t = 1, 2, . . . , T , z t the temperature value, and x t the vector of additional confounding explanatory variables, such as days of week, holidays, influenza epidemics, for instance. The proposed model assumes Y t ∼ Pois(µ t ) and ( 1) where (z − ψ 1 ) − = (z − ψ)I(z < ψ) and (z − ψ 2 ) + = (z − ψ 2 )I(z > ψ 2 ) are two linear spline functions which allow to model the effects of low and high temperatures, respectively below the cold threshold ψ 1 and above the heat threshold ψ 2 ; L 1 and L 2 are the two maximum lag values selected to assess the delayed effects of cold and heat (typically 15 to 60); x t δ contains typical confounders sketched above; finally β 1l 1 and β 2l 2 describe the effect of temperature on the response.
To obtain plausible and reasonable findings, the model also assumes that the DL curves are smooth functions. At this aim, the beta parameters are expressed by means of linear combinations of B-spline bases, where C = [C 1 , . . . , C P 1 ] and H = [H 1 , . . . , H P 2 ] are the two B-spline bases respectively of rank equal to P 1 and P 2 with relevant coefficients b 1 and b 2 (Eilers and Marx 1996;Wood 2006). To complete specification of the DL curves, a penalty term is imposed on the DL coefficients. The overall penalty J(λ) is where D 1 and D 2 are difference matrices (Eilers and Marx 1996), Υ 1 and Υ 2 are two diagonal known weight matrices (Muggeo 2008a). Therefore the DL coefficients are doubly penalised: a standard difference penalty (b 1 D 1 D 1 b 1 and b 2 D 2 D 2 b 2 ) on the spline coefficients to ensure smoothness over the whole lag range in the spirit of classical P-splines (Eilers and Marx 1996), and an additional varying ridge penalty affecting late DL coefficients to favour the DL curves approaching to zero at longer lags. Therefore the penalised log-likelihood may be written as The smoothing parameter λ = (λ 1 , λ 2 , ω 1 , ω 2 ) affects the estimate of all the model parameter, especially β 1 and β 2 , by regulating the smoothness of the DL curves via the spline coefficients b 1 and b 2 . To obtain values of the smoothing parameter λ = (λ 1 , λ 2 , ω 1 , ω 2 ) , a reasonable approach is to minimise an empirical version of the expected mean square error: for known scale parameter (and specifically equal to one in the Poisson case) we consider the so-called un-biased risk estimator (or scaled AIC) given by (Wood 2006) in which Dev = 2 y i log(y i /μ i ) is the usual model deviance, and edf are the effective degrees of freedom computed as trace of the hat matrix. Additional measures are available, including the well-known AIC (Akaike information criterion) and BIC (Bayesian information criterion). Selection of λ may be carried out efficiently by the method proposed in Wood (2004) and implemented in his mgcv package by the function gam.fit().
The estimation procedure which allows to bypass the problems related to the non-regularity of the segmented models (1) or (2) extends the previous work of Muggeo (2003) implemented in the R package segmented (Muggeo 2009), and it is described elsewhere (Muggeo 2008a). Details are omitted, but it is important to emphasise that estimation is performed iteratively in terms of the spline coefficients (rather than the DL coefficients) maximising a log-likelihood penalised for the (4), and supplying starting values only for the thresholds.
Poor clear-cut segmented relationships, due to short time series and/or a lot of zeroes in the observed counts and/or and many outliers, can make model estimation difficult; problematic convergence may suggest that the model being fitted is not supported by data (see model o2 in the section 3). However limited experience on some datasets, shows that these computational troubles are quite unlike in typical time series and the model is successfully fitted most of times. At the convergence estimates of the thresholds and their standard errors are readily available from the model output, while the DL curves are easily obtained using the B-spline bases. For instance, for the cold DL curve we get and in the same way it is possible to obtain the estimates for the heat curve.

Overview of the package
The R package modTempEff includes functions to fit the constrained segmented distributed lag model to epidemiological time series of temperature and mortality. The package is written in R code (R Development Core Team 2009), and it is available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=modTempEff. The package depends on the packages mgcv and splines, and it includes the following functions: tempeff (formula, data, fcontrol, etastart, drop.L, ...). This is the main function aimed at estimating the model.
csdl (z, psi, L, ridge, ndx, DL, diff.varying). This function is employed within the formula of tempeff() to set the temperature variable and the arguments necessary to fit a CSDL parameterization.
seas (x, ndx). This function allows to include in the linear predictor a nonparametric term for the long term trend and seasonality.
plot (x, which, var.bayes, add, delta.rr, level, ...). Method to plot the estimated DL curves for cold and heat.
tempeff() is used to specify the model: formula is the standard formula of the regression equation including confounders (e.g., days of week, influenza epidemics, ...) entering the model linearly, the temperature variable having a constrained segmented distributed lag relationship which has to be specified via the function csdl(), and the nonparametric term for long term trend and seasonality. data means the possible dataset where the variables are stored, and the control argument fcontrol refers to the some options of the fitting process returned fit.control(). Starting values may be supplied in etastart, and drop.L is an integer to specify whether the first drop.L observations have to be discarded before fitting. drop.L may be useful when several fitted models have to be compared and the same number of observations in each model is desirable, as explained below. The three dots ... accept arguments to be passed to csdl() as discussed below.
Actually tempeff() is based on tempeff.fit() which is not designed to be called from the user; in turn, tempeff.fit() uses gam.fit() from the mgcv package and splineDesign() from the splines package, both included in the R base distribution. tempeff() returns objects of class 'modTempEff' for which some methods exist as described below.
The function csdl() is employed to include in the model a variable having a csdl relationship with the response; this variable, specified via its first argument z, typically represents the mean or maximum daily temperature or sometimes the 'apparent' temperature accounting for humidity and pressure. The arguments psi and L are mandatory; one or two starting values have to supplied in psi depending on the number of the breakpoints to be estimated, while L defines the maximum lags within which to assess the effect of cold and heat, see L 1 and L 2 in formulas (1) and (2); Of course, the first max(L 1 , L 2 ) observations are removed when a CSDL is included. The optional arguments ndx, DL and diff.varying regulate smoothing of DL curves. ndx requires two integers (default to round(L/3)) to specify the 'apparent' dimension of the B-spline bases for cold and heat (P 1 and P 2 of formula (3)). The user may impose a global difference penalty on the spline coefficients (DL = FALSE, default) or on the DL coefficients themselves (DL = TRUE): empirical evidence has shown that the two options are unlike to lead different results. The argument diff.varying (default to FALSE) enables the user to specify a varying difference penalty, in the form l (β l − β l−1 ) 2 δ l with δ l being a monotonic function of lag l which penalises against large values of differences of DL coefficients. Some simulations have shown this varying difference penalty is substantially unnecessary and even not advised in practice, provided that a varying ridge penalty is used. The additional varying ridge penalties are specified via the argument ridge which defaults to NULL indicating no ridge penalty. Otherwise ridge is a length-two named list of characters written as a function of l; for instance two quadratic ridge penalties for both cold and heat may be set via ridge = list(cold = "l^2", heat = "l^2").
The function seas() allows to model the long term trend and seasonality in a non parametric way: a (usually rich) B-spline of rank ndx is used with a standard second-order difference penalty to prevent undersmoothing.
fit.control() allows to control the estimating algorithm, for instance via tol to regulate the tolerance value at which the algorithm stops, display to print the iterative process, and it.max to set the maximum number of the (outer) iterations of the algorithm. Each outer iteration comprises a few inner iterations managed by maxit.inner and GLM which defaults to FALSE. When GLM = TRUE, at each iteration an unpenalised GLM is fitted via glm.fit(), otherwise gam.fit() from mgcv is used. GLM = TRUE speeds up computations since the smoothing parameter is estimated only at the final iteration, and therefore it may be helpful with very large datasets when gam.fit() is unpractical; however some experience suggests to use GLM = FALSE to prevent premature convergence to non-optimal solutions.
Finally the methods print, summary, coef, anova and plot allows to visualize, extract and display the most important information of the fit; in particular coef returns L+1 coefficients of DL curves for cold and/or heat (depending on which), and plot portrays the fitted DL curves for cold and/or heat effects (depending on which) with pointwise confidence intervals at level level.

Fitting the model in R
We illustrate the aforementioned functions on a simulated dataset, including daily time series of natural mortality and temperature for five years (T = 1825). We load the package via

Loading required package: splines
Notice that modTempEff loads the packages splines and mgcv; the former is employed to build the B-spline bases of formula (3) via the function splineDesign(), and the latter is requested to perform 'selection' of the smoothing parameter λ and model estimation via the function gam.fit().
The typical dataset employed in the analysis of temperature (or air pollution) effect on health, comprises daily times series of the 'health variable' (mortality counts, all causes or causespecific) and meteorological/environmental variables. The dataset also includes variables corresponding to day, year, month, and day-of-week. The data being analysed have a similar appearance  The variables mtemp and dec1 are basic for our analysis, as they represent the daily time series of mean temperature and death counts; dec1 is actually simulated using estimates coming from a real data analysis. month, year, day, and wday are 'seasonal' variables respectively for month (12 level categorical variable), year (5 level categorical variable), day (integer t = 1, 2, . . . , 1825) and day of week (7 level categorical variable). Although additional variables, such as humidity or pressure, may be present in the dataset in practice, we neglect them for simplicity; the rationale of discussed methods and relevant code are unchanged. Figure 1 shows the daily time series of mortality counts and a raw scatterplot of mortality against temperature. Note the classical seasonal pattern in the daily series and the V-shaped relationship mortality-temperature. These figures may be obtained easily via R> layout(matrix(c(1, 1, 2), ncol = 3)) R> with(dataDeathTemp, plot(dec1, xlab = "day", ylab = "no. of deaths")) R> with(dataDeathTemp, plot(mtemp, dec1, xlab = "temperature", + ylab = "no. of deaths")) The interest centers on the temperature effects controlling for confounders, such as seasonality and days-of week, to be included in the regression equation. A starting model could consider the categorical variables year, month and day of week to control for seasonality, while the temperature effects could be modeled via equation (2) with maximum lags L 1 = L 2 = 60; we assume two B-spline bases of rank given by round(L)/3 (default) and no ridge penalty, i.e., the default ridge = NULL; the starting value for the threshold is set to 20 as suggested by the right plot in Figure 1. The option display = TRUE in fit.control() allows to monitor the estimation process by printing at each iteration the deviance and the current estimate of the threshold.
Following a recently consolidated approach in the analysis of mortality time series, we may improve the fit by including a smoother for seasonality rather than parametric terms for month, year and day; at this aim, we use penalised splines by modifying properly the formula, The smoother used to model the long term trend and seasonality of the observed series is a 'classical' P-spline, that is B-splines with a difference penalty on the coefficients. It should be emphasized that ndx, like ndx argument in csdl(), controls the rank, i.e., the size of the basis used for the penalized spline; the rank or 'apparent' dimension is ndx+3 since third degree splines are employed, but the 'actual' dimension, i.e., the effective degrees of freedom (edf ), are obtained as trace of the hat matrix, and they are typically much less than the corresponding basis size. Note that in the analysis of epidemiological time series, Psplines fitted by direct maximisation of the penalised log-likelihood should be preferred to the alternative nonparametric smoothers fitted by backfitting. The pitfall with the backfitting lies on the so-called 'concurvity' (i.e., a sort of nonparametric collinearity) which leads to biased estimates for the model parameters, especially for the cold effect (Ramsay, Burnett, and Krewski 2003;Muggeo 2004).
A raw inspection of the fitted models via the print method may be useful to assess the different fits, we can try to improve the model (such that each likelihood-based criterion is better), by also imposing the fitted DL curves to follow a more plausible biological pattern. Following arguments reported in Muggeo (2008a) and briefly sketched above, we could try to include a ridge penalty to allow the DL curves to approach to zero more rapidly. We set a linear varying ridge penalty on the DL coefficients i.e., Υ 1 = Υ 2 = diag(0, 1, 2, 3, . . . , 60) such that the varying ridge wiggliness measures become 60 l 1 =0 β 2 1l 1 l 1 and 60 l 2 =0 β 2 2l 2 l 2 , respectively for cold and heat. The argument ridge of csdl() has to be employed at this aim, and a natural call makes use of csdl (mtemp, 20, c(60, 60), ridge = list(cold = "l", heat = "l")) in the formula of tempeff(). However the ... in tempeff() accept arguments to be passed to csdl(); therefore we can simply type R> o.Ridge.l <-update(o.noRidge, ridge = list(cold = "l", heat = "l")) and note the formula reads correctly as R> formula(o.Ridge.l) dec1~factor(dweek) + seas(day, 30) + csdl(mtemp, psi = 20, L = c(60, 60), ridge = list(cold = "l", heat = "l"))

R> o
Each argument given in tempeff() via the ... is passed to csdl() by overwriting its possible previous value; this feature may be useful for the user interested in fitting and comparing different models, for instance by replacing the temperature variable ('apparent' rather than 'ambient' temperature, say) and/or by modifying the starting values for the breakpoint and/or the number of lags L.
The effect of the varying ridge penalty is to shrink the late DL coefficients throughout zero. However the amount of shrinkage depends on the weights (main diagonals of Υ 1 and Υ 2 ) and on the smoothing parameters ω 1 and ω 2 . While smoothing parameters are not modifiable as they are estimated by data, it is possible to increase weights to strengthen the effect of shrinkage. Quadratic or cubic weights lead to results similar to ones returned by a linear ridge (model o.Ridge.l) and relevant results are not shown. On the other hand a varying ridge penalty with quartic weights, such as 60 l 1 =0 β 2 1l 1 l 4 1 and 60 l 2 =0 β 2 2l 2 l 4 2 , leads to noteworthy outcome, R> o.Ridge.l4 <-update(o.noRidge, ridge = list(cold = "l^4", heat = "l^4")) R> o.Ridge.l4 Analysis of Deviance Table   Model 1: dec1~factor(dweek) + csdl(mtemp, psi = 20, L = c(60, 60)) + seas(day, 30) Model 2: dec1~factor(dweek) + seas(day, 30) + csdl(mtemp, psi = 20, L = c(60, 60), ridge = list(cold = "l", heat = "l")) Model 3: dec1~factor(dweek) + seas(day, 30) + csdl(mtemp, psi = 20, L = c(60, 60), ridge = list(cold = "l^4", heat = "l^4")) Resid. Df Resid. As expected, the deviance is lower for more complex models (higher edf ), however both AIC and BIC are lower for the model o.Ridge.l4 which uses less than 12.36 degrees of freedom as compared with the model with no ridge penalty. In conclusion, P-splines for seasonality and a quartic ridge penalty for the DL curves should be preferred. Of course, different combinations of varying ridge penalty patterns might be used for cold and heat, and comparisons could be made via statistical criteria and/or substantive assessments. We do not include these comparisons or a discussion in the present paper.
We can have a deeper glance of the 'selected' model via summary(),

R> summary(o.Ridge.l4)
Model: tempeff(formula = dec1~factor(dweek) + csdl(mtemp, psi = 20, L = c(60, 60)) + seas(day, 30), data = dataDeathTemp, fcontrol = fit.control(display = FALSE), ridge = list(cold = "l^4", heat = "l^4")) Seasonality ( Most of the printed information are rather self-explaining, although some points are noteworthy. The V variable shows the estimate and relevant t value of a re-parameterization of the threshold; at the convergence such values should be small. We suggest to warn about fits with large values of coefficients of the V variable. The reported net effect of temperature is the sum of the lag specific log relative risks for cold and heat. Such synthesis measure is aimed at quantifying the overall effect of cold and heat effects after accounting for possible 'harvesting'. The harvesting occurs when a positive association at short lags (positive lag-specific DL coefficients, typically within seven days) is followed by negative association at longer lags (negative lag-specific DL coefficients) which should suggest a 'deficit' of mortality. From an epidemiological point of view, this would emphasise that the effect of temperature is 'only' to anticipate the deaths by some days, probably affecting more vulnerable people, elderly or suffering persons (e.g., Hajat, Armstrong, Gouveia, and Wilkinson 2005). For the estimates of the net effect and of the threshold, two standard errors are computed. The 'frequentist' ones (SE.freq) are based on a sandwich formula involving penalised and unpenalised information matrix assuming fixed the smoothing parameter; the 'bayesian' standard errors (SE.bayes) also account, to some extend, for the smoothing parameters and therefore should be preferred as featured by better coverage properties, see Wood (2006) for details. Threshold estimate is also reported along with corresponding standard errors (bayesian and frequentist). Note the breakpoint is actually where L specifies the number of coefficients to be returned.
It is instructive to compare the fitted DL curves from the three aforementioned models. Figure 2 emphasises the nice end of the additional varying penalty. Plots on the left side show the fitted DL curves using only a global difference penalty. This output is substantially the one of the approach by Zanobetti, Wand, Schwartz, and Ryan (2000), although they deal with a linear (non-segmented) relationship using different basis and penalty. Note, however, the DL estimated curve (and its standard errors) does not approach to zero at late lags. This implies, for instance, that the estimate of the 'net' effect (sum of lag-specific effects) and corresponding standard error might depend on choice of maximum lag L.
While a simple difference penalty ensures smoothness over the lags, the varying ridge penalty allows the DL curve estimate to approach to zero. Unlike the only difference penalty, the additional varying ridge shrink the DL coefficients and their standard errors towards zero. DL coefficients near to zero at longer lags are biologically plausible since they assume a vanishing effect as lag increases. Moreover this feature makes choice of maximum lag a minor issue. Notice the argument new = FALSE has been set to display the plot on the current device; otherwise the default value new = TRUE would have opened a new device. Additional arguments for the plot method can be used to specify which DL has to be drawn (cold, heat or both of them), the level of the pointwise confidence intervals and which standard errors should be used (frequentist or bayesian). Note when the 'modTempEff' object has been called without a CSDL term, plot.modTempEff() still works by drawing the fitted nonparametric term for seasonality, provided that it has been included in the model. This method also works for fits obtained with fixed (i.e., not estimated) breakpoints via fcontrol = fit.control(it.max = 0). We conclude the illustration of the code by fitting model (1), namely two different thresholds for cold and heat. The only difference concerns the psi argument which now requires two starting values. Thus, R> o2 <-tempeff(dec1~day + factor(dweek) + factor(year) + factor(month) + + csdl (mtemp, psi = c(10, 20), L = c(60, 60)), + data = dataDeathTemp, fcontrol = fit.control(display = TRUE)) The algorithm does not converge after 20 iterations; in general, we could also increase the number of iterations or modify the starting values, but usually this does not change the result, see Muggeo (2008b) for a discussion about non convergence in segmented regression.
Broadly speaking, we can interpret such non-convergence as over-fitting, namely the fitted model is not supported by data and a 'bath-type' relationship (see equation (1)) is unlike. However it is always possible to inspect the fit to have a deeper assessment of the results,
We do not discuss further the selection between one-or two-breakpoints models, and following results reported in Tiwari, Cronin, Davis, Feuer, Yu, and Chib (2005), we suggest of using the BIC; at this aim the anova method includes the option test="BIC", R> anova(o.Ridge.l4, o2.Ridge.l4, test = "BIC")
The dataset shipped with the package also includes two additional simulated response counts: decNS which is not associated with mtemp, and dec2 which is associated via a CSDL parameterization with two breakpoints. The user may try to fit models with these responses and to assess different results.

Conclusion
We have discussed the practical implementation of a log-linear regression model to analyse the temperature effects on mortality with (daily) time series data. The model is estimated via penalised log-likelihood by means of the efficient gam.fit() function in the mgcv package. Estimates of distributed lag effect of the cold and heat, and threshold values are returned, along with additional linear parameters and a smoothing term to account for long term trend and seasonality.
There are several sides where the model and the package may be improved, specifically with regard to the effect of air pollution, e.g., particulate matter or ozone. Currently the pollutant may enter the model linearly in the formula of tempeff(), however it would be interesting to model it via an additional distributed lag effect with its proper maximum lag, size of the spline basis, and smoothing parameter to be estimated from data. Modelling pollutant via a separate DL does not pose particular problems and its implementation appears rather practicable: this feature could be included in the next release of the package. A much more challenging improvement would be to model the synergic effect of temperature and pollutant via two bivariate DL curves, cold-by-pollutant and heat-by-pollutant. The idea has been discussed in Muggeo (2007) assuming a fixed breakpoint ψ, but further investigation is needed to modify the model framework and the estimating procedure when the breakpoints have to be estimated. Another possible and non-straightforward extension of the package concerns the so-called case-crossover studies where each event day is matched to several control days according to a specified design (e.g., Janes, Sheppard, and Lumley 2005). The constrained segmented distributed lag parameterization may be still applied in theory, but the regression model to fit is not longer a log-linear model for count response but a conditional logit model with a binary response applied to an opportunely augmented dataset. Therefore fitting the constrained segmented distributed lag parameterization would rely on a different function, perhaps the clogit() from package survival (Therneau and Lumley 2009). More generally, the present package may be employed to model data from different fields; if model (1) or (2) hold and the response variable belongs to exponential family, modTempEff may be customized by modifying the family argument of the call to gam.fit().
However in its current implementation, at time of writing the model has been successfully employed in the analysis of temperature and mortality in Santiago and Palermo, two cities with different climatic conditions (Muggeo and Hajat 2009), and it is hoped that the package may be helpful for researchers involved in epidemiological studies of mortality and temperature.