DClusterm : Model-Based Detection of Disease Clusters

The detection of regions with unusually high risk plays an important role in disease mapping and the analysis of public health data. In particular, the detection of groups of areas (i.e., clusters) where the risk is significantly high is often conducted by public health authorities. Many methods have been proposed for the detection of these disease clusters, most of them based on moving windows, such as Kulldorff's spatial scan statistic. Here we describe a model-based approach for the detection of disease clusters implemented in the DClusterm package. Our model-based approach is based on representing a large number of possible clusters by dummy variables and then fitting many generalized linear models to the data where these covariates are included one at a time. Cluster detection is done by performing a variable or model selection among all fitted models using different criteria. Because of our model-based approach, cluster detection can be performed using different types of likelihoods and latent effects. We cover the detection of spatial and spatio-temporal clusters, as well as how to account for covariates, zero-inflated datasets and overdispersion in the data.


Introduction
The analysis of epidemiological data at the small area level often involves accounting for possible risk factors and other important covariates using different types of regression models. However, it is not uncommon that, after a number of covariates have been accounted for, the residuals still show a spatial distribution that defines groups of areas with unusually high epidemiological risk. Hence, on many occasions it is not clear whether all spatial risk factors have been included in the model.
Public health data are often aggregated over small administrative areas due to issues of confidentiality. However, individual data are often available as well, and generalized linear models (GLMs; McCullagh and Nelder 1989) are a common framework used in disease mapping for modeling aggregated and individual-level information. GLMs not only model Poisson or binomial responses, but they can also link the outcome to a linear predictor on the covariates (and, possibly, other effects). However, until recently, it was not clear how to use GLMs to detect clusters of disease, i.e., groups of contiguous areas with significantly high risk.
In order to detect disease clusters, the most widely used method is probably the spatial scan statistic (SSS) proposed by Kulldorff (1997). Given a possible cluster z, the SSS compares the relative risk inside the cluster, θ_z, to the relative risk outside the cluster, θ_z̄, by testing the null hypothesis H_0: θ_z = θ_z̄ against the alternative H_1: θ_z > θ_z̄. This test is performed by means of a likelihood ratio statistic: many different possible clusters are tested in turn by changing the areas in z, and the most likely cluster (i.e., the one with the highest value of the test statistic) is selected. Significance of this cluster is assessed with a Monte Carlo test that also provides the p value.
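The likelihood ratio statistic for a single candidate cluster can be sketched in a few lines of R. This is a simplified form of the Poisson version for illustration only (it assumes internally standardized expected counts and is not the DClusterm implementation); the function name and toy data are our own:

```r
# Sketch of Kulldorff's Poisson likelihood ratio statistic for one
# candidate cluster z (illustrative; not the DClusterm implementation).
# O: observed counts, E: expected counts, z: logical index of areas in z.
kulldorff_stat <- function(O, E, z) {
  Oz <- sum(O[z]); Ez <- sum(E[z])       # inside the cluster
  Oo <- sum(O[!z]); Eo <- sum(E[!z])     # outside the cluster
  # The statistic is non-trivial only when the risk inside exceeds
  # the risk outside (theta_z > theta_zbar); otherwise it is 0
  if (Oz / Ez <= Oo / Eo) return(0)
  # log-likelihood ratio of the alternative vs. the null
  Oz * log(Oz / Ez) + Oo * log(Oo / Eo)
}

O <- c(10, 12, 3, 2, 1)
E <- c(5, 5, 4, 4, 4)
z <- c(TRUE, TRUE, FALSE, FALSE, FALSE)
kulldorff_stat(O, E, z)
```

In the SSS this value would be computed for every candidate z, and the z with the largest statistic would be the most likely cluster.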
Several R packages include an implementation of the SSS. DCluster (Gómez-Rubio, Ferrándiz-Ferragud, and López-Quílez 2005) implements the original version of the SSS for spatial binomial and Poisson data and can assess cluster significance by means of bootstrap or a Gumbel distribution. Package SpatialEpi (Kim and Wakefield 2018) also implements the SSS for binomial or Poisson data and assesses significance using a Monte Carlo test. rsatscan (Kleinman 2015) provides a set of functions to create the necessary data files and run the SaTScan software (for Windows) from R, so that spatio-temporal detection of disease clusters can also be performed. Package scanstatistics (Allévius 2018) implements a number of space-time scan statistics, including spatio-temporal versions of the SSS. As with the original SSS, these implementations lack a general way of adjusting for covariates and of assessing the significance of non-primary clusters.
Other packages that can be used for the detection of spatial clusters are described next. The smerc package (French 2019) implements different scan statistics, including some for the detection of non-circular clusters. Package AMOEBA (Valles 2014) includes a function to detect spatial clusters based on the Getis-Ord statistic. Bayesian Dirichlet processes for spatial clustering, as developed in Molitor, Papathomas, Jerrett, and Richardson (2010), are implemented in the PReMiuM package (Liverani, Hastie, Azizi, Papathomas, and Richardson 2015). A version of the spatial scan statistic adapted to detect clusters in social networks is available in package SNscan (Wang, Hsu, and Phoa 2016). Package SpatialEpiApp (Moraga 2017) includes a shiny application for the analysis of spatial and spatio-temporal disease mapping data that includes cluster detection using the SSS. Methods for the detection of spatial clusters with case-control point data are available in package smacpod (French 2018). The surveillance package (Salmon, Schumacher, and Höhle 2016; Meyer, Held, and Höhle 2017) also implements several methods for the detection of spatial and spatio-temporal clustering. Finally, package ClustGeo (Chavent, Kuentz, Labenne, and Saracco 2017) implements hierarchical clustering with spatial constraints that can be used to partition regions into different groups of contiguous areas.
In this paper we describe a novel approach to disease cluster detection that provides a link between GLMs and SSS. We have implemented this approach via the free and open source DClusterm package (Gómez-Rubio, Moraga, and Rowlingson 2019) for the R statistical software (R Core Team 2019). This approach involves fitting many different GLMs for which dummy variables that represent possible clusters are included one at a time. Cluster detection is based on selecting a number of dummy cluster variables using variable selection methods.
This paper is organized as follows. Section 2 will introduce the link between GLM and SSS. Next, in Section 3 we show how to include random effects in the detection of disease clusters to account for over-dispersion. The detection of disease clusters for zero-inflated data is discussed in Section 4. Section 5 describes how to extend these ideas to detect clusters in space and time. Finally, a discussion and some final remarks are provided in Section 6.

General description
An explicit link between GLMs and the SSS is provided by Jung (2009) and Zhang and Lin (2009) who show that the test statistic for a given cluster is equivalent to fitting a GLM using a cluster variable as predictor. This cluster variable is a dummy variable which is 1 for the areas in the cluster and 0 for the areas outside the cluster. By including cluster covariates in the model we obtain an estimate of the increased risk (as measured by its associated coefficient) and its significance (by means of the associated p value, for example).
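This equivalence is easy to illustrate with simulated data (a toy example with assumed values, not one of the paper's datasets): the estimated coefficient of the dummy cluster variable in a Poisson GLM recovers the log-relative risk inside the candidate cluster, and its p value measures its significance:

```r
# Toy illustration: the coefficient of a dummy cluster variable in a
# Poisson GLM estimates the log-relative risk inside the candidate
# cluster, and its p value assesses cluster significance.
set.seed(1)
n <- 100
E <- runif(n, 5, 15)                  # expected counts per area
cl <- rep(c(1, 0), c(10, 90))         # dummy: 1 inside the candidate cluster
O <- rpois(n, E * exp(0.7 * cl))      # true log-relative risk inside is 0.7
m <- glm(O ~ offset(log(E)) + cl, family = "poisson")
coef(m)["cl"]                          # estimate of the cluster effect
summary(m)$coefficients["cl", "Pr(>|z|)"]  # its p value
```

The fitted coefficient is close to the simulated log-relative risk of 0.7, and its p value is very small, mirroring what the SSS would report for this zone.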
We demonstrate our approach by first considering a Poisson model with expected counts E_i and observed cases O_i modeled as:

O_i ~ Po(µ_i), with log(µ_i) = log(E_i) + α + β x_i.

Here µ_i is the mean of the Poisson distribution (which is equal to E_i θ_i), log(E_i) denotes an offset to account for the population "exposure" (age, gender, etc.), x_i represents a covariate of the outcome of interest, and θ_i is the relative risk, which measures any deviation (increase or decrease) in the incidence of the disease from the expected number of cases. A simple and raw estimate of the relative risk θ_i that does not require covariates is the standardized mortality ratio (SMR), defined as O_i/E_i (Waller and Gotway 2004).
After fitting this model, we obtain estimates α̂ and β̂, which account for the effects of non-cluster covariates in the model. We then include the cluster covariates as follows:

log(µ_i) = log(E_i) + α̂ + β̂ x_i + γ_j c_i^(j).

The term log(E_i) + α̂ + β̂ x_i now acts as an overall offset, and c_i^(j) denotes a dummy variable associated with cluster j, with j taking values from 1 to the number of clusters being tested, defined as

c_i^(j) = 1 if area i belongs to cluster j, and 0 otherwise.

Here, γ_j is a measure of the risk within cluster j. We are only interested in clusters whose coefficient is significantly higher than 0 (i.e., increased risk); those with a significant negative coefficient will be ignored. We use model selection techniques to select the most important clusters in the study region. For example, the log-likelihood can be used to compare a model with a cluster variable to the null model (i.e., the one with no cluster covariates at all).
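Building the dummy variable c^(j) around a cluster centre can be sketched as follows. The helper function below is hypothetical (DClusterm builds these variables internally, driven by its fractpop argument): areas are added in order of increasing distance to the centre until a fraction of the total population is reached.

```r
# Sketch (hypothetical helper): build the dummy variable for a cluster
# centred at area `centre`, adding areas by increasing distance until a
# fraction `fractpop` of the total population is reached.
cluster_dummy <- function(xy, pop, centre, fractpop) {
  d <- sqrt((xy[, 1] - xy[centre, 1])^2 + (xy[, 2] - xy[centre, 2])^2)
  ord <- order(d)
  # keep adding areas while the cumulative population stays below the cap
  keep <- ord[cumsum(pop[ord]) <= fractpop * sum(pop)]
  as.integer(seq_len(nrow(xy)) %in% keep)
}

# Toy map: 5 areas on a line, equal populations
xy <- cbind(x = c(0, 1, 2, 5, 6), y = c(0, 0, 0, 0, 0))
pop <- c(100, 100, 100, 100, 100)
cluster_dummy(xy, pop, centre = 1, fractpop = 0.5)
```

For this toy map, only the two areas closest to the centre fit under the population cap, so the dummy variable is 1 for them and 0 elsewhere.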
It is possible to perform cluster detection without considering non-cluster covariates in the model. However, cluster detection that accounts for non-cluster covariates will likely yield a different number of clusters than a model fitted without these variables. By comparing the clusters detected in both models (with and without non-cluster covariates), we can find out which clusters are linked to underlying risk factors included in the model and which clusters remain unexplained by these other variables.
In the examples that we include in this paper, we will consider both scenarios to better understand how cluster detection works.
Similar approaches to the detection of disease clusters have been proposed by Bilancia and Demarinis (2014) and Gómez-Rubio, Moraga, and Molitor (2018).

Leukemia in upstate New York
We first consider an example where we model counts of leukemia cases in upstate New York. These data are provided in the NY8 dataset, available in package DClusterm, which records cases of leukemia in different census tracts in upstate New York. This data set has been analyzed by several authors (Waller, Turnbull, Clark, and Nasca 1992; Waller and Gotway 2004). The incidence of leukemia is thought to be linked to the use of trichloroethene (TCE) by several companies in the area. Figure 1 (using color palettes from RColorBrewer, Neuwirth 2014) shows the standardized mortality ratios of the census tracts and the locations of the industries using TCE.
In order to measure exposure, the inverse of the distance to the nearest TCE site has been used (variable PEXPOSURE in the dataset). In addition, two other socioeconomic covariates have been used: the proportion of people aged 65 or more (variable PCTAGE65P) and the proportion of people who own their home (variable PCTOWNHOME).
Hence, our first action is to load the DClusterm package and the NY8 dataset:

R> library("DClusterm")
R> data("NY8", package = "DClusterm")

A number of cases could not be linked to their actual location and were distributed uniformly over the study area, making the counts real numbers instead of integers (see Waller and Gotway 2004, for details). We have rounded these values, as we intend to use a Poisson likelihood for the analysis. Furthermore, expected counts are computed using the overall incidence ratio (i.e., total number of cases divided by the total population). Age-sex standardization is not possible in this case, as this information is not available in our dataset.
R> NY8$Observed <- round(NY8$Cases)
R> NY8$Expected <- NY8$POP8 * sum(NY8$Observed) / sum(NY8$POP8)
R> NY8$SMR <- NY8$Observed / NY8$Expected

The centres of the areas will be stored in columns named x and y. These will be used later when ordering the areas by increasing distance to the putative cluster centre. If the locations of the main populated cities were available, these could be used, but here we will use function coordinates() instead. As DClusterm is designed to detect clusters in space and time, it will always expect data to be of one of the classes in packages sp (for purely spatial data) or spacetime (Pebesma 2012, for spatio-temporal data).

Cluster detection with no covariates
First of all, a model with no covariates will be fitted and used as a baseline, so that other models can be compared to this one (for example, using the AIC or the log-likelihood) to assess whether they provide a better fit.
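The text below refers to this baseline model as ny.m0; its exact call is not shown here, but a null Poisson model of this kind can be sketched with toy data (assumed values). The key property is that, with only an intercept and the log-expected offset, the fitted intercept recovers the overall relative risk:

```r
# Sketch of a baseline (null) Poisson model with an offset, as used as
# model0 in the calls below; O and E are toy observed/expected counts.
E <- c(2, 4, 6, 8)
O <- c(3, 3, 7, 9)
m0 <- glm(O ~ offset(log(E)) + 1, family = "poisson")
exp(coef(m0))   # overall relative risk, equal to sum(O) / sum(E)
```

With the NY8 data, the analogous call would use Observed, Expected and the NY8 object in place of the toy vectors.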
Argument thegrid takes a 2-column data.frame (with names x and y) with the centres of possible clusters. If the grid of cluster centres is not defined, a rectangular grid is used, with the distance between adjacent points defined by argument step. Dummy cluster variables are created around these points by adding areas to the cluster until either a certain proportion of the population (defined by argument fractpop) or a certain distance from the centre (defined by argument radius) has been reached; the column used to compute the population at risk in the cluster is defined by argument ClusterSizeContribution. When testing for significant cluster variables, argument alpha defines the significance level. DetectClustersModel() can detect spatial and spatio-temporal clusters, which is why its first argument is a spatial or space-time object. The type of clusters investigated is defined by argument typeCluster. In the example we have used typeCluster = "S" in order to detect spatial clusters. For spatio-temporal clusters, typeCluster = "ST" must be used, as described in Section 5.
The output of DetectClustersModel() contains all clusters found at the significance level given by parameter alpha. In the detection process, multiple tests are performed simultaneously, which can increase the likelihood of incorrectly declaring a group of areas a cluster. This situation is known as the multiple testing problem, and several statistical methods have been developed to deal with it. Here, we use the Bonferroni correction, which applies a stricter significance level to declare a cluster (Dunn 1958). Specifically, a cluster is declared if its p value is smaller than alpha divided by the number of clusters tested.

Figure 1: Standardized mortality ratios and covariates of the incidence of leukemia in the upstate New York dataset. PEXPOSURE is the inverse of the distance to the nearest TCE site, PCTAGE65P is the proportion of people aged 65 or more, and PCTOWNHOME is the proportion of people who own their home. The red crosses in the top-left map mark the locations of the TCE sites.
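The correction itself is a one-line computation; a small sketch with hypothetical p values:

```r
# Sketch: Bonferroni-adjusted significance threshold when n.tests
# candidate clusters are tested at overall level alpha.
alpha <- 0.05
n.tests <- 240                        # e.g., number of candidate clusters
alpha.bonferroni <- alpha / n.tests
pvalues <- c(1e-5, 3e-4, 0.01)        # hypothetical cluster p values
pvalues < alpha.bonferroni            # which clusters survive the correction
```

Only the first hypothetical cluster survives, even though all three p values are below the unadjusted level of 0.05.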
In addition, overlapping clusters can be removed with function slimknclusters() (see below) so that only the clusters with the greatest significance are reported. This also helps reduce the multiple testing problem, as only one possible cluster around a given cluster centre is reported, similarly to the SSS.
The output of DetectClustersModel() has a column called alpha_bonferroni that denotes the level of significance adjusted for multiple testing using the Bonferroni correction. Thus, users can avoid the multiple testing problem by filtering the output and keeping only the clusters with p value less than alpha_bonferroni.
Other options include the number of replicates for Monte Carlo tests (argument R) if cluster assessment is done by simulation; by default, Monte Carlo tests are not used. DClusterm allows for parallel computing on several cores, as implemented in package parallel. The number of cores to use is defined in option mc.cores (here, 2 cores are used):

R> options(mc.cores = 2)

In the following example, to reduce the computational burden, we have only looked for clusters around 5 areas (whose rows in NY8 are defined in variable idxcl). In a real application we advise using all locations (area centroids or actual locations of individual data).
R> idxcl <- c(120, 12, 89, 139, 146)
R> ny.cl0 <- DetectClustersModel(NY8, thegrid = as.data.frame(NY8)[idxcl,
+      c("x", "y")], fractpop = 0.15, alpha = 0.05, radius = Inf,
+      step = NULL, typeCluster = "S", R = NULL, model0 = ny.m0,
+      ClusterSizeContribution = "POP8")

Below is a summary of the clusters detected. Dates are ignored, as this is a purely spatial cluster analysis; in the case of spatio-temporal clusters, the dates shown define the temporal range of the cluster. Values x and y define the cluster centre, size is the number of areas included in the cluster, statistic is the value of the test statistic and risk is the point estimate of the associated cluster coefficient. Also, note that only clusters with a p value lower than argument alpha are returned. Column cluster indicates whether the cluster is a significant one. Finally, note how detected clusters are ordered by increasing value of pvalue, so that the most significant clusters are reported first.

Function get.knclusters() can be used to extract the indices of the areas in the different clusters detected. Similarly, a vector with the cluster to which each area belongs can be obtained with function get.allknclusters(). The detected clusters plus a "no cluster" category are the levels of this categorical variable, which can be added to a spatial object as follows:

R> NY8$CLUSTER0 <- get.allknclusters(NY8, ny.cl0)

The areas and centres of the clusters detected are shown in Figure 2. Because of the lack of adjustment for covariates, these clusters show regions of high risk based on the raw data (observed and expected counts) alone.

Cluster detection after adjusting for covariates
Similarly, clusters can be detected after adjusting for significant risk factors. First of all, we will fit a Poisson regression with the three covariates mentioned earlier. As can be seen, two of them are clearly significant and the third one has a p value very close to 0.05:

R> ny.m1 <- glm(Observed ~ offset(log(Expected)) + PCTOWNHOME +
+      PCTAGE65P + PEXPOSURE, family = "poisson", data = NY8)
R> summary(ny.m1)

Call:
glm(formula = Observed ~ offset(log(Expected)) + PCTOWNHOME +
    PCTAGE65P + PEXPOSURE, family = "poisson", data = NY8)

As the model now adjusts for three covariates, the offset used in cluster detection will be different, so the detected clusters may also differ. Cluster detection is performed as in the previous example, but now using the model that adjusts for covariates:

R> ny.cl1 <- DetectClustersModel(NY8, thegrid = as.data.frame(NY8)[idxcl,
+      c("x", "y")], fractpop = 0.15, alpha = 0.05, typeCluster = "S",
+      R = NULL, model0 = ny.m1, ClusterSizeContribution = "POP8")
R> ny.cl1
R> NY8$CLUSTER1 <- get.allknclusters(NY8, ny.cl1)

Figure 2 shows the clusters detected after adjusting for covariates (using gridExtra, Auguie 2017, and latticeExtra, Sarkar and Andrews 2016). Compared to the example with no covariate adjustment, the most significant cluster has disappeared; it has been explained by the effect of the covariates. Another cluster is slightly smaller in size, which means that the covariates only explain a small part of it. In all remaining clusters, significance has been reduced by the effect of the covariates.
Finally, function slimknclusters() could have been used to remove overlapping clusters with less significant coefficients. This procedure sorts all detected clusters in decreasing order according to the value of statistic and removes those clusters that overlap with another cluster with a larger value of statistic. This is similar in spirit to the SSS, as only a single cluster around each area is considered, and it helps control the multiple testing problem inherent in the detection of disease clusters.
Function slimknclusters() takes three arguments. The first one is the spatial or spatio-temporal object used in the call to DetectClustersModel(), the second one is the table with the clusters detected, and the third one is the minimum cluster size to consider, i.e., the minimum number of areas a cluster must contain to be reported (1 by default).
For example, the following code could be used to reduce the set of possible clusters to those non-overlapping clusters of size at least three in the cluster detection with covariates:

R> slimknclusters(NY8, ny.cl1, 3)

Mixed-effects models for cluster detection
Random effects can be incorporated into our models to account for unmeasured risk factors. In this context, random effects will model area-specific intrinsic variation not explained by other covariates or cluster variables in the model. Cluster detection will be performed as usual, but we should keep in mind that by including random effects and dummy cluster variables there may be a clash between the two. By using dummy variables we are intentionally looking for unexplained spatial variation in the data. Hence, random effects should aim at modelling a different structure in the data. For example, if spatial random effects are included, these may tend to model the clusters and then an identifiability problem between them and the dummy cluster covariates may occur, which may lead to poor estimates of the random effects and/or the coefficients of the dummy cluster covariates.
For the Poisson case, this means that the relative risk is modelled as:

log(θ_i) = α + β x_i + γ_j c_i^(j) + u_i,

where u_i represents a random effect normally distributed with zero mean and variance σ²_u. Note that random effects can be defined to be spatially correlated, as suggested by Bilancia and Demarinis (2014). However, this can produce a clash between the dummy cluster variables and the random effects.
Random effects are particularly useful to model overdispersion in count data, which occurs when the variance of the data is greater than that implied by the statistical model. For example, under a Poisson model the mean and variance are equal. By including random effects in a Poisson GLM, the variance of the data can be different from the mean.
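A quick simulation (toy values) illustrates why this works: mixing a Poisson mean with independent normal random effects on the log scale inflates the marginal variance well above the marginal mean.

```r
# Sketch: independent normal random effects in the linear predictor make
# the marginal variance of a Poisson count exceed its mean.
set.seed(2)
mu <- 10
u <- rnorm(1e5, 0, 0.5)              # random effects, sd = 0.5
y <- rpois(1e5, mu * exp(u))         # Poisson counts with lognormal means
mean(y)                               # marginal mean
var(y)                                # clearly larger than mean(y)
```

Under a pure Poisson model these two quantities would agree; the gap between them is the extra-Poisson variation the random effects capture.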

Leukemia in upstate New York
We go back to the example on leukemia incidence in upstate New York to show how models can include random effects and, at the same time, detect disease clusters. In this particular example, random effects will be important in order to reflect any overdispersion present in the data. For this reason, our first step here is to test the data for overdispersion using Dean's P_B and P'_B score tests (see Dean 1992, for details). Although the p values have increased, they are both small and we may still consider the data to be overdispersed. Hence, we will aim at detecting clusters using a Poisson regression with independent random effects to account for census tract-level heterogeneity.
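As a sketch of what such a score test measures, one common form of Dean's P_B statistic can be computed by hand from the fitted means (this is an illustrative reimplementation with simulated data, not the function used in the paper):

```r
# Sketch: one common form of Dean's (1992) P_B score statistic for
# overdispersion in a Poisson model, computed from fitted means mu.
# Under the null (no overdispersion) it is approximately N(0, 1).
dean.PB <- function(y, mu) {
  sum((y - mu)^2 - y) / sqrt(2 * sum(mu^2))
}

set.seed(3)
y.pois <- rpois(200, 10)                              # no overdispersion
y.over <- rpois(200, 10 * exp(rnorm(200, 0, 0.4)))    # overdispersed counts
dean.PB(y.pois, rep(mean(y.pois), 200))  # small: consistent with Poisson
dean.PB(y.over, rep(mean(y.over), 200))  # large positive: overdispersion
```

A large positive value of the statistic is evidence against the pure Poisson model, which is what motivates adding random effects here.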

Cluster detection with no covariates
The baseline model with no covariate will now be fitted with function glmer() from package lme4 (Bates, Mächler, Bolker, and Walker 2015). This function is similar to glm() for GLMs but it will allow us to include random effects in the model as part of the formula argument.
R> library("lme4")
R> ny.mm0 <- glmer(Observed ~ offset(log(Expected)) + (1 |
+      AREANAME), data = as(NY8, "data.frame"), family = "poisson")
R> ny.clmm0 <- DetectClustersModel(NY8, thegrid = as.data.frame(NY8)[idxcl,
+      c("x", "y")], fractpop = 0.15, alpha = 0.05, typeCluster = "S",
+      R = NULL, model0 = ny.mm0, ClusterSizeContribution = "POP8")
R> ny.clmm0

After accounting for overdispersion, two clusters are detected. These are shown in Figure 3. The largest cluster detected before has now disappeared and only the two smallest clusters remain. This may be due to the fact that the first cluster had the smallest SMR and risk: allowing for extra variation makes the discrepancy between observed and expected counts less extreme, so this cluster is the first to be declared non-significant. The two remaining clusters have the same size and a very similar risk as in the previous case.

Cluster detection with covariates
The mixed-effects model with covariates can be fitted as follows:

R> ny.mm1 <- glmer(Observed ~ offset(log(Expected)) + PCTOWNHOME +
+      PCTAGE65P + PEXPOSURE + (1 | AREANAME), data = as(NY8, "data.frame"),
+      family = "poisson")

The estimates of the coefficients are similar to those of the fixed-effects model, but the associated p values are somewhat higher. This is probably due to the inclusion of the random effects in the model.
R> ny.clmm1 <- DetectClustersModel(NY8, thegrid = as.data.frame(NY8)[idxcl,
+      c("x", "y")], fractpop = 0.15, alpha = 0.05, typeCluster = "S",
+      R = NULL, model0 = ny.mm1, ClusterSizeContribution = "POP8")
R> ny.clmm1

Figure 3: Clusters detected with no covariate adjustment (left) and after adjusting for covariates (right) in the whole study region (top row) and Syracuse city (bottom row), using a mixed-effects model to account for overdispersion of the data. Areas in clusters are shaded in gray and cluster centers are represented by blue dots.

When overdispersion and covariates are included in the model, only one of the clusters remains, with size 1 and a slightly reduced estimate of the cluster coefficient. This should not come as a surprise, given that we have already seen how including covariates explains some of the extra variation, and that including random effects also reduces the significance of the clusters. A summary of the clusters detected can be found in Figure 3.

Zero-inflated models for cluster detection
The analysis of rare diseases often involves datasets where there are many areas with zero counts, leading to zero-inflated data. In this situation the Poisson or binomial likelihoods may not be suitable to fit a model, because there is an excess of areas with zeros with respect to the number predicted by the pure Poisson or binomial model, and other distributions for the data should be used. Gómez-Rubio and López-Quílez (2010) discuss this issue and they have extended model-based cluster detection methods to account for zero-inflation.
For count data, a zero-inflated Poisson model can be used. In this case, the observed number of cases comes from a mixture of a Poisson distribution and a distribution with all its mass at zero. The probabilities are as follows:

P(O_i = 0) = π_i + (1 − π_i) Po(0 | θ_i E_i)
P(O_i = k) = (1 − π_i) Po(k | θ_i E_i), k = 1, 2, ...

Here Po(O_i | θ_i E_i) represents the probability of observing O_i cases under a Poisson distribution with mean θ_i E_i, and π_i represents the proportion of observed zeroes that do not come from the Poisson distribution.
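This mixture is easy to write down directly; a small sketch (with an assumed helper name and toy parameter values) shows how zero-inflation raises the probability of a zero above the plain Poisson value while the probabilities still sum to one:

```r
# Sketch: probability mass function of the zero-inflated Poisson mixture,
# with structural-zero proportion p and Poisson mean mu (= theta * E).
dzip <- function(k, p, mu) {
  ifelse(k == 0,
         p + (1 - p) * dpois(0, mu),   # zeros: structural + Poisson zeros
         (1 - p) * dpois(k, mu))       # positive counts: scaled Poisson
}

dzip(0, p = 0.2, mu = 2)      # inflated probability of a zero
dpois(0, 2)                   # plain Poisson probability of a zero
sum(dzip(0:100, p = 0.2, mu = 2))  # probabilities sum to (essentially) 1
```

The excess zeros relative to dpois(0, 2) are exactly the structural zeros governed by π.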
Relative risks θ_i can be modeled using a log-linear model so that they depend on relevant risk factors. Also, it is common for all the π_i's to be taken equal to a single value π. Ugarte, Ibáñez, and Militino (2006) analyze the incidence of brain cancer in Navarre (Spain) at the health district level. Figure 4 shows the standardized mortality ratios. As can be seen, there are many areas where the SMR is zero because there are no cases in those areas. Ugarte, Ibáñez, and Militino (2004) have also reported significant zero-inflation in these data compared to a Poisson distribution. For cluster detection, the method implemented in DClusterm is similar to the one used in Gómez-Rubio and López-Quílez (2010) for the detection of disease clusters of rare diseases.

Cluster detection
Before starting our cluster detection, we will check the appropriateness of a Poisson distribution for these data. Fitting a log-linear model (with no covariates) gives the following model:

R> nav.m0 <- glm(OBSERVED ~ offset(log(EXPECTED)) + 1, family = "poisson",
+      data = brainnav)

Furthermore, a quasi-Poisson model has been fitted in order to assess any extra variation in the data:

R> nav.m0q <- glm(OBSERVED ~ offset(log(EXPECTED)) + 1,
+      data = brainnav, family = "quasipoisson")

The dispersion parameter in the previous model is 1.21, which is higher than 1. This may mean that the Poisson distribution is not appropriate in this case due to the excess of zeros.
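The dispersion parameter reported by the quasi-Poisson fit is the sum of squared Pearson residuals divided by the residual degrees of freedom; a sketch with simulated overdispersed counts (toy values, not the Navarre data) verifies this by hand:

```r
# Sketch: the quasi-Poisson dispersion parameter equals the sum of
# squared Pearson residuals over the residual degrees of freedom;
# values above 1 indicate extra-Poisson variation.
set.seed(4)
y <- rpois(200, 5 * exp(rnorm(200, 0, 0.4)))   # simulated overdispersed counts
m <- glm(y ~ 1, family = "quasipoisson")
phi <- sum(residuals(m, type = "pearson")^2) / df.residual(m)
phi                                  # dispersion computed by hand
summary(m)$dispersion                # same value, as reported by glm()
```

For the Navarre data the analogous quantity is the 1.21 quoted above.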
The column for the expected counts must be named Expected, and renaming it is our first step. Note also that, because only one time period is considered, the time stamp of the data takes a single value, the 1st of January of 1990.
Function DetectClustersModel() will perform the cluster detection using a zeroinfl model. This provides a very flexible way of handling different types of models in R for cluster detection.
An indicator of the areas in each of the detected clusters can be obtained with the knbinary() function. It returns a data.frame with all the dummy cluster variables, i.e., with as many columns as clusters and as many rows as areas. Entries are 1 if the corresponding area is in a cluster and 0 otherwise. These indicators can be used for a number of analyses, such as checking whether two clusters overlap or computing the number of times an area is included in a cluster. In the following example we obtain this representation of all the clusters detected, and the first one, the most significant, is added as a new column to the original SpatialPolygonsDataFrame to be displayed in Figure 5.
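With an area-by-cluster 0/1 matrix of this kind, both analyses mentioned above reduce to simple matrix operations. A sketch with a toy matrix (the real one would come from knbinary()):

```r
# Sketch: toy 0/1 area-by-cluster matrix, like the one that knbinary()
# returns (rows are areas, columns are detected clusters).
cl.matrix <- cbind(cl1 = c(1, 1, 0, 0, 0),
                   cl2 = c(0, 1, 1, 0, 0),
                   cl3 = c(0, 0, 0, 1, 1))

crossprod(cl.matrix)   # off-diagonal entries: areas shared by each pair
rowSums(cl.matrix)     # number of clusters each area belongs to
```

Here cl1 and cl2 overlap in one area (the second one), cl3 is disjoint from both, and no area belongs to more than two clusters.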

Spatio-temporal clusters
In the spatio-temporal case, the mean of the counts is modelled as

log(µ_i,t) = log(E_i,t) + α + β x_i,t + γ_j c_i,t^(j), (7)

where µ_i,t is the mean of area i at time t and c_i,t^(j) is a cluster dummy variable for spatio-temporal cluster j. Random effects can also be included in Equation 7, as described in Section 3, and zero-inflated distributions can also be considered, as in Section 4.
Note how the data are now indexed by space and time. Dummy cluster variables are defined as in the spatial case, by considering areas in the cluster according to their distance to the cluster centre, for data within a particular time period. When defining a temporal cluster, areas are aggregated using all possible temporal windows up to a predefined temporal range.
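The spatio-temporal dummy variable is therefore the spatial dummy restricted to a time window. A sketch with a hypothetical helper and toy long-format data:

```r
# Sketch (hypothetical helper): a spatio-temporal cluster dummy is 1 for
# observations whose area is in the spatial cluster AND whose time falls
# inside the cluster's temporal window [t.start, t.end].
st_dummy <- function(in.space, time, t.start, t.end) {
  as.integer(in.space & time >= t.start & time <= t.end)
}

# Toy long-format data: 3 areas observed over 4 years
area <- rep(1:3, times = 4)
year <- rep(1985:1988, each = 3)
in.space <- area %in% c(1, 2)          # areas 1 and 2 form the spatial part
st_dummy(in.space, year, 1986, 1987)
```

Only the observations for areas 1 and 2 during 1986-1987 (four space-time cells) take the value 1.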

Cluster detection with no covariates
Similarly to the purely spatial case, a Poisson model with no covariates is fitted first:

R> nm.m0 <- glm(Observed ~ offset(log(Expected)) + 1, family = "poisson",
+      data = brainst)
R> summary(nm.m0)

Call:
glm(formula = Observed ~ offset(log(Expected)) + 1, family = "poisson",
    data = brainst)

Before proceeding to disease cluster detection, we extract the centroids of the counties in New Mexico by applying function coordinates() to the sp slot of the STFDF object that stores the data.

R> NM.coords <-coordinates(brainst@sp)
Cluster detection with function DetectClustersModel() takes arguments minDateUser and maxDateUser to define the start and end dates of the period considered when looking for clusters. In this example, the time period has been constrained to 1985-1989. Furthermore, typeCluster = "ST" is used to look for spatio-temporal clusters. Note that it is also possible to look for spatial clusters by aggregating observed and expected cases over the whole temporal window with typeCluster = "S".

Cluster detection after adjusting for covariates
In this case, we will use the inverse of the distance to LANL as a covariate, as no other information about the areas is available. Distances have been computed using function spDistsN1() from package sp (Pebesma and Bivand 2005). Given that coordinates are expressed in longitude and latitude, great circle distances are used.
R> dst <- spDistsN1(pts = NM.coords, pt = losalamos, longlat = TRUE)

Distances need to be arranged so that values are available for all time periods. In this case, given that distances do not change over time, a vector is created by repeating the vector of distances as many times as there are time slots (years) in the dataset.
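This replication step can be sketched as follows (with toy sizes and toy distances standing in for the actual dataset dimensions):

```r
# Sketch: replicate a time-constant covariate across all time slots of a
# long-format space-time data set (toy sizes; the real dimensions come
# from the dataset itself).
n.areas <- 32
n.years <- 19
set.seed(5)
dst <- runif(n.areas, 10, 500)        # toy per-area distances (km)
IDLANL.long <- rep(1 / dst, n.years)  # inverse distance, repeated per year
length(IDLANL.long)                    # one value per area-year combination
```

Because rep() repeats the whole vector, the covariate values for the second year are identical to those for the first, as required for a time-constant covariate.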
R> nm.m1 <- glm(Observed ~ offset(log(Expected)) + IDLANL,
+      family = "poisson", data = brainst)
R> summary(nm.m1)

Call:
glm(formula = Observed ~ offset(log(Expected)) + IDLANL, family = "poisson",
    data = brainst)

Note how the included covariate is not significant. For illustrative purposes, we will still keep it in our model for the cluster detection. However, non-significant covariates will have a tiny impact on the clusters detected, as they will hardly produce a change in the expected number of cases.
In order to exploit the output from DetectClustersModel(), function get.stclusters() will take the data and this output to return a list with the indices of the areas in the clusters. The next example shows how to add a new variable to brainst with the space-time regions in the most significant cluster, which is displayed in Figure 7.

Discussion
In this paper we have introduced DClusterm, a new package for the R statistical computing software for the detection of disease clusters using a model-based approach, following recent developments by several authors. Clusters are represented by dummy variables that are introduced into a generalized linear model and different likelihoods can be used to account for different types of data. Because of this model-based approach, fixed effects (to consider relevant risk factors) and random effects (to account for other non-spatial unmeasured risk factors) can be put in the linear predictor as well.
In our examples we have considered well-known datasets to show how the functions in the DClusterm package tackle the problem of cluster detection. The results are similar to those found in relevant papers where the same datasets have been analyzed using a similar methodology. In particular, we have considered the case of the detection of clusters in space and space-time, zero-inflated data and over-dispersed data.