Generating Correlated and/or Overdispersed Count Data: A SAS Implementation

Analysis of longitudinal count data has, for long, been done using a generalized linear mixed model (GLMM), in its Poisson-normal version, to account for correlation by specifying normal random effects. Univariate counts are often handled with the negativebinomial (NEGBIN) model taking into account overdispersion by use of gamma random effects. Inherently though, longitudinal count data commonly exhibit both features of correlation and overdispersion simultaneously, necessitating analysis methodology that can account for both. The introduction of the combined model (CM) by Molenberghs, Verbeke, and Demetrio (2007) and Molenberghs, Verbeke, Demetrio, and Vieira (2010) serves this purpose, not only for count data but for the general exponential family of distributions. Here, a Poisson model is specified as the parent distribution of the data with a normally distributed random effect at the subject or cluster level and/or a gamma distribution at observation level. The GLMM and NEGBIN model are special cases. Data can be simulated from (1) the general CM, with random effects, or, (2) its marginal version directly. This paper discusses an implementation of (1) in SAS software (SAS Inc. 2011). One needs to reflect on the mean of both the combined (hierarchical) and marginal models in order to generate correlated and/or overdispersed counts. A pre-specification of the desired marginal mean (in terms of covariates and marginal parameters), a marginal variance-covariance structure and the hierarchical mean (in terms of covariates and regression parameters) is required. The implied hierarchical parameters, the variance-covariance matrix of the random effects, and the variance-covariance matrix of the overdispersion part are then derived from which correlated Poisson data are generated. Sample calls of the SAS macro are presented as well as output.


Introduction
In many areas of research, correlated or otherwise hierarchical data are collected, be it in medical statistical, social, behavioral, and educational science, economy, biology, agriculture, etc. Molenberghs and Verbeke (2005) and Verbeke and Molenberghs (2000) describe methods for the analysis of discrete and continuous longitudinal data, respectively. Apart from analyzing data, leading to estimation and hypothesis testing, interest may be in evaluating the statistical properties of models fitted to data generated from certain mechanisms, to study operational characteristics, for design and sample size determination purposes, etc. Monte-Carlo (MC) simulation is a commonly followed route to this effect. Simulation of hierarchical normally distributed data is relatively straightforward, because simulation from univariate and multivariate normal distributions is easy.
Our focus is on hierarchical and/or overdispersed count data. A common basic model is the Poisson model. If overdispersion is present, i.e., the variance is unequal to the mean, the negative binomial (NEGBIN) model can be considered. This model can be generated by assuming that the Poisson parameter follows a gamma distribution. For correlated counts, the generalized linear mixed model (GLMM; Breslow and Clayton 1993) can be used. Here, the linear predictor of the Poisson model encompasses normal random effects. When both phenomena are present, the so-called combined model (CM) is relevant (Molenberghs et al. 2007(Molenberghs et al. , 2010; it allows for both normal and gamma random effects. Consider a longitudinal study, where a subject is measured repeatedly over time. The normal random effects induce correlation between different measures of the same subject. gamma random effects at each occasion take care of overdispersion. Should the gamma random effects be correlated, then a serial correlation process can be modeled. This parallels the flexibility of the general linear mixed models (Verbeke and Molenberghs 2000). Two comments are in place. First, the GLMM and CM allow for general outcome types, so there are versions for binary and timeto-event outcomes; these are not the focus here. Second, the GLMM and CM are often called conditional models, because they are formulated conditional on random effects. Marginal functions, namely, the mean and variance, are then derived by marginalizing/integrating over the random effects. While marginalizing over the normal random effects is relatively easy for count data, the expressions are more cumbersome than marginalizing over correlated or uncorrelated gamma random effects. This feature is exploited by offering implementations of the CM in a SAS 9.3 (SAS Inc. 2011) macro to generate correlated and/or overdispersed Poisson random variables. The method can also be used to generate purely serially correlated counts by dropping the normal random effects and choosing the "overdispersion part" to follow a serially correlated multivariate gamma distribution. The macro makes use of the SAS LOGISTIC procedure to create the design matrix by using a working response variable, which is deleted after it has served its purpose. At a data manipulation phase, the macro uses the GENMOD procedure to obtain parameter estimates corresponding to the design matrix, to eliminate columns from the design matrix corresponding to reference categories (when dummy coding is used for some covariates).
Proposals have been made in the literature. Examples include the overlapping sums (Madsen and Dalthorp 2007;Mardia 1970;Kocherlakota 1992, 2001); lognormal-Poisson hierarchy; a method named "normal-to-anything" (NORTA; Cario and Nelson 1997;Cario and Nelson 1998;Nelsen 2006;Mardia 1970;Li and Hammond 1975), and extensions thereof (Yahav and Shmueli 2012;Ghosh and Pasupathy 2012;Shin and Pasupathy 2010;Avramidis, Channouf, and L'Ecuyer 2009;Park and Shin 1998;Downer and Moser 2001). Next, we briefly describe these methods in turn. Overlapping sums (also known as the trivariate reduction) method generates a bivariate Poisson random vector from three or more independent Poisson random variables. NORTA presents a way of generating corre-lated random variables derived from the multivariate normal distribution given pre-specified marginals and a desired correlation structure. This is motivated by the ease with which multivariate normal random variables can be generated. A log-normal-Poisson hierarchy generates correlated counts by transforming correlated normals to log-normal variables, such that the two random variables are correlated via the correlation in the conditional means. See also Devroye (1986) for an overview on random variate generation. Most of these methods suffer from such limitations as: severe computational restrictions; difficulty achieving the target correlation; generated variables are required to be overdispersed; correlations constrained to be strictly positive; etc. These issues stem form the fact that the multivariate Poisson distribution grows in computational complexity with an increase in the dimensions due to the summations inherent in the distribution (Karlis 2003).
A review of the models used for correlated and/or overdispersed count data is presented in Section 2, while Section 3 focuses on data generation. Section 4 provides some specific details about the SAS macro, some examples of CMs to generate data from, how to generate these data using the macro and the output of the macro. Some concluding remarks are given in Section 5.

Overview of the models
We seek to generate correlated Poisson random variables for K independent subjects with subject i having measurements Y ij , i = 1, . . . , K, j = 1, . . . , n i . Our methodology requires a reflection on both the target marginal mean model as well as the hierarchical model. The target or desired mean must be pre-specified in terms of an n i ×p known design matrix X i and a corresponding p-dimensional vector of parameters for the marginal mean α. In addition, the covariates for the hierarchical component, i.e., conditional on the normal random effects, also need to be specified from which design matrices X i of dimension n i × m and Z i , an n i × q design matrix of subject i are created. An m-dimensional fixed-effects parameter vector β, D the normal random effects variance-covariance matrix of dimension q × q and Σ i , the n i × n i variance-covariance matrix of the gamma random effects, are then derived and used to generate the data. Details of these are given next.
In dealing with correlated count data, the so-called generalized linear mixed model (GLMM; Breslow and Clayton 1993;Wolfinger and O'Connell 1993;Molenberghs and Verbeke 2005) is a commonly used tool for analysis. The GLMM modifies the linear predictor in generalized linear models (GLM; Nelder and Wedderburn 1972;McCullagh and Nelder 1989;Agresti 2002), a class of fixed-effects models unifying linear, logistic, and Poisson regression models among others, to include unknown subject-specific or random effects in addition to the fixed effects. In practice, these random effects are usually assumed to follow a normal distribution, mainly for convenience and software availability. However, they can, in principle, be assumed to follow a different distribution than the normal. Specific to count data, the standard GLMM (also known as the Poisson-normal model) takes the following form: whereby the conditional distribution of the observations from a subject i given the random effects b i is Poisson with a rate parameter λ ij that is log-linearly related to covariates.
Count data are also notorious for overdispersion (Hinde and Demétrio 1998a,b;Breslow 1984;Lawless 1987;Molenberghs and Verbeke 2005), a feature that is usually accounted for by using the NEGBIN model. The NEGBIN model follows from a two-stage approach where in stage 1, a distribution is considered for the response variable, given a random effect f (y i |b i ), and in stage 2, a model for the random effects f (b i ) is specified, in this case the Poisson and gamma distributions, respectively. Combining the two stages and integrating as in Equation 2 over the random effects results in the NEGBIN marginal model. Extension to the case of correlated or hierarchical count data is rather easy as shown in Section 3.2 of Molenberghs et al. (2010). Fitting these models is done by maximizing the marginal likelihood Closed-form expressions for these integrals do not exist in all cases but Molenberghs et al. (2007) and Molenberghs et al. (2010) derived the marginal mean and covariance for the Poisson GLMM case as respectively, where J i is a matrix of 1's and M i is a diagonal matrix with entries µ ij . Also, the higher-order marginal moments and the marginal joint distribution can be derived in closed form for the Poisson case (Molenberghs et al. 2010).
Much as longitudinal count data usually exhibit both features of correlation and overdispersion, analysis until recently has been done predominantly by accounting only for one of these features but not both. The introduction of the CM by Booth, Casella, Friedl, and Hobert (2003), Molenberghs et al. (2007) and Molenberghs et al. (2010) quite flexibly accounts for these features simultaneously. The CM is expressed as where θ ij , the entries in θ i , are the overdispersion parameters introduced at observation level.
If the θ ij 's are assumed to be independent as is often done in practice, then the association is only induced by the b i and the θ ij would cover the overdispersion not accounted for by the normal random effects. As such, Σ i is reduced to a diagonal matrix. On the other hand, the θ ij can be correlated such that Σ i can take on more general structures, which implies the use of some form of multivariate gamma (MGamma) distribution. The marginal mean and the marginal variance-covariance matrix take the form: where M i = diag(µ i ) and Here, J i is a matrix of ones. Note that we make use of the fact that the Gamma random effects have unit mean. Note also that the overall variance-covariance matrix V i is made up, among others, of the normal random-effects variance D and the Gamma random-effects variance Σ i . The rules for creating Z i follow those of X i , given that both are design matrices.

Generation of correlated counts
As will be presented in Section 3.1, the marginal moments of the GLMM can be used to parsimoniously generate correlated count data with a pre-specified marginal mean function and such variance-covariance structures as compound symmetry and the one generated by random intercept and random slope models. In the GLMM case, however, the random effects used do not distinguish between correlation and overdispersion, a disadvantage that may lead to mis-representation of the random-effects variability and therefore necessitates the CM. Since marginal and conditional parameters do not have a 1:1 correspondence in the case of discrete data, unlike the continuous data case, caution has to be exercised when generating data for the context in mind, marginal or hierarchical. The use of random effects in the hierarchical context is satisfying while a marginal model or moments thereof are necessary in the marginal context in order to match context and process. We describe the process of count data generation, first, starting from a GLMM and then from a CM. A presentation of the algorithms for generating data from these two models follows in Sections 3.1 and 3.2.

The GLMM as a data generator
The GLMM can be used to generate correlated random variables with a desired structure. Given a marginal (log) mean, possibly depending on covariates X i , and a variance-covariance matrix for Y i , Algorithm 1 below generates random variables with this pre-specified structure.
Algorithm 1: 1. Derive the unknowns β and D of the GLMM by comparing the desired marginals µ i and V i with the marginals (3a) and (3b) from the GLMM.

Using
For example, in the case of compound symmetry (CS) and given the desired marginal mean as ln(µ i ) = X i α and desired variance-covariance structure as V i = M i + τ 2 J i (CS structure), the necessary unknowns in Step 1 of the above algorithm are derived by comparing [a] X i α = X i β + 0.5Z i DZ i [which is (3a) expressed in matrix form] for the marginal mean, and, The marginal mean parameter α was introduced in Section 2. Note that X i is the marginal design matrix, as opposed to X i , which is the design matrix for the fixed effects given the normal random effects. Solving [a] for β and [b] for D leads to: If the generalized inverse is not an inverse, the solution clearly is not unique. This is not a problem; it simply means that several choices of β and D are possible, which nevertheless all lead to the desired marginal structure. This is akin to the fact that there is a one-to-many map between a given marginal model on the one hand and the class of hierarchical models that marginalizes to it on the other. Any member of the class of hierarchical model can in principle be used as a data generator for the marginal structure.

The combined model as a data generator
While the GLMM is relatively standard, it allows for specific covariance and correlation structures only, i.e., these that are induced from the normal random effects. In contrast, the CM can be used to generate correlated Poisson random variables, with very general marginal covariance and correlation structures. This generality was conceptually presented in Molenberghs et al. (2007) and Molenberghs et al. (2010), but not used in practice. Precisely, the earlier paper was restricted to i.i.d. gamma variables, whereas here, they can follow any multivariate gamma distribution. The major extension relative to the GLMM is that there is a third unknown term in the CM, i.e., Σ i , the variance-covariance matrix for the overdispersion parameter(s). It allows for positive and negative correlations alike, provided the resulting marginal variance-covariance matrix V i is positive-definite. However, because this matrix is user-specified in our algorithms, checking for this property is trivial.
Given a desired mean and variance-covariance structure, Algorithm 2 generates the Poisson variates.
Algorithm 2: 1. Derive the unknowns β, D, and Σ i in the CM.
The necessary unknowns in Step 1 of Algorithm 2 are given by β as in (6a) and further:  Table 1: Possible combinations of the normal and gamma random effects in the context of count data. refers to combinations of the CM from which correlated and/or overdispersed data can be generated, while refers to the independent count data generation case which is not of interest in this paper. Also included is the SAS macro argument referring to the normal and gamma random effects and the corresponding value to be input for each case, when running macro CorrPoisson.
where notational conventions are as before.
An extension to generating purely serially correlated outcomes is done by removing the normal random effect and choosing θ i such that it follows a serially correlated multivariate gamma.
The general form of the CM (4), in the case of Poisson data, is that the normal random effects are correlated and the gamma random effects are also correlated. From this general case, several special cases can be derived. An overview of the possible combinations is presented in Table 1. The following special cases, which are also presented in Table 1, can be derived from the more general case: • A combination of normal and independent gamma random effects. This is the most commonly used form of the CM in which the normal random effects induce/account for correlation while the gamma random effects induce/account for overdispersion. It is model (4) but with Σ i diagonal.
• Normal random effects without gamma random effects. In this case, (4) reduces to (1) and data are generated as explained in Section 3.1. Here, the normal random effects induce/account for both correlation and overdispersion.
• No normal random effects, no gamma random effects. The absence of both random effects is equivalent to generating independent counts.
• No normal random effects, correlated gamma random effects. This implies that both correlation and overdispersion are induced via the gamma random effects. Thus, λ ij in (4) becomes exp(X ij β) and Σ i is fully general. This case allows for arbitrary marginal variance-covariance matrices between the Poisson variables, featuring positive and/or negative correlation, provided the resulting variance-covariance matrix is positive-definite.
• No normal random effects, independent gamma random effects. In this case, the CM reduces to the negative-binomial model which accounts for overdispersion but not correlation. Then, λ ij in (4) becomes exp(X ij β) and Σ i is diagonal.
Additional variations can be made by choosing for the normal random effects (random intercept + slope, or higher dimensions) to be either independent (D diagonal) or correlated.

Introduction
We have implemented the above discussed method of data generation for marginal models in SAS version 9.3. The SAS macro is called CorrPoisson. Data generation is done in SAS/IML preceded by some data manipulations. Given that we allow the θ ij 's to be correlated, some form of multivariate gamma distribution is required. We invoke the copula package (Hofert, Kojadinovic, Maechler, and Yan 2015;Hofert and Maechler 2011;Yan 2007;Kojadinovic and Yan 2010) available for the R environment for statistical computing and graphics (R Core Team 2016) to do this. We are aware of the experimental COPULA procedure in SAS 9.3 but follow a different route here. Instead, we have explored SAS 9.3's flexibility to call R in the SAS/IML procedure using the SUBMIT and ENDSUBMIT statements. This is done in the RinSASIML.sas program which is included in CorrPoisson.sas using the %inc statement. When calling CorrPoisson, the path to RinSASIML.sas must be defined correctly using the Include macro argument as, for example, Include = c:/temp in case RinSASIML.sas is saved in the c:/temp directory. Three issues deserve mentioning at this point, namely, (a) that R software (available from https://CRAN.R-project.org) has to be installed as well as the copula package (it is not installed by default in R), (b) that the SAS system has to be launched with the -RLANG system option to permit calling R from SAS (note that it is often convenient to insert this option in a SASV9.CFG file), and, (c) that calling R functions in SAS using the SUBMIT and ENDSUBMIT statements is a relatively new feature that was introduced in SAS/IML 9.22. As such, the macro will not work with SAS versions that preceded 9.22. We refer to the SAS/IML 9.22 user's guide (SAS Inc. 2011) or later versions for details about calling R from within SAS. The macro was developed and certainly works with SAS version 9.3 and R version 2.15.2. However, compatibility issues may arise while invoking R in SAS/IML depending on the versions of both R and SAS. Error messages that may indicate incompatibility are, for example, An installed version of R could not be found or The installed version of R cannot be used.  If this test results in errors and the user has the most recent version of R, it may be useful to explicitly tell SAS/IML which R version to use since SAS/IML tries to use the corresponding R_HOME variable. This can be done by launching SAS with a specification of the R_HOME variable as, for example, -RLANG -SET R_HOME "C:/program files/R/R-3.0.1", in which case R version 3.0.1 would be used. We emphasize that in order to have a successful execution of macro CorrPoisson, one should first ensure that (1) the connection between SAS and R is ok (by running the test program above), and (2) that the path to RinSASIML.sas is correctly defined. Otherwise, errors directly related to these two aspects may be encountered. Note that our macro can quite easily use any parameterization method possible in SAS LOGISTIC and GENMOD procedures for the design matrix for classification variables, e. (1) when a random intercept model is specified for the normal random effects and (2) when a random intercept and slope model is used for the normal random effects. Please note that while the methodology and the SAS macro allow for X i to be different from X i , our illustrations below set X i = X i , by leaving the macro argument Xcov2 blank, for purposes of checking whether or not the data generation follows intuition. A need to have X i and X i different is achieved by specifying the different covariates in the macro arguments Xcov and Xcov2, respectively. A general remark about the use of the macro is that all variables to be specified in the different macro arguments must be in the dataset specified in the CovData argument. A detailed description of the macro arguments is given in Tables 3 and 4.

The random intercept case
In this section, we demonstrate how to generate correlated count data from the CM using the macro that we have developed.

The combined model
Using a random intercept model for the normal random effects, we illustrate the generation of 4 correlated Poisson variables using the following CM: where, b 0i is the random intercept, the treatment allocation T i ∼ Bernoulli(0.5), t ij is the ordering of the jth observation in subject i = 1, . . . , K = 500 and j = 1, 2, 3, 4. The design matrices X i and X i in (6a) are created from the same covariates, namely, T i , t ij , and T i t ij while the desired marginal mean parameters and desired variance-covariance structure are

Calling macro CorrPoisson
The CM has several variations, as shown in Table 1 and described in Section 3.2. Generating data from these variations can be achieved by specifying the combinations of normal and gamma random effects using the macro arguments NormalRandEff and GammaRandEff, respectively. For illustration purposes, we shall only generate from the general case of the CM, namely, with correlated normal and correlated gamma random effects. This is done by specifying NormalRandEff = 2 and GammaRandEff = 2. More specifically, to generate data from CM (7), one has to first create the CovData dataset, referred to as temp in our example. This dataset contains variables id, trt and time. Here, trt stands for "treatment". Further, time indicates time in the study, either from a well-chosen baseline moment 0, recoded to be centered around 0, or otherwise depending on the study. See Tables 3 and 4 for more details about the CovData argument. Given temp, the following macro call would generate correlated count data from the CM with a random intercept model specified for the normal random effects: Generating from the other variations is done by specifying the macro arguments as shown in Tables 1, 3 and 4. Please note that setting NormalRandEff = 0 meaning no normal random effects, and GammaRandEff = 0 meaning no gamma random effects, would imply generating independent count data which is not of interest in this paper although it is also implemented in macro CorrPoisson, for completeness. Also note that not all macro arguments are shown in the above call. Those not shown are set to the default values. See Tables 3 and 4 for the full list of macro arguments that facilitate the data generation and their descriptions. Leaving the random argument blank implies the use of a random intercept model for the normal random effects.

Argument Description CovData
Dataset containing subject or cluster identification variable and covariates from which design matrices X i (and X i ) in (6a) is (are) created. By default, CovData = temp, for illustration purposes. Dataset temp contains covariates id, trt and time. It is a required argument and therefore has to be specified before running the macro. This dataset should take the "long" or hierarchical data structure as opposed to the wide format. Please note that all variables to be specified in the other macro arguments must be in this dataset.

ID
Subject identification variable in CovData.

OrderVar
Variable with ordering of the observations within subject. This should be contained in the CovData dataset. Default input is time, again for illustrating how the macro can be invoked.

Xcov
Covariates from which X i is created. It is also a required argument though for illustration purposes, Xcov = trt time trt*time. Please note that the intercept must not be included in the specification of Xcov. It is added at the creation of the X i design matrix.

Xcov2
Covariates from which X i is created. If left blank (the default), the macro sets Xcov2 = Xcov such that X i = X i thus using the same covariates for both these design matrices. Again, the intercept must not be included in the specification of Xcov2. It is added at the creation of the X i design matrix if different covariates from those in Xcov are specified.

Alpha
The desired marginal mean parameter estimates, without the reference categories. It can be specified, for example, as Alpha = 2.5 0.7 1.2 -0.45, corresponding to Intercept, trt=0, time and trt*time, respectively. Please note that the parameter estimate for the intercept must always be included.

Class
Specify all classification variables included in the Xcov argument, for example, Class = trt. Class2 Specify all classification variables included in the Xcov2 argument, for example, Class = gender. Note that if Xcov2 is left blank, this argument is ignored. outdata Name of final output dataset in which the generated outcome variable, named Y, is merged with the CovData dataset. It also contains the variancecovariance matrix of the gamma distribution (Σ i , GammaCov) in (4c), the corresponding correlation matrix of the gamma distribution (GammaCorr), the shape and scale parameters for the gamma distribution, the θ i 's (GamV) in (4c), the λ * ij 's (mu) in 4b and the random effects estimates (b i ) in (4d). By default, outdata = out.

Output
By default, CorrPoisson creates the output dataset out whose content is as described in Tables 3 and 4 under the outData argument and provided in a SAS dataset called out1.sas7bdat for the random intercept case. By setting Estimates = 1 (the default), the following output Argument Description param Parameterization method for the classification variables specified in the Class argument. Default is param = glm. See SAS documentation for details about the different methods at http://support.sas.com/documentation/cdl/en/statug/63347/ HTML/default/viewer.htm#statug_logistic_sect006.htm. random Covariates for the normal random effects. Default is blank hence random intercept model. To fit, for example, a random intercept and slope model, specify random = time, whereby time indicates the ordering of observations within a subject. meanNormalRE By default (meanNormalRE is left blank), macro CorrPoisson then assumes the normal random effects (b i ) to have zero mean. This can be changed by filling the mean in this argument. seed Set seed. Default is seed = 123. desiredVarCov Specify the desired variance-covariance matrix V i by inputting the entries of the upper triangular matrix row-wise (e.g., v11 v12 v22 or v11 v12 v13 v22 v23 v33). GammaRandEff Either 0, 1 or 2 for the gamma random effects where 0 = No gamma random effects, 1 = Independent gamma random effects, 2 = Correlated gamma random effects. Default is GammaRandEff = 2. NormalRandEff Either 0, 1 or 2 for the normal random effects where 0 = No normal random effects, 1 = Independent normal random effects, 2 = Correlated normal random effects. Default is NormalRandEff = 2.

Include
Specifies the path to RinSASIML.sas. It is a required argument and must be correctly specified for the macro to run well. is generally printed to the output window: 1. The CM combination of normal and gamma random effects from which data is being generated.
2. The sample size or number of clusters (K) in the CovData dataset.
3. Minimum and maximum number of measurements per cluster or subject.
4. Covariates used for the design matrix Z i of the normal random effects (including the intercept). For example, if a random intercept model is considered for the normal random effects, then "Normal random effects covariates = Intercept". In the case of a random intercept and slope model, then "Normal random effects covariates = Intercept time" where time is the variable specified in the macro argument OrderVar in Tables 3  and 4. 5. The desired or given marginal mean in terms of the covariates and parameters. For classification variables, e.g., trt in this case, an indication is given for the reference category being considered. Herein, 0 was considered as the reference leading to "trt0". 6. The desired or given variance-covariance matrix V i .
7. The covariates used, the desired marginal mean parameters ("alpha"), the derived conditional parameters ("beta"), the difference between the marginal and conditional parameters ("diff") and the variance covariance matrix D of the normal random effects. Please note that D is printed only if normal random effects are used.
Specifically, Figure 1 shows the results printed to the output window when a random intercept model is used for the normal random effects. From Figure 1 and as expected for a CM with a random intercept only model, a change (diff) in the marginal parameters (α) from the hierarchical parameters (β) is only in the intercept parameter but the other parameters remain practically unchanged.
To illustrate the functionality of the method given a low variance-covariance structure, the following macro call was used (for all the macro arguments, see Tables 3 and 4 Generate Correlated Poisson data from CM with Normal and No Gamma random effects **************************************************** Sample size (K) = 20 minimum number of measurements per subject = 2 maximum number of measurements per subject = 2 Normal random effects covariates = Intercept  For very small variance-covariance matrix V i , computational issues are more rampant as the method gets more sensitive also to the given α parameters. Figure 2 shows the output as printed to the output window. The output dataset is also presented in a SAS dataset named out1smallvar.sas7bdat.

The random intercept and slope case
Consider the following CM with a random intercept and slope model specified for the normal random effects: where notation and the desired marginals are as in Section 4.2. Since model (8) is similar to (7) with the only difference being the random slope (b 1i ) in model (8), the code shown in Section 4.2 is used to generate correlated count data. The only difference is that, in this case, the argument random = time. The macro will then create a dataset containing the generated data named out, in long form, and a print to the output window as in Figure 3. Unlike the random intercept case, the difference between α and β in the case of the random intercept and slope model is evident in both the intercept and time parameters, as expected. However, this pattern holds when all subjects in the CovData dataset have an equal number of measurements, when the random intercept and slope model for the normal random effects is specified. Note that for the random intercept model, a change is seen only in the intercept parameter, whether or not all subjects have an equal number of measurements. Not only does the macro allow for n i to be equal but it also allows n i to be unequal. To illustrate this, we generate a dataset with n i being between 2 and 4 measurements per subject. For one to fit the same model shown in Section 4.3 but with n i varying between 2 and 4 measurements, the same code as in the above call would be used. The difference, though, should be in the input dataset temp specified using the CovData argument. The results as printed to the output window are shown in Figure 4. One should note the difference in the two models from the minimum and maximum number of measurements per subject in the output.

Concluding remarks
We have presented SAS code to generate correlated Poisson data, in the context of marginal models, from the CM introduced by Molenberghs et al. (2007) and Molenberghs et al. (2010). The CM simultaneously accommodates correlation and overdispersion unexplained by the normal random effects. In the absence of correlation, the model simplifies to a negativebinomial model for overdispersion. On the other hand, in the absence of overdispersion, it simplifies to the GLMM. The model's flexible structure makes it a good candidate as a data generator reflecting the characteristics of interest, in this case, overdispersion and/or correlation. The CM is a convenient tool that mimics or incorporates these intrinsic features of correlated count data.
By marginalizing the distribution of the Poisson response conditional on the normal and gamma random effects, i.e., integrating out the random effects from the conditional density of the CM, one is able to generate data in the context of marginal models by comparing the mean and variance of the marginal model with the desired marginal mean and variance structures. The covariates determining the fixed-and random-effects design matrices are kept simple herein. This is not limiting in the sense that a specification of any covariates can be done as is needed. It is possible to encounter non-positive definite D matrices or negative entries along the diagonal of Σ i . These issues are important but different. A non-positive D matrix points to a non-allowable hierarchical model. However, the corresponding marginal model may then still be valid. If interest is restricted to marginal quantities, such a model may be retained. A non-valid Σ i is a problem in any case. Thus, keeping the inferential goals in mind, such model settings need to be given careful consideration. A purely marginal specification comes with the advantage that a broader parameter space is allowable. At the same time, a purely marginal specification inhibits the use of such hierarchical features as empirical Bayes predictions.
Because the CM is hierarchical, random variables with only positive correlations are generated due to restrictions of positive-definiteness on the random effects variance-covariance matrices. This may be a drawback for the CM, as is the case for some of the methods present in the literature for count data generation. However, a way to overcome this is to generate directly from the marginal model, arguably via correlated θ ij , of which the variance-covariance matrix Σ i then reflects the desired structure.
Execution errors of the form Unable to allocate sufficient memory may be encountered when attempts are made to generate sizes of datasets that use resources more than are available to SAS for the specific computer in use. One may then have to reduce the sample size or the number of measurements per subject or find a computer with greater memory capacity. Our experience is that this limitation is more rampant in 32-bit Windows operating systems which support matrices up to a maximum size of memory of 2GB of addressable space.
The SAS macro CorrPoisson.sas and the RinSASIML.sas program are available as supplementary material along with this manuscript and at the authors' website (http://ibiostat. be/software/overdispersion).