%CRTFASTGEEPWR: a SAS macro for power of the generalized estimating equations of multi-period cluster randomized trials with application to stepped wedge designs

Multi-period cluster randomized trials (CRTs) are increasingly used for the evaluation of interventions delivered at the group level. While generalized estimating equations (GEE) are commonly used to provide population-averaged inference in CRTs, there is a gap of general methods and statistical software tools for power calculation based on multi-parameter, within-cluster correlation structures suitable for multi-period CRTs that can accommodate both complete and incomplete designs. A computationally fast, non-simulation procedure for determining statistical power is described for the GEE analysis of complete and incomplete multi-period cluster randomized trials. The procedure is implemented via a SAS macro, \%CRTFASTGEEPWR, which is applicable to binary, count and continuous responses and several correlation structures in multi-period CRTs. The SAS macro is illustrated in the power calculation of two complete and two incomplete stepped wedge cluster randomized trial scenarios under different specifications of marginal mean model and within-cluster correlation structure. The proposed GEE power method is quite general as demonstrated in the SAS macro with numerous input options. The power procedure and macro can also be used in the planning of parallel and crossover CRTs in addition to cross-sectional and closed cohort stepped wedge trials.


Introduction
Cluster randomized trials (CRTs) are studies designed to evaluate interventions that operate at a group level, manipulate the physical or social environment, or cannot be delivered to individuals (Murray, Varnell, and Blitstein 2004).Regarding the different schedules of recruiting participants, CRTs are classified into cross-sectional, closed-cohort and open-cohort designs (Copas, Lewis, Thompson, Davey, Baio, and Hargreaves 2015).Cross-sectional designs recruit a unique set of individuals in each period, whereas closed-cohort designs follow the same individuals in clusters with repeated observations across periods.The open-cohort design, however, allows the addition of new members to an existing cohort in each period.Moreover, there are different types of CRT designs, including parallel, crossover, and stepped-wedge designs.As are instances of CRTs with outcomes measured in multiple periods, a steppedwedge cluster randomized trial (SW-CRT) is a type of CRT wherein clusters switch from control condition to treatment at randomly assigned time points (Hussey and Hughes 2007).Logistical and ethical considerations such as the need to deliver the intervention in stages and the desire to implement the intervention in all clusters are factors involved in the choice to use a SW-CRT (Turner, Li, Gallis, Prague, and Murray 2017).SW-CRTs may be preferred over other designs because they may facilitate cluster recruitment or offer increased power over other cluster randomized designs even with a limited number of clusters (Hemming and Taljaard 2020).Most study planning methods are for complete SW-CRTs, where all clusters have outcome data in all periods.However, incomplete stepped wedge designs are increasingly being deployed, whereby some cluster-periods do not record data due to logistical, resource, and patient-centered considerations (Kasza and Forbes 2019).Specifically, researchers may choose not to collect data in a cohort design or enroll new participants in a cross-sectional design during some cluster-periods.Hemming, Lilford, and Girling (2015) described two types of incompleteness in stepped wedge designs, one involving implementation periods and the other staggered study entry or termination of clusters.
Population-averaged analysis models with GEE estimation and inference have several advantages for the design and analysis of CRTs (Preisser, Young, Zaccaro, and Wolfson 2003).In contrast to generalized linear mixed models, the intervention effect from a populationaveraged model describes how the average response changes across the subsets of population defined by the treated and control cluster-periods.Additionally, because models for mean and correlation structures are separately specified, the interpretation of the marginal mean regression parameters remains the same regardless of the specification of working correlation model (Preisser, Lu, and Qaqish 2008).The link function is chosen to obtain inference on the target parameter of choice; for binary responses, the target parameters could be the odds ratio via the logit link, the risk ratio via the log link, or the risk difference via the identify link.Another advantage in using GEE for CRTs is that the estimation of mean model parameters is robust to misspecification of correlation structures in large samples.However, the specification of working independence correlation structure may result in efficiency loss that can be substantial when the cluster-period sizes are not all equal (Tian, Preisser, Esserman, Turner, Rathouz, and Li 2022).Furthermore, an over-simplified exchangeable correlation structure may give inaccurate power calculations when there is correlation decay in multi-period CRTs (Li 2020;Kasza, Hemming, Hooper, Matthews, and Forbes 2019).Thus, correlation structures informed by the study design are recommended for both study design and data analysis of stepped wedge and other multi-period CRTs.
Because CRTs are usually less powerful than individually randomized trials, determination of the proper number and allocation of study participants is critically important.A general, non-simulation, computationally fast power method based on the model-based variance was introduced by Rochon (1998).Moreover, as special cases of the fast GEE power method, simple-to-use sample size formulae for continuous responses and non-simulation procedures for binary responses have recently been proposed for complete, cross-sectional and cohort SW-CRTs (Li, Turner, and Preisser 2018).The methods extend earlier sample size formulae for GEE analysis of parallel-groups CRTs, including cross-sectional and cohort CRTs (Preisser et al. 2003(Preisser et al. , 2008) and multi-level CRTs (Reboussin, Preisser, Song, and Wolfson 2012;Teerenstra, Lu, Preisser, Van Achterberg, and Borm 2010;Wang, Turner, Preisser, and Li 2021).Prior work (Li et al. 2018;Li 2020) has shown that the analytical fast GEE power for marginal mean (e.g, intervention) parameters in complete SW-CRTs agrees well with simulated power based on GEE with finite-sample sandwich variance estimators for as few as eight clusters.Zhang, Preisser, Turner, Rathouz, Toles, and Li (2023b) shows, for CRTs with incomplete stepped wedge designs, fast GEE power based on the t-test agrees well with simulated power with the finite-sample sandwich variance estimator of (Kauermann and Carroll 2001) and twelve clusters.In scenarios with 6 clusters, it is similar to simulated power with estimations using the correctly specified model-based variance estimator.Those studies focus on the empirical performance of the analytical power rather than software tools to implement the method for different multi-period CRT designs.This article describes a comprehensive, analytical power method and accompanying SAS (SAS Institute Inc. 2020) macro for both complete and incomplete multi-period CRTs.The proposed GEE power procedure is motivated by the Connect-Home trial, which uses an incomplete, cross-sectional stepped wedge design to test an intervention to improve outcomes for rehabilitation patients transitioning from skilled nursing facilities (SNFs) to home-based care (Toles et al. 2021).The primary component of the intervention is an individualized Transition Plan of Care that SNF staff create to support the patient and caregiver at home.The incomplete design with six SNFs (clusters) and four patients per cluster-period (360 patients total), shown in Figure 1, was chosen based on considerations of internal validity and power under restrictions placed by available resources and logistical considerations.The black and orange boxes represent cluster-periods where no patients are enrolled, giving an incomplete design.Staggered enrollment of SNFs (clusters) is used to initiate data collection in stages with limited research staff resulting in the black boxes.The orange boxes represent the implementation phase, where two months are needed to activate the intensive intervention through training nursing home and home health care staff.The Connect-Home study design is distinct in several aspects.First, the number of periods (J = 22) is much greater than the number of sequences (S = 6), as compared to the standard SW-CRT in which J = S + 1 Hussey and Hughes (2007); Hemming et al. (2015); Li et al. (2018).Next, the incompleteness of the design adds to the complexity of power calculation.
Thus, the Connect-Home trial inspires an extension of the computationally fast, nonsimulation procedures for determining sample size and statistical power for GEE analysis from complete (Li et al. 2018) to incomplete SW-CRTs.Specifically, we introduce a SAS macro CRTFASTGEEPWR as a computationally efficient, non-simulation based routine for determining the statistical power in multi-period CRTs by further accommodating incompleteness at the design stage.Ouyang, Li, Preisser, and Taljaard (2022) reviewed 18 sample size calculators for stepped wedge CRTs and identified the SAS macro CRTFASTGEEPWR and the R (R Core Team 2023) package swdpwr (Chen, Zhou, Li, and Spiegelman 2022) as two of the most flexible approaches for non-simulation GEE power and sample size calculation for marginal models.Whereas swdpwr provides power calculation based on the average intervention effects mean model with categorical period effects for continuous and binary outcomes, exchangeable-type correlation structures, and equal cluster sizes for complete designs, the SAS macro CRTFASTGEEPWR additionally offers three options for coding intervention effects, a choice of linear effects or categorical period effects, count outcomes, time-decaying correlation structures, and unequal cluster-period sizes for both complete and incomplete multi-period CRT designs.The SAS macro CRTFASTGEEPWR is developed to accommodate binary, count and continuous responses in stepped wedge and other multi-period CRTs with a collection of commonly-seen multilevel intra-cluster correlation structures for incomplete cross-sectional and cohort designs.
The remainder of this article is organized as follows.Section 2 describes the populationaveraged models of interest with special consideration of correlation structures suitable for multi-period CRTs.Section 3 summarizes the general power procedure for GEE analysis with adaptation to stepped wedge designs.Section 4 presents the SAS macro details and four examples for complete and incomplete SW-CRTs.

Population-averaged model of multi-period CRTs
A unifying population-averaged model framework is described below for the design and statistical analysis of multi-period CRTs.The following notations apply to both cross-sectional and cohort multi-period CRTs, where there are J periods, S sequences, I clusters and I s clusters in sequence s, such that I = S s=1 I s .Let y ijk denote the response of the kth individual from cluster i during period j for i = 1, . . ., I, j = 1, . . ., J i , and k = 1, . . ., N ij , noting that J i ≤ J is the number of observed periods (i.e., with data collection) for cluster i, and N ij is the cluster-period size.Let µ ijk denote the marginal mean response of y ijk , which is related to the intervention effect δ and jth categorical period effect β j with link function g(.) via the marginal mean model Alternatively, categorical period effects in Equation 1 can be replaced with linear period effects: where β 0 is the intercept, and {t ij : j = 1, . . ., J i } are integer-valued calendar periods from the study design such that β 1 is the increment in the mean response on the scale of the link function for a unit increase in calendar period, and u ij is the treatment status in cluster i at period j.Linear period effects are particularly useful for, but are not limited to, multi-period CRTs with fewer clusters than periods (I < J), such as in the Connect-Home study.
Three types of intervention effect models are implemented in the SAS macro, the widely used average intervention effects model (Hussey and Hughes 2007;Hemming et al. 2015;Li et al. 2018), the incremental intervention effects model (Hughes, Granston, and Heagerty 2015) and the extended incremental intervention effects model.In the average intervention effects model, u ij is the period-specific treatment indicator (1 for intervention and 0 for control) for cluster i and δ is the intervention effect, irrespective of time on treatment, on the link function scale.
With the specification of u ij , population-averaged models could be used for different types of multi-period CRTs.In multi-period CRTs with parallel designs, u ij = 0 for all clusters under baseline period j, and u ij = 0, 1 in subsequent post-baseline periods depending on the treatment status.In the case of complete SW-CRTs, clusters switch from control condition to intervention at different periods.Thus, u ij = 0 in the control period, j = 1, . . ., b i and b i is the total periods under the control condition in cluster i, whereas, in intervention periods, Conversely, the incremental intervention effects model assumes a gradual uptake of the intervention such that its effect depends on time-on-treatment.Assuming a complete SW-CRT is specified with the incremental intervention effects model, the treatment status is u ij = 0 in the control period and u ij = (j − b i )/q, j = b i + 1, . . ., J i in the intervention period, where q > 0 is chosen to scale the intervention effect δ according to user specification.In the Connect-Home trial (Figure 1), q = 10 so that δ is defined as the full intervention effect on the link function scale after 10 periods, which corresponds to the number of intervention periods for the first SNF.Finally, an extended incremental intervention effects model is considered for designs additionally having a maintenance phase after the active intervention phase with q periods.In SW-CRTs with a maintenance phase, one research question relates to whether the patient benefit from the intervention as captured by the outcome is maintained after the active intervention period has ceased.The treatment status is u ij = (j − b i )/q, j = b i + 1, . . ., b i + q in the active intervention periods and u ij = 1 in the maintenance phase, for j > b i + q.
Specification of the marginal model is completed with the covariance structure of all individual responses in each cluster.Variance of the individual-level response is VAR(y ijk ) = v ijk ϕ where v ijk is the variance function and ϕ is the dispersion parameter.For binary responses, v ijk = µ ijk (1 − µ ijk ) and ϕ = 1, while for continuous responses following typical normal model assumptions, v ijk = 1 and ϕ is the constant variance.For count outcomes with Poisson distribution, v ijk = µ ijk and ϕ is the dispersion parameter.The macro assumes GEE is used to estimate the marginal mean parameters θ = (β 1 , . . ., β p , δ) ⊤ as well as the working correlation matrix parameters; common multilevel correlation structures for multi-period CRTs will be presented in Section 4.1 during the SAS macro description.

Variance in GEE analysis of multi-period CRTs
Using GEE analysis, parameters in the marginal mean model are estimated through the estimating equations (Liang and Zeger 1986): where the response vector is Y i = (y i11 , . . ., y iJ i N iJ i ) ⊤ and its expected mean vector is Table 1: Intra-cluster correlation: corr y ijk , y ij ′ k ′ under different correlation structures for SW-CRTs for cluster i, period j and individual k).
correlation with correlation parameters α and is used in the power calculation.
The choice of working correlation structure usually depends on design features of the CRT.Four within-cluster correlation structures for multi-period CRTs are summarized in Table 1.Specifically, there are two distinct correlation structures for each of the cohort and crosssectional design.Each structure incorporates the usual ICC, which measures the correlation between responses from different individuals within the same cluster during the same period: corr y ijk , y ijk ′ = α 0 or α 1 , k ̸ = k ′ .For cross-sectional designs, the nested exchangeable correlation structure additionally specifies a correlation parameter α 2 for observation pairs collected from different periods.Alternatively, exponential decay assumes the between-period correlation between responses from different individuals within the same cluster in the j th and j ′ th periods decays over time as α 0 r |j−j ′ | 0 . For cohort designs, the block exchangeable correlation structure distinguishes between-period correlations for pairs of individuals, α 2 , from a constant intra-individual correlation for repeated observations, α 3 (Li et al. 2018;Preisser et al. 2003).On the other hand, the general proportional decay correlation structure (Gallis, Wang, Rathouz, Preisser, Li, and Turner 2022) allows for correlation decay over time, where the intra-individual correlation corr , j ̸ = j ′ has a first-order autoregressive structure decay rate r 1 and the between-period correlation among responses from different individuals within the same cluster is reduces to the proportional decay correlation structure (Li 2020;Lefkopoulou, Moore, and Ryan 1989).Note that the nested exchangeable correlation for cross-sectional designs is a special case of block exchangeable correlation when α 2 = α 3 .Finally, the simple exchangeable correlation (Hussey and Hughes 2007) additionally specifies that within-and between-period correlations are equal α 1 = α 2 .Rochon (1998) proposed a power calculation procedure using the model-based variance, which depends on the design matrix and specification of the working correlation structure.Typically, simulations are used to determine power for population-averaged models with incomplete multi-period CRTs with complex correlation structures.However, there may be substantial computational burdens in generating empirical power based on GEE with a large number of simulation scenarios.To lessen computational burden, Zhang et al. (2023b) adapted the non-simulation based fast GEE power method (Rochon 1998) multi-period complete and incomplete SW-CRTs, which we implement in the SAS macro CRTFASTGEEPWR.

Overview of the fast GEE power method
The two-sided test of the intervention effect H 0 : δ = 0 vs H 1 : δ ̸ = 0 is based upon the asymptotic normal distribution of √ I( δ − δ) with mean zero and variance determined by the (p, p)-th element of VAR( √ I( θ − θ)) , when I is sufficiently large (Li et al. 2018).In turn, the Wald-test statistic δ/VAR( δ) has an asymptotically standard normal distribution under the null hypothesis.Thus, power to detect an intervention effect of size δ ̸ = 0 with a nominal type I error rate α is Φ z α/2 + |δ|/ VAR( δ) where Φ(•) is the standard normal cumulative distribution function and z α/2 is the normal quantile such that Φ(z α/2 ) = α/2.However, for CRTs with a small number of clusters, the t test is a good alternative with power modified as Φ t,I−p t α/2,I−p + |δ|/ VAR( δ) where t α/2,I−p is the α/2% quantile of the t distribution with I − p degrees of freedom.Typically, the degrees of freedom of the t statistic in CRTs is set to I −p, where p = dim(θ) is the number of estimated marginal mean model parameters in Equation 1; some authors have used I −2, which is sometimes preferred for multi-period CRTs with fewer number of clusters than periods (Li 2020;Ford and Westgate 2020).The modelbased variance of the intervention effect VAR( δ) in the determination of power is defined as the (p, p)-th element in the model-based variance matrix VAR MB ( θ).We refer to this general analytical power method (Rochon 1998) as "fast GEE power" because it is a computationally fast power calculation procedure for CRTs with GEE analysis.

Adaption of the fast GEE power for incomplete SW-CRTs
The fast GEE power procedure has been previously investigated for complete SW-CRTs for the average intervention effects marginal mean model with categorical period effects (Li et al. 2018).Motivated by Connect-Home, we define a class of incomplete designs for which Connect-Home is an archetype allowing for implementation periods and/or staggered entry/termination.Let b i0 and b i1 denote the first and last calendar periods of data collection for cluster i in the control condition (e.g., b 20 = 2 and b 21 = 7 for the second sequence, s = 2, in Figure 1), such that there are b i = b i1 − b i0 + 1 total periods in the control condition; q i0 and q i1 as the first and last calendar periods of data collection for cluster i in the intervention condition (e.g., q 20 = 10 and q 21 = 18, s = 2) such that there are q i = q i1 − q i0 + 1 total periods in the intervention condition; and c i implementation periods occurring in calendar periods A key step in applying the fast GEE power computation to incomplete SW-CRTs is the generation of cluster-level design matrices, consisting of covariates in Equation 1 or 2. For incomplete SW-CRTs, we specify a Design Pattern (DP) matrix by the notation of {b i0 , b i1 , q i0 , q i1 , i = 1, . . ., I} to represent the experimental design analogous to the power analysis of continuous responses in linear mixed models (Hemming et al. 2015).Each element in the DP matrix corresponds to a representative cluster-period in the SW-CRT design with entries of 0 for control condition, 1 for intervention condition and 2 for cluster-periods in sequences without data collection.To illustrate the specification of DP matrix, an example of an incomplete SW-CRT (Hemming et al. 2015) is used here with S = 2 treatment sequences, T = 4 periods, and an implementation period that occurs in period 2 for the first sequence and in period 3 for the second sequence.The incomplete design is specified by (b 10 , b 11 , q 10 , q 11 ) = (1, 1, 3, 4) for s = 1 and (b 20 , b 21 , q 20 , q 21 ) = (1, 2, 4, 4) for s = 2. Then DP = 0 2 1 1 0 0 2 1 The DP matrix serves to modify the design matrix obtained under the complete design to more accurately determine VAR MB ( θ) and thus VAR( δ) in the power calculation for CRTs with incomplete design.

Input arguments in the macro
A SAS macro CRTFASTGEEPWR that implements the fast GEE power method is developed for multi-period CRTs with complete and incomplete designs and available at https://www.bios.unc.edu/~preisser/personal/software.html.
Table 2 provides required and optional arguments in the macro, which are classified into three aspects: describing characteristics of the multi-period CRT, parameterizing the marginal mean model and choosing the working correlation structure.All required arguments are in boldface in the first column of Table 2.
First, users are required to describe characteristics of the multi-period CRT through a design pattern matrix, specified by DesignPattern, containing the number of sequences and periods, numerical indicators for treatment status, and the incompleteness in the design.The number of clusters in sequences are specified by a vector M to allow varying number of clusters across sequences.The SAS macro applies to cross-sectional and closed-cohort multi-period CRT designs but not to open cohort designs.Specifically, the SAS macro allows a varying number of participants across cluster periods for cross-sectional designs through the specification of CP_size_matrix.For closed-cohort designs, cluster-period sizes within rows (sequences) of the CP_size_matrix should be non-increasing.Equal cluster-period sizes within rows imply no dropout, whereas decreasing numbers are assumed to be due to dropout only (not to intermittent missing data) so that the design matrices for clusters are unambiguous.An example of decaying cohort size due to dropout in a closed-cohort CRT using CRTFASTGEEPWR is in Appendix C.1.
Marginal mean model options include binary, count, and continuous responses with three link functions, specified by dist and link, respectively.Note that the default link function is the canonical link.Meanwhile, the categorical period effects model or the linear period effects model is selected by specifying period_effect_type.The user also needs to choose one of three intervention effects models introduced in Section 2, specified by intervention_effect_type.For the incremental intervention effects model, max_intervention_period should be filled with the number of periods at which the full treatment effect δ is reached.For the extended incremental intervention effects model, the max_intervention_period is the number of periods under active intervention phase and there should be at least one maintenance period in each sequence.The intervention effect size and period effects at the scale of link function are all required with the choice of specific intervention and period effects model through delta and beta_period_effects.
The working correlation structure is specified through the specification of corr_type and corresponding intra-cluster correlations (ICCs).The macro does not have an option for crosssectional versus cohort design, but rather the type of study design should inform the user's choice of correlation structure as suggested in Table 1.The significance level for two-sided tests, specified by alpha is optional with a default value of 0.05.In the degrees of freedom determination, the number of clusters minus the number of marginal mean parameters, df = I − p, is the default formula.Another degrees of freedom determination df = I − 2 in Li ( 2020) is also available, specified by df_choice.
There are some consistency checks in SAS macro CRTFASTGEEPWR to ensure parameters are reasonably specified for the power calculation.First, if there are cluster periods in the design pattern matrix with no data collection (i.e., 2 in DesignPattern), the corresponding locations in the CP_size_matrix need to be 0. Second, the length of beta_period_effects should match the selected period effects model.For the linear period effects model, there are two period effects parameters.While for the categorical period effects model, the number of period effects parameters is equal to the column size of the Design pattern matrix DesignPattern.Finally, the marginal mean outcome µ ijk needs to be within the rational range based on the specific outcome type, such as µ ijk within (0,1) for binary outcomes and µ ijk are non-negative for count outcomes.Specifically, for binary outcomes, the Frèchet bounds are also checked based on the specification of working correlation structures and the marginal means to ensure their compatibility (Qaqish 2003).

Multi-period CRT examples of SAS macro CRTFASTGEEPWR
In this section, we focus on illustrating the power calculation of two complete and two incomplete stepped wedge cluster randomized trials using the SAS macro CRTFASTGEEPWR, with different outcome types, specification of marginal mean models, and correlation structures.
The first example illustrates power calculation based on the Connect-Home trial design (Figure 1) with linear period effects for a continuous outcome, patient preparedness for home care (a scale with range 0 to 100) assessed 7 days after discharge from the SNF.There are 6 sequences with 22 periods in the study, having 7 periods without patient enrollment and 15 periods with patient enrollment in each cluster.In the power calculation, there is 1 cluster in each sequence and 360 patients in the trial, in which 4 patients enrolled in each non-missing cluster period.In CP_size_matrix, 0 means no patients enrolled in the specific sequence and period, which corresponds to locations with 2 in the design pattern matrix.We assume the baseline patient preparedness score as β 0 = 68 and a gently increasing linear period effect such that β 1 = 0.1 for J = 1, . . ., 22 with common variance ϕ = 64 (standard deviation = 8).The full effect size is reached at 10 months on intervention condition (q = 10) for the incremental intervention effects model with δ = 10.ICCs are specified with (α 1 , α 2 ) = (0.03, 0.015) under the nested exchangeable correlation structure to indicate a small within-cluster correlation for the cross-sectional design.The power is calculated using the z test with normal approximation and t test with df = 3.From the results, the power using the z-test is much greater than the t test.Simulation studies have shown that the z test is too optimistic and tends to have an inflated test size in SW-CRTs with a small number of clusters (Li et al. 2018).Thus, we recommend calculating power with the t test for the Connect Home study.
%CRTFASTGEEPWR(alpha = 0.05, m = %str(J(5, 1, 8)), corr_type = ED, intervention_effect_type = AVE, delta = -0.789,period_effect_type = CAT, beta_period_effects = %str({-1.266,0.01, 0.01, 0.01, 0.01, 0.01}), alpha0 = 0.03, R0 = 0.8, dist = binary, phi = 1, CP_size_matrix = %str(J(5, 6, 2)), DesignPattern =%str({0 1 1 1 1 1, 0 0 1 1 1 1, 0 0 0 1 1 1, 0 0 0 0 1 1, 0 0 0 0 0 1})); The fast GEE power of binary outcomes with exponential decay correlation structure and (alpha0,r0):(0.03,0.8) Under average intervention effects model and delta = -0.789The fourth example is based on the Heart Health NOW (HHN) study with a large number of clusters.We assume the HHN study using a complete, stratified, SW-CRT evaluating the effect of primary care practice support on evidence-based cardiovascular disease (CVD) prevention, organizational change process measures, and patient outcomes, the latter captured by electronic health records (EHR, Weiner et al. 2015).Medical practices are randomized to receive the intervention at one of three time points (steps) within two strata defined by high (the first three treatment sequences, Figure 3) or low (last three rows) readiness for change.After four quarters in the intervention phase (green boxes), each practice enters a maintenance phase (gray boxes) for two to six quarters depending upon the allocated treatment sequence.HHN was a quality improvement research project whereby the intervention of practice facilitation aimed to bring about enduring change in patient-centered and organizational outcomes.We consider a combined binary outcome regarding whether there is hospitalization due to stroke, acute myocardial infarction, or angina for patients.We assume that there are 30 medical practices (clusters) in each sequence (practice cohort) with 100 patients enrolled in each cluster period; thus 1100 patients will be enrolled in each cluster.If the baseline probability of hospitalization is 0.05, making β 0 = log(0.05/0.95)= −2.944with an consistent decreasing period effects β 1 = −0.01.Under the extended incremental intervention effects model, the Figure 3: The study design of the HHN study: the blue, green and gray cells denote control, active intervention and maintenance cluster-periods, respectively.intervention effect is assumed to decrease the odds of hospitalization at the end of 4 quarters by 25% under the active intervention condition, δ = log(0.75)= −0.288,maintaining the same effect size in the maintenance periods.The working correlation structure for the binary outcome is a nested exchangeable correlation structure with ICCs (α 1 , α 2 ) = (0.03, 0.015).Considering the large cluster size and number of clusters, powers is very similar under the z test and t test, reaching 78%.

Discussion
This article proposes a fast GEE power method for binary, count, and continuous responses of complete and incomplete multi-period CRTs, including parallel-arm longitudinal CRTs, cluster randomized crossover trials and SW-CRTs.The fast GEE power approach was illustrated in the planning of four complete and incomplete cross-sectional stepped wedge designs for binary, count and continuous outcomes.The SAS macro CRTFASTGEEPWR is novel in several aspects.Through specification of the Design Pattern matrix in the spirit of Hemming et al. (2015) and Rochon (1998), the general GEE power method in SAS macro is implemented for four within-cluster correlation structures proposed in the recent literature for multi-period CRTs.To our knowledge, there is no other statistical softwares that unifies the multi-level correlation structures for designing cross-sectional and cohort CRTs.This proposed software for power of multi-period CRTs based on marginal models fills a gap by adding to the literature of power calculators, which has mostly focused on mixed models (Hughes et al. 2015;Hemming, Kasza, Hooper, Forbes, and Taljaard 2020).
In the SAS macro CRTFASTGEEPWR, the user needs to choose from among three parameterizations of the intervention effect and two specifications of the period effects.Categorical or linear period effects may be chosen.A linear time trend could be appropriate when the underlying secular trend under control condition increases or decreases at an approximately constant rate over time (e.g., an educated speculation from historic data before the planning of the study).When the secular trend is non-linear or when little prior information is available, categorical period effects might be a more robust specification and with fewer assumption compared to a linear time trend.In the special case of a complete stepped wedge design where the treatment sequences are balanced and the outcome is continuous, Grantham, Forbes, Heritier, and Kasza (2020) has shown that the model-based variance of the intervention effect estimator under linear period effects (Model 1) is identical to that under categorical period effects (Model 2).In more general cases, however, one would expect the variance of the intervention effect estimator to be larger under Model 1 due to the additional cost in degrees of freedom for estimating more period effect parameters.Therefore, whenever feasible, Model 1 can be considered as a more conservative option for power calculation, and the specified degrees of freedom should be concordant with this conservative analysis option.
As the power calculation of the macro CRTFASTGEEPWR is based on the t distribution, it offers two choices of degree of freedom based upon the intended GEE analysis, either I − 2 or I − p , where I is the number of clusters and p is the number of regression coefficients in the marginal mean model.The options are based on previous simulation studies for multiperiod CRTs (Li et al. 2018;Li 2020;Li, Yu, Rathouz, Turner, and Preisser 2022;Zhang et al. 2023b).In general, when the number of clusters is less than the number of periods, the user must set df = I − 2 or specify the linear time trend in Model 2 with df = I − 3 since otherwise the degrees of freedom becomes negative.As a concrete example, in the Connect-Home trial (Figure 1), I = 6 and J = 22 and, thus, I − p = 6 − 23 = −17 < 0. Therefore, a linear time trend is recommended when using I − p as the degrees of freedom.Zhang et al. (2023b) showed that predicted power based on the fast, non-simulation GEE power method with df = I − p has good agreement to the empirical power for testing the intervention effect from simulation studies when the bias-corrected sandwich variance estimator of Kauermann and Carroll (2001) is used in GEE analysis.Others have recommended using df = I − 2 (Ford and Westgate 2020;Li 2020;Li et al. 2022) based on simulation studies.Several variance estimators have been widely studied for use in GEE analysis.Based on simulation studies of its finite-sample properties, the bias-corrected variance estimator of Kauermann and Carroll (2001) has been recommended for use in GEE analysis of parallelgroups CRTs (Preisser et al. 2003), multi-level CRTs (Teerenstra et al. 2010), crossover CRTs (Li, Forbes, Turner, and Preisser 2019), and SW-CRTs (Li et al. 2018) with as few as 8 to 10 clusters.For as few as six clusters, recommendations are less certain.In this case, Zhang et al. (2023b) suggest that the GEE with the model-based variance matrix may be the best choice.In the case of constructing confidence intervals for ICCs, Preisser et al. (2008) and Li et al. (2022) recommend the bias-variance estimator of Mancl and DeRouen (2001) and matrix-adjusted estimating equations (MAEE) for point estimates.Evidence in the literature for count outcomes is relatively scarce compared to that for binary and continuous outcomes.
A limitation of the SAS macro CRTFASTGEEPWR is that the power procedure that it implements is based on the GEE model-based variance estimator, when sandwich variance estimators (Zeger, Liang, and Albert 1988), which validly estimate the true variance under misspecification of the correlation structure, are often used in analysis.As noted in the introduction, under correct specification of the correlation matrix, the accuracy of the fast GEE power method based on the GEE model-based variance matrix has been validated in simulation studies based on GEE with a bias-corrected variance estimator for a few as eight clusters (Kauermann and Carroll 2001;Li et al. 2022).Future research could investigate the use of asymptotic sandwich variance formulae in the fast GEE power method under correlation structure mis-specification.For example, if the planned GEE analysis specifies the working independence correlation matrix (Thompson, Hemming, Forbes, Fielding, and Hayes 2021), power could be computed under one or more assumed true non-independence correlation structures.Another adaptation of the fast GEE power method, with the goal of improving Type I error control in CRTs with small number of clusters, would base the power calculation on the sandwich variance formula that is the expectation of a bias-corrected empirical variance estimator (Kauermann and Carroll 2001;Mancl and DeRouen 2001;Fay and Graubard 2001).
Several recent software programs have been developed for the GEE analysis of studies with a small number of clusters.For researchers interested in GEE analysis of binary responses with bias-corrected variance estimators for CRTs, the SAS macro GEECORR by Shing, Preisser, and Zink (2021) is available.A SAS macro GEEMAEE developed by the authors for binary, count, and continuous outcomes additionally applies bias-corrections in the estimation of ICCs and their variance estimators, which is desirable for reporting ICC parameters as recommended by the CONSORT statement for stepped wedge trials (Hemming et al. 2018).The GEECORR and GEEMAEE SAS macro are available at https://www.bios.unc.edu/~preisser/personal/software.html.responsibility of the authors and do not necessarily represent the views of PCORI, its Board of Governors or Methodology Committee.John S. Preisser received stipend from PCORI for service on a data safety and monitoring board and as a merit reviewer.John S. Preisser did not serve on the Merit Review panel that reviewed this project.

A. Standard GEE analysis of multi-period CRTs
For GEE analysis of multi-period CRTs, we consider a general population averaged regression framework.Let µ ijk denote the marginal mean response for a continuous or categorical outcome y ijk , which is the k th subject in the i th cluster and the j th period, for i = 1, 2, . . ., I, j = 1, 2, . . ., J i , and k = 1, 2, . . ., n ij .The mean response µ ijk is related to covariates X ijk with link function g(.) in the following marginal mean model: where β is the (p + 1) × 1 parameter corresponding to the column of the design matrix and describes how the population averaged response depends on the covariates.(Zeger et al. 1988)

⊤
. Let the design matrix in the i th cluster be X i = (X i11 , . . ., X iJ i n iJ i ) ⊤ , where X ijk = x 1ijk , . . ., x (p+1)ijk ⊤ is a (p + 1) × 1 vector by combing p exploratory variables and an intercept.The estimating equations for β are: is a working variance matrix with parameter α, and A i = diag VAR (y i11 ), VAR (y i12 ), . . ., VAR y iJ i n iJ i .The variance of the response is denoted as VAR(y ijk ) = v ijk ϕ where v ijk is the variance function and ϕ is the dispersion parameter.For binary responses, v ijk = µ ijk (1 − µ ijk ) and ϕ = 1, while for continuous responses following typical normal model assumptions, v ij = 1 and ϕ is the constant variance.For count response, v ijk = µ ijk and ϕ is the dispersion parameter.Furthermore, the sandwich variance estimator of β is: (3) On the other hand, sandwich variance estimators have the virtue that they provide consistent estimates of the variance matrix for parameter estimates even when the assumed variance structure fails to hold.When the working covariance is believed to be equal to the true correlation structure, then a consistent estimator for variance estimator VAR( β) is given by . Meanwhile, the dispersion parameter could be estimated by the moment estimator proposed in Liang and Zeger (1986) j=1 n ij is the number of all participants in the CRT.

B. GEE analysis with finite-sample corrections
In a recent systematic review of stepped wedge CRTs, Kristunas, Morris, and Gray (2017) reported that the median number of clusters is 11 with a range between 6 to 22 clusters.When CRTs have a small number of clusters, there is a greater chance of unequal distributions of potentially confounding factors among the different conditions, necessitating more sophisticated randomization to maintain the study's intrinsic validity Murray (1998).Moreover, a small number of clusters can significantly degrade the efficiency of GEE analysis.The biased sandwich variance estimates in this section would help to reduce the bias of the standard sandwich variance estimator in equation 3.
B.1.Bias-corrected sandwich variance estimators Mancl and DeRouen (2001) showed that the GEE sandwich variance estimates in Equation 3tended to underestimate the variance of intervention effects, which results in a greater Waldtype test size than the nominal level with number of clusters less than 40.The estimated tests sizes with the model-based estimator under the true correlation structure are usually closer to the nominal level compared with the robust estimator but are still often inflated.Thus, bias-corrected sandwich variance estimates should be considered under the situation with a small number of clusters.The formula Σ−1 1 Σβ Σ−1 1 unifies the common types bias-corrected sandwich variance estimates for CRTs, where which is evaluated at the solution of the GEEs θ = ( β, α).When both F 1i and B 1i are identity matrices, Equation 4 reduces to the uncorrected sandwich estimator VAR( β) (Liang and Zeger 1986) in Equation 3 referred to as BC0.There are mainly three bias-corrected sandwich variance estimates to reduce the finite sample bias of the uncorrected sandwich variance estimator.Setting F 1i = I and B 1i = (I − H 1i ) −1/2 gives the finite-sample correction of Kauermann and Carroll (2001) or referred as BC1, where Next, setting F 1i = I and B 1i = (I − H 1i ) −1 gives the finite-sample correction of Mancl and DeRouen (2001) or referred as BC2.Because the matrix elements of the cluster leverage are between 0 and 1, there is an order of bias-correction BC0 < BC1 < BC2 (Preisser et al. 2008).

Finally, setting F
and B 1i = I gives the finite-sample correction of Fay and Graubard (2001) or BC3, where the bound parameter ζ is a user-defined constant (< 1) with a default value 0.75.
The number of clusters and the aim of the finite sample adjustment influence the choice of bias-corrected sandwich variance estimators.Most research is focused on bias-corrected sandwich variance estimators for the intervention effect.In recent work, Li et al. (2018) recommended using BC1 or BC3 and MAEE with a t-test for simulated power computation in the design phase for stepped wedge CRTs as few as 8 clusters.Zhang et al. (2023b) found that for incomplete stepped wedge CRTs with a small number of clusters, GEE analysis based on BC1 with MAEE is recommended to maintain the suitable type I error rate based on t-test.

C. Examples in applying CRTFASTGEEPWR for CRT designs
C.1.Calculate power for a closed-cohort CRT with decaying cluster sizes We demonstrate the power calculation for CRTs with decreasing cohort size, based on a hypothetical closed-cohort CRT with the same design as the Connect-Home trial.As in Section 4.1, we assume 4 patients are recruited at baseline and 1 patient drops out per cluster after 15 periods after their baseline period.The number of observation per cluster-cluster is shown in the following codes in CP_size_matrix.All of the parameters in the marginal mean model are set to the same values as those in Section 4.1.When describing the correlation for patients with repeated measurements in a cluster, we utilize the block exchangeable correlation structure with the within period ICC α 1 = 0.03, between-patient and between-period ICC α 2 = 0.015, and within-patient and between-period ICC α 3 = 0.2.In the software output, there are 348 observations collected from 6 clusters and the power using z test is still above 90%.While the power using t test is 61.5%, much smaller than the z test.Power using t test in the example C.1 is also much smaller than that in Section 4.1, which is caused by the greater ICC due to patients' repeated measure across time and drop-out at the end of the study.

Figure 1 :
Figure 1: The trial diagram of the Connect Home trial: the blue, orange and green cells denote control, implementation and intervention cluster-periods, respectively.

Figure 2 :
Figure 2: The trial diagram of the decision-making trial: the study includes 40 clusters and each treatment sequence includes 8 clusters.The blue and green cells denote control and intervention cluster-periods, respectively.

Table 2 :
Input arguments in the SAS macro CRTFASTGEEPWR.
Denote the vector of response vector as Y i = y i11 , . . ., y iJ i n iJ i ⊤with marginal mean vector µ i = µ i11 , . . ., µ iJ i n iJ i