randomizeR : An R Package for the Assessment and Implementation of Randomization in Clinical Trials

This introduction to the R package randomizeR is a slightly modiﬁed version of Uschner, Schindler, Heussen, and Hilgers (2016). Randomization in clinical trials is the key design technique to ensure the comparability of treatment groups. Although there exist a large number of software products which assist the researcher to implement randomization, no tool which covers a wide range of procedures and allows the comparative evaluation of the procedures under practical restrictions has been proposed in the literature. The R package randomizeR addresses this need. The paper includes a detailed description of the randomizeR package that serves as a tutorial for the generation of randomization sequences and the assessment of randomization procedures.


Introduction
Randomization is a design technique to ensure the comparability of treatment groups in clinical trials by introducing a deliberate element of chance.Armitage (1982) states the three main goals that are supposed to be achieved by randomization: First, it tends to balance known and unknown covariates and, thus, to produce structural equality of the treatment groups.Second, by ensuring effective blinding of treatment allocations from investigators and patients, randomization helps to avoid bias caused by the selection of patients.Finally, randomization contributes to the internal validity of a trial that provides the basis for statistical inference.The importance of randomization for clinical trials was first noted in the 1940s by Sir A. Bradford Hill (see Chalmers 1999) who realized that successful blinding of treatment allocations was impossible without randomization.Since that time, regulators have advocated the use of randomization in their guidelines (see for example ICH E9 1998) and several different randomization procedures have been proposed in the literature.It has been noticed that different randomization procedures behave differently, e.g., concerning their susceptibility to bias and their potential to control power and type-I-error probability.An overview containing the latest developments can be found in Rosenberger and Lachin (2016).
Despite the importance of randomization in the context of clinical trials, the novelties of the last decades have hardly found their way into clinical practice.Berger, Bejleri, and Agnor (2015) remark that clinicians stick with randomization procedures that are easily available, although their poor properties have been widely discussed and better alternatives have been proposed.Notably, the most suitable randomization procedure may depend on the clinical context.For example, when block randomization is used, Tamm and Hilgers (2014) advocate small block sizes to control chronological bias, while Kennes, Hilgers, and Heussen (2012) show that larger blocks are better to control selection bias.
The choice of an adequate randomization procedure is crucial to achieve favorable properties such as control of power and type-I-error rate, especially for small sample sizes where asymptotic assumptions do not hold.The R package randomizeR assists clinical trialists in making a scientifically sound choice based on these criteria.It combines the assessment of randomization procedures with the generation of randomization lists for clinical trials.Existing software tools such as the blockrand package by Snow (2013) implement only a very limited number of randomization procedures.The special class of response-adaptive randomization procedures is not included in randomizeR, but has currently been implemented for trials with time-to-event outcome in the MTLAB package RARtool by Ryeznik, Sverdlov, and Wong (2015).To our knowledge, no software package for the assessment of the practical properties of randomization procedures exists, particularly not for the assessment of their exact properties in small clinical trials.This article gives a detailed description of randomizeR.It is organized as follows.Section 2 gives the background for assessing the properties of randomization procedures, such as susceptibility to bias.The randomization procedures are presented in Section 2.3.Section 3 shows how randomizeR implements the methods from the background chapter.Particularly, Section 3.2.2provides a tutorial for the generation of randomization lists with randomizeR, and 3.3.1 illustrates the assessment of randomization procedures with respect to the different types of bias and other criteria.

Notation and model
We consider the case of a two armed clinical trial with parallel group design.Let A and B be treatments that influence a continuous outcome Y .A randomization procedure M is a probability distribution on the space Ω = {0, 1} N .A randomization sequence is an element t ∈ Ω, where t i = 1 if patient i is allocated to treatment A and t i = 0 otherwise.Let T = (T 1 , . . ., T N ) denote a random variable with probability distribution M taking values in Ω.Throughout the paper, let the outcome y i of patient i = 1, . . ., N be the realization of a normally distributed random variable with group expectations µ A , µ B and equal but unknown variance σ 2 > 0. The outcome y i is called response.Higher values of the response are regarded as better.We test the null hypothesis that the expectation of the experimental treatment A does not differ from the expectation of the control treatment B against the two-sided alternative with b i the fixed bias of patient i.

Criteria for the assessment
The potential of a randomization procedure to control the impact of bias on the study results along with other exemplary criteria for the choice of a randomization procedure are summarized in this section.The assessment of the impact of bias is important if the form of the bias is unknown or the bias is unobserved.

Susceptibility to chronological bias
Changes in study environment, e.g., increased diagnostic potential, may impact the response to treatment over time.Unobserved time trends lead to a bias of the estimator of the treatment effect, for which Matts and McHugh (1978) used the term chronological bias.Chronological bias is a special case of accidental bias as introduced by Efron (1971).Efron investigated the effects of covariates that have been (unintentionally) ignored in the model.Although the underlying trend could in theory be included in the model, the bias is often unobserved, or the exact form is unknown.Furthermore, especially in small population groups, it is challenging to use models with many explanatory variables.Rosenkranz (2011) measured the impact of chronological bias by the distortion of the type-I-error rate of the t test when the responses are influenced by a trend τ (i, ϑ): Tamm and Hilgers (2014) proposed three shapes of trend that are summarized in Table 1.
Here log denotes the natural logarithm and 1 {i≥n 0 } is the indicator function yielding the value one if i ≥ n 0 and zero if i < n 0 .

Susceptibility to selection bias
Sir A. Bradford Hill was the first to adopt randomization in clinical trials (see Chalmers 1999).His aim was to ensure effective blinding and to avoid bias due to the conscious or unconscious selection of patients to treatment groups, the so-called selection bias.We consider two measures for selection bias proposed in the literature, the expected number of correct guesses and the influence of selection bias on the test decision.
The expected number of correct guesses was introduced by Blackwell and Hodges (1957).They assume that, based on past treatment assignments, the investigator consciously or unconsciously guesses the next treatment based on the past assignments.Suppose the investigator guesses that patient i > 1 will receive treatment g(t, i) ∈ {A, B} based on the previous assignments (t 1 , . . ., t i−1 ).The correct guesses of a randomization sequence is the number of assignments the investigator guesses correctly: Two guessing strategies were investigated by Blackwell and Hodges (1957).Under the convergence strategy (CS), the investigator assumes that the next patient is assigned to the group that has so far been assigned less.Under the divergence strategy (DS) the experimenter assumes that the next patient is assigned to the treatment that has so far been observed more often.At the beginning of the trial and when there is a tie, the investigator guesses the next allocation at random.
The expected number of correct guesses E(CG(t)) reflects the predictability of a sequence t ∈ Ω.The overall predictability of a randomization procedure M is given by the average proportion of correct guesses: Proschan (1994) proposed to measure the influence of selection bias on the test decision when the responses are biased as a result of selecting the patients following the convergence strategy: where denotes the imbalance of a randomization sequence, sign(x) denotes the sign function, and η denotes the selection effect.Tamm, Cramer, Kennes, and Heussen (2012) demonstrated the impact of selection bias for different values of the selection effect η.

Balancing behavior
According to ICH E9 (1998), it is desirable for a randomization procedure to balance the group sizes throughout the trial as well as at the end of the trial, while avoiding predictability.Table 2 summarizes the measures for the imbalance that have been proposed in the literature (see for example Atkinson 2014).According to Lachin (1988) imbalance may cause decreased power of the statistical test in case of continuous endpoint and homoscedasticity.

Randomization procedures
Randomization procedures can be described in terms of a restricted or unrestricted random walk (see Proschan 1994).The restrictions imposed on the random walk lead to different randomization procedures.In this section, we give a short overview about the randomization procedures that are implemented in randomizeR.For a comprehensive overview we refer to Rosenberger and Lachin (2016).

Imbalance
Formula Table 2: Imbalance measures implemented in randomizeR.Patient i q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Figure 1: Random walk of the randomization sequences of CR.
Complete Randomization (CR) is equivalent to tossing a fair coin for the allocation of each patient.CR leads to 2 N equi-probable sequences where N denotes the total sample size.
Figure 1 shows a randomization sequence produced by a CR in heavy black, along with all possible sequences in light gray.
Using Random Allocation Rule (RAR), patients are allocated by drawing N times without replacement from an urn consisting of N/2 balls for each treatment.RAR produces N N/2 equi-probable sequences that all attain final balance.
Permuted Block Randomization (PBR) with block constellation bc = (k 1 , . . ., k m ) balances the allocations in the blocks of length k 1 , . . ., k m .For each block j = 1, . . ., m, an urn is filled with k j /2 balls for each of the two treatments, and k j balls are drawn without replacement from the urn.PBR leads to m j=1 k j k j /2 equi-probable sequences that attain balance after each block, particularly at the end.
Permuted Block Randomization with random block constellation (RPBR) is similar to PBR, but the block constellation bc is sampled at random from the set of given block lengths rb.RPBR permits two variations (Heussen 2004;Rosenberger and Lachin 2016): The entries k j of bc can either be drawn uniformly with replacement from rb until k j ≥ N , or be conditioned to achieve final balance, namely k j = N .
The Truncated Binomial Design (TBD) consists of tossing a fair coin for the allocation of patients until N/2 heads or tails have occurred.The remaining patients are allocated deterministically to the opposite group.TBD results in sequences that attain final balance but are not equiprobable.TBD admits the same number of sequences as RAR.As an extension, TBD can be conducted in blocks similar to PBR, or similar to RPBR with random block constellation (RTBD).
The Maximal Procedure (MP) was proposed by Berger, Ivanova, and Knoll (2003).MP gives equal probability to all sequences that attain final balance and do not exceed a pre-specified maximum tolerated imbalance mti ∈ N. Always when the imbalance boundary is reached, i.e., |D i | = mti, a deterministic assignment is made to the underrepresented group in order to reduce the imbalance.
The Big Stick Design (BSD) introduced by Soares and Wu (1983) consists of tossing a fair coin until the imbalance |D i | reaches a maximum tolerated imbalance mti ∈ N.
In Efron's Biased Coin design (EBC) (see Efron 1971), a biased coin is tossed for the next allocation in order to reduce the imbalance with probability 0.5 ≤ p ≤ 1.When the groups are balanced, a fair coin is tossed.
Chen (1999) proposed a biased coin design with imbalance tolerance (CHEN) that combines BSD and EBC.When the groups are balanced, a fair coin is tossed for the allocation of patients.Otherwise, a biased coin with probability 0.5 ≤ p ≤ 1 is tossed until reaching an imbalance boundary mti ∈ N. When the imbalance boundary is reached, a deterministic assignment is made in order to reduce the imbalance.
The Accelerated Biased Coin Design (ABCD) proposed by Antognini and Giovagnoli (2004) uses the toss of a biased coin for the allocation of patients.The probability for allocating the ith patient to the experimental group depends on the imbalance D i−1 and an acceleration parameter a > 0. The acceleration parameter exponentially weights the imbalance and thus determines the degree of randomness of the design.
The Bayesian Biased Coin Design (BBCD) proposed under the name 'dominant biased coin design' by Antognini and Zagoraiou (2014) is similar to ABCD.Here, the probability p i of allocating the ith patient to the experimental group depends on the acceleration parameter a > 0 and the ratio N A (i − 1)/N B (i − 1) where N A (i − 1) and N B (i − 1) are the numbers of patients in groups A and B respectively after allocating i − 1 patients.
Wei's Urn Design (UD) consists of N draws of an urn whose composition is updated after each draw.Before allocating the first patient, the urn contains an initial number (ini ≥ 0) of balls of different colors for each treatment.For the allocation of a patient, a ball is drawn, the color is noted, and replaced along with an additional number (add ≥ 0) of balls of the of the opposite color.
The Generalized Biased Coin Design (GBCD) was developed by Smith (1984) to extend various designs.A biased coin is tossed for the allocation of patients, where probability of allocating the next patient to the experimental group depends on N A (i − 1) and N B (i − 1) as well as a balancing parameter ρ ≥ 0.
The Hadamard Randomization (HADA) proposed by Bailey and Nelson (2003) uses the rows of a special Hadamard matrix H ∈ {0, 1} 11×12 for the allocation of patients.Rows from H are sampled until the number of allocations reaches the planned sample size N .
Table 3 in the next chapter shows how the presented randomization procedures can be used in randomizeR.

Implementation
The assessment of a randomization procedure is based on a set of allocation sequences.Some of the assessment criteria, for example type-I-error or power, depend on a response, while others, e.g., correct guesses, are independent of a response.There are two different options to generate a set of allocation sequences depending on the sample size N , and two different methods to calculate the response based assessment criteria: Complete or simulated reference set: In case of N ≤ 24, it is possible to generate the set of all possible allocation sequences {0, 1} N , assess the eligibility of an allocation sequence and, independently, calculate the associated probabilities for the allocation sequences by the randomization algorithm.This results in the complete set of the sequences.The function getAllSeq provides this functionality, see Section 3.2.3.
In case of N > 24, the complete set cannot be calculated in reasonable time.Instead, a number r of allocation sequences is generated undergoing the randomization algorithm of the particular randomization procedure.The function genSeq provides this functionality, see Section 3.2.3.Formally, the relative frequencies of the allocation sequences can be used to estimate the true probabilities of allocation sequences.Note therefore that, in a simulation, the relative frequency of a randomization sequence is used.This results in the simulated set of the sequences.The simulated set can also be applied in case of smaller sample sizes, and procedures where the exact approach is not available in the literature, i.e., Block Randomization with random block constellation and Hadamard Randomization.
Exact or simulated response based assessment criteria: The susceptibility of a randomization procedure to bias can be measured as the distortion of the type-I-or type-II-error probability.The exact method computes the distribution of the exact rejection probabilities.For each randomization sequence in the reference set, the rejection probability of Student's t test is calculated using the knowledge on the b i from Model 3 (see Langer 2014, Chapter 4).The simulation method simulates a response vector for each allocation sequence in the reference set according to Model 3 and derives a test decision of Student's t test.The type-I-or type-II-error rate is computed as the proportion of falsely rejected test decisions.This method was used for example by Proschan (1994).Section 3.3 shows how the exact and simulated type-I-and type-II-error probabilities can be assessed with randomizeR.
Any combination of the above methods can be used.For example, a simulated reference set can be used to assess the distribution of the exact rejection probabilities.The combination of simulated error rate with simulated reference set is usually used in the literature.For small sample sizes, the combination of exact reference set with distribution of exact error probabilities yields most accurate results.
All sampling algorithms use R's standard random number generator, the Mersenne-Twister (R Core Team 2016).

Overview
The randomizeR package covers two closely connected purposes: The generation of randomization sequences and the assessment of randomization procedures according to the aforementioned criteria.The previous chapter was dedicated to introducing the basic terms and literature from the field.In the present chapter, we show how randomizeR addresses these purposes.
The current version of randomizeR is based on R 3.3.0.It can be loaded in an R session via: R> library("randomizeR") All the main components of randomizeR are implemented using the S4 object oriented system.

Generating randomization sequences
There are two main purposes for the generation of randomization sequences.The first purpose is the generation of a single sequence for the allocation of patients in a clinical trial.
The second purpose is the generation of multiple sequences in order to assess the properties of a randomization procedure.Both purposes can be regarded as functions that use the randomization procedure itself as basis for their behaviour.They are therefore implemented as methods that take an object representing the randomization procedure as input.

Representing randomization procedures
randomizeR implements randomization procedures as subclasses of the randPar class.For example, an object representing Complete Randomization with sample size N = 10 is generated by R> N <-10 R> (params <-crPar(N)) Object of class "crPar" design = CR N = 10 groups = A B The function crPar is a so-called constructor function, namely a function that generates an object of the class crPar and prepares it for use.The object param then contains all the information about the randomization procedure.Table 3 summarizes the constructor functions for the randomization procedures described in Section 2.3.In randomizeR an overview over the implemented randomization procedures is shown by ?randPar.

Generation of a single sequence
We can use the function genSeq to generate a single randomization sequence for a particular clinical trial.It takes an object representing a randomization procedure as input and generates a sequence at random using this procedure.For example, the following code generates a randomization sequence using Complete Randomization with sample size N = 10 as above: R> params <-crPar(N) R> (R <-genSeq(params)) Object of class "rCrSeq" design = CR seed = 808898100 N = 10 groups = A B The sequence M: To ensure the reproducibility of the results and to enhance the reporting of the randomization procedure, the function genSeq saves all the information that was used for the generation of the randomization sequence in the object R along with the randomization sequence itself.
In order to obtain the randomization sequence stored in the object R, we can use the function getRandList: R> getRandList(R) ,] "B" "A" "A" "A" "B" "A" "B" "A" "A" "A" The randomization sequence and the other information stored in the object R can conveniently be saved to a .csvfile by using the saveRand function: R> saveRand(R, file="myRandList.csv") Figure 1 in Section 2.3 shows the random walk of the randomization sequence R.This figure can be generated using the function plotSeq in randomizeR:

R> plotSeq(R, plotAllSeq = T)
The function genSeq can generate randomization sequences for all randomization procedures from Table 3.The function genSeq has a method for each randomization procedure.For all randomization procedures, its output is an object of a class extending the class randSeq.

Generation of a set of sequences
Randomization procedures can be assessed based on the set of sequences they produce (see Section 2.4).randomizeR provides two ways to generate multiple randomization sequences from a specific procedure.
For small sample sizes N ≤ 24, we can use the function getAllSeq to generate the complete set of randomization sequences.It takes an object representing a randomization procedure as input and calculates the complete set of randomization sequences of that procedure.For example, in the above case of Complete Randomization with sample size N = 10, we run the statement

The first 3 of 1024 sequences of M: 1 A A A A A A A A A A 2 B A A A A A A A A A 3 A B A A A A A
to get the complete set of 2 10 = 1024 sequences.Note that the function getAllSeq does not support the random block designs RPBR and RTBD or the Hadamard Randomization, because no algorithms have been established in the literature.
In those cases where the enumeration of the complete set of sequences is computationally intensive or algorithmically not feasible, the function genSeq can be used to generate a simulated reference set.For example for sample size N = 50 and Complete Randomization, a simulated reference set of size r = 10, 000 is generated by

Assessing randomization procedures
Most of the assessment criteria in Section 2.2 depend on the assumtion of normally distributed responses according to (1).Assume that both treatments have equal expectation µ A = µ B = 0 and variances σ A = σ B = 1.Then we can use the function normEndp representing the normal endpoint to pass these assumtions to randomizeR by setting R> muA <-muB <-0 R> sigmaA <-sigmaB <-1 R> normalEndpoint <-normEndp(mu = c(muA, muB), sigma = c(sigmaA, sigmaB)) The class endpoint provides flexibility for the extension to other endpoints.Currently, only normal endpoints are available.
randomizeR implements the criteria for the evaluation of randomization procedures as subclasses of the class issue.For example, an object representing the exact rejection probability in the presence of chronological bias due to linear time trend that has a strength of ϑ = 1, is generated by R> (cb <-chronBias(type = "linT", theta = 1, method = "exact")) Object of class "chronBias" TYPE = linT THETA = 1 METHOD = exact ALPHA = 0.05 The parameter method indicates whether the exact distribution of the type-I-error rate should be calculated or whether the test decision should be simulated by generating responses that are influenced by the trend (method = "sim").The function chronBias is a constructor function for objects of the class chronBias.The object cb contains all the information about the bias.Table 4 summarizes the constructor functions for all the criteria of assessment presented in Section 2.2.In randomizeR an overview over the implemented assessment criteria is shown by ?issues.

Assessment of a randomization procedure
The randomizeR package includes the function assess to evaluate the behavior of a randomization procedure with respect to one or more of the criteria from Section 2.2.
R> (A <-assess(bsdSeq, sb, cb, pw, endp = normalEndpoint)) Assessment of a randomization procedure The first 3 rows of 972 rows of D: In the case of the Big Stick Design with sample size N = 12 and maximum tolerated imbalance mti = 2, there are 972 possible sequences.The first column of the assessment corresponds to the randomization sequence and the second to its probability of occurrence of the sequence.The following columns correspond to the criteria sb, cb and pw passed to assess.The notation P(rej)(type) refers to the probability of rejection in the presence of the given type of bias.The column power(exact) gives the exact power of each randomization sequence.Any number of assessment criteria can be passed to assess.For the criteria imbal and corGuess the endpoint endp is not relevant and can be omitted.The summary of the assessment shows the important characteristics of the distribution of the allocation sequences such as mean, standard deviation, minimum, maximum and the quantiles for each criterion: R> summary(A) For example, the five percent quantile x05 of the probability of rejection in the presence of a linear time trend linT is 0.042.Note that, until now, only the mean value of each criterion has been studied in the literature.

Comparison of randomization procedures
randomizeR provides the function compare for the comparison of several randomization procedures with respect to one of the criteria from Section 2.2.For example, assume we are in the same setting as above (N = 12) and want to compare the Big Stick Design, the Maximal Procedure with mti = 2 and the Permuted Block Randomization with block size four with respect to their susceptibility to selection bias.We can partly recycle the previous code and set the parameters for the other randomization procedures we want to compare: R> mpSeq <-getAllSeq(mpPar(N, mti)) R> bc <-rep(4, N/4) R> pbrSeq <-getAllSeq(pbrPar(bc)) Then the following code compares the aforementioned procedures with respect to selection bias: R> (C <-compare(sb, bsdSeq, mpSeq, pbrSeq, endp = normalEndpoint)) (a) Violin plot q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0.05  The distribution of a criterion for a particular randomization procedure can be visualized and compared between different randomization procedures via boxplot (Figure 2b) or violin plot (Figure 2a): R> plot(C) R> plot(C, y = "boxplot") For the generation of boxplot and violin plot, we used the functions geom_boxplot and geom_violin of the R package ggplot2 proposed by Wickham (2009).

Summary and further research
randomizeR is an R package that facilitates the generation of randomization lists for a large number of randomization procedures and makes the assessment of randomization procedures with respect to various criteria possible.The package currently implements 15 randomization procedures and six criteria for the assessment of the procedures.It assists researchers at the design stage of a clinical trial by letting the choice of a randomization procedure and the implementation of the design go hand in hand.
We are working towards extending randomizeR in various directions.The object oriented approach makes it easy to add new randomization procedures and assessment criteria.A unified assessment criterion may be included to uniformly judge the suitability of a randomization procedure based on various criteria.The models may be extended to other endpoints such as time-to-event data, or to more than two treatment groups.Finally, randomization tests may be implemented to enable randomization based inference independent of parametric assumptions.

Figure 2 :
Figure 2: Comparative visualization of the distribution of the type-I-error probability under the influence of unobserved selection bias for BSD(2), MP(2) and PBR(4).

Table 1 :
The three types of time trend included in randomizeR.When the response Y i of patient i is influenced by an unobserved quantity b i , we call b i bias of the i-th patient.Bias includes selection bias or chronological bias, or both, as proposed in the sections 2.2.1 and 2.2.2.We investigate how the randomization procedures manage the misspecification of the model

Table 4 :
Criteria for the assessment.