Depends R (> = 2.6.0), MASS License GPL (> = 2)

Longitudinal data from factorial experiments frequently arise in various fields of study, ranging from medicine and biology to public policy and sociology. In most practical situations, the distribution of observed data is unknown and there may exist a number of atypical measurements and outliers. Hence, use of parametric and semi-parametric procedures that impose restrictive distributional assumptions on observed longitudinal samples becomes questionable. This, in turn, has led to a substantial demand for statistical procedures that enable us to accurately and reliably analyze longitudinal measurements in factorial experiments with minimal conditions on available data, and robust nonparametric methodology offering such a possibility becomes of particular practical importance. In this article, we introduce a new R package nparLD which provides statisticians and researchers from other disciplines an easy and user-friendly access to the most up-to-date robust rank-based methods for the analysis of longitudinal data in factorial settings. We illustrate the implemented procedures by case studies from dentistry, biology, and medicine.


Introduction
Longitudinal data are measurements collected from the same experimental units, usually referred to as subjects or individuals, over time.Such data are widely encountered in biology, medicine, public policy, sociology, and psychology, and often arise in the following factorial settings: One homogeneous group of subjects is observed repeatedly at s = 1, . . ., t time points.
One homogeneous group of subjects is observed repeatedly at s = 1, . . ., t time points, where each subject is observed under each time point several times (e.g., different vessel sections, left and right eye).
One homogeneous group of subjects is observed repeatedly under = 1, . . ., c conditions, where each subject is observed under each condition at s = 1, . . ., t time points.
More than one homogeneous group of subjects (e.g., male and female, different treatment groups) are observed repeatedly at s = 1, . . ., t time points.Typical questions that most practitioners deal with are: Do the treatments have the same effect?Is the time profile flat?
Are the effects of the treatments similar over time?Are the treatment profiles parallel?
Alternatively, these practical questions can be translated into the statistical language as treatment effects, time effects, and interaction effects between treatment and time, respectively.
Measurements from the same experimental unit may be dependent, which leads to an extra level of complexity in longitudinal studies.Diggle, Liang, and Zeger (1994) provide a comprehensive overview of existing methods for longitudinal data analysis: generalized linear models (GLM) and their extensions, generalized linear mixed models (GLMM) and generalized estimating equations (GEE, Breslow and Clayton 1993;Zeger and Liang 1992;Liang and Zeger 1986).Most of these procedures are implemented in the commercial software SAS (SAS Institute Inc. 2003), e.g., SAS PROC MIXED for the analysis of linear mixed models.Publicly available software for longitudinal data analysis includes implementation of GLM/GLMM in R (R Development Core Team 2012) through the glmmPQL() function in MASS (Venables and Ripley 2002), the lme() function in nlme (Pinheiro, Bates, DebRoy, Sarkar, and R Development Core Team 2012), and the lmer() function in lme4 (Bates, Maechler, and Bolker 2012); while GEE is available through gee (Carey, Lumley, and Ripley 2012) and geepack (Halekoh, Højsgaard, and Yan 2006, and references therein).However, all of these (semi)parametric procedures are based on specific model assumptions, e.g., existence of an expectation or homogeneous variances.In practice, such conditions can rarely be verified, and if observed measurements do not reflect the imposed conditions, e.g., in case of skewed distributions, outliers, or small sample sizes, parametric statistical procedures may result in unreliable or even false conclusions.
As an alternative, we can employ nonparametric rank-based methods that offer a flexible and robust framework for the analysis of a variety of longitudinal studies.In particular, in contrast to parametric procedures for factorial designs, the rank-based methodology is not restricted to data on a continuous scale and enables to analyze ordered categorical, dichotomous, and heavily skewed data in a systematic way (Konietschke, Bathke, Hothorn, and Brunner 2010).
Moreover, such nonparametric methods are robust to outliers and exhibit competitive performance for small sample sizes (Brunner, Domhof, and Langer 2002, Section 1.1, pp. 2).Note that, as discussed by Brunner et al. (2002), Robson (2002), Lehmann (2009), and Romano (2009), and references therein, if the assumptions of a parametric method are satisfied, the parametric procedure typically yields a higher power efficiency than its nonparametric counterpart.However, if these assumptions are not met or impossible to verify, the conclusions from the parametric method could be unreliable or even false.In such situations, more broadly applicable nonparametric approaches are preferred.Brunner and Puri (2001) and Brunner et al. (2002) provide a detailed overview of purely rank-based nonparametric methods for longitudinal data in factorial experiments, including descriptive point estimators of relative treatment effects (RTEs), confidence intervals, and test procedures.The hypotheses of "no treatment effect", "no time effect", and "no interaction between treatment and time" can be tested with nonparametric procedures for the analysis of data from factorial designs.Hereby, the hypotheses are not formulated in terms of expectations of treatment effects, but rather in terms of their distribution functions.
The working group from the Department of Medical Statistics, University of Göttingen, provides a SAS/IML macro library for nonparametric analyses of factorial longitudinal data.For each of different factorial designs, interactive matrix language (IML) code is implemented to test hypotheses formulated above and to compute confidence intervals.Given a high demand in publicly available software for robust longitudinal data analysis (Erceg-Hurn and Mirosevich 2008), we develop an R version of this SAS/IML macro library, i.e., a user friendly package nparLD that is freely available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=nparLD and offers nonparametric methodology for the most frequently arising factorial designs.The package nparLD is the first comprehensive R package that supports nonparametric methods in higher-way layouts.The name nparLD is interpreted as follows: "npar" stands for "nonparametric" while "LD" stands for "longitudinal data".All the implemented functions are accompanied by detailed help files and numerous examples.The goal of this paper is to introduce the developed nonparametric procedures in nparLD to a wide audience of statisticians and practitioners dealing with longitudinal studies.
The paper is organized as follows.In Section 2, we provide a brief overview of the nonparametric marginal model and commonly used factorial designs.In Section 3, we discuss the interpretation of results along with a review of relative treatment effects and their relationship to Wald-type statistics and analysis of variance (ANOVA)-type test statistics which are denoted by WTS and ATS, respectively, using case studies from dentistry, medicine, and biology.In Section 4, three different factorial longitudinal experiments are statistically evaluated with the new R package nparLD.Section 5 contains some conclusions and an outlook to future work.

Nonparametric factorial designs and hypotheses
We describe the idea of the nonparametric marginal model and its connection to different types of commonly arising factorial designs for longitudinal data.To classify common factorial designs, we introduce a notational system for each design depending on the number of factors.The factor which stratifies samples in independent groups, is called a whole-plot factor; while

Data
Marginal distributions Time (Factor T ) Table 1: The LD-F1 design and the corresponding marginal distributions.
the factor, stratifying repeated measurements, is called a sub-plot factor1 .For example, when male and female subjects are observed at three time points, the factor sex is the whole-plot factor and the factor time is the sub-plot factor.Each design is denoted by Fx-LD-Fy, where x and y are the number of whole-plot and sub-plot factors, respectively, while "LD" stands for "longitudinal data".If only one group of homogeneous subjects is being considered, then such a design is denoted by LD-Fy instead.Examples of LD-F1, F1-LD-F1, and F2-LD-F1 designs are studied in Section 4. For more details and other common designs, we refer to Brunner et al. (2002, pp. 25 ff.).

LD-F1 design
We consider an experimental design consisting of k = 1, . . ., n experimental units (subjects).Each subject is observed repeatedly t times.The data from subject k are collected in the vector with marginal distributions X 1s , . . ., X ns ∼ F s , s = 1, . . ., t.
Here, F s denotes the marginal cumulative distribution function from sample s.The total number of observations is N = n • t.The data structure of such a trial is displayed in Table 1.
The hypothesis of "no time effect" is expressed in terms of the marginal distribution functions as H F 0 (T ) : F 1 = . . .= F t and was introduced by Akritas and Arnold (1994).
As an illustration of the LD-F1 design, we may consider the psychiatric clinical trial by Bandelow et al. (1998) where the clinical global impression score (CGI) of 16 patients suffering from panic disorder was recorded during eight weeks under a treatment.Note that, since the response is measured on an ordered categorical scale, parametric and semiparametric meanbased approaches are inappropriate and may lead to unreliable conclusions.
In many factorial experiments, more than one homogeneous group of subjects is observed at multiple time points.Hence, in such cases, we get the Fx-LD-F1 design, where x is the number of whole-plot factors (e.g., treatments).The most common situation is the F1-LD-F1 design which is examined in Section 2.2.

Data
Marginal distributions Time (Factor T ) Time Factor The F1-LD-F1 design and the corresponding marginal distributions.

F1-LD-F1 design
Suppose that a different groups of homogeneous subjects are observed repeatedly at t different time points and each group receives a randomly assigned treatment (treatment 1, treatment 2,. .., treatment a).The statistical model underlying this design can be described by independent random vectors X ik = (X ik1 , . . ., X ikt ) , k = 1, . . ., n i , with marginal distributions X iks ∼ F is , i = 1, . . ., a; s = 1, . . ., t.The total number of observations is N = n • t, where n = a i=1 n i .The data structure of this F1-LD-F1 trial is displayed in Table 2.The hypotheses of no main effect A, no main time effect T , and no interaction (AT ) between A and T , are expressed in terms of the marginal distribution functions: where F is denotes the mean distribution over time for treatment group i, i = 1, . . ., a, F .s = 1 a a i=1 F is denotes the mean distribution over the treatment groups for time point s, s = 1, . . ., t, and F is denotes the overall mean distribution.Note that the hypotheses for the classical (parametric) linear longitudinal models are expressed in the same way with the expectations µ is .For a discussion of the formulation of hypotheses by distribution functions, we refer to Akritas and Arnold (1994).
As an example, we may consider the study on efficacy of irrigation techniques in removing debris from irregularities in root canals with different apical sizes by Rödig, Sedghi, Konietschke, Lange, Ziebolz, and Hülsmann (2010).In this study, thirty extracted human pre-molars were randomly divided into three groups (each of size n = 10) followed by root canal preparation.In all three groups, three different irrigation procedures were performed and the amount of remaining debris was measured on an ordinal scale.This trial constitutes an F1-LD-F1 design.A more sophisticated setting may include an additional stratification of the whole-plot factor, which is illustrated in Section 2.3.

F2-LD-F1 design
The F2-LD-F1 design is one of the most widely used settings in factorial experiments in spite

Data
Marginal distributions Time Table 3: The F2-LD-F1 design and the corresponding marginal distributions.
of its complexity.Here, a • b different groups of homogeneous subjects are observed repeatedly at t different time points.Such a design can be employed, for example, when subjects are first stratified by their gender (female or male), then in each stratification, subjects are randomly assigned to different treatments (treatment 1, treatment 2,. .., treatment a) and recorded at multiple time points.The statistical model of this trial can be described by independent random vectors The data structure of this F2-LD-F1 trial is displayed in Table 3, and the nonparametric hypotheses can be expressed in the same manner as in the F1-LD-F1 design.
To demonstrate an application of the F2-LD-F1 design, we may consider the shoulder tip pain study carried out with patients having undergone laparoscopic surgery in the abdomen, described by Lumley (1996).In this study, 41 patients were stratified by gender (Factor A), and in each stratification, the patients were randomly assigned to either the control or treatment group (Factor B).The pain scores of each patient were recorded at six time points (Factor T ) to evaluate the effect of gender, treatment, and their interactions.
Instead of considering stratifications on the whole-plot factor, we may have stratifications on the sub-plot factor (time factor).Such designs include LD-F2, F1-LD-F2, and F2-LD-F2, which are discussed in detail by Brunner et al. (2002, Chapter 2).
In Section 3, we examine rank estimators of the relative treatment effects and test procedures for the hypotheses discussed above.

Nonparametric effects, estimators, and test procedures
The general model (1) 2 does not entail any parameters by which a difference between the distributions could be described.Therefore, the mean distribution function H(x) = 1 t t s=1 F s (x) and the distribution functions F s (x) are used to define a treatment effect as a marginal summary measure between H and F s : where Z ∼ H denotes a randomly chosen observation from the whole data set independently from X 1s .These so-called relative marginal effects 3 p s can be regarded as the probability that a randomly chosen observation X sk at time point s tends to result in a larger value than Z.
The interpretation of p s is rather simple; i.e., X 1j tends to result in • a smaller value than X 2s , if p j < p s , • a larger value than X 2s , if p j > p s , • neither a smaller nor larger value than X 2s , if p j = p s .
A graphical illustration of the tendency is given in Figure 1, where both the distribution functions F i (x), F j (x), and the mean distribution function H(x), are displayed.
Figure 1 suggests that, for an easier interpretation of the effect size measure p s , it is sufficient to consider only the relation between p j and p s referring to <, =, and >, respectively.For a detailed discussion of p s , we refer to Brunner et al. (2002, Section 3.1, pp. 35 ff).
The unknown quantities p s can be estimated with overall ranks of the data by replacing all the observations X 11 , . . ., X ns with their overall ranks R 11 , . . ., R ns .Further, let R •s = 2 To derive the results for factorial settings, sub-indices for the random vectors X k in the model (1) are necessary.
3 Here, we use the term "relative marginal effect" as a synonym for "relative treatment effect". 1 n n k=1 R sk denote the mean of the ranks in the marginal sample s.Then, an asymptotically unbiased and consistent estimator of p s is given by where N = n • t.To test the hypothesis H F 0 : F 1 = . . .= F t , let C denote a suitable contrast matrix, F = (F 1 , . . ., F t ) the vector of distributions, p = (p 1 , . . ., p t ) the vector of the relative marginal effects, and p the vector of corresponding estimators.Further, let V n denote the empirical covariance matrix of the ranks, noting that V n is a consistent estimator.Then, under the hypothesis H F 0 : CF = 0, the Wald-type statistic (WTS) has, asymptotically, a χ 2 f distribution with f = rank(C) degrees of freedom under some mild regularity conditions (Akritas, Arnold, and Brunner 1997).Here, M + denotes the Moore-Penrose inverse of a matrix M.
It is well known that the convergence of the distribution of Q n to its limiting χ 2 f distribution is relatively slow (Akritas et al. 1997;Akritas and Brunner 1997;Brunner, Munzel, and Puri 1999;Brunner et al. 2002).In general, the asymptotic approximation deteriorates with an increase of a number of factor levels as well as for smaller sample sizes.Hence, we also present a small sample modification of this test statistic that maintains an accurate size of the test even for small sample sizes (n ≥ 7).
Let T = C [CC ] − C denote the projection matrix obtained from C where M − denotes a generalized inverse of M.Then, the distribution of the ANOVA-type statistic (ATS) can be approximated by the F ( f ,∞) distribution, where f = [tr(T V)] 2 /tr(T VT V) (Brunner, Dette, and Munk 1997;Brunner et al. 1999;Brunner and Puri 2001).
Note that, depending on additional stratification or experimental groups, an appropriate partition into sub-indices is necessary.For example, if the experimental units are divided into i = 1, . . ., a groups with j = 1, . . ., n i units in the i-th group, then X ijs denotes the random variable for the jth unit in the ith group at time point s.
For the main effects of the whole-plot factors and interactions involving only the whole-plot factors, the distribution of ATS can be further approximated by applying a finite denominator degrees of freedom f 0 , i.e., by the F ( f , f 0 ) distribution, taking advantage of the diagonal covariance matrix resulting from independence of subjects4 .(For further details on f 0 , see Brunner et al. (1997) and Brunner et al. (2002, Section 8.3.3, pp. 134 ff).)Real-life applications of the so-called modified ATS can be found in Sections 4.2 and 4.3.
Note that, the adjusted degrees of freedom used for the approximation of the distribution of ATS may appear to be quite different from the conventional degrees of freedom employed in the traditional repeated measures ANOVA.However, such an adjustment for degrees of freedom can be viewed as a generalization of the conventional degrees of freedom in the heteroscedastic case.For instance, for the repeated measures data in the LD-F1 design, assuming sphericity, e.g., compound symmetry structure, of the covariance matrix, approximate degrees of freedom for the distribution of ATS, which is equivalent to treatment sum of squares divided by residual sum of squares, are equal to ( f , f 0 ) = (t − 1, (n − 1) • (t − 1)) (Bathke et al. 2009).However, in general, ranked observations are heteroscedastic even if the original observations are homoscedastic (Akritas 1990), and thus it is reasonable to assume an arbitrary (unstructured) covariance matrix (Brunner et al. 2002, Section 2.2, pp.33)5 .Therefore, for such covariance matrix structure, an appropriate adjustment for degrees of freedom is necessary to draw a valid inference using ATS.

Examples
In this section, we provide examples that illustrate how different factorial designs can be analyzed using the package nparLD.Along with the individual functions for specific designs (ld.f1(), ld.f2(), f1.ld.f1(), f2.ld.f1() and f1.ld.f2()), the package provides a wrapper function nparLD() that automatically identifies the most suitable design through the formula provided by a user.The wrapper function nparLD() creates a class object called nparLD from which users may obtain short and extended summaries as well as a plot of the results using print(), summary(), and plot() for the nparLD object, respectively.In particular, the print() function displays basic results about the model formula and results from the WTS and ATS; the summary() function shows the relative treatment effects in detail additionally to the output provided by print().Finally, plot() creates plots of the relative treatment effects and their corresponding confidence intervals at different time points.

Study from dentistry
Our first case study is related to a growth curve problem where the LD-F1 design may be employed.Potthoff and Roy (1964) assess distances (in millimeters) between the center of the pituitary and the pterygomaxillary fissure of 16 boys and 11 girls on four different occasions, i.e., at the ages 8, 10, 12, and 14, and conclude that two separate growth curves are required for boys and girls.In this example, we focus on the homogeneous group of the 16 boys.(The data for boys are available in dental.A complete dataset for both boys and girls is available in Orthodont in the package nlme.)In particular, we are interested in testing the hypothesis H F 0 (T ) : F 8 = F 10 = F 12 = F 14 of no time effect, where F s denotes the marginal distribution of the distances at age s.We start our analysis by examining the box plot shedding light on the distribution of the data for each age group, and by observing the plot of the relative treatment effect for each age group along with the pointwise 95% confidence intervals.The plots are generated using the following code: R> library("nparLD") R> data("dental") R> boxplot(resp ~time, data = dental, lwd = 2, xlab = "time", + font.lab= 2, cex.lab = 2, main = "Box Plots")  ----------------------Check that the order of the time level is correct.Time level: 8 10 12 14 If the order is not correct, specify the correct order in time.order.
Remark.Note that, as a precautionary check, the function nparLD() automatically provides information about the selected model as well as the order of levels in the sub-plot factor.The user may choose to hide this information by setting order.warning= FALSE.Moreover, more detailed information about the data and abbreviations used in the output become available by setting description = TRUE.
The box plots at the left panel of Figure 2 show the minimum, first quartile, median, third quartile, and the maximum distance measured for each time point separately.They indicate that the measured distances have a somewhat skewed distribution (especially as the age goes up).The increase in median gives rise to a time effect.The 95% confidence intervals at the right panel of Figure 2 present the lower bound, point estimate, and the upper bound for each time point separately using plot(ex.f1np),where ex.f1np is an nparLD class object.The point estimates increase, meaning the older the boys, the larger the observed distances between pituitary and the pterygomaxillary fissure.The exact values for the bounds and point estimates are obtainable by using the code plot(ex.f1np)$Conf.Int.The results can be explored further by using summary(ex.f1np).
Remark.Note that, a further insight about dependence of the measurements per subject can be readily obtained by using the groupedData() function from the R package nlme (Pinheiro et al. 2012)  For each age group s, the rank mean of the overall ranks (RankMeans), the number of observations (Nobs) and the point estimate p s of the relative treatment effect (RTE) are displayed.The obtained result of 0.29 for the age group 8 (time8) can be interpreted, for example, as follows: a randomly chosen observation from the whole dataset results in a smaller value than a randomly chosen observation from the age group 8 with an estimated probability of 29%.Further, since p 8 < p 10 < p 12 < p 14 , the observations from the age group 8 tend to result in smaller values than those from the age group 10 which, in return, also tends to result in smaller values than the measurements from the age groups 12 and 14, respectively.Thus, an increase in the effect seems to indicate the increase in the measured distances.
To test the hypothesis H F 0 (T ) of no time effect, WTS in (3) and ATS in (4) can be applied, which are also displayed in the output of summary(ex.f1np).The column df for ATS is the numerator degrees of freedom of the F distribution as the denominator degrees of freedom is set to infinity.Both WTS and ATS yield highly statistically significant p values of 2.392×10 −20 and 1.438 × 10 −18 , respectively, indicating that the null hypothesis of no time effect is to be rejected.To investigate the question about which of the four distribution functions differ, we can apply multiple comparisons with the Bonferroni adjustment as described below: R> m810 <-which(((dental$time == 8) + (dental$time == 10)) == 1) R> m812 <-which(((dental$time == 8) + (dental$time == 12)) == 1) R> m814 <-which(((dental$time == 8) + (dental$time == 14)) == 1) The results are presented in Table 4, where, for brevity, only the p values obtained from ATS are reported.
In Table 4, the Bonferroni-adjusted p value of 0.6612, obtained for testing the age group 8 against the age group 10 (Time 8 vs.Time 10), is calculated by multiplying the original p value of 0.2204 by 3. Similar calculations are also performed for the other pairwise comparisons.
From the results, we can conclude that the distance between the center of the pituitary and the pterygomaxillary fissure significantly increases over time by observing the p values of < 0.0001 from both WTS and ATS.In addition, we notice significant differences between the distributions of the measured distances for the age groups 8 and 12, and age groups 8 and 14, respectively.To compare the obtained results and conclusions with parametric methods, we further reanalyze the data with the lme() function in the R package nlme (Pinheiro et al.

2012) as described below:
R> library("nlme") R> ex.f1lme <-lme(resp ~time, data = dental, random = ~1 | subject) R> summary(ex.f1lme) We obtain an overall significant time effect (p value < 0.0001).Regarding the multiple comparisons against age group 8, using the code and multiplying the original p value by 3, we obtain the adjusted p value of 0.4395 for the comparison "Time 8 vs.Time 10", as well as the p values of 0.0009 and < 0.0001 for "Time 8 vs.Time 12" and "Time 8 vs.Time 14", respectively.Thus, both parametric and nonparametric procedures result in similar conclusions in this example, which is not surprising since the data exhibit only a minor degree of skewness as indicated by the box plots (see the left panel of Figure 2).

Rat growth study
Our next example deals with the study of body weights of 27 rats (Box 1950;Wolfinger 1996).Each rat was randomly assigned to one of three treatments (control, thyroxin, or thiouracil) with sample sizes 10, 7, and 10, respectively.Thyroxin is a thyroid hormone typically applied in hypothyroidism, and thiouracil is a drug that suppresses generation of thyroxin.The first group was kept as a control while the second and third group had thyroxin and thiouracil added to their drinking water, respectively.The weight (in grams) of each rat was recorded at baseline and subsequent four weeks.Thus, this experiment has the F1-LD-F1 design structure with treatment being the whole-plot factor.As is pointed out by Wolfinger (1996), the body weights of the 27 rats show a "fanning effect", indicating an increase in variability over time.Although a standard technique of the logarithmic transformation makes the data more homoscedastic, there is a concern that such a nonlinear transformation distorts the time and treatment effects.Therefore, we analyze the data using our nonparametric methods.Similar to the dental study, we first examine the box plots of the data and the plot of the relative treatment effect estimates and their corresponding confidence intervals.R> library("nparLD") R> data("rat") R> boxplot(resp ~group * time, data = rat, names = FALSE, + col = c("grey", 2, 3), lwd = 2) R> axis(1, at = 2, labels = "Time 0", font = 2, cex = 2) R> axis(1, at = 5, labels = "Time 1", font = 2,cex = 2) R> axis(1, at = 8, labels = "Time 2", font = 2,cex = 2) R> axis(1, at = 11, labels = "Time 3", font = 2,cex = 2) R> axis(1, at = 14, labels = "Time 4", font = 2,cex = 2) R> legend(2, 190, c("Control", "Thiour", "Thyrox"), lwd = c(3, 3, 3), + col = c("grey", 2, 3), cex = 2) R> ex.f1f1np <-nparLD(resp ~time * group, data = rat, + subject = "subject", description = FALSE) R> plot(ex.f1f1np)-----------------------Check that the order of the time and group levels are correct.Time level: 0 1 2 3 4 Group level: control thyrox thiour If the order is not correct, specify the correct order in time.order or group.order.Relative Effects q q q group control group thiour group thyrox Figure 3 shows box plots (the left panel) and 95% confidence intervals (the right panel) for the relative treatment effects in the rat growth study for the three treatments and each time point separately.The box plots indicate that the data follow a skewed distribution.The 95% confidence intervals further imply that the weights increase over time in all treatment groups.

F1 LD F1 Model
The main question of this experiment is whether the time profiles of the three experiments are parallel, i.e., if there exists a statistical interaction between treatment and time.Absence of such an interaction would be indicated by parallel time profiles.Regarding the box plots and the 95% confidence intervals in Figure 3, the time profile from the thyroxin group does not seem to be parallel to both the control and the thiouracil group time profiles.A secondary question of interest would be to know whether the treatments affect the weight gains.The nparLD() and summary() functions can be applied to test the hypothesis of no interaction between treatment and time formulated in terms of the distribution functions, which help us answer questions raised above.

R> summary(ex.f1f1np)
For the F1-LD-F1, F1-LD-F2, and F2-LD-F1 designs, in addition to WTS and ATS, the modified ATS using the Box (1954) approximation for the whole-plot factors and their interaction (Brunner et al. 1997) are available.The modified ATS has a finite denominator degrees of freedom (denoted by f 0 as discussed in Section 3) as opposed to ATS which has the denominator degrees of freedom equal to infinity, in order to improve approximation of the distribution under the hypothesis of "no treatment effect" and "no interaction between the whole-plot factors".The output for each test is presented below: In this study, the hypothesis of no interaction, i.e., parallel time profiles, is rejected at the 1% level using both WTS and ATS with the p values of 4.199 × 10 −12 and 3.617 × 10 −8 , respectively.To investigate the question of whether the hypothesis of no interaction is rejected at each time point, multiple comparisons can be performed by using the nparLD() function repeatedly with the baseline and week 1 values, followed by comparison of the baseline, week 1, and week 2 values, etc. Similarly as in the dental study, the Bonferroni adjustment can be applied to control the Type I error.

Respiratory disorder study
Our last example concerns about a clinical trial for patients with a respiratory disorder (Koch, Carr, Amara, Stokes, and Uryniak 1990).A total of 111 patients from two centers were randomly assigned to two treatments (active or placebo).The status of each patient was recorded on an ordinal scale (0 = terrible, 1 = poor, 2 = fair, 3 = good, 4 = excellent) at baseline and subsequent four visits.An advantage of the rank-based nonparametric methods is that they can handle ordinal data in the same manner without requiring any further transformation.
As the box plots suggest (see the upper panel of Figure 4), scores under the active treatment are larger than under placebo, and that there exists a difference in scores between the two centers.The lower panel of Figure 4 shows the 95% confidence intervals for the relative treatment effects p ijs .We observe that, in both centers, the effects for the placebo group seem to remain constant over time, while the effects for the active treatment group increase.In Center 1, the clinical record seems to decrease at the last visit, which cannot be observed in Center 2. The time effect, treatment effect, center effect, and their interactions can be analyzed using the print() function.Treat A Treat P q q q q q q q q q q q q −1 0 1 2 3 4 5 Center 2 Time 1 Time 2 Time 3 Treat A Treat P q q q q q 0.0 treatment A treatment P q q q q q 0.0  The print() function automatically prints out tables of results for all main effects and interactions in the F2-LD-F1 design using WTS, ATS, and a modified ATS for the whole-plot factors.Both WTS and ATS indicate a significant interaction between treatment and time at the 5% level, with the p value of 0.0006 using ATS.This confirms the interpretation of the confidence intervals regarding the time profiles in Figure 4.Moreover, both a significant treatment (p value of 0.0028) and center effect (p value of 0.0018) are observed from the modified ATS, where the significant center effect should be further investigated.
Now, let us compare the obtained results with the conclusions provided by the parametric methods.In particular, we apply lme() from nlme to our respiratory data below: Note that, the parametric method of lme() yields completely different conclusions compared to the nonparametric procedure of nparLD().In particular, the results from lme() imply that time is an insignificant variable, i.e., the obtained p value is 0.1138, while nparLD() provides a highly statistically significant p value of 0.0029, concluding that the scores do not evolve in time.Similarly, treatment is declared as an insignificant factor by lme(), with the p value of 0.6085; in contrast, nparLD() concludes that there exists a significant treatment effect, with the resulting p value of 0.0022.Such contradictory results are not surprising since the respiratory data are observed on an ordered categorical scale and, hence, parametric methods are inapplicable.Thus, following the output of nparLD(), we are inclined to conclude that scores are dynamic in treatment and time, and that interaction between treatment and time is significant.These results are also confirmed by the graphical diagnostics provided by the box plots of the respiratory data (see the upper panel of Figure 4).

Conclusion and future work
The R package nparLD implements a broad range of rank-based nonparametric methods for analyzing longitudinal data in factorial experiments.A notable novel feature of nparLD is that it accommodates various factorial designs, including higher-way layouts.The users can easily evaluate the treatment and time effects as well as their interactions via the robust ANOVAtype statistic (ATS), which accurately controls the Type I error rate even for small sample sizes, and the classical Wald-type statistic (WTS).We plan to update the package nparLD on a regular basis with new nonparametric statistical procedures available for longitudinal data.In particular, we aim to implement the multiple contrast testing procedures discussed by Konietschke et al. (2010).In addition, we plan to undertake a major update of the code and release nparLD in the S4 style.

Figure 2 :
Figure 2: Box plots and 95% confidence intervals for p s in the dental study.

Figure 3 :
Figure 3: Box plots and 95% confidence intervals for p is in the rat growth study.Thiouracil and thyroxin are denoted by thiour and thyrox, respectively.

Figure 4 :
Figure 4: Box plot and 95% confidence intervals for p ijs in the respiratory disorder study.
that produces subject-specific line plots in factorial experiments.

Table 4 :
Multiple comparisons against the control in the dental study with Bonferroni adjustment.