Continuous Ordinal Regression for Analysis of Visual Analogue Scales: The R Package ordinalCont

This paper introduces the R package ordinalCont, which implements an ordinal regression framework for response variables which are recorded on a visual analogue scale (VAS). This scale is used when recording subjects’ perception of an intangible quantity such as pain, anxiety or quality of life, and consists of a mark made on a linear scale. We implement continuous ordinal regression models for VAS as the appropriate method of analysis for such responses, and introduce smoothing terms and random effects in the linear predictor. The model parameters are estimated using constrained optimization of the penalized likelihood and the penalty parameters are automatically selected via maximization of their marginal likelihood. The estimation algorithm is shown to perform well, in a simulation study. Two examples of application are given: the first involves the analysis of pain outcomes in a clinical trial for laser treatment for chronic neck pain; the second is an analysis of quality of life outcomes in a clinical trial for chemotherapy for the treatment of breast cancer.


Introduction
Visual analogue scales (VAS) are used for measuring quantities which are intangible and difficult to measure on conventional scales, such as pain, anxiety and quality of life. These are generally used for self-rating. Subjects are given a linear scale of 100 mm and asked to put a mark where they perceive themselves. The scale has verbal anchor descriptors at each extreme, as depicted for the measurement of pain in Figure 1. The VAS reading is taken as the measurement from the left endpoint to the subject's mark, and is usually normalized to lie in the interval [0, 1]. Statistical analysis of the VAS is controversial. While it is a bounded, continuous variable, several authors have argued that it is ordinal, rather than ratio in nature, and should be treated as such in analysis (for example Philip 1990;Wewers and Lowe 1990;Svensson 2000;Kersten, Küçükdeveci, and Tennant 2012). The issue is that the VAS cannot be considered a ratio measurement, or "linear", because for example a 1-cm difference in VAS scores at the lower end of the scale does not necessarily represent the same difference in the intangible outcome as a 1-cm difference at the upper end; and a doubling of VAS score may not translate to a doubling of e.g., the pain or anxiety. Whatever the extent of the "linear" portion in the VAS, it seems reasonable that some nonlinearity will be observed around the limits, where a higher density of observations, caused by perceptive states considered extreme or close to extreme, could be expected. The problem of non-interpretability of distances between measurements and the possibility of nonlinear behaviour, particularly at one or both extremes of the scale, is overcome by treating VAS measurements as ordinal rather than ratio data. We therefore refer to scales of this type as continuous ordinal.
We have developed a regression framework for continuous ordinal responses (Manuguerra and Heller 2010) and implemented it in the R package ordinalCont, which we present in this paper. The package, created in R (R Core Team 2020), is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=ordinalCont (Manuguerra and Heller 2020). The relevance of continuous self-rating scales in the pain literature has been described in Heller, Manuguerra, and Chow (2016), where the frequent use of suboptimal methods in the analysis of VAS and the superior power of continuous ordinal regression analysis are discussed.
The remainder of this paper is structured as follows. Section 2 gives an overview of ordinal regression, in which the response variable is an ordered category. In Section 3 we develop a regression model based on responses such as VAS, which are measured on a bounded continuous scale but are at an ordinal level. Section 4 describes its implementation in the R package ordinalCont. Sections 6 and 7 give two examples of application, the first a pain response and the second quality of life.

Ordinal regression
Ordinal regression models (McCullagh 1980) are widely used for regression analysis of discrete ordinal responses Y within K ordered categories. The Y 's are considered as coarse versions of an unobserved, continuous latent variable W , such that where the α j 's are the correspondence on the latent variable scale of the category boundaries on the ordinal scale and −∞ = α 0 < α 1 < · · · < α K = ∞. This is depicted in Figure 2. Typically W is an intangible quantity such as pain, anxiety or quality of life and y = 1, 2, . . . , K codes for ordinal states such as "none", "mild", "moderate", "severe".
Figure 2: Correspondence between latent variable W and observed response Y , with K = 5.
The ordinal nature of Y is preserved by basing the regression model on the cumulative probabilities γ j = P(Y ≤ j). To relate the γ j to covariates x = (x 1 , . . . , x p ) in the jth category, we write: The covariates are modelled on the latent scale: where is a random error term, whose distribution we need to specify. A convenient choice for the distribution of is the standard logistic distribution, having cumulative distribution function (CDF) F(z) = P( ≤ z) = 1/(1 + e −z ) (Johnson, Kotz, and Balakrishnan 1995). Then Inverting this translates to the cumulative logistic model (also called the proportional odds model) for Y : In the ordinal regression literature, the intercepts α j are either ignored or used to characterize differences in category "size". In agreement with the latter approach, we argue that the α j s are worthy of careful modelling as they relate to an important cognitive aspect, i.e., how the perception of W changes at different levels. This behaviour can be of particular interest around the extremes of the scale, where steep curvature is indicative of a tendency for subjects to perceive and score their outcome at the extreme. The ordinal regression model has developed to incorporate additive and non-linear functional forms of predictors (Hastie and Tibshirani 1987) and spline-based smoothers of predictors (Yee and Wild 1996). In Tutz (2003), a general framework to deal with semiparametrically structured models is given, with predictors that can contain parametric parts, additive parts with an unspecified functional form, and interactions. In particular global effects and categoryspecific effects are distinguished. In R, the standard ordinal regression is implemented in the package ordinal (Christensen 2019).
Typically VAS responses are analyzed by simply treating the VAS measurements as continuous responses in a normal model (for example Olivan-Blázquez et al. 2014); or by nonparametric methods such as the Mann-Whitney and Kruskal-Wallis tests (for example Mentula, Kalso, and Heikinheimo 2014); or by dichotomizing ("pain", "no pain") and using logistic regression (for example Choi et al. 2014); or by categorizing into a small number of levels and using model (2). Of these, the latter approach is most satisfactory, although it involves a loss of information. In a PubMed search we could only find one instance of it (Atrian, Abbaszadeh, Sarvieh, Sarafraz, and Jafarabadi 2013). The other approaches are less than optimal: use of the normal regression model has obvious distributional shortcomings; nonparametric methods are valid in that they are rank-based and therefore respect the ordinality of the measure; however, they are limited to comparisons between groups and cannot incorporate covariates or other more sophisticated regression features such as random effects.
We have proposed (Manuguerra and Heller 2010) a generalization of the standard ordinal model (2) which leads to an ordinal regression model for continuous scales. In this, the intercepts are modelled using parametric functions or B-splines. The original Bayesian estimation methodology has been updated (Manuguerra, Heller, and Ma 2017) to maximum penalized likelihood (MPL), which is faster and more stable. An extended version of this is described in Section 3. Similarly, Liu, Shepherd, Li, and Harrell Jr (2017) have used cumulative probability models for continuous outcomes. Their approach makes use of non-parametric maximum likelihood to estimate the parameters of the model and the intercepts and is then robust, but has the disadvantage of estimating a high number of parameters.

The continuous ordinal regression model
Consider VAS measurements v which are sampled from a continuous response variable V ∈ (0, 1) 1 , with density f v (v) and CDF γ(v). The continuous ordinal response variable V reflects the subjective perception of the underlying continuous latent variable W defined on the real line. The dependence between V and W is modelled by a smooth one-to-one function g : (0, 1) → R that maps v on the VAS scale to w = g(v) on the latent scale. This mapping is the link between the recorded perception of the latent variable and an underlying metric. As for the standard ordinal model, the covariates X are modelled on the latent scale. Defining the regressor h (X) and assuming W = −h (X) + , the cumulative distribution function for the score V can then be written as: From Equation 3 it is clear that the ordinal regression model is a transformational model (Möst and Hothorn 2015;Hothorn, Möst, and Bühlmann 2018) for the conditional distribution of V , which depends on the distribution function of , F(·), and the regressor h(v, X) = g(v)+h (X).
The function g(v) is the continuous analog of the discrete intercepts α j in model (2), and its shape is informative of the change in perception of the latent variable at different levels (as are the α j ). The systematic component h (X) may incorporate fixed effects, random effects and smoothing terms. Random effects are useful not only for modelling clustered or longitudinal data, as in the examples in Sections 6 and 7, but also for modelling individual variation due to subjective perception of the latent variable.
Differentiating Equation 3, we can write the density of V as: ∂v . The log-likelihood is obtained taking the logarithm of Equation 4. Re-ordering the terms, for subject i it can be written in the general form: A variety of distributions can be used for F(·), and some of the most popular ones (logit, probit, loglog, cloglog, Cauchit) are implemented in the ordinalCont package. We can comment that if the cumulative distribution is F(h) = e h 1+e h (logistic model), the regressor h(v, X) can be interpreted as the log-odds and the systematic component h (X) as the log-odds ratio to an observation with h (X) = 0, regardless of the value of v. When F(h) = 1 − e −e h (cloglog link function) and the score v is the time to event, the model can be used for survival analysis and the g function is the log-cumulative hazard at baseline. From Equations 3 and 4, we can derive the general forms of the survival function S(v|X) = 1 − F(h(v, X)), the cumulative hazard function Λ(v|X) = − log(1−F(h(v, X))) and the hazard function λ(v|X) =Ḟ (h)ġ(v) 1−F(h(v,X)) .

The g function
In order to specify model (5), we need to define the form of the function g (v). It has to be differentiable, monotonic increasing and "flexible enough" to capture the nonlinear behaviour of the ordinal measure. In the contest of the ordinal regression, where the recorded scores are always bounded and can be normalised to the interval [0, 1], any function which maps (0, 1) to R could be appropriate (see next paragraph for how to transform data from a closed to an open interval). The choice of such a function does not decrease the generality of the formulation of the problem and its applicability to other models (e.g., transformational models which are not ordinal models), as real data are always bounded in an interval and can then be normalised. The function g(v) has previously been expressed using a parametric approach, as in Manuguerra and Heller (2010), or non-parametrically, as in Liu et al. (2017). In this study we have implemented an approach based on monotonic I-splines Ψ u (Ramsay 1988). We can where p θ is the number of basis functions. This specific formulation allows for more flexibility than the parametric approach, not assuming the shape of the function g(v); makes it easy to satisfy the monotonicity constraint, by imposing θ u ≥ 0, ∀u; provide the first two derivatives in easy-to-express forms, i.e.,ġ( can be efficiently computed using the properties of I-splines and M-Splines.

Scale endpoints
A problem in extending the ordinal regression model to continuous scales is that the VAS is a closed scale, i.e., it contains the extremes 0 and 1. As lim v→0 g(v) = −∞ and lim v→1 g(v) = ∞, the presence of either 0 or 1 results in the model not being identifiable. Both extremes are valid recordings, in particular v = 0, reflecting an absence of the latent variable (e.g., pain or anxiety). In order to deal with this numerically, where necessary we use the scaling proposed by Smithson and Verkuilen (2006).

Covariates
In the current formulation, the roles of the observed score v and the covariates X in the regressor h(v, X) are separated in the two terms g(v) and h (X). Assuming p fixed effects, r individual effects and s smoothing terms, the latter can be written as: where U is a n × p design matrix for the fixed effects, Z j is the design matrix for the j-th individual effect and Φ l is the matrix containing the basis functions (B-splines) evaluated at the values of the covariate Y l . Here η is the set of parameters η 1 , . . .

Penalized likelihood estimation
B-splines, I-splines and individual effects in the model can lead to overfitting. We avoid it using penalty functions, i.e., including additional constraints on ϑ and θ, the parameters that define the smoothing terms and the g function. Our experience (see also Ruppert, Wand, and Carroll 2003, page 66) reveals that a penalty function can stabilize the estimation process and makes the selection of number and location of knots less important.
The penalized log-likelihood can then be written as: where the J's are penalty functions, such as roughness penalties, and the first p terms in the last sum account for penalty terms relative to the fixed effects, and are zero. The roughness penalties can be written in general as J(θ) = θ Rθ where the R terms are square matrices with elements given by R jk = Ψ j (y)Ψ k (y) dy. For the individual effect terms, R is an identity matrix and the penalty term is J The model parameters we want to estimate are η = η 1 , . . . , η t and θ = θ 1 , . . . , θ m . Their constrained MPL estimates are then defined by

Estimation of model parameters
The estimation procedure iterates through two steps repeated until convergence. First, given the current values of the λ's, the parameters that define the g function and the predictor, θ and η, are estimated, then given the current value of θ and η the λ's are calculated according to the method detailed in Section 3.6. The first estimation of the parameters θ and η is done using maximum likelihood, i.e., with the starting value λ 0 = 0. When the number of observations is large compared to the number of parameters, this choice does not create convergence problems, but the user should be careful in the opposite case. We comment that this is a constrained optimization as θ ≥ 0, and that g(v) and h(v, X) are estimated jointly, as they are updated at every step.
The Karush-Kuhn-Tucker (KKT) necessary conditions for the constrained MPL estimation of θ and η are: Equations 9 and 10 can be solved iteratively, with the unconstrained parameters η estimated using a Newton method and the positively constrained parameters θ estimated using the multiplicative iterative (MI) algorithm (Ma 2010).
The first and the second derivative of the penalized log-likelihood (7) with respect to η are: , R is a block diagonal matrix with R m matrices on the diagonal and λ = λ 1 , . . . , λ t .
Given the estimated values of η at iteration k + 1 and θ at iteration k, the values of θ at iteration k + 1 are obtained with the MI algorithm (Ma, 2010). The MI algorithm can handle a large number of non-negatively constrained parameters as it is very efficient to implement, involving only the first derivative of the penalized likelihood function. When θ > 0, by the second KKT condition (10): and 0 otherwise. Following Ma (2010), it is straightforward to derive the updating algorithm:

Automatic smoothing parameter estimation
Fixing the values of η to the most current estimate, the optimal smoothing parameters are the roots of the partial derivatives with respect to each λ m of the marginal posterior computed integrating out θ and η from (7). The log-posterior can be written as: where p m is the number of parameters estimated for the m-th smoother or random effect. From here the marginal log-posterior is obtained by integrating out θ and η: Exact solution to this integral is infeasible, but it can be approximated following Laplace (1774): for σ 2 θ and σ 2 ηm we obtain the desired smoothing parameters estimates:

Quantile residuals
The estimated CDF of V , commonly called probability integral transform, iŝ whereĝ(v) andĥ (X) are calculated at the estimated values of θ and η. Quantile residuals (Dunn and Smyth 1996) are computed at observed v i as where Φ −1 (·) is the inverse standard normal CDF. If model assumptions are satisfied and there is sufficient diversity in the observations (v i , X i ), then the residuals q i should be normally distributed. On the other hand, if observations are highly discretized, for example at value v = 0 with limited covariate values, the q i could deviate from normality. In this case, we recommend the use of randomized quantile residuals (Dunn and Smyth 1996).

Package overview
The ordinalCont R package implements the continuous ordinal regression models, using maximum penalized likelihood estimation. The main function is ocm for both fixed-and mixedeffect models. Model equations are specified with syntax as in, for example, function lm and the package lme4 (Bates, Mächler, Bolker, and Walker 2015). The output of the ocm function is an object of class 'ocm', and several methods are provided for it, including plot, print, summary, extractAIC, logLik, predict and anova. The plot method displays the following graphical summaries: • a plot of the estimated g function with 95% confidence intervals; • the histogram and normal Q-Q plot of the quantile residuals; • plots of the smooth terms (B-splines) for covariates x, specified as s(x) in the formula (if included in the model).
The predict method provides, for given covariate values X * and parameter estimatesη and θ, the predicted values for v, the conditional density function of v, the conditional distribution function, the conditional quantile function, the linear regressor, the exponential of the linear regressor, the conditional hazard functions, the conditional cumulative hazard function and the conditional survival function.

Results
In the simulations we have used four sample sizes, n = 100, 200, 500, 1000, and for each sample size we have generated and analysed 1000 samples. In Table 1 we report the means and the standard deviations of the 1000 estimates, while in Figures 3 and 4 the estimates of the smoother and the g function are shown together with the true values of the functions for a typical simulation.
As expected with penalized likelihood regressions, increasing the sample size has the effect of decreasing the bias and the variance of the parameter estimates. It is worth noting that both the variance of the random effects and the nonlinear effect of a 3 are well estimated for all the sample sizes, n = 100 included. In the latter case, despite the limited number of observations, the estimates are still good indicators of the strength and direction of the effects.

Application 1: Chronic neck pain study
In this example, we illustrate the use of the continuous ordinal model when treatment effects, interaction effects and random effects are included in the model.
This study (Chow, Heller, and Barnsley 2006) is a randomized controlled trial on the efficacy of low-level laser therapy (LLLT) in the treatment of chronic neck pain. Patients with chronic neck pain were randomized to receive 14 LLLT treatments over 7 weeks, with either active or placebo laser (laser = 1, laser = 2 respectively). The primary outcome measure of the study was the VAS, marked by the patients at baseline, at the end of the course of treatment and one month after completion of the treatment. We analyzed data from n = 84 subjects measured at the three time points. The VAS scores are in [0, 1], and the ocm function automatically scales them to (0, 1) for the purpose of analysis.
The data layout is (first three cases):
To check the model fit, we look at the histogram of the quantile residuals and the quantile residual normal Q-Q plot:

R> plot(lasermodel0)
The plots ( Figure 5) suggest an adequate model fit, even if with a slightly skewed distribution of the quantile residuals, confirmed by the Q-Q plot.
To model the VAS scores at the three time points, we use a random effect for patient. The continuous ordinal model is where x 2 and x 3 are indicator variables for times t = 2 and t = 3 respectively, and withinpatient correlation is modelled using the random effect b i ∼ N (0, σ 2 ) for patient i. The function ocm is used, with the term (1|id) specifying the random intercept for patient (id): Significant treatment benefits are demonstrated at times 2 and 3, as evidenced by the significance of the interaction terms laser1:time2 (β 12 ) and laser1:time3 (β 13 ), with an attenuation of the positive effect at time 3. Note that there was an imbalance between treatment groups at baseline, with the active treatment group having worse pain (β 1 = −2.14, p = 0.001).
The random effects (b i ) are sampled from a normal distribution with significant variancê σ 2 ≈ 6.1, signifying a persistent subject effect. It is particularly important to incorporate this effect because of the individual and subjective nature of the measurements.
We use the predict method to compare the observed and predicted values of the VAS score.
R> plot(lasermodel1$v, predict(lasermodel1), xlab = "observed score", + ylab = "predicted score", xlim = c(0, 1), ylim = c(0, 1)) R> lines(c(0, 1), c(0, 1)) The results are shown in Figure 6. The reader could notice that attenuation is present in the plot, as predicted scores are higher than observed scores for low values of v, and vice-versa. We can comment that, based on our experience with simulated and real data sets, this feature is not commonly observed, even when a high proportion of the observed score v takes values 0 or 1.
In order to appreciate the estimated effect of the treatment on the score v, and under the assumption of individual effect b = 0, we can calculate the conditional density distributions f (v|treatment = "laser", b=0) and f (v|treatment="placebo", b=0), and the conditional cumulative distributions F(v|treatment="laser", b=0) and F(v|treatment="placebo", b=0).

Application 2: Breast cancer study
In this example, we demonstrate the use of a smooth term for one of the covariates and of the predict.ocm method.
Metastatic breast cancer is the most common cause of cancer death among Australian women. The ANZ0001 trial is a randomized trial with three chemotherapy treatment arms (n = Health-related quality of life is assessed at each chemotherapy treatment cycle. The treatments Intermittent Capecitabine (IC) and Continuous Capecitabine (CC) are compared with the standard combination treatment CMF, each with its own protocol. There is no maximum duration of treatment, but it is interrupted on disease progression, or when patient intolerance or unacceptable toxicity are recorded.
In this analysis we aim to verify which treatment has a better impact on quality of life, and in particular how this impact changes over chemotherapy cycle. Various aspects of quality Cycle  Treatment  IC  CC  CMF  0  85  85  90  5  61  56  51  10  34  38  10  15  19  21  3  20  13  11  2  25  7  6  0  45 1 3 0 60 0 1 0  Table 2. Note that no patients on standard CMF treatment progressed beyond cycle 20, and for the other two treatments the data are sparse beyond this point. We should comment that the software does not natively accommodate censoring at the moment. This includes informative censoring, e.g., cases in which subjects quit the study for causes correlated with the covariates (death, relapses, etc.). In these cases, the results could be biased.
We used the overall quality of life VAS scale as dependent variable. Among the several covariates available, only treatment and chemotherapy cycle number were found to be significant.
We fitted eight models for the overall quality of life v ij for patient i at chemotherapy cycle j: where t i is the treatment arm of patient i, s(j) is a smooth term that depends on the cycle number j, t i * s(j) = t i + s(j) + s(j|t i ), s(j|t i ) is a smooth term that depends on cycle number j conditional on the treatment arm, b i are random effects sampled from N (0, σ 2 ) and the error on the latent scale follows the logistic distribution.
In the following, we show how to use smoothers stratified by a categorical variable, and to do so we study the effect of body-surface area (BSA), treatment and cycle number on the patients' appetite, which, similarly to the overall quality of life, has been measured on a VAS.
Drug dosage in cancer chemotherapy is often based on BSA (Pinkel 1958). BSA can be computed according to a variety of formulae and several studies have analysed the distribution of the different BSA versions in cancer patients. According to Verbraecken, Van de Heyning, De Backer, and Van Gaal (2006) a reasonable value to distinguish female normal weight patients from female overweight patients is 1.68 m 2 . In the following, we study the effect of cycle number over appetite for subjects with a BSA lower or higher than 1.68 m 2 , according to the model:

cycle number
Effect on the latent scale Figure 9: Effect of cycle number on appetite stratified by BSA level, with 95% confidence intervals, as estimated in model9. Higher values correspond to lower scores on the continuous ordinal scale.
present case, as we are using the logistic distribution for F(·), we refer the reader to Section 3.1 and recall that the systematic component h (X) can be interpreted as the log-odds ratio to an observation with h (X) = 0, regardless of the value of v. This implies that, keeping the other covariates constant, the odds ratio for subjects who have been treated with the second treatment (CC) to those treated with the first treatment (IC) is exp(0.77094) = 2.1618. The appetite loss associated with the second treatment, relative to the first, is then quantified with a more than two-fold increase in the odds of having less rather than more appetite, independently of the level of appetite. The plot ( Figure 9) suggests that patients with low BSA lose their appetite, especially at the beginning of the treatment, with the effect peaking at around the 10th chemotherapy cycle. There is not a significant effect of chemotherapy on the appetite of patients with BSA over 1.68 m 2 .

Discussion
The package ordinalCont implements a regression framework for a response variable that is a recorded perception on a visual analog scale, of an underlying latent variable which is difficult or impossible to observe or measure. The model is an extension of the cumulative logistic ordinal regression model for discrete ordinal responses and is general in the sense of incorporating parametric and smoothing terms, as well as random effects, similarly to what is done in a generalized additive mixed model (GAMM). The recorded perception is mapped to the latent variable using a smooth function, and to avoid overfitting, the model parameters are estimated using constrained optimization of the penalized likelihood. The penalty parameters are automatically selected via maximization of their marginal likelihood.
A commonly used approach for VAS responses is to treat them as ratio variables; while this may in some cases deliver a similar analysis to our approach, this does not recognize the inherent non-ratio and nonlinear nature of the VAS measure. Another approach is to categorize the VAS responses, and analyze them as discrete ordinal responses. While this approach is not incorrect it involves loss of information; our model obviates the need for this aggregation of information.

Computational details
The results in the simulation were obtained on OS X using the reference BLAS libraries shipped with R. The results differ slightly when using a different BLAS, e.g., the vecLibbased BLAS.