More on Multidimensional Scaling and Unfolding in R : smacof Version 2

The smacof package offers a comprehensive implementation of multidimensional scaling (MDS) techniques in R . Since its first publication (De Leeuw and Mair 2009b) the functionality of the package has been enhanced, and several additional methods, features and utilities were added. Major updates include a complete re-implementation of multidimensional unfolding allowing for monotone dissimilarity transformations, including row-conditional, circular, and external unfolding. Additionally, the constrained MDS implementation was extended in terms of optimal scaling of the external variables. Further package additions include various tools and functions for goodness-of-fit assessment, unidimensional scaling, gravity MDS, asymmetric MDS, Procrustes, and MDS biplots. All these new package functionalities are illustrated using a variety of real-life applications.


Introduction
is a technique that represents proximities among objects as distances among points in a low-dimensional space. Multidimensional unfolding (Coombs 1964;Busing, Groenen, and Heiser 2005;Borg and Groenen 2005) is a related technique that represents input preference data as distances (among individuals and objects) in a low-dimensional space. Nowadays, MDS as well as unfolding problems are typically solved through numeric optimization. The state-of-the-art approach is called SMACOF (Stress Majorization of a Complicated Function; De Leeuw 1977) 1 and provides the user with a great amount of flexibility for specifying

Purpose
Function names Goodness-of-fit icExplore, randomstress, permtest Stability jackmds, bootmds, confEllipse Constrained MDS smacofConstraint (arguments type, constraint.type) Unfolding unfolding (arguments type, conditionality, circle, fixed, fixed.coord), vmu MDS variants uniscale, gravity, driftVectors, Procrustes Plots biplotmds Utilities sim2diss, stress0 MDS and unfolding variants. Since the first publication of the smacof package in R by De Leeuw and Mair (2009b), several additional MDS and unfolding approaches as well as various extensions and utility functions have been implemented, as presented in this article. We keep our elaborations fairly applied since the core technical details were already provided in the original publication.
The first part of this paper gives the reader the key ingredients of MDS, with a special focus on newly implemented dissimilarity transformation functions. This is followed by a section on MDS goodness-of-fit assessment, including various ways of assessing the stability of a solution, and a section on MDS biplots. The incorporation of optimal scaling on the external variables, as presented in a subsequent section, makes MDS an attractive tool for confirmatory research. What follows next is a detailed presentation of the recently implemented unfolding function, which adds great amounts of flexibility in model specification as compared to the original implementation. Finally, several smaller additions such as Procrustes transformation, asymmetric MDS, gravity MDS, and unidimensional scaling are presented. Table 1 gives an overview of these developments. Related R packages are mentioned in the respective sections.

SMACOF in a nutshell
MDS takes a symmetric dissimilarity matrix ∆ of dimension n×n with non-negative elements δ ij as input. These dissimilarities can be either directly observed (e.g., in an experimental setting a participant has to rate similarities between pairs of stimuli) or derived (e.g., by applying a proximity measure on a multivariate data frame). If the data are collected or derived as similarities s ij , the sim2diss function supports users to convert them into dissimilarities δ ij . Corresponding conversion formulas are given in Table 2. Additional technical details on various conversions can be found in Shepard (1957), Gower and Legendre (1986), Ramsay (1997), Esposito, Malerba, Tamma, and Bock (2000), Fleiss, Levin, and Paik (2003), Heiser and Busing (2004), and Keshavarzi, Dehghan, and Mashinchi (2009). The resulting matrix ∆ can then be passed to the respective MDS functions.
We begin with w ij which denotes a non-negative a priori weight for δ ij . By default, w ij = 1. If a δ ij is missing, all functions in smacof set the corresponding w ij = 0 such that these entries are blanked out from optimization. Solving the stress function results in an n × p matrix X of point coordinates located in a p-dimensional space (p fixed a priori) with Euclidean distances Thed ij 's are the disparities (also called d-hats), collected in the n × n matrixD. Disparities are optimally scaled dissimilarities. That is, a transformation admissible on the assumed scale level ("measurement levels as functions"; see, e.g., Jacoby 1999) is applied. The first smacof package incarnation offered only two specification options: metric or non-metric. The new package version implements the following bundle of transformation functions (ordered from most restrictive to least restrictive): • Ratio MDS:d ij = bδ ij .
• Monotone spline MDS:d ij = f (δ ij ) where f is an I-spline (integrated spline) transformation (Ramsay 1988) with fixed number of knots and spline degree.
• Ordinal MDS:d ij = f (δ ij ) where f is a monotone step function. Approaches for tie handling (i.e., in case of δ ij = δ i ′ j ′ ) are the following: -Primary approach ("break ties"): does not require thatd ij =d i ′ j ′ .
-Tertiary approach: requires that the means of the tie blocks are in the correct order.
Since dissimilarities are non-negative, these monotone transformations impose non-negativity on the disparities as well.
In order to make stress scale-free, it needs to be normalized either implicitly or explicitly. SMACOF uses an explicit normalization using the constraint i<j w ijd 2 ij = n(n − 1)/2. This results in the normalized stress expression Kruskal (1964) proposed an implicit stress normalization called "stress-1": In the MDS literature, many experiments and other MDS software in mainstream statistical packages have been using stress-1. Fortunately, there exists a simple relation between σ n and σ 1 , as shown in detail in Borg and Groenen (2005, Chapter 11). They prove that at a local minimum X * σ 1 (D, X * ) = σ n (D, X * ).
Therefore, without loss of generality, we report stress-1 in all MDS functions implemented in smacof 2 . To illustrate MDS with different types of transformation functions we use a simple dataset from Guttman (1965). The data consist of an 8 × 8 matrix containing correlations of eight items in an intelligence test. First, we need to convert these similarities into dissimilarities, as all smacof functions operate on dissimilarities. Second, we fit four MDS versions and report the corresponding stress values.  The variability in the stress values across the different transformations is due to the differing amounts of flexibility provided by each of the transformations. Figure 1 shows the Shepard diagrams involving four different transformation functions. These diagrams plot the observed dissimilarities δ ij against the fitted distances d ij (X), and map the disparitiesd ij into the point cloud .
The option to apply various dissimilarity transformations is one of the advantages of the SMACOF framework compared to classical scaling (Torgerson 1952) as implemented in stats' cmdscale. In smacof, these transformation functions are now also available for all kinds of three-way MDS models (indscal and idioscal functions), as well as for confirmatory MDS and unfolding, as described further below. Mair, Borg, and Rusch (2016) give an extensive treatment of how to assess goodness-of-fit in MDS. Here we present some recently implemented utility functions that support users with this task.

Configuration starting values
Optimizing the stress function in Equation 1 through majorization leads to local minima problems since the stress surface is generally bumpy. By default, smacof uses a classical scaling solution to support the algorithm with a reasonable starting configuration. This is not necessarily the best choice because it does not always lead to the lowest stress value. A common heuristic strategy is to try out several random starting configurations, and pick the fit with the lowest stress value.
To illustrate this approach, we use one of the classical MDS textbook datasets from Wish (1971), containing similarity ratings for 12 countries, for which we fit a ratio MDS. The first line converts the input similarities into dissimilarities by subtracting each s ij from 7 (cf. last row in Table 2).
R> WishD <-sim2diss(wish, method = 7) R> fitWish <-mds(WishD) This leads to a stress value of 0.2185. Now we fit 100 additional ratio MDS models based on different random starts, and report the lowest stress value.
R> set.seed(123) R> stressvec <-rep(NA, 100) R> fitbest <-mds(WishD, init = "random") R> stressvec[1] <-fitbest$stress R> for(i in 2:100) { + fitran <-mds(WishD, init = "random") + stressvec[i] <-fitran$stress + if (fitran$stress < fitbest$stress) fitbest <-fitran + } R> round(fitbest$stress, 4) This solution leads to a slightly lower stress value than the one obtained with a classical scaling start. From a purely statistical point of view the user would normally decide to go with this solution. However, from a more substantive perspective, interpretability plays an important role. For instance, there might be a solution with a reasonably low stress value (but not the lowest) which leads to better interpretability. This issue is studied in detail in Borg and Mair (2017) who propose the following strategy (p. 21-22): 1. Run an MDS analysis with a set of different initial configurations (e.g., using many random configurations).

Save all resulting MDS solutions and their stress values.
3. Use Procrustean fitting (see Section 7.4) to eliminate all meaningless differences (i.e., differences not driven by the data) among the MDS solutions.
4. Compute the similarity of each pair of MDS configurations.
5. Analyze the similarity structure of the MDS configurations with two-dimensional MDS (to visualize the similarity structure) or cluster analysis (to identify types of MDS configurations).
6. For each type of MDS configuration with a reasonably low stress value, plot one prototypical MDS solution and check its interpretability.
7. Pick the MDS solution that is acceptable in terms of stress value and gives the best interpretation.
These steps to explore initial configurations are implemented in the icExplore function. Again, we fit 100 ratio MDS models with random starts and save all fitted MDS objects (returnfit argument).
R> set.seed(123) R> icWish <-icExplore(WishD, nrep = 100, returnfit = TRUE) R> plot(icWish, main = "IC Plot Wish") Figure 2 shows the configuration plot of the 100 MDS solutions based on random starts (cf. Step 5). 3 The larger the size of the label, the larger the stress value and, therefore, the worse the fit of the solution. Based on this plot the user can extract various solutions that fit satisfactorily, plot the configurations, and interpret the solutions.

Stress norms and permutation tests
Regarding stress-1 values (on a percentage scale), Kruskal (1964, p. 3) says that "our experience with experimental and synthetic data suggests the following verbal evaluation: 20% poor, 10% fair, 5% good, 2.5% excellent, 0% perfect". In subsequent years, these rules of thumb have been applied in a somewhat mechanical manner. This is problematic for various reasons (see Mair et al. 2016;Borg, Groenen, and Mair 2018); one of which is that the stress value depends on n, as is obvious in Equation 1: the larger n, the larger the stress value 4 .
This issue was recognized in the early days of MDS. Throughout the 1970s various researchers have studied this phenomenon by means of Monte Carlo simulations within the context of ordinal MDS (see Spence and Young 1978, for an overview). These studies lead to the concept of stress norms. The idea is to create random dissimilarities (e.g., by drawing from a uniform U (0, 1) distribution) for a given n and p. For each random draw an MDS solution is fitted. Subsequently, the average stress value and the standard deviation can be computed.
A corresponding implementation is provided by the function randomstress which allows users to not only derive ordinal MDS norms, but also to obtain stress norms for other types of MDS from Section 2. As an example, we use a dataset from Lawler (1967) who studied the performance of managers. There are three traits (T1 = quality of output, T2 = ability to generate output, T3 = demonstrated effort to perform), and three methods (M1 = rating by superior, M2 = peer rating, M3 = self-rating). We start the stress norm analysis by fitting a 2D ratio MDS model: In our example the stress value of 0.241 from the original MDS fit is above this cutoff. This suggests a "non-significant" result which implies that the 2D ratio MDS solution does not fit satisfactorily.
There are several issues associated with such random stress norms. First, as Spence and Ogilvie (1973) point out, the dispersion of the random stress norms is in general very small. In most practical applications the strategy applied above leads to "significant" results; our example is somewhat of a rare exception. Second, apart from n and p, other circumstances such as the error in the data, missing values, as well as ties affect the stress (Mair et al. 2016;Borg and Groenen 2005). Third, the benchmark is based on completely random configurations. Real-life data almost always have some sort of structure in it such that the random stress strategy leads to "significant" results in most cases.
Instead of generating random dissimilarities, permutation tests can be used, as formalized in Mair et al. (2016). They lead to "sharper" tests than random null configurations. There are two scenarios for setting up a permutation scheme. First, in the case of directly observed dissimilarities the elements in ∆ can be permuted. For each permutation sample an MDS model of choice is fitted. By doing this many times it results in a null distribution of stress values. Second, for derived dissimilarities, Mair et al. (2016) propose a strategy for systematic column-wise permutations (one variable at a time). This permutation scheme gives a more informative null distribution compared to full column-wise permutations. For each permutation sample a dissimilarity matrix is computed, and an MDS fitted. Again, this gives a stress distribution under the H 0 of little departure from complete exchangeability of dissimilarities in the data-generating process.
Let us illustrate both permutation scenarios. For directly observed dissimilarities we continue with the Lawler example from above (500 permutations): We cannot reject the H 0 of "stress/configuration are obtained from a random permutation of dissimilarities". For the derived dissimilarity situation we use a dataset from McNally, Robinaugh, Wu, Wang, Deserno, and Borsboom (2015) which is included in the MPsychoR package (Mair 2020). It includes 17 posttraumatic stress disorder (PTSD) symptoms reported by survivors of the Wenchuan earthquake in 2008, scaled on a 5-point rating scale. We use the Euclidean distance as (derived) dissimilarity measure and compute an interval MDS. This leads to the following stress value: R> library("MPsychoR") R> data("Wenchuan", package = "MPsychoR") R> Wdelta <-dist(t(Wenchuan)) R> fitWen <-mds(Wdelta, type = "interval") R> round(fitWen$stress, 3) [1] 0.184 In the subsequent permtest call we provide the raw input data through the data argument. This way the function knows that the permutations should be performed on the raw data rather than on ∆. We also need to tell the function which dissimilarity measure we used above before fitting the MDS. We perform 1000 replications.  Figure 3, obtained by calling plot(permWen), visualizes the results in two ways: the left panel shows the empirical cumulative distribution function (ECDF) of the permutation stress values, whereas the right panel shows the permutation stress histogram including the critical value (lower 5% quantile) and the observed stress value.
Note that such permutation strategies can be applied to unfolding models (see Section 6) as well (see Mair et al. 2016, for details).

Stability of a solution I: Jackknife
De Leeuw and Meulman (1986) developed a jackknife strategy for MDS in order to examine the stability of a solution. Their approach, implemented in the jackknife function, computes i = 1, . . . , n additional solutions with configurations X −i (object i being left out). Note that each X −i has row i missing and has therefore n − 1 rows in total. To make the X −i 's comparable, the location of the missing point is estimated by minimizing a least squares problem, and subsequently transformed using Procrustes (see Section 7.4) with X as target.
Let us denote the resulting configurations by X * −i , each of them of dimension n × p. From these configurations the average (centroid) jackknife solutionX * can be computed. Thus, we have n + 2 comparable configurations in total which can be represented in a single plot, as shown below.
De Leeuw and Meulman (1986) also introduced various measures related to the jackknife solution. The first one is a stability measure and is computed as follows: ST can be interpreted as the ratio of between and total variance. To measure the crossvalidity, that is, comparing the "predicted" configuration of object i as the i-th row inX * with the actual configuration (i-th row in X), can be used. Using these two normalized measures the dispersion around the original solution X can be simply expressed as The dataset we use to illustrate the jackknife MDS is from McNally, Mair, Mugno, and Riemann (2017), included in the MPsychoR package. Below we scale 16 depression symptoms reported by patients using the Quick Inventory of Depressive Symptomatology (QIDS-SR). We fit a 2D ordinal MDS on the Euclidean distance input matrix, subject to an MDS jackknife. The print output shows the jackknife measures reported above. Figure 4 shows the jackknife MDS plot. The points are placed at X (MDS configuration). The centers of the stars denote the jackknife centroids, the rays the n − 1 jackknife solutions. This result suggests that the solution is very stable.
Further options for using jackknife in MDS are presented in Vera (2017) where the distances are subject to stability analysis.

Stability of a solution II: Bootstrap
Bootstrap approaches for stability assessment in MDS were proposed by Meulman and Heiser (1983), Heiser and Meulman (1983), Weinberg, Carroll, and Cohen (1984), and further refined by Jacoby and Armstrong (2014). The smacof implementation resamples the original data and therefore works for derived dissimilarities only. The confidence ellipsoids are computed as follows. Let x i denote the row coordinates of object i from the original configuration X. Let S i be the p × p covariance matrix of the bootstrapped solutions of object i. The 100(1 − α)% confidence ellipsoid for object i is determined by the points z j for which where χ 2 (α; p) is the α-quantile of the χ 2 -distribution with df = p. In R, this computation can be easily achieved using the ellipse package (Murdoch and Chow 2020). As a stability measure, we can use a slight modification of Equation 5: N denotes the number of bootstrap replications, X * l the configuration of the l-th replication, X * the bootstrap centroid configuration. Again, ST reflects a between/total variance ratio and can be used to compare various MDS solutions against each other . For instance, one could compare an unrestricted solution with a restricted solution (see Section 5). The larger ST , the more stable the solution.
Let us apply the corresponding bootmds function on the depression data from above. We use N = 500 bootstrap replications.
R> set.seed(123) R> bootRogers <-bootmds(fitRogers, RogersSub, method.dat = "euclidean", + nrep = 500) R> bootRogers Call: bootmds.smacofB(object = fitRogers, data = RogersSub, method.dat = "euclidean", nrep = 500) In addition to the stability coefficient, the function also reports the stress averaged across bootstrap samples, including the 95% confidence interval (bootstrap percentile).  Figure 5 shows the resulting bootstrap configuration with the confidence ellipsoids. There is a fair amount of instability associated with the sleep-onset insomnia item (labeled "onset"). Ramsay (1977Ramsay ( , 1982 incorporated MDS into a parametric (log-)normal framework with maximum-likelihood estimation. This idea makes it easy to derive confidence ellipsoids around the points in the configuration. De Leeuw (2019) achieved similar ellipsoids without any distributional assumptions, based on taking the second derivatives of the stress. This approach works for symmetric MDS solutions as well as for individual difference models (INDSCAL, IDIOSCAL) of arbitrary dimensions. In its current form, its implementation is limited to ratio transformations. Expressions for the stress derivatives can be found in the corresponding paper.

Stability of a solution III: Pseudo-confidence ellipses
Let us use the same dataset as above and fit a ratio MDS. The confEllipse function computes the stress derivatives and subsequently the confidence ellipsoids.
R> fitRogers2 <-mds(RogersD) R> confRogers <-confEllipse(fitRogers2) The following plot function takes this object and produces the configuration plot with the ellipsoids. Of importance is the eps argument which we set to 0.01 below. This value implies that we look at a perturbation region where the stress value is at most 1% larger than the local minimum we have found. Figure 6 shows the corresponding configuration plot.
R> plot(confRogers, eps = 0.01, ylim = c(-0.11, 0.11), + ell = list(lty = 1, col = "gray")) Note that the scales along the axes differ from the ones in Figures 5 and 6 (apart from the fact that ratio MDS is used). This is because the SMACOF engine for estimating pseudoconfidence ellipsoids normalizes the coordinates differently (see De Leeuw 2019, for details). Also, the shape differences in the confidence ellipsoids are due to different methods used to construct the ellipsoids.

MDS biplots
Biplots were developed within the context of principal component analysis (PCA; Gabriel 1971). In a PCA biplot the loading vectors are mapped on top of the scatterplot of the principal component scores. However, the concept of biplots can be applied to other multivariate techniques as well, as elaborated in Greenacre (2010), Gower, Lubbe, andLe Roux (2011), andMair (2018). In MDS, biplots are often used to map external variables onto the MDS configuration. Such covariates allow users to explore meaningful directions in the MDS space rather than trying to interpret the dimensions directly. Note that Rabinowitz (1975) was one of the first to suggest embedding axes representing external variables into MDS solutions in order to facilitate substantive interpretations.
Let Y be a n × q matrix with q external variables in the columns, each of them centered and optionally standardized (the latter simply changes the length of the biplot vector, not its direction). To produce an MDS biplot, the following multivariate regression problem needs to be solved: where B is a p × q containing p regression coefficients for each of the q variables, and E is the n × q matrix of errors. The corresponding OLS estimatesB = (X ⊤ X) −1 X ⊤ Y give the coordinates of the external variables in the MDS space. The smacof package provides the biplotmds function which performs the regression fit. By default, the external variables are standardized internally (default scale = TRUE; scale = FALSE does centering only).
Let us start with a simple example where we map a single metric variable onto a configuration. We use a dataset taken from Engen, Levy, and Schlosberg (1958) on facial expressions (see also Heiser and Meulman 1983). Participants had to rate proximities of 13 facial expressions, resulting in the dissimilarity matrix ∆. Rating scale values were collected by Abelson and Sermat (1962) for the dimensions "pleasant-unpleasant" (PU), "attention-rejection" (AR), and "tension-sleep" (TS).
We fit an ordinal MDS solution, and map the pleasant-unpleasant (PU) variable on top of the configuration. We present two biplot versions. First, we focus on the vector representation.
Second, we use the axis representation for which the calibrate package (Graffelman 2020) turns out to be helpful. We start by computing the regression coefficients based on the centered external variable. In order to make sure that the ticks on the biplot axis correspond to the original scale, some additional preliminary lines are needed.
Let us move on with a second, more complex example involving multiple external variables which reproduces part of the analysis presented in Mair (2018). We use the mental states dataset from Tamir, Thornton, Contreras, and Mitchell (2016) who, for each individual, collected a dissimilarity matrix involving 60 mental states, derived from functional magnetic resonance imaging (fMRI) scans. The data are included in the MPsychoR package. We average across the individuals, which leads to a single 60 × 60 dissimilarity matrix, subject to a 2D monotone spline MDS. After the biplot computations, we print out the R 2 values from the individual regression fits.
R> data("NeuralActivity") R> data("NeuralScales") R> NeuralD <-Reduce("+", NeuralActivity)/length(NeuralActivity) R> fitNeural <-mds(NeuralD, type = "mspline") R> biNeural <-biplotmds(fitNeural, The vector version of the MDS biplot is given in Figure 8. The longer a covariate vector, the larger the corresponding R 2 . That is, the more accurate the corresponding axis projections are in relation to the raw data. The orientation of the vectors reflects the correlation patterns among the external variables, assuming the plot gives an accurate representation of the data (of course, we lose information here due to projecting into a low-dimensional space). Other options such as nonlinear MDS biplots are presented in Gower et al. (2011, Chapter 5), including corresponding R code.

MDS with optimal scaling on external predictors
Another advantage of the SMACOF framework compared to classical MDS is the option to fit restricted MDS variants. There are two basic strategies to constrain an MDS configuration. The first option involves internal constraints where the points are forced to be located on a geometric shape. For the particular case of a sphere this can be achieved using smacofSphere. The corresponding theory was described in De Leeuw and Mair (2009b). The only spherical update since the original publication has been the incorporation of various types of dissimilarity transformations in smacofSphere.
Here we focus on a second strategy, that is, imposing external constraints on the configuration in the tradition of De Leeuw and Heiser (1980), Borg and Lingoes (1980), and Heiser and Meulman (1983). The simplest form of such a restriction is a linear restriction directly incorporated into the stress formula given in (1). Z is a known covariate matrix of dimension n × q with number of covariates q ≥ p. C is a q × p matrix of regression weights to be estimated, subject to potential additional restrictions, as outlined below.
For practical purposes, however, this basic implementation is of limited use. For instance, specifying a 2 × 2 ANOVA design in Z collapses point coordinates to only four points in a 2D configuration. What makes the external restriction concept attractive in practice is to apply an additional optimal scaling step on the external scales within each majorization iteration.
Each predictor variable z 1 , . . . , z q is subject to an optimal scaling transformation. A popular option is to scale these vectors in an ordinal way (i.e., using monotone regression). Other transformations such as interval or splines (with or without monotonicity constraints) are implemented in smacof as well. Note that, from a specification point of view, these external variable transformations are unrelated to the dissimilarity transformations introduced in Section 2.
Let us illustrate such a constrained MDS using the face expression data from Section 4. We include the two external variables "pleasant-unpleasant" (PU) and "tension-sleep" (TS). They constitute the matrix Z. We restrict C to be diagonal, which performs dimensional weighting. Note that for this diagonal restriction the number of dimensions is determined by the number of covariates (i.e., q = p), since each covariate defines an axis (dimension). We also use the configuration from an unrestricted ordinal fit as initial configuration. It is important that the user provides a reasonable starting configuration for the constrained MDS computation; using one from an unrestricted fit is in general a good option.
Let us start with the first constrained MDS model: ordinal dissimilarity transformation of ∆, interval transformed external variables in Z, diagonal regression weights restriction in C.
The resulting MDS configuration is given in Figure 9. Using the calibrate package the axes of the external variables (original scales) can be added (see supplemental code materials). These axes are a simple form of biplot axes, resulting from the diagonal restriction in C. For this interval transformed solution the observed values in Z can be directly read from the PU and TS axes; the configuration coordinates reproduce these values exactly.
In a second fit we relax the interval transformation of Z in terms of an ordinal transformation. C is still kept diagonal.
R> fitFaceC2 <-smacofConstraint(FaceExp, type = "ordinal", + constraint = "diagonal", external = Z, constraint.type = "ordinal", + init = fitFace$conf) R> round(fitFaceC2$C, 3)  Due to the less restrictive nature of this specification this solution has a lower stress value (0.159) than the interval transformed solution from above. Figure 10 gives some insight into the ordinal transformations performed internally on each column of Z. Figure 11 shows the configuration with the transformed axes on top and to the right. Again, the points can be projected onto these axes. The corresponding values match the ones inẐ. For the next constrained MDS variant we use all three external variables in the dataset (i.e., PU, AR, and TS). As q > p we need to relax the diagonal restriction in C: we keep C unrestricted and use once more an interval transformation of Z.
R> Z <-FaceScale R> fitFaceC3 <-smacofConstraint(FaceExp, type = "ordinal", + constraint = "unrestricted", external = Z, constraint.type = "interval", + init = fitFace$conf) R> round(fitFaceC3$C, 3) Again, the three biplot axes can be mapped onto the configuration using the calibrate package, after computing the regressions Z = XB with Z column-centered (see supplemental materials for the entire code chunk). the axes are concerned, the biplot suggests that PS and TU are almost orthogonal, whereas the predictions TS and AR are highly correlated in this 2D space. Dimension 2 lines up with the AR axis which is useful for the interpretation of the configuration.
A general dimensional interpretation on the basis of the external variables no longer holds since C is not diagonal: the solution is rotated/reflected followed by dimensional stretching. By applying an SVD on C the user can get the rotation matrices and the dimension stretching values (see Borg and Groenen 2005, p. 232, for details).

Unfolding
As mentioned in the introduction, one of the major updates since the first publication of the package was a complete re-implementation of the unfolding function. This update gives the user the possibility to apply the usual transformations on the dissimilarities, to incorporate circular restrictions, and to fit row-conditional and external unfolding models.

Unfolding theory
Unfolding takes a rectangular dissimilarity matrix ∆ of dimension n × m with elements δ ij (i = 1, . . . , n and j = 1, . . . , m) as input. Such data are most often rankings or ratings. The smacof Version 2: Multidimensional Scaling and Unfolding in R stress function from Equation 1 changes to with the fitted Euclidean distances (p-dimensional space) expressed as X 1 is an n × p matrix (row configuration), X 2 an m × p matrix (column configuration), and D the n × m matrix of disparities. Again, the weights w ij and the dissimilarities δ ij must be non-negative.
In terms of stress normalization, Borg and Groenen (2005, Section 11.1) argue that one could find an optimal dilation factor that multiplies both the row and column coordinates by the same constant, leading to This expression provides a short cut to compute the stress-1 value, given that we allow for an optimal dilation constant. At the same time it is a trick for interpretation in terms of the well known stress-1 value after all the majorization computations are done. Details on the majorization approach in the case of ratio transformations are given in (De Leeuw and Mair 2009b). Below we elaborate on a modification that is able to handle general monotone dissimilarity transformations from Section 2.

Transformation functions
Early versions of non-metric multidimensional unfolding (i.e., ordinal transformation) are described in Coombs (1964). Busing et al. (2005) elaborate in detail on the challenges of including transformation functions in unfolding with respect to optimization. One major problem is degeneracy of the solution due to equal disparities. They suggest to penalize the stress function by the coefficient of variation, which moves the algorithm away from solutions with small variation in the fitted distances. The corresponding badness-of-fit target is called p-stress: The coefficient of variation ν(D) is calculated on the basis of the disparities and enters the penalty term as follows: .
Obviously this penalty term acts as a multiplicative factor in Equation 16. As ν(D) decreases, the p-stress penalization increases. There are two tuning parameters involved in this p-stress setup: • λ ∈ (0; 1] is a lack-of-penalty parameter that controls the influence of penalty term: the larger λ, the smaller the penalty influence. • ω acts as range parameter in the penalty term: for a small ω the penalty is especially effective if ν(D) is small. Busing et al. (2005) did an extensive simulation study in order to provide suggestions on how to fix the tuning parameters. For conditional unfolding, it is suggested to set λ = 0.5, and ω = 1 (default settings in unfolding) 5 . For unconditional unfolding, they suggest that one uses λ = 0.5 and ω > 0.1. Further details can be found in the corresponding publication.
The p-stress target can be minimized using majorization, for which the details are again given in Busing et al. (2005). From a practical point of view, after obtaining a p-stress optimized solution, users can consider the stress-1 from Equation 15 as goodness-of-fit index 6 . Note that all the dissimilarity transformation functions from MDS (i.e., ratio, interval, ordinal, spline; cf. Section 2) are implemented for unfolding as well.
Let us illustrate an example of an ordinal unfolding solution. We use a dataset from Dabic and Hatzinger (2009), available in the prefmod package (Hatzinger and Dittrich 2012), where individuals were asked to configure a car according to their preferences. They could choose freely from several modules such as exterior and interior design, technical equipment, brand, price, and producing country. We use only the first 100 individuals in this analysis. Since not all individuals ranked all objects, we have the situation of "partial rankings". The unfolding function specifies a proper weight matrix W automatically: w ij = 0 if δ ij is missing; w ij = 1 otherwise. This way, the corresponding missing dissimilarities are blanked out from the optimization. For the ordinal unfolding model we are going to fit, this weight matrix can be extracted using unf_ord$weightmat.  This call prints out the stress-1 value as well as the final p-stress value. The configuration plot and Shepard diagram shown in Figure 13 can be produced as follows: R> plot(unf_ord, main = "Unfolding Configuration Car Preferences", + ylim = c(-1, 1)) R> plot(unf_ord, plot.type = "Shepard", + main = "Shepard Diagram Car Preferences") The Shepard diagram shows the ordinal transformation of the input dissimilarities, whereas the configuration plot maps the row and column coordinates into a joint space, which makes distances between any pair of points interpretable.

Row-conditional unfolding
The solution above is an unconditional (also called matrix-conditional) unfolding solution since a single transformation function is estimated that applies to all individuals. The ranks are compared unconditionally which, in some situations, is a fairly restrictive assumption. For instance, in our example it might be the case that individual i is quite indifferent to all car characteristics, but still ranks them. Individual i ′ , however, could have strong preferences but might end up with the same ranking as individual i. Treating these ranks as equal is a strong assumption. Similarly, for rating data in ∆, individual i's rating of, for instance, 2 is assumed to be equal to any other individual's rating of 2.
Row-conditional unfolding relaxes this single transformation assumption by estimating separate transformation functions for each individual (i.e., for each row in ∆). Technically, what changes with respect to the p-stress expression in Equation 16 is the penalty term. Busing et al. (2005) suggest using the harmonic mean for row-wise aggregation of the penalty components which modifies Equation 17 to Thed i 's are the row vectors inD. The raw stress term in Equation 16 remains unadjusted since it is additive over the rows.
Let us fit a row-conditional version of the ordinal unfolding on the car characteristics data. We use the final configuration obtained above as starting configuration. Note that for running time purposes we set a slightly more generous convergence boundary ε than the default 7 . In general, we recommend to increase the number of iterations using the itmax argument, if needed. For a reasonably large sample size it can take a while for the algorithm to converge. A parallelized fit can be evoked through the parallelize argument.

External unfolding
External unfolding uses fixed coordinates for either the rows or the columns. Such fixed coordinates might come from a previous analysis, or might be based on an underlying theory. Early external unfolding references are Carroll (1972), Srinivasan and Shocker (1973), Rodgers andYoung (1981), andDeSarbo andRao (1984). Within each iteration either the row coordinates in X 1 (in case of fixed coordinates denoted as F 1 ), or the column coordinates in X 2 (in case of fixed coordinates denoted as F 2 ) need to be constrained and scaled. The scaling factors for fixed rows and fixed columns, respectively, are Based on these scaling factors the updated coordinates are X 1 := s 1 F 1 in the case of row restrictions, or X 2 := s 2 F 2 in the case of column restrictions. Using this adjustment the new coordinates are properly scaled with respect to the unconstrained column/row coordinates, while maintaining the specified shape constraints. To illustrate, we use a dataset from Borg, Bardi, and Schwartz (2017). We focus on the Portrait Value Questionnaire (PVQ) portion of the data which result from a questionnaire of 40 items assessing how persons rate the personal importance of ten basic values: power (PO), achievement (AC), hedonism (HE), stimulation (ST), self-direction (SD), universalism (UN), benevolence (BE), tradition (TR), conformity (CO), security (SE) on a scale from 0 to 6. We use an aggregated version where the item scores belonging to the same psychological value are averaged. As fixed coordinates we use the following value circle coordinates: R> tuv <-matrix(NA, nrow = ncol(PVQ40agg), ncol = 2) R> alpha <--360/10 R> for (i in 1:10){ + alpha <-alpha+360/10 + tuv[i,1] <-cos(alpha*pi/180) + tuv[i,2] <-sin(alpha*pi/180) + } This specification is different from spherical unfolding introduced below, as we fix the value coordinates on the circle (equidistant) instead of just forcing them to be aligned on a circle. Of course, in external unfolding we can specify any arbitrarily fixed configuration; it does not have to be a circle.
Below we fit two solutions: an unconstrained ordinal unfolding solution, and a constrained ordinal unfolding solution with fixed circular column coordinates. Since smaller responses in the PVQ data reflect larger dissimilarities, we reverse the category scores.

Spherical unfolding
Sometimes it is of interest to restrict either row coordinates or column coordinates to be on a geometric shape such as a sphere. Technically, this implies that within each iteration the row (or column) coordinates have to be spherically restricted. Let us elaborate on this restriction for the row coordinates in X := X 1 for p = 2 (for the column coordinates in X 2 it works in an analogous fashion). Each row vector x i can be expressed in polar coordinates (see Cox and Cox 1991): with θ i as the corresponding angle and r i as the radius. We aim to find a circular restricted configuration X c for which the row vectors have polar coordinates We have a single radius r and the corresponding angle θ c,i . To compute X c , we want to minimize the quadratic part of the majorizing function: r × r i (cos(θ c,i ) cos(θ i ) + sin(θ c,i ) sin(θ i )).
Substituting the optimal θ c,i = θ i gives Setting the first derivative equal to zero yields the update This simple expression gives us the optimal circular projection of the row coordinates in X.
As mentioned above, the same steps can be carried out for the column coordinates (X := X 2 ; replace i by j, and n by m in these equations).
To illustrate an unfolding solution where we restrict the column coordinates to be on a circle, we use once more a dataset from  which builds on the Schwartz (1992) value circle theory. The data are derived from the Schwartz Value Survey (SVS). They were centered (row-wise) and converted from preferences into dissimilarities, hence representing a rectangular dissimilarity matrix ∆ with 327 persons and 10 variables referring to Schwartz' psychological values: power, achievement, hedonism, stimulation, self-direction, universalism, benevolence, tradition, conformity, and security. We fit two (ratio) unfolding solutions: an unrestricted one as well as one with circular restrictions on the column coordinates (values): R> unf_vals <-unfolding(indvalues) R> unf_valsc <-unfolding(indvalues, circle = "column") Comparing the stress-1 values we get 0.171 for the unrestricted solution, and 0.179 for the restricted solution. This suggests that the circular solution is basically as good as the unrestricted one.
The reason for this becomes obvious when looking at the configuration plots in Figure 16. The unrestricted solution in the left panel suggests that the personal values approximate a circle, as suggested by Schwartz' value theory. The configuration in the right panel results from forcing the psychological values to be arranged on a circle. Tucker (1960) propsed the vector model of unfolding (VMU) which Carroll (1972) later called MDPREF. It is basically a principal component analysis (PCA) on the transposed input similarity matrix P. After a singular value decomposition P ⊤ ≈ UΣV ⊤ for given dimensionality p, the row coordinates are obtained by X 1 = √ m − 1UΣ, and the column coordinates by X 2 = (m − 1) −1/2 V. We apply the VMU on the unreversed PVQ data from above, since the inputs have to be similarities. Note that, by default, the vmu function does an internal row-wise centering of the data. The results can be visualized using a biplot, where the row scores are represented as preference vectors (see Figure 17).

Unidimensional scaling
Unidimensional scaling can be applied in situations where one has a strong reason to believe that there is only one underlying dimension, such as time, ability, or preference. Even though unidimensional scaling can be considered as a special case of MDS, it is generally discussed separately  since the local minimum problem becomes serious if mds is used with ndim = 1. The smacof package provides a simple implementation where all possible n! dissimilarity permutations are considered for scaling, and the one which leads to a minimal stress value is returned. Obviously, this strategy is applicable only to a small number of objects, say less than 10 objects 8 .
In the following example we examine seven works by Plato, map them on a single dimension, and explore to which degree the mapping reflects the chronological order by which they are written. The input dissimilarities are derived according to the following strategy: Cox and Brandwood (1959) extracted the last five syllables of each sentence; each syllable is classified as long or short which gives 32 types; and based on this classification a percentage distribution across the 32 scenarios for each of the seven works can be computed, subject to a Euclidean distance computation.
R> PlatoD <-dist(t(Plato7)) R> fitPlato <-uniscale(PlatoD, verbose = FALSE) R> round(sort(fitPlato$conf), 3) The last line prints the 1D "time" configuration of Plato's works that lead to the lowest stress value. Note that the exact chronological order of Plato's works is unknown; scholars only know that "Republic" was the first work, and "Laws" his last one. Copleston (1949, p. 140) suggests the following temporal order of these selected seven works: Republic, Sophist, Politicus, Philebus, Timaeus, Critias, Laws. Obviously, our unidimensional scaling model advocates a different chronological order.

Gravity model
The idea of the gravity model goes back to Newton's law of universal gravitation where he states that force is proportional to the product of the two masses, and inversely proportional to the square of the distance between them. Haynes and Fotheringham (1984) present a wide range of statistical applications of this gravity concept with special emphasis on spatial interactions. Within an MDS context, the gravity model turns out to be especially useful for text co-occurrence data (Mair, Rusch, and Hornik 2014;Borg et al. 2018) in order to avoid that words with high co-occurrences dominate the solution. Note that the sim2diss function already includes a basic gravity computation (see Table 2). Here we present the gravity function, which does a very similar transformation, but is designed to take a document-term matrix (DTM) as input.
The first step is to binarize the DTM: if a word (columns) is mentioned at least once in a particular document (rows), the corresponding cell entry is 1, and 0 otherwise. Let Y denote this binarized DTM. From this simplified structure the word co-occurrence matrix C can be computed by C = Y ⊤ Y. Thus, only the information on whether words occur together or not is considered. C is a symmetric matrix (with elements c ij ) with co-occurrence frequencies in the off-diagonals. Let c i+ = j̸ =i c ij and c +j = i̸ =j c ij be the respective margins of C (diagonal blanked out). The gravity model defines the following dissimilarities (for i ̸ = j): To illustrate a gravity application on text data we use a DTM similar to the one presented in Mair et al. (2014). This DTM was created on the basis of statements of 254 Republican voters who had to complete the sentence "I am a Republican because...". First, let us create the gravity dissimilarities according to the strategy outlined above.

R> gopD <-gravity(GOPdtm)$gravdiss
Note that using text data, C is typically sparse (i.e., many elements c ij = 0). For these elements we cannot compute Equation (19) since we divide by 0. The function sets the corresponding entries to NA. In the subsequent MDS call, these elements are automatically blanked out by setting the corresponding weight w ij to 0 in the basic stress equation.
The larger a bubble, the larger the contribution of a particular object (here, word) to the total stress. Objects with large SPP values are responsible for misfit. The closer two words are in the configuration, the more frequently they have been mentioned together in a single statement.
An extension of this model is presented in Mair et al. (2014), who introduce the exponent λ in order to emphasize larger dissimilarities. The reason for this is that in text data we often end up with little variance in the input dissimilarities which leads to a concentric, circular representation of the configuration. Equation 19 changes to This extension is called power gravity model. The parameter λ needs to be chosen ad hoc and can take values from [−∞, ∞]. For λ < 1 we shrink large dissimilarities, for λ = 1 we end up with the ordinary gravity model, and for λ > 1 we stretch large dissimilarities. Note that there is a trade-off between the choice of λ and the stress value: the more structure we create, the higher the stress value. This extension is relevant for metric MDS strategies such as ratio, interval, or spline MDS. The λ parameter can be specified in the gravity function.
A recent work by Rusch, Mair, and Hornik (2021) embeds the gravity formulation into a more general loss function which, among other things, finds an optimal λ during the optimization process.

Asymmetric MDS
So far, except in unfolding, ∆ has been a symmetric matrix of input dissimilarities. In this section we aim to scale square asymmetric dissimilarity matrices. Young (1975), Collins (1987), Zielman and Heiser (1996), Borg and Groenen (2005, Chapter 23), and Bove and Okada (2018) present various asymmetric MDS variants of which smacof implements the drift vector model (Borg 1979). The starting point of this approach is to decompose the asymmetric dissimilarity matrix ∆ into a symmetric part M, and a skew-symmetric part N: with M = (∆ + ∆ ⊤ )/2, and N = (∆ − ∆ ⊤ )/2. In smacof, the symdecomp function can be used for this decomposition. Drift vector models display simultaneously the symmetric and the skew-symmetric part of ∆. They first fit an MDS (of any type) on the symmetric matrix M, resulting in the configuration X. The asymmetric information is then incorporated as follows (Borg and Groenen 2005, p. 503): • For each object pair i, j compute a ij = x i − x j which results in a vector of length p.
• Norm a ij to unit length, resulting in b ij = a ij / a ⊤ ij a ij .
• Incorporate the skew-symmetric part: c ij = n ij b ij with n ij as the corresponding element in N (drift vectors).
• For a given point i, average the elements in c ij : d i = n −1 j c ij (average drift vectors). • For plotting, compute the vector lengths of d i (root mean square of its elements, scaled by a factor of √ n/mean(M)), and the direction angle (relative to the y-axis) of To illustrate the drift vector model, we use the classical Morse code data by Rothkopf (1957). Rothkopf asked 598 subjects to judge whether two signals, presented acoustically one after another, were the same or not. The values are the average percentages for which the answer "Same!" was given in each combination of row stimulus i and column stimulus j, where either i or j was the first signal presented. The responses were aggregated to confusion rates and subsequently subtracted from 1, such that the values represent dissimilarities. The driftVector function performs the decomposition from Equation 21, fits an MDS of choice on M, and applies the drift vector computation steps outlined above. For the Morse code data, the resulting drift configuration plot based on a 2D ordinal MDS fit, is given in Figure 19.
R> morseDrift <-driftVectors(morse2, type = "ordinal") R> plot(morseDrift, main = "Drift Vectors Morse Codes", + col.drift = "darkgray") We see that the vectors tend to point in the bottom left direction; they are certainly not random. In the bottom left quadrant we mostly have longer signals suggesting that shorter signals are more often confused with longer ones than vice versa. Note that the plot function has a vecscale argument by which the user can modify the length of the drift vectors by a scaling factor.
Other approaches to scale asymmetric data implemented in R are the following. Vera and Rivera (2014) embed MDS into a structural equation modeling framework. Their approach is implemented in the semds package (Vera and Mair 2019). Zielman and Heiser (1993) developed a slide-vector approach which is implemented in asymmetry (Zielman 2021). Unfolding strategies from Section 6 can also be used for asymmetric dissimilarity matrices. An application can be found in Sagarra, Busing, Mar-Molinero, and Rialp (2018).

Procrustes
Sometimes it is of interest to compare multiple MDS configurations based on, for instance, different experimental conditions (the objects need to be the same within each condition). The idea of Procrustes (Hurley and Cattell 1962) is to remove "meaningless" configuration differences such as rotation, translation, and dilation (see Commandeur 1991, for an overview of Procrustean models). Note that Procrustes transformations do not change the fit (stress value) of an MDS.
In brief, Procrustes works as follows. Let X and Y be two MDS configuration matrices. X is the target configuration, and Y the configuration subject to Procrustes transformation leading to the transformed configuration matrixŶ. Further, let Z be a centering matrix (Z = I − n −1 11 ⊤ ). Procrustes involves the following steps: 1. Compute C = X ⊤ ZY.
The matrixŶ contains the Procrustes transformed configuration and replaces Y. The target configuration X andŶ can be plotted jointly and allows researchers to explore potential differences between the configurations.
The dataset we use to illustrate Procrustes is taken from Vaziri-Pashkam and Xu (2019). In their fMRI experiment on visual object representations they used both natural and artificial shape categories to study the activation of various brain regions (each object represents a particular brain region). We start with fitting two ordinal MDS solutions, one for each condition R> artD <-sim2diss(VaziriXu$artificialR) R> fitart <-mds(artD, type = "ordinal") R> realD <-sim2diss(VaziriXu$realR) R> fitnat <-mds(realD, type = "ordinal") By plotting the two configurations, Figure 20 suggests that these configurations are different. Let us apply a Procrustes transformation with the artificial condition as target configuration X, and the natural condition solution as testee configuration Y, subject to Procrustes transformation.
R> fitproc <-Procrustes(fitart$conf, fitnat$conf) R> fitproc The print output shows the rotation matrix, the dilation factor, and the translation vector (which is always 0 if two MDS configurations are involved, due to normalization constraints). In addition, it reports Tucker's congruence coefficient for judging the similarity of two configurations. This coefficient is derived from factor analysis and can be computed as follows: A few remarks regarding the congruence coefficient. First, it is generally recommended that one uses the congruence coefficient to judge configurational similarity and not the correlation coefficient, since correlating distances does not properly assess the similarity of configurations (see Borg and Groenen 2005, p. 439-440 for details). Second, there is actually no Procrustes transformation needed to compute the congruence coefficient, since c(X, Y) = c(X,Ŷ). Third, when applying (22)  As an alternative we can consider Guttman's alienation coefficient, which is simply This measure differentiates better between two solutions than the congruence coefficient. Figure 21 shows the Procrustes transformed solution (i.e., plotting X andŶ jointly) and suggests that the two configurations are actually very similar, apart from a few points. The user can request the (sorted) distances between each pair of testee and target points via fitproc$pairdist. V4, inferior IPS, IPS3, and V3B show the biggest differences across the two conditions.
R> plot(fitproc, legend = list(labels = c("artificial", "natural"))) Another option to apply Procrustes is to use a theoretical configuration as target. For instance, Borg and Leutner (1983) constructed rectangles on the basis of a grid design (as contained in rect_constr) which we use as target configuration. Participants had to rate similarity among rectangles within this grid. Based on these ratings a dissimilarity matrix was constructed, here subject to a 2D ordinal MDS solution. Within the context of theoretical solutions it is sometimes interesting to determine the stress value based on the dissimilarity matrix and an initial configuration (with 0 iterations). The stress0 function does the job.
R> stress0(rectangles, init = rect_constr) Now we fit an ordinal MDS model, using the theoretical rectangle alignment as starting value. The resulting MDS configuration is used as testee configuration Y, subject to Procrustes.
Note that Procrustes is not limited to MDS applications. It can be applied to any configuration matrices X and Y as long as the objects involved are the same and the matrices have the same dimensions. Other forms of generalized Procustes analysis are given in Borg and Groenen (2005, Section 20.9).

Conclusion
In this follow-up paper to De Leeuw and Mair (2009b), who introduced the smacof package, we presented numerous updates that have been implemented over the years. It is safe to say that these developments establish smacof as the most comprehensive implementation of MDS and unfolding techniques in R. Still, there are several tasks on our to-do list. First, we plan to implement a fastMDS routine entirely written in C to speed up computations for large data settings. Second, we will work on an implementation of inverse MDS (De Leeuw and Groenen 1997). Third, we aim to extend spherical MDS and unfolding to more general geometric shapes such as a pre-specified polygon mesh.