My Early Interactions with Jan and Some of His Lost Papers

It has been over 40 years since I got to know Jan. This period almost entirely overlaps my career as a psychometrician. During these years, I have had many contacts with him. This paper reviews some of my early interactions, focussing on the following topics: (1) An episode surrounding the inception of the ALSOS project, and (2) Jan’s unpublished (and some lost) notes and papers that I cherished and quoted in my work, including (2a) the ELEGANT algorithm for squared distance scaling, (2b) the INDISCAL method for nonmetric multidimensional scaling (MDS), and (2c) notes on DEDICOM.


The inception of the ALSOS project
I met Jan for the first time in the spring of 1974. I was a first-year graduate student (Ph.D. level) at the Thurstone Psychometric Labs at the University of North Carolina at Chapel Hill, under the supervision of Forrest Young. Jan was at Bell Labs working with Joe Kruskal and Doug Carroll, on a leave of absence from the Department of Data Theory at the University of Leiden. One day Jan came down to Chapel Hill to give a talk in a colloquium organized by Forrest. I do not remember exactly what he talked about, but it must have been something about nonmetric multidimensional scaling (MDS), which was then one of the hottest topics in psychometrics. It has been more than 40 years since then, and I have had many interactions with Jan, the most memorable one being a joint project later known as the "ALSOS" project (Young 1981). Between 1976 and 1980, we (Young, De Leeuw, and Takane in some order) published seven papers, all having an "Alternating Least Squares method with Optimal Scaling features" in their subtitles (Takane, Young, and De Leeuw 1977, 1979, 1980; Young, Takane, and De Leeuw 1978; Young et al. 1980). Here is my personal account of how it started.
During the spring semester of 1974, I took a course in Developmental Psychology. I was not particularly interested in this course, but we were required to take three courses offered outside the Psychometric Labs as one of the requirements for the Ph.D. degree. The course used a new textbook by Liebert, Poulos, and Strauss (1974), which contained a report on Kempler's (1971) study of (the failure of) conservation of quantity in young children. In his study, Kempler constructed a set of 100 rectangles by systematically varying their height and width over ten levels each (ranging from 10 inches to 14.5 inches at half-inch intervals), and asked four groups of children of varying age (1st, 3rd, 5th, and 7th graders) to judge whether each rectangle looked large or small. He then calculated the mean height and width of the rectangles judged large for each of the four groups of children separately. He found that the mean height of the rectangles judged large was disproportionately large for younger children, while just the opposite was true for the mean width. As the children got older, the mean height of the rectangles judged large decreased rather consistently, while the mean width increased, until the two became almost equal. Kempler thought this was because younger children tended to put more emphasis on height than width when they made area judgments of rectangles, but as they got older, they became able to take both the height and width of rectangles into account more equitably.
When I saw this report, I immediately thought that it would be an ideal example for the weighted additive model (WAM). It was the mid-1970s, and weighted distance models like INDSCAL (Carroll and Chang 1970) were very popular among psychometricians for accounting for certain types of individual differences in (dis)similarity judgments. The INDSCAL model postulates a common spatial representation of stimuli across all individuals, while also assuming that different individuals attach differential weights to different dimensions, which give rise to individual differences. I thought something similar should hold in more fundamental additive models. I was very excited about my idea, and immediately went to talk about it with Forrest, who was then developing computer programs called POLYCON (polynomial conjoint scaling) and ADCON (additive conjoint scaling). Forrest was very happy about my idea of the weighted additive model, and we immediately started thinking about expanding his projects. We thought it would make more sense to begin with simple additive models (ADDALS; De Leeuw et al. 1976), because the ADCON project was already in progress, and the WAM for all groups or individuals combined presupposed a simple additive model for each group or individual. So it was decided that my first duty as a research assistant was to rewrite the ADCON program to make it more general in terms of the number of additive factors, accommodating possibly non-orthogonal factors, handling missing data, etc. I recall that POLYCON used the more conventional steepest descent method (à la Kruskal 1964b) for optimization, while ADCON used what would later come to be called an alternating least squares (ALS) algorithm. We kept using the latter algorithm in ADDALS as well as in all subsequent programs developed under the ALSOS project. At this stage, the ALS algorithm was used rather heuristically.
We knew from our experience that it was monotonically convergent in the sense that it consistently minimized the least squares criterion, but we were unable to prove it theoretically. As a first year Ph.D. student in psychometrics, I had little knowledge of numerical optimization algorithms at the time.
It was in these circumstances that Jan arrived in Chapel Hill. We had a hidden agenda for his visit. We asked him about the algorithm used in ADCON, and Jan could immediately tell us that it was monotonically convergent, and that it was straightforward to show this theoretically. He also suggested that the algorithm might be called alternating least squares (ALS), and that the extended ADCON program might be renamed ADDALS. We talked about other possible models that might be fitted in a similar way. It was clear by this time that our algorithmic scheme would work not only for simple and weighted additive models (Takane et al. 1980), but also for other linear models (e.g., regression analysis models; Young et al. 1976), bilinear models (e.g., principal component analysis models; Young et al. 1978), quadratic models (e.g., weighted and unweighted squared Euclidean models; Takane et al. 1977), and common factor analysis models (Takane et al. 1979). The three of us (Young, De Leeuw, and Takane) agreed to set up a joint project, later known as the ALSOS project (Young 1981; Young et al. 1980). ALSOS has two major ingredients, optimal scaling (OS) and alternating least squares (ALS). Psychometrics has a long tradition of "scaling up" data measured at lower scale levels into higher ones through models of the data (Takane 2005), which is called OS. Its proliferation in the 1970s and 80s was, however, mainly inspired by the development of nonmetric multidimensional scaling (Kruskal 1964a,b; Shepard 1962a,b), in which observed (dis)similarity data measured on an ordinal scale were monotonically transformed to fit a distance model. This was done by minimizing a least squares (LS) criterion that measures the overall discrepancy between the monotonically transformed data and the distances between stimuli represented as points in a multidimensional space.
More specifically, let us denote the least squares (LS) criterion by φ(X, d), where X indicates the matrix of stimulus coordinates, and d is the vector of monotonically transformed data. We minimize this criterion with respect to both d and X. Kruskal (1964b) approached this problem by first minimizing the criterion with respect to d conditionally on X, and then unconditionally with respect to X. This process may formally be expressed as

min_X min_{d|X} φ(X, d) = min_X φ*(X),

where φ*(X) = min_{d|X} φ(X, d) is the conditional minimum of φ(X, d) with respect to d given X, which can only be obtained algorithmically for a specific given X. The unconditional minimum of φ*(X) with respect to X can in turn be obtained by an iterative procedure such as the steepest descent method. This minimization strategy works well for the simple Euclidean distance model used in nonmetric MDS, where the number of parameters to be estimated is usually not exceedingly large. However, it may not work so well when fitting other models. For example, the model in principal component analysis (PCA) tends to have a large number of parameters, since there are both component loading and score matrices to be estimated. This may well be the reason why Kruskal and Shepard (1974) drew a rather negative conclusion about their nonmetric PCA procedure.
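For ordinal data, the conditional minimization over d is a least squares monotonic (isotonic) regression problem, solvable by the pool-adjacent-violators algorithm. The sketch below is my own minimal illustration of that inner step (the function name is mine, not Kruskal's):

```python
def pava(y, w=None):
    """Least squares monotonic regression by pool-adjacent-violators.

    Returns the non-decreasing vector m minimizing sum_i w_i * (y_i - m_i)**2.
    """
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    # Each block holds (weighted mean, total weight, number of pooled elements).
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool adjacent blocks while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, c1 + c2])
    # Expand the pooled blocks back into a full-length fitted vector.
    out = []
    for m, _, c in blocks:
        out.extend([m] * c)
    return out
```

For example, `pava([3, 1, 2])` pools all three values into their mean, 2, while already-monotone input is returned unchanged.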
In the ALS algorithm, on the other hand, a parameter updating procedure is split into two distinct phases,

min_X φ(X, d) for fixed d (the model estimation phase), and min_d φ(X, d) for fixed X (the optimal scaling phase),

each of which is a relatively simple conditional minimization step. These two steps are iterated until no significant improvement in the value of φ occurs from one iteration to the next. The algorithm is monotonically convergent because each step obtains a conditional minimum over one subset of parameters while fixing all others. Since the model estimation phase (the first phase) can further be broken down into smaller sub-steps, a large number of parameters in the fitted model is no longer a big issue. An additional benefit comes from the fact that a normalized LS criterion can often be minimized by minimizing an unnormalized LS criterion combined with an actual normalization of model parameters (De Leeuw 1977b). Consequently, ALS quickly became one of the most popular optimization strategies in psychometrics. Long after the ALSOS project ended, I still benefit from the handiness of ALS algorithms. I have kept using them in various contexts, e.g., in different constraints on different dimensions (DCDD; Takane, Kiers, and De Leeuw 1995), extended redundancy analysis (ERA; Takane and Hwang 2005), and generalized structured component analysis (GSCA; Hwang and Takane 2004, 2014) and its variants (Jung, Takane, Hwang, and Woodward 2012; Zhou, Takane, and Hwang 2016). Interestingly, it has been shown that WAM is a special case of all of these models (DCDD, ERA, and GSCA). Under certain circumstances, it is also a special case of constrained principal component analysis (CPCA; Takane 2013).
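The two-phase ALS idea can be illustrated with a toy example (my own sketch, not any of the ALSOS programs): fitting a rank-one bilinear model m_ij = u_i v_j by alternating two closed-form conditional least squares updates, so that the criterion can never increase.

```python
import numpy as np

def als_rank1(M, n_iter=20, seed=0):
    """Alternating least squares for the rank-1 bilinear model M ~ u v'.

    Each phase solves an ordinary least squares problem for one block of
    parameters with the other block fixed, so the loss is non-increasing.
    """
    rng = np.random.default_rng(seed)
    m, n = M.shape
    v = rng.standard_normal(n)
    losses = []
    for _ in range(n_iter):
        u = M @ v / (v @ v)        # conditional LS update of u given v
        v = M.T @ u / (u @ u)      # conditional LS update of v given u
        losses.append(np.sum((M - np.outer(u, v)) ** 2))
    return u, v, losses

# Example: a noiseless rank-1 matrix is recovered (up to a scaling trade-off
# between u and v).
M = np.outer([1.0, 2.0, 3.0], [2.0, -1.0, 0.5, 4.0])
u, v, losses = als_rank1(M)
```

The recorded losses decrease monotonically, exactly as the argument above says they must.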
In my mind, however, there still is a lingering problem with ALS algorithms (Takane 1992). The convergence of ALS algorithms rests on the notion that a monotonically decreasing bounded sequence must converge. This is a well-known fact in mathematics. However, this condition alone does not guarantee that a convergence point is a local minimum of the objective function. It is possible that the convergence point is some accumulation point that is not a local minimum. This is because the amount of improvement in the value of the objective function can get smaller and smaller as the iterations proceed, and the algorithm may never reach the desired point. I often use the following anecdote to illustrate the point: Suppose we would like to go from Seattle to Los Angeles. Our plan is to travel each day half the distance between our current location and San Francisco. This way we get closer to Los Angeles every day, but we can never get to Los Angeles (or anywhere beyond San Francisco). We say that there is a point of discontinuity at San Francisco in our travel plan. Obviously, there should be no such points of discontinuity if we are to reach our desired destination. Essentially the same holds for numerical algorithms. Accordingly, Zangwill (1969) requires that there be no such points in an algorithm, but he does not show explicitly how to verify this in practice. In the numerical optimization literature (e.g., Fletcher 1987), the notion of continuity of the algorithm (which should not be confused with continuity of the objective function) is replaced by a condition of sufficient decrease in the value of the objective function in each step of the algorithm (such as the Wolfe conditions). The sequential nature of ALS algorithms, however, makes it difficult to verify this condition directly.
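The travel plan is easy to simulate. Placing the cities on a line (0 for Seattle, 1 for San Francisco, 2 for Los Angeles, my own simplification), the iterates improve monotonically yet converge to San Francisco rather than the destination:

```python
def travel(n_days, seattle=0.0, san_francisco=1.0):
    """Each day, move to the midpoint between here and San Francisco."""
    x = seattle
    path = [x]
    for _ in range(n_days):
        x = (x + san_francisco) / 2.0
        path.append(x)
    return path

# The sequence keeps increasing (always getting closer to Los Angeles at 2.0),
# but its limit is San Francisco (1.0): a point of discontinuity in the plan.
path = travel(50)
```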

Unpublished (and some lost) papers
Toward the end of October 2014, I received an e-mail from Jan asking me if I still had copies of a couple of his unpublished papers. One was De Leeuw (1977b), entitled "A Normalized Cone Regression Approach to Alternating Least Squares Algorithms" (this paper happens to be quoted above), and the other was De Leeuw (1975), entitled "An Alternating Least Squares Algorithm for Squared Distance Scaling." While I could immediately locate the former, I could not find the latter, which regrettably seems to have been lost forever. Those were the days before personal computers, when even published papers were unlikely to exist on an electronic medium, let alone unpublished ones. Somewhat fortunately, the content of De Leeuw (1975) has been described in some detail in Takane's MDS book in Japanese (Takane 1980). The algorithm used in this paper was called the ELEGANT algorithm, for a reason to be noted below.
When I first met Jan in 1974, he struck me as an already very mature scientist. His mathematical expertise was awesome. I thought I had a lot to learn from him. After he went back to Bell Labs, he generously sent me several of his working papers, and I very much appreciated the opportunity to study his unpublished work closely. When he was young, Jan used to say that he would not publish a paper unless it was accepted essentially as written in the first round of review. It was rather rare, however, for papers to be accepted without any revisions.
So quite a few of his papers remained unpublished, some of which were subsequently lost permanently. In addition to De Leeuw (1975), two more papers (notes) were subsequently identified as missing. They are the INDISCAL paper and notes on DEDICOM. The former deals with an indicator method for nonmetric MDS, and the latter concerns some important properties of the algorithm I developed earlier for DEDICOM. The former was quoted as De Leeuw, Takane, and Young (in preparation) in Takane (1977), but for some reason, it is not listed in the reference list of the paper. So the exact title of the paper is unknown. (There is a slight chance that it has never been written formally, but a description of the procedure has been given in Takane (1980), which may be based on the brief description of the idea in De Leeuw (1974).) The latter was referenced as De Leeuw (1983) in Takane (1985). In what follows, I will try my best to reproduce the contents of Jan's three lost papers as faithfully as possible as a token of my appreciation to him.

The ELEGANT algorithm for squared distance scaling
This is an algorithm for fitting squared Euclidean distances. Let X denote the n-stimuli by r-dimensions matrix of stimulus coordinates. Let o_ij represent the observed dissimilarity between stimuli i and j, and let d_ij represent the corresponding Euclidean distance calculated from X. We assume that o_ij is symmetric, so that it is sufficient to consider only j > i. Let D_{o^2} denote the diagonal matrix of order n* = n(n − 1)/2 with the o_ij^2 arranged in a certain order as its diagonal elements, and let D_{d^2} denote the diagonal matrix of order n* with the d_ij^2 arranged in the same order as the diagonal elements of D_{o^2}. Let τ(X) represent the least squares criterion defined on squared distances. This may be written as

τ(X) = Σ_{j>i} (o_ij^2 − d_ij^2)^2 = tr[(D_{o^2} − D_{d^2})^2].

Let A denote the n* by n design matrix for pair comparisons. A row of A represents a stimulus pair, and a column of A represents a stimulus. If the k-th diagonal element of D_{o^2} (and D_{d^2}) represents the squared dissimilarity (resp. distance) between stimuli i and j, the k-th row of A has 1 in the i-th column and −1 in the j-th column, with all other elements equal to zero. Since D_{d^2} = diag(AXX'A'), τ(X) above can be rewritten as

τ(X) = tr[(D_{o^2} − diag(AXX'A'))^2].

Define Q as

Q = D_{o^2} − D_{d^2},

and Q* as

Q* = Q + AXX'A'.

Then τ(X) can be further rewritten as

τ(X) = tr(Q^2) = tr[(Q* − AXX'A')^2].

By differentiating the above τ and setting the result equal to zero, we obtain

A'Q*AX = A'AXX'A'AX,

assuming temporarily that Q* is constant (i.e., not a function of X). Noticing that

A'A = n(I_n − 1_n 1_n'/n) = nJ_n,

where I_n is the identity matrix of order n, and 1_n is the n-component vector of ones, and that J_n X = X (i.e., the origin of the space is set at the centroid of the stimulus configuration), we may further rewrite the above stationary equation as

(A'Q*A/n^2) X = X∆,

where it is assumed that ∆ = X'X is a diagonal matrix. Let B(X) = A'Q*A/n^2.

Then, for fixed B(X),

B(X)X = X∆

is an eigen-equation for B(X). Since τ(X) = tr(Q*^2) − n^2 tr(∆^2), minimizing τ for fixed Q* (and so for fixed B(X)) amounts to obtaining the eigenvectors of B(X) corresponding to the r largest eigenvalues. The matrix of (normalized) eigenvectors X is then re-scaled to satisfy ∆ = X'X, that is, each eigenvector is scaled so that its squared norm equals the corresponding eigenvalue.

The above procedure has to be applied iteratively, since B(X) is a function of X and has to be updated for a new estimate of X. This is repeated until no significant change occurs in the value of τ. This process is similar to the refactoring method in common factor analysis, where communalities are estimated by iterative refactoring. That is, the matrix of factor loadings is estimated with tentative estimates of communalities by solving an eigen-equation. This yields new estimates of communalities, which are then used for an improved estimate of the matrix of factor loadings, and so on. A modification of the above algorithm to ordinal data is straightforward within the ALS algorithmic framework. Note that B(X) can be further rewritten as

B(X) = A'QA/n^2 + XX',

using A'AXX'A'A/n^2 = J_nXX'J_n = XX'. Let s_ij = (o_ij^2 − d_ij^2)/n^2. Then the ij-th element of the first term in B(X), denoted b*_ij, is obtained by

b*_ij = −s_ij for i ≠ j, and b*_ii = Σ_{j≠i} s_ij.

It is more convenient and efficient to use this expression for calculating B(X).
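The update is compact enough to sketch in a few lines of numpy (my own reconstruction from the description above, not Jan's lost code): form B(X) from the s_ij plus XX', take the r leading eigenvectors, and rescale them so that X'X = ∆.

```python
import numpy as np

def tau(X, O2):
    """LS criterion on squared distances; O2 holds the o_ij^2 (n x n, zero diagonal)."""
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared Euclidean distances
    return np.sum(np.triu(O2 - D2, 1) ** 2)

def elegant_update(X, O2):
    """One ELEGANT iteration: form B(X) = S + XX' and re-solve the eigen-equation."""
    n, r = X.shape
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    S = -(O2 - D2) / n**2                 # off-diagonal elements are -s_ij
    np.fill_diagonal(S, 0.0)
    np.fill_diagonal(S, -S.sum(axis=1))   # diagonal elements are sum_j s_ij
    B = S + X @ X.T
    vals, vecs = np.linalg.eigh(B)
    vals, vecs = vals[::-1][:r], vecs[:, ::-1][:, :r]
    return vecs * np.sqrt(np.maximum(vals, 0.0))   # rescale so that X'X = Delta

# With error-free data, a centered configuration that generated the o_ij is a
# fixed point of the update (up to rotation), and tau remains zero.
rng = np.random.default_rng(1)
X0 = rng.standard_normal((8, 2))
X0 -= X0.mean(axis=0)
sq = np.sum(X0**2, axis=1)
O2 = sq[:, None] + sq[None, :] - 2 * X0 @ X0.T
np.fill_diagonal(O2, 0.0)
X1 = elegant_update(X0, O2)
```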
It turned out that the above algorithm was painfully slow, like the refactoring method in common factor analysis, so much so that I mentioned to Jan that although it looked elegant on the surface, it was too slow to be useful in practice. My remark was based on the fact that the algorithm was more elegant than the method used in ALSCAL, which also fitted squared distances, but updated one coordinate of a stimulus point at a time with all other coordinates fixed (which amounted to solving a cubic equation in each step). The ELEGANT algorithm, on the other hand, could update the entire set of stimulus coordinates simultaneously. Jan responded to my remark by suggesting, half-jokingly, that we call it ELEGANT. The slow convergence of the algorithm may be seen from the expression for B(X) given above, a major portion of which is XX', so that the new update of X is greatly affected by the previous X.
The ELEGANT algorithm has never been used in practice because of its slow convergence. However, after having worked on so many acceleration techniques for iterative procedures (Loisel and Takane 2011;Takane and Zhang 2009;Takane, Jung, and Hwang 2010), I now think that its slow convergence can be overcome to the extent that it is practically viable. Squared distance scaling is attractive because (a) the squared Euclidean distance function is a simple quadratic function of its parameters (stimulus coordinates), and (b) an extension to ordinal dissimilarity data is straightforward (Kruskal's LS monotonic transformation algorithm can be used on squared distances just as with unsquared distances). Takane (1977) compared some formal characteristics of the ELEGANT algorithm with classical MDS, the C-matrix method (De Leeuw 1977a;Guttman 1968), and the fourth type of quantification method (Hayashi 1951), which was also used as an initialization method in the original C-matrix method. This paper also describes the ELEGANT algorithm in some detail, and it is written in English.

INDISCAL: An indicator method for nonmetric MDS
This is an algorithm for scalar product optimization in nonmetric MDS. Let o_1, · · · , o_K represent the K distinct values among the observed dissimilarities in ascending order (i.e., o_k < o_{k'} for k < k'), where o_ij (1 ≤ i, j ≤ n) denotes the observed dissimilarity between stimuli i and j measured on an ordinal scale. Define K indicator matrices B_1, · · · , B_K whose elements are given by

(B_k)_ij = 1 if o_ij ≥ o_k, and (B_k)_ij = 0 otherwise.

The matrix of optimally transformed squared dissimilarity data can then be expressed as

D^(2)(θ) = Σ_{k=1}^K θ_k B_k,

where θ_k, the k-th element of θ, is assumed non-negative, which ensures that the transformation is monotonic. The scalar product matrix derived from D^(2)(θ) can then be written as

A(θ) = (−1/2) J D^(2)(θ) J = Σ_{k=1}^K θ_k C_k,

where J is, as before, the centering matrix of order n (i.e., J = I_n − 1_n1_n'/n, where I_n is the identity matrix of order n and 1_n is the n-element vector of ones), and C_k = (−1/2)JB_kJ. Now the least squares criterion based on the scalar product can be written as

φ(X, θ) = tr[(A(θ) − XX')^2].

This criterion may be minimized by ALS, i.e., by alternately minimizing it with respect to X for fixed θ, and with respect to θ for fixed X subject to the non-negativity restrictions on the θ_k. The former is particularly simple because it reduces to the eigen-decomposition of the matrix A(θ). The latter is a non-negative least squares problem (e.g., Lawson and Hanson 1974), a kind of quadratic programming problem involving a slightly more complicated procedure than Kruskal's least squares monotonic regression algorithm, but one that can be solved, for example, by an active constraint method in a finite number of steps. A conditional minimization scheme is also possible, in which the criterion function is first minimized with respect to X conditionally upon θ, and then unconditionally with respect to θ (De Leeuw 1974). However, the ALS minimization scheme is simple enough, and is expected to work efficiently.
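A toy sketch of the INDISCAL ALS iteration follows (my own illustration in numpy; the non-negative LS step is done here by simple coordinate descent with clipping rather than an active-set method, which suffices to keep the criterion non-increasing):

```python
import numpy as np

def indiscal_als(O, r, n_iter=10, inner=50):
    """ALS for the indicator method: minimize ||A(theta) - XX'||^2 over X and
    theta >= 0, where A(theta) = sum_k theta_k C_k."""
    n = O.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    levels = np.unique(O[np.triu_indices(n, 1)])       # o_1 < ... < o_K
    B = [(O >= o).astype(float) for o in levels]       # indicator matrices B_k
    C = np.array([-0.5 * J @ Bk @ J for Bk in B])      # C_k = -(1/2) J B_k J
    K = len(levels)
    theta = np.ones(K)
    G = np.einsum('kij,lij->kl', C, C)                 # Gram matrix of the C_k
    losses = []
    for _ in range(n_iter):
        # X-step: best rank-r psd approximation of A(theta) via eigen-decomposition.
        A = np.tensordot(theta, C, axes=1)
        vals, vecs = np.linalg.eigh(A)
        vals, vecs = vals[::-1][:r], vecs[:, ::-1][:, :r]
        X = vecs * np.sqrt(np.maximum(vals, 0.0))
        # theta-step: non-negative LS by cyclic coordinate descent with clipping.
        T = X @ X.T
        h = np.einsum('kij,ij->k', C, T)
        for _ in range(inner):
            for k in range(K):
                if G[k, k] > 1e-12:
                    num = h[k] - G[k] @ theta + G[k, k] * theta[k]
                    theta[k] = max(0.0, num / G[k, k])
        losses.append(np.sum((np.tensordot(theta, C, axes=1) - T) ** 2))
    return X, theta, losses

# Example: ordinal dissimilarities generated from a 2-dimensional configuration.
rng = np.random.default_rng(3)
Xt = rng.standard_normal((6, 2))
d = np.linalg.norm(Xt[:, None, :] - Xt[None, :, :], axis=2)
X, theta, losses = indiscal_als(d, r=2)
```

Because each of the two steps is a conditional minimization, the recorded criterion values never increase.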
The basic motivation for fitting squared Euclidean distances in the previous section was that the monotonicity of unsquared distances and that of squared distances are completely equivalent, so Kruskal's (1964a,b) least squares monotonic transformation algorithm for the former can be applied to the latter without any modification. Minimizing the least squares criterion defined in terms of scalar products (strain optimization), on the other hand, is attractive because finding the stimulus configuration is straightforward (it can be done in closed form) once observed or derived scalar products are given. Optimal monotonic transformations, however, are more complicated with scalar products. Perhaps for this reason, no fully nonmetric MDS procedure based on scalar product optimization had been developed until the late 1990s (Trosset 1998). It is remarkable, then, that Jan developed such a procedure long before 1998. It is called INDISCAL, an indicator method for nonmetric MDS. Jan called it INDISCAL, again half-jokingly, as a "parodic" name for INDSCAL. As alluded to earlier, the method was briefly mentioned in Section 3.3 of De Leeuw (1974), but might never have been written up as such. It was cited in Takane (1977) without any substantive description, but is described more fully in Takane (1980). Since Takane (1980) is written in Japanese, however, INDISCAL has never been widely known in the psychometric community at large.

Notes on DEDICOM
DEDICOM (DEcomposition into DIrectional COMponents) is a model for square asymmetric tables proposed by Harshman (1978). The model can be written as

A_0 = XBX',  (21)

where A_0 is an n by n model matrix describing asymmetric relationships among n stimuli, X is an n by r component loading matrix (r ≤ n), and B is an r by r matrix describing asymmetric relationships among the r components. The matrix X captures the relationships between the r components and the n stimuli. This model attempts to explain asymmetric relationships among the n stimuli (A_0) by postulating a smaller number of components that have asymmetric relationships among themselves (B), together with relationships between the components and the stimuli (X). The model given above is not unique. To remove the indeterminacy, we impose the following restrictions:

X'X = I_r,  (22)

and

BB' + B'B = ∆ (a diagonal matrix).  (23)

In practice, it is rare to find an asymmetric table that can be exactly represented by Model (21) with prescribed rank r, due to error perturbations. It is more common that the above A_0 holds only approximately, and it is necessary to find the model matrix A_0 that best fits the observed data matrix A. That is, we seek the X and B of prescribed rank r that minimize

ψ(X, B) = tr[(A − XBX')'(A − XBX')].

Takane (1985) developed an iterative algorithm to find an X and B that jointly minimize this criterion. The B that minimizes ψ for given X can easily be obtained by

B = X'AX.

(This corresponds to the conditional minimum of ψ with respect to B given X.) The minimum of ψ with respect to X is then obtained by iterating the following two steps until convergence:

Step I. Calculate X* = A'XX'AX + AXX'A'X = A'XB + AXB'.

Step II. Apply the Schmidt orthogonalization method to X* to obtain X, and go back to Step I.
The Schmidt orthogonalization process can be formally written as

X* = XR,

where X is a columnwise orthogonal matrix and R is an upper triangular matrix. At convergence, R becomes a diagonal matrix. This may be seen by noting that at convergence, X'X* = B'B + BB' = R. Since B'B + BB' is symmetric and R is upper triangular, R must be diagonal.
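The algorithm is easy to sketch in numpy (my own illustration; QR factorization plays the role of the Schmidt orthogonalization). Note that the B-step is the exact conditional minimizer, so that step at least can never increase ψ:

```python
import numpy as np

def psi(A, X, B):
    """Least squares criterion psi(X, B) = ||A - X B X'||_F^2."""
    return np.sum((A - X @ B @ X.T) ** 2)

def dedicom(A, r, n_iter=30, seed=0):
    """Takane's (1985) alternating algorithm for DEDICOM, as described above."""
    rng = np.random.default_rng(seed)
    X, _ = np.linalg.qr(rng.standard_normal((A.shape[0], r)))
    for _ in range(n_iter):
        B = X.T @ A @ X                      # conditional minimizer of psi given X
        Xstar = A.T @ X @ B + A @ X @ B.T    # Step I
        X, _ = np.linalg.qr(Xstar)           # Step II: orthogonalize X*
    return X, X.T @ A @ X

# Example on a small random asymmetric matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
X, B = dedicom(A, r=2)
```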
I observed that the above algorithm was always monotonically convergent, but could not show theoretically that this was indeed the case. After a few months of struggle, I decided to ask several people for help, including Jan and Henk (Kiers). Several months later, I received a response from Jan. Unfortunately, this note seems to have been lost permanently. However, as far as I can remember, it contained two important messages:

(i) Let the rank of the matrix AA' + A'A be called the Harshman rank of the square asymmetric matrix A. When the Harshman rank of A is r (i.e., rank(AA' + A'A) = r), there is an exact solution to Model (21) with A_0 = A. Let

AA' + A'A = PDP'

represent the compact singular value decomposition (SVD), or equivalently the eigen-decomposition, of the symmetric matrix AA' + A'A, where P is the n by r columnwise orthogonal matrix of singular vectors (or eigenvectors), and D is the r by r positive definite diagonal matrix of singular values (or eigenvalues). Then X = P and B = P'AP exactly solve Equation (21) under the restrictions given in Equations (22) and (23). Furthermore, if the diagonal elements of D are all distinct, X is unique up to reflections and permutations of its columns. Also note that Model (21) with the restrictions (22) and (23) implies that

AA' + A'A = X(BB' + B'B)X' = X∆X',

so that X = P and ∆ = D. (Although such an exact, infallible case rarely occurs in practice, this result is useful for obtaining a good initial estimate of X for iterative solutions.)

(ii) A sufficient condition for Takane's algorithm to be monotonically convergent is that the matrix (A ⊗ B) + (A ⊗ B)' is positive definite. This implies that Takane's algorithm is not, in general, monotonically convergent. Note that this condition depends on the iterations, since B is a function of the current update of X.
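Message (i) is easy to verify numerically (a sketch under my reconstruction of the lost note): when A has Harshman rank r, the leading eigenvectors of AA' + A'A solve the model exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 6, 2
# Construct A with Harshman rank r: A = X0 B0 X0' with X0 columnwise orthonormal.
X0, _ = np.linalg.qr(rng.standard_normal((n, r)))
B0 = rng.standard_normal((r, r))
A = X0 @ B0 @ X0.T

# Eigen-decomposition of the symmetric matrix AA' + A'A.
vals, vecs = np.linalg.eigh(A @ A.T + A.T @ A)
P = vecs[:, ::-1][:, :r]      # eigenvectors for the r largest eigenvalues

X, B = P, P.T @ A @ P         # Jan's exact solution: X = P and B = P'AP
```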
A few weeks later, I received another letter, this time from Henk (Kiers), in response to my inquiry, reporting that he had solved my problem completely. He suggested that we write a joint paper, which culminated in Kiers, Ten Berge, Takane, and De Leeuw (1990). His letter indicated, consistent with Jan's result, that my algorithm was not always monotonically convergent. It also offered the prescription necessary to make the algorithm always monotonically convergent. In essence, it was suggested to modify Step I of Takane's algorithm as follows:

Step I'. Calculate X* = AXX'A'X + A'XX'AX + 2αX, where α is a scalar not smaller than the largest eigenvalue of the symmetric part of −(A ⊗ B).
The above modification makes Takane's algorithm consistent with majorization algorithms (see De Leeuw 1977a, 1988; Kiers 1990; Kiers and Ten Berge 1992), another great invention and contribution of Jan to psychometrics/statistics. Note that the α introduced above also depends on the iterations through B (just as in the sufficient condition for the monotonic convergence of Takane's original algorithm). A choice of α that does not depend on the iterations is given by the largest eigenvalue of the symmetric part of −(A ⊗ A).
The above modification guarantees the monotonic convergence of Takane's algorithm, although it may slow the convergence down (when α is positive). Takane and Zhang (2009) went in the opposite direction: they incorporated the minimal polynomial extrapolation method to speed up convergence at the sacrifice of monotonic convergence.

Concluding remarks
I have had many other memorable contacts with Jan, although I must close this essay soon. I published two more papers with him in the late 1980s and early 1990s (Takane and De Leeuw 1987; Takane et al. 1995). By then, Jan had become one of my valuable resource persons, that is, someone I solicited help from whenever I encountered problems I could not solve myself. (One instance of this was already mentioned above in the case of Takane's algorithm for DEDICOM.) In these papers, I had to get advice from Jan and Henk (Kiers) about the existing literature on the topics, and about the convergence and uniqueness properties of the algorithms I came up with. My joint papers with Jan were predominantly first-authored by me, but this merely reflects the fact that I sometimes needed his help, and not vice versa.
There was one problem I raised with Jan about which I never received his response. He has probably forgotten it by now, but this seems to be a great opportunity to remind him of it. I am still hopeful of getting some hint from him. Let me describe the problem briefly. My former graduate student, Heungsun Hwang, and I were developing an MDS model for Burt tables (Takane and Hwang 2000) around the year 2000. Let f_ij denote the observed joint frequency of the i-th category of item I and the j-th category of item J, and let p_ij represent the corresponding probability, for which we postulated the following model:

p_ij = w_i w_j exp(−d_ij^2) / Σ_{k∈I} Σ_{l∈J} w_k w_l exp(−d_kl^2),

where w_i (≥ 0, with Σ_{i∈I} w_i = 1) is the bias parameter for category i, d_ij is the Euclidean distance between categories i and j represented as points in a space of dimensionality r, and the summations in the denominator extend over all categories of the two items. In line with the tradition of MDS, the squared Euclidean distance is parameterized as d_ij^2 = Σ_{a=1}^r (x_ia − x_ja)^2, where x_ia is the coordinate of category i on dimension a. Consistent with the implicit constraints in multiple correspondence analysis, we required that Σ_{i∈I} f_i x_ia = 0 for all items and dimensions (1 ≤ a ≤ r), where f_i is the marginal frequency of category i. We derived estimates of the covariances among the observed f_ij in the off-diagonal blocks of Burt tables, and constructed a generalized LS criterion, using the inverse of the estimated covariance matrix as weights, to estimate the bias parameters and the coordinates of the category points. We thought the model was quite plausible, and the estimation procedure quite attractive, yielding efficient estimates. We, however, experienced some anomaly with our Gauss-Newton algorithm. We frequently observed that solutions degenerated, with the category points for one item drifting away from the centroid of the category configuration.
Although we never observed this phenomenon in the case of only two items (a single two-way contingency table), the frequency of degenerate solutions seemed to pick up as the number of items increased. How could this happen? I am still puzzled by this phenomenon.
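For concreteness, the model can be computed as follows (a sketch based on my reconstruction above; the exponential-decay form exp(−d_ij^2) is my reading of the model, and by construction the p_ij over all category pairs of the two items sum to one):

```python
import numpy as np

def burt_mds_probs(w_I, w_J, X_I, X_J):
    """Model probabilities p_ij = w_i w_j exp(-d_ij^2) / normalizer, where
    d_ij is the Euclidean distance between category points of the two items."""
    D2 = np.sum((X_I[:, None, :] - X_J[None, :, :]) ** 2, axis=2)
    num = np.outer(w_I, w_J) * np.exp(-D2)
    return num / num.sum()

# Example with two items having 3 and 2 categories in a 2-dimensional space.
w_I = np.array([0.2, 0.5, 0.3])
w_J = np.array([0.6, 0.4])
X_I = np.array([[0.0, 1.0], [1.0, 0.0], [-1.0, -1.0]])
X_J = np.array([[0.5, 0.5], [-0.5, 0.0]])
P = burt_mds_probs(w_I, w_J, X_I, X_J)
```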