PerMallows : An R Package for Mallows and Generalized Mallows Models

In this paper we present the R package PerMallows, which is a complete toolbox to work with permutations, distances and some of the most popular probability models for permutations: the Mallows model and the Generalized Mallows model. The Mallows model is an exponential location model, considered analogous to the Gaussian distribution, and is based on the definition of a distance between permutations. The Generalized Mallows model is its best-known extension. The package includes functions for making inference, sampling and learning with such distributions. The distances considered in PerMallows are Kendall's τ, Cayley, Hamming and Ulam.


Introduction
Permutations are ordered sets of items that arise naturally in many domains, such as genomics (Bader 2011), cryptography, scheduling or computer vision (Ziegler, Christiansen, Kriegman, and Belongie 2012), with ranking being the most studied (Burges et al. 2005; Mallows 1957).
Probability models for permutation spaces -or permutation models -appear in the same domains as permutations themselves. The first examples come from the area of social choice, in which they are still a lively topic (Caragiannis, Procaccia, and Shah 2013). Some of the hottest topics in the research on permutation models are preference learning (Fürnkranz and Hüllermeier 2013) and learning to rank (Cohen, Schapire, and Singer 1998), which have earned their own space as subfields of machine learning as their commercial applications have grown rapidly in recent years. Among the most popular permutation models are the Mallows model (MM) and the Generalized Mallows model (GMM), which are also the building blocks of more complex models, such as mixtures, which combine several MMs. This is not the first package in the literature to deal with distributions over permutations.
The BradleyTerry2 (Turner and Firth 2012), psychotree (Strobl, Wickelmaier, and Zeileis 2011) and prefmod (Hatzinger and Dittrich 2012) packages model and fit preference data in the form of paired comparisons. In the particular case of the distance-based models, there exist two packages. The RMallow package (Gregory 2012) uses an EM algorithm to fit mixtures of MMs under the Kendall's τ distance to full or partial rankings, with and without ties, as described in Murphy and Martin (2003). On the other hand, the pmr package for probability models for ranking data (Lee and Yu 2015) considers Luce's model (Critchlow et al. 1991) as well as the MM and GMM under the Kendall's τ, Spearman's ρ, Spearman's ρ² and Spearman's footrule distances. That package is aimed at helping in the analysis of preference data in tasks such as visualizing data and computing descriptive statistics. It implements functions for the maximum likelihood estimation of the parameters of the MM and of extensions thereof similar to the GMM.
None of the aforementioned packages offers a wide enough range of functionalities to be considered a complete toolbox for reasoning with permutation data. There is no way of generating permutations from a given model or of calculating the probability of a permutation under a certain model, for example. Moreover, none of the packages considers the algebraic machinery necessary for the efficient management of functions over permutation spaces, from basic operations (composition or factorization into disjoint cycles) to complex combinatorial functions (such as counting or generating permutations).
Regarding the distance metrics, the packages in the literature that deal with distance-based models only consider Spearman's ρ, Spearman's ρ², Spearman's footrule and Kendall's τ, which are the most natural for the preference domain. However, the application of permutations goes beyond the preference domain. Cayley is related to the number of swaps as well as to the cyclic structure of permutations, Hamming is related to the number of fixed points, and Ulam to the longest common subsequence between two permutations. Therefore, Cayley, Hamming and Ulam are more natural in fields such as computer vision, biology, cryptography, matching or card shuffling. PerMallows aims to be a compilation of functions for working on the distance-based probability models MM and GMM under the Kendall's τ, Cayley, Hamming and Ulam distances. The utilities include the following functions:
• Functions for dealing with MM and GMM: learning the parameters from a collection of permutations, sampling permutations from a given distribution, and computing the probability of a permutation, the expectation and the marginal distributions. Moreover, several algorithms are offered for each of these tasks, including approximate and exact algorithms.
• Distance related functions: Compute distances between permutations, randomly generate permutations at a given distance, count the number of permutations at a given distance, etc.
• Operations with permutations: Generation of all permutations of a given number of items, inversion, composition, different operators such as, for example, swapping or transposing items, descriptive statistics, factorization of permutations, etc.
Permutations are highly structured combinatorial objects and permutation models cannot ignore this fact. PerMallows not only takes advantage of statistical tools, but also tries to exploit the algebraic nature of permutations in order to provide efficient sampling and learning algorithms. Moreover, the MM and GMM strongly depend on the distance for permutations for several reasons. First, there exists no general algorithm to sample from or estimate the parameters of a MM or GMM, since each operation depends on the distance considered. Second, the interpretation of the parameters also differs depending on the distance considered by the model.
Before presenting the use of the package, several background aspects need to be discussed. A brief discussion is included in Section 2; for a more detailed one we refer the interested reader to Irurozki (2014). Section 2.2 details the probability models and gives examples of real applications of the models. The particularities of the models under each distance are also enumerated, such as the parameter interpretation and the practical limits of PerMallows. Sections 2.3 and 2.4 introduce several algorithms for learning and sampling, respectively. A quick-start guide to PerMallows can be found in Section 3. Section 4 concludes the paper.

Background
In this section we give the algebraic background for the understanding of the functions in the PerMallows package. It is divided into four parts. First, notions on permutations and distance are given. Then, we briefly describe MM and GMM. The last two parts are devoted to the learning and sampling algorithms for the MM and GMM included in PerMallows.

Permutations and metrics
Permutations are bijections of the set of integers {1, . . . , n} onto itself. We will denote permutations with Greek letters, mostly π and σ. For the permutation σ = {2, 4, 1, 3} we will say that item 2 is at position 1 and denote it σ(1) = 2. The permutation that places every item i at position i is called the identity permutation, denoted e = {1, 2, 3, . . . , n}. For an excellent reference on the combinatorics of permutations see Bóna (2004).
Kendall's τ. This is the most popular choice in the ranking domain. The Kendall's τ distance d_k(σ, π) counts the number of pairwise disagreements between σ and π, i.e., the number of item pairs that have a given relative order in one permutation and the opposite order in the other. We can equivalently define d_k(σ, π) as the number of adjacent swaps needed to convert σ−1 into π−1. The maximum value of the Kendall's τ distance between two permutations is n(n − 1)/2. The Kendall's τ distance is sometimes called the bubble sort distance because d_k(σ) equals the number of adjacent swaps that the bubble sort algorithm performs to order the items in σ increasingly. This definition induces the distance decomposition vector V(σ) = (V_1(σ), . . . , V_{n−1}(σ)), such that d_k(σ) = Σ_{j=1}^{n−1} V_j(σ), where V_j(σ) equals the number of times that the bubble sort algorithm swaps item σ(j). It follows that 0 ≤ V_j(σ) ≤ n − j for 1 ≤ j ≤ n − 1. Note that V_j(σ) also equals the number of items smaller than σ(j) in the tail of the permutation, so it can be expressed as V_j(σ) = Σ_{i=j+1}^{n} I(σ(j) > σ(i)), where I(·) denotes the indicator function.
It is worth noticing that there is a bijection between each σ ∈ S n and each possible V(σ) vector. Therefore, when dealing with the Kendall's τ distance we can use the V(σ) vector as an alternative representation of σ.
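As an illustration of this bijection (a minimal Python sketch, not part of the PerMallows R API), the following code computes V(σ) and rebuilds σ from it:

```python
def v_vector(sigma):
    # V_j(sigma): number of items smaller than sigma(j) in the tail of the permutation
    n = len(sigma)
    return [sum(1 for i in range(j + 1, n) if sigma[i] < sigma[j]) for j in range(n - 1)]

def v_to_permutation(v):
    # invert the bijection: sigma(j) is the (V_j + 1)-th smallest item not yet placed
    items = list(range(1, len(v) + 2))
    sigma = [items.pop(vj) for vj in v]
    sigma.append(items.pop())
    return tuple(sigma)
```

For example, σ = (2, 4, 1, 3) has V(σ) = (1, 2, 0) and d_k(σ) = 3, and the round trip recovers σ.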
Cayley. The Cayley distance d_c(σ, π) counts the minimum number of swaps (transpositions of any two items, not necessarily adjacent) needed to convert σ into π. Its maximum value is n − 1. The Cayley distance is closely related to the cycle structure of a permutation: d_c(σ) equals n minus the number of cycles of σ. This induces the decomposition vector X(σ) = (X_1(σ), . . . , X_{n−1}(σ)), where X_j(σ) = 0 if j is the largest item in its cycle in σ and X_j(σ) = 1 otherwise, so that d_c(σ) = Σ_{j=1}^{n−1} X_j(σ). Contrary to the Kendall's τ case, several permutations can share the same X(σ) vector.

Generating uniformly at random a permutation σ consistent with a given X(σ) can be done by slightly adapting the well-known Fisher-Yates-Knuth shuffle (Irurozki 2014). This algorithm, as well as the conversion from σ to X(σ), is also supported in PerMallows; both have time complexity O(n).
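Since the Cayley distance to the identity equals n minus the number of cycles of the permutation, it can be computed by a simple cycle traversal; a minimal Python sketch (an illustration, not the package's R code):

```python
def cayley_to_identity(sigma):
    # d_c(sigma) = n minus the number of cycles of sigma
    n = len(sigma)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = sigma[j] - 1  # follow the cycle (items are 1-based)
    return n - cycles
```

For instance, σ = {2, 1, 3, 6, 4, 5} has cycles (1 2)(3)(4 6 5), so d_c(σ) = 6 − 3 = 3.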
Hamming. The Hamming distance d_h(σ, π) counts the number of positions in which the two permutations disagree. The maximum value of the Hamming distance between two permutations is, therefore, n. It is worth noticing that there is no pair of permutations σ and π such that d_h(σ, π) = 1.
The Hamming distance is closely related to the concepts of fixed and unfixed points. A fixed point in σ is a position i where σ(i) = i; otherwise the point is unfixed. The Hamming distance to the identity, d_h(σ), counts the number of unfixed points in σ. This leads to the decomposition vector of the Hamming distance, H(σ) = (H_1(σ), . . . , H_n(σ)), where H_j(σ) = 0 if σ(j) = j and H_j(σ) = 1 otherwise. Consequently, d_h(σ) = Σ_{j=1}^{n} H_j(σ). Note that every σ ∈ S_n has a unique H(σ), but the opposite is not necessarily true. Given σ = {2, 1, 3, 6, 4, 5}, then H(σ) = (1, 1, 0, 1, 1, 1) and d_h(σ) = 5.
The conversion from σ to H(σ) and the generation uniformly at random of σ consistent with a given H(σ) are both supported in PerMallows and have complexity O(n).
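The conversion described above is immediate; a minimal Python sketch (illustration only, not the package's R functions):

```python
def h_vector(sigma):
    # H_j = 0 if position j is a fixed point, 1 otherwise
    return [0 if item == pos + 1 else 1 for pos, item in enumerate(sigma)]

def hamming_to_identity(sigma):
    # d_h(sigma) is the number of unfixed points
    return sum(h_vector(sigma))
```

On the example above, σ = {2, 1, 3, 6, 4, 5} yields H(σ) = (1, 1, 0, 1, 1, 1) and d_h(σ) = 5.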
Ulam. The Ulam distance d u (σ, π) counts the length of the complement of the longest common subsequence (LCS) in σ and π, i.e., the number of items which are not part of the LCS. The maximum value of the Ulam distance between two permutations is n − 1. If the reference permutation is the identity, d u (σ) equals n minus the length of the longest increasing subsequence (LIS).
The classical example to illustrate the Ulam distance d u (σ, π) considers a shelf of books in the order specified by σ (Diaconis 1988). The objective is to order the books as specified by π with the minimum possible number of movements, where a movement consists of taking a book and inserting it in another position (delete-insert). The minimum number of movements is exactly d u (σ, π).
The computation of the Ulam distance between two given permutations is included in PerMallows. It has complexity O(n log l) where l is the length of the longest increasing subsequence.
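The LIS computation behind the Ulam distance can be carried out with patience sorting in O(n log l); a minimal Python sketch (an illustration, not the package's implementation):

```python
import bisect

def ulam_to_identity(sigma):
    # d_u(sigma) = n minus the length of the longest increasing subsequence (LIS)
    tails = []  # tails[k]: smallest tail of an increasing subsequence of length k + 1
    for x in sigma:
        k = bisect.bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[k] = x      # x gives a smaller tail for length k + 1
    return len(sigma) - len(tails)
```

For example, (3, 1, 2, 4) has LIS (1, 2, 4) of length 3, so its Ulam distance to the identity is 1.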

Counting and generating permutations
The random generation of permutations is a problem of interest in many disciplines. The uniform random generation, for example, can be efficiently carried out with the well-known Fisher-Yates shuffle (also known as the Knuth shuffle). More restrictive versions of the problem are the following related questions, which are addressed in the following lines:
• Given the number of items n and a distance d, how many permutations are there at distance d from the identity, e?
• Given the number of items n and a distance d, generate uniformly at random a permutation at the given distance from e.
Due to the right invariance property of these metrics, the number of permutations at distance d from the identity, denoted S(n, d), is the same as the number of permutations at distance d from any other permutation σ. There is no closed-form expression for S(n, d) for any of the metrics in this paper. Fortunately, for every metric considered in PerMallows these sequences appear in the On-Line Encyclopedia of Integer Sequences (OEIS) (Sloane 2009) for different values of n and d, together with recursions to obtain them.
Kendall's τ. The number of permutations at every possible Kendall's τ distance, S_k(n, d), can be found at the OEIS (Sloane 2009, sequence A008302). Its computational cost is O(n³). The uniform random generation of a permutation at a given Kendall's τ distance, if the values of S_k(n, d) are given, can be carried out in O(n²) (Irurozki 2014, page 38).
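The counts S_k(n, d) can be obtained with a simple dynamic program over the bounds of the V vector (0 ≤ V_j ≤ n − j); a Python sketch for illustration, not the package's implementation:

```python
def kendall_counts(n):
    # S_k(n, d): permutations of n items with d inversions, via the recursion
    # S_k(n, d) = sum_{j=0}^{min(d, n-1)} S_k(n-1, d-j)
    counts = [1]  # n = 1: a single permutation, at distance 0
    for m in range(2, n + 1):
        new = [0] * (len(counts) + m - 1)
        for d in range(len(new)):
            new[d] = sum(counts[d - j] for j in range(min(d, m - 1) + 1)
                         if d - j < len(counts))
        counts = new
    return counts
```

For n = 4 this gives (1, 3, 5, 6, 5, 3, 1), which sums to 4! = 24.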
Cayley. The number of permutations at a given Cayley distance is given by the Stirling numbers of the first kind (Sloane 2009, sequence A008275), and PerMallows can compute it in O(n²). The uniform random generation can be done by adapting the recurrence used in the counting process, as shown in (Irurozki 2014, page 54), running in time O(n).
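A Python sketch of this count (illustration only): the unsigned Stirling numbers of the first kind c(n, k) count permutations with k cycles, and a permutation with k cycles lies at Cayley distance n − k from the identity.

```python
def cayley_counts(n):
    # c[k]: number of permutations of m items with k cycles, built with
    # c(m, k) = c(m-1, k-1) + (m-1) * c(m-1, k)
    c = [0, 1]  # m = 1: one permutation with one cycle
    for m in range(2, n + 1):
        nxt = [0] * (m + 1)
        for k in range(1, m + 1):
            nxt[k] = (c[k - 1] if k - 1 < len(c) else 0) \
                     + (m - 1) * (c[k] if k < len(c) else 0)
        c = nxt
    # a permutation with k cycles is at Cayley distance n - k from the identity
    return [c[n - d] for d in range(n)]
```

For n = 4 this gives (1, 6, 11, 6), summing to 24.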
Hamming. Counting and generating permutations at a given Hamming distance, which is related to the notion of derangements, are computationally efficient operations that PerMallows can compute in O(n) (Irurozki 2014, page 17), (Sloane 2009, sequence A000166).
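A Python sketch of the derangement-based count (illustration only): the permutations at Hamming distance d are obtained by choosing which d positions are unfixed and deranging them.

```python
from math import comb

def hamming_counts(n):
    # D[d]: derangements of d items, D(d) = (d-1) * (D(d-1) + D(d-2))
    D = [1, 0]
    for d in range(2, n + 1):
        D.append((d - 1) * (D[d - 1] + D[d - 2]))
    # choose the d unfixed positions, then derange the items on them
    return [comb(n, d) * D[d] for d in range(n + 1)]
```

Note the zero at distance 1, in agreement with the observation that no two permutations are at Hamming distance 1.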
Ulam. Both processes of counting (Sloane 2009, sequence A126065) and generating permutations at a given Ulam distance are related with the Ferrers diagrams (FD) and the Standard Young Tableaux (SYT). The link between SYT and permutations is given by the celebrated Robinson-Schensted-Knuth (RSK) correspondence. A self contained explanation can be found in (Irurozki 2014, page 110). The complexity of the counting and generating processes is equivalent to the complexity of enumerating the partitions of n, which grows sub-exponentially with n (Hardy and Ramanujan 1918).
Counting permutations at a given distance is a crucial operation, not only for the random generation of permutations, but also for the learning and sampling processes. Among the distances considered, Ulam is the most demanding from a computational perspective. PerMallows includes functions to preprocess a problem in order to speed these operations up, as shown in page 25.

Probability distributions over permutations
This section introduces the Mallows model (Mallows 1957) and its most popular extension, the generalized Mallows model (Fligner and Verducci 1986).

Mallows model
The MM was one of the first probability models proposed for rankings or permutations. Nevertheless, it is still one of the most widely used models in both theoretical and applied papers. It is an exponential model defined by a central permutation σ_0 and a spread (or dispersion) parameter θ. When θ > 0, σ_0 is the mode of the distribution, i.e., the permutation with the highest probability. The probability of any other permutation decays exponentially as its distance to the central permutation increases. The spread parameter controls how fast this decay happens.
The probability of a permutation under this model can be expressed as follows:

p(σ) = exp(−θ d(σ, σ_0)) / ψ(θ),

where ψ(θ) = Σ_{σ∈S_n} exp(−θ d(σ, σ_0)) is the normalization constant. The distance can be measured in many ways, including the Kendall's τ, Cayley, Hamming and Ulam distances. The MM under the Kendall's τ distance is also known in the literature as the Mallows φ model (Critchlow et al. 1991).
Regardless of the distance, the central permutation is the location parameter. When the dispersion parameter θ is greater than 0, then σ 0 is the mode and, as θ increases the distribution gets sharper. On the other hand, with θ = 0 we obtain the uniform distribution and when θ < 0 then σ 0 is the anti-mode, i.e., the permutation with the lowest probability.
The MM is very attractive for researchers and practitioners because of its simple definition and because it is a realistic model in several domains. Its most remarkable drawback is the computational difficulty of working with it since, in general, there are no closed forms for the normalization constant. In addition, there is neither a general approach to sample and learn from the model, nor to express the expectation of the distance. Moreover, these operations strongly depend on the distance for permutations considered by the model.
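For small n, the model can be made concrete by brute-force enumeration of S_n, sidestepping the normalization-constant difficulty entirely; a Python sketch under the Kendall's τ distance (an illustration, not one of the package's algorithms):

```python
import math
from itertools import permutations

def kendall(sigma, pi):
    # pairwise disagreements between two permutations of {1..n}
    n = len(sigma)
    pos_s = {v: i for i, v in enumerate(sigma)}
    pos_p = {v: i for i, v in enumerate(pi)}
    return sum(1 for a in range(1, n + 1) for b in range(a + 1, n + 1)
               if (pos_s[a] < pos_s[b]) != (pos_p[a] < pos_p[b]))

def mm_prob(sigma, sigma0, theta):
    # p(sigma) = exp(-theta * d(sigma, sigma0)) / psi(theta),
    # with psi computed by enumerating S_n (feasible for small n only)
    n = len(sigma)
    psi = sum(math.exp(-theta * kendall(p, sigma0))
              for p in permutations(range(1, n + 1)))
    return math.exp(-theta * kendall(sigma, sigma0)) / psi
```

With θ = 0 the distribution is uniform, and for θ > 0 the central permutation σ_0 is the mode, as described above.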

Generalized Mallows model
This extension of the MM tries to break the restriction of every permutation at the same distance having the same probability value. Instead of one single spread parameter, the GMM requires the definition of k parameters θ j , each affecting a particular position of the permutation. This allows modelling a distribution with more emphasis on the consensus of certain positions of the permutation while having more uncertainty about some others.
An example of such a situation is as follows. Let σ_0 and π_1 be two rankings that differ only in the first and second ranked items, and let σ_0 and π_2 differ only in the last two ranked items. A MM centred around σ_0 will assign equal probability to π_1 and π_2, since d(σ_0, π_1) = d(σ_0, π_2). However, it is reasonable to think that, since σ_0 and π_2 differ in the last positions, their disagreement is not as notorious as that of σ_0 and π_1, so one could expect that p(π_1) < p(π_2). This situation can be modelled with the GMM under the Kendall's τ distance by setting the dispersion parameters, for example, so that θ_1 > θ_{n−1} > 0, penalizing disagreements in the first position more than those in the last one. Not every distance that can be used in the MM can also be used in the GMM, since the GMM requires the distance to decompose into k terms as follows:

d(σ) = Σ_{j=1}^{k} S_j(σ).

For any distance that decomposes as the above equation, the GMM is defined as follows:

p(σ) = exp(−Σ_{j=1}^{k} θ_j S_j(σ σ_0^{−1})) / ψ(θ),

where θ = (θ_1, . . . , θ_k) and ψ(θ) is the normalization constant. PerMallows considers the GMM for the Kendall's τ, Cayley and Hamming distances. The GMM under the Kendall's τ distance is known in the literature as the Mallows φ component model.
It is worth noticing that if the S_j(σ) terms are independent when σ is chosen uniformly at random, then the Mallows distribution factorizes.
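The GMM definition can likewise be made concrete by brute force for small n; a Python sketch under the Kendall's τ distance, centred at the identity for simplicity and using the V(σ) decomposition (an illustration, not the package's code):

```python
import math
from itertools import permutations

def v_vector(sigma):
    # Kendall decomposition: V_j counts tail items smaller than sigma(j)
    n = len(sigma)
    return [sum(1 for i in range(j + 1, n) if sigma[i] < sigma[j]) for j in range(n - 1)]

def gmm_kendall_prob(sigma, thetas):
    # GMM probability centred at the identity; psi by enumeration (small n only)
    n = len(sigma)
    def weight(s):
        return math.exp(-sum(t * v for t, v in zip(thetas, v_vector(s))))
    psi = sum(weight(p) for p in permutations(range(1, n + 1)))
    return weight(sigma) / psi
```

With θ_1 > θ_{n−1}, a permutation disagreeing with the centre in the first positions (V_1 = 1) is less probable than one disagreeing in the last positions (V_{n−1} = 1), exactly the situation motivating the GMM.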
The following lines briefly introduce the particularities of the models under each of the distances considered. We have claimed that the decision on which distance to use depends strongly on the domain. As a general rule, we can state that a MM or GMM based on a particular distance can be used to model data in a domain if that distance is a natural measure of dissimilarity in that domain, proof of which can be found in Caragiannis et al. (2013).
Under the MM, the dispersion parameter θ is a measure of the consensus in the population, i.e., the larger θ, the closer the permutations are. Under the GMM, the dispersion parameters θ j have a different interpretation depending on the distance that is being considered, which will be detailed in each case.
It is worth noticing that the dispersion parameters are not universal measures of spread (Schader 1991). This means that if we consider any of these models under two different distances, the dispersion parameters are not a comparable measure of uniformity since, for different metrics, the dispersion parameters are in different scales.
Kendall's τ . The dispersion parameter θ in the GMM is a (n − 1)-dimensional vector. The distribution and the normalization constant can be factorized. They, as well as the expected value of the Kendall's τ distance and V j under the MM and GMM, can be evaluated efficiently (Fligner and Verducci 1986;Irurozki 2014).
Let σ be a permutation sampled from a GMM under the Kendall's τ distance, with parameters θ and σ_0, where σ_0(j) = i. The dispersion parameter θ_j is related to position j in the sense that the larger θ_j, the larger the probability of σ(j) ≤ i. When permutations are interpreted as rankings, this means that item j is ranked in the first i positions with high probability.
Cayley. The dispersion parameter θ in the GMM is a (n − 1)-dimensional vector. The distribution, the normalization constant, the expectation and the probability of each term X j (σ) are factorizable and computationally cheap. Although the factorization was introduced in Fligner and Verducci (1986), it includes some typos, the interested reader can find the corrected version in Irurozki (2014).
Let σ be a permutation from a GMM under the Cayley distance, with parameters θ and σ_0, where σ_0(j) = i. The larger θ_j, the larger the probability that σ(i) ≤ j. Regarding the position where item j lies in σ, an analogous statement holds; see Irurozki (2014) for details.

Hamming. The dispersion parameter θ in the GMM is an n-dimensional vector. Contrary to the previous cases, the MM and GMM are not factorizable when the Hamming distance is considered. However, its symmetry yields computationally efficient expressions for the normalization constant, the expectation and the probability of each term H_j(σ) (Irurozki 2014).
Let σ be a permutation from a GMM under the Hamming distance, with parameters θ and σ 0 , where σ 0 (j) = i. The larger θ j , the larger the probability that σ(i) = j.
Ulam. The Ulam distance has no natural decomposition of the form required by the GMM and, thus, the GMM cannot be defined under this metric. Moreover, the MM cannot be factorized, as far as the authors know. However, an efficient way of computing ψ(θ) and the expected value of the distance can be found in Irurozki (2014). It relies on the computation of the number of permutations of n items at each possible Ulam distance, S_u(n, d), which means that the complexity of computing ψ(θ) or E[d_u] is equivalent to that of computing S_u(n, d).

Model fitting
In this section we deal with the maximum likelihood estimation (MLE) of the parameters of the distribution given a sample of m i.i.d. permutations {σ 1 , σ 2 , . . . , σ m }, i.e., the problem of learning a model from a data sample. The maximum likelihood estimators are different for MM and GMM. Moreover, their expression differs regarding the distance on the permutations considered. In this way, we will describe the maximum likelihood estimation for each model and distance separately. Finally, we will introduce the algorithms provided in PerMallows.

Mallows model
For the MM, the maximum likelihood estimation of the parameters can be carried out in two independent stages, which are as follows.
1. The first stage deals with the MLE for the central permutation, σ̂_0. Estimating σ̂_0 for a sample under a MM with the Kendall's τ (respectively Cayley, Hamming or Ulam) distance is equivalent to finding the permutation that minimizes the sum of the Kendall's τ (respectively Cayley, Hamming or Ulam) distances to the permutations in the sample. Among the distances considered here, only the Hamming case is known to have polynomial complexity, as far as the authors are aware.
2. The second stage estimates the MLE for the dispersion parameters given σ̂_0. The expression for θ̂ given σ̂_0 has no closed form for any of the distances considered. However, since the first derivative of the log-likelihood is monotonic, it can be solved efficiently with numerical methods such as Newton-Raphson.
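The second stage can be sketched by solving E_θ[d] = d̄ (the mean sample distance) numerically, here with bisection rather than Newton-Raphson and with a brute-force expectation that is only feasible for small n; a hedged Python illustration assuming the central permutation is the identity:

```python
import math
from itertools import permutations

def kendall_to_identity(sigma):
    # number of inversions, i.e., the Kendall distance to the identity
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])

def expected_distance(theta, n):
    # brute-force E_theta[d] under a Kendall MM centred at the identity
    perms = list(permutations(range(1, n + 1)))
    w = [math.exp(-theta * kendall_to_identity(p)) for p in perms]
    z = sum(w)
    return sum(wi * kendall_to_identity(p) for wi, p in zip(w, perms)) / z

def mle_theta(sample, lo=-10.0, hi=10.0, iters=60):
    # E_theta[d] is monotonically decreasing in theta, so bisection converges
    n = len(sample[0])
    dbar = sum(kendall_to_identity(s) for s in sample) / len(sample)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_distance(mid, n) > dbar:
            lo = mid  # expectation still too large: increase theta
        else:
            hi = mid
    return (lo + hi) / 2
```

A concentrated sample (mean distance well below the uniform expectation) yields a clearly positive θ̂, matching the interpretation of θ as a consensus measure.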
The complete statements of the problems can be found in Mandhani and Meila (2009) and Irurozki (2014).

Generalized Mallows model
For the GMM, the exact maximum likelihood estimation of the parameters cannot be carried out in two stages and, thus, all the parameters must be estimated simultaneously. The MLE for the parameters of a GMM under the Kendall's τ, Cayley and Hamming distances can be found in Mandhani and Meila (2009) and Irurozki (2014).

Learning algorithms under each distance
In this section we briefly describe the learning algorithms implemented in PerMallows for learning MM and GMM. Since the exact estimation of the parameters, as described in the previous section, can be computationally intractable under certain conditions, approximate algorithms are also included for most of the cases.
The crucial speed-up in the approximate algorithms for both MM and GMM is the approximation of the central permutation. These algorithms divide the learning process into two stages: first, approximate the central permutation σ̂_0; second, estimate the dispersion parameters given σ̂_0.
Kendall's τ. For the Kendall's τ distance, PerMallows includes the approximate estimation of the MM and GMM. The estimated σ̂_0 is the result of the Borda algorithm (Borda 1781). This well-known algorithm is fast (O(mn)) as well as a good approximation: it is a factor-5 approximation of the optimal ranking (Coppersmith, Fleischer, and Rudra 2010) and also an asymptotically optimal estimator of the real central permutation (Fligner and Verducci 1986).
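A minimal Python sketch of the Borda count (illustration only, not the package's R code): each item is scored by its total position in the sample, and the consensus orders items by increasing average position.

```python
def borda(sample):
    # score each item by the sum of its positions across the sample
    totals = {}
    for s in sample:
        for pos, item in enumerate(s):
            totals[item] = totals.get(item, 0) + pos
    # the consensus ranks items by increasing average position (ties by label)
    return tuple(sorted(totals, key=lambda item: (totals[item], item)))
```

For m permutations of n items this is one pass over the data plus a sort, hence the O(mn) cost quoted above (up to the final O(n log n) sort).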
The interested reader can find in Ali and Meila (2012) the description and performance analysis of several exact and approximate algorithms for the MLE for the consensus permutation under the MM. The referenced paper is an extensive work which describes and compares 104 algorithms and combinations thereof, including Borda and Dwork, Kumar, Naor, and Sivakumar (2001). The problem of the exact MLE for the parameters of a GMM is addressed in Mandhani and Meila (2009).
Cayley. The PerMallows package includes exact and heuristic algorithms for the MLE of the parameters of both MM and GMM. The exact algorithm gets faster as the consensus of the sample increases. The approximate algorithm uses a heuristic step to create an initial solution, which is then iteratively improved with a Variable Neighborhood Search (VNS) (Irurozki 2014).
Hamming. There exists an exact polynomial time algorithm to fit the maximum likelihood parameters of the MM under the Hamming distance which is included in PerMallows, so an approximate algorithm is not needed.
For the GMM under the Hamming distance, PerMallows also includes a polynomial time algorithm. Although not an exact algorithm, it has been shown to be a good approximation in theory (it is an asymptotically unbiased estimator of the real solution) and in practice. An exhaustive review of MM and GMM under the Hamming distance can be found in Irurozki (2014).
Ulam. PerMallows supports the approximate learning of parameters of the MM under the Ulam distance.
The approximate consensus given a collection of permutations is approached as a set-median problem. The set-median permutation is the permutation in the sample that minimizes the sum of the distances to the rest of the permutations in the sample. Note that this problem can be solved in time O(m²n log n) for samples of m permutations.
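The set-median computation is a straightforward argmin over the sample; a Python sketch with a plug-in distance function (Hamming is used here for brevity, whereas the package uses the Ulam distance in this context):

```python
def hamming(s, t):
    # example plug-in distance: number of disagreeing positions
    return sum(1 for a, b in zip(s, t) if a != b)

def set_median(sample, dist=hamming):
    # the sample permutation minimizing the sum of distances to the whole sample
    return min(sample, key=lambda s: sum(dist(s, t) for t in sample))
```

With the Ulam distance (O(n log n) per pair) this evaluates m² pairs, giving the O(m²n log n) cost stated above.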
The expression of the MLE for the dispersion parameter given the consensus permutation and further details are given in Irurozki (2014). The package includes the possibility of precomputing the number of permutations, which can be used to efficiently deal with the Ulam distance for large permutations of more than 80 items, see page 25.

Sampling
PerMallows implements three different algorithms for generating permutations from a given distribution: The Distances and the multistage algorithms, which are exact, and the Gibbs algorithm (a traditional approximate algorithm).

Distances sampling algorithm
The Distances sampling algorithm can generate samples from the MM under the Kendall's τ, Cayley, Hamming and Ulam metrics. It is based on two facts: first, under the MM every permutation at the same distance from the central permutation has the same probability; and second, the number of permutations at each distance, denoted S(n, d), can be computed (see Section 2.1.1). Since the first claim does not hold for the GMM, this sampler can be used only to sample from the MM. The probability of obtaining a permutation at distance d under the MM is as follows:

p(d) = S(n, d) exp(−θ d) / ψ(θ).

The process of simulating from the distribution can be carried out in three stages:

1. Randomly select the distance d at which the permutation will lie, using the equation above.
2. Pick, uniformly at random, a permutation π at distance d from the identity permutation e, i.e., d(π) = d. This step relies on the u.a.r. generation of a permutation at a given distance discussed in Section 2.1.1.
3. In the case that σ_0 = e, then π is the output. Otherwise, the right invariance property lets us obtain a sample of the desired distribution by composition, returning π σ_0.

The complexity of the algorithm depends on the considered distance. See Section 2.1.1 for the complexity of counting and generating permutations.
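For small n, the three stages can be sketched with explicit enumeration standing in for the S(n, d) recursions; a Python illustration under the Kendall's τ distance (not the package's implementation):

```python
import math
import random
from itertools import permutations

def kendall_to_identity(sigma):
    # number of inversions, i.e., the Kendall distance to the identity
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])

def compose(p, q):
    # (p . q)(i) = p(q(i)) for 1-based permutations stored as tuples
    return tuple(p[q[i] - 1] for i in range(len(p)))

def distances_sampler(theta, sigma0, rng=random):
    n = len(sigma0)
    by_d = {}  # group S_n by distance to the identity (stand-in for S(n, d))
    for p in permutations(range(1, n + 1)):
        by_d.setdefault(kendall_to_identity(p), []).append(p)
    dists = sorted(by_d)
    # stage 1: choose d with probability proportional to S(n, d) * exp(-theta * d)
    weights = [len(by_d[d]) * math.exp(-theta * d) for d in dists]
    d = rng.choices(dists, weights=weights)[0]
    # stage 2: uniform permutation at distance d from the identity
    pi = rng.choice(by_d[d])
    # stage 3: move the centre from e to sigma0 by composition (right invariance)
    return compose(pi, sigma0)
```

As θ grows, the selected distance concentrates on 0 and the sampler returns σ_0 almost surely.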
We can conclude that this is a fast as well as precise algorithm for simulating from the MM. However, it has some limitations. First, it does not work with the GMM. Also, it is infeasible to store S(n, d) for values of n > 150 (see Table 1). In these situations we can use the multistage sampling algorithm.

Multistage sampling algorithm
The multistage sampling algorithm generates samples from the MM and GMM under the Kendall's τ, Cayley and Hamming metrics. This algorithm divides the sampling process into three stages, namely:

1. Randomly generate a distance decomposition vector, S(π), for the Kendall's τ, Cayley or Hamming distance. Details can be found in (Irurozki 2014, pages 41, 59 and 97, respectively).
2. Generate a permutation π uniformly at random consistent with the given distance decomposition vector S(π).

3. If the central permutation σ_0 is not the identity, output the composition π σ_0, as in the Distances algorithm.

Gibbs sampling algorithm
The Gibbs sampler is a Markov chain Monte Carlo algorithm that samples a Markov chain whose stationary distribution is the distribution of interest. We have adapted this algorithm to generate approximate samples from both the MM and the GMM.
The Gibbs sampler generates permutations by moving from one permutation to another that is close to it, where the definition of close is related to the distance for permutations considered in the model. The initial samples are discarded (the burn-in period) until the Markov chain approaches its stationary distribution; from then on, samples from the chain are samples from the distribution of interest. The process is repeated until the algorithm generates the required number of permutations.
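A Metropolis-style sketch of this idea for the MM under the Kendall's τ distance (an illustration of the principle, not the package's exact Gibbs implementation): propose an adjacent swap, which changes the distance by ±1, and accept it with probability min(1, exp(−θ Δd)).

```python
import math
import random

def sample_mm_kendall(theta, n, burn_in=2000, rng=random):
    # chain over permutations with stationary distribution prop. to exp(-theta * d),
    # centred at the identity for simplicity
    sigma = list(range(1, n + 1))
    for _ in range(burn_in):
        j = rng.randrange(n - 1)
        # swapping positions j, j+1 changes d_k by +1 (ascending pair) or -1
        delta = 1 if sigma[j] < sigma[j + 1] else -1
        # always accept improving moves; otherwise accept with prob exp(-theta*delta)
        if delta < 0 or rng.random() < math.exp(-theta * delta):
            sigma[j], sigma[j + 1] = sigma[j + 1], sigma[j]
    return tuple(sigma)
```

Each step costs O(1) here thanks to the ±1 distance update, which is the source of the O(n)-per-sample behaviour mentioned below.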
The Gibbs algorithm has complexity O(n) for every distance considered, with the exception of the MM under the Ulam distance, for which the complexity is O(n log n). Therefore, it is in general the fastest algorithm. For those users tempted by this nice theoretical time performance, we should emphasize that this is an approximate sampling algorithm. Further details can be found in Irurozki (2014).

Summary
PerMallows provides a complete toolbox for working with permutations and with the Mallows and generalized Mallows models. These probability models need the definition of a metric for permutations, and PerMallows considers four different distances: Kendall's τ, Cayley, Hamming and Ulam. The package includes three sampling algorithms and two learning algorithms. However, not every algorithm can be applied to the models under every metric. As a summary, Table 1 shows the applicability of the algorithms for each metric as well as the maximum permutation length that each function can handle.

The PerMallows package
In this section we show how to use the PerMallows package, from basic operations such as measuring distances to the more complex tasks of learning and sampling. Section 3.2 illustrates how to deal with the MM and GMM. For the sake of reproducibility, first set the random seed.

Permutations
Generation. The most basic function consists in creating permutations, which are vectors containing the first n natural numbers, where each item appears once and only once. They can be defined by hand as follows:

R> sigma <- c(1, 5, 6, 4, 2, 3)
R> sigma
[1] 1 5 6 4 2 3

The validity of a vector as a permutation can be checked with the function is.permutation.

[1] FALSE
The identity permutation is that which maps every item i to position i. It can be created with the identity.permutation function. The number of items in the permutations, n, is denoted perm.length in PerMallows.
R> runif.permutation(n = 2, perm.length = 6)

The generation of the set of every possible permutation of n items is carried out with the permutations.of function. Recall that the number of permutations of n items increases factorially with n and it is, thus, computationally expensive to generate every permutation for n ≥ 10. By default, the alert argument is set to TRUE. When alert is TRUE and n is greater than 9, an alert message is shown.
R> permutations.of(perm.length = 3, alert = FALSE)

The collection of permutations generated is stored as a 2-dimensional matrix. It is also possible to read such a matrix from disk using the function read.perms, which also checks that every row is a valid permutation.

R> path = system.file("test.txt", package = "PerMallows")
R> sample = read.perms(path)

Together with the PerMallows package, we provide some small datasets that will be used as running examples throughout this reference manual.
Another way of generating permutations is by ranking the ratings of a set of items. Suppose we have the times recorded for five runners in three different races. An example of such a file is given in data.order. After reading the file, we can obtain the ranks of each runner with the function order.ratings.
The composition of two permutations is computed with the compose function.

R> compose(perm1 = sigma, perm2 = pi)
[1] 6 2 1 5 3 4
R> compose(perm1 = pi, perm2 = sigma)
[1] 3 6 4 2 5 1

Our implementation of the composition allows one of the arguments (the first or the second, but not both at the same time) to be a collection of permutations. In this case, every permutation in the collection is composed with the permutation in the other argument, resulting in a new collection of permutations.
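The semantics of composition can be sketched in a few lines of Python. This is an illustrative reimplementation, not the package's code; it assumes that compose(perm1, perm2) maps position i to perm1[perm2[i]] (1-based values), which is consistent with the outputs shown in this section, and the 4-item permutation used in the example is made up.

```python
def compose(perm1, perm2):
    """Composition of permutations: position i of the result holds the
    value of perm1 at the position indicated by perm2 (1-based values)."""
    return [perm1[p - 1] for p in perm2]

# tau = (2 1 3 4) swaps the first two positions of any 4-item permutation.
tau = [2, 1, 3, 4]
print(compose([1, 3, 2, 4], tau))  # -> [3, 1, 2, 4]
```

Composing a whole collection with a single permutation, as compose does when one of its arguments is a matrix, amounts to applying this function row by row.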
R> tau <- c(2, 1, 3, 4)
R> data("perm.sample.small")
R> compose(perm1 = perm.sample.small, perm2 = tau)
     V2 V1 V3 V4
[1,]  3  1  2  4
[2,]  1  2  3  4
[3,]  4  1  3  2
[4,]  2  1  4  3
[5,]  3  2  4  1

A useful summary of a sample is given by the number of permutations in the sample in which item i appears at position j. This is usually referred to as the frequency matrix or first order marginal matrix. The package supports it via the freq.matrix function, which we illustrate with the well-known APA dataset, a version of which is included in this package. This dataset has been widely used in the literature; in particular, Diaconis (1989) presents a spectral analysis that takes the first order marginal of the dataset into account.
The American Psychological Association (APA) dataset includes 15449 ballots of the 1980 election for the president of the association (Diaconis 1989). Each voter ranked at least one of the five candidates. Along with this package we distribute, under the name data.apa, the 5738 ballots that ranked all five candidates. Its marginal matrix is computed as follows:

R> data("data.apa")
R> freq.matrix(perm = data.apa)

Under the ranking interpretation, σ(i) = j denotes that item i is ranked at position j. It follows that the first column gives the proportion of the votes in which each candidate was ranked as the favorite. We can see that the third candidate was chosen as the best alternative by the majority of the voters and thus won the election.
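The first order marginal is easy to compute from scratch; the following Python sketch uses a made-up three-permutation sample rather than data.apa, and freq_matrix is a hypothetical helper name, not the package's implementation.

```python
def freq_matrix(sample):
    """First order marginal: entry (i, j) is the proportion of permutations
    in the sample in which item i is ranked at position j, i.e., sigma(i) = j."""
    n, m = len(sample[0]), len(sample)
    freq = [[0.0] * n for _ in range(n)]
    for perm in sample:
        for item, pos in enumerate(perm):  # item (0-based) is ranked at pos
            freq[item][pos - 1] += 1.0 / m
    return freq

# Toy sample of three rankings of three items.
sample = [[1, 2, 3], [1, 3, 2], [2, 1, 3]]
for row in freq_matrix(sample):
    print(row)
```

Reading the result column-wise, the first column gives the proportion of rankings in which each item was placed first, exactly the quantity used above to identify the winning candidate.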
Another basic operation for permutations is inversion. The inverse of a permutation can be obtained using the function inverse.perm.

R> inverse.perm(perm = sigma)
[1] 1 5 6 4 2 3

The argument perm can be a single permutation or a collection of permutations. It is worth noting that the inverse of the inverse of any permutation is the permutation itself.
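The effect of inverse.perm can be sketched as follows (an illustrative Python reimplementation, not the package's code). Note that the example permutation sigma used throughout this section happens to be its own inverse, which explains the output above.

```python
def inverse(perm):
    """Inverse permutation: if perm maps i to j, the inverse maps j to i."""
    inv = [0] * len(perm)
    for i, j in enumerate(perm, start=1):
        inv[j - 1] = i
    return inv

sigma = [1, 5, 6, 4, 2, 3]
print(inverse(sigma))                    # -> [1, 5, 6, 4, 2, 3], an involution
print(inverse(inverse([3, 1, 2])))       # -> [3, 1, 2], inverting twice is the identity map
```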
An adjacent transposition, or inversion, of the items at positions i and i + 1 is performed with the inversion function.

R> inversion(perm = identity.permutation(6), i = 1)
[1] 2 1 3 4 5 6

Distances

We will now show how to use the functions related to the distances between permutations included in this package. These functions include the argument dist.name, which is one of the following: kendall, cayley, hamming or ulam. They can also be denoted by their initial letter: k, c, h and u.
The distance function computes the distance between perm1 and perm2 for a given metric.
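For reference, the four metrics can be written down in a few lines of Python. These are illustrative definitions (Kendall's τ between two permutations, and the other three as distances to the identity); the package itself relies on efficient compiled code.

```python
from itertools import combinations
import bisect

def kendall(s, t=None):
    """Kendall's tau: number of discordant item pairs (inversions w.r.t. t)."""
    t = t or sorted(s)
    pos = {v: i for i, v in enumerate(t)}
    r = [pos[v] for v in s]
    return sum(1 for i, j in combinations(range(len(r)), 2) if r[i] > r[j])

def cayley(s):
    """Cayley distance to the identity: n minus the number of cycles."""
    n, seen, cycles = len(s), [False] * len(s), 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = s[j] - 1
    return n - cycles

def hamming(s):
    """Hamming distance to the identity: number of non-fixed points."""
    return sum(1 for i, v in enumerate(s, start=1) if v != i)

def ulam(s):
    """Ulam distance to the identity: n minus the longest increasing subsequence."""
    tails = []
    for v in s:  # patience sorting keeps the smallest tail of each LIS length
        k = bisect.bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(s) - len(tails)

sigma = [1, 5, 6, 4, 2, 3]
print(kendall(sigma), cayley(sigma), hamming(sigma), ulam(sigma))  # -> 8 2 4 3
```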
As we have seen, the Kendall's-τ , Cayley and Hamming distances from a permutation to the identity can be decomposed into a vector. This decomposition depends on the metric in question: the decomposition of the Kendall's-τ distance is a vector of n − 1 integer terms, for the Cayley distance it is a binary vector of n − 1 terms, and for the Hamming distance it is a binary vector of n terms. These decompositions are computed by perm2decomp:

R> sigma
[1] 1 5 6 4 2 3
R> v.vector <- perm2decomp(perm = sigma, dist.name = "k")
R> v.vector

Since a permutation can have several longest increasing subsequences, no such decomposition exists for the Ulam distance. The possible values for dist.name are therefore kendall, cayley and hamming, the default being kendall. The PerMallows package can also perform the inverse operation, decomp2perm: given a decomposition vector and a metric, it returns a permutation consistent with the vector.
Recall that for Cayley and Hamming there are possibly many permutations with a particular decomposition. In these situations, the function returns, uniformly at random, one of the permutations consistent with the decomposition vector.
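One common convention for the Kendall's-τ decomposition can be sketched in Python: the j-th term counts the items to the right of position j that are smaller than the item at j. The exact indexing used by perm2decomp is an assumption here; any equivalent convention sums to the same distance.

```python
def kendall_decomp(perm):
    """V_j = number of items to the right of position j that are smaller
    than perm(j), for j = 1, ..., n - 1; the terms sum to the Kendall
    distance to the identity."""
    n = len(perm)
    return [sum(1 for k in range(j + 1, n) if perm[k] < perm[j])
            for j in range(n - 1)]

sigma = [1, 5, 6, 4, 2, 3]
print(kendall_decomp(sigma))       # -> [0, 3, 3, 2, 0]
print(sum(kendall_decomp(sigma)))  # -> 8, the Kendall distance to the identity
```

Because each V_j can be chosen independently, this is also the decomposition that makes sampling and learning under the Kendall's-τ distance tractable.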

A permutation can be factorized into disjoint cycles with the perm2cycles function.

R> cycles <- perm2cycles(perm = sigma)
The cycle2str function displays the cycles in a user-friendly way.

R> cycles2perm(cycles = cycles)
[1] 1 5 6 4 2 3

PerMallows also includes a function to count the number of permutations of perm.length items at distance dist.value under a given metric.

R> count.perms(perm.length = 6, dist.value = 2, dist.name = "ulam")
[1] 181

The Cayley distance d of a permutation of n items is related to its number of cycles c by the relation n = c + d. Therefore, the number of permutations with perm.length items and num.cycles cycles can be obtained as follows.
R> num.cycles <- 1
R> len <- 6
R> count.perms(perm.length = len, dist.value = len - num.cycles,
+    dist.name = "c")
[1] 120

The notions of fixed points and derangements are related to the Hamming distance. A fixed point is a position i such that σ(i) = i. A derangement is a permutation with no fixed points, so its Hamming distance to the identity equals n. In PerMallows, the number of derangements of perm.length items can be obtained as follows.
R> count.perms(perm.length = len, dist.value = len, dist.name = "hamming")
[1] 265

The generation of random permutations at a prescribed distance is supported by the rdist function, which generates n permutations of perm.length items at distance dist.value for a particular metric.
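The three counts shown above (181 permutations at Ulam distance 2, 120 single-cycle permutations and 265 derangements) can be verified by brute force over the 720 permutations of 6 items. This enumeration is purely an illustrative check; count.perms does not work this way.

```python
from itertools import permutations
import bisect

def num_cycles(p):
    """Number of cycles in the permutation p (1-based values)."""
    seen, c = [False] * len(p), 0
    for i in range(len(p)):
        if not seen[i]:
            c += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = p[j] - 1
    return c

def lis_length(p):
    """Length of the longest increasing subsequence (patience sorting)."""
    tails = []
    for v in p:
        k = bisect.bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(tails)

perms = list(permutations(range(1, 7)))
# Single-cycle permutations of 6 items, i.e., Cayley distance 5 to the identity:
print(sum(1 for p in perms if num_cycles(p) == 1))                       # -> 120
# Derangements of 6 items, i.e., Hamming distance 6 (no fixed points):
print(sum(1 for p in perms if all(v != i for i, v in enumerate(p, 1))))  # -> 265
# Permutations of 6 items at Ulam distance 2, i.e., with an LIS of length 4:
print(sum(1 for p in perms if 6 - lis_length(p) == 2))                   # -> 181
```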

Distributions on permutations
In this section we show how to deal with the Mallows and Generalized Mallows models. We show functions for making inference, learning and sampling both models.
Bear in mind that the MM can be used with every metric for permutations considered in this manuscript (Kendall's-τ , Cayley, Hamming and Ulam), while the GMM can only be used with Kendall's-τ , Cayley and Hamming. Remember also that, for distributions on permutations of n items, the dispersion parameter vector θ has n − 1 terms when the distance is either Kendall's-τ or Cayley, and n terms when the distance is Hamming.
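For intuition, the MM density P(σ) ∝ exp(−θ d(σ, σ0)) can be brute-forced in Python for a tiny number of items. The names below are illustrative helpers, the normalization is obtained by exhaustive enumeration (which is only feasible for tiny n; the package does not enumerate), and the distance used is Kendall's-τ.

```python
from itertools import combinations, permutations
from math import exp

def kendall(s, t):
    """Number of discordant item pairs between permutations s and t."""
    return sum(1 for i, j in combinations(range(len(s)), 2)
               if (s[i] < s[j]) != (t[i] < t[j]))

def mm_probability(sigma, sigma0, theta, n):
    """P(sigma) = exp(-theta * d(sigma, sigma0)) / psi, where psi is the
    normalization constant, computed here by summing over all n! permutations."""
    psi = sum(exp(-theta * kendall(p, sigma0))
              for p in permutations(range(1, n + 1)))
    return exp(-theta * kendall(sigma, sigma0)) / psi

sigma0 = (1, 2, 3, 4)
total = sum(mm_probability(p, sigma0, 0.7, 4) for p in permutations(range(1, 5)))
print(round(total, 10))  # -> 1.0, the probabilities sum to one
```

The GMM only differs in that the exponent is a weighted sum of the distance decomposition terms, with one θ_j per term.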
Parameter fitting The estimation of the parameters of the MM and the GMM is carried out with separate functions, lmm and lgmm respectively. The following lines illustrate how to fit the parameters to the APA dataset used in the previous examples. The most natural distance for ranking data is the Kendall's-τ and, consequently, distance-based probability models for voting data are usually based on it. The MM, which considers the Kendall's-τ distance by default, can be used to model the APA dataset as follows:

R> lmm(data.apa)

The mode of the distribution σ0 is the average ranking, i.e., the ranking that minimizes the sum of the distances to the rankings in the sample. The dispersion parameter is a measure of the consensus of the population. Recall that the dispersion parameters are only comparable between two models that consider the same distance, as explained in Section 2.2.
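What lmm estimates for σ0 can be sketched by exhaustive search over all candidate rankings, which is feasible only for very small n; the package's estimation algorithms are far more efficient, and fit_sigma0 is a hypothetical helper name used for illustration with a made-up sample.

```python
from itertools import combinations, permutations

def kendall(s, t):
    """Number of discordant item pairs between rankings s and t."""
    return sum(1 for i, j in combinations(range(len(s)), 2)
               if (s[i] - s[j]) * (t[i] - t[j]) < 0)

def fit_sigma0(sample):
    """Exhaustive search for the ranking minimizing the sum of Kendall
    distances to the sample, i.e., the mode of the fitted MM."""
    n = len(sample[0])
    return min(permutations(range(1, n + 1)),
               key=lambda c: sum(kendall(c, s) for s in sample))

# Two voters agree on (1 2 3); one swaps the first two items.
sample = [[1, 2, 3], [1, 2, 3], [2, 1, 3]]
print(fit_sigma0(sample))  # -> (1, 2, 3)
```

Given σ0, the dispersion parameter θ is then obtained by maximum likelihood, which reduces to a one-dimensional numerical problem.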
The Hamming distance, which is related to fixed points, is the natural metric for measuring matchings. In order to fit the parameters of a GMM under the Hamming distance, the dist.name option has to be used. Note that the central permutation is given by the identity. Taking into consideration the dispersion parameters, which are a measure of spread, we can analyze how strong the consensus is. For example, the reader can note that θ 1 is the largest dispersion parameter. This means that the consensus at the first position is strong in comparison with the rest and, thus, a large number of permutations in the sample will fix position 1. On the contrary, θ 3 is the smallest, which means that the consensus at this position is the weakest. For a more detailed description of the parameter interpretation see Section 2.2.
The estimations performed so far use approximate algorithms. The estimation argument selects the algorithm: approx (the default) for approximate learning and exact for exhaustive estimation.
R> my.mm <- lmm(data = perm.sample.med, dist.name = "cayley",
+    estimation = "exact")
R> my.gmm <- lgmm(data = perm.sample.med, dist.name = "cayley",
+    estimation = "approx")
R> my.mm

In some situations, one can have an intuition about the mode of the distribution. It is possible to start the search from an initial guess with the argument sigma_0_ini, which can speed up the computation.
Regarding the inference operators, PerMallows includes functions to compute the expectation of the distance (resp. of its decomposition vector) under the MM (resp. GMM). The function expectation.mm computes the expectation of the distance under an MM with a given dispersion parameter and number of items, while expectation.gmm computes the expectation of the distance decomposition vector given a multidimensional dispersion parameter.
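The quantity computed by expectation.mm can be brute-forced for a tiny number of items: it is the weighted average of the distance under the model. This Python sketch assumes an MM centered at the identity and enumerates all permutations, which is only viable for very small n; it is not how the package computes expectations.

```python
from itertools import permutations
from math import exp
import bisect

def ulam(p):
    """Ulam distance to the identity: n minus the length of the LIS."""
    tails = []
    for v in p:
        k = bisect.bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(p) - len(tails)

def expectation_mm(theta, n, dist=ulam):
    """E[d(sigma, identity)] under an MM, by exhaustive enumeration."""
    perms = list(permutations(range(1, n + 1)))
    weights = [exp(-theta * dist(p)) for p in perms]
    return sum(w * dist(p) for w, p in zip(weights, perms)) / sum(weights)

print(expectation_mm(theta=0.5, n=5))
print(expectation_mm(theta=3.0, n=5))  # a larger theta concentrates mass near the mode
```

As expected, increasing θ decreases the expected distance to the central permutation.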
R> expectation.mm(theta = 3, perm.length = 9, dist.name = "ulam")

Finally, PerMallows includes the computation of marginal distributions for both the MM and the GMM under the Hamming distance. The set of fixed and unfixed points is represented in the distance decomposition vector, so that H_j(σ) = 0 means that j is a fixed point, H_j(σ) = 1 that j is an unfixed point, and NA that j is unknown. The following command computes the marginal distribution of those permutations having fixed points at positions 2 and 4 and an unfixed point at position 1.
R> marginal(h = c(1, 0, NA, 0, NA), theta = c(1.1, 2, 1, 0.2, 0.4))
[1] 0.0808545

Handling large permutations under the Ulam distance

The most computationally expensive version of any function is usually the one concerning the Ulam distance. PerMallows can handle an MM under the Ulam distance with permutations of up to 80 items on any regular personal computer. For larger permutation sizes, PerMallows offers the possibility of preprocessing a problem, so that operations such as counting and generating permutations, learning and sampling are performed very fast. This preprocess consists of generating files relative to each of the partitions of perm.length.
In the case where one expects to work repeatedly with a model under the Ulam distance for a particular permutation length larger than 80 (independently of the value of θ), it is a good idea to run the preprocess. It is called with the following function:

R> generate.aux.files(perm.length = 6)

[[1]]
[1] 6
In order to use the preprocessed information, it is only necessary to pass the disk = TRUE argument to the functions defined above. Some examples are as follows.

Conclusions
This paper describes PerMallows, an R package for dealing with probability distributions over permutation spaces. The models used are the Mallows model (MM) and the generalized Mallows model (GMM). Both models require a distance for permutations; PerMallows considers the Kendall's τ , Cayley, Hamming and Ulam metrics.
The PerMallows package aims to be a compilation of resources for working with the MM and GMM. It provides functions for the exact and approximate estimation of the parameters from a collection of permutations, for simulating from a given distribution and for calculating the density function.
Efficient algorithms for reasoning on permutation spaces cannot be designed without taking into consideration the particular nature of permutations. Therefore, the core of the PerMallows package consists of several functions and operators for permutations, such as the factorization of permutations into cycles, the random generation of permutations and the counting of permutations at a given distance.
We expect the PerMallows package to be helpful to every kind of user, from novices in the field of permutations and/or probability models for permutation spaces to advanced users.