Directional Statistics and Filtering Using libDirectional

In this paper, we present libDirectional, a MATLAB library for directional statistics and directional estimation. It supports a variety of commonly used distributions on the unit circle, such as the von Mises, wrapped normal, and wrapped Cauchy distributions. Furthermore, various distributions on higher-dimensional manifolds such as the unit hypersphere and the hypertorus are available. Based on these distributions, several recursive filtering algorithms in libDirectional allow estimation on these manifolds. The functionality is implemented in a clear, well-documented, and object-oriented structure that is both easy to use and easy to extend.


Introduction
Directional statistics is a subfield of statistics that deals with quantities defined on manifolds such as the unit circle or the unit hypersphere. Originally mostly developed with geoscientific applications in mind (Mardia 1981;Bingham 1974;Gaile and Burt 1980), directional statistics has gained widespread interest in various areas during the past decades, for example in biology (Batschelet 1981;Mardia et al. 2007), robotics (Glover and Kaelbling 2014;Feiten et al. 2013;Markovic et al. 2014), machine learning (Banerjee et al. 2005;Gopal and Yang 2014;Diethe et al. 2015), aerospace (Horwood and Poore 2014;Darling and DeMars 2015;, and signal processing (Traa and Smaragdis 2013;Azmani et al. 2009;Drude et al. 2014). A good introduction to the topic can, for example, be found in the book by Mardia and Jupp (1999).
There are a number of software packages that implement methods stemming from directional statistics (see Section 2). While some of these packages provide good implementations of certain algorithms, most of them are limited to few or just a single probability distribution. Also, usually only a single type of manifold is considered. Moreover, most libraries do not include the ability to perform recursive filtering. To remedy these deficiencies, we present a new library called libDirectional.
libDirectional is a library written in MATLAB, a very popular programming language in the engineering community. A few functions are written in C or C++ for performance reasons but they can still be conveniently called from MATLAB. The design of the library follows an object-oriented approach, which makes it user-friendly and easily extensible. As the code is thoroughly documented, the library is simple to use, modify, and extend.
We designed libDirectional with several goals in mind. It is intended to allow the user to easily and quickly experiment with different directional probability densities and filters. Thus, we think it is a valuable tool not only for learning but also for teaching directional statistics. Furthermore, libDirectional allows rapid prototyping of directional algorithms for a variety of applications. Finally, one of the important goals of the library is to facilitate an easy comparison of different algorithms, for example the quantitative evaluation of a variety of filters.
While it is almost impossible to ensure that any non-trivial software is completely free of bugs, we put a strong emphasis on correctness in the development of libDirectional. In particular, we implemented a large number of unit tests that can be used to automatically test most of the implemented features and serve as additional usage examples. Aside from the unit tests, we include a lot of assertions (Hoare 2003) in the code to reduce the risk of problems, for example inadvertently calling certain functions with invalid parameters such as a vector of incorrect dimension or a covariance matrix lacking symmetry or positive definiteness. Even though these assertions introduce some overhead, we decided that early detection of errors and ease of use are more important than speed for libDirectional. In case the code is used in a real-time environment where speed is critical, it is still possible to remove certain checks to reduce this overhead.

Related work
Over the course of the past two decades, a number of software packages for directional statistics have been developed and published. In this section, we give an overview of the most significant packages.
A lot of the software developed for directional statistics only considers the circular case. An early example is CIRCSTAT, a collection of Stata programs for circular statistics developed by Cox (1998). Later, the well-known book on circular statistics by Jammalamadaka and Sengupta (2001) was released that includes a floppy disk containing CircStats, a library written in S-PLUS by Ulric Lund. The package CircStats was later ported to R by Agostinelli (2012). An enhanced version of this library was subsequently published under the name circular by Lund and Agostinelli (2013). This package is discussed in more detail in the book by Pewsey et al. (2013). Other R libraries such as isocir (Barragán et al. 2013), a package for isotonic inference for circular data, and NPCirc (Oliveira et al. 2014), a package implementing nonparametric circular regression, were built on top of circular. It should be noted that NPCirc also includes the ability to perform circular-circular regression (on the torus) and circular-linear regression (on the cylinder). There are also some packages for other programming languages. A MATLAB toolbox called CircStat (not to be confused with the aforementioned packages CIRCSTAT and CircStats) was published by Berens (2009). As MATLAB is very popular in the engineering community, we have chosen this language for libDirectional as well. There have also been articles with associated code for circular statistics in C++ (Krogan 2011) and Fortran (Allinger 2013). Furthermore, there is a closedsource commercial software called Oriana for circular statistics by Kovach Computing Services (KCS) (2011).
While there is quite a lot of software available for the circular case, there are only few packages that deal with hyperspherical data. An early example is SPAK (Leong and Carlile 1998), a package written in MATLAB that deals with Kent distributions (Kent 1982) and offers quite limited functionality. A fairly comprehensive library for the Bingham distribution by the name of libBingham was published by Glover (2013). It is written in both MATLAB and C, and can be used from either language. An R package called movMF that deals with mixtures of von Mises-Fisher distributions was later published by Hornik and Grün (2014).
Some software for handling orientations has also been published. The Bingham-based library libBingham mentioned above is capable of handling rotations using a quaternion representation. Moreover, there is an R package called orientlib by Murdoch (2003) and another R package called rotations by Stanfill et al. (2014).
Although the software listed above is very useful for a variety of problems and has successfully been used by many scientists, there are some deficiencies that we seek to address in libDirectional. Most state-of-the-art software packages are limited to just a single manifold (most frequently the circle) and in some cases to just one particular probability distribution. While this may be fine for scientists only interested in a particular manifold or a particular distribution, in many applications, data on multiple different manifolds is to be considered and more than one probability distribution is of interest. Therefore, we implemented a number of common distributions defined on several manifolds in a unified manner in libDirectional. The second issue with most existing software is that only very few packages (e.g., libBingham) contain the functionality necessary for recursive estimation. As there is a significant demand for recursive filtering in many applications, e.g., in robotics, autonomous vehicles, aeronautics, etc., we provide several recursive filtering algorithms in libDirectional.

Probability distributions
In this section, we introduce the probability distributions implemented in libDirectional. These distributions can be classified according to the manifold on which they are defined. First, we consider distributions on the unit circle, then probability distributions on the torus, the unit hypersphere, and circular-linear spaces.

Circle
In the following, we give an overview of the distributions defined on the unit circle that are implemented in libDirectional. In general, we parameterize the unit circle as the half-open interval [0, 2π) while keeping the topology of the unit circle in mind.
There are several common techniques for deriving circular probability distributions. A widely used method is called wrapping. We start with a real random variable x ∼ f (·) distributed according to some probability distribution f (·) on R. Now, we consider x mod 2π, which has the wrapped density This concept has been applied to a number of common distributions, resulting in the wrapped normal (WN), wrapped Cauchy (WC), wrapped exponential (WE), and wrapped Laplace (WL) distributions (Jammalamadaka and Kozubowski 2004;Mardia and Jupp 1999, Sec. 3.5.7). In some cases, the infinite sum in (1) can be simplified to a closed-form expression (e.g., for the wrapped Cauchy distribution). If a simplification is not possible, it is usually sufficient to consider a small finite number of terms of the series, see Kurz et al. (2014c).
Another common concept consists in restricting a linear distribution on R 2 to the unit circle. For example, restricting a two-dimensional Gaussian distribution N (x; µ, C) with µ = 1 and C = κ · 1 0 0 1 to the unit circle, i.e., x = 1, yields an (unnormalized) von Mises (VM) distribution (von Mises 1918). The von Mises distribution has been further generalized by Gatto and Jammalamadaka (2007).
Of course, it is also possible to define distributions directly on the unit circle. For example, the circular uniform distribution and distributions based on Fourier series (Willsky 1974) belong to this category. Since nontrivial conditions have to be ensured for a Fourier series to be nonnegative (Fernández-Durán 2007), we also implemented the option to approximate the square root of the probability density function (pdf) as a Fourier series. By approximating the square root, the pdf values obtained by squaring are always nonnegative and therefore valid according to Pfaff et al. (2015Pfaff et al. ( , 2016b. The square root form is used by default but its complexity is hidden from the user. The class FourierDistribution only requires one additional parameter, the number of coefficients, and can be used like any other distribution in libDirectional. Furthermore, we provide a class named CircularMixture, which allows considering mixtures of arbitrary circular distributions, for example von Mises mixtures such as used by Markovic and Petrovic (2012). To represent distributions with a non-standard density (e.g., marginal or conditional densities of certain higher-dimensional distributions), we also offer the class CustomCircularDistribution.
Aside from the continuous circular distributions, we also consider a discrete circular distribution, which can be thought of as a set of weighted samples. In line with the concept of a Dirac mixture on R n (Hanebeck et al. 2009), we refer to this distribution as a wrapped Dirac mixture (WD) distribution. For n samples β 1 , . . . , β n ∈ [0, 2π) with weights γ 1 , . . . , γ n > 0, where n j=1 γ j = 1, we write the wrapped Dirac mixture as where x ∈ [0, 2π) and δ(·) is the Dirac delta function. Note that this distribution does not have a well-defined probability density function. It is also worth mentioning that this function does not include any wrapping terms because every Dirac component only has "probability mass" at a single point (Kurz 2015, Section 2.2.3 D).

Class name Comment
CircularMixture mixture of arbitrary circular distributions CircularUniformDistribution see (  All circular distributions implemented in libDirectional (see Table 1) are derived from an abstract base class called AbstractCircularDistribution. This base class includes a number of functions that are applicable to all circular distributions and that are independent of the details of the particular distribution. This makes it easy to add implementations of new circular distributions, as a lot of functionality is automatically available once the pdf is defined.
In particular, we offer multiple plotting functions to generate different types of visualizations.
Example 1 (Plotting probability density functions) For example, we can generate a two-dimensional plot of the pdf of a wrapped normal distribution with parameters µ = 2 and σ = 1.3 simply by typing the following two commands.  If we also set the labels and axis using the following code, we obtain the plot depicted in Figure. 1a.
Similarly, we can create plots of other distributions. A three-dimensional plot of the pdf of a von Mises distribution with parameters µ = 6 and κ = 0.5 can be generated using the following code.
Also, the abstract base class contains a number of numerical methods to calculate the entropy, trigonometric moments, integrals of the pdf, etc. For most numerical methods (designated by the suffix Numerical), there are counterparts without this suffix that can be overridden by child classes to provide an analytical implementation. If the child class does not provide an analytical version of the algorithm, the numerical method is used as a fallback.
Example 2 (Numerical and analytical calculation) Let us consider the wrapped normal distribution defined in Example 1 again. Suppose we want to calculate the first trigonometric moment of this distribution, i.e., E(exp(ix)), where E(·) is the expected value. For this purpose, we simply call the corresponding function: In the case of the wrapped normal distribution, trigonometricMoment is a function inside the class WNDistribution that implements an analytic calculation of the trigonometric moment. If no analytic solution was implemented, the function trigonometricMoment in the base class AbstractCircularDistribution would have automatically fallen back to an algorithm based on numerical integration. Even though an analytical solution is available for the wrapped normal distribution, we can still call the numerical algorithm as follows.
> wn.trigonometricMomentNumerical(1) This can, for example, be used to compare the numerical and analytical results in order to validate the correctness of the analytical implementation. In this case, both results match up to the displayed number of digits, but in certain cases, analytical and numerical solutions may differ more significantly. Also, the numerical computation is typically slower, in some cases by several orders of magnitude.
A variety of methods are implemented for some or all circular distributions, for example the probability density function, the cumulative distribution function, the circular mean and variance, trigonometric moments, entropy, stochastic sampling, parameter estimation, conversions using trigonometric moment-matching, etc. These methods are thoroughly documented within the code, so we do not go into detail about them here.
Furthermore, we offer convolution and multiplication operations of circular probability densities for some distributions. These operations are required for the circular filtering algorithms discussed in Section 4. For example, we implement the approximations for the von Mises distribution discussed in Azmani et al. (2009), and the approximations for the WN distribution discussed in Kurz et al. (2016a) and Traa and Smaragdis (2013).
One feature, however, deserves a more thorough discussion as many readers may not be familiar with it. In libDirectional, we have implemented several algorithms for deterministic sampling, a concept where samples are drawn from a distribution deterministically rather than stochastically. These samples are then represented using a (wrapped) Dirac mixture. The advantage of deterministic approaches is that the samples can be placed at representative positions to achieve a good representation of the true density with very few samples. This can, for example, be achieved by performing moment matching. Algorithms of this type have previously been used in Gaussian filters such as the unscented Kalman filter (UKF) by Julier and Uhlmann (2004) or the smart sampling Kalman filter (S 2 KF) by Steinbring et al. (2016). We have proposed deterministic sampling schemes based on approximation of the first trigonometric moment and the first two trigonometric moments in Kurz et al. (2014b). Furthermore, we have proposed approximations using quantization ), superposition of moment-based samples, and a binary tree approximation (Kurz et al. 2016c). These methods are implemented in libDirectional and their use is demonstrated in the following example.
Example 3 (Deterministic sampling) Once again, we consider the wrapped normal distribution defined in Example 1. Suppose we want to approximate this distribution using a wrapped Dirac mixture with three components, i.e., with three samples on the unit circle. According to Kurz et al. (2014b), the approximation can preserve the first trigonometric moment. We can easily achieve this using the following command. The row vectors d and w correspond to the positions and weights of the Dirac components, respectively. It can be seen that the resulting wrapped Dirac mixture is evenly weighted. We can verify that the first moment is indeed preserved, similar to Example 2.
An approximation with five components based on the first two trigonometric moments is also possible.
> wd5 = wn.toDirac5() In this case, the mixture components are not evenly weighted. We can plot the resulting approximations using the following statements.

Torus and hypertorus
Another interesting manifold is the torus as it can be used to represent two (possibly correlated) angular quantities. We parameterize the torus as [0, 2π) 2 , i.e., the Cartesian product of two circles. On the torus, we consider several different distributions, the bivariate wrapped normal distribution, two versions of the bivariate von Mises distribution, the bivariate wrapped Dirac mixture, and a distribution based on a two-dimensional Fourier series (see Table 2). As their names suggest, these distributions constitute bivariate generalizations of the wrapped normal distribution, the von Mises distribution, the wrapped Dirac mixture, and the Fourier distribution, respectively. In the case of the wrapped normal distribution, the generalization to a higher number of dimensions is straightforward (Johnson and Wehrly 1977, Example 7.3). A bivariate wrapped normal distribution arises when a random variable distributed according to a bivariate normal distribution is wrapped in both dimensions. The bivariate wrapped Dirac distribution is obtained analogously. For the bivariate Fourier distribution, a two-dimensional Fourier series is used to approximate the density, or the square root thereof. The bivariate von Mises distribution is more tricky as there are several different, non-equivalent definitions, some of which are discussed in Mardia et al. (2007). In libDirectional, we chose to implement the sine version as well as the matrix version of the bivariate von Mises. The sine version has the advantage that it has been more thoroughly investigated than its alternatives and a number of its properties are known, e.g., a series representation for its normalization constant (Singh et al. 2002). On the other hand, the matrix version has the significant advantage that it is closed under multiplication .
The overall design of toroidal distributions in libDirectional is similar to the circular distributions discussed above. Once again, there is an abstract base class from which all toroidal distributions inherit. Its name is AbstractToroidalDistribution and it implements a number of methods that are independent of the particular toroidal distribution. These methods can be overridden by the child classes if analytical solutions are available.
One of the key problems when dealing with toroidal distributions is the question of how to quantify correlation. Over the past decades, a number of different correlation coefficients have been proposed, and we have decided to implement several of them in libDirectional, namely the correlation coefficients by Johnson and Wehrly (1977), Jupp and Mardia (1980), and

Class name Comment
which we have implemented under the name mean4D and covariance4D, respectively. These values can, for example, be used to determine the circular mean in each dimension as well as certain circular-circular correlation coefficients.
Example 4 (Bivariate wrapped normal) First, we instantiate a bivariate wrapped normal distribution using the following statement. We can visualize the density of this distribution as a function [0, 2π) 2 → R + using the plot method.
It can be shown that for a bivariate wrapped normal distribution, the marginals are wrapped normal. They can be obtained using the following function calls.
> wn1 = twn.marginalizeTo1D(1), wn2 = twn.marginalizeTo1D(2) wn1 = WNDistribution with properties: mu: 1 sigma: 1 dim: 1 wn2 = WNDistribution with properties: mu: 3 sigma: 0.9487 dim: 1 As can be seen, the marginals are returned as objects of the class WNDistribution, i.e., a circular distribution that can be used as discussed in Section 3.1. Thus, this example illustrates one of the benefits of having implemented distributions on multiple different manifolds within a single library.
Beyond toroidal distributions, we also offer some hypertoroidal distributions, i.e., distributions on the n-torus [0, 2π) n . An overview of the supported distributions is given in Table 3, all of which are generalizations of the corresponding toroidal distributions.

Class name Comment
CustomHypertoroidalDistribution allows any user-specified pdf  All hypertoroidal distributions use AbstractHypertoroidalDistribution as their base class. Because the circle and the torus are special cases of the hypertorus for n = 1 and n = 2, respectively, circular and toroidal distributions also inherit (indirectly) from this class. In this way, a lot of code can be shared among many distributions even though they are defined on different manifolds.

Real hypersphere
In this section, we consider probability distributions defined on the unit hypersphere S n−1 = {x ∈ R n : x = 1}, i.e., we parameterize the unit hypersphere as a set of unit vectors in R n for some n ∈ N. Note that this also encompasses the unit circle if n = 2, but uses a different parameterization (the set of two-dimensional unit vectors rather than a one-dimensional interval of length 2π) compared with the section above.
On the hypersphere, we consider several different distributions as well. The first distribution is the von Mises-Fisher distribution proposed by Fisher (1953), a generalization of the circular von Mises distribution to the unit hypersphere. It is parameterized by a unit vector µ defining the mode of the distribution as well as a concentration parameter κ influencing its dispersion. The von Mises-Fisher distribution is unimodal and radially symmetric around µ. The Watson distribution (Watson 1965) is closely related and has the same set of parameters, but it is

Class name Comment
BinghamDistribution see Bingham (1964), Bingham (1974) HypersphericalDiracDistribution discrete distribution on the real hypersphere HypersphericalUniform uniform distribution on the real hypersphere VMFDistribution von Mises-Fisher distribution (Fisher 1953) WatsonDistribution see Watson (1965) BayesianComplexWatsonMixtureModel complex Watson mixture with prior ComplexAngularCentralGaussian see Kent (1997)   antipodally symmetric (i.e., f (x) = f (−x)), and thus is bimodal with modes at ±µ. However, it is still radially symmetric around the axis of µ. In order to represent anisotropic noise, the Watson distribution can be generalized to obtain the Bingham distribution as defined by Bingham (1974). The Bingham distribution is usually parameterized by an orthogonal matrix M that defines the location of the mean and the orientation of the principal axes of the uncertainty, as well as a diagonal matrix Z responsible for representing the uncertainties along the different axes. Furthermore, we can enforce that the diagonal entries of Z are sorted in ascending order and that the last diagonal entry is zero (Kurz et al. 2014f) without changing the expressiveness of the distribution. Thus, we use this parameterization within libDirectional 1 .
The architecture for hyperspherical probability distributions is similar to that of the previously discussed manifolds. There is a base class called AbstractHypersphericalDistribution, which provides generic features independent of the particular distribution such as plotting and numerical integration. The individual distributions inherit from this class and can provide methods for the pdf, the normalization constant, stochastic sampling, parameter estimation, etc. As the normalization constant for the Bingham distribution is given by a hypergeometric function of matrix argument, it is quite expensive to evaluate. We have implemented several possible methods including the saddlepoint approximation by Kume and Wood (2005). Further discussion about the different methods for computing the normalization constant can be found in Gilitschenski et al. (2014b). Similar to the deterministic sampling schemes on the circle (see Example 3), we also provide a deterministic sampling scheme for the Bingham distribution, which is presented in Gilitschenski et al. (2016b), and a deterministic sampling scheme for the von Mises-Fisher distribution proposed in Kurz et al. (2016b).

Complex hypersphere
This section introduces three distributions and related statistical models defined on the complex hypersphere CS n−1 = {z ∈ C n : z = 1}. The complex Bingham distribution is defined by its probability density function with Hermitian transpose z H :=z , where z is conditioned on z H z = 1, c B (B) is an appropriate normalization term and B is the complex positive semi-definite parameter matrix. The complex Bingham distribution has complex symmetry, namely, it is invariant under scalar rotation, i.e., p(z) = p(z exp(iϕ)). Similar to the real Bingham distribution, its deviation around the mean is governed by the difference between the eigenvalues of B. Again, the parameter matrices B and B + kI define the same distribution (Kent 1994). A maximum likelihood fit according to Kent (1994) is implemented in ComplexBinghamDistribution.fit().
In contrast to the real case, the complex Bingham normalization constant c B (B) can be written in terms of elementary functions where λ k are the eigenvalues of the parameter matrix B. A symbolic implementation is used to generate code for the normalization constant ComplexBinghamDistribution.logNorm() and other moments of the complex Bingham distribution.
Counterintuitively, the complex Bingham distribution is a special case of the real Bingham distribution of higher dimension. The corresponding real Bingham parameter matrix can be calculated by replacing each entry B kl = α kl exp(iϕ kl ) with blocksB kl B kl = α kl cos(ϕ kl ) − sin(ϕ kl ) sin(ϕ kl ) cos(ϕ kl ) .
This relationship is useful to test implementations of the complex distributions against their real counterparts and is implemented in ComplexBinghamDistribution.toReal(). Nevertheless, the algorithms for the complex case can be implemented more efficiently without relying on the real counterpart.
The complex Watson distribution is a special case of the complex Bingham distribution. It is defined by its pdf (Mardia and Dryden 1999) where w ∈ CS n−1 is a complex vector with unit norm and κ governs the concentration around the mean direction. The complex Watson normalization constant c W (κ) can be written in terms of elementary functions (Mardia and Dryden 1999). An implementation of the normalization constant is provided in ComplexWatsonDistribution.logNorm() and derivatives thereof are used for maximum likelihood estimates in ComplexWatsonDistribution.fit().
Different sampling algorithms for the complex Bingham distribution are known (Kent et al. 2004). Here, a sampling process based on sampling from truncated exponential distributions has been implemented in ComplexBinghamDistribution.sample(). The sampling process can be used to create samples for a complex Watson distribution and for complex Watson mixture models.
Early applications of the complex Watson and complex Bingham distributions are statistical modeling of two-dimensional landmarks (Kendall 1984). A fairly recent application is to describe phase and level differences in multi-channel recordings (Vu and Haeb-Umbach 2010;Drude et al. 2014). Different speaker positions cause different phase and level differences such that an EM algorithm for clustering can be employed. An EM algorithm for a complex Watson mixture model is implemented in ComplexWatsonMixtureModel.fit() (Vu and Haeb-Umbach 2010). An extension to the EM algorithm incorporates prior knowledge about the mode direction and the mixture weights of each mixture component. The corresponding variational EM algorithm is given by BayesianComplexWatsonMixtureModel.fit() (Drude et al. 2014). Figure 4 shows some samples of a complex Watson mixture model in the complex shape domain (Kent 1994). Although pdfs on complex hyperspheres cannot be visualized easily, plots in the shape domain allow visually inspecting similarities between shapes. The unnormalized mode vectors of the underlying complex Watson distributions are

respectively.
Kent suggested the complex angular central Gaussian model as an alternative to the complex Bingham distribution. Its probability density function is given by where Γ(·) refers to the gamma function (Kent 1997). The library provides the probability density function as well as a sampling algorithm. Additionally, parameter estimation is provided in ComplexAngularCentralGaussian.fit(). Natural extensions are a Complex Angular Central Gaussian Mixture Model (Ito et al. 2016a) and a Complex Bingham Mixture Model (Ito et al. 2016b), both of which found great applications in speech enhancement. We plan to add the corresponding code to libDirectional as future work.

SE(2)
Finally, we consider the manifold of rigid body motions in two dimensions called SE(2). A rigid body motion can be seen as a rotation together with a translation. The rotation is an element of the group of two-dimensional rotations SO(2)-which can be parameterized as the unit circle-and the translation is a vector in R 2 . In libDirectional, we consider two different continuous distributions on SE(2). As they use different parameterizations of SE(2), the distributions do not share a common base class.

Partially wrapped normal distribution on SE(2)
The first distribution is called the partially wrapped normal distribution (PWN) presented in Kurz et al. (2014e) that was further discussed in Kurz (2015, Section 2.3.3) 2 . This distribution is defined on [0, 2π) × R 2 and is obtained from a normal distribution on R 3 where the first component is wrapped. Then, the first component can be used to represent the angle of the rotation, whereas the second and third components are used to represent the translation. In analogy to the WD distribution on the circle and the bivariate WD distribution on the torus, we also define a partially wrapped Dirac mixture (PWD) distribution. Similar to the toroidal expectation values given in (2), it is of interest to consider the moments where [x 1 , x 2 , x 3 ] is a partially wrapped random variable. Note that although both consider a four-dimensional augmented random vector, (2) and (4) differ by their treatment of the last two dimensions. The values defined in (4) are available using the methods mean4D and covariance4D. Moreover, we provide the ability to obtain the marginals of a SE2PWNDistribution as a WNDistribution and as a GaussianDistribution, respectively.

Modified Bingham distribution
The second distribution on SE(2) is related to the Bingham distribution in the sense that it also arises by restricting a Gaussian random vector ). This is motivated by the fact that a multiplicative subgroup of dual quaternions can be used for representing elements of SE(2), which is reminiscent of the approach by Matsuda et al. (2014). Similar to the Bingham case, our distribution needs to be antipodally symmetric in order to account for the fact that unit dual quaternions are a double cover of SE(2). The distribution is characterized by its pdf where N (C) denotes the normalization constant. This corresponds to the density of a fourdimensional Gaussian where the first two entries of x are interpreted as one vector that is restricted to unit length. This probability distribution is implemented within the class SE2BinghamDistribution.
Not every choice of C ∈ R 4×4 is admissible for this distribution. In order to improve our understanding of the structure of the underlying distribution, we rewrite C as and we can then rewrite the pdf as . From this representation, it follows that C 1 needs to be symmetric (but not necessarily positive or negative definite), C 2 may be arbitrary, and C 3 has to be symmetric negative definite.
An SE2BinghamDistribution object can be constructed using either of the two constructors SE2BinghamDistribution(C) or SE2BinghamDistribution(C1,C2,C3). Besides that, we have implemented a method for obtaining the covariance matrix using Monte Carlo integration (computeCovarianceMCMC), a computation of the normalization constant (computeNC), the mode of the density (mode), a parameter estimation procedure (fit), and deterministic as well as random sampling procedures (sampleDeterministic and sample).

Filters
Based on the probability distributions introduced in the previous section, it is possible to derive recursive filtering algorithms to perform recursive Bayesian estimation. In the following, we introduce a number of filters that are implemented in libDirectional.

Circle
Several filters for the unit circle are available in libDirectional. These include both filters based on circular statistics and traditional filters originally intended for linear domains that have been modified for use on the unit circle and that can be employed for comparison. In the following, we distinguish the circular filters based on the type of density they use.

WN-assumed filter
First of all, we have implemented several WN-assumed filtering algorithms in the class WNFilter, i.e., filters approximating the true distribution with a WN distribution after each prediction and filtering step. This class allows predicting with a noisy identity system model, with a nonlinear measurement model in conjunction with additive noise (Kurz et al. 2013), and with a nonlinear measurement model in conjunction with non-additive noise. As far as the measurement model is concerned, the class can handle noisy identity measurements and nonlinear measurements given by a likelihood (Kurz et al. 2014d) with several methods. A more thorough discussion of these scenarios can be found in Kurz et al. (2016a).

VM-assumed filter
Analogously, we can assume a von Mises distribution instead of a wrapped normal distribution. This alternative is implemented in VMFilter. Similar to before, we also distinguish between different types of system and measurement models as discussed in Kurz et al. (2016a). It should be noted that for identity system and identity measurement models, the filter proposed by Azmani et al. (2009) arises as a special case. Finally, we also implement a measurement update based on nonlinear measurement functions (rather than measurement likelihoods) proposed in Gilitschenski et al. (2015b).

Fourier filters
The Fourier identity filter and the Fourier square root filter, as explained in Pfaff et al. (2015Pfaff et al. ( , 2016a, use a truncated Fourier series to approximate the density or the square root of the density. While this filter is very universal and only makes few assumptions about the noise distributions, nonlinear measurement models require knowledge of the likelihood and for nonlinear system models, the transition density is required. Unless an identity system model with additive noise is used for the prediction step, the transition density needs to be given as one of the toroidal distributions (Pfaff et al. 2016b) depending on the current state and the state at the next time step. The Fourier filters are particularly powerful as we implemented the ability to approximate other distributions using the Fourier series representation described in Section 3.1. Thus, prediction and filter steps can easily be performed in an approximate fashion for arbitrary densities. For this filter, the user only needs to specify at least one additional parameter at the time of initialization. The integer n (also called noOfCoefficients) must be given to determine the number of Fourier coefficients. The optional string input argument transformation can be provided to specify if the Fourier identity filter or the Fourier square root filter should be used. If no second argument is given, the square root filter is used by default. Higher values of n result in better approximations for distributions that are more peaked but also yield a moderately higher run time as the filter has an asymptotic complexity of O(n log n) for the update step and for the prediction step with an identity model with additive noise. If a prediction step using the transition density is required, the run time complexity increases to O(n 2 log n).

Gaussian-assumed filters
In order to assess the performance of filters based on directional statistics in comparison with filters making a Gaussian assumption, we included modified versions of the unscented Kalman filter (UKF) (Julier and Uhlmann 2004) based on the implementation of Steinbring (2015). For this purpose, we consider two different possibilities how the filter can be modified as discussed in Kurz et al. (2016a) and (Kurz 2015, Section 3.1). First, we can define the filter on a (local) chart of the manifold, e.g., the open interval (0, 2π) and try to detect and fix issues when the boundary of the chart is reached by reparameterizing if necessary. In this case, the resulting filter has a scalar state. This type of filter is implemented in the class CircularUKF. Second, it is also possible to consider a filter on the space in which the manifold is embedded and to introduce a constraint enforcing that the state always resides on the manifold. In this case, the resulting filter has a two-dimensional state vector, which is constrained to be of unit length. This type of filter is implemented in the class ConstrainedUKF.

Particle filter
A commonly used filter that avoids the Gaussian assumption is the particle filter (Arulampalam et al. 2002). As this filter does not really depend on the underlying manifold as long as the system function and the measurement likelihood properly consider the periodicity, it is easy to adapt the particle filter to the circle. Hence, we have implemented a particle filter with sequential importance resampling (SIR) in CircularParticleFilter. The particle filter is a Markov Chain Monte Carlo method, and thus, relies on random sampling, so it constitutes a nondeterministic method.

Discrete filter
We also consider a Dirac-based discrete filter based on an evenly spaced grid on the circle (Kurz et al. 2016d), which was previously used by Pfaff et al. (2015). Similar approaches have been applied to practical problems, e.g., localization of a robot (Burgard et al. 1996). The discrete filter closely resembles the particle filter, but it uses equally spaced particles with fixed positions. Consequently, prediction and measurement update only affect the weights of the particles but not their location. The discrete filter is deterministic and can closely approximate the exact Bayesian filter provided a sufficient number of grid points is used.

Piecewise Constant Filter
In addition to the Dirac-based discrete filter discussed above, we also offer a filter based on piecewise constant distributions (Kurz et al. 2016d) that is similar to a Wonham filter (Wonham 1964). This filter subdivides the interval [0, 2π) into a predefined number of smaller intervals of equal size and assumes a uniform distribution within each interval. The filter requires a system matrix that contains the transitions probabilities from each interval into any other interval, which can be precomputed using numerical methods. For the measurement update, it is possible to use a likelihood function or to discretize the measurement space and use a precomputed measurement matrix containing the conditional probabilities of obtaining each discrete measurement given the state is in a certain interval.
Example 5 (Nonlinear circular filtering) Let us consider a system with a circular state x k ∈ [0, 2π) and system dynamics where w k ∼ WN (x; 0, 0.4) is WN-distributed additive noise. If we assume that the current state is distributed according to x k ∼ WN (x; 2, 0.5), we can perform the prediction step with the WN-assumed filter using the following commands.
> filter = WNFilter(); > filter.setState(WNDistribution(2, 0.5)); > a = @(x) mod(x + 0.5 * cos(x)^2, 2*pi); > filter.predictNonlinear(a, WNDistribution(0, 0.4)); > filter.getEstimate()  It can be seen that the predicted density is returned as a wrapped normal distribution (see Figure 5a). Now we consider the measurement model where v k ∼ N (x; 0, 0.7) is normally distributed additive noise. In this case, we have a circular state, but a real-valued measurement. However, a circular measurement (or a measurement on a completely different manifold) would be possible as well. If we obtain a measurement, sayẑ = 0.3, we can perform the measurement update as follows.

Torus and hypertorus
Circular filtering can be generalized to hypertoroidal filtering, just as circular distributions can be generalized to hypertoroidal distributions. As toroidal and hypertoroidal filtering is not a well-researched field, only a few filters are available. In libDirectional, we provide a filter called ToroidalWNFilter, which is based on the toroidal wrapped normal distribution proposed in Kurz et al. (2014a) along with extensions for nonlinear system and measurement equations given in . Furthermore, there is a modified version of the unscented Kalman filter (Julier and Uhlmann 2004) called ToroidalUKF, which can be seen as a two-dimensional generalization of the CircularUKF discussed above. We also provide toroidal and hypertoroidal generalizations of the particle filter in ToroidalParticleFilter and HypertoroidalParticleFilter, respectively. The class HypertoroidalFourierFilter offers a generalization of the Fourier filters used in the circular case. The implementation of additional hypertoroidal filters is planned as future work.

Hypersphere
In libDirectional, we have also included several filters for estimation on the unit hypersphere. A nonlinear filter based on the von Mises-Fisher distribution ) is available in the class VMFFilter, which inherits from the base class for hyperspherical filters called AbstractHypersphericalFilter. The class VMFFilter includes the filters by Chiuso and Picci (1998); Markovic et al. (2014) as a special case. A hyperspherical particle filter and a hyperspherical version of the UKF are also available.
We have also implemented filters assuming antipodal symmetry, i.e., we assume that the points −x and x have the same probability density. This type of filter can be used for axial estimation problems, e.g., when the axis of rotation of an object is to be estimated and the direction of the axis is irrelevant. Furthermore, antipodal symmetry appears in unit quaternions, which can be used for estimation on SO(3).
All hyperspherical filters with antipodal symmetry are derived from the abstract base class AbstractAxialFilter. The name refers to the estimation of an axis, e.g., a rotation axis. We provide the Bingham filter described in Kurz et al. (2014f), a special case of which was also considered by Glover and Kaelbling (2013). Furthermore, the Bingham filter contains the unscented extension proposed in Gilitschenski et al. (2016b). For comparison, libDirectional also includes an axial version of the Kalman filter (Kalman 1960) in the class AxialKalmanFilter. This Kalman filter uses a Gaussian distribution to approximate one of the two modes of the bimodal Bingham distribution (on one of the hemispheres).

SE(2)
We also implemented two different filters for estimation of planar rigid-body motions based on a subgroup of the unit dual quaternions that can be represented as four-dimensional vectors where the first two entries are restricted to unit length (when jointly considered as a twodimensional vector) as proposed by Gilitschenski et al. (2014a). One of the filters is based on the modified Bingham distribution (in SE2BinghamFilter) and is similar to the structure of the Bingham filter (Gilitschenski et al. 2015a). The other filter is a UKF that is implemented (in SE2UKF) in the same way as the hyperspherical UKF. The only difference is that SE2UKF ensures the resulting estimate to be a unit dual quaternion. Glover (2013) BSD (3-Clause) mhg Koev and Edelman (2006) GPL v2 or later Nonlinear Filtering Toolbox Steinbring (2015) GPL v3

Installation and dependencies
In this section, we provide a brief explanation of the installation procedure as well as the external libraries libDirectional depends on.

Installation
The most recent version of libDirectional can always be obtained from https://github.com/ libDirectional. In order to install the library, the entire lib-folder including all subdirectories has to be added to MATLAB's search path. This can be achieved using the startup.m script. We officially support MATLAB 2014a and later, but most functionality should be available in older versions as well. libDirectional is platform-independent and runs on the Windows, Linux, and Mac-Versions of MATLAB.
A number of functions of the library have been implemented in C++ for performance reasons. These functions can be directly called from MATLAB using the mex-file mechanism. In order to compile the corresponding source code, we provide the compileAll.m script, which compiles all files that are necessary for libDirectional, including the external dependencies (see Section 5.2). The compilation procedure requires a current compiler supported by MATLAB such as Microsoft Visual C++ 2013 or later 3 , gcc 4.7, or Xcode.

Externals
To avoid reinventing the wheel, libDirectional relies on a few libraries written by other authors (see Table 5). We include all of these dependencies in the folder externals to make it easy for the user to run our library. Inclusion of the externals is permitted by their respective licenses and our library libDirectional itself is licensed under the GPL v3 license.
Eigen (Guennebaud et al. (2010)) is a C++ library for efficient matrix algorithms, which we use for some of our C++ implementations. In order to obtain high efficiency, we also use fmath (Shigeo 2009) to calculate exponential functions and logarithms using vector instructions from the SSE/AVX instruction set.
For the calculation of the normalization constant of the Bingham distribution, we offer several algorithms, one of which is implemented in the mhg library provided by Koev and Edelman (2006). Furthermore, we use the complex error function required for the multiplication of wrapped normal densities , which is implemented in the Faddeeva package by Johnson (2012). We also use a modified version of the script circVMcdf by Shai Revzen, which provides an implementation of the cumulative distribution function of a von Mises distribution as described in Hill (1977). Moreover, we include some code from libBingham by Glover (2013), but this library is not in the "externals"-folder as only small parts are used and these are directly integrated into the code of libDirectional.
Finally, we rely on some functions of the Nonlinear Filtering Toolbox for MATLAB by Steinbring (2015). In particular, we use the UKF implementation as well as the deterministic sampling feature for Gaussians from this library. We provide a subset of the Nonlinear Filtering Toolbox within libDirectional that contains all required functions.

Conclusion
In this paper, we have presented libDirectional, a MATLAB library for directional statistics and directional estimation. As we have shown, this library implements a variety of directional distributions on a number of different manifolds such as the circle, the hypertorus, and the hypersphere. Most distributions offer not only the probability density function but also algorithms for common associated problems such as visualization, parameter estimation, entropy calculation, stochastic sampling, etc. All of these methods are implemented in a clean, objectoriented design that allows the use of analytical solutions whenever possible and provides a transparent fallback to numerical solutions if analytical solutions are unavailable.
Based on these distributions, a number of different recursive filters are implemented in libDirectional that can be used for estimation of random variables located on the aforementioned manifolds. These filters not only include many methods based on directional statistics, but also some standard approaches that were modified for the directional setting and that can be used for comparison in order to evaluate the benefits and drawbacks of directional approaches.
We hope that the publication of libDirectional will make directional statistics and estimation algorithms based thereon available to a wider audience. As the library was designed to be quick to learn and easy to understand, more researches will be able to experiment with these types of methods, to apply them to various problems, and to improve upon them.