EMMIXcskew: an R Package for the Fitting of a Mixture of Canonical Fundamental Skew t-Distributions

This paper presents an R package EMMIXcskew for the fitting of the canonical fundamental skew t-distribution (CFUST) and finite mixtures of this distribution (FM-CFUST) via maximum likelihood (ML). The CFUST distribution provides a flexible family of models to handle non-normal data, with parameters for capturing skewness and heavy-tails in the data. It formally encompasses the normal, t, and skew-normal distributions as special and/or limiting cases. A few other versions of the skew t-distributions are also nested within the CFUST distribution. In this paper, an Expectation-Maximization (EM) algorithm is described for computing the ML estimates of the parameters of the FM-CFUST model, and different strategies for initializing the algorithm are discussed and illustrated. The methodology is implemented in the EMMIXcskew package, and examples are presented using two real datasets. The EMMIXcskew package contains functions to fit the FM-CFUST model, including procedures for generating different initial values. Additional features include random sample generation and contour visualization in 2D and 3D.


Introduction
Finite mixture models, in particular normal mixture models, have been widely used in statistics and a diverse range of applied fields such as bioinformatics, biomedicine, economics, finance, genetics, image analysis, psychometrics, and social science.They provide a powerful and flexible tool for the probabilistic modelling of data, with applications ranging from density estimation to clustering, classification, and discriminant analysis.For a survey on mixture models and their applications, see Everitt and Hand (1981), Titterington et al. (1985), McLachlan and Basford (1988), Lindsay (1995), Böhning (2000), McLachlan and Peel (2000), and Frühwirth-Schnatter (2006), the edited volume of Mengersen et al. (2011), and also the papers by Banfield and Raftery (1993) and Fraley and Raftery (1998).

arXiv:1509.02069v2 [stat.CO] 9 Feb 2017
In recent years, mixture models with skew component distributions have received increasing attention.These models adopt densities that can take more flexible distributional shapes than the traditional normal and t-distributions as component distributions, rendering them suitable for a wider range of applications.Of these, the skew t-distribution is gaining popularity due to its ability to handle both the asymmetry and heavy-tailedness in the data.In particular, a number of different formulations of skew t-distribution have been used extensively in the model-based clustering literature (see, for example, Lee andMcLachlan (2014a, 2016a), Azzalini et al. (2016), McLachlan and Lee (2016) and the references therein).They have also found many applications in a range of fields, including astrophysics (Riggi and Ingrassia 2013), financial risk analysis and modelling (Soltyk and Gupta 2011, Bernardi 2013, Lee and McLachlan 2013b, Abanto-Valle et al. 2015), fisheries science (Contreras-Reyes and Arellano-Valle 2013), flow cytometry (Pyne et al. 2009, Frühwirth-Schnatter and Pyne 2010, Rossin et al. 2011, Ho et al. 2012, Hu et al. 2013, Pyne et al. 2014, Lee et al. 2014, Lin et al. 2016, 2015, Lee et al. 2016c, Pyne et al. 2015), image segmentation (Lee and McLachlan 2013a), pharmaceutical science (Schaarschmidt et al. 2015), and the social sciences (Muthén andAsparouhov 2014, Asparouhov andMuthén 2016).For a comprehensive survey of skew distributions, see, for example, the articles by Azzalini (2005), Arellano-Valle and Azzalini (2006), Arellano-Valle et al. (2006), the book edited by Genton (2004), and the recent monograph by Azzalini and Capitanio (2014).
Recently, Lee and McLachlan (2016a) introduced a finite mixture of canonical fundamental skew t (FM-CFUST) distribution.This formulation of the skew t-distribution has a general p × q matrix of skewness of parameters (Arellano-Valle and Genton 2005).It thus provides a more general characterisation than the restricted and unrestricted skew t-distributions (adopting the terminology of Lee and McLachlan (2013c)).This paper describes an R package EMMIXcskew for the fitting of the FM-CFUST model.It implements the EM algorithm described in Lee and McLachlan (2016a) and provides other functionalities such as random sample generation, density evaluation, and the plotting of contours in 2D and 3D.
The remainder of this paper is organized as follows.Section 2 provides a brief description of the CFUST distribution and its nested models.Section 3 outlines an EM algorithm for fitting finite mixtures of CFUST distributions and examines different approaches for generating starting values for this EM algorithm.In the next two sections, the usage of the EMMIXcskew package is illustrated using real and simulated examples.Finally, we conclude with some brief remarks in Section 6.

The CFUST and related distributions
To establish notation, let Y be p-dimensional random vector that follows a multivariate CFUST distribution, denoted by Y ∼ CF U ST p,q (µ, Σ, ∆, ν).Then the density of Y is given by f (y; µ, Σ, ∆, ν) = 2 q t p (y; µ, Ω, ν) T q c(y) ν + p ν + d(y) ; 0, Λ, ν + p , where It can observed from (1) above that the CFUST distribution is described by the parameters (µ, Σ, ∆, ν), where µ is a p-dimensional vector of location parameters, Σ is a positive definite scale matrix, ∆ is a p × q matrix of skewness parameters, and ν is a scalar degrees of freedom that regulate the tails of the distribution.In the above, we let t p (y; µ, Ω, ν) denote the pdimensional t-distribution with location parameter µ, scale matrix Ω, and degrees of freedom ν, and T q (.) is the q-dimensional (cumulative) t-distribution function.
As mentioned previously, the CFUST distribution includes some commonly used distributions as special and/or limiting cases.Taking ∆ = 0 reduces (1) to the symmetric multivariate t-density t p (µ, Ω, ν), and further letting ν → ∞ and ν = 1 leads to the multivariate normal N p (µ, Ω) and Cauchy C p (µ, Ω) distributions, respectively.If ∆ is constrained to be a diagonal matrix, we obtain the skew t-distribution of Sahu et al. (2003) which is referred to as the unrestricted skew t-distribution using the terminology in Lee andMcLachlan (2014a, 2013c).
To obtain the classical skew t-distribution by Azzalini and Capitanio (2003) from (1), one can set q = 1 or take ∆ to be a matrix of zeros except for one column (Lee and McLachlan 2016a).This formulation of the skew t-distribution, referred to as the restricted skew t-distribution, is equivalent to that given by Branco and Dey (2001), Gupta (2003), Lachos et al. (2010) and Pyne et al. (2009); see Lee and McLachlan (2013c).Analogously, the restricted and unrestricted skew normal distributions can be obtained by placing appropriate constraints on ∆ and letting ν → ∞.Some properties of the CFUST distribution are described in Arellano-Valle and Genton (2005).It is of interest to note that this distribution suffers an identifiability issue as discussed in Lee and McLachlan (2016).In brief, this means that the CFUST density is invariant under permutations of the columns of the skewness matrix ∆, but this does not affect parameter estimation.Hence, in practice, the user only needs to be aware that changing the order of the columns of ∆ does not affect the density of the CFUST model.
There are several R packages on CRAN that deal with (multivariate) mixture models with skewed component densities.In particular, the restricted and unrestricted versions of skew t-mixture models are implemented in EMMIXskew (Wang et al. 2009) and EMMIXuskew (Lee and McLachlan 2013d), respectively.The package mixsmsn (Prates et al. 2013) implements the family of finite mixtures of scale-mixture of skew-normal distributions, which includes a skew normal distribution and a skew t-distribution that is equivalent to the restricted skew normal distribution and restricted skew t-distribution, respectively.However, the estimation procedure used in mixsmsn imposes the condition that all components of the skew t-mixture model share a common value for the degrees of freedom.A recently developed package MixGHD (Tortora et al. 2015) provides functions to fit finite mixtures of generalized hyperbolic distributions.For the classical multivariate skew-normal and skew t-distributions, the sn package (Azzalini 2014) can be used.For traditional normal mixture model and related tools, a number of other packages are available on CRAN, such as bgmm (Biecek et al. 2012), flexmix (Leisch 2004, Grün andLeisch 2008), mclust (Fraley andRaftery 2007, Scrucca et al. 2016), and mixtools (Benaglia et al. 2009).

Fitting mixtures of CFUST distributions via the EM algorithm
The density of a finite mixture model is given by a convex linear combination of component densities.More formally, adopting the CFUST distribution as component densities, we obtain a finite mixture of CFUST (FM-CFUST) distribution with density given by where π h (h = 1, . . ., g) are the mixing proportions and f (.) denotes the CFUST density given by (1).The mixing proportions satisfy π h ≥ 0 and g h=1 π h = 1.The vector Ψ = (π 1 , . . ., π g−1 , θ T 1 , . . ., θ T g ) contains all the unknown parameters of the model, with θ h containing the elements of µ h and δ h , the distinct elements of Σ h , and ν h .
For the fitting of the FM-CFUST model, we employ the EM algorithm (Dempster et al. 1977) to compute the maximum likelihood (ML) estimate of the parameters of the model.The EM algorithm proceeds by alternating repeatedly between the E-and M-steps until the changes in the log likelihood values are less than some specified small value in the case of convergence.
To facilitate parameter estimation via the EM algorithm, a set of latent variables are introduced, namely the component labels Z j (corresponding to the j = 1, . . ., n independent observations on Y ), alongside two random variables U j and W j that follow a half-normal distribution and a gamma distribution, respectively.Thus, the complete-data for the FM-CFUST model consist of these missing variables and the observations y j .This leads to a four-level hierarchical characterization of the FM-CFUST model, given by where Z j is a g-dimensional vector of binary component labels such that Z hj = (Z j ) h = 1 if the jth observation belongs to the hth component and zero otherwise.Here, HN q (0, Σ) denotes the q-dimensional half-normal distribution with scale matrix Σ, gamma(α, β) denotes the gamma distribution with mean α β , and Mult g (1, π) denotes the multinomial distribution with one draw and g categories with probabilities specified by π.We now outline the E-and M-steps of the EM algorithm for fitting the FM-CFUST model.

The E-step
The E-step of the EM algorithm requires the calculation of the so-called Q-function, Q(Ψ; Ψ (k) ), which is the conditional expectation of the complete-data log likelihood given the observed data y, using the current estimate of Ψ, which is denoted by θ (k) after the kth iteration.It follows that on the (k + 1)th iteration, the E-step requires the following five conditional expectations to be calculated, The expressions for (4) to (8) are given in Lee and McLachlan (2016a) and therefore not repeated here.However, it should be noted that e 1hj can be evaluated using different approaches, two of which are described in the above reference.For simplicity, the EMMIXcskew package implements the one-step-late (OSL) approach for this conditional expectation.It should be noted that the use of the approximate OSL approach to calculate e (k) 1hj can result in the incomplete-data likelihood not increasing monotonically.This conditional expectation can be calculated more accurately by a power series derived in Lee andMcLachlan (2014b, 2016a) for which monotonicity of the likelihood is preserved.An implementation of this option will be provided in a future update of this package.

The M-step
On the (k + 1)th iteration of the the M-step, the current estimate of Ψ, Ψ (k) , is updated to Ψ (k+1) , which is chosen to globally maximize Q(Ψ; Ψ (k) ) over Ψ.For the FM-CFUST model, the M-step leads to the following updates: hj , hj w An update of the degrees of freedom ν h is obtained by solving the following equation for , where ψ(•) denotes the digamma function.

Generating initial values for parameters
As the log likelihood function may exhibit a complicated profile with many local maxima and the EM algorithm is sensitive to its initial values, it is important to choose good starting values.In this section, we consider three strategies for generating valid initial values for the EM algorithm for the FM-CFUST model.For the remainder of this section, we suppress the subscript h (denoting the index of a component in a mixture model) for notational convenience.

Nested approach
An intuitive approach is to start the EM algorithm with the solution given by one of the nested models of a CFUST distribution, for example, the results from fitting a normal or t-mixture model.This option is available in EMMIXcskew with the fmcfust.initfunction (see Section 5.2), which accepts the outputs from the packages EMMIXskew and EMMIXuskew.The former package provide routines to fit mixtures of (multivariate) normal, t-, (restricted) skew normal, and skew t-distributions, whereas the latter package fits a mixture of unrestricted skew t-distributions.

Method of moments-based approach
Another approach is based on the moments of an unrestricted multivariate skew normal (uMSN) distribution.As noted earlier, the uMSN distribution is a nested case of the CFUST distribution.It can be characterized as the convolution of a truncated normal variable and a multivariate normal variable as follows, where µ ∈ R, ∆ = diag(δ) is a diagonal matrix of skewness parameters with diagonal elements given by δ, U 0j ∼ N p (0, I p ) and U 1j ∼ N p (0, Σ).The uMSN distribution (11) has mean and variance given by respectively.On rearranging the above expressions, we obtain an expression for µ and Σ in terms of δ (recall that by definition ∆ = diag(δ) for the uMSN distribution).To obtain an initial value for δ (0) , one can consider reducing the values of the diagonal elements of Σ (0) by an arbitrary proportion (1 − a) where a ∈ [0, 1] (Lin 2010).This leads a set of expressions given by where the sign of each element of δ (0) is given by the sign of the third-order sample moment of the corresponding variable about its sample mean.In (13) above, s * is a p-dimensional vector containing the diagonal elements of the sample covariance matrix S, and ȳ denotes the sample mean.Concerning the degrees of freedom, it can be set (initially) to a large number to reflect a uMSN distribution.

Transformation approach
A third approach is based on a transformation of Y j in an attempt to better handle the correlation of the variables in Y j .We consider the transformation vector X j = CY j , where C is an orthogonal matrix such that the covariance matrix of X j , cov(X j ), is diagonal.
In practice, we work with the sample covariance matrix of Y j .Then we can fit a uMST distribution to the transformed vector X j , where each X j can be characterized as where U 0j and U 1j follow a central multivariate t-distribution with ν degrees of freedom and scale matrix given by I q and Σ, respectively.Note that in ( 14) above, we have used a stochastic representation of the CFUST distribution that is analogous to (11) for a uMSN distribution.On pre-multiplying X j by C in ( 14) to obtain Y j , we have This suggests that an initial value for µ and for ∆ in a CFUST distribution can be given by C μ and C ∆, respectively, where μ and ∆ are the estimates of µ and ∆ obtained by fitting the uMST distribution to the X j .However, it should be noted that if the true distribution of Y j were a CFUSN distribution, then the transformed data X j may not necessarily have a uMSN distribution even though cov(X j ) is diagonal.This happens when the off-diagonal elements of Σ cancel out the off-diagonal elements of ∆∆ T .But it might be expected that in most situations where the sample covariance matrix of the X j is approximately diagonal, the matrix Σ and the skewness matrix ∆ are both diagonal or close to it.
In the case where a mixture of CFUST distributions is to be fitted rather than a single component distribution, we would need to first cluster the Y j into g clusters and proceed separately within each cluster as described above.
The above three methods are implemented in the function fmcfust.init of the EMMIXcskew package.By default, it adopts the moments method.An example on a real dataset demonstrating the use of these approaches is given in Section 5.2.

Stopping criterion
We adopt the Aitken acceleration-based stopping criterion as the default strategy to assess the convergence of the EM algorithm for the FM-CFUST model.The EMMIXcskew package also provides a few other criteria as an alternative, including those based on the relative change in the log likelihood value and estimates of the parameters of the model.

Aiken acceleration-based approach
In brief, when using the default strategy, our algorithm terminates when the absolute difference between a log likelihood value and its asymptotic estimate is smaller than a specified tolerance, ; that is, when where (k+1) is the log likelihood value at the (k + 1)th iteration and is its asymptotic estimate, given by In the above, 1) denotes the Aitken's acceleration at the kth iteration (Böhning et al. 1994, McLachlan andKrishnan 1997).The default tolerance of = 10 −6 is applied to the examples in the following sections, but the user can specify a different value.

Relative likelihood-based approach
Another commonly used stopping criterion is to monitor the relative changes in the log likelihood values at the end of each iteration and to stop the algorithm when the (relative) difference between two successive log likelihood values is less than a specified threshold.More formally, our algorithm terminates when where the threshold is set to 10 −6 by default.Again, the user can specify a different threshold.

Relative parameters-based approach
Apart from tracking the changes in the log likelihood value, one can also monitor the changes in the parameter estimates.In this case, the algorithm is considered to have converged when the relative change in all the parameter estimates is less than a specified threshold .Note that this criterion implies that the relative changes of all the free parameters needs to be smaller than , that is, all elements of Ψ including the mixing proportions.Thus, the EM algorithm terminates when Again, the default tolerance is = 10 −6 .

Notes on the EM implementation
In the EM algorithm described in this paper, the number of components g and the dimension q of the skewing vector must be specified.In practice, these are typically unknown and model selection criteria are employed to aid in choosing an appropriate value of g and q.Some of the more commonly used information criterion include the Bayesian information criterion (BIC; Schwarz (1978)), given by and the Akaike information criterion (AIC; Akaike (1974)), given by where m is the number of free parameters, n is the number of observations, and is the value of the log likelihood function at the fitted parameter vector.As is typical in fitting a mixture of factor analyzers (MFA) model, one may fit the FM-CFUST model for a range of values of p and q and choose the combination of p and q corresponding to the lowest AIC or BIC.An alternative strategy for automatically selecting an appropriate g was considered in Lee and McLachlan (2016c) which is based on the minimum message length (MML) criteria.However, concerning the value of q, it was observed in Lee and McLachlan (2016a) that when fitting the CFUST model with q = p it was able to model data generated from the rMST (q = 1) and uMST (q = p and ∆ is a diagonal matrix) distributions quite well.In particular, the estimated ∆ matches the true ∆ reasonably well.For example, in the case of data generated from a rMST distribution, all but one of the columns of the estimated ∆ has elements being relatively small, thus resembling the q = 1 case.Hence, in the implementation of the EM algorithm in the EMMIXcskew package, the default value of q is set to p.But the user is encouraged to experiment with different values of q when fitting the FM-CFUST model.

Using the EMMIXcskew package
The EMMIXcskew package implements the EM algorithm described in Section 3 and provides additional functions such as random sample generation, density evaluation, and graphics outputs.The software is primarily written in R. EMMIXcskew will be made available on the Comprehensive R Archive Network (CRAN).
In this sequel, we now demonstrate the basic usage of the EMMIXcskew package using simple examples.In particular, the following main routines are discussed: • dfmcfust: evaluation of density values; • rfmcfust: generation of a random sample from a FM-CFUST distribution; • fmcfust: fitting a FM-CFUST model; • init.cfust: generation of initial values for use in fmcfust; • fmcfust.contour.2d:plotting a 2D graph of the contours of a FM-CFUST model; • fmcfust.contour.3d:plotting 3D surface contours of a FM-CFUST model.

The density function of FM-CFUST distribution
The density of a FM-CFUST distribution is calculated by the dcfust function.The inputs to be passed into this function are similar to that in dfmmst from the EMMIXuskew package, and must be structured as described in Table 1.Briefly, the parameters µ, Σ, and δ are each implemented as a list of g matrices, where g is the number of components in the fitted model.Each element of the list objects mu, sigma, and delta (h = 1, . . ., g) must be specified as a p × 1, p × p, and p × q matrix, respectively.The parameters dof and pro are g by 1 arrays, representing the degrees of freedom and the mixing proportions for each component, respectively.Finally, the input data are specified by dat, an n × p matrix where each row represents an individual observation.
Typically, one may be interested in calculating density values for a fitted FM-CFUST model (obtained from the fmcfust function).In this case, the output object of the fitted model can be directly passed in to dfmcfust as a single argument known.Note that if both known and all the model parameters were provided by he user, only the values specified by known would be used.Issuing the following command will return a vector of n × 1 density values.
dfmcfust(dat, mu, sigma, delta, dof, pro, known=NULL) For a single CFUST distribution (that is, g=1), the dcfust function can be used.Here, the arguments mu, sigma, and delta need not be a list object, but mu can be a numeric array, and sigma and delta are matrices.Similar to the above function call, the dcfust can be called at the R command prompt as follows, which will return a numeric vector of density values.

Fitting a single CFUST distribution
To fit the FM-CFUST model with a single component (g = 1), the main routine fmcfust can be used.This implements the EM algorithm described in Section 3, with the default strategy for initial values given in (13).By default, q is assumed to be the same as p.A typical call of fmcfust is: fmcfust(g, dat, q=p, initial=NULL, known=NULL, itmax=100, eps=1e-6, nkmeans=20, verbose=TRUE) As a simple example, we consider the iris dataset, available directly in R. For illustration purposes, we look at the Versicolor species and focus on the two variables Sepal.Widthh and Pedal.Length.We first create a new data object iris.versicolor with the required data, then execute the fmcfust function with g = 1.This is the minimum information that must be supplied to fmcfust.
To view these parameters, summary can be called: The other arguments of fmcfust are similar to that used in fmmst from the EMMIXuskew package.Briefly, known is a list of model parameters that are known a priori and, if supplied, will not be updated in the iterations of EM algorithm.The arguments itmax and eps determine when the EM algorithm is terminated.If either the maximum number of iterations as specified by itmax is reached or the tolerance as specified by eps is obtained, the EM loop will terminate.User specified initial values can be supplied using initial.Note that this must be a list object structured as in Table 1.The argument nkmeans specifies the number of k-means trials to be preformed when using the default starting strategy.With verbose=TRUE, fmcfust prints the log likelihood value at each iteration and displays a summary of the estimated parameters of the model.
Note that in the above example we have used the default starting strategy to generate initial values, and assume q = p.As pointed out in Section 3, this may not always give the optimal fit.It is highly recommended that the EM algorithm is run from a range of different starting values.Some alternative methods for generating different starting values are discussed in Section 3.3.These are implemented in init.cfust(see Section 5.2 for further discussions).In addition, the user can also experiment with different values of q.However, it is interesting to note that for the simulated dataset of Section 4.1 in Lee and McLachlan (2016a), it was observed that the FM-CFUST model is able to (roughly) recover the structure of ∆ without prior knowledge of any constraints on the matrix of skewness parameters.
To assist in choosing a suitable model for the data from a range of different fitted results (for example, using different starting values), log likelihood values and information measures such as AIC and BIC can be compared.These values are available as part of the output of fmcfust and can be accessed using Fit.versicolor$loglik,Fit.versicolor$aic, and Fit.versicolor$bic.They can also be viewed using the print command as shown below for the example above.The contours of the fitted model can be visualized using fmcfust.contour.2d(see Figure 1).Further details and examples of contour plots will be given in Section 5.5.

Fitting a FM-CFUST distribution
Consider now the fitting of a three-component FM-CFUST model to the entire iris dataset.It consists of four geometric measurements on 150 observations of Iris, with 50 observations from each of three species of Iris (Setosa, Virginica, and Versicolor).The following code fits a FM-CFUST model using the results of a FM-uMST model as starting values.

Nested special cases of the FM-CFUST distribution
In this section, we focus on the restricted and unrestricted versions of MST mixture models.For the normal and t-mixture models, routines for fitting them are implemented in EM-MIXskew.As noted earlier, the rMST distribution corresponds to a CFUST distribution with q = 1.Thus setting q=1 in fmcfust will fit a FM-rMST model.However, as EM-MIXskew uses a specialized implementation of the EM algorithm for this model, the user is encouraged to use this package when fitting a FM-rMST model.Similarly, the EMMIXuskew package can be used for the fitting of a FM-uMST model.To fit the FM-rMST model to the same dataset as in the example above using the EMMIXcskew package, the following code can be used.

> fit.restricted <-fmcfust(3, iris[,-5],q=1)
The above model can also be fitted using the EMMIXskew package with the command fit.restricted <-EmSkew(iris[,-5],3,"mst",debug=FALSE).We can compare the predicted cluster labels of these models against the true class labels.For all three models, the predicted cluster labels are stored as clust in the output object.A cross-tabulation of these labels suggests that the fitted FM-CFUST model performs well with only one misclassified observation, whereas the FM-rMST and FM-uMST models have three and six misclassified observations, respectively.The misclassification rate (MCR) against the true labels can be calculated using error.ratefrom the EMMIXskew package.In this example, the FM-CFUST model obtained the lowest MCR compared to the FM-rMST and FM-uMST models (see Table 2 and the code below).

Random sample from the FM-CFUST distribution
The CFUST admits a convolution-type stochastic representation that facilitates random sample generation.More specifically, let U 0 and U 1 be independent random variables following multivariate normal distributions given by N p (0, Σ) and N q (0, I q ), respectively.Let also w be a scalar random variable with the gamma( ν 2 , ν 2 ) distribution.Then has a CFUST distribution with density given by ( 1).The rcfust function adopts ( 22) to generate a random sample of CFUST observations.Its mixture version is implemented as rfmcfust in EMMIXcskew.These two functions are given, respectively, by rcfust(n, mu, sigma, delta, dof, known=NULL, ...) rfmcfust(g, n, mu, sigma, delta, dof, pro, known=NULL, ...) Input arguments for the above functions follow the same structure as described in Section 4.1, permitting the parameters of the CFUST (or FM-CFUST) model to be specified either individually using mu, sigma, delta, and dof (and also pro for a FM-CFUST model) or within a list object using known.The argument n specifies the number of random observations to be generated.In the case of a FM-CFUST model, n is either a single integer (which represents the total number of observations to be generated) or a vector of g integers representing the number of observations to be generated from each of the g component.Note that if n is a single value, rfmcfust will determine the sample size for each component using the mixing proportion specified by pro.
As an example, suppose one would like to generate 10 random observations from a CFUST , and ν = 4, the following command can be issued at the R command prompt.A 10 × 2 matrix will be returned.

Starting values for fitting FM-CFUST distributions
Three different strategies for generating starting values for the FM-CFUST model were described in Section 3.3.These are implemented in the EMMIXcskew package.Apart from the default starting strategy (13) which makes use of moment-based estimates of a uMSN distribution, the init.cfustfunction implements the transformation approach (15) as one of its options, and accepts starting values based on the results of its nested models as another option.The arguments of the function are the following: init.cfust(g,dat,q=p,initial=NULL,known=NULL,clust=NULL,nkmeans=20, method=c("moments","transformation","EMMIXskew","EMMIXuskew")) To use a fitted model given by the packages EMMIXskew and EMMIXuskew, set method to "EMMIXskew" and "EMMIXuskew", respectively, and the output of the functions EmSkew and fmmst can be directly passed into init.fustusing the argument initial.If an initial value of the parameter vector is not supplied (that is, initial=NULL), then the default option is to provide an initial value for each component-parameter vector obtained by applying the method of moments (that is, method="moments") to the clusters corresponding to the components.These clusters are obtained by using the k-means procedure, but the user can specify an initial partition obtained using some other method of clustering, for example, a model-based approach using a mixture of t-distributions.The user-specified initial partition is passed in using clust.If the transformation approach is preferred, set method="transformation". Again, in this case, if an initial partition is not supplied, the partition given by k-means clustering is used.
An example session is shown below demonstrating how to use the above function to generate different starting values.We use the Geyser dataset (Azzalini and Bowman 1990), which contain measurements on 299 successive eruptions of the Old Faithful Geyser during 1 August to 15 August 1985.The two variables recorded were the waiting time between eruptions and the duration of each eruption, both measured in minutes.This dataset is available from the MASS package.

R> library(MASS) R> data(geyser) R> plot(geyser, pch=20)
An initial inspection of the dataset (Figure 3(a)) suggests three clusters.Hence, we set g to 3. In the example below, initial.defaultand initial.transformationrefers to default (moment-based) approach and the transformation approach, respectively.For the nested approach, we have demonstrated in Section 4.3 how to use the results of a fitted FM-uMST model as initial values.In that example, the model was fitted using fmmst() in our package, which is a replica of the same function in the EMMIXuskew package, and hence the option method="EMMIXuskew" was used.This option can be used in the same way to supply initial values from the EMMIXuskew package.In addition, the EMMIXcskew package also accepts outputs from the EMMIXskew package which provide routines to fit finite mixtures of normal, t, (restricted) skew-normal, and (restricted) skew t-distributions.In this case, the option method="EMMIXskew" is used to pass the results to fmcfust().
R> initial.default<-init.cfust(3,geyser) R> initial.transformation<-init.cfust(3,geyser, method="transformation") R> fit.geyser.restricted<-EmSkew(geyser, 3, "mst", debug=FALSE) R> initial.restricted<-init.cfust(3, geyser, initial=fit.gesyserIn this case, the starting value that corresponds to the fitted model of FM-rMST gave the highest log likelihood value.We now proceed to fit a FM-CFUST model using the default starting strategy, the transformation approach, and the fitted model of FM-rMST. R> fit.geyser1 <-fmcfust(3, geyser, initial=initial.default)R> fit.geyser2 <-fmcfust(3, geyser, initial=initial.transformation)R> fit.geyser3 <-fmcfust(3, geyser, initial=initial.restricted)R> fit.geyser1$loglik [1] -1415.442R> fit.geyser2$loglik [1] -1344.724R> fit.geyser3$loglik [1] -1333.533According to the use of the final log likelihood value, the results of fit.geyser3 are preferred.This corresponds to the result using the initial value with the highest log likelihood value identified above.Note that this may not always be the case; that is, using the initial value with the highest log likelihood value may not always give the optimal results compared to those with smaller initial log likelihood values.It is advisable to run the EM algorithm using a range of different starting values.To visualize the clustering results of the above three models, we can plot the data with colours according to the predicted cluster labels given by these models, as shown below.Figures 3(a), 3(b), and 3(c) show the results using the default strategy, the transformation approach, and the fitted FM-rMST model, respectively.It can be observed that fit.geyser2 perhaps gave a more natural partition of the data, although its log likelihood value is lower than that given by fit.geyser3.

Stopping Criteria
The stopping criteria described in Section 3.4 are available through the convergence option in fmcfust().The default is using Aitken's acceleration-based approach (convergence="Aitken").The other two options are convergence="likelihood" and convergence="parameters", referring to relative likelihood based and relative parameters-based approach respectively.For illustration, using the Geyser dataset and intiial.restrictedas initial values as an example, we can run the EM algorithm with the relative likelihood-based and relative parameterbased convergence criteria using the following commands.------------------------------------------------- Briefly, dat is a dataset that is either a matrix or data.frame.Note that if dat is not specified, then the limits of the axes of the plot must be specified (using the standard xlim, ylim, and zlim arguments).The parameters of the FM-CFUST model are specified using model.Typically, this is an output from fmcfust.The argument grid determines the grid size of the plots.Thus the higher the number in grid, the smoother the contours (at the cost of longer computation time).The data points (if provided) are plotted by default.By setting drawpoints=FALSE, only the contours will be plotted.In the case where g > 1, a user-specified partition of the data can be provided using clust.This is used when drawpoints=TRUE and the data points in the plot will be colour-coded using the labels in clust.The arguments nlevels and levels control how many contours are displayed and at which percentiles are they computed, respectively.Finally, component specifies which component is included in the plot.This option allows the components to be plotted individually without taking into account the mixing proportions.In contrast, the default is to return the contours of a mixture density.
To illustrate the use of fmcfust.contour.2d,we reconsider the iris.versicolorexample in Section 4.2. Figure 1 can be generated by the following command in R.
R> fmcfust.contour.2d(iris.versicolor,fit.Versicolor, drawpoints=FALSE, + main="versicolor") We now turn to an example of generating a 3D contour plot of a FM-CFUST model.Suppose we would like to draw the contours of a two-component FM-CFUST distribution with parameters µ 1 = (0, 0, 0) , µ 2 = (5,5,5) , ν 1 = ν 2 = 3, π = (0.2, 0.8), We first create an object obj with these parameters.By default, a mixture density is produced on running the fmcfust.contour.3d(see Figure 4(a)).A first remark on this figure is that the two components seem to be 'joined' together.To gain a better view of the shapes of these two components, we may set components=1:2 so that their mixing proportions are ignored.
In addition, we generate 500 random observations from the specified FM-CFUST model and include them in the plot.Observe now in Figure 4(b) that the two components are plotted as two separately objects and their colours are matched with the simulated data.

Concluding remarks
This paper has presented the EMMIXcskew package for the fitting of a CFUST distribution and finite mixtures of CFUST distributions to heterogeneous data that exhibit non-normal features.In addition to computing the maximum likelihood estimates of the model parameters, the EMMIXcskew package provides routines for random number generation, density evaluation, the plotting of 2D and 3D contours, and a few different methods for initial values generation for the FM-CFUST model.A finite mixture of CFUST distributions provides a model for the robust extension of traditional normal mixture models, with greater flexibility in handling asymmetry and heavy tails.The skewness parameters of a CFUST distribution are characterized by a general matrix, which provides an elegant unification of the restricted and unrestricted skew t-distributions.The aim of this package is to provide users with the option of fitting this flexible distribution to their dataset.Model selection criteria such as the AIC and BIC are provided for the FM-CFUST model to assist the user in choosing between different models for their data.
It is noted that the fitting of a FM-CFUST model can be quite computationally intensive when q is large.This is due to the calculations of some of the conditional expectations involved in the E-step of the EM algorithm.Future work will consider applicable strategies to speed up the parameter estimation process for this model such as parallel implementations described in Lee et al. (2016b,a) and Lee and McLachlan (2016b).

Figure 1 :
Figure 1: Contours of a FM-CFUST model fitted to the versicolor data.

Figure 2 :
Figure 2: Clustering of the iris dataset.The upper panels of the figure on the left show the true labels, whereas the lower panels are the predicted labels given by the FM-CFUST model.For the figure on the right, the upper panels correspond to the results given by the FM-uMST model and the lower panels correspond to that given by the FM-rMST model.

Figure 3 :
Figure 3: The Old Faithful geyser data from the MASS package.(a) Scatter plot of the data.(b) Clustering results of the FM-CFUST model using the default (moments-based) approach for generating starting values.(c) Clustering results of the FM-CFUST model using the transformation approach for generating starting values.(d) Clustering results of the FM-CFUST model using the results of FM-rMST model as starting values.

Figure 4 :
Figure 4: Contour plots of FM-CFUST models generated by EMMIXcskew package.(a) 3D contours of the density of a FM-CFUST distribution.(b) 3D contours of the density of each component of the FM-CFUST distribution shown in (b).

Table 1 :
Structure of the model parameters in EMMIXcskew.

Table 2 :
Misclassification rate of the three skew t-mixture models fitted to the iris dataset.
To help choose an appropriate starting value, we can compare the log likelihood values for the FM-CFUST model fitted using these initial values.