PPtreeViz: An R Package for Visualizing Projection Pursuit Classification Trees

PPtreeViz is an R package developed to explore projection pursuit methods for classification. It provides functions to calculate various projection pursuit indices for classification and to explore the results in the projection space, as well as functions for the projection pursuit classification tree. The visualization methods for the tree structure and for the features of each node in PPtreeViz make it easy to explore the projection pursuit classification tree structure and to determine the characteristics of each class. To calculate the projection pursuit indices and optimize them quickly, we use the Rcpp and RcppArmadillo packages in R.


Introduction
Projection pursuit (Huber 1985) uses a projection pursuit index and an optimization procedure to find an interesting low-dimensional projection. The interesting feature is defined by the projection pursuit index, and we usually maximize a predefined projection pursuit index to find an interesting projection. For classification, an interesting projection is a view with the most separable classes. Several projection pursuit indices with class information have been suggested. For example, Lee, Cook, Klinke, and Lumley (2005) proposed the LDA (linear discriminant analysis) projection pursuit index, which uses class information through an extension of the linear discriminant analysis idea. They also proposed the $L_p$ index. These indices work for a reasonable amount of data with a small number of variables. The PDA (penalized discriminant analysis) index was developed for highly correlated data or for a small number of observations with a very large number of variables (Lee and Cook 2010). Lee, Cook, Park, and Lee (2013) proposed the projection pursuit classification tree, a new approach to building a classification tree using projection pursuit indices with class information.
At each node, the projection pursuit classification tree uses the best projection to separate two groups of classes using various projection pursuit indices with class information. One class is assigned to only one final node and the depth of the projection pursuit classification tree cannot be greater than the number of classes. Therefore, the projection pursuit classification tree constructs a simple but more understandable tree for classification. The projection coefficients of each node represent the importance of the variables to the class separation of each node. The behaviors of these coefficients are useful to explore how classes are separated in a tree.
PPtreeViz is an R (R Core Team 2017) package that was developed to explore the projection pursuit indices for classification as well as the projection pursuit classification tree. It provides an initial overview of the data for classification, allowing us to further explore the entire tree structure as well as each node of the tree.
A couple of R packages have been developed to draw graphical representations of classification trees. The generic function plot.rpart from package rpart (Therneau, Atkinson, and Ripley 2017) shows a basic representation of the tree structure, and the prp and fancyRpartPlot functions in the rpart.plot package (Milborrow 2017) produce fancier plots. Packages party (Hothorn, Hornik, Strobl, and Zeileis 2017) and partykit (Hothorn and Zeileis 2015) include plots of the tree with various options for the inner and terminal nodes so the user can explore the tree structure. Their tree representation is neat and clear, and each inner node has an id number that can be used for further analysis. Most of these packages focus on representing the tree structure and the status of the classes in each node. We adapted the way that party draws the tree structure to represent the projection pursuit classification tree and then added more features to explore the data space of each node. PPtreeViz also provides visualization tools to explore projection pursuit indices with class information. This package is a useful exploratory data analysis tool for classification.
In Section 2, we review the various projection pursuit indices with class information, propose two new indices with class information, and describe optimization methods for projection pursuit indices. Visual exploration methods for projection pursuit are also discussed, along with the usage of the R functions that calculate and optimize the index values and visualize the results. In Section 3, we outline how the projection pursuit classification tree is constructed and show how to fit it with functions in PPtreeViz. The visualization methods used to explore the projection pursuit classification tree structure are discussed in Section 4. Examples and discussion then follow.

Projection pursuit indices with class information
Let $X_{ij}$ be the $p$-dimensional vector of the $j$-th observation in the $i$-th class, $i = 1, \ldots, g$, $j = 1, \ldots, n_i$, where $g$ is the number of classes and $n_i$ is the number of observations in class $i$. Also, let $\bar{X}_{i\cdot} = \sum_{j=1}^{n_i} X_{ij}/n_i$ be the mean of class $i$, let $\bar{X}_{\cdot\cdot} = \sum_{i=1}^{g}\sum_{j=1}^{n_i} X_{ij}/n$ be the overall mean, and let $n = \sum_{i=1}^{g} n_i$ be the total number of observations.

LDA index
The LDA index (Lee et al. 2005) is defined as
$$ I_{LDA}(A) = 1 - \frac{|A^\top W A|}{|A^\top (W+B) A|}, $$
where $B = \sum_{i=1}^{g} n_i (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^\top$ is the between-group sum of squares, $W = \sum_{i=1}^{g}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot})(X_{ij} - \bar{X}_{i\cdot})^\top$ is the within-group sum of squares, and $A$ is a $p \times q$ projection matrix. In this index, $n_i$ is used as the weight of class $i$. If the class sizes $n_i$ differ substantially, the result of the projection pursuit method is affected by the sizes of the classes and is usually dominated by the largest class. Even though the observed data have different class sizes, it might be interesting to find separations that are not influenced by class size. For this purpose, we developed indices with the same weight, $n/g$, for every class, which is equivalent to using no weight.
The LDAindex function calculates the LDA index value. It is written in C++ using the Rcpp (Eddelbuettel and François 2011) and RcppArmadillo (Eddelbuettel and Sanderson 2014) packages in R. origclass is the group information; it should be a vector of integers, characters, or factors. origdata is the original data without the group information and should be a matrix. proj is the projection vector to be used if the user wants to project the original data; its default value is NULL. If the user does not provide a projection vector, LDAindex treats origdata as the projected data and calculates the LDA index value of the original data. weight is the option for the weight in the index calculation, and its default value is TRUE.
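To make the index computation concrete, here is a minimal Python sketch of the LDA index for a one-dimensional projection. This illustrates the formula only, not the package's Rcpp implementation; the function name `lda_index` is ours.

```python
import numpy as np

def lda_index(X, y, a):
    """Sketch of the LDA index 1 - |a'Wa| / |a'(W+B)a| for a 1D projection a."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    grand = X.mean(axis=0)
    p = X.shape[1]
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        d = Xc.mean(axis=0) - grand
        B += len(Xc) * np.outer(d, d)                             # between-group SS
        W += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))    # within-group SS
    a = np.asarray(a, dtype=float)
    return 1.0 - (a @ W @ a) / (a @ (W + B) @ a)

# Two classes perfectly separated along the first variable: within-group
# variation along a = (1, 0) is zero, so the index reaches its maximum of 1.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
y = [0, 0, 1, 1]
val = lda_index(X, y, [1.0, 0.0])
```

The second projection direction, (0, 1), mixes the classes completely and gives an index of 0, which matches the interpretation of the index as the proportion of projected variation explained by class differences.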

PDA index
When variables are highly correlated, $|A^\top (W + B)A|$ in the LDA index is close to zero and the LDA index does not work properly. The PDA index (Lee and Cook 2010) has a penalty term with $\lambda$ to prevent the value of the determinant from being close to zero. $\lambda$ takes a value in $[0, 1]$ and controls the proportion of the penalty. If $\lambda = 0$, no penalty term is added and the PDA index is the same as the LDA index. If $\lambda = 1$, all variables are treated as uncorrelated. If the observed data are highly correlated, we need to use a large $\lambda$.
The definition of the PDA index in Lee and Cook (2010) only works for standardized data; we extend this idea to raw data. Let $W_{PDA}$ be the matrix with elements
$$ w_{PDA,kl} = \begin{cases} w_{kl} & \text{if } k = l, \\ (1-\lambda)\, w_{kl} & \text{if } k \neq l, \end{cases} $$
where $w_{kl}$ is the $(k,l)$ element of $W$. Then
$$ I_{PDA}(A) = 1 - \frac{|A^\top W_{PDA} A|}{|A^\top (W_{PDA}+B) A|}. $$
The main idea of the PDA index is to keep the diagonal elements of $W$ and to reduce the effect of the off-diagonal elements using $\lambda$. This approach weakens the correlations among variables while keeping the variances of the variables.
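The construction of the penalized within-group matrix can be sketched in a few lines of Python (illustrative only; `pda_within` is our name, not a package function):

```python
import numpy as np

def pda_within(W, lam):
    """Keep the diagonal of W and shrink the off-diagonal elements by (1 - lambda)."""
    W = np.asarray(W, dtype=float)
    D = np.diag(np.diag(W))
    return D + (1.0 - lam) * (W - D)

# A small within-group matrix with correlated variables
W = np.array([[4.0, 2.0],
              [2.0, 9.0]])
```

With `lam = 0` the matrix is unchanged (the LDA case), and with `lam = 1` only the diagonal survives, treating the variables as uncorrelated.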
L_r index

The $L_r$ index ignores the correlations among the variables, focusing only on the variation of each variable, and uses the $L_r$ measure to calculate the variation. With various $r$, we can find different views with separable classes (Lee et al. 2005). In PPtreeViz, we modified the definition of the $L_r$ index to keep the index value in $[0, 1]$. Let $Y_{ij} = A^\top X_{ij}$ be the projected data of $X_{ij}$ onto the $q$-dimensional projection $A$, with class means $\bar{Y}_{i\cdot}$ and overall mean $\bar{Y}_{\cdot\cdot}$. Then
$$ I_{L_r}(A) = 1 - \frac{\left( \sum_{i=1}^{g}\sum_{j=1}^{n_i} \left\| Y_{ij} - \bar{Y}_{i\cdot} \right\|_r^r \right)^{1/r}}{\left( \sum_{i=1}^{g}\sum_{j=1}^{n_i} \left\| Y_{ij} - \bar{Y}_{\cdot\cdot} \right\|_r^r \right)^{1/r}}. $$
We can also use the same weight $n/g$ instead of $n_i$.
The Lrindex function calculates the $L_r$ index value. r should be a positive integer, and its default value is 1.
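The ratio-of-variations idea behind the index can be sketched for a one-dimensional projection as follows (illustrative Python; `lr_index` is our name, and the exact weighting in the package may differ):

```python
import numpy as np

def lr_index(X, y, a, r=1):
    """Sketch of the L_r index idea for a 1D projection: one minus the ratio of
    within-class to total L_r variation of the projected data."""
    z = np.asarray(X, dtype=float) @ np.asarray(a, dtype=float)
    y = np.asarray(y)
    within = sum(np.sum(np.abs(z[y == c] - z[y == c].mean()) ** r)
                 for c in np.unique(y))
    total = np.sum(np.abs(z - z.mean()) ** r)
    return 1.0 - (within ** (1.0 / r)) / (total ** (1.0 / r))

# Perfectly separated classes with no within-class spread along a -> index 1
X = [[0.0], [0.0], [1.0], [1.0]]
y = [0, 0, 1, 1]
val = lr_index(X, y, [1.0])
```

Larger `r` emphasizes observations far from the class means, which is why different `r` values can reveal different separable views.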

1D Gini index
Gini is the impurity measure used in classification trees (Hastie, Friedman, and Tibshirani 2011). Since this measure is defined for one-dimensional variables, the Gini projection pursuit index is developed to find a one-dimensional projection only. Let $y_{ij,proj} = a^\top X_{ij}$ and let $y^*_1 \le \cdots \le y^*_n$ be the ordered observations of $y_{ij,proj}$, $i = 1, \ldots, g$, $j = 1, \ldots, n_i$. For each $k = 1, \ldots, n-1$, let
$$ p_{Li,k} = \frac{\#\text{ of observations of class } i \text{ in } y^*_1, \ldots, y^*_k}{k}, \qquad p_{Ri,k} = \frac{\#\text{ of observations of class } i \text{ in } y^*_{k+1}, \ldots, y^*_n}{n-k}. $$
Then
$$ P_{Gini,k} = \frac{k}{n}\left(1 - \sum_{i=1}^{g} p_{Li,k}^2\right) + \frac{n-k}{n}\left(1 - \sum_{i=1}^{g} p_{Ri,k}^2\right) \quad \text{and} \quad I_{Gini}(a) = 1 - \min_{k} P_{Gini,k}. $$
The GINIindex1D function calculates the Gini index value.
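The split search over the sorted projected data can be sketched as follows, assuming the standard two-node weighted Gini impurity (illustrative Python; `gini_index_1d` is our name):

```python
import numpy as np

def gini_index_1d(z, y):
    """Sketch: scan all n-1 splits of the sorted projected data and return
    one minus the smallest weighted two-node Gini impurity."""
    order = np.argsort(z)
    y = np.asarray(y)[order]
    n = len(y)
    classes = np.unique(y)
    best = np.inf
    for k in range(1, n):
        pL = np.array([np.mean(y[:k] == c) for c in classes])
        pR = np.array([np.mean(y[k:] == c) for c in classes])
        P = (k / n) * (1 - np.sum(pL ** 2)) + ((n - k) / n) * (1 - np.sum(pR ** 2))
        best = min(best, P)
    return 1.0 - best

# A projection that orders the classes perfectly admits a pure split -> index 1
val = gini_index_1d(np.array([0.1, 0.2, 0.3, 5.1, 5.2]), [0, 0, 0, 1, 1])
```

Because every one of the n − 1 splits must be scored, this index is noticeably slower than the determinant-based LDA and PDA indices, which is discussed further in the final section.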

1D entropy index
Entropy is another impurity measure used in classification trees (Hastie et al. 2011). We use the same approach as for the 1D Gini index. Let
$$ P_{Entropy,k} = \frac{k}{n}\sum_{i=1}^{g} p_{Li,k} \log p_{Li,k} + \frac{n-k}{n}\sum_{i=1}^{g} p_{Ri,k} \log p_{Ri,k}. $$
Then
$$ I_{Entropy}(a) = \frac{\max_k P_{Entropy,k} + \log g}{maxE + \log g}, $$
where $-\log g \le P_{Entropy,k} \le maxE$ and
$$ maxE = \begin{cases} \log 2 - \log g & \text{if } g \text{ is an even number}, \\ \log 2 - \frac{1}{2}\log\left(g^2-1\right) & \text{if } g \text{ is an odd number}. \end{cases} $$
The property of the entropy index differs from that of the Gini index. When $g$ is even, $P_{Entropy,k}$ attains the value $maxE$ when all observations in $\{y^*_1, \ldots, y^*_k\}$ belong to $g/2$ classes and $y^*_{k+1}, \ldots, y^*_n$ belong to the other $g/2$ classes. This means that the entropy index has its maximum value when two groups of classes are separable and each group consists of the same number of classes. When $g$ is odd, $P_{Entropy,k}$ attains $maxE$ when all observations in $\{y^*_1, \ldots, y^*_k\}$ belong to $(g+1)/2$ classes and $y^*_{k+1}, \ldots, y^*_n$ belong to the other classes. Therefore, the entropy index finds different views from those of the Gini index. The ENTROPYindex1D function calculates the entropy index value.
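The per-split score can be sketched as follows (illustrative Python, ignoring the final normalization; `entropy_split_scores` is our name, and 0 log 0 is treated as 0):

```python
import numpy as np

def entropy_split_scores(z, y):
    """Sketch: weighted negative entropy of each of the n-1 splits of the
    sorted projected data; larger (closer to 0) means purer sides."""
    order = np.argsort(z)
    y = np.asarray(y)[order]
    n = len(y)
    classes = np.unique(y)

    def neg_entropy(part):
        p = np.array([np.mean(part == c) for c in classes])
        p = p[p > 0]                      # convention: 0 * log(0) = 0
        return float(np.sum(p * np.log(p)))

    return [(k / n) * neg_entropy(y[:k]) + ((n - k) / n) * neg_entropy(y[k:])
            for k in range(1, n)]

# For perfectly ordered classes, the best split score is 0 (both sides pure),
# attained at the split between the two classes (k = 2 here).
scores = entropy_split_scores(np.array([0.0, 0.1, 4.0, 4.1]), [0, 0, 1, 1])
```

Each score lies between −log g and 0, and the index definition above rescales the maximum over k into [0, 1].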

Optimization for the projection pursuit with LDA and PDA indices
We need to use an optimization procedure to find an interesting projection using the projection pursuit index. For the LDA and PDA indices, we can calculate the theoretical optimum using the maximization lemma (Johnson and Wichern 2007). For the other indices, we use the simulated annealing optimization method.

LDA index optimization
Finding the optimal $A$ to maximize $I_{LDA,W}(A)$ is the same as finding the $A$ that maximizes
$$ \frac{|A^\top B A|}{|A^\top (W+B) A|}. $$
Therefore, the optimal $q$-dimensional projection that maximizes $I_{LDA,W}(A)$ consists of the first $q$ eigenvectors of $(W+B)^{-1}B$. The same approach can be applied to $I_{LDA,noW}(A)$.
The LDAopt function finds the theoretical optimal projection that maximizes the LDA index for various dimensions of the projection. To calculate $(W+B)^{-1}$, we use the ginv function in the MASS package (Venables and Ripley 2002; Ripley 2017).
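The eigenvector solution can be sketched directly (illustrative Python using a pseudo-inverse, analogous to the package's use of ginv; the function name is ours):

```python
import numpy as np

def lda_optimal_projection(X, y, q=1):
    """Sketch: the optimal projection is the first q eigenvectors of (W+B)^{-1} B."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    grand = X.mean(axis=0)
    p = X.shape[1]
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        d = Xc.mean(axis=0) - grand
        B += len(Xc) * np.outer(d, d)
        W += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))
    vals, vecs = np.linalg.eig(np.linalg.pinv(W + B) @ B)
    order = np.argsort(-vals.real)        # eigenvectors by decreasing eigenvalue
    return vecs[:, order[:q]].real

# Classes separated along the first variable: the leading direction
# should load mostly on that variable.
X = [[0.0, 0.0], [0.1, 1.0], [-0.1, -1.0],
     [5.0, 0.0], [5.1, 1.0], [4.9, -1.0]]
y = [0, 0, 0, 1, 1, 1]
a = lda_optimal_projection(X, y, q=1)[:, 0]
```

Using a pseudo-inverse keeps the sketch well defined even when W + B is singular, which is the same motivation for ginv in the package.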

PDA index optimization
Finding the optimal $A$ to maximize $I_{PDA,W}(A)$ is the same as finding the $A$ that maximizes
$$ \frac{|A^\top B A|}{|A^\top (W_{PDA}+B) A|}. $$
Therefore, the optimal $q$-dimensional projection that maximizes $I_{PDA,W}(A)$ consists of the first $q$ eigenvectors of $(W_{PDA} + B)^{-1} B$. The PDAopt function finds the theoretical optimal projection that maximizes the PDA index.

Simulated annealing optimization
Since it is not easy to find the theoretical optimum projection for the L r , Gini and entropy indices, we modified the simulated annealing optimization algorithm to fit the projection pursuit method. This algorithm can be applied to general projection pursuit indices. The first approach for the simulated annealing optimization for the projection pursuit is found in Lee et al. (2005). We modified their algorithm slightly to find the global optimum more quickly and more precisely. We also give guidelines to set up the initial values of the parameters.

Simulated annealing optimization algorithm for the projection pursuit
1. Set an initial projection $A_0$ and $temp = 1$.
2. At iteration $k$, generate a candidate projection $A^*_k$ by adding a random perturbation, with a step size proportional to $temp$, to $A_{k-1}$ and orthonormalizing the result.
3. Compute $d = I(A^*_k) - I(A_{k-1})$. If $d > 0$, accept $A^*_k$ as $A_k$; otherwise, accept it with a probability determined by $d$, $temp$, and the energy parameter.
4. Set $temp \leftarrow cooling \times temp$ and repeat steps 2 and 3 until the index value no longer improves.
Several parameters are used in this simulated annealing optimization. The cooling parameter is used to determine the step size of a new projection A k . This parameter is in (0, 1). If we use a small value for cooling, the step size is rapidly reduced, and it becomes difficult to find the global optimum. If we use cooling close to 1, the step size is slowly reduced, but there is a strong chance to find the global optimum. We recommend using 0.999 for our projection pursuit indices for classification to find the global optimum.
The energy parameter controls the probability of accepting a new projection: the smaller the energy, the higher the acceptance probability. However, the probability is also determined by the difference between the new and old projection pursuit index values, and the range of these values depends on the data. We suggest energy = 1 − I, where I is the projection pursuit index of the original data. The PPopt function finds the optimal projection that maximizes the $L_r$, Gini, and entropy indices.
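The search can be sketched as follows (an illustrative Python version with a generic index function; the acceptance rule shown is a standard simulated annealing rule adapted to the cooling/energy description above, not necessarily the exact rule used in PPopt):

```python
import numpy as np

def sa_optimize(index_fn, p, cooling=0.999, energy=0.01, n_iter=2000, seed=1):
    """Sketch of simulated annealing for a 1D projection pursuit optimization."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=p)
    a /= np.linalg.norm(a)
    best_a, best_val = a, index_fn(a)
    temp = 1.0
    for _ in range(n_iter):
        cand = a + temp * rng.normal(size=p)   # step size shrinks as temp cools
        cand /= np.linalg.norm(cand)
        d = index_fn(cand) - index_fn(a)
        # accept improvements always; accept worse moves with a probability that
        # grows as energy shrinks and falls as temp cools
        if d > 0 or rng.random() < np.exp(d * energy / temp):
            a = cand
        if index_fn(a) > best_val:
            best_a, best_val = a, index_fn(a)
        temp *= cooling
    return best_a, best_val

# Toy index: ratio of between-group to total variation of projected 2-class data
X = np.array([[0.0, 0.0], [0.1, 1.0], [5.0, 0.0], [5.1, 1.0]])
y = np.array([0, 0, 1, 1])

def toy_index(a):
    z = X @ a
    within = sum(np.sum((z[y == c] - z[y == c].mean()) ** 2) for c in (0, 1))
    total = np.sum((z - z.mean()) ** 2)
    return 1.0 - within / total

best_a, best_val = sa_optimize(toy_index, p=2)
```

With cooling close to 1, the early iterations wander widely over the projection sphere before the step size contracts, which is the recommended behavior for finding the global optimum.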

Explore the projection pursuit indices with the two-dimensional data
To explore the properties of the various projection pursuit indices with two-dimensional data, we provide an R function for Huber's plot (Huber 1990). This plot represents the projection pursuit index values in all possible directions in 2D space. The indices are calculated using the projections $(\cos\theta, \sin\theta)$, $\theta = 1°, \ldots, 180°$. For each projection, we calculate the projected data and the index value of the projected data. The circle with a dashed line represents the median of all index values, and the solid curve shows the other index values relative to this median. The projection with the maximum index value is drawn as a solid line, and a histogram of the data projected onto this optimal projection is also provided. We use the ggplot2 package (Wickham 2009) to draw the Huber plot and the histogram, and the gridExtra package (Auguie and Antonov 2015) to arrange these two plots. Figure 1 shows a simple example of Huber's plot of the LDA index with two variables from the iris data: the sepal length and the sepal width. From the histogram of the data projected onto the best projection, we can see the separation of the setosa class from the other two classes, except for one observation in setosa. The Huberplot function provides the PPmethod option for the various projection pursuit indices with class information. The user can also supply his or her own index function with the PPmethod = "UserDef" and UserDefFtn options.
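The angle sweep behind Huber's plot can be sketched as follows (illustrative Python, with a simple between-to-total variation index standing in for the package's indices; all names are ours):

```python
import numpy as np

def huber_sweep(X, y, index_fn):
    """Sketch: evaluate a 1D index for projections (cos t, sin t), t = 1..180 degrees."""
    thetas = np.arange(1, 181)
    vals = [index_fn(X @ np.array([np.cos(np.radians(t)), np.sin(np.radians(t))]), y)
            for t in thetas]
    return thetas, np.array(vals)

def ratio_index(z, y):
    """Between-to-total variation of the projected data, a stand-in index."""
    within = sum(np.sum((z[y == c] - z[y == c].mean()) ** 2) for c in np.unique(y))
    total = np.sum((z - z.mean()) ** 2)
    return 1.0 - within / total

# Classes separated along the first variable: the best angle is the one whose
# projection direction is closest to the horizontal axis.
X = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, -1.0],
              [5.0, 0.0], [5.0, 1.0], [5.0, -1.0]])
y = np.array([0, 0, 0, 1, 1, 1])
thetas, vals = huber_sweep(X, y, ratio_index)
best_theta = thetas[np.argmax(vals)]
```

Plotting `vals` against `thetas` in polar form gives the circular trace of the Huber plot, with the maximizing angle drawn as the solid line.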
For a simple example, we use principal component analysis. To find the first principal component, we need to maximize the variance of the projected data; therefore, we can define sampleIndex as a function that calculates the variance of the projected data. Figure 2 shows the result of UserDefFtn = sampleIndex.

Exploring the optimal projection for classification
After finding the optimal projection in the q-dimensional space, it is worthwhile to explore the data projected onto the optimal projection together with the projection coefficients. We developed the PPoptViz function to explore the projection space. For q = 1, a plot of the projection coefficient values and a histogram of the projected data are provided. The coefficient values for each variable are represented as bars from zero (drawn as a black line), and the red dotted lines at ±1/√p serve as guidelines for the importance of the coefficients: if a coefficient value falls outside these dotted lines, we can conclude that the corresponding variable plays an important role in the separation. The histogram represents the distribution of each class with a different color and helps detect a view with separable classes. We then use the plot of the projection coefficient values to determine which variables are important for separating classes in this optimal view. Figure 3 shows the best projection coefficients and the data projected onto the best projection for the iris data using the LDA index. Variables 3 and 4 play more important roles than the other two variables in separating the setosa class. We can also observe some separation between virginica and versicolor, with some overlap between the two classes.
R> PPoptViz(LDAopt(origclass = iris[, 5], origdata = iris[, 1:4], q = 1))

For q > 1, the plots are arranged in a q × q matrix. The diagonal panels show the coefficient plots of the optimal projection in each dimension, and the off-diagonal panels show scatter plots of the data projected onto the corresponding pair of projections. Figure 4 shows the best 2D projection coefficients and a scatter plot of the projected data. In the scatter plot of the 1st dimension (dim 1) against the 2nd dimension (dim 2), we can clearly see that the setosa class is separated from the other classes. The versicolor and virginica classes also appear to be separable, although this is not clear-cut and we could not observe this separation without color. Even though variables 2 and 4 have larger projection coefficients than the other two variables in the 2nd dimension, the 2nd projection does not play an important role in class separation; for the iris data, the 1st projection provides enough separation.

Projection pursuit classification tree
In this section, we explore how to construct the projection pursuit classification tree using PPtreeViz. A projection pursuit classification tree (Lee et al. 2013) is a type of classification tree that uses projection pursuit indices with class information. The usual tree-structured classifier finds a rule to separate the data into two groups using impurity measures that quantify the purity of the two groups in terms of classes. In contrast, the projection pursuit classification tree finds a rule to separate the classes into two groups; this rule uses the best projection to separate two groups of classes with the various projection pursuit indices with class information. One class is assigned to only one final node, and the maximum depth of the projection pursuit classification tree is the number of classes. Therefore, the projection pursuit classification tree constructs a simple but more understandable tree for classification. The projection coefficients at each node represent the importance of the variables in separating the classes at that node, and the behavior of these coefficients is useful for exploring how the classes are separated.
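The recursion behind this construction can be sketched as follows. This is a simplified illustration, not the PPtreeClass algorithm: it uses the 1D LDA-index optimum at each node and splits the classes at the largest gap between projected class means, so every leaf holds exactly one class.

```python
import numpy as np

def lda_direction(X, y):
    """First eigenvector of (W+B)^{-1} B, the 1D LDA-index optimum (sketch)."""
    X = np.asarray(X, dtype=float)
    grand = X.mean(axis=0)
    p = X.shape[1]
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        d = Xc.mean(axis=0) - grand
        B += len(Xc) * np.outer(d, d)
        W += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))
    vals, vecs = np.linalg.eig(np.linalg.pinv(W + B) @ B)
    return vecs[:, np.argmax(vals.real)].real

def build_tree(X, y):
    """Recursive sketch: each leaf holds one class, so depth <= number of classes."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if len(classes) == 1:
        return {"class": classes[0]}
    a = lda_direction(X, y)
    z = X @ a
    means = sorted((z[y == c].mean(), c) for c in classes)
    gaps = [means[i + 1][0] - means[i][0] for i in range(len(means) - 1)]
    i = int(np.argmax(gaps))                        # split classes at the largest gap
    cutoff = (means[i][0] + means[i + 1][0]) / 2
    left = np.isin(y, [c for m, c in means[:i + 1]])
    return {"proj": a, "cutoff": cutoff,
            "left": build_tree(X[left], y[left]),
            "right": build_tree(X[~left], y[~left])}

def leaves(node):
    if "class" in node:
        return [node["class"]]
    return leaves(node["left"]) + leaves(node["right"])

# Three classes spread along the first variable
X = np.array([[0.0, 0.0], [0.2, 1.0], [5.0, 0.0],
              [5.2, 1.0], [10.0, 0.0], [10.2, 1.0]])
y = np.array([0, 0, 1, 1, 2, 2])
tree = build_tree(X, y)
```

The stored projection vector and cutoff at each inner node are exactly what the visualization functions later display.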

Projection pursuit classification tree with various indices
The original package PPtree uses C to calculate the projection pursuit indices and to optimize them. For the projection pursuit classification tree, the PPtree package also provides three functions with different indices and optimization methods: LDA.Tree, PDA.Tree, and PP.Tree. At present, Rcpp is a popular tool for developing R packages with a heavy computational load, so we moved from C to Rcpp and combined these three functions into the PPtreeClass function, with the index selected through the PPmethod option in the PPtreeViz library. To use this function, we need at least two arguments, formula and data. The default value of PPmethod is "LDA". In the projection pursuit classification tree, we provide eight different rules to define the cutoff value for each node. Let $m_1$, $med_1$, $s_1$, $IQR_1$, and $n_1$ be the mean, median, standard deviation, interquartile range, and sample size of the left group at each node, respectively, and let $m_2$, $med_2$, $s_2$, $IQR_2$, and $n_2$ be the corresponding quantities for the right group. The rules include, for example,
$$ \text{Rule 2} = \frac{n_2}{n_1+n_2}\, m_1 + \frac{n_1}{n_1+n_2}\, m_2, \qquad \text{Rule 5} = 0.5\, med_1 + 0.5\, med_2, \qquad \text{Rule 6} = \frac{n_2}{n_1+n_2}\, med_1 + \frac{n_1}{n_1+n_2}\, med_2. $$
To calculate the means, medians, interquartile ranges, and standard deviations in each rule, we use the stats package (R Core Team 2017).
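Rules 2, 5, and 6 can be sketched over the projected values of the two groups (illustrative Python; the function name is ours):

```python
import numpy as np

def cutoffs(left, right):
    """Sketch of cutoff rules 2, 5, and 6 from the projected data of a node."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n1, n2 = len(left), len(right)
    m1, m2 = left.mean(), right.mean()
    med1, med2 = np.median(left), np.median(right)
    w1, w2 = n2 / (n1 + n2), n1 / (n1 + n2)   # each group weighted by the other's size
    return {"rule2": w1 * m1 + w2 * m2,
            "rule5": 0.5 * med1 + 0.5 * med2,
            "rule6": w1 * med1 + w2 * med2}

cut = cutoffs([1.0, 2.0, 3.0], [10.0, 20.0])
```

Note that the sample-size weights are swapped between the groups, which pulls the cutoff toward the smaller group and away from the group with more observations.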

Visualization of projection pursuit classification tree
The projection pursuit classification tree focuses on exploratory analysis as well as the precision of classification. With this tree structure, we can determine the variables that play important roles in each separation. Thus, it is very important to explore each node of the projection pursuit classification tree and to find out how the classes are divided into two groups; this is the main advantage of the projection pursuit classification tree. We therefore need tools to explore the projection space of each node. PPtreeViz provides two functions to explore the projection pursuit classification tree: plot and PPclassNodeViz. For the plot of the tree structure, we modified the plot method for BinaryTree objects in the party package (Hothorn et al. 2017). That plot function shows inner nodes with a node id and edges with their conditions; for the final nodes, however, it shows summary statistics of the classification results. For the projection pursuit classification tree, we need to show the class name and the node id at each final node, so we modified the plot from party.
The plot function in PPtreeViz is a generic plot function for an object with PPtreeclass class, which is the result from the PPtreeClass function. The font.size and width.size options are available for a large tree with a large number of groups. font.size determines the size of the letters in the plot, and its default value is 17. width.size determines the size of the ellipse for the inner node, and its default value is 1.
R> plot(Tree.result)
R> PPclassNodeViz(Tree.result, node.id = 1, Rule = 1)
R> PPclassNodeViz(Tree.result, node.id = 3, Rule = 1)

Figure 5(a) shows the projection pursuit classification tree for the iris data. The ellipses represent inner nodes, and the rectangles with group names represent final nodes. The node id is shown in a square box next to each node. In each ellipse, we find the optimal 1D projection using the LDA index and determine the cutoff value with the chosen rule. At node 1, the projection pursuit classification tree separates "setosa" from the other two classes, and at node 3, it separates "virginica" from "versicolor". We can explore these inner nodes more closely using the PPclassNodeViz function; the result is shown in Figure 5(b). From the plot of the projection coefficients, we can see that variables 2 and 3 play important roles in separating "setosa" from the other two groups, but they work in opposite directions: "setosa" has large values of variable 2 and small values of variable 3, whereas the other two classes have small values of variable 2 and large values of variable 3. This is visible in the middle plot titled "Mean of left and right nodes", where "L" refers to the left group ("setosa" at node 1) and "R" refers to the right group ("versicolor" and "virginica").
The dashed vertical red line in the histogram of the projected data represents the cutoff value for the specified rule (-1.0708 in this plot). These coefficient and cutoff values can be checked with the coef.print = TRUE and cutoff.print = TRUE options of the print function.
No errors occur at node 1.
Figure 5(c) shows the result for node 3 of the tree. From the coefficient plot, we can see that variable 4 is the most important variable to separate "virginica" from "versicolor". In this node, a few observations are misclassified with rule 1.
The projection coefficients at each node are related to the correlations among variables; therefore, it is helpful to explore the coefficients alongside the correlations. In the PPclassNodeViz function, we provide an image option to draw the image plot of the correlation matrix. Its default value is FALSE, and in the next section we show how to use image = TRUE through examples.
In the PPtreeViz library, we define the PPtreeclass class to save all results from fitting the PPtreeClass function. Currently, partykit is commonly used to summarize and visualize tree structures in various ways. To use the facilities of partykit, we define the as.party.PPtreeclass function to coerce a PPtreeclass object to the party class.

Figure 6: Plot of the structure of the projection pursuit classification tree with partykit.
Fish catch data

V1: Weight of the fish
V2: Body length (from the nose to the beginning of the tail)
V3: Tail length (from the beginning of the tail to the end of the tail)
V4: Ratio of the length from the notch of the tail to the end of the tail to the tail length
V5: Ratio of height to the total length
V6: Ratio of width to the total length

Table 1: Variables in fishcatch data.
1) root
   2) proj1*X < cut1
      4) proj2*X < cut2
         6)* proj3*X < cut3 -> "Pike"
         7)* proj3*X >= cut3 -> "Smelt"
      5) proj2*X >= cut2
         8)* proj4*X < cut4 -> "Perch"
         9) proj4*X >= cut4
            10)* proj5*X < cut5 -> "Roach"
            11)* proj5*X >= cut5 -> "Whitewish"
   3) proj1*X >= cut1
      12)* proj6*X < cut6 -> "Bream"
      13)* proj6*X >= cut6 -> "Parkki"

Figure 7 shows the projection pursuit classification tree using the PDA index with lambda = 0.1. No misclassification occurs in this projection pursuit classification tree with rules 1 and 3. At node 1, we can separate Bream and Parkki from the other fish classes. Figure 8 represents the plot of node 1 with the image = TRUE option. In the histogram of the projected data (the upper right panel), we can clearly see that Parkki and Bream are separated from the other fish classes. The coefficient plot (the upper left panel) shows that the height (V5) is the most important variable. From this plot, we can see that Parkki and Bream are tall (V5) and narrow (V6) relative to the other classes of fish. In the correlation plot (the lower right panel), yellow represents positively correlated and blue represents negatively correlated variables. From this image plot, we can see that V1, V2, and V3 are positively correlated, V4 and V6 are negatively correlated, and V5 and V6 are uncorrelated.
At node 2 (Figure 9), the projection pursuit classification tree separates Pike and Smelt from the other fish classes. In this separation, the ratio of height (V5) is important. Pike and Smelt are shorter than the other fish classes. In the data of node 2, V4 is negatively correlated with V5 and V6. In contrast to the data at node 1, V5 and V6 are positively correlated at node 2.
Bream and Parkki are clearly separated in node 3, and Figure 10 presents the result of node 3. In this separation, the tail length (V3) is the most important variable while the ratio of the notch length (V4) also plays an important role. This result reveals that Bream has a much longer tail and much deeper notch than Pakki.
Pike and Smelt are separated at node 4 (Figure 11). Pike has a longer body (V2), a longer tail (V3), and a lighter weight (V1) than Smelt. In the data of node 4, V1, V2, and V3 are highly negatively correlated with V4. Perch is separated from the others at node 5. From Figure 12, we can see that Perch has a shorter tail (V3) and a longer body (V2), with a smaller notch in the tail (V4), relative to Roach and Whitewish. At the node 9 separation (Figure 13), we can separate Roach from Whitewish based on two important variables: the ratio of the notch (V4) and the weight (V1) of the fish. Whitewish is heavier than Roach and has a smaller notch in the tail. The data at node 9 show that V4 is negatively correlated with all other variables.
The projection pursuit classification tree in PPtreeViz can be used to determine the special characteristics of each species, especially from a classification point of view. Fortunately, there are no misclassified cases in this example. When misclassification does occur, however, we can identify it and the reason for it at each node by using the histogram of the projected data and the values of the projection coefficients.

Isolated letter speech recognition data
The isolated letter speech recognition data (ISOLET) is from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml). In this repository, two data sets are available: one for training and the other for testing. The training set has 240 observations and the testing set has 60 observations for each letter. Both data sets have 617 variables. Descriptions of the variables, without matching the order in the data sets, can be found in Cole and Fanty (1990). This data set has been used in several machine learning studies (Cole and Fanty 1990; Dietterich and Bakiri 1991). Those algorithms are mainly designed to precisely predict letters from the 617 variables of spoken letters; their concern is predictability, not interpretability. However, it is also important to determine the features of each letter and to understand their characteristics. In this paper, we explore the tree structure of the projection pursuit classification tree using the training set and the PDA index with lambda = 0.1.

Figure 14: Projection pursuit classification tree with letter identification data.

Figure 14 shows the entire projection pursuit classification tree structure of the ISOLET data. The tree has 26 final nodes and a depth of 14. With the plot function in PPtreeViz, we can draw this complicated tree structure without any overlap. The training error rate for this projection pursuit classification tree is 0.0224 and the test error rate is 0.0693. Figure 15 presents the results for node 2. For the ISOLET data, each variable is represented as a line in the coefficient plot and the mean plot, since there are too many variables to show individually. In the mean plot, the variables with a large difference between the left and right groups are shown in red. Node 2 separates the letters F and S from the letter X. These two groups of letters have quite different patterns, especially around variables 350 and 450; for these variables, the coefficient plot has high peaks. The means between variables 150 and 220, and around variable 600, also differ. Figure 16 shows the results for node 26, where J and K are separated from G, T, A, E, B, D, V, and P. The main differences in the means between these two groups of letters are in the first 200 variables, the variables between 450 and 470, and around variable 400.
This feature is also captured by the coefficient plot. Figure 17 shows node 35, which separates E, B, and D from V and P. The patterns between variables 450 and 600, around variable 330, and around variable 420 differ somewhat, and the coefficient plot has high peaks around these variables. Due to the lack of information about the variables, we cannot clearly explain how these letters are classified differently. However, we can determine which variables differ most between the two groups and how they separate the groups.

Discussion
The projection pursuit classification tree is a classification tree built with the projection pursuit method. Each class in this tree is assigned to only one final node, which simplifies the final tree structure. The final tree structure is worth exploring, since it is easy to understand, to interpret how classes are separated, and to explore the features of each class. The original R package PPtree for the projection pursuit classification tree was developed using the C language for the main calculations. We wanted to keep PPtree with the C language and also to make a new version based on Rcpp. Therefore, we implemented the projection pursuit index calculations and the optimization algorithms with the Rcpp and RcppArmadillo packages. We also added visualizations of the optimization results for the projection pursuit indices, of the tree structure, and of the data space at each node, and combined all these methods into the PPtreeViz package.
There are a couple of R packages for classification trees with visualization of the tree structure. rpart and rpart.plot are packages for the classification tree and for visualizing the result of rpart. The generic plot of rpart does not provide a good presentation of the tree structure; prp provides a more effective plot of the tree structure, including an interactive pruning option, and fancyRpartPlot provides a nice presentation of the tree structure. However, fancyRpartPlot is not suitable for a big tree with a large number of nodes. The party and partykit packages provide functions for a new classification tree method and also provide a new approach to visualizing general binary trees. Their visualization methods include various plot options that present the status of the final nodes as well as the tree structure: the inner nodes are represented as ellipses with node ids, and the final nodes are represented as various types of plots. Even though the visualization methods in party and partykit are useful, they are of limited use for exploring the projection pursuit classification tree.
The visualization of the tree structure in the PPtreeViz package adapts the inner-node representation from the party package. For the final node, PPtreeViz presents the assigned class with the node id; this makes it easy to point to a specific node by its id and to explore the separation at each node as well as the overall tree structure. With PPtreeViz, we can easily understand how the classes are separated, which features differ from those of the other classes, and so on. This information can be useful for further analysis aimed at improving the predictability of the classification. PPtreeViz also provides methods to explore the projection pursuit indices using the class information.
In this paper, we propose two new indices, Gini and entropy, and we provide the GINIindex1D and ENTROPYindex1D functions to calculate the index values. By definition, these indices require calculating $P_{Gini,k}$ and $P_{Entropy,k}$ for all $k = 1, \ldots, n-1$, as well as finding their extreme values. It therefore takes more time to obtain these index values than the other indices (LDA, PDA, or $L_r$), even though we use Rcpp (Eddelbuettel and François 2011). This gets worse when these indices are used in the PPopt or PPtreeClass functions. We do not recommend the PPmethod = "GINI" or PPmethod = "ENTROPY" options in PPtreeClass for data with a large number of observations or variables.