MPCI : An R Package for Computing Multivariate Process Capability Indices

Manufacturing processes are often based on more than one quality characteristic. When these variables are correlated the process capability analysis should be performed using multivariate statistical methodologies. Although there is a growing interest in methods for evaluating the capability of multivariate processes, little attention has been given to developing user friendly software for supporting multivariate capability analysis. In this work we introduce the package MPCI for R , which allows to compute multivariate process capability indices. MPCI aims to provide a useful tool for dealing with multivariate capability assessment problems. We illustrate the use of MPCI package through both simulated and real examples.


Introduction
Process capability indices such as, C p , C pk , C pm and C pmk are typically used as measures of process capability in the univariate domain (Kotz and Lovelace 1998). However, in the modern industry there are many manufacturing processes in which the quality is measured by the joint level of several correlated characteristics. Various process capability indices have been developed for assessing the performance of multivariate manufacturing processes. In general, multivariate process capability indices are constructed in one of the following ways: using the ratio of the volume of a tolerance region to the volume of a process region -see, for example, Shahriari, Hubele, and Lawrence (1995), Taam, Subbaiah, and Liddy (1993), and Pearn, Wang, and Yen (2007) and the references contained therein; using the proportion of nonconforming items -see Chen (1994) and Chen, Pearn, and Lin (2003); or using the principal component analysis (PCA) -see Wang and Chen (1998), Xekalaki and Perakis (2002) and Wang (2005).
With the exception of Scagliarini and Vermiglio (2008), which developed a visual basic program to realize an Excel spreadsheet for computing two multivariate measures of capability, a current problem with multivariate measures of capability is the shortage of user-friendly software.
In this paper, with the purpose of providing a useful tool to help practitioners in performing multivariate capability analyses, we present the R (R Development Core Team 2012) package MPCI which is available from the Comprehensive R Archive Network at http: //www.cran.r-project.org/package=MPCI. MPCI currently computes five multivariate process capability indices following the methods proposed by Shahriari et al. (1995), Taam et al. (1993) and the indices based on PCA proposed by Wang and Chen (1998), Xekalaki and Perakis (2002) and Wang (2005).
We structured this work by introducing each considered index together with its implementation in MPCI. The layout of the paper is as follows: Section 2 introduces the notation and the data sets that will be used throughout the article; Section 3 provides the theoretical background and the solutions offered by MPCI for performing multivariate capability analysis using the multivariate capability vector (Shahriari et al. 1995); Section 4 introduces the multivariate capability index M C pm (Taam et al. 1993) and illustrates the use MPCI; Section 5 examines the statistical methodology and explains how to use MPCI to obtain multivariate process capability indices based on PCA (Wang and Chen 1998;Xekalaki and Perakis 2002;Wang 2005)); Section 6 ends the paper with several concluding remarks.

Notation and data sets
Let X represent the vector of the v quality characteristics of interest with mean vector µ and covariance matrix Σ. We assume that the joint probability distribution of the v quality characteristics is the multivariate normal distribution The real data set, included as dataset2 in the package, consists of the n = 25 sample observations of the bivariate example discussed by Wang and Chen (1998):

The multivariate capability vector
The multivariate capability vector was proposed by Shahriari et al. (1995) based on the original work of Hubele, Shahriari, and Cheng (1991). The vector contains three components and appears as (C pM , P V, LI) C pM is the first component of the vector, it is a ratio of volumes. The numerator is the volume defined by the engineering tolerance region, while the denominator is the area or volume of a "modified process region", defined as the smallest region similar in shape to the engineering tolerance region, circumscribed about a specified probability contour: C pM = Volume of engineering tolerance region Volume of modified process region The volume of the engineering tolerance region is To compute the volume of the modified process region it is worth reminding that under the hypothesis of multivariate normality the statistic follows a χ 2 distribution with v degree of freedom. Therefore, the borders of the process region U P L i , the upper process limit, and LP L i , the lower process limit (i = 1, 2, . . . , v) are derived from the projection of the probability ellipsoid onto the respective axes. Specifically, they are determined by solving the systems of equations of first derivative, with respect to each X i of the quadratic form (Nickerson 1994) where χ 2 (v,α) is the 100(1 − α)-th percentile of a χ 2 distribution with v degrees of freedom associated with the probability contour. Usually, in analogy with the "6σ" in the denominator of the univariate indices, α = 0.0027. The solutions (two for each dimension i) of Equation 9 are (Wang, Hubele, Lawrence, Miskulin, and Shahriari 2000): where i = 1, 2, . . . , v and det Σ −1 i is the determinant of a matrix obtained from Σ −1 by deleting the i-th row and column. Thus, the volume of the modified process region is In practice µ and Σ are unknown. Given a random sample, X 1 , X 2 , . . . , X n , of size n from the process, to estimate µ and Σ the sample mean and the sample covariance matrix can be used. Rencher (2002) provides a detailed discussion on the properties of the sampling distribution of X and S, while Vershynin (2012) investigates the optimal sample size that guarantees estimation of a covariance matrix by a sample covariance matrix with a fixed accuracy.
As far as the index is concerned, values of C pM higher than 1 indicate that the modified process region is smaller than the engineering tolerance region, therefore we have high probability that the produced items will be classified as conform.
The second component of the vector (C pM , P V, LI) is defined as the significance level of a Hotelling T 2 statistic computed under the assumption that the center of the engineering specifications is considered to be the true underlying mean of the process: and F (v,n−v) is the F distribution with v and n − v degrees of freedom. Values of P V close to zero indicate that the center of the process is far from the engineering target value.
The third component of the vector summarizes a comparison of the location of the modified process region and the tolerance region. It indicates whether any part of the modified process region falls outside the engineering specifications. It has a value of 1 if the entire modified process region is contained within the tolerance region and, otherwise, a value of 0: LI = 1 if modified process region is contained within the tolerance region 0 otherwise (17) 3.1. The multivariate capability vector using MPCI MPCI consists of the core function mpci whose arguments are: index which specifies the index that will be computed, in this case index = "shah" (the other options are "taam", "wang", "xeke", "wangw" and will be illustrated in the following sections); x the matrix of the data (the quality characteristics are the columns); LSL the vector of the Lower Specification Limits; USL the vector of the Upper Specification Limits; Target the vector of the target of the process; alpha which specifies the quadratic form (9) and conventionally α = 0.0027. The function mpci uses also npc, Method and graphic. The arguments npc and Method refer to the indices based on PCA, therefore they will be discussed in Section 5. The logical argument graphic allows, in the bivariate case, for additional graphical representations concerning process and specifications and will be introduced in Section 4.
The function is thought to be easy to use, for this reason if alpha is missing, then the function assumes as default value α = 0.0027 (in analogy with the with the "6σ" in the denominator of the univariate indices). Similarly if the argument Target, the Target vector, is missing the program compute it as the midpoint of the specification interval (default value).
To illustrate how MPCI works we use the dataset1 included in the package.
R> library("MPCI") R> data("dataset1") R> x <-dataset1 R> Target  Now, the user can interpret the results: C pM = 1.26159 indicates that the modified process region is smaller than the engineering tolerance region; P V = 0.770597 means that the center of the process is not far of the engineering target value; finally, LI = 0 implies that at least in one direction, the modified tolerance region exceeds the specification limits.
The function, without the specification of the arguments Target and alpha is R> mpci(index = "shah", x, LSL, USL) It works with the default values and, in this example, gives the same results.

The multivariate capability index M C pm
The index M C pm was proposed by Taam et al. (1993) and is defined as a ratio of two volumes. The numerator is the volume of the modified tolerance region R 1 and the denominator is the volume of the scaled 99.73 percent process region R 2 . Under the multivariate normality hypothesis R 2 is an ellipsoidal process region, while the modified tolerance region R 1 is the largest ellipsoid that is centered at the target completely within the original tolerance region. In the general case of v characteristics the region R 1 is an hyperellipsoid with a volume given by where a i (i = 1, 2, . . . , v) are the lengths of the semi-axes.
Then the multivariate capability index is written as where X is the vector (v × 1) of measurements from a multivariate normal distribution with mean vector µ and covariance matrix Σ, Σ T = E (X − T) (X − T) is the mean square error matrix from the process, T is a vector of target values, and K(m) is a 99.73-th percentile of a χ 2 with v degrees of freedom.
The denominator of M C pm can be also expressed as a product of two terms: where R 3 is the region in which 99.73% of the process values fall within.
Therefore M C pm can be rewritten as: The M C pm index is a function of two components: C p which represents the process variability relative to the modified tolerance region; D which detects the process deviation from the target. Given a random sample of n measurements, X 1 , X 2 , . . . , X n , each of dimension v, the estimator for M C pm is given by When the process mean vector equals the target vector, and the index has the value 1, then 99.73% of the process values lie within the modified tolerance region.

M C pm using MPCI
MPCI computes the index M C pm by setting index = "taam". We illustrate this function using the dataset2 and introducing graphic: an optional argument that, in the bivariate case, allows to plot the ellipse of the process region, the tolerance region, the modified process region and the modified tolerance region. The plot is obtained by setting graphic = TRUE (default, graphic = FALSE).
R> data("dataset2") R> x <-dataset2 R> alpha <-0.0027 R> Target <-c(177,53) R> LSL <-c(112.7, 32.7) R> USL <-c(241.3,73.3) R> mpci(index = "taam", x, LSL, USL, graphic = TRUE) The function produces Figure 1 and the output is: Taam et al. (1993) Shahriari et al. (1995) is the ratio of the Tolerance Region and the Modified Process Region" [1] "MCpm index of Taam et al. (1993) is the ratio of the ellipsoids: Modified Tolerance Region and the Process Region " Examining the value of the computed index, M C pm = 1.825283, the user can assert that the process is capable since the process has a smaller variation than allowed by the specification limits. Furthermore, by means of the visualization offered by Figure 1, the user has an immediate idea of the process status with respect to the specifications.
The same example without using graphic is:  Wang and Chen (1998) proposed a method for constructing multivariate capability indices using PCA. PCA uses the spectral decomposition of the covariance matrix Σ

Multivariate capability indices based on PCA
where U = (u 1 , u 2 , . . . , u v ) is the matrix of eigenvectors of Σ with columns u i (i = 1, 2, . . . , v) and D = diag(λ 1 , λ 2 , . . . , λ v ) is the diagonal matrix of the eigenvalues. In this framework, the engineering specifications and the target values of the i-th principal component (PC i ) are for i = 1, 2, . . . , v, respectively.
The proposal by Wang and Chen (1998) evaluates the capability of a multivariate process considering a subset m (m ≤ v) of PCs. They defined MPCIs M C p , M C pk , M C pm and M C pmk using the univariate process capability indices of the PCs. The multivariate capability index M C p defined in Wang and Chen (1998) is where is the univariate measure of capability for the i-th PC, σ P C i = √ λ i and m denotes the number of PCs used for assessing the capability. Similarly, they have defined M C pk , M C pm and M C pmk by replacing C p;P C i with C pk;P C i , C pm;P C i and C pmk;P C i , respectively, for i = 1, 2, . . . , m. Values of the indices greater than 1 indicate that the process is capable.
In order to allow for potential differences in the portion of variance explained by the principal components, Xekalaki and Perakis (2002) suggested a series of new indices that assign unequal weights to the univariate index values corresponding to the principal components employed, in particular in proportion to the percentages of variance explained by them, as determined by their respective eigenvalues. They specifically proposed the following index and offered similar definitions of MXC pk , MXC pm and MXC pmk . Values of the indices greater than 1 indicate a capable process.
Within the framework of a short-run process capability assessment, Wang (2005) proposed the use of the weighted geometric mean. The weights are, once again, the eigenvalues λ i of each principal component. The index is The author proposed similar definitions for MWC pk , MWC pm and MWC pmk .

The indices based on PCA using MPCI
MPCI computes multivariate capability indices according to Wang and Chen (1998), Xekalaki and Perakis (2002) and Wang (2005), by setting index = "wang", index = "xeke" and index = "wangw" respectively. In this framework a key role is played by the number of principal components used for computing the multivariate indices. Therefore, in order to provide a wide range of options to the users, the package allows different solutions. The user can choose the number of principal components "a priori" by specifying the parameter npc. Note that, as guide for this choice, the function summary(princomp(x)) can be employed to obtain a summary on the importance of the components. Otherwise, by specifying the input argument Method, the user can choose one of the following five methods (see e.g., Rencher 2002): 1. Method 1 (Percentage). This method selects the number of principal components that explain at least 80% percent of the total variability.
2. Method 2 (Average). This method works with the principal components whose eigenvalues are greater than the average of the eigenvalues.

Method 3 (Scree)
. This option shows the Scree Diagram and the user can choose the number principal components (Rencher 2002).

Method 4 (Bartlett's test)
. The number of principal components used for the assessment of the indices is determined through the Bartlett's test (Bartlett 1950).

Method 5 (Anderson's test)
. The number of principal components used for the assessment of the indices is determined through the Anderson's test (Anderson 1933).
Note that, when the function mpci computes one of the three indices based on PCA, if npc or Method are not specified, then mpci computes the indices using the Method 1. Furthermore, when the user chooses for Method = 4 or Method = 5 the parameter alpha corresponds to the significance level used by the corresponding test. If alpha is missing, then α = 0.05 (default value).
Example with index = "wang" (Wang and Chen 1998)  In the output the user can find the number principal components selected by the default Method 1 and the values of the indices M C p , M C pk , M C pm and M C pmk . The interpretation of these indices is analogous to the univariate case. Values higher than 1, as in this case, are interpreted as that the process is capable, whereas values smaller than 1 suggest that some quality improvement activities were required.
Example with index = "wangw" (Wang 2005)  The interpretation is similar to example illustrated above. Example with index = "xeke" (Xekalaki and Perakis 2002) and Method = 3: The function produces Figure 2 and asks to enter the number of principal components. Entering 2 the output is

Concluding remarks
Processes with multiple quality characteristics often occur in manufacturing industries, consequently quantitative measures of process performance for multivariate quality characteristics are of great interest to quality control practitioners.
The R package MPCI provides an easy-to-use tool for computing multivariate process capability indices. To our knowledge, MPCI is the first available R software for computing multivariate process capability indices. Our aim is to provide useful tools for performing multivariate process capability analysis. In such a way we hope to contribute to the diffusion of multivariate process capability analysis. MPCI is still undergoing active development. Planned extensions include the multivariate indices based on the proportion of nonconforming items -see Chen (1994) and Chen et al. (2003) -and several new capability indices Pan and Lee (2010) recently proposed.