An Excel Add-In for Statistical Process Control Charts

Statistical process control (SPC) describes a widely used set of approaches for detecting shifts in processes in, for example, manufacturing. Among these are “control charts”. Control charts and other SPC techniques have been in use since at least the 1950s and, because they are comparatively unsophisticated, are often used by management or operations personnel without formal statistical training. These personnel will often have experience with the popular spreadsheet program Excel, but may have less training with a mainstream statistical package. Base Excel does not provide the ability to draw control charts directly, although add-ins for that purpose are available for purchase. We present a free add-in for Excel that draws the most common sorts of control charts. It follows the development of the textbook of Montgomery (2005), so it may be well suited for instructional purposes.

1. The Excel spreadsheet program and control charts

1.1. Introduction
This paper describes a free, open-source add-in for the popular Excel spreadsheet program that draws control charts. The add-in should allow non-technical users to produce charts of interest; since it follows many of the directives in Montgomery's (2005) textbook, it may also be useful for instruction.

Control charts
A control chart is a graph with time on the X-axis and the output of a particular process on the Y-axis. The chart, together with control limits established from probability theory, provides a visual cue to the stability of the process. When a point on the chart lands outside the control limits, or when certain anomalous behaviours are observed, the operators can be alerted to investigate or take other action. Although control charts can be used in many contexts, it will be convenient to imagine the output as representing measurements of quality in manufacturing.

Types of control charts
Our add-in draws eight popular control charts. Charts are described as being relevant either to attribute (categorical) data or to variable (continuous) data. For attribute data we draw p charts, for the proportion of defective items in binomial data; np charts, for the number of defective items in binomial data; c charts, for the number of defects in a fixed-size sample from Poisson data; and u charts, for the number of defects per unit in Poisson data.
For variable data we draw X̄ charts, for the average of a set of continuous measurements; R charts, for the range of a set of continuous measurements; S charts, for the standard deviation of a set of continuous measurements; and X charts, for individual measurements.
In each case we provide appropriate control limits. We also allow the user to identify observations that fall outside control limits and to redraw the graph excluding those points. For attribute data we implement a set of so-called "trend rules," which call for observations to be flagged for further examination if they form patterns obeying certain criteria. For example, one rule says that a pattern is suspicious if two of three consecutive points fall in the region between $\bar{x} + 2s$ and $\bar{x} + 3s$, that is, beyond the two-sigma line but still inside the upper control limit. Our add-in allows the user to identify when some or all of these patterns are present.
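As an illustration of how such a pattern rule can be checked, the following Python sketch examines every window of three consecutive points and flags those windows in which at least two points fall inside a band on the same side of the centerline. The function name and the band parameters `lower` and `upper` (multiples of the standard deviation) are hypothetical choices for this sketch, not part of the add-in's VBA code; the defaults of 2 and 3 describe the band between the two-sigma line and the control limit, and other bands can be checked by changing them.

```python
from typing import List


# Hypothetical helper for illustration; not the add-in's VBA code.
def two_of_three_flags(x: List[float], center: float, sigma: float,
                       lower: float = 2.0, upper: float = 3.0) -> List[int]:
    """Return the 0-based indices of points that help form a
    'two of three consecutive points in the band' pattern.

    A point is in the band on a given side when its distance from the
    centerline, measured on that side, lies in (lower*sigma, upper*sigma].
    """
    flagged = set()
    for end in range(2, len(x)):
        window = range(end - 2, end + 1)  # three consecutive points
        for side in (+1, -1):             # above, then below, the centerline
            hits = [i for i in window
                    if lower * sigma < side * (x[i] - center) <= upper * sigma]
            if len(hits) >= 2:
                flagged.update(hits)
    return sorted(flagged)
```

For example, `two_of_three_flags([0.0, 2.5, 0.1, 2.6], center=0.0, sigma=1.0)` flags the points at indices 1 and 3, while two points on opposite sides of the centerline never combine into a violation.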

Implementation
The add-in is implemented in Visual Basic for Applications (see, for example, Walkenbach 1999), so no compiled code is necessary. We believe this will make the add-in easy to install and use, even for users whose machines are constrained by, for example, security considerations. Implementation in Excel makes the add-in accessible to that program's huge user base and means that the familiar spreadsheet interface, rather than specialized software, can be used in courses on quality assurance. Our use of Excel's well-developed drawing functions produces graphs with a look that should be familiar to many users.

Excel versions prior to 2007
A user with a version of Excel prior to Excel 2007 starts the installation process by downloading the add-in file to a location on his or her computer. He or she then selects Tools | Add-Ins, clicks Browse to locate the add-in, and presses OK to complete the installation. Subsequently, a Control Charts item will appear under the Tools menu; selecting that entry produces the start-up screen.

Excel 2007
Installation of add-ins in Excel 2007 also starts with the user downloading the add-in file to a known location on his or her computer. He or she should then click the Office button in the top-left corner of the screen. Choosing Excel Options at the bottom of that menu opens the Excel Options page, where Add-Ins can be chosen in the left panel. With Excel Add-Ins selected in the Manage drop-down list at the bottom of the page, the user clicks Go to open the Add-Ins window, then Browse to locate the add-in file, and then OK to complete the installation. In this version of Excel the add-in is made available under the Add-Ins tab, where an entry named Control Charts will be visible under Menu Commands. Clicking that entry will produce the start-up screen.

Organization
The next section describes the user interface and the way that the user should organize his or her data in order to use the add-in. Sections 3 and 4 give brief descriptions of the charts, for users unfamiliar with the details of their construction. Finally, Section 5 gives conclusions and directions for further development.
2. The user interface

2.1. Example: Attribute data
Figure 1 shows the window that appears when the add-in is invoked. This window is intended for attribute data; a user who needs a chart for variable data will click the Variables button in the top center of the window (see Section 4). The data for an attribute control chart will consist of a set of counts of defective items and a set of sample sizes. The data behind the form is taken from Table 6-7 of Montgomery (2005) and counts the number of defects found in constant-size sets of printed circuit boards. Here the user has requested a c chart.

Data and sample size
The user enters a reference to a column of count data, giving the number of defective units (or defects) produced, into the Enter Count Data field, most usually by clicking in that box and then highlighting the relevant area of the spreadsheet with the mouse. If the sample size is constant for all observations, that number can be typed directly into the Enter sample size box; alternatively, the user can use the mouse to select a column of sample sizes of the same length as the count column. The c chart measures counts (rather than rates) and therefore treats every observation as the count in a sample of size 1 unit, so for that chart (as shown in Figure 1) no sample size need be entered.
The count data should generally be the rightmost column of interest on the sheet, since by default results are placed in a set of columns starting immediately to the right of the counts (see also below).

Choosing the chart
In the top right portion of the window the user selects the sort of chart desired. For p and np charts, the user can also enter a value of the p parameter in the box underneath. For c and u charts the box's label changes (as in Figure 1) to remind the user that in this case it is the c parameter that can be entered. In either case the relevant parameter is estimated from the data if the box is left blank.

Control limits
Across the middle of the window is a set of three checkboxes that allow the user to draw the control limits (UCL/LCL for "upper control limit" and "lower control limit"), lines at ±2 SD, and lines at ±1 SD. For attribute data all three of these boxes are checked by default.

Trend rules
Beneath the control limit checkboxes is a set of three radio buttons that allow the user to apply different subsets of control rules. The Apply Western Electric Rules option applies each of a set of eight rules listed in Montgomery (2005, p. 167). (Technically the term "Western Electric rules" more commonly refers just to the first four of these, but we have kept the phrase to evoke the entire set.) When this option is selected, the program will "flag" points that produce violations of any of the eight rules. When the UCL/LCL only choice is selected, only points that fall outside the lower or upper control limits will be flagged, and, of course, No Rules means that no points will be flagged. When two plots are produced simultaneously (for example, X̄ and R charts), the rules are applied to each separately.
The Modify Rules button brings up a separate window that allows the user to choose a subset of the rules to be applied. Figure 2 shows the Modify Rules window. In the current version of the add-in, this allows users to select the set of rules that are to be applied when the Apply Western Electric Rules option is selected. In future versions we envision giving the user slightly more specific control over individual rules, allowing him or her, for example, to change the "six" in rule 5 ("Six points...in a row steadily increasing or decreasing") to "seven".
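The kind of per-rule control envisioned above amounts to turning a fixed constant into a parameter. In the hypothetical Python function below, `run_length=6` reproduces rule 5 as quoted and `run_length=7` gives the "seven" variant; the function and its name are illustrative only and are not taken from the add-in.

```python
from typing import List


# Hypothetical helper for illustration; not the add-in's VBA code.
def steady_trend_flags(x: List[float], run_length: int = 6) -> List[int]:
    """Flag 0-based index i when x[i] ends a run of `run_length` points that
    are strictly increasing or strictly decreasing.

    `run_length` counts points, not steps: six points contain five increases.
    """
    flags = []
    up = down = 1  # lengths of the increasing / decreasing runs ending at i
    for i in range(len(x)):
        if i > 0:
            up = up + 1 if x[i] > x[i - 1] else 1
            down = down + 1 if x[i] < x[i - 1] else 1
        if up >= run_length or down >= run_length:
            flags.append(i)
    return flags
```

With the default, `steady_trend_flags([1, 2, 3, 4, 5, 6])` flags only the sixth point (index 5); a flat stretch resets both run counters, so ties never count as a trend.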

Excluding points
When one or more points fall outside a control limit (or violate one of the other rules) it is common to investigate them, to see if the violation can be ascribed to some known cause. Measurements produced when the process is known to be out of control are often excluded and the chart redrawn. By entering an additional column of data into the points to exclude box, the user can direct that points with entries in that column be excluded from any calculations. Two such points have been excluded in Figure 1. Figure 3 shows the results of pressing Run in Figure 1. Notice that the excluded points still appear on the graph, marked with a special symbol; they are still flagged if they contribute to a violation of a relevant trend rule.
The "1" in Exclude 1's only refers to the control limit violation rule. When a point lands outside the control limits it is flagged with a "1," since rule number 1 has been violated. When the Exclude 1's only box is checked, the user is asking to exclude only points flagged in that way. Points that violate other rules, and which are therefore flagged with a different number, continue to be included in the computations when the Exclude 1's only box is checked.
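To make the Exclude 1's only behaviour concrete, here is a minimal Python sketch for a c chart (chosen only because its limits are short to write): points outside the three-sigma limits receive flag 1, and the centerline and limits are recomputed without them, on top of any points the user has already excluded. The function names are hypothetical, and the sketch performs a single pass, so any violations exposed by the narrower limits would need another run.

```python
import math
from typing import List, Set, Tuple


# Hypothetical helpers for illustration; not the add-in's VBA code.
def c_limits(counts: List[float], excluded: Set[int]) -> Tuple[float, float, float]:
    """Centerline and 3-sigma limits of a c chart, ignoring excluded indices."""
    used = [c for i, c in enumerate(counts) if i not in excluded]
    cbar = sum(used) / len(used)
    return cbar, cbar - 3 * math.sqrt(cbar), cbar + 3 * math.sqrt(cbar)


def refit_excluding_ones(counts, user_excluded=()):
    """One pass of 'Exclude 1's only': flag rule-1 violations (points outside
    the control limits) and recompute the limits without them."""
    excluded = set(user_excluded)
    _, lcl, ucl = c_limits(counts, excluded)
    ones = {i for i, c in enumerate(counts) if c < lcl or c > ucl}
    return c_limits(counts, excluded | ones), ones
```

With counts `[4, 5, 3, 4, 20, 5, 4, 3, 5, 4]`, the first-pass upper limit is about 12.9, so only index 4 is flagged with a 1, and the refitted centerline becomes 37/9.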
In the current version of the add-in, results are placed immediately to the right of the data. Therefore the points to exclude column should not be placed immediately to the right of the data; doing so will overwrite that column and confuse the algorithm. We recommend placing the column of points to exclude to the left of the data.

Result location
By default, results are presented in a set of columns immediately to the right of the data in the workbook. The location of results box is intended to let the user place results elsewhere; in the current version of the add-in, however, this entry is ignored.

Figure 4 shows the window that allows users to draw control charts for variables. The changes from the attribute case are largely minor. For variable data, we expect a matrix of data, each of whose rows describes a set of observations from a particular run (for example, from a manufactured lot). The number of columns in the data matrix then gives the size of the largest sample in the data; in some rows, some entries may be blank if sample sizes differ.

Variable data
Since there are two parameters in these charts (µ and σ), the user may enter one or both in the box labeled Enter mu (comma) sigma (or blank). As the label indicates, both values can be entered, separated by a comma. A single number is interpreted as a value of µ, with σ to be estimated; a comma followed by a number, conversely, is taken to be a value of σ, with µ to be estimated. Leaving the box blank asks that both be estimated.
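The rules for reading this box fit in a few lines of Python. The function name and the use of `None` to mean "estimate this parameter from the data" are assumptions made for this sketch; the add-in itself does this parsing in VBA.

```python
from typing import Optional, Tuple


# Hypothetical helper for illustration; not the add-in's VBA code.
def parse_mu_sigma(text: str) -> Tuple[Optional[float], Optional[float]]:
    """Interpret the 'mu (comma) sigma' box; None means 'estimate from data'."""
    text = text.strip()
    if not text:
        return None, None                    # blank: estimate both
    if "," not in text:
        return float(text), None             # single number: mu only
    mu_part, sigma_part = text.split(",", 1)
    mu = float(mu_part) if mu_part.strip() else None
    sigma = float(sigma_part) if sigma_part.strip() else None
    return mu, sigma
```

So `"74"` yields `(74.0, None)`, `",0.01"` yields `(None, 0.01)`, and `"74, 0.01"` supplies both parameters.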
For variable data, only upper and lower control limits can be drawn, and the Western Electric trend rules cannot be imposed. Figure 4 shows, on the spreadsheet behind the add-in window, the data in Example 5-1 of Montgomery (2005). For this example, the user has made the default choice of X̄/R (that is, she has asked to produce the X̄ and R charts) and, by leaving the Enter mu box blank, has asked that µ and σ be estimated. Figure 5 shows the X̄ and R charts produced from these data.

Preparing the data
For attribute data, the data representing the counts of defective items needs to be arrayed in a column. The computations are performed on the sheet, so they are visible after the chart has been drawn. This has the advantage of making the computations more transparent to the user, though it also means that if the values in the new cells change (perhaps because the user redraws the chart with updated information), the graph will change as well. Figure 2 shows the way the data from Montgomery (2005, Table 6-4) would be laid out. Note that the sample size data is located to the left of the column giving the number of nonconforming units.
For variable data, the user passes in a matrix of observations, each row giving the observations for a particular sample. If the matrix of observations has only one column, we produce the X chart for individual observations (described below). No sample size needs to be supplied for variable data, since that information is carried in the number of non-empty entries in each row.
In both cases, results appear in the spreadsheet in one of two ways. In some cases numbers are computed inside Visual Basic for Applications (VBA) and then displayed as numbers on the spreadsheet; for example, the lower control limit for the X̄ chart is computed internally and displayed as a number. In other cases VBA creates formulas and relies on the underlying spreadsheet to evaluate them; for example, the "X-bar" column on the same chart is computed through a call to Excel's built-in AVERAGE function. This choice was made for programming convenience, but it means that if the user changes the underlying data, some columns will change and others will not. In particular, the "Flags" columns are computed in VBA and do not update when the data changes. Consequently users cannot change the data and expect the results to be updated automatically; instead, they will need to re-run the add-in to produce an updated chart.
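A rough Python picture of how sample sizes fall out of this layout: each row of the matrix is one sample, blank cells are represented here as `None` (an assumption of this sketch), and n is simply the count of non-empty cells. The standard deviation uses the n - 1 denominator, as `statistics.stdev` does. The type and function names are hypothetical.

```python
import statistics
from typing import List, NamedTuple, Optional


# Hypothetical types/helpers for illustration; not the add-in's VBA code.
class SampleSummary(NamedTuple):
    n: int               # number of non-empty cells in the row
    mean: float
    sample_range: float
    sd: float            # sample standard deviation; nan when n == 1


def summarize_rows(rows: List[List[Optional[float]]]) -> List[SampleSummary]:
    """Per-row statistics for a data matrix in which blank cells are None."""
    out = []
    for row in rows:
        values = [v for v in row if v is not None]  # assumes >= 1 value per row
        n = len(values)
        sd = statistics.stdev(values) if n > 1 else float("nan")
        out.append(SampleSummary(n, sum(values) / n, max(values) - min(values), sd))
    return out
```

For `[[1.0, 2.0, 3.0], [4.0, 6.0, None]]` the sample sizes come out as 3 and 2 without being supplied separately.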

Statistical overview: Charts for attribute data
This section gives a brief overview of the control charts produced by the add-in, for practitioners who might be unfamiliar with their use. In each case we envision drawing a random sample from some distribution and counting the number of defective items in the sample (for attribute data) or measuring each item in the sample with a continuous measurement (for variable data). The following paragraphs describe the different charts.

The p chart
The p chart helps practitioners look at variations in the rate at which defective items are produced. We envision that sample j consists of $n_j$ observations, each of which is either defective or not defective. The sample proportion of defectives within each lot, $\hat{p}_j$, is computed in the usual way, and the chart plots these $\hat{p}_j$ values against sample number. The centerline of the graph is given by $\bar{p}$, which is the defect rate from all samples combined (or the value of p provided by the user, if there is one). Control lines for sample j are plotted at $\bar{p} \pm \hat{\sigma}_j$, $\bar{p} \pm 2\hat{\sigma}_j$, and $\bar{p} \pm 3\hat{\sigma}_j$, where $\hat{\sigma}_j = \sqrt{\bar{p}(1 - \bar{p})/n_j}$ is the usual estimate of the standard deviation of $\hat{p}_j$. Very often all the sample sizes will be the same, in which case all the control lines will be horizontal.
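The p chart quantities just described can be sketched in Python as follows. The function name is hypothetical, and the sketch returns the one-, two-, and three-sigma lines separately for each sample so that unequal sample sizes produce the stepped limits mentioned above.

```python
import math
from typing import Dict, List, Optional, Tuple


# Hypothetical helper for illustration; not the add-in's VBA code.
def p_chart(defectives: List[int], sizes: List[int], p: Optional[float] = None):
    """Return (p-bar, plotted proportions, per-sample {k: (lower, upper)} lines)."""
    pbar = p if p is not None else sum(defectives) / sum(sizes)  # pooled rate
    points = [d / n for d, n in zip(defectives, sizes)]
    lines: List[Dict[int, Tuple[float, float]]] = []
    for n in sizes:
        sigma_j = math.sqrt(pbar * (1 - pbar) / n)
        lines.append({k: (pbar - k * sigma_j, pbar + k * sigma_j) for k in (1, 2, 3)})
    return pbar, points, lines
```

A negative lower line is returned as computed; truncating it at zero is a common convention that this sketch does not apply.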

The np chart
The np chart is simply a p chart (for the constant sample size case) in which everything has been multiplied by n. This means that the chart's vertical axis is expressed in terms of numbers of defectives per lot, rather than in defective rate, which might be easier for nontechnical personnel to interpret. The np chart is not appropriate for variable sample-size data, however, since defect counts will naturally be expected to increase with sample size. This gives rise to a variable centerline, which can be difficult to interpret. The add-in will not permit such a chart to be drawn.
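Multiplying the p chart's quantities by n gives the np chart's centerline $n\bar{p}$ and three-sigma limits $n\bar{p} \pm 3\sqrt{n\bar{p}(1 - \bar{p})}$. The hypothetical Python function below accepts only a single n, which mirrors the add-in's refusal to draw this chart for variable sample sizes.

```python
import math
from typing import List, Optional, Tuple


# Hypothetical helper for illustration; not the add-in's VBA code.
def np_chart(defectives: List[int], n: int,
             p: Optional[float] = None) -> Tuple[float, float, float]:
    """Centerline and 3-sigma limits for an np chart with constant sample size n."""
    pbar = p if p is not None else sum(defectives) / (n * len(defectives))
    center = n * pbar
    sd = math.sqrt(n * pbar * (1 - pbar))  # equals n times the p chart's sigma
    return center, center - 3 * sd, center + 3 * sd
```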

The c chart
The c chart counts the number of defects (not defective items) in a particular sample. This would be appropriate for data arising from a Poisson process, for example. We require a constant sample size (so as to ensure a constant centerline), and we record the number of defects $c_j$ in each sample. The plot's centerline is set to a value $\bar{c}$, which is either the given value c or, if no value is supplied, the average of the $c_j$. Following the Poisson assumption, we estimate the standard deviation of each $c_j$ as $\sqrt{\bar{c}}$ and draw control lines at $\bar{c} \pm \sqrt{\bar{c}}$, $\bar{c} \pm 2\sqrt{\bar{c}}$, and $\bar{c} \pm 3\sqrt{\bar{c}}$.
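A direct Python transcription of these c chart lines, as a sketch with a hypothetical function name:

```python
import math
from typing import Dict, List, Optional, Tuple


# Hypothetical helper for illustration; not the add-in's VBA code.
def c_chart(counts: List[int],
            c: Optional[float] = None) -> Tuple[float, Dict[int, Tuple[float, float]]]:
    """Centerline c-bar and the lines c-bar -/+ k*sqrt(c-bar) for k = 1, 2, 3."""
    cbar = c if c is not None else sum(counts) / len(counts)
    sd = math.sqrt(cbar)  # Poisson: variance equals the mean
    return cbar, {k: (cbar - k * sd, cbar + k * sd) for k in (1, 2, 3)}
```

With a supplied c of 9, the standard deviation is 3 and the three-sigma lines fall at 0 and 18.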

The u chart
The u chart measures the defect rate, rather than the count, in Poisson-type data. For a constant sample size n, then, it is simply a c chart in which everything has been divided by n. When sample sizes vary and no centerline is provided, the centerline $\bar{u}$ is set at the total number of defects divided by the total sample size. Control lines for sample j are then drawn at $\bar{u} \pm k\sqrt{\bar{u}/n_j}$ for k = 1, 2, 3.
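A Python sketch of the u chart with varying sample sizes; the function name is hypothetical, and each sample gets its own set of lines because $n_j$ enters the standard deviation.

```python
import math
from typing import Dict, List, Optional, Tuple


# Hypothetical helper for illustration; not the add-in's VBA code.
def u_chart(defects: List[int], sizes: List[float], u: Optional[float] = None):
    """Return (u-bar, plotted rates, per-sample {k: (lower, upper)} lines)."""
    ubar = u if u is not None else sum(defects) / sum(sizes)  # total defects / total size
    rates = [d / n for d, n in zip(defects, sizes)]
    lines: List[Dict[int, Tuple[float, float]]] = [
        {k: (ubar - k * math.sqrt(ubar / n), ubar + k * math.sqrt(ubar / n))
         for k in (1, 2, 3)}
        for n in sizes
    ]
    return ubar, rates, lines
```

For example, with $\bar{u} = 1$ a sample of size 4 gets three-sigma lines at -0.5 and 2.5, while a sample of size 1 gets the wider pair -2 and 4.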

Statistical overview: Charts for variable data

X̄ and R chart, σ supplied
The X̄ chart is used when the data consist of a set of samples of continuous measurements, each measurement assumed to be independent and identically distributed (iid) from some distribution. The centerline of the chart is set to a user-supplied µ, if there is one, or otherwise to the overall average of all the observations, denoted $\bar{\bar{x}}$. If σ is supplied, control limits are conventionally drawn at µ (or $\bar{\bar{x}}$) $\pm 3\sigma/\sqrt{n}$, where n is the sample size.
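For the σ-supplied case the X̄ chart reduces to a couple of lines. The Python sketch below assumes equal sample sizes, and its function name is hypothetical.

```python
import math
from typing import List, Optional, Tuple


# Hypothetical helper for illustration; not the add-in's VBA code.
def xbar_chart_known_sigma(samples: List[List[float]], sigma: float,
                           mu: Optional[float] = None) -> Tuple[float, float, float]:
    """Centerline and 3-sigma limits for an X-bar chart when sigma is supplied.

    Assumes every sample has the same size n.
    """
    n = len(samples[0])
    all_obs = [x for s in samples for x in s]
    center = mu if mu is not None else sum(all_obs) / len(all_obs)  # grand mean
    half_width = 3 * sigma / math.sqrt(n)
    return center, center - half_width, center + half_width
```

With samples of size 4 and σ = 1, the limits sit 1.5 units on either side of the centerline.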
The R chart plots the ranges of the individual samples. The random variable $W = R/\sigma$, based on an iid sample from a normal distribution, itself has a known distribution whose mean and standard deviation are denoted by $d_2$ and $d_3$, respectively. These values, which depend only on sample size, have been tabulated in a number of places. We have taken the values of these constants (and others described below) from a package in the R statistical environment (R Development Core Team 2009) named qcc, due to Scrucca (2004).
When σ is given, the centerline of the chart is set at $d_2\sigma$, and the upper and lower control limits at $d_2\sigma \pm 3d_3\sigma$. Often the values $d_2 - 3d_3$ and $d_2 + 3d_3$ are written $D_1$ and $D_2$ and tabulated separately, so that the limits are $D_1\sigma$ and $D_2\sigma$.

X̄ and R chart, σ not supplied
When no σ is supplied, it can be estimated. One way to do this is to note that, since $d_2$ is the mean of the distribution of $R/\sigma$, $\bar{R}/d_2$ serves as an estimate of σ, where $\bar{R}$ is the average of the sample ranges. This estimate (call it $\hat{\sigma}$) is in fact unbiased (Montgomery 2005). Then the X̄ chart can be drawn with centerline at $\bar{\bar{x}}$ and control limits at $\bar{\bar{x}} \pm 3(\bar{R}/d_2)/\sqrt{n}$. The constant $3/(d_2\sqrt{n})$ is denoted by $A_2$, so that the limits can be written $\bar{\bar{x}} \pm A_2\bar{R}$; again we have taken values of this constant from R.
Meanwhile, the standard deviation of the range R is $d_3\sigma$, which can be estimated by $d_3(\bar{R}/d_2)$. The R chart is therefore drawn with the centerline at $\bar{R}$ and the control limits at $\bar{R} \pm 3d_3(\bar{R}/d_2)$. The symbols $D_3$ and $D_4$ are often used to denote $1 - 3(d_3/d_2)$ and $1 + 3(d_3/d_2)$; in that case the control limits of the R chart can be written $D_3\bar{R}$ and $D_4\bar{R}$.
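The following Python sketch hard-codes the commonly tabulated values of $d_2$ and $d_3$ for n = 2 to 10 (the add-in itself obtains its constants from qcc) and computes the R chart lines when σ is given. As a cross-check it also evaluates $d_2$ as the expected range of n standard normal observations, $\int_{-\infty}^{\infty} [1 - \Phi(x)^n - (1 - \Phi(x))^n]\,dx$, by the trapezoidal rule. The function names are hypothetical, and truncating the lower limit at zero is a choice made in this sketch.

```python
import math
from typing import Tuple

# Commonly tabulated d2 and d3 for sample sizes 2..10 (illustrative copies;
# the add-in takes its constants from R's qcc package).
d2_table = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534,
            7: 2.704, 8: 2.847, 9: 2.970, 10: 3.078}
d3_table = {2: 0.853, 3: 0.888, 4: 0.880, 5: 0.864, 6: 0.848,
            7: 0.833, 8: 0.820, 9: 0.808, 10: 0.797}


# Hypothetical helper for illustration; not the add-in's VBA code.
def r_chart_known_sigma(n: int, sigma: float) -> Tuple[float, float, float]:
    """Centerline d2*sigma and limits d2*sigma -/+ 3*d3*sigma.

    The lower limit is truncated at zero because a range cannot be negative.
    """
    center = d2_table[n] * sigma
    spread = 3 * d3_table[n] * sigma
    return center, max(0.0, center - spread), center + spread


def d2_by_integration(n: int, lo: float = -10.0, hi: float = 10.0,
                      steps: int = 20000) -> float:
    """Expected range of n iid N(0, 1) values via the trapezoidal rule."""
    def integrand(x: float) -> float:
        phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
        return 1.0 - phi ** n - (1.0 - phi) ** n
    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + i * h) for i in range(1, steps))
    return total * h
```

For n = 2 the integral reproduces the exact value $2/\sqrt{\pi} \approx 1.128$, and for n = 5 it agrees with the tabulated 2.326 to about three decimal places.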
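Collecting these formulas, a Python sketch of the σ-estimated X̄ and R charts for equal sample sizes computes $A_2 = 3/(d_2\sqrt{n})$, $D_3$, and $D_4$ from tabulated $d_2$ and $d_3$. The function name is hypothetical, and truncating $D_3$ at zero (so that the lower R limit is never negative) is a choice of this sketch, one that matches the zero entries commonly printed for small n.

```python
import math
from typing import Dict, List

# Commonly tabulated d2 and d3 for sample sizes 2..10 (illustrative copies).
d2_table = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534,
            7: 2.704, 8: 2.847, 9: 2.970, 10: 3.078}
d3_table = {2: 0.853, 3: 0.888, 4: 0.880, 5: 0.864, 6: 0.848,
            7: 0.833, 8: 0.820, 9: 0.808, 10: 0.797}


# Hypothetical helper for illustration; not the add-in's VBA code.
def xbar_r_estimated(samples: List[List[float]]) -> Dict[str, float]:
    """X-bar and R chart lines with sigma estimated by R-bar / d2.

    Assumes all samples have the same size n, with 2 <= n <= 10.
    """
    n = len(samples[0])
    d2, d3 = d2_table[n], d3_table[n]
    xbarbar = sum(sum(s) / n for s in samples) / len(samples)
    rbar = sum(max(s) - min(s) for s in samples) / len(samples)
    a2 = 3 / (d2 * math.sqrt(n))
    big_d3 = max(0.0, 1 - 3 * d3 / d2)  # D3, truncated at zero
    big_d4 = 1 + 3 * d3 / d2            # D4
    return {"A2": a2, "D3": big_d3, "D4": big_d4,
            "xbar_center": xbarbar,
            "xbar_lcl": xbarbar - a2 * rbar, "xbar_ucl": xbarbar + a2 * rbar,
            "r_center": rbar, "r_lcl": big_d3 * rbar, "r_ucl": big_d4 * rbar}
```

For n = 5 this gives $A_2 \approx 0.577$, $D_3 = 0$, and $D_4 \approx 2.114$, in line with the usual tables.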

X̄ and S chart, σ supplied
The R chart is not well suited to variable sample sizes, since in that case it produces a changing centerline. When sample sizes vary, or when they are large (say, bigger than around 10 or 12), Montgomery (2005) recommends using an X̄ chart based directly on the sample standard deviations, together with an S chart, which shows the changing values of the sample standard deviations over time. When a standard value of σ is supplied, the X̄ chart is drawn as before.
Our estimate of the standard deviation for a single sample is the usual one based on the unbiased estimator of variance, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. Of course s is not unbiased for σ; we write $E(s) = c_4\sigma$, where $c_4$ is another constant that depends on sample size. (This one we compute directly in our VBA code, using the expression in Montgomery (2005, p. 95).)
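For reference, the standard closed form for this constant is $c_4 = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$. The Python version below is a sketch of that textbook formula, not the add-in's VBA code, and we assume it is equivalent to the expression cited; it works with log-gamma so that large n does not overflow.

```python
import math


# Sketch of the standard closed form for c4; not the add-in's VBA code.
def c4(n: int) -> float:
    """E(s) / sigma for a normal sample of size n >= 2:
    c4 = sqrt(2 / (n - 1)) * Gamma(n / 2) / Gamma((n - 1) / 2)."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)
```

For n = 2 this equals $\sqrt{2/\pi} \approx 0.7979$, for n = 5 it is about 0.9400, and it approaches 1 as n grows.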
X̄ and S chart, σ not supplied
When no value of σ is given, we compute $\bar{s}$, the average of the sample standard deviations (weighted by sample size if necessary), and estimate σ by $\bar{s}/c_4$. The X̄ chart is drawn with the centerline at $\bar{\bar{x}}$ and control limits at $\bar{\bar{x}} \pm 3(\bar{s}/c_4)(1/\sqrt{n}) = \bar{\bar{x}} \pm A_3\bar{s}$, where $A_3 = 3/(c_4\sqrt{n})$ is another constant that can be looked up in a table or computed from $c_4$.
In this case the S chart is drawn with centerline at $\bar{s}$ and control limits at $\bar{s} \pm 3(\bar{s}/c_4)\sqrt{1 - c_4^2} = (B_3\bar{s}, B_4\bar{s})$ for suitably chosen $B_3$ and $B_4$, namely $B_3 = 1 - 3\sqrt{1 - c_4^2}/c_4$ and $B_4 = 1 + 3\sqrt{1 - c_4^2}/c_4$.
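Putting the pieces together, here is a Python sketch of the σ-estimated X̄ and S charts for equal sample sizes, using $A_3$, $B_3$, and $B_4$ as defined above. The function names are hypothetical. Truncating $B_3$ at zero (a standard deviation cannot be negative) and restricting to equal sample sizes, so that $\bar{s}$ is an unweighted mean, are simplifying choices of this sketch; the add-in also handles unequal sizes through a weighted $\bar{s}$.

```python
import math
import statistics
from typing import Dict, List


def c4(n: int) -> float:
    """Standard closed form for E(s) / sigma (see the formula above)."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


# Hypothetical helper for illustration; not the add-in's VBA code.
def xbar_s_estimated(samples: List[List[float]]) -> Dict[str, float]:
    """X-bar and S chart lines with sigma estimated by s-bar / c4.

    Assumes all samples share one size n >= 2.
    """
    n = len(samples[0])
    xbarbar = statistics.fmean(statistics.fmean(s) for s in samples)
    sbar = statistics.fmean(statistics.stdev(s) for s in samples)
    k = c4(n)
    a3 = 3 / (k * math.sqrt(n))
    spread = 3 * math.sqrt(1 - k * k) / k
    b3, b4 = max(0.0, 1 - spread), 1 + spread  # B3 truncated at zero
    return {"A3": a3, "B3": b3, "B4": b4, "xbar_center": xbarbar,
            "xbar_lcl": xbarbar - a3 * sbar, "xbar_ucl": xbarbar + a3 * sbar,
            "s_center": sbar, "s_lcl": b3 * sbar, "s_ucl": b4 * sbar}
```

For n = 5 the constants come out near $A_3 = 1.427$, $B_3 = 0$, and $B_4 = 2.089$.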
For all four of these charts, users should be aware that the algorithm estimates variance by pooling variability within samples. If the samples are taken within batches produced within lots, there can be an extra (batch-to-batch) source of variability that the algorithm will not capture, and, as a result, the control limits will be too narrow. See Montgomery (2005), exercise 5-70, for an example.

X chart for individual observations
When it is necessary to draw a control chart for individual measurements, it is common to estimate the process variability from the moving range and its average. That is, we define $MR_i = |x_i - x_{i-1}|$ and compute the average of these, $\overline{MR} = \frac{1}{n-1}\sum_{i=2}^{n} MR_i$. Then the control chart has its centerline at $\bar{x}$ and control limits at $\bar{x} \pm 3\overline{MR}/d_2$, with $d_2$ taken for samples of size 2. For this chart extra care must be taken to track exclusions; if, for example, observation 5 is excluded, we must ensure that neither $|x_5 - x_4|$ nor $|x_6 - x_5|$ is used in the calculation of $\overline{MR}$.
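A Python sketch of the individuals chart with exclusions follows. It uses $d_2 = 1.128$, the tabulated value for ranges of two observations, and indexes observations from 0 rather than 1. It also does not form a replacement range across an excluded point (for example $|x_6 - x_4|$ when observation 5 is dropped); that is a choice of this sketch rather than something specified above. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple

D2_FOR_N2 = 1.128  # d2 for moving ranges of two consecutive observations


# Hypothetical helper for illustration; not the add-in's VBA code.
def individuals_chart(x: List[float],
                      excluded: Iterable[int] = ()) -> Tuple[float, float, float, float]:
    """Return (x-bar, MR-bar, LCL, UCL) for an individuals chart.

    Indices are 0-based. An excluded observation is left out of x-bar, and every
    moving range that touches it is dropped; ranges are not bridged across the gap.
    """
    skip = set(excluded)
    kept = [v for i, v in enumerate(x) if i not in skip]
    xbar = sum(kept) / len(kept)
    mrs = [abs(x[i] - x[i - 1]) for i in range(1, len(x))
           if i not in skip and i - 1 not in skip]
    mrbar = sum(mrs) / len(mrs)
    half_width = 3 * mrbar / D2_FOR_N2
    return xbar, mrbar, xbar - half_width, xbar + half_width
```

In the series `[1, 2, 1, 2, 100, 2, 1]`, excluding index 4 removes both moving ranges of 98 that would otherwise inflate $\overline{MR}$, leaving $\overline{MR} = 1$.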

Conclusion
We have demonstrated an add-in to the widely used spreadsheet program Excel that provides control chart functionality, previously available only in commercial products, in a free, open-source form. The add-in draws many of the major sorts of process control charts for attribute and variable data with a straightforward interface. It follows the development in the textbook of Montgomery (2005) and is suitable for use in a course on quality control methods.

Future development
Future releases of the add-in will concentrate on improvements in two areas. First, it should be possible to suppress the display of the computed values (control limits and so on), which currently appear on the spreadsheet behind the chart; if these results are desired, the user should be able to select the location in which they are displayed. Second, the ability to modify the trend rules (not just select a subset of them) might be useful.
Beyond user interface issues, the addition of the $s^2$, cumulative sum (CUSUM), and exponentially weighted moving average (EWMA) charts would complete the set of univariate charts detailed in Montgomery (2005). The add-in might be even more useful in an instructional setting if it incorporated some of the process control tools not related to control charts, such as a calculator to implement the ANSI sampling standards.