Integrated Degradation Models in R Using iDEMO

Degradation models are widely used to assess the lifetime information for highly reliable products with quality characteristics whose degradation over time can be related to reliability. The performance of a degradation model largely depends on an appropriate model description of the product's degradation path. The cross-platform package iDEMO (integrated degradation models) is developed in R and the interface is built using the Tcl/Tk bindings provided by the tcltk and tcltk2 packages included with R. It is a tool to build a linear degradation model which can simultaneously consider the unit-to-unit variation, time-dependent structure and measurement error in the degradation paths. The package iDEMO provides the maximum likelihood estimates of the unknown parameters, mean-time-to-failure and q-th quantile, and their corresponding confidence intervals based on the different information matrices. In addition, degradation model selection and goodness-of-fit tests are provided to determine and diagnose the degradation model for the user's current data by the commonly used criteria. By only enabling user interface elements when necessary, input errors are minimized.


Introduction
High quality products are designed and manufactured to function for a long time before they fail. Hence, with only a relatively short period of time available for internal life testing, it is a great challenge for manufacturers to obtain reliability information for their products. Although there are helpful techniques such as censoring and/or accelerating a product's lifetime by testing at a higher level of stress, these techniques offer little assistance for highly reliable products. The major obstacle is the rather difficult problem of obtaining sufficient time-to-failure data to efficiently estimate a product's lifetime. Under such a restraint, if quality characteristics (QCs) do exist whose degradation over time (also called degradation The remainder of this paper is organized as follows. Section 2 briefly introduces the degradation model and the goals that iDEMO is designed to meet. Section 3 describes the functions of iDEMO and the usage of iDEMO is illustrated through the analysis of real data. Finally, concluding remarks and future possible extensions of the package are given. Peng and Tseng (2009) proposed the following model to describe the degradation data. Let Y (t) and L(t), t ≥ 0, denote the observed and the true values of the QC of a product at time t, respectively. Assume that there exists a transformation function such that

Statistical overview: Degradation models
where Θ follows a normal distribution with mean η and variance σ 2 η (denoted by N (η, σ 2 η )) representing the unit-to-unit variation of the products; is the measurement error with N (0, 1); σ B is a diffusion coefficient; and B(t) denotes standard Brownian motion representing a timecorrelated structure. The random effect Θ, the standard Brownian motion B(t) and the measurement error are assumed to be mutually independent. For this degradation model, similar to Lu and Meeker (1993, p. 164), we assume that P{Θ ≤ 0} is negligible in order to avoid the certain probability of getting non-feasible degradation slopes. The advantage of degradation model M 0 is that it allows us to simultaneously consider the unit-to-unit variation with time-dependent structure and measurement error. It is easily seen that if we set the cases (i) σ B = 0, (ii) σ η = σ = 0 and (iii) σ = 0 in the degradation model M 0 , then the model M 0 reduces to (i) the conventional random-effect model M 1 , (ii) the Wiener process model M 2 and (iii) the degradation model M 3 without measurement error, respectively, which had been proposed in Peng and Tseng (2009). In addition to the models M 0 to M 3 , two more degradation models M 4 and M 5 are now available in iDEMO as shown in Table 1. Note that the model M 4 is the traditional regression and used as a benchmark to compare with the other competing degradation models. Furthermore, in order for statistical inference to be possible, the identifiability of the degradation model M 0 is needed and can be established as follows.
Proposition 1. If the measure frequencies are greater than 3, then the degradation model M 0 is identifiable.
The proof of Proposition 1 is presented in Appendix A. This means that different values of the parameter in the degradation model M 0 must generate different probability distributions of the observable variables. See Lehmann and Casella (1998), and Casella and Berger (2001) for details.

Lifetime information
Let ω denote the critical level for the degradation path of model M 0 . The product's lifetime T can then be suitably defined as the first passage time when the true degradation path L(t) crosses the critical level ω; i.e., T = inf{t|L(t) ≥ ω}. The probability density function (PDF) and cumulative distribution function (CDF) of the lifetime distribution T are respectively and where Φ(·) is the CDF of N (0, 1). Note that if the drift rate is η ≥ 0 for the degradation models M 2 and M 5 , we readily perceive that lim t→∞ F T (t) = 1 as derived from (2). The q-th quantile, t(q), of a product's lifetime can then be evaluated by solving F T (t(q)) = q. However, if η ≥ 0 for the degradation models M 0 , M 1 and M 3 , and η < 0 for the degradation models M 0 -M 3 and M 5 , the density function in (1) is improper with the positive probability at infinity and the first passage time has a defective distribution. i.e., For instance, by using L'Hospital rule for the model M 1 , it is easily shown that For this scenario, it is impossible to estimate each and every q-th quantile of a product's lifetime because of lim t→∞ F T (t) ∈ (0, 1). If q ∈ (0, lim t→∞ F T (t)), the q-th quantile of a product's lifetime can still be estimated. Otherwise, instead of the q-th quantile, the product's mean-time-to-failure (MTTF ) is an alternative measure associated with the product's lifetime information. For more discussion of the negative drift rate, see Whitmore (1986), Chhikara and Folks (1989) and the references therein.
For the model M 0 , MTTF is given by where D(z) = exp(−z 2 ) z 0 exp(x 2 )dx is Dawson's integral for all real z. See Peng (2008), Peng and Tseng (2009) and Peng and Hsu (2012) for more details.
In practical application however, the parameters in the degradation model M 0 are unknown. Therefore, in order to assess the product's lifetime information, the likelihood function of the degradation model M 0 can be derived in the following section.

Log-likelihood function and model selection criteria
Assume that n units are tested, and the degradation measurements of each unit are available at time t 1 , . . . , t m in a degradation test. The sample path of the i-th unit at time t j is given by Thus Y i follows a multivariate normal distribution with mean vector ηt, and covariance matrix where and I m is an identity matrix of order m. The log-likelihood function of ϑ = (η, σ 2 η , σ 2 B , σ 2 ) is given by Hence, the maximum likelihood estimates (MLEs)θ of all unknown parameters can be found numerically by using the function optim() in R. Moreover, let r and L(θ|y) be the number of parameters of a degradation model and the value of the log-likelihood function, respectively. For degradation models M 0 to M 5 , Akaike information criterion (AIC), Bayesian information criterion (BIC) and Hannan-Quinn criterion (HQC) are adopted for model selection and defined as AIC = −2L(θ|y) + 2r, BIC = −2L(θ|y) + r ln(n) and HQC = −2L(θ|y) + 2r ln(ln(n)), respectively. The degradation model with minimum AIC (BIC or HQC) is chosen as the best model to fit the degradation data.

Information matrix type
For a given function g with continuous first partial derivatives, let g(ϑ) be the quantity of interest and ∇g(ϑ) = ∂g(ϑ)/∂ϑ. Then, under the large sample case, we have where Avar(g(θ)) = (∇g(ϑ)) I −1 (ϑ)(∇g(ϑ)), and I(ϑ) denotes expected Fisher's information matrix (FIM). That is where I ϑ k ,ϑ l = −E ∂ 2 L(ϑ|y i )/∂ϑ k ∂ϑ l for 1 ≤ k, l ≤ 4. For instance, the approximate variance of the estimated MTTF ( MTTF ) and the estimated q-th quantile (t(q)) of the product can be obtained respectively as and Hence, the (1−α)% confidence intervals (CIs) of the MTTF and q-th quantile can be obtained directly.
In practical applications, if we cannot be certain that the degradation model is correctly specified, then computing valid CIs (or variances) of the interesting quantities becomes a more practical issue. Hence, instead of using the FIM (i.e., nI(ϑ)), we adopt the observed information matrix (OIM) proposed by Boldea and Magnus (2009) to construct the valid CIs of the unknown parameters, MTTF and q-th quantile whether the degradation model is correctly specified or not. More specifically, we can estimate the FIM by is the score vector corresponding to the single observation y i evaluated at ϑ =θ, or by the Hessian matrix based on second-order derivatives, or by the robust matrix When the underlying model is correctly specified, the inverses I −1 1 and I −1 2 are consistent estimators of the asymptotic variance ofθ. Moreover, the inverse I −1 3 is also a consistent estimator of the asymptotic variance ofθ, whether the degradation model is correctly specified or not. See White (1982) and Boldea and Magnus (2009) for more details.

Overview of iDEMO functionality
The package iDEMO is designed not only for advanced users of R, but also for beginners. Analysis can be done using a user-friendly interface and iDEMO was designed to meet the following targets: (i) Degradation model selection is designed to choose the best one among the degradation models M 0 to M 5 for the user's current data based on AIC, BIC and HQC.
(ii) For each degradation model M 0 to M 5 , we provide the MLEs of unknown parameters, product's MTTF , q-th quantile and the corresponding (1 − α)% CIs based on the FIM and/or OIM.
(iii) Use the least square estimate (LSE) to obtain the pseudo failure time (PFT) and then diagnose the goodness of fit for the degradation model with PFTs by using the Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests.
(iv) Six plot types are available in iDEMO: degradation paths, degradation data summary, the PDF and CDF of the lifetime distribution and the probability-probability (PP) and quantile-quantile (QQ) plots for a specific degradation model with PFTs.
We will address the functions in iDEMO sequentially to achieve these goals in the following section.

Data input format
In practical applications, the user needs to prepare a data file to run the program iDEMO.
The data set should be saved as a text file (e.g., .txt, .dat, .prn and so on). The data format has two parts. The first part is the measuring time used in an experiment, which is placed in the first column of the data set. The second part contains the corresponding observations (QC) of each test unit column by column. For example, consider the data file laser.txt (Meeker and Escobar 1998)  There are 15 tested laser devices and the QC of a laser device is its operating current. The measured frequency of its operating current is 250 hours and the experiment was terminated at 4000 hours. When the operating current reaches a predefined threshold level, ω = 10, the device is considered to have failed. In the following subsections, we use the laser data to illustrate the stepwise tutorial of iDEMO.

Starting the iDEMO
Once R is running, simply load iDEMO by typing the command library("iDEMO") into the R console. Typing the command run.gui() in the R console, the iDEMO window shown in Figure 1 appears. The appearance of the screen images may differ under different operating systems. As shown in Figures 1 and 2, three tabbed notebook widgets (Basic information, Parameter estimation and Lifetime information) are designed to provide required/optional information for the degradation data analysis.

Import data
Pressing the Import data button results in the following Open file window as shown in Figure 3. Select the data file and press the Open button. The filename will then appear on the Data set column and the data set that the user entered is now the active data set in the R system. To check the correctness of a data set used in GUI, type the filename (e.g., laser) of the data set in the R console. The data set will then show in the R console.
Note that clicking the button Show the degradation paths will produce the plot of the degradation paths. Figure 4 shows the plot of operating current (QC) over time for the laser data.

Changing the data set
If no data set is imported in GUI, the text <no data> will be shown on the Data set column. When the user clicks on the Data set column, a blank Name list window will appear in Figure 5. To change the data set, assume that we have imported two data sets (data1 and data2). By clicking on the Data set column (the Name list window shown in Figure 5), choosing one (e.g., data1) and pressing the OK button, the data set (data1) can then be used in GUI. Repeating the same procedure, the data set can be changed from data1 to data2.

Degradation data summary
Clicking the button Degradation data summary will show the box plot of the degradation paths for each measurement. Each box contains the extreme of the lower whisker, the lower hinge, the median, the mean, the upper hinge and the extreme of the upper whisker.
For the laser data, Figure 4 shows the box plot of QC over time.

Pseudo failure time estimation
We first obtain the n slopes (η) after fitting a simple linear regression without the intercept term (i.e., LSE) for each degradation path. The threshold ω should be predefined (the default value is 10 and must be a positive value). The PFT estimation can then be estimated by ω/η. This means that the eventual crossing times may be beyond the termination time, Figure 6: The initial setting window.
where we assume that the degradation test has ended. After clicking the Pseudo failure time estimation button, the PFT estimation for the test units will be listed and saved as PFT.name (e.g., PFT.laser) in R.
For the laser data, click the Pseudo failure time estimation button to obtain the PFT as follows.

Degradation model selection
If it is not clear which model should be used, Degradation model selection can help users to choose an appropriate degradation model. Six combinations of variations in the degradation model can be used in iDEMO. After pressing the button Degradation model selection, an Initial setting window will appear as shown in Figure 6. The default values of η, σ η , σ B and σ are 0, 1, 1 and 1, respectively. There are five optimization algorithms used in iDEMO for searching the MLE: Nelder-Mead (default), BFGS, CG, L-BFGS-B and SANN. The user can input different initial values of the unknown parameters (and/or the optimization algorithm) in the Initial setting window to ensure the correctness of the model selection result. Clicking the button OK will list the results of the model selection in the R console, including the optimization algorithm, MLEs of the unknown parameters, value of the log-likelihood function, AIC, BIC and HQC for the models M 0 to M 5 . In addition, ranks are calculated for each criterion.
For the laser data, clicking the button Degradation model selection and the button OK in the Initial setting window will produce the following result of the model selection.

Optimization Algorithm: Nelder-Mead
It is clearly seen that the degradation model M 3 is substantially better than the other models in terms of the three described criteria for the model selection.

Single degradation model analysis
Checking the box Single degradation model analysis, the Run button, the functions in the Parameter estimation and Lifetime information tabs will be enabled. According to the result of the degradation model selection, the user can specify a degradation model in the Parameter estimation tab and choose the related product's lifetime information from the Lifetime information tab.

Model parameters
Since the drift rate η is always needed, only select the variation parameters to decide which degradation model fits the data set. Then give each of them an initial value to search for the MLE. The initial values and algorithm are the same settings as described in Degradation model selection. i.e., the default value of (η, σ η , σ B , σ ) and the default algorithm are (0, 1, 1, 1) and Nelder-Mead, respectively. We recommend using different initial values (and/or the  optimization algorithm) to obtain more stable and more accurate estimates of the unknown parameters. In addition, the parameter estimates of outputs will show up in both R console and the Initial value edit boxes and replace the initial values automatically. Note that if no variation or only variation σ η is checked, the functions of CI and the functions in the Lifetime information tab will be disabled. This is because it is meaningless to have a model that contains no variation; for the latter, the covariance matrix of the corresponding degradation model is singular. Furthermore, if only variation σ is checked (i.e., the traditional regression model M 4 ), according to the definition of the product's lifetime, it does not make sense to have a product's lifetime without variation. Therefore, the functions in the Lifetime information tab will be disabled. Otherwise, six combinations of variations in the degradation model can be arbitrarily chosen.

Confidence interval
The value of the significant level must be between 0 and 1; the default value is 0.05. There are four kinds of information matrices to choose from when constructing the CI. Since the variations σ η , σ B and σ are positive parameters, to avoid obtaining the negative lower bound, we can use the log-transformation to obtain the corresponding CIs. Note that several functions listed in Table 2 are exported by iDEMO. Users can use the idemo.FIM, OIM.hessian, OIM.score and OIM.robust functions with specified values for further studies. Furthermore, if no information matrix is selected, the edit box of the significant level of the CI will be disabled. This means that the output will not contain any CI for the parameters, MTTF and q-th quantile.
For the laser data, based on the model M 3 , check all the boxes in the Parameter estimation tab except for the box Measurement error (σ ). The outputs of analysis can be obtained as follows.

Data: laser
Please wait, analyzing degradation data... Note that LCI (UCI) is the lower (upper) bound of the CI of the corresponding parameter. LCI.ln and UCI.ln are the lower and upper bound of the CI with log-transformation, respectively.

Lifetime distribution
Checking the boxes, Show the plot of the PDF estimation, and, Show the plot of the CDF estimation, will show the plots of the PDF and CDF estimation of the lifetime distribution. Users can choose which plots they would like to view by checking boxes from the GUI. Note that because the range of the CDF is between 0 and 1, the pairwise CIs of the CDF estimation are with logit-transformation. In addition, the reliability and failure (or hazard) rate functions can be obtained from the PDF and CDF in (1) and (2). For example, in R, type R> library("iDEMO") R> Eta <-0.002 R> sEta <-0.00042 R> sB <-0.0108 R> W <-10 R> reliability.fun <-1 -idemo.CDF(t = 4000, Eta, sEta, sB, W) R> reliability.fun [1] 0.8649452 R> failure.rate.fun <-idemo.PDF(t = 4000, Eta, sEta, sB, W)/reliability.fun R> failure.rate.fun Checking the box, Mean-time-to-failure (MTTF), will list the MLE of the exact and approximate product's MTTF evaluated by (3) and (4). Note that if the estimated drift ratê η is negative, it is meaningless to calculate the product's (exact and approximate) MTTF . This is because the slope of the mean degradation path should be positive, otherwise it cannot cross the (prefixed) positive threshold. As for the estimated q-th quantile, suppose that we would like to estimate the median lifetime t(0.5); check the box qth quantile and type 0.5 (the default value) in the qth quantile edit box. Furthermore, we need to specify an interval in the Searching interval for the quantile edit box to search for the target value (q-th quantile). The default interval is from 2000 to 12000 in increments of 50. If the target quantile is not found in the searching interval after running the program, there are two possible settings that should be noticed: (i) the interval is too narrow to search for the target quantile; on the other hand, the inputted target quantile is too large or too small, and (ii) the value of the threshold may be specified incorrectly. Note that this searching interval is also used as the domain of the plots of the PDF and CDF estimation as well as the QQ plot.
For the laser data, based on the model M 3 and checking all boxes in Lifetime Distribution of Lifetime information tab, the results of the exact MTTF , approximate MTTF and median lifetime, along with their corresponding 95% CIs are listed as follows.     It is easily seen that the CIs evaluated by FIM and OIM calculated by the Hessian matrix are almost the same for the MLEs of the parameters, MTTF and median lifetime. This means that the sample size of the laser data is large in comparison to the number of parameters. Furthermore, these results show that even if the degradation model is mis-specified, the CIs computed from the OIM calculated by the robust matrix are narrower than those from the FIM. Note that for the degradation model M 3 , P{Θ ≤ 0} = 5.495135 × 10 −7 is negligible, although the density function of the lifetime distribution is improper (i.e., lim t→∞ F T (t) = 0.9999995). Figure 7 shows the plots of the PDF and CDF estimation of the lifetime distribution, respectively.

Goodness of fit
To diagnose the goodness of fit for the specified degradation model with PFTs, we provide two graphical representations (i.e., PP and QQ plots) and two well-known hypothesis testings (i.e., Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests). If the box Plot the pseudo failure time (PFT) is not checked, the functions of the PP and QQ plots as well as the KS and AD tests will be disabled. This means that no goodness-of-fit outputs will be listed in the R console. Note that if the estimated drift rateη is negative or the PFT estimation contains the negative values (i.e.,η < 0), the functions in the Goodness of Fit will not work because it is meaningless to do so.
For the laser data, based on the model M 3 and checking all the boxes in the Goodness of Fit of Lifetime information tab, the results of the goodness of fit are obtained as follows.

Odds and ends
After finishing the above settings, press the Run button to submit the computational job. If the inputted information is invalid or the data files are incorrectly formatted, there is a warning or an error message shown in the R console, which informs the user to correct the format. If the inputted data passes the examination, the program iDEMO starts to perform the analysis and the message Please wait, analyzing degradation data... will show up in the command line. A prompt sign will appear immediately, but the computation is still proceeding. Please wait until the results appear. Click the Exit button to quit the GUI.

Conclusion and future development
In this paper we have presented the R package iDEMO for analysis of degradation tests. All the features of the GUI in iDEMO have been explained and illustrated through the laser data set. The GUI is designed not only for non-R users but also for advanced users. For the latter, some useful functions built into the package are provided to do further studies. In addition, the disabled property used in the GUI can effectively avoid input errors. The latest version of iDEMO can be downloaded at http://www.stat.sinica.edu.tw/chienyu/ or http://www.idemo.tw/.
Further development in iDEMO for improving the flexibility and applications will focus on two directions.
For practical applications, if the degradation path of a product degrades very slowly, engineers usually use higher-level stress variables (such as temperature, voltage, electric current etc.) to accelerate the degradation path. Typical experiments are accelerated degradation tests (ADTs), step-stress ADT, progressive-stress ADT and so on. See Liao and Tseng (2006), Tseng and Peng (2007), and Peng and Tseng (2010) for details.
For the monotonic degradation path, instead of the Gaussian process fitted in the present paper, we may use other types of processes to fit data like the Hougaard process (Hougaard 1986), which includes stable, gamma and inverse Gaussian processes as special cases. An intuitive advantage of the Hougaard process is that they are strictly increasing, which seems more reasonable for the laser data.

Computational details
All computations and graphics in this paper have been obtained using the R version 2.15.0. Several utility packages have been created to help in the analyzing process. For instance, package gsl (Hankin 2006) provides the Dawson's integral for evaluating the MTTF , package tcltk2 (Grosjean 2012) creates the tabbed notebook widget and package ADGofTest (Bellosta 2011) calculates the p value of the AD statistic based on the algorithm developed by Marsaglia and Marsaglia (2004).