Newdistns: An R Package for New Families of Distributions

A new R contributed package written by the authors is introduced. This package computes the probability density function, cumulative distribution function, quantile function, random numbers and some measures of inference for nineteen families of distributions. Each family is flexible enough to encompass an uncountable number of structures. The use of the package is illustrated using a real data set. The robustness of random number generation is also checked by simulation.


Introduction
Let G be any valid cumulative distribution function defined on the real line. The last decade or so has seen many approaches proposed for generating new distributions based on G. All of these approaches can be put in the form
F(x) = B(G(x)), (1)
where B : [0, 1] → [0, 1] and F is a valid cumulative distribution function. So, for every G one can use (1) to generate a new distribution.
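As a minimal sketch of construction (1), assuming it has the form F(x) = B(G(x)) with B nondecreasing, B(0) = 0 and B(1) = 1, the following base R code composes an exponential baseline G with two choices of B. The names make_cdf, B_expo and B_beta are hypothetical and are not part of the package.

```r
# Compose a baseline cdf G with a map B: [0, 1] -> [0, 1] to obtain a new cdf.
make_cdf <- function(B, G) function(x) B(G(x))

G <- function(x) pexp(x, rate = 1)            # baseline: unit exponential
B_expo <- function(u) u^2                     # power map (exponentiated type, a = 2)
B_beta <- function(u) pbeta(u, 2, 3)          # beta cdf map (beta type, a = 2, b = 3)

F_expo <- make_cdf(B_expo, G)
F_beta <- make_cdf(B_beta, G)

x <- c(0, 0.5, 1, 2, 5)
print(cbind(x, G = G(x), F_expo = F_expo(x), F_beta = F_beta(x)))
```

Any other baseline, for example pnorm or pweibull with fixed parameters, can be substituted for G without changing the rest of the sketch.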
The nineteen approaches and the corresponding families of G distributions considered here are the ones proposed since 1997 that we are aware of. Each of these families can be motivated by lifetime issues, as we shall see in Section 2. The applications of these G distributions have been widespread. A list of applications for each family of G distributions is given in Section 2.
The aim of this paper is to present a new contributed package for R (R Development Core Team 2014) that computes basic properties for any G distribution from each of the nineteen families.
The properties considered include the probability density function, cumulative distribution function, quantile function, random numbers and measures inferred based on fitting the family of distributions to some data. Calling sequences for the computation of all of these properties are given in Section 2. Also given in Section 2 are explicit expressions for the probability density, cumulative distribution and quantile functions. The computation of the measures of inference is based on the package AdequacyModel. Illustrations of the practical use of the new R package are given in Section 3. Finally, the robustness of the routines for random number generation is checked by simulation in Section 4.
The measures of inference include: the minimum value of the negative log-likelihood function; the Kolmogorov-Smirnov test statistic and its p-value; and the convergence status (algorithm converged or algorithm not converged).
The following format is used for the listing: the first line gives the expression for the probability density function; the second line gives the expression for the cumulative distribution function; the third line gives the expression for the quantile function; the fourth line gives the calling sequence for the probability density function; the fifth line gives the calling sequence for the cumulative distribution function; the sixth line gives the calling sequence for the quantile function; the seventh line gives the calling sequence for random number generation; the eighth line gives the calling sequence for the measures of inference.
The notation used for the calling sequences in the last five lines can be described as follows.
The spec (a character string) specifies the distribution corresponding to the probability density function, g(•), and the cumulative distribution function, G(•). The distribution should be one that is recognized by R. It could be one of the distributions implemented in the R base package, one of the distributions implemented in an R contributed package, or one freshly written by a user. In any case, there should be functions dspec, pspec and qspec, computing the probability density function, cumulative distribution function and quantile function of the G distribution.
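The role of spec can be illustrated with a short base R sketch. It assumes the exponentiated form F(x) = G(x)^a and looks up the functions dspec and pspec by name. The helpers dexpg_sketch and pexpg_sketch and the user-written distribution "myexp" are hypothetical names chosen for illustration, not the package's own code.

```r
# A distribution "freshly written by a user" under the hypothetical name "myexp":
# an exponential law with rate r.
dmyexp <- function(x, r = 1) ifelse(x >= 0, r * exp(-r * x), 0)
pmyexp <- function(q, r = 1) ifelse(q >= 0, 1 - exp(-r * q), 0)
qmyexp <- function(p, r = 1) -log(1 - p) / r

# Illustrative helpers for F(x) = G(x)^a; g and G are found from the string spec.
dexpg_sketch <- function(x, spec, a = 1, ...) {
  g <- get(paste0("d", spec), mode = "function")
  G <- get(paste0("p", spec), mode = "function")
  a * g(x, ...) * G(x, ...)^(a - 1)
}
pexpg_sketch <- function(x, spec, a = 1, ...) {
  G <- get(paste0("p", spec), mode = "function")
  G(x, ...)^a
}

x <- c(0.5, 1, 2)
print(dexpg_sketch(x, "myexp", a = 2, r = 1.5))   # user-written baseline
print(dexpg_sketch(x, "exp", a = 2, rate = 1.5))  # same baseline from base R
```

Arguments supplied through ... (here r or rate) are passed unchanged to the baseline functions, so the same helper works for any baseline that follows the d/p/q naming convention.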
If log = TRUE then the log of the probability density function will be returned. If log.p = TRUE then the log of the cumulative distribution function will be returned and the quantile function will be computed for exp(p). If lower.tail = FALSE then one minus the cumulative distribution function will be returned and the quantile function will be computed for 1 − p. The argument n denotes the number of random numbers to be generated.
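The effect of log.p and lower.tail on a quantile function can be sketched as follows, again assuming the exponentiated form F(x) = G(x)^a. The functions qexpg_sketch and rexpg_sketch are hypothetical illustrations written in base R and may differ from the package's implementation.

```r
# Quantile function of F(x) = G(x)^a with the usual R flags.
qexpg_sketch <- function(p, spec, a = 1, log.p = FALSE, lower.tail = TRUE, ...) {
  if (log.p) p <- exp(p)          # p was supplied on the log scale
  if (!lower.tail) p <- 1 - p     # p was supplied as an upper-tail probability
  qG <- get(paste0("q", spec), mode = "function")
  qG(p^(1 / a), ...)
}

# Random numbers by inversion: apply the quantile function to uniform variates.
rexpg_sketch <- function(n, spec, a = 1, ...) {
  qexpg_sketch(runif(n), spec, a = a, ...)
}

q1 <- qexpg_sketch(0.3, "exp", a = 2, rate = 1.5)
q2 <- qexpg_sketch(log(0.3), "exp", a = 2, log.p = TRUE, rate = 1.5)
q3 <- qexpg_sketch(0.7, "exp", a = 2, lower.tail = FALSE, rate = 1.5)
print(c(q1, q2, q3))              # the three calls target the same quantile
```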
Additional arguments in the form of ... can be supplied for each calling sequence. These arguments could give inputs (e.g., parameter values) for the distribution specified by spec.
Each of these distributions is defined on the positive real line and has one or two parameters. These distributions in fact include the most popular distributions for lifetime modeling. As we shall see, the nineteen families of distributions can be motivated by lifetime issues.
In the calling sequence for the measures of inference, data must be a vector of data values to which the family of distributions is to be fitted. starts must be a vector of initial values for the parameters of the family of distributions and those of g. If g has only one parameter, the vector must contain the initial values for the parameters of the family of distributions in the order specified by the calling sequence for the probability density function, and then the initial value for r. If g has two parameters, the vector must contain the initial values for the parameters of the family of distributions in the same order, then the initial value for r and then the initial value for s. method is the method for optimizing the log-likelihood function. It can be one of "Nelder-Mead", "BFGS", "CG", "L-BFGS-B" or "SANN". The default is "BFGS". The option "L-BFGS-B" can be used only if each parameter specified by starts takes values on the positive real line. The details of these options can be found in the manual pages for optim.
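The fitting step can be sketched with optim directly. The example below is an illustration under stated assumptions, not the package's routine: it uses the exponentiated form F(x) = G(x)^a with an exponential baseline of rate r, simulated data (for illustration only), and a log-scale parameterization (a design choice of this sketch) so that "BFGS" never leaves the positive real line. The names nll, starts and est are hypothetical.

```r
set.seed(1)
# Simulated data: exponentiated exponential with a = 2 and rate r = 1.5.
x <- qexp(runif(200)^(1 / 2), rate = 1.5)

# Negative log-likelihood; theta = (log a, log r) keeps both parameters positive.
nll <- function(theta, data) {
  a <- exp(theta[1])
  r <- exp(theta[2])
  -sum(log(a) + dexp(data, rate = r, log = TRUE) +
         (a - 1) * pexp(data, rate = r, log.p = TRUE))
}

starts <- c(a = 1, r = 1)   # family parameter first, then the parameter of g
fit <- optim(log(starts), nll, data = x, method = "BFGS")
est <- exp(fit$par)

# Kolmogorov-Smirnov statistic against the fitted cdf (the p-value ignores
# the fact that the parameters were estimated from the same data).
ks <- ks.test(x, function(q) pexp(q, rate = est[["r"]])^est[["a"]])

print(est)               # parameter estimates
print(fit$value)         # minimum of the negative log-likelihood
print(fit$convergence)   # 0 indicates that optim reported convergence
print(ks$statistic)
```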
For each of the nineteen families of G distributions, we now list motivation, particular members of the family studied in the literature and the applications they have received.
Beta exponential G distributions due to Alzaatreh et al. (2013b): for x in the range of g, 0 ≤ p ≤ 1, λ > 0, the first shape parameter, a > 0, the second shape parameter, and b > 0, the third shape parameter.
Beta extended G distributions due to Cordeiro et al. (2012b): for x in the range of g, 0 ≤ p ≤ 1 − exp(−α), α > 0, the scale parameter, a > 0, the first shape parameter, and b > 0, the second shape parameter. The default values for α, a and b are 1.
Beta extended G distributions have been used to model lifetimes of mechanical components (Cordeiro et al. 2012b).
These distributions were motivated to model the failure time of an a-out-of-(a + b − 1) system when the failure times of the components are independent and identical random variables with cumulative distribution function G.
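This system motivation can be checked numerically. The sketch below assumes that the system fails at the a-th component failure among n = a + b − 1 components, and that the resulting distribution is the beta G form, that is, the regularized incomplete beta function evaluated at G(x). Both assumptions are stated here because the defining formula is not reproduced in the text.

```r
a <- 2
b <- 3
n <- a + b - 1                          # number of components in the system
G <- pexp(c(0.2, 0.5, 1, 2), rate = 1)  # baseline cdf at a few time points

# Probability that at least a of the n components have failed by each time.
p_order <- pbinom(a - 1, size = n, prob = G, lower.tail = FALSE)

# Beta cdf with shape parameters a and b evaluated at G.
p_beta <- pbeta(G, shape1 = a, shape2 = b)

print(cbind(G, p_order, p_beta))        # the last two columns agree
```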
Beta G distributions have been used to model: adult numbers for Tribolium castaneum and Tribolium confusum (Eugene et al. 2002; Kong et al. 2007); breaking strength of glass fibers (Barreto-Souza et al. 2010, 2011; Cordeiro and Lemonte 2011a; Cordeiro et al. 2013a; Domma and Condino 2013; Adepoju et al. 2014; Alshawarbeh et al. 2014); breaking stress of carbon fibers (Barreto-Souza et al. 2011; Cordeiro and Lemonte 2011a; Alshawarbeh et al. 2014; Leao et al. 2014; Oluyede and Yang 2014); carbon monoxide measurements in several brands of cigarettes (Cordeiro et al. 2013c); daily ozone level measurements in New York (Cordeiro et al. 2013e); exceedances of flood peaks of the Wheaton river in Yukon Territory, Canada (Akinsete et al. 2008; Mahmoudi 2011; Alshawarbeh et al. 2014; Cordeiro et al. 2014a); failure times of a polyester/viscose yarn in a textile experiment (Pal and Tiensuwan 2014); failure times of motorettes with a new insulation (Cordeiro et al. 2013c; Pal and Tiensuwan 2014); failure times of turbochargers of one type of engine (Singla et al. 2012); fatigue life of 6061-T6 aluminum coupons cut parallel with the direction of rolling (Mahmoudi 2011; Bidram 2012; Bidram et al. 2013); fatigue life of bearings of a certain type (Montenegro and Cordeiro 2013); flood data for the Floyd river located in James, Iowa, USA (Akinsete et al. 2008); household income and consumption in Italy (Domma and Condino 2013); lifetimes of mechanical components (Silva et al. 2010; Badmus and Bamiduro 2014; Jafari et al. 2014); maximum values of monthly flood rates of the Castelo river, Brazil (Lourenzutti et al. 2014); monthly actual taxes revenue in Egypt (Nassar and Nada 2011); national index of consumer prices of Brazil corresponding to health and personal care (Cordeiro and Lemonte 2011c); number of successive failures of the air-conditioning system of each member of a fleet of Boeing 720 jet airplanes (Nassar and Nada 2012; Bidram et al. 2013); remission times of a random sample of bladder cancer patients (Zea et al. 2012; Merovci and Sharma 2014; Oluyede and Yang 2014); repair times for an airborne communication transceiver (Cordeiro et al. 2013b; Percontini et al. 2013; Cordeiro et al. 2014a); SAR image processing (Cintra et al. 2014); short-term and long-term outcomes of constraint induced movement therapy after stroke (Nassar and Nada 2012); strength of ball bearings (Nassar and Nada 2012); stress-rupture life of kevlar epoxy strands subjected to constant sustained pressure (Cordeiro et al. 2013b); survival times of cutaneous melanoma (a type of malignant cancer) patients (Paranaba et al. 2011); survival times of guinea pigs injected with different doses of tubercle bacilli (Cordeiro and Lemonte 2011b; Merovci and Sharma 2014); survival times of myelogenous leukemia patients (Mahmoudi 2011); times to first failure of devices (Jafari and Mahmoudi 2014).
The default values for λ and a are 1.
These distributions were motivated to model the time to failure of the first out of a Poisson number of systems functioning independently where each system has a fixed number of parallel units and their failure times are independent and identical random variables with cumulative distribution function G. These distributions have been used to model the daily average air temperature (F) in Cairo (Ristić and Nadarajah 2013).
Exponentiated generalized G distributions due to Cordeiro et al. (2013d): for x in the range of g, 0 ≤ p ≤ 1, a > 0, the first shape parameter, and b > 0, the second shape parameter. The default values for a and b are 1.
These distributions were motivated to model the failure time of a system having b units functioning in parallel, where each of these units has a subunits functioning in series. The failure times of the subunits are assumed to be independent and identical with cumulative distribution function G.
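The series-within-parallel structure can be checked by simulation. The sketch below assumes that a series unit fails at its first subunit failure and a parallel system fails at its last unit failure, which gives the cumulative distribution function [1 − (1 − G(x))^a]^b. The helper name pegg_sketch and the unit exponential baseline are illustrative choices.

```r
set.seed(123)
a <- 3
b <- 2
n <- 20000

# Subunit failure times: n systems x b units x a subunits, baseline G = Exp(1).
sub <- array(rexp(n * b * a, rate = 1), dim = c(n, b, a))
unit_time <- apply(sub, c(1, 2), min)     # series: first subunit failure
system_time <- apply(unit_time, 1, max)   # parallel: last unit failure

# Cdf implied by the system structure.
pegg_sketch <- function(x, a, b, G) (1 - (1 - G(x))^a)^b

x0 <- c(0.2, 0.5, 1)
empirical <- sapply(x0, function(t) mean(system_time <= t))
theoretical <- pegg_sketch(x0, a, b, pexp)
print(rbind(empirical, theoretical))
```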
Particular exponentiated generalized G distributions studied in the literature include the exponentiated generalized Birnbaum-Saunders distribution (Cordeiro and Lemonte 2014).
Exponentiated generalized G distributions have been used to model: breaking stress of carbon fibers (Cordeiro et al. 2013d); effects of mechanical damage on banana fruits (Cordeiro et al. 2013d); exceedances of flood peaks of the Wheaton river near Carcross in Yukon Territory, Canada (Cordeiro et al. 2013d; Cordeiro and Lemonte 2014); lifetimes for industrial devices put on life test at time zero (Cordeiro and Lemonte 2014); stress-rupture life of kevlar epoxy strands subjected to constant sustained pressure (Cordeiro et al. 2013d).
Exponentiated G distributions due to Gupta et al. (1998): for x in the range of g, 0 ≤ p ≤ 1 and a > 0, the shape parameter. The default value for a is 1.
These distributions were motivated to model the failure time of a system having a units functioning in parallel, the failure times of which are assumed to be independent and identical with cumulative distribution function G.
Particular exponentiated G distributions studied in the literature include the exponentiated Frechet distribution (Nadarajah and Kotz 2003), the exponentiated gamma distribution (Nadarajah and Gupta 2007), the exponentiated generalized inverse Weibull distribution (Elbatal and Muhammed 2014) and the exponentiated Gumbel distribution (Nadarajah 2006).
Exponentiated G distributions have been used to model: annual maximum daily rainfall from Orlando, Florida (Nadarajah 2006); drought data from Nebraska (Nadarajah and Gupta 2007); remission times of a random sample of bladder cancer patients (Elbatal and Muhammed 2014).
Exponentiated Kumaraswamy G distributions due to Lemonte et al. (2013): for x in the range of g, 0 ≤ p ≤ 1, a > 0, the first shape parameter, b > 0, the second shape parameter, and c > 0, the third shape parameter. The default values for a, b and c are 1.
These distributions were motivated to model the failure time of a system having c units functioning in parallel, where each of these units has b subunits functioning in series and each of these subunits has a subsubunits functioning in parallel. The failure times of the subsubunits are assumed to be independent and identical with cumulative distribution function G. These distributions have been applied to model lifetimes (Lemonte et al. 2013).
Gamma G distributions have been used to model: breaking stress of carbon fibers (Alzaatreh et al. 2014; Cordeiro et al. 2014b); flood levels for the Susquehanna river at Harrisburg, PA (Alzaatreh and Knight 2013); gene expression levels on human cancer cells (Castellares et al. 2015); number of millions of revolutions before failure of ball bearings in a life testing experiment (Pararai et al. 2014); number of successive failures for the air conditioning system of each member in a fleet of Boeing 720 jet airplanes (Oluyede et al. 2014); remission times of a random sample of bladder cancer patients (Cordeiro et al. 2015; Oluyede et al. 2014; Castellares and Lemonte 2014); salaries of professional baseball players (Oluyede et al. 2014); strengths of glass fibers (Alzaatreh et al. 2014); survival times of breast cancer patients (Ramos et al. 2013); survival times of cutaneous melanoma (a type of malignant cancer) patients (Cordeiro et al. 2014b); survival times of guinea pigs injected with different doses of tubercle bacilli (Pararai et al. 2014); tensile strength for single-carbon fibers (Alzaatreh and Knight 2013); the cDNA microarray data of the NCI60 cancer cell lines (Castellares et al. 2015); waiting times between consecutive eruptions of the Kiama Blowhole (da Silva et al. 2013).
Gamma G II distributions due to Ristić and Balakrishnan (2012):
Gamma uniform G distributions due to Torabi and Montazeri (2012): for x in the range of g, 0 ≤ p ≤ 1, and a > 0, the shape parameter. The default value for a is 1.
These distributions were constructed by considering the distribution of G^{-1}(W/(1 + W)), where W is a gamma random variable. These distributions have been used to model survival times of leukemia patients (Torabi and Montazeri 2012).
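The construction G^{-1}(W/(1 + W)) translates directly into a random number generator. The sketch below assumes that W is gamma distributed with shape a and unit scale (the scale is an assumption of this example) and takes the standard normal as baseline G. The implied cumulative distribution function follows from P(X ≤ x) = P(W ≤ G(x)/(1 − G(x))). The name pgug_sketch is hypothetical.

```r
set.seed(42)
a <- 2
n <- 20000

w <- rgamma(n, shape = a)        # gamma variates with shape a and unit scale
xs <- qnorm(w / (1 + w))         # transform through the baseline quantile function

# Cdf implied by the construction.
pgug_sketch <- function(x, a, G) {
  u <- G(x)
  pgamma(u / (1 - u), shape = a)
}

x0 <- c(-1, 0, 1)
empirical <- sapply(x0, function(q) mean(xs <= q))
implied <- pgug_sketch(x0, a, pnorm)
print(rbind(empirical, implied))
```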
Generalized beta G distributions due to Alexander et al. (2012): for x in the range of g, 0 ≤ p ≤ 1, a > 0, the first shape parameter, b > 0, the second shape parameter, and c > 0, the third shape parameter. The default values for a, b and c are 1.
Generalized beta G distributions have been used to model: effects of mechanical damage on banana fruits (Alexander et al. 2012); exceedances of flood peaks of the Wheaton river near Carcross in Yukon Territory, Canada (Mead 2014); monthly actual taxes revenue in Egypt (Mead 2014); survival times of breast cancer patients (Tahir et al. 2014); times of failure and running times for a sample of devices from a field-tracking study of a larger system (Alexander et al. 2012); times of unscheduled maintenance actions for the USS Halfbeak number 4 main propulsion diesel engine (Marciano et al. 2012).
These distributions were motivated to model the time to failure of the first out of a geometric number of systems functioning independently where each system has a Poisson number of parallel units and their failure times are independent and identical random variables with cumulative distribution function G. These distributions have been used to model adult numbers of Tribolium confusum and failure times for epoxy insulation specimens in an accelerated voltage life test (Nadarajah et al. 2013a).
Kumaraswamy G distributions due to Cordeiro and Castro (2011): for x in the range of g, 0 ≤ p ≤ 1, a > 0, the first shape parameter, and b > 0, the second shape parameter. The default values for a and b are 1.
These distributions were motivated to model the failure time of a system having b units functioning in series, where each of these units has a subunits functioning in parallel. The failure times of the subunits are assumed to be independent and identical with cumulative distribution function G.
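Under this system description the cumulative distribution function is 1 − (1 − G(x)^a)^b, which can be inverted in closed form. The following base R sketch assumes that form and uses a Weibull baseline chosen only for illustration. The helpers pkumg_sketch, qkumg_sketch and rkumg_sketch are hypothetical names, not the package's functions.

```r
# Cdf: series of b units, each a parallel arrangement of a subunits.
pkumg_sketch <- function(x, spec, a = 1, b = 1, ...) {
  G <- get(paste0("p", spec), mode = "function")
  1 - (1 - G(x, ...)^a)^b
}

# Quantile function obtained by solving p = 1 - (1 - G^a)^b for G.
qkumg_sketch <- function(p, spec, a = 1, b = 1, ...) {
  qG <- get(paste0("q", spec), mode = "function")
  qG((1 - (1 - p)^(1 / b))^(1 / a), ...)
}

# Random numbers by inversion of uniform variates.
rkumg_sketch <- function(n, spec, a = 1, b = 1, ...) {
  qkumg_sketch(runif(n), spec, a, b, ...)
}

p <- c(0.1, 0.5, 0.9)
qs <- qkumg_sketch(p, "weibull", a = 2, b = 3, shape = 1.5, scale = 2)
print(pkumg_sketch(qs, "weibull", a = 2, b = 3, shape = 1.5, scale = 2))  # recovers p up to rounding
```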
Kumaraswamy G distributions have been used to model: breaking strengths of glass fibers (Paranaiba et al. 2013); breaking strengths of polyester/viscose yarns (Aryal and Elbata 2015); breaking stress of carbon fibers (Shams 2013b,a); carbon monoxide levels from several cigarette brands (Gomes et al. 2014); exceedances by the river Nidd at Hunsingore Weir (Nadarajah and Eljabri 2013); exceedances of flood peaks of the Wheaton river near Carcross in Yukon Territory, Canada (Bourguignon et al. 2013); failure times for epoxy insulation specimens (Gomes et al. 2014); failure times of mechanical components (Cordeiro et al. 2012c); flood data for the Floyd river located in James, Iowa, USA (Cordeiro et al. 2012c); flood discharge of at least seven consecutive days and return period of 10 years in the Brazilian Pantanal (Cordeiro et al. 2012a); frequencies of the purchases of a brand X breakfast cereal (Akinsete et al. 2014); lifetimes of industrial devices put on life test at time zero (de Pascoa et al. 2011; Cordeiro et al. 2014c); number of absences among shift-workers in a steel industry (Akinsete et al. 2014); stress-rupture life of kevlar epoxy strands subjected to constant sustained pressure (Paranaiba et al. 2013); survival times of cutaneous melanoma (a type of malignant cancer) patients (de Santana et al. 2012); survival times of guinea pigs injected with different doses of tubercle bacilli (Cordeiro et al. 2012c); survival times of patients given radiation therapy and radiation plus chemotherapy (Cordeiro et al. 2014c); the number of millions of revolutions reached by ball bearings before fatigue failure (Ghosh 2014); times of failure and running times of devices from a field-tracking study of a larger system (Cordeiro et al. 2010); times to serum reversal of children exposed to HIV by vertical transmission (de Pascoa et al. 2011; de Santana et al. 2012; Paranaiba et al. 2013); times until bulls reach the weight of 160 kg since birth (Roges et al. 2014).
Log gamma G I distributions due to Amini et al. (2013): for x in the range of g, 0 ≤ p ≤ 1, a > 0, the first shape parameter, and b > 0, the second shape parameter. The default values for a and b are 1.
These distributions were motivated as the distribution of the ath upper b-record value for a random sample from the cumulative distribution function G. They have been applied to model weekly earnings of full-time wage and salary workers from the US Bureau of Labor Statistics (Amini et al. 2013).
Log gamma G II distributions also due to Amini et al. (2013): for x in the range of g, 0 ≤ p ≤ 1, a > 0, the first shape parameter, and b > 0, the second shape parameter. The default values for a and b are 1.
These distributions were motivated as the distribution of the ath lower b-record value for a random sample from the cumulative distribution function G. They have been applied to model weekly earnings of full-time wage and salary workers from the US Bureau of Labor Statistics (Amini et al. 2013).
Marshall Olkin G distributions due to Marshall and Olkin (1997):
Applications of Marshall Olkin G distributions include: number of miles to first and succeeding major motor failures of buses operated by a large city bus company (Gui 2013a); number of times that a given paper is cited in a given database (Perez-Casany and Casellas 2014); permeability values from horizons of the Dominguez field of Southern California (Jose et al. 2009); remission times of a random sample of bladder cancer patients (Ghitany et al. 2005, 2007); survival times of guinea pigs injected with different doses of tubercle bacilli (Krishna et al. 2013); vinyl chloride data obtained from clean upgradient monitoring wells (Zakerzadeh and Mahmoudi 2012); waiting times before service of bank customers (Zakerzadeh and Mahmoudi 2012).
Modified beta G distributions due to Nadarajah et al. (2013c): for x in the range of g, 0 ≤ p ≤ 1, and −∞ < λ < ∞, the skewness parameter. The default value for λ is 1.
These distributions were constructed as modifications of the skew-symmetric distributions of Azzalini (1985). They have been used to model annual maximum daily rainfall data for 14 locations in west central Florida: Clermont, Brooksville, Orlando, Bartow, Avon Park, Arcadia, Kissimmee, Inverness, Plant City, Tarpon Springs, Tampa International Airport, St Leo, Gainesville and Ocala (Nadarajah et al. 2013b).

Illustrations
Here, we provide three illustrations of the practical use of the package Newdistns.
The first illustration plots the probability density function of the beta-gamma distribution for varying parameter values. We have taken the shape and scale parameters of the gamma distribution to be one of (1, 1), (2, 2), (3, 1) or (5, 2). The shape parameters of the beta distribution are taken to be 2 and 3.
The following code produces Figure 1, the plot of the probability density functions of the beta-gamma distribution.
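A plot of this kind can be sketched in base R without the package, assuming that the beta G density is g(x) times the beta density with shape parameters a and b evaluated at G(x), and assuming a = 2 and b = 3 in that order. The names dbetagamma_sketch and plot_beta_gamma_sketch are hypothetical, and this is a reconstruction under those assumptions rather than the authors' original listing.

```r
# Beta-gamma density: gamma baseline (shape, scale), beta shape parameters a and b.
dbetagamma_sketch <- function(x, shape, scale, a = 2, b = 3) {
  dgamma(x, shape = shape, scale = scale) *
    dbeta(pgamma(x, shape = shape, scale = scale), shape1 = a, shape2 = b)
}

pars <- list(c(1, 1), c(2, 2), c(3, 1), c(5, 2))   # (shape, scale) pairs from the text
x <- seq(0.01, 10, length.out = 400)
dens <- sapply(pars, function(p) dbetagamma_sketch(x, shape = p[1], scale = p[2]))

# Draw the four curves on one set of axes.
plot_beta_gamma_sketch <- function() {
  matplot(x, dens, type = "l", lty = 1:4, col = 1:4, xlab = "x", ylab = "PDF")
  legend("topright", lty = 1:4, col = 1:4,
         legend = sapply(pars, function(p) sprintf("shape = %g, scale = %g", p[1], p[2])))
}
```

Calling plot_beta_gamma_sketch() draws one curve per (shape, scale) pair with a legend identifying each.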

Figure 1: Probability density functions of the beta-gamma distribution.

Figure 2: Cumulative distribution functions of the beta-Student's t distribution.

Figure 3: p-values of the Kolmogorov-Smirnov test versus n for the beta Student's t distribution.

Figure 4: p-values of the Kolmogorov-Smirnov test versus n for the beta exponential gamma distribution.

Figure 5: p-values of the Kolmogorov-Smirnov test versus n for the exponentiated normal distribution.