Mixed Non-Parametric and Parametric Estimation Techniques in R Package etasFLP for Earthquakes’ Description

etasFLP is an R package which ﬁts an epidemic type aftershock sequence (ETAS) model to an earthquake catalog; non-parametric background seismicity can be estimated through a forward predictive likelihood approach, while parametric components of triggered seismicity are estimated through maximum likelihood; estimation steps are alter-nated until convergence is obtained and for each event the probability of being a background event is estimated. The package includes options which allow its wide use. Methods for plot , summary and profile are deﬁned for the main output class object. The paper provides examples of the package’s use with description of the underlying R and Fortran routines.


Introduction
A reliable estimation of the conditional intensity function is crucial in the description of seismic data through a space time point process. Often the conditional intensity function presents a superposition of a parametric component over a non-parametric component, e.g., in the epidemic type aftershock sequence (ETAS) model (Ogata 1988), widely used in statistical seismology. Indeed if we want to predict large earthquakes in presence of clusters of aftershocks, these may complicate the statistical analysis of the background seismic activity (Adelfio, Chiodi, De Luca, Luzio, and Vitale 2006) so it could be useful to study the features of independent events separately from the study of the strongly correlated ones in order to describe the seismicity of an area in space, time and magnitude domains. Zhuang, Ogata, and Vere-Jones (2002) proposed a stochastic method (also implemented as an option in the package etasFLP) which associates a probability that each event is either a background event or an offspring generated by other events, based on the ETAS model as in a latent class cluster model. The probabilities are then used to generate a thinned catalog.
However, the simultaneous estimation of non-parametric and parametric components is one of the main issues from a statistical point of view, as well as from a computational point of view. In the ETAS model, non-parametric components are associated mainly to background seismicity, while parametric ones are related to induced or triggered seismicity. Adelfio, Chiodi, and Luzio (2010) presented a seismic sequences detection technique that identifies the conditional intensity function of a model describing the seismic activity as an ETAS model; Adelfio (2010a) used non-parametric methods to estimate the intensity function of a space-time point process and interpreted clustering results by a second-order diagnostic approach; see also Adelfio and Schoenberg (2009), as well as Adelfio and Chiodi (2009). Console, Jackson, and Kagan (2010) proposed a stochastic method associating to each event a probability to be either a background event or an offspring generated by other events; Marsan and Lengliné (2008) used the concept of cascade triggering without using models. Adelfio and Chiodi (2013) proposed a method for the simultaneous solution of these problems, with an optimal selection of the bandwidth for the estimation of background seismicity. The etasFLP package is mainly based on those ideas. In Chiodi (2013, 2015a), we classified events according to their probability of being a background or an offspring event, as proposed by Zhuang et al. (2002), and then estimated the space-time intensity of the generating point process of the different components by mixing non-parametric and parametric approaches, applying a forward predictive likelihood estimation approach to semi-parametric models (Chiodi and Adelfio 2011;Adelfio and Chiodi 2015a). The probabilities of being a background event are used as weights in non-parametric intensity estimation of the background seismicity.
The package ETAS (Jalilian 2016) in the R environment (R Core Team 2016), provides a porting of the Fortran code by Jiancang Zhuang, to estimate the components of an ETAS model. However ETAS cannot be used to implement the mixed estimation approach proposed in this paper; also for this reason we created the package etasFLP (Chiodi and Adelfio 2017), which provides open source tools to estimate a wide class of the ETAS model, developed in the R environment. The package has various options which allow to deal with different models of the ETAS family, and also different kinds of estimation strategy.
In Section 2 we recall some formal definitions of point processes and of the ETAS model. In Section 3 we report the definition of the kernel estimator for intensity functions, then the basic method for non-parametric estimation is introduced in Section 4; the algorithm for simultaneous estimation of non-parametric and parametric components of an ETAS model used in etasFLP is exposed in Section 4.1. The structure and details of the code are explained in Section 5 with some application of the ETAS model in Section 6, while final remarks and future developments are presented in Section 7.

Branching point processes: The ETAS model
A point process is a random collection of points, each one representing the time and space coordinates of a single event. Let Z d = T × S d−1 be a general d-dimensional closed region, with T the time domain and S d−1 a two or three dimensional space. The conditional intensity function (Daley and Vere-Jones 2003) is the expected frequency of events in a space-time unit region, conditional on the history H t of the point process up to time t, i.e., where L(x) is the Lebesgue measure of x, ∆t, ∆s are time and space increments. Generally, intensities λ(z) depend on some unknown parameter.
The ETAS model, introduced by Ogata (1988), is a space time branching process for earthquake description, widely used in seismological context. Seismic events are usually collected in seismic catalogs, that represent the time-space-magnitude dimensions of n observed events, where the ith event is identified by its occurrence time T i , its hypocentral coordinates (x i , y i , z i ) and its magnitude m i .
The ETAS conditional intensity function can be written as follows: where m j is the magnitude of the jth event and Therefore, in the ETAS model, background seismicity has a time stationary intensity µf (s), with f (·) a density function, while the occurrence rate of aftershocks at time t, following the earthquake of time t j and magnitude m j , is described by the following parametric model: where κ 0 is a constant which measures the aftershocks productivity, c and p are parameters of the modified Omori's law (Utsu 1961); p is useful in order to characterize the pattern of seismicity of the given region, indicating the decay rate of aftershocks in time.
For the spatial rate, conditioned to the magnitude of the generating event, the following distribution is often used: The last equation relates the occurrence rate of aftershocks to mainshock magnitude m j , through parameters α, γ that measure the influence on the relative weight of each sequence; m 0 is the completeness threshold of magnitude i.e., the lower bound for which earthquakes with higher magnitude are surely recorded in the catalog; d and q are two parameters related to the spatial influence of mainshocks.
Finally, the following eight-parameters function gives the whole intensity function in a point of coordinates (x, y, t), conditioned to the past history H t : In the package etasFLP we used this formulation of the model with an eight-components vector of parameters: θ = (µ, κ 0 , c, p, α, γ, d, q), so that in the package some simpler model can be fitted; for example a 7-parameters model can be fitted fixing γ = 0, as explained in Section 5.
θ is usually estimated by the maximum likelihood (ML) approach, with nontrivial numerical techniques (Ogata 1998). A procedure to maximize the expected complete data log-likelihood function, based on the expectation-maximization algorithm, was introduced in Veen and Schoenberg (2008). In Adelfio and Ogata (2010) a naive likelihood cross-validation is optimized to obtain the bandwidth of the smoothing kernel used to estimate the intensity for earthquake occurrence of northern Japan.
Instead, the first component f (·) of model (1) is generally estimated by non-parametric techniques. In particular, in kernel-type approaches either a fixed (Vere-Jones 1992) or adaptive kernel smoothing (Zhuang et al. 2002) can be used. Zhuang et al. (2002) also estimated the probability for each event of being a background event (ρ i , i = 1, . . . , n) in order to obtain a thinned catalog, that includes events with a bigger probability of being mainshock, so that for the thinned process spatial intensity is described by an inhomogeneous Poisson process. Package etasFLP gives the option to follow this approach.
The simultaneous estimation of the background intensity and the triggered intensity components of an Epidemic type model is a crucial statistical issue.
In our algorithm, according to Console et al. (2010), we use ρ i as weights for the kernel estimation of the background seismicity to get a simultaneous estimate of the intensity components of the ETAS model (1).
The problem of the estimation of the non-parametric component is addressed in the following sections.

Kernel estimator for the intensity function
Given n observed events u 1 , u 2 . . . , u n in an r-dimensional closed region, the kernel estimator of the unknown density f (·) (Silverman 1986;Wand and Jones 1994) in a generic point u ∈ R r is:f where K(·, ·) is a multivariate kernel function centered at observed points and Σ is a matrix of smoothing constants and w i are weights. The w i , i = 1, . . . , n, can be given by the estimated probability that the ith point belongs to the background. A common choice for K(·, ·) is the normal multivariate density; to take into account highly variable patterns in a space region, variable smoothing matrices Σ i can be more suitable (Terrell and Scott 1992), but in package etasFLP we simply used a bivariate normal kernel for the spatial distribution of background seismicity, with two smoothing constants, the bandwidths h x , h y (now denoted with h = (h x , h y )), which can be estimated according to the FLP approach, sketched in Section 4. Adelfio (2010b) found a convenient method of displaying some second-order properties in a neighborhood of a selected point of the process, based on very general high-dimensional kernel estimator developed to define a more general version of the conditional intensity function of a multi-dimensional point process.
The usual maximization of the likelihood with respect to the smoothing parameters, as known, would produce bandwidths of length zero and degenerate intensities on the observed points, so we used another strategy of estimation for non-parametric components, which however uses a concept linked to likelihood and to time-dependent nature of data.

Forward predictive likelihood (FLP)
Suppose that in a space-time point process the intensity function λ(·) depends on a set of parameters ψ. Letλ ψ;Ht k (·) denote a generic estimator of λ(·), based on observations until t k , that could include smoothing constants in a semi-parametric context. Assume that a realization of the process is observed in the space region Ω s and the time interval (T 0 ; T max ). The log-likelihood for the point process, given the k observed values z i and computed using the estimatorλ ψ;Ht k (·) is: In this package, we use the method proposed in Chiodi and Adelfio (2011), that measures the ability of the observations and estimates until t k to give information on the next observation.
Let log L(λ ψ;Ht k (z); H t k+1 ) be the log-likelihood computed on the first k + 1 observations, but using the estimates based on the first k ones, defined as: For example, in Equation 6,λ ψ;Ht k (z k+1 ) could be the intensity of the (k+1)th point estimated by a kernel method using centers given by the previous k points.
Then, in order to measure the predictive information of the first k observations on the (k+1)th, we define δ k,k+1 , the difference between (6) and (5), as: Therefore, given n observations, the target function is: where k 1 is a fixed constant, for example k 1 = n 2 . In this paper, ψ = h = (h x , h y ), although in the more general kernel estimator in Equation 4 we could have ψ = Σ. fix v max the maximum number of iterations and max the maximum difference between estimates in successive iterations; the choice of a starting value for θ is explained in Section 5.3.

7:
Initially the weights ρ i are fixed to a common value, that is, in the first step no weighting is performed.
End of initialization steps. body of the algorithm: Get the ML estimatorθ (v) of the parameters of the model θ = (φ, µ) , by means of the maximization of the whole likelihood (5) and compute the values: 10: On the basis of the estimated parameters, for each point of the data set estimate the weight: The values ρ (v) i are an estimation of the probability to belong to the background seismicity and are used as weights for the non-parametric estimation of the background intensity in (4), with w i = ρ i .

12:
Estimate the optimal bandwidths h = (h x , h y ) of the kernel estimator, through the FLP approach, that is maximizing (7) with respect to (h x , h y ) and holding fixed triggered

13:
Update the estimation of the background intensityμf h (v) (x i , y i ), through a weighted normal kernel estimator with bandwidthsĥ x ,ĥ y and weights ρ ; is used to check for convergence 16: end while end of the body of the algorithm. 17: end procedure

The algorithm for mixed estimation
In this section we explain the general structure of the estimation algorithm, used in the R function etasclass(), which executes the estimation procedure in the more complete situation, which is an eight-parameters model, with space-dependent background seismicity, declustering and weighting steps and bandwidth estimation through the FLP criterion of Equation 7.
In order to estimate the different components of the space-time point process ETAS with intensity (eq:etasfull), we describe the simultaneous estimation of non-parametric and parametric components of the ETAS model. In other words, we alternate the standard parametric likelihood method, to estimate the parameter θ of the offsprings component, with the FLP approach, used just to compute the smoothing parameters h of background intensity in (4). Further, the proposed mixed procedure estimates the probability ρ i of each event to belong to one of the two model components and uses the ρ i as weights in (4).
The main function of the package etasFLP is the R function etasclass() and is based on the following procedure. Given an earthquake catalog of n events, which occurred in a given space-time region, and the intensity given in Equation (3), the full iterative process of simultaneous estimation of the non-parametric and parametric components of the model, proceeds as given in Algorithm 1.

The R and Fortran implementation
The package in R is installed and loaded in the usual way, e.g.:

R> install.packages("etasFLP")
Once the package is installed, then it must be loaded through the usual: R> data("italycatalog", package = "etasFLP") R> data("californiacatalog", package = "etasFLP") The structure of these data frames is the usual structure for a general catalog used as input for the function etasclass(). Any catalog must contain at least five variables with names time, lat, long, z, magn1:

Structure of the code
The code is written mainly in R, in order to use an easy accessible open source system and to use the input/output facilities. However some time consuming routines are written in good old Fortran, especially for computations hard to be optimized in R.

List of functions and subroutines
The main R functions are: Not all functions are documented in the package; other R functions are intended for internal use only. Fortran code is present in the source code in the file fortran_etas_code.f90 in the directory src.

The function etasclass()
The main function of the package etasFLP is the R function etasclass() which performs the optimization steps of the algorithm described in Section 4.1. Some options are present, which allow to fit a variety of models with different kind of estimation procedures. Details on input arguments are given in the help files of the package.
• magn.threshold: Threshold magnitude (events of magnitude at least magn.threshold will be selected). Default value = 2.5. Some idea on the magnitude threshold of the catalog can be obtained using the function magn.plot(), which plots the logarithm of the back cumulated frequencies of the magnitudes; a change of slope in this graph is an indication of a magnitude threshold. However in the first trials on a new catalog, a good strategy is to begin with a high value of magn.threshold.
• magn.threshold.back: Threshold magnitude used to build the catalog cat.back for the first estimation of the background seismicity. Default value = magn.threshold + 2. This is only a simple technique to obtain a first estimate of the space background seismicity (namely,f h (0) (x, y) in the algorithm in Section 4.1).
• mu, k0, c, p, a, gamma, d, q: Values of the eigth-parameters of the ETAS model. They will be used as starting values or as fixed values according to the value of params.ind.
• params.ind: Vector of 8 indicators; params.ind[i] = TRUE means that the ith parameter must be estimated. params.ind[i] = FALSE means that the ith parameter is fixed to its input value (the order of parameters is: mu, k0, c, p, a, gamma, d, q). Default value = replicate(8, TRUE), that is, etasclass() estimates all parameters.
Then three flags follow and an integer which regulates the kind of declustering, namely declustering, thinning, flp and ndeclust: • hdef: Starting values for the x, y bandwidths used in the kernel estimator of background seismicity. Default value = 1,1.
• declustering: If TRUE the catalog is iteratively declustered for a maximum of ndeclust steps to optimally estimate the background intensity. Default value = TRUE.
• thinning: If thinning = TRUE a background catalog is obtained sampling from the original catalog with probabilities estimated during the iterations. Default value = FALSE and in this case declustering is made by weighting with weights ρ i .
• flp: If flp = TRUE then background seismicity is estimated through forward predictive likelihood (described in Section 4). Otherwise Silverman's rule is used. Default value = TRUE.
• ndeclust: Maximum number of iterations for the general procedure, that is v max in the algorithm of Section (4.1). Default value = 5. A value of 3 can be however good enough to have reliable estimates.
• w: initial weights Other control parameters: • onlytime: If TRUE then a time process is fitted to data, regardless to space location.
• is.backconstant: If TRUE then the background seismicity is assumed to be homogeneous in space (and declustering, flp are set to FALSE).
• description: A description string used for the output. Default value = "".
• cat.back: Catalog used for the estimation of the background seismicity. Default value = NULL.
• back.smooth: Controls the level of smoothing for the background seismicity . • longlat.to.km: If TRUE, then long and lat variables of cat.orig are treated as geographical coordinates and converted to kilometers. Default value = TRUE.
• usenlm: If TRUE, then the R function nlm() (Gauss-Newton method) is used in the maximum likelihood steps; if FALSE, then the R function optim() is used (with method = "BFGS"). Default value = TRUE.
• compsqm: If TRUE, then standard errors are computed. Default value = TRUE.
• ntheta: Number of divisions of the round angle, used in the approximation of the integral involved in the likelihood computation of the ETAS model, as explained in the following subsection. Default value = 100.

The ML step
The main call of the ML step can be tuned with the argument usenlm: If this argument is set to the default value of TRUE, then the maximization of the likelihood is performed through a call to the standard Newton-type optimization function nlm(), present in the stats package of R. If problems are experienced with this call, then with usenlm = FALSE the function optim(), of R package stats, is used with the option method = "BFGS", which performs a quasi-Newton optimization. In general nlm() is preferred for efficiency and optim() should be used only if there are problems with convergence. Here we report the piece of code with this optimization call: The function minimized is − log L, according to Equation 5, where k is set to n in order to have the whole likelihood and is computed by the function etas.mod2(), which internally calls the Fortran subroutine etasfull8(), used for the computation of the triggered intensity.
In the space domain, the integral involved in the likelihood is solved through a transformation to polar coordinates centered on each observed point (Ogata 1988): The round angle is then divided in ntheta parts; the default value (ntheta = 100) is enough to obtain a very good approximation. Each of the n × ntheta slices is then integrated on the time domain. This integration is efficiently computed in etas.mod2() with few lines of R code, using the advantage and the speed of R for computation involving whole matrices.
Briefly, to solve the integral of the component in (2), putting the origin on the point x j , y j , transforming to polar coordinates (ρ, θ) we obtain (when q = 1): Now integration with respect to θ is substituted by a finite sum, dividing the round angle in n θ slices. A good approximation of the integral is therefore given by: where ∆(θ) is the angle of the region, and a jl is the distance from the jth point to the border of the space region in the direction θ l (l = 1, 2, . . . , n θ ). This step can be memory and time consuming, since here a matrix (n × n θ ) with elements a jl must be stored and managed. Elements a jl are computed once in the initialization steps of etasclass().
The integration on the time domain of the components g(t − t j |m j ) is straightforward, with a particular case when p = 1.
In the likelihood computation of the ML step, the value of the background intensity is kept fixed to the last value obtained, whichever method is used for the background. Also the weights ρ i are kept fixed and passed as argument to etas.mod2(), together with the vector of the background densities. The optimization is made with respect to the logarithm of parameters, in order to respect the positivity constraints for all parameters.

Choice of starting values
The optimization algorithm depends on the choice of initial values. Some default guess choice is performed inside the function for parameters without input starting values. For example if there is no input value for mu, the function etasclass() will start with µ = n 2∆t . This approximation comes from the solution of: with respect to µ, κ 0 , supposing initially ρ i = 1 2 , i = 1, 2, . . . , n. Other reasonable values are given for other parameters, by the function etas.starting of the package. The etas.starting function provides, as output, starting values for the parameters, that can be used as input in etasclass. These are only guess values, based on an exploratory analysis of the input catalog and on some approximation, for example the one used in Schoenberg (2013) to approximate the integral in (5), when λ(·) is the intensity function of a space-time ETAS model. Initial value for c is based on the empirical distribution of lag times between successive events. Starting value for d is based on the empirical distribution of space distances between successive events. mu and kappa0 starting values are obtained solving the likelihood equations (8) using the approximation for the integral given in Schoenberg (2013); for the other parameters the initial rough guesses are: p = 1, q = 2, a = 1.5, gamma = 0.5. The weights ρ are initialized to 0.5 Although it is only an heuristic solution for a non trivial problem, in our experience, it seems to individuate satisfactory starting points.
In general, any previous ETAS model fitting to similar data can be useful as a starting point.
If convergence problems are experienced for a given catalog, a useful strategy can be to start with a high magnitude threshold value m 0 (that is, with a smaller catalog which contains only big earthquakes), and then using this first output as starting guess for a new execution of etasclass() with a lower magnitude threshold value m 0 .
In this trial executions declustering should be avoided (declustering = FALSE) or ndeclust however must be fixed to a small value, no more then 3.

The FLP step
The FLP step is performed only if flp = TRUE. In this case we must have declustering = TRUE. To maximize expression (7) with respect to the bandwidths (h x , h y ), we used the R function flp1.etas.nlm() which minimizes the function flpkspace() through the optimization function nlm(). flpkspace() substantially computes −FLP, defined in (7) and is an interface to the Fortran subroutine deltafl1kspace().
The kernel density is computed by the R function kde2dnew.fortran(), which is a standard bivariate normal kernel estimator, substantially an interface to the Fortran subroutine density2(). The arguments of the function are: • xkern, ykern: x-values, y-values of kernel points of length n (n = length(xkern)).
• gx, gy: x-values, y-values of the points where densities must be estimated.
• w: Vector of weights to give to observed points (length n).
• eps: Enlargement factor for the region.
The used kernel weighted estimator is as in (4), with w i = ρ i and Σ a diagonal matrix with elements h.

The object of class 'etasclass'
An object of S3 class 'etasclass' is returned by the function etasclass() and contains information on the estimation procedure. The main items of the output are: • this.call: The exact call that originated the output.
• params.ind: Indicates which parameters have been estimated.
• params: ML estimates of the ETAS parameters.
• sqm: Estimates of standard errors of the ML estimates of the ETAS parameters (sqm[i] = 0 if params.ind[i] = FALSE).
• hdef: Final bandwidths used for the kernel estimation of background spatial intensity (however estimated, with flp = TRUE or flp = FALSE).
• rho.weights: Estimated probability for each event to be a background event (ρ).
• time.res: Rescaled time residuals (for time processes only).
• params.iter: A matrix with the estimates values at each iteration.
• sqm.iter: A matrix with the estimates of the standard errors at each iteration.
• rho.weights.iter: A matrix with the values of rho.weights at each iteration.
Also the values of optional input arguments are returned in the output.

The output: Text and graphical output
plot and summary methods for object 'etasclass' As usual in R, the output is managed through the standard S3 methods print() and summary() for textual output and the plot() method for default graphic output defined for objects of class 'etasclass', the standard output object of the function etasclass().
The text output of summary() contains essential information on the catalog, on the ML estimates and their standard errors and on the general iterative estimation process, giving the sequence of AIC values.
The plot method has some specific optional argument (the first argument is a compulsory 'etasclass' object): • pdf: If TRUE, then 2D plots are sent to a pdf file; • file: Name of the pdf file; • ngrid: Number of points for each direction (x, y) of a ngrid × ngrid space points grid, where estimated intensities must be evaluated. Default value = 200, which gives a good graphical resolution; • flag.3D: Option of the plot method. If TRUE a 3D plot is also produced with estimated intensities for each observed point; • flag.log: Option of the plot method. If TRUE then a log scale is used to plot intensities. The default is FALSE; of course the plot method can be called twice with two different scale options.
The computation of the grid of intensities has been left to the plot method to avoid too many computations in the main routine etasclass(). Three intensity plots are computed: the whole intensity, the background and the triggered ones.
Values of the triggered intensity on a space grid of values are efficiently computed through the Fortran subroutine etasfull8tintegrated(), which essentially computes the space intensities through an integration on the time domain of the components g(t − t j |m j ): where (x, y) is one of the ngrid × ngrid points of the spatial grid. As in other parts of the package etasFLP, time consuming routines are written in Fortran.
In our package we use function image.plot(), from package fields (Nychka, Furrer, Paige, and Sain 2016), to plot the intensities on the maps (geographic maps are computed with the function map() of the package maps; Becker, Wilks, Brownrigg, and Minka 2016).
The values of the intensities on the grid are however given as output, so that users could use this plot method as a grid computing engine and can use anything else for graphical plotting (e.g., using some Google maps resources). Some more graphical diagnostic outputs and residual analysis tools are provided by the plot method, as described in Adelfio and Chiodi (2015b), and used also in Nicolis, Chiodi, and Adelfio (2015).

profile() and plot() methods for objects 'etasclass' and 'profile.etasclass'
Approximate inference on a single parameter can be made through the profile method for 'etasclass' objects. Given the likelihood function L(θ) for the whole model, where the unconstrained ML estimate of the unknown parameter vector θ isθ (of k components, k ≤ 8), denote by θ j the single parameter of interest for which the profile must be built. Let θ (−j) be the k − 1 components vector of the remaining parameters, and letθ (−j) (θ j ) be the ML estimate of θ (−j) corresponding to a fixed value of (θ j ). The function L[θ j ,θ (−j) (θ j )] is the profile likelihood function for θ j .
Indeed we used a rescaled version (−2 log(LR)) of the profile likelihood, so that we define: The study of this function versus θ j is useful to make approximate inference on a single parameter; for example we can build the 1 − α confidence interval for θ j , selecting the interval for which l p (θ j ) < χ 2 α . To avoid heavy computations, l p (θ j ) is computed only for a few values of θ j , and then a spline interpolation is used to have a reasonable sketch of the plot of the profile likelihood; the approximation will be in general quite good since the profile function is convex near the global solution and furthermore we know the global minimum value. This method is numerically convenient since each computation of a value l p (θ j ) is computationally very expensive: A likelihood maximization problem must be solved, with k − 1 unknown components instead of the k components of the unconstrained problem. We will need only one ML step, since the non-parametric components will be the same of the full model.
The profile method needs an object of class 'etasclass' as input, in order to have a full model already estimated without constraints. The computation of the profile likelihood will leave unchanged all options used to obtain the 'etasclass' object, e.g., fixed parameters, weighting and declustering options, background seismicity.
• iprofile: An integer in the range 1-8. The profile likelihood will be computed with respect to the parameter of index iprofile. The order of parameters is: µ, κ 0 , c, p, α, γ, d, q.
• nprofile: The number of values of params[iprofile] for which the profile likelihood must be computed. Default value = 7. An odd number of grid points nprofile is advised, so that the central point is the unconstrained ML estimate for the profiled parameter, and the spline interpolation will have a better quality.
The global ML estimates are in ml = fitted$params [profile], with asymptotic standard errors sd = fitted$params.sqm. The profile likelihood is computed in the range (ml -sd * kprofile, ml + sd * kprofile). Default value = 3.
• profile.approx: If TRUE, then a conditional-likelihood approach is used to obtain a first valueθ * (−j) (θ j ) for each maximization step in the profile likelihood computation. Default value = FALSE. The formula used is: The plot method for an object of the class 'profile.etasclass' is so defined: plot(x, prob = c(0.9, 0.95, 0.99), ...) • x: Is an object of the class 'profile.etasclass' (the output of the profile method for 'etasclass' objects).
In the plot method for 'profile.etasclass' objects the profile likelihood is computed and plotted with a spline interpolation on nprofile points. Inverse spline interpolation is used to obtain approximate confidence intervals with coverage probabilities given as input in the vector prob.

Examples
The following examples have been executed on a PC with a Intel Core CPU i7-3770 (3.40GHz) with a RAM of 8Gb; we used the R version 3.2.2 (64-bit) running under Ubuntu 14.04 LTS.

Example with the Italian data
Let consider a sample earthquake catalog: R> data("italycatalog", package = "etasFLP") Assigning to the catalog the class 'eqcat', summary() and plot() methods are defined to have elementary descriptive functions on an earthquake catalog: R> class(italycatalog) <-"eqcat" R> summary(italycatalog) R> plot(italycatalog) On this sample earthquake catalog we can run a sample execution (many options are however fixed to their default values, and in the following example are assigned explicitly only to explain their meaning): R> etas.flp <-etasclass(italycatalog, magn.threshold = 3.0, + magn.threshold.back = 3.5, k0 = 0.005, c = 0.005, p = 1.01, a = 1.05, + gamma = 0.6, q = 1.52, d = 1.1, + params.ind = c (TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE) The previous call of etasclass() fits a space time ETAS model to the catalog italycatalog, only for events with a minimum magnitude of magn.threshold = 3.0. The events with magnitude at least 3.5 (magn.threshold.back = 3.5) are used to obtain a catalog with bigger events: A default kernel estimator of the x, y density on this catalog will be the basis for the first approximationf h (0) (x, y).
The options declustering = TRUE and ndeclust = 15 mean that the whole mixed procedure described in Section 4.1 will be used, with a maximum of 15 steps (the input value of ndeclust).
The option thinning = FALSE means that declustering is made with weighting and not with thinning.
flp = TRUE means that the forward predictive likelihood (Section 4) is used to estimate the bandwidths.
• onlytime = FALSE: Means that a space-time model will be fitted.
• is.backconstant = FALSE: The background intensity will not be constant in the space region.
• sectoday = TRUE: The variable time in catalog italycatalog is in seconds, and this option transforms it into days.
• usenlm = TRUE: The default nlm() function of R will be used in the ML step.
• epsmax = 0.001: Controls the convergence criterion of the declustering steps.  TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,  TRUE, TRUE) This matrix contains the ML estimates of the 8 parameters of the ETAS model for each declustering iteration reported in each row. We can see that the values of the parameters estimates vary a certain amount for the first iterations, but then values remain almost unchanged. To evaluate the information of the declustering and weighting process, we can examine the values of the weights ρ i , i = 1, 2, . . . , n reported in the vector rho.weights. A good situation is when the weights are all close to 0 or 1 (that is, each event is clearly a background event or a triggered one). In Figure 1 we plot the images of the background and triggered space intensities estimated by the FLP approach, obtained by:

R> plot(etas.flp)
In Figure 2 we plot the images of the background and triggered space intensities estimated without the FLP approach, obtained by: R> plot(etas.noflp) As an example, profile is computed for α using the object etas.flp: In Figure 3 we report the graphical output of the command plot(prof.flp), which displays the profile likelihood (computed with profile(etas.flp)) of the parameter α, for the model estimated with the option flp = TRUE. The method used to plot the profile is explained in Section 5.5. Let's have a look at the profile results: We obtained a 95% asymptotic confidence interval (based on the LR test) around α of (1.381−1.635); we can compare this value with the classical asymptotic normal based interval (θ ∓ z 1−α s θ ) obtaining for α (the 5th parameter): 1.509371 ∓ 1.96 × 0.064077 = 1.384 − 1.635, very close to the LR based interval.

An example with California data
Another execution with a different catalog, is shown with an application to the North California data: We selected events with a magnitude of at least 3.5 (magn.threshold = 3.5). Here we used many default choices, so flp = TRUE and thinning = FALSE. All the eight parameters of the ETAS model will be estimated. The summary of the results is reported below.  In Figure 4 the estimated background and triggered space intensities evaluated on the region are reported (obtained with the command plot(etas.flp.cal)). In Figure 5 we report the graphical output of the command plot(prof.flp.cal), which shows the profile likelihood for the parameter α. The object prof.flp.cal has been obtained with the method profile.

Final remarks and future enhancement
The package etasFLP has been published on the Comprehensive R Archive Network (CRAN) and is available from https://CRAN.R-project.org/package=etasFLP. Some computing improvement can be made (together with some generalization). The iterative architecture of declustering will remain unchanged, but some improvements will be made: Particularly we are studying the possibility of using very approximate estimates of parameters in the first steps of declustering, and then improving the precision in later steps. This should increase speed.
After some trial, we decided to use, in the optimization steps, the functions nlm() and optim() with the numerical computation of derivatives. Actually we experimented in nlm() the use of analytical derivatives instead of numerical derivatives, but we did not see significant advantages, both for the computational times and for convergence.
The main functionality improvement will be the use of the FLP approach with a variable kernel estimator, to allow for pattern of points with different geographical orientation.
Feedback from users will be however the best way to improve the package, in an open source philosophy.