RobPer : An R Package to Calculate Periodograms for Light Curves Based on Robust Regression

An important task in astroparticle physics is the detection of periodicities in irregularly sampled time series, called light curves. The classic Fourier periodogram cannot deal with irregular sampling and with the measurement accuracies that are typically given for each observation of a light curve. Hence, methods to ﬁt periodic functions using weighted regression were developed in the past to calculate periodograms. We present the R package RobPer which allows to combine diﬀerent periodic functions and regression techniques to calculate periodograms. Possible regression techniques are least squares, least absolute deviations, least trimmed squares, M, S-and τ -regression. Measurement accuracies can be taken into account including weights. Our periodogram function covers most of the approaches that have been tried earlier and provides new model-regression-combinations that have not been used before. To detect valid periods, RobPer applies an outlier search on the periodogram instead of using ﬁxed critical values that are theoretically only justiﬁed in case of least squares regression, independent periodogram bars and a null hypothesis allowing only normal white noise. Finally, the package also includes a generator to generate artiﬁcial light curves.


Introduction
We introduce the R (R Core Team 2015) package RobPer (Thieler, Rathjens, and Fried 2015), which can be used to calculate periodograms and detect periodicities in irregularly sampled time series.Our special objective are light curves, which occur in astroparticle physics and are irregularly sampled times series (t i , y i , s i ) i=1,...,n consisting of unequally spaced observation times t 1 , . . ., t n , observed values y 1 , . . ., y n and measurement accuracies s 1 , . . ., s n .The measurement accuracies s i give information about how precise the y i were measured.They can be interpreted as estimates for the standard deviations of the observed values.The observed values possibly contain a periodic fluctuation y f with fluctuation period p f and the irregularly spaced observation times t i are realizations of random variables with a periodically shaped density.
Such periodicity in the pattern of the observation times is a typical phenomenon, as the sampling of astroparticle physics' time series is influenced among others by astronomical constellations.For example, plotting a histogram of the observation times for the gamma Figure 1: Light curve with gamma particle emissions for the very high energy gamma particle source Mrk 421 (see Tluczykont et al. 2010, and references therein).Panel 1a shows the light curve, vertical lines at each point show the reported measurement accuracies.Panel 1b depicts a histogram of the observation times t i modulo the period p s = 27.31.A sine represents the shape rather well.
particle source Mrk 421 modulo the period p s = 27.31shows an unequal distribution over a cycle of this length (see Figure 1).This is due to the fact that observations cannot be sampled during full moon and the moon period is similar to p s .
So we assume the following model for the observations indexed by i = 1, . . ., n: s i : given estimate for σ i independent from Y 1 , . . ., Y n , where T (i) denotes the ith ordered observation time in T 1 , . . ., T n and D(p s ) is a periodic sampling density with period p s .The observation times t 1 , . . ., t n and the observed values y 1 , . . ., y n are realizations of T 1 , . . ., T n and Y 1 , . . ., Y n , respectively.We assume the observation times to be measured without error.Y f ;i is the systematic periodic component in the observations, corresponding to an unknown periodic function f and the period p f we are searching for.Y w;i is additive noise.
To detect a periodic fluctuation with period p f in the observed values y i , it is not possible to use the standard periodogram of Fourier analysis.This method can only be applied to time series with equidistant observation times, while light curves are typically irregularly sampled.A setting-adapted procedure, the Deeming periodogram (Deeming 1975), is not recommendable either in this case, because it is known to react to a periodicity p s in the sampling (see Hall and Li 2006).
In order to determine periodicity in light curves, other methods than the classical Fourier periodogram or the Deeming periodogram should be used.Popular periodogram methods in astroparticle physics are for example the Lomb-Scargle periodogram (Scargle 1982) or the phase dispersion minimization periodogram (Stellingwerf 1978).These and many other approaches can be generalized to fitting periodic functions to the light curve using least squares regression and calculating periodogram bars based on SE and SY, where SE is the remaining variance in the residuals of the fit and SY is the overall variance in the observed values y i .An even broader class of periodogram methods additionally allows application of robust regression instead of least squares regression and weighted regression to take the measurement accuracies s i into account.
The function RobPer in our homonymous R package calculates a periodogram of a light curve based on fitting periodic functions to (t i , y i ) i=1,...,n using least squares or a robust regression technique, optionally taking measurement accuracies s i into account using weighted regression.The coefficient of determination corresponding to the objective function of the regression technique is used as periodogram bar.This proceeding incorporates analogues to most of the existing periodograms and introduces several new techniques.Preliminary implementations of most of these periodogram methods have been compared by Thieler, Backes, Fried, and Rhode (2013).Here, we explain the usage of the R package RobPer, which makes improved and extended methods for period detection publicly available.
This article is organized as follows: In Section 2, the usage and the structure of the function RobPer are explained.Especially, the different periodic functions and regression techniques are discussed and related to the existing periodogram methods.Diagrams which show how this R function is implemented in detail are displayed in Appendix A. Section 3 is devoted to the question how to find valid periods using a periodogram.Thieler et al. (2013) propose robust fitting of a beta distribution combined with outlier detection.The function betaCvMfit in the package RobPer performs this.In Section 4, the function tsgen is presented which allows to generate artificial light curves.Some examples for how to use the package are given in Section 5. Section 6 concludes with a summary.

Calculate periodograms with RobPer
The R function RobPer calculates a periodogram of a given light curve (t i , y i , s i ) i=1,...,n .This is done by fitting a periodic function g to the data (t i , y i ) i=1,...,n .The function g has m parameters entering g linearly.It has a period of 1 and is transformed by g t p j for each given trial period (p j ) j=1,...,q .A simple example is g(t) = sin(2πt)β 1 + cos(2πt)β 2 .The periodogram bars for the different trial periods are defined as the coefficients of determination of the respective fits.Using weighted regression with weights 1/s i makes it possible to take Light curve (ti, yi, si) or (ti, yi), i = 1, . . ., n ; If weighting = FALSE the measurement accuracies si column may be omitted.
weighting ∈ {T, F} If TRUE, weighted regression is performed to take into account the si.

Return value periodogram ∈ R q
Vector of periodogram bars belonging to the trial periods.the measurement accuracies into account.As the shape of the true fluctuation f in Equation 3is usually unknown, we will typically have g = f .

Possibly warnings
Table 1 gives an overview over all arguments of RobPer.The possible shapes of the function g that may be fitted by RobPer are presented in Section 2.1.Fitting them using least squares regression is in many cases equivalent to already existing periodogram methods (see with respect to the unknown parameter value β ∈ R m , where X ∈ R n×m is the design matrix containing the known components of g t p at the measurement times t 1 , . . ., t n with p being a trial period and y the vector of observations y 1 , . . ., y n .In the simple example mentioned above, the ith row of X has the elements sin(2πt i /p) and cos(2πt i /p).The function ζ : R n → [0, ∞[ is chosen according to the regression method, e.g., ζ(r) = n i=1 r 2 i for least squares regression.Using the same regression technique, the location µ of the observations y 1 , . . ., y n can be estimated minimizing with i = 1 n being an n-variate vector of ones in case of unweighted regression.The periodogram bar can then be calculated as R 2 = 1 − SE SY .This definition for the coefficient of determination does not only apply for least squares regression, but also for least absolute deviation-(L 1 ) and M-regression in general (see Maronna, Martin, and Yohai 2006, p. 171) as well as for S-, least trimmed squares-(LTS) and τ -regression (see Croux and Dehon 2003).
If it is intended to take given measurement accuracies s 1 , . . ., s n into account, weighted regression can be performed.In this case, the terms y, X and i in the two fitted models are replaced by y i = y i /s i , X ij = X ij /s i and i i = i i /s i = 1 n /s i , respectively.In the following, we will focus on the case of unweighted regression and only point out the handling of weighted regression, when both procedures differ.and 2step), the sine function (sine), Fourier series of second and third degree and periodic spline functions (fourier(2), fourier(3) and splines).Regression techniques: See Table 3 for labels.The underlined methods can take into account measurement accuracies using weighted regression.The periodogram bars of methods marked by * do not base on SE or SY, but on the parameter vector of the function fitted (e.g., squared amplitude).

Step functions
Many periodogram methods from astroparticle physics such as the epoch folding periodogram (Leahy et al. 1983) or the analysis of variance periodogram (Schwarzenberg-Czerny 1989) can be interpreted as fitting a step function to a light curve (see Schwarzenberg-Czerny 1998 or Thieler et al. 2013).They use periodogram bars related to R 2 , n and the numbers of steps per cycle.
Another typical periodogram method in astroparticle physics is the phase dispersion minimization periodogram (PDM, Stellingwerf 1978).Depending on the particular setting the periodogram bar in many cases equals the mean of the coefficients of determination of two fits with different step functions with staggered jumps (see Thieler et al. 2013 or Thieler 2013 for more details).
RobPer provides two options to fit periodic step functions.The number of steps per cycle is controlled by the argument steps.Using model = "step", a single periodic step function with steps of equal width is fitted for each trial period.Performing regression = "L2", model = "step" is equivalent to calculating an epoch folding-or analysis of variance periodogram.Using model = "2step", two different step functions with opposed jump times and steps of equal width are fitted separately and the periodogram bar is the mean of both coefficients of determination.This is the only option where two periodic functions are fitted for one trial period.It is included to provide the PDM periodogram with overlapping bins.

Sine functions
Sine functions are periodic and quite popular for investigating periodicity.The classic periodogram of Fourier analysis for equally sampled time series represents the explained variance SE of a least squares fit of a sine model to the zero-centered time series.The Lomb-Scargle periodogram (Scargle 1982) works equivalently for unequally sampled time series.
As the mean of an irregularly sampled time series is not identical to the least squares fit of an intercept in a sine model, more recent methods use the uncentered data and fit a model with intercept, e.g., the floating mean periodogram by Cumming et al. (1999) and the generalized Lomb-Scargle periodogram by Zechmeister and Kürster (2009).Performing regression = "L2", model = "sine" is equivalent to calculating those periodograms and in case of equidistant observation times also equivalent to the Fourier periodogram.Some other methods as the Date Compensated Fourier Transform by Ferraz-Mello (1981), the SigSpec periodogram by Reegen (2007) or robust approaches by Ahdesmäki et al. (2007) and Zhang and Chan (2005) apply the same regression step as the floating mean-and the generalized Lomb-Scargle periodogram, but use the squared amplitude of the fitted sinusoid as the periodogram bar.In case of regular sampling, this is another representation of the classical periodogram of Fourier analysis.As the amplitude is a concept closely related to trigonometric functions, RobPer uses the coefficient of determination only, to obtain a general method independent of the periodic function chosen.

Further periodic functions
Recently, fitting more complex periodic functions has been proposed for periodograms.Fourier series (see Hall et al. 2000 andPalmer 2009) and periodic splines (see Akerlof et al. 1994, Hall et al. 2000and Oh et al. 2004) may provide better adaptivity compared to sine functions, but still present a continuous function, unlike the step function.RobPer offers the possibility to fit Fourier series of second (model = "fourier(2)") or third (model = "fourier(3)")  degree or a periodic spline function with four knots per cycle (model = "splines").For the latter option, B-splines are generated using the function spline.desfrom the package splines (Bates and Venables 2016).

Regression techniques: Argument regression
Instead of fitting the models mentioned above by the popular least squares regression (see Table 2), RobPer also allows application of six robust regression techniques, see Table 3. Robust regression techniques like least absolute deviations, least trimmed squares (Rousseeuw and Yohai 1984) and M-regression (Huber and Ronchetti 1981) have already been used to fit sines (evaluating the squared amplitude) by Zhang and Chan (2005), Ahdesmäki et al. (2007), Li (2009) and Li (2010).M-regression with the Huber function was applied to fit periodic splines by Oh et al. (2004).Thieler et al. (2013) use least absolute deviations and M-regression and all models described in this article to calculate periodograms based on the coefficient of determination.
To the best of our knowledge, S- (Rousseeuw and Yohai 1984) and τ -regression (Yohai and Zamar 1988) have not been used before in periodogram calculation.For the latter, RobPer uses the algorithms Fast-S from Salibian-Barrera and Yohai (2006) and Fast-τ from Salibian-Barrera, Willems, and Zamar ( 2008) and slightly modified versions of the code distributed with the respective publication (see the respective paragraphs entitled in Section 2.2).The following paragraphs outline the algorithms used by RobPer for calculating the different regression estimators.For the basic definitions of these regression techniques we refer to the literature mentioned above and the book by Maronna et al. (2006).

LTS regression
The R function ltsReg from package robustbase (Rousseeuw et al. 2015) is used to perform LTS regression in RobPer.In preliminary studies we observed that the function can have problems finding a good solution for some of the candidate periods.This results in coefficients of determination which are too small or sometimes even negative.By setting LTSopt = TRUE, it is possible to let RobPer further optimize the solution of ltsReg by using the R function genoud from package rgenoud (Mebane, Jr. and Sekhon 2011).This function uses an evolutionary approach to improve the given solution, locally optimizing the tem-porarily best solutions in a gradient descent algorithm.Further arguments pop.size (size of one generation), max.generations (maximum of generations before stopping the algorithm) and wait.generations(maximum number of generations to wait for an improvement of the optimization criterion) control the behavior of the algorithm and can be set in RobPer by the argument genoudcontrol (see Table 1).The argument tol controls the precision for convergence criteria.
A further problem we observed is that ltsReg sometimes aborts the fit.However, it is typically able to perform the fit if it is run again.In case of a crash, RobPer calls ltsReg up to three times.After the third failed attempt, the respective periodogram bar is set to NA, or a least absolute deviation regression is performed.The latter is done, if the ltsReg regression result should be further processed, using the genoud algorithm or using the LTS result as initial estimate for an M-regression fit (see next paragraph).

M-regression
In case of M-regression, a periodogram bar, i.e., the coefficient of determination where σ is an estimate of the error scale σ in the regression model.As explained above, Equations 10 and 11 represent the minimization criteria of the fits of the chosen periodic fluctuation (SE in Equation 5) and of a location estimate (SY in Equation 6), respectively.The function ρ is a distance measure.The vector i consists of ones in case of unweighted regression.
As mentioned before, in case of weighted regression, y i , i i and the rows x i of the design matrix are standardized by the measurement accuracy s i (see Figure 12 in Appendix A).
The value σ is obtained in an initial estimation of the periodic fluctuation, calculating a scale estimate of the fitted residuals.In principle, one could use a different estimate of σ calculated from fitting only an intercept in Equation 11, but Maronna et al. (2006, p. 171) recommend using the scale estimate from the (larger) regression model.In our context this means that SY depends on the trial period and cannot be calculated globally.On the other hand this ensures that the regression model implementation is based on an iteratively reweighted least squares (IRWLS) approach (see Maronna et al. 2006, pp. 104-105), and meets our special requirements.For M-regression using the biweight function, the implementation makes also use of the function genoud from package rgenoud (see previous paragraph) to overcome possible problems with local optima.
As noted above, weighted regression scales observed values and design matrices by the measurement accuracies.The variance of the error is expected to be about one then.Hence it can be reasonable to set σ to one.This can be done in RobPer setting the argument var1 to TRUE, as is recommendable in our experience in case of weighted M-regression.
To calculate a periodogram bar using M-regression with IRWLS, three initial estimates are needed: A scale estimate σ (if not set to one) and initial location estimates β (0) and µ (0)  for β and µ.The initial estimates should be obtained using robust techniques.As proposed by Maronna et al. (2006, p. 105)

S-regression
In case of regression = "S", RobPer uses the Fast-S algorithm by Salibian-Barrera and Yohai (2006) to perform S-regression for fitting the periodic function efficiently.The algorithm starts with a set of N parameter candidates, locally optimizes them using kk iterations, then optimizes the tt best of these candidates until convergence and finally chooses the best parameter candidate.
The R function FastS used in RobPer is a slightly modified version of the function fast.spublished by Salibian-Barrera and Yohai (2006).It was changed in order to work more efficiently in the context given here, especially when fitting step functions, and to specify one parameter candidate in advance.This candidate is set to where m denotes the dimension of the linear model of the periodic function and μ denotes the location estimate.β µ arises from plugging in the fit obtained from the location model into the parametrization of the full model.This ensures that fitting the full periodic function will not give a worse fit than fitting only a location parameter.Otherwise it could happen that SY < SE and the coefficient of determination (which has to be in [0, 1]) would be negative.
Further changes in FastS are: 1.The arguments k and best.rare renamed to kk and tt to unify notation as in FastTau.
The arguments int, N, kk, tt, b, cc and seed are merged to a list Scontrol, which is also an argument of RobPer (except for int, which is fixed in RobPer).
2. If an intercept column is added to the design matrix (using Scontrol$int = TRUE), this is done before the dimension of the design matrix is determined (instead of doing this first and redoing it in case of Scontrol$int = TRUE).
3. To find a subsample in general position, regressors x i are sampled from the set of rows of the design matrix X ignoring the frequency of occurrence in X.For each regressor x i , one value y i is then sampled from the entries of y belonging to this regressor.In case of a step function to be fitted, one observation per step is drawn to get a subsample.
4. If no subsample can be found in 100 trials, FastS returns NA.RobPer then releases a warning, but can calculate further periodogram bars for other trial periods.
5. The internal functions loss.S, re.s, f.w, scale1, our.solve and rho are now defined outside FastS.Otherwise R would have to redefine them for each periodogram bar.
7. The labels of the return values are changed for better interpretation.

τ -regression
In case of regression = "tau", τ -regression is used to fit the periodic function.2. Arguments N, kk, tt, rr and approximate are combined to a list taucontrol, which is also an argument for RobPer.
3. Subsamples in general position are found as in FastS (change 3 in the previous paragraph).
4. If no subsample can be found, FastTau returns NA instead of a break using the stop function.This allows RobPer to release a warning, while calculating further periodogram bars for other trial periods.
5. A block of code used several times to check new regression parameter candidates for providing the best optimization value so far has been modularized into the subfunction checkbest.
6. Due to rounding errors, it may happen in the IRWLS algorithm that negative values close to zero occur, although they have to be non-negative by theory.This is avoided by setting such values to zero.
7. The subfunction randomset is replaced by the R function sample from the base package as both functions fulfill the same task and sample is faster.
8. The labels of the return values are changed for better interpretation.

Fit beta distributions with betaCvMfit
In this section we present the function betaCvMfit, which robustly fits a beta distribution to a sample using Cramér-von-Mises (CvM) distance minimization.The function is adapted from R code by Brenton R. Clarke for fitting a gamma distribution (see Clarke, McKinnon, and Riley 2012) using CvM distance minimization.Section 3.1 motivates the application of this function, while its usage is explained in more detail in Section 3.2.

Motivation
After a periodogram is calculated, one might be interested in the automatic detection of significant periods.A period shall be called significant, if the respective periodogram bar is atypical from the distribution of the applied criterion under the null hypothesis of no periodic fluctuation.To determine significance, this distribution needs to be known or estimated.Let Q α be the α-quantile of this distribution.Assuming independent identically distributed periodogram bars Per(p 1 ), . . ., Per(p q ) we get A single periodogram bar calculated as described in Section 2 using unweighted least squares regression is B( m−1 2 , n−m 2 )-distributed, where B denotes the beta distribution and m is the dimension of the model.This result can be found in Schwarzenberg-Czerny (1998) or easily be deduced from Seber and Lee (2003, p. 110) and Gupta and Nadarajah (2004, p. 51).Already small violations of the assumptions made about the method or the light curve disturb this proceeding.In this work, we consider weighted and robust regression in addition to ordinary least squares.Besides, we have to take into account small deviations from our model assumptions like bad estimates s i .An example is shown in Figure 2. Panel 2a shows the weighted least squares periodogram (using a sine model) of a light curve only consisting of white noise.The observed values were generated as with y w;i and y r;i being realizations from The value of s i is given for all i, and c is chosen to fulfill where var() denotes the empirical variance.This means, there is roughly an extra 20 percent noise which is not explained by the measurement accuracies.Evidently, no periodogram bar is outstanding, but using the q √ 0.95 quantile of a B( m−1 2 , n−m 2 ) distribution (dashed line), several periods are found automatically.
To circumvent these problems, Thieler et al. (2013) propose to relax the assumption of a predefined B m−1 2 , n−m 2 -distribution and only assume that the periodogram values can be approximated by any beta distribution.As peculiar periods are expected to show up as outliers, robustly fitting a B(θ 1 , θ 2 )-distribution to Per(p 1 ), . . ., Per(p q ) is proposed.The authors use CvM distance minimization for this, which has been recommended by Clarke et al. (2012) for fitting gamma distributions in the presence of outliers.The CvM is defined as where u (1) , . . ., u (n) is the ordered sample, F n is the empirical distribution function and F θ is the distribution function of B(θ 1 , θ 2 ).
Panel 2b shows the predefined (solid) and the CvM-fitted (dashed) beta density for a periodogram calculated from the only-noise-data described above.While the q √ 0.95 quantile of the predefined distribution is about 0.03, the related quantile of the fitted distribution is 0.16 and no period is detected automatically.
The above approach falls within the framework of outlier detection described by Davies and Gather (1993) and is successfully used by Thieler et al. (2013) in the context discussed here.However, it assumes independent periodogram bars.This may cause problems when the periodogram peaks are broad (because the assumption of independency of the periodogram bars is violated): Then it can be hard for the automatism to find any outlying periodogram value, as there are many high values.One might try to ease this problem choosing a selection of trial periods with large distances or considering only the periods referring to local maxima in the periodogram as (roughly) independent trial periods (modifying and expanding an approach of Zechmeister and Kürster 2009) and fit the beta distribution to them using a CvM fit.
Simulations indicate that the beta distribution describes the distribution under the null hypothesis rather well for the different periodograms.Nevertheless, in the following we will call detected periods "valid" and not "significant" to stress that our approach to detect periods lacks a theoretical justification.

The R function betaCvMfit
The function betaCvMfit fits a B(θ 1 , θ 2 )-distribution with mean θ 1 /(θ 1 + θ 2 ) to a sample vector data using CvM distance minimization and has been applied in Thieler et al. (2013) for fitting beta distributions to periodograms to detect valid periods.
As it may happen that the periodogram bars become negative due to fitting problems, the function sets all negative entries of data to zero.If the logical argument CvM is set to TRUE, a CvM fit is calculated.As initial values for the optimization, the moment estimates of the beta distribution are used.If the argument rob is set to TRUE, the median and the median absolute deviation from the median (MAD) are used instead of the arithmetic mean for x and the standard deviation for ŝ, respectively.In case of a very small estimate ŝ (which happens particularly if ŝ is the MAD), the function stops as it is not possible to calculate the estimates θ 1 and θ 2 shown above.The parameters of a beta distribution are strictly positive.Since it can happen that θ 1 or θ 2 are negative, the initial estimates are clipped to be at least 0.00001.If CvM is set to FALSE, the CvM distance is not optimized, and the initial estimates θ 1 and θ 2 are returned.
Figure 3 shows the different fits varying the arguments CvM and rob for 50 B(4, 15)-distributed observations containing 10 percent outliers between 0.8 and 1.

Generate light curves with tsgen
To investigate our periodogram methods in simulations, we implemented the R function tsgen to generate artificial light curves.A preliminary version of this function is used in Thieler   The function calls several autonomous subfunctions one by one which perform individual simulation steps.These are: 1. Generate a sampling t 1 , . . ., t n (using sampler, see Section 4.1).
4. Disturb the light curve replacing measurement accuracies s i by outliers, or replacing observations y i = y f ;i +y w;i +y r i by aperiodic features (using disturber, see Section 4.4).
Table 4 lists all arguments for the subfunctions.The gray-shaded arguments are also arguments for tsgen, which passes them to the respective subfunction.

Generate sampling using sampler
The R function sampler is used to sample observation times t 1 , . . ., t n in the interval [0, n s •p s ] with a possibly periodic sampling of period p s .The sampling pattern depends on the argument ttype (see Table 4).If a periodic pattern is chosen, the observed time interval covers n s cycles of it.
In case of ttype = "equi", the observation times are equidistantly sampled with t i = i ps•ns n .For ttype = "unif", the observation times are drawn independently from a uniform distribution on [0, n s • p s ].Both these sampling schemes are aperiodic, the sampling period p s only influences the duration t n − t 1 of the sampling.
For ttype = "sine" and ttype = "trian", the observation times are sampled from a periodic density with sampling period p s .First, observation cycles z i are drawn from a discrete uniform distribution on {1, . . ., n s } to determine the cycle the ith observation is part of.Second, observation phases ϕ i are sampled with density or To sample from d sine , the function BBsolve from package BB (Varadhan and Gilbert 2009), is used.The unsorted observation times t i are then generated using The sine-shaped density is motivated by sampling patterns observed in real data, see Panel 1b.The triangular shaped density offers an alternative periodic sampling design.Separately sampling observation cycle and phase was proposed by Hall and Yin (2003).
As the result, sampler returns the ordered observation times t 1 , . . ., t n .

Generate periodic signal using signalgen
To generate the periodic component in the observed values, the R function signalgen is used.The values y f ;1 , . . ., y f ;n with fluctuation period p f at observation times t 1 , . . ., t n are generated using The observation times, the fluctuation period and the shape of f are arguments of signalgen (see Table 4).In case of ytype = "const", f is defined as Table 4: Arguments for the subfunctions of tsgen.See the respective section for more details.
Gray-shaded values are also arguments for tsgen, which passes the values to the respective subfunction."var" denotes the empirical variance.so there is no (periodic) fluctuation.This setting can be used to investigate the false alarm probability of a period detection method.In case of ytype = "sine", f is defined as This is a typical assumption in the literature.For ytype = "trian", with ϕ 1 (t) = t mod 1 = (t − t ) is used.This triangular shaped function was originally implemented in order to be able to choose between different periodic shapes.The light curve observed for CoRoT ID 0105288363 (Chadid et al. 2011) shows that functions with a similar shape are quite realistic.When choosing ytype = "peak", y f is generated using This function mostly shows values close to zero and large values for only one time unit per cycle.This "peak" occurring in each cycle has an asymmetric shape.
As the result, signalgen returns the periodic component y f ;1 , . . ., y f ;n of the observed values.

Add noise and measurement accuracies using lc_noise
The R function lc_noise is used to generate measurement accuracies s 1 , . . ., s n and add noise to a periodic fluctuation (see Table 4).The measurement accuracies are sampled from a gamma(3, 10) distribution.This choice is motivated by real data from Tluczykont et al. (2010).As shown in Equation 4, the noise component y w = (y w;1 , . . ., y w;n ) is a realization of Y w with Y w;i ∼ N (0, s 2 i ).A second noise component y r does not depend on the s i .It is generated as red noise, i.e., following a power law with power law index α.For α = 0 we get white noise.Flicker noise (pink noise) is generated using α = 1 and brown noise using α = 2.The power law noise is generated using subfunctions TK95_uneq and TK95.The latter generates an equidistant time series of power law noise according to Timmer and König (1995).For irregular observation times, a noise series resulting from TK95 is used and an unequally sampled noise series is generated following Uttley, McHardy, and Papadakis (2002).
The noise components are scaled so that the variance of the y r;i has approximately the proportion redpart in the overall noise variance and that SNR is the ratio var(y f )/ var(y w + y r ), where var(x) is the empirical variance of vector x.Note that the white noise components' variances are exactly s 2 i , so that the s i are not estimates but true values.In this sense, the measurement accuracies of a generated light curve are more informative for our artificial data than for real light curves, where the measurement accuracies are estimates.Allowing for a second noise component makes it possible to lower the information of the measurement accuracies with respect to the overall noise in the observed values.
The function lc_noise returns the observed values y i = y f ;i + y w;i + y r;i , i = 1, . . ., n.

Disturb light curve using disturber
The last subfunction applied in tsgen is disturber, which can be used to disturb a given light curve (see Table 4).It replaces a given fraction of measurement accuracies by the smaller value s i = 1 2 min(s 1 , . . ., s n ), i in a subset of {1, . . ., n}.As small measurement accuracies stand for precise observations, the influence of observations with disturbed measurement accuracies s i rises in case of a weighted fit.For unweighted regression, this type of disturbance does not affect the result of the fit.
Optionally, disturber also replaces observed values y i by atypical values.For this, a time interval [t start , t start + 3p s ] within the interval [t 1 , t n ] is randomly chosen and all observed values belonging to this time interval are replaced by a peak function: where φ denotes the density of the standard normal distribution.If the y i are intended to be disturbed and the light curve is shorter than 3p s , the function will stop with an error message.
The function returns the modified vectors y = (y 1 , . . ., y n ) T and s = (s 1 , . . ., s n ) T .If the option to change y values is not used (see Table 4) and the fraction of outlying measurement accuracies is set to zero, y and s are returned unchanged.

Application
In this section, we give examples how to use the RobPer package for light curve analysis.We start with an artificial example, also given in the manual, and then analyze some real data.
Alternatively, the functions sampler, signalgen, lc_noise and disturber can be used to generate the same light curve, see Section 4.

Sampling observation times:
R> set.seed( 22 The result is the same:
In the next step, we calculate a periodogram of the light curve.The periodogram is calculated fitting a step model using unweighted M-regression with the Huber function.The light curve spans a time interval of approximately ncycles • ps = 500 time units, so it is sensible to investigate periods up to 50 (one tenth, see Halpern, Leighly, and Marshall 2003).

Disturbed data from GROJ0422+32
The first real data set we analyze is a light curve for gamma ray emission of the source GROJ0422+32, obtained by the BATSE Earth Occultation Monitoring project of the NASA.These experiments are described in Harmon, Fishman, Wilson, Paciesas, Zhang, Finger, Koshut, McCollough, Robinson, and Rubin (2002) and Harmon, Wilson, Fishman, Connaughton, Henze, Paciesas, Finger, McCollough, Sahi, Peterson, Shrader, Grindlay, and Barret (2004).The data have been kindly provided by the NASA, are available from http: //gammaray.nsstc.nasa.gov/batse/occultation,and are shown in Panel 7a.
A large peak is visible starting at about 48900 Markarian Julian days (which corresponds to December 10 1991 in the Gregorian calendar), a so called gamma ray burst.It occasionally occurs in gamma ray observations and can be considered as outlier.The light curve covers a time interval of about 3312 days, so following Halpern et al. (2003) we consider periods up to 330 days (about one tenth of the overall duration of the light curve).Figure 7b shows the periodogram obtained fitting a sine function using least squares regression, which is the classical approach in astroparticle physics.It is calculated using R> data(star_groj0422.32)R> PP <-RobPer(star_groj0422.32, periods = 1:330, model = "sine", + regression = "L2", weighting = FALSE) Periodograms for τ -regression and M-regression using the Huber function are obtained replacing "L2" by "tau" or "huber" in the code above.The respective periodograms are shown in Panels 7c and 7d.All three periodograms do not show any outstanding peak.Apart from this, the periodograms using robust regression have a completely different shape than the least squares periodogram, which seems to have problems with the gamma ray burst.It might be questionable if the least squares periodogram can find a periodic structure in the observations in the presence of the gamma ray burst.We add a sine with period 30 and amplitude 0.005 to the observed values and repeat the analysis.The results can be seen in Figure 8.In Panel 8a it is visible that we did not introduce a strong periodic behavior.Nevertheless, the robust periodograms, Panels 8c and 8d, easily detect it, while there is only a small local peak in the least squares periodogram in Panel 8b.The horizontal lines in Panels 8c and 8d show the respective 330 √ 0.95-quantiles of the CvM-fitted beta distribution and are calculated from a periodogram PP using So, as opposed to least squares regression, robust techniques are able to detect an (added) periodic fluctuation although the data are disturbed seriously by the gamma ray burst.period is detected, but considering the shape of the periodogram, one might wonder if there is a periodicity of 31 hidden in the same way as when adding a small periodic fluctuation to the GROJ0422+32 data, see Panel 8b.However, the periodograms for τ -regression in Panel 9b and Huber M-regression in Panel 9c show a different behavior from Figure 8, so this does not seem to be the case.Especially, the least squares and the Huber M periodogram show a which is equivalent to epoch folding or phase dispersion minimization when using least squares regression (see Section 2).The periodogram is calculated applying R> data(Mrk501) R> RobPer(Mrk501, periods = 1:400, model = "step", regression = "L2", + weighting = FALSE) in case of least squares regression and with regression = "tau" or regression = "huber" in case of τ -or Huber M-regression, respectively.For least squares regression in Panel 10b and Huber M-regression in Panel 10d we see a broad peak between the trial periods 200 and 300, much too broad to be considered as valid period (see Halpern et al. 2003).For τ -regression in Panel 10c, this behavior is not observed.

Data from Markarian 421 and 501
In the examples from the previous section, robust techniques recognize some periodicity in a light curve, while the least squares periodogram only provides a slightly atypical behavior for the trial period in question.Here it is the other way round: the least squares periodogram does not indicate a valid period, but exhibits some interesting feature similar to the previous data set, where a periodicity was hidden in noisy data.This initial suspicion cannot be confirmed by using robust regression instead of least squares regression.In summary, using our methods, we do not find a periodicity in the light curves for Mrk 421 and Mrk 501, neither using least squares nor robust regression.

Conclusions
The R package RobPer presented in this work allows searching for periodicity in irregularly sampled time series, possibly taking into account additional information on the precision of the measurement, if available.These are the typical characteristics of light curves, that is time series occurring in astroparticle physics.The periodogram is calculated fitting periodic functions to the light curve.The user can choose between six different periodic functions and seven different regression techniques, meaning that 42 possible combinations are offered, not taking into account further options like choosing the number of steps for the step model Figure 11: Reading guidance for the structograms: In Panel 11a, the blocks used for the representation of an algorithm.In Panel 11b, a structogram (left) for a simple R code (right), which generates the observations (t i , y i , s i ) i=1,...,100 of a simple light curve with fluctuation period 5.This R code is for demonstration only and not programmed efficiently.

Figure 2 :
Figure 2: Example illustrating that a predefined B m−1 2 , n−m 2 distribution is sometimes not flexible enough if the model restrictions are slightly violated (see text for details).Panel 2a shows the periodogram of a light curve not completely following the assumed data model with the q √ 0.95 quantile of a B( m−1 2 , n−m 2 ) distribution (dashed line).Panel 2b shows a histogram of the periodogram bars, with the density of the B( m−1 2 , n−m 2 ) (dashed) and the CvM-fitted beta distribution with parameters 0.8 < 1 = m−1 2 and 40.18 < 248.5 = n−m 2 (solid).

Figure 3 :
Figure 3: Gray-scale-version of the example for betaCvMfit given in the RobPer manual: Histogram of 45 B(4, 15)-distributed observations and 5 outliers uniformly distributed between 0.8 and 1.The black solid line shows the B(4, 15)-distribution, the other curves show different fits using betaCvMfit (in case of CvM = TRUE, the different settings for rob lead to the same result).

Figure 4 :
Figure 4: Artificial light curve in Panel 4a with vertical bars marking the s i .Plotting time axis modulo 7 in Panel 4b reveals the periodic fluctuation of p f = 7. Histogram and sampling density of the observation times modulo 20 in Panel 4c shows the sampling periodicity of p s = 20.

Figure 5 :Figure 6 :
Figure 5: Periodogram bars calculated fitting a spline model using unweighted M-regression with the Huber function to the artificial example from Figure 4: Robustly fitting a beta distribution to the periodogram bars in Panel 5a leads to two outstanding trial periods in Panel 5b.

AFigure 7 :
Figure 7: Analysis of GROJ0422+32: Panel 7a shows the light curve, while the other panels show the periodograms fitting a sine using least squares in Panel 7b, τ -in Panel 7c, Huber M-regression in Panel 7d.No periodogram bar exceeds the respective 330 √ 0.95-quantile of the CvM-fitted beta distribution (horizontal line).

Figure 8 :Figure 9 :
Figure 8: Adding a sine with amplitude 0.005 to the light curve of GROJ0422+32.Panel 8a shows the modified light curve, while the other panels show the periodograms fitting a sine using least squares in Panel 8b, τ -in Panel 8c, Huber M-regression in Panel 8d.The horizontal lines in those three panels show the respective 330 √ 0.95-quantile of the CvM-fitted beta distribution.

Figure 10 :
Figure 10: Light curve in Panel 10a and periodograms for Mrk 501 obtained fitting a periodic step function with least squares regression in Panel 10b, τ -regression in Panel 10c, Huber M-regression in Panel 10d.

µSYFigure 12 :m
Figure 12: Structogram of RobPer.The block singleFUN is displayed in detail in Figure 13.

Figure 13 :
Figure 13: Structogram of singleFUN.NA indicates a missing value.The block IRWLS is displayed in detail in Figure 14.

Table 1 :
Arguments and return values of the function RobPer.{T, F} means {TRUE, FALSE}.

Table 2 or
Thieler et al. 2013for a more detailed discussion).

Table 2 :
Published periodogram methods that rely on fitting a periodic model g to a light curve using a regression technique.Models (see Section 2.1): periodic step functions and pairwise overlapping step functions (step

Table 3 :
Regression techniques implemented in RobPer and R functions used to perform the regression technique.For more details see Section 2.2.
Rousseeuw et al. (2015)2 ≥ 0.So for this regression technique, an implementation is needed where the scale estimate can be fixed in advance.For M-regression using the biweight function, the function lmrob..M..fit from package robustbase byRousseeuw et al. (2015)is used.This R function includes Huber M-regression only as a limiting case of Hampel M-regression with all but one of its tuning constants set to very large values.In other R functions known to us for M-regression (rlm from package MASS byVenables and Ripley 2002, iwlsm from package RSiena by Ripley, Boitmanis, and Snijders 2013 and robustregBS and robustRegH from package robustreg by Johnson 2015), the scale estimate cannot be fixed in advance.Hence M-regression using the Huber function is newly implemented for RobPer.Like the functions specified before, this we use the median (weighted if the s i shall be taken into account) to initially estimate µ.For β, LTS regression (see previous paragraph) is used.It has a high breakdown point and is appropriate in situations with many observations not agreeing with the best fit.This situation will often occur in periodogram calculation, as many trial periods and thus many wrong models are fitted to the light curve.The scale estimate σ is calculated as the (weighted) median of the residuals of the LTS fit.