mosum: A Package for Moving Sums in Change-Point Analysis

Time series data, i.e., temporally ordered data, are routinely collected and analysed in many fields of natural science, economics, technology and medicine, where it is important to verify the assumption of stochastic stationarity prior to modeling the data. Nonstationarities in the data are often attributed to structural changes, with segments between adjacent change-points being approximately stationary. A particularly important, and thus widely studied, problem in statistics and signal processing is to detect changes in the mean at unknown time points. In this paper, we present the R package mosum, which implements elegant and mathematically well-justified procedures for the multiple mean change problem using moving sum statistics.


Introduction
With its beginnings dating back as far as the 1950s (Page 1954), change-point analysis is still a very active field of research in statistics. It can broadly be classified into procedures for sequential or online detection of change-points (i.e., monitoring for changes as the data is being observed) and offline detection (i.e., searching for changes after all the data is observed). In this work, we present the package mosum (Meier, Cho, and Kirch 2021), which provides an implementation of the moving sum (MOSUM) procedure from Eichinger and Kirch (2018) and its multiscale extension for offline detection of multiple changes in the mean. It is available for the statistical computing language R (R Core Team 2021) from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=mosum.
There exist many theoretical approaches and software implementations to offline change-point analysis. For example, the R package bcp (Erdman and Emerson 2007) provides an imple-

MOSUM procedure for multiple changes in the mean
In this section, we discuss the idea of moving sums and describe the procedures for detecting multiple change-points in the mean which are implemented in the mosum package. In Section 2.1, we review the intuition and the mathematical theory behind the use of MOSUM statistics for multiple change-point detection. In Section 2.2, we explain how to produce estimators of the change-point locations from the MOSUM statistics, followed by a brief discussion of MOSUM-based variance estimation in Section 2.3. Section 2.4 discusses the choice of the bandwidth. In Sections 2.5 and 2.6, we present extensions of the MOSUM procedure with asymmetric and multiple summation bandwidths, respectively. Section 2.7 elaborates on how to construct bootstrap confidence intervals for the change-point locations.

MOSUM statistic
Consider observations $X_1, \ldots, X_n$ drawn independently from a distribution with the same mean. By the Law of Large Numbers, it follows that, for a sufficiently large summation bandwidth $G > 0$,
$$\frac{1}{G}\sum_{t=k+1}^{k+G} X_t - \frac{1}{G}\sum_{t=k-G+1}^{k} X_t \approx 0.$$
If, on the other hand, there is a change in the mean of height $d$ at time point $k$, then this difference is approximately $d$. Based on this observation, the following MOSUM statistic provides a good tool for change-point detection:
$$T_G = \max_{G \le k \le n-G} |T_G(k)|, \qquad T_G(k) = \frac{M_G(k)}{\hat\sigma_k}, \qquad (1)$$
with the MOSUM detector
$$M_G(k) = \frac{1}{\sqrt{2G}}\left(\sum_{t=k+1}^{k+G} X_t - \sum_{t=k-G+1}^{k} X_t\right), \qquad k = G, \ldots, n-G, \qquad (2)$$
and a local estimator $\hat\sigma_k^2$ of the innovation variance (see the upcoming Section 2.3). The values $T_G(k)$ for $k$ at the left and right boundaries can be calculated by adopting a CUSUM-type boundary extension:
$$T_G(k) = \frac{1}{\hat\sigma_k}\sqrt{\frac{2G}{k(2G-k)}}\left(\sum_{t=1}^{k} X_t - k\,\bar X_{(1,2G)}\right), \qquad k = 1, \ldots, G-1, \qquad (3)$$
with $\bar X_{(1,2G)} := (2G)^{-1}\sum_{t=1}^{2G} X_t$, and similarly for $k = n-G+1, \ldots, n$. The statistical model underlying the MOSUM procedure is a classical change-point location model (cf. Eichinger and Kirch 2018)
$$X_t = f_t + e_t = \sum_{j=1}^{N+1} \mu_j\, \mathbb{I}\{k_{j-1} < t \le k_j\} + e_t, \qquad t = 1, \ldots, n. \qquad (4)$$
The piecewise constant deterministic signal $f_t$ has $N$ change-points at $k_j$, $j = 1, \ldots, N$ (with the notational convention $k_0 := 0$ and $k_{N+1} := n$), and the centred model innovations $e_t$ are assumed to be independent and identically distributed. Note that the MOSUM detector from (2) decomposes as
$$T_G(k) = T_G(k; X) = T_G(k; f) + T_G(k; e), \qquad (5)$$
where $T_G(k; g)$ denotes the statistic in (1)-(2) computed from a generic sequence $(g_t)$ in place of $(X_t)$ (with the same $\hat\sigma_k$). We call $T_G(k; f)$ the MOSUM signal and $T_G(k; e)$ the noise term.
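To make the definitions concrete, here is a minimal base-R sketch of the detector from (2) (without the boundary extension or the variance scaling); the helper name mosum_detector and the cumulative-sum implementation are illustrative choices, not the package's internals.

```r
# Sketch: MOSUM detector M_G(k) from (2), for k = G, ..., n - G.
# Base-R illustration only; the package's own implementation differs in detail.
mosum_detector <- function(x, G) {
  n <- length(x)
  k <- G:(n - G)
  cs <- c(0, cumsum(x))                  # cs[t + 1] = sum of x[1..t]
  right <- cs[k + G + 1] - cs[k + 1]     # sum over (k, k + G]
  left  <- cs[k + 1] - cs[k - G + 1]     # sum over (k - G, k]
  (right - left) / sqrt(2 * G)
}

# A series with a mean change of height 1 at k = 50:
set.seed(1)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 1))
M <- mosum_detector(x, G = 20)
# |M| should peak near the change-point at k = 50.
```

On a noise-free step signal, the absolute detector attains its unique maximum exactly at the change-point, which is the property exploited in Section 2.2.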
The number of change-points $N$ and their locations $k_j$, $j = 1, \ldots, N$, as well as the heights of the jumps $d_j = \mu_{j+1} - \mu_j$, are typically unknown, and in many applications the estimation of both the number and the locations of the changes is of particular interest. The MOSUM detector lends itself naturally to this purpose: whenever a mean change occurs at time $k$, the corresponding MOSUM signal $T_G(k; f)$ from (5) attains a local maximum in absolute value, which is superimposed by the noise term $T_G(k; e)$ in the detector $T_G(k; X)$. This is illustrated in Figure 1, where the prominent peak of the MOSUM signal at time $k = 200$ is still clearly visible in the noisy $T_G(k; X)$.
In order to make use of this observation for change-point detection, a suitable threshold is needed (for details, see Section 2.2 below). One option is to use a critical value of the corresponding test procedure as the threshold. The actual distribution of the MOSUM statistic is not known in general, even for well-known innovation distributions and small sample sizes. One therefore makes use of an asymptotic result to derive a MOSUM-based test for changes in the mean. Indeed, for appropriate bandwidths $G = G(n)$ satisfying $G \to \infty$ and $G/n \to 0$ as $n \to \infty$ (as detailed in Appendix C),
$$a_G\, T_G - b_G \Rightarrow \Gamma_2,$$
where $\Rightarrow$ denotes convergence in distribution, $\Gamma_2$ is a Gumbel-distributed random variable with $P(\Gamma_2 \le z) = \exp(-2\exp(-z))$, and $a_G$ and $b_G$ are sequences of properly chosen scaling and shifting factors depending only on the sample size $n$ and the bandwidth $G$. This asymptotic result gives rise to a MOSUM-based test with asymptotic level $\alpha$, which rejects the null hypothesis $H_0: N = 0$ against the alternative $H_1: N > 0$ when the MOSUM statistic $T_G$ from (1) exceeds the asymptotic critical value
$$C_{n,G}(\alpha) = \frac{b_G + Q_{1-\alpha}(\Gamma_2)}{a_G}, \qquad (6)$$
where $Q_{1-\alpha}(\Gamma_2)$ is the $(1-\alpha)$-quantile of the Gumbel distribution. The $p$ value corresponding to the test statistic $t$ is given by $p_{n,G}(t) = 1 - \exp(-2\exp(b_G - a_G t))$.
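The critical value (6) and the $p$ value can be computed directly from the Gumbel limit once the scaling factors are known; the sketch below takes $a_G$ and $b_G$ as given inputs (their exact formulas are stated in Appendix C and not reproduced here), and the function names are hypothetical.

```r
# Q_{1-alpha}(Gamma_2) for P(Gamma_2 <= z) = exp(-2 exp(-z)):
gumbel_quantile <- function(alpha) -log(-log(1 - alpha) / 2)

# Asymptotic critical value (6), given the scaling factors a_G and b_G:
crit_value <- function(alpha, a_G, b_G) (b_G + gumbel_quantile(alpha)) / a_G

# p value of an observed statistic t:
p_value <- function(t, a_G, b_G) 1 - exp(-2 * exp(b_G - a_G * t))

# Consistency check with illustrative (made-up) scaling factors:
a_G <- 2.1; b_G <- 4.3; alpha <- 0.05
p_value(crit_value(alpha, a_G, b_G), a_G, b_G)  # equals alpha = 0.05
```

By construction, the $p$ value evaluated at the critical value equals the significance level, which is a convenient sanity check for any implementation.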
The bandwidth G plays a crucial role in the performance of the methodology in practical applications. We will discuss this issue in Section 2.4.

Change-point estimators
The absolute MOSUM detector $|T_G(k)|$ (see (2)) is a powerful tool for visual change-point inspection. The corresponding (absolute) MOSUM signal $|T_G(k; f)|$ (see Figure 1 for an example) is a piecewise linear function, which is equal to zero far away from the change-points, increases linearly as a change-point is approached, and then decreases towards zero after the change-point. Consequently, a jump of the underlying step signal $f_t$ results in a peak in the MOSUM signal, with the location of the jump coinciding with that of the local maximum of $|T_G(k; f)|$.
In practice, $T_G(k; f)$ is not observable and we have to work with the noisy MOSUM detector $T_G(k) = T_G(k; X)$ instead. Therefore, it is natural to apply a threshold to the (scaled) absolute MOSUM detector and construct change-point estimators based on the local maxima of neighborhoods exceeding the threshold. To elaborate, we consider significant neighborhoods $(l, r)$ such that
$$|T_G(k)| > C_{n,G}(\alpha) \quad \text{for all } l \le k \le r, \qquad (7)$$
with the maximal point $k^\bullet_{(l,r)} := \arg\max_{l \le k \le r} |T_G(k)|$. Choosing the maximal point within every significant neighborhood as a change-point estimator, however, may result in false positives for the following reason. Recall that the MOSUM signal $T_G(k; f)$ is a linear function to the left and right of the true change-point and will cross the threshold. The observed detector, on the other hand, adds noise to this signal. Therefore it can happen, merely by chance, that the detector falls below the threshold only to exceed it again for just one or two time points, before crossing beneath the threshold again. Obviously, this results in additional significant neighborhoods and hence spurious estimators. See Figure 2 for an example, where this phenomenon occurs at about time point 90. Every statistical procedure based on the MOSUM detector thus needs to take this effect into account.
While spurious peaks such as this can usually be distinguished from true changes by visual inspection, the following two mathematical criteria are implemented in the R package mosum in order to avoid such systematic over-estimation.

The ε-criterion
Let $0 < \varepsilon < 1/2$ be fixed. A maximal point $k^\bullet_{(l,r)}$ within a significant neighborhood as in (7) is accepted as a change-point estimate if and only if
$$r - l \ge \varepsilon G. \qquad (8)$$
This criterion states that the significant neighborhood has to be large enough relative to the summation bandwidth $G$. Otherwise, we argue that the significance is merely due to the influence of a neighboring change (as in Figure 2 at about time point 90), and the point is discarded. Based on extensive simulation studies, we recommend $\varepsilon = 0.2$ as a reasonable default choice which works well in many situations. The estimators thus obtained for the change-point locations are consistent in rescaled time, as long as the distance between neighboring change-points is, in some asymptotic sense, at least twice the bandwidth; see Eichinger and Kirch (2018) for further details. The $\varepsilon$-criterion is particularly useful if the bandwidth is sufficiently large, i.e., $G \ge 20$.

The η-criterion
Let $\eta \ge 0$ be fixed. The $\eta$-criterion accepts a time point $k^\bullet$ as a change-point estimator if and only if it is the maximal point within its own $\eta G$-environment, i.e.,
$$|T_G(k^\bullet)| = \max\{|T_G(k)| : k^\bullet - \eta G < k < k^\bullet + \eta G\}. \qquad (9)$$
From the simulation studies, we found that a value of $\eta = 0.4$ can be recommended if there is no further knowledge about the mutual distance of changes in the data. Lifting the requirement on the significance of the entire $\eta G$-environment, the $\eta$-criterion is less conservative than the $\varepsilon$-criterion. This can be particularly useful when the bandwidth is small; we compare the two criteria in the following.
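As an illustration, the $\eta$-criterion can be sketched in a few lines of base R; eta_local_maxima is a hypothetical helper operating on a vector of scaled absolute detector values and may treat boundary points and ties differently from the package.

```r
# Sketch of the eta-criterion (9): accept time point k if the scaled absolute
# detector exceeds the threshold at k and is maximal within the surrounding
# eta * G environment. Illustrative helper, not the package code.
eta_local_maxima <- function(Tabs, G, thresh, eta = 0.4) {
  n <- length(Tabs)
  h <- floor(eta * G)
  keep <- logical(n)
  for (k in seq_len(n)) {
    if (is.na(Tabs[k]) || Tabs[k] <= thresh) next
    nb <- max(1, k - h):min(n, k + h)
    keep[k] <- Tabs[k] == max(Tabs[nb], na.rm = TRUE)
  }
  which(keep)
}
```

Applied to a detector with an isolated peak, the helper returns the peak location only, even when a few neighboring points also exceed the threshold.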

Comparison of the two criteria
The ε-criterion is conservative in the sense that it not only requires significance of one point (as per the corresponding testing procedure) but of a whole neighborhood of length at least εG. Therefore, in some situations, the ε-criterion might be too restrictive. In particular, if the bandwidth is small (e.g., G = 8), significance environments of length 2 or even of length 1 should not be discarded a priori. See Figure 3 for an example of such a situation, where the changes at time points 10 and 20 are detected with a significant environment of length 1; nonetheless, the peaked shape of the detector clearly indicates the presence of changes.
Furthermore, for larger bandwidths and certain jump signals, the detector can lie entirely above the threshold. In such cases, the $\varepsilon$-criterion only keeps the global maximizer of the detector as a change-point estimator, even though a single change-point clearly could not have caused the detector to be significant everywhere. See Figure 4 for an example of such a situation, where the locations of the changes are visible in the detector (in terms of peaks), but the detector does not fall below the critical value between the changes. The $\eta$-criterion with a suitably chosen $\eta$, on the other hand, can correctly identify more than one change-point in this example.

Variance estimation
The standard variance estimator $(n-1)^{-1}\sum_{t=1}^{n}(X_t - \bar X_n)^2$ is consistent in the absence of mean changes, but systematically over-estimates the variance in the presence of mean changes and is thus not suitable for change-point estimation procedures. The median absolute deviation (Hampel 1974) or inter-quartile range estimators are popularly adopted in the presence of change-points, but they have also been observed to over-estimate the variance when change-points are frequent; see Fryzlewicz (2020). An alternative approach is to use a local variance estimator, such as the MOSUM-based variance estimator defined as
$$\hat\sigma^2_{(l,r)} := \frac{1}{r - l + 1}\sum_{t=l}^{r}\left(X_t - \bar X_{(l,r)}\right)^2,$$
with $\bar X_{(l,r)}$ denoting the sample mean of the observations $X_l, X_{l+1}, \ldots, X_r$.
Possible choices of $\hat\sigma^2_k$ for use within (1) are the local variance estimators
$$\hat\sigma^2_k = \frac{1}{2}\left(\hat\sigma^2_{(k-G+1,k)} + \hat\sigma^2_{(k+1,k+G)}\right), \qquad (10)$$
$$\hat\sigma^2_k = \min\left(\hat\sigma^2_{(k-G+1,k)},\ \hat\sigma^2_{(k+1,k+G)}\right), \qquad (11)$$
$$\hat\sigma^2_k = \max\left(\hat\sigma^2_{(k-G+1,k)},\ \hat\sigma^2_{(k+1,k+G)}\right). \qquad (12)$$
The variance estimator in (11) is preferable for the detection of change-points at which not only the mean but also the variance changes, whereas the estimator in (12) is more robust in the sense that it prevents spurious estimators in the presence of a time-varying variance; see Sections 4.1 and 4.3 for an example with heteroscedastic data.
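The estimators (10)-(12) are straightforward to sketch in base R; local_var below is an illustrative helper mirroring the var.est.method options discussed later, not the package code.

```r
# Segment variance over observations x[l..r], the building block of (10)-(12):
sigma2_lr <- function(x, l, r) {
  seg <- x[l:r]
  sum((seg - mean(seg))^2) / (r - l + 1)
}

# Combine the left and right windows around k as in (10), (11) and (12):
local_var <- function(x, k, G, method = c("mosum", "mosum.min", "mosum.max")) {
  method <- match.arg(method)
  v_left  <- sigma2_lr(x, k - G + 1, k)   # window (k - G, k]
  v_right <- sigma2_lr(x, k + 1, k + G)   # window (k, k + G]
  switch(method,
         mosum     = (v_left + v_right) / 2,
         mosum.min = min(v_left, v_right),
         mosum.max = max(v_left, v_right))
}
```

If the variability differs between the two windows, the three choices can give markedly different values, which is precisely the distinction drawn above between (11) and (12).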
It is possible to consider the model (4) with dependent innovations $e_t$ fulfilling weak dependency assumptions. The distributional convergence, and thus the thresholds associated with the corresponding critical values as discussed in Section 2.1, remain valid as long as the variance estimators $\hat\sigma^2_k$ are replaced by appropriate estimators of the long-run variance $\tau^2 = \lim_{n\to\infty} \mathrm{Var}(\sqrt{n}\,\bar e_n)$ with $\bar e_n = n^{-1}\sum_{t=1}^{n} e_t$. For this, we may use the MOSUM version of the flat-top kernel estimator (Politis and Romano 1995) calculated with sufficiently large bandwidths (e.g., $G \ge 50$), or the difference-based estimators from Tecuapetla-Gómez and Munk (2017), Dette, Eckle, and Vetter (2020) and Axt and Fried (2019). This is not recommended in the presence of frequent changes, however, as accurate estimation of the long-run variance is typically very difficult and estimators based on small and medium-sized samples are not very reliable. An alternative approach is to use the local variance estimators (ignoring the dependence structure) while slightly increasing the threshold, e.g., using $C_{n,G}(\alpha) \cdot \log(n/G)^{\delta}$ for some $\delta > 0$ (e.g., $\delta = 0.1$), where $C_{n,G}(\alpha)$ is as in (6).

Choice of bandwidth
In practice, the choice of the bandwidth plays a crucial role for the performance of the MOSUM procedure. Smaller bandwidths can detect large jumps even if there are neighboring change-points close by. At the same time, large bandwidths may not be suitable for estimating the locations of such change-points due to contamination of the signal by neighboring change-points. Indeed, with large bandwidths, the signal may have a flat top (i.e., the signal no longer has a unique maximum, but the maximal value is attained over an interval); see the third and fourth panels in Figure 5 for an example. In this case, while being significant, the maximal point in the corresponding significant neighborhood is effectively arbitrary on that interval, so it cannot be used for the purpose of change-point localization.
On the other hand, large bandwidths are able to detect small isolated changes, whereas small bandwidths tend to miss such small jumps even if they are surrounded by long stationary stretches. Consider Figure 5 as an example, where a bandwidth of at least G = 70 seems to be required to detect the small change at k 1 = 100. One remedy for this issue is to adopt multiple bandwidths, which will be discussed in the upcoming Section 2.6.
In a similar manner, for some signals, using the same bandwidth for the left and right summation windows in the MOSUM detector does not provide sufficient flexibility. Consider the case of a small mean change located close to a large mean change, see the top panel of Figure 6 for such an example. The change may be too small to be detected by a small bandwidth, whereas the summation windows will be contaminated by the neighboring large mean change when a large bandwidth is used. One way of overcoming this limitation is to consider the use of asymmetric bandwidths G = (G l , G r ), which will be discussed in the following Section 2.5.

Asymmetric bandwidths
Let $G = (G_l, G_r)$ be a bandwidth, possibly asymmetric (i.e., $G_l \ne G_r$). The asymmetric MOSUM detector is defined as
$$M_G(k) = \sqrt{\frac{G_l G_r}{G_l + G_r}}\left(\frac{1}{G_r}\sum_{t=k+1}^{k+G_r} X_t - \frac{1}{G_l}\sum_{t=k-G_l+1}^{k} X_t\right), \qquad k = G_l, \ldots, n - G_r, \qquad (13)$$
with $T_G(k) = \hat\sigma_k^{-1} M_G(k)$; for $G_l = G_r = G$, this reduces to the symmetric detector in (2). At the left and right boundaries, the values of $T_G(k)$ can be defined by a CUSUM extension similar to (3):
$$T_G(k) = \frac{1}{\hat\sigma_k}\sqrt{\frac{G_l + G_r}{k(G_l + G_r - k)}}\left(\sum_{t=1}^{k} X_t - k\,\bar X_{(1, G_l + G_r)}\right), \qquad k = 1, \ldots, G_l - 1, \qquad (14)$$
and analogously for $k = n - G_r + 1, \ldots, n$. The critical value $C_{n,G}(\alpha)$ for the asymmetric MOSUM test, as well as the corresponding $p$ values $p_{n,G}$, can be obtained similarly as in the symmetric case; see Section C of the Appendix. The usefulness of asymmetric bandwidths is illustrated in the bottom panel of Figure 6.
The extension of the change-point estimators to the asymmetric case is straightforward. The $\varepsilon$-criterion (8) is adjusted to
$$r - l \ge \varepsilon\,\frac{G_l + G_r}{2}, \qquad (15)$$
and the $\eta$-criterion (9) is adjusted to
$$|T_G(k^\bullet)| = \max\{|T_G(k)| : k^\bullet - \eta G_l < k < k^\bullet + \eta G_r\}. \qquad (16)$$
The MOSUM-based variance estimators can be defined with asymmetric bandwidths in the same fashion as the MOSUM detector. When using asymmetric bandwidths, it is advisable to avoid pairs of bandwidths that are too unbalanced, both in view of the asymptotic theory and of the finite sample performance, as is well known from the two-sample testing literature.
To avoid such situations, one can impose an upper bound on max(G l , G r )/ min(G l , G r ).
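A base-R sketch of an asymmetric detector along the lines of (13) is given below; the standardization factor is chosen so that the symmetric case $G_l = G_r = G$ recovers the detector in (2), and the helper name is hypothetical.

```r
# Sketch: asymmetric MOSUM detector with left window G_l and right window G_r,
# evaluated for k = G_l, ..., n - G_r. Illustrative, not the package internals.
mosum_detector_asym <- function(x, G_l, G_r) {
  n <- length(x)
  k <- G_l:(n - G_r)
  cs <- c(0, cumsum(x))
  mean_right <- (cs[k + G_r + 1] - cs[k + 1]) / G_r   # mean over (k, k + G_r]
  mean_left  <- (cs[k + 1] - cs[k - G_l + 1]) / G_l   # mean over (k - G_l, k]
  sqrt(G_l * G_r / (G_l + G_r)) * (mean_right - mean_left)
}
```

For a linear trend the difference of the two window means is constant, which makes the symmetric special case easy to verify by hand.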
Algorithm 1: Multiscale MOSUM procedure with bottom-up merging.
input: Data (X_1, ..., X_n), set 𝒢 of symmetric bandwidths, α, η ∈ (0, 1)
1  Initialize P ← K ← ∅
   /* P contains the pool of candidates to be pruned down and K the final estimators */
   /* Step 1: Generate candidates */
2  for G ∈ 𝒢 do
3      P_G ← set of MOSUM change-point estimators obtained with bandwidth G and critical value C_{n,G}(α) according to criterion (9)
4      for k̂ ∈ P_G do add (k̂, G) to P
5  end
   /* Step 2: Merging in increasing order with respect to G */
6  for (k̂, G) ∈ P, in increasing order of G, do
7      if |k̂ − k̂′| ≥ ηG for all k̂′ ∈ K then add k̂ to K
8  end
output: K

Multiple bandwidths
In Section 2.4, we showed that the use of a single bandwidth (symmetric or asymmetric) may not be adaptive enough to detect different types of changes, which advocates the use of multiple bandwidths. On the other hand, this may introduce spurious change-point estimates and/or multiple estimates that relate to the same underlying true change-point. For these reasons, an additional merging step is necessary. Below, we introduce two procedures for pruning down a set of candidate change-point estimators obtained from applying the MOSUM procedure with multiple bandwidths.

Multiscale MOSUM procedure with bottom-up merging
As argued in Section 2.4, one possible issue of large bandwidths is that their summation windows may contain, and thus be contaminated by, several changes, rendering the corresponding estimators unreliable. Motivated by this consideration and following Messer, Kirchner, Schiemann, Roeper, Neininger, and Schneider (2014), it is reasonable to keep all the candidate estimators from the smallest bandwidth and, iteratively moving on to the next smallest bandwidth, only keep those that cannot be accounted for by the previously accepted estimates.
To elaborate, in the order of increasing bandwidths, we check whether there exists a previously accepted change-point estimator $\hat k \in K$ which is close to the current candidate $k^\bullet$ detected with the bandwidth $G$, in the sense that $|k^\bullet - \hat k| < \eta G$ for some $\eta \in (0, 1)$. If so, we conclude that the change-point estimated by $k^\bullet$ has already been detected by $\hat k \in K$ and discard $k^\bullet$; otherwise, we add it to $K$. We note that the tuning parameter $\eta$ bears close resemblance to the parameter $\eta$ from the $\eta$-criterion in (9). For this reason, the implementation of the procedure in the R package mosum also uses the $\eta$-criterion to obtain the candidate set in this bottom-up merging. Algorithm 1 gives a high-level description of the procedure. See Messer et al. (2014) for further details and a more comprehensive discussion of the rationale behind this approach in the context of change-point estimation for point processes.
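The merging step just described can be sketched as follows, taking the per-bandwidth candidate sets as given; this is an illustrative reimplementation of Step 2 of Algorithm 1, not the package's own function.

```r
# Sketch of bottom-up merging: candidates is a list of numeric vectors of
# change-point estimators, one per bandwidth, and bandwidths is the
# corresponding vector of G values (any order).
bottom_up_merge <- function(candidates, bandwidths, eta = 0.4) {
  ord <- order(bandwidths)          # process bandwidths from smallest to largest
  K <- integer(0)
  for (i in ord) {
    G <- bandwidths[i]
    for (k in candidates[[i]]) {
      # keep k only if no previously accepted estimator lies within eta * G
      if (length(K) == 0 || all(abs(k - K) >= eta * G)) K <- c(K, k)
    }
  }
  sort(K)
}
```

For instance, a candidate at 52 found with a large bandwidth is absorbed by an already-accepted estimator at 50 from a smaller bandwidth, while genuinely new candidates survive.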
The advantages of the bottom-up merging include its ease of implementation and its computational speed. There are, however, several drawbacks. First, this approach only works with multiple symmetric bandwidths, since a set of asymmetric bandwidths does not provide a canonical ordering. Second, the effect of multiple testing should be taken into account, which has been done in Messer et al. (2014) for the problem considered therein. More importantly, the bandwidths in consideration need to be large enough for the critical value from the asymptotic distribution to be meaningful. For example, there may be false positives in $P$, in the sense that for some $(\hat k, G) \in P$, particularly for small $G$, the detection interval $(\hat k - G, \hat k + G]$ does not include any true change-point, and the bottom-up merging may fail to prevent such tuples from entering $K$. To avoid this, Messer et al. (2014) propose to use only bandwidths of order $n$. However, for signals with frequent change-points in some parts but no jumps in others, this poses a limitation.
Multiscale MOSUM procedure with localized pruning

Niu and Zhang (2012) considered using an information criterion to prune down a set of candidate change-point estimators generated by a multiscale MOSUM procedure, say $P$, by performing an exhaustive search over every possible combination of the candidate estimators; a similar idea has also been explored by Yau and Zhao (2016). Such an approach quickly becomes computationally infeasible, as the number of candidate models grows exponentially with $|P|$. Moreover, it does not take advantage of the fact that each estimator is detected within a local interval determined by the summation windows. That is, for any estimator $\hat k$ detected with bandwidths $G = (G_l, G_r)$, we have the information about its detection interval $I(\hat k) := (\hat k - G_l, \hat k + G_r]$, which can be utilized in the pruning step. In Algorithm 2, we present a high-level description of the information criterion-based localized pruning method applicable with the multiscale MOSUM procedure; in contrast to Algorithm 1, it allows the use of asymmetric bandwidths.

In Step 2 of Algorithm 2, we determine the order in which the change-point candidates $(\hat k, G) \in P$ are to be processed. For this, we adopt a sorting function $c$, which assesses the "significance" of each candidate either by the inverse of the $p$ value associated with its detection, or by the (scaled) jump size:
$$c_p((\hat k, G)) := p_{n,G}\big(|T_G(\hat k)|\big)^{-1}, \qquad c_J((\hat k, G)) := |T_G(\hat k)|. \qquad (17)$$
All change-point candidates are processed one by one in decreasing order of the chosen sorting function. In case of ties, we order the candidates with respect to the length of their detection intervals and associated bandwidths, preferring those returned at shorter bandwidths. In Step 2.2 of Algorithm 2, we identify a local environment $(\hat k_L, \hat k_R]$ around the estimator $k^\bullet$ considered at the current iteration, over which we perform the exhaustive search for change-points.
It is defined by the currently surviving candidates (which either have already been selected or are still to be processed, contained in $C$) closest to $k^\bullet$ whose detection intervals do not overlap with that of $k^\bullet$ (see Step 2.2 of Algorithm 2); any candidate falling within $(\hat k_L, \hat k_R]$ is regarded as belonging to the set of conflicting candidates $D$, in the sense that their detection intervals overlap with that of $k^\bullet$.

Algorithm 2: Multiscale MOSUM procedure with localized pruning.
input: Data (X_1, ..., X_n), set 𝒢 of (possibly asymmetric) bandwidths, α and ε or η ∈ (0, 1), sorting function c as c_p or c_J from (17)
1  Initialize P ← C ← K ← ∅
   /* P contains the pool of candidates to be pruned down, C the currently surviving candidates and K the final estimators */
   /* Step 1: Generate candidates */
2  for G ∈ 𝒢 do
3      P_G ← set of MOSUM change-point estimators obtained with bandwidth G and critical value C_{n,G}(α) according to criteria (15) or (16)
4      for k̂ ∈ P_G do add (k̂, G) to P
5  end
   /* Step 2: Prune down P in decreasing order with respect to c(·) */
6  while P is not empty do
       /* Step 2.1: find the candidate k• of consideration; Step 2.2: identify the local environment (k̂_L, k̂_R] and the conflicting set D; Step 2.3: exhaustive search over D yielding A and the set R of candidates no longer in consideration */
16     Add A to K, remove R from P and update C ← P ∪ A
   end
output: K

For the exhaustive search performed on $D$ in Step 2.3, in which we determine the inclusion or exclusion of each change-point candidate in $D$, the Schwarz Criterion (SC) is adopted. Specifically, for any $A \subset D$, the SC is calculated as
$$\mathrm{SC}(A) = \frac{n}{2}\log\left(\frac{\mathrm{RSS}(A)}{n}\right) + |A|\, p(n),$$
where
$$\mathrm{RSS}(Q) = \sum_{j=0}^{m}\ \sum_{t=\hat k_j + 1}^{\hat k_{j+1}}\left(X_t - \bar X_{(\hat k_j + 1,\, \hat k_{j+1})}\right)^2$$
denotes the residual sum of squares given some set $Q = \{\hat k_1, \ldots, \hat k_m\}$ of change-point candidates (with the convention $\hat k_0 := 0$ and $\hat k_{m+1} := n$). If the user is confident that a normality assumption on the data is reasonable, or at least that all the moments of $X_t$ exist, we recommend the choice $p(n) = \log(n)^{\vartheta}$ with $\vartheta$ slightly larger than one (e.g., $\vartheta = 1.01$). Otherwise, $p(n) = n^{\vartheta}$ should be used with an exponent $2/\nu < \vartheta < 1$, where $\nu$ is the number of moments that one believes to exist, i.e., $E(|e_t|^{\nu}) < \infty$ (see Kühn 2001).
Among the $2^{|D|}$ subsets of $D$, we select the subset $\hat A$ according to the following rule: (I) let $F$ denote the collection of all subsets $A \subset D$ with the property that adding further change-point candidates to $A$ monotonically increases the SC, and let $m^* = \min_{A \in F} |A|$; (II) among the subsets in $F$, select $\hat A$ as one attaining the minimum SC. If there are multiple subsets yielding the minimum SC in (II), we choose one with the minimum cardinality $m^*$; if there are still ties, we arbitrarily select one. An efficient algorithm implementing the exhaustive search according to (II) is outlined in Section A of the Appendix.
Finally, $\hat A$ is added to the final set of estimated change-points $K$ in Step 2.4 of Algorithm 2, while the candidates in $D$ which do not merit further consideration are removed from $P$, and the set of currently surviving candidates $C$ is updated accordingly. Repeatedly performing the localized exhaustive search, the algorithm terminates once $P$ is empty.
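To illustrate the exhaustive search, the sketch below computes a Schwarz criterion of the generic form $n/2 \cdot \log(\mathrm{RSS}/n) + |Q|\,p(n)$; this exact form and its global (rather than localized) evaluation are simplifying assumptions here, and the penalty p_n is supplied by the caller.

```r
# Residual sum of squares for a set of change-points cpts, fitting a
# segment-wise constant mean (assumes every segment is non-empty):
rss <- function(x, cpts) {
  bounds <- c(0, sort(cpts), length(x))
  total <- 0
  for (j in seq_len(length(bounds) - 1)) {
    seg <- x[(bounds[j] + 1):bounds[j + 1]]
    total <- total + sum((seg - mean(seg))^2)
  }
  total
}

# Generic Schwarz-type criterion; assumes rss(x, cpts) > 0.
sc <- function(x, cpts, p_n) {
  n <- length(x)
  n / 2 * log(rss(x, cpts) / n) + length(cpts) * p_n
}
```

Subsets of candidates can then be compared by their SC values, with smaller values indicating a better trade-off between fit and model size.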
We refer to Cho and Kirch (2020b) for a discussion of the theoretical properties of the multiscale MOSUM procedure with localized pruning, where it is shown to achieve consistency both in the total number and in the locations of the change-points under general conditions.

Bootstrap confidence intervals
Consider a set $\hat k_1, \ldots, \hat k_{\hat N}$ of change-point estimates returned by, e.g., the multiscale MOSUM procedures (Algorithms 1-2) or by the single-bandwidth MOSUM procedure from Section 2.2. Conditional on consistent estimation of the multiple change-points, both in their total number (such that $\hat N = N$) and locations, we can construct confidence intervals for the locations of the change-points via bootstrapping. Below, we provide the simplest version of the bootstrap scheme, for i.i.d. innovations.
Denote the (possibly asymmetric) detection bandwidths associated with $\hat k_j$ by $\hat G_j = (\hat G_{l,j}, \hat G_{r,j})$, and let $\hat I_j := \{\hat k_{j-1} + 1, \ldots, \hat k_j\}$ for $j = 1, \ldots, \hat N + 1$, where the convention $\hat k_0 := 0$ and $\hat k_{\hat N + 1} := n$ is employed. Note that $\hat I_j$ contains all time points between the $(j-1)$th and the $j$th estimated change-points, so that the underlying step signal is assumed to be (approximately) constant within this interval. A bootstrap replicate $\{X^*_t : t \in \hat I_j\}$ of the observations $\{X_t : t \in \hat I_j\}$ can be obtained by drawing a random sample of size $|\hat I_j|$ from $\{X_t : t \in \hat I_j\}$ with replacement. Repeating this for all $j = 1, \ldots, \hat N + 1$ yields a bootstrap replicate $\{X^*_1, \ldots, X^*_n\}$ of the observed time series. Then, a bootstrap replicate $\hat k^*_j$ of $\hat k_j$ is obtained as the maximizer of the absolute MOSUM detector as in (13), with $X_t$ replaced by its bootstrap replicate $X^*_t$. From this, confidence intervals for the change-point locations can readily be computed.
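The segment-wise resampling step can be sketched as follows; bootstrap_replicate is a hypothetical helper that resamples independently, and with replacement, within each estimated segment.

```r
# Sketch: segment-wise bootstrap replicate of the series, given the estimated
# change-points cpts. Each segment is resampled with replacement on its own,
# reflecting the assumption of a constant signal within segments.
bootstrap_replicate <- function(x, cpts) {
  bounds <- c(0, sort(cpts), length(x))
  out <- numeric(0)
  for (j in seq_len(length(bounds) - 1)) {
    seg <- x[(bounds[j] + 1):bounds[j + 1]]
    out <- c(out, sample(seg, length(seg), replace = TRUE))
  }
  out
}
```

Because each segment is resampled only from its own observations, a replicate preserves the estimated segment structure while perturbing the noise, which is what the bootstrap distribution of the re-estimated change-points captures.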
Pointwise $100(1-\alpha)\%$ confidence intervals $K^{\mathrm{pw}}_j(\alpha)$ for the individual change-points $k_j$ can be obtained from the empirical quantiles of the bootstrap distribution of $\hat k^*_j - \hat k_j$. To construct uniform confidence intervals for $k_1, \ldots, k_N$, we take into account the multiple testing problem that arises when considering multiple estimates, by computing $M(\alpha)$ as the empirical $(1-\alpha)$-quantile of
$$\max_{j = 1, \ldots, \hat N} \frac{\hat d_j^2}{\hat\sigma^2}\,\big|\hat k^*_j - \hat k_j\big|,$$
where $\hat d_j$ denotes the estimated height of the jump at $t = \hat k_j$ and $\hat\sigma^2$ the pooled innovation sample variance. Then, uniform $100(1-\alpha)\%$ confidence intervals for $k_1, \ldots, k_N$ are given by
$$K^{\mathrm{unif}}_j(\alpha) = \left[\hat k_j - \frac{\hat\sigma^2}{\hat d_j^2}\,M(\alpha),\ \hat k_j + \frac{\hat\sigma^2}{\hat d_j^2}\,M(\alpha)\right].$$
Note that, for each $j = 1, \ldots, \hat N$, if either of the two endpoints of $K^{\mathrm{pw}}_j(\alpha)$ falls outside the detection interval $(\hat k_j - \hat G_{l,j}, \hat k_j + \hat G_{r,j}]$, we trim the confidence interval so that $K^{\mathrm{pw}}_j(\alpha) \subset (\hat k_j - \hat G_{l,j}, \hat k_j + \hat G_{r,j}]$; a similar step is taken for the uniform confidence intervals $K^{\mathrm{unif}}_j(\alpha)$.

Introduction to the package
In this section, we introduce the main functions of the mosum package for multiple change-point detection, mosum, multiscale.bottomUp and multiscale.localPrune, as well as functions for producing confidence intervals for change-point locations and for visualising the results. We start with a brief guide on how to generate the piecewise stationary time series popularly used as benchmarks in the change-point literature in Section 3.1. In Section 3.2, we explain the implementation of the (single-bandwidth) MOSUM procedure described in Sections 2.1-2.2. In the subsequent Sections 3.3 and 3.4, we introduce the implementations of the multiscale MOSUM procedures given in Algorithms 1-2 from Section 2.6, respectively. We present how bootstrap confidence intervals for change-points can be obtained in Section 3.6, and discuss some tools for the visualization of the outcome of multiscale change-point analysis in Section 3.7.

Generating piecewise stationary time series
The function testData generates piecewise stationary time series with i.i.d. innovations.

testData(model = "custom", lengths = NULL, means = NULL, sds = NULL,
  rand.gen = rnorm, seed = NULL, ...)

It takes the following arguments:
• model: A string specifying the model from which a realization is to be generated. The default choice is "custom", which allows the user to specify the piecewise stationary model with the arguments lengths, means and sds. In addition, five change-point models "blocks", "fms", "mix", "teeth10" and "stairs10" from Fryzlewicz (2014) have been implemented. If one of these models is chosen, the arguments lengths, means and sds are ignored.
• lengths: Lengths of the piecewise stationary segments, given as an integer vector. Only in use if model = "custom".
• means, sds: Means and standard deviation scalings of the piecewise stationary segments, given as numeric vectors. The i.i.d. innovations generated from the distribution specified by rand.gen are multiplied by the respective entry of sds over each segment. Note that the entries of sds coincide with the standard deviations in the case of standard normal innovations (rand.gen = rnorm). Only in use if model = "custom".
• rand.gen: A function to generate the time series innovations.
• seed: A seed value to be passed to set.seed (optional) preceding a call of rand.gen. If seed is set to NULL, then set.seed is not called.
The function testData returns a list consisting of a numeric vector containing a realization of the specified time series model, as well as the deterministic piecewise constant signal $f_t$ and the scaling vector applied to the innovations. As an example, we consider a realization of the mix time series model, visually overlaid with the corresponding step signal:

R> td <- testData(model = "mix", seed = 1234)
R> plot(ts(td$x), col = "darkgray")
R> lines(td$mu, col = 2, lty = 2, lwd = 2)

The result is plotted in Figure 7.

MOSUM procedure with a single bandwidth
The single-bandwidth MOSUM procedure from Section 2.2 for multiple change-point estimation is implemented in the function mosum, which takes the following arguments:
• x: Input data $X_1, \ldots, X_n$, i.e., a univariate time series given as a numeric vector (of length n) or an object of class 'ts'.
• G: Bandwidth G, i.e., a single positive integer smaller than n/2. Alternatively, a single numeric value in the interval (0, 0.5) describing G as a fraction of the data length n can be given.
• G.right: Length of the right summation window $G_r$, i.e., a single positive integer, if an asymmetric bandwidth $G = (G_l, G_r)$ is used. As with G, it can alternatively be given as a single numeric value in (0, 0.5). When $\max(G_l, G_r)/\min(G_l, G_r) > 4$ while threshold = "critical.value", a warning message is generated; see also Section 2.5 for a brief discussion.
• var.est.method: A string specifying how the local variance estimation $\hat\sigma^2_k$ is conducted. Currently implemented are:
  - "custom": the local variance estimates supplied by the user; in this case, these values can be passed as a numeric vector of length n with the argument var.custom;
  - "mosum": the MOSUM-based variance estimator from (10);
  - "mosum.min": the MOSUM-based variance estimator from (11);
  - "mosum.max": the MOSUM-based variance estimator from (12).
• var.custom: The custom local variance estimates σ 2 k for k = 1, . . . , n as a numeric vector of length n containing positive values. Only in use if var.est.method = "custom".
• boundary.extension: Logical value indicating whether the values T_G(k) for 1 ≤ k ≤ G_l − 1 and n − G_r + 1 ≤ k ≤ n shall be padded with CUSUM values, see (3). If boundary.extension = FALSE, these values are set to NA.
• threshold: A string indicating which threshold should be used to determine the significance of the scaled absolute MOSUM detector. By default, the asymptotic critical value C_{n,G}(α) from (6) is used, where the significance level α is given by the argument alpha. Alternatively, with threshold = "custom", it is possible to pass a user-defined numeric value with the argument threshold.custom. The latter may be useful e.g., in the case of dependent observations; see the discussion at the end of Section 2.3.
• alpha: A single numeric value in (0, 1) representing the significance level for the critical value. Only in use if threshold = "critical.value".
• threshold.custom: A numeric value greater than 0 to be used as the threshold for the significance of the scaled absolute MOSUM detector. Only in use if threshold = "custom".
• eta: A numeric value greater than 0 for η in the η-criterion. Only in use if criterion = "eta".
• do.confint: A logical value indicating whether to compute confidence intervals for the change-points.
• level: A single numeric value in (0, 1) representing the confidence level for the confidence intervals. Only in use if do.confint = TRUE.
• N_reps : A single positive integer representing the number of bootstrap replicates to be generated for confidence interval construction. Only in use if do.confint = TRUE.
The function mosum is flexible in that it allows for the user to supply custom variance and threshold values using the arguments var.custom and threshold.custom, respectively, as well as providing the theoretically-motivated default choices discussed in Sections 2.1 and 2.3.
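As a hedged sketch of this flexibility, the following supplies a deliberately crude global variance estimate through var.custom; the data-generating code and the choice of G are our own illustration, not defaults prescribed by the package.

```r
## Sketch only: mosum() with user-supplied local variance estimates
## (assumes the mosum package is installed and loaded)
library(mosum)
set.seed(1)
x <- c(rnorm(100), rnorm(100, mean = 2))    # one mean change at k = 100
sigma2 <- rep(var(diff(x)) / 2, length(x))  # crude difference-based global estimate
m <- mosum(x, G = 30, var.est.method = "custom", var.custom = sigma2)
print(m$cpts)
```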
When called, mosum returns an S3 object of class 'mosum.cpts', containing the following entries (apart from the call arguments):
• stat: Scaled absolute MOSUM detector σ_k^{−1}|T_G(k)| for 1 ≤ k ≤ n, as a numeric vector of length n.
• rollsums: Unscaled MOSUM detector T G (k) for 1 ≤ k ≤ n, as a numeric vector of length n.
• var.estimation: Estimates of σ²_k as a numeric vector of length n, computed as specified by var.est.method.
• cpts: A vector containing the locations of the estimated change-points.
• cpts.info: A data frame containing the estimated change-points and their respective detection bandwidths, asymptotic p values of the MOSUM statistics and scaled change heights (obtained as in (17)) as columns.
• ci: An S3 object of class 'cpts.ci' containing confidence intervals for the change-points. Returned iff do.confint = TRUE; see Section 3.6 for further details.
S3 objects of class 'mosum.cpts' are supported by plot, summary, print and confint methods.
As an illustration, we analyse Nile, a time series of length n = 100 containing the annual flow of the Nile at Aswan from the R package datasets (see help(Nile) for further information about the dataset). Using the argument display of plot.mosum.cpts, we can visualize either the input time series or the scaled absolute MOSUM detector along with the estimated change-points.
R> m <- mosum(Nile, G = 20, alpha = 0.05)
R> par(mfcol = c(2, 1), mar = c(4, 2.5, 2.5, 0.5))
R> plot(m, display = "data")
R> plot(m, display = "mosum")

The result is shown in Figure 8. It can be seen that the scaled MOSUM detector exceeds the critical value C_{n,G}(0.05) in the years 1895-1901. Applying the η-criterion described in (9) with the default choice η = 0.4, a single change-point is estimated at α = 0.05. The estimated change-point location at k = 28, where the scaled absolute MOSUM detector attains its maximum, coincides with the year 1898, which is close to the beginning of the construction of the Aswan Low Dam in 1899. Besides the estimated locations of change-points, the summary method for 'mosum.cpts' objects extracts and prints the information contained in the entry cpts.info.

Multiscale MOSUM procedure with bottom-up merging
The function multiscale.bottomUp provides an implementation of the multiscale MOSUM procedure with bottom-up merging described in Algorithm 1 from Section 2.6:

multiscale.bottomUp(x, G = bandwidths.default(length(x),
  G.min = max(20, ceiling(0.05 * length(x)))),
  threshold = c("critical.value", "custom"), ...)

Apart from the arguments passed on to the function mosum (including ...), it accepts the following:
• G: A set G of (symmetric) bandwidths, represented as a vector of integers smaller than n/2, or numeric values in (0, 0.5) describing the bandwidths relative to n. When the smallest bandwidth is smaller than min(20, 0.05n) (0.05 in the case of relative bandwidths) while threshold = "critical.value", a warning message is generated; see the discussion at the end of Section 2.6.1. By default, it is given by bandwidths.default (see Section 3.5), with its arguments set to avoid triggering the warning.
• threshold: As in mosum; additionally, it is possible to pass a user-defined threshold function with the argument threshold.function when threshold = "custom"; see Section 4.2 for an example.
• threshold.function: A user-specified function of the form function(G, n, alpha) for computing a bandwidth-dependent threshold of significance. Only in use if threshold = "custom".
When called, multiscale.bottomUp returns an S3 object of class 'multiscale.cpts', consisting of the following entries in addition to the call arguments: • cpts: Set of change-point estimators K (see Algorithm 1), represented as an integer vector.
• cpts.info: A data frame containing the estimated change-points and their respective detection bandwidths, asymptotic p values of the MOSUM statistics and scaled change heights (obtained as in (17)) as columns.
• pooled.cpts: The candidate set P containing all the estimators from the multiscale MOSUM procedure considered for pruning.
• ci: An S3 object of class 'cpts.ci' containing confidence intervals for the change-points. Returned iff do.confint = TRUE; see Section 3.6 for further details.
S3 objects of class 'multiscale.cpts' are supported by plot, summary, print and confint methods. We provide a detailed description of plot.multiscale.cpts in Section 3.7.
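To sketch the basic workflow, a minimal hypothetical call is given below; the choice of test signal and significance level are our own illustration, not defaults prescribed by the package.

```r
## Sketch: multiscale bottom-up merging with the default bandwidth grid
## (assumes the mosum package is installed and loaded)
library(mosum)
xb <- testData(model = "blocks", seed = 123)$x
mbu <- multiscale.bottomUp(xb, alpha = 0.1)
summary(mbu)                 # change-points with detection bandwidths and p values
plot(mbu, display = "data")  # input series with estimated change-points
```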

Multiscale MOSUM procedure with localized pruning
The function multiscale.localPrune provides an implementation of Algorithm 2 from Section 2.6:

multiscale.localPrune(x, G = bandwidths.default(length(x)),
  max.unbalance = 4, threshold = c("critical.value", "custom"), ...)

It accepts the following arguments, in addition to those accepted by mosum and multiscale.bottomUp:
• G: A set G of bandwidths, represented as a vector of integers smaller than n/2, or numeric values in (0, 0.5) describing the bandwidths relative to n. Asymmetric bandwidths obtained from the Cartesian product of the set G with itself are used for the multiscale MOSUM procedure. By default, it is given by bandwidths.default, see Section 3.5.
• max.unbalance: A numeric value greater than or equal to one which imposes an upper bound on max(G_l, G_r)/min(G_l, G_r) for a bandwidth pair G = (G_l, G_r) to be used for candidate generation. The default is max.unbalance = 4.
• rule: A string selecting the sorting criterion c to be used in Algorithm 2. Possible values are "pval" for the inverse of the p value corresponding to the change-point estimators (c_p) and "jump" for the corresponding jump size (c_J); see (17).
• penalty: A string indicating which penalty p(n) is to be used in the SC, see (18). Possible values are "log" for p(n) = log(n) and "polynomial" for p(n) = n.
• pen.exp: A numeric value for the exponent in the penalty term of the SC, see (18).
When called, multiscale.localPrune returns an S3 object of class 'multiscale.cpts', as described in Section 3.3.
As a simple example, we continue with the normal time series x from Section 3.3, which is analysed using an asymmetric bandwidth grid of size 16, obtained as the Cartesian product of the set {30, 50, 80, 130} with itself:

R> mlp <- multiscale.localPrune(x, G = c(30, 50, 80, 130))
R> print(mlp$cpts)
[1] 50 100 300
R> print(mlp$pooled.cpts)
[1] 48 50 86 96 100 300

As the example shows, the initial candidate set P considered by Algorithm 2 tends to be larger than that considered by Algorithm 1 due to the use of asymmetric bandwidths. The localized pruning algorithm successfully removes spurious or duplicate estimators and returns K, which correctly estimates all the change-points.

Bandwidth generation
The default option for generating bandwidths for the multiscale MOSUM procedure is the function

bandwidths.default(n, d.min = 10, G.min = 10, G.max = min(n/2, n^(2/3)))

which takes as its input the sample size (n), the minimal mutual distance between change-points that can be expected (d.min), and the minimal and maximal allowed bandwidths (G.min and G.max). The function returns an integer vector (G_1, ..., G_m), where G_0 = G_1 = max{G.min, 2 d.min/3} and G_{j+1} = G_{j-1} + G_j for j = 1, ..., m − 1, with m chosen as the largest integer such that G_m ≤ G.max while G_{m+1} > G.max. For multiscale.bottomUp, we set G.min so that the default bandwidths do not generate a warning message about the smallest bandwidth being too small relative to n. For multiscale.localPrune, we use the default choices given above.
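The Fibonacci-type recursion above can be sketched in a few lines of R. This is our own illustration under the stated defaults; the package's bandwidths.default is the authoritative implementation and handles further edge cases.

```r
## Minimal sketch of the default bandwidth recursion (illustration only)
bandwidth_grid <- function(n, d.min = 10, G.min = 10,
                           G.max = min(n / 2, n^(2 / 3))) {
  G0 <- max(G.min, ceiling(2 * d.min / 3))  # G_0 = G_1
  Gs <- c(G0, G0)
  while (sum(tail(Gs, 2)) <= G.max) {       # G_{j+1} = G_{j-1} + G_j
    Gs <- c(Gs, sum(tail(Gs, 2)))
  }
  unique(Gs)
}
bandwidth_grid(1000)   # 10 20 30 50 80, since G.max = 1000^(2/3) = 100
```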
Confidence intervals

Bootstrap confidence intervals for the estimated change-point locations (see Section 2.7) are generated by the confint methods for 'mosum.cpts' and 'multiscale.cpts' objects, which accept the following arguments:
• parm: A string indicating which parameters are to be given confidence intervals; only parm = "cpts" is supported. The argument is required for compatibility with the generic function confint.
• level: A single numeric value in (0, 1) such that 100(1 − level)% confidence intervals are generated; corresponds to α used in K^pw_j(α) from (20) and K^unif_j(α) from (21).
• N_reps: A positive integer representing the number of bootstrap replicates to be generated for confidence interval construction.
When called, the function returns an S3 object of class 'cpts.ci' containing the following entry besides the call arguments:
• CI: A data frame of five columns, containing the estimated change-points (in the column cpts), the end points of the pointwise confidence intervals K^pw_j(α) (in the columns pw.left and pw.right) and those of the uniform confidence intervals K^unif_j(α) (in the columns unif.left and unif.right).
As an example, we revisit the analysis from Section 3.4:
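A hedged sketch of such a call, assuming mlp is the 'multiscale.cpts' object obtained in Section 3.4 (the values of level and N_reps are illustrative choices of ours):

```r
## Sketch: bootstrap confidence intervals for the change-points in mlp
mlp.ci <- confint(mlp, parm = "cpts", level = 0.05, N_reps = 1000)
print(mlp.ci$CI)   # columns cpts, pw.left, pw.right, unif.left, unif.right
```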

Visualization
The plot of the scaled absolute MOSUM detector is particularly suitable for visually inspecting the data for possible change-points. There is a plot method available for S3 objects of class 'mosum.cpts', which plots either the input time series along with piecewise constant signal constructed using the estimated change-points, or the scaled absolute MOSUM detector along with the critical value and the estimated locations of the change-points; see Figure 8.
S3 objects of class 'multiscale.cpts', returned by the multiscale algorithms discussed in Sections 3.3-3.4, are also supported by a plot method. We can visualize the set of estimated change-point locations (on the x-axis) against the input time series or display the significance of the change-point estimators, as well as plot the confidence intervals of the change-points or the detection intervals of the estimators. For a fair representation of the significance of change-point estimators detected at different scales, we utilize the p values associated with their detection rather than plotting the MOSUM detectors from different scales simultaneously. The method accepts the following arguments:
• display: A string indicating whether to plot the input time series (display = "data") along with the estimated change-point locations and piecewise constant signal, or to display the significance of change-point estimators (display = "significance"), represented by one minus the p values associated with their detection.
• shaded: A string indicating whether confidence intervals (shaded = "CI") or detection intervals (shaded = "bandwidth") shall be plotted as shaded rectangles around the estimated change-point locations. No shaded rectangle is produced when shaded = "none".
• level, N_reps: Arguments used for generating the bootstrap confidence intervals, to be passed to confint.multiscale.cpts. Only in use if shaded = "CI".
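The arguments above might be combined as in the following hedged sketch, assuming mlp is a 'multiscale.cpts' object as obtained in Section 3.4:

```r
## Sketch: two views of a 'multiscale.cpts' object (mlp as in Section 3.4)
plot(mlp, display = "data", shaded = "bandwidth")       # detection intervals
plot(mlp, display = "significance", shaded = "none")    # 1 - p value per estimator
```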
The function persp3D.multiscaleMosum accepts the following arguments:
• x: Input data X_1, ..., X_n, i.e., a univariate time series represented as a numeric vector (of length n) or an object of class 'ts'.
• mosum.args: Further arguments to be passed to the function mosum (see Section 3.2), which may be empty. Note that the bandwidths are chosen by default and should not be given as an argument in mosum.args.
• threshold, alpha, threshold.custom: Arguments specifying how to select the thresholds for standardising the scaled MOSUM detectors |T_G(k)|/σ_k computed with different bandwidths G; see the description of these arguments in Section 3.3.
• pal.name: A string containing the name of the ColorBrewer palette from RColorBrewer (Neuwirth 2014) to be used. Sequential palettes are recommended. See brewer.pal.info of RColorBrewer for further details.
The remaining arguments are graphical parameters passed to the function persp3D of the plot3D package (Soetaert 2019); further information can be found in its R documentation. The range of the color palette is chosen such that the three lightest hues (when a sequential palette is used) indicate insignificant MOSUM values in change-point analysis.
As an example, we consider a visualization of the MOSUM detectors computed on a realization of the test signal mix (see Figure 7 for a plot of the time series):

R> x <- testData(model = "mix", seed = 1234)$x
R> persp3D.multiscaleMosum(x, mosum.args = list(boundary.extension = FALSE))

The result is shown in Figure 10. Note that a z-axis value above one in Figure 10 implies that the respective statistic exceeds the critical value, as indicated by the emergence of orange hues. It becomes apparent that the large changes at the beginning are given prominence at smaller bandwidths, whereas the peaks corresponding to smaller changes become significant at larger bandwidths.

Usage examples
In this section, we provide more detailed examples demonstrating the usage of the mosum package for detecting multiple change-points in the mean. We start with the single-bandwidth MOSUM procedure applied to a heteroscedastic time series in Section 4.1, followed by applications of the multiscale algorithms from Section 2.6 to test signals with frequent change-points and a real-life time series from economics in Sections 4.2 and 4.3.

MOSUM procedure with a single bandwidth
In this first example, we apply the MOSUM procedure with a single, asymmetric bandwidth G = (40, 60) to a time series of length n = 800 with independent normal innovations and two changes in the mean at k_1 = 200 and k_2 = 600 of respective heights d_1 = 2 and d_2 = −1.5, co-occurring with changes in the variance. The variances over the stationary segments are set to 1, 0.8 and 0.5, respectively. We adopt the local variance estimator (11), which is expected to have more power when there are changes in both mean and variance. By default, the η-criterion (16) with η = 0.4 is used for change-point estimation, in conjunction with the default significance level α = 0.1.

R> td <- testData(lengths = c(200, 400, 200), means = c(0, 2, 0.5),
+    sds = sqrt(c(1, 0.8, 0.5)), seed = 111)
R> x1 <- td$x
R> f1 <- td$mu
R> m <- mosum(x1, G = 40, G.right = 60, var.est.method = "mosum.min")
R> print(m$cpts)

Multiscale MOSUM procedure with bottom-up merging
We apply Algorithm 1 to the piecewise stationary time series mix from Section 3.1. The test signal is particularly interesting due to the variety of mean changes it contains, from large jumps over short intervals to small jumps over longer stretches of stationarity (see Figure 7). We consider a dense bandwidth grid G = {10, 11, ..., 40} with a slightly increased threshold C_{n,G}(α) · log(n/G)^{0.1}, so that the use of small bandwidths does not yield spurious estimators; see the discussion at the end of Section 2.6.1.
R> x2 <- testData(model = "mix", seed = 1234)$x
R> threshold.custom <- function(G, n, alpha) {
+    mosum.criticalValue(n, G, G, alpha) * log(n/G)^0.1
+ }
R> mbu <- multiscale.bottomUp(x2, G = 10:40, threshold = "custom",
+    threshold.function = threshold.custom)

Although the smallest bandwidth in G is relatively small compared to the sample size n = 560, the use of a custom threshold suppresses the warning message. Intuitively, the large changes with small mutual distances (at the beginning of the signal) should be detected by the small bandwidths, whereas the small changes with large mutual distances (towards the end of the signal) should be detected by the large bandwidths. This is indeed verified by inspecting the column G.left of the output of summary(mbu), which contains the detection bandwidths; see also the surface plot in Figure 10 in Section 3.7.

Multiscale MOSUM procedure with localized pruning
We apply the multiscale MOSUM procedure summarized in Algorithm 2 to the piecewise stationary time series blocks (see Section 3.1). We use the default choice of bandwidths returned by bandwidths.default, as described in Section 3.3. Candidate change-point estimators are generated with the generous choice of α = 0.4:

R> x3 <- testData(model = "blocks", seed = 123)$x
R> mlp <- multiscale.localPrune(x3, alpha = 0.4, pen.exp = 1.01)
R> print(mlp$cpts)
[1] 200 266 307 471 511 818 902 1331 1555 1597 1654

To see how many candidates have been discarded in the merging step of the algorithm, the set P of change-point candidates prior to merging may also be of interest:

R> print(mlp$pooled.cpts)

In addition, we analyse the quarterly US ex-post real interest rate from 1961:Q1 to 1986:Q3 (n = 103) from the Citibase data bank, which has been extensively studied in Garcia and Perron (1996); the data is available from the R package strucchange (Zeileis et al. 2002). We set α = 0.1 and use the η-criterion with η = 0.4 along with the default bandwidths for the multiscale MOSUM procedure, and use p(n) = log(n) with exponent 1.01 for the penalty in the localized pruning. The plot of the input data series in Figure 12 suggests that there may be changes in the variance. To prevent detecting spurious estimators due to possible heteroscedasticity in the data, we use the local variance estimator (12). We note that the outcome did not vary over a range of alternative choices for α, η and the penalty exponent. For comparison, we analyse the same data for change-points using the WBS algorithm with information criterion-based model selection (Fryzlewicz 2014), the TGUH algorithm (Fryzlewicz 2018), PELT (Killick et al. 2012a), S3IB (Rigaill 2015), cumSeg (Muggeo and Adelfio 2010; the method transforms the data and iteratively fits a linear model for change-point analysis), FDRSeg (Li et al. 2016) and H-SMUCE (Pein et al. 2017); where relevant, we set the significance level at 0.1 and follow the default setting for any other arguments. Detailed code for applying the above methods is provided in the online supplementary material.
The top panel of Figure 12 shows the change-points estimated by Algorithm 2 and its competitors. Among the different methods, cumSeg and H-SMUCE are designed to be robust to heteroscedasticity in the data, and the use of the local variance estimator is found to be effective in this case for our MOSUM-based method. On the other hand, the time-varying variance may have led methods such as TGUH, S3IB and FDRSeg to detect possibly spurious change-points. Most methods return the two change-points identified by Algorithm 2 and, in fact, WBS, PELT and H-SMUCE return identical estimators. Also, the two change-point estimators are close to the two breaks reported in Garcia and Perron (1996), where the authors related the breaks to findings in the economics literature.
Unlike multiscale.localPrune, WBS, S3IB and cumSeg require the maximal number of change-points as an input, and the output may vary with respect to the choice of this parameter; e.g., when this quantity is set as large as 100, the WBS returns 100 estimators.
We can generate bootstrap confidence intervals around the change-points and plot them:

R> plot(mlp, display = "data", shaded = "CI", CI = "unif", level = 0.1)

The 90% uniform confidence intervals are plotted in the bottom panel of Figure 12. The stepR package supports the calculation of confidence intervals for change-point locations based on exponential bounds on the probability of under-estimating the change-point set; we plot these confidence intervals along the x-axis of the same figure. We note that the latter confidence intervals are considerably wider than the bootstrap confidence intervals generated as above for this data.

Conclusion and outlook
In this paper, we presented the R package mosum (Meier et al. 2021). Implementations of both the single-bandwidth and multiscale MOSUM procedures for multiple change-point estimation have been described, as well as tools for visualization and data generation.
Due to its flexibility, the MOSUM framework discussed in this paper can be extended in several ways. First, different MOSUM procedures for the mean change problem can be considered, such as those based on more robust location estimators utilising the median. Secondly, multivariate data analysis could be incorporated. Also, different types of changes can be handled, e.g., parameter changes in (auto-)regressive time series (linear, non-linear or even count time series).
All of the above extensions can be dealt with theoretically in a unified framework based on estimating functions (see e.g., Kirch and Kamgaing 2015, for a discussion in a sequential framework). However, the details of the implementation, including important issues such as covariance estimation, require case-by-case consideration. It is planned to incorporate important extensions into future revisions of the mosum package.

A. Localized exhaustive search algorithm
In this section, we discuss an efficient implementation of the exhaustive search performed in Line 11 of Algorithm 2, which seeks a subset A ⊂ D according to criterion II from Section 2.6. For the sake of completeness and readability, we present a high-level description thereof in Algorithm 3. In what follows, we describe some details of our implementation of Algorithm 3 within the mosum package. It is based on C++ code, which is integrated into R via the Rcpp package (Eddelbuettel and François 2011).
In every iteration of Step 1.2 of Algorithm 3, several SC values have to be computed for comparison. To make these steps less computationally intensive, we use pre-computed sums.

Table 1: Connection between the binary representation, set representation and integer representation of all subsets of a candidate set D = {k_1, k_2, k_3, k_4} of size |D| = 4.
To elaborate, let P = {k_1 < ... < k_m} denote the set of candidate change-points returned from Step 1 of Algorithm 2. Algorithm 3 is sped up by pre-computing and storing partial sums of the observations between neighbouring candidates, from which the required SC values can be assembled cheaply. The key idea is that every D_i ⊂ D can be represented as a vector ψ_i = (ψ_{i,1}, ..., ψ_{i,|D|}) of binary variables ψ_{i,j} indicating whether k_j belongs to D_i or not, i.e., ψ_{i,j} ∈ {0, 1} and ψ_{i,j} = 1 if and only if k_j ∈ D_i, for 1 ≤ i ≤ M and 1 ≤ j ≤ |D|. In addition, every binary vector ψ_i has a canonical representation as a nonnegative integer κ_i = ∑_{j=1}^{|D|} ψ_{i,j} 2^{j−1} ∈ {0, ..., M − 1}. This one-to-one correspondence D_i ↔ ψ_i ↔ κ_i enables an efficient representation of the subsets. Table 1 provides an example for the case |D| = 4.
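The encoding can be sketched in a few lines of R; this is an illustration of the correspondence only, not code from the package.

```r
## Map a binary indicator vector psi to its integer representation kappa,
## i.e., kappa = sum_j psi_j * 2^(j - 1), and back again
psi_to_kappa <- function(psi) sum(psi * 2^(seq_along(psi) - 1))
kappa_to_psi <- function(kappa, d) (kappa %/% 2^(0:(d - 1))) %% 2

psi_to_kappa(c(1, 0, 1, 0))   # the subset {k_1, k_3} of D, encoded as 5
kappa_to_psi(5, 4)            # recovers the indicator vector 1 0 1 0
```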
A crucial advantage of this approach is that set operations (acting on D_i) can be translated into binary operations (acting on the bits ψ_i of κ_i), the latter being highly performant when implemented in C++. As an example, the outer loops in Step 1.1 and Step 1.2 of Algorithm 3 translate into a loop over all κ_i such that the corresponding ψ_i contains exactly l non-zero entries. Such a loop can be realized in C++ as follows (see Section 11.1.1 in Stroustrup 2000, for a comprehensive overview of bitwise/logical operators in C++):

unsigned int kappa = (1 << l) - 1;
while (kappa < M - 1) {
  // ... process the subset encoded by kappa ...
  const unsigned int tmp = kappa | (kappa - 1);
  kappa = tmp | ((((tmp & -tmp) / (kappa & -kappa)) >> 1) - 1);
}

As another example, the inner loops (given D_i) over all subsets D_j ⊂ D_i with |D_j| = |D_i| − 1 in Step 1.1 and Step 1.2 of Algorithm 3 translate into a loop over all κ_j with ψ_j having exactly |D_i| − 1 non-zero entries and ψ_j ≤ ψ_i holding component-wise among the binary vectors; such a loop can be realized in C++ by iterating over the set bits of κ_i and clearing one bit at a time. We also benefit from the binary representation when searching within D_i in Step 2, as this amounts to locating the leftmost and the rightmost index j at which ψ_{i,j} = 1.
Due to the exponential time and memory consumption, we restrict the candidate set size to |D| ≤ 24 in our implementation of Algorithm 3. If a candidate set D exceeding this size is proposed in Step 2.2 of Algorithm 2, we regard the corresponding tuple (k_•, G_•) (from Line 7 in Algorithm 2) as infeasible. If an infeasible tuple occurs, we swap k_• with the next feasible candidate from D or, if none in D is feasible, with the next feasible candidate from P, and leave k_• for future consideration. In many cases, changing the order of processing is already sufficient to ensure that the number of candidates conflicting with k_• eventually shrinks to a feasible size, because some of the currently conflicting estimators will have been pruned during the processing of another overlapping candidate set. If, however, all remaining tuples in P are infeasible, we resort to a manual thinning step, successively removing elements of D until it reaches a feasible size. More precisely, the candidates in D are processed one by one in decreasing order of mutual distance and removed until |D| = 24. If this thinning step is necessary in a call of multiscale.localPrune, the user is informed by a warning message as below:

Warning: 25 conflicting candidates, thinning manually

B. Execution time of multiscale MOSUM methods
We briefly study the execution time of the multiscale MOSUM methods when applied to "dense" test signals of length n ≥ 2 × 10^4, which are obtained by repeating the five test signals implemented in the function testData until the length of the series exceeds 2 × 10^4. Table 2 summarizes the execution times of Algorithms 1 and 2 implemented in mosum on the dense test signals, averaged over 100 replications, as well as those of WBS2 (proposed to improve upon the adaptivity of the WBS for signals with possibly frequent changes; see Fryzlewicz (2020) for details), TGUH (implemented in breakfast) and PELT (changepoint), applied with their default parameters. The code for generating the results in Table 2 is provided in the online supplementary material and was run on a 4 GHz Intel Core i7 with 16 GB of RAM running macOS Catalina.
As expected, Algorithm 1 executes very quickly, requiring only a fraction of a second to handle long time series with frequent jumps. Algorithm 2 with c_J as the sorting function often takes less than one second to analyse dense test signals generated under different scenarios. An exception is the test signal stairs10, where the candidates detected at large bandwidths tend to yield large jump sizes due to the nature of the signal, which results in a large search space in Step 2.3 of Algorithm 2 and an increased execution time. Similarly, the p values associated with the candidates detected by large bandwidths for blocks and mix are often exactly zero, which creates many ties for the sorting function c_p and leads to a large search space. Nonetheless, Algorithm 2 with c_p as the sorting function requires comparable or even smaller execution time than WBS2 for the test signals fms, teeth10 and stairs10.