Dimension and Complexity Measures
|
|
Nonlinear
dynamical systems observed from time series are identified in terms of
their invariant measures as estimated from the time series. Some
invariant measures are associated with the dimension and complexity of
the underlying to time series system, or more specifically, to the
attractor, the set of points or trajectories generated by the dynamical
system. The correlation dimension is an invariant measure of the attractors fractal dimension and it is the most popular among other similar fractal dimension measures, such as the information and box-counting dimension. The correlation dimension measures the self-similarity property of the so-called strange attractors, i.e. attractors that possess this property and have a non-integer fractal dimension. The "regular" attractors, which are objects or set of points that can be sufficiently described in terms of the topological and Euclidean dimension, have an integer correlation dimension. The correlation dimension is based on the density of points of the attractor and this makes possible to estimate it from the sample density of the points reconstructed from the time series, typically from delay embedding for a delay t and an embedding dimension m. The estimation starts with the computation of the so-called correlation sum C(r) (abusively, it is also called correlation integral), which is calculated for a range of radii r. For each r, C(r) is the average fraction of points within this radius r (r is referred to as radius because the Euclidean norm is used to compute the inter-point distances of all points from each reference point). Given that the underlying dynamical system is deterministic, the correlation sum scales with the power of r as C(r) ~ rí, and the exponent v is the correlation dimension. Theoretically, the scaling should hold for very small r (r →0), but in practice the scaling is bounded at small r by the lack of close points (dependent on the length of the time series) and the masking of noise (even for noise-free data a lower bound for r is set by the round-off error). Computationally, for the estimation of í, first the local slope of the graph of log(C(r)) vs log(r) is computed by the smooth derivative of log(C(r)) with respect to log(r). Provided that the scaling C(r) ~ rí exists at least for a sufficiently large range of r, a horizontal plateau of the local slope vs log(r) should be formed for this range of r. The least size of the range is usually given in terms of the ratio of the upper edge r2 to the lower edge r1 of the scaling r-interval, and a commonly used ratio is r2/r1 = 4. Here, the estimate of í is the mean local slope in the interval [r1,r2] with the least variance of the local slope. In real world applications, reliable estimation of í can be hard (meaning that the variance of the local slope is large) which indicates that there is no clear scaling. In that cases, other simpler measures derived from the correlation sum may be more useful, such as the value of the correlation sum C(r) for a given r, which is simply the cumulative density function of inter-point distances at a given inter-point distance, or the value of r that corresponds to a given C(r), which is the inverse of this cumulative density. The embedding dimension m is an important parameter for the analysis of time series from the dynamical system approach, and determines the dimension of the Euclidean pseudo-space in which supposedly the attractor is reconstructed from the time series. An m that is small but sufficiently large to unfold the attractor is sought and a popular method to estimate it is the false nearest neighbors. This method increases the embedding dimension by one at each step and counts the percentage of points for which its nearest neighbor falls apart with the addition of a new component (from embedding dimension m to m+1), and therefore these points are called false nearest neighbors. The estimated minimum embedding dimension is the one that first gives an insignificant percentage of false nearest neighbors. Though the minimum m is not an invariant measure, as it depends for example on the delay parameter t, it is used in some applications as a measure to discriminate different dynamical regimes and it is therefore included here as well. Regarding the complexity measures, the approximate entropy and the algorithmic complexity are implemented. The approximate entropy is a computationally simple and efficient estimate of entropy that measures the so-called "pattern similarity" in the time series. It is expressed as the logarithm of the ratio of two correlation sums computed for embedding dimensions m and m+1. The algorithmic complexity is also a computationally simple and efficient measure of complexity that requires that the time series is discretized to a number of symbols assigned to the allocated bins. Formally, algorithmic complexity quantifies how complex the symbolic series is in terms of the length of the shortest computer program (or the set of algorithms) that need to completely describe the symbolic series. Here, we follow the approach of Lempel and Ziv suggesting a measure of the regularity of the symbolic series, or its resemblance to a random symbolic series. Computationally, the complexity of the time series is evaluated by the plurality of distinct words of varying length in the series of symbols. The equidistant and equiprobable binning are implemented for converting the time series to a symbolic time series. There are a number of other measures of complexity, most notably the largest Lyapunov exponent, that are not implemented yet. The correlation dimension estimate is one of the most popular measures of nonlinear analysis of time series, since it was first introduced in Grassberger P. and Procaccia, I. (1983), Measuring the Strangeness of Strange Attractors, Physica D, Vol 9, pp 189-208. The inter-point distance density at fixed radius and the radius at fixed inter-point distance density have been used in the analysis of EEG, e.g. see Andrzejak R.G., Mormann F., Widman G., Kreuz T., Elger C.E., and Lehnertz K. (2006), Improved Spatial Characterization of the Epileptic Brain by Focusing on Nonlinearity, Epilepsy Vol 69, pp 30-44. McSharry P.E., Smith L.A., Tarassenko L. (2003), Comparison of Predictability of Epileptic Seizures by a Linear and a Nonlinear Methods, IEEE Transactions on Biomedical Engineering, Vol 50, No 5, pp 628-633. The false nearest neighbor method was developed in Kennel M.B., Brown, R., Abarbanel H.D.I. (1992), Determining Embedding Dimension for Phase-Space Reconstruction Using a Geometrical Construction, Physical Review A, Vol 45, pp 3403-3411. The approximate entropy estimate has been used in a number of studies on physiological data and it was introduced in Pincus S.M. (1991), Approximate Entropy as a Measure of System Complexity, Proc Natl Acad Sci USA, Vol 88, pp 2297-2301. Among different measures of algorithmic complexity (also referred to as Kolmogorov complexity), the measure suggested by Lempel and Ziv is implemented here, as given in the paper Lempel A. and Ziv J. (1976), On the Complexity of Finite Sequences, IEEE Transactions on Information Theory, Vol 22, pp 75-81. The algorithmic complexity measure though defined for symbolic sequences has been used in time series analysis, e.g. see Radhakrishnan N., James W.D. and Loizou P.C. (2000), An Alternate Partitioning Technique To Quantify The Regularity Of Complex Time Series, International Journal of Bifurcation and Chaos, Vol 10, No 7, pp 1773-1779. Zhao Y., Small M, Coward D., Howell E., Zhao C.N., Ju L. and Blair D.G. (2006) Identifying Deterministic Signals in Simulated Gravitational Wave Data: Algorithmic Complexity and Surrogate Data Method, Classical and Quantum Gravity, Vol 23, pp 1801-1814. |
|
False Nearest Neighbors (FalseNearN) |
|
False
Nearest Neighbors is
in the literature referred to as the method for estimating the minimum
embedding dimension, but here it is used as a measure given by the percentage of false nearest
neighbors computed for the given range of the delay t and the
embedding dimension m. The
following parameters can be specified: - delay (t): any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1'. - embedding dimension (m) : any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1' meaning that the percentage of false nearest neighbors is computed controlling for their distance in embedding dimension 1 and then 2. - Theiler window (g) : any positive integer that indicates the minimum time separation nearest neighbors are allowed to have, in order to account for temporally correlated points in the search of nearest neighbors. The default is '0' meaning that no constrained is applied to the search of nearest neighbors. This correction was suggested in Theiler, J. (1987), Efficient Algorithm for Estimating the Correlation Dimension from a Set of Discrete Points, Physical Review A, Vol 36, No 9, pp 4456-4462. - escape factor (e) : any positive integer larger than 2 (default is 10). This parameter actually sets the threshold for classifying a nearest neighbor pair at an embedding dimension m as false nearest neighbor. If the ratio of the distance of this pair of points at embedding dimension m+1 over the distance of the same pair of points at embedding dimension m is larger than the threshold the points of the pair are regarded as false nearest neighbors. The search of nearest neighboring points is facilitated by organizing the reconstructed points at each embedding dimension in a k-d-tree, as implemented by Guy Shechter. The routines are written in C and they are converted to matlab functions. They should run in Windows and Linux operational systems without any problems for matlab versions 6.5 or larger. Two parameters are fixed and specified in the beginning of the matlab function FalseNearestNeighbor.m. The first gives the maximum distance to search for nearest neighbor and the second sets a limit to the proportion of valid target points, i.e. target points for which a nearest neighbor is found (within the given distance). The given values to these parameters are meant to give reliable results, i.e. the nearest neighbor is not far away and a sufficient statistic of points (as a percentage of all points) is maintained from which the percentage of false nearest neighbors is calculated. Note that this choice may not give output for large embedding dimensions (relative to the time series length). Example: If the user selects this measure by activating the check box in the beginning of the measure line and sets for delay (t) '1 5', for embedding dimension '2:2:8', and for the Theiler window (g) and escape factor (e) uses the default values, then False Nearest Neighbors is computed for the combination of the 2 delays and 4 embedding dimensions and in the measure list the following measure names will appear FalseNearNt1m2g0e10 |
|
Correlation Dimension (CorrelaDim) |
|
The correlation
dimension í is defined by the scaling
of the correlation sum C(r) with the distance (radius)
r, C(r) ~ rí.
To estimate this scaling, first the local slope of the graph of log(C(r))
vs log(r) is computed by the smooth derivative of log(C(r))
with respect to log(r). The smooth derivative at each log(r)
is the slope of the line fitted to the set of 5 points, the one at log(r)
and the two next to it at each side. The estimate of í
is the mean local slope in the interval [r1,r2]
with the least variance of the local slope, where the length of the interval
of radii is given by the ratio r2/r1. The radii
r range from 0 to 1, as the time series is "linearly" standardized first to
minimum 0 and maximum 1. A number of measures of correlation dimension can be computed for any combination of a given range of the delay t, the embedding dimension m, and the length of the scaling interval s. The following parameters can be specified: - delay (t): any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1'. - embedding dimension (m) : any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1'. - Theiler window (g) : any positive integer that indicates the minimum time separation nearest neighbors are allowed to have, in order to account for temporally correlated points in the search of nearest neighbors. The default is '0' meaning that no constrained is applied to the search of neighbors. This correction was suggested in Theiler, J. (1987), Efficient Algorithm for Estimating the Correlation Dimension from a Set of Discrete Points, Physical Review A, Vol 36, No 9, pp 4456-4462. - upper/lower ratio of scaling window (s) : any valid matlab format denoting an array of positive integers larger or equal to 2 or a single positive integer larger or equal to 2. The default is '4', meaning that the length of the scaling interval is determined by the factor of 4, r2/r1=4, i.e. the ratio of the upper radius r2 to the lower radius r1 is 4. - number of radii, resolution (o) : any positive integer larger or equal to 10. The default is '100', meaning that the correlation sum will be computed for 100 radii equally spaced at a logarithmic scale from the smallest to the largest meaningful radius. Note that a large number of radii increases the resolution in terms of radii for the identification of the scaling, but increases also the computational time. Example: If the user selects this measure by activating the check box in the beginning of the measure line and sets for delay (t) '1 5', for embedding dimension '3:5', for the upper/lower ratio of scaling window (s) '2 4' and for the Theiler window (g) and number of radii (o) uses the default values, then Correlation Dimension is computed for the combination of the 2 delays, 3 embedding dimensions and 2 ratios of scaling window, and in the measure list the following measure names will appear
CorrelaDimt1m3g0s2o100 |
|
Correlation Sum for given radius (CorrelSumr) |
|
Correlation
Sum for given radius is the correlation sum computed for the given range
of the delay t, the embedding dimension m, and the radii
r. The following parameters can be specified: - radius (r): any valid matlab format denoting an array of numbers between 0 and 1 or a single number between 0 and 1. The radii r range from 0 to 1, as the time series is "linearly" standardized to minimum 0 and maximum 1.The default is '0.1'. - delay (t): any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1'. - embedding dimension (m) : any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1'. - Theiler window (g) : any positive integer that indicates the minimum time separation nearest neighbors are allowed to have, in order to account for temporally correlated points in the search of nearest neighbors. The default is '0' meaning that no constrained is applied to the search of neighbors. This correction was suggested in Theiler, J. (1987), Efficient Algorithm for Estimating the Correlation Dimension from a Set of Discrete Points, Physical Review A, Vol 36, No 9, pp 4456-4462. Example: If the user selects this measure by activating the check box in the beginning of the measure line and sets for radius (r) '0.1 0.2', delay (t) '1 5', and embedding dimension '3:5', and for the Theiler window (g) the default value, then Correlation Sum for given radius is computed for the combination of the 2 radii, 2 delays and 3 embedding dimensions, and in the measure list the following measure names will appear (note that the value for r is multiplied with 100 and then rounded in order to have only integers in the name of the measure)
CorrelSumrr10t1m3g0 |
|
Radius for given Correlation Sum (RadCorrSum) |
|
Radius for
given Correlation Sum is the inverse of the Correlation Sum, and finds
the radius that is associated to the given correlation sum. The measure
is computed for the given range of the correlation sum c, the delay t
and the embedding dimension m. The
inverse of the correlation sum is found iteratively, starting at a
radius value expected for the given correlation sum as if the time
series is uniform white noise. This is found by the smallest positive
root of x2-4x+4c1/m, where c
is the correlation sum and m the embedding dimension. This
starting value is adaptive to the input parameters and reduces the
number of iterations that can be computationally intensive. The
iterative scheme is rather ad hoc and the computation time can be
reduced by implementing a more efficient search algorithm. The following parameters can be specified: - correlation sum (c): any valid matlab format denoting an array of numbers between 0 and 1 or a single number between 0 and 1. The correlation sum c range from 0 to 1, as it represents the cumulative density function. The default is '0.1'. - delay (t): any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1'. - embedding dimension (m) : any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1'. - Theiler window (g) : any positive integer that indicates the minimum time separation nearest neighbors are allowed to have, in order to account for temporally correlated points in the search of nearest neighbors. The default is '0' meaning that no constrained is applied to the search of neighbors. This correction was suggested in Theiler, J. (1987), Efficient Algorithm for Estimating the Correlation Dimension from a Set of Discrete Points, Physical Review A, Vol 36, No 9, pp 4456-4462. Example: If the user selects this measure by activating the check box in the beginning of the measure line and sets for correlation sum (c) '0.1 0.5', delay (t) '1 5', and embedding dimension '4:5', and for the Theiler window (g) the default value, then Radius for given Correlation Sum is computed for the combination of the 2 correlation sums values, 2 delays and 2 embedding dimensions, and in the measure list the following measure names will appear (note that the value for c is multiplied with 100 and then rounded in order to have only integers in the name of the measure)
RadCorrSumc10t1m4g0 |
|
Aprroximate Entropy (ApproxEntr) |
|
Approximate
Entropy measure is the logarithm of the ratio of two correlation sums
for embedding dimensions m and
m+1. The measure is computed for the given
range of the delay t, the embedding dimension m, and the
radii r. The following parameters can be specified: - radius (r): any valid matlab format denoting an array of numbers between 0 and 1 or a single number between 0 and 1. The radii r range from 0 to 1, as the time series is "linearly" standardized to minimum 0 and maximum 1.The default is '0.1'. - delay (t): any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1'. - embedding dimension (m) : any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1' meaning that the correlation sum is computed for embedding dimensions 1 and then 2 in order to compute the approximate entropy. - Theiler window (g) : any positive integer that indicates the minimum time separation nearest neighbors are allowed to have, in order to account for temporally correlated points in the search of nearest neighbors. The default is '0' meaning that no constrained is applied to the search of neighbors. This correction was suggested in Theiler, J. (1987), Efficient Algorithm for Estimating the Correlation Dimension from a Set of Discrete Points, Physical Review A, Vol 36, No 9, pp 4456-4462. Example: If the user selects this measure by activating the check box in the beginning of the measure line and sets for radius (r) '0.1 0.2', delay (t) '1 5', and embedding dimension '3:5', and for the Theiler window (g) the default value, then Approximate Entropy is computed for the combination of the 2 radii, 2 delays and 3 embedding dimensions, and in the measure list the following measure names will appear (note that the value for r is multiplied with 100 and then rounded in order to have only integers in the name of the measure)
ApproxEntrr10t1m3g0 |
|
Algorithmic Complexity, equidistant bins (AlgComEqDi) |
|
Algorithmic Complexity, equidistant bins is the measure of algorithmic
complexity discretizing the time series to bins (assigned to symbols) of
equal length. The following parameter can be specified: - number of bins (b): any valid matlab format denoting an array of positive integers or a single positive integer. The default is '0' and it is used to denote the number of bins set to the rounded integer of sqrt(N/5), where N is the length of the time series. Note that b = 1 is meaningless as at least two bins should be given to split the range of values. Example: If the user selects this measure by activating the check box in the beginning of the measure line and sets for number of bins (b) '0:4', then Algorithmic Complexity, equidistant bins, is computed for the rounded integer of sqrt(N/5) (for b=0), and for b=2,3 and 4. No computations will be done for b=1 and the output value is NaN (not a number). In the measure list the following measure names will appear AlgCompEqDib0 |
|
Algorithmic Complexity, equiprobable bins (AlgComEqPr) |
|
Algorithmic
Complexity, equiprobable bins is the measure of algorithmic complexity
discretizing the time series to bins (assigned to symbols) of equal
length. The following parameter can be specified: - number of bins (b): any valid matlab format denoting an array of positive integers or a single positive integer. The default is '0' and it is used to denote the number of bins set to the rounded integer of sqrt(N/5), where N is the length of the time series. Note that b = 1 is meaningless as at least two bins should be given to split the range of values. Example: If the user selects this measure by activating the check box in the beginning of the measure line and sets for number of bins (b) '0:4', then Algorithmic Complexity, equiprobable bins, is computed for the rounded integer of sqrt(N/5) (for b=0), and for b=2,3 and 4. No computations will be done for b=1 and the output value is NaN (not a number). In the measure list the following measure names will appear AlgCompEqDib0 |
|
OK |
|
By pressing this button the window of "Dimension and Complexity Measures" will disappear and the user will be moved to the "Select / run measures" window. Any changes in the measures and parameter values will be stored. | |
Cancel |
|
Quit without doing anything and return to the "Select / run measures" window. Any changes in the measures and parameter values will be ignored. | |
Help |
|
This file will be shown. |