Dimension and Complexity Measures

 
Nonlinear dynamical systems observed from time series are identified in terms of their invariant measures as estimated from the time series. Some invariant measures are associated with the dimension and complexity of the underlying to time series system, or more specifically, to the attractor, the set of points or trajectories generated by the dynamical system.

The correlation dimension is an invariant measure of the attractors fractal dimension and it is the most popular among other similar fractal dimension measures, such as the information and box-counting dimension. The correlation dimension measures the self-similarity property of the so-called strange attractors, i.e. attractors that possess this property and have a non-integer fractal dimension. The "regular" attractors, which are objects or set of points that can be sufficiently described in terms of the topological and Euclidean dimension, have an integer correlation dimension. The correlation dimension is based on the density of points of the attractor and this makes possible to estimate it from the sample density of the points reconstructed from the time series, typically from delay embedding for a delay t and an embedding dimension m. The estimation starts with the computation of the so-called correlation sum C(r) (abusively, it is also called correlation integral), which is calculated for a range of radii r. For each r, C(r) is the average fraction of points within this radius r (r is referred to as radius because the Euclidean norm is used to compute the inter-point distances of all points from each reference point). Given that the underlying dynamical system is deterministic, the correlation sum scales with the power of r as C(r) ~ rí, and the exponent v is the correlation dimension. Theoretically, the scaling should hold for very small r (r →0), but in practice the scaling is bounded at small r by the lack of close points (dependent on the length of the time series) and the masking of noise (even for noise-free data a lower bound for r is set by the round-off error). Computationally, for the estimation of í, first  the local slope of the graph of log(C(r)) vs log(r) is computed by the smooth derivative of log(C(r)) with respect to log(r). Provided that the scaling C(r) ~ rí exists at least for a sufficiently large range of r, a horizontal plateau of the local slope vs log(r) should be formed for this range of r. The least size of the range is usually given in terms of the ratio of the upper edge r2 to the lower edge r1 of the scaling r-interval, and a commonly used ratio is r2/r1 = 4. Here, the estimate of í is the mean local slope in the interval [r1,r2] with the least variance of the local slope.   

In real world applications, reliable estimation of í can be hard (meaning that the variance of the local slope is large) which indicates that there is no clear scaling. In that cases, other simpler measures derived from the correlation sum may be more useful, such as the value of the correlation sum C(r) for a given r, which is simply the cumulative density function of inter-point distances at a given inter-point distance, or the value of r that corresponds to a given C(r), which is the inverse of this cumulative density.

The embedding dimension m is an important parameter for the analysis of time series from the dynamical system approach, and determines the dimension of the Euclidean pseudo-space in which supposedly the attractor is reconstructed from the time series. An m that is small but sufficiently large to unfold the attractor is sought and a popular method to estimate it is the false nearest neighbors. This method increases the embedding dimension by one at each step and counts the percentage of points for which its nearest neighbor falls apart with the addition of a new component (from embedding dimension m to m+1), and therefore these points are called false nearest neighbors. The estimated minimum embedding dimension is the one that first gives an insignificant percentage of false nearest neighbors. Though the minimum m is not an invariant measure, as it depends for example on the delay parameter t, it is used in some applications as a measure to discriminate different dynamical regimes and it is therefore included here as well.  

Regarding the complexity measures, the approximate entropy and the algorithmic complexity are implemented. The approximate entropy is a computationally simple and efficient estimate of entropy that measures the so-called "pattern similarity" in the time series. It is expressed as the logarithm of the ratio of two correlation sums computed for embedding dimensions m and m+1.

The algorithmic complexity is also a computationally simple and efficient measure of complexity that requires that the time series is discretized to a number of symbols assigned to the allocated bins. Formally, algorithmic complexity quantifies how complex the symbolic series is in terms of the length of the shortest computer program (or the set of algorithms) that need to completely describe the symbolic series. Here, we follow the approach of Lempel and Ziv suggesting a measure of the regularity of the symbolic series, or its resemblance to a random symbolic series. Computationally, the complexity of the time series is evaluated by the plurality of distinct words of varying length in the series of symbols. The equidistant and equiprobable binning are implemented for converting the time series to a symbolic time series.

There are a number of other measures of complexity, most notably the largest Lyapunov exponent, that are not implemented yet.

The correlation dimension estimate is one of the most popular measures of nonlinear analysis of time series, since it was first introduced in

Grassberger P. and Procaccia, I. (1983), Measuring the Strangeness of Strange Attractors, Physica D, Vol 9, pp 189-208.

The inter-point distance density at fixed radius and the radius at fixed inter-point distance density have been used in the analysis of EEG, e.g. see

Andrzejak R.G., Mormann F., Widman G., Kreuz T., Elger C.E., and Lehnertz K. (2006), Improved Spatial Characterization of the Epileptic Brain by Focusing on Nonlinearity, Epilepsy Vol 69, pp 30-44.

McSharry P.E., Smith L.A., Tarassenko L. (2003), Comparison of Predictability of Epileptic Seizures by a Linear and a Nonlinear Methods, IEEE Transactions on Biomedical Engineering, Vol 50, No 5, pp 628-633.

The false nearest neighbor method was developed in

Kennel M.B., Brown, R., Abarbanel H.D.I. (1992), Determining Embedding Dimension for Phase-Space Reconstruction Using a Geometrical Construction, Physical Review A, Vol 45, pp 3403-3411.   

The approximate entropy estimate has been used in a number of studies on physiological data and it was introduced in

Pincus S.M. (1991), Approximate Entropy as a Measure of System Complexity, Proc Natl Acad Sci USA, Vol 88, pp 2297-2301.

Among different measures of algorithmic complexity (also referred to as Kolmogorov complexity), the measure suggested by Lempel and Ziv is implemented here, as given in the paper

Lempel A. and Ziv J. (1976), On the Complexity of Finite Sequences, IEEE Transactions on Information Theory, Vol 22, pp 75-81.

The algorithmic complexity measure though defined for symbolic sequences has been used in time series analysis, e.g. see

Radhakrishnan N., James W.D. and Loizou P.C. (2000), An Alternate Partitioning Technique To Quantify The Regularity Of Complex Time Series, International Journal of Bifurcation and Chaos, Vol 10, No 7, pp 1773-1779.

Zhao Y., Small M, Coward D., Howell E., Zhao C.N., Ju L. and Blair D.G. (2006) Identifying Deterministic Signals in Simulated Gravitational Wave Data: Algorithmic Complexity and Surrogate Data Method, Classical and Quantum Gravity, Vol 23, pp 1801-1814.

 

False Nearest Neighbors (FalseNearN)

False Nearest Neighbors is in the literature referred to as the method for estimating the minimum embedding dimension, but here it is used as a measure given by the percentage of false nearest neighbors computed for the given range of the delay t and the embedding dimension m. The following parameters can be specified:

- delay (t): any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1'.

- embedding dimension (m) : any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1' meaning that the percentage of false nearest neighbors is computed controlling for their distance in embedding dimension 1 and then 2.

- Theiler window (g) : any positive integer that indicates the minimum time separation nearest neighbors are allowed to have, in order to account for temporally correlated points in the search of nearest neighbors. The default is '0' meaning that no constrained is applied to the search of nearest neighbors. This correction was suggested in Theiler, J. (1987), Efficient Algorithm for Estimating the Correlation Dimension from a Set of Discrete Points, Physical Review A, Vol 36, No 9, pp 4456-4462.

- escape factor (e) : any positive integer larger than 2 (default is 10). This parameter actually sets the threshold for classifying a nearest neighbor pair at an embedding dimension m as false nearest neighbor. If the ratio of the distance of this pair of points at embedding dimension m+1 over the distance of the same pair of points at embedding dimension m is larger than the threshold the points of the pair are regarded as false nearest neighbors.

The search of nearest neighboring points is facilitated by organizing the reconstructed points at each embedding dimension in a k-d-tree, as implemented by Guy Shechter. The routines are written in C and they are converted to matlab functions. They should run in Windows and Linux operational systems without any problems for matlab versions 6.5 or larger. Two parameters are fixed and specified in the beginning of the matlab function FalseNearestNeighbor.m. The first gives the maximum distance to search for nearest neighbor and the second sets a limit to the proportion of valid target points, i.e. target points for which a nearest neighbor is found (within the given distance). The given values to these parameters are meant to give reliable results, i.e. the nearest neighbor is not far away and a sufficient statistic of points (as a percentage of all points) is maintained from which the percentage of false nearest neighbors is calculated. Note that this choice may not give output for large embedding dimensions (relative to the time series length).

Example: If the user selects this measure by activating the check box in the beginning of the measure line and sets for delay (t) '1 5', for embedding dimension '2:2:8', and for the Theiler window (g) and escape factor (e) uses the default values, then False Nearest Neighbors is computed for the combination of the 2 delays and 4 embedding dimensions and in the measure list the following measure names will appear

FalseNearNt1m2g0e10
FalseNearNt1m4g0e10
FalseNearNt1m6g0e10
FalseNearNt1m8g0e10
FalseNearNt5m2g0e10
FalseNearNt5m4g0e10
FalseNearNt5m6g0e10
FalseNearNt5m8g0e10

 

Correlation Dimension (CorrelaDim)

The correlation dimension í is defined by the scaling of the correlation sum C(r) with the distance (radius) r, C(r) ~ rí. To estimate this scaling, first the local slope of the graph of log(C(r)) vs log(r) is computed by the smooth derivative of log(C(r)) with respect to log(r). The smooth derivative at each log(r) is the slope of the line fitted to the set of 5 points, the one at log(r) and the two next to it at each side. The estimate of í is the mean local slope in the interval [r1,r2] with the least variance of the local slope, where the length of the interval of radii is given by the ratio r2/r1. The radii r range from 0 to 1, as the time series is "linearly" standardized first to minimum 0 and maximum 1.

A number of measures of correlation dimension can be computed for any combination of a given range of the delay t, the embedding dimension m, and the length of the scaling interval s. The following parameters can be specified:

- delay (t): any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1'.

- embedding dimension (m) : any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1'.

- Theiler window (g) : any positive integer that indicates the minimum time separation nearest neighbors are allowed to have, in order to account for temporally correlated points in the search of nearest neighbors. The default is '0' meaning that no constrained is applied to the search of neighbors. This correction was suggested in Theiler, J. (1987), Efficient Algorithm for Estimating the Correlation Dimension from a Set of Discrete Points, Physical Review A, Vol 36, No 9, pp 4456-4462.

- upper/lower ratio of scaling window (s) : any valid matlab format denoting an array of positive integers larger or equal to 2 or a single positive integer larger or equal to 2. The default is '4', meaning that the length of the scaling interval is determined by the factor of 4, r2/r1=4, i.e. the ratio of the upper radius r2 to the lower radius r1 is 4.

- number of radii, resolution (o) : any positive integer larger or equal to 10. The default is '100', meaning that the correlation sum will be computed for 100 radii equally spaced at a logarithmic scale from the smallest to the largest meaningful radius. Note that a large number of radii increases the resolution in terms of radii for the identification of the scaling, but increases also the computational time.

Example: If the user selects this measure by activating the check box in the beginning of the measure line and sets for delay (t) '1 5', for embedding dimension '3:5', for the upper/lower ratio of scaling window (s) '2 4' and for the Theiler window (g) and number of radii (o) uses the default values, then Correlation Dimension is computed for the combination of the 2 delays, 3 embedding dimensions and 2 ratios of scaling window, and in the measure list the following measure names will appear

CorrelaDimt1m3g0s2o100
CorrelaDimt1m3g0s4o100
CorrelaDimt1m4g0s2o100
CorrelaDimt1m4g0s4o100
CorrelaDimt1m5g0s2o100
CorrelaDimt1m5g0s4o100
CorrelaDimt2m3g0s2o100
CorrelaDimt2m3g0s4o100
CorrelaDimt2m4g0s2o100
CorrelaDimt2m4g0s4o100
CorrelaDimt2m5g0s2o100
CorrelaDimt2m5g0s4o100

 

Correlation Sum for given radius (CorrelSumr)

Correlation Sum for given radius is the correlation sum computed for the given range of the delay t, the embedding dimension m, and the radii r. The following parameters can be specified:

- radius (r): any valid matlab format denoting an array of numbers between 0 and 1 or a single number between 0 and 1. The radii r range from 0 to 1, as the time series is "linearly" standardized to minimum 0 and maximum 1.The default is '0.1'.

- delay (t): any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1'.

- embedding dimension (m) : any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1'.

- Theiler window (g) : any positive integer that indicates the minimum time separation nearest neighbors are allowed to have, in order to account for temporally correlated points in the search of nearest neighbors. The default is '0' meaning that no constrained is applied to the search of neighbors. This correction was suggested in Theiler, J. (1987), Efficient Algorithm for Estimating the Correlation Dimension from a Set of Discrete Points, Physical Review A, Vol 36, No 9, pp 4456-4462.

Example: If the user selects this measure by activating the check box in the beginning of the measure line and sets for radius (r) '0.1 0.2', delay (t) '1 5', and embedding dimension '3:5', and for the Theiler window (g) the default value, then Correlation Sum for given radius is computed for the combination of the 2 radii, 2 delays and 3 embedding dimensions, and in the measure list the following measure names will appear (note that the value for r is multiplied with 100 and then rounded in order to have only integers in the name of the measure)

CorrelSumrr10t1m3g0
CorrelSumrr10t1m4g0
CorrelSumrr10t1m5g0
CorrelSumrr10t5m3g0
CorrelSumrr10t5m4g0
CorrelSumrr10t5m5g0
CorrelSumrr20t1m3g0
CorrelSumrr20t1m4g0
CorrelSumrr20t1m5g0
CorrelSumrr20t5m3g0
CorrelSumrr20t5m4g0
CorrelSumrr20t5m5g0

 

Radius for given Correlation Sum (RadCorrSum)

Radius for given Correlation Sum is the inverse of the Correlation Sum, and finds the radius that is associated to the given correlation sum. The measure is computed for the given range of the correlation sum c, the delay t and the embedding dimension m. The inverse of the correlation sum is found iteratively, starting at a radius value expected for the given correlation sum as if the time series is uniform white noise. This is found by the smallest positive root of x2-4x+4c1/m, where c is the correlation sum and m the embedding dimension. This starting value is adaptive to the input parameters and reduces the number of iterations that can be computationally intensive. The iterative scheme is rather ad hoc and the computation time can be reduced by implementing a more efficient search algorithm.

The following parameters can be specified:

- correlation sum (c): any valid matlab format denoting an array of numbers between 0 and 1 or a single number between 0 and 1. The correlation sum c range from 0 to 1, as it represents the cumulative density function. The default is '0.1'.

- delay (t): any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1'.

- embedding dimension (m) : any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1'.

- Theiler window (g) : any positive integer that indicates the minimum time separation nearest neighbors are allowed to have, in order to account for temporally correlated points in the search of nearest neighbors. The default is '0' meaning that no constrained is applied to the search of neighbors. This correction was suggested in Theiler, J. (1987), Efficient Algorithm for Estimating the Correlation Dimension from a Set of Discrete Points, Physical Review A, Vol 36, No 9, pp 4456-4462.

Example: If the user selects this measure by activating the check box in the beginning of the measure line and sets for correlation sum (c) '0.1 0.5', delay (t) '1 5', and embedding dimension '4:5', and for the Theiler window (g) the default value, then Radius for given Correlation Sum is computed for the combination of the 2 correlation sums values, 2 delays and 2 embedding dimensions, and in the measure list the following measure names will appear (note that the value for c is multiplied with 100 and then rounded in order to have only integers in the name of the measure)

RadCorrSumc10t1m4g0
RadCorrSumc10t1m5g0
RadCorrSumc10t5m4g0
RadCorrSumc10t5m5g0
RadCorrSumc50t1m4g0
RadCorrSumc50t1m5g0
RadCorrSumc50t5m4g0
RadCorrSumc50t5m5g0

 

Aprroximate Entropy (ApproxEntr)

Approximate Entropy measure is the logarithm of the ratio of two correlation sums for embedding dimensions m and m+1. The measure is computed for the given range of the delay t, the embedding dimension m, and the radii r. The following parameters can be specified:

- radius (r): any valid matlab format denoting an array of numbers between 0 and 1 or a single number between 0 and 1. The radii r range from 0 to 1, as the time series is "linearly" standardized to minimum 0 and maximum 1.The default is '0.1'.

- delay (t): any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1'.

- embedding dimension (m) : any valid matlab format denoting an array of positive integers or a single positive integer. The default is '1' meaning that the correlation sum is computed for embedding dimensions 1 and then 2 in order to compute the approximate entropy.

- Theiler window (g) : any positive integer that indicates the minimum time separation nearest neighbors are allowed to have, in order to account for temporally correlated points in the search of nearest neighbors. The default is '0' meaning that no constrained is applied to the search of neighbors. This correction was suggested in Theiler, J. (1987), Efficient Algorithm for Estimating the Correlation Dimension from a Set of Discrete Points, Physical Review A, Vol 36, No 9, pp 4456-4462.

Example: If the user selects this measure by activating the check box in the beginning of the measure line and sets for radius (r) '0.1 0.2', delay (t) '1 5', and embedding dimension '3:5', and for the Theiler window (g) the default value, then Approximate Entropy is computed for the combination of the 2 radii, 2 delays and 3 embedding dimensions, and in the measure list the following measure names will appear (note that the value for r is multiplied with 100 and then rounded in order to have only integers in the name of the measure)

ApproxEntrr10t1m3g0
ApproxEntrr10t1m4g0
ApproxEntrr10t1m5g0
ApproxEntrr10t5m3g0
ApproxEntrr10t5m4g0
ApproxEntrr10t5m5g0
ApproxEntrr20t1m3g0
ApproxEntrr20t1m4g0
ApproxEntrr20t1m5g0
ApproxEntrr20t5m3g0
ApproxEntrr20t5m4g0
ApproxEntrr20t5m5g0

 

Algorithmic Complexity, equidistant bins  (AlgComEqDi)

Algorithmic Complexity, equidistant bins is the measure of algorithmic complexity discretizing the time series to bins (assigned to symbols) of equal length. The following parameter can be specified:

- number of bins (b): any valid matlab format denoting an array of positive integers or a single positive integer. The default is '0' and it is used to denote the number of bins set to the rounded integer of sqrt(N/5), where N is the length of the time series. Note that b = 1 is meaningless as at least two bins should be given to split the range of values.

Example: If the user selects this measure by activating the check box in the beginning of the measure line and sets for number of bins (b) '0:4', then Algorithmic Complexity, equidistant bins, is computed for the rounded integer of sqrt(N/5) (for b=0), and for b=2,3 and 4. No computations will be done for b=1 and the output value is NaN (not a number). In the measure list the following measure names will appear

AlgCompEqDib0
AlgCompEqDib1
AlgCompEqDib2
AlgCompEqDib3
AlgCompEqDib4

 

Algorithmic Complexity, equiprobable bins (AlgComEqPr)

Algorithmic Complexity, equiprobable bins is the measure of algorithmic complexity discretizing the time series to bins (assigned to symbols) of equal length. The following parameter can be specified:

- number of bins (b): any valid matlab format denoting an array of positive integers or a single positive integer. The default is '0' and it is used to denote the number of bins set to the rounded integer of sqrt(N/5), where N is the length of the time series. Note that b = 1 is meaningless as at least two bins should be given to split the range of values.

Example: If the user selects this measure by activating the check box in the beginning of the measure line and sets for number of bins (b) '0:4', then Algorithmic Complexity, equiprobable bins, is computed for the rounded integer of sqrt(N/5) (for b=0), and for b=2,3 and 4. No computations will be done for b=1 and the output value is NaN (not a number). In the measure list the following measure names will appear

AlgCompEqDib0
AlgCompEqDib1
AlgCompEqDib2
AlgCompEqDib3
AlgCompEqDib4

 

OK

By pressing this button the window of "Dimension and Complexity Measures" will disappear and the user will be moved to the "Select / run measures" window. Any changes in the measures and parameter values will be stored.
 

Cancel

Quit without doing anything and return to the "Select / run measures" window. Any changes in the measures and parameter values will be ignored.
 

Help

This file will be shown.