Nonlinear Model Measures

 

Among the different classes of nonlinear models, local state space models are implemented here. This class is popular in nonlinear time series analysis because it is effective for fitting and prediction tasks and is intuitively simple. Different variations of the local model are implemented, and different statistical measures of fitting and prediction can be derived for a range of model-specific parameters: the delay (t), the embedding dimension (m), the number of neighbors (k) and the prediction times (h).

The local region supporting the model for fitting or prediction is determined here by the number of neighbors of the target point, taken from the available set of reconstructed points. A k-d tree data structure is used to speed up the search for neighboring points, as implemented by Guy Shechter. The routines are written in C and converted to MATLAB functions. They should run on Windows and Linux operating systems without any problems. The point distances are computed using the Euclidean (L2) norm.
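The reconstruction and neighbor search described above can be illustrated with a minimal Python sketch. It is not the toolkit's code: the function names `embed` and `k_nearest` are hypothetical, the ordering of the delay coordinates is an assumption, and a brute-force scan stands in for the k-d tree, which returns the same neighbors but faster.

```python
import math

def embed(x, m, tau):
    """Delay vectors [x[i], x[i-tau], ..., x[i-(m-1)*tau]] and their time indices.

    The coordinate ordering is an assumption made for illustration.
    """
    start = (m - 1) * tau
    times = list(range(start, len(x)))
    points = [[x[i - j * tau] for j in range(m)] for i in times]
    return points, times

def k_nearest(points, target, k, exclude=None):
    """Indices of the k points closest to target in the Euclidean (L2) norm.

    Brute-force scan over all points; a k-d tree gives the same answer faster.
    """
    dists = []
    for idx, p in enumerate(points):
        if idx == exclude:
            continue  # skip the target itself when it belongs to the point set
        dists.append((math.dist(p, target), idx))
    dists.sort()
    return [idx for _, idx in dists[:k]]

x = [0, 1, 2, 3, 4, 5]
points, times = embed(x, m=2, tau=1)
neighbors = k_nearest(points, points[2], k=2, exclude=2)
```

With m = 1 the reconstructed points reduce to the samples themselves, which matches the default embedding dimension.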

The difference between the fitting and prediction measures lies in the set of samples used to compute the statistical measure: all the samples are used for fitting, whereas only the samples in the so-called test set are used for prediction. Here, the test set is the last part of the time series, with its size given by the fraction for test set parameter (f). For multi-step fit or prediction, both the direct and iterative schemes are implemented. For the direct fit or prediction at a lead time h, the h-step ahead mappings are used to form the estimate of the target point at lead time h, whereas for the iterative fit or prediction, the one-step predictions from previous steps are fed back to make the prediction for the current step until step h is reached.
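The contrast between the direct and the iterative scheme can be sketched as follows. This is an illustration only, under simplifying assumptions: embedding dimension m = 1 (so the points are the samples), a local average predictor, and hypothetical function names.

```python
def nearest_indices(x, target, k, last):
    """Indices i <= last of the k samples closest to target (m = 1 assumed)."""
    order = sorted(range(last + 1), key=lambda i: (abs(x[i] - target), i))
    return order[:k]

def direct_predict(x, target, k, h):
    """Direct scheme: average the h-step ahead mappings x[i+h] of the neighbors."""
    # Only samples that have an h-step ahead image can serve as neighbors.
    nbrs = nearest_indices(x, target, k, len(x) - 1 - h)
    return sum(x[i + h] for i in nbrs) / k

def iterative_predict(x, target, k, h):
    """Iterative scheme: apply the one-step predictor h times, feeding each
    prediction back in as the new target."""
    current = target
    for _ in range(h):
        current = direct_predict(x, current, k, 1)
    return current

series = [0, 1, 2, 0, 1, 2, 0, 1, 2]
direct = direct_predict(series, 0, k=1, h=2)
iterated = iterative_predict(series, 0, k=1, h=2)
```

On this noise-free periodic series the two schemes agree. On noisy or chaotic data they generally differ, because the iterative scheme accumulates one-step errors while the direct scheme fits a separate h-step mapping.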

The local models implemented here are listed below. The task is to predict h steps ahead from a current state $x_i$, where all states (points) are reconstructed with a delay t and an embedding dimension m, and the local model is estimated on the basis of the k nearest neighbors of $x_i$, denoted $x_{i(j)}$, for j=1,...,k. The specific model is selected by the value of the so-called truncation parameter (q).

- The local average model (LAM), also called the zero-order model, predicts $x_{i+h}$ from the average of the mappings of the neighbors at lead time h, i.e. $x_{i(j)+h}$, for j=1,...,k. This model is selected by q = 0.

- The local linear model (LLM) is the linear autoregressive model $x_{i+h} = a_0 + a^T x_i$, estimated only on the neighboring points $x_{i(j)}$, for j=1,...,k. The parameters $a_0$ and $a$ are estimated by ordinary least squares (OLS), which requires that k > m+1. The prediction of $x_{i+h}$ is computed from the equation above for the target point $x_i$. Note that the solution for the model parameters may be numerically unstable if k is close to m. This model is selected by using any q ≥ m.

- The local linear model as above, but with the OLS solution for the model parameters regularized by principal component regression (PCR). PCR rotates the parameter space to match the basis formed by the principal components, found by singular value decomposition (SVD) of the matrix formed by the neighboring points. The space is then projected onto the subspace spanned by the first q principal axes, the solution for the parameters is found in this subspace, and it is transformed back to the original state space to yield the PCR-regularized solution for the parameters. In this way, the estimated parameters have smaller variance (they are more stable) at the cost of an introduced bias. Another advantage is that PCR may in some cases reduce the effect of noise. The condition for the PCR solution is that k > q+1, so stable solutions can be reached even when m > k, provided that the truncation parameter is sufficiently small. This solution is selected by using 0 < q < m.
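As a rough illustration of the first two models, the sketch below computes a local average and a local linear (OLS) prediction from a given set of neighbors and their h-step ahead mappings. The function names are hypothetical, the OLS fit uses the normal equations solved by Gaussian elimination purely to keep the example dependency-free (an SVD-based solver is numerically preferable), and the PCR variant is not shown.

```python
def lam_predict(images):
    """Local average model (q = 0): mean of the neighbors' h-step ahead mappings."""
    return sum(images) / len(images)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting.

    Assumes A is square and nonsingular; a singular system (for example too
    few neighbors) would cause a division by zero here.
    """
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def llm_predict(neighbors, images, target):
    """Local linear model (q >= m): OLS fit of image = a0 + a^T x on the neighbors,
    then evaluation of the fitted equation at the target point."""
    rows = [[1.0] + list(p) for p in neighbors]  # design matrix with intercept column
    n = len(rows[0])
    # Normal equations (X^T X) z = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, images)) for i in range(n)]
    coef = solve(xtx, xty)
    return coef[0] + sum(a * v for a, v in zip(coef[1:], target))

# Four neighbors in m = 2 dimensions whose mappings follow 1 + 2*x0 + 3*x1 exactly.
neighbors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
images = [1.0, 3.0, 4.0, 6.0]
target = [2.0, 1.0]
lam_value = lam_predict(images)
llm_value = llm_predict(neighbors, images, target)
```

In this constructed example the local linear model recovers the underlying linear map at the target, while the local average returns the mean of the mappings regardless of where the target lies.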

Four standard statistical measures are provided: the mean square error (MSE), the normalized mean square error (NMSE), the normalized root mean square error (NRMSE) and the correlation coefficient (CC). The last three measures account for the variance of the time series, which allows comparison of the measure across different time series.
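The four measures can be computed from paired lists of actual samples and predictions, as in the following sketch. The function name `fit_measures` and the dictionary layout are assumptions for illustration; the definitions follow the descriptions in this document (NMSE as the MSE divided by the variance of the actual samples in the sum, NRMSE as its square root, CC as the Pearson correlation).

```python
import math

def fit_measures(actual, predicted):
    """Return MSE, NMSE, NRMSE and CC for paired actual and predicted samples.

    Assumes both lists have the same length and are not constant, so that
    the variances in the denominators are nonzero.
    """
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))   # squared errors
    ss_a = sum((a - mean_a) ** 2 for a in actual)                # spread of actual
    ss_p = sum((p - mean_p) ** 2 for p in predicted)             # spread of predicted
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    nmse = sse / ss_a
    return {
        "MSE": sse / n,
        "NMSE": nmse,
        "NRMSE": math.sqrt(nmse),
        "CC": cov / math.sqrt(ss_a * ss_p),
    }

measures = fit_measures([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

An NMSE or NRMSE near 0 indicates good predictions, while a value near 1 means the model does no better than predicting the mean of the samples.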

The local state space models are commonly used under the hypothesis of an underlying deterministic system to the time series, e.g. see

Abarbanel H.D.I. (1996), Analysis of Observed Chaotic Data, Springer, New York.

Kantz H. and Schreiber T. (1997), Nonlinear Time Series Analysis, Cambridge University Press, Cambridge.

The local linear model was first introduced in

Farmer J.D. and Sidorowich J.J. (1987), Predicting Chaotic Time Series, Physical Review Letters, Vol 59, pp 845-848.

The PCR regularization of the OLS solution for the local linear model and the difference in direct and iterative prediction are presented in

Kugiumtzis D., Lingjærde O.C. and Christophersen N. (1998), Regularized Local Linear Prediction of Chaotic Time Series, Physica D, Vol 112, pp 344-360.

Kugiumtzis D. (2002), State Space Local Linear Prediction, in Modelling and Forecasting Financial Data, Techniques of Nonlinear Dynamics, eds Soofi A. and Cao L., Kluwer Academic Publishers, Chp 4, pp 95-113.

 

Local Average or Linear Direct Fit (Loc_DirFit)

  Local Average or Linear Direct Fit gives four statistical measures of goodness of fit with a local model of order zero (the local average model, LAM) or one (the local linear model, LLM). A number of different delays, embedding dimensions and number of neighbors for the local model can be given and the fit can be computed for a number of lead times using the direct prediction scheme. The following parameters can be specified:

- delay (t): any valid MATLAB format denoting an array of positive integers or a single positive integer. The default is '1'.

- embedding dimension (m) : any valid MATLAB format denoting an array of positive integers or a single positive integer. The default is '1', meaning that no state space reconstruction will be done and the points are simply the samples of the time series.

- number of neighbors (k) : any valid MATLAB format denoting an array of positive integers or a single positive integer. The default is '1', meaning that the goodness of fit measures will be computed for the simplest local model, where the prediction is based on the mapping of the nearest neighbor of the target point. This is also called nearest neighbor prediction.

- prediction time (h) : any valid MATLAB format denoting an array of positive integers or a single positive integer. The default is '1', meaning that the goodness of fit measures will be computed only for one step ahead.

- truncation parameter (q) : any non-negative integer. This parameter determines the type of local model to be used. The default is '0', meaning that the local average model will be used. If q ≥ m, the standard local linear model with the OLS solution for the model parameters will be used. So, to run this model for a number of different embedding dimensions, use a q at least as large as the maximum of the selected embedding dimensions. If 0 < q < m, the local linear model with the PCR-regularized solution for the model parameters will be used. Care should be taken that the condition q < k (or m < k if q ≥ m) always holds; otherwise the equation system is under-determined and there is no unique solution for the model parameters, leading to unstable parameter estimates. The solution for the model parameters can also be unstable (in the sense that the variance of the estimates is very large) when q (or m if q ≥ m) is close to k.
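The way q selects the model, together with the condition on k, can be summarized in a small helper. This is a hedged sketch: the function name and the returned labels ('LAM', 'LLM-OLS', 'LLM-PCR') are hypothetical and are not identifiers used by the software.

```python
def local_model_type(q, m, k):
    """Return (model label, well_determined) for truncation parameter q,
    embedding dimension m and number of neighbors k.

    well_determined is False when q >= k (PCR) or m >= k (OLS), i.e. when
    the parameter estimates should not be trusted.
    """
    if q < 0:
        raise ValueError("q must be a non-negative integer")
    if q == 0:
        return "LAM", True                  # local average, no parameters to fit
    if q >= m:
        return "LLM-OLS", m < k             # ordinary least squares, needs m < k
    return "LLM-PCR", q < k                 # PCR with q components, needs q < k

choices = [
    local_model_type(0, 5, 1),
    local_model_type(20, 5, 10),
    local_model_type(20, 10, 10),
    local_model_type(3, 10, 10),
]
```

The third call corresponds to the case m = k, for which the measure values should not be trusted.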

The user can activate (check) any of the four statistical measures to save results for. If none is checked then results for all four measures will be saved and the measures will be included in the measure list. The statistical measures are the following:

- MSE : if checked, the mean square error (MSE) measure of goodness of fit for the specified parameters will be included in the list of measures. MSE is the mean of the squared prediction errors, defined as

$$\mathrm{MSE} = \frac{1}{N-h-m+1} \sum_{i=m}^{N-h} \left( x_{i+h} - x_i(h) \right)^2$$

where $x_{i+h}$ is the actual sample and $x_i(h)$ is the h-step ahead prediction at current time i, for i=m,...,N-h, and N is the length of the time series.

- NMSE : if checked, the normalized mean square error (NMSE) measure of goodness of fit for the specified parameters will be included in the list of measures. NMSE is the MSE divided by the variance of the samples included in the sum of MSE, defined as

$$\mathrm{NMSE} = \frac{\sum_{i=m}^{N-h} \left( x_{i+h} - x_i(h) \right)^2}{\sum_{i=m}^{N-h} \left( x_{i+h} - \bar{x} \right)^2}$$

where $\bar{x}$ is the mean of the samples in the sum.

- NRMSE : if checked, the normalized root mean square error (NRMSE) measure of goodness of fit for the specified parameters will be included in the list of measures. NRMSE is the square root of NMSE.

- CC : if checked, the correlation coefficient (CC) measure of goodness of fit for the specified parameters will be included in the list of measures. CC is the standard Pearson correlation coefficient between the actual samples and the respective predictions, i.e. $x_{i+h}$ and $x_i(h)$ for h-step ahead prediction at current time i.

Example: If the user selects this measure by activating the check box at the beginning of the measure line, sets the delay (t) to '1', the embedding dimension (m) to '5 10', the number of nearest neighbors (k) to '10 20', the prediction time (h) to '1:3' and the truncation parameter (q) to '20' (no truncation with PCR since q ≥ m), and checks only NRMSE, then the NRMSE measure of Local Average or Linear Direct Fit is computed for the combinations of the 2 values of m, the 2 values of k and the 3 values of h, and the following measure names will appear in the measure list:

Loc_DirFitNRMSEt1m5k10q20h1
Loc_DirFitNRMSEt1m5k10q20h2
Loc_DirFitNRMSEt1m5k10q20h3
Loc_DirFitNRMSEt1m5k20q20h1
Loc_DirFitNRMSEt1m5k20q20h2
Loc_DirFitNRMSEt1m5k20q20h3
Loc_DirFitNRMSEt1m10k10q20h1    (these measure values should not be trusted as m = k)
Loc_DirFitNRMSEt1m10k10q20h2    (these measure values should not be trusted as m = k)
Loc_DirFitNRMSEt1m10k10q20h3    (these measure values should not be trusted as m = k)
Loc_DirFitNRMSEt1m10k20q20h1
Loc_DirFitNRMSEt1m10k20q20h2
Loc_DirFitNRMSEt1m10k20q20h3
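The naming pattern in the list above can be reproduced with a short sketch. The function `measure_names` is hypothetical; the ordering (m outermost, then k, with h varying fastest) is read off the example list, and the position of t in the loop order when several delays are given is an assumption.

```python
from itertools import product

def measure_names(prefix, stat, t_vals, m_vals, k_vals, q, h_vals):
    """Build measure names such as Loc_DirFitNRMSEt1m5k10q20h1 for every
    combination of t, m, k and h, with a single fixed q."""
    return [
        f"{prefix}{stat}t{t}m{m}k{k}q{q}h{h}"
        for t, m, k, h in product(t_vals, m_vals, k_vals, h_vals)
    ]

# Same settings as the example: t = 1, m = 5 10, k = 10 20, h = 1:3, q = 20.
names = measure_names("Loc_DirFit", "NRMSE", [1], [5, 10], [10, 20], 20, [1, 2, 3])
```

The number of names is the product of the numbers of values given for t, m, k and h, here 1 x 2 x 2 x 3 = 12.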

 

Local Average or Linear Iterative Fit (Loc_IteFit)

  Local Average or Linear Iterative Fit gives four statistical measures of goodness of fit with a local model of order zero (the local average model, LAM) or one (the local linear model, LLM). A number of different delays, embedding dimensions and numbers of neighbors for the local model can be given, and the fit can be computed for a number of lead times using the iterative prediction scheme. The following parameters can be specified:

- delay (t): any valid MATLAB format denoting an array of positive integers or a single positive integer. The default is '1'.

- embedding dimension (m) : any valid MATLAB format denoting an array of positive integers or a single positive integer. The default is '1', meaning that no state space reconstruction will be done and the points are simply the samples of the time series.

- number of neighbors (k) : any valid MATLAB format denoting an array of positive integers or a single positive integer. The default is '1', meaning that the goodness of fit measures will be computed for the simplest local model, where the prediction is based on the mapping of the nearest neighbor of the target point. This is also called nearest neighbor prediction.

- prediction time (h) : any valid MATLAB format denoting an array of positive integers or a single positive integer. The default is '1', meaning that the goodness of fit measures will be computed only for one step ahead.

- truncation parameter (q) : any non-negative integer. This parameter determines the type of local model to be used. The default is '0', meaning that the local average model will be used. If q ≥ m, the standard local linear model with the OLS solution for the model parameters will be used. So, to run this model for a number of embedding dimensions, use a q at least as large as the maximum of the selected embedding dimensions. If 0 < q < m, the local linear model with the PCR-regularized solution for the model parameters will be used. Care should be taken that the condition q < k (or m < k if q ≥ m) always holds; otherwise the equation system is under-determined and there is no unique solution for the model parameters, leading to unstable parameter estimates. The solution for the model parameters can also be unstable (in the sense that the variance of the estimates is very large) when q (or m if q ≥ m) is close to k.

The user can activate (check) any of the four statistical measures to save results for. If none is checked then results for all four measures will be saved and the measures will be included in the measure list. The statistical measures are the following:

- MSE : if checked, the mean square error (MSE) measure of goodness of fit for the specified parameters will be included in the list of measures. MSE is the mean of the squared prediction errors, defined as

$$\mathrm{MSE} = \frac{1}{N-h-m+1} \sum_{i=m}^{N-h} \left( x_{i+h} - x_i(h) \right)^2$$

where $x_{i+h}$ is the actual sample and $x_i(h)$ is the h-step ahead prediction at current time i, for i=m,...,N-h, and N is the length of the time series.

- NMSE : if checked, the normalized mean square error (NMSE) measure of goodness of fit for the specified parameters will be included in the list of measures. NMSE is the MSE divided by the variance of the samples included in the sum of MSE, defined as

$$\mathrm{NMSE} = \frac{\sum_{i=m}^{N-h} \left( x_{i+h} - x_i(h) \right)^2}{\sum_{i=m}^{N-h} \left( x_{i+h} - \bar{x} \right)^2}$$

where $\bar{x}$ is the mean of the samples in the sum.

- NRMSE : if checked, the normalized root mean square error (NRMSE) measure of goodness of fit for the specified parameters will be included in the list of measures. NRMSE is the square root of NMSE.

- CC : if checked, the correlation coefficient (CC) measure of goodness of fit for the specified parameters will be included in the list of measures. CC is the standard Pearson correlation coefficient between the actual samples and the respective predictions, i.e. $x_{i+h}$ and $x_i(h)$ for h-step ahead prediction at current time i.

Example: If the user selects this measure by activating the check box at the beginning of the measure line, sets the delay (t) to '1', the embedding dimension (m) to '5 10', the number of nearest neighbors (k) to '10 20', the prediction time (h) to '1:3' and the truncation parameter (q) to '20' (no truncation with PCR since q ≥ m), and checks only NRMSE, then the NRMSE measure of Local Average or Linear Iterative Fit is computed for the combinations of the 2 values of m, the 2 values of k and the 3 values of h, and the following measure names will appear in the measure list:

Loc_IteFitNRMSEt1m5k10q20h1
Loc_IteFitNRMSEt1m5k10q20h2
Loc_IteFitNRMSEt1m5k10q20h3
Loc_IteFitNRMSEt1m5k20q20h1
Loc_IteFitNRMSEt1m5k20q20h2
Loc_IteFitNRMSEt1m5k20q20h3
Loc_IteFitNRMSEt1m10k10q20h1    (these measure values should not be trusted as m = k)
Loc_IteFitNRMSEt1m10k10q20h2    (these measure values should not be trusted as m = k)
Loc_IteFitNRMSEt1m10k10q20h3    (these measure values should not be trusted as m = k)
Loc_IteFitNRMSEt1m10k20q20h1
Loc_IteFitNRMSEt1m10k20q20h2
Loc_IteFitNRMSEt1m10k20q20h3

Local Average or Linear Direct Prediction (Loc_DirPre)

  Local Average or Linear Direct Prediction gives four statistical measures of goodness of prediction with a local model of order zero (the local average model, LAM) or one (the local linear model, LLM). The local average or linear model is estimated in the first part of the time series, the so-called training set, and the predictions are made in the second part of the time series, the so-called test set. A number of different delays, embedding dimensions and numbers of neighbors for the local model can be given, and the prediction can be computed for a number of lead times using the direct prediction scheme. The following parameters can be specified:

- fraction for test set (f) : any number between 0.1 and 0.9. The default is '0.5', meaning that the test set is the second half of the time series. Typically f should be less than or equal to 0.5, so that the test set is not larger than the training set.

- delay (t): any valid MATLAB format denoting an array of positive integers or a single positive integer. The default is '1'.

- embedding dimension (m) : any valid MATLAB format denoting an array of positive integers or a single positive integer. The default is '1', meaning that no state space reconstruction will be done and the points are simply the samples of the time series.

- number of neighbors (k) : any valid MATLAB format denoting an array of positive integers or a single positive integer. The default is '1', meaning that the goodness of prediction measures will be computed for the simplest local model, where the prediction is based on the mapping of the nearest neighbor of the target point in the training set. This is also called nearest neighbor prediction.

- prediction time (h) : any valid MATLAB format denoting an array of positive integers or a single positive integer. The default is '1', meaning that the goodness of prediction measures will be computed only for one step ahead.

- truncation parameter (q) : any non-negative integer. This parameter determines the type of local model to be used. The default is '0', meaning that the local average model will be used. If q ≥ m, the standard local linear model with the OLS solution for the model parameters will be used. So, to run this model for a number of embedding dimensions, use a q at least as large as the maximum of the selected embedding dimensions. If 0 < q < m, the local linear model with the PCR-regularized solution for the model parameters will be used. Care should be taken that the condition q < k (or m < k if q ≥ m) always holds; otherwise the equation system is under-determined and there is no unique solution for the model parameters, leading to unstable parameter estimates. The solution for the model parameters can also be unstable (in the sense that the variance of the estimates is very large) when q (or m if q ≥ m) is close to k.

The user can activate (check) any of the four statistical measures to save results for. If none is checked then results for all four measures will be saved and the measures will be included in the measure list. The statistical measures are the following:

- MSE : if checked, the mean square error (MSE) measure of prediction in the test set for the specified parameters will be included in the list of measures. MSE is the mean of the squared prediction errors, defined as

$$\mathrm{MSE} = \frac{1}{N-h-N_1+1} \sum_{i=N_1}^{N-h} \left( x_{i+h} - x_i(h) \right)^2$$

where $x_{i+h}$ is the actual sample and $x_i(h)$ is the h-step ahead prediction at current time i, for $i=N_1,...,N-h$, N is the length of the time series and $N_1$ is the length of the training set ($N_1 = (1-f)N$).

- NMSE : if checked, the normalized mean square error (NMSE) measure of prediction in the test set for the specified parameters will be included in the list of measures. NMSE is the MSE divided by the variance of the samples included in the sum of MSE, defined as

$$\mathrm{NMSE} = \frac{\sum_{i=N_1}^{N-h} \left( x_{i+h} - x_i(h) \right)^2}{\sum_{i=N_1}^{N-h} \left( x_{i+h} - \bar{x} \right)^2}$$

where $\bar{x}$ is the mean of the samples in the sum.

- NRMSE : if checked, the normalized root mean square error (NRMSE) measure of prediction in the test set for the specified parameters will be included in the list of measures. NRMSE is the square root of NMSE.

- CC : if checked, the correlation coefficient (CC) measure of prediction in the test set for the specified parameters will be included in the list of measures. CC is the standard Pearson correlation coefficient between the actual samples and the respective predictions.

Example: If the user selects this measure by activating the check box at the beginning of the measure line, sets the fraction for test set (f) to '0.25', the delay (t) to '1', the embedding dimension (m) to '5 10', the number of nearest neighbors (k) to '10 20', the prediction time (h) to '1:3' and the truncation parameter (q) to '20' (no truncation with PCR since q ≥ m), and checks only NRMSE, then the NRMSE measure of Local Average or Linear Direct Prediction is computed on a test set of length one fourth of the time series length for the combinations of the 2 values of m, the 2 values of k and the 3 values of h, and the following measure names will appear in the measure list:

Loc_DirPreNRMSEt1m5k10q20h1
Loc_DirPreNRMSEt1m5k10q20h2
Loc_DirPreNRMSEt1m5k10q20h3
Loc_DirPreNRMSEt1m5k20q20h1
Loc_DirPreNRMSEt1m5k20q20h2
Loc_DirPreNRMSEt1m5k20q20h3
Loc_DirPreNRMSEt1m10k10q20h1    (these measure values should not be trusted as m = k)
Loc_DirPreNRMSEt1m10k10q20h2    (these measure values should not be trusted as m = k)
Loc_DirPreNRMSEt1m10k10q20h3    (these measure values should not be trusted as m = k)
Loc_DirPreNRMSEt1m10k20q20h1
Loc_DirPreNRMSEt1m10k20q20h2
Loc_DirPreNRMSEt1m10k20q20h3

 

Local Average or Linear Iterative Prediction (Loc_ItePre)

  Local Average or Linear Iterative Prediction gives four statistical measures of goodness of prediction with a local model of order zero (the local average model, LAM) or one (the local linear model, LLM). The local average or linear model is estimated in the first part of the time series, the so-called training set, and the predictions are made in the second part of the time series, the so-called test set. A number of different delays, embedding dimensions and number of neighbors for the local model can be given and the prediction can be computed for a number of lead times using the iterative prediction scheme. The following parameters can be specified:

- fraction for test set (f) : any number between 0.1 and 0.9. The default is '0.5', meaning that the test set is the second half of the time series. Typically f should be less than or equal to 0.5, so that the test set is not larger than the training set.

- delay (t): any valid MATLAB format denoting an array of positive integers or a single positive integer. The default is '1'.

- embedding dimension (m) : any valid MATLAB format denoting an array of positive integers or a single positive integer. The default is '1', meaning that no state space reconstruction will be done and the points are simply the samples of the time series.

- number of neighbors (k) : any valid MATLAB format denoting an array of positive integers or a single positive integer. The default is '1', meaning that the goodness of prediction measures will be computed for the simplest local model, where the prediction is based on the mapping of the nearest neighbor of the target point in the training set. This is also called nearest neighbor prediction.

- prediction time (h) : any valid MATLAB format denoting an array of positive integers or a single positive integer. The default is '1', meaning that the goodness of prediction measures will be computed only for one step ahead.

- truncation parameter (q) : any non-negative integer. This parameter determines the type of local model to be used. The default is '0', meaning that the local average model will be used. If q ≥ m, the standard local linear model with the OLS solution for the model parameters will be used. So, to run this model for a number of embedding dimensions, use a q at least as large as the maximum of the selected embedding dimensions. If 0 < q < m, the local linear model with the PCR-regularized solution for the model parameters will be used. Care should be taken that the condition q < k (or m < k if q ≥ m) always holds; otherwise the equation system is under-determined and there is no unique solution for the model parameters, leading to unstable parameter estimates. The solution for the model parameters can also be unstable (in the sense that the variance of the estimates is very large) when q (or m if q ≥ m) is close to k.

The user can activate (check) any of the four statistical measures to save results for. If none is checked then results for all four measures will be saved and the measures will be included in the measure list. The statistical measures are the following:

- MSE : if checked, the mean square error (MSE) measure of prediction in the test set for the specified parameters will be included in the list of measures. MSE is the mean of the squared prediction errors, defined as

$$\mathrm{MSE} = \frac{1}{N-h-N_1+1} \sum_{i=N_1}^{N-h} \left( x_{i+h} - x_i(h) \right)^2$$

where $x_{i+h}$ is the actual sample and $x_i(h)$ is the h-step ahead prediction at current time i, for $i=N_1,...,N-h$, N is the length of the time series and $N_1$ is the length of the training set ($N_1 = (1-f)N$).

- NMSE : if checked, the normalized mean square error (NMSE) measure of prediction in the test set for the specified parameters will be included in the list of measures. NMSE is the MSE divided by the variance of the samples included in the sum of MSE, defined as

$$\mathrm{NMSE} = \frac{\sum_{i=N_1}^{N-h} \left( x_{i+h} - x_i(h) \right)^2}{\sum_{i=N_1}^{N-h} \left( x_{i+h} - \bar{x} \right)^2}$$

where $\bar{x}$ is the mean of the samples in the sum.

- NRMSE : if checked, the normalized root mean square error (NRMSE) measure of prediction in the test set for the specified parameters will be included in the list of measures. NRMSE is the square root of NMSE.

- CC : if checked, the correlation coefficient (CC) measure of prediction in the test set for the specified parameters will be included in the list of measures. CC is the standard Pearson correlation coefficient between the actual samples and the respective predictions.

Example: If the user selects this measure by activating the check box at the beginning of the measure line, sets the fraction for test set (f) to '0.25', the delay (t) to '1', the embedding dimension (m) to '5 10', the number of nearest neighbors (k) to '10 20', the prediction time (h) to '1:3' and the truncation parameter (q) to '20' (no truncation with PCR since q ≥ m), and checks only NRMSE, then the NRMSE measure of Local Average or Linear Iterative Prediction is computed on a test set of length one fourth of the time series length for the combinations of the 2 values of m, the 2 values of k and the 3 values of h, and the following measure names will appear in the measure list:

Loc_ItePreNRMSEt1m5k10q20h1
Loc_ItePreNRMSEt1m5k10q20h2
Loc_ItePreNRMSEt1m5k10q20h3
Loc_ItePreNRMSEt1m5k20q20h1
Loc_ItePreNRMSEt1m5k20q20h2
Loc_ItePreNRMSEt1m5k20q20h3
Loc_ItePreNRMSEt1m10k10q20h1    (these measure values should not be trusted as m = k)
Loc_ItePreNRMSEt1m10k10q20h2    (these measure values should not be trusted as m = k)
Loc_ItePreNRMSEt1m10k10q20h3    (these measure values should not be trusted as m = k)
Loc_ItePreNRMSEt1m10k20q20h1
Loc_ItePreNRMSEt1m10k20q20h2
Loc_ItePreNRMSEt1m10k20q20h3

 

OK

Pressing this button closes the "Nonlinear Model Measures" window and returns the user to the "Select / run measures" window. Any changes in the measures and parameter values will be stored.
 

Cancel

Quit without doing anything and return to the "Select / run measures" window. Any changes in the measures and parameter values will be ignored.
 

Help

This file will be shown.