Time Series Prediction as a Problem of Missing Values
Application to ESTSP2007 and NN3 Competition Benchmarks
Antti Sorjamaa and Amaury Lendasse
Time Series Prediction and ChemoInformatics Group
Adaptive Informatics Research Centre
Helsinki University of Technology

Antti Sorjamaa - TSPCi - AIRC - HUT

Outline
Time Series Prediction vs. Missing Values
Global methodology
  –Self-Organizing Maps (SOM)
  –Empirical Orthogonal Functions (EOF)
Results

Missing Values
[Figure: example data table ordered along a Time axis, with known entries shown as numbers and missing entries marked "?"]

Time Series Prediction vs. Missing Values
Methods exist that are designed for finding Missing Values in temporally related databases
A time series is such a database
The unknown future can be considered as a set of missing values
The same methods can be applied

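One way to make this framing concrete is to stack sliding windows of the series into a matrix and append the unknown future as missing entries. The sketch below is only an illustration under assumptions: the function name `build_regressor_matrix`, the NaN encoding of missing values, and the toy series are not from the slides.

```python
import numpy as np

def build_regressor_matrix(series, regressor_size, horizon):
    """Stack sliding windows of the series as rows; future values become NaN."""
    # Append `horizon` unknown future values, marked as missing (NaN).
    extended = np.concatenate([np.asarray(series, dtype=float),
                               np.full(horizon, np.nan)])
    n_rows = len(extended) - regressor_size + 1
    # Row i holds extended[i : i + regressor_size].
    idx = np.arange(n_rows)[:, None] + np.arange(regressor_size)[None, :]
    return extended[idx]

# Toy series (hypothetical data): the last rows contain the values to predict.
X = build_regressor_matrix([1, 2, 3, 4, 5, 6], regressor_size=3, horizon=2)
print(X)
```

Any method that fills the NaN entries of `X` then yields a prediction of the future values.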
Global Methodology
Based on two methods
  –SOM
    Nonlinear projection / interpolation
    Topology preservation on a low-dimensional grid
  –EOF
    Linear projection
    Projection to high-dimensional output space
    Needs initialization

SOM
[Figure-only slide]

SOM Interpolation
SOM learning is done with known data
Missing values are left out
Approach proposed by Cottrell and Letrémy (in Applied Stochastic Models and Data Analysis 2005)

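The idea of learning with known data only can be sketched as follows. This is a minimal illustration, not the exact procedure of Cottrell and Letrémy: the function names `train_som` and `som_impute`, the learning-rate and neighborhood schedules, the grid size, and the toy data are all assumptions. Distances to the prototypes are computed over the known components of each sample, and a missing value is then read from the best-matching prototype.

```python
import numpy as np

def train_som(X, grid_shape=(3, 3), n_epochs=50, seed=0):
    """Train a small SOM on the rows of X, ignoring NaN (missing) entries."""
    rng = np.random.default_rng(seed)
    known = ~np.isnan(X)
    # Grid coordinates of the units, used by the neighborhood function.
    coords = np.array([(i, j) for i in range(grid_shape[0])
                       for j in range(grid_shape[1])], dtype=float)
    # Initialize prototypes around the column means of the known data.
    W = np.nanmean(X, axis=0) + 0.1 * np.nanstd(X, axis=0) * \
        rng.standard_normal((len(coords), X.shape[1]))
    for epoch in range(n_epochs):
        frac = epoch / max(n_epochs - 1, 1)
        lr = 0.5 * (1.0 - frac) + 0.01                      # decaying step size
        sigma = max(grid_shape) / 2.0 * (1.0 - frac) + 0.5  # shrinking radius
        for i in rng.permutation(len(X)):
            m = known[i]
            if not m.any():
                continue  # nothing known in this row, skip it
            # Best-matching unit, using the known components only.
            bmu = np.argmin(((W[:, m] - X[i, m]) ** 2).sum(axis=1))
            h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1)
                       / (2.0 * sigma ** 2))
            # Update only the components that are known for this sample.
            W[:, m] += lr * h[:, None] * (X[i, m] - W[:, m])
    return W

def som_impute(X, W):
    """Fill missing entries with the values of the best-matching prototype."""
    known = ~np.isnan(X)
    X_filled = X.copy()
    for i in range(len(X)):
        m = known[i]
        if m.all() or not m.any():
            continue
        bmu = np.argmin(((W[:, m] - X[i, m]) ** 2).sum(axis=1))
        X_filled[i, ~m] = W[bmu, ~m]
    return X_filled

# Toy data (hypothetical): three correlated columns with some missing values.
rng = np.random.default_rng(1)
t = rng.uniform(0.0, 1.0, size=(60, 1))
X = np.hstack([t, 2.0 * t, 1.0 - t])
X[::7, 1] = np.nan
X_filled = som_impute(X, train_som(X))
```

Because the filled values come from a finite set of prototypes, the result is discrete, which is the limitation noted in the summary slide.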
EOF Projection
Based on Singular Value Decomposition (SVD)
Only q Singular Values and Vectors are used
  –q is smaller than K (the rank of X)
  –Larger singular values contain more signal than smaller ones

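Written out, the truncation reads as below. The symbols other than X, K and q are notation assumed here for illustration: the singular values are written as ρ_k and the left and right singular vectors as u_k and v_k.

```latex
X = U D V^{\top} = \sum_{k=1}^{K} \rho_k \, u_k v_k^{\top},
\qquad
\hat{X}_q = \sum_{k=1}^{q} \rho_k \, u_k v_k^{\top}, \quad q < K,
\qquad
\rho_1 \ge \rho_2 \ge \dots \ge \rho_K > 0 .
```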
EOF Projection (2)
SVD cannot deal with missing values
  –Initialization is crucial!
Decomposition with SVD and reconstruction
  –The q largest singular values and vectors are used in the reconstruction
  –Original (known) data is not modified; only the missing values are updated
  –q is selected using validation

EOF Projection (3)
[Figure: the example data table with missing entries marked "?", filled in round by round]
1. Initialization
2. Round 1
3. Round 2
4. Round 3
...
n. Done!

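The rounds listed above can be sketched as a loop of SVD, rank-q reconstruction, and replacement of the missing entries only. This is a minimal sketch under assumptions: the function name `eof_fill`, the stopping rule, the iteration cap, and the column-mean fallback initialization are illustrative choices, not details taken from the slides.

```python
import numpy as np

def eof_fill(X, q, X_init=None, n_iter=100, tol=1e-8):
    """Iteratively fill NaN entries of X with a rank-q SVD reconstruction."""
    missing = np.isnan(X)
    if X_init is None:
        # Fallback initialization (an assumption): column means of known data.
        X_init = np.broadcast_to(np.nanmean(X, axis=0), X.shape)
    # Step 1: initialization. Known values are kept, missing ones are guessed.
    filled = np.where(missing, X_init, X)
    for _ in range(n_iter):
        # Decompose the current filled matrix and keep the q largest components.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        recon = (U[:, :q] * s[:q]) @ Vt[:q]
        # Only the missing entries are replaced; original data stays untouched.
        updated = np.where(missing, recon, X)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:  # assumed convergence criterion
            break
    return filled

# Toy example (hypothetical data): a rank-1 matrix with one entry removed.
X_true = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
X = X_true.copy()
X[1, 2] = np.nan
X_filled = eof_fill(X, q=1)
print(X_filled[1, 2], "vs. true value", X_true[1, 2])
```

Passing a better starting point through `X_init`, for example values produced by another imputation method, replaces the column-mean guess.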
Global Methodology (2)
[Flowchart: Missing Data -> SOM -> EOF -> Data with filled values; labels on the diagram: SOM grid size, Number of EOF, EOF iteration]

ESTSP2007 Competition Data
[Figure: the ESTSP2007 competition time series, with segments labeled Learning and Validation]

Results, Regressor size 11
[Figure: results for EOF, SOM and SOM+EOF]

Results (2)
[Figure: results for EOF, SOM and SOM+EOF]

Prediction
[Figure-only slide]

NN3 Competition
Prediction of 111 time series
A single, automatic methodology for predicting all the series
Prediction of 18 values into the future for each series
All series are rather short, which makes the prediction tricky
The mean SMAPE over all series is evaluated in the competition

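For reference, SMAPE (symmetric mean absolute percentage error) can be computed as sketched below. The slides do not give the formula, so the variant used here (absolute error divided by the mean of the absolute actual and forecast values, in percent) is an assumption based on the commonly used definition; the function names and numbers are illustrative.

```python
import numpy as np

def smape(actual, forecast):
    """SMAPE in percent for one series (assumed common definition)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # Note: a pair where actual and forecast are both zero divides by zero.
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return 100.0 * np.mean(np.abs(actual - forecast) / denom)

def mean_smape(actuals, forecasts):
    """Average the per-series SMAPE over a collection of series."""
    return float(np.mean([smape(a, f) for a, f in zip(actuals, forecasts)]))

# Hypothetical numbers, for illustration only.
score = mean_smape([[100, 100], [50, 60]], [[110, 90], [50, 60]])
print(score)
```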
NN3: Long Series
[Figure: two panels, labeled Validation MSE = 0.1559 and Validation MSE = 0.0076]

NN3: Short Series
[Figure: one panel, labeled Validation MSE = 0.3493]

NN3: Validation Errors
[Figure-only slide]

Summary
Time Series Prediction can be viewed as a problem of Missing Values
The SOM+EOF methodology works well, better than either method alone
  –SOM projection is discrete
  –EOF needs a sufficiently good initialization
The methods complement each other

Further Work
Improvements to the methodology
  –The selection of singular values and vectors
  –Convergence criterion
  –How to guarantee quick convergence?
Applying the methodology to data sets from other fields
  –Climatology, finance, process data

Questions?