Time Series Prediction as a Problem of Missing Values: Application to ESTSP2007 and NN3 Competition Benchmarks


1 Time Series Prediction as a Problem of Missing Values: Application to ESTSP2007 and NN3 Competition Benchmarks
Antti Sorjamaa and Amaury Lendasse
Time Series Prediction and ChemoInformatics Group
Adaptive Informatics Research Centre
Helsinki University of Technology

2 Outline
Time Series Prediction vs. Missing Values
Global methodology
–Self-Organizing Maps (SOM)
–Empirical Orthogonal Functions (EOF)
Results
Antti Sorjamaa - TSPCi - AIRC - HUT 2/22

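3 Missing Values
Example data matrix with missing entries marked "?" (one row per line):
19?11762
4137??3
7?081221
102?1??
12?3?56
?58??11
9672906
3?21?20
Time series arranged as a matrix of regressors, with the unknown future marked "?":
Time: 47 48 49 50 ? ? ? ?
42 43 44 45 46 47
43 44 45 46 47 48
44 45 46 47 48 49
45 46 47 48 49 50
46 47 48 49 50 ?
47 48 49 50 ? ?
48 49 50 ? ? ?
49 50 ? ? ? ?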

4 Time Series Prediction vs. Missing Values
Methods designed for finding Missing Values in temporally related databases
A time series is such a database
The unknown future can be considered as a set of missing values
→ The same methods can be applied
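
A minimal sketch of this idea, assuming a regressor width of 6 and a horizon of 4 as in the Missing Values example. The function name `embed_with_future` and the use of `None` for unknown values are illustrative choices, not taken from the presentation.

```python
from typing import List, Optional


def embed_with_future(series: List[float], width: int,
                      horizon: int) -> List[List[Optional[float]]]:
    """Slide a window of `width` over the series extended by `horizon`
    unknown future values, which are represented as None."""
    extended: List[Optional[float]] = list(series) + [None] * horizon
    return [extended[i:i + width] for i in range(len(extended) - width + 1)]


# Known values 42..50, followed by four unknown future values.
rows = embed_with_future(list(range(42, 51)), width=6, horizon=4)
for row in rows:
    print(" ".join("?" if v is None else str(v) for v in row))
```

Any method that fills the `None` entries of the last rows is then, in effect, producing a forecast.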

5 Global Methodology
Based on two methods
–SOM
  Nonlinear projection / interpolation
  Topology preservation on a low-dimensional grid
–EOF
  Linear projection
  Projection to high-dimensional output space
  Needs initialization

6 SOM
(Figure with points labeled 1 to 6.)

7 SOM Interpolation
SOM learning is done with known data
Missing values are left out
Approach proposed by Cottrell and Letrémy (in Applied Stochastic Models and Data Analysis 2005)
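
The following is a rough sketch of SOM-based filling under stated assumptions, not a reproduction of the Cottrell and Letrémy procedure. It assumes a small 1-D grid, linearly decaying learning rate and neighborhood width, and NaN as the missing-value marker. The name `som_impute` and all parameter values are hypothetical. Distances and updates use only the observed components, and each missing entry is read from the best-matching prototype.

```python
import numpy as np


def som_impute(X, n_units=4, n_epochs=50, seed=0):
    """Fill NaNs in X using a 1-D SOM trained on observed components only."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    known = ~np.isnan(X)
    # Prototypes start close to the column means of the observed data.
    W = np.nanmean(X, axis=0) + 0.1 * rng.standard_normal((n_units, X.shape[1]))
    grid = np.arange(n_units)

    def bmu(x, m):
        # The distance ignores components that are missing in x.
        return int(np.argmin(((W[:, m] - x[m]) ** 2).sum(axis=1)))

    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        lr = 0.5 * (1.0 - frac) + 0.01
        sigma = max(n_units / 2.0 * (1.0 - frac), 0.5)
        for i in rng.permutation(len(X)):
            m = known[i]
            if not m.any():
                continue
            b = bmu(X[i], m)
            h = np.exp(-((grid - b) ** 2) / (2.0 * sigma ** 2))
            # Only observed components pull the prototypes.
            W[:, m] += lr * h[:, None] * (X[i, m] - W[:, m])

    filled = X.copy()
    for i in range(len(X)):
        m = known[i]
        b = bmu(X[i], m) if m.any() else 0
        filled[i, ~m] = W[b, ~m]  # known values are never overwritten
    return filled


X_demo = np.array([[0.0, 0.1, 0.2], [0.1, np.nan, 0.1], [0.2, 0.0, 0.1],
                   [10.0, 10.1, 9.9], [9.9, 10.0, np.nan], [10.1, 9.9, 10.0]])
print(som_impute(X_demo))
```

Because every filled value is copied from one of a finite set of prototypes, the result is discrete, which is the limitation the Summary slide attributes to the SOM projection.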

8 EOF Projection
Based on Singular Value Decomposition (SVD)
Only q Singular Values and Vectors are used
–q is smaller than K (the rank of X)
–Larger singular values contain more signal than smaller ones
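
As an illustration of keeping only q singular values and vectors, here is a small NumPy sketch. The helper name `truncated_reconstruction` and the example matrix are assumptions made for this example only.

```python
import numpy as np


def truncated_reconstruction(X, q):
    """Rebuild X from its q largest singular values and vectors."""
    U, s, Vt = np.linalg.svd(np.asarray(X, dtype=float), full_matrices=False)
    # np.linalg.svd returns singular values in descending order,
    # so the first q columns/rows are the q largest components.
    return (U[:, :q] * s[:q]) @ Vt[:q, :]


# A rank-2 example matrix: with q = 1 some signal is lost,
# with q = 2 (the rank) the matrix is recovered.
X = np.outer([1, 2, 3], [1, 2, 3, 4]) + np.outer([1, 0, -1], [0, 1, 0, -1])
for q in (1, 2):
    err = np.linalg.norm(X - truncated_reconstruction(X, q))
    print(f"q={q}: reconstruction error {err:.6f}")
```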

9 EOF Projection (2)
SVD cannot deal with missing values
–Initialization is crucial!
Decomposition with SVD and reconstruction
–q largest singular values and vectors are used in the reconstruction
–Original data is not modified!
–q is selected using validation

10 EOF Projection (3)
Example matrix rows at each stage ("?" marks a missing value):
Data        1. Initialization   2. Round 1   3. Round 2   4. Round 3
19?11762    19511762            19411762     19411762     19411762
4137??3     4137553             41376113     41379213     41379223
7?081221    75081221            72081221     74081221     75081221
102?1??     1025155             10231810     10211912     10211913
12?3?56     1253556             1253356      1253356      1243356
?58??11     5585511             7582511      9582511      10582511
9672906     9672906             9672906      9672906      9672906
3?21?20     3521520             3821120      3821120      3821120
... n. Done!
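
The rounds above can be sketched as an iterative rank-q SVD fill. This is a minimal sketch under assumptions, not the authors' implementation: NaN marks missing values, the default initialization is the column mean, and the stopping rule (largest change in a filled value below `tol`) is a hypothetical choice. The name `eof_fill` and its parameters are illustrative.

```python
import numpy as np


def eof_fill(X, q, n_iter=100, tol=1e-8, init=None):
    """Iteratively replace missing entries by their rank-q SVD reconstruction."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    if init is None:
        # Assumed default: start from the column means of the known values.
        filled = np.where(missing, np.nanmean(X, axis=0), X)
    else:
        filled = np.where(missing, np.asarray(init, dtype=float), X)
    if not missing.any():
        return filled
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        recon = (U[:, :q] * s[:q]) @ Vt[:q, :]
        change = np.max(np.abs(recon[missing] - filled[missing]))
        # Only the missing positions are updated; known data stays as it is.
        filled[missing] = recon[missing]
        if change < tol:
            break
    return filled


# Rank-1 example with two entries removed.
truth = np.outer([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0, 4.0])
data = truth.copy()
data[0, 0] = np.nan
data[3, 2] = np.nan
print(eof_fill(data, q=1))
```

The `init` argument accepts a matrix produced by another filling method, for example a SOM-based fill, in place of the column means.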

11 Global Methodology (2)
(Diagram: Missing Data, SOM, EOF, Data with filled values. Annotations: SOM grid size, Number of EOF, EOF iteration.)
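
The number of EOF (q) is selected using validation. One possible way to sketch this is to hide a few known entries, fill them for each candidate q, and keep the q with the lowest error on the hidden entries. The hold-out scheme, the names `eof_fill` and `select_q`, and all parameter values below are assumptions for illustration; the presentation does not specify the validation procedure.

```python
import numpy as np


def eof_fill(X, q, n_iter=200, tol=1e-8):
    """Compact iterative rank-q SVD fill; NaN marks missing entries."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    if not missing.any():
        return filled
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        recon = (U[:, :q] * s[:q]) @ Vt[:q, :]
        change = np.max(np.abs(recon[missing] - filled[missing]))
        filled[missing] = recon[missing]
        if change < tol:
            break
    return filled


def select_q(X, candidates, n_holdout=3, seed=0):
    """Pick q by the mean squared error on randomly hidden known entries."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    known = np.argwhere(~np.isnan(X))
    held = known[rng.choice(len(known), size=n_holdout, replace=False)]
    rows, cols = held[:, 0], held[:, 1]
    X_val = X.copy()
    X_val[rows, cols] = np.nan  # hide the validation entries
    errors = {}
    for q in candidates:
        filled = eof_fill(X_val, q)
        errors[q] = float(np.mean((filled[rows, cols] - X[rows, cols]) ** 2))
    best = min(errors, key=errors.get)
    return best, errors


truth = np.outer(np.arange(1.0, 7.0), np.arange(1.0, 6.0))
best_q, val_errors = select_q(truth, candidates=(1, 2, 3))
print(best_q, val_errors)
```

The SOM grid size could be chosen with the same kind of hold-out loop.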

12 ESTSP2007 Competition Data
(Figure: the competition series, with Learning and Validation parts marked.)

13 Results, Regressor size 11
(Figure comparing EOF, SOM and SOM+EOF.)

14 Results (2)
(Figure comparing EOF, SOM and SOM+EOF.)

15 Prediction

16 NN3 Competition
Prediction of 111 time series
Single, automatic methodology for predicting all the series
Prediction of 18 values into the future for each series
All series are rather short, which makes the prediction tricky
Mean SMAPE of all series evaluated in the competition
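
For reference, a small sketch of the mean SMAPE computation. SMAPE has several variants; this one assumes the symmetric form with the mean of the absolute actual and forecast values in the denominator, expressed in percent, and treats a zero denominator as zero error. The function names are illustrative.

```python
from typing import Iterable, Sequence, Tuple


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error of one series, in percent."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        # Assumed convention: a pair of zeros contributes no error.
        terms.append(0.0 if denom == 0 else abs(a - f) / denom)
    return 100.0 * sum(terms) / len(terms)


def mean_smape(pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average the per-series SMAPE over all (actual, forecast) pairs."""
    scores = [smape(a, f) for a, f in pairs]
    return sum(scores) / len(scores)


print(smape([100.0, 200.0], [110.0, 180.0]))
```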

17 NN3: Long Series
Validation MSE = 0.1559
Validation MSE = 0.0076

18 NN3: Short Series
Validation MSE = 0.3493

19 NN3: Validation Errors

20 Summary
Time Series Prediction can be viewed as a problem of Missing Values
SOM+EOF methodology works well, better than the individual methods alone
–SOM projection is discrete
–EOF needs sufficiently good initialization
→ The methods complement each other

21 Further Work
Improvements to the methodology
  The selection of singular values and vectors
  Convergence criterion
  How to guarantee quick convergence?
Applying the methodology to data sets from other fields
  Climatology, finance, process data

22 Questions?
Antti.Sorjamaa@hut.fi
Lendasse@cis.hut.fi
http://www.cis.hut.fi/projects/tsp

