
1 Forecasting Wavelet Transformed Time Series with Attentive Neural Networks
Yi Zhao¹, Yanyan Shen*¹, Yanmin Zhu¹, Junjie Yao² (¹Shanghai Jiao Tong University, ²East China Normal University). ICDM 2018.

2 Outline Motivation Preliminaries Model Experiments Conclusion

3 Motivation
Forecasting complex time series (e.g., stock prices, web traffic) demands both time-domain and frequency-domain information.
Various methods can extract the local time-frequency features that are important for predicting future values: Fourier Transform, Short-time Fourier Transform, Wavelet Transform.
Key idea: use the varying global trend to identify the most salient parts of the local time-frequency information, so as to better predict future values.

4 Preliminaries
Problem Statement: Given a time series $X = \{x_t \in \mathbb{R} \mid t = 1, 2, \dots, T\}$, predict the future value $x_{T+n}$ ($n \in \mathbb{N}^+$) at time $T+n$ via a function $f$: $x_{T+n} = f(X)$.
Wavelets: Given a basic wavelet function $h(\cdot)$, we can get the wavelets $h_{a,\tau}(t) = \frac{1}{\sqrt{a}}\, h\!\left(\frac{t-\tau}{a}\right)$.
Continuous Wavelet Transform (CWT): the CWT measures the "similarity" between the signal $x(t)$ and the basis function $h_{a,\tau}(\cdot)$: $CWT_x(\tau, a) = \frac{1}{\sqrt{a}} \int x(t)\, h^{*}\!\left(\frac{t-\tau}{a}\right) dt$.
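As a concrete illustration, here is a minimal NumPy sketch that transcribes the CWT integral above directly, using a Mexican-hat mother wavelet; the function names and the wavelet choice are our assumptions for illustration, not from the slides.

```python
import numpy as np

def mexican_hat(t):
    """Mexican-hat (Ricker) mother wavelet h(t), up to a normalizing constant."""
    return (1.0 - t**2) * np.exp(-(t**2) / 2.0)

def cwt_naive(x, scales, h=mexican_hat):
    """Direct discretization of CWT_x(tau, a) = (1/sqrt(a)) * int x(t) h((t-tau)/a) dt,
    for a real-valued mother wavelet (so the complex conjugate is a no-op)."""
    T = len(x)
    t = np.arange(T)
    coeffs = np.zeros((len(scales), T))
    for i, a in enumerate(scales):
        for tau in range(T):
            coeffs[i, tau] = np.sum(x * h((t - tau) / a)) / np.sqrt(a)
    return coeffs  # rows index scales a, columns index translations tau
```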

5 Model Overview
Pipeline: 1. Input time series → 2. Scalogram → 3. CNN feature extraction → 4. Attention module → 5. Fusion & prediction (output $x_{T+n}$).
Preprocessing: Given the input time series $X$, we denote by $CWT_x(\tau, a)$ the matrix of wavelet transform coefficients. The scalogram $X_s$ is defined as $X_s = \|CWT_x(\tau, a)\|^2$.
Source: Wavelet Tutorial by Robi Polikar.
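In practice the scalogram can be computed with an off-the-shelf CWT implementation. The sketch below uses PyWavelets with a Morlet wavelet and an illustrative scale range; both choices are our assumptions, since the slides do not specify them.

```python
import numpy as np
import pywt

# hypothetical toy series standing in for an input window of the time series
x = np.sin(0.2 * np.arange(256)) + 0.1 * np.random.randn(256)

scales = np.arange(1, 65)                    # illustrative scale range for a
coeffs, freqs = pywt.cwt(x, scales, 'morl')  # CWT coefficient matrix over (a, tau)
X_s = np.abs(coeffs) ** 2                    # scalogram X_s = ||CWT_x(tau, a)||^2
# X_s has shape (len(scales), len(x)) and is fed to the CNN as a 2-D "image"
```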

6 Model
[Architecture diagram: an LSTM runs over the input series $x_1, \dots, x_T$, producing hidden states $h_1, \dots, h_T$; a VGG-style CNN outputs $C$ local feature vectors; the AttentionNet scores them with weights $\alpha_1, \dots, \alpha_C$; the fusion & prediction module combines the attended features with $h_T$ to output $x_{T+n}$.]

7 Model
CNN: extract local time-frequency features. Feed the scalogram $X_s$ to a stack of convolution layers: $X_s^{(l)} = \phi(W_s^{(l)} * X_s^{(l-1)} + b_s^{(l)})$, with $X_s^{(0)} = X_s$.
LSTM: learn the global long-term trend and take the hidden state $h_T$ at the last step.
Attention module: discriminate the importance of local features dynamically. Given the time-frequency features $X_s^{(L)} = \{x_i \mid i \in [1, C]\}$ and $h_T$:
Attention score: $e_i = f_{att}(x_i, h_T) = w^{\top} \phi(W_a [x_i; h_T] + b_a) + b$, normalized as $\alpha_i = \frac{\exp(e_i)}{\sum_{k=1}^{C} \exp(e_k)}$.
Weighted sum of local time-frequency features: $z = \sum_{i=1}^{C} \alpha_i x_i$.
Fusion & Prediction: combine local and global features for prediction: $\hat{x}_{T+n} = w_p^{\top} f_p([z; h_T]) + b_p$.
Objective Function (squared loss): $L_{loss} = \sum_{i=1}^{N} (x_{t+n}^{i} - \hat{x}_{t+n}^{i})^2 + \lambda \|W\|^2$.
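A minimal PyTorch sketch of the attention and fusion equations above; the dimensions, the choice of tanh for $\phi$ and $f_p$, and all layer names are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class AttentiveFusion(nn.Module):
    """Sketch of the attention + fusion step, assuming C local feature vectors
    x_i (dim d_x) from the CNN and the last LSTM hidden state h_T (dim d_h)."""
    def __init__(self, d_x, d_h, d_att=64):
        super().__init__()
        self.W_a = nn.Linear(d_x + d_h, d_att)  # W_a [x_i; h_T] + b_a
        self.w = nn.Linear(d_att, 1)            # w^T phi(...) + b
        self.f_p = nn.Linear(d_x + d_h, d_att)  # fusion layer f_p
        self.w_p = nn.Linear(d_att, 1)          # w_p^T f_p([z; h_T]) + b_p

    def forward(self, X_local, h_T):
        # X_local: (B, C, d_x); h_T: (B, d_h)
        B, C, _ = X_local.shape
        h_rep = h_T.unsqueeze(1).expand(B, C, -1)
        e = self.w(torch.tanh(self.W_a(torch.cat([X_local, h_rep], dim=-1))))
        alpha = torch.softmax(e, dim=1)          # attention weights alpha_i over C
        z = (alpha * X_local).sum(dim=1)         # z = sum_i alpha_i x_i
        fused = torch.tanh(self.f_p(torch.cat([z, h_T], dim=-1)))
        return self.w_p(fused).squeeze(-1)       # predicted x_{T+n}, shape (B,)
```

Training would then minimize the squared loss between the returned prediction and $x_{t+n}$, plus an L2 penalty on the weights, matching the objective above.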

8 Datasets
Stock opening prices: collected from Yahoo! Finance. Daily opening prices of 50 stocks across 10 sectors from 2007 to 2016; each stock has 2,518 daily opening prices. Prices from 2007 to 2014 are used as training data, and those in 2015 and 2016 are used for validation and testing, respectively.
Power consumption: electric power consumption of one household over 4 years, sampled at a one-minute rate; 475,023 data points in 2010.
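A sketch of the year-based split described above, assuming one stock's prices live in a pandas DataFrame with a DatetimeIndex and an 'open' column; both the layout and the file name are hypothetical.

```python
import pandas as pd

# hypothetical layout: daily opening prices, DatetimeIndex + 'open' column
prices = pd.read_csv("stock_opening_prices.csv", index_col=0, parse_dates=True)

train = prices.loc["2007":"2014", "open"]  # training: 2007-2014
val = prices.loc["2015", "open"]           # validation: 2015
test = prices.loc["2016", "open"]          # testing: 2016
```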

9 Main Results
Metric: Mean Squared Error: $MSE = \frac{1}{N} \sum_{i=1}^{N} (x_{t+n}^{i} - \hat{x}_{t+n}^{i})^2$.
Baselines:
Naïve: take the last value in the series as the predicted value.
Ensemble of LSTM & CNN: feed the concatenation of the VGGNet features and the last hidden state of the LSTM directly into the fusion and prediction module.
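For reference, a minimal sketch of the evaluation metric and the naïve baseline as defined above (function names are ours):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: (1/N) * sum_i (x_i - x_hat_i)^2."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

def naive_forecast(x):
    """Naive baseline: predict x_{T+n} with the last observed value x_T."""
    return x[-1]
```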

10 Case Study
Illustration of the attention mechanism: given an input of 20 stock prices, we show the scalogram and the attention weights. The model attends to the local features that are similar to the global trend, which helps in predicting the future value.

11 Conclusion
Wavelet transform can explicitly disclose the latent components at different frequencies in a complex time series.
We develop a novel attention-based neural network that leverages a CNN to extract local time-frequency features while applying an LSTM to capture the long-term global trend.
Experimental results on two real-life datasets verify the usefulness of the time-frequency information in wavelet-transformed time series and the effectiveness of our method in terms of prediction accuracy.

12 Thank You! Q&A

