Published by Aleesha Craig. Modified over 8 years ago.
1
Intelligent Database Systems Lab
國立雲林科技大學 National Yunlin University of Science and Technology
Statistical Models for Time Sequences Data Mining
Advisor: Dr. Hsu
Graduate: Chien-Shing Chen
Authors: Jessica K. Ting, Michael K. Ng, Hongqiang Rong, Joshua Z. Huang
2
Outline
- Motivation
- Objective
- Introduction
- Autoregression, Autocorrelation, Autocovariance
- Non-adaptive and Adaptive Statistical Models
- Experimental Results
- Conclusions
- Personal Opinion
- Review
N.Y.U.S.T. I.M.
3
Motivation
Methods such as the DFT (discrete Fourier transform) can only handle "whole matching". Similarity between two time sequences is usually defined by the similarity of their curve shapes, but comparing time sequences visually is too difficult.
4
Objective
We can apply a clustering algorithm to the AR model coefficients to cluster time sequences. We can also use the AR models to predict near-future values. The coefficients of these AR models can be used as features to index subsequences, which facilitates queries for subsequences with similar behavior.
5
1-1. Introduction
Similarity between two time sequences is usually defined by the similarity of their curve shapes. Our approach is to define sliding windows (of different window sizes) over a time sequence and build autoregression models from the subsequences in the different windows.
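The sliding-window step could be sketched roughly as follows. This is only an illustration: the function name `sliding_windows`, the `step` parameter, and the sample values are assumptions, not taken from the slides.

```python
def sliding_windows(sequence, window_size, step=1):
    """Yield (start_index, subsequence) pairs for one window size."""
    for start in range(0, len(sequence) - window_size + 1, step):
        yield start, sequence[start:start + window_size]


# Hypothetical toy sequence, windowed with two different window sizes.
series = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]
windows = {
    size: [sub for _, sub in sliding_windows(series, size)]
    for size in (3, 4)
}
```

Each subsequence in `windows` would then be the input to one autoregression fit.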
6
1-2. Introduction
7
2-1. Autoregression Models
$Y_t = \beta Y_{t-1} + \varepsilon_t$
8
2-2. Autoregression Models
Simple regression: $Y_t = \alpha + \beta X_t + \varepsilon_t$
First-order autoregression: $Y_t = \beta Y_{t-1} + \varepsilon_t$
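As a small illustration of the first-order model $Y_t = \beta Y_{t-1} + \varepsilon_t$, the coefficient can be estimated by ordinary least squares, regressing each value on its predecessor. The function name `fit_ar1` and the sample data are assumptions made for this sketch; the slides do not state which estimator is used.

```python
def fit_ar1(y):
    """Least-squares estimate of beta in Y_t = beta * Y_{t-1} + e_t."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den


# Noise-free toy data: every value is exactly half of the previous one.
y = [8.0, 4.0, 2.0, 1.0, 0.5]
beta_hat = fit_ar1(y)
```

On this noise-free series the estimate recovers the factor 0.5 used to build the data.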
9
2-3. Autocovariance
10
2-4. No false dismissals
We need to show that this indexing method guarantees no "false dismissals". If the distance between the representations of x and y in the index space never exceeds the distance between x and y themselves, then the method can guarantee no "false dismissals".
11
2-5. Distance between two series
Let $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ be two data sequences with zero mean and 2-norm equal to 1. Here x and y must be of exactly the same length.
12
2-6. Distance between two series
Let $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_m)$ be two data sequences with zero mean and 2-norm equal to 1. Here $m \neq n$, and we assume $m \geq n$. Let $V_i = (y_i, \ldots, y_{i+n-1})$ for $1 \leq i \leq m-n+1$.
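One plausible way to use the windows $V_i$ is sketched below: slide the shorter sequence x along y and keep the smallest Euclidean distance. Taking the minimum over the windows is an assumption of this sketch, since the slide text does not say how the $V_i$ are combined, and the windows are not renormalized here. The names `euclidean` and `sliding_distance` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean (2-norm) distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def sliding_distance(x, y):
    """Smallest distance between x and any length-n window V_i of y.

    Assumption: the minimum over all windows is used as the distance.
    """
    n, m = len(x), len(y)
    if m < n:
        raise ValueError("expected len(y) >= len(x)")
    # Windows V_1 .. V_{m-n+1}, written with 0-based indices.
    return min(euclidean(x, y[i:i + n]) for i in range(m - n + 1))


x = [0.0, 1.0, 0.0]
y = [5.0, 0.0, 1.0, 0.0, 5.0]
d = sliding_distance(x, y)  # x occurs exactly inside y
```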
13
2-7. Versus
14
3-1. Indexing the time sequence
We start by indexing the time sequence $X = (x_1, x_2, \ldots, x_n)$. We rescale it so that it has zero mean and a 2-norm equal to 1. Then we fit AR models to the rescaled sequence, from the first order to higher orders, until the decreasing rate of the modelling error is less than a specified tolerance.
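The rescaling and order-selection steps might look like the sketch below. Several details are assumptions rather than facts from the slides: the AR models are fitted by least squares with NumPy, the "decreasing rate" is read as the relative drop in the residual sum of squares, and the names `normalize`, `ar_error`, `select_order`, `tol`, and `max_order` are hypothetical.

```python
import numpy as np


def normalize(x):
    """Shift to zero mean, then scale to unit 2-norm."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    norm = np.linalg.norm(x)
    if norm == 0:
        raise ValueError("a constant sequence cannot be rescaled")
    return x / norm


def ar_error(x, p):
    """Least-squares AR(p) fit: return (residual sum of squares, coefficients)."""
    # Column k holds the lag-k values aligned with the targets x[p:].
    lags = np.column_stack([x[p - k:len(x) - k] for k in range(1, p + 1)])
    target = x[p:]
    coef, *_ = np.linalg.lstsq(lags, target, rcond=None)
    return float(np.sum((target - lags @ coef) ** 2)), coef


def select_order(x, tol=0.05, max_order=10):
    """Raise the order until the error drops by less than tol (relative)."""
    err, coef = ar_error(x, 1)
    order = 1
    for p in range(2, max_order + 1):
        new_err, new_coef = ar_error(x, p)
        if err <= 0 or (err - new_err) / err < tol:
            break
        order, err, coef = p, new_err, new_coef
    return order, coef


# Hypothetical test series: a simulated second-order autoregression.
rng = np.random.default_rng(0)
noise = rng.normal(size=300)
raw = np.zeros(300)
for t in range(2, 300):
    raw[t] = 0.6 * raw[t - 1] - 0.3 * raw[t - 2] + noise[t]

x = normalize(raw)
order, coef = select_order(x)
```

The returned `coef` vector is the kind of short feature vector that could serve as an index entry for the sequence.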
16
3-2. Non-adaptive Statistical Model
17
3-3. Adaptive Statistical Model
The adaptive statistical model is a modification of the non-adaptive model.
18
3-4. Non-adaptive Statistical Model
19
3-5. Adaptive Statistical Model
20
3-6. Concept
From this example, we find that extracting features from subsequences with the adaptive statistical model is a good idea, since it lets us notice changes of the model across the whole time sequence. The idea of the adaptive statistical model is easy to understand and is similar to that of the non-adaptive model.
21
4-1. AR models to predict near-future values
- Non-adaptive Statistical Model
- Adaptive Statistical Model
22
4-2. Prediction
If the order of the AR model is the same and the distance between the features is not large (within a tolerance), then we can still use the AR model and the autocovariance function for the new data subsequence, and we continue adding data points to test the AR model.
$Y[1:w] = (y_1, y_2, y_3, \ldots, y_w)$
$Y[1:w+d] = (y_1, y_2, y_3, \ldots, y_w, y_{w+1}, y_{w+2}, \ldots, y_{w+d})$
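A rough sketch of this check and of the prediction step is given below. It simplifies the slide in ways that should be read as assumptions: the order `p` is held fixed rather than re-selected, the AR coefficients alone serve as the features, the fit is ordinary least squares, and the names `fit_ar`, `model_still_valid`, and `forecast` are hypothetical.

```python
import numpy as np


def fit_ar(y, p):
    """Least-squares AR(p) coefficients, lag 1 first."""
    y = np.asarray(y, dtype=float)
    lags = np.column_stack([y[p - k:len(y) - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(lags, y[p:], rcond=None)
    return coef


def model_still_valid(y, w, d, p, tol):
    """Compare the AR(p) features of Y[1:w] with those of Y[1:w+d]."""
    old = fit_ar(y[:w], p)
    new = fit_ar(y[:w + d], p)
    return bool(np.linalg.norm(new - old) <= tol)


def forecast(y, coef, steps):
    """Iterate the AR recursion to predict the next `steps` values."""
    history = [float(v) for v in y[-len(coef):]]  # oldest value first
    predictions = []
    for _ in range(steps):
        # coef[0] multiplies the most recent value, so reverse the history.
        nxt = float(np.dot(coef, history[::-1]))
        predictions.append(nxt)
        history = history[1:] + [nxt]
    return predictions


# Noise-free toy data: each value is half of the previous one.
y = [16.0, 8.0, 4.0, 2.0, 1.0, 0.5]
coef = fit_ar(y, 1)
keep_model = model_still_valid(y, w=4, d=2, p=1, tol=1e-6)
predictions = forecast(y, coef, steps=2)
```

When `model_still_valid` returns False, the subsequence would be treated as the start of a new model rather than an extension of the old one.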
25
5-1. Experimental Results
1. Statistical Models versus Fourier Transforms
2. Adaptive versus Non-adaptive Models
26
5-1. Statistical Models versus Fourier Transforms
We compare the performance of the two methodologies in clustering:
1. Generate several sets of time sequences of known classes.
2. Calculate the autocovariance function values and the Fourier coefficients for each time sequence.
3. Use the autocovariance function values and the magnitudes of the Fourier coefficients as feature vectors for classification with a clustering algorithm.
4. Compare the clustering results with the originally known classes, and calculate the classification accuracy.
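Step 2 of the procedure above, computing the two kinds of feature vectors, could be sketched as follows. The biased sample autocovariance (dividing by n) and the function names are assumptions of this sketch; the clustering algorithm itself is left out because the slides do not name one.

```python
import numpy as np


def autocovariance(x, max_lag):
    """Sample autocovariance at lags 0..max_lag (biased: divides by n)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])


def fourier_magnitudes(x, count):
    """Magnitudes of the first `count` DFT coefficients."""
    return np.abs(np.fft.fft(np.asarray(x, dtype=float)))[:count]


# Hypothetical toy sequence that alternates between +1 and -1.
x = [1.0, -1.0, 1.0, -1.0]
acov_features = autocovariance(x, max_lag=2)
dft_features = fourier_magnitudes(x, count=3)
```

Either feature vector can then be passed to the same clustering routine, so that only the choice of features differs between the two methods.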
27
5-2. Statistical Models versus Fourier Transforms
In real applications, many time sequences look like cosine (or sinusoidal) curves. Such a sequence is described by:
- M, the number of cosine curves.
- The adjusted frequency component of each cosine curve.
- A_i, the associated amplitude of each frequency component.
- A noise function.
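A generator for such test sequences might look like the sketch below. The exact formula is not preserved in the slide text, so the form used here, a sum of M cosines $A_i \cos(2\pi f_i t)$ plus Gaussian noise, is an assumption, as are the function name `cosine_mixture` and its parameters. The adjusted frequencies are taken as given inputs.

```python
import math
import random


def cosine_mixture(length, amplitudes, frequencies, noise_std=0.0, seed=0):
    """Sum of cosine curves plus Gaussian noise (assumed form)."""
    rng = random.Random(seed)
    series = []
    for t in range(length):
        value = sum(
            a * math.cos(2 * math.pi * f * t)
            for a, f in zip(amplitudes, frequencies)
        )
        series.append(value + rng.gauss(0.0, noise_std))
    return series


# One noise-free cosine with amplitude 2 and a period of four samples.
clean = cosine_mixture(4, amplitudes=[2.0], frequencies=[0.25])
# Two components (M = 2) with a little noise.
noisy = cosine_mixture(100, [1.0, 0.5], [0.05, 0.12], noise_std=0.1)
```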
28
5-3. Adjusted frequency
The adjusted frequency component is computed from the frequency perturbation level and the frequency component.
29
5-4. Result
30
6-1. Prediction by Adaptive versus Non-adaptive Models
31
6-2. Experimental data
The following time sequences are all of length 750, which is approximately three years of trading days:
- Stock prices of Guangdong Investment Ltd.
- Stock prices of Great Eagle Holdings Ltd.
- Stock prices of Wheelock Co. Ltd.
32
From the 750 data points, predict the next five data points, 751, …, 755.
34
6-3. Explanation
Great changes occur at points 74, 354, 501, and 741.
37
non real adaptive
38
7. Conclusions
1. Computational efficiency: the autocovariance functions and AR models are cheap to calculate and can handle very large data volumes.
2. Prediction capability.
3. Short indices.
39
8. Personal Opinion
This method can be used in our lab, e.g. for classification, clustering, …
40
9. Review
- Time sequences
- Subsequences
- AR model: autocovariance, autocorrelation
- Non-adaptive and adaptive statistical models
- Prediction
- Statistical vs. DFT
- Adaptive vs. non-adaptive