Spatial and Temporal Databases
Efficient Time Series Matching by Wavelets (ICDE 1999)
Kin-pong Chan and Ada Wai-chee Fu
Table of Contents
Introduction
Related Work
The Proposed Approach
Overall Strategy
Performance Evaluation
Conclusion
Introduction
A time series is a sequence of real numbers, each representing a value at a time point (e.g., financial data or scientific observation data).
Time-series databases that support fast data retrieval and similarity queries are therefore desirable.
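Introduction (cont.)
Similarity search finds data sequences that differ only slightly from a given query sequence.
Example: one may want to find all companies whose stock prices fluctuated similarly to IBM's over a year.
Similarity matching process: given a query sequence $Q = (q_1, \dots, q_n)$ and a tolerance $\varepsilon$, compute $D(Q, X) = \sqrt{\sum_{i=1}^{n} (q_i - x_i)^2}$ for each data sequence $X$ and return those with $D(Q, X) \le \varepsilon$.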
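Without an index, the matching process above is a full scan that computes the Euclidean distance to every stored sequence. A minimal sketch, assuming a hypothetical in-memory dictionary `database` of equal-length sequences (the ids and values are made up for illustration):

```python
import math


def euclidean(q, x):
    """Euclidean distance between two equal-length sequences."""
    if len(q) != len(x):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, x)))


def range_query(query, database, eps):
    """Return the ids of all sequences within distance eps of the query.

    `database` is a hypothetical dict mapping a sequence id to its values.
    This naive scan touches every sequence; indexing aims to avoid that.
    """
    return [sid for sid, seq in database.items() if euclidean(query, seq) <= eps]


db = {"s1": [9, 7, 3, 5], "s2": [8, 8, 4, 4], "s3": [1, 2, 9, 9]}
print(range_query([9, 7, 3, 5], db, eps=3.0))  # ['s1', 's2']
```

The cost is one full-length distance per stored sequence, which is what dimensionality reduction and indexing try to cut down.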
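Introduction (cont.)
Indexing: dimensionality reduction. A transformation is applied to map each sequence to a lower-dimensional feature vector that can be indexed.
Completeness: the index must not miss any qualifying sequence (no false dismissals).
Nature of data: how well a particular transformation concentrates a sequence's energy into a few coefficients depends on the nature of the time series.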
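Related Work
Discrete Fourier Transform (Agrawal et al.): by Parseval's theorem, the DFT preserves Euclidean distance (up to a constant scale factor), so the distance computed on the first few coefficients never exceeds the true distance. The resulting F-index may raise false alarms but guarantees no false dismissals.
Disadvantage: the DFT misses the important feature of time localization.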
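The Parseval argument can be checked numerically. This sketch assumes a DFT scaled by 1/sqrt(n) (so distances are preserved exactly) and uses a plain O(n^2) implementation for illustration rather than an FFT; the two sequences are made up:

```python
import cmath
import math


def dft(x):
    """Orthonormal DFT (scaled by 1/sqrt(n)); O(n^2), for illustration only."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / math.sqrt(n)
        for f in range(n)
    ]


def dist(a, b):
    """Euclidean distance; works for real or complex sequences."""
    return math.sqrt(sum(abs(u - v) ** 2 for u, v in zip(a, b)))


q = [9, 7, 3, 5, 6, 2, 8, 4]
x = [8, 8, 4, 4, 5, 3, 7, 5]
Q, X = dft(q), dft(x)

full = dist(q, x)
print(round(full, 6), round(dist(Q, X), 6))  # equal up to rounding (Parseval)

for k in range(1, len(q) + 1):
    # keeping only the first k coefficients drops non-negative terms,
    # so the feature distance can only underestimate the true distance
    assert dist(Q[:k], X[:k]) <= full + 1e-9
```

Because the truncated distance is a lower bound, filtering with it can produce false alarms but never false dismissals.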
Related Work (cont.)
Singular Value Decomposition: decomposes a matrix $X$ of size $N \times M$ into $X = U \Sigma V^T$.
Restriction: this assumes $X$ is not updated. In practice $X$ may be updated daily or monthly, and each update requires recomputing the SVD over the whole matrix.
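The Proposed Approach: Similarity Model
A new similarity model, v-shift similarity, is defined for sequence matching: two sequences are treated as similar if they are close in Euclidean distance after one of them is shifted vertically.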
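A sketch of a shift-tolerant distance, assuming that v-shift similarity compares sequences after the best constant vertical offset is removed (the exact definition in the paper may differ). Minimizing $\sum_i (q_i - x_i - a)^2$ over the offset $a$ gives $a = \mathrm{mean}(q - x)$, so the result equals the distance between the two mean-centred sequences. `vshift_distance` is a hypothetical name:

```python
import math


def vshift_distance(q, x):
    """Euclidean distance after the best vertical shift a is applied to x.

    Minimizing sum((q_i - (x_i + a))**2) over a gives a = mean(q - x),
    which is the same as comparing the two mean-centred sequences.
    """
    if len(q) != len(x):
        raise ValueError("sequences must have equal length")
    diffs = [a - b for a, b in zip(q, x)]
    shift = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - shift) ** 2 for d in diffs))


q = [9, 7, 3, 5]
x = [19, 17, 13, 15]  # same shape as q, lifted by 10
print(vshift_distance(q, x))  # 0.0
```

Under plain Euclidean distance these two sequences are 20 apart; under the shift-tolerant distance they match exactly.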
Proposed Approach: Haar Wavelet
The Haar wavelet transform allows a good approximation of a sequence from a subset of its coefficients.
It is fast to compute and requires little storage.
With orthonormal scaling, it preserves Euclidean distance.
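Proposed Approach: Haar Wavelet (cont.)
Example of wavelet computation. Assume the original time sequence is f(x) = (9 7 3 5).

Resolution   Averages    Detail coefficients
4            (9 7 3 5)
2            (8 4)       (1 -1)
1            (6)         (2)

Each pair is replaced by its average and half its difference, e.g. 8 = (9+7)/2 and 1 = (9-7)/2. Going back up: 8 = 6+2, 4 = 6-2; 9 = 8+1, 7 = 8-1; 3 = 4+(-1), 5 = 4-(-1).
The Haar transform of f is therefore (6 2 1 -1).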
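The decomposition in the table can be written as a short loop. This sketch uses the same unnormalized averaging convention as the example; `haar_decompose` is a hypothetical name, and the input length is assumed to be a power of two:

```python
def haar_decompose(f):
    """Unnormalized Haar transform, as in the slide example.

    Each pass replaces adjacent pairs (a, b) by the average (a + b) / 2 and
    the detail coefficient (a - b) / 2. The output is [overall average,
    coarsest detail, ..., finest details]. The passes handle n/2 + n/4 + ...
    pairs, fewer than n in total, so the transform runs in O(n) time.
    """
    n = len(f)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    averages = list(f)
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # finer details go after coarser ones, so prepend this level
        details = [(a - b) / 2 for a, b in pairs] + details
        averages = [(a + b) / 2 for a, b in pairs]
    return averages + details


print(haar_decompose([9, 7, 3, 5]))  # [6.0, 2.0, 1.0, -1.0]
```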
Proposed Approach: Haar Wavelet (cont.)
Instead of storing all four coefficients 6, 2, 1 and -1, suppose we store only the first two, 6 and 2. The missing detail coefficients are taken as 0 during reconstruction:

Resolution   Averages    Detail coefficients
1            (6)         (2)
2            (8 4)       (0 0)
4            (8 8 4 4)

Original: (9 7 3 5); reconstructed: (8 8 4 4).
We can thus reduce the dimension of the data at the cost of some accuracy.
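The reconstruction step can be sketched by treating every dropped coefficient as zero, which reproduces both the exact inverse and the (8 8 4 4) approximation. `haar_reconstruct` is a hypothetical name, and it assumes the unnormalized convention from the example:

```python
def haar_reconstruct(coeffs, n):
    """Invert the unnormalized Haar transform from (possibly truncated) coefficients.

    coeffs holds [average, coarsest detail, ...]; coefficients beyond
    len(coeffs) are treated as 0, which is what dropping them means.
    n is the original length and must be a power of two.
    """
    if n == 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if not 1 <= len(coeffs) <= n:
        raise ValueError("need between 1 and n coefficients")
    c = list(coeffs) + [0.0] * (n - len(coeffs))
    values = [c[0]]
    pos = 1
    while len(values) < n:
        level = c[pos:pos + len(values)]
        pos += len(values)
        # each (average, detail) pair expands to (average + detail, average - detail)
        values = [v for a, d in zip(values, level) for v in (a + d, a - d)]
    return values


print(haar_reconstruct([6, 2, 1, -1], 4))  # [9, 7, 3, 5]
print(haar_reconstruct([6, 2], 4))         # [8.0, 8.0, 4.0, 4.0]
```

Keeping the coarse coefficients first means truncation always discards the finest detail, which is why the approximation keeps the overall shape of the sequence.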
Proposed Approach: DFT versus Haar (cont.)
Motivation for replacing the DFT with the DWT:
Pruning power: fewer false alarms occur with the DWT than with the DFT.
Complexity: the Haar transform takes O(n) time, versus O(n log n) for the Fast Fourier Transform.
Note: the DWT does not require massive index reorganization when the data are updated, which is a major drawback of SVD.
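Proposed Approach: Guarantee of No False Dismissal
No qualifying time sequence is rejected by the index, so there are no false dismissals. This holds whenever the distance computed on the retained coefficients never exceeds the true distance, $D(H_k(Q), H_k(X)) \le D(Q, X)$: any $X$ with $D(Q, X) \le \varepsilon$ then also satisfies $D(H_k(Q), H_k(X)) \le \varepsilon$.
The authors show that this property holds for the Haar wavelet, where $H_k(\cdot)$ denotes the first $k$ Haar coefficients.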
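The lower-bound property can be checked numerically. This sketch assumes the orthonormal scaling of the Haar transform (pairs map to (a+b)/sqrt(2) and (a-b)/sqrt(2)), which differs from the averaging version in the example only by a scale factor per level but makes the transform distance-preserving. The sequences and threshold are made up:

```python
import math


def haar_orthonormal(f):
    """Orthonormal Haar transform: pairs (a, b) -> ((a+b)/sqrt2, (a-b)/sqrt2).

    Output order is [overall coefficient, coarsest detail, ..., finest details].
    Length must be a power of two.
    """
    n = len(f)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    s = math.sqrt(2.0)
    averages, details = list(f), []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        details = [(a - b) / s for a, b in pairs] + details
        averages = [(a + b) / s for a, b in pairs]
    return averages + details


def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


q = [9, 7, 3, 5, 6, 2, 8, 4]
x = [8, 8, 4, 4, 5, 3, 7, 5]
Hq, Hx = haar_orthonormal(q), haar_orthonormal(x)

true_d = dist(q, x)
assert abs(dist(Hq, Hx) - true_d) < 1e-9  # the full transform preserves distance
for k in range(1, len(q) + 1):
    # a prefix of the coefficients drops non-negative terms: a lower bound
    assert dist(Hq[:k], Hx[:k]) <= true_d + 1e-9

eps = 3.0
if true_d <= eps:
    # a truly qualifying sequence also passes the k-coefficient filter
    assert dist(Hq[:2], Hx[:2]) <= eps
```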
The Overall Strategy
Pre-processing:
Similarity model selection: the user can select Euclidean distance or v-shift similarity.
The Haar wavelet transform is applied to each time series.
Index construction: an index structure such as an R-tree is built on the first few coefficients.
Query processing: range queries and nearest-neighbor queries.
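The strategy above follows the filter-and-refine pattern. A minimal sketch, assuming orthonormal Haar coefficients as features and swapping the R-tree for a hypothetical in-memory linear scan (`HaarIndex` is an illustrative name, not the paper's code). The range query filters on k coefficients and then refines on full sequences; the nearest-neighbor query visits candidates in order of their lower bound and stops once that bound exceeds the best true distance found:

```python
import math


def haar(f):
    """Orthonormal Haar transform (length must be a power of two)."""
    s = math.sqrt(2.0)
    averages, details = list(f), []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        details = [(a - b) / s for a, b in pairs] + details
        averages = [(a + b) / s for a, b in pairs]
    return averages + details


def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


class HaarIndex:
    """Hypothetical stand-in for an R-tree: stores the first k Haar
    coefficients of every sequence and scans them linearly."""

    def __init__(self, database, k):
        self.db = database
        self.k = k
        self.features = {sid: haar(seq)[:k] for sid, seq in database.items()}

    def range_query(self, query, eps):
        fq = haar(query)[:self.k]
        # filter: lower-bound distance on k coefficients (no false dismissals)
        candidates = [sid for sid, fx in self.features.items() if dist(fq, fx) <= eps]
        # refine: discard false alarms using the full sequences
        return [sid for sid in candidates if dist(query, self.db[sid]) <= eps]

    def nearest_neighbor(self, query):
        fq = haar(query)[:self.k]
        order = sorted(self.features, key=lambda sid: dist(fq, self.features[sid]))
        best_id, best_d = None, math.inf
        for sid in order:
            # every later bound is at least this one, so nothing closer remains
            if dist(fq, self.features[sid]) > best_d:
                break
            d = dist(query, self.db[sid])
            if d < best_d:
                best_id, best_d = sid, d
        return best_id, best_d


db = {"s1": [9, 7, 3, 5], "s2": [8, 8, 4, 4], "s3": [1, 2, 9, 9]}
index = HaarIndex(db, k=2)
print(index.range_query([9, 7, 3, 5], eps=3.0))  # ['s1', 's2']
print(index.nearest_neighbor([8, 8, 4, 5]))      # ('s2', 1.0)
```

A real implementation would replace the linear scan over `features` with an R-tree search; the filter and refine logic stays the same.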
Experimental Results
Experimental Results (cont.): Scalability Test
Conclusion
Efficient time-series matching through dimensionality reduction with the Haar wavelet transform.
Outperforms the DFT in pruning power, scalability, and complexity.