Published by Marianna Hall. Modified over 9 years ago.
1
University of Macau
Discovering Longest-lasting Correlation in Sequence Databases
Yuhong Li (yb27407@umac.mo)
Department of Computer and Information Science, University of Macau, Macau
2
LCS: Motivation
■ Typical Analysis Query
Find the most correlated stock to GOOG for every 3 months in 2008 - 2011?
It is hard to define a proper length.
[Figure: correlation scale running from perfect negative correlation through 0 (no correlation) to +1 (perfect positive correlation)]
3
LCS: Motivation
■ Longest-lasting Correlated Subsequences
4
Baseline Solution
[Figure: sequence database, with offsets os=0, os=1, …]
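The figure on this slide did not survive extraction, so the following is only a minimal sketch of what a brute-force baseline could look like. It assumes the query asks for the longest window, taken at the same offset and length in the query and in a database sequence, whose Pearson correlation reaches a threshold. The names `pearson`, `longest_correlated_window`, `theta`, and `min_len` are hypothetical and do not come from the talk.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def longest_correlated_window(query, database, theta, min_len=2):
    """Return (length, sequence_index, offset) of the longest aligned window
    whose correlation with the query is >= theta, or None if there is none.

    Assumption: windows are aligned, i.e. same offset and length on both sides.
    """
    best = None
    for idx, seq in enumerate(database):
        n = min(len(query), len(seq))
        # `offset` plays the role of os=0, os=1, ... on the slide.
        for offset in range(n - min_len + 1):
            # Try the longest window first and stop once it cannot beat `best`.
            for length in range(n - offset, min_len - 1, -1):
                if best is not None and length <= best[0]:
                    break
                window_q = query[offset:offset + length]
                window_s = seq[offset:offset + length]
                if pearson(window_q, window_s) >= theta:
                    best = (length, idx, offset)
                    break
    return best


if __name__ == "__main__":
    query = [1, 2, 3, 4, 5, 1]
    database = [[5, 4, 3, 2, 1, 0], [2, 4, 6, 8, 1, 7]]
    print(longest_correlated_window(query, database, theta=0.99))
```

Every candidate window costs time proportional to its length, and the number of candidates grows with the number of sequences, offsets, and lengths, which is the cost the rest of the talk tries to avoid.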
5
Challenges
… sequence 1, …, sequence n
6
Main Idea
■ Time Series are Long: Dimensionality Reduction
Thousands of dimensions. Dimensionality reduction obeys the upper bounding lemma.
■ Huge Search Space: Batch Pruning
Group similar subsequences (intra-object grouping, inter-object grouping).
■ Unpruned Subsequences: Further Refinement
[Figure: raw subsequences, dim = m, correlation computation costs O(m); PAA representation, dim = 3, correlation computation costs O(3)]
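To make the dimensionality-reduction step concrete, here is a minimal sketch of Piecewise Aggregate Approximation (PAA) together with one standard way an upper bound on correlation can be derived from it. It relies on two well-known facts: for z-normalized sequences of length m, the squared Euclidean distance equals 2m(1 - correlation), and the PAA distance scaled by sqrt(m/k) never exceeds the true distance. Whether this is exactly the lemma used in the talk is an assumption, and the function names are hypothetical.

```python
import math


def znorm(seq):
    """Z-normalize a sequence (population standard deviation)."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / n)
    if std == 0:
        raise ValueError("constant sequence cannot be z-normalized")
    return [(v - mean) / std for v in seq]


def paa(seq, segments):
    """PAA: the mean of each of `segments` equal-width chunks."""
    n = len(seq)
    if n % segments != 0:
        raise ValueError("this sketch assumes len(seq) is divisible by segments")
    width = n // segments
    return [sum(seq[i * width:(i + 1) * width]) / width for i in range(segments)]


def correlation_upper_bound(x, y, segments):
    """Upper bound on the Pearson correlation of x and y, using only
    `segments` PAA coefficients per sequence.

    With d^2 = 2m(1 - rho) and d^2 >= (m / k) * D^2, where D is the distance
    between the PAA vectors, it follows that rho <= 1 - D^2 / (2k).
    """
    px = paa(znorm(x), segments)
    py = paa(znorm(y), segments)
    d2 = sum((a - b) ** 2 for a, b in zip(px, py))
    return 1.0 - d2 / (2.0 * segments)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [6, 5, 4, 3, 2, 1]
    # The true correlation is -1; the bound is looser but still negative.
    print(correlation_upper_bound(x, y, segments=3))
```

If the bound is already below the query threshold, the pair can be discarded after comparing 3 coefficients instead of m raw values.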
7
LCS: Diamond Cover Index
■ Intra-object Grouping
Grouping similar subsequences in a sequence object.
[Figure: PAA feature space, minDist]
8
LCS: Diamond Cover Index
■ Inter-object Grouping
Exploiting similarity between sequence objects.
Grouping the diamond MBRs of different objects into higher-level MBRs (compact MBRs).
DCI is the collection of the compact MBRs.
Memory efficient. Offers good pruning ability.
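The slides do not spell out how the diamond MBRs are built, so the sketch below only illustrates the generic mechanism that MBR-based pruning relies on: enclose a group of feature points in a minimum bounding rectangle (MBR) and compute minDist, the smallest possible distance from a query point to anything inside that rectangle. The helper names `bounding_box` and `min_dist` are hypothetical.

```python
import math


def bounding_box(points):
    """Smallest axis-aligned box (low corner, high corner) enclosing the points."""
    dims = range(len(points[0]))
    low = [min(p[d] for p in points) for d in dims]
    high = [max(p[d] for p in points) for d in dims]
    return low, high


def min_dist(query, low, high):
    """Minimum Euclidean distance from `query` to the box [low, high]."""
    total = 0.0
    for q, lo, hi in zip(query, low, high):
        if q < lo:
            gap = lo - q
        elif q > hi:
            gap = q - hi
        else:
            gap = 0.0  # the query lies within the box along this dimension
        total += gap * gap
    return math.sqrt(total)


if __name__ == "__main__":
    # Three hypothetical 2-dimensional feature points forming one group.
    group = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0]]
    low, high = bounding_box(group)
    print(low, high)
    print(min_dist([3.0, 1.5], low, high))
```

Because minDist never exceeds the distance to any point inside the box, a whole group can be skipped with a single test whenever minDist is already too large for the query threshold.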
9
LCS: Subsequence Refinement
[Figure: minDist]
10
LCS: Experimental Evaluation
■ Programming Language: C++
Machine: Ubuntu 12.04, 4GB RAM
■ Datasets
RAND: Randomly generated sequences.
STOCK: 2187 quoted companies in NYSE from 2008 to 2012.
TAO: Sea surface temperatures, 28399 sequences of length 1008.
11
LCS: Experimental Evaluation
SOTA: state-of-the-art method in distance calculation.
SKIP: incremental correlation computation.
SOTA+DCI, SKIP+DCI: DCI versions of SOTA and SKIP, respectively.
[Figure: results on the Stock dataset and the TAO dataset]
At least one order of magnitude faster than the SOTA adaptation.
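The slide names SKIP only as "incremental correlation computation" and gives no details, so the following is not the SKIP method itself. It is a minimal sketch of the general idea: keep running sums so that extending a window by one point updates the Pearson correlation in constant time instead of rescanning the window. The class name `RunningCorrelation` is hypothetical.

```python
import math


class RunningCorrelation:
    """Pearson correlation of a growing window, maintained from running sums."""

    def __init__(self):
        self.n = 0
        self.sx = 0.0   # sum of x
        self.sy = 0.0   # sum of y
        self.sxx = 0.0  # sum of x * x
        self.syy = 0.0  # sum of y * y
        self.sxy = 0.0  # sum of x * y

    def push(self, x, y):
        """Extend the window by one aligned pair of values in O(1)."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def value(self):
        """Current correlation (0.0 if either side is constant or empty)."""
        n = self.n
        cov = n * self.sxy - self.sx * self.sy
        var_x = n * self.sxx - self.sx * self.sx
        var_y = n * self.syy - self.sy * self.sy
        if var_x <= 0 or var_y <= 0:
            return 0.0
        return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rc = RunningCorrelation()
    for x, y in [(1, 2), (2, 4), (3, 6), (4, 0)]:
        rc.push(x, y)
        print(rc.n, rc.value())
```

Sums of squares can lose precision on long or large-valued sequences, so a production version would likely need a numerically safer update; that concern is outside what the slides cover.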
12
Thanks
Q&A