Published by Bella Griggs. Modified over 10 years ago.
1
Online Event-driven Subsequence Matching over Financial Data Streams
Huanmei Wu, Betty Salzberg, Donghui Zhang
Northeastern University, College of Computer & Information Science
Presented by: Evangelos Kanoulas
2
NU CCIS SIGMOD 2004
Motivation (1)
Given an incoming stream of stock market data, analyze it to perform:
  Trend prediction
  Pattern recognition
  Dynamic clustering of multiple data streams
  Rule discovery
Subsequence matching is the main component
3
Motivation (2)
Subsequence similarity over financial data streams has unique properties:
  Zigzag shape of the piecewise linear representation (PLR)
  The relative position of end points is important
  Price change (amplitude) is more important than time interval
[Figure: price vs. time plots of example subsequences S1, S2, and S3, with end points numbered 1 to 5]
4
Outline
1. Motivation
2. Data Stream Processing
3. Subsequence Matching
4. Trend Prediction
5. Performance
6. Conclusion
5
Data Stream Processing (1)
Aggregation and Smoothing
  Incoming data may arrive at any time
  The piecewise linear representation requires a unique value for each time interval
  Aggregation of the raw data
  Smoothing of the aggregated values using a moving average
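The aggregation and smoothing steps can be sketched as follows. The slides do not say which aggregate or which moving average is used, so this is only an illustration under stated assumptions: raw ticks are averaged within fixed-width time buckets, intervals without ticks are skipped, and smoothing is a simple (unweighted) moving average. The names `aggregate`, `moving_average`, `interval`, and `window` are hypothetical.

```python
def aggregate(ticks, interval):
    """Collapse raw (timestamp, price) ticks into one value per time bucket.

    Assumption: the bucket value is the mean of the prices that fall in it.
    Buckets that receive no ticks are simply absent from the output.
    """
    buckets = {}
    for t, price in ticks:
        key = int(t // interval)
        total, count = buckets.get(key, (0.0, 0))
        buckets[key] = (total + price, count + 1)
    return [(key * interval, total / count)
            for key, (total, count) in sorted(buckets.items())]


def moving_average(values, window):
    """Simple moving average; early points average over what is available."""
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


ticks = [(0.2, 10.0), (0.7, 12.0), (1.1, 14.0), (2.5, 16.0), (2.9, 18.0)]
aggregated = aggregate(ticks, interval=1)
smoothed = moving_average([price for _, price in aggregated], window=2)
```

Averaging within a bucket is one choice among several (last price or volume-weighted price would fit the slide equally well).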
6
Data Stream Processing (2)
Segmentation
  The PLR may not have a zigzag shape
  The end points of the PLR should be the points at which the trend changes dramatically
  All other points are considered noise and should be eliminated
[Figure: aggregated data stream]
7
Data Stream Processing (3)
%b data stream: the base for linear segmentation
Why use %b (Bollinger Band Percent)?
1. %b is a widely used financial indicator
2. %b has a smoothed moving trend similar to that of the aggregated data stream
3. %b is a normalized value; most values are between -1 and 2
   Uniform segmentation criteria
[Figure: aggregated data stream and %b data stream]
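The slides do not spell out how %b is computed. A minimal sketch using the conventional definition, %b = (price - lower band) / (upper band - lower band), with the bands placed k standard deviations around a simple moving average, might look like this. The function name `percent_b`, the defaults `window=20` and `k=2.0`, and the handling of a flat window are assumptions, not values taken from the presentation.

```python
import statistics


def percent_b(prices, window=20, k=2.0):
    """Bollinger %b for each price; None until a full window is available."""
    result = []
    for i in range(len(prices)):
        if i + 1 < window:
            result.append(None)
            continue
        recent = prices[i + 1 - window:i + 1]
        middle = statistics.fmean(recent)
        spread = statistics.pstdev(recent)  # population standard deviation
        if spread == 0:
            # Assumption: in a flat window the price sits on the middle band.
            result.append(0.5)
            continue
        lower = middle - k * spread
        upper = middle + k * spread
        result.append((prices[i] - lower) / (upper - lower))
    return result


values = percent_b([1.0, 2.0, 3.0, 4.0, 5.0], window=5)
```

A value of 0 means the price is on the lower band and 1 means it is on the upper band, which is why %b can serve as a normalized input to a single segmentation threshold.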
8
Data Stream Processing (4)
Segmentation over %b
[Figure: price (X) vs. time t, showing a sliding window over points 1 to 13 with the points P_i and P_j marked]
In the current sliding window, where $P_j(X_j, t_j)$ is the current point, $P_i(X_i, t_i)$ is an upper end point if:
  $X_i = \max$(X values of the current sliding window)
  $X_i > X_j + \delta$, where $\delta$ is the given error threshold
  $P_i(X_i, t_i)$ is the last point satisfying the above two conditions
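The three conditions for an upper end point can be checked directly. The sketch below assumes the window is a list of (X, t) pairs ordered oldest first with the current point last, and that the error threshold (called `delta` here) is non-negative; the function name is hypothetical. Lower end points are not covered by the slide and are left out.

```python
def find_upper_end_point(window, delta):
    """Return the index of the upper end point in `window`, or None.

    window: list of (x, t) pairs, oldest first; window[-1] is the current point.
    delta:  error threshold, assumed non-negative.
    """
    if len(window) < 2:
        return None
    x_current = window[-1][0]
    x_max = max(x for x, _ in window)
    # Condition 2: the maximum must exceed the current value by more than delta.
    if not x_max > x_current + delta:
        return None
    # Conditions 1 and 3: take the last point whose value equals the maximum.
    for i in range(len(window) - 1, -1, -1):
        if window[i][0] == x_max:
            return i
    return None


window = [(0.2, 1), (0.9, 2), (0.9, 3), (0.5, 4)]
found = find_upper_end_point(window, delta=0.3)
missed = find_upper_end_point(window, delta=0.5)
```

With `delta=0.3` the drop from 0.9 to 0.5 is large enough, and the later of the two tied maxima is reported; with `delta=0.5` no end point is declared yet.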
9
Data Stream Processing (5)
Two-Step Pruning
a. Filter step on %b streams
b. Refine step on the raw sequence stream to eliminate false positives
[Figure: aggregated stream (price) and %b stream over times t0 to t5, annotated with the thresholds δpb and δpd]
10
Outline
1. Motivation
2. Data Stream Processing
3. Subsequence Matching
4. Trend Prediction
5. Performance
6. Conclusion
11
Subsequence Similarity (1)
Event-driven subsequence matching
  Identifying a new potential end point triggers a subsequence matching search
  The search algorithm finds subsequences in the historical data that are similar to a query subsequence
  The query subsequence consists of the most recent n end points
[Figure: price vs. time t over t5 to t40, with end points numbered 1 to 4]
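The event-driven trigger can be illustrated with a small handler that runs whenever a new end point is identified. This is a sketch only: the slides do not describe the search procedure, so a brute-force scan over the stored end points stands in for it here, candidates are assumed not to overlap the query, and the similarity test is passed in as a caller-supplied predicate. All names (`on_new_end_point`, `is_similar`) are hypothetical.

```python
def on_new_end_point(end_points, new_point, n, is_similar):
    """Append a new end point and search history for matches to the query.

    end_points: list of (x, t) end points seen so far (mutated in place).
    n:          number of most recent end points forming the query.
    is_similar: predicate taking (query, candidate), both lists of n points.
    Returns the start indices of matching historical subsequences.
    """
    end_points.append(new_point)
    if len(end_points) < n:
        return []
    query = end_points[-n:]
    matches = []
    # Assumption: a candidate must end before the query begins.
    last_start = len(end_points) - 2 * n
    for start in range(0, last_start + 1):
        candidate = end_points[start:start + n]
        if is_similar(query, candidate):
            matches.append(start)
    return matches


def same_values(query, candidate):
    # Toy predicate for the example: identical X values.
    return [x for x, _ in query] == [x for x, _ in candidate]


history = [(1, 0), (3, 1), (2, 2), (5, 3), (1, 4), (3, 5)]
matches = on_new_end_point(history, (2, 6), n=3, is_similar=same_values)
```

Because the search only runs when an end point appears, no work is done while the stream moves without a trend change.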
12
Subsequence Similarity (2)
New similarity measure
$S = \{(X_1, t_1), (X_2, t_2), \ldots, (X_n, t_n)\}$
$S' = \{(X_1', t_1'), (X_2', t_2'), \ldots, (X_n', t_n')\}$
S and S' are similar if they satisfy the following two conditions:
  The relative position of the end points of S and S' is the same
  $d(S, S') < \varepsilon$, where
  $d(S, S') = \sum_{i=1}^{n-1} \left( \alpha \, \bigl| |X_{i+1} - X_i| - |X_{i+1}' - X_i'| \bigr| + \beta \, \bigl| (t_{i+1} - t_i) - (t_{i+1}' - t_i') \bigr| \right)$
  and $\alpha, \beta, \varepsilon \ge 0$ are user-defined parameters
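The distance in the second condition compares, segment by segment, the difference in amplitude and the difference in duration between the two subsequences. A minimal sketch follows; the parameter names `alpha` (amplitude weight), `beta` (time weight), and `epsilon` (threshold) are labels chosen for this example, and the function only covers the distance condition, not the relative-position condition.

```python
def distance(s, s_prime, alpha, beta):
    """Weighted sum of per-segment amplitude and duration differences.

    s, s_prime: equal-length lists of (x, t) end points.
    """
    if len(s) != len(s_prime):
        raise ValueError("subsequences must have the same number of end points")
    total = 0.0
    for (x0, t0), (x1, t1), (y0, u0), (y1, u1) in zip(s, s[1:], s_prime, s_prime[1:]):
        amplitude_diff = abs(abs(x1 - x0) - abs(y1 - y0))
        duration_diff = abs((t1 - t0) - (u1 - u0))
        total += alpha * amplitude_diff + beta * duration_diff
    return total


def within_threshold(s, s_prime, alpha, beta, epsilon):
    return distance(s, s_prime, alpha, beta) < epsilon


s = [(10, 0), (14, 2), (11, 5)]
s_prime = [(20, 0), (23, 3), (21, 5)]
d = distance(s, s_prime, alpha=1.0, beta=0.5)
```

Choosing `alpha` larger than `beta` reflects the earlier point that price change matters more than the time interval.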
13
Subsequence Similarity (3)
Subsequence Permutation
$S = \{(X_1, t_1), (X_2, t_2), \ldots, (X_n, t_n)\}$
Separate upper and lower points:
$S = \{ [(X_1, t_1), (X_3, t_3), \ldots, (X_{n-1}, t_{n-1})], [(X_2, t_2), (X_4, t_4), \ldots, (X_n, t_n)] \}$
Sort separately based on X values:
$S = \{ [(X_{i_1}, t_{i_1}), (X_{i_3}, t_{i_3}), \ldots, (X_{i_{n-1}}, t_{i_{n-1}})], [(X_{i_2}, t_{i_2}), (X_{i_4}, t_{i_4}), \ldots, (X_{i_n}, t_{i_n})] \}$
Get the subsequence permutation:
$\{i_1, i_3, \ldots, i_{n-1}, i_2, i_4, \ldots, i_n\}$
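The permutation can be computed by splitting the end points into the odd-numbered and even-numbered groups, sorting each group by X, and concatenating the resulting indices. The sketch below assumes ascending sort order and 1-based indices (the slide specifies neither the direction nor tie-breaking), and it treats two subsequences as having the same relative position when their permutations are equal. The function name is hypothetical.

```python
def subsequence_permutation(points):
    """Return the 1-based index permutation of alternating end points.

    points: list of (x, t) end points that alternate between the two groups
            (positions 1, 3, 5, ... and positions 2, 4, 6, ...).
    """
    odd_positions = range(0, len(points), 2)   # 1st, 3rd, 5th, ... points
    even_positions = range(1, len(points), 2)  # 2nd, 4th, 6th, ... points

    def by_x(i):
        return points[i][0]

    # sorted() is stable, so ties keep their original time order.
    return ([i + 1 for i in sorted(odd_positions, key=by_x)]
            + [i + 1 for i in sorted(even_positions, key=by_x)])


a = [(5, 0), (9, 1), (3, 2), (8, 3), (4, 4), (10, 5)]
b = [(50, 0), (95, 2), (20, 3), (70, 5), (45, 6), (99, 9)]
perm_a = subsequence_permutation(a)
perm_b = subsequence_permutation(b)
```

Here `a` and `b` differ in scale and timing but share the same ordering of highs and lows, so their permutations agree.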
14
Outline
1. Motivation
2. Data Stream Processing
3. Subsequence Matching
4. Trend Prediction
5. Performance
6. Conclusion
15
Trend Prediction
An application of subsequence matching
  Trend-k at a point p measures the change in price from p over the next k points
  Three trends: UP, DOWN, NOTREND
[Figure: price vs. time t over t5 to t40]
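One way to turn the Trend-k idea into a label is sketched below. The slide does not define how the change is measured or where the boundary between a trend and NOTREND lies, so two assumptions are made here: the change is the relative difference between the price k points after p and the price at p, and a hypothetical `threshold` parameter separates UP and DOWN from NOTREND.

```python
def trend_k(prices, p, k, threshold=0.01):
    """Classify the price movement from index p to index p + k.

    Assumption: relative change above +threshold is UP, below -threshold
    is DOWN, and anything in between is NOTREND.
    """
    if p + k >= len(prices):
        raise IndexError("not enough points after p to compute Trend-k")
    change = (prices[p + k] - prices[p]) / prices[p]
    if change > threshold:
        return "UP"
    if change < -threshold:
        return "DOWN"
    return "NOTREND"


prices = [100.0, 101.0, 103.0, 102.0, 99.5, 100.2]
labels = (trend_k(prices, 0, 2), trend_k(prices, 2, 2), trend_k(prices, 0, 5))
```

In a prediction setting, the trends observed after historical matches could be used to guess the trend after the current query, though the slides do not state how the matches are combined.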
16
Outline
1. Motivation
2. Data Stream Processing
3. Subsequence Matching
4. Trend Prediction
5. Performance
6. Conclusion
17
Performance (1)
Similarity measure
[Figure: bar chart of correctness % (axis from 30 to 70) for the similarity measures Perm+Amp, Amp Only, Perm Only, Perm+Euc, and Euc Only]
18
Performance (2)
Event-driven vs. fixed time periods
[Figure: correctness % (axis from 30 to 70) for Event-driven, FT1, FT5, FT10, FT15, FT20, FT25, and FT30]
[Figure: relative CPU cost (axis from 0% to 100%) for Event-driven, FT1, FT5, FT10, FT15, FT20, FT25, and FT30]
19
Outline
1. Motivation
2. Data Stream Processing
3. Subsequence Similarity
4. Trend Prediction
5. Performance
6. Conclusion
20
Conclusion
  Proposed an online segmentation and pruning algorithm
  Defined an alternative subsequence similarity measure
  Introduced an event-driven online similarity matching algorithm
  Achieved 70% correct predictions using real-world data