Presentation on theme: "Time Series Filtering" — Presentation transcript:

1 Time Series Filtering
Given a time series T, a set of candidates C, and a distance threshold r, find all subsequences in T that are within r distance to any of the candidates in C.
[Figure: a long time series with the matching subsequences highlighted, alongside a grid of twelve candidate sequences.]

2 Filtering vs. Querying
Querying: a single query (template) is compared against a database, and the best match is returned.
Filtering: a whole set of queries is compared against the database, and all matches within the distance threshold are returned.
[Figure: left, one query against a database yielding the best match; right, a grid of queries against the database yielding all matches.]

3 Euclidean Distance Metric
Given two time series Q = q1…qn and C = c1…cn, their Euclidean distance is defined as:
D(Q, C) = sqrt( (q1 − c1)² + (q2 − c2)² + … + (qn − cn)² )
[Figure: Q and C plotted over 100 data points, with the pointwise differences between them indicated.]
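Purely as an illustration (not part of the slides), a minimal Python sketch of this distance; the function name is ours:

import math

def euclidean_distance(q, c):
    """Euclidean distance between two equal-length time series."""
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))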

4 Early Abandon
During the computation, if the current sum of squared differences between the corresponding data points exceeds r², we can safely stop the calculation: the final distance can only be larger. (A sketch follows below.)
[Figure: Q and C plotted over 100 data points; the calculation is abandoned partway through.]
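Continuing the illustrative sketch above (same assumptions), an early abandoning version of the distance:

def early_abandon_distance(q, c, r):
    """Euclidean distance between q and c, or None as soon as the running
    sum of squared differences shows the distance must exceed r."""
    threshold_sq = r * r
    total = 0.0
    for qi, ci in zip(q, c):
        total += (qi - ci) ** 2
        if total > threshold_sq:
            return None  # abandoned: the final distance can only be larger
    return math.sqrt(total)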

5 Classic Approach
Individually compare each candidate sequence to the query using the early abandoning algorithm (a sketch of this loop follows below).
[Figure: the batch time series and the grid of twelve candidate sequences.]
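A sketch of the classic per-candidate loop, reusing early_abandon_distance from the previous slide (names are illustrative):

def classic_filter(subsequence, candidates, r):
    """Test the subsequence against every candidate individually, with
    early abandonment on each comparison; return indices of matches."""
    matches = []
    for idx, cand in enumerate(candidates):
        if early_abandon_distance(subsequence, cand, r) is not None:
            matches.append(idx)
    return matches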

6 Wedge
Having candidate sequences C1, .., Ck, we can form two new sequences U and L:
Ui = max(C1i, .., Cki)
Li = min(C1i, .., Cki)
They form the smallest possible bounding envelope that encloses the sequences C1, .., Ck. We call the combination of U and L a wedge, and denote it W = {U, L}.
A lower bounding measure between an arbitrary query Q and the entire set of candidate sequences contained in a wedge W:
LB_Keogh(Q, W) = sqrt( Σi di² ), where di = qi − Ui if qi > Ui, di = Li − qi if qi < Li, and di = 0 otherwise.
[Figure: candidates C1 and C2, their envelope {U, L} forming the wedge W, and a query Q compared against the wedge.]
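A sketch of wedge construction and the lower bound, assuming equal-length candidates and reusing the math import above:

def build_wedge(candidates):
    """Wedge W = {U, L}: the pointwise max/min envelope of the candidates."""
    U = [max(values) for values in zip(*candidates)]
    L = [min(values) for values in zip(*candidates)]
    return U, L

def lb_keogh(q, wedge):
    """Lower bound on the distance from q to every candidate enclosed by the wedge."""
    U, L = wedge
    total = 0.0
    for qi, ui, li in zip(q, U, L):
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (li - qi) ** 2
    return math.sqrt(total)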

7 Generalized Wedge
Use W(1,2) to denote a wedge built from sequences C1 and C2.
Wedges can be hierarchically nested. For example, W((1,2),3) consists of W(1,2) and C3.
[Figure: C1 (or W1), C2 (or W2), and C3 (or W3), merged first into W(1,2) and then into W((1,2),3).]
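A sketch of how two wedges can be combined into a nested wedge (a single sequence C can be treated as the degenerate wedge {C, C}):

def merge_wedges(w1, w2):
    """Merge two wedges into one that encloses everything in both."""
    (u1, l1), (u2, l2) = w1, w2
    U = [max(a, b) for a, b in zip(u1, u2)]
    L = [min(a, b) for a, b in zip(l1, l2)]
    return U, L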

8 H-Merge
Compare the query to the wedge using LB_Keogh.
If the LB_Keogh computation early abandons (the lower bound exceeds r), none of the enclosed candidates can match and we are done.
Otherwise, individually compare each candidate sequence to the query using the early abandoning algorithm. (A sketch follows below.)
[Figure: the batch time series and the grid of twelve candidate sequences.]
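A sketch of one H-Merge step, reusing lb_keogh and classic_filter from the earlier slides (for simplicity the lower bound is computed in full rather than abandoned early):

def h_merge_filter(subsequence, wedge, candidates, r):
    """One lower-bound test against the wedge prunes all of its candidates
    at once; only when it cannot prune do we fall back to the classic
    per-candidate comparisons."""
    if lb_keogh(subsequence, wedge) > r:
        return []  # no candidate inside the wedge can be within r
    return classic_filter(subsequence, candidates, r)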

9 Hierarchical Clustering
The candidates (or equivalently their single-sequence wedges W1, .., W5) are merged hierarchically: C2 and C5 into W(2,5), C1 and C4 into W(1,4), then W(2,5) with C3 into W((2,5),3), and finally W(((2,5),3),(1,4)). Each level of the hierarchy yields a wedge set, for K = 5, 4, 3, 2, 1 wedges.
Which wedge set to choose?
[Figure: dendrogram over the five candidates showing the wedges formed at each level.]
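The slides do not state the merge criterion, so the following sketch assumes a simple greedy heuristic: always merge the pair of wedges whose merged envelope is tightest (smallest total gap between U and L). It reuses build_wedge and merge_wedges from the earlier slides:

def wedge_area(wedge):
    """Total gap between the upper and lower envelopes (tightness heuristic)."""
    U, L = wedge
    return sum(u - l for u, l in zip(U, L))

def hierarchical_wedge_sets(candidates):
    """Greedily merge wedges, recording the wedge set at every level K = k, .., 1."""
    wedges = [build_wedge([c]) for c in candidates]  # single-sequence wedges
    levels = [list(wedges)]
    while len(wedges) > 1:
        pairs = [(i, j) for i in range(len(wedges)) for j in range(i + 1, len(wedges))]
        i, j = min(pairs, key=lambda p: wedge_area(merge_wedges(wedges[p[0]], wedges[p[1]])))
        merged = merge_wedges(wedges[i], wedges[j])
        wedges = [w for k, w in enumerate(wedges) if k not in (i, j)] + [merged]
        levels.append(list(wedges))
    return levels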

10 Which Wedge Set to Choose?
Test all K wedge sets on a representative sample of the data.
Choose the wedge set that performs the best.
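A sketch of the selection step; as a stand-in for counting computational steps, it scores each wedge set by how many sample subsequences the lower bound alone can prune (this proxy measure is our assumption):

def choose_wedge_set(wedge_sets, sample_subsequences, r):
    """Return the wedge set that prunes the most sample subsequences."""
    def pruned_count(wedges):
        return sum(
            all(lb_keogh(s, w) > r for w in wedges)
            for s in sample_subsequences
        )
    return max(wedge_sets, key=pruned_count)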

11 Upper Bound on H-Merge
The wedge-based approach is efficient when comparing a set of time series to a large batch dataset. But what about streaming time series?
Streaming algorithms are limited by their worst case; being efficient on average does not help.
[Figure: worst case — a subsequence compared against the nested wedges W(1,2) and W((1,2),3).]

12 Triangle Inequality
If dist(W((2,5),3), W(1,4)) >= 2r, a subsequence that fails on one wedge (i.e., is within r of it) must, by the triangle inequality, be at least r away from the other wedge. The subsequence therefore cannot fail on both wedges, which bounds the worst case. (A sketch follows below.)
[Figure: the wedge hierarchy from K = 5 down to K = 1; a subsequence that fails on W((2,5),3) (< r) and therefore cannot fail on W(1,4), the two wedges being >= 2r apart.]
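A sketch of this shortcut for two sibling wedges; dist_ab is assumed to be a precomputed lower bound on the distance between any candidate under wedge_a and any candidate under wedge_b, and the helpers come from the earlier slides:

def filter_sibling_wedges(subseq, wedge_a, wedge_b, cands_a, cands_b, dist_ab, r):
    """Check the subsequence against two sibling wedges, skipping the second
    when the triangle inequality rules it out."""
    matches_a = (classic_filter(subseq, cands_a, r)
                 if lb_keogh(subseq, wedge_a) <= r else [])
    if matches_a and dist_ab >= 2 * r:
        # The subsequence is within r of some candidate under wedge_a, so it
        # is at least r from every candidate under wedge_b: skip wedge_b.
        return matches_a, []
    matches_b = (classic_filter(subseq, cands_b, r)
                 if lb_keogh(subseq, wedge_b) <= r else [])
    return matches_a, matches_b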

13 Experimental Setup
Datasets:
ECG Dataset
Stock Dataset
Audio Dataset
We measure the number of computational steps used by the following methods:
Brute force
Brute force with early abandoning (classic)
Our approach (H-Merge)
Our approach with a random wedge set (H-Merge-R)

14 Experimental Results: ECG Dataset
Batch time series: 650,000 data points (half an hour of ECG signals)
Candidate set: 200 time series of length 40
r = 0.5

Algorithm      Number of Steps
brute force      5,199,688,000
classic            210,190,006
H-Merge              8,853,008
H-Merge-R           29,480,264

15 Experimental Results: Stock Dataset
Batch time series: 2,119,415 data points
Candidate set: 337 time series of length 128
r = 4.3

Algorithm      Number of Steps
brute force     91,417,607,168
classic         13,028,000,000
H-Merge          3,204,100,000
H-Merge-R       10,064,000,000

16 Experimental Results: Audio Dataset
Batch time series: 46,143,488 data points (one hour of sound)
Candidate set: 68 time series of length 101
r = 4.14
Sliding window: 11,025 data points (1 second), step 5,512 data points (0.5 second)

Algorithm      Number of Steps
brute force         57,485,160
classic              1,844,997
H-Merge              1,144,778
H-Merge-R            2,655,816

17 Experimental Results: Sorting
Wedge of length 1,000; random walk time series of length 65,536.

            r = 0.5     r = 1       r = 2       r = 3
Sorted      95,025      151,723     345,226     778,367
Unsorted    1,906,244   2,174,994   2,699,885   3,286,213

