Published by Gerard Nelson. Modified over 9 years ago.
2
COMP 5331 Project, 26-11-2015
3
Roadmap
First, I will give a brief introduction to time series (e.g. notation), to give a sense of what we are working with.
Next, I will talk about similarity (i.e. distance measures) between two time series and about what invariances are.
Then I will formulate a problem on music data and argue that the existing distance measures are not good enough.
Finally, I will introduce a novel distance measure.
I hope to cover all of this in 15 minutes!
4
What are Time Series?
A time series is a collection of observations made sequentially in time.
It is an ordered list of real-valued numbers: $T = t_1, t_2, \dots, t_m$.
Each measurement can be anything: a scalar or a vector.
Example sample rates: 1 per day; 40 Hz (40 times per second).
Length of T = |T| = m.
5
A subsequence $T_{i,k}$ of a time series T is a shorter time series of length k, taken from T with a sliding window.
There are m-k+1 subsequences of length k in T.
The x axis (i.e. time) is not important: the measurements are always taken at a consistent sample rate, so only the order matters.
We only store the measured values (the y-values).
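The sliding window on this slide can be sketched in a few lines. This is only an illustration: the function name `subsequences` and the sample values are made up for the example.

```python
def subsequences(series, k):
    """Return every length-k subsequence of `series` using a sliding window."""
    m = len(series)
    if not 1 <= k <= m:
        raise ValueError("k must satisfy 1 <= k <= len(series)")
    # Window start positions 0 .. m-k give m-k+1 subsequences in total.
    return [series[i:i + k] for i in range(m - k + 1)]


# Hypothetical data: m = 5 and k = 3, so there are 5 - 3 + 1 = 3 windows.
T = [2.0, 4.0, 1.0, 3.0, 5.0]
windows = subsequences(T, 3)
```

Only the values are stored; the time stamps are implied by the order.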
6
Some data are obviously time series.
Blood pressure of my grandma OVER time
Salary of my dad OVER time
My GPA OVER time
But a time series is not necessarily related to time.
7
Others are not. But we can convert them!
[Figure labels: Outlook; Cut and Stretch]
An Algorithm to Convert an Image to a Time Series
1. Compute the central point of the shape.
2. Arbitrarily choose a starting point o on the contour.
3. Calculate the distance between the central point and each point of the contour: start from o, go around the contour, and finish at o.
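The three steps above can be sketched as follows. This is a minimal sketch under two assumptions that the slide does not state: the contour is already given as an ordered list of (x, y) points starting at o, and the "central point" is taken to be the mean of those contour points. The name `contour_to_series` is hypothetical.

```python
import math


def contour_to_series(contour):
    """Turn an ordered closed contour [(x, y), ...] into a time series.

    Assumption: the central point is the mean of the contour points.
    The first point of `contour` plays the role of the starting point o.
    """
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    # Walk around the contour once and record the distance to the centre.
    return [math.hypot(x - cx, y - cy) for x, y in contour]


# Hypothetical shape: the four corners of a square, all equally far from the centre.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
series = contour_to_series(square)
```

A different starting point o only shifts the resulting series cyclically.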
8
Handwriting data
[Figure: handwriting data plotted as a time series; label: Outlook]
9
An Algorithm to Convert DNA to Time Series
gtttatgtagcttaccccctcaaagcaatac actgaaaatgtttcgacgggtttacatcacc ccataaacaaacaggtttggtcctagccttt ctattag...
For some complex objects, such as proteins, we can use an enzyme to break them into their linear building blocks, peptides. (Mentioned by group 2.)
10
MFCC: Mel Frequency Cepstral Coefficient
It is a speech analysis method based on human perception experiments. It concentrates on only certain frequency components.
MFCC converts audio into a vector with 13 numbers.
I use the first coefficient in the experiment.
11
What can we ask about time series?
Clustering
Classification
Motif Discovery (repeated patterns)
Annotations: on sub-sequences in a long time series stream; on different individual time series.
12
Rule Discovery
Query
We need to define similarity between two time series!
For classification of time series, simple nearest neighbor classification performs VERY well, so the choice of distance measure is important.
13
Similarity
Distance function, $D(A,B)$. We want it to have these properties:
$D(A,B) = D(B,A)$ (Symmetry)
$D(A,A) = 0$ (Constancy)
$D(A,B) = 0$ iff $A = B$ (Positivity)
$D(A,B) \le D(A,C) + D(B,C)$ (Triangular Inequality; not essential but better to have, because it makes it easy to find a lower bound and do indexing)
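One way to get a feel for these properties is to test a candidate distance on a handful of items. The sketch below is only a spot check on the items supplied, not a proof; the helper name `check_distance_properties` and the two toy distances are made up for the illustration.

```python
import itertools


def check_distance_properties(dist, items, tol=1e-9):
    """Spot-check the four properties of `dist` on the given items."""
    ok = {"symmetry": True, "constancy": True, "positivity": True, "triangle": True}
    for a in items:
        if abs(dist(a, a)) > tol:  # D(A,A) = 0
            ok["constancy"] = False
    for a, b in itertools.permutations(items, 2):
        if abs(dist(a, b) - dist(b, a)) > tol:  # D(A,B) = D(B,A)
            ok["symmetry"] = False
        if a != b and dist(a, b) <= tol:  # D(A,B) = 0 only when A = B
            ok["positivity"] = False
    for a, b, c in itertools.permutations(items, 3):
        if dist(a, b) > dist(a, c) + dist(b, c) + tol:  # D(A,B) <= D(A,C) + D(B,C)
            ok["triangle"] = False
    return ok


def abs_diff(a, b):
    return abs(a - b)


def squared_diff(a, b):
    return (a - b) ** 2


items = [0.0, 1.0, 2.0]
report_abs = check_distance_properties(abs_diff, items)
report_sq = check_distance_properties(squared_diff, items)
```

On these items the absolute difference passes all four checks, while the squared difference fails the triangular inequality: the distance from 0 to 2 is 4, but going through 1 costs only 1 + 1 = 2.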
14
Lower bound
The triangular inequality helps us compute a lower bound for some distance measures.
Computing the actual measure is expensive, while computing the lower bound is cheap, so we always compute the cheap lower bound first.
With the lower bound, we can check whether computing the actual measure is necessary for the current task.
This strategy saves a lot of work.
15
Example
Our task is to find the nearest point to the query point, Q. We have precomputed the pairwise distances between all the data points (i.e. a, b, c).
Visit a. Calculate the actual measure. The result is 2, so 2 is our best-so-far answer.
Visit b. Calculate the actual measure. The result is 7.81, so 2 remains our best-so-far answer.
Visit c. We only need to calculate the lower bound. By the triangular inequality, $D(Q,b) \le D(Q,c) + D(b,c)$, so $D(Q,b) - D(b,c) \le D(Q,c)$, which gives $5.51 \le D(Q,c)$.
Since the lower bound of D(Q,c) is already worse than the best-so-far, we do not need to compute the actual D(Q,c).
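The search in this example can be sketched as below. It is a minimal sketch, not the presenter's code: the function name `nearest_with_pruning` and the 2-D coordinates are hypothetical (chosen so that a is at distance 2 and b at about 7.81 from Q, but not otherwise taken from the slide), and Euclidean distance stands in for the expensive measure.

```python
import math


def nearest_with_pruning(query, points, pairwise):
    """Nearest-neighbour search that skips points ruled out by a lower bound.

    points:   dict name -> coordinates
    pairwise: dict (name, name) -> precomputed distance between data points
    Returns (best_name, best_dist, names_whose_actual_distance_was_computed).
    """
    best_name, best_dist = None, math.inf
    known = {}  # name -> actual D(query, name) computed so far
    for name, coords in points.items():
        # Triangular inequality: D(Q,c) >= D(Q,p) - D(p,c) for every visited p.
        lower = max((d_qp - pairwise[(p, name)] for p, d_qp in known.items()),
                    default=0.0)
        if lower >= best_dist:
            continue  # cannot beat the best-so-far, so skip the actual measure
        d = math.dist(query, coords)  # the "expensive" actual measure
        known[name] = d
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name, best_dist, list(known)


# Hypothetical coordinates for Q, a, b, c.
points = {"a": (0.0, 2.0), "b": (5.0, 6.0), "c": (6.0, 8.0)}
pairwise = {(p, q): math.dist(points[p], points[q]) for p in points for q in points}
result = nearest_with_pruning((0.0, 0.0), points, pairwise)
```

With these coordinates the actual distance is computed for a and b only; c is pruned because its lower bound already exceeds the best-so-far of 2.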
16
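Euclidean Distance (ED)
Given two time series of length n,
$Q = q_1, q_2, \dots, q_i, \dots, q_n$
$C = c_1, c_2, \dots, c_i, \dots, c_n$
$ED(Q,C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}$
We can get a small speed-up by using [formula not preserved in the source]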
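A minimal sketch of ED for two equal-length series follows. The speed-up formula on this slide did not survive in the transcript; the squared variant shown here is one common trick (skipping the square root keeps the ranking of neighbours unchanged) and is an assumption, not necessarily what the slide showed. Both function names are made up for the example.

```python
import math


def squared_euclidean(q, c):
    """Sum of squared point-wise differences (no square root)."""
    if len(q) != len(c):
        raise ValueError("ED is only defined for series of equal length")
    return sum((qi - ci) ** 2 for qi, ci in zip(q, c))


def euclidean(q, c):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(squared_euclidean(q, c))


# Hypothetical series: the differences are 3 and 4, so ED is 5.
Q = [0.0, 0.0]
C = [3.0, 4.0]
d = euclidean(Q, C)
d2 = squared_euclidean(Q, C)
```

Because the square root is monotonic, comparing `squared_euclidean` values gives the same nearest neighbour as comparing `euclidean` values.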
17
Normalization
ED is sensitive to distortion, so we need to normalize the time series.
Amplitude and offset invariance can be achieved by z-normalization.
[Figure: two time series; the green one has a greater amplitude than the blue one.]
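Z-normalization subtracts the mean and divides by the standard deviation, so that two series differing only in offset and amplitude become identical. The sketch below uses the population standard deviation and maps a constant series to all zeros; both choices are assumptions made for the example.

```python
import math


def z_normalize(series):
    """Rescale a series to mean 0 and standard deviation 1."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        # Assumption: a constant series has no shape, so return all zeros.
        return [0.0] * n
    return [(x - mean) / std for x in series]


# Hypothetical data: `loud` is `quiet` with 10x the amplitude and an offset of 5.
quiet = [1.0, 2.0, 3.0, 4.0]
loud = [10.0 * x + 5.0 for x in quiet]
zq = z_normalize(quiet)
zl = z_normalize(loud)
```

After normalization the two series coincide, so ED between them is (numerically) zero.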
18
Linear trend invariance
Find the least-squares fit: $L = At^2 + Bt + C$
Compute the discrete points on L: $L = l_1, l_2, \dots, l_n$
The new time series: $T' = t_1 - l_1, t_2 - l_2, \dots, t_n - l_n$
Noise invariance
Create a window. There are n points in the window.
Compute the mean value of these n points.
Create a new point with this mean value and remove these n points.
Go on to the next set of n points.
[Figures: one example plot for each invariance.]
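Both procedures on this slide can be sketched briefly. Two assumptions are made here: the trend is fitted as a straight line (the A = 0 case of the slide's formula, matching the "linear trend" title), and a trailing window with fewer than n points is averaged as well. The function names are hypothetical.

```python
def remove_linear_trend(series):
    """Subtract the least-squares straight line from the series."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    t = range(1, n + 1)  # time indices 1..n
    t_mean = sum(t) / n
    y_mean = sum(series) / n
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, series))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = y_mean - slope * t_mean
    # New series: each value minus the point on the fitted line.
    return [yi - (slope * ti + intercept) for ti, yi in zip(t, series)]


def window_mean(series, n):
    """Replace each block of n consecutive points by its mean value."""
    blocks = [series[i:i + n] for i in range(0, len(series), n)]
    return [sum(block) / len(block) for block in blocks]


# Hypothetical data: a pure linear trend, and a short noisy series.
detrended = remove_linear_trend([3.0, 5.0, 7.0, 9.0])
smoothed = window_mean([1.0, 3.0, 5.0, 7.0, 9.0], 2)
```

A series that is exactly a straight line detrends to zeros, and the window mean shortens the series by roughly a factor of n.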
19
Attention!
Different domains require different invariances. For example, cardiology (heart) data requires invariance to the mean value.
Some invariances should not be considered in specific domains. For example, adding rotation invariance would make it impossible to distinguish the shapes 'p' and 'd'.
For music data, I think the following three invariances are important:
Local scaling invariance
Occlusion invariance
Uniform scaling invariance
20
Local Scaling Invariance
We think these two time series are similar because they are made of the same parts: each of them has a static part, a peak, a static part, a valley and finally a static part. But the parts have different lengths (i.e. numbers of sample points).
ED returns a poor result because the peaks and the valleys are in different positions.
This can be solved by Dynamic Time Warping (DTW).
21
DTW
ED: one to one. DTW: one to many.
The "one to many" property allows similar shapes to match even if they are out of phase and have different lengths.
This property makes DTW perform better than ED, but it also makes DTW computationally expensive.
22
[Figure: warping matrix between Time Series A and Time Series B.]
The red dots in the matrix are the path $P = p_1, p_2, \dots, p_s, \dots, p_k$, where $p_s = (i_s, j_s)$, which shows the alignment of points between A and B.
The path is found by dynamic programming so as to minimize the total distance between them.
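The dynamic programming behind the warping path can be sketched as below. This is a basic, unconstrained DTW with the absolute difference as the point-wise cost; that cost and the function name `dtw` are assumptions for the illustration, and no warping-window constraint or lower bound is applied.

```python
import math


def dtw(a, b):
    """Return (total distance, warping path) between series a and b.

    The path is a list of index pairs (i, j) aligning a[i] with b[j].
    """
    n, m = len(a), len(b)
    # D[i][j] = cheapest cost of aligning a[:i] with b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the last cell to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        # Step to the cheapest predecessor cell.
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m], path


# Hypothetical data: the same bump, shifted by one sample.
A = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
B = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
distance, path = dtw(A, B)
```

A point-by-point comparison of these two series sums to 4, while DTW aligns the shifted bump and reports 0. The price is the n-by-m table, which is what makes DTW more expensive than ED.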
23
Occlusion Invariance
Some part of one time series is missing.
This can be handled by variants of DTW that have the extra ability to ignore sections that are difficult to match (possibly with some penalty).
For example, for the valley in the lower time series, DTW cannot find any corresponding part (nearby) in the upper time series.
24
Uniform Scaling Invariance
We can easily see that the right time series is just a rescaled version of the left one (scaling factor = 2X).
If we uniformly scale (US) the left one by a factor of 2, ED will give a good result. For simplicity, I refer to the combined operation US+ED simply as US.
However, we do not know the factor beforehand, so we are forced to test all possible factors.
25
DTW is not a generalization of uniform scaling
Sometimes, US performs better than DTW.
First, we must understand what rescaling is. For example, we rescale a time series T = {0, 3, 2, 1} from |T| = 4 to length 10 and form a new series T' = {0, 1, 2, 3, 2.67, 2.33, 2, 1.67, 1.33, 1}, where the red points are the original points and the blue points are the created intermediate points.
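The rescaling example above is consistent with plain linear interpolation, which the sketch below implements. The second function tries a list of candidate scaling factors, as uniform scaling requires when the factor is unknown. Its details are assumptions for the illustration: the stretched query is compared with a prefix of the other series, and the distance is divided by the length so that different factors are comparable. The names `rescale` and `best_scaling_factor` are hypothetical.

```python
import math


def rescale(series, new_len):
    """Stretch or shrink a series to new_len points by linear interpolation."""
    m = len(series)
    if m < 2 or new_len < 2:
        raise ValueError("need at least two points on both sides")
    out = []
    for j in range(new_len):
        pos = j * (m - 1) / (new_len - 1)  # position on the original index axis
        lo = min(int(pos), m - 2)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[lo + 1] * frac)
    return out


def best_scaling_factor(q, c, factors):
    """Try every candidate factor; return (distance, factor) of the best one."""
    best = (math.inf, None)
    for f in factors:
        length = round(len(q) * f)
        if length < 2 or length > len(c):
            continue  # this factor cannot be compared against c
        stretched = rescale(q, length)
        # Assumption: root-mean-square difference against the prefix of c.
        d = math.sqrt(sum((x - y) ** 2 for x, y in zip(stretched, c[:length])) / length)
        best = min(best, (d, f))
    return best


T = [0.0, 3.0, 2.0, 1.0]
T_long = rescale(T, 10)
best_d, best_f = best_scaling_factor(T, T_long, [1.0, 1.5, 2.0, 2.5])
```

Here `T_long` reproduces the ten values listed on the slide, and the search recovers the factor 2.5 (from length 4 to length 10) with distance 0.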
26
Suppose there are two time series:
A = {0, 3, 2, 1}
B = {0, 1, 2, 3, 2.67, 2.33, 2, 1.67, 1.33, 1}
If we rescale A from length 4 to 10, we form A' = {0, 1, 2, 3, 2.67, 2.33, 2, 1.67, 1.33, 1}, so ED(A',B) = 0.
DTW(A,B) = |0-0| + |1-0| + |2-3| + |3-3| + |2.67-3| + |2.33-2| + |2-2| + |1.67-2| + |1.33-1| + |1-1| = 3.33
Using US matches our intuition.
27
So there is a compound method that first uses US and then DTW. It is called SWM, which stands for Scaled and Warped Matching.
[Figure panels: By Uniform Scaling; By DTW]
28
Why do I think these three are important?
Imagine you are a beginner at playing the piano.
You may not be able to play some notes if many of them appear in a short period of time. (Occlusion invariance)
You may have a slight tempo (speed of music) difference in some bars. (Local scaling invariance)
There are several parts in a music piece. You may play the first part faster than the other performance and play the second part at the same speed as the other performance. (Uniform scaling invariance)
29
Experiment
Data: 2 music pieces with exactly the same content but different lengths.
Content: a segment extracted from my favourite song, Red Shadow in the Candle Light (燭影搖紅), performed by me.
Slower one: 12 s
Faster one: 9 s
30
I use MFCC to convert a music piece into a matrix with 13 rows. I only use the first row in the analysis.
Faster one: 913 points. I call this time series candle_fast.
Slower one: 1203 points. I call this time series candle_slow.
31
ED, DTW, US, SWM
Since candle_fast and candle_slow have different lengths:
I extract the first 913 points from candle_slow to form a new time series called candle_slow_pruned.
I rescale candle_fast to the same length as candle_slow to form a new time series called candle_fast_lengthen.
ED(candle_fast, candle_slow_pruned) = 2531
DTW(candle_fast, candle_slow) = 662
US(candle_fast, candle_slow) = ED(candle_fast_lengthen, candle_slow) = 3965
SWM(candle_fast, candle_slow) = 1260
I originally expected SWM to be the best. It may not be because the longer series cannot be obtained from the shorter one by uniform scaling.
32
However, even SWM is not good enough
[Figure: two time series of similar length.] To us, they look similar with the aid of color.
The left time series is formed from four non-overlapping subsequences. The right time series is formed from the same set of subsequences, in the same order, but with different scaling factors.
ED performs worse.
US performs worse: it can only "take care" of one subsequence by choosing a suitable scaling factor.
DTW performs better.
33
Piecewise uniform scaling
If we know ahead of time that the two time series are formed from the same set of subsequences but with different scaling factors, we can split the time series and do a uniform scaling on each part.
After the piecewise uniform scaling, do the "Scaled and Warped Matching".
Observe that there are four [figure element not preserved; figure label: DTW]. This may imply that the two time series are formed from the same set of 4 subsequences.
34
For music data, the story is much simpler. Between different parts there is usually a rest. By using this information, we can split the time series and do the piecewise uniform scaling on it.
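The idea of splitting on rests and rescaling each part separately could be sketched as follows. Everything specific here is an assumption for the illustration: a rest is taken to be a run of at least `min_rest` consecutive samples whose absolute value is at most `threshold`, the target length of each part is supplied by the caller (for instance the length of the matching part in the other recording), and all names are hypothetical.

```python
def split_on_rests(series, threshold, min_rest):
    """Split a series into parts separated by rests (long quiet runs)."""
    parts, current, quiet_run = [], [], []
    for x in series:
        if abs(x) <= threshold:
            quiet_run.append(x)
            continue
        if len(quiet_run) >= min_rest:
            # A long enough quiet run is a rest: close the current part.
            if current:
                parts.append(current)
            current = []
        else:
            # A short quiet run still belongs to the music.
            current.extend(quiet_run)
        quiet_run = []
        current.append(x)
    if len(quiet_run) < min_rest:
        current.extend(quiet_run)
    if current:
        parts.append(current)
    return parts


def rescale(series, new_len):
    """Linear-interpolation rescaling of one part to new_len points."""
    m = len(series)
    if m == 1 or new_len == 1:
        return [series[0]] * new_len
    out = []
    for j in range(new_len):
        pos = j * (m - 1) / (new_len - 1)
        lo = min(int(pos), m - 2)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[lo + 1] * frac)
    return out


def piecewise_uniform_scaling(parts, target_lengths):
    """Rescale each part to its own target length and join the results."""
    out = []
    for part, length in zip(parts, target_lengths):
        out.extend(rescale(part, length))
    return out


# Hypothetical data: two parts separated by a rest of three zeros.
piece = [1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 3.0, 4.0, 0.0, 5.0]
parts = split_on_rests(piece, threshold=0.1, min_rest=2)
scaled = piecewise_uniform_scaling(parts, [5, 4])
```

In this toy input the single zero inside the second part is too short to count as a rest, so the piece splits into exactly two parts, and only the first part is stretched.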
35
Thank You
Questions and Comments
And get the Coupon