1
Parallelizing Dynamic Time Warping
Hello, everyone. I'm Jieyi Hu, and my topic today is parallelizing Dynamic Time Warping.
2
During the summer, I went to Houston, Texas and joined an accelerator program at Rice University as one of the co-founders of a startup team. Our team is called SenseWatch, and we make an API for hand gesture recognition and vital sign interpretation.
3
Our API collects sensor data from the wearable device, processes it on the smartphone, recognizes actionable information from the data, and interacts with the user. But how? Without further ado, let's get into it.
4
The core of this API is the algorithm we use to recognize hand gestures and interpret vital signs, which is mainly Dynamic Time Warping. Dynamic Time Warping is a time-series alignment algorithm originally developed for speech recognition. It measures the similarity between two temporal sequences which may vary in time or speed. So if some people do their hand gestures faster and others do theirs slower, our API can handle that.
5
Costly. However, Dynamic Time Warping is extremely costly: in general, the algorithm has O(n^2) time complexity. With that complexity, it is really hard to do hand gesture recognition in real time. Fortunately, there are ways to improve this, and one of them is parallelizing the algorithm. But first we have to dive deep into the algorithm to understand whether it is really possible to parallelize it and, if so, how.
6
The whole process to recognize a hand gesture includes collection, normalization, calculation, and comparison. Collection is getting sensor data from the wearable device and transferring it into a fixed-length queue on the smartphone. The length of the queue should be the same as the length of the pre-defined gesture data, so that we can run Dynamic Time Warping over them. Normalization is normalizing the input for Dynamic Time Warping so that the output is meaningful. Calculation is running Dynamic Time Warping to find the similarity between the sensor data and the predefined gesture data. And comparison is comparing the similarities across all the predefined gestures to determine whether a similar gesture was made and which one is the most similar.
7
25 Data Points. Here, let's use a simplified example of the real-life version. For collection, we only collect the accelerometer data from the Microsoft Band and leave out the gyroscope data, even though it would usually be included. Because of that, each data point is 3-dimensional, containing 3 integers representing the acceleration along the x, y, and z axes of the Band. One data point arrives from the Microsoft Band roughly every 32 milliseconds, and since the predefined gestures have 25 data points each, one gesture is about 0.8 seconds long. Therefore, the length of the queue for collection should be 25 as well. For the first 25 data points, we simply append them to the queue; for every data point that comes in after that, we remove the first one in the queue and append the latest one to the end.
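As a small sketch of this collection step (my own illustration, assuming each incoming sample is an IntArray holding [x, y, z]; the class name SlidingWindow is hypothetical, not the actual SenseWatch code):

class SlidingWindow(private val capacity: Int = 25) {
    private val samples = ArrayDeque<IntArray>()

    // Append the newest sample; once the queue is full, drop the oldest one first.
    fun add(sample: IntArray) {
        if (samples.size == capacity) samples.removeFirst()
        samples.addLast(sample)
    }

    fun isFull(): Boolean = samples.size == capacity

    // Deep copy of the current 25 samples, to be handed to Dynamic Time Warping.
    fun snapshot(): List<IntArray> = samples.map { it.copyOf() }
}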
8
import kotlin.math.roundToInt
fun normalize(acc: Float): Int = (acc * 10).roundToInt()
In this case, we have 4 predefined gestures. They are already normalized up front so that less computation is needed at run time. The normalization we used is the formula on top, so an original acceleration of, say, 3.1 becomes 31 after normalization.
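For instance, the same normalization applied to a whole [x, y, z] sample (a small sketch of my own; roundToInt comes from kotlin.math):

import kotlin.math.roundToInt

// Normalize each axis of one accelerometer sample: e.g. [3.14, -0.52, 9.81] -> [31, -5, 98].
fun normalizeSample(sample: FloatArray): IntArray =
    IntArray(sample.size) { i -> (sample[i] * 10).roundToInt() }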
9
Pre-defined Gestures Data
[Slide: the deep copy of the queue (sensor data) and the pre-defined gesture data, each shown as a 3 x 25 table of x, y, z values.] Before we run Dynamic Time Warping, we make a deep copy of the queue and treat it as a 3 by 25 2D array. The predefined gesture data have the same structure: the same dimension of 3 and the same length of 25. The sensor data and one of the predefined gestures are the input of one execution of Dynamic Time Warping, and the output is a real number that represents the similarity between the sensor data, which is the gesture the user just made, and the pre-defined gesture data, which is the gesture set up beforehand. We call the output of Dynamic Time Warping the cost. The smaller the cost, the more similar the two gestures are; if the cost is 0, the gestures are identical.
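A minimal sketch of that layout, assuming the queue snapshot is a list of 25 [x, y, z] samples (as in the collection sketch above); snapshotAs3x25 is just an illustrative name:

// Lay the 25-sample snapshot out as the 3-by-25 array the slide describes:
// one row per axis (x, y, z), one column per sample.
fun snapshotAs3x25(snapshot: List<IntArray>): Array<IntArray> =
    Array(3) { axis -> IntArray(snapshot.size) { t -> snapshot[t][axis] } }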
10
[Slide: the sensor data and one pre-defined gesture, each a 3 x 25 table, next to the cost table Cost[i, j].] Now we can start Dynamic Time Warping. First, construct a cost table, which is an (n+1) by (n+1) 2D array, in this case 26 by 26. We use Cost[i, j] to denote the value of the cell in row i and column j, so Cost[0, 0] is the top-left cell, Cost[0, 1] is the cell to its right, Cost[1, 0] is the cell below it, and Cost[1, 1] is the cell diagonally below and to the right. We set every cell of the first row and the first column to infinity, except Cost[0, 0] = 0. Then we fill in the rest of the table with this formula: Cost[i, j] = dist(i-1, j-1) + min(Cost[i-1, j-1], Cost[i-1, j], Cost[i, j-1]), where dist(i-1, j-1) is the distance between data point i-1 of the sensor data and data point j-1 of the predefined gesture, and the min term is the minimum of the cell diagonally before the current one, the cell above it, and the cell to its left. For the distance between two data points we could use the first formula on the slide, or the Euclidean distance, which is the second one. For example, Cost[1, 1] is the distance between the first sensor data point and the first predefined data point, plus min(Cost[0, 0], Cost[0, 1], Cost[1, 0]). We repeat the calculation cell by cell and row by row until the whole cost table is filled. The final cost, which represents the similarity between the sensor data and the predefined gesture data, is the minimum value among the last row and the last column.
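Here is a minimal Kotlin sketch of that cost-table computation, using the 3-by-25 layout from the previous slide and Euclidean distance between data points (the second of the two distance options); the names dist and dtwCost are my own, not the actual SenseWatch code:

import kotlin.math.min
import kotlin.math.sqrt

// Euclidean distance between data point i of `a` and data point j of `b`,
// where each input is a 3-by-n array (one row per axis).
fun dist(a: Array<IntArray>, b: Array<IntArray>, i: Int, j: Int): Double {
    var sum = 0.0
    for (axis in a.indices) {
        val d = (a[axis][i] - b[axis][j]).toDouble()
        sum += d * d
    }
    return sqrt(sum)
}

// Dynamic Time Warping cost between the sensor data and one predefined gesture.
fun dtwCost(sensor: Array<IntArray>, gesture: Array<IntArray>): Double {
    val n = sensor[0].size    // 25 in this example
    val m = gesture[0].size
    // (n+1) x (m+1) cost table; first row and column are infinity, Cost[0][0] = 0.
    val cost = Array(n + 1) { DoubleArray(m + 1) { Double.POSITIVE_INFINITY } }
    cost[0][0] = 0.0
    for (i in 1..n) {
        for (j in 1..m) {
            cost[i][j] = dist(sensor, gesture, i - 1, j - 1) +
                min(cost[i - 1][j - 1], min(cost[i - 1][j], cost[i][j - 1]))
        }
    }
    // Final cost = the minimum value among the last row and the last column.
    var best = Double.POSITIVE_INFINITY
    for (j in 1..m) best = min(best, cost[n][j])
    for (i in 1..n) best = min(best, cost[i][m])
    return best
}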
11
Pre-defined Gestures Data
[Slide: the sensor-data copy next to the four pre-defined gestures, each a 3 x 25 table, with rows Cost[0..3], TH[0..3], and S[0..3].] We denote the final cost against a predefined gesture as Cost[k], where k is the index of the pre-defined gesture data, in this case 0 to 3. However, there is an error margin in real life: we cannot expect the cost against a pre-defined gesture to be 0, in other words, we cannot expect the user to make the gesture exactly the same way every time. Therefore, we have a threshold for every predefined gesture. If the cost is smaller than the threshold, that gesture is a similar gesture, meaning it is somewhat similar to the gesture the user made. However, we could have several similar gestures for one user-made gesture. To identify the exact one the user made, we need one more layer, comparison. For every gesture whose cost is smaller than its threshold, we calculate its similarity, which is the threshold minus the cost; for every gesture whose cost is greater than or equal to its threshold, its similarity is 0. The pre-defined gesture with the greatest similarity is the one the user made. S[k] = (TH[k] - Cost[k]) if Cost[k] <= TH[k] else 0
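A short sketch of that comparison step (bestGestureIndex is a hypothetical name; it returns null when no gesture clears its threshold):

// Similarity per gesture: S[k] = TH[k] - Cost[k] if Cost[k] <= TH[k], else 0.
fun similarities(cost: DoubleArray, threshold: DoubleArray): DoubleArray =
    DoubleArray(cost.size) { k -> if (cost[k] <= threshold[k]) threshold[k] - cost[k] else 0.0 }

// Index of the most similar gesture, or null if none is below its threshold.
fun bestGestureIndex(cost: DoubleArray, threshold: DoubleArray): Int? {
    val s = similarities(cost, threshold)
    val k = s.indices.maxByOrNull { s[it] } ?: return null
    return if (s[k] > 0.0) k else null
}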
12
[Slide: diagram of the parallel version across processes P0..P3: scatter the gestures G0, G1, G2, G3, G4, ... round-robin, O(n); broadcast the sensor data, O(log n); run DTW locally to get similarities S[k], O(n^2 * n/p); take the local max on each process; then reduce the local maxima pairwise to the global max.] Knowing the whole process, I noticed there are two levels at which parallelization could be applied. The most obvious one is parallelizing the calculation of the similarities for all the predefined gestures. The parallelized version would be: a one-to-all personalized communication (scatter) distributing the predefined gestures evenly among the processing elements, then broadcasting the deep copy of the sensor data, then calculating the similarities of the distributed predefined gestures locally against the sensor data using Dynamic Time Warping, followed by finding the index of the predefined gesture whose similarity is the local maximum, and at last using a reduce operation to find the index of the global maximum. In sequence, the whole process is O(n^3), because the dominant part is Dynamic Time Warping, which is O(n^2), and we have to do it for all n predefined gestures. In parallel, the scatter is O(n), the broadcast is O(log n), Dynamic Time Warping is O(n^2) for each of the n/p gestures, which is O(n^3/p) in total, and the local max and reduce are O(log n). Therefore, the parallel version is O(n^3/p), which is somewhat of an improvement.
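The slides express this with message-passing collectives (scatter, broadcast, reduce). As a rough shared-memory analogue rather than the actual implementation, here is a Kotlin sketch that splits the gestures round-robin across p worker threads, runs the dtwCost sketch from above locally, keeps a per-worker best, and then reduces the local bests to the global winner:

import kotlin.concurrent.thread

fun parallelBestGesture(
    sensor: Array<IntArray>,
    gestures: List<Array<IntArray>>,
    thresholds: DoubleArray,
    p: Int = 4
): Int? {
    data class Best(val index: Int, val similarity: Double)
    val localBests = arrayOfNulls<Best>(p)

    val workers = (0 until p).map { rank ->
        thread {
            var best: Best? = null
            // "Scatter": worker `rank` handles gestures rank, rank + p, rank + 2p, ...
            for (k in rank until gestures.size step p) {
                val cost = dtwCost(sensor, gestures[k])   // local DTW against the broadcast sensor data
                val s = if (cost <= thresholds[k]) thresholds[k] - cost else 0.0
                if (best == null || s > best!!.similarity) best = Best(k, s)
            }
            localBests[rank] = best                       // local max
        }
    }
    workers.forEach { it.join() }

    // "Reduce": global max over the local maxima.
    val global = localBests.filterNotNull().maxByOrNull { it.similarity }
    return if (global != null && global.similarity > 0.0) global.index else null
}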
13
[Slide: the two 3 x 25 tables again, next to the cost table Cost[i, j].] The other level at which parallelization could be applied is the calculation of the cost table itself. My first idea was to treat the min part like the Fibonacci sequence, but it is not that straightforward. In the Fibonacci sequence the relation between the current element and the previous ones is one-dimensional: [i] depends on [i-1] and [i-2]. Here the relation is two-dimensional: [i, j] depends on [i-1, j-1], [i-1, j], and [i, j-1]. Therefore, I haven't figured it out yet. Hopefully, I will be able to talk about it more in presentation 2.
14
Thank you!