Xin Tong, Teng-Yok Lee, Han-Wei Shen The Ohio State University

Salient Time Steps Selection from Large-Scale Time-Varying Data Sets with Dynamic Time Warping
Notes: Hello, everyone. I am going to present our work, “Salient Time Steps Selection from Large-Scale Time-Varying Data Sets with Dynamic Time Warping,” by Xin Tong, Teng-Yok Lee, and Han-Wei Shen from The Ohio State University.

Problem
Large-scale simulations often generate a large number of time steps.
Visualizing all time steps simultaneously is difficult.
Looking at a subset chosen by simple heuristics, e.g. uniform sampling, can be problematic.
The sampled time steps can be either redundant or insufficient.
Notes: Large-scale simulations generate data over a large number of time steps. It is difficult to visualize all time steps simultaneously, so scientists would like to reduce the data size and look at only a subset of it. Traditionally, people use simple heuristics to sample the data. For example, we can use uniform sampling and retrieve one sample for every 10 or 100 time steps. However, uniform sampling can be problematic, because the resulting samples can be either redundant or insufficient.

An Example of Uniform Sampling
20 time steps from an astrophysics simulation
Uniform samples: 2, 8, 14, 20
Non-uniform samples: 2, 9, 12, 20
[Figure: isosurfaces of time steps 1 to 20]
Notes: Here is an example of uniform sampling. We have a time-varying data set containing 20 time steps. We visualize the data with isosurfaces of the same value. If we want to display a subset of 4 time steps, which 4 should we choose? If we use uniform sampling and choose one for every 6 time steps, redundant information is included, because the two adjacent samples, time steps 14 and 20, are both planes. However, if we use non-uniform samples and choose time step 2, a plane; time step 9, a ring; time step 12, a ring and its connected components; and time step 20, a plane, we get more interesting features, such as the cylinder-shaped isosurface in time step 12, and no redundant time steps.

Approach
Select salient/key time steps to represent distinct time intervals of different lengths in the data.
Use an efficient dynamic programming algorithm to find the optimal key time steps with the lowest mapping cost.
Use Dynamic Time Warping (DTW) to evaluate the mapping cost.
Design a visualization system to interactively explore different time intervals with different numbers of salient time steps.
[Figure: time series T and subset R]
Notes: Our goal is to select the salient time steps from the time-varying data set. These salient time steps can represent distinct time intervals of different lengths. To achieve this goal, we propose an efficient algorithm, based on dynamic programming, that can find the globally optimal key time steps with the lowest mapping cost. We use Dynamic Time Warping to evaluate the mapping cost. Based on this efficient algorithm, we also develop a visualization system. This system allows scientists to interactively explore different time intervals with different numbers of time steps on the fly.

Algorithm Overview
Idea
Treat the input time-varying data as one time series T.
For each possible subset R of T, map T to R and measure the similarity between T and R after the mapping.
Pick the R that is the most similar to T.
Input
A user-specified number of time intervals.
Dissimilarity measurement between any two time steps.
Output
The selected time intervals and the representative time steps.
[Figure: T compared against two candidate subsets R1 and R2]
Notes (animation would be helpful): Let's have an overview of our approach. Given a time-varying data set, we treat it as a time series T. There are many possible subset time series R of T. For example, on this slide there are two subsets, R1 and R2, of T. We compare each R with the original time series T and pick the R that is most similar to T. The input of the approach is the user-specified number of time intervals and the pairwise dissimilarity between any two time steps in the time series T. After applying our approach to this input, we get the selected time intervals and the representative key time steps as output.
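To get a feel for why comparing every possible subset R one by one does not scale, the short sketch below counts the candidate subsets. The function name `num_candidate_subsets` is ours, for illustration only; the sizes are the 20-step example and the 200-step, 6-key-step case study from this talk.

```python
from math import comb


def num_candidate_subsets(num_time_steps: int, num_key_steps: int) -> int:
    """Number of ways to pick num_key_steps key time steps out of num_time_steps."""
    return comb(num_time_steps, num_key_steps)


# 20 time steps, 4 key time steps (the uniform sampling example).
print(num_candidate_subsets(20, 4))
# 200 time steps, 6 key time steps (case study 1).
print(num_candidate_subsets(200, 6))
```

The count grows combinatorially with the number of time steps, which is the motivation for replacing exhaustive comparison with dynamic programming.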

DTW: Dynamic Time Warping
[Figure: time series X (x1 to x10) and time series Y (y1 to y10) connected by a nonlinear mapping]
Measure the similarity between two time series X and Y after the optimal nonlinear mapping.
The mapping cost can be computed as: $$C = \sum_{i=1}^{L} D(x_{n_i}, y_{m_i})$$
DTW cost, the minimal mapping cost: $$C^* = \min_{P \in \mathbb{P}} \sum_{i=1}^{L} D(x_{n_i}, y_{m_i})$$
Here a mapping P consists of L index pairs $(n_i, m_i)$, and $\mathbb{P}$ is the set of all valid mappings.
Notes: Since we use the DTW cost to measure the similarity between two time series, let us review what DTW is. The DTW algorithm is a dynamic programming algorithm that solves for the optimal mapping. Its mapping cost measures the similarity between two time series X and Y after the optimal nonlinear mapping. In a nonlinear mapping, an element of one time series can map onto one or more elements of the other time series. For example, we have two time series X and Y. The nonlinear mapping between them is shown by the purple lines. The two elements at the two ends of a purple line have a dissimilarity D. If we sum up the dissimilarities over all the purple lines, we get the mapping cost C. There are multiple choices for the mapping between X and Y. The mapping that gives the lowest mapping cost is the optimal mapping, and its cost is the DTW cost C*.
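As a concrete illustration of the DTW cost, here is a minimal sketch of the textbook dynamic programming recurrence. It assumes the common step pattern in which each new pair advances X, Y, or both by one element; the talk does not state which step pattern is used, and the names `dtw_cost` and `dissimilarity` are ours.

```python
def dtw_cost(x, y, dissimilarity):
    """Minimal total dissimilarity over all ordered, nonlinear mappings of x onto y."""
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j]: best cost of mapping the first i elements of x
    # onto the first j elements of y.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dissimilarity(x[i - 1], y[j - 1])
            # Extend the cheapest of the three admissible predecessor mappings.
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def abs_diff(a, b):
    """Example pairwise dissimilarity D for scalar samples (an assumption)."""
    return abs(a - b)


# y repeats its first element; the warping absorbs the repetition at no cost.
print(dtw_cost([1, 2, 3, 4], [1, 1, 2, 3, 4], abs_diff))
```

The table has (n + 1) × (m + 1) entries and each entry is filled in constant time, so the cost is found without enumerating the mappings explicitly.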

DTW and Time Step Selection
[Figure: time series T divided into segments by the subset R]
Our goal: given the desired number of time steps k, find the subset R with the minimal DTW cost.
Constraints on R:
Each key time step must map to itself.
Each time step in T maps to only one time step in R.
T must be mapped to R in order.
Notes: Let's go back to the problem of key time step selection. Our goal is, given the desired number of time steps k, to find the subset R that has the minimal DTW cost when mapped to the time series T. There are many choices of R, and searching for the optimal subset R is very challenging. Based on the special relationship between the two time series T and R, we put three reasonable constraints on the mapping. The first constraint is that a key time step maps only to itself, because a key time step can represent itself. The second constraint is that each time step in T maps to only one time step in R, because each time step in T has only one representative key time step in R. The third constraint is that T must be mapped to R in order, which means no crossing is allowed in the mapping within each segment. This helps to keep the temporal coherence of the time-varying data.

DTW and Time Step Selection (cont’d)
Finding R is essentially equivalent to dividing T into k - 1 segments.
Within each segment, DTW is solved independently.
[Figure: time series T with a segment [m, p] bounded by two key time steps of R]
Notes: With these three constraints, the time steps in R divide the time series T into (k - 1) segments. Then we can solve the DTW mapping in each segment independently. For example, in the time interval [m, p], each time step maps to either the left or the right boundary of the segment. After the mappings in all the segments are solved, the mapping between T and R is also solved.
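One way to write down the cost of a single segment under the three constraints is sketched below. The symbols $S(m, p)$ for the segment cost and $s$ for the switch point are our notation, not the slides'. Because the mapping is ordered and every time step maps to exactly one boundary, the time steps up to some $s$ map to the left key time step $m$ and the remaining ones map to the right key time step $p$:

```latex
S(m, p) = \min_{m \le s < p} \left[ \sum_{t=m}^{s} D(t, m) + \sum_{t=s+1}^{p} D(t, p) \right]
```

Here $D(t, m)$ is the pairwise dissimilarity between time steps $t$ and $m$. Since $D(m, m) = D(p, p) = 0$ for a sensible dissimilarity, the two key time steps map to themselves at no cost, and only the interior time steps contribute.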

Key Time Steps Selection with Dynamic Programming
A recursive approach to find the key time steps R
F(p, k): the optimal mapping from p time steps onto k key time steps.
It is equal to the sum of two mappings:
F(m, k - 1): the optimal mapping from m < p time steps onto k - 1 key time steps.
D(m + 1, p): the DTW mapping of the later p - m time steps.
Find the time step m that leads to the optimal F(p, k): $$F(p, k) = \min_{m} \left[ F(m, k - 1) + D(m + 1, p) \right]$$
Dynamic programming by storing F(p, k) in a table [T. Liu and J. R. Kender, ECCV '02]*
Different numbers of key time steps can be re-selected by rescanning the table.
[Figure: T split at m into the intervals [1, m] and [m, p], both mapped onto R]
Notes: There are many choices for the subset R of T. In order to find the optimal R as the key time steps, we can use a recursive approach. Let F(p, k) be the sub-problem of finding an optimal mapping from the time interval [1, p] onto k key time steps. We can divide the problem of solving F(p, k) into two mappings at a splitting point m. For the mapping before the splitting point m, we map the time steps [1, m] onto (k - 1) key time steps. F(m, k - 1) can again be treated as a sub-problem and solved recursively. For the mapping after m, from time step (m + 1) to time step p, we solve it by the DTW mapping within a segment. Since there are multiple choices for the splitting point m, we need to find the m that leads to the minimal sum of the two mapping costs. To solve the recursive equation, we use a table to store the results of all sub-problems F(p, k). This becomes a dynamic programming problem that can be solved in polynomial time. Then we can query an arbitrary number of key time steps by simply scanning the table.
*T. Liu and J. R. Kender. Optimization algorithms for the selection of key frame sequences of variable length. In ECCV '02.
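The recurrence can be sketched in code as follows. This is a simplified illustration under stated assumptions rather than the authors' implementation: `dist` is a precomputed pairwise dissimilarity matrix with 0-based indices, the table entry `table[j][p]` requires p itself to be the j-th key time step, time steps before the first key map to the first key, and time steps after the last key map to the last key. The slides may use different boundary conventions. All function names are ours.

```python
def segment_cost(dist, m, p):
    """Ordered mapping of the steps strictly between keys m and p onto m or p."""
    interior = range(m + 1, p)
    # Start with every interior step mapped to the right key p.
    cost = sum(dist[t][p] for t in interior)
    best = cost
    # Move interior steps, left to right, onto the left key m.
    for t in interior:
        cost += dist[t][m] - dist[t][p]
        best = min(best, cost)
    return best


def select_key_time_steps(dist, k):
    """Return (total cost, sorted key indices) for k key time steps."""
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of time steps")
    inf = float("inf")
    # table[j][p]: best cost of covering steps 0..p with j keys, the last key being p.
    table = [[inf] * n for _ in range(k + 1)]
    back = [[-1] * n for _ in range(k + 1)]
    for p in range(n):
        # Assumption: steps before the first key all map to it.
        table[1][p] = sum(dist[t][p] for t in range(p))
    for j in range(2, k + 1):
        for p in range(j - 1, n):
            for m in range(j - 2, p):
                cost = table[j - 1][m] + segment_cost(dist, m, p)
                if cost < table[j][p]:
                    table[j][p] = cost
                    back[j][p] = m
    # Assumption: steps after the last key all map to it.
    best_cost, best_p = inf, -1
    for p in range(k - 1, n):
        cost = table[k][p] + sum(dist[t][p] for t in range(p + 1, n))
        if cost < best_cost:
            best_cost, best_p = cost, p
    # Walk the back pointers to recover the key time steps.
    keys, p = [], best_p
    for j in range(k, 0, -1):
        keys.append(p)
        p = back[j][p]
    return best_cost, keys[::-1]


# Toy data: three plateaus, so three keys can represent every step exactly.
values = [0, 0, 0, 5, 5, 5, 9, 9]
dist = [[abs(a - b) for b in values] for a in values]
cost, keys = select_key_time_steps(dist, 3)
print(cost, [values[i] for i in keys])
```

Because the table rows for fewer than k keys are filled along the way, a smaller number of key time steps can be read off the same table without recomputation, which mirrors the rescanning idea on the slide. The sketch recomputes segment costs naively, so it is slower than a tuned implementation would be.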

Case Study 1: Radiation of Astrophysics Turbulence
600×248×248 cells in 200 time steps.
Comparing the isosurface of the temperature at 20,000 K from each time step.
Similarity metric: mutual information of distance fields of isosurfaces [Bruckner and Möller, EuroVis '10]*
Notes: We applied our algorithm to two data sets. The first data set is the radiation of astrophysics turbulence. This data set consists of 600×248×248 cells in 200 time steps. We analyze the data set by comparing the isosurface of temperature at 20,000 K from each time step. To measure the dissimilarity between each pair of time steps, we use the mutual information of the distance fields of the isosurfaces.
*Bruckner and Möller. Isosurface Similarity Maps. In EuroVis '10.
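To make the metric concrete, the sketch below estimates mutual information between two equally sized lists of samples from a joint histogram. It assumes the two distance fields have already been computed and flattened into lists of floats; the equal-width binning, the bin count, and the function name are our assumptions, and the talk does not say how the similarity is turned into a dissimilarity.

```python
from collections import Counter
from math import log2


def mutual_information(a, b, num_bins=8):
    """Histogram estimate of the mutual information (in bits) between a and b."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")

    def bin_indices(values):
        lo, hi = min(values), max(values)
        # Fall back to a unit width when all values are identical.
        width = (hi - lo) / num_bins or 1.0
        return [min(int((v - lo) / width), num_bins - 1) for v in values]

    ia, ib = bin_indices(a), bin_indices(b)
    n = len(a)
    joint = Counter(zip(ia, ib))
    count_a, count_b = Counter(ia), Counter(ib)
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i, j) * log2( p(i, j) / (p(i) * p(j)) ), written with raw counts.
        mi += (c / n) * log2(c * n / (count_a[i] * count_b[j]))
    return mi


# Identical inputs share all their information; unrelated ones share none.
print(mutual_information([0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0], num_bins=2))
print(mutual_information([0.0, 0.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0], num_bins=2))
```

A higher value means the two fields are more alike, so some monotonically decreasing transform would be needed before using it as the pairwise dissimilarity D.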

Case Study 1: Result
Less redundant information.
More interesting isosurfaces.
The selected time intervals correspond to the blocks along the diagonal in the dissimilarity matrix.
Uniform samples: 15, 50, 85, 120, 155, 190
Key time steps: 24, 48, 78, 93, 113, 141
[Figure: dissimilarity matrix with the selected time intervals marked]
Notes: We selected 6 key time steps from the data set and compared them with 6 uniform samples. Compared with the 6 uniform samples, the key time steps contain less redundant information. For example, the circled time steps are all plane-shaped. Among the uniform samples, two adjacent samples are plane-shaped, but only one key time step is needed for the plane-shaped isosurface. On the other hand, more interesting isosurfaces are preserved. Say the isosurfaces in front of the plane are the interesting features. The time steps with blue circles are the interesting time steps. Four key time steps contain interesting features, but only 3 of the uniform samples do. The time intervals corresponding to the selected key time steps are marked as black squares on the dissimilarity matrix. They correspond very well to the blue, low-value blocks along the diagonal of the dissimilarity matrix.

Case Study 2: Madden-Julian Oscillation
Madden-Julian Oscillation (MJO)
A periodic weather phenomenon in the tropics.
Main characteristic: eastward propagation of tropical cloud and rainfall.
Simulation result
Generated by S. Hagos and R. Leung (PNNL).
2699×599×27 cells in 479 time steps.
Goal: detecting typical stages of eastward cloud movement.
Experiment
At each time step, extract the water vapor mixing ratio along the longitude as a 1D function.
Use Earth Mover's Distance to compare the 1D functions.
[Figure: water vapor mixing ratio versus longitude]
Notes: The other data set is a simulation of the Madden-Julian Oscillation (MJO). MJO is a periodic weather phenomenon in the tropics. Its main characteristic is the eastward progression of tropical cloud and rainfall. This simulation was generated by Hagos and Leung at PNNL. It is a 3D volumetric data set containing 479 time steps. Our goal is to detect the typical stages of the eastward cloud movement. In the experiment, at each time step, we extract the water vapor mixing ratio along the longitude as a 1D function. We use the 1D function to represent this time step, use Earth Mover's Distance to compare the 1D functions of each pair of time steps, and take the result as the dissimilarity.
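For 1D functions on a shared grid, Earth Mover's Distance reduces to the accumulated gap between the two cumulative distributions. The sketch below assumes non-negative values sampled at the same uniformly spaced longitudes, each function normalized to unit total mass, and unit spacing between samples; the talk does not describe the normalization, and the function name is ours.

```python
def earth_movers_distance_1d(f, g):
    """EMD between two non-negative 1D functions sampled on the same uniform grid."""
    if len(f) != len(g):
        raise ValueError("both functions must have the same number of samples")
    total_f, total_g = sum(f), sum(g)
    if total_f <= 0 or total_g <= 0:
        raise ValueError("both functions must have positive total mass")
    cdf_gap = 0.0   # running difference between the two normalized CDFs
    distance = 0.0
    for a, b in zip(f, g):
        cdf_gap += a / total_f - b / total_g
        # Mass still waiting to be moved past this grid point.
        distance += abs(cdf_gap)
    return distance


# All the mass sits at the western end in f and at the eastern end in g,
# so it has to travel two grid cells.
print(earth_movers_distance_1d([1, 0, 0], [0, 0, 1]))
```

Unlike a pointwise difference, this measure grows with how far the cloud mass has shifted along the longitude, which suits a phenomenon defined by eastward propagation.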

Case Study 2: Result
Typical stages of eastward cloud movement are detected from the longitude distribution plots of 7 key time steps.
Detected time intervals correspond to the typical stages and the dissimilarity matrix.
Key time steps: 25, 145, 171, 214, 269, 356, 463
[Figure: dissimilarity matrix and longitude distribution plots of the key time steps]
Notes: After running our algorithm, we detected the typical stages of eastward cloud movement from the longitude distribution plots of 7 key time steps. On these 7 plots, the x axis is longitude and the y axis is the water vapor value. In time step 25, we can see that the cloud starts from the west side, and it moves to the middle in time step 145. Then, in time step 171, the cloud moves to the east side. These three time steps form one MJO cycle. Another MJO cycle is detected by the following three key time steps. As in the previous example, the detected time intervals correspond very well to the blue, low-value regions on the diagonal of the dissimilarity matrix.

Key Time Steps Based Time Varying Data Browser
Components
Dissimilarity matrix (only the rows of key time steps).
Key time steps and the time intervals they represent.
Warping path of the nonlinear mapping.
2D/3D visualization of key time steps.
[Figure: example on the MJO data set]
Notes: To make use of the key time step selection results from our algorithm, we designed this key-time-step-based time-varying data browser for interactive analysis of time-varying data sets. The browser shows a subset of the dissimilarity matrix as a heat map. It also shows the key time steps as red digits and the time interval boundaries as white digits. The path of the nonlinear mapping is shown as a yellow line. For example, time steps 151 to 225 are represented by time step 191. We also have the 2D/3D visualization of the key time steps below the dissimilarity matrix. Using the MJO data set as an example, below the dissimilarity matrix we show the cloud distribution over the Indonesia area for the 5 key time steps.

Key Time Steps Based Time Varying Data Browser (cont’d)
Supports interactive exploration of different key time steps within different time intervals.
Notes: This system allows us to interactively explore different numbers of key time steps within different time intervals. Starting from the previous selection, if we want more information in an interesting time interval, say [145, 230], we can zoom in and pick another 5 key time steps from this time interval. In this way, we can explore the time-varying data set interactively and recursively until we reach the feature and time step we are interested in.

Performance
Experiment platform
Intel Core i CPU with 16GB system memory.
Dissimilarity matrix computation
Related to the complexity of the dissimilarity metric.
Time step selection
Fast enough for interactive reselection of time steps.
Cubic in the number of time steps.
Number of key time steps    Percentage of all time steps    Computing time (seconds)
10                          2%
50                          10%
100                         20%
450                         93%
[Figure: computing time (seconds) versus number of key time steps]
Notes: We evaluated our algorithm on our machine. The computation time of the dissimilarity matrix depends on the complexity of the dissimilarity metric. The computation time of our key time step selection algorithm is very short, less than 0.19 seconds for 450 time steps, which is fast enough for interactive selection of key time steps. It is cubic in the number of time steps.

Conclusion
Contribution
A dynamic programming algorithm to identify a globally optimal subset of time steps from large-scale time-varying data.
Dynamic reselection of time steps for interactive exploration.
Future Work
Apply time step selection to different spatial regions.
In situ data reduction based on a partial dissimilarity matrix.
Notes: We developed a dynamic programming algorithm to identify a globally optimal subset of time steps from large-scale time-varying data. This algorithm allows us to interactively explore a time-varying data set through dynamic reselection of key time steps. As for future work: currently, we test the algorithm on the entire spatial domain. In the future, we will evaluate key time steps in different spatial regions separately. We also plan to apply the algorithm to in situ data reduction based on a partial dissimilarity matrix.

Acknowledgements
We thank the anonymous reviewers for their comments.
This work was supported in part by NSF grant IIS , US Department of Energy DOE-SC , Battelle Contract No , and Department of Energy SciDAC grant DE-FC02-06ER25779, program manager Lucy Nowell.
Datasets
D. Whalen (LANL) and M. L. Norman (SDSC). Competition data set and description. In 2008 IEEE Visualization Design Contest, 2008.
S. Hagos and R. Leung (PNNL). Moist thermodynamics of the Madden-Julian oscillation in a cloud-resolving simulation. Journal of Climate, 24:5571–5583, 2011.
Any Questions?
