Stream Monitoring under the Time Warping Distance Yasushi Sakurai (NTT Cyber Space Labs) Christos Faloutsos (Carnegie Mellon Univ.) Masashi Yamamuro (NTT.

Slides:



Advertisements
Similar presentations
Indexing Time Series Based on original slides by Prof. Dimitrios Gunopulos and Prof. Christos Faloutsos with some slides from tutorials by Prof. Eamonn.
Advertisements

The A-tree: An Index Structure for High-dimensional Spaces Using Relative Approximation Yasushi Sakurai (NTT Cyber Space Laboratories) Masatoshi Yoshikawa.
Learning Trajectory Patterns by Clustering: Comparative Evaluation Group D.
Probabilistic Skyline Operator over Sliding Windows Wenjie Zhang University of New South Wales & NICTA, Australia Joint work: Xuemin Lin, Ying Zhang, Wei.
Word Spotting DTW.
Discovering Lag Interval For Temporal Dependencies Larisa Shwartz Liang Tang, Tao Li, Larisa Shwartz1 Liang Tang, Tao Li
Dynamic Programming Nithya Tarek. Dynamic Programming Dynamic programming solves problems by combining the solutions to sub problems. Paradigms: Divide.
Fast Algorithms For Hierarchical Range Histogram Constructions
Yasuhiro Fujiwara (NTT Cyber Space Labs)
Doruk Sart, Abdullah Mueen, Walid Najjar, Eamonn Keogh, Vit Niennatrakul 1.
Carnegie Mellon DB/IR '06C. Faloutsos#1 Data Mining on Streams Christos Faloutsos CMU.
Efficient Anomaly Monitoring over Moving Object Trajectory Streams joint work with Lei Chen (HKUST) Ada Wai-Chee Fu (CUHK) Dawei Liu (CUHK) Yingyi Bu (Microsoft)
Efficient Distribution Mining and Classification Yasushi Sakurai (NTT Communication Science Labs), Rosalynn Chong (University of British Columbia), Lei.
Similarity Search for Adaptive Ellipsoid Queries Using Spatial Transformation Yasushi Sakurai (NTT Cyber Space Laboratories) Masatoshi Yoshikawa (Nara.
Streaming Pattern Discovery in Multiple Time-Series Spiros Papadimitriou Jimeng Sun Christos Faloutsos Carnegie Mellon University VLDB 2005, Trondheim,
A Generic Framework for Handling Uncertain Data with Local Correlations Xiang Lian and Lei Chen Department of Computer Science and Engineering The Hong.
74 th EAGE Conference & Exhibition incorporating SPE EUROPEC 2012 Automated seismic-to-well ties? Roberto H. Herrera and Mirko van der Baan University.
SIGMOD 2006University of Alberta1 Approximately Detecting Duplicates for Streaming Data using Stable Bloom Filters Presented by Fan Deng Joint work with.
Stabbing the Sky: Efficient Skyline Computation over Sliding Windows COMP9314 Lecture Notes.
Quantile-Based KNN over Multi- Valued Objects Wenjie Zhang Xuemin Lin, Muhammad Aamir Cheema, Ying Zhang, Wei Wang The University of New South Wales, Australia.
WindMine: Fast and Effective Mining of Web-click Sequences SDM 2011Y. Sakurai et al.1 Yasushi Sakurai (NTT) Lei Li (Carnegie Mellon Univ.) Yasuko Matsubara.
Themis Palpanas1 VLDB - Aug 2004 Fair Use Agreement This agreement covers the use of all slides on this CD-Rom, please read carefully. You may freely use.
Efficient Similarity Search in Sequence Databases Rakesh Agrawal, Christos Faloutsos and Arun Swami Leila Kaghazian.
Abdullah Mueen UC Riverside Suman Nath Microsoft Research Jie Liu Microsoft Research.
Overview Of Clustering Techniques D. Gunopulos, UCR.
Distance Functions for Sequence Data and Time Series
Based on Slides by D. Gunopulos (UCR)
A Multiresolution Symbolic Representation of Time Series
Privacy Preservation for Data Streams Feifei Li, Boston University Joint work with: Jimeng Sun (CMU), Spiros Papadimitriou, George A. Mihaila and Ioana.
Spatial and Temporal Databases Efficiently Time Series Matching by Wavelets (ICDE 98) Kin-pong Chan and Ada Wai-chee Fu.
1 Theory I Algorithm Design and Analysis (11 - Edit distance and approximate string matching) Prof. Dr. Th. Ottmann.
Fast Subsequence Matching in Time-Series Databases Christos Faloutsos M. Ranganathan Yannis Manolopoulos Department of Computer Science and ISR University.
Time Series I.
Qualitative approximation to Dynamic Time Warping similarity between time series data Blaž Strle, Martin Možina, Ivan Bratko Faculty of Computer and Information.
Fast and Exact Monitoring of Co-evolving Data Streams Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Naonori Ueda (NTT) Masatoshi Yoshikawa (Kyoto.
Dynamic Time Warping Algorithm for Gene Expression Time Series
Dynamic Programming. Well known algorithm design techniques:. –Divide-and-conquer algorithms Another strategy for designing algorithms is dynamic programming.
BRAID: Discovering Lag Correlations in Multiple Streams Yasushi Sakurai (NTT Cyber Space Labs) Spiros Papadimitriou (Carnegie Mellon Univ.) Christos Faloutsos.
Fast Mining and Forecasting of Complex Time-Stamped Events Yasuko Matsubara (Kyoto University), Yasushi Sakurai (NTT), Christos Faloutsos (CMU), Tomoharu.
Incorporating Dynamic Time Warping (DTW) in the SeqRec.m File Presented by: Clay McCreary, MSEE.
BRAID: Stream Mining through Group Lag Correlations Yasushi Sakurai Spiros Papadimitriou Christos Faloutsos SIGMOD 2005.
Shape-based Similarity Query for Trajectory of Mobile Object NTT Communication Science Laboratories, NTT Corporation, JAPAN. Yutaka Yanagisawa Jun-ichi.
Computer Science and Engineering Efficiently Monitoring Top-k Pairs over Sliding Windows Presented By: Zhitao Shen 1 Joint work with Muhammad Aamir Cheema.
Distributed Spatio-Temporal Similarity Search Demetrios Zeinalipour-Yazti University of Cyprus Song Lin
Identifying Patterns in Time Series Data Daniel Lewis 04/06/06.
Spatio-temporal Pattern Queries M. Hadjieleftheriou G. Kollios P. Bakalov V. J. Tsotras.
ICDE, San Jose, CA, 2002 Discovering Similar Multidimensional Trajectories Michail VlachosGeorge KolliosDimitrios Gunopulos UC RiversideBoston UniversityUC.
k-Shape: Efficient and Accurate Clustering of Time Series
Lei Li Computer Science Department Carnegie Mellon University Pre Proposal Time Series Learning completed work 11/27/2015.
Exact indexing of Dynamic Time Warping
D-skyline and T-skyline Methods for Similarity Search Query in Streaming Environment Ling Wang 1, Tie Hua Zhou 1, Kyung Ah Kim 2, Eun Jong Cha 2, and Keun.
Streaming Pattern Discovery in Multiple Time-Series Jimeng Sun Spiros Papadimitrou Christos Faloutsos PARALLEL DATA LABORATORY Carnegie Mellon University.
1 Dynamic Time Warping and Minimum Distance Paths for Speech Recognition Isolated word recognition: Task : Want to build an isolated ‘word’ recogniser.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology ACM SIGMOD1 Subsequence Matching on Structured Time Series.
Enabling Real Time Alerting through streaming pattern discovery Chengyang Zhang Computer Science Department University of North Texas 11/21/2016 CRI Group.
Carnegie Mellon School of Computer Science Forecasting with Cyber-physical Interactions in Data Centers Lei Li PDL Seminar 9/28/2011.
Fast Subsequence Matching in Time-Series Databases.
Forecasting with Cyber-physical Interactions in Data Centers (part 3)
Abolfazl Asudeh Azade Nazi Nan Zhang Gautam DaS
RE-Tree: An Efficient Index Structure for Regular Expressions
Majkowska University of California. Los Angeles
Distance Functions for Sequence Data and Time Series
Supporting Fault-Tolerance in Streaming Grid Applications
Kijung Shin1 Mohammad Hammoud1
Spatio-temporal Pattern Queries
Distance Functions for Sequence Data and Time Series
Time Series Filtering Time Series
Online Analytical Processing Stream Data: Is It Feasible?
Efficient Processing of Top-k Spatial Preference Queries
Presentation transcript:

Stream Monitoring under the Time Warping Distance Yasushi Sakurai (NTT Cyber Space Labs) Christos Faloutsos (Carnegie Mellon Univ.) Masashi Yamamuro (NTT Cyber Space Labs)

ICDE 2007 Y. Sakurai et al 2 Introduction Data-stream applications  Network analysis  Sensor monitoring  Financial data analysis  Moving object tracking Goal  Monitor numerical streams  Find subsequences similar to the given query sequence  Distance measure: Dynamic Time Warping (DTW)

ICDE 2007 Y. Sakurai et al 3 Introduction DTW is computed by dynamic programming  Stretch sequences along the time axis to minimize the distance  Warping path: set of grid cells in the time warping matrix X Y xNxN yMyM xixi yjyj y1y1 x1x1 Time warping matrix Optimal warping path (the best alignment )

ICDE 2007 Y. Sakurai et al 4 Related Work Sequence indexing, subsequence matching  Agrawal et al. (FODO 1998)  Keogh et al. (SIGMOD 2001)  Faloutsos et al. (SIGMOD 1994)  Moon et al. (SIGMOD 2002) Fast sequence matching for DTW  Yi et al. (ICDE 1998)  Keogh (VLDB 2002)  Zhu et al. (SIGMOD 2003)  Sakurai et al. (PODS 2005)

ICDE 2007 Y. Sakurai et al 5 Related Work Data stream processing for pattern discovery  Clustering for data streams Guha et al. (TKDE 2003)  Monitoring multiple streams Zhu et al. (VLDB 2002)  Forecasting Papadimitriou et al. (VLDB 2003)  Detecting lag correlations Sakurai et al. (SIGMOD 2005) DTW has been studied for finite, stored sequence sets We address a new problem for DTW

ICDE 2007 Y. Sakurai et al 6 Overview Introduction / Related work Problem definition Main ideas Experimental results

ICDE 2007 Y. Sakurai et al 7 Problem Definition Subsequence matching for data streams  (Fixed-length) query sequence Y=(y 1, y 2,…, y m )  Sequence (data stream) X=(x 1, x 2,…, x n )  Find all subsequences X[t s,t e ] such that

ICDE 2007 Y. Sakurai et al 8 Subsequence Matching X[t s :t e ] xtexte xtsxts x1x1 xnxn X ymym Y y1y1 Other similar subsequences Redundant, useless subsequences

ICDE 2007 Y. Sakurai et al 9 Problem Definition Subsequence matching for data streams  (Fixed-length) query sequence Y  Sequence (data stream) X=(x 1, x 2,…, x n )  Find all subsequence X[t s,t e ] such that Multiple matches by subsequences which heavily overlap with the “local minimum” best match [ double harm ]  Flood the user with redundant information  Slow down the algorithm by forcing it to keep track of and report all these useless “solutions” Eliminate the redundant subsequences, and report only the “optimal” ones

ICDE 2007 Y. Sakurai et al 10 Problem Definition Problem: Disjoint query  Given a threshold , report all X[t s :t e ] such that Only the local minimum is the smallest value in the group of overlapping subsequences that satisfy the first condition Additional challenges: streaming solution  Process a new value of X efficiently  Guarantee no false dismissals  Report each match as early as possible

ICDE 2007 Y. Sakurai et al 11 Overview Introduction / Related work Problem definition Main ideas Experimental results

ICDE 2007 Y. Sakurai et al 12 Compute the time warping matrices starting from every time-tick  Need O(n) matrices, O(nm) time per time-tick Disjoint query  Compute all the possible subsequences and then choose the optimal ones Y X xtsxts xtexte x1x1 Why not ‘ naive ’ ? Capture the optimal subsequence starting from t = t s

ICDE 2007 Y. Sakurai et al 13 Main idea (1) Star-padding  Use only a single matrix (the naïve solution uses n matrices)  Prefix Y with ‘ * ’, that always gives zero distance  instead of Y=(y 1, y 2, …, y m ), compute distances with Y’  O(m) time and space (the naïve requires O(nm) )

ICDE 2007 Y. Sakurai et al 14 X t=t s t=t e Report X[t s :t e ] Second subsequence t=1 SPRING Start at zero distance on every bottom row

ICDE 2007 Y. Sakurai et al 15 Main idea (2) STWM (Subsequence Time Warping Matrix)  Problem of the star-padding: we lose the information about the starting time-tick of the match  After the scan, “which is the optimal subsequence?” Elements of STWM  Distance value of each subsequence  Starting position Combination of star-padding and STWM  Efficiently identify the optimal subsequence in a stream fashion

ICDE 2007 Y. Sakurai et al 16 Main idea (3) Algorithm for disjoint queries Designed to:  Guarantee no false dismissals  Report each match as early as possible

ICDE 2007 Y. Sakurai et al 17 Algorithm for disjoint queries 1. Update m elements (distance and starting position) at every time-tick 2. Keep track of the minimum distance d min when a subsequence within  is found 3. Report the subsequence that gives d min if (a) and (b) are satisfied (a) the captured optimal subsequence cannot be replaced by the upcoming subsequences (b) the upcoming subsequences dot not overlap with the captured optimal subsequence

ICDE 2007 Y. Sakurai et al 18 Algorithm for disjoint queries distance (upper number), starting position (number in parentheses) X=(5,12,6,10,6,5,13), Y=(11,6,9,4),  = 20 y 4 = 4 54 (1) 110 (2) 14 (2) 38 (2) 6 (2) 7 (2) 88 (2) y 3 = 9 53 (1) 46 (2) 10 (2) 2 (2) 10 (4) 17 (4) 18 (4) y 2 = 6 37 (1) 37 (2) 1 (2) 17 (4) 1 (4) 2 (4) 51 (4) y 1 = (1) 1 (2) 25 (3) 1 (4) 25 (5) 36 (6) 4 (7) xtxt t

ICDE 2007 Y. Sakurai et al 19 Algorithm for disjoint queries distance (upper number), starting position (number in parentheses) X=(5,12,6,10,6,5,13), Y=(11,6,9,4),  = 20 optimal subsequence, redundant subsequences y 4 = 4 54 (1) 110 (2) 14 (2) 38 (2) 6 (2) 7 (2) 88 (2) y 3 = 9 53 (1) 46 (2) 10 (2) 2 (2) 10 (4) 17 (4) 18 (4) y 2 = 6 37 (1) 37 (2) 1 (2) 17 (4) 1 (4) 2 (4) 51 (4) y 1 = (1) 1 (2) 25 (3) 1 (4) 25 (5) 36 (6) 4 (7) xtxt t

ICDE 2007 Y. Sakurai et al 20 Algorithm for disjoint queries distance (upper number), starting position (number in parentheses) X=(5,12,6,10,6,5,13), Y=(11,6,9,4),  = 20 optimal subsequence, redundant subsequences y 4 = 4 54 (1) 110 (2) 14 (2) 38 (2) 6 (2) 7 (2) 88 (2) y 3 = 9 53 (1) 46 (2) 10 (2) 2 (2) 10 (4) 17 (4) 18 (4) y 2 = 6 37 (1) 37 (2) 1 (2) 17 (4) 1 (4) 2 (4) 51 (4) y 1 = (1) 1 (2) 25 (3) 1 (4) 25 (5) 36 (6) 4 (7) xtxt t

ICDE 2007 Y. Sakurai et al 21 Algorithm for disjoint queries y 4 = 4 54 (1) 110 (2) 14 (2) 38 (2) 6 (2) 7 (2) 88 (2) y 3 = 9 53 (1) 46 (2) 10 (2) 2 (2) 10 (4) 17 (4) 18 (4) y 2 = 6 37 (1) 37 (2) 1 (2) 17 (4) 1 (4) 2 (4) 51 (4) y 1 = (1) 1 (2) 25 (3) 1 (4) 25 (5) 36 (6) 4 (7) xtxt t distance (upper number), starting position (number in parentheses) X=(5,12,6,10,6,5,13), Y=(11,6,9,4),  = 20 optimal subsequence, redundant subsequences

ICDE 2007 Y. Sakurai et al 22 Algorithm for disjoint queries Guarantee to report the optimal subsequence (a) The captured optimal subsequence cannot be replaced (b) The upcoming subsequences do not overlap with the captured optimal subsequence y 4 = 4 54 (1) 110 (2) 14 (2) 38 (2) 6 (2) 7 (2) 88 (2) y 3 = 9 53 (1) 46 (2) 10 (2) 2 (2) 10 (4) 17 (4) 18 (4) y 2 = 6 37 (1) 37 (2) 1 (2) 17 (4) 1 (4) 2 (4) 51 (4) y 1 = (1) 1 (2) 25 (3) 1 (4) 25 (5) 36 (6) 4(7)4(7) xtxt t

ICDE 2007 Y. Sakurai et al 23 Algorithm for disjoint queries Guarantee to report the optimal subsequence  Finally report the optimal subsequence X[2:5] at t=7  Initialize the distance values (d 2 =51, d 3 =18, d 4 =88) y 4 = 4 54 (1) 110 (2) 14 (2) 38 (2) 6 (2) 7 (2) 88 (2) y 3 = 9 53 (1) 46 (2) 10 (2) 2 (2) 10 (4) 17 (4) 18 (4) y 2 = 6 37 (1) 37 (2) 1 (2) 17 (4) 1 (4) 2 (4) 51 (4) y 1 = (1) 1 (2) 25 (3) 1 (4) 25 (5) 36 (6) 4 (7) xtxt t

ICDE 2007 Y. Sakurai et al 24 Overview Introduction / Related work Problem definition Main ideas Experimental results

ICDE 2007 Y. Sakurai et al 25 Experimental Results Experiments with real and synthetic data sets  MaskedChirp, Temperature, Kursk, Sunspots Evaluation  Accuracy for pattern discovery  Computation time  (Memory space consumption)

ICDE 2007 Y. Sakurai et al 26 Pattern Discovery MaskedChirp Query sequenceData stream

ICDE 2007 Y. Sakurai et al 27 Pattern Discovery MaskedChirp SPRING identifies all sound parts with varying time periods Query sequenceData stream The output time of each captured subsequence is very close to its end position

ICDE 2007 Y. Sakurai et al 28 Pattern Discovery Temperature Query sequenceData stream

ICDE 2007 Y. Sakurai et al 29 Pattern Discovery Temperature Query sequenceData stream SPRING finds the days when the temperature fluctuates from cool to hot

ICDE 2007 Y. Sakurai et al 30 Pattern Discovery Kursk Query sequenceData stream

ICDE 2007 Y. Sakurai et al 31 Pattern Discovery Kursk Query sequenceData stream SPRING is not affected by the difference in the environmental conditions

ICDE 2007 Y. Sakurai et al 32 Pattern Discovery Sunspots Query sequenceData stream

ICDE 2007 Y. Sakurai et al 33 Pattern Discovery Sunspots Query sequenceData stream SPRING can capture bursty periods and identify the time- varying periodicity

ICDE 2007 Y. Sakurai et al 34 Computation time Wall clock time per time-tick  Naïve method: O(nm)  SPRING: O(m) , not depend on sequence length n

ICDE 2007 Y. Sakurai et al 35 Extension to multiple streams Motion capture data  Place special markers on the joints of a human actor  Record their x-, y-, z-velocities  Use 16-dimensional sequences  Capture motions based on the similarity of rotational energy E rotation : rotational energy I : moment of inertia  : angular velocity

ICDE 2007 Y. Sakurai et al 36 High-speed Motion Capture

ICDE 2007 Y. Sakurai et al 37 High-speed Motion Capture Recognize all motions in a stream fashion  Entertainment applications, etc Walk Swing Rotate Swing Rotate One-leg jump Jump Walk Run Walk

ICDE 2007 Y. Sakurai et al 38 Conclusions Subsequence matching under the DTW distance over data streams 1. High-speed, and low memory consumption  O(m) time and space; not depend on n 2. Accuracy  Guarantee no false dismissals Stored data sets  SPRING can be applied to stored sequence sets

Appendix

ICDE 2007 Y. Sakurai et al 40 Mini-introduction to DTW DTW allows sequences to be stretched along the time axis  Minimize the distance of sequences  Insert ‘stutters’ into a sequence  THEN compute the (Euclidean) distance ‘stutters’:original

ICDE 2007 Y. Sakurai et al 41 Mini-introduction to DTW DTW is computed by dynamic programming  Warping path: set of grid cells in the time warping matrix data sequence P of length N query sequence Q of length M pNpN qMqM pipi qjqj q1q1 p1p1 p -stutters q -stutters Optimum warping path (the best alignment )

ICDE 2007 Y. Sakurai et al 42 Mini-introduction to DTW q-stutter no stutter p-stutter DTW is computed by dynamic programming p 1, p 2, …, p i,; q 1, q 2, …, q j

ICDE 2007 Y. Sakurai et al 43 Pattern Discovery Humidity Query sequenceData stream

ICDE 2007 Y. Sakurai et al 44 Pattern Discovery Humidity Query sequenceData stream

ICDE 2007 Y. Sakurai et al 45 Two Algorithms of SPRING SPRING-optimal  = 10,000  = 15,000 Query sequence

ICDE 2007 Y. Sakurai et al 46 Two Algorithms of SPRING SPRING-first  = 10,000  = 15,000 Query sequence

ICDE 2007 Y. Sakurai et al 47 Memory space consumption Memory space for time warping matrix (matrices)  Naïve method: O(nm)  SPRING: O(m) , not depend on sequence length n  SPRING (path): clearly lower than that of the naïve method