Abdullah Mueen Eamonn Keogh University of California, Riverside.

Slides:



Advertisements
Similar presentations
Indexing Time Series Based on original slides by Prof. Dimitrios Gunopulos and Prof. Christos Faloutsos with some slides from tutorials by Prof. Eamonn.
Advertisements

SAX: a Novel Symbolic Representation of Time Series
Probabilistic Skyline Operator over Sliding Windows Wenjie Zhang University of New South Wales & NICTA, Australia Joint work: Xuemin Lin, Ying Zhang, Wei.
Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
Incremental Clustering Previous clustering algorithms worked in “batch” mode: processed all points at essentially the same time. Some IR applications cluster.
Doruk Sart, Abdullah Mueen, Walid Najjar, Eamonn Keogh, Vit Niennatrakul 1.
Searching on Multi-Dimensional Data
Distributed Indexed Outlier Detection Algorithm Status Update as of March 11, 2014.
Efficient Anomaly Monitoring over Moving Object Trajectory Streams joint work with Lei Chen (HKUST) Ada Wai-Chee Fu (CUHK) Dawei Liu (CUHK) Yingyi Bu (Microsoft)
1 NNH: Improving Performance of Nearest- Neighbor Searches Using Histograms Liang Jin (UC Irvine) Nick Koudas (AT&T Labs Research) Chen Li (UC Irvine)
Mining Time Series.
Introduction to Bioinformatics
Common approach 1. Define space: assign random ID (160-bit) to each node and key 2. Define a metric topology in this space,  that is, the space of keys.
VLSH: Voronoi-based Locality Sensitive Hashing Sung-eui Yoon Authors: Lin Loi, Jae-Pil Heo, Junghwan Lee, and Sung-Eui Yoon KAIST
A Novel Scheme for Video Similarity Detection Chu-Hong Hoi, Steven March 5, 2003.
Automatic Identification of ROIs (Regions of interest) in fMRI data.
Stabbing the Sky: Efficient Skyline Computation over Sliding Windows COMP9314 Lecture Notes.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Themis Palpanas1 VLDB - Aug 2004 Fair Use Agreement This agreement covers the use of all slides on this CD-Rom, please read carefully. You may freely use.
Abdullah Mueen UC Riverside Suman Nath Microsoft Research Jie Liu Microsoft Research.
Efficient Query Filtering for Streaming Time Series
Based on Slides by D. Gunopulos (UCR)
1 Gigabit Rate Multiple- Pattern Matching with TCAM Fang Yu Randy H. Katz T. V. Lakshman
Finding Time Series Motifs on Disk-Resident Data
Face Processing System Presented by: Harvest Jang Group meeting Fall 2002.
CIS786, Lecture 8 Usman Roshan Some of the slides are based upon material by Dennis Livesay and David.
1 Dot Plots For Time Series Analysis Dragomir Yankov, Eamonn Keogh, Stefano Lonardi Dept. of Computer Science & Eng. University of California Riverside.
Computer Vision James Hays, Brown
FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space
Neighbourhood Sampling for Local Properties on a Graph Stream A. Pavan, Iowa State University Kanat Tangwongsan, IBM Research Srikanta Tirthapura, Iowa.
Cluster and Grid Computing Lab, Huazhong University of Science and Technology, Wuhan, China Supporting VCR Functions in P2P VoD Services Using Ring-Assisted.
The X-Tree An Index Structure for High Dimensional Data Stefan Berchtold, Daniel A Keim, Hans Peter Kriegel Institute of Computer Science Munich, Germany.
Graph Algorithms. Definitions and Representation An undirected graph G is a pair (V,E), where V is a finite set of points called vertices and E is a finite.
BRAID: Stream Mining through Group Lag Correlations Yasushi Sakurai Spiros Papadimitriou Christos Faloutsos SIGMOD 2005.
Adaptive Multi-path Prediction for Error Resilient H.264 Coding Xiaosong Zhou, C.-C. Jay Kuo University of Southern California Multimedia Signal Processing.
Mining Time Series.
A fast algorithm for the generalized k- keyword proximity problem given keyword offsets Sung-Ryul Kim, Inbok Lee, Kunsoo Park Information Processing Letters,
Computer Science and Engineering Efficiently Monitoring Top-k Pairs over Sliding Windows Presented By: Zhitao Shen 1 Joint work with Muhammad Aamir Cheema.
Using Traveling Salesman Problem Algorithms to Determine Multiple Sequence Alignment Orders Weiwei Zhong.
University of Macau, Macau
Discovering Deformable Motifs in Time Series Data Jin Chen CSE Fall 1.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Open Problem: Dynamic Planar Nearest Neighbors CSCE 620 Problem 63 from the Open Problems Project
Tamanna Chhabra, M. Oguzhan Kulekci, and Jorma Tarhio Aalto University.
Semi-Supervised Time Series Classification Li Wei Eamonn Keogh University of California, Riverside {wli,
Bin Yao (Slides made available by Feifei Li) R-tree: Indexing Structure for Data in Multi- dimensional Space.
Epitomic Location Recognition A generative approach for location recognition K. Ni, A. Kannan, A. Criminisi and J. Winn In proc. CVPR Anchorage,
Exact indexing of Dynamic Time Warping
Fast Shapelets: All Figures in Higher Resolution.
Stream Monitoring under the Time Warping Distance Yasushi Sakurai (NTT Cyber Space Labs) Christos Faloutsos (Carnegie Mellon Univ.) Masashi Yamamuro (NTT.
Monitoring k-NN Queries over Moving Objects Xiaohui Yu University of Toronto Joint work with Ken Pu and Nick Koudas.
The Keogh Lab 1 Presented by Abdullah Mueen. Overview of our work Our Goal: Extract information from raw, noisy, massive, unstructured data. We develop.
Magic Camera Master’s Project Defense By Adam Meadows Project Committee: Dr. Eamonn Keogh Dr. Doug Tolbert.
Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy Nurjahan BegumLiudmila Ulanova Jun Wang 1 Eamonn Keogh University.
Indexing Time Series. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases Time Series databases Text databases.
Similarity Measurement and Detection of Video Sequences Chu-Hong HOI Supervisor: Prof. Michael R. LYU Marker: Prof. Yiu Sang MOON 25 April, 2003 Dept.
Distance Vector Routing
ITree: Exploring Time-Varying Data using Indexable Tree Yi Gu and Chaoli Wang Michigan Technological University Presented at IEEE Pacific Visualization.
Mustafa Gokce Baydogan, George Runger and Eugene Tuv INFORMS Annual Meeting 2011, Charlotte A Bag-of-Features Framework for Time Series Classification.
1 Bilinear Classifiers for Visual Recognition Computational Vision Lab. University of California Irvine To be presented in NIPS 2009 Hamed Pirsiavash Deva.
Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets Chin-Chia Michael Yeh, Yan.
Matrix Profile II: Exploiting a Novel Algorithm and GPUs to break the one Hundred Million Barrier for Time Series Motifs and Joins Yan Zhu, Zachary Zimmerman,
A comparison of Ad-Hoc Routing Protocols
Enumeration of Time Series Motifs of All Lengths
Bioinformatics: The pair-wise alignment problem
Jin Shieh and Eamonn Keogh University of California - Riverside
We understand classification algorithms in terms of the expressiveness or representational power of their decision boundaries. However, just because your.
Why Compress? To reduce the volume of data to be transmitted (text, fax, images) To reduce the bandwidth required for transmission and to reduce storage.
Online Analytical Processing Stream Data: Is It Feasible?
Liang Jin (UC Irvine) Nick Koudas (AT&T Labs Research)
Presentation transcript:

Abdullah Mueen Eamonn Keogh University of California, Riverside

Repeated pattern in a time series Utility: – Activity/Event discovery – Summarization – Classification Application – Data Center Chiller Management (Patnaik, et al. KDD 09) – Designing Smart Cane for Elders (Vahdatpour, et al. IJCAI 09)

Streaming time series Sliding window of the recent history – What minute long trace repeated in the last hour?

Discovery The most similar pair of non-overlapping subsequences Maintenance time: time: time: time: time:1005 Update motif pair after every time tick time

A subsequence is a high dimensional point – The dynamic closest pair of points problem Closest pair may change upon every update Naïve approach: Do quadratic comparisons Deletion Insertion

Goal: Algorithm with Linear update time Previous method for dynamic closest pair (Eppstein,00) – A matrix of all-pair distances is maintained O( w 2 ) space required – Quad-tree is used to update the matrix Maintain a set of neighbors and reverse neighbors for all points We do it in O( ) space

Methods Evaluation Extensions Case Studies Conclusion

Smallest nearest neighbor → Closest pair Upon insertion – Find the nearest neighbor; Needs O( w ) comparisons. Upon deletion – Find the next NN of all the reverse NN Insertion Deletion 9

FIFO Sliding window N-lists Neighbors in order of distances R-lists Points that should be updated Total number of nodes in R-lists is w Still O( w 2 ) space (ptr, distance)

While inserting  Updating NN of old points is not necessary  A point can be removed from the neighbor list if it violates the temporal order Average length O( )

Fast Pair Insect EEG EOG RW Average Update Time (Sec) x 10 4 Window Size (w) Insect EEG EOG RW Fast Pair Average Update Time (Sec) Motif Length (m) x Insect EEG EOG RW Average Length of N-list Window Size (w) Insect EEG EOG RW Motif Length (m) Average Length of N-list

Methods Evaluation Extensions Case Studies Conclusion

Multidimensional Motif Maintain motif in Multiple Synchronous time series Points in one series can have neighbors in others Arbitrary Data Rate Load shedding: Skip points if can’t handle Fraction of Data Used Fraction of Motifs Discovered EOG Random Walk EEG Insect FIFO for each streams

Replace all the occurrences of a motif by a pointer to that motif. For signals with regular repetitions – Higher compression rate with less error. Dictionary

Check if a robot closed a loop Convert the stream of video frames to stream of color histograms. Most similar color histograms are good candidates for loop detection Start Loop Closure Detected

First attempt to maintain time series motif online – Maintains minutes long repetition in hours long sliding window Linear update time with less space cost than quad-tree based method – O( ) Vs O(w 2 ) Faster than general dynamic closest pair solution

Code and Data: