Efficient Anomaly Monitoring over Moving Object Trajectory Streams
Joint work with Lei Chen (HKUST), Ada Wai-Chee Fu (CUHK), Dawei Liu (CUHK), Yingyi Bu (Microsoft)



Outline
- Introduction
- Problem Statement
- Batch Monitoring
- Piecewise Index and Rescheduling
- Experiments
- Conclusion

Motivating Example (1)
A strange trajectory!

Motivating Example (2)

Problem Statement (1)
Base window: of length w_b
Left sliding window: of length w_l
Right sliding window: of length w_r
Detecting anomalies: look forward and backward

Problem Statement (2)
Distance between two base windows: Euclidean distance (can be extended to any metric)
Neighbor of Q: a base window C with Distance(Q, C) < d
Trajectory stream anomaly (for base window Q)
- N1: number of Q's neighbors in its left sliding window
- N2: number of Q's neighbors in its right sliding window
- If N1 + N2 < k, Q is an anomaly
k and d are parameters
Problem: at every time tick, check whether a base window is an anomaly.
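The definition above can be written out directly. The following is a minimal sketch, not the authors' implementation. It assumes that a trajectory is a list of (x, y) points, that the left sliding window is the w_l points immediately before Q, and that the right sliding window is the w_r points immediately after Q. The helper names (euclidean, base_windows, is_anomaly) are hypothetical.

```python
import math


def euclidean(q, c):
    """Euclidean distance between two equal-length base windows of (x, y) points."""
    return math.sqrt(sum((qx - cx) ** 2 + (qy - cy) ** 2
                         for (qx, qy), (cx, cy) in zip(q, c)))


def base_windows(stream, start, end, w_b):
    """All base windows of length w_b that lie fully inside stream[start:end]."""
    start, end = max(start, 0), min(end, len(stream))
    return [stream[i:i + w_b] for i in range(start, end - w_b + 1)]


def is_anomaly(stream, pos, w_b, w_l, w_r, k, d):
    """Check the base window Q = stream[pos:pos + w_b] against the definition.

    Assumption: the sliding windows are adjacent to Q and do not overlap it.
    """
    q = stream[pos:pos + w_b]
    left = base_windows(stream, pos - w_l, pos, w_b)
    right = base_windows(stream, pos + w_b, pos + w_b + w_r, w_b)
    n1 = sum(1 for c in left if euclidean(q, c) < d)   # neighbors looking backward
    n2 = sum(1 for c in right if euclidean(q, c) < d)  # neighbors looking forward
    return n1 + n2 < k
```

This brute-force check computes every distance in both sliding windows at every time tick, which is the cost the rest of the talk tries to avoid.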

Simple Pruning: straightforward
For every anomaly candidate base window
- Randomly pick base windows and calculate the distance
- The search range is limited to its left and right sliding windows
- Accumulate the number of neighbors n
- When n ≥ k, stop (the candidate is certified to be a non-anomaly)
Time cost
- E(Y) ≤ k / F_x(d) + P_a · N (Theorem 1) [Bay03]
  Y: number of distance computations
  P_a: anomaly rate
  F_x(d): fraction of points within distance d of base window x
  N: sliding window length
- When P_a is tiny, E(Y) is almost independent of the sliding window's length
- Cost is still very high!
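The simple pruning loop above can be sketched as follows. This is an illustrative version under stated assumptions: candidates is the list of base windows drawn from the left and right sliding windows, dist is any metric supplied by the caller, and k is at least 1. The function name simple_pruning and its return shape are hypothetical.

```python
import random


def simple_pruning(q, candidates, k, d, dist, rng=None):
    """Randomized neighbor counting with early termination.

    Returns (is_anomaly, number_of_distance_computations).
    """
    rng = rng or random.Random(0)
    order = list(range(len(candidates)))
    rng.shuffle(order)  # examine candidates in random order
    neighbors = 0
    computations = 0
    for i in order:
        computations += 1
        if dist(q, candidates[i]) < d:
            neighbors += 1
            if neighbors >= k:
                # k neighbors found: q is certified to be a non-anomaly
                return False, computations
    # All candidates examined without reaching k neighbors
    return True, computations
```

As a worked instance of the bound with made-up numbers: for k = 5, F_x(d) = 0.1, P_a = 0.001 and N = 10000, the bound gives E(Y) ≤ 5 / 0.1 + 0.001 · 10000 = 50 + 10 = 60 distance computations per candidate.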

Can we prune some computations?
Observation
- Temporally close base windows are usually spatially close
- Local continuity exists in most trajectory data
Hint
- Partition the stream and monitor by batch!
[Figure: temporally faraway base windows vs. temporally close base windows]

Local Clustering
Clustering base windows
- Temporally continuous (threshold m)
- Spatially close (threshold r)
Online clustering algorithm
- Upon the arrival of each base window, incrementally decide whether it belongs to the previous local cluster or starts a new local cluster
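One way the online clustering step could look is sketched below. The slide does not spell out the exact rules, so this version makes two loud assumptions: m caps the temporal span of a cluster in time ticks, and a window joins the current cluster only if it lies within distance r of the cluster's pivot, taken here to be the cluster's first window rather than a maintained centroid. All class and method names are hypothetical.

```python
import math
from dataclasses import dataclass, field


def euclidean(a, b):
    """Euclidean distance between two flattened base windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class LocalCluster:
    pivot: list   # assumption: the first window represents the cluster
    start: int    # time tick of the first member
    members: list = field(default_factory=list)  # time ticks of member windows


class OnlineLocalClustering:
    def __init__(self, m, r):
        self.m = m          # temporal threshold (assumed: maximum span in ticks)
        self.r = r          # spatial threshold (distance to the pivot)
        self.clusters = []

    def add(self, tick, window):
        """Assign an arriving base window to the latest cluster or open a new one."""
        cur = self.clusters[-1] if self.clusters else None
        if (cur is not None
                and tick - cur.start < self.m
                and euclidean(window, cur.pivot) <= self.r):
            cur.members.append(tick)
        else:
            cur = LocalCluster(pivot=list(window), start=tick, members=[tick])
            self.clusters.append(cur)
        return cur
```

Only the most recent cluster is ever examined, so each arrival costs one distance computation.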

Batch Monitoring
[Figure: Case 1, Case 2, Case 3, Case 4, Case 5]
One computation, big growth!

Further Improvement?
Sad fact: most computations are for non-anomalies
- Not every cluster join is useful (e.g., "Case 5")
Joins that always fall in "Case 1" are desired!
Measure the utility of cluster C for joining with Q
- Dist(C.centroid, Q.centroid) could be a good estimate of the utility of C
[Figure: Case 1 (good) vs. Case 5 (bad)]
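The transcript does not define the five cases, so the sketch below is only one plausible reading, based on the triangle inequality. It assumes each cluster is summarized by a pivot and a radius (the largest distance from the pivot to any member). A single pivot-to-pivot distance then bounds every pairwise distance between the two clusters. The function names and the dictionary layout of a cluster are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two pivots."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_join(pivot_q, radius_q, pivot_c, radius_c, d):
    """Decide what one pivot distance tells us about all member pairs.

    For any member q of Q and c of C, the triangle inequality gives
    center_dist - radius_q - radius_c <= dist(q, c) <= center_dist + radius_q + radius_c.
    """
    center_dist = euclidean(pivot_q, pivot_c)
    upper = center_dist + radius_q + radius_c
    lower = center_dist - radius_q - radius_c
    if upper < d:
        return "all_neighbors"   # every member of Q gains |C| neighbors at once
    if lower >= d:
        return "no_neighbors"    # the join contributes nothing
    return "undecided"           # member-level distances are still needed


def order_by_utility(pivot_q, clusters):
    """Visit clusters with closer pivots first (assumed utility estimate)."""
    return sorted(clusters, key=lambda c: euclidean(pivot_q, c["pivot"]))
```

Under this reading, "all_neighbors" would correspond to the desired outcome and "no_neighbors" to the useless join, while ordering by pivot distance tries the most promising clusters first.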

Index Clusters' Pivots (centroids)
Single index: update cost!
No index: slow!
Trade-off: piecewise VP-trees over trajectory streams
Benefit: efficient, with zero update cost

Rescheduling: stop earlier for non-anomalies!
Range query on a tree, with a larger range
Increase the neighbor count more quickly!
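The piecewise index and the range query it supports can be sketched as below. This is a generic vantage-point tree, not the authors' code. It assumes the stream is cut into consecutive pieces, that one static tree is built per piece from that piece's cluster pivots, and that old pieces are dropped whole, which is how this sketch avoids any update to an existing tree. The names build_vp_tree, range_query and PiecewiseIndex are hypothetical.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_vp_tree(points, rng):
    """Build a static VP-tree as nested tuples (vantage, mu, inside, outside)."""
    if not points:
        return None
    points = list(points)
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return (vantage, 0.0, None, None)
    dists = [euclidean(vantage, p) for p in points]
    mu = sorted(dists)[len(dists) // 2]  # median distance splits the points
    inside = [p for p, dv in zip(points, dists) if dv < mu]
    outside = [p for p, dv in zip(points, dists) if dv >= mu]
    return (vantage, mu, build_vp_tree(inside, rng), build_vp_tree(outside, rng))


def range_query(node, q, radius, out):
    """Collect every indexed point p with dist(q, p) < radius."""
    if node is None:
        return out
    vantage, mu, inside, outside = node
    dv = euclidean(q, vantage)
    if dv < radius:
        out.append(vantage)
    if dv - radius < mu:    # the query ball may reach the inner partition
        range_query(inside, q, radius, out)
    if dv + radius >= mu:   # the query ball may reach the outer partition
        range_query(outside, q, radius, out)
    return out


class PiecewiseIndex:
    """One immutable VP-tree per stream piece; expired pieces are dropped whole."""

    def __init__(self, max_pieces, seed=0):
        self.max_pieces = max_pieces
        self.pieces = []
        self.rng = random.Random(seed)

    def add_piece(self, pivots):
        self.pieces.append(build_vp_tree(pivots, self.rng))
        if len(self.pieces) > self.max_pieces:
            self.pieces.pop(0)  # oldest piece leaves the sliding window

    def range_query(self, q, radius):
        out = []
        for tree in self.pieces:
            range_query(tree, q, radius, out)
        return out
```

For rescheduling, a caller could issue range_query with a radius larger than d to fetch the pivots most likely to contribute neighbors first; how much larger is a tuning choice the slides do not specify.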

Experiments
Datasets
- Real world: movement, GE stock
- Synthetic: random walk
- Link:
Configurations
- Pentium IV 2.2GHz PC with 2GB RAM

Effectiveness
Parameters k and d
[Figure: F-measure vs. (k, d)]

Parameters w_b and W
Parameter setting: F-measure vs. w_b and W
[Figures: F-measure vs. w_b; F-measure vs. W]

Experiments
Average pruning power vs. (dataset, w_b)
Peers: Simple Pruning and DWT
[Figures: w_b = 128; w_b = 256]

What about memory consumption?
Average memory cost
- Metric: unit (4 bytes)

Discussions & Extensions
Local continuity
- Very important for almost any work on time series and trajectories
- DFT [Faloutsos94], DWT [Chan99], and LB_Keogh [Keogh02] may suffer from low pruning power without local continuity

Related Problems
Burst detection [Zhu02]
- Could it capture general anomalies?
Discord detection [Keogh05]
- Needs the global dataset
- Endless stream?
Anomalies in traditional databases
- K-d outlier [Knorr00]
- Density-based anomaly [Breunig00]
- Pruning by clustering [Tao06]
- Data are archived
Cannot be applied to trajectory streams!

What kind of anomalies?
Visualized trajectory anomaly: from a GPS trajectory
[Figure: Anomaly: a detour; zoomed comparison]

Conclusions
- Frame the problem
- Efficient monitoring by batch
- Piecewise index
- Experimental studies

Major references
[Zhu02] Yunyue Zhu and Dennis Shasha. StatStream: Statistical monitoring of thousands of data streams in real time. In VLDB.
[Keogh05] Eamonn J. Keogh, Jessica Lin, and Ada Wai-Chee Fu. HOT SAX: Efficiently finding the most unusual time series subsequence. In ICDM.
[Knorr00] Edwin M. Knorr, Raymond T. Ng, and V. Tucakov. Distance-based outliers: Algorithms and applications. In VLDB J.
[Breunig00] Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. LOF: Identifying density-based local outliers. In SIGMOD.
[Bay03] Stephen D. Bay and Mark Schwabacher. Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In KDD.
[Faloutsos94] Christos Faloutsos, M. Ranganathan, and Yannis Manolopoulos. Fast subsequence matching in time-series databases. In SIGMOD, 1994.
[Chan99] Kin-Pong Chan and Ada Wai-Chee Fu. Efficient time series matching by wavelets. In ICDE.
[Keogh02] Eamonn J. Keogh. Exact indexing of dynamic time warping. In VLDB.
[Tao06] Y. Tao, X. Xiao, and S. Zhou. Mining distance-based outliers from large databases in any metric space. In KDD, pages 394–403, 2006.

Thanks! Q & A