
Trajectory Outlier Detection: A Partition-and-Detect Framework
Jae-Gil Lee, Jiawei Han, and Xiaolei Li
Department of Computer Science, University of Illinois at Urbana-Champaign
ICDE 2008, April 8, 2008

Table of Contents
- Motivation
- Partition-and-Detect Framework
- Outlier Detection Algorithm: TRAOD
  - Partitioning Phase (Simple)
  - Detection Phase
  - Partitioning Phase (Enhanced)
- Performance Evaluation
- Related Work
- Conclusions

Outlier Detection
- Definition: the process of detecting a data object that is grossly different from, or inconsistent with, the remaining data
- Applications: detection of credit card fraud, monitoring of criminal activities in electronic commerce, etc.
- Algorithms: distribution-based, distance-based, density-based, and deviation-based
- Target data: previous research has mainly dealt with outlier detection over point data

Analysis of Trajectory Data
- Tremendous amounts of trajectory data of moving objects are being collected, e.g., vehicle positioning data, hurricane tracking data, and animal movement data
- Trajectory outlier detection has many important real-world applications:
  - Detection of suspicious persons in video surveillance
  - Analysis of unusual air-mass trajectories in meteorology
- A powerful outlier detection algorithm for trajectories is urgently needed

Limitations of Existing Algorithms
- Knorr et al. [5] present one of the very few prior attempts:
  - Define the distance between two whole trajectories using summary information (e.g., the coordinates of the starting and ending points)
  - Apply a distance-based approach to the detection of trajectory outliers
- Such algorithms may fail to detect outlying portions of trajectories
  - Example: TR3 is not detected as an outlier because its overall behavior is similar to that of its neighboring trajectories (TR1, TR2, TR4, TR5), even though it contains an outlying sub-trajectory

Discovery of Outlying Sub-Trajectories
- Discovering outlying sub-trajectories is very useful in the real world, e.g., sudden changes in a hurricane's path [10]
- We propose the partition-and-detect framework

The Partition-and-Detect Framework
- Consists of two phases: partitioning and detection
- [Figure: a set of trajectories TR1-TR5 is (1) partitioned into a set of trajectory partitions; (2) detection then flags TR3 as an outlier by finding its outlying trajectory partitions]
- Note: a set of outlying trajectory partitions indicates an outlying sub-trajectory

The Problem Statement
- Given a set of trajectories I = {TR1, ..., TRn}, our algorithm generates a set of outliers O = {O1, ..., Om}, together with the outlying trajectory partitions of each Oi
- Necessary definitions:
  - A trajectory is a sequence of multi-dimensional points, denoted TRi = p1 p2 p3 ... pj ... pleni; a trajectory partition (t-partition for short) is a line segment pipj (i < j), where pi and pj are points chosen from the same trajectory
  - A t-partition is outlying if it does not have a sufficient number of similar neighbors
  - A trajectory is an outlier if it contains a non-negligible amount of outlying t-partitions

The Outlier Detection Algorithm: TRAOD
- Based on the partition-and-detect framework

Algorithm TRAOD (TRAjectory Outlier Detection)
Input: a set of trajectories I = {TR1, ..., TRn}
Output: a set of outliers O = {O1, ..., Om}, with the outlying t-partitions of each Oi
Algorithm:
/* Partitioning Phase */
01: for each TR ∈ I do
02:     Partition TR into a set L of line segments;
03:     Accumulate L into a set D;
/* Detection Phase */
04: for each P ∈ D do
05:     Mark P if it is an outlying t-partition;
06: for each TR ∈ I do
07:     Output TR if it is an outlier;
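The control flow above can be sketched in Python. This is a minimal skeleton under stated assumptions, not the paper's implementation: `partition`, `mark_outlying`, and `is_outlier` are placeholder callables standing in for the partitioning strategy (lines 01-03) and the two detection tests (lines 04-07), and the toy stand-ins below are purely illustrative.

```python
import math

def traod(trajectories, partition, mark_outlying, is_outlier):
    """Partition-and-detect skeleton.  Returns (trajectory index,
    outlying t-partitions) pairs for every trajectory judged an outlier."""
    # Partitioning phase (lines 01-03): build the global set D of
    # t-partitions, remembering which trajectory each came from.
    D = []
    for tid, tr in enumerate(trajectories):
        for seg in partition(tr):
            D.append((tid, seg))
    # Detection phase (lines 04-05): mark outlying t-partitions.
    all_segs = [seg for _, seg in D]
    marked = [(tid, seg, mark_outlying(seg, all_segs)) for tid, seg in D]
    # Lines 06-07: output a trajectory if its marked t-partitions are
    # non-negligible (the exact test lives in is_outlier).
    outliers = []
    for tid, _ in enumerate(trajectories):
        own = [(seg, m) for t, seg, m in marked if t == tid]
        out = [seg for seg, m in own if m]
        if is_outlier([seg for seg, _ in own], out):
            outliers.append((tid, out))
    return outliers

# Toy stand-ins for the three callables (illustration only).
def pairwise(tr):                 # base-unit partitioning: every single point
    return list(zip(tr, tr[1:]))

def high_midpoint(seg, _all):     # "outlying" if the segment strays high
    return (seg[0][1] + seg[1][1]) / 2 > 3

def length_ratio(all_segs, out_segs, F=0.5):
    seg_len = lambda s: math.dist(s[0], s[1])
    total = sum(map(seg_len, all_segs))
    return total > 0 and sum(map(seg_len, out_segs)) / total >= F
```

With three toy trajectories, two flat and one spiking upward, only the spiking one is reported.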

Where We Are Now
- Line 02 of TRAOD (partitioning a trajectory into line segments) can be done by a simple strategy or by a two-level partitioning strategy; we look at the simple strategy first

A Simple Partitioning Strategy (1/2)
- Careless partitioning (especially into long t-partitions) can miss possible outliers
- Example: even though TRout behaves differently from its neighboring trajectories, the differences are averaged out by careless partitioning

A Simple Partitioning Strategy (2/2)
- A trajectory is partitioned at a base unit: the smallest meaningful unit of a trajectory in a given application (e.g., every single point)
- Pros: high detection quality in general
- Cons: poor performance due to the large number of t-partitions; remedied by the two-level partitioning strategy

Where We Are Now
- Detection phase (lines 04-07): marking outlying t-partitions and outputting outlier trajectories

Distance between T-Partitions
- The weighted sum of three components: the perpendicular distance (d⊥), the parallel distance (d∥), and the angle distance (dθ)
- Adapted from similarity measures used in the domain of pattern recognition [13]
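The three components can be sketched for 2-D segments as follows, following the standard definitions of these measures in the trajectory-clustering literature. The equal weights (all 1.0) are an assumption, since the slide does not give concrete weight values, and the segments are assumed non-degenerate.

```python
import math

def _project(p, s, e):
    """Project point p onto the infinite line through segment s-e.
    Returns (projection point, perpendicular distance of p from the line)."""
    vx, vy = e[0] - s[0], e[1] - s[1]
    t = ((p[0] - s[0]) * vx + (p[1] - s[1]) * vy) / (vx * vx + vy * vy)
    proj = (s[0] + t * vx, s[1] + t * vy)
    return proj, math.dist(p, proj)

def tpartition_distance(Li, Lj, w_perp=1.0, w_par=1.0, w_theta=1.0):
    """Weighted sum of perpendicular, parallel, and angle distance
    between two t-partitions (2-D line segments)."""
    # By convention the longer segment plays the role of Li.
    if math.dist(*Li) < math.dist(*Lj):
        Li, Lj = Lj, Li
    (si, ei), (sj, ej) = Li, Lj
    ps, l_perp1 = _project(sj, si, ei)   # projection of Lj's start
    pe, l_perp2 = _project(ej, si, ei)   # projection of Lj's end
    # Perpendicular distance: generalized mean of the two point-to-line distances.
    d_perp = 0.0 if l_perp1 + l_perp2 == 0 else \
        (l_perp1 ** 2 + l_perp2 ** 2) / (l_perp1 + l_perp2)
    # Parallel distance: the smaller of the distances from Li's endpoints
    # to the corresponding projections of Lj's endpoints.
    d_par = min(math.dist(si, ps), math.dist(ei, pe))
    # Angle distance: |Lj| * sin(theta), saturating at |Lj| for theta >= 90 deg.
    vi = (ei[0] - si[0], ei[1] - si[1])
    vj = (ej[0] - sj[0], ej[1] - sj[1])
    cos_t = (vi[0] * vj[0] + vi[1] * vj[1]) / (math.hypot(*vi) * math.hypot(*vj))
    len_j = math.dist(sj, ej)
    d_theta = len_j if cos_t < 0 else len_j * math.sqrt(max(0.0, 1 - cos_t ** 2))
    return w_perp * d_perp + w_par * d_par + w_theta * d_theta
```

For two parallel horizontal segments offset by 1 vertically and shortened by 2 on each side, the components come out as d⊥ = 1, d∥ = 2, dθ = 0.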

Trajectory Outliers Based on Distance (1/2)
- Def. (a close trajectory): a trajectory TRj is close to a t-partition Li if the total length of TRj's t-partitions within distance D of Li is at least len(Li)
- Def. (an outlying t-partition): Li is an outlying t-partition if the fraction of trajectories close to Li is at most 1 − p; if the fraction is greater than 1 − p, Li is not outlying

Trajectory Outliers Based on Distance (2/2)
- Def. (an outlier): a trajectory TRi is an outlier if

  (the sum of the lengths of outlying t-partitions in TRi) / (the sum of the lengths of all t-partitions in TRi) ≥ F

- Example: TRi is an outlier, whereas TRj is not
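Taken together, the definitions on these two slides can be sketched as below. The precise closeness test ("matched length at least len(Li)") and the toy midpoint metric used in the example are assumptions filling in details the slides state only informally; any t-partition distance function can be plugged in as `dist`.

```python
import math

def seg_length(seg):
    return math.dist(seg[0], seg[1])

def close_trajectories(Li, partitions_by_traj, dist, D):
    """Trajectories close to t-partition Li.  Assumed reading: TRj is close
    if the total length of its t-partitions lying within distance D of Li
    is at least len(Li)."""
    close = set()
    for tid, segs in partitions_by_traj.items():
        matched = sum(seg_length(s) for s in segs if dist(Li, s) <= D)
        if matched >= seg_length(Li):
            close.add(tid)
    return close

def outlying_partitions(partitions_by_traj, dist, D, p):
    """A t-partition is outlying if at most a (1 - p) fraction of all
    trajectories is close to it."""
    n = len(partitions_by_traj)
    out = {}
    for tid, segs in partitions_by_traj.items():
        out[tid] = [Li for Li in segs
                    if len(close_trajectories(Li, partitions_by_traj,
                                              dist, D)) <= (1 - p) * n]
    return out

def outliers(partitions_by_traj, dist, D, p, F):
    """A trajectory is an outlier if the length of its outlying
    t-partitions is at least a fraction F of its total length."""
    out = outlying_partitions(partitions_by_traj, dist, D, p)
    return [tid for tid, segs in partitions_by_traj.items()
            if sum(map(seg_length, segs)) > 0 and
               sum(map(seg_length, out[tid])) /
               sum(map(seg_length, segs)) >= F]
```

With four parallel trajectories and a fifth whose second half jumps far away, only the fifth is reported (using a toy midpoint-to-midpoint distance).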

Incorporation of Density (1/2)
- The previous definition, as it stands, suffers from the local density problem: a t-partition in a dense region tends to have a relatively larger number of close trajectories than one in a sparse region
- T-partitions in dense regions are favored!

Incorporation of Density (2/2)
- Def. (the density of a t-partition): the density of a t-partition Li is the number of t-partitions within distance σ of Li, where σ is the standard deviation of the pairwise distances between t-partitions
- Def. (the adjusting coefficient of a t-partition):

  adj(Li) = (the average density of all t-partitions) / (the density of the t-partition Li)

- Adjustment by density: the number of close trajectories is multiplied by adj(Li); adj(Li) < 1.0 in a dense region, and adj(Li) > 1.0 in a sparse region
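The density adjustment can be sketched as follows. It is an assumption that a t-partition's own distance (zero) is excluded from its density count; the slide does not say. The test below uses plain numbers with absolute difference as a toy stand-in for t-partitions and their distance.

```python
import statistics

def densities(partitions, dist):
    """density(Li): number of other t-partitions within sigma of Li, where
    sigma is the standard deviation of all pairwise distances."""
    pair_dists = [dist(a, b) for i, a in enumerate(partitions)
                  for b in partitions[i + 1:]]
    sigma = statistics.pstdev(pair_dists)
    dens = [sum(1 for j, other in enumerate(partitions)
                if j != i and dist(li, other) <= sigma)
            for i, li in enumerate(partitions)]
    return dens, sigma

def adjusting_coefficients(partitions, dist):
    """adj(Li) = (average density of all t-partitions) / density(Li).
    The close-trajectory count of Li is then multiplied by adj(Li):
    adj < 1 deflates counts in dense regions, adj > 1 inflates them
    in sparse ones."""
    dens, _ = densities(partitions, dist)
    avg = sum(dens) / len(dens)
    return [avg / d if d else float("inf") for d in dens]
```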

Guidelines for Parameter Values
- Three parameters: D corresponds to "similar", p to "sufficient", and F to "non-negligible"
- Remark: there is no universally correct parameter value, even for the same data set and application
- Our guideline relies on user feedback: whether to make D, p, and F smaller or larger depends on whether many outliers are wanted, whether there are many trajectories, and whether the trajectories are short

Where We Are Now
- Back to line 02 of TRAOD: the two-level partitioning strategy that replaces the simple one

Two-Level Trajectory Partitioning
- Objective:
  - Achieve much higher performance than the simple strategy
  - Obtain the same result as the simple strategy, i.e., without losing the quality of the result
- Basic idea:
  1. Partition a trajectory at coarse granularity first
  2. Partition a coarse t-partition at fine granularity only when necessary
- Main benefit: narrows the search space that must be inspected at fine granularity, so many portions of trajectories can be pruned early on

Intuition behind Two-Level Trajectory Partitioning
- If the distance between two coarse t-partitions is very large (or small), the distances between their fine t-partitions are also very large (or small)
- Given two coarse t-partitions, can we tell whether the distance between any two of their fine t-partitions is greater than (or less than) D?

Coarse-Granularity Partitioning*
- Try to balance two competing measures:
  - Preciseness: the difference between a trajectory and the set of its coarse t-partitions should be as small as possible (required for making the bounds tight)
  - Conciseness: the number of coarse t-partitions should be as small as possible (required for reducing the number of comparisons)
- Formulate this problem using the minimum description length (MDL) principle: a good tradeoff between the two measures is found based on information theory

* Coarse-granularity partitioning is identical to that in our earlier work on trajectory clustering [15]

Fine-Granularity Partitioning
- Identify outlying coarse t-partitions by deriving distance bounds between two coarse t-partitions Li and Lj
- Suppose li is a fine t-partition in Li and lj is one in Lj; then lb(Li, Lj, f) is a lower bound of f(li, lj), and ub(Li, Lj, f) is an upper bound of f(li, lj), for every such pair (li, lj)
- Derive these bounds separately for each distance component (Lemmas 1-3) and combine them (Lemma 4)

Derivation of the Distance Bounds
- Lemma 1: bounds for the perpendicular distance d⊥
- Lemma 2: bounds for the parallel distance d∥
- Lemma 3: bounds for the angle distance dθ
- Lemma 4: combines the above into bounds for dist(Li, Lj)

Pruning Rules for Fine-Granularity Partitioning
- Rule 1: if lb(Li, Lj, dist) > D, fine-granularity partitioning is not required when comparing Li and Lj
- Rule 2: if ub(Li, Lj, dist) ≤ D, fine-granularity partitioning is required, but the distances between the fine t-partitions in Li and Lj need not be computed
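The two rules reduce to a three-way decision per pair of coarse t-partitions. Here `lb` and `ub` are assumed to be bound functions in the sense of Lemma 4, i.e., lb(Li, Lj) ≤ dist(li, lj) ≤ ub(Li, Lj) for all fine t-partitions li in Li and lj in Lj; the toy 1-D bounds below (true distance known to within ±1) are only for illustration.

```python
def classify_pair(Li, Lj, lb, ub, D):
    """Apply the pruning rules to one pair of coarse t-partitions."""
    if lb(Li, Lj) > D:
        # Rule 1: every fine-level distance already exceeds D, so the pair
        # contributes no close t-partitions; skip fine partitioning entirely.
        return "prune_far"
    if ub(Li, Lj) <= D:
        # Rule 2: every fine-level distance is within D; fine partitioning
        # still happens, but no fine-level distance has to be computed.
        return "prune_close"
    # The bounds straddle D: fall back to exact fine-level computation.
    return "compute"

# Toy bounds on 1-D "segments" (plain numbers), illustration only.
toy_lb = lambda a, b: abs(a - b) - 1
toy_ub = lambda a, b: abs(a - b) + 1
```

Only pairs classified as "compute" pay the full fine-granularity cost, which is where the early pruning of the search space comes from.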

Performance Evaluation
- Two real trajectory data sets:
  - Hurricane track data set: records the Atlantic hurricanes for the years 1950 through 2006; the entire set has 608 trajectories and 18,951 points; a small set (1990-2006) has 221 trajectories and 7,270 points
  - Animal movement data set: records the locations of elk, deer, and cattle for the years 1993 through 1996 (the Starkey Project); Elk1993: 33 trajectories and 15,422 points; Deer1995: 32 trajectories and 20,065 points; Cattle1993: 41 trajectories and 19,556 points
- Validate the quality of outlier detection
- Evaluate the effectiveness of the two-level partitioning strategy

Trajectory Outliers for Hurricane Data (Small)
- D = 85, p = 0.95, F = 0.2 → # of outliers = 13

Trajectory Outliers for Elk1993
- D = 55, p = 0.95, F = 0.1 → # of outliers = 3

Trajectory Outliers for Deer1995
- D = 80, p = 0.95, F = 0.1 → # of outliers = 3

Effects of Parameter Values
- (a) D = 83, p = 0.95, F = 0.2: … outliers
- (b) D = 87, p = 0.95, F = 0.2: 10 outliers

Pruning Power of Two-Level Partitioning
- 2L-Total: the ratio of the number of pairs pruned by Rule 1 to the total number of pairs of coarse t-partitions
- 2L-False: the proportion of pairs pruned incorrectly
- Optimal: the maximum ratio of pairs that can be pruned
- The strategy achieves high pruning power (64-88%)

Speedup Ratio of Two-Level Partitioning
- Speedup Ratio = (elapsed time of the algorithm using the simple partitioning strategy) / (elapsed time of the algorithm using the two-level partitioning strategy)
- Shows significant performance improvement

Related Work
- Outlier detection algorithms for points: distribution-based [2], distance-based [3, 4, 5, 6], density-based [7, 8], deviation-based [9]
- Trajectory outlier detection using a distance-based approach [5]: it is not clear whether this technique can detect outlying sub-trajectories from very complicated trajectories
- Trajectory outlier detection algorithms based on classification [12]: require a good training set and depend on training

Conclusions
- Proposed a novel framework, the partition-and-detect framework, for detecting trajectory outliers
- For the first phase, proposed a two-level trajectory partitioning strategy that ensures both high quality and high efficiency
- For the second phase, proposed a hybrid of the distance-based and density-based approaches that is very intuitive yet does not suffer from the local density problem
- Demonstrated the effectiveness of TRAOD on various real trajectory data

Thank You!