Trajectory Clustering


Trajectory Clustering Tianxiao Jiang, Biomedical Engineering Department, University of Houston

Visually aided approach
Automatic methods may discover movement patterns that are interesting but trivial. With visual analytics, the analyst controls the computational process by setting different input parameters, interprets the results, and directs the algorithm towards a meaningful solution.

A density-based clustering method: OPTICS
Definition 1 (Core object): Let p be an object in a dataset D, Eps a distance threshold, and MinNbs a minimum number of neighbors. p is a core object if |NEps(p)| >= MinNbs, where NEps(p) is the set of objects within distance Eps of p.
Definition 2 (Reachability distance): The reachability distance of p with respect to a core object o is the distance between the two objects, except when p is closer to o than the core distance of o. In that case, the reachability distance is set to the core distance of o.
Brief introduction to OPTICS: Initially a random object p0 is chosen. Then, at each iteration i, the next object pi chosen from D is the object with the smallest reachability distance with respect to all already visited core objects. The process is repeated until all objects in D have been considered.
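The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, points are plain coordinate tuples, the neighborhood of an object includes the object itself, and the core distance is taken as the distance to the MinNbs-th closest object in that neighborhood (conventions differ between descriptions of OPTICS).

```python
import math

def core_distance(data, o, eps, min_nbs, dist=math.dist):
    """Core distance of o, or None if o is not a core object.

    Assumption: the Eps-neighborhood includes o itself (distance 0).
    """
    within_eps = sorted(dist(o, q) for q in data if dist(o, q) <= eps)
    if len(within_eps) < min_nbs:
        return None  # fewer than MinNbs neighbors: o is not a core object
    return within_eps[min_nbs - 1]

def reachability_distance(data, p, o, eps, min_nbs, dist=math.dist):
    """Reachability distance of p with respect to o (None if o is not core)."""
    cd = core_distance(data, o, eps, min_nbs, dist)
    if cd is None:
        return None
    # The plain distance, unless p is closer than the core distance of o.
    return max(cd, dist(o, p))

points = [(0, 0), (1, 0), (2, 0), (10, 0)]
print(core_distance(points, (0, 0), eps=3, min_nbs=3))
print(reachability_distance(points, (1, 0), (0, 0), eps=3, min_nbs=3))
```

In the usage lines, the point (1, 0) lies closer to (0, 0) than the core distance of (0, 0), so its reachability distance is raised to that core distance.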

Progressive clustering
1. Apply clustering with a large MinNbs (the threshold that decides which objects are core objects).
2. The analyst adjusts the parameters (Eps, MinNbs) until obtaining a few clusters that are spatially coherent and easy to interpret.
3. After examining, describing, and explaining the most significant clusters, the analyst excludes their members from further analysis and applies clustering with a smaller MinNbs to the remaining data.
4. Repeat the process (go to step 2).

Common destination
GPS tracking of 17241 cars in Milan. A subset of 6187 trajectories recorded on a Wednesday from 6:00 AM to 10:00 AM was analyzed.
The same distance function (common destination) is applied with different parameters, reducing MinNbs at each step:
Eps=500m, MinNbs=15
Eps=500m, MinNbs=10
Eps=500m, MinNbs=5
Density-based clustering techniques usually have difficulty with data sets of uneven density.

Route similarity & common source and destination
Center of Milan (red), route similarity, Eps=1000m, MinNbs=5: A great part of these are very short trajectories within the center. 1074 trajectories, i.e. 63% of 1781, have been classified as “noise”, which means the absence of frequent routes among them. 512 trajectories (29%); cluster size 11 to 28 trajectories.
Whole data set, common source and destination, Eps=800, MinNbs=8: In most cases, trajectories that have close starts and close ends are short. Only 127 trajectories (2%) are long. Clusters where the starts are close to the ends include 1439 trajectories, or 23% of all (75% have been regarded as “noise”; they are shown in grey).

Frequently taken routes (route similarity)
[Figure: long trajectories, route similarity, Eps=800 m, MinNbs=5]
Frequent route types: peripheral routes using the belt around the city; radial routes from the periphery towards the inner city; radial routes from the inner city towards the periphery.
Local and round trips are excluded (25%). The biggest cluster (bright green lines in the middle image) consists of 66 trajectories. In different experiments, from 77 to 90% of the trajectories are treated as “noise”, i.e. no frequent routes are found among them.

Micro-clustering
Traditional trajectory clustering algorithms group similar trajectories as a whole, so they can miss common sub-trajectories. Trajectories may have different lengths. Discovering common sub-trajectories is very useful in some applications.

Micro-clustering 1
Represent each trajectory as piecewise segments (possibly with missing intervals). Find a maximal segment where all trajectories are pairwise close to each other. The similarity of trajectories is based on the length of time during which their segments are close.

Micro-clustering 2
Segments of different trajectories within a given rectangle are grouped together if they occur in similar time intervals. Determine the maximal group size and temporal dimension within the threshold rectangle.

Micro-clustering 3: a partition-and-group framework
Partition each trajectory into sub-trajectories using the minimum description length (MDL) principle. Group similar segments using a density-based clustering algorithm.
Note that a trajectory can belong to multiple clusters, since a trajectory is partitioned into multiple line segments and clustering is performed over these line segments.

Partition and group framework

Q: Why not prune the ‘useless’ part of the trajectory so that we can use a traditional clustering algorithm on the ‘interesting’ part?
It is tricky to determine which part is ‘useless’ and which part is ‘important’. Pruning also prevents us from discovering unexpected clustering results.

Q: Why density-based clustering?
Trajectories are often noisy and of arbitrary shape. Density-based clustering methods [2, 6] are the most suitable for line segments because they can discover clusters of arbitrary shape and can filter out noise [11].
[2] Ankerst, M., Breunig, M. M., Kriegel, H.-P., and Sander, J., “OPTICS: Ordering Points to Identify the Clustering Structure,” in Proc. 1999 ACM SIGMOD Int’l Conf. on Management of Data, Philadelphia, Pennsylvania, pp. 49–60, June 1999.
[6] Ester, M., Kriegel, H.-P., Sander, J., and Xu, X., “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” in Proc. 2nd Int’l Conf. on Knowledge Discovery and Data Mining, Portland, Oregon, pp. 226–231, Aug. 1996.
[11] Han, J. and Kamber, M., Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann, 2006.

Distance function
Three components: perpendicular distance (d⊥, a Lehmer mean of order 2), parallel distance (d||), and angle distance (dθ).
The parallel distance is designed to be robust to detection errors, especially broken line segments. It uses MIN; if we used MAX instead, the parallel distance could be significantly distorted by broken line segments.
The angle distance is designed for trajectories with directions. If we handle trajectories without directions, the angle distance is defined simply as L×sin(θ).
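The slide's formulas did not survive in the transcript. As a hedged reconstruction, the three components can be written as follows, assuming the notation of the partition-and-group (TRACLUS) paper by Lee, Han, and Whang: L_i is the longer segment, L_j the shorter one, l_{⊥1} and l_{⊥2} are the distances from the endpoints of L_j to their projections onto L_i, l_{∥1} and l_{∥2} are the distances from those projection points to the nearer endpoint of L_i, and θ is the smaller angle between the two segments.

```latex
\begin{align}
d_\perp(L_i, L_j) &= \frac{l_{\perp 1}^{2} + l_{\perp 2}^{2}}{l_{\perp 1} + l_{\perp 2}} \\
d_\parallel(L_i, L_j) &= \min(l_{\parallel 1},\, l_{\parallel 2}) \\
d_\theta(L_i, L_j) &=
  \begin{cases}
    \lVert L_j \rVert \sin\theta & \text{if } 0^\circ \le \theta < 90^\circ \\
    \lVert L_j \rVert & \text{if } 90^\circ \le \theta \le 180^\circ
  \end{cases}
\end{align}
```

The first line is the Lehmer mean of order 2 of the two perpendicular lengths, which weights the larger deviation more heavily than an arithmetic mean would.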

Distance function
dist(Li,Lj) = w1*d⊥(Li,Lj) + w2*d||(Li,Lj) + w3*dθ(Li,Lj)
The weights w1, w2 and w3 are determined depending on the application. Equal weights generally work well in many applications.
Symmetric: dist(Li,Lj) = dist(Lj,Li). Symmetry is important to avoid ambiguity in the clustering result. If a distance function is asymmetric, a different clustering result can be obtained depending on the order of processing.
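A minimal Python sketch of the weighted distance for 2-D segments follows. It is an illustration, not the authors' code: the function names are hypothetical, segments are assumed to be pairs of (x, y) tuples, the component definitions follow the TRACLUS paper as best recalled (projection of the shorter segment onto the longer one), and equal-length ties are broken by comparing the segments themselves so that the result does not depend on argument order.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def _project(p, s, e):
    """Orthogonal projection of point p onto the line through s and e."""
    d = _sub(e, s)
    dd = _dot(d, d)
    if dd == 0.0:  # degenerate segment: fall back to its single point
        return s
    t = _dot(_sub(p, s), d) / dd
    return (s[0] + t * d[0], s[1] + t * d[1])

def distance_components(seg_a, seg_b):
    """Return (d_perp, d_par, d_theta) for two segments given as ((x, y), (x, y))."""
    len_a, len_b = math.dist(*seg_a), math.dist(*seg_b)
    # Li is the longer segment; ties are broken deterministically for symmetry.
    if (len_a, seg_a) >= (len_b, seg_b):
        (si, ei), (sj, ej), len_i, len_j = seg_a, seg_b, len_a, len_b
    else:
        (si, ei), (sj, ej), len_i, len_j = seg_b, seg_a, len_b, len_a
    ps, pe = _project(sj, si, ei), _project(ej, si, ei)
    # Perpendicular distance: Lehmer mean of order 2 of the two offsets.
    l1, l2 = math.dist(sj, ps), math.dist(ej, pe)
    d_perp = (l1 * l1 + l2 * l2) / (l1 + l2) if l1 + l2 > 0 else 0.0
    # Parallel distance: MIN over the projection points' distances to Li's ends.
    d_par = min(min(math.dist(ps, si), math.dist(ps, ei)),
                min(math.dist(pe, si), math.dist(pe, ei)))
    # Angle distance for directed segments.
    if len_i == 0 or len_j == 0:
        d_theta = 0.0
    else:
        cos_t = _dot(_sub(ei, si), _sub(ej, sj)) / (len_i * len_j)
        cos_t = max(-1.0, min(1.0, cos_t))
        d_theta = len_j * math.sqrt(1.0 - cos_t * cos_t) if cos_t > 0 else len_j
    return d_perp, d_par, d_theta

def segment_distance(seg_a, seg_b, w=(1.0, 1.0, 1.0)):
    """Weighted sum w1*d_perp + w2*d_par + w3*d_theta (equal weights by default)."""
    return sum(wk * dk for wk, dk in zip(w, distance_components(seg_a, seg_b)))

a = ((0, 0), (10, 0))
b = ((2, 1), (6, 1))
print(segment_distance(a, b), segment_distance(b, a))
```

Choosing the longer segment as Li inside the function, rather than trusting the caller's argument order, is what makes the sketch symmetric.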

Phase 1: Partition
Characteristic points: points where the behavior of a trajectory changes rapidly.
The optimal partitioning of a trajectory is a tradeoff between preciseness and conciseness. Preciseness means that the difference between a trajectory and the set of its trajectory partitions should be as small as possible. Conciseness means that the number of trajectory partitions should be as small as possible.

Minimum description length (MDL)
Objective: minimize L(H) + L(D|H).
MDL is widely used in information theory for data compression. Due to the triangle inequality, L(H) increases with the number of trajectory partitions. L(D|H) increases as the set of trajectory partitions deviates from the trajectory. L(H) measures the degree of conciseness, and L(D|H) that of preciseness.
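To make the trade-off concrete, here is a hedged Python sketch of a greedy, approximate partitioning in the spirit of the TRACLUS paper: a point becomes a characteristic point when describing the stretch with one straight partition costs more than keeping every original segment. Several things are assumptions for illustration: the function names, the use of log2 of segment length for L(H) and log2 of the perpendicular and angle distances for L(D|H), and clamping values below 1 to a cost of 0 so the logarithm stays non-negative.

```python
import math

def _project(p, s, e):
    """Orthogonal projection of p onto the line through s and e."""
    dx, dy = e[0] - s[0], e[1] - s[1]
    dd = dx * dx + dy * dy
    if dd == 0.0:
        return s
    t = ((p[0] - s[0]) * dx + (p[1] - s[1]) * dy) / dd
    return (s[0] + t * dx, s[1] + t * dy)

def _perp_and_angle(hs, he, ds, de):
    """d_perp and d_theta of data segment (ds, de) against hypothesis (hs, he)."""
    l1 = math.dist(ds, _project(ds, hs, he))
    l2 = math.dist(de, _project(de, hs, he))
    d_perp = (l1 * l1 + l2 * l2) / (l1 + l2) if l1 + l2 > 0 else 0.0
    len_h, len_d = math.dist(hs, he), math.dist(ds, de)
    if len_h == 0 or len_d == 0:
        return d_perp, 0.0
    cos_t = ((he[0] - hs[0]) * (de[0] - ds[0])
             + (he[1] - hs[1]) * (de[1] - ds[1])) / (len_h * len_d)
    cos_t = max(-1.0, min(1.0, cos_t))
    d_theta = len_d * math.sqrt(1.0 - cos_t * cos_t) if cos_t > 0 else len_d
    return d_perp, d_theta

def _log2(x):
    # Assumption: values below 1 cost nothing (avoids log of zero).
    return math.log2(max(x, 1.0))

def mdl_par(points, i, j):
    """L(H) + L(D|H) when points[i]..points[j] is replaced by one partition."""
    cost = _log2(math.dist(points[i], points[j]))  # L(H)
    for k in range(i, j):                          # L(D|H)
        d_perp, d_theta = _perp_and_angle(points[i], points[j],
                                          points[k], points[k + 1])
        cost += _log2(d_perp) + _log2(d_theta)
    return cost

def mdl_nopar(points, i, j):
    """Cost of keeping every original segment between i and j (L(D|H) = 0)."""
    return sum(_log2(math.dist(points[k], points[k + 1])) for k in range(i, j))

def characteristic_points(points):
    """Greedy scan; expects at least two points."""
    cps = [points[0]]
    start, length = 0, 1
    while start + length < len(points):
        curr = start + length
        # 'curr - 1 > start' is a defensive guard so the scan always advances.
        if curr - 1 > start and mdl_par(points, start, curr) > mdl_nopar(points, start, curr):
            cps.append(points[curr - 1])
            start, length = curr - 1, 1
        else:
            length += 1
    cps.append(points[-1])
    return cps

track = [(0, 0), (10, 0), (20, 0), (30, 0), (30, 10), (30, 20), (30, 30)]
print(characteristic_points(track))
```

On the L-shaped example track the scan keeps the two endpoints and the corner, since a single partition across the corner deviates too much from the original points.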

Phase 2: Line segment clustering (DBSCAN)
The ε-neighborhood of a line segment is not a circle or a sphere; it is more likely to be an ellipse or an ellipsoid. Its shape is affected by the direction and length of the core line segment and those of the border line segments.

Short segments induce over-clustering
If one of the line segments is very short, the angle distance is small regardless of the actual angle. The result in (a) is counter-intuitive, since L1 and L3 are far apart. To overcome this problem, segments can be made longer at the cost of preciseness.

Representative trajectory of a cluster
While sweeping a vertical line across the line segments in the direction of the major axis of a cluster, we count the number of line segments hitting the sweep line. This number changes only when the sweep line passes a starting point or an ending point. If this number is equal to or greater than MinLns, we compute the average coordinate of those line segments with respect to the major axis and insert the average into the representative trajectory; otherwise, we skip the current point (e.g., the 5th and 6th positions in Figure 13). In addition, if a previous point is located too close (e.g., the 3rd position in Figure 13), we skip the current point to smooth the representative trajectory.
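The sweep described above can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the original implementation: the function name and the smoothing threshold `gamma` are hypothetical, segments are pairs of (x, y) tuples, and the major axis is taken to be the direction of the summed segment vectors.

```python
import math

def representative_trajectory(segments, min_lns, gamma):
    """Sweep along the cluster's major axis and average the segments hit."""
    # Assumed major axis: direction of the sum of all segment vectors.
    vx = sum(e[0] - s[0] for s, e in segments)
    vy = sum(e[1] - s[1] for s, e in segments)
    phi = math.atan2(vy, vx)
    c, sn = math.cos(phi), math.sin(phi)

    def rot(p):    # rotate so the major axis becomes the x axis
        return (p[0] * c + p[1] * sn, -p[0] * sn + p[1] * c)

    def unrot(p):  # rotate back to the original coordinates
        return (p[0] * c - p[1] * sn, p[0] * sn + p[1] * c)

    rotated = []
    for s, e in segments:
        a, b = rot(s), rot(e)
        if a[0] > b[0]:
            a, b = b, a
        rotated.append((a, b))

    # The hit count only changes at starting and ending points.
    sweep_positions = sorted(x for a, b in rotated for x in (a[0], b[0]))
    rep, prev_x = [], None
    for x in sweep_positions:
        hit = [(a, b) for a, b in rotated if a[0] <= x <= b[0]]
        if len(hit) < min_lns:
            continue  # too few segments at this position
        if prev_x is not None and x - prev_x < gamma:
            continue  # previous point too close: skip to smooth the result
        ys = []
        for a, b in hit:
            if b[0] == a[0]:
                ys.append((a[1] + b[1]) / 2.0)
            else:
                t = (x - a[0]) / (b[0] - a[0])
                ys.append(a[1] + t * (b[1] - a[1]))
        rep.append(unrot((x, sum(ys) / len(ys))))
        prev_x = x
    return rep

cluster = [((0, 0), (10, 0)), ((2, 2), (12, 2)), ((4, 4), (8, 4))]
print(representative_trajectory(cluster, min_lns=2, gamma=1))
```

Raising `gamma` drops points that follow too closely behind the last inserted one, which is the smoothing step mentioned in the slide.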

Heuristic parameter selection
ε: choose the value that minimizes the entropy of the neighborhood sizes. Entropy is maximal when the neighborhoods are uniform (as in a carbon crystal); a meaningful clustering has a skewed structure and small entropy.
MinLns: avg|Nε(L)| + 1 ∼ 3. This is natural, since MinLns should be greater than avg|Nε(L)| to discover meaningful clusters.
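The heuristic can be illustrated with a short Python sketch. The assumptions are labeled: the entropy is taken as H = -Σ p(x) log2 p(x) with p(x) proportional to |Nε(x)| (the form recalled from the TRACLUS paper), neighborhoods include the item itself, the function names are hypothetical, and a simple scan over candidate ε values stands in for whatever search procedure the authors used.

```python
import math

def neighborhood_sizes(items, eps, dist):
    """|N_eps(x)| for every item (the item itself is counted)."""
    return [sum(1 for q in items if dist(p, q) <= eps) for p in items]

def neighborhood_entropy(items, eps, dist):
    """Entropy of the normalized neighborhood sizes."""
    sizes = neighborhood_sizes(items, eps, dist)
    total = sum(sizes)
    return -sum((s / total) * math.log2(s / total) for s in sizes)

def suggest_parameters(items, eps_candidates, dist):
    """Pick the candidate eps with minimum entropy and a MinLns range."""
    best_eps = min(eps_candidates,
                   key=lambda e: neighborhood_entropy(items, e, dist))
    sizes = neighborhood_sizes(items, best_eps, dist)
    avg = sum(sizes) / len(sizes)
    return best_eps, (avg + 1, avg + 3)  # MinLns: avg|N_eps| + 1 to 3

values = [0.0, 1.0, 2.0, 10.0]
abs_dist = lambda a, b: abs(a - b)
print(suggest_parameters(values, [0.5, 1.5, 100.0], abs_dist))
```

In the toy data, a very small or very large ε makes every neighborhood the same size (maximum entropy), while the intermediate value produces the skewed distribution the heuristic looks for.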

Hurricane tracks
The result in Figure 18 is quite reasonable. We know that some hurricanes move along a curve, changing their direction from east-to-west to south-to-north, and then to west-to-east. On the other hand, some hurricanes move along a straight east-to-west line or a straight west-to-east line. The lower horizontal cluster represents the east-to-west movements, the upper horizontal one the west-to-east movements, and the vertical ones the south-to-north movements.

Flocks and convoys
Discovering groups of objects that move together during a given period of time, for example migrating animals, flocks of birds, or convoys of vehicles. Moving clusters, for instance, have been proposed to describe the problem of discovering a sequence of clusters in which objects may leave or enter the cluster during some time interval, while the portion of common objects stays higher than a predefined threshold…

The End