Fundamental tools: clustering

Computational Movement Analysis Lecture 2: Clustering Joachim Gudmundsson

Fundamental tools: clustering Group similar objects into clusters.

Fundamental tools: clustering Group similar (sub)curves into clusters. Similarity measure: Fréchet distance Question: Do we need any constraints on a cluster? Constraints on subcurves in a cluster?

Aim: Cluster subcurves [Figure: a cluster of subcurves]

Subtrajectory clustering

Recall: Fréchet Distance
The Fréchet distance measures the similarity of two curves.
Dog-walking example: a person walks a dog, with the person on one curve and the dog on the other. Both may control their speed, but neither is allowed to go backwards.
Fréchet distance of the curves: the minimal leash length necessary for both to walk their curves from beginning to end.

Recall: Fréchet Distance
Input: Two polygonal chains P = p1, … , pn and Q = q1, … , qm in R^d.
The Fréchet distance between P and Q is:

$$\delta_F(P,Q) = \inf_{\substack{\alpha:[0,1]\to P \\ \beta:[0,1]\to Q}} \; \max_{t\in[0,1]} \lvert P(\alpha(t)) - Q(\beta(t)) \rvert$$

where α and β range over all continuous non-decreasing reparametrizations. Note that α(0)=p1, α(1)=pn, β(0)=q1 and β(1)=qm.
Well-suited for the comparison of curves since it takes the continuity of the curves into account.
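The continuous definition above does not translate into a few lines of code. As a rough illustration only, the sketch below computes the discrete Fréchet distance, a vertex-only variant that the lecture mentions later and that is never smaller than the continuous distance. The function name `discrete_frechet` and the representation of curves as lists of coordinate tuples are assumptions made for this example.

```python
import math

def discrete_frechet(P, Q):
    """Discrete Frechet distance between two point sequences (dynamic programming).

    Assumption: P and Q are non-empty lists of equal-dimension coordinate tuples.
    This is the vertex-only variant, not the continuous distance from the slide.
    """
    n, m = len(P), len(Q)
    # ca[i][j] = smallest leash length that lets both walkers reach (P[i], Q[j])
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Neither walker may go backwards: arrive from the left, below, or diagonal.
                best_prev = min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]

if __name__ == "__main__":
    P = [(0, 0), (1, 0), (2, 0)]
    Q = [(0, 1), (1, 1), (2, 1)]
    print(discrete_frechet(P, Q))
```

The table takes O(nm) time and space, which matches the size of the free space diagram used in the decision algorithm.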

Decision algorithm: compute path
Algorithm:
1. Compute the free space diagram: mn cells ⇒ O(mn) time.
2. Compute an xy-monotone path (non-decreasing in both x and y) from (q1,p1) to (qm,pn). Build the network in O(mn) time. Find a path in O(mn) time.
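In the discrete setting the two steps above become very concrete: the free space diagram is an n-by-m grid of booleans and the path search is a reachability table. The sketch below shows that discrete analogue, not the continuous algorithm with cell boundaries from the slide. The name `frechet_at_most` and the point-list representation are hypothetical.

```python
import math

def frechet_at_most(P, Q, r):
    """Decide whether the discrete Frechet distance of P and Q is at most r.

    Assumption: P and Q are non-empty lists of coordinate tuples.
    """
    n, m = len(P), len(Q)
    # Step 1: free space, one boolean per vertex pair (O(nm) time).
    free = [[math.dist(p, q) <= r for q in Q] for p in P]
    # Step 2: monotone reachability from (0, 0) to (n-1, m-1) (O(nm) time).
    reach = [[False] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if not free[i][j]:
                continue
            if i == 0 and j == 0:
                reach[i][j] = True
            else:
                reach[i][j] = (
                    (i > 0 and reach[i - 1][j])
                    or (j > 0 and reach[i][j - 1])
                    or (i > 0 and j > 0 and reach[i - 1][j - 1])
                )
    return reach[n - 1][m - 1]

if __name__ == "__main__":
    P = [(0, 0), (1, 0), (2, 0)]
    Q = [(0, 1), (1, 1), (2, 1)]
    print(frechet_at_most(P, Q, 1.0), frechet_at_most(P, Q, 0.9))
```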

Cluster
Input: A polygonal curve T, an integer m>1 and a distance d.
Cluster: m subcurves T1, … , Tm of T with distance at most d between any two subcurves.
Constraint 1: the subcurves are pairwise disjoint.
Constraint 2: the cluster has to have maximal “length”.
Without a length constraint, a given distance d can yield an infinite number of clusters.

Decision Problem Given a trajectory T, a subtrajectory cluster SC(m,l,d) of T consists of at least m subtrajectories T1, … , Tm of T such that: the subtrajectories are pairwise disjoint, the distance between any two subtrajectories is at most d, and at least one subtrajectory has length l. The length of a subtrajectory cluster is assumed to be maximal.

Problem
Decision version: Subtrajectory cluster SC(m,l,d). Given a trajectory T, is there a subtrajectory cluster with parameters m, l and d?
Optimisation versions: SC(m,max,d): maximise the length of the cluster.

Hardness results
Theorem 1: Finding any approximation of the SC(m,max,d) problem is 3SUM-hard.
Theorem 2: The decision problem SC(m,l,d) is NP-complete.
Theorem 3: The problem of computing a (2-ε)-distance approximation of the SC(m,max,d)-problem is NP-hard.
[Gudmundsson & van Kreveld’08]

Hardness results
Theorem 2: The decision problem SC(m,l,d) is NP-complete.
Reduction from MaxClique.
MaxClique: Is there a clique of size k in a given graph G=(V,E)?
[Figure: a graph containing a clique of size 4]

Longest subtrajectory cluster: NP-complete
Problem: SC(m,l=n,d).
[Figure, built up over several slides: reduction from MaxClique on a graph G with vertices a, b, c, d, e; labels in the figure include a,b; a,c; a,e; b,c; b,d; d,e; b,c,d and a,c,e.]
SC(m,l=n,d) ⇔ clique of size m in G.
The problem is as hard as MaxClique!

Hardness results
Theorem 3: The problem of computing a (2-ε)-distance approximation of the SC(m,max,d)-problem is NP-hard.
Corollary 1: The problem of computing a (2-ε)-distance approximation of SC(max, l, r), for any constant 0 < ε < 1, is at least as hard as approximating MaxClique.
Can we find a 2-distance approximation in polynomial time?

Fréchet distance between m curves
Input: Set of m polygonal curves F = {F1, …, Fm} with |Fi| = ni.
The Fréchet distance of F can be computed by computing the Fréchet distance between every pair of curves.
Time: $O\big(\sum_{i,j} n_i n_j \log(n_i n_j)\big)$. If |Fi| = n/m for all i, each of the m^2 pairs costs O((n/m)^2 log(n/m)), which sums to O(n^2 log(n/m)).

Fréchet distance between m curves
Input: Set of m polygonal curves F = {F1, …, Fm} with |Fi| = ni.
Observation: Given F1, F2 and F3, we have the triangle inequality $\delta_F(F_1,F_3) \le \delta_F(F_1,F_2) + \delta_F(F_2,F_3)$. [Dumitrescu & Rote’04]
[Figure: a triangle with sides a and b and third side ≤ a+b]
Can we use this observation to get an approximation?

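Fréchet distance between m curves
Input: Set of m polygonal curves F = {F1, …, Fm} with |Fi| = ni.
Idea: Select a representative curve F1 of F. Compute the maximum Fréchet distance D between F1 and all other curves in F.
⇒ $D \le \delta_F(F) \le 2D$. The lower bound holds because D is itself the distance between two curves of F; the upper bound follows from the triangle inequality, since any two curves are each within distance D of F1.
Observation: This gives a 2-approximation.
Time: $O\big(\sum_i n_1 n_i \log(n_1 n_i)\big)$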
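The representative idea can be sketched independently of how the distance itself is computed. In the sketch below, `dist` is a hypothetical callable standing in for a Fréchet distance routine; the only property used is the triangle inequality. The function names and the toy metric in the demo (absolute difference of numbers) are assumptions for illustration, not part of the lecture.

```python
from itertools import combinations

def representative_bound(curves, dist):
    """Return (D, 2*D), where D is the largest distance from curves[0] to any other curve.

    Assumption: `curves` has at least two elements and `dist` is a metric.
    The true maximum pairwise distance then lies between D and 2*D.
    """
    rep = curves[0]
    D = max(dist(rep, other) for other in curves[1:])
    return D, 2 * D

def exact_set_distance(curves, dist):
    """Maximum pairwise distance, using all m*(m-1)/2 pairs (for comparison only)."""
    return max(dist(a, b) for a, b in combinations(curves, 2))

if __name__ == "__main__":
    # Toy stand-in: "curves" are numbers, distance is absolute difference.
    toy = [0, 3, -2, 1]
    metric = lambda a, b: abs(a - b)
    low, high = representative_bound(toy, metric)
    print(low, exact_set_distance(toy, metric), high)
```

The representative version calls `dist` m-1 times instead of once per pair, which is where the saving in the stated running time comes from.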

Decision algorithm: compute path
Recall: Whether the Fréchet distance between two curves P and Q is less than r can be decided in O(mn) time.
The Fréchet distance between two polygonal curves P and Q can be computed in O(mn log mn) time using parametric search.

Recall the problem Given a trajectory T, a subtrajectory cluster SC(m,l,d) of T consists of at least m subtrajectories T1, … , Tm of T such that: the subtrajectories are pairwise disjoint, the distance between any two subtrajectories is at most d, and at least one subtrajectory has length l.

Recall the problem
Input: A trajectory T with n points, an integer m>1 and a real value d>0.
Output: SC(m,max,d)
Constraint: For simplicity we will assume that all subtrajectories in a cluster have to start and end at a vertex.
Idea: Create a free space diagram describing the distance between T and T.
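Under the vertex-only assumption stated above, the free space diagram of T against itself can be pictured as a square boolean matrix. The following is a minimal sketch of that picture; the name `self_free_space` and the tuple representation of trajectory points are assumptions for this example.

```python
import math

def self_free_space(T, d):
    """free[i][j] is True when vertices T[i] and T[j] are within distance d.

    Assumption: T is a list of coordinate tuples. The matrix is symmetric and
    its diagonal is always free, because every point has distance 0 to itself.
    """
    return [[math.dist(p, q) <= d for q in T] for p in T]

if __name__ == "__main__":
    T = [(0, 0), (1, 0), (5, 0), (1, 0.5)]
    for row in self_free_space(T, 1.0):
        print("".join("#" if cell else "." for cell in row))
```

Because the diagonal is trivially free, any search for similar subtrajectories has to look for monotone paths away from it.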

Free space diagram of T [Figure: free space diagram with T on both axes]

Free space diagram of T
[Figure: subtrajectories A, B and C marked in the free space diagram of T against itself]
D(A,C) ≤ d and D(B,C) ≤ d ⇒ D(A,B) ≤ 2d

Free space diagram of T
C: representative trajectory.
The length of the SC {A,B,C} is the length of the representative trajectory.

Approximation algorithm
Sweep the free space diagram from left to right with two vertical lines (L and R).
At each event point decide if there are m monotone curves between L and R.
While sweeping, maintain a network of critical points.

Approximation algorithm
Sweep the free space diagram from left to right with two vertical lines (L and R).
At each event point decide if there are m monotone curves between L and R:
a) If “yes” then move R to the right.
b) If “no” and R-L=1 then move R to the right.
c) If “no” and R-L>1 then move L to the right.
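The three rules can be written as a two-pointer loop. The sketch below is only the control flow: `has_m_paths(L, R)` is a hypothetical oracle that answers whether m monotone curves exist between the two sweep lines, and positions are assumed to be vertex indices 0 … n-1. Returning the widest "yes" window is also an assumption about what the sweep should report.

```python
def sweep(n, has_m_paths):
    """Two-line sweep over vertex indices 0..n-1 following rules a), b), c).

    `has_m_paths(L, R)` is a hypothetical oracle supplied by the caller.
    Returns the widest window (L, R) that received a "yes", or None.
    """
    L, R = 0, 1
    best = None
    while R <= n - 1:
        if has_m_paths(L, R):
            # a) "yes": remember the window, then move R to the right
            if best is None or R - L > best[1] - best[0]:
                best = (L, R)
            R += 1
        elif R - L == 1:
            # b) "no" and the window has width 1: move R to the right
            R += 1
        else:
            # c) "no" and the window is wider: move L to the right
            L += 1
    return best

if __name__ == "__main__":
    # Toy oracle: a cluster exists exactly for windows inside [2, 6].
    print(sweep(10, lambda L, R: 2 <= L and R <= 6))
```

Each iteration advances L or R by one and L never passes R, so the loop handles O(n) events.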

Data structures
Number of event points? O(n)
Two types of events: L moves to the right; R moves to the right.
How to handle an event? ⇒ Decide if there are m non-overlapping xy-monotone paths between L and R.

Handle event
Start with the top-most corner u on R. Find the top-most corner u’ on L that can be reached by an xy-monotone path P.
Observation: No point on R below u can reach a point on L above u’ with an xy-monotone path.
Next take the top-most corner v on R below u’. Find the top-most corner v’ on L that can be reached by an xy-monotone path.
Continue until m curves are found or there are no more corners on R.
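The greedy procedure above can be mimicked on a discrete free space grid. The sketch below assumes `free[column][row]` is a boolean grid, that paths move one step right, up, or diagonally, and that each path is recorded only as a pair (row on L, row on R). These conventions and the names `topmost_start` and `greedy_paths` are assumptions for illustration, not the data structure from the lecture.

```python
def topmost_start(free, L, R, top):
    """Highest row on column L with an xy-monotone free path to cell (R, top), else None."""
    if not free[R][top]:
        return None
    width = R - L + 1
    # reach[c][j]: cell (L + c, j) is free and can reach (R, top) monotonically
    reach = [[False] * (top + 1) for _ in range(width)]
    reach[width - 1][top] = True
    for c in range(width - 1, -1, -1):
        for j in range(top, -1, -1):
            if reach[c][j] or not free[L + c][j]:
                continue
            right = c + 1 < width and reach[c + 1][j]
            up = j + 1 <= top and reach[c][j + 1]
            diag = c + 1 < width and j + 1 <= top and reach[c + 1][j + 1]
            reach[c][j] = right or up or diag
    for j in range(top, -1, -1):
        if reach[0][j]:
            return j
    return None

def greedy_paths(free, L, R, m):
    """Collect up to m non-overlapping paths, scanning corners on R from the top down."""
    paths = []
    top = len(free[R]) - 1
    while top >= 0 and len(paths) < m:
        start = topmost_start(free, L, R, top)
        if start is None:
            top -= 1            # this corner on R reaches nothing on L
        else:
            paths.append((start, top))
            top = start - 1     # next corner on R must lie below the start on L
    return paths

if __name__ == "__main__":
    cells = {(0, 4), (1, 4), (2, 5), (0, 1), (1, 2), (2, 2)}
    free = [[(c, r) in cells for r in range(6)] for c in range(3)]
    print(greedy_paths(free, 0, 2, 2))
```

Each call to `topmost_start` fills a table of at most (R-L+1) by n cells, so this naive version recomputes reachability for every corner rather than maintaining it during the sweep.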

Path Query in the Free Space diagram
In the worst case the algorithm performs n path queries. How do we perform a path query?
Recall querying for a path in lecture 1: O(n^2) time per query.
O(n) events, n points on R.
Total: O(n^3 w) time and O(nw) space, where w = max (R-L).

Path Query in the Free Space diagram
Can it be improved? O(n^2 w) time and O(nw) space.
Extension: The algorithm can be modified to handle the case when only the “reference” trajectory needs to start and end at a vertex.

Approximation algorithm
Theorem:
A 2-distance approximation of the SC(m,max,d) problem can be computed in O(n^2 + nmw) time and O(nw) space using the discrete Fréchet distance.
A 2-distance approximation of the SC(m,max,d) problem can be computed in O(n^2 w) time using the continuous Fréchet distance if the reference trajectory starts and ends at a vertex.
A 2-distance approximation of the SC(m,max,d) problem can be computed in time O(n3m 2(n/m)(log2 n+m)) using the continuous Fréchet distance.
[Joint work: Buchin, Buchin, Löffler and Luo’10]

Experimental Results (continuous!)
Hardware: i5-200 CPU with an Nvidia GTX 580.
Note: Continuous model ⇒ input data can be simplified!
[Joint work with Nacho Valladares’13]

Open Problems
Can we cluster faster?
Can a c-approximate Fréchet clustering be computed faster?
Can we cluster faster for special cases?
What should we report?
Cluster using other distance measures? For example using [Sankararaman et al. 2013]?

References
K. Buchin, M. Buchin, J. Gudmundsson, M. Löffler and J. Luo. Detecting Commuting Patterns by Clustering Subtrajectories. International Journal of Computational Geometry and Applications, 2011.
N. Valladares and J. Gudmundsson. A GPU approach to subtrajectory clustering using the Fréchet distance. ACM SIGSPATIAL, 2012.
A. Dumitrescu and G. Rote. On the Fréchet distance of a set of curves. Proceedings of the Sixteenth Canadian Conference on Computational Geometry, 2004.
S. Sankararaman, P. K. Agarwal, T. Mølhave, J. Pan and A. P. Boedihardjo. Model-driven matching and segmentation of trajectories. ACM SIGSPATIAL, 2013.