Computational Geometry and Spatial Data Mining Marc van Kreveld Department of Information and Computing Sciences Utrecht University.


Clustering? Are the people clustered in this room? → How do we define a cluster? In spatial data mining we have objects/entities with a location given by coordinates. Cluster definitions involve distance between locations.

Clustering - options Determine whether clustering occurs. Determine the degree of clustering. Determine the clusters. Determine the largest cluster. Determine the outliers.

Co-location Are the men clustered? Are the women clustered? Is there a co-location of men and women?

Co-location Like before, we may be interested in –is there co-location? –the degree of co-location –the largest co-location –the co-locations themselves –the objects not involved in co-location

Spatio-temporal data Locations have a time stamp. Interesting patterns involve space and time.

Trajectory data Entities with a trajectory (time-stamped motion path). Interesting patterns involve subgroups with similar heading, expected arrival, joint motion, ... n entities = trajectories; n = 10 – 100,000. t time steps; t = 10 – 100,000 → input size is nt. m = size of subgroup (unknown); m = 10 – 100,000.

Examples of trajectory data Tracked animals (buffalo, birds,...) Tracked people (potential terrorists) Tracked GSMs (e.g. for traffic purposes) Trajectories of tornadoes Sports scene analysis (players on a soccer field)

Example pattern in trajectories What is the location visited by most entities? location = circular region of specified radius

Example pattern in trajectories What is the location visited by most entities? location = circular region of specified radius. [Figure: a candidate location visited by 4 entities]

Example pattern in trajectories What is the location visited by most entities? location = circular region of specified radius. [Figure: a candidate location visited by 3 entities]

Example pattern in trajectories Compute buffer of each trajectory

Example pattern in trajectories Compute buffer of each trajectory. Compute the arrangement of the buffers and the cover count of each cell.

Example pattern in trajectories One trajectory has t time stamps; its buffer can be computed in O(t log t) time. All buffers can be computed in O(nt log t) time. The arrangement can be computed in O(nt log (nt) + k) time, where k = O((nt)^2) is the complexity of the arrangement. Cell cover counts are determined in O(k) time.

Example pattern in trajectories Total: O(nt log (nt) + k) time. If the most visited location is visited by m entities, this is O(nt log (nt) + ntm). Note: input size is nt; n entities, each with location at t moments.
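A location of radius r centered at c is visited by an entity exactly when c lies in the radius-r buffer of that entity's trajectory, i.e. when c is within distance r of the polygonal line. The following is a minimal brute-force sketch of the cover count of a single query point, assuming trajectories are given as lists of (x, y) vertices with at least two vertices each. It is not the arrangement-based algorithm from the slides; the names `point_segment_distance` and `cover_count` are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment.
    s = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    s = max(0.0, min(1.0, s))
    return math.hypot(px - (ax + s * dx), py - (ay + s * dy))

def cover_count(center, trajectories, radius):
    """Number of trajectories whose radius-buffer contains center."""
    count = 0
    for traj in trajectories:
        if any(point_segment_distance(center, traj[i], traj[i + 1]) <= radius
               for i in range(len(traj) - 1)):
            count += 1
    return count
```

Each query takes O(nt) time here, so this only illustrates what the cover count of a cell means; the arrangement computes the counts for all cells at once.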

Patterns in entity data
Spatial data: n points (locations). Distance is important: clustering pattern. Presence of attributes (e.g. man/woman): co-location patterns.
Spatio-temporal data: n trajectories, each with t time steps. Distance is time-dependent: flock pattern, meet pattern. Heading and speed are important and are also time-dependent.

Entities in subdivisions Also a co-location pattern. Discovered simply by overlay, e.g., occurrences of oaks on different soil types.

Clustering entities in subdivisions What if it is known that the entities only occur in regions of a certain type? [Figure: bird nests, situation without subdivision; radius of cluster indicated]

Clustering entities in subdivisions What if it is known that the entities only occur in regions of a certain type? [Figure: bird nests, situation with land-water subdivision; radius of cluster indicated]

Clustering entities in subdivisions [Figure labels: burglary, house, car]

Region-restricted clustering Determine clusters in point sets that are sensitive to the geographic context (at least, for the relevant aspects) → Assume that a set of regions is given and that points can only lie inside these regions; how should we define clusters? Joint research with Joachim Gudmundsson (NICTA, Sydney) and Giri Narasimhan (U of F, Miami), 2006

Region-restricted clustering Given a set P of points, a set F of regions, a radius r and a subset size m, a region-restricted cluster is a subset P’ ⊆ P inside a circle C where – P’ has size at least m – C has radius at most 2r – C contains at most πr^2 area of regions of F. [Figure labels: radius ≤ 2r; sum of area ≤ πr^2; r]

Region-restricted clustering Given a set P of n points, a set F of polygons with n_f edges in total, and values for r and m, report all region-restricted clusters of exactly m points. Exactly m points? “Real” clustering (partition)? Outliers?

Region-restricted clustering Exactly m points? Every cluster with > m points consists of clusters with m points with smaller circles. “Real” clustering (partition)? Outliers? [Figure: example with m = 5]

Region-restricted clustering 1. Determine all smallest circles with m points of P inside. 2. Test if the radius is ≤ r (report) or > 2r (discard). 3. If the radius is in between, determine the area of regions of F inside.
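Steps 2 and 3 can be sketched for a single candidate circle as follows. A circle of radius at most r has area at most πr^2, so it cannot contain more than πr^2 area of regions and is reported directly; only the in-between case needs the area computation. This is a minimal sketch: the function name `classify_circle` and the callable `region_area_inside` (which stands in for whatever area computation is used) are hypothetical.

```python
import math

def classify_circle(circle_radius, r, region_area_inside):
    """Return "report" or "discard" for one smallest circle with m points.

    region_area_inside is a zero-argument callable that returns the area
    of regions of F inside the circle; it is only called when needed.
    """
    if circle_radius <= r:
        # Circle area is at most pi * r^2, so the region area is too.
        return "report"
    if circle_radius > 2 * r:
        # Violates the radius bound of the cluster definition.
        return "discard"
    # In-between radius: the region area inside decides.
    area = region_area_inside()
    return "report" if area <= math.pi * r * r else "discard"
```

Passing the area computation as a callable keeps the O(1) tests of step 2 separate from the more expensive step 3.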

Region-restricted clustering 1. Determine all smallest circles with m points of P inside. Use the (m-2)-th order Voronoi diagram: cells where the same (m-2) points are closest. Its vertices are centers of smallest circles around exactly m points.

ordinary = order-1 VD

order-2 VD

order-3 VD

Region-restricted clustering The m-th order Voronoi diagram (or (m-2)) has O(nm) cells, edges, and vertices. It can be constructed in O(nm log n) time → we get O(nm) smallest circles with m points inside; for each we also know the radius.

Region-restricted clustering 2. Test if the radius is ≤ r (report) or > 2r (discard) Trivial in O(1) time per circle, so in O(nm) time overall

Region-restricted clustering 3. Determine the area of regions of F inside. Brute force: O(n_f) time per circle, so in O(nm n_f) time overall.

Region-restricted clustering Complication: This need not give all region-restricted clusters! –Need to compute area of F inside a circle with moving center –Requires solving high-degree polynomials

Region-restricted clusters The anti-climax: we cannot give an exact algorithm! If we take squares instead of circles, we can deal with the problem.

Region-restricted clustering 3. Determine the area of regions of F inside. Brute force: O(n_f) time per square, so in O(nm n_f) time overall. The total time for steps 1, 2, and 3 is O(nm log n) + O(nm) + O(nm n_f) = O(nm log n + nm n_f) time.
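The brute-force step for squares can be sketched by clipping every polygon of F to the square and summing the clipped areas, which touches each edge a constant number of times. The sketch below assumes axis-aligned squares and regions given as disjoint simple polygons (lists of (x, y) vertices); it uses Sutherland-Hodgman clipping, which is a choice made here and not taken from the slides, and all function names are hypothetical.

```python
def clip_axis(poly, axis, bound, keep_greater):
    """Clip a polygon to the half-plane coord[axis] >= bound (or <= bound)."""
    def inside(p):
        return p[axis] >= bound if keep_greater else p[axis] <= bound

    def crossing(p, q):
        # Point where segment pq crosses the line coord[axis] == bound.
        s = (bound - p[axis]) / (q[axis] - p[axis])
        return (p[0] + s * (q[0] - p[0]), p[1] + s * (q[1] - p[1]))

    out = []
    for i in range(len(poly)):
        cur, prev = poly[i], poly[i - 1]
        if inside(cur):
            if not inside(prev):
                out.append(crossing(prev, cur))
            out.append(cur)
        elif inside(prev):
            out.append(crossing(prev, cur))
    return out

def polygon_area(poly):
    """Unsigned area by the shoelace formula (0 for an empty polygon)."""
    twice = sum(poly[i - 1][0] * poly[i][1] - poly[i][0] * poly[i - 1][1]
                for i in range(len(poly)))
    return abs(twice) / 2.0

def area_in_square(regions, xmin, ymin, side):
    """Total area of the polygons in regions inside an axis-aligned square."""
    total = 0.0
    for poly in regions:
        clipped = clip_axis(poly, 0, xmin, True)
        clipped = clip_axis(clipped, 0, xmin + side, False)
        clipped = clip_axis(clipped, 1, ymin, True)
        clipped = clip_axis(clipped, 1, ymin + side, False)
        total += polygon_area(clipped)
    return total
```

For example, `area_in_square([[(0, 0), (4, 0), (4, 4), (0, 4)]], 1, 1, 2)` evaluates the part of a 4 by 4 region inside the square [1, 3] × [1, 3].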

Region-restricted clustering 3. Determine the area of regions of F inside. Using a suitable data structure (only possible for squares): O(log^2 n_f) time per square, so in O(nm log^2 n_f) time overall. The total time becomes O(nm log n + n_f log^2 n_f + nm log^2 n_f), where the three terms are the order-(m-2) VD construction, the preprocessing of the data structure, and the total query time in the data structure.

Region-restricted clustering The squares solution generalizes to regular polygons (e.g. 20-gons). An approximation of the radius within (1+ε)r gives an O(n/ε^2 + n_f log^2 n_f + n log n_f / (m ε^2)) time algorithm. [Figure: 16-gon]

Region-restricted clustering Open problems: – Develop a region-restricted version of k-means clustering, single link clustering, ... – Region-restricted co-location? – Replace region-restricted by gradual model. [Figure labels: 0/unit, 2/unit, 5/unit, 8/unit; typical; clusters]

Patterns in trajectories n trajectories, each with t time steps → n polygonal lines with t vertices. Already looked at most visited location.

Patterns in trajectories Flock: near positions of (sub)trajectories for some subset of the entities during some time. Convergence: same destination region for some subset of the entities. Encounter: same destination region with same arrival time for some subset of the entities. Similarity of trajectories. Same direction of movement, leadership, ... [Figure labels: flock, convergence]

Patterns in trajectories
Flocking, convergence, encounter patterns: Laube, van Kreveld, Imfeld (SDH 2004); Gudmundsson, van Kreveld, Speckmann (ACM GIS 2004); Benkert, Gudmundsson, Huebner, Wolle (ESA 2006); ...
Similarity of trajectories: Vlachos, Kollios, Gunopulos (ICDE 2002); Shim, Chang (WAIM 2003); ...
Lifelines, motion mining, modeling motion: Mountain, Raper (GeoComputation 2001); Kollios, Sclaroff, Betke (DM&KD 2001); Frank (GISDATA 8, 2001); ...

Patterns in trajectories Flock: near positions of (sub)trajectories for some subset of the entities during some time – clustering-type pattern – different definitions are used. Given: radius r, subset size m, and duration T, a flock is a subset of size ≥ m that is inside a (moving) circle of radius r for a duration ≥ T.

Patterns in trajectories Longest flock: given a radius r and subset size m, determine the longest time interval for which m entities were within each other’s proximity (circle radius r). [Figure: with m = 3, the longest flock is in the time interval [1.8, 6.4]]

Patterns in trajectories Meet: near some position of (sub)trajectories for some subset of the entities – clustering-type pattern. Given: radius r, subset size m, and duration T, a meet is a subset of size ≥ m that is inside a (stationary) circle of radius r for a duration ≥ T (this was “moving” for flock).

Patterns in trajectories Is the same subset required for the whole duration of a flock or meet? Example: meet with m = 4; duration is 3+ time steps or 4+ time steps?

Patterns in trajectories [Figure: examples for m = 3 of flock and meet, each with fixed subset and variable subset]

Patterns in trajectories Exact results (input size is nτ, with τ the number of time steps)
flock, fixed subset: NP-hard
flock, variable subset: O(n^3 τ log n)
meet, fixed subset: O(n^4 τ^2 log n + n^2 τ^3)
meet, variable subset: O(n^4 τ^2 log n + n^2 τ^3)

Patterns in trajectories A radius-2 approximation of the longest flock can be computed in time O(n^2 τ log n) ... meaning: if the longest flock of size m for radius r has duration T, then we surely find a flock of size m and duration ≥ T for radius 2r. [Figure: longest flock for r; at least as long a flock for 2r]

Patterns in trajectories Approximate radius results (input size is nτ)
flock, fixed subset: exact NP-hard; approximation O(n^2 τ log n), factor 2
flock, variable subset: exact O(n^3 τ log n); approximation O((n^2 τ log n) / ε^2), factor 2+ε
meet (fixed and variable subset): exact O(n^4 τ^2 log n + n^2 τ^3); approximation O((n^2 τ log n) / (m ε^2)), factor 1+ε

Fixed subset flock It is NP-complete to decide if a graph has a subgraph with m nodes that is a clique. For every node of the graph, make an entity with a trajectory. [Figure: a graph on nodes v1, ..., v7 and the trajectories built from it; all nodes not adjacent to v1 go to a separate location; v1 is not adjacent to v4, v5, and v7; radius r indicated]

Fixed subset flock [Figure: the same graph and trajectories; cases "v4 not in flock" and "v4 in flock"]

Fixed subset flock The trajectories have a fixed flock of size m and full duration if and only if the graph has a clique of size m. [Figure: flock {v4, v5, v7} of (full) duration 23 (3·7+2) and size 3]

Fixed subset flock Longest fixed flock is NP-hard. Max clique has no constant-factor approximation unless P = NP → cannot approximate duration, nor flock size. The reduction applies for all radii < 2r. [Figure: cases "v4 not in flock" and "v4 in flock"]

Flock and meet algorithms Go into 3D (space-time) for algorithms. [Figure: flock and meet drawn in space-time, with time as the third axis]

Fixed subset flock, approximation An efficient radius-2 approximation algorithm of longest fixed flock exists. Idea: if some v_i is in the longest flock, then all other entities of that flock are within distance 2r from v_i. [Figure: column of radius 2r, centered at v_i, containing the flock with v_i]

Fixed subset flock, approximation For each v_j, we can determine the O(τ) time intervals where v_j is in the column of v_i. Maintain the intersections for all entities in an augmented tree in O(n τ log n) time. Do this for all columns (role of v_i) and report the longest overall pattern. Total: O(n^2 τ log n) time.
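The first step, finding the time intervals where v_j is in the column of v_i, can be sketched as follows. The sketch assumes both trajectories are sampled at the same integer time steps 0, 1, 2, ... and that entities move linearly between consecutive samples; these are assumptions made here for illustration. Under them the squared distance between the two entities is a quadratic in time on each step, so each step contributes at most one interval. The name `close_intervals` is hypothetical; call it with radius 2r for the column of v_i.

```python
import math

def close_intervals(traj_i, traj_j, radius):
    """Maximal time intervals during which two entities are within radius."""
    intervals = []
    for k in range(len(traj_i) - 1):
        # Relative position at the start and end of time step k.
        dx0 = traj_j[k][0] - traj_i[k][0]
        dy0 = traj_j[k][1] - traj_i[k][1]
        dx1 = traj_j[k + 1][0] - traj_i[k + 1][0]
        dy1 = traj_j[k + 1][1] - traj_i[k + 1][1]
        vx, vy = dx1 - dx0, dy1 - dy0
        # Solve |d0 + s * v|^2 <= radius^2 for s in [0, 1].
        a = vx * vx + vy * vy
        b = 2 * (dx0 * vx + dy0 * vy)
        c = dx0 * dx0 + dy0 * dy0 - radius * radius
        if a == 0:
            # Constant relative position during this step.
            if c > 0:
                continue
            lo, hi = 0.0, 1.0
        else:
            disc = b * b - 4 * a * c
            if disc < 0:
                continue
            root = math.sqrt(disc)
            lo = max(0.0, (-b - root) / (2 * a))
            hi = min(1.0, (-b + root) / (2 * a))
            if lo > hi:
                continue
        start, end = k + lo, k + hi
        if intervals and abs(intervals[-1][1] - start) < 1e-12:
            # Interval continues across the time step boundary: merge.
            intervals[-1] = (intervals[-1][0], end)
        else:
            intervals.append((start, end))
    return intervals
```

The augmented tree over these intervals, which finds the longest period during which the same m entities stay in the column, is not shown.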

Variable subset flock, exact The subset that forms the flock may change entities, but must stay of size ≥ m. Any flock subset at any instant has a disk D of radius r with at least 2 entities on the boundary → defining entities. [Figure: disk of radius r with its two defining entities]

Variable subset flock, exact Two entities define two cylinders through time by tracing the two possible radius r disks

Variable subset flock, exact A critical moment is where another entity is on the boundary of the disk; it may go outside or inside

Variable subset flock, exact At a critical moment: – a variable subset flock may start (m entities) – a variable subset flock may stop (< m entities) – three pairs of defining entities have disks that coincide. There are also critical moments when two entities are at distance exactly 2r. Between two time steps t_i and t_{i+1} there are O(n^3) critical moments → in total there are O(n^3 τ) critical moments.

Variable subset flock, exact Let the O(n^3 τ) critical moments be the nodes in a directed acyclic graph G. Edges of G are between two consecutive critical moments of the same two defining entities – directed from earlier to later – weight is time between critical moments – only if at least m entities are inside the disk. A longest variable subset flock is a maximum weight path in G.

Variable subset flock, exact The graph G can be built in O(n^3 τ log n) time. A maximum weight path can be found in O(n^3 τ log n) time. A longest variable subset flock is a maximum weight path in G.
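Once G is available, the maximum weight path can be found by dynamic programming in topological order. The sketch below assumes the critical moments have already been computed and numbered 0, 1, 2, ... in order of increasing time, so every edge goes from a smaller to a larger index; building G itself is not shown. The name `max_weight_path` and the edge format (source, target, weight) are hypothetical.

```python
def max_weight_path(num_nodes, edges):
    """Maximum total weight of a path in a DAG whose nodes are time-ordered.

    edges is a list of (u, v, weight) with u < v. Returns the weight and
    the list of nodes on one maximum weight path.
    """
    best = [0.0] * num_nodes   # best[v]: max weight of a path ending at v
    pred = [None] * num_nodes  # predecessor of v on that path
    # Sorting by source index processes nodes in topological order, so
    # best[u] is final before any edge leaving u is relaxed.
    for u, v, w in sorted(edges, key=lambda e: e[0]):
        if best[u] + w > best[v]:
            best[v] = best[u] + w
            pred[v] = u
    end = max(range(num_nodes), key=lambda v: best[v])
    path = [end]
    while pred[path[-1]] is not None:
        path.append(pred[path[-1]])
    path.reverse()
    return best[end], path
```

Here the edge weights are the time spans between consecutive critical moments, so the returned weight is the duration of the flock.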

Patterns in trajectories, summary Flock and meet patterns require algorithms in 3-dimensional space (space-time). Exact algorithms are inefficient → only suitable for smaller data sets. Approximation can reduce running time by one or two orders of magnitude.

Patterns in trajectories, summary
flock, fixed subset: exact NP-hard; approximation O(n^2 τ log n), factor 2
flock, variable subset: exact O(n^3 τ log n); approximation O((n^2 τ log n) / ε^2), factor 2+ε
meet (fixed and variable subset): exact O(n^4 τ^2 log n + n^2 τ^3); approximation O((n^2 τ log n) / (m ε^2)), factor 1+ε

Future research on longest trajectories Faster exact and approximation algorithms. Better approximation factors. Remove restriction of fixed shape of flocking region (compact or elongated both possible during same flock). Longest duration convergence. [Figure: longest convergence]

Patterns in trajectories Flock and meet patterns require algorithms in 3-dimensional space (space-time). Exact algorithms are inefficient → only suitable for smaller data sets. Approximation can reduce running time by an order of magnitude.

To conclude With an exact definition of a spatial or spatio-temporal pattern, geometric algorithms can be used to compute all patterns. Many known structures from computational geometry are useful (Voronoi diagrams, arrangements, ...). Since the (exact) algorithms may be inefficient, approximation may be a solution.

To discuss What patterns must be detected in practice (both spatial and spatio-temporal)? What is the most appropriate definition (formalization) of these? Spatial association rules, auto-correlation, irregularities, classification,... and other computable things in spatial/spatio-temporal data mining