DMiST: Data Mining in Spatio-Temporal sets (www.dmist.net)

Input: number of time steps = T. Example: T = 9, with time steps t = 0, 1, …, 8. An entity is a sequence of positions $(x_1, y_1), (x_2, y_2), \dots, (x_9, y_9)$.

[Figure: examples of the flock, encounter and convergence patterns.]
Input: number of entities (animals, items) = n. Example: n = 4 and T = 11.
$I_1: (x_1^1, y_1^1), \dots, (x_1^T, y_1^T)$
$I_2: (x_2^1, y_2^1), \dots, (x_2^T, y_2^T)$
…
$I_n: (x_n^1, y_n^1), \dots, (x_n^T, y_n^T)$
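A minimal sketch of one way to hold this input in Python, assuming every entity is sampled at the same T time steps (0-based here). The names `trajectories`, `positions_at` and `total_samples` are hypothetical, and the coordinates are made up:

```python
from typing import List, Tuple

# Hypothetical layout: trajectories[i][t] is the (x, y) position of entity i at time step t.
Point = Tuple[float, float]
Trajectory = List[Point]


def positions_at(trajectories: List[Trajectory], t: int) -> List[Point]:
    """Positions of all n entities at time step t (0-based index)."""
    return [traj[t] for traj in trajectories]


def total_samples(trajectories: List[Trajectory]) -> int:
    """Number of (x, y) samples; equals n * T when every entity has T samples."""
    return sum(len(traj) for traj in trajectories)


# Toy input: n = 3 entities, T = 4 time steps (made-up coordinates).
trajectories = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)],
    [(5.0, 5.0), (4.0, 4.0), (3.0, 3.0), (2.0, 2.0)],
]
print(positions_at(trajectories, 2))  # [(2.0, 0.0), (2.0, 1.0), (3.0, 3.0)]
print(total_samples(trajectories))    # 12
```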

Example: Caribou Satellite Collar Project, Canada. Number of caribou = 15. Time steps: once a week for 8 years.

Input size? To obtain efficient solutions we need algorithms that scale well, i.e. algorithms with limited dependence on the input size.
- n: number of entities (from 20 to millions)
- T: number of time steps (from 10 to thousands)
- m: size of a flock (from 2 to 200 entities)
- k: flock duration (from 5 to 50 time steps)
Size of input = $nT$. Practical algorithms: $O((nT)^2)$. Fast algorithms: $O(nT \log nT)$.
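To make the gap between the two running times concrete, the sketch below simply evaluates $(nT)^2$ and $nT \log_2 nT$, ignoring constant factors. It assumes 52 weekly samples per year for the caribou example and uses n = 10^6, T = 10^4 as a hypothetical large instance; neither number is a measured result.

```python
import math


def input_size(n: int, T: int) -> int:
    """n entities, each sampled at T time steps."""
    return n * T


def cost_estimates(n: int, T: int):
    """Rough operation counts (constants ignored) for O((nT)^2) and O(nT log nT) algorithms."""
    N = input_size(n, T)
    return N * N, N * math.log2(N)


# Caribou example: 15 animals, one sample per week for 8 years (assuming 52 weeks/year).
print(input_size(15, 8 * 52))  # 6240

# Hypothetical large instance: a million entities, ten thousand time steps.
quadratic, near_linear = cost_estimates(10**6, 10**4)
print(f"{quadratic:.1e} vs {near_linear:.1e}")  # 1.0e+20 vs 3.3e+11
```

At this scale the quadratic algorithm needs roughly $3 \times 10^8$ times more work, which is why the slides single out the $O(nT \log nT)$ class as "fast".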

Six basic patterns:
1. Encounter: at least m entities are simultaneously within a circular region of radius r.
2. Convergence: at least m entities pass through a circular region of radius r.
3. Flock: at least m entities move together during a time interval of length at least s; at every point in time there is a circular region of radius r that contains all the entities.
4. Recurrences: at least m entities visit a circular region of radius r at least k times.
5. Regular recurrences
6. Concurrent recurrences

Members:
- NICTA: Joachim Gudmundsson, Thomas Wolle, Ghazi Al-Naymat
- DSTO: Brenton Williams, Matthew Lowry
- Uni. of Sydney: Sanjay Chawla
- Uni. of Queensland: Xiaofang Zhou, Heng Tao Shen, Hoyoung Jeung
- Utrecht University: Marc van Kreveld

Members (areas of expertise):
- NICTA: algorithms (approximation), computational geometry, data mining
- DSTO: applications, data mining
- Uni. of Sydney: data mining, algorithms
- Uni. of Queensland: database systems, data mining
- Utrecht University: algorithms, GIS

Approximations: most problems cannot be solved fast! Instead we need to approximate the solution. Example: Convergence (radius r is given): find all discs of radius r that contain at least m entities. [Figure: convergence with m = 10.] There are two ways to approximate: approximate the number of entities, or approximate the radius.

Convergence is related to the following problems: Is there a point that is “covered” by at least m rectangles? Is there a disc of radius r that intersects at least m lines?
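For the first of these questions, a simple brute-force check (not the algorithm from the slides) uses the fact that if several closed axis-parallel rectangles share a point, their common intersection has its lower-left corner at the largest left edge and the largest bottom edge among them. So only points $(x, y)$ with x a left edge and y a bottom edge need testing, which gives $O(n^3)$ time for n rectangles. The rectangle format `(x1, y1, x2, y2)` and the function names are assumptions.

```python
from typing import List, Tuple

# Assumed format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2; rectangles are closed.
Rect = Tuple[float, float, float, float]


def max_coverage(rects: List[Rect]) -> int:
    """Largest number of rectangles covering a single point (brute force, O(n^3))."""
    best = 0
    lefts = [x1 for x1, _, _, _ in rects]
    bottoms = [y1 for _, y1, _, _ in rects]
    for x in lefts:
        for y in bottoms:
            # Count the rectangles that contain the candidate point (x, y).
            depth = sum(1 for x1, y1, x2, y2 in rects if x1 <= x <= x2 and y1 <= y <= y2)
            best = max(best, depth)
    return best


def covered_by_at_least(rects: List[Rect], m: int) -> bool:
    """Is there a point covered by at least m rectangles?"""
    return max_coverage(rects) >= m


rects = [(0, 0, 2, 2), (1, 1, 3, 3), (1.5, 0, 4, 1.5)]
print(max_coverage(rects))            # 3, e.g. at the point (1.5, 1)
print(covered_by_at_least(rects, 4))  # False
```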

Convergence. Good news: a 2-approximation of the number of entities in $O(Tn^2/m)$ time. Bad news: it cannot be solved exactly faster than roughly $Tn^2$.

Encounter: is there a disc of radius r that intersects at least m entities at some point in time? [Figure: entity positions at times t1 to t4 and a disc of radius 2r.]

Encounter: detection
Idea:
- Consider one “cylinder” C with radius 2r.
- Compute the intersections between C and the other n−1 paths.
- If more than 7m paths are inside C at any time, report “Encounter”. Total time: $O(n \log n)$ per cylinder.
- If not, solve exactly. Observation: the total size of all subsets within C is $O(mn)$. Total time: $O(n \log n + nm)$ per cylinder.
Overall time: $O(Tn^2(\log n + m))$.
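The cylinder-based algorithm is not reproduced here. As a much simpler stand-in, the sketch below checks encounters only at the sampled time steps: if a disc of radius r holds at least m entities at time t, then the disc of radius 2r centred at any one of them holds them all, so testing discs of radius 2r around every entity finds every such encounter, at the cost of a factor-2 error in the radius and $O(Tn^2)$ time. Movement between samples is ignored; the function name and the input layout (`trajectories[i][t]` is an `(x, y)` pair) are assumptions.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def detect_encounter(trajectories: List[List[Point]], m: int, r: float) -> Optional[Tuple[int, int]]:
    """Return (t, i) such that at least m entities lie within 2r of entity i at time step t.

    Only sampled time steps are examined. Returns None if no such pair exists.
    """
    T = len(trajectories[0])
    for t in range(T):
        pts = [traj[t] for traj in trajectories]
        for i, (cx, cy) in enumerate(pts):
            # Count entities (including i itself) inside the disc of radius 2r centred at entity i.
            count = sum(1 for x, y in pts if math.hypot(x - cx, y - cy) <= 2 * r)
            if count >= m:
                return t, i
    return None


trajectories = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)],
    [(5.0, 5.0), (4.0, 4.0), (3.0, 3.0), (2.0, 2.0)],
]
print(detect_encounter(trajectories, m=3, r=1.0))  # (3, 1)
```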

Flock: definition. m: flock size; k: flock duration; r: radius of the disc. [Figure: a flock over time steps t1 to t4.]
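Checking whether a given group is a flock requires, at each time step, the smallest disc enclosing the group. A simpler test, sketched below as an assumption rather than the method of the slides, compares pairwise distances. Members of a true flock of radius r are pairwise at most 2r apart, and a group that is pairwise within 2r fits in the disc of radius 2r centred at any member. So the test accepts every flock of radius r and only accepts groups that form flocks of radius 2r. Time steps are 0-based, the window is `start, ..., start + k - 1`, and all names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def is_approx_flock(trajectories: List[List[Point]], group: Sequence[int],
                    start: int, k: int, m: int, r: float) -> bool:
    """Pairwise-distance flock test over the k time steps start, ..., start + k - 1."""
    if len(group) < m or start < 0 or start + k > len(trajectories[0]):
        return False
    for t in range(start, start + k):
        for a, b in combinations(group, 2):
            (xa, ya), (xb, yb) = trajectories[a][t], trajectories[b][t]
            # Two members farther apart than 2r cannot share a disc of radius r.
            if math.hypot(xa - xb, ya - yb) > 2 * r:
                return False
    return True


trajectories = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)],
    [(5.0, 5.0), (4.0, 4.0), (3.0, 3.0), (2.0, 2.0)],
]
print(is_approx_flock(trajectories, [0, 1], start=0, k=4, m=2, r=0.5))     # True
print(is_approx_flock(trajectories, [0, 1, 2], start=0, k=4, m=3, r=0.5))  # False
```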

Flock: problem. Find a largest flock. The problem is NP-hard: it is as hard as MaxClique! [Figure: reduction from MaxClique, with entities a to e at time steps t1 to t5.]

Flock: hardness result. A largest flock cannot be approximated in polynomial time within a factor of $n^{1-\varepsilon}$ of the optimum (even if we approximate the radius within a factor of 2). Hopeless?

Flock. Idea: an entity in the time interval $[t_1, t_d]$ corresponds to a point in $2d$-dimensional space. [Figure: an entity's positions at t1 to t7 as a point in 14-dimensional Euclidean space.]

Flock: in this space a flock corresponds to the intersection of k $(2k-2)$-dimensional “cylinders”, one per time step. [Figure: time steps t1 to t7.]
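A sketch of this transformation, assuming 0-based time steps and a window of the k steps `start, ..., start + k - 1`. Each entity becomes a point in $\mathbb{R}^{2k}$. The j-th cylinder around a centre point c contains exactly the points whose j-th (x, y) pair lies within distance r of c's j-th pair; the other $2k-2$ coordinates are unrestricted. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def to_point(trajectory: List[Point], start: int, k: int) -> Tuple[float, ...]:
    """Concatenate the (x, y) positions at steps start, ..., start + k - 1 into one 2k-dimensional point."""
    return tuple(c for t in range(start, start + k) for c in trajectory[t])


def in_all_cylinders(p: Sequence[float], c: Sequence[float], r: float) -> bool:
    """True if p lies in each of the k cylinders around c, i.e. within r of c at every time step."""
    for j in range(len(p) // 2):
        # Cylinder j constrains only coordinates 2j and 2j+1.
        if math.hypot(p[2 * j] - c[2 * j], p[2 * j + 1] - c[2 * j + 1]) > r:
            return False
    return True


a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
pa, pb = to_point(a, 0, 3), to_point(b, 0, 3)
print(pa)                             # (0.0, 0.0, 1.0, 0.0, 2.0, 0.0)
print(in_all_cylinders(pa, pb, 1.0))  # True
print(in_all_cylinders(pa, pb, 0.5))  # False
```

The intersection of cylinders is not a 2k-dimensional Euclidean ball: here `pa` and `pb` are $\sqrt{3}$ apart in $\mathbb{R}^6$, so a ball of radius 1 around `pb` would miss `pa`, although entity a stays within distance 1 of b at every step.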

Flock: algorithm
1. For each i = 1 to T − k + 1 do
2.   For every entity E in the time interval $[t_i, t_{i+k-1}]$ do
3.     Transform E to a point in 2k-dimensional space.
4.   Build a “skip quadtree” on the points.
5.   For each point do
6.     Perform a 2k-dimensional range counting query.
Approximation: 3-approximation of the radius. Total time: $O(Tk(n \log n + (1.5)^{2k}))$.
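The skip quadtree is what makes the algorithm above fast, and it is not reproduced here. The sketch below keeps the same outline (slide a window of k time steps, map each entity to a 2k-dimensional point, count the points in a query region) but answers each query by brute force. Taking every entity i as a query centre with per-step radius 2r finds every flock of radius r that contains i, so this is a 2-approximation of the radius in $O(Tkn^2)$ time, not the 3-approximation and running time quoted above. Names and the 0-based window convention are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def find_flocks(trajectories: List[List[Point]], m: int, k: int, r: float):
    """Report (start, i, members) whenever >= m entities stay within 2r of entity i
    at every time step start, ..., start + k - 1."""
    n, T = len(trajectories), len(trajectories[0])
    results = []
    for start in range(T - k + 1):
        # Map each entity's window to a point in 2k-dimensional space.
        pts = [[c for t in range(start, start + k) for c in trajectories[e][t]] for e in range(n)]
        for i in range(n):
            # Brute-force "range counting": test every point against the k cylinders around pts[i].
            members = [
                j for j in range(n)
                if all(math.hypot(pts[j][2 * s] - pts[i][2 * s],
                                  pts[j][2 * s + 1] - pts[i][2 * s + 1]) <= 2 * r
                       for s in range(k))
            ]
            if len(members) >= m:
                results.append((start, i, members))
    return results


trajectories = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)],
    [(5.0, 5.0), (4.0, 4.0), (3.0, 3.0), (2.0, 2.0)],
]
print(find_flocks(trajectories, m=2, k=4, r=0.5))  # [(0, 0, [0, 1]), (0, 1, [0, 1])]
```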

Flock: experimental results
#entities | Flock duration | Time (s)
20K | 4 | <1
20K | 8 | 67
… | … | …

What should be reported?
- Detect whether a pattern exists and report it.
- Report all patterns.
- Report the “largest” pattern.

Current and future research
- Advanced patterns:
  - Regular recurrences
  - Hierarchical patterns
  - …
- Implement practical algorithms
- Algorithms and association rule mining
- Input data with errors?
- External memory algorithms?
- Generate test data