DMiST- Data Mining in Spatio-Temporal sets www.dmist.net.


1 DMiST- Data Mining in Spatio-Temporal sets www.dmist.net

2 Input Number of time steps = T. Example: T = 9 (time steps t=0, t=1, …, t=8). Entity: (x1,y1), (x2,y2), …, (x9,y9)

3 Input Number of entities/animals/items = n. Example: n=4 and T=11.
I_1: (x^1_1, y^1_1), …, (x^1_T, y^1_T)
I_2: (x^2_1, y^2_1), …, (x^2_T, y^2_T)
…
I_n: (x^n_1, y^n_1), …, (x^n_T, y^n_T)
(Figure labels: flock, encounter, convergence.)
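A minimal sketch of how this input could be held in memory, assuming plain Python lists. The names `Trajectory`, `make_example` and `snapshot` are illustrative and do not come from the slides; the synthetic straight-line data is for demonstration only.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]  # one entity: T positions, one per time step


def make_example(n: int, T: int) -> List[Trajectory]:
    """Build n synthetic straight-line trajectories with T time steps each."""
    return [[(float(t), float(i)) for t in range(T)] for i in range(n)]


def snapshot(trajs: List[Trajectory], t: int) -> List[Point]:
    """Return the positions of all entities at time step t (0-based)."""
    return [traj[t] for traj in trajs]


# Sizes from the slide's example: n = 4 entities, T = 11 time steps.
trajs = make_example(4, 11)
```

With this layout the input size is `len(trajs) * len(trajs[0])`, i.e. nT positions.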

4 Example Caribou Satellite Collar Project, Canada. Number of caribou = 15. Time steps = once a week for 8 years.

5 Input size? To obtain efficient solutions we need algorithms that scale well, i.e. algorithms whose running time depends only modestly on the input size.
n – number of entities (20 to millions)
T – number of time steps (10 to thousands)
m – size of a flock (2 to 200 entities)
k – flock duration (5 to 50 time steps)
Size of input = nT
Practical algorithms: O((nT)^2)
Fast algorithms: O(nT log nT)
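To make the gap between the two running-time classes concrete, the sketch below evaluates both growth functions for a given n and T. It ignores constant factors, and the function names are illustrative, so the numbers are only order-of-magnitude indications.

```python
import math


def input_size(n: int, T: int) -> int:
    """Number of recorded positions, nT."""
    return n * T


def quadratic_cost(n: int, T: int) -> int:
    """Growth function (nT)^2, constants ignored."""
    return input_size(n, T) ** 2


def nlogn_cost(n: int, T: int) -> float:
    """Growth function nT * log2(nT), constants ignored."""
    size = input_size(n, T)
    return size * math.log2(size)


# Upper end of the ranges on the slide: a million entities, ten thousand steps.
n, T = 10**6, 10**4
print(quadratic_cost(n, T))  # on the order of 10^20
print(nlogn_cost(n, T))      # on the order of 10^11
```

At this scale the quadratic bound is about nine orders of magnitude larger than the nT log nT bound.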

6 Six basic patterns
1. Encounter: At least m entities pass through a circular region of radius r.
2. Convergence: At least m entities are simultaneously within a circular region of radius r.
3. Flock: At least m entities move together during a time interval of length at least s; at every point in time there is a circular region of radius r that contains all the entities.
4. Recurrences: At least m entities visit a circular region of radius r at least k times.
5. Regular recurrences
6. Concurrent recurrences
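As an illustration of the recurrence pattern, the sketch below checks one candidate region. It assumes that a "visit" is a maximal run of consecutive sampled positions inside the disc, and that only sampled positions are tested (no interpolation between time steps). Both assumptions, and the function names, are not from the slides.

```python
import math


def count_visits(traj, centre, r):
    """Count how many separate times a trajectory enters the disc (centre, r)."""
    visits = 0
    inside_prev = False
    for (x, y) in traj:
        inside = math.hypot(x - centre[0], y - centre[1]) <= r
        if inside and not inside_prev:
            visits += 1  # a new visit starts when the entity enters the disc
        inside_prev = inside
    return visits


def is_recurrence(trajs, centre, r, m, k):
    """True if at least m trajectories visit the disc at least k times each."""
    frequent = sum(1 for traj in trajs if count_visits(traj, centre, r) >= k)
    return frequent >= m
```

Finding the region itself is the hard part of the problem; this only verifies a region that is already given.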

7 Members
NICTA: Joachim Gudmundsson, Thomas Wolle, Ghazi Al-Naymat
DSTO: Brenton Williams, Matthew Lowry
Uni. of Sydney: Sanjay Chawla
Uni. of Queensland: Xiaofang Zhou, Heng Tao Shen, Hoyoung Jeung
Utrecht University: Marc van Kreveld

8 Members
NICTA: Algorithms (apx), Computational Geometry, Data mining
DSTO: Applications, Data mining
Uni. of Sydney: Data mining, Algorithms
Uni. of Queensland: Data base systems, Data mining
Utrecht University: Algorithms, GIS

9 Approximations Most problems cannot be solved fast! Instead we need to approximate the solution.
Example: Convergence (radius r is given). Find all discs of radius r that contain at least m entities. (Figure: convergence with m=10.)
Two options: approximate the number of entities, or approximate the radius.
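One simple way to approximate the radius, shown here for a single snapshot of positions, relies on the triangle inequality: any two points inside a disc of radius r are at most 2r apart, so a disc of radius 2r centred on any one of them contains all of them. The brute-force O(n^2) sketch below is an illustration of that idea under this assumption, not the algorithm from the slides; `approx_discs` is a hypothetical name.

```python
import math


def approx_discs(points, r, m):
    """Return indices i whose disc of radius 2r, centred on points[i],
    contains at least m points (the point itself included).

    If some disc of radius r contains m points, every one of those points
    is returned, so no radius-r solution is missed; the reported discs may
    be up to twice as large as requested.
    """
    hits = []
    for i, (px, py) in enumerate(points):
        count = sum(
            1 for (qx, qy) in points if math.hypot(px - qx, py - qy) <= 2 * r
        )
        if count >= m:
            hits.append(i)
    return hits
```

For example, `approx_discs([(0, 0), (1, 0), (0, 1), (10, 10)], r=1, m=3)` reports the three clustered points as centres and ignores the outlier.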

10 Convergence Is there a point that is “covered” by at least m rectangles? Is there a disc of radius r that intersects at least m lines?

11 Convergence Good news: 2-approximation of the number of entities in O(Tn^2/m) time. Bad news: Cannot be solved exactly faster than ~Tn^2.

12 Encounter Is there a disc of radius r that intersects at least m entities at some point in time? (Figure: time steps t1 to t4; disc of diameter 2r.)

13 Encounter - detect
Idea:
- Consider one “cylinder” C with radius 2r.
- Compute the intersections between C and the n-1 paths.
- If more than 7m paths are inside C at any time, then report “Encounter”. Total time: O(n log n) per cylinder.
- If not, then solve exactly. Observation: the total size of all subsets within C is O(mn). Total time: O(n log n + nm) per cylinder.
Time: O(Tn^2 (log n + m)).
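The quick test in the third step can be sketched as follows. One way to read the 7m threshold (an interpretation, not stated on the slide) is that a disc of radius 2r can be covered by 7 discs of radius r, so by the pigeonhole principle more than 7m entities inside the large disc force more than m of them into one small disc. The sketch only tests sampled time steps, uses brute-force counting instead of the O(n log n) intersection computation, and leaves the exact fallback out. `quick_encounter_test` is a hypothetical name.

```python
import math


def quick_encounter_test(trajs, i, r, m):
    """Quick test around entity i.

    Returns True if, at some sampled time step, more than 7*m other
    entities lie within distance 2r of entity i. Returns None when the
    quick test is inconclusive and an exact method would be needed.
    """
    for t in range(len(trajs[i])):
        cx, cy = trajs[i][t]
        close = sum(
            1
            for j, traj in enumerate(trajs)
            if j != i and math.hypot(traj[t][0] - cx, traj[t][1] - cy) <= 2 * r
        )
        if close > 7 * m:
            return True
    return None
```

Returning `None` rather than `False` keeps the two outcomes distinct: a failed quick test does not prove that no encounter exists.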

14 Flock - definition
m – flock size
k – flock duration
r – radius of disc
(Figure: time steps t1 to t4.)

15 Flock - Problem Problem: Find a largest flock. The problem is NP-hard: it is as hard as MaxClique! (Figure: entities a to e at time steps t1 to t5; MaxClique.)

16 Flock – Hardness result Cannot be approximated in polynomial time within a factor of n^{1-ε} of the optimal (even if we approximate the radius by a factor of 2). Hopeless?

17 Flock Idea: An entity in the time interval [t_1, t_d] → a point in 2d dimensions. (Figure: time steps t1 to t7 give a point in 14-dimensional Euclidean space.)
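The transformation can be sketched in a few lines: the d positions of an entity in the interval are concatenated into one tuple of 2d coordinates. The function name `window_to_point` and the 0-based `start` index are illustrative choices.

```python
def window_to_point(traj, start, d):
    """Flatten d consecutive (x, y) positions, beginning at index start,
    into a single point with 2d coordinates."""
    if start < 0 or start + d > len(traj):
        raise ValueError("window does not fit inside the trajectory")
    point = []
    for (x, y) in traj[start:start + d]:
        point.extend((x, y))
    return tuple(point)
```

With seven time steps, as in the figure, each entity becomes a point with 14 coordinates.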

18 Flock Intersection of k (2k-2)-dimensional “cylinders”. (Figure: time steps t1 to t7.)

19 Flock
1. For each i=k to T do
2.   For every entity E in the time interval [t_i, t_{i+k}] do
3.     transform E to a point in 2k-dimensional space
4.   Build a “Skip Quadtree”
5.   For each point do
6.     perform a 2k-dimensional range counting query.
Approximation: 3-approximation of the radius
Total time: O(Tk (n log n + (1.5)^{2k}))
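The overall loop can be sketched with brute-force counting in place of the Skip Quadtree. This is a simplified stand-in, not the algorithm above: it centres the query on each entity and uses radius 2r at every time step, which cannot miss a radius-r flock (any two members are at most 2r apart at each step) but may report groups that need discs of radius up to 2r. It takes O(n^2 k) time per window. All names are hypothetical.

```python
import math


def window_distance(a, b):
    """Largest per-time-step distance between two equal-length windows."""
    return max(
        math.hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b)
    )


def approx_flocks(trajs, k, r, m):
    """Return (start, centre_index, members) for every window of k time
    steps and every entity that has at least m entities (itself included)
    within distance 2r at each step of the window."""
    T = len(trajs[0])
    found = []
    for start in range(T - k + 1):
        windows = [traj[start:start + k] for traj in trajs]
        for i, wi in enumerate(windows):
            members = [
                j for j, wj in enumerate(windows)
                if window_distance(wi, wj) <= 2 * r
            ]
            if len(members) >= m:
                found.append((start, i, members))
    return found
```

Requiring the distance bound at every time step of the window corresponds to the intersection of k cylinders in the 2k-dimensional picture.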

20 Flock – experimental results
#entities   Flock duration   Time (s)
20K         4                <1
20K         8                67
20K         16               81
20K         32               94
80K         4                7
80K         8                980
80K         16               1600
160K        4                20
160K        8                2800
160K        16               6800

21 What should be reported?
Detect if a pattern exists and report it.
Report all patterns.
Report the “largest” pattern.

22 Current and future research
Advanced patterns
– Regular recurrences
– Hierarchical patterns
– …
Implement practical algorithms
Algorithms and association rule mining
Input data with errors?
External memory algorithms?
Generate test data

