Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results
Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K. Snyder

1 Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K. Snyder xun@cs.umn.edu

2 Outline
Introduction
Problem Formulation & Challenges
Computational Solutions
Experimental Evaluation
Case Study
Conclusion and Future Work

3 Interesting ST Sub-path
Interesting subsets of ST paths
  ৹ Climate Change
  ৹ Transport Science
  ৹ Environmental Monitoring
Figures: speed profile along a trajectory on I-95; Mississippi river; sea level rise along coastal areas.
Source: http://ops.fhwa.dot.gov/tolling_pricing/value_pricing/pubs_reports/projectreports/i95managedlanes/index.htm
Source: http://scienceblogs.com/intersection/2009/01/federal_report_warns_of_rising.php
Source: http://blog.seattlepi.com/environment/

4 Sub-path of Abrupt Change
Spatial sub-path of abrupt change
  ৹ Sharp change in vegetation cover
  ৹ Transition between ecological zones (ecotones)
  ৹ Moves in response to climate change
The change is enduringly abrupt.
Figure: Vegetation cover in Africa in NDVI (normalized difference vegetation index).
Figure: A plot of vegetation cover along 18.5E longitude (the red line) from the GIMMS vegetation dataset [1], with W1 = [12N, 17N], W2, and W3 marked.

5 Related Work
Interesting Sub-path Discovery
  ৹ Interesting point/unit sub-path
      1-D: Change point detection, e.g., CUSUM [3]
      2-D: Edge detection [4]
  ৹ Interesting sub-path with arbitrary length (our work)

6 Our Contribution
Formalize the Interesting Sub-path Discovery problem
A novel computational solution: SEP
Cost model and analysis of its performance
Case study in a real application

7 Problem Formulation: Basic Concepts
Sub-path: a contiguous subset of a path.
Unit sub-path: two neighboring locations, length = 1.
  ৹ A value is associated with each unit sub-path.
Interesting Sub-path (ISP):
  (1) Interest measure: a function F_spi(i, j) → R, where R is a real value. F_spi is an algebraic function [5] (e.g., mean = sum/count).
  (2) Interestingness test T: F_spi → {True, False}.
  (3) Example: "average increase is at least 3.5".
Dominant ISP (DISP):
  ৹ An ISP that is not a subset of any other ISP.
Aggregate functions:
  Distributive: SUM, COUNT. E.g., SUM(1, 5) = SUM(SUM(1, 3), SUM(3, 5)).
  Algebraic: AVG. AVG = SUM/COUNT.
  Holistic: MEDIAN.
Example path:
  Location          1   2   3   4   5   6   7   8   9  10  11  12
  Attribute value   1   8   2   3   2   7  12  16  13  18  23  12
  Unit sub-path     1-2  2-3  3-4  4-5  5-6  6-7  7-8  8-9  9-10  10-11  11-12
  Difference value    7   -6    1   -1    5    5    4   -3     5      5    -11
Slope of (5, 11) = (5 + 5 + 4 - 3 + 5 + 5) / 6 = 3.5, so sub-path (5, 11) passes the example test.
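The interest measure and test from this slide can be sketched in a few lines of Python. This is only an illustration: the names `avg_slope` and `is_interesting` are hypothetical, locations are numbered from 1 as on the slide, and the attribute values are chosen to be consistent with the slide's difference values.

```python
# Attribute values at locations 1..12 (assumed; consistent with the slide's differences).
values = [1, 8, 2, 3, 2, 7, 12, 16, 13, 18, 23, 12]

# One difference value per unit sub-path (k, k+1).
diffs = [b - a for a, b in zip(values, values[1:])]


def avg_slope(i, j):
    """Interest measure F_spi(i, j): mean difference value over sub-path (i, j)."""
    unit_values = diffs[i - 1:j - 1]  # unit sub-paths i-(i+1) ... (j-1)-j
    return sum(unit_values) / len(unit_values)


def is_interesting(i, j, threshold=3.5):
    """Interestingness test T: 'average increase is at least threshold'."""
    return avg_slope(i, j) >= threshold


print(avg_slope(5, 11))       # average slope of sub-path (5, 11)
print(is_interesting(8, 9))   # a unit sub-path inside (5, 11)
```

Sub-path (5, 11) passes the test while the unit sub-path (8, 9) inside it does not, which is why interestingness cannot be pruned monotonically.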

8 Problem Statement
Given
  ৹ A path S in an ST framework with n unit sub-paths
  ৹ A function f of values associated with each unit sub-path in S
  ৹ An interestingness measure (algebraic function) F_spi: R^n → R
  ৹ A test function T: R → {True, False}
Find
  ৹ All the dominant interesting sub-paths (DISPs) in S
Objective
  ৹ Reduce computational cost
Constraints
  ৹ Correctness & completeness
Example:
  Unit sub-path     1-2  2-3  3-4  4-5  5-6  6-7  7-8  8-9  9-10  10-11  11-12
  Difference value    7   -6    1   -1    5    5    4   -3     5      5    -11

9 Challenges
No pre-defined maximum length for DISPs
  ৹ E.g., the length can range from 1 to |S|.
Pattern interestingness lacks monotonicity
  ৹ Interest measures are usually algebraic functions.
  ৹ E.g., sub-path (8, 9), with difference value -3, is not interesting although it lies within the interesting sub-path (5, 11).
The data volume can be very large
  ৹ Long time series / fine-resolution images.
  ৹ GPS trajectories.

10 Computational Solutions: Naive Approach
Step 1: ISP identification
  ৹ Exhaustively enumerate all the sub-paths.
  ৹ Scan each sub-path to compute and test its interestingness.
Step 2: Dominated ISP elimination
  ৹ For each ISP in the candidate set, eliminate all the ISPs it dominates.
Bottleneck 1: Repetitive scans of sub-paths to compute F_spi.
Bottleneck 2: Many dominated sub-paths are generated.
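A minimal Python sketch of the two naive steps, assuming the interest measure is the average difference value and the test is "average >= threshold" (the function name `naive_disp` is hypothetical):

```python
def naive_disp(diffs, threshold):
    """Naive approach: enumerate and rescan every sub-path, then drop dominated ISPs."""
    n = len(diffs)  # number of unit sub-paths; locations are 1..n+1
    candidates = []

    # Step 1: ISP identification. Each sub-path (i, j) is rescanned from scratch.
    for i in range(1, n + 1):
        for j in range(i + 1, n + 2):
            segment = diffs[i - 1:j - 1]
            if sum(segment) / len(segment) >= threshold:
                candidates.append((i, j))

    # Step 2: dominated ISP elimination. p is dominated if another ISP q contains it.
    def dominated(p, q):
        return p != q and q[0] <= p[0] and p[1] <= q[1]

    return [p for p in candidates
            if not any(dominated(p, q) for q in candidates)]


diffs = [7, -6, 1, -1, 5, 5, 4, -3, 5, 5, -11]
print(naive_disp(diffs, 3.5))
```

On the running example, Step 1 produces eleven ISPs, nine of which lie inside (5, 11) and are discarded in Step 2. Both bottlenecks named on the slide are visible: the inner `sum` rescans each sub-path, and most candidates are generated only to be thrown away.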

11 Computational Solution: SEP Approach
The Sub-path Enumeration and Pruning (SEP) Approach
Solution 1: Build lookup tables for distributive functions
  ৹ E.g., SUM(3, 5) = SUM(1, 5) - SUM(1, 3)
  ৹ Built in linear time, lookup in constant time
  ৹ Reversible aggregate functions [6]: sum, count, etc.
Solution 2: Design efficient enumeration strategies
  ৹ Traverse the sub-path space in a certain order
  ৹ Follow the dominance relationship
Lookup table for SUM:
  Sub-path  (1,2) (1,3) (1,4) (1,5) (1,6) (1,7) (1,8) (1,9) (1,10) (1,11) (1,12)
  SUM          7     1     2     1     6    11    15    12     17     22     11
Example:
  Unit sub-path     1-2  2-3  3-4  4-5  5-6  6-7  7-8  8-9  9-10  10-11  11-12
  Difference value    7   -6    1   -1    5    5    4   -3     5      5    -11
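The SUM lookup table is a prefix-sum array. The sketch below builds it in one pass and answers SUM and AVG queries in constant time; the helper names `sub_sum` and `sub_avg` are hypothetical, and locations are 1-based as on the slide.

```python
from itertools import accumulate

diffs = [7, -6, 1, -1, 5, 5, 4, -3, 5, 5, -11]

# prefix[k - 1] holds SUM(1, k); SUM(1, 1) is the empty sum 0.
prefix = [0] + list(accumulate(diffs))


def sub_sum(i, j):
    """SUM(i, j) = SUM(1, j) - SUM(1, i), a constant-time lookup."""
    return prefix[j - 1] - prefix[i - 1]


def sub_avg(i, j):
    """AVG is algebraic: SUM / COUNT, where COUNT is the number of unit sub-paths."""
    return sub_sum(i, j) / (j - i)


print(prefix[1:])      # the SUM row of the lookup table
print(sub_sum(3, 5))   # SUM(1, 5) - SUM(1, 3)
print(sub_avg(5, 11))
```

The subtraction works because SUM is reversible; a holistic function such as MEDIAN has no such table.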

12 SEP with Row-wise Traversal
Step 0: Build the lookup table by scanning the entire path
Step 1: Sub-path enumeration
Step 2: Dominated sub-path elimination (identical to that of the Naive Approach)
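The slide does not spell out the enumeration order, so the following Python sketch rests on an assumption: a "row" is taken to be all sub-paths sharing a start location, scanned from longest to shortest, stopping at the first ISP because every shorter sub-path in that row is contained in it. The function name `sep_row_wise` is hypothetical.

```python
from itertools import accumulate


def sep_row_wise(diffs, threshold):
    """One plausible reading of SEP with row-wise traversal (assumed row order)."""
    last = len(diffs) + 1                   # locations are 1..last
    prefix = [0] + list(accumulate(diffs))  # Step 0: prefix[k - 1] = SUM(1, k)

    def avg(i, j):
        return (prefix[j - 1] - prefix[i - 1]) / (j - i)

    # Step 1: for each start location, test sub-paths from longest to shortest.
    candidates = []
    for i in range(1, last):
        for j in range(last, i, -1):
            if avg(i, j) >= threshold:
                candidates.append((i, j))
                break  # shorter sub-paths starting at i are dominated by (i, j)

    # Step 2: dominated sub-path elimination, as in the naive approach.
    def dominated(p, q):
        return p != q and q[0] <= p[0] and p[1] <= q[1]

    return [p for p in candidates
            if not any(dominated(p, q) for q in candidates)]


print(sep_row_wise([7, -6, 1, -1, 5, 5, 4, -3, 5, 5, -11], 3.5))
```

Under this reading, Step 2 is still needed because the longest ISP of one row can lie inside an ISP found in an earlier row.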

13 SEP with Top-down Traversal
Traversal space → Grid-based DAG (G-DAG)
A breadth-first traversal on the G-DAG
  ৹ A node can be visited only if none of its predecessors is pruned.
  ৹ Determine the number of predecessors.
Figure: G-DAG over locations 1 to 12 on both axes, with each node labelled by its number of predecessors (0, 1, or 2).

14 Experimental Evaluation (1)
Run time: Naive vs. the two SEP designs.
Pattern Length Ratio (PLR): the longest DISP's length divided by the total number of unit sub-paths.
Figure panels: (1) PLR = 0.1 (worst case for SEP); (2) PLR = 1 (best case for SEP top-down).

15 Experimental Evaluation (2)
Performance of the two traversal designs as PLR goes from 0.1 to 1.
Summary:
  (1) SEP is scalable and efficient compared to the Naive approach.
  (2) Top-down outperforms row-wise when the data has longer DISPs.

16 Case Study: Results on Spatial Paths
Input: GIMMS vegetation cover in NDVI, Aug. 1-15, 1981, Africa.
Output: Sub-paths with vegetation cover change in the above data.
Interest measure: "Sameness Degree (SD)": "average value change" against "average value change that is >= Θ_a".

17 Conclusion and Future Work
Conclusion
  ৹ SEP is a novel computational solution to the Interesting Sub-path Discovery problem.
  ৹ It is effective, efficient, and scalable.
  ৹ A cost model is studied to analyze the performance tradeoff.
Future Work
  ৹ Improve the algorithmic design and evaluation metric.
  ৹ Interesting spatio-temporal regions.
  ৹ Applications in other domains (transport science, etc.).

18 Acknowledgements and References
We would like to thank
  ৹ ACMGIS reviewers
  ৹ Sponsors of this work: NSF, USDOD
  ৹ Spatial Database and Data Mining Group @ UMN
  ৹ Kim Koffolt
References
[1] Tucker, C.J., J.E. Pinzon, M.E. Brown. Global inventory modeling and mapping studies. Global Land Cover Facility, University of Maryland, College Park, Maryland, 1981-2006.
[2] Joint Institute for the Study of the Atmosphere and Ocean (JISAO). Sahel rainfall index. http://jisao.washington.edu/data/sahel/
[3] E. Page. Continuous inspection schemes. Biometrika, 41(1/2):100-115, 1954.
[4] J. Canny. A computational approach to edge detection. Readings in computer vision: issues, problems, principles, and paradigms, 184(87-116):86, 1987.
[5] S. Shekhar and S. Chawla. Spatial Databases: A Tour. Prentice Hall, 2003 (ISBN 013-017480-7).
[6] S. Cluet and G. Moerkotte. Efficient evaluation of aggregates on bulk types. In Proc. Int. Workshop on Database Programming Languages, 1995.

19 Sub-path of Abrupt Change
Spatial sub-path of abrupt change
  ৹ Sharp change in vegetation cover
  ৹ Transition between ecological zones (ecotones)
  ৹ Moves in response to climate change
The change is enduringly abrupt.
Figure: Vegetation cover in Africa in NDVI (normalized difference vegetation index).
Figure: A plot of vegetation cover along 18.5E longitude (the red line) from the GIMMS vegetation dataset [1], with W1 = [12N, 17N], W2, and W3 marked.
Temporal sub-path of abrupt change
  ৹ Abrupt shift in precipitation, temperature, etc.
  ৹ Climate change detection.
Figures: smoothed Sahel precipitation anomaly (JJASO); raw Sahel precipitation anomaly (JJASO) [2].

20 Computational Solution: Bottlenecks (2)
Solution 2: Traverse efficiently to reduce candidates.
  ৹ Dominating relationship: a partial order relationship over the traversal space.
  ৹ Searching along this relationship may reduce the number of dominated sub-paths generated.
  ৹ The Sub-path Enumeration and Pruning (SEP) Approach with two different traversal strategies.

21 SEP with Top-down Traversal (2)
Determine the number of predecessors
Use an array to record the number of predecessors visited
  Case            # Predecessors
  Root node       0
  Boundary node   1
  Inner node      2
Figure: traversal grid over unit sub-paths 1-2 through 11-12, annotated with 5, 11 and the counts 2, 1, 0.
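The three cases can be computed directly from a node's end points. The sketch assumes, since the slide does not define it, that the predecessors of sub-path (i, j) are the sub-paths that extend it by one unit sub-path at either end; `num_predecessors` is a hypothetical helper name.

```python
def num_predecessors(i, j, last):
    """Count the direct super-paths of sub-path (i, j) on a path with locations 1..last.

    Assumed G-DAG edges: (i - 1, j) -> (i, j) and (i, j + 1) -> (i, j).
    """
    extend_left = 1 if i > 1 else 0      # (i - 1, j) exists
    extend_right = 1 if j < last else 0  # (i, j + 1) exists
    return extend_left + extend_right


last = 12
print(num_predecessors(1, 12, last))  # root node: the whole path
print(num_predecessors(1, 5, last))   # boundary node: starts at the first location
print(num_predecessors(5, 12, last))  # boundary node: ends at the last location
print(num_predecessors(5, 11, last))  # inner node
```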

22 SEP with Top-down Traversal (3)
Step 0: Build the lookup table by scanning the entire path
Step 1: ISP Identification
Step 2: Not Needed
ptv[][]: predecessors to visit; Q[]: queue for breadth-first traversal
  Q.enqueue(S)
  While Q is not empty
      W = Q.pop()
      Compute F_spi(W) using the lookup tables
      If T(F_spi) == TRUE Then
          Output W
          Next Loop
      End If
      For each successor (i, j) of W
          update ptv[i][j]
          If ptv[i][j] == 0 Then Q.enqueue([i, j])
      End For
  End While
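A runnable Python sketch of this pseudocode, under stated assumptions: the interest measure is the average difference value, the test is "average >= threshold", and the successors of (i, j) are (i + 1, j) and (i, j - 1). The name `sep_top_down` is hypothetical; `ptv` and the queue mirror the slide.

```python
from collections import deque
from itertools import accumulate


def sep_top_down(diffs, threshold):
    """Breadth-first traversal of the G-DAG that outputs only dominant ISPs."""
    last = len(diffs) + 1                   # locations are 1..last
    prefix = [0] + list(accumulate(diffs))  # Step 0: prefix[k - 1] = SUM(1, k)

    # ptv[i][j]: predecessors of (i, j) still to visit (root 0, boundary 1, inner 2).
    ptv = [[(i > 1) + (j < last) for j in range(last + 1)]
           for i in range(last + 1)]

    queue = deque([(1, last)])              # start from the whole path S
    result = []
    while queue:
        i, j = queue.popleft()
        avg = (prefix[j - 1] - prefix[i - 1]) / (j - i)
        if avg >= threshold:
            result.append((i, j))           # output W and do not expand it
            continue
        for si, sj in ((i + 1, j), (i, j - 1)):
            if si < sj:                     # skip empty sub-paths
                ptv[si][sj] -= 1
                if ptv[si][sj] == 0:        # all predecessors visited, none an ISP
                    queue.append((si, sj))
    return result


print(sep_top_down([7, -6, 1, -1, 5, 5, 4, -3, 5, 5, -11], 3.5))
```

Because an ISP never decrements the counters of its successors, no sub-path contained in an ISP reaches a count of zero. Every reported sub-path is therefore dominant, which is why Step 2 is not needed.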

23 Theoretical Analysis
n: number of unit sub-paths
Approaches: Naive, SEP Row-wise, SEP Top-down
Best case time complexity: O(n^3), O(n^2), O(n)
Worst case time complexity: O(n^4), O(n^2)
Space complexity: O(n), O(n^2)

24 Case Study: Results on Temporal Dimension
Temporal sub-paths of abrupt precipitation change in the Sahel region, Africa.

25 Experimental Evaluation
Evaluation Goals
  ৹ Scalability of the SEP approach vs. the Naive approach.
  ৹ Dominance zone in performance of the two design decisions.
Experiment Setup
  ৹ Use the implementation for the "abrupt change sub-path detection".
  ৹ Interest measure: AVG of slope.
Parameters
  ৹ Path length: total # of sub-paths in the traversal space.
  ৹ Pattern Length Ratio (PLR): the longest DISP's length divided by the total number of unit sub-paths.

