A Flexible Spatio-temporal indexing Scheme for Large Scale GPS Tracks Retrieval Yu Zheng, Longhao Wang, Xing Xie Microsoft Research.


1 A Flexible Spatio-temporal indexing Scheme for Large Scale GPS Tracks Retrieval Yu Zheng, Longhao Wang, Xing Xie yuzheng@microsoft.com Microsoft Research Asia

2 Outline Introduction Modeling user behavior Index design Experimental results Conclusion

3 Outline Introduction Modeling user behavior Index design Experimental results Conclusion

4 Introduction Background – GPS-enabled devices have become prevalent – Large amounts of GPS logs have been accumulated – Quite a few GPS-data-sharing applications have appeared. A spatio-temporal index is necessary – For the system: to manage the potentially large-scale data – For users: to explore the GPS data that interests them

5 Introduction Problem definition – Retrieve the GPS trajectories that cross a given region and intersect a given time span. Present techniques are not optimized for these applications. [Figure labels: Spatial query, Temporal query]

6 Introduction Our contributions – A stochastic process model simulating user behavior when uploading GPS tracks: users prefer to upload data they created recently; the insert frequency of different parts of the index is skewed – A novel indexing scheme optimized for this upload behavior: smaller index size, minimal update effort, satisfactory retrieval performance

7 Outline Introduction Modeling user behavior Index design Experimental results Conclusion

8 Modeling User Behavior [Figure labels: a GPS track; duration of a GPS track; interval between when a trajectory is created and when it is uploaded]

9 Modeling User Behavior A GPS log file records a track from start time Ts to end time Te and is uploaded to the server at time Tup. Users' arrivals can be modeled as a Poisson process. The track duration Tdur = Te - Ts follows a Gaussian distribution. The interval between the end time of the trajectory and the uploading time, Tint = Tup - Te, can be modeled as a Rayleigh distribution (summarized from photos uploaded by multiple users over a period of 3 months on Flickr).
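
The model on this slide can be turned into a small simulator. The following is a minimal sketch, not the authors' code: the function name `simulate_uploads`, the choice of hours as the time unit, the clamping of negative durations to zero, and the reading of the Rayleigh value as a scale parameter are all assumptions made for illustration.

```python
import math
import random

def simulate_uploads(rate, total_time, dur_mean, dur_std, rayleigh_scale, seed=0):
    """Return a list of (ts, te, tup) triples drawn from the slide's model.

    rate           -- mean number of uploads per time unit (Poisson process)
    total_time     -- length of the simulated period
    dur_mean/std   -- Gaussian parameters of Tdur = Te - Ts
    rayleigh_scale -- scale of the Rayleigh-distributed Tint = Tup - Te (assumed)
    """
    rng = random.Random(seed)
    tracks = []
    t = 0.0
    while True:
        # Poisson arrivals: gaps between uploads are exponentially distributed.
        t += rng.expovariate(rate)
        if t > total_time:
            break
        tup = t
        # Rayleigh sample by inverse-transform: scale * sqrt(-2 ln(1 - U)).
        t_int = rayleigh_scale * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        # Gaussian duration, clamped at zero (an assumption, not from the slides).
        t_dur = max(0.0, rng.gauss(dur_mean, dur_std))
        te = tup - t_int
        ts = te - t_dur
        tracks.append((ts, te, tup))
    return tracks

# Example call with illustrative values only.
sample = simulate_uploads(rate=5.0, total_time=100.0, dur_mean=0.42,
                          dur_std=math.sqrt(0.98), rayleigh_scale=1.07)
```

Because Tint is concentrated near small values, most generated points (Ts, Te) fall close to the current time, which is the skew the index design relies on.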

10 Modeling User Behavior A point (Ts, Te) represents a GPS track

11 Outline Introduction Modeling user behavior Index design Experimental results Conclusion

12 Index Design Architecture – Partition space into disjoint grids – Maintain a temporal index for each grid – The temporal index (CSE-Tree) is special
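
As a rough illustration of this architecture, the sketch below partitions space into square cells and keeps a list of time spans per cell. It is a simplified stand-in under stated assumptions: the class `GridIndex`, the cell size, and the plain Python list used in place of the per-cell CSE-Tree are hypothetical, and a track that revisits a cell is collapsed into a single span.

```python
import math
from collections import defaultdict

class GridIndex:
    """Disjoint grid cells, each holding (ts, te, track_id) spans."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size          # hypothetical cell edge length
        self.cells = defaultdict(list)      # (row, col) -> list of spans

    def _cell(self, lat, lon):
        return (math.floor(lat / self.cell_size),
                math.floor(lon / self.cell_size))

    def insert_track(self, track_id, points):
        """points: iterable of (lat, lon, t). One span is stored per touched cell."""
        spans = {}
        for lat, lon, t in points:
            c = self._cell(lat, lon)
            ts, te = spans.get(c, (t, t))
            spans[c] = (min(ts, t), max(te, t))
        for c, (ts, te) in spans.items():
            self.cells[c].append((ts, te, track_id))

    def query(self, lat_min, lat_max, lon_min, lon_max, t_min, t_max):
        """Track ids with a span in a cell overlapping the rectangle
        and a time span intersecting [t_min, t_max] (coarse, cell-level filter)."""
        r0, c0 = self._cell(lat_min, lon_min)
        r1, c1 = self._cell(lat_max, lon_max)
        hits = set()
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                for ts, te, tid in self.cells.get((r, c), ()):
                    if te >= t_min and ts <= t_max:
                        hits.add(tid)
        return hits
```

The linear scan inside each cell is what a real temporal index would replace.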

13 Temporal Index (CSE-Tree) A GPS segment can be represented by a pair (Ts, Te), i.e., a point on a two-dimensional plane. A temporal query is a time span (Time min, Time max)

14 Temporal index Structure – Partition the points into groups by Te – Build a start time index (B+ tree) to index the points of each group – Build an end time index (B+ tree) to index the groups [Figure axes: Ts, Te; group boundaries t1, t2, ..., ti, ti+1]
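
The two-level structure can be sketched as follows. This is an illustration only: sorted Python lists maintained with `bisect` stand in for the B+ trees, the class name `CSETreeSketch` is hypothetical, and the group boundaries are passed in up front because the slides do not say how they are chosen.

```python
import bisect

class CSETreeSketch:
    """Points (ts, te) grouped by te; each group is kept sorted by ts."""

    def __init__(self, boundaries):
        # boundaries: sorted end-time split points t1 < t2 < ...
        # Group i holds points with boundaries[i] <= te < boundaries[i + 1];
        # the last group is open-ended. This list plays the end time index.
        self.boundaries = list(boundaries)
        # One sorted list per group plays the start time index.
        self.groups = [[] for _ in self.boundaries]

    def insert(self, ts, te, track_id):
        """Place the point in the group chosen by te, ordered by ts."""
        i = bisect.bisect_right(self.boundaries, te) - 1
        if i < 0:
            raise ValueError("te is earlier than the first boundary")
        bisect.insort(self.groups[i], (ts, te, track_id))
        return i
```

Since uploads mostly concern recent tracks, inserts concentrate in the groups with the largest Te, while older groups are rarely touched.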

15 Temporal Index (CSE-Tree) Three operations – Insert – Compress – Search

16 Temporal Index (CSE-Tree) Compress operation – Occurs when the update frequency drops to some extent – Converts the B+ tree to a dynamic array [Figure labels: B+ tree, dynamic array]
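
One way to picture the compress operation is shown below. It is a sketch under assumptions: the class `StartTimeGroup`, the per-period insert counter, and the `min_inserts` threshold are hypothetical, since the slides only say that compression occurs when the update frequency drops to some extent. A sorted list of tuples stands in for the B+ tree, and two packed `array('d')` columns stand in for the dynamic array.

```python
import bisect
from array import array

class StartTimeGroup:
    """One group of (ts, te) points that can be frozen into packed arrays."""

    def __init__(self):
        self.tree = []                 # sorted (ts, te) pairs, B+ tree stand-in
        self.ts = None                 # packed start times after compression
        self.te = None                 # packed end times after compression
        self.inserts_this_period = 0

    def insert(self, ts, te):
        self.inserts_this_period += 1
        if self.ts is None:
            bisect.insort(self.tree, (ts, te))
        else:
            # Rare late insert: shift the array contents to make room.
            i = bisect.bisect_right(self.ts, ts)
            self.ts.insert(i, ts)
            self.te.insert(i, te)

    def end_period(self, min_inserts):
        """Call once per period; compress when fewer than min_inserts arrived."""
        quiet = self.inserts_this_period < min_inserts
        self.inserts_this_period = 0
        if quiet and self.ts is None:
            self.ts = array("d", (s for s, _ in self.tree))
            self.te = array("d", (e for _, e in self.tree))
            self.tree = None
        return self.ts is not None
```

The packed form drops per-node overhead at the cost of slower inserts, which is acceptable only because inserts into old groups are rare.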

17 Temporal Index (CSE-Tree) Search operation – Te > Time min: search the end time index to get the corresponding start time indexes – Ts < Time max: look up each candidate start time index to find the matching points
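
The two search steps can be sketched as a single function. This is an illustrative sketch, not the authors' implementation: the function name `search` and the data layout (a sorted list of group lower bounds plus one ts-sorted list of (ts, te, id) tuples per group) are assumptions standing in for the two levels of B+ trees.

```python
import bisect

def search(boundaries, groups, time_min, time_max):
    """Return ids of points with te > time_min and ts < time_max.

    boundaries[i] is the lower end-time bound of groups[i]; group i holds
    points with boundaries[i] <= te < boundaries[i + 1].
    """
    # Step 1 (end time index): skip groups whose end times all lie
    # at or before time_min.
    first = max(0, bisect.bisect_right(boundaries, time_min) - 1)
    results = []
    for group in groups[first:]:
        # Step 2 (start time index): entries with ts < time_max form a prefix
        # of the ts-sorted group; (time_max,) sorts before any (time_max, ...).
        stop = bisect.bisect_left(group, (time_max,))
        for ts, te, track_id in group[:stop]:
            # Only the boundary group can still contain te <= time_min.
            if te > time_min:
                results.append(track_id)
    return results

# Example layout with three groups split at end times 0, 10 and 20.
bounds = [0, 10, 20]
grps = [[(0, 5, "a")],
        [(1, 15, "b"), (3, 12, "c"), (14, 18, "e")],
        [(25, 30, "d")]]
found = search(bounds, grps, 13, 20)
```

In the example, `found` contains "b" and "e": the first group is skipped entirely, "c" ends before the query starts, and "d" starts after the query ends.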

18 Outline Introduction Modeling user behavior Index design Experimental results Conclusion

19 Experimental Settings Platform – PC with a 3.00 GHz Intel Pentium 4 CPU, Windows XP SP2, and 0.99 GB RAM. Parameters – B+ tree: inner node size is 64 bytes, leaf size is 1024 bytes – Poisson process: 100, 300, 500 and 700 – Total duration of the process is 2400 hours (100 days) – Rayleigh distribution: Tint is 1.07 – Normal distribution of Tdur: mean 0.42, variance 0.98

20 Experimental Results The compress operation reduces index size – No overlap between nodes – B+ tree → dynamic array [Figure: index size comparison]

21 Experimental Results Insert effort – Fewer node accesses than both SEB-tree and R-tree – Most inserts occur in the area surrounded by the broken line – Few node accesses in the End Time Tree [Figure: mean number of node accesses per insertion]

22 Experimental Results Query performance [Figure: mean number of node accesses per query]

23 Conclusion A model simulating user behavior when uploading data – Based on stochastic process theory – Statistical analysis of data collected in the real world. CSE-Tree – Smaller index size – Fewer node accesses in insertion – Slightly more node accesses than SEB-tree in query

24 Thanks! yuzheng@microsoft.com Q&A

