A Flexible Spatio-temporal Indexing Scheme for Large Scale GPS Tracks Retrieval
Yu Zheng, Longhao Wang, Xing Xie
Microsoft Research Asia
Outline Introduction Modeling user behavior Index design Experimental results Conclusion
Introduction
Background
– GPS-enabled devices have become prevalent
– Large amounts of GPS logs have accumulated
– Quite a few GPS-data-sharing applications have appeared
A spatio-temporal index is necessary
– For the system: to manage the potentially large-scale data
– For users: to explore the GPS data that interests them
Introduction
Problem definition
– Retrieve the GPS trajectories that cross a given region and intersect a given time span
Present techniques are not optimized for these applications
[Figure: spatial query; temporal query]
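The query in the problem definition can be sketched as a brute-force scan, which is the baseline any index has to beat. This is only an illustration: the `Track` type, the bounding-box region, and the reading of "intersecting a time span" as an overlap between the track's own (Ts, Te) and the query span are assumptions, not details given in the slides.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Track:
    """Hypothetical track record: time-ordered (lat, lon, timestamp) points."""
    track_id: int
    points: List[Tuple[float, float, float]]

def matches(track, region, t_min, t_max):
    """True if the track has a point inside `region` and its time span
    overlaps the open query span (t_min, t_max)."""
    min_lat, min_lon, max_lat, max_lon = region
    in_region = any(
        min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
        for lat, lon, _ in track.points
    )
    ts = track.points[0][2]   # start time of the track
    te = track.points[-1][2]  # end time of the track
    in_span = te > t_min and ts < t_max
    return in_region and in_span

def brute_force_query(tracks, region, t_min, t_max):
    """Scan every track; cost grows linearly with the size of the collection."""
    return [t.track_id for t in tracks if matches(t, region, t_min, t_max)]
```

Every query touches every track here, which is what motivates a dedicated spatio-temporal index.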
Introduction
Our contributions
– A stochastic process model that simulates how users upload GPS tracks
   Users prefer to upload data they created recently
   Insert frequency is skewed across different parts of the index
– A novel indexing scheme optimized for this upload behavior
   Smaller index size
   Minimal update effort
   Satisfactory retrieval performance
Modeling User Behavior
A GPS track
Duration of a GPS track
Interval between when a trajectory is created and when it is uploaded
Modeling User Behavior
A GPS log file records a track from start time Ts to end time Te; its duration is Tdur = Te - Ts
The log file is uploaded to the server at time Tup
User arrivals can be modeled as a Poisson process
Tdur follows a Gaussian distribution
The interval between the upload time and the end time of a trajectory, Tint = Tup - Te, can be modeled as a Rayleigh distribution
– Summarized from photos uploaded by multiple users over a period of 3 months on Flickr
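The three distributions above can be combined into a small generator of synthetic uploads. The sketch below uses only Python's standard library. All parameter names and values are hypothetical placeholders, and resampling non-positive durations is an assumption added here so that Ts < Te always holds.

```python
import math
import random

def simulate_uploads(rate, total_time, dur_mean, dur_std, rayleigh_sigma, seed=0):
    """Return a list of (Ts, Te, Tup) tuples in upload order.

    rate            -- Poisson arrival rate (uploads per time unit), hypothetical
    dur_mean/std    -- Gaussian parameters for Tdur, hypothetical
    rayleigh_sigma  -- Rayleigh scale for Tint, hypothetical
    """
    rng = random.Random(seed)
    tracks = []
    t_up = 0.0
    while True:
        # Exponential gaps between arrivals give a Poisson arrival process.
        t_up += rng.expovariate(rate)
        if t_up > total_time:
            break
        # Gaussian duration; resample until positive (assumption of this sketch).
        t_dur = rng.gauss(dur_mean, dur_std)
        while t_dur <= 0:
            t_dur = rng.gauss(dur_mean, dur_std)
        # Rayleigh sample by inverse transform: sigma * sqrt(-2 ln(1 - U)).
        t_int = rayleigh_sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        t_e = t_up - t_int   # Tint = Tup - Te
        t_s = t_e - t_dur    # Tdur = Te - Ts
        tracks.append((t_s, t_e, t_up))
    return tracks
```

Because Tint is Rayleigh-distributed, most generated tracks end shortly before their upload time, which is the source of the skewed insert pattern the index is meant to exploit.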
Modeling User Behavior
A pair (Ts, Te) represents a GPS track
Index Design
Architecture
– Partition space into disjoint grid cells
– Maintain a temporal index for each grid cell
– The temporal index (CSE-Tree) is tailored to users' upload behavior
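One way to picture the architecture is a mapping from grid cells to per-cell lists of (Ts, Te) segments. The sketch below is not the authors' implementation: the uniform cell size, the rule for cutting a track whenever it changes cell, and the plain Python list standing in for each cell's temporal index are all assumptions.

```python
import math
from collections import defaultdict

def cell_of(lat, lon, cell_deg):
    """Map a coordinate to the integer (row, col) of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def split_track_by_cell(points, cell_deg):
    """Cut time-ordered (lat, lon, t) points into (cell, Ts, Te) segments,
    starting a new segment each time the track enters a different cell."""
    segments = []
    current, ts, te = None, None, None
    for lat, lon, t in points:
        c = cell_of(lat, lon, cell_deg)
        if c != current:
            if current is not None:
                segments.append((current, ts, te))
            current, ts = c, t
        te = t
    if current is not None:
        segments.append((current, ts, te))
    return segments

def build_grid(tracks, cell_deg):
    """tracks: dict of track_id -> points. Returns cell -> [(Ts, Te, track_id)].
    Each list is a stand-in for that cell's temporal index."""
    grid = defaultdict(list)
    for track_id, points in tracks.items():
        for cell, ts, te in split_track_by_cell(points, cell_deg):
            grid[cell].append((ts, te, track_id))
    return grid
```

A spatial query then only needs to visit the cells that overlap the query region, and the temporal predicate is evaluated inside each of those cells.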
Temporal Index (CSE-Tree)
A GPS segment can be represented by a pair (Ts, Te)
– That is, a point on a two-dimensional plane
A temporal query is a time span (Time min, Time max)
Temporal Index
Structure
– Partition the points into groups by Te
– Build a start time index (B+ tree) to index the points of each group
– Build an end time index (B+ tree) to index the groups
[Figure: points on the (Ts, Te) plane, partitioned into groups at t1, t2, ..., ti, ti+1]
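The two-level structure can be sketched with sorted Python lists in place of the B+ trees. This is an illustration under stated assumptions: the class name `TemporalIndexSketch` is hypothetical, the group boundaries are fixed up front (the slides do not say how they are chosen), and `bisect` on a list only mimics the ordered lookup of a B+ tree, not its node layout.

```python
import bisect

class TemporalIndexSketch:
    def __init__(self, boundaries):
        # Sorted split points t1 < t2 < ... on the Te axis.
        # Stand-in for the end time index.
        self.boundaries = sorted(boundaries)
        # One list per group, kept sorted by Ts.
        # Each list is a stand-in for a start time index.
        self.groups = [[] for _ in range(len(self.boundaries) + 1)]

    def group_of(self, te):
        """Group 0 holds Te < t1; group i holds t_i <= Te < t_(i+1)."""
        return bisect.bisect_right(self.boundaries, te)

    def insert(self, ts, te, track_id):
        """Locate the group by Te, then insert in Ts order within the group."""
        bisect.insort(self.groups[self.group_of(te)], (ts, te, track_id))
```

For example, with boundaries `[10, 20]`, a segment `(3, 12)` lands in group 1 and a segment `(15, 25)` in group 2.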
Temporal Index (CSE-Tree) Three operations – Insert – Compress – Search
Temporal Index (CSE-Tree)
Compress operation
– Triggered once the update frequency drops low enough
– Converts the B+ tree to a dynamic array
[Figure: B+ tree converted to a dynamic array]
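A rough sketch of the compress step, assuming a simple count-based trigger: once a group receives few inserts, its entries are packed into contiguous typed arrays, which carry no per-node overhead. The threshold rule, the function names, and the use of Python's `array` module as the "dynamic array" are assumptions of this example, not details from the slides.

```python
import bisect
from array import array

def should_compress(inserts_in_window, threshold):
    """Hypothetical trigger: compress when recent inserts fall below a threshold."""
    return inserts_in_window < threshold

def compress(sorted_entries):
    """Pack (Ts, track_id) pairs, already sorted by Ts, into two parallel arrays."""
    ts_arr = array("d", (ts for ts, _ in sorted_entries))    # start times as doubles
    id_arr = array("q", (tid for _, tid in sorted_entries))  # integer track ids
    return ts_arr, id_arr

def insert_compressed(ts_arr, id_arr, ts, track_id):
    """Late inserts are still possible but shift the array tail, costing O(n).
    That is acceptable only because such inserts are expected to be rare."""
    i = bisect.bisect_right(ts_arr, ts)
    ts_arr.insert(i, ts)
    id_arr.insert(i, track_id)
```

The trade-off is a more compact structure in exchange for slower inserts, which suits groups that rarely receive new points.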
Temporal Index (CSE-Tree)
Search operation
– Te > Time min: search the end time index to get the candidate start time indexes
– Ts < Time max: look up each candidate start time index to find the matching points
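The two search steps can be sketched over sorted lists standing in for the B+ trees. The layout is an assumption of this example: `boundaries` holds the sorted Te split points, and `groups[i]` holds the `(Ts, Te, track_id)` tuples of group i sorted by Ts, where group 0 covers Te below the first boundary and group i covers Te from boundary i up to boundary i+1.

```python
import bisect

def search(boundaries, groups, time_min, time_max):
    """Return ids of segments with Te > time_min and Ts < time_max."""
    results = []
    # Step 1 (Te > Time min): skip every group whose upper Te bound is
    # <= time_min, since none of its points can satisfy the condition.
    first = bisect.bisect_right(boundaries, time_min)
    for group in groups[first:]:
        # Step 2 (Ts < Time max): the group is sorted by Ts, so the
        # qualifying points form a prefix ending before the first Ts >= time_max.
        end = bisect.bisect_left(group, (time_max,))
        for ts, te, track_id in group[:end]:
            # The first candidate group may straddle time_min, so Te is rechecked.
            if te > time_min:
                results.append(track_id)
    return results
```

A short usage example on hand-made data:

```python
boundaries = [10, 20]
groups = [[(1, 5, "a")], [(2, 12, "c"), (3, 12, "b")], [(15, 25, "d")]]
hits = search(boundaries, groups, 11, 16)  # segments "c", "b" and "d" overlap (11, 16)
```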
Experimental Settings
Platform
– PC with a 3.00 GHz Intel Pentium 4 CPU, Windows XP SP2, and 0.99 GB RAM
Parameters
– B+ tree: inner node size 64 bytes, leaf size 1024 bytes
– Poisson process: 100, 300, 500 and 700
– Total duration of the process: 2400 hours (100 days)
– Rayleigh distribution: T int is
– Normal distribution of Tdur: mean 0.42, variance 0.98
Experimental Results
The compress operation reduces index size
– No overlap between nodes
– B+ tree converted to dynamic array
[Figure: index size comparison]
Experimental Results
Insertion cost
– Fewer node accesses than both SEB-tree and R-tree
– Most inserts occur in the area enclosed by the dashed line
– Few node accesses in the end time tree
[Figure: mean number of node accesses per insertion]
Experimental Results
Query performance
[Figure: mean number of node accesses per query]
Conclusion
A model simulating how users upload data
– Based on stochastic process theory
– Supported by statistical analysis of data collected in the real world
CSE-Tree
– Smaller index size
– Fewer node accesses per insertion
– Slightly more node accesses than SEB-tree per query
Thanks! Q&A