Published by Wendy Tucker. Modified over 8 years ago.
1
A Flexible Spatio-temporal indexing Scheme for Large Scale GPS Tracks Retrieval
Yu Zheng, Longhao Wang, Xing Xie
yuzheng@microsoft.com
Microsoft Research Asia
2
Outline
– Introduction
– Modeling user behavior
– Index design
– Experimental results
– Conclusion
4
Introduction
Background
– GPS-enabled devices have become prevalent
– A large amount of GPS logs has accumulated
– Quite a few GPS-data-sharing applications have appeared
A spatio-temporal index is necessary
– For the system: to manage the potentially large-scale data
– For users: to explore the GPS data that interests them
5
Introduction
Problem definition
– Retrieve the GPS trajectories that cross a given region and intersect a given time span
Existing techniques are not optimized for these applications
[Figure: spatial query; temporal query]
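The query in the problem definition can be written down as a brute-force check, which is the baseline an index is meant to avoid. This is only a sketch: the `Point` type, the rectangular `region` tuple, and the reading that "crosses" means "has at least one sampled point inside the region" are assumptions, not details from the slides.

```python
from dataclasses import dataclass


@dataclass
class Point:
    lat: float
    lon: float
    t: float  # timestamp, e.g. in hours (assumed unit)


def track_matches(track, region, t_min, t_max):
    """Return True if the track crosses the region and overlaps the time span.

    track  : list of Point, sorted by time
    region : (lat_lo, lon_lo, lat_hi, lon_hi), an assumed rectangle format
    """
    lat_lo, lon_lo, lat_hi, lon_hi = region
    ts, te = track[0].t, track[-1].t
    # Temporal test: the track's span (ts, te) must overlap (t_min, t_max).
    if not (ts < t_max and te > t_min):
        return False
    # Spatial test: at least one sampled point lies inside the rectangle.
    return any(lat_lo <= p.lat <= lat_hi and lon_lo <= p.lon <= lon_hi
               for p in track)
```

Scanning every track this way costs time proportional to the total number of points, which is why a spatio-temporal index is needed at scale. A track that passes through the region between two samples would be missed by this point-based test.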
6
Introduction
Our contributions
– A stochastic process model that simulates how users upload GPS tracks
  – Users prefer to upload data they created recently
  – The insert frequencies of different parts of the index are skewed
– A novel indexing scheme optimized for this upload behavior
  – Smaller index size
  – Minimal update effort
  – Satisfactory retrieval performance
8
Modeling User Behavior
– A GPS track
– Duration of a GPS track
– Interval between the creation and the upload of a trajectory
9
Modeling User Behavior
– A GPS log file covers a track from start time Ts to end time Te; its duration is Tdur = Te - Ts
– The user uploads the log file to the server at time Tup
– Users' arrivals can be modeled as a Poisson process
– Tdur follows a Gaussian distribution
– The interval between the upload time and the end time of the trajectory, Tint = Tup - Te, can be modeled as a Rayleigh distribution
  – Summarized from photos uploaded by multiple users over a period of 3 months on Flickr
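The three distributions above can be combined into a small upload simulator. The sketch below is illustrative rather than the authors' generator: the function name, the parameter names, the resampling of non-positive durations, and the use of the Rayleigh scale parameter `int_sigma` are all assumptions.

```python
import math
import random


def simulate_uploads(rate_per_hour, total_hours, dur_mean, dur_std,
                     int_sigma, seed=0):
    """Return a list of (ts, te, t_up) tuples, one per uploaded track."""
    rng = random.Random(seed)
    tracks = []
    t_up = 0.0
    while True:
        # Poisson arrivals: exponential gaps between consecutive uploads.
        t_up += rng.expovariate(rate_per_hour)
        if t_up > total_hours:
            break
        # Rayleigh-distributed Tint via inverse-transform sampling.
        t_int = int_sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        # Gaussian Tdur; non-positive draws are resampled (an assumption).
        t_dur = rng.gauss(dur_mean, dur_std)
        while t_dur <= 0:
            t_dur = rng.gauss(dur_mean, dur_std)
        te = t_up - t_int   # Tint = Tup - Te
        ts = te - t_dur     # Tdur = Te - Ts
        tracks.append((ts, te, t_up))
    return tracks
```

Because Tint is concentrated near small values, most simulated tracks end shortly before they are uploaded, which is the skew in insert positions that the index design exploits.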
10
Modeling User Behavior
A pair (Ts, Te) represents a GPS track
12
Index Design
Architecture
– Partition space into disjoint grid cells
– Maintain a temporal index for each grid cell
– The temporal index (CSE-Tree) is tailored to users' upload behavior
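The architecture can be sketched as a dictionary from grid cell to a per-cell temporal index. In this sketch a plain list of (Ts, Te, track id) segments stands in for the CSE-Tree, and the class name, the square cells of `cell_deg` degrees, and the splitting of a track into one segment per run of consecutive points in the same cell are assumptions.

```python
from collections import defaultdict
from itertools import groupby


class GridIndex:
    def __init__(self, cell_deg):
        self.cell_deg = cell_deg
        # (row, col) -> list of (ts, te, track_id); stand-in temporal index
        self.cells = defaultdict(list)

    def _cell(self, lat, lon):
        return (int(lat // self.cell_deg), int(lon // self.cell_deg))

    def insert(self, track_id, points):
        """points: list of (lat, lon, t) tuples sorted by time."""
        # Each run of consecutive points inside one cell becomes a segment.
        for cell, run in groupby(points, key=lambda p: self._cell(p[0], p[1])):
            run = list(run)
            self.cells[cell].append((run[0][2], run[-1][2], track_id))

    def query(self, region, t_min, t_max):
        """region: (lat_lo, lon_lo, lat_hi, lon_hi). Returns a set of ids."""
        lat_lo, lon_lo, lat_hi, lon_hi = region
        r0, c0 = self._cell(lat_lo, lon_lo)
        r1, c1 = self._cell(lat_hi, lon_hi)
        hits = set()
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                for ts, te, tid in self.cells.get((r, c), ()):
                    if ts < t_max and te > t_min:
                        hits.add(tid)
        return hits
```

The result has cell granularity: a track is reported if it has a segment in any cell that overlaps the query rectangle, so a refinement step against the raw points would be needed for exact answers.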
13
Temporal Index (CSE-Tree)
– A GPS segment can be represented by a pair (Ts, Te)
– Each pair is a point on a two-dimensional plane
– A temporal query is a time span (Time_min, Time_max)
14
Temporal Index
Structure
– Partition the points into groups by Te
– Build a start time index (B+ tree) over the points of each group
– Build an end time index (B+ tree) over the groups
[Figure: points on the (Ts, Te) plane, with boundaries t1, t2, ..., ti, ti+1]
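A minimal sketch of this two-level structure follows. Sorted Python lists searched with `bisect` stand in for both B+ trees, and the group boundaries are assumed to be fixed and supplied up front; how the boundaries are actually chosen is not stated on the slide.

```python
import bisect


class TemporalIndexSketch:
    def __init__(self, boundaries):
        # boundaries t1 < t2 < ... ; group i holds points with
        # boundaries[i] <= Te < boundaries[i + 1]; the last group is open-ended.
        self.boundaries = list(boundaries)          # stand-in end time index
        self.groups = [[] for _ in self.boundaries]  # each kept sorted by Ts

    def insert(self, ts, te):
        # Level 1: locate the group by end time.
        i = bisect.bisect_right(self.boundaries, te) - 1
        if i < 0:
            raise ValueError("te is earlier than the first boundary")
        # Level 2: insert into that group's start time index.
        bisect.insort(self.groups[i], (ts, te))
```

For example, with boundaries `[0, 10, 20]` the points (3, 11) and (8, 12) both land in the second group, ordered by their start times.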
15
Temporal Index (CSE-Tree)
Three operations
– Insert
– Compress
– Search
16
Temporal Index (CSE-Tree)
Compress operation
– Occurs when the update frequency drops below some level
– Converts the B+ tree into a dynamic array
[Figure: B+ tree converted to dynamic array]
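The compress idea can be illustrated with a group that switches from a mutable structure to packed arrays once inserts become rare. Everything specific here is an assumption: the class and method names, the per-period insert counter with a `min_inserts` threshold as the trigger, the sorted list standing in for the B+ tree, and the choice to still accept late inserts after compression.

```python
import bisect
from array import array


class StartTimeGroup:
    """One group of (Ts, Te) points; all names here are hypothetical."""

    def __init__(self):
        self.tree = []       # sorted list of (ts, te), stand-in for the B+ tree
        self.ts_arr = None   # packed arrays, filled in by compress()
        self.te_arr = None
        self.recent_inserts = 0

    def insert(self, ts, te):
        if self.tree is not None:
            bisect.insort(self.tree, (ts, te))
        else:
            # Assumed behavior: a compressed group accepts rare late inserts.
            i = bisect.bisect_right(self.ts_arr, ts)
            self.ts_arr.insert(i, ts)
            self.te_arr.insert(i, te)
        self.recent_inserts += 1

    def compress(self):
        # Replace the tree with two contiguous arrays, still ordered by Ts.
        self.ts_arr = array("d", (ts for ts, _ in self.tree))
        self.te_arr = array("d", (te for _, te in self.tree))
        self.tree = None

    def end_of_period(self, min_inserts):
        # Assumed trigger: compress after a period with too few inserts.
        if self.tree is not None and self.recent_inserts < min_inserts:
            self.compress()
        self.recent_inserts = 0
```

A packed array drops the per-node overhead of a tree, at the cost of slower inserts, which is an acceptable trade once a group is rarely updated.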
17
Temporal Index (CSE-Tree)
Search operation
– Te > Time_min: search the end time index to get the corresponding start time indexes
– Ts < Time_max: look up each candidate start time index to find the matching points
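The two search steps can be sketched as a standalone function. The data layout is an assumption: `boundaries` is a sorted list of group lower bounds on Te (the last group being open-ended), and `groups[i]` is a list of (Ts, Te) tuples sorted by Ts, with sorted lists standing in for the B+ trees.

```python
import bisect


def search(boundaries, groups, t_min, t_max):
    """Return all (ts, te) with ts < t_max and te > t_min."""
    # Step 1 (end time index): skip groups whose Te range lies at or
    # below t_min; the first candidate is the group containing t_min.
    first = max(bisect.bisect_right(boundaries, t_min) - 1, 0)
    hits = []
    for group in groups[first:]:
        # Step 2 (start time index): entries with ts < t_max form a prefix
        # of the sorted group; (t_max,) sorts before any (t_max, te).
        cut = bisect.bisect_left(group, (t_max,))
        # The te check only matters for the group that straddles t_min.
        hits.extend((ts, te) for ts, te in group[:cut] if te > t_min)
    return hits
```

With boundaries `[0, 10, 20]` and groups `[[(1, 5)], [(3, 11), (8, 12)], [(15, 25)]]`, the query (11.5, 14) skips the first group entirely and returns `[(8, 12)]`.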
19
Experimental Settings
Platform
– PC with a 3.00 GHz Intel Pentium 4 CPU, Windows XP SP2, and 0.99 GB RAM
Parameters
– B+ tree: inner node size is 64 bytes; leaf size is 1024 bytes
– Poisson process: 100, 300, 500 and 700
– Total duration of the process: 2400 hours (100 days)
– Rayleigh distribution: Tint is 1.07
– Normal distribution of Tdur: mean 0.42, variance 0.98
20
Experimental Results
The compress operation reduces index size
– No overlap between nodes
– B+ tree converted to dynamic array
[Figure: index size comparison]
21
Experimental Results
Insert effort
– Fewer node accesses than both SEB-tree and R-tree
– Most inserts occur in the area surrounded by the broken line
– Few node accesses in the End Time Tree
[Figure: mean number of node accesses per insertion]
22
Experimental Results
Query performance
[Figure: mean number of node accesses per query]
23
Conclusion
A model simulating users' data upload behavior
– Based on stochastic process theory
– Statistical analysis of data collected in the real world
CSE-Tree
– Smaller index size
– Fewer node accesses in insertion
– Slightly more node accesses than SEB-tree in query
24
Thanks!
yuzheng@microsoft.com
Q&A