Searching Trajectories by Locations – An Efficiency Study Zaiben Chen 1, Heng Tao Shen 1, Xiaofang Zhou 1, Yu Zheng 2, Xing Xie 2 1 The University of Queensland 2 Microsoft Research, Asia

Outline
Research problem & application scenarios
Basic ideas
  k-Best Connected Trajectory (k-BCT) query
  The Incremental k-NN Algorithm (IKNN)
Performance study
  Best-first
  Depth-first
Optimization & extension
Experiments
Conclusion

Research Problem: Searching Trajectory Databases GPS trajectories collected by GeoLife Project, MSRA How to retrieve the trajectories we want?

Searching Trajectory Databases
Search by a location: Frentzos et al., GeoInformatica 07; Pfoser et al., VLDB00. (R-tree variants)
Search by a sample trajectory: Chen et al., SIGMOD05; Vlachos et al., ICDE02; Yi et al., ICDE98, etc. (Similarity)

Searching Trajectory Databases The problem we study: searching by multiple locations, i.e. finding trajectories that are close to all the given locations. Technically, it is an extension of the single-location based query, but more complicated. Practically, it provides a more general way to search trajectories. Two extreme cases (one location, many locations)

Application motivations The Microsoft GeoLife Project http://research.microsoft.com/en-us/projects/geolife/ GeoLife is a location-based service built on Microsoft Virtual Earth. Our work benefits the following two functions: (1) Travel recommendation. E.g. to help a visitor plan a trip to multiple attractions by considering other users' travel trajectories. (2) Sharing life experiences & friend recommendation. E.g. to find out which users share a similar daily route through Queens Plaza, Central Stat., Mains St.

Application motivations Geo-Coding: From Pictures to Coordinates The recommended route

Application motivations Geo-Coding: From Pictures to Coordinates The recommended route The first step: to define the closeness (i.e. distance) between a trajectory and locations

Similarity Function The similarity function reflects how close a trajectory is to the given locations, and we call the most similar trajectory the best-connected trajectory. Let $Q = \{q_1, q_2, \ldots, q_m\}$ be the query locations and $R = \{p_1, p_2, \ldots, p_n\}$ a trajectory.
Step 1. Find the closest trajectory point on R to each location $q_i$. $Dist_q(q_i, R) = \min_{p_j \in R} |q_i, p_j|$ is the shortest distance from $q_i$ to R.
Step 2. Sum up the contribution of each matched pair (unordered query): $Sim(Q, R) = \sum_{i=1}^{m} e^{-Dist_q(q_i, R)}$
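The two steps can be sketched in a few lines of Python. This is a minimal illustration that assumes 2-D points, Euclidean distance, and an exponential contribution e^(-distance) per matched pair (as used on the later bound slides); the function names `dist_q` and `similarity` are illustrative, not from the slides.

```python
import math

def dist_q(q, trajectory):
    """Step 1: shortest distance from query location q to any point of the trajectory."""
    return min(math.dist(q, p) for p in trajectory)

def similarity(query, trajectory):
    """Step 2: sum the contribution exp(-distance) of each matched pair."""
    return sum(math.exp(-dist_q(q, trajectory)) for q in query)

# Toy data (assumed): two query locations and one three-point trajectory.
Q = [(0.0, 0.0), (4.0, 0.0)]
R = [(0.0, 1.0), (2.0, 1.0), (4.0, 0.0)]
print(similarity(Q, R))  # q1 is matched at distance 1, q2 at distance 0
```

A location lying exactly on the trajectory contributes 1, and the contribution decays towards 0 as the location moves away, so the similarity of a trajectory to m locations always lies in (0, m].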

Problem Definition k-Best Connected Trajectory (k-BCT) query: Given a set of trajectories $T = \{R_1, R_2, \ldots, R_n\}$, a set of query locations $Q = \{q_1, q_2, \ldots, q_m\}$, and the similarity function Sim(Q, R), the k-BCT query is to find the k trajectories among T that have the highest similarity. Assumption: The number of query locations is small (m is a small constant). Intuition: The k-BCT result is the JOIN of m single-location based queries.
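As a reference point for the definition, the query can be answered by brute force: score every trajectory and keep the k best. The sketch below assumes 2-D points, Euclidean distance and the exponential similarity; `k_bct_brute_force` is a hypothetical name, and this exhaustive scan is only a baseline, not the IKNN algorithm.

```python
import heapq
import math

def similarity(query, trajectory):
    """Sim(Q, R): sum over query locations of exp(-shortest distance to R)."""
    return sum(math.exp(-min(math.dist(q, p) for p in trajectory)) for q in query)

def k_bct_brute_force(trajectories, query, k):
    """Return the ids of the k trajectories with the highest similarity, best first.

    trajectories maps a trajectory id to its list of points.
    """
    return heapq.nlargest(
        k, trajectories, key=lambda tid: similarity(query, trajectories[tid])
    )

# Toy data (assumed): three short trajectories and two query locations.
T = {
    "R1": [(0, 0), (1, 0), (2, 0)],
    "R2": [(0, 5), (2, 5)],
    "R3": [(0, 1), (2, 1)],
}
Q = [(0, 0), (2, 0)]
print(k_bct_brute_force(T, Q, 2))
```

Every point of every trajectory is touched for every query location, which is exactly the cost the index-based search tries to avoid.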

Basic ideas Incremental k-NN Algorithm (IKNN)
Step 1. Index all the trajectory points with one single R-tree, which is used to get the shortest distance from a query location to the trajectories.
Step 2. Search for the λ-nearest neighbors (λ-NN) of each query location (q 1 to q m ), using any traditional k-nearest neighbor algorithm over the R-tree. For any trajectory that is scanned by a λ-NN search, its shortest distance to that query point is known. Candidate set C = {all scanned trajectories}

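IKNN algorithm Step 3. Construct lower bounds of similarity. For a trajectory R1 in C, assume three of its points p1, p2 and p3 have been scanned by the λ-NN searches of q1 and q2, while p5, its closest point to q3, has not been scanned yet. Then
$Sim(Q, R_1) = e^{-|q_1, p_1|} + e^{-|q_2, p_2|} + e^{-|q_3, p_5|} \ge e^{-|q_1, p_1|} + e^{-|q_2, p_2|}$
and the right-hand side is the lower bound of Sim(Q, R1).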
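A lower bound of this kind only needs the matched distances that are already known. The sketch below assumes the caller keeps, for one trajectory, a mapping from query index to the closest scanned distance; the name `lower_bound` and the numbers are hypothetical.

```python
import math

def lower_bound(closest_scanned):
    """Lower bound of Sim(Q, R) from the query locations that have reached R so far.

    closest_scanned maps a query index i to the distance from q_i to the closest
    scanned point of R; query locations that have not reached R are absent.
    """
    return sum(math.exp(-d) for d in closest_scanned.values())

# Hypothetical distances: R1 was reached from q1 and q2, but not yet from q3.
lb_r1 = lower_bound({1: 0.5, 2: 0.2})
# Whatever the true distance from q3 turns out to be, it adds a positive term.
sim_r1 = lb_r1 + math.exp(-3.0)
print(lb_r1 <= sim_r1)
```

Because a λ-NN search returns points in increasing distance, the first scanned point of a trajectory for a given query location is already its closest one, so the known terms are exact and only the missing terms are dropped.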

The Incremental k-NN algorithm Step 4. Construct the upper bound of similarity. For any trajectory that is not covered by the λ-NN search, e.g. R5, its distance to each q i must be larger than the radius of the λ-NN search region of q i, so
$Sim(Q, R_5) = e^{-|q_1, R_5|} + e^{-|q_2, R_5|} + e^{-|q_3, R_5|} \le e^{-radius_1} + e^{-radius_2} + e^{-radius_3}$
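The bound depends only on the current search radii, so one value covers every un-scanned trajectory at once. A minimal sketch, assuming each radius is the distance from q_i to the farthest neighbor returned so far; `upper_bound_unscanned` and the numbers are hypothetical.

```python
import math

def upper_bound_unscanned(radii):
    """Upper bound of Sim(Q, R) for any trajectory not reached by any lambda-NN search."""
    return sum(math.exp(-r) for r in radii)

radii = [1.0, 2.0, 3.0]            # assumed search radii of q1, q2, q3
ub = upper_bound_unscanned(radii)

# Hypothetical true distances of an un-scanned R5; each lies beyond its radius.
true_dists = [1.5, 2.1, 4.0]
sim_r5 = sum(math.exp(-d) for d in true_dists)
print(sim_r5 <= ub)
```

As the radii grow from round to round, this bound can only decrease, which is what eventually lets the search stop.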

The Incremental k-NN algorithm Step 5. Check the STOP condition (pruning condition). For a k-BCT query, if we can get k candidate trajectories whose lower bounds are not less than the upper bound of similarity for all un-scanned trajectories, then the k best-connected trajectories must be included in the candidate set.
if the condition is satisfied: go to the refinement step
else: increase λ by some Δ and repeat the search process
As the search region of the λ-NN search enlarges, the k best-connected trajectories will eventually be found.
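Steps 2 to 5 can be put together in one loop. The sketch below is an illustration under stated assumptions, not the authors' implementation: a pre-sorted distance list per query location stands in for the R-tree λ-NN search, λ starts at k and grows by Δ = k, and the refinement step simply computes the exact similarity of every candidate. The names `iknn` and `sim` are hypothetical.

```python
import math

def sim(query, traj):
    """Exact similarity: sum of exp(-shortest distance) over the query locations."""
    return sum(math.exp(-min(math.dist(q, p) for p in traj)) for q in query)

def iknn(trajectories, query, k, delta=None):
    """Return the ids of the k best-connected trajectories (non-empty input assumed)."""
    delta = delta or k
    # Stand-in for the R-tree: per query location, all (distance, id) pairs, nearest first.
    ranked = [
        sorted((math.dist(q, p), tid) for tid, pts in trajectories.items() for p in pts)
        for q in query
    ]
    total = len(ranked[0])
    lam = k
    while True:
        lam = min(lam, total)
        best = {}    # trajectory id -> {query index: closest scanned distance}
        radii = []   # current search radius of each query location
        for i, hits in enumerate(r[:lam] for r in ranked):
            radii.append(hits[-1][0])
            for d, tid in hits:
                best.setdefault(tid, {}).setdefault(i, d)  # first hit is the closest
        lower = sorted(
            (sum(math.exp(-d) for d in m.values()) for m in best.values()),
            reverse=True,
        )
        upper_unscanned = sum(math.exp(-r) for r in radii)
        # STOP: k candidates whose lower bound is at least the un-scanned upper bound.
        if lam == total or (len(lower) >= k and lower[k - 1] >= upper_unscanned):
            break
        lam += delta
    # Refinement: exact similarity for the candidates only.
    return sorted(best, key=lambda t: sim(query, trajectories[t]), reverse=True)[:k]

T = {"R1": [(0, 0), (1, 0), (2, 0)], "R2": [(0, 5), (2, 5)], "R3": [(0, 1), (2, 1)]}
Q = [(0, 0), (2, 0)]
print(iknn(T, Q, 2))
```

On this toy input the loop stops in the second round (λ = 4) without ever touching R2, because two candidates already have lower bounds above the bound for anything un-scanned.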

Problem The problem: we may need to increase λ and compute the lower/upper bounds for many rounds before we eventually find the k-BCT results. The λ-NN search will run for many rounds for every query location. (Let λ be a constant k initially, and Δ be k as well.)
round 1: 1 – k nearest neighbors
round 2: 1 – 2k nearest neighbors
…
round i: 1 – i*k nearest neighbors
Trajectory points are visited multiple times. Normally λ >> k, so the total number of point visits grows quadratically, as O(λ^2).
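The quadratic growth follows from summing the rounds. As a worked derivation, assuming every round restarts the search from scratch, Δ = k, and the final λ is a multiple of k:

```latex
\sum_{i=1}^{\lambda/k} i \cdot k
  \;=\; k \cdot \frac{(\lambda/k)\,(\lambda/k + 1)}{2}
  \;=\; \frac{\lambda^{2}}{2k} + \frac{\lambda}{2}
```

For example, with k = 10 and a final λ = 1000, each query location scans 50,500 points in total, although only 1,000 distinct neighbors are needed.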

Problem (cont.) Can we reduce the overlapped search regions?

Efficiency study of the IKNN: Adaptation of the λ-NN algorithm
The best-first nearest neighbor search [Hjaltason et al., TODS99]: A priority queue is maintained to store all the R-tree entries that have yet to be visited, using the MINDIST as the key, so MBRs/objects are visited in the order of their MINDIST.
The depth-first nearest neighbor search [Roussopoulos et al., SIGMOD95]: It recursively traverses the R-tree level by level in a depth-first manner, while maintaining a global list of the k nearest candidates found so far.
Goal: estimate the performance of the IKNN when adopting different λ-NN algorithms.

Adaptation of the λ-NN algorithm
The best-first NN search: Retrieve the λ, λ+Δ, λ+2Δ, … NN for each query location incrementally until the k best-connected trajectories are included in the candidate set.
Benefit: The λ-NN is returned in an incremental way. I/O optimal, no overlap occurs, $V_{sum} = \lambda$.
Shortcoming: Memory consumption is NOT guaranteed. A priority queue is maintained to store all the R-tree entries that have yet to be visited, and the queue may be as large as the whole dataset in an extreme case.
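The incremental behaviour is easiest to see with a generator that yields neighbors one by one. The sketch below is a toy, not a real R-tree: `build_leaves` (a hypothetical helper) cuts the x-sorted points into fixed-size chunks and treats each chunk's bounding rectangle as a leaf MBR, and `best_first_nn` keeps leaves and points in one priority queue keyed by MINDIST, in the spirit of the best-first search described above.

```python
import heapq
import math
from itertools import count, islice

def mindist(q, mbr):
    """Distance from q to the nearest point of the rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = mbr
    nearest = (min(max(q[0], xmin), xmax), min(max(q[1], ymin), ymax))
    return math.dist(q, nearest)

def build_leaves(points, capacity=4):
    """Toy stand-in for R-tree leaves: x-sorted chunks with their bounding rectangles."""
    pts = sorted(points)
    leaves = []
    for i in range(0, len(pts), capacity):
        chunk = pts[i:i + capacity]
        xs, ys = [p[0] for p in chunk], [p[1] for p in chunk]
        leaves.append(((min(xs), min(ys), max(xs), max(ys)), chunk))
    return leaves

def best_first_nn(q, leaves):
    """Yield (distance, point) in increasing distance; a leaf is opened only when
    its MINDIST is the smallest key left in the queue."""
    tie = count()  # tie-breaker so the heap never compares payloads
    heap = [(mindist(q, mbr), next(tie), False, chunk) for mbr, chunk in leaves]
    heapq.heapify(heap)
    while heap:
        d, _, is_point, item = heapq.heappop(heap)
        if is_point:
            yield d, item
        else:
            for p in item:
                heapq.heappush(heap, (math.dist(q, p), next(tie), True, p))

points = [(x, y) for x in range(5) for y in range(5)]
q = (1.2, 3.4)
search = best_first_nn(q, build_leaves(points))
first_round = list(islice(search, 3))   # the 3 nearest neighbors
second_round = list(islice(search, 2))  # the next 2, with no re-scan
print(first_round + second_round)
```

Asking the same generator for more neighbors continues from the saved queue, which is why no point is visited twice; the price is that the queue itself has no size bound.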

The best-first strategy Performance (R-tree leaf access), for m independent λ-NN queries:
Estimate the circle region (with radius r) that contains λ points [Belussi et al., VLDB95]
Estimate the leaf access of a range query with radius r [Korn et al., TKDE2001]
(Figure: a query location q and the circle of radius r that encloses λ objects.)

Adaptation of the λ-NN algorithm
The depth-first NN search: Every time we search for the (λ+Δ)-NN, we have to re-visit the search region of the λ-NN query.
Benefit: Guaranteed memory usage, $O(c \log_c N)$
Drawback: Too many overlaps
A simple improvement: Double λ at each round, to reduce the number of rounds and amortize the cost.
Pruning: All MBRs whose MAXDIST is smaller than the current search range of the λ-NN can be skipped in the search for the (λ+Δ)-NN.
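The effect of doubling can be checked with a small count. The sketch assumes every round re-scans its whole λ-NN region from scratch and ignores the MAXDIST pruning; `visits_additive` and `visits_doubling` are hypothetical names, and the numbers are illustrative only.

```python
def visits_additive(target, start, delta):
    """Points scanned when lambda grows by a fixed delta until it reaches target."""
    lam, total = start, 0
    while True:
        total += lam
        if lam >= target:
            return total
        lam += delta

def visits_doubling(target, start):
    """Points scanned when lambda is doubled each round until it reaches target."""
    lam, total = start, 0
    while True:
        total += lam
        if lam >= target:
            return total
        lam *= 2

print(visits_additive(1000, 10, 10))  # 100 rounds
print(visits_doubling(1000, 10))      # 8 rounds
```

With doubling, the last round scans fewer than twice the required λ and all earlier rounds together scan less than the last one, so the total stays below 4λ instead of growing quadratically.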

The depth-first strategy Performance (R-tree leaf access): The search region is not necessarily a circle! So we cannot use the previous method directly.
Estimate the size of the first visited MBR (at any level) that contains not less than λ points
Estimate the radius (MAXDIST) of the region that contains the MBR
R-tree nodes outside the circle with radius MAXDIST won't be visited.
(Figure: query location q i, the first visited MBR, and the circle of radius MAXDIST around q i.)
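For reference, the two rectangle distances used in this analysis can be computed as follows. The sketch assumes axis-aligned 2-D MBRs given as (xmin, ymin, xmax, ymax) and takes MAXDIST to be the distance from the query location to the farthest corner of the MBR, which is an assumption about the slide's definition.

```python
import math

def mindist(q, mbr):
    """Smallest distance from q to any point of the rectangle."""
    xmin, ymin, xmax, ymax = mbr
    nearest = (min(max(q[0], xmin), xmax), min(max(q[1], ymin), ymax))
    return math.dist(q, nearest)

def maxdist(q, mbr):
    """Largest distance from q to any point of the rectangle (its farthest corner)."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(abs(q[0] - xmin), abs(q[0] - xmax))
    dy = max(abs(q[1] - ymin), abs(q[1] - ymax))
    return math.hypot(dx, dy)

q = (0.0, 0.0)
mbr = (1.0, 1.0, 3.0, 4.0)
print(mindist(q, mbr), maxdist(q, mbr))
```

Every point stored in the MBR lies within MAXDIST of q, so once an MBR with at least λ points has been visited, the λ nearest neighbors cannot lie outside the circle of that radius.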

The depth-first strategy (cont.) Performance Estimate the leaf access of a range query with radius MAXDIST [Korn et al. TKDE2001] Finally,

Summary
IKNN algorithm             Memory usage    Object visits    Leaf access
The best-first strategy    no guarantee    m × O(λ)
The depth-first strategy   O(c · log N)    m × O(λ)
Although the best-first strategy has no guarantee on memory usage, it normally runs faster, and the priority queue can still easily be accommodated in the main memory of a modern computer. The modified depth-first strategy reaches nearly the same performance as the best-first strategy while still preserving a low memory consumption.

Optimization & Extension Considering the importance of the query locations and assigning different weights in exploring objects. Extension to query locations with an order specified

Experiments: 12,653 trajectories (1,147,116 points) collected by the GeoLife project. Number of query locations: 2 to 10. Tests are conducted on a PC with a 2.1GHz CPU and 1GB memory.

Experiments – Node Access

Experiments – Query Time

Experiments – Memory Usage

Thank you
