Efficient Processing of Top-k Spatial Preference Queries
João B. Rocha-Junior, Akrivi Vlachou, Christos Doulkeridis, and Kjetil Nørvåg
VLDB 2011, Seattle, USA
Outline
- Top-k spatial preference queries
- Current approaches
- Our approach
  - Mapping to distance-score space
  - Query processing
  - Materialization (index construction)
- Experimental evaluation
- Conclusion
Motivation
- Increasing number of Web information systems specialized in location-based queries
- Current systems are limited to simple spatial queries, i.e., queries restricted to spatial constraints
  - Example: return objects in a given spatial location
- Top-k spatial preference query
  - Ranks data objects based on the score of feature objects in their spatial neighborhood
  - Combines spatial and non-spatial scores
  - Takes into account the quality (score) of the features
Top-k spatial preference queries
- Given: a set of data objects and scored feature objects
- Query
  - Spatial neighborhood
  - Features of interest (e.g., bars)
- Returns: ranked set of the k best data objects
- Score of a data object: obtained from the feature objects in its spatial neighborhood
[Figure: x-y plot of hotels p1, p2, p3, bars b1(0.9), b2(0.6), b3(0.3), and cafés c1(0.6), c2(0.4), c3(0.2), c4(0.8), with a Top-1 label marking the best hotel.]
Score function
- Score: aggregation of partial scores
  - Any monotone function: sum, max, and min
- Partial score: score of a data object for a set of feature objects
  - Defined by the score of a single feature object: the one with the highest score that satisfies the spatial constraint
- Spatial constraint
  - Range, nearest neighbor, and influence
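The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the sample coordinates are hypothetical, and the influence decay `s * 2 ** (-d / r)` is an assumption, since the slide names the influence constraint without giving its formula.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def range_score(p, features, r):
    """Highest score among the features within distance r of p (0 if none)."""
    return max((s for loc, s in features if dist(p, loc) <= r), default=0.0)

def nn_score(p, features):
    """Score of the feature nearest to p (0 if the set is empty)."""
    if not features:
        return 0.0
    return min(features, key=lambda f: dist(p, f[0]))[1]

def influence_score(p, features, r):
    """Highest distance-discounted score. ASSUMED decay: s * 2^(-d / r)."""
    return max((s * 2 ** (-dist(p, loc) / r) for loc, s in features), default=0.0)

def score(p, feature_sets, partial, agg=sum):
    """Aggregate one partial score per feature set with a monotone function."""
    return agg(partial(p, fs) for fs in feature_sets)

# Hypothetical data: one hotel, two feature sets of (location, score) pairs.
hotel = (0.0, 0.0)
bars = [((1.0, 0.0), 0.9), ((3.0, 0.0), 0.6)]
cafes = [((0.0, 2.0), 0.6), ((0.0, 5.0), 0.8)]

# Range constraint with r = 2.5: best bar in range (0.9) + best café in range (0.6).
range_total = score(hotel, [bars, cafes], lambda p, fs: range_score(p, fs, 2.5))
```

Passing `agg=max` or `agg=min` instead of `sum` gives the other two aggregation functions named on the slide.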
Example (agg = sum)
[Figure: three panels showing the same data object p under each spatial constraint.]
- Range: score(p) = 1.5
- Nearest neighbor: score(p) = 1.0
- Influence: score(p) = 0.6
Current approaches
- Naïve
  - Compute the score of all objects, select the top-k
  - Very costly
- State-of-the-art [1,2]
  - Data objects and feature objects are indexed by multi-dimensional indices
[1] Yiu, M.L., Dai, X., Mamoulis, N., Vaitis, M.: “Top-k spatial preference queries”, ICDE, 2007.
[2] Yiu, M.L., Lu, H., Mamoulis, N., Vaitis, M.: “Ranking spatial data by quality preferences”, TKDE, 2011.
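For reference, the naïve baseline fits in a few lines. This is a sketch under assumptions: it uses the range constraint with sum aggregation, and the object names and coordinates are hypothetical.

```python
import heapq
import math

def range_score(p, features, r):
    """Highest score among the features within distance r of p (0 if none)."""
    return max((s for (x, y), s in features
                if math.hypot(p[0] - x, p[1] - y) <= r), default=0.0)

def naive_top_k(objects, feature_sets, r, k):
    """Score every data object against every feature set, then keep the k best."""
    scored = ((sum(range_score(loc, fs, r) for fs in feature_sets), name)
              for name, loc in objects.items())
    return [(name, total) for total, name in heapq.nlargest(k, scored)]

# Hypothetical data objects and feature sets.
objects = {"p1": (0.0, 0.0), "p2": (10.0, 10.0), "p3": (5.0, 5.0)}
bars = [((1.0, 0.0), 0.9), ((10.0, 11.0), 0.3)]
cafes = [((0.0, 1.0), 0.6), ((5.0, 6.0), 0.8)]
top2 = naive_top_k(objects, [bars, cafes], r=2.0, k=2)
```

Every object is compared with every feature, which is why this baseline is very costly on large datasets.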
Current approaches
- Probing algorithms (SP and GP)
  - Require computing the score for all objects
- Branch and bound algorithms (BB and BB*)
  - Compute an upper-bound score for the entries in the data objects R-tree
  - Prune entries whose upper-bound score is smaller than the score of the k-th object found
- Feature join algorithm (FJ)
  - Create combinations of feature sets with high score
  - Combinations whose score is smaller than the score of the k-th object found are pruned
Motivation behind our idea
- Few feature objects are necessary to compute the score of a data object
  - Only the features not dominated by any other feature in terms of both distance and score
  - A feature dominates another if it is no farther from the data object and has no lower score, and is strictly better in at least one of the two
- Nice properties
  - Small size in practice
  - Sufficient to support any neighborhood condition and query parameter
[Figure: x-y plot of hotel p1 and cafés c1(0.5), c2(0.6), c3(0.2), c4(0.4), c5(0.8).]
Our framework
- Mapping to distance-score space
  - Pairs of objects (p, t), with t ∈ Fi, to be examined
- Identify SKY(p, Fi)
  - Minimum set of pairs required to compute the score of p according to Fi for any query
- Materialize SKY(p, Fi)
  - Stored in an R-tree, one R-tree Ri per feature set Fi
  - Efficient query processing and maintenance
- Query processing algorithm
Mapping to the distance-score space
- Mapping: pairs (object, feature) are mapped to the space [distance × score]
- Skyline
  - Minimize: distance
  - Maximize: score
[Figure: left, hotels p1 and p2 with cafés c1(0.9), c2(0.7), c3(0.5), c4(0.3); right, the pairs (p1, c1) to (p1, c4) and (p2, c1) to (p2, c4) plotted in the distance-score space.]
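The mapping and the skyline can be sketched as follows. This is an illustration only: the coordinates are hypothetical, and the sort-and-sweep skyline is a standard technique chosen here for brevity, not necessarily the computation used in the paper.

```python
import math

def to_distance_score(p, features):
    """Map each feature t of a set to the pair (dist(p, t), score(t))."""
    return [(math.hypot(p[0] - x, p[1] - y), s) for (x, y), s in features]

def dominates(a, b):
    """a dominates b: no farther, no lower score, strictly better in one."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])

def skyline(pairs):
    """Skyline minimizing distance and maximizing score (sort-and-sweep).

    Identical (distance, score) pairs are kept once.
    """
    result, best = [], -math.inf
    for d, s in sorted(pairs, key=lambda t: (t[0], -t[1])):
        if s > best:  # better score than every pair that is at least as close
            result.append((d, s))
            best = s
    return result

# Hypothetical hotel and cafés as (location, score).
p = (0.0, 0.0)
cafes = [((1.0, 0.0), 0.5), ((2.0, 0.0), 0.9), ((3.0, 0.0), 0.7), ((4.0, 0.0), 0.3)]
pairs = to_distance_score(p, cafes)
sky = skyline(pairs)  # the cafés at distance 3 and 4 are dominated by the one at distance 2
```

After sorting by distance, a pair survives only if its score beats every closer pair, so the sweep costs one sort plus one pass.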
Theoretical properties
- SKY(p, Fi) is sufficient to determine the partial score of p for any spatial preference query
  - Maintaining SKY(p, Fi), stored in an R-tree, is sufficient to answer any spatial preference query
- SKY(p, Fi) is the minimum set required
  - The data required to process range queries also permits processing nearest neighbor and influence queries
- The proofs of the theorems can be found in the paper
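The sufficiency property can be checked empirically on random data. The sketch below is not a proof (the proofs are in the paper): it compares partial scores computed from all (distance, score) pairs with those computed from the skyline alone. The helper names are hypothetical and the influence decay `s * 2 ** (-d / r)` is an assumption.

```python
import math
import random

def skyline(pairs):
    """Skyline minimizing distance and maximizing score (sort-and-sweep)."""
    result, best = [], -math.inf
    for d, s in sorted(pairs, key=lambda t: (t[0], -t[1])):
        if s > best:
            result.append((d, s))
            best = s
    return result

def range_score(pairs, r):
    return max((s for d, s in pairs if d <= r), default=0.0)

def nn_score(pairs):
    return min(pairs, key=lambda t: t[0])[1] if pairs else 0.0

def influence_score(pairs, r):
    # ASSUMED decay function; the slides do not state the formula.
    return max((s * 2 ** (-d / r) for d, s in pairs), default=0.0)

def skyline_is_sufficient(trials=200, seed=7):
    """Return True if the skyline reproduces every partial score on random data."""
    rng = random.Random(seed)
    for _ in range(trials):
        pairs = [(rng.uniform(0, 100), rng.random())
                 for _ in range(rng.randint(1, 50))]
        sky = skyline(pairs)
        if nn_score(pairs) != nn_score(sky):
            return False
        for r in (5.0, 20.0, 80.0):
            if range_score(pairs, r) != range_score(sky, r):
                return False
            if not math.isclose(influence_score(pairs, r), influence_score(sky, r)):
                return False
    return True
```

The check works because each partial score is determined by a single pair, and any pair that achieves it is either in the skyline or dominated by a skyline pair that is at least as good.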
Access to partial scores
- Only node entries that satisfy the spatial constraint are accessed
- Items are retrieved in decreasing order of score
- Minor modifications to support nearest neighbor and influence queries
[Figure: R-tree traversal with a max-heap. root: e1, e2; e1: (p3, t4), (p2, t1), (p1, t3); e2: (p3, t4), (p2, t4), (p3, t4). Heap states shown: <e1(0.8)> and <p3(0.8), p2(0.6)>.]
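A best-first traversal with a max-heap is one way to obtain this access pattern. The sketch below is an assumption-heavy illustration rather than the paper's index: the tree is a plain nested dictionary, each leaf stores hypothetical (object, distance, score) triples, and node bounds are recomputed on the fly, where a real R-tree would store them in its entries.

```python
import heapq
from itertools import count

def leaf(triples):
    return {"leaf": True, "items": triples}   # (object, distance, score)

def inner(children):
    return {"leaf": False, "items": children}

def bounds(node):
    """(minimum distance, maximum score) over everything below a non-empty node."""
    if node["leaf"]:
        return (min(d for _, d, _ in node["items"]),
                max(s for _, _, s in node["items"]))
    child_bounds = [bounds(c) for c in node["items"]]
    return (min(b[0] for b in child_bounds), max(b[1] for b in child_bounds))

def partial_scores(root, r):
    """Yield (object, partial score) in decreasing score order for range r."""
    tie = count()  # tie-breaker so the heap never compares nodes or names
    heap = [(-bounds(root)[1], next(tie), "node", root)]
    reported = set()
    while heap:
        neg_score, _, kind, item = heapq.heappop(heap)
        if kind == "pair":
            if item not in reported:   # the first hit is the object's best score
                reported.add(item)
                yield item, -neg_score
            continue
        for child in item["items"]:
            if item["leaf"]:
                obj, d, s = child
                if d <= r:             # the pair satisfies the range constraint
                    heapq.heappush(heap, (-s, next(tie), "pair", obj))
            else:
                min_d, max_s = bounds(child)
                if min_d <= r:         # the subtree may contain qualifying pairs
                    heapq.heappush(heap, (-max_s, next(tie), "node", child))

# Hypothetical index contents.
root = inner([
    leaf([("p3", 1.0, 0.8), ("p2", 2.0, 0.6), ("p1", 9.0, 0.9)]),
    leaf([("p1", 3.0, 0.2), ("p3", 0.5, 0.3)]),
])
ranked = list(partial_scores(root, 4.5))
```

Because the generator is lazy, a caller can stop after a few items without visiting the rest of the tree.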
Query processing
- Compute the top-k data objects progressively, aggregating partial scores retrieved from each Ri
  - Similar to Fagin’s algorithm (NRA)
- Algorithm
  - Each time an object p is retrieved from Ri, any unseen object p’ in Ri has score(p’) ≤ score(p)
  - Keep track of the lower-bound and upper-bound scores of the seen objects
  - Terminate when the lower bound of the k-th object is better than the upper bound of the remaining objects
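The loop can be sketched as an NRA-style procedure over score-sorted lists. This is a simplified illustration, not the authors' algorithm: it assumes sum aggregation, one sorted access per list per round, a partial score of 0 for objects missing from a list, and termination on "greater than or equal". The sample lists are the R1 and R2 values from the range example (r = 4.5).

```python
def top_k_nra(lists, k):
    """Top-k under sum aggregation; each list is [(object, partial score), ...]
    sorted by decreasing score. Returned scores are lower bounds."""
    m = len(lists)
    partial = {}                   # object -> {list index: partial score}
    last = [0.0] * m               # last score seen in each list
    lower, top = {}, []
    for depth in range(max((len(ranked) for ranked in lists), default=0)):
        for i, ranked in enumerate(lists):
            if depth < len(ranked):
                obj, s = ranked[depth]
                partial.setdefault(obj, {})[i] = s
                last[i] = s
            else:
                last[i] = 0.0      # exhausted list: unseen objects score 0 here
        lower = {o: sum(ps.values()) for o, ps in partial.items()}
        upper = {o: lower[o] + sum(last[i] for i in range(m) if i not in ps)
                 for o, ps in partial.items()}
        order = sorted(lower, key=lambda o: (-lower[o], o))
        top, rest = order[:k], order[k:]
        if len(top) == k:
            kth = lower[top[-1]]
            # Stop when no other object, seen or unseen, can beat the k-th one.
            if kth >= sum(last) and all(kth >= upper[o] for o in rest):
                break
    return [(o, lower[o]) for o in top]

# Sorted partial scores from the range example.
R1 = [("p3", 0.8), ("p2", 0.6), ("p1", 0.2)]
R2 = [("p1", 0.9), ("p2", 0.6), ("p3", 0.3)]
result = top_k_nra([R1, R2], k=1)
```

On these lists the loop needs all three rounds: after the second round p2 has a final score of 1.2, but p1 could still reach 1.5, so p2 is confirmed as top-1 only in the third round.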
Example (range, r = 4.5)
- Retrieved: p3(0.8) from R1 and p1(0.9) from R2
- Upper bound for unseen objects: 0.8 + 0.9 = 1.7
[Figure: hotels, restaurants, and bars, with the range r = 4.5 drawn around the hotels.]

Object  R1   R2   Score  Upper-bound
p3      0.8  -    -      1.7
p1      -    0.9  -      1.7
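Example (range, r = 4.5)
- Retrieved: p2(0.6) from R1 and p2(0.6) from R2
- Upper bound for unseen objects: 0.6 + 0.6 = 1.2
- p2 has been seen in both R1 and R2, so its score is final

Object  R1   R2   Score  Upper-bound
p3      0.8  -    -      1.4
p1      -    0.9  -      1.5
p2      0.6  0.6  1.2    -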
Example (range, r = 4.5)
- Retrieved: p1(0.2) from R1 and p3(0.3) from R2
- Upper bound for unseen objects: 0.2 + 0.3 = 0.5
- All scores are final; p2 is the top-1

Object  R1   R2   Score  Upper-bound
p3      0.8  0.3  1.1    -
p1      0.2  0.9  1.1    -
p2      0.6  0.6  1.2    -      (Top-1)
Materialization
- Objects are partitioned into regions
  - The distance among objects in the same region is small
  - The skyline sets of the objects in the same region are similar with high probability
- Compute SKY(R, Fi) for the region R
  - SKY(p, Fi) ⊆ SKY(R, Fi), ∀p ∈ R
- Advantage
  - The feature set is accessed only once to compute the dynamic skyline of all objects in the region
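One way to obtain a superset of every SKY(p, Fi) in a region is to prune with distance bounds. The rule below is an assumption made for illustration and may differ from the construction in the paper: a feature t' is discarded when another feature t lies no farther from the farthest point of the region than t' lies from the nearest point, has no lower score, and is strictly better in one of the two. The region is a hypothetical axis-aligned rectangle.

```python
import math

def min_max_dist(rect, t):
    """Minimum and maximum distance from point t to rectangle (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    dx_min = max(x1 - t[0], 0.0, t[0] - x2)
    dy_min = max(y1 - t[1], 0.0, t[1] - y2)
    dx_max = max(abs(t[0] - x1), abs(t[0] - x2))
    dy_max = max(abs(t[1] - y1), abs(t[1] - y2))
    return math.hypot(dx_min, dy_min), math.hypot(dx_max, dy_max)

def region_skyline(rect, features):
    """Features that may belong to SKY(p, Fi) for some p inside rect."""
    bounds = [(min_max_dist(rect, loc), s, (loc, s)) for loc, s in features]
    keep = []
    for (lo, _hi), s, feature in bounds:
        # Pruned if some feature dominates this one for EVERY p in the region.
        pruned = any(hi2 <= lo and s2 >= s and (hi2 < lo or s2 > s)
                     for (_lo2, hi2), s2, _ in bounds)
        if not pruned:
            keep.append(feature)
    return keep

# Hypothetical region and features as (location, score).
region = (0.0, 0.0, 1.0, 1.0)
features = [((0.5, 0.5), 0.5), ((5.0, 5.0), 0.4), ((5.0, 0.0), 0.9)]
candidates = region_skyline(region, features)
```

The per-object skyline of any object in the region can then be computed from `candidates` alone, so the full feature set is scanned once per region.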
Experimental evaluation
- We compare our approach (SFA) against the SP, GP, BB, BB*, and FJ algorithms [1,2]
- All approaches are implemented in Java
- Measures: response time, I/O, update time, index construction time, and index size
[1] Yiu, M.L., Dai, X., Mamoulis, N., Vaitis, M.: “Top-k spatial preference queries”, ICDE, 2007.
[2] Yiu, M.L., Lu, H., Mamoulis, N., Vaitis, M.: “Ranking spatial data by quality preferences”, TKDE, 2011.
Variables studied
- Data distribution: Uniform (UN), Synthetic (CN), Real (RL)
- Cardinality (objects and features): 50K, 100K, 200K, 400K, 800K, 1600K
- Number of results (k): 10, 20, 30, 40, 50
- Number of feature sets: 1, 2, 3, 4, 5
- Query range (r), for range and influence queries: 10, 40, 160, 640, 2560
Number of feature objects
[Table: datasets. Columns: dataset, number of data objects, number of feature objects, dynamic skyline set.]
- Wal-Mart (WM): 11K, 4K; dynamic skyline set 1.98
- Hotels (HT): 31K; dynamic skyline set 4.82
- Synthetic (CN): 100K; dynamic skyline set 11.26
- Uniform (UN): dynamic skyline set 12.04
Number of features
[Figure: (a) I/O and (b) response time, varying the number of feature sets.]
Scalability
[Figure: response time varying (a) |Fi| and (b) |O|.]
Real datasets
[Figure: results on real datasets for (a) range, (b) influence, and (c) nearest neighbor queries.]
Conclusion
- Top-k spatial preference queries are a useful tool for novel location-based applications
- We propose a new approach for processing top-k spatial preference queries efficiently
  - We find and materialize SKY(p, Fi)
  - We prove that SKY(p, Fi) is sufficient to determine the partial score of p for any spatial preference query
  - The size of SKY(p, Fi) is small in practice
  - We propose algorithms to process queries using our index
- The efficiency of our approach is verified through experiments on synthetic and real datasets
Thanks!
More information: João B. Rocha-Junior
joao@idi.ntnu.no
http://www.idi.ntnu.no/~joao