Computer Science and Engineering Inverted Linear Quadtree: Efficient Top K Spatial Keyword Search Chengyuan Zhang 1,Ying Zhang 1,Wenjie Zhang 1, Xuemin.

1 Computer Science and Engineering
  Inverted Linear Quadtree: Efficient Top K Spatial Keyword Search
  Chengyuan Zhang (1), Ying Zhang (1), Wenjie Zhang (1), Xuemin Lin (2,1)
  (1) The University of New South Wales, Australia
  (2) East China Normal University

2 Background
  An enormous number of spatio-textual objects are available in many applications:
  online local search (e.g., online yellow pages)
  social network services (e.g., Facebook, Flickr)

3 Example
  Objects: p1 (pizza, coffee, sushi), p2 (pizza, coffee, steak), p3 (pizza, sushi), p4 (coffee, sushi), p5 (pizza, steak, seafood)
  Query keywords: pizza, coffee

4 Top k spatial keyword search (TOPK-SK)
  Data: a set of spatio-textual objects; each object is represented by a location and a set of keywords
  Query: a query location (q.loc) and a set of query keywords (q.T)
  Answer: the closest k objects, each of which contains all query keywords
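The TOPK-SK definition can be read as a filter followed by a nearest-neighbour selection. The following is a minimal brute-force sketch of that definition only, not the indexing method of the talk. The class name, the function name, and the coordinates are hypothetical; the keyword sets are borrowed from the pizza/coffee example.

```python
import heapq
import math
from typing import NamedTuple


class SpatioTextualObject(NamedTuple):
    name: str
    loc: tuple[float, float]      # hypothetical 2-D coordinates
    keywords: frozenset[str]


def topk_sk(objects, q_loc, q_keywords, k):
    """Return the k closest objects that contain every query keyword."""
    required = frozenset(q_keywords)
    # AND semantics: keep only objects whose keyword set covers q.T.
    matches = [o for o in objects if required <= o.keywords]
    # Rank the survivors by Euclidean distance to the query location.
    return heapq.nsmallest(k, matches, key=lambda o: math.dist(o.loc, q_loc))


objects = [
    SpatioTextualObject("p1", (1.0, 1.0), frozenset({"pizza", "coffee", "sushi"})),
    SpatioTextualObject("p2", (4.0, 4.0), frozenset({"pizza", "coffee", "steak"})),
    SpatioTextualObject("p3", (0.5, 0.5), frozenset({"pizza", "sushi"})),
]
result = topk_sk(objects, (0.0, 0.0), {"pizza", "coffee"}, k=1)
print([o.name for o in result])  # ['p1']
```

Here p3 is the nearest object but lacks "coffee", so it is filtered out before ranking.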

5 Running Example: Naïve Approach
  11 spatio-textual objects; vocabulary {t1, t2, t3}
  Query q with q.T = {t1, t2} and k = 1
  Objects: p1 (t1, t2), p2 (t1, t2), p3 (t1, t3), p4 (t1), p5 (t2, t3), p6 (t2, t3), p7 (t3), p8 (t3), p9 (t2), p10 (t1), p11 (t2)
  Distance order: P3 P4 P7 P8 P5 P1 P10 P9 P6 P2 P11
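The naïve approach on the running example can be traced with a short scan. This sketch assumes the keyword sets and the distance order listed on the slide; the function name and the access counter are illustrative additions.

```python
# Keyword sets and distance order copied from the running example.
KEYWORDS = {
    "p1": {"t1", "t2"}, "p2": {"t1", "t2"}, "p3": {"t1", "t3"},
    "p4": {"t1"}, "p5": {"t2", "t3"}, "p6": {"t2", "t3"},
    "p7": {"t3"}, "p8": {"t3"}, "p9": {"t2"}, "p10": {"t1"}, "p11": {"t2"},
}
DISTANCE_ORDER = ["p3", "p4", "p7", "p8", "p5", "p1", "p10", "p9", "p6", "p2", "p11"]


def naive_scan(query_keywords, k):
    """Visit objects nearest-first; stop once k objects contain all keywords."""
    results, accessed = [], 0
    for name in DISTANCE_ORDER:
        accessed += 1                      # every visited object is loaded
        if query_keywords <= KEYWORDS[name]:
            results.append(name)
            if len(results) == k:
                break
    return results, accessed


print(naive_scan({"t1", "t2"}, k=1))  # (['p1'], 6)
```

The scan touches p3, p4, p7, p8 and p5 before reaching p1, which is the cost the later index structures try to avoid.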

6 Inverted R-tree [Y. Zhou, et al., CIKM 2005]
  For each keyword t, construct an R-tree for the objects containing t
  Distance order: P3 P4 P7 P8 P5 P1 P10 P9 P6 P2 P11
  [Figure: three R-trees R1 (t1), R2 (t2), R3 (t3), each with entries E1 and E2 over the objects containing that keyword]

7 IR2-tree [I. D. Felipe, et al., ICDE 2008]
  Index structure
  Combination of an R-tree and the signature technique
  Each node contains a rectangle and a signature (a fixed-length bitmap)
  Each word is hashed to a particular bit
  The signature of a node is the bitwise OR of the signatures of its child nodes
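The signature idea can be sketched in a few lines. This is an illustration of hashing words to bits and OR-ing child signatures, not the IR2-tree implementation; the 8-bit length, the CRC32 hash, and all function names are assumptions made for the example.

```python
import zlib

SIGNATURE_BITS = 8  # assumed signature length; real systems tune this


def word_bit(word):
    """Hash a word to a single bit position in the signature."""
    return 1 << (zlib.crc32(word.encode("utf-8")) % SIGNATURE_BITS)


def signature(words):
    """Signature of an object: OR of the bits of its words."""
    sig = 0
    for w in words:
        sig |= word_bit(w)
    return sig


def node_signature(child_signatures):
    """Signature of a node: bitwise OR of its children's signatures."""
    sig = 0
    for child in child_signatures:
        sig |= child
    return sig


def may_contain_all(sig, query_words):
    """True if the subtree *may* hold all query words (false positives possible)."""
    q = signature(query_words)
    return (sig & q) == q


p1 = signature({"t1", "t2"})
p5 = signature({"t2", "t3"})
node = node_signature([p1, p5])
print(may_contain_all(node, {"t1", "t3"}))
```

The last call returns True even though neither p1 nor p5 contains both t1 and t3: the OR mixes keywords of different objects, so a node can pass the test without holding any matching object.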

8 Example
  Distance order: P3 P4 P7 P8 P5 P1 P10 P9 P6 P2 P11
  [Figure: IR2-tree over the running example, with nodes E1 to E12 and a bit signature attached to each node and object]

9 Observations
  Naïve approach
    Disadvantages: all objects in the search region are accessed (large s and p = 1)
  Inverted R-tree
    Advantages: excludes unrelated objects (small s)
    Disadvantages: cannot take advantage of the AND semantics (p = 1)
  IR2-tree
    Advantages: has a filtering technique to reduce p
    Disadvantages: large s, and p is affected by unrelated objects
  Other single augmented R-trees
    Other spatial keyword search: KR tree [R. Hariharan, et al., SSDBM 2007], WIR tree [D. Wu, et al., TKDE 2011]
    Spatial keyword ranking query: IR tree [G. Cong, et al., PVLDB 2009], CM-CDIR tree [D. Wu, et al., VLDBJ 2012]
    Their shortcomings: same as IR2-tree

10 Motivation
  The index structure should
    have a small number of objects within the search region
    be able to prune objects within the search region
  Properties
    falls in the category of inverted indexes
    exploits the AND semantics
    adapts to the distribution of the objects for each keyword

11 Motivation
  [Figure: space partition whose cells carry signature bits, 1 for a non-empty cell and 0 for an empty cell]

12 Linear Quadtree
  Regular space partition based indexing
  Each node can be identified by its split sequence (Morton code, a.k.a. Z-order)
  In the figure, a circle denotes a non-leaf node and a square denotes a leaf node
  A leaf node is black if it is not empty; otherwise it is a white leaf node
  Only the black leaf nodes are kept (in a B+ tree)
  Structure example: (SW, SE) has code 0001; NE has code 1100
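The two codes on the slide are consistent with two bits per split, SW = 00, SE = 01, NE = 11, and zero-padding to a fixed length. The sketch below reproduces them under that reading; NW = 10 is an assumption (it does not appear on the slide), and the sorted list is only a stand-in for the B+ tree.

```python
# Two bits per split. SW, SE and NE follow the slide; NW = "10" is assumed.
QUADRANT_BITS = {"SW": "00", "SE": "01", "NW": "10", "NE": "11"}


def morton_code(split_sequence, max_depth):
    """Concatenate the quadrant bits and zero-pad to 2 * max_depth bits."""
    bits = "".join(QUADRANT_BITS[q] for q in split_sequence)
    return bits.ljust(2 * max_depth, "0")


def black_leaf_key(split_sequence, max_depth):
    """Key of a black leaf: padded code plus level, since padding alone is ambiguous."""
    return (morton_code(split_sequence, max_depth), len(split_sequence))


print(morton_code(["SW", "SE"], max_depth=2))  # 0001
print(morton_code(["NE"], max_depth=2))        # 1100

# Sorted keys stand in for the B+ tree that stores the black leaves.
black_leaves = sorted([black_leaf_key(["NE"], 2), black_leaf_key(["SW", "SE"], 2)])
print(black_leaves)  # [('0001', 2), ('1100', 1)]
```

Keeping the level next to the code distinguishes the NE quadrant at level 1 from its SW child at level 2, which would otherwise share the padded code 1100.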

13 IL-Quadtree
  For each keyword t_i ∈ V we build a linear quadtree, denoted by LQ_i, for the objects that contain the keyword t_i
  Besides the black leaf nodes, we also keep the quadtree node information (signature): 1 for black leaf nodes and non-leaf nodes, and 0 otherwise
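One way to picture the per-keyword signatures is as a set of non-empty node codes for each keyword, so that a region survives only if it is non-empty for every query keyword. The sketch below is a simplification under stated assumptions: all quadtrees share one fixed depth, nodes are identified by code prefixes, and the object layout and function names are hypothetical.

```python
from collections import defaultdict


def build_signatures(objects):
    """Map each keyword to the set of node codes whose signature bit is 1.

    `objects` maps a name to (leaf_code, keywords); every prefix of a leaf
    code (two bits per level) is a non-empty ancestor node for that keyword.
    """
    sigs = defaultdict(set)
    for leaf_code, keywords in objects.values():
        for kw in keywords:
            for end in range(0, len(leaf_code) + 1, 2):
                sigs[kw].add(leaf_code[:end])
    return sigs


def node_may_match(sigs, node_code, query_keywords):
    """AND semantics: the node must be non-empty in every query keyword's quadtree."""
    return all(node_code in sigs.get(kw, ()) for kw in query_keywords)


# Hypothetical placement of three objects in depth-2 cells.
objects = {
    "a": ("0001", {"t1", "t2"}),
    "b": ("1100", {"t1"}),
    "c": ("1101", {"t2"}),
}
sigs = build_signatures(objects)
print(node_may_match(sigs, "00", {"t1", "t2"}))    # True
print(node_may_match(sigs, "1100", {"t1", "t2"}))  # False
print(node_may_match(sigs, "11", {"t1", "t2"}))    # True
```

Node 11 passes at the coarse level because b and c together cover both keywords, but both of its occupied children fail the test, so the false positive is eliminated one level down.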

14 Search Algorithm
  Distance order: P3 P4 P7 P8 P5 P1 P10 P9 P6 P2 P11

15 Direction-aware spatial keyword search [G. Li, et al., ICDE 2012]
  Data: a set of spatio-textual objects; each object has a location and a set of keywords
  Query: a location (q.loc), a set of query keywords (q.T), and a direction [α, β]
  Answer: the closest k objects, each of which contains all keywords in q.T and lies in the search direction

16 Spatial Keyword Based Ranking [G. Cong, et al., PVLDB 2009, VLDBJ 2012]
  Query
    - Spatial location
    - Query keywords
  Returns the k best objects ranked by
    - Spatial distance to the query location
    - Textual relevance to the query keywords
  Spatio-textual ranking score
    The spatial proximity (δ) is the normalized Euclidean distance between p and q
    The textual relevance (θ) is the tf-idf based textual similarity between the description of p and the query keywords
  Our solution
    The maximal keyword weight replaces the bit signature (aggregate inverted linear quadtree)
    The spatial distance ranking function is replaced by the spatio-textual ranking score function
    Score-based pruning based on the weight and region of the quadtree node
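The score formula itself did not survive in the transcript. The sketch below assumes a linear combination of δ and θ with a weight alpha, where a lower score is better; that form, the parameter alpha, and both function names are assumptions for illustration, not the formula from the slide. It also shows how a node's minimum distance and maximal keyword weight could give a bound for score-based pruning.

```python
def st_score(delta, theta, alpha=0.5):
    """Assumed spatio-textual score: lower is better.

    delta: normalized distance in [0, 1]; theta: textual relevance in [0, 1].
    """
    return alpha * delta + (1 - alpha) * (1 - theta)


def node_lower_bound(min_delta, max_theta, alpha=0.5):
    """Best score any object inside a node could reach.

    min_delta: smallest normalized distance from the query to the node region.
    max_theta: relevance computed from the node's maximal keyword weights.
    """
    return st_score(min_delta, max_theta, alpha)


# A node can be skipped when its bound cannot beat the current k-th best score.
kth_best = st_score(delta=0.25, theta=0.5)
bound = node_lower_bound(min_delta=0.5, max_theta=0.5)
print(kth_best, bound, bound >= kth_best)  # 0.375 0.5 True
```

Because the bound uses the closest possible distance and the largest possible weight, no object in the node can score below it, so pruning on the bound is safe under this assumed score.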

17 Experimental Setting
  Implemented in Java
  Debian Linux
    o Intel Xeon 2.40GHz dual CPU
    o 4 GB memory
  Dataset
    GN: US Board on Geographic Names
    Tigers, Cars:
      o Spatial datasets from Rtree-Portal
      o Textual content from 20 Newsgroups
    SYN: synthetic dataset
  Query (1000): location, number of query keywords (l)
  Evaluate: response time and number of I/Os

18 Important Statistics: Parameters evaluated
  Definition                      Notation   Default Value
  Number of required results      k          10
  Number of query keywords        l          3
  Term frequency of vocabulary    z          1.1
  Number of objects               n          1,000,000
  Vocabulary size                 v          100,000
  Avg. keywords per object        m          15

19 Tuning
  w’: minimal depth of the black leaf node
  c: the split threshold
  Best performance: w’ = 8 and c = 64

20 l: the number of query keywords
  Grid: [M. Christoforaki, et al., CIKM 2011]
  Grid+SIG: the extension of Grid, utilizing the signature technique

21 Algorithms Evaluated
  ILQ: Inverted Linear Quadtree based techniques
  IVR: inverted R-tree [Y. Zhou, et al., CIKM 2005]
  MIR2: [I. D. Felipe, et al., ICDE 2008]
  KR: [R. Hariharan, et al., SSDBM 2007]
  WIR: [D. Wu, et al., TKDE 2011]
  IR: [G. Cong, et al., PVLDB 2009]
  CM-CDIR: [D. Wu, et al., VLDBJ 2012]

22 Evaluation on different datasets

23 Comparison – Varying l

24 Comparison – Varying k

25 Comparison – Varying Parameters

26 Conclusion
  Identified important properties of indexing techniques to support top k spatial keyword search
  Proposed the inverted linear quadtree structure to efficiently support top k spatial keyword search
  Extensive experiments on both real and synthetic data
  Future work
    Enhance the region-based signature technique: group objects to reduce false positives
    Support top k spatial keyword search on other metric spaces

28 Spatial Keyword Ranking Query
  Our algorithm: Aggregate ILQ
  Compared with: IR [G. Cong, et al., PVLDB 2009], CM-CDIR [D. Wu, et al., VLDBJ 2012]
  Dataset: Tiger

29 Direction-Aware TOPK-SK Query
  Our algorithm: ILQ
  Compared with: DESKS [G. Li, et al., ICDE 2012]

30 Comparison – Varying k

31 IR-Tree

32 KR* Tree

