1 Supporting Location-Based Approximate-Keyword Queries
Sattam Alsubaiee, Alexander Behm, and Chen Li, University of California, Irvine
Speaker: Sattam Alsubaiee
ACM SIGSPATIAL GIS 2010

2 Lunch Time! I want Chinese food!
Remembering restaurant name?! Ch-o-chi?!

3 Let’s Find It!

4 Just One Typo

5 Problem Formulation
Object collection: chaochi restaurant, starbucks, apple store, sam’s club, …
Find objects in “San Jose” with keywords similar to “chochi” & “resturant”

6 Preliminaries: Location-Based Keyword Search
Find objects within a given spatial region that have a given set of keywords.
Augment a hierarchical spatial index with textual information.

7 Preliminaries: Approximate String Search
Collection of strings s: …, chaochi, chucho, church
Query q: chochi
Output: strings s that satisfy Sim(q, s) ≤ δ
Sim functions: edit distance, Jaccard, cosine, etc.
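To make the search definition on this slide concrete, here is a minimal sketch that uses edit distance as the measure and scans the whole collection. The function names `edit_distance` and `approximate_search` are illustrative assumptions, not names from the talk, and a linear scan is only a reference baseline, not the indexed approach the presentation builds toward.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def approximate_search(query, strings, delta):
    """Return the strings whose edit distance to the query is at most delta."""
    return [s for s in strings if edit_distance(query, s) <= delta]


# The three strings and the query shown on the slide.
collection = ["chaochi", "chucho", "church"]
matches = approximate_search("chochi", collection, 1)
print(matches)  # ['chaochi']
```

With a threshold of 1, only "chaochi" qualifies: it is one insertion away from "chochi", while "chucho" needs two substitutions.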

8 Preliminaries: Approximate String Search
Sliding window: chaochi → 2-grams {ch, ha, ao, oc, ch, hi}
Intuition: similar strings share a certain number of grams
Gram-based inverted index
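The gram intuition can be sketched as follows. The code extracts grams with a sliding window, builds a small gram-based inverted index, and keeps only strings that share enough grams with the query. The threshold used here, (|query| - q + 1) - k·q shared grams for an edit-distance bound k, is the standard count-filter bound and is an assumption about the filter, since the slide does not state one. All function names are illustrative.

```python
from collections import Counter, defaultdict


def grams(s, q=2):
    """Slide a window of width q over s and collect the grams in order."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def build_index(strings, q=2):
    """Map each gram to {string id: number of occurrences in that string}."""
    index = defaultdict(dict)
    for sid, s in enumerate(strings):
        for g, n in Counter(grams(s, q)).items():
            index[g][sid] = n
    return index


def candidates(query, strings, index, k, q=2):
    """Ids of strings sharing at least (|query| - q + 1) - k*q grams with query."""
    threshold = (len(query) - q + 1) - k * q
    if threshold <= 0:
        # The filter cannot prune anything; every string stays a candidate.
        return list(range(len(strings)))
    shared = Counter()
    for g, n in Counter(grams(query, q)).items():
        for sid, m in index.get(g, {}).items():
            shared[sid] += min(n, m)  # multiset intersection per gram
    return sorted(sid for sid, c in shared.items() if c >= threshold)


strings = ["chaochi", "chucho", "church"]
index = build_index(strings)
print(grams("chaochi"))                          # ['ch', 'ha', 'ao', 'oc', 'ch', 'hi']
print(candidates("chochi", strings, index, 1))   # [0, 1]
```

The filter only produces candidates. Here "chucho" survives because it shares three grams with "chochi", even though its edit distance is 2, so a final verification step with the actual distance function is still needed.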

9 Our Solution
LBAK-Tree: tree-based spatial index, approximate string search capability, keyword search capability

10 Contributions
• How to combine those indexes
• Three algorithms
  1) Simple fixed-level solution
  2) Utilizing local spatial distribution of objects
  3) Exploiting frequency distribution of keywords

11 Algorithm 1: Fixed-Level Solution
Node types shown: Spatial Nodes, Spatial-Approximate Nodes, Spatial-Keyword Nodes

12 Query Example
Query: objects in “San Jose” with keywords similar to “chochi” & “resturant”
• Based on edit distance of 1
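The semantics of this query can be written down as a brute-force reference, useful mainly for checking what an index should return. This is a sketch under assumptions: the rectangle standing in for “San Jose”, the coordinates, and the names `SpatialObject`, `in_window`, and `lbak_query` are all hypothetical, and the rule that every query keyword must match some keyword of the object is one reading of the slide.

```python
from dataclasses import dataclass


@dataclass
class SpatialObject:
    x: float
    y: float
    keywords: list


def edit_distance(a, b):
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def in_window(obj, window):
    """True if the object lies inside the (xmin, ymin, xmax, ymax) rectangle."""
    xmin, ymin, xmax, ymax = window
    return xmin <= obj.x <= xmax and ymin <= obj.y <= ymax


def lbak_query(objects, window, query_keywords, delta):
    """Objects in the window where each query keyword has a similar keyword."""
    return [
        o for o in objects
        if in_window(o, window)
        and all(any(edit_distance(qk, k) <= delta for k in o.keywords)
                for qk in query_keywords)
    ]


# Hypothetical coordinates; the keywords follow the slides' object collection.
objects = [
    SpatialObject(1.0, 1.0, ["chaochi", "restaurant"]),
    SpatialObject(2.0, 2.0, ["starbucks"]),
    SpatialObject(50.0, 50.0, ["chaochi", "restaurant"]),  # outside the window
]
window = (0.0, 0.0, 10.0, 10.0)  # stand-in rectangle for the query region
result = lbak_query(objects, window, ["chochi", "resturant"], 1)
print([(o.x, o.y) for o in result])  # [(1.0, 1.0)]
```

Only the first object is returned: the second has no similar keywords, and the third matches the keywords but lies outside the region.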

13 Query Example
Query: objects in “San Jose” with keywords similar to “chochi” & “resturant”
• Based on edit distance of 1

14 Query Example
Query: objects in “San Jose” with keywords similar to “chochi” & “resturant”
• Based on edit distance of 1

15 Query Example
Query: objects in “San Jose” with keywords similar to “chochi” & “resturant”
• Based on edit distance of 1

16 How to Choose Level L?
Trade-off between space and time up to “some” level; beyond it, both increase

17 Observations
• Query time & index size sensitive to approximate-index locations
• Fixed-level solution ignores local spatial distribution of objects
Figure labels: “Prefer to build approximate index at parent”; “Prefer to build approximate indexes at children”

18 Algorithm 2: Placing Approximate Indexes at Variable Levels
Node types shown: Spatial Nodes, Spatial-Approximate Nodes, Spatial-Keyword Nodes

19 Selecting Nodes for Approximate Indexes
• Goal: find optimal set of nodes that should have approximate indexes
• Optimization problem: given an R*-tree and a space budget, choose nodes to store approximate indexes, to minimize query time
• NP-hard (Knapsack problem)

20 Greedy Algorithm: Selecting Nodes for Approximate Indexes
[Figure: example tree with nodes N1 to N15; three nodes are marked ✔]

21 Cost/Benefit Estimation
• Effects of pushing index down
  • Increase space cost
  • Increase or decrease average query time
• Typically
  • Higher levels: good to push index down
  • Intermediate levels: unclear whether to push it down
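One way to picture a greedy selection driven by such cost/benefit estimates is sketched below. It is not the algorithm from the talk, whose details the slides do not give. It assumes each node carries two hypothetical estimates, `extra_space` (added index space if the index is pushed from the node to its children) and `time_saved` (estimated reduction in average query time from that push), starts with a single approximate index at the root, and repeatedly applies the affordable push-down with the best benefit-to-cost ratio.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    extra_space: float = 0.0  # hypothetical estimate: space added by pushing down
    time_saved: float = 0.0   # hypothetical estimate: query time saved by pushing down


def choose_index_nodes(root, budget):
    """Return names of nodes holding an approximate index after greedy push-downs."""
    chosen = {root.name: root}
    heap = []
    counter = 0  # tie-breaker so heapq never has to compare Node objects

    def push(node):
        nonlocal counter
        # Only internal nodes with a positive estimated benefit are worth considering.
        if node.children and node.time_saved > 0:
            ratio = (node.time_saved / node.extra_space
                     if node.extra_space > 0 else float("inf"))
            heapq.heappush(heap, (-ratio, counter, node))
            counter += 1

    push(root)
    remaining = budget
    while heap:
        _, _, node = heapq.heappop(heap)
        if node.extra_space > remaining:
            continue  # this push-down does not fit in the remaining budget
        remaining -= node.extra_space
        del chosen[node.name]
        for child in node.children:
            chosen[child.name] = child
            push(child)
    return sorted(chosen)


# A small hypothetical tree with made-up estimates.
left = Node("left", [Node("left-1"), Node("left-2")], extra_space=8, time_saved=4)
right = Node("right", [Node("right-1"), Node("right-2")], extra_space=6, time_saved=1)
root = Node("root", [left, right], extra_space=10, time_saved=5)

selected = choose_index_nodes(root, budget=20)
print(selected)  # ['left-1', 'left-2', 'right']
```

In this example the root and "left" are pushed down, which uses 18 of the 20 units; pushing "right" down would need 6 more, so its index stays where it is. A real implementation would also need a way to produce the estimates, which is the hard part the slide points at.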

22 Algorithm 3: Exploiting Frequency Distribution of Keywords
Node types shown: Spatial-Approximate Nodes, Spatial-Keyword Nodes

23 Experiments
• Settings
  • Four-core Intel Xeon E5520 2.26GHz
  • 12GB of RAM
  • Ubuntu OS
  • C++ implementation
  • LBAK-tree in main memory
  • Keyword-frequency threshold = 1
  • R*-tree fanout = 40

24 Experiments
• Dataset
  • CoPhIR Test Collection (CoPhIR)
    • 3.75 million objects
    • Raw data size: 500MB
  • Business listings (Business)
    • 20.4 million business listings in the U.S.
    • Raw data size: 4GB
• Queries
  • 10,000 queries for each dataset
  • 30km-by-30km query window around randomly selected object
  • Randomly chose two keywords of the randomly chosen object
  • Normalized edit-distance of 0.8
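The slide does not define the normalized measure. One common definition, assumed here rather than taken from the talk, turns edit distance into a similarity in [0, 1], so that 0.8 acts as a minimum similarity:

```latex
\mathrm{sim}(s_1, s_2) \;=\; 1 - \frac{\mathrm{ed}(s_1, s_2)}{\max(|s_1|, |s_2|)},
\qquad \text{match if } \mathrm{sim}(s_1, s_2) \ge 0.8

% Worked values for the running example, under this assumed definition:
\mathrm{sim}(\text{chochi}, \text{chaochi}) = 1 - \tfrac{1}{7} \approx 0.857,
\qquad
\mathrm{sim}(\text{resturant}, \text{restaurant}) = 1 - \tfrac{1}{10} = 0.9
```

Under this reading both misspelled keywords from the running example would still match at a threshold of 0.8, and longer keywords tolerate more edits than short ones.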

25 Terminology
• FL: fixed-level approach
  • e.g., “FL-0”: approximate indexes are at the root level
• VL: variable-level approach
• VLF: variable-level approach exploiting keyword frequencies

26 Comparison with MHR-Tree*
• Maximum recall for MHR-Tree that we achieved is around 50%
• LBAK-Tree recall is 100%
* B. Yao, F. Li, M. Hadjieleftheriou, and K. Hou. Approximate string search in spatial databases. In ICDE, 2010.

27 Index Size & Query Time
Dataset: Business Listings

28 Scalability: Query Time vs. VLF
Used space budget: minimum index size for VLF to achieve best query time
Dataset: Business Listings

29 Conclusion
• Spatial index + Approximate index = LBAK-tree
  1) Simple fixed-level solution
  2) Utilizing local spatial distribution of objects
  3) Exploiting frequency distribution of keywords

30 Thank You!
This work is part of The Flamingo Project
Source Code: http://flamingo.ics.uci.edu
Live Demo: http://flamingo.ics.uci.edu/localsearch/fuzzysearch/
