1
Supporting Location-Based Approximate-Keyword Queries
Sattam Alsubaiee, Alexander Behm, and Chen Li
University of California, Irvine
ACM SIGSPATIAL GIS 2010
Speaker: Sattam Alsubaiee
2
Lunch Time!
I want Chinese food!
Remembering restaurant name?! Ch-o-chi?!
3
Let’s Find It!
4
Just One Typo
5
Problem Formulation
Object collection: chaochi restaurant, starbucks, apple store, sam’s club, …
Find objects in “San Jose” with keywords similar to “chochi” & “resturant”
6
Preliminaries: Location-Based Keyword Search
Find objects within a given spatial region that have a given set of keywords
Augment a hierarchical spatial index with textual information
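A minimal sketch of what "augmenting a hierarchical spatial index with textual information" can look like. This is an illustration only, not the index used in the talk: the `Node` class, the `leaf`, `parent`, and `search` helpers, and all coordinates are hypothetical. Each node stores a bounding rectangle plus the union of the keywords beneath it, so a search can prune a subtree on either the spatial or the textual condition.

```python
from dataclasses import dataclass, field
from typing import Optional

Rect = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles overlap (touching counts)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    mbr: Rect                              # minimum bounding rectangle
    keywords: frozenset[str]               # union of all keywords below this node
    children: list["Node"] = field(default_factory=list)
    name: Optional[str] = None             # set only on leaf objects


def leaf(name: str, x: float, y: float, keywords: set[str]) -> Node:
    """A point object: its rectangle degenerates to a single point."""
    return Node((x, y, x, y), frozenset(keywords), [], name)


def parent(children: list[Node]) -> Node:
    """Build an inner node whose rectangle and keyword set cover its children."""
    mbr = (min(c.mbr[0] for c in children), min(c.mbr[1] for c in children),
           max(c.mbr[2] for c in children), max(c.mbr[3] for c in children))
    keywords = frozenset().union(*(c.keywords for c in children))
    return Node(mbr, keywords, list(children))


def search(node: Node, region: Rect, wanted: set[str]) -> list[str]:
    """Return names of objects inside region that contain every wanted keyword."""
    # Prune the whole subtree if it misses the region or lacks a keyword.
    if not intersects(node.mbr, region) or not wanted <= node.keywords:
        return []
    if node.name is not None:
        return [node.name]
    results: list[str] = []
    for child in node.children:
        results.extend(search(child, region, wanted))
    return results


# Objects from the talk's example collection; the coordinates are made up.
a = leaf("chaochi restaurant", 1, 1, {"chaochi", "restaurant"})
b = leaf("starbucks", 2, 1, {"starbucks"})
c = leaf("apple store", 8, 8, {"apple", "store"})
d = leaf("sam's club", 9, 7, {"sam's", "club"})
root = parent([parent([a, b]), parent([c, d])])

hits = search(root, (0, 0, 5, 5), {"chaochi", "restaurant"})
```

This version matches keywords exactly, so a misspelled query such as "chochi" finds nothing.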
7
Preliminaries: Approximate String Search
Collection of strings s: …, chaochi, chucho, church
Query q: chochi
Output: strings s that satisfy Sim(q, s) ≤ δ (the ≤ form applies to distance measures such as edit distance; for similarity measures such as Jaccard or cosine the condition is Sim(q, s) ≥ δ)
Sim functions: edit distance, Jaccard, cosine, etc.
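As a concrete instance of the search defined on this slide, the sketch below uses edit distance as the measure and scans the slide's three example strings. The function names `edit_distance` and `approximate_search` are illustrative, and a linear scan is only a reference for the semantics, not an efficient method.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,            # delete ca
                            curr[j - 1] + 1,        # insert cb
                            prev[j - 1] + cost))    # substitute or match
        prev = curr
    return prev[-1]


def approximate_search(query: str, strings: list[str], delta: int) -> list[str]:
    """Return every string within edit distance delta of the query."""
    return [s for s in strings if edit_distance(query, s) <= delta]


collection = ["chaochi", "chucho", "church"]
matches = approximate_search("chochi", collection, 1)
```

With δ = 1 only "chaochi" qualifies: it is one insertion away from "chochi", while "chucho" and "church" are at distances 2 and 3.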
8
Preliminaries: Approximate String Search
Sliding window over chaochi gives the 2-grams {ch, ha, ao, oc, ch, hi}
Intuition: similar strings share a certain number of grams
Gram-based inverted index
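The gram intuition can be sketched as a small inverted index with a count filter. The bound used here is the standard q-gram property, not something stated on the slide: a string within edit distance k of the query shares at least (number of query grams) − k·q grams with it, counting repeats. The names `grams`, `build_index`, and `candidates` are hypothetical, and repeated grams are tagged with an occurrence number so that the two "ch" grams stay distinct.

```python
from collections import Counter, defaultdict


def grams(s: str, q: int = 2) -> list[tuple[str, int]]:
    """Slide a window of width q over s; tag each gram with its occurrence number."""
    seen: Counter = Counter()
    out = []
    for i in range(len(s) - q + 1):
        g = s[i:i + q]
        seen[g] += 1
        out.append((g, seen[g]))
    return out


def build_index(strings: list[str], q: int = 2) -> dict:
    """Inverted index: tagged gram -> ids of the strings containing it."""
    index = defaultdict(list)
    for sid, s in enumerate(strings):
        for key in grams(s, q):
            index[key].append(sid)
    return index


def candidates(query: str, strings: list[str], index: dict, k: int, q: int = 2) -> list[str]:
    """Strings sharing enough grams with the query to possibly be within distance k."""
    query_grams = grams(query, q)
    threshold = len(query_grams) - k * q
    if threshold <= 0:
        return list(strings)      # the filter cannot prune anything in this case
    counts: Counter = Counter()
    for key in query_grams:
        for sid in index.get(key, ()):
            counts[sid] += 1
    return [strings[sid] for sid in sorted(counts) if counts[sid] >= threshold]


strings = ["chaochi", "chucho", "church"]
index = build_index(strings)
result = candidates("chochi", strings, index, k=1)
```

For "chochi" with k = 1 the threshold is 5 − 2 = 3 shared grams, which prunes "church" but keeps both "chaochi" and "chucho". The filter only produces candidates; "chucho" is a false positive that a final check with the actual similarity function would discard.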
9
Our Solution
LBAK-Tree: tree-based spatial index, approximate string search capability, keyword search capability
10
Contributions
How to combine those indexes
Three algorithms:
1) Simple fixed-level solution
2) Utilizing local spatial distribution of objects
3) Exploiting frequency distribution of keywords
11
Algorithm 1: Fixed-Level Solution
Figure labels: Spatial Nodes, Spatial-Approximate Nodes, Spatial-Keyword Nodes
12
Query Example
Query: objects in “San Jose” with keywords similar to “chochi” & “resturant”, based on edit distance of 1
16
How to Choose Level L?
Trade-off between space and time, until “some” level (both increase)
17
Observations
Query time & index size sensitive to approximate-index locations
Fixed-level solution ignores local spatial distribution of objects
Prefer to build approximate index at parent
Prefer to build approximate indexes at children
18
Algorithm 2: Placing Approximate Indexes at Variable Levels
Figure labels: Spatial Nodes, Spatial-Approximate Nodes, Spatial-Keyword Nodes
19
Selecting Nodes for Approximate Indexes
Goal: find optimal set of nodes that should have approximate indexes
Optimization problem: given an R*-tree and a space budget, choose nodes to store approximate indexes, to minimize query time
NP-hard (Knapsack problem)
20
Greedy Algorithm: Selecting Nodes for Approximate Indexes
Figure: tree of nodes N1 to N15, three of them marked ✔
21
Cost/Benefit Estimation
Effects of pushing index down:
  Increase space cost
  Increase or decrease average query time
Typically:
  Higher levels: good to push index down
  Intermediate levels: unclear whether to push it down
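One way to picture a greedy push-down under a space budget is sketched below. It is a generic benefit-per-cost heuristic written for illustration, not the algorithm from the talk: the `TreeNode` fields, the per-node size and cost estimates, the assumption that query costs simply add up across indexed nodes, and all numbers are hypothetical. Starting with a single approximate index at the root, it repeatedly replaces one indexed node by its children whenever that is estimated to save query time and still fits the budget.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)                 # compare nodes by identity, not by field values
class TreeNode:
    name: str
    index_size: float                # assumed space of an approximate index at this node
    query_cost: float                # assumed average query cost if indexed at this node
    children: list["TreeNode"] = field(default_factory=list)


def push_down_effect(node: TreeNode) -> tuple[float, float]:
    """Extra space and time saved when the index moves from node to its children."""
    extra_space = sum(c.index_size for c in node.children) - node.index_size
    time_saved = node.query_cost - sum(c.query_cost for c in node.children)
    return extra_space, time_saved


def choose_index_nodes(root: TreeNode, budget: float) -> list[TreeNode]:
    """Greedily pick the nodes that hold approximate indexes within the budget."""
    frontier = [root]
    used = root.index_size
    while True:
        best, best_ratio = None, 0.0
        for node in frontier:
            if not node.children:
                continue
            extra, saved = push_down_effect(node)
            if saved <= 0 or used + extra > budget:
                continue             # not beneficial, or does not fit the budget
            ratio = saved / extra if extra > 0 else float("inf")
            if best is None or ratio > best_ratio:
                best, best_ratio = node, ratio
        if best is None:
            return frontier
        extra, _ = push_down_effect(best)
        frontier.remove(best)
        frontier.extend(best.children)
        used += extra


# Hypothetical estimates for a small tree.
n4, n5 = TreeNode("N4", 5, 20), TreeNode("N5", 5, 15)
n6, n7 = TreeNode("N6", 4, 10), TreeNode("N7", 4, 12)
n2 = TreeNode("N2", 7, 30, [n4, n5])   # pushing down here would cost time
n3 = TreeNode("N3", 6, 40, [n6, n7])   # pushing down here saves time
n1 = TreeNode("N1", 10, 100, [n2, n3])

chosen = [n.name for n in choose_index_nodes(n1, budget=15)]
```

In this made-up tree the index is pushed from N1 to its children and then from N3 to N6 and N7, while N2 keeps its index because splitting it is estimated to increase query time. Lowering the budget to 14 stops after the first push-down.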
22
Algorithm 3: Exploiting Frequency Distribution of Keywords
Figure labels: Spatial-Approximate Nodes, Spatial-Keyword Nodes
23
Experiments
Settings:
  Four-core Intel Xeon E5520 2.26 GHz
  12 GB of RAM
  Ubuntu OS
  C++ implementation
  LBAK-tree in main memory
  Keyword-frequency threshold = 1
  R*-tree fanout = 40
24
Experiments
Datasets:
  CoPhIR Test Collection (CoPhIR): 3.75 million objects, raw data size 500 MB
  Business listings (Business): 20.4 million business listings in the U.S., raw data size 4 GB
Queries:
  10,000 queries for each dataset
  30km-by-30km query window around a randomly selected object
  Two randomly chosen keywords of that object
  Normalized edit-distance of 0.8
25
Terminology
FL: fixed-level approach, e.g., “FL-0”: approximate indexes are at the root level
VL: variable-level approach
VLF: variable-level approach exploiting keyword frequencies
26
Comparison with MHR-Tree*
Maximum recall for MHR-Tree that we achieved is around 50%
LBAK-Tree recall is 100%
* B. Yao, F. Li, M. Hadjieleftheriou, and K. Hou. Approximate string search in spatial databases. In ICDE, 2010.
27
Index Size & Query Time
Business Listings
28
Scalability: Query Time vs. VLF
Used space budget: minimum index size for VLF to achieve best query time
Business Listings
29
Conclusion
Spatial index + Approximate index = LBAK-tree
1) Simple fixed-level solution
2) Utilizing local spatial distribution of objects
3) Exploiting frequency distribution of keywords
30
Thank You!
This work is part of The Flamingo Project
Source Code: http://flamingo.ics.uci.edu
Live Demo: http://flamingo.ics.uci.edu/localsearch/fuzzysearch/