1
Ad-hoc Distributed Spatial Joins on Mobile Devices
Panos Kalnis, Xiaochen Li (National University of Singapore)
Nikos Mamoulis (The University of Hong Kong)
Spiridon Bakiras (Hong Kong University of Science and Technology)
2
Motivation
Users are equipped with a mobile device (e.g., a PDA)
Ad-hoc spatial queries combine data from remote servers (e.g., Hotels, Restaurants)
Example: “Find hotels which are within 500m of a seafood restaurant”
Servers do not collaborate with each other
The query is executed on the mobile device
3
Mediators?
Services may only allow end-user connections (e.g., subscribers only)
Access through mediators may be more expensive
Requests are ad-hoc; existing mediators may not support them
[Figure: Hotels and Restaurants servers accessed through a Mediator]
4
Cost
Telecommunication companies typically charge by the volume of transferred data (e.g., GPRS) rather than by connection time.
Goal: Minimize the amount of transferred data.
5
Solution
Ask aggregate queries to estimate the data distribution (i.e., statistics)
Partition the space recursively to achieve sub-linear transfer cost
Choose the physical operator independently for each partition
6
Related Work
Hash-based methods (e.g., PBSM): require all data to be transferred
R-tree based methods (e.g., [Tan et al., TKDE, 2000]): require access to the internal index
Mediators:
  HERMES: statistics from previous queries
  DISCO, Garlic: statistics during initialization
  Tukwila: optimize parts of the execution tree
7
Operators
WINDOW query: return all objects intersecting a window w
COUNT query: return the number of objects intersecting w
ε-RANGE query: return all objects within range ε from a point p
NO access to the internal indices!
[Figure: a window w, and a point p with range ε]
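The three operators can be sketched as a small interface. This is only an illustrative in-memory stand-in, assuming point objects stored as (x, y) tuples and windows given as (xmin, ymin, xmax, ymax); the class name `SpatialServer` and its method names are hypothetical and do not come from the slides.

```python
from math import hypot

class SpatialServer:
    """Hypothetical stand-in for a remote server that holds point objects."""

    def __init__(self, points):
        self.points = list(points)

    def window(self, w):
        """WINDOW query: all objects intersecting window w."""
        xmin, ymin, xmax, ymax = w
        return [(x, y) for (x, y) in self.points
                if xmin <= x <= xmax and ymin <= y <= ymax]

    def count(self, w):
        """COUNT query: number of objects intersecting w (an aggregate only)."""
        return len(self.window(w))

    def eps_range(self, p, eps):
        """eps-RANGE query: all objects within distance eps of point p."""
        px, py = p
        return [(x, y) for (x, y) in self.points
                if hypot(x - px, y - py) <= eps]

hotels = SpatialServer([(0, 0), (5, 5), (9, 9)])
in_window = hotels.count((0, 0, 5, 5))
near_origin = hotels.eps_range((0, 0), 1)
```

A real server would answer these calls over the network; the client sees only the results, never the index behind them.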
8
Query Types
Intersection Join: Find hotels which are inside parks
ε-range Join: Find restaurants which are within 500m of a hotel
Iceberg Semi-join: Find hotels which are close to at least 3 restaurants
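To make the semantics of two of these query types concrete, here is a brute-force sketch over in-memory point lists. It assumes Euclidean distance and (x, y) tuples; the function names and the threshold parameter `m` are illustrative, and the code says nothing about how the distributed algorithms evaluate these joins.

```python
from math import hypot

def eps_range_join(R, S, eps):
    """All pairs (r, s) whose distance is at most eps."""
    return [(r, s) for r in R for s in S
            if hypot(r[0] - s[0], r[1] - s[1]) <= eps]

def iceberg_semijoin(R, S, eps, m):
    """Objects of R that have at least m objects of S within distance eps."""
    result = []
    for r in R:
        close = sum(1 for s in S if hypot(r[0] - s[0], r[1] - s[1]) <= eps)
        if close >= m:
            result.append(r)
    return result

hotels = [(0, 0), (10, 10)]
restaurants = [(0, 1), (1, 0), (0.5, 0.5), (10, 11)]
pairs = eps_range_join(hotels, restaurants, 1.5)
popular = iceberg_semijoin(hotels, restaurants, 1.5, 3)
```

The semi-join returns only objects from the first dataset, which is why it can have a much smaller result than the corresponding ε-range join.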
9
Hash Based Spatial Join (HBSJ)
Each partition must fit in memory
10
Recursive evaluation
Retrieve statistics for each subpart
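The recursive evaluation can be sketched as follows: count the objects of both datasets in a window, and split the window into four quadrants whenever the two sets together exceed the device's capacity. This is a simplified illustration, not the authors' implementation. It assumes point data, an intersection-style join (so a window where either side is empty can be skipped), half-open windows to avoid double counting, and a hypothetical `capacity` expressed in objects; `count` stands in for a remote COUNT query.

```python
def quadrants(w):
    """Split window (xmin, ymin, xmax, ymax) into four equal quadrants."""
    xmin, ymin, xmax, ymax = w
    xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
    return [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
            (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]

def count(points, w):
    """Stand-in for a COUNT query; windows are half-open [min, max)."""
    xmin, ymin, xmax, ymax = w
    return sum(1 for (x, y) in points if xmin <= x < xmax and ymin <= y < ymax)

def partitions_to_join(R, S, w, capacity, max_depth=8):
    """Return (window, |R_w|, |S_w|) for every partition that needs joining."""
    n_r, n_s = count(R, w), count(S, w)
    if n_r == 0 or n_s == 0:
        return []                      # nothing can join here: prune
    if n_r + n_s <= capacity or max_depth == 0:
        return [(w, n_r, n_s)]         # fits in memory: join on the device
    parts = []
    for q in quadrants(w):             # too large: retrieve finer statistics
        parts.extend(partitions_to_join(R, S, q, capacity, max_depth - 1))
    return parts

R = [(1, 1), (2, 2), (9, 9)]
S = [(1.5, 1.5), (8, 8), (8.5, 8.5)]
parts = partitions_to_join(R, S, (0, 0, 16, 16), capacity=3)
```

An ε-range join would additionally need each window extended by ε before the data is fetched, which this sketch leaves out.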
11
Inefficient HBSJ
12
Nested Loop Spatial Join (NLSJ)
Recursive HBSJ: 4 QRY + 2 RCV + 5 RCV
NLSJ: 2 RCV + 2 SND + 2 RES
13
Inefficient NLSJ
14
Cost Model
TCP/IP: MTU = MSS + B_H
c1: download |R_w| objects from R and |S_w| objects from S and join them on the PDA
c2, c3: download |R_w| objects from R, send them as window queries to S and retrieve the results
c4: repartition w, retrieve detailed statistics and apply the algorithm recursively
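A rough way to turn these cases into byte counts is to charge each payload its own size plus one header per packet, using MTU = MSS + B_H. The sketch below is an assumption-laden illustration rather than the cost model of the talk: the header size (40 bytes, i.e. 20 for TCP plus 20 for IPv4 without options), the 1500-byte MTU, the object and query sizes, and the per-probe match estimate are all placeholder values, and the function names are hypothetical.

```python
from math import ceil

B_H = 40            # assumed header bytes per packet (TCP 20 + IPv4 20, no options)
MTU = 1500          # assumed link MTU
MSS = MTU - B_H     # follows from MTU = MSS + B_H

def transfer_bytes(payload):
    """Bytes on the wire for a payload: the data plus one header per packet."""
    if payload <= 0:
        return 0
    return payload + ceil(payload / MSS) * B_H

def cost_hbsj(n_r, n_s, obj_bytes=16, query_bytes=32):
    """c1: send two window queries, download both object sets, join locally."""
    return (2 * transfer_bytes(query_bytes)
            + transfer_bytes(n_r * obj_bytes)
            + transfer_bytes(n_s * obj_bytes))

def cost_nlsj(n_outer, matches_per_probe, obj_bytes=16, query_bytes=32):
    """c2/c3: download the outer set, then probe the other server per object."""
    download = transfer_bytes(query_bytes) + transfer_bytes(n_outer * obj_bytes)
    per_probe = (transfer_bytes(query_bytes)
                 + transfer_bytes(matches_per_probe * obj_bytes))
    return download + n_outer * per_probe

hbsj = cost_hbsj(n_r=100, n_s=5000)
nlsj = cost_nlsj(n_outer=100, matches_per_probe=2)
```

With these placeholder numbers, probing from the small side is cheaper than downloading the large dataset, which is the kind of trade-off the cases c1 to c3 are meant to capture.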
15
UpJoin (Uniform Partition Join)
Decide if datasets are uniform
If HBSJ is cheaper and both datasets are uniform then perform HBSJ
If NLSJ is cheaper and the largest dataset is uniform then perform NLSJ
Else repartition
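The decision rule above can be written as a single function. This is a minimal sketch assuming the two cost estimates and the uniformity flags have already been computed elsewhere; the function name, parameter names, and string return values are illustrative only.

```python
def upjoin_step(cost_hbsj, cost_nlsj, r_uniform, s_uniform, n_r, n_s):
    """Pick the operator for one partition, or ask for a repartition."""
    if cost_hbsj <= cost_nlsj:
        # HBSJ is cheaper: both datasets must look uniform in this window
        if r_uniform and s_uniform:
            return "HBSJ"
    else:
        # NLSJ is cheaper: only the larger dataset must look uniform
        larger_uniform = r_uniform if n_r >= n_s else s_uniform
        if larger_uniform:
            return "NLSJ"
    return "REPARTITION"

choice = upjoin_step(cost_hbsj=300, cost_nlsj=200,
                     r_uniform=False, s_uniform=True, n_r=10, n_s=20)
```

Ties between the two costs are resolved in favor of HBSJ here, which is an arbitrary choice made for the sketch.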
16
Uniformity check
% variation from uniform distribution
[Figure: window w (D_w) split into four quadrants (D_w'0, D_w'1, D_w'2, D_w'3)]
Note: UpJoin will not repartition if the cost for retrieving statistics is larger than the cost of joining
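One plausible reading of the check is to compare each quadrant's count with the count expected under a uniform distribution and to accept the window as uniform when no quadrant deviates by more than a threshold. The sketch below assumes that reading; the exact formula is not given on the slide, and the function name and the use of a fractional threshold `alpha` are assumptions.

```python
def is_uniform(total, quadrant_counts, alpha):
    """True if every quadrant is within a fraction alpha of the uniform share."""
    if total == 0:
        return True                     # an empty window is trivially uniform
    expected = total / len(quadrant_counts)
    return all(abs(c - expected) / expected <= alpha for c in quadrant_counts)

balanced = is_uniform(100, [30, 20, 25, 25], alpha=0.25)
skewed = is_uniform(100, [70, 10, 10, 10], alpha=0.25)
```

The counts would come from four COUNT queries per dataset, so the check itself costs some transfer; that is the statistics cost the note on this slide refers to.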
17
Inefficient UpJoin
18
SR-Join (Similarity Related Join)
Identify dense and sparse quadrants
If the distribution is similar then apply HBSJ or NLSJ
Else repartition
[Figure labels: Area; % variation of density]
19
Experimental setup
Implementation:
  Server: Unix
  Client: HP iPAQ PDA (WiFi network, 400MHz RISC CPU, 64MB RAM, Windows Pocket PC)
Datasets:
  Synthetic: 1K to 10K points, varying skew
  Real: Roads and railways of Germany
20
Setting the parameters
[Figure: α (for UpJoin) and ρ (for SR-Join); label: Uniform]
21
Real Dataset
[Figure; label: Uniform]
22
Comparison with SemiJoin
SemiJoin: uses intermediate levels of the R-tree index
It cannot be used in practice, because the client cannot access the index
[Figure; label: Uniform]
23
Conclusions
Distributed spatial joins on mobile devices
No mediator, non-collaborative servers, limited set of supported operators
Two algorithms:
  UpJoin
  SR-Join
Both estimate the datasets’ distribution
Future work:
  Support multi-way spatial joins
  Improve the accuracy of the cost model
24
Questions?