Ad-hoc Distributed Spatial Joins on Mobile Devices Panos Kalnis, Xiaochen Li National University of Singapore Nikos Mamoulis The University of Hong Kong.

Ad-hoc Distributed Spatial Joins on Mobile Devices
Panos Kalnis, Xiaochen Li (National University of Singapore)
Nikos Mamoulis (The University of Hong Kong)
Spiridon Bakiras (Hong Kong University of Science and Technology)

Motivation  Users are equipped with a mobile device (eg. PDA)  Ad-hoc spatial queries  Combine data from remote servers Hotels Restaurants “Find hotels which are within 500m of a seafood restaurant”  Servers do not collaborate with each other  The query is executed on the mobile device

Mediators?  Services may only allow end-user connections (eg., subscribers only)  Access through mediators may be more expensive  Requests are ad-hoc; existing mediators may not support them Hotels Restaurants Mediator

Cost  Telecommunication companies typically charge by the bulk of transferred data (eg. GPRS), instead of connection time.  Goal: Minimize the amount of transferred data.

Solution  Ask aggregate queries to estimate the data distribution (i.e., statistics)  Partition the space recursively to achieve sub-linear transfer cost  Choose the physical operator indepen- dently for each partition

Related Work  Hash-based methods (eg. PBSM): require all data to be transferred  R-tree based methods (eg., [Tan et.al, TKDE, 2000]): require access to internal index  Mediators : HERMES : Statistics from previous queries DISCO, Garlic : Statistics during initialization Tuckila : Optimize parts of the execution tree

Operators  WINDOW query: return all objects intersecting a window w  COUNT query: return the number of objects intersecting w  ε-RANGE query: return all objects within range ε from a point p NO access to the internal indices! ε w p

Query Types  Intersection Join Find hotels which are inside parks  E-range Join Find restaurants which are within 500m of a hotel  Iceberg Semi-join Find hotels which are close to at least 3 restaurants ε

Hash-Based Spatial Join (HBSJ)
- Each partition must fit in memory.
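Once both partitions of a window are on the PDA, they are joined in memory. A common way to do this for an ε-range join on points is a grid hash: bucket S into cells of side ε and probe each r only against its own cell and the 8 neighbouring cells. This is a generic spatial hash join sketch under that assumption, not necessarily the exact in-memory algorithm used by the authors.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def hash_based_spatial_join(R: List[Point], S: List[Point], eps: float) -> List[Tuple[Point, Point]]:
    """In-memory ε-range join of two downloaded partitions using a uniform grid hash."""
    if eps <= 0:
        raise ValueError("eps must be positive")

    def cell(p: Point) -> Tuple[int, int]:
        return (math.floor(p[0] / eps), math.floor(p[1] / eps))

    # Build phase: hash every S object into its grid cell (cell side = eps).
    buckets: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for s in S:
        buckets[cell(s)].append(s)

    # Probe phase: any s within eps of r lies in r's cell or one of its 8 neighbours,
    # because coordinates differing by at most eps fall in cells at most 1 apart.
    result = []
    for r in R:
        cx, cy = cell(r)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for s in buckets.get((cx + dx, cy + dy), ()):
                    if math.dist(r, s) <= eps:
                        result.append((r, s))
    return result
```

Both partitions must be held in memory at once, which is why the slide requires each partition to fit in the PDA's memory.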

Recursive evaluation
- Retrieve statistics for each subpart.
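A minimal sketch of gathering statistics for each subpart, assuming a window is split into four equal quadrants (as the quadrant figures on the later slides suggest) and that the statistics are COUNT results from each server. The function names and the callable-based interface are hypothetical.

```python
from typing import Callable, List, Tuple

Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def quadrants(w: Window) -> List[Window]:
    """Split window w into four equal sub-windows w'0..w'3."""
    xmin, ymin, xmax, ymax = w
    xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
    return [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
            (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]


def subpart_statistics(w: Window,
                       count_r: Callable[[Window], int],
                       count_s: Callable[[Window], int]) -> List[Tuple[Window, int, int]]:
    """Issue one COUNT query per quadrant to each server; return (sub-window, |R|, |S|).

    Note: with closed windows, a point lying exactly on a split line is counted
    in both neighbouring quadrants; a real implementation would use half-open
    windows to avoid double counting.
    """
    return [(q, count_r(q), count_s(q)) for q in quadrants(w)]
```

Each level of recursion costs 4 COUNT queries per server, which is the statistics overhead the cost model has to weigh against simply joining.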

Inefficient HBSJ

Nested Loop Spatial Join (NLSJ)
- Recursive HBSJ: 4 QRY + 2 RCV + 5 RCV
- NLSJ: 2 RCV + 2 SND + 2 RES
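A sketch of NLSJ as driven from the PDA: the objects of R in the window are downloaded, each one is sent to server S as a query, and only the matching S objects come back. It assumes point data and uses S's ε-RANGE operator for the inner queries; `local_eps_range` is a hypothetical stand-in for server S used for local testing.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def local_eps_range(S: List[Point]) -> Callable[[Point, float], List[Point]]:
    """Hypothetical stand-in for server S's ε-RANGE operator (for local testing)."""
    return lambda p, eps: [q for q in S if math.dist(p, q) <= eps]


def nested_loop_spatial_join(R_w: List[Point],
                             eps_range_s: Callable[[Point, float], List[Point]],
                             eps: float) -> Tuple[List[Tuple[Point, Point]], int]:
    """Nested-loop ε-range join; returns the result and the number of queries sent to S."""
    result = []
    queries_sent = 0
    for r in R_w:                        # outer loop: objects already downloaded from R
        queries_sent += 1                # one request uploaded to S per outer object
        for s in eps_range_s(r, eps):    # only matching S objects are downloaded
            result.append((r, s))
    return result, queries_sent
```

NLSJ pays for one uploaded query per outer object but never downloads non-matching S objects, so it tends to win when the outer side is small and the inner side is large.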

Inefficient NLSJ

Cost Model  TCP/IP: MTU = MSS + B H  c1: download |R W | objects from R and |S w | objects from S and join them on the PDA  C2,3: download |R W | objects from R, send them as window queries to S and retrieve the results  c4: repartition w, retrieve detailed statistics and apply the algorithm recursively

UpJoin (Uniform Partition Join)
- Decide whether the datasets are uniform.
- If HBSJ is cheaper and both datasets are uniform, perform HBSJ.
- If NLSJ is cheaper and the larger dataset is uniform, perform NLSJ.
- Otherwise, repartition.
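The UpJoin rules can be sketched as a decision function for one window. The costs and uniformity flags are taken as inputs (they would come from the cost model and the uniformity check), tie-breaking toward HBSJ is an assumption, and the fallback when statistics are too expensive follows the note on the uniformity-check slide. `Action` and `upjoin_decide` are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    HBSJ = "hash-based spatial join"
    NLSJ = "nested loop spatial join"
    REPARTITION = "repartition"


def upjoin_decide(cost_hbsj: float, cost_nlsj: float,
                  r_uniform: bool, s_uniform: bool,
                  n_r: int, n_s: int,
                  cost_stats: float) -> Action:
    """Choose the physical operator for one window following the UpJoin rules."""
    larger_uniform = r_uniform if n_r >= n_s else s_uniform
    if cost_hbsj <= cost_nlsj and r_uniform and s_uniform:
        return Action.HBSJ
    if cost_nlsj < cost_hbsj and larger_uniform:
        return Action.NLSJ
    # Do not repartition when retrieving detailed statistics would cost more
    # than joining right away with the cheaper operator.
    if cost_stats > min(cost_hbsj, cost_nlsj):
        return Action.HBSJ if cost_hbsj <= cost_nlsj else Action.NLSJ
    return Action.REPARTITION
```

Repartitioning is only chosen when the cheaper operator's uniformity precondition fails and further statistics are still affordable.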

Uniformity check
- Compare the density $D_w$ of window w with the densities $D_{w'_0}, \dots, D_{w'_3}$ of its four quadrants (% variation from the uniform distribution).
- Note: UpJoin will not repartition if the cost of retrieving statistics is larger than the cost of joining.
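One plausible reading of the uniformity check as code: the window is uniform if no quadrant's density deviates from $D_w$ by more than a relative tolerance. Using α as that tolerance is an assumption (the slides only name α as the UpJoin parameter), and `is_uniform` is a hypothetical name.

```python
from typing import Sequence


def is_uniform(quadrant_counts: Sequence[int], alpha: float) -> bool:
    """Uniformity check for window w from the COUNT results of its four quadrants.

    The quadrants have equal area (a quarter of w), so the relative variation
    |D_w'i - D_w| / D_w equals |c_i - total/4| / (total/4).
    The window is treated as uniform if no quadrant deviates by more than alpha.
    """
    if len(quadrant_counts) != 4:
        raise ValueError("expected four quadrant counts")
    expected = sum(quadrant_counts) / 4
    if expected == 0:
        return True  # empty window: there is no skew to detect
    return all(abs(c - expected) / expected <= alpha for c in quadrant_counts)
```

Because the quadrants have equal area, the densities never need to be computed explicitly; the four COUNT results are enough.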

Inefficient UpJoin

SR-Join (Similarity Related Join)
- Identify dense and sparse quadrants (% variation of density per area).
- If the distributions are similar, apply HBSJ or NLSJ.
- Otherwise, repartition.
(Figure: quadrants of the window, some marked X.)
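A sketch of the similarity test, under the assumption that "similar" means both datasets have the same dense and sparse quadrants, and that ρ is the relative deviation that makes a quadrant dense or sparse. Both the interpretation of ρ and the function names are hypothetical.

```python
from typing import List, Sequence


def density_pattern(quadrant_counts: Sequence[int], rho: float) -> List[str]:
    """Label each quadrant as 'dense', 'sparse' or 'average' relative to the window.

    A quadrant whose count deviates from the window average (total / 4) by
    more than rho (as a fraction) is dense or sparse; otherwise it is average.
    """
    expected = sum(quadrant_counts) / 4
    labels = []
    for c in quadrant_counts:
        if expected == 0 or abs(c - expected) / expected <= rho:
            labels.append("average")
        elif c > expected:
            labels.append("dense")
        else:
            labels.append("sparse")
    return labels


def similar_distributions(r_counts: Sequence[int], s_counts: Sequence[int], rho: float) -> bool:
    """Treat R and S as similarly distributed in w if their dense/sparse quadrants coincide."""
    return density_pattern(r_counts, rho) == density_pattern(s_counts, rho)
```

Unlike the UpJoin check, this does not require either dataset to be uniform: two equally skewed datasets count as similar.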

Experimental setup  Implementation Server: Unix Client: HP-Ipaq PDA (WiFi network, 400MHz RISC CPU, 64MB RAM, Windows Pocket PC)  Datasets: Synthetic: 1K – 10K points, varying skew Real: Roads and railways of Germany

Setting the parameters: α (for UpJoin) and ρ (for SR-Join)
(Figure; label: Uniform)

Real Dataset
(Figure; label: Uniform)

Comparison with SemiJoin
- SemiJoin: uses the intermediate levels of the R-tree index.
- We cannot use it in practice, because we cannot access the index.
(Figure; label: Uniform)

Conclusions  Distributed spatial joins on mobile devices  No mediator – non collaborative servers – limited set of supported operators  Two algorithms UpJoin SRJoin Both estimate the datasets’ distribution  Future work Support multi-way spatial joins Improve the accuracy of the cost model

Questions?