A Rule-Based Optimizer for Spatial Join Operations Miguel Fornari João Luiz Comba Cirano Iochpe Instituto de Informática Universidade Federal do Rio Grande do Sul Porto Alegre - Brazil

Outline: 1. Introduction and motivation 2. Spatial Hash Algorithms 3. The Validation System Architecture 4. The Rules 5. Conclusions

3 Introduction and Motivation: The spatial join operation is fundamental and expensive in GIS. It combines two sets of spatial features based on a spatial predicate. A DBMS traditionally improves performance using heuristic rules and cost expressions. A spatial DBMS can include a module dedicated to spatial operations.

4 Goal: Reduce the response time of spatial join algorithms in the filter step, using a set of rules that optimize the performance of some well-known algorithms. Which parameters are relevant? What is the best value for each important parameter?

5 SJ Algorithms: According to the file organization

6 SJ Algorithms: For each algorithm, two cost expressions are important: the I/O cost and the CPU cost. Some costs are already known, but not all. All expressions are written in a similar notation.

7 The System Architecture: The performance analysis, although correct, simplifies many cases. Real cases are more complex. Real data sets were obtained from the Internet.

8 Plane-sweep algorithm: All SJ algorithms load objects into memory and run a plane-sweep algorithm to check whether pairs of objects satisfy the spatial predicate. The traditional performance bound is O(k + n log n), where k is the number of object intersections. However, O(c + n log n), where c is the number of comparisons performed, is more exact.
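The difference between k and c can be made concrete with a minimal Python sketch of a plane sweep over bounding rectangles. This is an illustration, not the authors' implementation: the rectangle layout `(xmin, ymin, xmax, ymax)`, the function name `plane_sweep_join`, and the choice of x as the sweep axis are all assumptions.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def plane_sweep_join(r: List[Rect], s: List[Rect]):
    """Sweep along x; return (pairs, c) where c counts the comparisons performed."""
    # One event per rectangle, ordered by its left edge.
    events = sorted(
        [(rect[0], 0, i) for i, rect in enumerate(r)]
        + [(rect[0], 1, j) for j, rect in enumerate(s)]
    )
    sets = (r, s)
    active = ([], [])  # indices of rectangles still crossed by the sweep line
    pairs = []
    c = 0
    for x, side, idx in events:
        rect = sets[side][idx]
        other = 1 - side
        # Drop rectangles of the other set that end before the sweep line.
        active[other][:] = [k for k in active[other] if sets[other][k][2] >= x]
        for k in active[other]:
            c += 1  # every test on the other axis counts as one comparison
            o = sets[other][k]
            if rect[1] <= o[3] and o[1] <= rect[3]:
                pairs.append((idx, k) if side == 0 else (k, idx))
        active[side].append(idx)
    return sorted(pairs), c
```

For two rectangles in the first set and three in the second, such a sweep can report k = 2 intersecting pairs while performing c = 3 comparisons, because one pair overlaps on x but not on y.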

9 Plane-sweep algorithm: Divide the space into strips. Count the number of objects in each strip. The size of the strips is the average size of the objects. Estimate of c = sum of all values.

10 Rule 1 - Plane-sweep: The DBMS can estimate c for each axis and choose the one with the smaller value of c, optimizing the plane sweep. The shape of the objects affects the response time.
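A rough Python sketch of Rule 1 follows. The slides only say that objects are counted per strip and the values are summed; the sketch additionally assumes that the value of a strip is the product of the two sets' counts in that strip, and that an object is counted in every strip it overlaps. The names `estimate_comparisons` and `choose_sweep_axis` are hypothetical.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def estimate_comparisons(r: List[Rect], s: List[Rect], axis: int) -> int:
    """Estimate c for a sweep along `axis` (0 = x, 1 = y)."""
    objs = r + s
    start = min(o[axis] for o in objs)
    end = max(o[axis + 2] for o in objs)
    # Strip size = average object extent along the axis (as on the slide).
    avg = sum(o[axis + 2] - o[axis] for o in objs) / len(objs)
    width = avg if avg > 0 else ((end - start) or 1.0)
    n_strips = max(1, math.ceil((end - start) / width))

    def histogram(rects):
        counts = [0] * n_strips
        for o in rects:
            first = min(n_strips - 1, int((o[axis] - start) / width))
            last = min(n_strips - 1, int((o[axis + 2] - start) / width))
            for i in range(first, last + 1):
                counts[i] += 1
        return counts

    # Assumption: a strip contributes (count in R) * (count in S).
    return sum(a * b for a, b in zip(histogram(r), histogram(s)))

def choose_sweep_axis(r: List[Rect], s: List[Rect]):
    """Return (axis, c_x, c_y), picking the axis with the smaller estimate."""
    cx = estimate_comparisons(r, s, 0)
    cy = estimate_comparisons(r, s, 1)
    return (0 if cx <= cy else 1), cx, cy
```

With objects stacked in a single column (same x range, spread along y), the estimate for x is much larger than for y, so the sketch picks y as the sweep axis.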

11 Synchronized Tree Traversal (STT): A well-known algorithm for R-Trees. The performance depends on the height of the R-Trees and the average size of the nodes. The space division reduces the number of object comparisons (c). Available memory is not important.
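The core idea of a synchronized traversal can be sketched in a few lines of Python. This is a simplified, in-memory illustration that ignores paging and buffering; the `Node` class, its fields, and the function names are hypothetical stand-ins for a real R-Tree implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

@dataclass
class Node:
    mbr: Rect                                   # bounding rectangle of the subtree
    children: List["Node"] = field(default_factory=list)
    obj_id: Optional[int] = None                # set only on data entries

def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def stt_join(a: Node, b: Node) -> List[Tuple[int, int]]:
    """Descend both trees together, pruning pairs whose MBRs are disjoint."""
    out: List[Tuple[int, int]] = []

    def visit(na: Node, nb: Node) -> None:
        if not intersects(na.mbr, nb.mbr):
            return  # nothing below these two nodes can intersect
        if na.obj_id is not None and nb.obj_id is not None:
            out.append((na.obj_id, nb.obj_id))
            return
        # If one side is already a data entry, only the other side descends.
        a_kids = [na] if na.obj_id is not None else na.children
        b_kids = [nb] if nb.obj_id is not None else nb.children
        for ca in a_kids:
            for cb in b_kids:
                visit(ca, cb)

    visit(a, b)
    return sorted(out)
```

The nested loop over child entries is where the number of entries per node matters: fewer entries per node means fewer entry pairs tested at each level, at the price of more nodes overall.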

12 Rule 2 - STT: The STT algorithm is optimized by defining nodes with a low number of entries. However, the total number of nodes then grows, which sets a lower limit for the rule. Optimal value: between 50 and 75 entries per node.

13 Rule 2 - STT: The performance of the STT algorithm stays constant as the memory buffer size increases, except for very small values. Set any value greater than 4 * height of the R-Trees.

14 Iterative Stripped SJ (ISSJ): Iterative SJ (Jacox & Samet) + strips. Strips divide the space and reduce c. Objects that cross strip boundaries (transpassant objects) can occur. The sorting can be either internal or external. The performance depends on the available memory, the number of strips, and the number of replicas.

15 Rule 3 - ISSJ: The ISSJ algorithm is optimized by defining a large number of strips. The number of objects in each strip will then be small, but the rule is limited by the addition of replicas.

16 Rule 3 - ISSJ: It is important to allocate enough memory to perform an internal sort of each set.
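The trade-off behind Rule 3 (smaller strips versus more replicas) can be illustrated with a short Python sketch. It assumes equal-width vertical strips and treats every extra strip an object overlaps as one replica; the function name `assign_to_strips` and its parameters are hypothetical, and the slides do not specify how ISSJ actually places its strip boundaries.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def assign_to_strips(rects: List[Rect], n_strips: int, xmin: float, xmax: float):
    """Return (strips, replicas): object indices per strip and the replica count."""
    width = (xmax - xmin) / n_strips
    strips: List[List[int]] = [[] for _ in range(n_strips)]
    replicas = 0
    for idx, rect in enumerate(rects):
        first = min(n_strips - 1, max(0, int((rect[0] - xmin) / width)))
        last = min(n_strips - 1, max(0, int((rect[2] - xmin) / width)))
        for i in range(first, last + 1):
            strips[i].append(idx)
        replicas += last - first  # one replica per additional strip overlapped
    return strips, replicas
```

On three sample rectangles over the range 0 to 8, one strip yields no replicas, two strips yield one, and eight strips yield four: each strip holds fewer objects, but the total number of processed objects grows.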

17 Partition Based Spatial-Merge Join (PBSM): Calculates the number of partitions. Uses a regular grid to divide the space. Partitions can overflow. The performance depends on the number of replicas and the number of overflowed partitions. The space subdivision reduces the number of object comparisons (c).

18 Rule 4 - PBSM: PBSM is improved by setting a high value for the number of partitions, either by using a small memory size or by simply setting a lower bound on the number of partitions.

19 Rule 4 - PBSM: This rule is limited by the number of replicas, which increase the number of processed objects.
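A minimal Python sketch of grid partitioning shows where replicas and overflowed partitions come from. It assumes that each grid cell is its own partition and that a partition overflows when it holds more objects than a given capacity; PBSM implementations may map cells to partitions differently, and the names `grid_partition` and `overflowed` are hypothetical.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def grid_partition(rects: List[Rect], nx: int, ny: int, extent: Rect):
    """Assign each rectangle to every grid cell it overlaps."""
    x0, y0, x1, y1 = extent
    cw, ch = (x1 - x0) / nx, (y1 - y0) / ny

    def cell(value: float, origin: float, size: float, n: int) -> int:
        return min(n - 1, max(0, int((value - origin) / size)))

    cells: Dict[Tuple[int, int], List[int]] = {}
    for idx, (xmin, ymin, xmax, ymax) in enumerate(rects):
        for cx in range(cell(xmin, x0, cw, nx), cell(xmax, x0, cw, nx) + 1):
            for cy in range(cell(ymin, y0, ch, ny), cell(ymax, y0, ch, ny) + 1):
                cells.setdefault((cx, cy), []).append(idx)
    # Every assignment beyond the first one per object is a replica.
    replicas = sum(len(members) for members in cells.values()) - len(rects)
    return cells, replicas

def overflowed(cells: Dict[Tuple[int, int], List[int]], capacity: int):
    """Cells holding more objects than fit in memory (capacity is an assumption)."""
    return sorted(c for c, members in cells.items() if len(members) > capacity)
```

A rectangle that straddles the center of a 2 x 2 grid is stored in all four cells, adding three replicas; a finer grid reduces the chance of overflow but multiplies such cases.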

20 Histogram Hash Stripped Join (HHSJ): A histogram of the object distribution guides the space partitioning to avoid overflow. Replicas are counted in the histogram. The objects are kept in a hash file and are loaded into memory only once. The performance is affected by the number of replicas and the space subdivision.

21 Rule 5 - HHSJ: HHSJ is improved by setting a large value for the number of partitions and for the number of strips.

22 Rule 5 - HHSJ: This rule is also limited by the number of replicas, which increase the number of processed objects.

23 Conclusions & Future Work: Our main contribution: the use of rules can reduce the response time of individual algorithms, in some cases by more than 50%. The rules can be incorporated into real GDBMS. Future work: use 3D sets to perform the tests; include other spatial operations; implement in PostGIS.

24 Contact & questions Miguel Fornari João Comba Cirano Iochpe