CSIS 7101: Spatial Data (Part 2) Efficient Processing of Spatial Joins Using R-trees Rollo Chan Chu Chung Man Mak Wai Yip Vivian Lee Eric.

Presentation transcript:

1 CSIS 7101: Spatial Data (Part 2)
Efficient Processing of Spatial Joins Using R-trees
Rollo Chan Chu Chung Man Mak Wai Yip Vivian Lee Eric Lo Sindy Shou Hugh Wang

2 Efficient Processing of Spatial Joins Using R-trees
What is spatial data?
  Consists of points, lines, rectangles, polygons, surfaces, ...
Two types of queries in a DBS:
  Single-scan and multiple-scan queries
How can spatial objects in a GIS be retrieved efficiently?
  With a Spatial Access Method (SAM), e.g. the R*-tree

3 What is a Spatial Access Method?
Designed to support single-scan queries
  e.g. the window query: “Find all objects which intersect a given window”
Attempts to store objects which are close together in the data space on a common page
  Reduces the number of disk accesses

4 How is a window query processed by a SAM?
1) Filter step
  Find all objects whose minimum bounding rectangles intersect the query rectangle
2) Refinement step
  Check whether the candidate objects themselves fulfill the query condition
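The two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the objects are circles given as `(cx, cy, r)` tuples, and `Rect`, `circle_mbr`, `circle_hits_window` and `window_query` are hypothetical names, not taken from the presentation. A linear scan stands in for the index lookup that a real SAM would perform in the filter step.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    xl: float  # lower x
    yl: float  # lower y
    xu: float  # upper x
    yu: float  # upper y


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-parallel rectangles share at least one point."""
    return a.xl <= b.xu and b.xl <= a.xu and a.yl <= b.yu and b.yl <= a.yu


def circle_mbr(circle):
    """Minimum bounding rectangle of a circle (cx, cy, r)."""
    cx, cy, r = circle
    return Rect(cx - r, cy - r, cx + r, cy + r)


def circle_hits_window(circle, window):
    """Exact test: does the circle itself touch the window?"""
    cx, cy, r = circle
    # Closest point of the window to the circle centre.
    nx = min(max(cx, window.xl), window.xu)
    ny = min(max(cy, window.yl), window.yu)
    return (cx - nx) ** 2 + (cy - ny) ** 2 <= r * r


def window_query(circles, window):
    # Filter step: cheap MBR test, may keep false hits.
    candidates = [c for c in circles if intersects(circle_mbr(c), window)]
    # Refinement step: exact geometry test on the candidates only.
    return [c for c in candidates if circle_hits_window(c, window)]


circles = [(0.5, 0.5, 0.1), (2.0, 2.0, 1.2), (5.0, 5.0, 1.0)]
result = window_query(circles, Rect(0, 0, 1, 1))
```

The second circle passes the filter step because its bounding square overlaps the window, but the refinement step rejects it; the third circle is already dropped by the filter.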

5 What is a Spatial Join?
Combines two sets of spatial objects according to some spatial property
An important type of multiple-scan query in a spatial DBS

6 Example of Spatial Join
Two relations: forests, cities
(Assume an attribute in each relation represents the borders of the forests and cities)
An example query would be:
  “Find all forests which are in a city”

7 Problems when performing a Spatial Join
A naive evaluation is too expensive in terms of both CPU time and I/O time
Traditional index structures are not efficient for spatial joins
How to make it more efficient?
  R*-tree

8 Why use the R*-tree for Spatial Join?
To optimize CPU time and I/O time
Fewer comparisons than a simple nested loop
Other algorithms cannot be applied efficiently to spatial joins

9 R*-tree Approach for Spatial Join
Suppose there are two R*-trees, R and S
Idea:
  Use the property that a directory rectangle is the minimum bounding box of the data rectangles in the corresponding subtree.
  If the rectangles of two directory entries E_R and E_S have a common intersection, their subtrees may contain a pair (rect_R, rect_S) of intersecting data rectangles; if they do not intersect, no such pair can exist and the two subtrees need not be joined.
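A minimal sketch of this idea as a synchronized traversal of both trees, assuming for simplicity that R and S have the same height. The `Node` class, the tuple layout `(xl, yl, xu, yu)` for rectangles and the function name `spatial_join` are illustrative assumptions, not the authors' code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    is_leaf: bool
    # Each entry is (rect, ref): ref is a child Node in a directory node
    # and an object id in a leaf. rect = (xl, yl, xu, yu).
    entries: list = field(default_factory=list)


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_join(r, s, out):
    """Descend only into pairs of subtrees whose rectangles intersect."""
    for rect_r, ref_r in r.entries:
        for rect_s, ref_s in s.entries:
            if not intersects(rect_r, rect_s):
                continue  # disjoint directory rectangles: prune both subtrees
            if r.is_leaf:  # assumption: s is a leaf as well (equal height)
                out.append((ref_r, ref_s))
            else:
                spatial_join(ref_r, ref_s, out)
    return out


leaf_a = Node(True, [((0, 0, 1, 1), "a1"), ((2, 2, 3, 3), "a2")])
leaf_b = Node(True, [((10, 10, 11, 11), "b1")])
root_r = Node(False, [((0, 0, 3, 3), leaf_a), ((10, 10, 11, 11), leaf_b)])

leaf_c = Node(True, [((0.5, 0.5, 2.5, 2.5), "c1")])
leaf_d = Node(True, [((20, 20, 21, 21), "d1")])
root_s = Node(False, [((0.5, 0.5, 2.5, 2.5), leaf_c), ((20, 20, 21, 21), leaf_d)])

pairs = spatial_join(root_r, root_s, [])
```

In this toy example only the leaf pair (a, c) is visited; the other three combinations of leaves are pruned at the directory level.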

10 Minimum Bounding Box

11 Is there any way to be more efficient?
Two areas need to be taken into account in order to be more efficient:
  CPU-time tuning
  I/O-time tuning

12 CPU-Time Tuning
Two ways to improve CPU time:
  Restricting the search space
  Spatial sorting and plane sweep

13 Restricting the search space
Idea:
  Scan each of the two nodes and mark all entries which are required for performing the join (i.e. which intersect the intersection rectangle of the two nodes).
  Then test each marked entry of one node against all marked entries of the other node.

14 Restricting the search space (cont’d)
Original: 7 of R * 7 of S = 49 joins
Now: 3 of R * 2 of S = 6 joins
Plus scanning: 7 of R + 7 of S = 14 times
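The marking step can be sketched as follows. In the slide's example the cost drops from 49 pairwise tests to 14 scan tests plus 6 pairwise tests, 20 in total. The function names and the `(rect, id)` entry layout below are assumptions made for illustration.

```python
def intersects(a, b):
    # rect = (xl, yl, xu, yu)
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intersection(a, b):
    """Common rectangle of a and b, or None if they are disjoint."""
    xl, yl = max(a[0], b[0]), max(a[1], b[1])
    xu, yu = min(a[2], b[2]), min(a[3], b[3])
    return (xl, yl, xu, yu) if xl <= xu and yl <= yu else None


def restricted_join(node_rect_r, entries_r, node_rect_s, entries_s):
    """Join two nodes, testing only entries that touch the common area."""
    common = intersection(node_rect_r, node_rect_s)
    if common is None:
        return []
    # One linear scan per node marks the entries that can contribute.
    marked_r = [e for e in entries_r if intersects(e[0], common)]
    marked_s = [e for e in entries_s if intersects(e[0], common)]
    # Only marked entries are tested pairwise.
    return [(id_r, id_s)
            for rect_r, id_r in marked_r
            for rect_s, id_s in marked_s
            if intersects(rect_r, rect_s)]


entries_r = [((0, 0, 2, 2), "r1"), ((7, 1, 9, 3), "r2"), ((9, 5, 10, 7), "r3")]
entries_s = [((8, 2, 9, 4), "s1"), ((15, 5, 17, 8), "s2")]
pairs = restricted_join((0, 0, 10, 10), entries_r, (8, 0, 18, 10), entries_s)
```

Here r1 and s2 lie outside the common strip of the two node rectangles, so they are never compared with anything.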

15 Spatial sorting and plane sweep
Idea:
  Sort the entries in a node of the R*-tree according to the spatial location of the corresponding rectangles.
  Then move the sweep line, perpendicular to one of the axes, from left to right to compute the intersections.

16 Example of Sorted Intersection Test
t = r1: (r1, s1)
t = s1: (s1, r2)
t = r2: (r2, s2), (r2, s3)
t = s2: -
t = r3: (r3, s3)
(Figure: sweep line with labels r1.xu and s1.xl; s1.xl < r1.xu)
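A sketch of a sorted intersection test along the x-axis, assuming rectangles are `(xl, yl, xu, yu)` tuples paired with an id. The rectangles in the usage example are made up and are not the ones from the slide's figure; the function names are hypothetical.

```python
def sorted_intersection_test(entries_r, entries_s):
    """Report all intersecting (r, s) pairs with a sweep along the x-axis."""
    rs = sorted(entries_r, key=lambda e: e[0][0])  # sort by lower x
    ss = sorted(entries_s, key=lambda e: e[0][0])
    out = []
    i = j = 0
    while i < len(rs) and j < len(ss):
        if rs[i][0][0] < ss[j][0][0]:
            # Sweep line stops at r_i: test it against the unprocessed s.
            _inner_loop(rs[i], ss, j, out, t_is_r=True)
            i += 1
        else:
            # Sweep line stops at s_j: test it against the unprocessed r.
            _inner_loop(ss[j], rs, i, out, t_is_r=False)
            j += 1
    return out


def _inner_loop(t, others, start, out, t_is_r):
    t_rect, t_id = t
    k = start
    # Walk on only while the other rectangle starts before t ends in x.
    while k < len(others) and others[k][0][0] <= t_rect[2]:
        o_rect, o_id = others[k]
        # x-overlap is guaranteed here, so only the y-extent is checked.
        if t_rect[1] <= o_rect[3] and o_rect[1] <= t_rect[3]:
            out.append((t_id, o_id) if t_is_r else (o_id, t_id))
        k += 1


entries_r = [((0, 0, 2, 2), "r1"), ((3, 0, 6, 2), "r2"), ((8, 0, 9, 2), "r3")]
entries_s = [((1, 1, 4, 3), "s1"), ((5, 5, 5.5, 6), "s2"), ((5.8, 1, 8.5, 3), "s3")]
pairs = sorted_intersection_test(entries_r, entries_s)
```

Each rectangle is compared only with rectangles of the other node that start before it ends on the sweep axis, which is where the saving over testing all pairs comes from.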

17 I/O-Time Tuning
Achieve good I/O performance with a buffer size as small as possible
  The R*-tree might occupy only a small portion of the LRU buffer
Compute a read schedule of the pages to minimize the number of disk accesses
  Local optimization policy based on spatial locality
Idea of the read schedule:
  If a frequently used page always resides in the buffer, the number of disk accesses can be reduced considerably
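The effect of a read schedule on the number of disk accesses can be explored with a small LRU buffer simulation. This is only a sketch: `count_disk_accesses` is a hypothetical helper, and the page names in the schedule are invented.

```python
from collections import OrderedDict


def count_disk_accesses(read_schedule, buffer_pages):
    """Count page faults of an LRU buffer holding at most buffer_pages pages."""
    buffer = OrderedDict()  # least recently used page is first
    misses = 0
    for page in read_schedule:
        if page in buffer:
            buffer.move_to_end(page)        # hit: mark as most recently used
        else:
            misses += 1                     # miss: page is read from disk
            buffer[page] = True
            if len(buffer) > buffer_pages:
                buffer.popitem(last=False)  # evict least recently used page
    return misses


schedule = ["s1", "r1", "s1", "r2", "s1", "r3"]
small = count_disk_accesses(schedule, 1)
larger = count_disk_accesses(schedule, 2)
```

With a single buffer page every request is a disk access; with two pages the frequently used page s1 stays resident and only its partner pages are read from disk.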

18 Three such techniques
Local plane sweep
Local plane sweep with pinning
Local z-order

19 Local Plane-Sweep Order
Idea:
  Based on the spatial ordering, the plane-sweep algorithm creates a sequence of pairs of intersecting rectangles.
  This sequence can be used to determine the read schedule of the spatial join.

20 Local Plane-Sweep Order (cont’d)
(Figure: rectangles s1, r1, r2, s2, r3, r4)
Read schedule: <s1, s2, r2, r1, r4, r3>
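One simple way to turn the sequence of intersecting pairs into a read schedule is to list each page the first time a pair refers to it. That reading is an assumption made here for illustration, as are the function name and the sample pairs; the result is not meant to reproduce the schedule on the slide.

```python
def read_schedule(pair_sequence):
    """Pages in order of first reference by the plane-sweep pair sequence."""
    seen = set()
    schedule = []
    for page_r, page_s in pair_sequence:
        for page in (page_r, page_s):
            if page not in seen:
                seen.add(page)
                schedule.append(page)
    return schedule


# Pairs in the order a plane sweep might report them (made-up example).
sweep_pairs = [("r1", "s1"), ("r2", "s1"), ("r2", "s3"), ("r3", "s3")]
schedule = read_schedule(sweep_pairs)
```

Because the sweep reports spatially adjacent pairs one after another, consecutive pairs tend to share a page, which is what lets a small buffer serve most requests.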

21 Local Plane-Sweep Order w/ Pinning
Idea:
1. Determine a pair (Er, Es) of entries with respect to the local plane-sweep order, and compute the degree of the rectangles of both entries:
  Deg(E.rect) = number of intersections between E.rect and the rectangles which belong to entries of the other tree that are not yet processed
2. Pin the page in the buffer whose corresponding rectangle has maximal degree
3. Perform the spatial join of the pinned page with all other pages whose rectangles it intersects

22 Local Plane-Sweep Order w/ Pinning (cont’d)
(Figure: rectangles s1, r1, r2, s2, r3, r4, with entries Er and Es)
Er.rect = r1, Es.rect = s2
Deg(r1) =
Deg(s2) =
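The degree computation and the pinning decision can be sketched as below. It assumes that "not yet processed" is tracked as a set of already joined `(r_id, s_id)` pairs which includes the current pair; this bookkeeping, the function names and the sample rectangles are illustrative assumptions, not the values from the figure.

```python
def intersects(a, b):
    # rect = (xl, yl, xu, yu)
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def degree(entry, other_entries, done_pairs, entry_is_r):
    """Number of not yet processed intersections of entry with the other node."""
    rect, eid = entry
    count = 0
    for o_rect, o_id in other_entries:
        pair = (eid, o_id) if entry_is_r else (o_id, eid)
        if pair not in done_pairs and intersects(rect, o_rect):
            count += 1
    return count


def page_to_pin(er, es, entries_r, entries_s, done_pairs):
    """Pick the page of the current pair whose rectangle has maximal degree."""
    deg_r = degree(er, entries_s, done_pairs, entry_is_r=True)
    deg_s = degree(es, entries_r, done_pairs, entry_is_r=False)
    return er[1] if deg_r >= deg_s else es[1]


entries_r = [((0, 0, 2, 2), "r1"), ((3, 0, 6, 2), "r2")]
entries_s = [((1, 1, 4, 3), "s1"), ((5.8, 1, 8.5, 3), "s3")]
done = {("r1", "s1"), ("r2", "s1")}  # (r2, s1) is the pair just processed
pinned = page_to_pin(entries_r[1], entries_s[0], entries_r, entries_s, done)
```

In this example r2 still has one open intersection (with s3) while s1 has none, so the page of r2 is pinned and joined with its remaining partner before the sweep continues.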

23 Local Z-Order
Idea:
1. Compute the intersections between each rectangle of one node and all rectangles of the other node
2. Sort the rectangles according to the spatial location of their centers
3. Decompose the underlying space into cells of equal size and provide an ordering on this set of cells

24 Local Z-Order (cont’d)
(Figure: rectangles s1, r1, r2, s2, r3, r4 over the cells I, II, III, IV, with the resulting read schedule)
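A sketch of sorting rectangles by the z-order of their centers, assuming the space is split into a 2^bits by 2^bits grid of equal cells and that cells are numbered by bit interleaving (a Morton code). The grid resolution, the cell numbering and the function names are assumptions; the cell labels I to IV in the figure may follow a different convention.

```python
def z_value(cx, cy, bits):
    """Interleave the bits of integer cell coordinates (Morton code)."""
    z = 0
    for b in range(bits):
        z |= ((cx >> b) & 1) << (2 * b)      # x bit goes to the even position
        z |= ((cy >> b) & 1) << (2 * b + 1)  # y bit goes to the odd position
    return z


def local_z_order(entries, space, bits=4):
    """Sort (rect, id) entries by the z-value of the cell holding their center."""
    xl, yl, xu, yu = space
    cells = 1 << bits  # cells per axis

    def key(entry):
        rect = entry[0]
        mx = (rect[0] + rect[2]) / 2
        my = (rect[1] + rect[3]) / 2
        # Map the center to a cell index, clamping points on the upper border.
        cx = min(int((mx - xl) / (xu - xl) * cells), cells - 1)
        cy = min(int((my - yl) / (yu - yl) * cells), cells - 1)
        return z_value(cx, cy, bits)

    return sorted(entries, key=key)


entries = [((3, 3, 4, 4), "b"), ((0, 3, 1, 4), "d"),
           ((3, 0, 4, 1), "c"), ((0, 0, 1, 1), "a")]
ordered = [eid for _, eid in local_z_order(entries, (0, 0, 4, 4), bits=1)]
```

With `bits=1` the space has four cells, visited in the order lower left, lower right, upper left, upper right, so rectangles whose centers fall into the same or neighbouring cells end up next to each other in the schedule.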

25 (Chart: number of disk accesses vs. size of LRU buffer)

26 (Chart, cont’d: number of disk accesses vs. size of LRU buffer)

27 Q & A
That’s it for the presentation. Any questions?

28 Reference
1. Brinkhoff T., Kriegel H.-P., Seeger B. (1993). Efficient Processing of Spatial Joins Using R-trees. Institute of Computer Science, University of Munich. Proc. ACM SIGMOD, Washington, DC, USA.