1
Chapter 3: Data Storage and Access Methods
Title: The R* Tree: An Efficient and Robust Access Method for Points and Rectangles
Authors: N. Beckmann, H. Kriegel, R. Schneider and B. Seeger
Pages:
2
The R* Tree: An Efficient and Robust Access Method for Points and Rectangles
Problem
  Problem Statement
  Why is this problem important?
  Why is this problem hard?
Approaches
  Approach description, key concepts
Contributions (novelty, improved)
Assumptions
3
Problem Statement – R* Tree
Given
  Data containing points and rectangles
  Spatial queries (point query, range query, insert, delete)
Find
  An access method (data structure): a hierarchical organization of rectangles
  (Example figure from Wikipedia)
Objectives
  Efficiency of spatial queries
Constraints
  Balanced tree
  Each node is a disk page and holds at least m entries (m = minimum number of entries per node)
  Root has at least two children unless it is a leaf
Efficiency metric = number of disk pages accessed
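A minimal sketch of how a range query walks such a hierarchy and how the efficiency metric could be counted. The `Node` class, the `stats` dictionary, and the tiny two-leaf tree are hypothetical names and data chosen for illustration; the sketch only assumes that each visited node corresponds to one disk page.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    mbr: Rect                                              # bounds everything below
    children: List["Node"] = field(default_factory=list)   # set on directory nodes
    entries: List[Rect] = field(default_factory=list)      # set on leaf nodes


def range_query(node: Node, window: Rect, stats: dict) -> List[Rect]:
    """Return data rectangles intersecting `window`; count visited nodes."""
    stats["pages"] = stats.get("pages", 0) + 1   # assumption: one node = one page
    if not node.children:                        # leaf: test the data rectangles
        return [r for r in node.entries if intersects(r, window)]
    hits: List[Rect] = []
    for child in node.children:
        if intersects(child.mbr, window):        # descend only where needed
            hits.extend(range_query(child, window, stats))
    return hits


# Hypothetical two-leaf tree with non-overlapping directory rectangles.
left = Node(mbr=(0, 0, 4, 4), entries=[(0, 0, 1, 1), (3, 3, 4, 4)])
right = Node(mbr=(6, 0, 9, 4), entries=[(6, 0, 7, 1), (8, 3, 9, 4)])
root = Node(mbr=(0, 0, 9, 4), children=[left, right])

stats: dict = {}
result = range_query(root, (2.5, 2.5, 3.5, 3.5), stats)
print(result, stats["pages"])
```

Here the query window touches only the left directory rectangle, so the right leaf is never read. If the two directory rectangles overlapped around the window, both leaves would have to be visited.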
4
Why is this problem important?
Multi-dimensional applications
  Large geographic data, e.g., map objects like countries occupy regions of non-zero size in two dimensions.
  Common real-world usage: "Find all museums within 2 miles of my current location".
  CAD
  …
Many DBMS servers support spatial indices
  Oracle, IBM DB2, …
5
Why is this problem Hard?
B-tree split methods are ineffective in two dimensions
  Ex. sorting: rectangles in the plane have no total order that preserves spatial proximity
Size variation across data rectangles
  Large rectangles limit split options!
Non-uniform data distribution over space
Dynamic access method
  Insertions and deletions
Overlapping directory rectangles => multiple search paths
6
Novelty of Contribution
Related Work
Traditional one-dimensional indexing structures (e.g., hash, B-tree) are not appropriate for multi-dimensional range search
B+ tree
  Represents sorted data in a way that allows for efficient insertion and removal of elements.
  Dynamic, multilevel index with maximum and minimum bounds on the number of keys in each node.
  Leaf nodes are linked together as a linked list to make one-dimensional range queries easy.
R-tree
  The R-tree is a foundation for spatial access methods
  A complex spatial object is represented by its minimum bounding rectangle while preserving essential geometric properties
  Overlapping regions
  Heuristic: minimize the area of each enclosing rectangle in the inner nodes.
7
Principles of R-tree
  Height-balanced tree similar to a B-tree, with index records in its leaf nodes containing pointers to data objects.
  Heuristic optimization: minimize the area of each enclosing rectangle in the inner nodes.
Reference: A. Guttman, "R-trees: A Dynamic Index Structure for Spatial Searching", 1984
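The area heuristic can be sketched as the subtree choice made during insertion: descend into the child whose rectangle needs the least area enlargement to cover the new entry, breaking ties by smaller area. The function names and sample rectangles below are illustrative assumptions, not code from the paper.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(mbr: Rect, new: Rect) -> float:
    """Extra area the directory rectangle needs in order to cover `new`."""
    return area(union(mbr, new)) - area(mbr)


def choose_subtree(child_mbrs: List[Rect], new: Rect) -> int:
    """Index of the child with least enlargement; ties go to the smaller area."""
    return min(
        range(len(child_mbrs)),
        key=lambda i: (enlargement(child_mbrs[i], new), area(child_mbrs[i])),
    )


# Hypothetical directory node with two children and one new rectangle.
children = [(0, 0, 4, 4), (5, 0, 10, 10)]
new_rect = (4, 1, 5, 2)
print(choose_subtree(children, new_rect))  # child 0 grows by 4, child 1 by 10
```

Because only area is considered, this criterion can favor long, thin rectangles or heavy overlap between siblings, which is what the additional R*-tree parameters address.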
8
Performance Parameters beyond R-tree
(Q1) The area covered by a directory rectangle should be minimized.
(Q2) The overlap between directory rectangles should be minimized.
(Q3) The margin of a directory rectangle should be minimized.
(Q4) Storage utilization should be optimized.
Intuitions
  Reduce overlap between sibling nodes.
  Reduce traversal of multiple branches for a point query.
  Reinserting old data moves entries between neighboring nodes and thus decreases overlap.
  Due to more restructuring, fewer splits occur.
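A small sketch of the three geometric quantities behind Q1 to Q3, assuming rectangles stored as `(xmin, ymin, xmax, ymax)` tuples and taking margin to mean the perimeter of a rectangle. The sample rectangles are made up to show that two rectangles of equal area can differ widely in margin.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    """Q1: area covered by a rectangle."""
    return (r[2] - r[0]) * (r[3] - r[1])


def overlap(a: Rect, b: Rect) -> float:
    """Q2: area of the intersection of two rectangles (0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def margin(r: Rect) -> float:
    """Q3: perimeter of a rectangle; small margin means a square-like shape."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


square = (0.0, 0.0, 2.0, 2.0)
strip = (0.0, 0.0, 8.0, 0.5)

print(area(square), area(strip))       # same area
print(margin(square), margin(strip))   # the strip has a much larger margin
print(overlap(square, (1.0, 1.0, 3.0, 3.0)))
```

Minimizing margin pushes directory rectangles toward squares, which tend to be intersected by fewer query windows than thin strips of the same area.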
9
Difference between R-tree and R*-tree
Minimization of area, margin, and overlap is crucial to the performance of the R-tree / R*-tree.
The R*-tree attempts to reduce all three, using a combination of a revised node split algorithm and the concept of forced reinsertion at node overflow.
This is based on the observation that R-tree structures are highly susceptible to the order in which their entries are inserted, so an insertion-built (rather than bulk-loaded) structure is likely to be sub-optimal.
Deletion and reinsertion of entries allows them to "find" a place in the tree that may be more appropriate than their original location.
Goal: improve retrieval performance.
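A hedged sketch of how an overflowing node might pick entries for forced reinsertion: rank the entries by the distance of their centers from the center of the node's bounding rectangle and remove the `p` farthest ones. The helper names, the parameter `p`, and the sample data are assumptions for illustration; the actual reinsertion of the removed entries through the normal insert routine is not shown.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def bounding(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (
        min(r[0] for r in rects), min(r[1] for r in rects),
        max(r[2] for r in rects), max(r[3] for r in rects),
    )


def pick_for_reinsert(entries: List[Rect], p: int) -> Tuple[List[Rect], List[Rect]]:
    """Split an overflowing node's entries into (removed, kept).

    The p entries whose centers lie farthest from the center of the
    node's bounding rectangle are removed for reinsertion.
    """
    node_center = center(bounding(entries))
    ranked = sorted(
        entries,
        key=lambda r: math.dist(center(r), node_center),
        reverse=True,                       # farthest first
    )
    return ranked[:p], ranked[p:]


# Hypothetical overflowing node holding five entries.
entries = [(0, 0, 1, 1), (3, 1, 4, 2), (4, 2, 6, 3), (2, 3, 3, 4), (7, 1, 8, 2)]
removed, kept = pick_for_reinsert(entries, p=2)
print(removed)
print(bounding(kept))   # the node's rectangle shrinks once the outliers leave
```

In this example the two outlying entries are removed and the node's bounding rectangle shrinks from `(0, 0, 8, 4)` to `(2, 1, 6, 4)`. The removed entries may then land in neighboring nodes, which can avoid a split altogether.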
10
Example
[Figure: entry groupings preferred by the R-tree vs. preferred by the R*-tree; rectangles labeled R1 to R5]
11
Validation Methodology
Experiments with simulated workloads
Evaluation of design decisions
Results
  The R*-tree outperforms variants of the R-tree and the 2-level grid file.
  The R*-tree is robust against non-uniform data distributions.
12
Summary
Paper’s focus
  R*-tree: implementations and performance
Ideas
  Heuristic optimizations (p. 208)
    Reduction of area, margin, and overlap of the directory rectangles
  Better storage utilization (p. 211)
  Forced reinsertion (splits can be prevented)
Experimental comparison
  Using many data distributions
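One way the margin and overlap criteria could combine in a node split, sketched under assumptions: pick the split axis whose candidate distributions have the smallest total margin, then along that axis pick the distribution with the least overlap, breaking ties by total area. The function names, the minimum fill `m`, and the sample entries are illustrative, not taken from the slides.

```python
from typing import Iterator, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Split = Tuple[List[Rect], List[Rect]]


def bounding(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r: Rect) -> float:
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def distributions(entries: List[Rect], axis: int, m: int) -> Iterator[Split]:
    """Candidate two-group splits along `axis` (0 = x, 1 = y).

    Entries are sorted once by lower and once by upper coordinate;
    each group keeps at least m entries.
    """
    for coord in (axis, axis + 2):
        ordered = sorted(entries, key=lambda r: r[coord])
        for k in range(m, len(ordered) - m + 1):
            yield ordered[:k], ordered[k:]


def split_node(entries: List[Rect], m: int) -> Split:
    def margin_sum(axis: int) -> float:
        return sum(margin(bounding(g1)) + margin(bounding(g2))
                   for g1, g2 in distributions(entries, axis, m))

    axis = min((0, 1), key=margin_sum)            # axis with smallest total margin

    def cost(d: Split) -> Tuple[float, float]:
        b1, b2 = bounding(d[0]), bounding(d[1])
        return (overlap(b1, b2), area(b1) + area(b2))

    return min(distributions(entries, axis, m), key=cost)


# Hypothetical overflowing node: three entries on the left, two on the right.
entries = [(0, 0, 1, 1), (0, 2, 1, 3), (0.5, 1, 1.5, 2), (5, 0, 6, 1), (5, 2, 6, 3)]
g1, g2 = split_node(entries, m=2)
print(bounding(g1), bounding(g2))
```

For this data the x axis is chosen and the two groups end up with disjoint bounding rectangles, so a point query never has to follow both branches.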
13
Assumptions, Rewrite today
Assumptions
  Indexing data in two-dimensional space
  Bulk load and bulk reorganization not available
  Concurrency control and recovery costs are negligible
  Reinserts during split!
Rewrite today
  Bulk-load of rectangles
  Compare with newer methods
    R+ tree (disjoint siblings), Hilbert R-tree
  Analytical results
    Formally compare R*-tree with alternatives