R-Trees: A Dynamic Index Structure for Spatial Searching
Antonin Guttman

In Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data (SIGMOD '84). ACM, New York, NY, USA.

Introduction
Range queries in multiple dimensions:
- Computer Aided Design (CAD)
- Geo-data applications
Support spatial data objects (boxes).
The index structure is dynamic.

R-Tree
Balanced (similar to a B+ tree).
I is an n-dimensional rectangle of the form (I_0, I_1, ..., I_{n-1}), where each I_i is a closed range [a, b] ⊆ [-∞, ∞].
Leaf node index entry: (I, tuple_id)
Non-leaf node entry: (I, child_ptr)
M is the maximum number of entries per node.
m ≤ M/2 is the minimum number of entries per node.
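One way to make the entry layout above concrete is the Python sketch below. The names Rect, Entry, Node, area, union and intersects are illustrative choices for this sketch, not part of the original description. A rectangle is stored as a tuple of closed (lo, hi) intervals, one per dimension.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A rectangle I = (I_0, ..., I_{n-1}); each I_i is a closed interval (lo, hi).
Rect = Tuple[Tuple[float, float], ...]


def area(r: Rect) -> float:
    """Product of the side lengths (area in 2-D, volume in 3-D, ...)."""
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both a and b."""
    return tuple((min(alo, blo), max(ahi, bhi))
                 for (alo, ahi), (blo, bhi) in zip(a, b))


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two closed rectangles share at least one point."""
    return all(alo <= bhi and blo <= ahi
               for (alo, ahi), (blo, bhi) in zip(a, b))


@dataclass
class Entry:
    rect: Rect
    tuple_id: Optional[int] = None    # set in leaf entries: (I, tuple_id)
    child: Optional["Node"] = None    # set in non-leaf entries: (I, child_ptr)


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)  # at most M entries
```

For example, a leaf holding a single record is `Node(leaf=True, entries=[Entry(((0, 1), (0, 1)), tuple_id=7)])`.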

Invariants
1. Every leaf (non-leaf) has between m and M records (children), except for the root.
2. The root has at least two children unless it is a leaf.
3. For each leaf (non-leaf) entry, I is the smallest rectangle that contains the data objects (children).
4. All leaves appear at the same level.
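The invariants can be checked mechanically. This is a minimal sketch that assumes a hypothetical layout in which each entry is a (rect, payload) pair. check_invariants and mbr are illustrative names. Leaf rectangles are taken as given, because the data objects themselves are not stored in the tree.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]


def mbr(rects: List[Rect]) -> Rect:
    """Smallest rectangle enclosing every rectangle in rects."""
    return tuple((min(r[d][0] for r in rects), max(r[d][1] for r in rects))
                 for d in range(len(rects[0])))


@dataclass
class Node:
    leaf: bool
    # Each entry is (rect, payload): a tuple_id in a leaf, a child Node otherwise.
    entries: List[tuple] = field(default_factory=list)


def check_invariants(root: Node, m: int, M: int) -> bool:
    """Check invariants 1, 2 and 4, plus invariant 3 for non-leaf entries."""
    leaf_depths = set()

    def visit(node: Node, depth: int, is_root: bool) -> bool:
        n = len(node.entries)
        if n > M:
            return False
        if is_root:
            if not node.leaf and n < 2:      # invariant 2
                return False
        elif n < m:                          # invariant 1
            return False
        if node.leaf:
            leaf_depths.add(depth)
            return True
        for rect, child in node.entries:
            # Invariant 3: the entry's rectangle tightly encloses its child.
            if not child.entries or rect != mbr([r for r, _ in child.entries]):
                return False
            if not visit(child, depth + 1, False):
                return False
        return True

    # Invariant 4: every leaf sits at the same depth.
    return visit(root, 0, True) and len(leaf_depths) <= 1
```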

Example (part 1)

Example (part 2)

Searching
Given a search rectangle S:
1. Start at the root and locate all child nodes whose rectangle I intersects S (by linear search within the node).
2. Search the subtrees of those child nodes.
3. At the leaves, return the entries whose rectangles intersect S.
Because sibling rectangles may overlap, a search may have to inspect several paths. Its worst-case running time is therefore not bounded by the height of the tree: in the worst case every node is visited.
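Below is a recursive sketch of the search. It assumes a hypothetical tuple-based node layout: the string "leaf" or "inner" plus a list of (rect, payload) pairs. The function name search is illustrative.

```python
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]
# Hypothetical node layout for this sketch:
#   ("leaf",  [(rect, tuple_id), ...])
#   ("inner", [(rect, child_node), ...])
Node = Tuple[str, list]


def intersects(a: Rect, b: Rect) -> bool:
    return all(alo <= bhi and blo <= ahi
               for (alo, ahi), (blo, bhi) in zip(a, b))


def search(node: Node, s: Rect) -> List[int]:
    """Return tuple_ids of all leaf entries whose rectangle intersects s."""
    kind, entries = node
    if kind == "leaf":                        # step 3
        return [tid for rect, tid in entries if intersects(rect, s)]
    results: List[int] = []
    for rect, child in entries:               # step 1: linear scan of the node
        if intersects(rect, s):
            results.extend(search(child, s))  # step 2: possibly several subtrees
    return results


# Example: two leaves under one root.
leaf1 = ("leaf", [(((0, 1), (0, 1)), 1), (((2, 3), (2, 3)), 2)])
leaf2 = ("leaf", [(((5, 6), (5, 6)), 3)])
root = ("inner", [(((0, 3), (0, 3)), leaf1), (((5, 6), (5, 6)), leaf2)])
print(search(root, ((2.5, 5.5), (2.5, 5.5))))  # [2, 3]: both subtrees visited
```

The example query overlaps both root entries, so both paths are followed. This multi-path behavior is what weakens the worst case.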

S = R16

Insertion
Insertion is done at the leaves. Where should a new index entry E with rectangle R go?
1. Start at the root.
2. Go down the tree, at each level choosing the child whose rectangle needs the least enlargement to include R. In case of a tie, choose the child with the smallest area.
3. If the chosen leaf node has room, insert E there. Otherwise split the node (to be continued...).
4. Adjust the tree...
5. If the root was split into nodes N1 and N2, create a new root with N1 and N2 as children.
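The descent in step 2 can be sketched as follows. choose_subtree and choose_leaf are hypothetical names, and the nodes use illustrative ("leaf" | "inner", entries) tuples as plain-data stand-ins for disk pages. choose_leaf returns the whole root-to-leaf path because the tree-adjustment step later walks back up it.

```python
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]
Node = Tuple[str, list]  # hypothetical: ("leaf" | "inner", [(rect, payload), ...])


def area(r: Rect) -> float:
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def union(a: Rect, b: Rect) -> Rect:
    return tuple((min(alo, blo), max(ahi, bhi))
                 for (alo, ahi), (blo, bhi) in zip(a, b))


def choose_subtree(child_rects: List[Rect], r: Rect) -> int:
    """Index of the child needing least enlargement to include r;
    ties go to the child with the smallest area."""
    def cost(i: int) -> Tuple[float, float]:
        enlargement = area(union(child_rects[i], r)) - area(child_rects[i])
        return (enlargement, area(child_rects[i]))
    return min(range(len(child_rects)), key=cost)


def choose_leaf(root: Node, r: Rect) -> List[Node]:
    """Descend from the root and return the root-to-leaf path."""
    path = [root]
    node = root
    while node[0] != "leaf":
        entries = node[1]
        i = choose_subtree([rect for rect, _ in entries], r)
        node = entries[i][1]
        path.append(node)
    return path


left, right = ("leaf", []), ("leaf", [])
root = ("inner", [(((0, 2), (0, 2)), left), (((3, 4), (3, 4)), right)])
path = choose_leaf(root, ((3.5, 3.6), (3.5, 3.6)))
print(path[-1] is right)  # True: the right child already contains R
```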

Adjusting the tree
1. Let N be the leaf node. If it was split, let NN be the other node.
2. If N is the root, stop. Otherwise let P be N's parent and E_N its entry for N. Adjust the rectangle of E_N to tightly enclose N.
3. If NN exists, add an entry E_NN to P. E_NN points to NN, and its rectangle tightly encloses NN.
4. If P now has more than M entries, split P.
5. Set N = P (and, if P was split, let NN be the new node), then go to step 2.
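Here is a sketch of the adjustment loop, under these assumptions: nodes are small mutable objects, path lists the nodes visited by the descent (root first), and split is a hypothetical callback standing in for one of the splitting algorithms covered later in these slides. adjust_tree returns the root's new sibling when the root itself splits, which is the case handled by step 5 of insertion.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Rect = Tuple[Tuple[float, float], ...]


def mbr(rects: List[Rect]) -> Rect:
    return tuple((min(r[d][0] for r in rects), max(r[d][1] for r in rects))
                 for d in range(len(rects[0])))


@dataclass
class Node:
    leaf: bool
    entries: list = field(default_factory=list)  # mutable [rect, payload] pairs

    def cover(self) -> Rect:
        return mbr([e[0] for e in self.entries])


def adjust_tree(path: List[Node], nn: Optional[Node], M: int,
                split: Callable[[Node], Node]) -> Optional[Node]:
    """Walk from the leaf (path[-1]) up to the root (path[0]).

    nn is the leaf's new sibling if the leaf was split, else None.
    split(node) is a hypothetical callback that moves some of node's
    entries into a new node and returns that node.
    """
    for depth in range(len(path) - 1, 0, -1):   # loop ends once N is the root
        n, p = path[depth], path[depth - 1]
        for e in p.entries:                     # step 2: tighten E_N
            if e[1] is n:
                e[0] = n.cover()
        if nn is not None:                      # step 3: add E_NN to P
            p.entries.append([nn.cover(), nn])
            nn = split(p) if len(p.entries) > M else None  # step 4
        # step 5: the next iteration treats P as N (and P's sibling as NN)
    return nn
```

A trivial split that moves the second half of a node's entries into a new node is enough to exercise the loop.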

Deletion
1. Find the entry to delete and remove it from its leaf L.
2. Set N = L and Q = ∅ (Q is the set of eliminated nodes).
3. If N is the root, go to step 6. Otherwise let P be N's parent and E_N the entry that points to N. If N has fewer than m entries, delete E_N from P and add N to Q.
4. If N has at least m entries, adjust the rectangle of E_N to tightly enclose N.
5. Set N = P and repeat from step 3.
6. Reinsert the entries of the eliminated nodes. Entries from eliminated leaves are reinserted into leaves; entries from non-leaf nodes are inserted higher up, so that all leaves stay at the same level.
7. If the root has only one child, make that child the new root.
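Steps 2 to 5 and step 7 can be sketched as below, assuming the same kind of mutable node objects. condense_tree and shorten_root are hypothetical names. Reinsertion (step 6) needs the full insertion routine, so the sketch only collects the orphaned entries, each tagged with the height at which it must be reinserted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]


def mbr(rects: List[Rect]) -> Rect:
    return tuple((min(r[d][0] for r in rects), max(r[d][1] for r in rects))
                 for d in range(len(rects[0])))


@dataclass
class Node:
    leaf: bool
    entries: list = field(default_factory=list)  # mutable [rect, payload] pairs


def condense_tree(path: List[Node], m: int) -> List[Tuple[int, list]]:
    """path runs from the root (path[0]) to the leaf L that just lost an entry.

    Returns the entries of eliminated nodes (the set Q) as (height, entry)
    pairs. Height 0 marks a leaf entry; an entry of height h must be
    reinserted into a node at height h (step 6, not shown).
    """
    orphans: List[Tuple[int, list]] = []
    for height, depth in enumerate(range(len(path) - 1, 0, -1)):
        n, p = path[depth], path[depth - 1]
        idx = next(i for i, e in enumerate(p.entries) if e[1] is n)
        if len(n.entries) < m:
            del p.entries[idx]                       # remove E_N; N joins Q
            orphans.extend((height, e) for e in n.entries)
        else:
            p.entries[idx][0] = mbr([e[0] for e in n.entries])  # step 4
    return orphans


def shorten_root(root: Node) -> Node:
    """Step 7: a non-leaf root with a single child is replaced by that child."""
    if not root.leaf and len(root.entries) == 1:
        return root.entries[0][1]
    return root
```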

Why Reinsert?
Alternatively, an under-full node could be merged with the sibling whose area would increase the least, or its entries could be redistributed among siblings. In either case, nodes may still need to be split.
Reinsertion is easier to implement, and it incrementally refines the spatial structure of the tree.
The pages holding the entries to be reinserted are likely to be in memory already, because they were visited during the search for the entry being deleted.

Other Operations
Update: delete the entry, modify it, and reinsert it.
Search for objects completely contained in a rectangle R.
Search for objects that contain a given rectangle.
Range deletion.
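The two extra search variants differ from the basic search only in the tests applied at leaf and non-leaf nodes. This sketch uses the same hypothetical tuple layout as before. search_within, search_enclosing and contains are illustrative names.

```python
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]
Node = Tuple[str, list]  # hypothetical: ("leaf" | "inner", [(rect, payload), ...])


def intersects(a: Rect, b: Rect) -> bool:
    return all(alo <= bhi and blo <= ahi
               for (alo, ahi), (blo, bhi) in zip(a, b))


def contains(outer: Rect, inner: Rect) -> bool:
    return all(olo <= ilo and ihi <= ohi
               for (olo, ohi), (ilo, ihi) in zip(outer, inner))


def search_within(node: Node, r: Rect) -> List[int]:
    """Objects completely contained in r. A child whose rectangle only
    overlaps r can still hold such objects, so inner nodes test intersection."""
    kind, entries = node
    if kind == "leaf":
        return [tid for rect, tid in entries if contains(r, rect)]
    return [tid for rect, child in entries if intersects(rect, r)
            for tid in search_within(child, r)]


def search_enclosing(node: Node, r: Rect) -> List[int]:
    """Objects that contain r. Every ancestor rectangle of such an object
    also contains r, so inner nodes can use the stricter containment test."""
    kind, entries = node
    if kind == "leaf":
        return [tid for rect, tid in entries if contains(rect, r)]
    return [tid for rect, child in entries if contains(rect, r)
            for tid in search_enclosing(child, r)]


leaf1 = ("leaf", [(((0, 1), (0, 1)), 1), (((2, 3), (2, 3)), 2)])
leaf2 = ("leaf", [(((5, 6), (5, 6)), 3)])
root = ("inner", [(((0, 3), (0, 3)), leaf1), (((5, 6), (5, 6)), leaf2)])
print(search_within(root, ((0.5, 10), (0.5, 10))))       # [2, 3]
print(search_enclosing(root, ((5.2, 5.4), (5.2, 5.4))))  # [3]
```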

Splitting Nodes
Problem: divide M+1 entries between two nodes so that it is unlikely that both nodes will be needlessly examined during a search.
Solution: minimize the total area of the two covering rectangles.
Three algorithms: exponential, quadratic, and linear time.
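Stated as an optimization problem, with notation assumed here (E is the set of M+1 entries, and J_1 and J_2 are the covering rectangles of the two groups):

```latex
\min_{\substack{G_1 \cup G_2 = E,\; G_1 \cap G_2 = \emptyset \\ |G_1| \ge m,\; |G_2| \ge m}}
  \operatorname{area}(J_1) + \operatorname{area}(J_2),
\qquad J_i = \text{smallest rectangle enclosing every entry of } G_i .
```

Without the minimum-fill constraint there are 2^M - 1 ways to split M+1 entries into two non-empty groups, which is why exhaustive search is exponential.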

Splitting Nodes – Exhaustive Search
Try all possible groupings. The result is optimal, but the running time is exponential in M.
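A brute-force sketch makes the cost concrete. It enumerates every grouping that respects the minimum fill m and keeps the cheapest one. exhaustive_split is a hypothetical name, and each rectangle is a tuple of (lo, hi) intervals.

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]


def area(r: Rect) -> float:
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def mbr(rects: List[Rect]) -> Rect:
    return tuple((min(r[d][0] for r in rects), max(r[d][1] for r in rects))
                 for d in range(len(rects[0])))


def exhaustive_split(rects: List[Rect], m: int) -> Tuple[List[int], List[int]]:
    """Try every split into two groups of at least m entries each and return
    the index lists with the smallest total covering area."""
    n = len(rects)
    assert 1 <= m <= n // 2, "need room for two groups of at least m entries"
    best = None
    for k in range(m, n // 2 + 1):          # k = size of the smaller group
        for g1 in combinations(range(n), k):
            g2 = [i for i in range(n) if i not in g1]
            cost = (area(mbr([rects[i] for i in g1]))
                    + area(mbr([rects[i] for i in g2])))
            if best is None or cost < best[0]:
                best = (cost, list(g1), g2)
    return best[1], best[2]


four = [((0, 1), (0, 1)), ((10, 11), (10, 11)), ((1, 2), (1, 2)), ((11, 12), (11, 12))]
print(exhaustive_split(four, 1))  # ([0, 2], [1, 3])
```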

Splitting Nodes – Quadratic Algorithm
1. Find the pair of entries E1 and E2 that maximizes area(J) - area(E1) - area(E2), where J is the rectangle covering E1 and E2.
2. Put E1 in one group and E2 in the other.
3. If one group has M-m+1 entries, put all remaining entries into the other group and stop. If all entries have been distributed, stop.
4. For each remaining entry E, calculate d1 and d2, where d_i is the area increase of group i's covering rectangle needed to include E.
5. Find the E with maximum |d1 - d2| and add it to the group whose area will increase the least.
6. Repeat from step 3.
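Below is a sketch of the quadratic split that returns the two groups as lists of entry indices. quadratic_split and its helpers are illustrative names. The slide does not say how to break ties in step 5; this sketch prefers the group with the smaller covering area, then the one with fewer entries, and that choice is an assumption.

```python
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]


def area(r: Rect) -> float:
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def union(a: Rect, b: Rect) -> Rect:
    return tuple((min(alo, blo), max(ahi, bhi))
                 for (alo, ahi), (blo, bhi) in zip(a, b))


def quadratic_split(rects: List[Rect], m: int) -> Tuple[List[int], List[int]]:
    n = len(rects)

    # Step 1: the pair that would waste the most area if placed together.
    def waste(p: Tuple[int, int]) -> float:
        i, j = p
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    s1, s2 = max(((i, j) for i in range(n) for j in range(i + 1, n)), key=waste)

    # Step 2: each seed starts a group.
    groups = [[s1], [s2]]
    covers = [rects[s1], rects[s2]]
    remaining = [i for i in range(n) if i not in (s1, s2)]

    def d(i: int, g: int) -> float:
        """Area increase of group g's covering rectangle if entry i joins it."""
        return area(union(covers[g], rects[i])) - area(covers[g])

    while remaining:
        # Step 3: if a group needs every remaining entry to reach m, give them all.
        short = [g for g in (0, 1) if len(groups[g]) + len(remaining) == m]
        if short:
            groups[short[0]].extend(remaining)
            break
        # Steps 4-5: take the entry with the strongest preference for one group.
        e = max(remaining, key=lambda i: abs(d(i, 0) - d(i, 1)))
        g = min((0, 1), key=lambda g: (d(e, g), area(covers[g]), len(groups[g])))
        groups[g].append(e)
        covers[g] = union(covers[g], rects[e])
        remaining.remove(e)              # step 6: repeat
    return groups[0], groups[1]


rects = [((0, 1), (0, 1)), ((1, 2), (0, 1)), ((0, 1), (1, 2)),
         ((1, 2), (1, 2)), ((20, 21), (20, 21))]
print(quadratic_split(rects, 2))  # ([0, 1, 2], [4, 3])
```

With m = 2, entry 3 ends up with the distant entry 4 even though it lies next to entries 0 to 2. Step 3 forces it there so that both groups meet the minimum fill.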

Greedy continued
The algorithm is quadratic in M and linear in the number of dimensions, but it is not optimal.

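Splitting Nodes – Linear Algorithm
1. Along each dimension, find the entry whose rectangle has the highest low side and the entry whose rectangle has the lowest high side, and record their separation.
2. Normalize each separation by dividing it by the width of the entire set along that dimension.
3. Put the two entries with the largest normalized separation into different groups.
4. Assign each remaining entry, in any order, to the group whose covering rectangle needs the least enlargement, while still letting each group reach m entries.
The algorithm is linear in M and makes almost no attempt at optimality.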
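Below is a sketch of the linear split. Seeds are chosen from the extreme sides along each dimension. The remaining entries are then assigned in a single pass to the group needing the least enlargement, subject to the minimum fill m. linear_pick_seeds and linear_split are illustrative names. The (0, 1) fallback, used when no usable seed pair exists, is an assumption of this sketch.

```python
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]


def area(r: Rect) -> float:
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def union(a: Rect, b: Rect) -> Rect:
    return tuple((min(alo, blo), max(ahi, bhi))
                 for (alo, ahi), (blo, bhi) in zip(a, b))


def linear_pick_seeds(rects: List[Rect]) -> Tuple[int, int]:
    n, best = len(rects), None
    for d in range(len(rects[0])):
        high_low = max(range(n), key=lambda i: rects[i][d][0])  # highest low side
        low_high = min(range(n), key=lambda i: rects[i][d][1])  # lowest high side
        width = max(r[d][1] for r in rects) - min(r[d][0] for r in rects)
        if high_low == low_high or width == 0:
            continue
        sep = (rects[high_low][d][0] - rects[low_high][d][1]) / width  # normalized
        if best is None or sep > best[0]:
            best = (sep, low_high, high_low)
    return (best[1], best[2]) if best else (0, 1)  # fallback: an assumption


def linear_split(rects: List[Rect], m: int) -> Tuple[List[int], List[int]]:
    s1, s2 = linear_pick_seeds(rects)
    groups, covers = [[s1], [s2]], [rects[s1], rects[s2]]
    remaining = [i for i in range(len(rects)) if i not in (s1, s2)]
    for pos, i in enumerate(remaining):    # one pass, so linear in M
        left = len(remaining) - pos          # unassigned entries, including i
        forced = [g for g in (0, 1) if len(groups[g]) + left == m]
        if forced:
            g = forced[0]                    # keep the minimum fill of m
        else:
            g = min((0, 1), key=lambda g: (
                area(union(covers[g], rects[i])) - area(covers[g]),
                area(covers[g]), len(groups[g])))
        groups[g].append(i)
        covers[g] = union(covers[g], rects[i])
    return groups[0], groups[1]


rects = [((0, 1), (0, 1)), ((1, 2), (0, 1)), ((0, 1), (1, 2)),
         ((1, 2), (1, 2)), ((20, 21), (20, 21))]
print(linear_split(rects, 2))  # ([0, 1, 2], [4, 3])
```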

Performance Tests
Data: the CENTRAL circuit cell (1057 rectangles).
Insertion performance was measured on the last 10% of inserts.
Searches used randomly generated rectangles that each matched about 5% of the data.
Deletion: every 10th data item was deleted.

Performance

With linear-time splitting, inserts spend very little time doing splits. Increasing m reduces splitting (and insertion) cost, because once a group becomes too full, the remaining entries are assigned to the other group without further computation. As expected, most of the space is taken up by the leaves.

Performance

Deletion cost is affected by the size of m. For large m:
- More nodes become under-full.
- More reinsertions take place.
- More splits are possible.
- Running time is poor for m = M/2.
Search is relatively insensitive to the splitting algorithm. Smaller values of m reduce the average number of entries per node, so less time is spent searching within each node (?).

Space Efficiency
A stricter node-fill requirement (larger m) produces a smaller index.
For very small m, the linear algorithm balances nodes. The other algorithms tend to produce unbalanced groups, which are likely to split, wasting more space.

Conclusions
The linear-time splitting algorithm is almost as good as the others.
A low node-fill requirement reduces space utilization but is not significantly worse than stricter node-fill requirements.
R-trees can be added to relational databases.