R-Trees
- Extension of B+-trees.
- Collection of d-dimensional rectangles.
- A point in d dimensions is a trivial rectangle.

Non-rectangular Data
- Non-rectangular data may be represented by their minimum bounding rectangles (MBRs).
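The MBR idea, along with the earlier remark that a point is a trivial rectangle, can be illustrated with a minimal Python sketch. It assumes 2-D data and stores a rectangle as an `(xmin, ymin, xmax, ymax)` tuple. `mbr_of_points` and `point_as_rect` are hypothetical helper names, not part of any library.

```python
# Sketch only: 2-D rectangles are (xmin, ymin, xmax, ymax) tuples (an assumption).
from typing import Iterable, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]


def mbr_of_points(points: Iterable[Point]) -> Rect:
    """Smallest axis-aligned rectangle enclosing all points (e.g. a polygon's vertices)."""
    pts = list(points)
    if not pts:
        raise ValueError("MBR of an empty set is undefined")
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def point_as_rect(p: Point) -> Rect:
    """A point is a trivial rectangle: zero width and zero height."""
    return (p[0], p[1], p[0], p[1])


triangle = [(1, 1), (4, 2), (2, 5)]
print(mbr_of_points(triangle))  # (1, 1, 4, 5)
print(point_as_rect((3, 3)))    # (3, 3, 3, 3)
```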

Operations
- Insert.
- Delete.
- Find all rectangles that intersect a query rectangle.
- Good for large rectangle collections stored on disk.

R-Trees: Structure
- Data nodes (leaves) contain rectangles.
- Index nodes (non-leaves) contain MBRs for the data in their subtrees.
- The MBR of the rectangles/MBRs in a non-root node is stored in that node's parent.

R-Trees: Structure
- R-tree of order M.
- Each node other than the root has between m and M rectangles/MBRs, where m <= ceil(M/2). Typically m = ceil(M/2); we assume this henceforth.
- The root has between 2 and M rectangles/MBRs (unless the root is itself a leaf).
- Each index node has as many MBRs as children.
- All data nodes are at the same level.
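These structural rules can be checked mechanically. The sketch below assumes a hypothetical `Node` class: a `leaf` flag, a list of rectangles/MBRs, and, for index nodes, a parallel list of children. It also assumes the slides' choice m = ceil(M/2). `check_rtree` is an illustrative name, and rectangles are 2-D `(xmin, ymin, xmax, ymax)` tuples.

```python
# Sketch only: hypothetical Node layout; 2-D rectangles as (xmin, ymin, xmax, ymax).
import math
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Rect = Tuple[float, float, float, float]


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


@dataclass
class Node:
    leaf: bool
    rects: List[Rect] = field(default_factory=list)        # leaf: data rectangles; index: child MBRs
    children: List["Node"] = field(default_factory=list)  # index nodes only, parallel to rects


def check_rtree(node: Node, M: int, is_root: bool = True, depth: int = 0,
                leaf_depths: Optional[Set[int]] = None) -> Set[int]:
    """Assert the order-M structural rules (with m = ceil(M/2)); return the set of leaf depths."""
    m = math.ceil(M / 2)
    if leaf_depths is None:
        leaf_depths = set()
    n = len(node.rects)
    assert n <= M, "at most M rectangles/MBRs per node"
    if not is_root:
        assert n >= m, f"non-root node needs at least m = {m} entries"
    elif not node.leaf:
        assert n >= 2, "root index node needs at least 2 entries"
    if node.leaf:
        leaf_depths.add(depth)
    else:
        assert len(node.children) == n, "one MBR per child"
        for r, child in zip(node.rects, node.children):
            check_rtree(child, M, False, depth + 1, leaf_depths)
            assert r == mbr(child.rects), "parent stores the child's MBR"
    assert len(leaf_depths) == 1, "all data nodes on the same level"
    return leaf_depths


leaf1 = Node(True, [(0, 0, 1, 1), (2, 2, 3, 3)])
leaf2 = Node(True, [(5, 5, 6, 6), (7, 7, 8, 8)])
root = Node(False, [mbr(leaf1.rects), mbr(leaf2.rects)], [leaf1, leaf2])
print(check_rtree(root, M=4))  # {1}
```

The root is exempt from the lower bound m. A root that is itself a leaf is also exempt from the lower bound of 2.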

Example
- R-tree of order 4.
- Each node may have up to 4 rectangles/MBRs (and, with m = ceil(4/2) = 2, each non-root node has at least 2).

Example
- Possible partitioning of our example data into 12 leaves.

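Example
Possible R-tree of order 4 with 12 leaves.
[Figure: the root holds MBRs m, n, o, p; their children are the 12 leaves a through l.]
- Leaves are data nodes that contain 4 input rectangles each.
- a-p are MBRs: a-l bound the leaves, and m-p bound the index nodes below the root.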

Example
Possible corresponding grouping: m encloses leaves a, b, c, d; n encloses e, f; o encloses g, h, i; p encloses the remaining leaves.

Query
- Report all rectangles that intersect a given rectangle.

Query
- Start at the root and find all MBRs that overlap the query.
- Search the corresponding subtrees recursively.

Query
[Figure: query rectangle x drawn over the root MBRs m, n, o, p.]

Search m.
[Figure: inside m, query rectangle x is compared with the leaf MBRs a, b, c, d.]
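The recursive query can be sketched in a few lines of Python. This sketch makes two assumptions the slides do not specify. First, it uses a hypothetical `Node` layout in which leaves hold data rectangles and index nodes hold child MBRs alongside their children. Second, it treats rectangles as closed, so rectangles that merely touch count as intersecting.

```python
# Sketch only: hypothetical Node layout; closed 2-D rectangles (xmin, ymin, xmax, ymax).
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]


@dataclass
class Node:
    leaf: bool
    rects: List[Rect] = field(default_factory=list)        # leaf: data; index: child MBRs
    children: List["Node"] = field(default_factory=list)  # index nodes only


def overlaps(a: Rect, b: Rect) -> bool:
    """Closed rectangles intersect iff their projections overlap on both axes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node: Node, q: Rect, out: Optional[List[Rect]] = None) -> List[Rect]:
    """Report every data rectangle that intersects the query rectangle q."""
    if out is None:
        out = []
    for i, r in enumerate(node.rects):
        if not overlaps(r, q):
            continue                          # prune: nothing below r can meet q
        if node.leaf:
            out.append(r)
        else:
            search(node.children[i], q, out)  # several subtrees may need searching
    return out


leaf_a = Node(True, [(0, 0, 1, 1), (2, 2, 3, 3)])
leaf_b = Node(True, [(10, 10, 11, 11), (12, 12, 13, 13)])
root = Node(False, [(0, 0, 3, 3), (10, 10, 13, 13)], [leaf_a, leaf_b])
print(search(root, (0.5, 0.5, 2.5, 2.5)))  # [(0, 0, 1, 1), (2, 2, 3, 3)]
```

Sibling MBRs may overlap, so a query can descend into several subtrees at the same level. A B+-tree search, by contrast, follows a single path.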

Insert
- Similar to insertion into a B+-tree, but a new rectangle may be inserted into any leaf; a leaf splits when its capacity is exceeded.
- Which leaf should we insert into?
- How do we split a node?

Insert: Leaf Selection
- Follow a path from the root to a leaf.
- At each node, move into the subtree whose MBR area increases least when the new rectangle is added.

Insert: Leaf Selection (examples)
- Insert into m.
- Insert into n.
- Insert into o.
- Insert into p.
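Here is a Python sketch of leaf selection. `choose_subtree` applies the least-area-increase rule from the slide. The tie-break on smaller MBR area comes from Guttman's original R-tree paper, not from these slides. The `Node` class and function names are hypothetical, and rectangles are 2-D `(xmin, ymin, xmax, ymax)` tuples.

```python
# Sketch only: hypothetical Node layout; 2-D rectangles (xmin, ymin, xmax, ymax).
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]


@dataclass
class Node:
    leaf: bool
    rects: List[Rect] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def choose_subtree(mbrs: List[Rect], new: Rect) -> int:
    """Index of the MBR whose area increases least; ties go to the smaller MBR."""
    return min(range(len(mbrs)),
               key=lambda i: (area(union(mbrs[i], new)) - area(mbrs[i]), area(mbrs[i])))


def choose_leaf(root: Node, new: Rect) -> List[Node]:
    """Root-to-leaf path followed by the new rectangle."""
    path = [root]
    while not path[-1].leaf:
        node = path[-1]
        path.append(node.children[choose_subtree(node.rects, new)])
    return path


print(choose_subtree([(0, 0, 2, 2), (5, 5, 6, 6)], (6, 6, 7, 7)))  # 1
```

The function returns the whole path because the MBRs stored along it must be enlarged after the insertion and any split.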

Insert: Split a Node
- Split the set of M+1 rectangles/MBRs into 2 sets, A and B.
- A and B each have at least m rectangles/MBRs.
- The sum of the areas of the MBRs of A and B is minimum.
Example: M = 8, m = 4.

Insert: Split a Node
- Exhaustive search for the best A and B.
- Compute area(MBR(A)) + area(MBR(B)) for each possible A. Note that for each A, the B is unique.
- Select the partition that minimizes this sum.
- When |A| = m = ceil(M/2), the number of choices for A is C(M+1, m) = (M+1)! / (m! (M+1-m)!). For M = 8, m = 4 this is 9!/(4! 5!) = 126.
- Impractical for large M.
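The exhaustive strategy is easy to write down, and the loop, which enumerates every admissible A, also shows why it does not scale. This is a brute-force sketch for illustration only. Rectangles are 2-D `(xmin, ymin, xmax, ymax)` tuples, `exhaustive_split` is a hypothetical name, and `math.comb` reproduces the count from the slide.

```python
# Sketch only: brute-force split of M+1 rectangles, each as (xmin, ymin, xmax, ymax).
import math
from itertools import combinations
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def exhaustive_split(rects: List[Rect], m: int) -> Optional[Tuple[float, List[Rect], List[Rect]]]:
    """Try every A with m <= |A| <= len(rects) - m; B is the complement of A."""
    n = len(rects)
    best = None
    for k in range(m, n - m + 1):
        for chosen in combinations(range(n), k):
            picked = set(chosen)
            A = [rects[i] for i in chosen]
            B = [rects[i] for i in range(n) if i not in picked]
            cost = area(mbr(A)) + area(mbr(B))
            if best is None or cost < best[0]:
                best = (cost, A, B)
    return best


print(math.comb(9, 4))  # 126 choices of A when M = 8 and |A| = m = 4
rects = [(0, 0, 1, 1), (1, 0, 2, 1), (10, 10, 11, 11), (11, 10, 12, 11)]
print(exhaustive_split(rects, m=2))
# (4, [(0, 0, 1, 1), (1, 0, 2, 1)], [(10, 10, 11, 11), (11, 10, 12, 11)])
```

Each partition is visited twice, once as A and once as its complement, which does no harm when searching for the minimum. For M = 8, m = 4, the admissible sizes are |A| = 4 or 5, and both enumerate the same 126 distinct partitions.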

Insert: Split a Node
- Grow A and B using a clustering strategy.
- Start with a seed rectangle a for A and b for B.
- Grow A and B one rectangle at a time.
- Stop when the M+1 rectangles have been partitioned into A and B.

Insert: Split a Node
Quadratic Method: seed selection.
- Let S be the set of M+1 rectangles to be partitioned.
- Find a and b in S that maximize area(MBR(a,b)) - area(a) - area(b), i.e. the pair that would waste the most area if placed in the same group.
Example: M = 8, m = 4.
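Here is quadratic seed selection in Python. Every pair in S is examined, so the cost is quadratic in the number of rectangles. Rectangles are assumed to be 2-D `(xmin, ymin, xmax, ymax)` tuples, and `quadratic_pick_seeds` is a hypothetical name.

```python
# Sketch only: Quadratic Method seed selection over rectangles (xmin, ymin, xmax, ymax).
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def quadratic_pick_seeds(S: List[Rect]) -> Tuple[int, int]:
    """Indices (a, b) maximizing area(MBR(a, b)) - area(a) - area(b) over all pairs."""
    return max(combinations(range(len(S)), 2),
               key=lambda p: area(union(S[p[0]], S[p[1]])) - area(S[p[0]]) - area(S[p[1]]))


S = [(0, 0, 1, 1), (0.5, 0.5, 1.5, 1.5), (10, 10, 11, 11)]
print(quadratic_pick_seeds(S))  # (0, 2)
```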

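Insert: Split a Node
Quadratic Method: assign the remaining rectangles/MBRs.
- Find an unassigned rectangle c that maximizes |(area(MBR(A,c)) - area(MBR(A))) - (area(MBR(B,c)) - area(MBR(B)))|, that is, the rectangle with the strongest preference for one partition over the other.
Example: M = 8, m = 4.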

Insert: Split a Node
Quadratic Method: assign the remaining rectangles/MBRs.
- Assign c to the partition whose MBR area increases least.
Example: M = 8, m = 4.

Insert: Split a Node
Quadratic Method: assign the remaining rectangles/MBRs.
- Continue assigning in this way until one partition has so few rectangles that all the remaining ones must go to it for it to reach m; then assign them all to that partition.
Example: M = 8, m = 4.
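The steps of the Quadratic Method fit together as follows. First pick the seeds. Then repeatedly take the unassigned rectangle with the largest |dA - dB| and assign it to the group whose MBR area grows least, until the stop rule applies. This is a sketch under the same tuple assumption. The secondary tie-breaks (smaller MBR area, then fewer rectangles) follow Guttman's paper rather than the slides, and `quadratic_split` is a hypothetical name.

```python
# Sketch only: full Quadratic Method split; rectangles are (xmin, ymin, xmax, ymax).
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def quadratic_split(S: List[Rect], m: int) -> Tuple[List[Rect], List[Rect]]:
    # Seeds: the pair that would waste the most area in one group.
    i, j = max(combinations(range(len(S)), 2),
               key=lambda p: area(mbr([S[p[0]], S[p[1]]])) - area(S[p[0]]) - area(S[p[1]]))
    A, B = [S[i]], [S[j]]
    rest = [r for k, r in enumerate(S) if k not in (i, j)]
    while rest:
        # Stop rule: a group that needs every remaining rectangle to reach m gets them all.
        if len(A) + len(rest) == m:
            A.extend(rest)
            break
        if len(B) + len(rest) == m:
            B.extend(rest)
            break
        mA, mB = mbr(A), mbr(B)

        def growth(c: Rect) -> Tuple[float, float]:
            """Area increase of A's and B's MBRs if c were added."""
            return area(mbr([mA, c])) - area(mA), area(mbr([mB, c])) - area(mB)

        # Pick the rectangle with the strongest preference, i.e. max |dA - dB|.
        c = max(rest, key=lambda r: abs(growth(r)[0] - growth(r)[1]))
        rest.remove(c)
        dA, dB = growth(c)
        # Least area increase wins; ties: smaller MBR area, then fewer rectangles.
        if (dA, area(mA), len(A)) <= (dB, area(mB), len(B)):
            A.append(c)
        else:
            B.append(c)
    return A, B


left = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (1, 1, 2, 2), (0.5, 0.5, 1.5, 1.5)]
right = [(10, 10, 11, 11), (11, 10, 12, 11), (10, 11, 11, 12), (11, 11, 12, 12)]
A, B = quadratic_split(left + right, m=4)  # M + 1 = 9 rectangles, m = 4 as on the slides
print(sorted(A) == sorted(left), sorted(B) == sorted(right))  # True True
```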

Insert: Split a Node
Linear Method: seed selection.
- Choose a and b to have maximum normalized separation.
- Separation in the x-dimension: find the rectangle with the highest low side (largest left x) and the rectangle with the lowest high side (smallest right x). These are the rectangles with maximum x-separation, and their separation is the difference between those two values.
- Divide by the x-width of the whole set to normalize.
- Repeat in the y-dimension: find the rectangles with maximum y-separation and divide by the y-width of the whole set to normalize.
- Choose as seeds the pair with the larger normalized separation.
Example: M = 8, m = 4.

Insert: Split a Node
Linear Method: assign the remainder.
- Assign the remaining rectangles in random order.
- Each rectangle is assigned to the partition whose MBR area increases least.
- Stop when all remaining rectangles must be assigned to one of the partitions so that it has its minimum required m rectangles; assign them to that partition.
Example: M = 8, m = 4.
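A matching sketch of the Linear Method follows. Seeds come from the largest normalized separation over the x- and y-dimensions. The remaining rectangles are then shuffled and assigned one at a time with no search for the "best next" rectangle, which is what keeps the method linear. This sketch assumes a fallback to the first two rectangles when no separating pair exists. The names `linear_pick_seeds` and `linear_split` are hypothetical.

```python
# Sketch only: Linear Method split; rectangles are (xmin, ymin, xmax, ymax).
import random
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def linear_pick_seeds(S: List[Rect]) -> Tuple[int, int]:
    """Seed pair with the largest normalized separation over the x- and y-dimensions."""
    best, seeds = None, None
    for lo, hi in ((0, 2), (1, 3)):  # (xmin, xmax), then (ymin, ymax)
        a = max(range(len(S)), key=lambda k: S[k][lo])  # highest low side
        b = min(range(len(S)), key=lambda k: S[k][hi])  # lowest high side
        width = max(r[hi] for r in S) - min(r[lo] for r in S)  # extent of the whole set
        if a == b or width <= 0:
            continue
        sep = (S[a][lo] - S[b][hi]) / width  # normalized separation
        if best is None or sep > best:
            best, seeds = sep, (b, a)
    return seeds if seeds is not None else (0, 1)  # hypothetical fallback for degenerate input


def linear_split(S: List[Rect], m: int,
                 rng: Optional[random.Random] = None) -> Tuple[List[Rect], List[Rect]]:
    i, j = linear_pick_seeds(S)
    A, B = [S[i]], [S[j]]
    rest = [r for k, r in enumerate(S) if k not in (i, j)]
    (rng or random.Random()).shuffle(rest)  # remaining rectangles in random order
    while rest:
        # Stop rule: a group that needs all remaining rectangles to reach m takes them.
        if len(A) + len(rest) == m:
            A.extend(rest)
            break
        if len(B) + len(rest) == m:
            B.extend(rest)
            break
        c = rest.pop()
        mA, mB = mbr(A), mbr(B)
        dA = area(mbr([mA, c])) - area(mA)
        dB = area(mbr([mB, c])) - area(mB)
        (A if dA <= dB else B).append(c)  # least MBR area increase wins
    return A, B


left = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (1, 1, 2, 2), (0.5, 0.5, 1.5, 1.5)]
right = [(10, 10, 11, 11), (11, 10, 12, 11), (10, 11, 11, 12), (11, 11, 12, 12)]
A, B = linear_split(left + right, m=4, rng=random.Random(0))
print(sorted(A) == sorted(left), sorted(B) == sorted(right))  # True True
```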

Delete
- If the leaf doesn't become deficient, simply readjust the MBRs on the path from the root.
- If the leaf becomes deficient, borrow from its nearest sibling (if possible) and readjust the MBRs; otherwise, combine it with a sibling as in a B+-tree.
- Alternatively, a more global reorganization could be done to obtain a better R-tree.
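The following Python sketch of deletion covers only the first case on the slide: it removes the rectangle from its leaf and readjusts the MBRs on the root-to-leaf path. Underflow handling (borrowing from or combining with a sibling) is deliberately left out, so the function just reports a deficient leaf. `Node`, `find_leaf`, and `delete` are hypothetical names, and rectangles are 2-D `(xmin, ymin, xmax, ymax)` tuples.

```python
# Sketch only: hypothetical Node layout; underflow (borrow/merge) is not implemented.
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]


@dataclass
class Node:
    leaf: bool
    rects: List[Rect] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def find_leaf(node: Node, r: Rect, path: List[Node]) -> Optional[List[Node]]:
    """Root-to-leaf path to the leaf holding r, exploring only MBRs that contain r."""
    path.append(node)
    if node.leaf:
        if r in node.rects:
            return path
    else:
        for box, child in zip(node.rects, node.children):
            if contains(box, r) and find_leaf(child, r, path) is not None:
                return path
    path.pop()
    return None


def delete(root: Node, r: Rect, M: int) -> str:
    """Returns 'not found', 'ok', or 'deficient' (borrow/merge is left to the caller)."""
    path = find_leaf(root, r, [])
    if path is None:
        return "not found"
    leaf = path[-1]
    leaf.rects.remove(r)
    if leaf.rects:  # an emptied leaf is left entirely to the underflow handling
        # Readjust the MBRs on the path, from the leaf's parent up to the root.
        for d in range(len(path) - 2, -1, -1):
            parent, child = path[d], path[d + 1]
            k = next(i for i, ch in enumerate(parent.children) if ch is child)
            parent.rects[k] = mbr(child.rects)
    deficient = leaf is not root and len(leaf.rects) < math.ceil(M / 2)
    return "deficient" if deficient else "ok"


leaf1 = Node(True, [(0, 0, 1, 1), (2, 2, 3, 3), (0, 2, 1, 3)])
leaf2 = Node(True, [(10, 10, 11, 11), (12, 12, 13, 13)])
root = Node(False, [mbr(leaf1.rects), mbr(leaf2.rects)], [leaf1, leaf2])
print(delete(root, (2, 2, 3, 3), M=4), root.rects[0])  # ok (0, 0, 1, 3)
print(delete(root, (10, 10, 11, 11), M=4))             # deficient
```

Sibling MBRs may overlap, so `find_leaf` may have to try several subtrees before it finds the rectangle. The MBRs above a deficient (but non-empty) leaf are still readjusted, which keeps the tree consistent until the caller handles the underflow.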

Variants
- R*-tree: leaf selection and node overflows during insertion are handled differently.
- Hilbert R-tree.

Related Structures
R+-tree
- Index nodes have non-overlapping rectangles.
- A data object may be represented in several data nodes.
- No upper bound on the size of a data node.
- No bounds (lower or upper) on the degree of an index node.

Related Structures
Cell tree
- Combines BSP (binary space partitioning) and R+-tree concepts.
- Index nodes have non-overlapping convex polyhedrons.
- No lower or upper bound on the size of a data node.
- A lower bound (but not an upper bound) on the degree of an index node.