R-Trees: 2-dimensional indexing structure


R-Trees 2-dimensional indexing structure

R-trees
2-dimensional version of the B-tree
Example: B-tree of maximum degree 8; degree between 3 and 8
Internal nodes with k children have k-1 split values

R-trees
Can store:
  – a set of polygons (regions of a subdivision)
  – a set of polygonal lines (or boundaries)
  – a set of points
  – a mix of the above
Stored objects may overlap

R-trees
Originally by Guttman, 1984
Dozens of variations and optimizations since
Suitable for windowing, point location and intersection queries
Heuristic structure: no worst-case order bounds ( O(..) ) on query time
Tree with high degree: suitable for secondary (disk) storage because search paths are short; one node per disk block

Definition R-tree
Every internal node contains entries (rectangle, pointer to child node)
All leaves contain entries (rectangle, pointer to object); the objects are in a database or file
Rectangles are minimal bounding rectangles (MBRs)
The root has ≥ 2 and ≤ M entries
All other nodes have at least m and at most M entries
All leaves have the same depth
m > 1 and M > 2m (e.g. m = 200; M = 1000)
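The definition above can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `Rect`, `Entry`, `Node`, `mbr` and `check_node` are hypothetical, rectangles are taken to be axis-parallel tuples (xmin, ymin, xmax, ymax), and the check covers entry counts and MBR consistency but not the equal-leaf-depth rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), an assumption


def mbr(rects):
    """Minimal bounding rectangle of a non-empty collection of rectangles."""
    xmins, ymins, xmaxs, ymaxs = zip(*rects)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None  # used in internal nodes
    obj: object = None              # used in leaves (hypothetical object id)


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


def check_node(node, m, M, is_root=False):
    """Check entry counts and that every internal rectangle is the MBR of its child."""
    low = 2 if is_root else m
    if not (low <= len(node.entries) <= M):
        return False
    if node.is_leaf:
        return True
    for e in node.entries:
        if e.rect != mbr([c.rect for c in e.child.entries]):
            return False
        if not check_node(e.child, m, M):
            return False
    return True
```

With m = 2 and M = 4, a root with two leaves of two entries each passes the check; widening one root rectangle beyond the MBR of its child makes it fail.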

Object descriptions

Grouping of objects
Windowing query: the fewer rectangles intersected, the fewer subtrees to descend into

Grouping of objects
Objects close together in same leaves ⇒ small rectangles ⇒ queries descend in only few subtrees
Group the child nodes under a parent node such that small rectangles arise

Heuristics for fast queries
Small area of rectangles
Small perimeter of rectangles
Little overlap among rectangles
Well-filled nodes (tree less deep ⇒ fewer disk accesses on each search path)
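The three geometric measures named above can be computed as follows. A minimal sketch, assuming axis-parallel rectangles stored as (xmin, ymin, xmax, ymax) tuples; the function names are hypothetical.

```python
def area(r):
    """Area of rectangle r = (xmin, ymin, xmax, ymax)."""
    return (r[2] - r[0]) * (r[3] - r[1])


def perimeter(r):
    """Perimeter of rectangle r."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a, b):
    """Area of the intersection of a and b (0 if they are disjoint or only touch)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0
```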

Example R-tree

Object descriptions

point containment query

Searching in an R-tree
Q is the query object (point, window, object); we search for intersections with stored objects
For each rectangle R in the current node, if Q and R intersect:
  – at an internal node: search recursively in the subtree under the pointer at R
  – at a leaf: get the object corresponding to R and test it for intersection with Q
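The search procedure can be sketched recursively. This is an illustration under assumptions, not the deck's code: a node is modelled as a hypothetical pair `(kind, entries)` with `kind` either `"internal"` or `"leaf"`, each entry is `(rect, child_or_object)`, the query Q is a rectangle (a point is a degenerate rectangle), and the exact object test is a caller-supplied predicate `exact` that defaults to accepting every object whose MBR intersects Q.

```python
def intersects(a, b):
    """True if axis-parallel rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node, q, exact=lambda obj, q: True):
    """Return all stored objects whose rectangle intersects q and that pass exact()."""
    kind, entries = node
    hits = []
    for rect, item in entries:
        if not intersects(rect, q):
            continue                                  # prune: skip this entry
        if kind == "internal":
            hits.extend(search(item, q, exact))       # descend into the child node
        elif exact(item, q):
            hits.append(item)                         # leaf: test the object itself
    return hits


leaf1 = ("leaf", [((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")])
leaf2 = ("leaf", [((5, 5, 6, 6), "c"), ((7, 5, 8, 6), "d")])
root = ("internal", [((0, 0, 3, 3), leaf1), ((5, 5, 8, 6), leaf2)])
window_hits = search(root, (2.5, 2.5, 5.5, 5.5))
point_hits = search(root, (0.5, 0.5, 0.5, 0.5))
```

Because stored rectangles may overlap, a single query can descend into several subtrees, which is why the number of visited nodes has no good worst-case bound.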

Nearest neighbor queries
An R-tree can be used for nearest neighbor queries
The idea is to perform a DFS, maintain the closest object so far and use its distance for pruning
Figure legend: closest object so far; queried; pruned
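The pruned depth-first search can be sketched as below. Assumptions, all labelled because the slide gives only the idea: nodes are hypothetical `(kind, entries)` pairs as in a simple tuple-based tree, stored objects are points kept as degenerate rectangles (so the distance to a leaf rectangle is the distance to the object), and `mindist` is the usual minimum Euclidean distance from a point to a rectangle.

```python
import math


def mindist(p, r):
    """Smallest Euclidean distance from point p to rectangle r (0 if p is inside)."""
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def nearest(node, p, best=(math.inf, None)):
    """Depth-first nearest neighbor search; returns (distance, object)."""
    kind, entries = node
    # Visit entries closest-first so a good candidate is found early.
    for rect, item in sorted(entries, key=lambda e: mindist(p, e[0])):
        d = mindist(p, rect)
        if d >= best[0]:
            break                      # this and all later entries cannot improve: prune
        if kind == "leaf":
            best = (d, item)           # new closest object so far
        else:
            best = nearest(item, p, best)
    return best


leaf1 = ("leaf", [((1, 1, 1, 1), "a"), ((5, 5, 5, 5), "b")])
leaf2 = ("leaf", [((9, 1, 9, 1), "c"), ((9, 9, 9, 9), "d")])
root = ("internal", [((1, 1, 5, 5), leaf1), ((9, 1, 9, 9), leaf2)])
dist, obj = nearest(root, (4, 4))
```

In this small tree the second leaf is never opened: its rectangle is at distance 5 from the query point, farther than the candidate already found in the first leaf.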

Inserting in an R-tree
Determine the minimal bounding rectangle (MBR) R of the new object
When not yet at a leaf (Choose Subtree):
  – determine the rectangle whose area increment after insertion of R is smallest
  – enlarge this rectangle if necessary and insert R in its subtree
At a leaf:
  – if there is space, insert; otherwise Split Node
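The Choose Subtree step can be sketched as a function that picks the entry needing the least area enlargement. A minimal sketch with hypothetical names, assuming rectangles are (xmin, ymin, xmax, ymax) tuples; breaking ties by the smaller rectangle area follows Guttman's description and is not stated on the slide.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    """MBR of two rectangles."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(r, new):
    """Area increment of r when it is grown to also cover new."""
    return area(union(r, new)) - area(r)


def choose_subtree(rects, new):
    """Index of the entry rectangle with the smallest area increment (ties: smaller area)."""
    return min(range(len(rects)),
               key=lambda i: (enlargement(rects[i], new), area(rects[i])))


entries = [(0, 0, 2, 2), (5, 5, 6, 6)]
chosen = choose_subtree(entries, (1, 1, 3, 3))
```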

Split Node
Divide the M+1 rectangles into two groups, each with at least m and at most M rectangles
Make a node for each group, with the rectangles and corresponding subtrees as entries
Hang the two new nodes under the parent node in the place of the overfull node; determine the new MBRs (if the root was overfull, make a new root with two child nodes)
If the parent has M+1 children, repeat Split Node with this parent

Split Node, example New MBRs

Strategies for Split Node, I
Determine R_1 and R_2 with largest MBR: the seeds for sets S_1 and S_2
While |S_1|, |S_2| < M - m and not all rectangles distributed:
  – take a not yet distributed rectangle R_j and add it to the set whose MBR increases least
Linear R-tree of Guttman, 1984

Example Split Node I

Strategies for Split Node, II
Determine R_1 and R_2 with largest area(MBR(R_1 ∪ R_2)) - area(R_1) - area(R_2): the seeds for sets S_1 and S_2
While |S_1|, |S_2| < M - m and not all rectangles distributed:
  – determine for every not yet distributed rectangle R_j:
      d_1 = area increment of MBR(S_1 ∪ R_j) w.r.t. MBR(S_1)
      d_2 = area increment of MBR(S_2 ∪ R_j) w.r.t. MBR(S_2)
  – choose the R_j with maximal | d_1 - d_2 | and add it to the set with the smallest area increment
Quadratic R-tree of Guttman, 1984
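The quadratic strategy can be sketched as follows. This is an illustration with hypothetical names, not a reference implementation. One assumption differs from the slide's wording: instead of the loop bound on |S_1| and |S_2|, the sketch uses Guttman's formulation of the same minimum-fill guarantee, namely that a group takes all remaining rectangles as soon as it needs them to reach m.

```python
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def quadratic_split(rects, m):
    """Split M+1 rectangles into two groups with at least m rectangles each."""
    # Seeds: the pair that wastes the most area when put in one MBR.
    i, j = max(combinations(range(len(rects)), 2),
               key=lambda p: area(union(rects[p[0]], rects[p[1]]))
               - area(rects[p[0]]) - area(rects[p[1]]))
    g1, g2 = [rects[i]], [rects[j]]
    b1, b2 = rects[i], rects[j]          # current MBRs of the two groups
    rest = [r for k, r in enumerate(rects) if k not in (i, j)]

    def growth(r):
        return (area(union(b1, r)) - area(b1), area(union(b2, r)) - area(b2))

    while rest:
        if len(g1) + len(rest) == m:     # g1 needs everything left to reach m
            g1 += rest
            break
        if len(g2) + len(rest) == m:     # g2 needs everything left to reach m
            g2 += rest
            break
        # Pick the rectangle with the strongest preference for one group.
        r = max(rest, key=lambda x: abs(growth(x)[0] - growth(x)[1]))
        rest.remove(r)
        d1, d2 = growth(r)
        if d1 <= d2:
            g1.append(r)
            b1 = union(b1, r)
        else:
            g2.append(r)
            b2 = union(b2, r)
    return g1, g2


sample = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (10, 10, 11, 11), (11, 10, 12, 11)]
group1, group2 = quadratic_split(sample, m=2)
```

On the sample input the three rectangles near the origin end up in one group and the two far rectangles in the other. Seed selection examines all pairs, which is where the quadratic cost comes from.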

Example Split Node, II

Strategies for Split Node, III
Determine R_1 and R_2 with largest area(MBR(R_1 ∪ R_2)) - area(R_1) - area(R_2): the seeds for sets S_1 and S_2 (same as quadratic R-tree)
Determine the axis with the largest normalized separation of R_1 and R_2 ( x-separation / x-range of MBR(R_1 ∪ R_2), or y-separation / y-range of MBR(R_1 ∪ R_2) )
Sort the rectangles according to that axis (lower left corner) and split evenly into subsets of size (M+1) / 2
Greene's split, 1989
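A sketch of this axis-based split, with hypothetical names and two labelled assumptions: separation along an axis is measured as the gap between the two seeds (negative if they overlap on that axis), and when M+1 is odd the middle rectangle goes to the group whose MBR grows least, since the slide only says "split evenly".

```python
from functools import reduce
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pick_seeds(rects):
    """Pair with the largest wasted area, as in the quadratic strategy."""
    return max(combinations(rects, 2),
               key=lambda p: area(union(p[0], p[1])) - area(p[0]) - area(p[1]))


def greene_split(rects):
    r1, r2 = pick_seeds(rects)

    def normalized_separation(axis):       # axis 0 = x, axis 1 = y
        gap = max(r1[axis], r2[axis]) - min(r1[axis + 2], r2[axis + 2])
        extent = max(r1[axis + 2], r2[axis + 2]) - min(r1[axis], r2[axis])
        return gap / extent if extent > 0 else 0.0

    axis = 0 if normalized_separation(0) >= normalized_separation(1) else 1
    ordered = sorted(rects, key=lambda r: r[axis])   # sort by lower coordinate
    half = len(ordered) // 2
    g1, g2 = ordered[:half], ordered[len(ordered) - half:]
    if len(ordered) % 2:                   # odd count: place the middle rectangle
        mid = ordered[half]
        b1, b2 = reduce(union, g1), reduce(union, g2)
        if area(union(b1, mid)) - area(b1) <= area(union(b2, mid)) - area(b2):
            g1.append(mid)
        else:
            g2.append(mid)
    return axis, g1, g2


sample = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (10, 10, 11, 11), (11, 10, 12, 11)]
split_axis, group1, group2 = greene_split(sample)
```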

Example Split Node, III
Y-axis has largest normalized separation

Deletion from an R-tree
Find the leaf (node) and delete the object; determine the new (possibly smaller) MBR
If the node is too empty (< m entries):
  – delete the node recursively at its parent
  – insert all entries of the deleted node into the R-tree
Note: reinsertion of entries/subtrees always occurs at the level they came from

Insert as rectangle on middle level

Insert in a leaf object

R*-trees
Experimentally determined measures for choices at insertion (Choose Subtree, Split Node)
Experimentally determined algorithms for:
  – Choose Subtree
  – Split Node

R*-trees; Choose Subtree
R_1, …, R_p are the entry rectangles
At nodes directly above leaves: choose the entry (rectangle) with smallest overlap-increase
At higher nodes: choose the entry (rectangle) with smallest area-increase (same as before)
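The overlap-increase criterion can be sketched as below. Names are hypothetical and rectangles are assumed to be (xmin, ymin, xmax, ymax) tuples. The overlap of an entry is taken as the sum of its intersection areas with the other entries of the same node, and ties are broken by area increase; both follow the R*-tree paper rather than the slide.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def overlap_increase(rects, k, new):
    """Growth of the total overlap of entry k with its siblings if it absorbs new."""
    grown = union(rects[k], new)
    return sum(overlap(grown, r) - overlap(rects[k], r)
               for i, r in enumerate(rects) if i != k)


def choose_subtree_above_leaves(rects, new):
    """Entry with the smallest overlap increase (ties: smallest area increase)."""
    return min(range(len(rects)),
               key=lambda k: (overlap_increase(rects, k, new),
                              area(union(rects[k], new)) - area(rects[k])))


entries = [(0, 0, 1, 1), (2, 0, 3, 10)]
chosen = choose_subtree_above_leaves(entries, (4, 0, 5, 1))
```

In this example the two criteria disagree: growing the first entry costs less area (4 versus 20) but makes it overlap the second, so the overlap criterion picks the second entry.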

R*-trees; Split Node
Determine split axis:
For both the x- and the y-axis:
  – sort the rectangles by smallest and largest coordinate
  – determine the M - 2m + 2 allowed distributions into two groups
  – determine for each: the perimeter of the two MBRs
  – add the M - 2m + 2 perimeter lengths
Choose the axis with smallest sum of perimeters

R*-trees; Split Node
Determine split index (given the split axis):
Choose the distribution, among the M - 2m + 2, with the smallest area of intersection of the MBRs
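Both steps of the R*-tree split (axis, then index) can be sketched together. Names are hypothetical; the sketch sums perimeters over both sort orders of each axis and breaks overlap ties by the smaller total area, which follows the R*-tree paper and is an assumption relative to the slides.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def perimeter(r):
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def mbr(rects):
    xmins, ymins, xmaxs, ymaxs = zip(*rects)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


def rstar_split(rects, m):
    """Split the M+1 rectangles of an overfull node into two groups of size >= m."""
    n = len(rects)
    best = None                                   # (perimeter sum, distributions)
    for axis in (0, 1):                           # 0 = x, 1 = y
        perimeter_sum, distributions = 0, []
        for corner in (axis, axis + 2):           # sort by lower, then upper coordinate
            ordered = sorted(rects, key=lambda r: r[corner])
            for k in range(m, n - m + 1):         # the M - 2m + 2 allowed split points
                g1, g2 = ordered[:k], ordered[k:]
                perimeter_sum += perimeter(mbr(g1)) + perimeter(mbr(g2))
                distributions.append((g1, g2))
        if best is None or perimeter_sum < best[0]:
            best = (perimeter_sum, distributions)
    # Split index: least overlap between the two MBRs, then least total area.
    return min(best[1], key=lambda g: (overlap(mbr(g[0]), mbr(g[1])),
                                       area(mbr(g[0])) + area(mbr(g[1]))))


sample = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (10, 10, 11, 11), (11, 10, 12, 11)]
group1, group2 = rstar_split(sample, m=2)
```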

Forced reinsert
Build R-tree by repeated insertion: the first inserted rectangles are possibly badly placed
Experiment:
  – make an R-tree by inserting rectangles
  – repeat, but afterwards delete the first inserted rectangles and insert them again
Search time improves by 20-50%

Summary R-trees
Versatile 2-dimensional search tree (referred to as: indexing structure, or spatial index)
Some R-tree version is used in most GIS
Well-suited for windowing, point location, intersection, and nearest neighbor queries
Heuristic structure: no worst-case order bounds ( O(..) ) on query time
Dynamic; insertions and deletions supported
Tree with high degree: well-suited for secondary (disk) storage because search paths are short