R-Tree

2 Spatial Database (Ia) Consider: Given a city map, ‘index’ all university buildings in an efficient structure for quick topological search.

3 Spatial Database (Ib) Consider: Given a city map, ‘index’ all university buildings in an efficient structure for quick topological search. Spatial object: Contour (outline) of the area around the building(s). Minimum bounding region (MBR) of the object.

4 Spatial Database (Ic) Consider: Given a city map, ‘index’ all university buildings in an efficient structure for quick relational-topological search. MBR of the city neighbourhoods. MBR of the city defining the overall search region.

5 Spatial Database (II) Notion: To retrieve data items quickly and efficiently according to their spatial locations. Involves 2D regions. Need to support 2D range queries. Multiple return values desired: Answering a query region by reporting all spatial objects that are fully-contained-in or overlapping the query region (Spatial-Access Method – SAM). In general: Spatial data objects often cover areas in multidimensional spaces. Spatial data objects are not well-represented by point-location. An ‘index’ based on an object’s spatial location is desirable.

6 The Indexing Approach A B-Tree (Rosenberg & Snyder, 1981) is an ordered, dynamic, multi-way structure of order m (i.e. each node has at most m children). The keys and the subtrees are arranged in the fashion of a search tree. Each node may contain a large number of keys, so the number of subtrees in each node may also be large. The B-Tree is designed (among other objectives): – to branch out in this large number of directions, and – to hold many keys in each node so that the height of the tree stays relatively small. [Figure: example B-Tree]

7 The R-Tree Index Structure An R-Tree is a height-balanced tree, similar to a B-Tree. Index records in the leaf nodes contain pointers to the actual spatial objects they represent. Leaves in the structure all appear on the same level. Spatial searching requires visiting only a small number of nodes. The index is completely dynamic: inserts and deletes can be intermixed with searches. No periodic reorganisation is required.

8 The R-Tree Index Structure A spatial database consists of a collection of tuples representing spatial objects, known as Entries. Each Entry has a unique identifier that points to one spatial object, and its MBR; i.e. Entry = (MBR, pointer).

9 R-Tree Index Structure – Leaf Entries An entry E in a leaf node is defined as (Guttman, 1984): E = (I, tuple-identifier), where I is the smallest bounding n-dimensional region (MBR) that encloses the spatial data pointed to by the tuple-identifier. I is a sequence of closed intervals, one for each dimension of the bounding region. Example: in 2D, I = (I_x, I_y), where I_x = [x_a, x_b] and I_y = [y_a, y_b].

10 R-Tree Index Structure – Leaf Entries In general, I = (I_0, I_1, …, I_{n-1}) for n dimensions, where I_k = [k_a, k_b]. If either k_a or k_b (or both) is infinite (±∞), the spatial object extends outward indefinitely along that dimension.
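The interval notation above can be sketched in Python. This is only an illustration: the names `MBR` and `mbr_of_points` and the coordinates are assumptions made for the example, not part of Guttman's definition.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # closed interval [k_a, k_b]


@dataclass(frozen=True)
class MBR:
    """I = (I_0, ..., I_{n-1}): one closed interval per dimension."""
    intervals: Tuple[Interval, ...]


def mbr_of_points(points: Sequence[Sequence[float]]) -> MBR:
    """Smallest axis-aligned box enclosing the given n-dimensional points."""
    dims = len(points[0])
    return MBR(tuple(
        (min(p[k] for p in points), max(p[k] for p in points))
        for k in range(dims)
    ))


# A building outline given as 2D vertices (made-up coordinates).
outline = [(2.0, 1.0), (5.0, 1.5), (4.0, 4.0), (1.5, 3.0)]
box = mbr_of_points(outline)  # I_x = [1.5, 5.0], I_y = [1.0, 4.0]

# An object that extends indefinitely along +x gets an infinite endpoint.
unbounded = MBR(((0.0, float("inf")), (0.0, 1.0)))
```

Storing one interval per dimension keeps the same code valid for any n; a 2D implementation would often flatten this to four numbers.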

11 R-Tree Index Structure – Non-Leaf Entries An entry E in a non-leaf node is defined as: E = (I, child-pointer), where the child-pointer points to a child of this node, and I is the MBR that encloses all the regions in the child node's entries. [Figure: a non-leaf node with entries I(A), I(B), …, I(M), and the child node of B with entries I(a), I(b), I(c), I(d)]
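As a small illustration of how the I of a non-leaf entry follows from the child node's entries, the sketch below takes the union of the child rectangles. The tuple layout `(x_a, y_a, x_b, y_b)` and the function name `union` are assumptions chosen for this example.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_a, y_a, x_b, y_b)


def union(rects: List[Rect]) -> Rect:
    """Smallest rectangle that encloses every rectangle in rects."""
    return (
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    )


# MBRs stored in the entries of a child node (made-up values).
child_entries = [(0, 0, 2, 2), (1, 1, 4, 3), (3, 0, 5, 1)]

# The parent entry that points to this child stores their union as its I.
parent_I = union(child_entries)
```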

12 Properties Let M be the maximum number of entries that will fit in one node, and let m ≤ M/2 be a parameter specifying the minimum number of entries in one node. An R-Tree must then satisfy the following properties:
1. Every leaf node contains between m and M index records, unless it is the root.
2. For each index-record Entry (I, tuple-identifier) in a leaf node, I is the MBR that spatially contains the n-dimensional data object represented by the tuple-identifier.
3. Every non-leaf node has between m and M children, unless it is the root.
4. For each Entry (I, child-pointer) in a non-leaf node, I is the MBR that spatially contains the regions in the child node.
5. The root has at least two children, unless it is a leaf.
6. All leaves appear on the same level.
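The properties can be checked mechanically. The sketch below is one possible checker under assumed names (`Node`, `union`, `check`) and a 2D tuple layout; it does not verify property 2, because exact object geometries are not modelled here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_a, y_a, x_b, y_b)


@dataclass
class Node:
    leaf: bool
    rects: List[Rect]                                      # one MBR per entry
    children: List["Node"] = field(default_factory=list)   # empty for leaves


def union(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def check(node: Node, m: int, M: int, is_root: bool = True) -> int:
    """Return the height of the subtree; raise AssertionError on a violation."""
    n = len(node.rects)
    if is_root:
        assert n <= M, "root holds more than M entries"
        assert node.leaf or n >= 2, "property 5 violated"
    else:
        assert m <= n <= M, "property 1 or 3 violated"
    if node.leaf:
        return 0
    assert len(node.children) == n, "one child per non-leaf entry"
    heights = set()
    for rect, child in zip(node.rects, node.children):
        assert rect == union(child.rects), "property 4 violated"
        heights.add(check(child, m, M, is_root=False))
    assert len(heights) == 1, "property 6 violated"
    return heights.pop() + 1


# Two leaves under one root (made-up rectangles), with m = 2 and M = 4.
R = Node(True, [(0, 0, 2, 2), (1, 1, 3, 3)])
S = Node(True, [(6, 0, 8, 2), (7, 1, 9, 4)])
root = Node(False, [union(R.rects), union(S.rects)], [R, S])
height = check(root, m=2, M=4)
```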

13 Node Overflow and Underflow A Node-Overflow happens when a new Entry is added to a fully packed node, causing the resulting number of entries in the node to exceed the upper bound M. The overflowing node must be split, and all its current entries, together with the new one, redistributed for a locally optimal arrangement. A Node-Underflow happens when one or more Entries are removed from a node, causing the remaining number of entries in that node to fall below the lower bound m. The underflowing node must be condensed, and its entries redistributed for a globally optimal arrangement.
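To make the overflow case concrete, here is a deliberately naive split sketch. It is not Guttman's quadratic or linear split: it simply sorts the M + 1 entries by centre along the axis with the larger spread and cuts the list in two. The function name `naive_split` and the sample rectangles are assumptions for illustration.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_a, y_a, x_b, y_b)


def centre(r: Rect, axis: int) -> float:
    """Centre of r along axis 0 (x) or axis 1 (y)."""
    return (r[axis] + r[axis + 2]) / 2


def naive_split(rects: List[Rect]) -> Tuple[List[Rect], List[Rect]]:
    """Split the entries of an overflowing node into two groups.

    With M + 1 entries and m <= M/2, the smaller group still has at
    least m entries, so neither new node underflows.
    """
    spreads = []
    for axis in (0, 1):
        centres = [centre(r, axis) for r in rects]
        spreads.append(max(centres) - min(centres))
    axis = 0 if spreads[0] >= spreads[1] else 1
    ordered = sorted(rects, key=lambda r: centre(r, axis))
    cut = len(ordered) - len(ordered) // 2  # larger half goes first
    return ordered[:cut], ordered[cut:]


# A node with M = 4 receives a fifth entry and overflows (made-up data).
overflowing = [(0, 0, 1, 1), (1, 0, 2, 1), (8, 0, 9, 1),
               (9, 1, 10, 2), (0.5, 0.5, 1.5, 1.5)]
left, right = naive_split(overflowing)
```

A real implementation would choose the groups so as to minimise the area (or overlap) of the two resulting MBRs; the cut used here only guarantees the size bounds.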

Spatial Indexes Used to speed up spatial queries. Example: point query (return the geometric object that contains a given query point). Sequentially scanning all objects of a large collection to check whether they contain the query point involves a high number of disk accesses and repeated evaluation of computationally expensive geometric predicates (e.g., containment, intersection). Reducing the set of objects to be processed is therefore highly desirable.

Indexes for object-based and space-based representations Indexes for raster data: based on recursive subdivision of the space Example: quadtrees Indexes for vector data: differ depending on the type of data (extensions of quadtrees are used also for vector data)

Vector Data Indexing Different indexing methods are used for point, linear and polygonal data. In the case of collections of polygons, instead of indexing the object geometries themselves, whose shapes might be complex, we consider an approximation of the geometry and index it instead. Most commonly used approximation: minimum bounding rectangle (MBR), also called minimum bounding box (MBB).

MBRs By using the MBR as the geometric key for building the spatial index, we save the cost of evaluating expensive geometric predicates during index traversal (a geometric test against an MBR takes constant time). Example: point-in-polygon test. In addition, the space required to store a rectangle is constant (2 points).

MBRs (cont'd) An operation involving a spatial predicate on a collection of objects indexed on their MBRs is performed in two steps:
1. Filter step: selects the objects whose MBR satisfies the spatial predicate (by traversing the spatial index and applying the predicate to the MBRs).
2. Refinement step: the objects that pass the filter step are a superset of the solution. An MBR might satisfy the predicate but the corresponding object might not.
[Figure: an object obj, its MBR, and a point P]

Refinement step The objects that pass the filter step are a superset of the solution. An MBR might satisfy the predicate but the corresponding object might not. Therefore, in this step the spatial predicate is applied to the actual geometry of the object. [Figure: an object obj, its MBR, and a point P]
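The two steps can be sketched for a point query over a flat list of polygons (no index traversal, to keep the example short). The object names, coordinates and helper functions are assumptions for illustration; the refinement uses a standard ray-casting point-in-polygon test.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_a, y_a, x_b, y_b)


def mbr(poly: List[Point]) -> Rect:
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return (min(xs), min(ys), max(xs), max(ys))


def in_rect(p: Point, r: Rect) -> bool:
    """Constant-time test used by the filter step."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def in_polygon(p: Point, poly: List[Point]) -> bool:
    """Ray casting: count the edges crossed by a ray going right from p."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


objects: Dict[str, List[Point]] = {
    "triangle": [(0, 0), (4, 0), (0, 4)],
    "square": [(2, 2), (5, 2), (5, 5), (2, 5)],
}
P = (3, 3)

# Filter step: cheap test against the MBRs only.
candidates = [name for name, poly in objects.items() if in_rect(P, mbr(poly))]
# Refinement step: exact test, applied to the candidates only.
result = [name for name in candidates if in_polygon(P, objects[name])]
```

Here P lies inside the triangle's MBR but outside the triangle itself, so the triangle passes the filter step and is discarded by the refinement step.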

Oracle Spatial Query Model Spatial Layer Data (table where coordinates are stored) → Primary Filter (spatial index: the index retrieves the area of interest) → Reduced Data Set → Secondary Filter (spatial functions: procedures that determine the exact relationship) → Exact Result Set

Oracle Spatial Indexing Methods Two types of indexes are implemented in Oracle Spatial: R-trees and quadtrees.

R-trees Based on MBRs (minimum bounding rectangles) Defined for indexing 2D objects (can be extended to higher dimensions but implemented only for 2D in Oracle Spatial) MBRs of geometric objects form the leaves of the index tree Multiple MBRs are grouped into larger rectangles (MBRs) to form intermediate nodes in the tree Repeat until one rectangle is left that contains everything
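The "group, then repeat until one rectangle is left" idea can be sketched as a simple bottom-up packing. This is an assumption-laden illustration, not Oracle Spatial's method: rectangles are sorted by x-centre and packed M at a time, and the minimum fill m is ignored. The names `union` and `build_levels` are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_a, y_a, x_b, y_b)


def union(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def build_levels(leaf_mbrs: List[Rect], M: int) -> List[List[Rect]]:
    """Return the rectangles of each tree level, from the leaves up to the root."""
    if M < 2:
        raise ValueError("M must be at least 2")
    levels = [sorted(leaf_mbrs, key=lambda r: (r[0] + r[2]) / 2)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        groups = [current[i:i + M] for i in range(0, len(current), M)]
        levels.append([union(g) for g in groups])  # one larger MBR per group
    return levels


# MBRs of four objects a, b, c, d (made-up values), grouped two at a time.
a, b, c, d = (0, 0, 2, 2), (1, 1, 3, 3), (6, 0, 8, 2), (7, 1, 9, 4)
levels = build_levels([a, b, c, d], M=2)
```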

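R-trees [Figure: objects a, b, c, d enclosed by rectangles R and S; the corresponding R-tree with root entries R and S, leaf entries a, b, c, d, and pointers to geometries]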

Remark: nodes Intermediate nodes store: MBRs of collections of objects. Leaf nodes store: MBRs of individual objects; pointers to the storage location of the exact geometry.

Building R-trees An R-tree is a depth-balanced tree in which each node corresponds to a disk page (i.e., the number of entries in each node is limited). The structure satisfies the following properties:
1. For all nodes in the tree (except the root) the number of entries is between m and M.
2. The root has at least two children (unless it is a leaf).
3. All leaves are at the same level.

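Example (1) m = 2; M = 3 [Figure: objects a, b, c, d enclosed by rectangles R and S; the corresponding R-tree with root entries R and S, leaf entries a, b, c, d, and pointers to geometries]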

Example (2) m = 2; M = 4 [Figure: R-tree whose root holds the rectangles R1, R2, R3, …]

Searching R-trees We consider two types of queries: 1.point query: “what object contains the query point” 2.window query: “what objects intersect the query window”

Basic spatial queries (1) Containment Query: Given a spatial object O, find all objects in the collection that completely contain O. When O is a point, the query is called a Point Query (also Point-in-polygon, or Point Location). [Figure: a containment query with object O and a point query with point P]

Basic spatial queries (2) Region Query: Given a region R, find all objects in the collection that intersect R. When R is a rectangle, the query is called a Window Query. [Figure: a region query and a window query, each with query region R]

Searching R-trees: window query Compare the search window with the MBRs stored at each node, starting at the root node. Stop at leaf nodes: compare the contained geometries with the search window.
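The traversal described above might look like the following sketch. The `Node` layout, the function names and the rectangles are assumptions for illustration; the function returns candidates from the leaves, which would still have to be compared with the window using their exact geometries.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

Rect = Tuple[float, float, float, float]  # (x_a, y_a, x_b, y_b)


@dataclass
class Node:
    leaf: bool
    # (MBR, child Node) in intermediate nodes; (MBR, object id) in leaves
    entries: List[Tuple[Rect, Any]]


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def window_query(node: Node, window: Rect) -> List[Any]:
    """Collect the ids of leaf entries whose MBR intersects the window."""
    hits = []
    for rect, ref in node.entries:
        if not intersects(rect, window):
            continue  # prune: nothing below this entry can intersect
        if node.leaf:
            hits.append(ref)  # candidate, still to be refined
        else:
            hits.extend(window_query(ref, window))
    return hits


# Objects a, b under R and c, d under S (made-up rectangles).
R = Node(True, [((0, 0, 2, 2), "a"), ((1, 1, 3, 3), "b")])
S = Node(True, [((6, 0, 8, 2), "c"), ((7, 1, 9, 4), "d")])
root = Node(False, [((0, 0, 3, 3), R), ((6, 0, 9, 4), S)])

found = window_query(root, (2.5, 2.5, 6.5, 3.0))
```

In this example the window touches the MBR of S, so S is visited even though none of its entries match: the price of grouping by bounding rectangles.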

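Searching R-trees: window query Example: [Figure: a search window over objects a, b, c, d in rectangles R and S; the R-tree with root entries R and S, leaf entries a, b, c, d, and pointers to geometries]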

Example: remarks If no MBRs are used: check the query window against all geometries for intersection (computationally expensive) In some cases, using R-trees to structure the set of MBRs can cause more tests (against MBRs) to be done. In general, this is not the case

Searching R-trees: point query Test the query point for inclusion in the MBRs stored at each node, starting at the root node. Stop at leaf nodes: test the query point for inclusion in the exact geometries.
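A point query can be sketched in the same way, this time including the final test against the exact geometry. To keep the example short, the exact geometries are assumed to be circles given as `(cx, cy, radius)`; the `Node` layout and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_a, y_a, x_b, y_b)
Circle = Tuple[float, float, float]       # (cx, cy, radius), stand-in geometry


@dataclass
class Node:
    leaf: bool
    # (MBR, child Node) in intermediate nodes; (MBR, object id) in leaves
    entries: List[Tuple[Rect, Any]]


def contains_point(r: Rect, p: Point) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def in_circle(c: Circle, p: Point) -> bool:
    return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 <= c[2] ** 2


def point_query(node: Node, p: Point, geometries: Dict[str, Circle]) -> List[str]:
    """Return the ids of the objects whose exact geometry contains p."""
    result = []
    for rect, ref in node.entries:
        if not contains_point(rect, p):
            continue  # the MBR excludes p, so the object or subtree does too
        if node.leaf:
            if in_circle(geometries[ref], p):  # exact test at the leaf
                result.append(ref)
        else:
            result.extend(point_query(ref, p, geometries))
    return result


geometries = {"a": (1.0, 1.0, 1.0), "b": (4.0, 1.0, 1.0)}
R = Node(True, [((0, 0, 2, 2), "a"), ((3, 0, 5, 2), "b")])
root = Node(False, [((0, 0, 5, 2), R)])

inside = point_query(root, (1.0, 1.5), geometries)
corner = point_query(root, (1.9, 1.9), geometries)
```

The point (1.9, 1.9) lies in the MBR of a but outside the circle, so it is rejected only by the exact test at the leaf.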

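Exercise: point query [Figure: query point P over objects a, b, c, d in rectangles R and S; the R-tree with root entries R and S, leaf entries a, b, c, d, and pointers to geometries]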

Searching R-trees: point query Example: [Figure: query point P with rectangles R and S and objects a, b; the part of the R-tree visited (root, R, S, a, b) and pointers to geometries]

Summary Indexing Vector Spatial Data R-trees: Based on MBRs (leaves). Root: whole dataset. Intermediate nodes: groups of MBRs (objects), not a partition of the underlying space!

Important remarks Note that the MBRs (at all levels) can overlap. A rectangle is stored as child of a bigger rectangle only if completely contained in it.