Chapter 3: Data Storage and Access Methods

Title: The R* Tree: An Efficient and Robust Access Method for Points and Rectangles
Authors: N. Beckmann, H. Kriegel, R. Schneider and B. Seeger
Pages: 207-216

The R* Tree: An Efficient and Robust Access Method for Points and Rectangles
- Problem
  - Problem statement
  - Why is this problem important?
  - Why is this problem hard?
- Approaches
  - Approach description, key concepts
  - Contributions (novelty, improvements)
  - Assumptions

Problem Statement: R* Tree
- Given: data containing points and rectangles; spatial queries (point query, range query, insert, delete)
- Find: an access method (data structure), i.e., a hierarchical organization of rectangles (example figure from Wikipedia)
- Objective: efficiency of spatial queries
- Constraints:
  - Balanced tree
  - Each node is a disk page and holds at least m entries (m = minimum number of entries per node)
  - The root has at least two children unless it is a leaf
- Efficiency metric: number of disk pages accessed
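
The structural constraints above can be checked with a short sketch. It assumes an in-memory stand-in for disk pages and adds a page capacity M (maximum entries per node). `Node`, `check_invariants` and `M` are illustrative names, not taken from the paper. The checker tests the per-node entry bounds and the root rule, and it confirms that all leaves sit at the same depth, which is what "balanced" means here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    """Hypothetical in-memory stand-in for one disk page of the tree."""
    leaf: bool
    # (rectangle, child node); the child is None for data entries in a leaf
    entries: List[Tuple[Rect, Optional["Node"]]] = field(default_factory=list)


def check_invariants(root: Node, m: int, M: int) -> bool:
    """True if the tree obeys the min/max fill, root, and balance constraints."""
    leaf_depths: Set[int] = set()

    def visit(node: Node, depth: int, is_root: bool) -> bool:
        n = len(node.entries)
        if n > M:                        # a page cannot hold more than M entries
            return False
        if is_root:
            if not node.leaf and n < 2:  # a non-leaf root needs >= 2 children
                return False
        elif n < m:                      # every other node holds >= m entries
            return False
        if node.leaf:
            leaf_depths.add(depth)
            return True
        return all(child is not None and visit(child, depth + 1, False)
                   for _, child in node.entries)

    # Balanced tree: all leaves must appear at exactly one depth.
    return visit(root, 0, True) and len(leaf_depths) == 1
```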

Why is this problem important?
- Multi-dimensional applications:
  - Large geographic data, e.g., map objects such as countries occupy regions of non-zero size in two dimensions
  - Common real-world query: "Find all museums within 2 miles of my current location."
  - CAD, ...
- Many DBMS servers support spatial indices: Oracle, IBM DB2, ...

Why is this problem hard?
- B-tree split methods are ineffective in two dimensions (e.g., splitting by sort order, since 2-D data has no single natural sort order)
- Size variation across data rectangles: large rectangles limit the split options!
- Non-uniform data distribution over space
- Dynamic access method: insertions and deletions must be supported
- Overlapping directory rectangles => multiple search paths
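
The last point can be made concrete with a hypothetical recursive R-tree window search that counts one page access per visited node. The names (`Node`, `search`, the `stats` counter) and the tuple representation of rectangles are assumptions made for illustration. The search must visit every child whose directory rectangle intersects the query, so overlapping siblings turn one query into several search paths.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    leaf: bool
    # Leaf entries point to record ids; inner entries point to child Nodes.
    entries: List[Tuple[Rect, Any]] = field(default_factory=list)


def intersects(a: Rect, b: Rect) -> bool:
    """Closed rectangles intersect iff they overlap on both axes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node: Node, query: Rect, stats: Dict[str, int]) -> List[Any]:
    """Return ids of data rectangles intersecting `query`; count pages read."""
    stats["pages"] = stats.get("pages", 0) + 1  # one node = one disk page
    hits: List[Any] = []
    for rect, item in node.entries:
        if not intersects(rect, query):
            continue
        if node.leaf:
            hits.append(item)
        else:
            # Every intersecting child must be explored, so overlapping
            # directory rectangles lead to several search paths.
            hits.extend(search(item, query, stats))
    return hits


# A point query is a degenerate rectangle.
a = Node(True, [((0, 0, 1, 1), "p"), ((3, 3, 4, 4), "q")])
b = Node(True, [((2, 2, 3, 3), "r"), ((5, 5, 6, 6), "s")])
overlapping_root = Node(False, [((0, 0, 4, 4), a), ((2, 2, 6, 6), b)])
stats: Dict[str, int] = {}
print(search(overlapping_root, (3, 3, 3, 3), stats), stats["pages"])  # ['q', 'r'] 3
```

In this example both directory rectangles contain the query point, so the search reads three pages: the root plus both leaves. If the siblings were disjoint, only one leaf below the root would be read.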

Novelty of Contribution: Related Work
- Traditional one-dimensional indexing structures (e.g., hashing, B-trees) are not appropriate for multi-dimensional range search
- B+ tree
  - Represents sorted data in a way that allows efficient insertion and removal of elements
  - Dynamic, multilevel index with maximum and minimum bounds on the number of keys in each node
  - Leaf nodes are linked together as a linked list, which makes range queries easy
- R-tree
  - The R-tree is a foundation for spatial access methods
  - A complex spatial object is approximated by its minimum bounding rectangle (MBR), which preserves essential geometric properties
  - Directory rectangles may overlap
  - Heuristic: minimize the area of each enclosing rectangle in the inner nodes
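
The MBR approximation fits in a few lines. This sketch assumes 2-D rectangles stored as `(xmin, ymin, xmax, ymax)` tuples, a representation chosen here and not given in the slides. The polygon and polyline are hypothetical data. One property is preserved: if two MBRs do not intersect, the objects inside them cannot intersect either. That is why an MBR test can act as a cheap filter before any exact geometry check.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(points: Iterable[Point]) -> Rect:
    """Minimum bounding rectangle of an object given by its vertices."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both a and b (used for directory entries)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def intersects(a: Rect, b: Rect) -> bool:
    """Closed rectangles intersect iff they overlap on both axes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


lake = [(1, 2), (4, 1), (5, 3), (2, 5)]   # hypothetical polygon vertices
road = [(7, 0), (9, 6)]                   # hypothetical polyline
print(mbr(lake))                          # (1, 1, 5, 5)
print(intersects(mbr(lake), mbr(road)))   # False: the objects cannot intersect
print(union(mbr(lake), mbr(road)))        # (1, 0, 9, 6)
```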

Principles of R-tree
- Height-balanced tree similar to a B-tree, with index records in its leaf nodes containing pointers to data objects
- Heuristic optimization: minimize the area of each enclosing rectangle in the inner nodes
- Reference: A. Guttman, "R-trees: A Dynamic Index Structure for Spatial Searching", SIGMOD 1984
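
Here is a hypothetical sketch of how the area-minimization heuristic steers an insertion. When descending, it picks the child whose rectangle needs the least area enlargement to cover the new entry, and it breaks ties by the smaller area. This is the subtree choice described for Guttman's R-tree. Function names and the tuple layout are assumptions made for illustration.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(r: Rect, new: Rect) -> float:
    """Extra area r must grow by to also cover `new`."""
    return area(union(r, new)) - area(r)


def choose_child(child_rects: List[Rect], new: Rect) -> int:
    """Index of the child needing the least area enlargement; ties go to the
    child with the smaller area (the area-minimizing R-tree heuristic)."""
    return min(range(len(child_rects)),
               key=lambda i: (enlargement(child_rects[i], new), area(child_rects[i])))


children = [(0, 0, 2, 2), (5, 5, 9, 9), (1, 1, 3, 3)]
print(choose_child(children, (2.5, 2.5, 2.8, 2.8)))  # 2: already covered, zero growth
```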

Performance Parameters beyond the R-tree
- (Q1) The area covered by a directory rectangle should be minimized.
- (Q2) The overlap between directory rectangles should be minimized.
- (Q3) The margin (sum of edge lengths) of a directory rectangle should be minimized.
- (Q4) Storage utilization should be optimized.
- Intuitions:
  - Reduce overlap between sibling nodes, so that a point query traverses fewer branches
  - Reinserting old data moves entries between neighboring nodes and thus decreases overlap
  - Because of this extra restructuring, fewer splits occur
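
The four criteria can be expressed as simple measures. The sketch below computes area (Q1), pairwise overlap among siblings (Q2), margin (Q3) and storage utilization (Q4). It assumes 2-D rectangles as `(xmin, ymin, xmax, ymax)` tuples and a page capacity M. Summing pairwise overlaps is one simple way to score a node and is an illustrative choice. The sibling rectangles are made-up data.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r: Rect) -> float:
    """Sum of the edge lengths; in 2-D this is the perimeter."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def total_overlap(siblings: List[Rect]) -> float:
    """Pairwise overlap among sibling directory rectangles."""
    return sum(overlap_area(a, b) for a, b in combinations(siblings, 2))


def storage_utilization(entry_counts: Sequence[int], M: int) -> float:
    """Fraction of page slots in use, given entries per node and capacity M."""
    return sum(entry_counts) / (len(entry_counts) * M)


siblings = [(0, 0, 4, 2), (3, 1, 6, 5), (10, 10, 11, 11)]
print([area(r) for r in siblings])        # [8, 12, 1]
print([margin(r) for r in siblings])      # [12, 14, 4]
print(total_overlap(siblings))            # 1
print(storage_utilization([4, 2, 3], 4))  # 0.75
```

For a fixed area, a square has the smallest margin. Minimizing margin therefore favors square-like directory rectangles.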

Difference between R-tree and R*-tree
- Minimization of area, margin, and overlap is crucial to the performance of the R-tree / R*-tree.
- The R*-tree attempts to reduce both coverage and overlap, using a combination of a revised node split algorithm and the concept of forced reinsertion at node overflow.
- This is based on the observation that R-tree structures are highly susceptible to the order in which their entries are inserted, so an insertion-built (rather than bulk-loaded) structure is likely to be sub-optimal.
- Deleting and reinserting entries allows them to "find" a place in the tree that may be more appropriate than their original location.
- Result: improved retrieval performance
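
This is a sketch of the entry-selection step of forced reinsertion, assuming the scheme described in the R*-tree paper. On the first overflow at a given level during one insertion, the M+1 entries are sorted by the distance between their centers and the center of the node's bounding rectangle. The p farthest entries are removed and then reinserted. The default p = 30% of M is the value the paper reports working well; treat it as a tunable assumption. The names are hypothetical. The actual reinsertion into the tree and the fall-back to a split on a repeated overflow are not shown.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def bounding(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def pick_reinsert(entries: List[Rect], M: int, p_fraction: float = 0.3):
    """For an overflowing node (M + 1 entries), return (kept, to_reinsert).

    Entries whose centers lie farthest from the node's center are removed;
    they are returned nearest-first, i.e. in "close reinsert" order.
    """
    cx, cy = center(bounding(entries))

    def dist2(r: Rect) -> float:
        x, y = center(r)
        return (x - cx) ** 2 + (y - cy) ** 2

    by_distance = sorted(entries, key=dist2, reverse=True)  # farthest first
    p = max(1, round(p_fraction * M))
    removed = by_distance[:p]
    kept = by_distance[p:]
    return kept, removed[::-1]  # reinsert starting with the closest removed entry


entries = [(4, 4, 5, 5), (5, 4, 6, 5), (4, 5, 5, 6), (5, 5, 6, 6), (0, 0, 0.5, 0.5)]
kept, to_reinsert = pick_reinsert(entries, M=4)
print(to_reinsert)  # [(0, 0, 0.5, 0.5)]
```

The outlying small rectangle is removed, so the node's bounding rectangle shrinks back to the cluster. When it is reinserted, the entry can land in a neighboring node that covers it more tightly.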

Example (figure): the configuration preferred by the R-tree vs. the one preferred by the R*-tree (rectangles R1 to R5)
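
The revised node split from the R-tree vs. R*-tree comparison can be sketched as a simplified 2-D version of the R*-tree split (choose a split axis, then a split index). For each axis, the M+1 entries are sorted by lower and by upper coordinate. Every split that leaves at least m entries per group is enumerated, and the axis with the smallest total margin wins. Along that axis, the split with the least overlap between the two groups is chosen, with ties broken by total area. Function names and the tuple layout are illustrative assumptions, not the paper's code.

```python
from typing import Iterator, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Split = Tuple[List[Rect], List[Rect]]


def bounding(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r: Rect) -> float:
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def distributions(ordered: List[Rect], m: int) -> Iterator[Split]:
    """All two-group splits of an ordered list with >= m entries per group."""
    for k in range(m, len(ordered) - m + 1):
        yield ordered[:k], ordered[k:]


def rstar_split(entries: List[Rect], m: int) -> Split:
    best_sum, best_sorts = float("inf"), []
    for axis in (0, 1):
        # Sort by the lower and by the upper value along this axis.
        sorts = [sorted(entries, key=lambda r: r[axis]),
                 sorted(entries, key=lambda r: r[axis + 2])]
        # Axis goodness: total margin over all candidate distributions.
        s = sum(margin(bounding(g1)) + margin(bounding(g2))
                for srt in sorts for g1, g2 in distributions(srt, m))
        if s < best_sum:
            best_sum, best_sorts = s, sorts
    candidates = [d for srt in best_sorts for d in distributions(srt, m)]
    # Least overlap between the two groups; ties go to the least total area.
    return min(candidates, key=lambda d: (
        overlap_area(bounding(d[0]), bounding(d[1])),
        area(bounding(d[0])) + area(bounding(d[1]))))


entries = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (10, 0, 11, 1), (11, 0, 12, 1)]
left, right = rstar_split(entries, m=2)
print(sorted(left), sorted(right))
```

On this made-up overflowing node, the x-axis gives the smaller margin sum. The chosen split separates the left cluster of three rectangles from the right pair.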

Validation Methodology
- Experiments with simulated workloads
- Evaluation of design decisions
Results
- The R*-tree outperforms variants of the R-tree and the 2-level grid file.
- The R*-tree is robust against non-uniform data distributions.

Summary
- Paper's focus: ideas and an experimental comparison (R*-tree implementations and performance)
- Ideas:
  - Heuristic optimizations (p. 208): reduction of the area, margin, and overlap of the directory rectangles
  - Better storage utilization (p. 211): forced reinsertion (splits can be prevented)
- Experimental comparison: using many data distributions

Assumptions and Rewrite Today
Assumptions
- Indexing data in two-dimensional space
- Bulk load and bulk reorganization are not available
- Concurrency control and recovery costs are negligible
- Reinserts during split!
Rewrite today
- Bulk-load of rectangles
- Compare with newer methods: R+-tree (disjoint siblings), Hilbert R-tree
- Analytical results: formally compare the R*-tree with alternatives
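
For the "bulk-load of rectangles" item, one widely used option is Sort-Tile-Recursive (STR) packing. STR is not discussed in the slides or the paper and is named here only as an example. The sketch builds the leaf level only and assumes a page capacity M and 2-D rectangles as `(xmin, ymin, xmax, ymax)` tuples. Rectangles are sorted by x-center and cut into vertical slices, and each slice is sorted by y-center and cut into full pages. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def center_x(r: Rect) -> float:
    return (r[0] + r[2]) / 2


def center_y(r: Rect) -> float:
    return (r[1] + r[3]) / 2


def str_pack_leaves(rects: List[Rect], M: int) -> List[List[Rect]]:
    """Pack rectangles into leaves of at most M entries (Sort-Tile-Recursive, 2-D)."""
    if not rects:
        return []
    leaf_count = math.ceil(len(rects) / M)
    slice_count = math.ceil(math.sqrt(leaf_count))  # number of vertical slices
    slice_size = slice_count * M                    # rectangles per slice
    by_x = sorted(rects, key=center_x)
    leaves: List[List[Rect]] = []
    for i in range(0, len(by_x), slice_size):
        # Within each vertical slice, order by y and cut into full pages.
        vertical_slice = sorted(by_x[i:i + slice_size], key=center_y)
        for j in range(0, len(vertical_slice), M):
            leaves.append(vertical_slice[j:j + M])
    return leaves


grid = [(x, y, x + 1, y + 1) for x in range(4) for y in range(4)]
for leaf in str_pack_leaves(grid, M=4):
    print(leaf)
```

On a 4x4 grid of unit squares with M = 4, each leaf is a full page covering a 2x2 block. Upper levels could be built by applying the same packing to the leaves' MBRs. Unlike insertion-built trees, this avoids the insertion-order sensitivity noted earlier.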