Spatial Indexing I R-trees

Slides:



Advertisements
Similar presentations
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Advertisements

Spatial and Temporal Data Mining V. Megalooikonomou Spatial Access Methods (SAMs) II (some slides are based on notes by C. Faloutsos)
Multimedia Database Systems
Spatial Join Queries. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
I/O-Algorithms Lars Arge Fall 2014 September 25, 2014.
Indexing and Range Queries in Spatio-Temporal Databases
B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
Search Trees.
Spatial Indexing I Point Access Methods. PAMs Point Access Methods Multidimensional Hashing: Grid File Exponential growth of the directory Hierarchical.
2-dimensional indexing structure
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Multiple-key indexes Index on one attribute provides pointer to an index on the other. If V is a value of the first attribute, then the index we reach.
Spatial Access Methods Chapter 26 of book Read only 26.1, 26.2, 26.6 Dr Eamonn Keogh Computer Science & Engineering Department University of California.
Spatial Indexing for NN retrieval
Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.
Accessing Spatial Data
CPSC 231 B-Trees (D.H.)1 LEARNING OBJECTIVES Problems with simple indexing. Multilevel indexing: B-Tree. –B-Tree creation: insertion and deletion of nodes.
Spatial Indexing SAMs.
Spatial Queries Nearest Neighbor and Join Queries.
1 R-Trees for Spatial Indexing Yanlei Diao UMass Amherst Feb 27, 2007 Some Slide Content Courtesy of J.M. Hellerstein.
Chapter 3: Data Storage and Access Methods
Spatial Queries Nearest Neighbor Queries.
1 Indexing Structures for Files. 2 Basic Concepts  Indexing mechanisms used to speed up access to desired data without having to scan entire.
R-Trees 2-dimensional indexing structure. R-trees 2-dimensional version of the B-tree: B-tree of maximum degree 8; degree between 3 and 8 Internal nodes.
Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
R-TREES: A Dynamic Index Structure for Spatial Searching by A. Guttman, SIGMOD Shahram Ghandeharizadeh Computer Science Department University of.
R-Trees: A Dynamic Index Structure for Spatial Data Antonin Guttman.
Chapter 61 Chapter 6 Index Structures for Files. Chapter 62 Indexes Indexes are additional auxiliary access structures with typically provide either faster.
INDEXING SPATIAL DATABASES Atinder Singh Department of Computer Science University of California Riverside, CA
R-Trees Extension of B+-trees.  Collection of d-dimensional rectangles.  A point in d-dimensions is a trivial rectangle.
Storage CMSC 461 Michael Wilson. Database storage  At some point, database information must be stored in some format  It’d be impossible to store hundreds.
Spatial Data Management Chapter 28. Types of Spatial Data Point Data –Points in a multidimensional space E.g., Raster data such as satellite imagery,
The X-Tree An Index Structure for High Dimensional Data Stefan Berchtold, Daniel A Keim, Hans Peter Kriegel Institute of Computer Science Munich, Germany.
Antonin Guttman In Proceedings of the 1984 ACM SIGMOD international conference on Management of data (SIGMOD '84). ACM, New York, NY, USA.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture17.
Nearest Neighbor Queries Chris Buzzerd, Dave Boerner, and Kevin Stewart.
Bin Yao (Slides made available by Feifei Li) R-tree: Indexing Structure for Data in Multi- dimensional Space.
Spatial Indexing Techniques Introduction to Spatial Computing CSE 5ISC Some slides adapted from Spatial Databases: A Tour by Shashi Shekhar Prentice Hall.
R-Trees: A Dynamic Index Structure For Spatial Searching Antonin Guttman.
R-T REES Accessing Spatial Data. I N THE BEGINNING … The B-Tree provided a foundation for R-Trees. But what’s a B-Tree? A data structure for storing sorted.
1 CSIS 7101: CSIS 7101: Spatial Data (Part 1) The R*-tree : An Efficient and Robust Access Method for Points and Rectangles Rollo Chan Chu Chung Man Mak.
File Processing : Multi-dimensional Index 2015, Spring Pusan National University Ki-Joune Li.
R* Tree By Rohan Sadale Akshay Kulkarni.  Motivation  Optimization criteria for R* Tree  High level Algorithm  Example  Performance Agenda.
Jeremy Iverson & Zhang Yun 1.  Chapter 6 Key Concepts ◦ Structures and access methods ◦ R-Tree  R*-Tree  Mobile Object Indexing  Questions 2.
1 R-Trees Guttman. 2 Introduction Range queries in multiple dimensions: Computer Aided Design (CAD) Geo-data applications Support special data objects.
Spatial Data Management
Mehdi Kargar Department of Computer Science and Engineering
Many slides taken from George Kollios, Boston University
Spatial Queries Nearest Neighbor and Join Queries.
Data Indexing Herbert A. Evans.
Multiway Search Trees Data may not fit into main memory
Tree-Structured Indexes: Introduction
Chapter 25: Advanced Data Types and New Applications
Spatial Indexing.
Spatial Indexing I Point Access Methods.
Database Management Systems (CS 564)
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
R-tree: Indexing Structure for Data in Multi-dimensional Space
Spatio-Temporal Databases
15-826: Multimedia Databases and Data Mining
B-Tree.
Multiway Trees Searching and B-Trees Advanced Tree Structures
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
File Processing : Multi-dimensional Index
Multidimensional Search Structures
Donghui Zhang, Tian Xia Northeastern University
C. Faloutsos Spatial Access Methods - R-trees
Presentation transcript:

Spatial Indexing I R-trees

Problem Given a collection of geometric objects (points, lines, polygons, ...) organize them on disk, to answer efficiently spatial queries (range, nn, etc)

R-tree In multidimensional space, there is no unique ordering! Not possible to use B+-trees [Guttman 84] R-tree! Group objects close in space in the same node => guaranteed page utilization => easy insertion/split algorithms. (only deal with Minimum Bounding Rectangles - MBRs)

R-tree A multi-way external memory tree Index nodes and data (leaf) nodes All leaf nodes appear on the same level Every node contains between m and M entries The root node has at least 2 entries (children)

Example eg., w/ fanout 4: group nearby rectangles to parent MBRs; each group -> disk page I C A G F H B J E D

Example F=4 P1 P3 I C A G F H B J A B C H I J E P4 P2 D D E F G

Example F=4| m=2, M=4 I C A P1 G F P3 H B J E P4 P2 D P6 P5 P5 P6 P1

R-trees - format of nodes {(MBR; obj_ptr)} for leaf nodes P1 P2 P3 P4 x-low; x-high y-low; y-high ... obj ptr A B C ...

R-trees - format of nodes {(MBR; node_ptr)} for non-leaf nodes x-low; x-high y-low; y-high ... node ptr P1 P2 P3 P4 ... A B C

R-trees:Search I C P1 A G F P3 H B J E P4 P2 D P5 P6 P1 P2 P3 P4 A B C

R-trees:Search I C P1 A G F P3 H B J E P4 P2 D P5 P6 P1 P2 P3 P4 A B C

R-trees:Search Main points: every parent node completely covers its ‘children’ nodes in the same level may overlap! a child MBR may be covered by more than one parent - it is stored under ONLY ONE of them. (ie., no need for dup. elim.) a point query may follow multiple branches. everything works for any(?) dimensionality

R-trees:Insertion Insert X P1 P3 I C A G F H B X J E P4 P2 D P1 P2 P3

R-trees:Insertion Insert Y P1 P3 I C A G F H B J Y E P4 P2 D P1 P2 P3

R-trees:Insertion Extend the parent MBR P1 P3 I C A G F H B J Y E P4

R-trees:Insertion How to find the next node to insert a new object Y? Using ChooseLeaf: Find the entry that needs the least enlargement to include Y. Resolve ties using the area (smallest) Other methods (later)

R-trees:Insertion P1 K P3 I C A G W F H B J E P4 P2 D If node is full then Split : ex. Insert w P1 K P3 I P1 P2 P3 P4 C A G W F H B J A B C K H I J E P4 P2 D D E F G

R-trees:Insertion P3 P5 I K C P1 A G W F H B J E P4 P2 D Q2 Q1 If node is full then Split : ex. Insert w Q1 Q2 P3 P5 I K P1 P5 P2 P3 P4 C P1 A G W F H B A B J H I J E P4 C K W P2 D Q2 F G Q1 D E

R-trees:Split (A1: plane sweep, until 50% of rectangles) P1 K Split node P1: partition the MBRs into two groups. (A1: plane sweep, until 50% of rectangles) A2: ‘linear’ split A3: quadratic split A4: exponential split: 2M-1 choices P1 K C A W B

R-trees:Split seed2 R seed1 pick two rectangles as ‘seeds’for group 1 and group 2; seed2 R seed1

R-trees:Split seed2 R seed1 pick two rectangles as ‘seeds’for group 1 and group 2; assign each rectangle ‘R’ to the ‘closest’ ‘group’: ‘closest’: the smallest increase in area seed2 R seed1

R-trees:Linear Split How to pick Seeds: Find the rects with the highest low and lowest high sides in each dimension Normalize the separations by dividing by the width of all the rects in the corresponding dim Choose the pair with the greatest normalized separation PickNext: pick one of the remaining rects and add it to the closest group

R-trees: Quadratic Split How to pick Seeds: For each pair E1 and E2, calculate the rectangle J=MBR(E1, E2) and d= J-E1-E2. Choose the pair with the largest d PickNext: For each remaining rect calculate the area increase to include it in group d1 and d2 Choose rect with highest difference: |d1-d2| Assign to closest group

R-trees:Insertion Use the ChooseLeaf to find the leaf node to insert an entry E If leaf node is full, then Split, otherwise insert there Propagate the split upwards, if necessary Adjust parent nodes

R-Trees:Deletion Other method (later) Find the leaf node that contains the entry E Remove E from this node If underflow: Eliminate the node by removing the node entries and the parent entry Reinsert the orphaned (other entries) into the tree using Insert Other method (later)

R-trees: Variations R+-tree: DO not allow overlapping, so split the objects (similar to z-values) R*-tree: change the insertion, deletion algorithms (minimize not only area but also perimeter, forced re-insertion )

For simplicity node capacity = 3 In practice, in the order of 100

R-Tree, Leaf Nodes

R-Tree – Intermediate Nodes

R-tree, Range Query E E Root E E 1 2 E E E E 1 E E 3 4 5 6 7 E 2 a b c y axis 10 m g h l 8 k f e E 6 2 i j E d 1 4 b a 2 c x axis 2 4 6 8 10 Root E E 1 2 E E E E 1 E E 3 4 5 6 7 E 2 a b c d e f g h i j k l m E E E E E 3 4 5 6 7

Range Query E E Root E E 1 2 E E E E 1 E E 3 4 5 6 7 E 2 a b c d e f g y axis 10 m g h l 8 k f e E 6 2 i j E d 1 4 b a 2 c x axis 2 4 6 8 10 Root E E 1 2 E E E E 1 E E 3 4 5 6 7 E 2 a b c d e f g h i j k l m E E E E E 3 4 5 6 7

R-tree The original R-tree tries to minimize the area of each enclosing rectangle in the index nodes. Is there any other property that can be optimized? R*-tree  Yes!

R*-tree Optimization Criteria: (O1) Area covered by an index MBR (O2) Overlap between directory MBRs (O3) Margin of a directory rectangle (O4) Storage utilization Sometimes it is impossible to optimize all the above criteria at the same time!

R*-tree ChooseLeaf: If next node is not a leaf node, choose the node using the following criteria: Least overlap enlargement Least area enlargement Smaller area Else

R*-tree ChooseSplitIndex SplitNode Choose the axis to split Choose the two groups along the chosen axis ChooseSplitAxis Along each axis, sort rectangles and break them into two groups (M-2m+2 possible ways where one group contains at least m rectangles). Compute the sum S of all margin-values (perimeters) of each pair of groups. Choose the one that minimizes S ChooseSplitIndex Along the chosen axis, choose the grouping that gives the minimum overlap-value

R*-tree Forced Reinsert: Which ones to re-insert? How many? A: 30% defer splits, by forced-reinsert, i.e.: instead of splitting, temporarily delete some entries, shrink overflowing MBR, and re-insert those entries Which ones to re-insert? The ones that are further way from the center How many? A: 30%