Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.

Slides:



Advertisements
Similar presentations
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Advertisements

Spatial and Temporal Data Mining V. Megalooikonomou Spatial Access Methods (SAMs) II (some slides are based on notes by C. Faloutsos)
Multimedia Database Systems
Spatial Join Queries. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
I/O-Algorithms Lars Arge Fall 2014 September 25, 2014.
Indexing and Range Queries in Spatio-Temporal Databases
Nearest Neighbor Queries using R-trees Based on notes from G. Kollios.
CMU SCS : Multimedia Databases and Data Mining Lecture #6: Spatial Access Methods Part III: R-trees C. Faloutsos.
Search Trees.
Spatial Indexing I Point Access Methods. PAMs Point Access Methods Multidimensional Hashing: Grid File Exponential growth of the directory Hierarchical.
2-dimensional indexing structure
Multiple-key indexes Index on one attribute provides pointer to an index on the other. If V is a value of the first attribute, then the index we reach.
Spatial Access Methods Chapter 26 of book Read only 26.1, 26.2, 26.6 Dr Eamonn Keogh Computer Science & Engineering Department University of California.
Spatial Indexing for NN retrieval
Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.
Accessing Spatial Data
Project Proposals Simonas Šaltenis Aalborg University Nykredit Center for Database Research Department of Computer Science, Aalborg University.
Spatial Indexing SAMs.
Spatial Queries Nearest Neighbor and Join Queries.
Multi-dimensional Indexes
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
1 R-Trees for Spatial Indexing Yanlei Diao UMass Amherst Feb 27, 2007 Some Slide Content Courtesy of J.M. Hellerstein.
Chapter 3: Data Storage and Access Methods
Spatial Indexing I Point Access Methods.
Spatial Queries Nearest Neighbor Queries.
I/O-Algorithms Lars Arge Aarhus University March 6, 2007.
Spatial Queries. R-tree: variations What about static datasets? (no ins/del) Hilbert What about other bounding shapes?
R-Trees 2-dimensional indexing structure. R-trees 2-dimensional version of the B-tree: B-tree of maximum degree 8; degree between 3 and 8 Internal nodes.
Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
Spatio-Temporal Databases. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases …..
Advanced Data Structures NTUA 2007 R-trees and Grid File.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
Spatial Indexing. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
R-TREES: A Dynamic Index Structure for Spatial Searching by A. Guttman, SIGMOD Shahram Ghandeharizadeh Computer Science Department University of.
CMU SCS : Multimedia Databases and Data Mining Lecture #6: Spatial Access Methods Part III: R-trees C. Faloutsos.
R-Trees: A Dynamic Index Structure for Spatial Data Antonin Guttman.
INDEXING SPATIAL DATABASES Atinder Singh Department of Computer Science University of California Riverside, CA
R-Trees Extension of B+-trees.  Collection of d-dimensional rectangles.  A point in d-dimensions is a trivial rectangle.
Spatial Data Management Chapter 28. Types of Spatial Data Point Data –Points in a multidimensional space E.g., Raster data such as satellite imagery,
The X-Tree An Index Structure for High Dimensional Data Stefan Berchtold, Daniel A Keim, Hans Peter Kriegel Institute of Computer Science Munich, Germany.
Temple University – CIS Dept. CIS616– Principles of Data Management V. Megalooikonomou Spatial Access Methods (SAMs) ( based on notes by Silberchatz,Korth,
Advanced Data Structures NTUA 2007 R-trees and Grid File.
R-Tree. 2 Spatial Database (Ia) Consider: Given a city map, ‘index’ all university buildings in an efficient structure for quick topological search.
Antonin Guttman In Proceedings of the 1984 ACM SIGMOD international conference on Management of data (SIGMOD '84). ACM, New York, NY, USA.
Nearest Neighbor Queries Chris Buzzerd, Dave Boerner, and Kevin Stewart.
Bin Yao (Slides made available by Feifei Li) R-tree: Indexing Structure for Data in Multi- dimensional Space.
Spatial Database 2/5/2011 Reference – Ramakrishna Gerhke and Silbershatz.
Optimizing Multidimensional Index Trees for Main Memory Access Author: Kihong Kim, Sang K. Cha, Keunjoo Kwon Members: Iris Zhang, Grace Yung, Kara Kwon,
Lecture 3: External Memory Indexing Structures (Contd) CS6931 Database Seminar.
Spatial Indexing Techniques Introduction to Spatial Computing CSE 5ISC Some slides adapted from Spatial Databases: A Tour by Shashi Shekhar Prentice Hall.
R-Trees: A Dynamic Index Structure For Spatial Searching Antonin Guttman.
R-T REES Accessing Spatial Data. I N THE BEGINNING … The B-Tree provided a foundation for R-Trees. But what’s a B-Tree? A data structure for storing sorted.
1 CSIS 7101: CSIS 7101: Spatial Data (Part 1) The R*-tree : An Efficient and Robust Access Method for Points and Rectangles Rollo Chan Chu Chung Man Mak.
Spatio-Temporal Databases
File Processing : Multi-dimensional Index 2015, Spring Pusan National University Ki-Joune Li.
R* Tree By Rohan Sadale Akshay Kulkarni.  Motivation  Optimization criteria for R* Tree  High level Algorithm  Example  Performance Agenda.
Multidimensional Access Methods Ho Hoang Nguyen Nguyen Thanh Trong Dao Vu Quoc Trung Ngo Phuoc Huong Thien DATABASE.
Jeremy Iverson & Zhang Yun 1.  Chapter 6 Key Concepts ◦ Structures and access methods ◦ R-Tree  R*-Tree  Mobile Object Indexing  Questions 2.
1 R-Trees Guttman. 2 Introduction Range queries in multiple dimensions: Computer Aided Design (CAD) Geo-data applications Support special data objects.
Spatial Data Management
Many slides taken from George Kollios, Boston University
Spatial Queries Nearest Neighbor and Join Queries.
Spatial Indexing.
Spatial Indexing I Point Access Methods.
R-tree: Indexing Structure for Data in Multi-dimensional Space
15-826: Multimedia Databases and Data Mining
Spatial Indexing I R-trees
File Processing : Multi-dimensional Index
C. Faloutsos Spatial Access Methods - R-trees
Presentation transcript:

Spatial Indexing SAMs

Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation technique and a PAM New methods: Spatial Access Methods SAMs R-tree and variations

Problem Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer spatial queries (range, nn, etc)

Transformation Technique Map an d-dim MBR into a point: ex. [(x min, x max ) (y min, y max )] => (x min, x max, y min, y max ) Use a PAM to index the 2d points Given a range query, map the query into the 2d space and use the PAM to answer it

R-tree [Guttman 84] Main idea: allow parents to overlap! => guaranteed 50% utilization => easier insertion/split algorithms. (only deal with Minimum Bounding Rectangles - MBRs)

R-tree A multi-way external memory tree Index nodes and data (leaf) nodes All leaf nodes appear on the same level Every node contains between m and M entries The root node has at least 2 entries (children)

Example eg., w/ fanout 4: group nearby rectangles to parent MBRs; each group -> disk page A B C D E F G H J I

Example F=4 A B C D E F G H I J P1 P2 P3 P4 FGDEHIJABC

Example F=4| m=2, M=4 A B C D E F G H I J P1 P2 P3 P4 P1P2 P3P4 FGDEHIJABC P5P6 P5 P6

R-trees - format of nodes {(MBR; obj_ptr)} for leaf nodes P1P2P3P4 ABC x-low; x-high y-low; y-high... obj ptr...

R-trees - format of nodes {(MBR; node_ptr)} for non-leaf nodes P1P2P3P4 ABC x-low; x-high y-low; y-high... node ptr...

R-trees:Search A B C D E F G H I J P1 P2 P3 P4 P1P2 P3P4 FGDEHIJABC P5P6

R-trees:Search P1P2 P3P4 FGDEHIJABC P5P6 A B C D E F G H I J P1 P2 P3 P4

R-trees:Search Main points: every parent node completely covers its ‘children’ a child MBR may be covered by more than one parent - it is stored under ONLY ONE of them. (ie., no need for dup. elim.) a point query may follow multiple branches. everything works for any(?) dimensionality

R-trees:Insertion A B C D E F G H I J P1 P2 P3 P4 P1P2P3P4 FGDEHIJABC X X Insert X

R-trees:Insertion A B C D E F G H I J P1 P2 P3 P4 P1P2P3P4 FGDEHIJABC Y Insert Y

R-trees:Insertion Extend the parent MBR A B C D E F G H I J P1 P2 P3 P4 P1P2P3P4 FGDEHIJABC Y Y

R-trees:Insertion How to find the next node to insert the new object? Using ChooseLeaf: Find the entry that needs the least enlargement to include Y. Resolve ties using the area (smallest) Other methods (later)

R-trees:Insertion If node is full then Split : ex. Insert w A B C D E F G H I J P1 P2 P3 P4 P1P2P3P4 FGDEHIJABC W K K

R-trees:Insertion If node is full then Split : ex. Insert w A B C D E F G H I J P1 P2 P3 P4 Q1Q2FGDEHIJAB W K CKW P5 P1P5P2P3 P4 Q1 Q2

R-trees:Split Split node P1: partition the MBRs into two groups. A B C W K P1 (A1: plane sweep, until 50% of rectangles) A2: ‘linear’ split A3: quadratic split A4: exponential split: 2 M-1 choices

R-trees:Split pick two rectangles as ‘seeds’; assign each rectangle ‘R’ to the ‘closest’ ‘seed’ seed1 seed2 R

R-trees:Split pick two rectangles as ‘seeds’; assign each rectangle ‘R’ to the ‘closest’ ‘seed’: ‘closest’: the smallest increase in area seed1 seed2 R

R-trees:Split How to pick Seeds: Linear:Find the highest and lowest side in each dimension, normalize the separations, choose the pair with the greatest normalized separation Quadratic: For each pair E1 and E2, calculate the rectangle J=MBR(E1, E2) and d= J-E1-E2. Choose the pair with the largest d

R-trees:Insertion Use the ChooseLeaf to find the leaf node to insert an entry E If leaf node is full, then Split, otherwise insert there Propagate the split upwards, if necessary Adjust parent nodes

R-Trees:Deletion Find the leaf node that contains the entry E Remove E from this node If underflow: Eliminate the node by removing the node entries and the parent entry Reinsert the orphaned (other entries) into the tree using Insert Other method (later)

R-trees: Variations R+-tree: DO not allow overlapping, so split the objects (similar to z-values) R*-tree: change the insertion, deletion algorithms (minimize not only area but also perimeter, forced re-insertion ) Hilbert R-tree: use the Hilbert values to insert objects into the tree

Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert R-tree

R-tree A B C D E F G H I J P1 P2 P3 P4 P1P2P3P4 FGDEHIJABC Multi-way external memory structure, indexes MBRs Dynamic structure

R-tree The original R-tree tries to minimize the area of each enclosing rectangle in the index nodes. Is there any other property that can be optimized? R*-tree  Yes!

R*-tree Optimization Criteria: (O1) Area covered by an index MBR (O2) Overlap between directory MBRs (O3) Margin of a directory rectangle (O4) Storage utilization Sometimes it is impossible to optimize all the above criteria at the same time!

R*-tree ChooseSubtree: If next node is a leaf node, choose the node using the following criteria: Least overlap enlargement Least area enlargement Smaller area Else Least area enlargement Smaller area

R*-tree SplitNode Choose the axis to split Choose the two groups along the chosen axis ChooseSplitAxis Along each axis, sort rectangles and break them into two groups (M-2m+2 possible ways where one group contains at least m rectangles). Compute the sum S of all margin-values (perimeters) of each pair of groups. Choose the one that minimizes S ChooseSplitIndex Along the chosen axis, choose the grouping that gives the minimum overlap-value

R*-tree Forced Reinsert: defer splits, by forced-reinsert, i.e.: instead of splitting, temporarily delete some entries, shrink overflowing MBR, and re- insert those entries Which ones to re-insert? How many? A: 30%

R-tree: variations What about static datasets? (no ins/del) Hilbert What about other bounding shapes?

R-trees - variations what about static datasets (no ins/del/upd)? Q: Best way to pack points?

R-trees - variations what about static datasets (no ins/del/upd)? Q: Best way to pack points? A1: plane-sweep great for queries on ‘x’; terrible for ‘y’

R-trees - variations what about static datasets (no ins/del/upd)? Q: Best way to pack points? A1: plane-sweep great for queries on ‘x’; bad for ‘y’

R-trees - variations what about static datasets (no ins/del/upd)? Q: Best way to pack points? A1: plane-sweep great for queries on ‘x’; terrible for ‘y’ Q: how to improve?

R-trees - variations A: plane-sweep on HILBERT curve!

R-trees - variations A: plane-sweep on HILBERT curve! In fact, it can be made dynamic (how?), as well as to handle regions (how?)

R-trees - variations Dynamic (‘Hilbert R- tree): each point has an ‘h’- value (hilbert value) insertions: like a B-tree on the h-value but also store MBR, for searches

Hilbert R-tree Data structure of a node? LHV x-low, ylow x-high, y-high ptr h-value >= LHV & MBRs: inside parent MBR

R-trees - variations Data structure of a node? LHV x-low, ylow x-high, y-high ptr h-value >= LHV & MBRs: inside parent MBR ~B-tree

R-trees - variations Data structure of a node? LHV x-low, ylow x-high, y-high ptr h-value >= LHV & MBRs: inside parent MBR ~ R-tree