Spatial Indexing SAMs.

Slides:



Advertisements
Similar presentations
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Advertisements

Spatial and Temporal Data Mining V. Megalooikonomou Spatial Access Methods (SAMs) II (some slides are based on notes by C. Faloutsos)
Multimedia Database Systems
I/O-Algorithms Lars Arge Fall 2014 September 25, 2014.
Nearest Neighbor Queries using R-trees Based on notes from G. Kollios.
CMU SCS : Multimedia Databases and Data Mining Lecture #6: Spatial Access Methods Part III: R-trees C. Faloutsos.
Search Trees.
QuadTrees 1. 2 x Quad Tree y 3 Quad Trees Split on all (two) dimensions at each level Split key space into equal size partitions (quadrants) Add a new.
2-dimensional indexing structure
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Multiple-key indexes Index on one attribute provides pointer to an index on the other. If V is a value of the first attribute, then the index we reach.
Spatial Access Methods Chapter 26 of book Read only 26.1, 26.2, 26.6 Dr Eamonn Keogh Computer Science & Engineering Department University of California.
Spatial Indexing for NN retrieval
Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.
Accessing Spatial Data
Spatial Queries Nearest Neighbor and Join Queries.
Multi-dimensional Indexes
Spatial Information Systems (SIS) COMP Spatial access methods: Indexing.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
1 R-Trees for Spatial Indexing Yanlei Diao UMass Amherst Feb 27, 2007 Some Slide Content Courtesy of J.M. Hellerstein.
Chapter 3: Data Storage and Access Methods
Spatial Indexing I Point Access Methods.
Spatial Queries Nearest Neighbor Queries.
I/O-Algorithms Lars Arge Aarhus University March 6, 2007.
R-tree Analysis. R-trees - performance analysis How many disk (=node) accesses we’ll need for range nn spatial joins why does it matter?
R-Trees 2-dimensional indexing structure. R-trees 2-dimensional version of the B-tree: B-tree of maximum degree 8; degree between 3 and 8 Internal nodes.
Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
Advanced Data Structures NTUA 2007 R-trees and Grid File.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
Spatial Indexing. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
R-TREES: A Dynamic Index Structure for Spatial Searching by A. Guttman, SIGMOD Shahram Ghandeharizadeh Computer Science Department University of.
CMU SCS : Multimedia Databases and Data Mining Lecture #6: Spatial Access Methods Part III: R-trees C. Faloutsos.
R-Trees: A Dynamic Index Structure for Spatial Data Antonin Guttman.
INDEXING SPATIAL DATABASES Atinder Singh Department of Computer Science University of California Riverside, CA
R-Trees Extension of B+-trees.  Collection of d-dimensional rectangles.  A point in d-dimensions is a trivial rectangle.
Storage CMSC 461 Michael Wilson. Database storage  At some point, database information must be stored in some format  It’d be impossible to store hundreds.
Spatial Data Management Chapter 28. Types of Spatial Data Point Data –Points in a multidimensional space E.g., Raster data such as satellite imagery,
Database management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Spatial Data Management Chapter 28.
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
Temple University – CIS Dept. CIS616– Principles of Data Management V. Megalooikonomou Spatial Access Methods (SAMs) ( based on notes by Silberchatz,Korth,
Multidimensional Indexes Applications: geographical databases, data cubes. Types of queries: –partial match (give only a subset of the dimensions) –range.
Advanced Data Structures NTUA 2007 R-trees and Grid File.
R-Tree. 2 Spatial Database (Ia) Consider: Given a city map, ‘index’ all university buildings in an efficient structure for quick topological search.
Antonin Guttman In Proceedings of the 1984 ACM SIGMOD international conference on Management of data (SIGMOD '84). ACM, New York, NY, USA.
Nearest Neighbor Queries Chris Buzzerd, Dave Boerner, and Kevin Stewart.
Bin Yao (Slides made available by Feifei Li) R-tree: Indexing Structure for Data in Multi- dimensional Space.
Spatial Database 2/5/2011 Reference – Ramakrishna Gerhke and Silbershatz.
Optimizing Multidimensional Index Trees for Main Memory Access Author: Kihong Kim, Sang K. Cha, Keunjoo Kwon Members: Iris Zhang, Grace Yung, Kara Kwon,
Lecture 3: External Memory Indexing Structures (Contd) CS6931 Database Seminar.
Spatial Indexing Techniques Introduction to Spatial Computing CSE 5ISC Some slides adapted from Spatial Databases: A Tour by Shashi Shekhar Prentice Hall.
R-Trees: A Dynamic Index Structure For Spatial Searching Antonin Guttman.
R-trees: An Average Case Analysis. R-trees - performance analysis How many disk (=node) accesses we ’ ll need for range nn spatial joins why does it matter?
1 CSIS 7101: CSIS 7101: Spatial Data (Part 1) The R*-tree : An Efficient and Robust Access Method for Points and Rectangles Rollo Chan Chu Chung Man Mak.
File Processing : Multi-dimensional Index 2015, Spring Pusan National University Ki-Joune Li.
R* Tree By Rohan Sadale Akshay Kulkarni.  Motivation  Optimization criteria for R* Tree  High level Algorithm  Example  Performance Agenda.
Multidimensional Access Methods Ho Hoang Nguyen Nguyen Thanh Trong Dao Vu Quoc Trung Ngo Phuoc Huong Thien DATABASE.
Jeremy Iverson & Zhang Yun 1.  Chapter 6 Key Concepts ◦ Structures and access methods ◦ R-Tree  R*-Tree  Mobile Object Indexing  Questions 2.
1 R-Trees Guttman. 2 Introduction Range queries in multiple dimensions: Computer Aided Design (CAD) Geo-data applications Support special data objects.
Spatial Data Management
Strategies for Spatial Joins
Many slides taken from George Kollios, Boston University
Spatial Queries Nearest Neighbor and Join Queries.
Spatial Indexing.
Spatial Indexing I Point Access Methods.
R-tree: Indexing Structure for Data in Multi-dimensional Space
15-826: Multimedia Databases and Data Mining
Spatial Indexing I R-trees
Multidimensional Search Structures
C. Faloutsos Spatial Access Methods - R-trees
Presentation transcript:

Spatial Indexing SAMs

Spatial Indexing Point Access Methods can index only points. What about regions? Use the transformation technique Z-ordering and quadtrees New methods: Spatial Access Methods SAMs R-tree and variations

Problem Given a collection of geometric objects (points, lines, polygons, ...) organize them on disk, to answer spatial queries (range, nn, etc)

R-tree z-ordering: cuts regions to pieces -> dup. elim. how could we avoid that? Idea: try to extend/merge B-trees and k-d trees R-tree: a generalization of the B+-tree for multidimensional spaces

Kd-Btrees [Robinson, 81]: if f is the fanout, split point-set in f parts; and so on, recursively

Kd-Btrees But: insertions/deletions are tricky (splits may propagate downwards and upwards) no guarantee on space utilization

R-trees [Guttman 84] Main idea: allow parents to overlap! => guaranteed 50% utilization => easier insertion/split algorithms. (only deal with Minimum Bounding Rectangles - MBRs)

R-trees A multi-way external memory tree Index nodes and data (leaf) nodes All leaf nodes appear on the same level Every node contains between m and M entries The root node has at least 2 entries (children)

Example eg., w/ fanout 4: group nearby rectangles to parent MBRs; each group -> disk page I C A G F H B J E D

Example F=4 P1 P3 I C A G F H B J A B C H I J E P4 P2 D D E F G

Example F=4 P1 P3 I C A G F H B J E P4 P2 D P1 P2 P3 P4 A B C H I J D

R-trees - format of nodes {(MBR; obj_ptr)} for leaf nodes P1 P2 P3 P4 x-low; x-high y-low; y-high ... obj ptr A B C ...

R-trees - format of nodes {(MBR; node_ptr)} for non-leaf nodes x-low; x-high y-low; y-high ... node ptr P1 P2 P3 P4 ... A B C

R-trees:Search P1 P3 I C A G F H B J E P4 P2 D P1 P2 P3 P4 A B C H I J

R-trees:Search P1 P3 I C A G F H B J E P4 P2 D P1 P2 P3 P4 A B C H I J

R-trees:Search Main points: every parent node completely covers its ‘children’ a child MBR may be covered by more than one parent - it is stored under ONLY ONE of them. (ie., no need for dup. elim.) a point query may follow multiple branches. everything works for any dimensionality

R-trees:Insertion Insert X P1 P3 I C A G F H B X J E P4 P2 D P1 P2 P3

R-trees:Insertion Insert Y P1 P3 I C A G F H B J Y E P4 P2 D P1 P2 P3

R-trees:Insertion Extend the parent MBR P1 P3 I C A G F H B J Y E P4

R-trees:Insertion How to find the next node to insert the new object? Using ChooseLeaf: Find the entry that needs the least enlargement to include Y. Resolve ties using the area (smallest) Other methods (later)

R-trees:Insertion P1 K P3 I C A G W F H B J E P4 P2 D If node is full then Split : ex. Insert w P1 K P3 I P1 P2 P3 P4 C A G W F H B J A B C K H I J E P4 P2 D D E F G

R-trees:Split (A1: plane sweep, until 50% of rectangles) P1 K Split node P1: partition the MBRs into two groups. (A1: plane sweep, until 50% of rectangles) A2: ‘linear’ split A3: quadratic split A4: exponential split: 2M-1 choices P1 K C A W B

R-trees:Split seed2 R seed1 pick two rectangles as ‘seeds’; assign each rectangle ‘R’ to the ‘closest’ ‘seed’ seed2 R seed1

R-trees:Split seed2 R seed1 pick two rectangles as ‘seeds’; assign each rectangle ‘R’ to the ‘closest’ ‘seed’: ‘closest’: the smallest increase in area seed2 R seed1

R-trees:Split How to pick Seeds: Linear:Find the highest and lowest side in each dimension, normalize the separations, choose the pair with the greatest normalized separation Quadratic: For each pair E1 and E2, calculate the rectangle J=MBR(E1, E2) and d= J-E1-E2. Choose the pair with the largest d

R-trees:Insertion Use the ChooseLeaf to find the leaf node to insert an entry E If leaf node is full, then Split, otherwise insert there Propagate the split upwards, if necessary Adjust parent nodes

R-Trees:Deletion Other method (later) Find the leaf node that contains the entry E Remove E from this node If underflow: Eliminate the node by removing the node entries and the parent entry Reinsert the orphaned (other entries) into the tree using Insert Other method (later)

R-trees: Variations R+-tree: DO not allow overlapping, so split the objects (similar to z-values) R*-tree: change the insertion, deletion algorithms (minimize not only area but also perimeter, forced re-insertion ) Hilbert R-tree: use the Hilbert values to insert objects into the tree