Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.

Slides:



Advertisements
Similar presentations
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Advertisements

External Memory Hashing. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B.
Multidimensional Indexing
Access Methods for Advanced Database Applications.
Multidimensional Data
Dr. Kalpakis CMSC 661, Principles of Database Systems Index Structures [13]
Multidimensional Data. Many applications of databases are "geographic" = 2­dimensional data. Others involve large numbers of dimensions. Example: data.
COMP 451/651 B-Trees Size and Lookup Chapter 1.
Spatial Indexing I Point Access Methods. PAMs Point Access Methods Multidimensional Hashing: Grid File Exponential growth of the directory Hierarchical.
2-dimensional indexing structure
Spatial indexing PAMs (II).
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Multiple-key indexes Index on one attribute provides pointer to an index on the other. If V is a value of the first attribute, then the index we reach.
1 Hash-Based Indexes Yanlei Diao UMass Amherst Feb 22, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
B+-tree and Hashing.
CPSC 231 B-Trees (D.H.)1 LEARNING OBJECTIVES Problems with simple indexing. Multilevel indexing: B-Tree. –B-Tree creation: insertion and deletion of nodes.
I/O-Algorithms Lars Arge University of Aarhus March 1, 2005.
I/O-Algorithms Lars Arge Spring 2009 March 3, 2009.
1 Hash-Based Indexes Chapter Introduction  Hash-based indexes are best for equality selections. Cannot support range searches.  Static and dynamic.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
Chapter 3: Data Storage and Access Methods
Spatial Indexing I Point Access Methods.
1 Hash-Based Indexes Chapter Introduction : Hash-based Indexes  Best for equality selections.  Cannot support range searches.  Static and dynamic.
1 Geometric index structures April 15, 2004 Based on GUW Chapter , [Arge01] Sections 1, 2.1 (persistent B- trees), 3-4 (static versions.
R-Trees 2-dimensional indexing structure. R-trees 2-dimensional version of the B-tree: B-tree of maximum degree 8; degree between 3 and 8 Internal nodes.
Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.
What we have covered? Indexing and Hashing Data warehouse and OLAP
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
Multidimensional Data Many applications of databases are ``geographic'' = 2­dimensional data. Others involve large numbers of dimensions. Example: data.
CS4432: Database Systems II
Chapter 61 Chapter 6 Index Structures for Files. Chapter 62 Indexes Indexes are additional auxiliary access structures with typically provide either faster.
Data Structures for Computer Graphics Point Based Representations and Data Structures Lectured by Vlastimil Havran.
1 SD-Rtree: A Scalable Distributed Rtree Witold Litwin & Cédric du Mouza & Philippe Rigaux.
The X-Tree An Index Structure for High Dimensional Data Stefan Berchtold, Daniel A Keim, Hans Peter Kriegel Institute of Computer Science Munich, Germany.
Multidimensional Indexes Applications: geographical databases, data cubes. Types of queries: –partial match (give only a subset of the dimensions) –range.
File Processing : Index and Hash 2015, Spring Pusan National University Ki-Joune Li.
Hashing and Hash-Based Index. Selection Queries Yes! Hashing  static hashing  dynamic hashing B+-tree is perfect, but.... to answer a selection query.
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
Lecture 2: External Memory Indexing Structures CS6931 Database Seminar.
B-Tree – Delete Delete 3. Delete 8. Delete
1 CPS216: Data-intensive Computing Systems Operators for Data Access (contd.) Shivnath Babu.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11 Modified by Donghui Zhang Jan 30, 2006.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
1 CPS216: Advanced Database Systems Notes 05: Operators for Data Access (contd.) Shivnath Babu.
R-Trees: A Dynamic Index Structure For Spatial Searching Antonin Guttman.
1 CSIS 7101: CSIS 7101: Spatial Data (Part 1) The R*-tree : An Efficient and Robust Access Method for Points and Rectangles Rollo Chan Chu Chung Man Mak.
File Processing : Multi-dimensional Index 2015, Spring Pusan National University Ki-Joune Li.
Chapter 5 Record Storage and Primary File Organizations
Grid Files Multi-dimensional Index Structures. Jaruloj Chongstitvatana 2006Grid Files 2 Properties of Grid Files  Support multi-dimensional data, but.
CPSC 8620Notes 61 CPSC 8620: Database Management System Design Notes 6: Hashing and More.
Multidimensional Access Structures COMP3017 Advanced Databases Dr Nicholas Gibbins –
Spatial Data Management
Multiway Search Trees Data may not fit into main memory
Multidimensional Access Structures
Indexing ? Why ? Need to locate the actual records on disk without having to read the entire table into memory.
Spatial Indexing I Point Access Methods.
Database Management Systems (CS 564)
The Quad tree The index is represented as a quaternary tree
Spatial Indexing I Point Access Methods
External Memory Hashing
Spatial Indexing I Point Access Methods
External Memory Hashing
Multidimensional Indexes
B+Trees The slides for this text are organized into chapters. This lecture covers Chapter 9. Chapter 1: Introduction to Database Systems Chapter 2: The.
Spatial Indexing I R-trees
Database Design and Programming
CPS216: Advanced Database Systems
File Processing : Multi-dimensional Index
Index Structures Chapter 13 of GUW September 16, 2019
Presentation transcript:

Spatial Indexing I Point Access Methods

Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical (tree-based) structures Multidimensional Hashing Space filling curve SAM: index both points and regions Transformations Overlapping regions Clipping methods

The problem Given a point set and a rectangular query, find the points enclosed in the query Query

Grid File Hashing methods for multidimensional points (extension of Extensible hashing) Idea: Use a grid to partition the space  each cell is associated with one page Two disk access principle (exact match)

Grid File Space Partitioning strategy but different from a tree. Select dividers along each dimension. Partition space into cells Dividers cut all the way. Each cell corresponds to 1 disk page. Many cells can point to the same page. Cell directory potentially exponential in the number of dimensions

Grid File Implementation Dynamic structure using a grid directory Grid array: a 2 dimensional array with pointers to buckets (this array can be large, disk resident) G(0,…, nx-1, 0, …, ny-1) Linear scales: Two 1 dimensional arrays that used to access the grid array (main memory) X(0, …, nx-1), Y(0, …, ny-1)

Example Linear scale X Linear scale Y Grid Directory Buckets/Disk Blocks

Grid File Search Exact Match Search: at most 2 I/Os assuming linear scales fit in memory. First use liner scales to determine the index into the cell directory access the cell directory to retrieve the bucket address (may cause 1 I/O if cell directory does not fit in memory) access the appropriate bucket (1 I/O) Range Queries: use linear scales to determine the index into the cell directory. Access the cell directory to retrieve the bucket addresses of buckets to visit. Access the buckets.

Grid File Insertions Determine the bucket into which insertion must occur. If space in bucket, insert. Else, split bucket how to choose a good dimension to split? If bucket split causes a cell directory to split do so and adjust linear scales. insertion of these new entries potentially requires a complete reorganization of the cell directory--- expensive!!!

Grid File Deletions Deletions may decrease the space utilization. Merge buckets We need to decide which cells to merge and a merging threshold Buddy system and neighbor system A bucket can merge with only one buddy in each dimension Merge adjacent regions if the result is a rectangle

Tree-based PAMs Most of tb-PAMs are based on kd-tree kd-tree is a main memory binary tree for indexing k-dimensional points Needs to be adapted for disk model Levels rotate among the dimensions, partitioning the space based on a value for that dimension kd-tree is not necessarily balanced

kd-tree At each level we use a different dimension x=5 y=3 y=6 x=6 A B C D E

Kd-tree properties Height of the tree O(log n) Search time for exact match: O(log n) Search time for range query: O(n 1/2 + k)

kd-tree example X=5 y=5 y=6 x=3 y=2 x=8x=7 X=5X=8 X=7 X=3 Y=6 Y=2

External memory kd-trees Similar to B-tree, tree nodes split many ways instead of two ways insertion becomes quite complex and expensive. No storage utilization guarantee since when a higher level node splits, the split has to be propagated all the way to leaf level resulting in many empty blocks. Pack many interior nodes (forming a subtree) into a block. it may not be feasible to group nodes at lower level into a block productively. Many interesting papers on how to optimally pack nodes into blocks recently published.

LSD-tree Local Split Decision – tree Use kd-tree to partition the space. Each partition contains up to B points. The kd-tree is stored in main-memory. If the kd-tree (directory) is large, we store a sub-tree on disk Goal: the structure must remain balanced: external balancing property

Example: LSD-tree

LSD-tree: main points Split strategies: Data dependent Distribution dependent Paging algorithm Two types of splits: bucket splits and internal node splits