Spatial Databases - Indexing

Spatial Databases - Indexing Spring, 2015 Ki-Joune Li

What is Indexing? Indexing is a fight against time. Example: suppose you have a copy of Hamlet and you want to know the name of Hamlet's father. Without an index: a full (sequential) scan of the book. With an index: direct access to the page.

Some Constraints. Modern databases: very huge volume (e.g. several petabytes), so storage on disk is inevitable. But disk is slow compared with main memory (msec. vs. nanosec.), even in a main-memory database system. What should we do? Minimize the number of disk accesses.

The Objective of Indexing. Indexing maps a query condition to a disk address (block number) in the database on disk.

Classification of Indexing. According to the type of query and data: alphanumeric, image, spatial. Example of a spatial query: What is the nearest post office to the Louvre Museum? The query contains a spatial predicate. A spatial index maps a spatial query to a disk address (block number) in the database on disk.

Spatial Query: sophisticated. Types of spatial query. One-scan queries: region query (containment, intersection) and k-nearest neighbor query. Multi-scan queries (joins): spatial join and distance join. Spatial query processing is tightly coupled with the spatial indexing method.

Spatial Processing Strategy: Filtering and Refinement. Filtering: the spatial query is run against an index built on simplified geometry and yields candidates. Refinement: the geometry of each candidate is verified against the complete data to give the result. Benefits: 1. A lighter index (e.g. < 1 Mbyte). 2. Unnecessary disk accesses are removed.
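
The two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Circle` class, its `mbr` and `contains` methods, and the point query are hypothetical and do not come from the slides. The index is reduced to a plain list of bounding rectangles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    """Hypothetical exact geometry: a disc with center (cx, cy) and radius r."""
    cx: float
    cy: float
    r: float

    def mbr(self):
        # Simplified geometry kept in the index: (xmin, ymin, xmax, ymax).
        return (self.cx - self.r, self.cy - self.r,
                self.cx + self.r, self.cy + self.r)

    def contains(self, x, y):
        # Exact test, used only in the refinement step.
        return (x - self.cx) ** 2 + (y - self.cy) ** 2 <= self.r ** 2


def point_query(objects, x, y):
    """Return (candidates, result) for 'which objects contain (x, y)?'."""
    # Filtering: cheap test on the bounding rectangles only.
    candidates = []
    for o in objects:
        xmin, ymin, xmax, ymax = o.mbr()
        if xmin <= x <= xmax and ymin <= y <= ymax:
            candidates.append(o)
    # Refinement: exact geometry test, on the candidates only.
    result = [o for o in candidates if o.contains(x, y)]
    return candidates, result


discs = [Circle(0, 0, 1), Circle(5, 5, 1), Circle(0.5, 0.5, 1)]
cands, hits = point_query(discs, 0.9, 0.9)
```

Here the filter keeps two discs whose rectangles cover the query point, and refinement discards the one whose exact geometry does not.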

Classification of Spatial Indexing Methods. Hashing and indexing: an index (in the wide sense) covers hashing and indexing (in the narrow sense). Space decomposition vs. MBR: decomposition covers the whole space; a bounding rectangle covers only the interesting area. Dimensionality: no transformation; transformation to a higher dimension; transformation to a lower dimension (linearization).

Indexing vs. Hashing. Hashing: 1. b = h(r.key) 2. Store(r, b). The block number is determined by the hashing function or mechanism. Only for a primary index. Search by a hashing function. Indexing (in the narrow sense): 1. b = Store(r) 2. Insert(B, (r.key, b)). The block number is independent of the indexing mechanism. For a primary or secondary index. Search by a data structure called an index.
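
A minimal sketch of the two insertion orders, under assumptions that are not in the slides: blocks are Python lists, h is a modulo function on integer keys, and a dict stands in for the index structure B (a real system would use something like a B-tree).

```python
NUM_BLOCKS = 4  # assumed number of disk blocks


def h(key):
    # Assumed hash function for integer keys.
    return key % NUM_BLOCKS


def hash_insert(blocks, key, record):
    b = h(key)                        # 1. b = h(r.key)
    blocks[b].append((key, record))   # 2. Store(r, b)
    return b


def hash_search(blocks, key):
    # The hash function alone tells us which block to read.
    return [rec for k, rec in blocks[h(key)] if k == key]


def index_insert(heap, index, key, record):
    heap.append(record)               # 1. b = Store(r): any free slot will do
    b = len(heap) - 1
    index.setdefault(key, []).append(b)   # 2. Insert(B, (r.key, b))
    return b


def index_search(heap, index, key):
    # The index structure tells us which block(s) to read.
    return [heap[b] for b in index.get(key, [])]


blocks = [[] for _ in range(NUM_BLOCKS)]
hashed_block = hash_insert(blocks, 10, "r10")

heap, index = [], {}
stored_block = index_insert(heap, index, 10, "r10")
```

With hashing the key fixes the block, so only one such placement exists per file. With indexing the record's location is free, which is why several (secondary) indexes can point at the same records.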

Decomposition vs. Bounding Region

Decomposition Methods. Grid file: an extension of hashing to 2-D. Variations: fixed grid, grid file, multi-level grid file. Hierarchical data structures: KD-tree, quadtree, skd-tree, etc.

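Fixed Grid. The simplest method. Minimum data for hashing. Query processing for a query window: 1. Find the intersecting grid cells. 2. Find the corresponding blocks. 3. Read the objects from the blocks. 4. Refinement. (Figure: a grid with ticks at 10, 20, 30, 40, 50, a cell labeled "1 Disk Page", and a query window.)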
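
The four query steps could look roughly like this in Python. The cell size of 10, the dict used as a directory, and the function names are assumptions made for the sketch; each dict entry plays the role of one disk block.

```python
from collections import defaultdict

CELL = 10  # assumed cell width and height


def cell_of(x, y):
    # Hash a coordinate to its grid cell.
    return (int(x // CELL), int(y // CELL))


def build_grid(points):
    grid = defaultdict(list)  # cell -> "disk block" holding the points
    for p in points:
        grid[cell_of(*p)].append(p)
    return grid


def window_query(grid, xmin, ymin, xmax, ymax):
    # 1. Find the intersecting grid cells.
    cx0, cy0 = cell_of(xmin, ymin)
    cx1, cy1 = cell_of(xmax, ymax)
    result = []
    for cx in range(cx0, cx1 + 1):
        for cy in range(cy0, cy1 + 1):
            # 2. and 3. Find the corresponding block and read its objects.
            for (x, y) in grid.get((cx, cy), []):
                # 4. Refinement: the cell may stick out of the window.
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    result.append((x, y))
    return result


points = [(5, 5), (12, 18), (25, 25), (31, 8), (19, 11)]
grid = build_grid(points)
found = window_query(grid, 10, 10, 22, 20)
```

The block for cell (2, 2) is read although its only point lies outside the window, which is the kind of unnecessary access a coarse grid causes.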

Problems of Fixed Grid. Only for point objects: an object with measure needs duplicated storage, which degrades performance. Large dead space, which causes unnecessary disk accesses. Not very flexible with respect to the distribution.

Grid File. To overcome the problems of the fixed grid: reduce dead space within a cell and increase the blocking factor. Directory:
Grid  Boundary          Block#
A     (0,0),(15,20)     Page 0
B     (15,0),(30,20)    Page 1
...
I     (30,28),(50,40)   Page 15
(Figure: the grid with a query window.)

Blocking Factor (Bf): the number of objects in a disk block. A key factor in performance, since it determines the number of disk accesses. How to increase Bf? Increase the block size (not always possible), or use packing.

Problems of Grid File. Only for point objects. Still large dead space. Large size of the directory (grid, boundary, block#): A (0,0),(15,20) Page 0; B (15,0),(30,20) Page 1; ...; I (30,28),(50,40) Page 15.

Hierarchical Decomposition. To overcome the size of the directory in the grid file: a hierarchical structure for the directory, which accelerates search.

KD-tree : Index. An extension of the binary tree to K dimensions (K = 2 for us). Example: suppose Bf = 3. The directory is a binary tree whose internal nodes are split lines (x=20, y=20, y=10, x=30, with branches labeled =< and <). Each leaf node (A, B, C, D, E) points to a disk page.

KD-tree : Search. (Figure: the KD-tree directory with splits x=20, y=20, y=10, x=30 and leaf pages A, B, D, E; a search descends from the root to a leaf page.)
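
A small bucket KD-tree shows how the directory is built and searched. The sketch assumes a median split that alternates between x and y, which is one common choice and not necessarily how the split lines in the slide example were picked; the dict-based node layout is also an assumption.

```python
BF = 3  # blocking factor, as in the slide example


def build(points, depth=0):
    """Return a leaf {'page': [...]} or an internal split node."""
    if len(points) <= BF:
        return {"page": list(points)}
    axis = depth % 2                       # 0: split on x, 1: split on y
    pts = sorted(points, key=lambda p: p[axis])
    split = pts[(len(pts) - 1) // 2][axis]  # median coordinate
    left = [p for p in pts if p[axis] <= split]
    right = [p for p in pts if p[axis] > split]
    if not right:
        # All points share this coordinate: keep them in one (overfull) page.
        return {"page": pts}
    return {"axis": axis, "split": split,
            "left": build(left, depth + 1),
            "right": build(right, depth + 1)}


def find_page(node, point):
    """Descend from the root to the leaf page that would hold `point`."""
    while "page" not in node:
        side = "left" if point[node["axis"]] <= node["split"] else "right"
        node = node[side]
    return node["page"]


pts = [(5, 5), (12, 18), (25, 25), (31, 8), (19, 11), (8, 30), (40, 40)]
tree = build(pts)
page = find_page(tree, (12, 18))
```

A point search reads exactly one leaf page, but the number of directory nodes visited depends on how balanced the tree is.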

Weak Points of KD-tree. Only for point objects. Dead space. How to store the tree structure on disk: the blocking problem. Widely used as a main-memory index, rarely used as a disk-resident index. Unbalanced tree: by Zipf's law (or the 80/20 law) most events are concentrated, which leads to a highly skewed tree.

Quadtree. An extension of the KD-tree: the KD-tree uses a binary split, the quadtree a 4-way equi-split instead. Example: Bf = 3. Each leaf node (A to J in the figure) points to a disk page.
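
The 4-way equi-split can be sketched as a small bucket quadtree. The class name `QuadNode`, the 40 x 40 data space, and the rule "split when a page holds more than Bf points" are assumptions for illustration; the sketch also assumes distinct points, since more than Bf identical points would split forever.

```python
BF = 3  # blocking factor, as in the slide example


class QuadNode:
    def __init__(self, x0, y0, x1, y1):
        self.box = (x0, y0, x1, y1)
        self.points = []       # content of the leaf (one disk page)
        self.children = None   # four sub-quadrants once the node is split

    def _child_for(self, p):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        # Children are stored in the order SW, SE, NW, NE.
        return self.children[(1 if p[0] >= mx else 0) + (2 if p[1] >= my else 0)]

    def _split(self):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        # Equi-split: always cut at the middle of the cell.
        self.children = [QuadNode(x0, y0, mx, my), QuadNode(mx, y0, x1, my),
                         QuadNode(x0, my, mx, y1), QuadNode(mx, my, x1, y1)]
        pts, self.points = self.points, []
        for p in pts:
            self._child_for(p).insert(p)

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > BF:   # page overflow
            self._split()

    def leaves(self):
        if self.children is None:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]


root = QuadNode(0, 0, 40, 40)
for p in [(5, 5), (12, 18), (25, 25), (31, 8), (19, 11)]:
    root.insert(p)
leaf_sizes = [len(leaf.points) for leaf in root.leaves()]
```

After the split one quadrant holds three points and another holds none, which hints at the dead space and skew that equi-splitting produces on clustered data.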

Weak Points of Quadtree. The same problems as the KD-tree, in addition to a lack of flexibility. Only for point objects. Dead space. How to store the tree structure on disk: the blocking problem. Widely used as a main-memory index, rarely used as a disk-resident index. Unbalanced tree: by Zipf's law (or the 80/20 law) most events are concentrated, which leads to a highly skewed tree.

Point Quadtree. A simple variation of the quadtree: a partition point is specified instead of an equi-split. More adaptive to the distribution of objects. Less skewed. (Figure: partition points (10,20), (5,25), (35,10); leaves A to J.)

Linear Quadtree : Space-Filling Curve. A quadtree, but in another representation: linearization by a space-filling curve (N-order, Hilbert, column-wise). Points (or cells) are linearized by their Peano key.

Linear Quadtree. Example: N-order curve. Computation of the Peano key by bit-interleaving: 1. Binary representation of the coordinates: (x, y) = (10, 01). 2. Bit-interleaving: x = 1 0, y = 0 1, so Peano key = 1 0 0 1 = 9. (Figure: a 4 x 4 grid with axes labeled 00, 01, 10, 11.)
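
The bit-interleaving step is short enough to write out. The function name `peano_key` and the explicit `bits` parameter are assumptions; the bit order (x bit first, then y bit, from the most significant bit down) follows the slide's example.

```python
def peano_key(x, y, bits):
    """Interleave the bits of x and y, taking the x bit first."""
    key = 0
    for i in range(bits - 1, -1, -1):         # most significant bit first
        key = (key << 1) | ((x >> i) & 1)     # next bit of x
        key = (key << 1) | ((y >> i) & 1)     # next bit of y
    return key


slide_example = peano_key(0b10, 0b01, bits=2)  # binary 1001
```

For the slide's coordinates (10, 01) this gives binary 1001, that is 9.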

MBR Methods. MBR (Minimum Bounding Rectangle): a two-dimensional geometric simplification of objects, given by its corners (X1min, X2min) and (X1max, X2max). It covers not the whole space, only the region occupied by objects. Methods: the R-tree and its variants.

R-tree. Construction of an R-tree: a sequence of insertions, with upward splits. Each leaf node points to the disk page holding its 2-D objects. (Figure: 2-D objects grouped into MBRs A to K, and the corresponding tree.)

Splitting in R-tree. Split the MBR in the case of overflow. Line sweeping: compare Cost-X and Cost-Y to pick the splitting line, which gives the new MBRs. Cost measures: area, perimeter, overlapping area.
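
One way to read the line-sweeping idea is sketched below. It is an assumption-laden illustration, not the slide's exact procedure: entries are sorted by their lower coordinate along each axis, every split position is tried, and the cost is the summed area of the two resulting MBRs (perimeter or overlapping area could be used instead).

```python
def mbr_of(rects):
    """MBR of a list of rectangles given as (xmin, ymin, xmax, ymax)."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def best_split(rects, min_fill=1):
    """Sweep along x and y; return (cost, axis, group1, group2)."""
    best = None
    for axis in (0, 1):                       # 0: sweep along x, 1: along y
        order = sorted(rects, key=lambda r: r[axis])
        for k in range(min_fill, len(order) - min_fill + 1):
            g1, g2 = order[:k], order[k:]
            cost = area(mbr_of(g1)) + area(mbr_of(g2))   # assumed cost measure
            if best is None or cost < best[0]:
                best = (cost, axis, g1, g2)
    return best


entries = [(0, 0, 1, 1), (1, 0, 2, 1), (10, 0, 11, 1), (11, 0, 12, 1)]
cost, axis, group1, group2 = best_split(entries)
```

For these four entries the cheapest split separates the two left rectangles from the two right ones, giving two compact MBRs of area 2 each.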

R-tree : Query Processing. Starting from the root, visit only the nodes whose MBR intersects the query region W. Leaf entries whose MBR intersects W are candidates. Refinement: read each candidate's exact geometry from the database. Sample: http://www.dbnet.ece.ntua.gr/~mario/rtree/
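
The filtering part of a region query can be sketched on a hand-built two-level tree. The `Node` class, the object names, and the coordinates are made up for the example; a real R-tree would also handle insertion, splitting and disk pages.

```python
class Node:
    """Internal node (children) or leaf entry (obj); both carry an MBR."""

    def __init__(self, mbr, children=None, obj=None):
        self.mbr = mbr                  # (xmin, ymin, xmax, ymax)
        self.children = children or []
        self.obj = obj                  # object id for leaf entries


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node, window, candidates):
    if not intersects(node.mbr, window):
        return candidates               # prune the whole subtree
    if node.obj is not None:
        candidates.append(node.obj)     # leaf entry: candidate for refinement
    for child in node.children:
        search(child, window, candidates)
    return candidates


A = Node((0, 0, 5, 3), [Node((0, 0, 2, 2), obj="d"), Node((3, 1, 5, 3), obj="e")])
B = Node((10, 9, 15, 12), [Node((10, 10, 12, 12), obj="f"),
                           Node((13, 9, 15, 11), obj="g")])
root = Node((0, 0, 15, 12), [A, B])

candidates = search(root, (4, 2, 11, 9.5), [])
```

Node B is visited because its MBR touches the window, yet none of its entries qualify: an example of a visit caused purely by dead space.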

Strength of R-tree. For point and non-point objects. Good for non-uniform distributions. Paged tree. Hierarchical structure, but balanced. Less dead space than decomposition methods.

Weak Points of R-tree : Overlapping Area. Overlapping causes false matching: the query visits unnecessary nodes, which degrades performance. (Figure: a query region over overlapping MBRs A to M.)

Weak Points of R-tree : Dead Space. There is at least one visit to this node (K) even though there is nothing there. (Figure: the query region falls in the dead space of node K.)

Weak Points of R-tree : Bad Split. Good split vs. bad split: 1. Make the MBRs as compact as possible. 2. Preserve spatial proximity as far as possible.

Improvement of R-tree. Minimize the overlapping area and the dead space, or in other words make it more compact. Preserve spatial proximity. Two approaches: packing (or bulk loading); good split or insertion strategies.

R*-tree : An Improvement of R-tree. Re-insertion strategy on overflow: when a newly inserted object causes an overflow, delete an entry (marked in the figure) and re-insert it.

R*-tree : An Improvement of R-tree. Re-insertion strategy on overflow: after the object is re-inserted, the nodes are more compact.

R*-tree : An Improvement of R-tree. Compact: small overlapping area, small sum of MBR areas or perimeters, small dead space. Stable: not much affected by the order of insertions. The most widely used spatial indexing method.

Packing R-tree : Improvement of R-tree. Preprocessing to make the R-tree more compact, instead of sequential insertions. Methods: Hilbert R-tree, STR (Sort-Tile Recursive), uniformization.

Hilbert Packing. Hilbert curve: a space-filling curve. Spatial objects are linearized by their Peano key. (Figure: N-order, Hilbert, and column-wise curves.)

Hilbert Packing. Example: Bf = 3. Sort the objects by Hilbert key. Pack them in a round-robin way. This maximizes storage utilization and minimizes the dead space and the sum of MBR areas.
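
A sketch of the sort-then-pack idea for points on an n x n grid (n a power of two). Two things are assumptions here: the key is computed with the widely used iterative coordinate-to-distance conversion for the Hilbert curve, and "packing" is taken to mean that each leaf receives the next Bf objects in key order.

```python
def hilbert_key(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_pack(points, n, bf=3):
    """Sort by Hilbert key, then fill leaves with bf objects each."""
    ordered = sorted(points, key=lambda p: hilbert_key(n, p[0], p[1]))
    return [ordered[i:i + bf] for i in range(0, len(ordered), bf)]


leaves = hilbert_pack([(1, 0), (0, 0), (1, 1), (0, 1)], n=2, bf=3)
```

Every leaf except possibly the last is completely full, which is where the high storage utilization comes from.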

STR (Sort-Tile Recursive). Basic idea: "tile" the data space using vertical slices. r: number of rectangles. n: blocking factor. P (number of leaf node pages) = ⌈r/n⌉. Example: suppose r = 25 and n = 3; then nTile = 9, nV = 3, nH = 3.
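
The tiling can be sketched for point data as follows. The sketch assumes P = ceil(r / n) leaf pages and ceil(sqrt(P)) vertical slices, which reproduces the slide's numbers (nTile = 9, nV = 3) for r = 25 and n = 3; it builds only the leaf level, and the function name is made up.

```python
import math


def str_pack(points, n):
    """Pack points into leaf pages of at most n objects using STR tiling."""
    r = len(points)
    if r == 0:
        return []
    num_pages = math.ceil(r / n)                  # P (nTile in the slide)
    num_slices = math.ceil(math.sqrt(num_pages))  # vertical slices (nV)
    slice_size = num_slices * n                   # objects per vertical slice
    by_x = sorted(points, key=lambda p: p[0])
    leaves = []
    for i in range(0, r, slice_size):
        # Inside one vertical slice, sort by y and cut into pages.
        vslice = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        leaves.extend(vslice[j:j + n] for j in range(0, len(vslice), n))
    return leaves


pts = [(i % 5, i // 5) for i in range(25)]   # r = 25 points
leaves = str_pack(pts, n=3)
```

Applying the same procedure to the MBRs of the leaves would build the next level up, which is the "recursive" part of the name.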

Comparison : Hilbert Packing vs. STR. (Figure panels: Hilbert packing (HP) and STR on large objects; HP and STR on points.)

Uniformization. A non-uniform distribution has a negative effect on performance, but data in real applications are non-uniform. Uniformization technique: Step 1: transform the non-uniform data to uniform by STR. Step 2: apply an R-tree (or fixed grid). Step 3: transform the query region. Strengths: high storage utilization; very simple, with good performance.

Uniformization. Non-equi-width vs. equi-width. 1. Area of each cell: identical. 2. Number of objects within each cell: almost identical.

Uniformization : Example. (Figure panels: Original; By Delaunay Triangulation; By STR.)

Uniformization : Example. (Figure panels: Original; By STR.)

Query Processing by R-tree : Nearest Neighbor. (Figure: query point, searching space, and the minimum and 2nd distances in 2-D.)

Query Processing by R-tree : Nearest Neighbor. Branching and pruning. (Figure: branches taken, branches pruned, and the minimum distance.)
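
Branching and pruning can be illustrated with a depth-first branch-and-bound search. This is a sketch under assumptions: nodes are plain dicts, leaf entries are points, children are visited in order of increasing minimum distance from the query point to their MBR, and a subtree is skipped when that distance is not smaller than the best distance found so far. It needs Python 3.8 or later for `math.dist`.

```python
import math


def mindist(q, mbr):
    """Smallest possible distance from point q to anything inside mbr."""
    dx = max(mbr[0] - q[0], 0, q[0] - mbr[2])
    dy = max(mbr[1] - q[1], 0, q[1] - mbr[3])
    return math.hypot(dx, dy)


def nearest(node, q, best=(math.inf, None)):
    """Return (distance, point) of the nearest neighbor of q below node."""
    if "point" in node:
        d = math.dist(q, node["point"])
        return (d, node["point"]) if d < best[0] else best
    # Branching: try the most promising child first.
    for child in sorted(node["children"], key=lambda c: mindist(q, c["mbr"])):
        # Pruning: nothing in this or any later child can beat the current best.
        if mindist(q, child["mbr"]) >= best[0]:
            break
        best = nearest(child, q, best)
    return best


def leaf(x, y):
    return {"mbr": (x, y, x, y), "point": (x, y)}


A = {"mbr": (0, 0, 2, 2), "children": [leaf(0, 0), leaf(2, 2)]}
B = {"mbr": (5, 5, 8, 8), "children": [leaf(5, 5), leaf(8, 8)]}
root = {"mbr": (0, 0, 8, 8), "children": [A, B]}

dist, point = nearest(root, (3, 3))
```

In this example node B is never opened: its minimum distance to the query point already exceeds the distance to (2, 2) found under node A.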

Transformation to Higher Space. Transformation to a higher dimension: transform a non-point object to a point object. This allows spatial indexing methods that are applicable only to point objects (e.g. grid file) to be reused for non-point objects. Example: 1-D objects A, B, C become points in the (Min, Max) plane, e.g. A becomes (Amin, Amax).

Corner Transformation. From 2-D to 4-D: 1. Simplification by MBR. 2. MBR ((Xmin, Ymin), (Xmax, Ymax)) to point (Xmin, Ymin, Xmax, Ymax).

Query Processing for Corner Transformation : 1-D Example. Query with window W: find contained objects. The (Min, Max) plane around an object A = (Amin, Amax) is divided into six regions. Region I: Wmax < Amin. Region II: W ⊂ A. Region III: Amax < Wmin. Region IV: Amin < Wmin, Amax < Wmax. Region V: Wmin < Amin, Wmax < Amax. Region VI: A ⊂ W.
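
In the 1-D case the containment query becomes a simple range condition on the transformed points. The sketch below is illustrative only: the function names and the sample intervals are made up, and a linear scan stands in for whatever point index (e.g. a grid file) would store the transformed points.

```python
def corner_transform(interval):
    """1-D object [amin, amax] becomes the 2-D point (amin, amax)."""
    amin, amax = interval
    return (amin, amax)


def contained_in(window, points):
    """Transformed points of the objects that lie inside the window."""
    wmin, wmax = window
    # A is contained in W exactly when Wmin <= Amin and Amax <= Wmax,
    # i.e. the point (Amin, Amax) falls in a rectangular range.
    return [p for p in points if wmin <= p[0] and p[1] <= wmax]


objects = {"A": (2, 5), "B": (4, 9), "C": (6, 7)}
points = [corner_transform(iv) for iv in objects.values()]
inside = contained_in((3, 8), points)
```

Only C lies inside the window (3, 8): A starts before the window and B ends after it.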

Transformation to Lower Dimension : Linear Quadtree. 1. Simplification of geometry. 2. Compute the Peano key with the lower-left corner. 3. If necessary, divide the object and give a Peano key to each piece. 4. Define the size of each piece according to the number of quadrants. 5. Insert the pieces into a B-tree. 6. Query processing by B-tree. (Figure labels: (22, 0), (28, 1), (23, 0), (0, 2).)