File Processing : Multi-dimensional Index

File Processing : Multi-dimensional Index
2018 Spring, Pusan National University, Ki-Joune Li

Multi-Dimensional Index
Multi-Attribute Query vs. Single-Attribute Query
- Single-attribute query: only ONE attribute specifies the query condition.
  Example: Find students whose grade (GPA) is in [3.5, 4.5].
- Multi-attribute query: several attributes specify the query condition.
  Example: Find students whose height is greater than 180 cm and whose weight is less than 70 kg.
- Each attribute corresponds to a dimension, so a multi-attribute query is a multi-dimensional query.
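Before any index is used, a multi-attribute query can be sketched as a full scan. Each record is treated as a point (height, weight), and the query is a region of that 2-D space. The `Student` class, the function `multi_attribute_query`, and the sample data are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    name: str
    height_cm: float
    weight_kg: float


def multi_attribute_query(students, min_height, max_weight):
    """Full scan: every record is a point (height, weight) in 2-D space, and
    the query keeps points with height > min_height AND weight < max_weight."""
    return [s for s in students
            if s.height_cm > min_height and s.weight_kg < max_weight]


# Hypothetical sample data
students = [
    Student("A", 185, 68),
    Student("B", 175, 60),
    Student("C", 190, 80),
    Student("D", 182, 65),
]
print([s.name for s in multi_attribute_query(students, 180, 70)])  # ['A', 'D']
```

A full scan reads every record. The methods that follow aim to avoid this.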

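Processing Multi-dimensional Queries
Example: Find students whose height > 180 cm and weight < 70 kg.
Method 1: Using a B+-tree
- Step 1: Use a B+-tree on height to find students taller than 180 cm.
- Step 2: From the result of Step 1, select the students lighter than 70 kg.
- Height first, then weight, or weight first, then height? Starting with the more selective condition leaves fewer candidates to check in Step 2.
(Figure: the band height > 180 cm, filtered by weight < 70 kg, gives the result.)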
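Method 1 can be sketched by using a sorted list searched with `bisect` as a stand-in for the B+-tree. Both a B+-tree and a sorted list support a range scan on one key. The sketch runs both attribute orders so that the number of candidates reaching Step 2 can be compared. The records, `build_index`, `height_first`, and `weight_first` are hypothetical names.

```python
import bisect

# Hypothetical sample records: (name, height_cm, weight_kg)
STUDENTS = [("A", 185, 68), ("B", 175, 60), ("C", 190, 80),
            ("D", 182, 65), ("E", 170, 55)]


def build_index(records, attr):
    """Stand-in for a B+-tree: records sorted on one attribute plus its keys."""
    ordered = sorted(records, key=lambda r: r[attr])
    return ordered, [r[attr] for r in ordered]


def height_first(min_h, max_w):
    ordered, keys = build_index(STUDENTS, 1)
    # Step 1: range scan of the height index for height > min_h
    candidates = ordered[bisect.bisect_right(keys, min_h):]
    # Step 2: filter the candidates on weight
    return sorted(r[0] for r in candidates if r[2] < max_w), len(candidates)


def weight_first(min_h, max_w):
    ordered, keys = build_index(STUDENTS, 2)
    # Step 1: range scan of the weight index for weight < max_w
    candidates = ordered[:bisect.bisect_left(keys, max_w)]
    # Step 2: filter the candidates on height
    return sorted(r[0] for r in candidates if r[1] > min_h), len(candidates)


print(height_first(180, 70))  # (['A', 'D'], 3)
print(weight_first(180, 70))  # (['A', 'D'], 4)
```

Both orders return the same answer. With this data the height condition is more selective, so height-first checks 3 candidates instead of 4.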

Processing Multi-dimensional Queries
Method 2: Using Two B+-trees
- Step 1: Result1 ← students taller than 180 cm, found with a B+-tree on height
- Step 2: Result2 ← students lighter than 70 kg, found with a B+-tree on weight
- Step 3: Result ← Result1 ∩ Result2
Comparison of Method 1 and Method 2
(Figure: the band height > 180 cm intersected with the band weight < 70 kg gives the result.)
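Method 2 can be sketched with the same sorted-list stand-in for each B+-tree. Each index answers its own condition, and the two result sets are intersected. The sketch intersects student names. A real system would more likely intersect record identifiers. All names and data here are hypothetical.

```python
import bisect

# Hypothetical sample records: (name, height_cm, weight_kg)
STUDENTS = [("A", 185, 68), ("B", 175, 60), ("C", 190, 80),
            ("D", 182, 65), ("E", 170, 55)]


def build_index(records, attr):
    """Stand-in for a B+-tree: records sorted on one attribute plus its keys."""
    ordered = sorted(records, key=lambda r: r[attr])
    return ordered, [r[attr] for r in ordered]


def method2(min_h, max_w):
    h_ordered, h_keys = build_index(STUDENTS, 1)
    w_ordered, w_keys = build_index(STUDENTS, 2)
    # Step 1: Result1 <- students taller than min_h (height index)
    result1 = {r[0] for r in h_ordered[bisect.bisect_right(h_keys, min_h):]}
    # Step 2: Result2 <- students lighter than max_w (weight index)
    result2 = {r[0] for r in w_ordered[:bisect.bisect_left(w_keys, max_w)]}
    # Step 3: Result <- Result1 ∩ Result2
    return sorted(result1 & result2), len(result1), len(result2)


print(method2(180, 70))  # (['A', 'D'], 3, 4)
```

Method 2 scans both index ranges (3 + 4 entries here) before intersecting. Method 1 scans only one range but must then examine each candidate's other attribute.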

Processing Multi-dimensional Queries
Method 3: Unified Index for Several Attributes
- One index covers several attributes, treated as a multi-dimensional space.
- Two approaches: extending the B+-tree, or extending dynamic hashing.
(Figure: a single index over the height-weight plane.)

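Extending Hashing: Grid Approach
- Fixed grid method: the height-weight space is divided into cells of fixed size, and an array of block pointers maps each cell to its block.
- Grid file: the cell boundaries are variable.
(Figure: a query rectangle over a fixed grid and over a grid file.)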
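The fixed grid method can be sketched as follows. Each fixed-size cell maps to one bucket, which stands in for the disk block reached through the block pointer array. A range query visits every cell that overlaps the query rectangle. The class name `FixedGrid`, the cell size, and the data are hypothetical, and query bounds are treated as inclusive.

```python
from collections import defaultdict


class FixedGrid:
    """Fixed grid: space is cut into equal cells of size cell_w x cell_h;
    each cell maps to one bucket (standing in for a disk block)."""

    def __init__(self, cell_w, cell_h):
        self.cell_w, self.cell_h = cell_w, cell_h
        self.buckets = defaultdict(list)  # (i, j) -> list of points

    def _cell(self, x, y):
        return int(x // self.cell_w), int(y // self.cell_h)

    def insert(self, x, y):
        self.buckets[self._cell(x, y)].append((x, y))

    def range_query(self, x_lo, x_hi, y_lo, y_hi):
        """Visit every cell overlapping the query rectangle (inclusive bounds);
        return the matching points and the number of cells visited."""
        i_lo, j_lo = self._cell(x_lo, y_lo)
        i_hi, j_hi = self._cell(x_hi, y_hi)
        hits, visited = [], 0
        for i in range(i_lo, i_hi + 1):
            for j in range(j_lo, j_hi + 1):
                visited += 1
                for (x, y) in self.buckets.get((i, j), []):
                    if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
                        hits.append((x, y))
        return hits, visited


grid = FixedGrid(10, 10)  # x = height (cm), y = weight (kg); hypothetical cell size
for p in [(185, 68), (175, 60), (190, 80), (182, 65)]:
    grid.insert(*p)
hits, visited = grid.range_query(180, 199, 50, 69)
print(sorted(hits), visited)  # [(182, 65), (185, 68)] 4
```

Because the cell size is fixed, skewed data can leave many cells empty while others overflow. The grid file addresses this by letting the boundaries vary.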

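Extending Hashing: Grid File
- A directory maps each grid cell to a block pointer.
(Figure: grid file directory with cell boundaries (x1, y1) and (x2, y2), block pointers, and a query.)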
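A grid file lookup can be sketched with one sorted list of boundaries per dimension (for example x1, x2 and y1, y2) and a 2-D directory of block pointers. Several cells may share one block. The class `GridFileDirectory`, the boundary values, and the block ids are hypothetical. Splitting and merging of blocks are not shown.

```python
import bisect


class GridFileDirectory:
    """Sketch of a grid-file directory: per-dimension scales hold variable
    cell boundaries; directory[i][j] holds the block pointer of cell (i, j)."""

    def __init__(self, x_scale, y_scale, directory):
        self.x_scale = x_scale      # sorted boundaries along x, e.g. [x1, x2]
        self.y_scale = y_scale      # sorted boundaries along y, e.g. [y1, y2]
        self.directory = directory  # (len(x_scale)+1) x (len(y_scale)+1) grid

    def cell_of(self, x, y):
        # bisect_right: a point lying on a boundary goes to the upper cell
        return (bisect.bisect_right(self.x_scale, x),
                bisect.bisect_right(self.y_scale, y))

    def lookup(self, x, y):
        i, j = self.cell_of(x, y)
        return self.directory[i][j]

    def blocks_for_query(self, x_lo, x_hi, y_lo, y_hi):
        """Distinct blocks whose cells overlap the query rectangle."""
        i_lo, j_lo = self.cell_of(x_lo, y_lo)
        i_hi, j_hi = self.cell_of(x_hi, y_hi)
        return {self.directory[i][j]
                for i in range(i_lo, i_hi + 1)
                for j in range(j_lo, j_hi + 1)}


gf = GridFileDirectory(
    x_scale=[170, 180],            # hypothetical boundaries x1, x2 (height)
    y_scale=[60, 75],              # hypothetical boundaries y1, y2 (weight)
    directory=[["B1", "B1", "B2"],  # several cells may point to the same block
               ["B3", "B4", "B2"],
               ["B5", "B6", "B6"]],
)
print(gf.lookup(185, 68))                             # B6
print(sorted(gf.blocks_for_query(181, 200, 0, 69)))   # ['B5', 'B6']
```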

Problem 1: Dead Space
- The query area contains no objects, yet the query needs 5 block accesses.
- Dead space: empty space with no objects.
- How can dead space be reduced?

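Minimum Bounding Rectangle
- MBR (Minimum Bounding Rectangle): keep, for each block, the smallest rectangle enclosing its objects, instead of the whole cell.
- The query now needs only 1 disk access.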
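The effect of MBRs on dead space can be sketched by counting block accesses in two ways. The first compares the query with each block's grid cell. The second compares it with the MBR of the points actually stored in the block. Rectangles are `(xmin, ymin, xmax, ymax)`, and the blocks and query are hypothetical data chosen to mirror the slide.

```python
def mbr(points):
    """Smallest rectangle (xmin, ymin, xmax, ymax) enclosing the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical blocks: block id -> (grid-cell region, points stored in the block)
BLOCKS = {
    "B1": ((0, 0, 10, 10), [(1, 1), (2, 3)]),
    "B2": ((10, 0, 20, 10), [(18, 8)]),
    "B3": ((0, 10, 10, 20), [(2, 18)]),
    "B4": ((10, 10, 20, 20), [(12, 11), (19, 19)]),
}


def accesses_by_cell(query):
    """Blocks read when only the grid-cell regions are known."""
    return sorted(b for b, (cell, _) in BLOCKS.items() if intersects(cell, query))


def accesses_by_mbr(query):
    """Blocks read when each block's MBR is checked first (dead space skipped)."""
    return sorted(b for b, (_, pts) in BLOCKS.items()
                  if pts and intersects(mbr(pts), query))


q = (5, 5, 15, 15)
print(accesses_by_cell(q))  # ['B1', 'B2', 'B3', 'B4']
print(accesses_by_mbr(q))   # ['B4']
```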

Problem 2: Non-Point Objects
- Where should an object that spans several grid cells be stored?

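Minimum Bounding Rectangle
- MBR (Minimum Bounding Rectangle, also called Minimum Bounding Box): a two-dimensional geometric simplification of an object.
- It covers not the whole space, only the region occupied by objects.
- An MBR is given by its lower corner (X1min, X2min) and its upper corner (X1max, X2max).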
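For a non-point object such as a polygon, the MBR can be computed from the minimum and maximum of its vertex coordinates, using the (X1min, X2min) and (X1max, X2max) corners named above. An MBR test only filters. If the MBRs intersect, the object is a candidate and an exact geometric test would still be needed. The polygon and query boxes below are hypothetical.

```python
from typing import NamedTuple


class MBR(NamedTuple):
    x1min: float
    x2min: float
    x1max: float
    x2max: float

    def intersects(self, other):
        return (self.x1min <= other.x1max and other.x1min <= self.x1max and
                self.x2min <= other.x2max and other.x2min <= self.x2max)


def mbr_of(vertices):
    """MBR of a non-point object given as a list of (x1, x2) vertices."""
    x1s = [v[0] for v in vertices]
    x2s = [v[1] for v in vertices]
    return MBR(min(x1s), min(x2s), max(x1s), max(x2s))


lake = [(2, 3), (6, 1), (9, 4), (5, 8), (1, 6)]  # hypothetical polygon
box = mbr_of(lake)
print(box)                                # MBR(x1min=1, x2min=1, x1max=9, x2max=8)
print(box.intersects(MBR(8, 7, 12, 10)))  # True  (candidate; exact test follows)
print(box.intersects(MBR(10, 0, 12, 2)))  # False (rejected by the MBR alone)
```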

Extending B+-tree: R-tree
B+-tree vs. R-tree
- B+-tree: each node entry covers an interval (a 1-D rectangle).
- R-tree: each node entry covers a multi-dimensional interval (a rectangle); the R-tree is a "rectangle B+-tree".
- Each node entry stores an MBR (Minimum Bounding Rectangle) instead of an interval (or delimiter).
- There is no linked list connecting the external (leaf) nodes.
- A certain amount of overlap between MBRs is unavoidable.

Extending B+-tree: R-tree Example
(Figure: an R-tree with its root node and a query rectangle.)
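An R-tree range search can be sketched as a recursive descent from the root. It follows every entry whose MBR intersects the query. Sibling MBRs may overlap, so more than one subtree can be visited. Because there is no leaf-level linked list, every search starts at the root. The `Node` layout and the tree below are hypothetical. Insertion and splitting are not shown.

```python
from dataclasses import dataclass, field


def intersects(a, b):
    """Rectangles are (xmin, ymin, xmax, ymax); touching edges count as overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    leaf: bool
    # Each entry is (mbr, child Node) in an internal node, (mbr, object_id) in a leaf.
    entries: list = field(default_factory=list)


def search(node, query, out=None):
    """Collect ids of leaf entries whose MBR intersects the query."""
    if out is None:
        out = []
    for rect, item in node.entries:
        if intersects(rect, query):
            if node.leaf:
                out.append(item)
            else:
                search(item, query, out)
    return out


# Hypothetical two-level tree; each root entry's MBR encloses its child's entries
leaf1 = Node(True, [((1, 1, 2, 2), "a"), ((3, 2, 4, 4), "b")])
leaf2 = Node(True, [((6, 5, 7, 6), "c"), ((8, 8, 9, 9), "d")])
leaf3 = Node(True, [((3, 6, 5, 8), "e")])
root = Node(False, [((1, 1, 4, 4), leaf1), ((6, 5, 9, 9), leaf2),
                    ((3, 6, 5, 8), leaf3)])

print(search(root, (3.5, 3, 6.5, 7)))  # ['b', 'c', 'e']
print(search(root, (0, 0, 1.5, 1.5)))  # ['a']
```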

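Upward Split, like a B-tree
- On overflow, the node's MBR is split into two new MBRs, and the split propagates upward as in a B-tree.
- Line sweeping: sweep a splitting line along X and along Y, compare Cost-X and Cost-Y, and keep the cheaper split.
(Figure: the splitting line and the two new MBRs.)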
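The line-sweeping split can be sketched as follows. The entries of the overflowing node are sorted along X, every splitting position that leaves at least `min_fill` entries on each side is tried, and the same is repeated along Y. The cheaper of Cost-X and Cost-Y wins. The slide does not fix the cost measure, so total MBR area is assumed here. The function names, `min_fill`, and the data are hypothetical.

```python
def mbr_of(rects):
    """Smallest rectangle (xmin, ymin, xmax, ymax) enclosing all rects."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def sweep(rects, axis, min_fill):
    """Sort along one axis (0 = X, 1 = Y), move a splitting line between
    consecutive entries, and return the cheapest (cost, left, right)."""
    order = sorted(rects, key=lambda r: (r[axis], r[axis + 2]))
    best = None
    for k in range(min_fill, len(order) - min_fill + 1):
        left, right = order[:k], order[k:]
        cost = area(mbr_of(left)) + area(mbr_of(right))  # assumed cost: total area
        if best is None or cost < best[0]:
            best = (cost, left, right)
    return best


def split(rects, min_fill=2):
    """Compare Cost-X and Cost-Y and keep the cheaper split.
    Assumes len(rects) >= 2 * min_fill."""
    cost_x = sweep(rects, 0, min_fill)
    cost_y = sweep(rects, 1, min_fill)
    return cost_x if cost_x[0] <= cost_y[0] else cost_y


# Hypothetical overflowing node: two clusters stacked vertically
A, B, C, D = (0, 0, 1, 1), (1, 0, 2, 1), (0, 8, 1, 9), (1, 8, 2, 9)
cost, left, right = split([A, B, C, D])
print(cost, left, right)  # 4 [(0, 0, 1, 1), (1, 0, 2, 1)] [(0, 8, 1, 9), (1, 8, 2, 9)]
```

Here the best X split has cost 18 and the best Y split has cost 4, so the horizontal splitting line is chosen.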

Splitting Strategy
- Instead of a plain 50:50 split, use other cost measures: area, perimeter, overlapping area.
(Figure: a good split vs. a bad split.)
- Goals:
  1. Make the new MBRs as COMPACT as possible.
  2. Preserve spatial proximity as much as possible.
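The three cost measures named above can be sketched for a candidate split into two groups. The measures are total area of the two MBRs, total perimeter, and the area where the two MBRs overlap. Lower values mean a more compact split. The function names and the four rectangles are hypothetical. The "good" split keeps neighbors together, and the "bad" split pairs diagonal neighbors.

```python
def mbr_of(rects):
    """Smallest rectangle (xmin, ymin, xmax, ymax) enclosing all rects."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def perimeter(r):
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a, b):
    """Area of the intersection of two rectangles (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def split_costs(group1, group2):
    m1, m2 = mbr_of(group1), mbr_of(group2)
    return {"area": area(m1) + area(m2),
            "perimeter": perimeter(m1) + perimeter(m2),
            "overlap": overlap(m1, m2)}


A, B, C, D = (0, 0, 1, 1), (1, 0, 2, 1), (0, 2, 1, 3), (1, 2, 2, 3)
print(split_costs([A, B], [C, D]))  # good: {'area': 4, 'perimeter': 12, 'overlap': 0}
print(split_costs([A, D], [B, C]))  # bad:  {'area': 12, 'perimeter': 20, 'overlap': 6}
```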

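R*-tree: An Improvement of the R-tree
- Re-insertion strategy on overflow: instead of splitting at once, some entries are deleted from the overflowing node and re-inserted, which makes the nodes more compact.
- The most popular multi-dimensional index.
(Figure: a newly inserted object causes an overflow; an entry is deleted and re-inserted.)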
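The selection step of forced re-insertion can be sketched as follows. The entries of the overflowing node are sorted by the distance of their centers from the center of the node's MBR, farthest first, and the first p entries are removed for re-insertion. The R*-tree paper suggests p of about 30% of the node capacity. This sketch assumes 30% of the current entries. In the full R*-tree, re-insertion happens at most once per level per insertion, and a split follows if the overflow persists. That logic, the function name, and the data here are not from the slides and are hypothetical.

```python
import math


def mbr_of(rects):
    """Smallest rectangle (xmin, ymin, xmax, ymax) enclosing all rects."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def pick_for_reinsert(entries, fraction=0.3):
    """Sort entries by distance of their centers from the center of the
    node's MBR (farthest first) and split off the first p for re-insertion.
    Returns (kept, to_reinsert)."""
    node_center = center(mbr_of(entries))
    by_distance = sorted(entries,
                         key=lambda r: math.dist(center(r), node_center),
                         reverse=True)
    p = max(1, int(len(entries) * fraction))
    return by_distance[p:], by_distance[:p]


# Hypothetical overflowing node with 5 entries
entries = [(0, 0, 8, 8), (1, 1, 3, 3), (4, 4, 6, 6), (7, 0, 8, 1), (3, 3, 5, 5)]
kept, to_reinsert = pick_for_reinsert(entries)
print(to_reinsert)  # [(7, 0, 8, 1)]
print(len(kept))    # 4
```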