Native Multidimensional Indexing in Relational Databases

Native Multidimensional Indexing in Relational Databases
David Hoksza, Tomáš Skopal
Charles University in Prague, Department of Software Engineering, Czech Republic

Presentation Outline
- Multidimensional querying
  - attribute number growth
  - indexing methods
  - contemporary DB systems
- Indexing in PostgreSQL
  - user-defined access methods
  - external indexing framework
- Experiments
COMAD 2008

Indexing
- Single-attribute based indexing
  - to avoid a sequential scan
  - B-tree
      SELECT * FROM Products WHERE BrandID BETWEEN 3 AND 11
- Multi-attribute based indexing
  - window query
  - straightforward solution: multiple B-trees
      SELECT * FROM Products WHERE BrandID BETWEEN 13 AND 14 AND ProductTypeID BETWEEN 13 AND 24 AND PeriodID BETWEEN 3 AND 11

Multi-attribute Based Indexing
- Multiple B-trees
  - attribute number growth → exponential growth of partial result-sets → sequential scan
  - dimensionality curse

Multi-dimensional access methods
- B+-tree with compound keys
  - the most often employed solution
  - multiple keys concatenated into a single value
  - key components compared in lexicographical order
  - asymmetry in the order of the keys
- R-tree
- UB-tree
  - transformation of n-dim points into a 1-dim Z-address → Z-curve
  - the Z-curve divided into Z-regions indexed by a B+-tree

Native Multi-dimensional Indexing
    SELECT * FROM Products WHERE BrandID BETWEEN 13 AND 14 AND ProductTypeID BETWEEN 13 AND 24 AND PeriodID BETWEEN 3 AND 11
- Table rows → points in n-dimensional space, indexed by an R-tree
- Queries → cubes in n-dimensional space (n-dimensional windows)

Contemporary DB systems

PostgreSQL
- Object-relational DBMS
- Open-source
- Since 2005 (v. 8.0): runs on Windows
- Emphasis on extensibility
  - data types
  - operators
  - procedural languages
  - access methods
  - …

Relation Types in PostgreSQL
- Heap relations (HRs)
  - user relations
  - system catalog
  - undefined order
- Index relations (IRs)
  - <key, value> pairs
  - external: fast access to heap relations
  - internal: access methods’ structures

User-defined Access Methods (AM) in PostgreSQL
1. Implement a set of functions communicating with PostgreSQL’s core (AM implementation).
2. Register the functions (located in libraries).
3. Define an AM (index type) by connecting the functions with a newly created AM.
4. Establish a class of operators and types for the AM.
5. Use the index.

User-defined Access Methods in PostgreSQL (cont.)
Required functions:
- index_build: creates the structure
- index_insert: inserts a record
- index_beginscan: starts a new scan
- index_gettuple: gets a record fulfilling the search conditions
- index_getmulti: gets a set of records
- index_endscan: finishes a scan
- index_markpos: marks the current position in a scan
- index_restrpos: returns to the marked position
- index_rescan: repeats the scan with the same structure of search keys
- index_bulkdelete: removes a set of records
- index_costestimate: estimates the cost of a search

External Indexing
- AM in PostgreSQL
  - stores data in IRs
  - returns IRs’ TIDs to the core
- External indexing framework
  - stores IRs’ TIDs in external index storage
  - motivation: PostgreSQL’s interface to access methods still requires a high level of mastery of PostgreSQL’s inner mechanisms → framework for external indexing

Framework interface
    void  FW_CreateStructure (Relation index_relation);
    void* FW_PrepareInsert (Relation index_relation);
    void  FW_InsertTuple (void *fw_data, Relation index_relation, IndexTuple index_tuple, BlockNumber block_number, OffsetNumber offset);
    void  FW_FinishInsert (void *fw_data);
    void  FW_InitSearch (IndexScanDesc scan, ScanDirection dir);
    bool  FW_GetNextTID (IndexScanDesc scan, ScanDirection dir, BlockNumber *block_number, OffsetNumber *offset);
    void  FW_DeleteTuple (BlockNumber block_number, OffsetNumber offset);

Experimental evaluation
- The testbed
  - Uniform: clusters of uniformly distributed (up to 15-dimensional) objects
  - Gauss: (up to 3-dimensional) objects following a Gaussian distribution
  - DBLP: 435,373 DBLP database records (author, type of publication, year of publication, number of pages)
- Studied costs
  - index access count
  - real-time

Experiments – Database Growth

Experiments – Query Selectivity

Experiments – Dimension Growth

Experiments – Real-time

Conclusion
- We have proposed and implemented
  - native multidimensional indexing by R-tree
  - an indexing framework for PostgreSQL
  - an implementation of a native external R-tree index
- Results show
  - a big speed-up on the real-world data according to the index-access metric
  - poor physical implementation of access methods in PostgreSQL in comparison to Oracle and MS SQL Server