Presentation is loading. Please wait.

Presentation is loading. Please wait.

Native Multidimensional Indexing in Relational Databases

Similar presentations


Presentation on theme: "Native Multidimensional Indexing in Relational Databases"— Presentation transcript:

1 Native Multidimensional Indexing in Relational Databases
David Hoksza, Tomáš Skopal Charles University in Prague Department of Software Engineering Czech Republic

2 Presentation Outline Multidimensional querying Indexing in PostgreSQL
attribute number growth indexing methods contemporary DB systems Indexing in PostgreSQL user-defined access methods external indexing framework Experiments COMAD 2008

3 Indexing Single-attribute based indexing
to avoid sequential scan B-tree Multi-attribute based indexing window query straightforward solution – multiple B-trees SELECT * FROM Products WHERE BrandID BETWEEN (3 AND 11) SELECT * FROM Products WHERE BrandId BETWEEN (13 AND 14) AND ProductTypeID BETWEEN (13 AND 24) PeriodID BETWEEN (3 AND 11) COMAD 2008

4 Multi-attribute Based Indexing
Multiple B-trees attribute number growth → exponential growth of partial result-sets → sequential scan dimensionality curse COMAD 2008

5 Multi-dimensional access methods
B+-tree with Compound Keys most often employed solution multiple keys – single chained value key components compared in lexicograhpical order assymetry in the order of the keys R-tree UB-tree transformation of n-dim points into 1-dim Z-address → Z-curve Z-curves divided into Z-regions indexed by a B+-tree COMAD 2008

6 Native Multi-dimensional Indexing
SELECT * FROM Products WHERE BrandId BETWEEN (13 AND 14) AND ProductTypeID BETWEEN (13 AND 24) PeriodID BETWEEN (3 AND 11) Table rows points in n-dimensional space R-tree Queries cubes in n-dimensional space (n-dimensional windows) COMAD 2008

7 Contemporary DB systems
COMAD 2008

8 PostgreSQL Object-relational DBMS Open-source
Since 2005 (v. 8.0) runs on Windows Emphasis on extensibility data types operators procedural languages access methods COMAD 2008

9 Relation Types in PostgreSQL
Heap relations (HRs) user relations system catalog undefined order Index relations (IRs) <key,value> pairs external fast access to heap relations internal access methods’ structures COMAD 2008

10 User-defined Access Methods (AM) in PostgreSQL
Implement a set of functions communicating with PostgreSQL’s core (AM implementation). Register the functions (located in libraries). Define an AM (index type) by connecting the functions with a newly created AM. Establish a class of operators and types for the AM. Use the index. COMAD 2008

11 User-defined Access Methods in PostgreSQL – cont.
required functions index_build creating a structure index_insert inserts a record index_beginscan starts a new scan index_gettuple gets a record fulfilling search conditions index_getmulti gets a set of records index_endscan finishes a search index_markpos marks actual position in a scan index_restrpos returns to a marked position index_rescan repeats scan with the same structure of search keys index_bulkdelete removes a set of records index_costestimate estimates cost of a search COMAD 2008

12 External Indexing AM in PostgreSQL store data in IRs returning IRs’ TIDs to the core External indexing framework Storing IRs’ TIDs in external index storage PostgreSQL’s interface to access methods still requires high level of mastering of PostgreSQL’s inner mechanisms Framework for external indexing COMAD 2008

13 Framework interface void FW_CreateStructure (Relation index_relation);
void* FW_PrepareInsert (Relation index_relation); void FW_InsertTuple (void *fw_data, Relation index_relation, IndexTuple index_tuple, BlockNumber block_number, OffsetNumber offset); void FW_FinishInsert (void *fw_data); void FW_InitSearch (IndexScanDesc scan, ScanDirection dir); bool FW_GetNextTID (IndexScanDesc scan, ScanDirection dir, BlockNumber *block_number, OffsetNumber *offset); void FW_DeleteTuple (BlockNumber block number, OffsetNumber offset); COMAD 2008

14 Experimental evaluation
The testbed Uniform clusters of uniformly distributed (up to 15-dimensional) objects Gauss (up to 3-dimensional) objects following Gaussian distribution DBLP 435,373 DBLP database records author, type of publication, year of publication, number of pages Studied costs index access count real-time COMAD 2008

15 Experiments – Database Growth
COMAD 2008

16 Experiments – Query Selectivity
COMAD 2008

17 Experiments – Dimension Growth
COMAD 2008

18 Experiments – Real-time
COMAD 2008

19 Conclusion We have proposed and implemented Results show
Native multidimensional indexing by R-tree Indexing framework for PostgreSQL Implementation of native external R-tree index Results show big speed-up on the real-world data according to index-access metric poor physical implementation of access methods in PostgreSQL in comparison to Oracle and MSSQL Server COMAD 2008


Download ppt "Native Multidimensional Indexing in Relational Databases"

Similar presentations


Ads by Google