Indexing Structures Database System Implementation CSE 507 Some slides adapted from Silberschatz, Korth and Sudarshan Database System Concepts – 6 th Edition.

Slides:



Advertisements
Similar presentations
Tuning: overview Rewrite SQL (Leccotech)Leccotech Create Index Redefine Main memory structures (SGA in Oracle) Change the Block Size Materialized Views,
Advertisements

CpSc 3220 File and Database Processing Lecture 17 Indexed Files.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part C Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
Indexing (Cont.) These slides are a modified version of the slides of the book “Database System Concepts” (Chapter 12), 5th Ed., McGraw-Hill,McGraw-Hill.
Chapter 11 Indexing and Hashing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 11: Indexing.
Index Basic Concepts Indexing mechanisms used to speed up access to desired data. E.g., author catalog in library Search Key - attribute to set of attributes.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Indexing and.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Indexing and.
SPRING 2004CENG 3521 Query Evaluation Chapters 12, 14.
B-trees - Hashing. 11.2Database System Concepts Review: B-trees and B+-trees Multilevel, disk-aware, balanced index methods primary or secondary dense.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Data Indexing Herbert A. Evans. Purposes of Data Indexing What is Data Indexing? Why is it important?
José Alferes Versão modificada de Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan Chapter 12: Indexing and Hashing.
Quick Review of Apr 15 material Overflow –definition, why it happens –solutions: chaining, double hashing Hash file performance –loading factor –search.
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
Bitmap Indexes.
Evaluation of Relational Operations. Relational Operations v We will consider how to implement: – Selection ( ) Selects a subset of rows from relation.
Lecture 6 Indexing Part 2 Column Stores. Indexes Recap Heap FileBitmapHash FileB+Tree InsertO(1) O( log B n ) DeleteO(P)O(1) O( log B n ) Range Scan O(P)--
Indexing and Hashing.
CS 345: Topics in Data Warehousing Thursday, October 28, 2004.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts B + -Tree Index Files Indexing mechanisms used to speed up access to desired data.  E.g.,
Computing & Information Sciences Kansas State University Friday, 24 Oct 2008CIS 560: Database System Concepts Lecture 23 of 42 Friday, 24 October 2008.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Indexing and.
Oracle Data Block Oracle Concepts Manual. Oracle Rows Oracle Concepts Manual.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Relational Operator Evaluation. Overview Index Nested Loops Join If there is an index on the join column of one relation (say S), can make it the inner.
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
Computing & Information Sciences Kansas State University Tuesday, 03 Apr 2007CIS 560: Database System Concepts Lecture 29 of 42 Tuesday, 03 April 2007.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Lecture 1- Query Processing Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
Variant Indexes. Specialized Indexes? Data warehouses are large databases with data integrated from many independent sources. Queries are often complex.
Computing & Information Sciences Kansas State University Monday, 03 Nov 2008CIS 560: Database System Concepts Lecture 27 of 42 Monday, 03 November 2008.
Indexing Database Management Systems. Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files File Organization 2.
Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data can be stored.
Lecture 3 - Query Processing (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
Improved Query Performance With Variant Indexes Patrick O’Neil, Dallan Quass Presented by Bo Han.
Query Processing – Implementing Set Operations and Joins Chap. 19.
Implementation of Database Systems, Jarek Gryz1 Evaluation of Relational Operations Chapter 12, Part A.
Query Execution Query compiler Execution engine Index/record mgr. Buffer manager Storage manager storage User/ Application Query update Query execution.
Alon Levy 1 Relational Operations v We will consider how to implement: – Selection ( ) Selects a subset of rows from relation. – Projection ( ) Deletes.
Query Processing and Query Optimization Database System Implementation CSE 507 Some slides adapted from Silberschatz, Korth and Sudarshan Database System.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 11: Indexing.
Computing & Information Sciences Kansas State University Wednesday, 02 Apr 2008CIS 560: Database System Concepts Lecture 27 of 42 Wednesday, 02 April 2008.
CS4432: Database Systems II
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Database System Concepts ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Indexing and Hashing.
Data Indexing Herbert A. Evans.
Database Management System
Indexing ? Why ? Need to locate the actual records on disk without having to read the entire table into memory.
Database System Implementation CSE 507
Database System Implementation CSE 507
COMP 430 Intro. to Database Systems
Chapter 11: Indexing and Hashing
Chapter 12: Query Processing
Evaluation of Relational Operations
Hashing Chapter 11.
File organization and Indexing
Chapter 11: Indexing and Hashing
Relational Operations
Physical Database Design
Lecture 15: Bitmap Indexes
Lecture 2- Query Processing (continued)
Database Design and Programming
INDEXING.
Evaluation of Relational Operations: Other Techniques
Sorting We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each.
Chapter 11: Indexing and Hashing
Index Structures Chapter 13 of GUW September 16, 2019
Presentation transcript:

Indexing Structures Database System Implementation CSE 507 Some slides adapted from Silberschatz, Korth and Sudarshan Database System Concepts – 6 th Edition. And O’Neil et. Al., “Improved Query Performance with Variant Indexes,” ACM SIGMOD 1997.

Duplicates in a B+ tree MensFormalsMensJeans Mens* R200Hugo Boss 2PC $600 R190Levis- Medium$50 R150Gap - Large$45 ……… Leaf Node RID Sales Table indexed on the Department column

Bitmaps– A better way to index duplicates Material adapted from Silberchatz, Korth and Sudarshan  A bitmap is simply an array of bits  Records in a relation are numbered sequentially from 0 to n.  Applicable on attributes that take on a relatively small number of distinct values  E.g. gender, country, state, …  E.g. income-level (income broken up into a small number of levels such as , , , infinity)

Bitmap Indices Material adapted from Silberchatz, Korth and Sudarshan  A bitmap index on an attribute has a bitmap for each value of the attribute  Bitmap has as many bits as records  In a bitmap for value v, the bit for a record is 1 if the record has the value v for the attribute, and is 0 otherwise.

Queries on Bitmap Indices Material adapted from Silberchatz, Korth and Sudarshan  Queries are answered using bitmap operations  Intersection (and)  Union (or)  Complementation (not)

Material adapted from Silberchatz, Korth and Sudarshan  Each operation takes two bitmaps of the same size and applies the operation on corresponding bits to get the result bitmap  E.g AND = OR = NOT =  Males with income level L1: AND =  Can then retrieve required tuples.  Counting number of matching tuples is also fast. Queries on Bitmaps

Material adapted from Silberchatz, Korth and Sudarshan  Bitmaps need to be updated after insert operations.  Deletion needs to be handled properly.  Renumbering rows and shifting bits in bitmaps becomes expensive.  Existence bitmap to note if there is a valid record at a record location  Needed for complementation  not(A=v): (NOT bitmap-A-v) AND ExistenceBitmap

Some Implementation Details  Bitmap indices generally very small compared with relation size.  Density of a bitmap index:  Said to be dense if number of 1-bits are large.  For a column with 32 values  avg density = 1/32  Typically we would have millions of RowIDs in a bitmap.  In such a case bitmaps can be broken into fragments of equal size.  Each fragment fits into a single disk page/block.  If bitmaps are sparse, then convert into a RowID list representation.  If a column has many unique values then put a B+ tree on top of bitmaps for a column.

Some Implementation Details A series of bitmap fragments making up the entry for “department = sports” for a bitmap index on the “department” column of a sales table.

Some Implementation Details  Bitmap and RowID-list representations are interchangeable.  When Bitmaps are dense, then prefer bitmap representation  Else switch to a RowID representation.  Indeed a Bitmap index can part RowID lists and part Bitmaps.  Authors call this hybrid form as Value-List Index.

Projection Indexes  Reminiscent of vertical partitioning of a table.  A projection index for column duplicates all column values for lookup by ordinal number. Col1Col2 v1 v2. v k Col3Col4 Col2 v1 v2. v k projection index for col2

Projection Indexes Col2 v1 v2. v k projection index for col2  If Column length = 4Bytes; Page size = 4000 Bytes then index blocking factor = 1000  Given a row number r, page p = r/1000 ; slot s = r%1000. Projection index Vs Plain layout: If the selectivity of result set of a join = 1/50 We can expect to pick 1000*(1/50) = 20 values per page of the projection index (Assuming uniform dist) Alternatively, a plain layout can pick only *(1/50) File blocking factor < index blocking factor (records in a file are of larger size than records in index)

Bit-Sliced Indexes  A bitmap index on the “bit-level representation” of the column values.  Consider a SALES table which contains rows for all sales made in last month.  We will build a bit sliced index on the “Rupees_amount”  Interpret each amount in term of N+1 bits.  A function D(n,i) is defined for a row number n in the table:  D(n, 0) = 1 if the 1 st (LSB) bit for “Rupees_amount” in row number n is on.  D(n, 1) = 1 if the 2 nd bit for “Rupees_amount” in row number n is on. ……  D(n, i) = 1 if the i th bit for “Rupees_amount” in row number n is on.

Bit-Sliced Indexes B0 B0 B1B1 bit-slice B nn : bitmap representing set of rows non null values in the indexed column; We can have a different Existence Bitmap in addition to B nn Col2: B2B2 B3B3 B4 B4 B5 B5

Bit-Sliced Indexes  Now for each value of i = 0, 1,…. N.  Such that D(n, i) > 0 for some row in the SALES table.  We define Bitmap B i whose n th bit would contain D(n, i)  If we consider each bit to be 1 Paisa;  Then for just N = 25, we can represent up to Rs 3.35 Lakhs  Much more than a typical transaction in a departmental stores.

Question: Compare bit-sliced against a traditional bitmap index for “Rupees_amount”

Using Indexes for Aggregation  SALES table: 100 Million rows; Each row 200 bytes;  File blocking factor = 20; Size of Page/disk block = 4000 Bytes Query: Select SUM(Rupee_amount) From SALES Where condition Assume the following:  The Where condition returns 2,000,000 rows.  These are uniformly distributed and have already been determined.  B f denotes a bitmap of this result set. We can assume it to be in Main Mem (about 12 Mb in size)

Direct Access for Aggregation  Query Plan 1:  Direct access to the table to calculate the SUM  Each disk page contains 20 rows  Total number of pages in the file = 5,000,000  Result set is about 1/50 of the rows in the table.  We will loop through the B f and retrieve “Rupees_amount” from all the rows spread across 5,000,000 pages.  Under uniform distribution, we can expect to get about 0.4 records per page, in other words need to read about 2,000,000 pages.

Projection Index for Aggregation  Query Plan 2:  Use a Projection on “Rupees_Amount”; Column width = 4Bytes.  Each disk page contains 4000/4 = 1000 values of the “Rupees_Amount”  Total number of pages in the index file = 100,000  Result set is about 1/50 of the rows in the SALES table.  We will loop through the B f and retrieve “Rupees_amount” from all the rows spread across 100,000 pages.  Under uniform distribution, we can expect to get 20 records in each page. Need to read all the 100,000 pages.

Value-List Index for Aggregation  Query Plan 3:  Use a Value-List index (Bitmap) on “Rupees_Amount”; IF (COUNT(B f AND B nn ) == 0) /*All rows in the result set have NULL for rupees*/ Return NULL; SUM = 0.0 For each non-null value v in the bitmap index of “Rupees_Amount”{ Designate the set of rows with the value v as B v SUM += v* COUNT(B f AND B v ); } Return SUM

Value-List Index for Aggregation (1/3) IF (COUNT(B f AND B nn ) == 0) Return NULL; SUM = 0.0 For each non-null value v in the index of “Rupees_Amount”{ Designate the set of rows with the value v as B v SUM += v* COUNT(B f AND B v ); }  Values in “Rupees_Amount” are counted in Paisa with 20 bits each,  we can have about 10,000 distinct values  10,001 COUNTs and 10,001 ANDs  If B v is in RowIDs of 4 bytes each.  Under uniform dist  each B v 10,000 RowIDs; or 1000 per page over 10pages;

Value-List Index for Aggregation (2/3) IF (COUNT(B f AND B nn ) == 0) Return NULL; SUM = 0.0 For each non-null value v in the index of “Rupees_Amount”{ Designate the set of rows with the value v as B v SUM += v* COUNT(B f AND B v ); }  If B v is in RowIDs of 4 bytes each.  Under uniform dist  each B v 10,000 RowIDs; or 1000 per page over 10pages;  Loop over Bf for AND and COUNT; would bring in 10 pages  Total cost = 10,000 * 10 + leaf scan of B+ over 10,000 distinct values of “Rupees_Amount” + cost of 1 AND and 1 COUNT  If Bf is also in secondary memory then additional 3125 pages for first time

Value-List Index for Aggregation (3/3) IF (COUNT(B f AND B nn ) == 0) Return NULL; SUM = 0.0 For each non-null value v in the index of “Rupees_Amount”{ Designate the set of rows with the value v as B v SUM += v* COUNT(B f AND B v ); }  If B v is in Bitmap form.  Each B v is 100 Million bits; or 12,500,000 bytes; 3125 pages.  Loop over Bf for AND and COUNT; would bring in 3125 pages  Total cost = 10,000 * leaf scan of B+ over 10,000 distinct values of “Rupees_Amount” + cost of 1 AND and 1 Count

Bit-Sliced Index for Aggregation (1/2) IF (COUNT(B f AND B nn ) == 0) /*All rows in the result set have NULL for rupees*/ Return NULL; SUM = 0.0 For I = 0 to 19 { SUM += 2^I * COUNT(B f AND B i ); } Return SUM;  Adds bit-­ ‐ slice by bit-­ ‐ slice.  First counts the number of 1s in the 2^0 slice then multiplies by 2^0  Then, counts the number of 1s in the 2^1 slice then multiplies by 2^1  …..

Bit-Sliced Index for Aggregation (2/2) IF (COUNT(B f AND B nn ) == 0) Return NULL; SUM = 0.0 For I = 0 to 19 { SUM += 2^I * COUNT(B f AND B v ); } Return SUM;  21 ANDs and 21 COUNTS  Assuming B f in main memory:  B v is 100 Million Bits; or 12,500,000 bytes; or 3125 pages  Total cost = 21 * 3125 pages.

Using Indexes for Range Predicates  SALES table: 100 Million rows; Each row 200 bytes;  File blocking factor = 20; Size of Page/disk block = 4000 Bytes Assume the following:  C is a column in SALES  is a general condition based on “equality”  C-range is a range-predicate C> c1, C between c1 and c2,…, etc. Query: Select Target-List From SALES Where C-Range and

Value-List Index for Range Predicates B r = Empty Set For each entry v in the index for C that satisfies the range C { Designate the set of rows with the value v as B v B r = B r OR B v } B F = B f AND B r /* B f is the result of the */

Bit-Sliced Index for Range Predicates B GT = B LT = the empty set; B EQ = B NN For each Bit-Slice B i for C in decreasing significance{ If bit i is on in the constant c1 B LT = B LT OR (B EQ AND NOT(B i )) B EQ = B EQ AND B i else B GT = B GT OR (B EQ AND B i ) B EQ = B EQ AND (NOT B i ) } B EQ = B EQ AND B f ; …. (similarly for B GT, B LT, …) B LE = B LT OR B EQ ; B GE = B GT OR B EQ >  B GT <  B LT ==  B EQ =<  B LE >=  B GE Not Null  B NN