Indexing Structures. Database System Implementation, CSE 507. Some slides adapted from Silberschatz, Korth and Sudarshan, Database System Concepts, 6th Edition.

Additional material adapted from O'Neil et al., "Improved Query Performance with Variant Indexes," ACM SIGMOD 1997.

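Duplicates in a B+ tree
[Figure: a B+ tree on the Department column of a Sales table. A leaf node holds duplicate key values such as Mens Formals and Mens Jeans, each pointing by RID to Sales rows such as R200 (Hugo Boss 2PC, $600), R190 (Levis, Medium, $50) and R150 (Gap, Large, $45).]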

Bitmaps: a better way to index duplicates
(Material adapted from Silberschatz, Korth and Sudarshan)
• A bitmap is simply an array of bits.
• Records in a relation are numbered sequentially, starting from 0.
• Applicable to attributes that take on a relatively small number of distinct values.
• E.g. gender, country, state, …
• E.g. income level (income broken up into a small number of levels such as 0-9999, 10000-19999, 20000-50000, 50000-infinity).

Bitmap Indices
(Material adapted from Silberschatz, Korth and Sudarshan)
• A bitmap index on an attribute has one bitmap for each value of the attribute.
• Each bitmap has as many bits as there are records.
• In the bitmap for value v, the bit for a record is 1 if the record has the value v for the attribute, and 0 otherwise.

Queries on Bitmap Indices
(Material adapted from Silberschatz, Korth and Sudarshan)
• Queries are answered using bitmap operations:
• Intersection (AND)
• Union (OR)
• Complementation (NOT)

Queries on Bitmaps
(Material adapted from Silberschatz, Korth and Sudarshan)
• AND and OR take two bitmaps of the same size and apply the operation to corresponding bits to get the result bitmap; NOT takes a single bitmap and flips every bit.
• E.g. 100110 AND 110011 = 100010
• 100110 OR 110011 = 110111
• NOT 100110 = 011001
• Males with income level L1: 10010 AND 10100 = 10000
• The required tuples can then be retrieved.
• Counting the number of matching tuples is also fast: it is just the number of 1-bits in the result.
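The bitmap operations above can be sketched in a few lines of Python. This is only an illustration, assuming each bitmap is held in a Python integer and written as a bit string; the helper names `bits`, `show` and `bitmap_not` are made up for this sketch.

```python
def bits(s: str) -> int:
    """Parse a bit string such as '100110' into an integer bitmap."""
    return int(s, 2)

def show(bitmap: int, width: int) -> str:
    """Render a bitmap as a fixed-width bit string."""
    return format(bitmap, f"0{width}b")

def bitmap_not(bitmap: int, width: int) -> int:
    """Complement restricted to `width` bits (Python ints are unbounded)."""
    return ~bitmap & ((1 << width) - 1)

a, b = bits("100110"), bits("110011")
and_result = show(a & b, 6)
or_result = show(a | b, 6)
not_result = show(bitmap_not(a, 6), 6)

# Males with income level L1: intersect the two bitmaps, then count 1-bits.
males, level_l1 = bits("10010"), bits("10100")
males_l1 = males & level_l1
match_count = bin(males_l1).count("1")
```

Counting matches needs only the result bitmap, which is why COUNT queries never have to touch the relation itself.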

(Material adapted from Silberschatz, Korth and Sudarshan)
• Bitmaps need to be updated after insert operations.
• Deletion needs to be handled properly.
• Renumbering rows and shifting bits in every bitmap after a delete is expensive, so deleted slots are left in place.
• An existence bitmap notes whether there is a valid record at each record location.
• It is needed for complementation, since a plain NOT would also turn on the bits of deleted slots.
• not(A=v): (NOT bitmap-A-v) AND ExistenceBitmap
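A small sketch of why the existence bitmap matters for complementation. The sample records and the helper names (`build_bitmap`, `rows_of`) are assumptions for illustration; bit n of each integer stands for record slot n, and `None` marks a deleted slot.

```python
records = ["m", "f", None, "f", "m", None]  # None marks a deleted slot

def build_bitmap(rows, predicate):
    """Set bit n when predicate(rows[n]) holds."""
    bitmap = 0
    for n, value in enumerate(rows):
        if predicate(value):
            bitmap |= 1 << n
    return bitmap

def rows_of(bitmap, width):
    """List the slot numbers whose bit is set."""
    return [n for n in range(width) if (bitmap >> n) & 1]

width = len(records)
all_ones = (1 << width) - 1
existence = build_bitmap(records, lambda v: v is not None)
bitmap_m = build_bitmap(records, lambda v: v == "m")

naive_not_m = ~bitmap_m & all_ones   # wrongly includes deleted slots 2 and 5
not_m = naive_not_m & existence      # (NOT bitmap-A-v) AND ExistenceBitmap
```

Here `rows_of(naive_not_m, width)` contains the deleted slots, while `rows_of(not_m, width)` lists only live records whose value is not "m".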

Some Implementation Details
• Bitmap indices are generally very small compared with the relation size.
• Density of a bitmap: the fraction of its bits that are 1.
• A bitmap is said to be dense if this fraction is large.
• For a column with 32 distinct values, the average density is 1/32.
• Typically a bitmap covers millions of RowIDs.
• In such a case bitmaps can be broken into fragments of equal size.
• Each fragment fits into a single disk page/block.
• If bitmaps are sparse, convert them into a RowID-list representation.
• If a column has many unique values, put a B+ tree on top of the bitmaps for that column.

Some Implementation Details
[Figure: a series of bitmap fragments making up the entry for “department = sports” in a bitmap index on the “department” column of a sales table.]

Some Implementation Details
• Bitmap and RowID-list representations are interchangeable.
• When a bitmap is dense, prefer the bitmap representation.
• Otherwise switch to a RowID-list representation.
• Indeed, a single bitmap index can be part RowID lists and part bitmaps.
• The authors call this hybrid form a Value-List index.

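Projection Indexes
• Reminiscent of vertical partitioning of a table.
• A projection index on a column stores a copy of every value of that column, in row order, so that a value can be looked up by its ordinal row number.
[Figure: a table with columns Col1, Col2, Col3, Col4, next to the projection index for Col2, which holds the Col2 values v1, v2, …, vk in the same order as the rows.]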

Projection Indexes
• If column length = 4 bytes and page size = 4000 bytes, then the index blocking factor = 1000.
• Given a row number r: page p = floor(r/1000); slot s = r mod 1000.
Projection index vs. plain layout:
• If the selectivity of the result set of a join is 1/50, we can expect to pick 1000 × (1/50) = 20 values per page of the projection index (assuming a uniform distribution).
• A plain layout, in contrast, yields only (file blocking factor) × (1/50) values per page.
• File blocking factor < index blocking factor, because records in the file are larger than entries in the index.
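The page and slot arithmetic for a projection index can be written down directly. This is a minimal sketch using the 4-byte column and 4000-byte page from the slide; the function name `locate` is an assumption.

```python
COLUMN_WIDTH = 4      # bytes per column value
PAGE_SIZE = 4000      # bytes per disk page
BLOCKING_FACTOR = PAGE_SIZE // COLUMN_WIDTH   # values per index page

def locate(row_number: int) -> tuple[int, int]:
    """Return (page, slot) of a row's value in the projection index."""
    return divmod(row_number, BLOCKING_FACTOR)

page, slot = locate(123_456)

# With a selectivity of 1/50 and uniformly spread matches,
# the expected number of useful values per index page is:
values_per_index_page = BLOCKING_FACTOR / 50
```

Because the entries are fixed-width and stored in row order, the lookup needs no search at all: one division gives the page, the remainder gives the slot.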

Bit-Sliced Indexes
• A bitmap index on the bit-level representation of the column values.
• Consider a SALES table which contains rows for all sales made in the last month.
• We will build a bit-sliced index on “Rupees_amount”.
• Interpret each amount as a binary number of N+1 bits.
• A function D(n, i) is defined for row number n in the table:
• D(n, 0) = 1 if the 1st (least significant) bit of “Rupees_amount” in row n is on.
• D(n, 1) = 1 if the 2nd bit of “Rupees_amount” in row n is on.
• ……
• D(n, i) = 1 if the (i+1)-th bit, the one with weight 2^i, of “Rupees_amount” in row n is on.

Bit-Sliced Indexes
[Figure: a bit-sliced index for a column Col2 holding the values 20, 52, 20, 62, 10, 34, 1, 49. It shows the bit-slices B_0, B_1, B_2, B_3, B_4, B_5 (one bitmap per bit position) together with B_nn.]
• B_nn: bitmap representing the set of rows with non-null values in the indexed column.
• A separate existence bitmap can be kept in addition to B_nn.

Bit-Sliced Indexes
• Now for each value of i = 0, 1, …, N
• such that D(n, i) > 0 for some row in the SALES table,
• we define a bitmap B_i whose n-th bit contains D(n, i).
• If the least significant bit stands for 1 paisa,
• then just 25 bit-slices can represent amounts up to 2^25 − 1 paise, about Rs 3.35 lakh.
• That is much more than a typical transaction in a departmental store.
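As a sketch of the definition of D(n, i) and B_i, the code below builds the bit-slices for the eight sample column values 20, 52, 20, 62, 10, 34, 1, 49. Bitmaps are Python integers with bit n standing for row n; the function names are assumptions made for this illustration.

```python
def build_bit_slices(values):
    """Return (slices, b_nn); slices[i] has bit n set iff D(n, i) = 1."""
    non_null = [v for v in values if v is not None]
    num_slices = max(max(non_null, default=0).bit_length(), 1)
    slices = [0] * num_slices
    b_nn = 0
    for n, v in enumerate(values):
        if v is None:
            continue                      # null rows appear in no slice
        b_nn |= 1 << n
        for i in range(num_slices):
            if (v >> i) & 1:              # D(n, i) = 1
                slices[i] |= 1 << n
    return slices, b_nn

def value_at(slices, n):
    """Reassemble the column value of row n from the bit-slices."""
    return sum(((b_i >> n) & 1) << i for i, b_i in enumerate(slices))

values = [20, 52, 20, 62, 10, 34, 1, 49]
slices, b_nn = build_bit_slices(values)

# Largest amount representable with 25 slices, in paise and in rupees.
max_paise = 2**25 - 1
max_rupees = max_paise // 100
```

Six slices are enough here because the largest value, 62, fits in six bits; the number of bitmaps grows with the number of bits per value, not with the number of distinct values.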

Question: Compare bit-sliced against a traditional bitmap index for “Rupees_amount”

Using Indexes for Aggregation
• SALES table: 100 million rows; each row 200 bytes.
• File blocking factor = 20; size of page/disk block = 4000 bytes.
Query: Select SUM(Rupees_amount) From SALES Where condition
Assume the following:
• The Where condition returns 2,000,000 rows.
• These rows are uniformly distributed and have already been determined.
• B_f denotes the bitmap of this result set. We can assume it to be in main memory (100 million bits, about 12.5 MB).

Direct Access for Aggregation
• Query Plan 1:
• Direct access to the table to calculate the SUM.
• Each disk page contains 20 rows.
• Total number of pages in the file = 5,000,000.
• The result set is about 1/50 of the rows in the table.
• We loop through B_f and retrieve “Rupees_amount” from the selected rows, which are spread across the 5,000,000 pages.
• Under a uniform distribution we expect about 0.4 selected rows per page, so almost every selected row lies on a different page and we need to read up to about 2,000,000 pages.

Projection Index for Aggregation
• Query Plan 2:
• Use a projection index on “Rupees_amount”; column width = 4 bytes.
• Each disk page contains 4000/4 = 1000 values of “Rupees_amount”.
• Total number of pages in the index file = 100,000.
• The result set is about 1/50 of the rows in the SALES table.
• We loop through B_f and retrieve “Rupees_amount” for the selected rows, which are spread across the 100,000 pages.
• Under a uniform distribution we expect about 20 selected values in each page, so we need to read all 100,000 pages.
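The page counts for Query Plans 1 and 2 can be checked with a rough estimate. The sketch below assumes the selected rows fall on pages independently and uniformly, and uses the common approximation pages × (1 − e^(−rows/pages)) for the expected number of distinct pages touched; that formula is an assumption of this sketch, not something stated on the slides.

```python
import math

ROWS = 100_000_000
SELECTED = 2_000_000

def pages_touched(total_pages: int, selected_rows: int) -> float:
    """Approximate number of distinct pages hit by uniformly spread rows."""
    return total_pages * (1 - math.exp(-selected_rows / total_pages))

table_pages = ROWS // 20      # plain file: 20 rows per page
index_pages = ROWS // 1000    # projection index: 1000 values per page

plan1_pages = pages_touched(table_pages, SELECTED)
plan2_pages = pages_touched(index_pages, SELECTED)
```

Under this estimate Plan 1 touches roughly 1.65 million pages, somewhat below the one-page-per-row bound of 2,000,000, while Plan 2 touches essentially every one of the 100,000 index pages. Either way the projection index reads more than an order of magnitude fewer pages.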

Value-List Index for Aggregation
• Query Plan 3:
• Use a Value-List index (bitmap) on “Rupees_amount”.

IF (COUNT(B_f AND B_nn) == 0)   /* all rows in the result set have NULL for rupees */
    Return NULL;
SUM = 0.0
For each non-null value v in the bitmap index of “Rupees_amount” {
    Designate the set of rows with the value v as B_v
    SUM += v * COUNT(B_f AND B_v);
}
Return SUM
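A minimal Python sketch of Query Plan 3, assuming the Value-List index is a dictionary from each distinct value to an integer bitmap (bit n stands for row n). The sample amounts, the found-set bitmap and the function names are illustrative assumptions.

```python
from collections import defaultdict

def popcount(bitmap: int) -> int:
    """COUNT: number of 1-bits in a bitmap."""
    return bin(bitmap).count("1")

def build_value_list_index(values):
    """Return ({value: bitmap}, b_nn) for a column with optional NULLs."""
    index, b_nn = defaultdict(int), 0
    for n, v in enumerate(values):
        if v is not None:
            index[v] |= 1 << n
            b_nn |= 1 << n
    return dict(index), b_nn

def sum_value_list(index, b_nn, b_f):
    """SUM over the rows in b_f, or None if all of them are NULL."""
    if popcount(b_f & b_nn) == 0:
        return None
    return sum(v * popcount(b_f & b_v) for v, b_v in index.items())

amounts = [20, 52, 20, None, 10, 34, 1, 49]
index, b_nn = build_value_list_index(amounts)
b_f = 0b01001111                     # found set: rows 0, 1, 2, 3 and 6
total = sum_value_list(index, b_nn, b_f)
```

The loop runs once per distinct value, so the work grows with the number of distinct amounts rather than with the number of selected rows.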

Value-List Index for Aggregation (1/3)
Cost of the loop in Query Plan 3:
• Suppose the values in “Rupees_amount” are counted in paise with 20 bits each, and that about 10,000 distinct values occur.
• That gives 10,001 COUNTs and 10,001 ANDs: one per distinct value, plus one for the B_nn check.
• Suppose each B_v is stored as a list of RowIDs of 4 bytes each.
• Under a uniform distribution, each B_v holds 100,000,000 / 10,000 = 10,000 RowIDs; at 1000 RowIDs per page, that is 10 pages.

Value-List Index for Aggregation (2/3)
Cost of the loop in Query Plan 3 when each B_v is a RowID list:
• RowIDs are 4 bytes each.
• Under a uniform distribution, each B_v holds 10,000 RowIDs; at 1000 per page, that is 10 pages.
• Each AND and COUNT against B_f therefore brings in 10 pages.
• Total cost = 10,000 × 10 = 100,000 page reads + leaf scan of the B+ tree over the 10,000 distinct values of “Rupees_amount” + the cost of 1 AND and 1 COUNT for the B_nn check.
• If B_f is also in secondary memory, reading it the first time costs an additional 3125 pages.

Value-List Index for Aggregation (3/3)
Cost of the loop in Query Plan 3 when each B_v is in bitmap form:
• Each B_v is 100 million bits, or 12,500,000 bytes, or 3125 pages.
• Each AND and COUNT against B_f therefore brings in 3125 pages.
• Total cost = 10,000 × 3125 = 31,250,000 page reads + leaf scan of the B+ tree over the 10,000 distinct values of “Rupees_amount” + the cost of 1 AND and 1 COUNT for the B_nn check.
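The two Value-List cost figures follow from simple arithmetic, sketched below with the numbers given on the slides. Only the per-value page reads are counted; the B+ tree leaf scan and the single B_nn check are left out, and the variable names are assumptions.

```python
ROWS = 100_000_000
PAGE_SIZE = 4000          # bytes
DISTINCT_VALUES = 10_000
ROWID_BYTES = 4

# RowID-list representation of each B_v
rows_per_value = ROWS // DISTINCT_VALUES
rowid_pages_per_value = rows_per_value * ROWID_BYTES // PAGE_SIZE
rowid_total_pages = DISTINCT_VALUES * rowid_pages_per_value

# Bitmap representation of each B_v
bitmap_bytes = ROWS // 8
bitmap_pages_per_value = bitmap_bytes // PAGE_SIZE
bitmap_total_pages = DISTINCT_VALUES * bitmap_pages_per_value
```

With a density of 1/10,000 the bitmaps are very sparse, which is why the RowID-list form is cheaper here by a factor of about 300.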

Bit-Sliced Index for Aggregation (1/2)

IF (COUNT(B_f AND B_nn) == 0)   /* all rows in the result set have NULL for rupees */
    Return NULL;
SUM = 0.0
For i = 0 to 19 {
    SUM += 2^i * COUNT(B_f AND B_i);
}
Return SUM;

• Adds up the result bit-slice by bit-slice.
• First counts the number of 1s in the 2^0 slice among the rows of B_f, then multiplies by 2^0.
• Then counts the number of 1s in the 2^1 slice among the rows of B_f, then multiplies by 2^1.
• …

Bit-Sliced Index for Aggregation (2/2)
Cost of the bit-sliced SUM loop over the slices B_0 … B_19:
• 21 ANDs and 21 COUNTs: one per bit-slice, plus one for the B_nn check.
• Assuming B_f is in main memory:
• Each B_i (and B_nn) is 100 million bits, or 12,500,000 bytes, or 3125 pages.
• Total cost = 21 × 3125 = 65,625 pages.
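The bit-sliced SUM can be sketched as follows, again with Python integers as bitmaps (bit n stands for row n). The sample amounts, the found-set bitmap and the helper names are assumptions for illustration; the last two lines reproduce the page-count arithmetic from the slide.

```python
def popcount(bitmap: int) -> int:
    """COUNT: number of 1-bits in a bitmap."""
    return bin(bitmap).count("1")

def build_bit_slices(values, num_slices):
    """Return (slices, b_nn); slices[i] marks rows whose bit i is on."""
    slices, b_nn = [0] * num_slices, 0
    for n, v in enumerate(values):
        if v is None:
            continue
        b_nn |= 1 << n
        for i in range(num_slices):
            if (v >> i) & 1:
                slices[i] |= 1 << n
    return slices, b_nn

def sum_bit_sliced(slices, b_nn, b_f):
    """SUM over the rows in b_f, or None if all of them are NULL."""
    if popcount(b_f & b_nn) == 0:
        return None
    return sum((1 << i) * popcount(b_f & b_i) for i, b_i in enumerate(slices))

amounts = [20, 52, 20, None, 10, 34, 1, 49]
slices, b_nn = build_bit_slices(amounts, num_slices=6)
b_f = 0b01001111                     # found set: rows 0, 1, 2, 3 and 6
total = sum_bit_sliced(slices, b_nn, b_f)

pages_per_bitmap = 100_000_000 // 8 // 4000
total_pages = 21 * pages_per_bitmap  # 20 slices plus B_nn
```

The number of bitmaps read depends only on the bit width of the column, so the cost stays the same however many distinct amounts occur.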

Using Indexes for Range Predicates
• SALES table: 100 million rows; each row 200 bytes.
• File blocking factor = 20; size of page/disk block = 4000 bytes.
Assume the following:
• C is a column in SALES.
• condition is a general condition based on “equality”.
• C-range is a range predicate on C: C > c1, C between c1 and c2, etc.
Query: Select Target-List From SALES Where C-range and condition

Value-List Index for Range Predicates

B_r = the empty set
For each entry v in the index for C that satisfies the range C-range {
    Designate the set of rows with the value v as B_v
    B_r = B_r OR B_v
}
B_F = B_f AND B_r   /* B_f is the result of the condition */
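The Value-List range evaluation above, sketched in Python for a closed range low ≤ C ≤ high. The index is assumed to be a dictionary from value to integer bitmap, and the sample data and names are illustrative only.

```python
def build_value_list_index(values):
    """Return {value: bitmap} with bit n set for each row n holding value."""
    index = {}
    for n, v in enumerate(values):
        if v is not None:
            index[v] = index.get(v, 0) | (1 << n)
    return index

def range_value_list(index, b_f, low, high):
    """Rows of b_f whose column value lies in [low, high]."""
    b_r = 0
    for v, b_v in index.items():
        if low <= v <= high:
            b_r |= b_v           # B_r = B_r OR B_v
    return b_f & b_r             # B_F = B_f AND B_r

def rows_of(bitmap, width):
    return [n for n in range(width) if (bitmap >> n) & 1]

column = [20, 52, 20, 62, 10, 34, 1, 49]
index = build_value_list_index(column)
b_f = 0b01111111                 # condition already selected rows 0..6
result = range_value_list(index, b_f, 20, 50)
```

One OR is needed per distinct value inside the range, so a wide range over a column with many distinct values means many bitmap reads.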

Bit-Sliced Index for Range Predicates

B_GT = B_LT = the empty set; B_EQ = B_nn
For each bit-slice B_i for C, in decreasing significance {
    If bit i is on in the constant c1
        B_LT = B_LT OR (B_EQ AND NOT(B_i))
        B_EQ = B_EQ AND B_i
    else
        B_GT = B_GT OR (B_EQ AND B_i)
        B_EQ = B_EQ AND NOT(B_i)
}
B_EQ = B_EQ AND B_f;  … (similarly for B_GT, B_LT, …)
B_LE = B_LT OR B_EQ;  B_GE = B_GT OR B_EQ

Result bitmap for each comparison of C with c1:
>  : B_GT
<  : B_LT
== : B_EQ
=< : B_LE
>= : B_GE
Not Null : B_nn
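A runnable sketch of the bit-sliced range algorithm, using Python integers as bitmaps (bit n stands for row n). The function and variable names, the sample column and the check that c1 fits in the available slices are assumptions of this sketch.

```python
def build_bit_slices(values, num_slices):
    """Return (slices, b_nn); slices[i] marks rows whose bit i is on."""
    slices, b_nn = [0] * num_slices, 0
    for n, v in enumerate(values):
        if v is None:
            continue
        b_nn |= 1 << n
        for i in range(num_slices):
            if (v >> i) & 1:
                slices[i] |= 1 << n
    return slices, b_nn

def range_bit_sliced(slices, b_nn, b_f, c1):
    """Return the bitmaps for C < c1, C <= c1, C = c1, C >= c1, C > c1."""
    if not 0 <= c1 < (1 << len(slices)):
        raise ValueError("c1 must fit in the available bit-slices")
    b_gt = b_lt = 0
    b_eq = b_nn
    for i in reversed(range(len(slices))):   # decreasing significance
        b_i = slices[i]
        if (c1 >> i) & 1:
            b_lt |= b_eq & ~b_i              # prefix equal so far, bit smaller
            b_eq &= b_i
        else:
            b_gt |= b_eq & b_i               # prefix equal so far, bit larger
            b_eq &= ~b_i
    b_eq, b_gt, b_lt = b_eq & b_f, b_gt & b_f, b_lt & b_f
    return {"<": b_lt, "<=": b_lt | b_eq, "=": b_eq,
            ">=": b_gt | b_eq, ">": b_gt}

def rows_of(bitmap, width):
    return [n for n in range(width) if (bitmap >> n) & 1]

column = [20, 52, 20, 62, 10, 34, 1, 49]
slices, b_nn = build_bit_slices(column, num_slices=6)
result = range_bit_sliced(slices, b_nn, b_f=0b11111111, c1=34)
```

A single pass over the slices yields all five comparison bitmaps at once, and the number of bitmaps read is fixed by the bit width of the column rather than by the width of the range.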
