Published by Heather Patterson. Modified over 9 years ago.
1
Indexing Structures
Database System Implementation, CSE 507
Some slides adapted from Silberschatz, Korth and Sudarshan, Database System Concepts, 6th Edition, and from O’Neil et al., “Improved Query Performance with Variant Indexes,” ACM SIGMOD 1997.
2
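Duplicates in a B+ tree
[Figure: a Sales table indexed on the Department column. Leaf-node entries with keys such as MensFormals, MensJeans and Mens* are followed by RIDs pointing to Sales records, e.g. R200 (Hugo Boss 2PC, $600), R190 (Levis - Medium, $50) and R150 (Gap - Large, $45).]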
3
Bitmaps: a better way to index duplicates
Material adapted from Silberschatz, Korth and Sudarshan.
A bitmap is simply an array of bits.
Records in a relation are numbered sequentially, starting from 0.
Bitmaps are applicable to attributes that take on a relatively small number of distinct values, e.g. gender, country, state, ..., or income level (income broken up into a small number of levels such as 0-9999, 10000-19999, 20000-49999, 50000-infinity).
4
Bitmap Indices
A bitmap index on an attribute has one bitmap for each value of the attribute.
Each bitmap has as many bits as there are records.
In the bitmap for value v, the bit for a record is 1 if the record has the value v for the attribute, and 0 otherwise.
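A minimal sketch of building a bitmap index, assuming Python integers serve as bitmaps (bit n stands for record n) and using a small hypothetical gender/income-level relation. `build_bitmap_index` and `to_str` are illustrative names, not part of any DBMS API.

```python
from collections import defaultdict

def build_bitmap_index(column):
    """Map each distinct value to a bitmap; bit n is 1 iff record n has that value."""
    index = defaultdict(int)
    for record, value in enumerate(column):
        index[value] |= 1 << record
    return dict(index)

def to_str(bitmap, n_records):
    """Render a bitmap with record 0 on the left, as the slides write them."""
    return "".join("1" if (bitmap >> r) & 1 else "0" for r in range(n_records))

# Hypothetical relation with 5 records.
gender = ["m", "f", "f", "m", "f"]
income_level = ["L1", "L2", "L1", "L4", "L3"]

gender_index = build_bitmap_index(gender)
income_index = build_bitmap_index(income_level)
print(to_str(gender_index["m"], 5))    # 10010
print(to_str(income_index["L1"], 5))   # 10100
```

Each record sets exactly one bit among the bitmaps of a column, so the index holds one bitmap per distinct value.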
5
Queries on Bitmap Indices
Queries are answered using bitmap operations: intersection (AND), union (OR) and complementation (NOT).
6
Queries on Bitmaps
Each binary operation takes two bitmaps of the same size and applies the operation to corresponding bits to get the result bitmap (NOT takes a single bitmap), e.g.
100110 AND 110011 = 100010
100110 OR 110011 = 110111
NOT 100110 = 011001
Males with income level L1: 10010 AND 10100 = 10000.
The required tuples can then be retrieved. Counting the number of matching tuples is also fast: count the 1-bits in the result bitmap.
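The examples above can be checked with a small sketch that again assumes Python integers as bitmaps. Because Python integers are unbounded, NOT has to be masked to the bitmap's width. `parse`, `to_str`, `bm_not` and `count` are hypothetical helper names.

```python
def parse(bits):
    """Slide-style bit string (record 0 on the left) -> int bitmap."""
    return sum(1 << r for r, b in enumerate(bits) if b == "1")

def to_str(bitmap, n):
    return "".join("1" if (bitmap >> r) & 1 else "0" for r in range(n))

def bm_not(bitmap, n):
    """NOT over n bits; mask away the infinite high 1-bits that ~ produces."""
    return ~bitmap & ((1 << n) - 1)

def count(bitmap):
    """Number of matching tuples = number of 1-bits."""
    return bin(bitmap).count("1")

a, b = parse("100110"), parse("110011")
print(to_str(a & b, 6))          # 100010
print(to_str(a | b, 6))          # 110111
print(to_str(bm_not(a, 6), 6))   # 011001

males_l1 = parse("10010") & parse("10100")
print(to_str(males_l1, 5), count(males_l1))   # 10000 1
```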
7
Bitmaps need to be updated after insert operations, and deletion needs to be handled carefully: renumbering the remaining rows and shifting bits in every bitmap would be expensive.
Instead, keep an existence bitmap that notes whether there is a valid record at each record location.
It is needed for complementation: not(A = v) is computed as (NOT bitmap-A-v) AND ExistenceBitmap.
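A sketch of why the existence bitmap is needed, using a hypothetical 6-record relation in which record 2 has been deleted but not renumbered. A plain NOT wrongly reports the deleted slot as a match. `not_equal` is an illustrative name.

```python
def parse(bits):
    return sum(1 << r for r, b in enumerate(bits) if b == "1")

def to_str(bitmap, n):
    return "".join("1" if (bitmap >> r) & 1 else "0" for r in range(n))

def not_equal(bitmap_a_v, existence):
    """not(A = v) = (NOT bitmap-A-v) AND ExistenceBitmap.
    The AND also clears the infinite high 1-bits that Python's ~ produces."""
    return ~bitmap_a_v & existence

# Records 0 and 3 have A = v; record 2 was deleted.
a_eq_v = parse("100100")
existence = parse("110111")

n = 6
plain_not = ~a_eq_v & ((1 << n) - 1)
print(to_str(plain_not, n))                      # 011011 (wrongly includes record 2)
print(to_str(not_equal(a_eq_v, existence), n))   # 010011
```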
8
Some Implementation Details
Bitmap indices are generally very small compared with the size of the relation.
Density of a bitmap: the fraction of its bits that are 1. A bitmap is said to be dense if this fraction is large; for a column with 32 distinct values, the average density is 1/32.
A bitmap typically covers millions of RowIDs. In that case it can be broken into fragments of equal size, each fitting into a single disk page/block.
If bitmaps are sparse, convert them into a RowID-list representation.
If a column has many distinct values, put a B+ tree on top of the bitmaps for that column.
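A sketch of fragmenting a bitmap into page-sized pieces and converting a sparse fragment to a RowID list. It assumes the 4000-byte page used in later slides, so each fragment holds 32,000 bits. `fragments`, `density` and `to_rowid_list` are hypothetical names.

```python
PAGE_BYTES = 4000                     # assumed page size (the value used in later slides)
BITS_PER_FRAGMENT = PAGE_BYTES * 8    # 32,000 bits fit in one page

def density(bitmap, n_rows):
    """Fraction of 1-bits in the bitmap."""
    return bin(bitmap).count("1") / n_rows

def fragments(bitmap, n_rows, size=BITS_PER_FRAGMENT):
    """Split a bitmap into (first_row, fragment) pieces of `size` bits each."""
    mask = (1 << size) - 1
    return [(start, (bitmap >> start) & mask) for start in range(0, n_rows, size)]

def to_rowid_list(fragment, first_row=0):
    """Sparse representation: sorted RowIDs of the 1-bits."""
    rowids = []
    while fragment:
        lowest = fragment & -fragment                  # isolate the lowest 1-bit
        rowids.append(first_row + lowest.bit_length() - 1)
        fragment ^= lowest                             # clear it
    return rowids

n_rows = 100_000
bitmap = (1 << 5) | (1 << 40_000) | (1 << 99_999)
pieces = fragments(bitmap, n_rows)
print(len(pieces))                                        # 4
print([to_rowid_list(f, start) for start, f in pieces])   # [[5], [40000], [], [99999]]
print(density(bitmap, n_rows))                            # 3e-05
```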
9
Some Implementation Details
[Figure: a series of bitmap fragments making up the entry for “department = sports” in a bitmap index on the “department” column of a sales table.]
10
Some Implementation Details
Bitmap and RowID-list representations are interchangeable.
When bitmaps are dense, prefer the bitmap representation; otherwise switch to a RowID-list representation.
Indeed, a single bitmap index can be partly RowID lists and partly bitmaps. O’Neil et al. call this hybrid form a Value-List index.
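A sketch of a hybrid value-list index that picks, per value, whichever representation is smaller. The break-even rule is an assumption: with 4-byte (32-bit) RowIDs, a RowID list is smaller than an n-bit bitmap when density is below 1/32. The slides do not give a threshold. `build_value_list_index` and `as_bitmap` are hypothetical names.

```python
ROWID_BITS = 32   # assumption: 4-byte RowIDs, as in the later cost slides

def rows_to_bitmap(rows):
    bitmap = 0
    for r in rows:
        bitmap |= 1 << r
    return bitmap

def build_value_list_index(column):
    """Per value keep the smaller form: a RowID list costs 32 bits per matching row,
    a bitmap costs one bit per row of the table."""
    n = len(column)
    rows_by_value = {}
    for row, value in enumerate(column):
        rows_by_value.setdefault(value, []).append(row)
    index = {}
    for value, rows in rows_by_value.items():
        if len(rows) * ROWID_BITS < n:      # sparse: density below 1/32
            index[value] = ("rowids", rows)
        else:                               # dense
            index[value] = ("bitmap", rows_to_bitmap(rows))
    return index

def as_bitmap(entry):
    """The two representations are interchangeable; convert either to a bitmap."""
    kind, payload = entry
    return payload if kind == "bitmap" else rows_to_bitmap(payload)

column = ["common"] * 98 + ["rare"] * 2     # hypothetical 100-row column
index = build_value_list_index(column)
print({v: kind for v, (kind, _) in index.items()})   # {'common': 'bitmap', 'rare': 'rowids'}
```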
11
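Projection Indexes
Reminiscent of vertical partitioning of a table.
A projection index on a column duplicates all the column's values, stored in row order, so that a value can be looked up by its row (ordinal) number.
[Figure: a table with columns Col1, Col2, Col3 and Col4; the Col2 values v1, v2, ..., vk are copied into a separate projection index for Col2.]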
12
Projection Indexes
If the column length is 4 bytes and the page size is 4000 bytes, then the index blocking factor is 4000 / 4 = 1000.
Given a row number r, its value is on page p = r / 1000 (integer division), in slot s = r % 1000.
Projection index vs. plain layout: if the selectivity of the result set (e.g. of a join) is 1/50, then, assuming a uniform distribution, we can expect to pick 1000 × (1/50) = 20 values per page of the projection index.
A plain layout, by contrast, can pick only (file blocking factor) × (1/50) values per page, and the file blocking factor is smaller than the index blocking factor because records in the file are larger than entries in the index.
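A sketch of a projection index stored as fixed-size pages of 4-byte integers, showing the page/slot arithmetic for lookup by row number. The column data, the little-endian `"<i"` encoding and the helper names are hypothetical choices for illustration.

```python
import struct

PAGE_BYTES, COLUMN_BYTES = 4000, 4
BLOCKING = PAGE_BYTES // COLUMN_BYTES   # index blocking factor: 1000 values per page

def build_projection_index(values):
    """Copy column values, in row order, into fixed-size pages of 4-byte ints."""
    pages = []
    for r, v in enumerate(values):
        page, slot = divmod(r, BLOCKING)
        if page == len(pages):
            pages.append(bytearray(PAGE_BYTES))
        struct.pack_into("<i", pages[page], slot * COLUMN_BYTES, v)
    return pages

def lookup(pages, r):
    """Fetch the value for row number r: page p = r // 1000, slot s = r % 1000."""
    page, slot = divmod(r, BLOCKING)
    return struct.unpack_from("<i", pages[page], slot * COLUMN_BYTES)[0]

col2 = [(r * 37) % 1000 for r in range(2500)]   # hypothetical Col2 values
pages = build_projection_index(col2)
print(len(pages))             # 3
print(lookup(pages, 1234))    # 658
print(BLOCKING // 50)         # 20 selected values per index page at selectivity 1/50
```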
13
Bit-Sliced Indexes
A bit-sliced index is a bitmap index on the bit-level representation of the column values.
Consider a SALES table that contains rows for all sales made in the last month. We will build a bit-sliced index on “Rupees_amount”, interpreting each amount in terms of N+1 bits.
A function D(n, i) is defined for a row number n in the table:
D(n, 0) = 1 if the 1st (least significant) bit of “Rupees_amount” in row n is on,
D(n, 1) = 1 if the 2nd bit of “Rupees_amount” in row n is on,
...
D(n, i) = 1 if the (i+1)-th bit, which has weight 2^i, of “Rupees_amount” in row n is on;
and D(n, i) = 0 otherwise.
14
Bit-Sliced Indexes
Example: a bit-sliced index on Col2, whose values in rows 0 to 7 are 20, 52, 20, 62, 10, 34, 1, 49. Reading each slice left to right, its n-th bit belongs to row n:
B5 = 01010101
B4 = 11110001
B3 = 00011000
B2 = 11110000
B1 = 00011100
B0 = 00000011
B_nn: bitmap representing the set of rows with non-null values in the indexed column. We can keep a separate Existence Bitmap in addition to B_nn.
15
Bit-Sliced Indexes
Now, for each i = 0, 1, ..., N such that D(n, i) > 0 for some row n in the SALES table, we define a bitmap B_i whose n-th bit is D(n, i).
If each unit is 1 paisa, then 25 bit-slices (B_0 to B_24) can represent amounts up to 2^25 − 1 = 33,554,431 paisa, i.e. Rs 3,35,544.31 (about Rs 3.35 lakh). That is much more than a typical transaction in a department store.
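A sketch that builds the bit-slices B_i from D(n, i) and reproduces the Col2 example. It assumes non-negative integer values, with `None` standing for NULL. `build_bit_sliced_index` is a hypothetical name.

```python
def build_bit_sliced_index(column, n_slices):
    """Return (slices, b_nn): bit n of slices[i] is D(n, i), i.e. bit i of row n's value.
    None stands for NULL; such rows are 0 in every slice and in B_nn."""
    slices = [0] * n_slices
    b_nn = 0
    for n, value in enumerate(column):
        if value is None:
            continue
        if value >= 1 << n_slices:
            raise ValueError(f"value {value} needs more than {n_slices} bits")
        b_nn |= 1 << n
        for i in range(n_slices):
            if (value >> i) & 1:        # D(n, i) = 1
                slices[i] |= 1 << n
    return slices, b_nn

def to_str(bitmap, n):
    return "".join("1" if (bitmap >> r) & 1 else "0" for r in range(n))

col2 = [20, 52, 20, 62, 10, 34, 1, 49]
slices, b_nn = build_bit_sliced_index(col2, 6)
for i in reversed(range(6)):
    print(f"B{i} = {to_str(slices[i], len(col2))}")
# B5 = 01010101, B4 = 11110001, B3 = 00011000,
# B2 = 11110000, B1 = 00011100, B0 = 00000011
```

Each row's value is recovered as the sum of 2^i over the slices whose bit n is set. The aggregation and range algorithms later in the deck rely on this property.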
16
Question: Compare a bit-sliced index with a traditional bitmap index on “Rupees_amount”.
17
Using Indexes for Aggregation
SALES table: 100 million rows; each row is 200 bytes; page/disk block size = 4000 bytes, so the file blocking factor is 20.
Query: SELECT SUM(Rupees_amount) FROM SALES WHERE condition
Assume the following:
The WHERE condition returns 2,000,000 rows, which are uniformly distributed and have already been determined.
B_f denotes a bitmap of this result set. We can assume it is in main memory (100 million bits, about 12.5 MB).
18
Direct Access for Aggregation
Query Plan 1: direct access to the table to calculate the SUM.
Each disk page contains 20 rows, so the file has 100,000,000 / 20 = 5,000,000 pages.
The result set is about 1/50 of the rows in the table. We loop through B_f and retrieve “Rupees_amount” from the selected rows, which are spread across the 5,000,000 pages.
Under a uniform distribution we expect about 2,000,000 / 5,000,000 = 0.4 selected records per page, so most selected rows land on different pages. We need to read on the order of 2,000,000 pages: at most one per selected row, and about 5,000,000 × (1 − 0.98^20) ≈ 1.66 million distinct pages in expectation.
19
Projection Index for Aggregation
Query Plan 2: use a projection index on “Rupees_amount”; column width = 4 bytes.
Each disk page contains 4000 / 4 = 1000 values of “Rupees_amount”, so the index has 100,000,000 / 1000 = 100,000 pages.
The result set is about 1/50 of the rows in the SALES table. We loop through B_f and retrieve “Rupees_amount” for the selected rows, which are spread across the 100,000 pages.
Under a uniform distribution we expect 20 selected values on each page, so essentially every page contains at least one of them: we need to read all 100,000 pages.
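The page counts for Plans 1 and 2 can be reproduced with a short calculation. Following the slides' uniformity assumption, the sketch treats each row as selected independently with probability 1/50, so the expected number of pages with at least one selected row is pages × (1 − (1 − 1/50)^rows_per_page). `expected_pages_read` is a hypothetical name.

```python
ROWS = 100_000_000
PAGE_BYTES = 4000
ROW_BYTES, COLUMN_BYTES = 200, 4
SELECTED = 2_000_000
selectivity = SELECTED / ROWS                  # 1/50

def expected_pages_read(n_pages, per_page, sel):
    """Expected number of pages holding at least one selected row,
    assuming each row is selected independently with probability `sel`."""
    return n_pages * (1 - (1 - sel) ** per_page)

# Plan 1: the table itself.
file_bf = PAGE_BYTES // ROW_BYTES              # 20 rows per page
file_pages = ROWS // file_bf                   # 5,000,000 pages
# Plan 2: projection index on Rupees_amount.
index_bf = PAGE_BYTES // COLUMN_BYTES          # 1000 values per page
index_pages = ROWS // index_bf                 # 100,000 pages

print(file_bf * selectivity, index_bf * selectivity)                   # about 0.4 and 20 per page
print(round(expected_pages_read(file_pages, file_bf, selectivity)))    # about 1.66 million
print(round(expected_pages_read(index_pages, index_bf, selectivity)))  # 100000
```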
20
Value-List Index for Aggregation
Query Plan 3: use a Value-List (bitmap) index on “Rupees_amount”.
IF (COUNT(B_f AND B_nn) == 0)   /* all rows in the result set have NULL for Rupees_amount */
    RETURN NULL;
SUM = 0.0;
FOR EACH non-null value v in the index of “Rupees_amount” {
    designate the set of rows with the value v as B_v;
    SUM += v * COUNT(B_f AND B_v);
}
RETURN SUM;
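A runnable sketch of the Plan 3 loop, assuming the value-list index is a Python dict `{value: B_v}` holding bitmaps as integers. It uses a hypothetical column with one NULL. The sum is kept as an integer number of paisa rather than starting from 0.0, which avoids floating-point rounding. `value_list_sum` is a hypothetical name.

```python
def popcount(bitmap):
    return bin(bitmap).count("1")

def value_list_sum(index, b_f, b_nn):
    """SUM over rows in B_f using a value-list index {value: B_v} (non-null values only).
    Returns None (SQL NULL) when every selected row is NULL."""
    if popcount(b_f & b_nn) == 0:
        return None
    total = 0
    for v, b_v in index.items():
        total += v * popcount(b_f & b_v)      # v times the selected rows holding v
    return total

column = [20, 52, 20, None, 10]               # hypothetical amounts in paisa; None = NULL
index, b_nn = {}, 0
for n, v in enumerate(column):
    if v is not None:
        index[v] = index.get(v, 0) | (1 << n)
        b_nn |= 1 << n

b_f = (1 << 0) | (1 << 1) | (1 << 3)          # rows 0, 1 and 3 satisfy the WHERE condition
print(value_list_sum(index, b_f, b_nn))       # 72
print(value_list_sum(index, 1 << 3, b_nn))    # None
```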
21
Value-List Index for Aggregation (1/3)
(Same algorithm as Query Plan 3.)
Values in “Rupees_amount” are counted in paisa with 20 bits each; assume about 10,000 distinct values actually occur. The algorithm then performs 10,001 ANDs and 10,001 COUNTs (one per value, plus the initial test on B_nn).
If each B_v is stored as a list of 4-byte RowIDs, then under a uniform distribution each B_v holds 100,000,000 / 10,000 = 10,000 RowIDs, i.e. 1000 per page over 10 pages.
22
Value-List Index for Aggregation (2/3)
With each B_v stored as 4-byte RowIDs (10,000 RowIDs, i.e. 10 pages per value), each iteration of the loop ANDs and COUNTs B_f against B_v, which brings in 10 pages.
Total cost = 10,000 × 10 = 100,000 page reads + a leaf scan of the B+ tree over the 10,000 distinct values of “Rupees_amount” + the cost of 1 AND and 1 COUNT (for B_nn).
If B_f is also on secondary storage, reading it the first time costs an additional 3125 pages (100 million bits = 12,500,000 bytes).
23
Value-List Index for Aggregation (3/3)
If each B_v is stored in bitmap form, it has 100 million bits, i.e. 12,500,000 bytes or 3125 pages.
Each iteration of the loop ANDs and COUNTs B_f against B_v, which brings in 3125 pages.
Total cost = 10,000 × 3125 = 31,250,000 page reads + a leaf scan of the B+ tree over the 10,000 distinct values of “Rupees_amount” + the cost of 1 AND and 1 COUNT (for B_nn).
24
Bit-Sliced Index for Aggregation (1/2)
IF (COUNT(B_f AND B_nn) == 0)   /* all rows in the result set have NULL for Rupees_amount */
    RETURN NULL;
SUM = 0.0;
FOR I = 0 TO 19 {
    SUM += 2^I * COUNT(B_f AND B_I);
}
RETURN SUM;
This adds bit-slice by bit-slice: it first counts the number of 1s in the 2^0 slice (restricted to B_f) and multiplies by 2^0, then counts the number of 1s in the 2^1 slice and multiplies by 2^1, and so on.
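A runnable sketch of the bit-slice-by-bit-slice SUM, using integer bitmaps and the same hypothetical column as the value-list sketch. NULL rows have 0 in every slice, so they contribute nothing to the sum. `bit_sliced_sum` is a hypothetical name.

```python
def popcount(bitmap):
    return bin(bitmap).count("1")

def bit_sliced_sum(slices, b_f, b_nn):
    """SUM of the column over the rows in B_f, one bit-slice at a time.
    Returns None (SQL NULL) when every selected row is NULL."""
    if popcount(b_f & b_nn) == 0:
        return None
    total = 0
    for i, b_i in enumerate(slices):
        total += (1 << i) * popcount(b_f & b_i)   # 2^i times the 1s of slice i within B_f
    return total

# Hypothetical column in paisa (None = NULL) and its 6 bit-slices.
column = [20, 52, 20, None, 10]
slices, b_nn = [0] * 6, 0
for n, v in enumerate(column):
    if v is not None:
        b_nn |= 1 << n
        for i in range(6):
            if (v >> i) & 1:
                slices[i] |= 1 << n

b_f = (1 << 0) | (1 << 1) | (1 << 3)            # rows 0, 1 and 3 satisfy the WHERE condition
print(bit_sliced_sum(slices, b_f, b_nn))        # 72
print(bit_sliced_sum(slices, 1 << 3, b_nn))     # None
```

The loop runs once per slice rather than once per distinct value, which is where the bit-sliced plan gets its advantage.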
25
Bit-Sliced Index for Aggregation (2/2)
(Same algorithm as above; the loop body is SUM += 2^I * COUNT(B_f AND B_I).)
This takes 21 ANDs and 21 COUNTs (20 bit-slices plus B_nn).
Assuming B_f is in main memory: each bit-slice is 100 million bits, i.e. 12,500,000 bytes or 3125 pages.
Total cost = 21 × 3125 = 65,625 page reads.
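The page-read totals of the value-list and bit-sliced plans can be recomputed from the table size. As on the slides, the value-list figures leave out the B+ tree leaf scan and the extra B_nn step. `DISTINCT_VALUES` is the slides' assumed figure of 10,000.

```python
ROWS = 100_000_000
PAGE_BYTES = 4000
ROWID_BYTES = 4
DISTINCT_VALUES = 10_000   # assumed number of distinct Rupees_amount values
SLICES = 20

bitmap_pages = ROWS // 8 // PAGE_BYTES                                # 3125 pages per full bitmap
rowid_pages = (ROWS // DISTINCT_VALUES) * ROWID_BYTES // PAGE_BYTES   # 10 pages per B_v

costs = {
    "value-list, RowID lists": DISTINCT_VALUES * rowid_pages,    # 100,000
    "value-list, bitmaps": DISTINCT_VALUES * bitmap_pages,       # 31,250,000
    "bit-sliced": (SLICES + 1) * bitmap_pages,                   # 65,625 (20 slices + B_nn)
}
for plan, pages in costs.items():
    print(f"{plan:>24}: {pages:,} pages")
```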
26
Using Indexes for Range Predicates
SALES table: 100 million rows; each row is 200 bytes; page/disk block size = 4000 bytes, so the file blocking factor is 20.
Assume the following:
C is a column in SALES.
<condition> is a general condition based on equality.
C-range is a range predicate: C > c1, C BETWEEN c1 AND c2, etc.
Query: SELECT target-list FROM SALES WHERE C-range AND <condition>
27
Value-List Index for Range Predicates
B_r = empty set;
FOR EACH entry v in the index for C that satisfies the range predicate on C {
    designate the set of rows with the value v as B_v;
    B_r = B_r OR B_v;
}
B_F = B_f AND B_r;   /* B_f is the result of <condition> */
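A sketch of the value-list range evaluation for `C BETWEEN c1 AND c2`. It assumes a sorted list of distinct values as a stand-in for the B+ tree leaf level, so `bisect` finds the qualifying run of entries. The column values, `B_f` and the helper names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def build_sorted_index(column):
    """Sorted distinct values (stand-in for the B+ tree leaves) and their B_v bitmaps."""
    bitmaps = {}
    for row, v in enumerate(column):
        if v is not None:
            bitmaps[v] = bitmaps.get(v, 0) | (1 << row)
    values = sorted(bitmaps)
    return values, [bitmaps[v] for v in values]

def value_list_between(values, bitmaps, c1, c2, b_f):
    """B_F = B_f AND (OR of B_v for every v with c1 <= v <= c2)."""
    b_r = 0
    for k in range(bisect_left(values, c1), bisect_right(values, c2)):
        b_r |= bitmaps[k]
    return b_f & b_r

def rows(bitmap):
    return [r for r in range(bitmap.bit_length()) if (bitmap >> r) & 1]

column = [20, 52, 20, 62, 10, 34, 1, 49]       # hypothetical values of C
values, bitmaps = build_sorted_index(column)
b_f = (1 << 7) - 1                             # <condition> holds for rows 0..6
print(rows(value_list_between(values, bitmaps, 20, 50, b_f)))   # [0, 2, 5]
```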
28
Bit-Sliced Index for Range Predicates
B_GT = B_LT = empty set; B_EQ = B_nn;
FOR EACH bit-slice B_i for C, in decreasing significance {
    IF bit i is on in the constant c1 {
        B_LT = B_LT OR (B_EQ AND NOT(B_i));
        B_EQ = B_EQ AND B_i;
    } ELSE {
        B_GT = B_GT OR (B_EQ AND B_i);
        B_EQ = B_EQ AND NOT(B_i);
    }
}
B_EQ = B_EQ AND B_f; (similarly for B_GT and B_LT)
B_LE = B_LT OR B_EQ; B_GE = B_GT OR B_EQ;
Result bitmap for each predicate:
C > c1: B_GT
C < c1: B_LT
C = c1: B_EQ
C <= c1: B_LE
C >= c1: B_GE
C is not null: B_nn
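A runnable sketch of the bit-sliced comparison, assuming non-negative values and a constant c1 that fits in the available slices. Rows stay in B_EQ while their high-order bits match c1. At the first differing bit, a row moves to B_GT or B_LT. The Col2 values and `bsi_compare` are hypothetical.

```python
def bsi_compare(slices, b_nn, b_f, c1):
    """Compare column C with constant c1 using only bit-slices (slices[i] = B_i)."""
    if not 0 <= c1 < 1 << len(slices):
        raise ValueError("c1 must fit in the available bit-slices")
    b_gt = b_lt = 0
    b_eq = b_nn
    for i in reversed(range(len(slices))):        # decreasing significance
        b_i = slices[i]
        if (c1 >> i) & 1:
            b_lt |= b_eq & ~b_i                   # row has 0 where c1 has 1
            b_eq &= b_i
        else:
            b_gt |= b_eq & b_i                    # row has 1 where c1 has 0
            b_eq &= ~b_i
    b_eq &= b_f
    b_gt &= b_f
    b_lt &= b_f
    return {">": b_gt, "<": b_lt, "=": b_eq, "<=": b_lt | b_eq, ">=": b_gt | b_eq}

def rows(bitmap):
    return [r for r in range(bitmap.bit_length()) if (bitmap >> r) & 1]

column = [20, 52, 20, 62, 10, 34, 1, 49]          # the Col2 example, 6 bit-slices
slices, b_nn = [0] * 6, 0
for n, v in enumerate(column):
    b_nn |= 1 << n
    for i in range(6):
        if (v >> i) & 1:
            slices[i] |= 1 << n

b_f = b_nn                                         # assume <condition> holds for every row
result = bsi_compare(slices, b_nn, b_f, 34)
print(rows(result[">"]), rows(result["<"]), rows(result["="]))   # [1, 3, 7] [0, 2, 4, 6] [5]
```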