BTrees & Bitmap Indexes 14.2, 14.7 DATABASE SYSTEMS – The Complete Book Presented By: Under the supervision of: Deepti Kundu Dr. T.Y.Lin Maciej Kicinski
Structure A balanced tree, meaning that all paths from the root to a leaf have the same length. There is a parameter n associated with each B-tree block: each block has space for n search keys and n+1 pointers. The root may hold as few as one key, but all other blocks must be at least half full.
Structure
● A typical node has n search keys and n+1 pointers.
● A typical interior node has pointers to nodes at the next level down; it holds keys but no pointers to records.
● A typical leaf has pointers to records.
Application A B-tree can serve as an index in several ways:
● The search key of the B-tree is the primary key for the data file.
● The data file is sorted by its primary key.
● The data file is sorted by an attribute that is not a key, and this attribute is the search key for the B-tree.
Lookup At an interior node, choose the pointer to follow by comparing the node's keys with the search value.
Lookup At a leaf node, find the key that matches the search value; the pointer paired with that key leads to the data.
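The two lookup steps can be sketched in a few lines of Python. This is a minimal illustration, not a full B-tree: the `Node` class, its field names, and the hand-built three-leaf tree are assumptions made for the example, and the leaf-to-leaf sibling pointer is left out.

```python
from bisect import bisect_right

class Node:
    """One B-tree block (hypothetical layout): sorted keys plus pointers."""
    def __init__(self, keys, pointers, is_leaf):
        self.keys = keys          # sorted search keys
        self.pointers = pointers  # child nodes (interior) or records (leaf)
        self.is_leaf = is_leaf

def lookup(node, key):
    """Return the record stored under `key`, or None if the key is absent."""
    while not node.is_leaf:
        # Pointer i covers search values v with keys[i-1] <= v < keys[i].
        node = node.pointers[bisect_right(node.keys, key)]
    # At a leaf: find the matching key and follow its paired pointer.
    for k, record in zip(node.keys, node.pointers):
        if k == key:
            return record
    return None

# A small hand-built tree: one interior root over three leaves.
leaf1 = Node([2, 3, 5], ["r2", "r3", "r5"], True)
leaf2 = Node([7, 11], ["r7", "r11"], True)
leaf3 = Node([13, 17, 19], ["r13", "r17", "r19"], True)
root = Node([7, 13], [leaf1, leaf2, leaf3], False)

print(lookup(root, 11))  # record found in the middle leaf
print(lookup(root, 4))   # None: no such key
```

Each iteration of the `while` loop corresponds to reading one block, so the number of reads equals the number of levels.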
Insertion Use lookup to find the leaf where the new key and its pointer to the data belong. If the leaf is full, create a new node and split the keys between the two. Then move up recursively: the parent needs a pointer to the new node, and if the parent is itself full, it is split in the same way. If the splits reach a full root, a new root is created above it.
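The split-and-propagate behaviour can be illustrated with a short Python sketch. Several things here are assumptions for the example rather than anything fixed by the slides: the capacity `N = 3`, the `Node` layout, the choice to split an overfull node down the middle, and the omission of duplicate keys and sibling pointers.

```python
from bisect import bisect_right

N = 3  # hypothetical block capacity: at most N keys per node

class Node:
    def __init__(self, keys, ptrs, leaf):
        self.keys, self.ptrs, self.leaf = keys, ptrs, leaf

def _insert(node, key, rec):
    """Insert into a subtree. Return (separator, new_right_node) if the
    node had to split, otherwise None."""
    i = bisect_right(node.keys, key)
    if node.leaf:
        node.keys.insert(i, key)
        node.ptrs.insert(i, rec)
        if len(node.keys) <= N:
            return None
        mid = len(node.keys) // 2
        right = Node(node.keys[mid:], node.ptrs[mid:], True)
        node.keys, node.ptrs = node.keys[:mid], node.ptrs[:mid]
        return right.keys[0], right          # leaf split: separator is copied up
    split = _insert(node.ptrs[i], key, rec)
    if split is None:
        return None
    sep, child = split
    node.keys.insert(i, sep)                 # new pointer goes into the parent
    node.ptrs.insert(i + 1, child)
    if len(node.keys) <= N:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]                      # interior split: separator moves up
    right = Node(node.keys[mid + 1:], node.ptrs[mid + 1:], False)
    node.keys, node.ptrs = node.keys[:mid], node.ptrs[:mid + 1]
    return up, right

def insert(root, key, rec):
    """Insert and return the (possibly new) root."""
    split = _insert(root, key, rec)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right], False)  # the old root was full: grow a level

def keys_in_order(node):
    if node.leaf:
        return list(node.keys)
    return [k for child in node.ptrs for k in keys_in_order(child)]

def height(node):
    return 1 if node.leaf else 1 + height(node.ptrs[0])

root = Node([], [], True)
for k in [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]:
    root = insert(root, k, f"rec{k}")
print(height(root), keys_in_order(root))
```

Because the tree only ever grows by adding a new root, every leaf stays at the same depth.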
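Deletion Perform a lookup to find the key to delete, and delete it. If the node is then less than half full, either join it with an adjacent node and recursively delete the corresponding key from the parent, or, if the adjacent node is too full to join, move a key over from it and recursively update the keys in the ancestors.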
Efficiency B-trees allow lookup, insertion, and deletion of records using very few disk I/Os. Each level of a B-tree requires one read: the pointer found in one block leads to the block read at the next level, until the final read at the leaf.
Efficiency Three levels are sufficient for most B-trees. With 255 pointers per block, a three-level tree can reach 255^3, or about 16.6 million, records. Disk I/Os can be reduced further by keeping the upper levels of the B-tree in main memory. Keeping the root block with its 255 pointers in memory reduces the reads to 2, and it is even possible to keep the 255 second-level blocks in memory as well, reducing the reads to 1.
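The arithmetic can be checked directly. The sketch below uses the 255-pointer figure from the slide; the function names are made up for illustration, and the read count follows the slide's simple model of one read per level that is not held in memory.

```python
POINTERS_PER_BLOCK = 255  # figure used on the slide

def records_reachable(levels, fanout=POINTERS_PER_BLOCK):
    """Number of records a full tree with `levels` levels can point to."""
    return fanout ** levels

def disk_reads(levels, levels_in_memory):
    """Reads per lookup when the top `levels_in_memory` levels are cached."""
    return max(levels - levels_in_memory, 0)

print(records_reachable(3))   # 16581375, about 16.6 million
print(disk_reads(3, 0))       # 3 reads with nothing cached
print(disk_reads(3, 1))       # 2 reads with the root in memory
print(disk_reads(3, 2))       # 1 read with the root and second level in memory
```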
Bitmap Indexes Definition A bitmap index for a field F is a collection of bit-vectors of length n, where n is the number of records, one for each possible value that may appear in the field F.[1]
What does that mean? Assume relation R with 2 attributes, A and B. Attribute A is of type Integer and B is of type String. R has 6 records, numbered 1 through 6, as shown.
Record  A   B
1       30  foo
2       30  bar
3       40  baz
4       50  foo
5       40  bar
6       30  baz
Example Continued… A bitmap index for attribute B is:
Value  Vector
foo    100100
bar    010010
baz    001001
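The bit-vectors above can be produced mechanically. The following Python sketch is one simple way to do it, with bit-vectors held as strings for readability; the function name `build_bitmap_index` is an assumption for the example, and a real system would use packed bits instead.

```python
def build_bitmap_index(values):
    """Map each distinct value to a bit-vector string of length len(values).
    Position i (0-based here) holds '1' if record i+1 has that value."""
    n = len(values)
    index = {}
    for i, v in enumerate(values):
        bits = index.setdefault(v, ["0"] * n)  # new value: start with all 0's
        bits[i] = "1"
    return {v: "".join(bits) for v, bits in index.items()}

# Column B of the six records, in record-number order.
b_column = ["foo", "bar", "baz", "foo", "bar", "baz"]
index = build_bitmap_index(b_column)
for value, vector in index.items():
    print(value, vector)
```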
Where do we reach? A bitmap index is a special kind of database index that uses bitmaps.[2] Bitmap indexes have traditionally been considered to work well for data such as gender, which has a small number of distinct values, e.g., male and female, but many occurrences of those values.[2]
A little more… A bitmap index for attribute A of relation R is a collection of bit-vectors:
● The number of bit-vectors = the number of distinct values of A in R.
● The length of each bit-vector = the cardinality of R.
● The bit-vector for value v has 1 in position i if the ith record has v in attribute A, and 0 there if not.[3]
● Records are allocated permanent numbers.[3]
● There is a mapping between record numbers and record addresses.[3]
Motivation for Bitmap Indexes
● Very efficient when used for partial-match queries.[3]
● They offer the advantage of buckets [2]: we find tuples with several specified attributes without first retrieving all the records that match in each of the attributes.
● They can also help answer range queries.[3]
Another Example Multidimensional array of multiple types: {(5,d),(79,t),(4,d),(79,d),(5,t),(6,a)}
5 = 100010
79 = 010100
4 = 001000
6 = 000001
d = 101100
t = 010010
a = 000001
Example Continued… {(5,d),(79,t),(4,d),(79,d),(5,t),(6,a)} Searching for items is easy: just AND the bit-vectors together. To search for (5,d):
5 = 100010
d = 101100
100010 AND 101100 = 100000
The location of the record has been traced: only the first record matches.
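The AND step can be reproduced with a short Python sketch. The helper names (`bitmap`, `bit_and`, `matching_records`) are invented for this example, and bit-vectors are kept as strings so they can be compared with the slide by eye.

```python
def bitmap(values, target):
    """Bit-vector string with '1' wherever the column equals `target`."""
    return "".join("1" if v == target else "0" for v in values)

def bit_and(x, y):
    """Bitwise AND of two equal-length bit-vector strings."""
    return "".join("1" if a == "1" and b == "1" else "0" for a, b in zip(x, y))

def matching_records(vector):
    """1-based record numbers whose bit is set."""
    return [i + 1 for i, bit in enumerate(vector) if bit == "1"]

records = [(5, "d"), (79, "t"), (4, "d"), (79, "d"), (5, "t"), (6, "a")]
first = bitmap([r[0] for r in records], 5)     # vector for value 5
second = bitmap([r[1] for r in records], "d")  # vector for value d
hit = bit_and(first, second)
print(first, "AND", second, "=", hit)
print(matching_records(hit))
```

Only the record numbers that survive the AND need to be fetched from the data file.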
Compressed Bitmaps Assume:
● The number of records in R is n.
● Attribute A has m distinct values in R.
The size of a bitmap index on attribute A is m*n bits. If m is large, then the fraction of bits that are 1 will be around 1/m, so there is an opportunity to encode the bit-vectors compactly. A common encoding approach is called run-length encoding.[1]
Run-length encoding Represents runs: a run is a sequence of i 0's followed by a 1, and it is represented by some suitable binary encoding of the integer i. A run of i 0's followed by a 1 is encoded by:
● First computing how many bits are needed to represent i in binary; call this number k.
● Then representing the run by k-1 1's and a single 0, followed by the k bits that represent i in binary.
The encoding for i = 1 is 01 (k = 1). The encoding for i = 0 is 00 (k = 1). We concatenate the codes for each run, and the resulting sequence of bits is the encoding of the entire bit-vector.
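The encoding rule can be written out as a small Python sketch. The function names are assumptions for the example, and bit-vectors are again plain strings. Note that trailing 0's after the last 1 produce no run and are therefore not encoded; the sketch assumes the vector length is known from elsewhere.

```python
def encode_run(i):
    """Encode a run of i 0's followed by a 1."""
    bits = format(i, "b")            # i in binary ("0" when i = 0)
    k = len(bits)                    # number of bits needed for i
    return "1" * (k - 1) + "0" + bits

def encode_vector(vector):
    """Concatenate the codes of all runs in a bit-vector string."""
    out, run = [], 0
    for bit in vector:
        if bit == "0":
            run += 1
        else:                        # a 1 closes the current run
            out.append(encode_run(run))
            run = 0
    return "".join(out)              # trailing 0's are dropped

print(encode_run(0))             # 00
print(encode_run(1))             # 01
print(encode_run(13))            # 11101101
print(encode_vector("000101"))   # runs 3 and 1: 1011 followed by 01
```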
Understanding with an Example Let us decode the sequence 11101101001011. Starting at the beginning (leftmost bit):
● First run: the first 0 is at position 4, so k = 4. The next 4 bits are 1101, so the first integer is i = 13.
● Second run: the remaining bits are 001011. The first bit is 0, so k = 1; the next bit is 0, so i = 0.
● Last run: the remaining bits are 1011. The first 0 is at position 2, so k = 2; the next 2 bits are 11, so i = 3.
The run lengths are thus 13, 0, 3, hence our bit-vector is: 0000000000000110001
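The same decoding procedure can be sketched in Python. The function names are made up for illustration, and the code assumes the input is a well-formed concatenation of run codes; it does no error handling for truncated input.

```python
def decode_runs(code):
    """Split an encoded bit string into its list of run lengths."""
    runs, pos = [], 0
    while pos < len(code):
        k = 1
        while code[pos] == "1":      # count the leading 1's: k-1 of them
            k += 1
            pos += 1
        pos += 1                     # skip the single 0 that ends the prefix
        runs.append(int(code[pos:pos + k], 2))  # next k bits are i in binary
        pos += k
    return runs

def runs_to_vector(runs):
    """Rebuild the bit-vector: each run is i 0's followed by a 1."""
    return "".join("0" * i + "1" for i in runs)

runs = decode_runs("11101101001011")
print(runs)
print(runs_to_vector(runs))
```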
Managing Bitmap Indexes
1) How do you find a specific bit-vector for a value efficiently?
2) After selecting results that match, how do you retrieve the results efficiently?
3) When data is changed, how do you alter the bitmap index?
1) Finding bit vectors Think of each bit-vector as a record whose key is the corresponding attribute value.[1] Any secondary index technique will be efficient in retrieving the bit-vector for a value.[1] Create a secondary index with the attribute value as the search key [3], for example a B-tree or a hash table.
2) Finding Records Create a secondary index with the record number as the search key.[3] In other words, once you learn that you need record k, you can find it through a secondary index that uses the record number k as its search key.[1]
3) Handling Modifications Two things to remember:
● Record numbers must remain fixed once assigned.
● Changes to the data file require changes to the bitmap index.
Deletion A tombstone replaces the deleted record, so its record number is not reused. The bit at that record's position, in the bit-vector for its value, is set to 0.
Insertion The record is assigned the next record number. A bit is appended to each bit-vector: 1 for the vector of the new record's value, 0 for all others. If the new record contains a new value of the attribute, add one bit-vector.
Modification
● Change the bit corresponding to the old value of the modified record to 0.
● Change the bit corresponding to the new value of the modified record to 1.
● If the new value is a new value of A, then insert a new bit-vector.
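The three maintenance operations can be put together in a small in-memory Python sketch. The class `BitmapIndex`, its method names, and the use of `None` as the tombstone are all assumptions for illustration; an actual index would live on disk and would likely keep compressed vectors.

```python
class BitmapIndex:
    """Toy bitmap index on one attribute; record numbers are 1-based."""

    def __init__(self):
        self.vectors = {}  # attribute value -> list of 0/1 bits
        self.values = []   # value of each record; None marks a tombstone

    def insert(self, value):
        """Append a record; return its (permanent) record number."""
        if value not in self.vectors:              # new value: new bit-vector
            self.vectors[value] = [0] * len(self.values)
        for v, bits in self.vectors.items():       # one bit appended to each
            bits.append(1 if v == value else 0)
        self.values.append(value)
        return len(self.values)

    def delete(self, recno):
        """Tombstone the record and clear its bit."""
        old = self.values[recno - 1]
        if old is not None:
            self.vectors[old][recno - 1] = 0
            self.values[recno - 1] = None

    def modify(self, recno, new_value):
        """Move the record's 1 bit from the old value's vector to the new."""
        old = self.values[recno - 1]
        if old is None:
            raise KeyError(f"record {recno} was deleted")
        self.vectors[old][recno - 1] = 0
        if new_value not in self.vectors:          # new value: new bit-vector
            self.vectors[new_value] = [0] * len(self.values)
        self.vectors[new_value][recno - 1] = 1
        self.values[recno - 1] = new_value

    def vector(self, value):
        return "".join(map(str, self.vectors[value]))

idx = BitmapIndex()
for v in ["foo", "bar", "baz", "foo", "bar", "baz"]:
    idx.insert(v)
idx.delete(4)            # record 4 (foo) becomes a tombstone
idx.insert("qux")        # record 7 introduces a new value
idx.modify(2, "foo")     # record 2 changes from bar to foo
for v in ["foo", "bar", "baz", "qux"]:
    print(v, idx.vector(v))
```

Because deleted positions are kept as tombstones, every bit-vector keeps the same length and the record numbers never shift.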