Physical Storage
  Indexes
  Partitions
  Materialized views
March 2004 91.4904 Ron McFadyen

Indexes
  All DBMSs provide variations of B-trees for indexing.
    B-tree index
    Bitmapped index
    Bitmapped join index
  A data warehousing DBMS will likely provide these, or variations on these.

B-tree structures
  Most used access structure in database systems.
  There are B-trees, B+-trees, B*-trees, etc.
  B-trees and their variations:
    are balanced
    are very good in environments with mixed reading and writing operations, concurrency, exact searches and range searches
    provide excellent performance when used to find a few rows in big tables
    are studied in 91.3902

Bitmapped index
  Consider the following Customer table:

    Cust_id  gender  province  phone
    22       M       Ab        (403) 444-1234
    44       M       Mb        (204) 777-6789
    77       F       Sk        (306) 384-8474
    88       F       Sk        (306) 384-6721
    99       M       Mb        (204) 456-1234

  Note that Mb appears in rows 2, 5.
  Note that Sk appears in rows 3, 4; Ab in row 1.
  Note that M appears in rows 1, 2, 5; F in rows 3, 4.

  A bitmapped index comprises a collection of bit arrays.
  There is one bit array for each distinct value of the indexed attribute.
  Useful for low-cardinality attributes.
  Since gender has two values there are two bit arrays.
  Each bit array has the same number of bits as there are rows in the table.
  If we construct a bitmapped index for gender we would have two bit arrays of 5 bits each:

    M  1 1 0 0 1
    F  0 0 1 1 0

Bitmapped index
  If we construct a bitmapped index for province we would have three bit arrays of 5 bits each:

    Ab  1 0 0 0 0
    Mb  0 1 0 0 1
    Sk  0 0 1 1 0

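The construction of the gender and province bit arrays can be sketched in Python. This is only an illustration under stated assumptions: it uses the five Customer rows from the slides, and the function name build_bitmap_index and the list-of-integers representation are hypothetical choices, not how any particular DBMS stores its bitmaps.

```python
# Customer rows from the slides: (cust_id, gender, province); phone omitted.
CUSTOMER = [
    (22, "M", "Ab"),
    (44, "M", "Mb"),
    (77, "F", "Sk"),
    (88, "F", "Sk"),
    (99, "M", "Mb"),
]


def build_bitmap_index(rows, column):
    """Return {value: bit list}; each list has one bit per row (hypothetical helper)."""
    index = {}
    for position, row in enumerate(rows):
        value = row[column]
        # Create an all-zero array the first time a distinct value is seen.
        bits = index.setdefault(value, [0] * len(rows))
        bits[position] = 1
    return index


gender_index = build_bitmap_index(CUSTOMER, 1)    # two distinct values
province_index = build_bitmap_index(CUSTOMER, 2)  # three distinct values
```

The number of bit arrays equals the number of distinct values, which is why the approach suits low-cardinality attributes.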
Bitmapped index
  Consider a query:

    Select Sum(s.amount)
    From Sales s Inner Join Customer c On ( … )
    Where c.gender = 'M' and c.province = 'Mb'

  The database query processor could choose to AND the arrays for gender = M and province = Mb:

    M       1 1 0 0 1
    Mb      0 1 0 0 1
    Result  0 1 0 0 1

  The result indicates that rows 2 and 5 of Customer satisfy the query; only these two rows need to be joined to Sales.

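The AND step can be sketched in a few lines of Python. The bit arrays are copied from the slides; the helper names bitmap_and and matching_rows are hypothetical, and a real query processor would typically operate on packed machine words rather than Python lists.

```python
# Bit arrays from the slides, one bit per Customer row (rows 1..5).
gender_m = [1, 1, 0, 0, 1]
province_mb = [0, 1, 0, 0, 1]


def bitmap_and(a, b):
    """Bitwise AND of two equal-length bit arrays."""
    return [x & y for x, y in zip(a, b)]


def matching_rows(bits):
    """1-based row numbers whose bit is set."""
    return [i + 1 for i, bit in enumerate(bits) if bit]


result = bitmap_and(gender_m, province_mb)
rows = matching_rows(result)  # the Customer rows that must be joined to Sales
```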
Bitmapped Join Index
  In general, a join index is a structure containing index entries (attribute value, row pointers), where the attribute values are in one table and the row pointers are to related rows in another table.
  Consider: Customer, Sales, Date

Join Index

  Customer
    Cust_id  gender  province  phone
    22       M       Ab        (403) 444-1234
    44       M       Mb        (204) 777-6789
    77       F       Sk        (306) 384-8474
    88       F       Sk        (306) 384-6721
    99       M       Mb        (204) 456-1234

  Sales
    row  Cust_id  Store_id  Date_id  Amount
    1    22       1         90       100
    2    44       2         7        150
    3    22       2         33       50
    4    44       3         55       50
    5    99       3         55       25

Bitmapped Join Index
  In some database systems, a bitmapped join index can be created which indexes Sales by an attribute of Customer: home province. Here is SQL for the index:

    CREATE BITMAP INDEX cust_sales_bji
    ON Sales(Customer.province)
    FROM Sales, Customer
    WHERE Sales.cust_id = Customer.cust_id;

Bitmapped Join Index
  There are three province values in Customer. The join index will have three entries, each with a province value and a bitmap:

    Mb  0 1 0 1 1
    Ab  1 0 1 0 0
    Sk  0 0 0 0 0

  The bitmap index shows that rows 2, 4, 5 of the Sales fact table are rows for customers with province = Mb.
  The bitmap index shows that rows 1, 3 of the Sales fact table are rows for customers with province = Ab.
  The bitmap index shows that no rows of the Sales fact table are rows for customers with province = Sk.

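How the three join-index bitmaps come about can be sketched in Python: each Sales row is looked up in Customer once, at index-build time, and the bit for that customer's province is set at the Sales row position. The data is taken from the Customer and Sales tables in the slides; the function name build_bitmap_join_index and the dictionary layout are hypothetical.

```python
# cust_id -> province, from the Customer table in the slides.
CUSTOMER_PROVINCE = {22: "Ab", 44: "Mb", 77: "Sk", 88: "Sk", 99: "Mb"}

# Sales rows 1..5 from the slides: (cust_id, store_id, date_id, amount).
SALES = [
    (22, 1, 90, 100),
    (44, 2, 7, 150),
    (22, 2, 33, 50),
    (44, 3, 55, 50),
    (99, 3, 55, 25),
]


def build_bitmap_join_index(sales, customer_province):
    """Return {province: bit list with one bit per Sales row} (hypothetical helper)."""
    # One all-zero bitmap per province value found in Customer,
    # so a province with no sales (Sk) still gets an entry.
    index = {p: [0] * len(sales) for p in set(customer_province.values())}
    for position, (cust_id, _store, _date, _amount) in enumerate(sales):
        # The join on cust_id is resolved here, once, when the index is built.
        index[customer_province[cust_id]][position] = 1
    return index


join_index = build_bitmap_join_index(SALES, CUSTOMER_PROVINCE)
```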
Bitmapped Join Index
  The bitmap join index could be used to evaluate the following query. In this query, the Customer table will not even be accessed; the query is executed using only the bitmap join index and the Sales table.

    SELECT SUM(Sales.amount)
    FROM Sales, Customer
    WHERE Sales.cust_id = Customer.cust_id
    AND Customer.province = 'Mb';

  The bitmap index will show that rows 2, 4, 5 of the Sales fact table are rows for customers with province = Mb.

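A minimal Python sketch of that evaluation follows, assuming the join-index bitmaps and Sales amounts shown in the slides. The function name sum_sales_for_province is hypothetical. Note that nothing from Customer appears in the code: the join was already resolved when the index was built.

```python
# Amount column of Sales rows 1..5, from the slides.
SALES_AMOUNT = [100, 150, 50, 50, 25]

# Bitmap join index on Customer.province, one bit per Sales row, from the slides.
JOIN_INDEX = {
    "Mb": [0, 1, 0, 1, 1],
    "Ab": [1, 0, 1, 0, 0],
    "Sk": [0, 0, 0, 0, 0],
}


def sum_sales_for_province(province):
    """Sum Sales amounts for a province using only the join index and Sales."""
    bits = JOIN_INDEX.get(province, [])
    return sum(amount for amount, bit in zip(SALES_AMOUNT, bits) if bit)


total_mb = sum_sales_for_province("Mb")  # reads Sales rows 2, 4, 5 only
```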
Partitions
  Physical storage can be organized in partitions where the value of an attribute is used to determine where a row is placed.
  Often the attribute is a date.
  In warehousing, dates are often used to restrict a query.
  Archiving older data is facilitated since one only has to drop the oldest partition.

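The idea can be sketched in Python with one partition per calendar month. Everything here is an assumption made for illustration: the monthly granularity, the sample rows, and the function names insert_row, total_for_month and drop_oldest_partition are hypothetical and do not correspond to any DBMS's partitioning interface.

```python
from collections import defaultdict
from datetime import date

# (year, month) -> list of (sale_date, amount) rows; one list per partition.
partitions = defaultdict(list)


def insert_row(sale_date, amount):
    """Place the row in the partition chosen by its date value."""
    partitions[(sale_date.year, sale_date.month)].append((sale_date, amount))


def total_for_month(year, month):
    """A query restricted by date reads one partition and skips the rest."""
    return sum(amount for _d, amount in partitions.get((year, month), []))


def drop_oldest_partition():
    """Archiving: remove the whole oldest partition in one step."""
    oldest = min(partitions)
    del partitions[oldest]
    return oldest


# Hypothetical sample rows.
insert_row(date(2004, 1, 15), 100)
insert_row(date(2004, 2, 3), 150)
insert_row(date(2004, 2, 20), 50)
insert_row(date(2004, 3, 1), 25)
```

For example, total_for_month(2004, 2) touches only the February rows, and drop_oldest_partition() discards January without scanning the other months.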
Materialized Views
  An MV is a view where the result set is pre-computed and stored in the database.
  Periodically an MV must be re-computed to account for updates to its source tables.
  MVs represent a performance enhancement to a DBMS as the result set is immediately available when the view is referenced in an SQL statement.
  MVs are used in some cases to provide summaries or aggregates for data warehousing transparently to the end-user.
  Query optimization is a DBMS feature that may re-write an SQL statement supplied by a user.

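The pre-compute and re-compute cycle can be sketched in Python. This is an illustration only: the class name MaterializedView and its refresh method are hypothetical, the first five rows are the Sales amounts from the slides paired with each customer's province, and the appended Sk row is invented to show staleness.

```python
# (province, amount) for Sales rows 1..5, derived from the slides' tables.
sales = [("Ab", 100), ("Mb", 150), ("Ab", 50), ("Mb", 50), ("Mb", 25)]


def total_by_province():
    """The view's defining query: total sales amount per province."""
    totals = {}
    for province, amount in sales:
        totals[province] = totals.get(province, 0) + amount
    return totals


class MaterializedView:
    """Stores a query result so that readers do not re-run the query."""

    def __init__(self, compute):
        self._compute = compute
        self.result = compute()  # pre-computed when the view is created

    def refresh(self):
        """Re-compute the stored result after the source table has changed."""
        self.result = self._compute()


mv = MaterializedView(total_by_province)
before = dict(mv.result)

sales.append(("Sk", 75))  # hypothetical new sale; the stored result is now stale
stale = dict(mv.result)

mv.refresh()
after = dict(mv.result)
```

Reading mv.result costs nothing beyond a lookup, which is the performance benefit; the price is that the result lags the source table until the next refresh.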