15.6 Index-Based Algorithms
Huili Tang, 2016-11-22
Contents
1. Clustering and non-clustering indexes
2. Index-based selection
3. Joining using an index
4. Joining using a sorted index
A database index is a data structure that improves the speed of data retrieval operations on a database table, at the cost of slower writes and increased storage space.
Clustered vs. Unclustered Index
[Figure: in both the clustered and the unclustered case, index entries direct the search for data entries in the index file, and the data entries point to data records in the data file.]
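use db1;
CREATE CLUSTERED INDEX IX__shipments_QTY ON dbo.shipments(QTY);

-- Alternative: cluster on SNUM instead. A table can have only one clustered index,
-- so IX__shipments_QTY would have to be dropped before this statement can succeed.
use db1;
CREATE CLUSTERED INDEX IX__shipments_SNUM ON dbo.shipments(SNUM);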
Clustered Index Architecture
Adding a clustered index to a table physically reorders the data pages, putting them in order of the indexed column. A table can have only one clustered index. In a clustered index, all tuples with the same value of the key are clustered on as few blocks as possible.
Non-Clustered Index Architecture
The order of a non-clustered index does not correspond to the physical order of the data, and the data rows are not automatically sorted. A non-clustered index holds the indexed columns and a pointer or bookmark to the actual row. A table can have multiple non-clustered indexes.
Index-based algorithms are especially useful for the selection operator. In a clustered relation, the tuples are packed into roughly as few blocks as can possibly hold those tuples.
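To make the effect of clustering concrete, the following minimal sketch counts how many blocks hold tuples with a given key under two layouts. The toy relation, the block size of 4 tuples, and the function name blocks_touched are all assumptions made for illustration.

```python
def blocks_touched(layout, value, tuples_per_block):
    """Count the distinct blocks that hold at least one tuple with the given key."""
    return len({pos // tuples_per_block
                for pos, key in enumerate(layout)
                if key == value})

# Hypothetical toy relation: 12 tuples with keys 'a', 'b', 'c', 4 tuples per block.
clustered = sorted("abcabcabcabc")   # a a a a | b b b b | c c c c
unclustered = list("abcabcabcabc")   # keys interleaved across all blocks

clustered_blocks = blocks_touched(clustered, "a", 4)
unclustered_blocks = blocks_touched(unclustered, "a", 4)
print(clustered_blocks, unclustered_blocks)  # prints "1 3"
```

With the clustered layout all four 'a' tuples sit in one block, while the interleaved layout spreads them over three blocks, so fetching them costs more I/O's.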
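Index-based Selection
For a selection σC(R), suppose C is of the form a = v, where a is an attribute of R with an index. Here B(R) is the number of blocks of R, T(R) the number of tuples, and V(R,a) the number of distinct values of a.
For a clustering index on R.a: the number of disk I/O's is about B(R)/V(R,a).
For a nonclustering index on R.a: the number of disk I/O's is about T(R)/V(R,a).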
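Index-based Selection
The actual number may be higher:
1. The index may not be kept entirely in main memory, so reading index blocks costs additional I/O's.
2. The tuples with a = v may be spread over more blocks than the estimate, for example because they do not start at a block boundary.
3. The blocks of R may not be packed as tightly as possible.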
Example
B(R) = 1000, T(R) = 20,000. Number of I/O's required:
Table scan algorithm:
1. R clustered, no index: 1000
2. R not clustered, no index: 20,000
Index-based algorithm:
3. V(R,a) = 100, index is clustering: 1000/100 = 10
4. V(R,a) = 10, index is nonclustering: 20,000/10 = 2,000
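The four estimates in this example can be reproduced with a short sketch. The function names table_scan_cost and index_selection_cost are hypothetical, and rounding up to whole blocks is an assumption. The cost of reading the index itself is ignored, as in the example.

```python
import math

def table_scan_cost(B, T, relation_clustered):
    """Table scan: one I/O per block if R is clustered, else one per tuple."""
    return B if relation_clustered else T

def index_selection_cost(B, T, V, index_clustering):
    """Index lookup for a = v: about B/V blocks with a clustering index,
    about T/V tuple fetches with a nonclustering index (rounded up)."""
    return math.ceil((B if index_clustering else T) / V)

B_R, T_R = 1000, 20_000  # values from the example
costs = [
    table_scan_cost(B_R, T_R, relation_clustered=True),
    table_scan_cost(B_R, T_R, relation_clustered=False),
    index_selection_cost(B_R, T_R, V=100, index_clustering=True),
    index_selection_cost(B_R, T_R, V=10, index_clustering=False),
]
print(costs)  # prints [1000, 20000, 10, 2000]
```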
Joining by using an index
Natural join of R(X,Y) and S(Y,Z), with an index on Y for S.
Number of I/O's to read R:
Clustered: B(R)
Not clustered: T(R)
Number of I/O's to fetch the matching tuples of S for every tuple t of R:
Clustering index: T(R)B(S)/V(S,Y)
Nonclustering index: T(R)T(S)/V(S,Y)
Example
R(X,Y) occupies 1000 blocks and S(Y,Z) occupies 500 blocks. Assume 10 tuples in each block, so T(R) = 10,000 and T(S) = 5000. Let V(S,Y) = 100.
If R is clustered and there is a clustering index on Y for S:
the number of I/O's for R is 1000
the number of I/O's for S is 10,000 * 500/100 = 50,000
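The join estimate can be checked the same way. This is a minimal sketch; the function name index_join_cost and its parameters are hypothetical, and rounding each lookup up to a whole number of I/O's is an assumption. The nonclustering case at the end is not worked in the example and is included only for comparison.

```python
import math

def index_join_cost(B_R, T_R, B_S, T_S, V_S_Y, r_clustered, s_index_clustering):
    """Return (I/O's to read R, I/O's to probe S through its index on Y)."""
    read_r = B_R if r_clustered else T_R
    # Each tuple of R triggers one index lookup into S.
    per_lookup = math.ceil((B_S if s_index_clustering else T_S) / V_S_Y)
    probe_s = T_R * per_lookup
    return read_r, probe_s

# Values from the example: R clustered, clustering index on S.Y.
read_r, probe_s = index_join_cost(1000, 10_000, 500, 5000, 100,
                                  r_clustered=True, s_index_clustering=True)
print(read_r, probe_s)  # prints "1000 50000"

# Same data with a nonclustering index on S.Y, for comparison.
_, probe_s_nonclustering = index_join_cost(1000, 10_000, 500, 5000, 100,
                                           r_clustered=True,
                                           s_index_clustering=False)
print(probe_s_nonclustering)  # prints "500000"
```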
Joining Using a Sorted Index
Natural join of R(X,Y) and S(Y,Z) with a sorted index on Y for either R or S.
Example: relations R(X,Y) and S(Y,Z) with an index on Y for both relations
search keys (Y-values) for R: 1, 3, 4, 4, 5, 6
search keys (Y-values) for S: 2, 2, 4, 6, 7, 8
Joining using a sorted index
Used when the index is a B-tree, or any other structure from which we can easily extract the tuples of a relation in sorted order.
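When both relations have a sorted index on Y, the join can walk the two key sequences in step and fetch tuples only for keys that appear in both. The sketch below runs on the search keys from the example. The function name sorted_index_join is hypothetical, and plain Python lists stand in for the sorted leaves of the two indexes.

```python
def sorted_index_join(r_keys, s_keys):
    """Merge two sorted key lists and return (key, r_count, s_count)
    for every key that occurs in both."""
    i = j = 0
    matches = []
    while i < len(r_keys) and j < len(s_keys):
        if r_keys[i] < s_keys[j]:
            i += 1   # key occurs only in R: skip it, no tuple fetch needed
        elif r_keys[i] > s_keys[j]:
            j += 1   # key occurs only in S: skip it
        else:
            key = r_keys[i]
            i_end, j_end = i, j
            while i_end < len(r_keys) and r_keys[i_end] == key:
                i_end += 1
            while j_end < len(s_keys) and s_keys[j_end] == key:
                j_end += 1
            matches.append((key, i_end - i, j_end - j))
            i, j = i_end, j_end
    return matches

r_keys = [1, 3, 4, 4, 5, 6]   # search keys for R from the example
s_keys = [2, 2, 4, 6, 7, 8]   # search keys for S from the example
result = sorted_index_join(r_keys, s_keys)
print(result)  # prints [(4, 2, 1), (6, 1, 1)]
```

Only the tuples with Y = 4 and Y = 6 have to be retrieved from the data files. Keys 1, 3, 5 in R and 2, 7, 8 in S are skipped using the indexes alone.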