Query Execution Index Based Algorithms (15.6)

1 Query Execution Index Based Algorithms (15.6)
Ashish Sharma CS-257 ID:118

2 Clustering and Nonclustering Indexes
A relation is "clustered" if its tuples are packed into roughly as few blocks as can possibly hold those tuples. Clustering indexes are indexes on an attribute or attributes such that all the tuples with a fixed value for the search key of the index appear on roughly as few blocks as can hold them. Note that a relation that isn't clustered cannot have a clustering index, but even a clustered relation can have nonclustering indexes.

3 Index-Based Selection
Selection on equality: σ_{a=v}(R). Clustered index on a: cost B(R)/V(R,a). If the index on R.a is clustering, then the number of disk I/O's to retrieve the set σ_{a=v}(R) will average B(R)/V(R,a). The actual number may be somewhat higher. Unclustered index on a: cost T(R)/V(R,a). If the index on R.a is nonclustering, each tuple we retrieve may be on a different block, and we must access T(R)/V(R,a) tuples. Thus, T(R)/V(R,a) is an estimate of the number of disk I/O's we need.

4 Cost of index based selection:
Example: B(R) = 2000, T(R) = 100,000, V(R,a) = 20. Compute the cost of σ_{a=v}(R). Cost of table scan: if R is clustered, B(R) = 2000 I/Os; if R is unclustered, T(R) = 100,000 I/Os. Cost of index-based selection: if the index is clustered, B(R)/V(R,a) = 100 I/Os; if the index is unclustered, T(R)/V(R,a) = 5000 I/Os. Notice: when V(R,a) is small, an unclustered index is useless. Here it costs 5000 I/Os, more than the 2000 I/Os needed to scan the clustered relation.
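The arithmetic on this slide can be reproduced with a small sketch. The function names `scan_cost` and `index_selection_cost` are hypothetical helpers introduced only for illustration; they simply encode the estimates B(R), T(R), B(R)/V(R,a) and T(R)/V(R,a) stated above.

```python
def scan_cost(B, T, clustered):
    """Estimated disk I/Os to scan the whole relation."""
    return B if clustered else T


def index_selection_cost(B, T, V, clustering_index):
    """Estimated disk I/Os for an equality selection through an index.

    Ignores the I/Os needed to read the index itself, as the slides do.
    """
    return B / V if clustering_index else T / V


# Parameters from the example: B(R), T(R), V(R,a)
B_R, T_R, V_Ra = 2000, 100_000, 20

print(scan_cost(B_R, T_R, clustered=True))                # 2000
print(scan_cost(B_R, T_R, clustered=False))               # 100000
print(index_selection_cost(B_R, T_R, V_Ra, True))         # 100.0
print(index_selection_cost(B_R, T_R, V_Ra, False))        # 5000.0
```

With these numbers the unclustered index (5000 I/Os) loses to a plain scan of the clustered relation (2000 I/Os), which is the point of the closing remark.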

5 Joining by Using an Index
R ⋈ S: this is a natural join. Assume S has an index on the join attribute a. Iterate over R and, for each tuple, fetch the corresponding tuple(s) from S through the index. Assume R is clustered. Cost: if the index is clustered, B(R) + T(R)B(S)/V(S,a); if the index is unclustered, B(R) + T(R)T(S)/V(S,a).
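A minimal in-memory sketch of this index-based join, assuming relations R(X, Y) and S(Y, Z) joined on Y. The Python dictionary stands in for the index on S and the helper names `build_index` and `index_join` are hypothetical; a real system would probe a disk-based index instead.

```python
from collections import defaultdict


def build_index(S):
    """Map each Y-value to the list of S-tuples (y, z) that carry it."""
    index = defaultdict(list)
    for y, z in S:
        index[y].append((y, z))
    return index


def index_join(R, s_index):
    """Iterate over R; for each tuple, fetch matching S-tuples via the index."""
    result = []
    for x, y in R:
        for _, z in s_index.get(y, []):   # one index probe per R-tuple
            result.append((x, y, z))
    return result


R = [(1, "a"), (2, "b"), (3, "a")]
S = [("a", 10), ("a", 20), ("c", 30)]

print(index_join(R, build_index(S)))
# [(1, 'a', 10), (1, 'a', 20), (3, 'a', 10), (3, 'a', 20)]
```

The one probe per tuple of R is what produces the factor T(R) in both cost formulas.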

6 Assume both R and S have a sorted index (B+ tree) on the join attribute
Then perform a merge join (called zig-zag join). Cost: B(R) + B(S)

7 Example: Let us consider our running example, relations R(X, Y) and S(Y, Z) covering 1000 and 500 blocks, respectively. Assume ten tuples of either relation fit on one block, so T(R) = 10,000 and T(S) = 5000. Also, assume V(S, Y) = 100; i.e., there are 100 different values of Y among the tuples of S. Suppose that R is clustered, and there is a clustering index on Y for S. Then the approximate number of disk I/O's, excluding what is needed to access the index itself, is 1000 to read the blocks of R plus 10,000 x 500 / 100 = 50,000 disk I/O's, for a total of 51,000. This number is considerably above the cost of other methods for the same data discussed previously. If either R or the index on S is not clustered, then the cost is even higher.
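The example can be checked against the cost formulas from the "Joining by Using an Index" slide. This is a sketch under the stated assumptions: R is clustered, index-access I/Os are ignored, and T(S) is taken as 5000 (500 blocks times ten tuples per block). The function name `index_join_cost` is hypothetical.

```python
def index_join_cost(B_R, T_R, B_S, T_S, V_S, clustering_index):
    """Estimated disk I/Os: read R once, then probe S once per tuple of R."""
    per_probe = B_S / V_S if clustering_index else T_S / V_S
    return B_R + T_R * per_probe


# Running example: B(R), T(R), B(S), T(S), V(S,Y)
B_R, T_R = 1000, 10_000
B_S, T_S = 500, 5_000
V_SY = 100

print(index_join_cost(B_R, T_R, B_S, T_S, V_SY, True))    # 51000.0
print(index_join_cost(B_R, T_R, B_S, T_S, V_SY, False))   # 501000.0
```

The second figure illustrates the remark that a nonclustering index on S makes the cost even higher.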

8 Joins Using a Sorted Index
Still consider R(X,Y) ⋈ S(Y,Z). Assume there is a sorted index on both R.Y and S.Y: a B-tree or a sorted sequential index. Scan both indexes in increasing order of Y, as in merge-join, without needing to sort first. If the indexes are dense, nonmatching tuples can be skipped without loading them, which is very efficient.

9 When the index is a B-tree
When the index is a B-tree, or any other structure from which we can easily extract the tuples of a relation in sorted order, we have a number of other opportunities to use the index. Perhaps the simplest is when we want to compute R(X, Y) ⋈ S(Y, Z), and we have such an index on Y for either R or S. We can then perform an ordinary sort-join, but we do not have to perform the intermediate step of sorting one of the relations on Y.

10 As an extreme case, if we have sorting indexes on Y for both R and S, then we need to perform only the final step of the simple sort-based join. This method is sometimes called zig-zag join, because we jump back and forth between the indexes, finding the Y-values that they share. Notice that tuples from R with a Y-value that does not appear in S need never be retrieved, and similarly, tuples of S whose Y-value does not appear in R need not be retrieved.

11 A zig-zag join using two indexes.

12 Example: Suppose that we have relations R(X,Y) and S(Y, Z) with indexes on Y for both relations. In a tiny example, let the search keys (Y-values) for the tuples of R be, in order, 1,3,4,4,4,5,6, and let the search key values for S be 2,2,4,4,6,7. We start with the first keys of R and S, which are 1 and 2, respectively. Since 1 < 2, we skip the first key of R and look at the second key, 3. Now, the current key of S is less than the current key of R, so we skip the two 2's of S to reach 4. At this point, the key 3 of R is less than the key of S, so we skip the key of R. Now, both current keys are 4. We follow the pointers associated with all the keys 4 from both relations, retrieve the corresponding tuples, and join them. Notice that until we met the common key 4, no tuples of the relations were retrieved. Having dispensed with the 4's, we go to key 5 of R and key 6 of S. Since 5 < 6, we skip to the next key of R. Now the keys are both 6, so we retrieve the corresponding tuples and join them. Since R is now exhausted, we know there are no more pairs of tuples from the two relations that join.
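The walk-through above can be traced with a short sketch that operates only on the two sorted key sequences. The function name `zigzag_join` and the use of list positions as stand-ins for the index pointers are assumptions made for illustration; fetching the actual tuples is left out.

```python
def zigzag_join(r_keys, s_keys):
    """Merge two sorted key lists; return (key, r_positions, s_positions)
    for every key present in both. Positions stand in for tuple pointers."""
    i = j = 0
    matches = []
    while i < len(r_keys) and j < len(s_keys):
        if r_keys[i] < s_keys[j]:
            i += 1                      # skip an R key with no partner in S
        elif r_keys[i] > s_keys[j]:
            j += 1                      # skip an S key with no partner in R
        else:
            key = r_keys[i]
            i_end = i
            while i_end < len(r_keys) and r_keys[i_end] == key:
                i_end += 1
            j_end = j
            while j_end < len(s_keys) and s_keys[j_end] == key:
                j_end += 1
            # Only now would the tuples behind these pointers be retrieved.
            matches.append((key, list(range(i, i_end)), list(range(j, j_end))))
            i, j = i_end, j_end
    return matches


R_keys = [1, 3, 4, 4, 4, 5, 6]
S_keys = [2, 2, 4, 4, 6, 7]

print(zigzag_join(R_keys, S_keys))
# [(4, [2, 3, 4], [2, 3]), (6, [6], [4])]
```

Only keys 4 and 6 trigger tuple retrieval; key 7 of S is never examined because the loop stops once R is exhausted.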

13 THANK YOU

