DB performance tuning using indexes Section 8.5 and Chapters 20 (Raghu)

What do you need to know?  A relational operator can be executed using different physical algorithms  An SQL query corresponds to a relational algebra expression and may map to alternative equivalent execution plans

Join Operation  Several different algorithms to implement joins Nested-loop join Nested-loop join Block nested-loop join Block nested-loop join Indexed nested-loop join Indexed nested-loop join Merge-join Merge-join Hash-join Hash-join  Choice based on cost estimate

Nested-Loop (NL) Join  To compute the theta join r  s for each tuple t r in r do begin for each tuple t s in s do begin test pair (t r,t s ) to see if they satisfy the join condition  if they do, add t r.t s to the result. end end  r is called the outer relation and s the inner relation of the join.  Requires no indices and can be used with any kind of join condition.  Expensive since it examines every pair of tuples in the two relations.

Merge-Join 1. Sort both relations on their join attribute (if not already sorted on the join attributes). 2. Merge the sorted relations to join them 1. Join step is similar to the merge stage of the sort-merge algorithm. 2. Main difference is handling of duplicate values in join attribute: every pair with same value on join attribute must be matched

Equivalent execution plans select d.customer-name from branch b, account a, depositor d where b.branch-name = a.branch-name and a.account-number = d.account-number and b.branch-city = ‘Brooklyn’

Understanding the Workload  For each query in the workload: Which relations does it access? Which relations does it access? Which attributes are retrieved? Which attributes are retrieved? Which attributes are involved in selection/join conditions? How selective are these conditions likely to be? Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?  For each update in the workload: Which attributes are involved in selection/join conditions? How selective are these conditions likely to be? Which attributes are involved in selection/join conditions? How selective are these conditions likely to be? The type of update ( INSERT/DELETE/UPDATE ), and the attributes that are affected. The type of update ( INSERT/DELETE/UPDATE ), and the attributes that are affected.

Choice of Indexes  What indexes should we create? Which relations should have indexes? What field(s) should be the search key? Should we build several indexes? Which relations should have indexes? What field(s) should be the search key? Should we build several indexes?  For each index, what kind of an index should it be? Clustered? Hash/tree? Clustered? Hash/tree?

Choice of Indexes (Contd.)  One approach: Consider the most important queries in turn. Consider the best plan using the current indexes, and see if a better plan is possible with an additional index. If so, create it. Obviously, this implies that we must understand how a DBMS evaluates queries and creates query evaluation plans! Obviously, this implies that we must understand how a DBMS evaluates queries and creates query evaluation plans! For now, we discuss simple 1-table queries. For now, we discuss simple 1-table queries.  Before creating an index, must also consider the impact on updates in the workload! Trade-off: Indexes can make queries go faster, updates slower. Require disk space, too. Trade-off: Indexes can make queries go faster, updates slower. Require disk space, too.

Index Selection Guidelines  Attributes in WHERE clause are candidates for index keys. Exact match condition suggests hash index. Exact match condition suggests hash index. Range query suggests tree index. Range query suggests tree index. Clustering is especially useful for range queries; can also help on equality queries if there are many duplicates.Clustering is especially useful for range queries; can also help on equality queries if there are many duplicates.  Multi-attribute search keys should be considered when a WHERE clause contains several conditions. Order of attributes is important for range queries. Order of attributes is important for range queries. Such indexes can sometimes enable index-only strategies for important queries. Such indexes can sometimes enable index-only strategies for important queries. For index-only strategies, clustering is not important!For index-only strategies, clustering is not important!  Try to choose indexes that benefit as many queries as possible. Since only one index can be clustered per relation, choose it based on important queries that would benefit the most from clustering.

Examples of Clustered Indexes  B+ tree index on E.age can be used to get qualifying tuples. How selective is the condition? How selective is the condition? Is the index clustered? Is the index clustered?  Consider the GROUP BY query. If many tuples have E.age > 10, using E.age index and sorting the retrieved tuples may be costly. If many tuples have E.age > 10, using E.age index and sorting the retrieved tuples may be costly. Clustered E.dno index may be better! Clustered E.dno index may be better!  Equality queries and duplicates: Clustering on E.hobby helps! Clustering on E.hobby helps! SELECT E.dno FROM Emp E WHERE E.age>40 SELECT E.dno, COUNT (*) FROM Emp E WHERE E.age>10 GROUP BY E.dno SELECT E.dno FROM Emp E WHERE E.hobby=Stamps

Indexes with Composite Search Keys  Composite Search Keys: Search on a combination of fields. Equality query: Every field value is equal to a constant value. E.g. wrt index: Equality query: Every field value is equal to a constant value. E.g. wrt index: age=20 and sal =75age=20 and sal =75 Range query: Some field value is not a constant. E.g.: Range query: Some field value is not a constant. E.g.: age =20; or age=20 and sal > 10age =20; or age=20 and sal > 10  Data entries in index sorted by search key to support range queries. Lexicographic order, or Lexicographic order, or Spatial order. Spatial order. sue1375 bob cal joe12 10 20 8011 12 nameagesal 12,20 12,10 11,80 13,75 20,12 10,12 75,13 80,11 11 12 13 10 20 75 80 Data records sorted by name Data entries in index sorted by Data entries sorted by Examples of composite key indexes using lexicographic order.

Composite Search Keys  To retrieve Emp records with age=30 AND sal=4000, an index on would be better than an index on age or an index on sal. Choice of index key orthogonal to clustering etc. Choice of index key orthogonal to clustering etc.  If condition is: 20<age<30 AND 3000<sal<5000: Clustered tree index on or is best. Clustered tree index on or is best.  If condition is: age=30 AND 3000<sal<5000: Clustered index much better than index! Clustered index much better than index!  Composite indexes are larger, updated more often.

Index-Only Plans  A number of queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available. SELECT E.dno, COUNT (*) FROM Emp E GROUP BY E.dno SELECT E.dno, MIN (E.sal) FROM Emp E GROUP BY E.dno SELECT AVG (E.sal) FROM Emp E WHERE E.age=25 AND E.sal BETWEEN 3000 AND 5000 Tree index! or Tree!

Index Selection for Joins (20.3)  When considering a join condition: Hash index on inner relation (where the search key includes the join columns) is very good for Index Nested Loops. Hash index on inner relation (where the search key includes the join columns) is very good for Index Nested Loops. Should be clustered if join column is not key for inner, and inner tuples need to be retrieved.Should be clustered if join column is not key for inner, and inner tuples need to be retrieved. Clustered B+ tree on join column(s) good for Sort-Merge, because don’t need to sort. Clustered B+ tree on join column(s) good for Sort-Merge, because don’t need to sort.  As examples show, our choice of indexes is guided by the plan(s) that we expect an optimizer to consider for a query.

Example 1  Hash index on D.dname supports ‘Toy’ selection. Given this, index on D.dno is not needed. Given this, index on D.dno is not needed.  Hash index on E.dno allows us to get matching (inner) Emp tuples for each selected (outer) Dept tuple. SELECT E.ename, D.mgr FROM Emp E, Dept D WHERE D.dname=‘Toy’ AND E.dno=D.dno Dep t Emp (hash index on dno) D.dno = E.dno dname = ‘Toy’ E.ename, D.mgr (hash index on dname) (INL)

Example 1a  What if WHERE included: ``... AND E.age=25’’ ? Could retrieve Emp tuples using index on E.age, then join with Dept tuples satisfying dname selection. Comparable to strategy that used E.dno index. Could retrieve Emp tuples using index on E.age, then join with Dept tuples satisfying dname selection. Comparable to strategy that used E.dno index. So, if E.age index is already created, this query provides much less motivation for adding an E.dno index. So, if E.age index is already created, this query provides much less motivation for adding an E.dno index. SELECT E.ename, D.mgr FROM Emp E, Dept D WHERE D.dname=‘Toy’ AND E.dno=D.dno AND E.age = 25 Dep t Emp (hash index on E.age) D.dno = E.dno D.dname = ‘Toy’ (hash index on D.dname) E.age = 25

Example 2  Clearly, Emp should be the outer relation. Suggests that we build a hash index on D.dno. Suggests that we build a hash index on D.dno.  What index should we build on Emp? B+ tree on E.sal could be used, OR an (hash) index on E.hobby could be used. Only one of these is needed, and which is better depends upon the selectivity of the conditions. B+ tree on E.sal could be used, OR an (hash) index on E.hobby could be used. Only one of these is needed, and which is better depends upon the selectivity of the conditions. As a rule of thumb, equality selections more selective than range selections.As a rule of thumb, equality selections more selective than range selections. SELECT E.ename, D.mgr FROM Emp E, Dept D WHERE E.sal BETWEEN 10000 AND 20000 AND E.hobby=‘Stamps’ AND E.dno=D.dno

Clustering and Joins (20.4)  Clustering is especially important when accessing inner tuples in INL => should make index on E.dno clustered. SELECT E.ename, D.mgr FROM Emp E, Dept D WHERE D.dname=‘Toy’ AND E.dno=D.dno Dep t Emp (hash index on E.dno) D.dno = E.dno D.dname = ‘Toy’ (hash index on D.dname) (INL)

Clustering and Joins (cont.)  If many employees collect stamps, Sort-Merge join may be worth considering. A clustered index on D.dno would help. Summary: Clustering is useful whenever many tuples are to be retrieved. SELECT E.ename, D.mgr FROM Emp E, Dept D WHERE E.hobby=‘Stamps AND E.dno=D.dno EmpDept (hash index on D.dno) dno = dno hobby = ‘Stamps’ (hash index on E.hobby) (Sort-Merge)

Rewriting queries (20.7.3, 20.9)  If a query runs slower than expected, check the plan that is being used. The choice of indexes may have to be adjusted, an index may need to be re-built, statistics may be too old, or a query may have to be rewritten.  Sometimes, the DBMS may not be executing the plan you had in mind. Common areas of weakness: Selections involving null values, arithmetic or string expressions (ex: WHERE E.age = 2*D.age ) Selections involving null values, arithmetic or string expressions (ex: WHERE E.age = 2*D.age ) Selections involving OR conditions (next slide) Selections involving OR conditions (next slide) Lack of evaluation features like index-only strategies or certain join methods or poor size estimation. Lack of evaluation features like index-only strategies or certain join methods or poor size estimation.

Rewriting queries with DISTINCT  Minimize the use of DISTINCT : don’t need it if duplicates are acceptable, don’t need it if duplicates are acceptable, or if answer contains a key. or if answer contains a key.  Goal is to avoid expensive operations, like duplicate elimination

Rewriting selections involving OR  Suppose indexes on hobby and age  Rewrite query using union. SELECT E.dno FROM Employees E WHERE E.hobby = ‘Stamps’ OR E.age = 10 = SELECT E.dno FROM Employees E WHERE E.hobby = ‘Stamps’ UNION SELECT E.dno FROM Employees E WHERE E.age = 10

Rewriting query with GROUP BY/HAVING  Minimize the use of GROUP BY and HAVING : SELECT MIN (E.age) FROM Employee E GROUP BY E.dno HAVING E.dno=102 SELECT MIN (E.age) FROM Employee E WHERE E.dno=102 =

Rewriting nested queries  Use only one “query block”, if possible. SELECT DISTINCT * FROM Sailors S WHERE S.sname IN (SELECT Y.sname FROM YoungSailors Y) SELECT DISTINCT S.* FROM Sailors S, YoungSailors Y WHERE S.sname = Y.sname =

DB performance tuning using indexes Section 8.5 and Chapters 20 (Raghu)

Similar presentations

Presentation on theme: "DB performance tuning using indexes Section 8.5 and Chapters 20 (Raghu)"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

DB performance tuning using indexes Section 8.5 and Chapters 20 (Raghu)

Similar presentations

Presentation on theme: "DB performance tuning using indexes Section 8.5 and Chapters 20 (Raghu)"— Presentation transcript:

Similar presentations

About project

Feedback