Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Chapter 3. Tuning Indexes May 2002 Prof. Sang Ho Lee School of Computing, Soongsil University

Similar presentations


Presentation on theme: "1 Chapter 3. Tuning Indexes May 2002 Prof. Sang Ho Lee School of Computing, Soongsil University"— Presentation transcript:

1 1 Chapter 3. Tuning Indexes May 2002 Prof. Sang Ho Lee School of Computing, Soongsil University shlee@computing.soongsil.ac.kr

2 2 Type of Queries (1) A point query returns at most one record based on an equality selection  Select name from employee where id = 8478; A multipoint query returns several records based on an equality selection  Select name from employee where salary = 40000; A range query returns a set of records whose values lie in an interval  Select name from employee where salary >= 50000 and salary < 60000; A prefix match query on an attribute or sequence of attributes X Specifies only a prefix of X  last name = ‘Gates’ and first name like ‘G%’

3 3 Type of Queries (2) An extremal query obtains a set of records whose value on some attribute is a minimum or maximum  Select name from employee where salary = MAX(select salary from employee); An ordering query displays a set of records in order of the value of some attribute(s)  Select * from employee order by salary; A grouping query partitions the results of a query into groups  Select dept, AVG(salary) from employee group by dept; A join query links two or more tables  Select employee.ssnum from employee, student where employee.ssum = student.ssnum

4 4 Key Types Two kinds of keys with respect to a table T  A sequential key: key’s value is monotonic with insertion order  A non-sequential key: key’s value is unrelated to the insertion order to table

5 5 Review of Index Terminology Index = data structure (Btree, Hash, ISAM) + pointers to data ISAM structure: Balanced tree structure with a predetermined number of levels. Interior nodes of an ISAM structure never change, but there may be overflow chain at the leaves. Use ISAM when range queries are important and there are few updates Sparse index = one pointer per data page Dense index = one pointer per data record Clustering index = may be sparse or dense and implies a data organization  e.g. Btree on A, data records sorted on A Non-clustering index = enforces nothing about data organization, so must be dense

6 6 ISAM (Index Sequential Access Method) These structures are similar to B-trees but are optimized for infrequently changing data Each leaf page maps to a track and sibling leaves are on the same or neighboring tracks If an insertion occurs to a full page, use overflow chaining In ISAM structures, tree is fixed but leaves may overflow, forming chains

7 7 Advantages/Disadvantages in ISAM Advantages  Support range queries even better than B-trees, because the utilization of each page often near 100%, reducing the number of pages that need to be retrieved  Reduce locking overhead, because nonleaf nodes of the structure are never locked Disadvantages  Sequential keys cause concurrency bottleneck on ISAM structures when there are many inserts  Whether inserts are sequential or not, ISAM structures will overflow under heavy insert traffic  This structures also suffer from large key sizes

8 8 Clustering vs. Non-clustering Clustering index may be sparse (up to the DBMS implementer) Good for range (e.g., R.A between 5 and 15) and prefix queries (e.g., R.Name like ‘Sm%’) Near key values in data structures correspond to near tuples Good for concurrency — Usually

9 9 Clustering Indexes and Concurrency Control If no clustering index, then insertions occur at end of a heap (file organized by time of insertion) Concurrent insertions will then conflict on last page of heap If there is a clustering index, e.g. on SocSecNum, then consecutive insertions will likely to far apart. Low contention. However, if key value proportional to time of insertion, then clustering index based on B-tree causes a problem. WHY ??? Would hashing make any difference???

10 10 Clustering Index — Bad News Inserts tend to be placed in the middle of the table. This can cause overflows, destroying the benefits of clustering Similarly for updates So, may be a good idea to use fairly low page utilization when using a clustering index

11 11 Non-clustering Index Non-clustering index = data structure but no imposition on structure of table. May have several per table Dense, so some queries can be answered without access to table. For example, assume a non-clustering index on attributes A, B, and C of R Select B, C from R where A = 5; Good for point queries and for selective multi-point, and for extremal queries. May or may not be useful for join queries.

12 12 Non-clustering Indexes and Selectivity — Example 1  Pages are 4Kbytes  Attribute A takes on 20 different values  Query is multi-point on A  There is a non-clustering index on A If record is 50 bytes long, there are 80 records per page. Nearly every page will have a matching record Don’t create non-clustering index

13 13 Non-clustering Indexes and Selectivity — Example 2  Pages are 4Kbytes  Attribute A takes on 20 different values  Query is multi-point on A  There is a non-clustering index on A If record is 2000 bytes long, there are 2 records per page. Only 1 in ten pages will have a matching record. Create non-clustering index

14 14 Quantitative Conclusions F — number of records per page D — number of different values of attribute P — number of pages prefetched when performing a scan  If D < F * P, then no non-clustering index  Otherwise, create non-clustering index provided multipoint queries on attribute are frequent Practice: Derive this.

15 15 Composite Indexes (1) An index based on more than one attribute Clustering or non-clustering The poor performance in inverse order to composite index  City(name, latitude, longitude, population)  clustering composite index (latitude, longitude) select name from city where population >= 10000 and latitude = 22 and longitude >= 5 and longitude <= 15 select name from city where population >= 10000 and longitude = 22 and latitude >= 5 and latitude <= 15

16 16 Composite Indexes (2) Benefits  A dense composite index can something answer a query completely  A query on all attributes of a composite index will likely return far fewer records than a query on only some of those attributes  A composite index is an efficient way to support the uniqueness of multiple attributes  A composite index can support kinds of geographical queries Disadvantages  A large key size  An update to any of its attribute will cause the index to be modified

17 17 Index Types Offered (1) DEC RDB  Clustering: dense B-tree, sparse hash  Non-clustering: B-tree IBM DB2, SQL/DS, OS/2  Clustering: dense B-tree  Non-clustering: B-tree Oracle  Clustering: dense B-tree, sparse hash  Non-clustering: B-tree

18 18 Index Types Offered (2) Ingres  Clustering: dense B-tree, sparse hash, sparse ISAM  Non-clustering: B-tree, hash, ISAM Sybase  Clustering: sparse B-tree  Non-clustering: B-tree

19 19 General Care and Feeding Here are some maintenance tips on indexes  Give them a face lift: Eliminate overflow chains on index or table Drop indexes when they hurt performance. (Rebuild clustering indexes after sorting first) Example: During a batch insertion, you may be better off having no secondary indexes Check query plan  Run the catalog statistics update package regularly

20 20 Reasons System Might Not Use an Index Catalog may not be up to date Optimizer may believe table is too small Query may be badly specified. For example, in some systems (e.g. Oracle v6 and earlier)  Select * from employee where salary/12 >= 4000; would not use salary index whereas following would Select * from employee where salary >= 4000 * 12;

21 21 Reasons System Might Not Use an Index (2)  The use of a string function: Select * from employee where substr(name, 1, 1) = ‘G’;  A comparison with NULL Select * from employee where salary is null;  The use of a bind variable Apply a different type to corresponded table’s attribute type

22 22 Scenario 1 Employee(ssnum, name, dept, manager, salary) There are no updates. Here are queries. Which indexes should be established?  Count all the employees that have a certain salary (frequent)  Find the employees that have the maximum (or minimum) salary within a particular department (rare)  Find the employee with a certain social security number (frequent)

23 23 Scenario 1 — Action Non-clustering index on salary, since the first query can be answered solely based on the non-clustering index on salary Sparse clustering index on social security number if the employee tuples are small, because a sparse index may be a level shorter than a dense one Non-clustering composite index on (dept, salary) using a B- tree for second query, should it become more important

24 24 Scenario 2 The Employee table has no clustering indexes Performs poorly when there are bursts of inserts Locking is page-based

25 25 Scenario 2 — Action Employee is organized as a heap, so all inserts to employee occur on the last page, making that page a locking hot spot  Find some way to smooth out the bursts of inserts  Use record-level locking  Create a clustering index based on hashing on customer number or social security number. Alternatively, create a B-tree-based clustering index on social security number.


Download ppt "1 Chapter 3. Tuning Indexes May 2002 Prof. Sang Ho Lee School of Computing, Soongsil University"

Similar presentations


Ads by Google