Indexes and Performance BCHB697
Outline
Constraints: on primary key, foreign key
Indexes: single-value index strategies, balanced-tree implementation; unique, sorting; impact of column data-type and values; cost of insertion; as an in-memory column subset
Query optimization: logically similar, but slower?; explain (SQL query execution strategy); sorted results (order by, group by)
BCHB697 - Edwards
Constraints
Guarantees enforced by the DBMS whenever the data is changed: primary key uniqueness, foreign key referential integrity, and valid values (especially for enumerated data-types).
Constraints are typically implemented using indexes! This adds to insert/delete cost, so turn them off for bulk loads.
Usually, an index for the primary key is created automatically for each table.
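To see constraints and the indexes behind them in action, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption: MySQL names its indexes and reports errors differently. The country/city tables are hypothetical. In SQLite an INTEGER PRIMARY KEY is the table's rowid itself, so here only the UNIQUE constraint gets a separate, automatically created index.

```python
import sqlite3

# Stand-in for MySQL: an in-memory SQLite database with hypothetical tables
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("""CREATE TABLE country (
    country_id INTEGER PRIMARY KEY,
    country    TEXT NOT NULL UNIQUE)""")
conn.execute("""CREATE TABLE city (
    city_id    INTEGER PRIMARY KEY,
    city       TEXT NOT NULL,
    country_id INTEGER NOT NULL REFERENCES country(country_id))""")

# The UNIQUE constraint is enforced through an index the DBMS created for us
auto_indexes = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'country'")]
print(auto_indexes)  # → ['sqlite_autoindex_country_1']

conn.execute("INSERT INTO country VALUES (1, 'Canada')")

def rejected(sql):
    """Return True if the DBMS refuses the change because it breaks a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

print(rejected("INSERT INTO country VALUES (1, 'Chile')"))    # → True (duplicate primary key)
print(rejected("INSERT INTO country VALUES (2, 'Canada')"))   # → True (duplicate unique value)
print(rejected("INSERT INTO city VALUES (1, 'Lima', 99)"))    # → True (no country 99)
print(rejected("INSERT INTO city VALUES (1, 'Toronto', 1)"))  # → False
```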
Linear Search
Unsorted, O(n): have to examine every element; quick to insert (append).
Sorted, O(n): can stop early when the element is not present; slow to insert, O(n).
Can we do better?
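The two linear-search cases can be sketched in Python as follows. The sample list is hypothetical, and each function also counts comparisons so the early exit on sorted data is visible.

```python
def linear_search_unsorted(values, target):
    """O(n): every element may have to be examined."""
    comparisons = 0
    for i, v in enumerate(values):
        comparisons += 1
        if v == target:
            return i, comparisons
    return -1, comparisons

def linear_search_sorted(values, target):
    """Still O(n) in the worst case, but stops as soon as values pass the target."""
    comparisons = 0
    for i, v in enumerate(values):
        comparisons += 1
        if v == target:
            return i, comparisons
        if v > target:          # the target would have appeared by now
            break
    return -1, comparisons

data = [42, 7, 19, 3, 88, 61]
print(linear_search_unsorted(data, 5))        # → (-1, 6): scanned everything
print(linear_search_sorted(sorted(data), 5))  # → (-1, 2): stopped at 7
```

The price of the early exit is on insertion. Appending to the unsorted list is cheap, but keeping the list sorted means shifting every larger element.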
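Binary Search
Sorted, O(log n): check the middle value; the element is either to its left or to its right, so each step halves the remaining range.
Insert: the insertion point is found in O(log n), although shifting elements to make room in an array is still O(n).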
Binary Search
Conceptually, we can represent this algorithm as a tree.
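A minimal Python sketch of binary search over a sorted list follows. The middle element checked at each step plays the role of the root of the current subtree, and going left or right corresponds to descending into one child. The data is hypothetical; the standard-library bisect module is used only to show where an insertion would go.

```python
import bisect

def binary_search(values, target):
    """Return (index of target in sorted `values` or -1, number of steps taken).

    Each step checks the middle value (the root of the current subtree)
    and discards the half that cannot contain the target.
    """
    lo, hi = 0, len(values) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if values[mid] == target:
            return mid, steps
        if values[mid] < target:
            lo = mid + 1        # target is in the right subtree
        else:
            hi = mid - 1        # target is in the left subtree
    return -1, steps

values = list(range(0, 2000, 2))     # 1000 sorted even numbers
print(binary_search(values, 1234))   # found at index 617
print(binary_search(values, 1235))   # not present: index -1

# Insertion into a sorted Python list: bisect finds the slot in O(log n),
# but list insertion still shifts every later element, which is O(n).
grown = values.copy()
bisect.insort(grown, 1235)
```

With 1000 elements, no search takes more than 10 steps, compared with up to 1000 comparisons for a linear scan.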
Balanced tree index
Tree nodes are no longer single values but disk blocks of values.
For good performance, the tree must be balanced.
(Figure: Data Modeling Essentials)
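A rough estimate helps explain why nodes become disk blocks. This sketch assumes that each tree level costs one disk-block read. It treats "fan-out" as the number of children per node and takes 100 per block as a hypothetical figure; real fan-out depends on key size and block size. It counts the levels h needed so that fanout^h reaches the number of rows.

```python
def levels_needed(n_rows, fanout):
    """Smallest number of levels h with fanout**h >= n_rows in a balanced tree."""
    levels, reachable = 1, fanout
    while reachable < n_rows:
        levels += 1
        reachable *= fanout
    return levels

# Hypothetical table of about one million rows
print(levels_needed(1_000_000, 2))    # → 20  (binary tree: one value per node)
print(levels_needed(1_000_000, 100))  # → 3   (about 100 keys per disk block)
```

Under these assumptions, a block-based tree reaches any of a million rows in about 3 block reads instead of about 20, provided it stays balanced.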
Primary Key Index
The index is a separate "table" that identifies the disk block containing each record.
It is (also) sorted and (much) smaller than the table, so it can be kept in memory.
Result: fast access to a few records.
(Figure: Fundamentals of Database Systems)
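The idea can be sketched in Python under simplifying assumptions. The "disk blocks" are hypothetical in-memory lists of 4 records stored in primary key order. The index keeps only the first key of each block, so it is sparse, with one entry per block. A lookup binary-searches the small index, then reads a single block.

```python
import bisect

BLOCK_SIZE = 4  # hypothetical records per disk block

# Hypothetical table, stored in primary key order and split into blocks
records = [(pk, f"row {pk}") for pk in range(100, 200, 3)]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# The index: first primary key of each block (small, sorted, kept in memory)
index_keys = [block[0][0] for block in blocks]

def lookup(pk):
    """Find a record by primary key, reading only one block."""
    b = bisect.bisect_right(index_keys, pk) - 1   # last block whose first key <= pk
    if b < 0:
        return None
    for key, value in blocks[b]:                  # the single "disk read"
        if key == pk:
            return value
    return None

print(lookup(130))  # → row 130
print(lookup(131))  # → None (not a primary key)
```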
Types of Indexes
Unique: primary key indexes.
Sorting index: sometimes algorithms can take advantage of the sort order.
The order of records in the table's data blocks is usually primary key order, so records on the same disk block can be accessed cheaply.
Numeric indexes have an obvious sort order; strings sort lexicographically.
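The difference between numeric and lexicographic order matters whenever numbers are stored in a string column, because an index on that column sorts them character by character. Here is a tiny Python illustration with hypothetical taxonomy-style identifiers.

```python
ids_as_numbers = [9606, 10090, 562, 7227]
ids_as_strings = [str(x) for x in ids_as_numbers]

print(sorted(ids_as_numbers))  # → [562, 7227, 9606, 10090]
print(sorted(ids_as_strings))  # → ['10090', '562', '7227', '9606']
```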
Indexes and Data-Type
Indexes on integer columns are fastest: they support the widest variety of data structures and algorithms.
Indexes are most useful when a column has lots of distinct values. The "information content" of a column reflects how many rows share each value: the fewer rows per value, the more an index on that column helps.
Strings are indexed from the beginning of the string: like 'w%' can use the index, but like '%w' cannot.
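Why a leading wildcard defeats a string index can be sketched with a sorted Python list standing in for the index (hypothetical names; Python's comparison is case-sensitive). A prefix pattern maps to one contiguous slice of the sorted order, found by binary search. A suffix pattern is scattered throughout, so every entry must be checked.

```python
import bisect

# A sorted list of names stands in for a string index (hypothetical data)
names = sorted(["Homo sapiens", "Mus musculus", "Triticum aestivum",
                "Wolbachia", "Woodsia", "Xenopus laevis", "Zea mays"])

def prefix_range(sorted_names, prefix):
    """like 'prefix%': binary-search to the first candidate, stop at the first non-match.
    Only the matching slice is visited: O(log n + k) for k matches."""
    start = bisect.bisect_left(sorted_names, prefix)
    end = start
    while end < len(sorted_names) and sorted_names[end].startswith(prefix):
        end += 1
    return sorted_names[start:end]

def suffix_scan(sorted_names, suffix):
    """like '%suffix': the sort order does not help, so every entry is checked: O(n)."""
    return [name for name in sorted_names if name.endswith(suffix)]

print(prefix_range(names, "Wo"))  # → ['Wolbachia', 'Woodsia']
print(suffix_scan(names, "s"))    # → ['Homo sapiens', 'Mus musculus', 'Xenopus laevis', 'Zea mays']
```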
When to use an index
Why not add an index on every column? Indexes slow down inserts, updates, and deletes; use memory and disk space; and won't necessarily speed up specific queries.
Create indexes for: foreign keys (+ primary keys, of course); columns with good information content that are used often in where constraints; columns used for sorting, grouping, distinct; columns frequently shown in results.
Indexes: Cost of insertion
Each index increases the amount of work needed to insert a row into the table.
Trade-off: time to insert vs time to retrieve.
Depending on the complexity of the data structure, an index may need very expensive "rebalancing" every so often.
For bulk inserts: turn off indexes, bulk load, then turn the indexes back on. Create the table unsorted (append), then sort once. Turn off constraints too (they are implemented using indexes).
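In MySQL, the comparable switches include SET foreign_key_checks = 0 and, for MyISAM tables, ALTER TABLE ... DISABLE KEYS. The sketch below shows the general load-then-index pattern instead, using Python's sqlite3 module as a stand-in (an assumption). The rows and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE taxonomy (
    tax_id          INTEGER PRIMARY KEY,
    parent_id       INTEGER,
    scientific_name TEXT)""")

# Hypothetical rows; in practice these would be parsed from a dump file
rows = [(i, i // 10, f"taxon {i}") for i in range(1, 100_001)]

# 1. Bulk load while the table has no secondary indexes,
#    inside a single transaction (one commit instead of 100,000).
with conn:
    conn.executemany("INSERT INTO taxonomy VALUES (?, ?, ?)", rows)

# 2. Build each secondary index once, over the complete data.
with conn:
    conn.execute("CREATE INDEX idx_taxonomy_parent ON taxonomy(parent_id)")
    conn.execute("CREATE INDEX idx_taxonomy_name ON taxonomy(scientific_name)")

count = conn.execute("SELECT COUNT(*) FROM taxonomy").fetchone()[0]
print(count)  # → 100000
```

Building an index over finished data lets the DBMS sort the column once. Maintaining the index during the load would update it on every insert instead.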
Indexes: In-Memory Column Cache
Indexes are (usually) smaller than the corresponding tables, and frequently accessed indexes are held in memory.
Each index holds the values of some columns. If the query can be satisfied entirely from an index: FAST!
Existence check using the primary key: FAST!
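SQLite's EXPLAIN QUERY PLAN, used here through Python's sqlite3 module as a stand-in for MySQL's explain, labels a query answered entirely from an index as using a "COVERING INDEX". MySQL's explain shows "Using index" in its Extra column for the same situation. The table and index names in this sketch are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE taxonomy (
    tax_id          INTEGER PRIMARY KEY,
    parent_id       INTEGER,
    rank            TEXT,
    scientific_name TEXT)""")
conn.execute("CREATE INDEX idx_taxonomy_name ON taxonomy(scientific_name)")

def plan(sql):
    """SQLite's counterpart of explain: one detail string per plan step."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Only the indexed column plus tax_id (SQLite stores the rowid in every index entry)
covered = plan("SELECT tax_id, scientific_name FROM taxonomy "
               "WHERE scientific_name = 'Homo sapiens'")
# rank is not in the index, so each match must also be fetched from the table
uncovered = plan("SELECT rank FROM taxonomy WHERE scientific_name = 'Homo sapiens'")
# Existence check through the primary key
exists = plan("SELECT 1 FROM taxonomy WHERE tax_id = 9606")

print(covered)
print(uncovered)
print(exists)
```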
Query optimization
The DBMS is responsible for deciding whether to use an index to satisfy each query.
Scanning the entire table(s) is always correct, but slow. An index may not be used, or may not help.
The DBMS keeps statistics on each column!
Some conditions "fake out" DBMS heuristics: like '%w', !=, arithmetic.
order by and group by require sorting the entire result; max vs order by limit 1.
Some queries return too many rows for an index to help.
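The per-column statistics can be inspected directly in SQLite. In this sketch, Python's sqlite3 module stands in for MySQL, whose counterparts are ANALYZE TABLE and the Cardinality column of SHOW INDEX. SQLite's ANALYZE writes one row per index into sqlite_stat1. The data is hypothetical: parent_id has many distinct values (high information content), while rank has only two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxonomy (tax_id INTEGER PRIMARY KEY, parent_id INTEGER, rank TEXT)")
conn.execute("CREATE INDEX idx_taxonomy_parent ON taxonomy(parent_id)")
conn.execute("CREATE INDEX idx_taxonomy_rank ON taxonomy(rank)")

# Hypothetical data: about 2 rows per parent_id value, but only 2 rank values
with conn:
    conn.executemany(
        "INSERT INTO taxonomy VALUES (?, ?, ?)",
        [(i, i // 2, "genus" if i % 10 == 0 else "species") for i in range(1, 10_001)])

conn.execute("ANALYZE")  # collect statistics the query planner will consult

# sqlite_stat1.stat begins "<rows in index> <average rows per distinct value>"
stats = dict(conn.execute("SELECT idx, stat FROM sqlite_stat1"))
for idx, stat in sorted(stats.items()):
    print(idx, stat)
```

A large "rows per value" figure, as for rank, tells the planner that an index lookup would return much of the table, so a scan may be the better choice.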
Query optimization
Load the entire taxa database into bigtaxa (15 minutes later…):
cd /opt/lampp
./bin/mysql -u root < ~/bigtaxa.sql
Query optimization
0.0023 seconds: select * from taxonomy
4.0637 seconds: select * from taxonomy order by scientific_name
Query optimization
0.0395 seconds: select * from taxonomy where scientific_name like 'w%'
0.6636 seconds: select * from taxonomy where scientific_name like '%w'
Query optimization
Add an index on the scientific_name column:
0.0037 seconds: select * from taxonomy where scientific_name like 'w%'
select * from taxonomy where scientific_name like '%w'
Explain
The SQL keyword explain, placed in front of a query, reports the query execution strategy:
explain select * from taxonomy where scientific_name like 'w%'
explain select * from taxonomy where scientific_name like '%w'
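The same experiment can be sketched with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. This is a stand-in: MySQL's explain prints a table of columns such as key and rows rather than these text lines. One SQLite-specific assumption applies. SQLite only uses an index for like 'w%' when the index's collation matches its case-insensitive LIKE, which is why the column here is declared COLLATE NOCASE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE matches SQLite's case-insensitive LIKE; SQLite applies its
# LIKE-prefix optimization only when the index collation matches.
conn.execute("""CREATE TABLE taxonomy (
    tax_id          INTEGER PRIMARY KEY,
    parent_id       INTEGER,
    scientific_name TEXT COLLATE NOCASE)""")
conn.execute("CREATE INDEX idx_taxonomy_name ON taxonomy(scientific_name)")

def plan(sql):
    """SQLite's counterpart of explain: one detail string per plan step."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

prefix_plan = plan("SELECT * FROM taxonomy WHERE scientific_name LIKE 'w%'")
suffix_plan = plan("SELECT * FROM taxonomy WHERE scientific_name LIKE '%w'")
print(prefix_plan)  # a SEARCH step using idx_taxonomy_name (a range on the prefix)
print(suffix_plan)  # a SCAN of the whole taxonomy table
```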
Query optimization
Avoid order by, group by, distinct unless you need them:
0.0016 seconds: select scientific_name from taxonomy
0.0107 seconds: select distinct scientific_name from taxonomy
3.0367 seconds: select scientific_name from taxonomy order by scientific_name
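The extra work behind distinct and order by shows up as an explicit step in the plan. In this sketch, Python's sqlite3 module stands in for MySQL, whose explain typically reports a comparable sort as "Using filesort". SQLite reports a "USE TEMP B-TREE" step when it must sort or de-duplicate rows itself. Once an index on the column exists, the rows can be read in index order and that step disappears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxonomy (tax_id INTEGER PRIMARY KEY, scientific_name TEXT)")

def plan(sql):
    """SQLite's counterpart of explain: one detail string per plan step."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plain    = plan("SELECT scientific_name FROM taxonomy")
distinct = plan("SELECT DISTINCT scientific_name FROM taxonomy")
ordered  = plan("SELECT scientific_name FROM taxonomy ORDER BY scientific_name")
for label, steps in [("plain", plain), ("distinct", distinct), ("order by", ordered)]:
    print(label, steps)

# With an index on the column, rows come out already in order: no separate sort
conn.execute("CREATE INDEX idx_taxonomy_name ON taxonomy(scientific_name)")
ordered_with_index = plan("SELECT scientific_name FROM taxonomy ORDER BY scientific_name")
print("order by (indexed)", ordered_with_index)
```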
Query optimization
2.9806 seconds: select * from taxonomy where parent_id = 9606;
2.9581 seconds: select * from taxonomy where parent_id + 1 = 9607
Query optimization
Add an index to parent_id…
0.0025 seconds: select * from taxonomy where parent_id = 9606;
select * from taxonomy where parent_id + 1 = 9607
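The planner's view of these two logically equivalent conditions can be checked with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module, used here as a stand-in for MySQL's explain. Wrapping the indexed column in arithmetic hides it from the index, and SQLite also does not use an index for !=. The table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE taxonomy (
    tax_id INTEGER PRIMARY KEY, parent_id INTEGER, scientific_name TEXT)""")
conn.execute("CREATE INDEX idx_taxonomy_parent ON taxonomy(parent_id)")

def plan(sql):
    """SQLite's counterpart of explain: one detail string per plan step."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

direct     = plan("SELECT * FROM taxonomy WHERE parent_id = 9606")
arithmetic = plan("SELECT * FROM taxonomy WHERE parent_id + 1 = 9607")
not_equal  = plan("SELECT * FROM taxonomy WHERE parent_id != 9606")
print(direct)      # SEARCH ... USING INDEX idx_taxonomy_parent (parent_id=?)
print(arithmetic)  # SCAN taxonomy: the expression hides the indexed column
print(not_equal)   # SCAN taxonomy: no index range is used for !=
```

Rewriting the condition so the bare column stands alone on one side (parent_id = 9607 - 1) gives the planner back its index.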
Query optimization
select * from taxonomy where parent_id = 9606;
Query optimization
select customer.first_name, customer.last_name, address.address, city.city, country.country
from customer
join address on customer.address_id = address.address_id
join city on address.city_id = city.city_id
join country on city.country_id = country.country_id
order by country.country, city.city, customer.last_name
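To see how a join like this is executed, here is a sketch that runs the same query against hypothetical, stripped-down versions of the sakila tables. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL. Typically the plan scans one table, looks up each joined table by its primary key, and adds a temporary B-tree to sort for the multi-table order by. The exact steps may vary with the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, stripped-down versions of the sakila tables
conn.executescript("""
CREATE TABLE country  (country_id  INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE city     (city_id     INTEGER PRIMARY KEY, city TEXT,
                       country_id  INTEGER REFERENCES country(country_id));
CREATE TABLE address  (address_id  INTEGER PRIMARY KEY, address TEXT,
                       city_id     INTEGER REFERENCES city(city_id));
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY,
                       first_name  TEXT, last_name TEXT,
                       address_id  INTEGER REFERENCES address(address_id));
""")

sql = """
SELECT customer.first_name, customer.last_name, address.address, city.city, country.country
FROM customer
JOIN address ON customer.address_id = address.address_id
JOIN city    ON address.city_id     = city.city_id
JOIN country ON city.country_id     = country.country_id
ORDER BY country.country, city.city, customer.last_name
"""

details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
for detail in details:
    print(detail)
```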
Query optimization
select film.film_id, film.title,
       group_concat(concat(actor.first_name, " ", actor.last_name) order by actor.last_name separator "; ")
from film
join film_actor on film.film_id = film_actor.film_id
join actor on film_actor.actor_id = actor.actor_id
group by film.film_id, film.title
Exercise
Load the bigtaxa database and experiment! Reproduce the examples in the lecture.
Look at the indexes in the sakila database: try the queries from lectures 6 and 7, and use explain to figure out which indexes are used.
Delete (or add) some indexes and try the queries again. Which queries get slower (faster)?