Indexes and Performance

BCHB697

Outline
  Constraints…
    …on primary key, foreign key
  Indexes
    Single value index strategies, balanced tree implementation
    Unique, sorting
    Impact of column data-type, values
    Cost of insertion
    As in-memory column subset
  Query optimization
    Logically similar, but slower?
    Explain (SQL query execution strategy)
    Sorted results (order by, group by)
BCHB697 - Edwards

Constraints
  Guarantees enforced by the DBMS whenever the data is changed:
    Primary key uniqueness
    Foreign key referential integrity
    Valid values (esp. for enumerated data-types)
  Typically implemented using indexes!
    Adds to insert/delete cost
    Turn off for bulk loads
  Usually, an index on the primary key is created automatically for each table.
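As a rough illustration of these guarantees, the sketch below uses Python's built-in sqlite3 module as a stand-in for the DBMS (the lecture itself uses MySQL). The country and city tables, their columns, and the helper try_insert are hypothetical names chosen for this example only.

```python
import sqlite3

# SQLite stands in for the DBMS here; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE country (country_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE city ("
    " city_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " country_id INTEGER NOT NULL REFERENCES country(country_id))"
)
conn.execute("INSERT INTO country VALUES (1, 'Canada')")
conn.execute("INSERT INTO city VALUES (10, 'Toronto', 1)")


def try_insert(sql):
    """Run one INSERT; return the constraint error message, or None on success."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


# Reuses primary key 1: rejected by the uniqueness guarantee.
duplicate_key = try_insert("INSERT INTO country VALUES (1, 'Chile')")
# Points at a country_id that does not exist: rejected by referential integrity.
dangling_ref = try_insert("INSERT INTO city VALUES (11, 'Lima', 99)")
# Satisfies both constraints: accepted.
valid_row = try_insert("INSERT INTO city VALUES (12, 'Ottawa', 1)")

print(duplicate_key)
print(dangling_ref)
print(valid_row)
```

Each check costs a lookup at insert time, which is why constraints add to the insert/delete cost.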

Linear Search
  Unsorted, O(n)
    Have to examine every element
    Quick to insert (append)
  Sorted, O(n)
    Can stop early (element is not present)
    Slow to insert, O(n)
  Can we do better?
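The two linear-search variants can be sketched in a few lines of Python. The function names and the sample ID list are made up for illustration; each function also returns how many elements it examined so the early stop is visible.

```python
def linear_search_unsorted(items, target):
    """Scan every element; a miss always costs len(items) comparisons."""
    comparisons = 0
    for value in items:
        comparisons += 1
        if value == target:
            return True, comparisons
    return False, comparisons


def linear_search_sorted(items, target):
    """Scan in order; stop as soon as a value larger than the target appears."""
    comparisons = 0
    for value in items:
        comparisons += 1
        if value == target:
            return True, comparisons
        if value > target:
            return False, comparisons  # target cannot appear later in sorted order
    return False, comparisons


unsorted_ids = [42, 7, 19, 3, 88, 61]
sorted_ids = sorted(unsorted_ids)

print(linear_search_unsorted(unsorted_ids, 10))  # examines all six elements
print(linear_search_sorted(sorted_ids, 10))      # stops at the first value above 10
```

Both variants are still O(n) in the worst case; sorting only helps misses stop sooner.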

Binary Search
  Sorted, O(log n)
    Check middle value; element is to the left or right
  Insert: finding the position is O(log n)
    A sorted array must still shift elements, O(n); a balanced tree inserts in O(log n)
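A minimal Python sketch of binary search over a sorted list follows. The function name, the step counter, and the sample data are illustrative assumptions; bisect.insort is the standard-library call for inserting into a sorted list.

```python
import bisect


def binary_search(sorted_items, target):
    """Return (index, steps); index is -1 when the target is absent."""
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1, steps


ids = list(range(0, 2_000_000, 2))  # one million sorted even numbers
position, steps = binary_search(ids, 1_337_000)
missing, miss_steps = binary_search(ids, 7)
print(position, steps, missing, miss_steps)

# insort finds the slot by binary search, but the list still shifts
# every later element, so the insert as a whole is O(n) for an array.
small = [1, 3, 7, 9]
bisect.insort(small, 5)
print(small)
```

With a million elements, no lookup needs more than 20 steps, compared with up to a million for a linear scan.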

Binary Search
  Conceptually, we can represent this algorithm as a tree.

Balanced tree index
  No longer single values, disk blocks instead
  For good performance, require balanced tree
  Source: Data Modeling Essentials

Primary Key Index
  Index is a separate "table"
    Identifies the disk block containing the record
    (Also) sorted, (much) smaller, keep in memory
  Fast access to a few records
  Source: Fundamentals of Database Systems
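One way to picture an index that identifies the disk block holding a record is the toy model below. It is a sketch under stated assumptions, not how any particular DBMS lays out data: BLOCK_SIZE, the record payloads, and lookup are hypothetical, and Python lists stand in for disk blocks.

```python
import bisect

BLOCK_SIZE = 4  # records per "disk block"; hypothetical and tiny for illustration

# Twenty records sorted by key, split into fixed-size blocks.
records = [(key, f"record-{key}") for key in range(100, 200, 5)]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# The index holds only the first key of each block: sorted and much smaller
# than the table, so it is cheap to keep in memory.
block_index = [block[0][0] for block in blocks]


def lookup(key):
    """Binary-search the index for the block, then scan only that block."""
    block_no = bisect.bisect_right(block_index, key) - 1
    if block_no < 0:
        return None  # key is smaller than every key in the table
    for record_key, payload in blocks[block_no]:
        if record_key == key:
            return payload
    return None


print(len(records), len(block_index))
print(lookup(150), lookup(151), lookup(50))
```

Only one block is read per lookup, which is the "fast access to a few records" case.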

Types of Indexes
  Unique – primary key indexes
    Sometimes algorithms can take advantage
  Sorting index
    The order of records in the data-blocks of the table
    Usually primary key order
    Access records on the same disk block cheaply
  Numeric index has obvious sort order
  Strings sort lexicographically

Indexes and Data-Type
  Indexes on integer columns are fastest
    Widest variety of data-structures and algorithms
  Indexes are most useful when columns have lots of distinct values:
    "information content" of a column – how many rows have each value
  Strings are indexed from the beginning of the string:
    like 'w%' can use the index, but like '%w' cannot
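Why a prefix pattern can use a sorted index while a suffix pattern cannot is easy to see with a sorted Python list standing in for the index. The names below are sample data, and the "\uffff" upper bound is a simplification that assumes no indexed string contains that character.

```python
import bisect

# A sorted list of strings plays the role of the index on a string column.
names = sorted([
    "wolffia", "wasabia", "zea mays", "homo sapiens",
    "mus musculus", "welwitschia", "bos taurus", "gallus gallus",
])


def prefix_matches(sorted_names, prefix):
    """like 'prefix%': matches are contiguous, so two binary searches bound them."""
    lo = bisect.bisect_left(sorted_names, prefix)
    hi = bisect.bisect_left(sorted_names, prefix + "\uffff")
    return sorted_names[lo:hi]


def suffix_matches(sorted_names, suffix):
    """like '%suffix': matches are scattered, so every entry must be checked."""
    return [name for name in sorted_names if name.endswith(suffix)]


print(prefix_matches(names, "w"))
print(suffix_matches(names, "s"))
```

Lexicographic order groups strings by their beginning, so it says nothing about how they end.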

When to use an index
  Why not add an index on every column?
    Slow down inserts, updates, deletes
    Uses memory, disk-space
    Index won't necessarily speed up specific queries
  Create indexes for:
    Foreign keys (+ primary keys, of course)
    Columns with good information content used often in where constraints
    Columns used for sorting, grouping, distinct
    Columns frequently shown in results

Indexes: Cost of insertion
  Each index increases the amount of work to insert a row into the table
    Trade off: time for insert vs time for retrieve
    Depending on complexity of data-structure – may need very expensive "rebalancing" every so often
  For bulk inserts:
    Turn off indexes, bulk load, turn on indexes
    Create unsorted table (append), sort once
    Turn off constraints too (implemented using indexes)
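The bulk-load pattern can be sketched with sqlite3 as a stand-in for MySQL: append rows to an unindexed table first, then build each index once. The table layout, index names, and generated rows are hypothetical, and no timings are claimed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1: create the table with no indexes or constraints (plain append).
conn.execute(
    "CREATE TABLE taxonomy (tax_id INTEGER, parent_id INTEGER, scientific_name TEXT)"
)

# Step 2: bulk load inside a single transaction.
rows = [(i, i // 10, f"species {i:06d}") for i in range(1, 50_001)]
with conn:
    conn.executemany("INSERT INTO taxonomy VALUES (?, ?, ?)", rows)

# Step 3: build each index once, after the data is in place,
# instead of updating it for every inserted row.
conn.execute("CREATE INDEX idx_taxonomy_parent ON taxonomy (parent_id)")
conn.execute("CREATE INDEX idx_taxonomy_name ON taxonomy (scientific_name)")

row_count = conn.execute("SELECT COUNT(*) FROM taxonomy").fetchone()[0]
index_names = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(row_count, index_names)
```

Wrapping each phase in a timer is a simple way to compare this against loading into a table that already has its indexes.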

Indexes: In-Memory Column Cache
  Indexes are (usually) smaller than the corresponding tables
  Indexes are held in memory (frequently accessed)
  Each index has the values of some columns
  If the query can be satisfied entirely from an index: FAST!
  Existence check using primary key: FAST!
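SQLite reports when a query is answered entirely from an index, which makes the idea easy to try out. The sketch below uses sqlite3 and its EXPLAIN QUERY PLAN statement rather than MySQL; the table, the notes column, and the plan helper are assumptions for this example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxonomy ("
    " tax_id INTEGER PRIMARY KEY, parent_id INTEGER,"
    " scientific_name TEXT, notes TEXT)"
)
conn.executemany(
    "INSERT INTO taxonomy VALUES (?, ?, ?, ?)",
    [(i, i // 10, f"species {i}", "x" * 50) for i in range(1, 1001)],
)
conn.execute("CREATE INDEX idx_parent ON taxonomy (parent_id)")


def plan(sql):
    """Return SQLite's query-plan description for a statement as one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Only indexed values are requested: the table rows are never read.
index_only = plan("SELECT parent_id FROM taxonomy WHERE parent_id = 96")
# A non-indexed column is requested: the index finds the rows, then the table is read.
needs_table = plan("SELECT notes FROM taxonomy WHERE parent_id = 96")

print(index_only)
print(needs_table)
```

The first plan mentions a covering index; the second uses the same index but must still visit the table.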

Query optimization
  DBMS is responsible for deciding to use an index to satisfy each query
    Scanning entire table(s) is always correct, but slow
    An index may not be used, or may not help
    DBMS keeps statistics on each column!
  Some conditions "fake out" DBMS heuristics
    like '%w', !=, arithmetic
  order by, group by require sorting the entire result
    max vs order by limit 1
  Some queries return too many rows

Query optimization
  Load entire taxa database into bigtaxa;
    cd /opt/lampp
    ./bin/mysql -u root < ~/bigtaxa.sql
  15 minutes later…

Query optimization
  Load entire taxa database into bigtaxa;
    select * from taxonomy
      … seconds
    select * from taxonomy order by scientific_name
      … seconds

Query optimization
  select * from taxonomy where scientific_name like 'w%'
    0.0395 seconds
  select * from taxonomy where scientific_name like '%w'
    0.6636 seconds

Query optimization
  Add index on scientific_name column
    0.0037 seconds
  select * from taxonomy where scientific_name like 'w%'
  select * from taxonomy where scientific_name like '%w'

Explain
  The SQL keyword explain in front of a query will report on the query execution strategy
    explain select * from taxonomy where scientific_name like 'w%'
    explain select * from taxonomy where scientific_name like '%w'
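To experiment with execution plans without a MySQL server, SQLite's EXPLAIN QUERY PLAN gives a comparable report. This is a sketch with sqlite3 as a stand-in: the output format differs from MySQL's explain, the sample rows are generated, and the explain helper and idx_name index are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxonomy ("
    " tax_id INTEGER PRIMARY KEY, parent_id INTEGER, scientific_name TEXT)"
)
conn.executemany(
    "INSERT INTO taxonomy VALUES (?, ?, ?)",
    [(i, i // 10, f"species {i}") for i in range(1, 1001)],
)


def explain(sql):
    """Return the plan steps SQLite reports for a query, one string per step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


query = "SELECT * FROM taxonomy WHERE scientific_name = 'species 42'"

before = explain(query)  # no index on scientific_name yet: full table scan
conn.execute("CREATE INDEX idx_name ON taxonomy (scientific_name)")
after = explain(query)   # same query, now an index search

print(before)
print(after)
```

Running the same query before and after adding an index shows the planner switching from a scan to a search.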

Query optimization
  Avoid order by, group by, distinct unless you need it
    select scientific_name from taxonomy
      … seconds
    select distinct scientific_name from taxonomy
      … seconds
    select scientific_name from taxonomy order by scientific_name
      … seconds

Query optimization
  select * from taxonomy where parent_id = 9606;
    2.9806 seconds
  select * from taxonomy where parent_id + 1 = 9607
    2.9581 seconds

Query optimization
  Add index to parent_id…
    0.0025 seconds
  select * from taxonomy where parent_id = 9606;
  select * from taxonomy where parent_id + 1 = 9607
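The effect of arithmetic on an indexed column can be reproduced on a small scale with sqlite3 (a stand-in for MySQL, so plan text and timings will differ from the slide). The generated rows, the idx_parent index, and the plan helper are assumptions for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxonomy ("
    " tax_id INTEGER PRIMARY KEY, parent_id INTEGER, scientific_name TEXT)"
)
conn.executemany(
    "INSERT INTO taxonomy VALUES (?, ?, ?)",
    [(i, i // 2, f"species {i}") for i in range(1, 20_001)],
)
conn.execute("CREATE INDEX idx_parent ON taxonomy (parent_id)")


def plan(sql):
    """Return SQLite's query-plan description for a statement as one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


direct = "SELECT * FROM taxonomy WHERE parent_id = 9606"
arithmetic = "SELECT * FROM taxonomy WHERE parent_id + 1 = 9607"
rewritten = "SELECT * FROM taxonomy WHERE parent_id = 9607 - 1"

direct_plan = plan(direct)          # bare column: index search
arithmetic_plan = plan(arithmetic)  # expression on the column: full scan
rewritten_plan = plan(rewritten)    # arithmetic moved to the constant side: index search

direct_rows = conn.execute(direct).fetchall()
arithmetic_rows = conn.execute(arithmetic).fetchall()

print(direct_plan)
print(arithmetic_plan)
print(rewritten_plan)
```

All three queries return the same rows; only the form with the bare column on one side lets the planner use the index.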

Query optimization
  select * from taxonomy where parent_id = 9606;

Query optimization
  select customer.first_name, customer.last_name,
         address.address, city.city, country.country
  from customer
    join address on customer.address_id = address.address_id
    join city on address.city_id = city.city_id
    join country on city.country_id = country.country_id
  order by country.country, city.city, customer.last_name

Query optimization
  select film.film_id, film.title,
         group_concat(concat(actor.first_name, " ", actor.last_name)
                      order by actor.last_name separator "; ")
  from film
    join film_actor on film.film_id = film_actor.film_id
    join actor on film_actor.actor_id = actor.actor_id
  group by film.film_id, film.title

Exercise
  Load the bigtaxa database, experiment!
    Reproduce the examples in the lecture.
  Look at the indexes in the sakila database
    Try the queries from lectures 6 and 7
    Use explain to figure out which indexes are used.
    Delete (or add) some indexes
    Try the queries again
    Which queries get slower (faster)?
