Proactively Optimizing Queries with EXPLAIN and mk-query-digest Presented by: Sheeri K. Cabral Database Operations Manager MySQL Sunday at Oracle OpenWorld
EXPLAIN SQL extension SELECT only Can modify other statements: UPDATE tbl SET fld1="foo" WHERE fld2="bar"; can be changed to: EXPLAIN SELECT fld1,fld2 FROM tbl WHERE fld2="bar";
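The rewrite above can be automated for simple cases. The following is a minimal sketch, not a SQL parser: the helper name update_to_explain_select is hypothetical, and it assumes a single-table UPDATE whose assigned values contain no commas. It keeps the WHERE clause, because that clause is what determines the plan.

```python
import re

# Hypothetical helper: handles only simple single-table statements of the
# form "UPDATE tbl SET col=val[, col=val ...] [WHERE ...]".
_UPDATE_RE = re.compile(
    r"^\s*UPDATE\s+(?P<table>\S+)\s+SET\s+(?P<assignments>.+?)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def update_to_explain_select(sql):
    """Turn a simple UPDATE into an EXPLAIN SELECT with the same WHERE."""
    match = _UPDATE_RE.match(sql)
    if match is None:
        raise ValueError("not a simple UPDATE statement")
    # The column name is whatever precedes '=' in each assignment.
    # Naive split on ',': breaks if an assigned value contains a comma.
    columns = [part.split("=", 1)[0].strip()
               for part in match.group("assignments").split(",")]
    query = "EXPLAIN SELECT {} FROM {}".format(
        ", ".join(columns), match.group("table"))
    if match.group("where"):
        query += " WHERE " + match.group("where").strip()
    return query


print(update_to_explain_select('UPDATE tbl SET fld1="foo" WHERE fld2="bar";'))
# prints: EXPLAIN SELECT fld1 FROM tbl WHERE fld2="bar"
```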
What EXPLAIN Shows How many tables How tables are joined How data is looked up If there are subqueries, unions, sorts
What EXPLAIN Shows If WHERE, DISTINCT are used Possible and actual indexes used Length of index used Approx # of records examined
Metadata The MySQL optimizer uses metadata about cardinality, # rows, etc. InnoDB has approximate statistics InnoDB has one method of doing dives into the data MyISAM has better and more accurate metadata
EXPLAIN Output EXPLAIN returns 10 fields: mysql> EXPLAIN SELECT return_date -> FROM rental WHERE rental_id = 13534\G ******************* 1. row ******************* id: 1 select_type: SIMPLE table: rental type: const possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 1 Extra: 1 row in set (0.00 sec)
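When many plans need reviewing, it can help to load the vertical (\G) output into a structure that a script can inspect. The sketch below assumes the "field: value" layout shown above; parse_vertical is a hypothetical helper, not part of MySQL or Maatkit.

```python
def parse_vertical(output):
    """Parse mysql client \\G output into a list of dicts, one per row."""
    rows = []
    current = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("*") and "row" in line:
            # A "***** 1. row *****" separator starts a new result row.
            current = {}
            rows.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return rows


sample = """
******************* 1. row *******************
id: 1
select_type: SIMPLE
table: rental
type: const
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: const
rows: 1
Extra:
1 row in set (0.00 sec)
"""
plan = parse_vertical(sample)
print(plan[0]["type"], plan[0]["key"], plan[0]["rows"])
```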
Id mysql> EXPLAIN SELECT return_date -> FROM rental WHERE rental_id = 13534\G ******************* 1. row ******************* id: 1 Id = sequential identifier One per table, subquery, derived table No row returned for a view Because it is virtual Underlying tables are represented
select_type mysql> EXPLAIN SELECT return_date -> FROM rental WHERE rental_id = 13534\G ******************* 1. row ******************* id: 1 select_type: SIMPLE SIMPLE – one table, or JOINs PRIMARY First SELECT in a UNION Outer query of a subquery UNION, UNION RESULT
select_type in UNION queries
mysql> EXPLAIN SELECT first_name FROM staff UNION SELECT first_name FROM customer\G
********** 1. row **********
id: 1
select_type: PRIMARY
table: staff
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
Extra:
********** 2. row **********
id: 2
select_type: UNION
table: customer
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 541
Extra:
********** 3. row **********
id: NULL
select_type: UNION RESULT
table: <union1,2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
Extra:
3 rows in set (0.00 sec)
Other select_type output Used in subqueries DEPENDENT UNION DEPENDENT SUBQUERY DERIVED UNCACHEABLE SUBQUERY
table mysql> EXPLAIN SELECT return_date -> FROM rental WHERE rental_id = 13534\G ******************* 1. row ******************* id: 1 select_type: SIMPLE table: rental Table name or alias used Aliases like t1 are difficult to follow in long EXPLAIN plans NULL table if no table is referenced or query is impossible
NULL table EXPLAIN SELECT 1+2\G EXPLAIN SELECT return_date FROM rental WHERE rental_id=0\G
type mysql> EXPLAIN SELECT return_date -> FROM rental WHERE rental_id = 13534\G ******************* 1. row ******************* id: 1 select_type: SIMPLE table: rental type: const “Data access method” Get this as good as possible
type ALL = full table scan Everything else uses an index index = full index scan If you have to scan the entire data set, use a covering index to use a full index scan instead of a full table scan. range = partial index scan, used with <, <=, >, >=, IS NULL, BETWEEN, IN
Range query EXPLAIN SELECT rental_id FROM rental WHERE rental_date BETWEEN ' :00:00' and ' :59:59'\G Not a range query; why? EXPLAIN SELECT rental_id FROM rental WHERE DATE(rental_date)=' '\G
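The second query is not a range query because wrapping rental_date in DATE() hides the indexed column from the optimizer, so it cannot use the index on that column for a range lookup. Comparing the bare column against explicit bounds keeps the index usable. A minimal sketch of building those bounds follows; the helper names and the date 2005-05-25 are illustrative assumptions, since the dates on the slide did not survive.

```python
from datetime import date, datetime, time

FMT = "%Y-%m-%d %H:%M:%S"


def day_bounds(day):
    """Return (first second, last second) of the given day as strings."""
    start = datetime.combine(day, time(0, 0, 0))
    end = datetime.combine(day, time(23, 59, 59))
    return start.strftime(FMT), end.strftime(FMT)


def range_query_for_day(day):
    """Build the index-friendly form: bare column BETWEEN two constants."""
    start, end = day_bounds(day)
    return ("SELECT rental_id FROM rental "
            "WHERE rental_date BETWEEN '{}' AND '{}'".format(start, end))


# Hypothetical date, chosen only for illustration.
print(range_query_for_day(date(2005, 5, 25)))
```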
type index_subquery Subquery using a non-unique index of one table unique_subquery Subquery using a PRIMARY or UNIQUE KEY of one table
type index_merge Use more than one index Extra field shows more information sort_union intersection union
Index Merge mysql> EXPLAIN SELECT customer_id FROM customer WHERE last_name LIKE "Hill%" OR customer_id<10\G ******************* 1. row ******************* id: 1 select_type: SIMPLE table: customer type: index_merge possible_keys: PRIMARY,idx_last_name key: idx_last_name,PRIMARY key_len: 137,2 ref: NULL rows: 10 Extra: Using sort_union(idx_last_name, PRIMARY); Using where 1 row in set (0.03 sec)
ref_or_null Joining/looking up non-unique index values JOIN uses a non-unique index or key prefix Indexed fields compared with = or <=> Extra pass for NULL values if one may show up in the result
fulltext Data Access Strategy EXPLAIN SELECT film_id, title FROM film_text WHERE MATCH (title,description) AGAINST ('storm')\G ******************** 1. row ******************** id: 1 select_type: SIMPLE table: film_text type: fulltext possible_keys: idx_title_description key: idx_title_description key_len: 0 ref: rows: 1 Extra: Using where 1 row in set (0.00 sec)
Non-unique index values EXPLAIN SELECT rental_id FROM rental WHERE customer_id=75\G ******************** 1. row ******************** id: 1 select_type: SIMPLE table: rental type: ref possible_keys: idx_fk_customer_id key: idx_fk_customer_id key_len: 2 ref: const rows: 40 Extra: Using index 1 row in set (0.00 sec) Like ref_or_null without the extra pass
ref Joining/looking up non-unique index values JOIN uses a non-unique index or key prefix Indexed fields compared with = or <=> Best data access strategy for non-unique values
eq_ref Joining/looking up unique index values JOIN uses a unique index or key prefix Indexed fields compared with =
eq_ref Data Access Strategy mysql> EXPLAIN SELECT return_date -> FROM rental WHERE rental_id = 13534\G ******************* 1. row ******************* id: 1 select_type: SIMPLE table: rental type: const possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 1 Extra: 1 row in set (0.00 sec)
eq_ref Data Access Strategy
mysql> EXPLAIN SELECT first_name,last_name FROM rental
    -> INNER JOIN customer USING (customer_id)
    -> WHERE rental_date BETWEEN ' :00:00'
    -> AND ' :59:59'\G
********** 1. row **********
id: 1
select_type: SIMPLE
table: rental
type: range
possible_keys: rental_date,idx_fk_customer_id
key: rental_date
key_len: 8
ref: NULL
rows: 2614
Extra: Using where; Using index
********** 2. row **********
id: 1
select_type: SIMPLE
table: customer
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 2
ref: sakila.rental.customer_id
rows: 1
Extra:
2 rows in set (0.03 sec)
Fastest Data Access strategies Const – at most one value, uses PRIMARY or UNIQUE KEY mysql> EXPLAIN SELECT return_date FROM rental AS r WHERE rental_id = 13534\G System – system table, has 1 value EXPLAIN SELECT Time_zone_id, Use_leap_seconds FROM mysql.time_zone\G
Constant Propagation ********** 1. row ********** id: 1 select_type: SIMPLE table: rental type: const possible_keys: PRIMARY,idx_fk_customer_id key: PRIMARY key_len: 4 ref: const rows: 1 Extra: EXPLAIN SELECT return_date, first_name, last_name FROM rental INNER JOIN customer USING (customer_id) WHERE rental_id = 13534\G The rental table uses rental_id as a filter, and rental_id is a unique field. Thus, const makes sense What about the customer table?
Constant Propagation ********** 1. row ********** id: 1 select_type: SIMPLE table: rental type: const possible_keys: PRIMARY,idx_fk_customer_id key: PRIMARY key_len: 4 ref: const rows: 1 Extra: EXPLAIN SELECT return_date, first_name, last_name FROM rental INNER JOIN customer USING (customer_id) WHERE rental_id = 13534\G ********** 2. row ********** id: 1 select_type: SIMPLE table: customer type: const possible_keys: PRIMARY key: PRIMARY key_len: 2 ref: const rows: 1 Extra: 2 rows in set (0.00 sec) Const is propagated because there is at most one customer_id, UNIQUE and NOT NULL
No data access strategy No data access strategy when table is NULL Fastest data access strategy Because there is no strategy! No data access strategy when WHERE is impossible Optimizer only accesses metadata
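To make "get this as good as possible" checkable, a script can rank the type values of a plan. The ordering below follows the best-to-worst list in the MySQL reference manual; worst_access_type is a hypothetical helper, and it simply ignores rows whose type is NULL or unrecognized.

```python
# Access types ordered from best to worst, per the MySQL reference manual.
ACCESS_TYPES_BEST_TO_WORST = [
    "system", "const", "eq_ref", "ref", "fulltext", "ref_or_null",
    "index_merge", "unique_subquery", "index_subquery", "range",
    "index", "ALL",
]


def worst_access_type(types):
    """Return the worst-ranked type among the rows of a plan, or None."""
    known = [t for t in types if t in ACCESS_TYPES_BEST_TO_WORST]
    if not known:
        return None
    return max(known, key=ACCESS_TYPES_BEST_TO_WORST.index)


# The two-table join from the eq_ref example: range on rental, eq_ref on customer.
print(worst_access_type(["range", "eq_ref"]))
```

A plan whose worst type is ALL or index is usually the first candidate for a closer look.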
EXPLAIN Plan indexes possible_keys key key_len – longer keys take longer to look up and compare ref – shows what is compared, field or “const” Look closely if an index you think is possible is not considered
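The key_len values in these plans can be checked by hand. Integer columns contribute their storage size (2 for SMALLINT, 3 for MEDIUMINT, 4 for INT), and a VARCHAR contributes its maximum byte length plus 2 length bytes, plus 1 more byte if the column is nullable. The sketch below assumes that idx_last_name covers a VARCHAR(45) NOT NULL column in a 3-byte utf8 character set, which is how I understand the Sakila schema; treat that as an assumption.

```python
# Storage sizes, in bytes, of the integer types seen in these plans.
FIXED_WIDTH_BYTES = {"SMALLINT": 2, "MEDIUMINT": 3, "INT": 4}


def varchar_key_len(max_chars, bytes_per_char=3, nullable=False):
    """key_len of a full VARCHAR index part.

    max_chars * bytes_per_char data bytes, 2 length bytes,
    and 1 extra byte when the column can be NULL.
    """
    return max_chars * bytes_per_char + 2 + (1 if nullable else 0)


# Assumed column definition: last_name VARCHAR(45) NOT NULL, utf8.
print(varchar_key_len(45))          # matches key_len 137 in the index_merge plan
print(FIXED_WIDTH_BYTES["INT"])     # matches key_len 4 for the rental PRIMARY KEY
```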
eq_ref Data Access Strategy
mysql> EXPLAIN SELECT first_name,last_name FROM rental
    -> INNER JOIN customer USING (customer_id)
    -> WHERE rental_date BETWEEN ' :00:00'
    -> AND ' :59:59'\G
********** 1. row **********
id: 1
select_type: SIMPLE
table: rental
type: range
possible_keys: rental_date,idx_fk_customer_id
key: rental_date
key_len: 8
ref: NULL
rows: 2614
Extra: Using where; Using index
********** 2. row **********
id: 1
select_type: SIMPLE
table: customer
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 2
ref: sakila.rental.customer_id
rows: 1
Extra:
2 rows in set (0.03 sec)
ref is “const” EXPLAIN SELECT return_date FROM rental WHERE rental_id = 13534\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: rental type: const possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 1 Extra: 1 row in set (0.09 sec)
Approx # rows examined mysql> EXPLAIN SELECT return_date -> FROM rental WHERE rental_id = 13534\G ******************* 1. row ******************* id: 1 select_type: SIMPLE table: rental type: const possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 1 Extra: 1 row in set (0.00 sec)
Approx # rows examined mysql> EXPLAIN SELECT first_name,last_name FROM customer LIMIT 10\G *************** 1. row ***************** id: 1 select_type: SIMPLE table: customer type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 541 Extra: 1 row in set (0.00 sec) LIMIT does not change the rows value, even though it affects the number of rows actually examined.
Extra Can be good, bad, neutral Sometimes you cannot avoid the bad Distinct – stops after first row match Full scan on NULL key – subquery lookup with no index (bad) Impossible WHERE noticed after reading const tables No tables used
Extra Not exists – stops after first row match for each row set from previous tables Select tables optimized away – Aggregate functions resolved by index or metadata (good) Range checked for each record (index map: N) No good index; may be one after values from previous tables are known
Extra: Using (...)
Using filesort – does an extra pass to sort the data. Worse than using an index for sort order.
Using index – uses index only, no table read. Covering index, good.
Using index for group-by – GROUP BY or DISTINCT resolved by index or metadata (good)
Using temporary – intermediate temporary table used (bad)
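A review script can flag the Extra values that the slides mark as bad. This is a minimal sketch; the marker list is taken from the slides (filesort, temporary table, full scan on NULL key) and is not exhaustive, and extra_warnings is a hypothetical helper name.

```python
# Extra values that the slides label as bad; deliberately not exhaustive.
BAD_EXTRA_MARKERS = (
    "Using filesort",
    "Using temporary",
    "Full scan on NULL key",
)


def extra_warnings(extra):
    """Return the bad markers found in one row's Extra string."""
    return [marker for marker in BAD_EXTRA_MARKERS if marker in extra]


print(extra_warnings("Using where; Full scan on NULL key"))
print(extra_warnings("Using where; Using index"))
```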
Extra & INFORMATION_SCHEMA Scanned N databases N is 0, 1 or all Skip_open_table Fastest, no table files need to be opened Open_frm_only Open the .frm file only Open_trigger_only Open_full_table Open all the table files; slowest, can crash large systems
Sample subquery EXPLAIN
mysql> EXPLAIN SELECT first_name,last_name,
    -> IN (SELECT customer_id FROM rental AS rental_subquery WHERE return_date IS NULL)
    -> FROM customer AS customer_outer\G
********* 1. row ********
id: 1
select_type: PRIMARY
table: customer_outer
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 541
Extra:
********** 2. row **********
id: 2
select_type: DEPENDENT SUBQUERY
table: rental_subquery
type: index_subquery
possible_keys: idx_fk_customer_id
key: idx_fk_customer_id
key_len: 2
ref: func
rows: 13
Extra: Using where; Full scan on NULL key
2 rows in set (0.00 sec)
MySQL and Subqueries Avoid unoptimized subqueries Not all subqueries, though that was true in earlier versions Derived tables may be turned into views or intermediate temporary tables Subqueries can be turned into joins in some cases Getting better all the time Used to be that all subqueries were dependent subqueries.
More EXPLAIN Values EXPLAIN PARTITIONS Adds a “partitions” value that shows a list of partitions to be checked for a partitioned table NULL for a non-partitioned table EXPLAIN EXTENDED Adds filtered field, an approx % of how many of the examined rows will be returned Show the query after the optimizer is finished with SHOW WARNINGS
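Because filtered is a percentage of the examined rows, multiplying it by rows gives a rough estimate of how many rows survive the WHERE conditions. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the 75.0 used in the example is an invented value for illustration, not one taken from an actual plan.

```python
def estimated_rows_returned(rows, filtered_pct):
    """Estimate rows left after filtering: rows * filtered / 100."""
    return rows * filtered_pct / 100.0


# 326 examined rows with a hypothetical filtered value of 75.0 percent.
print(estimated_rows_returned(326, 75.0))   # 244.5
```

Both inputs are optimizer estimates, so the result is only a rough guide.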
EXPLAIN EXTENDED mysql> EXPLAIN EXTENDED SELECT customer_id -> FROM rental -> WHERE staff_id=2 AND inventory_id<100\G ***************** 1. row ***************** id: 1 select_type: SIMPLE table: rental type: range possible_keys: idx_fk_inventory_id,idx_fk_staff_id key: idx_fk_inventory_id key_len: 3 ref: NULL rows: 326 filtered: Extra: Using where 1 row in set, 1 warning (0.00 sec)
EXPLAIN EXTENDED mysql> SHOW WARNINGS\G ***************** 1. row ***************** Level: Note Code: 1003 Message: select `sakila`.`rental`.`customer_id` AS `customer_id` from `sakila`.`rental` where ((`sakila`.`rental`.`staff_id` = 2) and (`sakila`.`rental`.`inventory_id` < 100)) 1 row in set (0.00 sec)
More EXPLAIN Information Pages 590 – 614 of the MySQL Administrator's Bible Sakila sample database:
Query Review What is it? – Systematic review of all queries Why do it? – Find queries before they become a problem – Often a sample query is non-trivial to find
Query Review Who should do it? – Optimization knowledge When and where should it be done? – dev → test,load test,staging → production
Main tool mk-query-digest – “query fingerprint” Can be used on: – Slow query logs – Binary logs – General query logs
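A query fingerprint is the query with its literal values abstracted away, so that executions differing only in their constants are grouped together. The sketch below is a simplified illustration of that idea and is not Maatkit's actual algorithm, which handles more cases and may produce different text; the function name fingerprint is an assumption.

```python
import re


def fingerprint(query):
    """Reduce a query to a normalized form with literals replaced by '?'."""
    q = query.strip().lower()
    # Replace quoted strings (single or double quotes, with backslash escapes).
    q = re.sub(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"", "?", q)
    # Replace standalone numbers; digits inside identifiers such as t1 are kept.
    q = re.sub(r"\b\d+(?:\.\d+)?\b", "?", q)
    # Collapse runs of whitespace.
    q = re.sub(r"\s+", " ", q)
    # Collapse IN lists of any length to a single placeholder.
    q = re.sub(r"\bin\s*\(\s*\?(?:\s*,\s*\?)*\s*\)", "in (?)", q)
    return q


print(fingerprint("UPDATE colors SET publishable_flag = true WHERE id = 42"))
print(fingerprint("update colors  set publishable_flag = true where id = 7"))
```

Both calls print the same fingerprint, which is why the two statements would be counted as one query.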
More mk-query-digest sources Direct database querying – Uses SHOW FULL PROCESSLIST pglog (Postgres) Parsing tcpdump for traffic: – MySQL – memcached – HTTP
Getting mk-query-digest
wget maatkit.org/get/mk-query-digest
– Easiest
– Not always up-to-date!
Download the full Maatkit package
– More work
– You get all the maatkit tools, not just one
– Most up to date
In Apr 2010, wget got rev 6067, while the download was rev 6070!
What is reported on Default setup uses --limit 95%:20 – To see all queries, --limit 100% No --filter by default --filter Any attribute at User, host, database, process id, lock_time, Memc_miss, Rows_sent, Rows_examined, Rows_affected, Rows_read, Query_time, insert_id
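As I understand the default, --limit 95%:20 reports the worst queries until they account for 95% of the total (by the ordering attribute, response time by default) or until 20 queries have been listed, whichever comes first. The sketch below illustrates that selection rule only; it is not Maatkit's code, and select_for_report and its arguments are hypothetical names.

```python
def select_for_report(times, pct=95.0, max_count=20):
    """Pick query IDs, worst first, until pct of total time or max_count.

    times maps a query ID to its total response time.
    """
    ranked = sorted(times.items(), key=lambda item: item[1], reverse=True)
    total = sum(times.values())
    chosen = []
    running = 0.0
    for query_id, response_time in ranked:
        reached_count = len(chosen) >= max_count
        reached_pct = total > 0 and running >= total * pct / 100.0
        if reached_count or reached_pct:
            break
        chosen.append(query_id)
        running += response_time
    return chosen


# Hypothetical totals: "a" dominates, so two queries already cover 95%.
sample_times = {"a": 90, "b": 6, "c": 3, "d": 1}
print(select_for_report(sample_times))             # ['a', 'b']
print(select_for_report(sample_times, pct=100.0))  # ['a', 'b', 'c', 'd']
```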
Other filters If using Percona's patches, you can filter on queries that cause: – Filesorts, disk filesorts – Temp tables, Temp disk tables – Full table scan, full join – Query cache hit – and more...
Output
Overall summary
Detailed report of matching queries
Query Analysis Summary
Commands run for examples:
perl mk-query-digest --limit 100% \
  --review h= ,P=3307,D=maatkit,t=query_review,u=user,p=pass \
  --create-review-table --type genlog genlog127.sql > genlogoutput.txt
perl mk-query-digest --limit 100% \
  --review h= ,P=3307,D=maatkit,t=query_review,u=user,p=pass \
  --type binlog binlog325.sql > binlogoutput.txt
Overall summary (genlog) # 229.7s user time, 860ms system time, 94.79M rss, M vsz # Overall: k total, 720 unique, QPS, 0x concurrency_________ # total min max avg 95% stddev median # Exec time # Time range :45:01 to :30:01 # bytes M k
Overall summary (binlog) # 390.2s user time, 1.8s system time, 62.70M rss, M vsz # Overall: 1.07M total, 252 unique, QPS, 5.69Gx concurrency_______ # total min max avg 95% stddev median # Exec time s s s 992ms s 0 # Time range :14:17 to :26:51 # # # 3.44k # 1.34k # # k k k k k k # # bytes M M k # error cod
Query analysis part 1 (genlog) # Query 9: 1.69 QPS, 0x concurrency, ID 0x188B27831A9DE05B at byte # This item is included in the report because it matches --limit. # pct total min max avg 95% stddev median # Count # Exec time # Databases 1 proddb # Time range :45:02 to :30:01 # bytes k
Query analysis part 1 (binlog) # Query 5: 3.90 QPS, Mx concurrency, ID 0x188B27831A9DE05B at byte # This item is included in the report because it matches --limit. # pct total min max avg 95% stddev median # Count # Exec time s s s 992ms s 0 # Databases 1 proddb # Time range :14:52 to :26:51 # bytes k # error cod
Query analysis part 2 (genlog) # Query_time distribution # 1us # 10us # 100us # 1ms # 10ms # 100ms # 1s # 10s+ # Review information # first_seen: :45:02 # last_seen: :30:01 # reviewed_by: # reviewed_on: # comments:
Query analysis part 2 (binlog) # Query_time distribution # 1us # 10us # 100us # 1ms # 10ms # 100ms # 1s ############################################################# # 10s+ ################ # Review information # first_seen: :45:02 # last_seen: :26:51 # reviewed_by: # reviewed_on: # comments:
Query analysis part 3 (genlog) # Tables # SHOW TABLE STATUS FROM `proddb` LIKE 'colors'\G # SHOW CREATE TABLE `proddb`.`colors`\G update colors set publishable_flag = true where id = \G # Converted for EXPLAIN # EXPLAIN select publishable_flag = true from colors where id = \G
Query analysis part 3 (binlog) # Tables # SHOW TABLE STATUS FROM `proddb` LIKE 'colors'\G # SHOW CREATE TABLE `proddb`.`colors`\G update colors set publishable_flag = true where id = \G # Converted for EXPLAIN # EXPLAIN select publishable_flag = true from colors where id = \G
Query analysis part 1 (binlog) # Query 5: 3.90 QPS, Mx concurrency, ID 0x188B27831A9DE05B at byte # This item is included in the report because it matches --limit. # pct total min max avg 95% stddev median # Count # Exec time s s s 992ms s 0 # Databases 1 proddb # Time range :14:52 to :26:51 # bytes k # error cod update colors set publishable_flag = true where id = \G
Query Analysis Summary # Profile # Rank Query ID Response time Calls R/Call Item # ==== ================== ======================== ====== =============== # 1 0x85FFF5AA78E5FF6A % BEGIN # 2 0x8F345B7550CA % INSERT user_events_live # 3 0xCACEE7C0CF15B39B % UPDATE skus # 4 0x308A3C4E761F % UPDATE shopping_events # 5 0x188B27831A9DE05B % UPDATE colors # 6 0xD8F78067CE3F07AB % UPDATE offers # 7 0x3C70600B502E3A % UPDATE products
The query_review table
Remember, we did the command:
perl mk-query-digest --limit 100% \
  --review h= ,P=3307,D=maatkit,t=query_review,u=user,p=pass \
  --create-review-table --type binlog binlog325.sql > binlogoutput.txt
What does the query review table look like?
mysql> select * from query_review where checksum=0x188B27831A9DE05B\G
*************************** 1. row ***************************
checksum:
fingerprint: update colors set publishable_flag = true where id = ?
sample: update colors set publishable_flag = true where id =
first_seen: :45:02
last_seen: :26:51
reviewed_by: NULL
reviewed_on: NULL
comments: NULL
1 row in set (0.00 sec)
How do we review a query?
EXPLAIN, SHOW CREATE TABLE, etc.
Now what?
mysql> update query_review set reviewed_by='Sheeri', reviewed_on=now(), comments='This query is OK, it uses the primary key to search on.' where checksum= ;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
One query down.....
mysql> select count(*) from query_review where reviewed_on is null;
| count(*) |
| 769 |
1 row in set (0.00 sec)
769 to go!
Systematic approach You can look at a few queries per day Reviewed queries do not appear in subsequent reports of mk-query-digest If you have something in reviewed_by Unless you specify --report-all
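The reporting rule above can be stated as a small filter: a query is skipped once reviewed_by is filled in, unless every query is requested. This sketch only models the rule on in-memory rows; it does not call mk-query-digest, and the function name, row layout, and the second checksum are assumptions made for illustration.

```python
def queries_to_report(review_rows, report_all=False):
    """Return checksums that would still appear in the next report."""
    return [
        row["checksum"]
        for row in review_rows
        # A NULL or empty reviewed_by means the query is still unreviewed.
        if report_all or not row.get("reviewed_by")
    ]


# Hypothetical rows shaped like the query_review table.
review_rows = [
    {"checksum": "0x188B27831A9DE05B", "reviewed_by": "Sheeri"},
    {"checksum": "0xCACEE7C0CF15B39B", "reviewed_by": None},
]
print(queries_to_report(review_rows))                   # unreviewed only
print(queries_to_report(review_rows, report_all=True))  # everything
```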
Query review
--no-report to just parse a log to the database:
perl mk-query-digest --limit 100% --no-report --review \
  h= ,P=3307,D=maatkit,t=query_review,u=user,p=pass \
  --type binlog mybinlog.txt
Can save counts, etc. to a historical table:
perl mk-query-digest --limit 100% --no-report --review \
  h= ,P=3307,D=maatkit,t=query_review,u=user,p=pass \
  --create-review-history-table --review-history \
  h= ,P=3307,D=maatkit,t=qr_history,u=user,p=pass \
  --type genlog mygenlog.txt
Query review history mysql> select * from qr_history where checksum=0x188B27831A9DE05B\G *************************** 1. row *************************** checksum: sample: update colors set publishable_flag = true where id = ts_min: :14:52 ts_max: :26:51 ts_cnt: Query_time_sum: e+12 Query_time_min: 0 Query_time_max: e+09 Query_time_pct_95: Query_time_stddev: e+08 Query_time_median: 0 Lock_time_sum: NULL Lock_time_min: NULL Lock_time_max: NULL Lock_time_pct_95: NULL Lock_time_stddev: NULL Lock_time_median: NULL Rows_sent_sum: NULL Rows_sent_min: NULL Rows_sent_max: NULL Rows_sent_pct_95: NULL Rows_sent_stddev: NULL Rows_sent_median: NULL Rows_examined_sum: NULL Rows_examined_min: NULL Rows_examined_max: NULL Rows_examined_pct_95: NULL Rows_examined_stddev: NULL Rows_examined_median: NULL
Query review history mysql> select * from qr_history where checksum=0x188B27831A9DE05B\G *************************** 1. row *************************** checksum: sample: update colors set publishable_flag = true where id = ts_min: :14:52 ts_max: :26:51 ts_cnt: Query_time_sum: e+12 Query_time_min: 0 Query_time_max: e+09 Query_time_pct_95: Query_time_stddev: e+08 Query_time_median: 0 ************* 2. row ************* checksum: sample: update colors set publishable_flag = true where id = ts_min: :45:01 ts_max: :30:00 ts_cnt: 7109 Query_time_sum: 0 Query_time_min: 0 Query_time_max: 0 Query_time_pct_95: 0 Query_time_stddev: 0 Query_time_median: 0
What I'd like to see Besides query reviews being common practice... More fields in the query_review table – what index(es) are used – fields, index type – Tables involved and their approx row count – Approx rows examined from EXPLAIN More fields in the query_review_history table – Source (genlog, binlog, etc) – When the review was done.
Start Today! Grab a log Find a test machine with a database Start EXPLAINing all your queries mk-query-digest has tons of other great features besides query reviews.....
Questions, comments, feedback? Sheeri K. Cabral Oracle ACE Director MySQL DBA – send me questions, I may be able to on twitter