Presentation transcript:

Slide: 1 The PostgreSQL Query Planner
Robert Haas
Drexel University CS 500 Database Theory
Copyright 2010 Robert Haas, EnterpriseDB Corporation. Creative Commons 3.0 Attribution.

Slide: 2 Why Does My Query Need a Plan?
SQL is a declarative language. In other words, a SQL query is not a program.
No control flow statements (e.g. for, while) and no way to control order of operations.
SQL describes results, not process.

Slide: 3 The Best Plan May Not Be Obvious
    CREATE TABLE foo (a integer, txt varchar);
    CREATE INDEX foo_a ON foo (a);
    ...insert some data...
    SELECT * FROM foo WHERE a = 1;
What should the planner do?

Slide: 4 Data Distribution Affects Plan Choice
    SELECT * FROM foo WHERE a = 1
Plan #1 (10,000 rows, a = [values missing]):
    Index Scan using foo_a on foo
      Index Cond: (a = 1)
Plan #2 (10,000 rows, 90% have a = 1):
    Seq Scan on foo
      Filter: (a = 1)

Slide: 5 Data Distribution Affects Plan Choice
    SELECT * FROM foo WHERE a = 1
Plan #3 (10,000 rows, 10 distinct values of a, 1000 times each):
    Bitmap Heap Scan on foo
      Recheck Cond: (a = 1)
      -> Bitmap Index Scan on foo_a
           Index Cond: (a = 1)
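Why the same query gets different plans can be sketched with a toy cost comparison. This is only an illustration, not PostgreSQL's actual cost formula: the two page costs are the defaults listed under Query Planner Parameters later in the talk, while the function names, the "one random page fetch per matching row" rule, and the 100-page table size are assumptions made for the example.

```python
SEQ_PAGE_COST = 1.0     # default seq_page_cost
RANDOM_PAGE_COST = 4.0  # default random_page_cost


def seq_scan_cost(table_pages):
    """Read every page of the table once, sequentially."""
    return table_pages * SEQ_PAGE_COST


def index_scan_cost(matching_rows):
    """Pessimistic assumption: each matching row costs one random page fetch."""
    return matching_rows * RANDOM_PAGE_COST


def cheaper_plan(table_pages, matching_rows):
    """Pick whichever of the two toy estimates is lower."""
    if index_scan_cost(matching_rows) < seq_scan_cost(table_pages):
        return "Index Scan"
    return "Seq Scan"


# Hypothetical table: 10,000 rows stored in 100 pages.
print(cheaper_plan(100, 1))     # very few rows match
print(cheaper_plan(100, 9000))  # 90% of the rows match
```

With one matching row the index wins easily; with 9,000 matching rows the random fetches cost far more than reading the whole table once.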

Slide: 6 Join Planning
    CREATE TABLE foo (a integer, txt varchar);
    CREATE TABLE bar (a integer, txt varchar);
    CREATE INDEX foo_a ON foo (a);
    CREATE INDEX bar_a ON bar (a);
    ...insert some data...
    SELECT * FROM foo, bar WHERE foo.a = bar.a
What should the planner do? (at least 14 choices!)

Slide: 7 Goals of Query Planning
Make queries run fast.
  – Minimize disk I/O.
  – Prefer sequential I/O to random I/O.
  – Minimize CPU processing.
Don't use too much memory in the process.
Deliver correct results.

Slide: 8 Query Planner Decisions
Access strategy for each table.
  – Sequential Scan, Index Scan, Bitmap Index Scan.
Join strategy.
  – Join order.
  – Join strategy: nested loop, merge join, hash join.
  – Inner vs. outer.
Aggregation strategy.
  – Plain, sorted, hashed.

Slide: 9 Table Access Strategies
Sequential Scan (Seq Scan)
  – Read every row in the table.
Index Scan or Bitmap Index Scan
  – Read only part of the table by using the index to skip uninteresting parts.
  – Index scan reads index and table in alternation.
  – Bitmap index scan reads index first, populating bitmap, and then reads table in sequential order.

Slide: 10 Sequential Scan
Always works – no need to create indices in advance.
Doesn't require reading the index, which has both I/O and CPU cost.
Best way to access very small tables.
Usually the best way to access all or nearly all the rows in a table.

Slide: 11 Index Scan
Potentially huge performance gain when reading only a small fraction of rows in a large table.
Only table access method that can return rows in sorted order – very useful in combination with LIMIT.
Random I/O against base table!

Slide: 12 Bitmap Index Scan
Scans all index rows before examining base table, populating a TID bitmap.
Table I/O is sequential, with skips; results in physical order.
Can efficiently combine data from multiple indices – TID bitmap can handle boolean AND and OR operations.
Handles LIMIT poorly.
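The two phases of a bitmap scan can be sketched in a few lines. This is a simplified model rather than PostgreSQL's implementation: the bitmap is a plain Python set of page numbers (a real TID bitmap also tracks row positions), the table is a list of pages, and `build_index`, `bitmap_index_scan`, and `bitmap_heap_scan` are hypothetical names chosen for the example.

```python
def build_index(pages, column):
    """Map each value of `column` to the (page_no, slot) pairs that hold it."""
    index = {}
    for page_no, page in enumerate(pages):
        for slot, row in enumerate(page):
            index.setdefault(row[column], []).append((page_no, slot))
    return index


def bitmap_index_scan(index, value):
    """Phase 1: read the index only, collecting the pages that may match."""
    return {page_no for page_no, _slot in index.get(value, [])}


def bitmap_heap_scan(pages, bitmap, recheck):
    """Phase 2: visit the marked pages in physical order, rechecking each row."""
    for page_no in sorted(bitmap):
        for row in pages[page_no]:
            if recheck(row):
                yield row


pages = [
    [{"a": 1, "b": 2}, {"a": 3, "b": 1}],
    [{"a": 2, "b": 2}],
    [{"a": 1, "b": 1}, {"a": 1, "b": 2}],
]
index_a = build_index(pages, "a")
index_b = build_index(pages, "b")

# Combining two indices is a set intersection (AND) or union (OR).
both = bitmap_index_scan(index_a, 1) & bitmap_index_scan(index_b, 1)
matches = list(bitmap_heap_scan(pages, both, lambda r: r["a"] == 1 and r["b"] == 1))
```

Because the bitmap here only records pages, every row on a marked page has to be rechecked against the condition, which mirrors the Recheck Cond line in the plan shown earlier.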

Slide: 13 Join Planning
Fixing the join order and join strategy is the “hard part” of query planning.
The number of possibilities grows exponentially with the number of tables.
When search space is small, planner does a nearly exhaustive search.
When search space is too large, planner uses heuristics or GEQO (the genetic query optimizer) to limit planning time and memory usage.
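How quickly the search space grows can be checked with a short calculation. The sketch below counts join orders only, ignoring the choice of join strategy and access method, so the real number of candidate plans is larger still. The formulas are the standard counts for left-deep and bushy join trees over n tables; the function names are made up for the example.

```python
from math import factorial


def left_deep_orders(n):
    """Join orders where each join adds one base table: n!"""
    return factorial(n)


def bushy_orders(n):
    """All binary join trees with ordered inputs: (2(n-1))! / (n-1)!"""
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in (2, 3, 4, 8, 12):
    print(n, left_deep_orders(n), bushy_orders(n))
```

Even at a dozen tables the counts are far beyond what an exhaustive search could cost out one by one, which is why a fallback such as GEQO exists.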

Slide: 14 Join Strategies
Nested loop.
Merge join.
Hash join.
Each join strategy takes an “outer” relation and an “inner” relation and produces a result relation.

Slide: 15 Nested Loop Pseudocode
    for (each outer tuple)
        for (each inner tuple)
            if (join condition is met)
                emit result row;
Outer or inner loop could be scanning output of some other join, or a base table. Base table scan could be using an index.
Cost is roughly proportional to product of table sizes – bad if BOTH are large.
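The pseudocode translates almost line for line into a runnable sketch. Rows are modelled as Python dicts and the inner input as a list so that it can be scanned once per outer row; both choices, along with the sample data, are assumptions for illustration.

```python
def nested_loop_join(outer, inner, condition):
    """Yield (outer_row, inner_row) for every pair that satisfies the condition."""
    for outer_row in outer:
        for inner_row in inner:  # the whole inner input is rescanned each time
            if condition(outer_row, inner_row):
                yield (outer_row, inner_row)


foo = [{"x": 1}, {"x": 2}, {"x": 2}]
bar = [{"x": 2}, {"x": 3}]
result = list(nested_loop_join(foo, bar, lambda f, b: f["x"] == b["x"]))
```

The condition is evaluated len(foo) * len(bar) times, which is the product-of-sizes cost the slide warns about. Unlike the other two strategies, the condition can be any predicate, not just equality.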

Slide: 16 Nested Loop Example #1
    SELECT * FROM foo, bar WHERE foo.x = bar.x

    Nested Loop
      Join Filter: (foo.x = bar.x)
      -> Seq Scan on bar
      -> Materialize
           -> Seq Scan on foo
This might be very slow!

Slide: 17 Nested Loop Example #2
    SELECT * FROM foo, bar WHERE foo.x = bar.x

    Nested Loop
      -> Seq Scan on foo
      -> Index Scan using bar_pkey on bar
           Index Cond: (bar.x = foo.x)
Nested loop with inner index-scan! Much better... though probably still not the best plan.
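The gain from an inner index scan is that each outer row looks up only its matches instead of scanning the whole inner table. A minimal sketch follows, using a Python dict as a stand-in for the index. A real B-tree index behaves differently, for example by keeping keys ordered, and the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def build_index(rows, column):
    """Stand-in for an index: map each key value to the rows that carry it."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row)
    return index


def index_nested_loop_join(outer, inner_index, column):
    """For each outer row, probe the index rather than scanning the inner table."""
    for outer_row in outer:
        for inner_row in inner_index.get(outer_row[column], []):
            yield (outer_row, inner_row)


foo = [{"x": 1}, {"x": 2}, {"x": 2}]
bar = [{"x": 2}, {"x": 3}]
bar_index = build_index(bar, "x")
result = list(index_nested_loop_join(foo, bar_index, "x"))
```

The work is now one probe per outer row rather than one comparison per pair of rows, although each probe still corresponds to random I/O against the inner table in a real database.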

Slide: 18 Merge Join
Only handles equality joins – something like a.x = b.x.
Put both input relations into sorted order (using sort or index scan) and scan through the two in parallel, matching up equal values.
Normally visits each input tuple only once, but may need to “rescan” portions of the inner input if there are duplicate values in the outer input.
  – Take OUTER={ } and INNER={ }
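The rescan behaviour is easiest to see in code. The sketch below is one simple way to write a merge join, not PostgreSQL's implementation. The input values are hypothetical, since the slide's own example sets did not survive in the transcript.

```python
def merge_join(outer, inner, key):
    """Equality join over two inputs, sorted here and then scanned in step."""
    outer = sorted(outer, key=key)
    inner = sorted(inner, key=key)
    mark = 0  # position of the first inner row not yet known to be too small
    for outer_row in outer:
        # Skip inner rows whose key is smaller than the current outer key.
        while mark < len(inner) and key(inner[mark]) < key(outer_row):
            mark += 1
        # Emit the run of equal inner rows. `mark` stays put, so a duplicate
        # outer key scans the same run again: this is the "rescan".
        pos = mark
        while pos < len(inner) and key(inner[pos]) == key(outer_row):
            yield (outer_row, inner[pos])
            pos += 1


outer_values = [1, 2, 2, 3]
inner_values = [2, 2, 3, 4]
pairs = list(merge_join(outer_values, inner_values, key=lambda v: v))
```

With these inputs the second outer 2 walks over both inner 2s a second time, while every other value is visited once.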

Slide: 19 Merge Join Example
    SELECT * FROM foo, bar WHERE foo.x = bar.x

    Merge Join
      Merge Cond: (foo.x = bar.x)
      -> Sort
           Sort Key: foo.x
           -> Seq Scan on foo
      -> Materialize
           -> Sort
                Sort Key: bar.x
                -> Seq Scan on bar

Slide: 20 Hash Join
Like merge join, only handles equality joins.
Hash each row from the inner relation to create a hash table. Then, hash each row from the outer relation and probe the hash table for matches.
Very fast – but requires enough memory to store inner tuples. Can get around this using multiple “batches”.
Not guaranteed to retain input ordering.
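The build and probe phases can be sketched as follows. This is a single-batch illustration that assumes the whole inner input fits in memory; splitting into multiple batches is not modelled, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def hash_join(outer, inner, key):
    """Equality join: build a hash table on the inner input, probe with the outer."""
    table = defaultdict(list)
    for inner_row in inner:  # build phase
        table[key(inner_row)].append(inner_row)
    for outer_row in outer:  # probe phase
        for inner_row in table.get(key(outer_row), []):
            yield (outer_row, inner_row)


foo = [{"x": 1}, {"x": 2}, {"x": 2}]
bar = [{"x": 2}, {"x": 3}]
result = list(hash_join(foo, bar, key=lambda row: row["x"]))
```

Each input is read once, so the cost is roughly the sum of the input sizes rather than their product, at the price of holding the inner rows in memory.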

Slide: 21 Hash Join Example
    SELECT * FROM foo, bar WHERE foo.x = bar.x

    Hash Join
      Hash Cond: (foo.x = bar.x)
      -> Seq Scan on foo
      -> Hash
           -> Seq Scan on bar

Slide: 22 Join Removal
Upcoming PostgreSQL 9.0 feature!
Consider the following query:
    SELECT foo.x, foo.y, foo.z FROM foo LEFT JOIN bar ON foo.x = bar.x;
If there is a unique index on bar (x), then, instead of joining foo and bar, we can just read foo, and ignore bar.
Common scenario using views or query generators.

Slide: 23 Join Removal – Continued
PostgreSQL 9.0 will only be able to remove LEFT joins.
Current project for PostgreSQL 9.1: remove INNER joins.
Consider:
    SELECT foo.x, foo.y, foo.z FROM foo, bar WHERE foo.x = bar.x;
Need: (1) foo.x is NOT NULL, (2) foreign key foo (x) references bar (x).

Slide: 24 Join Reordering
    SELECT * FROM foo JOIN bar ON foo.x = bar.x JOIN baz ON foo.y = baz.y

    SELECT * FROM foo JOIN baz ON foo.y = baz.y JOIN bar ON foo.x = bar.x

    SELECT * FROM foo JOIN (bar JOIN baz ON true) ON foo.x = bar.x AND foo.y = baz.y

Slide: 25 Not The Same Thing!
    SELECT * FROM (foo JOIN bar ON foo.x = bar.x) LEFT JOIN baz ON foo.y = baz.y

    SELECT * FROM (foo LEFT JOIN baz ON foo.y = baz.y) JOIN bar ON foo.x = bar.x

Slide: 26 EXPLAIN Estimates
    Hash Join (cost= rows=9000 width=118)
      Hash Cond: (foo.x = bar.x)
      -> Hash Join (cost= rows=9000 width=12)
           Hash Cond: (foo.y = baz.y)
           -> Seq Scan on foo (cost= rows=10000 width=8)
           -> Hash (cost= rows=90 width=4)
                -> Seq Scan on baz (cost= rows=90 width=4)
      -> Hash (cost= rows=100 width=106)
           -> Seq Scan on bar (cost= rows=100 width=106)

Slide: 27 EXPLAIN ANALYZE
    Hash Join (cost= rows=9000 width=118) (actual time= rows=9000 loops=1)
      Hash Cond: (foo.x = bar.x)
      -> Hash Join (cost= rows=9000 width=12) (actual time= rows=9000 loops=1)
           Hash Cond: (foo.y = baz.y)
           -> Seq Scan on foo (cost= rows=10000 width=8) (actual time= rows=10000 loops=1)
           -> Hash (cost= rows=90 width=4) (actual time= rows=90 loops=1)
                Buckets: 1024 Batches: 1 Memory Usage: 4kB
                -> Seq Scan on baz (cost= rows=90 width=4) (actual time= rows=90 loops=1)
      -> Hash (cost= rows=100 width=106) (actual time= rows=100 loops=1)
           Buckets: 1024 Batches: 1 Memory Usage: 14kB
           -> Seq Scan on bar (cost= rows=100 width=106) (actual time= rows=100 loops=1)
    Total runtime: ms

Slide: 28 Review of Join Planning
Join Order
Join Strategy
  – Nested loop
  – Merge join
  – Hash join
  – Join removal
Inner vs. outer

Slide: 29 Aggregates and DISTINCT
Plain aggregate.
  – e.g. SELECT count(*) FROM foo;
Sorted aggregate.
  – Sort the data (or use pre-sorted data); when you see a new value, aggregate the prior group.
Hashed aggregate.
  – Insert each input row into a hash table based on the grouping columns; at the end, aggregate all the groups.
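The difference between the sorted and hashed strategies can be sketched with a per-group count. Both functions are simplified illustrations with hypothetical names. The hashed version keeps a running count per group instead of storing the rows, which gives the same result for a count.

```python
from itertools import groupby


def sorted_aggregate(rows, key):
    """Sort, then close out each group as soon as the key changes."""
    ordered = sorted(rows, key=key)
    return [(k, sum(1 for _ in group)) for k, group in groupby(ordered, key=key)]


def hashed_aggregate(rows, key):
    """Keep one hash table entry per group; no sort needed."""
    counts = {}
    for row in rows:
        k = key(row)
        counts[k] = counts.get(k, 0) + 1
    return counts


rows = ["b", "a", "b", "c", "a", "b"]
by_sort = sorted_aggregate(rows, key=lambda r: r)
by_hash = hashed_aggregate(rows, key=lambda r: r)
```

The sorted variant returns groups in key order and only ever holds one group's state, while the hashed variant skips the sort but must hold an entry for every group at once.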

Slide: 30 Statistics
All of the decisions discussed earlier in this talk are made using statistics.
  – Seq scan vs. index scan vs. bitmap index scan
  – Nested loop vs. merge join vs. hash join
ANALYZE (manual or via autovacuum) gathers this information.
You must have good statistics or you will get bad plans!

Slide: 31 Confusing The Planner
    SELECT * FROM foo WHERE a = 1 AND b = 1
If 20% of the rows have a = 1 and 10% of the rows have b = 1, the planner treats the two conditions as independent and will assume that 20% * 10% = 2% of the rows meet both criteria.
    SELECT * FROM foo WHERE (a + 0) = a
Planner doesn't have a clue, so will assume 0.5% of rows will match.
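How far off the 2% estimate can be shows up on a small, deliberately correlated data set. The table below is made up for illustration: 100 rows in which every row with b = 1 also has a = 1. The function names are hypothetical.

```python
def estimated_selectivity(sel_a, sel_b):
    """Estimate for `a AND b` when the two conditions are assumed independent."""
    return sel_a * sel_b


def actual_selectivity(rows, predicate):
    """Fraction of rows that really satisfy the predicate."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


# 20% of rows have a = 1, 10% have b = 1, and b = 1 implies a = 1.
rows = [{"a": 1 if i < 20 else 0, "b": 1 if i < 10 else 0} for i in range(100)]

sel_a = actual_selectivity(rows, lambda r: r["a"] == 1)
sel_b = actual_selectivity(rows, lambda r: r["b"] == 1)
estimate = estimated_selectivity(sel_a, sel_b)
actual = actual_selectivity(rows, lambda r: r["a"] == 1 and r["b"] == 1)
```

Here the estimate comes out near 2% while 10% of the rows actually match, a fivefold underestimate of the kind that can push the planner toward the wrong scan or join.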

Slide: 32 What Can Go Wrong?
If the planner underestimates the row count, it may choose an index scan instead of a sequential scan, or a nested loop instead of a hash or merge join.
If the planner overestimates the row count, it may choose a sequential scan instead of an index scan, or a merge or hash join instead of a nested loop.
Small values for LIMIT tilt the planner toward fast-start plans and magnify the effect of bad estimates.

Slide: 33 Query Planner Parameters
seq_page_cost (1.0), random_page_cost (4.0)
  – Reduce these costs to account for caching effects. If database is fully cached, try [value missing].
default_statistics_target (10 or 100)
  – Level of detail for statistics gathering. Can also be overridden on a per-column basis.
enable_hashjoin, enable_sort, etc.
  – Just for testing.
work_mem
  – Amount of memory per sort or hash.
from_collapse_limit, join_collapse_limit, geqo_threshold
  – Sometimes need to be raised, but be careful!

Slide: 34 Things That Are Slow
DISTINCT.
PL/pgsql loops.
    FOR x IN SELECT... LOOP SELECT... END LOOP
Repeated calls to SQL or PL/pgsql functions.
    SELECT id, some_function(id) FROM table;

Slide: 35 Upcoming Features
Join removal (right now just for LEFT joins).
Machine-readable EXPLAIN output.
Hash statistics.
Better model for Materialize costs.
Improved use of indices to handle MIN(x), MAX(x), and x IS NOT NULL.

Slide: 36 Questions?
Any questions?