CMU SCS Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos – A. Pavlo Lecture#15: Query Optimization.

Slides:



Advertisements
Similar presentations
Examples of Physical Query Plan Alternatives
Advertisements

Overview of Query Evaluation (contd.) Chapter 12 Ramakrishnan and Gehrke (Sections )
Query Optimization Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) Imperative query execution plan: SELECT S.sname FROM Reserves.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#14: Implementation of Relational Operations.
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Evaluation of Relational Operators CS634 Lecture 11, Mar Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapters 14.
Query Optimization Goal: Declarative SQL query
Advanced Databases: Lecture 2 Query Optimization (I) 1 Query Optimization (introduction to query processing) Advanced Databases By Dr. Akhtar Ali.
CMU SCS /615Faloutsos/Pavlo1 Carnegie Mellon Univ. Dept. of Computer Science /615 – DB Applications C. Faloutsos & A. Pavlo Lecture #13: Query.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
1 Relational Query Optimization Module 5, Lecture 2.
Relational Query Optimization 198:541. Overview of Query Optimization  Plan: Tree of R.A. ops, with choice of alg for each op. Each operator typically.
Query Rewrite: Predicate Pushdown (through grouping) Select bid, Max(age) From Reserves R, Sailors S Where R.sid=S.sid GroupBy bid Having Max(age) > 40.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
Query Optimization II R&G, Chapters 12, 13, 14 Lecture 9.
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
CS186 Final Review Query Optimization.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapter 15.
Query Optimization Overview Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems December 2, 2004 Some slide content derived.
Query Processing & Optimization
Overview of Query Optimization v Plan : Tree of R.A. ops, with choice of alg for each op. –Each operator typically implemented using a `pull’ interface:
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#14: Implementation of Relational Operations.
Query Optimization R&G, Chapter 15 Lecture 16. Administrivia Homework 3 available today –Written exercise; will be posted on class website –Due date:
Query Optimization, part 2 CS634 Lecture 13, Mar Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Overview of Implementing Relational Operators and Query Evaluation
Introduction to Database Systems1 Relational Query Optimization Query Processing: Topic 2.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.
Access Path Selection in a Relational Database Management System Selinger et al.
Query Optimization. overview Histograms A histogram is a data structure maintained by a DBMS to approximate a data distribution Equiwidth vs equidepth.
Database systems/COMP4910/Melikyan1 Relational Query Optimization How are SQL queries are translated into relational algebra? How does the optimizer estimates.
Database Management 9. course. Execution of queries.
Temple University – CIS Dept. CIS331– Principles of Database Systems V. Megalooikonomou Query Processing (based on notes by C. Faloutsos at CMU)
1 Database Systems ( 資料庫系統 ) December 3, 2008 Lecture #10.
1 Relational Query Optimization Chapter Query Blocks: Units of Optimization  An SQL query is parsed into a collection of query blocks :  An SQL.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Implementing Relational Operators and Query Evaluation Chapter 12.
Introduction to Query Optimization, R. Ramakrishnan and J. Gehrke 1 Introduction to Query Optimization Chapter 13.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#14: Implementation of Relational Operations.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Introduction to Query Optimization Chapter 13.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#15: Query Optimization.
File Organizations and Indexing
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 12 – Introduction to.
1 Database Systems ( 資料庫系統 ) December 13, 2004 Chapter 15 By Hao-hua Chu ( 朱浩華 )
Implementation of Database Systems, Jarek Gryz1 Relational Query Optimization Chapters 12.
Cost Estimation For each plan considered, must estimate cost: –Must estimate cost of each operation in plan tree. Depends on input cardinalities. –Must.
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
Database Applications (15-415) DBMS Internals- Part X Lecture 21, April 3, 2016 Mohammad Hammoud.
Database Applications (15-415) DBMS Internals- Part IX Lecture 20, March 31, 2016 Mohammad Hammoud.
Query Optimization. overview Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin) DBA,
Database Management System
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Introduction to Query Optimization
Overview of Query Optimization
Introduction to Database Systems
Examples of Physical Query Plan Alternatives
Faloutsos/Pavlo C. Faloutsos – A. Pavlo Lecture#15: Query Optimization
C. Faloutsos Query Optimization – part 1
Relational Query Optimization
Faloutsos/Pavlo C. Faloutsos – A. Pavlo Lecture#13: Query Evaluation
Database Applications (15-415) DBMS Internals- Part IX Lecture 21, April 1, 2018 Mohammad Hammoud.
Query Optimization.
Relational Query Optimization
Overview of Query Evaluation
Relational Query Optimization
Database Applications (15-415) DBMS Internals- Part X Lecture 22, April 3, 2018 Mohammad Hammoud.
C. Faloutsos Query Optimization – part 2
Relational Query Optimization (this time we really mean it)
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Relational Query Optimization
Relational Query Optimization
Presentation transcript:

CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#15: Query Optimization

CMU SCS Last Class Set Operations Aggregate Operations Explain Faloutsos/PavloCMU SCS /6152

CMU SCS Today’s Class History & Background Relational Algebra Equivalences Plan Cost Estimation Plan Enumeration Nested Sub-queries Faloutsos/PavloCMU SCS /6153

CMU SCS 4 Cost-based Query Sub-System Query Parser Query Optimizer Plan Generator Plan Cost Estimator Catalog Manager Query Plan Evaluator SchemaStatistics Select * From Blah B Where B.blah = blah Queries

CMU SCS Query Optimization Remember that SQL is declarative. –User tells the DBMS what answer they want, not how to get the answer. There can be a big difference in performance based on plan is used: –See last week: 5.7 days vs. 45 seconds Faloutsos/PavloCMU SCS /6155

CMU SCS Quick DB History Lesson Faloutsos/PavloCMU SCS /6156

CMU SCS 1960s – IBM IMS First database system. Hierarchical data model. Programmer-defined physical storage format. Tuple-at-a-time queries. Faloutsos/PavloCMU SCS /6157

CMU SCS 1970s – CODASYL COBOL people got together and proposed a standard based on a network data model. Tuple-at-a-time queries. –This forces the programmer to do manual query optimization. Faloutsos/Pavlo8 Bachman CMU SCS /615

CMU SCS 1970s – Relational Model Ted Codd saw the maintenance overhead for IMS/Codasyl. Proposed database abstraction based on relations: –Store database in simple data structures. –Access it through high-level language. –Physical storage left up to implementation. Faloutsos/Pavlo9 Codd CMU SCS /615

CMU SCS IBM System R Skunkworks project at IBM Research in San Jose to implement Codd’s ideas. Had to figure out all of the things that we are discussing in this course themselves. IBM never commercialized System R. Faloutsos/PavloCMU SCS /61510

CMU SCS IBM System R First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. A lot of the concepts from System R’s optimizer are still used today. Faloutsos/PavloCMU SCS /61511

CMU SCS sidbiddayrname matlock macgyver a-team dallas Sample Database Faloutsos/PavloCMU SCS /61512 SAILORSRESERVES sidsnameratingage 1Christos Obama Tupac Bieber Sailors(sid: int, sname: varchar, rating: int, age: real) Reserves(sid: int, bid: int, day: date, rname: varchar) Boats(bid: int, bname: varchar, color: varchar) bidbnamecolor 101The GZAred 102The RZAwhite 103Raekwongreen 104O.D.B.brown BOATS

CMU SCS Query Optimization Bring query in internal form (eg., parse tree) … into “canonical form” (syntactic q-opt) Generate alternative plans. Estimate cost for each plan. Pick the best one. Faloutsos/PavloCMU SCS /61513

CMU SCS Today’s Class History & Background Relational Algebra Equivalences Plan Cost Estimation Plan Enumeration Nested Sub-queries Faloutsos/PavloCMU SCS /61514

CMU SCS Relational Algebra Equivalences Syntactic query optimization. Perform selections and projections early See transformation rules in textbook. Faloutsos/PavloCMU SCS /61515

CMU SCS Equivalence of Expressions Q: How to prove a transf. rule? Use relational tuple calculus to show that LHS = RHS: Faloutsos/PavloCMU SCS /61516 LHSRHS

CMU SCS Equivalence of Expressions Faloutsos/PavloCMU SCS /61517

CMU SCS Equivalence of Expressions Faloutsos/PavloCMU SCS /61518

CMU SCS Equivalence of Expressions Q: How to disprove a rule? Faloutsos/PavloCMU SCS /61519 X AB Christossquirrels R1 Ø AB Christosknifefights R2 AB Christossquirrels ≠

CMU SCS Equivalence of Expressions Selections: –Perform them early –Break a complex predicate, and push Simplify a complex predicate –( X=Y AND Y=3 ) → X=3 AND Y=3 Faloutsos/PavloCMU SCS /61520

CMU SCS Equivalence of Expressions Projections: –Perform them early (but carefully…) Smaller tuples Fewer tuples (if duplicates are eliminated) –Project out all attributes except the ones requested or required (e.g., joining attr.) Faloutsos/PavloCMU SCS /61521

CMU SCS Equivalence of Expressions Joins: –Commutative, associative Q: How many different orderings are there for an n-way join? Faloutsos/PavloCMU SCS /61522

CMU SCS Equivalence of Expressions Joins: How many different orderings are there for an n-way join? A: Catalan number ~ 4^nCatalan number –Exhaustive enumeration: too slow. Faloutsos/PavloCMU SCS /61523

CMU SCS Query Optimization Bring query in internal form (eg., parse tree) … into “canonical form” (syntactic q-opt) Generate alternative plans. Estimate cost for each plan. Pick the best one. Faloutsos/PavloCMU SCS /61524

CMU SCS Cost Estimation How long will a query take? –CPU: Small cost; tough to estimate. –Disk: # of block transfers. How many tuples will qualify? What statistics do we need to keep? Faloutsos/PavloCMU SCS /61525

CMU SCS Cost Estimation – Statistics For each relation R we keep: –N R → # tuples; –S R → size of tuple in bytes Faloutsos/PavloCMU SCS /61526 … SRSR #2 #3 #N R #1

CMU SCS Cost Estimation – Statistics For each relation R we keep: –N R → # tuples; –S R → size of tuple in bytes –V(A,R) → # of distinct values of attribute ‘A’ –And histograms… Faloutsos/PavloCMU SCS /61527 … SRSR #2 #3 #N R #1

CMU SCS Derivable Statistics F R → max# records/block B R → # blocks SC(A,R) → selection cardinality avg# of records with A=given Faloutsos/PavloCMU SCS /61528 … #2 #3 #B R #1 FRFR SRSR

CMU SCS Derivable Statistics F R → max# records/block Blocking Factor → B/S R, where B is the block size in bytes. B R → # blocks → N R /F R Faloutsos/PavloCMU SCS /61529

CMU SCS Derivable Statistics SC(A,R) → Selection Cardinality avg# of records with A=given → N R / V(A,R) Note that this assumes data uniformity –10,000 students, 10 colleges – how many students in SCS? Faloutsos/PavloCMU SCS /61530

CMU SCS Additional Statistics For index i: –F i → average fanout (~50-100) –HT i → # levels of index i (~2-3) ~ log(#entries)/log(F i ) –LB i # → blocks at leaf level Faloutsos/PavloCMU SCS /61531 HTi

CMU SCS Statistics Where do we store them? How often do we update them? Faloutsos/PavloCMU SCS /61532

CMU SCS Selection Statistics We saw simple predicates (‘ name=Christos ’) How about more complex predicates, like –‘ salary > 10K ’ –‘ age=30 AND jobCode=“Gangstarr” ’ What is their selectivity? Faloutsos/PavloCMU SCS /61533

CMU SCS Selections – Complex Predicates Selectivity sel(P) of predicate P: == fraction of tuples that qualify sel(P) = SC(P) / N R Faloutsos/PavloCMU SCS /61534

CMU SCS Selections – Complex Predicates Assume that V(rating, SAILORS) has 5 distinct values (i.e., 0 to 4). simple predicate P: A=constant –sel(A=constant) = 1/V(A,R) –eg., sel(rating=‘2’) = 1/5 What if V(A,R) is unknown?? Faloutsos/PavloCMU SCS /61535 rating count 40

CMU SCS Range Query: sel(rating >= ‘2’) sel(A>a) = (A max – a) / (A max – A min ) Selections – Complex Predicates Faloutsos/PavloCMU SCS /61536 rating count 40

CMU SCS Negation: sel(rating != ‘2’) –sel(not P) = 1 – sel(P) Observation: selectivity ≈ probability Selections – Complex Predicates Faloutsos/PavloCMU SCS /61537 rating count 40 ‘P’

CMU SCS Selections – Complex Predicates Conjunction: –sel(rating = ‘2’ and name LIKE ‘C%’) –sel(P1 ⋀ P2) = sel(P1) ∙ sel(P2) –INDEPENDENCE ASSUMPTION Faloutsos/PavloCMU SCS /61538 P1P2

CMU SCS Selections – Complex Predicates Disjunction: –sel(rating = ‘2’ or name LIKE ‘C%’) –sel(P1 ⋁ P2) = sel(P1) + sel(P2) – sel(P1 ⋁ P2) = sel(P1) + sel(P2) – sel(P1) ∙ sel(P2) –INDEPENDENCE ASSUMPTION, again Faloutsos/PavloCMU SCS /61539

CMU SCS Selections – Complex Predicates Disjunction, in general: –sel(P1 or P2 or … Pn) = –1 - (1- sel(P1) ) ∙ (1 - sel(P2) ) ∙ … (1 - sel(Pn)) Faloutsos/PavloCMU SCS /61540 P1P2

CMU SCS Selections – Summary sel(A=constant) → 1/V(A,r) sel(A>a) → (A max – a) / (A max – A min ) sel(not P) → 1 – sel(P) sel(P1 and P2) → sel(P1) ∙ sel(P2) sel(P1 or P2) → sel(P1) + sel(P2) – sel(P1) ∙ sel(P2) sel(P1 or... or Pn) = 1 - (1-sel(P1)) ∙... ∙ (1-sel(Pn)) Faloutsos/PavloCMU SCS /61541

CMU SCS Joins Q: Given a join of R and S, what is the range of possible result sizes in #of tuples? –Hint: what if R cols ⋂ S cols = Ø? –R cols ⋂ S cols is a key for R and a foreign key in S? Faloutsos/PavloCMU SCS /61542

CMU SCS Joins Q: Given a join of R and S, what is the range of possible result sizes in #of tuples? –Hint: what if R cols ⋂ S cols = Ø? –R cols ⋂ S cols is a key for R and a foreign key in S? Faloutsos/PavloCMU SCS /61543 N R ∙ N S ≤ N S

CMU SCS Result Size Estimation for Joins General case: R cols ⋂ S cols = {A} where A is not a key for either table. Hint: for a given tuple of R, how many tuples of S will it match? Faloutsos/PavloCMU SCS /61544

CMU SCS Result Size Estimation for Joins General case: R cols ⋂ S cols = {A} where A is not a key for either table. –Match each R-tuple with S-tuples: estSize ≈ N R ∙ N S / V(A,S) –Symmetrically, for S: estSize ≈ N R ∙ N S / V(A,R) Overall: –estSize ≈ N R ∙ N S / max( {V(A,S), V(A,R)} ) Faloutsos/PavloCMU SCS /61545

CMU SCS Cost Estimations Our formulas are nice but we assume that data values are uniformly distributed. Faloutsos/PavloCMU SCS /61546 Uniform Approximation of DDistribution D

CMU SCS Cost Estimations Our formulas are nice but we assume that data values are uniformly distributed. Faloutsos/PavloCMU SCS /61547 Uniform Approximation of DDistribution D # of occurrences Distinct values of attribute

CMU SCS Histograms Allows the DBMS to have leverage better statistics about the data. Equiwidth Histogram ~ QuantilesEquiwidth Histogram Bucket 1 Count=8 Bucket 2 Count=4 Bucket 3 Count=15 Bucket 4 Count=3 Bucket 5 Count=15 Bucket 1 Count=9 Bucket 2 Count=10 Bucket 3 Count=10 Bucket 4 Count=7 Bucket 5 Count=9

CMU SCS Query Optimization Bring query in internal form (eg., parse tree) … into “canonical form” (syntactic q-opt) Generate alternative plans. –Single relation. –Multiple relations. Estimate cost for each plan. Pick the best one. Faloutsos/PavloCMU SCS /61549

CMU SCS Plan Generation What are our plan options? Faloutsos/PavloCMU SCS /61550 … #2 #3 #B R #1 SRSR SELECT * FROM SAILORS WHERE rating = 10 SELECT * FROM SAILORS WHERE rating = 10 FRFR

CMU SCS Reminder Faloutsos/PavloCMU SCS /61551

CMU SCS Plan Generation Sequential Scan Binary Search –if sorted & consecutive Index Search –if an index exists Faloutsos/PavloCMU SCS /61552 … #2 #3 #B R #1 SRSR SELECT * FROM SAILORS WHERE rating = 10 SELECT * FROM SAILORS WHERE rating = 10

CMU SCS Sequential Scan B R (worst case) B R /2 (on average, if we search for primary key) Faloutsos/PavloCMU SCS /61553 … #2 #3 #B R #1 SRSR SELECT * FROM SAILORS WHERE rating = 10 SELECT * FROM SAILORS WHERE rating = 10 FRFR

CMU SCS Binary Search ~log(B R ) + SC(A,R)/ F R Extra blocks are ones that contain qualifying tuples Faloutsos/PavloCMU SCS /61554 … #2 #3 #B R #1 SRSR SELECT * FROM SAILORS WHERE rating = 10 SELECT * FROM SAILORS WHERE rating = 10 FRFR

CMU SCS Binary Search ~log(B R ) + SC(A,R)/ F R Extra blocks are ones that contain qualifying tuples Faloutsos/PavloCMU SCS /61555 … #2 #3 #B R #1 SRSR SELECT * FROM SAILORS WHERE rating = 10 SELECT * FROM SAILORS WHERE rating = 10 We showed that estimating this is non-trivial. FRFR

CMU SCS Index Search Index Search: –levels of index + blocks w/ qual. tuples Faloutsos/PavloCMU SCS /61556 … #2 #3 #B R #1 SRSR Case#1: Primary Key Case#2: Secondary key – clustering index Case#3: Secondary key – non-clust. index SELECT * FROM SAILORS WHERE rating = 10 SELECT * FROM SAILORS WHERE rating = 10 FRFR

CMU SCS Index Search: Case #1 Primary Key –cost: HT i + 1 Faloutsos/PavloCMU SCS /61557 … #2 #3 #B R #1 SRSR SELECT * FROM SAILORS WHERE rating = 10 SELECT * FROM SAILORS WHERE rating = 10 HT i

CMU SCS Index Search: Case #2 Secondary key with clustering index: –cost: HT i + SC(A,R)/F R Faloutsos/PavloCMU SCS /61558 … #2 #3 #B R #1 SRSR SELECT * FROM SAILORS WHERE rating = 10 SELECT * FROM SAILORS WHERE rating = 10 HT i

CMU SCS Index Search: Case #3 Secondary key with non-clustering index: –cost: HT i + SC(A,R) Faloutsos/PavloCMU SCS /61559 … #2 #3 #B R #1 SRSR SELECT * FROM SAILORS WHERE rating = 10 SELECT * FROM SAILORS WHERE rating = 10 … HT i

CMU SCS Single Relation Plans With no index: scan (dup-elim; sort) With index: –Single index access path –Multiple index access path –Sorted index access path –Index-only access path Faloutsos/PavloCMU SCS /61560

CMU SCS Overview – Detailed Why q-opt? Equivalence of expressions Cost estimation Plan generation Plan evaluation Faloutsos/PavloCMU SCS /61561

CMU SCS Citation P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In SIGMOD Conference, pages , Faloutsos/PavloCMU SCS /61562

CMU SCS Statistics for Optimization NCARD(R): Cardinality of relation R in tuples TCARD(R): # of pages containing tuples from R P(R) = TCARD(R)/(# of non-empty pages in the segment) –If segments only held tuples from one relation there would be no need for P(R) ICARD(I): # of distinct keys in index I NINDX(I): # of pages in index I Faloutsos/PavloCMU SCS /61563

CMU SCS Predicate Selectivity Estimation Faloutsos/PavloCMU SCS /61564 attr = valueF = 1/ICARD(attr index) – if index exists F = 1/10 otherwise attr1 = attr2F = 1/max(ICARD(I1),ICARD(I2)) or F = 1/ICARD(Ii) – if only index i exists, or F = 1/10 val1 < attr < val2F = (value2-value1)/(high key-low key) F = 1/4 otherwise expr1 or expr2F = F(expr1)+F(expr2)–F(expr1)*F(expr2) expr1 and expr2F = F(expr1) * F(expr2) NOT exprF = 1 – F(expr)

CMU SCS Costs per Access Path Case Faloutsos/PavloCMU SCS /61565 Unique index matching equal predicate 1+1+W Clustered index I matching >=1 preds F(preds)*(NINDX(I)+TCARD)+W*RSICARD Non-clustered index I matching >=1 preds F(preds)*(NINDX(I)+NCARD)+W*RSICARD Segment scanTCARD/P + W*RSICARD

CMU SCS Query Optimization Bring query in internal form (eg., parse tree) … into “canonical form” (syntactic q-opt) Generate alternative plans. –Single relation. –Multiple relations. Estimate cost for each plan. Pick the best one. Faloutsos/PavloCMU SCS /61566

CMU SCS Queries over Multiple Relations As number of joins increases, number of alternative plans grows rapidly –We need to restrict search space. Fundamental decision in System R: only left-deep join trees are considered. Faloutsos/PavloCMU SCS /61567

CMU SCS Queries over Multiple Relations Fundamental decision in System R: only left-deep join trees are considered. Faloutsos/PavloCMU SCS /61568 B A C D B A C D CD BA

CMU SCS Queries over Multiple Relations Fundamental decision in System R: only left-deep join trees are considered. Faloutsos/PavloCMU SCS /61569 B A C D B A C D CD BA XX

CMU SCS Queries over Multiple Relations Fundamental decision in System R: only left-deep join trees are considered. –Allows for fully pipelined plans where intermediate results not written to temp files. –Not all left-deep trees are fully pipelined (e.g., SM join). Faloutsos/PavloCMU SCS /61570

CMU SCS Queries over Multiple Relations Enumerate the orderings (= left deep tree) Enumerate the plans for each operator Enumerate the access paths for each table Use dynamic programming to save cost estimations. Faloutsos/PavloCMU SCS /61571

CMU SCS (Reminder: Dynamic Programming) PIT CDG ATL PVG BOS FRA JKF $200 $150 $500 Cheapest flight PIT -> PVG? $800

CMU SCS (Reminder: Dynamic Programming) FaloutsosCMU SCS /61573 PIT CDG ATL PVG BOS FRA JKF $200 $150 $500 Assumption: NO package deals: cost CDG->PVG is always $800, no matter how reached CDG $800

CMU SCS (Reminder: Dynamic Programming) PIT CDG ATL PVG BOS FRA JKF $200 $150 $500 Solution: compute partial optimal, left-to-right: $800 $50 $450 $650 $1050 $850 $950

CMU SCS (Reminder: Dynamic Programming) PIT CDG ATL PVG BOS FRA JKF $200 $150 $500 Solution: compute partial optimal, left-to-right: $800 $50 $450 $650 $1050 $850 $950 $200 $150 $50

CMU SCS (Reminder: Dynamic Programming) PIT CDG ATL PVG BOS FRA JKF $200 $150 $500 Solution: compute partial optimal, left-to-right: $800 $50 $450 $650 $1050 $850 $950 $200 $150 $50 $700 $650

CMU SCS (Reminder: Dynamic Programming) PIT CDG ATL PVG BOS FRA JKF $200 $150 $500 Solution: compute partial optimal, left-to-right: $800 $50 $450 $650 $1050 $850 $950 $200 $150 $50 $700 $650 $1500

CMU SCS (Reminder: Dynamic Programming) PIT CDG ATL PVG BOS FRA JKF $200 $150 $500 So, best price is $1,500 – which legs? $800 $50 $450 $650 $1050 $850 $950 $200 $150 $50 $700 $650 $1500

CMU SCS (Reminder: Dynamic Programming) FaloutsosCMU SCS /61579 PIT CDG ATL PVG BOS FRA JKF $200 $150 $500 $800 $50 $450 $650 $1050 $850 $950 $200 $150 $50 $700 $650 $1500 So, best price is $1,500 – which legs? A: follow the winning edges, backwards

CMU SCS (Reminder: Dynamic Programming) FaloutsosCMU SCS /61580 PIT CDG ATL PVG BOS FRA JKF $200 $150 $500 $800 $50 $450 $650 $1050 $850 $950 $200 $150 $50 $700 $650 $1500 So, best price is $1,500 – which legs? A: follow the winning edges, backwards

CMU SCS (Reminder: Dynamic Programming) FaloutsosCMU SCS /61581 PIT CDG ATL PVG BOS FRA JKF $200 $150 $500 $800 $50 $450 $650 $1050 $850 $950 $200 $150 $50 $700 $650 $1500 So, best price is $1,500 – which legs? A: follow the winning edges, backwards

CMU SCS (Reminder: Dynamic Programming) PIT CDG ATL PVG BOS FRA JKF $200 $150 $500 $800 $50 $450 $650 $1050 $850 $950 $200 $150 $50 $700 $650 $1500 Q: what are the states, costs and arrows, in q-opt?

CMU SCS (Reminder: Dynamic Programming) PIT CDG ATL PVG BOS FRA JKF $200 $150 $500 $800 $50 $450 $650 $1050 $850 $950 $200 $150 $50 $700 $650 $1500 Q: what are the states, costs and arrows, in q-opt? A: set of intermediate result tables

CMU SCS Q-Opt + Dynamic Programming E.g., compute R join S join T FaloutsosCMU SCS /61584 RSTRST R join S T R S join T R join S join T … 150 (SM) 2,500 (NL) … …

CMU SCS Q-Opt + Dynamic Programming Details: how to record the fact that, say R is sorted on R.a? or that the user requires sorted output? Consider the following query: Faloutsos/PavloCMU SCS /61585 SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b ORDER BY R.a SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b ORDER BY R.a

CMU SCS Q-Opt + Dynamic Programming E.g., compute R join S join T order by R.a Faloutsos/PavloCMU SCS /61586 RSTRST R join S T R S join T R join S join T … 150 (SM) 2,500 (NL)

CMU SCS Q-Opt + Dynamic Programming E.g., compute R join S join T order by R.a Faloutsos/PavloCMU SCS /61587 RSTRST R join S T R S join T R join S join T … 150 (SM) 2,500 (NL) R join S join T, sorted R.a sort Any other changes?

CMU SCS Q-Opt + Dynamic Programming Faloutsos/PavloCMU SCS /61588 RSTRST R join S T R S join T R join S join T … 150 (SM) 2,500 (NL) R join S join T, sorted R.a sort 150 (SM) R join S (R.a) T 2000 (NL) 50 (HJ)

CMU SCS Candidate Plans Faloutsos/PavloCMU SCS /61589 SELECT sname, bname, day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid SELECT sname, bname, day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid 1.Enumerate relation orderings: R S B S R B S B R x B S R x B R S R B S

CMU SCS Candidate Plans Faloutsos/PavloCMU SCS /61590 SELECT sname, bname, day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid SELECT sname, bname, day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid 1.Enumerate relation orderings: R S B S R B S B R x B S R x B R S R B S Prune plans with cross-products immediately!

CMU SCS Candidate Plans Faloutsos/PavloCMU SCS /61591 SELECT sname, bname, day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid SELECT sname, bname, day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid 1.Enumerate relation orderings: R S B S R B S B R x B S R x B R S R B S X X Prune plans with cross-products immediately!

CMU SCS Candidate Plans Faloutsos/PavloCMU SCS /61592 SELECT sname, bname, day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid SELECT sname, bname, day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid 2.Enumerate join algorithm choices: R S B R S B NLJ R S B HJ R S B R S B NLJ Do this for the other plans.

CMU SCS Candidate Plans Faloutsos/PavloCMU SCS /61593 SELECT sname, bname, day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid SELECT sname, bname, day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid 3.Enumerate access method choices: R S B NLJ R S B Heap Scan R S B NLJ Heap Scan Index Scan (R.sid) Heap Scan Do this for the other plans.

CMU SCS Candidate Plans Faloutsos/PavloCMU SCS /61594 SELECT sname, bname, day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid SELECT sname, bname, day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid 4.Now we can estimate the cost of each plan. R S B NLJ Heap ScanIndex Scan (R.sid) Heap Scan

CMU SCS Query Optimization Bring query in internal form (eg., parse tree) … into “canonical form” (syntactic q-opt) Generate alternative plans. –Single relation. –Multiple relations. –Nested sub-queries. Estimate cost for each plan. Pick the best one. Faloutsos/PavloCMU SCS /61595

CMU SCS Nested Sub-Queries Re-write nested queries to: de-correlate and/or flatten them Faloutsos/PavloCMU SCS /61596

CMU SCS Nested Sub-Queries Faloutsos/PavloCMU SCS /61597 SELECT S.sid, MIN(R.day) FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = ‘red’ AND S.rating = (SELECT MAX(S2.rating) FROM Sailors S2) GROUP BY S.sid HAVING COUNT(*) > 1 SELECT S.sid, MIN(R.day) FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = ‘red’ AND S.rating = (SELECT MAX(S2.rating) FROM Sailors S2) GROUP BY S.sid HAVING COUNT(*) > 1 For each sailor with the highest rating (over all sailors) and at least two reservations for red boats, find the sailor id and the earliest date on which the sailor has a reservation for a red boat.

CMU SCS Decomposing Queries into Blocks The optimizer breaks up queries into blocks and then concentrates on one block at a time. Faloutsos/PavloCMU SCS /61598

CMU SCS Decomposing Queries into Blocks Faloutsos/PavloCMU SCS /61599 SELECT S.sid, MIN(R.day) FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = ‘red’ AND S.rating = (SELECT MAX(S2.rating) FROM Sailors S2) GROUP BY S.sid HAVING COUNT(*) > 1 SELECT S.sid, MIN(R.day) FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = ‘red’ AND S.rating = (SELECT MAX(S2.rating) FROM Sailors S2) GROUP BY S.sid HAVING COUNT(*) > 1 Nested BlockOuter Block

CMU SCS Decomposing Queries into Blocks The optimizer breaks up queries into blocks and then concentrates on one block at a time. Split n-way joins into 2-way joins, then individually optimize. Faloutsos/PavloCMU SCS /615100

CMU SCS Query Optimizer Overview System R: –Break query in query blocks –Simple queries (ie., no joins): look at stats –n-way joins: left-deep join trees; ie., only one intermediate result at a time Pros: smaller search space; pipelining Cons: may miss optimal –2-way joins: NL and sort-merge Faloutsos/PavloCMU SCS /615101

CMU SCS Postgres Genetic algorithm Faloutsos/PavloCMU SCS /615102

CMU SCS Conclusions Ideas to remember: –Syntactic q-opt – do selections early –Selectivity estimations (uniformity, indep.; histograms; join selectivity) –Hash join (nested loops; sort-merge) –Left-deep joins –Dynamic programming Faloutsos/PavloCMU SCS /615103