C. Faloutsos Query Optimization – part 2

Presentation transcript:

Carnegie Mellon Univ., Dept. of Computer Science. 15-415 - Database Applications. C. Faloutsos. Query Optimization – part 2.

General overview (rel. model): relational model and SQL; functional dependencies & normalization; physical design; indexing; query optimization; transaction processing. 15-415 - C. Faloutsos

Q-opt steps: bring the query into internal form (e.g., a parse tree); bring it into ‘canonical form’ (syntactic q-opt); generate alternative plans, covering selections (simple; complex predicates), sorting, projections, and joins; estimate cost; pick best.

Reminder – statistics: for each relation ‘r’ we keep nr: # of tuples; Sr: size of a tuple in bytes; V(A,r): number of distinct values of attribute ‘A’. [Figure: relation ‘r’ drawn as tuples #1, #2, #3, ..., #nr, each of size Sr.]

Derivable statistics: fr: blocking factor = max # of records per block (= B / Sr, rounded down; B: block size in bytes). br: # of blocks (= nr / fr, rounded up).

Derivable statistics (cont.): SC(A,r) = selection cardinality = avg # of records with A = a given value (= nr / V(A,r)); this assumes uniformity. E.g., with 10,000 students and 10 colleges, how many students are in SCS? Under uniformity, 10,000 / 10 = 1,000.
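The derived statistics above can be sketched in a few lines of Python. This is only an illustration: the function names and the sizes B = 4096 bytes and Sr = 100 bytes are assumptions, not values from the slides. The rounding (floor for fr, ceiling for br) is made explicit.

```python
import math

def blocking_factor(B, Sr):
    """fr: max number of whole records that fit in a block of B bytes."""
    return B // Sr

def num_blocks(nr, fr):
    """br: number of blocks needed to hold nr records at fr records per block."""
    return math.ceil(nr / fr)

def selection_cardinality(nr, V):
    """SC(A,r): average number of records per distinct value of A (uniformity assumed)."""
    return nr / V

# Hypothetical sizes, for illustration only.
fr = blocking_factor(B=4096, Sr=100)
br = num_blocks(nr=10_000, fr=fr)

# Slide example: 10,000 students spread uniformly over 10 colleges.
students_in_scs = selection_cardinality(nr=10_000, V=10)
print(fr, br, students_in_scs)
```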

Selections: we saw simple predicates (A = constant; e.g., ‘name = Smith’). How about more complex predicates, like ‘salary > 10K’ or ‘age = 30 and job-code = “analyst”’? What is their selectivity?

Selections – complex predicates: the selectivity sel(P) of a predicate P is the fraction of tuples that qualify: sel(P) = SC(P) / nr.

Selections – complex predicates: e.g., assume that V(grade, TAKES) = 5 distinct values. Simple predicate P: A = constant, with sel(A = constant) = 1 / V(A,r); e.g., sel(grade = ‘B’) = 1/5. (What if V(A,r) is unknown?) [Figure: histogram of count per grade, from A to F.]

Selections – complex predicates: range query, e.g., sel(grade >= ‘C’). In general, sel(A > a) = (Amax - a) / (Amax - Amin).

Selections – complex predicates: negation, e.g., sel(grade != ‘C’). In general, sel(not P) = 1 - sel(P). (Observation: selectivity =~ probability.)

Selections – complex predicates: conjunction, e.g., sel(grade = ‘C’ and course = ‘415’). In general, sel(P1 and P2) = sel(P1) * sel(P2) (INDEPENDENCE ASSUMPTION).

Selections – complex predicates: disjunction, e.g., sel(grade = ‘C’ or course = ‘415’). In general, sel(P1 or P2) = sel(P1) + sel(P2) - sel(P1 and P2) = sel(P1) + sel(P2) - sel(P1) * sel(P2) (INDEPENDENCE ASSUMPTION, again).

Selections – complex predicates: disjunction, in general: sel(P1 or P2 or ... Pn) = 1 - (1 - sel(P1)) * (1 - sel(P2)) * ... * (1 - sel(Pn)).

Selections – summary: sel(A = constant) = 1 / V(A,r); sel(A > a) = (Amax - a) / (Amax - Amin); sel(not P) = 1 - sel(P); sel(P1 and P2) = sel(P1) * sel(P2); sel(P1 or P2) = sel(P1) + sel(P2) - sel(P1) * sel(P2). All of these rest on the UNIFORMITY and INDEPENDENCE ASSUMPTIONS.
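The selectivity rules summarized above translate directly into small functions. The sketch below is illustrative only: the function names are made up, V(course, TAKES) = 20 is a hypothetical value (the slides only give V(grade, TAKES) = 5), and the salary range is invented for the range-query example.

```python
import math

def sel_eq(V):
    """sel(A = constant) = 1 / V(A,r), assuming uniformity."""
    return 1.0 / V

def sel_gt(a, a_min, a_max):
    """sel(A > a) = (Amax - a) / (Amax - Amin), assuming uniformity."""
    return (a_max - a) / (a_max - a_min)

def sel_not(p):
    """sel(not P) = 1 - sel(P)."""
    return 1.0 - p

def sel_and(*ps):
    """sel(P1 and ... and Pn) = product of selectivities (independence assumed)."""
    return math.prod(ps)

def sel_or(*ps):
    """sel(P1 or ... or Pn) = 1 - product of (1 - sel(Pi)) (independence assumed)."""
    return 1.0 - math.prod(1.0 - p for p in ps)

p_grade = sel_eq(5)     # V(grade, TAKES) = 5, as in the slides
p_course = sel_eq(20)   # V(course, TAKES) = 20 is a hypothetical value

both = sel_and(p_grade, p_course)      # grade = 'C' and course = '415'
either = sel_or(p_grade, p_course)     # grade = 'C' or course = '415'
salary = sel_gt(10_000, 0, 50_000)     # salary > 10K on a made-up range [0, 50K]
print(both, either, salary)
```

For two predicates, sel_or agrees with the two-term formula sel(P1) + sel(P2) - sel(P1) * sel(P2), since 1 - (1 - p1)(1 - p2) expands to the same expression.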

Q-opt steps: bring the query into internal form (e.g., a parse tree); bring it into ‘canonical form’ (syntactic q-opt); generate alternative plans, covering selections (simple; complex predicates), sorting, projections, and joins; estimate cost; pick best.

Sorting: assume br blocks of relation ‘r’, and only M (< br) buffers in main memory. Q1: how to sort (‘external sorting’)? Q2: cost? [Figure: M main-memory buffers, numbered 1, 2, ..., M, next to the br blocks of ‘r’.]

Sorting: Q1: how to sort (‘external sorting’)? A1: create sorted runs of size M; then merge.

Sorting: create sorted runs of size M (how many?); merge them (how?).

Sorting: create sorted runs of size M; merge the first M-1 runs into a sorted run of size (M-1) * M; and so on.

Sorting: how many steps do we need? ‘i’, where M * (M-1)^i > br. How many reads/writes per step? br + br (every block is read once and written once).

Sorting: in short, excluding the final ‘write’, we need ceil(log(br/M) / log(M-1)) * 2 * br + br block accesses.
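The cost formula can be checked against a direct count of merge passes. In the sketch below (function names and the sample values br = 1,000 and M = 11 are assumptions), run creation costs 2 * br, each merge pass costs 2 * br, and the final write (br) is not counted, which gives 2 * br * passes + br.

```python
import math

def merge_passes(br, M):
    """Number of (M-1)-way merge passes needed after creating runs of M blocks."""
    if M < 3:
        raise ValueError("need at least 3 buffers to merge 2 or more runs")
    runs = math.ceil(br / M)           # initial sorted runs
    passes = 0
    while runs > 1:
        runs = math.ceil(runs / (M - 1))
        passes += 1
    return passes

def external_sort_cost(br, M):
    """Block accesses, excluding the final write: 2*br per merge pass, plus br."""
    return 2 * br * merge_passes(br, M) + br

def slide_formula(br, M):
    """The closed form from the slide: ceil(log(br/M) / log(M-1)) * 2 * br + br."""
    return math.ceil(math.log(br / M) / math.log(M - 1)) * 2 * br + br

# Hypothetical sizes: 1,000 blocks, 11 buffers -> 91 runs, then 10, then 1.
print(merge_passes(1000, 11), external_sort_cost(1000, 11), slide_formula(1000, 11))
```

The iterative count rounds the number of runs up at each pass, so for some inputs it may differ slightly from the closed form; for the sample values the two agree.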

Q-opt steps: bring the query into internal form (e.g., a parse tree); bring it into ‘canonical form’ (syntactic q-opt); generate alternative plans, covering selections (simple; complex predicates), sorting, projections and aggregations, and joins; estimate cost; pick best.

Projection - duplicate elimination: e.g., select distinct c-id from TAKES. How? Pros and cons?

Set operations: e.g., select * from REGULAR-STUDENT union select * from SPECIAL-STUDENT. How? Pros and cons?

Aggregations: e.g., select ssn, avg(grade) from TAKES group by ssn. How?
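The slide leaves the ‘How?’ open. One common approach, offered here as an assumption rather than as the answer intended in the lecture, is to hash on the grouping attribute and keep a running (sum, count) per group; sorting on ssn and scanning the sorted result is the usual alternative. The TAKES schema (ssn, c_id, grade) and the sample rows below are hypothetical.

```python
from collections import defaultdict

def avg_grade_by_ssn(takes):
    """takes: iterable of (ssn, c_id, grade) tuples; returns {ssn: average grade}."""
    totals = defaultdict(lambda: [0.0, 0])   # ssn -> [running sum, count]
    for ssn, _c_id, grade in takes:
        acc = totals[ssn]
        acc[0] += grade
        acc[1] += 1
    return {ssn: total / count for ssn, (total, count) in totals.items()}

# Hypothetical sample data with numeric grades.
TAKES = [(123, "415", 4.0), (123, "413", 3.0), (234, "415", 2.0)]
result = avg_grade_by_ssn(TAKES)
print(result)
```

The same one-pass hashing idea also covers duplicate elimination (keep only the keys) when the set of groups fits in memory.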

Q-opt steps: bring the query into internal form (e.g., a parse tree); bring it into ‘canonical form’ (syntactic q-opt); generate alternative plans, covering selections, sorting, projections and aggregations, and joins (2-way joins; n-way joins); estimate cost; pick best.

2-way joins - output size estimation: r JOIN s, where ‘r’ has nr tuples and ‘s’ has ns tuples. Case #1: cartesian product (‘r’ and ‘s’ have no common attribute). # of output tuples = ?? (Every pair qualifies: nr * ns.)

2-way joins - output size estimation: r JOIN s. Case #2: r(A,B), s(A,C,D), where A is a candidate key for ‘r’. # of output tuples = ?? At most ns (<= ns), since each tuple of ‘s’ matches at most one tuple of ‘r’.

2-way joins - output size estimation: r JOIN s. Case #3: r(A,B), s(A,C,D), where A is a candidate key for neither (is it possible??). # of output tuples = ??

2-way joins - output size estimation (case #3, cont.): # of output tuples ~ nr * ns / V(A,s) or ns * nr / V(A,r) (whichever is less).
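The three output-size cases can be put side by side as follows. This is a minimal sketch: the function names and the sample statistics (V(A,r) = 500, V(A,s) = 100) are assumptions for illustration.

```python
def cartesian_size(nr, ns):
    """Case #1: no common attribute, every pair of tuples qualifies."""
    return nr * ns

def key_join_upper_bound(ns):
    """Case #2: A is a candidate key of r, so each s-tuple matches at most one r-tuple."""
    return ns

def general_join_size(nr, ns, V_A_r, V_A_s):
    """Case #3: A is a key of neither relation; take the smaller of the two estimates."""
    return min(nr * ns / V_A_s, ns * nr / V_A_r)

# Hypothetical statistics.
nr, ns = 10_000, 1_000
print(cartesian_size(nr, ns))
print(key_join_upper_bound(ns))
print(general_join_size(nr, ns, V_A_r=500, V_A_s=100))
```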

Q-opt steps: bring the query into internal form (e.g., a parse tree); bring it into ‘canonical form’ (syntactic q-opt); generate alternative plans, covering selections, sorting, projections and aggregations, and joins (2-way joins: output size estimation, algorithms; n-way joins); estimate cost; pick best.

2-way joins: algorithm(s) for r JOIN s? (‘r’ has nr tuples; ‘s’ has ns tuples.)

2-way joins - Algorithm #0: (naive) nested loop (SLOW!)
    for each tuple tr of r
        for each tuple ts of s
            print (tr, ts), if they match

2-way joins - Algorithm #0: why is it bad? How many disk accesses (‘br’ and ‘bs’ are the number of blocks for ‘r’ and ‘s’)? br + nr * bs.

2-way joins - Algorithm #1: blocked nested-loop join
    read in a block of r
        read in a block of s
            print matching tuples
    cost: br + br * bs
(‘r’: nr records, br blocks; ‘s’: ns records, bs blocks.)
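The difference between the two nested-loop variants shows up when block reads are counted explicitly. The sketch below simulates disk blocks as Python lists; the helper names, the join on the first attribute, and the tiny sample relations are all assumptions made for illustration.

```python
def to_blocks(tuples, per_block):
    """Split a list of tuples into fixed-size 'disk blocks'."""
    return [tuples[i:i + per_block] for i in range(0, len(tuples), per_block)]

def naive_nested_loop(r_blocks, s_blocks):
    """Algorithm #0: rescan all of s for every tuple of r."""
    out, reads = [], 0
    for r_block in r_blocks:
        reads += 1                       # fetch one block of r
        for tr in r_block:
            for s_block in s_blocks:     # full scan of s per r-tuple
                reads += 1
                for ts in s_block:
                    if tr[0] == ts[0]:   # join on the first attribute, A
                        out.append((tr, ts))
    return out, reads

def blocked_nested_loop(r_blocks, s_blocks):
    """Algorithm #1: rescan all of s once per block of r."""
    out, reads = [], 0
    for r_block in r_blocks:
        reads += 1                       # fetch one block of r
        for s_block in s_blocks:         # full scan of s per r-block
            reads += 1
            for tr in r_block:
                for ts in s_block:
                    if tr[0] == ts[0]:
                        out.append((tr, ts))
    return out, reads

r = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e"), (6, "f")]
s = [(2, "x"), (4, "y"), (4, "z"), (7, "w")]
r_blocks, s_blocks = to_blocks(r, 2), to_blocks(s, 2)   # br = 3, bs = 2

out0, reads0 = naive_nested_loop(r_blocks, s_blocks)    # br + nr*bs reads
out1, reads1 = blocked_nested_loop(r_blocks, s_blocks)  # br + br*bs reads
print(reads0, reads1)
```

Both functions produce the same join result; only the number of simulated block reads differs.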

2-way joins - arithmetic example: nr = 10,000 tuples, br = 1,000 blocks; ns = 1,000 tuples, bs = 200 blocks. Alg #0: br + nr * bs = 2,001,000 disk accesses (d.a.). Alg #1: br + br * bs = 201,000 d.a.

2-way joins - Observation 1: Algo #1 is asymmetric: cost = br + br * bs; with reversed roles, cost = bs + bs * br. Best choice? Put the smallest relation in the outer loop.
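The arithmetic of the example, including the effect of swapping the outer and inner relation, can be reproduced with two one-line cost functions (the function names are illustrative; the numbers are those of the example above).

```python
def cost_naive(b_outer, n_outer, b_inner):
    """Algorithm #0: read outer once, scan inner once per outer tuple."""
    return b_outer + n_outer * b_inner

def cost_blocked(b_outer, b_inner):
    """Algorithm #1: read outer once, scan inner once per outer block."""
    return b_outer + b_outer * b_inner

br, nr = 1_000, 10_000   # relation r: blocks, tuples
bs, ns = 200, 1_000      # relation s: blocks, tuples

print(cost_naive(br, nr, bs))   # r as the outer relation, tuple at a time
print(cost_blocked(br, bs))     # r as the outer relation, block at a time
print(cost_blocked(bs, br))     # roles reversed: s (the smaller) as outer
```

With ‘s’ in the outer loop the cost drops from 201,000 to 200,200 disk accesses: the product term is unchanged, and only the additive term shrinks.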

2-way joins - Observation 2 [NOT IN BOOK]: what if we have ‘k’ buffers available? Read in ‘k-1’ blocks of ‘r’; read in a block of ‘s’; print matching tuples.

2-way joins - Observation 2 (cont.): read in ‘k-1’ blocks of ‘r’; read in a block of ‘s’; print matching tuples. Cost? br + br/(k-1) * bs. What if br = k-1? (Then ‘r’ fits in the buffers and ‘s’ is scanned only once: cost = br + bs.) What if we assign the k-1 blocks to the inner relation?
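The k-buffer cost can be explored numerically. In this sketch the function name and the buffer counts are assumptions, the block counts reuse br = 1,000 and bs = 200 from the arithmetic example, and br/(k-1) is rounded up because a partially filled last chunk of the outer relation still needs a full scan of the inner one.

```python
import math

def block_nl_cost(b_outer, b_inner, k):
    """Cost with k buffers: k-1 hold a chunk of the outer relation, 1 streams the inner."""
    if k < 2:
        raise ValueError("need at least one buffer per relation")
    chunks = math.ceil(b_outer / (k - 1))   # number of full scans of the inner relation
    return b_outer + chunks * b_inner

br, bs = 1_000, 200
print(block_nl_cost(br, bs, k=2))       # one outer block at a time: same as Algorithm #1
print(block_nl_cost(br, bs, k=101))     # 100 outer blocks at a time
print(block_nl_cost(br, bs, k=1_001))   # br = k-1: r fits in memory, cost = br + bs
```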

2-way joins - Observation 3: can we get rid of the ‘br’ term in the cost br + br * bs? A: read the inner relation backwards half of the time (alternate the scan direction, so the last block read in one scan is still in the buffer at the start of the next and need not be re-read). Q: cons?

2-way joins: other algorithm(s) for r JOIN s? (‘r’ has nr tuples; ‘s’ has ns tuples.)

2-way joins - other algo’s: sort-merge: sort ‘r’; sort ‘s’; merge the sorted versions (good if one or both are already sorted).

2-way joins - other algo’s: sort-merge - cost: ~ 2 * br * log(br) + 2 * bs * log(bs) + br + bs. It needs temporary space (for the sorted versions) and gives its output in sorted order.
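An in-memory sketch of the sort-merge idea follows. It ignores disk blocks entirely and only illustrates the merge logic, including join values that repeat on both sides; the function name, the join on the first attribute, and the sample relations are assumptions.

```python
def sort_merge_join(r, s):
    """Join r and s on their first attribute by sorting both and merging."""
    r = sorted(r, key=lambda t: t[0])
    s = sorted(s, key=lambda t: t[0])
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        a_r, a_s = r[i][0], s[j][0]
        if a_r < a_s:
            i += 1
        elif a_r > a_s:
            j += 1
        else:
            # Find the end of the group of s-tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][0] == a_r:
                j_end += 1
            # Pair every r-tuple with this value against the whole s-group.
            while i < len(r) and r[i][0] == a_r:
                for ts in s[j:j_end]:
                    out.append((r[i], ts))
                i += 1
            j = j_end
    return out

r = [(4, "d"), (1, "a"), (2, "b"), (3, "c")]
s = [(4, "y"), (7, "w"), (2, "x"), (4, "z")]
joined = sort_merge_join(r, s)
print(joined)
```

The result comes out ordered by the join attribute, which is the ‘gives output in sorted order’ property noted above.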

2-way joins - other algo’s: index join: use an existing index, or even build one on the fly; for each tuple of ‘r’, look up the matching tuples of ‘s’ through the index. Cost: br + nr * c (c: look-up cost).

2-way joins - other algo’s: hash join: hash ‘r’ into (0, 1, ..., ‘max’) buckets; hash ‘s’ into buckets (same hash function); join each pair of matching buckets.

2-way joins - hash join details: how to join each pair of partitions Hr-i, Hs-i? A: build another hash table for Hs-i, and probe it with each tuple of Hr-i.

2-way joins - hash join details: what if Hs-i is too large to fit in main memory? A: recursive partitioning. More details (overflows, hybrid hash joins): in book. Cost of hash join (under certain assumptions)? 3 * (br + bs) + 2 * max.
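The partition-then-probe structure of the hash join can be sketched in memory as below. The function name, the number of buckets, and the sample relations are assumptions; recursive partitioning, overflow handling, and the disk I/O behind the cost formula are deliberately left out.

```python
from collections import defaultdict

def hash_join(r, s, n_buckets=4):
    """Join r and s on their first attribute: partition both, then build and probe."""
    # Partitioning phase: the same hash function is applied to both relations.
    h_r = [[] for _ in range(n_buckets)]
    h_s = [[] for _ in range(n_buckets)]
    for tr in r:
        h_r[hash(tr[0]) % n_buckets].append(tr)
    for ts in s:
        h_s[hash(ts[0]) % n_buckets].append(ts)

    out = []
    # Join phase: only partitions with the same index can contain matches.
    for part_r, part_s in zip(h_r, h_s):
        table = defaultdict(list)          # in-memory hash table on Hs-i
        for ts in part_s:
            table[ts[0]].append(ts)
        for tr in part_r:                  # probe with each tuple of Hr-i
            for ts in table.get(tr[0], []):
                out.append((tr, ts))
    return out

r = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
s = [(2, "x"), (4, "y"), (4, "z"), (7, "w")]
joined = hash_join(r, s)
print(sorted(joined))
```

Unlike sort-merge, the output order here depends on how tuples fall into buckets, so the example sorts the result before printing.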

Q-opt steps: bring the query into internal form (e.g., a parse tree); bring it into ‘canonical form’ (syntactic q-opt); generate alternative plans, covering selections, sorting, projections and aggregations, and joins (2-way joins: output size estimation, algorithms; n-way joins); estimate cost; pick best.

n-way joins: r1 JOIN r2 JOIN ... JOIN rn. Typically, break the problem into 2-way joins.

Structure of query optimizers: System R: break the query into query blocks. Simple queries (i.e., no joins): look at stats. n-way joins: left-deep join trees, i.e., only one intermediate result at a time; pros: smaller search space, pipelining; cons: may miss the optimal plan. 2-way joins: NL and sort-merge. [Figure: left-deep join tree over r1, r2, r3, r4.]
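The ‘smaller search space’ argument for left-deep trees can be made concrete by counting plans. The sketch below uses standard counting results and assumes that operand order matters (r1 JOIN r2 is counted separately from r2 JOIN r1); the function names are illustrative, and this is a counting exercise, not a description of how System R enumerates plans.

```python
from math import factorial

def left_deep_orders(n):
    """Left-deep trees over n relations: one per ordering of the relations."""
    return factorial(n)

def all_join_trees(n):
    """All binary join trees (bushy included) over n relations: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in (2, 4, 6, 8):
    print(n, left_deep_orders(n), all_join_trees(n))
```

For the four relations r1 through r4 this gives 24 left-deep orderings against 120 join trees overall, and the gap widens quickly as n grows.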

Structure of query optimizers: more heuristics by Oracle, Sybase and Starburst (-> DB2): in book. In general, q-opt is very important for large databases. (‘explain select <sql-statement>’ gives the plan.)

Q-opt steps: bring the query into internal form (e.g., a parse tree); bring it into ‘canonical form’ (syntactic q-opt); generate alternative plans, covering selections (simple; complex predicates), sorting, projections, and joins; estimate cost; pick best.