Query Evaluation and Optimization Main Steps 1.Translate into RA: select/project/join 2.Greedy optimization of RA: by pushing selection and projection.

Slides:



Advertisements
Similar presentations
Chapter 13: Query Processing
Advertisements

CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Copyright © 2011 Ramez Elmasri and Shamkant Navathe Algorithms for SELECT and JOIN Operations (8) Implementing the JOIN Operation: Join (EQUIJOIN, NATURAL.
1 40T1 60T2 30T3 10T4 20T5 10T6 60T7 40T8 20T9 R S C C R JOIN S?
CS CS4432: Database Systems II Operator Algorithms Chapter 15.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Query processing and optimization. Advanced DatabasesQuery processing and optimization2 Definitions Query processing –translation of query into low-level.
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
CSCI 5708: Query Processing II Pusheng Zhang University of Minnesota Feb 5, 2004.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Query Processing.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Query Processing (overview)
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Query Processing and Optimization
Query Optimization. General Overview Relational model - SQL  Formal & commercial query languages Functional Dependencies Normalization Physical Design.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
1 40T1 60T2 30T3 10T4 20T5 10T6 60T7 40T8 20T9 R S C C R JOIN S?
©Silberschatz, Korth and Sudarshan14.1Database System Concepts 3 rd Edition Chapter 14: Query Optimization Overview Catalog Information for Cost Estimation.
CS 4432query processing - lecture 171 CS4432: Database Systems II Lecture #17 Join Processing Algorithms (cont). Professor Elke A. Rundensteiner.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
1 Implementation of Relational Operations: Joins.
Dr. Kalpakis CMSC 461, Database Management Systems Query Processing.
Query Processing Chapter 12
COMP 5138 Relational Database Management Systems Semester 2, 2007 Lecture 12 Query Processing and Optimization.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts 3 rd Edition Chapter 12: Query Processing  Overview  Catalog Information for Cost Estimation.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
©Silberschatz, Korth and Sudarshan7.1 Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join Operation Other Operations.
12.1Database System Concepts - 6 th Edition Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Join Operation Sorting 、 Other.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan Chapter 13: Query Processing.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Chapter 13: Query Processing Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join Operation Other Operations.
Computing & Information Sciences Kansas State University Tuesday, 03 Apr 2007CIS 560: Database System Concepts Lecture 29 of 42 Tuesday, 03 April 2007.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
CPS216: Data-Intensive Computing Systems Query Execution (Sort and Join operators) Shivnath Babu.
Lecture 1- Query Processing Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
CPT-S Topics in Computer Science Big Data 1 1 Yinghui Wu EME 49.
Chapter 12 Query Processing. Query Processing n Selection Operation n Sorting n Join Operation n Other Operations n Evaluation of Expressions 2.
Chapter 13: Query Processing
Lecture 15- Parallel Databases (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
CSCI 5708: Query Processing II Pusheng Zhang University of Minnesota Feb 5, 2004.
Computing & Information Sciences Kansas State University Monday, 03 Nov 2008CIS 560: Database System Concepts Lecture 27 of 42 Monday, 03 November 2008.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Query Processing CS 405G Introduction to Database Systems.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
Lecture 3 - Query Processing (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
Computing & Information Sciences Kansas State University Wednesday, 08 Nov 2006CIS 560: Database System Concepts Lecture 32 of 42 Monday, 06 November 2006.
More Optimization Exercises. Block Nested Loops Join Suppose there are B buffer pages Cost: M + ceil (M/(B-2))*N where –M is the number of pages of R.
13.1 Chapter 13: Query Processing n Overview n Measures of Query Cost n Selection Operation n Sorting n Join Operation n Other Operations n Evaluation.
Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
Chapter 13: Query Processing. Overview Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions.
File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li.
CS 540 Database Management Systems
Query Processing and Query Optimization Database System Implementation CSE 507 Some slides adapted from Silberschatz, Korth and Sudarshan Database System.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Database Management System
Chapter 12: Query Processing
Evaluation of Relational Operations: Other Operations
File Processing : Query Processing
File Processing : Query Processing
CS143:Evaluation and Optimization
Chapters 15 and 16b: Query Optimization
Lecture 2- Query Processing (continued)
Sorting We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each.
Evaluation of Relational Operations: Other Techniques
Presentation transcript:

Query Evaluation and Optimization Main Steps 1.Translate into RA: select/project/join 2.Greedy optimization of RA: by pushing selection and projection into RA expression 3.Select best implementation for each operator 4.Cost-based estimation and selection of best join order 1

Greedy optimization of RA: pushing selection and projection into RA expression 2

Selection and Indexing If we have a S.C condition supported by an existing index we use the index If we have a conjunction, such as S.A>5 and S.B <10 with indexes on both, then we can select the better of the two (optimization) If there is no index, do we create one? 3

Selecting best implementation for each operator Selection: use an index if appropriate is available. Otherwise visit all the blocks Projection: trivial, unless we need to eliminate duplicates—using sorting or hashing. Joins –Nested loop, reasonable only with indexes –Nested Block Join –Sort-Merge Join –Has Join 4

5 40T1 60T2 30T3 10T4 20T5 10T6 60T7 40T8 20T9 R S C C R JOIN S?

6 For each r  R do For each s  S do if r.C = s.C then output r,s pair 40T1 60T2 30T3 10T4 20T5 10T6 60T7 40T8 20T9 R S Nested-Loop Join (NLJ) R is called the outer relation and S the inner relation of the join. Requires no indices and can be used with any kind of join condition. Expensive since it examines every pair of tuples in the two relations.

Nested-Loop Join (Cont.) In the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is n R  b S + b R block transfers If the smaller relation fits entirely in memory, use that as the inner relation. – Reduces cost to b r + b s block transfers and 2 seeks Block nested-loops algorithm (next slide) is preferable.

Block Nested-Loop Join Variant of nested-loop join in which every block of inner relation is paired with every block of outer relation. for each block B r of r do begin for each block B s of s do begin for each tuple t r in B r do begin for each tuple t s in B s do begin Check if (t r,t s ) satisfy the join condition if they do, add t r t s to the result. end end end end

Block Nested-Loop Join (Cont.) Worst case estimate: b R  b S + b R block transfers –Each block in the inner relation S is read once for each block in the outer relation R. Best case: b R + b S block transfers. Many Improvements proposed to nested loop and block nested loop algorithms including: –Scan inner loop forward and backward alternately, to make use of the blocks remaining in buffer (with LRU replacement) –Use index on inner relation if available (next slide) E.g.Compute student takes, with R=student as the outer relation, S= takes the inner relation, –Number of records of student: 5,000 takes: 10,000 –Number of blocks of student: 100 takes: 400

10 (1)Create an index for S.C if one is not already there (2)For each r  R do X  index-lookup(S.C, r.C) For each s  X, output (r,s) 40T1 60T2 30T3 10T4 20T5 10T6 60T7 40T8 20T9 R S Index Join (IJ)

Merge-Join 11

12 –If the two tables fit in memory we can use bubble sorting –External sorting must be used otherwise. This is the litmus test for big data processing , 4.27 TB/min Apache Spark 100 TB in 1,406 seconds 207 Amazon EC2 i2.8xlarge nodes x (32 vCores - 2.5Ghz Intel Xeon E v2, 244GB memory, 8x800 GB SSD) Reynold Xin, Parviz Deyhim, Xiangrui Meng, Ali Ghodsi, Matei Zaharia Databricks Sort-Merge Join (SMJ)

13

14 Hash function h(v), range 1  k Algorithm (1)Hashing stage (bucketizing): hash tuples into buckets Hash R tuples into G1,…,Gk buckets Hash S tuples into H1,…,Hk buckets (2)Join stage: join tuples in matching buckets For i = 1 to k do match tuples in Gi, Hi buckets R … S … … Hash Join (HJ)

15 H(k) = k mod R S 40T1 60T2 30T3 10T4 20T5 10T6 60T7 40T8 20T9 Hash Join (HJ)

16 Step (1): Hashing stage Step (2): Join stage... Memory buckets G1 G2 Gk R S... Gi memory... Hi R H1 H2 G1 G2 G3 Hash Join (HJ)

Summary of Join Algorithms Nested-loop join ok for “small” relations (relative to memory size) Hash join usually best for equi-join if relations not sorted and no index Merge join for sorted relations Sort merge join good for non-equi-join Consider index join if index exists DBMS maintains statistics on data 17

Query Evaluation and Optimization Main Steps 1.Translate into RA: select/project/join 2.Greedy optimization of RA: by pushing selection and projection into RA expression 3.Select best implementation for each operator 4.Cost-based estimation and selection of best join order 18

©Silberschatz, Korth and Sudarshan1.19Database System Concepts - 6 th Edition Statistics collection commands DBMS has to collect statistics on tables/indexes – For optimal performance – Without stats, DBMS does stupid things… DB2 – RUNSTATS ON TABLE. AND INDEXES ALL Oracle – ANALYZE TABLE COMPUTE STATISTICS – ANALYZE TABLE ESTIMATE STATISTICS (cheaper than COMPUTE) Run the command after major update/index construction

©Silberschatz, Korth and Sudarshan1.20Database System Concepts - 6 th Edition Join Ordering Example For all relations r 1, r 2, and r 3, (r 1 r 2 ) r 3 = r 1 (r 2 r 3 ) (Join Associativity) If r 2 r 3 is quite large and r 1 r 2 is small, we choose (r 1 r 2 ) r 3 so that we compute and store a smaller temporary relation.

©Silberschatz, Korth and Sudarshan1.21Database System Concepts - 6 th Edition Estimation of the Size of Joins (Cont.) If R  S = {A} is not a key for R or S. If we assume that every tuple t in R produces tuples in R S, the number of tuples in R S is estimated to be: If the reverse is true, the estimate obtained will be: The lower of these two estimates is probably the more accurate one. Can improve on above if histograms are available Use formula similar to above, for each cell of histograms on the two relations V(A,s), V(A,r) the average number of A-values in s and r A special case: foreign keys

Statistics collection commands DBMS has to collect statistics on tables/indexes – For optimal performance – Without stats, DBMS does stupid things… DB2 – RUNSTATS ON TABLE. AND INDEXES ALL Oracle – ANALYZE TABLE COMPUTE STATISTICS – ANALYZE TABLE ESTIMATE STATISTICS (cheaper than COMPUTE) Run the command after major update/index construction