Query Processing & Optimization

Slides:



Advertisements
Similar presentations
Equality Join R X R.A=S.B S : : Relation R M PagesN Pages Relation S Pr records per page Ps records per page.
Advertisements

Chapter 15 Algorithms for Query Processing and Optimization Copyright © 2004 Pearson Education, Inc.
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
CS 540 Database Management Systems
Hashing and Indexing John Ortiz.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Advanced Databases: Lecture 2 Query Optimization (I) 1 Query Optimization (introduction to query processing) Advanced Databases By Dr. Akhtar Ali.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
SPRING 2004CENG 3521 Query Evaluation Chapters 12, 14.
Query processing and optimization. Advanced DatabasesQuery processing and optimization2 Definitions Query processing –translation of query into low-level.
CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Query Processing (overview)
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
ACS-4902 Ron McFadyen Chapter 15 Algorithms for Query Processing and Optimization.
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
ICS (072)Query Processing and Optimization 1 Chapter 15 Algorithms for Query Processing and Optimization ICS 424 Advanced Database Systems Dr.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
1 Query Processing: The Basics Chapter Topics How does DBMS compute the result of a SQL queries? The most often executed operations: –Sort –Projection,
Bitmap Indexes.
Chapter 19 Query Processing and Optimization
Access Path Selection in a Relation Database Management System (summarized in section 2)
CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.
Database Management 9. course. Execution of queries.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Chapter 13 Query Processing Melissa Jamili CS 157B November 11, 2004.
Department of Computer Science and Engineering, HKUST Slide Query Processing and Optimization Query Processing and Optimization.
Query Optimization Chap. 19. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying where.
Query Processing and Optimization
Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.
Advanced Databases: Lecture 6 Query Optimization (I) 1 Introduction to query processing + Implementing Relational Algebra Advanced Databases By Dr. Akhtar.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations: Other Operations Chapter 14 Ramakrishnan & Gehrke (Sections ; )
12.1Database System Concepts - 6 th Edition Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Join Operation Sorting 、 Other.
SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan Chapter 13: Query Processing.
Computing & Information Sciences Kansas State University Tuesday, 03 Apr 2007CIS 560: Database System Concepts Lecture 29 of 42 Tuesday, 03 April 2007.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Lecture 1- Query Processing Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
Chapter 12 Query Processing. Query Processing n Selection Operation n Sorting n Join Operation n Other Operations n Evaluation of Expressions 2.
Chapter 12 Query Processing (1) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
Introduction to Query Optimization, R. Ramakrishnan and J. Gehrke 1 Introduction to Query Optimization Chapter 13.
Query Processing – Query Trees. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying.
Advance Database Systems Query Optimization Ch 15 Department of Computer Science The University of Lahore.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li.
Query Processing – Implementing Set Operations and Joins Chap. 19.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
CS 540 Database Management Systems
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
CS4432: Database Systems II Query Processing- Part 1 1.
Database Management System
Chapter 12: Query Processing
Lecture 16: Relational Operators
COST ESTIMATION FOR THE RELATIONAL ALGEBRA OPERATIONS MIT 813 GROUP 15 PRESENTATION.
Database Management Systems (CS 564)
Evaluation of Relational Operations: Other Operations
File Processing : Query Processing
File Processing : Query Processing
Query Processing B.Ramamurthy Chapter 12 11/27/2018 B.Ramamurthy.
Lecture 2- Query Processing (continued)
Advance Database Systems
Chapter 12 Query Processing (1)
Implementation of Relational Operations
Evaluation of Relational Operations: Other Techniques
External Sorting Sorting is used in implementing many relational operations Problem: Relations are typically large, do not fit in main memory So cannot.
Yan Huang - CSCI5330 Database Implementation – Query Processing
Evaluation of Relational Operations: Other Techniques
Presentation transcript:

Query Processing & Optimization John Ortiz

Query Processing & Optimization Terms DBMS has algorithms to implement relational algebra expressions SQL is a different kind of high level language; specify what is wanted, not how it is obtained Optimization – not necessarily “optimal”, but reasonably efficient Techniques: Heuristic rules Cost estimation Lecture 19 Query Processing & Optimization

Query Evaluation Process Internal representation Query Scanner Parser Execution Strategies DBMS Data Answer Optimizer Runtime Database Processor Execution plan Code Generator Lecture 19 Query Processing & Optimization

Query Processing & Optimization An Example Query: Select B,D From R,S Where R.A = “c” and S.E = 2 and R.C=S.C R S Answer Lecture 19 Query Processing & Optimization

Query Processing & Optimization An Example (cont.) Plan 1 Cross product of R & S Select tuples using WHERE conditions Project on B & D Algebra expression B,D R.A=‘c’ S.E=2 R.C=S.C  R S B,D(R.A=‘c’ S.E=2 R.C=S.C (R S)) Lecture 19 Query Processing & Optimization

Query Processing & Optimization An Example (cont.) Plan 2 Select R tuples with R.A=“c” Select S tuples with S.E=2 Natural join Project B & D Algebra expression B,D S.E=2 R S R.A=‘c’ B,D( R.A=“c” (R) S.E=2 (S)) Lecture 19 Query Processing & Optimization

Query Processing & Optimization Query Evaluation How to evaluate individual relational operation? Selection: find a subset of rows in a table Join: connecting tuples from two tables Other operations: union, projection, … How to estimate cost of individual operation? How does available buffer affect the cost? How to evaluate a relational algebraic expression? Lecture 19 Query Processing & Optimization

Query Processing & Optimization Cost of Operations Cost = I/O cost + CPU cost I/O cost: # pages (reads & writes) or # operations (multiple pages) CPU cost: # comparisons or # tuples processed I/O cost dominates (for large databases) Cost depends on Types of query conditions Availability of fast access paths DBMSs keep statistics for cost estimation Lecture 19 Query Processing & Optimization

Query Processing & Optimization Notations Used to describe the cost of operations. Relations: R, S nR: # tuples in R, nS: # tuples in S bR: # pages in R dist(R.A) : # distinct values in R.A min(R.A) : smallest value in R.A max(R.A) : largest value in R.A HI: # index pages accessed (B+ tree height?) Lecture 19 Query Processing & Optimization

Query Processing & Optimization Simple Selection Simple selection: A op a(R) A is a single attribute, a is a constant, op is one of =, , <, , >, . Do not further discuss  because it requires a sequential scan of table. How many tuples will be selected? Selectivity Factor (SFA op a(R)) : Fraction of tuples of R satisfying “A op a” 0  SFA op a(R)  1 # tuples selected: NS = nR  SFA op a(R) Lecture 19 Query Processing & Optimization

Options of Simple Selection Sequential (linear) Scan General condition: cost = bR Equality on key: average cost = bR / 2 Binary Search Records are stored in sorted order Equality on key: cost = log2(bR) Equality on non-key (duplicates allowed) cost = log2(bR) + NS/bfR - 1 = sorted search time + selected – first one Lecture 19 Query Processing & Optimization

Selection Using Indexes Use index Search index to find pointers (or RecID) Follow pointers to retrieve records Cost = cost of searching index + cost of retrieving data Equality on primary index: Cost = HI + 1 Equality on clustering index: Cost = HI + NS/bfR Equality on secondary index: Cost = HI + NS Range conditions are more complex Lecture 19 Query Processing & Optimization

Example: Cost of Selection Relation: R(A, B, C) nR = 10000 tuples bfR = 20 tuples/page dist(A) = 50, dist(B) = 500 B+ tree clustering index on A with order 25 (p=25) B+ tree secondary index on B w/ order 25 Query: select * from R where A = a1 and B = b1 Relational Algebra: A=a1  B=b1 (R) Lecture 19 Query Processing & Optimization

Example: Cost of Selection (cont.) Option 1: Sequential Scan Have to go thru the entire relation Cost = bR = 10000/20 = 500 Option 2: Binary Search using A = a It is sorted on A (why?) NS = 10000/50 = 200 assuming equal distribution Cost = log2(bR) + NS/bfR - 1 = log2(500) + 200/20 - 1 = 18 Lecture 19 Query Processing & Optimization

Example: Cost of Selection (cont.) Option 3: Use index on R.A: Average order of B+ tree = (P + .5P)/2 = 19 Leaf nodes have 18 entries, internal nodes have 19 pointers # leaf nodes = 50/18 = 3 # nodes next level = 1 HI = 2 Cost = HI + NS/bfR = 2 + 200/20 = 12 Lecture 19 Query Processing & Optimization

Example: Cost of Selection (cont.) Option 4: Use index on R.B Average order = 19 NS = 10000/500 = 20 Use Option I (allow duplicate keys) # nodes 1st level = 10000/18 = 556 (leaf) # nodes 2nd level = 556/19 = 29 (internal) # nodes 3rd level = 29/19 = 2 (internal) # nodes 4th level = 1 HI = 4 Cost = HI + NS = 24 Lecture 19 Query Processing & Optimization

Query Processing & Optimization Summary: Selection Many different implementations. Sequential scan works always Binary search needs a sorted file Index is effective for highly selective condition Primary or clustering indexes often give good performance For general selection, working on RecID lists before retrieving data records gives better performance. Lecture 19 Query Processing & Optimization

Query Processing & Optimization Join Consider only equijoin R R.A = S.B S. Options: Cross product followed by selection R R.A = S.B S and S S.B = R.A R Nested loop join Block-based nested loop join Indexed nested loop join Merge join Hash join Lecture 19 Query Processing & Optimization

Query Processing & Optimization Cost of Join Cost = # I/O reading R & S + # I/O writing result Additional notation: M: # buffer pages available to join operation LB: # leaf blocks in B+ tree index Limitation of cost estimation Ignoring CPU costs Ignoring timing Ignoring double buffering requirements Lecture 19 Query Processing & Optimization

Estimate Size of Join Result How many tuples in join result? Cross product (special case of join) NJ = nR  nS R.A is a foreign key referencing S.B NJ = nR (assume no null value) S.B is a foreign key referencing R.A NJ = nS (assume no null value) Both R.A & S.B are non-key Lecture 19 Query Processing & Optimization

Estimate Size of Join Result (cont.) How wide is a tuple in join result? Natural join: W = W(R) + W(S) – W(SR) Theta join: W = W(R) + W(S) What is blocking factor of join result? bfJoin = block size / W How many blocks does join result have? bJoin = NJ / bfJoin Lecture 19 Query Processing & Optimization

Block-based Nested Loop Join for each block PR of R for each block PS of S for each tuple r in PR for each tuple s in PS if r[A] == s[B] then add (r, s) to join result Lecture 19 Query Processing & Optimization

Cost of Nested Loop Join R MR S Cost of Writing Buffer M=MR+MS+1 MS Result # I/O pages: Cost = bR + (bR/MR)  bS + bJoin # I/O ops = bR/MR+(bR/MR)(bS/MS) + bJoin Lecture 19 Query Processing & Optimization

Cost of Nested Loop Join (cont.) Assume bR = 100000 pg, bS = 1000 pg For simplicity, ignore cost of writing result R as outer relation Cost = 100000 + 100000*1000 = 100100000 What if S as outer relation? Cost = 1000 + 1000*100000 = 100001000 Smaller relation should be the outer relation Rocking scan (back & forth) inner relation Cost = 1000 + 1000*(100000-1) + 1 = 100000001 Does not matter which is outer relation Lecture 19 Query Processing & Optimization

Query Processing & Optimization Query Optimization SQL Query Plan execution Answer parser Parse tree preprocessor Logic plan Choose plan Pi Alg. trans. Better LP Est. cost {(P1,C1), (P2, C2), … } LP + size Est. result size Phy. plan gen. {P1, P2, … Pn} Lecture 19 Query Processing & Optimization

Query Processing & Optimization Example: SQL query fk Students(SID, Name, GPA, Age, Advisor) Professors(PID, Name, Dept) select Name from Students where Advisor in ( select PID from Professors where Dept = “Computer Science”); Lecture 19 Query Processing & Optimization

select <SelList> from <FromList> where <Condition> Example: Parse Tree <Query> <SFW> select <SelList> from <FromList> where <Condition> <Attribute> <RelName> <Tuple> in <Query> Name Students <Attribute> ( <Query> ) Advisor <SFW> select <SelList> from <FromList> where <Condition> <Attribute> <RelName> <Attribute> = <Pattern> PID Professors Dept “Computer Science” Lecture 19 Query Processing & Optimization

Example: Generating Rel. Algebra Use a two-argument selection to handle subquery Name  Students <condition> <tuple> in PID <attribute> Dept=“Computer Science” Advisor Professors Lecture 19 Query Processing & Optimization

Example: A Logical Plan Name  Students PID Dept=“Computer Science” Professors Advisor=PID Replace IN with cross product followed by selection Lecture 19 Query Processing & Optimization

Example: Improve Logical Plan Name Students PID Dept=“Computer Science” Professors Advisor=PID Transfer cross product followed by selection into a join Lecture 19 Query Processing & Optimization

Example: Estimate Result Size Name Students PID Dept=“Computer Science” Professors Advisor=PID Need to estimate size here Lecture 19 Query Processing & Optimization

Example: A Physical plan Hash join SEQ scan index scan Parameters: Join order, buffer size Project attributes, … Students Professors Select Condition,... Also specify pipelining, one or two pass algorithm, which index to use, … Lecture 19 Query Processing & Optimization

Summary: Query Optimization Important task of DBMSs Goal is to minimize # I/O blocks Search space of execution plans is huge Heuristics based on algebraic transformation lead to good logical plan, but no guarantee of optimal plan Space of physical plans is reduced by considering left-deep plans, and search methods that use estimated cost to prune plans Need better statistics, estimation methods, … Lecture 19 Query Processing & Optimization