CPS216: Advanced Database Systems Notes 07:Query Execution Shivnath Babu.



Query Processing - In class order
SQL query -> parse -> parse tree -> query rewriting -> logical query plan -> physical plan generation (uses statistics) -> physical query plan -> execute -> result
Class order and textbook sections: 2; ; 16.2,16.3 1; 13, 15 4; 16.4-16.7

Roadmap
Path of a SQL query
Plans
–Operator trees
–Physical vs. logical plans
–Plumbing: materialization vs. pipelining

Modern DBMS Architecture (figure): Applications -> SQL -> DBMS [Parser -> logical query plan -> Query Optimizer -> physical query plan -> Query Executor -> access method API calls -> Storage Manager] -> file system API calls -> OS -> storage system API calls -> Disk(s)

Logical Plans vs. Physical Plans
Best logical plan: π B,D (σ R.A = “c” (R) ⋈ S), where ⋈ is a natural join
Physical plan: Project over a Hash join of an Index scan (R) and a Table scan (S)

Operator Plumbing (plan: π B,D (σ R.A = “c” (R) ⋈ S))
Materialization: the output of one operator is written to disk, and the next operator reads it from disk
Pipelining: the output of one operator is fed directly to the next operator

Materialization (figure): in the plan π B,D (σ R.A = “c” (R) ⋈ S), the intermediate result is written to disk at the point marked "Materialized here"

Iterators: Pipelining (plan: π B,D (σ R.A = “c” (R) ⋈ S))
Each operator supports: Open(), GetNext(), Close()
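The three-call interface above can be sketched in Python. This is only an illustration: the names `Operator`, `ListScan`, `run`, and the `EOT` sentinel object are assumptions made for the sketch, not part of the slides.

```python
from abc import ABC, abstractmethod

EOT = object()  # hypothetical end-of-tuples marker (the slides call it EOT)


class Operator(ABC):
    """Iterator interface: every physical operator supports these calls."""

    @abstractmethod
    def open(self):
        """Initialize state (and open any children)."""

    @abstractmethod
    def get_next(self):
        """Return the next output tuple, or EOT when exhausted."""

    @abstractmethod
    def close(self):
        """Release state (and close any children)."""


class ListScan(Operator):
    """Hypothetical leaf operator over an in-memory list of tuples."""

    def __init__(self, tuples):
        self.tuples = tuples
        self.pos = 0

    def open(self):
        self.pos = 0

    def get_next(self):
        if self.pos >= len(self.tuples):
            return EOT
        t = self.tuples[self.pos]
        self.pos += 1
        return t

    def close(self):
        pass


def run(plan):
    """Drive a plan from its root: open, pull tuples until EOT, close."""
    plan.open()
    out = []
    while True:
        t = plan.get_next()
        if t is EOT:
            break
        out.append(t)
    plan.close()
    return out
```

The consumer pulls one tuple at a time from the root, so no intermediate result has to be written to disk; that is what makes the plan pipelined.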

Iterator for Table Scan (R)
  Open() {
    /** initialize variables */
    b = first block of R;
    t = first tuple in block b;
  }
  GetNext() {
    IF (t is past last tuple in block b) {
      set b to next block;
      IF (there is no next block)
        /** no more tuples */
        RETURN EOT;
      ELSE
        t = first tuple in b;
    }
    /** return current tuple */
    oldt = t;
    set t to next tuple in block b;
    RETURN oldt;
  }
  Close() {
    /** nothing to be done */
  }
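A runnable Python sketch of the table-scan iterator, assuming the relation is held as a list of blocks where each block is a list of tuples. The `blocks_read` counter is an addition for illustration (a stand-in for disk I/Os), and unlike the pseudocode this version also skips over empty blocks.

```python
EOT = object()  # hypothetical end-of-tuples marker


class TableScan:
    """Table scan over a relation stored as a list of blocks (assumption)."""

    def __init__(self, blocks):
        self.blocks = blocks   # e.g. [[t1, t2], [t3]]
        self.b = 0             # index of the current block
        self.t = 0             # index of the current tuple within the block
        self.blocks_read = 0   # illustrative I/O counter, not in the slides

    def open(self):
        # Position on the first tuple of the first block.
        self.b = 0
        self.t = 0
        self.blocks_read = 1 if self.blocks else 0

    def get_next(self):
        # Move forward while the current block is exhausted (or empty).
        while self.b < len(self.blocks) and self.t >= len(self.blocks[self.b]):
            self.b += 1
            self.t = 0
            if self.b < len(self.blocks):
                self.blocks_read += 1
        if self.b >= len(self.blocks):
            return EOT  # no more tuples
        old_t = self.blocks[self.b][self.t]  # current tuple
        self.t += 1                          # advance to the next tuple
        return old_t

    def close(self):
        pass  # nothing to be done
```

One full scan touches each block once, which is why a table scan of R is usually charged B(R) I/Os.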

Iterator for Select (σ R.A = “c”)
  Open() {
    /** initialize child */
    Child.Open();
  }
  GetNext() {
    LOOP:
      t = Child.GetNext();
      IF (t == EOT) {
        /** no more tuples */
        RETURN EOT;
      }
      ELSE IF (t.A == “c”)
        RETURN t;
    ENDLOOP
  }
  Close() {
    /** inform child */
    Child.Close();
  }
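The select iterator translates almost line for line into Python. In this sketch tuples are dictionaries and the child is a small in-memory `Scan`; both choices, and the `predicate` parameter, are assumptions made for illustration.

```python
EOT = object()  # hypothetical end-of-tuples marker


class Scan:
    """Hypothetical leaf operator over an in-memory list of tuples."""

    def __init__(self, tuples):
        self.tuples = tuples
        self.pos = 0

    def open(self):
        self.pos = 0

    def get_next(self):
        if self.pos >= len(self.tuples):
            return EOT
        t = self.tuples[self.pos]
        self.pos += 1
        return t

    def close(self):
        pass


class Select:
    """Pass through only the child tuples that satisfy the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()  # initialize child

    def get_next(self):
        while True:
            t = self.child.get_next()
            if t is EOT:
                return EOT  # no more tuples
            if self.predicate(t):
                return t

    def close(self):
        self.child.close()  # inform child


# Usage: sigma R.A = "c" over a three-tuple relation R.
R = [{"A": "c", "B": 1}, {"A": "d", "B": 2}, {"A": "c", "B": 3}]
plan = Select(Scan(R), lambda t: t["A"] == "c")
plan.open()
first = plan.get_next()
second = plan.get_next()
third = plan.get_next()
plan.close()
```

Note that one GetNext() call on Select may trigger several GetNext() calls on its child, one for each tuple that fails the predicate.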

Iterator for Nested Loop Join (Lexp ⋈ Rexp)
NLJ (conceptually):
  for each r ∈ Lexp do
    for each s ∈ Rexp do
      if r.C = s.C, output r,s
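The conceptual double loop has to be turned "inside out" to fit the Open/GetNext/Close interface, because GetNext() must return after each match and resume where it left off. One possible sketch follows; the class names, the `match` parameter, and the choice to rescan the inner input with Close() followed by Open() are assumptions, not something the slides specify.

```python
EOT = object()  # hypothetical end-of-tuples marker


class Scan:
    """Hypothetical leaf operator over an in-memory list of tuples."""

    def __init__(self, tuples):
        self.tuples = tuples
        self.pos = 0

    def open(self):
        self.pos = 0

    def get_next(self):
        if self.pos >= len(self.tuples):
            return EOT
        t = self.tuples[self.pos]
        self.pos += 1
        return t

    def close(self):
        pass


class NestedLoopJoin:
    """Tuple-based nested loop join: left is the outer, right the inner."""

    def __init__(self, left, right, match):
        self.left = left
        self.right = right
        self.match = match  # join condition, e.g. r.C = s.C
        self.r = EOT        # current outer tuple

    def open(self):
        self.left.open()
        self.right.open()
        self.r = self.left.get_next()  # fetch the first outer tuple

    def get_next(self):
        while self.r is not EOT:
            s = self.right.get_next()
            if s is EOT:
                # Inner exhausted: rescan it and advance the outer.
                self.right.close()
                self.right.open()
                self.r = self.left.get_next()
                continue
            if self.match(self.r, s):
                return (self.r, s)
        return EOT

    def close(self):
        self.left.close()
        self.right.close()


# Usage: Lexp(A, C) joined with Rexp(C, D) on C.
lexp = [("a1", 1), ("a2", 2)]
rexp = [(1, "d1"), (2, "d2"), (1, "d3")]
join = NestedLoopJoin(Scan(lexp), Scan(rexp), lambda r, s: r[1] == s[0])
join.open()
results = []
while True:
    out = join.get_next()
    if out is EOT:
        break
    results.append(out)
join.close()
```

The state kept between calls is just the current outer tuple plus the inner child's own position, so the join itself needs very little memory.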

Iterator for Sort (sort on R.A)
  Open() {
    /** Bulk of the work is here */
    Child.Open();
    Read all tuples from Child and sort them
  }
  GetNext() {
    IF (more tuples)
      RETURN next tuple in order;
    ELSE
      RETURN EOT;
  }
  Close() {
    /** inform child */
    Child.Close();
  }
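A Python sketch of the sort iterator, assuming the whole input fits in memory so that Python's built-in `sorted` can stand in for the sort step; a real system would typically fall back to an external merge sort. The `Scan` helper and the `key` parameter are illustrative assumptions.

```python
EOT = object()  # hypothetical end-of-tuples marker


class Scan:
    """Hypothetical leaf operator over an in-memory list of tuples."""

    def __init__(self, tuples):
        self.tuples = tuples
        self.pos = 0

    def open(self):
        self.pos = 0

    def get_next(self):
        if self.pos >= len(self.tuples):
            return EOT
        t = self.tuples[self.pos]
        self.pos += 1
        return t

    def close(self):
        pass


class Sort:
    """Blocking operator: open() consumes the entire child before any output."""

    def __init__(self, child, key):
        self.child = child
        self.key = key
        self.buffer = []
        self.pos = 0

    def open(self):
        # Bulk of the work is here: drain the child, then sort in memory.
        self.child.open()
        tuples = []
        while True:
            t = self.child.get_next()
            if t is EOT:
                break
            tuples.append(t)
        self.buffer = sorted(tuples, key=self.key)
        self.pos = 0

    def get_next(self):
        if self.pos < len(self.buffer):
            t = self.buffer[self.pos]
            self.pos += 1
            return t
        return EOT

    def close(self):
        self.child.close()  # inform child
        self.buffer = []


# Usage: sort R on attribute A.
R = [{"A": 3}, {"A": 1}, {"A": 2}]
op = Sort(Scan(R), key=lambda t: t["A"])
op.open()
ordered = []
while True:
    t = op.get_next()
    if t is EOT:
        break
    ordered.append(t["A"])
op.close()
```

Because the first output tuple cannot be produced until the last input tuple has been seen, Sort breaks the pipeline below it even though it exposes the same iterator interface.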

Example 1: Left-Deep Plan
Plan: (R1(A,B) TNLJ R2(B,C)) TNLJ R3(C,D), where TNLJ is a tuple nested loop join and each relation is read with a TableScan
Question: What is the sequence of getNext() calls?

Example 1 (contd.)
Assume statistics:
–B(R1) = 1000 blocks, T(R1) = 10,000 tuples
–B(R2) = 500 blocks, T(R2) = 5000 tuples
–B(R3) = 1000 blocks, T(R3) = 10,000 tuples
–Let X = R1 ⋈ R2 on R1.B = R2.B
–T(X) = 1,000,000 tuples, B(X) = 200,000 blocks
–Let Output = 1000 tuples
Questions:
–Number of getNext() calls?
–Number of disk I/Os?
–Assume we have 1000 blocks of memory, how can we improve the plan?
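One way to approach the disk I/O question is a back-of-the-envelope estimate. The sketch below is one possible reading, not the instructor's answer. It assumes that the outer input of each join is scanned once, that the inner relation is rescanned from disk in full for every outer tuple (no buffering between rescans), that X is pipelined into the second join rather than written to disk, and that writing the final output is not counted. The function name `tnlj_io` is hypothetical.

```python
def tnlj_io(outer_blocks, outer_tuples, inner_blocks, outer_pipelined=False):
    """Estimated disk I/Os for a tuple nested loop join (assumptions above).

    The outer is scanned once (free if it arrives through a pipeline),
    and the inner is scanned once per outer tuple.
    """
    outer_cost = 0 if outer_pipelined else outer_blocks
    return outer_cost + outer_tuples * inner_blocks


# Statistics from the slide.
B_R1, T_R1 = 1000, 10_000
B_R2 = 500
B_R3 = 1000
B_X, T_X = 200_000, 1_000_000

lower_join = tnlj_io(B_R1, T_R1, B_R2)                       # R1 TNLJ R2
upper_join = tnlj_io(B_X, T_X, B_R3, outer_pipelined=True)   # X TNLJ R3
total = lower_join + upper_join

print(lower_join)  # 5001000
print(upper_join)  # 1000000000
print(total)       # 1005001000
```

Under these assumptions the rescans of R3 dominate, which suggests where extra memory would help most: both R2 (500 blocks) and R3 (1000 blocks) are small relative to the number of times they are reread.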

Example 2: Right-Deep Plan
Plan: R3(C,D) TNLJ (R1(A,B) TNLJ R2(B,C)), where each relation is read with a TableScan
Question: What is the sequence of getNext() calls?

Example 2 (contd.)
Assume statistics:
–B(R1) = 1000 blocks, T(R1) = 10,000 tuples
–B(R2) = 500 blocks, T(R2) = 5000 tuples
–B(R3) = 1000 blocks, T(R3) = 10,000 tuples
–Let X = R1 ⋈ R2 on R1.B = R2.B
–T(X) = 1,000,000 tuples, B(X) = 200,000 blocks
–Let Output = 1000 tuples
Questions:
–Number of getNext() calls?
–Number of disk I/Os?
–Assume we have 1000 blocks of memory, how can we improve the plan?

Questions to think about
–What "shape" of plan works best for nested loop joins: 'left deep' (Example 1) or 'right deep' (Example 2)?
–Will sorting help for nested loop join? (Hint: think about clustered vs. unclustered indexes)
–Can materialization help for nested loop join?
–Generalize Example 1 (and 2) to 'n' relations: What is the optimal use of M blocks of memory? (I don't know the answer :-) )

Example 3: Hash-Join Plan
Plan: (R1(A,B) HJ R2(B,C)) HJ R3(C,D), where HJ is a hash join and each relation is read with a TableScan

Example 3 (contd.)
Naive materialization:
–Compute the hash join of R1 and R2 (call the result X)
–Write output X to disk
–The (outer) hash join reads X (table scan), reads R3, and performs the hash join
What is the cost of naive materialization?
Suggest an improved processing strategy that saves 2 B(X) I/Os from the above cost
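A hedged sketch of the arithmetic behind these two questions. It assumes the common textbook cost of 3(B(R) + B(S)) I/Os for a two-pass partitioned hash join (read both inputs, write the partitions, read the partitions back) and does not count writing the final output. The slide gives no statistics for Example 3, so the numbers are borrowed from Example 1 purely for illustration, and all function names are hypothetical.

```python
def two_pass_hash_join_io(b_left, b_right):
    """Read inputs, write partitions, read partitions: 3 * (B(R) + B(S))."""
    return 3 * (b_left + b_right)


def naive_materialization_io(b_r1, b_r2, b_x, b_r3):
    """Join R1 and R2, write X to disk, then hash-join X with R3 from scratch."""
    inner = two_pass_hash_join_io(b_r1, b_r2)
    write_x = b_x
    outer = two_pass_hash_join_io(b_x, b_r3)  # includes re-reading X
    return inner + write_x + outer


def partition_on_the_fly_io(b_r1, b_r2, b_x, b_r3):
    """Possible improvement: partition X tuples as the inner join produces them.

    X is then written once (as partitions) and read once, so it costs
    2 * B(X) instead of 4 * B(X). R3 still costs 3 * B(R3).
    """
    inner = two_pass_hash_join_io(b_r1, b_r2)
    return inner + 2 * b_x + 3 * b_r3


# Illustrative statistics borrowed from Example 1 (an assumption).
B_R1, B_R2, B_X, B_R3 = 1000, 500, 200_000, 1000

naive = naive_materialization_io(B_R1, B_R2, B_X, B_R3)
improved = partition_on_the_fly_io(B_R1, B_R2, B_X, B_R3)

print(naive)             # 807500
print(improved)          # 407500
print(naive - improved)  # 400000
```

The difference is exactly 2 B(X): the naive plan writes X and reads it back only to write it again as partitions, while the improved plan feeds each X tuple straight into the partitioning phase of the outer join.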

Example 3 (contd.) Can this be completely pipelined if you have limited memory? How much memory do you need to be able to pipeline this plan?

Questions to think about
–If you are designing/building the basic set of physical operators for your database system, would you implement the hash join as a single operator or as two operators: one that partitions and a second that joins?
–If you are designing physical operators for your database system, would you implement sort-merge join as a single operator or as two operators: one that sorts and one that merges?
–Think about pipelining vs. materialization issues in plans involving sort-merge joins