
Presentation transcript:

1 Optimization - Selection

2 The Selection Operation
Table: Reserves(sid, bid, day, agent)
A page (block) can hold 100 Reserves tuples.
There are 1,000 pages.
Compute the query: SELECT * FROM Reserves R WHERE R.agent = 'Joe'

3 General Problem
Compute: $\sigma_{A \text{ op } c}(R)$, where
–A is an attribute of R
–op is an operator, such as =, <, etc.
–c is a constant
We assume that there are M pages in R.

4 No Index, Unsorted Data
We can compute the selection by scanning the entire relation.
Cost: M (for the query on Reserves, cost = 1,000 I/Os)

5 No Index, Sorted Data
Suppose that the file in which R is stored is physically sorted on A. Then we can use binary search to find the first tuple matching the search condition, and scan forward from there to find all additional tuples that match.
Cost: log2(M) + #pages with tuples from the result (for the query on Reserves: about 10 + #pages with tuples from the result, since log2(1,000) ≈ 10)
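The two no-index cases can be compared with a small Python sketch. The function names and the `result_pages` parameter are illustrative assumptions, not part of the slides, and the binary-search term is rounded up to whole page reads.

```python
import math

def scan_cost(num_pages):
    """Unsorted file, no index: every page must be read once."""
    return num_pages

def sorted_file_cost(num_pages, result_pages):
    """Sorted file: binary search for the first match, then read the result pages."""
    return math.ceil(math.log2(num_pages)) + result_pages

print(scan_cost(1000))            # 1000
print(sorted_file_cost(1000, 1))  # 11, assuming Joe's tuples fit on one page
```

The second call assumes, purely for illustration, that all matching tuples sit on a single page.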

6 B+ Tree Index
Suppose that there is a B+ Tree index available on attribute A of R.
–Search the tree to find the first index entry that points to a tuple satisfying the condition.
–Scan the leaf pages of the index to find all entries whose key value satisfies the condition.
–Retrieve the satisfying tuples from the file.

7 Example
Selection condition: rname < 'C%'
Assume that names are uniformly distributed with respect to first letter.
About 2/26 ≈ 10% of the tuples = 10,000 tuples = 100 pages match the condition.
If the B+ Tree is clustered, we can return them in about 100 I/Os (plus a few to traverse the tree).
Otherwise, up to 10,000 I/Os may be needed!
–Can you improve the number of I/Os needed?
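The clustered and unclustered estimates in this example can be reproduced with a short Python sketch. The function name is hypothetical, `tree_ios=3` is an assumed stand-in for the "few" I/Os needed to traverse the tree, and the cost of scanning index leaf pages is ignored.

```python
import math

def btree_selection_cost(matching_tuples, tuples_per_page, clustered, tree_ios=3):
    """Estimated I/Os for a range selection through a B+ tree index.

    tree_ios is an assumption; leaf-page scanning is not counted here.
    """
    if clustered:
        # Matching tuples are stored contiguously, so each data page is read once.
        data_ios = math.ceil(matching_tuples / tuples_per_page)
    else:
        # Worst case: every matching tuple lives on a different page.
        data_ios = matching_tuples
    return tree_ios + data_ios

print(btree_selection_cost(10_000, 100, clustered=True))   # 103
print(btree_selection_cost(10_000, 100, clustered=False))  # 10003
```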

8 Hash Index, Equality Selection
Find the appropriate bucket (about 1-2 I/Os).
Retrieve the qualifying tuples from R.
–Time depends on whether the index is clustered.
Example: Consider the condition agent = 'Joe'. Suppose there is an unclustered hash index on agent, and suppose there are 100 reservations made by Joe.
Cost: between 1 and 100 I/Os to retrieve the tuples, depending on how many distinct pages they occupy.
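The "between 1 and 100" range follows from how the matching tuples are spread over pages. A minimal Python sketch, with a hypothetical function name, that counts only the data-page reads and leaves out the bucket lookup:

```python
import math

def hash_selection_data_ios(matches, tuples_per_page):
    """Return (best, worst) data-page I/Os for an equality selection via a hash index."""
    best = math.ceil(matches / tuples_per_page)  # all matches packed onto as few pages as possible
    worst = matches                              # every match on a different page
    return best, worst

print(hash_selection_data_ios(100, 100))  # (1, 100)
```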

9 Optimization - Join

10 The Join Operation
Compute the query: SELECT * FROM Reserves R, Sailors S WHERE R.sid = S.sid
Relation       Number of Pages   Tuples per Page
R (Reserves)   1,000 (M)         100 (pR)
S (Sailors)    500 (N)           80 (pS)

11 Block Nested Loops Join
Suppose there are B buffer pages.
foreach block of B-2 pages of R do
    foreach page of S do {
        for all matching in-memory pairs r, s:
            add <r, s> to result
    }
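The pseudocode above can be turned into a small in-memory simulation. This is a sketch under stated assumptions: each relation is modeled as a list of pages, each page as a list of tuples, and `r_key`/`s_key` are hypothetical parameters giving the position of the join attribute. No real disk I/O is performed.

```python
def block_nested_loops_join(r_pages, s_pages, buffer_pages, r_key, s_key):
    """Join R and S, holding buffer_pages - 2 pages of R in memory at a time."""
    if buffer_pages < 3:
        raise ValueError("need at least 3 buffer pages")
    block_size = buffer_pages - 2  # one page is reserved for S, one for output
    result = []
    for start in range(0, len(r_pages), block_size):
        # Load the next block of R pages into memory.
        block = [r for page in r_pages[start:start + block_size] for r in page]
        # Scan all of S once for this block.
        for s_page in s_pages:
            for s in s_page:
                for r in block:
                    if r[r_key] == s[s_key]:
                        result.append((r, s))
    return result

reserves = [[(28, "Joe"), (31, "Sam")], [(28, "Frank")]]   # (sid, agent), made-up rows
sailors = [[(22, "dustin"), (28, "yuppy")], [(31, "lubber")]]
print(len(block_nested_loops_join(reserves, sailors, 3, 0, 0)))  # 3
```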

12 Cost Analysis
R is read once: M
S is read ceil(M/(B-2)) times: ceil(M/(B-2))*N
Total: M + ceil(M/(B-2))*N
For our Sailors, Reserves example (B = 102):
Reserves is the outer relation:
–Cost = 1,000 + (1,000/100)*500 = 6,000 I/Os
Sailors is the outer relation:
–Cost = 500 + (500/100)*1,000 = 5,500 I/Os
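The formula M + ceil(M/(B-2))*N is easy to check for both join orders. A minimal Python sketch, with a hypothetical function name:

```python
import math

def block_nested_loops_cost(outer_pages, inner_pages, buffer_pages):
    """I/O cost: read the outer once, and the inner once per block of the outer."""
    blocks = math.ceil(outer_pages / (buffer_pages - 2))
    return outer_pages + blocks * inner_pages

print(block_nested_loops_cost(1000, 500, 102))  # Reserves as outer: 6000
print(block_nested_loops_cost(500, 1000, 102))  # Sailors as outer: 5500
```

With these numbers, putting the smaller relation on the outside gives the lower cost.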

13 Index Nested Loops Join
Suppose there is an index on the join attribute of S.
foreach tuple r of R
    foreach tuple s of S where r_i = s_j
        add <r, s> to result
We find the matching tuples s using the index!
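A minimal Python sketch of the same idea, assuming an in-memory dictionary stands in for the index on S. The helper names and the `r_key`/`s_key` parameters are illustrative, not taken from the slides.

```python
from collections import defaultdict

def build_index(s_tuples, s_key):
    """Map each join-attribute value to the S tuples that carry it."""
    index = defaultdict(list)
    for s in s_tuples:
        index[s[s_key]].append(s)
    return index

def index_nested_loops_join(r_tuples, s_index, r_key):
    """For each r, look up its matches through the index instead of scanning S."""
    result = []
    for r in r_tuples:
        for s in s_index.get(r[r_key], []):
            result.append((r, s))
    return result

sailors = [(22, "dustin"), (28, "yuppy"), (31, "lubber")]
reserves = [(28, "Joe"), (31, "Sam"), (28, "Frank")]  # (sid, agent), made-up rows
print(index_nested_loops_join(reserves, build_index(sailors, 0), 0))
```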

14 Cost Analysis
1. If the index is a B+ Tree, about 2-4 I/Os to find the leaf. If the index is a hash index, about 1-2 I/Os to find the bucket.
2. Retrieve the tuples. Time depends on whether the index is clustered. If so, the cost is usually 1 I/O per r tuple. Otherwise, it could be 1 I/O per matching s tuple.

15 Example (1)
Hash index on sid of Sailors (about 1.2 I/Os to find the bucket).
sid is a key in Sailors, so there is at most one matching tuple (actually, exactly 1: why?).
Scanning Reserves: 1,000 I/Os
There are 100 * 1,000 = 100,000 tuples in Reserves.
For each tuple, search the index (1.2) and retrieve the page (1).
Total time: 1,000 + 100,000 * 2.2 = 221,000 I/Os

16 Example
Hash index on sid of Reserves.
Scanning Sailors: 500 I/Os
There are 80 * 500 = 40,000 tuples in Sailors.
For each tuple, search the index (1.2) and retrieve the page(s) (??).
There are 100,000 reservations for 40,000 sailors. Assuming a uniform distribution of reservations, each Sailors tuple matches about 2.5 Reserves tuples. If the index is unclustered, they may be on different pages.
Total: 500 + 40,000 * (1.2 + 2.5) = 148,500 I/Os
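Both index nested loops totals follow one pattern: scan the outer relation, then pay a probe plus a fetch for every outer tuple. A Python sketch with a hypothetical function name; the result is rounded to the nearest whole I/O because the per-probe cost of 1.2 is an average.

```python
def index_nested_loops_cost(outer_pages, outer_tuples_per_page, probe_ios, fetch_ios):
    """Scan the outer once, then probe the index and fetch matches per outer tuple."""
    outer_tuples = outer_pages * outer_tuples_per_page
    return round(outer_pages + outer_tuples * (probe_ios + fetch_ios))

# Reserves as outer, hash index on Sailors.sid (one matching tuple per probe).
print(index_nested_loops_cost(1000, 100, 1.2, 1))   # 221000
# Sailors as outer, unclustered hash index on Reserves.sid (about 2.5 matches per probe).
print(index_nested_loops_cost(500, 80, 1.2, 2.5))   # 148500
```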

17 Sort-Merge Join Sort both relations on join attribute. This creates “partitions” according to the join attributes. Join relations while merging them. Tuples in corresponding partitions are joined. Cost depends on whether partitions are large and therefore, are scanned multiple times.

18 Sailors
sid   sname    rating   age
22    dustin   7        45
28    yuppy    9        35
31    lubber   8        55
36    lubber   6        36
44    guppy    5        35
58    rusty    10       35
Reserves
sid   bid   day     agent
            /4/96   Joe
            /3/96   Frank
            /2/96   Joe
            /7/96   Sam
            /7/96   Sam
            /6/96   Frank
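The sort-then-merge procedure can be sketched in Python as follows. This is an in-memory illustration only: `sorted` stands in for an external sort, the key-position parameters are hypothetical, and the Reserves rows in the usage lines are made up for the example.

```python
def sort_merge_join(r, s, r_key, s_key):
    """Sort both inputs on the join attribute, then merge matching partitions."""
    r = sorted(r, key=lambda t: t[r_key])
    s = sorted(s, key=lambda t: t[s_key])
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        rk, sk = r[i][r_key], s[j][s_key]
        if rk < sk:
            i += 1
        elif rk > sk:
            j += 1
        else:
            # Find the end of the partition with this key value in each input.
            i_end = i
            while i_end < len(r) and r[i_end][r_key] == rk:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][s_key] == rk:
                j_end += 1
            # Every tuple in the R partition joins with every tuple in the S partition.
            for a in r[i:i_end]:
                for b in s[j:j_end]:
                    result.append((a, b))
            i, j = i_end, j_end
    return result

sailors = [(22, "dustin"), (28, "yuppy"), (31, "lubber")]
reserves = [(31, "Sam"), (28, "Joe"), (28, "Frank")]  # (sid, agent), made-up rows
print(sort_merge_join(reserves, sailors, 0, 0))
```

The nested loop over the two partitions is the part that gets expensive when partitions are large.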

19 Cost Analysis
Sort R: O(M log M)
Sort S: O(N log N)
Merge: M + N (if partitions aren't scanned multiple times; otherwise, the worst case is M*N!)
Cost: O(M + N + M log M + N log N)

20 Example
Sort Reserves: 1,000 log2 1,000 ≈ 10,000 (actually better with a good algorithm for external sorting)
Sort Sailors: 500 log2 500 ≈ 4,500
Merge: 1,000 + 500 = 1,500
Total: 10,000 + 4,500 + 1,500 = 16,000 (actually about 7,500 if sorting is done well)
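Both totals can be reproduced with a short Python sketch. The function names are hypothetical. The second function assumes each relation can be sorted in two passes, each pass reading and writing every page once; that assumption is one way to arrive at the "about 7,500" figure and is not stated on the slide.

```python
import math

def sort_merge_cost_naive(m, n):
    """M log2 M + N log2 N for sorting, plus M + N for the merge (logs rounded up)."""
    return m * math.ceil(math.log2(m)) + n * math.ceil(math.log2(n)) + m + n

def sort_merge_cost_two_pass(m, n):
    """Assumes a two-pass external sort: 4 I/Os per page to sort, 1 more to merge."""
    return 4 * m + 4 * n + (m + n)

print(sort_merge_cost_naive(1000, 500))     # 16000
print(sort_merge_cost_two_pass(1000, 500))  # 7500
```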

21 Hash Join
//Partition R into k partitions
foreach tuple r in R do                //flush when fills
    read r and add it to buffer page h(r_i)
foreach tuple s in S do                //flush when fills
    read s and add it to buffer page h(s_j)
for l = 1..k
    //Build in-memory hash table for R_l using h2
    foreach tuple r in R_l do
        read r and insert into hash table with h2
    foreach tuple s in S_l do
        read s and probe table using h2
        output matching pairs
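The two phases above can be sketched in Python. This is an in-memory illustration under stated assumptions: `hash(...) % k` plays the role of the partitioning function h, a Python dict plays the role of the second hash function h2, and the parameter names are hypothetical.

```python
def hash_join(r, s, r_key, s_key, k=4):
    """Partition both inputs with h, then build and probe one partition pair at a time."""
    # Partition phase: tuples with equal join values land in the same partition number.
    r_parts = [[] for _ in range(k)]
    s_parts = [[] for _ in range(k)]
    for t in r:
        r_parts[hash(t[r_key]) % k].append(t)
    for t in s:
        s_parts[hash(t[s_key]) % k].append(t)

    result = []
    # Probe phase: only partitions with the same number can contain matches.
    for r_part, s_part in zip(r_parts, s_parts):
        table = {}  # in-memory hash table for this R partition
        for t in r_part:
            table.setdefault(t[r_key], []).append(t)
        for u in s_part:
            for t in table.get(u[s_key], []):
                result.append((t, u))
    return result

reserves = [(28, "Joe"), (31, "Sam"), (28, "Frank")]  # (sid, agent), made-up rows
sailors = [(22, "dustin"), (28, "yuppy"), (31, "lubber")]
print(sorted(hash_join(reserves, sailors, 0, 0)))
```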

22 Cost Analysis
Partition phase: read R and S once and write them once: 2(M + N)
In the second phase, we read each partition once, assuming that it fits into memory: M + N
Cost: 3(M + N)
In our example: 3(1,000 + 500) = 4,500 I/Os

23 Estimating Result Sizes

24 Picking a Query Plan
Suppose we want to find the natural join of Reserves, Sailors, and Boats.
The 2 options that appear the best are (ignoring the order within a single join):
(Sailors ⋈ Reserves) ⋈ Boats
Sailors ⋈ (Reserves ⋈ Boats)
We would like intermediate results to be as small as possible. Which is better?

25 Analyzing Result Sizes
In order to answer the question in the previous slide, we must be able to estimate the size of (Sailors ⋈ Reserves) and (Reserves ⋈ Boats).
The DBMS stores statistics about the relations and indexes. They are updated periodically (not every time the underlying relations are modified).

26 Statistics Maintained by DBMS
–Cardinality: Number of tuples NTuples(R) in each relation R
–Size: Number of pages NPages(R) in each relation R
–Index Cardinality: Number of distinct key values NKeys(I) for each index I
–Index Size: Number of pages INPages(I) in each index I
–Index Height: Number of non-leaf levels IHeight(I) in each B+ Tree index I
–Index Range: The minimum value ILow(I) and maximum value IHigh(I) for each index I

27 Estimating Result Sizes
Consider: SELECT attribute-list FROM relation-list WHERE term_1 and ... and term_n
The maximum number of tuples is the product of the cardinalities of the relations in the FROM clause.
Each term in the WHERE clause has an associated reduction factor.
Estimated result size: maximum size times the product of the reduction factors.

28 Estimating Reduction Factors
–column = value: 1/NKeys(I) if there is an index I on column. This assumes a uniform distribution. Otherwise, System R assumes 1/10.
–column1 = column2: 1/Max(NKeys(I1), NKeys(I2)) if there is an index I1 on column1 and an index I2 on column2. If only one column has an index I, use 1/NKeys(I). Otherwise, use 1/10.
–column > value: (High(I)-value)/(High(I)-Low(I)) if there is an index I on column.

29 Example
Cardinality(R) = 1,000 * 100 = 100,000
Cardinality(S) = 500 * 80 = 40,000
NKeys(Index on R.agent) = 100
High(Index on Rating) = 10, Low = 0
SELECT * FROM Reserves R, Sailors S WHERE R.sid = S.sid and S.rating > 3 and R.agent = 'Joe'

30 Example (cont.)
Maximum cardinality: 100,000 * 40,000
Reduction factor of R.sid = S.sid: 1/40,000
Reduction factor of S.rating > 3: (10-3)/(10-0) = 7/10
Reduction factor of R.agent = 'Joe': 1/100
Total estimated size: 100,000 * 40,000 * (1/40,000) * (7/10) * (1/100) = 700
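The estimate can be recomputed with a small Python sketch that uses exact fractions to avoid rounding error. The function names are hypothetical, and the 40,000 distinct keys for the sid join is inferred from the 1/40,000 reduction factor stated above rather than given as a statistic.

```python
from fractions import Fraction
from math import prod

def rf_equals_value(nkeys=None):
    """column = value: 1/NKeys(I) with an index, else the default 1/10."""
    return Fraction(1, nkeys) if nkeys else Fraction(1, 10)

def rf_equals_column(nkeys1=None, nkeys2=None):
    """column1 = column2: 1/max of the known NKeys values, else the default 1/10."""
    known = [n for n in (nkeys1, nkeys2) if n]
    return Fraction(1, max(known)) if known else Fraction(1, 10)

def rf_greater_than(value, low, high):
    """column > value: fraction of the index range that lies above value."""
    return Fraction(high - value, high - low)

def estimate_result_size(cardinalities, reduction_factors):
    """Maximum size (product of cardinalities) times the product of reduction factors."""
    return prod(cardinalities) * prod(reduction_factors)

size = estimate_result_size(
    [100_000, 40_000],
    [rf_equals_column(nkeys2=40_000), rf_greater_than(3, 0, 10), rf_equals_value(100)],
)
print(int(size))  # 700
```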