

Notions of clustering
Clustered file: e.g. store movie tuples together with the corresponding studio tuple.
Clustered relation: tuples are stored in blocks mostly devoted to that relation.
Clustering index: tuples (of the relation) with the same search key are stored together.

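Index-based algorithms: selection
Notation: B(R) is the number of blocks of R, T(R) is the number of tuples of R, and V(R,a) is the number of distinct values of attribute a in R.
To evaluate σ_{a=v}(R), use an index on a, if one exists.
Cost: the cost of the index lookup (negligible) plus:
If the index is clustering: B(R)/V(R,a) I/O’s (the fraction of the relation's blocks that hold any one value of a).
Otherwise, an approximation is that each tuple we retrieve is in a different block, so we get T(R)/V(R,a) I/O’s.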

Example of index-based selection
σ_{a=v}(R), with B(R) = 1000 and T(R) = 20,000.
R is clustered, and there is no index on attribute a → 1000 disk I/O’s.
R is unclustered, and there is no index on attribute a → 20,000 disk I/O’s.
R has a clustering index on a, V(R,a) = 100 → 1000/100 = 10 disk I/O’s.
R has a non-clustering index on a, V(R,a) = 100 → 20,000/100 = 200 disk I/O’s.
V(R,a) = 20,000 (i.e. attribute a is a key) → just 1 disk I/O.
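The cost rules for index-based selection can be checked with a small calculator. This is only a sketch: the function name `selection_cost` and its parameters are made up for illustration, the index lookup itself is treated as free (as on the slide), and fractional results are rounded up to whole blocks, which is an added assumption.

```python
from math import ceil

def selection_cost(B, T, V=None, clustered_relation=True, index=None):
    """Estimated disk I/Os for an equality selection on attribute a of R.

    B, T  -- B(R) and T(R): number of blocks and tuples of R
    V     -- V(R,a): number of distinct values of a (needed only with an index)
    index -- None, "clustering", or "non-clustering" (hypothetical labels)
    """
    if index is None:
        # No index: scan the whole relation.
        return B if clustered_relation else T
    if index == "clustering":
        # Matching tuples are packed into B/V blocks on average.
        return ceil(B / V)
    if index == "non-clustering":
        # Pessimistic estimate: each matching tuple is in a different block.
        return ceil(T / V)
    raise ValueError(f"unknown index kind: {index!r}")

print(selection_cost(1000, 20_000))                                # 1000
print(selection_cost(1000, 20_000, clustered_relation=False))      # 20000
print(selection_cost(1000, 20_000, V=100, index="clustering"))     # 10
print(selection_cost(1000, 20_000, V=100, index="non-clustering")) # 200
print(selection_cost(1000, 20_000, V=20_000, index="non-clustering"))  # 1
```

The five calls reproduce the five cases of the example in order.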

Index joins
We want to compute R(X,Y) ⋈ S(Y,Z).
Suppose there is a Y-index on S.
For each tuple t of R, look up all tuples in S with key value t[Y] and output the join of t with each of them.
Cost: B(R) to read R (clustered case). We ignore this cost in the formulas below.
Each tuple of R joins with T(S)/V(S,Y) tuples of S, on average.
S has a non-clustered index on Y → T(R)T(S)/V(S,Y) I/O’s.
S has a clustered index on Y → T(R)B(S)/V(S,Y) I/O’s.
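The index-join loop can be sketched in memory as follows. Here a Python dictionary stands in for the Y-index on S; the helper names `build_index` and `index_join` and the sample tuples are made up for illustration, and no disk I/O is modeled.

```python
from collections import defaultdict

def build_index(S, key_pos=0):
    """Build a stand-in for the Y-index on S: Y-value -> list of S-tuples."""
    index = defaultdict(list)
    for s in S:
        index[s[key_pos]].append(s)
    return index

def index_join(R, s_index):
    """Join R(X,Y) with S(Y,Z) by probing the index once per tuple of R."""
    result = []
    for x, y in R:
        # Look up all S-tuples whose Y-value equals t[Y].
        for _, z in s_index.get(y, []):
            result.append((x, y, z))
    return result

R = [(1, "a"), (2, "b"), (3, "c")]
S = [("a", 10), ("a", 11), ("c", 30)]
print(index_join(R, build_index(S)))
# [(1, 'a', 10), (1, 'a', 11), (3, 'c', 30)]
```

R is scanned once and S is never scanned in full; only the S-tuples that actually match a Y-value of R are touched.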

Example of index-join
T(R) = 10,000, B(R) = 1000
T(S) = 5000, B(S) = 500, V(S,Y) = 100
To compute R(X,Y) ⋈ S(Y,Z) using a clustered Y-index on S:
1000 + 10,000 × (500/100) = 51,000 I/O’s
Bad!!
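The arithmetic of this example can be reproduced with a short function. It is a sketch under the slide's formulas: `index_join_cost` is a hypothetical name, and B(R) is included in the total because that is the only way the figures given (10,000 × 500/100 plus a remainder of 1000) add up to 51,000.

```python
def index_join_cost(B_R, T_R, B_S, T_S, V_SY, clustered_index):
    """Estimated I/Os for an index join of R with S using a Y-index on S."""
    # I/Os needed to fetch the S-tuples matching one tuple of R.
    per_lookup = (B_S if clustered_index else T_S) / V_SY
    # Read R once, then do one index lookup per tuple of R.
    return B_R + T_R * per_lookup

stats = dict(B_R=1000, T_R=10_000, B_S=500, T_S=5000, V_SY=100)
print(index_join_cost(**stats, clustered_index=True))   # 51000.0
print(index_join_cost(**stats, clustered_index=False))  # 501000.0
```

With a non-clustered index the same join costs roughly ten times as much, since each of the 50 matching S-tuples per lookup is assumed to sit in its own block.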

However, things are not so bad in practice
Suppose we have the relations:
StarsIn(title, year, starName)
MovieStar(name, address, gender, birthdate)
And there is an index on MovieStar.name.
Consider the SQL query:
SELECT birthdate
FROM StarsIn, MovieStar
WHERE title = 'King Kong' AND starName = name;

Practice (Cont’d)
We can first select those tuples of the StarsIn relation with title = 'King Kong'. Suppose there are 10 such tuples.
Now, we know that stars take care not to share a name with another star, so name is a key for the relation MovieStar.
Hence, V(MovieStar, name) = T(MovieStar), and each index lookup retrieves a single MovieStar tuple.
Finally, the number of I/O’s is:
B(StarsIn) + T(σ_{title='King Kong'}(StarsIn)) I/O’s if StarsIn is clustered, and
T(StarsIn) + T(σ_{title='King Kong'}(StarsIn)) I/O’s if StarsIn is non-clustered.
In both cases the second term is just 10: one index lookup per selected StarsIn tuple.

Joins using sorted indexes
We want to compute R(X,Y) ⋈ S(Y,Z).
If S has a B-tree index on Y:
Create sorted sublists of R only, and
do a sort join, extracting the S-tuples in order through the index.
If both have a B-tree index on Y, do a zigzag join.

Example (B-Tree index on S[Y])
T(R) = 10,000, B(R) = 1000, T(S) = 5000, B(S) = 500, V(S,Y) = 100; S has a B-Tree index on Y.
Assume that both relations and the indexes are clustered. M = 101 buffers.
Create 10 sorted sublists of R. Cost: 2B(R).
Use 10 buffers for the sublists of R and 1 buffer for S (retrieved via the index).
Join tuples from the input buffers.
Total cost: 2B(R) + B(R) + B(S) + index lookup = 2000 + 1000 + 500 + index lookup = 3500 + index lookup.
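The buffer and cost bookkeeping of this example can be sketched as below. The function name `sort_index_join_cost` is hypothetical, the index lookup cost is left out of the returned total, and the sublist count assumes that all M buffers are filled when each sublist is created.

```python
from math import ceil

def sort_index_join_cost(B_R, B_S, M):
    """Return (number of sorted sublists of R, estimated I/Os without index lookup)."""
    # Each sorted sublist of R is built by filling the M buffers once.
    sublists = ceil(B_R / M)
    # The merge needs one buffer per sublist of R plus one buffer for S.
    if sublists + 1 > M:
        raise ValueError("not enough buffers to merge in a single pass")
    sort_cost = 2 * B_R      # read R and write it back as sorted sublists
    merge_cost = B_R + B_S   # read the sublists and read S through its index
    return sublists, sort_cost + merge_cost

print(sort_index_join_cost(B_R=1000, B_S=500, M=101))  # (10, 3500)
```

The result matches the 10 sublists and the 2B(R) + B(R) + B(S) total from the example, with 11 of the 101 buffers in use during the merge.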

Zigzag Join
Suppose we have B-Tree indexes on both S[Y] and R[Y].
We can jump back and forth between the indexes, finding the Y-values that they share in common.
Tuples of R whose Y-value doesn’t appear in S need never be retrieved, and similarly tuples of S whose Y-value doesn’t appear in R need never be retrieved.
Example.
Let the Y-values for R be: 1,3,4,4,4,5,6
Let the Y-values for S be: 2,2,4,4,6,7
Start with the 1 and 2. Since 1<2, skip 1 in R. Since 2<3, skip the 2’s in S. Since 3<4, skip 3 in R. Join the 4’s. …
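The back-and-forth skipping can be sketched over two sorted lists of Y-values. In this sketch the sorted lists stand in for the leaf levels of the two B-Tree indexes and binary search (`bisect`) stands in for an index seek; the function name `zigzag_matches` is made up for illustration.

```python
from bisect import bisect_left, bisect_right

def zigzag_matches(r_keys, s_keys):
    """Return (y, count in R, count in S) for each Y-value present in both sorted lists."""
    i = j = 0
    matches = []
    while i < len(r_keys) and j < len(s_keys):
        if r_keys[i] < s_keys[j]:
            # Skip ahead in R to the first key >= the current key of S.
            i = bisect_left(r_keys, s_keys[j], i)
        elif s_keys[j] < r_keys[i]:
            # Skip ahead in S to the first key >= the current key of R.
            j = bisect_left(s_keys, r_keys[i], j)
        else:
            # Shared Y-value: only these tuples would be fetched and joined.
            y = r_keys[i]
            i_end = bisect_right(r_keys, y, i)
            j_end = bisect_right(s_keys, y, j)
            matches.append((y, i_end - i, j_end - j))
            i, j = i_end, j_end
    return matches

print(zigzag_matches([1, 3, 4, 4, 4, 5, 6], [2, 2, 4, 4, 6, 7]))
# [(4, 3, 2), (6, 1, 1)]
```

For the Y-values of the example, only the 4’s and the 6’s are shared, giving 3 × 2 + 1 × 1 = 7 joined tuples; the tuples with Y-values 1, 2, 3, 5 and 7 are never retrieved.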

Example (Zigzag Join)
T(R) = 10,000, B(R) = 1000, T(S) = 5000, B(S) = 500; S and R both have clustered B-Tree indexes on Y.
There is no need to sort either relation.
We use just B(R) + B(S) = 1500 disk I/O’s to read the blocks of R and S through their indexes.
We can determine from the indexes alone that a large fraction of R or S cannot match tuples of the other relation, so the cost might be considerably less than 1500 I/O’s.