Query processing and optimization. Advanced DatabasesQuery processing and optimization2 Definitions Query processing –translation of query into low-level.

Slides:



Advertisements
Similar presentations
Equality Join R X R.A=S.B S : : Relation R M PagesN Pages Relation S Pr records per page Ps records per page.
Advertisements

Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
SPRING 2004CENG 3521 Query Evaluation Chapters 12, 14.
1 Chapter 10 Query Processing: The Basics. 2 External Sorting Sorting is used in implementing many relational operations Problem: –Relations are typically.
Evaluation of Relational Operators 198:541. Relational Operations  We will consider how to implement: Selection ( ) Selects a subset of rows from relation.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Query Processing.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Query Processing (overview)
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
1 Query Processing: The Basics Chapter Topics How does DBMS compute the result of a SQL queries? The most often executed operations: –Sort –Projection,
Query Processing & Optimization
Evaluation of Relational Operations. Relational Operations v We will consider how to implement: – Selection ( ) Selects a subset of rows from relation.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Dr. Kalpakis CMSC 461, Database Management Systems Query Processing.
Query Processing Chapter 12
COMP 5138 Relational Database Management Systems Semester 2, 2007 Lecture 12 Query Processing and Optimization.
Database Management 9. course. Execution of queries.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Chapter 13 Query Processing Melissa Jamili CS 157B November 11, 2004.
Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations: Other Operations Chapter 14 Ramakrishnan & Gehrke (Sections ; )
Lec3/Database Systems/COMP4910/031 Evaluation of Relational Operations Chapter 14.
12.1Database System Concepts - 6 th Edition Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Join Operation Sorting 、 Other.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan Chapter 13: Query Processing.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Chapter 13: Query Processing Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join Operation Other Operations.
Computing & Information Sciences Kansas State University Tuesday, 03 Apr 2007CIS 560: Database System Concepts Lecture 29 of 42 Tuesday, 03 April 2007.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Lecture 1- Query Processing Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
Chapter 12 Query Processing. Query Processing n Selection Operation n Sorting n Join Operation n Other Operations n Evaluation of Expressions 2.
Chapter 12 Query Processing (1) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
Chapter 13: Query Processing
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Advance Database Systems Query Optimization Ch 15 Department of Computer Science The University of Lahore.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Query Processing CS 405G Introduction to Database Systems.
Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data can be stored.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
13.1 Chapter 13: Query Processing n Overview n Measures of Query Cost n Selection Operation n Sorting n Join Operation n Other Operations n Evaluation.
Chapter 13: Query Processing. Overview Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions.
File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
Query Execution Query compiler Execution engine Index/record mgr. Buffer manager Storage manager storage User/ Application Query update Query execution.
Alon Levy 1 Relational Operations v We will consider how to implement: – Selection ( ) Selects a subset of rows from relation. – Projection ( ) Deletes.
Query Processing and Query Optimization Database System Implementation CSE 507 Some slides adapted from Silberschatz, Korth and Sudarshan Database System.
Chapter 13: Query Processing
Chapter 10 The Basics of Query Processing. Copyright © 2005 Pearson Addison-Wesley. All rights reserved External Sorting Sorting is used in implementing.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Database Management System
Chapter 12: Query Processing
Evaluation of Relational Operations: Other Operations
File Processing : Query Processing
Relational Operations
Query processing and optimization
Lecture 2- Query Processing (continued)
Chapter 12 Query Processing (1)
Implementation of Relational Operations
Query processing and optimization
Evaluation of Relational Operations: Other Techniques
Sorting We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each.
Evaluation of Relational Operations: Other Techniques
Presentation transcript:

Query processing and optimization

Advanced DatabasesQuery processing and optimization2 Definitions Query processing –translation of query into low-level activities –evaluation of query –data extraction Query optimization –selecting the most efficient query evaluation

Advanced DatabasesQuery processing and optimization3 Query Processing (1/2) SELECT * FROM student WHERE name=Paul Parse query and translate –check syntax, verify names, etc –translate into relational algebra (RDBMS) –create evaluation plans Find best plan (optimization) Execute plan student cidname Paul Rob Matt takes cidcourseid course courseidcoursename 312Advanced DBs 395Machine Learning

Advanced DatabasesQuery processing and optimization4 Query Processing (2/2) query parser and translator relational algebra expression optimizer evaluation plan evaluation engine output data statistics

Advanced DatabasesQuery processing and optimization5 Relational Algebra (1/2) Query language Operations: –select: σ –project: π –union:  –difference: - –product: x –join:

Advanced DatabasesQuery processing and optimization6 Relational Algebra (2/2) SELECT * FROM student WHERE name=Paul –σ name=Paul (student) π name ( σ cid< (student) ) π name (σ coursename=Advanced DBs ((student cid takes) courseid course) ) student cidname Paul Rob Matt takes cidcourseid course courseidcoursename 312Advanced DBs 395Machine Learning

Advanced DatabasesQuery processing and optimization7 Why Optimize? Many alternative options to evaluate a query –π name (σ coursename=Advanced DBs ((student cid takes) courseid course) ) –π name ((student cid takes) courseid σ coursename=Advanced DBs (course)) ) Several options to evaluate a single operation –σ name=Paul (student) scan file use secondary index on student.name Multiple access paths –access path: how can records be accessed

Advanced DatabasesQuery processing and optimization8 Evaluation plans Specify which access path to follow Specify which algorithm to use to evaluate operator Specify how operators interleave Optimization: –estimate the cost of each plan (not all plans) –select plan with lowest estimated cost σ name=Paul ; use index i student σ name=Paul student σ coursename=Advanced DBs l studenttakes cid; hash join courseid; index- nested loop course π name

Advanced DatabasesQuery processing and optimization9 Estimating Cost What needs to be considered: –Disk I/Os sequential random –CPU time –Network communication What are we going to consider: –Disk I/Os page reads/writes –Ignoring cost of writing final output

Operations and Costs

Advanced DatabasesQuery processing and optimization11 Operations and Costs (1/2) Operations: σ, π, , , -, x, Costs: –N R : number of records in R –L R : size of record in R –F R : blocking factor number of records in page –B R : number of pages to store relation R –V(A,R): number of distinct values of attribute A in R –SC(A,R): selection cardinality of A in R A key: S(A,R)=1 A nonkey: S(A,R)= N R / V(A,R) –HT i : number of levels in index I –rounding up fractions and logarithms

Advanced DatabasesQuery processing and optimization12 Operations and Costs (2/2) relation takes –700 tuples –student cid 8 bytes –course id 4 bytes –9 courses –100 students –page size 512 bytes –output size (in pages) of query: which students take the Advanced DBs course? N takes = 700 V(courseid, takes) = 9 SC(courseid,takes) = ceil( N takes /V(courseid, takes) ) = ceil(700/9) = 78 f = floor( 512/8 ) = 64 B = ceil( 78/64) = 2 pages

Advanced DatabasesQuery processing and optimization13 Selection σ (1/2) Linear search –read all pages, find records that match (assuming equality search) –average cost: nonkey B R, key 0.5*B R Binary search –on ordered field –average cost: m additional pages to be read m = ceil( SC(A,R)/F R ) - 1 Primary/Clustered Index –average cost: single record HT i + 1 multiple records HT i + ceil( SC(A,R)/F R )

Advanced DatabasesQuery processing and optimization14 Selection σ (2/2) Secondary Index –average cost: key field HT i + 1 nonkey field –worst case HT i + SC(A,R) –linear search more desirable if many matching records

Advanced DatabasesQuery processing and optimization15 Complex selection σ expr conjunctive selections: –perform simple selection using θ i with the lowest evaluation cost e.g. using an index corresponding to θ i apply remaining conditions θ on the resulting records cost: the cost of the simple selection on selected θ – multiple indices select indices that correspond to θ i s scan indices and return RIDs answer: intersection of RIDs cost: the sum of costs + record retrieval disjunctive selections: –multiple indices union of RIDs –linear search

Advanced DatabasesQuery processing and optimization16 Projection and set operations SELECT DISTINCT cid FROM takes –π requires duplicate elimination –sorting set operations require duplicate elimination –R  S –R  S –sorting

Advanced DatabasesQuery processing and optimization17 Sorting efficient evaluation for many operations required by query: –SELECT cid,name FROM student ORDER BY name implementations –internal sorting (if records fit in memory) –external sorting

Advanced DatabasesQuery processing and optimization18 External Sort-Merge Algorithm (1/3) Sort stage: create sorted runs i=0; repeat read M pages of relation R into memory sort the M pages write them into file R i increment i until no more pages N = i// number of runs

Advanced DatabasesQuery processing and optimization19 External Sort-Merge Algorithm (2/3) Merge stage: merge sorted runs //assuming N < M allocate a page for each run file R i // N pages allocated read a page P i of each R i repeat choose first record (in sort order) among N pages, say from page P j write record to output and delete from page P j if page is empty read next page P j ’ from R j until all pages are empty

Advanced DatabasesQuery processing and optimization20 External Sort-Merge Algorithm (3/3) Merge stage: merge sorted runs What if N > M ? –perform multiple passes –each pass merges M-1 runs until relation is processed –in next pass number of runs is reduced –final pass generated sorted output

Advanced DatabasesQuery processing and optimization21 Sort-Merge Example d95 a12 x44 s95 f12 o73 t45 n67 e87 z11 v22 b38 file memory t45 n67 e87 z11 v22 b38d95a12x44a12d95x44 R1R1 f12 o73 s95 R2R2 e87 n67 t45 R3R3 b38 v22 z11 R4R4 a12f a d95d a12d95 x44 s95 f12 o73 run pass v22 t45 s95 z11 x44 o73 a12 b38 n67 f12 d95 e87

Advanced DatabasesQuery processing and optimization22 Sort-Merge cost B R the number of pages of R Sort stage: 2 * B R –read/write relation Merge stage: –initially runs to be merged –each pass M-1 runs sorted –thus, total number of passes: –at each pass 2 * B R pages are read read/write relation apart from final write Total cost: –2 * B R + 2 * B R * - B R

Advanced DatabasesQuery processing and optimization23 Projection π Α1,Α2… (R) remove unwanted attributes –scan and drop attributes remove duplicate records –sort resulting records using all attributes as sort order –scan sorted result, eliminate duplicates (adjucent) cost –initial scan + sorting + final scan

Advanced DatabasesQuery processing and optimization24 Join π name (σ coursename=Advanced DBs ((student cid takes) courseid course) ) implementations –nested loop join –block-nested loop join –indexed nested loop join –sort-merge join –hash join

Advanced DatabasesQuery processing and optimization25 Nested loop join (1/2) R S for each tuple t R of R for each t S of S if (t R t S match) output t R.t S end Works for any join condition S inner relation R outer relation

Advanced DatabasesQuery processing and optimization26 Nested loop join (2/2) Costs: –best case when smaller relation fits in memory use it as inner relation B R +B S –worst case when memory holds one page of each relation S scanned for each tuple in R N R * B s + B R

Advanced DatabasesQuery processing and optimization27 Block nested loop join (1/2) for each page X R of R foreach page X S of S for each tuple t R in X R for each t S in X S if (t R t S match) output t R.t S end

Advanced DatabasesQuery processing and optimization28 Block nested loop join (2/2) Costs: –best case when smaller relation fits in memory use it as inner relation B R +B S –worst case when memory holds one page of each relation S scanned for each page in R B R * B s + B R

Advanced DatabasesQuery processing and optimization29 Indexed nested loop join R S Index on inner relation (S) for each tuple in outer relation (R) probe index of inner relation Costs: –B R + N R * c c the cost of index-based selection of inner relation –relation with fewer records as outer relation

Advanced DatabasesQuery processing and optimization30 Sort-merge join R S Relations sorted on the join attribute Merge sorted relations –pointers to first record in each relation –read in a group of records of S with the same values in the join attribute –read records of R and process Relations in sorted order to be read once Cost: –cost of sorting + B S + B R dD eE xX vV e67 e87 n11 v22 z38

Advanced DatabasesQuery processing and optimization31 Hash join R S use h 1 on joining attribute to map records to partitions that fit in memory –records of R are partitioned into R 0 … R n-1 –records of S are partitioned into S 0 … S n-1 join records in corresponding partitions –using a hash-based indexed block nested loop join Cost: 2*(B R +B S ) + (B R +B S ) R R0R0 R1R1 R n S S0S0 S1S1 S n

Advanced DatabasesQuery processing and optimization32 Exercise: joins R S N R =2 15 B R = 100 N S =2 6 B S = 30 B + index on S –order 4 –full nodes nested loop join: best case - worst case block nested loop join: best case - worst case indexed nested loop join

Advanced DatabasesQuery processing and optimization33 Evaluation evaluate multiple operations in a plan materialization pipelining σ coursename=Advanced DBs studenttakes cid; hash join courseid; index- nested loop course π name

Advanced DatabasesQuery processing and optimization34 Materialization create and read temporary relations create implies writing to disk –more page writes σ coursename=Advanced DBs studenttakes cid; hash join courseid; index- nested loop course π name

Advanced DatabasesQuery processing and optimization35 Pipelining (1/2) creating a pipeline of operations reduces number of read-write operations implementations –demand-driven - data pull –producer-driven - data push σ coursename=Advanced DBs studenttakes cid; hash join ccourseid; index- nested loop course π name

Advanced DatabasesQuery processing and optimization36 Pipelining (2/2) can pipelining always be used? any algorithm? cost of R S –materialization and hash join: B R + 3(B R +B S ) –pipelining and indexed nested loop join: N R * HT i σ coursename=Advanced DBs studenttakes cid courseid course pipelined materialized R S

Query Optimization

Advanced DatabasesQuery processing and optimization38 Choosing evaluation plans cost based optimization enumeration of plans –R S T, 12 possible orders cost estimation of each plan overall cost –cannot optimize operation independently

Advanced DatabasesQuery processing and optimization39 Cost estimation operation (σ, π, …) implementation size of inputs size of outputs sorting σ coursename=Advanced DBs studenttakes cid; hash join courseid; index- nested loop course π name

Advanced DatabasesQuery processing and optimization40 Size Estimation (1/2) –SC(A,R) – –multiplying probabilities – –probability that a record satisfy none of θ: –

Advanced DatabasesQuery processing and optimization41 Size Estimation (2/2) R x S –N R * N S R S –R  S =  : N R * N S –R  S key for R: maximum output size is N s –R  S foreign key for R: N S –R  S = {A}, neither key of R nor S N R *N S / V(A,S) N S *N R / V(A,R)

Advanced DatabasesQuery processing and optimization42 Expression Equivalence conjunctive selection decomposition – commutativity of selection – combining selection with join and product –σ θ1 (R x S) = R θ1 S commutativity of joins –R θ1 S = S θ1 R distribution of selection over join –σ θ1^θ2 (R S) = σ θ1 (R) σ θ2 (S) distribution of projection over join –π A1,A2 (R S) = π A1 (R) π A2 (S) associativity of joins: R (S T) = (R S) T

Advanced DatabasesQuery processing and optimization43 Cost Optimizer (1/2) transforms expressions –equivalent expressions –heuristics, rules of thumb perform selections early perform projections early replace products followed by selection σ (R x S) with joins R S start with joins, selections with smallest result –create left-deep join trees

Advanced DatabasesQuery processing and optimization44 Cost Optimizer (2/2) σ coursenam = Advanced DBs studenttakes cid; hash join ccourseid; index- nested loop course π name σ coursename=Advanced DBs studenttakes cid; hash join ccourseid; index- nested loop course π name

Advanced DatabasesQuery processing and optimization45 Cost Evaluation Exercise π name (σ coursename=Advanced DBs ((student cid takes) courseid course) ) R = student cid takes S = course N S = 10 records assume that on average there are 50 students taking each course blocking factor: 2 records/page what is the cost of σ coursename=Advanced DBs (R courseid S) what is the cost of R σ coursename=Advanced DBs S assume relations can fit in memory

Advanced DatabasesQuery processing and optimization46 Summary Estimating the cost of a single operation Estimating the cost of a query plan Optimization –choose the most efficient plan