SCUHolliday - COEN 17814–1
Schedule
- Today: Query Processing overview
Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Steps in Query Processing
- Parsing and translation
  - Translate the query into its internal form, a relational-algebra-like expression.
  - The parser checks syntax and verifies relations.
- Optimization
- Evaluation
  - The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.
Optimization
- A relational algebra expression may have many equivalent expressions.
  - E.g., σ_{balance < 2500}(π_{balance}(account)) is equivalent to π_{balance}(σ_{balance < 2500}(account)).
- Each relational algebra operation can be evaluated using one of several different algorithms.
- An annotated expression specifying a detailed evaluation strategy is called an evaluation plan.
  - E.g., can use an index on balance to find accounts with balance < 2500,
  - or can perform a complete relation scan and discard accounts with balance ≥ 2500.
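The equivalence of the two expressions can be checked on a small instance. The sketch below is only an illustration: the `account` rows and the helper names `select`, `project`, and `low_balance` are made up for this example and are not part of the slides.

```python
# Hypothetical account relation; the rows are invented for illustration.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-102", "balance": 2500},
    {"account_number": "A-215", "balance": 700},
    {"account_number": "A-305", "balance": 3500},
]


def select(rows, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attributes):
    """Projection (pi): keep the named attributes and drop duplicate rows."""
    result = []
    for row in rows:
        reduced = {name: row[name] for name in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def low_balance(row):
    return row["balance"] < 2500


# sigma applied after pi, and pi applied after sigma
select_after_project = select(project(account, ["balance"]), low_balance)
project_after_select = project(select(account, low_balance), ["balance"])

print(select_after_project)
print(project_after_select)
```

Both orders produce the same rows here, but the second one filters before projecting, so later operators see fewer tuples. That difference in intermediate work is what an optimizer compares when it chooses between equivalent expressions.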
Query Optimization
- Amongst all equivalent evaluation plans, choose the one with the lowest cost.
  - Cost is estimated using statistical information from the database catalog, e.g. number of tuples in each relation, size of tuples, etc.
- We want to know
  - How to measure query costs
  - Algorithms for evaluating relational algebra operations
  - How to combine algorithms for individual operations in order to evaluate a complete expression
Measures of Query Cost
- Cost is generally measured as the total elapsed time for answering a query.
  - Many factors contribute to time cost: disk accesses, CPU, or even network communication.
- Typically disk access is the predominant cost, and it is also relatively easy to estimate. It is measured by taking into account
  - Number of seeks * average-seek-cost
  - Number of blocks read * average-block-read-cost
  - Number of blocks written * average-block-write-cost
- The cost to write a block is greater than the cost to read a block: data is read back after being written to ensure that the write was successful.
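The three terms above add up to a single disk-cost estimate. The following is a minimal sketch of that sum; the function name `disk_cost` and the timing values (in microseconds) are assumptions chosen for illustration, not measurements from any real device.

```python
def disk_cost(seeks, blocks_read, blocks_written,
              avg_seek_cost, avg_block_read_cost, avg_block_write_cost):
    """Estimated disk time: the sum of the seek, read, and write terms."""
    return (seeks * avg_seek_cost
            + blocks_read * avg_block_read_cost
            + blocks_written * avg_block_write_cost)


# Hypothetical per-operation costs in microseconds (illustrative only).
# The write cost is set higher than the read cost, as the slide describes.
estimate = disk_cost(
    seeks=10, blocks_read=1000, blocks_written=100,
    avg_seek_cost=4000, avg_block_read_cost=100, avg_block_write_cost=200,
)
print(estimate)  # 40000 + 100000 + 20000 = 160000
```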
Cost
- For simplicity we just use the number of block transfers from disk as the cost measure.
  - We also ignore CPU costs for simplicity.
- Cost depends on the size of the buffer in main memory.
  - Having more memory reduces the need for disk access.
  - The amount of real memory available to the buffer depends on other concurrent OS processes and is hard to determine ahead of actual execution.
  - We often use worst-case estimates, assuming only the minimum amount of memory needed for the operation is available.
- Real systems take CPU cost into account, differentiate between sequential and random I/O, and take buffer size into account.
Example
R   A  B  C        S   C   D  E
    a  1  10           10  x  2
    b  1  20           20  y  2
    c  2  10           30  z  2
    d  2  35           40  x  1
    e  3  45           50  y  3
Example
Select B,D
From R,S
Where R.A = “c” and S.E = 2 and R.C = S.C

π_{B,D}(σ_{R.A=“c” ∧ S.E=2 ∧ R.C=S.C}(R × S))
R   A  B  C        S   C   D  E
    a  1  10           10  x  2
    b  1  20           20  y  2
    c  2  10           30  z  2
    d  2  35           40  x  1
    e  3  45           50  y  3

Answer   B  D
         2  x
How do we execute the query?
One idea:
- Do Cartesian product
- Select tuples
- Do projection
R × S   R.A  R.B  R.C  S.C  S.D  S.E
        a    1    10   10   x    2
        a    1    10   20   y    2
        ...
        c    2    10   10   x    2    <- Bingo! Got one...
        ...
Relational algebra can be used to describe plans.
Ex: Plan I

    π_{B,D}
       |
    σ_{R.A=“c” ∧ S.E=2 ∧ R.C=S.C}
       |
       ×
      / \
     R   S

OR: π_{B,D}[σ_{R.A=“c” ∧ S.E=2 ∧ R.C=S.C}(R × S)]
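Plan I can be traced on the example instances of R and S. The sketch below follows the three steps literally (Cartesian product, selection, projection). The tuple layout and the function name `plan_one` are choices made for this illustration only.

```python
from itertools import product

# Example instances: R(A, B, C) and S(C, D, E)
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]


def plan_one(r_rows, s_rows):
    """Plan I: Cartesian product, then selection, then projection."""
    # Each product tuple is (R.A, R.B, R.C, S.C, S.D, S.E).
    cartesian = [r + s for r, s in product(r_rows, s_rows)]
    selected = [t for t in cartesian
                if t[0] == "c" and t[5] == 2 and t[2] == t[3]]
    projected = [(t[1], t[4]) for t in selected]  # keep B and D
    return cartesian, projected


cartesian, answer = plan_one(R, S)
print(len(cartesian))  # 25 intermediate tuples
print(answer)          # [(2, 'x')]
```

The plan materializes all 25 product tuples to find a single answer, which is why it serves as the baseline the other plans improve on.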
Another idea: Plan II

          π_{B,D}
             |
             ⋈        (natural join)
           /   \
  σ_{R.A=“c”}   σ_{S.E=2}
       |            |
       R            S
R              σ_{R.A=“c”}(R)     σ_{S.E=2}(S)     S
A  B  C        A  B  C            C   D  E         C   D  E
a  1  10       c  2  10           10  x  2         10  x  2
b  1  20                          20  y  2         20  y  2
c  2  10                          30  z  2         30  z  2
d  2  35                                           40  x  1
e  3  45                                           50  y  3
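Plan II applies the two selections first and joins only the surviving tuples. A minimal sketch on the same example instances follows; the function name `plan_two` and the nested-loop join are assumptions for illustration, and a real system could use any join algorithm here.

```python
# Example instances: R(A, B, C) and S(C, D, E)
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]


def plan_two(r_rows, s_rows):
    """Plan II: select on each relation, natural join on C, then project."""
    r_selected = [r for r in r_rows if r[0] == "c"]  # sigma R.A = "c"
    s_selected = [s for s in s_rows if s[2] == 2]    # sigma S.E = 2
    # Natural join on the shared attribute C; joined tuple is (A, B, C, D, E).
    joined = [r + s[1:] for r in r_selected for s in s_selected
              if r[2] == s[0]]
    projected = [(t[1], t[3]) for t in joined]       # keep B and D
    return r_selected, s_selected, projected


r_selected, s_selected, answer = plan_two(R, S)
print(len(r_selected) * len(s_selected))  # 3 candidate pairs instead of 25
print(answer)                             # [(2, 'x')]
```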
Plan III
Use R.A and S.C indexes
(1) Use the R.A index to select R tuples with R.A = “c”
(2) For each R.C value found, use the S.C index to find matching tuples
(3) Eliminate S tuples with S.E ≠ 2
(4) Join matching R,S tuples, project B,D attributes and place in result
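[Figure: relations R(A, B, C) and S(C, D, E) with index I1 on R.A and index I2 on S.C. I1 is probed with =“c”, I2 is probed with the matching C value, each S tuple found is checked with “check = 2?”, and the figure is labeled “output:” and “next tuple:”.]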
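The four steps of Plan III can be sketched with in-memory dictionaries standing in for the indexes I1 (on R.A) and I2 (on S.C). This is an assumption made for illustration: real indexes are disk-based structures, and the helper names `build_index` and `plan_three` are hypothetical.

```python
from collections import defaultdict

# Example instances: R(A, B, C) and S(C, D, E)
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]


def build_index(rows, position):
    """Map each value of the indexed attribute to the tuples that hold it."""
    index = defaultdict(list)
    for row in rows:
        index[row[position]].append(row)
    return index


index_r_a = build_index(R, 0)  # stands in for I1 on R.A
index_s_c = build_index(S, 0)  # stands in for I2 on S.C


def plan_three(r_index, s_index):
    """Plan III: probe I1, probe I2 per R.C value, filter on S.E, project."""
    result = []
    for r in r_index.get("c", []):          # (1) R tuples with R.A = "c"
        for s in s_index.get(r[2], []):     # (2) S tuples with S.C = R.C
            if s[2] != 2:                   # (3) eliminate S.E != 2
                continue
            result.append((r[1], s[1]))     # (4) join and project B, D
    return result


answer = plan_three(index_r_a, index_s_c)
print(answer)  # [(2, 'x')]
```

Only one R tuple and one S tuple are touched, in contrast to the 25 product tuples of Plan I.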
Example: SQL query
SELECT title
FROM StarsIn
WHERE starName IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE '%1960'
);
(Find the movies with stars born in 1960)

StarsIn(title, year, starName)
MovieStar(name, address, gender, birthdate)
Example: Parse Tree
[Figure: parse tree for the query. Leaves, left to right: SELECT title FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE birthDate LIKE '%1960' )]
Example: Generating Relational Algebra

    π_{title}
       |
       σ
      / \
StarsIn   starName IN
               |
            π_{name}
               |
            σ_{birthdate LIKE '%1960'}
               |
            MovieStar

Fig. 7.15: An expression using a two-argument σ, midway between a parse tree and relational algebra
Example: Logical Query Plan

    π_{title}
       |
    σ_{starName=name}
       |
       ×
      / \
StarsIn   π_{name}
             |
          σ_{birthdate LIKE '%1960'}
             |
          MovieStar

Fig. 7.18: Applying the rule for IN conditions
Example: Improved Logical Query Plan

    π_{title}
       |
    ⋈_{starName=name}
      / \
StarsIn   π_{name}
             |
          σ_{birthdate LIKE '%1960'}
             |
          MovieStar

Fig. 7.20: An improvement on the plan above
Question: Add project to StarsIn?
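The rewrite of the IN condition into a join can be checked with Python's built-in `sqlite3` module. This is a sketch only: the rows are invented, and `name` is declared a primary key so that the subquery returns no duplicate names (an assumption under which the two forms return the same titles).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StarsIn (title TEXT, year INTEGER, starName TEXT);
CREATE TABLE MovieStar (name TEXT PRIMARY KEY, address TEXT,
                        gender TEXT, birthdate TEXT);
-- Hypothetical rows, made up for illustration
INSERT INTO MovieStar VALUES
  ('Star One', '1 Hypothetical St', 'F', '03/14/1960'),
  ('Star Two', '2 Hypothetical St', 'M', '07/01/1955');
INSERT INTO StarsIn VALUES
  ('Movie A', 1990, 'Star One'),
  ('Movie B', 1992, 'Star Two'),
  ('Movie C', 1995, 'Star One');
""")

# Original form: IN with a subquery
in_form = conn.execute("""
SELECT title FROM StarsIn
WHERE starName IN (SELECT name FROM MovieStar
                   WHERE birthdate LIKE '%1960')
ORDER BY title
""").fetchall()

# Rewritten form: join StarsIn with the projected, selected MovieStar
join_form = conn.execute("""
SELECT title
FROM StarsIn
JOIN (SELECT name FROM MovieStar
      WHERE birthdate LIKE '%1960') AS born1960
  ON starName = name
ORDER BY title
""").fetchall()

print(in_form)
print(join_form)
conn.close()
```

If `name` could repeat in the subquery result, the join form would return a title once per matching name, so the rewrite would also need duplicate elimination.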
Example: Estimate Result Sizes
[Figure: query plan over StarsIn and MovieStar, annotated “Need expected size”]
Selection Operation
- File scan: search algorithms that locate and retrieve records that fulfill a selection condition.
- Algorithm A1 (linear search). Scan each file block and test all records to see whether they satisfy the selection condition.
  - Cost estimate (number of disk blocks scanned) = b_r, the number of blocks holding relation r
  - If the selection is on a key attribute, average cost = b_r / 2, since the scan stops on finding the record
  - Linear search can be applied regardless of the selection condition, the ordering of records in the file, or the availability of indices
Selection continued
- A2 (binary search). Applicable if the selection is an equality comparison on the attribute on which the file is ordered.
  - Assume that the blocks of a relation are stored contiguously
  - Cost estimate (number of disk blocks to be scanned):
    - ⌈log_2(b_r)⌉, the cost of locating the first tuple by a binary search on the blocks
    - plus the number of blocks containing records that satisfy the selection condition
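The A1 and A2 estimates can be written as two small functions. This sketch follows the slide wording literally; the function and parameter names are assumptions, and some treatments subtract one block from the A2 estimate because the first matching block is already read by the search.

```python
import math


def linear_search_cost(b_r, key_equality=False):
    """A1: scan all b_r blocks, or about half of them on average when the
    selection is an equality on a key and the scan stops at the match."""
    return b_r / 2 if key_equality else b_r


def binary_search_cost(b_r, matching_blocks=1):
    """A2: ceil(log2(b_r)) block reads to locate the first tuple, plus the
    number of blocks containing records that satisfy the condition."""
    return math.ceil(math.log2(b_r)) + matching_blocks


# Hypothetical relation stored in 1000 blocks
print(linear_search_cost(1000))                      # 1000
print(linear_search_cost(1000, key_equality=True))   # 500.0
print(binary_search_cost(1000, matching_blocks=3))   # 10 + 3 = 13
```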
Selection with Index Scan
- A3 (primary index on candidate key, equality). Retrieve a single record that satisfies the corresponding equality condition.
- A4 (primary index on nonkey, equality). Retrieve multiple records.
  - Records will be on consecutive blocks
- A5 (equality on search-key of secondary index).
  - Retrieve a single record if the search-key is a candidate key
  - Retrieve multiple records if the search-key is not a candidate key
  - Can be very expensive! Each record may be on a different block: one block access for each retrieved record
Cross Product and Join
- We want a way to estimate the size of the results of joins and cross products.
- The cross product r × s contains n_r * n_s tuples, and each tuple occupies s_r + s_s bytes, where n_r and n_s are the numbers of tuples and s_r and s_s are the tuple sizes of r and s.
- If R ∩ S = ∅, then r ⋈ s is the same as r × s.
Join Size Estimation
- If R ∩ S is a key for R, then we know that a tuple of s will join with at most one tuple from r, so the number of tuples in r ⋈ s is no greater than the number of tuples in s.
- If R ∩ S is a foreign key for S referencing R, then the number of tuples in r ⋈ s is exactly the number of tuples in s.

R   A   X        S   K   A
    35  …            k1  35
    36  …            k2  35
    37  …            k3  37
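These rules can be collected into one function that returns bounds on the join size, then checked against the small R and S instances on this slide. The function name `join_size_bounds`, its flags, and the `"..."` placeholder standing in for the elided X values are assumptions for illustration.

```python
def join_size_bounds(n_r, n_s, has_common_attrs,
                     common_is_key_of_r=False, common_is_fk_s_to_r=False):
    """Return (lower, upper) bounds on the number of tuples in r join s."""
    if not has_common_attrs:
        # No shared attributes: the join is the cross product.
        return (n_r * n_s, n_r * n_s)
    if common_is_fk_s_to_r:
        # Every s tuple matches exactly one r tuple.
        return (n_s, n_s)
    if common_is_key_of_r:
        # Every s tuple matches at most one r tuple.
        return (0, n_s)
    return (0, n_r * n_s)


# Instances from the slide: R(A, X) and S(K, A); X values are elided there.
R = [(35, "..."), (36, "..."), (37, "...")]
S = [("k1", 35), ("k2", 35), ("k3", 37)]

# Actual natural join on the shared attribute A
joined = [(k, a, x) for (k, a) in S for (r_a, x) in R if a == r_a]

bounds = join_size_bounds(len(R), len(S), has_common_attrs=True,
                          common_is_key_of_r=True, common_is_fk_s_to_r=True)
print(bounds)       # (3, 3)
print(len(joined))  # 3, the number of tuples in s
```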
SQL query
  -> parse -> parse tree
  -> convert -> logical query plan
  -> apply laws -> “improved” l.q.p.
  -> estimate result sizes (input: statistics) -> l.q.p. + sizes
  -> consider physical plans -> {P1, P2, …}
  -> estimate costs -> {(P1,C1), (P2,C2), ...}
  -> pick best -> Pi
  -> execute -> answer