Published by Bertram Sharp. Modified over 9 years ago.
1
Optimization of Nested Queries
Sujatha Thanigaimani
COSC 6421
2
Outline
Introduction
Kim’s algorithm for efficient processing
Count bug
– Solution
Inequality bug
– Solution
Alternate algorithm
Modification of Kim’s algorithm
3
Nested Queries
Queries containing other queries
Inner query:
– Can appear in the FROM or WHERE clause
Example:
  SELECT cname FROM borrower                        (“outer query”)
  WHERE cname IN (SELECT cname FROM depositor)      (“inner query”)
Think of the inner query as a function that returns its result to the outer query.
4
Evaluation of Nested Queries
Naive method: Tuple Iteration Semantics (TIS), which is inefficient.
Kim’s Algorithm
Rationale: nesting is an interesting and powerful feature of SQL.
Unnesting: the process of transforming nested queries into canonical form.
Kim classified nested queries for better understanding and processing.
5
Types
Schema: SUPPLIER(sno, sname, sloc, sbudget), PARTS(pno, pname, qoh, color), PROJECT(jno, jname, pno, jbudget, jloc), SHIPMENT(sno, pno, jno, qty, shipdate)
Type-A Nesting: not correlated, aggregated subquery
Example:
  SELECT SNO FROM SP WHERE PNO = (SELECT MAX(PNO) FROM P)
The inner query block can be evaluated independently of the outer query block, and the result of its evaluation is a single constant.
6
Type-N Nesting: not correlated, not aggregated subquery
  SELECT SNO FROM SP WHERE PNO IS IN (SELECT PNO FROM P WHERE WEIGHT > 50)
Evaluation: the inner query block Q is processed, resulting in a list of values X that is substituted for the inner query block, so that PNO IS IN Q becomes PNO IS IN X. The resulting query is then evaluated by nested iteration.
7
Type-J Nesting: correlated, not aggregated subquery
  SELECT SNAME FROM S WHERE SNO IS IN (SELECT SNO FROM SP WHERE QTY > 100 AND SPORIGIN = S.CITY)
Type-JA Nesting: correlated, aggregated subquery
  SELECT PNAM FROM P WHERE PNO = (SELECT MAX(PNO) FROM SP WHERE SPORIGIN = P.CITY)
Evaluation: under TIS, the inner query block is processed once for each tuple of the outer relation that satisfies all simple predicates on the outer relation, which is inefficient.
Kim developed alternate algorithms for efficient processing of nested queries.
8
Algorithm NEST-N-J (for type-N or type-J)
1. Combine the FROM clauses of all query blocks into one FROM clause.
2. AND together the WHERE clauses of all query blocks, replacing IS IN by =.
3. Retain the SELECT clause of the outermost query block.
The result is a canonical query logically equivalent to the original nested query.
Nested query:
  SELECT Ri.Ck FROM Ri WHERE Ri.Ch IS IN (SELECT Rj.Cm FROM Rj)
Canonical query:
  SELECT Ri.Ck FROM Ri, Rj WHERE Ri.Ch = Rj.Cm
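The NEST-N-J rewrite can be tried out with a minimal sketch using Python's sqlite3 module and the Type-N example from the earlier slide. The table contents are invented for illustration, and SQLite spells IS IN as IN. The canonical join form can return an outer tuple once per matching inner tuple, so the sketch compares the two results as sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SP (SNO INTEGER, PNO INTEGER);
    CREATE TABLE P  (PNO INTEGER, WEIGHT INTEGER);
    -- Illustrative rows only; they do not come from the slides.
    INSERT INTO SP VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO P  VALUES (100, 60), (200, 40), (300, 75);
""")

# Nested (type-N) form.
nested = """
    SELECT SNO FROM SP
    WHERE PNO IN (SELECT PNO FROM P WHERE WEIGHT > 50)
"""

# Canonical form: merged FROM clauses, ANDed WHERE clauses, IN replaced by =.
canonical = """
    SELECT SNO FROM SP, P
    WHERE SP.PNO = P.PNO AND P.WEIGHT > 50
"""

nested_rows = {row[0] for row in conn.execute(nested)}
canonical_rows = {row[0] for row in conn.execute(canonical)}
print(nested_rows, canonical_rows)
```

Both queries select suppliers 1 and 3 on this data.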
9
Algorithm NEST-JA
1. Generate a temporary relation Rt(C1,...,Cn,Cn+1) from R2 such that Rt.Cn+1 is the result of applying the aggregate function AGG on the Cn+1 column of R2 for the tuples that have matching values in R1 for C1, C2, etc.
Original query:
  SELECT R1.Cn+2 FROM R1 WHERE R1.Cn+1 = (SELECT AGG(R2.Cn+1) FROM R2 WHERE R2.C1 = R1.C1 AND R2.C2 = R1.C2 AND ... AND R2.Cn = R1.Cn);
Temporary relation:
  Rt(C1,...,Cn,Cn+1) = (SELECT C1,...,Cn, AGG(Cn+1) FROM R2 GROUP BY C1,...,Cn)
10
2. Transform the inner query block of the initial query by changing all references to R2 columns in join predicates which also reference R1 to the corresponding Rt columns. The result is a type-J nested query, which can be passed to algorithm NEST-N-J for transformation to its canonical equivalent.
  SELECT R1.Cn+2 FROM R1 WHERE R1.Cn+1 = (SELECT Rt.Cn+1 FROM Rt WHERE Rt.C1 = R1.C1 AND Rt.C2 = R1.C2 AND ... AND Rt.Cn = R1.Cn);
11
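Count bug
PARTS(PNUM, QOH), SUPPLY(PNUM, QUAN, SHIPDATE)
  SELECT PNUM FROM PARTS WHERE QOH = (SELECT COUNT(SHIPDATE) FROM SUPPLY WHERE SUPPLY.PNUM = PARTS.PNUM AND SHIPDATE < 1-1-80)
PARTS
PNUM  QOH
3     6
10    1
8     0
SUPPLY
PNUM  QUAN  SHIPDATE
3     4     7-3-79
3     2     10-1-78
10    1     6-8-78
10    2     8-10-81
8     5     5-7-83
Result by TIS
PNUM
10
8
Result (Kim’s algorithm)
PNUM
10
Part 8 has no shipments before 1-1-80, so under TIS its COUNT is 0 and matches QOH = 0. Grouping SUPPLY first produces no row for part 8, so the transformed query loses it.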
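The count bug can be reproduced with a small sketch in Python's sqlite3 module, loading the PARTS and SUPPLY rows from the slide. Two things are assumptions: the dates are rewritten as ISO strings (month-day-year order is assumed, which does not affect the comparison with 1-1-80), and the temporary relation is named TEMP_KIM here to keep clear of the SQL keyword TEMP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    INSERT INTO PARTS VALUES (3, 6), (10, 1), (8, 0);
    -- Dates rewritten as ISO strings; month-day-year order is assumed.
    INSERT INTO SUPPLY VALUES
        (3, 4, '1979-07-03'), (3, 2, '1978-10-01'), (10, 1, '1978-06-08'),
        (10, 2, '1981-08-10'), (8, 5, '1983-05-07');

    -- NEST-JA step 1: aggregate SUPPLY on its own, before any join.
    CREATE TABLE TEMP_KIM AS
    SELECT PNUM AS SUPPNUM, COUNT(SHIPDATE) AS CT
    FROM SUPPLY
    WHERE SHIPDATE < '1980-01-01'
    GROUP BY PNUM;
""")

# Tuple iteration semantics: the correlated subquery runs per PARTS row.
tis = conn.execute("""
    SELECT PNUM FROM PARTS
    WHERE QOH = (SELECT COUNT(SHIPDATE) FROM SUPPLY
                 WHERE SUPPLY.PNUM = PARTS.PNUM
                   AND SHIPDATE < '1980-01-01')
""").fetchall()

# NEST-JA step 2 followed by NEST-N-J: join PARTS with the temporary relation.
kim = conn.execute("""
    SELECT PARTS.PNUM FROM PARTS, TEMP_KIM
    WHERE PARTS.QOH = TEMP_KIM.CT AND PARTS.PNUM = TEMP_KIM.SUPPNUM
""").fetchall()

tis_parts = sorted(row[0] for row in tis)
kim_parts = sorted(row[0] for row in kim)
print("TIS:", tis_parts, "NEST-JA:", kim_parts)
```

On this data the nested query returns parts 8 and 10, while the transformed query returns only part 10.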
12
Solution using Outer Join R X A B S Y B C E R=+S XY Anull BB C E
13
Solution with outer joins
  TEMP(SUPPNUM, CT) = (SELECT PARTS.PNUM, COUNT(SHIPDATE) FROM PARTS, SUPPLY WHERE SHIPDATE < 1-1-80 AND PARTS.PNUM =+ SUPPLY.PNUM GROUP BY PARTS.PNUM)
PARTS.PNUM =+ SUPPLY.PNUM (for SHIPDATE < 1-1-80)
PARTS.PNUM  PARTS.QOH  SUPPLY.PNUM  SUPPLY.QUAN  SUPPLY.SHIPDATE
3           6          3            4            7-3-79
3           6          3            2            10-1-78
10          1          10           1            6-8-78
8           0          null         null         null
14
TEMP
SUPPNUM  CT
3        2
10       1
8        0
  SELECT PNUM FROM PARTS, TEMP WHERE PARTS.QOH = TEMP.CT AND PARTS.PNUM = TEMP.SUPPNUM
Final Result
PNUM
10
8
Drawbacks:
1. If the subquery has COUNT(*), it will always return a result > 0 because of the outer join. The '*' must be changed to a column name from the inner relation.
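The outer-join fix and the COUNT(*) drawback can both be checked with a short sqlite3 sketch on the count-bug data. The assumptions here: `=+` is expressed as SQLite's LEFT OUTER JOIN, the SHIPDATE restriction is placed in the ON clause so it is applied before unmatched PARTS rows are padded with nulls, dates are rewritten as ISO strings, and the table name TEMP_OJ is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    INSERT INTO PARTS VALUES (3, 6), (10, 1), (8, 0);
    INSERT INTO SUPPLY VALUES
        (3, 4, '1979-07-03'), (3, 2, '1978-10-01'), (10, 1, '1978-06-08'),
        (10, 2, '1981-08-10'), (8, 5, '1983-05-07');

    -- Outer join keeps every PARTS row; COUNT(column) ignores the null padding.
    CREATE TABLE TEMP_OJ AS
    SELECT PARTS.PNUM AS SUPPNUM, COUNT(SUPPLY.SHIPDATE) AS CT
    FROM PARTS LEFT OUTER JOIN SUPPLY
         ON PARTS.PNUM = SUPPLY.PNUM AND SUPPLY.SHIPDATE < '1980-01-01'
    GROUP BY PARTS.PNUM;
""")

temp_rows = sorted(conn.execute("SELECT SUPPNUM, CT FROM TEMP_OJ").fetchall())

result = sorted(row[0] for row in conn.execute("""
    SELECT PARTS.PNUM FROM PARTS, TEMP_OJ
    WHERE PARTS.QOH = TEMP_OJ.CT AND PARTS.PNUM = TEMP_OJ.SUPPNUM
"""))

# Drawback 1: COUNT(*) also counts the null-padded row, so part 8 gets 1, not 0.
count_star = dict(conn.execute("""
    SELECT PARTS.PNUM, COUNT(*)
    FROM PARTS LEFT OUTER JOIN SUPPLY
         ON PARTS.PNUM = SUPPLY.PNUM AND SUPPLY.SHIPDATE < '1980-01-01'
    GROUP BY PARTS.PNUM
"""))

print(temp_rows, result, count_star)
```

Part 8 now appears in the temporary relation with a count of 0, so the final join agrees with the TIS result.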
15
2. Duplicates problem:
PARTS
PNUM  QOH
3     2
3     6
10    1
10    0
8     0
SUPPLY
PNUM  QUAN  SHIPDATE
3     4     7-3-79
3     2     10-1-78
10    1     6-8-78
TEMP
SUPPNUM  CT
3        4
10       2
8        0
Result by TIS
PNUM
3
10
8
Our Result
PNUM
8
16
Solution:
1. Remove duplicates before the join that creates the temp table is performed.
  TEMP1(PNUM) = (SELECT DISTINCT PNUM FROM PARTS)
2. Use the projection instead of the outer relation in any join required to build the temp table.
  TEMP2(SUPPNUM, CT) = (SELECT TEMP1.PNUM, COUNT(SHIPDATE) FROM TEMP1, SUPPLY WHERE SUPPLY.SHIPDATE < 1-1-80 AND TEMP1.PNUM =+ SUPPLY.PNUM GROUP BY TEMP1.PNUM)
TEMP2
SUPPNUM  CT
3        2
10       1
8        0
Result
PNUM
3
10
8
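The effect of projecting the join column first can be seen in a small sqlite3 sketch. The PARTS rows below are a reading of the duplicates example in which PNUM 3 and PNUM 10 each appear twice; that reading, the ISO date strings, and the table name TEMP_NAIVE are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    -- PARTS holds duplicate PNUM values (assumed reading of the example).
    INSERT INTO PARTS VALUES (3, 2), (3, 6), (10, 1), (10, 0), (8, 0);
    INSERT INTO SUPPLY VALUES
        (3, 4, '1979-07-03'), (3, 2, '1978-10-01'), (10, 1, '1978-06-08');

    -- Outer join taken directly against PARTS: duplicates inflate the counts.
    CREATE TABLE TEMP_NAIVE AS
    SELECT PARTS.PNUM AS SUPPNUM, COUNT(SUPPLY.SHIPDATE) AS CT
    FROM PARTS LEFT OUTER JOIN SUPPLY
         ON PARTS.PNUM = SUPPLY.PNUM AND SUPPLY.SHIPDATE < '1980-01-01'
    GROUP BY PARTS.PNUM;

    -- Step 1: project the join column without duplicates.
    CREATE TABLE TEMP1 AS SELECT DISTINCT PNUM FROM PARTS;

    -- Step 2: build the counts from the projection instead of PARTS.
    CREATE TABLE TEMP2 AS
    SELECT TEMP1.PNUM AS SUPPNUM, COUNT(SUPPLY.SHIPDATE) AS CT
    FROM TEMP1 LEFT OUTER JOIN SUPPLY
         ON TEMP1.PNUM = SUPPLY.PNUM AND SUPPLY.SHIPDATE < '1980-01-01'
    GROUP BY TEMP1.PNUM;
""")

def matching_parts(temp_table):
    # temp_table is one of the two fixed names above, never user input.
    sql = f"""
        SELECT PARTS.PNUM FROM PARTS, {temp_table} AS T
        WHERE PARTS.QOH = T.CT AND PARTS.PNUM = T.SUPPNUM
    """
    return sorted(row[0] for row in conn.execute(sql))

naive_result = matching_parts("TEMP_NAIVE")
fixed_result = matching_parts("TEMP2")
print(naive_result, fixed_result)
```

Joining against PARTS directly gives counts of 4 and 2 for parts 3 and 10, so only part 8 survives; with the DISTINCT projection the counts are 2, 1 and 0 and all three parts are returned.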
17
Another bug: relations other than equality
  SELECT PNUM FROM PARTS WHERE QOH = (SELECT MAX(QUAN) FROM SUPPLY WHERE SUPPLY.PNUM < PARTS.PNUM AND SHIPDATE < 1-1-80)
Kim’s transformation:
  TEMP(SUPPNUM, MAXQUAN) = (SELECT PNUM, MAX(QUAN) FROM SUPPLY WHERE SHIPDATE < 1-1-80 GROUP BY PNUM)
  SELECT PNUM FROM PARTS, TEMP WHERE QOH = TEMP.MAXQUAN AND TEMP.SUPPNUM < PARTS.PNUM
Problem: MAX is calculated separately for each SUPPLY.PNUM, but the query requires MAX over the whole set of SUPPLY tuples whose PNUM is less than the given PARTS.PNUM.
18
Solution:
1. First join, then aggregate (Kim’s was: first group, then join).
  TEMP(SUPPNUM, MAXQUAN) = (SELECT PARTS.PNUM, MAX(QUAN) FROM PARTS, SUPPLY WHERE SHIPDATE < 1-1-80 AND SUPPLY.PNUM < PARTS.PNUM GROUP BY PARTS.PNUM)
  SELECT PNUM FROM PARTS, TEMP WHERE PARTS.QOH = TEMP.MAXQUAN AND PARTS.PNUM = TEMP.SUPPNUM
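A sketch of the difference between "group, then join" and "join, then aggregate" for the < predicate, again with Python's sqlite3. The slides give no data for this case, so the rows below are hypothetical and chosen so that the two strategies disagree; the table names TEMP_GROUP_FIRST and TEMP_JOIN_FIRST are also made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    -- Hypothetical rows chosen so the two strategies disagree.
    INSERT INTO PARTS VALUES (3, 6), (8, 4), (10, 4);
    INSERT INTO SUPPLY VALUES
        (3, 4, '1979-07-03'), (3, 2, '1978-10-01'), (8, 5, '1978-06-08');

    -- Group first, then join: one MAX per SUPPLY.PNUM.
    CREATE TABLE TEMP_GROUP_FIRST AS
    SELECT PNUM AS SUPPNUM, MAX(QUAN) AS MAXQUAN
    FROM SUPPLY
    WHERE SHIPDATE < '1980-01-01'
    GROUP BY PNUM;

    -- Join first, then aggregate: one MAX per PARTS.PNUM.
    CREATE TABLE TEMP_JOIN_FIRST AS
    SELECT PARTS.PNUM AS SUPPNUM, MAX(QUAN) AS MAXQUAN
    FROM PARTS, SUPPLY
    WHERE SHIPDATE < '1980-01-01' AND SUPPLY.PNUM < PARTS.PNUM
    GROUP BY PARTS.PNUM;
""")

def pnums(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# Reference answer: the nested query evaluated as written.
tis = pnums("""
    SELECT PNUM FROM PARTS
    WHERE QOH = (SELECT MAX(QUAN) FROM SUPPLY
                 WHERE SUPPLY.PNUM < PARTS.PNUM
                   AND SHIPDATE < '1980-01-01')
""")

group_first = pnums("""
    SELECT PARTS.PNUM FROM PARTS, TEMP_GROUP_FIRST AS T
    WHERE PARTS.QOH = T.MAXQUAN AND T.SUPPNUM < PARTS.PNUM
""")

join_first = pnums("""
    SELECT PARTS.PNUM FROM PARTS, TEMP_JOIN_FIRST AS T
    WHERE PARTS.QOH = T.MAXQUAN AND PARTS.PNUM = T.SUPPNUM
""")

print(tis, group_first, join_first)
```

For part 10 the correct maximum over all SUPPLY rows with a smaller PNUM is 5, so it should not qualify. The group-first version still matches it against the per-PNUM maximum of 4 for part 3, while the join-first version agrees with the nested query.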
19
Modified Algorithm: NEST-JA2
1. Project the join column of the outer relation, and restrict it with any simple predicates applying to the outer relation.
  TEMP1(PNUM) = (SELECT DISTINCT PNUM FROM PARTS)
2. Create a temporary relation, joining the inner relation with the projection of the outer relation. If the aggregate function is COUNT, the join must be an outer join.
  TEMP2(PNUM, SHIPDATE) = (SELECT PNUM, SHIPDATE FROM SUPPLY WHERE SHIPDATE < 1-1-80)
  TEMP3(PNUM, CT) = (SELECT TEMP1.PNUM, COUNT(TEMP2.SHIPDATE) FROM TEMP1, TEMP2 WHERE TEMP1.PNUM =+ TEMP2.PNUM GROUP BY TEMP1.PNUM)
20
3. Join the outer relation with the temporary relation, according to the transformed version of the original query.
  SELECT PARTS.PNUM FROM PARTS, TEMP3 WHERE PARTS.QOH = TEMP3.CT AND PARTS.PNUM = TEMP3.PNUM
Processing a General Nested Query: Recursive Approach
procedure nest_g(query_block)
  for each predicate in the WHERE clause of query_block
    if predicate is a nested predicate (i.e., contains an inner query block)
      nest_g(inner_query_block)
      /* Determine type of nesting and call appropriate transformation procedure */
      if nesting is type-JA
        nest-JA2(inner_query_block)
21
nest_g (contd.)
        nest-N-J(query_block, inner_query_block)
      else if nesting is type-A
        nest_a(inner_query_block)
      else
        nest-N-J(query_block, inner_query_block)
  return
Advantage: simplicity
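The control flow of nest_g can be sketched in Python. This is only an outline of the recursion: the QueryBlock and Predicate classes and their correlated/aggregated flags are hypothetical stand-ins for a real query representation, and instead of rewriting SQL the sketch records which transformation would be called, in order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Predicate:
    text: str
    inner: Optional["QueryBlock"] = None  # set when the predicate is nested
    correlated: bool = False              # inner block references the outer relation
    aggregated: bool = False              # inner SELECT applies an aggregate


@dataclass
class QueryBlock:
    name: str
    predicates: List[Predicate] = field(default_factory=list)


def nest_g(block, log):
    """Record the transformations nest_g would apply, innermost block first."""
    for pred in block.predicates:
        if pred.inner is None:
            continue  # simple predicate, nothing to unnest
        nest_g(pred.inner, log)  # unnest deeper levels before this one
        if pred.correlated and pred.aggregated:   # type-JA
            log.append(("nest-JA2", pred.inner.name))
            log.append(("nest-N-J", block.name, pred.inner.name))
        elif pred.aggregated:                     # type-A
            log.append(("nest_a", pred.inner.name))
        else:                                     # type-N or type-J
            log.append(("nest-N-J", block.name, pred.inner.name))
    return log


# Q1 contains a type-JA block Q2, which in turn contains a type-N block Q3.
q3 = QueryBlock("Q3")
q2 = QueryBlock("Q2", [Predicate("C IS IN (Q3)", inner=q3)])
q1 = QueryBlock("Q1", [Predicate("B = (Q2)", inner=q2,
                                 correlated=True, aggregated=True)])
log = nest_g(q1, [])
print(log)
```

Because the recursive call comes before the type test, the innermost block Q3 is merged into Q2 first, and only then is Q2 handled as a type-JA block of Q1.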
22
Analysis
23
Modified Kim’s Algorithm: R.B OP1 TEMP1.COUNT : R.B OP1 0
|TEMP1| < |R OJ S|, hence better than the alternate algorithm
24
References:
1. Richard A. Ganski, Harry K. T. Wong. Optimization of Nested SQL Queries Revisited.
2. M. Muralikrishna. Improved Unnesting Algorithms for Join Aggregate SQL Queries.
25
Thank You