Optimization of Nested Queries Sujatha Thanigaimani COSC 6421.


1 Optimization of Nested Queries Sujatha Thanigaimani COSC 6421

2 Outline
- Introduction
- Kim’s algorithm for efficient processing
- Count bug and its solution
- Inequality bug and its solution
- Alternate algorithm
- Modification of Kim’s algorithm

3 Nested Queries
Queries containing other queries.
Inner query:
- Can appear in the FROM or WHERE clause
Example (an “outer query” containing an “inner query”):
SELECT cname FROM borrower WHERE cname IN (SELECT cname FROM depositor)
Think of the inner query as a function that returns its result to the outer query.

4 Evaluation of Nested Queries
Naive method: Tuple Iteration Semantics (TIS), which is inefficient.
Kim’s Algorithm
- Rationale: nesting is an interesting and powerful feature of SQL.
- Unnesting: the process of transforming nested queries into canonical form.
- Kim classified nested queries for better understanding and processing.

5 Types
Schema: SUPPLIER(sno, sname, sloc, sbudget), PARTS(pno, pname, qoh, color), PROJECT(jno, jname, pno, jbudget, jloc), SHIPMENT(sno, pno, jno, qty, shipdate)
Type-A Nesting: not correlated, aggregated subquery.
Example: SELECT SNO FROM SP WHERE PNO = (SELECT MAX(PNO) FROM P)
The inner block can be evaluated independently of the outer query block, and the result of its evaluation is a single constant.

6 Type-N Nesting: not correlated, not aggregated subquery.
Example: SELECT SNO FROM SP WHERE PNO IS IN (SELECT PNO FROM P WHERE WEIGHT > 50)
Evaluation: the inner query block Q is processed, resulting in a list of values X, which is then substituted for the inner query block so that PNO IS IN Q becomes PNO IS IN X. The resulting query is then evaluated by nested iteration.
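The two-step evaluation of a type-N query described above can be sketched as follows. This is only an illustration using Python's standard sqlite3 module: the rows in P and SP are made up, and standard SQL `IN` stands in for the slide's `IS IN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P  (PNO INTEGER, WEIGHT INTEGER);
    CREATE TABLE SP (SNO INTEGER, PNO INTEGER);
    -- Hypothetical rows, not taken from the slides.
    INSERT INTO P  VALUES (1, 40), (2, 60), (3, 75);
    INSERT INTO SP VALUES (100, 1), (101, 2), (102, 3), (103, 2);
""")

# Step 1: evaluate the inner block Q once, independently of the outer block.
x = [row[0] for row in conn.execute("SELECT PNO FROM P WHERE WEIGHT > 50")]

# Step 2: substitute the list X for Q, so "PNO IN Q" becomes "PNO IN X".
placeholders = ", ".join("?" for _ in x)
two_step = sorted(row[0] for row in conn.execute(
    f"SELECT SNO FROM SP WHERE PNO IN ({placeholders})", x))

# Reference: let the engine evaluate the nested query directly.
nested = sorted(row[0] for row in conn.execute(
    "SELECT SNO FROM SP WHERE PNO IN (SELECT PNO FROM P WHERE WEIGHT > 50)"))
```

Because the inner block does not reference the outer relation, both forms should return the same supplier numbers.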

7 Type-J Nesting: correlated, not aggregated subquery.
Example: SELECT SNAME FROM S WHERE SNO IS IN (SELECT SNO FROM SP WHERE QTY > 100 AND SPORIGIN = S.CITY)
Type-JA Nesting: correlated, aggregated subquery.
Example: SELECT PNAME FROM P WHERE PNO = (SELECT MAX(PNO) FROM SP WHERE SPORIGIN = P.CITY)
Evaluation: under TIS, the inner query block is processed once for each tuple of the outer relation that satisfies all simple predicates on the outer relation, which is inefficient. Kim developed alternate algorithms for efficient processing of nested queries.

8 Algorithm NEST-N-J (for type-N or type-J)
1. Combine the FROM clauses of all query blocks into one FROM clause.
2. AND together the WHERE clauses of all query blocks, replacing IS IN by =.
3. Retain the SELECT clause of the outermost query block.
The result is a canonical query logically equivalent to the original nested query.
Nested: SELECT Ri.Ck FROM Ri WHERE Ri.Ch IS IN (SELECT Rj.Cm FROM Rj)
Canonical: SELECT Ri.Ck FROM Ri, Rj WHERE Ri.Ch = Rj.Cm
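As an illustration of NEST-N-J, the sketch below applies the three steps by hand to the type-J example from the earlier slide and compares the results in SQLite. The rows in S and SP are hypothetical. The results are compared as sets because the join form can repeat an outer row when several inner rows match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO INTEGER, SNAME TEXT, CITY TEXT);
    CREATE TABLE SP (SNO INTEGER, PNO INTEGER, QTY INTEGER, SPORIGIN TEXT);
    -- Hypothetical rows, not taken from the slides.
    INSERT INTO S  VALUES (1, 'Smith', 'Paris'), (2, 'Jones', 'Rome'),
                          (3, 'Blake', 'Oslo');
    INSERT INTO SP VALUES (1, 10, 150, 'Paris'), (2, 11, 200, 'Oslo'),
                          (3, 12, 50, 'Oslo');
""")

# Original type-J nested query (standard SQL IN replaces the slide's IS IN).
nested_sql = """
    SELECT SNAME FROM S
    WHERE SNO IN (SELECT SNO FROM SP
                  WHERE QTY > 100 AND SPORIGIN = S.CITY)
"""

# NEST-N-J: merge the FROM clauses, AND the WHERE clauses together with
# IN replaced by =, and keep the outermost SELECT clause.
canonical_sql = """
    SELECT SNAME FROM S, SP
    WHERE S.SNO = SP.SNO AND QTY > 100 AND SPORIGIN = S.CITY
"""

nested_rows = set(conn.execute(nested_sql).fetchall())
canonical_rows = set(conn.execute(canonical_sql).fetchall())
```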

9 Algorithm NEST-JA
1. Generate a temporary relation Rt(C1,..,Cn,Cn+1) from R2 such that Rt.Cn+1 is the result of applying the aggregate function AGG to the Cn+1 column of the R2 tuples that have matching values in R1 for C1, C2, etc.
Original query:
SELECT R1.Cn+2 FROM R1 WHERE R1.Cn+1 = (SELECT AGG(R2.Cn+1) FROM R2 WHERE R2.C1 = R1.C1 AND R2.C2 = R1.C2 AND … AND R2.Cn = R1.Cn);
Temporary relation:
Rt(C1,..,Cn,Cn+1) = (SELECT C1,..,Cn,AGG(Cn+1) FROM R2 GROUP BY C1,..,Cn)

10 2. Transform the inner query block of the initial query by changing all references to R2 columns, in join predicates that also reference R1, to the corresponding Rt columns. The result is a type-J nested query, which can be passed to algorithm NEST-N-J for transformation to its canonical equivalent.
SELECT R1.Cn+2 FROM R1 WHERE R1.Cn+1 = (SELECT Rt.Cn+1 FROM Rt WHERE Rt.C1 = R1.C1 AND Rt.C2 = R1.C2 AND … AND Rt.Cn = R1.Cn);
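A minimal sketch of NEST-JA on the type-JA example (MAX over SP, correlated on city), using SQLite. The table contents and the name RT for the temporary relation are assumptions made for illustration. With MAX the transformation should agree with the nested query on this data; COUNT behaves differently, which is the subject of the count-bug slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P  (PNO INTEGER, PNAME TEXT, CITY TEXT);
    CREATE TABLE SP (SNO INTEGER, PNO INTEGER, QTY INTEGER, SPORIGIN TEXT);
    -- Hypothetical rows, not taken from the slides.
    INSERT INTO P  VALUES (1, 'bolt', 'Paris'), (2, 'nut', 'Paris'),
                          (3, 'screw', 'Rome');
    INSERT INTO SP VALUES (1, 1, 10, 'Paris'), (1, 2, 20, 'Paris'),
                          (2, 2, 5, 'Rome');

    -- Step 1: temporary relation, aggregating the inner relation
    -- grouped by its join column.
    CREATE TABLE RT AS
        SELECT SPORIGIN, MAX(PNO) AS MAXPNO FROM SP GROUP BY SPORIGIN;
""")

# Original type-JA nested query.
nested_rows = set(conn.execute("""
    SELECT PNAME FROM P
    WHERE PNO = (SELECT MAX(PNO) FROM SP WHERE SPORIGIN = P.CITY)
""").fetchall())

# Step 2 followed by NEST-N-J: the inner block now reads from RT and is
# merged into the outer block as a plain join.
unnested_rows = set(conn.execute("""
    SELECT PNAME FROM P, RT
    WHERE P.PNO = RT.MAXPNO AND RT.SPORIGIN = P.CITY
""").fetchall())
```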

11 Count bug
Relations: PARTS(PNUM, QOH), SUPPLY(PNUM, QUAN, SHIPDATE)
SELECT PNUM FROM PARTS WHERE QOH = (SELECT COUNT(SHIPDATE) FROM SUPPLY WHERE SUPPLY.PNUM = PARTS.PNUM AND SHIPDATE < 1-1-80)
PARTS
PNUM  QOH
3     6
10    1
8     0
SUPPLY
PNUM  QUAN  SHIPDATE
3     4     7-3-79
3     2     10-1-78
10    1     6-8-78
10    2     8-10-81
8     5     5-7-83
Result by TIS: PNUM = 10, 8
Result by Kim's algorithm: PNUM = 10
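The count bug can be reproduced with the slide's data. The sketch below uses SQLite and makes two assumptions: dates are stored as ISO strings (only the year matters for the `< 1-1-80` comparison), and the temporary relation is named KIM_TEMP because TEMP is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    INSERT INTO PARTS  VALUES (3, 6), (10, 1), (8, 0);
    INSERT INTO SUPPLY VALUES
        (3, 4, '1979-07-03'), (3, 2, '1978-10-01'), (10, 1, '1978-06-08'),
        (10, 2, '1981-08-10'), (8, 5, '1983-05-07');

    -- Kim's NEST-JA temporary relation: group the inner relation first.
    CREATE TABLE KIM_TEMP AS
        SELECT PNUM AS SUPPNUM, COUNT(SHIPDATE) AS CT
        FROM SUPPLY WHERE SHIPDATE < '1980-01-01' GROUP BY PNUM;
""")

def pnums(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# Nested query, evaluated by the engine with tuple-iteration semantics.
tis_result = pnums("""
    SELECT PNUM FROM PARTS
    WHERE QOH = (SELECT COUNT(SHIPDATE) FROM SUPPLY
                 WHERE SUPPLY.PNUM = PARTS.PNUM
                   AND SHIPDATE < '1980-01-01')
""")

# Kim's transformed query: part 8 has no qualifying shipment, so it has
# no row in KIM_TEMP and is lost in the join.
kim_result = pnums("""
    SELECT PARTS.PNUM FROM PARTS, KIM_TEMP
    WHERE PARTS.QOH = KIM_TEMP.CT AND PARTS.PNUM = KIM_TEMP.SUPPNUM
""")
```

Part 8 has QOH = 0 and zero shipments before 1980, so the nested query returns it, while the grouped temporary relation contains no group for it at all.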

12 Solution using Outer Join
R(X): A, B
S(Y): B, C, E
R =+ S
X  Y
A  null
B  B

13 Solution with outer joins
TEMP(SUPPNUM, CT) = (SELECT PARTS.PNUM, COUNT(SHIPDATE) FROM PARTS, SUPPLY WHERE SHIPDATE < 1-1-80 AND PARTS.PNUM =+ SUPPLY.PNUM GROUP BY PARTS.PNUM)
Result of PARTS.PNUM =+ SUPPLY.PNUM (for SHIPDATE < 1-1-80):
PARTS.PNUM  PARTS.QOH  SUPPLY.PNUM  SUPPLY.QUAN  SUPPLY.SHIPDATE
3           6          3            4            7-3-79
3           6          3            2            10-1-78
10          1          10           1            6-8-78
8           0          null         null         null

14 TEMP
SUPPNUM  CT
3        2
10       1
8        0
SELECT PNUM FROM PARTS, TEMP WHERE PARTS.QOH = TEMP.CT AND PARTS.PNUM = TEMP.SUPPNUM
Final Result: PNUM = 10, 8
Drawbacks:
1. If the subquery has COUNT(*), this will always return a result > 0 because of the outer join. The '*' must be changed to a column name from the inner relation.
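The outer-join fix and the COUNT(*) drawback can be sketched in SQLite as follows. Two assumptions apply: SQLite has no `=+` operator, so `LEFT OUTER JOIN` is used, and the date restriction is placed in the `ON` clause so that parts without qualifying shipments keep their null-padded row. The names TEMP_CT and TEMP_STAR are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    INSERT INTO PARTS  VALUES (3, 6), (10, 1), (8, 0);
    INSERT INTO SUPPLY VALUES
        (3, 4, '1979-07-03'), (3, 2, '1978-10-01'), (10, 1, '1978-06-08'),
        (10, 2, '1981-08-10'), (8, 5, '1983-05-07');

    -- COUNT over an inner-relation column ignores the null-padded row.
    CREATE TABLE TEMP_CT AS
        SELECT PARTS.PNUM AS SUPPNUM, COUNT(SUPPLY.SHIPDATE) AS CT
        FROM PARTS LEFT OUTER JOIN SUPPLY
          ON PARTS.PNUM = SUPPLY.PNUM AND SUPPLY.SHIPDATE < '1980-01-01'
        GROUP BY PARTS.PNUM;

    -- COUNT(*) counts the null-padded row too, so it never yields 0.
    CREATE TABLE TEMP_STAR AS
        SELECT PARTS.PNUM AS SUPPNUM, COUNT(*) AS CT
        FROM PARTS LEFT OUTER JOIN SUPPLY
          ON PARTS.PNUM = SUPPLY.PNUM AND SUPPLY.SHIPDATE < '1980-01-01'
        GROUP BY PARTS.PNUM;
""")

def final_result(temp_table):
    """Join PARTS with the given temporary relation, as in the final query."""
    sql = (f"SELECT PARTS.PNUM FROM PARTS, {temp_table} AS T "
           "WHERE PARTS.QOH = T.CT AND PARTS.PNUM = T.SUPPNUM")
    return sorted(row[0] for row in conn.execute(sql))

temp_ct = sorted(conn.execute("SELECT SUPPNUM, CT FROM TEMP_CT").fetchall())
temp_star = sorted(conn.execute("SELECT SUPPNUM, CT FROM TEMP_STAR").fetchall())
result_ct = final_result("TEMP_CT")
result_star = final_result("TEMP_STAR")
```

If the date restriction were moved to a `WHERE` clause after the outer join, the null-padded row for part 8 would be filtered out again and the bug would reappear.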

15 2. Duplicates Problem
PARTS
PNUM  QOH
3     2
3     6
10    1
10    0
8     0
SUPPLY
PNUM  QUAN  SHIPDATE
3     4     7-3-79
3     2     10-1-78
10    1     6-8-78
TEMP
SUPPNUM  CT
3        4
10       2
8        0
Result by TIS: PNUM = 3, 10, 8
Our Result: PNUM = 8

16 Solution:
1. Remove duplicates before the join that creates the temp table is performed.
TEMP1(PNUM) = (SELECT DISTINCT PNUM FROM PARTS)
2. Use the projection instead of the outer relation in any join required to build the temp table.
TEMP2(SUPPNUM, CT) = (SELECT TEMP1.PNUM, COUNT(SHIPDATE) FROM TEMP1, SUPPLY WHERE SUPPLY.SHIPDATE < 1-1-80 AND TEMP1.PNUM =+ SUPPLY.PNUM GROUP BY TEMP1.PNUM)
TEMP2
SUPPNUM  CT
3        2
10       1
8        0
Result: PNUM = 3, 10, 8
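The duplicates problem and its fix can be checked with a small SQLite sketch. The PARTS rows below follow the duplicates example, with PNUM 3 and 10 each appearing twice, which is what the counts 4 and 2 in the faulty temp table imply. `LEFT OUTER JOIN` again replaces `=+`, and the name TEMP_DUP for the faulty temp table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    INSERT INTO PARTS  VALUES (3, 2), (3, 6), (10, 1), (10, 0), (8, 0);
    INSERT INTO SUPPLY VALUES
        (3, 4, '1979-07-03'), (3, 2, '1978-10-01'), (10, 1, '1978-06-08');

    -- Faulty: joining PARTS itself multiplies counts by duplicate PNUMs.
    CREATE TABLE TEMP_DUP AS
        SELECT PARTS.PNUM AS SUPPNUM, COUNT(SUPPLY.SHIPDATE) AS CT
        FROM PARTS LEFT OUTER JOIN SUPPLY
          ON PARTS.PNUM = SUPPLY.PNUM AND SUPPLY.SHIPDATE < '1980-01-01'
        GROUP BY PARTS.PNUM;

    -- Fix, step 1: project the join column without duplicates.
    CREATE TABLE TEMP1 AS SELECT DISTINCT PNUM FROM PARTS;

    -- Fix, step 2: build the temp table from the projection.
    CREATE TABLE TEMP2 AS
        SELECT TEMP1.PNUM AS SUPPNUM, COUNT(SUPPLY.SHIPDATE) AS CT
        FROM TEMP1 LEFT OUTER JOIN SUPPLY
          ON TEMP1.PNUM = SUPPLY.PNUM AND SUPPLY.SHIPDATE < '1980-01-01'
        GROUP BY TEMP1.PNUM;
""")

def final_result(temp_table):
    """Join PARTS with the given temporary relation."""
    sql = (f"SELECT PARTS.PNUM FROM PARTS, {temp_table} AS T "
           "WHERE PARTS.QOH = T.CT AND PARTS.PNUM = T.SUPPNUM")
    return sorted(row[0] for row in conn.execute(sql))

temp_dup = sorted(conn.execute("SELECT SUPPNUM, CT FROM TEMP_DUP").fetchall())
temp2 = sorted(conn.execute("SELECT SUPPNUM, CT FROM TEMP2").fetchall())
faulty_result = final_result("TEMP_DUP")
fixed_result = final_result("TEMP2")
```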

17 Another bug: relations other than equality
SELECT PNUM FROM PARTS WHERE QOH = (SELECT MAX(QUAN) FROM SUPPLY WHERE SUPPLY.PNUM < PARTS.PNUM AND SHIPDATE < 1-1-80)
Kim's transformation:
TEMP(SUPPNUM, MAXQUAN) = (SELECT PNUM, MAX(QUAN) FROM SUPPLY WHERE SHIPDATE < 1-1-80 GROUP BY PNUM)
SELECT PNUM FROM PARTS, TEMP WHERE QOH = TEMP.MAXQUAN AND TEMP.SUPPNUM < PARTS.PNUM
Problem: MAX is calculated separately for each SUPPLY.PNUM, but the query requires the MAX over the whole set of SUPPLY tuples whose PNUM is less than the given PARTS.PNUM.

18 Solution:
1. First join, then aggregate (Kim's approach was: first group, then join).
TEMP(SUPPNUM, MAXQUAN) = (SELECT PARTS.PNUM, MAX(QUAN) FROM PARTS, SUPPLY WHERE SHIPDATE < 1-1-80 AND SUPPLY.PNUM < PARTS.PNUM GROUP BY PARTS.PNUM)
SELECT PNUM FROM PARTS, TEMP WHERE PARTS.QOH = TEMP.MAXQUAN AND PARTS.PNUM = TEMP.SUPPNUM
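To see the difference between "group, then join" and "join, then aggregate" under a `<` correlation, the sketch below runs both in SQLite. The rows are hypothetical and chosen so that the two orders disagree. KIM_TEMP and FIXED_TEMP are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    -- Hypothetical rows, not taken from the slides.
    INSERT INTO PARTS  VALUES (5, 2), (10, 4);
    INSERT INTO SUPPLY VALUES (3, 4, '1979-07-03'), (4, 2, '1978-10-01');

    -- Kim: group first. One MAX per SUPPLY.PNUM.
    CREATE TABLE KIM_TEMP AS
        SELECT PNUM AS SUPPNUM, MAX(QUAN) AS MAXQUAN
        FROM SUPPLY WHERE SHIPDATE < '1980-01-01' GROUP BY PNUM;

    -- Fix: join first, then aggregate. One MAX per PARTS.PNUM, taken
    -- over every SUPPLY row with a smaller PNUM.
    CREATE TABLE FIXED_TEMP AS
        SELECT PARTS.PNUM AS SUPPNUM, MAX(SUPPLY.QUAN) AS MAXQUAN
        FROM PARTS, SUPPLY
        WHERE SUPPLY.SHIPDATE < '1980-01-01' AND SUPPLY.PNUM < PARTS.PNUM
        GROUP BY PARTS.PNUM;
""")

def pnums(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

tis_result = pnums("""
    SELECT PNUM FROM PARTS
    WHERE QOH = (SELECT MAX(QUAN) FROM SUPPLY
                 WHERE SUPPLY.PNUM < PARTS.PNUM
                   AND SHIPDATE < '1980-01-01')
""")
kim_result = pnums("""
    SELECT PARTS.PNUM FROM PARTS, KIM_TEMP
    WHERE PARTS.QOH = KIM_TEMP.MAXQUAN AND KIM_TEMP.SUPPNUM < PARTS.PNUM
""")
fixed_result = pnums("""
    SELECT PARTS.PNUM FROM PARTS, FIXED_TEMP
    WHERE PARTS.QOH = FIXED_TEMP.MAXQUAN
      AND PARTS.PNUM = FIXED_TEMP.SUPPNUM
""")
```

Kim's version wrongly returns part 5, because its QOH of 2 happens to equal the per-group maximum for supply PNUM 4, even though the maximum over all smaller PNUMs is 4.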

19 Modified Algorithm: NEST-JA2
1. Project the join column of the outer relation, and restrict it with any simple predicates applying to the outer relation.
TEMP1(PNUM) = (SELECT DISTINCT PNUM FROM PARTS)
2. Create a temporary relation, joining the inner relation with the projection of the outer relation. If the aggregate function is COUNT, the join must be an outer join.
TEMP2(PNUM, SHIPDATE) = (SELECT PNUM, SHIPDATE FROM SUPPLY WHERE SHIPDATE < 1-1-80)
TEMP3(PNUM, CT) = (SELECT TEMP1.PNUM, COUNT(TEMP2.SHIPDATE) FROM TEMP1, TEMP2 WHERE TEMP1.PNUM =+ TEMP2.PNUM GROUP BY TEMP1.PNUM)

20 3. Join the outer relation with the temporary relation, according to the transformed version of the original query.
SELECT PARTS.PNUM FROM PARTS, TEMP3 WHERE PARTS.QOH = TEMP3.CT AND PARTS.PNUM = TEMP3.PNUM
Processing a General Nested Query: Recursive Approach
procedure nest_g(query_block)
  for each predicate in the WHERE clause of query_block
    if predicate is a nested predicate (i.e. contains an inner query block)
      nest_g(inner_query_block)
      /* Determine type of nesting and call appropriate transformation procedure */
      if nesting is type-JA
        nest-JA2(inner_query_block)

21 nest_g (contd.)
        nest-N-J(query_block, inner_query_block)
      else if nesting is type-A
        nest_a(inner_query_block)
      else /* nesting is type-N or type-J */
        nest-N-J(query_block, inner_query_block)
  return
Advantage: Simplicity
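The control flow of nest_g can be sketched in Python as below. This only models the recursion and the dispatch. The QueryBlock class, its flags, and the trace of call names are assumptions made for illustration, and the actual transformations are not implemented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryBlock:
    """Toy query block: one entry in inner_blocks per nested predicate."""
    name: str
    correlated: bool = False   # references a column of the enclosing block
    aggregated: bool = False   # SELECT clause is an aggregate function
    inner_blocks: List["QueryBlock"] = field(default_factory=list)


def nesting_type(inner: QueryBlock) -> str:
    """Classify a nested block using Kim's four types."""
    if inner.correlated and inner.aggregated:
        return "JA"
    if inner.aggregated:
        return "A"
    if inner.correlated:
        return "J"
    return "N"


def nest_g(block: QueryBlock, trace: List[str]) -> List[str]:
    """Record, in order, which transformation would be applied where."""
    for inner in block.inner_blocks:
        # Unnest the innermost blocks first.
        nest_g(inner, trace)
        kind = nesting_type(inner)
        if kind == "JA":
            # JA becomes type-J via the temp relation, then is merged.
            trace.append(f"nest_JA2({inner.name})")
            trace.append(f"nest_N_J({block.name}, {inner.name})")
        elif kind == "A":
            trace.append(f"nest_A({inner.name})")
        else:  # type-N or type-J
            trace.append(f"nest_N_J({block.name}, {inner.name})")
    return trace


# Q1 contains a type-JA block Q2, which itself contains a type-N block Q3.
q3 = QueryBlock("Q3")
q2 = QueryBlock("Q2", correlated=True, aggregated=True, inner_blocks=[q3])
q1 = QueryBlock("Q1", inner_blocks=[q2])
calls = nest_g(q1, [])
```

For this example the trace lists the merge of Q3 into Q2 first, then the JA transformation of Q2, then the merge of Q2 into Q1, which reflects the depth-first order of the procedure.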

22 Analysis

23 Modified Kim’s Algorithm: R.B OP1 TEMP1.COUNT : R.B OP1 0
|TEMP1| < |R OJ S|, hence better than the alternate algorithm.

24 References:
1. Optimization of Nested SQL Queries Revisited, Richard A. Ganski and Harry K. T. Wong
2. Improved Unnesting Algorithms for Join Aggregate SQL Queries, M. Muralikrishna

25 Thank You

