Optimization of Nested Queries Sujatha Thanigaimani COSC 6421
Outline
Introduction
Kim’s algorithm for efficient processing
Count bug and its solution
Inequality bug and its solution
Alternate algorithm
Modification of Kim’s algorithm
Nested Queries
Queries that contain other queries.
The inner query can appear in the FROM or WHERE clause of the outer query.
Example:
  SELECT cname
  FROM borrower
  WHERE cname IN (SELECT cname FROM depositor)
The SELECT on borrower is the outer query; the SELECT on depositor is the inner query. Think of the inner query as a function that returns its result to the outer query.
Evaluation of Nested Queries
Naive method: Tuple Iteration Semantics (TIS), which is inefficient.
Kim’s algorithm
Rationale: nesting is an interesting and powerful feature of SQL.
Unnesting: the process of transforming a nested query into canonical form.
Kim classified nested queries into types for better understanding and processing.
Types of Nesting
Schema: SUPPLIER(sno, sname, sloc, sbudget), PARTS(pno, pname, qoh, color), PROJECT(jno, jname, pno, jbudget, jloc), SHIPMENT(sno, pno, jno, qty, shipdate)
Type-A nesting: uncorrelated, aggregated subquery.
Example:
  SELECT SNO FROM SP WHERE PNO = (SELECT MAX(PNO) FROM P)
Evaluation: the inner query block can be evaluated independently of the outer query block, and its result is a single constant.
Type-N nesting: uncorrelated, non-aggregated subquery.
Example:
  SELECT SNO FROM SP WHERE PNO IS IN (SELECT PNO FROM P WHERE WEIGHT > 50)
Evaluation: the inner query block Q is processed first, producing a list of values X. X is substituted for the inner query block, so that PNO IS IN Q becomes PNO IS IN X. The resulting query is then evaluated by nested iteration.
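The evaluation of type-A and type-N nesting can be sketched with Python's sqlite3 module. This is only an illustration under assumptions: the rows are made-up data, SQLite's IN stands in for IS IN, and the tables carry only the columns the two examples use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P  (PNO INTEGER, WEIGHT INTEGER);
    CREATE TABLE SP (SNO INTEGER, PNO INTEGER);
    -- Made-up illustration data.
    INSERT INTO P  VALUES (1, 40), (2, 60), (3, 75);
    INSERT INTO SP VALUES (10, 1), (11, 2), (12, 3), (13, 3);
""")

# Type-A: the inner block is evaluated once and yields a single constant.
max_pno = conn.execute("SELECT MAX(PNO) FROM P").fetchone()[0]
type_a = conn.execute(
    "SELECT SNO FROM SP WHERE PNO = ? ORDER BY SNO", (max_pno,)
).fetchall()

# Type-N: the inner block is evaluated once and yields a list X,
# which is substituted for the inner block.
x = [row[0] for row in conn.execute("SELECT PNO FROM P WHERE WEIGHT > 50")]
placeholders = ",".join("?" * len(x))
type_n = conn.execute(
    f"SELECT SNO FROM SP WHERE PNO IN ({placeholders}) ORDER BY SNO", x
).fetchall()

# The original nested forms, for comparison.
nested_a = conn.execute(
    "SELECT SNO FROM SP WHERE PNO = (SELECT MAX(PNO) FROM P) ORDER BY SNO"
).fetchall()
nested_n = conn.execute(
    "SELECT SNO FROM SP WHERE PNO IN (SELECT PNO FROM P WHERE WEIGHT > 50) "
    "ORDER BY SNO"
).fetchall()

print(type_a == nested_a, type_n == nested_n)
```

In both cases the inner block runs exactly once, regardless of how many rows SP has.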
Type-J nesting: correlated, non-aggregated subquery.
Example:
  SELECT SNAME FROM S WHERE SNO IS IN (SELECT SNO FROM SP WHERE QTY > 100 AND SP.ORIGIN = S.CITY)
Type-JA nesting: correlated, aggregated subquery.
Example:
  SELECT PNAME FROM P WHERE PNO = (SELECT MAX(PNO) FROM SP WHERE SP.ORIGIN = P.CITY)
Evaluation: under TIS, the inner query block is processed once for each tuple of the outer relation that satisfies all simple predicates on the outer relation. This is inefficient, so Kim developed alternative algorithms for processing nested queries efficiently.
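A minimal sketch of tuple iteration semantics for the type-J example, assuming made-up rows and a hypothetical helper name `tis_type_j`. The counter shows that the inner block is executed once per outer tuple, which is what makes TIS expensive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO INTEGER, SNAME TEXT, CITY TEXT);
    CREATE TABLE SP (SNO INTEGER, PNO INTEGER, QTY INTEGER, ORIGIN TEXT);
    -- Made-up illustration data.
    INSERT INTO S  VALUES (1, 'Ann', 'Paris'), (2, 'Bob', 'Rome'), (3, 'Cy', 'Oslo');
    INSERT INTO SP VALUES (1, 7, 150, 'Paris'), (2, 7, 150, 'Paris'), (3, 8, 50, 'Oslo');
""")

def tis_type_j(conn):
    """Evaluate the type-J example by tuple iteration (hypothetical helper)."""
    names, inner_runs = [], 0
    outer = conn.execute("SELECT SNO, SNAME, CITY FROM S ORDER BY SNO").fetchall()
    for sno, sname, city in outer:
        # The inner block is re-evaluated with this outer tuple's CITY.
        inner = conn.execute(
            "SELECT SNO FROM SP WHERE QTY > 100 AND ORIGIN = ?", (city,)
        ).fetchall()
        inner_runs += 1
        if (sno,) in inner:
            names.append(sname)
    return names, inner_runs

names, inner_runs = tis_type_j(conn)

# The same query written as a nested SQL statement, for comparison.
nested = [row[0] for row in conn.execute(
    "SELECT SNAME FROM S WHERE SNO IN "
    "(SELECT SNO FROM SP WHERE QTY > 100 AND SP.ORIGIN = S.CITY)"
)]
print(names, inner_runs, nested)
```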
Algorithm NEST-N-J (for type-N or type-J nesting)
1. Combine the FROM clauses of all query blocks into one FROM clause.
2. AND together the WHERE clauses of all query blocks, replacing IS IN by =.
3. Retain the SELECT clause of the outermost query block.
The result is a canonical query logically equivalent to the original nested query.
Nested form:
  SELECT Ri.Ck
  FROM Ri
  WHERE Ri.Ch IS IN (SELECT Rj.Cm FROM Rj)
Canonical form:
  SELECT Ri.Ck
  FROM Ri, Rj
  WHERE Ri.Ch = Rj.Cm
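The three steps above can be checked on a small example. The sketch below applies them by hand to the type-J query from the previous slide and compares the nested and canonical forms in SQLite. The rows are made-up data in which each outer tuple matches at most one inner tuple; with several matches the join form would return the name more than once, so this comparison is an illustration, not a proof of equivalence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO INTEGER, SNAME TEXT, CITY TEXT);
    CREATE TABLE SP (SNO INTEGER, PNO INTEGER, QTY INTEGER, ORIGIN TEXT);
    -- Made-up illustration data.
    INSERT INTO S  VALUES (1, 'Ann', 'Paris'), (2, 'Bob', 'Rome'), (3, 'Cy', 'Oslo');
    INSERT INTO SP VALUES (1, 7, 150, 'Paris'), (2, 7, 150, 'Paris'), (3, 8, 200, 'Oslo');
""")

nested_sql = """
    SELECT SNAME FROM S
    WHERE SNO IN (SELECT SNO FROM SP
                  WHERE QTY > 100 AND SP.ORIGIN = S.CITY)
    ORDER BY SNAME
"""

# NEST-N-J applied by hand:
#   1. FROM clauses merged:            FROM S, SP
#   2. WHERE clauses ANDed, IN -> =:   S.SNO = SP.SNO AND ...
#   3. outermost SELECT clause kept:   SELECT SNAME
canonical_sql = """
    SELECT SNAME FROM S, SP
    WHERE S.SNO = SP.SNO AND SP.QTY > 100 AND SP.ORIGIN = S.CITY
    ORDER BY SNAME
"""

nested = [row[0] for row in conn.execute(nested_sql)]
canonical = [row[0] for row in conn.execute(canonical_sql)]
print(nested, canonical)
```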
Algorithm NEST-JA
Original type-JA query:
  SELECT R1.Cn+2
  FROM R1
  WHERE R1.Cn+1 = (SELECT AGG(R2.Cn+1)
                   FROM R2
                   WHERE R2.C1 = R1.C1 AND R2.C2 = R1.C2 AND … AND R2.Cn = R1.Cn);
1. Generate a temporary relation Rt(C1,…,Cn,Cn+1) from R2 such that Rt.Cn+1 is the result of applying the aggregate function AGG to the Cn+1 column of the R2 tuples that have matching values in R1 for C1, C2, etc.
  Rt(C1,…,Cn,Cn+1) = (SELECT C1,…,Cn, AGG(Cn+1)
                       FROM R2
                       GROUP BY C1,…,Cn)
2. Transform the inner query block of the initial query by changing all references to R2 columns, in join predicates that also reference R1, to the corresponding Rt columns. The result is a type-J nested query, which can be passed to algorithm NEST-N-J for transformation to its canonical equivalent.
  SELECT R1.Cn+2
  FROM R1
  WHERE R1.Cn+1 = (SELECT Rt.Cn+1
                   FROM Rt
                   WHERE Rt.C1 = R1.C1 AND Rt.C2 = R1.C2 AND … AND Rt.Cn = R1.Cn);
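A minimal sketch of NEST-JA on the type-JA example (PNAME of parts whose PNO is the largest PNO shipped from the part's city). The rows and the temporary table name `RT` with column `MAXPNO` are assumptions made for illustration. The aggregate here is MAX, for which this transformation agrees with the nested query on this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P  (PNO INTEGER, PNAME TEXT, CITY TEXT);
    CREATE TABLE SP (SNO INTEGER, PNO INTEGER, QTY INTEGER, ORIGIN TEXT);
    -- Made-up illustration data.
    INSERT INTO P  VALUES (1, 'bolt', 'Paris'), (2, 'nut', 'Paris'), (3, 'gear', 'Rome');
    INSERT INTO SP VALUES (10, 1, 5, 'Paris'), (11, 2, 5, 'Paris'),
                          (12, 3, 5, 'Rome'),  (13, 1, 5, 'Rome');
""")

nested = [row[0] for row in conn.execute("""
    SELECT PNAME FROM P
    WHERE PNO = (SELECT MAX(PNO) FROM SP WHERE SP.ORIGIN = P.CITY)
    ORDER BY PNAME
""")]

# Step 1: group the inner relation on its join column and aggregate.
conn.execute("""
    CREATE TABLE RT AS
    SELECT ORIGIN, MAX(PNO) AS MAXPNO FROM SP GROUP BY ORIGIN
""")

# Step 2 rewrites the inner block over RT (now a type-J query);
# NEST-N-J then merges the two blocks into one join.
canonical = [row[0] for row in conn.execute("""
    SELECT PNAME FROM P, RT
    WHERE P.PNO = RT.MAXPNO AND RT.ORIGIN = P.CITY
    ORDER BY PNAME
""")]
print(nested, canonical)
```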
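Count bug
Relations: PARTS (PNUM, QOH), SUPPLY (PNUM, QUAN, SHIPDATE)
  SELECT PNUM
  FROM PARTS
  WHERE QOH = (SELECT COUNT(SHIPDATE)
               FROM SUPPLY
               WHERE SUPPLY.PNUM = PARTS.PNUM AND SHIPDATE < 1-1-80)
[Tables: Parts (PNUM, QOH); Supply (PNUM, QUAN, SHIPDATE)]
Result by TIS: PNUM = 10, 8
Result by Kim’s algorithm: PNUM = 10
Under TIS, a part with no qualifying shipments gets COUNT = 0 and is returned when its QOH is 0. Kim’s temporary relation has no group for such a part, so the join drops it.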
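The count bug can be reproduced in a few lines. The rows below are illustration data chosen so that the two results match the ones on the slide (10 and 8 under TIS, only 10 after Kim's transformation); they are an assumption, not data taken from the slide. Dates are stored as ISO strings so that string comparison orders them correctly, and the table name `TEMP_KIM` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    -- Illustration data (assumed).
    INSERT INTO PARTS  VALUES (3, 6), (10, 1), (8, 0);
    INSERT INTO SUPPLY VALUES (3, 4, '1979-07-03'), (3, 2, '1978-10-01'),
                              (10, 1, '1978-06-08'), (10, 2, '1981-08-10'),
                              (8, 5, '1983-05-07');
""")

# Nested query: SQLite evaluates the correlated subquery per PARTS row,
# which matches tuple iteration semantics.
tis = [row[0] for row in conn.execute("""
    SELECT PNUM FROM PARTS
    WHERE QOH = (SELECT COUNT(SHIPDATE) FROM SUPPLY
                 WHERE SUPPLY.PNUM = PARTS.PNUM
                   AND SHIPDATE < '1980-01-01')
    ORDER BY PNUM
""")]

# Kim's NEST-JA: group first, then join.
conn.execute("""
    CREATE TABLE TEMP_KIM AS
    SELECT PNUM AS SUPPNUM, COUNT(SHIPDATE) AS CT
    FROM SUPPLY
    WHERE SHIPDATE < '1980-01-01'
    GROUP BY PNUM
""")
kim = [row[0] for row in conn.execute("""
    SELECT PNUM FROM PARTS, TEMP_KIM
    WHERE PARTS.QOH = TEMP_KIM.CT AND PARTS.PNUM = TEMP_KIM.SUPPNUM
    ORDER BY PNUM
""")]
print(tis, kim)
```

Part 8 has no shipment before the cutoff, so it has no row in `TEMP_KIM` and cannot survive the join, even though its count of 0 equals its QOH.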
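Solution using Outer Join
R (X) = {A, B}
S (Y) = {B, C, E}
R =+ S (X, Y) = {(A, null), (B, B)}
The outer join =+ preserves every tuple of R; a tuple of R with no match in S is padded with null.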
Solution with outer joins
  TEMP (SUPPNUM, CT) = (SELECT PARTS.PNUM, COUNT(SHIPDATE)
                        FROM PARTS, SUPPLY
                        WHERE SHIPDATE < 1-1-80 AND PARTS.PNUM =+ SUPPLY.PNUM
                        GROUP BY PARTS.PNUM)
[Table: result of PARTS.PNUM =+ SUPPLY.PNUM (for SHIPDATE < 1-1-80), with columns Parts.PNUM, Parts.QOH, Supply.PNUM, Supply.QUAN, Supply.SHIPDATE; parts without a matching shipment have null in the Supply columns]
[Table: TEMP (SUPPNUM, CT)]
  SELECT PNUM
  FROM PARTS, TEMP
  WHERE PARTS.QOH = TEMP.CT AND PARTS.PNUM = TEMP.SUPPNUM
Final result: PNUM = 10, 8
Drawbacks:
1. If the subquery uses COUNT(*), every group gets a count greater than 0, because the outer join pads each unmatched outer tuple with a null row and COUNT(*) still counts that row. The '*' must be changed to a column name from the inner relation.
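2. Duplicates problem: if the join column of the outer relation contains duplicate values, joining the outer relation with the inner relation before aggregating counts each matching inner tuple once per duplicate, so the counts in the temporary relation are too high.
[Tables: Parts (PNUM, QOH); Supply (PNUM, QUAN, SHIPDATE); TEMP (SUPPNUM, CT); Result by TIS (PNUM) compared with our result (PNUM): 8]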
Solution:
1. Remove duplicates before the join that creates the temporary table is performed.
  TEMP1 (PNUM) = (SELECT DISTINCT PNUM FROM PARTS)
2. Use this projection instead of the outer relation in any join required to build the temporary table.
  TEMP2 (SUPPNUM, CT) = (SELECT TEMP1.PNUM, COUNT(SHIPDATE)
                         FROM TEMP1, SUPPLY
                         WHERE SUPPLY.SHIPDATE < 1-1-80 AND TEMP1.PNUM =+ SUPPLY.PNUM
                         GROUP BY TEMP1.PNUM)
[Tables: TEMP2 (SUPPNUM, CT); result (PNUM)]
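The outer-join fix and its two drawbacks can be tried out together. The sketch below is an illustration under assumptions: the rows are made up (PNUM 10 appears twice in PARTS), the helper `answer` is hypothetical, and the =+ operator is written as SQLite's LEFT OUTER JOIN with the SHIPDATE restriction placed in the ON clause so that it is applied before unmatched parts are padded with nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    -- Illustration data (assumed); PNUM 10 is duplicated in PARTS.
    INSERT INTO PARTS  VALUES (3, 6), (10, 1), (10, 5), (8, 0);
    INSERT INTO SUPPLY VALUES (3, 4, '1979-07-03'), (3, 2, '1978-10-01'),
                              (10, 1, '1978-06-08'), (10, 2, '1981-08-10'),
                              (8, 5, '1983-05-07');
""")

def answer(outer, count_expr):
    """Build the COUNT temp by outer-joining `outer` with SUPPLY,
    then join PARTS with it (hypothetical helper)."""
    temp = f"""
        SELECT O.PNUM AS SUPPNUM, {count_expr} AS CT
        FROM {outer} AS O
        LEFT OUTER JOIN SUPPLY
          ON O.PNUM = SUPPLY.PNUM AND SUPPLY.SHIPDATE < '1980-01-01'
        GROUP BY O.PNUM
    """
    final = f"""
        SELECT PARTS.PNUM FROM PARTS, ({temp}) AS TEMP2
        WHERE PARTS.QOH = TEMP2.CT AND PARTS.PNUM = TEMP2.SUPPNUM
        ORDER BY PARTS.PNUM
    """
    return [row[0] for row in conn.execute(final)]

# Reference answer: the nested query itself.
tis = [row[0] for row in conn.execute("""
    SELECT PNUM FROM PARTS
    WHERE QOH = (SELECT COUNT(SHIPDATE) FROM SUPPLY
                 WHERE SUPPLY.PNUM = PARTS.PNUM
                   AND SHIPDATE < '1980-01-01')
    ORDER BY PNUM
""")]

distinct_parts = "(SELECT DISTINCT PNUM FROM PARTS)"
with_duplicates = answer("PARTS", "COUNT(SUPPLY.SHIPDATE)")
with_projection = answer(distinct_parts, "COUNT(SUPPLY.SHIPDATE)")
with_count_star = answer(distinct_parts, "COUNT(*)")
print(tis, with_duplicates, with_projection, with_count_star)
```

Joining the unprojected PARTS doubles the count for PNUM 10, so that part is lost. With the DISTINCT projection the result agrees with the nested query. COUNT(*) counts the null-padded row for part 8, so part 8 is lost again.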
Another bug: relations other than equality
  SELECT PNUM
  FROM PARTS
  WHERE QOH = (SELECT MAX(QUAN)
               FROM SUPPLY
               WHERE SUPPLY.PNUM < PARTS.PNUM AND SHIPDATE < 1-1-80)
Kim’s transformation:
  TEMP (SUPPNUM, MAXQUAN) = (SELECT PNUM, MAX(QUAN)
                             FROM SUPPLY
                             WHERE SHIPDATE < 1-1-80
                             GROUP BY PNUM)
  SELECT PNUM
  FROM PARTS, TEMP
  WHERE QOH = TEMP.MAXQUAN AND TEMP.SUPPNUM < PARTS.PNUM
Problem: MAX is calculated separately for each SUPPLY.PNUM, but the query requires the MAX over the whole set of SUPPLY tuples whose PNUM is less than the given PARTS.PNUM.
Solution:
1. First join, then aggregate (Kim’s approach was: first group, then join).
  TEMP (SUPPNUM, MAXQUAN) = (SELECT PARTS.PNUM, MAX(QUAN)
                             FROM PARTS, SUPPLY
                             WHERE SHIPDATE < 1-1-80 AND SUPPLY.PNUM < PARTS.PNUM
                             GROUP BY PARTS.PNUM)
  SELECT PNUM
  FROM PARTS, TEMP
  WHERE PARTS.QOH = TEMP.MAXQUAN AND PARTS.PNUM = TEMP.SUPPNUM
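The difference between "group, then join" and "join, then aggregate" shows up on a small data set. The rows and the table names `TEMP_GROUP_FIRST` and `TEMP_JOIN_FIRST` below are assumptions made for illustration; all shipments are dated before the cutoff so that only the non-equality join predicate matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    -- Illustration data (assumed).
    INSERT INTO PARTS  VALUES (3, 6), (8, 4), (10, 4);
    INSERT INTO SUPPLY VALUES (3, 4, '1979-07-03'), (3, 2, '1978-10-01'),
                              (8, 5, '1979-05-07'), (10, 1, '1978-06-08');
""")

# Reference answer: the nested query itself.
tis = [row[0] for row in conn.execute("""
    SELECT PNUM FROM PARTS
    WHERE QOH = (SELECT MAX(QUAN) FROM SUPPLY
                 WHERE SUPPLY.PNUM < PARTS.PNUM
                   AND SHIPDATE < '1980-01-01')
    ORDER BY PNUM
""")]

# Group first, then join: MAX is taken per SUPPLY.PNUM.
conn.execute("""
    CREATE TABLE TEMP_GROUP_FIRST AS
    SELECT PNUM AS SUPPNUM, MAX(QUAN) AS MAXQUAN
    FROM SUPPLY WHERE SHIPDATE < '1980-01-01' GROUP BY PNUM
""")
group_first = [row[0] for row in conn.execute("""
    SELECT PARTS.PNUM FROM PARTS, TEMP_GROUP_FIRST AS T
    WHERE PARTS.QOH = T.MAXQUAN AND T.SUPPNUM < PARTS.PNUM
    ORDER BY PARTS.PNUM
""")]

# Join first, then aggregate: MAX is taken over all SUPPLY rows
# whose PNUM is below the given PARTS.PNUM.
conn.execute("""
    CREATE TABLE TEMP_JOIN_FIRST AS
    SELECT PARTS.PNUM AS SUPPNUM, MAX(QUAN) AS MAXQUAN
    FROM PARTS, SUPPLY
    WHERE SHIPDATE < '1980-01-01' AND SUPPLY.PNUM < PARTS.PNUM
    GROUP BY PARTS.PNUM
""")
join_first = [row[0] for row in conn.execute("""
    SELECT PARTS.PNUM FROM PARTS, TEMP_JOIN_FIRST AS T
    WHERE PARTS.QOH = T.MAXQUAN AND PARTS.PNUM = T.SUPPNUM
    ORDER BY PARTS.PNUM
""")]
print(tis, group_first, join_first)
```

For part 10 the true maximum over parts 3 and 8 is 5, but the per-group maximum of part 3 alone is 4, which happens to equal its QOH; grouping first therefore returns part 10 by mistake.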
Modified Algorithm: NEST-JA2
1. Project the join column of the outer relation, and restrict it with any simple predicates applying to the outer relation.
  TEMP1 (PNUM) = (SELECT DISTINCT PNUM FROM PARTS)
2. Create a temporary relation by joining the inner relation with the projection of the outer relation. If the aggregate function is COUNT, the join must be an outer join.
  TEMP2 (PNUM, SHIPDATE) = (SELECT PNUM, SHIPDATE
                            FROM SUPPLY
                            WHERE SHIPDATE < 1-1-80)
  TEMP3 (PNUM, CT) = (SELECT TEMP1.PNUM, COUNT(TEMP2.SHIPDATE)
                      FROM TEMP1, TEMP2
                      WHERE TEMP1.PNUM =+ TEMP2.PNUM
                      GROUP BY TEMP1.PNUM)
3. Join the outer relation with the temporary relation, according to the transformed version of the original query.
  SELECT PARTS.PNUM
  FROM PARTS, TEMP3
  WHERE PARTS.QOH = TEMP3.CT AND PARTS.PNUM = TEMP3.PNUM
Processing a General Nested Query: Recursive Approach
  procedure nest_g(query_block)
    for each predicate in the WHERE clause of query_block
      if predicate is a nested predicate (i.e., contains an inner query block)
        nest_g(inner_query_block)
        /* Determine the type of nesting and call the appropriate transformation procedure */
        if inner_query_block is aggregated
          if nesting is type-JA
            nest-JA2(inner_query_block)
            nest-N-J(query_block, inner_query_block)
          else /* nesting is type-A */
            nest_a(inner_query_block)
        else /* nesting is type-N or type-J */
          nest-N-J(query_block, inner_query_block)
    return
Advantage: simplicity
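The control flow of nest_g can be sketched in Python. This is a hedged illustration only: the `QueryBlock` class and its `aggregated` and `correlated` flags are hypothetical stand-ins for a real query representation, and the transformation procedures are not implemented; the sketch merely records which procedure would be called, and in which order.

```python
from dataclasses import dataclass, field

@dataclass
class QueryBlock:
    """Hypothetical stand-in for a parsed query block."""
    name: str
    aggregated: bool = False   # SELECT clause contains an aggregate
    correlated: bool = False   # WHERE clause references the outer block
    inner_blocks: list = field(default_factory=list)

def nest_g(block, calls):
    """Record the transformation calls for `block`, innermost blocks first."""
    for inner in block.inner_blocks:      # one inner block per nested predicate
        nest_g(inner, calls)              # unnest deeper levels first
        if inner.aggregated:
            if inner.correlated:          # type-JA
                calls.append(("nest_JA2", inner.name))
                calls.append(("nest_N_J", block.name, inner.name))
            else:                         # type-A
                calls.append(("nest_A", inner.name))
        else:                             # type-N or type-J
            calls.append(("nest_N_J", block.name, inner.name))
    return calls

# Q1 contains a type-JA block Q2, which itself contains a type-N block Q3.
q3 = QueryBlock("Q3")
q2 = QueryBlock("Q2", aggregated=True, correlated=True, inner_blocks=[q3])
q1 = QueryBlock("Q1", inner_blocks=[q2])

calls = nest_g(q1, [])
for call in calls:
    print(call)
```

Because the recursion descends before classifying, the deepest block is merged first, and each outer level only ever sees an inner block that is already in canonical form.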
Analysis
Modified Kim’s Algorithm
Predicate: R.B OP1 TEMP1.COUNT for tuples with a match in TEMP1, and R.B OP1 0 otherwise.
|TEMP1| < |R OJ S|, hence better than the alternate algorithm.
References:
1. Richard A. Ganski, Harry K. T. Wong. Optimization of Nested SQL Queries Revisited.
2. M. Muralikrishna. Improved Unnesting Algorithms for Join Aggregate SQL Queries.
Thank You