Principles of Query Processing
System layers (top to bottom): Application – Query Processor – Indexes – Storage Subsystem – Concurrency Control / Recovery – Operating System – Hardware [Processor(s), Disk(s), Memory]. Users of these layers: application programmer (e.g., business analyst, data architect), sophisticated application programmer (e.g., SAP admin), DBA / tuner.
Overview of Query Processing: a high-level query is turned by the Parser into a parsed query; the Query Optimizer, consulting Statistics and a Cost Model, produces a QEP (query evaluation plan); the Query Evaluator executes the plan against the Database to produce the query result.
Outline Processing relational operators Query optimization Performance tuning
Projection Operator: π_{R.attrib, …}(R). Implementation is straightforward. Example: SELECT bid FROM Reserves R WHERE R.rname < ‘C%’
Selection Operator: σ_{R.attr op value}(R). Size of result = ||R|| * selectivity. Access paths: scan; clustered index: good; non-clustered index: good for low selectivity, can be worse than a scan for high selectivity. Example: SELECT * FROM Reserves R WHERE R.rname < ‘C%’
Example of Join SELECT * FROM Sailors R, Reserves S WHERE R.sid=S.sid
Notations
–|R| = number of pages in outer table R
–||R|| = number of tuples in outer table R
–|S| = number of pages in inner table S
–||S|| = number of tuples in inner table S
–M = number of main memory pages allocated
Simple Nested Loop Join: for each of the ||R|| tuples of R, scan the inner table S once (|S| pages per scan).
Simple Nested Loop Join Scan inner table S per R tuple: ||R|| * |S| –Each scan costs |S| pages –For ||R|| tuples |R| pages for outer table R Total cost = |R| + ||R|| * |S| pages Not optimal!
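The tuple-at-a-time logic behind this cost formula can be sketched as follows, a minimal in-memory illustration in Python with tables as lists of dicts rather than pages on disk (the table and column names below are hypothetical examples, not from any slide):

```python
# Sketch of simple (tuple-at-a-time) nested loop join: for each tuple of
# the outer table R, scan the entire inner table S.  On disk this costs
# |R| + ||R|| * |S| page I/Os; here we only model the tuple-level logic.
def simple_nested_loop_join(R, S, key):
    result = []
    for r in R:              # ||R|| outer tuples
        for s in S:          # one full scan of S per outer tuple
            if r[key] == s[key]:
                result.append({**r, **s})
    return result

# e.g. joining Sailors and Reserves on sid
sailors = [{"sid": 1, "sname": "Dustin"}, {"sid": 2, "sname": "Lubber"}]
reserves = [{"sid": 1, "bid": 100}, {"sid": 3, "bid": 101}]
```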
Block Nested Loop Join: R is read in ⌈|R| / (M – 2)⌉ blocks of (M – 2) pages each; one scan of S (|S| pages) per block.
Block Nested Loop Join
Scan inner table S once per block of (M – 2) pages of R tuples:
–Each scan costs |S| pages
–There are ⌈|R| / (M – 2)⌉ blocks of R tuples
|R| pages for outer table R
Total cost = |R| + ⌈|R| / (M – 2)⌉ * |S| pages
R should be the smaller table
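A sketch of the blocking idea, with a "table" modeled as a list of pages, each page a list of tuples joined on their first field, and M a hypothetical buffer count:

```python
# Block nested loop join: load (M - 2) pages of R at a time (1 buffer page
# is reserved for scanning S, 1 for output), and scan S once per block.
def block_nested_loop_join(R_pages, S_pages, M):
    block_size = M - 2
    result = []
    for i in range(0, len(R_pages), block_size):
        # all tuples of the current block of R, held in memory together
        block = [r for page in R_pages[i:i + block_size] for r in page]
        for s_page in S_pages:          # one scan of S per block
            for s in s_page:
                for r in block:
                    if r[0] == s[0]:    # join on the first field
                        result.append(r + s[1:])
    return result
```

With a bigger M the block size grows, fewer scans of S are needed, and the ⌈|R| / (M – 2)⌉ factor in the cost formula shrinks.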
Index Nested Loop Join: for each of the ||R|| tuples of R, probe the index on S once for matching tuples.
Index Nested Loop Join
Probe the S index for matching S tuples, once per R tuple:
–Probe hash index: 1.2 I/Os; probe B+ tree: 2-4 I/Os; plus retrieve matching S tuples: 1 I/O
–Repeated for ||R|| tuples
|R| pages for outer table R
Total cost = |R| + ||R|| * index retrieval cost
Better than block NL join only for a small number of R tuples
Sort Merge Join External sort R External sort S Merge sorted R and sorted S
External Sort
Split pass: split R into ⌈|R|/M⌉ runs R_{0,1}, …, R_{0,⌈|R|/M⌉} of M pages each and sort each run in memory.
Merge passes: repeatedly (M–1)-way merge runs; pass 1 produces R_{1,1}, R_{1,2}, …, R_{1,⌈|R|/M(M-1)⌉}; pass 2 produces R_{2,1}, …
# merge passes = ⌈log_{M–1}(|R|/M)⌉
Cost per pass = |R| input + |R| output = 2|R|
Total cost = 2|R| (⌈log_{M–1}(|R|/M)⌉ + 1), including the split pass
External Sorting
A classic problem in computer science!
Data requested in sorted order –e.g., find students in increasing CAP order
Sorting is used in many applications –First step in bulk loading operations –Useful for eliminating duplicate copies in a collection of records (How?) –The sort-merge join algorithm involves sorting
Problem: sort 1GB of data with 1MB of RAM.
2-Way Sort: Requires 3 Buffers
Pass 1: read a page, sort it, write it –only one buffer page is used
Pass 2, 3, …, etc.: merge –three buffer pages used (INPUT 1, INPUT 2, OUTPUT)
Two-Way External Merge Sort
Idea: divide and conquer; sort subfiles and merge
Each pass we read + write each page in the file. With N pages, the number of passes = ⌈log₂ N⌉ + 1, so total cost = 2N (⌈log₂ N⌉ + 1).
Example (N = 7 pages):
Input file: 3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
Pass 0 (1-page runs): 3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
Pass 1 (2-page runs): 2,3,4,6 | 4,7,8,9 | 1,3,5,6 | 2
Pass 2 (4-page runs): 2,3,4,4,6,7,8,9 | 1,2,3,5,6
Pass 3 (8-page runs): 1,2,2,3,3,4,4,5,6,6,7,8,9
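The doubling of run length can be simulated directly; a small Python sketch over the example file, with each "page" an in-memory list:

```python
import heapq

# Two-way external merge sort: pass 0 sorts each page into a 1-page run;
# each later pass merges runs pairwise, doubling the run length, so a file
# of N pages takes ceil(log2 N) merge passes after pass 0.
def two_way_external_sort(pages):
    runs = [sorted(p) for p in pages]         # pass 0
    n_passes = 1
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                merged.append(list(heapq.merge(runs[i], runs[i + 1])))
            else:
                merged.append(runs[i])        # odd run carried over as-is
        runs = merged
        n_passes += 1
    return runs[0], n_passes

# The 7-page example file from the slide
pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
```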
General External Merge Sort
More than 3 buffer pages: how can we utilize them? To sort a file with N pages using B buffer pages:
–Pass 0: use B buffer pages. Produce ⌈N/B⌉ sorted runs of B pages each.
–Pass 1, 2, …, etc.: merge B–1 runs at a time (B–1 input buffers, 1 output buffer).
Cost of External Merge Sort
Number of passes: 1 + ⌈log_{B–1} ⌈N/B⌉⌉
Cost = 2N * (# of passes)
E.g., with 5 buffer pages, to sort a 108-page file:
–Pass 0: ⌈108/5⌉ = 22 sorted runs of 5 pages each (last run is only 3 pages)
–Pass 1: ⌈22/4⌉ = 6 sorted runs of 20 pages each (last run is only 8 pages)
–Pass 2: 2 sorted runs, 80 pages and 28 pages
–Pass 3: sorted file of 108 pages
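The pass counts above follow from a short calculation:

```python
import math

# Number of passes = 1 (for pass 0) + ceil(log_{B-1} ceil(N / B)),
# and each pass reads and writes every page, so cost = 2N * passes.
def n_passes(N, B):
    runs = math.ceil(N / B)                   # runs produced by pass 0
    if runs <= 1:
        return 1
    return 1 + math.ceil(math.log(runs, B - 1))

def sort_cost(N, B):
    return 2 * N * n_passes(N, B)

print(n_passes(108, 5))   # 4 passes, as in the example above
print(sort_cost(108, 5))  # 864 page I/Os
```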
Number of Passes of External Sort
Sequential vs Random I/Os
Transfer rate increases 40% per year; seek time and latency decrease by only 8% per year.
Is minimizing passes optimal? Is merging as many runs as possible the best solution?
Suppose we have 80 runs, each 80 pages long, and 81 pages of buffer space. We can merge all 80 runs in a single pass:
–each page read requires a seek (Why? Only one buffer page per input run, so the disk head moves between runs on every page)
–there are 80 pages per run, so 80 seeks per run
–total cost = 80 runs × 80 seeks = 6,400 seeks
Sequential vs Random I/Os (Cont)
We can instead merge all 80 runs in two steps:
–Step 1: 5 sets of 16 runs each. Read 80/16 = 5 pages of one run at a time; 16 runs produce a sorted run of 1,280 pages; each merge requires 16 × (80/5) = 256 seeks; for 5 sets, we have 5 × 256 = 1,280 seeks
–Step 2: merge the 5 runs of 1,280 pages. Read 80/5 = 16 pages of one run at a time => 1,280/16 = 80 seeks per run; 5 runs => 5 × 80 = 400 seeks
–Total: 1,280 + 400 = 1,680 seeks!!!
Number of passes increases, but number of seeks decreases!
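The seek counts can be checked with a few lines of arithmetic, charging one seek per chunk of pages read and dividing the 80 usable input buffer pages evenly among the runs being merged:

```python
# Merging n_runs runs of run_pages pages each with `buffers` input pages:
# each run is read in chunks of (buffers // n_runs) pages, one seek per chunk.
def seeks_to_merge(n_runs, run_pages, buffers):
    chunk = buffers // n_runs
    return n_runs * (run_pages // chunk)      # assumes exact division

one_pass = seeks_to_merge(80, 80, 80)         # chunk = 1 page  -> 6400 seeks
step1 = 5 * seeks_to_merge(16, 80, 80)        # chunk = 5 pages -> 1280 seeks
step2 = seeks_to_merge(5, 1280, 80)           # chunk = 16 pages -> 400 seeks
print(one_pass, step1 + step2)                # 6400 vs 1680
```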
Sort Merge Join
External-sort R: 2|R| * (⌈log_{M–1}(|R|/M)⌉ + 1)
–Split R into ⌈|R|/M⌉ sorted runs, each of size M: 2|R|
–Merge up to (M – 1) runs repeatedly: ⌈log_{M–1}(|R|/M)⌉ passes, each costing 2|R|
External-sort S: 2|S| * (⌈log_{M–1}(|S|/M)⌉ + 1)
Merge matching tuples from sorted R and S: |R| + |S|
Total cost = 2|R| * (⌈log_{M–1}(|R|/M)⌉ + 1) + 2|S| * (⌈log_{M–1}(|S|/M)⌉ + 1) + |R| + |S|
–If both |R| and |S| ≤ M*(M–1), each table sorts in 2 passes, so cost = 5 * (|R| + |S|)
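A direct transcription of the formula, including the two-pass special case:

```python
import math

# External sort cost: 2|R| * (ceil(log_{M-1} ceil(|R|/M)) + 1)
def ext_sort_cost(pages, M):
    runs = math.ceil(pages / M)
    merge_passes = math.ceil(math.log(runs, M - 1)) if runs > 1 else 0
    return 2 * pages * (merge_passes + 1)

def sort_merge_join_cost(R, S, M):
    return ext_sort_cost(R, M) + ext_sort_cost(S, M) + R + S

# With M = 100 buffers, |R| = 1000 and |S| = 500 each sort in 2 passes,
# so the total is 4|R| + 4|S| + |R| + |S| = 5 * (|R| + |S|) = 7500.
print(sort_merge_join_cost(1000, 500, 100))
```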
GRACE Hash Join
Join on R.X = S.X. Hash both tables on X (e.g., bucketID = X mod 4) into partitions R0…R3 and S0…S3; tuples in different partitions cannot join, so
R ⋈ S = R0 ⋈ S0 + R1 ⋈ S1 + R2 ⋈ S2 + R3 ⋈ S3
GRACE Hash Join – Partition Phase
With M main memory buffers (1 input page, M – 1 output pages), read the original relation and hash each tuple with function h1 to one of the (M – 1) output partitions on disk.
R yields (M – 1) partitions, each of expected size |R| / (M – 1); likewise for S.
GRACE Hash Join – Join Phase
For each partition i: build an in-memory hash table for partition Ri (< M – 1 pages) using hash function h2, then scan Si through an input buffer, probing the hash table and writing join results through an output buffer.
The partition must fit in memory: |R| / (M – 1) < M – 1
GRACE Hash Join Algorithm
Partition phase: 2 (|R| + |S|)
–Partition table R using hash function h1: 2|R|
–Partition table S using hash function h1: 2|S|
–R tuples in partition i will match only S tuples in partition i
–R yields (M – 1) partitions, each of size |R| / (M – 1)
Join phase: |R| + |S|
–Read in a partition of R (|R| / (M – 1) < M – 1)
–Hash it using function h2 (≠ h1!)
–Scan the corresponding S partition, searching for matches
Total cost = 3 (|R| + |S|) pages
Condition: M > √(f |R|), where f ≈ 1.2 accounts for the hash table overhead
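The two phases can be sketched in a few lines of Python; here "tables" are in-memory lists of tuples joined on their first field, so the partitions that would live on disk are just lists:

```python
# Sketch of GRACE hash join: h1 partitions both inputs so matching tuples
# land in partitions with the same number; the join phase then builds a
# hash table (h2, here Python's dict) per R-partition and probes it with
# the corresponding S-partition.
def grace_hash_join(R, S, n_partitions):
    h1 = lambda key: hash(key) % n_partitions
    R_parts = [[] for _ in range(n_partitions)]
    S_parts = [[] for _ in range(n_partitions)]
    for r in R:                           # partition phase: write out R...
        R_parts[h1(r[0])].append(r)
    for s in S:                           # ...and S, with the same h1
        S_parts[h1(s[0])].append(s)
    result = []
    for Ri, Si in zip(R_parts, S_parts):  # join phase, partition by partition
        table = {}
        for r in Ri:                      # build on the (smaller) R partition
            table.setdefault(r[0], []).append(r)
        for s in Si:                      # probe with the S partition
            for r in table.get(s[0], []):
                result.append(r + s[1:])
    return result
```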
Summary of Join Operator
Simple nested loop: |R| + ||R|| * |S|
Block nested loop: |R| + ⌈|R| / (M – 2)⌉ * |S|
Index nested loop: |R| + ||R|| * index retrieval cost
Sort-merge: 2|R| * (⌈log_{M–1}(|R|/M)⌉ + 1) + 2|S| * (⌈log_{M–1}(|S|/M)⌉ + 1) + |R| + |S|
GRACE hash: 3 * (|R| + |S|) –Condition: M > √(f|R|)
Overview of Query Processing: a high-level query is turned by the Parser into a parsed query; the Query Optimizer, consulting Statistics and a Cost Model, produces a QEP (query evaluation plan); the Query Evaluator executes the plan against the Database to produce the query result.
Query Rewriting
A query can be expressed in many forms, some more efficient than others. Example over the S, P, SP relations; three equivalent queries:
Select Distinct S.sname From S Where S.s# IN (Select SP.s# From SP Where SP.p# = ‘P2’)
Select Distinct S.sname From S, SP Where S.s# = SP.s# AND SP.p# = ‘P2’
Select Distinct S.sname From S Where ‘P2’ IN (Select SP.p# From SP Where SP.s# = S.s#)
Four more equivalent forms:
Select Distinct S.sname From S Where S.s# = ANY (Select SP.s# From SP Where SP.p# = ‘P2’)
Select Distinct S.sname From S Where EXISTS (Select * From SP Where SP.s# = S.s# And SP.p# = ‘P2’)
Select Distinct S.sname From S Where 0 < (Select Count(*) From SP Where SP.s# = S.s# And SP.p# = ‘P2’)
Select S.sname From S, SP Where SP.s# = S.s# And SP.p# = ‘P2’ Group by S.sname
Query Optimization Given: An SQL query joining n tables Dream: Map to most efficient plan Reality: Avoid rotten plans State of the art: –Most optimizers follow System R’s technique –Works fine up to about 10 joins SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5 Reserves Sailors sid=sid bid=100 rating > 5 sname
Complexity of Query Optimization
Many degrees of freedom
–Selection: scan versus (clustered, non-clustered) index
–Join: block nested loop, sort-merge, hash
–Relative order of the operators
–Exponential search space!
Heuristics
–Push the selections down
–Push the projections down
–Delay Cartesian products
–System R: only left-deep trees (e.g., ((A ⋈ B) ⋈ C) ⋈ D)
Equivalences in Relational Algebra
Selection: –cascade: σ_{c1∧c2}(R) ≡ σ_{c1}(σ_{c2}(R)) –commutative: σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R))
Projection: –cascade: π_{a1}(R) ≡ π_{a1}(π_{a2}(R)) if a1 ⊆ a2
Join: –associative: R ⋈ (S ⋈ T) ≡ (R ⋈ S) ⋈ T –commutative: R ⋈ S ≡ S ⋈ R
Equivalences in Relational Algebra
A projection commutes with a selection that only uses attributes retained by the projection.
A selection between attributes of the two arguments of a cross-product converts the cross-product to a join.
A selection on just attributes of R commutes with the join R ⋈ S (i.e., σ(R ⋈ S) ≡ σ(R) ⋈ S).
Similarly, if a projection follows a join R ⋈ S, we can ‘push’ it by retaining only attributes of R (and S) that are needed for the join or are kept by the projection.
System R Optimizer
1. Find all plans for accessing each base table
2. For each table, save the cheapest unordered plan and the cheapest plan for each interesting order; discard all others
3. Try all ways of joining pairs of 1-table plans; save cheapest unordered + interesting ordered plans
4. Try all ways of joining 2-table with 1-table
5. Combine k-table with 1-table till you have a full plan tree
6. At the top, to satisfy GROUP BY and ORDER BY: use an interesting ordered plan, or add a sort node to the unordered plan
Source: Selinger et al, “Access Path Selection in a Relational Database Management System”
Search Strategies for Single Relations
Note: Only branches for NL join are shown here. Additional branches for other join methods (e.g. sort-merge) are not shown. Source: Selinger et al, “Access Path Selection in a Relational Database Management System”
What is “Cheapest”? Need information about the relations and indexes involved Catalogs typically contain at least: –# tuples (NTuples) and # pages (NPages) for each relation. –# distinct key values (NKeys) and NPages for each index. –Index height, low/high key values (Low/High) for each tree index. Catalogs updated periodically. –Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency is ok. More detailed information (e.g., histograms of the values in some field) is sometimes stored.
Estimating Result Size
Consider a query block: SELECT attribute list FROM relation list WHERE term1 AND ... AND termk
Maximum # tuples in result is the product of the cardinalities of the relations in the FROM clause.
A reduction factor (RF) associated with each term reflects the impact of the term in reducing result size:
–Term col=value has RF 1/NKeys(I)
–Term col1=col2 has RF 1/MAX(NKeys(I1), NKeys(I2))
–Term col>value has RF (High(I)–value)/(High(I)–Low(I))
Result cardinality = max # tuples * product of all RFs.
–Implicit assumption that terms are independent!
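As a worked example of these rules (the statistics below are hypothetical, chosen to mirror the Sailors/Reserves running example):

```python
# Result cardinality = product of table cardinalities * product of RFs.
def estimate_result_size(cardinalities, reduction_factors):
    est = 1.0
    for c in cardinalities:
        est *= c
    for rf in reduction_factors:
        est *= rf
    return est

# Sailors (40,000 tuples) join Reserves (100,000 tuples) with terms
#   sid = sid  -> RF = 1 / MAX(NKeys) = 1/40,000
#   bid = 100  -> RF = 1 / NKeys(bid index) = 1/100
est = estimate_result_size([40_000, 100_000], [1 / 40_000, 1 / 100])
print(est)   # ~1000 tuples
```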
Cost Estimates for Single-Table Plans
Index I on primary key matches selection: –cost is Height(I)+1 for a B+ tree, about 1.2 for a hash index.
Clustered index I matching one or more selects: –(NPages(I)+NPages(R)) * product of RFs of matching selects.
Non-clustered index I matching one or more selects: –(NPages(I)+NTuples(R)) * product of RFs of matching selects.
Sequential scan of file: –NPages(R).
Note: typically, no duplicate elimination on projections! (Exception: done on answers if the user says DISTINCT.)
Counting the Costs
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
Plan: scan Reserves, apply bid=100, write to temp T1; scan Sailors, apply rating>5, write to temp T2; sort-merge join T1 and T2 on sid; project sname on the fly.
With 5 buffers, cost of plan:
–Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution)
–Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings)
–Sort T1 (2*10*2), sort T2 (2*250*4), merge (10+250), total=2300
–Total: 4060 page I/Os
If we used BNL join, join cost = 10 + 4*250, total cost = 2770
If we ‘push’ projections, T1 has only sid, T2 only sid and sname:
–T1 fits in 3 pages, cost of BNL drops to under 250 pages, total < 2000
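The totals on this slide follow from page-count arithmetic, with reduction factors 1/100 for bid=100 and 1/2 for rating>5 as stated:

```python
import math

scan_reserves, temp1 = 1000, 10        # 1000 pages * 1/100 selectivity
scan_sailors,  temp2 = 500, 250        # 500 pages * 1/2 selectivity

# Sort-merge plan: sort T1 (2 passes), sort T2 (4 passes), then merge.
sort_T1 = 2 * temp1 * 2
sort_T2 = 2 * temp2 * 4
merge = temp1 + temp2
sort_merge_total = (scan_reserves + temp1 + scan_sailors + temp2
                    + sort_T1 + sort_T2 + merge)

# Block NL instead of sort-merge: M = 5 buffers, T1 as the outer table.
bnl_join = temp1 + math.ceil(temp1 / (5 - 2)) * temp2
bnl_total = scan_reserves + temp1 + scan_sailors + temp2 + bnl_join

print(sort_merge_total, bnl_total)     # 4060 2770
```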
Exercise
Plan: use the clustered index on Reserves.bid to select bid=100 tuples; index nested loops join (with pipelining) via the hash index on Sailors.sid; apply rating>5 and the sname projection on the fly.
Reserves: 100,000 tuples, 100 tuples per page.
With the clustered index on bid of Reserves, we get 100,000/100 = 1000 matching tuples on 1000/100 = 10 pages.
Join column sid is a key for Sailors: at most one matching tuple per probe.
The decision not to push rating>5 below the join is based on the availability of the sid index on Sailors.
Cost: selection of Reserves tuples (10 I/Os); for each tuple, must get the matching Sailors tuple (1000*1.2); total 1210 I/Os.
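Checking the 1210 I/O total, with probe cost 1.2 for the hash index and one matching Sailors tuple per Reserves tuple since sid is a key:

```python
# bid = 100 selects 100,000 / 100 = 1000 tuples, clustered on 10 pages.
matching_tuples = 100_000 // 100
select_pages = matching_tuples // 100          # 100 tuples per page -> 10
probe_cost = 1.2                               # hash index on Sailors.sid
total = select_pages + matching_tuples * probe_cost
print(total)   # 1210.0
```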
Query Tuning
Avoid Redundant DISTINCT
DISTINCT usually entails a sort operation.
It also slows down query optimization, because it adds one more “interesting” order to consider.
Remove it if you know the result has no duplicates:
SELECT DISTINCT ssnum FROM Employee WHERE dept = ‘information systems’
Change Nested Queries to Join Might not use index on Employee.dept Need DISTINCT if an employee might belong to multiple departments SELECT ssnum FROM Employee WHERE dept IN (SELECT dept FROM Techdept) SELECT ssnum FROM Employee, Techdept WHERE Employee.dept = Techdept.dept
Avoid Unnecessary Temp Tables
Creating a temp table causes an update to the catalog, and the query on the temp table cannot use any index on the original table.
SELECT * INTO Temp FROM Employee WHERE salary > 40000
SELECT ssnum FROM Temp WHERE Temp.dept = ‘information systems’
Better:
SELECT ssnum FROM Employee WHERE Employee.dept = ‘information systems’ AND salary > 40000
Avoid Complicated Correlation Subqueries
This searches all of e2 for each e1 record:
SELECT ssnum FROM Employee e1 WHERE salary = (SELECT MAX(salary) FROM Employee e2 WHERE e2.dept = e1.dept)
Rewrite with a temp table of per-department maxima:
SELECT MAX(salary) as bigsalary, dept INTO Temp FROM Employee GROUP BY dept
SELECT ssnum FROM Employee, Temp WHERE salary = bigsalary AND Employee.dept = Temp.dept
Avoid Complicated Correlation Subqueries
SQL Server 2000 does a good job of handling correlated subqueries (a hash join is used, as opposed to a nested loop between query blocks).
–The techniques implemented in SQL Server 2000 are described in “Orthogonal Optimization of Subqueries and Aggregates” by C. Galindo-Legaria and M. Joshi, SIGMOD 2001.
Join on Clustering and Integer Attributes Employee is clustered on ssnum ssnum is an integer SELECT Employee.ssnum FROM Employee, Student WHERE Employee.name = Student.name SELECT Employee.ssnum FROM Employee, Student WHERE Employee.ssnum = Student.ssnum
Avoid HAVING when WHERE is enough May first perform grouping for all departments! SELECT AVG(salary) as avgsalary, dept FROM Employee GROUP BY dept HAVING dept = ‘information systems’ SELECT AVG(salary) as avgsalary FROM Employee WHERE dept = ‘information systems’ GROUP BY dept
Avoid Views with Unnecessary Joins
The view joins with Techdept unnecessarily:
CREATE VIEW Techlocation AS SELECT ssnum, Techdept.dept, location FROM Employee, Techdept WHERE Employee.dept = Techdept.dept
SELECT dept FROM Techlocation WHERE ssnum = 4444
Better:
SELECT dept FROM Employee WHERE ssnum = 4444
Aggregate Maintenance
Materialize an aggregate if it is needed “frequently”; use a trigger to keep it up to date:
create trigger updateVendorOutstanding on orders for insert as
update vendorOutstanding
set amount = (select vendorOutstanding.amount + sum(inserted.quantity*item.price)
              from inserted, item
              where inserted.itemnum = item.itemnum)
where vendor = (select vendor from inserted);
Avoid External Loops
No loop:
sqlStmt = “select * from lineitem where l_partkey <= 200;”
odbc->prepareStmt(sqlStmt);
odbc->execPrepared(sqlStmt);
Loop:
sqlStmt = “select * from lineitem where l_partkey = ?;”
odbc->prepareStmt(sqlStmt);
for (int i=1; i<=200; i++) {
  odbc->bindParameter(1, SQL_INTEGER, i);
  odbc->execPrepared(sqlStmt);
}
Avoid External Loops SQL Server 2000 on Windows 2000 Crossing the application interface has a significant impact on performance Let the DBMS optimize set operations
Avoid Cursors
No cursor:
select * from employees;
Cursor:
DECLARE d_cursor CURSOR FOR select * from employees;
OPEN d_cursor
FETCH NEXT FROM d_cursor
WHILE (@@FETCH_STATUS = 0)
BEGIN
  FETCH NEXT FROM d_cursor
END
CLOSE d_cursor
DEALLOCATE d_cursor
go
Avoid Cursors SQL Server 2000 on Windows 2000 Response time is a few seconds with a SQL query and more than an hour iterating over a cursor
Retrieve Needed Columns Only
–All: Select * from lineitem;
–Covered subset: Select l_orderkey, l_partkey, l_suppkey, l_shipdate, l_commitdate from lineitem;
Avoids transferring unnecessary data and may enable use of a covering index.
Use Direct Path for Bulk Loading
sqlldr directpath=true control=load_lineitem.ctl data=E:\Data\lineitem.tbl
load data infile "lineitem.tbl" into table LINEITEM append
fields terminated by '|'
( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE DATE "YYYY-MM-DD", L_COMMITDATE DATE "YYYY-MM-DD", L_RECEIPTDATE DATE "YYYY-MM-DD", L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT )
Use Direct Path for Bulk Loading
Direct path loading bypasses the query engine and the storage manager. It is orders of magnitude faster than conventional bulk load (commit every 100 records) and inserts (commit for each record).
Some Idiosyncrasies OR may stop the index being used –break the query and use UNION Order of tables may affect join implementation
Query Tuning – Thou Shalt … Avoid redundant DISTINCT Change nested queries to join Avoid unnecessary temp tables Avoid complicated correlation subqueries Join on clustering and integer attributes Avoid HAVING when WHERE is enough Avoid views with unnecessary joins Maintain frequently used aggregates Avoid external loops
Query Tuning – Thou Shalt … Avoid cursors Retrieve needed columns only Use direct path for bulk loading