Query Processing CSD305 Advanced Databases
Agenda
  Query Processing Fundamentals
  QP vs. RA
  QP in MS SQL Server
  Rule-based optimization
  Cost-based optimization
  Statistical query optimization
  Query Plans
  Search Conditions
  Access Methods
  Join Methods
  Query tuning
Fundamentals of Query Processing
  A SQL statement is semantically similar to a specification in Relational Calculus and can be translated into an expression in Relational Algebra
  SELECT title FROM Loan JOIN Book ON Loan.bookNo = Book.bookNo JOIN Member ON Loan.memberNo = Member.memberNo WHERE Loan.memberNo = 4128
  can be expressed as
  $\pi_{title}(((\sigma_{memberNo = 4128}\, Loan) \bowtie Book) \bowtie Member)$
Relational algebra reminder
  $\pi_{title}(((\sigma_{memberNo = 4128}\, Loan) \bowtie Book) \bowtie Member)$
Notes on the algebra
  The expression can be represented as a tree
    Each leaf represents a base table
    Each internal node represents an RA operation
  Each RA operation is either unary or binary
  The result of each operation is a single relation
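The tree view of the expression can be tried out on toy data. The sketch below is only an illustration: the rows, the `name` column and the helper functions `select`, `project` and `natural_join` are assumptions made for this example, and the projection keeps duplicates (as SQL does) rather than removing them as strict RA would.

```python
# Base tables (the leaves of the tree) as lists of dicts. All rows are made-up data.
loan = [
    {"bookNo": 1, "memberNo": 4128},
    {"bookNo": 2, "memberNo": 77},
    {"bookNo": 3, "memberNo": 4128},
]
book = [
    {"bookNo": 1, "title": "Dune"},
    {"bookNo": 2, "title": "Emma"},
    {"bookNo": 3, "title": "Ulysses"},
]
member = [
    {"memberNo": 4128, "name": "Ann"},
    {"memberNo": 77, "name": "Bob"},
]


def select(rows, predicate):
    """Unary operation: keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def project(rows, columns):
    """Unary operation: keep only the named columns (duplicates are kept)."""
    return [{c: r[c] for c in columns} for r in rows]


def natural_join(left, right):
    """Binary operation: combine rows that agree on all shared column names."""
    out = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[c] == r[c] for c in shared):
                out.append({**l, **r})
    return out


# Evaluate the tree bottom-up: selection, then two joins, then projection.
# Every intermediate result is itself a relation (a list of rows).
selected = select(loan, lambda r: r["memberNo"] == 4128)
with_books = natural_join(selected, book)
with_members = natural_join(with_books, member)
result = project(with_members, ["title"])
print(result)
```

Each function call corresponds to one node of the tree, and each returns a single relation that feeds the node above it.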
Does this look at all familiar?
  This picture shows a Query Plan from SQL Server
QP vs. RA
  As we have seen, Query Plans are rather similar to Relational Algebra
    They take the form of binary trees
    They produce intermediate results (relations) on the way to producing the answer
  But
    They are concerned with physical structures as well as logical structures
    The operations are physical implementations of RA operations
    e.g. there are several ways to do a JOIN
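To make "several ways to do a JOIN" concrete, here is a minimal sketch of two physical implementations of the same logical join: a nested-loop join and a hash join. The function names and the sample rows are assumptions for illustration; this is not how SQL Server implements its join operators internally.

```python
def nested_loop_join(left, right, key):
    """Compare every left row with every right row."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    """Build a hash table on one input, then probe it with the other."""
    buckets = {}
    for r in right:  # build phase
        buckets.setdefault(r[key], []).append(r)
    # probe phase: one dictionary lookup per left row
    return [{**l, **r} for l in left for r in buckets.get(l[key], [])]


# Made-up sample rows
loans = [{"bookNo": 1, "memberNo": 4128}, {"bookNo": 3, "memberNo": 4128}]
books = [{"bookNo": 1, "title": "Dune"}, {"bookNo": 2, "title": "Emma"},
         {"bookNo": 3, "title": "Ulysses"}]

by_loop = nested_loop_join(loans, books, "bookNo")
by_hash = hash_join(loans, books, "bookNo")
print(by_loop == by_hash)
```

Both functions return the same relation; they differ only in how much work they do to get there, which is exactly the kind of difference a query plan records and RA does not.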
SQL Server QP Basics
  SQL requests are sent to the DBMS
  The DBMS
    parses the SQL
    accesses metadata
    prepares query plan(s)
    chooses the cheapest plan
    executes the chosen plan
  Results in the form of a table are returned to the client
QP Schematic
Inside the DBMS
What do the QP components do?
  Within the SQL Processor
    The Query Parser
      Ensures that the SQL is well-formed and is consistent with the schema
    The Query Optimizer
      Generates plans for carrying out the query and chooses the best
      Saves plans in the cache
    The Query Executor
      Picks plans from the cache
      Carries out the chosen plan
      Returns results to the client
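The interplay between the optimizer and the plan cache can be sketched in a few lines. Everything here is a simplifying assumption for illustration: the cache is keyed on the raw SQL text, `toy_plans` is a hypothetical plan generator, and the costs are made-up numbers.

```python
plan_cache = {}  # SQL text -> chosen plan


def get_plan(sql, generate_plans):
    """Return (plan, cache_hit). Optimize only when no cached plan exists."""
    if sql in plan_cache:
        return plan_cache[sql], True
    candidates = generate_plans(sql)
    best = min(candidates, key=lambda p: p["cost"])  # choose the cheapest plan
    plan_cache[sql] = best                           # save it in the cache
    return best, False


def toy_plans(sql):
    """Hypothetical plan generator with made-up cost estimates."""
    return [
        {"name": "table scan", "cost": 120.0},
        {"name": "index seek", "cost": 4.5},
    ]


sql = "SELECT title FROM Book WHERE bookNo = 1"
first, hit1 = get_plan(sql, toy_plans)    # optimized and cached
second, hit2 = get_plan(sql, toy_plans)   # served from the cache
print(first["name"], hit1, hit2)
```

The second call skips plan generation entirely, which is the point of caching: optimization has a cost of its own.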
What do the QP components do?
  Within the Storage Manager
    The Page Manager
      Processes the request for data at a logical level (the database page)
    The I/O Manager
      Processes the request at a physical level (disk input/output)
Query Optimization
  There are three basic techniques of query optimization
    Rule-based
      In which rules are applied to generate a Query Plan
    Cost-based
      In which the costs of various Query Plans are estimated and compared
    Statistical
      In which the optimizer makes use of statistics about data distribution to improve Query Plans
  Microsoft SQL Server uses a combination of all three to select a QP
Rule-based Query Optimization
  A sequence of heuristics (rules of thumb) is applied to construct a QP
  Examples of rules
    Apply WHERE clause conditions first, if possible
    If you have an index and you know the key value, use the index for direct access
    If you have an index and you know only the leftmost part of the key value, use the index to seek on that leftmost part
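The two index rules can be written down as a small decision function. This is a minimal sketch: `choose_access`, its return strings and the column names `surname` and `forename` are hypothetical, chosen only to show how a leftmost-prefix rule behaves.

```python
def choose_access(index_columns, known_columns):
    """Apply the index rules of thumb to pick an access method.

    index_columns: ordered columns of a (hypothetical) composite index
    known_columns: columns whose values are fixed by the WHERE clause
    """
    known = set(known_columns)
    prefix = 0
    for col in index_columns:  # count how many leading index columns are known
        if col not in known:
            break
        prefix += 1
    if index_columns and prefix == len(index_columns):
        return "index seek on full key"
    if prefix > 0:
        return "index seek on leftmost prefix"
    return "table scan"


index = ["surname", "forename"]
print(choose_access(index, ["surname", "forename"]))
print(choose_access(index, ["surname"]))
print(choose_access(index, ["forename"]))
```

Knowing only `forename` does not help, because it is not the leftmost column of the index; the rule falls back to a scan.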
Cost-based optimization
  Each QP is evaluated with respect to its estimated costs, including
    Disk I/O
    Number and size of rows processed
    Use of processing cycles
  The accuracy of cost estimation can vary greatly
    e.g. what percentage of rows will be returned when the condition (WHERE sex = 'F') is applied? 1%, 10%, 50%?
  The Query Optimizer needs statistics to make a good estimate
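A rough sketch of why the percentage of rows returned matters so much. The cost model below is an assumption made for illustration (page reads only, one page read per matching row for the index lookup, an index depth of 3); real optimizers use far more detailed formulas.

```python
import math


def scan_cost(total_rows, rows_per_page):
    """Pages read when scanning the whole table."""
    return math.ceil(total_rows / rows_per_page)


def seek_cost(total_rows, selectivity, index_depth=3):
    """Pages read when seeking an index, then fetching each matching row.

    Assumes one page read per matching row, which is a pessimistic simplification.
    """
    matching_rows = round(total_rows * selectivity)
    return index_depth + matching_rows


def cheapest(total_rows, rows_per_page, selectivity):
    """Estimate both plans and return the name of the cheaper one with all costs."""
    costs = {
        "table scan": scan_cost(total_rows, rows_per_page),
        "index seek + lookup": seek_cost(total_rows, selectivity),
    }
    return min(costs, key=costs.get), costs


# 100,000 rows at 100 rows per page, with two different selectivity guesses
print(cheapest(100_000, 100, 0.5))    # half the rows match
print(cheapest(100_000, 100, 0.001))  # one row in a thousand matches
```

With the same table and the same index, the cheaper plan flips depending on the selectivity estimate, so a bad estimate leads directly to a bad plan.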
Statistical optimization
  The DBMS maintains statistics about the distribution of data values in tables and indexes
  The statistics are stored as part of the metadata and used by the Query Optimizer
  The statistics include
    Densities of key values (calculated as 1 divided by the number of different values)
    Sample values or range endpoints (up to 200 of them)
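The density figure is easy to compute by hand. The sketch below applies the definition above (1 divided by the number of different values) to a made-up column and uses it for a row estimate; `density` and `estimated_rows` are illustrative names, and estimating rows as density times row count assumes values are evenly distributed.

```python
from collections import Counter


def density(values):
    """Density = 1 / number of distinct values."""
    return 1 / len(set(values))


def estimated_rows(values):
    """Estimated rows for an equality predicate, assuming an even distribution."""
    return len(values) * density(values)


# Made-up, deliberately skewed sample column
sex = ["F", "F", "F", "F", "F", "F", "F", "M", "M", "M"]

d = density(sex)
estimate = estimated_rows(sex)
actual = Counter(sex)["F"]
print(d, estimate, actual)
```

The density-based estimate and the true count disagree when the data is skewed, which is why the statistics also keep sample values or range endpoints.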
Statistics Example
Analyzing Query Plans
  Use SQL Server Management Studio
  On the toolbar, choose "New Query"
  On the Query toolbar, turn on the "Include Actual Execution Plan" option
  Run your SQL query
  A graphical representation of the plan will be displayed
  (To see the plan without running the query, choose "Display Estimated Execution Plan" instead)
  You can then click on any node to "drill down" to find out more about what is going on
  Example on next few slides
The SQL query and results
The QP overview
Drilling down on the first node
…and the next
…and the last
Some notes on the query plan
  The query begins by seeking the classCode in a nonclustered index
  From this it will collect bookmarks
  The bookmarks are used in the next step to look up the corresponding rows in the clustered index
  The rows are then returned to the client
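The seek-then-lookup pattern in this plan can be imitated with ordinary data structures. This is a sketch under stated assumptions: the rows, the `bookNo` and `title` columns and the function names `seek` and `lookup` are made up for illustration; only `classCode` comes from the example query. A dict stands in for the clustered index and a sorted list of pairs for the nonclustered index.

```python
from bisect import bisect_left, bisect_right

# Clustered index: clustering key -> full row (made-up rows)
clustered = {
    1: {"bookNo": 1, "classCode": "005", "title": "Databases"},
    2: {"bookNo": 2, "classCode": "823", "title": "Emma"},
    3: {"bookNo": 3, "classCode": "005", "title": "Algorithms"},
    4: {"bookNo": 4, "classCode": "510", "title": "Calculus"},
}

# Nonclustered index on classCode: sorted (classCode, bookmark) pairs,
# where the bookmark is the clustering key of the row
nonclustered = sorted((row["classCode"], key) for key, row in clustered.items())
index_keys = [code for code, _ in nonclustered]


def seek(class_code):
    """Step 1: seek the nonclustered index and collect bookmarks."""
    lo = bisect_left(index_keys, class_code)
    hi = bisect_right(index_keys, class_code)
    return [bookmark for _, bookmark in nonclustered[lo:hi]]


def lookup(bookmarks):
    """Step 2: use each bookmark to fetch the full row from the clustered index."""
    return [clustered[b] for b in bookmarks]


bookmarks = seek("005")
rows = lookup(bookmarks)
print(bookmarks, [r["title"] for r in rows])
```

The nonclustered index alone cannot supply `title`, so each bookmark costs one extra lookup; this is the per-row cost that makes the plan attractive only when few rows match.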
Summary
  SQL queries are first parsed to establish syntactic correctness
  The Query Optimizer generates query plan(s) using rule-based, cost-based and statistical techniques
  A query plan is somewhat similar to an RA expression but incorporates physical considerations