Query Execution Section 15.1 Shweta Athalye CS257: Database Systems ID: 118 Section 1
Agenda Query Processor Query compilation Physical Query Plan Operators Scanning Tables Table Scan Index scan Sorting while scanning tables Model of computation for physical operators Parameters for measuring cost I/O cost for scan operators Iterators
Query processor The query processor is the group of components of a DBMS that turns user queries and data-modification commands into a sequence of database operations and executes those operations query processor is responsible for supplying details regarding how the query is to be executed
The major parts of the query processor
Query compilation Query compilation itself is a multi-step process consisting of : Parsing: in which a parse tree representing query and its structure is constructed Query rewrite: in which the parse tree is converted to an initial query plan Physical plan generation: where the abstract query plan is turned into a physical query plan
Outline of query compilation
Physical Query Plan Operators Physical query plans are built from operators Each of the operators implement one step of the plan. Physical operators can be implementations of the operator of relational algebra. They can also be non relational algebra operators like “scan” which scans tables.
Scanning Tables One of the most basic things in a physical query plan. Necessary when we want to perform join or union of a relation with another relation.
Two basic approaches to locating the tuples of a relation R Table-scan Relation R is stored in secondary memory with its tuples arranged in blocks it is possible to get the blocks one by one This operation is called Table Scan
Two basic approaches to locating the tuples of a relation R Index-scan there is an index on any attribute of Relation R Use this index to get all the tuples of R This operation is called Index Scan
Sorting While Scanning Tables Why do we need sorting while scanning? the query could include an ORDER BY clause. Requiring that a relation be sorted Various algorithms for relational-algebra operations require one or both of their arguments to be sorted relation physical-query-plan operator sort-scan takes a relation R and a specification of the attributes on which the sort is to be made, and produces R in that sorted order
Model of Computation for Physical Operators Choosing physical plan operators wisely is an essential for a good query processor. Cost for an operation is measured in number of disk i/o operations. If an operator requires the final answer to a query to be written back to the disk, the total cost will depend on the length of the answer and will include the final write back cost to the total cost of the query.
Improvements in cost Major improvements in cost of the physical operators can be achieved by avoiding or reducing the number of disk i/o operations This can be achieved by passing the answer of one operator to the other in the main memory itself without writing it to the disk.
Parameters for Measuring Costs Parameters that affect the performance of a query Buffer space availability in the main memory at the time of execution of the query Size of input and the size of the output generated The size of memory block on the disk and the size in the main memory also affects the performance
I/0 Cost for Scan Operators If Relation R is clustered, i.e. it is stored in approximately B blocks then the total number of disk operations required is B If R is clustered but requires a two phase multi way merge sort then the total number of disk i/o required will be 3B. However, if R is not clustered, then the number of required disk I/0's is generally much higher.
Iterators for Implementation of Physical Operators Many physical operators can be implemented as an iterator It is a group of three functions that allows a consumer of the result of the physical operator to get the result one tuple at a time
Iterator The three functions forming the iterator are: Open: This function starts the process of getting tuples. It initializes any data structures needed to perform the operation
Iterator GetNext This function returns the next tuple in the result Adjusts data structures as necessary to allow subsequent tuples to be obtained If there are no more tuples to return, GetNext returns a special value NotFound
Iterator Close This function ends the iteration after all tuples it calls Close on any arguments of the operator