Chapter 23 Query Processing Pearson Education © 2014.

Introduction With declarative languages such as SQL, user specifies what data is required rather than how it is to be retrieved. Relieves user of knowing what constitutes good execution strategy. 2 Pearson Education © 2014

Introduction Also gives DBMS more control over system performance.
Two main techniques for query optimization: heuristic rules that order operations in a query; comparing different strategies based on relative costs, and selecting one that minimizes resource usage. Disk access tends to be dominant cost in query processing for centralized DBMS. 3 Pearson Education © 2014

Query Processing Activities involved in retrieving data from the database. Aims of query processing (QP): transform query written in high-level language (e.g., SQL), into correct and efficient execution strategy expressed in low-level language that implements relational algebra (RA); execute strategy to retrieve required data. 4 Pearson Education © 2014

Query Optimization (QO)
As there are many equivalent transformations of same high-level query, aim of QO is to choose one that minimizes resource usage. Goal: reduce total execution time of query. i.e., reduce disk accesses i.e., Reduce the total amount of data used Problem computationally intractable when there are a large number of relations Use heuristics to find near optimal solution. 5 Pearson Education © 2014

Example 23.1 - Different Strategies
Find all Managers who work at a London branch. SELECT * FROM Staff s, Branch b WHERE s.branchNo = b.branchNo AND s.position = ‘Manager’ AND b.city = ‘London’; 6 Pearson Education © 2014

Assume: 1000 tuples in Staff; 50 tuples in Branch; 50 Managers; 5 London branches; no indexes or sort keys; results of any intermediate operations stored on disk; cost of the final write is ignored; tuples are accessed one at a time. 7 Pearson Education © 2014

Example 23.1 - 3 Different Strategies
(position='Manager')  (city='London')  (Staff.branchNo=Branch.branchNo) (Staff X Branch) Calculate Cartesian product of Staff and Branch: Read contents of Staff & contents of Branch reads Write Cartesian product result: 1000 * 50 tuples Select those tuples that satisfy expression: Read all tuples: * 50 reads Total Disk accesses: ( ) + 2*(1000 * 50) = 101,050 8 Pearson Education © 2014

(2) (position='Manager')  (city='London') (Staff Staff.branchNo=Branch.branchNo Branch) Join Staff and Branch on branchNo Read contents of Staff & contents of Branch reads Write relation that results from the Join 1000 writes (each staff works at 1 Branch) Select those tuples that satisfy expression: Read all tuples: reads Total Disk accesses: ( ) + 2*(1000) = 3,050 9 Pearson Education © 2014

(3) (position='Manager'(Staff)) Staff.branchNo=Branch.branchNo (city='London' (Branch)) Select the Managers from Staff Read contents of Staff & Select for “Manager”: 1000 reads Write resulting relation: 50 writes Select the London Branches Read contents of Branch & Select for “London”: 50 reads Write resulting relation: 5 writes Join the above 2 on branchNo: reads; Total Disk accesses: ( ) + 2 * (50 + 5) = 1,160 10

Example 23.1 - Cost Comparison
Cost (in disk accesses) are: (1) ( ) + 2*(1000 * 50) = 101,050 (2) 2* ( ) = 3,050 (3) * (50 + 5) = 1,160 Cartesian product and join operations much more expensive than selection, and third option significantly reduces size of relations being joined together. 11 Pearson Education © 2014

Phases of Query Processing
QP has four main phases: decomposition (consisting of parsing and validation); Create a valid RA execution plan optimization; Improve the RA execution plan code generation; Translate from RA to machine code execution. Run the code 12 Pearson Education © 2014

Phases of Query Processing
13 Pearson Education © 2014

Dynamic versus Static Optimization
Two times when first three phases of QP can be carried out: dynamically every time query is run; statically when query is first submitted. Advantages of dynamic QO arise from fact that information is up to date. Disadvantages are that performance of query is affected, time may limit finding optimum strategy. 14 Pearson Education © 2014

Dynamic versus Static Optimization
Advantages of static QO are removal of runtime overhead, and more time to find optimum strategy. Disadvantages arise from fact that chosen execution strategy may no longer be optimal when query is run. i.e., relation sizes could be very different Could use a hybrid approach to overcome this. 15 Pearson Education © 2014

Phase I: Query Decomposition
Aims are to transform high-level query into RA query and check that query is syntactically and semantically correct. Typical stages are: analysis, normalization, semantic analysis, simplification, query restructuring. 16 Pearson Education © 2014

Analysis - Example This query would be rejected on two grounds:
SELECT staffNumber FROM Staff WHERE position > 10; This query would be rejected on two grounds: staffNumber is not defined for Staff relation (should be staffNo). Comparison ‘>10’ is incompatible with type position, which is variable character string. 18 Pearson Education © 2014

Analysis Finally, query transformed into some internal representation more suitable for processing. Some kind of query tree is typically chosen, constructed as follows: Leaf node created for each base relation. Non-leaf node created for each intermediate relation produced by RA operation. Root of tree represents query result. Sequence is directed from leaves to root. 19 Pearson Education © 2014

Example 23.1 – Relational Algebra Tree (R.A.T.)
RAT for Query Option 3 20 Pearson Education © 2014

Normalization Converts query into a normalized form for easier manipulation. Binary or unary operations Predicate can be converted into 1 of 2 forms Conjunctive normal form: Or clauses joined together by Ands (Most Common) (position = 'Manager'  salary > 20000)  (branchNo = 'B003') Disjunctive normal form: And clauses joined together by Ors (position = 'Manager'  branchNo = 'B003' )  (salary >  branchNo = 'B003') 21 Pearson Education © 2014

Semantic Analysis Rejects normalized queries that are incorrectly formulated or contradictory. Query is incorrectly formulated if components do not contribute to generation of result. Query is contradictory if its predicate cannot be satisfied by any tuple. Algorithms to determine correctness exist only for queries that do not contain disjunction and negation. 22 Pearson Education © 2014

Semantic Analysis For these queries, can construct:
A relation connection graph. Normalized attribute connection graph. Relation connection graph Create node for each relation and node for result. Create edges between two nodes that represent a join, and edges between nodes that represent projection. If not connected, query is incorrectly formulated. 23 Pearson Education © 2014

Example 23.2 - Checking Semantic Correctness
SELECT p.propertyNo, p.street //insert project edge p-result FROM Client c, Viewing v, PropertyForRent p WHERE c.clientNo = v.clientNo AND //insert Join edge c-v c.maxRent >= 500 AND c.prefType = ‘Flat’ AND p.ownerNo = ‘CO93’; Relation Connection graph Have omitted the join condition (v.propertyNo = p.propertyNo) Create node for each relation and node for result. (4 nodes: c, v, p, Result) Create edges between two nodes that represent a join, and edges between nodes that represent projection. C & v join on clientNo; SELECT p.PropertyNo, p.street (these are projections from p to Result) 24 Pearson Education © 2014

Semantic Analysis - Normalized Attribute Connection Graph
Create node for each reference to an attribute in WHERE clause; single node for all or constants labeled 0. Create 2 unweighted directed edges between nodes that represent a join or =, and directed edge between attribute node and 0 node that represents selection. Weight edges a  0 with value c if it represents inequality condition (a  b + c); weight edges 0  a with -c if it represents inequality condition (a  c); weight edges with + and – c if it represents =. If graph has cycle for which valuation sum is negative, query is contradictory. 26 Pearson Education © 2014

SELECT p.propertyNo, p.street FROM Client c, Viewing v, PropertyForRent p WHERE c.maxRent > 500 AND [select; weight with -500] c.clientNo = v.clientNo AND [join, wt 0] v.propertyNo = p.propertyNo AND [join, wt 0] c.prefType = ‘Flat’ AND c.maxRent < 200; [select =, wt Flat & -Flat] [select <, wt 200] Create node for each reference to an attribute in WHERE, or constant 0. (6 in WHERE (one duplicate from SELECT), 3 constants: nodes = 7 nodes Joins: Create pair of directed edges between nodes that represent a join, wt with 0 Selection: Create a directed edge between attribute node and 0 node that represents selection. Weight edges a  b with value c, if it represents inequality condition (a  b + c); weight edges 0  a with -c, if it represents inequality condition (a  c). If graph has cycle for which valuation sum is negative, query is contradictory. Normalized attribute connection graph 27 Pearson Education © 2014

Simplification Detects redundant qualifications, eliminates common sub-expressions, transforms query to semantically equivalent but more easily and efficiently computed form. Assuming user has appropriate access privileges, first apply well-known rules of Boolean algebra. 29 Pearson Education © 2014

Transformation Rules for RA Operations
Conjunctive Selection operations can cascade into individual Selection operations (and vice versa). pqr(R) = p(q(r(R))) Sometimes referred to as cascade of Selection. branchNo='B003'  salary>15000(Staff) = branchNo='B003'(salary>15000(Staff)) 30 Pearson Education © 2014

Commutativity of Selection and Projection. If predicate p involves only attributes in projection list, Selection and Projection operations commute: Ai, …, Am(p(R)) = p(Ai, …, Am(R)) where p {A1, A2, …, Am} For example: fName, lName(lName='Beech'(Staff)) = lName='Beech'(fName,lName(Staff)) 33 Pearson Education © 2014

Commutativity of Theta join (and Cartesian product). R p S = S p R R X S = S X R Rule also applies to Equijoin and Natural join. For example: Staff staff.branchNo=branch.branchNo Branch == Branch staff.branchNo=branch.branchNo Staff 34 Pearson Education © 2014

Commutativity of Selection and Theta join (or Cartesian product). If selection predicate involves only attributes of one of join relations, Selection and Join (or Cartesian product) operations commute: p(R r S) = (p(R)) r S p(R X S) = (p(R)) X S where p {A1, A2, …, An} 35 Pearson Education © 2014

If selection predicate is conjunctive predicate having form (p  q), where p only involves attributes of R, and q only attributes of S, Selection and Theta join operations commute as: p  q(R r S) = (p(R)) r (q(S)) p  q(R X S) = (p(R)) X (q(S)) 36 Pearson Education © 2014

For example: position='Manager'  city='London'(Staff Staff.branchNo=Branch.branchNo Branch) = (position='Manager'(Staff)) Staff.branchNo=Branch.branchNo (city='London' (Branch)) 37 Pearson Education © 2014

Commutativity of Projection and Theta join (or Cartesian product). If projection list is of form L = L1  L2, where L1 only has attributes of R, and L2 only has attributes of S, provided join condition only contains attributes of L, Projection and Theta join commute: L1L2(R r S) = (L1(R)) r (L2(S)) 38 Pearson Education © 2014

If join condition contains additional attributes not in L (M = M1  M2 where M1 only has attributes of R, and M2 only has attributes of S), a final projection operation is required: L1L2(R r S) = L1L2( (L1M1(R)) r (L2M2(S))) 39 Pearson Education © 2014

For example: position,city,branchNo(Staff Staff.branchNo=Branch.branchNo Branch) == (position, branchNo(Staff)) Staff.branchNo=Branch.branchNo (city, branchNo (Branch)) and using the latter rule: position, city(Staff Staff.branchNo=Branch.branchNo Branch) == position, city ((position, branchNo(Staff)) Staff.branchNo=Branch.branchNo ( city, branchNo (Branch))) 40 Pearson Education © 2014

Commutativity of Projection and Union. L(R  S) = L(S)  L(R) Associativity of Union and Intersection (but not Set difference). (R  S)  T = S  (R  T) (R  S)  T = S  (R  T) 42 Pearson Education © 2014

Associativity of Theta join (and Cartesian product). Cartesian product and Natural join are always associative: (R S) T = R (S T) (R X S) X T = R X (S X T) If join condition q involves attributes only from S and T, then Theta join is associative: (R p S) q  r T = R p  r (S q T) 43 Pearson Education © 2014

For example: (Staff Staff.staffNo=PropertyForRent.staffNo PropertyForRent) ownerNo=Owner.ownerNo  staff.lName=Owner.lName Owner = Staff staff.staffNo=PropertyForRent.staffNo  staff.lName=lName (PropertyForRent ownerNo Owner) 44 Pearson Education © 2014

Example 23.3 Use of Transformation Rules
For prospective renters of flats, find properties that match requirements and owned by CO93. SELECT p.propertyNo, p.street FROM Client c, Viewing v, PropertyForRent p WHERE c.prefType = ‘Flat’ AND c.clientNo = v.clientNo AND v.propertyNo = p.propertyNo AND c.maxRent >= p.rent AND c.prefType = p.type AND p.ownerNo = ‘CO93’; 45 Pearson Education © 2014

Apply commutativity rule for selection so only one operation per tree branch. Process query left to right to build tree bottom up. (Can swap clauses later on due to associativity). e.g., c.prefType=‘Flat’ first; join c results and v on ClientNo (need to do Cartesian product followed by select on clientNo); Select from p wither ownerNo=‘CO93’ Join these results with previous ones on propertyNo (need to do Cartesian product followed by select on propoertyNo) Select on rent or prefType (could be two sequential selects) Project propoertNo, pstreet MULTIPL trees possible. This is not efficient (two cartesian products) Split AND clauses of select Into separate selects 46 Pearson Education © 2014

Replace cartesian product and select with an equijoin © Swap join on propertyNo with join on clientNo (rearrange to v and p feed into it) Cartesian X -> joins Decreases rows in results Associative rule: swap join On propertyNo with join on clientNo (rearrange v and p) 47 Pearson Education © 2014

Heuristical Processing Strategies
Perform Selection operations as early as possible. i.e., lower down the tree Keep predicates on same relation together. Replace Cartesian product with subsequent Selection with a join Do as late as possible (high up the tree) Use associativity of binary operations to rearrange leaf nodes so leaf nodes with most restrictive Selection operations executed first. 49 Pearson Education © 2014

Heuristical Processing Strategies
Perform Projection as early as possible. Keep projection attributes on same relation together. Compute common expressions once. If common expression appears more than once, and result not too large, store result and reuse it when required. Useful when querying views, as same expression is used to construct view each time. When you look at tree: Selects at bottom, then projects, then join FINAL result project at the top if needed 50 Pearson Education © 2014

Cost Estimation for RA Operations
Many different ways of implementing RA operations. Use formulae that estimate costs for a number of options, and select one with lowest cost. Given a tree (or many possible trees) Estimate cost of that execution plan Consider only cost of disk access, which is usually dominant cost in QP. Many estimates are based on cardinality of the relation, so need to be able to estimate this. 51 Pearson Education © 2014

Database Statistics Success of estimation depends on amount and currency of statistical information DBMS holds. Keeping statistics current can be problematic. If statistics updated every time tuple is changed, this would impact performance. DBMS could update statistics on a periodic basis, for example nightly, or whenever the system is idle. 52 Pearson Education © 2014

Typical Statistics for Relation R
nTuples(R) - number of tuples in R. bFactor(R) - blocking factor of R (number of records of R in a disk block) nBlocks(R) - number of blocks required to store R: nBlocks(R) = [nTuples(R)/bFactor(R)] 53 Pearson Education © 2014

Typical Statistics for Attribute A of Relation R
nDistinctA(R) - number of distinct values that appear for attribute A in R. minA(R), maxA(R) - minimum and maximum possible values for attribute A in R. SCA(R) - selection cardinality of attribute A in R. Average number of tuples that satisfy an equality condition on attribute A. 54 Pearson Education © 2014

Selection Operation Cost
Number of different implementations, depending on file structure, and whether attribute(s) involved are indexed/hashed. Main strategies are: Linear Search (Unordered file, no index). O(R) Binary Search (Ordered file, no index). O(log2R) Equality on hash key. O(c) Inequality condition on hash. O(R) Equality condition on primary key (B+ tree). O(logMR) Inequality condition on primary key (B+ tree). O(logMR) 55 Pearson Education © 2014

Join Operation Main strategies for implementing join:
Block Nested Loop Join. Indexed Nested Loop Join. Sort-Merge Join. Hash Join. 56 Pearson Education © 2014

Estimating Cardinality of Join
Cardinality of Cartesian product is: nTuples(R) * nTuples(S) More difficult to estimate cardinality of any join as depends on distribution of values. Worst case, cannot be any greater than this value. 57 Pearson Education © 2014

Block Nested Loop Join Simplest join algorithm is nested loop that joins two relations together a tuple at a time. Outer loop iterates over each tuple in R, and inner loop iterates over each tuple in S. As basic unit of reading/writing is a disk block, better to have two extra loops that process blocks. Estimated cost of this approach is: nBlocks(R) + (nBlocks(R) * nBlocks(S)) 58 Pearson Education © 2014

Block Nested Loop Join Could read as many blocks as possible of smaller relation, R say, into database buffer, saving one block for inner relation and one for result. New cost estimate becomes: nBlocks(R) + [nBlocks(S)*(nBlocks(R)/(nBuffer-2))] If can read all blocks of R into the buffer, this reduces to: nBlocks(R) + nBlocks(S) 59 Pearson Education © 2014

Indexed Nested Loop Join
If have index (or hash function) on join attributes of inner relation, can use index lookup. For each tuple in R, use index to retrieve matching tuples of S. Cost of scanning R is nBlocks(R), as before. Cost of retrieving matching tuples in S depends on type of index & number of matching tuples. 60

Sort-Merge Join For Equijoins, an efficient join is when both relations are sorted on join attributes. This would be true when both attributes have a B+ tree on them Just do the “merge” phase of merge sort Linear scan across each O(R+S) 61 Pearson Education © 2014

Summary of Join Complexity In decreasing order of cost
O(R) + R * (cost of lookup of A in S) No index on S: O(R) + R * O(S) = O (R*S) S fits in memory OR Sort-Merge Equijoin O(R) + O(S) = O (R+S) S records in sorted order by Join attribute A O(R) + R * O(log2A) = O (R+Rlog2A) = O (R+Rlog2S) Join attribute A in S is indexed with B+ tree O(R) + R * O(logmA) = O (R+RlogmA) = O(RlogmS) Join attribute A in S is indexed with Hash Function O(R) + R * O(1) = O (R+R) = O(R)

Enumeration of Alternative Strategies
Fundamental to efficiency of QO is the search space of possible execution strategies and the enumeration algorithm used to search this space. Query with 2 joins gives 12 join orderings: R (S T) R (T S) (S T) R (T S) R S (R T) S (T R) (R T) S (T R) S T (R S) T (S R) (R S) T (S R) T With n relations, (2(n – 1))!/(n – 1)! orderings. If n = 4 this is 120; if n = 10 this is > 176 billion. Compounded by various selection/join methods. 63 Pearson Education © 2014

Projection Operation To implement projection need to:
remove attributes that are not required; eliminate any duplicate tuples produced from previous step. Only required if projection attributes do not include a key. Two main approaches to eliminating duplicates: sorting; hashing. 64 Pearson Education © 2014

Reducing the Search Space
Restriction 1: Unary operations processed on-the- fly: selections processed as relations are accessed for first time; projections processed as results of other operations are generated. Restriction 2: Cartesian products are never formed unless query itself specifies one. Restriction 3: Inner operand of each join is a base relation, never an intermediate result. This uses fact that with left-deep trees inner operand is a base relation and so already materialized. Restriction 3 excludes many alternative strategies but significantly reduces number to be considered. 65 Pearson Education © 2014

Semantic Query Optimization
Based on constraints specified on the database schema to reduce the search space. For example, a constraint states that staff cannot supervise more than 100 properties, so any query searching for staff who supervise more than 100 properties will produce zero rows. Now consider: CREATE ASSERTION ManagerSalary CHECK (salary > AND position = ‘Manager’) SELECT s.staffNo, fName, lName, propertyNo FROM Staff s, PropertyForRent p WHERE s.staffNo = p.staffNo AND position = ‘Manager’; 66 Pearson Education © 2014

Semantic Query Optimization
Can rewrite this query as: SELECT s.staffNo, fName, lName, propertyNo FROM Staff s, PropertyForRent p WHERE s.staffNo = p.staffNo AND salary > AND position = ‘Manager’; Additional predicate may be very useful if only index for Staff is a B+-tree on the salary attribute. However, additional predicate would complicate query if no such index existed. 67 Pearson Education © 2014

Pipelining Materialization - output of one operation is stored in temporary relation for processing by next. Could also pipeline results of one operation to another without creating temporary relation. Known as pipelining or on-the-fly processing. Pipelining can save on cost of creating temporary relations and reading results back in again. Generally, pipeline is implemented as separate process or thread. 68 Pearson Education © 2014

Chapter 23 Query Processing Pearson Education © 2014.

Similar presentations

Presentation on theme: "Chapter 23 Query Processing Pearson Education © 2014."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Chapter 23 Query Processing Pearson Education © 2014.

Similar presentations

Presentation on theme: "Chapter 23 Query Processing Pearson Education © 2014."— Presentation transcript:

Similar presentations

About project

Feedback