Database Management 9. course. Execution of queries.


Query evaluation
Query → Parse, compile → Relational algebra → Optimize → Execution plan → Evaluate → Query output
(Statistics feed the Optimize step; Data feeds the Evaluate step.)

Query – SQL
Parse – Is the SQL query correct?
Relational algebra – Understandable for the computer
Optimize – Based on what?

Execution plan – If several plans give the same result: which is the best?
Evaluate – Find the proper data records
Query output – Give the answer to the user

Optimization example
Data of a bank
select balance from account where balance<2500
Two relational algebra representations:
$\sigma_{balance<2500}(\pi_{balance}(account))$
$\pi_{balance}(\sigma_{balance<2500}(account))$

The cost of an operation depends on the algorithms we can use, e.g. an index speeds up the selection
– Primitive: elementary operation (projection, selection, …)
– Pipeline: built from primitives, the building blocks for evaluations and statistics
– The input of a primitive is the output of the previous primitive

Catalogue cost approximation
– Choosing the proper strategy requires an approximation of the cost
– Cost approximation can be done based on several attributes
  – Space
  – Time
– Statistics are stored in the catalogue

Content of the catalogue
– Number of records in relation r: $n_r$
– Number of blocks used for relation r: $b_r$
– Size of one record in relation r: $s_r$
– Number of records in one block: $f_r$
– Number of different values of attribute A in relation r: $V(A,r) = |\pi_A(r)|$
– Average number of records that fulfil an equality selection on attribute A: $SC(A,r)$
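As an illustration, the catalogue values can be computed for a tiny in-memory relation. This is only a sketch under stated assumptions: the helper name catalogue_stats and the sample data are made up, b_r is taken as ceil(n_r / f_r) on the assumption that the records of r are packed together in blocks, and SC(A,r) is estimated as n_r / V(A,r), which assumes the values of A are uniformly distributed.

```python
import math

def catalogue_stats(rows, attr, block_size, record_size):
    """Catalogue-style statistics for a list of dict records (illustrative helper)."""
    n_r = len(rows)                        # n_r: number of records
    f_r = block_size // record_size        # f_r: records that fit in one block
    b_r = math.ceil(n_r / f_r)             # b_r: assumes records are packed together
    v = len({row[attr] for row in rows})   # V(A, r): number of distinct values of A
    sc = n_r / v                           # SC(A, r): assumes a uniform distribution
    return {"n_r": n_r, "f_r": f_r, "b_r": b_r, "V": v, "SC": sc}

# Hypothetical sample relation: 6 accounts spread over 3 branches.
account = [
    {"branch": "Brooklyn", "balance": 500},
    {"branch": "Brooklyn", "balance": 3000},
    {"branch": "Brooklyn", "balance": 1200},
    {"branch": "Queens", "balance": 800},
    {"branch": "Queens", "balance": 2600},
    {"branch": "Bronx", "balance": 100},
]
stats = catalogue_stats(account, "branch", block_size=4096, record_size=1024)
print(stats)
```

A real system reads these numbers from the catalogue rather than scanning the relation; the scan here only makes the definitions concrete.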

Catalogue information about indexes
– Hash tables are considered special indexes
– Average number of pointers in one node (average number of children): $f_i$
– Height of tree i: $HT_i = \lceil \log_{f_i} V(A,r) \rceil$, or $HT_i = 1$ in the case of a hash
– Number of lowest-level index blocks (number of leaf nodes): $LB_i$
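The tree height can be estimated from the fan-out and the number of distinct values. The sketch below computes the ceiling of the logarithm with integer arithmetic, because floating-point logarithms can round an exact result such as log_5(125) up to 4. The function name index_height is hypothetical, and returning at least 1 for a single distinct value is an assumption of this sketch.

```python
def index_height(fanout, distinct_values):
    """Smallest h >= 1 with fanout**h >= distinct_values, i.e. ceil(log_fanout(V))."""
    height, capacity = 1, fanout
    while capacity < distinct_values:
        capacity *= fanout   # one more level multiplies the reachable entries by the fan-out
        height += 1
    return height

print(index_height(100, 1_000_000))  # fan-out 100, one million distinct values
```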

Statistics should be updated after every modification → expensive
– Instead they are updated when the DB has time
– Not always consistent, but they give a good approximation

Cost of operations
– Just approximations: reading and writing are assumed to need the same time

Equality selection

Range selection

Types of join Distinct

Nested-loop join with blocks

Indexed nested-loop join
– Usable if one of the relations (s) is indexed
– No need for a full scan of s
– Cost: $b_r + n_r \cdot c$, where c is the cost of a selection on s
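A minimal sketch of the idea, assuming relations are lists of dicts and using a Python dict as a stand-in for the index on s. The function names and the sample data are made up; in a real DBMS the index would already exist rather than be built inside the join.

```python
from collections import defaultdict

def indexed_nested_loop_join(r, s, attr):
    """Join r and s on attr by probing an index on s instead of scanning s."""
    index = defaultdict(list)            # stand-in for an existing index on s.attr
    for s_row in s:
        index[s_row[attr]].append(s_row)
    # One pass over r; each record of r costs one index lookup on s.
    return [{**r_row, **s_row}
            for r_row in r
            for s_row in index.get(r_row[attr], [])]

def indexed_join_cost(b_r, n_r, c):
    """Cost estimate from the slide: b_r + n_r * c."""
    return b_r + n_r * c

depositor = [{"name": "Ann", "acct": 1}, {"name": "Bob", "acct": 2}, {"name": "Cy", "acct": 9}]
account = [{"acct": 1, "balance": 500}, {"acct": 2, "balance": 3000}]
joined = indexed_nested_loop_join(depositor, account, "acct")
print(joined)
print(indexed_join_cost(b_r=100, n_r=5000, c=4))
```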

Merge join
– First sort the relations on the join attributes
– Reading each relation once is then enough
– Cost: cost of sorting + $b_r + b_s$
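The merge step can be sketched as follows, assuming in-memory lists of dicts (so the sort is Python's sorted rather than an external sort). The function name merge_join and the sample rows are illustrative only. Runs of equal keys are handled so that duplicate join values on both sides produce every matching pair.

```python
def merge_join(r, s, attr):
    """Sort both inputs on attr, then join them in a single forward pass."""
    r = sorted(r, key=lambda row: row[attr])
    s = sorted(s, key=lambda row: row[attr])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][attr] < s[j][attr]:
            i += 1
        elif r[i][attr] > s[j][attr]:
            j += 1
        else:
            key = r[i][attr]
            j_end = j                      # find the run of s records with this key
            while j_end < len(s) and s[j_end][attr] == key:
                j_end += 1
            while i < len(r) and r[i][attr] == key:
                out.extend({**r[i], **s_row} for s_row in s[j:j_end])
                i += 1
            j = j_end                      # neither input is ever rewound past a finished run
    return out

r = [{"k": 2, "a": "x"}, {"k": 1, "a": "y"}, {"k": 2, "a": "z"}]
s = [{"k": 2, "b": 10}, {"k": 3, "b": 20}, {"k": 2, "b": 30}]
pairs = sorted((row["a"], row["b"]) for row in merge_join(r, s, "k"))
print(pairs)
```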

Other operations
– Filter repetition (distinct)
  – Sort
  – Delete
  – Cost: cost of sorting
– Projection: cost of sorting + (filter repetition +) $b_r$
– Union: sort relations + merge + filter repetition
– Intersection: sort both + select common rows
– Difference: sort both + delete from the first relation the rows that occur in the 2nd relation
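A short sketch of the sort-based approach, assuming rows are comparable tuples held in memory. The function names are hypothetical. After sorting, duplicates are adjacent, so one pass removes them, and intersection becomes a merge of two sorted inputs.

```python
def distinct_sorted(rows):
    """Filter repetition: sort, then drop each row equal to its predecessor."""
    out = []
    for row in sorted(rows):
        if not out or out[-1] != row:
            out.append(row)
    return out

def intersection_sorted(r, s):
    """Sort both inputs, then keep the rows found in both during one merge pass."""
    r, s = distinct_sorted(r), distinct_sorted(s)
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:
            i += 1
        elif r[i] > s[j]:
            j += 1
        else:
            out.append(r[i])
            i += 1
            j += 1
    return out

unique = distinct_sorted([(3,), (1,), (3,), (2,), (1,)])
common = intersection_sorted([(1,), (2,), (2,), (4,)], [(2,), (4,), (5,)])
print(unique, common)
```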

Evaluation - Materialization
– Tree of operations
  – Leaves: relations
  – Nodes: operations
– Cost: storing temporary relations + cost of operations
– Parallel processing

Pipelining
– Temporary storage is reduced
– Result records are passed to the next operation and not stored any more
– Saves memory (records are stored, not relations)
– Sorting is not possible
– Demand-driven pipeline: the system requests data when needed
– Data-driven pipeline: operations push data into the pipeline without request until the buffer gets full
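A demand-driven pipeline can be sketched with Python generators: each operator pulls one record at a time from its input only when its consumer asks for one, so no intermediate relation is stored. The operator names and the sample account data are assumptions made for this illustration; the query is the balance<2500 example from the earlier slide.

```python
def scan(rows):
    """Leaf operator: hand out the stored records one by one."""
    for row in rows:
        yield row

def select(source, predicate):
    """Pass on only the records that satisfy the predicate."""
    for row in source:
        if predicate(row):
            yield row

def project(source, attrs):
    """Keep only the listed attributes of each record."""
    for row in source:
        yield {a: row[a] for a in attrs}

account = [
    {"number": 1, "balance": 500},
    {"number": 2, "balance": 3000},
    {"number": 3, "balance": 1200},
]

# select balance from account where balance < 2500
plan = project(select(scan(account), lambda row: row["balance"] < 2500), ["balance"])
result = list(plan)   # nothing runs until the consumer demands records here
print(result)
```

A sort operator would not fit this scheme, since it must see all of its input before it can emit its first record.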

Pipeline evaluation
– Records arrive one after another
– Merge join cannot be used
– Indexed nested-loop join can be used

Transformation of relational expressions
– Transform to equivalent expressions with smaller evaluation time
– Example: give the names of the customers who have an account in Brooklyn
  – Time consuming: selection after joining the 3 tables
  – Much better: selection before the join
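The two variants of the Brooklyn example might be written as below. The schema is an assumption, since the transcript does not show it: branch(branch_name, branch_city), account(account_number, branch_name, balance) and depositor(customer_name, account_number). The first expression selects after joining all three relations; the second applies the selection to branch first, so the join processes far fewer records.

```latex
% Time consuming: selection after the three-way join
\pi_{customer\_name}\bigl(\sigma_{branch\_city = \text{Brooklyn}}(branch \bowtie (account \bowtie depositor))\bigr)

% Much better: selection pushed down to branch before the join
\pi_{customer\_name}\bigl(\sigma_{branch\_city = \text{Brooklyn}}(branch) \bowtie (account \bowtie depositor)\bigr)
```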

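Equivalence rules
– Predicates: $\theta, \theta_1, \theta_2$
– Attributes: $L_1, L_2, L_3$
– Relational algebra expressions: $E, E_1, E_2$
– Cascade of selections: $\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$
– Commutativity of selection: $\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$
– Cascade of projections: $\pi_{L_1}(\pi_{L_2}(E)) = \pi_{L_1}(E)$, where $L_1 \subseteq L_2$
– Connection of join and Cartesian product: $\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$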

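– Commutativity of theta-join: $E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$
– Associativity of natural join: $(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$
– Distributivity of selection on join, when $\theta_0$ contains attributes only from $E_1$: $\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$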

– Commutativity of union and intersection
– Associativity of union and intersection
– Distributivity of selection on union, intersection, and difference
– Distributivity of projection on union
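The distributivity rules for the set operations can be checked on a small example with Python sets. This is a sanity check on made-up data, not a proof; the helper names select and project and the two sample relations are assumptions of the sketch.

```python
def select(rel, pred):
    """Selection: keep the rows that satisfy the predicate."""
    return {row for row in rel if pred(row)}

def project(rel, idxs):
    """Projection with set semantics: keep the given column positions."""
    return {tuple(row[i] for i in idxs) for row in rel}

e1 = {("Ann", 500), ("Bob", 3000), ("Cy", 1200)}
e2 = {("Bob", 3000), ("Dee", 800)}
low = lambda row: row[1] < 2500

union_ok = select(e1 | e2, low) == select(e1, low) | select(e2, low)
inter_ok = select(e1 & e2, low) == select(e1, low) & select(e2, low)
diff_ok = select(e1 - e2, low) == select(e1, low) - select(e2, low)
proj_ok = project(e1 | e2, [0]) == project(e1, [0]) | project(e2, [0])
print(union_ok, inter_ok, diff_ok, proj_ok)
```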

These are only examples!

Choosing evaluation plan
– Create an algorithm for each expression
– Give an order for the operations
– Organize them into processes
– Example annotations: use pipeline, use index 1, use linear scan, sort to filter repetition

Cost-based optimization
– List all the equivalent expressions
– Assign an execution plan to every expression
– Calculate the cost of every plan
– Choose the cheapest (based on approximations and statistics)
– Disadvantage: if there are too many plans, too many pre-calculations are needed

Example
– Joining 3 relations: 6 orderings, each parenthesized in two ways, so 12 plans
– In general, n relations can be joined in $(2(n-1))! / (n-1)!$ ways
– If n=10, then about 17.6 billion plans…
– Solution: use some heuristics
– Consider first the optimal join for the first 3 relations, then join with the rest: 12+12 plans remain → not good!
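The count of join orders, (2(n-1))! / (n-1)!, can be checked with a few lines of Python. The function name join_orders is made up for this sketch. For n=3 it gives the 12 plans mentioned above, and for n=10 it gives 17,643,225,600, roughly 17.6 billion.

```python
from math import factorial

def join_orders(n):
    """Number of join orders for n relations: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in (3, 5, 10):
    print(n, join_orders(n))
```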

Rules for heuristics
– Do the selection at the beginning to reduce the number of rows
– Do the projection as soon as possible to reduce the size of rows
– Split a conjunction of selections into a sequence of selections (use only one selection at a time)
– Push the selections down the tree
– Use first the selection or join that results in the fewest rows → use associativity of join

– If a join is equivalent to a Cartesian product and a selection comes next, merge them into a join operation: fewer records are generated
– Break up the projection lists and push them down the tree (sometimes new projections can be generated)
– Search for subtrees where pipelining can be applied
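The effect of doing the selection early can be seen by counting the rows the join has to produce. The sketch below uses made-up branch and account data and a naive join helper; it only illustrates the heuristic and does not model real I/O cost.

```python
def join(r, s, attr):
    """Naive equi-join on attr, enough to count the produced rows."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]

# Hypothetical data: 100 accounts, a quarter of them at the Brooklyn branch B1.
branch = [{"branch": "B1", "city": "Brooklyn"}, {"branch": "B2", "city": "Queens"}]
account = [{"acct": i, "branch": "B1" if i % 4 == 0 else "B2"} for i in range(100)]

# Plan A: join first, select afterwards.
joined = join(account, branch, "branch")
plan_a = [row for row in joined if row["city"] == "Brooklyn"]

# Plan B: select first, then join with the reduced input.
brooklyn = [row for row in branch if row["city"] == "Brooklyn"]
plan_b = join(account, brooklyn, "branch")

print(len(joined), len(plan_b))   # rows produced by the join in plan A vs plan B
```

Both plans return the same answer, but the join in plan B produces only the rows that survive the selection.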

1. By applying the rules, several trees are obtained
2. Calculate the cost
3. Apply the cheapest
The optimization itself adds a cost → optimize it
The optimal optimizer optimizes the cost of its own work as well as the execution.

Thank you for your attention!
