Published by Alexis Wood. Modified over 9 years ago.
1
Database Management 9. course
2
Execution of queries
3
Query evaluation
Query → Parse, compile → Relational algebra → Optimize (uses Statistics) → Execution plan → Evaluate (reads Data) → Query output
4
Query – SQL
Parse – Is the SQL query correct?
Relational algebra – A form the computer can process
Optimize – Based on what?
5
Execution plan – If several plans give the same result, which is the best?
Evaluate – Find the matching data records
Query output – Return the answer to the user
6
Optimization example
Data of a bank
select balance from account where balance < 2500
Two equivalent relational algebra representations:
σ_{balance<2500}(π_{balance}(account))
π_{balance}(σ_{balance<2500}(account))
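The query on this slide can be answered by selecting before projecting or by projecting before selecting. A minimal sketch of why both orders are interchangeable, assuming a tiny in-memory `account` table; the rows, the `account_number` column, and the helper names `select`, `project`, and `is_below_limit` are made up for illustration.

```python
# Hypothetical sample data; only the `balance` column comes from the slide's query.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-102", "balance": 2500},
    {"account_number": "A-215", "balance": 700},
    {"account_number": "A-305", "balance": 3500},
]


def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attributes):
    """Projection: keep only the listed attributes (duplicates are kept, as in SQL)."""
    return [{a: row[a] for a in attributes} for row in rows]


def is_below_limit(row):
    return row["balance"] < 2500


# Plan 1: select first, then project.
plan_1 = project(select(account, is_below_limit), ["balance"])
# Plan 2: project first, then select.
plan_2 = select(project(account, ["balance"]), is_below_limit)

print(plan_1 == plan_2)
```

Both plans return the same rows; which one is cheaper depends on the available algorithms and on the statistics of the relation.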
7
The cost of an operation depends on the algorithms we can use: e.g. an index speeds up the selection
Primitive: elementary operation (projection, selection, …)
Pipeline: building blocks for evaluations and statistics
Input of a primitive = output of the previous primitive
9
Catalogue cost approximation
To choose the proper strategy, an approximation of the cost is needed
The cost can be approximated along several dimensions
– Space
– Time
Statistics are stored in the catalogue
10
Content of the catalogue
Number of records in relation r: n_r
Number of blocks used for relation r: b_r
Size of one record in relation r: s_r
Number of records in one block: f_r
Number of different values of attribute A in relation r: V(A,r) = |π_A(r)|
Average number of records that fulfil an equality selection on attribute A: SC(A,r)
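A minimal sketch of how these catalogue values relate to each other. It rests on two assumptions that the slide does not state: the records of r are stored together in one file, so b_r = ⌈n_r / f_r⌉, and attribute values are uniformly distributed, so SC(A,r) = n_r / V(A,r). The class name `RelationStats` and the sample numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RelationStats:
    n_r: int        # number of records in r
    f_r: int        # number of records that fit in one block
    distinct: dict  # V(A, r) for each attribute A

    @property
    def b_r(self):
        # Assumption: records of r are packed together, so b_r = ceil(n_r / f_r).
        return math.ceil(self.n_r / self.f_r)

    def sc(self, attribute):
        # Assumption: uniform distribution, so SC(A, r) = n_r / V(A, r).
        return self.n_r / self.distinct[attribute]


# Hypothetical numbers for an `account` relation.
account_stats = RelationStats(n_r=10_000, f_r=20, distinct={"branch_name": 50})
print(account_stats.b_r, account_stats.sc("branch_name"))
```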
11
Catalogue information about indexes
Hash tables are considered special indexes
Average number of pointers in one node (average number of children): f_i
Height of tree i: HT_i = ⌈log_{f_i} V(A,r)⌉, or in the case of a hash, HT_i = 1
Number of lowest-level index blocks (number of leaf nodes): LB_i
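The height formula can be evaluated without floating-point logarithms, which avoids rounding surprises at exact powers of the fan-out. A small sketch; the function name `tree_height` is hypothetical, and the loop simply finds the smallest h with f_i^h ≥ V(A,r).

```python
def tree_height(fanout, distinct_values):
    """Return ceil(log_fanout(distinct_values)) using integer arithmetic only."""
    if fanout < 2 or distinct_values < 1:
        raise ValueError("fanout must be >= 2 and distinct_values >= 1")
    height = 0
    capacity = 1  # number of values reachable with `height` levels
    while capacity < distinct_values:
        capacity *= fanout
        height += 1
    return height


print(tree_height(20, 50))
```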
12
Statistics should be updated after every modification, but that is expensive
They are updated when the DB has idle time
Not always consistent, but they give a good approximation
13
Cost of operations
Just approximations: reading and writing are assumed to take the same time
14
Equality selection
15
Range selection
16
Types of join Distinct
18
Nested with blocks
19
Indexed nested-loop join
If one of the relations is indexed, no full scan of it is needed
Cost: b_r + n_r * c, where c is the cost of one selection on s
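A minimal sketch of the idea, assuming in-memory relations and a dictionary standing in for the index on s. The relation names, columns, and the helpers `build_index`, `indexed_nested_loop_join`, and `join_cost` are hypothetical; `join_cost` only restates the formula b_r + n_r * c.

```python
from collections import defaultdict


def build_index(rows, key):
    """Stand-in for an index on `key`: maps each key value to its rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def indexed_nested_loop_join(r, s_index, key):
    result = []
    for r_row in r:  # one pass over the outer relation r
        # Index lookup replaces a full scan of s for every r_row.
        for s_row in s_index.get(r_row[key], []):
            result.append({**r_row, **s_row})
    return result


def join_cost(b_r, n_r, c):
    """Cost estimate: read r once, plus one index selection on s per record of r."""
    return b_r + n_r * c


depositor = [
    {"customer_name": "Hayes", "account_number": "A-102"},
    {"customer_name": "Jones", "account_number": "A-217"},
    {"customer_name": "Lindsay", "account_number": "A-999"},
]
account = [
    {"account_number": "A-102", "balance": 400},
    {"account_number": "A-217", "balance": 750},
]

account_index = build_index(account, "account_number")
joined = indexed_nested_loop_join(depositor, account_index, "account_number")
print(joined)
print(join_cost(b_r=100, n_r=5000, c=4))
```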
20
Merge join
First sort the relations on the join attributes
Reading each relation once is then enough
Cost: cost of sorting + b_r + b_s
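A sketch of a sort-merge join on a single join attribute, assuming in-memory lists of dictionaries. The function name `merge_join` and the sample rows are made up; duplicate key values on both sides are handled by pairing every matching row of r with the whole matching group of s.

```python
def merge_join(r, s, key):
    # Sorting step: its cost is the "cost of sorting" term in the formula.
    r_sorted = sorted(r, key=lambda row: row[key])
    s_sorted = sorted(s, key=lambda row: row[key])

    result = []
    i = j = 0
    # Merge step: each sorted relation is read once, front to back.
    while i < len(r_sorted) and j < len(s_sorted):
        r_val, s_val = r_sorted[i][key], s_sorted[j][key]
        if r_val < s_val:
            i += 1
        elif r_val > s_val:
            j += 1
        else:
            # Find the end of the group of s rows that share this key value.
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][key] == r_val:
                j_end += 1
            # Pair every r row with this key value with the whole s group.
            while i < len(r_sorted) and r_sorted[i][key] == r_val:
                for s_row in s_sorted[j:j_end]:
                    result.append({**r_sorted[i], **s_row})
                i += 1
            j = j_end
    return result


r = [{"k": 2, "a": "x"}, {"k": 1, "a": "y"}, {"k": 2, "a": "z"}]
s = [{"k": 2, "b": 10}, {"k": 3, "b": 20}, {"k": 2, "b": 30}]
merged = merge_join(r, s, "k")
print(merged)
```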
21
Other operations
Filter repetition (distinct)
– Sort
– Delete adjacent duplicates
Cost: cost of sorting
Projection: cost of sorting + (filter repetition +) b_r
Union: sort relations + merge + filter repetition
Intersection: sort both + select common rows
Difference: sort both + delete the rows that occur in the 2nd relation
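These sort-based operations can be sketched as follows, assuming rows are comparable tuples held in memory. All function names are hypothetical, and `difference(r, s)` is read here as r minus s.

```python
def distinct_sorted(rows):
    """Filter repetition: sort, then drop every row equal to its predecessor."""
    result = []
    for row in sorted(rows):
        if not result or result[-1] != row:
            result.append(row)
    return result


def union(r, s):
    # Sort the combined rows, then filter repetition.
    return distinct_sorted(r + s)


def intersection(r, s):
    r_sorted, s_sorted = distinct_sorted(r), distinct_sorted(s)
    result, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        if r_sorted[i] < s_sorted[j]:
            i += 1
        elif r_sorted[i] > s_sorted[j]:
            j += 1
        else:  # common row
            result.append(r_sorted[i])
            i += 1
            j += 1
    return result


def difference(r, s):
    r_sorted, s_sorted = distinct_sorted(r), distinct_sorted(s)
    result, j = [], 0
    for row in r_sorted:
        # Advance in s until reaching a row that is not smaller than `row`.
        while j < len(s_sorted) and s_sorted[j] < row:
            j += 1
        if j == len(s_sorted) or s_sorted[j] != row:
            result.append(row)
    return result


r = [(3,), (1,), (3,), (2,)]
s = [(2,), (4,), (2,)]
print(distinct_sorted(r), union(r, s), intersection(r, s), difference(r, s))
```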
22
Evaluation - Materialization
Tree of operations
Leaves: relations
Nodes: operations
Cost: storing temporary relations + cost of operations
Parallel processing
23
Pipelining
Temporary storage is reduced
Result records are passed to the next operation and not stored
Saves memory (records are stored, not relations)
Sorting is not possible
Demand-driven pipeline: the system requests data when it is needed
Data-driven pipeline: operations push data into the pipeline without a request until the buffer is full
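Python generators give a compact illustration of a demand-driven pipeline: each operator produces a record only when the operator above it asks for one, so no intermediate relation is stored. This is a sketch under that analogy, not a description of any particular database engine; the operator names and the sample rows are hypothetical.

```python
def scan(rows):
    """Leaf operator: hand out stored records one at a time."""
    for row in rows:
        yield row


def select(source, predicate):
    """Pass on only the records that satisfy the predicate."""
    for row in source:
        if predicate(row):
            yield row


def project(source, attributes):
    """Keep only the listed attributes of each record."""
    for row in source:
        yield {a: row[a] for a in attributes}


account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-102", "balance": 2500},
    {"account_number": "A-215", "balance": 700},
]

# Nothing is evaluated yet: the pipeline only describes the plan.
pipeline = project(
    select(scan(account), lambda row: row["balance"] < 2500),
    ["balance"],
)

first = next(pipeline)  # pulls exactly one record through all three operators
rest = list(pipeline)   # pulls the remaining records on demand
print(first, rest)
```

An operator that needs its whole input before it can emit anything, such as a sort, does not fit this record-at-a-time scheme, which matches the remark that sorting is not possible in a pipeline.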
24
Pipeline evaluation
Records arrive one after another
Merge join cannot be used
Indexed nested-loop join can be used
25
Transformation of relational expressions
Transform to equivalent expressions with smaller evaluation time
Example: give the names of customers who have an account in Brooklyn
Time-consuming: selection after joining 3 tables
Much better: selection before the join
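A sketch of the Brooklyn example, assuming three hypothetical relations `branch`, `account`, and `depositor` with made-up rows and column names. It compares the size of the last join result when the selection is applied after the join with its size when the selection is applied to `branch` first.

```python
def natural_join(r, s):
    """Join on all attributes the rows share (plain nested loop, for clarity)."""
    result = []
    for r_row in r:
        for s_row in s:
            shared = set(r_row) & set(s_row)
            if all(r_row[a] == s_row[a] for a in shared):
                result.append({**r_row, **s_row})
    return result


def in_brooklyn(row):
    return row["branch_city"] == "Brooklyn"


# Hypothetical sample data.
branch = [
    {"branch_name": "Downtown", "branch_city": "Brooklyn"},
    {"branch_name": "Perryridge", "branch_city": "Horseneck"},
    {"branch_name": "Brighton", "branch_city": "Brooklyn"},
]
account = [
    {"account_number": "A-101", "branch_name": "Downtown"},
    {"account_number": "A-102", "branch_name": "Perryridge"},
    {"account_number": "A-201", "branch_name": "Brighton"},
    {"account_number": "A-305", "branch_name": "Perryridge"},
]
depositor = [
    {"customer_name": "Johnson", "account_number": "A-101"},
    {"customer_name": "Hayes", "account_number": "A-102"},
    {"customer_name": "Jones", "account_number": "A-201"},
    {"customer_name": "Turner", "account_number": "A-305"},
]

# Plan 1: join all three relations, then select.
late_join = natural_join(branch, natural_join(account, depositor))
late_names = sorted(row["customer_name"] for row in late_join if in_brooklyn(row))

# Plan 2: select the Brooklyn branches first, then join.
brooklyn = [row for row in branch if in_brooklyn(row)]
early_join = natural_join(brooklyn, natural_join(account, depositor))
early_names = sorted(row["customer_name"] for row in early_join)

print(late_names == early_names, len(late_join), len(early_join))
```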
27
Equivalence rules
Predicates: Θ, Θ_1, Θ_2
Attributes: L_1, L_2, L_3
Relational algebra expressions: E, E_1, E_2
Rules: cascade of selections; commutativity of selections; cascade of projections; connection of join and Cartesian product
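The rule names on this slide correspond to the following standard textbook identities, sketched here in LaTeX with the symbols declared above. The side condition on the attribute lists is an assumption taken from the usual textbook statement of the rule.

```latex
\begin{align*}
% Cascade of selections
\sigma_{\Theta_1 \land \Theta_2}(E) &= \sigma_{\Theta_1}(\sigma_{\Theta_2}(E)) \\
% Commutativity of selections
\sigma_{\Theta_1}(\sigma_{\Theta_2}(E)) &= \sigma_{\Theta_2}(\sigma_{\Theta_1}(E)) \\
% Cascade of projections (assuming L_1 \subseteq L_2 \subseteq L_3)
\pi_{L_1}(\pi_{L_2}(\pi_{L_3}(E))) &= \pi_{L_1}(E) \\
% Connection of join and Cartesian product
\sigma_{\Theta}(E_1 \times E_2) &= E_1 \bowtie_{\Theta} E_2 \\
\sigma_{\Theta_1}(E_1 \bowtie_{\Theta_2} E_2) &= E_1 \bowtie_{\Theta_1 \land \Theta_2} E_2
\end{align*}
```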
28
Commutativity of theta-join
Associativity of natural join
Distributivity of selection over join
– when Θ_0 contains only attributes from E_1
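In their standard textbook form, these three rules can be written as follows. This is a LaTeX sketch; E_3 is an additional expression symbol introduced here for the associativity rule.

```latex
\begin{align*}
% Commutativity of theta-join
E_1 \bowtie_{\Theta} E_2 &= E_2 \bowtie_{\Theta} E_1 \\
% Associativity of natural join
(E_1 \bowtie E_2) \bowtie E_3 &= E_1 \bowtie (E_2 \bowtie E_3) \\
% Distributivity of selection over join,
% valid when \Theta_0 involves only attributes of E_1
\sigma_{\Theta_0}(E_1 \bowtie_{\Theta} E_2) &= \sigma_{\Theta_0}(E_1) \bowtie_{\Theta} E_2
\end{align*}
```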
30
Commutativity of union and intersection
Associativity of union and intersection
Distributivity of selection over union, intersection, and difference
Distributivity of projection over union
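A LaTeX sketch of the usual textbook form of these set-operation rules. The selection rule is shown for difference and holds in the same way for union and intersection; E_3 and the attribute list L_1 are used here only for illustration.

```latex
\begin{align*}
% Commutativity
E_1 \cup E_2 &= E_2 \cup E_1, &
E_1 \cap E_2 &= E_2 \cap E_1 \\
% Associativity
(E_1 \cup E_2) \cup E_3 &= E_1 \cup (E_2 \cup E_3), &
(E_1 \cap E_2) \cap E_3 &= E_1 \cap (E_2 \cap E_3) \\
% Distributivity of selection (likewise for \cup and \cap)
\sigma_{\Theta}(E_1 - E_2) &= \sigma_{\Theta}(E_1) - \sigma_{\Theta}(E_2) \\
% Distributivity of projection over union
\pi_{L_1}(E_1 \cup E_2) &= \pi_{L_1}(E_1) \cup \pi_{L_1}(E_2)
\end{align*}
```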
31
These are only examples!
32
Choosing an evaluation plan
Choose an algorithm for each operation in the expression
Decide the order of the operations
Organize them into processes
Example: use a pipeline, use index 1, use a linear scan, sort to filter repetition
33
Cost-based optimization
List all the equivalent expressions
Assign an execution plan to every expression
Calculate the cost of every plan
Choose the cheapest (based on approximations and statistics)
Disadvantage: if there are too many plans, too many calculations are needed in advance
34
Example
Joining 3 relations: 6 orderings, each parenthesized in two ways, so 12 plans
In general, for n relations: (2(n-1))! / (n-1)!
If n = 10, this is about 17.6 billion plans…
Solution: use some heuristics
First find the optimal join for the first 3 relations, then join the result with the rest: 12 + 12 plans remain
Not good!
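The count of join orders can be checked directly. A short sketch; the function name `join_order_count` is hypothetical, and the formula is (2(n-1))! / (n-1)!.

```python
from math import factorial


def join_order_count(n):
    """Number of different join orders for n relations: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in (3, 7, 10):
    print(n, join_order_count(n))
```

For n = 3 this gives 12, matching 6 orderings times 2 parenthesizations; for n = 10 it gives 17,643,225,600.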
35
Rules for heuristics
Do the selections at the beginning to reduce the number of rows
Do the projections as soon as possible to reduce the size of rows
Split a conjunction of selections into a sequence of selections (use only one selection at a time)
Push the selections down the tree
First use the selection or join that results in the fewest rows; use the associativity of join
36
If a Cartesian product is followed by a selection, merge them into a join operation: fewer records are generated
Break up the projection lists and push them down the tree (sometimes new projections can be generated)
Search for subtrees where a pipeline can be applied
37
1. Applying the rules yields several trees
2. Calculate the cost of each
3. Apply the cheapest
The optimization itself adds a cost, so optimize it as well
The optimal optimizer optimizes the cost of its own work as well as the cost of execution.
38
Thank you for your attention!