Query Processing and Optimization

One View of Basic Query Processing Steps
© Dr. Philip Cannata Data Management

Relational Algebra

Relational Algebra Operation   Name                Equivalent SQL Operation
π A1, A2, …, Ak (r)            Project             select A1, A2, … Ak from r
σ p (r)                        Select              … from r where p
r x s                          Cartesian-Product   … from r, s
σ A=C (r x s)                  Equijoin            … from r, s where A = C
σ A θ C (r x s)                Theta Join          … from r, s where A θ C
r ⋈ s                          Natural Join        … from r natural join s
ρ x (E)                        Rename (Alias)      … from E x …
ρ x (A1, A2, An) (E)           Rename              select x.x A1, x.y A2, x.z An from E x
table ←                        Assignment          create table tmp as (select … )

Selection and Projection Rules
Relational algebra rules for selection and projection:

Break complex selection into simpler ones:
    σ Cond1 ∧ Cond2 (R) ≡ σ Cond1 (σ Cond2 (R))

Break projection into stages:
    π attr (R) ≡ π attr (π attr′ (R)), if attr ⊆ attr′

Commute projection and selection:
    π attr (σ Cond (R)) ≡ σ Cond (π attr (R)), if attr ⊇ all attributes in Cond
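The three rules can be checked on a tiny relation. The following is a minimal sketch, assuming relations are modeled as lists of dicts; the helpers select and project, the relation R, and its rows are all hypothetical and only serve the illustration.

```python
def select(cond, rel):
    """Selection: keep the rows that satisfy cond."""
    return [row for row in rel if cond(row)]

def project(attrs, rel):
    """Projection: keep only attrs and drop duplicate rows (set semantics)."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

# Hypothetical sample relation.
R = [
    {"id": 1, "name": "Sales", "region_id": 1},
    {"id": 2, "name": "Sales", "region_id": 2},
    {"id": 3, "name": "Ops", "region_id": 1},
]

cond1 = lambda row: row["region_id"] == 1
cond2 = lambda row: row["name"] == "Sales"

# Rule 1: break a complex selection into simpler ones.
cascade_lhs = select(lambda row: cond1(row) and cond2(row), R)
cascade_rhs = select(cond1, select(cond2, R))

# Rule 2: break a projection into stages (["name"] is a subset of ["name", "region_id"]).
stages_lhs = project(["name"], R)
stages_rhs = project(["name"], project(["name", "region_id"], R))

# Rule 3: commute projection and selection (cond2 only uses "name", which is kept).
commute_lhs = project(["name"], select(cond2, R))
commute_rhs = select(cond2, project(["name"], R))

print(cascade_lhs == cascade_rhs, stages_lhs == stages_rhs, commute_lhs == commute_rhs)
```

Each pair of expressions produces the same rows, which is what lets an optimizer swap one form for the other.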

Commutativity and Associativity of Join
Relational algebra rules for join:

Join commutativity: R ⋈ S ≡ S ⋈ R
    Used to reduce the cost of nested loop evaluation strategies (the smaller relation should be in the outer loop).

Join associativity: R ⋈ (S ⋈ T) ≡ (R ⋈ S) ⋈ T
    Used to reduce the size of intermediate relations in the computation of a multi-relational join: first compute the join that yields the smaller intermediate result.

An N-way join has T(N) × N! different evaluation plans:
    T(N) is the number of parenthesized expressions
    N! is the number of permutations

The query optimizer cannot look at all plans (it might take longer to find an optimal plan than to compute the query by brute force). Hence it does not necessarily produce an optimal plan.
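To get a feel for how quickly T(N) × N! grows, here is a small sketch. It assumes T(N) counts the ways to fully parenthesize N operands with a binary operator, which is the Catalan number C(N-1); the slide does not give a formula for T(N), so that formula and the function names are assumptions.

```python
from math import comb, factorial

def parenthesizations(n):
    """T(N): ways to fully parenthesize a join of n relations.

    Assumption: this is the Catalan number C(n-1) = comb(2k, k) / (k + 1), k = n - 1.
    """
    k = n - 1
    return comb(2 * k, k) // (k + 1)

def evaluation_plans(n):
    """T(N) * N!: parenthesizations times orderings of the n relations."""
    return parenthesizations(n) * factorial(n)

for n in (2, 3, 4, 10):
    print(n, parenthesizations(n), factorial(n), evaluation_plans(n))
```

Under this assumption a 3-way join already has 12 plans and a 10-way join has more than 17 billion, which is why optimizers rely on heuristics and pruning instead of exhaustive search.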

Pushing Selections and Projections
Relational algebra rules for pushing selections and projections:

σ Cond (R × S) ≡ R ⋈ Cond S
    Cond relates attributes of both R and S.
    Reduces the size of the intermediate relation since rows can be discarded sooner.

σ Cond (R × S) ≡ σ Cond (R) × S
    Cond involves only the attributes of R.
    Reduces the size of the intermediate relation since rows of R are discarded sooner.

π attr (R × S) ≡ π attr (π attr′ (R) × S), if attributes(R) ⊇ attr′ ⊇ attr
    Reduces the size of an operand of the product.
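The effect of pushing a selection below a product can be seen by counting intermediate rows. This is a minimal sketch with relations as lists of dicts; the helpers product and select and the sample relations R and S are hypothetical.

```python
def product(r, s):
    """Cartesian product: every row of r combined with every row of s."""
    return [{**a, **b} for a in r for b in s]

def select(cond, rel):
    """Selection: keep the rows that satisfy cond."""
    return [row for row in rel if cond(row)]

# Hypothetical relations: 6 rows in R (half of them 'US'), 4 rows in S.
R = [{"r_id": i, "country": "US" if i % 2 == 0 else "DE"} for i in range(6)]
S = [{"s_id": j} for j in range(4)]

cond = lambda row: row["country"] == "US"  # involves only attributes of R

# Selection applied late: the full product is materialized first.
late_intermediate = product(R, S)
late = select(cond, late_intermediate)

# Selection pushed down: R is filtered before the product is built.
early_intermediate = select(cond, R)
early = product(early_intermediate, S)

print(len(late_intermediate), len(early_intermediate), late == early)
```

Both strategies return the same rows, but the pushed-down version never builds the 24-row product; it filters R to 3 rows and then produces only the 12 rows that survive.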

Oracle Join Methods

Before showing query processing examples, we need to discuss some Oracle join methods.

Nested loops join
The nested loops join iterates over all rows of the outer table. If the where clause of the SQL statement has conditions that apply only to the outer table, it checks those conditions first. For each outer row that satisfies them, the matching rows in the inner table are looked up, either through an index (if a suitable index exists) or by a full table scan.

Hash join
Hash joins are used when joining large tables. The optimizer uses the smaller of the two tables to build a hash table in memory, then scans the larger table and probes the hash table with the hash value of each of its rows to find the joined rows.

Merge join (also called sort merge join)
Sort merge join is used to join two independent data sources. It generally performs better than a nested loops join when the tables hold a large volume of data, but not as well as a hash join. It performs better than a hash join when the join columns are already sorted or no sorting is required.
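The three join methods can be sketched as in-memory equijoins. This is an illustration of the general techniques under simplifying assumptions (rows as dicts, everything in memory, no indexes), not Oracle's implementation; the function names and sample data are hypothetical.

```python
from collections import defaultdict

def nested_loops_join(outer, inner, outer_key, inner_key):
    """For each outer row, scan the whole inner table (no index assumed)."""
    result = []
    for o in outer:
        for i in inner:
            if o[outer_key] == i[inner_key]:
                result.append({**o, **i})
    return result

def hash_join(small, large, small_key, large_key):
    """Build a hash table on the smaller input, then probe it with the larger one."""
    table = defaultdict(list)
    for s in small:
        table[s[small_key]].append(s)
    result = []
    for row in large:
        for s in table.get(row[large_key], []):
            result.append({**s, **row})
    return result

def sort_merge_join(left, right, left_key, right_key):
    """Sort both inputs on the join key, then merge them in a single pass."""
    left = sorted(left, key=lambda r: r[left_key])
    right = sorted(right, key=lambda r: r[right_key])
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key, then pair it
            # with every left row that has the same key.
            j_end = j
            while j_end < len(right) and right[j_end][right_key] == lk:
                j_end += 1
            while i < len(left) and left[i][left_key] == lk:
                result.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return result

def canonical(rows):
    """Order-independent form of a result, used only to compare outputs."""
    return sorted(tuple(sorted(row.items())) for row in rows)

# Hypothetical sample data.
regions = [{"id": 1, "region": "North America"}, {"id": 2, "region": "Europe"}]
depts = [
    {"dept": "Sales", "region_id": 1},
    {"dept": "Ops", "region_id": 2},
    {"dept": "Finance", "region_id": 1},
]

nl = nested_loops_join(depts, regions, "region_id", "id")
hj = hash_join(regions, depts, "id", "region_id")
smj = sort_merge_join(depts, regions, "region_id", "id")
print(canonical(nl) == canonical(hj) == canonical(smj))
```

All three return the same rows; they differ in cost. The nested loops sketch compares every pair of rows, the hash join reads each input once, and the merge join pays for two sorts unless the inputs arrive already ordered.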

Another View of Basic Query Processing Steps
Example of a Bind Variable:
select * from emp where sal = :salary
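A bind variable lets the same statement text be reused with different values. The sketch below uses Python's sqlite3 module as a stand-in for Oracle, since SQLite also accepts the :salary named-parameter style; the emp table layout, its rows, and the helper emps_with_salary are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table emp (ename text, sal integer)")
conn.executemany(
    "insert into emp values (?, ?)",
    [("SMITH", 800), ("ALLEN", 1600), ("WARD", 1600)],  # hypothetical rows
)

# The statement text never changes; only the bound value does.
query = "select * from emp where sal = :salary"

def emps_with_salary(salary):
    """Run the query with :salary bound to the given value."""
    rows = conn.execute(query, {"salary": salary}).fetchall()
    return sorted(rows)  # sorted only to make the output deterministic

print(emps_with_salary(1600))
print(emps_with_salary(800))
```

Because the SQL text is identical on every call, a database that caches parsed statements can reuse the earlier parse instead of treating each literal value as a new query.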

Query Processing Example – SQL, Relational Algebra, and Query Tree
select * from s_dept d join s_region r on d.region_id = r.id join s_warehouse w on r.id = w.region_id;

π * (σ r.id=w.region_id (σ d.region_id=r.id (ρ d (s_dept) x ρ r (s_region)) x ρ w (s_warehouse)))

Query Tree:
π *
    σ r.id=w.region_id
        x
            σ d.region_id=r.id
                x
                    ρ d (s_dept)
                    ρ r (s_region)
            ρ w (s_warehouse)

Query Processing Example
select * from s_dept d join s_region r on d.region_id = r.id join s_warehouse w on r.id = w.region_id;

Estimation of the number of rows for each operation

Query Processing Example
So let’s gather statistics for the S_Warehouse table. Do the same for the S_Region and S_Dept tables.

select * from s_dept d join s_region r on d.region_id = r.id join s_warehouse w on r.id = w.region_id;

Run these. Then get the Execution Plan for the query.

create index dept_index on s_dept(region_id);
create index warehouse_index on s_warehouse(region_id);

select * from s_dept d join s_region r on d.region_id = r.id join s_warehouse w on r.id = w.region_id;
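The same experiment can be tried without an Oracle instance. The sketch below uses SQLite through Python's sqlite3 module as a stand-in, so the plan text differs from Oracle's output; the column lists of the three tables are assumptions, since the slides show only the join columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schemas: only id and region_id are known from the query.
    create table s_region (id integer primary key, name text);
    create table s_dept (id integer primary key, name text, region_id integer);
    create table s_warehouse (id integer primary key, city text,
                              country text, region_id integer);
    create index dept_index on s_dept(region_id);
    create index warehouse_index on s_warehouse(region_id);
""")

query = """
    select * from s_dept d
    join s_region r on d.region_id = r.id
    join s_warehouse w on r.id = w.region_id
"""

# Each plan row is a 4-tuple; the last column holds the readable description.
plan = [row[3] for row in conn.execute("explain query plan " + query)]
for step in plan:
    print(step)
```

The printed steps show which table is scanned in full and which tables are reached through an index or primary key lookup. In Oracle the plan is typically obtained with EXPLAIN PLAN FOR followed by a query on DBMS_XPLAN.DISPLAY.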

SQL, Relational Algebra, Query Tree, and Optimized Query Tree
select * from s_dept d join s_region r on d.region_id = r.id join s_warehouse w on r.id = w.region_id where w.country = 'US'

π d.name, r.name, w.city (σ w.country='US' (σ r.id=w.region_id (σ d.region_id=r.id (ρ d (s_dept) x ρ r (s_region)) x ρ w (s_warehouse))))

Query Tree:
π d.name, r.name, w.city
    σ w.country='US'
        σ r.id=w.region_id
            x
                σ d.region_id=r.id
                    x
                        ρ d (s_dept)
                        ρ r (s_region)
                ρ w (s_warehouse)

Optimized Query Tree:
π d.name, r.name, w.city
    σ r.id=w.region_id
        x
            σ d.region_id=r.id
                x
                    ρ d (s_dept)
                    ρ r (s_region)
            σ w.country='US'
                ρ w (s_warehouse)

select * from s_dept d join s_region r on d.region_id = r.id join s_warehouse w on r.id = w.region_id where w.country = 'US'

A Really Hairy Query Processing Example – see the next page for the Execution Plan (this only works on Dr. Cannata's machine)

select null link, round(log(2, message_size)) label, ((obytes*8)/ )/nvl(kstat_dur, 10) "kstat 64 Stream"
from (SELECT distinct eid, rid,
             NTH_VALUE(obytes, 2) OVER (PARTITION BY rid ORDER BY snaptime ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) obytes
      from (SELECT eid, rid, snaptime,
                   avg(obytes) OVER (PARTITION BY rid ORDER BY snaptime ROWS BETWEEN 0 PRECEDING AND UNBOUNDED FOLLOWING) obytes
            from obytes - LAG(obytes, 1, 0) OVER (ORDER BY snaptime) AS obytes
            FROM (SELECT to_number(ltrim(eid, ':')) eid,
                         to_number(ltrim(rid, ':')) rid,
                         to_number(ltrim(snaptime, ':')) snaptime,
                         to_number(ltrim(obytes, ':')) obytes
                  FROM TABLE(SEM_MATCH(
                         '(?sub :rid ?rid) (?sub :eid ?eid) (?sub :snaptime ?snaptime) (?sub :obytes ?obytes) (?sub :name :2c903000ac562-1_data_stats) (?sub :class :hca)',
                         SEM_Models('OBSERV_RDF_MODEL'), null, SEM_ALIASES(SEM_ALIAS('',':')), null)))
            where eid = :P10_EXP)
      order by rid, snaptime)) k, ibdatarun d
where d.eid = :P10_EXP and d.rid = k.rid and streams = 64
order by label
