Presentation is loading. Please wait.

Presentation is loading. Please wait.

File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li.

Similar presentations


Presentation on theme: "File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li."— Presentation transcript:

1 File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li

2 STEMPNU Basic Concepts of Query Query  Retrieve records satisfying predicates Types of Query  Operators  Aggregate Query  Sorting

3 STEMPNU Relational Operators : Select Selection (  condition )  Retrieve records satisfying predicates  Example Find Student where Student.Score > 3.5  score>3.5 (Student)  Index or Hash Select Predicate

4 STEMPNU Relational Operators : Project Project (  attributes )  Extract interesting attributes  Example Find Student.name where score > 3.5  name (  acore>3.5 (Student))  Full Scan Interesting attributes to get Extract

5 STEMPNU Cartisan Product Cartisan Product (  )  Two Tables : R 1  R 2  Produce all cross products Join ( ) r 11 r 12 … r1mr1m R1R1 r 21 r 22 … r2nr2n R2R2  = r 11 … r 21 r 22 … r2nr2n r 12 … r 21 r 22 … r2nr2n r1mr1m r 21 r 22 … r2nr2n … r1mr1m r1mr1m … …

6 STEMPNU Join Join ( )  Select combined records of cartisan product with same value of a common attribute (Natural Join)  Example Student (StudentName, AdvisorProfessorID, Department, Score) Professor(ProfessorName, ProfessorID, Department) Student AdivsorProfessorID=ProfessorID Professor =  AdivsorProfessorID=ProfessorID (Student  Professor)  Double Scan : Expensive Operation

7 STEMPNU Relational Algebra  Operand : Table (Relation)  Operator : Relational Operator ( , ,, etc)  Example Find Student Name where Student Score > 3.5 and Advisor Professor belongs to CSE Department  student.name (  acore>3.5 (Student)  Department=‘CSE’ (Professor) )  Relational Algebra Specifies the sequence of operations  

8 STEMPNU Query Processing Mechanism Query Processing Steps 1. Parsing and translation 2. Optimization 3. Evaluation

9 STEMPNU Parsing and Translation Parsing Query Statement (e.g. in SQL) Translation into relational algebra Equivalent Expression  For a same query statement several relation algebraic expressions are possible  Example  balance  2500 (  name (account ))   name (  balance  2500 (account ))  Different execution schedules Query Execution Plan (QEP)  Determined by relational algebra  Several QEPs may be produced by Parsing and Translation

10 STEMPNU Query Optimization Choose ONE QEP among QEPs based on  Execution Cost of each QEP, where cost means execution time How to find cost of each QEP ?  Real Execution Exact but Not Feasible  Cost Estimation Types of Operations Number of Records Selectivity Distribution of data

11 STEMPNU Cost Model : Basic Concepts Cost Model : Number of Block Accesses Cost C = C index + C data where C index : Cost for Index Access C data : Cost for Data Block Retrieval  C index vs. C data ? C index : depends on index C data  depends on selectivity  Random Access or Sequential Access Selectivity  Number (or Ratio) of Objects Selected by Query

12 STEMPNU Cost Model : Type of Operations Cost model for each type of operations  Select  Project  Join  Aggregate Query Query Processing Method for each type of operations Index/Hash or Not

13 STEMPNU Cost Model : Number of Records Number of Records  N record  N blocks Number of Scans  Single Scan O(N) : Linear Scan O(logN ) : Index  Multiple Scans O(NM ) : Multiple Linear Scans O(N logM ) : Multiple Scans with Index

14 STEMPNU Selectivity  Affects on C data  Random Access Scattered on several blocks N block  N selected  Sequential Access Contiguously stored on blocks N block = N selected / Bf

15 STEMPNU Selectivity Estimation  Depends on Data Distribution  Example Q1 : Find students where 60 < weight < 70 Q2 : Find students where 80 < weight < 90 How to find the distribution  Parametric Method e.g. Gaussian Distribution No a priori knowledge  Non-Parametric Method e.g. Histogram Smoothing is necessary  Wavelet, Discrete Cosine 30 40 5060708090100 Frequency

16 STEMPNU Select : Linear Search Algorithm : linear search  Scan each file block and test all records to see whether they satisfy the selection condition.  Cost estimate (number of disk blocks scanned) = b r b r denotes number of blocks containing records from relation r  If selection is on a key attribute (sorted), cost = (b r /2) stop on finding record  Linear search can be applied regardless of selection condition or ordering of records in the file, or availability of indices

17 STEMPNU Select : Range Search Algorithm : primary index, comparison  Relation is sorted on A  For  A  V (r) Step 1: use index to find first tuple  v and Step 2: scan relation sequentially  For  A  V (r) just scan relation sequentially till first tuple > v; do not use index Algorithm : secondary index, comparison  For  A  V (r) Step 1: use index to find first index entry  v and Step 2: scan index sequentially to find pointers to records.  For  A  V (r) scan leaf nodes of index finding pointers to records, till first entry > v

18 STEMPNU Select : Range Search Comparison between  Searching with Index and  Linear Search Secondary Index  retrieval of records that are pointed to  requires an I/O for each record Linear file scan may be cheaper  if records are scattered on many blocks  clustering is important for this reason

19 STEMPNU Select : Complex Query Conjunction :   1   2 ...  n (r)  Algorithm : selection using one index Step 1: Select a combination of  i (   i (r) ) Step 2: Test other conditions on tuple after fetching it into memory buffer.  Algorithm : selection using multiple-key index Use appropriate multiple-attribute index if available.  Algorithm : selection by intersection of identifiers Step 1: Requires indices with record pointers. Step 2: Intersection of all the obtained sets of record pointers. Step 3: Then fetch records from file Disjunction :   1   2 ...  n (r)  Algorithm : Disjunctive selection by union of identifiers

20 STEMPNU Join Operation Several different algorithms to implement joins  Nested-loop join  Block nested-loop join  Indexed nested-loop join  Merge-join  Hash-join Choice based on cost estimate Examples use the following information  Number of records of customer: 10,000 depositor: 5000  Number of blocks of customer: 400 depositor: 100

21 STEMPNU Nested-Loop Join Algorithm NLJ the theta join r  s For each tuple t r in r do begin For each tuple t s in s do begin test pair (t r,t s ) to see if they satisfy the join condition  if they do, add t r t s to the result. End End r : outer relation, s : inner relation. No indices, any kind of join condition. Expensive

22 STEMPNU Nested-Loop Join : Performance Worst case  the estimated cost is n r  b s + b r disk accesses, if not enough memory only to hold one block of each relation, Example  5000  400 + 100 = 2,000,100 disk accesses with depositor as outer relation, and  1000  100 + 400 = 1,000,400 disk accesses with customer as the outer relation. If the smaller relation fits entirely in memory,  use that as the inner relation.  Reduces cost to b r + b s disk accesses.  If smaller relation (depositor) fits entirely in memory, cost estimate will be 500 disk accesses.

23 STEMPNU Block Nested-Loop Join Algoritm BNLJ For each block B r of r do Get Block B r For each block B s of s do Get Block B s For each tuple t r in B r do For each tuple t s in B s do Check if (t r, t s ) satisfy the join condition if they do, add t r t s to the result. End End End End No disk access required Disk access happens here

24 STEMPNU Block Nested-Loop Join : Performance Worst case  Estimate: b r  b s + b r block accesses.  Each block in the inner relation s is read once for each block in the outer relation (instead of once for each tuple in the outer relation) Improvements : If M blocks can be buffered  use (M-2) disk blocks as blocking unit for outer relations,  use remaining two blocks to buffer inner relation and output  Then the cost becomes  b r / (M-2)   b s + b r

25 STEMPNU Indexed Nested-Loop Join Index lookups can replace file scans if  join is an equi-join or natural join and  an index is available on the inner relation’s join attribute Can construct an index just to compute a join. Algorithm INLJ For each block B r of r do Get Block B r For each tuple t r in B r do Search Index (IDX r, t r.key) if found, add t r t s to the result. End End

26 STEMPNU Indexed Nested-Loop Join : Performance Worst case  buffer has space for only one page of r,  Cost of the join: b r + n r  c Where c is the cost of traversing index and fetching matching tuple Number of matching tuples may be greater than one. If indices are available on join attributes of both r and s,  use the relation with fewer tuples as the outer relation

27 STEMPNU Example of Nested-Loop Join Costs Assume depositor customer,  with depositor as the outer relation.  customer have a primary B + -tree index on the join attribute customer-name, which contains 20 entries in each index node.  customer has 10,000 tuples, the height of the tree is 4, and one more access is needed to find the actual data  Depositor has 5000 tuples Cost of block nested loops join  400*100 + 100 = 40,100 disk accesses assuming worst case memory Cost of indexed nested loops join  100 + 5000 * 5 = 25,100 disk accesses.

28 STEMPNU Hash-Join Applicable for equi-joins and natural joins. A hash function h is used to partition tuples of both relations  h : A→ { 0, 1,..., n }  r 0, r 1,..., r n : partitions of r tuples  s 0, s 1..., s n : partitions of s tuples  r tuples in r i need only to be compared with s tuples in s i.


Download ppt "File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li."

Similar presentations


Ads by Google