CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.



15.2 CSCI 5708, Spring University of Minnesota
Outline
- Basic Steps in Query Processing
- External Sorting
- Strategies for relational operations
  - Select Operation
  - Join Operation
  - Other Operations
- Evaluation of Expressions

Basic Steps in Query Processing
[Figure 15.1, p. 494: basic steps in query processing; figure label: "Statistics about Data"]

Basic Steps in Query Processing (Cont.)
- Scanning, Parsing, and Validating
  - Translate the query into its internal form. This is then translated into relational algebra.
  - The parser checks syntax and verifies relations.
- Evaluation
  - The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.

Basic Steps in Query Processing: Optimization
- A relational algebra expression may have many equivalent expressions.
  - E.g., $\sigma_{\text{SALARY} < 50\text{K}}(\pi_{\text{SALARY}}(\text{EMPLOYEE}))$ is equivalent to $\pi_{\text{SALARY}}(\sigma_{\text{SALARY} < 50\text{K}}(\text{EMPLOYEE}))$.
- Each relational algebra operation can be evaluated using one of several different algorithms.
  - Correspondingly, a relational-algebra expression can be evaluated in many ways.
- An annotated expression specifying a detailed evaluation strategy is called an evaluation plan.
  - E.g., can use an index on SALARY to find employees with SALARY < 50K,
  - or can perform a complete relation scan and discard employees with SALARY $\geq$ 50K.
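The equivalence of the two SALARY expressions can be checked on a toy relation. The following is a minimal sketch, not part of the original slides: the `EMPLOYEE` rows, the helper names `select` and `project`, and the use of set semantics for projection are all assumptions made for illustration.

```python
# Toy EMPLOYEE relation; names and salaries are invented for illustration.
EMPLOYEE = [
    {"NAME": "Ann", "SALARY": 42_000},
    {"NAME": "Bob", "SALARY": 61_000},
    {"NAME": "Cy", "SALARY": 42_000},
    {"NAME": "Dee", "SALARY": 38_000},
]


def select(rows, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """Relational projection with set semantics (duplicates removed)."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def low_salary(row):
    return row["SALARY"] < 50_000


# Plan A: project first, then select.
plan_a = select(project(EMPLOYEE, ["SALARY"]), low_salary)
# Plan B: select first, then project.
plan_b = project(select(EMPLOYEE, low_salary), ["SALARY"])

print(plan_a == plan_b)
```

Both plans produce the same tuples, but plan B projects only the rows that survive the selection, which is the kind of difference an optimizer weighs when it compares evaluation plans.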

Basic Steps: Optimization (Cont.)
- Query optimization: among all equivalent evaluation plans, choose the one with the lowest cost.
  - Cost is estimated using statistical information from the database catalog, e.g., number of tuples in each relation, size of tuples, etc.
- In query processing we study
  - how to measure query costs
  - algorithms for evaluating relational algebra operations
  - how to combine algorithms for individual operations in order to evaluate a complete expression
- In query optimization
  - we study how to optimize queries, that is, how to find an evaluation plan with the lowest estimated cost
  - techniques: heuristic rules and systematic cost estimation

Measures of Query Cost
- Cost is generally measured as the total elapsed time for answering a query.
  - Many factors contribute to time cost: disk accesses, CPU, or even network communication.
- Typically disk access is the predominant cost, and it is also relatively easy to estimate. It is measured by taking into account
  - number of seeks * average-seek-cost
  - number of blocks read * average-block-read-cost
  - number of blocks written * average-block-write-cost
    - The cost to write a block is greater than the cost to read a block, because data is read back after being written to ensure that the write was successful.
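The three disk-cost terms listed on this slide add up to a single estimate. The sketch below shows that sum; the function name `disk_cost` and the timings (in microseconds) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def disk_cost(seeks, blocks_read, blocks_written,
              avg_seek_cost, avg_read_cost, avg_write_cost):
    """Sum the seek, block-read, and block-write terms of the disk cost."""
    return (seeks * avg_seek_cost
            + blocks_read * avg_read_cost
            + blocks_written * avg_write_cost)


# Hypothetical device timings in microseconds (assumed, not from the slides).
# The write cost is set higher than the read cost, as the slide notes.
example_cost = disk_cost(seeks=10, blocks_read=500, blocks_written=100,
                         avg_seek_cost=4000, avg_read_cost=100,
                         avg_write_cost=200)
print(example_cost)  # 10*4000 + 500*100 + 100*200 = 110000
```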

Measures of Query Cost (Cont.)
- For simplicity we just use the number of block transfers from disk as the cost measure.
  - We ignore the difference in cost between sequential and random I/O.
  - We also ignore CPU and communication costs.
- I/O cost depends on
  - Search criteria: point or range query, on an ordering field or on other fields
  - File structures: heap, sorted, hashed
  - Index types: primary, clustering, secondary, B+ tree, multilevel, …
  - Other factors: e.g., buffering, disk placement, materialization, overflow / free space management, …
- We do not include the cost of writing output to disk in our cost formulae.

Catalog Information for Cost Estimation
- $n_r$: number of tuples in a relation r.
- $b_r$: number of blocks containing tuples of r.
- $s_r$: size of a tuple of r.
- bfr: blocking factor of r, i.e., the number of tuples of r that fit into one block.
- d(A, r): number of distinct values that appear in r for attribute A.
- sl(A, r): selectivity of attribute A of relation r (fraction of records in r satisfying an equality condition on A).
- sc(A, r): selection cardinality of attribute A of relation r; average number of records that satisfy an equality condition on A.
- x: number of index levels.
- e = Pr[a record is in the overflow area] * E(overflow chain length).
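Several of these catalog quantities can be derived from one another. The sketch below rests on two assumptions that the slide does not state: the tuples of r are stored together in one file, so that $b_r = \lceil n_r / \text{bfr} \rceil$, and values of A are uniformly distributed, so that sl(A, r) = 1 / d(A, r). The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RelationStats:
    n_r: int  # number of tuples in r
    bfr: int  # blocking factor: tuples of r per block

    @property
    def b_r(self):
        # Assumes the tuples of r are stored together in a single file.
        return math.ceil(self.n_r / self.bfr)


def selectivity(d):
    """sl(A, r) under the uniform-distribution assumption: 1 / d(A, r)."""
    return 1 / d


def selection_cardinality(n_r, d):
    """sc(A, r) = sl(A, r) * n_r, which is n_r / d(A, r) under uniformity."""
    return n_r / d


stats = RelationStats(n_r=10_000, bfr=20)
print(stats.b_r)                          # 500 blocks
print(selection_cardinality(10_000, 50))  # 200.0 records per distinct value
```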

Selection Operation
- File scan: search algorithms that locate and retrieve records that fulfill a selection condition.
- Algorithm S1 (linear search). Scan each file block and test all records to see whether they satisfy the selection condition.
  - Cost estimate (number of disk blocks scanned) = $b_r$
    - $b_r$ denotes the number of blocks containing records from relation r.
  - If the selection is on a key attribute, average cost = $b_r / 2$ (stop on finding the record).
  - Linear search can be applied regardless of
    - the selection condition,
    - the ordering of records in the file, or
    - the availability of indices.

Selection Operation (Cont.)
- S2 (binary search). Applicable if the selection is an equality comparison on the attribute on which the file is ordered.
  - Assume that the blocks of a relation are stored contiguously.
  - Cost estimate (number of disk blocks to be scanned): $\lceil \log_2(b_r) \rceil + \lceil sc(A,r) / \text{bfr} \rceil - 1$
    - $\lceil \log_2(b_r) \rceil$: cost of locating the first tuple by a binary search on the blocks
    - $\lceil sc(A,r) / \text{bfr} \rceil - 1$: number of other blocks containing records that satisfy the selection condition
  - If the selection is on a key attribute, cost = $\lceil \log_2(b_r) \rceil$.

Example for File Scan
- $n_r$ = 10,000 (number of tuples in relation r).
- $b_r$ = 500 (number of blocks containing tuples of r).
- sc(A, r): selection cardinality of attribute A of relation r; average number of records that satisfy an equality condition on A.
- Query: select FNAME = ‘Alex’ from EMPLOYEE
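The S1 and S2 cost formulas can be applied to the figures in this example. In the sketch below, the blocking factor of 20 follows from 10,000 tuples in 500 blocks, while the selection cardinality of 50 is a made-up value, since the slide does not give one. Binary search also presumes that the file is ordered on FNAME, which the slide does not state.

```python
import math


def linear_search_cost(b_r, on_key=False):
    """S1: scan all blocks, or half of them on average for a key attribute."""
    return b_r / 2 if on_key else b_r


def binary_search_cost(b_r, sc, bfr, on_key=False):
    """S2: locate the first match, then read the other matching blocks."""
    locate = math.ceil(math.log2(b_r))
    if on_key:
        return locate
    return locate + math.ceil(sc / bfr) - 1


b_r = 500
bfr = 10_000 // 500   # 20 tuples per block
sc = 50               # hypothetical selection cardinality for FNAME

print(linear_search_cost(b_r))           # 500
print(binary_search_cost(b_r, sc, bfr))  # 9 + 3 - 1 = 11
```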

Selections (Cont.)
- Index scan: search algorithms that use an index.
  - The selection condition must be on the search key of the index.
- S3a (primary index, equality). Retrieve a single record that satisfies the corresponding equality condition.
  - Cost = x + 1
- S3b (hash key). Retrieve a single record.
  - Cost = e + 1
- S4 (primary index, comparison). (Relation is sorted on A.)
  - For $\sigma_{A \geq v}(r)$, use the index to find the first tuple $\geq v$ and scan the relation sequentially from there.
  - For $\sigma_{A \leq v}(r)$, just scan the relation sequentially till the first tuple > v; do not use the index.
  - Rough cost = $x + b_r / 2$

Selection (Cont.)
- S5 (equality on clustering index to retrieve multiple records).
  - Cost = $x + \lceil sc(A,r) / \text{bfr} \rceil$
- S6a (equality on search key of secondary index).
  - Retrieve a single record if the search key is a candidate key.
    - Cost = x + 1
  - Retrieve multiple records if the search key is not a candidate key.
    - Cost = x + number of records retrieved
    - Can be very expensive: each record may be on a different block, requiring one block access for each retrieved record.
    - Worst-case cost = x + sc(A, r)
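The equality-lookup costs for S3a, S3b, S5, and S6a are simple enough to write down as functions. The sketch below only restates the slide formulas; the function names and the sample values (x = 3, sc = 50, bfr = 20) are assumptions for illustration.

```python
import math


def primary_index_equality_cost(x):
    """S3a: traverse x index levels, then read one data block."""
    return x + 1


def hash_key_cost(e):
    """S3b: one bucket access plus the expected overflow accesses e."""
    return e + 1


def clustering_index_equality_cost(x, sc, bfr):
    """S5: matching records are stored contiguously, so read ceil(sc / bfr) blocks."""
    return x + math.ceil(sc / bfr)


def secondary_index_equality_cost(x, sc, candidate_key):
    """S6a: one block for a candidate key, else up to one block per record."""
    return x + 1 if candidate_key else x + sc


# Hypothetical values: 3 index levels, 50 matching records, 20 records per block.
print(clustering_index_equality_cost(3, 50, 20))      # 3 + 3 = 6
print(secondary_index_equality_cost(3, 50, False))    # 3 + 50 = 53
```

With these sample numbers the same 50 matching records cost 6 block accesses through a clustering index and up to 53 through a secondary index, which illustrates the "can be very expensive" remark.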

Selections (Cont.)
- S6b (secondary index, comparison).
  - For $\sigma_{A \geq v}(r)$, use the index to find the first index entry $\geq v$ and scan the index sequentially from there to find pointers to records.
  - For $\sigma_{A \leq v}(r)$, just scan the leaf pages of the index, finding pointers to records, till the first entry > v.
  - In either case, retrieve the records that are pointed to.
    - This requires an I/O for each record.
    - A linear file scan may be cheaper if many records are to be fetched.
- S7: Conjunctive selection
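The remark under S6b that a linear file scan may be cheaper can be made concrete with a rough comparison. The sketch below is a simplification: it charges the secondary index x accesses plus one I/O per matching record and ignores the cost of scanning index leaf pages, which the slide does not quantify. The function names and sample numbers are hypothetical.

```python
def secondary_index_range_cost(x, matching_records):
    """Simplified S6b cost: index descent plus one I/O per fetched record."""
    return x + matching_records


def linear_scan_cost(b_r):
    """S1 cost: read every block of the relation."""
    return b_r


def cheaper_plan(x, matching_records, b_r):
    """Return the name of the plan with the lower estimated cost."""
    index_cost = secondary_index_range_cost(x, matching_records)
    if index_cost < linear_scan_cost(b_r):
        return "secondary index"
    return "linear scan"


# Hypothetical relation: 500 blocks, 3 index levels.
print(cheaper_plan(x=3, matching_records=100, b_r=500))
print(cheaper_plan(x=3, matching_records=2000, b_r=500))
```

Under these assumptions the index wins for 100 matching records (103 accesses against 500), but loses for 2,000 matching records (2,003 accesses against 500).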

Summary of Selections
- Point query: equality
  - Algorithms: linear scan, binary search, hash, indexed search
  - Algebraic cost formulas:
    - Search on a unique attribute: $b_r / 2$, $\lceil \log_2(b_r) \rceil$, e + 1, x + 1 (respectively)
    - Not unique: add $\lceil sc(A,r) / \text{bfr} \rceil$ for sorted/hashed files; $b_r$ for heap files
- Range query: retrieve records in a certain range
  - (1) Linear: $b_r$ or $b_r / 2 + sl \cdot b_r$
  - (2) Binary search, then scan: $\lceil \log_2(b_r) \rceil + sl \cdot b_r$
  - (3) Find the 1st record in the range, then scan the data file (primary/clustering index): $x + sl \cdot b_r$
  - (4) Find the 1st record in the range, then scan the index leaves (secondary index/B+ tree): x + sl * [bi(leaf) +

Sorting
- We may build an index on the relation, and then use the index to read the relation in sorted order. This may lead to one disk block access for each tuple.
- For relations that fit in memory, techniques like quicksort can be used.
- For relations that don’t fit in memory, external sort-merge is a good choice.

External Sort-Merge
Let M denote the memory size (in pages).
1. Create sorted runs. Let i be 0 initially. Repeatedly do the following till the end of the relation:
   (a) Read M blocks of the relation into memory.
   (b) Sort the in-memory blocks.
   (c) Write the sorted data to run $R_i$; increment i.
   Let the final value of i be N.
2. Merge the runs (N-way merge). We assume (for now) that N < M.
   1. Use N blocks of memory to buffer input runs, and 1 block to buffer output. Read the first block of each run into its buffer page.
   2. repeat
      1. Select the first record (in sort order) among all buffer pages.
      2. Write the record to the output buffer. If the output buffer is full, write it to disk.
      3. Delete the record from its input buffer page. If the buffer page becomes empty, then read the next block (if any) of the run into the buffer.
   3. until all input buffer pages are empty
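The two phases of external sort-merge can be simulated in memory. The following is a minimal sketch, not an actual disk-based implementation: blocks are modeled as Python lists, "disk" is a list of blocks, and the names `to_blocks`, `create_runs`, and `merge_runs` are hypothetical. It covers only the single-merge case N < M.

```python
def to_blocks(records, block_size):
    """Split a list of records into blocks of at most block_size records."""
    return [records[i:i + block_size]
            for i in range(0, len(records), block_size)]


def create_runs(blocks, M, block_size):
    """Phase 1: read M blocks at a time, sort them, write each as a run."""
    runs = []
    for i in range(0, len(blocks), M):
        in_memory = [rec for block in blocks[i:i + M] for rec in block]  # (a)
        in_memory.sort()                                                 # (b)
        runs.append(to_blocks(in_memory, block_size))                    # (c)
    return runs


def merge_runs(runs, M, block_size):
    """Phase 2: N-way merge with one buffer block per run, assuming N < M."""
    assert len(runs) < M, "this sketch handles only a single merge pass"
    buffers = [list(run[0]) for run in runs]  # first block of each run
    next_block = [1] * len(runs)              # next unread block of each run
    output, out_buf = [], []
    while any(buffers):
        # Select the run whose buffer holds the smallest first record.
        j = min((k for k, buf in enumerate(buffers) if buf),
                key=lambda k: buffers[k][0])
        out_buf.append(buffers[j].pop(0))
        if len(out_buf) == block_size:        # output buffer full: "write" it
            output.append(out_buf)
            out_buf = []
        if not buffers[j] and next_block[j] < len(runs[j]):
            buffers[j] = list(runs[j][next_block[j]])  # refill from the run
            next_block[j] += 1
    if out_buf:
        output.append(out_buf)
    return output


records = [24, 19, 31, 33, 14, 16, 21, 3, 2, 7, 16, 4]
blocks = to_blocks(records, block_size=2)
runs = create_runs(blocks, M=3, block_size=2)
merged = merge_runs(runs, M=3, block_size=2)
print([rec for block in merged for rec in block])
```

With 6 input blocks and M = 3, phase 1 produces two runs of three blocks each, so a single merge pass suffices.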

External Sort-Merge (Cont.)
- If $N \geq M$, several merge passes are required.
  - In each pass, contiguous groups of M - 1 runs are merged.
  - A pass reduces the number of runs by a factor of M - 1, and creates runs longer by the same factor.
    - E.g., if M = 11 and there are 90 runs, one pass reduces the number of runs to 9, each 10 times the size of the initial runs.
  - Repeated passes are performed till all runs have been merged into one.
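The way repeated passes shrink the number of runs can be traced with a few lines of code. This sketch reproduces the M = 11, 90-run example from the slide; the function name `runs_after_each_pass` is hypothetical.

```python
import math


def runs_after_each_pass(num_runs, M):
    """Return the run counts before merging and after each merge pass."""
    if M < 3:
        raise ValueError("need at least 2 input buffers and 1 output buffer")
    history = [num_runs]
    while history[-1] > 1:
        # Each pass merges contiguous groups of M - 1 runs into one run.
        history.append(math.ceil(history[-1] / (M - 1)))
    return history


print(runs_after_each_pass(90, 11))  # [90, 9, 1]
```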

[Figure: Example: External Sorting Using Sort-Merge]

External Merge Sort (Cont.)
- Cost analysis:
  - Total number of merge passes required: $\lceil \log_{M-1}(b_r / M) \rceil$.
  - Disk accesses for initial run creation, as well as in each merge pass, number $2 b_r$.
    - Sort phase: $2 b_r$ (each block is accessed twice: read and write).
- Thus the total number of disk accesses for external sorting is $b_r (2 \lceil \log_{M-1}(b_r / M) \rceil + 2)$.
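The total-cost formula can be evaluated for concrete sizes. The sketch below counts merge passes by repeated division instead of a floating-point logarithm, which gives the same value as $\lceil \log_{M-1}(b_r / M) \rceil$ whenever $b_r \geq M$ and avoids rounding problems. The function names and the sample sizes (1,000 blocks, 11 memory pages) are assumptions.

```python
import math


def merge_passes(b_r, M):
    """Number of merge passes needed after the initial run creation."""
    runs = math.ceil(b_r / M)   # initial runs hold M blocks each
    passes = 0
    while runs > 1:
        runs = math.ceil(runs / (M - 1))
        passes += 1
    return passes


def external_sort_cost(b_r, M):
    """Total disk accesses: 2 * b_r for run creation plus 2 * b_r per pass."""
    return b_r * (2 * merge_passes(b_r, M) + 2)


# Hypothetical sizes: a 1000-block relation and 11 pages of memory.
print(merge_passes(1000, 11))        # 91 runs -> 10 -> 1, so 2 passes
print(external_sort_cost(1000, 11))  # 1000 * (2*2 + 2) = 6000
```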