1 Database Query Execution Zack Ives CSE 544 - Principles of DBMS Ullman Chapter 6, Query Execution Spring 1999.

Presentation transcript:

1 Database Query Execution Zack Ives CSE 544 - Principles of DBMS Ullman Chapter 6, Query Execution Spring 1999

2 Query Execution
- Inputs:
  - Query execution plan from optimizer
  - Data from source relations
  - Indices
- Outputs:
  - Query results
  - Data distribution statistics
  - (Also use temp storage)

3 Query Plans
- Data-flow tree (or graph) of relational algebra operators
- Statically pre-compiled vs. dynamic decisions:
  - “Choose nodes”
  - Competition
  - Fragments
[Figure: example query plan with operators Select Client = “Atkins”; Join PressRel.Symbol = Clients.Symbol; Scan PressRel; Scan Clients; Join Symbol = Northwest.CoSymbol; Project CoSymbol; Scan Northwest]

4 Plan Execution
- Execution granularity & parallelism:
  - Pipelining vs. blocking
  - Threads
  - Materialization
- Execution flow:
  - Iterator/top-down
  - Data-driven/bottom-up
[Figure: example query plan (Select, Join, Project, and Scan operators over PressRel, Clients, and Northwest)]

5 Data-Driven Execution
- Schedule leaves (generally parallel or distributed system)
- Leaves feed data “up” tree; may need to buffer
- Good for slow sources or parallel/distributed
- Often less efficient than iterator w.r.t. memory and CPU
[Figure: example query plan (Select, Join, Project, and Scan operators over PressRel, Clients, and Northwest)]

6 The Iterator Model
- Execution begins at root
  - open, getNext, close
  - Propagate calls to children
    - Non-pipelined operation may require multiple getNext calls
- Efficient scheduling & resource usage
- Poor if slow sources (getNext may block)
[Figure: example query plan (Select, Join, Project, and Scan operators over PressRel, Clients, and Northwest)]
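
The open/getNext/close protocol above can be sketched in Python. This is a minimal illustration only: the class names `Scan` and `Select`, the `run` helper, and the dict-based tuples are assumptions made for the example, not part of the slides.

```python
class Scan:
    """Leaf operator over an in-memory list (stand-in for a stored relation)."""
    def __init__(self, rows):
        self.rows = rows
        self.pos = None

    def open(self):
        self.pos = 0

    def get_next(self):
        if self.pos >= len(self.rows):
            return None  # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        self.pos = None


class Select:
    """Pipelined filter: every call is propagated to the child operator."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def get_next(self):
        # Keep pulling from the child until a tuple qualifies or input ends.
        while True:
            row = self.child.get_next()
            if row is None or self.predicate(row):
                return row

    def close(self):
        self.child.close()


def run(root):
    """Drive the plan from the root, as the iterator model prescribes."""
    root.open()
    results = []
    row = root.get_next()
    while row is not None:
        results.append(row)
        row = root.get_next()
    root.close()
    return results
```

Note that a single `get_next` on `Select` may trigger several `get_next` calls on its child, and that any call blocks the whole plan if the source is slow.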

7 Tukwila Modified Iterator Model
- Same operations open, getNext, close
- Some operators multithreaded
  - Use buffering and synchronization
  - Schedule work while blocked
- Multithreaded operators need more memory
[Figure: example query plan (Select, Join, Project, and Scan operators over PressRel, Clients, and Northwest)]

8 The Cost of Execution
- Costs very important to the optimizer
  - It must search for low-cost query execution plan
- Statistics:
  - Cardinalities
  - Histograms (estimate selectivities)
  - Impact of data integration?
- I/O vs. computation costs
- Time-to-first-tuple vs. completion time

9 Reducing Costs with Buffering
- Read a page/block at a time
  - Should look familiar to OS people!
- Use a page replacement strategy:
  - LRU (not as good as you might think)
  - MRU (good for one-time sequential scans)
  - Clock
  - etc.
- Note that we have more knowledge than OS to predict paging behavior
  - e.g. one-time scan should use MRU
- Can also prefetch when appropriate
[Figure: buffer manager handling tuple reads/writes]
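
To illustrate why LRU can be "not as good as you might think", here is a small simulation sketch. The function `count_hits` and the workload (a relation of 4 pages scanned three times through a 3-page pool, as might happen to the inner input of a nested-loops join) are assumptions chosen for the example.

```python
from collections import OrderedDict


def count_hits(accesses, capacity, policy):
    """Simulate a buffer pool and return the number of page hits.

    policy is "LRU" (evict least recently used) or "MRU" (evict most
    recently used). The OrderedDict keeps pages ordered from least to
    most recently used.
    """
    pool = OrderedDict()
    hits = 0
    for page in accesses:
        if page in pool:
            hits += 1
            pool.move_to_end(page)  # mark as most recently used
            continue
        if len(pool) >= capacity:
            # popitem(last=False) drops the LRU page, last=True the MRU page.
            pool.popitem(last=(policy == "MRU"))
        pool[page] = None
    return hits


# Hypothetical workload: 4 pages scanned sequentially, three times over.
scan = [0, 1, 2, 3] * 3
lru_hits = count_hits(scan, capacity=3, policy="LRU")
mru_hits = count_hits(scan, capacity=3, policy="MRU")
```

With this workload LRU always evicts the page that will be needed soonest, so it never gets a hit, while MRU keeps part of the relation resident across scans.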

10 Select Operator
- If unsorted & no index, check against predicate:
    Read tuple
    While tuple doesn’t meet predicate
      Read tuple
    Return tuple
- Sorted data: can stop after particular value encountered
- Indexed data: apply predicate to index, if possible
- If predicate is:
  - conjunction: may use indexes and/or scanning loop above (may need to sort/hash to compute intersection)
  - disjunction: may use union of index results, or scanning loop
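
A minimal sketch of the first two cases above, assuming tuples are Python dicts. The function names `select_scan` and `select_sorted_eq` are hypothetical, and the sorted variant handles only an equality predicate on the sort key.

```python
def select_scan(tuples, predicate):
    """Unsorted, unindexed input: test every tuple against the predicate."""
    for t in tuples:
        if predicate(t):
            yield t


def select_sorted_eq(sorted_tuples, key, value):
    """Input sorted ascending on `key`: stop once a larger key is seen."""
    for t in sorted_tuples:
        if t[key] > value:
            break  # no later tuple can match, so stop reading
        if t[key] == value:
            yield t
```

The early `break` is what saves I/O on sorted data: the rest of the input is never read.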

11 Project Operator
- Simple scanning method often used if no index:
    Read tuple
    While more tuples
      Output specified attributes
      Read tuple
- Duplicate removal may be necessary
  - Partition output into separate files by bucket, do duplicate removal on those
    - May need to use recursion
  - If have many duplicates, sorting may be better
- Can sometimes do index-only scan, if projected attributes are all indexed
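
The partition-then-deduplicate approach can be sketched as follows. In-memory lists stand in for the per-bucket files, `project_distinct` and `num_buckets` are names invented for the example, and the recursive repartitioning of oversized buckets is omitted.

```python
def project_distinct(tuples, attrs, num_buckets=4):
    """Project each tuple onto `attrs`, then remove duplicates per bucket."""
    # Pass 1: project and partition by hash. Identical projected tuples
    # always land in the same bucket, so buckets can be deduplicated
    # independently of one another.
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        projected = tuple(t[a] for a in attrs)
        buckets[hash(projected) % num_buckets].append(projected)

    # Pass 2: duplicate removal within each bucket.
    result = []
    for bucket in buckets:
        seen = set()
        for p in bucket:
            if p not in seen:
                seen.add(p)
                result.append(p)
    return result
```

Output order depends on the hash function, so callers should not rely on it.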

12 The Simplest Join: Nested-Loops
- Requires two nested loops:
    For each tuple in outer relation
      For each tuple in inner, compare
        If match on join attribute, output
- Block nested loops join: read & match page at a time
- What if join attributes are indexed? Index nested-loops join
- Very simple to implement
- Inefficient if size of inner relation > memory (keep swapping pages); requires sequential search for match
[Figure: join operator with outer and inner inputs]
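
A sketch of the tuple-at-a-time and block variants, assuming both relations are Python lists of dicts. The function names and the `block_size` parameter are illustrative, with a "block" modeled as a slice of the outer list rather than a disk page.

```python
def nested_loops_join(outer, inner, outer_key, inner_key):
    """Tuple-at-a-time: one full pass over inner per outer tuple."""
    for o in outer:
        for i in inner:
            if o[outer_key] == i[inner_key]:
                yield (o, i)


def block_nested_loops_join(outer, inner, outer_key, inner_key, block_size=2):
    """Block variant: one full pass over inner per *block* of outer tuples."""
    for start in range(0, len(outer), block_size):
        block = outer[start:start + block_size]
        for i in inner:
            for o in block:
                if o[outer_key] == i[inner_key]:
                    yield (o, i)
```

Both produce the same set of result pairs (possibly in a different order); the block variant scans the inner relation len(outer) / block_size times (rounded up) instead of len(outer) times.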

13 Sort-Merge Join
- First sort data based on join attributes
  - Use an external sort (as previously described), unless data is already ordered
- Merge and join the files, reading sequentially a block at a time
  - Maintain two file pointers; advance pointer that’s pointing at guaranteed non-matches
- Allows joins based on inequalities (non-equijoins)
- Very efficient for presorted data
- Not pipelined unless data is presorted
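
The merge phase can be sketched for the equijoin case as below. This is an in-memory illustration under stated assumptions: Python's built-in `sorted` replaces the external sort, list indices play the role of the two file pointers, and the name `sort_merge_join` is hypothetical.

```python
def sort_merge_join(left, right, left_key, right_key):
    """Equijoin by sorting both inputs and merging them with two pointers."""
    left = sorted(left, key=lambda t: t[left_key])
    right = sorted(right, key=lambda t: t[right_key])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1  # left tuple is a guaranteed non-match
        elif lk > rk:
            j += 1  # right tuple is a guaranteed non-match
        else:
            # Find the run of right tuples sharing this key ...
            j_end = j
            while j_end < len(right) and right[j_end][right_key] == lk:
                j_end += 1
            # ... and pair it with every left tuple that has the same key.
            while i < len(left) and left[i][left_key] == lk:
                for r in right[j:j_end]:
                    out.append((left[i], r))
                i += 1
            j = j_end
    return out
```

Handling runs of duplicate keys on both sides is the one subtle step; a merge that only advances one pointer per match would miss pairs.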

14 Hashing it Out: Hash-Based Joins
- Allows (at least some) pipelining of operations with equality comparisons (e.g. equijoin, union)
- Sort-based operations block, but allow range and inequality comparisons
- Hash joins usually done with static number of hash buckets
  - Alternatives use directories, are more complex:
    - Extendible hashing
    - Linear hashing
  - Generally have fairly long overflow chains

15 Hash Join
    Read entire inner relation into hash table (join attributes as key)
    For each tuple from outer, look up in hash table & join
- Very efficient, very good for databases
- Not fully pipelined
- Supports equijoins only
- Data integration?
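
The two steps above map directly onto a build phase and a probe phase. A minimal in-memory sketch, assuming dict tuples and that the whole inner relation fits in memory (the name `hash_join` is illustrative):

```python
from collections import defaultdict


def hash_join(inner, outer, inner_key, outer_key):
    """In-memory hash join: build on inner, then probe with outer."""
    # Build phase: this blocks until the entire inner relation is read,
    # which is why the operator is not fully pipelined.
    table = defaultdict(list)
    for i in inner:
        table[i[inner_key]].append(i)

    # Probe phase: outer tuples stream through one at a time.
    for o in outer:
        for i in table.get(o[outer_key], []):
            yield (o, i)
```

Because matching relies on hashing equal keys to the same entry, the technique supports equijoins only.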

16 Overflowing Memory - GRACE
- Two possible strategies:
  - Overflow prevention (prevent from happening)
  - Overflow resolution (handle overflow when it occurs)
- GRACE hash
    Write each bucket to separate file
    Finish reading inner, swapping tuples to appropriate files
    Read outer, swapping tuples to overflow files matching those from inner
    Recursively GRACE hash join matching outer & inner overflow files
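
The partitioning idea behind GRACE hash join can be sketched as follows. This simplified version makes several assumptions: Python lists stand in for the per-bucket files, every partition is assumed to fit in memory (so the recursive step is left out), and `grace_hash_join` and `num_partitions` are names invented for the example.

```python
def grace_hash_join(inner, outer, inner_key, outer_key, num_partitions=4):
    """Partition both inputs by hash, then join matching partitions."""
    inner_parts = [[] for _ in range(num_partitions)]
    outer_parts = [[] for _ in range(num_partitions)]

    # Partition phase: tuples with equal join keys land in partitions
    # with the same index on both sides.
    for t in inner:
        inner_parts[hash(t[inner_key]) % num_partitions].append(t)
    for t in outer:
        outer_parts[hash(t[outer_key]) % num_partitions].append(t)

    # Join phase: only same-numbered partitions can contain matches.
    out = []
    for inner_part, outer_part in zip(inner_parts, outer_parts):
        table = {}
        for i in inner_part:
            table.setdefault(i[inner_key], []).append(i)
        for o in outer_part:
            for i in table.get(o[outer_key], ()):
                out.append((o, i))
    return out
```

In a real system a partition that is still too large would be repartitioned with a different hash function, which is the recursion the slide refers to.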

17 Overflowing Memory - Hybrid Hash
- A “lazy” version of the GRACE hash:
    When memory overflows, only swap a subset of the tables
    Continue reading inner relation and building table (sending tuples to buckets on disk as necessary)
    Read outer, joining with buckets in memory or swapping to disk as appropriate
    Join the corresponding overflow files, using recursion

18 Double-Pipelined Join
- Two hash tables
- As a tuple comes in, add to the appropriate side & join with opposite table
- Fully pipelined, data-driven
- Needs more memory
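
A sketch of the double-pipelined (symmetric) hash join, assuming the two inputs arrive as one interleaved stream of `(side, tuple)` pairs with side `"L"` or `"R"`. That stream encoding and the function name are assumptions made for the example; overflow handling is not shown.

```python
from collections import defaultdict


def double_pipelined_join(stream, left_key, right_key):
    """Join tuples as they arrive from either side, using two hash tables."""
    left_table = defaultdict(list)
    right_table = defaultdict(list)
    for side, t in stream:
        if side == "L":
            k = t[left_key]
            left_table[k].append(t)            # insert into own table
            for r in right_table.get(k, []):   # probe the opposite table
                yield (t, r)
        else:
            k = t[right_key]
            right_table[k].append(t)
            for l in left_table.get(k, []):
                yield (l, t)
```

Each result is emitted as soon as its second tuple arrives, so output starts without waiting for either input to finish. The cost is that both inputs are held in memory.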

19 Double Pipelined Join Performance for Data Integration

20 Overflow Resolution in the DPJoin
- Requires a bunch of ugly bookkeeping!
  - Need to mark tuples depending on state of opposite bucket; this lets us know whether they need to be joined later
- Tukwila “Incremental left flush” strategy
  - Pause reading from outer relation, swap some of its buckets
  - Finish reading from inner; still join with left-side hash table if possible, or swap to disk
  - Read outer relation, join with inner’s hash table
  - Read from overflow files and join as in hybrid hash join

21 Overflow Resolution, Pt. II
- Tukwila “Symmetric flush” strategy:
  - Flush all tuples for the same bucket from both sides
  - Continue joining; when done, join overflow files by hybrid hash
- Urhan and Franklin’s XJoin
  - Flush buckets from either relation
  - If stalled, start trying to join from overflow files
  - Needs lots of really nasty bookkeeping

22 Performance of Overflow Methods

23 The Semi-Join/Dependent Join
- Take attributes from left and feed to the right source as input/filter
- Important in data integration
- Simple method:
    for each tuple from left
      send to right source
      get data back, join
- More complex:
  - Hash “cache” of attributes & mappings
  - Don’t send attribute already seen
[Figure: dependent join A.x = B.y, with x passed from A to B]
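
The cached variant can be sketched as below. The callable `fetch_right` is a hypothetical stand-in for a request to the right-hand source (for example a remote query), and the function name `dependent_join` is likewise illustrative.

```python
def dependent_join(left, fetch_right, left_key):
    """For each left tuple, ask the right source for matching tuples.

    A hash "cache" maps each binding value already sent to the results
    it returned, so no value is sent to the right source twice.
    """
    cache = {}
    for l in left:
        value = l[left_key]
        if value not in cache:
            cache[value] = list(fetch_right(value))  # one request per new value
        for r in cache[value]:
            yield (l, r)
```

When the left input has many repeated binding values, the cache reduces the number of requests from one per left tuple to one per distinct value.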

24 Join Type Performance

25 Issues in Choosing Joins
- Goal: minimize I/O costs!
- Is the data pre-sorted?
- How much memory do I have and need?
  - Selectivity estimates
- Inner relation vs. outer relation
- Am I doing an equijoin or some other join?
- Is pipelining important?
- How confident am I in my estimates?
- Partition such that partition files don’t overflow!

26 Sets vs. Bags
- Operations requiring set semantics
  - Duplicate removal
  - Union
  - Difference
- Methods
  - Indices
  - Sorting
  - Hybrid hashing