Temple University – CIS Dept. CIS331– Principles of Database Systems V. Megalooikonomou Query Processing (based on notes by C. Faloutsos at CMU)


General Overview - rel. model: Relational model - SQL; Functional Dependencies & Normalization; Physical Design, Indexing; Query processing/optimization; Transaction processing; Advanced topics (Distributed Databases; OO- and OR-DBMSs)

Overview of a DBMS [Figure: DBMS architecture diagram. Components shown: DBA, casual user, naïve user, DDL parser, DML parser, DML precomp., buffer mgr, trans. mgr, catalog, data-files]

Overview - detailed: Motivation - Why q-opt?; Equivalence of expressions; Cost estimation; Cost of indices; Join strategies

Why Q-opt? SQL is ~declarative, so the system chooses how to execute a query. A good q-opt makes a big difference, e.g., seq. scan vs. B-tree index, on P=1,000 pages

Q-opt steps: bring query in internal form (e.g., parse tree); … into ‘canonical form’ (syntactic q-opt); generate alternative plans; estimate cost, pick best

Q-opt - example: select name from STUDENT, TAKES where c-id=‘CIS331’ and STUDENT.ssn=TAKES.ssn [Figure: two query trees over STUDENT and TAKES; the second is labeled ‘Canonical form’]

Q-opt - example [Figure: query tree over STUDENT and TAKES, annotated with plan alternatives. Access: index; seq. scan. Join: hash join; merge join; nested loops]

Overview - detailed: Why q-opt?; Equivalence of expressions; Cost estimation; Cost of indices; Join strategies

Equivalence of expressions … or syntactic q-opt. In short: perform selections and projections early. More details: see transformation rules in text

Equivalence of expressions Q: How to prove a transformation rule? A: use TRC, to show that LHS = RHS, e.g.:

Equivalence of expressions

Selections: perform them early; break a complex predicate, and push; simplify a complex predicate: (‘X=Y and Y=3’) -> ‘X=3 and Y=3’
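
The "perform selections early" rule can be checked on a toy instance. The sketch below is only an illustration: the STUDENT and TAKES rows are made up, and `join` and `select` are hypothetical helpers, not part of the slides. It applies the c-id predicate after the join and before the join and compares the two results.

```python
# Toy relations as lists of dicts; the rows are made up for illustration.
STUDENT = [
    {"ssn": 1, "name": "Ann"},
    {"ssn": 2, "name": "Bob"},
    {"ssn": 3, "name": "Cho"},
]
TAKES = [
    {"ssn": 1, "c_id": "CIS331"},
    {"ssn": 2, "c_id": "CIS101"},
    {"ssn": 3, "c_id": "CIS331"},
    {"ssn": 3, "c_id": "CIS101"},
]


def join(r, s, attr):
    """Equi-join of r and s on one shared attribute (hypothetical helper)."""
    return [{**tr, **ts} for tr in r for ts in s if tr[attr] == ts[attr]]


def select(r, pred):
    """Keep only the tuples of r that satisfy pred (hypothetical helper)."""
    return [t for t in r if pred(t)]


def is_cis331(t):
    return t["c_id"] == "CIS331"


# Selection applied late: join everything, then filter.
late = select(join(STUDENT, TAKES, "ssn"), is_cis331)

# Selection pushed down: filter TAKES first, then join the smaller input.
early = join(STUDENT, select(TAKES, is_cis331), "ssn")

names_late = sorted(t["name"] for t in late)
names_early = sorted(t["name"] for t in early)
print(names_late, names_early)
```

Both orders return the same names on this data; the pushed-down version feeds two TAKES tuples into the join instead of four, which is the point of the rule.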

Equivalence of expressions - Projections: perform them early (but carefully…); smaller tuples; fewer tuples (if duplicates are eliminated); project out all attributes except the ones requested or required (e.g., joining attr.)

Equivalence of expressions - Joins: commutative, associative. Q: n-way join - how many diff. orderings? … Exhaustive enumeration too slow…

Q-opt steps: bring query in internal form (e.g., parse tree); … into ‘canonical form’ (syntactic q-opt); generate alt. plans; estimate cost, pick best

Cost estimation: e.g., find ssn’s of students with an ‘A’ in CIS331 (using seq. scanning). How long will a query take? CPU (but: small cost; decreasing; tough to estimate); Disk (mainly, # block transfers). How many tuples will qualify? (what statistics do we need to keep?)

Cost estimation - Statistics: for each relation ‘r’ we keep nr: # tuples; Sr: size of tuple in bytes [Figure: relation r drawn as tuples #1, #2, #3, …, #nr, each of size Sr]

Cost estimation - Statistics: for each relation ‘r’ we keep … V(A,r): number of distinct values of attr. ‘A’ (recently, histograms, too) [Figure: relation r drawn as tuples #1, #2, #3, …, #nr, each of size Sr]

Derivable statistics: fr: blocking factor = max # records/block (=??); br: # blocks (=??); SC(A,r) = selection cardinality = avg # of records with A=given (=??) [Figure: relation r drawn as blocks #1, #2, …, #br, each holding fr tuples of size Sr]

Derivable statistics: fr: blocking factor = max # records/block (= B/Sr; B: block size in bytes); br: # blocks (= nr/fr)

Derivable statistics: SC(A,r) = selection cardinality = avg # of records with A=given (= nr/V(A,r)) (assumes uniformity...) – e.g., 30,000 students, 10 colleges – how many students in CST?
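
A minimal sketch of the three derived statistics, following the formulas on the slides. The function names are hypothetical, and the block size (4,096 bytes) and tuple size (200 bytes) are assumed values that do not appear in the slides. Rounding fr down and br up is also an assumption; the slides give the plain ratios.

```python
import math


def blocking_factor(block_size, tuple_size):
    """fr = B / Sr, rounded down: whole records that fit in one block."""
    return block_size // tuple_size


def num_blocks(n_r, f_r):
    """br = nr / fr, rounded up: a partly filled last block still counts."""
    return math.ceil(n_r / f_r)


def selection_cardinality(n_r, distinct_values):
    """SC(A,r) = nr / V(A,r), assuming values are uniformly distributed."""
    return n_r / distinct_values


# Slide example: 30,000 students spread over 10 colleges.
students_per_college = selection_cardinality(30_000, 10)

# Assumed sizes (not from the slides): 4,096-byte blocks, 200-byte tuples.
f_r = blocking_factor(4096, 200)
b_r = num_blocks(30_000, f_r)

print(students_per_college, f_r, b_r)
```

Under the uniformity assumption the estimate is the same for every college, so a skewed distribution would make it wrong for any particular one; that is why the slides mention histograms.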

Additional quantities we need: for index ‘i’: fi: average fanout - degree (~50-100); HTi: # levels of index ‘i’ (~2-3) ~ log(#entries)/log(fi); LBi: # blocks at leaf level
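
The height estimate HTi ~ log(#entries)/log(fi) can be sketched as below. This is an illustration under one assumption: the estimate is rounded up to a whole number of levels, with at least one level. `index_height` is a hypothetical name. Integer multiplication is used instead of floating-point logarithms to avoid rounding surprises at exact powers of the fanout.

```python
def index_height(num_entries, fanout):
    """Smallest number of levels h (at least 1) with fanout**h >= num_entries.

    Equivalent to rounding log(num_entries)/log(fanout) up to an integer.
    """
    height, capacity = 1, fanout
    while capacity < num_entries:
        capacity *= fanout
        height += 1
    return height


# 1,000,000 entries with fanout 100 (inside the ~50-100 range on the slide).
print(index_height(1_000_000, 100))

# 50 entries with fanout 20.
print(index_height(50, 20))
```

With fanouts in the range quoted on the slide, even a million entries need only three levels, which matches the "~2-3" rule of thumb.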

Statistics: Where do we store them? How often do we update them?

Q-opt steps: bring query in internal form (e.g., parse tree); … into ‘canonical form’ (syntactic q-opt); generate alt. plans (selections; sorting; projections; joins); estimate cost, pick best

Cost estimation + plan generation: Selections – e.g., select * from TAKES where grade = ‘A’. Plans?

Cost estimation + plan generation: Plans? seq. scan; binary search (if sorted & consecutive); index search, if an index exists

Cost estimation + plan generation: seq. scan – cost? br (worst case); br/2 (average, if we search for primary key)

Cost estimation + plan generation: binary search – cost? if sorted and consecutive: ~log(br) + SC(A,r)/fr (= # blocks spanned by qualified tuples)

Cost estimation + plan generation: estimation of selection cardinalities SC(A,r): non-trivial

Cost estimation + plan generation: method#3: index – cost? levels of index + blocks w/ qual. tuples. case#1: primary key; case#2: sec. key – clustering index; case#3: sec. key – non-clust. index

Cost estimation + plan generation: method#3: index – cost? levels of index + blocks w/ qual. tuples. case#1: primary key – cost: HTi + 1

Cost estimation + plan generation: method#3: index – cost? levels of index + blocks w/ qual. tuples. case#2: sec. key – clustering index OR prim. index on non-key … retrieve multiple records – cost: HTi + SC(A,r)/fr

Cost estimation + plan generation: method#3: index – cost? levels of index + blocks w/ qual. tuples. case#3: sec. key – non-clust. index – cost: HTi + SC(A,r) (actually, pessimistic...)
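
The selection cost formulas from the last few slides can be collected into one sketch for side-by-side comparison. The function names are hypothetical, the logarithm is assumed to be base 2, fractional block counts are rounded up, and the statistics in the usage lines (br = 1,000, fr = 10, SC = 50, HTi = 3) are made-up numbers, not taken from the slides.

```python
import math


def cost_seq_scan(b_r, on_primary_key=False):
    """br in the worst case; br/2 on average when searching for one key value."""
    return b_r / 2 if on_primary_key else b_r


def cost_binary_search(b_r, sc, f_r):
    """~log(br) + SC/fr, for a file sorted on the attribute (base 2 assumed)."""
    return math.ceil(math.log2(b_r)) + math.ceil(sc / f_r)


def cost_index_primary_key(ht):
    """HTi + 1: walk down the index, then fetch one data block."""
    return ht + 1


def cost_index_clustering(ht, sc, f_r):
    """HTi + SC/fr: qualifying tuples sit in consecutive blocks."""
    return ht + math.ceil(sc / f_r)


def cost_index_nonclustering(ht, sc):
    """HTi + SC: pessimistically, one block access per qualifying tuple."""
    return ht + math.ceil(sc)


# Made-up statistics for illustration only.
b_r, f_r, sc, ht = 1_000, 10, 50, 3
costs = {
    "seq. scan": cost_seq_scan(b_r),
    "binary search": cost_binary_search(b_r, sc, f_r),
    "clustering index": cost_index_clustering(ht, sc, f_r),
    "non-clust. index": cost_index_nonclustering(ht, sc),
}
for plan, cost in costs.items():
    print(f"{plan}: {cost} block accesses")
```

An optimizer following the "estimate cost; pick best" step would evaluate formulas like these for each applicable plan and keep the cheapest.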

Cost estimation – arithmetic examples: find accounts with branch-name = ‘Perryridge’; account(branch-name, balance,...)

Arithm. examples – cont’d: n-account = 10,000 tuples; f-account = 20 tuples/block; V(balance, account) = 500 distinct values; V(branch-name, account) = 50 distinct values; for branch-index: fanout fi = 20

Arithm. examples: Q1: cost of seq. scan? A1: 500 disk accesses (= n-account / f-account = 10,000/20 blocks). Q2: assume a clustering index on branch-name – cost?

Cost estimation + plan generation: method#3: index – cost? levels of index + blocks w/ qual. tuples. case#2: sec. key – clustering index – cost: HTi + SC(A,r)/fr

Arithm. examples: A2: HTi + SC(branch-name, account)/f-account. HTi: 50 values, with index fanout 20 -> HT = 2 levels (log(50)/log(20) ≈ 1.3, rounded up). SC(..) = # qualified records = nr/V(A,r) = 10,000/50 = 200 tuples. SC/f: spanning 200/20 blocks = 10 blocks

Arithm. examples: A2 final answer: 2 + 10 = 12 block accesses (vs. 500 block accesses of seq. scan). Footnote: in all fairness: seq. disk accesses: ~2msec or less; random disk accesses: ~10msec
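
The footnote can be turned into a rough time comparison. The sketch below recomputes the two block counts from the slide statistics and then weights them by access time. It assumes, for illustration only, that every block of the sequential scan costs the quoted ~2 msec and every block of the index plan costs the quoted ~10 msec; the variable names are hypothetical.

```python
import math

# Statistics from the arithmetic example.
n_account = 10_000   # tuples
f_account = 20       # tuples per block
v_branch = 50        # distinct branch-name values
ht = 2               # index levels, as derived on the slide

# Block counts for the two plans.
b_account = math.ceil(n_account / f_account)    # blocks read by the seq. scan
sc = n_account // v_branch                      # qualifying tuples
index_blocks = ht + math.ceil(sc / f_account)   # clustering-index plan

# Assumed per-access times from the footnote (simplification: all scan
# accesses are sequential, all index-plan accesses are random).
SEQ_MS, RANDOM_MS = 2, 10
seq_scan_ms = b_account * SEQ_MS
index_ms = index_blocks * RANDOM_MS

print(f"seq. scan: {b_account} blocks, ~{seq_scan_ms} msec")
print(f"index:     {index_blocks} blocks, ~{index_ms} msec")
```

Counted in blocks the index plan is about 40 times cheaper (12 vs. 500); under these timing assumptions the gap in elapsed time narrows to roughly 8 times (120 vs. 1,000 msec), which is what the footnote is pointing at.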

Overview - detailed: Motivation - Why q-opt?; Equivalence of expressions; Cost estimation; Cost of indices; Join strategies

2-way joins: algorithm(s) for r JOIN s? r(A,...) has nr tuples; s(A,...) has ns tuples

2-way joins - Algorithm #0: (naive) nested loop (SLOW!): for each tuple tr of r: for each tuple ts of s: print, if they match. r(A,...) has nr tuples; s(A,...) has ns tuples
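
A minimal in-memory sketch of Algorithm #0. Relations are modeled as lists of dicts joined on attribute A; the sample rows and the function name `nested_loop_join` are made up for illustration. The sketch collects matches in a list instead of printing them, and it ignores disk blocks entirely.

```python
def nested_loop_join(r, s, attr="A"):
    """Compare every tuple of r with every tuple of s; keep the matching pairs."""
    matches = []
    for tr in r:            # outer loop: each tuple of r
        for ts in s:        # inner loop: full pass over s for every tr
            if tr[attr] == ts[attr]:
                matches.append((tr, ts))
    return matches


# Made-up sample data.
r = [{"A": 1, "x": "r1"}, {"A": 2, "x": "r2"}, {"A": 3, "x": "r3"}]
s = [{"A": 2, "y": "s1"}, {"A": 3, "y": "s2"}, {"A": 3, "y": "s3"}]

result = nested_loop_join(r, s)
pairs = [(tr["x"], ts["y"]) for tr, ts in result]
print(pairs)
```

The inner loop runs once per tuple of r, so on disk the whole of s is rescanned nr times; that is the source of the cost discussed on the next slide.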

2-way joins - Algorithm #0: why is it bad? How many disk accesses (‘br’ and ‘bs’ are the number of blocks for ‘r’ and ‘s’)? Answer: nr*bs + br

2-way joins - Algorithm #1: Blocked nested-loop join: read in a block of r; read in a block of s; print matching tuples. r(A,...) has nr records, br blocks; s(A,...) has ns records, bs blocks. cost: br + br * bs
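
A sketch of Algorithm #1 that also counts block reads, so the cost formula can be observed directly. Each relation is modeled as a list of blocks and each block as a list of tuples; this layout, the sample data, and the function name are assumptions made for illustration, and a "read" is just a counter increment.

```python
def block_nested_loop_join(r_blocks, s_blocks, attr="A"):
    """Join block by block; return the matching pairs and the block reads."""
    matches = []
    block_reads = 0
    for r_block in r_blocks:
        block_reads += 1                 # read one block of r
        for s_block in s_blocks:
            block_reads += 1             # read one block of s
            for tr in r_block:           # join the two in-memory blocks
                for ts in s_block:
                    if tr[attr] == ts[attr]:
                        matches.append((tr, ts))
    return matches, block_reads


# Made-up data: r has 3 blocks, s has 2 blocks.
r_blocks = [[{"A": 1}, {"A": 2}], [{"A": 3}, {"A": 4}], [{"A": 5}]]
s_blocks = [[{"A": 2}, {"A": 3}], [{"A": 5}, {"A": 9}]]

matches, block_reads = block_nested_loop_join(r_blocks, s_blocks)
print(len(matches), block_reads)
```

With br = 3 and bs = 2 the counter ends at 3 + 3*2 = 9, matching br + br * bs: s is rescanned once per block of r rather than once per tuple of r.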

2-way joins - Arithmetic example: nr = 10,000 tuples, br = 1,000 blocks; ns = 1,000 tuples, bs = 200 blocks. alg#0: 2,001,000 d.a.; alg#1: 201,000 d.a.

2-way joins - Observation1: Algo#1 is asymmetric: cost: br + br * bs; reverse roles: cost = bs + bs*br. Best choice? The product term is the same either way, so only the first term differs: put the smaller relation in the outer loop
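
The two cost formulas can be evaluated on the numbers of the arithmetic example, including the reversed roles from Observation1. The function names are hypothetical; the formulas and statistics are the ones given on the slides.

```python
def cost_naive(n_outer, b_outer, b_inner):
    """Algorithm #0: scan the inner relation once per outer tuple."""
    return n_outer * b_inner + b_outer


def cost_blocked(b_outer, b_inner):
    """Algorithm #1: scan the inner relation once per outer block."""
    return b_outer + b_outer * b_inner


# Statistics from the arithmetic example.
n_r, b_r = 10_000, 1_000
n_s, b_s = 1_000, 200

naive_r_outer = cost_naive(n_r, b_r, b_s)
blocked_r_outer = cost_blocked(b_r, b_s)
blocked_s_outer = cost_blocked(b_s, b_r)   # roles reversed: s in the outer loop

print(naive_r_outer, blocked_r_outer, blocked_s_outer)
```

For the blocked algorithm, swapping the roles changes only the first term (200 instead of 1,000 here), so the saving is small but free. For the naive algorithm the choice of outer relation matters much more, because the product term itself changes.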

2-way joins: other algorithm(s) for r JOIN s? r(A,...) has nr tuples; s(A,...) has ns tuples

2-way joins - other algo’s: sort-merge: sort ‘r’; sort ‘s’; merge sorted versions (good, if one or both are already sorted)
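
A minimal in-memory sketch of the sort-merge idea, assuming relations are lists of dicts joined on attribute A. The function name and sample rows are made up, and a real system would use an external sort on disk rather than Python's `sorted`. The merge step handles duplicate join values by pairing each run of equal values in r with the matching run in s.

```python
def sort_merge_join(r, s, attr="A"):
    """Sort both inputs on attr, then merge them with two cursors."""
    r_sorted = sorted(r, key=lambda t: t[attr])
    s_sorted = sorted(s, key=lambda t: t[attr])
    matches = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        a, b = r_sorted[i][attr], s_sorted[j][attr]
        if a < b:
            i += 1                      # r is behind: advance r
        elif a > b:
            j += 1                      # s is behind: advance s
        else:
            # Find the end of the run of s tuples with this join value.
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][attr] == a:
                j_end += 1
            # Pair every r tuple in the run with every s tuple in the run.
            while i < len(r_sorted) and r_sorted[i][attr] == a:
                for k in range(j, j_end):
                    matches.append((r_sorted[i], s_sorted[k]))
                i += 1
            j = j_end
    return matches


# Made-up sample data with duplicate join values.
r = [{"A": 3, "x": "r1"}, {"A": 1, "x": "r2"}, {"A": 3, "x": "r3"}]
s = [{"A": 3, "y": "s1"}, {"A": 2, "y": "s2"}, {"A": 3, "y": "s3"}]

pairs = [(tr["x"], ts["y"]) for tr, ts in sort_merge_join(r, s)]
print(pairs)
```

If an input is already sorted on A, its `sorted` call is the step a real system could skip, which is why the slide calls this method good in that case.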

2-way joins - other algo’s: hash join: hash ‘r’ into (0, 1, ..., ‘max’) buckets; hash ‘s’ into buckets (same hash function); join each pair of matching buckets
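
A minimal in-memory sketch of the hash join steps listed above, assuming relations are lists of dicts joined on attribute A. The function name, the bucket count, and the sample rows are made up for illustration. Python's built-in `hash` stands in for the hash function, and each bucket pair is joined with a simple nested loop.

```python
def hash_join(r, s, attr="A", num_buckets=4):
    """Partition r and s with the same hash function, then join bucket pairs."""

    def partition(relation):
        buckets = [[] for _ in range(num_buckets)]   # buckets 0 .. max
        for t in relation:
            buckets[hash(t[attr]) % num_buckets].append(t)
        return buckets

    r_buckets = partition(r)
    s_buckets = partition(s)

    matches = []
    # Tuples with equal join values always land in the same bucket number,
    # so only bucket i of r has to be compared with bucket i of s.
    for r_bucket, s_bucket in zip(r_buckets, s_buckets):
        for tr in r_bucket:
            for ts in s_bucket:
                if tr[attr] == ts[attr]:
                    matches.append((tr, ts))
    return matches


# Made-up sample data.
r = [{"A": 3, "x": "r1"}, {"A": 1, "x": "r2"}, {"A": 3, "x": "r3"}]
s = [{"A": 3, "y": "s1"}, {"A": 2, "y": "s2"}, {"A": 3, "y": "s3"}]

pairs = sorted((tr["x"], ts["y"]) for tr, ts in hash_join(r, s))
print(pairs)
```

The equality check inside the bucket loop is still needed, because different join values can hash to the same bucket.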

Structure of query optimizers: more heuristics by Oracle, Sybase and Starburst (-> DB2): in book. In general: q-opt is very important for large databases (‘explain select ’ gives plan)

Conclusions - Q-opt steps: bring query in internal form (e.g., parse tree); … into ‘canonical form’ (syntactic q-opt); generate alt. plans (selections (simple; complex predicates); sorting; projections; joins); estimate cost, pick best