C. Faloutsos Query Optimization – part 1

Slides:



Advertisements
Similar presentations
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#15: Query Optimization.
Advertisements

Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapters 14.
CMU SCS /615Faloutsos/Pavlo1 Carnegie Mellon Univ. Dept. of Computer Science /615 – DB Applications C. Faloutsos & A. Pavlo Lecture #13: Query.
SPRING 2004CENG 3521 Query Evaluation Chapters 12, 14.
Query processing and optimization. Advanced DatabasesQuery processing and optimization2 Definitions Query processing –translation of query into low-level.
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Quick Review of Apr 17 material Multiple-Key Access –There are good and bad ways to run queries on multiple single keys Indices on Multiple Attributes.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
Bitmap Indexes.
©Silberschatz, Korth and Sudarshan14.1Database System Concepts 3 rd Edition Chapter 14: Query Optimization Overview Catalog Information for Cost Estimation.
Query Optimization. overview Histograms A histogram is a data structure maintained by a DBMS to approximate a data distribution Equiwidth vs equidepth.
Database Management 9. course. Execution of queries.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.
Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)
SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview.
Temple University – CIS Dept. CIS331– Principles of Database Systems V. Megalooikonomou Query Processing (based on notes by C. Faloutsos at CMU)
Lecture 1- Query Processing Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
Chapter 12 Query Processing (1) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
CS4432: Database Systems II Query Processing- Part 2.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Rel. model - SQL part2.
File Organizations and Indexing
Chapter 13: Query Processing
Database Applications (15-415) DBMS Internals- Part IX Lecture 20, March 31, 2016 Mohammad Hammoud.
Chapter 4: Query Processing
Database Applications (15-415) DBMS Internals- Part VIII Lecture 17, Oct 30, 2016 Mohammad Hammoud.
UNIT 11 Query Optimization
Database Management System
Query Processing.
Prepared by : Ankit Patel (226)
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Chapter 12: Query Processing
Introduction to Query Optimization
Overview of Query Optimization
COST ESTIMATION FOR THE RELATIONAL ALGEBRA OPERATIONS MIT 813 GROUP 15 PRESENTATION.
Chapter 15 QUERY EXECUTION.
Database Management Systems (CS 564)
Faloutsos/Pavlo C. Faloutsos – A. Pavlo Lecture#15: Query Optimization
File Processing : Query Processing
Dynamic Hashing Good for database that grows and shrinks in size
Query Processing and Optimization
Lecture#12: External Sorting (R&G, Ch13)
Database Applications (15-415) DBMS Internals- Part VII Lecture 19, March 27, 2018 Mohammad Hammoud.
Database Applications (15-415) DBMS Internals- Part VI Lecture 15, Oct 23, 2016 Mohammad Hammoud.
External Sorting The slides for this text are organized into chapters. This lecture covers Chapter 11. Chapter 1: Introduction to Database Systems Chapter.
Yan Huang - CSCI5330 Database Implementation – Access Methods
Query Processing B.Ramamurthy Chapter 12 11/27/2018 B.Ramamurthy.
Query Processing.
Query processing and optimization
Chapter 13: Query Processing
Faloutsos/Pavlo C. Faloutsos – A. Pavlo Lecture#13: Query Evaluation
Selected Topics: External Sorting, Join Algorithms, …
Chapter 13: Query Processing
Database Applications (15-415) DBMS Internals- Part IX Lecture 21, April 1, 2018 Mohammad Hammoud.
Query Optimization.
Chapters 15 and 16b: Query Optimization
Lecture 2- Query Processing (continued)
Chapter 12 Query Processing (1)
Chapter 13: Query Processing
Query processing and optimization
Evaluation of Relational Operations: Other Techniques
C. Faloutsos Query Optimization – part 2
Chapter 13: Query Processing
Lecture 11: B+ Trees and Query Execution
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Lecture 20: Query Execution
Unit 12 Index in Database 大量資料存取方法之研究 Approaches to Access/Store Large Data 楊維邦 博士 國立東華大學 資訊管理系教授.
Presentation transcript:

C. Faloutsos Query Optimization – part 1 Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications C. Faloutsos Query Optimization – part 1

General Overview - rel. model Relational model - SQL Functional Dependencies & Normalization Physical Design Indexing Query optimization Transaction processing 15-415 - C. Faloutsos

Overview of a DBMS casual Naïve user user DBA DML parser DML precomp. DDL parser trans. mgr buffer mgr catalog Data-files 15-415 - C. Faloutsos

Overview - detailed Why q-opt? Equivalence of expressions C. Faloutsos Overview - detailed Why q-opt? Equivalence of expressions Cost estimation Cost of indices Join strategies 15-415 - C. Faloutsos CMU - 15-415

Why Q-opt? SQL: ~declarative good q-opt -> big difference eg., seq. Scan vs B-tree index, on P=1,000 pages 15-415 - C. Faloutsos

Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans estimate cost; pick best 15-415 - C. Faloutsos

Q-opt - example p s TAKES STUDENT select name from STUDENT, TAKES where c-id=‘415’ and STUDENT.ssn=TAKES.ssn 15-415 - C. Faloutsos

Q-opt - example p STUDENT TAKES s p Canonical form s STUDENT TAKES 15-415 - C. Faloutsos

Hash join; merge join; nested loops; Q-opt - example STUDENT TAKES s p Hash join; merge join; nested loops; Index; seq scan 15-415 - C. Faloutsos

Overview - detailed Why q-opt? Equivalence of expressions C. Faloutsos Overview - detailed Why q-opt? Equivalence of expressions Cost estimation Cost of indices Join strategies 15-415 - C. Faloutsos CMU - 15-415

Equivalence of expressions A.k.a.: syntactic q-opt in short: perform selections and projections early More details: see transf. rules in text 15-415 - C. Faloutsos

Equivalence of expressions Q: How to prove a transf. rule? A: use RTC, to show that LHS = RHS, eg: 15-415 - C. Faloutsos

Equivalence of expressions 15-415 - C. Faloutsos

Equivalence of expressions 15-415 - C. Faloutsos

Equivalence of expressions Q: how to disprove a rule?? 15-415 - C. Faloutsos

Equivalence of expressions Selections perform them early break a complex predicate, and push simplify a complex predicate (‘X=Y and Y=3’) -> ‘X=3 and Y=3’ 15-415 - C. Faloutsos

Equivalence of expressions Projections perform them early (but carefully…) Smaller tuples Fewer tuples (if duplicates are eliminated) project out all attributes except the ones requested or required (e.g., joining attr.) 15-415 - C. Faloutsos

Equivalence of expressions Joins Commutative , associative Q: n-way join - how many diff. orderings? 15-415 - C. Faloutsos

Equivalence of expressions Joins - Q: n-way join - how many diff. orderings? A: Catalan number ~ 4^n Exhaustive enumeration: too slow. 15-415 - C. Faloutsos

Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans estimate cost; pick best 15-415 - C. Faloutsos

Cost estimation Eg., find ssn’s of students with an ‘A’ in 415 (using seq. scanning) How long will a query take? CPU (but: small cost; decreasing; tough to estimate) Disk (mainly, # block transfers) How many tuples will qualify? (what statistics do we need to keep?) 15-415 - C. Faloutsos

Cost estimation Statistics: for each relation ‘r’ we keep … Sr #1 #2 #3 #nr Statistics: for each relation ‘r’ we keep nr : # tuples; Sr : size of tuple in bytes 15-415 - C. Faloutsos

Cost estimation Statistics: for each relation ‘r’ we keep … Sr Statistics: for each relation ‘r’ we keep … V(A,r): number of distinct values of attr. ‘A’ (recently, histograms, too) #1 #2 #3 … #nr 15-415 - C. Faloutsos

Derivable statistics fr: blocking factor = max# records/block (=?? ) … fr Sr #1 #2 #br fr: blocking factor = max# records/block (=?? ) br: # blocks (=?? ) SC(A,r) = selection cardinality = avg# of records with A=given (=?? ) 15-415 - C. Faloutsos

Derivable statistics fr: blocking factor = max# records/block (= B/Sr ; B: block size in bytes) br: # blocks (= nr / fr ) 15-415 - C. Faloutsos

Derivable statistics SC(A,r) = selection cardinality = avg# of records with A=given (= nr / V(A,r) ) (assumes uniformity...) – eg: 10,000 students, 10 colleges – how many students in SCS? 15-415 - C. Faloutsos

Additional quantities we need: For index ‘i’: fi: average fanout (~50-100) HTi: # levels of index ‘i’ (~2-3) ~ log(#entries)/log(fi) LBi: # blocks at leaf level HTi 15-415 - C. Faloutsos

Statistics Where do we store them? How often do we update them? 15-415 - C. Faloutsos

Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting; projections joins estimate cost; pick best 15-415 - C. Faloutsos

Cost estimation + plan generation … fr Sr #1 #2 #br Selections – eg., select * from TAKES where grade = ‘A’ Plans? 15-415 - C. Faloutsos

Cost estimation + plan generation … fr Sr #1 #2 #br Plans? seq. scan binary search (if sorted & consecutive) index search if an index exists 15-415 - C. Faloutsos

Cost estimation + plan generation … fr Sr #1 #2 #br seq. scan – cost? br (worst case) br/2 (average, if we search for primary key) 15-415 - C. Faloutsos

Cost estimation + plan generation … fr Sr #1 #2 #br binary search – cost? if sorted and consecutive: ~log(br) + SC(A,r)/fr (=blocks spanned by qual. tuples) 15-415 - C. Faloutsos

Cost estimation + plan generation … fr Sr #1 #2 #br estimation of selection cardinalities SC(A,r): non-trivial – details later 15-415 - C. Faloutsos

Cost estimation + plan generation … fr Sr #1 #2 #br method#3: index – cost? levels of index + blocks w/ qual. tuples ... case#1: primary key case#2: sec. key – clustering index case#3: sec. key – non-clust. index 15-415 - C. Faloutsos

Cost estimation + plan generation … fr Sr #1 #2 #br method#3: index – cost? levels of index + blocks w/ qual. tuples .. case#1: primary key – cost: HTi + 1 HTi 15-415 - C. Faloutsos

Cost estimation + plan generation Sr method#3: index – cost? levels of index + blocks w/ qual. tuples #1 fr #2 case#2: sec. key – clustering index HTi + SC(A,r)/fr … HTi #br 15-415 - C. Faloutsos

Cost estimation + plan generation … fr Sr #1 #2 #br method#3: index – cost? levels of index + blocks w/ qual. tuples ... case#3: sec. key – non-clust. index HTi + SC(A,r) (actually, pessimistic...) 15-415 - C. Faloutsos

Cost estimation – arithm. examples find accounts with branch-name = ‘Perryridge’ account(branch-name, balance, ...) 15-415 - C. Faloutsos

Arithm. examples – cont’d n-account = 10,000 tuples f-account = 20 tuples/block V(balance, account) = 500 distinct values V(branch-name, account) = 50 distinct values for branch-index: fanout fi = 20 15-415 - C. Faloutsos

Arithm. examples Q1: cost of seq. scan? A1: 500 disk accesses Q2: assume a clustering index on branch-name – cost? 15-415 - C. Faloutsos

Cost estimation + plan generation Sr method#3: index – cost? levels of index + blocks w/ qual. tuples #1 fr #2 case#2: sec. key – clustering index HTi + SC(A,r)/fr … HTi #br 15-415 - C. Faloutsos

Arithm. examples A2: HTi + SC(branch-name, account)/f-account HTi: 50 values, with index fanout 20 -> HT=2 levels (log(50)/log(20) = 1+) SC(..)= # qual records = nr/V(A,r) = 10,000/50 = 200 tuples spanning 200/20 blocks = 10 blocks 15-415 - C. Faloutsos

Arithm. examples A2 final answer: 2+10= 12 block accesses (vs. 500 block accesses of seq. scan) footnote: in all fairness seq. disk accesses: ~2msec or less random disk accesses: ~10msec 15-415 - C. Faloutsos

Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting; projections joins estimate cost; pick best 15-415 - C. Faloutsos