UNIT 11 Query Optimization

Slides:



Advertisements
Similar presentations
Query optimisation.
Advertisements

Chapter 15 Algorithms for Query Processing and Optimization Copyright © 2004 Pearson Education, Inc.
UNIT 15 Query Optimization Wei-Pang Yang, Information Management, NDHU Contents  15.1 Introduction to Query Optimization  15.2 The Optimization.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapters 14.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
1 Relational Query Optimization Module 5, Lecture 2.
Query processing and optimization. Advanced DatabasesQuery processing and optimization2 Definitions Query processing –translation of query into low-level.
CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.
Relational Query Optimization (this time we really mean it)
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
Query Optimization. General Overview Relational model - SQL  Formal & commercial query languages Functional Dependencies Normalization Physical Design.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapter 15.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
Query Optimization Overview Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems December 2, 2004 Some slide content derived.
©Silberschatz, Korth and Sudarshan14.1Database System Concepts 3 rd Edition Chapter 14: Query Optimization Overview Catalog Information for Cost Estimation.
Query Processing Presented by Aung S. Win.
Query Optimization, part 2 CS634 Lecture 13, Mar Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Introduction to Database Systems1 Relational Query Optimization Query Processing: Topic 2.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.
Query Optimization. overview Histograms A histogram is a data structure maintained by a DBMS to approximate a data distribution Equiwidth vs equidepth.
Database systems/COMP4910/Melikyan1 Relational Query Optimization How are SQL queries are translated into relational algebra? How does the optimizer estimates.
Advanced Databases: Lecture 8 Query Optimization (III) 1 Query Optimization Advanced Databases By Dr. Akhtar Ali.
Database Management 9. course. Execution of queries.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Academic Year 2014 Spring. MODULE CC3005NI: Advanced Database Systems “QUERY OPTIMIZATION” Academic Year 2014 Spring.
Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.
CS 338Query Evaluation7-1 Query Evaluation Lecture Topics Query interpretation Basic operations Costs of basic operations Examples Textbook Chapter 12.
1-1 Homework 3 Practical Implementation of A Simple Rational Database Management System.
SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview.
Lecture 1- Query Processing Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
Introduction to Query Optimization, R. Ramakrishnan and J. Gehrke 1 Introduction to Query Optimization Chapter 13.
Query Processing – Query Trees. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying.
Index in Database Unit 12 Index in Database 大量資料存取方法之研究 Approaches to Access/Store Large Data 楊維邦 博士 國立東華大學 資訊管理系教授.
CSCI Query Processing1 QUERY PROCESSING & OPTIMIZATION Dr. Awad Khalil Computer Science Department AUC.
Query Optimization CMPE 226 Database Systems By, Arjun Gangisetty
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
Chapter 13 Query Optimization Yonsei University 1 st Semester, 2015 Sanghyun Park.
Chapter 13: Query Processing
CHAPTER 19 Query Optimization. CHAPTER 19 Query Optimization.
Database Applications (15-415) DBMS Internals- Part VIII Lecture 17, Oct 30, 2016 Mohammad Hammoud.
Database Management System
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Chapter 12: Query Processing
Introduction to Query Optimization
Evaluation of Relational Operations
Overview of Query Optimization
COST ESTIMATION FOR THE RELATIONAL ALGEBRA OPERATIONS MIT 813 GROUP 15 PRESENTATION.
Chapter 15 QUERY EXECUTION.
Database Management Systems (CS 564)
Introduction to Database Systems
Examples of Physical Query Plan Alternatives
File Processing : Query Processing
Database Applications (15-415) DBMS Internals- Part VII Lecture 19, March 27, 2018 Mohammad Hammoud.
Database Applications (15-415) DBMS Internals- Part VI Lecture 15, Oct 23, 2016 Mohammad Hammoud.
Query Processing B.Ramamurthy Chapter 12 11/27/2018 B.Ramamurthy.
Database Applications (15-415) DBMS Internals- Part IX Lecture 21, April 1, 2018 Mohammad Hammoud.
Advanced Database System
Relational Query Optimization
Unit 7 Normalization (表格正規化).
Chapter 12 Query Processing (1)
Overview of Query Evaluation
Relational Query Optimization
Question 1: Basic Concepts (45 %)
Relational Query Optimization (this time we really mean it)
Unit 12 Index in Database 大量資料存取方法之研究 Approaches to Access/Store Large Data 楊維邦 博士 國立東華大學 資訊管理系教授.
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Relational Query Optimization
Relational Query Optimization
Unit 12 Index in Database 大量資料存取方法之研究 Approaches to Access/Store Large Data 楊維邦 博士 國立東華大學 資訊管理系教授.
Presentation transcript:

UNIT 11 Query Optimization

Contents 11.1 Introduction to Query Optimization 11.2 The Optimization Process: An Overview 11.3 Optimization in System R 11.4 Optimization in INGRES 11.5 Implementing the Join Operators

11.1 Introduction to Query Optimization

The Problem How to choose an efficient strategy for evaluating a given expression (a query). Expression (a query): e.g. select distinct S.SNAME from S, SP where S.S# =SP.S# and SP.P#= 'p2' Evaluate: Efficient strategy: First class e.g. (A join B) where condition-on-B (A join (B where condition-on-B) ) e.g. SP.P# = 'p2' Second class e.g. from S, SP ==> S join SP How to implement join operation efficiently? “Improvement" may not be an "optimal" version. P.11-31

Query Processing in the DBMS Query in SQL: SELECT CUSTOMER. NAME FROM CUSTOMER, INVOICE WHERE REGION = 'N.Y.' AND AMOUNT > 10000 AND CUTOMER.C#=INVOICE.C# Language Processor Optimizer Operator Processor Access Method File System database Language Processor Access Method ? Internal Form : P(σ (S SP) Operator : SCAN C using region index, create C SCAN I using amount index, create I SORT C?and I?on C# JOIN C?and I?on C# EXTRACT name field Calls to Access Method: OPEN SCAN on C with region index GET next tuple . Calls to file system: GET10th to 25th bytes from block #6 of file #5

An Example Suppose: |S| = 100, |SP| = 10,000, and there are 50 tuples in SP with p# = 'p2'? Results are placed in Main Memory. Query in SQL: SELECT S.* FROM S,SP WHERE S.S# = SP.S# AND SP.P# = 'p2‘ Method 1: iteration (Join + Restrict) SNAME STATUS CITY S# S2 S5 . S1 1 2 100 S P# QTY S3 10,000  SP Cost = 100 * 10,000 = 1,000,000 tuple I/O's

An Example (cont.) Method 2: Restriction iteration Join p#= 'p2' QTY S3 S1 . S2 P4 P2 1 2 10000 SP S# P# QTY S1 S3 . S2 P2 1 2 50 SP' restrict p#= 'p2' S SP' S# SNAME STATUS CITY S# P# QTY . 1 2 . 100 S2 . . . 1 2 . 50 S1 S3 . S2 P2 . S5 . S1 cost = 10,000 + 100 * 50 = 15,000 I/O

An Example (cont.) Method 3: Sort-Merge Join + Restrict Suppose S, SP are sorted on S#. S SP S# SNAME STATUS CITY S1 S2 . S100 S# P# QTY S1 . S100 1 2 . 100 1 2 . 10,000 . . . . . cost = 100 + 10,000 = 10,100 I/O

11.2 The Optimization Process: An Overview (1) Query => internal form (2) Internal form => efficient form (3) Choose candidate low-level procedures (4) Generate query plans and choose the cheapest one

Step 1: Cast the query into some internal representation Query => Algebra Step 1: Cast the query into some internal representation Query: "get names of suppliers who supply part p2" SQL: select distinct S.SNAME from S,SP where S.S# = SP.S# and SP.P# = 'p2' Query tree: result | project (SNAME) restrict (SP.P# = 'p2') join (S.S# = SP.S#) S SP 'P2' SNAME ( (S join SP) where P#= 'P2') [SNAME] or ( ( S SP) ) S.S# = SP.S# Algebra:

Step 2: Convert to equivalent and efficient form Def: Canonical Form Given a set Q of queries, for q1, q2 belong to Q, q1 are equivalent to q2 (q1 q2) iff they produce the same result, Subset C of Q is said to be a set of canonical forms for Q iff Note: Sufficient to study the small set C Transformation Rules C Q Step2 output of step1 trans. equivalent and more efficient form Algebra Efficient Algebra

Step 2: Convert to equivalent and efficient form (cont.) e.g.1 [restriction first] (A join B) where restriction_B A join ( B where restriction_B) e.g.2 [More general case] (A join B) where restriction_A and restriction_B (A where rest_on_A) join ( B where rest_on_B) e.g.3 [ Combine restriction] ( A where rest_1 ) where rest_2 A where rest_1 and rest_2 C q1 q1≡q2 q2 scan 2 1

Step 2: Convert to equivalent and efficient form (cont.) e.g.4 [projection] last attribute (A [attribute_list_1] ) [attri_2] A [attri_2] e.g.5 [restriction first] (A [attri_1]) where rest_1 (A where rest _1) [attri_1] n1<n n+n1 .

Step 2: Convert to equivalent and efficient form (cont.) e.g.6 [Introduce extra restriction] SP JOIN (P WHERE P.P#= 'P2') e.g.7 [Semantic transformation] (SP WHERE SP.P# = 'P2') JOIN (P WHERE P.P# = 'P2') (SP join P ) [S#] SP[S#] Note: a very significant improvement. Ref.[17.27] P.571 J. J. King, VLDB81 sp.p# = p.p# if restriction on join attribute sp.p# = p.p# SP P P# P1 P2 P3 P4 P5 S# QTY S1 S2 if SP.P# is a foreign key matching the primary term P.P#

Step 3: Choose candidate low-level procedures e.g. Join, restriction are low-level operators there will be a set of procedures for implementing each operator, e.g. Join (ref p.11-31) <1> Nested Loop (a brute force) <2> Index lookup (if one relation is indexed on join attribute) <3> Hash lookup (if one relation is hashed by join attribute) <4> Merge (if both relations are indexed on join attribute) .

Step 3: Choose candidate low-level procedures (cont.) SQL Algebra Data flow Canonical Form e.g. System catalog ( (C I)) existence of indexes cardinalities of relations Optimizer step3 : access path selection . . Lib predefined low-level procedures Ref. p.11-31 p.554 6 choose 2 One or more candidate procedures for each operator e.g. , , 2 3 2 Step4

Step 4: Generate query plans and choose the cheapest is built by combing together a set of candidate implementation procedures for any given query many many reasonable plans Note: may not be a good idea to generate all possible plans. heuristic technique "keep the set within bound" (reducing the search space)

Step 4: Generate query plans and choose the cheapest (cont.) Data flow output of step 3 Step 4(a) Step 4(b) choose the cheapest cheapest query plans 2 3 2 ( (C I)) ... 1 2 2 1

Step 4: Generate query plans and choose the cheapest (cont.) Choosing the cheapest require a method for assigning a cost to any given plan. factor of cost formula: (1) # of disk I/O (2) CPU utilization (3) size of intermediate results a difficult problem [Jarke 84, 17.3. p.564 ACM computing surveys] [Yao 79, 17.8 TODS] .

11.3 Optimization in System R

Optimization in System R Only minor changes to DB2 and SQL/DS. Query in System R (SQL) is a set of "select-from-where" block System R optimizer step1: choosing block order first in case of nested => innermost block first step2: optimizing individual blocks Note: certain possible query plan will never be considered. The statistical information for optimizer Where: from the system catalog What: 1. # of tuples on each relation 2. # of pages occupied by each relation. 3. percentage of pages occupied by each relation. 4. # of distinct data values for each index. 5. # of pages occupied by each index. . Note: not updated every time the database is updated. (overhead??)

Optimization in System R (cont.) Given a query block case 1. involves just a restriction and/or projection 1. statistical information (in catalog) 2. formulas for size estimates of intermediate results. 3. formulas for cost of low-level operations (next section) choose a strategy for constructing the query operation. case 2. involves one or more join operations e.g. A join B join C join D ((A join B) join C) join D Never: (A join B) join (C join D) Why? See next page

Optimization in System R (cont.) ((A join B) join C) join D Never: (A join B) join (C join D) Note: 1. "reducing the search space" 2. heuristics for choosing the sequence of joins are given in [17.34] P.573 3. (A join B) join C not necessary to compute entirely before join C i.e. if any tuple has been produced It may never be necessary to finish relation "A B ", why ? pass to join C ∵ C has run out ??

Optimization in System R (cont.) How to determine the order of join in System R ? consider only sequential execution of multiple join. <e.g.> ((A B) C) D (A B) (C D) × STEP1: Generate all possible sequences <e.g.> (1) ((A B) C) D (2) ((A B) D) C (3) ((A C) B) D (4) ((A C) D) B (5) ((A D) B) C (6) ((A D) C) B (7) ((B C) A) D (8) ((B C ) D) A (9) ((B D) A) C (10) ((B D) C) A (11) ((C D) A) B (12) ((C D) B) A Total # of sequences = ( 4! )/ 2 = 12

Optimization in System R (cont.) STEP 2: Eliminate those sequences that involve Cartesian Product if A and B have no attribute names in common, then A B = A x B STEP 3: For the remainder, estimate the cost and choose a cheapest.

11.4 Optimization in INGRES

Query Decomposition a general idea for processing queries in INGRES. basic idea: break a query involving multiple tuple variables down into a sequence of smaller queries involving one such variable each, using detachment and tuple substitution. avoid to build Cartesian Product. keep the # of tuple to be scanned to a minimum. <e.g> "Get names of London suppliers who supply some red part weighing less than 25 pounds in a quantity greater than 200" Initial query: Q0: RETRIEVE (S.SNAME) WHERE S.CITY= 'London' AND S.S# = SP.S# AND SP.QTY > 200 AND SP.P# = P.P# AND P.COLOR = Red AND P.WEIGHT < 2 5 detach P

Query Decomposition (cont.) D1: RETRIEVE INTO P' (P.P#) WHERE P.COLOR= 'Red' AND P.WEIGHT < 25 Q1: RETRIVE (S.SNAME) WHERE S.CITY = 'London' AND S.S# = SP.S# AND SP.QTY > 200 AND SP.P# = P'.P# S join SP join P’ detach SP D2: RETRIEVE INTO SP' (SP.S#, SP.P#) WHERE SP.QTY > 200 Q2: RETRIEVE (S.SNAME) WHERE S.CITY = 'London' AND S.S#=SP'.S# AND SP'.P#=P'.P# detach S

Query Decomposition (cont.) D3: RETRIEVE INTO S' (S.S#, S.SNAME) WHERE S.CITY = 'LONDON' Q3: RETRIEVE (S'.SNAME) WHERE S'.S# =SP'.S# AND SP'.P# = P'.P# D4: RETRIEVE INTO SP"(SP'.S#) WHERE SP'.P# =P'.P# Q4: RETRIEVE (S'.SNAME) WHERE S'.S# = SP".S# D5: RETRIEVE INTO SP"(SP'.S#) WHERE SP'.P# = 'P1' OR SP'.P#= 'P3‘ Q5: RETRIEVE (S'.SNAME) WHERE S'.S# = 'S1' OR S'.S# = 'S2' OR S'.S# = 'S4' detach P' and SP' D4: two var. --> tuple substitution ( Suppose D1 evaluate to {P1, P3 } Q4 : two var. --> tuple substitution ( Suppose D5 evaluate to { S1, S2, S4})

Query Decomposition (cont.) Decomposition tree for query Q0: D1, D2, D3: queries involve only one variable => evaluate D4, Q4: queries involve tow variable => tuple substitution Objectives : avoid to build Cartesian Product. keep the # of tuple to be scanned to a minimum. Overall result Q4 (Q5) S' SP'' D3 D4 (D5) P' SP' S D1 D2 P SP

11.5 Implementing the Join Operators Method 1: Nested Loop Method 2: Index Lookup Method 3: Hash Lookup Method 4: Merge back

Join Operation Suppose R S is required, R.A and S.A are join attributes. R A a b . e . . . . . . . . . . 1 m S n

Method 1: Nested Loop Suppose R and S are not sorted on A. b . e . . . . . . . . . . 1 m n O (mn) the worst case assume that S is neither indexed nor hashed on A will usually be improved by constructing index or hash on S.A dynamically and then proceeding with an index or hash lookup scan.

Method 2: Index Lookup Suppose S in indexed on A R S.A_index S . . . . . . . . . . A b a . . . . . . . . . . . A a . . 1 n 1 a b a . e . . . . m b . . . .

Calculate hash function is faster than search in index. Method 3: Hash Lookup Suppose S is hashed on A. R.A a b z . e . . . . . . . . . . 1 m R S h(a) h(e) h(e) = 1 h(b) = 0 h(a) = 2 h(z) = 2 2 . . . . . . . . .. . . . . . S.A Calculate hash function is faster than search in index.

Method 4: Merge Suppose R and S are both sorted (for indexed) on A. = c z . . . . . . . . . . 1 m . R d S n = A b a z c . . . . . . . . . . . d R S Only index is retrieved for any unmatched tuple. Back to p.11-4