Query Optimizer (Chapter 9.0 - 9.6)

Optimization
– Minimizes use of resources by choosing the best of the alternative query access plans
– Considers I/O cost and CPU cost
– Gathers statistics, which may become out of date (DB2: RUNSTATS)
– Selectivity of a value = 1/(number of distinct values in the domain); used to estimate the number of tuples having each value

Filter Factor (selectivity)
– Fraction of rows with the specified value(s) for the specified attribute that survive the predicate restriction:
  $FF(c) = \frac{\#\ \text{records satisfying } c}{\#\ \text{records in } R}$
– Estimate for an attribute with $i$ distinct values: $\frac{|R|/i}{|R|} = \frac{1}{i} = \frac{1}{\text{col\_cardinality}}$
– e.g. $\frac{10{,}000/2}{10{,}000} = \frac{1}{2}$
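A small sketch, assuming a hypothetical 10,000-row relation whose `gender` attribute takes 2 values in equal proportion, compares the exact FF(c) with the 1/col_cardinality estimate. The function names are illustrative only; a real optimizer reads the cardinality from catalog statistics rather than counting rows.

```python
from fractions import Fraction


def estimated_ff_equals(col_cardinality: int) -> Fraction:
    """Uniform-distribution estimate: each of the i distinct values appears |R|/i times."""
    return Fraction(1, col_cardinality)


def actual_ff(rows: list[dict], col: str, value) -> Fraction:
    """Exact FF(c) = (# records satisfying c) / (total # records in R)."""
    matching = sum(1 for r in rows if r[col] == value)
    return Fraction(matching, len(rows))


# Hypothetical relation: 10,000 rows, 'gender' has 2 distinct values
rows = [{"gender": "F" if i % 2 == 0 else "M"} for i in range(10_000)]
cardinality = len({r["gender"] for r in rows})

print(estimated_ff_equals(cardinality))   # 1/2
print(actual_ff(rows, "gender", "F"))     # 1/2
```

The two agree here only because the hypothetical data is exactly uniform; skewed data is where the estimate drifts.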

Filter Factor FF
– FF tells how many tuples satisfy a predicate; ideally only those tuples (plus the index) need to be accessed
– Statistical assumptions: column values are uniformly distributed, and the joint distribution of values from any 2 columns is independent

Assumptions
– Attribute values are independent
– Conjunctive selection C1 AND C2 (independent): FF = FF(C1) * FF(C2)
– e.g. 1/2 (gender) * 1/4 (class) = 1/8 of CS students are female freshmen
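Under the independence assumption the FF of a conjunction is just the product of the individual FFs. A minimal sketch, with `conjunctive_ff` as a hypothetical helper name, reproduces the gender/class example:

```python
from fractions import Fraction
from math import prod


def conjunctive_ff(*ffs: Fraction) -> Fraction:
    """FF(C1 AND C2 AND ...) assuming independent attributes: multiply the FFs."""
    return prod(ffs, start=Fraction(1))


ff_gender = Fraction(1, 2)   # 2 values
ff_class = Fraction(1, 4)    # 4 class years; freshman is one of them
print(conjunctive_ff(ff_gender, ff_class))   # 1/8
```

If the attributes were correlated (for example, if most freshmen in the relation were female), the true fraction would differ from this product.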

Information for Optimization
1. SYSCOLUMNS: column name, table name, number of distinct values, High and Low values
2. Cluster ratio: how well the clustering property holds for the rows with respect to a given index
   – 100% when the table has just been clustered; with updates it becomes less clustered
   – if the cluster ratio is 80% or more, use sequential prefetch
3. Statistics on columns whose values deviate strongly from the uniform assumption

Examples of FF
If the SQL statement specifies:
– col = const: DB2 assumes FF = 1/col_cardinality
– col between const1 and const2: DB2 assumes FF = (const2 - const1)/(High - Low)
Predicates involving non-correlated subselects can be used for index retrieval, but their FF is not predictable by a simple formula.
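The range formula is a linear interpolation between the catalog's High and Low values. A short sketch, assuming a hypothetical `age` column with Low = 18 and High = 68 and integer bounds:

```python
from fractions import Fraction


def ff_between(const1: int, const2: int, low: int, high: int) -> Fraction:
    """Estimate for 'col between const1 and const2': (const2 - const1)/(High - Low).

    Assumes values are spread uniformly between the catalog's Low and High.
    """
    return Fraction(const2 - const1, high - low)


# Hypothetical: age between 20 and 30, catalog Low = 18, High = 68
print(ff_between(20, 30, 18, 68))   # 1/5
```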

Explain Plan
In ORACLE you can see the query plan with the statement EXPLAIN PLAN FOR sql_query; it shows the access type (e.g. index) and the columns used

Plans using Indexes
An index can be used if it matches a selection condition in the WHERE clause:
– Matching index scan: only a limited number of contiguous leaf entries have to be accessed to reach the data
– Predicate screening: index entries are used to eliminate RIDs
– Non-matching index scan: the index is used to identify RIDs
– Index-only retrieval: the data does not have to be accessed through RIDs
– Multiple index retrieval: more than one index is used to identify RIDs

Matching index scan
When is a matching index scan used? Assume a table T1 with multiple indexes on columns C1, C2 and C3.
1. A single WHERE predicate, and (one) index matches it:
   Select * from T1 where C1=10
   – search the B+-tree down to the leaf level for the leftmost entry having the specified value
   – useful for =, between

Index Scan used
2. Multiple WHERE predicates, all '=':
   Select * from T1 where C1=10 and C2=5 and C3=1
   a) if there is a separate index for each predicate, one of the indexes must be chosen
   b) if there is a composite index and the selection conditions match all index columns, only contiguous leaf pages have to be read; FF = FF(P1) * FF(P2) * ...
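To see why a composite index turns equality predicates into one contiguous run of leaf entries, here is a sketch that models the leaf level of a hypothetical composite index on (C1, C2, C3) as a sorted Python list of (key, RID) pairs and uses `bisect` in place of the B+-tree descent. The table contents and the `matching_scan` helper are assumptions for illustration, not DB2 internals.

```python
import bisect

# Hypothetical table T1: one row per (C1, C2, C3) combination, RID = list position.
rows = [(c1, c2, c3) for c1 in range(20) for c2 in range(10) for c3 in range(3)]

# Leaf level of a hypothetical composite index on (C1, C2, C3): sorted (key, RID) entries.
leaf = sorted((key, rid) for rid, key in enumerate(rows))
keys = [key for key, _ in leaf]


def matching_scan(prefix):
    """Equality predicates on the leading index columns given in 'prefix'.

    Binary search finds the leftmost entry with this prefix (the tree descent);
    all qualifying entries then lie in one contiguous slice of the leaf level.
    Assumes integer key values so the upper bound can be formed by adding 1.
    """
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_left(keys, prefix[:-1] + (prefix[-1] + 1,))
    return [rid for _, rid in leaf[lo:hi]]


print(matching_scan((10, 5, 1)))    # C1=10 and C2=5 and C3=1 -> [316]
print(len(matching_scan((10,))))    # C1=10 -> 30
```

With 600 rows and FF = 1/20 * 1/10 * 1/3 = 1/600, the product rule predicts exactly the single row found.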

Index Scan used
3. All selection conditions match composite index columns, and some of them are ranges:
   Select * from T1 where C1=10 and C2 between 5 and 50
   – after a range predicate, the entries that satisfy predicates on later index columns are not all on contiguous leaf pages
   – if index entries must be examined to determine whether they belong in the result, this is called predicate screening

Predicate screening
– RIDs are discarded based on the values stored in the index entries
– fewer tuples are accessed, because the index values eliminate candidate RIDs before the rows are fetched
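A sketch of screening, assuming a hypothetical composite index on (C1, C2, C3) and a hypothetical query `C1=10 and C2 between 5 and 8 and C3=1` (the C3 predicate is added so that something remains to screen after the range). The matching part reads one contiguous range; C3 is then checked on the index keys, so rejected RIDs never cause a data-row fetch.

```python
import bisect

# Hypothetical table T1 and composite index on (C1, C2, C3): sorted (key, RID) entries.
rows = [(c1, c2, c3) for c1 in range(20) for c2 in range(10) for c3 in range(3)]
leaf = sorted((key, rid) for rid, key in enumerate(rows))
keys = [key for key, _ in leaf]


def range_then_screen(c1, c2_low, c2_high, c3):
    """Match C1 = c1 and C2 between c2_low and c2_high (contiguous leaf range),
    then screen C3 = c3 using only the index entries (integer keys assumed)."""
    lo = bisect.bisect_left(keys, (c1, c2_low))
    hi = bisect.bisect_left(keys, (c1, c2_high + 1))
    scanned = leaf[lo:hi]
    # Screening: drop RIDs whose C3 value fails, before any data row is read.
    rids = [rid for key, rid in scanned if key[2] == c3]
    return rids, len(scanned)


rids, scanned = range_then_screen(10, 5, 8, 1)
print(scanned, len(rids))   # 12 4
```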

Index Scan used
4. Selection conditions match only some columns of a composite index:
   Select * from T1 where C1=10 and C2=30 and C6=20
   – a matching scan can be used if at least one of the columns in the selection is the first column of the index
   – eliminate tuples with whatever index columns you can, then examine the tuples themselves for the remaining conditions

Rules for predicate matching
Decide how many columns of a composite index, after the first, to match, so that only a small contiguous range of leaf entries in the B+-tree is read to get RIDs. Match the first column of the composite index, then:
– look at the index columns from left to right
– the match ends when no predicate is found for a column
– if a column has a range predicate (<=, like, between), matching terminates after that column
With a range, it is easier to scan all entries in the range and treat predicates on the remaining columns as screening predicates.
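The left-to-right rule can be written as a short function. This is a sketch under simplifying assumptions: each predicate is tagged only as `"eq"` or `"range"`, and `match_columns` is a hypothetical name. Predicates on columns outside the index (like C6) are neither matching nor screening; they are checked on the data rows.

```python
def match_columns(index_columns, predicates):
    """Split predicates into matching and screening columns for a composite index.

    index_columns: index columns in key order, e.g. ["C1", "C2", "C3"].
    predicates: dict column -> "eq" or "range" (range = <, <=, like, between, ...).
    Returns (matching_columns, screening_columns).
    """
    matching = []
    for col in index_columns:
        kind = predicates.get(col)
        if kind is None:        # no predicate on this column: the match ends
            break
        matching.append(col)
        if kind == "range":     # a range predicate terminates matching after it
            break
    # Matched columns form a prefix; indexed predicates after it become screening.
    screening = [c for c in index_columns[len(matching):] if c in predicates]
    return matching, screening


print(match_columns(["C1", "C2", "C3"], {"C1": "eq", "C2": "range", "C3": "eq"}))
# (['C1', 'C2'], ['C3'])
print(match_columns(["C1", "C2", "C3"], {"C2": "eq", "C3": "eq"}))
# ([], ['C2', 'C3'])  no leading column, so only a non-matching scan is possible
```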

Non-matching index scan
The columns in the WHERE clause don't include the first (leading) column of the index:
   Select * from T1 where C2=30 and C3=15
– search the leaf entries of the index and compare the values in each entry
– all leaf pages must be read to find the C2, C3 values, but this can still be far cheaper than reading the data, e.g. 50 index pages vs 500,000 data pages
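A sketch of a non-matching scan, again modelling a hypothetical composite index on (C1, C2, C3) as a sorted list of (key, RID) entries. Without a predicate on C1 there is no starting key for a tree descent, so every leaf entry is read and tested; only the surviving RIDs would lead to data-page reads.

```python
# Hypothetical table T1 and composite index on (C1, C2, C3): sorted (key, RID) entries.
rows = [(c1, c2, c3) for c1 in range(20) for c2 in range(10) for c3 in range(3)]
leaf = sorted((key, rid) for rid, key in enumerate(rows))


def non_matching_scan(c2, c3):
    """C2 = c2 and C3 = c3 with no predicate on the leading column C1:
    read every leaf entry and compare the key values."""
    return [rid for (k1, k2, k3), rid in leaf if k2 == c2 and k3 == c3]


rids = non_matching_scan(5, 1)
print(len(leaf), len(rids))   # 600 20
```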

Index-only retrieval
The columns retrieved in the SELECT clause are all columns of a composite index, so the rows (actual data) don't need to be accessed:
   Select C1, C3 from T1 where C1=5 and C3 between 2 and 5
   Select count(*) from T1
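Both example queries can be answered from index entries alone. The sketch assumes a hypothetical composite index on (C1, C2, C3) over a hypothetical T1; the helper names are illustrative. Note that `rows` is only used to build the index, and neither function reads it.

```python
import bisect

# Hypothetical table T1 and composite index on (C1, C2, C3): sorted (key, RID) entries.
rows = [(c1, c2, c3) for c1 in range(10) for c2 in range(4) for c3 in range(8)]
leaf = sorted((key, rid) for rid, key in enumerate(rows))
keys = [key for key, _ in leaf]


def select_c1_c3(c1, c3_low, c3_high):
    """Select C1, C3 from T1 where C1 = c1 and C3 between c3_low and c3_high,
    answered from index keys only (duplicates kept, as SQL does without DISTINCT)."""
    lo = bisect.bisect_left(keys, (c1,))
    hi = bisect.bisect_left(keys, (c1 + 1,))
    return [(k[0], k[2]) for k, _ in leaf[lo:hi] if c3_low <= k[2] <= c3_high]


def count_star():
    """Select count(*) from T1: one index entry per row, so count the entries."""
    return len(leaf)


print(len(select_c1_c3(5, 2, 5)), count_star())   # 16 320
```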

Multiple Index Access
If the WHERE clause has conjunctive conditions (and), more than one index can be used:
– extract the RIDs satisfying the matching predicate from each index
– intersect the RID lists from each index (and them)
– the final list satisfies all indexed predicates

Multiple Index Access
– If the conditions are disjunctive (or): take the union of the RID lists
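A minimal sketch of combining RID lists, assuming two hypothetical single-column indexes have already produced the lists shown. The result is returned sorted, on the assumption that RIDs order rows by page, so the final fetch can proceed page by page.

```python
# Hypothetical RID lists extracted from two separate indexes.
rids_from_c1_index = [3, 8, 15, 21, 42]   # RIDs with C1 = 10
rids_from_c2_index = [8, 9, 21, 50]       # RIDs with C2 = 5


def and_access(*rid_lists):
    """Conjunctive predicates: keep RIDs present in every list (intersection)."""
    result = set(rid_lists[0])
    for rids in rid_lists[1:]:
        result &= set(rids)
    return sorted(result)


def or_access(*rid_lists):
    """Disjunctive predicates: keep RIDs present in any list (union)."""
    return sorted(set().union(*rid_lists))


print(and_access(rids_from_c1_index, rids_from_c2_index))   # [8, 21]
print(or_access(rids_from_c1_index, rids_from_c2_index))    # [3, 8, 9, 15, 21, 42, 50]
```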

Query optimizer rules for RIDs (DB2)
1. The predicted number of active resulting RIDs must not exceed 50% of the RID pool
2. Any single RID list is limited to the size of the RID memory pool (16M RIDs)
3. A RID list cannot be generated by screening predicates

Rules cont'd
The optimizer determines the point of diminishing returns for multiple index access:
1. List the indexes with matching predicates in the WHERE clause
2. Order the indexes by increasing filter factor
3. For each successive index, extract its RID list only if this reduces the cost per final row returned
e.g. there is no sense reading hundreds of pages of another index just to get the result down to 1 tuple
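The ordering and stopping steps can be sketched as a greedy loop. The real optimizer compares costs; here a hypothetical `min_rows_gain` threshold (rows an extra index must still eliminate) stands in for that cost test, and the index names and FFs are made up for illustration.

```python
from fractions import Fraction


def plan_multiple_index_access(total_rows, index_ffs, min_rows_gain):
    """Order indexes by increasing FF; keep adding one while it still removes
    at least 'min_rows_gain' rows from the estimated result.

    index_ffs: dict index name -> FF of its matching predicate.
    min_rows_gain: hypothetical stand-in for the optimizer's cost comparison.
    """
    chosen, estimate = [], Fraction(total_rows)
    for name, ff in sorted(index_ffs.items(), key=lambda item: item[1]):
        new_estimate = estimate * ff
        if estimate - new_estimate < min_rows_gain:
            break   # diminishing returns: later indexes (higher FF) gain even less
        chosen.append(name)
        estimate = new_estimate
    return chosen, estimate


ffs = {"idx_c": Fraction(1, 2), "idx_a": Fraction(1, 1000), "idx_b": Fraction(1, 20)}
chosen, estimate = plan_multiple_index_access(1_000_000, ffs, min_rows_gain=100)
print(chosen, estimate)   # ['idx_a', 'idx_b'] 50
```

Here idx_c would only shrink the estimate from 50 to 25 rows, so its RID list is not extracted.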

Example with Multiple Indexes
Table prospects: 50M rows
Indexes:
– zipcode: 100,000 values
– hobby: 100 values
– age: 50 values
– incomeclass: 10 values

Example with Multiple Indexes
Select name, stradr from prospects
where zipcode between ... and ... and age = 40 and hobby = 'chess' and incomeclass = 10;
FF in ascending order:
1. FF(zipcode) = 500/100,000 = 1/200 (the zipcode range covers 500 values)
2. FF(hobby) = 1/100
3. FF(age) = 1/50
4. FF(incomeclass) = 1/10

Example
(1) 50,000,000/200 = 250,000
(2) 250,000/100 = 2,500
(3) 2,500/50 = 50
(4) 50/10 = 5
How much time will this take? Is it cost effective to use all of these indexes? See textbook p. 579.
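The four estimates come from applying each FF in turn under the independence assumption. This sketch only reproduces that row-count arithmetic for the prospects table; it does not model I/O time, so it does not answer the cost questions above.

```python
from fractions import Fraction

# Numbers from the prospects example; FFs in ascending order.
total_rows = 50_000_000
ffs = [
    ("zipcode", Fraction(500, 100_000)),   # range covering 500 of 100,000 zipcodes
    ("hobby", Fraction(1, 100)),
    ("age", Fraction(1, 50)),
    ("incomeclass", Fraction(1, 10)),
]

estimates = []
estimate = Fraction(total_rows)
for column, ff in ffs:
    estimate *= ff   # independence assumption: multiply the filter factors
    estimates.append(estimate)
    print(f"after {column}: {estimate} rows")
# after zipcode: 250000 rows
# after hobby: 2500 rows
# after age: 50 rows
# after incomeclass: 5 rows
```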