Relational Database Performance CSCI 6442 Copyright 2013, David C. Roberts, all rights reserved.


“Optimization”
More properly called access path selection
“Optimizer” selects a strategy for processing
Approaches:
◦ Cost-based: estimate the total cost of processing by different approaches, choose the lowest estimate
◦ Heuristic: use rules to decide how to process
Cost-based selection is used by most database systems today

RUNSTATS
RUNSTATS is the name of a statistics-gathering utility first included in IBM’s DB2
It scans the database and gathers statistics used for estimating costs for access path selection
The DBA determines how often and when to run the utility
What statistics do you think are gathered?

Quandary
The more that RUNSTATS collects, the better job the optimizer can do of selecting efficient processing methods
However, RUNSTATS uses a lot of resources, scanning every relation
Use of RUNSTATS must be balanced against its cost
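As one possible answer to the question above, here is a minimal sketch of the kind of per-column statistics a full scan could collect. The function name and the choice of statistics (row count, distinct values, low and high values) are assumptions made for illustration; this is not a description of what DB2's RUNSTATS actually gathers.

```python
def gather_column_stats(rows, column):
    """Scan every row once and collect simple statistics for one column.

    Hypothetical helper: the statistics chosen here are illustrative only.
    """
    values = [row[column] for row in rows]
    return {
        "cardinality": len(values),    # number of rows in the table
        "distinct": len(set(values)),  # number of distinct values in the column
        "low": min(values),            # lowest value seen
        "high": max(values),           # highest value seen
    }


# A tiny stand-in for the EMP table.
emp = [
    {"ENAME": "SMITH", "SAL": 10},
    {"ENAME": "JONES", "SAL": 30},
    {"ENAME": "KING", "SAL": 50},
    {"ENAME": "FORD", "SAL": 30},
]
sal_stats = gather_column_stats(emp, "SAL")
```

Note that the scan touches every row, which is the cost side of the quandary: richer statistics mean more work per run.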

The Optimizer
Selects which indexes to use
Chooses the order of using indexes
Chooses algorithms to use
Decides when to apply predicates

SQL Statement Parts of Interest
Simple query:
  SELECT ENAME, JOB
  FROM EMP
  WHERE SAL > 20 OR JOB = 'VP' OR JOB LIKE 'PRES%';
We’re interested in the FROM clause (which tells us the table names) and the WHERE clause (which tells us the predicates)

SINGLE-TABLE QUERIES

Predicates
The WHERE clause of a SQL statement is made up of predicates
Each predicate is a condition
Each condition references a column
Conditions may be equality, inequality, range, LIKE
The first three kinds of condition use an index if one exists, and scan the table if no index exists

Example
  SAL > 20 OR JOB = 'VP' OR JOB LIKE 'PRES%';
For each predicate, do we use an index to retrieve rows that make it true, then examine each row for the other predicates?

Predicate Selectivity
Selectivity: an estimate of the fraction of rows of a table that make a predicate true

Classes of Predicates
Predicate: condition in the WHERE clause
Predicates are combined using AND, OR to make WHERE clauses
Classes of predicates:
◦ Sargable: search arguments that can be processed close to the data
◦ Residual: not sargable, such as complex use of nesting

Access Paths
Five possible access paths:
◦ Table scan
◦ Non-selective index scan
◦ Selective index scan
◦ Index only access
◦ Fully qualified unique index
Each of these types of scans has different cost estimates for its use

Predicate Selectivity
Selectivity function f(p): fraction of rows retrieved on average by predicate p
Number of rows retrieved is strongly related to the cost to carry out the operation
n = number of rows in table

Form of p              f(p)
column = value         1/n
column != value        1 - 1/n (nearly 1)
column > value         (high value - search value) / (high value - low value)
column LIKE 'value'    n
p1 or p2               f(p1) + f(p2)
p1 and p2              f(p1) * f(p2)
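The selectivity formulas in the table can be written as small functions. This is a sketch that follows the table as given; the function names are hypothetical, the LIKE row is left out, and capping the OR estimate at 1.0 is an added assumption so that the result stays a valid fraction.

```python
def sel_equal(n):
    """column = value: one row out of n."""
    return 1.0 / n


def sel_not_equal(n):
    """column != value: every row except one."""
    return 1.0 - 1.0 / n


def sel_greater(search, low, high):
    """column > value: share of the value range above the search value."""
    return (high - search) / (high - low)


def sel_or(f1, f2):
    """p1 or p2: sum of the two estimates (the cap at 1.0 is an assumption)."""
    return min(1.0, f1 + f2)


def sel_and(f1, f2):
    """p1 and p2: product of the two estimates."""
    return f1 * f2


# Worked example: 1000 rows, SAL values ranging from 0 to 100.
n = 1000
f_sal = sel_greater(search=20, low=0, high=100)   # SAL > 20
f_job = sel_equal(n)                              # JOB = 'VP'
f_either = sel_or(f_sal, f_job)
f_both = sel_and(f_sal, f_job)
estimated_rows = f_both * n
```

Multiplying a selectivity by n turns the fraction back into an estimated row count, which is the quantity the cost estimate depends on.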

Single-Table Queries
Find out which columns that are referenced in the WHERE clause have indexes
Find out selectivity of indexes
Estimate selectivity of each predicate
Use most selective index-predicate combination to retrieve rows that satisfy one predicate
Examine each row for other predicates
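The steps above can be sketched as follows, assuming the predicates are combined with AND and that a selectivity estimate is already attached to each one. All names and the sample data are hypothetical, and a plain list filter stands in for the index lookup.

```python
def pick_driving_predicate(predicates, indexed_columns):
    """Return the most selective predicate whose column has an index, or None."""
    usable = [p for p in predicates if p["column"] in indexed_columns]
    return min(usable, key=lambda p: p["selectivity"]) if usable else None


def run_single_table_query(rows, predicates, indexed_columns):
    """Fetch rows through the driving predicate, then check the other predicates."""
    driver = pick_driving_predicate(predicates, indexed_columns)
    # A list filter stands in for the index lookup; a real system would
    # follow the index instead of touching every row.
    fetched = [r for r in rows if driver["test"](r)] if driver else rows
    others = [p for p in predicates if p is not driver]
    return [r for r in fetched if all(p["test"](r) for p in others)]


emp = [
    {"ENAME": "KING", "JOB": "VP", "SAL": 50},
    {"ENAME": "FORD", "JOB": "CLERK", "SAL": 30},
    {"ENAME": "HALL", "JOB": "VP", "SAL": 10},
]
predicates = [
    {"column": "SAL", "selectivity": 0.8, "test": lambda r: r["SAL"] > 20},
    {"column": "JOB", "selectivity": 0.1, "test": lambda r: r["JOB"] == "VP"},
]
driver = pick_driving_predicate(predicates, {"SAL", "JOB"})
matches = run_single_table_query(emp, predicates, {"SAL", "JOB"})
```

With both columns indexed, the JOB predicate drives the retrieval because its estimated selectivity is lower, and the SAL predicate is only checked on the rows that come back.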

MULTI-TABLE QUERIES (I.E., JOINS)

Join
The result of a join is a subset of the Cartesian product of the tables being joined
The Cartesian product of two tables with m and n rows is a new table of mn rows, where every row of the product consists of one row of the first table and one row of the second table

Example Join [figure]

Simple Join Processing Algorithm
1. Form the Cartesian product of all tables involved in the join
2. Scan rows of the Cartesian product, testing each against all of the predicates
3. Eliminate rows of the Cartesian product that don’t meet the predicates
What’s wrong with this picture?
Think about two tables of 1 million rows each. The Cartesian product would be one trillion (10^12) rows!
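A minimal sketch of the simple algorithm makes the problem visible: the number of rows that must be built and tested is always m times n, no matter how few rows survive. The function name and the sample tables are assumptions for illustration.

```python
from itertools import product


def cartesian_join(left, right, predicate):
    """Join by forming the full Cartesian product and then filtering it.

    Returns the joined rows and the number of candidate pairs examined.
    """
    pairs = list(product(left, right))                    # step 1: Cartesian product
    kept = [(l, r) for l, r in pairs if predicate(l, r)]  # steps 2 and 3: test, eliminate
    return kept, len(pairs)


emp = [
    {"ENAME": "KING", "DEPTNO": 10},
    {"ENAME": "FORD", "DEPTNO": 20},
    {"ENAME": "HALL", "DEPTNO": 10},
]
dept = [
    {"DEPTNO": 10, "DNAME": "SALES"},
    {"DEPTNO": 20, "DNAME": "RESEARCH"},
]
joined, examined = cartesian_join(emp, dept, lambda e, d: e["DEPTNO"] == d["DEPTNO"])

# The same arithmetic for two tables of one million rows each.
big_product_rows = 1_000_000 * 1_000_000
```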

Joining More than 2 Relations
A join of more than two relations is processed 2 relations at a time
Part of access path planning is to select that sequence
We will talk about algorithms for joining 2 tables and then choosing the order of processing a multi-table join

Joins
An equijoin is based on equality of an attribute of each of two relations
A full outer join includes all rows of both tables, even rows that have no matching value in the other table
A theta join (non-equijoin) can use an inequality as the join condition

Join-Processing Algorithms
Nested loop join
◦ Each tuple of the outer relation is compared to all rows of the inner relation
Sort-merge join
Hash-based join

Nested-Loop Join
The algorithm: [figure]
For efficiency, the relation with higher cardinality (R) is chosen as the inner relation
Number of operations: N_R + N_R * N_S
What if there is an index?

Nested-Loop [figure]
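The nested-loop join described above can be sketched as two loops, with a counter showing that every outer row is compared against every inner row. The function name, the key arguments, and the sample tables are assumptions for illustration, and the sketch is an equijoin on a single column.

```python
def nested_loop_join(outer, inner, outer_key, inner_key):
    """Compare each row of the outer relation with every row of the inner one."""
    result = []
    comparisons = 0
    for o in outer:
        for i in inner:
            comparisons += 1
            if o[outer_key] == i[inner_key]:
                result.append({**o, **i})  # concatenate the matching rows
    return result, comparisons


dept = [
    {"DEPTNO": 10, "DNAME": "SALES"},
    {"DEPTNO": 20, "DNAME": "RESEARCH"},
]
emp = [
    {"ENAME": "KING", "DEPTNO": 10},
    {"ENAME": "FORD", "DEPTNO": 20},
    {"ENAME": "HALL", "DEPTNO": 10},
    {"ENAME": "WARD", "DEPTNO": 30},
]
joined, comparisons = nested_loop_join(dept, emp, "DEPTNO", "DEPTNO")
```

If the inner relation had an index on the join column, the inner loop could be replaced by an index lookup per outer row, which is the point of the question on the slide.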

Join Order
For JOIN queries, the “outer” table is accessed first, the “inner” table second
Order for joining tables must be selected
Most selective first
Least costly joins first

Merge Join
First, each relation is sorted on the join attribute
Then both relations are scanned in the order of the join attributes
Tuples that satisfy the join predicate are concatenated and placed in the output relation
Number of operations: N_R + N_S (after the sort!)
What if there is an index on R or S or both?

Merge Join Algorithm [figure]

Merge Join [figure]
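A minimal sketch of the sort-merge join follows: sort both relations on the join attribute, then advance two cursors in step. The function name and sample data are assumptions; runs of duplicate key values are handled by pairing every matching row of R with the whole run of equal keys in S.

```python
def merge_join(r, s, key):
    """Equijoin two relations by sorting both on `key` and scanning them together."""
    r_sorted = sorted(r, key=lambda row: row[key])
    s_sorted = sorted(s, key=lambda row: row[key])
    result = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        rk, sk = r_sorted[i][key], s_sorted[j][key]
        if rk < sk:
            i += 1          # R is behind: advance R
        elif rk > sk:
            j += 1          # S is behind: advance S
        else:
            # Find the end of the run of equal keys in S.
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][key] == rk:
                j_end += 1
            # Pair every R row that has this key with the whole run in S.
            while i < len(r_sorted) and r_sorted[i][key] == rk:
                for s_row in s_sorted[j:j_end]:
                    result.append({**r_sorted[i], **s_row})
                i += 1
            j = j_end
    return result


emp = [
    {"ENAME": "WARD", "DEPTNO": 40},
    {"ENAME": "FORD", "DEPTNO": 20},
    {"ENAME": "KING", "DEPTNO": 10},
    {"ENAME": "HALL", "DEPTNO": 20},
]
dept = [
    {"DEPTNO": 30, "DNAME": "OPERATIONS"},
    {"DEPTNO": 10, "DNAME": "SALES"},
    {"DEPTNO": 20, "DNAME": "RESEARCH"},
]
joined = merge_join(emp, dept, "DEPTNO")
```

An index that already delivers a relation in join-attribute order would make the corresponding sort step unnecessary.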

Hash-Join
The joins we have looked at compare tuples in the first relation with tuples in the second relation even when they cannot possibly be part of the join
The goal of the hash join is to compare only those tuples that might be part of the join
Hashing is used to identify those tuples
There are many variations of hash-join

Simple Hash-Join Algorithm [figure]
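One simple in-memory variant of the hash join can be sketched with a dictionary: build a hash table on one relation, then probe it with each row of the other, so that only rows with the same join value are ever paired. The function name and sample data are assumptions, and the sketch assumes the build relation fits in RAM.

```python
from collections import defaultdict


def hash_join(build, probe, key):
    """Equijoin: hash the build relation on `key`, then probe with the other."""
    table = defaultdict(list)
    for b in build:                  # build phase: one pass over the build relation
        table[b[key]].append(b)
    result = []
    for p in probe:                  # probe phase: one lookup per probe row
        for b in table.get(p[key], []):
            result.append({**b, **p})
    return result


dept = [
    {"DEPTNO": 10, "DNAME": "SALES"},
    {"DEPTNO": 20, "DNAME": "RESEARCH"},
    {"DEPTNO": 30, "DNAME": "OPERATIONS"},
]
emp = [
    {"ENAME": "KING", "DEPTNO": 10},
    {"ENAME": "FORD", "DEPTNO": 20},
    {"ENAME": "HALL", "DEPTNO": 20},
    {"ENAME": "WARD", "DEPTNO": 40},
]
joined = hash_join(dept, emp, "DEPTNO")
```

Building on the smaller relation keeps the hash table small; that choice is a common design decision rather than something the slides specify.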

Hash-Join Performance
Performance of hash-join can be superior to that of the other join algorithms
Performance depends on the hashing algorithm (although note Lum’s research)
A perfect hashing algorithm could find a match or non-match with a single probe
With the hash table in RAM, processing would be very fast

Indexes
The impact of a B-tree index on the performance of these algorithms is obvious
But the index itself must be maintained
When an indexed attribute is changed in a relation, the value in the index must also be changed
And note that the changes must be synchronized (and locked together)

Order of Processing Joins
Typically, all combinations of order of processing are considered and a cost developed for each
Selectivity of predicates, selectivity of indexes, and cardinality of relations are all factors in cost analysis
Goal is to minimize the size of the intermediate results produced during processing
Usually, predicates with the lowest selectivity values (that is, the most selective predicates) are processed first
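A minimal sketch of considering every join order and costing each one is shown below. The cost model (the sum of estimated intermediate result sizes for a left-deep plan), the table cardinalities, and the join selectivities are all assumptions made for illustration, not the model of any particular DBMS.

```python
from itertools import permutations


def plan_cost(order, cardinality, join_selectivity):
    """Sum of estimated intermediate result sizes for one left-deep join order."""
    joined = [order[0]]
    rows = float(cardinality[order[0]])
    total = 0.0
    for table in order[1:]:
        rows *= cardinality[table]
        # Apply every join predicate that links the new table to one already joined;
        # a missing predicate means a Cartesian product (selectivity 1.0).
        for prev in joined:
            rows *= join_selectivity.get(frozenset((prev, table)), 1.0)
        joined.append(table)
        total += rows
    return total


def best_order(cardinality, join_selectivity):
    """Try every permutation of the tables and keep the cheapest estimate."""
    return min(permutations(cardinality),
               key=lambda order: plan_cost(order, cardinality, join_selectivity))


# Hypothetical statistics for three tables.
cardinality = {"EMP": 10000, "DEPT": 100, "LOC": 10}
join_selectivity = {
    frozenset(("EMP", "DEPT")): 1 / 100,
    frozenset(("DEPT", "LOC")): 1 / 10,
}
best = best_order(cardinality, join_selectivity)
best_cost = plan_cost(best, cardinality, join_selectivity)
worst_cost = plan_cost(("EMP", "LOC", "DEPT"), cardinality, join_selectivity)
```

In this example the cheapest orders join the two small tables first and bring in EMP last, while joining EMP to LOC first forces a Cartesian product and a much larger intermediate result. Trying all permutations grows factorially with the number of tables, so this brute-force form only suits small joins.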

Summary
Single-table queries
Multi-table queries
◦ Nested loop
◦ Sort-merge
◦ Hash
Order of processing joins

But Note: We have left out a LOT
Relations may be partitioned and joins processed by partition
Many other parts of the DBMS affect performance
If you are responsible for database performance, buy a book and dig in
Remember not to give up on normalization to get performance

And Now: What You Do for Performance

How to Start
First, don’t even consider denormalization
You have many tools to get the performance you need without ruining the data model (and the applications)
Performance test the applications
Look for SQL operations that are taking a long time

EXPLAIN
IBM invented the EXPLAIN utility; it explains the processing strategy for each WHERE clause
Run it for operations that are taking too long
Look for table scans and Cartesian product joins
Provide indexes to speed things up

EXPLAIN PLAN
Tells you the execution plan an Oracle database follows for a SQL statement
Inserts a row describing each step of the execution plan into a specified table
Determines total cost of execution
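The workflow of checking a plan, spotting a table scan, and adding an index can be tried out with SQLite, which ships with Python. SQLite is a stand-in here: its statement is `EXPLAIN QUERY PLAN`, which differs from the DB2 and Oracle facilities named above, and the exact wording of the plan text varies between SQLite versions. The table, index, and helper names are assumptions for illustration.

```python
import sqlite3


def plan_details(conn, sql):
    """Return the human-readable detail text of each step in SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (ENAME TEXT, JOB TEXT, SAL INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [("KING", "VP", 50), ("FORD", "CLERK", 30), ("HALL", "VP", 10)],
)

query = "SELECT ENAME FROM EMP WHERE JOB = 'VP'"

before = plan_details(conn, query)   # no index yet: the plan is a scan of EMP
conn.execute("CREATE INDEX EMP_JOB ON EMP (JOB)")
after = plan_details(conn, query)    # the plan now searches through EMP_JOB

for step in before + after:
    print(step)
```

Comparing the plan text before and after the `CREATE INDEX` shows the same effect the slides describe: the scan is replaced by an index search once a suitable index exists.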


Beyond EXPLAIN
There are many indexing options, and other options to control physical characteristics of the database
Learn about them, learn how to control them
But you will go very far with EXPLAIN and providing indexes

THANK YOU!