Relational Database Performance CSCI 6442 Copyright 2013, David C. Roberts, all rights reserved.


1 Relational Database Performance CSCI 6442 Copyright 2013, David C. Roberts, all rights reserved

2 “Optimization” More properly called access path selection “Optimizer” selects a strategy for processing Approaches: ◦ Cost-based: estimate total cost to process by different approaches, choose lowest estimate ◦ Heuristic: use rules to decide how to process Cost-based selection is used by most major database systems today

3 RUNSTATS RUNSTATS is the name of a statistics-gathering utility first included in IBM’s DB2 It scans the database, gathers statistics used for estimating costs for access path selection DBA determines how often and when to run the utility What statistics do you think are gathered?
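
As a rough illustration of what a statistics-gathering pass might record, here is a minimal Python sketch. It does not reproduce what RUNSTATS actually collects: the `ColumnStats` fields and the `gather_stats` helper are hypothetical, chosen because row count and high/low values feed the selectivity formulas later in the deck. The distinct-value count is an extra assumption.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    row_count: int   # n: number of rows in the table
    distinct: int    # number of distinct values in the column (assumed statistic)
    low: float       # lowest value in the column
    high: float      # highest value in the column


def gather_stats(rows, column):
    """Scan every row once and summarize one numeric column."""
    values = [row[column] for row in rows]
    return ColumnStats(len(values), len(set(values)), min(values), max(values))


# Illustrative data only.
emp = [
    {"ENAME": "SMITH", "SAL": 8},
    {"ENAME": "JONES", "SAL": 30},
    {"ENAME": "KING", "SAL": 50},
    {"ENAME": "FORD", "SAL": 30},
]
sal_stats = gather_stats(emp, "SAL")
print(sal_stats)
```

The full scan in `gather_stats` is the cost side of the trade-off: every extra statistic means more work per row scanned.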

4 Quandary The more that RUNSTATS collects, the better job the optimizer can do of selecting efficient processing methods However, RUNSTATS uses a lot of resources, scanning every relation Use of RUNSTATS must be balanced against its cost

5 The Optimizer Selects which indexes to use Chooses the order of using indexes Chooses algorithms to use Decides when to apply predicates

6 SQL Statement Parts of Interest Simple query: SELECT ENAME, JOB FROM EMP WHERE SAL > 20 OR JOB = 'VP' OR JOB LIKE 'PRES%'; We’re interested in the FROM clause (that tells us the table names) and the WHERE clause (that tells us the predicates)

7 SINGLE-TABLE QUERIES

8 Predicates WHERE clause of SQL statement is made up of predicates Each predicate is a condition Each condition references a column Conditions may be equality, inequality, range, LIKE The first three conditions use an index if one exists, scan the table if no index exists

9 Example SAL > 20 OR JOB = 'VP' OR JOB LIKE 'PRES%'; For each predicate, do we use an index to retrieve rows that make it true, then examine each row for the other predicates?

10 Predicate Selectivity Selectivity: an estimate of the fraction of rows of a table that make a predicate true

11 Classes of Predicates Predicate: condition in the WHERE clause Predicates are combined using AND, OR to make WHERE clauses Classes of predicates: ◦ Sargable: search arguments that can be processed close to the data ◦ Residual: not sargable, such as complex use of nesting

12 Access Paths Five possible access paths: ◦ Table scan ◦ Non-selective index scan ◦ Selective index scan ◦ Index only access ◦ Fully qualified unique index Each of these types of scans has different cost estimates for its use

13 Predicate Selectivity Selectivity function f(p): fraction of rows retrieved on average by predicate p Number of rows retrieved is strongly related to the cost to carry out the operation n = number of rows in table
Form of p               f(p)
column = value          1/n
column != value         1 - 1/n (nearly 1)
column > value          (high value - search value) / (high value - low value)
column LIKE 'value'     n
p1 or p2                f(p1) + f(p2)
p1 and p2               f(p1) * f(p2)
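
A small Python sketch of the selectivity formulas on this slide may help make them concrete. The function names are hypothetical, and the sample numbers (1,000 rows, SAL between 0 and 100) are assumptions for illustration. The LIKE entry is left out because the slide gives no fraction for it.

```python
def sel_eq(n):
    """column = value: one row in n is expected to match."""
    return 1 / n


def sel_ne(n):
    """column != value: nearly every row matches."""
    return 1 - 1 / n


def sel_gt(search, low, high):
    """column > value: share of the value range above the search value."""
    return (high - search) / (high - low)


def sel_or(f1, f2):
    """p1 OR p2, as given on the slide (the sum can exceed 1 for weak predicates)."""
    return f1 + f2


def sel_and(f1, f2):
    """p1 AND p2, treating the two predicates as independent."""
    return f1 * f2


n = 1000                          # assumed table size
f_sal = sel_gt(20, 0, 100)        # SAL > 20 with SAL ranging over 0..100
f_job = sel_eq(n)                 # JOB = 'VP'
print(f_sal, f_job, sel_or(f_sal, f_job), sel_and(f_sal, f_job))
```

Multiplying a selectivity by n gives the estimated number of rows retrieved, which is what the cost estimate is built on.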

14 Single-Table Queries Find out which columns that are referenced in the WHERE clause have indexes Find out selectivity of indexes Estimate selectivity of each predicate Use most selective index-predicate combination to retrieve rows that satisfy one predicate Examine each row for other predicates
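
The steps on this slide can be sketched in Python under some simplifying assumptions: all predicates are equality tests combined with AND, an "index" is just a dictionary from value to rows, and selectivity is estimated as 1 / (number of distinct values) rather than the 1/n of the selectivity table. The names `build_index` and `run_query` are hypothetical.

```python
from collections import defaultdict

# Illustrative data only.
emp = [
    {"ENAME": "SMITH", "JOB": "CLERK", "DEPTNO": 20},
    {"ENAME": "ALLEN", "JOB": "SALESMAN", "DEPTNO": 30},
    {"ENAME": "KING", "JOB": "PRESIDENT", "DEPTNO": 10},
    {"ENAME": "BLAKE", "JOB": "MANAGER", "DEPTNO": 30},
    {"ENAME": "ADAMS", "JOB": "CLERK", "DEPTNO": 20},
    {"ENAME": "JAMES", "JOB": "CLERK", "DEPTNO": 30},
]


def build_index(rows, column):
    """Toy index: maps each column value to the rows holding it."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row)
    return index


indexes = {"JOB": build_index(emp, "JOB"), "DEPTNO": build_index(emp, "DEPTNO")}


def run_query(predicates):
    """predicates: {column: value}, all equality tests combined with AND."""
    def selectivity(column):
        # Assumed estimate: 1 / (distinct values in the indexed column).
        return 1 / len(indexes[column])

    indexed = [c for c in predicates if c in indexes]
    if indexed:
        driver = min(indexed, key=selectivity)           # most selective index
        candidates = indexes[driver].get(predicates[driver], [])
    else:
        driver, candidates = None, emp                   # no index: table scan
    # Examine each retrieved row against all predicates.
    matches = [r for r in candidates
               if all(r[c] == v for c, v in predicates.items())]
    return driver, matches


driver, result = run_query({"JOB": "CLERK", "DEPTNO": 30})
print(driver, [r["ENAME"] for r in result])
```

Here JOB has four distinct values and DEPTNO three, so the JOB index is judged more selective and drives the retrieval; the DEPTNO test is then applied row by row.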

15 MULTI-TABLE QUERIES (I.E., JOINS) 15

16 Join Result of a join is a subset of the Cartesian product of the tables being joined Cartesian product of two tables with m and n rows is a new table of mn rows, where every row of the product consists of one row of the first table and one row of the second table

17 Example Join

18 Simple Join Processing Algorithm 1. Form the Cartesian product of all tables involved in the join 2. Scan rows of the Cartesian product, testing each against all of the predicates 3. Eliminate rows of the Cartesian product that don’t meet the predicates What’s wrong with this picture? Think about two tables of 1 million rows each. The Cartesian product would be 1 trillion (10^12) rows!
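
The three steps above can be sketched in Python as follows. The tables and the `naive_join` helper are hypothetical and only meant to show why the intermediate product is the problem.

```python
from itertools import product

# Illustrative tables: (ENAME, DEPTNO) and (DEPTNO, DNAME).
emp = [("SMITH", 20), ("KING", 10), ("ALLEN", 30)]
dept = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES"), (40, "OPERATIONS")]


def naive_join(left, right, predicate):
    # Step 1: materialize the full Cartesian product.
    pairs = list(product(left, right))
    # Steps 2 and 3: test every pair and keep only those meeting the predicate.
    kept = [l + r for l, r in pairs if predicate(l, r)]
    return pairs, kept


pairs, joined = naive_join(emp, dept, lambda e, d: e[1] == d[0])
print(len(pairs), len(joined))

# The same arithmetic for two tables of one million rows each.
print(10**6 * 10**6 == 10**12)
```

Even in this toy case 12 pairs are built to produce 3 result rows; the gap grows with the product of the table sizes.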

19 Joining More than 2 Relations A join of more than two relations is processed 2 relations at a time Part of access path planning is to select that sequence We will talk about algorithms for joining 2 tables and then choosing the order of processing a multi-table join

20 Joins An equijoin is based on equality of an attribute of each of two relations A full outer join includes all rows of both tables, even rows that have no matching value in the other table A theta-join can be based on inequalities as the relationship

21 Join-Processing Algorithms Nested loop join ◦ Each tuple of outer relation is compared to all rows of the inner relation Sort-merge join Hash-based join

22 Nested-Loop Join The algorithm: For efficiency, the relation with higher cardinality (R) is chosen as the inner relation Number of operations: N_R + N_R * N_S What if there is an index?
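
A minimal Python sketch of a nested-loop join, with a counter for tuple reads, shows where an operation count of the form N_outer + N_outer * N_inner comes from. The function name, the tuple layout and the sample data are assumptions for illustration.

```python
def nested_loop_join(outer, inner, key_outer, key_inner):
    """Compare every outer tuple with every inner tuple; count tuple reads."""
    reads = 0
    result = []
    for o in outer:
        reads += 1                      # one read per outer tuple
        for i in inner:
            reads += 1                  # the inner relation is rescanned each time
            if o[key_outer] == i[key_inner]:
                result.append(o + i)
    return result, reads


# Illustrative tables: (ENAME, DEPTNO) and (DEPTNO, DNAME).
emp = [("SMITH", 20), ("KING", 10), ("ALLEN", 30)]
dept = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES"), (40, "OPERATIONS")]

rows, reads = nested_loop_join(emp, dept, 1, 0)
print(rows)
print(reads)   # 3 outer reads + 3 * 4 inner reads
```

If the inner relation had an index on the join attribute, the inner loop could be replaced by a lookup, which is the point of the question that closes the slide.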

23 Nested-Loop

24 Join Order For JOIN queries, the “outer” table is accessed first, the “inner” second Order for joining tables must be selected Most selective first Least costly joins first

25 Merge Join First, each relation is sorted on the join attribute Then both relations are scanned in the order of the join attributes Tuples that satisfy the join predicate are concatenated and placed in the output relation Number of operations: N_R + N_S (after the sort!) What if there is an index on R or S or both?
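
The description above can be sketched in Python as a sort-merge equijoin. This is one possible in-memory version, not necessarily the algorithm shown on the slides; the function name and sample tables are hypothetical. Duplicate join values are handled by pairing each R tuple with the whole run of equal-keyed S tuples.

```python
def merge_join(r, s, key_r, key_s):
    """Sort both inputs on the join attribute, then merge them in one pass."""
    r = sorted(r, key=lambda t: t[key_r])
    s = sorted(s, key=lambda t: t[key_s])
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        kr, ks = r[i][key_r], s[j][key_s]
        if kr < ks:
            i += 1                      # r is behind: advance r
        elif kr > ks:
            j += 1                      # s is behind: advance s
        else:
            # Find the end of the run of s tuples sharing this key.
            j_end = j
            while j_end < len(s) and s[j_end][key_s] == kr:
                j_end += 1
            # Pair every r tuple with this key with the whole s run.
            while i < len(r) and r[i][key_r] == kr:
                result.extend(r[i] + s[k] for k in range(j, j_end))
                i += 1
            j = j_end
    return result


# Illustrative tables: (ENAME, DEPTNO) and (DEPTNO, DNAME).
emp = [("SMITH", 20), ("KING", 10), ("ALLEN", 30), ("FORD", 20)]
dept = [(30, "SALES"), (10, "ACCOUNTING"), (20, "RESEARCH"), (40, "OPERATIONS")]

merged = merge_join(emp, dept, 1, 0)
print(merged)
```

After the sort, each input is traversed once, which matches the N_R + N_S count when join values do not repeat on both sides. An index that already delivers tuples in join-attribute order would make the sort step unnecessary.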

26 Merge Join Algorithm

27 Merge Join

28 Hash-Join The joins we have looked at compare tuples in the first relation with tuples in the second relation that cannot possibly be part of the join The goal of the hash join is to compare only those tuples that might be part of the join Hashing is used to identify those tuples There are many variations of hash-join

29 Simple Hash-Join Algorithm
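
One simple in-memory variant can be sketched in Python as a build phase followed by a probe phase. This is an illustration under stated assumptions (both relations fit in memory, equijoin only, Python's built-in dictionary hashing), not necessarily the algorithm the slide depicts; `hash_join` and the sample tables are hypothetical.

```python
from collections import defaultdict


def hash_join(build, probe, key_build, key_probe):
    """Hash one relation on the join attribute, then probe it with the other."""
    # Build phase: bucket the build relation by join value.
    table = defaultdict(list)
    for b in build:
        table[b[key_build]].append(b)
    # Probe phase: each probe tuple is compared only with tuples in its bucket.
    result = []
    for p in probe:
        for b in table.get(p[key_probe], []):
            result.append(p + b)
    return result


# Illustrative tables: (ENAME, DEPTNO) and (DEPTNO, DNAME).
emp = [("SMITH", 20), ("KING", 10), ("ALLEN", 30), ("FORD", 20)]
dept = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES"), (40, "OPERATIONS")]

hashed = hash_join(dept, emp, 0, 1)
print(hashed)
```

Building on the smaller relation keeps the hash table small; department 40 is never compared with any employee because no probe tuple hashes to its bucket.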

30 Hash-Join Performance Performance of hash-join can be superior to other join algorithms Performance depends on the hashing algorithm (although note Lum’s research) Perfect hashing algorithm could find match or non-match with a single probe With hash table in RAM, processing would be very fast

31 Indexes Impact of a B-tree index on performance of these algorithms is obvious But the index must itself be maintained When an attribute that’s indexed is changed in a relation, the value in the index must also be changed And note that the changes must be synchronized (and locked together)

32 Order of Processing Joins Typically, all combinations of order of processing are considered and a cost developed for each Selectivity of predicates, selectivity of indexes, cardinality of relations all are factors in cost analysis Goal is to minimize the size of intermediate results produced during processing Usually, predicates with low selectivity values (that is, the most selective predicates) are processed first
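
The idea of costing every join order can be sketched in Python. The cost model here is an assumption made for illustration: joins are processed two relations at a time in a left-deep sequence, the cost of a plan is the sum of its estimated intermediate result sizes, and a pair of tables with no join predicate is treated as a Cartesian product. Table names, cardinalities and selectivities are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def plan_cost(order, cardinality, join_selectivity):
    """Sum of estimated intermediate result sizes for joining in this order."""
    rows = cardinality[order[0]]
    joined = [order[0]]
    total = 0
    for table in order[1:]:
        # Combine the selectivities of all join predicates linking this table
        # to tables already joined; no predicate means selectivity 1.
        sel = Fraction(1)
        for prev in joined:
            sel *= join_selectivity.get(frozenset((prev, table)), Fraction(1))
        rows = rows * cardinality[table] * sel
        total += rows
        joined.append(table)
    return total


def best_order(cardinality, join_selectivity):
    """Consider every order and keep the one with the lowest estimated cost."""
    return min(permutations(cardinality),
               key=lambda order: plan_cost(order, cardinality, join_selectivity))


cardinality = {"EMP": 10000, "DEPT": 100, "LOC": 10}
join_selectivity = {
    frozenset(("EMP", "DEPT")): Fraction(1, 100),
    frozenset(("DEPT", "LOC")): Fraction(1, 10),
}

best = best_order(cardinality, join_selectivity)
print(best, plan_cost(best, cardinality, join_selectivity))
print(plan_cost(("EMP", "LOC", "DEPT"), cardinality, join_selectivity))
```

With these numbers, joining the two small tables first keeps the first intermediate result at 100 rows, while starting with EMP and LOC (which share no predicate) produces a 100,000-row product. Enumerating all orders grows factorially with the number of tables, so this brute-force form is only practical for small joins.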

33 Summary Single-table queries Multi-table queries ◦ Nested loop ◦ Sort-merge ◦ Hash Order of processing joins

34 But Note: We have left out a LOT Relations may be partitioned and joins processed by partition Many other parts of the DBMS affect performance If you are responsible for database performance, buy a book and dig in Remember not to give up on normalization to get performance

35 And Now: What You Do for Performance

36 How to Start First, don’t even consider denormalization You have many tools to get the performance you need without ruining the data model (and the applications) Performance test the applications Look for SQL operations that are taking a long time

37 EXPLAIN IBM invented the EXPLAIN utility; it explains the processing strategy for each WHERE clause Run it for operations that are taking too long Look for table scans, Cartesian product joins Provide indexes to speed things up

38 EXPLAIN PLAN Tells you the execution plan an Oracle database follows for a SQL statement Inserts a row describing each step of the execution plan into a specified table Determines total cost of execution
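
The workflow of checking a plan, spotting a table scan and adding an index can be tried with Python's built-in sqlite3 module. Note the substitution: this uses SQLite's EXPLAIN QUERY PLAN statement as a stand-in, not DB2's EXPLAIN or Oracle's EXPLAIN PLAN, whose syntax and output differ. The table, the index name `emp_job` and the `plan` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("KING", "PRESIDENT", 50), ("SMITH", "CLERK", 8), ("ADAMS", "CLERK", 11)],
)

query = "SELECT ename FROM emp WHERE job = 'CLERK'"


def plan(sql):
    """Return SQLite's description of each plan step (last column of each row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(query)                                   # no index yet
conn.execute("CREATE INDEX emp_job ON emp (job)")
after = plan(query)                                    # index now available

print(before)   # expected to report a scan of emp; wording varies by SQLite version
print(after)    # expected to report a search using index emp_job
```

The same habit applies to any system with a plan-inspection tool: read the plan for the slow statement first, then decide which index would remove the scan.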


40 Beyond EXPLAIN There are many indexing options, other options to control physical characteristics of the database Learn about them, learn how to control them But you will go very far with EXPLAIN and providing indexes

41 THANK YOU!

