Explaining the Explain Plan

Disclaimer
The goal of this session is to provide you with a guide for reading SQL execution plans and to help you determine whether a plan is what you should be expecting. This session will not provide you with sudden enlightenment, make you an Optimizer expert, or give you the power to tune SQL statements with the flick of your wrist!

Agenda
What is an execution plan and how to generate one
What is a good plan for the optimizer
Understanding execution plans
  Cardinality
  Access paths
  Join order
  Join type
  Partition pruning
  Parallelism
Execution plan examples

What is an execution plan and how to generate one

What is an Execution plan?
Execution plans show the detailed steps necessary to execute a SQL statement. These steps are expressed as a set of database operators that consume and produce rows. The order of the operators and their implementation is decided by the optimizer using a combination of query transformations and physical optimization techniques. The display is commonly shown in a tabular format, but a plan is in fact tree-shaped.

What is an Execution plan?
Query
SELECT prod_category, avg(amount_sold)
FROM sales s, products p
WHERE p.prod_id = s.prod_id
GROUP BY prod_category;

Tabular representation of plan
Id  Operation             Name
0   SELECT STATEMENT
1   HASH GROUP BY
2   HASH JOIN
3   TABLE ACCESS FULL     PRODUCTS
4   PARTITION RANGE ALL
5   TABLE ACCESS FULL     SALES

Tree-shaped representation of plan
            GROUP BY
               |
              JOIN
        _______|_______
       |               |
  TABLE ACCESS    TABLE ACCESS
    PRODUCTS         SALES
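The link between the tabular and the tree-shaped form can be sketched in a few lines of Python. This is an illustrative model only: `PLAN` and `build_tree` are hypothetical names, and the depth values are an assumption based on the indentation DBMS_XPLAN normally prints, which the flattened table in this transcript does not show.

```python
# Each plan line as (id, depth, operation, object name). The depth values are
# assumed from the usual DBMS_XPLAN indentation, not taken from the slide.
PLAN = [
    (0, 0, "SELECT STATEMENT", None),
    (1, 1, "HASH GROUP BY", None),
    (2, 2, "HASH JOIN", None),
    (3, 3, "TABLE ACCESS FULL", "PRODUCTS"),
    (4, 3, "PARTITION RANGE ALL", None),
    (5, 4, "TABLE ACCESS FULL", "SALES"),
]

def build_tree(plan):
    """Return {parent_id: [child_ids]}, using depth to find each line's parent."""
    children = {line[0]: [] for line in plan}
    stack = []  # (id, depth) of the ancestors of the current line
    for op_id, depth, _operation, _name in plan:
        # Drop entries that are not ancestors of this line.
        while stack and stack[-1][1] >= depth:
            stack.pop()
        if stack:
            children[stack[-1][0]].append(op_id)
        stack.append((op_id, depth))
    return children

tree = build_tree(PLAN)
```

In this model the HASH JOIN (Id 2) has two children, the scan of PRODUCTS and the partition iterator over SALES, which matches the two branches of the tree drawing.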

How to get an Execution Plan
Two methods for looking at the execution plan:
EXPLAIN PLAN command
  Displays an execution plan for a SQL statement without actually executing the statement
V$SQL_PLAN
  A dictionary view introduced in Oracle 9i that shows the execution plan for a SQL statement that has been compiled into a cursor in the cursor cache
Use the DBMS_XPLAN package to display plans
Under certain conditions the plan shown with EXPLAIN PLAN can be different from the plan shown using V$SQL_PLAN

Example 1: EXPLAIN PLAN command & dbms_xplan.display function
SQL> EXPLAIN PLAN FOR
     SELECT prod_category, avg(amount_sold)
     FROM sales s, products p
     WHERE p.prod_id = s.prod_id
     GROUP BY prod_category;
Explained

SQL> SELECT plan_table_output
     FROM table(dbms_xplan.display('plan_table',null,'basic'));

Id  Operation             Name
0   SELECT STATEMENT
1   HASH GROUP BY
2   HASH JOIN
3   TABLE ACCESS FULL     PRODUCTS
4   PARTITION RANGE ALL
5   TABLE ACCESS FULL     SALES

DBMS_XPLAN.DISPLAY takes three parameters: plan table name (default 'PLAN_TABLE'), statement_id (default null), format (default 'TYPICAL')

Example 2: Generate & display the execution plan for the last SQL statement executed in a session
SQL> SELECT prod_category, avg(amount_sold)
     FROM sales s, products p
     WHERE p.prod_id = s.prod_id
     GROUP BY prod_category;
no rows selected

SQL> SELECT plan_table_output
     FROM table(dbms_xplan.display_cursor(null,null,'basic'));

Id  Operation             Name
0   SELECT STATEMENT
1   HASH GROUP BY
2   HASH JOIN
3   TABLE ACCESS FULL     PRODUCTS
4   PARTITION RANGE ALL
5   TABLE ACCESS FULL     SALES

Example 3: Displaying the execution plan for any other statement from V$SQL_PLAN
Directly:
SQL> SELECT plan_table_output
     FROM table(dbms_xplan.display_cursor('fnrtqw9c233tt',null,'basic'));
Indirectly:
SQL> SELECT plan_table_output
     FROM v$sql s, TABLE(dbms_xplan.display_cursor(s.sql_id,s.child_number,'basic')) t
     WHERE s.sql_text like 'select PROD_CATEGORY%';
Note More information on

DBMS_XPLAN parameters
DBMS_XPLAN.DISPLAY takes 3 parameters:
  plan table name (default 'PLAN_TABLE')
  statement_id (default null)
  format (default 'TYPICAL')
DBMS_XPLAN.DISPLAY_CURSOR takes 3 parameters:
  SQL_ID (default last statement executed in this session)
  Child number (default 0)
  Format
Format is highly customizable:
  Basic
  Typical
  All
  Additional low level parameters show more detail

What is a good plan for the optimizer

What’s a Good Plan for the Optimizer?
The Optimizer has two different goals:
Serial execution: it's all about cost
  The cheaper, the better
Parallel execution: it's all about performance
  The faster, the better
Two fundamental questions:
  What is cost?
  What is performance?

What is Cost?
A magical number the optimizer makes up?
Resources required to execute a SQL statement?
Result of complex calculations?
Estimate of how long it will take to execute a statement?
Actual definition:
  Cost represents units of work or resources used
  The Optimizer uses CPU & memory usage plus IO as units of work
  Cost is an estimate of the amount of CPU and memory, plus the number of disk I/Os, used in performing an operation
Cost is an internal Oracle measurement

What is performance?
Getting as many queries completed as possible?
Getting the fastest possible elapsed time using the fewest resources?
Getting the best concurrency rate?
Actual definition:
  Performance is the fastest possible response time for a query
  Goal is to complete the query as quickly as possible
  The Optimizer does not focus on the resources needed to execute the plan

Understanding an Execution Plan

SQL Execution Plan
When looking at a plan, can you determine if the following are correct?
Cardinality
  Is the correct number of rows coming out of each object?
Access paths
  Is the data being accessed in the best way? Scan? Index lookup?
Join order
  Are tables being joined in the correct order to eliminate as much data as early as possible?
Join type
  Are the right join types being used?
Partition pruning
  Did I get partition pruning? Is it eliminating enough data?
Parallelism

Cardinality
What is it?
  Estimate of the number of rows that will be returned
  Cardinality for a single value predicate = total num_rows / total num_distinct
  E.g. 100 rows total, 10 distinct values => cardinality = 10 rows
  Or, if a histogram is present, num_rows * density
  Density is 1/num_distinct for columns without a histogram
  For columns with a histogram, density is calculated differently
Why should you care?
  Influences access method and join order
  If the estimate is off it can have a huge impact on a plan
What causes cardinality to be wrong?
  Data skews
  Multiple single column predicates on a table
  A function wrapped WHERE clause predicate
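The single value predicate arithmetic on this slide can be written out as a tiny Python function. It is a sketch of the no-histogram formula only; the function name `estimated_cardinality` is made up for illustration and is not an Oracle API.

```python
def estimated_cardinality(num_rows, num_distinct):
    """Rows expected for `col = value` when values are assumed evenly spread.

    Without a histogram, density is 1/num_distinct, so the estimate is
    num_rows * density, which is the same as num_rows / num_distinct.
    """
    return num_rows / num_distinct

# The slide's example: 100 rows total, 10 distinct values.
card = estimated_cardinality(num_rows=100, num_distinct=10)
```

With the slide's numbers the estimate is 10 rows, regardless of which value the predicate actually asks for. That is exactly why skewed data breaks the estimate.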

Cardinality or Selectivity
Cardinality is the estimated number of rows returned
To determine the correct cardinality, run a simple SELECT COUNT(*) against each table, applying any WHERE clause predicates belonging to that table

Data Skew
Cardinality = num_rows / num_distinct
If there is a data skew the selectivity could be way off
Create a histogram to correct the selectivity calculation
Oracle automatically creates a histogram if it suspects a data skew
Be careful
  Histograms have an interesting side effect on statements with binds
  Less relevant for data warehousing
  Prior to 11g a statement with binds had only one plan, based on the first literal value
  But the presence of a histogram indicates skew, so it is unlikely that one plan is good for all bind values
  In 11g multiple execution plans are allowed for a single statement
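To see why skew throws the estimate off, the sketch below compares the uniform estimate with per-value counts, which is roughly the information a frequency histogram gives the optimizer. The data set and the helper `histogram_estimate` are invented for illustration; Oracle's actual histogram formats and calculations are more involved.

```python
from collections import Counter

# Hypothetical column: 100 rows, 10 distinct values, one value dominating.
values = ["US"] * 91 + ["AU", "FR", "IE", "DE", "JP", "CA", "MX", "BR", "IN"]

# Without a histogram every value gets the same estimate.
uniform_estimate = len(values) / len(set(values))   # num_rows / num_distinct

# A frequency histogram records how often each value occurs.
frequency = Counter(values)

def histogram_estimate(value):
    """Estimated rows for `col = value` when per-value counts are known."""
    return frequency.get(value, 0)
```

Here the uniform estimate is 10 rows for any value, while the counts show 91 rows for 'US' and 1 row for 'IE'. A plan chosen for one of those values is unlikely to suit the other.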

Multiple Single Column Predicates
The Optimizer always assumes each additional predicate increases the selectivity, that is, filters out more rows
  Selectivity of predicate 1 * selectivity of predicate 2 ... etc.
But real data often shows correlations
  Job title influences salary, car model influences make
How do you tell the Optimizer about the correlation?
  Extended Optimizer Statistics provide a mechanism to collect statistics on a group of columns
  Full integration into the existing statistics framework
  Automatically maintained with column statistics
  Instantaneous and transparent benefit for any migrated application
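The effect of multiplying selectivities on correlated columns can be shown with a small worked example in Python. The table contents are hypothetical and chosen so that the model fully determines the make, as in the slide's car example.

```python
# Hypothetical table: each row is (make, model); the model determines the make.
rows = [("Honda", "Civic")] * 50 + [("Ford", "Focus")] * 50

def selectivity(pred):
    """Fraction of rows that satisfy a single predicate."""
    return sum(1 for r in rows if pred(r)) / len(rows)

sel_make = selectivity(lambda r: r[0] == "Honda")
sel_model = selectivity(lambda r: r[1] == "Civic")

# Independence assumption: multiply the individual selectivities.
independent_estimate = len(rows) * sel_make * sel_model

# What the data actually contains for make = 'Honda' AND model = 'Civic'.
actual = sum(1 for r in rows if r == ("Honda", "Civic"))
```

The independence assumption gives 25 rows, but 50 rows actually match, because the second predicate removes nothing that the first had not already removed. Statistics on the column group let the optimizer see the combined distribution instead.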

A function Wrapped Where Clause Predicate
SELECT * FROM customers WHERE lower(country_id) = 'us';
Applying a function to a column means the optimizer does not know how it will affect the cardinality
Most likely the optimizer will under-estimate the cardinality
Creating extended statistics for this function allows the optimizer to get the correct cardinality
exec dbms_stats.gather_table_stats('sh','customers', method_opt => -
  'for all columns size skewonly for columns (lower(country_id))');

Access Paths
How to get data out of the table. The access path can be:
  Full table scan
  Table access by Rowid
  Index unique scan
  Index range scan (descending)
  Index skip scan
  Full index scan
  Fast full index scan
  Index joins
  Bitmap indexes

A full table scan reads all rows from a table and filters out those that do not meet the WHERE clause predicates. It does multiblock IO. It is influenced by the value of the init.ora parameter db_file_multiblock_read_count, the parallel degree, a lack of indexes, and hints. It is typically selected if no indexes exist or the ones present can't be used, or if its cost is the lowest because of the DOP or the multiblock read count.

Table access by Rowid: the rowid of a row specifies the datafile and data block containing the row and the location of the row in that block. Oracle first obtains the rowids either from the WHERE clause or through an index scan of one or more of the table's indexes. Oracle then locates each selected row in the table based on its rowid.

With an index unique scan only one row will be returned. It will be used when a statement's predicates reference a UNIQUE or PRIMARY KEY constraint in a way that guarantees that only a single row is accessed.

In an index range scan Oracle accesses adjacent index entries and then uses the ROWID values in the index to retrieve the table rows. It can be bounded or unbounded. Data is returned in the ascending order of the index columns. It will be used when a statement has an equality predicate on a non-unique index, an incompletely specified unique index, or a range predicate on a unique index (=, <, >, or LIKE if the wildcard is not on the leading edge). Oracle uses an index range scan descending when an ORDER BY ... DESC clause can be satisfied by an index.

Normally, in order for an index to be used, the columns defined on the leading edge of the index must be referenced in the query. However, if all the other columns are referenced, Oracle will do an index skip scan to skip the leading edge of the index and use the rest of it. This is advantageous if there are few distinct values in the leading column of the composite index and many distinct values in the non-leading key of the index.

A full index scan does not read every block in the index structure, contrary to what its name suggests. It processes all of the leaf blocks of an index, but only enough of the branch blocks to find the first leaf block. It can be used because all of the columns necessary are in the index and it is cheaper than scanning the table. It is used in any of the following situations:
  An ORDER BY clause has all of the index columns in it and the order is the same as in the index (it can contain a subset of the columns in the index).
  The query requires a sort merge join, all of the columns referenced in the query are in the index, and the order of the columns referenced in the query matches the order of the leading index columns.
  A GROUP BY clause is present in the query, and the columns in the GROUP BY clause are present in the index.

A fast full index scan is an alternative to a full table scan when the index contains all the columns that are needed for the query, and at least one column in the index key has the NOT NULL constraint. A fast full scan accesses all of the data in the index itself, without accessing the table. It cannot be used to eliminate a sort operation, because the data is not ordered by the index key. It reads the entire index using multiblock reads, unlike a full index scan, and can be parallelized.

An index join is a hash join of several indexes that together contain all the table columns that are referenced in the query. If an index join is used, then no table access is needed, because all the relevant column values can be retrieved from the indexes. An index join cannot be used to eliminate a sort operation.

A bitmap join uses a bitmap for key values and a mapping function that converts each bit position to a rowid. Bitmaps can efficiently merge indexes that correspond to several conditions in a WHERE clause, using Boolean operations to resolve AND and OR conditions.
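The difference between a full table scan and an index range scan followed by table access by rowid can be sketched with plain Python data structures. This is a conceptual model, not how Oracle stores data: the dictionary stands in for the table, the sorted list of (key, rowid) pairs stands in for a B-tree index, and all names are hypothetical.

```python
import bisect

# Hypothetical table: rowid -> row. The "index" is a sorted list of (key, rowid).
table = {1: ("AU", "Australia"), 2: ("US", "United States"),
         3: ("FR", "France"), 4: ("IE", "Ireland"), 5: ("DE", "Germany")}
index = sorted((row[0], rowid) for rowid, row in table.items())

def full_table_scan(pred):
    """Read every row and filter out those that fail the predicate."""
    return [row for row in table.values() if pred(row)]

def index_range_scan(low, high):
    """Walk adjacent index entries in [low, high], then fetch rows by rowid."""
    start = bisect.bisect_left(index, (low,))   # first entry with key >= low
    result = []
    for key, rowid in index[start:]:
        if key > high:                          # past the end of the range
            break
        result.append(table[rowid])             # table access by rowid
    return result

by_index = index_range_scan("AU", "IE")
by_scan = full_table_scan(lambda row: "AU" <= row[0] <= "IE")
```

Both paths return the same four rows, but the range scan only touches index entries inside the range and returns them in ascending key order, while the full scan reads every row and returns them in storage order.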

Access Path
Look in the Operation section to see how an object is being accessed
If you know the wrong access method is being used, check cardinality, join order…

Access Path examples
A table countries contains 10K rows & has a primary key on country_id. What plan would you expect for these queries?
Select country_id, name from countries where country_id in ('AU','FR','IE');
Select country_id, name from countries where country_id between 'AU' and 'IE';
Select country_id, name from countries where name='USA';

Join Type
A join retrieves data from more than one table. Possible join types are:
  Nested loops joins
  Hash joins
  Partition wise joins
  Sort merge joins
  Cartesian joins
  Outer joins

Nested loops joins are useful when small subsets of data are being joined and the join condition is an efficient way of accessing the second table (index lookup), that is, the second table is dependent on the outer table (foreign key). For every row in the outer table, Oracle accesses all the rows in the inner table. Consider it like two embedded for loops.

Hash joins are used for joining large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows. Hash joins are selected if an equality predicate is present.

Partition wise join <see next two slides>

Sort merge joins are useful when the join condition between two tables is an inequality condition (but not a nonequality) like <, <=, >, or >=. Sort merge joins perform better than nested loops joins for large data sets. The join consists of two steps:
  Sort join operation: both inputs are sorted on the join key.
  Merge join operation: the sorted lists are merged together.

A Cartesian join is used when one or more of the tables does not have any join conditions to any other tables in the statement. The optimizer joins every row from one data source with every row from the other data source, creating the Cartesian product of the two sets. It is only good if the tables involved are small, and it can be a sign of problems with cardinality.

An outer join returns all rows that satisfy the join condition and also returns some or all of those rows from the table without the (+) for which no rows from the other table satisfy the join condition. Take the query:
Select * from customers c, orders o
WHERE c.credit_limit > 1000
AND c.customer_id = o.customer_id(+)
The join preserves the customers rows, including those rows without a corresponding row in orders.
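The nested loops and hash join descriptions map directly onto a few lines of Python. The sketch below is a conceptual model with hypothetical row data; it ignores indexes, memory limits, and everything else the real join operators deal with.

```python
from collections import defaultdict

def nested_loops_join(outer, inner, key_outer, key_inner):
    """For every outer row, visit the inner rows: two embedded for loops."""
    return [(o, i) for o in outer for i in inner if o[key_outer] == i[key_inner]]

def hash_join(build, probe, key_build, key_probe):
    """Build a hash table on the smaller input, then probe it with the larger."""
    buckets = defaultdict(list)
    for b in build:
        buckets[b[key_build]].append(b)
    return [(b, p) for p in probe for b in buckets.get(p[key_probe], [])]

# Hypothetical rows: (department_id, dept_name) and (name, department_id).
departments = [(10, "Marketing"), (20, "Sales")]
employees = [("Ann", 10), ("Bob", 20), ("Cy", 30)]

nl = nested_loops_join(departments, employees, 0, 1)
hj = hash_join(departments, employees, 0, 1)
```

Both functions return the same joined pairs. The nested loops version compares every outer row with every inner row, which is fine for small inputs or when an index replaces the inner loop. The hash join reads each input once, which is why it suits large data sets joined on an equality predicate.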

Join Type Example 1
What join type should be used for this query?
SELECT e.name, e.salary, d.dept_name
FROM hr.employees e, hr.departments d
WHERE d.dept_name IN ('Marketing','Sales')
AND e.department_id=d.department_id;
Employees has 107 rows
Departments has 27 rows
Foreign key relationship between Employees and Departments on dept_id

SELECT o.customer_id, l.unit_price * l.quantity
FROM oe.orders o, oe.order_items l
WHERE l.order_id = o.order_id;
Orders has 105 rows
Order Items has 665 rows

SELECT o.order_id, o.order_date, e.name
FROM oe.orders o, hr.employees e;
Orders has 105 rows
Employees has 107 rows

SELECT d.department_id, e.emp_id
FROM hr.employees e FULL OUTER JOIN hr.departments d
ON e.department_id = d.department_id;
Employees has 107 rows
Departments has 27 rows
Foreign key relationship between Employees and Departments on dept_id
A full outer join acts like a combination of the left and right outer joins. In addition to the inner join, rows from both tables that have not been returned in the result of the inner join are preserved and extended with nulls. In other words, full outer joins let you join tables together, yet still show rows that do not have corresponding rows in the joined tables.

Join Type
Look in the Operation section to check the right join type is used
If the wrong join type is used, go back and check the statement is written correctly and the cardinality estimates are accurate

Join Order
The order in which the tables are joined in a multi-table statement
  Ideally start with the table that will eliminate the most rows
  Strongly affected by the access paths available
Some basic rules
  Joins that definitely result in at most one row always go first
  When outer joins are used, the table with the outer join operator must come after the other table in the predicate
  If view merging is not possible, all tables in the view will be joined before joining to the tables outside the view

Join order
Want to start with the table that reduces the result set the most
If the join order is not correct, check the statistics, cardinality & access methods

Partition Pruning
Q: What was the total sales for the weekend of May 20 - 22 2008?
Select sum(sales_amount)
From SALES
Where sales_date between to_date('05/20/2008','MM/DD/YYYY')
And to_date('05/23/2008','MM/DD/YYYY');
Sales table partitions: May 18th 2008, May 19th 2008, May 20th 2008, May 21st 2008, May 22nd 2008, May 23rd 2008, May 24th 2008
Only the 3 relevant partitions are accessed
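The idea behind pruning can be sketched as an overlap test between the query's date range and each partition's bounds. The Python below is a conceptual model under two assumptions that are not in the slide: the table has one range partition per day, and the weekend is expressed as a half-open range (from May 20 up to, but not including, May 23). The names `partitions` and `prune` are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical daily range partitions for May 18 to May 24, 2008.
# Each partition holds rows with lower <= sales_date < upper.
partitions = [(date(2008, 5, d), date(2008, 5, d) + timedelta(days=1))
              for d in range(18, 25)]

def prune(start, end):
    """Return the partitions that overlap the half-open range [start, end)."""
    return [(lo, hi) for lo, hi in partitions if lo < end and start < hi]

# Weekend of May 20 to May 22, written as [May 20, May 23).
touched = prune(date(2008, 5, 20), date(2008, 5, 23))
```

Only the partitions for May 20, 21, and 22 pass the overlap test, so the other four never need to be read.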

Partition Pruning
Pstart and Pstop list the partitions touched by the query
If you see the word 'KEY' listed, it means the partitions touched will be decided at run time

Bloom Filter
1. Table scan: the time table is scanned and sent
2. Bloom filter create: the consumer set creates a hash table and a bit vector. The bit vector sets a bit for each row that matches the search conditions
3. Table scan: the sales table is scanned and rows are filtered based on query predicates
4. Bloom filter send: the bit vector is sent as an additional filter criterion to the scan of the sales table
5. Bloom filter apply: the join column is hashed and compared to the bit vector
6. Reduced rows sent: only rows that have a match in the bit vector get sent to the consumers
7. Hash join: consumers complete the hash join by probing into the hash table from the time table to find actual matching rows
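The seven steps can be followed in a compact Python model. This is a generic Bloom filter sketch, not Oracle's implementation: the class, the bit vector size, the number of hash functions, and the sample rows are all assumptions made for illustration.

```python
import hashlib

class BloomFilter:
    """Bit vector with k hash positions per key. It may report false
    positives, but never false negatives."""

    def __init__(self, size=64, hashes=2):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

# Steps 1-2: scan the (hypothetical) time table, build hash table and bit vector.
time_rows = {101: "2008-05-20", 102: "2008-05-21"}
bloom = BloomFilter()
for time_id in time_rows:
    bloom.add(time_id)

# Steps 3-6: scan sales and send on only rows whose join key passes the filter.
sales_rows = [(101, 5.0), (102, 7.5), (999, 1.0), (101, 2.5)]
sent = [row for row in sales_rows if bloom.might_contain(row[0])]

# Step 7: probe the hash table to find the actual matches.
joined = [(time_rows[key], amount) for key, amount in sent if key in time_rows]
```

The filter can let a non-matching row through (a false positive), which is why step 7 still probes the hash table. What it saves is the cost of sending most non-matching sales rows to the consumers at all.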

Parallelism
Goal is to execute all aspects of the plan in parallel
Identify if one or more sets of parallel server processes are used
  Producers and Consumers
Identify if any part of the plan is running serial

Parallelism
The IN-OUT column shows which steps are run in parallel and whether a single parallel server set is used or not
If you see any lines beginning with the letter S, that step is running serial; check the DOP for each table & index used

Identifying Granules of Parallelism during scans in the plan
Data is partitioned into granules, either by block range or by partition
Each parallel server is allocated one or more granules
The granule method is specified on the line above the scan in the Operation section
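Block range granules can be pictured as slices of a segment handed out to parallel servers. The Python below is a simplified sketch: the granule size, the server count, and the in-turn assignment are assumptions for illustration and do not reflect how Oracle sizes or schedules granules.

```python
def block_range_granules(total_blocks, granule_size):
    """Split a segment into (first_block, last_block) ranges."""
    return [(start, min(start + granule_size, total_blocks) - 1)
            for start in range(0, total_blocks, granule_size)]

def assign(granules, num_servers):
    """Hand granules out to parallel servers in turn (a simplification)."""
    work = {server: [] for server in range(num_servers)}
    for n, granule in enumerate(granules):
        work[n % num_servers].append(granule)
    return work

granules = block_range_granules(total_blocks=100, granule_size=30)
work = assign(granules, num_servers=2)
```

A 100-block segment with 30-block granules yields four granules, the last one shorter, and each of the two servers ends up with two of them. With partition granules the unit of work would be a whole partition instead of a block range.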


Access Paths and how they are parallelized
Access path: parallelization method
Full table scan: Block Iterator
Table accessed by Rowid: Partition
Index unique scan
Index range scan (descending)
Index skip scan
Full index scan
Fast full index scan
Bitmap indexes (in Star Transformation)

Parallel Distribution
Necessary when producer & consumer sets are used
Producers must pass or distribute their data to consumers
The operator into which the rows flow decides the distribution
Distribution can be local or across other nodes in RAC
Five common types of redistribution

Hash
  Assumes one of the tables is hash partitioned
  Hash function applied to the value of the join column
  Distribute to the consumer working on the corresponding hash partition
Broadcast
  The size of one of the result sets is small
  Sends a copy of the data to all consumers
Range
  Typically used for parallel sort operations
  Individual parallel servers work on data ranges
  QC doesn't have to sort, just present the parallel server results in the correct order
Partitioning Key Distribution: PART (KEY)
  Assumes that the target table is partitioned
  Partitions of the target tables are mapped to the parallel servers
  Producers will map each scanned row to a consumer based on the partitioning column
Round Robin
  Randomly but evenly distributes the data among the consumers
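Three of these distribution methods are easy to model in Python. The functions below are illustrative sketches with hypothetical names; they use CRC32 as a stand-in hash function, and the round robin version deals rows out in turn rather than randomly.

```python
import zlib

def hash_distribute(rows, key, consumers):
    """Rows with the same join-key value always go to the same consumer."""
    out = [[] for _ in range(consumers)]
    for row in rows:
        out[zlib.crc32(str(row[key]).encode()) % consumers].append(row)
    return out

def broadcast(rows, consumers):
    """Every consumer receives a full copy of the (small) row set."""
    return [list(rows) for _ in range(consumers)]

def round_robin(rows, consumers):
    """Rows are dealt out evenly, regardless of their values."""
    out = [[] for _ in range(consumers)]
    for n, row in enumerate(rows):
        out[n % consumers].append(row)
    return out

# Hypothetical rows: (join key, payload), with four distinct key values.
rows = [(i % 4, f"row{i}") for i in range(8)]
hashed = hash_distribute(rows, key=0, consumers=2)
copied = broadcast(rows, consumers=2)
dealt = round_robin(rows, consumers=2)
```

Hash distribution keeps matching join keys together, so each consumer can join its share independently. Broadcast multiplies the data volume by the number of consumers, which is why broadcasting a large table is a warning sign in a plan.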

Shows how the PQ servers distribute rows between each other

Example of reading a plan

Example SQL Statement and Block Diagram
SELECT '(' || pcode || ')' || pcode_desc AS PRODUCT, CNT
FROM (SELECT a.pcode, b.pcode_desc, count(a.pcode) CNT
      FROM BMG.t_acct_master_hd a
          ,BMG.hogan_pcode_hd_ref b
          ,BMG.t_tran_detail_hd c
      WHERE a.pcode = b.pcode
      AND a.acct_num=c.acct_num
      AND a.co_id=c.co_id
      AND c.asof_yyyymm=200102
      AND c.tran_amt <
      GROUP BY a.pcode, b.pcode_desc
      ORDER BY a.pcode, b.pcode_desc )
Block diagram: T_ACCT_MASTER_HD joins to T_TRAN_DETAIL_HD on ACCT_NUM, CO_ID and to HOGAN_PCODE_HD_REF on PCODE. Size labels: Multiple Terabytes; 1 Gigabyte in size.

Example Cont’d Execution plan
1. Check the number of rows returned is approximately correct
2. Are the cardinality estimates correct? (Callout: means no stats gathered, a strong indicator this won't be the best possible plan)
3. Are the access methods correct?
4. Has partition pruning happened?
5. Are the correct join methods used?
6. Is the join order correct? Is the table that eliminates the most rows accessed first?
7. Check all aspects of the plan are executing in parallel
8. Check the distribution method and make sure we are not broadcasting a large table

Example Cont’d Execution plan - Solution
1. Only 1 row is actually returned and the cost is now 4x lower
2. Cardinalities are correct and the number of rows is reduced with each join
3. Access methods remain the same
4. Partition pruning: one range partition, 4 hash partitions
5. Join types are still hash joins, but now a PWJ
6. The join order has changed: PWJ, then hash join to the look-up table
7. All aspects of the plan are executing in parallel
8. Row distribution is now all hash

Determining if you get the right plan
Query
SELECT quantity_sold
FROM sales s, customers c
WHERE s.cust_id = c.cust_id;
What do you expect the plan to look like for this statement?
Explanation
  The join to customers is redundant as no columns are selected from it
  The presence of the primary key to foreign key relationship means the optimizer can remove the customers table from the plan
