Tampa Bay Relational Users Group

Tampa Bay Relational Users Group Query diagnosis IBM Silicon Valley Lab, U.S.A. © IBM Corporation 2003

Query analysis and tuning
- Format the SQL statement
  - Prepare the statement for human tuning
  - Separate sections for:
    - SELECT list
    - FROM clause
    - WHERE clause
    - …
- Tools support
  - Data Studio fixpack 2.2.0.1 includes SQL formatting
  - Show transformed SQL text

Sample unformatted query
EXPLAIN PLAN SET QUERYNO = 1 FOR SELECT DISTINCT ITEM.ITEM_NBR AS ITEM_NBR, ITEM.PRDT_ID, STOREITEM.WK_STRT_DT AS WK_STRT_DT ,STOREITEM.DC_ID AS DC_ID FROM PROD.TIPA004_STITM_PROJ AS STOREITEM , PROD.TITM001_ITEM AS ITEM WHERE ITEM.BUS_UNIT_ID = 'GS' AND ITEM.BUS_UNIT_ID = STOREITEM.BUS_UNIT_ID AND ITEM.MJR_CATG_ID = '00754' AND ITEM.INTMD_CATG_ID = '00043' AND ITEM.ITEM_NBR = STOREITEM.ITEM_NBR AND ITEM.MJR_CATG_ID = STOREITEM.MJR_CATG_ID AND ITEM.INTMD_CATG_ID = STOREITEM.INTMD_CATG_ID AND STOREITEM.RTL_DEPT_NBR = 1 AND AD_ITEM_FLG = 'Y' AND WK_STRT_DT = '2002-02-08';
Unformatted SQL, where to start?

Formatted
EXPLAIN PLAN SET QUERYNO = 1 FOR
SELECT DISTINCT
       ITEM.ITEM_NBR        AS ITEM_NBR
     , ITEM.PRDT_ID
     , STOREITEM.WK_STRT_DT AS WK_STRT_DT
     , STOREITEM.DC_ID      AS DC_ID
FROM   PROD.TIPA004_STITM_PROJ AS STOREITEM
     , PROD.TITM001_ITEM       AS ITEM
WHERE  ITEM.BUS_UNIT_ID       = STOREITEM.BUS_UNIT_ID
  AND  ITEM.MJR_CATG_ID       = STOREITEM.MJR_CATG_ID
  AND  ITEM.INTMD_CATG_ID     = STOREITEM.INTMD_CATG_ID
  AND  ITEM.ITEM_NBR          = STOREITEM.ITEM_NBR
  AND  ITEM.BUS_UNIT_ID       = 'GS'
  AND  ITEM.MJR_CATG_ID       = '00754'
  AND  ITEM.INTMD_CATG_ID     = '00043'
  AND  STOREITEM.AD_ITEM_FLG  = 'Y'
  AND  STOREITEM.RTL_DEPT_NBR = 1
  AND  STOREITEM.WK_STRT_DT   = '2002-02-08';

Analyzing query
- Observe "interesting predicates"
  - Optimizer may produce inaccurate filter factor estimate
  - Range predicates with parameter markers
  - Predicates using interesting literals
    - Probable defaults
  - Complex predicates
    - Complex OR expressions
    - Negation predicates
    - Column expressions
    - Non-column expressions

Sample query Pat’s diagnosis

Query breakdown
SELECT …
FROM   SETL_TRANS S
     , BRANCH CUST
     , BRANCH_ADDR A
WHERE  S.ADV_ABA_R = ?
  AND  S.PROCESS_DT < '9999-12-31'
  AND  S.TYPE_CD IN ('A', 'C', 'X')
  AND  S.CLR_CYCLE_CD IN ('EOD', 'IMD', 'OPN')
  AND  S.STLMT_DT = ?
  AND  S.ACCT_NUM = CUST.ACCT_NUM
  AND  CUST.CUST_EFCT_DT <= ?
  AND  CUST.CUST_INACTV_DT > ?
  AND  A.ACCT_NUM = CUST.ACCT_NUM
  AND  A.CUST_EFCT_DT <= ?
  AND  A.CUST_INACTV_DT > ?
  AND  A.ADDR_TYP_CD = ' '

Identify peculiar predicates
SELECT …
FROM   SETL_TRANS S
     , BRANCH CUST
     , BRANCH_ADDR A
WHERE  S.ADV_ABA_R = ?
  AND  S.PROCESS_DT < '9999-12-31'                <-- MAX DATE
  AND  S.TYPE_CD IN ('A', 'C', 'X', 'Z')
  AND  S.CLR_CYCLE_CD IN ('EOD', 'IMD', 'OPN')
  AND  S.STLMT_DT = ?
  AND  S.ACCT_NUM = CUST.ACCT_NUM
  AND  CUST.CUST_EFCT_DT <= ?                     <-- Range with marker
  AND  CUST.CUST_INACTV_DT > ?                    <-- Range with marker
  AND  A.ACCT_NUM = CUST.ACCT_NUM
  AND  A.CUST_EFCT_DT <= ?                        <-- Range with marker
  AND  A.CUST_INACTV_DT > ?                       <-- Range with marker
  AND  A.ADDR_TYP_CD = ' '                        <-- COL = blank

Why are they peculiar?
- Predicates on typical default values are often skewed.
    AND S.PROCESS_DT < '9999-12-31'    <-- MAX DATE
    AND A.ADDR_TYP_CD = ' '            <-- COL = blank
- Range predicates with parameter markers: impossible to estimate without a literal.
    AND CUST.CUST_EFCT_DT <= ?         <-- Range with marker
    AND CUST.CUST_INACTV_DT > ?        <-- Range with marker
    AND A.CUST_EFCT_DT <= ?            <-- Range with marker
    AND A.CUST_INACTV_DT > ?           <-- Range with marker

Range predicate interpolation
Table 104. Default filter factors for interpolation

COLCARDF          Filter factor for Op    Filter factor for LIKE / BETWEEN
>= 100,000,000    1 / 10,000              3 / 100,000
>= 10,000,000     1 / 3,000               1 / 10,000
>= 1,000,000      1 / 1,000               3 / 10,000
>= 100,000        1 / 300                 1 / 1,000
>= 10,000         1 / 100                 3 / 1,000
>= 1,000          1 / 30                  1 / 100
>= 100            1 / 10                  3 / 100
>= 2              1 / 3                   1 / 10
= 1               1                       1
<= 0              1 / 3                   1 / 10

Note: Op is one of these operators: <, <=, >, >=.
COMMENT: This is DB2's documented guess for an impossible-to-estimate filter factor.
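The Op column of this table amounts to a threshold lookup on COLCARDF. The following is a minimal sketch of that lookup, not DB2 code: the function name `default_range_filter_factor` is hypothetical, only the Op column is modelled, and COLCARDF values below 1 are simply rejected.

```python
from fractions import Fraction

# (minimum COLCARDF, default filter factor for <, <=, >, >=), highest threshold first.
# The values are the Op column of the interpolation table; the names are illustrative.
_OP_DEFAULTS = [
    (100_000_000, Fraction(1, 10_000)),
    (10_000_000, Fraction(1, 3_000)),
    (1_000_000, Fraction(1, 1_000)),
    (100_000, Fraction(1, 300)),
    (10_000, Fraction(1, 100)),
    (1_000, Fraction(1, 30)),
    (100, Fraction(1, 10)),
    (2, Fraction(1, 3)),
    (1, Fraction(1, 1)),
]


def default_range_filter_factor(colcardf: int) -> Fraction:
    """Return the default filter factor for a range predicate whose value is unknown."""
    if colcardf < 1:
        raise ValueError("this sketch only covers COLCARDF >= 1")
    for minimum, factor in _OP_DEFAULTS:
        if colcardf >= minimum:
            return factor
    raise AssertionError("unreachable: COLCARDF >= 1 always matches the last row")


print(default_range_filter_factor(2_496))  # column with COLCARDF 2,496
print(default_range_filter_factor(279))    # column with COLCARDF 279
```

A column with COLCARDF 2,496 falls in the ">= 1,000" row and a column with COLCARDF 279 in the ">= 100" row, so a range predicate with a parameter marker is costed at 1/30 and 1/10 respectively, whatever value is supplied at run time.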

Analyzing query
- Embed information within statement
  - Table information
    - CARDF
    - NPAGES
  - Column information for predicates
    - Local predicates
    - Join predicates
- Observe where the filtering is
  - Selectivity of a predicate is relative to table cardinality
- Investigate "suspicious" predicates
  - Determine actual versus estimated filtering
- If there is a problem, identify options

Embed statistics
SELECT …
FROM   SETL_TRANS S                               CARDF 1,600,254  NPAGES 21,627
     , BRANCH CUST                                CARDF 31,696     NPAGES 1132
     , BRANCH_ADDR A                              CARDF 58,627     NPAGES 2791
WHERE  S.ADV_ABA_R = ?                            COLCARDF 19,712
  AND  S.PROCESS_DT < '9999-12-31'                COLCARDF 11      LOW2KEY 2004-03-24  HIGH2KEY 2004-04-05
  AND  S.TYPE_CD IN ('A', 'C', 'X', 'Z')          COLCARDF 4
  AND  S.CLR_CYCLE_CD IN ('EOD', 'IMD', 'OPN')    COLCARDF 3
  AND  S.STLMT_DT = ?                             COLCARDF 13
  AND  S.ACCT_NUM = CUST.ACCT_NUM                 COLCARDF 15360 / 26,527
  AND  CUST.CUST_EFCT_DT <= ?                     COLCARDF 2,496   LOW2KEY 1994-09-02  HIGH2KEY 2004-04-06
  AND  CUST.CUST_INACTV_DT > ?                    COLCARDF 279     LOW2KEY 2004-03-04  HIGH2KEY 2004-04-07
  AND  A.ACCT_NUM = CUST.ACCT_NUM                 COLCARDF 26,527 / 26,527
  AND  A.CUST_EFCT_DT <= ?                        COLCARDF 2,496
  AND  A.CUST_INACTV_DT > ?                       COLCARDF 274     LOW2KEY 2004-03-04  HIGH2KEY 2004-04-07
  AND  A.ADDR_TYP_CD = ' '                        COLCARDF 5

Suspicious predicate analysis
1) AND S.PROCESS_DT < '9999-12-31'    COLCARDF 11  LOW2KEY 2004-03-24  HIGH2KEY 2004-04-05
   The first range predicate looks for all values less than '9999-12-31'. That literal is far greater than the HIGH2KEY, so essentially all of the rows qualify. Since the optimizer has the literal value, it KNOWS that all rows qualify.
2) AND A.ADDR_TYP_CD = ' '            COLCARDF 5
   For the column = blank predicate, I don't believe a skew search was ever done. You could look to see how many values are blank. Is it more than 20%, the uniform estimate of 1/5?
Conclusion: The first predicate should not be causing this SQL statement any problems.

Suspicious predicate analysis
The literal value used for each of the parameter markers in this case happened to be the same: 2004-04-06.
- ESTIMATED FF WITH LITERAL: determined by comparing the literal value to the HIGH2KEY and the range that would qualify.
- ESTIMATE WITH MARKER: taken from the default filter factor chart in the Admin Guide.
- The "error" is how different the optimizer's DEFAULT estimate is from the ACTUAL filtering.

3) AND CUST.CUST_EFCT_DT <= ?      COLCARDF 2,496  LOW2KEY 1994-09-02  HIGH2KEY 2004-04-06
   ESTIMATED FF WITH LITERAL: = 100%
   ESTIMATE WITH MARKER: 1/30 = 3% (97% error)
4) AND CUST.CUST_INACTV_DT > ?     COLCARDF 279    LOW2KEY 2004-03-04  HIGH2KEY 2004-04-07
   ESTIMATED FF WITH LITERAL: = 99%
   ESTIMATE WITH MARKER: 1/10 = 10% (89% error)
5) AND A.CUST_EFCT_DT <= ?         COLCARDF 2,496
6) AND A.CUST_INACTV_DT > ?        COLCARDF 274    LOW2KEY 2004-03-04  HIGH2KEY 2004-04-07
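The "error" figures for predicates 3 and 4 are plain percentage-point differences between the filtering seen with the literal and the default used for a parameter marker. A small sketch of that arithmetic follows; the helper name `filter_factor_error` is an assumption made for illustration, and the inputs are the values quoted on this slide.

```python
def filter_factor_error(actual_pct: float, default_ff: float) -> float:
    """Percentage-point gap between the actual filtering and the marker default."""
    return actual_pct - default_ff * 100.0


# Predicate 3: CUST.CUST_EFCT_DT <= ?   (100% with literal, default 1/30 with marker)
err3 = filter_factor_error(100.0, 1 / 30)

# Predicate 4: CUST.CUST_INACTV_DT > ?  (99% with literal, default 1/10 with marker)
err4 = filter_factor_error(99.0, 1 / 10)

print(f"predicate 3 error: {err3:.0f} points")
print(f"predicate 4 error: {err4:.0f} points")
```

Rounded to whole points, the two results match the 97% and 89% errors quoted for predicates 3 and 4.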

Suspicious predicate analysis: conclusion
The range predicates with parameter markers introduce significant filter factor error. We should recognize that this filter factor error can cause significant cost estimation problems for the optimizer, possibly resulting in a poor access path choice.

Where's the filtering?
WHERE  S.ADV_ABA_R = ?                            COLCARDF 19,712
         (Very selective predicate)
  AND  S.PROCESS_DT < '9999-12-31'                COLCARDF 11
         (This predicate doesn't filter anything; known from suspicious predicate analysis)
  AND  S.TYPE_CD IN ('A', 'C', 'X', 'Z')          COLCARDF 4
         (In-list looking for 4 values, COLCARDF 4: not filtering)
  AND  S.CLR_CYCLE_CD IN ('EOD', 'IMD', 'OPN')    COLCARDF 3
         (In-list looking for 3 values, COLCARDF 3: not filtering)
  AND  S.STLMT_DT = ?                             COLCARDF 13
         (COL = LIT, COLCARDF 13: somewhat filtering, but not great selectivity)
  AND  S.ACCT_NUM = CUST.ACCT_NUM                 COLCARDF 15360 / 26,527
         (For the range predicates below, we know that the optimizer PERCEIVES them to be selective,
          but in reality they are not. This was determined during suspicious predicate analysis)
  AND  CUST.CUST_EFCT_DT <= ?                     COLCARDF 2,496
  AND  CUST.CUST_INACTV_DT > ?                    COLCARDF 279
  AND  A.ACCT_NUM = CUST.ACCT_NUM                 COLCARDF 26,527 / 26,527
  AND  A.CUST_EFCT_DT <= ?                        COLCARDF 2,496
  AND  A.CUST_INACTV_DT > ?                       COLCARDF 274
  AND  A.ADDR_TYP_CD = ' '                        COLCARDF 5
         (COL = blank. This column is probably skewed on blank. COLCARDF 5, not typically very filtering)
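The comments on the SETL_TRANS predicates can be checked with rough arithmetic. The sketch below assumes a uniform distribution, so an equals predicate qualifies 1/COLCARDF of the rows and an in-list qualifies (number of values)/COLCARDF. The helper names are hypothetical and the statistics are the ones embedded in the statement.

```python
SETL_TRANS_CARDF = 1_600_254  # CARDF of SETL_TRANS from the embedded statistics


def equals_ff(colcardf: int) -> float:
    """Uniform-distribution filter factor for COL = value."""
    return 1.0 / colcardf


def in_list_ff(n_values: int, colcardf: int) -> float:
    """Uniform-distribution filter factor for COL IN (n distinct values), capped at 1."""
    return min(1.0, n_values / colcardf)


filter_factors = {
    "S.ADV_ABA_R = ?": equals_ff(19_712),
    "S.TYPE_CD IN (4 values)": in_list_ff(4, 4),
    "S.CLR_CYCLE_CD IN (3 values)": in_list_ff(3, 3),
    "S.STLMT_DT = ?": equals_ff(13),
}

# Rows expected to survive each predicate when it is applied on its own.
estimated_rows = {pred: ff * SETL_TRANS_CARDF for pred, ff in filter_factors.items()}

for pred, rows in estimated_rows.items():
    print(f"{pred:30s} ~{rows:,.0f} rows")
```

Under these assumptions the predicate on ADV_ABA_R alone reduces 1.6 million rows to roughly 81, while the two in-lists keep every row, which is why that predicate dominates the analysis.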

Where's the filtering?
SELECT …
FROM   SETL_TRANS S                               CARDF 1,600,254  NPAGES 21,627
     , BRANCH CUST                                CARDF 31,696     NPAGES 1132
     , BRANCH_ADDR A                              CARDF 58,627     NPAGES 2791
WHERE  S.ADV_ABA_R = ?                            COLCARDF 19,712                                         <-- Most selective by far
  AND  S.PROCESS_DT < '9999-12-31'                COLCARDF 11      LOW2KEY 2004-03-24  HIGH2KEY 2004-04-05
  AND  S.TYPE_CD IN ('A', 'C', 'X', 'Z')          COLCARDF 4
  AND  S.CLR_CYCLE_CD IN ('EOD', 'IMD', 'OPN')    COLCARDF 3
  AND  S.STLMT_DT = ?                             COLCARDF 13
  AND  S.ACCT_NUM = CUST.ACCT_NUM                 COLCARDF 15360 / 26,527
  AND  CUST.CUST_EFCT_DT <= ?                     COLCARDF 2,496   LOW2KEY 1994-09-02  HIGH2KEY 2004-04-06
  AND  CUST.CUST_INACTV_DT > ?                    COLCARDF 279     LOW2KEY 2004-03-04  HIGH2KEY 2004-04-07
  AND  A.ACCT_NUM = CUST.ACCT_NUM                 COLCARDF 26,527 / 26,527
  AND  A.CUST_EFCT_DT <= ?                        COLCARDF 2,496
  AND  A.CUST_INACTV_DT > ?                       COLCARDF 274     LOW2KEY 2004-03-04  HIGH2KEY 2004-04-07
  AND  A.ADDR_TYP_CD = ' '                        COLCARDF 5

Index analysis
- One significant input to the optimizer is…
  - Available indexes
  - What join sequence they encourage
- Some index performance considerations
  - Provide efficient access for local predicates
    - Encourages table to be outer table
  - Provide efficient access for join predicates
    - Encourages access to table as INNER table of join
  - Provide ordering to avoid sort
- Analysis: Are there appropriate indexes to support this query?

Identify indexes
TABLE: SETL_TRANS
  INDEX: IXSTRN01 (PROCESS_DT, CLR_CYCLE_CD, ADV_ABA_R, TYPE_CD, ACCT_NUM, STLMT_DT)
TABLE: BRANCH
  INDEX: IXBRNC01 (CUST_INACTV_DT, CUST_EFCT_DT)
  INDEX: IXBRNC02 (ACCT_NUM, CUST_EFCT_DT)
TABLE: BRANCH_ADDR
  INDEX: IXBRAD01
  INDEX: IXBRAD02 (ACCT_NUM, ADDR_TYP_CD, CUST_EFCT_DT)

Index candidate usage
TABLE: SETL_TRANS
  INDEX: IXSTRN01 (PROCESS_DT, CLR_CYCLE_CD, ADV_ABA_R, TYPE_CD, ACCT_NUM, STLMT_DT)
TABLE: BRANCH
  INDEX: IXBRNC01 (CUST_INACTV_DT, CUST_EFCT_DT)
  INDEX: IXBRNC02 (ACCT_NUM, CUST_EFCT_DT)
TABLE: BRANCH_ADDR
  INDEX: IXBRAD01
  INDEX: IXBRAD02 (ACCT_NUM, ADDR_TYP_CD, CUST_EFCT_DT)
Key: RED = Range predicate, stops matching; BLUE = Join predicate; GREEN = Local equals predicate / in-list

Index design analysis (by table)
BRANCH table (Index design OK!)
- Index IXBRNC02 supports local access
  - CONCERN: The predicate on this column has its filtering grossly overestimated (the estimated filter factor is far too low), so the optimizer will perceive access to this table to be more efficient than what really occurs!
- Index IXBRNC01 supports join access
BRANCH_ADDR table (Index design OK!)
- Index IXBRAD01: leading column on local filtering
  - The predicate on this column has its filtering grossly overestimated
  - Allows table to be considered as inner table efficiently
- Index IXBRAD02: leading column supports join
  - Allows table to be an efficient inner table

Index design analysis (by table)
SETL_TRANS table (Not OK!)
- The table has one index, IXSTRN01
- No efficient index for join (the join predicate needs to be the leading column)
- No efficient index for outer access
  - Leading column of index qualifies ALL rows

Biggest table, worst index
Overlay table size
TABLE: SETL_TRANS     CARDF 1,600,254  NPAGES 21,627
  INDEX: IXSTRN01 (PROCESS_DT, CLR_CYCLE_CD, ADV_ABA_R, TYPE_CD, ACCT_NUM, STLMT_DT)
TABLE: BRANCH         CARDF 31,696     NPAGES 1132
  INDEX: IXBRNC02 (CUST_INACTV_DT, CUST_EFCT_DT)
  INDEX: IXBRNC01 (ACCT_NUM, CUST_EFCT_DT)
TABLE: BRANCH_ADDR    CARDF 58,627     NPAGES 2791
  INDEX: IXBRAD01
  INDEX: IXBRAD02 (ACCT_NUM, ADDR_TYP_CD, CUST_EFCT_DT)
Key: RED = Range predicate, stops matching; BLUE = Join predicate; GREEN = Local equals predicate / in-list
Options. Must scan 1.6 million rows!

Possible new indexes
Existing index:
  INDEX IXSTRN01 (PROCESS_DT, CLR_CYCLE_CD, ADV_ABA_R, TYPE_CD, ACCT_NUM, STLMT_DT)
Efficient outer table access:
  INDEX opt_1 (ADV_ABA_R, STLMT_DT, ACCT_NUM)
Efficient inner table access:
  INDEX opt_2 (ACCT_NUM)

Summary of this SQL
- Indexes on BRANCH, BRANCH_ADDR look better than they are
  - Range predicate with parameter marker estimates 3% of rows qualify
  - In reality, 99% qualify
- Inefficient index available on SETL_TRANS table
  - No efficient outer table index available
  - No efficient inner table index available
  - This is the biggest table, with the best filter!
- Optimizer chose a bad join method due to the combination of the above factors
  - Performed full scan of transaction index 26,000 times
- Resolution
  - Providing a new index on SETL_TRANS should provide more stable, faster access than ever before
  - REOPT, or providing literal values, avoids the disaster without a new index

SQL 2
SELECT COLS
FROM   PART A                 CARDF=17,598      QUALIFIED_ROWS=67.1        NPAGESF=1,467
     , CONTRACTOR B           CARDF=34,728      QUALIFIED_ROWS=77.8        NPAGESF=724
     , CONT_PARTS C           CARDF=2,093,750   QUALIFIED_ROWS=38,382      NPAGESF=52,189
     , PARTS_PROD_ASMBLY D    CARDF=7,058,356   QUALIFIED_ROWS=7,058,356   NPAGESF=68,644
     , PARTS_PROD_ASM_DTL E   CARDF=21,366,326  QUALIFIED_ROWS=21,366,320  NPAGESF=1,236,490
WHERE  A.COUNTRY_CD = ?                    COLCARDF=208  MAX_FREQ=36.408%  FF=0.005
  AND  A.PART_CD = ?                       COLCARDF=5    MAX_FREQ=47.199%  FF=0.2
  AND  A.PART_TYPE IN ('F', 'I', 'P')      COLCARDF=8    MAX_FREQ=79.867%  FF=0.958
  AND  B.PART_NUM = ?                      COLCARDF=260  MAX_FREQ=3.032%   FF=0.004
  AND  B.SUB_CONTRACTOR = 'Y'              COLCARDF=2    MAX_FREQ=87.402%  FF=0.126  LOW2KEY=N HIGH2KEY=Y
  AND  B.SUSPENDED = 'N'                   COLCARDF=2    MAX_FREQ=99.833%  FF=0.998
  AND  C.PREFERRED = 'Y'                   COLCARDF=3    MAX_FREQ=76.832%  FF=0.018
  AND  B.CONTRACTOR_ID = C.CONTRACTOR_ID   COLCARDF=1,047/316              FF=9.551E-4
  AND  D.PART_NUM = A.PART_NUM             COLCARDF=6,132/17,598           FF=5.682E-5
  AND  C.PRODUCT_ID = D.PRODUCT_ID         COLCARDF=1,391,650/7,058,356    FF=1.417E-7
  AND  C.PRODUCT_ID = E.PRODUCT_ID         COLCARDF=1,391,650/21,366,326   FF=4.68E-8
  AND  E.PRODUCT_ID = D.PRODUCT_ID         COLCARDF=21,366,326/7,058,356   FF=4.68E-8

Local predicate analysis
SELECT COLS
FROM   PART A                 CARDF=17,598      QUALIFIED_ROWS=67.1        NPAGESF=1,467
     , CONTRACTOR B           CARDF=34,728      QUALIFIED_ROWS=77.8        NPAGESF=724
     , CONT_PARTS C           CARDF=2,093,750   QUALIFIED_ROWS=38,382      NPAGESF=52,189
     , PARTS_PROD_ASMBLY D    CARDF=7,058,356   QUALIFIED_ROWS=7,058,356   NPAGESF=68,644
     , PARTS_PROD_ASM_DTL E   CARDF=21,366,326  QUALIFIED_ROWS=21,366,320  NPAGESF=1,236,490
WHERE  A.COUNTRY_CD = ?                    COLCARDF=208  MAX_FREQ=36.408%  FF=0.005
         <-- ??? 'FR' = 36.4%, 'GB' = 17%, 'DE' = 10%
  AND  A.PART_CD = ?                       COLCARDF=5    MAX_FREQ=47.199%  FF=0.2
         <-- ??? 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1%
  AND  A.PART_TYPE IN ('F', 'I', 'P')      COLCARDF=8    MAX_FREQ=79.867%  FF=0.958
         <-- Skewed, not selective
  AND  B.PART_NUM = ?                      COLCARDF=260  MAX_FREQ=3.032%   FF=0.004
  AND  B.SUB_CONTRACTOR = 'Y'              COLCARDF=2    MAX_FREQ=87.402%  FF=0.126  LOW2KEY=N HIGH2KEY=Y
         <-- skewed, selective
  AND  B.SUSPENDED = 'N'                   COLCARDF=2    MAX_FREQ=99.833%  FF=0.998
         <-- skewed, not selective
  AND  C.PREFERRED = 'Y'                   COLCARDF=3    MAX_FREQ=76.832%  FF=0.018
         <-- skewed, selective
  AND  B.CONTRACTOR_ID = C.CONTRACTOR_ID   COLCARDF=1,047/316              FF=9.551E-4
  AND  D.PART_NUM = A.PART_NUM             COLCARDF=6,132/17,598           FF=5.682E-5
  AND  C.PRODUCT_ID = D.PRODUCT_ID         COLCARDF=1,391,650/7,058,356    FF=1.417E-7
  AND  C.PRODUCT_ID = E.PRODUCT_ID         COLCARDF=1,391,650/21,366,326   FF=4.68E-8
  AND  E.PRODUCT_ID = D.PRODUCT_ID         COLCARDF=21,366,326/7,058,356   FF=4.68E-8

- Both 'A' and 'B' tables have selective predicates.
- COUNTRY_CD and PART_CD predicates: there is skew, but the optimizer assumes uniform distribution.
- B.PART_NUM: slightly skewed. One value accounts for 3%; the uniform estimate is 0.4%.
- PREFERRED: skewed; the query searches for an infrequently occurring value.
- Without looking at indexes, it seems 'A' and 'B' will compete to be the outer table.
  - Qualified rows of 67.1 and 77.8 are pretty close.
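To see how far a uniform estimate can drift on the skewed columns, compare 1/COLCARDF with the frequency of the most common value. The sketch below is illustrative only: the helper name `underestimate_ratio` is hypothetical, and it considers the worst case in which the parameter marker is bound to the most frequent value.

```python
def underestimate_ratio(colcardf: int, actual_frequency: float) -> float:
    """How many times more rows qualify than a uniform 1/COLCARDF estimate predicts."""
    uniform_ff = 1.0 / colcardf
    return actual_frequency / uniform_ff


# (COLCARDF, frequency of the most common value), taken from the annotated statement.
skewed_columns = {
    "A.COUNTRY_CD": (208, 0.36408),  # 'FR'
    "A.PART_CD": (5, 0.47199),       # value 4
    "B.PART_NUM": (260, 0.03032),    # most frequent part number
}

ratios = {col: underestimate_ratio(card, freq) for col, (card, freq) in skewed_columns.items()}

for col, ratio in ratios.items():
    print(f"{col:14s} actual / uniform estimate = {ratio:.1f}x")
```

If the marker on COUNTRY_CD is bound to 'FR', roughly 76 times more rows qualify than the uniform estimate suggests. For PART_NUM the gap is about 8 times, consistent with the "slightly skewed" remark.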

Local index analysis: 'A'
SELECT COLS
FROM   PART A                 CARDF=17,598      QUALIFIED_ROWS=67.1        NPAGESF=1,467
WHERE  A.COUNTRY_CD = ?                    COLCARDF=208  MAX_FREQ=36.408%  FF=0.005
         <-- ??? 'FR' = 36.4%, 'GB' = 17%, 'DE' = 10%
  AND  A.PART_CD = ?                       COLCARDF=5    MAX_FREQ=47.199%  FF=0.2
         <-- ??? 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1%
  AND  A.PART_TYPE IN ('F', 'I', 'P')      COLCARDF=8    MAX_FREQ=79.867%  FF=0.958
         <-- Skewed, not selective

INDEX    CLU  UR  NLEAF  NLEVEL  CR     KEYCOLNAME   COLCARDF  MCARDF
IXPRT01  Y    U   151    3       0.999  PART_CD      5         5
                                        COUNTRY_CD   208       251
                                        FILE         2496      3054
                                        DR           46        3176
                                        SECTOR       178       3548
                                        PDV          16830     17598
IXPRT02  N    D   128    2       0.794  PART_CD      5         5
                                        PART_TYPE    8         28
                                        PDV          16830     16850
                                        FILE         2496      16905
IXPRT03  N    D   26     2       0.998  PART_TYPE    8         8
                                        PART_CD      5         28
                                        COUNTRY_CD   208       579
IXPRT04  N    D   99     2       0.782  PART_TYPE    8         8
                                        PART_NUM     17598     17598

Local index analysis B
SELECT COLS
FROM   CONTRACTOR B           CARDF=34,728      QUALIFIED_ROWS=77.8        NPAGESF=724
WHERE  B.PART_NUM = ?                      COLCARDF=260  MAX_FREQ=3.032%   FF=0.004
  AND  B.SUB_CONTRACTOR = 'Y'              COLCARDF=2    MAX_FREQ=87.402%  FF=0.126  LOW2KEY=N HIGH2KEY=Y
         <-- skewed, selective
  AND  B.SUSPENDED = 'N'                   COLCARDF=2    MAX_FREQ=99.833%  FF=0.998
         <-- skewed, not selective

INDEX    CLU  UR  NLEAF  NLEVEL  CR     KEYCOLNAME      COLCARDF  MCARDF
IXCTR01  Y    P   210    3       0.962  PART_NUM        260       278
                                        CONTRACTOR_ID   1047      34722
                                        CONT_TYPE       7         34728
IXCTR02  N    D   50     2       0.624  PART_NUM        260       278
IXCTR03  N    D   56     2       0.348  BEGIN_DT        1015      1015
                                        CONTRACTOR_ID   1047      2555
IXCTR04  N    D   316    3       0.927  CONTRACTOR_ID   1047      1047
                                        PART_NUM        260       34722
                                        BEGIN_DT        1015      34722
                                        END_DT          2656      34722
IXCTR05  N    D   250    3       0.896  CONTRACTOR_ID   1047      1047
                                        BEGIN_DT        1015      2555

Local index analysis B
SELECT COLS
FROM   CONTRACTOR B           CARDF=34,728      QUALIFIED_ROWS=77.8        NPAGESF=724
WHERE  B.PART_NUM = ?                      COLCARDF=260  MAX_FREQ=3.032%   FF=0.004
  AND  B.SUB_CONTRACTOR = 'Y'              COLCARDF=2    MAX_FREQ=87.402%  FF=0.126  LOW2KEY=N HIGH2KEY=Y
         <-- skewed, selective
  AND  B.SUSPENDED = 'N'                   COLCARDF=2    MAX_FREQ=99.833%  FF=0.998
         <-- skewed, not selective

INDEX    CLU  UR  NLEAF  NLEVEL  CR     KEYCOLNAME      COLCARDF  MCARDF
IXCTR01  Y    P   210    3       0.962  PART_NUM        260       278
                                        CONTRACTOR_ID   1047      34722
                                        CONT_TYPE       7         34728
IXCTR02  N    D   50     2       0.624  PART_NUM        260       278
IXCTR04  N    D   316    3       0.927  CONTRACTOR_ID   1047      1047
                                        PART_NUM        260       34722
                                        BEGIN_DT        1015      34722
                                        END_DT          2656      34722
IXCTR05  N    D   250    3       0.896  CONTRACTOR_ID   1047      1047
                                        BEGIN_DT        1015      2555

Note: SUB_CONTRACTOR is selective because the query searches for the least frequent value, but the column is not in any candidate index. Otherwise, local index support looks good. It may be possible to drop IXCTR02 with reverse index scan support.

Local index analysis C
SELECT COLS
FROM   CONT_PARTS C           CARDF=2,093,750   QUALIFIED_ROWS=38,382      NPAGESF=52,189
WHERE  C.PREFERRED = 'Y'                   COLCARDF=3    MAX_FREQ=76.832%  FF=0.018
         <-- skewed, selective

INDEX    CLU  UR  NLEAF  NLEVEL  CR     KEYCOLNAME      COLCARDF  MCARDF
IXCPR04  N    D   15352  3       0.998  PREFERRED       3         3
                                        CONTRACTOR_ID   316       552
                                        PRODUCT_ID      1391650   1808887

Table C:
- There is index support for local filtering.
- Trailing join column (good)

Indexes for local summary
- Each table with local filtering had efficient indexes to support local filtering
- Positives:
  - Efficient access paths exist.
- Negatives:
  - Each table will compete for the outer
  - More "apparently efficient" choices, more stress on optimizer, opportunity for incorrect choice

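Join graph
[Diagram: join graph over tables A, B, C, D, E. Edges from the join predicates of SQL 2: B-C (CONTRACTOR_ID); C-D, C-E, D-E (PRODUCT_ID); D-A (PART_NUM)]
- Two most selective tables 'A' and 'B' are not joined directly
- C, D, E each join on the same column (PRODUCT_ID)
- Shaping up like 'A' with 67 outer rows as outer vs 'B' with 77 rows as outer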
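The join graph can be derived mechanically from the join predicates of SQL 2. The following sketch builds an adjacency map and finds the shortest chain of joins between the two most selective tables; the function names and data layout are assumptions made for illustration.

```python
from collections import defaultdict

# Join predicates of SQL 2 as (table alias, table alias, join column).
JOIN_PREDICATES = [
    ("B", "C", "CONTRACTOR_ID"),
    ("D", "A", "PART_NUM"),
    ("C", "D", "PRODUCT_ID"),
    ("C", "E", "PRODUCT_ID"),
    ("E", "D", "PRODUCT_ID"),
]


def build_join_graph(predicates):
    """Map each table alias to the set of aliases it is directly joined to."""
    graph = defaultdict(set)
    for left, right, _column in predicates:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def shortest_join_path(graph, start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal."""
    frontier = [[start]]
    seen = {start}
    while frontier:
        path = frontier.pop(0)
        if path[-1] == goal:
            return path
        for neighbour in sorted(graph[path[-1]]):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append(path + [neighbour])
    return None


graph = build_join_graph(JOIN_PREDICATES)
print("A joins directly to:", sorted(graph["A"]))
print("Path from A to B:", shortest_join_path(graph, "A", "B"))
```

'A' is joined only to 'D', so reaching 'B' from 'A' requires going through both 'D' and 'C'. Whichever of the two selective tables is chosen as the outer, the other is reached only after the large tables have been joined.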

Join considerations
- Index support for certain join sequences
  - Are indexes available to support matching index access for different desirable join sequences?
- Join reduction / fan-out considerations
  - Consider expansion / contraction of result size through different join sequences

Join indexes A
SELECT COLS
FROM   PART A                 CARDF=17,598      QUALIFIED_ROWS=67.1        NPAGESF=1,467
     , CONTRACTOR B           CARDF=34,728      QUALIFIED_ROWS=77.8        NPAGESF=724
     , CONT_PARTS C           CARDF=2,093,750   QUALIFIED_ROWS=38,382      NPAGESF=52,189
     , PARTS_PROD_ASMBLY D    CARDF=7,058,356   QUALIFIED_ROWS=7,058,356   NPAGESF=68,644
     , PARTS_PROD_ASM_DTL E   CARDF=21,366,326  QUALIFIED_ROWS=21,366,320  NPAGESF=1,236,490
WHERE  A.COUNTRY_CD = ?                    COLCARDF=208  MAX_FREQ=36.408%  FF=0.005
         'FR' = 36.4%, 'GB' = 17%, 'DE' = 10%
  AND  A.PART_CD = ?                       COLCARDF=5    MAX_FREQ=47.199%  FF=0.2
         4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1%
  AND  A.PART_TYPE IN ('F', 'I', 'P')      COLCARDF=8    MAX_FREQ=79.867%  FF=0.958
  AND  B.PART_NUM = ?                      COLCARDF=260  MAX_FREQ=3.032%   FF=0.004
  AND  B.SUB_CONTRACTOR = 'Y'              COLCARDF=2    MAX_FREQ=87.402%  FF=0.126  LOW2KEY=N HIGH2KEY=Y
  AND  B.SUSPENDED = 'N'                   COLCARDF=2    MAX_FREQ=99.833%  FF=0.998
  AND  C.PREFERRED = 'Y'                   COLCARDF=3    MAX_FREQ=76.832%  FF=0.018
  AND  B.CONTRACTOR_ID = C.CONTRACTOR_ID   COLCARDF=1,047/316              FF=9.551E-4
  AND  D.PART_NUM = A.PART_NUM             COLCARDF=6,132/17,598           FF=5.682E-5
  AND  C.PRODUCT_ID = D.PRODUCT_ID         COLCARDF=1,391,650/7,058,356    FF=1.417E-7
  AND  C.PRODUCT_ID = E.PRODUCT_ID         COLCARDF=1,391,650/21,366,326   FF=4.68E-8
  AND  E.PRODUCT_ID = D.PRODUCT_ID         COLCARDF=21,366,326/7,058,356   FF=4.68E-8

INDEX    CLU  UR  NLEAF  NLEVEL  CR     KEYCOLNAME   COLCARDF  MCARDF
IXPRT01  N    P   83     2       0.782  PART_NUM     17598     17598
IXPRT02  N    D   112    2       0.782  PART_NUM     17598     17598
                                        PART_TYPE    8         17598
                                        PART_CD      5         17598
IXPRT04  N    D   99     2       0.782  PART_TYPE    8         8
                                        PART_NUM     17598     17598
IXPRTxx  N    D   122    2       0.782  PART_NUM     17598     17603
                                        PART_CD      5         -1
                                        COUNTRY_CD   208       17603

Join indexes A
- Join access is available only through the join to the 'D' table
  - Via PART_NUM if 'D' is the outer
- There are multiple indexes to support 'A' as inner
- IXPRT02 and IXPRTxx appear redundant
  - IXPRTxx is superset of IXPRT02, same column sequence

INDEX    CLU  UR  NLEAF  NLEVEL  CR     KEYCOLNAME   COLCARDF  MCARDF
IXPRT01  N    P   83     2       0.782  PART_NUM     17598     17598
IXPRT02  N    D   112    2       0.782  PART_NUM     17598     17598
                                        PART_TYPE    8         17598
                                        PART_CD      5         17598
IXPRT04  N    D   99     2       0.782  PART_TYPE    8         8
                                        PART_NUM     17598     17598
IXPRTxx  N    D   122    2       0.782  PART_NUM     17598     17603
                                        PART_CD      5         -1
                                        COUNTRY_CD   208       17603

Join indexes B
SELECT COLS
FROM   PART A                 CARDF=17,598      QUALIFIED_ROWS=67.1        NPAGESF=1,467
     , CONTRACTOR B           CARDF=34,728      QUALIFIED_ROWS=77.8        NPAGESF=724
     , CONT_PARTS C           CARDF=2,093,750   QUALIFIED_ROWS=38,382      NPAGESF=52,189
     , PARTS_PROD_ASMBLY D    CARDF=7,058,356   QUALIFIED_ROWS=7,058,356   NPAGESF=68,644
     , PARTS_PROD_ASM_DTL E   CARDF=21,366,326  QUALIFIED_ROWS=21,366,320  NPAGESF=1,236,490
WHERE  A.COUNTRY_CD = ?                    COLCARDF=208  MAX_FREQ=36.408%  FF=0.005
         'FR' = 36.4%, 'GB' = 17%, 'DE' = 10%
  AND  A.PART_CD = ?                       COLCARDF=5    MAX_FREQ=47.199%  FF=0.2
         4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1%
  AND  A.PART_TYPE IN ('F', 'I', 'P')      COLCARDF=8    MAX_FREQ=79.867%  FF=0.958
  AND  B.PART_NUM = ?                      COLCARDF=260  MAX_FREQ=3.032%   FF=0.004
  AND  B.SUB_CONTRACTOR = 'Y'              COLCARDF=2    MAX_FREQ=87.402%  FF=0.126  LOW2KEY=N HIGH2KEY=Y
  AND  B.SUSPENDED = 'N'                   COLCARDF=2    MAX_FREQ=99.833%  FF=0.998
  AND  C.PREFERRED = 'Y'                   COLCARDF=3    MAX_FREQ=76.832%  FF=0.018
  AND  B.CONTRACTOR_ID = C.CONTRACTOR_ID   COLCARDF=1,047/316              FF=9.551E-4
  AND  D.PART_NUM = A.PART_NUM             COLCARDF=6,132/17,598           FF=5.682E-5
  AND  C.PRODUCT_ID = D.PRODUCT_ID         COLCARDF=1,391,650/7,058,356    FF=1.417E-7
  AND  C.PRODUCT_ID = E.PRODUCT_ID         COLCARDF=1,391,650/21,366,326   FF=4.68E-8
  AND  E.PRODUCT_ID = D.PRODUCT_ID         COLCARDF=21,366,326/7,058,356   FF=4.68E-8

INDEX    CLU  UR  NLEAF  NLEVEL  CR     KEYCOLNAME      COLCARDF  MCARDF
IXCTR01  Y    P   210    3       0.962  PART_NUM        260       278
                                        CONTRACTOR_ID   1047      34722
                                        CONT_TYPE       7         34728
IXCTR04  N    D   316    3       0.927  CONTRACTOR_ID   1047      1047
                                        PART_NUM        260       34722
                                        BEGIN_DT        1015      34722
                                        END_DT          2656      34722
IXCTR05  N    D   250    3       0.896  CONTRACTOR_ID   1047      1047
                                        BEGIN_DT        1015      2555

Join indexes B
- Join access is available only through the join to the 'C' table
  - Via CONTRACTOR_ID if 'C' is the outer
- There are multiple indexes to support 'B' as inner
- IXCTR01 has PART_NUM as leading local
  - Join from outer will hit far fewer leaf pages due to leading local predicate
  - Smaller "swath" of leaf pages: NLEAF * 1 / (PART_NUM COLCARDF) = 210 * (1/260) ~= 1 leaf page
  - Makes this index "outstanding" from inner index access perspective
  - Also an effective "outer" index since it provides good local filtering and join order for a join to 'C' table as inner
- IXCTR04, IXCTR05 lead with join predicate
  - Support the join effectively
  - Join scattered over all leaf pages

INDEX    CLU  UR  NLEAF  NLEVEL  CR     KEYCOLNAME      COLCARDF  MCARDF
IXCTR01  Y    P   210    3       0.962  PART_NUM        260       278
                                        CONTRACTOR_ID   1047      34722
                                        CONT_TYPE       7         34728
IXCTR04  N    D   316    3       0.927  CONTRACTOR_ID   1047      1047
                                        PART_NUM        260       34722
                                        BEGIN_DT        1015      34722
                                        END_DT          2656      34722
IXCTR05  N    D   250    3       0.896  CONTRACTOR_ID   1047      1047
                                        BEGIN_DT        1015      2555
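The "swath" estimate for IXCTR01 is simple arithmetic on the index statistics. A sketch follows, assuming leaf pages are spread evenly across the values of the leading column; the helper name `leaf_pages_for_leading_value` is hypothetical.

```python
import math


def leaf_pages_for_leading_value(nleaf: int, leading_colcardf: int) -> float:
    """Average number of leaf pages holding one value of the leading index column."""
    return nleaf / leading_colcardf


# IXCTR01: NLEAF 210, leading column PART_NUM with COLCARDF 260.
ixctr01_swath = leaf_pages_for_leading_value(210, 260)

# At least one page is always read, so round up to whole pages.
ixctr01_pages = math.ceil(ixctr01_swath)

print(f"IXCTR01 swath: {ixctr01_swath:.2f} leaf pages, about {ixctr01_pages} page per PART_NUM value")
```

With the local predicate on PART_NUM applied first, the join probes stay within about one leaf page of IXCTR01. With IXCTR04 or IXCTR05 the probes are spread over all 316 or 250 leaf pages.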

Join indexes C
SELECT COLS
FROM   PART A                 CARDF=17,598      QUALIFIED_ROWS=67.1        NPAGESF=1,467
     , CONTRACTOR B           CARDF=34,728      QUALIFIED_ROWS=77.8        NPAGESF=724
     , CONT_PARTS C           CARDF=2,093,750   QUALIFIED_ROWS=38,382      NPAGESF=52,189
     , PARTS_PROD_ASMBLY D    CARDF=7,058,356   QUALIFIED_ROWS=7,058,356   NPAGESF=68,644
     , PARTS_PROD_ASM_DTL E   CARDF=21,366,326  QUALIFIED_ROWS=21,366,320  NPAGESF=1,236,490
WHERE  A.COUNTRY_CD = ?                    COLCARDF=208  MAX_FREQ=36.408%  FF=0.005
         'FR' = 36.4%, 'GB' = 17%, 'DE' = 10%
  AND  A.PART_CD = ?                       COLCARDF=5    MAX_FREQ=47.199%  FF=0.2
         4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1%
  AND  A.PART_TYPE IN ('F', 'I', 'P')      COLCARDF=8    MAX_FREQ=79.867%  FF=0.958
  AND  B.PART_NUM = ?                      COLCARDF=260  MAX_FREQ=3.032%   FF=0.004
  AND  B.SUB_CONTRACTOR = 'Y'              COLCARDF=2    MAX_FREQ=87.402%  FF=0.126  LOW2KEY=N HIGH2KEY=Y
  AND  B.SUSPENDED = 'N'                   COLCARDF=2    MAX_FREQ=99.833%  FF=0.998
  AND  C.PREFERRED = 'Y'                   COLCARDF=3    MAX_FREQ=76.832%  FF=0.018
  AND  B.CONTRACTOR_ID = C.CONTRACTOR_ID   COLCARDF=1,047/316              FF=9.551E-4
  AND  D.PART_NUM = A.PART_NUM             COLCARDF=6,132/17,598           FF=5.682E-5
  AND  C.PRODUCT_ID = D.PRODUCT_ID         COLCARDF=1,391,650/7,058,356    FF=1.417E-7
  AND  C.PRODUCT_ID = E.PRODUCT_ID         COLCARDF=1,391,650/21,366,326   FF=4.68E-8
  AND  E.PRODUCT_ID = D.PRODUCT_ID         COLCARDF=21,366,326/7,058,356   FF=4.68E-8

INDEX    CLU  UR  NLEAF  NLEVEL  CR     KEYCOLNAME        COLCARDF  MCARDF
IXCPR01  Y    U   21367  3       1.0    PRODUCT_ID        1391650   1391650
                                        CONTRACTOR_ID     316       1794093
                                        CO_DTHRCONTACT    1645213   2093750
IXCPR02  N    D   14771  3       0.999  CONTRACTOR_ID     316       316
                                        PRODUCT_ID        1391650   1794093
IXCPR03  N    D   16188  3       0.998  CONTRACTOR_ID     316       316
                                        CO_PHASECONTACT   4         783
                                        PRODUCT_ID        1391650   1931232
IXCPR04  N    D   15352  3       0.998  PREFERRED         3         3
                                        CONTRACTOR_ID     316       552
                                        PRODUCT_ID        1391650   1808887

Join indexes C
- Join access is available through joins to the 'B', 'D', and 'E' tables
  - Via CONTRACTOR_ID if 'B' is the outer composite
  - Via PRODUCT_ID if 'D' or 'E' are in the outer composite
- There is support for either join sequence.
  - IXCPR01 has PRODUCT_ID as leading column to support 'D' or 'E' in outer composite
  - IXCPR02 and IXCPR03 have CONTRACTOR_ID as leading join column if 'B' is the outer composite
  - IXCPR03 would also be a candidate if B were cartesianed with D or E. Not that I think that's likely.
- IXCPR04 would likely be the preferred index if 'B' were in outer composite
  - Selective leading local on PREFERRED bounds the leaf pages that would be hit to < 2% of all leaf pages
  - Makes 'C' a possible efficient outer: good local filtering, provides join ordering for join to 'B' table

INDEX    CLU  UR  NLEAF  NLEVEL  CR     KEYCOLNAME        COLCARDF  MCARDF
IXCPR01  Y    U   21367  3       1.0    PRODUCT_ID        1391650   1391650
                                        CONTRACTOR_ID     316       1794093
                                        CO_DTHRCONTACT    1645213   2093750
IXCPR02  N    D   14771  3       0.999  CONTRACTOR_ID     316       316
                                        PRODUCT_ID        1391650   1794093
IXCPR03  N    D   16188  3       0.998  CONTRACTOR_ID     316       316
                                        CO_PHASECONTACT   4         783
                                        PRODUCT_ID        1391650   1931232
IXCPR04  N    D   15352  3       0.998  PREFERRED         3         3
                                        CONTRACTOR_ID     316       552
                                        PRODUCT_ID        1391650   1808887

Join indexes D

INDEX    CLU  UR  NLEAF  NLEVEL  CR     KEYCOLNAME  COLCARDF  MCARDF
IXPDA05  Y    P   44900  4       0.975  PRODUCT_ID  7058356   7058356
IXPDA02  N    D   70586  4       0.868  PART_NUM    6132      6132
                                        PRODUCT_ID  7058356   7058356
IXPDA06  N    D   66590  4       0.975  PRODUCT_ID  7058356   7058356
                                        PART_NUM    6132      7058356

Join indexes D

‘D’ is accessed in multiple directions
  Via PART_NUM if ‘A’ is the outer
  Via PRODUCT_ID if accessed through ‘C’ or ‘E’
Both join directions are supported by matching index access.
  PART_NUM is the leading column of IXPDA02
  PRODUCT_ID is the leading column of IXPDA05 and IXPDA06
The non-primary-key indexes are defined as allowing duplicates, but they cannot contain any.
  PRODUCT_ID is the primary key and is included in a unique index.
  Any index that contains PRODUCT_ID is therefore unique.
  Defining these indexes as unique would save some space in the index.
    Duplicate indexes have slightly larger control structures to allow for duplicate RIDs.
    DB2 must allow for duplicates if the index is not explicitly defined as unique, since you could drop the unique index.

Join indexes E

INDEX    CLU  UR  NLEAF   NLEVEL  CR     KEYCOLNAME  COLCARDF  MCARDF
IXPPA01  N    U   141499  4       0.609  PRODUCT_ID  21366326  21366326

Join access is available through the ‘C’ and ‘D’ tables
  Both tables join on the PRODUCT_ID column
Join is supported via the IXPDA01 index
  PRODUCT_ID is the only column
  Unique index (no fan-out when joining to this table)

Join fan-out

Look at join fan-out issues
  Qualified outer rows * (CARDF of inner / MAX(join COLCARDF))
A -> D
  67.1 rows * (7,058,356 / 17,598) ~= 27,000 rows
B -> C or C -> B
  77.8 rows * (2,093,750 / 1,047) ~= 155,500 rows (after local filtering on C, down to 38K)
So B -> C is expected to fan out far more.
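The fan-out arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the slide's formula, not the optimizer's actual cost model; the function names are illustrative, and the inputs are the CARDF, QUALIFIED_ROWS, and join COLCARDF values from the annotated query. The join filter factors shown in the query (for example FF=9.551E-4 for B.CONTRACTOR_ID = C.CONTRACTOR_ID) are consistent with 1 / MAX(join COLCARDF), which is what the helper computes.

```python
def join_filter_factor(colcardf_left: float, colcardf_right: float) -> float:
    """Equi-join filter factor taken as 1 / MAX(join column cardinalities)."""
    return 1.0 / max(colcardf_left, colcardf_right)


def fan_out(qualified_outer_rows: float, inner_cardf: float,
            colcardf_left: float, colcardf_right: float) -> float:
    """Qualified outer rows * (CARDF of inner / MAX(join COLCARDF))."""
    return qualified_outer_rows * inner_cardf * join_filter_factor(
        colcardf_left, colcardf_right)


# A -> D on PART_NUM: COLCARDF=6,132/17,598, inner D has CARDF=7,058,356
a_to_d = fan_out(67.1, 7_058_356, 6_132, 17_598)

# B -> C on CONTRACTOR_ID: COLCARDF=1,047/316, inner C has CARDF=2,093,750
b_to_c = fan_out(77.8, 2_093_750, 1_047, 316)

print(f"A -> D: ~{a_to_d:,.0f} rows")
print(f"B -> C: ~{b_to_c:,.0f} rows (before local filtering on C)")
```

Both results land within rounding of the slide's figures of roughly 27,000 and 155,500 rows.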

Explain

PLANNO METHOD MERGE_COLS TB_NAME MATCH COLS ACCESS_TYPE ACCESS NAME IX_ONLY SORTN_JOIN SORTC_JOIN
1 PART 2 I IXPRT01 N PARTS_PROD_ASSEMBLY IXPPA02 Y 3 CONT_PARTS IXPRD04 4 CONTRACTOR IXCTR01 5 PARTS_PROD_ASM_DTL IXPDA01

Join sequence
  Access ‘A’ via index IXPRT01 (PART_CD, COUNTRY_CD, …): ~67 rows
  Nested loop join to ‘D’ using index IXPDA02 (PART_NUM, PRODUCT_ID): ~27,000 rows
  Sort merge join to ‘C’: ~7,900 rows
    Sort the composite into PRODUCT_ID sequence
    Access ‘C’ via IXCPR04 (PREFERRED, CONTRACTOR_ID)
    Sort the new table into PRODUCT_ID sequence
  Nested loop join to ‘B’ via index IXCTR01 (PART_NUM, CONTRACTOR_ID, CE_TYPE): ~7,900 rows
  Nested loop join to ‘E’ via index IXPDA01 (PRODUCT_ID): ~7,900 rows

Blue = local predicate, Green = join predicate

Issues – A as outer?

Is local filtering on the ‘A’ table accurate?
  There is skew, but the use of parameter markers precludes recognition of skew
  Qualified rows and fan-out could be much worse than estimated
  ‘A’ as outer could be underestimated, depending on which values are used
Sort merge join to ‘C’ to avoid 27K probes
  The optimizer does not want to probe 27K times with matching + fan-out on PRODUCT_ID
  It uses the efficient local index instead
    1 probe to scan 38K rows via PREFERRED
    versus 27K probes * 2 rows per inner via an index on PRODUCT_ID
  An index on (PREFERRED, PRODUCT_ID) would likely avert the SMJ in this context
    Hesitant to recommend the index, since A -> D -> C could be an inefficient sequence.

Issues – B / C as outer?

B as outer
  Less skew on B.PART_NUM = ?, so less uncertainty in the cost estimate
  Fan-out to 38K rows is discouraging
  B -> C is supported by an efficient local + equals index (PREFERRED, CONTRACTOR_ID, PRODUCT_ID)
C is also a desirable outer
  Index on (PREFERRED, CONTRACTOR_ID, PRODUCT_ID) provides a good local filter
  Could access B via local filtering on B.PART_NUM = ?, then materialize 77 rows into a workfile for the sort merge join

Summary Query 2

Bottom line: Multiple choices
Uniform distribution estimate on the ‘A’ table allows it to compete very favorably.
  If ‘FR’, ‘GB’, or ‘DE’ is used for COUNTRY_CD, ‘A’ as outer is no longer desirable.
    Are ‘FR’, ‘GB’, ‘DE’ values frequently used for this query?
  If PART_CD = ‘4’ is used frequently, ‘A’ as outer is no longer desirable.
    Is ‘4’ used frequently?
  Split query, REOPT, OPTHINTS…
Multiple choices
  Local filtering is spread across several tables
  Estimated filtering looks good
  Efficient access paths (indexes to support local and join predicates) exist
  More difficult for the optimizer to identify the cheapest path
  Scenario is more regression prone
  The optimizer may need more statistics, and the ability to use more statistics (REOPT), to identify the cheapest path
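To see why the frequent values matter, the following sketch compares the uniform filter factor used for a parameter marker (1 / COLCARDF) with the actual frequencies annotated on the query. The frequencies and COLCARDF values are copied from the annotated query; the dictionary layout and function names are illustrative assumptions, and the ratio is only a measure of how far off the uniform estimate is, not a recomputation of the optimizer's QUALIFIED_ROWS.

```python
# Frequencies and COLCARDF values are copied from the annotated query;
# the structure and names below are illustrative only.
SKEWED_COLUMNS = {
    "COUNTRY_CD": {"colcardf": 208,
                   "freq": {"FR": 0.364, "GB": 0.17, "DE": 0.10}},
    "PART_CD": {"colcardf": 5,
                "freq": {"4": 0.47, "2": 0.27, "6": 0.17, "1": 0.08}},
}


def uniform_ff(colcardf: int) -> float:
    """Filter factor for `col = ?` when only COLCARDF can be used."""
    return 1.0 / colcardf


def underestimate_ratio(actual_freq: float, colcardf: int) -> float:
    """How many times larger the true filter factor is than the uniform one."""
    return actual_freq / uniform_ff(colcardf)


for column, stats in SKEWED_COLUMNS.items():
    for value, freq in stats["freq"].items():
        ratio = underestimate_ratio(freq, stats["colcardf"])
        print(f"{column} = {value!r}: actual {freq:.1%} vs uniform "
              f"{uniform_ff(stats['colcardf']):.2%} ({ratio:.1f}x)")
```

For COUNTRY_CD the frequent values qualify tens of times more rows than the uniform estimate suggests, which is the case where ‘A’ stops being a good outer table.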

Commentary

How to perform SQL analysis
  Format the query so it’s readable
  Annotate with important statistics
    Tables: table cardinality, NPAGES, qualified number of rows
    Predicates: COLCARDF, LOW2KEY, HIGH2KEY, filter factor estimate
  Are table-level estimates reasonable based on your knowledge?
    If you don’t know, perform counts to find out whether the estimates are accurate
    If you don’t know how selective things are, how will you know what the best path should be?
  Are predicate-level filtering estimates reasonable?
  Reference the table, index, and indexed columns report
    Is the best local filtering supported through matching index access?
      Any mis-estimated local filtering that’s also matching indexable (may cause one path to look far more efficient than it really is)
      With trailing join predicates to provide order to the next desired table (bonus)
    Is there adequate (matching) index support for the desired join sequences?
  Develop an understanding of “plausible” and “desirable” access paths
  Examine EXPLAIN output
    Does the optimizer choose the path you expect?
    If not, you should have a better understanding of what makes other access paths competitive, so tuning can be more targeted
      E.g., a certain predicate appears to be filtering, but is not.
      Can use REOPT, or a trick, targeted to solve a specific problem.
    Skilled targeted tuning is less susceptible to re-regression than blind tuning (where the problem is not understood)
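The "perform counts" step might look like the sketch below. It uses Python's built-in sqlite3 module and a small made-up table purely so the example runs anywhere; against DB2 the same COUNT(*) queries would be issued through whatever client is in use. The table contents, the helper name, and the use of rows / distinct values as a stand-in for the optimizer's uniform estimate are all assumptions for illustration.

```python
import sqlite3


def compare_estimate_to_count(conn, table, column, value):
    """Return (uniform_estimate, actual_count) for `column = value`.

    The table and column names are interpolated directly, so they must come
    from trusted input; this is a sketch, not production code.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    colcard = conn.execute(
        f"SELECT COUNT(DISTINCT {column}) FROM {table}").fetchone()[0]
    actual = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} = ?",
        (value,)).fetchone()[0]
    return total / colcard, actual


# Hypothetical, deliberately skewed data: 60 'FR', 30 'GB', 10 singletons.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (part_num INTEGER, country_cd TEXT)")
others = ["DE", "US", "IT", "ES", "NL", "BE", "SE", "JP", "CA", "BR"]
rows = ([(i, "FR") for i in range(60)]
        + [(60 + i, "GB") for i in range(30)]
        + [(90 + i, cd) for i, cd in enumerate(others)])
conn.executemany("INSERT INTO part VALUES (?, ?)", rows)

estimate, actual = compare_estimate_to_count(conn, "part", "country_cd", "FR")
print(f"uniform estimate ~{estimate:.1f} rows, actual count {actual} rows")
```

A large gap between the two numbers is the signal that a predicate is mis-estimated and that the access path built on it deserves a closer look.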