DB2 for z/OS Query Optimization  IBM Corporation 2010 1  IBM Corporation 2003 Tampa Bay Relational Users Group IBM Silicon Valley Lab, U.S.A.

Presentation transcript:


Filter factor issues
Filter factor accuracy important for...
– Index matching
    Accurately estimate index cost
– Total index filtering
    Estimate table access cost via index(es)
    Choose how to use index (prefetch?)
– Total table level filtering
    Efficient join order
    Efficient join method
    Appropriate sorts

Terminology
Correlation
– When data on two columns is not independent
– E.g. CITY, STATE
    Not every city exists in every state.
Data skew (or skew)
– Describes a situation where data is non-uniformly distributed
– Data can be point-skewed on a value or skewed over a range
– E.g. Gender
    Domain (M, F)
    35% = M, 65% = F

Terminology (cont.)
MFREQ: Multi-column frequency
– Frequency on concatenated column group
– MFREQ(C1,C2,C3)
MCARD: Multi-column cardinality
– Multi-column cardinality on a column group
– MCARD(C1,C2,C3)

Selectivity statistics
Single column
– Cardinality
– HIGH2KEY/LOW2KEY
– Frequency
Multi-column
– Cardinality
– Frequency
– Histogram

Single column cardinality
– Number of distinct values for a column
– Assumes uniform distribution
– Stored as
    SYSCOLUMNS.COLCARDF
    SYSINDEXES.FIRSTKEYCARDF
– Used when better statistics can't be used...
    Host variables, parameter markers, special registers
    No other statistics available

RUNSTATS column cardinality
How to collect
– RUNSTATS command:
    RUNSTATS TABLESPACE (DBNAME.TSNAME)
      TABLE (ALL or PAT_TABLE)
      COLUMN(ALL or )
– Leading column of index when RUNSTATS on index performed
    RUNSTATS INDEX (PAT_INDEX)
    RUNSTATS TABLESPACE (DBNAME.TSNAME) INDEX(ALL)

Single column cardinality
For equals predicate, filter factor = 1/COLCARDF
– Index pages --> probe + matching FF * NLEAF
    3 + (1/5) * 10,000 = 2,003 index pages
– Index record ids processed = CARDF * matching index filtering
    100,000 * (1/5) = 20,000
– Rows returned = CARDF * total filtering
    100,000 * (1/5) = 20,000
Select C2 from T1 Where C1 = ?
Index I1: C1,C2,C3
T1 CARDF = 100,000
C1 COLCARDF = 5
I1 NLEAF = 10,000
I1 NLEVELS = 3
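The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the slide's formulas, not DB2's actual cost model. The function names are hypothetical, and reading "probe" as one page per index level (NLEVELS = 3) is an assumption taken from the slide's numbers.

```python
from fractions import Fraction

def equals_filter_factor(colcardf):
    """Filter factor for 'col = ?' under the uniform-distribution assumption."""
    return Fraction(1, colcardf)

def index_pages(nlevels, matching_ff, nleaf):
    """Probe (assumed: one page per index level) plus the matching share of leaf pages."""
    return nlevels + matching_ff * nleaf

# Statistics from the slide: T1 CARDF, C1 COLCARDF, I1 NLEAF, I1 NLEVELS
CARDF, COLCARDF_C1, NLEAF, NLEVELS = 100_000, 5, 10_000, 3

ff = equals_filter_factor(COLCARDF_C1)
pages = index_pages(NLEVELS, ff, NLEAF)
rows = CARDF * ff
print(pages, rows)
```

Exact fractions are used instead of floats so the results match the slide's 2,003 index pages and 20,000 rows without rounding noise.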

Single column cardinality
Two matching predicates, multiply filter factors
– Index pages --> probe + matching FF * NLEAF
    3 + [(1/5) * (1/10)] * 10,000 =~ 203 index pages
– Index record ids processed = CARDF * matching index filtering
    100,000 * [(1/5) * (1/10)] =~ 2,000
– Qualified rows = CARDF * total filtering
    100,000 * [(1/5) * (1/10)] =~ 2,000 rows
Select C3 from T1 Where C1 = ? AND C2 = ?
Index I1: C1,C2,C3
C1 COLCARDF = 5
C2 COLCARDF = 10
FULLKEYCARDF 65,000
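With two matching equals predicates, the slide multiplies the single-column filter factors, which treats the columns as independent. A minimal sketch of that step follows; the helper name is hypothetical, and CARDF, NLEAF and the 3-page probe are carried over from the previous slide as an assumption.

```python
from fractions import Fraction
from math import prod

def combined_filter_factor(colcardfs):
    """Product of 1/COLCARDF over all predicates, assuming the columns are independent."""
    return prod((Fraction(1, c) for c in colcardfs), start=Fraction(1))

# C1 COLCARDF = 5 and C2 COLCARDF = 10 from the slide.
# CARDF, NLEAF and the 3-page probe are assumed unchanged from the previous slide.
CARDF, NLEAF, PROBE = 100_000, 10_000, 3

ff = combined_filter_factor([5, 10])
pages = PROBE + ff * NLEAF
rows = CARDF * ff
print(ff, pages, rows)
```

The independence assumption is what makes this estimate fragile: if C1 and C2 are correlated, the true filter factor can be far larger than 1/50.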

Single column cardinality
Range predicate with parameter marker
– Use default interpolation filter factor chart
    COLCARDF 10,121 --> FF = 1/100
– In reality, could qualify anywhere from all to no rows
Here's another sample predicate:
– BIRTH_DATE <= ?
    How many people in the room were born before the parameter marker value?
– What if value is ' '?
– What if value is ' '?
– Cannot accurately estimate without literal value
Select C2 from T1 Where C1 > ?
Index I1: C1,C2,C3
C1 COLCARDF = 10,121

Range predicate interpolation
Table 104. Default filter factors for interpolation

COLCARDF         Factor for Op   Factor for LIKE/BETWEEN
>=100,000,000    1/10,000        3/100,000
>=10,000,000     1/3,000         1/10,000
>=1,000,000      1/1,000         3/10,000
>=100,000        1/300           1/1,000
>=10,000         1/100           3/1,000
>=1,000          1/30            1/100
>=100            1/10            3/100
>=0              1/3             1/10

Note: Op is one of these operators: <, <=, >, >=.
COMMENT: This is DB2's documented guess for an impossible-to-estimate filter factor.
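The default-factor table is a simple threshold lookup, sketched here in Python. The values are copied from the table on this slide. The function name and the `like_or_between` flag are hypothetical and not part of any DB2 interface.

```python
from fractions import Fraction

# (minimum COLCARDF, factor for Op, factor for LIKE/BETWEEN), copied from the slide's table
DEFAULT_FACTORS = [
    (100_000_000, Fraction(1, 10_000), Fraction(3, 100_000)),
    (10_000_000, Fraction(1, 3_000), Fraction(1, 10_000)),
    (1_000_000, Fraction(1, 1_000), Fraction(3, 10_000)),
    (100_000, Fraction(1, 300), Fraction(1, 1_000)),
    (10_000, Fraction(1, 100), Fraction(3, 1_000)),
    (1_000, Fraction(1, 30), Fraction(1, 100)),
    (100, Fraction(1, 10), Fraction(3, 100)),
    (0, Fraction(1, 3), Fraction(1, 10)),
]

def default_range_ff(colcardf, like_or_between=False):
    """Return the default filter factor for the first row whose threshold COLCARDF meets."""
    for minimum, op_ff, lb_ff in DEFAULT_FACTORS:
        if colcardf >= minimum:
            return lb_ff if like_or_between else op_ff
    raise ValueError("COLCARDF must be non-negative")

print(default_range_ff(10_121))        # range operator on a column with COLCARDF 10,121
print(default_range_ff(10_121, True))  # LIKE/BETWEEN on the same column
```

For COLCARDF 10,121 the lookup lands on the >=10,000 row, which is where the FF = 1/100 in the range predicate example comes from.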

Single column cardinality
Matching cost
– Index pages --> probe + matching FF * NLEAF
    probe + (1/10) * 10,000 =~ 1,003 pages
– Index rows processed = CARDF * matching FF
    1,000,000 * (1/10) = 100,000 rows
Screening
– Rows to access table for = CARDF * (matching and screening FF)
    1,000,000 * [(1/10) * (1/100)] =~ 1,000 rows???
Select C4 from T1 Where C1 = ? AND C3 > ?
Index I1: C1,C2,C3
CARDF = 1 million
NPAGES/F = 100,000
NLEAF = 10,000
C1 COLCARDF 10
C3 COLCARDF 10,121
clusterratiof 50

HIGH2KEY/LOW2KEY
– Single column statistic
    SYSCOLUMNS.HIGH2KEY
    SYSCOLUMNS.LOW2KEY
– When used?
    Interpolation used to estimate range predicates
    LIKE, BETWEEN, <, <=, >, >=
    Literal value must be known
    As domain statistics when COLCARDF = 1 or 2
    Can be used in combination with single column frequencies for more accurate estimate.
DB2 Interpolation: Technique to estimate the percentage of rows which qualify based on known high / low values.

RUNSTATS HIGH2KEY / LOW2KEY
How to collect
– Whenever single column cardinality is collected, HIGH2KEY / LOW2KEY are also collected.
– Reference the RUNSTATS column cardinality slide

Linear interpolation
Matching cost - same as before
Screening
– Rows to access table for = CARDF * (matching FF * screening FF)
– 1,000,000 * [(1/10) * (screening FF)] =~ ???
– Interpolation for C3 > 50
    (HIGH2KEY - LITVALUE) / (HIGH2KEY - LOW2KEY)
    (100 - 50) / (100 - 0) = 50/100 = 0.5
– 1,000,000 * [(1/10) * (0.5)] =~ 50,000 rows (vs 1,000 with default FF)
Select C4 from T1 Where C1 = ? AND C3 > 50
Index I1: C1,C2,C3
CARDF = 1 million
NPAGES/F = 100,000
C3 COLCARDF 10,241
LOW2KEY 0
HIGH2KEY 100
clusterratiof 50
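The interpolation formula on this slide is short enough to sketch directly. This is an illustration of the slide's formula for a "greater than" predicate only. The function name is hypothetical, and clamping the result to the range 0 to 1 is an added assumption for literals outside LOW2KEY..HIGH2KEY; DB2's real handling of such values may differ.

```python
from fractions import Fraction

def interpolate_gt(literal, low2key, high2key):
    """FF for 'col > literal': share of the LOW2KEY..HIGH2KEY range above the literal."""
    ff = Fraction(high2key - literal, high2key - low2key)
    # Assumption: clamp to [0, 1] when the literal lies outside the known range.
    return min(max(ff, Fraction(0)), Fraction(1))

CARDF = 1_000_000
matching_ff = Fraction(1, 10)               # C1 = ?, COLCARDF 10
screening_ff = interpolate_gt(50, 0, 100)   # C3 > 50, LOW2KEY 0, HIGH2KEY 100

rows_interpolated = CARDF * matching_ff * screening_ff
rows_default = CARDF * matching_ff * Fraction(1, 100)  # default FF from the chart
print(screening_ff, rows_interpolated, rows_default)
```

The two estimates differ by a factor of 50 (50,000 rows versus 1,000), which is the gap the slide is pointing at when the literal value is known.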

Single column frequencies
– SYSCOLDIST.FREQUENCYF
    TYPE = 'F', NUMCOLUMNS = 1
– Provides non-uniform distribution information
    Data skew
– When used?
    Literal value must be known
    Equals, IS NULL, IN
    LIKE, BETWEEN, <, <=, >, >=
    Used in conjunction with other complementary statistics

RUNSTATS Frequency
Leading indexed column
– PAT_INDEX (C1,C2,C3)
– Top 10 values
    RUNSTATS INDEX (PAT_INDEX)
– Top 20 values
    RUNSTATS INDEX (PAT_INDEX FREQVAL NUMCOLS(1) COUNT(20))
– Top 0 values (purge frequencies)
    RUNSTATS INDEX (PAT_INDEX FREQVAL NUMCOLS(1) COUNT(0))

RUNSTATS Frequency
How to collect
– RUNSTATS COLGROUP allows collection on almost any column
    RUNSTATS TABLESPACE DB1.TS1
      TABLE (PAT_TABLE) COLUMN(C1,C2,C3)
      COLGROUP (C1) FREQVAL COUNT(1) MOST
      COLGROUP (C2) FREQVAL COUNT(10) LEAST
      COLGROUP (C3) FREQVAL COUNT(20) BOTH
– Eliminate existing frequencies:
    COLGROUP (C3) FREQVAL COUNT(0) MOST

Single column frequency
Value which occurs a lot
– Index pages --> probe + matching FF * NLEAF
    3 + (0.75) * 10,000 =~ 7,503 index pages
– Index record ids processed = CARDF * matching index filtering
    100,000 * (0.75) =~ 75,000
– Rows returned = CARDF * total filtering
    100,000 * (0.75) =~ 75,000
Select C4 from T1 Where C1 = 'A'
Index I1: C1,C2,C3
T1 CARDF = 100,000
C1 COLCARDF = 5
I1 NLEAF = 10,000
I1 NLEVELS = 3
clusterratiof 50

C1     FREQ
'A'    0.75
'B'    0.15
'C'    0.05
'D'    0.03
'E'    0.02

Single column frequency
Looking for value not in the domain....
– Index pages --> probe + matching FF * NLEAF
    3 + (~0) * 10,000 =~ 1 index page
– Filter factor without the frequencies
    1/5 = 0.20
Select C4 from T1 Where C1 = 'Z'
Index I1: C1,C2,C3
T1 CARDF = 100,000
C1 COLCARDF = 5
I1 NLEAF = 10,000
I1 NLEVELS = 3
clusterratiof 0.50 ?

C1     FREQ
'A'    0.75
'B'    0.15
'C'    0.05
'D'    0.03
'E'    0.02

Single column frequency
Some in domain, some not...
– Index pages --> probe + matching FF * NLEAF
    3 + (0.05 + 0 + 0.02) * 10,000 =~ 700 index pages
– Rows returned = CARDF * total filtering
    100,000 * (0.05 + 0 + 0.02) =~ 7,000
– Without frequencies filter factor = 3/5 = 0.60 versus 0.07
Select C2 from T1 Where C1 in ('C', 'Z', 'E')
Index I1: C1,C2,C3
T1 CARDF = 100,000
C1 COLCARDF = 5
I1 NLEAF = 10,000
I1 NLEVELS = 3

C1     FREQ
'A'    0.75
'B'    0.15
'C'    0.05
'D'    0.03
'E'    0.02
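The three frequency slides share one rule: when frequencies are stored for every value in the domain, a literal's filter factor is its stored frequency, and a literal that is not stored gets roughly zero. A rough sketch under that assumption follows. The names are hypothetical, and treating a missing value as exactly 0 is a simplification of the slide's "~0".

```python
from fractions import Fraction

# C1 frequencies from the slides; five stored values equals COLCARDF, so the domain is fully covered
C1_FREQ = {
    "A": Fraction(75, 100),
    "B": Fraction(15, 100),
    "C": Fraction(5, 100),
    "D": Fraction(3, 100),
    "E": Fraction(2, 100),
}

def in_list_ff(values, freq):
    """Sum the stored frequencies; values absent from a fully covered domain count as 0."""
    return sum((freq.get(v, Fraction(0)) for v in values), Fraction(0))

CARDF, NLEAF, NLEVELS, COLCARDF = 100_000, 10_000, 3, 5

ff_with_freq = in_list_ff(["C", "Z", "E"], C1_FREQ)
ff_uniform = Fraction(3, COLCARDF)          # three in-list values, each assumed 1/COLCARDF
pages = NLEVELS + ff_with_freq * NLEAF
rows = CARDF * ff_with_freq
print(ff_with_freq, ff_uniform, pages, rows)
```

The same function covers the single-value cases: `in_list_ff(["A"], C1_FREQ)` gives 0.75 and `in_list_ff(["Z"], C1_FREQ)` gives 0.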

Single column histograms
Single column histograms
– SYSCOLDIST.FREQUENCYF
    TYPE = 'H', NUMCOLUMNS = 1
– Provides non-uniform distribution information
    Range skew
– When used?
    Literal value must be known
    Equals, IS NULL, IN
    LIKE, BETWEEN, <, <=, >, >=
    Used in conjunction with other complementary statistics

Histograms
How to collect
– RUNSTATS COLGROUP allows collection on almost any column
    RUNSTATS TABLESPACE DB1.TS1
      TABLE (PAT_TABLE) COLUMN(C1,C2,C3)
      COLGROUP (C2) HISTOGRAM NUMQUANTILES 20
      INDEX (I1 KEYCARD HISTOGRAM NUMCOLS 1 NUMQUANTILES 30,
             I2 KEYCARD HISTOGRAM NUMCOLS 1 NUMQUANTILES 50)
    - 20 quantiles for column C2
    - 30 quantiles for leading column of index I1
    - 50 quantiles for leading column of index I2
– Eliminate existing histograms:
    COLGROUP (C3) HISTOGRAM COUNT(0) MOST

RUNSTATS Histogram Statistics
RUNSTATS will produce an equal-depth histogram
– Each quantile (range) will have approximately the same number of rows
    Not the same number of values
– Another term is range frequency
Example
    1, 3, 3, 4, 4, 6, 7, 8, 9, 10, 12, 15 (sequenced)
– Let's cut that into 3 quantiles.
    1, 3, 3, 4, 4 | 6, 7, 8, 9 | 10, 12, 15

Seq No   Low Value   High Value   Cardinality   Frequency
1        1           4            3             5/12
2        6           9            4             4/12
3        10          15           3             3/12
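The 3-quantile example can be reproduced with a small equal-depth bucketing routine. This is an illustrative sketch only: the slides do not describe the algorithm RUNSTATS uses, so the cutting rule here (close a quantile once it holds at least rows/quantiles rows and the next value differs) is an assumption chosen to keep equal values together.

```python
from fractions import Fraction

def equal_depth_histogram(values, num_quantiles):
    """Split sorted values into quantiles of roughly equal row count, never splitting equal values."""
    data = sorted(values)
    target = len(data) / num_quantiles
    buckets, current = [], []
    for i, v in enumerate(data):
        current.append(v)
        is_last = i == len(data) - 1
        full = len(current) >= target
        next_differs = not is_last and data[i + 1] != v
        # Leave room so the final quantile absorbs whatever remains.
        if full and next_differs and len(buckets) < num_quantiles - 1:
            buckets.append(current)
            current = []
    if current:
        buckets.append(current)
    return [
        {"seq": n, "low": b[0], "high": b[-1],
         "card": len(set(b)), "freq": Fraction(len(b), len(data))}
        for n, b in enumerate(buckets, start=1)
    ]

hist = equal_depth_histogram([1, 3, 3, 4, 4, 6, 7, 8, 9, 10, 12, 15], 3)
for q in hist:
    print(q)
```

On the slide's data the first quantile ends up with five rows rather than four because the second 4 is kept with the first, which matches the 5/12 frequency in the table.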

RUNSTATS Histogram Statistics Notes
RUNSTATS
– Maximum 100 quantiles for a column
– Equal column values WILL be in the same quantile
– Quantiles will be similar in size but:
    Will try to avoid big gaps inside quantiles
    High value and low value may have separate quantiles
    Null WILL have a separate quantile
Supports column groups as well as single columns
Think "frequencies" for high cardinality columns

Histogram Statistics Example
Customer uses INTEGER (or VARCHAR) for YEAR-MONTH
Assuming data for 2006 & 2007
– FF = (high-value - low-value) / (high2key - low2key)
– FF = ( - ) / ( - )
– 10% of rows estimated to return
WHERE YEARMONTH BETWEEN AND
Data assumed to be evenly distributed between the low and high range

Histogram Statistics Example
Example (cont.)
– Data only exists in ranges &
    Collect via histograms
– 45% of rows estimated to return
    No data between &
WHERE YEARMONTH BETWEEN AND

Single column recommendations
Cardinality
– Collect on all columns used in the WHERE clause
– Used regardless of whether the literal value is known
Interpolation (HIGH2KEY/LOW2KEY)
– Collected with column statistics
– Consider REOPT(VARS)
    Dynamic V8 - REOPT(ONCE)
    DB2 9 - REOPT(AUTO)
Frequency
– Optimizer requires literal value to use
– Used for most predicate types
– Collect all values for low COLCARDF columns
– Useful for indexed and non-indexed columns
Histograms
– Optimizer needs literal value to use
– Can be used in virtually all scenarios where frequencies are used, and also for join predicates
– Collect histograms rather than frequencies for RANGE predicates

Filter factor issues
Filter factor accuracy important for...
– Index matching
    Estimate index cost
– Total index filtering
    Estimate table access cost via index(es)
    Choose how to use index (prefetch?)
– Table filtering
    Efficient join order
    Efficient join method
    Appropriate sorts

Multi-column cardinalities
Multi-column cardinalities (MCARD)
– Stored in a few places...
    SYSINDEXES.FULLKEYCARDF
    SYSCOLDIST.CARDF
      TYPE = 'C', NUMCOLUMNS > 1
– Assumes uniform distribution
– When used?
    Primarily for indexes
    Literal values not necessary
    KEYCARD for partially matching indexes
– Collect for all indexes with 3 or more columns
    Collect to support multi-column frequencies
    Collect for all multi-column join situations

RUNSTATS KEYCARD
How to collect
– V7 RUNSTATS only collects KEYCARD on leading column of index
– By default, RUNSTATS only collects FIRST/FULLKEYCARDF
    INDEX PAT_INDEX (C1,C2,C3,C4)
    RUNSTATS INDEX(PAT_INDEX KEYCARD)
– MCARD on leading concatenated column groups:
    MCARD(C1,C2), MCARD(C1,C2,C3)

RUNSTATS COLGROUP
DB2 V8 allows collection of MCARD on any column group
    RUNSTATS TABLESPACE DB1.TS1
      TABLE(PAT_TABLE) COLUMN(C1,C2,C3,C4)
      COLGROUP(C1,C4)
Specifying COLGROUP with multiple columns collects multi-column cardinality on the group.

Multi-column cardinality
Useful for local predicates
    SELECT name, address,...
    FROM   CUST
    WHERE  City = ?
    AND    State = ?
    AND    Last_name = ?

INDEX   Index columns       1stkey    KEYCARD   FULLKEY
IX1     City, State, Zip    10,000    12,000    1,000,000
IX2     Last_name           20,000    N/A       20,000

COLCARDF STATE 50
IX1 filter factor without KEYCARD   1 / 500,000
IX1 filter factor WITH KEYCARD      1 / 12,000
Last_name                           1 / 20,000
keycard!!!
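The three filter factors on this slide follow from the index statistics shown, and a short sketch makes the effect of KEYCARD visible. Comparing filter factors alone is a simplification made here for illustration; the optimizer works with full cost estimates. The helper name is hypothetical.

```python
from fractions import Fraction

# Statistics from the slide
IX1_FIRSTKEYCARD = 10_000    # distinct City values
STATE_COLCARDF = 50
IX1_KEYCARD_CITY_STATE = 12_000
IX2_FIRSTKEYCARD = 20_000    # distinct Last_name values

# Without KEYCARD: City and State are treated as independent
ff_ix1_without_keycard = Fraction(1, IX1_FIRSTKEYCARD) * Fraction(1, STATE_COLCARDF)
# With KEYCARD: the number of distinct (City, State) pairs is known
ff_ix1_with_keycard = Fraction(1, IX1_KEYCARD_CITY_STATE)
ff_ix2 = Fraction(1, IX2_FIRSTKEYCARD)

def more_selective(candidates):
    """Name of the index with the smallest estimated filter factor (illustrative comparison only)."""
    return min(candidates, key=candidates.get)

print(more_selective({"IX1": ff_ix1_without_keycard, "IX2": ff_ix2}))
print(more_selective({"IX1": ff_ix1_with_keycard, "IX2": ff_ix2}))
```

Without KEYCARD, IX1 looks about 25 times more selective than IX2. With it, IX2 on Last_name is the more selective index, because City and State are strongly correlated.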

Multi-column cardinality
Useful for join predicates
    SELECT cols
    FROM   T1, T2
    WHERE  T1.C1 = T2.C1
    AND    T1.C2 = T2.C2
INDEX I1 (C1,C2) on table T1
INDEX I2 (C1,C2,C3) on table T2
KEYCARD is necessary on index I2 to accurately estimate the size of the join between T1 and T2.

Multi-column cardinality
Matching + screening
    SELECT cols
    FROM   T1
    WHERE  T1.C1 = ?
    AND    T1.C3 = ?
    AND    T1.C4 = ?
INDEX I2 (C1,C2,C3) on table T1
COLCARDF / FIRSTKEYCARDF for matching only
COLGROUP (C1,C3) --> matching + screening
COLGROUP (C1,C3,C4) --> table level filtering

Filter factor issues (reminder)
Filter factor accuracy important for...
– Index matching
    Estimate index cost
– Total index filtering
    Estimate table access cost via index(es)
    Choose how to use index (prefetch?)
– Table filtering
    Efficient join order
    Efficient join method
    Appropriate sorts

Multi-column frequency
Multi-column frequencies
– Very similar to single column frequencies
    Distribution statistics on concatenated column group values
    Identifies multi-column skewed distributions
– Stored in SYSCOLDIST.FREQUENCYF
    TYPE = 'F'
    NUMCOLUMNS > 1

RUNSTATS INDEX
How to collect
– V7 RUNSTATS only collects on leading concatenated columns of index
– RUNSTATS does NOT collect multi-column frequencies by default.
– Must be explicitly requested.
    INDEX (I1) columns (C1,C2,C3)
– Collect top 15 values for column group (C1,C2)
    RUNSTATS INDEX (I1 FREQVAL NUMCOLS(2) COUNT(15))
– Eliminate frequencies on column group (C1,C2):
    RUNSTATS INDEX (I1 FREQVAL NUMCOLS(2) COUNT(0))

RUNSTATS COLGROUP
DB2 V8 allows collection of multi-column frequencies on almost any column group
– Examples
    RUNSTATS TABLESPACE DB1.TS1
      TABLE (T1) COLUMN(C1,C2,C3)
      COLGROUP(C1,C3) FREQVAL COUNT(10) MOST
      COLGROUP(C2,C3) FREQVAL COUNT(1) LEAST
– Eliminate frequencies on column group (C1,C3):
    COLGROUP (C1,C3) FREQVAL COUNT(0)

Multi-column frequency
Multi-column frequencies
– Limited use
    Boolean equal predicates only
    Always collect supporting multi-column cardinality
– Collect single column frequencies for
    Range predicates
    In-lists
    Single column predicates
    Other non-equal predicates

Multi-column frequency
Gender COLCARDF = 2
Category COLCARDF = 4
(Gender,Category) MCARD = 8

Category          Gender   FREQUENCYF   1/MCARD
Women's Health    F
Women's Health    M
Men's Health      M
Men's Health      F
Hockey            M
Hockey            F
Soccer            F
Soccer            M

Multi-column frequency trap
Assumptions
– Table T1 cardinality 150,000,000
– Index I1 (GENDER, ACCT_NUM)
– COLCARDF GENDER 2
– COLCARDF ACCT_NUM 120,000,000
SQL
    SELECT * FROM T1 WHERE GENDER = 'M'
Global RUNSTATS with FREQVAL used
– RUNSTATS INDEX (ALL) FREQVAL NUMCOLS (6) COUNT(10)
– (Don't do this!)

Useless multi-column frequency

GENDER   ACCT_NUM   FREQUENCYF
M
F
F
M
F
M
M
F

Useless because...
– Frequencies on GENDER alone are either non-existent, or were collected long ago and are becoming stale
– Multi-column frequency on (GENDER,ACCT_NUM) is not used since the query has a predicate on only one of the columns
– Even if the frequency could be used, ACCT_NUM is so selective that the frequencies are too diluted to tell us anything about GENDER alone

Multi-column considerations
Cardinality
– Collect KEYCARD for all indexes with 3 or more columns
– Literal values not required
Frequencies
– Not useful when literal values aren't known
– Dynamic SQL prepared with parameter markers
– Static SQL with host variables (consider REOPT)
– Special registers (consider REOPT)
– Collect for specific cases
Pay special attention to...
– Low cardinality column groups
– Volatile data

Join considerations
Which table should be outer?
– Hmmm, low cardinality predicates....
– Significant opportunity for misestimation on both T1 and T2, which could lead to a poor choice.
The more tables in the join and the more predicates involved, the more important this becomes
    Select T1.C4, T2.C6 from T1, T2
    Where T1.C1 = T2.C1
    AND T1.C2 = 50
    AND T2.C3 = 'A'
    AND T2.C4 = 'B'
Unique indexes T1.C1, T2.C1
T1 & T2 CARDF = 10 million
T1.C2 COLCARDF = 30
T2.C3 COLCARDF = 10
T2.C4 COLCARDF = 15
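With only column cardinalities available, the qualifying row counts for each table come out as below. This sketch applies the uniform, independent 1/COLCARDF estimates from the earlier slides to this query. Picking the table with fewer estimated qualifying rows as the outer is a simplification for illustration, not the optimizer's full cost-based decision.

```python
from fractions import Fraction

CARDF = 10_000_000  # both T1 and T2

# T1: one local predicate, C2 = 50 (COLCARDF 30)
t1_rows = CARDF * Fraction(1, 30)
# T2: two local predicates, C3 = 'A' (COLCARDF 10) and C4 = 'B' (COLCARDF 15),
# multiplied as if C3 and C4 were independent
t2_rows = CARDF * Fraction(1, 10) * Fraction(1, 15)

outer_guess = "T2" if t2_rows < t1_rows else "T1"
print(round(t1_rows), round(t2_rows), outer_guess)
```

Both estimates rest on low-cardinality columns, so skew on any one of them, or correlation between T2.C3 and T2.C4, could reverse the comparison. That is the misestimation risk the slide is warning about.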

One more time
Filter factor accuracy important for...
– Index matching
    Estimate index cost
– Total index filtering
    Estimate table access cost via index(es)
    Choose how to use index (prefetch?)
– Table filtering
    Efficient join order
    Efficient join method
    Appropriate sorts