1 Chapter 7 Optimizing the Optimizer
2 The Oracle Optimizer is… About query optimization Is a sophisticated set of algorithms Choosing the fastest approach (hopefully) Cost-based determinations, including – Time estimates for retrieval – Estimated CPU time – Does not include network components
3 OPTIMIZER_GOAL parameter Controls how the optimizer calculates cost ALL_ROWS – minimizes cost of processing all rows – Favors full table scans FIRST_ROWS_N – minimizes cost of processing first N rows – Favors index usage RULE – Obsolete but can be invoked with RULE hint
4 Query Transformation Oracle can transform a query into a logically equivalent statement Types of transformations include: – View Merging – Taking view and merging into a SQL statement – Star – Used for star schemas – Join elimination – eliminates unnecessary joins – Join factorization – used with UNION operations – Antijoin and semijoin – Converts MINUS and INTERSECT
5 Basis of Cost Calculations The amount of data accessed Estimated data in memory Estimate of physical IO operations Estimate of CPU-related operations – Sorting – Hashing Relative speed of IO and compute operations
6 Inputs into Cost Calculations Object statistics – metadata about the tables Histograms – data distribution statistics Database parameters – data block, memory, and other process configuration parameters System statistics – physical hardware configuration Bind Variable Peeking - Using actual value of variable to help determine execution plan
7 Database Parameters Influences optimizer decisions Some key parameters affecting optimization: – DB_FILE_MULTIBLOCK_READ_COUNT – how many blocks read in single IO operation – OPTIMIZER_INDEX_CACHING & OPTIMIZER_INDEX_COST_ADJ – affects how optimizer thinks about indexes – OPTIMIZER_FEATURES_ENABLE – tells optimizer what Oracle version to base decisions – Parallel_threads_per_cpu – affects how prominent parallelism can be – Many memory (SGA/PGA based) parameters – See pp
8 System Statistics View these in SYS.AUX_STATS$ view Statistics on physical system configuration Types include: – Nonworkload statistics – collect in absence of database load / use – Workload statistics – collect during typical database activity Workload statistics more reflective of actual hardware configuration
9 Collecting Statistics Automatic Statistics Gathering – Automated as of Oracle 10g – Many configurable parameters – Can configure within manually or within Enterprise Manager Stale Statistics – Means Oracle keeps DML activity to determine when statistics become stale – STATISTICS_LEVEL must be set to use – View information in ALL_TAB_MODIFICATIONS view – Can configure the threshold of what is regarded as “stale”
10 Collecting Statistics (cont.) Manual Statistics Gathering – Usually necessary on some objects to keep current as is needed – Used to fine-tune statistics collection – Used to collect on SYSTEM objects not gather automatically – Used to create “extended statistics” Dynamic Sampling – Done by Oracle on-the-fly at execution time – Done when certain objects have no statistics – Oracle may determine doing this may reduce execution time – Determined by OPTIMIZER_DYNAMIC_SAMPLING parameter
11 Using DBMS_STATS Oracle provided PL/SQL Package Collects and manages optimizer statistics – For a table – For a schema – For objects with “stale” statistics – Used to create histograms – Used to transfer statistics from another database – See pp for procedures and parameters
12 Using DBMS_STATS (cont.) Defaults can be set via the following procedures – SET_GLOBAL_PREFS – SET_DATABASE_PREFS – SET_SCHEMA_PREFS – SET_TABLE_PREFS Configurable parameters: – CASCADE – DEGREE – ESTIMATE_PERCENT – METHOD_OPT – GRANULARITY – See p. 202 for others
13 Histograms and METHOD_OPT Histograms – Increase accuracy of of cost calculations – Increase overhead of collecting statistics – Worthwhile for Columns with irregular data distribution Columns involved with WHERE or JOIN expressions METHOD_OPT affects how histograms are created – Default is FOR ALL COLUMNS SIZE AUTO – leaves decision to Oracle to choose columns on which to create histograms – Based on cardinality – Based on presence of filters dependent on the data – Set METHOD_OPT manually when appropriate if knowledge of column data makeup is known
14 Extended Statistics Stored in ALL_TAB_COL_STATISTICS Two types – Multicolumn – Expression-based Multicolumn Extended Statistics – Calculates selectivity of multicolumn expressions – Example 50% of customers are men 10% of customers are from Australia Optimizer conclusion: 5% of customers are Australian men
15 Extended Statistics Expression Extended Statistics – Optimizer gathers statistics on an expression to help determine query cost – Take the query: SELECT COUNT(*), SUM(amount_sold) FROM SALES WHERE sale_category (amount_sold) = 1; – By gathering statistics on the expression in above query, optimizer will likely more accurately determine cost of query BEGIN DBMS_STATS.gather_table_stats (ownname=>USER,tabname=>’SALES’,method_opt=> ‘FOR ALL COLUMNS FOR COLUMNS (sale_category(amount_sold))’); END
16 More on DBMS_STATS Can perform other operations, such as – Locking statistics (LOCK_SCHEMA_STATS) – Collect system statistics (GATHER_SYSTEM_STATS) – Move statistics from one database to another Popular to simulate production statistics in a development environment – Manipulate object statistics Modifying statistics based on actual data Good if forecasting data changes