Oracle Statistics by Example

Mauro Pagano

Mauro Pagano
- Consultant & Developer
- Oracle, then Enkitec, then Accenture
- DBPerf and SQL Tuning
- Training
- Tools (SQLT, SQLd360, TUNAs360, Pathfinder)

Background
- Optimizer generates execution plans
  - Many execution plans for each SQL
  - Optimal execution plan has lower cost (*)
- Cost is computed based on
  - Statistical formulas (Oracle IP)
  - Many statistics around the SQL (seeded by us)
9/20/2018 Enkitec ©

Some terminology
- Cost
  - Unit of measure to compare the estimated performance of plans
  - Equivalent to the expected number of single-block reads
- Cardinality
  - Number of rows handled, produced / consumed
- Selectivity
  - % of filtering caused by predicates, range is [0,1]
- Output card = input card * selectivity

Why so much emphasis?
- Statistics are a "picture" of the entities
- The quality of the picture affects the quality of the plan
  - Poor stats generally lead to poor plans (*)
  - Better stats generally lead to better plans (*)
- Our best bet is to provide good quality stats
  - Not always as trivial as it sounds
(*) There are exceptions to both rules

Many types of statistics
- Oracle Optimizer uses statistics about
  - Objects: tables, indexes, columns, etc.
  - System: CPU speed and many I/O metrics
  - Dictionary: Oracle internal physical objects
  - Fixed objects: memory structures (X$)
- The first two affect application SQL
- The focus of this presentation is object statistics

What should I do about statistics?
- Collect them
  - Object stats when there are "enough" changes
  - System stats once, if any (*)
- Oracle-seeded package DBMS_STATS
  - Used to collect all types of statistics
  - Plus drop, exp/imp, set prefs, etc.
- Many params affect how/what to collect
  - Can have a large impact on quality
From this point on, everything I say is specific to object stats. System stats will be mentioned explicitly later.

When should I gather stats?
- No specific threshold in terms of time
- Balance between frequency and quality
  - Gathering high-quality stats is expensive, thus slow to execute
  - Gathering frequently requires fast execution
- Optimal plans tend not to change over time
  - Favor quality over frequency
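As a rough sketch of how to check whether a table has changed "enough", the dictionary can be queried for staleness. This assumes the table is T1 in the current schema (the name used in later slides) and that the session has the privileges needed to flush monitoring info; it is an illustration, not the method used in the presentation.

```sql
-- Assumption: table T1 lives in the current schema.
-- Optional: push pending DML monitoring info to the dictionary
-- (may require additional privileges).
exec dbms_stats.flush_database_monitoring_info

-- STALE_STATS = 'YES' means the table crossed the staleness threshold
select table_name, num_rows, last_analyzed, stale_stats
  from user_tab_statistics
 where table_name = 'T1';

-- DML counted since the last gather
select table_name, inserts, updates, deletes, timestamp
  from user_tab_modifications
 where table_name = 'T1';

-- Threshold (percentage of changed rows) behind STALE_STATS
select dbms_stats.get_prefs('STALE_PERCENT', user, 'T1') as stale_percent
  from dual;
```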

How?
DBMS_STATS.GATHER_TABLE_STATS (
   ownname          VARCHAR2,
   tabname          VARCHAR2,
   partname         VARCHAR2 DEFAULT NULL,
   estimate_percent NUMBER   DEFAULT to_estimate_percent_type
                                       (get_param('ESTIMATE_PERCENT')),
   block_sample     BOOLEAN  DEFAULT FALSE,
   method_opt       VARCHAR2 DEFAULT get_param('METHOD_OPT'),
   degree           NUMBER   DEFAULT to_degree_type(get_param('DEGREE')),
   granularity      VARCHAR2 DEFAULT GET_PARAM('GRANULARITY'),
   cascade          BOOLEAN  DEFAULT to_cascade_type(get_param('CASCADE')),
   stattab          VARCHAR2 DEFAULT NULL,
   statid           VARCHAR2 DEFAULT NULL,
   statown          VARCHAR2 DEFAULT NULL,
   no_invalidate    BOOLEAN  DEFAULT to_no_invalidate_type (
                                       get_param('NO_INVALIDATE')),
   stattype         VARCHAR2 DEFAULT 'DATA',
   force            BOOLEAN  DEFAULT FALSE,
   context          DBMS_STATS.CCONTEXT DEFAULT NULL, -- non operative
   options          VARCHAR2 DEFAULT 'GATHER');

That looks really complex!
- The easiest thing is to let Oracle use the defaults
  - Just pass owner and object name
- This is also the recommended way starting with 11g
- Many features depend on default values
  - 12c histograms, Incremental, Concurrent
- As simple as
  exec dbms_stats.gather_table_stats(user,'T1')
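Since the defaults come from preferences, it can help to see what they currently resolve to before relying on them. The query below is a minimal sketch, assuming table T1 in the current schema and a release that provides DBMS_STATS.GET_PREFS (11g onwards); the column aliases are illustrative only.

```sql
-- Assumption: T1 in the current schema; aliases are illustrative.
select dbms_stats.get_prefs('ESTIMATE_PERCENT', user, 'T1') as est_pct_pref,
       dbms_stats.get_prefs('METHOD_OPT',       user, 'T1') as method_opt_pref,
       dbms_stats.get_prefs('CASCADE',          user, 'T1') as cascade_pref,
       dbms_stats.get_prefs('DEGREE',           user, 'T1') as degree_pref
  from dual;
```

If a table-level preference has not been set, the value returned is the global one.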

What did we just do?
Gathered:
- table statistics on table T1
- column statistics for every column
- index statistics on every index defined on T1
- (sub)partition statistics
- histograms on a subset of columns (*)
Next we'll cover the stats that matter to the CBO
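One way to confirm what the call touched is to look at LAST_ANALYZED at each level. This is a sketch assuming T1 and its indexes are in the current schema; partition-level views are left out for brevity.

```sql
-- Table-level statistics
select table_name, num_rows, blocks, last_analyzed
  from user_tables
 where table_name = 'T1';

-- Column-level statistics, including which columns got a histogram
select column_name, num_distinct, num_nulls, histogram, last_analyzed
  from user_tab_col_statistics
 where table_name = 'T1'
 order by column_name;

-- Index-level statistics
select index_name, blevel, leaf_blocks, distinct_keys,
       clustering_factor, last_analyzed
  from user_indexes
 where table_name = 'T1';
```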

Table statistics
Optimizer only uses two statistics
- Number of blocks below HWM
  - [ALL|DBA|USER]_TABLES.BLOCKS
  - Used to cost Full Table Scan operations
- Number of rows in the table
  - [ALL|DBA|USER]_TABLES.NUM_ROWS
  - Used to estimate how many rows we are dealing with

Table statistics – FTS cost
select table_name, num_rows, blocks
  from user_tables
 where table_name = 'T1';

TABLE_NAME   NUM_ROWS   BLOCKS
[row values not recoverable]

explain plan for select * from t1;
select * from table(dbms_xplan.display);

Plan hash value:
| Id | Operation                  | Name | Rows | Bytes | Cost (%CPU)| Time     |
|  0 | SELECT STATEMENT           |      | 920K | 100M  |        (1) | 00:00:01 |
|  1 |  TABLE ACCESS STORAGE FULL | T1   | 920K | 100M  |        (1) | 00:00:01 |

Table statistics – Cardinality
select table_name, num_rows, blocks
  from user_tables
 where table_name = 'T1';

TABLE_NAME   NUM_ROWS   BLOCKS
[row values not recoverable]

explain plan for select * from t1;
select * from table(dbms_xplan.display);

Plan hash value:
| Id | Operation                  | Name | Rows | Bytes | Cost (%CPU)| Time     |
|  0 | SELECT STATEMENT           |      | 920K | 100M  |        (1) | 00:00:01 |
|  1 |  TABLE ACCESS STORAGE FULL | T1   | 920K | 100M  |        (1) | 00:00:01 |

Table statistics – Cardinality
select table_name, num_rows, blocks
  from user_tables
 where table_name = 'T1';

TABLE_NAME   NUM_ROWS   BLOCKS
[row values not recoverable]

explain plan for select * from t1;
select * from table(dbms_xplan.display);

Plan hash value:
| Id | Operation                  | Name | Rows | Bytes | Cost (%CPU)| Time     |
|  0 | SELECT STATEMENT           |      |      |       |        (1) | 00:00:01 |
|  1 |  TABLE ACCESS STORAGE FULL | T1   |      |       |        (1) | 00:00:01 |

Column statistics – NoHgrm
select column_name, num_distinct, num_nulls, histogram
  from user_tab_cols
 where table_name = 'T1'
   and column_name like '%OBJECT_ID';

COLUMN_NAME      NUM_DISTINCT  NUM_NULLS  HISTOGRAM
OBJECT_ID                                 NONE
DATA_OBJECT_ID                            NONE

explain plan for select * from t1 where object_id = 1234;

| Id | Operation                  | Name | Rows | Bytes | Cost (%CPU)| Time     |
|  0 | SELECT STATEMENT           |      |      |       |        (1) | 00:00:01 |
|* 1 |  TABLE ACCESS STORAGE FULL | T1   |      |       |        (1) | 00:00:01 |

Predicate Information (identified by operation id):
   1 - storage("OBJECT_ID"=1234)
       filter("OBJECT_ID"=1234)

Let's do the math!
Total rows:
NDV: 93192
(total rows) * 1/93192 ~= 10

Column statistics – NoHgrm
select column_name, num_distinct, num_nulls, histogram
  from user_tab_cols
 where table_name = 'T1'
   and column_name like '%OBJECT_ID';

COLUMN_NAME      NUM_DISTINCT  NUM_NULLS  HISTOGRAM
OBJECT_ID                                 NONE
DATA_OBJECT_ID                            NONE

explain plan for select * from t1 where data_object_id = 1234;

| Id | Operation                  | Name | Rows | Bytes | Cost (%CPU)| Time     |
|  0 | SELECT STATEMENT           |      |      |       |        (1) | 00:00:01 |
|* 1 |  TABLE ACCESS STORAGE FULL | T1   |      |       |        (1) | 00:00:01 |

Predicate Information (identified by operation id):
   1 - storage("DATA_OBJECT_ID"=1234)
       filter("DATA_OBJECT_ID"=1234)

Let's do the math!
Total rows:
Total NULLs:
NDV: 8426
(total rows – total NULLs)/8426 ~= 10
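The arithmetic on the two slides above can be reproduced straight from the dictionary. The query below is a sketch of the no-histogram equality estimate, (rows – NULLs) / NDV, assuming T1 is in the current schema; it deliberately ignores histograms and the out-of-range adjustment, so it only approximates what the CBO reports for in-range values.

```sql
-- Assumption: T1 in the current schema, columns without histograms.
-- est_rows_equality = (num_rows - num_nulls) / num_distinct
select c.column_name,
       t.num_rows,
       c.num_nulls,
       c.num_distinct,
       round((t.num_rows - c.num_nulls) / nullif(c.num_distinct, 0))
         as est_rows_equality
  from user_tables t
  join user_tab_col_statistics c
    on c.table_name = t.table_name
 where t.table_name = 'T1'
   and c.column_name in ('OBJECT_ID', 'DATA_OBJECT_ID')
   and c.histogram = 'NONE';
```

For a column with no NULLs the expression reduces to total rows * 1/NDV, which is the first of the two calculations.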

Column statistics – Min/Max
... cook_raw(low_value,'NUMBER') low_v, cook_raw(high_value,'NUMBER') high_v ...

COLUMN_NAME      NUM_DISTINCT  LOW_VALU  HIGH_VAL
OBJECT_ID
DATA_OBJECT_ID

| Id | Operation                  | Name | Rows | Bytes | Cost (%CPU)| Time     |

explain plan for select * from t1 where object_id = 99953;
|* 1 |  TABLE ACCESS STORAGE FULL | T1   |      |       |        (1) | 00:00:01 |

explain plan for select * from t1 where object_id = ;
|* 1 |  TABLE ACCESS STORAGE FULL | T1   |      |       |        (1) | 00:00:01 |

The further the value is from the low/high range, the lower the estimate
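The cook_raw function used on the slide appears to be a helper of the author's, and its definition is not part of this transcript. As an alternative sketch for NUMBER columns only, the built-in UTL_RAW.CAST_TO_NUMBER can decode the raw low/high values. T1 in the current schema is assumed.

```sql
-- Assumption: both columns are of type NUMBER, so the RAW low/high
-- values can be decoded with UTL_RAW.CAST_TO_NUMBER.
select column_name,
       num_distinct,
       utl_raw.cast_to_number(low_value)  as low_v,
       utl_raw.cast_to_number(high_value) as high_v
  from user_tab_col_statistics
 where table_name = 'T1'
   and column_name in ('OBJECT_ID', 'DATA_OBJECT_ID');
```

Other datatypes need the matching conversion (for example UTL_RAW.CAST_TO_VARCHAR2 for character columns).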

Column Statistics
Optimizer also uses
- Density
  - Not stored in the dictionary (the old one was, the new one is not)
  - Used for unpopular value selectivity
- Histogram
  - [ALL|DBA|USER]_TAB_COLS.LOW_VALUE
  - [ALL|DBA|USER]_TAB_COLS.HIGH_VALUE
  - [ALL|DBA|USER]_TAB_HISTOGRAMS
  - Used for popular value selectivity

What is a histogram?
- Describes the skew of the data distribution
- Helps the CBO get more accurate estimates
- Many types available
  - Frequency – 1 bucket per NDV
  - Top-frequency – 1 bucket per top NDV
  - Hybrid – 1 bucket per popular value, others split
- Creation is influenced by the method_opt param

What does it look like?

Column statistics – Histogram
explain plan for select count(*) from t1 where object_type = 'INDEX';

| Id | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
|  0 | SELECT STATEMENT            |      |     1 |     9 |        (1) | 00:00:01 |
|  1 |  SORT AGGREGATE             |      |     1 |     9 |            |          |
|* 2 |   TABLE ACCESS STORAGE FULL | T1   | 44990 |  395K |        (1) | 00:00:01 |

   2 - storage("OBJECT_TYPE"='INDEX')
       filter("OBJECT_TYPE"='INDEX')

explain plan for select count(*) from t1 where object_type = 'TABLE';

|* 2 |   TABLE ACCESS STORAGE FULL | T1   | 24980 |  219K |        (1) | 00:00:01 |

   2 - storage("OBJECT_TYPE"='TABLE')
       filter("OBJECT_TYPE"='TABLE')

Different values get different estimates thanks to the histogram
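To see where per-value estimates like these come from, the histogram buckets can be read back from the dictionary. The sketch below assumes T1 is in the current schema and that OBJECT_TYPE has a frequency histogram, where ENDPOINT_NUMBER is a running total; for hybrid histograms the arithmetic is different.

```sql
-- Assumption: frequency histogram on T1.OBJECT_TYPE.
-- ENDPOINT_NUMBER is cumulative, so the difference from the previous
-- bucket is the number of (sampled) rows for that value.
select endpoint_actual_value as val,   -- may be NULL on some versions
       endpoint_number
         - lag(endpoint_number, 1, 0) over (order by endpoint_number)
         as bucket_rows
  from user_tab_histograms
 where table_name  = 'T1'
   and column_name = 'OBJECT_TYPE'
 order by endpoint_number;
```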

What is an index?
- Structure that stores key(s)-location pairs
  - Key(s) are stored in sorted order
- Used to identify rows of interest without an FTS
  - By navigating the index and extracting the location(s)
- Depending on the filters, faster than an FTS (or not)
  - No fixed threshold, the cheaper option wins

Index Statistics
Optimizer uses
- Blevel
  - [ALL|DBA|USER]_INDEXES.BLEVEL
  - Used to estimate how expensive it is to locate the first leaf
- Number of leaf blocks (LB)
  - [ALL|DBA|USER]_INDEXES.LEAF_BLOCKS
  - Used to estimate how many index leaf blocks to read
- Clustering Factor (CLUF)
  - [ALL|DBA|USER]_INDEXES.CLUSTERING_FACTOR
  - Used to estimate how many table blocks to read
- Distinct Keys (DK)
  - [ALL|DBA|USER]_INDEXES.DISTINCT_KEYS
  - Used to help with data correlation

What does it look like?
[Diagram labels: Root, Branches, Leaves, B B B B B B]
- Leaves are chained back and forth for asc/desc scan
- Number of jumps is CLUF

Index Statistics
- Distinct keys is 100% accurate
  - NUM_DISTINCT is approximated
- If CLUF ~= number of rows in the table, the index is inefficient

select index_name, blevel, leaf_blocks, distinct_keys, clustering_factor
  from user_indexes
 where index_name = 'T1_IDX';

INDEX_NAME   BLEVEL  LEAF_BLOCKS  DISTINCT_KEYS  CLUSTERING_FACTOR
T1_IDX

explain plan for select * from t1 where object_id = 1234;

| Id | Operation                            | Name   | Rows | Bytes | Cost (%CPU)|
|  0 | SELECT STATEMENT                     |        |   10 |  1150 |     13 (0) |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | T1     |   10 |  1150 |     13 (0) |
|* 2 |   INDEX RANGE SCAN                   | T1_IDX |   10 |       |      3 (0) |

   2 - access("OBJECT_ID"=1234)

Cost jumps by 10 for 10 rows (from 3 to 13) as a consequence of bad CLUF
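A quick way to judge the clustering factor is to compare it with the table's block and row counts: close to BLOCKS is good, close to NUM_ROWS is bad. The query below is a sketch assuming T1 and its indexes are in the current schema; the ratio column is an illustrative name, not an Oracle statistic.

```sql
-- Assumption: T1 and its indexes belong to the current schema.
-- cluf_to_rows near 1 means almost one table block visit per row.
select i.index_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows,
       round(i.clustering_factor / nullif(t.num_rows, 0), 2) as cluf_to_rows
  from user_indexes i
  join user_tables  t
    on t.table_name = i.table_name
 where i.table_name = 'T1';
```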

Extended Statistics
- Provide additional info to the CBO about
  - Data correlation (functional dependencies)
  - Expressions applied to column(s)
- Need to be manually implemented
  - Automatic in 12c, not bulletproof yet
- Their absence usually translates into estimation mistakes

Extended statistics – Expression
explain plan for select count(*) from t1 where lower(object_type) = 'index';

| Id | Operation                   | Name | Rows | Bytes | Cost (%CPU)| Time     |
|  0 | SELECT STATEMENT            |      |      |       |        (1) | 00:00:01 |
|  1 |  SORT AGGREGATE             |      |      |       |            |          |
|* 2 |   TABLE ACCESS STORAGE FULL | T1   |      |       |        (1) | 00:00:01 |

   2 - storage(LOWER("OBJECT_TYPE")='index')
       filter(LOWER("OBJECT_TYPE")='index')

Incorrect estimation, we know the right one is ~45k

dbms_stats.gather_table_stats(user,'T1',method_opt=>'FOR COLUMNS (lower(object_type)) SIZE 254');

|  0 | SELECT STATEMENT            |      |      |       |        (1) | 00:00:01 |
|* 2 |   TABLE ACCESS STORAGE FULL | T1   |      |  395K |        (1) | 00:00:01 |

Correct estimation
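The example above covers an expression. For the other use case, data correlation between columns, a column group can be created in a similar way. The sketch below assumes T1 is in the current schema and has an OWNER column alongside OBJECT_TYPE, which is a guess based on the other column names and is not confirmed by the slides.

```sql
-- Assumption: T1 has columns OWNER and OBJECT_TYPE (hypothetical pairing).
-- Creates the column group and returns its system-generated name.
select dbms_stats.create_extended_stats(user, 'T1', '(OWNER, OBJECT_TYPE)')
         as extension_name
  from dual;

-- The column group only gets statistics on the next gather
exec dbms_stats.gather_table_stats(user, 'T1')

-- List the extensions defined on the table
select extension_name, extension
  from user_stat_extensions
 where table_name = 'T1';
```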

estimate_percent
- Amount of data to sample for gathering stats
- Has an impact on the time to gather and on quality
- Recommended (default) is AUTO_SAMPLE_SIZE
  - Not recommended in 10g, recommended from 11g onwards
  - Required for many features
  - Uses the HyperLogLog algorithm internally (*)

method_opt
- On which columns to gather stats
- On which columns to gather histograms (#buckets)
- Recommended (default) is FOR ALL COLUMNS SIZE AUTO
  - Not recommended in 10g, recommended from 11g onwards
  - Oracle determines hist/no-hist based on column usage
- If the app knows better, follow the app's recommendations
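As a sketch of both halves of this slide: the first statement shows the column usage Oracle has recorded for the table, and the second stores an application-specific method_opt as a table preference so that default gathers keep honoring it. T1 in the current schema is assumed, and asking for a 254-bucket histogram on OBJECT_TYPE is only an illustrative choice.

```sql
-- Column usage recorded by the optimizer (returns a CLOB report;
-- in SQL*Plus increase LONG first to see the whole report)
set long 100000
select dbms_stats.report_col_usage(user, 'T1') from dual;

-- Illustrative preference: keep AUTO for all columns but always
-- request a histogram on OBJECT_TYPE
exec dbms_stats.set_table_prefs(user, 'T1', 'METHOD_OPT', 'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 OBJECT_TYPE')

-- Later gathers with default parameters pick up the preference
exec dbms_stats.gather_table_stats(user, 'T1')
```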

Can’t Oracle do it for me?
- Oracle provides a nightly job to gather stats
  - Does a decent job starting with 11g (so-so in 10g)
  - Prioritizes tables depending on #changes
- Only allowed to run for a fixed number of hours
  - Might not touch all the objects that need it
- Collects object and dictionary stats only
- Apps might have specific requirements, follow them
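To check whether the nightly job is enabled and how its recent runs went, the autotask views can be queried. This is a sketch for 11g and later that assumes access to the DBA_ views.

```sql
-- Is the automatic stats collection task enabled?
select client_name, status
  from dba_autotask_client
 where client_name = 'auto optimizer stats collection';

-- Recent runs: which window, outcome, and how long the job ran
select window_name, job_status, job_start_time, job_duration
  from dba_autotask_job_history
 where client_name = 'auto optimizer stats collection'
 order by job_start_time desc;
```

A run that stopped because the window closed is a hint that not all stale objects were reached.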

References
- Oracle Database PL/SQL Packages and Types Reference 12.1
- Oracle Database SQL Tuning Guide 12.1
- Master Note: Optimizer Statistics (Doc ID )

Contact Information
- Tools: SQLd360, TUNAs360, Pathfinder
