1 Using Statspack in Oracle8i and 9i to Identify Problems Ian Jones Database Specialists, Inc.
2 Session Topics Statspack introduction and features Mechanics –installing –generating snapshots –producing reports Discussion of the generic report Examples
4 What is Statspack?
An Oracle-provided set of SQL*Plus scripts and a PL/SQL package that allow convenient collection, automation, storage and reporting of performance and diagnostic data
A PERFSTAT schema containing 42 ‘stats$’ tables and a PL/SQL package ‘statspack’
Replacement for utlbstat/utlestat
5 Overview of How Statspack Works
Oracle instances constantly update many internal statistics, most of them visible through the v$ views, e.g. system statistics, wait events and SQL activity (relevant parameters: timed_statistics, resource_limit and, in 9i, statistics_level)
Using ‘statspack.snap’ we save these values from 34 v$ views into stats$ tables whenever desired
We then run the statspack report script ‘spreport.sql’, which calculates and displays the differences between any two sets of statistics
Straightforward and effective
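The snapshot-and-difference idea can be sketched in a few lines of Python. This is only an illustration of the principle, not Statspack's own code: the dictionaries stand in for rows saved into the stats$ tables, and the statistic values are made up.

```python
def snapshot_delta(begin, end):
    """Return end - begin for every cumulative statistic present in both snapshots."""
    delta = {}
    for name, end_value in end.items():
        if name in begin:
            diff = end_value - begin[name]
            if diff < 0:
                # Counters restart from zero at instance startup, so a negative
                # difference means the snapshots straddle a restart.
                raise ValueError(f"counter {name!r} went backwards; instance restarted?")
            delta[name] = diff
    return delta


# Hypothetical cumulative values captured at two points in time.
begin_snap = {"physical reads": 1_000, "parse count (total)": 500}
end_snap = {"physical reads": 1_600, "parse count (total)": 650}

activity = snapshot_delta(begin_snap, end_snap)
print(activity)
```

The restart check mirrors the restriction that a report can only span two snapshots taken within the same instance lifetime.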
6 What Questions Can Statspack Answer?
What work load is the database under now?
What activities/events are we waiting for?
Which SQL is consuming most resources?
Which segments are most problematic?
Where is the I/O, and are we CPU bound?
How does all this compare with earlier data?
Statspack provides diagnostic data to solve problems.
7 Why Use Statspack?
Simple and quick to install and use
Provided with all editions
Written by Oracle - in sync with RDBMS
Small system overhead (varies with level)
Source code is available for review
Snapshot data held in tables and available for historical or custom analysis
8 Replacement For (utl)bstat/estat
Statspack has an improved design over bstat/estat
–Flexible reporting because data is held in tables
–Different levels of data collection
–User-defined thresholds
Wider range of data
–SQL statements
–Wait events
–Segment statistics (9.2)
Bstat/estat is not updated with new features
9 Statspack Main Files
Set of 19 files named sp* (stat* in 8.1.6) located in $ORACLE_HOME/rdbms/admin
spdoc.txt – Good description of mechanics
spcreate.sql – SQL*Plus installation script
spreport.sql – Generic reporting script
sprepsql.sql – Explain plan report script
spauto.sql – Creates a dbms_job to automate data collection (requires job_queue_processes > 0)
11 Installation
Run the ‘spcreate.sql’ script using SQL*Plus as user SYS. User PERFSTAT is created by this script and owns all objects needed by the statspack package.
E.g. on Unix:
cd $ORACLE_HOME/rdbms/admin
sqlplus "/ as
To set up automatic collection of data every hour (spauto.sql):
cd $ORACLE_HOME/rdbms/admin
sqlplus
12 Snapshots
A snapshot is a single set of performance data captured using the statspack PL/SQL package:
begin
  perfstat.statspack.snap(i_snap_level => 6);
end;
/
The snapshot level determines the data captured:
Level = 0 General performance statistics (8i, 9i)
Level = 5 SQL statements (default) (8i, 9i)
Level = 6 SQL plans (9i)
Level = 7 Segment statistics (9.2)
Level = 10 Parent and child latches (8i, 9i)
13 Generic Report (spreport.sql)
Generates a report between any two snapshots, as long as the instance was not restarted between them
sqlplus
Enter the start and end snapshot IDs and optionally the output file name (or accept the default sp_ _.lst)
15 Sections of the Generic Report
Section (minimum snapshot level)
Context (0)
Cache Sizes (0)
Load Profile (0)
Instance Efficiency (0)
Timed/Wait Events, renamed because it now includes CPU time (0)
SQL: Buffer Gets/Disk Reads/Executions/Parses (5)
Instance Statistics (0)
Tablespace and Datafile IO (0)
Buffer Pool Statistics (0)
Rollback Activity (0)
Latch Statistics (0, 10)
Segment Statistics, introduced in 9.2 (7)
Library Cache Statistics (0)
SGA Pool Breakdown (0)
Instance Parameters (0)
16 Context/Cache Sizes
DB Name DB Id Instance Inst Num Release Cluster Host
HAW haw NO HAWKING
Snap Id Snap Time Sessions Curs/Sess
Begin Snap: Oct-02 16:45:
End Snap: Oct-02 16:46:
Elapsed: 0.63 (mins)
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 36M Std Block Size: 8K
Shared Pool Size: 12M Log Buffer: 512K
17 Load Profile Per Second Per Transaction Redo size: 77, ,931, Logical reads: , Block changes: , Physical reads: Physical writes: User calls: Parses: , Hard parses: , Sorts: Logons: Executes: , Transactions: 0.03 % Blocks changed per Read: Recursive Call %: Rollback per trans %: 0.00 Rows per Sort: 18.96
18 Load Profile - Comments
Excellent summary of instance workload based on selected v$sysstat statistics
Problems are easier to see if data from a previous baseline is available, e.g. are we performing more I/O?
Difficult to set upper limits due to hardware and system variation; rough guidelines:
–Logical reads > 10,000 per second per 100MHz of CPU
–Physical reads > 100 per second per disk
–Hard parses > 100 per second, soft parses > 300 per second
Focus on parse rates (consider cursor_sharing and session_cached_cursors) and I/O rates
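As a rough sketch of how the Load Profile rates and the guidelines above fit together, the Python below converts snapshot deltas into per-second rates and flags the ones that exceed the slide's thresholds. The function names, the statistic labels and the hardware figures are assumptions made for illustration, and the thresholds are only the rough guidelines quoted above.

```python
def per_second(delta, elapsed_seconds):
    """Turn snapshot deltas into per-second rates, as the Load Profile does."""
    return {name: value / elapsed_seconds for name, value in delta.items()}


def flag_load(rates, total_cpu_mhz, disk_count):
    """Return the statistics whose rate exceeds the rough guideline."""
    limits = {
        "logical reads": 10_000 * (total_cpu_mhz / 100),  # per 100MHz of CPU
        "physical reads": 100 * disk_count,               # per disk
        "hard parses": 100,
        "soft parses": 300,
    }
    return [name for name, limit in limits.items() if rates.get(name, 0) > limit]


# Hypothetical deltas over a 60-second interval on a 4 x 100MHz, 10-disk server.
delta = {
    "logical reads": 1_200_000,
    "physical reads": 90_000,
    "hard parses": 9_000,
    "soft parses": 6_000,
}
rates = per_second(delta, elapsed_seconds=60)
flagged = flag_load(rates, total_cpu_mhz=400, disk_count=10)
print(flagged)
```

A flagged rate is a prompt to look at the corresponding report sections, not proof of a problem; comparing against a baseline from the same system is usually more telling.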
19 Cursor_sharing = force Per Second Per Transaction Redo size: 189, ,837, Logical reads: 1, , Block changes: 1, , Physical reads: Physical writes: User calls: Parses: , Hard parses: Sorts: Logons: Executes: , Transactions: 0.07 % Blocks changed per Read: Recursive Call %: Rollback per trans %: 0.00 Rows per Sort: 20.13
20 Instance Efficiency Instance Efficiency Percentages (Target 100%) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Buffer Nowait %: Redo NoWait %: Buffer Hit %: In-memory Sort %: Library Hit %: Soft Parse %: 5.00 Execute to Parse %: 4.37 Latch Hit %: Parse CPU to Parse Elapsd %: % Non-Parse CPU: Underlined items have good corresponding wait events Shared Pool Statistics Begin End Memory Usage %: % SQL with executions>1: % Memory for SQL w/exec>1:
21 Instance Efficiency - Comments
Pre-computed ratios can highlight problems but may be misleading over small intervals or after restarts; check the actual values for significance
Seemingly good ratios can still hide problems
Practical ranges of the ratios differ greatly:
– % Buffer/redo nowaits, Latch, Sorts
– % Library Cache
–0-100% Parse, Buffer Hit
Correlate ratios with wait events where possible
Shared pool usage should settle down to 80-90%; if > 90%, check bind variable usage and reloads
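To show why the underlying values matter as much as the ratio, here is a small Python sketch of two of the parse ratios. The formulas follow the usual definitions of these percentages and are stated here as an assumption rather than copied from the report script; the counts are made up.

```python
def soft_parse_pct(total_parses, hard_parses):
    """Share of parse calls that did not need a hard parse."""
    return 100.0 * (1 - hard_parses / total_parses)


def execute_to_parse_pct(executions, total_parses):
    """Share of executions that did not need a parse call."""
    return 100.0 * (1 - total_parses / executions)


# Two hypothetical intervals with identical ratios but very different volumes.
busy = {"parses": 200_000, "hard parses": 190_000, "executes": 210_000}
idle = {"parses": 20, "hard parses": 19, "executes": 21}

for label, d in (("busy", busy), ("idle", idle)):
    soft = soft_parse_pct(d["parses"], d["hard parses"])
    e2p = execute_to_parse_pct(d["executes"], d["parses"])
    print(f"{label}: soft parse {soft:.2f}%, execute to parse {e2p:.2f}%, "
          f"hard parses {d['hard parses']}")
```

Both intervals report the same poor soft parse percentage, but only the busy one represents enough hard parsing to be worth investigating.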
22 Top 5 Timed Events
Most valuable section of the generic report
9.2 includes ‘CPU time’ alongside wait events (issues if resource_limit=false)
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~
Event | Waits | Time (s) | % Total Ela Time
CPU time
direct path read
control file sequential read
log file parallel write
db file parallel write
23 Incorporation of CPU Time
Pre 9.2, Top 5 wait events: % Total Wt Time = 100 * wait time / (sum of all wait times)
9.2, Top 5 timed events: % Total Ela Time = 100 * (wait or CPU time) / (sum of all wait times + CPU time)
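The effect of adding CPU time to the denominator is easy to see with a small Python sketch. The function name and the timings are hypothetical; only the two percentage definitions come from the slide.

```python
def top_events(wait_seconds, cpu_seconds=None, limit=5):
    """Rank events by time and express each as a percentage of the total.

    With cpu_seconds=None this is the pre-9.2 calculation (waits only);
    passing cpu_seconds adds 'CPU time' as one more ranked entry (9.2 style).
    """
    entries = dict(wait_seconds)
    if cpu_seconds is not None:
        entries["CPU time"] = cpu_seconds
    total = sum(entries.values())
    ranked = sorted(entries.items(), key=lambda item: item[1], reverse=True)
    return [(name, round(100.0 * secs / total, 2)) for name, secs in ranked[:limit]]


# Hypothetical wait times in seconds for one interval.
waits = {"db file scattered read": 30.0, "latch free": 10.0}

pre_92 = top_events(waits)
with_cpu = top_events(waits, cpu_seconds=60.0)
print(pre_92)
print(with_cpu)
```

In this made-up interval the top wait drops from 75% to 30% of the total once CPU time is counted, which is why the 9.2 figures give a better sense of where time actually goes.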
24 Wait Events - Comments
A very important diagnostic provided by Oracle
The major ‘jumping off point’ if the elapsed wait times are a significant proportion of the interval time (i.e. if most of the time is not spent in idle waits)
See the Reference Guide for details of each wait
Common I/O related waits:
–db file sequential read: index reads or scans
–db file scattered read: full table scans
–direct path read/write: temp I/O
–Log related waits: I/O, switches, buffer
25 Wait Events – Where to Jump?
db file * read -> SQL by buffer gets/disk reads, File IO stats
CPU time -> Parse rates, Sorts, SQL executions, SQL buffer gets/disk reads, SMP processes (bugs)
Direct path reads/writes -> Sorts, Hash joins, hash/sort_area_size, File IO stats
Buffer busy waits -> Buffer pool, Buffer waits, File IO stats, Segment statistics
Other important wait events (e.g. latches, enqueues) have corresponding statspack sections to themselves
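The mapping above is essentially a lookup table, and a short Python sketch makes that explicit. The table contents are taken from the slide; the function name and the use of shell-style wildcards are illustrative choices.

```python
from fnmatch import fnmatchcase

# (event pattern, report sections to read next); patterns are lower case.
JUMP_TABLE = [
    ("db file * read", ["SQL by buffer gets/disk reads", "File IO stats"]),
    ("cpu time", ["Parse rates", "Sorts", "SQL executions",
                  "SQL buffer gets/disk reads"]),
    ("direct path *", ["Sorts", "Hash joins", "hash/sort_area_size",
                       "File IO stats"]),
    ("buffer busy waits", ["Buffer pool", "Buffer waits", "File IO stats",
                           "Segment statistics"]),
]


def where_to_jump(event):
    """Return the report sections suggested for a top timed event."""
    name = event.lower()
    for pattern, sections in JUMP_TABLE:
        if fnmatchcase(name, pattern):
            return sections
    return []  # e.g. latches and enqueues have report sections of their own


print(where_to_jump("db file scattered read"))
print(where_to_jump("direct path write"))
```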
26 SQL Section
Four sections of “worst SQL”, ranked by buffer gets, disk reads, executions and parse counts
SQL ordered by Gets for DB: HAW1 Instance: haw1 Snaps:
Buffer Gets | Execs | Gets per Exec | %Total | CPU Time(s) | Elapsd Time(s) | Hash Value
, ,
Module: SQL*Plus
SELECT * FROM policies WHERE policy_type = :b1
27 SQL Section - Comments
Suboptimal SQL is the most common source of database problems: “Can we get the same results by consuming fewer resources?”
SQL is ranked by total numbers; often the ‘number per execution’ is more useful
What is our current execution plan, and has it changed recently?
A second statspack report, sprepsql.sql, is available (9i, level >= 6)
sqlplus
This report provides a breakdown across snapshots based on SQL hash value and reveals changing execution plans (see later example)
28 Segment Statistics
Historically it was difficult to isolate segment-specific data; the new 9.2 view v$segstat greatly simplifies this
Top 5 Logical Reads per Segment for DB
-> End Segment Logical Reads Threshold:
Owner | Tablespace | Object Name | Obj. Type | Logical Reads | %Total
TB | TAB1 | ANALYSIS_COMMON_RESU | TABLE | 106, |
TB | TAB1 | ANALYSIS_TESTS | TABLE | 103, |
TB | TAB1 | SAMPLES | TABLE | 40, |
TB | IND1 | SAMPLES_UK1 | INDEX | 18, |
TB | TAB1 | ANALYSIS_RESULTS_PK | INDEX | 18, |
29 Instance/Session Statistics
Instance statistics are always included in the report; we can also include session statistics for a single session if desired ( i_session_id=>10 )
Useful for validating ratios and obscure stats
Instance Activity Stats for DB: HAW1 Instance: haw1
Statistic Total per Second per Trans
CPU used by this session 1, ,605.0
parse time cpu
parse time elapsed
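The difference between ranking by totals and ranking per execution can be illustrated with a short Python sketch. The record type, field names and figures are hypothetical.

```python
from typing import NamedTuple


class SqlStat(NamedTuple):
    hash_value: int
    buffer_gets: int
    executions: int


def rank(stats, per_execution=False):
    """Return hash values ordered by buffer gets, in total or per execution."""
    def cost(stat):
        if per_execution:
            # Guard against statements captured with zero completed executions.
            return stat.buffer_gets / max(stat.executions, 1)
        return stat.buffer_gets

    return [s.hash_value for s in sorted(stats, key=cost, reverse=True)]


# Hypothetical statements: one cheap but very frequent, one rare but expensive.
stats = [
    SqlStat(hash_value=111, buffer_gets=900_000, executions=90_000),
    SqlStat(hash_value=222, buffer_gets=500_000, executions=5),
]

by_total = rank(stats)
by_exec = rank(stats, per_execution=True)
print(by_total, by_exec)
```

Statement 111 tops the list by total gets at only 10 gets per execution, while statement 222 costs 100,000 gets each time it runs and is probably the better tuning candidate.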
30 Tablespace and Datafile IO Tablespace IO Stats for DB: Instance: PAYROLL ->ordered by IOs (Reads + Writes) desc Tablespace Filename Av Av Av Av Buffer Av Buf Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms) PAY_6 /u01/oradata/payroll/PAY_6_1.dbf 438, ,
31 Buffer Pool and Buffer Waits Buffer Pool Statistics for DB: NETMON Instance: netmon -> Pools D: default pool, K: keep pool, R: recycle pool Free Write Buffer Buffer Consistent Physical Physical Buffer Complete Busy P Gets Gets Reads Writes Waits Waits Waits D 4,859,734 4,765,667 4,755,716 1, , Buffer wait Statistics for DB: NETMON Instance: netmon -> ordered by wait time desc, waits desc Tot Wait Avg Class Waits Time (cs) Time (cs) data block 8,375 8,000 1 undo block
32 Buffer Pool and Buffer Waits
The 9i report includes the hit ratio per pool; in 8i we have to calculate it manually: 100*(1 - physical reads/buffer gets)
Significant free buffer waits or write complete waits imply that the database writer is not keeping up with buffer pool throughput
Buffer busy waits indicate multi-process contention for a block. Check the block class and reduce contention (e.g. reverse key indexes, fewer rows per block, freelists, initrans, more rollback segments, etc.)
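For 8i reports, the manual calculation is a one-liner. The sketch below applies the formula from the slide in Python; the function name and the figures are made up for illustration.

```python
def buffer_hit_pct(buffer_gets, physical_reads):
    """Hit ratio for one buffer pool: 100 * (1 - physical reads / buffer gets)."""
    if buffer_gets == 0:
        return None  # pool not used during the interval
    return 100.0 * (1 - physical_reads / buffer_gets)


# Hypothetical deltas for two pools taken from the Buffer Pool Statistics section.
default_pool = buffer_hit_pct(buffer_gets=1_000_000, physical_reads=50_000)
recycle_pool = buffer_hit_pct(buffer_gets=0, physical_reads=0)
print(default_pool, recycle_pool)
```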
33 Latches Latch Activity for DB: Pct Avg Pct Get Get Slps NoWait NoWait Latch Name Requests Miss /Miss Requests Miss cache buffers lru chain 4,925, ,749, Latch Sleep breakdown for DB -> ordered by misses desc Get Spin & Latch Name Requests Misses Sleeps Sleeps 1-> cache buffers lru chain 4,925, ,245 35, /29608 /2337/269/
34 Library Cache
Reloads indicate that we are aging out code and reparsing. If bind variables are already in use, increase shared_pool_size and keep objects
Library Cache Activity for DB: PROD Instance: PROD
->"Pct Misses" should be very low
Namespace | Get Requests | Pct Miss | Pin Requests | Pct Miss | Reloads | Invalidations
BODY 1,
CLUSTER 2, ,
PIPE
SQL AREA 1,146, ,434, ,339 0
TABLE/PROCEDURE 1,988, ,940, ,943 0
TRIGGER
36 Examples
1. Monitoring Madness
2. Out of Sorts
3. Distributed SQL
4. Changing Plans
5. Freelists and 9i Auto Managed Segment
37 Example #1: Monitoring Madness
A previously stable system (a third-party monitoring package) is suddenly consuming large amounts of CPU time. The Unix administrators want to know if they should kill these ‘out of control’ Oracle processes
Snap Id Snap Time Sessions
Begin Snap: Aug-02 12:20:24 33
End Snap: Aug-02 12:31:52 33
Elapsed: (mins)
38 Example #1: Load Profile Per Second Per Transaction Redo size: 7, , Logical reads: 7, , Block changes: Physical reads: 6, , Physical writes: User calls: Parses: Hard parses: Sorts: Logons: Executes: Transactions: 0.89 % Blocks changed per Read: 0.43 Recursive Call %: Rollback per trans %: 0.49 Rows per Sort: 13.08
39 Example #1: Wait Events Instance Efficiency Percentages (Target 100%) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Buffer Nowait %: Redo NoWait %: Buffer Hit %: 3.50 In-memory Sort %: Library Hit %: Soft Parse %: Execute to Parse %: Latch Hit %: Parse CPU/Parse Elapsd %: % Non-Parse CPU: Top 5 Wait Events ~~~~~~~~~~~~~~~~~ Wait % Total Event Waits Time (cs) Wt Time db file scattered read 620, , latch free 76, ,
40 Example #1: Physical Reads SQL ordered by Reads for DB: NETMON Instance: netmon -> End Disk Reads Threshold: 1000 Physical Reads Executions Reads per Exec % Total Hash Value ,723, , select distinct message_number from ntw_act_messages where message_number=:b0 union select message_number from ntw_act_messages where original_msgid=:b0 union select message_number from ntw_hist_messages where message_number=:b0 union select message_number from ntw_hist_messages where original_msgid=:b0
41 Example #1: Comments
Execution plan of the offending statement:
SELECT STATEMENT Hint=CHOOSE
  SORT UNIQUE
    UNION-ALL
      INDEX UNIQUE SCAN SYS_C
      TABLE ACCESS FULL NTW_ACT_MESSAGES
      INDEX UNIQUE SCAN SYS_C
      TABLE ACCESS FULL NTW_HIST_MESSAGES
New networking equipment and network problems introduced over the weekend caused a major flood of messages
42 Example #2: Out of Sorts Load Profile Per Second Per Transaction Logical reads: , Physical reads: , Physical writes: , Parses: Instance Efficiency Percentages (Target 100%) Buffer Nowait %: Redo NoWait %: Buffer Hit %: In-memory Sort %: Library Hit %: Soft Parse %: Execute to Parse %: Latch Hit %: Parse CPU / Parse Elapsd %: % Non-Parse CPU: 91.43
43 Example #2: Out of Sorts Top 5 Timed Events Wait % Total Event Waits Time (s) Wt Time direct path write 1, SQL ordered by Reads Physical Reads Execs Reads per Exec %Total , , select * from mod where course=:b1 order by nam Tablespace IO Stats Tablespace Reads Reads/s Writes Writes/s TEMP 3, ,610 52
44 Example #2: Conclusions
The sort_area_size parameter was set to the 8i default value of 64k
Virtually all the I/O went to the TEMP tablespace because of disk sorts, even though in-memory sorts were 96.65%
Increasing sort_area_size produced over 90% improvement in benchmark performance
45 Example #3: Distributed SQL Users complaining of poor performance Nothing strange in report (e.g. no bad SQL) except Top 5 Wait Events ~~~~~~~~~~~~~~~ Wait % Total Event Waits Time (cs) Wt Time SQL*Net message from dblink 197,764 12, SQL*Net more data from dblink 1, SQL*Net message to dblink 197, db file sequential read control file parallel write
46 Example #3: Distributed SQL Stats from the remote database Load Profile Per Second Per Transaction Logical reads: , User calls: 1, , Executes: , Instance Efficiency Percentages (Target 100%) Buffer Nowait %: Redo NoWait %: Buffer Hit %: In-memory Sort %: Library Hit %: Soft Parse %: Execute to Parse %: Latch Hit %: Parse CPU/Elapsd %: % Non-Parse CPU: 98.29
47 Example #3: Distributed SQL
Top 5 Timed Events
Event | Waits | Time (s) | % Total Wt Time
CPU time
SQL ordered by Executions
Executions | Rows Processed | Rows per Exec | Hash Value
, ,
select "RESOURCE_ID" from "RESOURCES" "D" WHERE :1="RESOURCE_ID"
48 Example #3: Conclusions
A search of v$sql based on the previous fragment identified the following statement on the primary:
SELECT DISTINCT b.auth_role_code
FROM a, b, c, d
WHERE upper(a.user_login) = upper('G243311')
AND a.person_id = b.person_id
AND b.auth_role_code = c.auth_role_code
AND c.resource_id = d.resource_id
AND upper(d.resource_id) IN (SELECT upper(ga_resource_id) FROM apps_mapping)
49 Example #3: Conclusions
The third-party package's schema (in the remote database) has not been analyzed. This results in a poor distributed execution plan
Rows Execution Plan
SELECT STATEMENT GOAL: CHOOSE
  REMOTE [PAW.WORLD]
    SELECT "RESOURCE_ID" FROM "RESOURCES" "D" WHERE :1= "RESOURCE_ID"
50 Example #4: Changing Plans
A batch job that had previously performed well was now taking much longer to run. A conventional statspack report showed that a particular statement was dominating the resource usage. What has changed?
begin
  statspack.snap(i_snap_level => 6);
end;
/
sqlplus
51 Example #4: Changing Plans
Plans in shared pool between Begin and End Snap Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Shows the Execution Plans found in the shared pool between the begin and end snapshots specified
Operation | PHV/Object Name | Rows | Bytes | Cost
SELECT STATEMENT | | | | 1417
 SORT ORDER BY | | 8K | 2M | 1417
  TABLE ACCESS BY INDEX ROWID | MOVIE_REVIEWS | 8K | 2M | 481
   INDEX RANGE SCAN | MOVIE_REVIEWS_I1 | 8K | | 23
SELECT STATEMENT | | | | 19468
 SORT ORDER BY | | 211K | 12M | 19468
  TABLE ACCESS FULL | MOVIE_REVIEWS | 211K | 12M |
52 Example #5: Hot Blocks
High rates of concurrent inserts cause buffer busy waits. Let's analyze this using statspack to illustrate enqueues and buffers
This example uses 20 processes running concurrently, each inserting 10,000 rows into the same log table ( rdbms)
9i introduces a new feature known as ‘Segment Management Auto’, against which we compare our conventional results
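The idea behind the plan report, tracking which plans were seen for one SQL hash value across snapshots, can be sketched in Python as follows. The tuples stand in for level 6 snapshot data; the function name, snapshot ids and hash values are all hypothetical.

```python
from collections import defaultdict


def changed_plans(observations):
    """Find statements seen with more than one plan.

    observations: iterable of (snap_id, sql_hash_value, plan_hash_value).
    Returns {sql_hash_value: [(first_snap_id, plan_hash_value), ...]} for
    statements whose plan hash value changed, in order of first appearance.
    """
    first_seen = defaultdict(dict)  # sql hash -> {plan hash: first snap id}
    for snap_id, sql_hash, plan_hash in sorted(observations):
        first_seen[sql_hash].setdefault(plan_hash, snap_id)
    return {
        sql_hash: sorted((snap, plan) for plan, snap in plans.items())
        for sql_hash, plans in first_seen.items()
        if len(plans) > 1
    }


# Hypothetical observations: statement 1001 switches plan at snapshot 12.
observations = [
    (10, 1001, 555), (11, 1001, 555), (12, 1001, 777),
    (10, 2002, 888), (12, 2002, 888),
]
changes = changed_plans(observations)
print(changes)
```

Knowing the first snapshot in which the new plan appeared narrows down when the change happened, which helps tie it to an event such as fresh statistics or a new index.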
53 Example #5: Initial Results Elapsed: 1.42 (mins) Buffer Nowait %: Top 5 Wait Events Wait % Total Event Waits Time (s) Wt Time buffer busy waits 337,492 1, enqueue 9, Buffer wait Statistics Tot Wait Avg Class Waits Time (s) Time (ms) data block 301,572 1,106 4
54 Example #5: Freelists->20 Elapsed: 0.94 (mins) Buffer Nowait %: Top 5 Wait Events Wait % Total Event Waits Time (s) Wt Time buffer busy waits 157, enqueue 3, Buffer wait Statistics Tot Wait Avg Class Waits Time (s) Time (ms) undo header 156, segment header
55 Example #5: Rollbacks Increased Elapsed: 0.77 (mins) Buffer Nowait %: Top 5 Wait Events Event Waits Time (s) Wt Time log buffer space 1, enqueue 3, Enqueue activity Eq Requests Succ Gets Failed Gets Waits Time (ms) Time (s) SQ 2,674 2, , HW 3,123 3, ,
56 Example #5: Seg Manage Auto Elapsed: 0.87 (mins) Buffer Nowait %: Event Waits Time(s) Wt Time log buffer space 1, buffer busy waits 12, free buffer waits Total Avg Class Waits Time(s) Time (ms) data block 8, Eq Requests Succ Gets Failed Gets Waits Time (ms) Time (s) SQ 2,669 2, ,
57 References
Two Oracle whitepapers, ‘Performance Tuning With Statspack, Part I & II’
Ch 10, ‘Expert One-on-One Oracle’, Tom Kyte
Statspack readme spdoc.txt
provides free automated analysis of Statspack reports
58 Contact Information Ian Jones Database Specialists, Inc. 388 Market Street, Suite 400 San Francisco, CA Tel: 415/ Web: