How STATSPACK Was Used to Solve Common Performance Issues


1 How STATSPACK Was Used to Solve Common Performance Issues
Dedicated to Pramitha Chowrira, the Goddess of the Rockies; Mike Waldron, the Student who became the Master; Sheryl Driscoll, who leads purely for the glory; and Ann Bischoff, who took me in trade for a developer...
Brian Hitchcock, OCP 8, 8i, 9i DBA, Sun Microsystems
NoCOUG, November 13, 2003

2 What STATSPACK Is Set of SQL and PL/SQL
Collects performance data from v$ tables Stores collected data in separate tables Each collection of data is a ‘snapshot’ Reports deltas in data between snapshots Supports ad-hoc SQL queries of the snapshot data

3 STATSPACK Details Works for 8.1.7 onwards
Gathers data for a single instance Snapshot levels Determine how much data is collected Defaults are fine Snapshot interval of 15 minutes suggested Long report periods miss transient events Reports over an instance restart are not valid
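The idea that a report is the difference between two snapshots of cumulative counters can be sketched as follows. This is an illustration only, not STATSPACK's own code: the function name `snapshot_delta` and the counter values are hypothetical, and treating a decreasing counter as a sign of an instance restart is an assumption made for the sketch.

```python
from typing import Dict


def snapshot_delta(begin: Dict[str, int], end: Dict[str, int]) -> Dict[str, int]:
    """Return end - begin for every statistic present in both snapshots."""
    deltas = {}
    for name, end_value in end.items():
        if name not in begin:
            continue  # statistic was not captured in the begin snapshot
        delta = end_value - begin[name]
        if delta < 0:
            # Cumulative counters start again from zero after a restart,
            # so a negative delta means the two snapshots are not comparable.
            raise ValueError(f"{name} decreased; snapshots may span an instance restart")
        deltas[name] = delta
    return deltas


# Hypothetical cumulative counter values, for illustration only
snap_1 = {"physical reads": 1_000, "consistent gets": 50_000}
snap_2 = {"physical reads": 1_400, "consistent gets": 62_500}
print(snapshot_delta(snap_1, snap_2))
```

A short interval between the two snapshots keeps transient events visible in the deltas, which is the reason for the suggested 15 minute interval.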

4 STATSPACK -- Good Free (very cool!) Gathers a wide range of data
You don’t know what you’re looking for at first Root cause isn’t usually obvious Standard process to collect performance data Gathers the same data on all instances Easy to share with vendors, support groups

5 STATSPACK -- Not Perfect
Gathers a wide range of data Ocean of data Any information? Easy to get lost Does not tell you what the problem is Shows what is happening in the instance You need to figure out if this is a problem or not Does not tell you the solution Does not tell you that you are done tuning...

6 How to Interpret Output?
Requires experience with your system No single way to analyze output Must have history of your system Look for possible problem areas Trial and error to change problem behavior Only you can tell if you have a performance problem

7 STATSPACK Report Sections
Instance Summary, Efficiency
Top 5 Wait Events, Wait Events
SQL Ordered by Gets, Reads, Executions
Instance Activity Stats
Tablespace IO Stats Ordered by IOs, Tblspc-file
Buffer Pool Statistics
Rollback Segment Stats, Storage
Latch Activity, Sleep, Miss Sources
Dictionary Cache Stats
Library Cache Activity
SGA Memory Summary, Breakdown
init.ora Parameters

8 Documentation of output
No comprehensive documentation
Oracle 8i Reference
  Appendix A -- Wait Events defined
  Appendix B -- Enqueue Names defined
  Appendix C -- Statistics Descriptions
Database Performance Guide and Ref 9.0.1
  Chapters 21-23, Supplied packages, how to use
$ORACLE_HOME/rdbms/admin/spdoc.txt
ORACLE High-Performance Tuning with STATSPACK, Donald K. Burleson, Oracle Press ISBN
No explanation of what output means for your system

9 Configuration Used Oracle 8.1.7.2 Snapshots every 15 minutes
snapshots taken continuously Default STATSPACK snapshot ‘level’ Application loads and analyzes web site click stream data Lots of data More data all the time We don’t know what vendor code looks like

10 Actual Use 4 Performance issues in 2002 Case 1) Reports Running Slow
STATSPACK output didn’t show the problem Case 2) Vendor Demo Slow Case 3) Data Load Slow STATSPACK output led to 18x speedup (1800%) Case 4) Data Load Time Varies STATSPACK output led to the root cause

11 Case 1) Reports Running Slow
Vendor code allows users to setup reports Vendor code generates SQL for report Long run interferes with next day’s data load STATSPACK captures SQL Generate explain plan(s) Report SQL doesn’t generate where clause properly to use partition pruning Vendor refuses to change their code We simply removed the reports Performance issue ‘resolved’

12 Case 2) Vendor Demo Slow Due to issues like Case 1)
New vendor sets up demo, data load slow Data load runs twice as fast at vendor Statspack output doesn’t show anything obvious Compare configuration of vendor and our dbs Vendor has only one redo log file per group We had two redo log files per group We drop one file per group, performance issue resolved

13 Case 3) Data Load Slow First time loading new type of web log data
No baseline to compare with Classical performance tuning doesn’t always apply to the real world Data load so slow no time for daily reporting Must run faster or the data won’t be loaded We don’t know if this load will run faster Do we have a ‘performance’ issue? Yes, data load must run faster to be useful No, perhaps this is as fast as it can be...

14 Case 3) Data Load Slow SQL -- Highest Gets per Exec
SQL ordered by Gets for DB: BHDATA04 Instance: BHDATA04 Snaps:
-> End Buffer Gets Threshold:
-> Note that resources reported for PL/SQL includes the resources used by all SQL statements called within the PL/SQL code. As individual SQL statements are also reported, it is possible and valid for the summed total % to exceed 100
Buffer Gets   Executions   Gets per Exec   % Total   Hash Value
634, ,
SELECT t526.keyvalueid FROM bh_lqueryvalue t526 WHERE t526.queryinfoid = :ph0 ORDER BY t526.keyvalueid ASC

15 Bad SQL? SQL shouldn’t cost much Drop the costly index!
Select looking for one row Table has two indexes Explain plan shows ‘index full scan’ Should show ‘index range scan’ Explain plan with hint to force one index Verify cost of each index Optimizer is choosing wrong index! Drop the costly index! Indexes added by vendor ‘to be safe’...

16 Explain Plan Force Index1
Cost 4
SQL> truncate table plan_table;
Table truncated.
SQL> explain plan set Statement_Id = 'TEST' for
  SELECT /*+ INDEX(t526 X_LQRYVL_QUERYIDKYVL) */ t526.keyvalueid
  FROM bh_lqueryvalue t526
  WHERE t526.queryinfoid = 100
  ORDER BY t526.keyvalueid ASC;
Explained.
Plan Table
| Operation | Name | Rows | Bytes| Cost | Pstart| Pstop |
| SELECT STATEMENT | | | 24 | | | |
| INDEX RANGE SCAN |X_LQRYVL_ | | 24 | | | |

17 Explain Plan Force Index2
Cost 11,667
SQL> explain plan set Statement_Id = 'TEST' for
  SELECT /*+ INDEX(t526 X_LQYVAL_KYVLQRYID) */ t526.keyvalueid
  FROM bh_lqueryvalue t526
  WHERE t526.queryinfoid = 100
  ORDER BY t526.keyvalueid ASC;
Explained.
Plan Table
| Operation | Name | Rows | Bytes| Cost | Pstart| Pstop |
| SELECT STATEMENT | | | 24 | | | |
| INDEX FULL SCAN |X_LQYVAL_ | | 24 | | | |
SQL>

18 Solution After dropping costly index
data load time was 18 hours, became 1 hour 18:1 improvement (1800%) Why did optimizer choose wrong index? No idea, Oracle requested running 18 hour data load to gather instance data Business users said “NO!” Indexes created by vendor, no need for both indexes Know when to quit tuning!

19 What About Wait Events? Popular DBAs
Wait Events are all that matters I want (desperately) to be popular too… Return to Case 3) Examine Top 5 Wait Events section Try to understand what is causing the wait time

20 Case 3) Wait Events Top 5 Wait Events PX -- parallel query issues?
~~~~~~~~~~~~~~~~~
Event   Waits   Wait Time (cs)   % Total Wt Time
PX Deq: Execution Msg ,
latch free ,
control file parallel write
db file sequential read
log file parallel write
PX -- parallel query issues?
Contact Oracle Tech Support

21 Ask the Experts Oracle Tech Support
Many wait events in STATSPACK output should be ignored (a bug perhaps? Or an RFE?) Requests the full STATSPACK report Tells me that the latch free wait event must be addressed set session_cached_cursors = 100 Performance improvement will be ‘significant’ Data load now takes 19 hours (about 10% worse)

22 What Happened? How could the experts miss the bad SQL?
Wait events are important If the total wait time is the largest problem In this case Bad SQL dominated the overall run time Wait event analysis Many events should be ignored Need to determine how much of total run time is due to wait events only Go back and fix the SQL issue

23 Total CPU Time From Instance Activity Stats
Instance Activity Stats for DB: BHDATA04 Instance: BHDATA04 Snaps:
Statistic   Total   per Second   per Trans
CPU used by this session , ,441.0
CPU used when call started , ,475.0
Total CPU time is reported in 10s of milliseconds
10 milliseconds is 1 centisecond (cs) = 0.01 sec
--> 27475 cs = 274.75 seconds
Report interval was 4.60 minutes (276 seconds)
Confused? How many cs left until Happy Hour?

24 Total Wait Time Look at all Wait Events
Wait Events for DB: BHDATA04 Instance: BHDATA04 Snaps:
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> ordered by wait time desc, waits desc (idle events last)
Event   Waits   Timeouts   Total Wait Time (cs)   Avg wait (ms)   Waits /txn
PX Deq: Execution Msg ,
latch free , , ######
control file parallel write
db file sequential read
log file parallel write
enqueue
refresh controlfile command
PX Deq: Msg Fragment
PX Deq: Parse Reply
control file sequential read , ######
log file sync
PX Deq: Signal ACK
PX Deq: Join ACK
PX Deq: Execute Reply
SQL*Net more data to client
file open
db file parallel write
PX Idle Wait , , , ######
SQL*Net message from client , , ######
SQL*Net message to client , ######
--> Total Wait Time 907997 cs

25 Real Total Wait Time Remove idle events
MetaLink Note: PQ Wait Events STATSPACK report should filter out idle events Database Performance Guide and Ref 9.0.1 Explains more about this ‘feature’

26 Real Total Wait Time Idle Events
Wait Events for DB: BHDATA04 Instance: BHDATA04 Snaps:
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> ordered by wait time desc, waits desc (idle events last)
Event   Waits   Timeouts   Total Wait Time (cs)   Avg wait (ms)   Waits /txn
PX Deq: Execution Msg , <----- remove
latch free , , ######
control file parallel write
db file sequential read
log file parallel write
enqueue
refresh controlfile command
PX Deq: Msg Fragment
PX Deq: Parse Reply
control file sequential read , ######
log file sync
PX Deq: Signal ACK <----- remove
PX Deq: Join ACK
PX Deq: Execute Reply
SQL*Net more data to client
file open
db file parallel write
PX Idle Wait , , , ###### <----- remove
SQL*Net message from client , , ###### <----- remove
SQL*Net message to client , ######
--> Total Wait Time 1397 cs
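Removing the idle events before totalling the wait time can be sketched like this. The set of idle events is taken from the rows marked "remove" on the slide above; the function name `non_idle_wait_cs` and the sample wait times are hypothetical, since the per-event values of the original report are not available here.

```python
# Idle events flagged "remove" on the slide above
IDLE_EVENTS = {
    "PX Deq: Execution Msg",
    "PX Deq: Signal ACK",
    "PX Idle Wait",
    "SQL*Net message from client",
}


def non_idle_wait_cs(wait_events):
    """Sum the wait time (cs) of every event that is not an idle event."""
    return sum(cs for event, cs in wait_events.items() if event not in IDLE_EVENTS)


# Hypothetical wait times in centiseconds, NOT the values from the report
sample = {
    "PX Deq: Execution Msg": 90_000,
    "latch free": 800,
    "db file sequential read": 150,
    "SQL*Net message from client": 40_000,
}
print(non_idle_wait_cs(sample))  # 950
```

The list of idle events differs between Oracle versions and workloads, so the set above should be treated as a starting point rather than a complete list.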

27 Total Response Time
Total Response Time = Total CPU Time + Total Wait Time
27475 cs + 1397 cs = 28872 cs
Total Wait Time 1397/28872 = 0.05
5% of Total Response Time
Wait Time was never an issue!
If you don’t remove the idle events
907997/(27475 + 907997) = 97%
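The response time arithmetic on this slide can be checked with a few lines. The three input values are the ones that survive in the talk (27475 cs of CPU, 1397 cs of non-idle wait, 907997 cs of wait when idle events are left in); the function name `wait_share` is only an illustrative choice.

```python
def wait_share(cpu_cs: int, wait_cs: int) -> float:
    """Fraction of total response time (CPU + wait) that is wait time."""
    return wait_cs / (cpu_cs + wait_cs)


CPU_CS = 27475            # total CPU time
NON_IDLE_WAIT_CS = 1397   # total wait time with idle events removed
ALL_WAIT_CS = 907997      # total wait time with idle events left in

print(f"{wait_share(CPU_CS, NON_IDLE_WAIT_CS):.0%}")  # 5%
print(f"{wait_share(CPU_CS, ALL_WAIT_CS):.0%}")       # 97%
```

The same CPU figure gives opposite conclusions depending on whether idle events are filtered, which is why the filtering step has to come first.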

28 Bad SQL Rules Slow data load time Time due to Bad SQL
Time due All Others Including Wait Events

29 Case 4) Data Load Time Varies
Normal 6.5 hours, Long 16 hours Varies randomly, no pattern Generate STATSPACK report Normal, Long Compare reports Look for differences between reports Tablespace IO Stats Section Normal tablespaces accessed Long tablespaces accessed

30 Problem and Solution Vendor data load shouldn’t touch all tables
What process would access all tables? Production db supported by another group We aren’t allowed to connect as ‘oracle’ Can’t see what they might be running (cron?) Turns out Production DBAs decided we needed full exports We weren’t notified Stop the exports, performance issue goes away!

31 What About Cache Hit Rates?
Back to the subject of experts Remember when it was cool to discuss hit rates? For Case 4) Compute buffer cache hit ratio from tables Tables larger than physical memory Can’t have all pages in memory at once Buffer cache hit ratio won’t be 100% Even if we had 100% Bad SQL (index) was the real problem Buffer cache hit ratio wasn’t relevant

32 Select Buffer Cache Hit Ratio
Data Load without Exports running

33 Total Wait Time? For Case 4), 15 minute report interval
Wait Time is 28% of total time
Instance Activity Stats for DB: BHDATA01 Instance: BHDATA01 Snaps:
Statistic   Total   per Second   per Trans
CPU used by this session ,422, ,
CPU used when call started ,237, , <-----
Top 5 Wait Events
~~~~~~~~~~~~~~~~~
Event   Waits   Wait Time (cs)   % Total Wt Time
PX Deq: Table Q Normal , ,017, <-- remove
slave wait , ,
PX Deq Credit: send blkd , , <-- remove
PX Deq: Execution Msg , , <-- remove
latch free , ,
Total time (cs) = CPU time + non-idle wait time
Wait time / total time = 28%

34 Review For the 4 issues we had, STATSPACK output
Was useful for all 4 Provided standard set of data for all involved Fixed 2 issues Provided the data that led to the root cause Verified that the fix was working Performance improvements were substantial Tuning process much faster with STATSPACK Same process worked for all 4 issues Decided not to look for further improvements Wait Time analysis might be useful...

35 oraperf.com Analyzer Website oraperf.com Submit STATSPACK report
Analyzer reviews report Generates detailed analysis CPU time Wait time Gives specific advice Not perfect, but it is fast and free! Has same issues with idle wait events as STATSPACK report

36 oraperf.com Who or what is oraperf.com? From the website...
Oraperf.com is run by Anjo Kolk. Anjo has worked for over 16 years at Oracle. While at Oracle he worked in different countries and different departments. Many people generate utlbstat/utlestat and statspack reports, but don't know how to interpret the data. People that do look at these reports are also mostly looking at the wrong information and end up making the wrong tuning decisions. That is why the reports are analyzed based on the YAPP method. The YAPP method will show what component of the total response time should be tuned first. YAPP-Method -- Yet Another Performance Profiling Method

37 oraperf.com -- Case 3) Upload report from slow data load
Analyzer shows Response time 91.63% CPU Time 8.37% Wait Time Advice? Reduce the number of buffer gets or executions Wait time Matters only as a % of total response time

38 oraperf.com -- Case 3) Upload report from fast data load
Analyzer shows Response time 5.16% CPU Time 94.84% Wait Time Advice? Tune PX Deq: Execution Msg event But this is an idle event... Non-idle wait time is only about 25% total time

39 oraperf.com -- Case 4) Conclusion
oraperf.com analyzer provides another tool for performance tuning Well worth using if only to compute Response Time CPU Time Wait Time Check for idle wait events...

40 No Excuses Install STATSPACK Generate two snapshots
Generate standard report Upload to oraperf.com Review advice Fast, free performance analysis!

41 Installing STATSPACK Create separate tablespace Create PERFSTAT user
Execute SQL script to create tables Setup job to execute snapshots Setup process to purge data over time Set timed_statistics = TRUE Not required, but needed to get wait time data

42 Installing STATSPACK
As user ‘SYS’
create tablespace perfstat datafile '/xxx/xxx/perfstat_01.dbf' size 500M;
cd $ORACLE_HOME/rdbms/admin
sqlplus sys
@spcreate.sql
Enter value for default_tablespace: perfstat
Enter value for temporary_tablespace: temp

43 Generate Standard Report
Report SQL supplied by Oracle
sqlplus
execute statspack.snap
@$ORACLE_HOME/rdbms/admin/spreport.sql
Enter value for begin_snap: 1
Enter value for end_snap: 2
Enter value for report_name: testing

44 Select STATSPACK Data Query the tables directly
select to_char(snap_time,'yyyy-mm-dd HH24') mydate,
       new.name buffer_pool_name,
       (((new.consistent_gets-old.consistent_gets)+
         (new.db_block_gets-old.db_block_gets))-
        (new.physical_reads-old.physical_reads))
       /
       ((new.consistent_gets-old.consistent_gets)+
        (new.db_block_gets-old.db_block_gets)) bhr
from perfstat.stats$buffer_pool_statistics old,
     perfstat.stats$buffer_pool_statistics new,
     perfstat.stats$snapshot sn
where new.snap_id > 13125
  and new.snap_id < 13149
  and new.name = old.name
  and new.snap_id = sn.snap_id
  and old.snap_id = sn.snap_id-1;
Based on SQL from ORACLE High-Performance Tuning with STATSPACK, Donald K. Burleson, Oracle Press ISBN
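The ratio computed by the query above is (logical reads - physical reads) / logical reads, where logical reads are consistent gets plus db block gets, all taken as deltas between two consecutive snapshots. A minimal sketch of the same formula, with a hypothetical function name and made-up delta values:

```python
from typing import Optional


def buffer_hit_ratio(consistent_gets: int, db_block_gets: int,
                     physical_reads: int) -> Optional[float]:
    """Buffer cache hit ratio from snapshot-to-snapshot deltas.

    Returns None when there were no logical reads in the interval,
    a case in which the SQL above would divide by zero.
    """
    logical_reads = consistent_gets + db_block_gets
    if logical_reads == 0:
        return None
    return (logical_reads - physical_reads) / logical_reads


# Hypothetical deltas between two snapshots, for illustration only
print(buffer_hit_ratio(consistent_gets=9_000, db_block_gets=1_000, physical_reads=500))
```

With these sample deltas, 500 of 10,000 logical reads went to disk, so the ratio is about 0.95.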

45 Buffer Cache Hit Ratio Case 4)
Output of SQL on previous slide
Columns: yr. mo dy Hr, BUFFER_POOL_NAME, BHR
All rows are for the DEFAULT buffer pool

46 Space Used Snapshot size varies with
Number of tablespaces
Number of SQL statements captured
Db1: 21 tablespaces --> 0.15 MB/snapshot
Db2: 376 tablespaces --> 0.37 MB/snapshot
Assuming a snapshot every 15 minutes, 96 snapshots per day
Db1 --> 14.4 MB/day
Db2 --> 35.6 MB/day
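The daily space estimate is simply the snapshot size multiplied by the number of snapshots per day. A small sketch, using the per-snapshot sizes quoted above; the function names are hypothetical.

```python
def snapshots_per_day(interval_minutes: int) -> int:
    """Number of snapshots taken in 24 hours at a fixed interval."""
    return (24 * 60) // interval_minutes


def mb_per_day(mb_per_snapshot: float, interval_minutes: int = 15) -> float:
    """Approximate space consumed per day by snapshot data, in MB."""
    return mb_per_snapshot * snapshots_per_day(interval_minutes)


print(snapshots_per_day(15))       # 96
print(round(mb_per_day(0.15), 1))  # Db1: 14.4
print(round(mb_per_day(0.37), 1))  # Db2: 35.5
```

For Db2 the rounded snapshot size of 0.37 gives about 35.5 per day; the slightly higher figure quoted in the talk was presumably computed from an unrounded snapshot size.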

47 Removing Snapshot Data
Oracle supplied SQL
SQL removes snapshot data for a range of snapshot id numbers
Example
sqlplus
@$ORACLE_HOME/rdbms/admin/sppurge
... ... (listing of all existing snapshots)
Specify the Lo Snap Id and Hi Snap Id range to purge
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for losnapid: 4001
Using 4001 for lower bound.
Enter value for hisnapid: 5000
Using 5000 for upper bound.
Deleting snapshots
commit;
Note: large deletes may fill rollback segments

48 Summary STATSPACK Free, easy to install, easy to run
Output can be very useful or confusing Real-world use has resulted in big performance gains Useful for all instances Standard way to gather performance data

