Monitoring and Tuning Oracle for z/OS and Oracle for z/Linux.

Monitoring and Tuning Oracle for z/OS and Oracle for z/Linux

Thomas Niewel Oracle Deutschland GmbH Thomas.Niewel@Oracle.com

Agenda Tuning Why ? Reasons for bad Response Time Statspack Diagnosing reasons for bad response Times SQL Tuning – TKPROF – Explain Plan WLM

Why do we need to tune ? Users report „ bad “ response times because of – CPU Time + Wait Time – Poor performing queries – SQL-Tuning –„ bad “ Database parameters – Bottlenecks in „ System “ (Operating System, WLM, IO/Subsystem etc.)

What can be the reasons for “ bad ” Response Time High CPU Usage High I/O Usage Memory Usage Network problems „ idle “ System Operating System (WLM, VM)

Diagnose from the Oracle point of view Statspack A short overview

Statspack – a short overview  spcreate.sql- installs Statspack (run only once)  statspack.snap- data capture (procedure)  spreport.sql- reporting  spdoc.txt- user documentation  sppurge.sql- delete Statspack data  spdrop.sql- drop Statspack

Capturing data Prerequisite: timed_statistics=true Use stored procedure statspack.snap SQL> execute statspack.snap;

Capturing data Get a baseline for future comparisons Capture snapshots – across peak load – across batch window – The time between snapshots should be <= 30 minutes Capture can be automated – Use OS utility e.g. cron – Use dbms_job – spauto.sql shipped as example

Reporting with Statspack All data is held in an Oracle database Report between two or more snapshots – cannot report across instance startup Spreport.sql creates a report

Reporting with Statspack SQL> @spreport DB Id DB Name Instance# Instance ----------- ---------- ---------- ---------- 1361567071 DB21 1 MAIL Completed Snapshots Instance DB Name SnapId Snap Started Snap Level ---------- ---------- ------ ---------------------- ---------- DB21 DB21 1 17 Aug 2003 10:00:16 5 2 17 Aug 2003 10:30:28 5 Enter beginning Snap Id: 1 Enter ending Snap Id: 2 Enter name of output file [sp_1_2] :

Analyzing a Statspack report Top down analysis Summary page – Enviroment – Load profile – Instance efficiency – Shared pool usage – Top 5 Timed Events Top SQL

Environment section STATSPACK report for DB Name DB Id Instance Inst Num Release Cluster Host ------------ ----------- ------------ -------- ----------- ------- ------------ RECONPRD 1403107896 RECONPRD 1 9.2.0.2.0 NO lin390t1 Snap Id Snap Time Sessions Curs/Sess Comment ------- ------------------ -------- --------- ------------------- Begin Snap: 2 03-Mar-03 11:28:01 10 5.1 End Snap: 31 04-Mar-03 11:58:04 17 5.5 Elapsed: 30.05 (mins) Cache Sizes (end) ~~~~~~~~~~~~~~~~~ Buffer Cache: 256M Std Block Size: 16K Shared Pool Size: 48M Log Buffer: 128K

Load profile Contains a number of common ratios Allows characterisation of the application Can point to problems – high hard parse rate – high IO rate – high login rate

Load profile Useful if you have a comparable baseline What has changed? – txn/sec change implies changed workload – redo size/txn implies changed transaction mix – physical reads/txn implies changed SQL or plan

Load profile Load Profile ~~~~~~~~~~~~ Per Second Per Transaction --------------- --------------- Redo size: 19,057.68 20,937.67 Logical reads: 2,408.15 2,645.70 Block changes: 98.64 108.37 Physical reads: 990.47 1,088.18 Physical writes: 6.92 7.61 User calls: 76.40 83.93 Parses: 7.08 7.78 Hard parses: 0.02 0.02 Sorts: 29.22 32.10 Logons: 24.73 27.17 Executes: 63.79 70.08 Transactions: 0.91 % Blocks changed per Read: 4.10 Recursive Call %: 72.76 Rollback per transaction %: 36.52 Rows per Sort: 153.46

Instance Efficiency Gives an overview of how the instance is performing Can also be used with a comparable baseline Shared pool Statistics allow quick identification of cursor sharing problems

Instance Efficiency Instance Efficiency Percentages (Target 100%) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Buffer Nowait %: 99.99 Redo NoWait %: 99.97 Buffer Hit %: 59.00 In-memory Sort %: 99.99 Library Hit %: 99.94 Soft Parse %: 99.69 Execute to Parse %: 88.89 Latch Hit %: 99.98 Parse CPU to Parse Elapsd %: 56.55 % Non-Parse CPU: 99.93 Shared Pool Statistics Begin End ------ ------ Memory Usage %: 38.86 66.81 % SQL with executions>1: 43.41 87.22 % Memory for SQL w/exec>1: 39.28 80.21

Top 5 Timed Events CPU time – real work Shows where Oracle sessions are waiting Compare Wait Time to elapsed time % Total Wait Time shows potential benefits Use as basis for directed drilldown % Total Event Waits Time (s) Ela Time ------------------------------- ------------ ----------- -------- CPU time 78,588 50.24 enqueue 1,560,523 59,961 38.33 db file sequential read 1,635,253 6,324 4.04 db file scattered read 14,620,725 5,907 3.78 control file parallel write 32,816 1,396.89

Top 5 Timed Events Sample drilldowns – CPU Time „ on CPU “ – enqueue e.g TX Enqueue – db file sequential read Index Access – db file scattered read Scan Operations control file parallel write

Top SQL Helps to find problem statements – SQL ordered by Gets – SQL ordered by Reads – SQL ordered by Executions – SQL ordered by Parse Calls

Top SQL CPU Elapsd Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value --------------- ------------ -------------- ------ -------- --------- ---------- 79,562,398 8,114 9,805.6 34.6 27182.71 28127.71 1525844323 Module: SQL*Plus SELECT MAX(STMT_BKG_DATE_CLOSE) FROM GAH_T_STATEMENTS WHERE S TMT_ACCT_ID = :b1 AND ((:b2 = :b3 AND STMT_CARRIER != :b4 AND STMT_MSG_TYPE != :b5 AND (:b6 IS NULL OR :b6 = STMT_CARRIER ) AND ((:b8 IS NULL AND STMT_MSG_TYPE != :b9 ) OR (:b8 IS NOT NU LL AND :b8 = STMT_MSG_TYPE ))) OR (:b2 = :b13 AND STMT_CARRIER

I/O Statistics Help to find I/O Problems – Tablespace IO Stats – File IO Stats

I/O Statistics Tablespace ------------------------------ Av Av Av Av Buffer Av Buf Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms) -------------- ------- ------ ------- ------------ -------- ---------- ------ GAH_TS00_DT_MEDIUM 15,242,896 160 0.4 6.1 41,066 0 22,468 18.4 GAH_TS00_IX_ITEM 210,346 2 11.2 1.0 130,299 1 9 15.6 GAH_TS00_IX_MEDIUM 207,433 2 6.9 1.0 86,699 1 39 43.8 RECONPRD_TS00_TEMP 185,865 2 1.7 1.6 101,560 1 0 0.0 GAH_TS00_IX_ITEM_REF 155,027 2 8.4 1.0 34,867 0 1 0.0

Diagnosing high CPU usage High CPU Usage High I/O utilization Memory Usage Network problems „ idle “ System Operating System (WLM, VM)

Diagnosing high CPU usage -Operating System- Linux/390 – sar -u 3 3333 – iostat -x 3 – vmstat 3 – top – Etc. Z/OS – SDSF – RMF – Omegamon – etc.

Diagnosing high CPU usage What can be the reason for „ high CPU “ Usage ? – Shared_Pool / SQL-Cache – db_file_multiblock_read_count – Buffer_Cache/ Buffer_Pool – How can Statements with a great # of buffergets be seperated ? – Statspack – SQL Script

Diagnosing high CPU usage CPU Elapsd Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value --------------- ------------ -------------- ------ -------- --------- ---------- 4,494,662 155 28,997.8 2.0 1049.63 2414.11 3961361411 SELECT * FROM GAH_T_STATEMENTS WHERE STMT_ACCT_ID = :b1 AND ((:b2 = :b3 AND STMT_CARRIER != :b4 AND STMT_MSG_TYPE != :b5 AND (:b6 IS NULL OR :b6 = STMT_CARRIER ) AND ((:b8 IS NULL AND STMT_MSG_TYPE != :b9 ) OR (:b8 IS NOT NULL AND :b8 = STMT_MSG_ TYPE ))) OR (:b2 = :b13 AND STMT_CARRIER = :b14 AND STMT_MSG_T Module: SQL*Plus

Diagnosing high CPU usage spool cpu_users.lst select buffer_gets,disk_reads,executions, ratio_to_report(buffer_gets) over () * 100 buffer_ratio, ratio_to_report(disk_reads) over () * 100 disk_ratio, sql_text from v$sqlarea order by buffer_ratio desc; spool off

Diagnosing high CPU usage BUFFER_GETS DISK_READS EXECUTIONS BUFFER_RATIO DISK_RATIO ----------- ---------- ---------- ------------ ---------- SQL_TEXT ---------------------------------------------------------------------------------------- 19564429 154 46908 65.9945773 5.40350877 select t.schema, t.name, t.flags, q.name from system.aq$_queue_tables t, ys.aq$_queue_table_affinities aft, system.aq$_queues q where aft.table_objno = t.objno and aft.owner_instance = :1 and q.table_objno = t.objno and q.usage = 0 and bitand(t.flags, 4+16+32+64+128+256) = 0 for update of t.name, aft.table_objno skip locked

SQL Tuning Check Object Statsitics – Use DBMS_STATS Analyze Execution Plan – Explain Query / V$SQL_PLAN – Optimize Query – Optimize Indexes – Index Only Access, Function Based Indexes

Diagnose High CPU Usage High I/O utilization Memory Usage Network problems „ idle “ System Operating System (WLM, VM)

High I/O utilization Linux/390 – sar -d 3 33333 – iostat -x 3 – vmstat 3 Z/OS – RMF – Omegamon etc

High I/O utilization Disk I/O – Disk access is slower than memory access (Factor 5000 to 100000) – One physical disk is able to perform 100-150 I/O´s per Second – Disk Reponse Times (Read operations) – 2ms (Read from disk cache) – 10ms – 15ms (Physical Reads)

High I/O utilization Reasons for High I/O utilization – Database Cache too small (DB_CACHE_SIZE) – Sortarea too small (sort_area_size) – Hasharea too small (hash_area_size) – Too many Checkpoints – Ineffective Execution Plans (e.g. Full-Table- Scans which are not necessary)

High I/O utilization Increase Cache Size – Reduces physical I/O Operations – Z/OS – Limited by 31 Bit Arcitecture – Multiple Adress Spaces help to improve the Memory management

High I/O utilization Single Shared SGA Across Address Spaces AS1AS2AS3ASn An Oracle server instance has a single SGA regardless of the number of address spaces or regions configured. The user context is distributed across all AS

High I/O utilization Linux/390 – The default maximum SGA size on Linux/390 is 750 MB without changing the base adress – the maximum SGA size to 1 GB by changing the SGA base address

% Total Event Waits Time (s) Wt Time -------------------------------------------- ------------ ------------ ------- db file sequential read 89,086,819 11,009 93.13 db file scattered read 9,875,076 776 6.56 file open 505,227 23.19 log file sync 440,409 8.07 latch free 11,042,510 3.03 High I/O utilization Top 5 Timed Events

Tablespace Av Av Av Av Buffer Av Buf Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms) -------------- ------- ------ ------- ------------ -------- ---------- ------ RECEIVABLE_T_01 18,398,460 213 12.0 1.6 59,325 1 4,892,686 0.0 SO_T_03 6,827,475 79 13.2 1.6 27,462 0 4,506 0.0 SO_I_01 5,356,393 62 9.0 1.3 18,388 0 35,935 0.0 PO_I_01 4,641,732 1400 21.7 1.8 72,563 1 217,799 0.0 Tablespace I/O Stats: High I/O utilization

D I R E C T A C C E S S D E V I C E A C T I V I T Y DEVICE AVG AVG AVG AVG AVG AVG AVG AVG % % % AVG % % STORAGE DEV DEVICE VOLUME LCU ACTIVITY RESP IOSQ DPB CUB DB PEND DISC CONN DEV DEV DEV NUMBER ANY MT GROUP NUM TYPE SERIAL RATE TIME TIME DLY DLY DLY TIME TIME TIME CONN UTIL RESV ALLOC ALLOC PEND DBORACLE 7651 33903 LEOR00 008F 0.817 4 0 0.0 0.0 0.0 0.2 2.6 0.8 0.06 0.28 0.0 1.0 100.0 0.0 DBORACLE 7652 33903 LEOR01 008F 0.878 9 0 0.0 0.0 0.0 0.2 0.3 8.7 0.76 0.79 0.0 3.0 100.0 0.0 DBORACLE 7653 33903 LEOR02 008F 0.502 2 0 0.0 0.0 0.0 0.2 0.0 1.5 0.08 0.08 0.0 6.0 100.0 0.0 DBORACLE 7654 33903 LEOR03 008F 108.968 56 52 0.0 0.0 0.0 0.2 2.4 0.8 0.08 0.32 0.0 1.0 100.0 0.0 DBORACLE 7655 33903 LEOR04 008F 0.828 3 0 0.0 0.0 0.0 0.2 2.3 0.8 0.06 0.25 0.0 1.0 100.0 0.0 DBORACLE 7656 33903 LEOR05 008F 98.779 50 48 0.0 0.0 0.0 0.2 1.7 0.8 0.13 0.42 0.0 1.0 100.0 0.0 DBORACLE 7657 33903 LEOR06 008F 2.768 2 0 0.0 0.0 0.0 0.3 1.3 0.7 0.20 0.56 0.0 1.0 100.0 0.0 DBORACLE 7658 33903 LEOR07 008F 0.943 3 0 0.0 0.0 0.0 0.2 2.3 0.7 0.07 0.28 0.0 1.0 100.0 0.0 DBORACLE 7659 33903 LEOR08 008F 1.003 4 0 0.0 0.0 0.0 0.2 3.5 0.8 0.08 0.43 0.0 1.0 100.0 0.0 DBORACLE 765A 33903 LEOR09 008F 0.945 3 0 0.0 0.0 0.0 0.2 2.2 0.8 0.07 0.28 0.0 1.0 100.0 0.0 DBORACLE 765B 33903 LEOR0A 008F 0.217 3 0 0.0 0.0 0.0 0.2 2.2 0.8 0.02 0.06 0.0 1.0 100.0 0.0 DBORACLE 765C 33903 LEOR0B 008F 0.833 4 0 0.0 0.0 0.0 0.2 2.5 0.8 0.06 0.28 0.0 2.0 100.0 0.0 DBORACLE 765D 33903 LEOR0C 008F 0.963 4 0 0.0 0.0 0.0 0.2 2.7 0.9 0.09 0.35 0.0 1.0 100.0 0.0 DBORACLE 765E 33903 LEOR0D 008F 0.013 3 0 0.0 0.0 0.0 0.2 2.6 0.5 0.00 0.00 0.0 1.0 100.0 0.0 DBORACLE 765F 33903 LEOR0E 008F 0.935 4 0 0.0 0.0 0.0 0.2 3.0 0.8 0.07 0.35 0.0 1.0 100.0 0.0 RMF Report ( Monitor 1; RMF Postprocessor) High I/O utilization

RMF Report – Explanations – IOSQ TIME = UCB Queueing time – Avg Pend Time = ms, all Path´s to logical volume are busy – AVG Resp Time = Connect Time + Dicsonnect Time + Pending Time + IOSQ High I/O utilization

SQL Tuning Check Object Statsitics – Use DBMS_STATS Analyze Execution Plan – Explain Query – Optimize Query – Optimize Indexes – Index Only Access, Function Based Indexes

Diagnose High CPU Usage High I/O utilization Memory Usage Network problems „ idle “ System Operating System (WLM, VM)

Memory Problems How to determine Paging/Swapping – Linux/390 – VMSTAT – Z/OS – RMF – OMEGAMON Reasons for Paging/Swapping – Too many processes/users – Database Parameters which are too generously – DB_CACHE_SIZE – HASH_SIZE – SQL_CACHE

Diagnosing high CPU usage -Operating System- High CPU Usage High I/O utilization Memory Usage Network problems „ idle “ System Operating System (WLM, VM)

Diagnosing Network problems Latency – LAN: < 1ms – WAN: < 10ms - 500ms – ISDN: < 50ms – VPN: 100-500 ms Badwidth – 11-18 Mbit (Copper) – 100 Mbit (Copper, fibre) – 1 Gbit (fibre) Great number of small packets – tcp_nodelay – SDU, TDU-Parameters (not available on z/os)

Diagnosing high CPU usage -Operating System- High CPU Usage High I/O utilization Memory Usage Network problems „ idle “ System Operating System (WLM, VM)

Idle System One CPU is 100% used – All other CPU´s are idle – Reason – dedicated Server – Only one process is running – Solution – Parallel Query – Not useful for OLTP Aplications – Split work - run more Processes

Idle System Latch Contentions – Use Statspack to diagnose Enqueue Waits – Use Statspack to diagnose – Often Block Contentions because of too small initrans, Freelist, Freelist goup settings Parsing because the use of Literals – Use Statspack to diagnose – Use CURSOR SHARING – Use Bind Variables

Idle System Top 5 Timed Events ~~~~~~~~~~~~~~~~~~ % Total Event Waits Time (s) Ela Time -------------------------------------------- ------------ ----------- -------- enqueue 1,560,523 78,588 50.24 CPU time 59,961 38.33 db file sequential read 1,635,253 6,324 4.04 db file scattered read 14,620,725 5,907 3.78 control file parallel write 32,816 1,396.89 -------------------------------------------------------------

Idle System Enqueue activity for DB: RECONPRD Instance: RECONPRD Snaps: 2 -31 -> Enqueue stats gathered prior to 9i should not be compared with 9i data -> ordered by Wait Time desc, Waits desc Avg Wt Wait Eq Requests Succ Gets Failed Gets Waits Time (ms) Time (s) -- ------------ ------------ ----------- ----------- ------------- ------------ TX 438,961 438,941 20 114 512,902.49 58,471 TC 34,530 34,530 0 6,904 369.61 2,552 PS 9,526,323 9,386,524 139,799 1,517,315.25 381 CF 42,761 42,751 10 23 897.57 21 CI 55,594 55,594 0 12 6.17 0 HW 11,356 11,356 0 8.13 0 -------------------------------------------------------------

SQL-Tuning Prerequisites – Use Cost based optimizer – DBMS_STATS (important) Explain Query – Create Plan Table: UTLXPLAN Visualize Execution Plan – UTLXPLS – UTLXPLP Note: Scripts are located in xxxxxxxx.yyyyyyyy.SQL library (z/OS) $ORACLE_HOME/rdbms/admin (Linux/Unix)

SQL-Tuning SQL> explain plan for select a.* from scott.emp a, scott.dept b where a.deptno=b.deptno; Explained. SQL> save explain Created file explain.sql SQL> @?/rdbms/admin/utlxpls PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- -------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | -------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 14 | 560 | 2 | | 1 | NESTED LOOPS | | 14 | 560 | 2 | | 2 | TABLE ACCESS FULL | EMP | 14 | 518 | 2 | |* 3 | INDEX UNIQUE SCAN | PK_DEPT | 1 | 3 | | -------------------------------------------------------------------- Predicate Information (identified by operation id): PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- 3 - access("A"."DEPTNO"="B"."DEPTNO")

SQL-Tuning Optimizer features which help to improve execution plans – Function based indexes (very important) – SELECT * From emp where upper(ename) = ´SMITH´ – Bitmap indexes (Useful in case of Read Only) – Useful for Low Cardinality columns – Parameter: Optimizer_index_cost_adj – Optimizer access path selection can be adjusted to be more index friendly

SQL-Tuning SQLTRACE – Prerequisite: timed_statistics=true – Activate – Alter Session set SQL_trace=true – dbms_system.set_sql_trace_in_session – Use TKPROF to show execution statistics – sys=no,explain=uid/pw

z/OS WLM Everything works fine without peaks (e.g.CPU 30%) Common Problems we had with WLM(during peak periods) – The „Everything is important syndrom“ – User didn´t classify any discretionary goals – Everything had the same importance – Enclave(Sess) with response time goals – Enclave goes to last period (which was discretionary) shortly after Logon – No default service class for OSDI – Mistake in classification rules will result in SYSOTHER being used – discretionary goal

Oracle for Linux /390 We had tuning work – Linux on an LPAR – Linux under VM We did not have any VM related problems The reasons for performance bottlenecks were – Execution plan of a few SQL Queries – I/O Subsystem

Oracle for z/OS The reasons for performance bottlenecks were – WLM configuration – Execution plan of a few SQL Queries – I/O Subsystem – Variances in disc response time

Monitoring and Tuning Oracle for z/OS and Oracle for z/Linux.

Similar presentations

Presentation on theme: "Monitoring and Tuning Oracle for z/OS and Oracle for z/Linux."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Monitoring and Tuning Oracle for z/OS and Oracle for z/Linux.

Similar presentations

Presentation on theme: "Monitoring and Tuning Oracle for z/OS and Oracle for z/Linux."— Presentation transcript:

Similar presentations

About project

Feedback