DB Issue Trouble Shooting Guideline Edgar Huang
DB Issues Performance Issues DB Hung Internal Error Raised Lock / Dead Lock Other Issues
Performance Issues Most of (more 90%) performance issues are caused by Application. Few (less than 10%) of issues are caused by resource limitation, OS/RDBMS configuration and other reasons.
Work flow of identifying/solving Performance Issues. Client Program DB Host Middle Ware DB Server App Server App Server DB Server Middle Ware DB Host Client Program
Latent Reasons Bottleneck in Client Program PLSQL Issue Unreasonable configuration in RDBMS Network Issue Memory Issue CPU Issue IO Issue
Top Processes TOP command in UNIX Glance command in HP-UX Task Manager in Windows
Top Sessions in DB Find out the PID of top process from OS In dedicated server, find out the mapping session info in DB by the PID select b.spid, a.sid, a.username, s.sql_text from v$session a , v$process b, v$sqlarea s where a.PADDR = b.ADDR and a.sql_hash_value = s.hash_value and b.spid=&PID;
Find out the long operation in the transaction select username,sid,opname, round ( sofar *100 / totalwork,0)||'%' as progress, time_remaining,sql_text from v$session_longops , v$sqlarea where time_remaining <> 0 and sql_address = address and sql_hash_value = hash_value;
Related View V$PROCESS V$SESSION V$SQLAREA v$session_longops
Find more from Statspack Report Generate the report during peak time Analyze the report
Important Sections in the report Instance Efficiency Percentages Top 5 Wait Events SQL ordered by Gets for DB SQL ordered by Reads for DB Tablespace IO Stats for DB & File IO Stats for DB
Instance Efficiency Percentages Buffer Nowait Ratio Buffer Hit Ratio Library Hit Ratio Redo no-wait Ratio In-memory Sort Ratio
Instance Efficiency Percentages Instance Efficiency Percentages (Target 100%) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Buffer Nowait %: 100.00 Redo NoWait %: 100.00 Buffer Hit %: 99.08 In-memory Sort %: 99.60 Library Hit %: 99.46 Soft Parse %: 98.99 Execute to Parse %: 45.61 Latch Hit %: 100.00 Parse CPU to Parse Elapsd %: 87.88 % Non-Parse CPU: 100.00
Top 5 Wait Events DB File Scattered Read. DB File Sequential Read Free Buffer Enqueue Log Buffer Space Log File Sync log file parallel write
Top 5 Wait Events Top 5 Timed Events ~~~~~~~~~~~~~~~~~~ % Total Event Waits Time (s) Ela Time -------------------------------------------- ------------ ----------- -------- db file sequential read 7,544,366 25,591 48.49 CPU time 14,126 26.76 log file sync 50,593 6,241 11.82 db file scattered read 1,473,891 2,154 4.08 log file parallel write 60,964 1,546 2.93 -------------------------------------------------------------
SQL ordered by Gets for DB SQL ordered by Gets for DB: ICSSPRD Instance: icssprd Snaps: 179 -181 -> End Buffer Gets Threshold: 10000 -> Note that resources reported for PL/SQL includes the resources used by all SQL statements called within the PL/SQL code. As individual SQL statements are also reported, it is possible and valid for the summed total % to exceed 100 CPU Elapsd Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value --------------- ------------ -------------- ------ -------- --------- ---------- 91,163,470 567 160,782.1 19.2 850.46 949.33 1513226087 Module: SQL*Plus SELECT DISTINCT b.filename || :"SYS_B_00" FROM css_tp_transaction_event a ,css_tp_txn_event_activity b WHERE a.trdng_ptnr_id = :"SYS_B_01" AND a.event_type_id in (:"SYS _B_02",:"SYS_B_03") AND a.txn_event_id = b.txn_ev
SQL ordered by Reads for DB SQL ordered by Reads for DB: ICSSPRD Instance: icssprd Snaps: 179 -181 -> End Disk Reads Threshold: 1000 CPU Elapsd Physical Reads Executions Reads per Exec %Total Time (s) Time (s) Hash Value --------------- ------------ -------------- ------ -------- --------- ---------- 11,993,968 120 99,949.7 36.2 443.71 552.02 4292891611 Module: JDBC Connect Client BEGIN CSSPLSQL.B2B_MSG_DETECTOR.GET_B2B_OUT_MSG_DETECTOR ( :V1, :V2, :V3); END;
Tablespace IO Stats for DB & File IO Stats for DB Tablespace IO Stats for DB: ICSSPRD Instance: icssprd Snaps: 179 -181 ->ordered by IOs (Reads + Writes) desc Tablespace ------------------------------ Av Av Av Av Buffer Av Buf Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms) -------------- ------- ------ ------- ------------ -------- ---------- ------ CSS_BKG_DATA 1,699,930 472 4.3 4.3 109,215 30 5,034 10.9
Generate SQL report for further study SQL> @?\rdbms\admin\sprepsql.sql …… Enter value for begin_snap: 886 … Enter value for end_snap: 910 Enter value for hash_value: 4206207905
Generate SQL report for further study Plans in shared pool between Begin and End Snap Ids ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Shows the Execution Plans found in the shared pool between the begin and end snapshots specified. The values for Rows, Bytes and Cost shown below are those which existed at the time the first-ever snapshot captured this plan - these values often change over time, and so may not be indicative of current values -> Rows indicates Cardinality, PHV is Plan Hash Value -> ordered by Plan Hash Value -------------------------------------------------------------------------------- | Operation | PHV/Object Name | Rows | Bytes| Cost | |SELECT STATEMENT |----- 235857360 -----| | | 2021 | |SORT UNIQUE | | 194 | 16K| 2021 | | CONCATENATION | | | | |
SQL Trace on top SQL SET TIMING ON SET AUTOT[RACE] TRACE
SQL Trace on top SQL Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 159414 consistent gets 0 physical reads 0 redo size 146 bytes sent via SQL*Net to client 235 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 0 rows processed
DB Hung Issue It’s fatal fault Mostly, need work with OS & Oracle support team Run RDA in DB server host to collect information as soon Log SR in Oracle Metalink
Internal Error ORA-00600 to ORA-07445 Mostly relate to Oracle internal bug Log SR in Metalink Send Trace file (under bdump folder) to Oracle
Lock Issue Top Event Enqueue Find out the TM, TX lock; Find out the holder Work with dev team to change the logic to reduce/release the locks
Lock Issues select * from DBA_WAITERS; SELECT /*+ rule */ lpad(' ',decode(l.xidusn ,0,3,0))||l.oracle_username User_name, o.owner, o.object_name, o.object_type, s.sid, s.serial# FROM v$locked_object l,dba_objects o,v$session s WHERE l.object_id=o.object_id AND l.session_id=s.sid ORDER BY o.object_id,xidusn DESC; select sid,type,id1,id2,lmode,request,ctime,block from v$lock where type in ('TM','TX');
Lock Issues V$lock dba_waiters v$locked_objects
Dead Lock It’s logical issue Two or more sessions locking each other Find out the related SQL from trace file Work with dev team to change the logic
Dead Lock DEADLOCK DETECTED Current SQL statement for this session: DELETE FROM CSS_SI_BL_DISTRIBUTION WHERE (SI_REQUEST_ID = :"SYS_B_0") The following deadlock is not an ORACLE error. It is a deadlock due to user error in the design of an application or from issuing incorrect ad-hoc SQL. The following information may aid in determining the deadlock: Deadlock graph: ---------Blocker(s)-------- ---------Waiter(s)--------- Resource Name process session holds waits process session holds waits TM-00005ed2-00000000 622 573 SX 156 226 SX SSX TX-00060004-00063409 156 226 X 622 573 X session 573: DID 0001-026E-00000012 session 226: DID 0001-009C-0000256C session 226: DID 0001-009C-0000256C session 573: DID 0001-026E-00000012 Rows waited on: Session 226: obj - rowid = 0000700D - AAAHANAC4AAAAAAAAA (dictionary objn - 28685, file - 184, block - 0, slot - 0) -- CSS_SI_CARGO_FK4 Session 573: obj - rowid = 00005EB6 - AAAF62AC2AAApFQAAh (dictionary objn - 24246, file - 182, block - 168272, slot - 33) -- CSS_SI_BL_DISTRIBUTION Information on the OTHER waiting sessions: Session 226: pid=156 serial=25463 audsid=96195353 user: 33/CSSJAVA O/S info: user: oracle, term: unknown, ospid: , machine: as04.cargosmart.com program: JDBC Thin Client application name: JDBC Thin Client, hash value=0 Current SQL Statement: DELETE FROM CSS_SI_RELATED_BOOKING WHERE (SI_REQUEST_ID = :"SYS_B_0") End of information on OTHER waiting sessions.
Other Issues Rollback segment can not be extend Snapshot of rollback segment is too old
Rollback segment can not be extend Long transaction involve much of undo log Find out the related SQL from the error information
Rollback segment can not be extend SQL tuning Split the transaction Transfer to batch insert / direct loading Extent rollback segment
Snapshot of rollback segment is too old Long transaction running, unable to get the consistent snapshot in rollback segment Find out the related SQL from the error information
Snapshot of rollback segment is too old SQL Tuning Improve other much-undo consuming/holding transactions Setup special large segment for long transactions Extent rollback segment
Other OS commands for collect information sar iostat vmstat swapinfo
That’s All, Thanks!