Analyzing Server Crashes Hangs. Crashes Versus Hang. All about Server crash. All about Server hangs-Analyzing Thread dumps Analysis of thread dump samples.

Presentation transcript:

Analyzing Server Crashes Hangs

Agenda: Crashes versus hangs. All about server crashes. All about server hangs: analyzing thread dumps. Analysis of thread dump samples. Resources.

Crash Versus Hang Distinction between crashes and hangs. A crash means the WebLogic Server Java process no longer exists. A hang means the WebLogic Server Java process still exists but is not responding. Customers tend to use these terms interchangeably.

All about crashes Determine all potential sources of native code used by WebLogic Server: nativeIO (performance pack); a Type 2 JDBC driver; native libraries accessed through JNI calls; SSL native libraries; the JVM itself. Most of the time the crash comes from the JVM. Sometimes the JVM produces a small log file (hs_err_pid*.log) that may contain useful information about which library the crash originated from.

Debugging with hs_err_pid.log
The current thread's stack trace in hs_err_pid.log shows where to debug further:
If the current thread shows a stack from nativeIO (performance pack): Workaround: Disable nativeIO. Fix: File a bug with CCE.
If the current thread shows a stack from a native call in a Type 2 driver: Workaround: Switch to a pure Java Type 4 driver instead of the Type 2 driver. Fix: Work with the vendor of the database driver.

Debugging with hs_err_pid.log
If the current thread shows a stack from a JNI call in application code: Fix: Tell the customer that it is an application bug and needs to be fixed in their code.
If the current thread shows a stack from native code in WebLogic SSL: Workaround: Use the pure Java version of SSL instead of the native version.
If the current thread indicates a crash in compiled/optimized code: Workaround: Turn off compilation, and hence optimization, with -Xint (Java code -> bytecode -> compilation -> optimization of hot spots). Fix: Work with JVM vendor support.

Debugging with hs_err_pid.log If the current thread indicates a crash in the threading library (applicable to Solaris): Workaround: Switch to the alternate thread library. The default thread library on Solaris 8 and below is /usr/lib/libthread.so.1. This can be switched to /usr/lib/lwp/libthread.so.1 (the default from Solaris 9). Add /usr/lib/lwp to your LD_LIBRARY_PATH and add the -XX:+OverrideDefaultLibthread option.

Crashes without core Most crashes cause a core dump, but sometimes the core file is not available. Possible reasons: Running out of disk space or quota to write the file. Not having the correct access permissions to create or write a file in the directory. The prior presence of a core dump of the same name that is read-only or write-protected.

Crashes without core Check "ulimit -c" (have it set to unlimited). Use coreadm on Solaris ($ coreadm). Also check the following parameter, which on Solaris lives in the /etc/system file and can be used to disable core files: set sys:coredumpsize=0. On Linux, core dumps are typically turned off by default. In RedHat Advanced Server 2.1 the setting is under “/etc/security”, in the file limits.conf; look for the word “core”. If it is set to "0", core dumps are disabled.
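The "ulimit -c" check above can also be done from a script. The following is a minimal Python sketch, assuming a Unix system (the resource module is not available on Windows); the helper name core_dumps_enabled is hypothetical. It reports the limits of the current process, which a server started from the same shell would inherit.

```python
import resource

def core_dumps_enabled(soft_limit):
    """Return True if the given core-size soft limit allows a core file."""
    # RLIM_INFINITY is what "ulimit -c unlimited" sets; 0 disables core files.
    return soft_limit == resource.RLIM_INFINITY or soft_limit > 0

# Limits of the current process; child processes inherit them.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
print("core soft limit:", soft, "hard limit:", hard)
print("core dumps enabled:", core_dumps_enabled(soft))
```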

Crashes with core The core file is available. A core file is a memory map of the running process, and it saves the state of the application at the time of its termination. A core file depends on the exact shared libraries and OS, so it *must* be analyzed on the customer's machine.

Crashes with core If a debugger is not available: On Solaris 8 and 9, use pstack and pmap:
/usr/proc/bin/pstack core >pstack.txt
/usr/proc/bin/pmap core >pmap.txt
Analyze pstack.txt and pmap.txt to understand which library caused the crash.

Crashes with core Gather information from the core. Use a debugger. The debugger differs between operating systems, but the methodology is the same: check to see what the current thread is. More info is available at orePattern%2FCorePattern.html

Crash on Windows Get the windows debugging tools from Start up weblogic cd into and run (ignore messages saying that NT_SYMBOL_PATH is not set). Wait till process dies. Upon this event, directory will be created with dump and log files. Open a case with Sun Support and send the dmp file. ( If we have the symbols, we can run the debugger against the dmp file by opening the dmp file in windows debugger GUI)

All about hangs The process still exists but is not responding, and no response is sent to clients. The java weblogic.Admin PING command doesn’t return a normal response. Take multiple thread dumps (kill -3 pid on Unix platforms, Ctrl+Break on Windows). On Linux, use ps -efHl | grep 'java' to identify the root pid.
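Taking several dumps in a row is easy to script. The sketch below is one possible way to do it in Python on Unix; the function name take_thread_dumps, the default of three dumps and the 10-second interval are assumptions for illustration, not values from the slides. The send and sleep parameters exist only so the function can be exercised without signalling a real process.

```python
import os
import signal
import time

def take_thread_dumps(pid, count=3, interval_secs=10,
                      send=os.kill, sleep=time.sleep):
    """Send SIGQUIT to pid `count` times, pausing between signals."""
    for i in range(count):
        # Equivalent to running: kill -3 <pid>
        send(pid, signal.SIGQUIT)
        # No need to wait after the last dump.
        if i < count - 1:
            sleep(interval_secs)
    return count
```

A call such as take_thread_dumps(server_pid) asks the JVM for three dumps ten seconds apart; the dumps themselves still go wherever the JVM writes its stdout.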

All about hangs Thread dumps from the Sun JVM are sent to stdout. If you are using nohup, thread dumps are directed to nohup.out. For beasvc, use -log:"d:\bea\user_projects\domains\myWLSdomain\myWLS server-stdout.txt". Use beasvc -dump -svcname:service-name. You can also use the java weblogic.Admin THREAD_DUMP command.

Not able to take thread dumps The -Xrs JVM option reduces the JVM's use of operating system signals, including SIGQUIT, so thread dumps cannot be taken when it is set (the Sun JVM uses SIGQUIT to perform thread dumps). If a process started without -Xrs does not respond to kill -3, then it’s a JVM bug.

All about hangs There are scenarios where the process appears to be hung (non-responsive) even though free threads are available: The process runs out of memory. If the Java heap is full, the server process appears hung and stops accepting requests, because each request needs heap memory to allocate objects. The process runs out of file descriptors. The server cannot accept further requests because sockets cannot be created. GC takes a long time (more than 20 seconds). This looks like a hang to end users.

Thread queues and Threads
weblogic.kernel.Default – Worker threads that serve external client requests.
weblogic.kernel.system – Internal system work like RJVM heartbeats, HTTP state dumps for JNDI updates in a cluster, etc.
weblogic.socket.Muxer – Used for socket reads. Defaults to 3 threads on Unix systems and 2 on Windows.
weblogic.admin.rmi – Handles OA&M requests like deployment of applications, the application poller, etc.
weblogic.admin.html – Only on the admin server, to handle console requests.
Core health monitor – Runtime health of the server.
JmsDispatcher, JMS.TimerTreePool, JMS.TimerClientPool – For JMS.

Analyzing Thread Dumps Common thread states in a thread dump:
Runnable [marked as R in some VMs]: The thread is either running currently or is ready to run the next time the OS thread scheduler schedules it.
Object.wait() [marked as CW in some VMs]: The thread is waiting on an object using Object.wait(). It proceeds either when another thread calls notify() on the object or when the wait times out, e.g. wait(long timeout).
Waiting for monitor entry [marked as MW in some VMs]: The thread is waiting to enter a synchronized block.
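A quick first pass over a dump is to count how many threads are in each state. The Python sketch below assumes the Sun JVM header layout seen in the samples of this transcript (quoted thread name, then prio/tid/nid, then the state, then a bracketed address range); the function name count_thread_states and the sample thread entries are made up for illustration.

```python
import re
from collections import Counter

# Thread header: "name" ... nid=0x.. <state text> [address range]
HEADER = re.compile(r'^"(?P<name>[^"]+)".*?nid=\S+\s+(?P<state>[^\[]+)')

def count_thread_states(dump_text):
    """Count threads per state string found on each thread header line."""
    states = Counter()
    for line in dump_text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            states[match.group("state").strip()] += 1
    return states

# Hypothetical three-thread dump used only to show the call.
SAMPLE = """\
"ExecuteThread: '1' for queue: 'default'" daemon prio=5 tid=0x1 nid=0x10 runnable [0x1..0x2]
    at java.net.SocketInputStream.socketRead0(Native Method)

"ExecuteThread: '2' for queue: 'default'" daemon prio=5 tid=0x2 nid=0x11 in Object.wait() [0x3..0x4]
    at java.lang.Object.wait(Native Method)

"ExecuteThread: '3' for queue: 'default'" daemon prio=5 tid=0x3 nid=0x12 waiting for monitor entry [0x5..0x6]
    at com.example.Foo.bar(Foo.java:10)
"""

for state, count in count_thread_states(SAMPLE).items():
    print(count, state)
```

A large share of threads in "waiting for monitor entry" is usually the cue to look for the thread that holds the contended lock.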

Analyzing Thread Dumps Analyze the thread dump for the following scenarios. – Java deadlock: Two or more threads each waiting for a lock that another of them holds. – Threads blocked during network I/O: Database or remote process not responding. – Infinite looping in the code. Taking multiple thread dumps a few seconds apart helps to debug slow response times.
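Comparing dumps taken a few seconds apart can be partly automated by checking which threads sit in the same top stack frame in every dump. The sketch below is a rough Python heuristic, not a feature of any WebLogic tool; the function names and the sample dumps are hypothetical. Threads that are legitimately idle (for example, waiting for work in Object.wait()) will also show up, so the result is a list of candidates to inspect, not a verdict.

```python
import re

NAME = re.compile(r'^"([^"]+)"')

def top_frames(dump_text):
    """Map each thread name to the first 'at ...' frame below its header."""
    frames, current = {}, None
    for raw in dump_text.splitlines():
        line = raw.strip()
        match = NAME.match(line)
        if match:
            current = match.group(1)
        elif line.startswith("at ") and current is not None and current not in frames:
            frames[current] = line[3:]
    return frames

def unchanged_threads(dumps):
    """Names of threads whose top frame is identical in every dump (needs >= 1 dump)."""
    per_dump = [top_frames(d) for d in dumps]
    common = set(per_dump[0]).intersection(*per_dump[1:])
    return sorted(n for n in common if len({f[n] for f in per_dump}) == 1)

# Two hypothetical dumps: t1 stays in the same frame, t2 moves on.
DUMP_1 = '"t1" daemon prio=5\n  at com.example.Db.read(Db.java:5)\n"t2" daemon prio=5\n  at com.example.A.x(A.java:1)\n'
DUMP_2 = '"t1" daemon prio=5\n  at com.example.Db.read(Db.java:5)\n"t2" daemon prio=5\n  at com.example.B.y(B.java:2)\n'
print(unchanged_threads([DUMP_1, DUMP_2]))
```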

Analyzing thread dumps Classic deadlock
Look for the threads waiting for monitor entry, e.g.:
  "ExecuteThread: '95' for queue: 'default'" daemon prio=5 tid=0x411cf8 nid=0x6c waiting for monitor entry [0xd0f xd0f819d8]
    at weblogic.common.internal.ResourceAllocator.release(ResourceAllocator.java:766)
    at weblogic.jdbc.common.internal.ConnectionEnv.destroy(ConnectionEnv.java:590)
The above thread is waiting to acquire the lock on the ResourceAllocator object. Its stack shows that it was called from ConnectionEnv.destroy, so it is the likely holder of the ConnectionEnv lock. The next step is to identify the thread that is holding the ResourceAllocator object:
  "ExecuteThread: '0' for queue: '__weblogic_admin_rmi_queue'" daemon prio=5 tid=0x41b978 nid=0x77 waiting for monitor entry [0xd xd04819d8]
    at weblogic.jdbc.common.internal.ConnectionEnv.getPrepStmtCacheHits(ConnectionEnv.java:174)
    at weblogic.common.internal.ResourceAllocator.getPrepStmtCacheHitCount (ResourceAllocator.java:1525)
This thread is holding the lock on the ResourceAllocator object, but is waiting for the ConnectionEnv object. This is a classic deadlock.
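The manual search for the lock holder can be sketched as a small script. This Python example assumes a dump that includes the "- waiting to lock <address>" and "- locked <address>" lines that HotSpot JVMs print under stack frames; the sample above does not show them, so the addresses and the trimmed stacks in SAMPLE are invented for illustration, as is the function name find_deadlock.

```python
import re

NAME = re.compile(r'^"([^"]+)"')
WAITING = re.compile(r'- waiting to lock <(0x[0-9a-fA-F]+)>')
LOCKED = re.compile(r'- locked <(0x[0-9a-fA-F]+)>')

def find_deadlock(dump_text):
    """Return the thread names forming a lock cycle, or [] if none is found."""
    owner, wants, current = {}, {}, None
    for raw in dump_text.splitlines():
        line = raw.strip()
        header = NAME.match(line)
        if header:
            current = header.group(1)
            continue
        if current is None:
            continue
        waiting = WAITING.search(line)
        if waiting:
            wants[current] = waiting.group(1)   # lock this thread is blocked on
        locked = LOCKED.search(line)
        if locked:
            owner[locked.group(1)] = current    # lock this thread holds
    # Follow "blocked on a lock -> owner of that lock" until a thread repeats.
    for start in wants:
        seen, thread = [], start
        while thread in wants and wants[thread] in owner and thread not in seen:
            seen.append(thread)
            thread = owner[wants[thread]]
        if thread in seen:
            return seen[seen.index(thread):]
    return []

# Hypothetical dump modelled on the ResourceAllocator / ConnectionEnv case.
SAMPLE = """\
"ExecuteThread: '95' for queue: 'default'" daemon prio=5 waiting for monitor entry
    at weblogic.common.internal.ResourceAllocator.release(ResourceAllocator.java:766)
    - waiting to lock <0xaaaa0001> (a weblogic.common.internal.ResourceAllocator)
    at weblogic.jdbc.common.internal.ConnectionEnv.destroy(ConnectionEnv.java:590)
    - locked <0xbbbb0002> (a weblogic.jdbc.common.internal.ConnectionEnv)

"ExecuteThread: '0' for queue: '__weblogic_admin_rmi_queue'" daemon prio=5 waiting for monitor entry
    at weblogic.jdbc.common.internal.ConnectionEnv.getPrepStmtCacheHits(ConnectionEnv.java:174)
    - waiting to lock <0xbbbb0002> (a weblogic.jdbc.common.internal.ConnectionEnv)
    at weblogic.common.internal.ResourceAllocator.getPrepStmtCacheHitCount(ResourceAllocator.java:1525)
    - locked <0xaaaa0001> (a weblogic.common.internal.ResourceAllocator)
"""

print(find_deadlock(SAMPLE))
```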

Analyzing Thread dumps Threads in wait()
A sample dump:
  "ExecuteThread: '10' for queue: 'SERV_EJB_QUEUE'" daemon prio=5 tid=0x005607f0 nid=0x30 in Object.wait() [ ]
    at java.lang.Object.wait(Native Method)
    - waiting on (a weblogic.ejb20.pool.StatelessSessionPool)
    at weblogic.ejb20.pool.StatelessSessionPool.waitForBean(StatelessSessionPool.java:222)
The above thread comes out of wait() under one of two conditions (depending on application logic):
1) Another thread in the execute queue pool calls notify() on this object when an instance becomes available. If the wait() is indefinite, the thread can hang forever if the server never calls notify() on this object.
2) If the timeout expires, the thread throws an exception and goes back to the execute queue thread pool.
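The two exit conditions can be illustrated with a small stand-in for the pool. The sketch below is a Python analogy using threading.Condition, not WebLogic's StatelessSessionPool code; the class BeanPool and its methods are hypothetical. A waiter returns when another thread releases a bean and notifies, or raises once the timeout expires.

```python
import threading

class BeanPool:
    """Toy pool: callers block until a bean is released or the timeout expires."""

    def __init__(self):
        self._cond = threading.Condition()
        self._free = []

    def release(self, bean):
        with self._cond:
            self._free.append(bean)
            self._cond.notify()          # condition 1: wake one waiting thread

    def wait_for_bean(self, timeout_secs):
        with self._cond:
            # wait_for returns the predicate's last value; an empty list means timeout.
            if not self._cond.wait_for(lambda: self._free, timeout=timeout_secs):
                raise TimeoutError("no bean became available")   # condition 2
            return self._free.pop()
```

Passing timeout_secs=None makes the wait indefinite, which reproduces the hang described in condition 1 if release() is never called.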

Analyzing Thread dumps Threads waiting for monitor entry while the culprit thread is stuck on a remote call. This is most often seen when a thread acquires the lock on a synchronized object and then hangs on a database call (for example, because the database is not responding), while the rest of the threads that need the synchronized object wait for monitor entry. There are scenarios where the thread holding the lock is not apparent. In most of these cases the lock is held at the native layer, which is a JVM bug. In those cases, taking a pstack is the first step.

Tool for analyzing thread dump Samurai

Performance Tuning Overview

J2EE Tuning Zones

Platform (OS) Tuning Key Tuning Parameters
TCP Parameters
– tcp_time_wait_interval
– tcp_keepalive_interval
– ndd -set /dev/tcp “parameter” “value”
File Descriptors (in /etc/system)
– set rlim_fd_cur=8192 (Soft Limit)
– set rlim_fd_max=8192 (Hard Limit)

Platform (OS) Tuning Key Tuning Parameters Prior to Solaris 2.7, the tcp_time_wait_interval parameter was called tcp_close_wait_interval. This parameter determines the time interval that a TCP socket is kept alive after issuing a close call. The default value of this parameter on Solaris is four minutes. When many clients connect for a short period of time, holding these socket resources can have a significant negative impact on performance. Setting this parameter to a value of (60 seconds) has shown a significant throughput enhancement when running benchmark JSP tests on Solaris. You might want to reduce this setting further if the server gets backed up with a queue of half-opened connections. Tip: Use the netstat -s -P tcp command to view all available TCP parameters

Platform (OS) Tuning Key Tuning Parameters Hard limits are a kernel-configurable item, and users can't exceed them. Soft limits are the user defaults, and users can change them using the ulimit program or the limit/unlimit builtins (see man setrlimit(2)). Basically, soft limits can be changed to anything up to the hard limit. Think of soft limits as the warning barrier. When a user reaches the soft limit they will get a warning message but are still allowed to use more space up to the hard limit. Also, you can configure the system to set expiration times for users who have exceeded their soft limit. Just remember that the maximum number of file descriptors is 1024.
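The rule that a soft limit can be moved anywhere up to the hard limit can be shown for file descriptors. This is a minimal Python sketch for Unix systems; the function names are hypothetical and the value 8192 is only taken from the /etc/system example as an illustration. It raises the soft limit of the current process and never lowers it.

```python
import resource

def new_soft_limit(requested, hard):
    """Clamp a requested (finite) soft limit to the hard limit."""
    if hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, hard)

def raise_fd_soft_limit(requested):
    """Raise this process's open-file soft limit toward `requested`; return the result."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target = new_soft_limit(requested, hard)
    # Only ever raise the limit; an unprivileged process cannot exceed `hard`.
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        return target
    return soft

print("soft, hard:", resource.getrlimit(resource.RLIMIT_NOFILE))
```

Calling raise_fd_soft_limit(8192) from a launcher script would have the same effect as "ulimit -n 8192" in the shell, provided the hard limit allows it.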

JVM Tuning Options
JVM vendor and version. – Use certified versions.
JVM Heap Size Parameters.
Garbage Collection Schemes (Sun JVM) – Generational Collector (Default, Stop the world) – Throughput Collector – Concurrent Low Pause Collector – Incremental Low Pause Collector
Unix Threading Model – export LD_LIBRARY_PATH=/usr/lib/lwp – One-to-one mapping between Java and O/S threads

JVM Tuning Heap Sizing Parameters
Heap Size – -Xms, -Xmx
Young Generation Space – -XX:NewRatio, -XX:NewSize, -XX:MaxNewSize
Survivor Space – -XX:SurvivorRatio
Permanent Generation – -XX:PermSize & -XX:MaxPermSize
Aggressive Heap – -XX:+AggressiveHeap
For more information and self learning look at
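The ratio flags are easier to reason about with a worked example. The Python sketch below applies the usual HotSpot definitions: -XX:NewRatio is the ratio of old to young generation size, and -XX:SurvivorRatio is the ratio of eden to one survivor space. The function name and the sample values (1024 MB heap, NewRatio=3, SurvivorRatio=6) are illustrative assumptions, and a real JVM rounds these sizes for alignment.

```python
def generation_sizes(heap_mb, new_ratio, survivor_ratio):
    """Approximate generation sizes in MB from -Xmx, -XX:NewRatio, -XX:SurvivorRatio."""
    young = heap_mb / (new_ratio + 1)          # old : young = new_ratio : 1
    old = heap_mb - young
    survivor = young / (survivor_ratio + 2)    # eden : survivor = survivor_ratio : 1, two survivors
    eden = young - 2 * survivor
    return {"young": young, "old": old, "eden": eden, "survivor_each": survivor}

# Example: -Xmx1024m -XX:NewRatio=3 -XX:SurvivorRatio=6
print(generation_sizes(1024, 3, 6))
```

With these inputs the young generation is a quarter of the heap (256 MB), split into 192 MB of eden and two 32 MB survivor spaces.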

WebLogic Core Tuning Options “NativeIO” Performance Packs. Tuning Default ExecuteQueue. Thread usage control. StuckThreadDetection. Connection Backlog Buffering.

WebLogic Core Tuning Performance Packs Use a platform-optimized, native socket multiplexor. Use their own socket reader threads and free up WebLogic threads. Available for most platforms: Solaris, Linux, HP-UX, AIX, Win. Can be configured using the WebLogic Admin Console.

WebLogic Core Tuning Performance Packs Benchmarks show major performance improvements when you use native performance packs on machines that host WebLogic Server instances. Performance packs use a platform-optimized, native socket multiplexor to improve server performance. For example, the native socket reader multiplexor threads have their own execute queue and do not borrow threads from the default execute queue, which frees up default execute threads to do application work. However, if you must use the pure-Java socket reader implementation for host machines, you can still improve the performance of socket communication by configuring the proper number of socket reader threads for each server instance and client machine.

WebLogic Core Tuning Default Execute Thread Tuning The thread count is the number of simultaneous operations that can be performed by applications. – Production Mode Default 25 Tuning criteria. – Request turnaround time. – Number of CPUs. – % Socket Reader Threads (Default 33%). In 8.1 the Execute Queue can be tuned for the overflow condition – Increases thread count dynamically.

WebLogic Core Tuning Thread usage Control Thread usage can be controlled by creating additional Execute Queues – Performance optimization for critical applications. – Throttle the performance. – Protect the application from deadlock. It can have a negative impact on overall performance.

WebLogic Core Tuning StuckThreadDetection & Connection Backlog Buffering. StuckThread Detection – Detects when execute thread can not complete work or accept new work. – Warning purpose only, doesn’t change behaviour/state of the thread. – Stuck Thread Max Time, Stuck Thread Timer Interval Connection Backlog Buffering – The number of backlogged TCP connection requests.

WebLogic Core Tuning Guidelines NativeIO gives better performance; – consider Java IO if NativeIO is not stable. A high number of threads can have a negative impact on performance. – More threads does not imply that you can process more work. Avoid application designs that require creating new threads.

JDBC Connection Pool Tuning Options Connection Pool Sizing and Testing. Caching Statements. Connection Pool Request Timeouts. Recovering Leaked Connection. PinnedToThread.

JDBC Connection Pool Tuning Connection Pool Sizing and Testing Sizing – Initial capacity and Maximum capacity. – Shrink Frequency. Testing – Test Frequency. – Test Reserved/ Released Connections – Maximum Connections Made Unavailable – Test Table Name

JDBC Connection Pool Tuning Caching Statements. Reuses Callable and Prepared Statements held in the cache. Reduces CPU usage on the database side and improves performance. Cache Algorithms – LRU (Least Recently Used) – Fixed Statement Cache Size – Configured per connection pool. – The cache size applies to each connection in the pool.

JDBC Connection Pool Tuning Recovering Leaked Connections and Connection Request Timeout. Leaked Connection – Forcibly reclaims unused connections. – Inactive Connection Timeout. Connection Request Timeout – Connection Reserve Timeout. – Maximum number of requests that can wait for a connection. PinnedToThread – Pins a connection to an ExecuteThread. – Connection.close() doesn’t return the connection to the pool.

JDBC Connection Pool Tuning Guidelines Configure initial capacity = maximum capacity. In most cases, the maximum number of connections used does not exceed the number of execute threads. Configure connection refreshing if database calls fail because of stale connections. Try to avoid PinnedToThread if database resources are limited.

Common Performance Problems Memory Leak java.lang.OutOfMemoryError is a symptom of a memory leak, but it is not proof of one. Turn on -verbose:gc for GC logs, e.g. – [Full GC 154K->99K(32576K), secs] Analyze the GC log for the following scenarios: Full garbage collection does not get a chance to run before OutOfMemory is thrown. OutOfMemory is thrown even though memory usage has not reached the upper limit of the heap. OutOfMemory is thrown during the load test ramp-up. – Tune -XX:MaxPermSize, -Xms, -Xmx, -XX:NewSize, -XX:MaxNewSize, -XX:SurvivorRatio to resolve OOM.

Common Performance Problems Memory Leak Heap memory usage grows after each full GC at steady-state load in the load test – Potential memory leak. Check for the more common leaking objects. – Caching in the application, e.g. EJB pool objects, HTTP Session objects, JMS Messages. Use a memory profiler to pinpoint the leaking code, e.g. JProbe and OptimizeIt.
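The "heap grows after each full GC" check can be run against a -verbose:gc log. The Python sketch below assumes the classic Sun JVM line format "[Full GC <before>K-><after>K(<total>K), <seconds> secs]"; the function names, the three-collection minimum and the sample log values are assumptions for illustration. It also flags pauses longer than 20 seconds, which look like hangs to end users.

```python
import re

FULL_GC = re.compile(r'\[Full GC (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]')

def parse_full_gcs(log_text):
    """Return (before_kb, after_kb, total_kb, pause_secs) for each Full GC line."""
    return [(int(b), int(a), int(t), float(s))
            for b, a, t, s in FULL_GC.findall(log_text)]

def heap_grows_after_full_gc(events, min_events=3):
    """True if retained heap rises after every Full GC (a hint, not proof, of a leak)."""
    after = [e[1] for e in events]
    return len(after) >= min_events and all(x < y for x, y in zip(after, after[1:]))

def long_pauses(events, threshold_secs=20.0):
    """Pause times of Full GCs that exceeded the threshold."""
    return [e[3] for e in events if e[3] > threshold_secs]

# Hypothetical log excerpt; minor collections are ignored by the parser.
LOG = """\
[GC 512K->200K(32576K), 0.0021 secs]
[Full GC 1540K->990K(32576K), 0.1200 secs]
[Full GC 20000K->12000K(32576K), 0.9000 secs]
[Full GC 31000K->25000K(32576K), 21.5000 secs]
"""

events = parse_full_gcs(LOG)
print("possible leak:", heap_grows_after_full_gc(events))
print("long pauses:", long_pauses(events))
```

The trend only means something once the load test has reached steady state; growth during ramp-up is expected.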

Performance Standards and Tools Standards – ECperf: J2EE benchmark for application servers – SPECjAppServer2001: Benchmark to measure application server performance – SPEC JBB2000: Server-side JVM performance benchmark. Tools – OptimizeIt, JProbe, PerformaSure. – Mercury LoadRunner, WebLoad, Grinder (open source)