Analyzing Server Crashes and Hangs. Crashes versus hangs. All about server crashes. All about server hangs: analyzing thread dumps. Analysis of thread dump samples.


1 Analyzing Server Crashes and Hangs

2 Agenda. Crashes versus hangs. All about server crashes. All about server hangs: analyzing thread dumps. Analysis of thread dump samples. Resources.

3 Crash Versus Hang. Distinguish between crashes and hangs. A crash means the WebLogic Server Java process no longer exists. A hang means the WebLogic Server Java process still exists but is not responding. Customers tend to use these terms interchangeably.

4 All about crashes. Determine all potential sources of native code used by WebLogic Server: nativeIO, Type 2 JDBC drivers, native libraries accessed with JNI calls, SSL native libraries, and the JVM itself. Most of the time the crash comes from the JVM. Sometimes the JVM produces a small log file (hs_err_pid*.log) that may contain useful information about which library the crash originated from.

5 Debugging with hs_err_pid.log. The hs_err_pid.log file contains the stack trace of the current thread, and depending on that information the issue can be debugged further: If the current thread shows a stack from nativeIO (performance pack): Workaround: disable nativeIO. Fix: file a bug with CCE. If the current thread shows a stack from a native call in a Type 2 driver: Workaround: switch to a pure-Java Type 4 driver instead of the Type 2 driver. Fix: work with the vendor of the database driver.

6 Debugging with hs_err_pid.log. If the current thread shows a stack from a JNI call in application code: Fix: explain to the customer that it is an application bug that needs to be fixed in their code. If the current thread shows a stack from native code in WebLogic SSL: Workaround: use the pure-Java version of SSL instead of the native version. If the current thread indicates a crash in compiled/optimized code: Workaround: turn off compilation, and hence optimization, with -Xint (Java code -> bytecode -> compilation -> optimization of hotspots). Fix: work with JVM vendor support.
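
The triage on slides 5 and 6 starts from the frame the JVM blamed for the crash. As a rough sketch, assuming the "Problematic frame" section written by Java 5 and later HotSpot JVMs (1.4-era logs are laid out differently), a small parser can pull out the frame type letter (J = compiled Java, j = interpreted, V/v = VM code, C = native code) and the library name, then map them to the workarounds above. The class name and advice strings are illustrative only, and the code needs a JDK with records and switch expressions (16+).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the "Problematic frame" from an hs_err_pid*.log. */
final class HsErrTriage {
    /** Frame type letter and native library (null when the frame names no library). */
    record Frame(char type, String library) {}

    // "# C  [libexample.so+0x1a2b]  func+0x10" -> type 'C', rest "[libexample.so+0x1a2b]  func+0x10"
    private static final Pattern FRAME = Pattern.compile("^#\\s+([JjVvC])\\s+(.*)$");
    // Library name inside the leading brackets, without the "+0x..." offset.
    private static final Pattern LIBRARY = Pattern.compile("^\\[([^+\\]]+)");

    static Optional<Frame> problematicFrame(String log) {
        String[] lines = log.split("\\R");
        for (int i = 0; i + 1 < lines.length; i++) {
            if (!lines[i].contains("Problematic frame")) continue;
            Matcher frame = FRAME.matcher(lines[i + 1]);   // the frame is on the next line
            if (!frame.find()) return Optional.empty();
            Matcher lib = LIBRARY.matcher(frame.group(2));
            return Optional.of(new Frame(frame.group(1).charAt(0), lib.find() ? lib.group(1) : null));
        }
        return Optional.empty();
    }

    /** Maps the frame type to the workarounds listed on slides 5 and 6. */
    static String advice(Frame f) {
        return switch (f.type()) {
            case 'J' -> "Crash in compiled code: try -Xint as a workaround and work with the JVM vendor.";
            case 'C' -> "Crash in native library " + f.library()
                    + ": check whether it belongs to nativeIO, a Type 2 driver, JNI code or native SSL.";
            case 'V', 'v' -> "Crash inside the JVM itself: work with the JVM vendor.";
            default -> "Interpreted Java frame: keep the core file and work with the JVM vendor.";
        };
    }

    public static void main(String[] args) throws IOException {
        String log = Files.readString(Path.of(args[0])); // path to an hs_err_pid<pid>.log
        problematicFrame(log).ifPresentOrElse(
                f -> System.out.println(advice(f)),
                () -> System.out.println("No problematic frame found."));
    }
}
```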

7 Debugging with hs_err_pid.log. If the current thread indicates a crash in the threading library (applicable to Solaris): Workaround: switch to the alternate thread library. The default thread library on Solaris 8 and below is /usr/lib/libthread.so.1. It can be switched to /usr/lib/lwp/libthread.so.1 (the default from Solaris 9 on): add /usr/lib/lwp to your LD_LIBRARY_PATH and use -XX:+OverrideDefaultLibthread.

8 Crashes without core. Most crashes produce a core dump. However, sometimes the core file is not available, for example because of: running out of disk space or quota to write the file; not having the correct access permissions to create or write a file in the directory; the prior presence of a read-only or write-protected core file with the same name.

9 Crashes without core. Check "ulimit -c" (set it to unlimited). On Solaris, use coreadm ($ coreadm). Also check the following parameter in the /etc/system file on Solaris, which can be used to disable core files: set sys:coredumpsize=0. On Linux, core dumps are often turned off by default. In Red Hat Advanced Server 2.1, look under /etc/security for a self-explanatory file called limits.conf and search for the word "core"; if it is set to 0, core dumps are disabled.
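
On Linux the same check can be made without a shell, because the kernel exposes each process's limits in /proc/<pid>/limits. A minimal sketch, assuming a Linux host and the usual column layout of that file (columns separated by runs of spaces), that reports the soft and hard core-file limits. Pass the WebLogic server's pid as an argument; without one it inspects its own JVM, which only shows the limits inherited from the current shell.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

/** Sketch: read the "Max core file size" row of /proc/<pid>/limits (Linux only). */
final class CoreLimitCheck {
    /** Returns {soft, hard}, e.g. {"0", "unlimited"}, if the row is present. */
    static Optional<String[]> coreLimits(List<String> limitsLines) {
        for (String line : limitsLines) {
            if (line.startsWith("Max core file size")) {
                // Columns are separated by two or more spaces; the row name itself uses single spaces.
                String[] cols = line.trim().split("\\s{2,}");
                if (cols.length >= 3) {
                    return Optional.of(new String[] {cols[1], cols[2]});
                }
            }
        }
        return Optional.empty();
    }

    /** A soft limit of 0 means no core file will be written. */
    static boolean coreDumpsDisabled(String softLimit) {
        return "0".equals(softLimit);
    }

    public static void main(String[] args) throws IOException {
        Path limits = Path.of("/proc", args.length > 0 ? args[0] : "self", "limits");
        if (!Files.isReadable(limits)) {
            System.out.println(limits + " is not readable; run \"ulimit -c\" in the server's shell instead.");
            return;
        }
        coreLimits(Files.readAllLines(limits)).ifPresent(l -> System.out.println(
                "core soft=" + l[0] + " hard=" + l[1]
                        + (coreDumpsDisabled(l[0]) ? " (core dumps disabled)" : "")));
    }
}
```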

10 Crashes with core. The core file is available. A core file is a memory image of the running process that saves the state of the application at the time of its termination. A core file depends on the exact shared libraries and OS, so it *must* be analyzed on the customer's machine.

11 Crashes with core. If a debugger is not available (Solaris 8, 9), use pstack and pmap: /usr/proc/bin/pstack core > pstack.txt and /usr/proc/bin/pmap core > pmap.txt. Analyze pstack.txt and pmap.txt to understand which library caused the crash.

12 Crashes with core. Gather information from the core using a debugger. The debugger differs between operating systems, but the methodology is the same: check what the current thread is. More info is available at http://supportlab.bea.com:8000/spWiki/attach?page=SystemCorePattern%2FCorePattern.html

13 Crash on Windows. Get the Windows debugging tools from http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx. Start up WebLogic, cd into and run (ignore messages saying that NT_SYMBOL_PATH is not set). Wait until the process dies. When it does, a directory is created with dump and log files. Open a case with Sun Support and send the .dmp file. (If we have the symbols, we can run the debugger against the .dmp file by opening it in the Windows debugger GUI.)

14 All about hangs. The process still exists but is not responding: no responses are sent to clients, and the java weblogic.Admin PING command doesn't return a normal response. Take multiple thread dumps (kill -3 pid on Unix platforms, Ctrl+Break on Windows). On Linux, use ps -efHl | grep 'java' to identify the root pid.

15 All about hangs. Thread dumps for the Sun JVM are sent to stdout. If you are using nohup, thread dumps are directed to nohup.out. For beasvc, use -log:"d:\bea\user_projects\domains\myWLSdomain\myWLS server-stdout.txt" so that stdout is captured in that file, and use beasvc -dump -svcname:service-name to trigger a dump. You can also use the java weblogic.Admin THREAD_DUMP command.
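
When a dump is needed from inside the server rather than from a shell, Java 6 and later expose the same information through java.lang.management. A sketch under that assumption (the 1.4-era JVMs discussed here do not have this API); it only works while the JVM can still run Java code, so kill -3 remains the tool for a truly hung process. ThreadInfo.toString() prints only the top frames of each stack, so this is a summary rather than a full dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** In-process thread dump via ThreadMXBean (Java 6+). */
final class InProcessThreadDump {
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // true, true: also report the monitors and ownable synchronizers each thread holds.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            out.append(info); // ThreadInfo.toString() truncates long stacks
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```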

16 Not able to take thread dumps. With the -Xrs JVM option, the JVM does not install its handlers for OS signals such as SIGQUIT (the Sun JVM uses SIGQUIT to produce thread dumps), so kill -3 will not produce a thread dump. If a process started without -Xrs does not respond to kill -3, it is most likely a JVM bug.

17 All about hangs. There are scenarios where the process appears to be hung (non-responsive) even though free threads are available: The process runs out of memory. If the Java heap is full, the server process appears hung and does not accept requests, because each request needs heap memory to allocate objects. The process runs out of file descriptors. The server cannot accept further requests because sockets cannot be created. GC takes a long time (more than 20 secs), which looks like a hang to end users.
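
The memory and GC symptoms can be watched from inside the JVM. A minimal sketch, assuming java.lang.management (Java 5+) and a hypothetical 90% "nearly full" threshold, that reports heap occupancy and the accumulated GC time; sampling totalGcMillis() twice and comparing the difference with the elapsed wall-clock time shows whether collections are eating the server's time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Looks for two "false hang" symptoms: a nearly full heap and heavy GC time. */
final class HangSymptomCheck {
    /** Fraction of the maximum heap in use, or -1 if the maximum is undefined. */
    static double heapUsedFraction(MemoryUsage heap) {
        long max = heap.getMax();
        return max <= 0 ? -1 : (double) heap.getUsed() / max;
    }

    /** Accumulated collection time in ms over all collectors since JVM start. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;           // -1 means the collector does not report it
        }
        return total;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        double used = heapUsedFraction(heap);
        System.out.printf("heap used: %.1f%%, GC time so far: %d ms%n", used * 100, totalGcMillis());
        if (used > 0.9) {                     // hypothetical threshold
            System.out.println("Heap nearly full: an apparent hang may really be memory exhaustion.");
        }
    }
}
```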

18 Thread queues and threads. weblogic.kernel.Default: worker threads that serve external client requests. weblogic.kernel.System: internal system work such as RJVM heartbeats, HTTP state dumps for JNDI updates in a cluster, etc. weblogic.socket.Muxer: socket reader threads, defaulting to 3 on Unix systems and 2 on Windows. weblogic.admin.RMI: handles OA&M requests such as application deployment, the application poller, etc. weblogic.admin.HTTP: only on the admin server, handles console requests. Core health monitor: tracks the runtime health of the server. JmsDispatcher, JMS.TimerTreePool, JMS.TimerClientPool: used by JMS.
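
Execute-thread names in a dump carry the queue name, so counting threads per queue is a quick first look at where work is piling up. A rough sketch, assuming the "ExecuteThread: 'N' for queue: 'name'" naming shown in the thread dump samples in this deck; the class name is hypothetical, and the input should be a single dump (concatenated dumps would be counted several times).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Counts execute threads per queue in one thread dump (assumes WebLogic 8.1-style thread names). */
final class QueueCounter {
    private static final Pattern EXECUTE_THREAD =
            Pattern.compile("\"ExecuteThread: '\\d+' for queue: '([^']+)'\"");

    static Map<String, Integer> threadsPerQueue(String threadDump) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = EXECUTE_THREAD.matcher(threadDump);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum); // one header line per thread
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(threadsPerQueue(Files.readString(Path.of(args[0]))));
    }
}
```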

19 Analyzing Thread Dumps. Common thread states in a thread dump: Runnable [marked as R in some VMs]: the thread is either currently running or ready to run the next time the OS thread scheduler schedules it. Object.wait() [marked as CW in some VMs]: the thread is waiting on an object using Object.wait(). It proceeds when another thread calls notify() on that object or, if it called wait(long timeout), when the timeout expires. Waiting for monitor entry [marked as MW in some VMs]: the thread is waiting to enter a synchronized block.
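
Since Java 5 the same conditions are also visible programmatically as java.lang.Thread.State values: RUNNABLE, WAITING (or TIMED_WAITING for wait(long timeout)), and BLOCKED for a thread waiting for monitor entry. The small demo below, a sketch with made-up thread names, reproduces the last two states so they are easy to recognize in a dump.

```java
import java.util.concurrent.TimeUnit;

/** Reproduces "in Object.wait()" (WAITING) and "waiting for monitor entry" (BLOCKED). */
final class ThreadStatesDemo {
    /** Polls until t reaches the expected state or timeoutMs elapses. */
    static boolean awaitState(Thread t, Thread.State expected, long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (t.getState() != expected) {
            if (System.nanoTime() > deadline) return false;
            Thread.sleep(10);
        }
        return true;
    }

    /** Returns {sawBlocked, sawWaiting}. */
    static boolean[] demo() {
        Object monitor = new Object();      // held by the caller, so "blocked" cannot enter
        Object waitTarget = new Object();   // "waiter" calls wait() on this
        Thread waiter = new Thread(() -> {
            synchronized (waitTarget) {
                try { waitTarget.wait(); } catch (InterruptedException ignored) { }
            }
        }, "waiter");
        Thread blocked = new Thread(() -> { synchronized (monitor) { } }, "blocked");
        waiter.setDaemon(true);
        blocked.setDaemon(true);
        boolean[] seen = new boolean[2];
        try {
            synchronized (monitor) {
                waiter.start();
                blocked.start();
                seen[0] = awaitState(blocked, Thread.State.BLOCKED, 2000);
                seen[1] = awaitState(waiter, Thread.State.WAITING, 2000);
            }
            waiter.interrupt();             // end the wait() so both threads finish
            waiter.join(2000);
            blocked.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        boolean[] seen = demo();
        System.out.println("BLOCKED seen: " + seen[0] + ", WAITING seen: " + seen[1]);
    }
}
```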

20 Analyzing Thread Dumps. Analyze thread dumps for the following scenarios: – Java deadlock: two or more threads, each waiting for a lock held by another. – Threads blocked during network I/O: the database or remote process is not responding. – Infinite looping in the code. Multiple thread dumps taken a few seconds apart help to debug slow response times.

21 Analyzing thread dumps: Classic deadlock. Look for the threads waiting for monitor entry. For example:
"ExecuteThread: '95' for queue: 'default'" daemon prio=5 tid=0x411cf8 nid=0x6c waiting for monitor entry [0xd0f80000..0xd0f819d8]
    at weblogic.common.internal.ResourceAllocator.release(ResourceAllocator.java:766)
    at weblogic.jdbc.common.internal.ConnectionEnv.destroy(ConnectionEnv.java:590)
The above thread is waiting to acquire the lock on the ResourceAllocator object. The next step is to identify the thread that is holding the ResourceAllocator lock:
"ExecuteThread: '0' for queue: '__weblogic_admin_rmi_queue'" daemon prio=5 tid=0x41b978 nid=0x77 waiting for monitor entry [0xd0480000..0xd04819d8]
    at weblogic.jdbc.common.internal.ConnectionEnv.getPrepStmtCacheHits(ConnectionEnv.java:174)
    at weblogic.common.internal.ResourceAllocator.getPrepStmtCacheHitCount(ResourceAllocator.java:1525)
This thread holds the lock on the ResourceAllocator object but is waiting for the ConnectionEnv object, whose lock is held by the first thread (it entered ResourceAllocator.release() from ConnectionEnv.destroy()). Each thread waits for a lock the other holds: this is a classic deadlock.
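
The two dumps above boil down to two threads taking the same pair of locks in opposite order. A minimal Java reproduction, with plain objects standing in for the ResourceAllocator and ConnectionEnv monitors and made-up thread names that echo the dump, shows the pattern and how ThreadMXBean.findDeadlockedThreads() (Java 6+) detects it. This is an illustration of the lock-ordering problem, not WebLogic code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/** Reproduces a lock-ordering deadlock and asks the JVM to find it. */
final class DeadlockDemo {
    private static void lockBoth(String name, Object first, Object second, CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();       // make sure the other thread holds its first lock too
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {          // never entered: the other thread holds it
                    System.out.println(name + " got both locks");
                }
            }
        }, name);
        t.setDaemon(true);                       // deadlocked daemons do not keep the JVM alive
        t.start();
    }

    /** Returns one "thread waits for lock held by owner" line per deadlocked thread. */
    static String[] createAndDetect() {
        Object resourceAllocator = new Object();
        Object connectionEnv = new Object();
        CountDownLatch latch = new CountDownLatch(2);
        lockBoth("ExecuteThread-95", connectionEnv, resourceAllocator, latch); // destroy() -> release()
        lockBoth("ExecuteThread-0", resourceAllocator, connectionEnv, latch);  // getPrepStmtCacheHitCount() -> getPrepStmtCacheHits()
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) {
                ThreadInfo[] infos = mx.getThreadInfo(ids);
                String[] report = new String[infos.length];
                for (int i = 0; i < infos.length; i++) {
                    report[i] = infos[i].getThreadName() + " waits for lock held by " + infos[i].getLockOwnerName();
                }
                return report;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new String[0];
    }

    public static void main(String[] args) {
        for (String line : createAndDetect()) System.out.println(line);
    }
}
```

The usual fix is to make every code path acquire the two locks in the same order, or to avoid nesting them at all.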

22 Analyzing Thread dumps: Threads in wait(). A sample dump:
"ExecuteThread: '10' for queue: 'SERV_EJB_QUEUE'" daemon prio=5 tid=0x005607f0 nid=0x30 in Object.wait() [83300000..83301998]
    at java.lang.Object.wait(Native Method)
    - waiting on (a weblogic.ejb20.pool.StatelessSessionPool)
    at weblogic.ejb20.pool.StatelessSessionPool.waitForBean(StatelessSessionPool.java:222)
The above thread comes out of wait() under one of two conditions (depending on the application logic): 1) Another execute thread calls notify() on this object when a bean instance becomes available (if the wait() is indefinite). The thread can hang forever if the server never calls notify() on this object. 2) If a timeout was specified and it expires, the thread throws an exception and returns to the execute queue thread pool.
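
Case 2 is the safer design. A simplified, hypothetical pool (not WebLogic's StatelessSessionPool) shows the bounded form: wait(remaining) in a loop with a deadline, so a missing notify() costs at most the timeout instead of an execute thread hung forever. The exception type on timeout is an arbitrary choice for the sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy bean pool whose acquire() waits with a timeout instead of forever (hypothetical). */
final class BoundedWaitPool<T> {
    private final Deque<T> free = new ArrayDeque<>();

    synchronized void release(T bean) {
        free.push(bean);
        notifyAll();                       // wake threads blocked in acquire()
    }

    /** Waits at most timeoutMs for a bean; throws IllegalStateException on timeout. */
    synchronized T acquire(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (free.isEmpty()) {           // loop also guards against spurious wake-ups
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                throw new IllegalStateException("no bean available within " + timeoutMs + " ms");
            }
            wait(remaining);               // appears as "in Object.wait()" in a thread dump
        }
        return free.pop();
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedWaitPool<String> pool = new BoundedWaitPool<>();
        new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
            pool.release("bean-1");        // arrives well within the 1 s timeout
        }).start();
        System.out.println("got " + pool.acquire(1000));
    }
}
```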

23 Analyzing Thread dumps: Threads waiting for monitor entry while the culprit thread is stuck on a remote call. This issue is most often observed when a thread acquires the lock on a synchronized object and then hangs on the database (something is wrong on the database side, e.g. the database is not responding), while the rest of the threads that need the synchronized object wait for monitor entry. In some scenarios the thread holding the lock is not apparent. In most of these cases the lock is held at the native layer, which indicates a JVM bug; taking a pstack is the first step.
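
Two habits limit the damage: avoid holding a lock across a remote call, and bound every remote call with a timeout so the lock holder eventually gives up. A minimal sketch of the second, using plain sockets and a local server that accepts connections but never answers (standing in for a hung database). For JDBC, the analogous setting is Statement.setQueryTimeout(), whose effect depends on driver support.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Bounds a remote call with connect and read timeouts so the calling thread cannot hang forever. */
final class BoundedRemoteCall {
    static int readOneByte(InetSocketAddress address, int connectTimeoutMs, int readTimeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(address, connectTimeoutMs);
            socket.setSoTimeout(readTimeoutMs);   // read() throws SocketTimeoutException after this long
            InputStream in = socket.getInputStream();
            return in.read();
        }
    }

    /** Returns true if the call against a silent local server timed out as intended. */
    static boolean demoTimesOut() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Accepts connections (via the listen backlog) but never sends a byte, like a hung database.
        try (ServerSocket silent = new ServerSocket(0, 50, loopback)) {
            readOneByte(new InetSocketAddress(loopback, silent.getLocalPort()), 1000, 200);
            return false;
        } catch (SocketTimeoutException e) {
            return true;                          // the caller got its thread back
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out as expected: " + demoTimesOut());
    }
}
```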

24 Tool for analyzing thread dumps: Samurai http://yusuke.homeip.net/samurai/?english#content_1_0

25 Performance Tuning Overview

26 J2EE Tuning Zones

27 Platform (OS) Tuning. Key tuning parameters: TCP parameters – tcp_time_wait_interval – tcp_keepalive_interval – set with ndd -set /dev/tcp parameter value. File descriptors – in /etc/system: set rlim_fd_cur=8192 (soft limit), set rlim_fd_max=8192 (hard limit).

28 Platform (OS) Tuning. Key tuning parameters: Prior to Solaris 2.7, the tcp_time_wait_interval parameter was called tcp_close_wait_interval. This parameter determines how long a TCP socket is kept in the TIME_WAIT state after the connection has been closed. The default value on Solaris is four minutes (240000 ms). When many clients connect for short periods of time, holding these socket resources can significantly hurt performance. Setting this parameter to 60000 (60 seconds) has shown a significant throughput improvement when running benchmark JSP tests on Solaris. You might want to reduce this setting further if the server gets backed up with a queue of half-open connections. Tip: use ndd /dev/tcp \? to list the available TCP parameters, and netstat -s -P tcp to view TCP statistics.
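
Why the interval matters can be estimated with Little's law: the number of sockets sitting in TIME_WAIT is roughly the rate at which connections are closed multiplied by the time each one spends in that state. A sketch with a hypothetical rate of 100 short-lived connections closed per second, assuming the server side performs the active close (only that side enters TIME_WAIT):

```java
/** Little's law estimate of sockets in TIME_WAIT. */
final class TimeWaitEstimate {
    /** sockets ~= connections closed per second x seconds each socket stays in TIME_WAIT. */
    static long socketsInTimeWait(long closesPerSecond, long timeWaitIntervalMs) {
        return closesPerSecond * timeWaitIntervalMs / 1000;
    }

    public static void main(String[] args) {
        long rate = 100; // hypothetical: 100 short-lived connections closed per second
        System.out.println("default 240000 ms: " + socketsInTimeWait(rate, 240_000)); // 24000
        System.out.println("tuned    60000 ms: " + socketsInTimeWait(rate, 60_000));  // 6000
    }
}
```

Cutting the interval from four minutes to one cuts the steady-state number of lingering sockets by the same factor of four.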

29 Platform (OS) Tuning. Key tuning parameters: Hard limits are a kernel-configurable item, and ordinary users can't exceed them. Soft limits are the user defaults, and users can change them with the ulimit command or the limit/unlimit shell builtins (see setrlimit(2)). Basically, a soft limit can be raised to any value up to the hard limit. Think of the soft limit as the effective working limit: for file descriptors, a process that reaches its soft limit gets an error (for example EMFILE from open()) until the limit is raised. (The warning message and grace period sometimes associated with soft limits apply to disk quotas, not to per-process resource limits.) Just remember that the maximum number of file descriptors defaults to 1024 unless it is raised, for example with rlim_fd_max.
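
To see how close a running JVM is to its descriptor limit, HotSpot-based JVMs on Unix expose com.sun.management.UnixOperatingSystemMXBean. A sketch assuming that JDK-specific interface (it is not part of the standard java.lang.management API, hence the instanceof check) and a Java 16+ compiler for the pattern-matching instanceof; the 80% threshold is an arbitrary example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

/** Reports open vs. maximum file descriptors for the current JVM (HotSpot on Unix only). */
final class FdUsage {
    /** Returns {open, max}, or null when the JDK-specific Unix MXBean is unavailable. */
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean unix) {
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    /** True once open descriptors reach the given fraction of the limit. */
    static boolean nearLimit(long open, long max, double fraction) {
        return max > 0 && open >= (long) (max * fraction);
    }

    public static void main(String[] args) {
        long[] fds = openAndMax();
        if (fds == null) {
            System.out.println("File descriptor counts not available on this platform/JVM.");
        } else {
            System.out.println("open=" + fds[0] + " max=" + fds[1]
                    + (nearLimit(fds[0], fds[1], 0.8) ? "  <- close to the limit" : ""));
        }
    }
}
```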

30 JVM Tuning Options. JVM vendor and version – use certified versions. JVM heap size parameters. Garbage collection schemes (Sun 1.4.2 JVM) – generational collector (default, stop-the-world) – throughput collector – concurrent low-pause collector – incremental low-pause collector. Unix threading model – export LD_LIBRARY_PATH=/usr/lib/lwp – one-to-one mapping between Java and OS threads.

31 JVM Tuning: Heap Sizing Parameters. Heap size – -Xms, -Xmx. Young generation space – -XX:NewRatio, -XX:NewSize, -XX:MaxNewSize. Survivor space – -XX:SurvivorRatio. Permanent generation – -XX:PermSize and -XX:MaxPermSize. Aggressive heap – -XX:+AggressiveHeap. For more information and self-study, see http://www.petefreitag.com/articles/gctuning/
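
A quick way to confirm which of these flags a server actually started with is to read them back from the running JVM. A sketch using RuntimeMXBean.getInputArguments() and Runtime; the prefix list simply mirrors this slide. Note that -XX:PermSize and -XX:MaxPermSize only exist on JVMs that still have a permanent generation (it was removed in Java 8).

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

/** Shows which heap-sizing flags the running JVM received and the resulting heap sizes. */
final class HeapSettings {
    private static final List<String> PREFIXES = List.of(
            "-Xms", "-Xmx", "-XX:NewRatio", "-XX:NewSize", "-XX:MaxNewSize",
            "-XX:SurvivorRatio", "-XX:PermSize", "-XX:MaxPermSize", "-XX:+AggressiveHeap");

    /** Keeps only the arguments that start with one of the sizing prefixes above. */
    static List<String> sizingFlags(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> PREFIXES.stream().anyMatch(arg::startsWith))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("sizing flags: " + sizingFlags(ManagementFactory.getRuntimeMXBean().getInputArguments()));
        Runtime rt = Runtime.getRuntime();
        System.out.printf("max=%d MB committed=%d MB free=%d MB%n",
                rt.maxMemory() >> 20, rt.totalMemory() >> 20, rt.freeMemory() >> 20);
    }
}
```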

32 WebLogic Core Tuning Options “NativeIO” Performance Packs. Tuning Default ExecuteQueue. Thread usage control. StuckThreadDetection. Connection Backlog Buffering.

33 WebLogic Core Tuning: Performance Packs. Use a platform-optimized, native socket multiplexor. Use their own socket reader threads, which frees up WebLogic execute threads. Available for most platforms – Solaris, Linux, HP-UX, AIX, Windows. Can be configured using the WebLogic Admin Console.

34 WebLogic Core Tuning Performance Packs Benchmarks show major performance improvements when you use native performance packs on machines that host WebLogic Server instances. Performance packs use a platform-optimized, native socket multiplexor to improve server performance. For example, the native socket reader multiplexor threads have their own execute queue and do not borrow threads from the default execute queue, which frees up default execute threads to do application work. However, if you must use the pure-Java socket reader implementation for host machines, you can still improve the performance of socket communication by configuring the proper number of socket reader threads for each server instance and client machine.

35 WebLogic Core Tuning: Default Execute Thread Tuning. The thread count is the number of simultaneous operations that can be performed by applications. – Production mode default: 25. Tuning criteria – request turnaround time – number of CPUs. Percentage of socket reader threads (default 33%). In 8.1, the execute queue can be tuned for an overflow condition – it increases the thread count dynamically.

36 WebLogic Core Tuning: Thread Usage Control. Thread usage can be controlled by creating additional execute queues – to optimize performance for critical applications – to throttle performance – to protect applications from deadlock. Additional queues can have a negative impact on overall performance.
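
In WebLogic 8.1 additional execute queues are configured on the server rather than coded, but the idea is the same as giving each workload its own bounded thread pool. A plain java.util.concurrent sketch with hypothetical queue names and sizes, naming threads in the dump style seen earlier so the queue is visible in a thread dump:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Separate fixed-size pools per workload, analogous to dedicated execute queues. */
final class ExecuteQueues {
    static ExecutorService queue(String name, int threads) {
        AtomicInteger next = new AtomicInteger();
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, "ExecuteThread: '" + next.getAndIncrement() + "' for queue: '" + name + "'");
            t.setDaemon(true);
            return t;
        };
        return Executors.newFixedThreadPool(threads, factory);
    }

    /** Runs a task on the given queue and returns the name of the thread that executed it. */
    static String whoRuns(ExecutorService queue) {
        Callable<String> task = () -> Thread.currentThread().getName();
        try {
            return queue.submit(task).get(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService defaultQueue = queue("default", 25); // hypothetical sizes
        ExecutorService critical = queue("critical", 5);     // a busy default queue cannot starve this one
        System.out.println(whoRuns(defaultQueue));
        System.out.println(whoRuns(critical));
        defaultQueue.shutdown();
        critical.shutdown();
    }
}
```

The trade-off on the slide shows up directly here: every extra pool adds threads, so the total across queues still has to fit the machine.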

37 WebLogic Core Tuning: StuckThreadDetection and Connection Backlog Buffering. Stuck thread detection – detects when an execute thread cannot complete its current work or accept new work. – For warning purposes only; it doesn't change the behavior or state of the thread. – Configured with Stuck Thread Max Time and Stuck Thread Timer Interval. Connection backlog buffering – the number of backlogged TCP connection requests.
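
The mechanism can be sketched as a record of when each thread picked up its current request plus a periodic check that only reports. This is a hypothetical sketch, not WebLogic's implementation: the caller passes the current time explicitly, so a ScheduledExecutorService could call check(System.currentTimeMillis()) every Stuck Thread Timer Interval; the 600 s max time in main() is just an example value.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

/** Hypothetical stuck-thread check: flags threads busy longer than maxTimeMs, never touches them. */
final class StuckThreadDetector {
    private final Map<String, Long> busySince = new ConcurrentHashMap<>();
    private final long maxTimeMs;

    StuckThreadDetector(long maxTimeMs) {
        this.maxTimeMs = maxTimeMs;
    }

    void workStarted(String thread, long nowMs) { busySince.put(thread, nowMs); }

    void workFinished(String thread) { busySince.remove(thread); }

    /** Run every timer interval; returns the names of threads considered stuck (warning only). */
    List<String> check(long nowMs) {
        return busySince.entrySet().stream()
                .filter(e -> nowMs - e.getValue() > maxTimeMs)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        StuckThreadDetector detector = new StuckThreadDetector(600_000); // example: 600 s max time
        detector.workStarted("ExecuteThread: '3'", 0);
        detector.workStarted("ExecuteThread: '7'", 500_000);
        System.out.println(detector.check(700_000)); // [ExecuteThread: '3']
    }
}
```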

38 WebLogic Core Tuning Guidelines. NativeIO gives better performance – consider Java IO if NativeIO is not stable. A high number of threads can have a negative impact on performance – more threads do not imply that you can process more work. Avoid application designs that require creating new threads.

39 JDBC Connection Pool Tuning Options. Connection pool sizing and testing. Caching statements. Connection pool request timeouts. Recovering leaked connections. PinnedToThread.

40 JDBC Connection Pool Tuning: Connection Pool Sizing and Testing. Sizing – Initial Capacity and Maximum Capacity – Shrink Frequency. Testing – Test Frequency – Test Reserved/Released Connections – Maximum Connections Made Unavailable – Test Table Name.

41 JDBC Connection Pool Tuning: Caching Statements. Reuses callable and prepared statements from a cache. Reduces CPU usage on the database side and improves performance. Cache algorithms – LRU (least recently used) – fixed. Statement Cache Size – configured per connection pool – it is the cache size for each connection in the pool.
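
The LRU algorithm can be sketched with LinkedHashMap in access order, a standard way to build an LRU map in Java. The keys are SQL strings and the values are placeholders; a real statement cache would hold PreparedStatement objects per connection and close them on eviction. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** LRU map from SQL text to a cached statement (values are placeholders here). */
final class LruStatementCache<V> extends LinkedHashMap<String, V> {
    private final int capacity;

    LruStatementCache(int capacity) {
        super(16, 0.75f, true);   // accessOrder = true: get() moves the entry to the most-recently-used end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // A real cache would close the evicted statement here.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruStatementCache<String> cache = new LruStatementCache<>(2);
        cache.put("SELECT a", "stmt-a");
        cache.put("SELECT b", "stmt-b");
        cache.get("SELECT a");           // touch a, so b becomes least recently used
        cache.put("SELECT c", "stmt-c"); // evicts b
        System.out.println(cache.keySet()); // [SELECT a, SELECT c]
    }
}
```

A fixed cache, by contrast, would simply stop adding entries once it is full and keep the first statements it saw.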

42 JDBC Connection Pool Tuning: Recovering Leaked Connections, Connection Request Timeout. Leaked connections – forcibly reclaims unused connections – Inactive Connection Timeout. Connection request timeout – Connection Reserve Timeout – maximum number of requests that can wait for a connection. PinnedToThread – pins a connection to an ExecuteThread – Connection.close() doesn't return the connection to the pool.
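
Connection Reserve Timeout and the cap on waiting requests combine roughly as follows: fail fast when too many requests are already queued, otherwise wait up to the timeout. A hypothetical model built on java.util.concurrent.Semaphore, not WebLogic's pool code; capacity, timeout and waiter limit are parameters you would map to the pool settings.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of connection-reserve limits: capacity, reserve timeout, cap on waiters. */
final class ReserveGate {
    private final Semaphore connections;
    private final int maxWaiters;
    private final AtomicInteger waiting = new AtomicInteger();

    ReserveGate(int capacity, int maxWaiters) {
        this.connections = new Semaphore(capacity, true);
        this.maxWaiters = maxWaiters;
    }

    /** True if a connection was reserved; false on timeout or when too many requests already wait. */
    boolean reserve(long timeoutMs) {
        if (connections.tryAcquire()) return true;        // fast path: a connection is free
        if (waiting.incrementAndGet() > maxWaiters) {     // too many waiters: fail immediately
            waiting.decrementAndGet();
            return false;
        }
        try {
            return connections.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            waiting.decrementAndGet();
        }
    }

    void release() {
        connections.release();
    }

    public static void main(String[] args) {
        ReserveGate gate = new ReserveGate(1, 1);
        System.out.println("first:  " + gate.reserve(100)); // gets the only connection
        System.out.println("second: " + gate.reserve(100)); // waits 100 ms, then gives up
    }
}
```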

43 JDBC Connection Pool Tuning Guidelines. Configure initial capacity = maximum capacity. In most cases, the maximum number of connections in use does not exceed the number of execute threads. Configure connection refreshing if database calls fail because of stale connections. Try to avoid PinnedToThread if database resources are limited.

44 Common Performance Problems: Memory Leak. java.lang.OutOfMemoryError is a symptom, but it is not proof of a leak. Turn on -verbose:gc to get GC logs, e.g. – [Full GC 154K->99K(32576K), 0.0085354 secs] (heap occupancy before -> after the collection, total heap size in parentheses, then the pause time). Analyze the GC log for the following scenarios: a full garbage collection does not get a chance to run before OutOfMemoryError is thrown; OutOfMemoryError is thrown even though memory usage has not reached the upper limit of the heap; OutOfMemoryError is thrown during the load-test ramp-up. – Tune -XX:MaxPermSize, -Xms, -Xmx, -XX:NewSize, -XX:MaxNewSize, -XX:SurvivorRatio to resolve the OutOfMemoryError.
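
Reading the GC log by eye is fine for a few lines; for a long load test a small parser helps. A sketch assuming the classic -verbose:gc line format shown above; JVMs using unified logging (-Xlog:gc, Java 9 and later) print a different format, and the record and class names are hypothetical. Records need Java 16+.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses classic -verbose:gc lines such as "[Full GC 154K->99K(32576K), 0.0085354 secs]". */
final class GcLogLine {
    record Collection(boolean full, long beforeKb, long afterKb, long heapKb, double pauseSecs) {}

    private static final Pattern LINE = Pattern.compile(
            "\\[(Full GC|GC)\\s+(\\d+)K->(\\d+)K\\((\\d+)K\\),\\s+([\\d.]+) secs\\]");

    static Optional<Collection> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) return Optional.empty();
        return Optional.of(new Collection(
                m.group(1).equals("Full GC"),
                Long.parseLong(m.group(2)),     // occupancy before the collection
                Long.parseLong(m.group(3)),     // occupancy after the collection
                Long.parseLong(m.group(4)),     // total heap size
                Double.parseDouble(m.group(5))));
    }

    public static void main(String[] args) {
        parse("[Full GC 154K->99K(32576K), 0.0085354 secs]").ifPresent(System.out::println);
    }
}
```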

45 Common Performance Problems: Memory Leak. Heap memory usage that grows after each full GC under steady-state load-test conditions – potential memory leak. Check for the more common leaking objects – caching in the application, e.g. EJB pool objects, HTTP session objects, JMS messages. Use a memory profiler such as JProbe or OptimizeIt to pinpoint the leaking code.
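
The "grows after each full GC" rule can be applied mechanically to the post-collection values (the number after the arrow in each Full GC line). A deliberately simple heuristic sketch: it flags a window only when every sample is higher than the one before, and the minimum window size is a hypothetical knob; a real analysis would tolerate noise and look at a longer run.

```java
import java.util.Arrays;

/** Crude leak heuristic over heap occupancy after consecutive full GCs. */
final class LeakHeuristic {
    /** True when every post-full-GC value is strictly higher than the previous one (needs minSamples values). */
    static boolean growsAfterEveryFullGc(long[] afterFullGcKb, int minSamples) {
        if (afterFullGcKb.length < minSamples) return false;
        for (int i = 1; i < afterFullGcKb.length; i++) {
            if (afterFullGcKb[i] <= afterFullGcKb[i - 1]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long[] samples = {99, 150, 230, 310}; // hypothetical KB after successive full GCs
        System.out.println(Arrays.toString(samples) + " -> potential leak: " + growsAfterEveryFullGc(samples, 3));
    }
}
```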

46 Performance Standards and Tools. Standards – ECperf: J2EE benchmark for application servers – SPECjAppServer2001: benchmark to measure application server performance – SPECjbb2000: server-side JVM performance benchmark, http://www.spec.org/jbb2000/. Tools – OptimizeIt, JProbe, PerformaSure – Mercury LoadRunner, WebLoad, Grinder (open source).

