UPortal 2 Status – 9/19/05 Dan Mindler Enterprise Systems & Services



uPortal 2: Releases
- uPortal 2.5.x
  - GA 5/26/ RC1 available 7/ GA ?
- uPortal 2.4.x
  - GA 8/12/05
- Quick Starts for and made available 9/4/05 (contributed by Vincent Mathieu)

uPortal 2.x Memory/Performance: Status
- As of 8/31/05, all known memory leaks in myRutgers have been resolved
- Finalizers did play a part
- Many leaks have been plugged
- Performance improvements have been made
- But, there is room for improvement…

uPortal 2.x Memory/Performance: Background
- Scott Battaglia presented an overview on 3/21/05 ( s+Meeting+Minutes#March2005uPortalDevelopersMeetingMinutes-24PerfMem)
- myRutgers has 10-15k unique users per day, with 30-40k total logins; sessions are load-balanced across 4 machines
- Symptoms indicated a memory leak
- Seen in production since 11/04
- Used scripting to monitor JVM heap and mod_jk connections, and to restart before the out-of-memory error occurred
- Used tools (e.g., YourKit) to inspect the heap

uPortal 2.x Memory/Performance: Initial Changes
- Fixes before March, 2005:
  - Removed caching of IPersons from PersonDirectory
  - CError and CSecureInfo now pass events to wrapped channels
  - Restrict access to ChannelFactory’s channel cache, synchronized instantiateChannel method
  - Guest sessions created on time out
  - AbstractMultithreadedChannels were not cleaning out their channel state maps (2 of them)

uPortal 2.x Memory/Performance: More Changes
- Fixes discussed March, 2005:
  - Switch homegrown ThreadPool to Backport Concurrent
  - Remove finalizer in UBC_Webmail
  - Update to AuthorizationImpl

uPortal 2.x Memory/Performance: More leaks?
- Since the March, 2005 changes, behavior is better, but myRutgers JVMs still get bounced after anywhere from 1 to 3 hours
- Finalizer queue still indicates many unfinalized Objects
- Question: Why can’t the finalizer thread keep up?

uPortal 2.x Memory/Performance: Finalization Problems
- If an Object's class overrides finalize(), the Object is placed on the finalizer queue once it is no longer referenced
- A finalizer thread will sweep this queue at its leisure (i.e., no guarantee of when)
- Perhaps the GC thread is too busy because many temporary Objects are being created
- Target: finalizers, leaks and temporary objects
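A finalizer backlog like the one described above can also be watched from inside the JVM. The sketch below is not from the talk; it assumes Java 5 or later (the stack traces in this talk suggest a 1.4 JVM, where this API is not available) and uses the standard MemoryMXBean, whose pending-finalization count is documented as approximate. The class name FinalizerBacklog is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports heap usage next to the approximate number of
// objects still waiting for their finalize() method to be run.
final class FinalizerBacklog {

    static String snapshot() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        // Approximate count of objects on the finalizer queue.
        int pending = memory.getObjectPendingFinalizationCount();
        return "heapUsedKB=" + (heap.getUsed() / 1024)
                + " pendingFinalization=" + pending;
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

A pending count that keeps climbing while heap usage grows would point at the finalizer thread rather than at ordinary leaked references.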

uPortal 2.x Memory/Performance: Finalizers, Leaks and Temporary Objects
- Target finalize() methods:
  - org.jasig.portal.MultipartDataSource
  - Xalan XRTreeFrag (now in Xalan 2.7)
  - Upgrade to JavaMail and modify local copy to remove finalizers
- Target Leaks/Performance:
  - org.jasig.portal.ChannelManager.java channelTable (Hashtable) not cleaned properly
  - org.jasig.portal.car.CarResources.java not closing stream
  - Turn off dynamic class reloading in web.xml:
- Target excessive temporary Objects: ( ):
  - org.jasig.portal.serialize.HTMLdtd.fromChar(int) – based on Xerces, but now deprecated – removed temp object creation
  - org.jasig.portal.utils.SubstitutionIntegerFilter – reduced temp object creation
  - org.jasig.portal.MediaManager – reduced number of MediaManager objects created
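The "not closing stream" item is the kind of leak that an explicit close on every exit path prevents. The following is a minimal sketch, not the actual CarResources fix: the PropertyLoader and TrackingStream names are hypothetical, and it uses try-with-resources (Java 7+), where the JVMs of the time would have needed try/finally.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical loader: the stream is closed whether or not load() throws,
// so no file descriptor is left for a finalizer to clean up later.
final class PropertyLoader {
    static Properties load(InputStream source) {
        Properties props = new Properties();
        try (InputStream in = source) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}

// Hypothetical in-memory stream that records whether close() was called.
final class TrackingStream extends ByteArrayInputStream {
    boolean closed;

    TrackingStream(String content) {
        super(content.getBytes(StandardCharsets.ISO_8859_1));
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}
```

Closing in the loader itself keeps the open-file count independent of when, or whether, the garbage collector gets around to the stream object.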

uPortal 2.x Memory/Performance: Still Memory Problems
- Server behaving better; some servers requiring restart in 3 hours, others in 3 days
- Heap dumps still point to Objects on Finalizer queue – primarily JDK sockets (plain and SSL)
- Though still running out of memory:
  - Unable to replicate in QA – indicates testing does not reflect usage
  - Unpredictable nature of heap growth – servers running similar traffic/load should show similar pattern in heap usage (not one failing in 3 hours and another in 3 days)
  - Tools no longer point to leaked Objects (other than on Finalizer queue)
- So, why can’t the finalizer thread keep up!?!?

uPortal 2.x Memory/Performance: Out Of Memory Graph

uPortal 2.x Memory/Performance: Try, Try Again
- Too many open files (configured for 8k)
  - Thought it was network sockets
  - During production run, got a snapshot of lsof (list of open files), indicating hundreds of:
      java 6843 tomcat 30r VREG 85, /u01/app/jakarta-tomcat _load_test/work/Catalina/localhost/portallt/loader/com/swabunga/spell/engine/configuration.properties
  - uses Jazzy ( was not explicitly closing files ( com.swabunga.spell.engine.PropertyConfiguration.java )
- Wrote/Deployed a web-app that given parameters:
  - spawns a thread to run System.gc() every configured milliseconds
  - spawns a thread to run System.runFinalization() every configured milliseconds – not guaranteed to run finalization, but system will make an effort
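The diagnostic web-app itself is not shown in the slides. A rough sketch of the idea follows, with several assumptions: the PeriodicFinalization class and its method names are hypothetical, the two threads are merged into one scheduled task for brevity, and it relies on java.util.concurrent (Java 5+). Note that System.runFinalization() is deprecated in recent JDKs, so this is a diagnostic aid rather than something to keep in production.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical diagnostic: periodically asks the JVM to collect garbage and
// run pending finalizers, logging before and after each request.
final class PeriodicFinalization {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "periodic-finalization");
                t.setDaemon(true); // never keep the JVM alive for this task
                return t;
            });
    private final AtomicLong completedRuns = new AtomicLong();

    void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(this::runOnce, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    private void runOnce() {
        System.out.println("Requesting finalization");
        System.gc();              // a request only, not a guarantee
        System.runFinalization(); // blocks if the finalizers themselves block
        completedRuns.incrementAndGet();
        System.out.println("Finalization request complete");
    }

    long completedRuns() {
        return completedRuns.get();
    }

    // Demo helper: polls until at least `runs` requests completed or the timeout expires.
    boolean awaitRuns(long runs, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (completedRuns.get() < runs) {
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Logging both sides of the call matters: a "Requesting" line with no matching "complete" line shows that the request never returned.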

uPortal 2.x Memory/Performance: A Watched Pot…Boils
- Patch solved problem of too many open files
- Periodic run of finalization appeared ok:
    :26:03.591: Thread-1: DEBUG: Requesting finalization
    :26:03.593: Thread-1: DEBUG: Finalization request complete
    :26:03.593: Thread-1: DEBUG: Finalization thread sleeping 900 seconds
- Monitored tenured area of heap:
    Heap
     def new generation total K, used 62635K [0x , 0xb , 0xb )
      eden space K, 0% used [0x , 0x , 0x9fac0000)
      from space K, 35% used [0x9fac0000, 0xa37eadb0, 0xaa560000)
      to space K, 0% used [0xaa560000, 0xaa560000, 0xb )
     tenured generation total K, used K [0xb , 0xf , 0xf )
      the space K, 27% used [0xb , 0xc , 0xc6872a00, 0xf )
     compacting perm gen total 32768K, used 24113K [0xf , 0xf , 0xf )
      the space 32768K, 73% used [0xf , 0xf678c538, 0xf678c600, 0xf )

uPortal 2.x Memory/Performance: Bingo!
- Noticed the tenured space growing after successive GCs: heap leak occurring
- Scanned the log for finalization messages:
    :07:04.466: Thread-2: DEBUG: Requesting finalization...
- No debug print of finalization complete!!!!! AHAA!!!
- Finalization request not returning at the same time the tenured area of heap starts growing… Finalization thread is blocked
- Issue a “kill -3” to obtain a Java Thread Dump

uPortal 2.x Memory/Performance: Blocked Finalizer Threads
- Finalizer threads stack traces:
    "Secondary finalizer" prio=5 tid=0x011ef348 nid=0x1477 runnable [d d80019c0]
      at java.lang.Object.wait(Native Method)
      - waiting on (a com.sshtools.j2ssh.transport.TransportProtocolAlgorithmSync)
      at com.sshtools.j2ssh.transport.TransportProtocolAlgorithmSync.lock(Unknown Source)
      - locked (a com.sshtools.j2ssh.transport.TransportProtocolAlgorithmSync)
      at com.sshtools.j2ssh.transport.TransportProtocolOutputStream.sendMessage(Unknown Source)
      - locked (a com.sshtools.j2ssh.transport.TransportProtocolOutputStream)
      at com.sshtools.j2ssh.transport.TransportProtocolCommon.sendMessage(Unknown Source)
      - locked (a com.sshtools.j2ssh.transport.TransportProtocolClient)
      at com.sshtools.j2ssh.transport.TransportProtocolCommon.sendDisconnect(Unknown Source)
      at com.sshtools.j2ssh.transport.TransportProtocolCommon.disconnect(Unknown Source)
      at com.sshtools.j2ssh.SshClient.disconnect(Unknown Source)
      at edu.columbia.filesystem.impl.SFTPFileSystemImpl.disconnect(SFTPFileSystemImpl.java:89)
      at edu.columbia.filesystem.impl.RemoteFileSystemImpl.finalize(RemoteFileSystemImpl.java:715)
      at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
      at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83)
      at java.lang.ref.Finalizer.access$100(Finalizer.java:14)
      at java.lang.ref.Finalizer$2.run(Finalizer.java:131)
      at java.lang.Thread.run(Thread.java:534)

uPortal 2.x Memory/Performance: Investigation
- Thread dump indicated two finalization threads, both blocked at the same point in code
- Points to Briefcase channel, in the low level code to handle secure connections to ftp server (J2SSH )
- A lock release is NOT in a finally clause
- A user who opens their briefcase, then proactively closes it, causes the problem when the user logs/times out:
  - Proactive closing sends a close command over the socket
  - A finalizer in HyperContent (which uses J2SSH) issues close (throws Exception, not releasing lock)
  - A finalizer in briefcase issues close through HyperContent, which waits on the lock
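The failure mode above (an exception thrown while a lock is held, with the release outside any finally clause) can be illustrated in a few lines. This sketch is not the J2SSH code: J2SSH uses its own wait/notify-based lock class, whereas the hypothetical GuardedSender below uses java.util.concurrent's ReentrantLock as a stand-in.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sender: the lock is released in a finally clause, so an
// exception from the write cannot leave the lock held forever.
final class GuardedSender {
    private final ReentrantLock lock = new ReentrantLock();

    void send(Runnable writeToSocket) {
        lock.lock();
        try {
            writeToSocket.run(); // may throw, e.g. if the socket is already closed
        } finally {
            lock.unlock();       // runs on both normal and exceptional exit
        }
    }

    boolean isLocked() {
        return lock.isLocked();
    }
}
```

Without the finally clause, the first failed send would leave the lock held, and every later caller, including a finalizer thread, would wait on it indefinitely.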

uPortal 2.x Memory/Performance: Wash, Rinse, Repeat
- Need to re-create problem in QA
- Why not caught in QA?
  - The briefcase channel utilizes sftp to an individual account, but a test sftp server does not exist, so a “real” user must be used – not part of the Test Scenario
  - A new test script is written to simulate one user logging into the portal and using the Briefcase
  - Problem is replicated consistently within 20 minutes using a 256MB heap

uPortal 2.x Memory/Performance: Final Resolution
- Though the J2SSH code is no longer actively maintained, a contribution is found to address this issue – made available 6/04
- A HyperContent class is extended and a finalizer method is overridden to only close if the connection is open
- The finalizer in the briefcase is removed
- QA test now passes
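The "only close if the connection is open" guard can be sketched as an idempotent close. This is an illustration only, not the HyperContent patch; the RemoteConnection class and its counter are hypothetical, and the counter stands in for actually sending a disconnect message.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical connection: close() sends the disconnect at most once, so a
// second close (from a finalizer or a timeout) never touches a dead socket.
final class RemoteConnection implements AutoCloseable {
    private final AtomicBoolean open = new AtomicBoolean(true);
    private final AtomicInteger disconnectsSent = new AtomicInteger();

    boolean isOpen() {
        return open.get();
    }

    @Override
    public void close() {
        // Only the caller that flips open -> closed performs the disconnect.
        if (open.compareAndSet(true, false)) {
            disconnectsSent.incrementAndGet(); // stand-in for the real disconnect
        }
    }

    int disconnectsSent() {
        return disconnectsSent.get();
    }
}
```

With this guard, a user who closes the briefcase proactively and later times out triggers only one disconnect attempt.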

uPortal 2.x Memory/Performance: Install Fix in Production
- Final set of fixes rolled into production 8/31/05
- Servers have not run out of memory since release has been introduced
- But…

uPortal 2.x Memory/Performance: Increase in Threads
- Monitoring of servers shows a gradual increase in the number of threads (from initial 180 to over 1k)
- kill -3 reveals hundreds of threads in JavaMail:
    "JavaMail-EventQueue" daemon prio=5 tid=0x002da400 nid=0x534 in Object.wait() [aa aa8819c0]
      at java.lang.Object.wait(Native Method)
      - waiting on (a javax.mail.EventQueue)
      at java.lang.Object.wait(Object.java:429)
      at javax.mail.EventQueue.dequeue(Unknown Source)
      - locked (a javax.mail.EventQueue)
      at javax.mail.EventQueue.run(Unknown Source)
      at java.lang.Thread.run(Thread.java:534)
- Modified JavaMail to restore needed finalize() methods
- Patch fixes thread problem, but…
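Thread growth of this kind can also be spotted without reading a full kill -3 dump by grouping live threads by name. The sketch below is an assumption-laden illustration, not a tool from the talk: ThreadCensus is a hypothetical name, and Thread.getAllStackTraces() requires Java 5 or later.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical census: counts live threads per thread name, so a name such
// as "JavaMail-EventQueue" appearing hundreds of times stands out.
final class ThreadCensus {

    static Map<String, Integer> countByName() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getName(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : countByName().entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

Logging this map at intervals gives a trend per thread name, which is easier to compare over time than successive raw dumps.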

uPortal 2.x Memory/Performance: Heavy Traffic
- New semester brings much more traffic (and more problems):
  - Increase in portal logins (40k in one day, 3600/hour) overwhelms authentication infrastructure, causing mod_jk threads to block, resulting in restarts (and more logins, etc…)
  - Channel time out – traced to heavy load and use of SoftReference caches; worst case scenario caching
  - Heavy memory usage – many temporary objects; over 600 concurrent sessions results in over 700MB temporary objects every second
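Why SoftReference caching behaves worst under load can be seen from how such a cache is typically written. The sketch below is a generic illustration, not uPortal's cache: SoftCache and its load counter are hypothetical. The collector may clear soft references when memory is tight, so every lookup has to be ready to reload.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical soft cache: values are held through SoftReferences, which the
// garbage collector may clear under memory pressure.
final class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final AtomicInteger loads = new AtomicInteger();

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Miss: never cached, or cleared by the collector. Reload.
            loads.incrementAndGet();
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }

    int loads() {
        return loads.get();
    }
}
```

Under heavy load the heap fills with temporary objects, soft references get cleared, and the expensive reload path runs exactly when the servers can least afford it.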

uPortal 2.x Memory/Performance: Triage
- Address immediate concerns for heavy load*:
  - Introduce LDAP connection pooling
  - Modify caching of layouts to share across users
  - Tune caching of heavily used channels
* Modifications not yet submitted to uPortal codebase
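Connection pooling, in its simplest form, bounds how many connections exist and makes callers wait briefly rather than open new ones. The sketch below is generic and not the myRutgers LDAP change (which the slides note was not yet submitted): BoundedPool is a hypothetical class, and plain strings stand in for real LDAP connections.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical fixed-size pool: callers borrow an idle connection, waiting up
// to a timeout, and must release it when done.
final class BoundedPool<T> {
    private final BlockingQueue<T> idle;

    private BoundedPool(BlockingQueue<T> idle) {
        this.idle = idle;
    }

    @SafeVarargs
    static <T> BoundedPool<T> of(T... connections) {
        return new BoundedPool<>(
                new ArrayBlockingQueue<>(connections.length, false, Arrays.asList(connections)));
    }

    T borrow(long timeoutMillis) {
        try {
            T connection = idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (connection == null) {
                throw new IllegalStateException("pool exhausted");
            }
            return connection;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
    }

    void release(T connection) {
        if (!idle.offer(connection)) {
            throw new IllegalStateException("released more connections than were borrowed");
        }
    }
}
```

Callers would pair borrow with release in a try/finally block, so a failed lookup cannot drain the pool.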

uPortal 2.x Memory/Performance: Going Forward…
- Focus on heavily used channels ( )
- Revisit statistics to capture better data for monitoring/reporting/forecasting production servers
- Implement better caching algorithms
- Monitor/tune/implement pooling:
  - LDAP
  - DB
  - HTTP client connections
- Migrate to JDK 1.5
- Continue to monitor/tune heap
  - Large new area for temporary objects
- Go on a Temporary Object Diet!

uPortal 2.x Memory/Performance: Lessons Learned
- QA should reflect production usage
- Java “Memory Leaks” has become a catch-phrase for many different types of problems
- Useful tools:
  - Java Heap/CPU analyzer (e.g., YourKit)
  - lsof (ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/)
  - kill -3 thread dumps
- Experience was invaluable:
  - Solved many memory leaks/performance issues in codebase, contributing to a more stable portal
  - Skills can be applied to other Java-based apps