1
NGOP Status and Plans
Jim Fromm, Marc Mengel, Jack Schmidt
May 2, 2006
2
Today’s talk…
- Current Status
  - Farms/CMS/General Server split
- Recent Enhancements
  - Performance Tuning
  - Configuration File cleanup
  - CMS Enhancements
- Future Enhancements
3
Current Status: Farms/CMS/General Server Split
Goals:
- Relieve bottlenecks by splitting out the servers.
- Reduce configuration upgrade times.
- Provide groups with independence.
- Simplify the General server by consolidating the two machines into one.
4
Current Status: Farms/CMS/General Server Split
Bottlenecks
- Farms and CMS server hangs have been non-existent since the split.
- The General server has experienced occasional hangs, but to a lesser degree (it still runs on two systems).
- This goal has been successfully met.
5
Current Status: Farms/CMS/General Server Split
Reduction of configuration upgrade times
- Prior to the split, a system configuration upgrade took 2+ hours even when things went well.
- Farms/CMS
  - A configuration upgrade now takes less than 20 minutes.
  - Fewer monitored elements per server.
  - One status engine allowed for the removal of Warshall’s algorithm for finding the transitive closure of a graph.
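The slide names Warshall's algorithm without describing it. As a reminder of what was removed, here is a minimal Python sketch of the textbook algorithm. It is not NGOP's code, and reading the graph as a dependency graph between monitored elements is an assumption; the function name and the matrix representation are chosen for illustration only.

```python
def transitive_closure(adjacency):
    """Warshall's algorithm on a boolean adjacency matrix.

    adjacency[i][j] is True when there is a direct edge i -> j.
    Returns a new matrix where entry [i][j] is True when j is
    reachable from i through one or more edges.
    """
    n = len(adjacency)
    reach = [row[:] for row in adjacency]  # copy so the input is untouched
    for k in range(n):          # allow node k as an intermediate stop
        for i in range(n):
            if reach[i][k]:     # i can already get to k
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


# Hypothetical three-node chain: 0 -> 1 -> 2
graph = [
    [False, True, False],
    [False, False, True],
    [False, False, False],
]
closure = transitive_closure(graph)
```

The three nested loops make the cost grow with the cube of the number of nodes, which is one plausible reason why dropping this step, together with fewer monitored elements per server, would shorten configuration upgrades.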
6
Current Status: Farms/CMS/General Server Split
General Server
- Configuration upgrade time reduced to less than 30 minutes.
- Recent parser optimizations will likely cut configuration upgrade times to ¼ of their current value.
- This goal has been successfully met.
7
Current Status: Farms/CMS/General Server Split
Server Independence
- Both CMS and Farms are up to speed with doing their own configurations.
- Upgrades are performed only when they need them.
- CMS (Gary Stiehr) has taken the initiative to add several features.
- Both groups have taken advantage of the splitting of the cluster.
- This goal has been successfully met.
8
Current Status: Farms/CMS/General Server Split
General Server Consolidation
- Not complete: still using two servers.
- It doesn’t have the same urgency as the other items and has been easy to put on the back burner.
- Need to make this a priority.
9
Recent Enhancements
Performance Tuning
- Preprocessor speedup: Marc Mengel implemented a change that improved the performance of the XML preprocessor.
- The NGOP preprocessor expands If_xxx/For_xxx tags.
- It was using 90% CPU on startup.
- This was a known Python performance issue.
- Stunning improvements in configuration upgrade times!
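To make "expands For_xxx tags" concrete, here is a rough Python sketch of loop-tag expansion over an XML tree. The slides do not show the real NGOP tag syntax, so everything specific here is hypothetical: the `For_Host` element, its `var` and `in` attributes, the `$h` placeholder, and the function names. It illustrates the general idea only and says nothing about how the actual preprocessor or its speedup was implemented.

```python
import copy
import xml.etree.ElementTree as ET


def substitute(element, placeholder, value):
    """Replace placeholder text in attributes and text of element and descendants."""
    for node in element.iter():
        for key, text in list(node.attrib.items()):
            node.attrib[key] = text.replace(placeholder, value)
        if node.text:
            node.text = node.text.replace(placeholder, value)


def expand(element):
    """Return the list of elements that `element` expands to.

    A (hypothetical) <For_xxx var="v" in="a b c"> element is replaced by one
    copy of its body per listed value; any other element expands to itself
    after its children have been expanded in place.
    """
    if element.tag.startswith("For_"):
        var = element.attrib["var"]  # hypothetical attribute name
        result = []
        for value in element.get("in", "").split():
            for body in element:
                clone = copy.deepcopy(body)
                substitute(clone, "$" + var, value)
                result.extend(expand(clone))
        return result
    expanded_children = []
    for child in list(element):
        expanded_children.extend(expand(child))
    element[:] = expanded_children  # swap in the expanded child list
    return [element]


# Hypothetical input, not taken from a real NGOP configuration
root = ET.fromstring(
    '<Config><For_Host var="h" in="node1 node2">'
    '<Element name="$h"/></For_Host><Other/></Config>'
)
expand(root)
names = [e.get("name") for e in root.findall("Element")]
```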
10
Recent Enhancements
Configuration File Cleanup
- New "grand unified" XML Document Type Definition (DTD): http://www.fnal.gov/docs/products/ngop/ngop_unified.dtd
- XML editor friendly: works well with the Merlin XML editor.
11
Merlin Screenshot
12
Recent Enhancements
CMS
- No downtimes: modified to allow multiple status engine roles to be defined for one set of definitions. This allows reconfiguration of one engine while the other remains active, eliminating downtimes due to configuration upgrades.
- Used the SE API to create a GUI that only shows “bad” things.
- Developed a generic plug-in agent that allows for a standard way of defining agents in the CMS system.
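The "no downtimes" idea can be pictured as two status engines that trade roles: the idle one is reconfigured, then it takes over. The Python sketch below is a toy model under that reading. The class names, the `config_version` field, and the `upgrade` method are all hypothetical and are not part of the NGOP SE API.

```python
class StatusEngine:
    """Toy stand-in for a status engine; not the real NGOP class."""

    def __init__(self, name, config_version):
        self.name = name
        self.config_version = config_version
        self.is_active = False

    def reconfigure(self, new_version):
        # Only an idle engine may be reconfigured, so monitoring never stops.
        if self.is_active:
            raise RuntimeError("refusing to reconfigure the active engine")
        self.config_version = new_version


class EnginePair:
    """Two engines sharing one set of definitions, one active at a time."""

    def __init__(self, primary, standby):
        self.current = primary
        self.standby = standby
        self.current.is_active = True

    def upgrade(self, new_version):
        # 1. Reconfigure the standby while the current engine keeps running.
        self.standby.reconfigure(new_version)
        # 2. Swap roles: the freshly configured engine becomes active.
        self.current.is_active = False
        self.standby.is_active = True
        self.current, self.standby = self.standby, self.current
        return self.current


pair = EnginePair(StatusEngine("se-a", 1), StatusEngine("se-b", 1))
now_active = pair.upgrade(2)
```

After the call, the engine that was idle serves the new configuration and the old one is free to be upgraded next.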
13
Future Enhancements
Dynamic Configuration Upgrades
- By far the most difficult enhancement to implement.
- CMS needs have been addressed with the multiple status engine solution.
- With the reduction of configuration upgrade times coupled with the CMS workaround, this requirement becomes a very low priority.
14
Future Enhancements (Cont.)
CMS-specific requested enhancements:
- Marking Monitored Elements down across clusters.
- Accelerate alarms based on time (e.g. yellow becomes red after 8 hours).
- Verify scalability to CMS planned growth.
- Documentation upgrade.
General
- Improvement of logging subsystem.
- Research UDP protocol issues.
  - Dropped packet issue seems under control with recent network tunings.
  - May need to do this anyway to address CMS requirements for scalability.
- Web/Swatch agents need DELAY/GAP parameters.
- “Anti” rules for Swatch agent.
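The time-based alarm acceleration requested above ("yellow becomes red after 8 hours") amounts to a simple rule on an alarm's age. The Python sketch below shows that rule only. It describes a requested feature, not existing NGOP behaviour, and the function name, the severity strings, and the `ESCALATION_DELAY` constant are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Delay taken from the example on the slide; the constant name is hypothetical.
ESCALATION_DELAY = timedelta(hours=8)


def effective_severity(severity, raised_at, now, delay=ESCALATION_DELAY):
    """Return the severity to display for an alarm raised at `raised_at`.

    A yellow alarm that has stayed open for at least `delay` is shown as red;
    every other alarm keeps its original severity.
    """
    if severity == "yellow" and now - raised_at >= delay:
        return "red"
    return severity


raised = datetime(2006, 5, 2, 9, 0)
fresh = effective_severity("yellow", raised, raised + timedelta(hours=1))
stale = effective_severity("yellow", raised, raised + timedelta(hours=8))
```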
15
Future Enhancements (Cont.)
Wish List
- Real dynamic configuration
- SNMP agent
- Email watcher
16
Summary
- Split of Farms and CMS has been successful:
  - Quicker reconfigs result in less downtime.
  - Splitting load has reduced NGOP hangs.
  - CMS and Farms groups are managing things on their own timetable.
- Need to consolidate the General server to one machine.
- New release is needed:
  - New CMS requests.
  - Investigate potential scalability issues.
  - Improved logging.
  - New and improved agents.
- Revamp documentation and website.
- Develop maintainable metrics.
17
Information
- Main Site: http://www-isd.fnal.gov/ngop/ngop.html
- Documentation:
  - Users Guide: http://www-isd.fnal.gov/ngop/current/ngop_ug.htm
  - Admin Guide: http://www-isd.fnal.gov/ngop/current/ngop_admin_guide.htm