NGOP Status and Plans
Jim Fromm, Marc Mengel, Jack Schmidt
May 2, 2006

Today's talk
- Current Status
  - Farms/CMS/General Server split
- Recent Enhancements
  - Performance Tuning
  - Configuration File cleanup
  - CMS Enhancements
- Future Enhancements

Current Status: Farms/CMS/General Server Split
Goals:
- Relieve bottlenecks by splitting out the servers.
- Reduce configuration upgrade times.
- Provide groups with independence.
- Simplify the General server by consolidating the two machines into one.

Current Status: Farms/CMS/General Server Split
Bottlenecks:
- Farms and CMS server hangs have been non-existent since the split.
- The General server has experienced occasional hangs, but to a lesser degree (still two systems).
- This goal has been successfully met.

Current Status: Farms/CMS/General Server Split
Reduction of configuration upgrade times:
- Prior to the split, a system configuration upgrade took 2+ hours when things went well.
- Farms/CMS:
  - A configuration upgrade now takes less than 20 minutes.
  - Fewer monitored elements per server.
  - Having one status engine allowed for the removal of Warshall's algorithm for finding the transitive closure of a graph.
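
For readers unfamiliar with the algorithm named above, the following is a minimal sketch of Warshall's transitive closure. The slides do not say what graph NGOP applied it to; the element names and the dependency edges here are purely hypothetical. The triple loop makes the cost grow with the cube of the number of nodes, which is consistent with it being worth removing from a configuration upgrade.

```python
def transitive_closure(nodes, edges):
    """Warshall's algorithm: map each node to the set of nodes reachable from it."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    # reach[i][j] is True when there is a path from node i to node j.
    reach = [[False] * n for _ in range(n)]
    for src, dst in edges:
        reach[index[src]][index[dst]] = True
    # Allow paths through intermediate node k, one k at a time.
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return {
        name: {nodes[j] for j in range(n) if reach[index[name]][j]}
        for name in nodes
    }


# Hypothetical monitored elements: a cluster contains a node, which has a disk.
closure = transitive_closure(
    ["cluster", "node", "disk"],
    [("cluster", "node"), ("node", "disk")],
)
```

Here `closure["cluster"]` contains both `"node"` and `"disk"`, even though only the first is a direct edge.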

Current Status: Farms/CMS/General Server Split
General Server:
- Configuration upgrade time reduced to less than 30 minutes.
- Recent parser optimizations will likely cut configuration upgrade times to ¼.
- This goal has been successfully met.

Current Status: Farms/CMS/General Server Split
Server Independence:
- Both CMS and Farms are up to speed with doing their own configurations.
- Upgrades are performed only when they need them.
- CMS (Gary Stiehr) has taken the initiative to add several features.
- Both groups have taken advantage of the splitting of the cluster.
- This goal has been successfully met.

Current Status: Farms/CMS/General Server Split
General Server Consolidation:
- Not complete: still using two servers.
- Lacks the urgency of the other items and has been easy to put on the back burner.
- Need to make this a priority.

Recent Enhancements: Performance Tuning
Preprocessor speedup:
- Marc Mengel implemented a change that improved the performance of the XML preprocessor.
- The NGOP preprocessor expands If_xxx/For_xxx tags.
- It was using 90% CPU on startup. This was a known Python performance issue.
- Stunning improvements in configuration upgrade times!
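
To illustrate what expanding a For-style tag involves, here is a rough sketch using Python's standard `xml.etree.ElementTree`. The slides do not give the real NGOP tag syntax, so the `For_each` tag, its `var` and `in` attributes, the `%name%` placeholder form, and the host names are all assumptions made for this example; it is not the NGOP preprocessor.

```python
import xml.etree.ElementTree as ET


def fill(text, bindings):
    """Substitute %var% placeholders in a string (None passes through)."""
    if text is None:
        return None
    for var, value in bindings.items():
        text = text.replace("%" + var + "%", value)
    return text


def expand(elem, bindings):
    """Return the list of elements that elem expands to."""
    if elem.tag == "For_each":  # hypothetical loop tag
        out = []
        for value in elem.get("in").split(","):
            inner = dict(bindings, **{elem.get("var"): value})
            for body in elem:
                out.extend(expand(body, inner))
        return out
    # Ordinary element: copy it, filling placeholders, and expand its children.
    clone = ET.Element(
        elem.tag, {k: fill(v, bindings) for k, v in elem.attrib.items()}
    )
    clone.text = fill(elem.text, bindings)
    clone.tail = elem.tail
    for child in elem:
        clone.extend(expand(child, bindings))
    return [clone]


source = (
    '<Config><For_each var="host" in="fnpc1,fnpc2">'
    '<Host name="%host%"/></For_each></Config>'
)
root = expand(ET.fromstring(source), {})[0]
names = [h.get("name") for h in root.iter("Host")]
```

After expansion, `names` holds one entry per value in the loop list and no `For_each` element remains in the tree.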

Recent Enhancements: Configuration File Cleanup
- New "grand unified" XML Document Type Definition: p/ngop_unified.dtd
- XML editor friendly.
- Works well with the Merlin XML editor.

Merlin Screenshot

Recent Enhancements: CMS
- No downtimes: modified to allow multiple status engine roles to be defined for one set of definitions. This allows reconfiguration on one while the other remains active, eliminating downtimes due to configuration upgrades.
- Used the SE API to create a GUI that only shows "bad" things.
- Developed a generic plug-in agent that allows for a standard way of defining agents in the CMS system.
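
The no-downtime approach described above can be sketched as a switchover between two status engines. This is only an illustration of the idea; the `StatusEngine` class, its fields, and `rolling_upgrade` are hypothetical names and do not come from NGOP or its SE API.

```python
from dataclasses import dataclass


@dataclass
class StatusEngine:
    role: str
    config_version: int
    active: bool = False

    def reconfigure(self, version):
        # Only an idle engine may be reconfigured in this sketch.
        if self.active:
            raise RuntimeError(f"{self.role} is active; switch over first")
        self.config_version = version


def rolling_upgrade(active, standby, version):
    """Upgrade both engines so that one of them is active at every step."""
    standby.reconfigure(version)  # the active engine keeps serving
    # Switch over: the standby is marked active before the old one is released.
    standby.active, active.active = True, False
    active.reconfigure(version)  # the old active engine is now idle
    return standby, active  # new (active, standby) pair


primary = StatusEngine("primary", config_version=1, active=True)
secondary = StatusEngine("secondary", config_version=1)
now_active, now_standby = rolling_upgrade(primary, secondary, version=2)
```

Both engines end up on the new configuration, and the roles have swapped without a period in which neither was active.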

Future Enhancements: Dynamic Configuration Upgrades
- By far the most difficult enhancement to implement.
- CMS needs have been addressed with the multiple status engine solution.
- With the reduction of configuration upgrade times coupled with the CMS workaround, this requirement becomes a very low priority.

Future Enhancements (Cont.)
CMS-specific requested enhancements:
- Marking Monitored Elements down across clusters.
- Accelerate alarms based on time (e.g. yellow becomes red after 8 hours).
- Verify scalability to CMS planned growth.
- Documentation upgrade.
General:
- Improvement of the logging subsystem.
- Research UDP protocol issues:
  - The dropped packet issue seems under control with recent network tunings.
  - May need to do this anyway to address CMS requirements for scalability.
- Web/Swatch agents need DELAY/GAP parameters.
- "Anti" rules for the Swatch agent.
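
The time-based alarm acceleration requested by CMS could look roughly like the sketch below. The 8-hour threshold and the yellow-to-red step come from the slide; the function name, the severity strings, and the way the raise time is passed in are assumptions for illustration only.

```python
from datetime import datetime, timedelta

ESCALATE_AFTER = timedelta(hours=8)  # threshold taken from the slide's example


def effective_severity(severity, raised_at, now):
    """Return the severity to display, promoting stale yellow alarms to red."""
    if severity == "yellow" and now - raised_at >= ESCALATE_AFTER:
        return "red"
    return severity


raised = datetime(2006, 5, 2, 8, 0)
early = effective_severity("yellow", raised, raised + timedelta(hours=7))
late = effective_severity("yellow", raised, raised + timedelta(hours=8))
```

In this example `early` is still `"yellow"` and `late` is `"red"`; other severities pass through unchanged.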

Future Enhancements (Cont.)
Wish List:
- Real dynamic configuration.
- SNMP agent watcher.

Summary
- Split of Farms and CMS has been successful:
  - Quicker reconfigs result in less downtime.
  - Splitting the load has reduced NGOP hangs.
  - CMS and Farms groups are managing things on their own timetable.
- Need to consolidate the General server to one machine.
- New release is needed:
  - New CMS requests.
  - Investigate potential scalability issues.
  - Improved logging.
  - New and improved agents.
- Revamp documentation and website.
- Develop maintainable metrics.

Information
- Main Site:
- Documentation:
  - Users Guide: isd.fnal.gov/ngop/current/ngop_ug.htm
  - Admin Guide: sd.fnal.gov/ngop/current/ngop_admin_guide.htm