Performance and Exception Monitoring Project
Tim Smith, CERN/IT

2000/11/02, Tim Smith: JLab

Overview
  - Motivation
  - Objectives
  - Analysis and Design
  - Prototyping
  - Perspective and Future

Motivation
  - [Diagram labels: Alarm; Recovery action; Monitoring System; Local; Remote; Process killer; Console; Resource planning; Accounting; Security; Inventory]
  - Independent systems
  - No single overview
  - Duplicated collection
  - Host based: Want Service
  - Perceived problems not real
  - Scalability

Motivation
  - [Diagram labels: Alarm; Recovery action; Monitoring System; Local; Remote; Console; Resource planning; Accounting; Security; Inventory]
  - Configuration
  - Collection
  - Transport
  - Repository mgmt
  - Display

Objectives
  - To provide tools in which the alarms and displays are orientated to the overall service provided:
      - User end-to-end views, quality-of-service views
      - Managerial views of resource usage / evolution / failure rates
      - Service provider views, and detailed machine views
  - Link the alarms to both the monitoring and corrective actions
  - To provide service level metrics
  - To provide a uniform monitoring infrastructure
      - Coordinated central repositories + common logging format
      - Averaging and archiving of logged information
      - Correlations between logged information
      - Multiple input routes; extensible monitoring clients
      - Modular tools; demonstrated scalability

Process
  - Analysis
      - User Requirements Document
      - Current tools survey: Enterprise/Cluster mgmt, public domain, other labs, building blocks, DAQ, Run Control, Slow Control
      - Goal / Question / Metric formalism
      - System Requirements Document
  - Design
      - Interfaces Document
  - Prototyping

Goal / Question / Metric
  - Goal: Ensure quality of Interactive Service
  - Questions:
      - Sufficient nodes?
      - Low enough load?
      - Slow to respond to commands?
  - Metrics:
      - Contactable via network
      - Network daemons alive
      - No nologin
      - Free ptys
      - Connection test from remote node
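
The Goal / Question / Metric breakdown above can be pictured as a small tree. The following is only a sketch under stated assumptions: the class names (`Goal`, `Question`, `Metric`, `GqmSketch`) are hypothetical, the probe results are hard-coded, and attaching all five metrics to the question "Sufficient nodes?" is an illustrative guess, since the slide does not say which metric answers which question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical model of a Goal / Question / Metric tree; all names are illustrative.
final class Metric {
    final String name;
    final BooleanSupplier check; // stand-in for a real probe

    Metric(String name, BooleanSupplier check) {
        this.name = name;
        this.check = check;
    }
}

final class Question {
    final String text;
    final List<Metric> metrics = new ArrayList<>();

    Question(String text) {
        this.text = text;
    }

    Question with(String metricName, BooleanSupplier check) {
        metrics.add(new Metric(metricName, check));
        return this;
    }

    // Names of the metrics whose check currently fails.
    List<String> failing() {
        List<String> out = new ArrayList<>();
        for (Metric m : metrics) {
            if (!m.check.getAsBoolean()) {
                out.add(m.name);
            }
        }
        return out;
    }
}

final class Goal {
    final String text;
    final List<Question> questions = new ArrayList<>();

    Goal(String text) {
        this.text = text;
    }

    Goal ask(Question q) {
        questions.add(q);
        return this;
    }

    // Maps each question to its failing metrics; questions without failures are omitted.
    Map<String, List<String>> report() {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Question q : questions) {
            List<String> bad = q.failing();
            if (!bad.isEmpty()) {
                out.put(q.text, bad);
            }
        }
        return out;
    }
}

final class GqmSketch {
    // Builds the slide's example; the true/false probe results are assumed values.
    static Goal interactiveService() {
        return new Goal("Ensure quality of Interactive Service")
            .ask(new Question("Sufficient nodes?")
                .with("Contactable via network", () -> true)
                .with("Network daemons alive", () -> true)
                .with("No nologin", () -> false)
                .with("Free ptys", () -> true)
                .with("Connection test from remote node", () -> true));
    }

    public static void main(String[] args) {
        System.out.println(interactiveService().report());
    }
}
```

With the assumed probe values, the report lists only the question whose metrics fail, which is the kind of service-oriented view the objectives ask for.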

PEM Architecture
  - [Diagram labels: User Interface; Monitoring Agent; Monitoring Broker; Measurement Repository; Configuration Repository; Correlation Engine; Access Server; n; Outside PEM]

Configuration Repository: Loading the DB
  - [Diagram labels: XML Schema; Parser; XML-DBMS; jdbc; RDBMS; Viewers]
  - Parser: Xerces, from Apache
  - XML-DBMS: freeware (tried XSU from Oracle)
  - Host, Host type
  - Metrics, Services

Configuration Repository: Querying the DB
  - [Diagram labels: Parser; XML-DBMS; jdbc; RDBMS; XML; DB]
  - jdbc; Configuration Items; Java Objects
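
To make the "configuration items as Java objects" step more concrete, here is a minimal sketch that turns an XML configuration fragment into plain Java objects with the standard JAXP DOM API. Everything specific in it is an assumption: the element and attribute names (`config`, `host`, `name`, `type`, `metric`), the sample hosts, and the `HostConfig` class are invented for illustration, and the XML-DBMS and jdbc steps named on the slides are left out because they need a running database.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical configuration item: one host, its type, and the metrics to collect.
final class HostConfig {
    final String name;
    final String type;
    final List<String> metrics;

    HostConfig(String name, String type, List<String> metrics) {
        this.name = name;
        this.type = type;
        this.metrics = metrics;
    }
}

final class ConfigSketch {
    // Invented sample document; the real schema is not given in the slides.
    static final String SAMPLE =
        "<config>"
        + "<host name='node01' type='interactive'>"
        + "<metric>load</metric><metric>free_ptys</metric>"
        + "</host>"
        + "<host name='node02' type='batch'><metric>load</metric></host>"
        + "</config>";

    // Parses the XML text and returns one HostConfig per <host> element.
    static List<HostConfig> parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<HostConfig> hosts = new ArrayList<>();
            NodeList hostNodes = doc.getElementsByTagName("host");
            for (int i = 0; i < hostNodes.getLength(); i++) {
                Element host = (Element) hostNodes.item(i);
                List<String> metrics = new ArrayList<>();
                NodeList metricNodes = host.getElementsByTagName("metric");
                for (int j = 0; j < metricNodes.getLength(); j++) {
                    metrics.add(metricNodes.item(j).getTextContent());
                }
                hosts.add(new HostConfig(
                    host.getAttribute("name"), host.getAttribute("type"), metrics));
            }
            return hosts;
        } catch (Exception e) {
            // Wrap parser, SAX, and I/O errors so callers need no checked exceptions.
            throw new IllegalArgumentException("cannot parse configuration", e);
        }
    }

    public static void main(String[] args) {
        for (HostConfig h : parse(SAMPLE)) {
            System.out.println(h.name + " (" + h.type + "): " + h.metrics);
        }
    }
}
```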

Correlation Engine
  - To correlate metrics from the MRS according to configuration in the CRS
  - Metric collections: trends + multiple machines
  - Samplings: union for read efficiency from MRS
  - Example Java classes:
      - Correlation coordinator
      - Sampling cache
      - Evaluators
      - Timers
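
One way to read "union for read efficiency" is that the coordinator merges the samplings requested by all correlations, reads each one from the measurement repository once into a cache, and then lets every evaluator work from that cache. The sketch below illustrates that reading only; the class names, the `host/metric` key format, the thresholds, and the in-memory map standing in for the MRS are all assumptions, and timers are omitted.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical correlation: the samplings it needs plus an evaluator over the cache.
final class Correlation {
    final String name;
    final List<String> needs; // sampling keys, e.g. "node01/load"
    final Predicate<Map<String, Double>> evaluator;

    Correlation(String name, List<String> needs, Predicate<Map<String, Double>> evaluator) {
        this.name = name;
        this.needs = needs;
        this.evaluator = evaluator;
    }
}

final class CorrelationCoordinator {
    int reads = 0; // number of repository reads performed so far

    // Union of the sampling keys requested by all correlations.
    static Set<String> unionOfSamplings(List<Correlation> correlations) {
        Set<String> keys = new LinkedHashSet<>();
        for (Correlation c : correlations) {
            keys.addAll(c.needs);
        }
        return keys;
    }

    // Reads each needed sampling once into a cache, then runs every evaluator on it.
    Map<String, Boolean> run(List<Correlation> correlations, Function<String, Double> repository) {
        Map<String, Double> cache = new HashMap<>();
        for (String key : unionOfSamplings(correlations)) {
            cache.put(key, repository.apply(key));
            reads++;
        }
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (Correlation c : correlations) {
            results.put(c.name, c.evaluator.test(cache));
        }
        return results;
    }
}

final class CorrelationSketch {
    // In-memory stand-in for the measurement repository (assumed values).
    static final Map<String, Double> FAKE_MRS =
        Map.of("node01/load", 9.0, "node02/load", 3.0);

    static List<Correlation> examples() {
        return List.of(
            new Correlation("cluster overloaded",
                List.of("node01/load", "node02/load"),
                cache -> (cache.get("node01/load") + cache.get("node02/load")) / 2 > 5.0),
            new Correlation("node01 hot",
                List.of("node01/load"),
                cache -> cache.get("node01/load") > 8.0));
    }

    public static void main(String[] args) {
        CorrelationCoordinator coordinator = new CorrelationCoordinator();
        System.out.println(coordinator.run(examples(), FAKE_MRS::get));
        System.out.println("repository reads: " + coordinator.reads);
    }
}
```

The two example correlations request three samplings in total, but the union contains only two distinct keys, so the repository is read twice rather than three times.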

Publish / Subscribe: Java RMI
  - Interfaces Document
  - Events
  - [Diagram labels: User Interface; Monitoring Agent; Monitoring Broker; Measurement Repository; Configuration Repository; Correlation Engine; Access Server; metric stream; metric value; exception; configuration]
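
A publish / subscribe contract in the Java RMI style might look like the sketch below. The interface and method names are hypothetical, not taken from the PEM Interfaces Document; only the "metric value" event is modelled. The interfaces follow the RMI convention of extending `java.rmi.Remote` and declaring `RemoteException`, but the broker here runs in-process: exporting objects (for example with `UnicastRemoteObject.exportObject`) and registry lookup are deliberately left out.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical subscriber interface: receives "metric value" events.
interface MetricListener extends Remote {
    void metricValue(String metric, double value) throws RemoteException;
}

// Hypothetical publisher interface offered by a broker.
interface MetricPublisher extends Remote {
    void subscribe(String metric, MetricListener listener) throws RemoteException;

    void publish(String metric, double value) throws RemoteException;
}

// In-process broker: same contract as a remote one, but nothing is exported over RMI.
final class InProcessBroker implements MetricPublisher {
    private final Map<String, List<MetricListener>> subscribers = new HashMap<>();

    @Override
    public synchronized void subscribe(String metric, MetricListener listener) {
        subscribers.computeIfAbsent(metric, k -> new ArrayList<>()).add(listener);
    }

    @Override
    public synchronized void publish(String metric, double value) throws RemoteException {
        // Deliver the event to every listener subscribed to this metric name.
        for (MetricListener listener : subscribers.getOrDefault(metric, List.of())) {
            listener.metricValue(metric, value);
        }
    }
}

// Listener that simply records what it receives, standing in for a user interface.
final class RecordingListener implements MetricListener {
    final List<String> seen = new ArrayList<>();

    @Override
    public void metricValue(String metric, double value) {
        seen.add(metric + "=" + value);
    }
}

final class PubSubSketch {
    // Subscribes to "load", publishes two metrics, and returns what the listener saw.
    static List<String> demo() {
        InProcessBroker broker = new InProcessBroker();
        RecordingListener ui = new RecordingListener();
        try {
            broker.subscribe("load", ui);
            broker.publish("load", 1.5);
            broker.publish("free_ptys", 12); // no subscriber, so nothing is delivered
        } catch (RemoteException e) {
            throw new IllegalStateException(e);
        }
        return ui.seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Keeping the contract in interfaces means that the in-process broker could later be swapped for a remotely exported one without changing subscribers.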

Monitoring Agent/Broker I
  - SNMP
      - Extended existing infrastructure
      - Multithreaded broker loading DB
  - JMX / JDMK
      - JMX public specification: managed resources
      - Pluggable agents
      - Reported several important bugs
      - Demo at JavaOne conference
      - Linux/NT remote reset
      - Netlogger instrumentation
      - Opened up license negotiations

Monitoring Agent/Broker II
  - C
      - Low overhead
      - SNMP, /proc, netlogger
  - Script
  - [Diagram labels: Spool; Monitoring Process; Spool Manager; Monitoring Broker]
  - Not yet … DMTF DMI, CIM
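
As an illustration of a `/proc`-based sensor, the sketch below parses the load averages from a line in the Linux `/proc/loadavg` format. The slide names C for the low-overhead agent; Java is used here only to stay consistent with the other sketches, and the class name `LoadAvgSketch` is hypothetical. How the value would be spooled or handed to the broker is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class LoadAvgSketch {
    // Parses the first three fields of a /proc/loadavg line:
    // the 1-, 5- and 15-minute load averages.
    static double[] parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 3) {
            throw new IllegalArgumentException("unexpected format: " + line);
        }
        return new double[] {
            Double.parseDouble(fields[0]),
            Double.parseDouble(fields[1]),
            Double.parseDouble(fields[2])
        };
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/proc/loadavg");
        // The file exists only on Linux; skip quietly elsewhere.
        if (Files.isReadable(path)) {
            double[] load = parse(new String(Files.readAllBytes(path)));
            System.out.println("1-minute load: " + load[0]);
        }
    }
}
```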

PEM Futures
  - Today: CERN CC needs it
      - Prototype for ALICE MDC III in January
  - Tomorrow: Tier-0 RC / GRID node need it
      - More complete management solutions
      - Integrate into the Fabric Management WP
  - ‘GRIDification’
      - Rapidly evolving technologies
      - Lots of middleware
      - Lots of companies wanting collaboration
      - Still need framework

PEM in Perspective
  - [Diagram labels: Configuration Management; Alarm; Recovery Actions; Inventory; Resource Planning; Security]
  - [Diagram labels: PC Hardware; Console Mgmt; Power Mgmt/Remote Reset; OS Installation/Update; OS Configuration/Update; Application Inst/Update; Monitoring]