Nagios Integration
January , perfSONAR-PS Developers Meeting
Jason Zurawski, Internet2
Brian Tierney, ESnet
Outline
Idea
Configuration
– Static configuration (e.g. Easy stuff)
– Dynamic configuration (e.g. Hard stuff)
GUIs
– Monitoring other instances
– Visualizing the data on the toolkit
2 – 8/25/2014, © 2009 Internet2
Idea
Nagios will be used to monitor the health of the toolkit
– Ensuring processes are running (or not running)
– Ensuring data is meeting thresholds
– Alerting in the event of a problem
– Visualizing stability over time
Do not want to recommend development on 3.1 versions
– See suggested dev freeze in ‘LiveCD’ topic
Would be included on
Configuration
There will be a need for custom configuration
Each toolkit will choose different setup options, forcing different sets of things to care about
– Where to send alerts
– Which processes to monitor
– Which data sets to monitor
– Different definitions for key thresholds (time based, data based)
Will break this into two types:
– ‘static’ is the same for all instances
– ‘dynamic’ depends on the environment
Configuration - Static
Process Monitoring
– Httpd
– Ntpd (running + synced)
– Service watcher process (re-starting pS daemons)
Host Monitoring
– Disk (too full)
– Load (too high)
– Process count (too high)
General configuration
– Need to know an SMTP host to relay through
– Send the alerts to someone (admin GUI will record this)
– Send certain notifications to pS?
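The host checks above (disk too full, load too high) can be sketched as Nagios-style checks. This is only an illustration, not the toolkit's implementation: the function names and the default warning/critical thresholds are assumptions. The numeric states follow the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import os
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a measured value to a Nagios state; higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (state, message) for the percentage of space used at path.

    The 80/90 percent defaults are illustrative assumptions.
    """
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - no size reported for {path}"
    percent = 100.0 * usage.used / usage.total
    state = classify(percent, warn, crit)
    return state, f"DISK {STATE_NAMES[state]} - {percent:.1f}% used on {path}"


def check_load(warn=4.0, crit=8.0):
    """Return (state, message) for the 1-minute load average (Unix only).

    The 4.0/8.0 defaults are illustrative assumptions.
    """
    load1 = os.getloadavg()[0]
    state = classify(load1, warn, crit)
    return state, f"LOAD {STATE_NAMES[state]} - load1={load1:.2f}"
```

A wrapper script would print the message on one line and exit with the state as its exit code, which is how Nagios reads a plugin's result.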
Configuration - Dynamic
Process Monitoring
– ssh (if enabled)
– Measurement daemons (owamp/bwctl/ndt/npad/pSB master and collector) – as enabled
– pS Daemons (LS, SNMP, pSB, PingER) – as enabled
    – Process is running and responds to WS requests
– Mysqld (if running)
Data sets
– Can check in one of two ways:
    – Through the WS
    – Direct DB Query
– Could also do both
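One way to picture the "as enabled" idea is a small generator that always emits the static checks and adds a check only for each service the toolkit has switched on. The sketch below is hypothetical throughout: the service labels are taken from the slides, while the `enabled` dictionary, the `check_toolkit_process` command name, and the `generic-service` template are assumptions that a real deployment would have to define. The `define service` block and the `!` argument separator are ordinary Nagios object syntax.

```python
# Checks that are the same on every toolkit instance (the 'static' set).
STATIC_CHECKS = ["httpd", "ntpd", "service-watcher"]

# Checks that only apply when the corresponding service is enabled
# (the 'dynamic' set). Labels follow the slides; they are not real unit names.
OPTIONAL_CHECKS = ["ssh", "owamp", "bwctl", "ndt", "npad", "psb-master",
                   "psb-collector", "ls", "snmp", "pinger", "mysqld"]


def build_checks(enabled):
    """Return static checks plus every optional check marked as enabled."""
    return STATIC_CHECKS + [name for name in OPTIONAL_CHECKS if enabled.get(name)]


def render_service(host, name):
    """Render one Nagios service definition for a process check.

    'check_toolkit_process' is a hypothetical command that would need its
    own 'define command' entry; 'generic-service' is an assumed template.
    """
    return (
        "define service {\n"
        "    use                 generic-service\n"
        f"    host_name           {host}\n"
        f"    service_description {name}\n"
        f"    check_command       check_toolkit_process!{name}\n"
        "}\n"
    )


def render_config(host, enabled):
    """Render the full set of service definitions for one toolkit host."""
    return "\n".join(render_service(host, name) for name in build_checks(enabled))
```

Calling `render_config("localhost", {"ssh": True, "owamp": True})` would yield five service blocks: the three static ones plus ssh and owamp.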
Configuration - Dynamic
Data Sets (cont.)
– Data above or below a threshold
    – Errors on an interface
    – Utilization too high
    – BWCTL expectation too low
    – OWAMP loss/jitter too high
– Data older than a time period
    – Will need to see regular testing config for this (e.g., older than the data expectation interval)
– Data flapping between states
    – OWAMP/PingER latency
    – Interface status
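The three kinds of data check listed above (threshold, age, flapping) can be sketched as small pure functions. Everything here is an assumption made for illustration: the function names, the `slack` multiplier on the expected test interval, and the 50 percent flapping cutoff. The flapping measure is a simplified, unweighted count of state changes; Nagios has its own built-in flap detection, which this does not reproduce.

```python
import time

OK, WARNING, CRITICAL = 0, 1, 2


def check_threshold(value, warn, crit, higher_is_worse=True):
    """Classify a measurement against warning and critical thresholds.

    Use higher_is_worse=False for metrics such as achieved throughput,
    where a value below the threshold is the problem.
    """
    if not higher_is_worse:
        value, warn, crit = -value, -warn, -crit
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_age(last_timestamp, expected_interval, now=None, slack=2.0):
    """Flag data older than slack * expected_interval seconds as CRITICAL."""
    now = time.time() if now is None else now
    age = now - last_timestamp
    return CRITICAL if age > slack * expected_interval else OK


def percent_state_change(states):
    """Percentage of consecutive check results that differ from the previous one."""
    if len(states) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return 100.0 * changes / (len(states) - 1)


def is_flapping(states, high_threshold=50.0):
    """Treat a history as flapping when its state-change percentage is high."""
    return percent_state_change(states) >= high_threshold
```

For a loss or jitter metric the default direction applies, while a BWCTL expectation would use `higher_is_worse=False`. The expected interval for `check_age` would come from the regular testing configuration.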
Configuration - Dynamic
Host Monitoring
– Monitor the health of related machines?
– Custom alerts
    – RAM
    – Disk
    – Processing
General configuration
– How often to alert?
GUIs
Use existing Nagios GUIs to show a mesh of deployments (or the entire pSPT world)
Other GUIs out there
Nagios Integration
January , perfSONAR-PS Developers Meeting
Jason Zurawski, Internet2
Brian Tierney, ESnet
For more information, visit