High Availability for Nagios
Mike Weber
Alternatives

Daily Image Creation for Restore (VMware, etc.)
- lose parts of history
- create gaps in monitoring with image creation

rsync to Synchronize Servers
- requires IP address, hostname changes
- requires modification of nagios.cfg
- assumes Master will never be misconfigured
- rsync can use a lot of resources

Clustered Nagios Server
Alternatives: Redundant Monitoring

Alternatives: Failover
2011 Nagios World Conference

Perfect Solution: Does Not Exist
High Availability: Outline of Goals

Create Master/Slave Relationship
Master Sends History to the Slave
Slave Does Not Run Host or Service Checks or Send Notifications
Slave Monitors Master via Script
Slave Enables Host Checks, Service Checks and Notifications when Master is Down
Slave Disables All Checks when Master is Up
Simplicity
Failover and Performance Enhancement
Test Server: Puppet Master
Step #1: Clone Master to Slave

Back Up Master Databases
- MySQL databases
- Postgres database

Back Up Files
- /usr/local/nagios
- /usr/local/nagiosxi

Install all dependencies for plugins
Enable Access from Slave on all devices
Step #2: Disable Slave

Edit nagios.cfg
    execute_host_checks=0
    execute_service_checks=0
    enable_notifications=0
Save and Restart Nagios
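The edit in Step #2 can also be scripted. The following is a minimal sketch, assuming GNU sed (the same `sed -i` used later in check_master.sh); the helper names `set_directive` and `disable_checks` are hypothetical, and the demo runs against a scratch file rather than the live /usr/local/nagios/etc/nagios.cfg.

```shell
#!/bin/sh
# Sketch only: set_directive and disable_checks are hypothetical helper names.

# set_directive FILE NAME VALUE
# Rewrites a line starting with "NAME=" so that it reads "NAME=VALUE".
set_directive() {
    sed -i "s/^$2=.*/$2=$3/" "$1"
}

# disable_checks FILE
# Applies the three Step #2 settings to a nagios.cfg-style file.
disable_checks() {
    set_directive "$1" execute_host_checks 0
    set_directive "$1" execute_service_checks 0
    set_directive "$1" enable_notifications 0
}

# Demo on a scratch file so nothing on a real server is touched.
demo_cfg=$(mktemp)
printf '%s\n' 'execute_host_checks=1' \
              'execute_service_checks=1' \
              'enable_notifications=1' > "$demo_cfg"
disable_checks "$demo_cfg"
cat "$demo_cfg"
rm -f "$demo_cfg"
```

On a real Slave the function would be pointed at nagios.cfg, followed by the restart the slide calls for.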
Step #3: Enable NSCA

Master Sends History via NSCA
- edit nagios.cfg (save and restart Nagios)
    obsess_over_hosts=1
    obsess_over_services=1

Slave Maintains History via NSCA
- install NSCA daemon on slave
- allow connections from Master
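With obsessing enabled, Nagios runs the commands named by the `ocsp_command` and `ochp_command` directives after each check, and those commands are what hand the result to send_nsca. The slides do not show that command, so the following is only a sketch of what such a wrapper might look like. The function names, the `SLAVE_HOST` placeholder and the variable names are assumptions; the send_nsca path and config path are the ones given elsewhere in these slides. Nagios XI may already ship its own equivalent.

```shell
#!/bin/sh
# Sketch only: all function names and SLAVE_HOST are hypothetical.

SLAVE_HOST=${SLAVE_HOST:-slave.example.com}                 # placeholder address
SEND_NSCA=${SEND_NSCA:-/usr/local/nagios/libexec/send_nsca}
SEND_NSCA_CFG=${SEND_NSCA_CFG:-/usr/local/nagios/etc/send_nsca.cfg}

# state_to_code STATE
# Maps a Nagios service state name to the matching plugin return code.
state_to_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;   # UNKNOWN and anything unexpected
    esac
}

# format_service_result HOST SERVICE CODE OUTPUT
# Prints one tab-separated passive service check line, the format
# send_nsca reads on standard input.
format_service_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# submit_service_result HOST SERVICE STATE OUTPUT
# Formats the result and pipes it to send_nsca for delivery to the Slave.
submit_service_result() {
    format_service_result "$1" "$2" "$(state_to_code "$3")" "$4" |
        "$SEND_NSCA" -H "$SLAVE_HOST" -c "$SEND_NSCA_CFG"
}

# Dry run: only print the line that would be sent.
format_service_result web01 HTTP "$(state_to_code WARNING)" "HTTP WARNING: slow response"
```

In a command definition the four arguments would typically come from the `$HOSTNAME$`, `$SERVICEDESC$`, `$SERVICESTATE$` and `$SERVICEOUTPUT$` macros. Host results use the same idea with a three-field line (host, return code, output).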
Master: Allow Outbound Transfers
Master: Outbound Config

File Found in /usr/local/nagios/etc: send_nsca.cfg

    # CONFIGURED BY NAGIOS XI
    password=LMb674FcsswP
    encryption_method=3
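The NSCA daemon only accepts results when its own nsca.cfg carries the same password as send_nsca.cfg and a `decryption_method` equal to the sender's `encryption_method`. A quick way to compare the two files is sketched below; `cfg_value` and `nsca_configs_match` are hypothetical helper names, and the demo uses scratch files with a made-up password rather than real configs.

```shell
#!/bin/sh
# Sketch only: cfg_value and nsca_configs_match are hypothetical helpers.

# cfg_value FILE KEY
# Prints the value of the last uncommented "KEY=value" line in FILE.
cfg_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# nsca_configs_match SEND_NSCA_CFG NSCA_CFG
# Succeeds when password and encryption/decryption method agree.
nsca_configs_match() {
    [ "$(cfg_value "$1" password)" = "$(cfg_value "$2" password)" ] &&
    [ "$(cfg_value "$1" encryption_method)" = "$(cfg_value "$2" decryption_method)" ]
}

# Demo with scratch copies; a real check would first copy one file
# next to the other, since they live on different servers.
sender=$(mktemp)
receiver=$(mktemp)
printf '%s\n' 'password=example-secret' 'encryption_method=3' > "$sender"
printf '%s\n' 'password=example-secret' 'decryption_method=3' > "$receiver"
if nsca_configs_match "$sender" "$receiver"; then
    echo "match"
else
    echo "mismatch"
fi
rm -f "$sender" "$receiver"
```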
Slave: NSCA Config

    # default: on
    # description: NSCA (Nagios Service Check Acceptor)
    service nsca
    {
        flags           = REUSE
        socket_type     = stream
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/local/nagios/bin/nsca
        server_args     = -c /usr/local/nagios/etc/nsca.cfg --inetd
        log_on_failure  += USERID
        disable         = no
        only_from       =
    }
Slave: Allow Inbound Transfers
Step #4: Slave Monitors Master via SSH

Create SSH Keys on Slave
- push public key to master

Create authorized_keys file on Master

Implement SSH script to check Master
- passwordless login
- set on a cron job (check every minute)
- script detects status of Master
- script turns on/off checks and notifications
Create Key Pair

    su - nagios
    mkdir .ssh
    cd .ssh
    ssh-keygen -b 1024 -f id_dsa -t dsa -N ''
    Generating public/private dsa key pair.
    Your identification has been saved in id_dsa.
    Your public key has been saved in id_dsa.pub.
    The key fingerprint is:
    61:23:17:2d:83:d8:d9:f9:87:2d:e1:6d:e6:3d:cb:5c
    The key's randomart image is:
    +--[ DSA 1024]----+
    [randomart image]
Push Public Key to nagios user on Master

    scp id_dsa.pub

This means that the nagios user must have a /home/nagios/.ssh directory on the Master. The public key is renamed to "slave" to avoid overwriting any existing keys.

On the master (as the nagios user):

    cat slave >> authorized_keys
    chmod 644 authorized_keys
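The append step on the Master can be made repeatable, so that running it twice does not add the key twice. The sketch below assumes a single-line public key file; `install_pubkey` is a hypothetical helper name, and the demo uses a scratch directory with a made-up key line instead of /home/nagios/.ssh. It sets mode 600 on authorized_keys, a tighter choice than the 644 used on the slide; sshd accepts either as long as the file is not writable by other users.

```shell
#!/bin/sh
# Sketch only: install_pubkey is a hypothetical helper.

# install_pubkey PUBKEY_FILE SSH_DIR
# Appends the key in PUBKEY_FILE to SSH_DIR/authorized_keys unless the
# exact same line is already present, then tightens permissions.
install_pubkey() {
    mkdir -p "$2" || return 1
    chmod 700 "$2"
    touch "$2/authorized_keys"
    # -x matches whole lines, -F treats the key as a fixed string.
    grep -qxF -- "$(cat "$1")" "$2/authorized_keys" ||
        cat "$1" >> "$2/authorized_keys"
    chmod 600 "$2/authorized_keys"
}

# Demo in a scratch directory with a made-up key line.
scratch=$(mktemp -d)
printf '%s\n' 'ssh-dss AAAAexamplekey nagios@slave' > "$scratch/slave"
install_pubkey "$scratch/slave" "$scratch/.ssh"
install_pubkey "$scratch/slave" "$scratch/.ssh"   # second run adds nothing
cat "$scratch/.ssh/authorized_keys"
rm -rf "$scratch"
```

Note that OpenSSH releases from 7.0 onward disable DSA (ssh-dss) keys by default, so on newer systems a different key type would be needed for the passwordless login to work.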
Slave: Cron Job

    # /etc/cron.d/nagiosxi: crontab fragment for nagiosxi
    * * * * * nagios /bin/sh /usr/local/nagios/libexec/eventhandlers/check_master.sh
Slave: check_master.sh

    #!/bin/bash
    # Address of the Master, used for the SSH status check below.
    masterip=
    cfg=/usr/local/nagios/etc/nagios.cfg

    # Turn off host checks, service checks and notifications, then reload Nagios.
    disable () {
        sed -i 's/execute_host_checks=1/execute_host_checks=0/' "$cfg"
        sed -i 's/execute_service_checks=1/execute_service_checks=0/' "$cfg"
        sed -i 's/enable_notifications=1/enable_notifications=0/' "$cfg"
        /sbin/service nagios reload
    }

    # Turn on host checks, service checks and notifications, then reload Nagios.
    enable () {
        sed -i 's/execute_host_checks=0/execute_host_checks=1/' "$cfg"
        sed -i 's/execute_service_checks=0/execute_service_checks=1/' "$cfg"
        sed -i 's/enable_notifications=0/enable_notifications=1/' "$cfg"
        /sbin/service nagios reload
    }

    # Count the lines of the Master's status output that mention "running".
    # If SSH cannot reach the Master at all, the count is 0.
    nagpid=$(ssh "$masterip" /etc/init.d/nagios status | grep -c running)

    if [ "$nagpid" -eq 0 ]; then
        # Master is down or unreachable: the Slave takes over.
        echo "Starting Checks"
        enable
    else
        # Master is running: the Slave goes passive again.
        echo "Stopping Checks"
        disable
    fi
    exit 0
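Because the cron job fires every minute, check_master.sh rewrites nagios.cfg and reloads Nagios on every run, even when nothing has changed. One possible refinement, sketched here and not part of the original script, is to act only when the Master's state flips. It also matches the phrase "is running" rather than the bare word "running", on the assumption that some init scripts report a stopped daemon as "nagios is not running"; the exact status wording is an assumption and should be checked on the Master. All function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: function names and the sample status strings are assumptions.

# master_is_running STATUS_TEXT
# Succeeds only on a positive "is running" report.
master_is_running() {
    case "$1" in
        *"is running"*) return 0 ;;
        *)              return 1 ;;
    esac
}

# current_mode CFG: prints 1 if host checks are enabled in CFG, else 0.
current_mode() {
    if grep -q '^execute_host_checks=1' "$1"; then echo 1; else echo 0; fi
}

# set_mode CFG VALUE
# Writes VALUE (0 or 1) into the three directives. Returns 1 and leaves
# the file alone when CFG is already in that mode.
set_mode() {
    [ "$(current_mode "$1")" = "$2" ] && return 1
    sed -i \
        -e "s/^execute_host_checks=.*/execute_host_checks=$2/" \
        -e "s/^execute_service_checks=.*/execute_service_checks=$2/" \
        -e "s/^enable_notifications=.*/enable_notifications=$2/" "$1"
}

# failover_step CFG STATUS_TEXT
# Prints "reload" when the mode was changed, otherwise "unchanged".
failover_step() {
    if master_is_running "$2"; then want=0; else want=1; fi
    if set_mode "$1" "$want"; then echo reload; else echo unchanged; fi
}

# Demo on a scratch config that starts in passive (Slave) mode.
demo_cfg=$(mktemp)
printf '%s\n' 'execute_host_checks=0' \
              'execute_service_checks=0' \
              'enable_notifications=0' > "$demo_cfg"
failover_step "$demo_cfg" ""                                # Master unreachable
failover_step "$demo_cfg" ""                                # still unreachable
failover_step "$demo_cfg" "nagios (pid 1234) is running..." # Master is back
rm -f "$demo_cfg"
```

In the real script, STATUS_TEXT would be the output of the SSH status command, and `/sbin/service nagios reload` would run only when `failover_step` prints "reload".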
Assumptions: Based on Simplicity

Mature Implementation
- set up once implementation of the network is primarily complete

Master Down Short Amount of Time
- Slave does not send history back to the Master when it returns

Master and Slave Independent of Updates
- no rsync
- guarantees integrity of one system
Master
Slave
Master: Service States
Slave: Service States
Problems
NSCA: Version

Plugin Buffer is Larger
* NSCA Server Receives OK
* NSCA Sending Adds Wrong Information

Replace with Version on Master
* send_nsca
* Located in /usr/local/nagios/libexec
Questions?