10 Quick Steps To Disaster Mike Weber
2011 Nagios World Conference
Inheriting Aberrations with Objects
Where are those settings coming from?
* Object Inheritance
* Object Priorities
* Object Chaining
* Incomplete Objects
* Canceling Inheritance
* Additive Inheritance
Object Inheritance
Object Inheritance: Templates
Object Inheritance: No Hostgroups?
Object Inheritance: From Hostgroup
Object Inheritance: Info Option
Object Priorities: Local then Inheritance
Object Priorities: Order in List (Chaining)
Incomplete Object: Only Lists One Image
Canceling Inheritance: Object Contains Parents
Canceling Inheritance: Wrong Parents
Canceling Inheritance: Cancel Parents
Canceling Inheritance: Canceled Parents
Additive Inheritance: Append Object Contents
Hoping BAD Things Won't Happen
Real BAD Things Will Happen
* Backups
* Updates
* Dependencies
XI: Automated Backup
/etc/cron.d/nagiosxi
    0 7 * * * root /root/scripts/automysqlbackup
    0 8 * * * root /root/scripts/autopostgresqlbackup
/store/backups/mysql
    daily weekly monthly
/store/backups/postgresql
    daily weekly monthly
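A scheduled backup only helps if it keeps producing files. The following is a minimal sketch, not part of the original slides, of a freshness check that could run after the cron jobs above. The function name has_recent_backup and the one-day default are assumptions for illustration; the daily subdirectory is taken from the directory listing on this slide.

```shell
#!/bin/bash
# Sketch (hypothetical helper): succeed if DIR contains at least one regular
# file modified within the last MAX_AGE_MIN minutes (default 1440 = one day).
# Returns 2 if DIR does not exist, 1 if no recent file is found.
has_recent_backup() {
    local dir="$1" max_age_min="${2:-1440}"
    [ -d "$dir" ] || return 2
    # -mmin -N matches files modified less than N minutes ago (GNU/BSD find).
    [ -n "$(find "$dir" -type f -mmin "-$max_age_min" | head -n 1)" ]
}

# Example use (paths as shown on the slide):
#   has_recent_backup /store/backups/mysql/daily \
#       || echo "WARNING: no MySQL backup in the last 24 hours"
```

A check like this could itself be wrapped as a Nagios service so that a silent backup failure raises an alert.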
XI: Upgrade Backup
    #!/bin/bash
    ##### BackUp Of Nagios Before Upgrade #####
    # Timestamp Backups
    TIMESTAMP=$(date +%Y%m%d_%H%M)
    echo $TIMESTAMP
    # Stop services so files and databases do not change during the backup
    service nagiosxi stop
    service npcd stop
    service ndo2db stop
    service nagios stop
    # Archive the install directories and dump the databases
    mkdir -p /bk/upgrade_$TIMESTAMP
    tar cjf /bk/upgrade_$TIMESTAMP/nagios_$TIMESTAMP.tar.bz2 /usr/local/nagios
    tar cjf /bk/upgrade_$TIMESTAMP/nagiosxi_$TIMESTAMP.tar.bz2 /usr/local/nagiosxi
    pg_dump -U nagiosxi -c -F p nagiosxi | bzip2 -c > /bk/upgrade_$TIMESTAMP/pg_nagiosxi_$TIMESTAMP.sql.bz2
    mysqldump -u root -pnagiosxi nagios | bzip2 -c > /bk/upgrade_$TIMESTAMP/my_nagios_$TIMESTAMP.sql.bz2
    mysqldump -u root -pnagiosxi nagiosql | bzip2 -c > /bk/upgrade_$TIMESTAMP/my_nagiosql_$TIMESTAMP.sql.bz2
    # Restart services in reverse order
    service nagios start
    service ndo2db start
    service npcd start
    service nagiosxi start
Core: Backup
    #!/bin/sh
    # Timestamped Back Up
    TIMESTAMP=`date +%Y%m%d_%H%M%S`
    echo $TIMESTAMP
    tar czvf /bk/nagios_dir_$TIMESTAMP.tar.gz /usr/local/nagios
    tar czvf /bk/pnp4nagios_dir_$TIMESTAMP.tar.gz /usr/local/pnp4nagios
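An archive that cannot be read back is not a backup. As a hedged sketch (the function name verify_tar_backup is an assumption, not something from the slides), the archives written by the scripts above could be checked by asking tar to list every member, which forces the whole compressed stream to be read:

```shell
#!/bin/bash
# Sketch (hypothetical helper): succeed only if ARCHIVE exists, is non-empty,
# and tar can list all of its members. GNU tar and bsdtar detect gzip/bzip2
# compression automatically when reading a file.
verify_tar_backup() {
    local archive="$1"
    [ -s "$archive" ] || { echo "missing or empty: $archive" >&2; return 1; }
    tar -tf "$archive" > /dev/null 2>&1 \
        || { echo "unreadable: $archive" >&2; return 1; }
    echo "ok: $archive"
}

# Example use, after the Core backup script has run:
#   for f in /bk/nagios_dir_*.tar.gz; do verify_tar_backup "$f"; done
```

Listing an archive confirms it is structurally intact; it does not confirm that the contents are what you need, so an occasional test restore on a spare machine is still worthwhile.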
Ignoring/Encouraging System Warnings
Configuration Errors: Service Checks
Solution: Service Template Management
Service Template: Check Settings
Service Template: Alert Settings
Service Template: Add Hostgroup
Solution: Service Template Management
Max Concurrent Service Checks
Maximum Concurrent Checks
Edit nagios.cfg to avoid latency issues. A value of 0 places no limit on the number of checks run in parallel.
    max_concurrent_checks=0
Mangling Users and Contacts
Managing Users and Contacts
* Users (access to the web interface)
* Contacts (notifications)
Creating Users: Web Interface
Creating Users: Restricted
Managing Administrators: Full Access
Core: cgi.cfg
    authorized_for_system_information=nagiosadmin,john,sue,mark,tom,mary,ralph
    authorized_for_configuration_information=nagiosadmin,john,sue,mark,tom,mary,ralph
    authorized_for_system_commands=nagiosadmin,john,sue,mark,tom,mary,ralph
    authorized_for_all_services=nagiosadmin,management,john,sue,mark,tom,mary,ralph
    authorized_for_all_hosts=nagiosadmin,management,john,sue,mark,tom,mary,ralph
    authorized_for_all_service_commands=nagiosadmin,john,sue,mark,tom,mary,ralph
    authorized_for_all_host_commands=nagiosadmin,john,sue,mark,tom,mary,ralph
    authorized_for_read_only=management
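With the same names repeated across many directives, it is easy to lose track of what a given user can actually do. The sketch below is an illustration rather than a Nagios tool: the function name directives_for_user is hypothetical, and it assumes the simple name=comma,separated,list layout shown on this slide.

```shell
#!/bin/bash
# Sketch (hypothetical helper): print every authorized_for_* directive in a
# cgi.cfg-style file whose comma-separated user list contains USER.
directives_for_user() {
    local cfg="$1" user="$2"
    awk -F= -v user="$user" '
        /^authorized_for_/ {
            n = split($2, names, ",")
            for (i = 1; i <= n; i++) {
                # Trim stray whitespace around each name before comparing.
                gsub(/^[ \t]+|[ \t]+$/, "", names[i])
                if (names[i] == user) { print $1; break }
            }
        }' "$cfg"
}

# Example use (the cgi.cfg path depends on your installation):
#   directives_for_user /usr/local/nagios/etc/cgi.cfg management
```

Run against the listing above, a user such as management should appear only under the all-hosts, all-services, and read-only directives, which is a quick way to confirm that a restricted account has not been given command access.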
Contacts
Monitoring Non-Existent Ports on Switches
Save Resources
Use AdminDown on Ports
* Administratively set unused ports as AdminDown
* Modify check_ifoperstatus
Turn Off Monitoring on Unused Ports
* Remove the checks
Unused Switch Ports: Wasting Resources
* check port status
* check bandwidth
* send notifications
* ignore notifications
Modify check_ifoperstatus
Here is the code that affects the output. In the first branch, if ( not defined $adminWarn or $adminWarn eq "w" ), change $state = 'WARNING'; to $state = 'OK';. The modified line is marked in the excerpt below.
    ##
    if ( not ($response->{$snmpIfAdminStatus} == 1) ) {
        $answer = "Interface $name (index $snmpkey) is administratively down.";
        if ( not defined $adminWarn or $adminWarn eq "w" ) {
            $state = 'OK';    # modified: was 'WARNING'
        } elsif ( $adminWarn eq "i" ) {
            $state = 'OK';
        } elsif ( $adminWarn eq "c" ) {
            $state = 'CRITICAL';
        } else { # If wrong value for -a, say warning
            $state = 'WARNING';
        }
Administratively Down Ports
Disable Port Checks
* 790 port checks disabled
* 1.5 GB of RAM saved
* 18% reduction in max service check execution time
Encouraging Non-Accountability for Changes
Who Makes Changes on Your Nagios?
* Limit Admin Access
* Require Training
* Create Policy for Changes
* Use a Test Server
Audit Log
Abusing Nagios XI Wizards
Wizard or Manual Creation: Assessment
Installation
* Which method provides the most efficient installation?
* Example: Using a wizard for a switch is most efficient.
* Example: Manually creating a service check to be used on 100 servers is most efficient.
Visibility
* Will it provide access to view the grouping of devices?
* Example: Can effective reports be created from visible devices?
Management
* Does it make management easier in the long run?
* Example: The use of templates is an efficient method to manage multiple devices that are similar.
Template Management
Disregarding Network Relationships
Reachability
Host: Manage Parents
Network Relationships: Parents
Importing Infectious Diseases
GUI Infection: Lack of Command Line Skills
Backups
* cron jobs
* manual backups
* verification
Analysis
* disk space
* logs
Troubleshooting
* finding stuff
* processes
* permissions
Edit Files
* learning vi or nano
Short Cut Infection: Auto-Discovery
Overestimating Human Intelligence
Some of the Things We Do as Humans Defy Logic