What is Nagios? Version 2 8/17 4.2.4 M.A.Newhall
History
  Rewrite/replacement for NetSaint 1999
  Popular fork Icinga 2009
  Licence - GPL
Dual engine
  Monitoring engine
  Escalation engine
Is Nagios for you? You? Can you be an open source advocate? Become a computer scientist.
Computer science
  Promises and contracts vs data and testing
  Open source is computer science
  Get in the author's head.
  Experiment, test, program, solve, establish best practices
  Don't assume anything works!
Why open source?
  Huge pool of prewritten checks.
  Search engine penetration? Outstanding.
  Email support $1650-$4750
  5 incident phone $1000
Upside
  No upgrade treadmill.
  Keep ahead of events. 'On it'
  Integration with everything
  Web interface: tiered user interaction, directory authentication, whitelist, blacklist, etc.
  Customizable
Downside
  Tons of decisions to be made
  Decisions best recorded in config files
  No server side automation
  Self discipline
Organization self discipline inheritance bogus hosts and groups.
Are your customers realistic? When to page? Reliability?
Beep frequency and escalation
  1. Who - contact group or contact
  2. When - timeperiods
  3. How - email? beep? Hack power grid?
  4. How often? - How many tries, how often to retry, rebeep?
  5. Acceleration (escalation) - Speed up, slow down, beep someone new.
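A minimal sketch of the "beep someone new" idea, assuming escalations are described by a first and last notification number, a contact group list, and an interval in minutes. The `Escalation` class, `contacts_for`, and the example numbers are hypothetical; only the group names linux-admins and bean-counters come from this talk. When ranges overlap, this sketch notifies every matching group and uses the smallest interval.

```python
from dataclasses import dataclass


@dataclass
class Escalation:
    first_notification: int      # first notification number this rule covers
    last_notification: int       # 0 means "no upper limit"
    contact_groups: tuple        # who gets beeped while the rule applies
    notification_interval: int   # minutes between re-beeps


def contacts_for(n, default_groups, default_interval, escalations):
    """Return (contact_groups, interval) for the n-th notification."""
    matching = [
        e for e in escalations
        if e.first_notification <= n
        and (e.last_notification == 0 or n <= e.last_notification)
    ]
    if not matching:
        # No escalation applies: fall back to the service's own settings.
        return tuple(default_groups), default_interval
    groups = []
    for e in matching:
        for g in e.contact_groups:
            if g not in groups:
                groups.append(g)
    interval = min(e.notification_interval for e in matching)
    return tuple(groups), interval


# Hypothetical policy: speed up and add people from the 3rd beep on.
EXAMPLE_ESCALATIONS = [
    Escalation(3, 5, ("linux-admins", "bean-counters"), 60),
    Escalation(6, 0, ("bean-counters",), 30),
]
```

Calling `contacts_for(1, ["linux-admins"], 240, EXAMPLE_ESCALATIONS)` falls through to the defaults, while notification 3 picks up the first escalation.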
What is Nagios good at?
  Watch systems - Web interface with user definable views and many ways to drill down and sort data
  Remote alerts - You can beep/email/run scripts for any event with any frequency.
Not so good - Monitoring Cacti
Targets? Network stack Linux systems OSX systems Windows systems Log servers - (syslogd,eventviewer,splunk,etc) Everything
Redundancy? Why? VMs vs hardware vs what datacenter? Examine and test example. - MySQL/PostgreSQL backend.
Software Nagios Nagios-plugins Web server Mail server
Install Windows? Beta Linux? Mature
Picking the right Linux
  Red Hat
  CentOS
  Fedora
  Debian
  Ubuntu
Automation
  Windows? - sorry, don't know
  Linux client deploy - yes! - Ansible, Puppet, Chef, etc.
  Linux Nagios server deploy - homegrown.
Install on Linux
  .rpm/.deb
    apt-get install nagios
    yum install nagios
  From source
    cd someplace; tar xzvf nagios-VERSION.tar.gz
    cd nagios-VERSION
    ./configure; make all; make install
Installing web server. apt-get or yum Apache https nginx https
Mail server.
  yum install postfix
  Test
    mail -s test someone@example.com
Clients NRPE NSClient++ Nageventlog SNMP SNMP server
NRPE startup
  daemon vs xinetd
  systemd vs initscripts
Install plugins
  tar xzvf nagios-plugins-VERSION.tar.gz
  cd nagios-plugins-VERSION
  ./configure
  make
  make install
Find a check
  Review installed plugins
  Search plugin repos for specific check
  Write check (return values plus text output)
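A minimal sketch of a hand-written check, to illustrate "return values plus text output". A plugin reports its result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and prints one line of text. The DISK label, the `evaluate` and `main` names, and the percent-used thresholds are assumptions for illustration, not a plugin from this talk.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value onto (exit_code, one_line_text)."""
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    # Text before the pipe is for humans; text after it is performance data.
    text = (f"DISK {LABELS[code]} - {value:g}% used"
            f" | used={value:g}%;{warn:g};{crit:g}")
    return code, text


def main(argv):
    """argv: [program, value, warn, crit]; returns the plugin exit code."""
    try:
        value, warn, crit = (float(a) for a in argv[1:4])
    except ValueError:
        # Missing or non-numeric arguments: report UNKNOWN, never crash.
        print("DISK UNKNOWN - usage: check VALUE WARN CRIT")
        return UNKNOWN
    code, text = evaluate(value, warn, crit)
    print(text)
    return code
```

To run this as a real plugin, the script would end with `import sys; sys.exit(main(sys.argv))` so that the return value becomes the process exit code.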
Server
Concepts parents escalation IP vs hostname templates flapping Inheritance
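A rough sketch of what parents buy you: when a host stops answering, its parents decide whether it is reported DOWN or merely UNREACHABLE. This is a simplified model, assuming no circular parent definitions; the `classify` function and its arguments are hypothetical names.

```python
def classify(host, parents, responding):
    """Return 'UP', 'DOWN' or 'UNREACHABLE' for a host.

    parents:    dict mapping host name -> list of parent host names
    responding: set of host names whose own check currently succeeds
    """
    if host in responding:
        return "UP"
    host_parents = parents.get(host, [])
    if not host_parents:
        # Nothing sits between the monitoring server and this host.
        return "DOWN"
    if any(classify(p, parents, responding) == "UP" for p in host_parents):
        # At least one path to the host works, so the host itself is at fault.
        return "DOWN"
    # Every path is broken; the real problem is upstream.
    return "UNREACHABLE"
```

With `parents = {"example": ["important-switch1"]}`, a dead switch turns `example` into UNREACHABLE rather than DOWN, so only the switch needs to page anyone.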
Core conf files
  cgi.cfg
  nagios.cfg
Core conf files (typically .cfg files)
  hostgroups.cfg
  hosts.cfg - Own directory (hosts)

    define hostgroup{
        hostgroup_name  critical-linux
        alias           Critical Linux
        contact_groups  linux-admins,bean-counters
        members         example, example2, example3
    }

    define host{
        use        pingable-host
        host_name  example
        address    1.2.3.4
        parents    important-switch1
    }
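A small sketch of how a `use` line pulls settings in from a template, assuming single inheritance and that a locally set directive always wins. The contents of `pingable-host` and `generic-host` are made up for illustration; only the `example` host comes from the definitions above.

```python
# Directives that describe the template itself and are not passed down.
NOT_INHERITED = {"name", "register"}


def resolve(name, objects, _seen=()):
    """Flatten an object and its chain of 'use' templates into one dict."""
    if name in _seen:
        raise ValueError(f"circular 'use' chain at {name!r}")
    obj = objects[name]
    merged = {}
    parent = obj.get("use")
    if parent is not None:
        inherited = resolve(parent, objects, _seen + (name,))
        merged.update(
            {k: v for k, v in inherited.items() if k not in NOT_INHERITED}
        )
    # Local directives override anything inherited.
    merged.update({k: v for k, v in obj.items() if k != "use"})
    return merged


# Hypothetical templates plus the host from the talk.
OBJECTS = {
    "generic-host": {"name": "generic-host", "register": "0",
                     "notification_period": "24x7",
                     "max_check_attempts": "5"},
    "pingable-host": {"name": "pingable-host", "use": "generic-host",
                      "register": "0",
                      "check_command": "check-host-alive",
                      "max_check_attempts": "3"},
    "example": {"use": "pingable-host", "host_name": "example",
                "address": "1.2.3.4", "parents": "important-switch1"},
}
```

`resolve("example", OBJECTS)` yields the host's own four directives plus `check_command`, `notification_period`, and the overriding `max_check_attempts` of 3 from the nearer template.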
Additional conf files
  services.cfg

    define service{
        use                    generic-service  ; Name of service template to use
        hostgroup_name         critical-linux
        service_description    PING
        is_volatile            0
        check_period           24x7
        max_check_attempts     3
        normal_check_interval  5
        retry_check_interval   1
        contact_groups         linux-admins
        notification_interval  240
        notification_period    24x7
        notification_options   w,u,c
        check_command          check_ping!100.0,20%!400.0,60%
    }
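A back-of-the-envelope sketch of what the timing numbers in this service definition mean for paging, assuming intervals are in minutes (the default interval length of 60 seconds) and ignoring check run time and scheduler jitter. The function names are hypothetical.

```python
def minutes_to_hard_state(max_check_attempts, retry_check_interval):
    """Minutes from the first failed check until the last retry fails."""
    # The first failure is attempt 1; each further attempt waits one retry interval.
    return (max_check_attempts - 1) * retry_check_interval


def worst_case_first_page(normal_check_interval, max_check_attempts,
                          retry_check_interval):
    """Worst-case minutes from outage start to the first notification."""
    # The outage may begin right after a good check, so add one full interval.
    return normal_check_interval + minutes_to_hard_state(
        max_check_attempts, retry_check_interval)


def notification_times(first_page, notification_interval, count):
    """Minutes (from outage start) at which the first `count` pages go out."""
    return [first_page + i * notification_interval for i in range(count)]
```

With 3 attempts, a 1 minute retry, and a 5 minute normal interval, the first page can arrive up to 7 minutes after the outage begins, and an unacknowledged problem re-pages every 240 minutes after that.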
Additional conf files
  checkcommands.cfg
  contacts.cfg
  contactgroups.cfg
  timeperiods.cfg

    # 'check_ping' command definition
    define command{
        command_name  check_ping
        command_line  $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 3
    }
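A minimal sketch of how `check_ping!100.0,20%!400.0,60%` from the service definition lines up with `$ARG1$` and `$ARG2$` in this command definition: the text after each `!` becomes the next numbered argument. The `expand` function is a hypothetical helper, and the `$USER1$` path in the usage note is an assumed plugin directory, not a value from this talk.

```python
def expand(check_command, commands, macros):
    """Expand a 'name!arg1!arg2' check_command into a full command line.

    commands: dict mapping command_name -> command_line template
    macros:   dict of other macro values, e.g. {"$HOSTADDRESS$": "1.2.3.4"}
    """
    name, *args = check_command.split("!")
    line = commands[name]
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values[f"$ARG{i}$"] = arg
    for macro, value in values.items():
        line = line.replace(macro, value)
    return line


COMMANDS = {
    "check_ping":
        "$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 3",
}
```

For example, `expand("check_ping!100.0,20%!400.0,60%", COMMANDS, {"$USER1$": "/usr/local/nagios/libexec", "$HOSTADDRESS$": "1.2.3.4"})` produces a command line with `-w 100.0,20% -c 400.0,60%`, that is, warn at 100 ms or 20% loss and go critical at 400 ms or 60% loss.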
Accepting changes
  Checking viability
    NAGIOSPATH/nagios -v /etc/nagios.cfg
  service nagios reload
  On fail...
    service nagios restart
git
Old samples http://www.warcloud.net/docs/nagios-talk/nagios-example-commands.txt
Questions?