Nagios: An introduction and Brief Tutorial

Slides:



Advertisements
Similar presentations
How to monitor the $H!T out of Hadoop Developing a comprehensive open approach to monitoring hadoop clusters.
Advertisements

1 Network Monitoring with Nagios Asian Internet Interconnection Initiatives Project Yan Adikusuma Nara Institute of Science and Technology
Nagios on Tier1 farm Jonathan Wheeler RAL Tier1 Fabric Team 20 th June 2008.
© 2009 GroundWork Open Source, Inc. PROPRIETARY INFORMATION: Information contained herein is not for use or disclosure outside of GroundWork Open Source,
Chapter 20 Oracle Secure Backup.
1 Institutional Repository Workshop 1 – 3 April 2009 Presented by Leonard Daniels.
Bryan Heden Lead Solutions Provider
Visualization of Monitoring Data at the NASA Advanced Supercomputing Facility Janice Singh
Week 6: Chapter 6 Agenda Automation of SQL Server tasks using: SQL Server Agent Scheduling Scripting Technologies.
Monitoring of Castor at RAL Castor F2F Rutherford Appleton Laboratory, February 18 th 2009 Cheney Ketley RAL.
Nagios System monitoring, the easy way. What is Nagios Nagios watches your computers through user-defined commands It can be set to inform you when a.
© 2009 GroundWork Open Source, Inc. PROPRIETARY INFORMATION: Information contained herein is not for use or disclosure outside of GroundWork Open Source,
Network & System Monitoring with Nagios & Cacti Kevin Mueller.
1 CHEP 2000, Roberto Barbera Roberto Barbera (*) Grid monitoring with NAGIOS WP3-INFN Meeting, Naples, (*) Work in collaboration with.
Scaling Nagios ® to monitor large heterogeneous environments Dave Blunt February 21, 2008 Seattle Area System Administrators Guild.
1 Automating Monitoring with Puppet Chris Mague Moovweb May 23, 2012.
SQL Reporting Another tool in our IT toolbox. It may not be the sharpest, but it’s free with msSQL and it empowers the users, some. By Bryan Yates -
M. Bechtel, S. Blümel, A. Quignon1 Linux Network Server Group: Nagios Marc Bechtel Sebastian Blümel Alexandre Quignon.
Calendar Browser is a groupware used for booking all kinds of resources within an organization. Calendar Browser is installed on a file server and in a.
Server-Side vs. Client-Side Scripting Languages
NGOP J.Fromm K.Genser T.Levshina M.Mengel V.Podstavkov.
Slide 1 of 9 Presenting 24x7 Scheduler The art of computer automation Press PageDown key or click to advance.
R. Lange, M. Giacchini: Monitoring a Control System Using Nagios Monitoring a Control System Using Nagios Ralph Lange, BESSY – Mauro Giacchini, LNL.
Microsoft Windows 2003 Server. Client/Server Environment Many client computers connect to a server.
11 Distributed Monitoring and Cloud Scaling for Web Apps Fernando Hönig
The Blue “W” is placed on your Desktop or in your system tray area.
Papeete, French Polynesia PacNOG 5 Papeete, French Polynesia 17 June 2009 Hervey Allen.
These materials are licensed under the Creative Commons Attribution-Noncommercial 3.0 Unported license (
Experiences Deploying Xrootd at RAL Chris Brew (RAL)
Nagios and Mod-Gearman In a Large-Scale Environment Jason Cook 8/28/2012.
Passive Monitoring with Nagios Jim Prins
Your university or experiment logo here Nagios: An introduction and Brief Tutorial Chris Brew SciTech/PPD.
Josh Riggs Utilizing Open Source Network Monitoring.
An Introduction to IBM Systems Director
workshop eugene, oregon Nagios Network Design and Operations 24 July 2009
1. A key measurement tool for actively monitoring availability of devices and services. Possible the most used open source network monitoring software.
2010 These materials are licensed under the Creative Commons Attribution-Noncommercial 3.0 Unported license (
Hour 7 The Application Layer 1. What Is the Application Layer? The Application layer is the top layer in TCP/IP's protocol suite Some of the components.
INFN-GRID Testbed Monitoring System Roberto Barbera Paolo Lo Re Giuseppe Sava Gennaro Tortone.
Introduction To Nagios A Linux-based Monitoring System.
Network Monitoring Manage your business without blowing your budget. Learn how the Calhoun ISD utilizes free “Open Source” tools for real-time monitoring.
1 The new Fabric Management Tools in Production at CERN Thorsten Kleinwort for CERN IT/FIO HEPiX Autumn 2003 Triumf Vancouver Monday, October 20, 2003.
Graphing and statistics with Cacti AfNOG 11, Kigali/Rwanda.
Nagios The monitoring tool. Why ? Nagios is a powerful, modular network monitoring system that can be used to monitor many network services like smtp,
Nagios Network Monitoring Andrew Hamilton TJ IT Technician.
Performance Monitoring of SLAC Blackbox Nodes Using Perl, Nagios, and Ganglia Roxanne Martinez Mentor: Yemi Adesanya United States Department of Energy.
NAGIOS 1. Introduction A key measurement tool for actively monitoring availability of devices and services. Possible the most used open source network.
Oracle Data Integrator Agents. 8-2 Understanding Agents.
2010 These materials are licensed under the Creative Commons Attribution-Noncommercial 3.0 Unported license (
Portal Update Plan Ashok Adiga (512)
Form Processing Week Four. Form Processing Concepts The principal tool used to process Web forms stored on UNIX servers is a CGI (Common Gateway Interface)
Web Server Administration Chapter 4 Name Resolution.
2010 NAGIOS APRICOT 2010 Kuala Lumpur, Malaysia.
Splunk Enterprise Instructor: Summer Partain 3 Day Course.
Queensland University of Technology Nagios – an Open Source monitoring solution and it’s deployment at QUT.
2008 Taipei, Taiwan An Introduction APRICOT 2008 Network Management Workshop February – Taipei, Taiwan Hervey Allen & Phil.
Nagios - introduction Dhruba Raj Bhandari ( CCNA ) p Additions by Phil Regnauld.
Nagios FTW TriLUG 8/10/06 Presented by: Jason Faulkner Ian Kilgore.
Distributed Monitoring with Nagios: Past, Present, Future Mike Guthrie
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Nagios Grid Monitor E. Imamagic, SRCE OAT.
Network Management Workshop March – Bangkok, Thailand
SQL Database Management
NGI and Site Nagios Monitoring
Use of Nagios in Central European ROC
What is nagios? Version 2 8/ M.A.Newhall.
Monitoring with Nagios
Objects Mike Weber
Printer Admin Print Job Manager
Adding Objects To Nagios 3.0
Introduction to Ansible
Presentation transcript:

Nagios: An introduction and Brief Tutorial Chris Brew SciTech/PPD

Outline What is Nagios? Getting Started Going Further Hosts, Commands, Services, Timeperiods and Contacts Remote Checks with NRPE Hostgroups and Servicegroups Templates Config File(s) Active v. Passive checks Going Further Writing you own Checks NSCA Service Hierarchies Eventhandlers Modifying the Web Pages

What is Nagios “Nagios is an enterprise-class monitoring solutions for hosts, services, and networks released under an Open Source license.” www.nagios.org “Nagios is a popular open source computer system and network monitoring application software. It watches hosts and services that you specify, alerting you when things go bad and again when they get better.” www.wikipedia.org

Installation Nagios RPMs for RHEL (and so SL/SLC) available from the DAG repository 4 Main component RPMS nagios – the main server software and web scripts nagios-plugins – the common set of check scripts used to query services nagios-nrpe – Nagios Remote Plugin Executor nagios-nsca – Nagios Service Check Acceptor Setup is simply a matter on installing the RPMs, configuring your web server and editing the config files to suit your setup

Architecture Simplest setup has central server running Nagios daemon that runs local check scripts which the status of services on that and remote hosts A host is a computer running on the network which runs one or more services to be checked A service is anything on the host that you want checked. Its state can be one of: OK, Warning, Critical or Unknown A check is a script run on the server whose exit status determines the state of the service: 0, 1, 2 or -1

hosts define host{ host_name my-host alias my-host.domain.ac.uk address 168.192.0.1 check_command check-host-alive max_check_attempts 10 check_period 24x7 notification_interval 120 notification_period 24x7 notification_options d,r contact_groups unix-admins register 1 }

Services define service{ name ping-service service_description PING is_volatile 0 check_period 24x7 max_check_attempts 4 normal_check_interval 5 retry_check_interval 1 contact_groups unix-admins notification_options w,u,c,r notification_interval 960 notification_period 24x7 check_command check_ping!100.0,20%!500.0,60% hosts my-host register 1 }

Command Commands wrap the check scripts and the alerts define command{ command_name check-host-alive command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 99,99% -c 100,100% -p 1 } and the alerts command_name notify-by-email command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$

Check Scripts The standard nagios-plugins rpm provides over 130 different check scripts, ranging from check_load to check_oracle_instance.p via check_procs, check_mysql, check_mssql, check_real and check_disk Writing you own check scripts is easy, can be in any language. Active scripts just need to set the exit status and output a single line of text Passive checks just write a single line to the servers command file

Contacts Contacts are the people who receive the alerts: define contact{ contact_name chris alias Chris Brew service_notification_period 24x7 host_notification_period 24x7 service_notification_options w,u,c,r host_notification_options d,r service_notification_commands notify-by-email host_notification_commands host-notify-by-email email someone@somewhere } Contactgroups group contacts: define contactgroup{ contactgroup_name unix-admins alias Unix Administrators members chris

Time Periods Time periods define when things, checks or alerts, happen: define timeperiod{ timeperiod_name 24x7 alias 24 Hours A Day, 7 Days A Week sunday 00:00-24:00 monday 00:00-24:00 tuesday 00:00-24:00 wednesday 00:00-24:00 thursday 00:00-24:00 friday 00:00-24:00 saturday 00:00-24:00 }

Remote checks with NRPE NRPE is a daemon that runs on a remote host to be checked and a corresponding check script on the Master Nagios server Nagios Daemon runs the check_nrpe script which contacts the daemon which runs the check script locally and returns the output: Nrpe.cfg (on remote host): command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20 Nagios.cfg (on Master server): define command{ command_name check_nrpe_load command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_load }

Host and Service Groups Host and service groups let you group together similar hosts and services: define hostgroup{ hostgroup_name 4-ServiceNodes alias RALPP Service Nodes } define servicegroup{ servicegroup_name topgrid alias Top Grid Services Plus a hostgroups or a servicegroups line in the host or service definition

Templates You can define templates to make specifying hosts and services easier: define host{ name generic-unix-host use generic-host check_command check-host-alive max_check_attempts 10 check_period 24x7 notification_interval 120 notification_period 24x7 notification_options d,r contact_groups unix-admins register 0 } Reduces a host definition to: use generic-grid-frontend-host host_name heplnx201 alias heplnx201.pp.rl.ac.uk address 130.246.47.201

Config Files Main nagios.cfg file can have include statements to pulling other setting files or directories of files The standard example config files are confusingly spred over several possible files, many of which need editing to get anything working. My current set up has the config spread over multiple files and directories. One set of top level files defining global settings, commands, contact, hostgroups, servicegroups, host-templates, service-templates, time-periods, resources (user variables) One directory for each host group containing one file defining the services and one defining the hosts

Active vs Passive Checks For some services running a script to check their state every few minutes (so called active checking) is not the best way. Service has its own internal monitoring One script can efficiently check the status of multiple related services The nagios service can be set to read “commands” from a named pipe Any process can then write in a line updating the status of a service (passive check) Web frontend’s cgi script can also write commands to the file to disable checks or notifications for a host or service for example.

Going Further 1 NSCA, is a script/daemon pair that allow remote hosts to run passive checks and write the results into that nagios servers command file. Checking operation on remote host calls send_nsca script which forwards the result to the nsca daemon on the server which writes the result into the command file Can be used with eventhandlers to produce a hierarchy of Nagios servers Service Hierachies, services and hosts can depend on other services or hosts so for instance: If the web server is down don’t tell me the web is unreachable If the switch is down don’t send alerts for the hosts behind it

Going Further 2 Event Handlers, as well as just telling you a service is down Nagios can attempt to rectify the fault by running an eventhandler The cgi scripts, templates and style sheets that build the web pages can be edited to add extra information Nagios has a myriad of other features not touched upon here from state stalking to flap detection, notification escalations to scheduling network, host or service downtimes

Conclusions Nagios is a vary useful tool for the time poor sysadmin but can be very daunting when you first look at it. My advice is: Install it on you test node (though this may well end up as your master server) Run a few check scripts by hand to get the feel for them Set up a simple config file that runs a few check on the local host Install nrpe on the host and nrpe and nagios-plugins on a remote host Run check nrpe by hand to get it working then add a couple of simple checks on the remote host NOW THINK ABOUT HOW YOU WANT TO ORGANISE YOU CONFIG FILES Now add hosts and service until you run out, then write some more