Scaling Nagios ® to monitor large heterogeneous environments Dave Blunt February 21, 2008 Seattle Area System Administrators Guild.


1 Scaling Nagios ® to monitor large heterogeneous environments Dave Blunt February 21, 2008 Seattle Area System Administrators Guild

2 What is Nagios?
© 2008 GroundWork Open Source, Inc. SASAG – Scaling Nagios ®, February 2008
• “an Open Source host, service and network monitoring program.”
• Started as Netsaint in 1999 and became Nagios in 2002.
• www.nagios.org
• Availability and performance monitoring: is it up, is it down? How much load/memory/disk is in use?

3 What is Nagios?

4 What can Nagios suffer from?
• Configuration file maintenance issues
• CPU and disk I/O bottlenecks
• Blocking host checks
• File based performance bottlenecks

5 What can Nagios suffer from?
• Configuration file maintenance issues
  – Use a web based configuration tool
    • Monarch (sourceforge.net/projects/monarch)
    • Fruity (sourceforge.net/projects/fruity)
  – Facilitates monitoring across multiple Windows domains, SNMP communities, and other security zones.

6 What can Nagios suffer from?
• CPU and disk I/O bottlenecks
  – Optimize Nagios
    • nagios.sourceforge.net/docs/2_0/tuning.html
  – Use a database to store config and status information
    • NDOUtils (www.nagios.org/downloads)
    • Foundation (sourceforge.net/projects/gwfoundation)
  – Placing the database on a separate server greatly improves performance, and both NDOUtils and Foundation support it.

7 What can Nagios suffer from?
• Blocking host checks
  – Passive host updates
    • Fping (fping.sourceforge.net)
  – Huge increase in host check capacity (8,000+ checks a minute) if pings are parallelized.
  – The downside of passive host updates is the possibility of some extra service alarms.
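As a minimal sketch of the passive host update idea, the function below turns fping's default output lines ("host is alive" / "host is unreachable") into Nagios PROCESS_HOST_CHECK_RESULT external commands. The function name, the host names, and the plugin output text are illustrative assumptions, not part of the talk; running fping itself and writing to the command file are deliberately left out.

```python
import time


def fping_to_host_commands(fping_output, now=None):
    """Convert default fping output lines into Nagios external commands.

    Assumes lines of the form "<host> is alive" or "<host> is unreachable".
    Anything else (for example resolver errors) is skipped.
    """
    ts = int(time.time()) if now is None else int(now)
    commands = []
    for line in fping_output.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[1] != "is":
            continue  # not a status line we understand
        host = parts[0]
        state = " ".join(parts[2:])
        # Passive host check status codes: 0 = UP, 1 = DOWN.
        code = 0 if state == "alive" else 1
        text = "fping: host is " + state
        commands.append(
            f"[{ts}] PROCESS_HOST_CHECK_RESULT;{host};{code};{text}"
        )
    return commands
```

Each returned string, followed by a newline, would then be written to the external command file named by `command_file` in nagios.cfg. Because one fping run covers many hosts in parallel, a single pass can replace many serial active host checks.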

8 What can Nagios suffer from?
• File based performance bottlenecks
  – Remove Nagios pipe file bottleneck with Event Brokers
    • Bronx (archive.groundworkopensource.com/groundwork-opensource/trunk/bronx/)
  – Feed data into Bronx as replacement for NSCA and also have Bronx send data to Foundation
    • DNX (dnx.sourceforge.net)
  – Specifically tied to distributed monitoring.

9 Typical scaling limits in Nagios*

Typical mix of Hosts/Services   Active service checks/min**   Passive service checks/min**
700/7,000                       770                           -

*With dual 3GHz Xeon, 4GB RAM, 10k RPM disk, RHEL4 ES 32-bit OS.
**Based on a Service being checked once every 10 minutes, and 1% of Services and Hosts being in transition between OK and non-OK states. Retry interval for non-OK states is 1 minute.

10 So, how could you scale up?

11 How can I drive my primary Nagios with Passive service checks?
• Run additional Nagios instances and forward the data to Bronx or NSCA, or set up DNX
  – At some point you end up having too much monitoring infrastructure!
• Passive agents, e.g. GroundWork Distributed Monitoring Agent, NT_Scheduler, Cron
  – Monarch supports creation of configuration files for passive agents
• Different tier one monitoring tools, e.g. syslog, SNMP traps, Ganglia, Cacti
  – Feed data from these up to your primary Nagios server: install the right agent on that server, process the results, and then submit them to Nagios.
  – Syslog-ng (www.balabit.com/network-security/syslog-ng/)
  – Snmptt (www.snmptt.org)
  – Ganglia (ganglia.sourceforge.net)
  – Cacti (www.cacti.net)
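To make "submit to Nagios" concrete for the NSCA path, here is a small sketch that formats one passive service result as the tab-separated line that the send_nsca client reads on standard input (host, service description, return code, plugin output). The function name and the example host and service are hypothetical.

```python
def nsca_service_line(host, service, return_code, output):
    """Format one passive service check result for send_nsca's stdin.

    Return codes follow the usual plugin convention:
    0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    # Tabs and newlines are field/record delimiters, so flatten them.
    clean = output.replace("\t", " ").replace("\n", " ")
    return f"{host}\t{service}\t{return_code}\t{clean}\n"
```

A cron job or other passive agent could pipe such lines into `send_nsca -H <primary server> -c <send_nsca config file>`; the service description has to match a service already defined on the primary Nagios server, otherwise the result is discarded.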

12 One Nagios Instance

13 Many Nagios Instances

14 How do I implement many Nagios instances?
• Use a web based configuration tool
  – Monarch (sourceforge.net/projects/monarch)
• Enable configuration data transfer between instances
  – SSH
• Enable check result data transfer between instances
  – NSCA (Nagios Service Check Acceptor, www.nagios.org/downloads), or Bronx
• Optimize each Nagios instance for its purpose
  – Turn off active checking on parent
  – Set command_check_interval=-1 on parent
  – Turn off performance data, eventhandler, and notification processing on children
• Alternative approach
  – DNX? Still ‘beta’, but offers significant maintenance advantages
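The parent/child tuning above could be captured as a set of nagios.cfg overrides. The sketch below is one reading of those bullets: `command_check_interval=-1` comes from the slide, the other directive names are standard nagios.cfg options that I am assuming correspond to each bullet, and the two entries marked as assumptions are additions needed for a parent to accept passive results at all.

```python
# Overrides for the parent (central) instance.
PARENT_OVERRIDES = {
    "execute_service_checks": 0,         # turn off active service checks
    "execute_host_checks": 0,            # turn off active host checks
    "command_check_interval": -1,        # check the command file as often as possible
    "check_external_commands": 1,        # assumption: needed to read passive results
    "accept_passive_service_checks": 1,  # assumption: needed to accept them
}

# Overrides for each child (distributed) instance.
CHILD_OVERRIDES = {
    "process_performance_data": 0,
    "enable_event_handlers": 0,
    "enable_notifications": 0,
}


def render_overrides(overrides):
    """Render a dict of directives as nagios.cfg 'name=value' lines."""
    return "".join(f"{name}={value}\n" for name, value in overrides.items())
```

Printing `render_overrides(PARENT_OVERRIDES)` gives lines that can be compared against, or merged into, the parent's main configuration file; a configuration tool such as Monarch would normally manage these values instead.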

15 Typical scaling limits in Nagios*

Typical mix of Hosts/Services   Active service checks/min**   Passive service checks/min**
700/7,000                       770                           -
2,700/27,000                    -                             2,970
8,000/80,000***                 -                             10,000***

*With dual 3GHz Xeon, 4GB RAM, 10k RPM disk, RHEL4 ES 32-bit OS.
**Based on a Service being checked once every 10 minutes, and 1% of Services and Hosts being in transition between OK and non-OK states. Retry interval for non-OK states is 1 minute.
***Using Bronx Event Broker and assumptions listed for note (**)
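The checks-per-minute figures can be reproduced from the assumptions in note (**). The sketch below is one reading of that note, not a formula given in the talk: every service contributes one check per 10-minute interval, and the 1% in transition add one retry per minute each.

```python
def service_checks_per_minute(services, check_interval_min=10,
                              transition_percent=1, retry_interval_min=1):
    """Estimate service checks per minute under the note (**) assumptions."""
    # Regular schedule: every service once per check interval.
    regular = services / check_interval_min
    # Services in transition are rechecked once per retry interval.
    retries = services * transition_percent / 100 / retry_interval_min
    return regular + retries
```

With 7,000 services this gives 700 + 70 = 770, and with 27,000 services 2,700 + 270 = 2,970, matching the first two rows. For 80,000 services the same formula gives 8,800, so the 10,000 figure in the Bronx row is evidently not derived in quite the same way.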

16 Heterogeneous environments
• Mix of Operating Systems, Network security zones, Applications, and Administrators!
• Approaches to the problem:
  – Same agent type on every system
    • Consistent
    • Limited coverage
  – Mix of methods
    • Flexible
    • More difficult to maintain
    • Must normalize data

17 Methods
• UNIX
  – SNMP / SNMP traps
  – SSH with plugins (www.nagios.org/downloads)
  – NRPE with plugins (www.nagios.org/downloads)
  – Cron with plugins
  – Port-based checks
  – Syslog (aka traps)
• Windows (http://www.crn.com/software/206801053)
  – SNMP / SNMP traps
  – NRPE_NT with plugins (www.nagiosexchange.org/Windows_NRPE.66.0.html?&tx_netnagext_pi1[p_view]=235)
  – WMI (with proxy)
  – NT_Scheduler with plugins
  – Port-based checks
  – Event logs (aka traps) (www.intersectalliance.com/projects/SnareWindows/ or www.steveshipway.org/software/f_nagios.html)
• Network
  – SNMP / SNMP traps
  – Syslog
  – Port-based checks
• Special devices
  – SNMP / SNMP traps
  – Syslog
  – Port-based checks

18 GroundWork Open Source, Inc.
139 Townsend Street, Suite 100
San Francisco, CA 94107
phone: (415) 992-4500
www.groundworkopensource.com
info@groundworkopensource.com

