High Availability For Nagios Mike Weber


2012 - Alternatives
Daily Image Creation for Restore (VMware, etc.)
- loses parts of history
- creates gaps in monitoring during image creation
rsync to Synchronize Servers
- requires IP address and hostname changes
- requires modification of nagios.cfg
- assumes Master will never be misconfigured
- rsync can use a lot of resources
Clustered Nagios Server

Alternatives: Redundant Monitoring

Alternatives: Failover

2011 Nagios World Conference - Perfect Solution: Does Not Exist

High Availability: Outline of Goals
Create Master/Slave Relationship
Master Sends History to the Slave
Slave Does Not Run Host Checks, Service Checks or Notifications
Slave Monitors Master via Script
Slave Enables Host Checks, Service Checks and Notifications when Master is Down
Slave Disables All Checks when Master is Up
Simplicity

Failover and Performance Enhancement

Test Server: Puppet Master

Step #1: Clone Master to Slave
Back Up Master Databases and Files
- MySQL databases
- Postgres database
Back Up Files
- /usr/local/nagios
- /usr/local/nagiosxi
Install all dependencies for plugins
Enable Access from Slave on all devices

Step #2: Disable Slave
Edit nagios.cfg
    execute_host_checks=0
    execute_service_checks=0
    enable_notifications=0
Save and Restart Nagios
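One way to confirm that the Slave really is passive after this edit is to read the three directives back from the file. The sketch below is only an illustration: the function name `slave_is_passive` is hypothetical, and it assumes each directive appears as a plain `name=value` line with no surrounding whitespace.

```shell
# Hypothetical helper (not from the slides): succeeds only if all three
# directives from Step #2 are set to 0 in the given nagios.cfg.
# Assumes plain "name=value" lines with no surrounding whitespace.
slave_is_passive() {
    for key in execute_host_checks execute_service_checks enable_notifications; do
        # -x matches the whole line, so "key=0" must appear exactly
        grep -qx "${key}=0" "$1" || return 1
    done
    return 0
}
```

It could be run as `slave_is_passive /usr/local/nagios/etc/nagios.cfg && echo "Slave is passive"` before restarting Nagios.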

Step #3: Enable NSCA
Master Sends History via NSCA
- edit nagios.cfg (save and restart Nagios)
    obsess_over_hosts=1
    obsess_over_services=1
Slave Maintains History via NSCA
- install NSCA daemon on Slave
- allow connections from Master
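With obsess_over_services=1, Nagios runs the command named in ocsp_command after every service check, and that command is what hands the result to send_nsca. The slides do not show this handler (Nagios XI may configure its own), so the following is a minimal sketch under stated assumptions: `SLAVE_HOST` is a placeholder for the Slave's address, the function names are hypothetical, and the send_nsca paths follow the locations mentioned elsewhere in the slides.

```shell
# Sketch of an ocsp_command handler; not the presenter's script.
# SLAVE_HOST is a placeholder and must be set to the Slave's address.
SEND_NSCA=/usr/local/nagios/libexec/send_nsca
SEND_NSCA_CFG=/usr/local/nagios/etc/send_nsca.cfg

# Build one passive service result in the tab-separated form send_nsca
# reads on stdin: host<TAB>service<TAB>return code<TAB>plugin output
format_service_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Forward one service result to the NSCA daemon on the Slave.
forward_service_result() {
    format_service_result "$@" | "$SEND_NSCA" -H "$SLAVE_HOST" -c "$SEND_NSCA_CFG"
}
```

A Nagios command definition would pass the macros $HOSTNAME$, $SERVICEDESC$, $SERVICESTATEID$ and $SERVICEOUTPUT$ as the four arguments. Host results take the same form without the service field.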

Master: Allow Outbound Transfers

Master: Outbound Config
File Found in /usr/local/nagios/etc
send_nsca.cfg
    # CONFIGURED BY NAGIOS XI
    password=LMb674FcsswP
    encryption_method=3

Slave: NSCA Config
    # default: on
    # description: NSCA (Nagios Service Check Acceptor)
    service nsca
    {
        flags           = REUSE
        socket_type     = stream
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/local/nagios/bin/nsca
        server_args     = -c /usr/local/nagios/etc/nsca.cfg --inetd
        log_on_failure  += USERID
        disable         = no
        only_from       =
    }

Slave: Allow Inbound Transfers

Step #4: Slave Monitors Master via SSH
Create SSH Keys on Slave
- push public key to Master
Create authorized_keys file on Master
Implement SSH script to check Master
- passwordless login
- set on a cron job (check every minute)
- script detects status of Master
- script turns checks and notifications on/off

Create Key Pair
    su - nagios
    mkdir .ssh
    cd .ssh
    ssh-keygen -b 1024 -f id_dsa -t dsa -N ''
    Generating public/private dsa key pair.
    Your identification has been saved in id_dsa.
    Your public key has been saved in id_dsa.pub.
    The key fingerprint is:
    61:23:17:2d:83:d8:d9:f9:87:2d:e1:6d:e6:3d:cb:5c
    The key's randomart image is:
    +--[ DSA 1024]----+
    | o +.o |
    |. + =.o |
    |. == = |
    | + o= * |
    | S *. |
    |. o E|
    | o + |
    | + |

Push Public Key to nagios user on Master
    scp id_dsa.pub
This means that the nagios user must have a /home/nagios/.ssh directory. The public key name is changed to “slave” to avoid overwriting any keys.
On the Master (as the nagios user):
    cat slave >> authorized_keys
    chmod 644 authorized_keys
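Passwordless login usually fails silently when sshd's StrictModes check finds that the home directory, .ssh, or authorized_keys is writable by group or others. The sketch below is a hypothetical helper for spotting that; the function name is an assumption, and it relies on GNU `stat -c`, so it is Linux-specific.

```shell
# Hypothetical check (not from the slides): succeeds if the given file or
# directory is NOT writable by group or others. Uses GNU stat (Linux).
not_group_world_writable() {
    _mode=$(stat -c '%a' "$1") || return 2
    # The last two octal digits are group and other; 2, 3, 6 and 7
    # all include the write bit.
    case "$_mode" in
        *[2367]?|*[2367]) return 1 ;;
    esac
    return 0
}
```

On the Master it could be applied to each path in turn, for example `not_group_world_writable /home/nagios/.ssh/authorized_keys || echo "permissions too open"`.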

Slave: Cron Job
    # /etc/cron.d/nagiosxi: crontab fragment for nagiosxi
    * * * * * nagios /bin/sh /usr/local/nagios/libexec/eventhandlers/check_master.sh

Slave: check_master.sh
    #!/bin/bash
    # Address of the Master (blank on the slide)
    masterip=

    # Turn off active checks and notifications on the Slave.
    function disable () {
        sed -i 's/execute_host_checks=1/execute_host_checks=0/' /usr/local/nagios/etc/nagios.cfg
        sed -i 's/execute_service_checks=1/execute_service_checks=0/' /usr/local/nagios/etc/nagios.cfg
        sed -i 's/enable_notifications=1/enable_notifications=0/' /usr/local/nagios/etc/nagios.cfg
        /sbin/service nagios reload
    }

    # Turn on active checks and notifications on the Slave.
    function enable () {
        sed -i 's/execute_host_checks=0/execute_host_checks=1/' /usr/local/nagios/etc/nagios.cfg
        sed -i 's/execute_service_checks=0/execute_service_checks=1/' /usr/local/nagios/etc/nagios.cfg
        sed -i 's/enable_notifications=0/enable_notifications=1/' /usr/local/nagios/etc/nagios.cfg
        /sbin/service nagios reload
    }

    # Count "running" lines in the Master's init-script status.
    # 0 means Nagios is down on the Master or the Master is unreachable.
    nagpid=$(ssh "$masterip" /etc/init.d/nagios status | grep -c running)

    if [ "$nagpid" -eq 0 ]; then
        echo "Starting Checks"
        enable
    fi
    if [ "$nagpid" -eq 1 ]; then
        echo "Stopping Checks"
        disable
    fi
    exit 0
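Because the cron job fires every minute, the script as shown rewrites nagios.cfg and reloads Nagios on every run, even when nothing has changed. A possible refinement, sketched here as an assumption rather than as the presenter's method, is to touch the file only when the Slave's state actually needs to change. The function name `set_checks` is hypothetical, `sed -i` assumes GNU sed, and the three directives are assumed to be present as plain `name=0` or `name=1` lines.

```shell
# Hypothetical refinement (not from the slides).
# set_checks FILE VALUE sets the three directives to VALUE (0 or 1).
# Returns 0 if the file was changed, 1 if it was already in that state.
set_checks() {
    changed=1
    for key in execute_host_checks execute_service_checks enable_notifications; do
        if ! grep -qx "${key}=$2" "$1"; then
            # Rewrite only an exact "key=0" or "key=1" line
            sed -i "s/^${key}=[01]\$/${key}=$2/" "$1"
            changed=0
        fi
    done
    return "$changed"
}
```

The enable branch would then become `set_checks /usr/local/nagios/etc/nagios.cfg 1 && /sbin/service nagios reload`, so the reload happens only on a real failover or failback.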

Assumptions: Based on Simplicity
Mature Implementation
- set up once implementation of the network is primarily complete
Master Down Short Amount of Time
- Slave does not send history to Master on its return
Master and Slave Independent of Updates
- no rsync
- guarantees integrity of one system

Master

Slave

Master: Service States

Slave: Service States

Problems

NSCA: Version
Plugin Buffer is Larger
* NSCA Server Receives OK
* NSCA Sending Adds Wrong Information
Replace with Version on Master
* send_nsca
* Located in /usr/local/nagios/libexec

Questions?