Real World Uses for Nagios APIs Janice Singh

Slides:



Advertisements
Similar presentations
NAGIOS AND CACTI NETWORK MANAGEMENT AND MONITORING SYSTEMS.
Advertisements

What’s New in Fireware XTM v11.3.2
Web forms and CGI scripts Dr. Andrew C.R. Martin
Nagios: An introduction and Brief Tutorial
Nagios on Tier1 farm Jonathan Wheeler RAL Tier1 Fabric Team 20 th June 2008.
Managed by UT-Battelle for the Department of Energy Best Ever Archive Utility, Yet (BEAUtY) Kay Kasemir April 2013.
Nagios XI 2012 Mike Guthrie Twitter: mguthrie88 Projects:
Visualization of Monitoring Data at the NASA Advanced Supercomputing Facility Janice Singh
Extending JIRA Rachel Wright July 15, 2014 See slide “Notes” section for commentary and talking points.
Monitoring of Castor at RAL Castor F2F Rutherford Appleton Laboratory, February 18 th 2009 Cheney Ketley RAL.
Web Server Programming
1 OBJECTIVES To generate a web-based system enables to assemble model configurations. to submit these configurations on different.
Operating-System Structures
Copyright 2004 Monash University IMS5401 Web-based Systems Development Topic 2: Elements of the Web (g) Interactivity.
Server-Side vs. Client-Side Scripting Languages
1 Web Server Administration Chapter 3 Installing the Server.
CP476 Internet Computing Browser and Web Server 1 Web Browsers A client software program that allows you to access and view Web pages on the Internet –Examples.
Asynchronous Solution Appendix Eleven. Training Manual Asynchronous Solution August 26, 2005 Inventory # A11-2 Chapter Overview In this chapter,
Slide 1 of 9 Presenting 24x7 Scheduler The art of computer automation Press PageDown key or click to advance.
Linux Operations and Administration
1 Web Developer & Design Foundations with XHTML Chapter 6 Key Concepts.
11 Distributed Monitoring and Cloud Scaling for Web Apps Fernando Hönig
Design Windows Media Services Infrastructure. Module 7: Design Windows Media Services Infrastructure Design Windows Media Services for live streaming.
NETWORK CENTRIC COMPUTING (With included EMBEDDED SYSTEMS)
Hands-On Microsoft Windows Server 2008
Passive Monitoring with Nagios Jim Prins
WorkPlace Pro Utilities.
Chapter 33 CGI Technology for Dynamic Web Documents There are two alternative forms of retrieving web documents. Instead of retrieving static HTML documents,
Chapter 10 Networking and the Internet ITSC 1458.
BZUPAGES.COM Presentation on Content Management System (CMS) Presented to. Sir Ahmad Kareem.
Advanced Features of Nagios XI Sam Lansing -
Section 1: Introducing Group Policy What Is Group Policy? Group Policy Scenarios New Group Policy Features Introduced with Windows Server 2008 and Windows.
Module 7: Fundamentals of Administering Windows Server 2008.
9 Chapter Nine Compiled Web Server Programs. 9 Chapter Objectives Learn about Common Gateway Interface (CGI) Create CGI programs that generate dynamic.
Version control Using Git Version control, using Git1.
10/13/2015 ©2006 Scott Miller, University of Victoria 1 Content Serving Static vs. Dynamic Content Web Servers Server Flow Control Rev. 2.0.
SunGuide® Software Development Project Release 4.3 Express Lanes Enhancements Design Review December 15, 2009 December 15, 20091R4.3 Design Review.
Microsoft ® Business Solutions–Navision ® 4.0 Development II - C/SIDE Solution Development Day 5.
Chapter 6 Server-side Programming: Java Servlets
Dudok de Wit David.  Documents management in a deskless company  SharePoint Online as a solution  Redesigning the documentary organization  Interoperability.
Performance Monitoring of SLAC Blackbox Nodes Using Perl, Nagios, and Ganglia Roxanne Martinez Mentor: Yemi Adesanya United States Department of Energy.
Copyright © 2012 UNICOM Systems, Inc. Confidential Information z/Ware Product Overview illustro Systems International A Division of UNICOM Global.
A university for the world real R © 2009, Chapter 9 The Runtime Environment Michael Adams.
1 WWW. 2 World Wide Web Major application protocol used on the Internet Simple interface Two concepts –Point –Click.
David Lawrence 7/8/091Intro. to PHP -- David Lawrence.
Cluster Consistency Monitor. Why use a cluster consistency monitoring tool? A Cluster is by definition a setup of configurations to maintain the operation.
IS-907 Java EE World Wide Web - Overview. World Wide Web - History Tim Berners-Lee, CERN, 1990 Enable researchers to share information: Remote Access.
Web Design and Development. World Wide Web  World Wide Web (WWW or W3), collection of globally distributed text and multimedia documents and files 
ClearQuest XML Server with ClearCase Integration Northwest Rational User’s Group February 22, 2007 Frank Scholz Casey Stewart
Linux Operations and Administration
K. Harrison CERN, 22nd September 2004 GANGA: ADA USER INTERFACE - Ganga release status - Job-Options Editor - Python support for AJDL - Job Builder - Python.
SAN DIEGO SUPERCOMPUTER CENTER Welcome to the 2nd Inca Workshop Sponsored by the NSF September 4 & 5, 2008 Presenters: Shava Smallen
UNICOS LHCLoggingDB Josef Hofer EN/ICE/SCD. Agenda The LHC Logging Database Purpose of the LHCLogging component Basic concepts Advanced concepts Logging.
4000 Imaje 4020 – Software Imaje 4020 – Content ■ Content of Chapter Software: 1. Flash Up 2. Netcenter 3. FTP 4. Active X 5. XCL commands 6. Exercise.
Leveraging Web Content Management in SharePoint 2013 Christina Wheeler.
Text2PTO: Modernizing Patent Application Filing A Proposal for Submitting Text Applications to the USPTO.
Maintaining and Updating Windows Server 2008 Lesson 8.
A System for Monitoring and Management of Computational Grids Warren Smith Computer Sciences Corporation NASA Ames Research Center.
© CGI Group Inc. User Guide Subversion client TortoiseSVN.
111 State Management Beginning ASP.NET in C# and VB Chapter 4 Pages
Module 4: Troubleshooting Web Servers. Overview Use IIS 7.0 troubleshooting features to gather troubleshooting information Use the Runtime Control and.
Netscape Application Server
Use of Nagios in Central European ROC
z/Ware 2.0 Technical Overview
Chapter 2: System Structures
NCAR-Developed Tools Bill Anderson and Marc Genty
Monitoring with Nagios
Source Code Management
Exploring the Power of EPDM Tasks - Working with and Developing Tasks in EPDM By: Marc Young XLM Solutions
Getting Started With Solr
Presentation transcript:

Real World Uses for Nagios APIs Janice Singh

Agenda This presentation describes the Nagios 4 APIs and how the NASA Advanced Supercomputing at Ames Research Center is employing them to upgrade its graphical status display (the HUD) and explain why it’s worth trying to use them yourselves. Janice S Singh –

The HUD: Visualization of the Center Status Janice S Singh –

Monitored Resources Pleiades –11,176-node SGI ICE supercluster –184,800 cores (plus 32,768 GPU cores) Frontend systems Hyperwall visualization cluster Tape Storage - pDMF cluster NFS servers for /home on computing systems Lustre scratch filesystems with multiple servers PBS (Portable Batch System) job scheduler Ref: Janice S Singh –

Nagios 4 Application Programming Interface No additional setup required Returns JSON output – multi-language support … Three kinds of APIs –Archive –Object –Status Run from the cgi-bin directory Each of the APIs have a help query –domain.com/nagios/cgi-bin/statusjson.cgi?query=help –Also gives help if there is an error in the query Janice S Singh –

JSON example bin/objectjson.cgi?query=hostgroup&hostgroup=tools "data": { "hostgroup": { "group_name": "tools", "alias": "Tools Group", "members": [ "lamsdb", "lamsweb", "lnxsrv107", "nasrunner", "remedy", "reports" ], "notes": "", "notes_url": "", "action_url": "" } } Janice S Singh –

Original Data Flow network firewall (The Enclave) nsca nrpe ssh Cluster Compute Node Remote Node Web Server nagios nsc a nagios.cmd datagg nagios2.cmd nagios nagios web interface HUD orange - pipe file green - text file purple - web site HUD buffer nrpe Dedicated Nagios Node Janice Singh - HUD format downtime.log

Nagios 4 Benefits Upgrading simplified configuration file –Frequent system configuration changes –Error prone –Time consuming Was one file: 17,835 lines; now 23 files: 9,121 lines Majority of the cleanup was using hostgroups APIs eliminate datagg configuration file Janice S Singh –

Modified Data Flow network firewall (The Enclave) nrdp nrpe ssh Cluster Compute Node Remote Node Web Server nagios nagpopd nagios nagios web interface HUD green - flat file purple - web site HUD buffer nrpe Dedicated Nagios Node Janice Singh -

Data Transfer with NRDP vs NSCA Only using one pipe allows use of nrdp Removing datagg layer allows using nagios as it was intended nrdp’s larger file transfer simplifies process –Previously had to split/reassemble –Kernel limit may cause split/reassemble No longer need to overload the perfdata Janice S Singh –

API Type - Archive Gives historical information based on var/archives –Availability –Alerts –Notifications Based on timestamps that you give it bin/archivejson.cgi?query=availability&availabilityob jecttype=hosts& hostname=pbspl233b&starttime= & endtime=-0 Janice S Singh –

API Type - Object Mirrors what your nagios configuration is Hosts Services Contacts Commands Dependencies etc. bin/objectjson.cgi?query=hostgroup&hostgroup=tool s Janice S Singh –

API Type - Status Gives the current state of nagios checks Host Service Comment Downtime bin/statusjson.cgi?query=hostlist&formatoptions=en umerate& hostgroup=tools Janice S Singh –

Status API Post Processing The API return codes are different than nagios nagpopd converts for HUD Janice S Singh – Status Code (From Nagios To Hud): Pending: 1 => 6 Ok: 2 => 0 Warning: 4 => 1 Unknown: 8 => 3 Critical: 16 => 2

API GUI Tool Tool to figure out the variables for the APIs Display builds the query –Dropdowns provide only relevant variables –Displays and executes the query –Displays the resulting JSON –Hovering over the input gives you help tips domain.com/nagios/jsonquery.html Janice S Singh –

API GUI Tool Screenshot Janice S Singh –

API GUI Tool Hover Example Janice S Singh –

NAS Use of APIs nagpopd –datagg replacement –API for object model –API for status Scheduled downtime handling Janice S Singh –

Using API for nagpopd Uses objectJSON: Get the structure directly from the API Eliminates separate HUD config file –Duplicate effort –Human errors –Inertia (resist making changes) HUD configuration put into nagios config HUD content uses custom variables Janice S Singh –

NAS Local Process (nagpopd) Prepares HUD interfacing file: Object Model –Loaded at startup from API queries –Perl, but could be any OO language –Can apply to other processing needs –Specific processing via Service subclassing Some objects created from custom variables –Some hosts form Domains –MultiServiceGroup for shared filesystem servers Janice S Singh –

Object Model NII System:: Main System:: Config System:: Encode System:: Log System:: Query System:: Service2Object Objects:: Domain Objects:: Host Objects:: HostGroup Objects:: MultiServiceGroup Objects:: Service Objects:: A_Service Objects:: B_Service Objects:: Z_Service …

API Queries Object JSON used on startup to create the layout: –objectjson.cgi?query=hostlist&details=true –objectjson.cgi?query=hostgrouplist&details=true –objectjson.cgi?query=servicelist&details=true –objectjson.cgi?query=servicegrouplist&details=true Status JSON queried in a loop to get latest data –statusjson.cgi?query=servicelist&details=true Janice S Singh –

Processing Status Information Generic Service object: –Default process ::setStatus (no changes) –Default output ::writeHUDb (reformat for HUD) –Other output methods easily added ::writeJSON (planned) ::writeHTML (later version) others: MySQL commands, etc Service Subclass overrides methods: –Handles service unique process or output –One array maps service name to object.pm Janice S Singh –

Scheduled Downtime Handling Old solution edited downtime.log When host is down, nagios stops checking it Used to sync with external program (schedule) … –Previous solution required shadow host pleiades – actual host could be down Pleiades – shadow never down –Now able to use APIs… Janice S Singh – Host_a host_a

External Program Use External program (command line interface) $ schedule all ALEX 10/06/ :00-10:25 10/06/2014 Raid Maintenance SUSAN 10/06/ :00-10:25 10/06/2014 RAID maintenance REMEDY 10/06/ :30-12:40 10/06/2014 Restart to resolve issue. $ query=downtimelist&formatoptions=enumerate& details=true Merges and updates nagios downtimelist … Janice S Singh –

Updating downtimelist Use nagios external command feature –SCHEDULE_HOST_DOWNTIME; ; ; ; ; ; ; ; –SCHEDULE_HOST_DOWNTIME;pioneer; ; ;1;0;7200;janice;just a test Documentation described in: mandlist.php mandlist.php Janice S Singh –

Hiccups Fixed by Nagios support Custom variables didn’t show up in JSON output Percent signs broke the JSON … sometimes fatally JSON output was limited to 8k Newlines didn’t show up in output Janice S Singh –

Hiccups We have one plugin that outputs so much data it can’t be passed on the command line, so nrdp breaks. –Kernel limitation –Will have to send in packets Having to have nsca and nrdp work at the same time Janice S Singh –

Future Plans AJAX-style updates to only update the part of the page that needs it Use the other information we get from the APIs –When a service is acknowledged –Use archive data to display alerts based on trends Janice S Singh –

Conclusion Using nagios 4 APIs has made our process much easier and will do more so in the future Simplified configurations Enabled object model Improved the flow Can communicate with external processes Good customer support Janice S Singh –

Questions? Janice S Singh –

Thank You Janice Singh