Monitoring a Control System Using Nagios
Ralph Lange, BESSY – Mauro Giacchini, LNL

Presentation transcript:

What is the Situation?

Machine Status vs. Controls Infrastructure Status

Machine status:
– usually handled in the Control Room by an operator
– uses the Alarm Handler or other EPICS tools
– based on Channel Access connections

Control System infrastructure can be comparably complex. Its status:
– needs to be handled outside the Control Room
– with tools that allow remote access
– using different types of connections/checks: ping, SNMP, HTTP, Channel Access, disk usage, ...

BESSY was starting to see an increasing number of failures due to ageing hardware.

One summer day Mauro (preparing an EPICS training in the hot Italian summer) asked me if I knew Nagios...

What is Nagios?

Nagios (“nah-ghee-ose”)
Open source monitoring framework
– widely used & actively developed
Host and service problem detection and recovery
Provides a wide set of basic plugins (checks)
– easy to develop custom plugins
Active vs. passive checks
Centralized vs. distributed deployment
– also allows redundant Nagios daemons
High configurability
– service dependencies, fine-grained notification options
Web interface
– status view, administration (e.g. analysis, downtime scheduling)

The Plugin (Check) Interface

Plugins (Checks)
Checks are command line programs that follow a convention for arguments, stdout output, and return code: nagiosplugins.org
– Output: one line of status info
– Return code: OK / WARNING / CRITICAL / UNKNOWN
Can be written in any (i.e. your favourite) compiled or interpreted language
Are configured into Nagios for local or remote execution

Passive Checks
An external application can write check results (following a certain format) into a file (or a pipe)
Nagios reads from this and accepts the results (if configured)
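
The plugin convention described above can be illustrated with a small sketch. This is not one of the checks from the talk; it is a hypothetical disk usage check, and the thresholds (80% / 90%), the function names and the status line wording are assumptions chosen for illustration. The exit codes 0 to 3 are the OK / WARNING / CRITICAL / UNKNOWN values of the plugin convention.

```python
"""Sketch of a Nagios-style check: one status line on stdout, state as exit code."""
import shutil
import sys

# Exit codes of the plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(percent_used, warn, crit):
    """Map a usage percentage to a plugin state (thresholds are inclusive)."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path, warn=80.0, crit=90.0):
    """Return (state, status_line) for the filesystem containing `path`.

    The default thresholds are illustrative assumptions.
    """
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    percent = 100.0 * usage.used / usage.total
    state = classify(percent, warn, crit)
    return state, f"DISK {LABELS[state]} - {path} at {percent:.1f}% used"


def main(argv):
    """Print the single status line and return the state to use as exit code."""
    target = argv[1] if len(argv) > 1 else "/"
    state, line = check_disk(target)
    print(line)
    return state

# When installed as a plugin, the script would end with: sys.exit(main(sys.argv))
```

Keeping the threshold logic in `classify` separate from the measurement makes it easy to reuse the same state mapping for other quantities (load, memory, file descriptors).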

Nagios + CA Plugin = NAL

Nagios Channel Access Plugins
caget type plugin (active check) by Mauro Giacchini (LNL)
camonitor type daemon (passive check) by Debby Quock (APS)
Integrate data available through CA into the Nagios monitoring framework
Can check the health of EPICS integrated VME crates, VME IOCs, soft IOCs, PLCs, CA gateways, CA archivers, ... as well as OPI machine and server health, disk status, network device status, NTP, DNS, web services etc.
Allows NAL (Nagios Alarm Handler) to be the central monitoring system for all control system infrastructure, whereas the ALH in the control room provides similar functionality for the controlled facility
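
As a rough illustration of the passive check path, the sketch below formats a result in the Nagios external command syntax (`PROCESS_SERVICE_CHECK_RESULT`) and appends it to the command file. It does not reproduce the camonitor daemon mentioned above: the mapping from EPICS alarm severity to Nagios state, the function names, and the host and service names in the usage note are assumptions, and the command file location depends on the local Nagios installation.

```python
"""Sketch: hand a passive service check result to Nagios."""
import time

# Assumed mapping from EPICS alarm severity to Nagios state
# (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
SEVERITY_TO_STATE = {"NO_ALARM": 0, "MINOR": 1, "MAJOR": 2, "INVALID": 3}


def format_passive_result(host, service, state, output, now=None):
    """Build one external command line for a passive service check result."""
    timestamp = int(time.time() if now is None else now)
    # The command is terminated by a newline, so keep the output on one line.
    clean = output.replace("\n", " ")
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{state};{clean}\n")


def submit(command_file, line):
    """Append the command to the Nagios command file (a regular file or pipe)."""
    with open(command_file, "a") as sink:
        sink.write(line)
```

A monitoring daemon could then call, for a hypothetical host `ioc-example-1` and service `CA heartbeat`, `submit(path, format_passive_result("ioc-example-1", "CA heartbeat", SEVERITY_TO_STATE["MAJOR"], "heartbeat PV in MAJOR alarm"))`, where `path` is the command file configured for the local Nagios instance. Host and service must already be defined in Nagios with passive checks enabled.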

Current Configuration at BESSY

Servers
– All machines: ping, disk usage, load, processes, users, SSH
– Some: DNS (foreign and internal addresses), NTP
vxWorks IOCs
– Ping, CPU load, memory usage, FD usage
Services
– Wikis, web server, help pages, issue trackers (Trac/Redmine), elog
– Oracle servers: Ping, ODB Telnet, ODB TNS for important DBs

=> 296 checks on 111 hosts

Screen Shots: Tactical Overview

Screen Shots: Service Detail

Screen Shots: Service Detail

Screen Shots: Availability Report

Screen Shots: Service Trends

Firefox/Thunderbird Plugin

Highly configurable, many filtering options
New alarm starts blinking and may play sound
Mouse-over opens a pop-up showing the current alarms
Clicking an alarm opens the related Nagios page in a tab

Experiences

Nagios is a very stable and reliable framework; configuration is flexible, and options and plugins are many
The off-control-room, web-based, notification-driven approach fits our controls group better than ALH
Manual configuration can be tedious; some parts could (should!) be generated from our RDB
Found some network problems, one running system clock, two disks filling up, and IOC load and memory saturation on a number of mv162s (which were replaced by mv2100s)
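
One way the configuration generation mentioned above might look is sketched here. Nothing in it is taken from the BESSY setup: the rows stand in for the result of a hypothetical RDB query, the host names and addresses are placeholders, and `generic-host` is assumed to be a host template that already exists in the local Nagios configuration.

```python
"""Sketch: render Nagios host definitions from (name, address) records."""

HOST_TEMPLATE = """define host {{
    use        {template}
    host_name  {name}
    address    {address}
}}
"""


def render_hosts(rows, template="generic-host"):
    """Return the text of an object configuration file, one host per row.

    `rows` is an iterable of (host_name, address) pairs, e.g. fetched from
    a database; `template` names an existing host template (an assumption).
    """
    blocks = [
        HOST_TEMPLATE.format(template=template, name=name, address=address)
        for name, address in sorted(rows)
    ]
    return "\n".join(blocks)


# Placeholder records standing in for a database query result.
EXAMPLE_ROWS = [
    ("ioc-example-2", "192.0.2.11"),
    ("ioc-example-1", "192.0.2.10"),
]
```

Calling `render_hosts(EXAMPLE_ROWS)` yields two `define host` blocks sorted by name; the text would be written to a file included from the main Nagios configuration, followed by a configuration check and reload. Service definitions could be generated the same way.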

Next Steps

To be configured:
– Soft IOCs, CA Gateways, VME crates (Wiener), Embedded Controllers
– NFS shares usage, switches/routers, printers
Checks to be written:
– Conserver (IOC console access)
– CA Archiver (through ArchiveManager web interface)
– CA access rights (based on cainfo)
Collaborate:
– Integrate CA check plugin development
– Agree on a common place for our plugins (APS? Sourceforge? Nagios?)

LivEPICS Example

Live Example: Mauro Giacchini's LivEPICS distribution includes Nagios 3.0 (configured to look at the EPICS Base example app channels)
Go check it out – now!