CHEP 2000, 10.02.2000. Roberto Barbera (*): Grid monitoring with NAGIOS. WP3-INFN Meeting, Naples, 29.11.2002. (*) Work in collaboration with P. Lo Re, G. Sava and G. Tortone.

Presentation transcript:

1 Grid monitoring with NAGIOS
Roberto Barbera (*)
WP3-INFN Meeting, Naples
(*) Work in collaboration with P. Lo Re, G. Sava and G. Tortone

2 Outline
Roberto Barbera, Dipartimento di Fisica dell’Università di Catania and INFN Catania - Italy, ALICE Collaboration. WP3-INFN Meeting, Naples.
- Basic concepts for a distributed monitoring system
- The INFN choice: Nagios
- Role of Nagios for Grid monitoring
- INFN developments
- Present status of the INFN testbed monitoring system (live demo)

3 Basic concepts (goals)
- In the era of GRID computing, farm (LAN) monitoring, fabric (WAN) monitoring, and job monitoring are three faces of the same problem. The system for all of them should be the same, or at least share the same front-end.
- The system must scale up to O(10^3-10^4) nodes and O(10^2) sites.
- The system should be independent of the nature of the parameters to be monitored and should behave in the same way for all of them.
- The system should not depend on a given information service. The front-end must be unique, while the back-ends should be as many as possible (both ways).
- The system must have a “common” (web) user interface and must be “secure”.
- The system must be easy to install, configure and maintain.

4 The INFN choice: Nagios (1)
Nagios is (not only) an open-source network monitoring tool developed by Ethan Galstad and designed to run under Linux (although it is known to have been ported to many Unix flavours). Some of its features include:
- a simple plugin design that allows users to easily develop their own service checks
- monitoring of network services (FTP, HTTP, SSH, …)
- monitoring of host resources (CPU load/temperature, disk usage, …)
- monitoring of job status (it is just a question of the right plugin)
- ability to define a network host (or device) “hierarchy” using “parent” hosts, which allows Nagios to distinguish hosts that are down from hosts that are merely unreachable
- distributed monitoring: a “central Nagios server” obtains check results from one or more “Nagios distributed servers”
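The down/unreachable distinction made possible by “parent” hosts can be illustrated with a minimal sketch. It is not Nagios code: the function name, the single-parent tree and the host names are assumptions for illustration (Nagios itself allows several parents per host).

```python
from typing import Dict, Optional, Set


def classify_hosts(parents: Dict[str, Optional[str]],
                   responding: Set[str]) -> Dict[str, str]:
    """Label each host UP, DOWN or UNREACHABLE.

    parents maps a host to its parent (None = directly attached to the
    monitoring server); responding is the set of hosts that answered a check.
    Assumes the parent map is a tree (no cycles).
    """
    states: Dict[str, str] = {}

    def state_of(host: str) -> str:
        if host in states:
            return states[host]
        if host in responding:
            states[host] = "UP"
        else:
            parent = parents.get(host)
            # A silent host is DOWN only if the path leading to it works;
            # otherwise nothing can be said about it: it is UNREACHABLE.
            if parent is None or state_of(parent) == "UP":
                states[host] = "DOWN"
            else:
                states[host] = "UNREACHABLE"
        return states[host]

    for host in parents:
        state_of(host)
    return states


# Hypothetical site: monitoring server -> router -> switch -> worker nodes.
PARENTS = {"router": None, "switch": "router", "wn01": "switch", "wn02": "switch"}
result = classify_hosts(PARENTS, responding={"router"})
```

In this example only the switch is reported as DOWN; the two worker nodes behind it are UNREACHABLE, so a single fault does not show up as three.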

6 [Figure: active checks and passive checks]

7 The INFN choice: Nagios (2)
- contact notifications when service or host problems occur (via e-mail or a user-defined method)
- ability to define event handlers to be run during service or host events for “proactive” problem resolution
- logging mechanism and automatic log-file rotation
- optional plugins to send SNMP queries to hosts or network devices (routers, switches, …)
- web interface for viewing current network status, notifications, problem history, log file, …
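A rough sketch of the decision logic inside an event handler follows. Nagios passes the handler the service state, the state type (SOFT while the check is still being retried, HARD once the retries are exhausted) and the attempt number through its `$SERVICESTATE$`, `$SERVICESTATETYPE$` and `$SERVICEATTEMPT$` macros. The function name, the returned action strings and the choice of the third attempt are assumptions made for illustration.

```python
def event_handler_action(state: str, state_type: str, attempt: int,
                         soft_attempt_to_act: int = 3) -> str:
    """Decide what a (hypothetical) handler does for one service event.

    The arguments mirror the values Nagios would supply via the
    $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ macros.
    """
    if state != "CRITICAL":
        return "none"          # nothing to repair
    if state_type == "SOFT" and attempt == soft_attempt_to_act:
        return "restart"       # try a fix while Nagios is still retrying
    if state_type == "HARD":
        return "restart"       # last resort: the problem is confirmed
    return "none"              # early soft failures: wait for more retries
```

Acting on a late SOFT attempt is what makes the handler “proactive”: notifications are sent only once a problem reaches the HARD state, so a successful restart at that point means nobody is paged.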

8 Role of Nagios for Grid monitoring
The idea is to use Nagios:
- to view a “snapshot” of the GRID/Testbed resource status, service availability, network measurements (and job status)
- to receive notifications on host or service (or job) faults
- to view graphs of resource status, network measurements and job status as a function of time

9 Interesting features of Nagios for GRID monitoring (1)
- notifications: it is possible to define group(s) of users (site admins or production managers) to notify when a service (or a host, or a job) is in a critical state;
- event handlers: these are optional commands that are executed whenever a host or service state change occurs; an obvious use of event handlers is to let Nagios proactively fix problems before anyone is notified; another use is to log service or host events to an external database;
- plugin architecture: Nagios does not include any internal mechanism to check the status of services (or hosts, or jobs); instead, it relies on external programs (plugins) to do all the monitoring activity; this allows users to easily develop their own service checks.
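To make the plugin architecture concrete, here is a minimal sketch of a disk-usage check. It follows the standard plugin convention: one line of text on standard output plus an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN), with optional performance data after a `|`. The function names, the thresholds and the `used_pct` label are assumptions chosen for illustration, not one of the plugins used on the INFN testbed.

```python
import shutil
from typing import Tuple

# Exit codes defined by the plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct: float, warn: float, crit: float) -> int:
    """Map a measured value onto a plugin return code."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk(path: str = "/", warn: float = 80.0,
               crit: float = 90.0) -> Tuple[int, str]:
    """Return (exit code, output line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    used_pct = 100.0 * usage.used / usage.total
    code = evaluate(used_pct, warn, crit)
    line = (f"DISK {LABELS[code]} - {used_pct:.1f}% used on {path}"
            f" | used_pct={used_pct:.1f}%;{warn};{crit}")
    return code, line


def main() -> int:
    code, line = check_disk()
    print(line)   # the text Nagios shows in its web interface
    return code   # the code Nagios uses to set the service state
```

To run it as a real plugin, the script would end with `sys.exit(main())` (after `import sys`), so that the return code reaches Nagios as the process exit status.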

10 Interesting features of Nagios for GRID monitoring (2)
- remote service checks (NRPE addon): this addon provides a way to execute plugins on a remote host. The check_nrpe plugin runs on the Nagios server and sends plugin execution requests to the NRPE agent on the remote host. The NRPE agent then runs the appropriate plugin on the remote host and returns the plugin output and return code to the check_nrpe plugin on the Nagios server. The check_nrpe plugin then passes the remote plugin's output and return code back to Nagios as if they were its own. All data in transit are encrypted with TripleDES;
- passive checks: Nagios can process service check results that are submitted by remote hosts through a daemon that runs on the Nagios server and a client that is executed on the remote hosts.
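A passive result ultimately reaches Nagios as an external command of the form `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output`. The sketch below only builds that line; how the daemon and client mentioned above transport it is not shown. The function name and the host and service names in the usage example are hypothetical.

```python
import time
from typing import Optional


def passive_result_command(host: str, service: str, return_code: int,
                           output: str, timestamp: Optional[int] = None) -> str:
    """Format one passive service check result as a Nagios external command."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return code must be 0 (OK), 1 (WARNING), "
                         "2 (CRITICAL) or 3 (UNKNOWN)")
    ts = int(time.time()) if timestamp is None else timestamp
    # An external command occupies exactly one line, so flatten the output.
    clean = output.replace("\n", " ").strip()
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{clean}"


# Hypothetical worker node reporting a full disk.
command = passive_result_command("wn01", "disk", 2, "DISK CRITICAL\n",
                                 timestamp=1038528000)
```

The host and service names must match an existing service definition on the Nagios server, otherwise the result cannot be attributed to anything.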

11 Interesting features of Nagios for GRID monitoring (3)
- distributed monitoring (scalability): a possible use of Nagios is to install one Nagios “sensor” (in a bare-bones configuration) at each site to collect monitoring results from the local resources, and one main Nagios “collector” (in full configuration) to collect “groups” of monitoring results from the sensors; this shows the “functionality overlap” that exists between the Nagios distributed architecture and the GIIS/MDS or R-GMA GRID information architectures.
[Figure: hosts at site A and site B send monitoring results to their site's Nagios sensor; both sensors report to the Nagios collector]
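The sensor/collector layout can be sketched as a small data structure: each site sensor forwards its latest results, and the collector keeps them grouped by site. This is an illustrative model only, not how Nagios stores state; the class name, the method names and the severity ranking are assumptions.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Rank return codes by severity: OK < WARNING < UNKNOWN < CRITICAL (assumed ordering).
SEVERITY = {0: 0, 1: 1, 3: 2, 2: 3}


class Collector:
    """Central node holding the latest result of every service, per site."""

    def __init__(self) -> None:
        self.latest: Dict[str, Dict[Tuple[str, str], int]] = defaultdict(dict)

    def submit(self, site: str, host: str, service: str, code: int) -> None:
        """Called when a site sensor forwards one check result."""
        self.latest[site][(host, service)] = code

    def site_summary(self) -> Dict[str, int]:
        """Worst current return code per site, e.g. to colour a site map."""
        return {site: max(results.values(), key=SEVERITY.__getitem__)
                for site, results in self.latest.items()}


# Two hypothetical sensors reporting to the collector.
collector = Collector()
collector.submit("site A", "wn01", "disk", 0)
collector.submit("site A", "wn02", "ssh", 2)
collector.submit("site B", "ce01", "load", 1)
summary = collector.site_summary()
```

Only results cross the site boundary; the checks themselves run locally at each site, which is what keeps the load on the collector proportional to the number of results rather than to the cost of the checks.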

12 INFN developments of Nagios
- clickable geographic maps
- graphs of resource (or network) monitoring results: we have developed a “wrapper” that parses the output of a plugin execution and inserts the monitoring values into an RRD (Round Robin Database). From the Nagios web interface, a user can view daily, weekly, monthly or yearly graphs for a selected resource/service
- “LDAP based” plugin: another thread of development is the implementation of a plugin that will “pull” (“push”) information from an MDS server, rather than from the resources/services themselves
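The idea behind such a wrapper can be sketched as follows. The INFN wrapper's actual parsing rules are not given in the slides; this sketch assumes the plugin prints its values as `label=value` pairs after a `|`, and it only builds the argument list for an `rrdtool update` call instead of running it. The function names and the file name are hypothetical.

```python
import re
from typing import Dict, List

# One "label=value" item; units and thresholds after the number are ignored.
PERF_ITEM = re.compile(r"([^\s=]+)=(-?\d+(?:\.\d+)?)")


def parse_perfdata(plugin_output: str) -> Dict[str, float]:
    """Extract numeric values from the part of the output after '|'."""
    _, _, perf = plugin_output.partition("|")
    return {label: float(value) for label, value in PERF_ITEM.findall(perf)}


def rrd_update_args(rrd_file: str, timestamp: int,
                    values: List[float]) -> List[str]:
    """Build the arguments for 'rrdtool update <file> <time>:<v1>[:<v2>...]'."""
    fields = ":".join(f"{v:g}" for v in values)
    return ["rrdtool", "update", rrd_file, f"{timestamp}:{fields}"]


# Hypothetical load check on a worker node.
values = parse_perfdata("LOAD OK - load average: 0.42, 0.30 | load1=0.42;5;10 load5=0.30;4;8")
args = rrd_update_args("wn01_load.rrd", 1038528000, list(values.values()))
```

The resulting list could be handed to `subprocess.run` on a machine where rrdtool is installed and the RRD file has already been created with matching data sources.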

13 Current situation
- Nagios is the “official choice” of the INFN Grid Project for monitoring INFN Testbed 1
- A collaboration with CNR on the use of Nagios for network and fabric monitoring is about to start
- At present a Nagios server is installed in Catania and checks approximately 130 services on about 35 hosts