INFNGRID Monitoring Group report

1 INFNGRID Monitoring Group report
Roberto Barbera (INFN Catania), Paolo Lo Re (INFN Napoli), Giuseppe Sava (INFN Catania), Gennaro Tortone (INFN Napoli)
Napoli, November 2002

2 Monitoring of grid elements (1/2)
Grid elements shown in the diagram: Computing Element, Resource Broker, Storage Element, Worker Nodes, Replica Catalog, Information Index.

LOW LEVEL measurements:
- CPU load
- memory usage
- disk usage (per partition)
- network activity
- number of processes
- number of users (UI)

SERVICE checks:
- gatekeeper
- gsiftp
- gris
- gdmp
- RB/LB

“GRID” measurements:
- number of total CPUs
- number of free CPUs
- number of running jobs
- number of waiting jobs
- SE free disk space

3 Monitoring of grid elements (2/2)
Sources of information:
- LOW LEVEL measurements -> plugins/sensors installed on each machine
- SERVICE checks -> sensors installed on the monitoring server
- GRID measurements -> sensors installed on the monitoring server

Aggregate information:
- per VO
- per site
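The per-VO and per-site aggregation can be illustrated with a minimal Python sketch. The slides do not say how the aggregation is implemented, so the record layout, the site and VO names, and the `aggregate` helper below are all hypothetical; only the four metric names come from the “GRID” measurements listed earlier.

```python
from collections import defaultdict

# The four "GRID" measurements that can be summed across Computing Elements.
METRICS = ("total_cpus", "free_cpus", "running_jobs", "waiting_jobs")

# Hypothetical samples, one per Computing Element and VO (illustrative values).
SAMPLES = [
    {"site": "site-a", "vo": "vo1", "total_cpus": 20, "free_cpus": 5,
     "running_jobs": 15, "waiting_jobs": 2},
    {"site": "site-a", "vo": "vo2", "total_cpus": 10, "free_cpus": 10,
     "running_jobs": 0, "waiting_jobs": 0},
    {"site": "site-b", "vo": "vo1", "total_cpus": 8, "free_cpus": 1,
     "running_jobs": 7, "waiting_jobs": 4},
]


def aggregate(samples, key):
    """Sum every metric over all samples that share the same `key` value."""
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for sample in samples:
        for metric in METRICS:
            totals[sample[key]][metric] += sample[metric]
    return dict(totals)


per_site = aggregate(SAMPLES, "site")
per_vo = aggregate(SAMPLES, "vo")
```

Calling `aggregate` once with `"site"` and once with `"vo"` yields the two views named on the slide from the same set of samples.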

4 The idea…
The idea was, and still is, to use Nagios:
- to view a “snapshot” of the GRID/Testbed resources status, services availability and network measurements;
- to receive notifications on host or service faults;
- to view graphs of resource monitoring results or network measurements.

Nagios was the “official choice” of the INFN Testbed Technical Board for monitoring INFN Testbed 1.

5 Nagios (1/2)
Nagios is an open-source network monitoring tool developed by Ethan Galstad and designed to run under Linux. Some of its features include:
- a simple plugin design that allows users to easily develop their own service checks
- monitoring of network services (FTP, HTTP, SSH, …)
- monitoring of host resources (CPU load, disk usage, …)
- the ability to define a network host (or device) “hierarchy” using “parent” hosts, which allows Nagios to distinguish between hosts that are down and those that are unreachable
- distributed monitoring: a “central Nagios server” obtains check results from one or more “Nagios distributed servers”
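To give a feel for the plugin design, here is a minimal sketch of a disk-usage check in Python. It relies on the standard Nagios plugin convention: one line of text on standard output and an exit status of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The thresholds, the message wording and the function names are assumptions made for illustration, not the group's actual plugins.

```python
import shutil
import sys

# Exit statuses defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(used_pct, warn=80.0, crit=90.0):
    """Map a usage percentage to a status (thresholds are illustrative)."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (status, message) for the partition that contains `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    used_pct = 100.0 * usage.used / usage.total
    status = classify(used_pct, warn, crit)
    # Text before '|' is for humans; text after it is performance data.
    message = (f"DISK {LABELS[status]} - {used_pct:.1f}% used on {path}"
               f" | used_pct={used_pct:.1f}%;{warn};{crit}")
    return status, message


def main(argv):
    """Print the status line and return the exit status for Nagios."""
    status, message = check_disk(argv[1] if len(argv) > 1 else "/")
    print(message)
    return status

# A real plugin script would end with: sys.exit(main(sys.argv))
```

Nagios only looks at the exit status and the printed line, so a check like this can be written in any language that can set a process exit code.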

6 Nagios (2/2)
- contact notifications when service or host problems occur (via […] or a user-defined method)
- the ability to define event handlers to be run during service or host events for “proactive” problem resolution
- a logging mechanism and automatic log-file rotation
- optional plugins to send SNMP queries to hosts or network devices (routers, switches, …)
- a web interface for viewing the current network status, notifications, problem history, log file, …

7 Addons
Addons developed by the INFNGRID monitoring group:
- graphs of resource monitoring results: we have developed a “wrapper” that parses the output of a plugin execution and inserts the monitoring values into an RRD (Round Robin Database). From the Nagios web interface, a user can view daily, weekly, monthly or yearly graphs for a selected resource.
- “LDAP based” plugins: another thread of development is the implementation of plugins that “pull” information from an MDS server rather than from the resources themselves.
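The wrapper idea can be sketched as follows. The slides do not describe the plugin output format or how the wrapper writes to the RRD, so this example makes two assumptions: that numeric values appear as `label=value` pairs after a `|` in the plugin output (the Nagios performance-data convention), and that the values are stored through the `rrdtool update` command. The function names and the `load.rrd` file name are hypothetical.

```python
import re

# Matches "label=number" pairs; any unit or threshold suffix is ignored.
PERF_RE = re.compile(r"(?P<label>[^\s=]+)=(?P<value>-?\d+(?:\.\d+)?)")


def parse_plugin_output(line):
    """Split one plugin output line into (status text, {label: value})."""
    text, _, perf = line.partition("|")
    values = {m.group("label"): float(m.group("value"))
              for m in PERF_RE.finditer(perf)}
    return text.strip(), values


def rrd_update_args(rrd_file, timestamp, values, order):
    """Build the argument list for `rrdtool update`, without running it.

    `order` lists the labels in the order of the RRD's data sources.
    """
    data = ":".join(str(values[name]) for name in order)
    return ["rrdtool", "update", rrd_file, f"{timestamp}:{data}"]


text, values = parse_plugin_output(
    "LOAD OK - load average: 0.42 | load1=0.42 load5=0.30")
args = rrd_update_args("load.rrd", 1037000000, values, ["load1", "load5"])
```

The resulting list could be handed to `subprocess.run` on a machine where `rrdtool` is installed and the RRD file has already been created with matching data sources.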

8 News
Nagios and a new web interface were used during the WorldGRID demos at:
- SuperComputing 2002 (Baltimore)
- IST 2002 (Copenhagen)

WorldGRID is an intercontinental testbed between the US and the EU (between the iVDGL-Trillium and DataTAG(EDT)-EDG projects).

9 And now…
… a short demo!

