EDT-WP4 monitoring group status report

1 EDT-WP4 monitoring group status report
Gennaro Tortone (INFN Napoli) DataTAG WP4 meeting Bologna – January 14, 2003

2 EDT monitoring group
Participants
- Sergio Andreozzi (INFN CNAF)
- Vincenzo Ciaschini (INFN CNAF)
- Sergio Fantinel (INFN Legnaro)
- Antonia Ghiselli (INFN CNAF)
- Gennaro Tortone (INFN Napoli)
- Cristina Vistoli (INFN CNAF)
Goal
- development of a Grid monitoring tool to monitor the overall functioning of the Grid. The software should enable Grid administrators to quickly identify problems and take appropriate action

3 Tasks
- identify the requirements for Grid monitoring
  done: Grid monitoring analysis draft [with some LCG inputs] (available on
- evaluation of existing monitoring tools (sensors) to use as "first monitoring layer" on each Grid element
  done: tools evaluated:
  Ganglia gmond
    - very easy to use
    - multicast based (to gather metrics in a farm)
    - no historical archive
    - some RPM dependencies

4 Tasks
  EDG-WP4 fabric-monitoring tool (fmon)
    - client-server model
    - very easy to use
    - very easy to install (one RPM, without dependencies)
    - highly customizable (time interval for each metric, …)
    - very easy to add a new metric
    - historical archive database in plain-text format
- extension of the WP4 fabric-monitoring tool (fmon) to include other monitoring metrics
  done (all metrics added are available on
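The two fmon properties listed above (a time interval per metric and a plain-text historical archive) can be illustrated with a small sketch. This is not fmon code or its configuration format: the function names, the metric names, and the one-line archive layout are all assumptions made for illustration.

```python
from typing import Dict, List


def due_metrics(intervals: Dict[str, int], last_run: Dict[str, float], now: float) -> List[str]:
    """Return the metrics whose sampling interval (in seconds) has elapsed.

    A metric that has never been sampled is always due.
    """
    return sorted(
        name
        for name, interval in intervals.items()
        if now - last_run.get(name, float("-inf")) >= interval
    )


def archive_line(timestamp: int, host: str, metric: str, value: float) -> str:
    """Format one sample as a plain-text record (hypothetical layout, not fmon's)."""
    return f"{timestamp} {host} {metric} {value}"


# Hypothetical per-metric intervals: load every 30 s, free disk space every 300 s.
INTERVALS = {"load1": 30, "disk_free": 300}
```

With both metrics last sampled at t=100, only `load1` is due at t=140, and both are due at t=400.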

5 Tasks
- GLUE schema extension to include all monitoring metrics
  done: "host level" added to GLUE schema
- development of information providers to fill the GLUE host-level extension
  in progress
- definition of database structure to store snapshot/historical monitoring data
  in progress
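The database structure itself is still being defined, so the following is only a sketch of one possible shape for it, using SQLite from the Python standard library. The table name, the column names, and the choice of SQLite are assumptions, not the group's design. It keeps every sample as history and derives the snapshot as the latest value per host and metric.

```python
import sqlite3
from typing import Dict, Tuple

# Hypothetical schema: one row per (host, metric, timestamp) sample.
SCHEMA = """
CREATE TABLE IF NOT EXISTS metric_history (
    host   TEXT    NOT NULL,
    metric TEXT    NOT NULL,
    ts     INTEGER NOT NULL,
    value  REAL    NOT NULL,
    PRIMARY KEY (host, metric, ts)
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record(conn: sqlite3.Connection, host: str, metric: str, ts: int, value: float) -> None:
    """Append one sample to the historical data."""
    conn.execute(
        "INSERT OR REPLACE INTO metric_history (host, metric, ts, value) VALUES (?, ?, ?, ?)",
        (host, metric, ts, value),
    )
    conn.commit()


def snapshot(conn: sqlite3.Connection) -> Dict[Tuple[str, str], float]:
    """Return the most recent value for every (host, metric) pair."""
    rows = conn.execute(
        """
        SELECT h.host, h.metric, h.value
        FROM metric_history AS h
        JOIN (
            SELECT host, metric, MAX(ts) AS max_ts
            FROM metric_history
            GROUP BY host, metric
        ) AS latest
          ON h.host = latest.host AND h.metric = latest.metric AND h.ts = latest.max_ts
        """
    ).fetchall()
    return {(host, metric): value for host, metric, value in rows}
```

Storing only the history and computing the snapshot on demand avoids keeping two copies in sync; a separate snapshot table would be a reasonable alternative if snapshot reads dominate.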

6 GRID monitoring architecture for LCG/EDT testbeds
[Figure: architecture diagram; author: G. Tortone, date: 18/12/2002]
Components shown in the diagram:
- web interface
- GIIS (GLUE schema)
- discovery service
- information index
- monitoring service
- monitoring server
- WP4 fmonserver
- GRIS (GLUE schema)
- information providers
- farm monitoring archive
- computing element
- worker node (shown twice): WP4 monitoring agent, WP4 sensor, /proc filesystem
Arrow labels: ldap query, write, run, read, ldif output, metric output
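The sensor and information-provider labels in the diagram (a sensor reading the /proc filesystem, an information provider producing LDIF output) can be sketched as follows. This is an illustration only, not the WP4 sensor or the actual information providers: the DN, the object class, and the attribute names are hypothetical and are not taken from the GLUE schema.

```python
from typing import Dict


def parse_loadavg(text: str) -> Dict[str, float]:
    """Parse the contents of /proc/loadavg into 1, 5 and 15 minute load averages."""
    fields = text.split()
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
    }


def read_loadavg(path: str = "/proc/loadavg") -> Dict[str, float]:
    """Sensor step: read the metric from the /proc filesystem (Linux only)."""
    with open(path) as handle:
        return parse_loadavg(handle.read())


def to_ldif(hostname: str, metrics: Dict[str, float]) -> str:
    """Information-provider step: render the metrics as one LDIF entry.

    The DN suffix, object class and "Example-" attribute names are
    placeholders, not real GLUE schema names.
    """
    lines = [
        f"dn: host={hostname},o=example",
        "objectClass: ExampleHostLoad",
    ]
    for name in sorted(metrics):
        lines.append(f"Example-{name}: {metrics[name]:.2f}")
    return "\n".join(lines) + "\n"
```

On a Linux worker node, `print(to_ldif("wn01.example.org", read_loadavg()))` would emit one entry that an LDAP-based information service could then publish.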


8 Future activities
- "personal" Grid monitoring [integration with VOMS]
- job monitoring
- automatic resource discovery using MDS infrastructure and GLUE schema
- evaluation of OGSA as monitoring service
- development of a "Nagios based" Grid monitoring tool
    - scalability
    - very low intrusiveness
    - automatic resource discovery
    - fault detection and notification
    - metrics graphs
    - web interface
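For the "Nagios based" item, fault detection rests on the Nagios plugin convention: a check prints one status line and returns exit code 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below shows a load check in that style. The thresholds, the function names, and the choice of load average as the metric are assumptions made for illustration, not part of the planned tool.

```python
from typing import Tuple

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_load(load1: float, warn: float, crit: float) -> Tuple[int, str]:
    """Compare the 1-minute load with two thresholds; return (exit code, status line)."""
    if load1 >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif load1 >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after "|" is performance data (value;warn;crit), usable for metric graphs.
    perfdata = f"load1={load1:.2f};{warn:.2f};{crit:.2f}"
    return code, f"LOAD {label} - load1={load1:.2f}|{perfdata}"


def main(warn: float = 4.0, crit: float = 8.0) -> int:
    """Read /proc/loadavg, print the status line, and return the exit code."""
    try:
        with open("/proc/loadavg") as handle:
            load1 = float(handle.read().split()[0])
    except (OSError, ValueError, IndexError):
        print("LOAD UNKNOWN - cannot read /proc/loadavg")
        return UNKNOWN
    code, message = check_load(load1, warn, crit)
    print(message)
    return code
```

Run as a plugin, the script would end with `sys.exit(main())` so that Nagios sees the exit code and can trigger notification on WARNING or CRITICAL.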

