Published by Gordon Knight. Modified over 8 years ago.
1
INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org
Experience with monitoring of Prague T2 site
Tomáš Kouba
NEC 2007, Varna, Bulgaria, 11.09.2007
2
Introduction
2 sites in Prague are involved in the EGEE and WLCG projects:
–farm golias (praguelcg2): about 150 nodes with 450 cores; CE, 2xSE, many WNs, PBSPro head node
–farm skurut (prague_cesnet_lcg2): CE, SE, regional BDII
–core infrastructure for VOs voce and auger: LFC catalogue, VOMS server, lcg RB, glite WMS
Flexible monitoring is crucial for reliability (see http://gridview.cern.ch)
3
Nagios I
Why
–de facto standard in monitoring
–open source
–easy to write new sensors
–static configuration is not a problem
Competitors
–ganglia
–cacti
–zenoss (built on ZOPE)
–zabbix (rapid development, graphing features, less robust)
–moodss
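To illustrate why new sensors are easy to write: a Nagios plugin is just a program that prints one status line and returns an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below is not one of the site's plugins; the check name, the `evaluate` and `main` functions, and the command-line layout are hypothetical.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(queued_jobs, warn, crit):
    """Map a measured value onto a Nagios state and a one-line message."""
    if queued_jobs is None:
        return UNKNOWN, "JOBS UNKNOWN - could not read queue length"
    if queued_jobs >= crit:
        state = CRITICAL
    elif queued_jobs >= warn:
        state = WARNING
    else:
        state = OK
    # Text after '|' is performance data in the form label=value;warn;crit
    message = "JOBS %s - %d jobs queued | queued=%d;%d;%d" % (
        STATE_NAMES[state], queued_jobs, queued_jobs, warn, crit)
    return state, message


def main(argv):
    """Hypothetical command line: <script> <queued> <warn> <crit>."""
    try:
        queued, warn, crit = (int(arg) for arg in argv[1:4])
    except ValueError:
        # Raised for non-numeric arguments and for too few arguments.
        print("JOBS UNKNOWN - usage: <queued> <warn> <crit>")
        return UNKNOWN
    state, message = evaluate(queued, warn, crit)
    print(message)
    return state

# A real plugin would finish with: sys.exit(main(sys.argv))
```

A real sensor would obtain the measured value itself (for example by querying the batch system) instead of taking it as an argument; it is passed in here only to keep the sketch self-contained.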
4
Nagios II - addons
Addons:
–best source is www.nagiosexchange.org
–nuvola (better html and css look for nagios)
–nagiosql (web frontend for generating configuration from an sql db)
–NagiosReport (developed locally; summarizes problems at the site using information taken from nagios log files and status files)
–NagiosGrapher (generates graphs from nagios plugins' output)
5
NagiosReport example
Nagios summary generated at 09/11/2007 00:10:02 in 0.828486 seconds.
================================================================================
Hosts in trouble
Hosts in downtime (not monitored): golias01, golias15, golias16, golias17, golias59, golias97, golias99
================================================================================
downtimes:
==========
golias01: Host is temporarily out of service due to missing parts. It may be repaired.
golias15: Machine is unavailable because it serves as a remote syslog server
golias16: Machine temporarily unavailable due to its move to another network - testing of a Cisco router
golias17: Host taken out of service for testing gLite 3.1 on SLC4
golias59: Taken out of service for spare parts
golias97: The node currently does not exist.
golias99: The node currently does not exist.
================================================================================
Hosts with problem occured in last 8 hours
================================================================================
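A summary like this can be derived from the Nagios status file. The following is a minimal sketch of the idea, not the NagiosReport code: it assumes the `hoststatus { key=value ... }` block layout with the `current_state` and `scheduled_downtime_depth` fields, and the sample data and function names are hypothetical.

```python
import re

# Hypothetical excerpt in the block layout assumed for the status file.
SAMPLE_STATUS = """
hoststatus {
    host_name=golias01
    current_state=1
    scheduled_downtime_depth=1
    }

hoststatus {
    host_name=golias02
    current_state=0
    scheduled_downtime_depth=0
    }

hoststatus {
    host_name=golias03
    current_state=1
    scheduled_downtime_depth=0
    }
"""

# Simplification: assumes no field value contains a closing brace.
BLOCK_RE = re.compile(r"(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_blocks(text):
    """Return a list of (block_type, {key: value}) tuples."""
    blocks = []
    for block_type, body in BLOCK_RE.findall(text):
        fields = {}
        for line in body.splitlines():
            line = line.strip()
            if "=" in line:
                key, value = line.split("=", 1)
                fields[key] = value
        blocks.append((block_type, fields))
    return blocks


def summarize(text):
    """Split hosts into 'in downtime' and 'in trouble' lists."""
    downtime, trouble = [], []
    for block_type, fields in parse_blocks(text):
        if block_type != "hoststatus":
            continue
        if int(fields.get("scheduled_downtime_depth", "0")) > 0:
            downtime.append(fields["host_name"])
        elif int(fields.get("current_state", "0")) != 0:
            trouble.append(fields["host_name"])
    return downtime, trouble


downtime_hosts, trouble_hosts = summarize(SAMPLE_STATUS)
print("Hosts in trouble: " + ", ".join(trouble_hosts))
print("Hosts in downtime (not monitored): " + ", ".join(downtime_hosts))
```

Hosts in scheduled downtime are reported separately so that known outages do not clutter the list of real problems.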
6
Nagios III
Plugins
–default plugins (part of nagios default installation)
  ping, disk, procs, load, tcp, swap, ldap, gentoo_glsa, gentoo_service_rc_all
–SRCE plugins (developed by Emir Imamagic)
  cert, dpm, dpns, edg_broker, globus_gram2, gridftp, lfc, srm, srm_ping, voms
  executed in SLC3 UI installation in chroot environment
–RAL plugins
  lcg_same (by Chris Brew)
–locally developed
  hpacucli, ups, jobs, gstat
7
Generating configuration I
We have a hardware database for hw management
Extended with service definitions and hw-service relations
The interface to the database is defined in WSDL
8
Generating configuration II
python client for generating the actual configuration
–uses ZSI python SOAP bindings (older bindings pySOAP, SOAPpy etc. were not sufficient)
technically, the WSDL file is generated from a C header file (much easier for the developer than writing WSDL by hand)
the project's home page is at www.nagiosexchange.com under the name SiteQuery
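The generation step itself can be sketched as follows. This is not the SiteQuery client: the SOAP query is replaced by a hard-coded list, and the record layout, host names, addresses and `check_<service>` command names are all hypothetical. Only the `define host` / `define service` object syntax and directive names come from Nagios.

```python
# Hypothetical records, standing in for what the database query would return.
HOSTS = [
    {"name": "ce.example.org", "address": "192.0.2.10", "services": ["ping", "gram"]},
    {"name": "se.example.org", "address": "192.0.2.11", "services": ["ping"]},
]


def render_object(kind, directives):
    """Render one Nagios object definition from (directive, value) pairs."""
    lines = ["define %s {" % kind]
    for key, value in directives:
        lines.append("    %-24s%s" % (key, value))
    lines.append("}")
    return "\n".join(lines)


def generate_config(hosts):
    """Emit a host object per record and a service object per hw-service relation."""
    objects = []
    for host in hosts:
        objects.append(render_object("host", [
            ("use", "generic-host"),          # template assumed to exist
            ("host_name", host["name"]),
            ("address", host["address"]),
        ]))
        for service in host["services"]:
            objects.append(render_object("service", [
                ("use", "generic-service"),   # template assumed to exist
                ("host_name", host["name"]),
                ("service_description", service),
                ("check_command", "check_" + service),  # hypothetical naming rule
            ]))
    return "\n\n".join(objects) + "\n"


print(generate_config(HOSTS))
```

Because the configuration is regenerated from the database, the static nature of Nagios configuration files stops being a maintenance burden: adding a machine or a service is a database change followed by a rerun of the generator.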
9
Future work
Represent dependencies in the database to limit the number of false alerts
Create additional sensors for our hardware (air conditioning unit, diesel power unit)
Coordinate our work with the monitoring group of LCG/Hepix at http://www.sysadmin.hep.ac.uk/
Present another client for generating cfengine/quattor configuration
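One possible way to turn stored dependencies into fewer false alerts is to emit Nagios `servicedependency` objects, which suppress notifications for a dependent service while the service it relies on is failing. The sketch below only illustrates that idea under assumed inputs; the host and service names and the function name are hypothetical, and the slides do not say how the dependencies will actually be implemented.

```python
def render_dependency(master, dependent, criteria="w,u,c"):
    """Render one servicedependency object.

    master and dependent are (host_name, service_description) pairs.
    criteria lists the master states (w=warning, u=unknown, c=critical)
    in which notifications for the dependent service are suppressed.
    """
    master_host, master_service = master
    dependent_host, dependent_service = dependent
    directives = [
        ("host_name", master_host),
        ("service_description", master_service),
        ("dependent_host_name", dependent_host),
        ("dependent_service_description", dependent_service),
        ("notification_failure_criteria", criteria),
    ]
    lines = ["define servicedependency {"]
    for key, value in directives:
        lines.append("    %-32s%s" % (key, value))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical relation: job checks on the CE depend on the SE's srm service.
print(render_dependency(("se.example.org", "srm"), ("ce.example.org", "jobs")))
```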