INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org Experience with monitoring of Prague T2 site Tomáš Kouba NEC 2007, Varna, Bulgaria 11.09.2007.


1 Experience with monitoring of Prague T2 site
Tomáš Kouba
NEC 2007, Varna, Bulgaria, 11.09.2007
Enabling Grids for E-sciencE (www.eu-egee.org), INFSO-RI-508833

2 Introduction
2 sites involved in the EGEE and WLCG projects in Prague:
– farm golias (praguelcg2)
  • about 150 nodes with 450 cores
  • CE, 2xSE, many WNs, PBSPro head node
– farm skurut (prague_cesnet_lcg2)
  • CE, SE, regional BDII
– core infrastructure for VOs voce and auger:
  • LFC catalogue, VOMS server, lcg RB, glite WMS
Flexible monitoring is crucial for reliability (see http://gridview.cern.ch)

3 Nagios I
Why
– de facto standard in monitoring
– open source
– easy to write new sensors
– static configuration is not a problem
Other competitors
– ganglia
– cacti
– zenoss (built on ZOPE)
– zabbix (rapid development, graphing features, less robust)
– moodss

4 Nagios II - addons
Addons:
– best source is www.nagiosexchange.org
– nuvola (better HTML and CSS look for nagios)
– nagiosql (web frontend for generating configuration from an SQL database)
– NagiosReport (developed locally; summarizes problems at the site using information taken from nagios log files and status files)
– NagiosGrapher (generates graphs from the output of nagios plugins)

5 NagiosReport example
Nagios summary generated at 09/11/2007 00:10:02 in 0.828486 seconds.
================================================================================
Hosts in trouble
Hosts in downtime (not monitored): golias01, golias15, golias16, golias17, golias59, golias97, golias99
================================================================================
downtimes:
==========
golias01: Host is temporarily out of service because of missing parts. It may be repaired.
golias15: Machine is unavailable because it serves as a remote syslog server
golias16: Machine temporarily unavailable because it was moved to another network - testing of a Cisco router
golias17: Host taken out of service for testing gLite 3.1 on SLC4
golias59: Taken out of service for spare parts
golias97: The node currently does not exist.
golias99: The node currently does not exist.
================================================================================
Hosts with problem occured in last 8 hours
================================================================================
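The slides do not show how NagiosReport is implemented. As a rough illustration of the idea of summarizing hosts from a status file, here is a minimal sketch. It assumes the status file consists of blocks of the form "type { key=value ... }" and that host blocks carry the fields host_name, current_state and scheduled_downtime_depth; those field names, the sample text, and the helper names parse_blocks and summarize are assumptions for illustration and should be checked against the local Nagios version.

```python
# Hypothetical excerpt of a status file; not taken from the Prague site.
SAMPLE = """\
hoststatus {
    host_name=golias01
    current_state=1
    scheduled_downtime_depth=1
    }

hoststatus {
    host_name=golias02
    current_state=0
    scheduled_downtime_depth=0
    }

hoststatus {
    host_name=golias03
    current_state=1
    scheduled_downtime_depth=0
    }
"""


def parse_blocks(text):
    """Return a list of (block_type, fields) for each 'type { ... }' block."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("{"):
            # Start of a new block, e.g. "hoststatus {".
            current = (line[:-1].strip(), {})
        elif line == "}" and current is not None:
            blocks.append(current)
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[1][key] = value
    return blocks


def summarize(blocks):
    """Split hosts into those in scheduled downtime and those in trouble."""
    downtime, trouble = [], []
    for kind, fields in blocks:
        if kind != "hoststatus":
            continue
        if int(fields.get("scheduled_downtime_depth", 0)) > 0:
            downtime.append(fields["host_name"])
        elif int(fields.get("current_state", 0)) != 0:
            trouble.append(fields["host_name"])
    return sorted(downtime), sorted(trouble)


in_downtime, in_trouble = summarize(parse_blocks(SAMPLE))
print("Hosts in downtime (not monitored):", ", ".join(in_downtime))
print("Hosts in trouble:", ", ".join(in_trouble))
```

Hosts in scheduled downtime are reported separately from hosts in trouble, which mirrors the split between the two sections of the report above.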

6 Nagios III
Plugins
– default plugins (part of nagios default installation)
  • ping, disk, procs, load, tcp, swap, ldap, gentoo_glsa, gentoo_service_rc_all
– SRCE plugins (developed by Emir Imamagic)
  • cert, dpm, dpns, edg_broker, globus_gram2, gridftp, lfc, srm, srm_ping, voms
  • executed in SLC3 UI installation in chroot environment
– RAL plugins
  • lcg_same (by Chris Brew)
– locally developed
  • hpacucli, ups, jobs, gstat
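To illustrate why new sensors are easy to write, here is a minimal sketch of a threshold check that follows the usual Nagios plugin convention: one line of status text on standard output, optional performance data after a "|", and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The check name "queued", the script name and the threshold semantics are hypothetical; this is not the code of the locally developed plugins listed above.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Return the state for a metric where a higher value is worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_output(name, value, warn, crit):
    """Return (state, status line); performance data follows the '|'."""
    state = evaluate(value, warn, crit)
    text = "%s %s - value is %d | %s=%d;%d;%d" % (
        name.upper(), LABELS[state], value, name, value, warn, crit)
    return state, text


def main(argv):
    """Hypothetical usage: check_queued.py <value> <warn> <crit>."""
    try:
        value, warn, crit = (int(arg) for arg in argv[1:4])
    except ValueError:
        # Raised for non-numeric arguments and for too few arguments.
        print("QUEUED UNKNOWN - usage: check_queued.py <value> <warn> <crit>")
        return UNKNOWN
    state, text = format_output("queued", value, warn, crit)
    print(text)
    return state

# A real plugin would finish with: sys.exit(main(sys.argv))
```

A real sensor would obtain the value itself (for example by querying the batch system or a UPS) instead of taking it from the command line; only the output line and the exit code matter to Nagios.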

7 Generating configuration I
– We have a hardware database for hardware management
– It was extended with service definitions and hardware-service relations
– A WSDL interface to the database is defined

8 Generating configuration II
– a Python client generates the actual configuration
  • it uses the ZSI Python SOAP bindings (the older bindings pySOAP, SOAPpy etc. were not sufficient)
– technically, the WSDL file is generated from a C header file (much easier for the developer than writing WSDL by hand)
– the project's home page is at www.nagiosexchange.com under the name SiteQuery
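The SiteQuery client itself is not shown in the slides. The following minimal sketch only illustrates the last step, turning host and service records into Nagios object definitions. The records are hard-coded here instead of being fetched over SOAP, and the host name, address, command names and the templates generic-host and generic-service are all hypothetical placeholders.

```python
# Hypothetical records, standing in for what a database query might return.
HOSTS = [
    {
        "name": "golias100",
        "address": "192.0.2.10",
        "services": [
            {"description": "PING", "command": "check_ping_default"},
            {"description": "Root disk", "command": "check_disk_root"},
        ],
    },
]


def render_object(kind, directives):
    """Render one 'define <kind> { ... }' block from (key, value) pairs."""
    lines = ["define %s {" % kind]
    for key, value in directives:
        lines.append("    %-24s%s" % (key, value))
    lines.append("}")
    return "\n".join(lines)


def render_site(hosts):
    """Render host definitions, each followed by its service definitions."""
    blocks = []
    for host in hosts:
        blocks.append(render_object("host", [
            ("use", "generic-host"),          # assumed template name
            ("host_name", host["name"]),
            ("address", host["address"]),
        ]))
        for service in host["services"]:
            blocks.append(render_object("service", [
                ("use", "generic-service"),   # assumed template name
                ("host_name", host["name"]),
                ("service_description", service["description"]),
                ("check_command", service["command"]),
            ]))
    return "\n\n".join(blocks) + "\n"


print(render_site(HOSTS))
```

Because the configuration is regenerated from the database, the static nature of Nagios configuration files is less of a burden: adding a machine to the database is enough to have it monitored after the next regeneration.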

9 Future work
– Represent dependencies in the database to limit the number of false alerts
– Create additional sensors for our hardware (air conditioning unit, diesel power unit)
– Coordinate our work with the monitoring group of LCG/HEPiX at http://www.sysadmin.hep.ac.uk/
– Present another client for generating cfengine/quattor configuration

