NGI and Site Nagios Monitoring

Presentation transcript:

NGI and Site Nagios Monitoring. Emir Imamagic, University Computing Centre (SRCE), Croatia. EGI-InSPIRE – ROD Teams Workshop.

Overview: Nagios Monitoring; Nagios Web Interface; Nagios Internals; Credential Management; MSG Bridge; MyEGEE Bridge; SAM CE Metrics; Configuration Tuning.

Nagios monitoring

Architecture

Nagios: Open source monitoring framework. Highly flexible, with advanced features: host/service dependencies, escalation, soft/hard states, flapping detection. Widely used and actively developed.

Nagios Config Generator: Automatic generation of Nagios configuration (configuring Nagios is hard). Based on multiple information sources. Simple bootstrap of Nagios instances.

Nagios Config Generator – Information Sources: Database components: Aggregated Topology Provider (ATP), Metric Description Database (MDDB). Operations services: GOCDB, SAM, ENOC. Grid information services: BDII. Static files.

Probe Types – Local probes: probes executed by Nagios as active checks. SAM probes (CE, WMS, WN and SRM); WLCG probes (SRCE, CERN); BDII & Gstat probes; Nagios native probes; lightweight service checks (ENOC Downcollector). Probes are grouped in profiles (e.g. ROC, SITE, …).

Probe Types – Remote probes: results imported from external systems as passive checks. Sources: remote Nagios instances; classic SAM monitoring system; ENOC Downcollector.

Deployment: SL5 RPM packages & metapackages (egee-NAGIOS, egee-NRPE), available from a Yum repository. Yaim configuration package (glite-NAGIOS, glite-NRPE). https://twiki.cern.ch/twiki/bin/view/EGEE/GridMonitoringNcgYaim

Nagios Web interface

Tactical Overview

Host Metrics

Host Details

Service Details

Force Metric Execution: All services on a host: Host Details page, “Schedule a check of all services on this host”. Single metric: Service Details page, “Re-schedule the next check of this service”. Important! Don’t force check all services on host or remote metrics.
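Outside the web interface, a forced check of a single metric can also be requested through the Nagios external command file. The sketch below only formats the standard `SCHEDULE_FORCED_SVC_CHECK` command line. The host name and service description are hypothetical placeholders, and the location of the command file depends on the installation, so nothing is written to disk here.

```python
import time


def forced_service_check(host, service, when=None, now=None):
    """Format a Nagios SCHEDULE_FORCED_SVC_CHECK external command line.

    `now` is the submission timestamp and `when` the requested check time,
    both in seconds since the epoch; `when` defaults to `now`.
    """
    now = int(time.time()) if now is None else int(now)
    when = now if when is None else int(when)
    return f"[{now}] SCHEDULE_FORCED_SVC_CHECK;{host};{service};{when}"


if __name__ == "__main__":
    # Hypothetical host and service description, for illustration only.
    print(forced_service_check("ce.example.org", "example-metric"))
```

In line with the warning on the slide, this is meant for a single local metric, not for every service on a host.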

Downtimes: Downtimes are imported from GOCDB (org.egee.ImportGocdbDowntimes metric). A downtime disables notifications of all metrics. Metrics are still executed!

External Links: Extra Notes (red folder image): links to metric documentation. Extra Actions (“bomb” image): local probes link to performance data; remote probes link to the original web page.

Nagios internals

Credential Management

Credential Management – Nagios Metrics: hr.srce.GridProxy-Get-*: regenerates VOMS proxy from MyProxy credential. hr.srce.GridProxy-Valid-*: checks validity of VOMS proxy on Nagios host; all metrics using proxy depend on this metric. hr.srce.MyProxy-ProxyLifetime-*: checks validity of stored MyProxy credential; warns admin that MyProxy should be refreshed.
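The validity checks above follow the usual Nagios plugin pattern of mapping a measured value to an exit status. A minimal sketch, assuming the remaining credential lifetime in seconds has already been obtained by some other means. The function name and the thresholds are illustrative assumptions; this is not the implementation of the hr.srce probes.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def lifetime_status(seconds_left, warn=6 * 3600, crit=3600):
    """Map a remaining lifetime to a (status, message) pair.

    `warn` and `crit` are hypothetical thresholds in seconds.
    """
    if seconds_left is None:
        return UNKNOWN, "UNKNOWN: lifetime could not be determined"
    if seconds_left <= crit:
        return CRITICAL, f"CRITICAL: {seconds_left}s of lifetime left"
    if seconds_left <= warn:
        return WARNING, f"WARNING: {seconds_left}s of lifetime left"
    return OK, f"OK: {seconds_left}s of lifetime left"


if __name__ == "__main__":
    status, message = lifetime_status(2 * 3600)
    print(status, message)
```

A real plugin would print the message and exit with the status code, so that Nagios can apply dependencies such as the one on hr.srce.GridProxy-Valid-*.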

MSG Bridge

MSG Bridge – Components: ConfigCache: SQLite database at /var/cache/msg/config-cache/config.db; contains configuration of local and remote Nagios instances. MsgCache: DirQueue at /var/spool/msg-nagios-bridge/; contains results from metrics executed by local and remote Nagios instances.

MSG Bridge – Components: msg-to-handler: daemon subscribed to a list of topics and queues; modular implementation (handler per topic/queue); stores configuration to ConfigCache; stores remote metric results to MsgCache.

MSG Bridge – Nagios Metrics: org.egee.SendToMsg: publishes configuration & metric results. org.egee.RecvFromQueue: imports results from local MsgCache to Nagios; results imported as passive checks. org.egee.ConfigCheck: checks if new remote configuration is available.
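Passive results reach Nagios as `PROCESS_SERVICE_CHECK_RESULT` external commands. The sketch below shows only that standard command format. How org.egee.RecvFromQueue reads the MsgCache and submits results is not described in the slides, so the function name and the sample values are assumptions made for illustration.

```python
import time

VALID_RETURN_CODES = (0, 1, 2, 3)  # OK, WARNING, CRITICAL, UNKNOWN


def passive_service_result(host, service, return_code, output, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in VALID_RETURN_CODES:
        raise ValueError(f"invalid return code: {return_code}")
    now = int(time.time()) if now is None else int(now)
    # An external command occupies one line, so line breaks are collapsed.
    one_line = " ".join(output.splitlines())
    return (f"[{now}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{one_line}")


if __name__ == "__main__":
    # Hypothetical host, service description and output.
    print(passive_service_result("ce.example.org", "example-metric", 0, "OK"))
```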

MyEGEE Bridge: MyEGEE uses databases: Metric Description Database (MDDB), Aggregated Topology Provider (ATP), Metric Result Store (MRS). Nagios executes probes for updating the databases.

MyEGEE Bridge – Nagios Metrics: org.egee.ATPSync: synchronizes the local ATP with the central ATP; log in /var/log/atp. org.egee.MDDBSync: synchronizes the local MDDB with the central MDDB; log in /var/log/mddb. org.egee.SendToMetricStore: publishes Nagios results to MRS; if critical, no data in MyEGEE.

SAM CE Metrics: org.sam.CE-JobStatus: associated with each CE service; submits SAM WN job via WMS & holds status of submitted job; WN probes communicate back via MSG. org.sam.CE-JobMonit: associated with Nagios server; updates status of all org.sam.CE-JobStatus probes on Nagios.

SAM CE Metrics: org.sam.CE-JobSubmit: associated with each CE service; holds the final state of SAM WN job; passive check updated by org.sam.CE-JobMonit. org.sam.WN-*: individual WN metrics (equivalent to old SAM); passive checks updated via MSG. https://twiki.cern.ch/twiki/bin/view/LCG/SAMProbesMetrics

Configuration tuning

Configuration Tuning: NCG configuration: modifying ncg.conf; beware of yaim reruns; ncg.d directory will be provided in the next release. Static file directives: adding files to /etc/ncg/ncg-localdb.d/; directives are documented in perldoc of modules NCG::SiteSet::File, NCG::SiteInfo::File, NCG::LocalMetrics::File, NCG::LocalMetricsAttrs::File, NCG::LocalRules::File.

NCG Custom Site Config on Multisite Instances: Procedure: customized NCG block must be copied at the beginning of block; sitename is added, e.g. <NCG::SiteInfo egee.srce.hr>… Useful for: adding uncertified sites which require specific information sources; adding per site static file directives.

Adding and Removing Site: Handled by module NCG::SiteSet::File. Adding site which is in GOCDB/SAM/ATP: ADD_SITE!sitename. Adding site which is not in GOCDB/SAM/ATP: ADD_SITE_BDII!sitename!site_bdii_address. Removing site: REMOVE_SITE!sitename.
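The site directives are plain text lines with fields separated by `!`. A minimal sketch of a helper that builds them from the three forms listed above. The helper name, the site names and the site BDII address are hypothetical, and the exact form expected for site_bdii_address should be checked in the perldoc of NCG::SiteSet::File.

```python
def site_directive(action, sitename, site_bdii_address=None):
    """Build one static-file line for adding or removing a site."""
    fields = [sitename] + ([site_bdii_address] if site_bdii_address else [])
    if any("!" in field for field in fields):
        raise ValueError("fields must not contain the '!' separator")
    if action == "add":
        name = "ADD_SITE_BDII" if site_bdii_address else "ADD_SITE"
    elif action == "remove":
        if site_bdii_address:
            raise ValueError("REMOVE_SITE takes only a site name")
        name = "REMOVE_SITE"
    else:
        raise ValueError(f"unknown action: {action}")
    return "!".join([name] + fields)


if __name__ == "__main__":
    # Hypothetical sites; the lines would go into a file under
    # /etc/ncg/ncg-localdb.d/ (file name chosen by the administrator).
    print(site_directive("add", "EXAMPLE-SITE"))
    print(site_directive("add", "UNCERTIFIED-SITE", "bdii.example.org"))
    print(site_directive("remove", "OLD-SITE"))
```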

Adding and Removing Host: Handled by module NCG::SiteInfo::File. Host must be associated to service. Adding host/service associated with VO: ADD_HOST_SERVICE_VO!hostname!service!VO. Adding host/service: ADD_HOST_SERVICE!hostname!service. Important! On multisite instances adding hosts requires NCG::SiteInfo block to be associated to site.

Adding and Removing Host: Removing host: REMOVE_HOST!hostname. Removing service from a host: REMOVE_HOST_SERVICE!hostname!service. Removing service from all hosts: REMOVE_SERVICE!service.
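Because each host directive has a fixed number of `!`-separated fields, a static file can be sanity-checked before NCG is rerun. The sketch below is a hypothetical checker based only on the directive forms shown on these slides; it is not part of NCG, and the sample host, service and VO names are placeholders.

```python
# Field names per directive, as listed on the slides.
HOST_DIRECTIVES = {
    "ADD_HOST_SERVICE_VO": ("hostname", "service", "VO"),
    "ADD_HOST_SERVICE": ("hostname", "service"),
    "REMOVE_HOST": ("hostname",),
    "REMOVE_HOST_SERVICE": ("hostname", "service"),
    "REMOVE_SERVICE": ("service",),
}


def parse_host_directive(line):
    """Split one directive line and check its field count."""
    name, *fields = line.strip().split("!")
    expected = HOST_DIRECTIVES.get(name)
    if expected is None:
        raise ValueError(f"unknown directive: {name}")
    if len(fields) != len(expected) or not all(fields):
        raise ValueError(f"{name} expects fields {expected}, got {fields}")
    return name, dict(zip(expected, fields))


if __name__ == "__main__":
    # Hypothetical host, service and VO values.
    print(parse_host_directive("ADD_HOST_SERVICE_VO!ce.example.org!CE!ops"))
```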

Email Notifications: Default grid services configuration: GOCDB CONTACT_EMAIL is configured; notifications are disabled. Default Nagios internals configuration: root@localhost is configured; notifications are enabled.

Email Notifications: Enabling grid service notifications: set ENABLE_NOTIFICATIONS = 1 in the block <NCG::ConfigGen><Nagios>. Changing Nagios internals address: NAGIOS_ADMIN = email@address.

Email Notifications: Possible to add contacts for grid services. Handled by module NCG::LocalRules::File. Adding contact for all hosts and metrics: ADD_CONTACT!email@address. Adding contact for a single host: ADD_HOSTCONTACT!hostname!email@address.

Email Notifications: Adding contact for a given service on host: ADD_SERVICECONTACT!hostname!service!email@address. Removing contact: REMOVE_CONTACT!email@address (useful if you don’t want to receive alerts on the default address).
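When several people need alerts, the contact directives can be generated from a small table instead of being typed by hand. A minimal sketch using the four directive forms above. The function name, its parameters and all addresses are hypothetical, and the order of lines in the output is a choice made here, not a requirement stated in the slides.

```python
def contact_rules(everyone=(), per_host=None, per_service=None, removed=()):
    """Return NCG::LocalRules::File contact directive lines."""
    lines = [f"ADD_CONTACT!{email}" for email in everyone]
    for host, emails in (per_host or {}).items():
        lines += [f"ADD_HOSTCONTACT!{host}!{email}" for email in emails]
    for (host, service), emails in (per_service or {}).items():
        lines += [f"ADD_SERVICECONTACT!{host}!{service}!{email}"
                  for email in emails]
    lines += [f"REMOVE_CONTACT!{email}" for email in removed]
    return lines


if __name__ == "__main__":
    # Hypothetical hosts, service and addresses.
    rules = contact_rules(
        everyone=["rod@example.org"],
        per_host={"ce.example.org": ["ce-admin@example.org"]},
        per_service={("se.example.org", "SRM"): ["storage@example.org"]},
        removed=["old-contact@example.org"],
    )
    print("\n".join(rules))
```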

Links: OAT page: https://twiki.cern.ch/twiki/bin/view/EGEE/OAT_EGEE_III (many useful links to Nagios, NCG, MSG, packaging, repositories). Installation manual: https://twiki.cern.ch/twiki/bin/view/EGEE/GridMonitoringNcgYaim

Links: Nagios web interface: follow “Extra Notes” links where provided. Nagios documentation is provided on every instance.

Feedback & Support: Regional admin mailing list: regional-nagios-admins@cern.ch. OAT discuss mailing list: egee3-operations-automation-discuss@cern.ch. Nagios GGUS Support Unit. Recently migrated to JIRA tracker: https://tomtools.cern.ch/jira/

Thank you! Questions?