NGI and Site Nagios Monitoring Emir Imamagic University Computing Centre (SRCE) Croatia EGI-InSPIRE – ROD Teams Workshop
Overview
  Nagios Monitoring
  Nagios Web Interface
  Nagios Internals
    Credential Management
    MSG Bridge
    MyEGEE Bridge
    SAM CE Metrics
  Configuration Tuning
Nagios monitoring
Architecture
Nagios
  Open source monitoring framework
  Highly flexible with advanced features
    host/service dependencies, escalation, soft/hard states, flapping detection
  Widely used & actively developed
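The soft/hard state feature mentioned above can be illustrated with a small sketch. This is a simplified model and not Nagios code: a non-OK result first puts the service into a SOFT state, and only after `max_check_attempts` consecutive non-OK results does the state become HARD. The class name `ServiceStateTracker` is hypothetical, and details such as soft recoveries and retry intervals are deliberately left out.

```python
class ServiceStateTracker:
    """Simplified model of Nagios SOFT/HARD state types for one service."""

    def __init__(self, max_check_attempts=3):
        if max_check_attempts < 1:
            raise ValueError("max_check_attempts must be at least 1")
        self.max_check_attempts = max_check_attempts
        self.status = 0          # last plugin return code, 0 means OK
        self.attempt = 1         # current check attempt
        self.state_type = "HARD"

    def record(self, status):
        """Record a new check result and return the resulting state type."""
        if status == 0:
            # An OK result resets the attempt counter.
            self.attempt = 1
            self.state_type = "HARD"
        elif self.status == 0:
            # First non-OK result after OK: start retrying in SOFT state,
            # unless only a single attempt is allowed.
            self.attempt = 1
            self.state_type = "HARD" if self.max_check_attempts <= 1 else "SOFT"
        elif self.state_type == "SOFT":
            # Still failing during retries: count attempts until the limit.
            self.attempt += 1
            if self.attempt >= self.max_check_attempts:
                self.state_type = "HARD"
        # A problem that is already HARD stays HARD until an OK result arrives.
        self.status = status
        return self.state_type
```

In this model, notifications would only be considered once the state type is HARD, which is why a single transient failure does not immediately alert anyone.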
Nagios Config Generator
  Automatic generation of Nagios configuration
    configuring Nagios is hard
  Based on multiple information sources
  Simple bootstrap of Nagios instances
Nagios Config Generator – Information Sources
  Database components
    Aggregated Topology Provider (ATP)
    Metric Description Database (MDDB)
  Operations services
    GOCDB, SAM, ENOC
  Grid information services
    BDII
  Static files
Probe Types
  Local probes
    probes executed by Nagios as active checks
    SAM probes (CE, WMS, WN and SRM)
    WLCG probes (SRCE, CERN)
    BDII & Gstat probes
    Nagios native probes
      lightweight service checks (ENOC Downcollector)
    grouped in profiles (e.g. ROC, SITE, …)
Probe Types
  Remote probes
    results imported from external systems as passive checks
    remote Nagios instances
    classic SAM monitoring system
    ENOC Downcollector
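For readers unfamiliar with passive checks, the sketch below formats a result in the form of the standard Nagios external command `PROCESS_SERVICE_CHECK_RESULT`. The slides do not say which mechanism the import actually uses, so treat this only as an illustration of what a passive result looks like. The function name and the host name `ce.example.org` are hypothetical.

```python
import time


def passive_service_result(host, service, return_code, output, timestamp=None):
    """Format one passive service check result as a Nagios external command line."""
    if return_code not in (0, 1, 2, 3):
        # 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
        raise ValueError("return_code must be 0, 1, 2 or 3")
    if timestamp is None:
        timestamp = int(time.time())
    # The command is line-oriented, so keep the plugin output on one line.
    output = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
        timestamp, host, service, return_code, output)


if __name__ == "__main__":
    print(passive_service_result(
        "ce.example.org", "org.sam.CE-JobSubmit", 0, "OK", timestamp=1275000000))
```

Nagios does not run anything itself for such a result; it simply records the state that was submitted.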
Deployment
  SL5 RPM packages & metapackages
    egee-NAGIOS
    egee-NRPE
  Yum repository
  Yaim configuration package
    glite-NAGIOS
    glite-NRPE
  https://twiki.cern.ch/twiki/bin/view/EGEE/GridMonitoringNcgYaim
Nagios Web interface
Tactical Overview
Host Metrics
Host Details
Service Details
Force Metric Execution
  All services on a host
    Host Details page
    Schedule a check of all services on this host
  Single metric
    Service Details page
    Re-schedule the next check of this service
  Important!
    don’t force check all services on host or remote metrics
Downtimes
  Downtimes are imported from GOCDB
    org.egee.ImportGocdbDowntimes metric
  Disables notifications of all metrics
  Metrics are still executed!
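The distinction above (notifications are suppressed, but metrics keep running) can be sketched as follows. This is a toy model and not the Nagios implementation; the `Downtime` class, the function names and the host name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Downtime:
    host: str
    start: int  # Unix timestamp, inclusive
    end: int    # Unix timestamp, exclusive


def in_downtime(host, now, downtimes):
    """Return True if the host has a downtime covering the given time."""
    return any(d.host == host and d.start <= now < d.end for d in downtimes)


def handle_check(host, now, downtimes, run_check):
    """Run the metric unconditionally, then decide whether to notify."""
    status = run_check(host)  # the metric is executed even during downtime
    notify = status != 0 and not in_downtime(host, now, downtimes)
    return status, notify


if __name__ == "__main__":
    downtimes = [Downtime("ce.example.org", 100, 200)]
    # A CRITICAL result inside the downtime window is recorded but not notified.
    print(handle_check("ce.example.org", 150, downtimes, lambda h: 2))
```

One consequence is that results collected during a downtime are still visible in the web interface even though nobody is alerted.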
External Links
  Extra Notes
    red folder image
    links to metric documentation
  Extra Actions
    “bomb” image
    local probes – links to performance data
    remote probes – links to original web page
Nagios internals
Credential Management
Credential Management – Nagios Metrics
  hr.srce.GridProxy-Get-*
    regenerates VOMS proxy from MyProxy credential
  hr.srce.GridProxy-Valid-*
    checks validity of VOMS proxy on Nagios host
    all metrics using proxy depend on this metric
  hr.srce.MyProxy-ProxyLifetime-*
    checks validity of stored MyProxy credential
    warns admin that MyProxy should be refreshed
MSG Bridge
MSG Bridge – Components
  ConfigCache
    SQLite database
    /var/cache/msg/config-cache/config.db
    contains configuration of local and remote Nagios instances
  MsgCache
    DirQueue
    /var/spool/msg-nagios-bridge/
    contains results from metrics executed by local and remote Nagioses
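Because the ConfigCache is an ordinary SQLite file, it can be inspected with standard tools when debugging. The sketch below lists the tables in such a file without assuming anything about its schema, which the slides do not describe. It opens the file read-only so that a running bridge is not disturbed; the function name `list_tables` is hypothetical.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the table names in an SQLite file, opened read-only."""
    # mode=ro prevents SQLite from creating or modifying the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Path taken from the slide; adjust if your installation differs.
    print(list_tables("/var/cache/msg/config-cache/config.db"))
```

If the file does not exist, the read-only open raises `sqlite3.OperationalError` instead of silently creating an empty database.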
MSG Bridge – Components
  msg-to-handler
    daemon subscribed to a list of topics and queues
    modular implementation (one handler per topic/queue)
    stores configuration to ConfigCache
    stores remote metric results to MsgCache
MSG Bridge – Nagios Metrics
  org.egee.SendToMsg
    publishes configuration & metric results
  org.egee.RecvFromQueue
    imports results from local MsgCache to Nagios
    results imported as passive checks
  org.egee.ConfigCheck
    checks if new remote configuration is available
MyEGEE Bridge
  MyEGEE uses databases
    Metric Description Database (MDDB)
    Aggregated Topology Provider (ATP)
    Metric Result Store (MRS)
  Nagios executes probes for updating databases
MyEGEE Bridge – Nagios Metrics
  org.egee.ATPSync
    synchronizes the local ATP with the central ATP
    log in /var/log/atp
  org.egee.MDDBSync
    synchronizes the local MDDB with the central MDDB
    log in /var/log/mddb
  org.egee.SendToMetricStore
    publishes Nagios results to MRS
    if this metric is critical, no data appears in MyEGEE
SAM CE Metrics
  org.sam.CE-JobStatus
    associated with each CE service
    submits SAM WN job via WMS & holds status of submitted job
    WN probes communicate back via MSG
  org.sam.CE-JobMonit
    associated with Nagios server
    updates status of all org.sam.CE-JobStatus probes on Nagios
SAM CE Metrics
  org.sam.CE-JobSubmit
    associated with each CE service
    holds the final state of SAM WN job
    passive check updated by org.sam.CE-JobMonit
  org.sam.WN-*
    individual WN metrics (equivalent to old SAM)
    passive checks updated via MSG
  https://twiki.cern.ch/twiki/bin/view/LCG/SAMProbesMetrics
Configuration tuning
Configuration Tuning
  NCG configuration
    modifying ncg.conf
    beware of yaim reruns
    ncg.d directory will be provided in the next release
  Static file directives
    adding files to /etc/ncg/ncg-localdb.d/
    directives are documented in perldoc of modules
      NCG::SiteSet::File, NCG::SiteInfo::File, NCG::LocalMetrics::File, NCG::LocalMetricsAttrs::File, NCG::LocalRules::File
NCG Custom Site Config on Multisite Instances
  Procedure
    the customized NCG block must be copied
    the site name is added at the beginning of the block, e.g. <NCG::SiteInfo egee.srce.hr>…
  Useful for
    adding uncertified sites which require specific information sources
    adding per-site static file directives
Adding and Removing Site
  Handled by module NCG::SiteSet::File
  Adding site which is in GOCDB/SAM/ATP
    ADD_SITE!sitename
  Adding site which is not in GOCDB/SAM/ATP
    ADD_SITE_BDII!sitename!site_bdii_address
  Removing site
    REMOVE_SITE!sitename
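The site directives above are simple `!`-separated strings, so they are easy to generate from a script. The following is a minimal sketch under stated assumptions: it assumes one directive per line in the static file, and the helper names, site names and BDII address are all hypothetical. The authoritative description of the format is the perldoc of NCG::SiteSet::File.

```python
def _field(value):
    """Reject values that would break the '!'-separated directive format."""
    if not value or "!" in value or "\n" in value:
        raise ValueError("invalid directive field: %r" % (value,))
    return value


def add_site(sitename):
    """Directive for a site that is already known to GOCDB/SAM/ATP."""
    return "ADD_SITE!%s" % _field(sitename)


def add_site_bdii(sitename, site_bdii_address):
    """Directive for a site that is not in GOCDB/SAM/ATP."""
    return "ADD_SITE_BDII!%s!%s" % (_field(sitename), _field(site_bdii_address))


def remove_site(sitename):
    """Directive that removes a site."""
    return "REMOVE_SITE!%s" % _field(sitename)


def render(directives):
    """Join directives one per line (assumed file layout)."""
    return "\n".join(directives) + "\n"


if __name__ == "__main__":
    print(render([
        add_site("EXAMPLE-SITE"),                                 # hypothetical site
        add_site_bdii("TEST-SITE", "site-bdii.example.org"),      # hypothetical address
        remove_site("OLD-SITE"),
    ]), end="")
```

The output would then be saved as a file under /etc/ncg/ncg-localdb.d/ before the Nagios configuration is regenerated.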
Adding and Removing Host
  Handled by module NCG::SiteInfo::File
  Host must be associated with a service
  Adding host/service associated with VO
    ADD_HOST_SERVICE_VO!hostname!service!VO
  Adding host/service
    ADD_HOST_SERVICE!hostname!service
  Important!
    on multisite instances, adding hosts requires the NCG::SiteInfo block to be associated with a site
Adding and Removing Host
  Removing host
    REMOVE_HOST!hostname
  Removing service from a host
    REMOVE_HOST_SERVICE!hostname!service
  Removing service from all hosts
    REMOVE_SERVICE!service
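A quick sanity check of host directives before regenerating the configuration can catch typos early. The sketch below parses directive lines and verifies the number of fields against the forms listed on the slides. It is not part of NCG; the one-directive-per-line layout is an assumption, and the host, service and VO values used in the example are hypothetical.

```python
# Number of '!'-separated arguments for each host directive named on the slides.
KNOWN_ARITY = {
    "ADD_HOST_SERVICE_VO": 3,   # hostname, service, VO
    "ADD_HOST_SERVICE": 2,      # hostname, service
    "REMOVE_HOST": 1,           # hostname
    "REMOVE_HOST_SERVICE": 2,   # hostname, service
    "REMOVE_SERVICE": 1,        # service
}


def parse_directive(line):
    """Split one directive line into (name, args) and validate its shape."""
    name, *args = line.strip().split("!")
    if name not in KNOWN_ARITY:
        raise ValueError("unknown directive: %r" % name)
    if len(args) != KNOWN_ARITY[name] or any(not a for a in args):
        raise ValueError("wrong arguments for %s: %r" % (name, args))
    return name, tuple(args)


def parse_directives(text):
    """Parse every non-blank line of a directive file."""
    return [parse_directive(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = "ADD_HOST_SERVICE_VO!ce.example.org!CE!ops\nREMOVE_HOST!old.example.org\n"
    for name, args in parse_directives(sample):
        print(name, args)
```

A check like this only validates syntax; whether a host or service name is meaningful still depends on the information sources NCG uses.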
Email Notifications
  Default grid services configuration
    GOCDB CONTACT_EMAIL is configured
    notifications are disabled
  Default Nagios internals configuration
    root@localhost is configured
    notifications are enabled
Email Notifications
  Enabling grid service notifications
    set ENABLE_NOTIFICATIONS = 1 in the block <NCG::ConfigGen><Nagios>
  Changing Nagios internals address
    NAGIOS_ADMIN = email@address
Email Notifications
  Possible to add contacts for grid services
  Handled by module NCG::LocalRules::File
  Adding contact for all hosts and metrics
    ADD_CONTACT!email@address
  Adding contact for a single host
    ADD_HOSTCONTACT!hostname!email@address
Email Notifications
  Adding contact for a given service on host
    ADD_SERVICECONTACT!hostname!service!email@email.com
  Removing contact
    REMOVE_CONTACT!email@address
    useful if you don’t want to receive alerts on the default address
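The contact directives differ only in how narrowly they scope the contact: all hosts and metrics, a single host, or a single service on a host. The sketch below picks the matching directive from the arguments given. The function names and the example addresses are hypothetical, and the directive strings follow the forms shown on the slides; the perldoc of NCG::LocalRules::File remains the reference.

```python
def _check(value):
    """Reject values that would break the '!'-separated directive format."""
    if not value or "!" in value or "\n" in value:
        raise ValueError("invalid directive field: %r" % (value,))
    return value


def contact_directive(email, host=None, service=None):
    """Build the narrowest ADD_*CONTACT directive for the given scope."""
    email = _check(email)
    if service is not None and host is None:
        # The slides only show a service contact in combination with a host.
        raise ValueError("a service contact requires a host")
    if host is None:
        return "ADD_CONTACT!%s" % email
    if service is None:
        return "ADD_HOSTCONTACT!%s!%s" % (_check(host), email)
    return "ADD_SERVICECONTACT!%s!%s!%s" % (_check(host), _check(service), email)


def remove_contact(email):
    """Directive that removes a contact, e.g. the default address."""
    return "REMOVE_CONTACT!%s" % _check(email)


if __name__ == "__main__":
    print(contact_directive("admin@example.org"))
    print(contact_directive("admin@example.org", host="ce.example.org"))
    print(contact_directive("admin@example.org", host="ce.example.org", service="CE"))
    print(remove_contact("default@example.org"))
```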
Links
  OAT page
    https://twiki.cern.ch/twiki/bin/view/EGEE/OAT_EGEE_III
    many useful links to Nagios, NCG, MSG, packaging, repositories
  Installation manual
    https://twiki.cern.ch/twiki/bin/view/EGEE/GridMonitoringNcgYaim
Links
  Nagios web interface
    follow “Extra Notes” links where provided
  Nagios documentation is provided on every instance
Feedback & Support
  Regional admin mailing list
    regional-nagios-admins@cern.ch
  OAT discuss mailing list
    egee3-operations-automation-discuss@cern.ch
  Nagios GGUS Support Unit
  Recently migrated to JIRA tracker
    https://tomtools.cern.ch/jira/
Thank you! Questions?