NGI and Site Nagios Monitoring


1 NGI and Site Nagios Monitoring
Emir Imamagic
University Computing Centre (SRCE), Croatia
EGI-InSPIRE – ROD Teams Workshop

2 Overview
- Nagios Monitoring
- Nagios Web Interface
- Nagios Internals
  - Credential Management
  - MSG Bridge
  - MyEGEE Bridge
  - SAM CE Metrics
- Configuration Tuning

3 Nagios monitoring

4 Architecture

5 Nagios
- Open source monitoring framework
- Highly flexible with advanced features
  - host/service dependencies, escalation, soft/hard states, flapping detection
- Widely used & actively developed

6 Nagios Config Generator
- Automatic generation of Nagios configuration
  - configuring Nagios is hard
- Based on multiple information sources
- Simple bootstrap of Nagios instances

7 Nagios Config Generator – Information Sources
- Database components
  - Aggregated Topology Provider (ATP)
  - Metric Description Database (MDDB)
- Operations services
  - GOCDB, SAM, ENOC
- Grid information services
  - BDII
- Static files

8 Probe Types
- Local probes
  - probes executed by Nagios as active checks
  - SAM probes (CE, WMS, WN and SRM)
  - WLCG probes (SRCE, CERN)
  - BDII & Gstat probes
  - Nagios native probes
  - lightweight service checks (ENOC Downcollector)
  - grouped in profiles (e.g. ROC, SITE, …)

9 Probe Types
- Remote probes
  - results imported from external systems as passive checks
  - remote Nagios instances
  - classic SAM monitoring system
  - ENOC Downcollector
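A passive check is a result that Nagios receives from outside instead of running the probe itself. One standard way to hand Nagios such a result is the PROCESS_SERVICE_CHECK_RESULT external command. The sketch below only formats that command line; the host and metric names are made up for illustration, and the slides do not say which mechanism the bridge components actually use.

```python
import time


def passive_service_result(host, service, status, output, timestamp=None):
    """Format a Nagios PROCESS_SERVICE_CHECK_RESULT external command line."""
    # Nagios status codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN
    if status not in (0, 1, 2, 3):
        raise ValueError("status must be 0, 1, 2 or 3")
    if timestamp is None:
        timestamp = int(time.time())
    # The command is a single line, so flatten multi-line probe output
    output = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
        timestamp, host, service, status, output)


if __name__ == "__main__":
    # Hypothetical host and metric names, for illustration only
    print(passive_service_result("ce.example.org", "example.RemoteMetric",
                                 2, "CRITICAL: job failed"))
```

The resulting line would be appended to the Nagios external command file, whose location is set by the command_file option in nagios.cfg and differs between installations.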

10 Deployment
- SL5 RPM packages & metapackages
  - egee-NAGIOS
  - egee-NRPE
- Yum repository
- Yaim configuration package
  - glite-NAGIOS
  - glite-NRPE

11 Nagios Web interface

12 Tactical Overview

13 Host Metrics

14 Host Details

15 Service Details

16 Force Metric Execution
- All services on a host
  - Host Details page
  - Schedule a check of all services on this host
- Single metric
  - Service Details page
  - Re-schedule the next check of this service
- Important!
  - don’t force check all services on host or remote metrics
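The "Re-schedule the next check of this service" link in the web interface has a scriptable counterpart, the Nagios external command SCHEDULE_FORCED_SVC_CHECK. The following is a minimal sketch that formats the command for a single metric; the host and metric names are hypothetical, and the warning above about remote metrics applies to it just as it does to the web interface.

```python
import time


def forced_service_check(host, service, now=None):
    """Format a SCHEDULE_FORCED_SVC_CHECK command for one service."""
    if now is None:
        now = int(time.time())
    # First timestamp: when the command is issued.
    # Last field: when the check should run (here: immediately).
    return "[%d] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%d" % (now, host, service, now)


if __name__ == "__main__":
    # Hypothetical host and metric names, for illustration only
    print(forced_service_check("ce.example.org", "example.LocalMetric"))
```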

17 Downtimes
- Downtimes are imported from GOCDB
  - org.egee.ImportGocdbDowntimes metric
- Disables notifications of all metrics
- Metrics are still executed!

18 External Links
- Extra Notes
  - red folder image
  - links to metric documentation
- Extra Actions
  - “bomb” image
  - local probes – links to performance data
  - remote probes – links to original web page

19 Nagios internals

20 Credential Management

21 Credential Management – Nagios Metrics
- hr.srce.GridProxy-Get-*
  - regenerates VOMS proxy from MyProxy credential
- hr.srce.GridProxy-Valid-*
  - checks validity of VOMS proxy on Nagios host
  - all metrics using proxy depend on this metric
- hr.srce.MyProxy-ProxyLifetime-*
  - checks validity of stored MyProxy credential
  - warns admin that MyProxy should be refreshed
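A validity or lifetime check of this kind ultimately has to report one of the four standard Nagios plugin states. The sketch below shows how remaining lifetime could be mapped to those states. The thresholds are invented for illustration and are not taken from the hr.srce probes, whose actual logic the slides do not describe.

```python
# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def lifetime_status(seconds_left, warn=6 * 3600, crit=3600):
    """Map remaining credential lifetime to a Nagios status code.

    warn and crit are hypothetical thresholds chosen for this example.
    """
    if seconds_left is None:
        # Lifetime could not be determined
        return UNKNOWN
    if seconds_left <= crit:
        return CRITICAL
    if seconds_left <= warn:
        return WARNING
    return OK


if __name__ == "__main__":
    for left in (12 * 3600, 2 * 3600, 600, None):
        print(left, lifetime_status(left))
```

Because every proxy-using metric depends on the validity check, a CRITICAL result here explains failures elsewhere and is the first thing to look at.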

22 MSG Bridge

23 MSG Bridge – Components
- ConfigCache
  - SQLite database
  - /var/cache/msg/config-cache/config.db
  - contains configuration of local and remote Nagios instances
- MsgCache
  - DirQueue
  - /var/spool/msg-nagios-bridge/
  - contains results from metrics executed by local and remote Nagios instances
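Since the ConfigCache is an ordinary SQLite file, it can be inspected with any SQLite client when debugging. The slides do not describe its schema, so the sketch below assumes nothing about it and only lists the table names, opening the file read-only so a live cache is not modified.

```python
import pathlib
import sqlite3


def list_tables(db_path):
    """Return the table names of an SQLite database, opened read-only."""
    # Build a file: URI so that SQLite can be told to open read-only
    uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    # Path as given on the slide; it may differ on other installations
    print(list_tables("/var/cache/msg/config-cache/config.db"))
```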

24 MSG Bridge – Components
- msg-to-handler daemon
  - subscribed to list of topics and queues
  - modular implementation (handler per topic/queue)
  - stores configuration to ConfigCache
  - stores remote metric results to MsgCache

25 MSG Bridge – Nagios Metrics
- org.egee.SendToMsg
  - publishes configuration & metric results
- org.egee.RecvFromQueue
  - imports results from local MsgCache to Nagios
  - results imported as passive checks
- org.egee.ConfigCheck
  - checks if new remote configuration is available

26 MyEGEE Bridge
- MyEGEE uses databases
  - Metric Description Database (MDDB)
  - Aggregated Topology Provider (ATP)
  - Metric Result Store (MRS)
- Nagios executes probes for updating databases

27 MyEGEE Bridge – Nagios Metrics
- org.egee.ATPSync
  - synchronizes the local ATP with the central ATP
  - log in /var/log/atp
- org.egee.MDDBSync
  - synchronizes the local MDDB with the central MDDB
  - log in /var/log/mddb
- org.egee.SendToMetricStore
  - publishes Nagios results to MRS
  - if critical, no data in MyEGEE

28 SAM CE Metrics
- org.sam.CE-JobStatus
  - associated with each CE service
  - submits SAM WN job via WMS & holds status of submitted job
  - WN probes communicate back via MSG
- org.sam.CE-JobMonit
  - associated with Nagios server
  - updates status of all org.sam.CE-JobStatus probes on Nagios

29 SAM CE Metrics
- org.sam.CE-JobSubmit
  - associated with each CE service
  - holds the final state of SAM WN job
  - passive check updated by org.sam.CE-JobMonit
- org.sam.WN-*
  - individual WN metrics (equivalent to old SAM)
  - passive checks updated via MSG

30 Configuration tuning

31 Configuration Tuning
- NCG configuration
  - modifying ncg.conf
  - beware of yaim reruns
  - ncg.d directory will be provided in the next release
- Static file directives
  - adding files to /etc/ncg/ncg-localdb.d/
  - directives are documented in perldoc of modules NCG::SiteSet::File, NCG::SiteInfo::File, NCG::LocalMetrics::File, NCG::LocalMetricsAttrs::File, NCG::LocalRules::File

32 NCG Custom Site Config on Multisite Instances
- Procedure
  - customized NCG block must be copied at the beginning of block
  - sitename is added, e.g. <NCG::SiteInfo egee.srce.hr>…
- Useful for
  - adding uncertified sites which require specific information sources
  - adding per site static file directives

33 Adding and Removing Site
- Handled by module NCG::SiteSet::File
- Adding site which is in GOCDB/SAM/ATP
  - ADD_SITE!sitename
- Adding site which is not in GOCDB/SAM/ATP
  - ADD_SITE_BDII!sitename!site_bdii_address
- Removing site
  - REMOVE_SITE!sitename
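The site directives are plain lines with fields separated by "!". To make the format concrete, here is a small sketch that parses the three forms listed above and rejects anything else. It is an illustration only, not the NCG::SiteSet::File implementation, which may accept further syntax documented in its perldoc; the BDII host name in the example is hypothetical.

```python
# Directive name -> expected field names, as listed on the slide
SITE_DIRECTIVES = {
    "ADD_SITE": ("sitename",),
    "ADD_SITE_BDII": ("sitename", "site_bdii_address"),
    "REMOVE_SITE": ("sitename",),
}


def parse_site_directive(line):
    """Split one directive line into (name, {field: value})."""
    fields = line.strip().split("!")
    name, args = fields[0], fields[1:]
    if name not in SITE_DIRECTIVES:
        raise ValueError("unknown directive: %r" % name)
    expected = SITE_DIRECTIVES[name]
    if len(args) != len(expected) or not all(args):
        raise ValueError("%s expects fields %s" % (name, ", ".join(expected)))
    return name, dict(zip(expected, args))


if __name__ == "__main__":
    print(parse_site_directive("ADD_SITE!egee.srce.hr"))
    # bdii.example.org is a made-up address
    print(parse_site_directive("ADD_SITE_BDII!egee.srce.hr!bdii.example.org"))
```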

34 Adding and Removing Host
- Handled by module NCG::SiteInfo::File
- Host must be associated to service
- Adding host/service associated with VO
  - ADD_HOST_SERVICE_VO!hostname!service!VO
- Adding host/service
  - ADD_HOST_SERVICE!hostname!service
- Important!
  - on multisite instances adding hosts requires NCG::SiteInfo block to be associated to site

35 Adding and Removing Host
- Removing host
  - REMOVE_HOST!hostname
- Removing service from a host
  - REMOVE_HOST_SERVICE!hostname!service
- Removing service from all hosts
  - REMOVE_SERVICE!service
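When many hosts have to be added or removed, the directive lines can be generated instead of typed by hand. The sketch below builds lines for the host and service directives named on these two slides and checks the number of fields. The host, service and VO values are hypothetical, and where the lines are written (a file under /etc/ncg/ncg-localdb.d/) is left to the administrator.

```python
# Directive name -> number of fields, as listed on the slides
HOST_DIRECTIVES = {
    "ADD_HOST_SERVICE_VO": 3,   # hostname, service, VO
    "ADD_HOST_SERVICE": 2,      # hostname, service
    "REMOVE_HOST": 1,           # hostname
    "REMOVE_HOST_SERVICE": 2,   # hostname, service
    "REMOVE_SERVICE": 1,        # service
}


def host_directive(name, *fields):
    """Build one '!'-separated directive line after basic validation."""
    if name not in HOST_DIRECTIVES:
        raise ValueError("unknown directive: %r" % name)
    if len(fields) != HOST_DIRECTIVES[name]:
        raise ValueError("%s takes %d field(s)" % (name, HOST_DIRECTIVES[name]))
    for field in fields:
        # '!' is the separator, so it cannot appear inside a field
        if not field or "!" in field:
            raise ValueError("invalid field: %r" % field)
    return "!".join((name,) + fields)


if __name__ == "__main__":
    # Hypothetical host, service and VO names
    print(host_directive("ADD_HOST_SERVICE_VO", "ce.example.org", "CE", "examplevo"))
    print(host_directive("REMOVE_HOST_SERVICE", "se.example.org", "SRM"))
```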

36 Notifications
- Default grid services configuration
  - GOCDB CONTACT_ is configured
  - notifications are disabled
- Default Nagios internals configuration
  - is configured
  - notifications are enabled

37 Notifications
- Enabling grid service notifications
  - set ENABLE_NOTIFICATIONS = 1 in the block <NCG::ConfigGen><Nagios>
- Changing Nagios internals address
  - NAGIOS_ADMIN =

38 Notifications
- Possible to add contacts for grid services
- Handled by module NCG::LocalRules::File
- Adding contact for all hosts and metrics
- Adding contact for a single host

39 Notifications
- Adding contact for a given service on host
- Removing contact
  - useful if you don’t want to receive alerts on the default address

40 Links
- OAT page
  - lot of useful links to Nagios, NCG, MSG, packaging, repositories
- Installation manual

41 Links
- Nagios web interface
  - follow “Extra Notes” links where provided
- Nagios documentation is provided on every instance

42 Feedback & Support
- Regional admin mailing list
- OAT discuss mailing list
- Nagios GGUS Support Unit
- Recently migrated to JIRA tracker

43 Thank you! Questions?

