Alarming with GNI. VOC WG meeting, 12th September. Computing Facilities, CERN IT Department, CH-1211 Geneva 23, Switzerland.


1 Alarming with GNI
VOC WG meeting, 12th September 2013
itmon-team@cern.ch
Computing Facilities, CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it

2 Agenda
GNI Overview
Metrics Manager
  – Metric Registration
  – Metric Workflow
  – Quattor Legacy
Lemon Producer
GNI Consumers
  – Service Now Integration
  – GNI Dashboard
  – No Contact Processor
Current Status and Next Steps

3 GNI Overview

4 Architecture

5 Metrics Manager

6 Metric Registration
Lemon Metric Manager: https://metricmgr.cern.ch
Single entry point for Quattor and Puppet metrics configuration
Keeps default parameter settings and assigns responsibility
  – Metric parameter overloading is available via Puppet
Lemon metrics concept:
  – A sensor implements multiple metric class definitions
  – A metric class can be used for multiple metric definitions
[Diagram labels: Metric Manager, Puppet Hiera, node, Lemon Agent, Lemon Forwarder, configuration files]
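The sensor, metric class and metric relationship, together with parameter overloading, can be pictured with a small data model. This is only an illustrative sketch: the class names, field names and the merge order (class defaults, then metric settings, then per-node overrides) are assumptions, not the Metric Manager's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetricClass:
    """A metric class provided by a sensor (hypothetical model)."""
    name: str
    default_params: dict = field(default_factory=dict)


@dataclass
class Sensor:
    """A sensor implements several metric classes."""
    name: str
    metric_classes: dict = field(default_factory=dict)

    def add_class(self, metric_class: MetricClass) -> None:
        self.metric_classes[metric_class.name] = metric_class


@dataclass
class Metric:
    """One metric definition built on a metric class."""
    metric_id: int
    metric_class: MetricClass
    owner: str  # who is responsible for this metric
    params: dict = field(default_factory=dict)

    def effective_params(self, node_overrides: Optional[dict] = None) -> dict:
        """Class defaults first, then metric settings, then node overrides."""
        merged = dict(self.metric_class.default_params)
        merged.update(self.params)
        merged.update(node_overrides or {})
        return merged
```

Two metrics can share one metric class while carrying different parameters, and a node-level override (as would come from Puppet) wins over both.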

7 Metric Workflow
Supports Puppet-only and Puppet + Quattor metrics
New metrics:
  – Draft: user defines metric
  – Pending: user submits metric for approval, itmon team verifies
  – Production: itmon team propagates new metric to agent definitions
Metrics already in Quattor:
  – Legacy: metric was imported from Quattor but is not enabled in Puppet
  – Production: itmon team propagates metric to lemon agent definitions
Changes to production metrics:
  – Production: user changes metric definition
  – Production: itmon team propagates metric to lemon agent definitions
Further details: https://metricmgr.cern.ch/help/
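The workflow above reads as a small state machine. The sketch below encodes it as a transition table; the state names come from the slide, while the action labels ("submit", "propagate", "change") are hypothetical names chosen for illustration.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    PENDING = "pending"
    LEGACY = "legacy"
    PRODUCTION = "production"


# (current state, action) -> next state. Action names are assumptions.
TRANSITIONS = {
    (State.DRAFT, "submit"): State.PENDING,            # user submits for approval
    (State.PENDING, "propagate"): State.PRODUCTION,    # itmon team verifies and propagates
    (State.LEGACY, "propagate"): State.PRODUCTION,     # imported Quattor metric enabled
    (State.PRODUCTION, "change"): State.PRODUCTION,    # user edits a production metric
    (State.PRODUCTION, "propagate"): State.PRODUCTION, # itmon team propagates the change
}


def advance(state: State, action: str) -> State:
    """Return the next state, or raise ValueError for a disallowed step."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed from {state.value}") from None
```

A table keeps the allowed steps in one place, so a draft metric cannot reach production without passing through the pending state.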

8 Quattor Legacy
Metric definition must still be added to Quattor
  – Copy the generated Quattor code into a CDB template
  – e.g. under prod/pro_monitoring_*.tpl

9 Lemon Producer

10 Lemon Producer
Main components:
  – Lemon agent and sensors: no changes
  – Lemon forwarder: wraps lemon data in JSON format
  – Lemon tools: no changes to lemon-host-check and lemon-cli
Notifications are sent based on lemon exceptions (alarms)
Notifications can be customized in the node:
  – Can be configured via Puppet (How-to)
  – Overwrites defaults in Metrics Manager
Users can create other notifications
[Diagram labels: Metric Manager, Puppet Hiera, node, Lemon Agent, Lemon Forwarder, configuration files]
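As a rough illustration of the forwarder's role, the sketch below wraps one metric sample in a JSON document and merges node-level notification settings over defaults. The field names and both function names are hypothetical; the slides do not give the real message schema.

```python
import json


def wrap_sample(hostname: str, metric_id: int, timestamp: int, values: list) -> str:
    """Serialise one sample as JSON (field names are assumed, not the real schema)."""
    message = {
        "hostname": hostname,
        "metric_id": metric_id,
        "timestamp": timestamp,
        "values": values,
    }
    return json.dumps(message, sort_keys=True)


def notification_settings(defaults: dict, node_overrides: dict) -> dict:
    """Node-level settings overwrite the defaults kept centrally."""
    merged = dict(defaults)
    merged.update(node_overrides)
    return merged
```

For example, `wrap_sample("node1.example.org", 9001, 1379000000, [1.5])` yields a JSON string that any consumer can parse without knowing lemon's native format.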

11 GNI Consumers

12 Service Now Integration
Takes notifications marked for incident creation
Checks if notification should be masked
Opens incidents in SNOW
Re-submits notification with incident ID
Supports masking of ticket creation
Today it takes the alarmed flag defined in Foreman
  – Requires a successful Puppet run
In the future it will be integrated with Roger
  – Developed by config team
  – Prototyping phase
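The consumer steps listed above can be sketched as a single function. Everything here is an assumption made for illustration: the `Notification` fields, the callables standing in for the masking check, the Service Now client and the GNI re-submission, and the choice to skip notifications that already carry an incident ID.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Notification:
    hostname: str
    summary: str
    create_incident: bool = False      # "marked for incident creation"
    incident_id: Optional[str] = None


def process(
    notification: Notification,
    is_masked: Callable[[Notification], bool],
    open_incident: Callable[[Notification], str],
    resubmit: Callable[[Notification], None],
) -> Optional[Notification]:
    """Return the re-submitted notification, or None if no incident was opened."""
    # Ignore notifications not marked for incidents, or already ticketed.
    if not notification.create_incident or notification.incident_id:
        return None
    # Masked hosts do not get tickets.
    if is_masked(notification):
        return None
    incident_id = open_incident(notification)
    updated = replace(notification, incident_id=incident_id)
    resubmit(updated)
    return updated
```

Passing the masking check as a callable means the source of the mask (the Foreman flag today, Roger later) can change without touching the flow.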

13 Integration with Roger
Masking in Roger
  – Service providing information about host state and masking state
  – Set masking for no contact notifications and 3 notification types: Hardware, OS, Application
All exceptions must be classified under a notification type:
  – Hardware, OS, Application
Those responsible for each FE will be asked to classify their exceptions
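A minimal sketch of type-based masking follows. It does not use Roger's real interface, which the slides do not describe: a plain dictionary stands in for the per-host masking state, and the lowercase type keys and function names are assumptions.

```python
NOTIFICATION_TYPES = ("hardware", "os", "application")
MASKABLE = NOTIFICATION_TYPES + ("no_contact",)


def unclassified(exception_types: dict) -> list:
    """Return exception names that lack a valid notification type."""
    return sorted(
        name for name, ntype in exception_types.items()
        if ntype not in NOTIFICATION_TYPES
    )


def is_masked(host_masks: dict, notification_type: str) -> bool:
    """Look up whether a notification type is masked for a host.

    host_masks is a stand-in for the masking state a service such as
    Roger would return, e.g. {"hardware": True}.
    """
    if notification_type not in MASKABLE:
        raise ValueError(f"unknown notification type: {notification_type!r}")
    return bool(host_masks.get(notification_type, False))
```

The `unclassified` helper shows why every exception needs a type: without one, there is no mask to look up for it.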

14 GNI Dashboard

15 No Contact Processor
Heartbeat from lemon metric updates
Processor looks at heartbeat timeout
Raises GNI notification
  – Creates SNOW incident for CC Operator
If node comes back
  – Closes GNI notification
Possible to mask with Roger
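The heartbeat logic above might look roughly like the following. The class name, the callbacks and the timeout handling are assumptions for illustration; the real processor's interfaces are not shown in the slides.

```python
class NoContactProcessor:
    """Tracks the last metric update per host and flags silent hosts."""

    def __init__(self, timeout_s, raise_notification, close_notification):
        self.timeout_s = timeout_s
        self._raise = raise_notification   # e.g. send a GNI notification
        self._close = close_notification   # e.g. close that notification
        self.last_seen = {}                # host -> time of last heartbeat
        self.open_hosts = set()            # hosts with an open notification

    def heartbeat(self, host, now):
        """Record a metric update; close the notification if the host is back."""
        self.last_seen[host] = now
        if host in self.open_hosts:
            self.open_hosts.discard(host)
            self._close(host)

    def check(self, now, is_masked=lambda host: False):
        """Raise one notification per host whose heartbeat has timed out."""
        for host, seen in self.last_seen.items():
            timed_out = now - seen > self.timeout_s
            if timed_out and host not in self.open_hosts and not is_masked(host):
                self.open_hosts.add(host)
                self._raise(host)
```

Tracking open notifications in a set keeps repeated checks from raising duplicates for a host that is still silent.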

16 Current Status & Next Steps
Current status
  – Deployed dev and prod instances of GNI, including Metric Manager
  – Migrated from Apollo to ActiveMQ
  – Integrated with training instance of Service Now
Next steps
  – Integrate Roger service for run-time notification type masking
  – Review default exception configuration
  – Start opening SNOW incidents for hardware notifications
  – Redirect production GNI instance to production Service Now

17 Questions?
itmon-team@cern.ch
http://cern.ch/itmon

