Alarming with GNI: VOC WG meeting, 12th September 2013

Alarming with GNI
VOC WG meeting, 12th September 2013
Computing Facilities, CERN IT Department, CH-1211 Geneva 23, Switzerland

Agenda
GNI Overview
Metrics Manager
  – Metric Registration
  – Metric Workflow
  – Quattor Legacy
Lemon Producer
GNI Consumers
  – Service Now Integration
  – GNI Dashboard
  – No Contact Processor
Current Status and Next Steps

GNI Overview

Architecture

Metrics Manager

Metric Registration
Lemon Metric Manager:
Single entry point for Quattor & Puppet metrics configuration
Keeps default parameter settings and assigns responsibility
  – Metric parameter overloading available via Puppet
Lemon metrics concept:
  – A sensor implements multiple metric class definitions
  – A metric class can be used for multiple metric definitions
[Diagram: Puppet, Hiera, node, Lemon Agent, Lemon Forwarder, configuration files, Metric Manager]
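
The registration model above (defaults kept centrally, overloaded per node, with one metric class reused by several metrics) can be sketched roughly as follows. This is only an illustration: the class names, field names, metric IDs and the `effective_parameters` helper are all hypothetical and do not reflect the real Metric Manager schema or the Hiera key layout.

```python
from dataclasses import dataclass, field


@dataclass
class MetricClass:
    """A metric class implemented by a sensor (hypothetical names)."""
    sensor: str
    name: str


@dataclass
class Metric:
    """A metric definition with centrally kept default parameters."""
    metric_id: int
    metric_class: MetricClass
    defaults: dict = field(default_factory=dict)


def effective_parameters(metric, node_overrides):
    """Return the defaults with any per-node overrides applied on top."""
    params = dict(metric.defaults)  # copy, so the defaults stay untouched
    params.update(node_overrides.get(metric.metric_id, {}))
    return params


# One metric class reused by two metric definitions.
fs_class = MetricClass(sensor="linux", name="filesystem.usage")
root_full = Metric(1001, fs_class,
                   {"path": "/", "threshold": 90, "responsible": "itmon"})
var_full = Metric(1002, fs_class,
                  {"path": "/var", "threshold": 90, "responsible": "itmon"})

# Overrides as they might arrive from node-level configuration (assumed shape).
node_overrides = {1001: {"threshold": 95}}

print(effective_parameters(root_full, node_overrides))
print(effective_parameters(var_full, node_overrides))
```

Only the overridden key changes for the node; everything else, including the assigned responsible, still comes from the central defaults.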

Metric Workflow
Supports Puppet-only and Puppet + Quattor metrics
New metrics:
  – Draft: user defines the metric
  – Pending: user submits the metric for approval; itmon team verifies it
  – Production: itmon team propagates the new metric to agent definitions
Metrics already in Quattor:
  – Legacy: metric was imported from Quattor but is not enabled in Puppet
  – Production: itmon team propagates the metric to Lemon agent definitions
Changes to production metrics:
  – Production: user changes the metric definition
  – Production: itmon team propagates the metric to Lemon agent definitions
Further details:
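
The workflow states above can be read as a small state machine. The sketch below is one possible encoding, not the Metrics Manager implementation: the action names (`submit`, `propagate`, `change`) and the rule that only the `itmon` actor may propagate are assumptions derived from the wording of the slide.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    PENDING = "pending"
    LEGACY = "legacy"
    PRODUCTION = "production"


# (current state, action) -> next state; action names are hypothetical.
TRANSITIONS = {
    (State.DRAFT, "submit"): State.PENDING,           # user asks for approval
    (State.PENDING, "propagate"): State.PRODUCTION,   # itmon verifies and rolls out
    (State.LEGACY, "propagate"): State.PRODUCTION,    # imported Quattor metric enabled
    (State.PRODUCTION, "change"): State.PRODUCTION,   # user edits the definition
    (State.PRODUCTION, "propagate"): State.PRODUCTION,  # itmon rolls out the edit
}

# Actions assumed to be reserved for the itmon team.
ITMON_ONLY = {"propagate"}


def advance(state, action, actor):
    """Return the next workflow state, or raise if the step is not allowed."""
    if action in ITMON_ONLY and actor != "itmon":
        raise PermissionError(f"{actor} may not perform '{action}'")
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"'{action}' is not valid in state {state.name}") from None


state = State.DRAFT
state = advance(state, "submit", "user")
state = advance(state, "propagate", "itmon")
print(state.name)
```

Keeping the transitions in a single table makes it easy to see that a metric reaches production only through a propagation step.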

Quattor Legacy
Metric definition must still be added to Quattor
  – Copy the generated Quattor code into a CDB template
  – e.g. under prod/pro_monitoring_*.tpl

Lemon Producer

Lemon Producer
Main components:
  – Lemon agent and sensors: no changes
  – Lemon forwarder: wraps Lemon data in JSON format
  – Lemon tools: no changes to lemon-host-check and lemon-cli
Notifications are sent based on Lemon exceptions (alarms)
Notifications can be customized in the node:
  – Can be configured via Puppet (How-to)
  – Overwrites defaults in the Metrics Manager
Users can create other notifications
[Diagram: Puppet, Hiera, node, Lemon Agent, Lemon Forwarder, configuration files, Metric Manager]
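
As a rough illustration of what "wrapping Lemon data in JSON" could look like, the sketch below serializes one metric sample. The slides do not give the actual message format, so every field name here (`node`, `metric_id`, `timestamp`, `values`) and the `wrap_sample` helper are assumptions.

```python
import json


def wrap_sample(node, metric_id, timestamp, values):
    """Serialize one metric sample as a JSON string (hypothetical layout)."""
    payload = {
        "node": node,
        "metric_id": metric_id,
        "timestamp": timestamp,  # seconds since the epoch
        "values": values,
    }
    # sort_keys gives a stable field order, which helps when comparing messages
    return json.dumps(payload, sort_keys=True)


message = wrap_sample("node1.example.org", 1001, 1378980000, [93.5])
print(message)
```

A consumer on the other side of the message broker would reverse the step with `json.loads(message)`.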

GNI Consumers

Service Now Integration
Takes notifications marked for incident creation
Checks if the notification should be masked
Opens incidents in SNOW
Re-submits the notification with the incident ID
Supports masking of ticket creation
Today takes the alarmed flag defined in Foreman
  – Requires a successful Puppet run
In the future it will be integrated with Roger
  – Developed by the config team
  – Prototyping phase
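
The consumer steps listed above (select, check masking, open incident, re-submit) might be arranged as in the following sketch. It is not the GNI consumer code: the notification is modelled as a plain dict, the `create_incident` and `incident_id` keys are hypothetical, and the three callables stand in for the masking lookup, the Service Now client and the message producer. Skipping notifications that already carry an incident ID is an added assumption, made so that re-submitted messages are not processed twice.

```python
def handle_notification(notification, is_masked, open_incident, resubmit):
    """Process one notification; return the new incident ID or None."""
    if not notification.get("create_incident"):
        return None  # not marked for incident creation
    if "incident_id" in notification:
        return None  # already ticketed (assumed guard against loops)
    if is_masked(notification):
        return None  # ticket creation is masked for this node
    incident_id = open_incident(notification)
    # Re-submit a copy of the notification enriched with the incident ID.
    resubmit({**notification, "incident_id": incident_id})
    return incident_id


# Stand-ins for the real services, for illustration only.
opened, sent = [], []


def fake_open_incident(notification):
    opened.append(notification)
    return "INC0001"


result = handle_notification(
    {"node": "n1", "create_incident": True},
    is_masked=lambda n: False,
    open_incident=fake_open_incident,
    resubmit=sent.append,
)
print(result, sent)
```

Passing the collaborators in as callables keeps the decision logic testable without a Service Now instance.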

Integration with Roger
Masking in Roger
  – Service providing information about host state and masking state
  – Set masking for no-contact notifications and 3 notification types: Hardware, OS, Application
All exceptions must be classified under a notification type:
  – Hardware, OS, Application
Those responsible for each FE will be asked to classify their exceptions
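
A masking lookup by notification type could look like the sketch below. It does not use the Roger API, which the slides do not describe: the per-host mask table, the lower-case type names and the `is_masked` helper are all hypothetical.

```python
# The three notification types from the slide, plus no-contact notifications.
NOTIFICATION_TYPES = ("hardware", "os", "application")
MASKABLE = NOTIFICATION_TYPES + ("no_contact",)


def is_masked(host_masks, host, kind):
    """Return True if notifications of this kind are masked for the host."""
    if kind not in MASKABLE:
        # Every exception must be classified under a known notification type.
        raise ValueError(f"unclassified notification type: {kind!r}")
    return kind in host_masks.get(host, set())


# Assumed shape of the masking state: host name -> set of masked kinds.
host_masks = {"node1": {"hardware", "no_contact"}}

print(is_masked(host_masks, "node1", "hardware"))
print(is_masked(host_masks, "node1", "os"))
```

Rejecting unknown kinds outright mirrors the requirement that every exception be classified before masking can apply.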

GNI Dashboard

No Contact Processor
Heartbeat from Lemon metric updates
Processor looks at heartbeat timeout
Raises GNI notification
  – Creates SNOW incident for CC Operator
If node comes back
  – Closes GNI notification
Possible to mask with Roger
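
The heartbeat logic above can be sketched as a small class that records the last metric update per node and compares it against a timeout. This is an illustration only: the class and method names, the 300-second timeout and the use of plain numeric timestamps are assumptions, and the Roger masking step is left out.

```python
class NoContactProcessor:
    """Track heartbeats and report nodes that stop sending updates."""

    def __init__(self, timeout):
        self.timeout = timeout      # seconds without updates before alarming
        self.last_seen = {}         # node -> timestamp of last metric update
        self.open_alarms = set()    # nodes with an open no-contact notification

    def heartbeat(self, node, now):
        """Record a metric update; return 'close' if an open alarm is cleared."""
        self.last_seen[node] = now
        if node in self.open_alarms:
            self.open_alarms.discard(node)
            return "close"
        return None

    def check(self, now):
        """Return nodes whose heartbeat just timed out (each reported once)."""
        raised = []
        for node, seen in self.last_seen.items():
            if now - seen > self.timeout and node not in self.open_alarms:
                self.open_alarms.add(node)
                raised.append(node)
        return raised


processor = NoContactProcessor(timeout=300)
processor.heartbeat("node1", now=0)
print(processor.check(now=200))
print(processor.check(now=301))
print(processor.heartbeat("node1", now=500))
```

Tracking open alarms separately ensures a silent node raises a single notification rather than one per check, and lets the returning heartbeat close it.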

Current Status & Next Steps
Current status
  – Deployed dev and prod instances of GNI, including Metric Manager
  – Migrated from Apollo to ActiveMQ
  – Integrated with training instance of Service Now
Next steps
  – Integrate Roger service for run-time notification type masking
  – Review default exception configuration
  – Start opening SNOW incidents for hardware notifications
  – Redirect production GNI instance to production Service Now

Questions?