CERN IT Department CH-1211 Geneva 23 Switzerland www.cern.ch/it Problem management AI Thursday meeting 02/10/2014.

Presentation transcript:

CERN IT Department CH-1211 Geneva 23 Switzerland Problem management AI Thursday meeting 02/10/2014

Background
Existing problem mgt meetings
  – Monthly, dedicated
  – Original goal was to follow up and reduce the ITCM events being handled by the SysAdmin and CCOperators teams; thanks to Anthony for the good work
New monitoring framework offers more possibilities to deal with notifications
  – Time for a redefinition of the goals
The department decided to evolve problem mgt, involving all IT services

Problem management: scope
ITIL: ensure stability in services by identifying root causes and removing errors in the infrastructure
A Problem is the unknown underlying cause of one or more Incidents
The goal of problem management is to minimise both the number and severity of incidents and potential problems
  – User-reported incidents versus automated tool-generated events (GNI notifications)
  – Our problem mgt effort will focus on the latter

Problem management: goal
What do we want to do? Move forward with automation, from automated problem detection/notification to automated recovery actions
How? Take advantage of the capabilities of the toolset in use (Puppet, GNI monitoring, SNOW) to deal with notifications at the most efficient level
  – Instead of sending them to the CCOperators/sysadmins

In detail
Problem detection: AI monitoring
  – Define metrics/sensors/notifications for your service (plus use of the existing ones)
Problem logging: several possibilities
  – GNI dashboard
  – SNOW ticket
  – Dedicated consumers
Problem categorization:
  – Which service? Which FE? If SNOW ticket, which support group?
Problem diagnosis and solution:
  – Ad-hoc, procedure or automated?
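The logging and categorization questions on this slide can be pictured as a small routing table. The sketch below is illustrative only: the class names, the routing keys and the example entries (`daemon_down`, `queue_full`, `batch-support`) are assumptions made for this example and do not come from GNI, SNOW or any existing CERN tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class LogTarget(Enum):
    """Where a problem gets logged (the three options on the slide)."""
    DASHBOARD = "GNI dashboard"
    TICKET = "SNOW ticket"
    CONSUMER = "dedicated consumer"


class Handling(Enum):
    """How the problem is diagnosed and solved."""
    AD_HOC = "ad-hoc"
    PROCEDURE = "procedure"
    AUTOMATED = "automated"


@dataclass(frozen=True)
class Notification:
    service: str  # which service?
    fe: str       # which FE? carried along for whoever handles the problem
    metric: str   # hypothetical name of the metric/sensor that fired


@dataclass(frozen=True)
class Route:
    log_target: LogTarget
    handling: Handling
    support_group: Optional[str] = None  # only meaningful for tickets


# Hypothetical routing table; a real one would be defined per service.
ROUTES: Dict[Tuple[str, str], Route] = {
    ("batch", "daemon_down"): Route(LogTarget.CONSUMER, Handling.AUTOMATED),
    ("batch", "queue_full"): Route(LogTarget.TICKET, Handling.PROCEDURE,
                                   "batch-support"),
}

# Anything without an explicit route is only shown on the dashboard.
DEFAULT_ROUTE = Route(LogTarget.DASHBOARD, Handling.AD_HOC)


def categorize(notification: Notification) -> Route:
    """Look up where a notification is logged and how it is handled."""
    route = ROUTES.get((notification.service, notification.metric),
                       DEFAULT_ROUTE)
    if route.log_target is LogTarget.TICKET and route.support_group is None:
        raise ValueError("a ticket route needs a support group")
    return route
```

Keeping the decisions in a table rather than in code means a service manager can review them at a glance, and notifications without a route still show up on the dashboard instead of being lost.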

Next steps
No dedicated effort; instead, part of service manager duties (like config mgt, monitoring, etc.)
What we expect from service managers:
Decide how to set up notifications for your service
  – More details in Miguel’s talk
Start thinking about which procedures you can automate
  – Some of the existing ops procedures?
  – Recovery actions for newly defined service notifications?
  – See Jerome’s talk for an example (batch)
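As a rough illustration of what an automated recovery action could look like, the sketch below registers one action per notification name and falls back to escalation when no action exists or the action fails. Everything here is hypothetical: the decorator, the notification name `daemon_down` and the `escalate` callback are assumptions made for this example, not part of Puppet, GNI or SNOW.

```python
from typing import Callable, Dict

# A recovery action receives a host name and reports whether it succeeded.
RecoveryAction = Callable[[str], bool]

_ACTIONS: Dict[str, RecoveryAction] = {}


def recovery_for(notification_name: str):
    """Decorator that registers a recovery action for one notification."""
    def register(func: RecoveryAction) -> RecoveryAction:
        _ACTIONS[notification_name] = func
        return func
    return register


@recovery_for("daemon_down")  # hypothetical notification name
def restart_daemon(host: str) -> bool:
    # Placeholder: a real action would call the service's own tooling.
    print(f"[dry-run] would restart the daemon on {host}")
    return True


def handle(notification_name: str, host: str,
           escalate: Callable[[str], None]) -> str:
    """Try the automated action first; escalate to a human otherwise."""
    action = _ACTIONS.get(notification_name)
    if action is None:
        escalate(f"{notification_name} on {host}: no automated action")
        return "escalated"
    try:
        recovered = action(host)
    except Exception:
        # A crashing action is treated like a failed one.
        recovered = False
    if recovered:
        return "recovered"
    escalate(f"{notification_name} on {host}: automated action failed")
    return "escalated"
```

For example, `handle("daemon_down", "node01", print)` runs the registered action, while an unregistered notification name goes straight to the `escalate` callback, which in practice might open a ticket.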