AAA NCNU INFORMATION TECHNOLOGY: IT OPERATIONS - Problem Management


Slide 1: Problem Management
AAA NCNU INFORMATION TECHNOLOGY: IT OPERATIONS
Jim Heronime, Manager, ITSM Program
Tanya Friehauf-Dungca, Manager, Problem Management
2/17/11

Slide 2: Agenda
- PM Overview
- History
- Vision & Mission
- Operational Level Agreement (OLA)
- Action Items
- Trending (Proactive Problem Management)
- Facilitated Meetings (MIR & ToE)
- KPIs and Metrics
- Future Initiatives
- Questions? Problem Management Team Members

Slide 3: Problem Management Overview
- Main goal of Problem Management:
  - Detection of the underlying causes of an incident and the subsequent resolution and prevention of the incidents
- Problem Management ensures:
  - The identification and classification of problems, root cause analysis, and resolution of problems
- The Problem Management process also includes:
  - The formulation of recommendations for improvement, maintenance of problem records, and review of the status of corrective actions

Slide 4: History of PM at AAA
- Began our formal Problem Management practice in 2008
  - Track major incidents
  - Identify root cause for major incidents
  - Rudimentary MS Access database to store info
- Began formal implementation of ITSM in June 2009
  - Average rate of root cause found was 55.4%
  - Mean time to close problems = 6 days
- Implemented the current iteration of Problem Management in October 2009. By January 2010:
  - Average rate of root cause found was 83%
  - Mean time to close problems = 3 days
- We continue to mature our process

Slide 5: Vision and Mission
- VISION:
  - To permanently eliminate problems in our production environment and prevent new problems from occurring
- MISSION:
  - To aggressively identify root cause of problems and drive permanent solutions to stabilize our IT infrastructure
- We do this by:
  - PROCESSES: Ensuring PM processes and procedures are followed by IT support teams
  - ACTION ITEMS: Managing assigned action items and their timeframes with support teams to drive permanent solutions
  - ROOT CAUSE: Driving root cause identification within OLA timeframes

Slide 6: OLAs for PM
- Be aggressive: 3 business days to identify root cause
- Report enables us to track daily progress
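The 3-business-day OLA above can be turned into a due date for each problem record. The following is a minimal sketch, not the team's actual report logic. It assumes that counting starts on the first business day after the problem is opened and that weekends are the only non-business days (no holiday calendar). The function names are hypothetical.

```python
from datetime import date, timedelta

OLA_BUSINESS_DAYS = 3  # from the slide: 3 business days to identify root cause


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start`.

    Assumption: Saturday and Sunday are the only non-business days;
    holidays are ignored in this sketch.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def root_cause_due(opened: date) -> date:
    """Date by which root cause should be identified under the OLA."""
    return add_business_days(opened, OLA_BUSINESS_DAYS)


def is_within_ola(opened: date, root_cause_found: date) -> bool:
    """True if root cause was identified on or before the OLA due date."""
    return root_cause_found <= root_cause_due(opened)
```

A daily progress report could call `root_cause_due` for every open problem and list those whose due date is today or already past.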

Slide 7: Action Items
- Objective:
  - Action items are identified and assigned to drive permanent solutions
- Types of Action Items:
  - Root cause identification for every problem created from an incident
  - Areas of improvement:
    - Documentation
    - Process improvement & training
    - Vendor management
    - Hardware replacement
- How are Action Items identified?
  - Incident management activities
  - Problem management activities
  - Root Cause Analysis
  - Meetings: Daily IT Operations Meeting, Major Incident Review (MIR), or Team of Experts (ToE)
- How are they tracked?
  - Maximo: integrated system with Change, Incident, and Asset

Slide 8: Trend Analysis (Proactive Problem Management)
- Objective:
  - Analyze related incidents for common root causes
- Collaboration with Operations Bridge:
  - Weekly work sessions to identify potential areas of concern
  - The Problem Management team reviews related incidents to look for common symptoms, causes, or conditions
- Commonalities identified by trend analysis?
  - A Global Problem record is created and assigned to the Service Owner with appropriately assigned action items
- Service Owner analysis:
  - The Service Owner prioritizes their efforts
  - Determine to identify root cause
  - Prioritize and approve with business for funding and scheduling

Slide 9: Trend Analysis (Proactive Problem Management)
- Reporting:
  - The Problem Management team reports out on the status of the Trend records as appropriate until ticket closure
- Examples of Trend statuses:
  - Trend Identified – Pending funding
  - Trend Identified – Pending approval
  - Trend Identified – Pending change
  - Trend Identified – Changes not funded
  - Trend Identified – Not a managerial priority
  - Determined not a trend
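The review of related incidents for common symptoms, causes, or conditions described in the Trend Analysis slides could be supported by a simple grouping pass over incident records. This is an illustrative sketch only: the `Incident` fields, the grouping key (service plus symptom), and the `min_count` threshold are assumptions, not the team's actual criteria or the Maximo data model.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Incident(NamedTuple):
    # Hypothetical record shape for this sketch
    ticket: str
    service: str
    symptom: str


def find_trend_candidates(
    incidents: Iterable[Incident], min_count: int = 3
) -> list[tuple[tuple[str, str], int]]:
    """Group incidents by (service, symptom) and return the groups that
    occur at least `min_count` times, most frequent first.

    Each returned group is a candidate for review, not a confirmed trend.
    """
    counts = Counter((i.service, i.symptom) for i in incidents)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    sample = [
        Incident("INC-1", "email", "login timeout"),
        Incident("INC-2", "email", "login timeout"),
        Incident("INC-3", "vpn", "dropped session"),
        Incident("INC-4", "email", "login timeout"),
    ]
    for (service, symptom), n in find_trend_candidates(sample):
        print(f"{service}: {symptom} ({n} incidents)")
```

Groups that survive the weekly review would then become Global Problem records assigned to the Service Owner, as the slides describe.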

Slide 10: Major Incident Review (MIR)
- What is it?
  - Evaluation of the incident process after a major incident
- What’s its purpose?
  - Validate details of the incident record
  - Review incident handling and identify opportunities
  - Identify lessons learned and share them across the enterprise
  - Identify action items
- When is one required?
  - Mandated for all Severity 1 incidents
  - Lower severities by request or as needed
- Why does Problem Management facilitate a Major Incident Review?
  - Unbiased view of events, since the team is not involved in the incident call

Slide 11: MIR Agenda

Slide 12: MIR Template

Slide 13: Team of Experts (ToE)
- What is it?
  - A special team of technical subject matter experts (SMEs) assembled to analyze and resolve critical problems at an accelerated pace to minimize or eliminate exposure
- How long has this process been in place?
  - This is one of our newest additions, in place since December 2010
- Why are ToEs initiated?
  - Teams not collaboratively engaging each other
  - Need to identify root cause immediately, for example after back-to-back incidents
  - Leadership’s request for information and status of critical or chronic problems

Slide 14: ToE (cont.)
- ToE Activities
  - Root cause analysis
  - Brainstorm solutions and permanent fixes
  - Assign action items and due dates
- Where’s the template?
  - Currently under construction

Slide 15: KPIs and Metrics
- KPIs
  - Root cause identified within OLA
  - MIRs conducted for Sev1 Incidents
- Operational Metrics
  - Total Problems by Severity
  - Problems by Causing Party
  - Outages by Domain (Applications, Network, Security, Servers, Telecom or Other)

Slide 16: KPIs
* Baseline determined by internal historical data = 82%
* Industry standards non-existent

Slide 17: KPI Details
* 2010 average for root cause (RC) identified within OLA = 85.7%
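The "root cause identified within OLA" KPI from the slides above is a percentage of problems whose root cause was found within the 3-business-day OLA, compared against the 82% internal baseline. A minimal sketch of that arithmetic follows. The input format (business days to root cause per problem, `None` when no root cause was found) and the function names are assumptions, and problems without a root cause are counted as misses here, which may differ from how the team computes the figure.

```python
from typing import Optional, Sequence

OLA_BUSINESS_DAYS = 3   # from the OLA slide
BASELINE_PCT = 82.0     # internal historical baseline from the KPI slide


def pct_root_cause_within_ola(
    days_to_root_cause: Sequence[Optional[int]],
    ola_days: int = OLA_BUSINESS_DAYS,
) -> float:
    """Percentage of problems with root cause identified within the OLA.

    `None` means root cause was never identified; it counts as a miss.
    Returns 0.0 for an empty input.
    """
    if not days_to_root_cause:
        return 0.0
    met = sum(1 for d in days_to_root_cause if d is not None and d <= ola_days)
    return 100.0 * met / len(days_to_root_cause)


def meets_baseline(pct: float, baseline: float = BASELINE_PCT) -> bool:
    """True if the measured percentage is at or above the baseline."""
    return pct >= baseline
```

For example, `pct_root_cause_within_ola([1, 2, 3, 5, None])` counts three of five problems as within OLA and returns `60.0`, which `meets_baseline` would report as below the 82% baseline.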

Slide 18: Examples of Metrics
* Change Freeze
Chart labels: AT&T, AAA NCNU

Slide 19: Future Initiatives
- Workarounds and defects: Known Error Database
- Action item validation: quality check on completed actions
- ToE template development

Slide 20: Questions?
- PROBLEM MANAGEMENT TEAM MEMBERS
  - Mark Hernandez, IT Service Transition Analyst V
  - Gessica Briggs-Sullivan, IT Service Transition Analyst III
  - Andrew Egan, Intern

