EGEE’09 — V. Konoplev — September 21-25 2009 – Barcelona Enabling Grids for E-sciencE www.eu-egee.org Veniamin Konoplev (RRC-KI) & … EGEE’09 21-25 September.

Presentation transcript:

Trouble ticket and incident correlation
Veniamin Konoplev (RRC-KI) & …
EGEE’09, 21-25 September 2009, Barcelona
Enabling Grids for E-sciencE

Enabling Grids for E-sciencE EGEE-III INFSO-RI

Subject history

The current ENOC mission includes receiving and processing the NRENs' trouble ticket (TT) flow in order to be aware of potential network connectivity problems that can affect EGEE operation.
Correct interpretation of TT content is essential for ENOC as a mediator between the NRENs and EGEE end users.
A statistical TT matching approach was proposed at the beginning of EGEE III to help correlate TT content with the part of the EGEE infrastructure that may be affected.
The approach finds correlations between NREN TT content and the actually observed connectivity status of EGEE nodes. Correlations observed over a long period form a knowledge database.
Since December 2008 a statistical matching prototype has been running at RBNET. It has been collecting EGEE node reachability status, classified as fine, moderate, bad or unreachable.
The principles of this approach and the first results were reported at EGEE’08, UF’09 and in DSA2.1. The details are summarized in the technical paper “…”.

Statistical TT matching principles

An NREN trouble ticket is interpreted as a vector of essential attributes. Currently the following attributes are used:
– Problem Interval: begin and end time of the problem as reported by the NREN.
– Problem Location: a short string describing where the problem occurs, in terms of the NREN's own identification scheme.
– Problem Kind: a tag describing the problem in the unified ENOC classification scheme. This field is practically unused at present because it is not set during TT preprocessing.
Site connectivity history is summarized in an alert database. An alert is represented as an interval and a severity.
NREN TTs are matched against the alerts of that NREN's sites, forming the so-called “hit statistic”.
– Hit = [Ticket_ID, Location, SITE, Alerts_Severity]
– A hit takes place if a site has alerts during a TT time interval.
– The hit inherits the severity of the most severe alert in the group.
The hit statistic is grouped by:
– Location. For each Location in the tickets we track all TTs and the TTs with hits.
– Site-Location. For each site we track the number of hits observed for each severity.
Metrics extracted from the hit statistic and used in TT analysis:
– Counts(Location): number of tickets seen for this location.
– Ratio(Location): percentage of TTs with hits for a particular location.
– SiteImpact(Site-Location): probability of getting an alert for a particular site if we see a TT with a particular location. This metric is tracked separately for each severity.
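The hit definition and the three metrics can be illustrated with a minimal Python sketch. It is not the prototype's code: the `Ticket` and `Alert` classes, the integer severity scale (1 = moderate, 2 = bad, 3 = unreachable), the use of plain integer timestamps and the sample identifiers (`TT-1`, `LOC-A`, `SITE-1`) are all assumptions made for illustration. Ratio and SiteImpact are returned as fractions rather than percentages.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    ticket_id: str
    location: str
    start: int  # seconds, hypothetical time base
    end: int


@dataclass(frozen=True)
class Alert:
    site: str
    start: int
    end: int
    severity: int  # assumed scale: 1 = moderate, 2 = bad, 3 = unreachable


def find_hits(tickets, alerts):
    """Return one hit (ticket_id, location, site, severity) per ticket and site."""
    hits = []
    for t in tickets:
        worst = {}  # site -> most severe overlapping alert
        for a in alerts:
            if a.start <= t.end and t.start <= a.end:  # intervals overlap
                worst[a.site] = max(worst.get(a.site, 0), a.severity)
        for site in sorted(worst):
            hits.append((t.ticket_id, t.location, site, worst[site]))
    return hits


def metrics(tickets, hits):
    """Compute Counts(Location), Ratio(Location) and SiteImpact per severity."""
    counts = defaultdict(int)
    for t in tickets:
        counts[t.location] += 1
    hit_tickets = defaultdict(set)
    site_hits = defaultdict(int)
    for ticket_id, location, site, severity in hits:
        hit_tickets[location].add(ticket_id)
        site_hits[(site, location, severity)] += 1
    ratio = {loc: len(hit_tickets[loc]) / n for loc, n in counts.items()}
    site_impact = {k: n / counts[k[1]] for k, n in site_hits.items()}
    return dict(counts), ratio, site_impact


# Made-up sample data, not taken from the presentation.
tickets = [
    Ticket("TT-1", "LOC-A", 0, 3600),
    Ticket("TT-2", "LOC-A", 10000, 13600),
    Ticket("TT-3", "LOC-B", 20000, 23600),
]
alerts = [
    Alert("SITE-1", 1800, 2400, 2),
    Alert("SITE-1", 3000, 3300, 3),
    Alert("SITE-2", 50000, 50600, 1),
]
hits = find_hits(tickets, alerts)
counts, ratio, site_impact = metrics(tickets, hits)
```

In this sample only `TT-1` overlaps any alert, so `LOC-A` gets one hit out of two tickets and the hit carries the severity of the worse of the two `SITE-1` alerts.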

Techniques for increasing matching accuracy

Purifying the initial TT and alert data:
– Only TTs and alerts with plausible interval lengths are taken into account (~15 min to 4 hours).
Detecting group connectivity events:
– Monitor intermediate points, i.e. the Pinger-to-GEANT uplink and the NREN-to-GEANT uplinks.
– Check the global number of simultaneously active alerts.
– Check the number of simultaneously active alerts per NREN.
Applying TT and alert interval padding:
– Extend TT and alert time intervals by a small configurable parameter (0-15 min). This reduces the effect of timing errors (e.g. system clock offset or human mistakes in TTs).
Correlating data from several alert systems located in different places (still pending).
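A rough Python sketch of the interval filtering, the padding and the simultaneous-alert count follows. The function names, the 5-minute default padding and the representation of intervals as `(start, end)` pairs in seconds are assumptions. The slide does not say what count of simultaneous alerts marks a group event, so the sketch only computes the peak and leaves the threshold to the caller.

```python
MIN_DURATION = 15 * 60    # ~15 min lower bound quoted on the slide
MAX_DURATION = 4 * 3600   # ~4 hour upper bound quoted on the slide


def is_plausible(start, end, lo=MIN_DURATION, hi=MAX_DURATION):
    """Keep only intervals whose length lies within the plausible range."""
    return lo <= end - start <= hi


def pad(start, end, padding=300):
    """Extend an interval on both sides (padding value is an assumed default)."""
    return start - padding, end + padding


def max_simultaneous(intervals):
    """Peak number of intervals active at the same moment (sweep line)."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Tuples sort ends (-1) before starts (+1) at equal timestamps,
    # so intervals that merely touch are not counted as simultaneous.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


# Made-up sample intervals in seconds.
raw = [(0, 600), (0, 1800), (100, 20000), (3000, 6000)]
kept = [iv for iv in raw if is_plausible(*iv)]
padded = [pad(*iv) for iv in kept]
peak = max_simultaneous([(0, 10), (5, 15), (10, 20)])
```

Calling `max_simultaneous` once on all alerts and once per NREN would give the two counts listed above; a high peak suggests a group connectivity event rather than a site-specific problem.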

Input data: NREN complexity

NREN | Ticket
GARR | 243
HEANET | 143
RENATER | 135
REDIRIS | 88
HUNGARNET | 60
E2ECU | 38
NORDUNET | 30

Typical NREN topology is rather complex, which makes human TT interpretation difficult. This complexity also prevents storing and maintaining detailed NREN topologies in the NOD database.
[Figure: RENATER Network Topology]
The number of unique locations seen per NREN gives an estimate of NREN topology complexity.

Matching Results (1)

– The initial belief that statistical matching would reliably map all essential ticket locations to a list of affected sites turned out to be unfounded.
– Main reason: very weak statistical data. Locations with a hit count > 1 are rare.
– Matching results for GARR from Jan 2009 to Aug 2009 are shown below as an example.

LOCATION | Ticket_Hits/Ticket_Counts | Site | Impact (%) | Significance (%) | Valid
IT / POP-CA -- POP-RM | 1/3 | INFN-CAGLIARI | 33 | | Yes
IT / HSH-VICO EQUENSE | 1/3 | SPACI-CS-IA64 | 20 | 38 | ?
IT / INFN - NAPOLI | 1/3 | INFN-T1 | 33 | 55 | No
 | | INFN-CNAF | 33 | 57 | No
IT / ASI - TORINO -- | 1/3 | INFN-LNL-2 | 33 | 88 | No
 | | INFN-PADOVA | 33 | 88 | No
 | | INFN-MILANO | 33 | 73 | No
 | | ITB-BARI | 33 | 50 | No
IT / UNI-NAPOLI PARTH | 1/4 | INFN-ROMA2 | 33 | 27 | No
 | | INFN-CAGLIARI | 33 | | No
IT / UNI-ROMA-LUSPIO | 1/4 | INFN-BOLOGNA | 25 | 71 | ?
 | | INFN-T1 | 25 | 55 | ?
 | | INFN-CNAF | 25 | 57 | ?
 | | PPS-CNAF | 25 | 50 | ?
IT / POP-PD1 -- POP-M | 1/6 | INFN-TRIESTE | 17 | 24 | Yes

Matching Results (2)

However, the current matching results can still be used as part of the TT processing workflow. As the table below shows, only 34% of tickets with repeated locations were left “in question” for GARR, HEANET and RENATER.

Columns: Group N | Location Group | GARR | HEANET | RENATER | SUM | Number of TT since Jan 2009 | Remarks for group
(the per-NREN, SUM and number-of-TT values are not preserved in this transcript)

Group N | Location Group | Remarks for group
1 | Total number of locations | Since Jan
2 | Seen 2 or more times | Set of tickets we consider
3 | Seen 3 or more times | Suitable for statistical approach
4 | Seen 3 or more times with no hits | Can be considered as EGEE agnostic
5 | Seen 3 or more times with hits | Candidates for statistical TT matching
6 | Reliably matched to EGEE sites | Criterion: Location-Site object has 3 or more hits
7 | “Grey zone” | Needs further/alternative processing; = Group2 - Group4 - Group6

Tickets with “frequent” Locations:
– 56%: committed as EGEE agnostic
– 10%: matched to EGEE sites
– 34%: still in question
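The group definitions can be read as a simple per-location decision rule. The Python sketch below is one possible reading, assuming that each location is summarized by three hypothetical inputs: its ticket count, the number of its tickets with hits, and the largest hit count of any of its Location-Site objects. The label strings and sample location names are illustrative only.

```python
from collections import Counter


def classify_location(ticket_count, tickets_with_hits, max_site_hits):
    """Assign a location to a processing group, following the table's criteria."""
    if ticket_count < 2:
        return "not considered"   # outside group 2
    if ticket_count >= 3 and tickets_with_hits == 0:
        return "EGEE agnostic"    # group 4
    if max_site_hits >= 3:
        return "matched"          # group 6
    return "grey zone"            # group 7 = group 2 - group 4 - group 6


# Made-up per-location statistics: (tickets, tickets with hits, max site hits).
stats = {
    "LOC-A": (1, 0, 0),
    "LOC-B": (3, 0, 0),
    "LOC-C": (5, 4, 3),
    "LOC-D": (2, 0, 0),
    "LOC-E": (4, 2, 2),
}
groups = {loc: classify_location(*s) for loc, s in stats.items()}
summary = Counter(groups.values())
```

Note that under this reading a location seen exactly twice always falls into the grey zone, because groups 4 and 6 both require at least three tickets or hits.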

Matching Results (3)

Details for matched locations

RENATER
LOCATION | SITE-LOCATION
FR / STRASBOURG | IN2P3-IRES
FR / MARSEILLE | IN2P3-CPPM
FR / JUSSIEU | IPSL-IPGP-LCG2
FR / GRENOBLE | IN2P3-LPSC
FR / NANTES | IN2P3-SUBATECH
FR / ORSAY | IPSL-IPGP-LCG2

HEANET
LOCATION | SITE-LOCATION
IE / DIAS | cpDIASie
IE / IT TRALEE | giITTRie
IE / GEANT | giITTRie, cpDIASie, giNUIMie

GARR
-- NONE --

Matching details for the strong criterion (Location-Site has >= 3 hits) are shown above. Matching accuracy is 100%.

Matching Results (4)

Details for locations in the “grey zone”

The list of locations left in the grey zone for RENATER:
FR / CAYENNE-FTLD
FR / PARIS-2
FR / CRETEIL
FR / PARIS1
FR / AFNIC
FR / CERIMES
FR / CSI
FR / UNIVERSITE PARIS 10
FR / TELEHOUSE2 -INTERXION1 CIRCUIT
FR / INRA
FR / PARIS1-ORSAY
FR / INA
FR / CLERMONT-FERRAND
FR / PARIS2
FR / CADARACHE
FR / NICE-CADARACHE
FR / GEANT-E2E
FR / BESANÇON-STRASBOURG
FR / PARIS-NOUMÉA
FR / LYON1-NICE
FR / PARIS1-LYON1
FR / PAU-TOULOUSE
FR / TOURS - ORLÉANS
FR / NANTES-ANGERS
FR / LE MANS - TOURS
FR / KOUROU-CSG

Conclusions

Main practical results:
– 76% of repeated locations were either considered “EGEE agnostic” or mapped to EGEE sites.
– All mapped repeated locations (10%) were mapped with 100% accuracy.
Reasons for TT matching failures:
– Weak TT statistics: only a small part of the locations was suitable for matching (ticket count >= 3), and the part with ticket count >= 4 was negligible.
– Imperfect node status detection: matching was performed using data from Smokeping and DownCollector. Smokeping had a poor uplink, and DownCollector cannot track multilevel node status.
NRENs can improve the content of their tickets:
– Short and accurate location (the RENATER format is a good example).
– Short problem severity tag.
Matching results can be used as part of TT processing, in conjunction with lexicographical and manual location matching.
Further directions:
– Tune and improve the matching criteria.
– Combine statistical matching with other methods.
– Renew the Smokeping configuration and move it to a “good” location.
– Add multi-pinger TT processing functionality.