PPS All sites Meeting: - CODs and PPS - Monitoring Tools

PPS All sites Meeting:
- CODs and PPS
- Monitoring Tools
A. Retico (CERN/SA1), I. Neilson (CERN/SA1), M. Boehm (EDS)
EGEE 2007, PPS All sites Meeting: “PPS Operations” session
Budapest, Hungary, 3rd October, 2007

PPS Operations: recent history
- The current model of operations for PPS was agreed at the beginning of 2006
- Decision: same tools, actors and processes as in production
  - SAM, gstat, CIC Portal, GGUS, COD, escalation procedure, ROC … cover PPS
- “Integration” of operations into PPS was pursued in two alternative ways:
  - Replicating instances of processes/tools/documents in use in production
    e.g. documentation entry point (PPS web vs CIC portal), release procedure, SAM, FCR, wiki …
  - “Including” PPS in processes and tools
    e.g. CIC Portal, COD, Ops procedures, certification by ROC, gstat, GridView …

PPS Operations: recent history
- 2007: the operating conditions on the grid have changed
  - More and more sites to monitor
  - No new teams joining CODs
  - Emphasis on automation
- Review of PPS alarm handling started (by CODs)
- CODs are suffering from:
  - instability of PPS sites
  - low priority given to PPS tickets
  - non-responsiveness of ROCs to requests for suspension
  - scarce attention to scheduled downtimes
- ROCs and PPS sites are suffering from:
  - tickets submitted in the course of updates
  - the need to reply to tickets for a poorly used service

PPS Operations: recent history
August 2007: two options possible
1) COD opening and following up tickets to PPS sites
  - No exceptions in the ops procedures for PPS sites
  - ROCs automatically in the loop
  - Full test of the support path for new services
  - Frequent "false positives" or extra care needed by CODs
  - COD effort unchanged whereas ROCs can set priorities
2) COD not opening tickets and PPS sites registering to CIC Portal’s RSS alarm notifications
  - Faster than COD -> TPM -> ROCs
  - Problem faced while still "hot"
  - Step in the direction of automation
  - Support line in PPS not strictly dependent upon ROC, CODs and TPMs
  - ROCs completely out of the loop
  - Experience: CODs not submitting tickets -> service degradation

PPS Operations: recent history
A possible compromise: Option 2) ++
- Same as option 2, with the addition of a weekly status report sent by the CODs to ROC and PPS support
- Does this really make life simpler for CODs?
  - Still need to monitor and follow up PPS sites
  - Report has to be prepared -> additional procedure
- On one thing most people agree: splitting the CIC Portal into Production and PPS instances
- We expect to make some decisions during this conference

What’s new with monitoring tools
- The CODs’ work is hard to replace, but the tools have also improved
- Tools for site-level monitoring in preparation
  - Nagios being packaged for distribution with YAIM
  - Pilot installation monitoring CERN_PPS is running
- SMS alerts from SAM (centrally managed)
- RSS feed with alarms from the CIC portal (subscribing) proved to be useful at some sites (e.g. CERN_PPS)
- “GridMap” interface to SAM for high-level monitoring available on PPS web
- Survey: Are you using a tool to monitor your PPS site?
  - RSS Alarms
  - Nagios
  - Ganglia
  - other tools
  - Nothing
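
As an illustration of the RSS alarm subscription mentioned above, here is a minimal sketch of how a site might filter an alarm feed for its own entries. The slides do not describe the CIC portal feed layout, so the sample feed, the assumption that the site name appears in each item title, and the helper name `alarms_for_site` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed: the real CIC portal feed layout is not given
# in the slides, so this plain RSS 2.0 structure is an assumption.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Alarms (illustrative sample)</title>
    <item>
      <title>CERN_PPS: CE job submission test failed</title>
      <pubDate>Wed, 03 Oct 2007 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>OTHER_SITE: SRM test failed</title>
      <pubDate>Wed, 03 Oct 2007 09:05:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def alarms_for_site(feed_xml, site_name):
    """Return (title, pubDate) pairs for items whose title mentions site_name."""
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        # Assumption: the site name appears verbatim in the item title.
        if site_name in title:
            matches.append((title, item.findtext("pubDate", default="")))
    return matches


if __name__ == "__main__":
    for title, published in alarms_for_site(SAMPLE_FEED, "CERN_PPS"):
        print(published, title)
```

A real subscription would fetch the feed periodically and match on whatever field the portal actually uses to identify the site.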

Waiting for lunch…
- Short demo of GridMap in PPS (Max)
- Short demo of Nagios @ CERN_PPS (Ian)
- Short case study: Self-operations in SWE ROC (Mario)
- Collective exercise: alternatives to COD?
  - Hypothesis: the CODs, tomorrow, stop monitoring PPS
  - Can we monitor ourselves? How?
  - Start thinking

Next speaker: Max Boehm (EDS)
Questions after Mario’s talk, please
Thanks