EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE COD21 22.09.09 EGEE09 Barcelona C-COD Survey results Vera Hansper



About the survey
5 simple questions to assess how the community sees the C-COD role
6 ROCs responded
  – 2 ROCs had responses from 2 different ROD teams
  – 1 ROC had responses from 5 ROD teams
  – 12 separate responses in total
All replies were welcome
  – Even if not all questions were answered (this skews the results slightly)
  – Even if some responses were out of context

1. How do you see the role of C-COD?
Interpretation of what the C-COD role actually does.
  – 8 responses: an oversight role for ROD teams; a co-ordination role overseeing quality of operations
  – 1 response: interpreted as the dashboard
  – 1 response: stressed lightweight framework of role
  – 2 responses: no comments
One response was overall happy with C-COD and had no further suggestions

Good summary of the role (courtesy of UKI)
Oversight and quality control of the RODs
Help in ticket handling for non-ROC matters
Provision of ROC tools
Integration of resource status into ROD tools (dashboard)
Coordination of RODs

2. Do you find it useful?
Affirmative: 11
No response: 1
Negative: 0
Further comments:
  – Could be improved.
  – Allows one to discover anomalies and sites that are not working properly, thereby reducing problems on the production grids.
  – Some matters are beyond the ROC's control.
  – For quality control, many operators found the COD intervention too invasive: the ROD operators know the sites better than the C-COD (however, they do understand that this was to help with the transition to the ROD model).

3. Do you find it necessary?
Affirmative: 11
No response: 1
Negative: 0
Further comments:
  – Definitely: a production grid needs this kind of support.

4. Do you think there is something missing from the daily operations handling of tickets?
There are some related steps mentioned in different sections, for example:
  – In Sites in downtime:
    – When a ticket is open against a site that continues to add downtime the tickets must be closed...
  This case could also be put under Closing tickets... So maybe it's better to have a workflow/flowchart to explain this kind of procedure/steps.
There is no automatic closing of useless alarms (this is not easy to implement, but it is necessary).
The two tickets that are generated when raising a ticket against a site (dashboard and GGUS) seem to be a bit much. One ticket would reduce management overheads.

4. responses, cont.
I think that some automation is still missing; C-COD (sic) shifters are currently occupied by a number of "trivial" operations that could probably be executed automatically by the dashboard software. Or maybe we only need a more ergonomic and integrated visualization (some information is duplicated across several tabs, some is hard to find).
I think that the CIC dashboard support greatly helps our work.
I think it's OK.

4. responses, cont.
Maybe a better handling of sites not yet in production in the GOCDB. If they are monitored and in the site-bdii they will still show up, and in practice our only choices are to put them in downtime or to switch monitoring off (the latter is not what the sites want, as they also want to know that the SAM tests work, and they do not want to reconfigure their site-bdii all the time).
We don't see the "not in production" info in the dashboard, and we will have to close tickets where the error is practically still "on", and that is bad for our metrics.
At least I would like to have a best practice for that. (Do we know how downtimes are counted for nodes not yet in production for the sites?)

4. responses, cont.
There is consensus that we need a method in the dashboard to resolve all current alarms in OK status (like the old "global" button).
  – This will allow one to concentrate time/effort on the really interesting ones.
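The requested "resolve all alarms in OK status" action can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `Alarm` record, its fields and the `close_ok_alarms` helper are hypothetical and do not describe the actual dashboard data model or API.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    # Hypothetical alarm record; field names are illustrative only.
    site: str
    test: str
    status: str          # latest test result, e.g. "OK" or "ERROR"
    is_open: bool = True


def close_ok_alarms(alarms):
    """Close every open alarm whose latest status is OK.

    Returns the alarms that remain open, i.e. the ones an operator
    still has to look at.
    """
    for alarm in alarms:
        if alarm.is_open and alarm.status == "OK":
            alarm.is_open = False
    return [a for a in alarms if a.is_open]


alarms = [
    Alarm("site-a", "host-cert", "OK"),
    Alarm("site-b", "job-submit", "ERROR"),
    Alarm("site-c", "host-cert", "OK"),
]
remaining = close_ok_alarms(alarms)
```

With the sample data above, only the `site-b` alarm stays open, which is the point of the request: one action clears the recovered alarms so that attention goes to the ones still failing.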

5. Do you feel something else could or should be done regarding daily operations, and if so, what?
SLA can be considered with daily operations.
There are a lot of clicks to be done if following the procedures properly.
Many of the alarms actually result from one-off transient failures which, although rare for a single site, happen quite frequently when looking at a number of sites. This generates some "noise in the system", especially since a site admin has no way of solving such a problem, and this results only in frustration.
Apart from the tests for Alice/LHCb etc. that are seen on the COD dashboard, it would be wise to also have the results for the other VOs that the site supports, so that site administrators or country-level operators can investigate what is wrong.

5. responses, cont.
Again, I hope that more integration with the ROD ticketing system will improve the efficiency of the shifts.
  – A stronger integration among tools could further improve our activity.
It is wonderful not to have to do it every week or every second week. Our Nordic solution there is best to keep us motivated to a minimum, because otherwise it would get very boring and it would be more difficult to hold a certain standard.
  – Would like to have a clear split in the dashboard metrics between the Nordic and the Benelux parts of the NE ROC.
It should be possible to remove a person, or at least all of their roles, in the GOCDB.

5. responses, cont.
The handling of alarms needs to be improved so that:
  – There are fewer false alarms (e.g. the host cert check often fails, and then is later OK). In some cases, no other alarms were raised, so this may indicate that there is a problem with the alarm.
  – Alarms are self-healing: operators spend a lot of time switching off transient alarms.
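One way to read the two requests above (fewer false alarms, self-healing alarms) is sketched below. This is only an illustration under stated assumptions: the rule "raise after a number of consecutive failures, clear on the first success" and the `alarm_states` function with its `threshold` parameter are hypothetical, not a description of how the dashboard or the monitoring tests work.

```python
def alarm_states(results, threshold=2):
    """Return, for each test result, whether an alarm would be raised.

    results   -- sequence of booleans, True meaning the test passed
    threshold -- consecutive failures required before raising (assumed value)

    An alarm is raised only once `threshold` failures occur in a row,
    which filters one-off transient failures, and it clears by itself
    on the first passing result (self-healing).
    """
    consecutive_failures = 0
    states = []
    for passed in results:
        consecutive_failures = 0 if passed else consecutive_failures + 1
        states.append(consecutive_failures >= threshold)
    return states


# A single failed check (second result) raises nothing; two failures
# in a row raise an alarm, which clears when the test passes again.
history = [True, False, True, False, False, True]
states = alarm_states(history)
# states == [False, False, False, False, True, False]
```

The threshold trades detection delay against noise: a higher value suppresses more transient failures but reports real problems later.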

Summary
It appears that C-COD, as an oversight body, is perceived as needed, though there are things that could possibly be changed
  – Further study of these responses is needed to find improvements
  – More feedback could also be useful
More ideas and feedback have been obtained regarding operations in general.
  – These also need better analysis