Helene Cordier, CNRS-IN2P3 Villeurbanne, France

Slides:



Advertisements
Similar presentations
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks From ROCs to NGIs The pole1 and pole 2 people.
Advertisements

EGEE-III INFSO-RI Enabling Grids for E-sciencE COD June 2009 COD-20 Hélène Cordier COD-20, CNRS-IN2P3, CSC.
EGI: SA1 Operations John Gordon EGEE09 Barcelona September 2009.
INFSO-RI Enabling Grids for E-sciencE SA1: Cookbook (DSA1.7) Ian Bird CERN 18 January 2006.
Lundi 12 octobre 2015 CA update procedure Hélène Cordier IN2P3/CNRS Computing Centre, Lyon, France.
INFSO-RI Enabling Grids for E-sciencE EGEE 1 st EU Review – 9 th to 11 th February 2005 CERN.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks ROD model assessment ROC UKI John Walsh.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Operations Automation Team James Casey EGEE’08.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Multi-level monitoring - an overview James.
EGEE-III INFSO-RI Enabling Grids for E-sciencE COD21 22 Sept 2009 Forum & COD-22 since COD21 until EGI Hélène Cordier COD-22, CNRS-IN2P3,
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks EGEE-EGI Grid Operations Transition Maite.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks SA1: Grid Operations Maite Barroso (CERN)
INFSO-RI Enabling Grids for E-sciencE EGEE SA1 in EGEE-II – Overview Ian Bird IT Department CERN, Switzerland EGEE.
EGEE-III INFSO-RI Enabling Grids for E-sciencE COD June 2009 COD-20 Parallel sessions Hélène Cordier COD-20, CNRS-IN2P3,
8 th CIC on Duty meeting Krakow /2006 Enabling Grids for E-sciencE Feedback from SEE first COD shift Emanoil Atanassov Todor Gurov.
Operations Working Group Summary Ian Bird CERN IT-GD 4 November 2004.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Resource Allocation in EGEEIII Overview &
PIC port d’informació científica EGEE – EGI Transition for WLCG in Spain M. Delfino, G. Merino, PIC Spanish Tier-1 WLCG CB 13-Nov-2009.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI Ops Portal New Requirements.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid Monitoring Tools E. Imamagic, SRCE CE.
EGEE is a project funded by the European Union under contract IST Roles & Responsibilities Ian Bird SA1 Manager Cork Meeting, April 2004.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks COD-17
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Operations Automation Team Kickoff Meeting.
Mardi 8 mars 2016 Status of new features in CIC Portal Latest Release of 22/08/07 Osman Aidel, Hélène Cordier, Cyril L’Orphelin, Gilles Mathieu IN2P3/CNRS.
Mercredi 9 mars 2016 CIC Portal/COD Activities Hélène Cordier IN2P3/CNRS Computing Centre, Lyon, France.
Operations model Maite Barroso, CERN On behalf of EGEE operations WLCG Service Workshop 11/02/2006.
INFSO-RI Enabling Grids for E-sciencE Operations Parallel Session Summary Markus Schulz CERN IT/GD Joint OSG and EGEE Operations.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Pole 2 : Restructuration of the OPS Manual.
Feedback from joining and first COD shift M.Radecki on behalf of CE ROC COD-7, Lyon, France.
EGEE-III INFSO-RI Enabling Grids for E-sciencE Pole 2 wrap up Vera, Helene, Malgorzata, David, Fotis, Diana.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks What all NGIs need to do: Helpdesk / User.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Best Practices and Use cases David Bouvet,
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks EGEE Operational Procedures (Contacts, procedures,
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks ROC model assessment AP ROC ShuTing Liao.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks The Dashboard for Operations Cyril L’Orphelin.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks COD-16 (Transition to EGEE-III) Report to.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks COD-17
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks IT ROC: Vision for EGEE III Tiziana Ferrari.
Operations Coordination Team Maria Girone, CERN IT-ES GDB, 11 July 2012.
Enabling Grids for E-sciencE EGEE-II INFSO-RI ROC managers meeting at EGEE 2007 conference, Budapest, October 1, 2007 Admin Matters Vera Hanser.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI COD activity in EGI-InSPIRE Marcin Radecki CYFRONET, Poland & COD Team 9/29/2016.
EGEE-III INFSO-RI Enabling Grids for E-sciencE COD EGEE09 Barcelona Pole-2 Restructuring of Procedures Vera Hansper.
Nordic NE ROC Face 2 Face Meeting
EGEE-III INFSO-RI Enabling Grids for E-sciencE COD EGEE09 Barcelona C-COD Survey results Vera Hansper.
Documentation, Best Practices and Procedures: Roadmap
Il Sistema di Supporto INFNGrid & GGUS (Global Grid User Support )
PPS All sites Meeting: - CODs and PPS - Monitoring Tools
EGEE is a project funded by the European Union
SA1 Execution Plan Status and Issues
Ian Bird GDB Meeting CERN 9 September 2003
Service Level Agreement/Description between CE ROC and Sites
NGI Operations readiness report
Report on SLA progress Ioannis Liabotis <ilaboti at grnet.gr>
GOCDB current status and plans
Operations & Coordination Tools
Cyril L’Orphelin (CC-IN2P3) COD-19, Bologna, March 30th 2009
The CCIN2P3 and its role in EGEE/LCG
Maite Barroso, SA1 activity leader CERN 27th January 2009
ROD model assessment ROC FR
Operational Documentation Vera Hansper, CSC/NDGF
Nordic ROC Organization
R-COD model readiness in FR
‘s tools targeted to be useful for COD activity
NE-ROC Nordics Operations
Pole 3 – Dashboard Assessment COD 20 - Helsinki
LCG Operations Workshop, e-IRG Workshop
Connecting the European Grid Infrastructure to Research Communities
Leigh Grundhoefer Indiana University
EGEE Operation Tools and Procedures
Site availability Dec. 19 th 2006
Presentation transcript:

Helene Cordier, CNRS-IN2P3 Villeurbanne, France COD-16 Transition meeting EGEE-III http://indico.cern.ch/conferenceDisplay.py?confId=34516 Helene Cordier, CNRS-IN2P3 Villeurbanne, France

Contents What has been achieved in EGEE-II Mandate for EGEE-III https://cic.gridops.org/index.php?section=cod&page=codprocedures Mandate for EGEE-III http://indico.cern.ch/conferenceDisplay.py?confId=34516 Definition of the 3 poles https://twiki.cern.ch/twiki/bin/view/EGEE/COD_EGEE_III Main Goals and objectives of this meeting Pole 1 : Regionalisation of the service Pole 2 : Assessment and Improvement of the service Pole 3 : COD tools Operational model  Procedures and tools  Sustainability and Reliability at the international level Pole 1 : Regionalisation of the service – what (service level) / how (operations model) Pole 2 : Assesment of the service - Problem follow-up/ procedure /metrics Pole3 : COD tools Representative here means that we need to know who will be active and who will not COD-16 Transition meeting

Mandate COD as we understand it today will not exist at the end of EGEE-III; consequently, all current rocs are to run a regional COD service by the end of EGEE-III. The mandate of the COD is mainly to prepare for the evolution of the current operations model towards a more “sustainable” one in 2 years time. i.e. move from the current centralized COD model to regional COD (r-COD) model for all federations. Of course, our goal is to achieve the set-up of this distributed model as smooth and seamless as possible; hence a need for communication and share of expertise between regional CODs and for building together the operations model. Operational model  Procedures and tools  Sustainability and Reliability at the international level SUSTAINABILITY : r-cod model Smooth and Seamless : the current work is still done  Everybody stays on board unless proven otherwise RELIABITY /AVAILABILITY : model of c-COD COD-16 Transition meeting

From 6 wkg grps to 3 poles 1rst line Support : CE- NE- TWN - SWE Best Practices and Procedures : DE-CH /SEE COD tools : FR- IT- SWE-CE 1rst line support Best Practices COD tools Ops. Manual Procedure Failover /HA ops tools CIC integration TIC/ HA m/w Efforts from same people who are doing or have been doing the COD work and not somebody out of the blue. Integration of the OAT specifications into the evolution of the COD dashboard according to COD proposals, make regional instances  “catch-all “ and elaborate  "thin layer COD" instance. Weekly Operations Regional teams Operation Model SA1 coordination Ops tools OAT H.Cordier

Main Goals / Objectives Define what would the operations model be Validation of the internal organization 3 poles with enough «representative participation» and « active » leaders i.e. start with 4 federations e.g pole 1 and its steering comittee e.g. Pole 1 Active leaders should come from 2 federations. Each Pole has a task list and is staffed consequently 3 poles whith a list of tasks including some of the EGEE-II current tasks / COD15 actions list  easy to follow-up Identify need for tools / staffing Define precisely Interfaces with external bodies Logistics : Phone conferences bi-monthly by poles in-between meetings F2F Meetings following SA1 coordination meetings at best. Composition/specific needs ------------ Next meeting : EGEE’08 Representativity from all federations means that the outcome has some value to the EGEE-III project and the structure beyond. Some of the EGEE-II current tasks: People have to sort what/how COD-16 Transition meeting

Because there is mow more than one COD … COD : Distributed teams doing monitoring shifts and ensuring critical tests failures against sites are attended at i.e. at minima : communication schema + grid expertise stored in procedures and wiki First Line support : COD service for sites within a federation with current model of regionalization for operations being: Alarms for sites belonging to AP region and younger than 24h don't appear on the regular COD dashboard. 1st line supporters are allowed to switch these alarms off, mask them, and create tickets from these alarms. 1st line supporters  are allowed to pass information through the site notepad. Access to all other info in read-only mode. COD-16 Transition meeting

Glossary as of today R-COD : ultimate model of 1rst line support with maximum autonomy regarding alarms and operational tickets assigned to sites in the region. C-COD : small (how small can it be) team coordinating r-CODs, catch-all for monitoring cover need for escalation process/grid experts/reporting/grid integrity. Maximum autonomy to be defined Operational tickets : which ones now stay and could stay/ or should stay at the regional level  C-CODs COD-16 Transition meeting

1rst line support forum – CE /TW, SWE/NE https://twiki.cern.ch/twiki/bin/view/EGEE/Pole_1 Current model : alarms younger than 24h Planning and specificities of federations/Questionnaire Recommendations for the r-COD service on federations who join How to improve the model – Open questions Go further in the regional model ? Boundary between the 1rst line support/c-COD ? What would do the final c-COD team ? (e.g. need for escalation process/grid experts/reporting/grid integrity) Knowledge base and collaborative tools Impact of VO specific tests on the operations model ? Mutualization of work of both teams at the region level H.Cordier

1rst line support forum – cont’d CE /TW, SWE/NE How to improve the model – Open questions Knowledge base and collaborative tools Impact of VO specific tests on the operations model ? Mutualization of work of both teams at the region level Regionalisation of the COD service What it takes to have it successful still on May 1rst 2010. What is the operations model c-COD going to be  tomorrow How to improve the process of COD service  hints Run SAMAP Set aside network glitches impact Take into account the Core Services Failure COD-16 Transition meeting

Best practices &procedures DE-CH – SEE and ??? https://twiki.cern.ch/twiki/bin/view/EGEE/Pole_2 Best Practices Drive evolution of COD Best Practices reflected in procedures: Advisory comittee for ensuring uniformisation and regulation, before setting up new critical tests Interface to weekly operations meetings and ROC: incl.item 196: Ask CODs if they can check http://nagios.eugridpma.org/ in their cycle of work from GDA on monday 09/06/08 ROC : accounting tests passing critical. - Procedure release PPS: https://twiki.cern.ch/twiki/bin/view/LCG/PPSReleaseProcedures VO aspects: -VO registration: https://edms.cern.ch/document/503245 Site aspects: - EGEE-II SA1 SLD https://edms.cern.ch/document/860386/ -Site registration: https://edms.cern.ch/document/503198 - Downtime  Procedure - How-to https://cic.gridops.org/index.php?section=rc&page=SDprocedure General: - New CA release: http://goc.grid.sinica.edu.tw/gocwiki/Procedure_for_new_CA_release COD activity: - Operational Procedure Manual -- official version 1.6 to be released anytime now https://edms.cern.ch/document/840932 - [COD activity - describing the working groups and referencing the activity report] https://cic.gridops.org/index.php?section=cod&page=codprocedures - [COD dashboard - How to ] https://cic.gridops.org/common/all/documents/Portal_documentation/dashboard_howto.pdf H.Cordier

Best practices & procedures – cont’d DE-CH – SEE and ??? Ensuring that the present COD work is not disrupted: Operational use-cases follow-up /operational tools GGUS report https://twiki.cern.ch/twiki/bin/view/EGEE/OperationalUseCasesAndStatus Work Assessment so that the central service is operational Metrics for COD activity in EGEE-III, Handover improvement, Alarm masking rules and weighing mechanism – SEE Operation Procedure Manual cf Clemens’ talk and MSA1.2 COD-16 Transition meeting

COD tools – CE-IT-FR-SWE https://twiki.cern.ch/twiki/bin/view/EGEE/Pole_3 TIC Failover mechanisms for GSTAT, SAMAP, CIC and GOC Follow-up of HA of ops tools /ENOC and core service mw /TCG -- forum /OCC  COD dashboard Evolution towards N regional instances + 1 specifically dedicated for C-COD. Incl Alarm weighing mechanism. Request for change on monitoring tools through OAT Follow-up of requirements on OAT Follow-up of requirements on the separate tools Automatization is the how and do not tell where we want to go Automatization is providing the specifications of the new info source to COD dashboard All Automatization in turn is to be taken with caution that includes automated opening of tickets H.Cordier

POLE1 : EGEE-II – Now COD range : All federations Sites COD range : some federations Sites r-COD RC OPEN QUESTIONS To BE ADRESSED TOMORROW MORNING OR LATER TODAY by inter - POLE 1 discussions or PLENARY -- What is requested to build a r-COD acceptable to EGEE-III i.e : minimum level of service. We have to define this -- What is going to be the final model of COD becoming c-COD -- Could be minimal but we need to make sure that the model is sustainable as to May 1rst 2010 : IT STILL WORKS i.e : work out how to imagine the operations model what is the % of tickets passing to c-COD after 24h00 and why ? should that be, because it takes too long to solve ? Or because it is relevant to the project level  what do the r-cods think a c-cod is needed for ? escalation process/grid experts/reporting/grid integrity/security COD-16 Transition meeting

range : some federations Now COD range : some federations Sites r-COD RC COD-16 Transition meeting

range : some federations Now – End of PY2 C-COD range : some federations Sites r-COD RC C-COD ..... r-COD RC OPEN QUESTIONS TO BE ADRESSED TOMORROW: -- What is the number of federations involved : e.g. CERN or even if we come to a full set of r-COD it may very well be that it is not the case on May 1rst 2010. How the specific needs of the r-CODs / LHC, will impact the relations between r-COD and c-COD. c-COD need to be aware of the monitoring of this. COD-16 Transition meeting

Main Goals / Objectives Define what would the operations model be Validation of the internal organization 3 poles with enough «representative participation» and « active » leaders i.e. start with 4 federations e.g pole 1 and its steering comittee e.g. Pole 1 Active leaders should come from 2 federations. Each Pole has a task list and is staffed consequently 3 poles whith a list of tasks including some of the EGEE-II current tasks / COD15 actions list  easy to follow-up Identify need for tools / staffing Define precisely Interfaces with external bodies Collaborative tools/ mailing lists Staffing needs/Rota basis or lead team duties Representativity from all federations means that the outcome has some value to the EGEE-III project and the structure beyond. Some of the EGEE-II current tasks: People have to sort what/how HOMOGENEOUS INVOLVEMENT IN POLES NEEDED COD-16 Transition meeting