Nordic ROC Organization
Gert Svensson, PDC, KTH, Nordic ROC Manager
Nordic ROC and Baltic Grid meeting, Helsinki, 15 June 2009
ROC Duties
- Provide Help Desk facilities (first-level support).
- Provide second-level support by helping in the resolution of advanced and specialized operational problems that cannot be solved by site administrators.
- If necessary, the ROC will propagate and follow up problems with higher-level operational or development teams.
- Ticket follow-up (ensure that sites work on tickets opened against them).
- Respond to tickets from sites in a timely manner.
- Manage and support the deployment of gLite middleware on sites.
- Register new sites.
- Follow up on accounting.
Nordic ROC & BG – Helsinki, 15 June 2009
Functions and tools
- Functional areas
- Operational tools
- Ticketing system
Regionalized model
What is our target model?
- In EGI, as much as possible will be regionalized, based on NGIs (National Grid Initiatives)
- ROCs are responsible for day-to-day operations, with a minimal organization overseeing them
- More efficient
- Several NGIs can share one ROC
Current operational model
Transition r-COD
COD
- Duties to be performed all the time
- Duties to be performed periodically
- Look at the whole infrastructure
r-COD
- Duties to be performed all the time
- Only look at sites in own region
Another view
Site responsibility
- Adhere to the Operations Procedures Manual
- Maintain accurate information in GOCDB
- Adhere to the Grid Site Operations Policy
- Adhere to the Security and Availability Policy document
- Adhere to the Service Level Description (SLD)
- Deploy supported versions of gLite (or compatible) middleware
- Respond to tickets in a timely manner
First line user support - TPM
- Provides 1st line support for users together with VO experts
- Assigns tickets to appropriate support units
- Monitors long-standing open tickets that have not changed
- Is currently a central task
- More tickets will be sent directly to the ROCs in the future
- Only cases without a natural region will be handled centrally
- Follow-up will stay central
First line support function
- First-line support in GGUS is called Ticket Process Management (TPM).
- The TPM duty is to assign tickets to the right Support Unit (SU).
- Assignment must be done in less than one working hour.
- TPMs only see 'normal' submitted tickets, i.e. those not assigned automatically (to the ROCs or a few VOs today).
- TPMs should recover 'forgotten' tickets.
- TPMs are notified for action on the 2nd and 3rd level of ticket escalation.
- TPMs should open Savannah entries for middleware problems submitted to GGUS.
- Details of the function and models: http://edms.cern.ch/document/1000210
Antoni | Bosio | Dimou - SA1 F2F Meeting | CERN | 09/06/09
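The one-working-hour assignment rule can be made concrete with a small sketch. It assumes working hours of 09:00 to 17:00, Monday to Friday, which the slides do not specify, and the function names are hypothetical; it is an illustration of the rule, not how GGUS itself checks it.

```python
from datetime import datetime, time, timedelta

# Assumed working hours; the slides do not define them.
WORK_START = time(9, 0)
WORK_END = time(17, 0)


def working_seconds(start, end):
    """Seconds between start and end that fall inside working hours."""
    total = 0.0
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            window_open = datetime.combine(day, WORK_START)
            window_close = datetime.combine(day, WORK_END)
            lo = max(start, window_open)
            hi = min(end, window_close)
            if hi > lo:
                total += (hi - lo).total_seconds()
        day += timedelta(days=1)
    return total


def assignment_overdue(submitted, now, limit_seconds=3600):
    """True once a full working hour has passed without assignment."""
    return working_seconds(submitted, now) >= limit_seconds
```

Under these assumptions, a ticket submitted on a Friday at 16:30 becomes overdue on the following Monday at 09:30, since the weekend does not count.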
User support workflow
User Support
- Ticket Processing Managers (TPM) analyse the problems reported and assign them to the correct second-level support units.
- VOs have support infrastructures to help their users with VO-specific problems. These infrastructures are under their own control, and usually rely on other tools.
- The Regional Operations Centres are responsible for dealing with problems arising in their associated resource centres.
- GGUS benefits from experts spread all over the world for solving issues related to grid security, to networks, and to the interfaces with other grids.
User Support Workflow (contd.)
For VO users and VO-specific problems: mail to <VO>-user-support@ggus.org
[Workflow diagram. Labels: Solves; Classifies; Monitors; Automatic Ticket Creation; TPM; Grid+VO experts; VO-specific; Central Application (GGUS); VO Support Units; Middleware Support Units; Deployment; Operations Support; ROC; Network; Other Grids]
Multi-level monitoring framework
Terminology
SNIC - Swedish National Infrastructure for Computing
- Organizing high-performance computing in Sweden
- Joint Research Unit in the EGEE III project
NDGF - Nordic Data Grid Facility
- An organization for Grids set up by the Nordic countries
- Runs the Nordic Tier-1, distributed over 9 sites
- Develops ARC middleware
- Most staff distributed in the Nordic countries
NE ROC organisation
- Two federations: Nordic, Benelux
- One distributed ROC
- Three sites in Sweden, one in Finland (SNIC + NDGF) and three in the Netherlands
- GGUS handling and 1st line support, Regional Operator on Duty (ROD):
  - Nordic handles Nordic sites + Baltic Grid
  - Collaboration between SNIC ROC and NDGF
  - Netherlands handles the Benelux sites
  - Rotated among the sites in weekly shifts
- TPM duty rotated between all ROCs
  - Rotated between sites in the NE ROC
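A weekly shift rotation of this kind can be sketched in a few lines. The site names below are hypothetical placeholders (the slides do not list the rota), and the choice of Monday as the handover day is an assumption.

```python
from datetime import date

# Hypothetical site names; the slides do not list the sites in the rota.
SITES = ["site-a", "site-b", "site-c", "site-d"]


def week_index(day):
    """Running week counter that increases by one every Monday.

    date.toordinal() counts days from 0001-01-01, which is a Monday,
    so integer division by 7 yields Monday-aligned weeks.
    """
    return (day.toordinal() - 1) // 7


def site_on_duty(day, sites=SITES):
    """Return the site holding the shift in the week containing `day`."""
    return sites[week_index(day) % len(sites)]
```

Using a running week counter rather than the calendar week number keeps the rotation in order across year boundaries.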
Challenges
- Distributed ROC
- Distributed Tier-1
- Knowledge of ARC and gLite by different groups
Regular meetings
- Nordic ROC meeting: Thursday 10.00, phone
- EGEE & WLCG Joint Operations meeting: Monday 16.00, phone
- EGEE SA1 meeting: every second Tuesday 10.00, phone
- NDGF meeting: Friday 10.00, Jabber
Things to discuss
- How do we improve communication?
  - Contact sites directly, operations directors etc.?
  - Provide more help
- SLA: EGEE requires Service Level Agreements
  - How do we handle that?
  - 75 % availability over each month
- What should we do when a site doesn't respond?
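To make the 75 % figure tangible, here is a minimal sketch of a monthly availability check. It assumes availability is simply the fraction of hours in the month during which the site was available; how EGEE actually measures it is not stated in these slides, and the function names are hypothetical.

```python
def availability(hours_available, hours_in_month):
    """Fraction of the month during which the site was available."""
    if hours_in_month <= 0:
        raise ValueError("hours_in_month must be positive")
    return hours_available / hours_in_month


def meets_target(hours_available, hours_in_month, target=0.75):
    """True if the site reaches the availability target for the month."""
    return availability(hours_available, hours_in_month) >= target
```

Under this assumption, a site would need at least 540 available hours in a 30-day month (720 hours) to reach the target.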