1
EGEE/LCG Operation Workshop
24th-26th May 2005
A report on operation support, open issues and statistics
Marco Verlato, INFN – Sezione di Padova
EGEE is a project funded by the European Union under contract IST
2
Outline
- History since last Operation Workshop
- EGEE User Support Framework
- Grid.it helpdesk
  - support infrastructure
  - usage report
  - interface to GGUS
- ROC Integration
  - ROC SE helpdesk overview
  - ROC Russia, SW, GER-CH snapshots
  - statistics
- Some Issues
EGEE/LCG Operation Workshop – May 24-26,
3
History
- Nov. 04: Outcome from User Support Task Force, Grid.it support infrastructure and pilot interface to/from GGUS presented at 1st EGEE/LCG Operation Workshop
- Nov. 04: Pilot GGUS-Grid.it helpdesk interface live demo at EGEE-2 Conference
- Dec. 04: Grid.it helpdesk code and interface documentation made available to other ROCs
- Jan. 05: E(gee, or xecutive) Support Committee kick-off at FZK, WP definition and mandate
- Mar. 05: Support on Duty start, GGUS / Grid.it / CIC-on-duty helpdesks fully interfaced
- May 05: GGUS enhanced, SE and RU helpdesks interfaced
4
EGEE User Support: requirements
Support requests range from:
- Grid Services and Sites faults
- Problems with installation/configuration
- “How do I …?”
- Problems with applications
- Bugs
- Requirements for extra features
Users may be Site Admins, VO application users, VO managers, …
- they all prefer a single point of contact for Grid problems
User Support / Operation Support / VO Support are different but with a lot of overlap
- Different sets of experts and levels of support
5
EGEE User Support: infrastructure
The ROCs, the VOs and the other project-wide groups, such as the Core Infrastructure Center (CIC), the middleware groups (JRA) and the network groups (NA), will be connected via a central integration platform provided by GGUS. This central helpdesk keeps track of all service requests and assigns them to the appropriate support groups, which makes formal communication between all support groups possible. To enable this, each group has to build only one interface, between its internal support structure and the central GGUS application.
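The benefit of building only one interface per group can be made concrete with a little counting. The sketch below is an illustration only: the function names are hypothetical, the group count of 11 is chosen just for the example, and the comparison assumes the alternative would be a dedicated interface between every pair of helpdesks.

```python
def hub_interfaces(n_groups: int) -> int:
    """Interfaces needed when every group connects only to the central platform."""
    return n_groups


def pairwise_interfaces(n_groups: int) -> int:
    """Interfaces needed if every pair of groups were linked directly
    (assumed alternative, used here only for comparison)."""
    return n_groups * (n_groups - 1) // 2


# Hypothetical example: 11 support groups.
n = 11
print(hub_interfaces(n), pairwise_interfaces(n))
```

With a central platform the number of interfaces grows linearly with the number of groups, while direct pairwise links would grow quadratically.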
6
EGEE User Support: interfaces
Using the local Helpdesk Systems in conjunction with a central integration platform at GGUS
[Diagram labels: Resource Center 1 (RC) ... Resource Center N (RC); Local User Support Application; Regional Operations Center (ROC); Third level support: Generic deployment, Grid Middleware; VO support; CIC; The User; Report Problem; Use the Webview; Interface; Central GGUS Application]
7
EGEE User Support: Responsible Units
First Level Support
- GGUS team
- SOD (ROC experts rotation)
Second Level Support
- CIC-on-duty
- ROC Helpdesks: ROC_Asia/Pacific, ROC_CE, ROC_CERN, ROC_France, ROC_GER/CH, ROC_Italy, ROC_North, ROC_Russia, ROC_SE, ROC_SW, ROC_UK/Ireland
- VOSupport (atlas, magic, biomed, compass, babar, cdf, alice, lhcb, cms, d0)
Third Level Support (filled with experts provided by ROCs)
- Grid Deployment: Castor, Generic Deployment, Manual Installation, Pre-production system, VO management/VOMS
- Grid Middleware: d-Cache, Data Management, GLUE, GridICE, Information System/GIP/BDII, R-GMA, Security Management, Workload Management
8
The Grid.it portal
31 RCs, ~1400 CPUs, ~120 TB, 21 VOs, +DAG +MPI +DGAS
9
Deployment Status
10
Services and Sites Monitoring
11
Grid.it Helpdesk
12
Trouble Ticketing System
The trouble ticketing system is based on the OneOrZero Helpdesk tool: coded in PHP, using MySQL, customizable, free
- To be replaced with the Xoops / xHelp tool soon
Access allowed to registered members approved by administrators:
- End-users: they create the tickets describing problems or suggestions
- Supporters: fix the problems, or redirect them somewhere else
- Site Managers: act as supporters for a given RC, and exchange tickets with Operatives for operational issues
- Operatives: people of the ROC/CIC Central Management Team, Release & Deployment Team and the Ticketing System Team itself; they exchange tickets with Site Managers and Supporters
13
ROC Support Units
~40 people + site managers
14
Weekly shifts
- 4 people a day, rotating weekly
- 8.30-19.30 working hours, 11x5 coverage
- ICQ channel
- Mainly busy with Operations
15
Usage Report
Statistics for the last 6 months of operations
- ~25 tickets a week on average
16
Usage Report
[Chart categories: Grid services, Operative teams, Grid sites, VO applications]
17
Interface to GGUS
http://infnforge.cnaf.infn.it/eticketimp/
- First interface between Grid.it Helpdesk and GGUS ready since November 04 and in ‘production’ since March 05
- Based on Web Services at GGUS side, several advantages:
  - sample code available for PHP / Perl and other computing languages
  - very fast: service requests/sec on the GGUS Servers
  - easy to adapt
- Based on at Grid.it side (importer tool)
- XML exchange format
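As an illustration of what an XML exchange format between two helpdesks involves, the sketch below serializes a ticket to XML and parses it back using Python's standard library. The element names (`ticket`, `id`, `subject`, `responsible_unit`, `status`) and the sample values are assumptions made for this example; they are not the actual GGUS or Grid.it schema.

```python
import xml.etree.ElementTree as ET


def ticket_to_xml(ticket: dict) -> str:
    """Serialize a flat ticket dictionary into a <ticket> XML document."""
    root = ET.Element("ticket")
    for field, value in ticket.items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_ticket(xml_text: str) -> dict:
    """Parse a <ticket> XML document back into a flat dictionary of strings."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


# Hypothetical ticket; field names and values are illustrative only.
sample = {
    "id": "1234",
    "subject": "Job submission fails",
    "responsible_unit": "ROC_Italy",
    "status": "assigned",
}
xml_text = ticket_to_xml(sample)
restored = xml_to_ticket(xml_text)
print(xml_text)
```

A flat, text-only format like this is easy to produce from PHP or Perl on one side and to import on the other, which is the kind of adaptability the slide points to.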
18
Interface to GGUS
19
GGUS-ROC Basic Workflow
[Diagram labels: ROC Helpdesk; GGUS System; XML Mail; GGUS/SOD; Web Portal; SUPP Unit CMT; Ticket assignment; CIC-on-duty; CIC Interface; SUPP Unit X; Ticket solved notification; SUPP Unit Y; Web services]
20
ROC Integration
All ROCs were asked to create/enable their Support Structure to be integrated with GGUS:
- providing a contact to their helpdesk system
- providing a well defined structure behind their helpdesk system
- providing a list of experts committed to VO support and 3rd level support
- filling the corresponding GGUS Responsible Units
Some ROCs set up a helpdesk system interfaced to GGUS following the Grid.it example using OneOrZero
- SE: ready in production since April 25th
- RU: work started in April, interface in production since May 23rd
- SW: almost ready
21
ROC SE Helpdesk Overview (slides from Alexandru Stanciu)
- OneOrZero v1.6 helpdesk is hosted by ICI, RO
  - it's a new release (March) which has functional enhancements and new features over the 1.4 and 1.5 series
- Integration with GGUS based on the INFN example, but with local customizations
- In production since 25 April
- Decentralized structure
22
ROC SE Helpdesk Structure
Two kinds of support groups
- Per country support groups: BG, CY, GR, IL, RO
- Specialized support groups: 14 support groups (Site Certification, VOMS, MyProxy, etc.)
Each site has a supporter account in the country group where the site belongs
- e.g. for GR there are the GR01AUTH, GR02UoM, GR03HEPNTUA, GR04FORTHICS, GR05DEMOKRITOS, HG01GRNET supporter accounts
- Each site account is registered with the mailing list as site contact
Helpdesk administration is distributed
- each country has one admin managing the user registration process
Generic GGUS support group is used for the interface
- members should manage the workflow of tickets coming from GGUS
- and reassign them to the right support group and supporter
ROC Central Support group to coordinate helpdesk operations
Operations coordination support group for the overall management of operations at the ROC level
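The reassignment step handled by the generic GGUS support group can be pictured as a lookup from site account to country support group. This is a sketch only: the mapping contains just the GR accounts named on the slide, and the function name and the fallback to the ROC Central Support group for unknown sites are assumptions, not documented helpdesk behaviour.

```python
# Site supporter accounts named on the slide for the GR country group.
# HG01GRNET shows that the group cannot be derived from the account
# prefix alone, so an explicit mapping is used.
SITE_TO_COUNTRY_GROUP = {
    "GR01AUTH": "GR",
    "GR02UoM": "GR",
    "GR03HEPNTUA": "GR",
    "GR04FORTHICS": "GR",
    "GR05DEMOKRITOS": "GR",
    "HG01GRNET": "GR",
}

# Assumed fallback for tickets whose site is not in the mapping.
FALLBACK_GROUP = "ROC Central Support"


def route_ticket(site_account: str) -> str:
    """Return the support group a ticket about this site would be reassigned to."""
    return SITE_TO_COUNTRY_GROUP.get(site_account, FALLBACK_GROUP)


print(route_ticket("HG01GRNET"))
print(route_ticket("UNKNOWN-SITE"))
```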
23
ROC SE Ticket Categories
24
ROC SE Ticket Statistics
25
ROC SE Helpdesk Stats
- Currently 22 individual supporters and 17 site accounts registered
- Over 80 tickets in our database
- Support through helpdesk is provided on a “best effort” basis
- Helpdesk is mostly used for operations support
  - most tickets are trouble tickets concerning sites
26
ROC Integration: Russia (thanks to Valeriy Kirichenko)
27
ROC Integration: Russia
- 4 supporters from ITEP + 1 helpdesk admin
- working hours, 8x5 coverage
- a ticket should have an answer in 2 working days (or be sent to GGUS or CERN)
- Supporters from other Russian sites may register
- Full integration with helpdesks of other Russian sites by the Fall
28
ROC Integration: SW (slides from M. Kaci , F. Fassi, J. Salt)
Links to: Home, FAQ, Ticket, Documents, Repositories, Training
29
ROC Integration: SW
username/password needed
Powered by OneOrZero v1.4 RC2 Red Lava
30
ROC Integration
Some ROCs had different helpdesks inside their federation:
- CE & NE: helpdesk based on RT open to local users since April, plan to be interfaced to GGUS by end of May, support structure and responsibilities defined within their ROC, tickets expected to be answered in a reasonable time
- FR: home developed helpdesk, interface to GGUS by end of May
- GER-CH: helpdesk based on Remedy, interface to GGUS ready by June 8th
- UK-I: helpdesk based on Footprint, plan to be interfaced to GGUS by end of July
All ROCs will have their Support System ready and interfaced to GGUS by end of July
Open issue: what about ROC_CERN and ROC_Asia/Pacific?
31
ROC Integration: GER/CH (slide from Sven Hermann)
ROC User Support GER/CH
- based on web application similar to GGUS
- 1:1 ticket exchange with GGUS implemented
- portal currently tested; going into operation in June'05
ROC Operations Support GER/CH
- Handle tickets created in GGUS
- Support group changes every two weeks
- 2-3 people per RC involved
- Mo – Fr, 9:00 – 17:00
- about 15 tickets/month

On Duty Site / Contact | Date | Calendar Week | Project Week
FZK | 06/06/ /06/2005 | 23/24 | 62/63
DESY | 23/05/ /06/2005 | 21/22 | 60/61
GSI | 09/05/ /05/2005 | 19/20 | 58/59
FhG | 25/04/ /05/2005 | 17/18 | 56/57
 | 11/04/ /04/2005 | 15/16 | 54/55
32
ROC Integration: some numbers
Even if most ROC helpdesks are not interfaced to GGUS yet, ROC support units are reached with mailing lists:

ROC   # tickets   # open   oldest
CE 29 2 1 day
France 3 1 month
GER-CH 33
Italy 54 5 days
NE 10 5
Russia 13 2 months
SE 36 4
SW 31 12
UK-I 58 15
TOTAL 309 60

- Statistics available since mid-March
- More than 90% coming from CIC-on-duty
- CIC-on-duty rate: ~50 tickets/week
- 1st Level rates: GGUS ~20 tickets/week, SOD ~4 tickets/week
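The TOTAL row gives a quick measure of the backlog. The short calculation below uses only the two totals from the table (309 tickets, 60 open); the variable names are illustrative.

```python
total_tickets = 309  # TOTAL row, "# tickets"
open_tickets = 60    # TOTAL row, "# open"

# Tickets that are no longer open, and the share still open.
not_open_tickets = total_tickets - open_tickets
open_share = open_tickets / total_tickets

print(f"{not_open_tickets} no longer open, {open_share:.1%} still open")
```

Roughly one ticket in five assigned to the ROC support units was still open at the time of the report.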
33
Some Issues
Distributed EGEE User/Operation Support Infrastructure is progressing, but:
- tickets must be solved within an acceptable timeframe, otherwise we’ll not attract users
  - simply forwarding to ROCs may delay the solution
  - increasing ROC experts' participation in 1st Level Support / SOD might help: most of the people at ROCs involved in deploying / troubleshooting the Grid can more easily solve tickets at once
- real responsive people must be placed behind the Support Units
  - looking into user tickets is time consuming, but resources at ROCs are now mainly busy with Operations
  - they can handle 1-2 tickets/day but not 50 tickets/day
  - ROC resources need to be re-allocated / re-organized / enhanced / committed to User Support
- Workflows, Monitoring, Reporting, Escalation procedures …
  - see Alistair’s talk about Service Operation Challenge (SOC)
- Integration effort is useless if at the end we are not able to provide a reliable service