Operation team at CC-IN2P3, Suzanne Poulat


Overview
- Operation team
- Organisation
- Operation's role
- Services outside working hours
- Tools
- Monitored services
- Examples

Operation team
- Two groups: Support and Operation
- Support (9 people):
  - general user support
  - dedicated staff for the LHC experiments
  - help desk (Xhelp)
  - opening the CC to collaborations and other sciences
- Operation: details follow

Organisation
- Ten people in the group:
  - two for Grid coordination
  - four for Operation
  - four operators working in shifts to cover 8 AM to 9 PM, 7 days a week
- On a weekly basis:
  - one person is on operation duty (often 1.5 in practice)
  - the others work on tasks such as development, monitoring or administration
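The shift pattern above determines how much of the week has operators on site, which is why the out-of-hours arrangements matter. Assuming the 8 AM to 9 PM shift runs every day of the week:

```latex
13\ \text{h/day} \times 7\ \text{days} = 91\ \text{h/week},
\qquad \frac{91}{168} \approx 54\%,
\qquad 168 - 91 = 77\ \text{h/week outside operator shifts}
```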

Operation's role
- Check the availability of all services (storage, CPU, …)
- Optimize service usage
- Ensure that CC-IN2P3's commitments to the experiments and Grid VOs are respected
- Organize scheduled shutdowns
- Coordinate actions during unscheduled downtimes
- Monitor and manage the tape libraries
- Create and manage accounts and AFS space
- Organize the "on duty" service

Services outside working hours
- On-site night security guard from 6 PM to 8 AM and at weekends
  - no computing actions: alerting and messaging only
- One on-duty engineer (evenings, weekends)
  - corrective actions where possible (backed by documentation and training)
  - otherwise calls an expert, if one is available
- Weekends: one operator on site (10 AM to 5 PM)
  - first low-level actions
  - otherwise calls the on-duty engineer
- The result is "best effort" coverage
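The escalation rules above can be sketched as a function that returns who handles an alert at a given time. This is a hypothetical model, not CC-IN2P3's tooling: the function name and role strings are illustrative, and it assumes working hours are weekdays 8 AM to 6 PM (the slide only states when the guard is on site). It encodes only this slide and does not attempt to model the 8 AM to 9 PM operator shifts.

```python
from datetime import datetime

# Hypothetical model of the out-of-hours escalation described above.
# Assumption: working hours are weekdays 8 AM to 6 PM (the slide only says
# the guard is on site from 6 PM to 8 AM and at weekends).


def escalation_chain(when: datetime) -> list[str]:
    """Return who handles an alert raised at `when`, in escalation order."""
    hour = when.hour
    is_weekend = when.weekday() >= 5  # Saturday is 5, Sunday is 6

    if not is_weekend and 8 <= hour < 18:
        # Normal working hours: the operation team handles the alert.
        return ["operation team", "expert (if available)"]

    if is_weekend and 10 <= hour < 17:
        # Weekend day shift: the on-site operator takes first low-level action.
        first = "operator on site"
    else:
        # Nights and the rest of the weekend: the guard only alerts and relays messages.
        first = "security guard"

    return [first, "on-duty engineer", "expert (if available)"]
```

For example, an alert on Saturday 6 June 2009 at 11:00 goes first to the operator on site, then to the on-duty engineer, then to an expert if one is available.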

Tools
- Monitoring tool: NGOP, migrating to Nagios
- Remote Logging Service (RLS)
- Mail
- Tickets from local and grid users: Xhelp at the CC, interfaced with GGUS
- Web pages showing the current state of services
- Wiki for documentation, recipes, shutdowns and post-mortem analyses
- Log of daily production: ELog
- Web page of tape and drive incident tickets (~50 incidents per month: 10 on drives and 40 on tapes, 2 of which caused data loss)
- Scripts to analyse faulty tapes
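The slides do not show the tape analysis scripts. As a minimal sketch of the kind of monthly summary such a script might produce, assuming a hypothetical `Incident` record (kind, tape or drive identifier, data-loss flag) rather than the real format of the incident page:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical record for one entry of the tape/drive incident page."""
    kind: str              # "drive" or "tape"
    volume: str            # tape label or drive identifier
    data_lost: bool = False


def monthly_summary(incidents: list[Incident]) -> dict:
    """Count incidents per kind and how many of them caused data loss."""
    return {
        "total": len(incidents),
        "by_kind": dict(Counter(i.kind for i in incidents)),
        "data_lost": sum(1 for i in incidents if i.data_lost),
    }


def repeat_offenders(incidents: list[Incident], threshold: int = 2) -> list[str]:
    """Return tapes or drives that appear in at least `threshold` incidents."""
    counts = Counter(i.volume for i in incidents)
    return sorted(v for v, n in counts.items() if n >= threshold)
```

Flagging volumes that recur across incidents is one plausible way to single out faulty tapes; the threshold of 2 is an arbitrary illustrative default.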

Monitored services
- BQS (batch system)
- Storage: HPSS, dCache, AFS
- Grid: CE, SRM, top-level BDII
- Databases
- Others: tape libraries, Saphir (privileges and location of services)
- Worker nodes and all servers

Nagios
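The slides show Nagios but not how the site's checks are written. As a sketch of how a Nagios check typically works, here is a minimal plugin in Python following the standard plugin convention: exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, and one line of output with optional performance data after `|`. The disk-usage check, the function names and the 80%/90% thresholds are hypothetical examples, not CC-IN2P3's configuration.

```python
import shutil

# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_pct: float, warn: float = 80.0, crit: float = 90.0) -> int:
    """Map a usage percentage onto a Nagios state using warn/crit thresholds."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk(path: str, warn: float = 80.0, crit: float = 90.0) -> tuple[int, str]:
    """Return (exit code, one-line plugin output) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN - cannot read {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"UNKNOWN - {path} reports zero capacity"
    used_pct = 100.0 * usage.used / usage.total
    status = classify(used_pct, warn, crit)
    # Text before '|' is shown to operators; text after '|' is performance data.
    return status, (f"{LABELS[status]} - {path} {used_pct:.1f}% used"
                    f" | used={used_pct:.1f}%;{warn};{crit}")


def main(argv: list[str]) -> int:
    """Entry point: print the plugin line and return the exit code."""
    path = argv[0] if argv else "/"
    status, message = check_disk(path)
    print(message)
    return status
```

A wrapper script would end with `sys.exit(main(sys.argv[1:]))`, since Nagios reads the service state from the process exit status.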

SMURF

Anastasie: running jobs

Xhelp

Xhelp (2)
~320 tickets per month, i.e. 10 to 20 tickets per day
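As a rough consistency check of the ticket rate, assuming about 30 calendar days or about 22 working days per month:

```latex
\frac{320\ \text{tickets}}{30\ \text{days}} \approx 10.7\ \text{tickets/day},
\qquad
\frac{320\ \text{tickets}}{22\ \text{working days}} \approx 14.5\ \text{tickets/working day}
```

Both averages fall within the quoted range of 10 to 20 tickets per day; the upper end presumably reflects busier days.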

Xhelp (3)

Implementations
- Wiki Operation
- Nagios monitoring
- Ovax
- Users database interface
- Robotics incidents
- On-duty tools

Questions?