SEE-GRID-SCI
Grid Operations Procedures
Antun Balaz
Institute of Physics Belgrade, Serbia
The SEE-GRID-SCI initiative is co-funded by the European Commission under the FP7 Research Infrastructures contract no
AEGIS Training for Site Administrators, Institute of Physics Belgrade, Dec 2008

AEGIS Training for Site Administrators, Institute of Physics Belgrade, December 10-11

Overview
- SEE-GRID operational and monitoring tools (and their relation to EGEE tools)
  - HGSM/GOCDB
  - Helpdesk/GGUS
  - BBmSAM/SAM
  - GStat
  - Nagios/CIC portal
  - Accounting portal
- Downtime procedures
- Upgrade procedures
- Grid-Operator-On-Duty (GOOD)
- Service Level Agreement (SLA)

Operational & monitoring tools
Tools shown: HGSM, Helpdesk, BDII, R-GMA, SAM, GStat (Taiwan), VOMS, BBmSAM, Accounting, Nagios

HGSM/GOCDB (1)

HGSM/GOCDB (2)

HGSM/GOCDB (3)
- Static database containing all relevant data about all SEE-GRID and AEGIS sites
- Must be kept synchronized with the real situation
  - All sheets must be properly updated:
    - Site Info
    - Contacts
    - Site Nodes
    - Downtimes
  - XML dumps: the easiest way to apply changes is to download the XML dump of the data, edit it appropriately, and then upload the new XML file; this also makes it easy to keep backups
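
The download-edit-upload workflow for XML dumps can be sketched as follows. This is a minimal illustration, not part of HGSM: the element names (`site`, `name`, `status`) are hypothetical, since the real HGSM dump schema is not described here. The function never modifies its input, so the downloaded text can be saved unchanged as the backup.

```python
import xml.etree.ElementTree as ET


def set_field(xml_text: str, tag: str, new_value: str) -> str:
    """Return an edited copy of an XML dump with every <tag> element set to new_value.

    The input string is left untouched, so the caller can store it as a backup
    before uploading the edited version. Tag names are hypothetical: the real
    HGSM dump schema is not described on the slide.
    """
    root = ET.fromstring(xml_text)
    matches = list(root.iter(tag))
    if not matches:
        # Fail loudly instead of silently uploading an unchanged dump.
        raise KeyError(f"no <{tag}> element in the dump")
    for elem in matches:
        elem.text = new_value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical dump content; save `original` to a file first as the backup.
    original = "<site><name>AEGIS01-PHY-SCL</name><status>uncertified</status></site>"
    edited = set_field(original, "status", "certified")
    print(edited)
```

Raising on a missing tag is a deliberate choice: a typo in the field name would otherwise produce an "edited" dump identical to the original.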

HGSM/GOCDB (4)
- The essential fields in HGSM:
  - GIIS URL
  - Monitoring: Yes
  - Status: certified
  - Type: seegrid_production, seegrid_certified, egee_production
  - Site Commitments
- Contacts and administrators
- All fields must have correct values!
URL:
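
A site admin could check the essential fields locally before uploading a dump. The sketch below is hypothetical: `SiteRecord` and its field names are illustrative, and the `ldap://` check on the GIIS URL is an assumption about the URL format. Only the allowed Type values and the required Monitoring and Status values come from the slide.

```python
from dataclasses import dataclass, field

# Type values listed on the slide.
ALLOWED_TYPES = {"seegrid_production", "seegrid_certified", "egee_production"}


@dataclass
class SiteRecord:
    # Hypothetical field names mirroring the essential HGSM fields.
    name: str
    giis_url: str
    monitoring: str
    status: str
    site_type: str
    contacts: list[str] = field(default_factory=list)


def check_essential_fields(site: SiteRecord) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    # Assumption: the GIIS URL is an LDAP URL (ldap://host:port/...).
    if not site.giis_url.startswith("ldap://"):
        problems.append("GIIS URL is missing or is not an ldap:// URL")
    if site.monitoring != "Yes":
        problems.append("Monitoring must be set to Yes")
    if site.status != "certified":
        problems.append("Status must be certified")
    if site.site_type not in ALLOWED_TYPES:
        problems.append(f"unexpected Type: {site.site_type!r}")
    if not site.contacts:
        problems.append("no contacts/administrators listed")
    return problems
```

Returning a list of problems, rather than stopping at the first failure, lets the admin fix every wrong field in one pass.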

Helpdesk/GGUS (1)

Helpdesk/GGUS (2)

Helpdesk/GGUS (3)
- Central reference point for tracking all operational and user problems
- Identified problems are reported through the Helpdesk and assigned to the appropriate supporters
- If problems cannot be solved within the SEE-GRID community, they are propagated to other projects/initiatives/support systems (e.g. GGUS)
URL:

BBmSAM/SAM

BBmSAM History

BBmSAM
- Portal that provides access to the database of SAM test results
- Central tool for identifying operational problems
- Should be checked by each site admin on a daily basis
- Should be used to troubleshoot problems
- Also provides SLA figures
URL:

GStat (1)

GStat (2)
- Central tool for monitoring the information system of the SEE-GRID infrastructure
- Provides useful data
- Identifies problems with sites
- Should be checked by each site admin on a daily basis and used for troubleshooting
  - Useful ldapsearch commands can be found on the GStat pages!
URL:
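
As an illustration of the kind of ldapsearch query used for information-system troubleshooting, the sketch below builds one as an argument list. It assumes the usual gLite/GLUE 1.x conventions (BDII on port 2170, site subtree at `mds-vo-name=<site>,o=grid`); the host name is hypothetical, and the commands published on the GStat pages remain the reference.

```python
import shlex


def site_bdii_query(host: str, site_name: str, port: int = 2170) -> list[str]:
    """Build an ldapsearch command listing the CEs published by a site BDII.

    Assumes gLite/GLUE 1.x conventions: the BDII listens on port 2170 and the
    site subtree is rooted at mds-vo-name=<site>,o=grid.
    """
    return [
        "ldapsearch",
        "-x",                            # simple authentication (anonymous without -D)
        "-LLL",                          # plain LDIF output without comments
        "-H", f"ldap://{host}:{port}",
        "-b", f"mds-vo-name={site_name},o=grid",
        "(objectClass=GlueCE)",          # search filter: computing elements only
        "GlueCEUniqueID",                # attributes to return
        "GlueCEStateWaitingJobs",
    ]


if __name__ == "__main__":
    cmd = site_bdii_query("bdii.example.org", "AEGIS01-PHY-SCL")
    print(shlex.join(cmd))
    # To run it for real: subprocess.run(cmd, capture_output=True, text=True)
```

Keeping the command as a list avoids shell-quoting problems with the parentheses in the LDAP filter. Checking `GlueCEStateWaitingJobs` is one way to spot the "4444 Waiting jobs in the GRIS" symptom listed among the usual problems.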

Nagios/CIC portal (1)

Nagios/CIC portal (2)

Nagios/CIC portal (3)
- Collection of alarms raised by various tools
- The aim is to integrate all the tools and make life easier for site admins and infrastructure managers
- Automatic creation of Helpdesk tickets will be implemented in the future
URL:
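
Automatic ticket creation was still planned, so the following is only a hypothetical sketch of one possible first step: collecting alarms from several tools and grouping them into one draft ticket per site. The `Alarm` type and the ticket text format are assumptions, not the portal's actual design.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    site: str
    source: str    # tool that raised the alarm, e.g. "BBmSAM", "GStat", "Nagios"
    message: str


def draft_tickets(alarms: list[Alarm]) -> dict[str, str]:
    """Group alarms by site and build one hypothetical ticket text per site."""
    by_site: dict[str, list[Alarm]] = defaultdict(list)
    for alarm in alarms:
        by_site[alarm.site].append(alarm)
    drafts = {}
    for site, site_alarms in sorted(by_site.items()):
        body = "\n".join(f"[{a.source}] {a.message}" for a in site_alarms)
        drafts[site] = f"{len(site_alarms)} open alarm(s) at {site}\n{body}"
    return drafts
```

Grouping by site means a site hit by several related failures receives one ticket instead of one per tool.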

Accounting portal (1)
- Accounting by site
- Accounting by countries and institutions
- Accounting by applications

EGEE Accounting portal

Accounting portal (2)
- Collects the accounting data from all SEE-GRID and AEGIS sites through the APEL accounting publisher developed by the project
- Provides aggregated accounting data by site, country, institution and application
- Each site must publish its accounting data properly
URL:
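
The aggregation the portal offers can be illustrated with a short sketch. `UsageRecord` is a hypothetical, simplified record with only the dimensions named on the slide plus CPU hours; a real accounting record would carry more fields.

```python
from collections import defaultdict
from dataclasses import dataclass

DIMENSIONS = {"site", "country", "institution", "application"}


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical, simplified per-job accounting record.
    site: str
    country: str
    institution: str
    application: str
    cpu_hours: float


def aggregate(records: list[UsageRecord], key: str) -> dict[str, float]:
    """Sum CPU hours grouped by one dimension: site, country, institution or application."""
    if key not in DIMENSIONS:
        raise ValueError(f"cannot aggregate by {key!r}")
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        totals[getattr(rec, key)] += rec.cpu_hours
    return dict(totals)
```

The same records serve every view; only the grouping key changes, which is why each site's correct publishing matters for all the aggregated figures.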

Downtime procedures
- Downtimes must be announced well in advance (1 week is a reasonable notice period)
  - There are always downtimes due to hardware and other failures that cannot be anticipated
- All downtimes must be entered properly in HGSM
  - That way they are not counted against the site's availability
- In addition, all downtimes must be broadcast by e-mail to the GIM, APP and relevant VO mailing lists
- Downtime should not exceed 10% of the total time (monthly, quarterly)
  - If it does, an explanation must be provided
  - If the explanation is not accepted by the project management, SA1 claims will be rejected
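
The two numeric rules above (about 1 week of notice, at most 10% downtime per period) can be checked with a small sketch. The function names are illustrative, and the sketch assumes the downtime intervals do not overlap.

```python
from datetime import datetime, timedelta

MIN_NOTICE = timedelta(weeks=1)   # "1 week is a reasonable notice period"
MAX_DOWNTIME_FRACTION = 0.10      # downtime limit per month or quarter


def announced_in_time(announced: datetime, start: datetime) -> bool:
    """True if the downtime was announced at least MIN_NOTICE before it starts."""
    return start - announced >= MIN_NOTICE


def downtime_fraction(downtimes: list[tuple[datetime, datetime]],
                      period_start: datetime, period_end: datetime) -> float:
    """Fraction of the reporting period covered by downtimes.

    Assumes the downtime intervals do not overlap each other.
    """
    total = timedelta(0)
    for start, end in downtimes:
        # Clip each downtime to the reporting period (e.g. one month).
        s, e = max(start, period_start), min(end, period_end)
        if e > s:
            total += e - s
    return total / (period_end - period_start)
```

For example, a 36-hour downtime in a 30-day month gives 36 / 720 = 0.05, well within the 10% limit.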

Upgrade procedures
- All upgrades/updates are announced over the GIM list
- The broadcasts contain links to further upgrade instructions for each Grid service
  - Site admins should carefully examine them before performing the update!
  - In addition, possible SEE-GRID-specific instructions are given in the
- For especially important updates/changes, tickets are created for each site
- Some upgrades/updates may require a downtime
- OS updates must be installed regularly to minimize security risks

Grid-Operator-On-Duty (GOOD)
- Rotating shifts on a weekly basis
  - Each country's GIM is responsible for monitoring sites during his/her shift
  - Tickets are submitted to sites with problems, according to the status of sites in various monitoring tools (BBmSAM, GStat, Nagios, Accounting portal, etc.)
  - Older tickets that are not resolved are escalated
  - Support is given to sites that cannot resolve previously identified operational problems
  - User tickets are assigned to the appropriate supporters
  - Wiki documentation is updated, or new wiki pages are created if necessary
URLs:
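
The weekly rotation and the escalation of old tickets can be sketched as below. This assumes a simple round-robin over countries; the country codes and the 7-day escalation threshold are hypothetical, since the slide does not state the actual shift order or escalation deadline.

```python
from datetime import date


def on_duty(countries: list[str], rotation_start: date, day: date) -> str:
    """Country whose GIM holds the GOOD shift in the week containing `day`.

    Assumes a simple round-robin over `countries`, one week each,
    starting on `rotation_start`.
    """
    weeks_elapsed = (day - rotation_start).days // 7
    return countries[weeks_elapsed % len(countries)]


def tickets_to_escalate(opened: dict[str, date], today: date,
                        max_age_days: int = 7) -> list[str]:
    """IDs of unresolved tickets older than max_age_days (a hypothetical threshold)."""
    return sorted(tid for tid, when in opened.items()
                  if (today - when).days > max_age_days)
```

With three countries starting on 1 December, the third country is on duty in the week of 15 December and the rotation returns to the first country on 22 December.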

Usual problems and links to (possible) solutions
- BDII
  - siteBDII (GIIS) or top-level BDII is unreachable
  - No info published
- CA
  - CA version test failed with error message: This CA is an old one and time allowed to upgrade is over
- CE (Computing Element)
  - Job submission failed with error message: Brokerhelper: Cannot plan. No compatible resources
  - Job submission failed with error message: Got a job held event, reason: Unspecified gridmanager error
  - Job submission failed with error message: Cannot read JobWrapper output, both from Condor and from Maradona
  - Job submission failed with error message: 7 authentication failed
  - Job submission failed with error message: 10 data transfer to the server failed
  - 4444 Waiting jobs in the GRIS
- SE (Storage Element)
  - File copy and registration failed with error message: FTPD GSSAPI error: GSS Major Status: General failure

Service Level Agreement (SLA)
- Old URL:
  - Changes in the current SLA compared to the old one: the required availability is 80%, and availability is calculated on a 3-hour basis, not on a daily basis
- The BBmSAM portal provides SLA figures
- Sites not fully conforming to the SLA will have reduced funding
- Sites with availability below 50% will be uncertified
- Sites fully conforming to the SLA will be put into seegrid_certified status and become visible to the whole SEE region (i.e. not only SEE-GRID, but also EGEE-SEE etc.)
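
The SLA thresholds can be illustrated with a sketch that computes availability from 3-hour slots (8 per day). It assumes each slot has already been marked "ok" or "fail" (for example from SAM test results) and, following the downtime rule that announced downtimes are not counted against the site, excludes slots marked "downtime" from the calculation. How BBmSAM actually classifies a slot is not described here.

```python
REQUIRED_AVAILABILITY = 0.80   # SLA target
UNCERTIFY_BELOW = 0.50         # sites below this are uncertified


def availability(slots: list[str]) -> float:
    """Availability from per-3-hour-slot statuses: "ok", "fail" or "downtime".

    Assumption: slots inside a downtime announced in HGSM are excluded,
    so they do not count against the site.
    """
    counted = [s for s in slots if s != "downtime"]
    if not counted:
        raise ValueError("no slots outside announced downtime")
    return sum(s == "ok" for s in counted) / len(counted)


def sla_outcome(avail: float) -> str:
    """Map an availability figure to the consequences stated in the SLA."""
    if avail < UNCERTIFY_BELOW:
        return "uncertified"
    if avail < REQUIRED_AVAILABILITY:
        return "not fully conforming (reduced funding)"
    return "conforming"
```

For instance, a day with 6 passing and 2 failing slots gives 6 / 8 = 0.75, which is above the uncertification limit but below the 80% target.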

Service Level Agreement (SLA)
- Currently:
  - AEGIS01-PHY-SCL: egee_production
  - AEGIS02-RCUB and AEGIS04-KG: seegrid_certified (good work!)
  - AEGIS03-ELEF-LEDA: seegrid_production
  - AEGIS05-ETFBG: experimental (uncertified), a problem!
- All AEGIS sites must improve their availability figures, so that we provide better service to our users