Certification and test activity, ROC/CIC Deployment Team. EGEE-SA1 Conference, CNAF – Bologna, 05 Oct 2004. www.eu-egee.org

INFN-CNAF – Bologna, Italy, October 05

Summary
• INFN-GRID release: resources, services and supported VOs;
• Basic tests before joining the grid;
• Certification and periodic test activity;
• Calendar and ticketing system;
• Certification queue;
• GridAT (Grid Application Test).

INFN-GRID Release
INFN-GRID is a customized release of LCG:
• All resources are fully managed via LCFGng;
• INFN-GRID does not support middleware installation without LCFGng;
• The INFN-GRID release is based on the official LCG release and is 100% compatible with it.
Main differences between LCG and INFN-GRID 2.2.0:
• Added support for DAG jobs;
• Added support for AFS on the WorkerNodes;
• Added support for MPI jobs via home directory synchronisation with ssh;
• Documented installation of WNs on a private network;
• Added VOMS support: the INFNGRID and CDF VOs are completely managed via a VOMS server.

INFN-GRID: Resources and supported VOs
(**) Hyperthreaded

INFN-GRID & EGEE: Dedicated service resources
Service resources are open to all VOs supported by INFN-GRID.
The RB egee-rb-01.cnaf.infn.it also supports the BIOMED VO.

Upgrade/Installation activity
Testing whether "the grid is working" is not easy. Certification activity in INFN-GRID can be classified into four levels:
• Local tests by the local resource center managers;
• Certification tests by the ROC/CIC Team;
• Monitoring tests by the ROC/CIC Team;
• Certification on demand, performed by both the ROC Team and the Application Teams.

Basic site tests (1/2)
These tests can be performed by the local resource center manager just after an installation or upgrade. They can also be run later, when problems are reported by users or found by our periodic test activity:
• All nodes: check that every node mounts the LCFGng RPM repository from the LCFGng server;
• CE/SE: verify the file access permissions and check the validity and the subject of the host certificate;
• CE: check that the local batch scheduler works for locally submitted jobs;
• SE: the SE storage area should contain one directory for each supported VO, with the correct permissions and owners;
• WN: WNs should have pool accounts for each supported VO.
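
The SE and certificate checks above can be scripted. The following is a minimal POSIX shell sketch, not an INFN-GRID tool. It assumes the storage root and VO names are passed as arguments and that GNU `stat` is available. It also assumes the host certificate sits at the conventional `/etc/grid-security/hostcert.pem` path; this is an assumption, and the first argument of `check_host_cert` overrides it.

```shell
#!/bin/sh
# Sketch of local "basic site tests" (assumptions: argument-supplied paths
# and VO names, GNU coreutils stat, openssl on the node).

# Check that the SE storage area has one directory per supported VO and
# print owner and octal permissions so the manager can verify them.
# Usage: check_vo_dirs <storage_root> <vo>...
check_vo_dirs() {
  local root vo rc
  root=$1; shift
  rc=0
  for vo in "$@"; do
    if [ -d "$root/$vo" ]; then
      # %U = owner name, %a = permissions in octal
      echo "OK $vo $(stat -c '%U %a' "$root/$vo")"
    else
      echo "MISSING $vo"
      rc=1
    fi
  done
  return "$rc"
}

# Print subject and expiry date of the host certificate; fail if it is
# unreadable or already expired.
# Usage: check_host_cert [cert_file]
check_host_cert() {
  local cert
  cert=${1:-/etc/grid-security/hostcert.pem}   # assumed default location
  openssl x509 -in "$cert" -noout -subject -enddate || return 1
  # -checkend 0 exits non-zero if the certificate has already expired
  openssl x509 -in "$cert" -noout -checkend 0 >/dev/null
}
```

For example, `check_vo_dirs /storage infngrid cdf` (hypothetical path) prints one OK or MISSING line per VO and returns non-zero if any directory is missing. The printed owners and permissions still have to be compared against the site's pool-account setup.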

Basic tests (2/2)
Basic middleware tests from a UI:
• globus-job-run test running directly on the CE, without using the batch system:
  globus-job-run <CE> /bin/hostname
• globus-job-run test through the batch system:
  globus-job-run <CE>:2119/jobmanager-lcgpbs /bin/hostname
• Try to communicate with the SE using globus-url-copy;
• Test the GRIS and GIIS (check that the queues of the local batch system, the SE information, the RuntimeEnvironments, etc. are published):
  ldapsearch -h <CE or SE> -p 2135 -b "mds-vo-name=local,o=grid" -x
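
A minimal sketch that wraps the UI-side tests above in one function. The test labels, the `DRY_RUN` switch and the SE destination URL are hypothetical. A valid grid proxy on the UI is assumed. The GRIS port 2135 and the `-h`/`-p` syntax follow the 2004-era tools; newer OpenLDAP clients expect `-H ldap://host:2135` instead.

```shell
#!/bin/sh
# Run one test command and report PASS/FAIL.
# With DRY_RUN=1 the command is only printed, not executed (hypothetical switch).
check() {
  local label
  label=$1; shift
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'DRY %s: %s\n' "$label" "$*"
    return 0
  fi
  if "$@" >/dev/null 2>&1; then
    echo "PASS $label"
  else
    echo "FAIL $label"
    return 1
  fi
}

# Usage: basic_middleware_tests <ce_host> <se_host> <gsiftp_dest_url>
# Returns non-zero if any individual test failed.
basic_middleware_tests() {
  local ce se dest rc
  ce=$1; se=$2; dest=$3
  rc=0
  # Fork jobmanager: runs directly on the CE, bypassing the batch system
  check "fork-on-CE" globus-job-run "$ce" /bin/hostname || rc=1
  # Same job through the PBS jobmanager, i.e. through the batch system
  check "batch-system" globus-job-run "$ce:2119/jobmanager-lcgpbs" /bin/hostname || rc=1
  # GridFTP write to the SE (destination URL supplied by the caller)
  check "gridftp-to-SE" globus-url-copy "file:///etc/hosts" "$dest" || rc=1
  # Information system: the local GRIS on CE and SE must answer
  check "GRIS-on-CE" ldapsearch -x -h "$ce" -p 2135 -b "mds-vo-name=local,o=grid" || rc=1
  check "GRIS-on-SE" ldapsearch -x -h "$se" -p 2135 -b "mds-vo-name=local,o=grid" || rc=1
  return "$rc"
}
```

After `export DRY_RUN=1`, `basic_middleware_tests ce.example.org se.example.org gsiftp://se.example.org/tmp/test.txt` only prints the five commands. Without `DRY_RUN`, each test prints PASS or FAIL. The PASS/FAIL check only looks at exit codes, so it does not replace reading the ldapsearch output for the published queues and RuntimeEnvironments.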

Certification activity
The ROC/CMT (Central Management Team) is responsible for certifying resource centers: it checks the functionality of a site before the site joins the production grid. Although all certification jobs are VO independent, the INFNGRID VO is used to run them. In particular, the following are checked:
• consistency of the GIIS information;
• local job submission (LRMS);
• grid submission with Globus (globus-job-run);
• grid submission with the Resource Broker;
• ReplicaManager functionality.
To certify a site, the CMT uses only dedicated grid services:
• RB & BDII: gridit-cert-rb.cnaf.infn.it
In this way, an uncertified site never appears in the production grid services.

Periodic test
The ROC Team and each resource manager can announce interventions on their resources via the web by entering a "Downtime advice". The Calendar shows a snapshot of the Production Service status.
We periodically submit certification jobs to the sites in the production grid, in order to find problems pro-actively, before users do.
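
One round of the periodic test can be sketched as a loop over the production CEs. This is an illustrative sketch, not the ROC's actual tooling. The CE list file, the log format and the default `probe` are assumptions; the default `probe` is the batch-system `globus-job-run` test from the basic tests. Steering the job to the cert queue depends on the submission path and is left to a site-specific redefinition of `probe`.

```shell
#!/bin/sh
# Default probe: the batch-system test from the basic tests.
# Redefine this function to target the cert queue as appropriate.
probe() {
  globus-job-run "$1:2119/jobmanager-lcgpbs" /bin/hostname >/dev/null 2>&1
}

# Usage: periodic_round <ce_list_file> <log_file>
# ce_list_file (hypothetical format): one CE host per line, '#' comments allowed.
# Appends "<UTC timestamp> <ce> OK|FAILED" to log_file for every CE.
periodic_round() {
  local list log ce ts
  list=$1; log=$2
  while IFS= read -r ce; do
    case $ce in
      ''|'#'*) continue ;;            # skip blank lines and comments
    esac
    ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    # </dev/null keeps the probe from consuming the CE list on stdin
    if probe "$ce" </dev/null; then
      echo "$ts $ce OK" >>"$log"
    else
      echo "$ts $ce FAILED" >>"$log"
    fi
  done <"$list"
}
```

Run from cron, for example `periodic_round /etc/roc/ces.txt /var/log/roc-probes.log` (hypothetical paths), the log gives a per-CE history. FAILED entries can then be cross-checked against the downtime advices shown in the Calendar.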

Ticketing system
The INFN-GRID ticketing system is used:
• by users, to ask questions or report problems;
• by resource managers, to communicate about common grid tasks (e.g. upgrading to a new grid release) or to solve problems.
• Support Groups are "helper" groups. They exist to resolve the obvious problems that arise as the grid grows:
  – Support Grid Services Group (RB, RLS, VOMS, GridICE, etc.);
  – Support VO Services Group (one for each VO);
  – Support VO Applications Group (one for each VO);
  – Support Site Group (one for each site).
• Operative Groups are not "helper" groups. They exist to improve the overall grid coordination:
  – Operative Central Management Team (CMT);
  – Operative Release & Deployment Team.
Ticket workflow:
• Users -> create a ticket;
• Supporters/Operatives -> open the ticket;
• Users and/or Supporters/Operatives -> update an open ticket;
• Supporters/Operatives -> close the ticket.
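
The ticket workflow above is essentially a small state machine. Below is a sketch in POSIX shell. The state names (`none`, `new`, `open`, `closed`) are hypothetical, and Supporters and Operatives are merged into one `supporter` role.

```shell
#!/bin/sh
# Print the next ticket state, or fail if the role may not perform the
# action in the current state.
# Usage: ticket_transition <user|supporter> <create|open|update|close> <state>
ticket_transition() {
  local role action state
  role=$1; action=$2; state=$3
  case "$action:$state" in
    create:none) [ "$role" = user ] && echo new ;;          # users create
    open:new)    [ "$role" = supporter ] && echo open ;;    # supporters open
    update:open) { [ "$role" = user ] || [ "$role" = supporter ]; } && echo open ;;
    close:open)  [ "$role" = supporter ] && echo closed ;;  # supporters close
    *)           return 1 ;;                                # any other move is invalid
  esac
}
```

Encoding the rules this way makes the slide's two constraints explicit: only supporters/operatives open or close tickets, and updates are allowed only while a ticket is open.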

Why a "cert" queue?
A CE can appear in many BDIIs with different purposes (CIC, LCG, VO-specific). After a site upgrade, as soon as the queues were opened, many jobs arrived from everywhere at an uncertified (and insecure) site. This made its full certification impossible.
To avoid this, every site joining INFN-GRID has a cert queue (for both PBS and LSF):
  – a high-priority queue;
  – open only to the INFNGRID VO;
  – with a low maximum CPU time (10 minutes).
• After a site installation/upgrade, only the cert queue is opened;
• After the certification tests by the ROC, all the other queues are opened.
In addition, all periodic test jobs that the ROC submits to the cert queue always have a higher priority than other jobs.
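
For PBS/Torque, the cert queue properties listed above map onto `qmgr` queue attributes. The sketch below only prints the directives, so they can be reviewed before being piped into `qmgr`. The priority value 100 is an assumption, and so is the Unix group name `infngrid` for the INFNGRID pool accounts. Whether queue priority is actually honoured depends on the scheduler in use. The LSF equivalent is not shown.

```shell
#!/bin/sh
# Print qmgr directives that create a cert queue.
# Usage: cert_queue_cmds [queue_name] [unix_group]
cert_queue_cmds() {
  local q grp
  q=${1:-cert}
  grp=${2:-infngrid}   # assumed group of the INFNGRID pool accounts
  cat <<EOF
create queue $q
set queue $q queue_type = Execution
set queue $q Priority = 100
set queue $q resources_max.cput = 00:10:00
set queue $q acl_group_enable = True
set queue $q acl_groups = $grp
set queue $q enabled = True
set queue $q started = True
EOF
}

# Print directives that open (True) or close (False) the given queues.
# Usage: queues_enabled_cmds <True|False> <queue>...
queues_enabled_cmds() {
  local value q
  value=$1; shift
  for q in "$@"; do
    echo "set queue $q enabled = $value"
  done
}
```

`cert_queue_cmds | qmgr` creates the cert queue. After an upgrade, `queues_enabled_cmds False short long | qmgr` keeps the other queues closed until ROC certification succeeds (queue names hypothetical). `queues_enabled_cmds True short long | qmgr` then reopens them.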

BDII – ROC setup
All the sites certified by the ROC team using the test zone are added to the ROC production BDII, which is accessible via the web.
• Each ROC creates, manages and publishes via the web its regional BDII (similar to …);
• The ROC is 'authoritative' for its BDII: it holds the master copy of the CEs and SEs of its region;
• Operations related to the ROC resource centers are reflected in the BDII content (scheduled downtime, planned upgrade, site certification failure).
All CICs and the OMC run an EGEE-BDII made of the union of all the ROC and CERN BDII lists. As in DNS, each ROC is authoritative for its zone and the other BDIIs get a valid copy.
RBs can be associated either with the EGEE-BDII or with a BDII containing an ad hoc selection of sites from this global list.
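
The DNS-like rule can be sketched as a merge of per-ROC site lists: each ROC is authoritative for its own zone, and the EGEE-BDII is the union of all zones. The one-line-per-site format `<site-name> <ldap-url>` is hypothetical and is not the real BDII configuration format. The sketch keeps the entry from the first list that names a site and reports any other list claiming the same site.

```shell
#!/bin/sh
# Merge per-ROC site lists into one global list (sorted).
# Input lines (hypothetical format): "<site-name> <ldap-url>"
# A site claimed by two different lists is reported on stderr;
# the entry from the first list that named it is kept.
# Usage: merge_roc_lists <roc_list>... > global.list
merge_roc_lists() {
  awk '
    NF == 0 || $1 ~ /^#/ { next }                # skip blank lines and comments
    ($1 in owner) && owner[$1] != FILENAME {     # site already claimed elsewhere
      printf "conflict: %s listed by %s and %s\n", $1, owner[$1], FILENAME | "cat 1>&2"
      next
    }
    { owner[$1] = FILENAME; entry[$1] = $0 }     # remember owning list and entry
    END { for (s in entry) print entry[s] }
  ' "$@" | sort
}
```

An RB-specific BDII with an ad hoc selection of sites can be produced from the merged output by filtering it, for example with `grep`, on the chosen site names.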

Distributed operations among ROCs: technical coordination
• An operations contact mailing list for each ROC;
• A moderated mailing list, egee-roc-announce, to exchange information on:
  – resource center insertion into/removal from the ROC BDII (possibly pointing to a ticket number for details);
  – specific problems with a CE/SE;
  – availability of middleware releases;
  – ROC resource center upgrade plans;
  – operational security issues;
  – site control tools/scripts;
• A dedicated IRC channel between ROC deployment teams to rapidly exchange operations-related questions.

GridAT - Grid Application Test
GridAT aims to provide a general and flexible framework for VO application tests in a grid system. It is under development by T. Coviello and A. Pierro (INFN-BARI).
It allows a grid site to be tested from the VO viewpoint. Results are stored in a central database and can be browsed on a web page, so GridAT will also be used for the certification and test activity.

Useful links
• INFN Production Grid
• INFN GridICE
• INFN test and certification
• INFN Support