SAM Sensors & Tests Judit Novak CERN IT/GD SAM Review I. 21. May 2007, CERN.

Slides:



Advertisements
Similar presentations
FP7-INFRA Enabling Grids for E-sciencE EGEE Induction Grid training for users, Institute of Physics Belgrade, Serbia Sep. 19, 2008.
Advertisements

Grid and CDB Janusz Martyniak, Imperial College London MICE CM37 Analysis, Software and Reconstruction.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Services Abderrahman El Kharrim
Summary of issues and questions raised. FTS workshop for experiment integrators Summary of use  Generally positive response on current state!  Now the.
Experience with Site Functional Tests Piotr Nyczyk CERN IT/GD WLCG Service Workshop Mumbai, February 2006.
Data management at T3s Hironori Ito Brookhaven National Laboratory.
May 12, 2008 Overview on monitoring tools for Grid Systems - Antonio Pierro (INFN-BARI)1 Overview of monitoring tools for Grid Systems Varenna, 12 May.
CERN IT Department CH-1211 Genève 23 Switzerland t EIS section review of recent activities Harry Renshall Andrea Sciabà IT-GS group meeting.
HPDC 2007 / Grid Infrastructure Monitoring System Based on Nagios Grid Infrastructure Monitoring System Based on Nagios E. Imamagic, D. Dobrenic SRCE HPDC.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Simply monitor a grid site with Nagios J.
Monitoring in EGEE EGEE/SEEGRID Summer School 2006, Budapest Judit Novak, CERN Piotr Nyczyk, CERN Valentin Vidic, CERN/RBI.
1 DIRAC – LHCb MC production system A.Tsaregorodtsev, CPPM, Marseille For the LHCb Data Management team CHEP, La Jolla 25 March 2003.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Information System on gLite middleware Vincent.
The EDGeS project receives Community research funding 1 SG-DG Bridges Zoltán Farkas, MTA SZTAKI.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks GStat 2.0 Joanna Huang (ASGC) Laurence Field.
Giuseppe Codispoti INFN - Bologna Egee User ForumMarch 2th BOSS: the CMS interface for job summission, monitoring and bookkeeping W. Bacchi, P.
Enabling Grids for E-sciencE System Analysis Working Group and Experiment Dashboard Julia Andreeva CERN Grid Operations Workshop – June, Stockholm.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Nagios for Grid Services E. Imamagic, SRCE.
FP6−2004−Infrastructures−6-SSA E-infrastructure shared between Europe and Latin America Grid Monitoring Tools Alexandre Duarte CERN.
CERN Using the SAM framework for the CMS specific tests Andrea Sciabà System Analysis WG Meeting 15 November, 2007.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid Site Monitoring with Nagios E. Imamagic,
Certification and test activity IT ROC/CIC Deployment Team LCG WorkShop on Operations, CERN 2-4 Nov
Local Monitoring at SARA Ron Trompert SARA. Ganglia Monitors nodes for Load Memory usage Network activity Disk usage Monitors running jobs.
Report on Installed Resource Capacity Flavia Donno CERN/IT-GS WLCG GDB, CERN 10 December 2008.
1 LHCb File Transfer framework N. Brook, Ph. Charpentier, A.Tsaregorodtsev LCG Storage Management Workshop, 6 April 2005, CERN.
1 Andrea Sciabà CERN Critical Services and Monitoring - CMS Andrea Sciabà WLCG Service Reliability Workshop 26 – 30 November, 2007.
SAM Tests SAM Devel. & Support Team CERN IT/GD WLCG/EGEE/OSG Operations Workshop 25 Jan. 2007, CERN.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks The GILDA t-Infrastructure Roberto Barbera.
Glite. Architecture Applications have access both to Higher-level Grid Services and to Foundation Grid Middleware Higher-Level Grid Services are supposed.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE Site Architecture Resource Center Deployment Considerations MIMOS EGEE Tutorial.
Grid Monitoring and Operations SAM Development Team CERN IT/GD Tier2 Admin Workshop 03 Dec. 2006, Mumbai.
Site Validation Session Report Co-Chairs: Piotr Nyczyk, CERN IT/GD Leigh Grundhoefer, IU / OSG Notes from Judy Novak WLCG-OSG-EGEE Workshop CERN, June.
CERN IT Department t LHCb Software Distribution Roberto Santinelli CERN IT/GS.
The ATLAS Cloud Model Simone Campana. LCG sites and ATLAS sites LCG counts almost 200 sites. –Almost all of them support the ATLAS VO. –The ATLAS production.
INFSO-RI Enabling Grids for E-sciencE ARDA Experiment Dashboard Ricardo Rocha (ARDA – CERN) on behalf of the Dashboard Team.
INFSO-RI Enabling Grids for E-sciencE ATLAS DDM Operations - II Monitoring and Daily Tasks Jiří Chudoba ATLAS meeting, ,
Certification and test activity ROC/CIC Deployment Team EGEE-SA1 Conference, CNAF – Bologna 05 Oct
Testing and integrating the WLCG/EGEE middleware in the LHC computing Simone Campana, Alessandro Di Girolamo, Elisa Lanciotti, Nicolò Magini, Patricia.
Service Availability Monitor tests for ATLAS Current Status Tests in development To Do Alessandro Di Girolamo CERN IT/PSS-ED.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI How to integrate portals with the EGI monitoring system Dusan Vudragovic.
GridView - A Monitoring & Visualization tool for LCG Rajesh Kalmady, Phool Chand, Kislay Bhatt, D. D. Sonvane, Kumar Vaibhav B.A.R.C. BARC-CERN/LCG Meeting.
SAM Database and relation with GridView Piotr Nyczyk SAM Review CERN, 2007.
TP: Grid site installation BEINGRID site installation.
Gennaro Tortone, Sergio Fantinel – Bologna, LCG-EDT Monitoring Service DataTAG WP4 Monitoring Group DataTAG WP4 meeting Bologna –
User Interface UI TP: UI User Interface installation & configuration.
D.Spiga, L.Servoli, L.Faina INFN & University of Perugia CRAB WorkFlow : CRAB: CMS Remote Analysis Builder A CMS specific tool written in python and developed.
The GridPP DIRAC project DIRAC for non-LHC communities.
WLCG Operations Coordination report Maria Alandes, Andrea Sciabà IT-SDC On behalf of the WLCG Operations Coordination team GDB 9 th April 2014.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid Configuration Data or “What should be.
SAM Status Update Piotr Nyczyk LCG Management Board CERN, 5 June 2007.
INFSO-RI Enabling Grids for E-sciencE File Transfer Software and Service SC3 Gavin McCance – JRA1 Data Management Cluster Service.
RI EGI-TF 2010, Tutorial Managing an EGEE/EGI Virtual Organisation (VO) with EDGES bridged Desktop Resources Tutorial Robert Lovas, MTA SZTAKI.
Co-ordination & Harmonisation of Advanced e-Infrastructures for Research and Education Data Sharing Research Infrastructures Grant Agreement n
SAM architecture EGEE 07 Service Availability Monitor for the LHC experiments Simone Campana, Alessandro Di Girolamo, Nicolò Magini, Patricia Mendez Lorenzo,
TIFR, Mumbai, India, Feb 13-17, GridView - A Grid Monitoring and Visualization Tool Rajesh Kalmady, Digamber Sonvane, Kislay Bhatt, Phool Chand,
VO Box discussion ATLAS NIKHEF January, 2006 Miguel Branco -
Site Manageability Issues for LCG Ian Bird IT Department, CERN HEPiX JLab, 12 th October 2006.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI EGI Services for Distributed e-Infrastructure Access Tiziana Ferrari on behalf.
Grid Monitoring and Diagnostic Tools: GridICE, GSTAT, SAM Giuseppe Misurelli INFN-CNAF giuseppe.misurelli cnaf.infn.it.
SRM v2.2: service availability testing and monitoring SRM v2.2 deployment Workshop - Edinburgh, UK November 2007 Flavia Donno IT/GD, CERN.
Service Availability Monitoring
NGI and Site Nagios Monitoring
MCproduction on the grid
POW MND section.
Patricia Méndez Lorenzo ALICE Offline Week CERN, 13th July 2007
SAM Alarm Triggering and Masking
R-GMA (Relational Grid Monitoring Architecture) for monitoring applications “s” gLite and LCG.
Site availability Dec. 19 th 2006
Presentation transcript:

SAM Sensors & Tests Judit Novak CERN IT/GD SAM Review I. 21. May 2007, CERN

SAM review I May, Outline definitions sensor & test types execution existing sensors & tests open issues

SAM review I May, Sensors in the SAM framework

SAM review I May, Definitions – sensor & test Test –single unit checking functionality of a service –executables (scripts) Sensor –container object tests execution environment –configuration, test preparation, inter-service dependencies, test sequence definition –types integrated: running within the SAM framework standalone: existing monitoring tool –grouping tests by functionality per GRID service for multiple services SENSOR exec. environment test1 testN test2

SAM review I May, Integrated sensors run by SAM submisson framework –centrally (SAM UI host) –on site pre-defined structure, conventions sensors & tests: plug-in modules –easy to create & add new ones –example: adding test to existing sensor LHCb tests plugged into CE sensor SAM submission framework SAM Client plug-in externally developed test integrated sensor test2 test1 testN externally developed sensor test2 test1 testN

SAM review I May, Standalone sensors run outside the SAM framework –independent software already existing testsuite or monitoring tool –no modifications on its code! examples –Gstat sensor site BDII, top-level BDII, CE, SE –VOBOX sensor testsuite developed by alice standalone sensor test2 test1 testN standalone test SAM database SAM Client integrated sensor test2 test1 testN

SAM review I May, Execution workflow execution –via SAM submission framework –other mechanisms cron, etc. publishing results –using SAM web services –stored in SAM database displaying results –SAM portal –Gridview SAM submission framework SAM publisher web service insert data publish invoke sensor SAM Client SAM Server standalone sensor test2 test1 testN integrated sensor test2 test1 testN publish SAM database

SAM review I May, Existing SAM sensors integrated sensors –CE, gCE –SE, SRM –LFC –FTS –VOBOX –VOMS –MyProxy –host-cert –(JobWrapper) standalone sensors –Gstat –RB –VOBOX (alice)

SAM review I May, Integrated sensors – CE, gCE I. Job submission –testing UI→RB/WMS→CE/gCE→WN chain errors may occure on any level –job submission failure → job logging info returned Replica Management –WN → default SE –3rd- party replication default SE → central SAM SE (CERN) CA certificate check (WN!) –new CA certificates released by EUGridPMA SAM + middleware repository: immediate update sites have 7 days to upgrade

SAM review I May, Integrated sensors – CE, gCE II. Software version check UNIX shell –bash + csh –environment variables RGMA –insert + read back data all tests above executed on site!

SAM review I May, Integrated sensors – SE, SRM, LFC, VOBOX, etc. SE, SRM –same tests for both –file replication there and back (and deleted after) LFC –directory listing –directory creation for the VO VOBOX – gsissh test VOMS, MyProxy –only host-cert (“ping”)

SAM review I May, Integrated sensors - FTS check if FTS endpoint is correctly published in BDII listing channels –ChannelManagement service transfer test –N-N transfer jobs following the VO use cases tested T0 → all T1s (outgoing) tested T1 ← other T1s (incoming) –checking the status of jobs –using pre-defined static list of SRM endpoints –Note: this test is relying on SRM availability

SAM review I May, Integrated sensors - host-cert multiple services –CE, gCE, SE, SRM, LFC, FTS, RB, VOMS, MyProxy, RGMA is the host certificate valid alarm raised after certificate expired –future: alarm raised 1 week before expiration

SAM review I May, Standalone sensors Gstat (Sinica) –top-level BDII accessibility (response time) reliability of data (number of entries) –site BDII accessibility (response time) sanity checks (partial Glue Schema validation) –CE free- and total nubmer of CPUs number of waiting- and running jobs –SE used- and available storage capacity RB (RAL) –job submission “important” RBs are tested using “reliable” CEs –measuring time of matchmaking

SAM review I May, Integrated sensors - JobWrapper I. requested by experiments –motivation: SAM jobs might not reach all Wns → broken WN not detected simplified set of tests execution by CE wrapper script with every GRID job → all Wns reached test results –passed to the job –stored in SAM DB SAM publishing web service –published RGMA installation –part of middleware release modified CE wrapper scripts –signed tarball on software area tests

SAM review I May, Integrated sensors - JobWrapper II. operations –infrastructure description with unique identification of WNs relation between batch queues & WNs detection of monitoring queues pointing to “carefullly” selected WNs no double counting of WNs that belong to shared batch farms –detection of sites with broken WNs basic fabric monitoring for small sites status –wrapper script: released production BUT disabled due to RGMA problems –tarball with tests: to be installed –visualization tools: to be developed

SAM review I May, Open issues missing sensors –WMS, MyProxy coordinated by TIC Team –VOMS currently: only host-cert (“ping”) standalone sensor by VOMS developers: to be integrated analysis: what else is needed? –Tier1 DB CERN DB Team –RGMA Registry being developed at RAL adopting Standard Probe Format

SAM review I May, Last slide Thanx for your attention!