Testing and integrating the WLCG/EGEE middleware in the LHC computing Simone Campana, Alessandro Di Girolamo, Elisa Lanciotti, Nicolò Magini, Patricia.

Presentation transcript:

Testing and integrating the WLCG/EGEE middleware in the LHC computing

Simone Campana, Alessandro Di Girolamo, Elisa Lanciotti, Nicolò Magini, Patricia Méndez Lorenzo, Enzo Miccio, Roberto Santinelli, Andrea Sciabà
CERN, Switzerland; INFN-CNAF, Italy

The EIS team

The Experiment Integration Support (EIS) team in the Worldwide LHC Computing Grid (WLCG) project has been active since 2002, helping the LHC experiments and other user communities to use the Grid as effectively as possible. The EIS activities include:
- Contributing to the integration of the experiment computing frameworks with the Grid middleware
- Interfacing user communities with the middleware developers and the WLCG infrastructure operations
- Developing new user tools to implement functionalities missing from the Grid middleware
- Testing middleware components as they become available
- Directly participating in the experiment computing activities (data challenges, Monte Carlo production, etc.)
- Providing end-user documentation

The gLite WMS and the experiments

Middleware testing is one of the most important activities of the EIS team. Its purpose is to verify the readiness of the middleware with respect to the needs of the LHC experiments.

The LHC experiments need to generate and process huge amounts of simulated data to validate the reconstruction software, test their computing models and develop physics data analysis algorithms. For example, the current and foreseen production rates for ATLAS and CMS are of the order of 50 million events/month in 2007 and 100 million events/month in 2008. Each experiment needs to submit and manage about 10^5 jobs/day at several tens of participating sites.

The gLite Workload Management System (WMS) is an evolution of the LCG Resource Broker which provides better scalability and new functionalities ("bulk" submission being the most important).

Experiment monitoring with SAM

The Service Availability Monitoring system (SAM) is a framework developed to provide a global and uniform monitoring tool for Grid services. It works by executing periodic tests, organized in "sensors" (one for each type of Grid service), on all Grid services. The test results are published in an Oracle database with a Tomcat-based web service interface. SAM is the main source of information for Grid operations and is used to measure the availability of Grid services.

The flexibility of the SAM framework also makes it an excellent choice for any Virtual Organisation that wants to implement custom tests on existing service types, or even on experiment-specific services. The EIS team is strongly involved in the integration of the experiment monitoring with SAM. All the LHC experiments are currently using SAM:
- ALICE uses SAM to monitor the services running on their "VO boxes" (nodes which host all the ALICE-specific software at a site)
- ATLAS uses the same "standard" tests as the Grid operations team, but runs them with ATLAS Grid credentials; more specific tests will also be run in the near future
- LHCb uses the SAM database to publish the results of software installation and validation jobs

The gLite WMS architecture

[Diagram. Components shown: Client, WMProxy, Task queue, Workload Manager, Matchmaker, Job Submission and Monitoring, Logging & Bookkeeping, LB Proxy, Information Supermarket, Information System, Computing Element]

Testing the gLite WMS

During its final development phase, the WMS was mainly tested by the EIS team. The tests involved the submission of large numbers of jobs to the WLCG production infrastructure, using both simple "hello world" scripts and real experiment applications. Problems encountered were reported to the developers, who provided bug fixes, in an iterative process.
Acceptance criteria were defined to assess the compliance of the WMS with the requirements from the experiments and the WLCG operations:
- Uninterrupted submission of at least 10^4 jobs/day for a period of at least five days
- No service restart required during this period
- No degradation in performance at the end of this period
- Number of "stale" jobs less than 1% of the total at the end of the test

gLite WMS test results

The gLite WMS was tested by submitting both single jobs and job collections of a few hundred jobs each. The status of the jobs was monitored and all failures were identified and investigated. The WMS internal status was also monitored (system load, memory usage, etc.). A test to verify the acceptance criteria was performed and these results were obtained:
- 115,000 jobs submitted in 7 days (16,000 jobs/day)
- 320 (0.3%) jobs aborted due to the WMS
- Negligible delay between job submission and arrival on the CE
The acceptance criteria were fully met.

An example: CMS monitoring

CMS has adopted SAM as the system to implement Grid-wide monitoring of computing and storage elements. CMS contacts at sites must ensure that the CMS tests run successfully.

Computing Element tests

Name              Checks
basic             CMS software area and CMS site local configuration
swinst            Presence of the required versions of CMSSW
Monte Carlo       Stage-out of a file from the WN to the local SE
Squid             Basic functionality of the closest Squid server
FroNtier

SRM tests

Name              Checks
get-pfn-from-tfc  Gets the LFN → PFN rule from a central CMS DB
put               Copies a test file into the SRM via srmcp
get-metadata      Gets metadata of the remote file
get               Copies back the remote file
advisory-delete   Removes the remote file

[Figure: Snapshot of the SAM test results on OSG]

Site availability

The outcome of the CMS SAM tests is used to measure the availability of sites for CMS. It is expressed as the fraction of successful tests as a function of time.
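
The figures quoted in the gLite WMS test results can be checked against the quantitative acceptance criteria with a short calculation. The sketch below is illustrative only: the class and function names are hypothetical, and only the numbers (115,000 jobs, 7 days, 320 WMS-related aborts, the 10^4 jobs/day and five-day thresholds) come from the text. The number of "stale" jobs is not reported in the text, so that criterion is not evaluated here.

```python
from dataclasses import dataclass

# Thresholds taken from the acceptance criteria in the text.
MIN_JOBS_PER_DAY = 10**4
MIN_DAYS = 5


@dataclass
class WmsTestRun:
    """Summary of one WMS submission test (hypothetical container)."""
    jobs_submitted: int
    days: float
    jobs_aborted_by_wms: int


def daily_rate(run: WmsTestRun) -> float:
    """Average number of jobs submitted per day."""
    return run.jobs_submitted / run.days


def aborted_fraction(run: WmsTestRun) -> float:
    """Fraction of submitted jobs that aborted because of the WMS."""
    return run.jobs_aborted_by_wms / run.jobs_submitted


def meets_throughput_criterion(run: WmsTestRun) -> bool:
    """True if the run lasted long enough at a high enough average rate."""
    return run.days >= MIN_DAYS and daily_rate(run) >= MIN_JOBS_PER_DAY


# Numbers reported for the acceptance test.
run = WmsTestRun(jobs_submitted=115_000, days=7, jobs_aborted_by_wms=320)

print(f"{daily_rate(run):.0f} jobs/day")            # 16429 jobs/day
print(f"{aborted_fraction(run):.2%} aborted")       # 0.28% aborted
print("throughput criterion met:", meets_throughput_criterion(run))
```

The average rate of about 16,400 jobs/day is consistent with the rounded figure of 16,000 jobs/day quoted above, and the abort fraction rounds to the quoted 0.3%.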

EGEE/OSG interoperability

An interesting side effect of the choice of SAM for the CMS monitoring was a strong push for interoperability between EGEE and OSG: jobs are submitted by SAM to the LCG Resource Broker also for OSG sites.

[Figure annotation: 50% of sites over the 80% availability mark]
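
The availability measure described under "Site availability" (fraction of successful tests as a function of time) and the share of sites above a given availability mark can be sketched as follows. This is a minimal illustration, not the SAM implementation: the function names, the hourly binning and the sample data are all assumptions made for the example.

```python
from collections import defaultdict


def availability_by_bin(results, bin_seconds=3600):
    """Fraction of successful tests per time bin.

    `results` is an iterable of (timestamp_in_seconds, passed) pairs.
    Returns a dict mapping the start of each bin to the fraction of
    tests in that bin that passed. The bin width is an assumption.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for timestamp, ok in results:
        bin_start = int(timestamp // bin_seconds) * bin_seconds
        total[bin_start] += 1
        if ok:
            passed[bin_start] += 1
    return {b: passed[b] / total[b] for b in sorted(total)}


def fraction_of_sites_above(site_availability, threshold=0.8):
    """Share of sites whose availability exceeds `threshold`."""
    if not site_availability:
        return 0.0
    above = sum(1 for a in site_availability.values() if a > threshold)
    return above / len(site_availability)


# Hypothetical test results for one site: (timestamp, passed).
results = [(0, True), (600, True), (1800, False), (3600, True), (4000, True)]
per_hour = availability_by_bin(results)
print(per_hour)

# Hypothetical per-site availabilities over some period.
sites = {"site_a": 0.95, "site_b": 0.85, "site_c": 0.60, "site_d": 0.40}
print(fraction_of_sites_above(sites))  # 0.5
```

With the sample data, two of the four hypothetical sites lie above the 80% mark, which is the kind of summary quoted in the figure annotation.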