DataGrid France Meeting, Lyon, Feb. 2003. CMS test of the EDG Testbed: CMS MC Production. Objectives, Results, Conclusions and perspectives. C. Charlot / LLR-École Polytechnique.

DataGrid France Meeting, Lyon, Feb. 2003
CMS test of the EDG Testbed: CMS MC Production
Objectives, Results, Conclusions and perspectives
C. Charlot / LLR-École Polytechnique

CMS jobs description
- CMKIN: MC generation of the proton-proton interaction for a physics channel (dataset)
- CMSIM: detailed simulation of the CMS detector, processing the data produced during the CMKIN step
- Workflow: the CMKIN job writes its output to a Grid Storage Element; the CMSIM job reads the CMKIN output from the Storage Element and writes its own output back to Grid storage
Per-event output size and processing time (* on a PIII 1 GHz, 512 MB, 46.8 SI95):
  CMKIN   ~0.05 MB/event    ~ sec/event
  CMSIM   ~1.8 MB/event     ~6 min/event
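To make the CMKIN/CMSIM chain concrete, a minimal job-description sketch follows; it is not taken from the talk. The wrapper script, dataset and file names, the Replica Catalog URL and the job id are placeholders, and the attribute names are those commonly used with the EDG workload management system of that period.

  # cmsim.jdl: hypothetical job description for a CMSIM-like step (all values are placeholders)
  Executable     = "cmsim_wrapper.sh";          # hypothetical wrapper script around CMSIM
  Arguments      = "eg02_BigJets 137";          # hypothetical dataset name and run number
  StdOutput      = "cmsim.log";
  StdError       = "cmsim.err";
  InputSandbox   = {"cmsim_wrapper.sh"};
  OutputSandbox  = {"cmsim.log", "cmsim.err"};
  InputData      = {"LF:eg02_BigJets_kin_137.ntpl"};   # CMKIN output registered in the Replica Catalog
  ReplicaCatalog = "ldap://rc.example.org:9011/lc=cms-collection,rc=CMSCatalog,dc=example,dc=org";  # placeholder RC URL
  DataAccessProtocol = {"file", "gridftp"};

  # Submission and status from the UI (job id is a placeholder):
  dg-job-submit cmsim.jdl
  dg-job-status <job-id>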

CMS production components interfaced to EDG middleware
[Architecture diagram, CMS components on one side and EDG components on the other: on the UI, IMPALA/BOSS create the jobs (JDL) from RefDB parameters; the EDG Workload Management System uses the input data location and dispatches jobs to Computing Elements (with CMS software installed) and their Worker Nodes; output data are written to Storage Elements and registered via the Replica Manager (data registration); job output filtering and runtime monitoring feed the BOSS DB; arrows distinguish "push data or info" from "pull info".]

Accessing information
[Diagram: from the CMS UI workstation, boss SQL queries the BOSS DB; dg-job-status queries the Workload Management System's Logging & Bookkeeping service; edg-replicamanager-xxx commands talk to the Replica Manager / Replica Catalog; resource information comes from the Information System (MDS).]
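As a rough illustration of the access paths on this slide, the commands might be used as sketched below; hostnames, ports, the job identifier and the BOSS table/column names are assumptions, and the replica-manager subcommand is left unspecified, as on the slide.

  # Illustrative only; hosts, ports and identifiers are placeholders.
  boss SQL "SELECT ID, T_STOP FROM JOB WHERE S_USR='R'"     # query the BOSS DB (hypothetical table and columns)
  dg-job-status https://rb.example.org:7846/137.0           # job status via Logging & Bookkeeping (placeholder job id)
  edg-replicamanager-xxx                                    # Replica Manager/Catalog commands; subcommand not spelled out on the slide
  ldapsearch -x -H ldap://giis.example.org:2170 -b "mds-vo-name=local,o=grid"   # browse MDS resource information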

Main objectives
- Verify the portability of the CMS production environment to the Grid environment
- Assess the robustness of the EDG middleware in a production environment
- Produce data for physics studies
  – As part of a production involving non-grid sites
  – Target was 1M events (250k initially)

Technical objectives and choices
- Job submission
  – Use 4 UIs and 4 RBs, 1 RB per UI: proved useful given the 512-job limitation, and made it easily possible to switch to another RB
  – Some UIs close to their RB, others far away
  – Resubmission disabled
- Data management
  – Jobs writing data to a dedicated SE, or to the close SE; dedicated UI configurations (JDL creation in IMPALA)
  – Replication of CMSIM output to CERN (offline), also monitored by BOSS
  – Two sites with MSS (two MSS systems): one site with a direct MSS interface, the other with an additional MSS-enabled SE

Technical objectives and choices
- Data management (cont'd)
  – Two sites with MSS; one UI sending jobs using the MSS interface (dedicated rc.conf, dedicated stage command)
  – Thus replication between:
    – Disk => MSS
    – MSS => MSS (although not direct)
  – Only one Replica Catalog, one logical collection per UI
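The replication paths listed above can be pictured with a small sketch, purely illustrative: SE hostnames and file paths are invented, and the registration step keeps the generic command name used earlier in the talk.

  # Hypothetical disk SE => MSS-backed SE replication of one CMSIM output file
  globus-url-copy \
    gsiftp://se.llr.example.org/flatfiles/cms/eg02_BigJets_137.fz \
    gsiftp://mss-se.example.org/cms/mc/eg02_BigJets_137.fz
  # At the MSS site the dedicated stage command then migrates the file to tape (as noted above)
  # Finally the new physical location is registered for the existing logical file in the single RC:
  edg-replicamanager-xxx    # exact registration subcommand not spelled out in the talk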

Sites and resources
- Global services highly distributed
  – CERN: top MDS, two RBs
  – CNAF: RC, RB
  – CC-IN2P3: EDG software repository, CMS software repository
  – Marseille: user authorization
  – NIKHEF: CMS VO
  – IC: RB
- Submission sites distributed as well
  – Bologna, LLR, Padova, IC
- Core sites
  – CERN, CC-IN2P3, CNAF, NIKHEF, RAL
- CMS added sites
  – Legnaro, Padova, Imperial College, LLR (wow!)
  – Added on the fly during the test

Phenomenology of problems
- Problems related to the information system
  – Highly unstable, an important source of Aborted jobs; a low submission rate was adopted, in particular for CMSIM
  – Change to dbII (1.4.0) during the test: much better, but still problems when one GRIS hangs
- Replica Management and Catalog limitations
  – RC slowing down, getting stuck by too many accesses (short jobs)
  – Limitation in the number of (lengthy) entries: split into several collections
  – High rate of failure of edg-rm commands
- Network problems
  – Time-outs in InputSandbox transfer from UI to RB
- Problems related to job submission
  – See next slide

EDG reasons for failure (categories)
[Chart: preliminary analysis of the pre-Christmas running (EDG 1.4.0)]

CMS/EDG summary of the Stress Test
[Table: preliminary analysis of short jobs and long jobs; status after the Stress Test, Jan 03]

CMS use of the system (statistics)
[Plots over the 3-week period on the production testbed: number of CEs and SEs used, and number of events produced vs. time; part of the CMS official production]

Main results and observations from the CMS work
- RESULTS
  – Could distribute and run CMS software in the EDG environment
  – Could increase resources by adding CMS sites on the fly
  – Generated ~250K events for physics with ~10,000 jobs in a 3-week period
- OBSERVATIONS
  – Were able to quickly add new sites to provide extra resources
  – Fast turnaround in bug fixing and installing new software
  – The test was labour intensive (since the software was still developing and the overall system was fragile)
  – WP1: at the start there were serious problems with long jobs; recently improved
  – WP2: replication tools were difficult to use and not reliable, and the performance of the Replica Catalogue was unsatisfactory
  – WP3: the information system based on MDS performed poorly with increasing query rate
  – The system is sensitive to hardware faults and site/system mis-configuration
  – The user tools for fault diagnosis are limited
- EDG 2.0 should fix the major problems (see talks by R. Jones and E. Laure), providing a system suitable for full integration in distributed production