DataGrid France meeting, Lyon, Feb. 2003. CMS test of EDG Testbed: CMS MC production. Objectives, results, conclusions and outlook. C. Charlot / LLR-École.


1 DataGrid France meeting, Lyon, Feb. 2003
CMS test of EDG Testbed
CMS MC production
Objectives
Results
Conclusions and outlook
C. Charlot / LLR-École Polytechnique

2 CMS jobs description
CMKIN: MC generation of the proton-proton interaction for a physics channel (dataset)
CMSIM: Detailed simulation of the CMS detector, processing the data produced during the CMKIN step

[Diagram: CMKIN job, output data, write to Grid Storage Element; CMSIM job, read from Grid Storage Element, output data, write to Grid Storage Element]

Step    Size/event   Time/event (*)
CMKIN   ~0.05 MB     ~0.4-0.5 sec
CMSIM   ~1.8 MB      ~6 min
(*) PIII 1 GHz, 512 MB, 46.8 SI95
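
The per-event figures above can be scaled up to get a rough feel for the size of a production. The following is a minimal sketch, assuming every event goes through both CMKIN and CMSIM on the reference machine and taking the midpoint of the quoted CMKIN time range; the function name and the sample sizes (the 250k and 1M event figures quoted in the objectives) are chosen for illustration only.

```python
# Per-event figures quoted on the slide (PIII 1 GHz, 512 MB, 46.8 SI95).
CMKIN_MB_PER_EVENT = 0.05
CMKIN_SEC_PER_EVENT = 0.45      # assumption: midpoint of the quoted 0.4-0.5 s
CMSIM_MB_PER_EVENT = 1.8
CMSIM_SEC_PER_EVENT = 6 * 60    # ~6 min per event


def production_estimate(n_events):
    """Return (total output in GB, total CPU time in hours) for both steps.

    Hypothetical helper: it simply multiplies the per-event figures and
    ignores overheads such as queueing, staging and failed jobs.
    """
    total_mb = n_events * (CMKIN_MB_PER_EVENT + CMSIM_MB_PER_EVENT)
    total_sec = n_events * (CMKIN_SEC_PER_EVENT + CMSIM_SEC_PER_EVENT)
    return total_mb / 1000.0, total_sec / 3600.0


if __name__ == "__main__":
    for n in (250_000, 1_000_000):
        gb, hours = production_estimate(n)
        print(f"{n} events: ~{gb:.0f} GB, ~{hours:.0f} CPU hours")
```

The estimate is dominated by CMSIM, both in storage and in CPU time, which is consistent with CMSIM jobs being the "long jobs" of the test.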

3 CMS production components interfaced to EDG middleware
[Diagram. Column labels: CMS, EDG. Components: UI with IMPALA/BOSS, BOSS DB, RefDB, Workload Management System, Replica Manager, CEs and WNs with CMS software, SEs. Labelled flows: JDL, parameters, data registration, job output filtering, runtime monitoring, input data location. Legend: push data or info; pull info]

4 Accessing information
[Diagram. Components: CMS UI workstation, BOSS DB, Workload Management System, Logging & Bookkeeping, Replica Manager, Replica Catalog, Information System (MDS). Access tools shown: boss, SQL, dg-job-status, edg-replicamanager-xxx]

5 Main Objectives
Verify the portability of the CMS production environment to a Grid environment
Assess the robustness of the EDG middleware in a production environment
Produce data for physics studies
  –As part of a production involving non-grid sites
  –Target was 1M events (250k initially)

6 Technical objectives and choices
Job submission
  –Use 4 UIs, 4 RBs, 1 RB per UI
      Proved useful, 512 jobs limitation
      Easily possible to switch to another RB
  –Some UIs close to their RB, others far
  –Resubmission disabled
Data management
  –Jobs writing data to a dedicated SE, jobs writing data to close SE
      Dedicated UI configurations (JDL creation in IMPALA)
  –Replication of CMSIM output to CERN (offline)
      Also monitored by BOSS
  –Two sites with MSS
      Two MSS systems
      One site with direct MSS interface, the other with an additional MSS-enabled SE
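
The reason several RBs help when each one is limited to 512 jobs can be sketched as follows. This is an illustration only: it assumes the 512-job limitation applies per Resource Broker (the slide does not spell this out), and the function and broker labels are hypothetical, not part of IMPALA, BOSS or the EDG tools.

```python
from itertools import cycle

MAX_JOBS_PER_RB = 512  # assumption: the quoted limit applies per Resource Broker


def assign_jobs(job_ids, brokers, cap=MAX_JOBS_PER_RB):
    """Spread jobs round-robin over brokers without exceeding cap per broker.

    Returns (assignment, leftover): a dict mapping each broker to its list
    of jobs, and the list of jobs that could not be placed anywhere.
    """
    assignment = {rb: [] for rb in brokers}
    leftover = []
    order = cycle(brokers)
    for job in job_ids:
        # Try each broker at most once, starting where the last job stopped.
        for _ in range(len(brokers)):
            rb = next(order)
            if len(assignment[rb]) < cap:
                assignment[rb].append(job)
                break
        else:
            # Every broker is already at its cap.
            leftover.append(job)
    return assignment, leftover


if __name__ == "__main__":
    brokers = ["RB-1", "RB-2", "RB-3", "RB-4"]  # hypothetical labels
    assignment, leftover = assign_jobs(range(2100), brokers)
    for rb, jobs in assignment.items():
        print(rb, len(jobs))
    print("not placed:", len(leftover))
```

Under this assumption, four brokers raise the number of jobs that can be in flight from 512 to 2048, and a job can be moved to another broker when one is full or unavailable.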

7 Technical objectives and choices
Data management (cont'd)
  –Two sites with MSS
      One UI sending jobs using MSS interface
        Dedicated rc.conf, dedicated stage command
      Thus replication between
        Disk => MSS
        MSS => MSS (although not direct)
  –Only one RC
      One logical collection per UI

8 Sites and resources
Global services highly distributed
  –CERN: top MDS, two RBs
  –CNAF: RC, RB
  –CC-IN2P3: EDG software repository, CMS software repository
  –Marseille: user authorization
  –NIKHEF: CMS VO
  –IC: RB
Submission sites distributed as well
  –Bologna, LLR, Padova, IC
Core sites
  –CERN, CC-IN2P3, CNAF, NIKHEF, RAL
CMS added sites
  –Legnaro, Padova, Imperial College, LLR (wow!)
  –Added on the fly during the test

9 Phenomenology of problems
Problems related to the information system
  –Highly unstable
      An important source of aborted jobs
      Low submission rate adopted, in particular for CMSIM
  –Change to dbII (1.4.0) during the test
      Much better
      Still problems when one GRIS hangs
Replica Management and Catalog limitations
  –RC slowing down, getting stuck under too many accesses (short jobs)
  –Limitation in the number of (lengthy) entries
      Split into several collections
  –High failure rate of edg-rm commands
Network problems
  –Timeout in InputSandbox transfer from UI to RB
Problems related to job submission
  –See next slide
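
The workaround of splitting catalogue entries into several collections amounts to chunking a list of logical file names. A minimal sketch follows, assuming a fixed maximum number of entries per collection; the slide gives no actual limit, so the value used here, the naming scheme and the function itself are hypothetical and do not correspond to any EDG Replica Catalog API.

```python
def split_into_collections(lfns, max_entries, prefix="collection"):
    """Group logical file names into collections of at most max_entries each.

    Returns a dict mapping a generated collection name to its list of LFNs.
    """
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    return {
        f"{prefix}_{start // max_entries:03d}": lfns[start:start + max_entries]
        for start in range(0, len(lfns), max_entries)
    }


if __name__ == "__main__":
    # Hypothetical file names and limit, for illustration only.
    names = [f"cmsim_run{n:04d}.fz" for n in range(25)]
    for name, entries in split_into_collections(names, max_entries=10).items():
        print(name, len(entries))
```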

10 EDG reasons of failure (categories)
[Figure: preliminary analysis of the pre-Christmas period (1.4.0)]

11 CMS/EDG Summary of Stress Test
[Figure: preliminary analysis; panels: short jobs, long jobs; after Stress Test, Jan 03]

12 CMS use of the system (Statistics)
[Figure: CEs, SEs; axes: number of events, time; part of the CMS official production; production testbed (3-week period)]

13 Main results and observations from CMS work
RESULTS
  –Could distribute and run CMS s/w in EDG environment
  –Could increase resources by adding CMS sites on the fly
  –Generated ~250K events for physics with ~10,000 jobs in a 3-week period
OBSERVATIONS
  –Were able to quickly add new sites to provide extra resources
  –Fast turnaround in bug fixing and installing new software
  –Test was labour intensive (since software was developing and the overall system was fragile)
      WP1: At the start there were serious problems with long jobs; recently improved
      WP2: Replication tools were difficult to use and not reliable, and the performance of the Replica Catalogue was unsatisfactory
      WP3: The Information System based on MDS performed poorly with increasing query rate
      The system is sensitive to hardware faults and site/system misconfiguration
      The user tools for fault diagnosis are limited
  –EDG 2.0 should fix the major problems (see talks by R Jones and E Laure), providing a system suitable for full integration in distributed production
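
The headline numbers in the results translate into average rates with simple arithmetic. The sketch below assumes the 3-week period is 21 days of running and treats all jobs alike; it does not distinguish CMKIN from CMSIM jobs or successful from failed ones, so the figures are averages only. The function name is hypothetical.

```python
def throughput(n_events, n_jobs, n_days):
    """Return average events per job, jobs per day and events per day."""
    return {
        "events_per_job": n_events / n_jobs,
        "jobs_per_day": n_jobs / n_days,
        "events_per_day": n_events / n_days,
    }


if __name__ == "__main__":
    # ~250K events, ~10,000 jobs, 3 weeks (assumed to be 21 days).
    rates = throughput(250_000, 10_000, 21)
    for key, value in rates.items():
        print(f"{key}: ~{value:.0f}")
```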

