Results of the LHCb experiment Data Challenge 2004
Joël Closier, CERN / LHCb
CHEP'04

The LHCb DC'04 team
- DIRAC: Andrei Tsaregorodtsev, Vincent Garonne, Ian Stokes-Rees
- Production management: Joël Closier, Ricardo Graciani (LCG), Johan Blouw, Andrew Pickford … and the LHCb site managers
- LHCb Bookkeeping, Monitoring & Accounting: Markus Frank, Carmine Cioffi, Manuel Sanchez, Ruben Vizcaya
- LCG-LHCb liaison: Flavia Donno, Roberto Santinelli
- The LCG-GDA team: Ian Bird, Laurence Field, Maarten Litmaath, Markus Schulz, David Smith, Zdenek Sekera, Marco Serra…

Outline
- Aims of the LHCb Data Challenge 2004
- Production model
- Performance of DC'04
- Lessons from DC'04
- Conclusions

LHCb DC'04 aims
- Main goal: gather the information needed to write the LHCb Computing Technical Design Report
  - Robustness test of the LHCb software and production system, using software that is as realistic as possible in terms of performance
  - Test of the LHCb distributed computing model, including distributed analysis (a realistic test of the analysis environment requires realistic analyses)
  - Incorporation of the LCG application area software into the LHCb production environment
  - Use of LCG resources (at least 50% of the production capacity)
- 3 phases:
  - Production: MC simulation and reconstruction
  - Stripping: event pre-selection
  - Analysis

LHCb DC'04 aims (cont'd)
- Physics goals:
  - HLT studies, consolidating efficiencies
  - Background/signal studies, consolidating background estimates and background properties
- Requires a quantitative increase in the number of signal and background events compared to DC'03:
  - … signal events
  - … specific background
  - … background (B inclusive + minimum bias, ratio 1:1.8)

Production
- Production is done with the DIRAC system (see Track 4 - Distributed Computing Services, id 377)
- DIRAC is deployed at each site participating in DC'04
- Central services supporting the Data Challenge:
  - Production database
  - Workload Management System
  - Monitoring, accounting
  - Bookkeeping, AliEn File Catalog
- Technologies used by the production services:
  - C++, Python, XML-RPC
  - ORACLE and MySQL databases
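The production services and the agents talk to each other over XML-RPC. Purely as an illustration of that pattern, a minimal Python client for a task-request call might look like the sketch below; the endpoint URL, the method name and the argument layout are assumptions for illustration, not the real DIRAC interface.

    # Illustrative sketch only: the endpoint URL, method name and payload layout
    # are hypothetical, not the actual DIRAC Workload Management interface.
    import xmlrpc.client

    WMS_URL = "http://dirac-wms.example.cern.ch:8080/WorkloadManagement"  # hypothetical endpoint

    def request_task(site_name, max_cpu_seconds):
        """Ask the central Workload Management System for one task suited to this site."""
        wms = xmlrpc.client.ServerProxy(WMS_URL)
        try:
            # Hypothetical remote method: returns a JDL-like task description, or an empty value.
            return wms.requestTask({"Site": site_name, "MaxCPUTime": max_cpu_seconds}) or None
        except OSError as err:
            print(f"WMS not reachable: {err}")
            return None

    if __name__ == "__main__":
        task = request_task("DIRAC.CPPM.fr", 172800)
        if task:
            print("received task:", task)
        else:
            print("no work available")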

LHCb job

Non-LCG site:
1. DIRAC deployment (on the CE)
2. DIRAC JobAgent:
   - Check the CE status
   - Request a DIRAC task (JDL)
   - Install the LHCb software if needed
   - Submit the job to the local batch system
   - Execute the task and check its steps
   - Upload the results
3. DIRAC TransferAgent

LCG site:
1. Input sandbox: a small bash script (~50 lines) that
   - checks the environment: site, hostname, CPU, memory, disk space…
   - installs DIRAC: downloads the DIRAC tarball (~1 MB) and deploys it on the WN
   - executes the job: requests a DIRAC task (an LHCb simulation job), executes it, checks its steps and uploads the results
2. Retrieval of the output sandbox
3. Analysis of the retrieved output sandbox
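The same flow, written out as a sketch to make the steps concrete: every helper below, and the behaviour inside it, is a hypothetical placeholder, not code from the DIRAC distribution.

    # Sketch of the per-job wrapper flow described above; all helpers are placeholders.
    import os
    import shutil
    import socket
    import sys

    def check_environment(min_free_gb=2.0):
        """Step 1: inspect the worker node (hostname, CPUs, disk space) before asking for work."""
        free_gb = shutil.disk_usage(".").free / 1e9
        print(f"host={socket.gethostname()} cpus={os.cpu_count()} free_disk={free_gb:.1f} GB")
        return free_gb >= min_free_gb

    def install_dirac():
        """Step 2: deploy the small DIRAC tarball on the worker node if it is not already there."""
        if not os.path.isdir("DIRAC"):
            os.makedirs("DIRAC")  # stands in for downloading and unpacking the ~1 MB tarball

    def request_task():
        """Step 3a: ask the central WMS for an LHCb simulation task (see the XML-RPC sketch)."""
        return None  # no central service to contact in this sketch

    def execute_task(task):
        """Step 3b: run the application steps of the task, checking each one."""
        return "Done"

    def upload_results(task, status):
        """Step 3c: upload the output data and logs to the configured storage."""
        print(f"would upload the outputs of the task (status: {status})")

    if __name__ == "__main__":
        if not check_environment():
            sys.exit("worker node unsuitable, exiting without requesting work")
        install_dirac()
        task = request_task()
        if task is not None:
            upload_results(task, execute_task(task))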

Strategy
- Test sites: each site is tested with special and production-like jobs
- Enable the site in the DIRAC Workload Management System
- Always keep jobs in the queues

DIRAC sites: run the Local Agent continuously
- via cron jobs
- via runsv
- via a daemon

LCG: submit jobs continuously
- via a cron job on a User Interface

NB: from the DIRAC point of view, LCG is treated as a single site.
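One simple way to read "run the Local Agent continuously via cron", sketched below: cron starts a short-lived agent every few minutes, and a non-blocking lock ensures at most one instance runs at a time, so the agent effectively stays alive without a dedicated daemon. The crontab entry, the lock-file path and the work performed are illustrative assumptions.

    # Illustrative only: the crontab entry, lock-file path and the agent cycle are assumptions.
    # Example crontab entry (check for work every 5 minutes):
    #   */5 * * * * /opt/dirac/run_agent_once.py >> /var/log/dirac-agent.log 2>&1
    import fcntl
    import sys

    LOCK_FILE = "/tmp/dirac-agent.lock"  # hypothetical path

    def run_agent_once():
        """One agent cycle: ask the WMS for a task and run it (see the earlier sketches)."""
        print("agent cycle: no central service to contact in this sketch")

    if __name__ == "__main__":
        lock = open(LOCK_FILE, "w")
        try:
            # Skip this run if a previous cron invocation is still busy with a long job.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            sys.exit(0)
        run_agent_once()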

Data storage
- All output of the reconstruction phase (DSTs) is sent to CERN (as Tier-0)
- Intermediate files are not kept
- DSTs are also stored at one of our 5 Tier-1s:
  - CNAF (Italy)
  - Karlsruhe (Germany)
  - Lyon (France)
  - PIC (Spain)
  - RAL (United Kingdom)

DC'04 performance

Phase 1 results
[Plot: cumulative event production over Phase 1, with annotations marking: DIRAC alone, LCG in action, LCG paused, LCG restarted, Phase 1 completed]
- 186 M events produced

Daily performance
[Plot: number of events produced per day]
- 5 million events/day

Sites involved
- 20 DIRAC sites
- 43 LCG sites (8 of them also DIRAC sites)
- Used resources from non-LHCb countries; e.g. Hungary produced ~2M events

Simultaneous jobs (a snapshot)
[Plot: number of simultaneously running jobs]

Tier storage

  Tier-1       Nb of events   Size (TB)
  CNAF             …              …
  RAL              …              …
  PIC              …              …
  Karlsruhe        …              …
  Lyon             …              …

  Tier-0       Nb of events   Size (TB)
  CERN             …              …

DIRAC-LCG: events share
- 50% of the events were produced using LCG

DIRAC – LCG: CPU share (DIRAC : LCG)
- May: 88% : 12% (11% of DC'04)
- Jun: 78% : 22% (25% of DC'04)
- Jul: 75% : 25% (22% of DC'04)
- Aug: 26% : 74% (42% of DC'04)
[Plot: CPU · years delivered per month]

LCG performance
- … k jobs submitted to LCG
- After running: 113 k Done (successful), 34 k Aborted
- LCG efficiency: 61%

DC'04 lessons

Lessons learnt: DIRAC
- The concept of light, customizable, simple-to-deploy agents proved to be very effective
- Easy update procedure: bug fixes to the DIRAC tools can be propagated quickly
- Application software installation is triggered by a running job
- Most of the central services were running on the same machine: too many processes, high load
- Improve server availability
- Improve error handling and reporting

Lessons learnt: LCG
- Improve the output sandbox upload/retrieval mechanism: it should also be available for failed and aborted jobs
- Improve the reliability of the CE status collection methods (timestamps?)
- Add intelligence on the CE or RB to detect and avoid large numbers of jobs aborting at start-up: prevent a misconfigured site from becoming a black hole
- Need to collect the LCG log information, and a tool to navigate it (including the different JobIDs)
- Need a way to limit the CPU (and wall-clock) time: the LCG wrapper must send appropriate signals to the user job to allow graceful termination (see the sketch after this list)
- How-to manuals: clear instructions for site managers on the procedure to shut down a site (for maintenance and/or upgrade)
- Problems with site configurations (LCG configuration, firewalls, GridFTP servers…)
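The graceful-termination point can be made concrete with the small sketch below: if the wrapper (or the batch system) sends a signal before the hard kill, the payload can trap it and save what it has produced so far. Which signal would actually be sent and what "save partial results" means are assumptions for illustration, not the actual LCG wrapper behaviour.

    # Minimal sketch of graceful termination on a signal; signal choice and clean-up are assumptions.
    import signal
    import sys
    import time

    def save_partial_results():
        """Placeholder: flush the event output and record the job status for the bookkeeping."""
        print("saving partial results before shutdown")

    def handle_termination(signum, frame):
        print(f"received signal {signum}, terminating gracefully")
        save_partial_results()
        sys.exit(0)

    # Trap the signals a wrapper or batch system might send ahead of a hard kill.
    signal.signal(signal.SIGTERM, handle_termination)
    if hasattr(signal, "SIGXCPU"):  # sent on some systems when the CPU-time limit is exceeded
        signal.signal(signal.SIGXCPU, handle_termination)

    if __name__ == "__main__":
        # Stand-in for the event loop of a simulation job.
        for event in range(10 ** 6):
            time.sleep(0.01)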

Conclusions
- LHCb DC'04 Phase 1 is over
- The production target was achieved: 186 M events in 424 CPU·years, ~50% of it on LCG resources (75-80% during the last weeks)
- The LHCb strategy was successful: submitting "empty" DIRAC agents to LCG has proven to be very flexible, allowing a success rate above that of LCG alone
- There is big room for improvement, both in DIRAC and in LCG:
  - DIRAC needs to improve the reliability of its servers; a big step was already made during the DC
  - LCG needs to improve the single-job efficiency: ~40% of jobs aborted, and ~10% did the work but were counted as failed from the LCG viewpoint
  - In both cases, extra protection against external failures (network, unexpected shutdowns…) must be built in
- The success is due to the dedicated support of the LCG team and the DIRAC site managers

Other links
CHEP'04 talks:
- File-Metadata Management System for the LHCb Experiment (Track 4 - Distributed Computing Services)
- DIRAC Workload Management System (Track 5 - Distributed Computing Systems and Experiences)
- Grid Information and Monitoring System using XML-RPC and Instant Messaging for DIRAC (Track 4 - Distributed Computing Services)
- DIRAC - The Distributed MC Production and Analysis for LHCb (Track 4 - Distributed Computing Services)
- A Lightweight Monitoring and Accounting System for LHCb DC'04 Production (Track 4 - Distributed Computing Services)