LHCb Computing
Philippe Charpentier, CERN
LHCb in brief
- Experiment dedicated to studying CP violation
  - CP violation is responsible for the dominance of matter over antimatter
  - The matter-antimatter difference is studied using the b-quark (beauty)
  - High-precision physics (tiny differences…)
- Single-arm spectrometer
  - Looks like a fixed-target experiment
- Smallest of the 4 big LHC experiments
  - ~500 physicists
- Nevertheless, computing is also a challenge…
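LHCb data processing software
[Figure: the LHCb application chain built on the Gaudi framework: Gauss (simulation), Boole (digitisation), Moore (trigger), Brunel (reconstruction) and DaVinci (analysis). Data types shown: GenParts, MCParts, MCHits, Raw Data, (r)DST, DST and µDST. All applications share the event model / physics event model and the Conditions Database.]
LHCb Computing, PhC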
LHCb basic computing principles
- Raw data shipped in real time to the Tier0
  - Registered in the Grid File Catalog (LFC)
  - Raw data provenance recorded in a Bookkeeping database (query-enabled)
  - Resilience enforced by a second copy at the Tier1s
  - Rate: ~2000 events/s of 35 kB each, i.e. 70 MB/s
- All data processing up to the final µDST or Tuple production is distributed
  - It is not possible to perform the first-pass reconstruction of all data at the Tier0
  - The Tier0 is considered as one of the distributed sites
  - First-pass reconstruction runs at all Tier1s, just like re-processing
- Analysis performed at Analysis Facilities
  - In the Computing Model: AFs at the Tier1s
- Part of the analysis is not data-related
  - Extracting physics parameters on CP violation (toy MC, complex fitting procedures…)
  - Also uses distributed computing resources
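The quoted rate follows from simple arithmetic. A minimal Python sketch, assuming decimal units (1 MB = 1000 kB) and the 6×10^6 seconds of beam quoted for 2009-10 in the requirements; the function and constant names are illustrative only:

```python
# Back-of-envelope check of the RAW data rate (decimal units: 1 MB = 1000 kB).
EVENT_RATE_HZ = 2000       # ~2000 events/s from the online system
RAW_EVENT_SIZE_KB = 35     # current RAW event size estimate (kB)
BEAM_SECONDS = 6e6         # seconds of beam assumed for 2009-10


def raw_rate_mb_per_s(rate_hz: float, event_size_kb: float) -> float:
    """Sustained RAW data rate to the Tier0, in MB/s."""
    return rate_hz * event_size_kb / 1000.0


def raw_volume_tb(rate_hz: float, event_size_kb: float, seconds: float) -> float:
    """RAW volume for one copy of the data, in TB (1 TB = 10^6 MB)."""
    return raw_rate_mb_per_s(rate_hz, event_size_kb) * seconds / 1e6


if __name__ == "__main__":
    rate = raw_rate_mb_per_s(EVENT_RATE_HZ, RAW_EVENT_SIZE_KB)
    volume = raw_volume_tb(EVENT_RATE_HZ, RAW_EVENT_SIZE_KB, BEAM_SECONDS)
    print(f"RAW rate: {rate:.0f} MB/s")             # 70 MB/s
    print(f"RAW volume per copy: {volume:.0f} TB")  # 420 TB
    # The second (Tier1) copy doubles the stored RAW volume.
    print(f"Including the second copy: {2 * volume:.0f} TB")
```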
Basic principles (cont’d)
- LHCb runs jobs where the data are
  - All data are placed explicitly
- Analysis made possible by reduction of datasets
  - Many different channels of interest
  - Very few events in each channel (from 10^2 to 10^6 events/year)
  - A physicist deals with at most 10^7 events
  - Small and simple events
  - Final dataset manageable on a physicist’s desktop (100s of GBytes)
- Calibration and alignment performed on a selected part of the data stream (at CERN)
  - Alignment and tracking calibration using dimuons (~5/s)
    - Also used for validation of new calibrations
  - PID calibration using Ks, D*
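To put the dataset reduction in perspective, here is a back-of-envelope sketch. It assumes every event at ~2000 events/s is recorded over the 6×10^6 seconds of beam quoted for 2009-10; the helper names are hypothetical:

```python
# How selective one analysis channel is relative to all recorded events,
# and how large the dimuon calibration sample is (back-of-envelope only).
EVENT_RATE_HZ = 2000
BEAM_SECONDS = 6e6
DIMUON_RATE_HZ = 5         # alignment / tracking calibration stream


def recorded_events(rate_hz: float, seconds: float = BEAM_SECONDS) -> float:
    """Number of events recorded at a given rate over the beam time."""
    return rate_hz * seconds


def selection_fraction(channel_events: float, total_events: float) -> float:
    """Fraction of the recorded sample that ends up in one channel."""
    return channel_events / total_events


if __name__ == "__main__":
    total = recorded_events(EVENT_RATE_HZ)          # 1.2e10 events
    for channel in (1e2, 1e6):                      # range quoted on the slide
        frac = selection_fraction(channel, total)
        print(f"{channel:.0e} events/year -> fraction {frac:.1e} of the data")
    print(f"dimuon calibration sample: {recorded_events(DIMUON_RATE_HZ):.1e} events")
```

Even the largest channels keep well under one event in ten thousand, which is why the final samples fit on a desktop.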
LHCb dataflow
[Figure: dataflow between Online, Tier0, Tier1 and Tier2 sites, with MSS-SE storage at the Tier0 and Tier1s. Labelled steps: Reconstruction (producing rDST), Stripping (reading rDST+Raw, producing DST), Analysis (on DST) and Simulation (producing Raw/Digi data).]
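The figure can be read as a chain of steps, each consuming data types produced upstream. A simplified Python sketch of that reading, which treats simulated output ("Raw/Digi" in the figure) as the same RAW type and ignores storage; the `Step` class and `check_chain` helper are hypothetical, not part of any LHCb software:

```python
# Simplified reading of the dataflow figure (hypothetical helper, not LHCb code).
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    sites: tuple[str, ...]      # tiers where the step runs
    inputs: tuple[str, ...]     # data types read
    outputs: tuple[str, ...]    # data types written


DATAFLOW = [
    # Simulated events are written in the RAW format ("Raw/Digi" in the figure).
    Step("Simulation", ("Tier2",), (), ("Raw",)),
    Step("Reconstruction", ("Tier0", "Tier1"), ("Raw",), ("rDST",)),
    Step("Stripping", ("Tier1",), ("rDST", "Raw"), ("DST",)),
    Step("Analysis", ("Tier1",), ("DST",), ("µDST/Tuple",)),
]


def check_chain(steps, external=("Raw",)):
    """Walk the steps in order and verify every input has been produced.

    `external` lists data types entering from outside (RAW from Online).
    Returns the set of data types available at the end.
    """
    available = set(external)
    for step in steps:
        missing = [d for d in step.inputs if d not in available]
        if missing:
            raise ValueError(f"{step.name} needs {missing} before it can run")
        available.update(step.outputs)
    return available


if __name__ == "__main__":
    print(sorted(check_chain(DATAFLOW)))
```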
Comments on the LHCb Distributed Computing
- Only the last part of the analysis is foreseen to be “interactive”
  - Either analysing ROOT trees or using GaudiPython/PyROOT
- User analysis at Tier1s: why?
  - Analysis is very delicate and needs careful file placement
  - Tier1s are easier to check and (in principle) less prone to outages
  - CPU requirements are very modest
- What is LHCb’s concept of the Grid?
  - A set of computing resources working in a collaborative way
  - Provides computing resources for the collaboration as a whole
  - Recognition of contributions is independent of the type of jobs run at a site
  - There are no noble and less noble tasks: all are needed to make the experiment a success
  - Resources are not reserved for nationals
  - High availability of resources is the key issue
Further comments on analysis
Preliminary: currently being discussed
- Local analysis (Tier3)
  - Non-pledged resources, reserved to local users (no CE access, local batch queues, no central accounting)
  - Storage may or may not be a Grid-SE (i.e. SRM-enabled)
  - Copy or replication performed by DIRAC DMS tools
    - Grid-SE: replication, which can use third-party transfers; the replica should be registered in the LFC
    - Non-Grid-SE: copy from a local node; LFC registration is more problematic (no SRM), but possible
- Analysis on Tier2s
  - Pledged resources, therefore available to the whole collaboration
  - Resources should be additional (dedicated to analysis): the current Tier2 resources are only just enough for simulation
  - Storage and data access handled by the local team (no central manpower available)
  - Data fully replicated in Grid-SE (registered in the LFC)
  - CE centrally banned in case of failures (as for Tier1s)
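The Tier3 copy rules above amount to a small decision on the storage type. A hypothetical Python sketch that encodes them; `CopyPlan` and `plan_tier3_copy` are invented for illustration and are not the DIRAC DMS API:

```python
# Hypothetical encoding of the Tier3 copy rules (not the DIRAC DMS API).
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyPlan:
    method: str             # "replicate" (Grid-SE) or "copy" (plain storage)
    third_party: bool       # can the transfer run directly SE-to-SE?
    lfc_registration: str   # how the new replica gets into the LFC


def plan_tier3_copy(storage_is_srm: bool) -> CopyPlan:
    """Choose how to bring data to a Tier3, depending on its storage type."""
    if storage_is_srm:
        # Grid-SE: replicate, third-party transfers possible, register in LFC.
        return CopyPlan("replicate", True, "should be registered")
    # Plain storage: copy from a local node; LFC registration is harder without SRM.
    return CopyPlan("copy", False, "possible but problematic (no SRM)")


if __name__ == "__main__":
    for srm in (True, False):
        print(srm, plan_tier3_copy(srm))
```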
How to best achieve Distributed Computing?
- Data Management is paramount
  - Availability of Storage Elements at Tier1s
  - Reliability of SRM and transfers
  - Efficiency of data access protocols (rfio, (gsi)dcap, xrootd…)
- Infrastructure is vital
  - Resource management
  - 24x7 support coverage
  - Reliable and powerful networks (OPN)
- Resource sharing is a must
  - Less support needed
  - Best resource usage (fewer idle CPUs, empty tapes, unused networks…)
  - Shares must be long-term, with no hard limit on the number of slots
- …but opportunistic resources should not be neglected
  - EGEE, EGI?
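One way to read "long-term shares with no hard limit on slots" is deficit-based fair share: each free slot goes to the waiting group furthest below its target fraction of accumulated usage, so a group can take every idle slot when others have nothing queued. The sketch below is a generic illustration of that technique under this assumption, not the algorithm of any particular batch system:

```python
# Generic deficit-based fair share (illustration only, not a specific batch system).


def pick_next_group(shares, used, waiting):
    """Give the next free slot to the waiting group furthest below its share.

    There is no per-group slot cap: only groups with queued jobs compete.
    """
    candidates = [g for g in shares if waiting.get(g, 0) > 0]
    if not candidates:
        return None
    total_share = sum(shares.values())
    total_used = sum(used.values())

    def deficit(group):
        target = shares[group] / total_share
        actual = used[group] / total_used if total_used else 0.0
        return target - actual

    return max(candidates, key=deficit)


def schedule(shares, waiting, slots):
    """Fill `slots` free slots one at a time; return slots used per group."""
    used = {g: 0 for g in shares}
    waiting = dict(waiting)
    for _ in range(slots):
        group = pick_next_group(shares, used, waiting)
        if group is None:
            break
        used[group] += 1
        waiting[group] -= 1
    return used


if __name__ == "__main__":
    print(schedule({"A": 3, "B": 1}, {"A": 100, "B": 100}, 8))  # {'A': 6, 'B': 2}
    print(schedule({"A": 3, "B": 1}, {"A": 0, "B": 5}, 4))      # {'A': 0, 'B': 4}
```

With both groups busy, usage follows the 3:1 shares; when A is idle, B takes all slots instead of leaving them empty.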
LHCb Distributed Computing software
- Integrated WMS and DMS: DIRAC
- Distributed analysis portal: GANGA
  - Uses the DIRAC WMS and DMS as back-end
- DIRAC’s main characteristics
  - Implements late job scheduling
    - Overlay network (pilot jobs, central task queue)
    - Pull paradigm
    - Generic pilot jobs: allow multiple payloads to be run
  - Allows the LHCb policy to be enforced
  - Reduces the level of support required from sites
  - LHCb services designed to be redundant and hence highly available (multiple instances with failover, VO-BOXes)
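Late scheduling in a pull model means a job is bound to a resource only when a pilot already running there asks for work. A minimal Python sketch of a central task queue and a generic pilot, assuming jobs carry a numeric priority and a set of required site capabilities; `TaskQueue`, `match` and `pilot` are hypothetical names, not DIRAC's API:

```python
# Minimal pull-model sketch: central task queue + generic pilots (hypothetical names).
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedJob:
    priority: int                              # lower value = served first
    seq: int                                   # FIFO order among equal priorities
    job_id: str = field(compare=False)
    requirements: frozenset = field(compare=False)


class TaskQueue:
    """Central queue: jobs wait here until a running pilot asks for work."""

    def __init__(self):
        self._heap = []
        self._seq = 0

    def submit(self, job_id, priority, requirements=()):
        job = QueuedJob(priority, self._seq, job_id, frozenset(requirements))
        heapq.heappush(self._heap, job)
        self._seq += 1

    def match(self, capabilities):
        """Late binding: pick the best job this pilot can run, right now."""
        skipped, chosen = [], None
        while self._heap:
            job = heapq.heappop(self._heap)
            if job.requirements <= capabilities:
                chosen = job
                break
            skipped.append(job)
        for job in skipped:                    # put back jobs this pilot cannot run
            heapq.heappush(self._heap, job)
        return chosen


def pilot(queue, capabilities, max_payloads=10):
    """Generic pilot: keeps pulling payloads until nothing matches."""
    ran = []
    caps = frozenset(capabilities)
    for _ in range(max_payloads):
        job = queue.match(caps)
        if job is None:
            break
        ran.append(job.job_id)                 # a real pilot would execute the payload here
    return ran


if __name__ == "__main__":
    q = TaskQueue()
    q.submit("reco-001", priority=1, requirements={"Tier1"})
    q.submit("mc-001", priority=5)
    q.submit("user-001", priority=2, requirements={"Tier1"})
    print(pilot(q, {"Tier2"}))   # ['mc-001']
    print(pilot(q, {"Tier1"}))   # ['reco-001', 'user-001']
```

Because priorities live in one central queue, the experiment's policy is applied at match time rather than by each site's batch system.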
WMS with pilot jobs
- Jobs are submitted with the credentials of their owner (VOMS proxy)
- The proxy is renewed automatically inside the WMS repository
- The pilot job fetches the user job and its proxy
- The user job is executed with its owner’s proxy, which is used to access SEs, catalogs, etc.
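The credential handling can be sketched as follows. This assumes a per-owner proxy repository and invented renewal thresholds (`RENEW_THRESHOLD`, `RENEW_LIFETIME`); actual VOMS proxy renewal involves a credential service and is not modelled here:

```python
# Sketch of proxy handling around a pilot (hypothetical names and thresholds).
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RENEW_THRESHOLD = timedelta(hours=6)    # assumed: renew when less than this remains
RENEW_LIFETIME = timedelta(hours=24)    # assumed lifetime of a renewed proxy


@dataclass(frozen=True)
class Proxy:
    owner: str            # the user who submitted the job
    expires: datetime


def ensure_fresh(proxy: Proxy, now: datetime) -> Proxy:
    """WMS repository side: renew a stored proxy that is about to expire."""
    if proxy.expires - now < RENEW_THRESHOLD:
        return Proxy(proxy.owner, now + RENEW_LIFETIME)
    return proxy


def fetch_payload_proxy(job_owner: str, repository: dict, now: datetime) -> Proxy:
    """Pilot side: fetch the user job's proxy; the payload then uses it
    (not the pilot's own credentials) to access SEs and catalogs."""
    proxy = ensure_fresh(repository[job_owner], now)
    repository[job_owner] = proxy
    return proxy


if __name__ == "__main__":
    now = datetime(2009, 6, 1, tzinfo=timezone.utc)
    repo = {"alice": Proxy("alice", now + timedelta(hours=1))}
    p = fetch_payload_proxy("alice", repo, now)
    print(p.owner, p.expires - now)   # alice 1 day, 0:00:00
```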
The LHCb Tier1s
- 6 Tier1s
  - CNAF (IT, Bologna)
  - GridKa (DE, Karlsruhe)
  - IN2P3 (FR, Lyon)
  - NIKHEF (NL, Amsterdam)
  - PIC (ES, Barcelona)
  - RAL (UK, Didcot)
- Contribute to
  - Reconstruction
  - Stripping
  - Analysis
- Keep copies on MSS of
  - Raw (2 copies shared)
  - Locally produced rDST
  - DST (2 copies)
  - MC data (2 copies)
- Keep copies on disk of
  - DST (7 copies)
LHCb Computing: a few numbers
Best estimates as of today; event sizes are on persistent medium (not in memory).
Requirements for 2009-10:
- 6×10^6 seconds of beam
- ~10^9 MC b-events
- ~3×10^9 MC non-b events

                              TDR estimate   Current estimate
Event size (kB)
  RAW                               25              35
  rDST                                      20
  DST                              100             110
Event processing (kSI2k.s)
  Simulation                        75              90
  Reconstruction                   2.4               3
  Stripping                                0.2
  Analysis                                 0.3
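These numbers make it easy to compare the main CPU consumers. A back-of-envelope Python sketch using the current estimates, assuming one first-pass reconstruction of every event recorded at ~2000 events/s, the same simulation cost for b and non-b events, and ignoring stripping, analysis and re-processing:

```python
# Back-of-envelope yearly CPU from the "current estimate" column.
SECONDS_PER_YEAR = 365 * 24 * 3600

BEAM_SECONDS = 6e6          # seconds of beam for 2009-10
EVENT_RATE_HZ = 2000        # recorded event rate
MC_EVENTS = 1e9 + 3e9       # ~1e9 b-events + ~3e9 non-b events
SIM_KSI2K_S = 90            # simulation cost per event (assumed equal for b and non-b)
RECO_KSI2K_S = 3            # reconstruction cost per event


def ksi2k_years(n_events: float, ksi2k_s_per_event: float) -> float:
    """Total CPU expressed as kSI2k-years (kSI2k.s divided by seconds per year)."""
    return n_events * ksi2k_s_per_event / SECONDS_PER_YEAR


if __name__ == "__main__":
    reco = ksi2k_years(EVENT_RATE_HZ * BEAM_SECONDS, RECO_KSI2K_S)
    sim = ksi2k_years(MC_EVENTS, SIM_KSI2K_S)
    print(f"first-pass reconstruction: {reco:,.0f} kSI2k-years")
    print(f"simulation: {sim:,.0f} kSI2k-years (~{sim / reco:.0f}x reconstruction)")
```

Under these assumptions simulation needs about ten times the CPU of first-pass reconstruction (3.6×10^11 versus 3.6×10^10 kSI2k.s), in line with CPU requirements being dominated by Monte Carlo.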
Resource requirements for 2009-10
Site     Fraction (%)
CERN     14
FZK      11
IN2P3    25
CNAF      9
NL-T1    26
PIC       4
RAL
[Charts: CPU requirements, Disk requirements, Tape requirements]
IN2P3-CC represents 25% of the LHCb Tier1 pledges
LHCb and LCG-France: statistics since 1 January 2009
CPU (in days) from January 2009
Jobs in France
Tests for analysis
Analysis data access (cont’d)
Conclusions
- LHCb has proposed a Computing Model adapted to its specific needs (number of events, event size, low number of physics candidates)
- Reconstruction, stripping and analysis resources located at Tier1s (and possibly at some Tier2s with enough storage and CPU capacity)
- CPU requirements dominated by Monte Carlo, assigned to Tier2s and opportunistic sites
  - With DIRAC, even idle desktops/laptops could be used ;-)
  - LHCb@home?
- Requirements are modest compared to other experiments
- DIRAC is well suited and adapted to this computing model
  - Integrated WMS and DMS