1 LHCb Computing A.Tsaregorodtsev, CPPM, Marseille 14 March 2007, Clermont-Ferrand

2 LHCb in brief
- Experiment dedicated to the study of CP violation, responsible for the dominance of matter over antimatter
- Matter-antimatter differences are studied using the b-quark (beauty)
- High-precision physics (tiny differences…)
- Single-arm spectrometer; looks like a fixed-target experiment
- Smallest of the 4 big LHC experiments, ~500 physicists
- Nevertheless, computing is also a challenge…

3 LHCb Basic Computing principles
- Raw data shipped in real time to Tier-0
  - Registered in the Grid (File Catalog)
  - Raw data provenance kept in a Bookkeeping database (query-enabled)
  - Resilience enforced by a second copy at the Tier-1s
  - Rate: ~2000 events/s of 35 kB each, i.e. ~70 MB/s (see the sketch below)
  - 4 main trigger sources (with little overlap): b-exclusive; dimuon; D*; b-inclusive
- All data processing up to final Ntuple or histogram production is distributed
  - Not even possible to reconstruct all data at Tier0…
  - Processing at Tier1 centres
- LHCb runs jobs where the data are
  - All data are placed explicitly
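A minimal arithmetic sketch of the quoted bandwidth (assuming 1 MB = 10^6 bytes):

    # Back-of-the-envelope check of the raw data rate quoted above.
    EVENT_RATE_HZ = 2000        # events per second out of the trigger
    RAW_EVENT_SIZE_KB = 35      # raw event size on persistent medium

    bandwidth_mb_per_s = EVENT_RATE_HZ * RAW_EVENT_SIZE_KB * 1e3 / 1e6
    print(f"Raw data rate: {bandwidth_mb_per_s:.0f} MB/s")   # -> 70 MB/s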

4 LHCb data processing software
[Diagram: the application chain built on the Gaudi framework - Gauss (simulation), Boole (digitisation), Moore (trigger), Brunel (reconstruction) and DaVinci (analysis) - exchanging GenParts, MCParts, MCHits, Raw data, (r)DST, DST and AOD through the Event model / Physics event model and the Conditions Database.]

5 LHCb dataflow
[Diagram: Raw data flow from the Online system to the Tier0 MSS-SE and on to the Tier1 MSS-SEs; reconstruction (Raw → rDST) and stripping (rDST+Raw → DST) run at the Tier1s, simulated data (Digi) are produced at the Tier2s, and analysis reads the DSTs at the Tier1s.]

6 Computing Model

7 The LHCb Tier1s
- 6 Tier1s: CNAF (IT, Bologna), GridKa (DE, Karlsruhe), IN2P3 (FR, Lyon), NIKHEF (NL, Amsterdam), PIC (ES, Barcelona), RAL (UK, Didcot)
- Contribute to reconstruction, stripping and analysis
- Keep copies on MSS of: Raw (2 copies shared), locally produced rDST, DST (2 copies), MC data (2 copies)
- Keep copies on disk of: DST (7 copies)

8 LHCb Computing: a few numbers
- Event sizes are on persistent medium (not in memory)
- Processing times are the best estimates as of today
- Requirements for 2008 are for the expected seconds of beam

  Current estimates:
  Event size (kB):            RAW 35 | rDST 20 | DST 110
  Event processing (kSI2k.s): Reconstruction 2.4 | Stripping 0.2 | Analysis 0.3

9 Summary of resource needs for 2008
[Table: data on disk in 2008 (TB), CPU needs in 2008 (MSI2k.yr) and data on tape in 2008 (TB), broken down into RAW, rDST, Simulation, Analysis and Stripped data (Reconstruction, Simulation, Analysis and Stripping for the CPU needs).]

10 DIRAC grid management software
- DIRAC is the distributed data production and analysis system of the LHCb experiment
  - Includes workload and data management components
  - Uses LCG services whenever possible
- Originally developed for MC data production tasks
  - The goals were to integrate all the heterogeneous computing resources available to LHCb and to minimize human intervention at LHCb sites
- The resulting design led to an architecture based on a set of central services and a network of light distributed agents

11 DIRAC Services, Agents and Resources
[Diagram: clients (the Production Manager, GANGA, the DIRAC API, the job monitor, the BK query webpage, the FileCatalog browser) talk to the central Services (Job Management Service, JobMonitorSvc, JobAccountingSvc, ConfigurationSvc, FileCatalogSvc, BookkeepingSvc, MessageSvc); Agents connect these services to the Resources (LCG grid worker nodes, site gatekeepers, Tier1 VO-boxes).]
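For illustration, a user job is typically defined and submitted through the DIRAC API (the same interface GANGA uses). A minimal sketch, assuming the DIRAC client of that era is installed; the application version, option file and dataset names below are placeholders:

    # Sketch of a user job submission through the DIRAC API (names are placeholders).
    from DIRAC.Client.Dirac import *   # DIRAC client API of the DIRAC2 era

    dirac = Dirac()
    job = Job()
    job.setApplication('DaVinci', 'v19r1')                # application and version: placeholders
    job.setInputSandbox(['DaVinci.opts'])                 # user options shipped with the job
    job.setInputData(['LFN:/lhcb/placeholder/file.dst'])  # LFNs resolved through the File Catalog
    job.setOutputSandbox(['DVNtuple.root'])
    jobid = dirac.submit(job)                             # the job enters the central Task Queue
    print('Submitted job', jobid)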

12 WMS Service
- The DIRAC Workload Management System is itself composed of a set of central services, pilot agents and job wrappers
- Realizes the PULL scheduling paradigm (see the sketch below)
  - Pilot agents deployed on LCG Worker Nodes pull jobs from the central Task Queue
- The central Task Queue makes it easy to apply VO policies by prioritizing user jobs
  - Using accounting information and user identities, groups and roles
- Job scheduling is late: a job goes to a resource only for immediate execution
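A schematic sketch of the pull paradigm, not DIRAC's actual code; the `matcher` object and its method names are illustrative:

    import time

    def run_pilot(matcher, capabilities, idle_limit=3):
        """Illustrative pilot-agent loop: pull jobs matching the worker node's
        capabilities from the central Task Queue and run them until it is dry."""
        idle_cycles = 0
        while idle_cycles < idle_limit:
            job = matcher.request_job(capabilities)   # central services pick the highest-priority match
            if job is None:
                idle_cycles += 1
                time.sleep(60)                        # nothing matched; retry a few times, then exit
                continue
            idle_cycles = 0
            wrapper = job.make_wrapper()              # job wrapper: set up the environment, report status
            wrapper.execute()
            wrapper.upload_outputs()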

13 DIRAC workload management
[Diagram: jobs enter through the Job Receiver into the Job Database; a chain of Optimizers (Prioritizer, Data Optimizer, …) sorts them into Task Queues. The Priority Calculator combines LHCb policy and quotas from the Accounting Service, job requirements and ownership, and VOMS info into a job priority. The Agent Director deploys agents, and the Match Maker in the Central Services hands queued jobs to the agents running on the resources (worker nodes).]
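To make the Priority Calculator concrete, an illustrative toy model (not DIRAC's actual formula) of how quotas from the accounting service and recent usage could be folded into a single job priority:

    def job_priority(group_share, group_recent_usage, user_recent_usage, base=1.0):
        """Toy priority model (illustrative only): groups below their fair share
        recently get boosted, heavy individual users get damped."""
        group_factor = group_share / max(group_recent_usage, 1e-6)
        user_factor = 1.0 / (1.0 + user_recent_usage)
        return base * group_factor * user_factor

    # Example: a group at half of its fair share outranks a user who has already
    # consumed a lot of CPU recently.
    print(job_priority(group_share=0.4, group_recent_usage=0.2, user_recent_usage=0.1))
    print(job_priority(group_share=0.4, group_recent_usage=0.6, user_recent_usage=5.0))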

14 VO-boxes
- LHCb VO-boxes are machines provided by the Tier1 sites to ensure safety and efficiency of the grid operations
  - Standard LCG software is maintained by the site managers
  - LHCb software is maintained by the LHCb administrators
  - Used for recovery of failed data transfers and bookkeeping operations
- VO-boxes behave in a completely non-intrusive way
  - They access site grid services via standard interfaces
- Main advantage: geographical distribution
  - VO-boxes are now set up in all the Tier1 centres
  - Any job can place requests on any VO-box in a round-robin way for redundancy and load balancing

15 DM Components
- DIRAC Data Management tools are built on top of, or provide interfaces to, the existing services
- The main components are:
  - Storage Element client and storage access plug-ins: SRM, GridFTP, HTTP, SFTP, FTP, …
  - Replica Manager: high-level operations (see the sketch below)
    - Uploading, replication, registration
    - Best replica finding
    - Failure retries with alternative data access methods
  - File Catalogs: LFC, Processing Database
- High-level tools for automatic bulk data transfers
  - T0-T1 raw data distribution
  - T1-T1 reconstructed data distribution
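As an illustration of the Replica Manager role referenced above, a hedged sketch; the object, method and storage element names are placeholders rather than the exact DIRAC interface:

    def ensure_tier1_copy(rm, lfn, dest_se):
        """Illustrative Replica Manager-style usage: replicate a file to a Tier1
        storage element, register the new replica, and retry on failure."""
        result = rm.replicate_and_register(lfn, dest_se)
        if not result.get('OK'):
            # the real tools retry failures with alternative data access methods
            result = rm.replicate_and_register(lfn, dest_se, protocol='gridftp')
        return result

    # Hypothetical call: copy one raw file to the CNAF storage element.
    # ensure_tier1_copy(rm, '/lhcb/data/example.raw', 'CNAF-RAW')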

16 Bulk Data Management
- Bulk asynchronous file replication
  - Requests are set in the RequestDB
  - The Transfer Agent executes periodically (see the sketch below)
    - 'Waiting' or 'Running' requests are obtained from the RequestDB
    - FTS bulk transfer jobs are submitted and monitored
[Diagram: on the LHCb/DIRAC DMS side, the Transfer Manager Interface and Replica Manager feed the RequestDB and the File Catalog interface; the Transfer Agent drives the LCG machinery (LCG File Catalog, File Transfer Service) to move data over the transfer network between the Tier0 SE and the Tier1 SEs.]
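A sketch of the Transfer Agent cycle mentioned above, with hypothetical `request_db` and `fts` helpers standing in for the RequestDB and the LCG File Transfer Service:

    def transfer_agent_cycle(request_db, fts):
        """Illustrative Transfer Agent pass: pick up pending replication requests
        and drive them through FTS bulk jobs until they complete."""
        for req in request_db.get_requests(status=('Waiting', 'Running')):
            if req.status == 'Waiting':
                # group the files of the request into one bulk FTS job
                fts_job = fts.submit(req.source_se, req.target_se, req.lfns)
                request_db.update(req.id, status='Running', fts_job=fts_job)
            else:
                state = fts.status(req.fts_job)
                if state == 'Done':
                    request_db.update(req.id, status='Done')     # new replicas get registered here
                elif state == 'Failed':
                    request_db.update(req.id, status='Waiting')  # retried on the next cycle

    # The agent would run this periodically, e.g. every few minutes.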

17 Results: T0-T1 data transfer tests

18 Processing Database
- A suite of Production Manager tools facilitates routine production tasks:
  - define complex production workflows
  - manage large numbers of production jobs
- Transformation Agents prepare data (re)processing jobs automatically as soon as input files are registered in the Processing Database via a standard File Catalog interface (see the sketch below)
- Minimizes human intervention and speeds up standard production
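A sketch of the Transformation Agent behaviour described above; `processing_db`, `transformation` and `wms` are hypothetical stand-ins for the real components:

    def transformation_agent_cycle(processing_db, transformation, wms):
        """Illustrative Transformation Agent pass: group newly registered input
        files and turn them into production jobs without human intervention."""
        new_files = processing_db.get_unprocessed_files(transformation.id)
        for chunk in chunked(new_files, transformation.files_per_job):
            job = transformation.make_job(input_data=chunk)  # workflow defined by the Production Manager
            wms.submit(job)
            processing_db.mark_assigned(transformation.id, chunk)

    def chunked(items, size):
        """Split the list of input files into fixed-size groups, one group per job."""
        for i in range(0, len(items), size):
            yield items[i:i + size]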

19 DIRAC production performance
- Permanent production (no more dedicated Data Challenges)
- Up to 10K simultaneous production jobs
  - The throughput is only limited by the capacity available on LCG
- ~80 distinct sites accessed through LCG or through DIRAC directly
[Plot: number of running jobs over time, total and IN2P3 contribution]

20 CPU usage by country (since Dec. 2006)
  Country       CPU usage (%)
  UK            41.1
  CERN          12.1
  Italy          9.6
  Germany        8.1
  France         7.7
  Spain          6.6
  Greece         3.8
  Netherlands    3.1
  Poland         2.4
  Russia         2.0
  Hungary        0.9
[Pie chart of the same shares]

21 CPU usage by site (since April 2006)
  Site             CPU usage (%)
  CERN             14.0
  CNAF             11.1
  Manchester        9.1
  RAL               7.4
  GRIDKA            4.9
  QMUL              4.7
  IN2P3             4.5
  LPC               3.3
  USC               3.3
  NIKHEF            2.9
  Brunel (UK)       2.4
  Barcelona         2.0
  Liverpool         1.9
  Glasgow           1.7
  HG-06 (Greece)    1.6
  Lancashire        1.4
  PIC               1.3
- 50% of the production is done at the Tier1s

22 Status of various activities
- The LHCb production system is working in a stable way for MC production
  - The pilot agent model screens users from LCG inefficiencies
- Data distribution and reprocessing are more difficult
  - Unstable storage systems
  - Flaws in the data access middleware
  - Many problems will be resolved (and created) with the new SRM 2.2 release

23 Status of various activities (continued)
- User analysis is starting
  - Reliable data access is a crucial point
  - Efficient job prioritization is a must: user and production jobs compete for the same resources
- Full chain tests in June, involving DAQ, T0-T1 data distribution, distributed reconstruction, T1-T1 data distribution and final analysis
  - Automatic "real time" data movement and processing
  - Close to real scale, involving all the T1 sites

24 Conclusions
- LHCb has proposed a Computing Model adapted to its specific needs (number of events, event size, low number of physics candidates)
  - Reconstruction, stripping and analysis resources located at Tier1s (and possibly some Tier2s with enough storage and CPU capacity)
  - CPU requirements dominated by Monte Carlo, assigned to Tier2s and opportunistic sites
    - With DIRAC, even idle desktops / laptops could be used ;-) ?
  - Requirements are modest compared to other experiments
- DIRAC is well suited and adapted to this computing model
  - Integrated WMS and DMS
- LHCb's Computing should be ready when the first data come

25 LHCb software stack
- Uses CMT for build and configuration (handling dependencies)
- LHCb projects:
  - Applications: Gauss (simulation), Boole (digitisation), Brunel (reconstruction), Moore (HLT), DaVinci (analysis)
  - Algorithms: Lbcom (common packages), Rec (reconstruction), Phys (physics), Online
  - Event model: LHCb
  - Software framework: Gaudi
- LCG Applications Area: POOL, ROOT, COOL
- LCG/external: external software (boost, xerces, …) and middleware clients (lfc, gfal, …)
[Diagram: the stack from the LCG layer (SEAL, POOL, ROOT, COOL, CORAL, Geant4, GENSER, external libraries) through the Gaudi framework, the LHCb Event Model and the component projects (Lbcom, Rec, Phys, Online) up to the applications (Gauss, Boole, Brunel, Moore, DaVinci, Panoramix).]

26 Last month activities
- Record number of simultaneously running jobs: 9654
- Average of 7.5K running jobs in the last month
- Temporary problems at PIC and RAL
[Plot: running jobs over the last month, broken down by site (CERN, CNAF, GRIDKA, IN2P3, NIKHEF, PIC, RAL) and in total]

27 Community Overlay Network
[Diagram: the DIRAC WMS with its Task Queue, Monitoring and Logging services overlaid on the grid, with Pilot Agents running on the worker nodes of each site.]
- DIRAC Central Services and Pilot Agents form a dynamic distributed system as easy to manage as an ordinary batch system
- Uniform view of the resources independent of their nature: grids, clusters, PCs
- Prioritization according to VO policies and accounting
- Possibility to reuse batch system tools, e.g. the Maui scheduler

28 Reconstruction requirements
- 2 passes per year:
  - 1 quasi real time pass over a ~100 day period (2.8 MSI2k)
  - re-processing over the 2 month shutdown period (4.3 MSI2k)
    - Make use of the Filter Farm at the pit (2.2 MSI2k) - data shipped back to the pit
[Table: input fraction, number of events (~10^8 per stream), MSS storage (TB) and CPU (MSI2k.yr), broken down by trigger stream (b-exclusive, dimuon, D*, b-inclusive) and in total.]
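A rough cross-check of how such power figures follow from the per-event cost on slide 8; the raw event count used here (10^10 events) is an assumption for illustration, not a number from the slide:

    def required_power_msi2k(n_events, ksi2k_s_per_event, period_days):
        """Average CPU power (MSI2k) needed to process n_events within period_days."""
        total_work_ksi2k_s = n_events * ksi2k_s_per_event
        period_s = period_days * 86400
        return total_work_ksi2k_s / period_s / 1000.0

    # 2.4 kSI2k.s/event is the reconstruction cost from slide 8; 1e10 events is assumed.
    print(required_power_msi2k(1e10, 2.4, period_days=100))   # ~2.8 MSI2k (quasi real time pass)
    print(required_power_msi2k(1e10, 2.4, period_days=60))    # ~4.6 MSI2k (2-month re-processing)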

29 Stripping requirements
- Stripping 4 times per year, with 1 month of production outside of reconstruction periods
- Stripping has at least 4 output streams
  - For "non-b" channels only rDST+RAW is stored, i.e. 55 kB/event
  - For "b" channels RAW + full DST, i.e. 110 kB/event
- Output on disk SE at all Tier-1 centres
[Table: input fraction, reduction factor, event yield per stripping (~10^7 per stream), CPU (MSI2k.year), storage requirement per stripping (TB) and TAG (TB), broken down by stream (exclusive-b, dimuon, D*, inclusive-b) and in total.]

30 Simulation requirements
- Studies to measure the performance of the detector & event selection in particular regions of phase space
- Use large-statistics dimuon & D* samples for systematics
- Reduced Monte Carlo needs
[Table: number of events, CPU time/event (kSI2k.s) and total CPU (MSI2k.year) per application (Gauss, Boole, Brunel) for signal and inclusive samples; roughly 10^8 events through Gauss and Boole and 10^7 through Brunel per sample, for a total of 3.87 MSI2k.year.]

31 Simulation storage requirements
- Simulation still dominates LHCb CPU needs
- Current event size for the Monte Carlo DST (with truth info) is ~400 kB/event
- Total storage need is 64 TB
- Output at CERN, with another 2 copies distributed over the Tier-1 centres
[Table: number of events (~10^7 each), storage/event (kB) and total storage (TB) for the DST and TAG outputs of signal and inclusive samples, totalling 64 TB.]
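For the storage side, the volume is just events × size per event; a minimal sketch using the ~400 kB/event figure above, with a purely illustrative event count:

    def storage_tb(n_events, kb_per_event):
        """Total storage in TB for n_events of a given per-event size (1 TB = 10^9 kB)."""
        return n_events * kb_per_event / 1e9

    # ~400 kB/event is the MC DST size quoted above; the event count is hypothetical.
    print(storage_tb(1e8, 400))   # -> 40.0 TB for 10^8 stored MC DST events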

32 Analysis requirements
- User analysis accounted for in the model is predominantly batch
  - ~30k jobs/year, predominantly analysing ~10^6 events each
  - CPU of 0.3 kSI2k.s/event
- Analysis needs grow linearly with the year in the early phase of the experiment

  Nos. of physicists performing analysis:       140
  Nos. of analysis jobs per physicist/week:       4
  Event size reduction factor after analysis:     5
  Number of "active" Ntuples:                    10
[Table also gives the resulting CPU needs (MSI2k.years) and disk storage (TB).]
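The ~30k jobs/year figure follows directly from the table entries; a one-line check:

    # Cross-check of the ~30k analysis jobs per year quoted above.
    physicists = 140
    jobs_per_physicist_per_week = 4
    weeks_per_year = 52

    print(physicists * jobs_per_physicist_per_week * weeks_per_year)   # -> 29120, i.e. ~30k jobs/year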