LHCb Roadmap 2009-10
WLCG Workshop, March 2009, Prague

2008: DIRAC3 put in production
- Production activities
  - Started in July
  - Simulation, reconstruction, stripping
    - Includes the file distribution strategy and the failover mechanism
    - File access using the local access protocol (rootd, rfio, (gsi)dcap, xrootd)
    - Commissioned an alternative method: copy to local disk (see the sketch below)
      - Drawbacks: non-guaranteed space, lower CPU efficiency, additional network traffic (possibly copied from a remote site)
  - Failover using VOBOXes
    - File transfers (delegated to FTS)
    - LFC registration
    - Internal DIRAC operations (bookkeeping, job monitoring…)
- Analysis
  - Started in September
  - Ganga available for DIRAC3 in November
  - DIRAC2 decommissioned on January 12th ("call me DIRAC now…")
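The two input-data access strategies above (direct access through the site's local protocol, with copy-to-local-disk as the commissioned fallback) can be summarised in a short sketch. This is illustrative only and not DIRAC code: `protocol_turl`, `copy_to_scratch` and the example LFN are hypothetical placeholders for whatever storage client and namespace a site actually provides.

```python
import os
import tempfile

LOCAL_PROTOCOLS = ("root", "rfio", "dcap", "gsidcap", "xroot")  # protocols quoted on the slide

def protocol_turl(lfn: str, protocol: str) -> str:
    """Placeholder: ask the local SE for a transport URL for this LFN."""
    return f"{protocol}://se.example.org/{lfn.lstrip('/')}"

def copy_to_scratch(lfn: str, scratch_dir: str) -> str:
    """Placeholder: copy the file to local scratch (non-guaranteed space)."""
    local_path = os.path.join(scratch_dir, os.path.basename(lfn))
    open(local_path, "wb").close()  # a real implementation would call an SRM/gridFTP client
    return local_path

def open_input(lfn: str, site_protocol: str = "root") -> str:
    """Prefer direct access via the local protocol; otherwise copy to local disk."""
    if site_protocol in LOCAL_PROTOCOLS:
        return protocol_turl(lfn, site_protocol)  # best CPU efficiency, no extra disk or traffic
    # Fallback commissioned in 2008: copy the file to local disk first
    scratch = tempfile.mkdtemp(prefix="lhcb_input_")
    return copy_to_scratch(lfn, scratch)

if __name__ == "__main__":
    print(open_input("/lhcb/MC/2008/DST/00001234_00000056_1.dst"))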

2009 DIRAC concurrent jobs
(plot of concurrent DIRAC jobs, running at 111 sites)

DIRAC jobs per day
(plot of DIRAC jobs per day)

LHCb Computing Operations
- Production manager
  - Schedules production work, sets up and checks workflows, reports to LHCb operations
- Computing shifters
  - Computing Operations shifter (pool of ~12 shifters)
    - Covers 14h/day, 7 days/week
  - Data Quality shifter
    - Covers 8h/day, 7 days/week
  - Both are based in the LHCb Computing Control room (2-R-014)
- Daily DQ and Operations meetings
  - Week days (twice a week during shutdowns)
- Grid Expert on-call
  - On duty for a week
  - Runs the operations meetings
- Grid Team (~6 FTEs needed, ~2 missing)
  - Shared responsibilities (WMS, DMS, SAM, Bookkeeping…)

Activities in 2008
- Completion of the MC simulation campaign DC06
  - Additional channels
  - Re-reconstruction (at Tier1s)
    - Involves a lot of pre-staging (2-year-old files)
  - Stripping (at Tier1s)
- User analysis of DC06
  - At Tier1s, using Ganga and DIRAC (2, then 3)
    - Access to D1 data (some files are 2 years old)
- Commissioning for 2008 data taking
  - CCRC08 (February, May)
    - Managed to distribute data at nominal rate
    - Automatic job submission to Tier1s
    - Re-processing of data still on disk
  - Very little cosmics data (only saved at Tier0, analysed online)
  - First beam data
    - Very few events (rate: 1 event / 48 seconds…)

Plans for 2009
- Simulation… and its analysis in 2009
  - Tuning stripping and HLT for 2010 (DC09)
    - 4/5 TeV, 50 ns (no spillover), … cm⁻² s⁻¹
    - Benchmark channels for first physics studies (100 Mevts): B→µµ, Γ_s, B→Dh, B_s→J/ψφ, B→K*µµ…
    - Large minimum-bias samples (~1 min of LHC running, 10^9 events; see the order-of-magnitude check below)
    - Stripping performance required: ~50 Hz for benchmark channels
    - Tune HLT: efficiency vs retention, optimisation
  - Replacing DC06 datasets (DC09-2)
    - Signal and background samples (~500 Mevts)
    - Minimum bias for L0, HLT and stripping commissioning (~100 Mevts)
    - Used for CP-violation performance studies
    - Nominal LHC settings (7 TeV, 25 ns, … cm⁻² s⁻¹)
  - Preparation for very first physics (MC-2009)
    - 2 TeV, low luminosity
    - Large minimum-bias sample (10^9 events, part used for FEST09)
- Commissioning for data taking (FEST09)
  - See next slides
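A quick order-of-magnitude check of the minimum-bias figures quoted above (one minute of LHC running versus 10^9 events), assuming only the nominal 40 MHz bunch-crossing rate; whether every crossing yields a usable event is glossed over here.

```python
# Order-of-magnitude check, not an official number: at the nominal 40 MHz
# bunch-crossing rate, one minute of LHC running gives a few 1e9 crossings,
# consistent with the "10^9 events" quoted for the minimum-bias samples.
BUNCH_CROSSING_RATE_HZ = 40e6
SECONDS = 60

crossings = BUNCH_CROSSING_RATE_HZ * SECONDS
print(f"{crossings:.1e} bunch crossings in one minute")  # 2.4e+09
```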

FEST09
- Aim
  - Replace the non-existent 2008 beam data with MC
  - Learn how to deal with real data
    - HLT strategy: from 1 MHz to 2 kHz
      - First data (loose trigger)
      - Higher lumi/energy data (b-physics trigger)
    - Online detector monitoring
      - Based on event selection from the HLT, e.g. J/ψ events
      - Automatic detection of detector problems
    - Online data streaming (see the routing sketch below)
      - Physics stream (all triggers) and calibration stream (subset of triggers, typically 5 Hz)
    - Alignment and calibration loop
      - Trigger re-alignment
      - Run alignment processes
      - Validate new alignment (based on the calibration stream)
    - Feedback of calibration to reconstruction
    - Stripping, streaming, data merging and distribution
    - Physics analysis (group analysis, end-user…)
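A minimal sketch of the two-stream routing mentioned under "Online data streaming": every accepted event goes to the physics stream, while events firing a configurable subset of trigger lines are also written to a calibration stream capped at a few Hz. The line name and the rate-capping scheme are assumptions for illustration, not the actual LHCb streaming code.

```python
import time

CALIBRATION_LINES = {"Hlt2JpsiForCalib"}   # hypothetical trigger line name
CALIB_RATE_CAP_HZ = 5.0                    # "typically 5 Hz" from the slide

class StreamRouter:
    def __init__(self):
        self._calib_sent = 0
        self._start = time.monotonic()

    def route(self, event: dict) -> list:
        """Return the list of streams this accepted event should go to."""
        streams = ["physics"]              # the physics stream gets all triggers
        fired_calib = bool(event["lines"] & CALIBRATION_LINES)
        if fired_calib and self._calib_rate() < CALIB_RATE_CAP_HZ:
            streams.append("calibration")  # subset of triggers, rate-limited
            self._calib_sent += 1
        return streams

    def _calib_rate(self) -> float:
        elapsed = max(time.monotonic() - self._start, 1e-6)
        return self._calib_sent / elapsed

# usage
router = StreamRouter()
print(router.route({"lines": {"Hlt2JpsiForCalib", "Hlt2BPhysics"}}))
```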

FEST09 preparation (2)
- Online developments
  - Event injector (sketched below)
    - Reads MC files with an emulated L0 trigger
    - Creates Multi-Event Packets (MEPs, as the front-end does)
    - Sends MEPs to an HLT farm node
  - Event injector control system
    - Emulation of the standard Run Control
    - Simulates a regular run, but using the event injector as source
  - Multiple online streams
    - Using the HLT classification as criterion
      - Was not needed for the 2008 run, hence was delayed
  - Status
    - Tests in December, operational in January
    - First FEST week: 26 January
      - Mainly online commissioning, limited data transfers
    - Second FEST week: 2 March
      - Data Quality commissioning, feedback to reconstruction
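The event injector described above essentially reads MC events, packs them into multi-event packets the way the front-end electronics would, and pushes the packets to an HLT farm node. The sketch below shows that loop only; the MEP layout, packing factor, endpoint and file reader are simplified, hypothetical stand-ins, not the real online format.

```python
import socket
import struct

MEP_PACKING_FACTOR = 10                    # events per MEP; illustrative value
HLT_NODE = ("hltnode01.example", 45000)    # hypothetical farm-node endpoint

def read_mc_events(path: str):
    """Placeholder for reading MC events with the emulated L0 decision (`path` unused here)."""
    for i in range(100):
        yield b"EVENT%03d" % i             # stand-in for a serialized raw event

def build_mep(events: list) -> bytes:
    """Very simplified MEP: [n_events][len][payload]... (not the real MEP format)."""
    body = b"".join(struct.pack("!I", len(e)) + e for e in events)
    return struct.pack("!I", len(events)) + body

def inject(path: str) -> None:
    """Read events, group them into MEPs, and send them to one HLT farm node."""
    with socket.create_connection(HLT_NODE) as sock:
        batch = []
        for event in read_mc_events(path):
            batch.append(event)
            if len(batch) == MEP_PACKING_FACTOR:
                sock.sendall(build_mep(batch))
                batch.clear()
        if batch:                          # flush the last, partial packet
            sock.sendall(build_mep(batch))

# inject("/data/mc/minbias_with_l0.raw")   # needs a listening HLT node to run
```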

Resources (preliminary)
- Consider 2009-10 as a whole (new LHC schedule)
  - Real data
    - Split the year in two parts:
      - … s at low lumi (LHC phase 1)
      - 3 to … s at higher lumi (…) (LHC phase 2)
    - Trigger rate independent of lumi and energy: 2 kHz (see the back-of-the-envelope sketch below)
  - Simulation: … events (nominal year) in 2010
- New assumptions for (re-)processing and analysis
  - More re-processings during LHC phase 1
  - Add calibration checks (done at CERN)
  - Envision more analysis at CERN with first data
    - Increase from 25% (TDR) to 50% (phase 1) and 35% (phase 2)
    - Include SW development and testing (LXBATCH)
  - Adjust event sizes and CPU needs to current estimates
    - Important effort to reduce data size (packed format for rDST, DST, µDST…)
    - Use the new HEP-SPEC06 benchmarking
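As a back-of-the-envelope sketch of how the integrated numbers scale: the 2 kHz trigger rate is taken from the slide, but the live time and raw event size below are illustrative assumptions only, not LHCb's official planning figures.

```python
# Illustrative only: the 2 kHz trigger rate is from the slide; the live time
# and raw event size are assumed values for the sake of the arithmetic.
TRIGGER_RATE_HZ = 2_000
LIVE_SECONDS = 5e6           # assumption for one running period
RAW_EVENT_KB = 35            # assumption

events = TRIGGER_RATE_HZ * LIVE_SECONDS
raw_tb = events * RAW_EVENT_KB * 1e3 / 1e12           # bytes -> TB
export_mb_s = TRIGGER_RATE_HZ * RAW_EVENT_KB / 1e3    # MB/s exported from the pit

print(f"{events:.1e} events, {raw_tb:.0f} TB of RAW, ~{export_mb_s:.0f} MB/s export")
```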

Resources (contd)
- CERN usage
  - Tier0:
    - Real data recording, export to Tier1s
    - First-pass reconstruction of ~85% of the raw data
    - Reprocessing (in the future, foresee to use also the Online HLT farm)
  - CAF (Calibration and Alignment Facility)
    - Dedicated LXBATCH resources
    - Detector studies, alignment and calibration
  - CAF (CERN Analysis Facility)
    - Part of the Grid distributed analysis facilities (estimate 40% in …)
    - Histograms and interactive analysis (lxplus, desk/lap-tops)
- Tier1 usage
  - Re[-re]construction
    - First pass during data taking, reprocessing
  - Analysis facilities
    - Grid distributed analysis
    - Local storage for user data (LHCb_USER SRM space)
  - Simulation in 2009 (background activity)

Resource requirement trends
- Numbers being finalised for the MB meeting and the C-RRB
- Trends are:
  - Shift in tape requirements due to the LHC schedule
  - Increase in CERN CPU requirements
    - Change of assumptions in the computing model
  - Tier1s:
    - CPU requirements lower in 2009 but similar in 2010
      - More real-data re-processings in 2010
    - Decrease in disk requirements
  - Tier2s:
    - CPU decrease due to fewer MC simulation requests in 2009
- Anyway:
  - All this is full of many unknowns!
    - LHC running time
    - Machine background
    - Number of re-processings (how fast can we calibrate?)
  - More than anything, it is hard to predict the needed power and space as a function of time! Only integrated CPU and final storage estimates are given.

What are the remaining issues?
- Storage
- Stability

Storage and data access
- 3 years after Mumbai
  - Definition of storage classes
  - Roadmap to SRM v2.2
- Where are we?
  - Many scalability issues
    - We do use, and only use, SRM
    - Data access from storage (no local copy)
  - Instabilities of storage-ware (and DM tools)
    - Delays in coping with changes (inconsistent tools)
  - Data disappearance…
    - Tapes damaged
    - Disk servers offline
  - Still no unified RFIO library between Castor and DPM…
- What can be done?
  - Regular meetings between experiment DM experts, sites and storage-ware developers
    - Pre-GDB resurrected?
    - Should be technical, not political

Storage and Data Access (2)
- Reliability of data access?
  - We (the experiments) cannot design site storage
  - If more hardware is needed, this should be evaluated by the sites
    - Flexible to changes
    - Number of tape drives, size of disk caches, cache configuration…
    - Examples:
      - Write pools different from read pools: is it a good choice? How large should the pools be?
      - Scale the number of tape drives to the disk cache and staging policy
- Consistency of storage with catalogs
  - Inaccessible data (tape or disk)
  - Job matching based on the catalog
    - For T1D0 data, we use pre-staging: ensures availability of the data (sketched below)
      - Spots lost files
    - For D1 data, we assume it is available
      - We can query SRM, but it will collapse
      - Will SRM tell the truth, i.e. UNAVAILABLE?
      - We can often get a tURL, but opening the file just hangs…
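A minimal sketch of the pre-staging pattern used for T1D0 data: request a bring-online for each tape replica, poll until it is staged, and flag files that never come online as candidates for the lost-file list. The `srm_*` helpers are placeholders, not a real SRM client API, and this is not DIRAC's actual implementation.

```python
import time

def srm_bring_online(surl: str) -> str:
    """Placeholder: issue an SRM bring-online request, return a request token."""
    return f"token-for-{surl}"

def srm_is_online(surl: str, token: str) -> bool:
    """Placeholder: poll the SE for the file status (ONLINE vs NEARLINE/UNAVAILABLE)."""
    return True   # a real client would query the storage element here

def prestage(surls, timeout_s: int = 3600, poll_s: int = 60):
    """Return (staged, lost); jobs are only matched once their inputs are staged."""
    tokens = {surl: srm_bring_online(surl) for surl in surls}
    staged = []
    pending = set(surls)
    deadline = time.time() + timeout_s
    while pending and time.time() < deadline:
        for surl in list(pending):
            if srm_is_online(surl, tokens[surl]):
                staged.append(surl)
                pending.discard(surl)
        if pending:
            time.sleep(poll_s)
    lost = sorted(pending)   # never came online: candidates for the lost-file list
    return staged, lost

print(prestage(["srm://se.example.org/lhcb/data/RAW/run1234_0001.raw"]))
```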

Software repository and deployment
- A very important service:
  - Can make a site unusable!
  - Should scale with the number of WNs
  - Use the proper technology
    - Example: at CERN, LHCb has 1 write AFS server and 4 read-only AFS servers
  - Of course proper permissions should be set…
    - Write access for lcg-admin (a.k.a. sgm accounts)
    - Read-only for all others
    - Make your choice: pool accounts and separate groups, or single accounts
  - Intermittent outages can kill all jobs on a site! (see the pre-job check sketched below)
- Middleware client
  - We do need support for multiple platforms
    - Libraries linked to the applications (LFC, gfal, castor, dCache…)
  - Therefore we must distribute it
    - The LCG-AA distribution is essential
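A sketch of the kind of check a job (or pilot) could run on the worker node before starting a payload: verify that the shared software area is mounted, readable and contains the release it needs. The environment-variable name follows a common convention of the time, and the directory layout and release name are assumptions for illustration.

```python
import os

def software_area_ok(release: str = "DaVinci_v23r0") -> bool:
    """Check the shared software area before accepting a payload (illustrative only)."""
    sw_dir = os.environ.get("VO_LHCB_SW_DIR")   # common convention; may differ per site
    if not sw_dir or not os.path.isdir(sw_dir):
        return False                            # area not mounted/visible on this WN
    if not os.access(sw_dir, os.R_OK):
        return False                            # read permission missing for plain users
    release_dir = os.path.join(sw_dir, "lib", "lhcb", release)  # hypothetical layout
    return os.path.isdir(release_dir)

if __name__ == "__main__":
    print("software area usable:", software_area_ok())
```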

Workload Management
- Stability and reliability of the gLite WMS
  - The mega-patch was not a great experience…
  - In most cases we don't need brokering
    - Next step is direct CE submission (CREAM)
      - Needs reliable CEMon information
- Job matching to WNs: a shopping list
  - MaxCPUTime matching: which units? (see the JDL sketch below)
    - Is it guaranteed?
  - Memory usage
    - We are very modest memory consumers, but…
    - Jobs are often killed by batch systems due to excessive (virtual) memory
    - There is no queue parameter allowing a JDL requirement
      - Only an indication of the WN memory
    - Some sites have linked memory to CPU!!!
      - Seems strange… short jobs all fail…
    - Limits should be increased
      - Can bias physics results (e.g. large number of particles in Geant4)
    - CPUs with (really) many cores are almost here…
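To make the MaxCPUTime unit problem concrete, here is a sketch of how one might translate "how much work a job needs" into "queue minutes" and express it as a JDL requirement. The Glue attribute name and its minutes unit are quoted from memory of the Glue 1.x schema and should be checked; the reference WN power used for the conversion is exactly the unspecified quantity the slide complains about, so the value below is an assumption.

```python
def cpu_requirement_minutes(work_hs06_seconds: float,
                            assumed_wn_power_hs06: float = 8.0) -> int:
    """Translate required work into queue minutes, given an ASSUMED WN power."""
    return int(work_hs06_seconds / assumed_wn_power_hs06 / 60) + 1

def build_jdl(executable: str, work_hs06_seconds: float) -> str:
    minutes = cpu_requirement_minutes(work_hs06_seconds)
    # GlueCEPolicyMaxCPUTime is published in minutes -- but minutes of *which*
    # reference CPU? That ambiguity is the point being made on the slide.
    return "\n".join([
        "[",
        f'  Executable = "{executable}";',
        f"  Requirements = other.GlueCEPolicyMaxCPUTime >= {minutes};",
        "]",
    ])

print(build_jdl("run_gauss.sh", work_hs06_seconds=200_000))
```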

SAM jobs and reports
- Need to report on usability by the experiments
  - Tests reproduce standard use cases
  - Should run as normal jobs, i.e. not in a special clean environment
- Reserve lcg-admin for software installation
  - Needs a dedicated mapping for permissions on the repository
- Use normal accounts for running tests
  - Run as Ultimate Priority DIRAC jobs
  - Matched by the first pilot job that starts
    - Scans the WN domain (a check of the kind sketched below)
      - Often reveals WN-dependent problems (bad configuration)
    - Regular environment
  - Should allow for longer periods without a report
    - Queues may be full (which is actually a good sign), but then no new job can start!
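A sketch of the sort of worker-node scan the first matching pilot could perform: a few environment checks that regular production jobs also depend on. The specific checks and thresholds are illustrative, not the actual SAM/DIRAC test suite.

```python
import os
import shutil
import socket

def scan_worker_node(min_scratch_gb: float = 5.0) -> dict:
    """Run a few illustrative sanity checks on the current worker node."""
    results = {"hostname": socket.gethostname()}
    # Enough scratch space in the working directory?
    free_gb = shutil.disk_usage(os.getcwd()).free / 1e9
    results["scratch_ok"] = free_gb >= min_scratch_gb
    # Grid proxy visible to the payload?
    results["proxy_ok"] = os.path.exists(os.environ.get("X509_USER_PROXY", ""))
    # Shared software area visible? (same convention as in the earlier sketch)
    results["sw_area_ok"] = os.path.isdir(os.environ.get("VO_LHCB_SW_DIR", ""))
    return results

if __name__ == "__main__":
    for check, value in scan_worker_node().items():
        print(f"{check}: {value}")
```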

Conclusions
- 2008
  - CCRC very useful for LHCb (although running it simultaneously with the other experiments was irrelevant for us, given our low throughput)
  - DIRAC3 fully commissioned
    - Production in July
    - Analysis in November
    - As of now, called DIRAC
  - Last processing of DC06
    - Analysis will continue in 2009
  - Commissioning of simulation and reconstruction for real data
- 2009-10
  - Large simulation requests for replacing DC06 and preparing 2009-10
  - FEST09: ~1 week a month and 1 day a week
  - Resource requirements being prepared for the C-RRB in April
- Services are not stable enough yet!