Recent and planned changes to the LHCb computing model
Marco Cattaneo, Philippe Charpentier, Peter Clarke, Stefan Roiser

LHCb data-taking conditions
- Run 1 exceeded specifications to maximise physics reach
- In 2015:
  - ~Same luminosity as in 2012 (luminosity levelling)
    - In twice as many bunches (25 ns spacing) -> reduced pileup
  - Event size roughly constant
    - Higher track multiplicity, lower pileup
    - Processing time per event ~unchanged
  - 2.5 times the HLT rate
    - More CPU, more storage

LHCb Computing Model (TDR), summarised in the sketch below
- RAW data: 1 copy at CERN, 1 copy distributed (6 Tier1s)
  - First pass reconstruction runs democratically at CERN + Tier1s
  - End of year reprocessing of the complete year's dataset
    - Also at CERN + Tier1s
- Each reconstruction followed by a "stripping" pass
  - Event selections by physics groups, several thousand selections in ~10 streams
  - Further stripping passes scheduled as needed
- Stripped DSTs distributed to CERN and all 6 Tier1s
  - Input to user analysis and further centralised processing by analysis working groups
    - User analysis runs at any Tier1
    - Users do not have access to RAW data or unstripped reconstruction output
- All disk located at CERN and Tier1s
  - Tier2s dedicated to simulation
    - And to analysis jobs requiring no input data
  - Simulation DSTs copied back to CERN and 3 Tier1s
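A minimal Python sketch of the TDR model on this slide; the tier names, activity labels and summary strings are illustrative assumptions, not actual LHCb or DIRAC configuration.

# Minimal sketch of the TDR model above; names are illustrative assumptions.
TDR_MODEL = {
    # which kinds of site may run each processing activity
    "activities": {
        "first_pass_reconstruction": {"Tier0", "Tier1"},
        "reprocessing":              {"Tier0", "Tier1"},
        "stripping":                 {"Tier0", "Tier1"},
        "user_analysis":             {"Tier0", "Tier1"},   # where stripped DSTs are on disk
        "simulation":                {"Tier2"},            # Tier2s dedicated to simulation
    },
    # replica policy per data type
    "replicas": {
        "RAW":            "1 tape copy at CERN + 1 tape copy spread over the 6 Tier1s",
        "stripped_DST":   "disk at CERN and all 6 Tier1s",
        "simulation_DST": "disk at CERN and 3 Tier1s",
    },
}

def allowed_tiers(activity: str) -> set:
    """Site tiers allowed to run a given activity under the TDR model."""
    return TDR_MODEL["activities"].get(activity, set())

print(allowed_tiers("simulation"))   # {'Tier2'}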

Problems with the TDR model
- Tier1 CPU power sized for the end of year reprocessing
  - Large peaks, increasing with accumulated luminosity
- Processing model makes inflexible use of CPU resources
  - Only simulation can run anywhere
- Data management model very demanding on storage space
  - All sites treated equally, regardless of available space

Changes to the processing model in 2012
- In 2012, doubling of the integrated luminosity compared to 2011
  - New model required to avoid doubling Tier1 power
- Allow reconstruction jobs to be executed on a selected number of Tier2 sites (workflow sketched below)
  - Download the RAW file (3 GB) from a Tier1 storage element
  - Run the reconstruction job at the Tier2 site (~24 hours)
  - Upload the reconstruction output file to the same Tier1 storage
- Rethink the first pass reconstruction & reprocessing strategy
  - First pass processing mainly for monitoring and calibration
    - Used also for fast availability of data for 'discovery' physics
  - Reduce first pass to < 30% of the RAW data bandwidth
    - Used exclusively to obtain final calibrations within 2-4 weeks
  - Process the full bandwidth with a 2-4 week delay
    - Makes the full dataset available for precision physics without the need for end of year reprocessing
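A minimal sketch of the Tier2 reconstruction workflow listed above; download_raw, run_reconstruction and upload_output are hypothetical placeholders for the real DIRAC data-management and application steps (not actual APIs), and the LFN and SE names in the example are made up.

# Minimal sketch of the 2012 Tier2 reconstruction workflow; helpers are placeholders.

def download_raw(lfn: str, tier1_se: str) -> str:
    """Placeholder: fetch the ~3 GB RAW file from the Tier1 storage element."""
    print(f"downloading {lfn} from {tier1_se}")
    return "/scratch/" + lfn.rsplit("/", 1)[-1]

def run_reconstruction(local_raw: str) -> str:
    """Placeholder: run the reconstruction application locally (~24 hours)."""
    print(f"reconstructing {local_raw}")
    return local_raw + ".full.dst"

def upload_output(local_file: str, tier1_se: str) -> None:
    """Placeholder: upload the reconstruction output back to the SAME Tier1."""
    print(f"uploading {local_file} to {tier1_se}")

def reconstruct_at_tier2(raw_lfn: str, tier1_se: str) -> None:
    """Download from the Tier1, reconstruct at the Tier2, upload to the same Tier1."""
    local_raw = download_raw(raw_lfn, tier1_se)
    output = run_reconstruction(local_raw)
    upload_output(output, tier1_se)

reconstruct_at_tier2("/some/lfn/path/run1234.raw", "SOME-Tier1-SE")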

"Reco14" processing of 2012 and 2011 data
- 45% of the reconstruction CPU time provided by 44 additional Tier2 sites, but also by resources outside WLCG (Yandex)
- "Stop and go" pattern for the 2012 data, as it had to wait for calibration data from the first pass processing
- Power required for continuous processing of the 2012 data roughly equivalent to the power required for reprocessing the 2011 data at the end of the year
(Plot: reconstruction activity over time for 2012 data and 2011 data)

2015: suppression of reprocessing
- During LS1, major redesign of the LHCb HLT system
  - HLT1 (displaced vertices) will run in real time
  - HLT2 (physics selections) deferred by several hours
    - Run continuous calibration in the Online farm to allow use of calibrated PID information in HLT2 selections
    - HLT2 reconstruction becomes very similar to offline
- Automated validation of the online calibration for use offline
  - Includes validation of the alignment
  - Removes the need for a "first pass" reconstruction
- Green light from validation triggers the 'final' reconstruction (decision flow sketched below)
  - Foresee up to two weeks' delay to allow correction of any problems flagged by the automatic validation
  - No end of year reprocessing
    - Just restripping
- If resources are insufficient, foresee 'parking' a fraction of the data for processing after the run
  - Unlikely to be needed in 2015, but commissioned from the start
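A minimal sketch of the processing gate described above; the status strings and the simple decision flow are assumptions for illustration, not the actual LHCb production logic.

# Minimal sketch of the 2015 processing gate; statuses and flow are assumptions.

def schedule_offline_processing(validation_ok: bool,
                                days_waiting: int,
                                resources_sufficient: bool) -> str:
    """Decide what to do with newly taken RAW data once online calibration is checked."""
    if not validation_ok:
        # up to ~2 weeks foreseen to correct problems flagged by the automatic validation
        return "hold" if days_waiting <= 14 else "investigate"
    if not resources_sufficient:
        return "park"                   # process the parked fraction after the run
    return "final_reconstruction"       # no end-of-year reprocessing, only later restripping

print(schedule_offline_processing(True, 0, True))   # final_reconstruction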

Going beyond the Grid paradigm
- Distinction between Tiers for different types of processing activity becoming blurred
  - Currently, production managers manually attach/detach sites to different production activities in the DIRAC configuration system
    - In the future, sites declare their availability for a given activity and provide the corresponding computing resources (see the sketch below)
- DIRAC allows easy integration of non-WLCG resources
  - See Federico's talk
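A hypothetical sketch of the "sites declare their availability" idea; the site names and activity labels are made up, and this is not the actual DIRAC Configuration System interface.

# Hypothetical sketch: sites declare which activities they accept.
SITE_CAPABILITIES = {
    "LCG.CERN.ch":        {"reconstruction", "stripping", "simulation", "user_analysis"},
    "LCG.SomeTier2.fr":   {"reconstruction", "simulation"},   # hypothetical Tier2 site
    "CLOUD.SomeCloud.ru": {"simulation"},                      # hypothetical non-WLCG resource
}

def eligible_sites(activity: str) -> list:
    """Sites that have declared themselves available for the given activity."""
    return sorted(site for site, activities in SITE_CAPABILITIES.items()
                  if activity in activities)

print(eligible_sites("reconstruction"))   # ['LCG.CERN.ch', 'LCG.SomeTier2.fr']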

Changes to the data management model
- Increases in trigger rate and an expanded physics programme put strong pressure on storage resources
- Tape shortages mitigated by a reduction in archive volume
  - Archives of all derived data exist as a single tape copy
    - Forced to accept the risk of data loss
- Disk shortages addressed by:
  - Introduction of disk at Tier2s
  - Reduction of the event size in derived data formats
  - Changes to data replication and data placement policies
  - Measurement of data popularity to guide decisions on replica removals

Tier2Ds
- In the LHCb computing model, user analysis jobs requiring input data are executed at sites holding the data on disk
  - So far, Tier1 sites were the only ones to provide storage and computing resources for user analysis jobs
- Tier2Ds are a limited set of Tier2 sites which are allowed to provide disk capacity for LHCb
  - Introduced in 2013 to circumvent the shortfall of disk storage
    - To provide disk storage for physics analysis files (MC and data)
    - To run user analysis jobs on the data stored at the sites
  - See Stephan's talk for status
- Blurs even further the functional distinction between Tier1 and Tier2
  - A large Tier2D is equivalent to a small Tier1 without tape

Data Formats
- The highly centralised LHCb data processing model allows data formats to be optimised for operational efficiency
- Large shortfalls in disk and tape storage (due to larger trigger rates and an expanded physics programme) drive efforts to reduce the data formats for physics (sizing example below):
  - DST used by most analyses in 2010 (~120 kB/event)
    - Contains a copy of RAW and the full reconstruction information
  - Strong drive towards the microDST (~13 kB/event)
    - Suitable for most exclusive analyses, but many iterations required to get the content correct
  - Transparent switching between several formats through generalised use of the analysis software framework
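A back-of-the-envelope illustration of why the event-size reduction matters; only the per-event figures come from the slide, while the 10^9-event sample and the single-replica assumption are arbitrary choices for illustration.

# Back-of-the-envelope sizing; sample size is an illustrative assumption.
EVENTS = 1_000_000_000                        # hypothetical stripped sample of 1e9 events
dst_tb  = EVENTS * 120e3 / 1e12               # DST at ~120 kB/event  -> ~120 TB per disk replica
udst_tb = EVENTS * 13e3 / 1e12                # microDST at ~13 kB/event -> ~13 TB per disk replica
print(f"DST: {dst_tb:.0f} TB, microDST: {udst_tb:.0f} TB, reduction ~x{dst_tb / udst_tb:.0f}")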

Data placement of DSTs
- Data-driven automatic replication
  - Systematically archive all analysis data (T1D0)
  - Real data: 4 disk replicas, 1 archive
  - MC: 3 disk replicas, 1 archive
- Selection of disk replication sites (see the sketch below):
  - Keep whole runs together (for real data)
    - Random choice per file for MC
  - Choose the storage element depending on free space
    - Random choice, weighted by the free space
    - Should avoid disk saturation
      - Exponential fall-off of free space
      - As long as there are enough non-full sites!
- Removal of replicas
  - For processing n-1: reduce to 2 disk replicas (chosen randomly)
    - Possibility to preferentially remove replicas from sites with less free space
  - For processing n-2: keep only archive replicas
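A minimal sketch of the free-space-weighted choice of storage elements described above (applied per run for real data, per file for MC); the SE names and free-space figures are made-up examples.

# Minimal sketch: pick disk SEs with probability proportional to their free space.
import random

def choose_disk_ses(free_space_tb: dict, n_replicas: int) -> list:
    """Pick n distinct SEs, each draw weighted by the current free space."""
    candidates = {se: space for se, space in free_space_tb.items() if space > 0}
    chosen = []
    for _ in range(min(n_replicas, len(candidates))):
        ses, weights = zip(*candidates.items())
        pick = random.choices(ses, weights=weights, k=1)[0]
        chosen.append(pick)
        del candidates[pick]          # never two disk replicas at the same site
    return chosen

# e.g. 4 disk replicas for real data (3 for MC), plus 1 tape archive in both cases
print(choose_disk_ses({"SE-A": 300, "SE-B": 150, "SE-C": 120, "SE-D": 90,
                       "SE-E": 80, "SE-F": 60, "SE-G": 40}, n_replicas=4))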

Data popularity
- Recording of information enabled as of May 2012
- Information recorded for each job:
  - Dataset (path)
  - Number of files for each job
  - Storage element used
- Currently allows unused datasets to be identified by visual inspection
- Plan (sketched below):
  - Establish, per dataset:
    - Last access date
    - Number of accesses in the last n months (1 < n < 12)
    - Normalise the number of dataset accesses to its size
  - Prepare summary tables per dataset
    - Access summary (above)
    - Storage usage at each site
  - Allow replica removal to be triggered when space is required
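A minimal sketch of the planned per-dataset popularity summary; the access-log layout (a list of (date, n_files) entries), the field names and the example dataset path are assumptions for illustration.

# Minimal sketch of a per-dataset popularity summary; record layout is assumed.
from datetime import date, timedelta

def popularity_summary(dataset: str, accesses: list, size_tb: float, months: int = 6) -> dict:
    """accesses: (access_date, n_files) pairs recorded for jobs reading `dataset`."""
    cutoff = date.today() - timedelta(days=30 * months)
    recent = sum(n for d, n in accesses if d >= cutoff)
    return {
        "dataset": dataset,
        "last_access": max((d for d, _ in accesses), default=None),
        "accesses_last_months": recent,
        "accesses_per_tb": recent / size_tb if size_tb else 0.0,  # normalised to dataset size
    }

print(popularity_summary("/LHCb/SomePath/SomeStream.DST",
                         [(date.today() - timedelta(days=200), 120),
                          (date.today() - timedelta(days=10), 40)],
                         size_tb=25.0))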

Conclusions
- The LHCb computing model has evolved to accommodate the expanding physics programme of the experiment within a constant budget for computing resources
- The model has evolved from the hierarchical model of the TDR to a model based on the capabilities of the different sites
- Further adaptations are planned. We do not foresee the need for any revolutionary changes to the model or to our frameworks (Gaudi, DIRAC) to accommodate the computing requirements of LHCb during Run 2

End
- "Backups are often required" (Rob Lambert, NIKHEF)

Examples of popularity plots (kFiles/day)
- (Plot: data still much used)
- (Plot: a single dataset used for 2012 data)
- (Plot: 3 datasets for 2011 data)