LHCb distributed computing during the LHC Runs 1, 2 and 3

Presentation transcript:

LHCb distributed computing during the LHC Runs 1, 2 and 3
Stefan Roiser, Chris Haen
On behalf of the LHCb Computing team
ISGC'15 - LHCb Distributed Computing Evolution

Content
- Evolution of the LHCb experiment's computing model and operations in the areas of:
  - Data Processing
  - Data Management
  - Supporting Services
- NB: All activities carried out by LHCb in distributed computing for data management and data processing are managed by LHCbDIRAC.
  - This talk is NOT about LHCbDIRAC; see talk 39, "Architecture of the LHCb Distributed Computing System", on Friday.

Preamble: LHC running conditions relevant for LHCb offline data processing
Values given as Run 1 (2011/12) → planned for Run 2 (> 2015):
- Max beam energy: 4 TeV → 6.5 TeV
- Transverse beam emittance: 1.8 μm (??) → 1.9 μm
- β* (beam oscillation): 0.6 m → 0.5 m
- Number of bunches: 1374 → 2508
- Max protons per bunch: 1.7 × 10^11 → 1.15 × 10^11
- Bunch spacing: 50 ns → 25 ns
- μ (avg # collisions / crossing): 1.6 → 1.2
- Max LHC luminosity (ATLAS & CMS): 7.7 × 10^33 cm^-2 s^-1 → 1.6 × 10^34 cm^-2 s^-1
- Max LHCb luminosity: 4 × 10^32 cm^-2 s^-1 (levelled, see NB)
NB: LHCb uses "luminosity levelling", i.e. the "in-time pile-up", and therefore the instantaneous luminosity, is kept constant.

Preamble 2: Data taking and filtering
- Before data arrives on "the grid" for processing, it runs through hardware/software filters ("High Level Trigger") reducing the rate of stored events from 40 MHz to the kHz level.
- During Run 1:
  - 4.5 kHz of stored events at ~ 60 kB / event
  - 800 k "RAW" files of 3 GB collected
- Changes for Run 2:
  - Output rate increases to 10 kHz; event size stays the same at ~ 60 kB
  - Data rate out of the pit grows from 300 to 750 MB/s
  - Results in roughly double the amount of data to be processed offline
  - New concept of "Turbo stream" (2.5 kHz): event reconstruction is done in the HLT, no further offline processing needed
- Ideas for Run 3: output rate increases by another factor 10 → more reconstruction in the HLT
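As a rough cross-check of these rates, the back-of-the-envelope arithmetic below reproduces the quoted 300 → 750 MB/s out of the pit. It is only a sketch and assumes the 2.5 kHz Turbo stream comes on top of the 10 kHz full stream.

# Back-of-the-envelope check of the HLT output bandwidth figures above.
# Assumption (not stated explicitly on the slide): in Run 2 the 2.5 kHz
# Turbo stream is counted on top of the 10 kHz full stream.

EVENT_SIZE_KB = 60.0  # ~ 60 kB / event, unchanged between Run 1 and Run 2

def bandwidth_mb_per_s(rate_khz: float, event_size_kb: float = EVENT_SIZE_KB) -> float:
    """Stored-event rate (kHz) times event size (kB) gives MB/s directly."""
    return rate_khz * event_size_kb  # kHz * kB = 10^3/s * 10^3 B = MB/s

run1 = bandwidth_mb_per_s(4.5)         # ~ 270 MB/s, the "300 MB/s" of Run 1
run2 = bandwidth_mb_per_s(10.0 + 2.5)  # = 750 MB/s quoted for Run 2
print(f"Run 1: ~{run1:.0f} MB/s, Run 2: ~{run2:.0f} MB/s")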

Data Processing

Offline Processing Workflow
[Workflow diagram: RAW (3 GB, 2x, Tape) → Reco (~24 h) → FULL.DST (5 GB, 1x, BUFFER → Tape) → Stripping (~6 h) → unmerged (M)DST (O(MB), 1x, BUFFER) → Merging (~30 min) → (M)DST (5 GB, 1x, Disk)]
- The RAW input file is available on Tape storage
- Reconstruction (Brunel) runs ~ 24 h; 1 input RAW file, 1 output FULL.DST to (disk) BUFFER
- Asynchronous migration of the FULL.DST from BUFFER to Tape
- Stripping (DaVinci) runs on 1 or 2 input files (~ 6 h / file); outputs several unmerged DST files (one per "stream") to BUFFER
- Input FULL.DST removed from BUFFER asynchronously
- The above workflow is repeated for the files of one run
- Once a stream reaches 5 GB of unmerged DSTs (up to O(100) files), Merging (DaVinci) runs ~ 15 – 30 min; outputs one merged DST file to Disk
- Input unmerged DST files removed from BUFFER asynchronously
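A minimal sketch of this production chain as a data structure (illustrative only: applications, durations and file types are taken from the slide above, the structure itself is an assumption):

from dataclasses import dataclass

@dataclass
class Step:
    application: str       # LHCb application running the step
    input_type: str        # file type consumed
    output_type: str       # file type produced
    output_storage: str    # where the output initially lands
    typical_duration: str  # wall-clock time per input file (from the slide)

# Offline processing chain as described above; purely illustrative.
OFFLINE_CHAIN = [
    Step("Brunel",  "RAW",             "FULL.DST",        "BUFFER (then Tape)", "~24 h"),
    Step("DaVinci", "FULL.DST",        "unmerged (M)DST", "BUFFER",             "~6 h / file"),
    Step("DaVinci", "unmerged (M)DST", "(M)DST",          "Disk",               "~15-30 min"),
]

for step in OFFLINE_CHAIN:
    print(f"{step.input_type} -> {step.application} -> {step.output_type} "
          f"({step.typical_duration}, to {step.output_storage})")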

Offline Data Reconstruction
- During Run 1, data processing ran only at Tier 1 sites
  - Towards the end of Run 1, processing with the help of T2 sites was introduced; e.g. in the 2012 "reprocessing" ~ 50 % of the CPU came from T2 sites
  - Still a "hard attachment" of each helper T2 to a given T1 storage
- For Run 2:
  - All processed data from a given run will stay at the same site: a stricter data placement than in Run 1, but more flexibility because the output location is defined up front
  - Any site (T0/1/2) will be able to help processing data from any other site's storage
- For certain workflows LHCb is thus moving away from the rigid model of Tier levels (in the WLCG model: Tier 0 at CERN, national Tier 1 centres, institute-level Tier 2 sites)

Workflow Execution Location
[Diagram: in Run 1 the full RAW → Reco → Stripping → Merge chain runs at the Tier 1 holding the data; in Run 2, Tier 2 sites can additionally run steps (e.g. Reco) remotely against a Tier 1's storage]
- During Run 1 the data processing workflow was executed by default at Tier 0/1 sites
- For Run 2, in addition we allow:
  - a Tier 2 site to participate in a certain job type remotely (most useful would be Reco)
  - any Tier 2 to participate at any time in any job type
  - in principle, the system also allows ANY site to participate in any job type remotely
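A toy illustration of this relaxation as a site-eligibility check (a sketch; the policy representation and names are assumptions, not LHCbDIRAC configuration):

# Toy eligibility rules for where a given job type may run; illustrative only.
RUN1_POLICY = {
    "Reco":      {"Tier0", "Tier1"},
    "Stripping": {"Tier0", "Tier1"},
    "Merge":     {"Tier0", "Tier1"},
}
# Run 2 opens every job type to Tier 2 sites as well.
RUN2_POLICY = {job: tiers | {"Tier2"} for job, tiers in RUN1_POLICY.items()}

def can_run(job_type: str, tier: str, policy: dict) -> bool:
    """True if a site of the given tier level may execute this job type."""
    return tier in policy.get(job_type, set())

print(can_run("Reco", "Tier2", RUN1_POLICY))  # False: Run 1 restricted to T0/T1
print(can_run("Reco", "Tier2", RUN2_POLICY))  # True: Run 2 lets T2 help with Reco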

Monte Carlo and User Workflows
- Monte Carlo simulation
  - Simulation jobs account for ~ 40 % of the work executed on distributed computing resources during a data-taking year; during shutdowns even more
  - Recently introduced "elastic" Monte Carlo production: the job knows the CPU time per event and adapts the number of events it generates to the length of the queue (see the sketch below)
- User jobs
  - Have the highest priority of all workflows
  - ~ 10 % of the total work
  - Can run on every Tier level; if they require input data they are sent to a site holding the data
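A minimal sketch of the "elastic" idea, assuming the job knows the remaining wall-clock time of its batch slot and a per-event CPU cost; the safety margin and overhead values are invented for illustration:

def events_to_generate(remaining_walltime_s: float,
                       cpu_per_event_s: float,
                       safety_margin: float = 0.8,
                       setup_overhead_s: float = 600.0) -> int:
    """Adapt the number of simulated events to the length of the queue slot.

    remaining_walltime_s: wall-clock seconds left in the batch slot
    cpu_per_event_s:      estimated CPU seconds per simulated event
    safety_margin:        headroom so the job finishes and uploads its output
    setup_overhead_s:     assumed application setup + upload time
    """
    usable = max(0.0, remaining_walltime_s * safety_margin - setup_overhead_s)
    return int(usable // cpu_per_event_s)

# Example: a 24 h slot at ~120 CPU s / event -> a few hundred events per job
print(events_to_generate(remaining_walltime_s=24 * 3600, cpu_per_event_s=120.0))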

Distributed Computing Resources (Virtualized)
- Vac: use of "non-managed" virtualized resources; only hypervisors are needed, the VMs manage themselves (boot, shutdown)
- Vcycle: LHCb's way of interacting with "managed" virtualized resources, e.g. via OpenStack
- BOINC: volunteer computing project (à la "SETI@home") to run short simulation jobs in a virtualized environment
- NB: usage of virtualized resources is likely to expand during Run 2

Distributed Computing Resources (Non-Virtualized)
- Grid resources: LHCb has been, and is committed to continue, using "classic" grid resources (batch systems, worker nodes)
- HLT farm (non-virtualized): extensive use especially during the shutdown phase, i.e. + 17 k job slots; reduced usage during data taking
- Non-pledged resources: several resources contribute to LHCb distributed computing, e.g. Yandex® (Russian search engine provider)

Data Management

Data Storage
- Tier 2Ds (D == Data)
  - Introduced during Run 1, allowing storage also at Tier 2 sites; several sites have joined so far
- Data popularity – reduction of replicas
  - The original computing model included a replica of every physics analysis file at every T1 storage; this was reduced to 4 replicas during Run 1
  - Further reduction is possible with the help of data popularity tools, i.e. recording how often a data set is analysed by physicists
  - See our Data Management poster (PO-02)
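A toy sketch of a popularity-driven replica policy (thresholds and names are invented for illustration; this is not the actual LHCb algorithm):

# Decide how many disk replicas a dataset should keep, based on how often it
# was accessed recently. Purely illustrative thresholds, not the LHCb policy.
def target_replicas(accesses_last_quarter: int,
                    min_replicas: int = 2,
                    max_replicas: int = 4) -> int:
    if accesses_last_quarter == 0:
        return min_replicas              # unpopular: keep only the minimum on disk
    if accesses_last_quarter < 50:
        return min(3, max_replicas)      # moderately used
    return max_replicas                  # hot dataset: keep it widely replicated

for accesses in (0, 10, 500):
    print(accesses, "accesses ->", target_replicas(accesses), "replicas")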

Data Operations
- LHCb uses two catalogue types:
  - Bookkeeping: provides "data provenance" information, i.e. the ancestors and descendants of the data produced
  - File Catalogue: provides information about "data replicas", i.e. on which storage elements the copies of a given file are stored
  - The file catalogue was recently migrated from the LCG File Catalogue (LFC) to the DIRAC File Catalogue (DFC), which provides better performance for Run 2
- Data access protocols
  - SRM, an abstraction layer for local protocols
  - LHCb recently migrated to direct xroot access for disk-resident data
  - HTTP/WebDAV, a similar concept to xroot, will be provided in the future; all LHCb storages are already equipped with HTTP/WebDAV access
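The two catalogue roles can be illustrated with a toy data model (a sketch: the dictionaries stand in for the Bookkeeping and the DFC, and the LFNs, storage element names and xroot prefixes are made up):

# Toy stand-ins for the two catalogues; all names and URLs are invented.
BOOKKEEPING = {  # provenance: which file was produced from which parent, by what
    "/lhcb/LHCb/Collision15/FULL.DST/00001234_0001_1.full.dst":
        {"parent": "/lhcb/data/2015/RAW/FULL/00001234_0001.raw",
         "application": "Brunel"},
}
FILE_CATALOG = {  # replicas: on which storage elements a given LFN is stored
    "/lhcb/LHCb/Collision15/FULL.DST/00001234_0001_1.full.dst":
        ["CERN-DST", "CNAF-DST"],  # SE names are illustrative
}
SE_XROOT_PREFIX = {  # how an SE name maps to an xroot endpoint (assumed mapping)
    "CERN-DST": "root://disk-se.example.org//lhcb",
}

def xroot_urls(lfn: str):
    """Turn catalogue replica entries into xroot URLs for direct access."""
    for se in FILE_CATALOG.get(lfn, []):
        prefix = SE_XROOT_PREFIX.get(se)
        if prefix:
            yield f"{prefix}{lfn}"

lfn = "/lhcb/LHCb/Collision15/FULL.DST/00001234_0001_1.full.dst"
print("produced by:", BOOKKEEPING[lfn]["application"])
print(list(xroot_urls(lfn)))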

Data Operations (continued)
- Gaudi federation
  - By default LHCb jobs are "sent to the data"; in case the "local copy" is not available, the Gaudi federation kicks in
  - Each job is equipped with a local catalogue of replica information for its input files; if the local copy is not accessible, it will try to access a remote copy (see the sketch below)
- Envisaged reduction of tape caches
  - Already for the last processing campaigns, the input data was staged from tape to disk buffer storage and processed from there
  - The disk cache in front of tape thus functions as a "pass-through" area, which allows this space to be reduced considerably
  - To be tested during Run 2
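A minimal sketch of that fallback logic (illustrative only: the replica list and the opening function are placeholders, not the Gaudi implementation):

# Per-job replica catalogue for one input file; LFN and URLs are placeholders.
REPLICAS = {
    "/lhcb/LHCb/Collision15/DST/00001234_0042_1.dst": [
        "root://local-se.example.org//lhcb/placeholder.dst",   # preferred: local copy
        "root://remote-se.example.org//lhcb/placeholder.dst",  # fallback: remote copy
    ],
}

def open_xroot(url: str):
    # Placeholder: a real job would open the file via the xroot client here.
    raise OSError(f"cannot open {url} (stub)")

def open_input(lfn: str):
    """Try the local replica first, then fall back to remote copies."""
    errors = []
    for url in REPLICAS.get(lfn, []):
        try:
            return open_xroot(url)
        except OSError as exc:
            errors.append((url, str(exc)))
    raise RuntimeError(f"no accessible replica for {lfn}: {errors}")

if __name__ == "__main__":
    try:
        open_input("/lhcb/LHCb/Collision15/DST/00001234_0042_1.dst")
    except RuntimeError as err:
        print(err)  # both placeholder opens fail, so the aggregated errors print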

Underlying Services

Data Management Services
- File Transfer Service (FTS3)
  - Used for all WAN transfers and replication of LHCb data
  - Also successfully deployed for tape interaction, e.g. pre-staging of tape-resident input data for "big campaigns"
  - Many more features are available which are not (yet) used:
    - Prioritisation of transfer types, e.g. CERN RAW export is more important than physics data replication
    - File deletion via FTS
    - Multi-hop transfers, e.g. for remote sites with no good direct connection
- HTTP Federation
  - Built on top of the HTTP/WebDAV access to the storages
  - Provides a "live" view of the data replica information by browsing the "logical namespace"
  - Possible future uses:
    - Consistency checks of storage against the replica catalogue (DFC) to find "dark data"
    - WebFTS on top of the federation, so physicists can easily transfer data themselves
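For illustration, a single replication transfer could be submitted through the FTS3 Python "easy" bindings roughly as below. This is a hedged sketch: the endpoint, source and destination URLs are placeholders, and the exact binding signatures may differ between FTS3 versions.

# Sketch of submitting one replication transfer via the FTS3 "easy" bindings.
import fts3.rest.client.easy as fts3

# FTS3 REST endpoint (placeholder hostname)
context = fts3.Context("https://fts3-service.example.org:8446")

transfer = fts3.new_transfer(
    "root://source-se.example.org//lhcb/data/00001234_0001.raw",       # existing replica
    "root://destination-se.example.org//lhcb/data/00001234_0001.raw",  # new replica
)
job = fts3.new_job([transfer], verify_checksum=True, retry=3)

job_id = fts3.submit(context, job)
print("submitted FTS3 job", job_id)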

More External Services
- CVMFS
  - Application software distribution was previously done via "special jobs" installing the necessary software on shared file systems at the different sites
  - Now a centralised installation, which propagates via CVMFS
  - Also includes distribution of the detector conditions database
- Monitoring
  - In addition to the DIRAC monitoring and accounting, several external services are available from WLCG:
    - SAM3 – probing worker nodes, storage and services; the WLCG availability/reliability reports are generated out of this information
    - Dashboards – display of additional information, e.g. the status of CVMFS at the different sites
    - perfSONAR – network throughput and traceroute monitoring
  - Currently developing a new LHCbDIRAC monitoring infrastructure based on Elasticsearch
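A crude worker-node probe in the spirit of the CVMFS checks mentioned above could look like the sketch below. The repository mount points are an assumption here, not taken from the slide.

import os

# CVMFS repositories an LHCb worker node is assumed to mount (software and
# conditions); treat these paths as an assumption, not a documented list.
REPOSITORIES = ["/cvmfs/lhcb.cern.ch", "/cvmfs/lhcb-condb.cern.ch"]

def cvmfs_available() -> bool:
    """Return True if the assumed CVMFS mounts are visible and non-empty."""
    return all(os.path.isdir(repo) and os.listdir(repo) for repo in REPOSITORIES)

if __name__ == "__main__":
    print("CVMFS OK" if cvmfs_available() else "CVMFS missing or empty")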

Summary
- The computing model evolved from a previously rigid to a more flexible system, which allowed:
  - Relaxation of the model of strict Tier levels
  - Reduction of the disk-resident replicas of analysis data
  - Flexible adaptation to multiple resource types
- All this was done with a small core computing team and interaction with several external projects:
  - WAN transfers and tape interaction via FTS
  - Software installation and distribution via CVMFS
  - Additional WLCG monitoring infrastructure