LHCb distributed computing during the LHC Runs 1, 2 and 3
Stefan Roiser, Chris Haen
On behalf of the LHCb Computing team
ISGC'15 - LHCb Distributed Computing Evolution
Content
Evolution of the LHCb experiment’s computing model and operation in the areas of
- Data Processing
- Data Management
- Supporting Services
NB: All activities carried out by LHCb in distributed computing for data management and data processing are managed by LHCbDIRAC. This talk is NOT about LHCbDIRAC; see talk 39, “Architecture of the LHCb Distributed Computing System”, on Friday.
Preamble: LHC running conditions relevant for LHCb offline data processing
Parameter | Run 1 (2011/12) | Planned for Run 2 (> 2015)
Max beam energy | 4 TeV | 6.5 TeV
Transverse beam emittance | 1.8 μm (??) | 1.9 μm
β* (betatron function at the interaction point) | 0.6 m | 0.5 m
Number of bunches | 1374 | 2508
Max protons per bunch | 1.7 × 10^11 | 1.15 × 10^11
Bunch spacing | 50 ns | 25 ns
μ (avg. # collisions per bunch crossing) | 1.6 | 1.2
Max LHC luminosity | 7.7 × 10^33 cm^-2 s^-1 | 1.6 × 10^34 cm^-2 s^-1
Max LHCb luminosity | 4 × 10^32 cm^-2 s^-1 |
[Figure with panels “ATLAS & CMS” and “LHCb”]
NB: LHCb uses “luminosity leveling”, i.e. the “in-time pile-up”, and therefore the instantaneous luminosity, is kept constant.
Preamble 2: Data taking and filtering
Before data arrives on “the grid” for processing, it runs through hardware and software filters (the software part being the “High Level Trigger”, HLT), which reduce the rate of stored events from 40 MHz to the kHz level.
During Run 1
- 4.5 kHz of stored events at ~60 kB/event
- 800 k “RAW” files of 3 GB collected
Changes for Run 2
- Output rate increases to 10 kHz; the event size stays the same at ~60 kB
- Data rate out of the pit grows from 300 to 750 MB/s
- Results in roughly double the amount of data to be processed offline
- New concept of the “Turbo Stream” (2.5 kHz): events are reconstructed in the HLT and need no further offline processing
Ideas for Run 3
- Output rate increase by a factor 10, requiring more reconstruction in the HLT
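The bandwidth figures follow from event rate × event size (kHz × kB = MB/s). The short check below assumes, as an interpretation not stated on the slide, that the 750 MB/s Run 2 figure covers the 10 kHz full stream plus the 2.5 kHz Turbo stream, both at ~60 kB/event.

```python
# Back-of-the-envelope check of the trigger output bandwidth.
# ASSUMPTION: Run 2 total = 10 kHz full stream + 2.5 kHz Turbo stream,
# both at ~60 kB/event (the slide does not split the 750 MB/s figure).

EVENT_SIZE_KB = 60.0


def bandwidth_mb_s(rate_khz: float, event_size_kb: float = EVENT_SIZE_KB) -> float:
    """Output bandwidth in MB/s: (10^3 events/s) x (kB/event) = MB/s."""
    return rate_khz * event_size_kb


run1 = bandwidth_mb_s(4.5)          # Run 1 stored event rate
run2 = bandwidth_mb_s(10.0 + 2.5)   # Run 2 full + Turbo stream (assumed)
offline_ratio = 10.0 / 4.5          # growth of events processed offline
run1_raw_pb = 800e3 * 3.0 / 1e6     # 800k RAW files x 3 GB, in PB

print(f"Run 1: {run1:.0f} MB/s, Run 2: {run2:.0f} MB/s")
print(f"offline x{offline_ratio:.1f}, Run 1 RAW ~{run1_raw_pb:.1f} PB")
```

Run 1 comes out at 270 MB/s, consistent with the ~300 MB/s quoted, and the 10/4.5 ≈ 2.2 ratio is the “roughly double” offline load.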
Data Processing
Offline Processing Workflow
[Figure: RAW (3 GB, 2x, Tape) → Reco (24 h) → FULL.DST (5 GB, 1x, BUFFER, then Tape) → Stripping (6 h) → unmerged (M)DST (O(MB), 1x, BUFFER) → Merging (30 min) → (M)DST (5 GB, 1x, Disk); legend: Application, File Type, Storage Element]
1. The RAW input file is available on Tape storage.
2. Reconstruction (Brunel) runs ~24 h on 1 input RAW file and writes 1 output FULL.DST to the (disk) BUFFER.
3. The FULL.DST is migrated asynchronously from BUFFER to Tape.
4. Stripping (DaVinci) runs on 1 or 2 input files (~6 h/file) and writes several unmerged DST files (one per “stream”) to BUFFER.
5. The input FULL.DST is removed from BUFFER asynchronously.
6. Steps 1 to 5 are rerun for all files of a run.
7. Once a stream reaches 5 GB of unmerged DSTs (up to O(100) files), Merging (DaVinci) runs for ~15 to 30 min and writes one merged DST file to Disk.
8. The input DST files are removed from BUFFER asynchronously.
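The merge condition described above (merge once a stream holds 5 GB of unmerged DSTs) can be sketched as a per-stream accumulator. This is a hypothetical illustration, not LHCbDIRAC code: the class, the stream name “STREAM_X” and the file names are invented; only the 5 GB threshold comes from the slide.

```python
from collections import defaultdict
from dataclasses import dataclass, field

MERGE_THRESHOLD_BYTES = 5 * 10**9   # 5 GB of unmerged DSTs per stream


@dataclass
class StreamMerger:
    """Hypothetical accumulator: collects unmerged DSTs per stripping
    stream and releases a merge task once a stream reaches the threshold."""
    threshold: int = MERGE_THRESHOLD_BYTES
    pending: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, stream: str, lfn: str, size: int):
        """Register one unmerged DST; return the LFNs to merge, or None."""
        files = self.pending[stream]
        files.append((lfn, size))
        if sum(s for _, s in files) >= self.threshold:
            self.pending[stream] = []   # start collecting the next batch
            return [name for name, _ in files]
        return None


merger = StreamMerger()
batches = []
for i in range(60):                     # 60 unmerged files of 100 MB each
    batch = merger.add("STREAM_X", f"/buffer/STREAM_X/{i}.dst", 100 * 10**6)
    if batch:
        batches.append(batch)
print(len(batches), len(batches[0]))   # one merge task over 50 files
```

End-of-run handling (a stream that never reaches 5 GB) is not covered by this sketch.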
Offline Data Reconstruction
During Run 1
- Data processing only at Tier1 sites
- End of Run 1: processing with the help of Tier2 sites was introduced, e.g. the 2012 “reprocessing” took ~50 % of its CPU from Tier2 sites, but still with a “hard attachment” between Tier2 and Tier1 sites
For Run 2
- All processed data from a given run will stay at the same site: stricter data placement than in Run 1, but more flexibility because the output location is defined
- Any site (T0/1/2) will be able to help process data from any other storage
- For certain workflows LHCb is moving away from the rigid model of Tier levels
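One simple way to guarantee that all processed data from a given run stays at the same site is to make the site choice a pure function of the run number. The sketch below uses weighted hashing, a technique chosen here for illustration; the site names and shares are hypothetical, and this is not necessarily how LHCbDIRAC assigns runs.

```python
import hashlib

# Hypothetical Tier1 shares (fraction of runs each site receives);
# the real LHCb shares are not given on the slide.
T1_SHARES = {"SITE-A": 0.40, "SITE-B": 0.35, "SITE-C": 0.25}


def site_for_run(run_number: int, shares: dict = T1_SHARES) -> str:
    """Deterministically map a run to one site, proportionally to shares.

    The result depends only on the run number, so every RAW file,
    FULL.DST and stripped DST of a run resolves to the same site."""
    digest = hashlib.sha256(str(run_number).encode()).digest()
    x = int.from_bytes(digest[:8], "big") / 2**64   # uniform in [0, 1)
    cumulative = 0.0
    for site, share in sorted(shares.items()):
        cumulative += share
        if x < cumulative:
            return site
    return sorted(shares)[-1]   # guard against float rounding


print(site_for_run(150000))
```

A lookup table filled at first sight of a run would give the same “sticky” behaviour; hashing just avoids storing state.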
Workflow Execution Location
[Figure: processing chains at Tier1 A and Tier1 B (RAW → Reco → FULL.DST → Stripping → unmerged DST → Merge → DST) and two Tier2 sites running RAW → Reco → FULL.DST]
- During Run 1 the data processing workflow was executed by default at Tier0/1 sites
- For Run 2 we additionally allow
  - a Tier2 site to participate remotely for a certain job type (most useful would be Reco)
  - any Tier2 to participate at any time on any job type
- In principle the system also allows ANY site to participate remotely on any job type
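The execution rules above can be summarized as a small decision function. In this sketch the policy names, site names and the `t2_helpers` configuration are hypothetical; it only encodes the options listed on the slide, with tiers given as 0, 1 or 2.

```python
from enum import Enum


class Policy(Enum):
    RUN1 = 1             # Tier0/1 only, at the site holding the data
    T2_FOR_JOB_TYPE = 2  # plus selected Tier2s for selected job types
    ANY_T2 = 3           # plus any Tier2 for any job type
    ANY_SITE = 4         # any site, any job type, remote data access


def may_run(policy: Policy, job_type: str, site: str, site_tier: int,
            data_site: str, t2_helpers: dict[str, set[str]]) -> bool:
    """Can `site` run a `job_type` job whose input data is stored at
    `data_site`? `t2_helpers` maps a job type to the Tier2 sites allowed
    to help with it (hypothetical configuration)."""
    if site == data_site and site_tier <= 1:
        return True                      # default: run where the data is
    if policy is Policy.ANY_SITE:
        return True
    if site_tier == 2 and policy is Policy.ANY_T2:
        return True
    if site_tier == 2 and policy is Policy.T2_FOR_JOB_TYPE:
        return site in t2_helpers.get(job_type, set())
    return False


helpers = {"Reco": {"T2-X"}}
print(may_run(Policy.T2_FOR_JOB_TYPE, "Reco", "T2-X", 2, "T1-B", helpers))       # True
print(may_run(Policy.T2_FOR_JOB_TYPE, "Stripping", "T2-X", 2, "T1-B", helpers))  # False
```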
Monte Carlo and User Workflows
Monte Carlo simulation
- Simulation jobs account for ~40 % of the work executed on distributed computing resources during a data-taking year, and even more during shutdowns
- Recently introduced “elastic” Monte Carlo: the job knows its CPU time per event and can adapt the number of simulated events to the length of the queue
User jobs
- Highest priority of all workflows
- ~10 % of the total work
- Can run at every tier level
- If they require input data, they are sent to the site holding that data
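Elastic Monte Carlo sizes the workload of a job to the time left in its slot, using the known CPU time per event. A minimal sketch, assuming a reference-normalized CPU time per event, a hypothetical `cpu_power` factor for the worker node and a hypothetical 75 % safety margin; the real LHCbDIRAC logic may differ.

```python
import math


def events_for_slot(time_left_s: float, cpu_per_event_s: float,
                    cpu_power: float = 1.0, safety: float = 0.75) -> int:
    """Number of events an elastic MC job can simulate in its slot.

    cpu_per_event_s: CPU time per event on a reference CPU
    cpu_power:       speed of this worker node relative to the reference
                     (hypothetical normalization)
    safety:          fraction of the remaining time actually used, leaving
                     room for initialization and output upload (assumed)
    """
    if cpu_per_event_s <= 0:
        raise ValueError("cpu_per_event_s must be positive")
    usable_s = time_left_s * safety
    return max(0, math.floor(usable_s * cpu_power / cpu_per_event_s))


print(events_for_slot(36000, 60.0))   # 10 h left, 60 s/event
print(events_for_slot(600, 60.0))     # nearly exhausted slot
```

A short queue simply yields a smaller job rather than a job that gets killed before finishing.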
Distributed Computing Resources (Virtualized)
- Vac: use of “non-managed” virtualized resources; only hypervisors are needed, the VMs manage themselves (boot, shutdown)
- Vcycle: LHCb’s way of interacting with “managed” virtualized resources, e.g. via OpenStack
- BOINC: volunteer computing project (à la “SETI@home”) to run short simulation jobs in a virtualized environment
NB: Usage of virtualized resources is likely to expand during Run 2
Distributed Computing Resources (Non-Virtualized)
- Grid resources: LHCb has been, and remains, committed to using “classic” grid resources (batch systems, worker nodes)
- HLT farm (non-virtualized): extensive use especially during shutdown phases (+17k job slots), reduced usage during data taking
- Non-pledged resources: several resources contribute to LHCb distributed computing, e.g. Yandex® (Russian search engine provider)
Data Management
Data Storage
- Tier2Ds (D == Data): introduced during Run 1, allowing data storage also at Tier2 sites; several sites are participating so far
- Data popularity, reduction of replicas
  - The original computing model included a replica of every physics analysis file on every Tier1 storage; this was reduced to 4 replicas during Run 1
  - Further reduction is possible with the help of data popularity tools, i.e. recording how often a data set is analyzed by physicists
  - See our Data Management poster (PO-02)
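A popularity-driven replica reduction can be expressed as a mapping from recent access counts to a target number of disk replicas, capped at the 4 replicas of Run 1. The thresholds below are invented for illustration and are not LHCb’s actual policy.

```python
# Hypothetical policy: (minimum accesses in the monitoring window, replicas)
POLICY = [(50, 4), (5, 3), (1, 2)]
MIN_REPLICAS = 1   # assumed floor: never drop the last disk copy


def target_replicas(accesses_in_window: int) -> int:
    """Target number of disk replicas for a data set, from its popularity."""
    for min_accesses, n_replicas in POLICY:
        if accesses_in_window >= min_accesses:
            return n_replicas
    return MIN_REPLICAS


def replicas_to_remove(datasets: dict[str, tuple[int, int]]) -> dict[str, int]:
    """datasets: name -> (current replicas, accesses in window).
    Only reductions are proposed; nothing is re-replicated here."""
    return {name: max(0, current - target_replicas(accesses))
            for name, (current, accesses) in datasets.items()}


print(replicas_to_remove({"popular": (4, 120), "unused": (4, 0)}))
```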
Data Operations
LHCb uses two catalog types
- Bookkeeping: provides “data provenance” information, i.e. the ancestors and descendants of the data produced
- File Catalog: provides information about “data replicas”, i.e. on which storage elements the copies of a given file are stored
- The File Catalog was recently migrated from the “LCG File Catalog (LFC)” to the “DIRAC File Catalog (DFC)”, which provides better performance for Run 2
Data access protocols
- SRM, an abstraction layer over local protocols
- LHCb recently migrated to direct xroot access for disk-resident data
- http/webdav, a concept similar to xroot, will be provided in the future; all LHCb storages are already equipped with http/webdav access
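The two catalog roles can be illustrated with toy in-memory data: provenance as a parent graph walked in both directions, and replicas as a mapping from logical file name (LFN) to storage elements. All LFNs and storage element names are hypothetical, and this is not the Bookkeeping or DFC API.

```python
from collections import defaultdict

# Bookkeeping role: child LFN -> LFNs it was produced from (toy data)
PARENTS = {
    "/toy/run1.FULL.DST": ["/toy/run1.RAW"],
    "/toy/run1.stream.DST": ["/toy/run1.FULL.DST"],
    "/toy/merged.DST": ["/toy/run1.stream.DST"],
}

# Inverse graph, for descendant queries
CHILDREN = defaultdict(list)
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].append(child)

# File Catalog role: LFN -> storage elements holding a replica (toy data)
REPLICAS = {"/toy/merged.DST": {"SE-A-DISK", "SE-B-DISK"}}


def _walk(lfn, graph):
    """All nodes reachable from `lfn` in `graph` (excluding `lfn`)."""
    seen, stack = set(), list(graph.get(lfn, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def ancestors(lfn):
    return _walk(lfn, PARENTS)


def descendants(lfn):
    return _walk(lfn, CHILDREN)


def replicas_of(lfn):
    return sorted(REPLICAS.get(lfn, set()))


print(sorted(ancestors("/toy/merged.DST")))
print(replicas_of("/toy/merged.DST"))
```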
Data Operations (ctd)
Gaudi Federation
- By default LHCb jobs are “sent to the data”; if the “local copy” is not available, the Gaudi federation kicks in
- Each job is equipped with a local catalog of replica information for its input files; if the local copy is not available, it tries to access a remote copy
Envisaged reduction of tape caches
- Already for the last processing campaigns, the input data was staged from tape to disk buffer storage and processed from there
- The disk cache in front of tape then functions only as a “pass through” area, which allows this space to be reduced considerably
- To be tested during Run 2
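The federation fallback can be sketched as: try the local replica first, then the remote replicas listed in the job’s own replica catalog. Here `try_open` is a hypothetical stand-in for the real data access (e.g. via xroot) and the catalog is a plain dict; this is not the Gaudi implementation.

```python
from typing import Callable, Optional


def open_input(lfn: str, local_se: str, replicas: dict[str, list[str]],
               try_open: Callable[[str, str], bool]) -> Optional[str]:
    """Return the storage element `lfn` was opened from, or None.

    `replicas` stands in for the job's local replica catalog, filled
    before the job starts; `try_open(se, lfn)` reports success."""
    candidates = replicas.get(lfn, [])
    # The job was sent to the data, so the local copy is tried first...
    ordered = [se for se in candidates if se == local_se]
    # ...and remote copies (the federation fallback) afterwards.
    ordered += [se for se in candidates if se != local_se]
    for se in ordered:
        if try_open(se, lfn):
            return se
    return None


def local_down(se: str, lfn: str) -> bool:
    """Simulate a broken local copy: every other replica opens fine."""
    return se != "SE-LOCAL"


catalog = {"/toy/input.DST": ["SE-REMOTE", "SE-LOCAL"]}
print(open_input("/toy/input.DST", "SE-LOCAL", catalog, local_down))
```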
Underlying Services
Data Management Services
File Transfer Service (FTS3)
- Used for all WAN transfers and replication of LHCb data
- Successfully deployed also for tape interaction, e.g. pre-staging of tape-resident input data for “big campaigns”
- Many more features are available which are not (yet) used
  - Prioritization of transfer types, e.g. CERN RAW export is more important than physics data replication
  - File deletion via FTS
  - Multi-hop transfers, e.g. for remote sites without a good direct connection
HTTP Federation
- Builds on top of the HTTP/WebDAV access to the storages
- Provides a “live” view of the data replica information by browsing the “logical namespace”
- Possible future uses
  - Consistency checks of storage against the replica catalog (DFC) to find “dark data”
  - WebFTS on top of the federation, allowing physicists to transfer data easily
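At its core, a storage-versus-catalog consistency check is a pair of set differences. In this sketch both inputs are toy sets; obtaining the storage listing by browsing the HTTP federation namespace and the LFNs from the DFC is assumed, not shown.

```python
def consistency_check(storage_listing: set[str], catalog_lfns: set[str]):
    """Compare the files found on a storage element with the replicas the
    catalog records for it. Returns (dark_data, lost_files), sorted."""
    dark_data = storage_listing - catalog_lfns    # on storage, not in catalog
    lost_files = catalog_lfns - storage_listing   # in catalog, not on storage
    return sorted(dark_data), sorted(lost_files)


# Toy inputs (hypothetical paths)
on_storage = {"/lhcb/a.dst", "/lhcb/b.dst", "/lhcb/tmp/orphan.dst"}
in_catalog = {"/lhcb/a.dst", "/lhcb/b.dst", "/lhcb/c.dst"}
dark, lost = consistency_check(on_storage, in_catalog)
print("dark:", dark, "lost:", lost)
```

Dark data wastes space and can be deleted; lost files need their catalog entries fixed or the file re-replicated.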
More External Services
CVMFS
- Application software distribution was previously done via “special jobs” installing the necessary software on shared file systems at the different sites
- Now a centralized installation, which propagates to the sites via CVMFS
- Also includes the distribution of the detector conditions database
Monitoring
- In addition to the DIRAC Monitoring and Accounting, several external services are available from WLCG
  - SAM3: probing worker nodes, storage and services; the WLCG Availability/Reliability reports are generated from this information
  - Dashboards: display of additional information, e.g. status of CVMFS at different sites
  - perfSONAR: network throughput and traceroute monitoring
- A new LHCbDIRAC monitoring infrastructure based on Elasticsearch is currently being developed
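The Availability/Reliability reports follow the usual WLCG-style definitions: availability is the fraction of the period a site was up, and reliability is the same fraction with scheduled downtime excluded from the period. A minimal sketch with invented numbers; the actual SAM3 computation (status aggregation over services, handling of unknown states) is not reproduced.

```python
def availability_reliability(total_h: float, scheduled_down_h: float,
                             unscheduled_down_h: float) -> tuple[float, float]:
    """Availability = up / total; Reliability = up / (total - scheduled)."""
    if scheduled_down_h >= total_h:
        raise ValueError("scheduled downtime must be shorter than the period")
    up_h = total_h - scheduled_down_h - unscheduled_down_h
    return up_h / total_h, up_h / (total_h - scheduled_down_h)


# One 30-day month (720 h) with 24 h scheduled and 12 h unscheduled downtime
a, r = availability_reliability(720, 24, 12)
print(f"availability {a:.1%}, reliability {r:.1%}")
```

Announcing an intervention in advance therefore lowers availability but not reliability.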
Summary
The computing model evolved from a previously rigid to a more flexible system, which allowed
- Relaxation of the model of strict Tier levels
- Reduction of disk-resident replicas of analysis data
- Flexible adaptation to multiple resource types
All this was done with a small core computing team and interaction with several external projects
- WAN transfer and tape interaction via FTS
- Software installation and distribution via CVMFS
- Additional WLCG monitoring infrastructure