Status of LHCb-INFN Computing
CSN1, Catania, September 18, 2002
Domenico Galli, Bologna
LHCb Computing Constraints
Urgent need to produce and analyse a large number of MC data sets in a short time, driven by the LHCb-light detector design and by the trigger design (TDRs).
The hardware and software configuration must be optimized to minimize dead time and system administration effort.
LHCb Farm Architecture (I)
Article in press in Computer Physics Communications: "A Beowulf-class computing cluster for the Monte Carlo production of the LHCb experiment".
Disk-less computing nodes, with the operating system centralized on a file server (Operating System Server).
Very flexible configuration: nodes can be added to and removed from the system without any local installation, which is useful for computing resources shared among different experiments.
Extremely stable system: no side effects at all in more than one year of operation.
System administration duties minimized.
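As a sketch of how such a disk-less layout can be managed from a single point, the snippet below generates per-node NFS export and network-boot entries from one node list; all host names, IP addresses, paths and stanza contents are hypothetical illustrations, not the actual CNAF configuration.

```python
# Hypothetical sketch: publish one network root file system per disk-less node
# from the Operating System Server. Host names, IPs, paths and stanza contents
# are illustrative only, not the real CNAF farm configuration.

NODES = [f"lhcb-node{i:03d}" for i in range(1, 5)]   # example: 4 computing nodes
SUBNET = "192.168.10"                                # private (RFC 1918) addressing

def nfs_exports(nodes):
    """One NFS export of a node-specific root tree per computing node."""
    return [f"/diskless/{n}  {SUBNET}.0/24(rw,no_root_squash,sync)" for n in nodes]

def boot_entries(nodes, first_host=10):
    """Per-node network-boot stanzas (fixed IP plus the root path to mount)."""
    return [
        f'host {n} {{ fixed-address {SUBNET}.{first_host + i}; '
        f'option root-path "/diskless/{n}"; }}'
        for i, n in enumerate(nodes)
    ]

if __name__ == "__main__":
    print("\n".join(nfs_exports(NODES)))
    print("\n".join(boot_entries(NODES)))
```

Adding or removing a node then amounts to editing the node list and regenerating these entries, which is the property emphasized above: no local installation on the nodes.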
LHCb Farm Architecture (II)
Security: usage of private IP addresses and Virtual LANs gives a high level of isolation from the Internet. External accesses (AFS servers, bookkeeping database, CASTOR library at CERN) go through Network Address Translation on a Gateway node.
Potential "single points of failure" of the system are equipped with redundant disk configurations: RAID-5 on the two NAS units, RAID-1 on the Gateway and on the Operating System Server.
LHCb Farm Architecture (III)
Farm hardware (photographs): Fast Ethernet switch; 1 TB NAS; Ethernet-controlled power distributor (32 channels); rack of 1U dual-processor motherboards.
Data Storage
Files containing reconstructed events (OODST-ROOT format) are transferred to CERN using bbftp and automatically stored on the CASTOR tape library.
Data transfer from CNAF to CERN is performed with a maximum throughput of 70 Mb/s (on a 100 Mb/s link), to be compared with ~15 Mb/s using ftp.
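To give a rough feel for what the bbftp gain means in practice, the sketch below compares the time needed to ship a data set at 70 Mb/s versus 15 Mb/s; the 100 GB sample size is an arbitrary illustrative figure, not a number taken from the production.

```python
# Back-of-the-envelope comparison of WAN transfer times at the quoted
# bbftp (70 Mb/s) and plain ftp (~15 Mb/s) throughputs.
# The 100 GB data-set size is an arbitrary example, not a production figure.

def transfer_time_hours(dataset_gb: float, throughput_mbit_s: float) -> float:
    """Time to move dataset_gb gigabytes at throughput_mbit_s megabit/s."""
    bits = dataset_gb * 8e9              # GB -> bits (1 GB = 8e9 bits here)
    seconds = bits / (throughput_mbit_s * 1e6)
    return seconds / 3600.0

if __name__ == "__main__":
    sample_gb = 100.0
    for label, rate in [("bbftp", 70.0), ("ftp", 15.0)]:
        print(f"{label:>6}: {transfer_time_hours(sample_gb, rate):5.1f} h "
              f"for {sample_gb:.0f} GB at {rate:.0f} Mb/s")
```

At these rates a 100 GB sample moves in roughly 3 hours with bbftp against roughly 15 hours with plain ftp.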
2002 Monte Carlo Production Target
Production of large event statistics for the design of the LHCb-light detector and of the trigger system (trigger TDR).
Software: the simulation (FORTRAN) and reconstruction (C++) code to be used in the production was supplied in July.
LHCb Data Challenge ongoing (August-September).
Participating computing centres: CERN, INFN-CNAF, Liverpool, IN2P3-Lyon, NIKHEF, RAL, Bristol, Cambridge, Oxford, ScotGrid (Glasgow & Edinburgh).
Status of Summer LHCb-Italy Monte Carlo Production (Data Challenge)
Events produced in Bologna (Aug. 1 - Sep. 12): 1,053,500 in total.
  Bd0 -> pi+ pi-: 79,000
  Bd0 -> D*-(D0_bar(K+ pi-) pi-) pi+: 19,000
  Bd0 -> K+ pi-: 55,500
  Bs0 -> K- pi+: 8,000
  Bs0 -> K+ K-: 8,000
  Bs0 -> J/psi(mu+ mu-) eta(gamma gamma): 8,000
  Bd0 -> phi(K+ K-) Ks0(pi+ pi-): 8,000
  Bs0 -> mu+ mu-: 8,000
  Bd0 -> D+(K- pi+ pi+) D-(K+ pi- pi-): 8,000
  Bs0 -> Ds-(K+ K- pi-) K+: 8,000
  Bs0 -> J/psi(mu+ mu-) phi(K+ K-): 8,000
  Bs0 -> J/psi(e+ e-) phi(K+ K-): 8,000
  Minimum bias: 47,500
  c c_bar -> inclusive (at least one c hadron in 400 mrad): 275,500
  b b_bar -> inclusive (at least one b hadron in 400 mrad): 505,000
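As a quick arithmetic check that the per-channel counts transcribed above are consistent with the quoted total of 1,053,500 events:

```python
# Check that the per-channel event counts listed above sum to the quoted total.
channels = {
    "Bd0 -> pi+ pi-": 79_000,
    "Bd0 -> D*-(D0_bar(K+ pi-) pi-) pi+": 19_000,
    "Bd0 -> K+ pi-": 55_500,
    "Bs0 -> K- pi+": 8_000,
    "Bs0 -> K+ K-": 8_000,
    "Bs0 -> J/psi(mu+ mu-) eta(gamma gamma)": 8_000,
    "Bd0 -> phi(K+ K-) Ks0(pi+ pi-)": 8_000,
    "Bs0 -> mu+ mu-": 8_000,
    "Bd0 -> D+(K- pi+ pi+) D-(K+ pi- pi-)": 8_000,
    "Bs0 -> Ds-(K+ K- pi-) K+": 8_000,
    "Bs0 -> J/psi(mu+ mu-) phi(K+ K-)": 8_000,
    "Bs0 -> J/psi(e+ e-) phi(K+ K-)": 8_000,
    "Minimum bias": 47_500,
    "c c_bar -> inclusive": 275_500,
    "b b_bar -> inclusive": 505_000,
}
assert sum(channels.values()) == 1_053_500   # matches the quoted total
print(sum(channels.values()))
```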
Distribution of Produced Events Among Production Centers (August 1 - September 12)
The other centres mentioned above are running late with respect to the Data Challenge start date.
Usage of the CNAF Tier-1 Computing Resources
Computing, control and service nodes: 130 PIII CPUs (clock from 866 MHz to 1.4 GHz).
Disk storage servers: 1 TB NAS (14 x 80 GB IDE disks + hot spare, RAID-5); 1 TB NAS (7 x 170 GB SCSI disks + hot spare, RAID-5).
All of this equipment is working at a very high duty cycle (see CPU load plot).
Plan for Analysis Activities
The analysis of the data produced during the Data Challenge is foreseen for the autumn.
The development environment of the analysis code (DaVinci, C++) has already been fully ported to Bologna and has been in use on a mini-farm for the past two months.
The analysis mini-farm needs to be extended to a greater number of nodes to cover the needs of the Italian LHCb collaboration.
Data produced in Bologna are kept on the Bologna disks; data produced in the other centres need to be transferred to Bologna on user demand with an automatic procedure.
Analysis jobs (on ~100 CPUs) need an I/O throughput (~100 MB/s) greater than a NAS can supply (~10 MB/s); a rough sizing of the required I/O layer is sketched below.
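A minimal sizing sketch, assuming each analysis job needs roughly 1 MB/s of input (so ~100 CPUs need ~100 MB/s in aggregate) and that a single 100Base-T NAS delivers ~10 MB/s; the per-job rate is an assumption made for illustration, the per-server rate comes from the text.

```python
# Rough sizing of the analysis I/O layer: how many I/O servers are needed to
# feed ~100 concurrent analysis jobs. The per-job input rate (1 MB/s) is an
# illustrative assumption; the 10 MB/s per 100Base-T server is quoted above.
import math

def servers_needed(n_jobs: int, mb_s_per_job: float, mb_s_per_server: float) -> int:
    """Number of I/O servers needed to sustain the aggregate read rate."""
    aggregate = n_jobs * mb_s_per_job
    return math.ceil(aggregate / mb_s_per_server)

if __name__ == "__main__":
    print(servers_needed(n_jobs=100, mb_s_per_job=1.0, mb_s_per_server=10.0))  # -> 10
```

With these assumptions about ten I/O servers are required, which is consistent with the ten-ION PVFS configuration described on the next slides.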
High Performance I/O System (I)
An I/O parallelization system, based on a parallel file system, was successfully tested: PVFS (Parallel Virtual File System).
Files are striped across the local disks of several I/O servers (IONs).
Scalable system: aggregate throughput ~100 Mbit/s x n_ION.
(diagram: clients CN 1 ... CN m, I/O nodes ION 1 ... ION n and a management node MGR, connected through the network)
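To make the striping idea concrete, here is a toy sketch of round-robin file striping across several I/O servers, in the spirit of PVFS; it is not the PVFS implementation or its API, and the stripe size and server count are arbitrary example values.

```python
# Toy illustration of round-robin file striping across several I/O servers,
# in the spirit of PVFS. This is NOT the PVFS code or API; stripe size and
# server count are arbitrary example values.

STRIPE_SIZE = 64 * 1024      # 64 kB stripe unit (example value)

def stripe_map(file_size: int, n_servers: int, stripe_size: int = STRIPE_SIZE):
    """Return, for each stripe of the file, the server that stores it."""
    n_stripes = (file_size + stripe_size - 1) // stripe_size
    return [(s, s % n_servers) for s in range(n_stripes)]

def bytes_per_server(file_size: int, n_servers: int, stripe_size: int = STRIPE_SIZE):
    """How many bytes of one file end up on each server (shows the load balance)."""
    load = [0] * n_servers
    remaining = file_size
    for _, server in stripe_map(file_size, n_servers, stripe_size):
        chunk = min(stripe_size, remaining)
        load[server] += chunk
        remaining -= chunk
    return load

if __name__ == "__main__":
    # A 120 MB OODST file striped over 10 I/O nodes, as in the tested setup.
    print(bytes_per_server(120 * 1024 * 1024, 10))
```

Because consecutive stripes live on different servers, many clients reading different parts of the file hierarchy pull data from all the IONs at once, which is why the aggregate throughput grows roughly linearly with the number of I/O nodes.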
High Performance I/O System (II)
With 10 IONs we were able to reach an aggregate I/O of 110 MB/s (30 client nodes reading data), within a single file hierarchy.
To be compared with the throughput of a local disk, of a 100Base-T NAS (10 MB/s) and of a 1000Base-T NAS (50 MB/s).
Test of a PVFS-Based Analysis Facility (I)
Test performed using the OO DaVinci algorithm for the B0 -> pi+ pi- selection.
Analyzed 44.5k signal events and 484k bb inclusive events in 25 minutes (to be compared with about 2 days on a single PC).
The test was performed entirely on the Bologna farm, parallelizing the analysis algorithm over 106 CPUs (80 x 1.4 GHz PIII CPUs + 26 x 1 GHz PIII CPUs).
The DaVinci processes read the OODSTs from PVFS.
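The parallelism here is at the level of input files: each CPU runs an independent DaVinci job on its own subset of the OODST files. A minimal sketch of how 968 files could be divided among 106 jobs is shown below; round-robin assignment and the file names are assumptions for illustration, the slides do not describe the actual job-splitting machinery.

```python
# Minimal sketch of file-level parallelism: divide the input OODST files among
# independent analysis jobs, one job per CPU. Round-robin assignment is an
# assumption for illustration; the file names are hypothetical.

def split_round_robin(files, n_jobs):
    """Assign files to n_jobs workers in round-robin order."""
    jobs = [[] for _ in range(n_jobs)]
    for i, f in enumerate(files):
        jobs[i % n_jobs].append(f)
    return jobs

if __name__ == "__main__":
    files = [f"oodst_{i:04d}.root" for i in range(968)]   # 968 files, 500 events each
    jobs = split_round_robin(files, 106)                   # one job per CPU
    print({len(j) for j in jobs})                          # {9, 10}: 9 or 10 files per job
```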
Test of a PVFS-Based Analysis Facility (II)
(diagram: client nodes CN 1 ... CN 106 running DaVinci read OODSTs from PVFS, served by I/O nodes ION 1 ... ION 10 and a management node MGR; n-tuple output and a login node are also shown)
Test of a PVFS-Based Analysis Facility (III)
106 DaVinci processes reading from PVFS.
968 files (500 OODST events each), 120 MB per file.
116 GB read and processed in 1500 s.
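A quick consistency check of these figures (the file count, file size and wall time are taken from the slide, the arithmetic is mine):

```python
# Consistency check of the quoted figures: total data volume and aggregate read rate.
n_files, mb_per_file, seconds = 968, 120, 1500
total_gb  = n_files * mb_per_file / 1000      # ~116 GB, as quoted on the slide
rate_mb_s = n_files * mb_per_file / seconds   # ~77 MB/s aggregate read rate from PVFS
print(f"~{total_gb:.0f} GB read in {seconds} s, i.e. ~{rate_mb_s:.0f} MB/s aggregate")
```

The implied aggregate rate of roughly 77 MB/s sits below the 110 MB/s PVFS ceiling measured earlier, suggesting the test was not limited by the parallel file system.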
B0 -> pi+ pi-: Pion Momentum Resolution
(plots: Delta p / p for identified pions coming from the B0, FWHM ~ 0.01; |Delta p / p| vs p [GeV/c] for identified pions coming from the B0)
B0 Mass Plots
Cuts: Pt > 800 MeV/c, d/sigma_d > 1.6, l_B0 > 1 mm.
(plots, mass in MeV/c^2: all pi+ pi- pairs with no cuts, 3425 events; all pi+ pi- pairs with all cuts (magnified), 105 events; FWHM 66 MeV)
bb Inclusive Background Mass Plot
(plot: all pi+ pi- pairs with all cuts; mass in GeV/c^2)
Total number of events: 484k. Only events with a single interaction are taken into account at the moment: ~240k.
213 events remain in the mass region after all cuts; 32 of the 213 are ghosts.
Signal Efficiency and Mass Plots for Tighter Cuts
Tighter cuts: Pt > 2.8 GeV/c, d/sigma_d > 2.5, l_B0 > 0.6 mm.
Final efficiency (tighter cuts, zero bb inclusive background in the 240k events) = 871/22271 = 4%.
Rejection against the bb inclusive background > 1 - 1/... = ...%.
(plots, mass in GeV/c^2: signal events in the mass region; 16 background events from the signal sample in the mass region, all ghosts)
Conclusions
The MC production farm has been running stably (with increasing resources) for more than one year.
The INFN Tier-1 is the second most active LHCb MC production centre (after CERN). The collaboration with the CNAF staff is excellent.
We are not yet using GRID tools in production, but we plan to move to them as soon as the detector design is stable.
An analysis mini-farm for interactive work has been running for more than one month, and we plan to extend the number of nodes depending on the availability of resources.
The architecture of a massive analysis system has already been tested, using a parallel file system and 106 CPUs.
We need at least to keep the present computing power at CNAF (more resources, to keep production running in parallel with the massive analysis activities, would be welcome) in order to supply the analysis facility to the LHCb-Italian collaboration.