1
Petabyte-scale computing challenges of the LHCb experiment
UK e-Science All Hands Meeting 2008, Edinburgh, 9th September 2008
Y. Y. Li, on behalf of the LHCb collaboration
2
Outline
The questions…
The LHC – the experiments taking up the challenge
The LHCb experiment at the LHC
LHCb computing model – data flow, processing requirements
Distributed computing in LHCb – architecture, functionality, performance
3
The questions…
The Standard Model of particle physics explains much of the interactions between the fundamental particles that make up the Universe, with all experiments so far confirming its predictions. BUT many questions still remain:
How does gravity fit into the model?
Where does all the mass come from?
Why do we have a Universe made up of matter?
Does dark matter exist, and how much of it is there?
We search for phenomena beyond our current understanding by going back to the first billionth of a second after the Big Bang…
4
The Large Hadron Collider
A 27 km ring, 100 m below the surface on the Swiss/French border.
14 TeV proton-proton collider, 7x higher energy than previous machines.
1,232 superconducting magnets chilled to -271.3 °C.
Four experiments/detectors: ATLAS, CMS, ALICE and LHCb.
[Figure: aerial view of the 27 km ring on the French/Swiss border, with the four experiment sites marked.]
After ~25 years since its first proposal, the first circulating beam is tomorrow; first collisions in October 2008.
5
LHCb – the LHC beauty experiment
A special-purpose detector at the LHC pp collision point, searching for:
New physics in very rare b quark decays
Particle-antiparticle asymmetry
~1 trillion bb pairs produced per year!
VErtex LOcator (VELO) – locates the b decay vertex; operates only ~5 mm from the beam.
Ring Imaging CHerenkov (RICH) detectors – particle ID; the human eye takes ~100 photos/s, the RICH takes 40 million photos/s.
6
Data flow
Five main LHCb applications (C++: Gauss, Boole, Brunel, DaVinci; Python: Bender):
Gauss – event generation and detector simulation (produces Sim, the simulation data format)
Boole – digitisation
Brunel – reconstruction (produces DST, the Data Summary Tape format)
DaVinci / Bender – analysis (produce statistics)
RAW data flows from the detector, together with the detector calibrations, into production jobs; the resulting DSTs feed the analysis jobs. A minimal sketch of this chain is given below.
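To make the chain concrete, here is a minimal, hypothetical Python sketch of how the applications hand data formats to one another. The application and format names follow the slide, but the run_chain helper and the intermediate "Digi" format label are illustrative assumptions, not the actual LHCb production code.

    # Hypothetical sketch of the LHCb application chain (not the real production system).
    # Each step consumes one data format and produces the next.
    CHAIN = [
        ("Gauss",   None,   "Sim"),    # event generation + detector simulation
        ("Boole",   "Sim",  "Digi"),   # digitisation ("Digi" label assumed)
        ("Brunel",  "Digi", "DST"),    # reconstruction
        ("DaVinci", "DST",  "Stats"),  # physics analysis
    ]

    def run_chain():
        data = None
        for app, in_fmt, out_fmt in CHAIN:
            print(f"Running {app}: {in_fmt or 'generator config'} -> {out_fmt}")
            data = f"{out_fmt} produced by {app}"   # placeholder for the real application run
        return data

    if __name__ == "__main__":
        run_chain()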
7
CPU times
40 million collisions (events) per second in the detector; 2,000 interesting events selected per second = 50 MB/s of data transferred and stored.
Offline full reconstruction: 150 MB processed per second of running.
Full simulation and reconstruction: 100 MB / event, 500 KB / event (DST).
Full simulation: 80 s / event (2.8 GHz Xeon processor) – ~100 years for 1 CPU to simulate 1 s of real data!
10^7 s of data taking per year, plus simulation, gives ~O(PB) of data per year.
962 physicists, 56 institutes on 4 continents need access to this data.
A back-of-envelope check of these numbers is shown below.
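The headline figures can be reproduced with simple arithmetic; this short Python sketch is purely illustrative and uses only the numbers quoted on the slides.

    # Back-of-envelope check of the quoted rates (illustrative only).
    data_taking_s_per_year = 1e7            # ~10^7 s of beam per year, as quoted
    raw_rate_mb_s = 50                      # 2,000 selected events per second
    raw_per_year_tb = raw_rate_mb_s * data_taking_s_per_year / 1e6
    print(f"RAW data per year: ~{raw_per_year_tb:.0f} TB, i.e. O(PB) once simulation is added")

    collisions_per_s = 40e6                 # collision rate in the detector
    sim_s_per_event = 80                    # full simulation on a 2.8 GHz Xeon
    calendar_s_per_year = 3.15e7
    cpu_years_per_beam_second = collisions_per_s * sim_s_per_event / calendar_s_per_year
    print(f"Simulating 1 s of collisions on one CPU: ~{cpu_years_per_beam_second:.0f} years")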
8
LHCb computing structure
Detector RAW data transfer: 10 MB/s; simulation data transfer: 1 MB/s.
Tier 0 – CERN: raw data, ~3K CPUs.
Tier 1 – large centres (RAL UK, PIC Spain, IN2P3 France, GridKa Germany, NIKHEF Netherlands, CNAF Italy): reconstruction and analysis, ~15K CPUs.
Tier 2 – universities (~34): simulations, ~19K CPUs.
Tier 3/4 – laptops, desktops, etc.: simulations.
This needs distributed computing; the tier layout is summarised in the sketch below.
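Purely as a summary of the tier roles listed above (not an actual LHCb configuration file), the structure can be written down as a small Python table:

    # Illustrative summary of the tiered structure; site lists and CPU counts as quoted on the slide.
    TIERS = {
        "Tier 0":   {"sites": ["CERN"],
                     "role": "raw data storage", "cpus": 3_000},
        "Tier 1":   {"sites": ["RAL (UK)", "PIC (Spain)", "IN2P3 (France)",
                               "GridKa (Germany)", "NIKHEF (Netherlands)", "CNAF (Italy)"],
                     "role": "reconstruction and analysis", "cpus": 15_000},
        "Tier 2":   {"sites": ["~34 university sites"],
                     "role": "simulation", "cpus": 19_000},
        "Tier 3/4": {"sites": ["laptops, desktops"],
                     "role": "simulation", "cpus": None},
    }

    total = sum(t["cpus"] for t in TIERS.values() if t["cpus"])
    print(f"Total dedicated CPUs across tiers: ~{total:,}")   # ~37,000, close to the ~40K quoted in the conclusions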
9
LHCb Grid Middleware – DIRAC
LHCb's grid middleware: Distributed Infrastructure with Remote Agent Control.
Written in Python; multi-platform (Linux, Windows); built with common grid tools; GSI (Grid Security Infrastructure) authentication.
Pulls together all resources, shared with other experiments, using an experiment-wide CPU fair share.
Optimises CPU usage across long, steady simulation jobs from production managers and chaotic analysis usage from individual users; a toy illustration of the fair-share idea follows.
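To illustrate the fair-share idea only (this is not DIRAC's actual scheduling algorithm), a group that has consumed less than its target share of CPU time gets its waiting jobs boosted:

    # Toy fair-share priority: the target shares and usage figures are invented for illustration.
    shares = {"production": 0.7, "analysis": 0.3}          # assumed target fractions
    used   = {"production": 120_000, "analysis": 20_000}   # CPU-hours consumed so far

    total_used = sum(used.values())

    def priority(group):
        """Higher when a group is below its target share of the consumed CPU time."""
        actual_fraction = used[group] / total_used
        return shares[group] / actual_fraction

    for g in shares:
        print(f"{g:10s} priority factor: {priority(g):.2f}")   # analysis gets boosted here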
10
DIRAC architecture
Service-oriented architecture with four parts: user interface, services, agents, resources.
Uses a pull strategy for assigning CPUs: free, stable CPUs request jobs from the main server.
Useful in masking the instability of resources from users; a minimal sketch of the pull loop is given below.
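A minimal sketch of the pull model, assuming a toy in-memory queue in place of the real, authenticated DIRAC services:

    import queue
    import time

    # Toy pull-style job queue (illustrative only; the real DIRAC WMS is a networked,
    # authenticated service, not a local queue).
    job_queue = queue.Queue()
    for name in ["sim_0001", "sim_0002", "analysis_0042"]:
        job_queue.put(name)

    def agent_loop(worker_id, max_idle_polls=3):
        """Worker-side agent: only a free, healthy worker asks the server for work."""
        idle = 0
        while idle < max_idle_polls:
            try:
                job = job_queue.get_nowait()     # "pull" a job only when this CPU is free
            except queue.Empty:
                idle += 1
                time.sleep(0.1)                  # back off; a flaky worker simply stops pulling
                continue
            print(f"worker {worker_id} running {job}")
        print(f"worker {worker_id} exiting")

    agent_loop("wn-01")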
11
Linux-based; multi-platform. A combination of DIRAC services and non-DIRAC services, with web monitoring.
12
Security and data access
DISET, the DIRAC SEcuriTy module: uses openssl and a modified pyopenssl, and allows proxy support for secure access.
The DISET portal is used to facilitate secure access on the various platforms where the authentication process is OS-dependent; platform binaries are shipped with DIRAC, and the version is determined during installation.
Various data access protocols are supported: SRM, GridFTP, .NET GridFTP on Windows, etc.
The Data Services operate on the main server; each file is assigned a logical file name that maps to its physical file name(s), as sketched below.
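A minimal sketch of that logical-to-physical mapping; the replica URLs and the get_replicas helper are made up for illustration (only the LFN is taken from the submission example later in the talk), and this is not DIRAC's file catalogue API.

    # Toy replica catalogue: one logical file name mapping to its physical replicas.
    replica_catalogue = {
        "/lhcb/production/DC06/v2/00980000/DST/Presel_00980000_00001212.dst": [
            "srm://storage.site-a.example/lhcb/DC06/Presel_00980000_00001212.dst",
            "gsiftp://storage.site-b.example/lhcb/DC06/Presel_00980000_00001212.dst",
        ],
    }

    def get_replicas(lfn):
        """Return all physical replicas registered for a logical file name."""
        return replica_catalogue.get(lfn, [])

    lfn = "/lhcb/production/DC06/v2/00980000/DST/Presel_00980000_00001212.dst"
    for pfn in get_replicas(lfn):
        print(pfn)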
13
Compute element resources
Other grids, e.g. WLCG (Worldwide LHC Computing Grid).
Linux machines: local batch systems, Condor; stand-alone desktops, laptops, etc.
Windows machines: 3 sites so far, ~100 CPUs, running Windows Server, Windows Compute Cluster and Windows XP. (~90% of the world's computers run Windows.)
14
Pilot agents
Used to access other grid resources, e.g. WLCG via gLite.
A user job triggers the submission of a pilot agent by DIRAC as a 'grid job' to reserve CPU time.
The pilot on the worker node (WN) checks the environment before retrieving the user job from the DIRAC WMS.
Advantages:
Easy control of CPU quota for shared resources.
Several pilot agents can be deployed for the same job if a failure occurs on a WN.
If the full reserved CPU time is not used, another job can be retrieved from the DIRAC WMS.
A minimal sketch of this pilot life cycle follows.
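A minimal sketch of the pilot life cycle, where check_environment, fetch_job_from_wms and the toy job list are hypothetical stand-ins rather than the DIRAC WMS API:

    # Toy pilot-agent loop (illustrative only).
    waiting_jobs = [("DaVinci_1", 1200), ("DaVinci_2", 900)]   # (job name, CPU seconds needed)

    def check_environment():
        # Verify the worker node (disk space, software, valid proxy, ...) before taking work.
        return True

    def fetch_job_from_wms():
        # Ask the central WMS for a matching user job; None if nothing is waiting.
        return waiting_jobs.pop(0) if waiting_jobs else None

    def run_pilot(reserved_seconds):
        if not check_environment():
            return                                   # a broken worker node never receives a user job
        used = 0
        while used < reserved_seconds:
            job = fetch_job_from_wms()
            if job is None:
                break
            name, cost = job
            print(f"pilot running {name} ({cost} s)")
            used += cost                             # if reserved time remains, pull another job

    run_pilot(reserved_seconds=3600)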
15
Agents on Windows
Windows resources – CPU scavenging:
CPUs not dedicated to the LHC: spare CPUs at universities, private home computers, etc.
Agent launch would be triggered by, e.g., a screen saver; the CPU contribution is determined by the owner during DIRAC installation.
Windows Compute Cluster:
A single, shared DIRAC installation; a job wrapper submits the retrieved jobs via the Windows Compute Cluster submission calls, with local job scheduling determined by the Windows CC scheduling service. A hypothetical sketch of such a wrapper step is shown below.
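A hypothetical sketch of the wrapper's submission step, assuming the cluster exposes a command-line submission tool such as Windows Compute Cluster's "job submit"; the exact command, options and helper name here are assumptions, not taken from the slide.

    import subprocess

    # Hypothetical job-wrapper step on a Windows Compute Cluster head node.
    # The "job submit" command line is an assumption about the Windows CC CLI;
    # the real DIRAC wrapper may build the submission differently.
    def submit_to_windows_cc(script_path, stdout_path, stderr_path):
        cmd = [
            "job", "submit",
            f"/stdout:{stdout_path}",
            f"/stderr:{stderr_path}",
            script_path,                 # the retrieved DIRAC job, wrapped as a script
        ]
        # From here on, scheduling is decided by the Windows CC scheduling service.
        return subprocess.run(cmd, check=True)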
16
Cross-platform submissions
Submissions are made with a valid grid proxy. Three ways:
JDL (Job Description Language)
DIRAC API
Ganga job management system – built on DIRAC API commands; full porting to Windows is in progress (a rough Ganga example is sketched after this slide).

JDL example:

    SoftwarePackages = { "DaVinci.v19r12" };
    InputSandbox = { "DaVinci.opts" };
    InputData = { "LFN:/lhcb/production/DC06/v2/00980000/DST/Presel_00980000_00001212.dst" };
    JobName = "DaVinci_1";
    Owner = "yingying";
    StdOutput = "std.out";
    StdError = "std.err";
    OutputSandbox = { "std.out", "std.err", "DaVinci_v19r12.log", "DVhbook.root" };
    JobType = "user";

DIRAC API example:

    import DIRAC
    from DIRAC.Client.Dirac import *

    dirac = Dirac()
    job = Job()
    job.setApplication('DaVinci', 'v19r12')
    job.setInputSandbox(['DaVinci.opts'])
    job.setInputData(['LFN:/lhcb/production/DC06/v2/00980000/DST/Presel_00980000_00001212.dst'])
    job.setOutputSandbox(['DaVinci_v19r12.log', 'DVhbook.root'])
    dirac.submit(job)

User pre-compiled binaries can also be shipped; jobs are then bound to be processed on the same platform.
Successfully used in full selection and background analysis studies (user on Windows, resources on Windows and Linux).
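For the third route, a Ganga submission of the same job might look roughly like the sketch below; it is meant to run inside a Ganga session, and the application/backend names and attribute layout are assumptions based on common Ganga usage, not taken from the slide.

    # Rough sketch of the equivalent submission from a Ganga session (Job, DaVinci and
    # Dirac are provided by Ganga's GPI; the attribute names here are assumptions).
    j = Job()
    j.application = DaVinci(version='v19r12', optsfile='DaVinci.opts')
    j.backend = Dirac()                    # route the job through DIRAC rather than a local batch queue
    j.outputsandbox = ['DaVinci_v19r12.log', 'DVhbook.root']
    j.submit()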
17
Performance
Successful processing of data challenges since 2004.
Latest data challenge: a record of >10,000 simultaneous jobs (analysis and production).
700M events simulated in 475 days – ~1,700 years of CPU time (a cross-check is sketched below).
[Figure: running jobs at Windows and Linux sites; total running jobs: 9,715.]
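The quoted CPU time follows from the 80 s/event full-simulation time given earlier; this illustrative check uses only numbers quoted on the slides.

    # Cross-check of the quoted ~1,700 CPU-years for the 700M-event data challenge.
    events = 700e6
    seconds_per_event = 80                            # full simulation time from slide 7
    cpu_seconds = events * seconds_per_event
    cpu_years = cpu_seconds / 3.15e7                  # ~seconds in a calendar year
    wall_days = 475
    effective_cpus = cpu_seconds / (wall_days * 86400)
    print(f"~{cpu_years:,.0f} CPU-years")             # ~1,778, consistent with the ~1,700 quoted
    print(f"average of ~{effective_cpus:,.0f} CPUs running continuously")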
18
Conclusions
The LHC will produce O(PB) of data per year per experiment, to be analysed by thousands of physicists on 4 continents.
The LHCb distributed computing structure is in place, pulling together a total of ~40K CPUs from across the world.
The DIRAC system has been fine-tuned on the experience of the past 4 years of intensive testing.
We now eagerly await the LHC switch-on and the true test – first beams tomorrow morning!