The LHC Computing Challenge. Tim Bell, Fabric Infrastructure & Operations Group, Information Technology Department, CERN. 2nd April 2009
The Four LHC Experiments… ATLAS: general purpose; origin of mass; supersymmetry; 2,000 scientists from 34 countries. CMS: general purpose; origin of mass; supersymmetry; 1,800 scientists from over 150 institutes. ALICE: heavy ion collisions, to create quark-gluon plasmas; 50,000 particles in each collision. LHCb: to study the differences between matter and antimatter; will detect over 100 million b and b-bar mesons each year.
… generate lots of data … The accelerator generates 40 million particle collisions (events) every second at the centre of each of the four experiments’ detectors
… generate lots of data … reduced by online computers to a few hundred “good” events per second, which are recorded on disk and magnetic tape at 100-1,000 MegaBytes/sec: ~15 PetaBytes per year for all four experiments.
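As a rough cross-check of the figures above, the following sketch computes the online reduction factor and the average recording rate implied by ~15 PetaBytes per year. The value of 300 recorded events per second is an assumed stand-in for "a few hundred"; the 365-day year and decimal units (1 PB = 10^15 bytes) are also assumptions, not statements from the slides.

```python
# Back-of-envelope check of the slide's numbers (a sketch, not an official calculation).

COLLISIONS_PER_SEC = 40_000_000      # from the slide: 40 million events per second
RECORDED_PER_SEC = 300               # ASSUMPTION: stand-in for "a few hundred"
YEARLY_VOLUME_BYTES = 15e15          # ~15 PB/year, decimal units assumed
SECONDS_PER_YEAR = 365 * 24 * 3600   # ASSUMPTION: averaged over a calendar year


def reduction_factor(collisions_per_sec, recorded_per_sec):
    """How many collisions occur for each event that is kept."""
    return collisions_per_sec / recorded_per_sec


def average_rate_mb_per_s(volume_bytes, seconds):
    """Average recording rate in MB/s needed to accumulate volume_bytes."""
    return volume_bytes / seconds / 1e6


if __name__ == "__main__":
    factor = reduction_factor(COLLISIONS_PER_SEC, RECORDED_PER_SEC)
    rate = average_rate_mb_per_s(YEARLY_VOLUME_BYTES, SECONDS_PER_YEAR)
    print(f"about 1 event kept per {factor:,.0f} collisions")
    print(f"average recording rate: {rate:.0f} MB/s")
```

Under these assumptions the year-averaged rate comes out a little under 500 MB/s, which sits inside the quoted 100-1,000 MegaBytes/sec range.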
Data Handling and Computation for Physics Analysis (diagram, CERN). Labels: detector; event filter (selection & reconstruction); raw data; reconstruction; processed data; event summary data; event reprocessing; event simulation; simulation; analysis; batch physics analysis; analysis objects (extracted by physics topic); interactive physics analysis.
… leading to a high box count: ~2,500 PCs and another ~1,500 boxes (CPU, Disk, Tape).
Computing Service Hierarchy. Tier-0 – the accelerator centre: data acquisition & initial processing; long-term data curation; distribution of data. Tier-1 centres: Canada – Triumf (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands – NIKHEF/SARA (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taiwan – Academia Sinica (Taipei); UK – CLRC (Oxford); US – FermiLab (Illinois), Brookhaven (NY). Tier-1 – “online” to the data acquisition process, high availability: managed mass storage; data-heavy analysis; national, regional support. Tier-2 – ~100 centres in ~40 countries: simulation; end-user analysis, batch and interactive.
The Grid: Timely Technology! Deploy to meet LHC computing needs. Challenges for the Worldwide LHC Computing Grid Project due to: worldwide nature; competing middleware…; newness of technology; scale; …
Interoperability in action
Site Reliability: Tier-2 Sites. 83 Tier-2 sites being monitored.
Why Linux? 1990s – Unix wars – 6 different Unix flavours. Linux allowed all users to align behind a single OS which was low cost and dynamic. Scientific Linux is based on Red Hat with extensions of key usability and performance features: AFS global file system; XFS high performance file system. But how to deploy without proprietary tools? See EDG/WP4 report on current technology (http://cern.ch/hep-proj-grid-fabric/Tools/DataGrid-04-TED-0101-3_0.pdf) or “Framework for Managing Grid-enabled Large Scale Computing Fabrics” (http://cern.ch/quattor/documentation/poznanski-phd.pdf) for reviews of various packages.
Deployment. Commercial Management Suites: (Full) Linux support rare (5+ years ago…); much work needed to deal with specialist HEP applications; insufficient reduction in staff costs to justify license fees. Scalability: 5,000+ machines to be reconfigured; 1,000+ new machines per year; configuration change rate of 100s per day.
Dataflows and rates (diagram). Scheduled work only! Flow labels: 700MB/s, 420MB/s, 700MB/s, 1120MB/s, (1600MB/s), (2000MB/s). Averages! Need to be able to support 2x for recovery! Remember this figure: 1430MB/s.
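A minimal sketch of the headroom rule stated above: if scheduled work averages a given rate, the system needs to sustain about twice that rate so that a backlog can be cleared while new data keeps arriving. Which flow the 1430MB/s figure belongs to is not recoverable from the slide text, so it is used here only as an example input; the function names are hypothetical.

```python
# Sketch of the "2x for recovery" rule; names and the outage scenario are illustrative.

def required_capacity_mb_s(average_mb_s, recovery_factor=2.0):
    """Capacity to provision given an average scheduled rate."""
    return average_mb_s * recovery_factor


def catch_up_hours(outage_hours, average_mb_s, capacity_mb_s):
    """Hours needed to clear the backlog from an outage while new data still arrives.

    Backlog = outage_hours * average rate; spare bandwidth = capacity - average.
    """
    spare = capacity_mb_s - average_mb_s
    if spare <= 0:
        raise ValueError("no spare capacity: the backlog can never be cleared")
    return outage_hours * average_mb_s / spare


if __name__ == "__main__":
    avg = 1430.0                              # MB/s, example input from the slide
    cap = required_capacity_mb_s(avg)         # 2x headroom
    print(f"provision for {cap:.0f} MB/s")
    print(f"12 h outage cleared in {catch_up_hours(12, avg, cap):.1f} h")
```

With exactly 2x capacity the spare bandwidth equals the average rate, so a backlog takes as long to clear as the outage that caused it.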
Volumes & Rates. 15PB/year. Peak rate to tape >2GB/s. 3 full SL8500 robots/year. Requirement in first 5 years to reread all past data between runs: 60PB in 4 months is 6GB/s. Can run drives at sustained 80MB/s: 75 drives flat out merely for controlled access. Data volume has interesting impact on choice of technology. Media use is advantageous: high-end technology (3592, T10K) favoured over LTO.
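The reread arithmetic above can be reproduced as follows. This is a sketch that assumes "4 months" means 120 days and uses decimal units (1 PB = 10^6 GB); the function names are illustrative.

```python
import math

# Sketch of the tape reread estimate; 120 days for "4 months" is an ASSUMPTION.

def required_rate_gb_s(volume_pb, days):
    """Sustained rate in GB/s needed to read volume_pb within the given days."""
    return volume_pb * 1e6 / (days * 86400)


def drives_needed(rate_gb_s, drive_mb_s):
    """Number of tape drives running flat out to sustain rate_gb_s."""
    return math.ceil(rate_gb_s * 1000 / drive_mb_s)


if __name__ == "__main__":
    rate = required_rate_gb_s(60, 120)   # 60PB in 4 months
    print(f"required rate: {rate:.2f} GB/s")
    print(f"drives at 80MB/s for 6GB/s: {drives_needed(6.0, 80.0)}")
```

The unrounded rate is just under 6GB/s; rounding up to 6GB/s, as the slide does, gives the quoted 75 drives at a sustained 80MB/s each.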
Castor Architecture: detailed view (diagram). Labels: Client; RH; Stager; Scheduler; RR; DB; DB Svc, Job, Qry, Error; Disk cache subsystem: Disk Servers, Mover, GC, StagerJob, MigHunter; Central Services: NameServer; Tape archive subsystem: RTCPClientD, Tape Servers, TapeDaemon, VMGR, RTCPD, VDQM.
Castor Performance
Long lifetime. LEP, CERN’s last accelerator, started in 1989 and shut down 10 years later. First data recorded to IBM 3480s; at least 4 different technologies used over the period. All data ever taken, right back to 1989, was reprocessed and reanalysed in 2001/2. LHC starts in 2007 and will run until at least 2020. What technologies will be in use in 2022 for the final LHC reprocessing and reanalysis? Data repacking required every 2-3 years: time consuming; data integrity must be maintained.
Disk capacity & I/O rates

Year   Disk capacity   Disk I/O rate   Aggregate I/O for 1TB
1996   4GB             10MB/s          250 x 10MB/s = 2,500MB/s
2000   50GB            20MB/s          20 x 20MB/s = 400MB/s
2006   500GB           60MB/s          2 x 60MB/s = 120MB/s

CERN now purchases two different storage server models: capacity oriented and throughput oriented. Fragmentation increases management complexity (purchase overhead also increased…)
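The per-terabyte figures on this slide follow from dividing 1TB by the disk size and multiplying by the per-disk rate. A short sketch, assuming 1TB = 1,000GB and using illustrative names:

```python
# Sketch: aggregate I/O available from 1TB of disk, per disk generation.
# (year, capacity in GB, per-disk I/O rate in MB/s) as listed on the slide.
DISK_GENERATIONS = [
    (1996, 4, 10),
    (2000, 50, 20),
    (2006, 500, 60),
]


def aggregate_io_per_tb(capacity_gb, rate_mb_s, total_gb=1000):
    """Return (number of disks, aggregate MB/s) needed to provide total_gb."""
    disks = total_gb // capacity_gb
    return disks, disks * rate_mb_s


if __name__ == "__main__":
    for year, capacity_gb, rate_mb_s in DISK_GENERATIONS:
        disks, total = aggregate_io_per_tb(capacity_gb, rate_mb_s)
        print(f"{year}: {disks} disks x {rate_mb_s}MB/s = {total}MB/s per TB")
```

Capacity per disk grew far faster than throughput per disk, so the I/O available per terabyte fell by roughly a factor of 20 between 1996 and 2006, which is the motivation for separate capacity-oriented and throughput-oriented servers.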
.. and backup – TSM on Linux Daily Backup volumes of around 18TB to 10 Linux TSM servers
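For a sense of scale, the sketch below spreads the daily backup volume evenly over the servers. The even distribution, the 24-hour window, and decimal units (1TB = 10^6 MB) are assumptions; real backup traffic is likely concentrated in a shorter window, which the `window_hours` parameter lets you explore.

```python
# Sketch: average per-server backup load, under ASSUMED even distribution.

def per_server_rate_mb_s(daily_volume_tb, servers, window_hours=24.0):
    """Average MB/s each server must absorb to take its share within the window."""
    per_server_mb = daily_volume_tb * 1e6 / servers
    return per_server_mb / (window_hours * 3600)


if __name__ == "__main__":
    print(f"24 h window: {per_server_rate_mb_s(18, 10):.1f} MB/s per server")
    print(f" 8 h window: {per_server_rate_mb_s(18, 10, window_hours=8):.1f} MB/s per server")
```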
Capacity Requirements
Power Outlook
Summary. Immense Challenges & Complexity: data rates, developing software, lack of standards, worldwide collaboration, … Considerable Progress in last ~5-6 years: WLCG service exists; Petabytes of data transferred. But more data is coming in November… Will the system cope with chaotic analysis? Will we understand the system enough to identify problems, and fix underlying causes? Can we meet requirements given power available?