1
The Worldwide LHC Computing Grid
Emmanuel Tsesmelis
2nd CERN School Thailand 2012, Suranaree University of Technology, 1 May 2012
2
Breaking the Wall of Communication: 23 years ago the Web was born @ CERN... and today?
3
Enter a New Era in Fundamental Science
Start-up of the Large Hadron Collider (LHC), one of the largest and truly global scientific projects ever, is the most exciting turning point in particle physics.
Exploration of a new energy frontier. LHC ring: 27 km circumference, hosting the ALICE, ATLAS, CMS and LHCb experiments.
5
The LHC Computing Challenge
Signal/Noise: 10⁻¹³ (10⁻⁹ offline)
Data volume: high rate × large number of channels × 4 experiments → 15 PetaBytes of new data each year
Compute power: event complexity × number of events × thousands of users → 200k CPUs, 45 PB of disk storage
Worldwide analysis & funding: computing funded locally in major regions & countries; efficient analysis everywhere → GRID technology
6
The LHC Data
40 million events (pictures) per second.
Select (on the fly) the ~200 interesting events per second to write to tape.
"Reconstruct" the data and convert it for analysis: "physics data" [the grid...] (× 4 experiments × 15 years)

                      Per event   Per year
  Raw data            1.6 MB      3200 TB
  Reconstructed data  1.0 MB      2000 TB
  Physics data        0.1 MB       200 TB

Figure: a CD stack holding one year of LHC data would be ~20 km tall, compared with a balloon at 30 km, Concorde at 15 km and Mt. Blanc at 4.8 km.
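The per-year column follows directly from the per-event sizes and the ~200 Hz recording rate. A minimal back-of-the-envelope check in Python, assuming ~10^7 effective seconds of data-taking per year (a common LHC rule of thumb, not stated on the slide):

```python
# Back-of-the-envelope check of the per-year column above.
# Assumption (not on the slide): ~1e7 effective seconds of data-taking per year.

EVENTS_PER_SECOND = 200        # events written to tape, per experiment
LIVE_SECONDS_PER_YEAR = 1e7    # assumed effective running time

sizes_mb = {"raw": 1.6, "reconstructed": 1.0, "physics": 0.1}  # MB per event

for kind, mb in sizes_mb.items():
    tb_per_year = mb * EVENTS_PER_SECOND * LIVE_SECONDS_PER_YEAR / 1e6  # MB -> TB
    print(f"{kind:>13}: {tb_per_year:,.0f} TB/year")

# raw ~3,200 TB, reconstructed ~2,000 TB, physics ~200 TB per experiment and year,
# i.e. several PB per experiment before multiplying by the 4 experiments.
```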
7
150 million sensors deliver data ... 40 million times per second
8
A collision at the LHC
9
The Data Acquisition
10
Tier 0 at CERN: acquisition, first-pass reconstruction, storage & distribution (1.25 GB/sec for ions)
11
WLCG – what and why?
A distributed computing infrastructure to provide the production and analysis environments for the LHC experiments.
Managed and operated by a worldwide collaboration between the experiments and the participating computer centres.
The resources are distributed, for funding and sociological reasons. Our task was to make use of the resources available to us, no matter where they are located.
Tier-0 (CERN): data recording, initial data reconstruction, data distribution.
Tier-1 (11 centres): permanent storage, re-processing, analysis.
Tier-2 (~130 centres): simulation, end-user analysis.
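As an illustration only (not WLCG software), the tier roles listed above can be captured in a small Python mapping, e.g. for documentation or monitoring scripts; the site counts are the approximate figures from the slide:

```python
# Illustrative sketch of the WLCG tier model described on the slide.
WLCG_TIERS = {
    "Tier-0": {
        "sites": 1,   # CERN
        "roles": ["data recording", "initial data reconstruction", "data distribution"],
    },
    "Tier-1": {
        "sites": 11,
        "roles": ["permanent storage", "re-processing", "analysis"],
    },
    "Tier-2": {
        "sites": 130,  # approximate
        "roles": ["simulation", "end-user analysis"],
    },
}

for tier, info in WLCG_TIERS.items():
    print(f"{tier} ({info['sites']} site(s)): {', '.join(info['roles'])}")
```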
12
e-Infrastructure
13
WLCG Grid Sites (Tier 0, Tier 1, Tier 2)
Today: >130 sites, >250k CPU cores, >150 PB disk.
14
WLCG Collaboration Status
Tier 0; 11 Tier 1s; 68 Tier 2 federations.
Sites on the map: CERN (Tier 0) and the Tier 1 centres Ca-TRIUMF, US-BNL, US-FNAL, UK-RAL, De-FZK, Lyon/CCIN2P3, Barcelona/PIC, Amsterdam/NIKHEF-SARA, Bologna/CNAF, NDGF and Taipei/ASGC.
Today we have 49 MoU signatories, representing 34 countries: Australia, Austria, Belgium, Brazil, Canada, China, Czech Rep., Denmark, Estonia, Finland, France, Germany, Hungary, Italy, India, Israel, Japan, Rep. Korea, Netherlands, Norway, Pakistan, Poland, Portugal, Romania, Russia, Slovenia, Spain, Sweden, Switzerland, Taipei, Turkey, UK, Ukraine, USA.
15
CPU – around the Tiers
The grid really works. All sites, large and small, can contribute, and their contributions are needed! Significant use of Tier 2s for analysis.
16
LHC Networking
Relies on OPN, GEANT, US-LHCNet, plus NRENs & other national & international providers.
17
WLCG 2010-11
CPU corresponds to >>150k cores permanently running; peak job loads of around 200k concurrent jobs. In 2010 WLCG delivered ~100 CPU-millennia!
Data traffic at Tier 0 and to the grid larger than the 2010 values: up to 4 GB/s from the DAQs to tape.
Figures: data written to tape per month (up to ~2 PB/month, by experiment: ALICE, AMS, ATLAS, CMS, Compass, LHCb) and CPU delivered (HS06-hours/month, ~1 M jobs/day), with peaks from CMS HI-data zero suppression, Tier 1 re-processing of 2010 and 2011 data, and ALICE HI data. The grid enables the rapid delivery of physics results.
18
WLCG: Data in 2011
23 PB of data written in 2011.
The Castor service at Tier 0 is well adapted to the load:
– Heavy ions: more than 6 GB/s to tape (tests show that Castor can easily support >12 GB/s); the actual limit now is the network from the experiment to the computer centre.
– Major improvements in tape efficiencies: tape writing at ~native drive speeds, so fewer drives are needed.
– ALICE used x3 compression for raw data in the HI runs.
Figure: during the HI run, ALICE data into Castor exceeded 4 GB/s and overall rates to tape exceeded 6 GB/s.
19
Data Transfers for 2012
2012 data transfers are already back to "normal" levels for accelerator running.
20
Overall Use of WLCG
~10⁹ HEPSPEC-hours/month (~150k CPUs in continuous use); 1.5M jobs/day.
Usage continues to grow, even over the end-of-year technical stop, in both jobs/day and CPU usage.
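A quick sanity check of the "10⁹ HEPSPEC-hours/month ≈ 150k CPUs" equivalence, assuming a per-core benchmark of roughly 7.5-10 HS06 (the lower value is implied by the CERN data-centre numbers later in this deck); a sketch, not an official WLCG accounting conversion:

```python
# Cross-check: sustained HS06 implied by 1e9 HS06-hours/month, then divide by
# an assumed per-core HS06 figure to estimate how many cores are kept busy.

HS06_HOURS_PER_MONTH = 1e9   # delivered work, from the slide
HOURS_PER_MONTH = 730        # average month length in hours

sustained_hs06 = HS06_HOURS_PER_MONTH / HOURS_PER_MONTH   # ~1.37e6 HS06

for hs06_per_core in (7.5, 9.0, 10.0):                    # assumed per-core power
    cores = sustained_hs06 / hs06_per_core
    print(f"{hs06_per_core:>4} HS06/core -> ~{cores / 1e3:.0f}k cores busy")

# Roughly 140k-180k cores, consistent with the "~150k CPU continuous use" figure.
```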
21
WLCG – no stop for computing: activity on 3rd January.
22
Main Features of Recent Use
– Continued growth in overall usage levels.
– High levels of analysis use, particularly in preparation for the winter conferences; resources fully occupied.
– Full reprocessing runs of the total 2011 data samples, achieved by the end of the year; HI: complete processing of the 2011 samples.
– Large simulation campaigns for 2011 data and in preparation for the 8 TeV run.
– Disk clean-up campaigns in preparation for 2012 data.
23
CERN & Tier 1 Accounting
24
Use of T0 + T1 Resources
Comparison between use per experiment and pledges, for Tier 0 alone and for the sum of Tier 1s.
Early in the year, pledges start to be installed and can then be used. Tier 1 use is close to full; experiments (especially ALICE & LHCb) can make use of capacity shares not otherwise used.
25
Tier 2 Accounting
26
Tier 2 Usage
Tier 2 CPU delivered over the last 14 months, by country; comparison of use & pledges.
27
Reliabilities
No real issue now. The plots show the "ops" reports. Experiment-specific measured reliabilities are also published monthly; since February a new report allows the use of "arbitrary" experiment tests.
28
Service Incidents: Time to Resolution
Fewer incidents in general, but longer lasting (or more difficult to resolve): in Q4 2011 all except one took >24 hours to resolve.
29
Impact of the LHC Computing Grid
WLCG has been leveraged on both sides of the Atlantic to benefit the wider scientific community:
– Europe: Enabling Grids for E-sciencE (EGEE) 2004-2010; European Grid Infrastructure (EGI) 2010-
– USA: Open Science Grid (OSG) 2006-2012 (+ extension?)
Many scientific applications: archaeology, astronomy, astrophysics, civil protection, computational chemistry, earth sciences, finance, fusion, geophysics, high energy physics, life sciences, multimedia, material sciences, ...
30
Spectrum of grids, clouds, supercomputers, etc.
Grids: collaborative environment; distributed resources (for political/sociological reasons); commodity hardware (also supercomputers); (HEP) data management; complex interfaces (bug, not feature).
Supercomputers: expensive; low-latency interconnects; applications peer reviewed; parallel/coupled applications; traditional interfaces (login); also supercomputer grids (DEISA, TeraGrid).
Clouds: proprietary (implementation); economies of scale in management; commodity hardware; virtualisation for service provision and for encapsulating the application environment; details of physical resources hidden; simple interfaces (too simple?).
Volunteer computing: simple mechanism to access millions of CPUs; difficult if (much) data is involved; control of the environment; community building, with people involved in science; potential for huge amounts of real work.
Many different problems, amenable to different solutions; there is no single right answer. Consider all of these as a combined e-Infrastructure ecosystem: aim for interoperability, combine the resources into a consistent whole, and keep applications agile so they can operate in many environments.
31
Grid vs. Cloud?
Grid: a distributed computing service
– Integrates distributed resources
– Global single sign-on (use the same credential everywhere)
– Enables (virtual) collaboration
Cloud: a large (remote) data centre
– Economy of scale: centralize resources in large centres
– Virtualisation: enables dynamic provisioning of resources
The technologies are not exclusive:
– In the future our collaborative grid sites will use cloud technologies (virtualisation etc.)
– We will also use cloud resources to supplement our own
32
The CERN Data Centre in Numbers
Data Centre Operations (Tier 0):
– 24x7 operator support and system administration services to support 24x7 operation of all IT services.
– Hardware installation & retirement: ~7,000 hardware movements/year; ~1,800 disk failures/year.
– Management and automation framework for large-scale Linux clusters.

  High-speed routers (640 Mbps → 2.4 Tbps)   24
  Ethernet switches                          350
  10 Gbps ports                              2,000
  Switching capacity                         4.8 Tbps
  1 Gbps ports                               16,939
  10 Gbps ports                              558
  Racks                                      828
  Servers                                    11,728
  Processors                                 15,694
  Cores                                      64,238
  HEPSpec06                                  482,507
  Disks                                      64,109
  Raw disk capacity (TiB)                    63,289
  Memory modules                             56,014
  Memory capacity (TiB)                      158
  RAID controllers                           3,749
  Tape drives                                160
  Tape cartridges                            45,000
  Tape slots                                 56,000
  Tape capacity (TiB)                        34,000
  IT power consumption                       2,456 kW
  Total power consumption                    3,890 kW
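A few ratios follow directly from the table; a small sketch doing only arithmetic on the numbers above, nothing assumed beyond them:

```python
# Derived metrics from the CERN data-centre table (2012 figures).
servers      = 11_728
cores        = 64_238
hepspec06    = 482_507
disks        = 64_109
raw_disk_tib = 63_289
it_power_kw  = 2_456

print(f"HS06 per core      : {hepspec06 / cores:.1f}")                # ~7.5
print(f"cores per server   : {cores / servers:.1f}")                  # ~5.5
print(f"disks per server   : {disks / servers:.1f}")                  # ~5.5
print(f"raw disk per server: {raw_disk_tib / servers:.1f} TiB")       # ~5.4
print(f"IT power per server: {it_power_kw * 1000 / servers:.0f} W")   # ~210
```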
33
Tier 0 Evolution
Consolidation of the existing centre at CERN:
– Project ongoing to add additional critical power in the "barn" & consolidate UPS capacity
– Scheduled to complete October 2012
Remote Tier 0:
– Tendering completed; adjudication done at the March Finance Committee
– Wigner Institute, Budapest, Hungary selected
– Anticipate testing and first equipment installed in 2013; production in 2014, in time for the end of LS1
– Will be a true extension of Tier 0: LHCOne and direct IP connectivity from Budapest (not in the first years); capacity to ramp up
– Use model: as dynamic as feasible (avoid pre-allocation of experiments or types of work)
34
Collaboration - Education
– CERN openlab (Intel, Oracle, Siemens, HP Networking): http://cern.ch/openlab
– CERN School of Computing: http://cern.ch/csc
– UNOSAT: http://cern.ch/unosat
– EC projects: EMI, EGI-Inspire, PARTNER, EULICE, OpenAire
– Citizen Cyber Science Collaboration: involving the general public