
1 Grid Computing in HIGH ENERGY Physics
Challenges and Opportunities
Dr. Ian Bird, LHC Computing Grid Project Leader
Göttingen Tier 2 Inauguration, 13th May 2008

2 The scales

3 High Energy Physics machines and detectors
√s = 14 TeV; L = 10^34 cm^-2 s^-1; L = … cm^-2 s^-1
[Detector diagram labels: muon chambers, calorimeter]
- 2.5 million collisions per second; LVL1: 10 kHz, LVL3: … Hz; 25 MB/s of digitized recording
- 40 million collisions per second; LVL1: 1 kHz, LVL3: 100 Hz; 0.1 to 1 GB/s of digitized recording
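For orientation, the recording bandwidth follows from the final trigger rate times the event size. A minimal back-of-the-envelope sketch (Python), where the 1–10 MB event sizes are illustrative assumptions rather than figures from the slide:

```python
# Back-of-envelope: recorded data rate = trigger output rate x event size.
# The event sizes below are illustrative assumptions, not figures from the slide.
collision_rate_hz = 40e6       # 40 million collisions per second
lvl3_rate_hz = 100             # events kept after the final trigger level
for event_size_mb in (1, 10):  # assumed raw event size in megabytes
    rate_gb_s = lvl3_rate_hz * event_size_mb / 1024
    rejection = collision_rate_hz / lvl3_rate_hz
    print(f"{event_size_mb} MB/event -> {rate_gb_s:.2f} GB/s "
          f"(overall trigger rejection ~1 in {rejection:.0e})")
# With 1-10 MB per event this lands in the 0.1-1 GB/s band quoted on the slide.
```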

4 LHC: 4 experiments … ready! First physics expected in autumn 2008
Is the computing ready?

5 The LHC Computing Challenge
Signal/noise: 10^-9
Data volume: high rate × large number of channels × 4 experiments → 15 PetaBytes of new data each year
Compute power: event complexity × number of events × thousands of users → ~100k of (today's) fastest CPUs
Worldwide analysis & funding: computing is funded locally in the major regions and countries, yet efficient analysis must be possible everywhere → GRID technology
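As a rough consistency check on the 15 PB/year figure, a short sketch (Python). The per-experiment recording rate and annual live time are illustrative assumptions, broadly consistent with the 0.1–1 GB/s band on slide 3, not numbers taken from this slide:

```python
# Back-of-envelope for the annual raw-data volume quoted on the slide (~15 PB/year).
# Assumed inputs (illustrative, not from the slide):
recording_rate_gb_s = 0.3   # average recording rate per experiment, GB/s
live_seconds = 1e7          # a typical "accelerator year" of live data taking
experiments = 4             # ALICE, ATLAS, CMS, LHCb

raw_pb_per_year = recording_rate_gb_s * live_seconds * experiments / 1e6  # GB -> PB
print(f"~{raw_pb_per_year:.0f} PB/year of raw data")   # ~12 PB/year
# Adding derived and simulated data brings the total to the ~15 PB/year
# order of magnitude quoted on the slide.
```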

6 A collision at LHC
Luminosity: 10^34 cm^-2 s^-1; bunch crossings at 40 MHz – one every 25 ns
~20 events overlaying in each crossing (pile-up)
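The ~20 overlapping events follow from the luminosity, the inelastic cross-section and the crossing rate. A minimal sketch (Python), assuming an inelastic proton–proton cross-section of roughly 80 mb, which is not stated on the slide:

```python
# Pile-up estimate: interactions per crossing = L * sigma_inelastic / crossing rate.
luminosity = 1e34            # cm^-2 s^-1 (design luminosity, from the slide)
sigma_inel_cm2 = 80e-27      # ~80 mb inelastic pp cross-section (assumed, not on the slide)
crossing_rate_hz = 40e6      # bunch crossings per second (from the slide)

interaction_rate = luminosity * sigma_inel_cm2          # ~8e8 interactions/s
pileup = interaction_rate / crossing_rate_hz            # interactions per crossing
print(f"~{pileup:.0f} overlapping events per crossing") # ~20
```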

7 The Data Acquisition

8 Tier 0 at CERN: Acquisition, First pass reconstruction, Storage & Distribution
[Speaker note: stress the continuous operation over several months, and the link between collisions, experiments and computing – if the reliability of the computing is not good enough, it can impact the DAQ and ultimately the quality of the science. This leads naturally on to slide 10.]
[Figure label: 1.25 GB/s (ions)]

9 Tier 0 – Tier 1 – Tier 2
Tier-0 (CERN): data recording; first-pass reconstruction; data distribution
Tier-1 (11 centres): permanent storage; re-processing; analysis
Tier-2 (>200 centres): simulation; end-user analysis
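Purely for illustration, the division of responsibilities can be written down as a small lookup table; a minimal sketch (Python) whose structure and names are invented here, not taken from any WLCG software:

```python
# Illustrative sketch of the WLCG tier roles described on the slide.
# The data structure and function name are invented for this example.
TIER_ROLES = {
    "Tier-0": ["data recording", "first-pass reconstruction", "data distribution"],
    "Tier-1": ["permanent storage", "re-processing", "analysis"],
    "Tier-2": ["simulation", "end-user analysis"],
}

def responsibilities(tier: str) -> list[str]:
    """Return the responsibilities of a given tier, as listed on the slide."""
    return TIER_ROLES.get(tier, [])

print(responsibilities("Tier-1"))  # ['permanent storage', 're-processing', 'analysis']
```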

10 Evolution of requirements
ATLAS (or CMS) requirements for the first year at design luminosity:
- ATLAS & CMS CTP: 10^7 MIPS, 100 TB disk
- "Hoffmann" Review: 7×10^7 MIPS, 1,900 TB disk
- Computing TDRs: 55×10^7 MIPS (140 MSi2K), 70,000 TB disk
[Timeline 1994–2008, with milestones: LHC approved, ATLAS & CMS approved, ALICE approved, LHCb approved, LHC start]

11 Evolution of CPU Capacity at CERN
Tape & disk requirements: more than 10 times what CERN can provide
[Chart: evolution of CPU capacity at CERN across accelerator generations – SC (0.6 GeV), PS (28 GeV), ISR (300 GeV), SPS (400 GeV), p-pbar (540 GeV), LEP (100 GeV), LEP II (200 GeV), LHC (14 TeV). Costs in 2007 Swiss Francs, including infrastructure (computer centre, power, cooling, …) and physics tapes.]

12 Evolution of Grids
[Timeline 1994–2008: GriPhyN, iVDGL, PPDG → Grid3 → OSG; EU DataGrid → EGEE 1 → EGEE 2 → EGEE 3; LCG 1 → LCG 2 → WLCG. Milestones: Data Challenges, Service Challenges, cosmics, first physics.]

13 The Worldwide LHC Computing Grid
Purpose: develop, build and maintain a distributed computing environment for the storage and analysis of data from the four LHC experiments; ensure the computing service … and common application libraries and tools.
Phase I – development & planning
Phase II – deployment & commissioning of the initial services

14 WLCG Collaboration
The Collaboration: 4 LHC experiments; ~250 computing centres; 12 large centres (Tier-0, Tier-1); 56 federations of smaller "Tier-2" centres; growing to ~40 countries; grids: EGEE, OSG, NorduGrid.
Technical Design Reports: WLCG and the 4 experiments, June 2005.
Memorandum of Understanding: agreed in October 2005; resources committed on a 5-year forward look.
MoU signing status – Tier-1: all have now signed. Tier-2: Australia, Belgium, Canada*, China, Czech Republic*, Denmark, Estonia, Finland, France, Germany*, Hungary*, Italy, India, Israel, Japan, JINR, Korea, Netherlands, Norway*, Pakistan, Poland, Portugal, Romania, Russia, Slovenia, Spain, Sweden*, Switzerland, Taipei, Turkey*, UK, Ukraine, USA. Still to sign: Austria, Brazil (under discussion). (* recent additions)

15 WLCG Service Hierarchy
Tier-0 – the accelerator centre: data acquisition & initial processing; long-term data curation; distribution of data to the Tier-1 centres.
Tier-1 – "online" to the data acquisition process, hence high availability: managed mass storage (grid-enabled data service); data-heavy analysis; national and regional support. Tier-1 centres: Canada – TRIUMF (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands – NIKHEF/SARA (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taiwan – Academia Sinica (Taipei); UK – CLRC (Oxford); US – FermiLab (Illinois) and Brookhaven (NY).
Tier-2 – ~130 centres in ~35 countries: end-user (physicist, research group) analysis – where the discoveries are made; simulation.

16 Recent grid use
Across all grid infrastructures (EGEE, OSG, NorduGrid). The grid concept really works – all contributions, large and small, are essential!
CERN: 11%; Tier-1: 35%; Tier-2: 54%.

17 Recent grid activity
WLCG ran ~44 million jobs in 2007, and the workload has continued to increase: 29 million jobs so far in 2008, now at more than 300k jobs/day (up from ~230k/day).
The distribution of work across Tier-0, Tier-1 and Tier-2 illustrates the importance of the grid system: the Tier-2 contribution is around 50%, and more than 85% of the work runs outside CERN.
These workloads (reported across all WLCG centres) are at the level anticipated for 2008 data taking.
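A quick sanity check on these job counts, as a sketch (Python); the totals are from the slide, the day counts and averaging are mine:

```python
# Average daily job rate implied by the annual totals quoted on the slide.
jobs_2007 = 44e6          # ~44 million jobs run in 2007
jobs_2008_so_far = 29e6   # ~29 million jobs by roughly mid-May 2008 (talk date)

avg_2007 = jobs_2007 / 365
avg_2008 = jobs_2008_so_far / 135      # ~135 days elapsed by mid-May (assumption)
print(f"2007 average: ~{avg_2007/1e3:.0f}k jobs/day")   # ~121k/day
print(f"2008 average: ~{avg_2008/1e3:.0f}k jobs/day")   # ~215k/day, with peaks above 300k/day
```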

18 LHCOPN Architecture

19 Data Transfer out of Tier-0
Target for 2008/2009: 1.3 GB/s
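To put the 1.3 GB/s export target in perspective, a short sketch (Python) of the implied daily volume and per-site share; the even split across 11 Tier-1s is a simplifying assumption:

```python
# Daily export volume implied by the Tier-0 transfer target on the slide.
target_gb_s = 1.3
seconds_per_day = 86_400
tier1_sites = 11              # from slide 9; equal sharing is an assumption here

daily_tb = target_gb_s * seconds_per_day / 1000
per_site_mb_s = target_gb_s * 1000 / tier1_sites
print(f"~{daily_tb:.0f} TB exported per day")              # ~112 TB/day
print(f"~{per_site_mb_s:.0f} MB/s per Tier-1 on average")  # ~118 MB/s
```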

20 Production Grids
WLCG relies on a production-quality infrastructure. This requires standards of availability/reliability, performance and manageability, and the service is used 365 days a year ... (it has been for several years!).
Tier-1s must store the data for at least the lifetime of the LHC – ~20 years. This is not passive: it requires active migration to newer media.
It is vital that we build a fault-tolerant and reliable system that can deal with individual sites being down and recover; a rough availability sketch follows below.
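The availability sketch mentioned above (Python), showing why redundancy across sites matters for fault tolerance; the per-site availability figures are illustrative assumptions, not WLCG targets from this slide:

```python
# If a dataset (or service) is replicated at n independent sites, it is
# unavailable only when every replica is down at the same time.
# Per-site availability values below are illustrative assumptions.
def combined_availability(site_availability: float, replicas: int) -> float:
    return 1.0 - (1.0 - site_availability) ** replicas

for a in (0.90, 0.95):
    for n in (1, 2, 3):
        print(f"site availability {a:.0%}, {n} replica(s): "
              f"{combined_availability(a, n):.3%}")
# e.g. two 90%-available sites already give ~99% combined availability,
# assuming failures are independent (which they are not always, in practice).
```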

21 The EGEE Production Infrastructure
Support structures & processes: Operations Coordination Centre; Regional Operations Centres; Global Grid User Support; EGEE Network Operations Centre (SA2); Operational Security Coordination Team.
Test-beds & services: production service; pre-production service; certification test-beds (SA3); training infrastructure (NA4); training activities (NA3).
Security & policy groups: Operations Advisory Group (+NA4); Joint Security Policy Group; EuGridPMA (& IGTF); Grid Security Vulnerability Group.

22 Site Reliability
[Monthly site-reliability table, Sep 07 – Feb 08:
All sites: 89%, 86%, 92%, 87%, 84%
8 best sites: 93%, 95%, 96%
Above target (+ >90% target): 7 + 2, 5 + 4, 9 + 2, 6 + 4, 7 + 3]
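A small sketch (Python) of the kind of check behind the "above target" row: counting how many of the reported monthly averages reach a 90% reliability threshold. The 90% figure comes from the slide's ">90% target" wording; the rest is illustrative:

```python
# Check reported monthly "All sites" reliability values against a 90% threshold.
monthly_all_sites = [0.89, 0.86, 0.92, 0.87, 0.84]  # values from the slide
target = 0.90

above = [r for r in monthly_all_sites if r >= target]
mean = sum(monthly_all_sites) / len(monthly_all_sites)
print(f"{len(above)} of {len(monthly_all_sites)} months at or above "
      f"the {target:.0%} threshold; mean reliability {mean:.1%}")
# -> 1 of 5 months above the threshold; mean ~87.6%
```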

23 Improving Reliability
Monitoring; metrics; workshops; data challenges; experience; systematic problem analysis; priority from software developers.

24 Gridmap

25 Middleware: Baseline Services
The basic baseline services – from the TDR (2005):
- Storage Elements: Castor, dCache, DPM (StoRM added in 2007); SRM 2.2 deployed in production in December 2007
- Basic transfer tools: GridFTP, …
- File Transfer Service (FTS)
- LCG File Catalog (LFC)
- LCG data management tools (lcg-utils)
- POSIX I/O – Grid File Access Library (GFAL)
- Synchronised databases Tier-0 → Tier-1s: 3D project
- Information system (scalability improvements)
- Compute Elements: Globus/Condor-C; improvements to the LCG-CE for scale/reliability; web services (CREAM); support for multi-user pilot jobs (glexec, SCAS)
- gLite Workload Management in production
- VO Management System (VOMS)
- VO Boxes
- Application software installation
- Job monitoring tools
Focus is now on the continuing evolution of reliability, performance, functionality and requirements. For a production grid the middleware must allow us to build fault-tolerant and scalable services: this is more important than sophisticated functionality (a minimal retry sketch follows below).
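The retry sketch mentioned above (Python), illustrating what "fault-tolerant rather than sophisticated" means in practice: retrying a flaky operation with exponential backoff. The `transfer` function is a hypothetical stand-in, not part of FTS, GFAL or lcg-utils:

```python
import random
import time

def transfer(source: str, destination: str) -> None:
    """Hypothetical stand-in for a grid file transfer; fails ~30% of the time."""
    if random.random() < 0.3:
        raise IOError(f"transfer {source} -> {destination} failed")

def transfer_with_retries(source: str, destination: str,
                          attempts: int = 5, base_delay: float = 0.5) -> bool:
    """Retry a transfer with exponential backoff; return True on success."""
    for attempt in range(attempts):
        try:
            transfer(source, destination)
            return True
        except IOError:
            time.sleep(base_delay * 2 ** attempt)  # back off before retrying
    return False

if __name__ == "__main__":
    ok = transfer_with_retries("srm://tier0.example/file", "srm://tier1.example/file")
    print("transfer succeeded" if ok else "transfer failed after retries")
```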

26 Database replication
In full production: several GB/day of user data can be sustained to all Tier-1s.
~100 database nodes at CERN and several tens of nodes at Tier-1 sites – a very large distributed database deployment.
Used for several applications: experiment calibration data; replication of the (central, read-only) file catalogues.

27 LCG depends on two major science grid infrastructures …
EGEE – Enabling Grids for E-Science
OSG – US Open Science Grid
Interoperability & interoperation is vital; significant effort has gone into building the procedures to support it.

28 EGEE: grid infrastructure project co-funded by the European Commission – now in its 2nd phase, with 91 partners in 32 countries.
240 sites in 45 countries; 45,000 CPUs; 12 PetaBytes; >5,000 users; >100 VOs; >100,000 jobs/day.
Application domains: archaeology, astronomy, astrophysics, civil protection, computational chemistry, earth sciences, finance, fusion, geophysics, high energy physics, life sciences, multimedia, materials science.
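For scale, a quick sketch (Python) relating the quoted job throughput to the CPU count; the implied average job length is a derived illustration, not a figure from the slide:

```python
# Relate EGEE's quoted throughput (>100,000 jobs/day) to its ~45,000 CPUs.
cpus = 45_000
jobs_per_day = 100_000

jobs_per_cpu_per_day = jobs_per_day / cpus
avg_job_hours = 24 / jobs_per_cpu_per_day   # assumes CPUs are fully busy with these jobs
print(f"~{jobs_per_cpu_per_day:.1f} jobs per CPU per day "
      f"-> average job length ~{avg_job_hours:.0f} h if CPUs were fully occupied")
# ~2.2 jobs/CPU/day -> ~11 h average job, an upper bound since not all CPU time
# necessarily goes to these jobs.
```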

29 EGEE: Increasing workloads
About ⅓ of the workload is non-LHC.

30 Grid Applications
Medical, seismology, chemistry, astronomy, particle physics, fusion.

31 Share of EGEE resources
HEP, May 2007 – April 2008: 45 million jobs

32 HEP use of EGEE: May 07 – Apr 08

33 The next step

34 Sustainability: Beyond EGEE-II
Need to prepare a permanent, common grid infrastructure:
- ensure the long-term sustainability of the European e-infrastructure, independent of short project funding cycles;
- coordinate the integration and interaction between National Grid Infrastructures (NGIs);
- operate the European level of the production grid infrastructure for a wide range of scientific disciplines, linking the NGIs;
- expand the idea and problems of the JRU (Joint Research Unit).

35 EGI – European Grid Initiative
EGI Design Study: proposal to the European Commission (started September 2007), supported by 37 National Grid Initiatives (NGIs). A 2-year project to prepare the setup and operation of a new organizational model for a sustainable pan-European grid infrastructure after the end of EGEE-3.

36 Summary
We have an operating production-quality grid infrastructure that:
- is in continuous use by all 4 experiments (and many other applications);
- is still growing in size – sites, resources (and still has to finish the ramp-up for LHC start-up);
- demonstrates interoperability (and interoperation!) between 3 different grid infrastructures (EGEE, OSG, NorduGrid);
- is becoming more and more reliable;
- is ready for LHC start-up.
For the future we must:
- learn how to reduce the effort required for operation;
- tackle upcoming infrastructure issues (e.g. power, cooling);
- manage the migration of the underlying infrastructures to longer-term models;
- be ready to adapt the WLCG service to new ways of doing distributed computing.

37

