Overview & Status – Al-Ain, UAE, November 2007
Outline
- Introduction – the computing challenge: why grid computing?
- Overview of the LCG Project
- Project Status
- Challenges & Outlook
The LHC Computing Challenge
- Signal/noise: ~10^-9
- Data volume: high rate × large number of channels × 4 experiments → 15 PetaBytes of new data each year (a rough back-of-envelope estimate follows this list)
- Compute power: event complexity × number of events × thousands of users → ~100,000 of today's fastest CPUs
- Worldwide analysis & funding: computing is funded locally in major regions & countries; efficient analysis everywhere → GRID technology
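As a rough illustration of where a figure of this order comes from, the sketch below multiplies an assumed event size, trigger rate and live time per experiment. All numeric inputs are illustrative assumptions for the sketch, not official experiment parameters.

```python
# Back-of-envelope estimate of the annual LHC data volume.
# All inputs are illustrative assumptions, not official experiment figures.
event_size_mb = 1.5           # assumed average size of one recorded event (MB)
trigger_rate_hz = 200         # assumed rate of events written to storage, per experiment
live_seconds_per_year = 1e7   # a typical "accelerator year" of effective data taking
experiments = 4               # ALICE, ATLAS, CMS, LHCb

annual_volume_pb = (event_size_mb * trigger_rate_hz
                    * live_seconds_per_year * experiments) / 1e9  # MB -> PB

print(f"Estimated new data per year: {annual_volume_pb:.0f} PB")
# With these assumptions the result is of the same order as the ~15 PB/year on the slide.
```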
Timeline: LHC Computing
ATLAS (or CMS) requirements for the first year at design luminosity, as estimated at successive milestones:
- ATLAS & CMS CTP: 10^7 MIPS, 100 TB disk
- “Hoffmann” Review: 7×10^7 MIPS, 1,900 TB disk
- Computing TDRs: 55×10^7 MIPS (140 MSi2K), 70,000 TB disk
(Timeline 1994–2008, with milestones: LHC approved; ATLAS & CMS approved; ALICE approved; LHCb approved; LHC start.)
Evolution of CPU Capacity at CERN
[Chart: CPU capacity and cost at CERN across accelerator eras – SC (0.6 GeV), PS (28 GeV), ISR (300 GeV), SPS (400 GeV), ppbar (540 GeV), LEP (100 GeV), LEP II (200 GeV), LHC (14 TeV). Costs in 2007 Swiss Francs, including infrastructure (computer centre, power, cooling, ...) and physics tapes.]
Requirements Match
CPU & disk requirements: more than 10 times what CERN alone could provide.
LHC Computing – Multi-science Grid
- MONARC project: first LHC computing architecture – a hierarchical, distributed model
- 2000 – growing interest in grid technology; the HEP community was a main driver in launching the EU DataGrid project: middleware & testbed for an operational grid
- LHC Computing Grid (LCG): deploying the results of DataGrid to provide a production facility for the LHC experiments
The Worldwide LHC Computing Grid
Purpose: develop, build and maintain a distributed computing environment for the storage and analysis of data from the four LHC experiments; ensure the computing service … and common application libraries and tools.
- Phase I – development & planning
- Phase II – deployment & commissioning of the initial services
WLCG Collaboration
The Collaboration:
- 4 LHC experiments
- ~250 computing centres: 12 large centres (Tier-0, Tier-1) and 38 federations of smaller “Tier-2” centres
- growing to ~40 countries
- Grids: EGEE, OSG, NorduGrid
Technical Design Reports – WLCG and the 4 experiments: June 2005
Memorandum of Understanding – agreed in October 2005
Resources – 5-year forward look
LCG Service Hierarchy
Tier-0 – the accelerator centre:
- data acquisition & initial processing
- long-term data curation
- distribution of data to the Tier-1 centres
Tier-1 – “online” to the data acquisition process, high availability:
- managed mass storage – grid-enabled data service
- data-heavy analysis
- national, regional support
Tier-1 centres: Canada – TRIUMF (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands – NIKHEF/SARA (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taiwan – Academia Sinica (Taipei); UK – CLRC (Oxford); US – FermiLab (Illinois) and Brookhaven (NY)
Tier-2 – ~130 centres in ~35 countries:
- end-user (physicist, research group) analysis – where the discoveries are made
- simulation
(The hierarchy is summarised in the code sketch below.)
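For readers who prefer code to prose, here is a minimal sketch that encodes the tier roles listed above as a plain data structure. The structure itself is only an illustration, not part of any LCG software; the Tier-1 abbreviations follow the list on the final slide.

```python
# Illustrative encoding of the LCG service hierarchy described on this slide.
SERVICE_HIERARCHY = {
    "Tier-0 (CERN)": [
        "data acquisition & initial processing",
        "long-term data curation",
        "distribution of data to Tier-1 centres",
    ],
    "Tier-1 (TRIUMF, GridKa, IN2P3, CNAF, SARA/NIKHEF, NDGF, ASCC, RAL, BNL, FNAL, PIC)": [
        "online to the data acquisition process (high availability)",
        "managed mass storage - grid-enabled data service",
        "data-heavy analysis",
        "national / regional support",
    ],
    "Tier-2 (~130 centres in ~35 countries)": [
        "end-user (physicist, research group) analysis",
        "simulation",
    ],
}

for tier, roles in SERVICE_HIERARCHY.items():
    print(tier)
    for role in roles:
        print(f"  - {role}")
```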
Distribution of Computing Services
- about 100,000 CPU cores
- new data will grow at about 15 PetaBytes per year – with two copies
- a significant fraction of the CPU, disk and tape resources is distributed over more than 120 computing centres
Grid Activity – 100K jobs/day
- continuing increase in usage of the EGEE and OSG grids
- all sites reporting accounting data (CERN, Tier-1, -2, -3)
- increase over the past 17 months: 5× in the number of jobs, with CPU usage also increasing
October 2007 – CPU Usage: CERN, Tier-1s, Tier-2s
More than 85% of CPU usage is external to CERN. (* NDGF usage shown is for September 2007.)
Tier-2 Sites – October 2007
30 sites deliver 75% of the CPU.
LHCOPN Architecture
Data Transfer out of Tier-0
Middleware: Baseline Services
The basic baseline services – from the TDR (2005):
- Storage Element – Castor, dCache, DPM (with SRM 1.1); StoRM added in 2007; SRM 2.2 – long delays incurred – being deployed in production
- Basic transfer tools – GridFTP, ...
- File Transfer Service (FTS)
- LCG File Catalog (LFC)
- LCG data management tools – lcg-utils (see the usage sketch after this list)
- POSIX I/O – Grid File Access Library (GFAL)
- Synchronised databases, T0 → T1s – 3D project
- Information System
- Compute Elements – Globus/Condor-C; web services (CREAM)
- gLite Workload Management – in production at CERN
- VO Management System (VOMS)
- VO Boxes
- Application software installation
- Job monitoring tools
... continuing evolution: reliability, performance, functionality, requirements
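As a small usage sketch of the data-management layer referenced above, the following fragment drives the lcg-utils command-line tool lcg-cp to fetch a file identified by an LFC logical file name. It assumes lcg-utils is installed and a valid grid proxy exists; the logical file name in the example call is hypothetical, and option details may vary between releases.

```python
# Minimal sketch of driving the LCG data-management tools (lcg-utils) from Python.
# Assumes lcg-utils is installed and a valid grid proxy exists; option names
# follow common lcg-cp usage but may differ between versions.
import subprocess

def fetch_from_grid(logical_file_name: str, local_path: str, vo: str = "dteam") -> None:
    """Copy a grid file (identified by its LFC logical file name) to local disk."""
    cmd = [
        "lcg-cp",
        "--vo", vo,                    # virtual organisation used for the transfer
        f"lfn:{logical_file_name}",    # source, resolved through the LCG File Catalog
        f"file://{local_path}",        # local destination
    ]
    subprocess.run(cmd, check=True)

# Hypothetical example call:
# fetch_from_grid("/grid/dteam/demo/run1234.root", "/tmp/run1234.root")
```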
Site Reliability – CERN + Tier-1s
“Site reliability” is a function of: grid services, middleware, site operations, storage management systems, networks.
Targets – CERN + Tier-1s (a computation sketch follows below):
                  Before July   July 07   Dec 07   Avg. last 3 months
  Each site           88%         91%       93%            89%
  8 best sites          –           –         –            95%
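The sketch below shows, in simplified form, how availability and reliability percentages like those in the table can be derived from monitoring samples, with reliability excluding scheduled downtime from the denominator. The sample counts are hypothetical, and the real SAM/GridView algorithm is more involved than this.

```python
# Simplified computation of monthly site availability and reliability from
# monitoring samples. Only illustrates that reliability excludes scheduled downtime.
def availability(up_samples: int, total_samples: int) -> float:
    """Fraction of all monitoring samples during which the site passed its tests."""
    return up_samples / total_samples

def reliability(up_samples: int, total_samples: int, scheduled_down_samples: int) -> float:
    """Availability with scheduled-downtime samples removed from the denominator."""
    return up_samples / (total_samples - scheduled_down_samples)

# Hypothetical month of hourly samples (30 days x 24 hours = 720 samples):
print(f"availability: {availability(640, 720):.0%}")    # ~89%
print(f"reliability:  {reliability(640, 720, 20):.0%}")  # ~91%
```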
Tier-2 Site Reliability
83 Tier-2 sites are being monitored.
Improving Reliability
- Monitoring
- Metrics
- Workshops
- Data challenges
- Experience
- Systematic problem analysis
- Priority from software developers
LCG depends on two major science grid infrastructures:
- EGEE – Enabling Grids for E-sciencE
- OSG – US Open Science Grid
LHC Computing – Multi-science Grid (continued)
- MONARC project: first LHC computing architecture – a hierarchical, distributed model
- 2000 – growing interest in grid technology; the HEP community was a main driver in launching the EU DataGrid project: middleware & testbed for an operational grid
- LHC Computing Grid (LCG): deploying the results of DataGrid to provide a production facility for the LHC experiments
- EU EGEE project: starts from the LCG grid as a shared production infrastructure, expanding to other communities and sciences; now preparing its 3rd phase
EGEE – grid infrastructure project co-funded by the European Commission, now in its 2nd phase with 91 partners in 32 countries.
- 240 sites in 45 countries
- 45,000 CPUs
- 12 PetaBytes
- > 5,000 users
- > 100 VOs
- > 100,000 jobs/day
Application domains: Archeology, Astronomy, Astrophysics, Civil Protection, Computational Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Materials Sciences, …
EGEE infrastructure use
> 90k jobs/day from the LHC experiments (LCG); > 143k jobs/day in total. Data from the EGEE accounting system. (LHCC Comprehensive Review, November 2007)
EGEE working with related infrastructure projects
Sustainability: Beyond EGEE-II
- Need to prepare a permanent, common grid infrastructure
- Ensure the long-term sustainability of the European e-infrastructure, independent of short project funding cycles
- Coordinate the integration of, and interaction between, National Grid Infrastructures (NGIs)
- Operate the European level of the production grid infrastructure for a wide range of scientific disciplines, linking the NGIs
- Expand on the idea, and address the problems, of the JRU
EGI – European Grid Initiative
- EGI Design Study: proposal to the European Commission (started September 2007)
- Supported by 37 National Grid Initiatives (NGIs)
- 2-year project to prepare the set-up and operation of a new organisational model for a sustainable pan-European grid infrastructure after the end of EGEE-3
Challenges
Short timescale – preparation for start-up:
- resource ramp-up across Tier-1 and Tier-2 sites
- site and service reliability
Longer term:
- infrastructure – power and cooling
- multi-core CPUs – how will we make best use of them?
- supporting large-scale analysis activities – just starting now – what new problems will arise?
- migration from today’s grid to a model of national infrastructures – how to ensure that LHC gets what it needs
Combined Computing Readiness Challenge – CCRC
A combined challenge by all experiments & sites to validate the readiness of the WLCG computing infrastructure before the start of data taking, at a scale comparable to that needed for data taking in 2008. It should be done well in advance of the start of data taking, to identify flaws and bottlenecks and allow time to fix them.
- Wide battery of tests – run simultaneously by all experiments
- Driven from the DAQ, with full Tier-0 processing
- Site-to-site data transfers, storage system to storage system
- Required functionality and performance
- Data access patterns similar to 2008 processing
- CPU and data loads simulated as required to reach the 2008 scale
- Coordination team in place
- Two test periods: February and May
Ramp-up Needed for Startup
[Charts: installed capacity vs. pledge, and usage vs. target, at April, July and September; the required ramp-up factors range from about 2.3× to 3.7×.]
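The arithmetic behind such ramp-up factors is straightforward; the sketch below computes the factor between installed and required capacity and the implied month-on-month growth rate. The capacity numbers are placeholders, not the actual pledge figures.

```python
# Arithmetic behind a ramp-up factor and the month-on-month growth it implies.
# The capacity numbers are placeholders, not the actual pledge figures.
def ramp_factor(required: float, installed: float) -> float:
    """How many times the installed capacity must grow to reach the requirement."""
    return required / installed

def monthly_growth(factor: float, months: int) -> float:
    """Constant month-on-month growth rate needed to reach `factor` in `months` months."""
    return factor ** (1.0 / months) - 1.0

factor = ramp_factor(required=37.0, installed=10.0)   # e.g. a 3.7x ramp-up
rate = monthly_growth(factor, months=5)               # April -> September
print(f"ramp-up factor: {factor:.1f}x, ~{rate:.0%} growth needed per month")
```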
Summary
- We have an operational grid service for LHC
- EGEE – the European grid infrastructure – is the world’s largest multi-disciplinary grid for science: ~240 sites, > 100 application groups
- Over the next months, before LHC comes on-line: ramp up resources to the MoU levels; improve service reliability and availability; run a full programme of “dress rehearsals” to demonstrate the complete computing system
The Grid is now in operation, working on: reliability, scaling up, sustainability.
Tier-1 Centres: TRIUMF (Canada); GridKa (Germany); IN2P3 (France); CNAF (Italy); SARA/NIKHEF (NL); Nordic Data Grid Facility (NDGF); ASCC (Taipei); RAL (UK); BNL (US); FNAL (US); PIC (Spain)