
1 GridPP: Meeting The Particle Physics Computing Challenge. Tony Doyle, University of Glasgow. AHM05 Meeting, 21 September 2005

2 Contents

"The particle physicists are now well on their way to constructing a genuinely global particle physics Grid to enable them to exploit the massive data streams expected from the Large Hadron Collider in CERN that will turn on in 2007." Tony Hey, AHM 2005

Introduction
1. Why? LHC Motivation ("one in a billion events", "20 million readout channels", "1000s of physicists", "10 million lines of code")
2. What? The World's Largest Grid (according to the Economist)
3. How? "Get Fit Plan" and Current Status (197 sites, 13,797 CPUs, 5 PB storage)
4. When? Accounting and Planning Overview ("50 PetaBytes of data", "100,000 of today's processors", "2007-08")

Reference: http://www.allhands.org.uk/2005/proceedings/papers/349.pdf

3 4 LHC Experiments

ALICE - heavy ion collisions, to create quark-gluon plasmas - 50,000 particles in each collision
LHCb - to study the differences between matter and antimatter - producing over 100 million b and b-bar mesons each year
ATLAS - general purpose: origin of mass, supersymmetry, micro-black holes? - 2,000 scientists from 34 countries
CMS - general purpose detector - 1,800 scientists from 150 institutes

"One Grid to Rule Them All"?

4 Why (particularly) the LHC?

1. Rare Phenomena, Huge Background: the Higgs signal sits roughly 9 orders of magnitude below the rate of all interactions ("one in a billion events")
2. Complexity: "20 million readout channels"
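To make the "9 orders of magnitude" concrete, here is a minimal back-of-the-envelope sketch in Python. The interaction rate of roughly 10^9 collisions per second and the 10^7-second running year are illustrative assumptions, not figures from the slide; only the one-in-a-billion selection factor comes from the slide.

```python
# Back-of-the-envelope illustration of the "one in a billion" selection.
# ASSUMPTION: an interaction rate of ~1e9 per second and a 1e7 s running year (illustrative).
interaction_rate_hz = 1e9          # assumed total interaction rate
selection_fraction = 1e-9          # "one in a billion events" (from the slide)
seconds_per_year = 1e7             # order-of-magnitude accelerator "live" year

interesting_rate_hz = interaction_rate_hz * selection_fraction
print(f"Interesting events per second: {interesting_rate_hz:.1f}")
print(f"Interesting events per running year: {interesting_rate_hz * seconds_per_year:.0f}")
# -> of order one candidate event per second, buried in ~1e16 interactions per year
```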

5 What are the Grid challenges?

Must share data between thousands of scientists with multiple interests
Link major (Tier-0 [Tier-1]) and minor (Tier-1 [Tier-2]) computer centres
Ensure all data accessible anywhere, anytime
Grow rapidly, yet remain reliable for more than a decade
Cope with different management policies of different centres
Ensure data security
Be up and running routinely by 2007

6 What are the Grid challenges? Data Management, Security and Sharing

1. Software process
2. Software efficiency
3. Deployment planning
4. Link centres
5. Share data
6. Manage data
7. Install software
8. Analyse data
9. Accounting
10. Policies

7 Grid Overview

Aim: by 2008 (full year's data taking)
- CPU ~100 MSi2k (100,000 CPUs)
- Storage ~80 PB
- Involving >100 institutes worldwide
- Build on complex middleware being developed in advanced Grid technology projects, both in Europe (gLite) and in the USA (VDT)

1. Prototype went live in September 2003 in 12 countries
2. Extensively tested by the LHC experiments in September 2004
3. Currently 197 sites, 13,797 CPUs, 5 PB storage (September 2005)

8 Tier Structure

Tier 0: CERN computer centre (offline farm, fed by the online system)
Tier 1: national centres (RAL UK, France, Italy, Germany, USA)
Tier 2: regional groups (ScotGrid, NorthGrid, SouthGrid, London)
Tier 3: institutes (e.g. Glasgow, Edinburgh, Durham)
Tier 4: workstations
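As a reading aid only, the tier hierarchy on this slide can be written down as a nested mapping. This is a minimal sketch using the site names from the slide; it is not a data structure used by the LCG/gLite middleware itself.

```python
# Minimal sketch of the tier hierarchy as nested data (site names from the slide).
# Purely illustrative; not part of any middleware.
tier_structure = {
    "Tier 0": {
        "CERN computer centre": {                 # fed by the experiments' online systems
            "Tier 1 (national centres)": {
                "RAL, UK": {
                    "Tier 2 (regional groups)": {
                        "ScotGrid": ["Glasgow", "Edinburgh", "Durham"],  # Tier 3 institutes
                        "NorthGrid": [],
                        "SouthGrid": [],
                        "London": [],
                    }
                },
                "France": {}, "Italy": {}, "Germany": {}, "USA": {},
            }
        }
    }
}
# Tier 4 (workstations) hangs off each institute and is omitted here.
```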

9 Functionality for the LHC Experiments

The basic functionality of the Tier-1s is:
ALICE: Reconstruction, Chaotic Analysis
ATLAS: Reconstruction, Scheduled Analysis/Skimming, Calibration
CMS: Reconstruction
LHCb: Reconstruction, Scheduled Skimming, Analysis

The basic functionality of the Tier-2s is:
ALICE: Simulation Production, Analysis
ATLAS: Simulation, Analysis, Calibration
CMS: Analysis, All Simulation Production
LHCb: Simulation Production, No Analysis

10 Technical Design Reports (June 2005)

Computing Technical Design Reports: http://doc.cern.ch/archive/electronic/cern/preprints/lhcc/public/
ALICE: lhcc-2005-018.pdf
ATLAS: lhcc-2005-022.pdf
CMS: lhcc-2005-023.pdf
LHCb: lhcc-2005-019.pdf
LCG: lhcc-2005-024.pdf
LCG Baseline Services Group Report: http://cern.ch/LCG/peb/bs/BSReport-v1.0.pdf

Contains all you (probably) need to know about LHC computing. End of prototype phase.

11 Timescales

Service Challenges: UK deployment plans
End point: April '07
Context: first real (cosmics) data in '05

12 Baseline Functionality

Each required service is compared across OMII, VDT/GT, LCG/gLite and other providers:

- Storage Element: LCG/gLite: SRM via dCache, DPM or CASTOR. Comment: LCG includes Storage Resource Management capability.
- Basic File Transfer: OMII: Yes; VDT/GT: GridFTP; LCG/gLite: Yes. Comment: LCG includes GridFTP.
- Reliable File Transfer: LCG/gLite: File Transfer Service. Comment: FTS is built on top of GridFTP.
- Catalogue Services: VDT/GT: RLS; LCG/gLite: LCG File Catalogue, gLite FireMan. Comment: central catalogues adequate, high throughput needed.
- Data Management Tools: OMII: OMII Data Service (upload/download); LCG/gLite: LCG tools (replica management, etc.). Comment: gLite File Placement Service under development.
- Compute Element: OMII: OMII Job Service; VDT/GT: Gatekeeper; LCG/gLite: Yes. Comment: LCG uses Globus with mods.
- Workload Management: OMII: manual resource allocation and job submission; VDT/GT: Condor-G; LCG/gLite: Resource Broker. Comment: RB builds on Globus and Condor-G.
- VO Agents: perform localised activities on behalf of a VO.
- VO Membership Services: OMII: tools for account management, no GridMapFile equivalent; VDT/GT: CAS; LCG/gLite: VOMS. Comment: CAS does not provide all the needed functionality.
- Database Services: MySQL, PostgreSQL, ORACLE. Comment: off-the-shelf offerings are adequate.
- Posix-like I/O: LCG/gLite: GFAL, gLite I/O; Other: xrootd.
- Application Software Installation Tools: LCG/gLite: Yes. Comment: tools already exist in LCG-2, e.g. PACMAN.
- Job Monitoring: Other: MonALISA, NetLogger; LCG/gLite: Logging & Bookkeeping service, R-GMA.
- Reliable Messaging: tools such as Jabber are used by the experiments (e.g. DIRAC for LHCb).
- Information System: MDS (GLUE); LCG/gLite: BDII. Comment: LCG is based on BDII and the GLUE schema.

Concentrate on robustness and scale. Experiments have assigned external middleware priorities.

13 Exec 2 Summary

GridPP2 has already met 21% of its original targets, with 86% of the metrics within specification.
"Get fit" deployment plan in place: LCG 2.6 deployed at 16 sites as a preliminary production service.
gLite 1 was released in April as planned, but components have not yet been deployed or their robustness tested by the experiments (1.3 available on the pre-production service).
Service Challenge (SC) 2, addressing networking, was a success at CERN and the RAL Tier-1 in April 2005.
SC3, also addressing file transfers, has just been completed.
Long-term concern: planning for 2007-08 (LHC startup).
Short-term concerns: some under-utilisation of resources and the deployment of Tier-2 resources.
At the end of GridPP2 Year 1, the initial foundations of "The Production Grid" are built. The focus is on "efficiency".

14 People and Roles: more than 100 people in the UK. http://www.gridpp.ac.uk/members/

15 Project Map

16 GridPP Deployment Status, 18/9/05 [2/7/05] (9/1/05)

Total CPU: 3070 [2966] (2029)
Free CPU: 2247 [1666] (1402)
Running jobs: 458 [843] (95)
Waiting jobs: 52 [31] (480)
SE available (TB): 90.89 [74.28] (8.69)
SE used (TB): 31.61 [16.54] (4.55)
Max CPU: 3118 [3145] (2549)
Avg CPU: 2784 [2802] (1994)

Measurable improvements:
1. Sites functional-tested
2. 3000 CPUs
3. Storage via SRM interfaces
4. UK+Ireland federation

17 New Grid Monitoring Maps

Demo: http://gridportal.hep.ph.ic.ac.uk/rtm/
Google Map: http://map.gridpp.ac.uk/

Preliminary Production Grid Status

18 Accounting

19 LCG Tier-1 Planning, RAL, UK

Year:         2006 | 2007        | 2008        | 2009        | 2010
CPU (kSI2K):  980  | 1492 / 1234 | 2712 / 3943 | 4206 / 6321 | 5857 / 10734
Disk (TB):    450  | 841 / 630   | 1484 / 2232 | 2087 / 3300 | 3020 / 5475
Tape (TB):    664  | 1080 / 555  | 2074 / 2115 | 3934 / 4007 | 5710 / 6402

2006: pledged. 2007-10: planned to be pledged, (a) bottom up / (b) top down, with ~50% uncertainty.

20 LCG Tier-1 Planning (CPU & Storage)

Experiment requests are large: e.g. in 2008, CPU ~50 MSi2k and Storage ~50 PB! They can be met globally except in 2008. The UK plans to contribute >7% [currently contributes >10%].

First LCG Tier-1 Compute Law: CPU:Storage ~1 [kSi2k/TB]
Second LCG Tier-1 Storage Law: Disk:Tape ~1
(The number to remember is... 1)
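As a quick illustration of the two rules of thumb on this slide, the sketch below applies them to a round CPU figure. The 2,000 kSi2k input is an example value, not a number from the planning tables, and "Storage" is read here as disk plus tape, which the slide does not spell out.

```python
# Minimal sketch of the Tier-1 "laws" quoted on the slide (ratios only; inputs are illustrative).
def tier1_storage_estimate(cpu_ksi2k: float) -> dict:
    """Apply CPU:Storage ~ 1 kSi2k/TB and Disk:Tape ~ 1 (reading Storage as disk + tape)."""
    total_storage_tb = cpu_ksi2k / 1.0   # First law: ~1 kSi2k of CPU per TB of storage
    disk_tb = total_storage_tb / 2.0     # Second law: disk and tape split roughly evenly
    tape_tb = total_storage_tb - disk_tb
    return {"cpu_ksi2k": cpu_ksi2k, "disk_tb": disk_tb, "tape_tb": tape_tb}

# Example: a hypothetical 2,000 kSi2k Tier-1 share suggests ~1,000 TB disk and ~1,000 TB tape.
print(tier1_storage_estimate(2000.0))
```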

21 LCG Tier-1 Planning (Storage)

22 LCG Tier-2 Planning, UK, Sum of all Federations

Year:         2006 | 2007        | 2008        | 2009        | 2010
CPU (kSI2K):  3800 | 3840 / 1592 | 4830 / 4251 | 5410 / 6127 | 6010 / 9272
Disk (TB):    530  | 540 / 258   | 600 / 1174  | 660 / 2150  | 720 / 3406

2006: pledged. 2007-10: planned to be pledged, (a) bottom up / (b) top down, with ~100% uncertainty.

Third LCG Tier-2 Compute Law: Tier-1:Tier-2 CPU ~1
Fifth LCG Tier-2 Storage Law: CPU:Disk ~5 [kSi2k/TB]
Zeroth LCG Law: there is no Zeroth Law; all is uncertain
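In the same spirit as the Tier-1 sketch above, the snippet below turns the Tier-2 rules of thumb into a tiny estimator. The input value is illustrative rather than taken from the pledge table; only the two ratios come from the slide.

```python
# Minimal sketch of the Tier-2 "laws": Tier-1:Tier-2 CPU ~ 1 and CPU:Disk ~ 5 kSi2k/TB.
def tier2_estimate(tier1_cpu_ksi2k: float) -> dict:
    tier2_cpu_ksi2k = tier1_cpu_ksi2k / 1.0   # Third law: Tier-2 CPU comparable to Tier-1 CPU
    tier2_disk_tb = tier2_cpu_ksi2k / 5.0     # Fifth law: ~5 kSi2k of CPU per TB of disk
    return {"tier2_cpu_ksi2k": tier2_cpu_ksi2k, "tier2_disk_tb": tier2_disk_tb}

# Example with an illustrative 2,000 kSi2k Tier-1: ~2,000 kSi2k of Tier-2 CPU, ~400 TB of disk.
print(tier2_estimate(2000.0))
```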

23 The "Get Fit" Plan

Set SMART (Specific, Measurable, Achievable, Realistic, Time-phased) goals.
Systematic approach and measurable improvements in the deployment area.
See "Grid Deployment and Operations for EGEE, LCG and GridPP" (Jeremy Coles), which provides the context for Grid "efficiency".

24 High level deployment view

The UK is a significant contributor to EGEE (~20%).
CPU utilisation is lower than the 70% target (currently ~55%).
Disk resource is climbing, but its utilisation via Storage Resource Management interfaces is low.

25 High level deployment view (continued)

Site upgrade improvements: ~quarterly upgrades completed within 3 weeks.
Gradual (20%) improvement in site configuration and stability.
Increasing number of accessible job slots.

26 Service Challenges

SC2 (April): RAL joined computing centres around the world in a networking challenge, transferring 60 TeraBytes of data over ten days.
SC3 (September): RAL to CERN (T1-T0) at rates of up to 650 Mb/s; e.g. Edinburgh to RAL (T2-T1) at rates of up to 480 Mb/s. The UKLight service was tested from Lancaster to RAL.
Overall, the File Transfer Service is very reliable, with the failure rate now below 1%.
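For a sense of scale, the SC2 figure quoted above implies a sustained rate of roughly half a gigabit per second. The short sketch below is just that arithmetic, taking 1 TB as 10^12 bytes (an assumption about the units used on the slide).

```python
# Average throughput implied by SC2: 60 TB moved over ten days (figures from the slide).
bytes_moved = 60 * 10**12            # 60 TB, using decimal terabytes
seconds = 10 * 24 * 3600             # ten days

bits_per_second = bytes_moved * 8 / seconds
print(f"Sustained rate: {bits_per_second / 1e6:.0f} Mb/s")   # ~556 Mb/s
# Comparable to the "up to 650 Mb/s" T1-T0 rates quoted for SC3.
```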

27 Middleware Development: Configuration Management, Storage Interfaces, Network Monitoring, Security, Information Services, Grid Data Management

28 gLite Status

1.2: installed on the Grid pre-production service.
1.3: some components have been upgraded.
1.4: upgrades to VOMS and registration tools, plus additional bulk job submission components.
LCG 2.6 is the (August) production release; the UK's R-GMA is incorporated (production and pre-production).
LCG 3 will be based upon gLite.

29 SRM

A single SRM server services incoming file requests (implemented as a web service).
Multiple file servers with Unix filesystems hold the data.
Data transfer is done to/from the file servers, so inbound IP connectivity is essential to make the SRM SE available to the wider Grid.
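The split between one SRM front end and several file servers follows a redirect pattern: a client asks the SRM web service where a file lives, then moves the data directly to or from the file server it is pointed at. The sketch below illustrates that flow only; the endpoint URL, paths and helper functions are hypothetical stand-ins, not the real SRM interface (a real client would speak the SRM web-service protocol and GridFTP).

```python
# Illustrative sketch of the SRM redirect pattern described on the slide.
# The helpers are stand-ins: real clients use the SRM protocol and GridFTP.

SRM_ENDPOINT = "https://srm.example-site.ac.uk:8443/srm"   # hypothetical SE front end

def request_transfer_url(endpoint: str, logical_path: str) -> str:
    """Stand-in for the SRM web-service call that resolves a file to a transfer URL."""
    # A real SRM server would pick one of its file servers and return something like:
    return f"gsiftp://diskserver3.example-site.ac.uk{logical_path}"

def transfer(transfer_url: str, local_path: str) -> None:
    """Stand-in for the actual data movement, done directly against the file server,
    which is why the file servers need inbound IP connectivity from the wider Grid."""
    print(f"would copy {transfer_url} -> {local_path}")

def fetch_file(logical_path: str, local_path: str) -> None:
    url = request_transfer_url(SRM_ENDPOINT, logical_path)  # step 1: ask the SRM front end
    transfer(url, local_path)                               # step 2: move the data directly

fetch_file("/dpm/example-site.ac.uk/home/atlas/aod/file001.root", "file001.root")
```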

30 Data Management

File metadata: Logical File Name, GUID, system metadata (owner, permissions, checksum, ...)
User metadata: user-defined metadata
File replica: Storage File Name, Storage Host
Symlinks: Link Name
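To show how the catalogue entities on this slide relate to each other, here is a minimal sketch using Python dataclasses. The class and field names simply mirror the slide, and the example values are made up; this is not the actual LFC or FireMan schema.

```python
# Minimal sketch of the catalogue model on the slide (not the real LFC/FireMan schema).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Replica:
    storage_file_name: str        # SFN of this particular copy
    storage_host: str             # the storage element holding it

@dataclass
class CatalogueEntry:
    logical_file_name: str        # human-readable LFN
    guid: str                     # globally unique identifier
    owner: str                    # system metadata
    permissions: str
    checksum: str
    user_metadata: dict = field(default_factory=dict)    # user-defined key/value pairs
    replicas: List[Replica] = field(default_factory=list)
    symlinks: List[str] = field(default_factory=list)    # alternative link names

entry = CatalogueEntry(
    logical_file_name="/grid/atlas/aod/file001.root",     # illustrative values only
    guid="example-guid-0001",
    owner="someuser", permissions="rw-r--r--", checksum="ad:12345678",
    replicas=[Replica("srm://se.example.ac.uk/atlas/file001.root", "se.example.ac.uk")],
)
```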


32 Application Development

e.g. Reprocessing DØ data with SAMGrid (Frederic Villeneuve-Seguier)
Applications include: ATLAS, LHCb, CMS, BaBar (SLAC), SAMGrid (Fermilab), QCDGrid

33 Workload Management

Efficiency overview, integrated over all VOs and RBs:
Successes/day: 12722
Success rate: 67%, improving from 42% to 70-80% during 2005
Problems identified: half WMS (Grid), half JDL (User)

34 LHC VOs

Successes/day: ALICE N/A; ATLAS 2435; CMS 448; LHCb 3463
Success %: ALICE 53%; ATLAS 84%; CMS 59%; LHCb 68%

Note: some caveats, see http://egee-jra2.web.cern.ch/EGEE-JRA2/QoS/JobsMetrics/JobMetrics.htm

Selection by experiments of "production sites" using Site Functional Tests (currently ~110 of the 197 sites), or use of pre-test software agents, leads to >90% experiment production efficiency.

35 "UK contributes to EGEE's battle with malaria"

BioMed: Successes/day 1107, Success % 77%
WISDOM (Wide In Silico Docking On Malaria): the first biomedical data challenge for drug discovery, which ran on the EGEE Grid production service from 11 July 2005 until 19 August 2005.
GridPP resources in the UK contributed ~100,000 kSI2k-hours from 9 sites.
[Figures: number of biomedical jobs processed by country; normalised CPU hours contributed to the biomedical VO for UK sites, July-August 2005]

36 1. Why? 2. What? 3. How? 4. When?

From the Particle Physics perspective the Grid is:
1. Why? Mainly (but not just) for physicists; more generally for those needing to utilise large-scale computing resources efficiently and securely.
2. What? a) a working production-scale system running today; b) about seamless discovery of computing resources; c) using evolving standards for interoperation; d) the basis for computing in the 21st century; e) not (yet) as seamless, robust or efficient as end-users need.
3. How? Methods outlined; please come to the PPARC stand and Jeremy Coles' talk.
4. When? a) now, at "preliminary production service" level, for simple(r) applications (e.g. experiment Monte Carlo production); b) 2007 for a fully tested 24x7 LHC service (a large distributed computing resource) for more complex applications (e.g. data analysis); c) planned to meet the LHC Computing Challenge.

