GridPP: Meeting The Particle Physics Computing Challenge
Tony Doyle, University of Glasgow
AHM05 Meeting, 21 September 2005

Contents

"The particle physicists are now well on their way to constructing a genuinely global particle physics Grid to enable them to exploit the massive data streams expected from the Large Hadron Collider in CERN that will turn on in 2007." Tony Hey, AHM 2005

Introduction
1. Why? LHC Motivation ("one in a billion events", "20 million readout channels", "1000s of physicists", "10 million lines of code")
2. What? The World's Largest Grid (according to The Economist)
3. How? The "Get Fit Plan" and Current Status (197 sites, 13,797 CPUs, 5 PB storage)
4. When? Accounting and Planning Overview ("50 PetaBytes of data", "100,000 of today's processors")

Reference:

4 LHC Experiments

ALICE: heavy ion collisions, to create quark-gluon plasmas; 50,000 particles in each collision
LHCb: to study the differences between matter and antimatter; producing over 100 million b and b-bar mesons each year
ATLAS: general purpose (origin of mass, supersymmetry, micro-black holes?); 2,000 scientists from 34 countries
CMS: general purpose detector; 1,800 scientists from 150 institutes

"One Grid to Rule Them All"?

Why (particularly) the LHC?

1. Rare phenomena, huge background: the Higgs sits 9 orders of magnitude below all interactions ("one in a billion events")
2. Complexity: "20 million readout channels"

What are the Grid challenges?

Must share data between thousands of scientists with multiple interests
Link major (Tier-0 [Tier-1]) and minor (Tier-1 [Tier-2]) computer centres
Ensure all data accessible anywhere, anytime
Grow rapidly, yet remain reliable for more than a decade
Cope with different management policies of different centres
Ensure data security
Be up and running routinely by 2007

What are the Grid challenges? Data Management, Security and Sharing

1. Software process  2. Software efficiency  3. Deployment planning  4. Link centres  5. Share data
6. Manage data  7. Install software  8. Analyse data  9. Accounting  10. Policies

Grid Overview

Aim: by 2008 (full year's data taking):
- CPU ~100 MSI2k (100,000 CPUs)
- Storage ~80 PB
- Involving >100 institutes worldwide
- Build on complex middleware being developed in advanced Grid technology projects, both in Europe (gLite) and in the USA (VDT)

1. Prototype went live in September 2003 in 12 countries
2. Extensively tested by the LHC experiments in September
3. Currently 197 sites, 13,797 CPUs, 5 PB storage (September 2005)

Tier Structure

Tier 0: CERN computer centre (online system, offline farm)
Tier 1: National centres (RAL UK, France, Italy, Germany, USA)
Tier 2: Regional groups (ScotGrid, NorthGrid, SouthGrid, London)
Tier 3: Institutes (Glasgow, Edinburgh, Durham)
Tier 4: Workstations

Functionality for the LHC Experiments

The basic functionality of the Tier-1s is:
ALICE: Reconstruction, Chaotic Analysis
ATLAS: Reconstruction, Scheduled Analysis/skimming, Calibration
CMS: Reconstruction
LHCb: Reconstruction, Scheduled skimming, Analysis

The basic functionality of the Tier-2s is:
ALICE: Simulation Production, Analysis
ATLAS: Simulation, Analysis, Calibration
CMS: Analysis, All Simulation Production
LHCb: Simulation Production, No analysis

Technical Design Reports (June 2005)

Computing Technical Design Reports (preprints/lhcc/public/): ALICE, ATLAS, CMS, LHCb and LCG
LCG Baseline Services Group Report

Contains all you (probably) need to know about LHC computing. End of prototype phase.

Timescales

Service Challenges: UK deployment plans. End point: April '07. Context: first real (cosmics) data '05.

Baseline Functionality

Requirement (OMII / VDT-GT / LCG-gLite / Other, with comment):
- Storage Element: LCG/gLite yes, SRM via dCache, DPM or CASTOR. LCG includes Storage Resource Management capability.
- Basic File Transfer: OMII yes; VDT/GT GridFTP; LCG/gLite yes. LCG includes GridFTP.
- Reliable File Transfer: LCG/gLite File Transfer Service. FTS is built on top of GridFTP.
- Catalogue Services: VDT/GT RLS; LCG/gLite LCG File Catalogue, gLite FireMan. Central catalogues adequate, high throughput needed.
- Data Management tools: OMII Data Service (upload/download); LCG tools (replica management, etc.). gLite File Placement Service under development.
- Compute Element: OMII Job Service; VDT/GT Gatekeeper; LCG/gLite yes. LCG uses Globus with mods.
- Workload Management: OMII manual resource allocation and job submission; VDT/GT Condor-G; LCG/gLite Resource Broker. RB builds on Globus, Condor-G.
- VO Agents: perform localised activities on behalf of a VO.
- VO Membership Services: OMII tools for account management, no GridMapFile equivalent; VDT/GT CAS; LCG/gLite VOMS. CAS does not provide all the needed functionality.
- Database Services: MySQL, PostgreSQL, ORACLE. Off-the-shelf offerings are adequate.
- Posix-like I/O: LCG/gLite GFAL, gLite I/O; other: xrootd.
- Application Software Installation Tools: LCG/gLite yes. Tools already exist in LCG-2, e.g. PACMAN.
- Job Monitoring: other: MonALISA, Netlogger; LCG/gLite Logging & Bookkeeping service, R-GMA.
- Reliable Messaging: tools such as Jabber are used by experiments (e.g. DIRAC for LHCb).
- Information System: VDT/GT MDS (GLUE); LCG/gLite BDII. LCG based on BDII and GLUE schema.

Concentrate on robustness and scale. Experiments have assigned external middleware priorities.

Exec 2 Summary

GridPP2 has already met 21% of its original targets, with 86% of the metrics within specification.
"Get fit" deployment plan in place: LCG 2.6 deployed at 16 sites as a preliminary production service.
gLite 1 was released in April as planned, but components have not yet been deployed or their robustness tested by the experiments (1.3 available on the pre-production service).
Service Challenge (SC) 2, addressing networking, was a success at CERN and the RAL Tier-1 in April 2005.
SC3, also addressing file transfers, has just been completed.
Long-term concern: planning for LHC startup.
Short-term concerns: some under-utilisation of resources and the deployment of Tier-2 resources.

At the end of GridPP2 Year 1, the initial foundations of "The Production Grid" are built. The focus is on "efficiency".

People and Roles

More than 100 people in the UK.

Project Map

GridPP Deployment Status

Snapshot 18/9/05, with [2/7/05] and (9/1/05) shown for comparison:
totalCPU: 3070 [2966] (2029)
freeCPU: 2247 [1666] (1402)
runJob: 458 [843] (95)
waitJob: 52 [31] (480)
seAvail TB: [74.28] (8.69)
seUsed TB: [16.54] (4.55)
maxCPU: 3118 [3145] (2549)
avgCPU: 2784 [2802] (1994)

Measurable improvements:
1. Sites functional-tested
2. CPUs
3. Storage via SRM interfaces
4. UK+Ireland federation

New Grid Monitoring Maps

Demo and Google Map. Preliminary Production Grid status.

Accounting

LCG Tier-1 Planning (RAL, UK)

Table of pledged and planned-to-be-pledged resources: CPU (kSI2K), Disk (TBytes), Tape (TBytes).
Estimates are (a) bottom up and (b) top down, with ~50% uncertainty.

LCG Tier-1 Planning (CPU & Storage)

Experiment requests are large: e.g. in 2008, CPU ~50 MSI2k and storage ~50 PB! They can be met globally, with exceptions. The UK plans to contribute >7% [currently contributes >10%].

First LCG Tier-1 Compute Law: CPU:Storage ~1 [kSI2k/TB]
Second LCG Tier-1 Storage Law: Disk:Tape ~1
(The number to remember is... 1.)
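Read as a back-of-envelope calculation, the two laws reproduce the storage request quoted above. A minimal sketch (illustrative only; the 50 MSI2k figure is the 2008 request cited on this slide, and the even disk/tape split is one reading of the second law):

```python
# Back-of-envelope use of the Tier-1 "laws" quoted above. These are
# planning rules of thumb, not exact figures.

cpu_ksi2k = 50_000.0          # 2008 CPU request, ~50 MSI2k

# First law: CPU:Storage ~ 1 kSI2k per TB  =>  storage (TB) ~ CPU (kSI2k)
storage_tb = cpu_ksi2k / 1.0  # ~50,000 TB, i.e. ~50 PB (matches the slide)

# Second law: Disk:Tape ~ 1  =>  split the storage roughly evenly
disk_tb = storage_tb / 2.0    # ~25 PB of disk
tape_tb = storage_tb / 2.0    # ~25 PB of tape

print(f"storage ~{storage_tb / 1000:.0f} PB "
      f"(disk ~{disk_tb / 1000:.0f} PB, tape ~{tape_tb / 1000:.0f} PB)")
```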

LCG Tier-1 Planning (Storage)

LCG Tier-2 Planning (UK, Sum of all Federations)

Table of pledged and planned-to-be-pledged resources: CPU (kSI2K), Disk (TBytes).
2006: pledged; beyond that, (a) bottom up and (b) top down estimates, with ~100% uncertainty.

Third LCG Tier-2 Compute Law: Tier-1:Tier-2 CPU ~1
Fifth LCG Tier-2 Storage Law: CPU:Disk ~5 [kSI2k/TB]
Zeroth LCG Law: there is no Zeroth law; all is uncertain.
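Continuing the same back-of-envelope style for the Tier-2 laws (a sketch only; the Tier-1 CPU figure is carried over from the previous slide's 2008 request):

```python
# Sketch applying the Tier-2 rules of thumb quoted above.

tier1_cpu_ksi2k = 50_000.0             # Tier-1 CPU from the 2008 request (~50 MSI2k)

# Third law: Tier-1:Tier-2 CPU ~ 1  =>  comparable aggregate CPU at Tier-2
tier2_cpu_ksi2k = tier1_cpu_ksi2k

# Fifth law: Tier-2 CPU:Disk ~ 5 kSI2k per TB  =>  disk ~ CPU / 5
tier2_disk_tb = tier2_cpu_ksi2k / 5.0  # ~10,000 TB, i.e. ~10 PB of Tier-2 disk

print(f"Tier-2 CPU ~{tier2_cpu_ksi2k / 1000:.0f} MSI2k, "
      f"disk ~{tier2_disk_tb / 1000:.0f} PB")
```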

The "Get Fit" Plan

Set SMART (Specific, Measurable, Achievable, Realistic, Time-phased) goals.
Systematic approach and measurable improvements in the deployment area.
See "Grid Deployment and Operations for EGEE, LCG and GridPP", Jeremy Coles; provides context for Grid "efficiency".

High level deployment view

UK is a significant contributor to EGEE (~20%).
CPU utilisation is lower than the 70% target (currently ~55%).
Disk resource is climbing (but utilisation via Storage Resource Management interfaces is low).

High level deployment view

Sites upgrade improvements: ~quarterly upgrades completed within 3 weeks.
Gradual (20%) improvement in site configuration and stability.
Increasing number of accessible job slots.

Service Challenges

SC2 (April): RAL joined computing centres around the world in a networking challenge, transferring 60 TeraBytes of data over ten days.
SC3 (September): RAL to CERN (T1-T0) at rates of up to 650 Mb/s; e.g. Edinburgh to RAL (T2-T1) at rates of up to 480 Mb/s. UKLight service tested from Lancaster to RAL.
Overall, the File Transfer Service is very reliable, with the failure rate now below 1%.
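As a sanity check on the SC2 figure, 60 TB in ten days corresponds to a sustained rate of roughly half a gigabit per second; a minimal sketch of the arithmetic:

```python
# 60 TB over ten days, expressed as an average sustained network rate.

bytes_moved = 60e12              # 60 TB (decimal terabytes)
duration_s = 10 * 24 * 3600      # ten days in seconds

rate_mbit_s = bytes_moved * 8 / duration_s / 1e6
print(f"average rate ~{rate_mbit_s:.0f} Mb/s")   # ~556 Mb/s sustained
```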

Middleware Development

Configuration Management, Storage Interfaces, Network Monitoring, Security, Information Services, Grid Data Management.

gLite Status

gLite 1.2 installed on the Grid pre-production service.
1.3: some components have been upgraded.
1.4: upgrades to VOMS and registration tools, plus additional bulk job submission components.
LCG 2.6 is the (August) production release; the UK's R-GMA is incorporated (production and pre-production).
LCG 3 will be based upon gLite.

SRM

A single SRM server services incoming file requests (implemented as a web service).
Multiple file servers with Unix filesystems hold the data.
Data transfer is done to/from the file servers, so inbound IP connectivity is essential to make the SRM Storage Element available to the wider Grid.
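A minimal sketch of the access pattern this architecture implies. The function names are hypothetical placeholders (real clients of the era were tools such as srmcp and the lcg-utils); the point is only that control traffic goes to the single SRM web service while the bulk data flows directly to or from the file servers.

```python
# Sketch of the SRM Storage Element access pattern described above.
# All names here are hypothetical stand-ins, not a real client API.

def srm_prepare_to_get(surl: str) -> str:
    """Control step: ask the SRM web-service front end to stage the file
    and return a transfer URL (TURL) naming the file server that holds it."""
    raise NotImplementedError("stand-in for an SRM web-service call")

def gridftp_copy(turl: str, local_path: str) -> None:
    """Data step: move the bytes directly from that file server (e.g. over
    GridFTP). This is why the file servers, not just the SRM head node,
    need inbound IP connectivity from the wider Grid."""
    raise NotImplementedError("stand-in for a GridFTP transfer")

def fetch_file(surl: str, local_path: str) -> None:
    turl = srm_prepare_to_get(surl)   # talk to the single SRM server
    gridftp_copy(turl, local_path)    # transfer to/from a file server
```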

Data Management

File metadata: Logical File Name, GUID, system metadata (owner, permissions, checksum, ...).
User metadata: user-defined metadata.
File replica: Storage File Name, Storage Host.
Symlinks: Link Name.
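The catalogue model above can be read as a small data structure. A minimal sketch with field names taken from the slide (illustrative only, not the schema of any particular catalogue such as the LFC or FireMan):

```python
# Sketch of the file-catalogue data model listed above; field names
# follow the slide, not any specific catalogue implementation.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FileReplica:
    storage_file_name: str          # physical name on the Storage Element
    storage_host: str               # which SE holds this copy

@dataclass
class CatalogueEntry:
    logical_file_name: str          # human-readable LFN
    guid: str                       # globally unique identifier
    system_metadata: Dict[str, str] = field(default_factory=dict)   # owner, permissions, checksum, ...
    user_metadata: Dict[str, str] = field(default_factory=dict)     # user-defined metadata
    replicas: List[FileReplica] = field(default_factory=list)       # one entry per replica
    symlinks: List[str] = field(default_factory=list)               # alternative link names
```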

Application Development

ATLAS, LHCb, CMS, BaBar (SLAC), SAMGrid (FermiLab), QCDGrid.
e.g. "Reprocessing DØ data with SAMGrid", Frederic Villeneuve-Seguier.

Workload Management

Efficiency overview, integrated over all VOs and RBs (Successes/Day and Success %):
Success rate 67%, improving from 42% to 70-80% during 2005.
Problems identified: half WMS (Grid), half JDL (User).
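Since roughly half of the identified failures were user-side JDL errors, a concrete job description may help. The sketch below embeds an illustrative JDL as a string and runs a very rough syntax check; the attribute values (script names, file lists) are hypothetical examples, not taken from the slides.

```python
# Illustrative JDL (Job Description Language) of the kind submitted to
# the LCG Resource Broker. Attribute values are hypothetical examples.

MINIMAL_JDL = """
Executable    = "run_analysis.sh";
Arguments     = "dataset.list";
StdOutput     = "std.out";
StdError      = "std.err";
InputSandbox  = {"run_analysis.sh", "dataset.list"};
OutputSandbox = {"std.out", "std.err"};
"""

def quick_jdl_check(jdl: str) -> bool:
    """Very rough sanity check (illustration only): each attribute line
    should end with a semicolon and braces should balance. Malformed JDL
    of this kind accounted for roughly half of the failures noted above."""
    lines = [ln.strip() for ln in jdl.strip().splitlines() if ln.strip()]
    return all(ln.endswith(";") for ln in lines) and jdl.count("{") == jdl.count("}")

print(quick_jdl_check(MINIMAL_JDL))   # True
```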

LHC VOs

ALICE / ATLAS / CMS / LHCb:
Successes/Day: N/A
Success %: 53% / 84% / 59% / 68%

Note: some caveats apply. Selection by experiments of "production sites" using Site Functional Tests (currently ~110 of the 197 sites), or use of pre-test software agents, leads to >90% experiment production efficiency.

"UK contributes to EGEE's battle with malaria"

BioMed VO: Successes/Day 1107, Success % 77%.
WISDOM (Wide In Silico Docking On Malaria): the first biomedical data challenge for drug discovery, which ran on the EGEE grid production service from 11 July 2005 until 19 August 2005. GridPP resources in the UK contributed ~100,000 kSI2k-hours from 9 sites.

Figures: number of biomedical jobs processed by country; normalised CPU hours contributed to the biomedical VO for UK sites, July-August 2005.
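For clarity, the "normalised" CPU hours above are wall-clock hours scaled by the SpecInt2000 rating of the CPUs that ran the jobs, so contributions from sites with different hardware can be compared. A minimal sketch with hypothetical site names and ratings (only the ~100,000 kSI2k-hours total comes from the slide):

```python
# Normalised CPU time: wall-clock hours weighted by the kSI2k rating of
# the processors used. Site names and ratings below are hypothetical.

sites = {
    "site_a": {"wall_hours": 40_000, "ksi2k_per_cpu": 1.4},
    "site_b": {"wall_hours": 60_000, "ksi2k_per_cpu": 0.8},
}

total = sum(s["wall_hours"] * s["ksi2k_per_cpu"] for s in sites.values())
print(f"total normalised contribution ~{total:,.0f} kSI2k-hours")   # ~104,000
```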

1. Why? 2. What? 3. How? 4. When?

From the particle physics perspective, the Grid is:
1. mainly (but not just) for physicists; more generally for those needing to utilise large-scale computing resources efficiently and securely
2. a) a working production-scale system running today
   b) about seamless discovery of computing resources
   c) using evolving standards for interoperation
   d) the basis for computing in the 21st century
   e) not (yet) as seamless, robust or efficient as end-users need
3. methods outlined; please come to the PPARC stand and Jeremy Coles' talk
4. a) now at "preliminary production service" level, for simple(r) applications (e.g. experiment Monte Carlo production)
   b) 2007 for a fully tested 24x7 LHC service (a large distributed computing resource) for more complex applications (e.g. data analysis)
   c) planned to meet the LHC Computing Challenge