Tony Doyle GridPP – From Prototype To Production, HEPiX Meeting, Edinburgh, 25 May 2004.


Tony Doyle - University of Glasgow

Outline
GridPP Project Introduction
UK Context
Components:
A. Management
B. Middleware
C. Applications
D. Tier-2
E. Tier-1
F. Tier-0
Challenges:
– Middleware Validation
– Improving Efficiency
– Meeting Experiment Requirements
– ..Via The Grid?
– Work Group Computing
– Events.. To Files.. To Events
– Software Distribution
– Distributed Analysis
Historical Perspective
What is the Grid Anyway?
Is GridPP a Grid?
Summary

GridPP – A UK Computing Grid for Particle Physics
GridPP: 19 UK Universities, CCLRC (RAL & Daresbury) and CERN
Funded by the Particle Physics and Astronomy Research Council (PPARC)
GridPP1 - Sept £17m "From Web to Grid"
GridPP2 – Sept £16(+1)m "From Prototype to Production"

GridPP in Context
[Diagram, not to scale! Labels: UK Core e-Science Programme; Grid Support Centre; Institutes; Tier-2 Centres; Tier-1/A; CERN; LCG; EGEE; Experiments; Middleware, Security, Networking; Apps Dev; Apps Int; GridPP]

GridPP1 Components
LHC Computing Grid Project (LCG): Applications, Fabrics, Technology and Deployment
European DataGrid (EDG): Middleware Development
UK Tier-1/A Regional Centre: Hardware and Manpower
Grid Application Development: LHC and US Experiments + Lattice QCD
Management: Travel etc

GridPP2 Components
A. Management, Travel, Operations
B. Middleware, Security and Network Development
C. Grid Application Development: LHC and US Experiments + Lattice QCD + Phenomenology
D. Tier-2 Deployment: 4 Regional Centres - M/S/N support and System Management
E. Tier-1/A Deployment: Hardware, System Management, Experiment Support
F. LHC Computing Grid Project (LCG Phase 2) [review]

A. GridPP Management
Collaboration Board
Project Management Board
Project Leader
Project Manager
Technical (Deployment) Board
Experiments (User) Board
(Production Manager)
(Dissemination Officer)
GGF, LCG, EDG (EGEE), UK e-Science, Liaison
GridPP1 (GridPP2)
Project Map
Risk Register

A. Management Structure (In LCG Context)
[Diagram labels: PMB, CB, ARDA, Expmts, EGEE, LCG]
Deployment Board: Tier1/Tier2, Testbeds, Rollout; Service specification & provision
User Board: Requirements; Application Development; User feedback
Metadata, Workload, Network, Security, Info. Mon., Storage

B. Middleware, Security and Network Development
GridPP2 Project: Managing the Middleware
[Diagram labels: PMB, ARDA, Expmts, EGEE, LCG]
Deployment Board: Tier1/Tier2, Testbeds, Rollout; Service specification & provision
User Board: Requirements; Application Development; User feedback
Metadata, Workload, Network, Security, Info. Mon., Storage
I. Experiment Layer
II. Application Middleware
III. Grid Middleware
IV. Facilities and Fabrics

B. Middleware, Security and Network Development
M/S/N builds upon UK strengths as part of International development
Configuration Management
Storage Interfaces
Network Monitoring
Security
Information Services
Grid Data Management
Security, Middleware, Networking

C. Application Development
GANGA, SAMGrid, Lattice QCD, AliEn, ARDA, CMS, BaBar

D. UK Tier-2 Centres
NorthGrid ****: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
SouthGrid *: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick
ScotGrid *: Durham, Edinburgh, Glasgow
LondonGrid ***: Brunel, Imperial, QMUL, RHUL, UCL
Current UK Status: 10 Sites via LCG

D. The UK Testbed: Hidden Sector

E. The UK Tier-1/A Centre
High quality data services
National and International Role
UK focus for International Grid development
LHCb, ATLAS, CMS, BaBar
April 2004: 700 Dual CPU, 80TB Disk, 60TB Tape (Capacity 1PB)
Grid Operations Centre

Real Time Grid Monitoring: LCG2, 24 May 2004

E. Grid Operations
Grid Operations Centre
– Core Operational Tasks
– Monitor infrastructure, components and services
– Troubleshooting
– Verification of new sites joining Grid
– Acceptance tests of new middleware releases
– Verify suppliers are meeting SLA
– Performance tuning and optimisation
– Publishing use figures and accounts
– Grid information services
– Monitoring services
– Resource brokering
– Allocation and scheduling services
– Replica data catalogues
– Authorisation services
– Accounting services
Grid Support Centre
– Core Support Tasks
– Running UK Certificate Authority

F. Tier 0 and LCG: Foundation Programme
Aim: build upon Phase 1
Ensure development programmes are linked
Project management: GridPP LCG
Shared expertise:
LCG establishes the global computing infrastructure
Allows all participating physicists to exploit LHC data
Earmarked UK funding to be reviewed in Autumn 2004
Required Foundation: LCG Fabric, Technology and Deployment

The Challenges Ahead I: Implementing the Validation Process
[Flow diagram of the release process; recoverable labels follow]
Stages: Unit Test, Build, Integration, Certification, Production
Groups: WPs, Build system (24x7), Integration Team, Test Group, Apps. Representatives, Users
Testbeds: Development Testbed ~15CPU, Certification Testbed ~40CPU, Application Testbed ~1000CPU
Steps: add unit tested code to repository; run nightly build & auto. tests; individual WP tests; overall release tests; tagged release selected for certification; Grid certification; application certification; fix problems; certified release selected for deployment; certified public release for use by apps.
Outputs: tagged package, problem reports, release candidates, tagged releases, certified releases
Process to: Test frameworks, Test support, Test policies, Test documentation, Test platforms/compilers

The Challenges Ahead II: Improving Grid Efficiency

The Challenges Ahead III: Meeting Experiment Requirements (UK)
Total Requirement:
In International Context - Q LCG Resources:

The Challenges Ahead IV: Using (Anticipated) Grid Resources
Dynamic Grid Optimisation over JANET Network
~7,000 1GHz CPUs; ~30,000 1GHz CPUs
~400 TB disk; ~2200 TB disk (note x2 scale change)

The Challenges Ahead V: Work Group Computing

The Challenges Ahead VI: Events.. to Files.. to Events
[Diagram labels: Event 1, Event 2, Event 3; RAW, ESD, AOD and TAG data files; Tier-0 (International), Tier-1 (National), Tier-2 (Regional), Tier-3 (Local); Interesting Events List]
VOMS-enhanced Grid certificates to access databases via metadata
Non-Trivial..
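
The round trip from events to files and back to events can be illustrated with a toy catalogue. This is only a sketch under stated assumptions: the file names, the `CATALOGUE` dictionary and the `files_for_events` helper are hypothetical and come neither from the talk nor from any Grid middleware.

```python
# Hypothetical catalogue (an assumption for illustration only):
# (data type, file name) -> set of event numbers stored in that file.
CATALOGUE = {
    ("RAW", "raw_001.dat"): {1, 2},
    ("RAW", "raw_002.dat"): {3},
    ("ESD", "esd_001.dat"): {1, 2, 3},
    ("AOD", "aod_001.dat"): {1, 2, 3},
}


def files_for_events(interesting_events, data_type):
    """Map an interesting-events list to the files of one data type holding them."""
    wanted = set(interesting_events)
    result = {}
    for (dtype, filename), events in CATALOGUE.items():
        if dtype != data_type:
            continue
        hits = events & wanted  # events from the list that live in this file
        if hits:
            result[filename] = hits
    return result


# Events 1 and 3 were selected (for example from TAG data); locate their RAW files.
raw_files = files_for_events([1, 3], "RAW")
```

In a real deployment the lookup would go through metadata databases and replica catalogues spread across the tiers, which is where the "Non-Trivial.." remark applies.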

The Challenges Ahead VII: software distribution
ATLAS Data Challenge (DC2) this year to validate world-wide computing model
Packaging, distribution and installation:
Scale: one release build takes 10 hours, produces 2.5 GB of files
Complexity: 500 packages, Mloc, 100s of developers and 1000s of users
– ATLAS collaboration is widely distributed: 140 institutes, all wanting to use the software
– needs push-button easy installation..
[Data-flow diagram labels]
Step 1: Monte Carlo Data Challenges: Physics Models, Detector Simulation, Monte Carlo Truth Data, MC Raw Data, Reconstruction, MC Event Summary Data, MC Event Tags
Step 2: Real Data: Trigger System, Data Acquisition, Level 3 trigger, Trigger Tags, Raw Data, Calibration Data, Run Conditions, Reconstruction, Event Summary Data (ESD), Event Tags

The Challenges Ahead VIII: distributed analysis
Complex workflow… LCG/ARDA Development
1. AliEn (ALICE Grid) provided a pre-Grid implementation [Perl scripts]
2. ARDA provides a framework for PP application middleware

Historical Perspective
I wrote in 1990 a program called "WorlDwidEweb", a point and click hypertext editor which ran on the "NeXT" machine. This, together with the first Web server, I released to the High Energy Physics community at first, and to the hypertext and NeXT communities in the summer of
Tim Berners-Lee
The first three years were a phase of persuasion, aided by my colleague and first convert Robert Cailliau, to get the Web adopted…
We needed seed servers to provide incentive and examples, and all over the world inspired people put up all kinds of things…
Between the summers of 1991 and 1994, the load on the first Web server ("info.cern.ch") rose steadily by a factor of 10 every year…

What is The Grid Anyway?
From Particle Physics Perspective, The Grid is:
not hype, but surrounded by it
a working prototype running on testbed(s)…
about seamless discovery of PC resources around the world
using evolving standards for interoperation
the basis for particle physics computing in the 21st Century
not (yet) as transparent as end-users want it to be

What is The Grid Anyway? Is GridPP a Grid?
1. Coordinates resources that are not subject to centralized control
YES. This is why development and maintenance of a UK-EU-US testbed is important
2. … using standard, open, general-purpose protocols and interfaces
YES... Globus/CondorG/EDG meet this requirement. Common experiment application layers are also important here.
3. … to deliver nontrivial qualities of service
NO(T YET)… Experiments define whether this is true - currently only ~100,000 jobs submitted via the testbed c.f. internal component tests of up to 10,000 jobs per day. Next step: LCG-2 deployment outcome… this year

GridPP Summary: From Web to Grid
GridPP – Theory and Experiment
UK GridPP started 1/9/01
EU DataGrid: First Middleware ~1/9/01
Development requires a testbed with feedback
– Operational Grid
Fit into UK e-Science structures
Experience in distributed computing essential to build and exploit the Grid
Scale in UK? 0.5 PBytes and 2,000 distributed CPUs
GridPP in Sept 2004
Grid jobs are being submitted now.. user feedback loop is important..
All experiments have immediate requirements
Current Experiment Production: The Grid is a small component
Non-technical issues:
– Recognising context
– Building upon expertise
– Defining roles
– Sharing resources
Major deployment activity is LCG
– We contribute significantly to LCG and our success depends critically on LCG
Production Grid will be difficult to realise: GridPP2 planning underway as part of LCG/EGEE
Many Challenges Ahead..

GridPP Summary: From Prototype to Production
[Diagram labels]
Stages: Separate Experiments, Resources, Multiple Accounts; Prototype Grids; 'One' Production Grid
Experiments: BaBar, D0, CDF, ATLAS, CMS, LHCb, ALICE
Grid projects: SAMGrid, BaBarGrid, LCG, EDG, GANGA, EGEE, ARDA
Sites: 19 UK Institutes; RAL Computer Centre; CERN Computer Centre; UK Prototype Tier-1/A Centre; CERN Prototype Tier-0 Centre; UK Prototype Tier-2 Centres; 4 UK Tier-2 Centres; UK Tier-1/A Centre; CERN Tier-0 Centre

Why was the failure rate ~20%?
Component Testing, e.g. RB Stress Tests (LCG)
RB never crashed
– ran without problems at load for several days in a row
– 20 streams with 100 jobs each (typical error rate ~2% still present)
RB stress test in a job storm of 50 streams, 20 jobs each:
– 50% of the streams ran out of connections between UI and RB (configuration parameter – but machine constraints)
– Remaining 50% of streams finished normally (2% error rate)
– Time between job-submit and return of the command (acceptance by the RB) is 3.5 seconds (independent of number of streams)
PROBLEMS ARE END-TO-END: e.g. site advertisement communicated via class ads to all sites (inc. e.g. CNAF) results in RB sending application jobs (e.g. AliEn for ALICE) to black hole – these are recorded as failures (application corrects for these via re-submission)
OTHER PROBLEM IS INCORPORATION OF ADDED FUNCTIONALITY
– ~Resolved by adherence to software process coupled to testbed structure… improved significantly within LCG (leading to EGEE)
I. Experiment Layer
II. Application Middleware
III. Grid Middleware
IV. Facilities and Fabrics
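
The stress-test figures quoted on this slide can be turned into job counts with a little arithmetic. The sketch below is illustrative only: the `stress_test_summary` helper is hypothetical, and it assumes that every job in a stream that ran out of connections counts as failed, which the slide does not state.

```python
def stress_test_summary(streams, jobs_per_stream, lost_stream_fraction, error_rate):
    """Estimate failed jobs for one RB stress test.

    Assumption (not from the talk): all jobs in a stream that ran out of
    UI-RB connections are counted as failed; the remaining jobs fail at
    the quoted residual error rate.
    """
    total_jobs = streams * jobs_per_stream
    lost_streams = round(streams * lost_stream_fraction)
    lost_jobs = lost_streams * jobs_per_stream
    residual_failures = round((total_jobs - lost_jobs) * error_rate)
    failed_jobs = lost_jobs + residual_failures
    return {
        "total_jobs": total_jobs,
        "failed_jobs": failed_jobs,
        "failure_rate": failed_jobs / total_jobs,
    }


# Steady load: 20 streams of 100 jobs, no lost streams, ~2% error rate.
steady = stress_test_summary(20, 100, 0.0, 0.02)

# Job storm: 50 streams of 20 jobs, 50% of streams lost, 2% error rate on the rest.
storm = stress_test_summary(50, 20, 0.5, 0.02)
```

Under these assumptions the steady run gives about 40 failed jobs out of 2,000, while the job storm gives about 510 out of 1,000, so connection exhaustion dominates the residual 2% error rate.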

What is the GridPP1 Project Status?
76% of the 190 GridPP1 tasks have been successfully completed