Tony Doyle GridPP – From Prototype To Production, GridPP10 Meeting, CERN, 2 June 2004
Tony Doyle - University of Glasgow
Outline
– GridPP Project Introduction
– UK Context
– Components:
  A. Management
  B. Middleware
  C. Applications
  D. Tier-2
  E. Tier-1
  F. Tier-0
– Challenges:
  1. Middleware Validation
  2. Improving Efficiency
  3. Meeting Experiment Requirements
  4. ..via The Grid?
  5. Work Group Computing
  6. Events.. To Files.. To Events
  7. Software Distribution
  8. Distributed Analysis
  9. Production Accounting
  10. Sharing Resources
– Summary
GridPP – A UK Computing Grid for Particle Physics
– GridPP: 19 UK Universities, CCLRC (RAL & Daresbury) and CERN
– Funded by the Particle Physics and Astronomy Research Council (PPARC)
– GridPP1 – Sept £17m "From Web to Grid"
– GridPP2 – Sept £16(+1)m "From Prototype to Production"
GridPP in Context
[Diagram, not to scale: GridPP shown in relation to the UK Core e-Science Programme, Institutes, Tier-2 Centres, Tier-1/A, CERN, LCG, EGEE, Experiments and the Grid Support Centre. GridPP areas: Middleware, Security, Networking; Apps Dev; Apps Int.]
GridPP1 Components
– LHC Computing Grid Project (LCG): Applications, Fabrics, Technology and Deployment
– European DataGrid (EDG): Middleware Development
– UK Tier-1/A Regional Centre: Hardware and Manpower
– Grid Application Development: LHC and US Experiments + Lattice QCD
– Management, Travel etc
GridPP2 Components
A. Management, Travel, Operations
B. Middleware, Security, Network Development
C. Grid Application Development: LHC and US Experiments + Lattice QCD + Phenomenology
D. Tier-2 Deployment: 4 Regional Centres - M/S/N support and System Management
E. Tier-1/A Deployment: Hardware, System Management, Experiment Support
F. LHC Computing Grid Project (LCG Phase 2) [review]
A. GridPP Management
[Organisation chart: GridPP1 (GridPP2)]
– Collaboration Board
– Project Management Board: Project Leader, Project Manager, (Production Manager), (Dissemination Officer)
– Technical (Deployment) Board
– Experiments (User) Board
– Liaison: GGF, LCG, EDG (EGEE), UK e-Science
– Project Map
– Risk Register
GridPP PMB Who’s Who
Roles
– CB Chair: Steve Lloyd
– Project Leader: Tony Doyle
– Deputy Project Leader: John Gordon
– Project Manager: Dave Britton
– Applications Coordinator: Roger Jones
– Middleware Coordinator: Robin Middleton
– Tier-2 Board Chair: Steve Lloyd
– Tier-1 Board Chair: Tony Doyle
– Production Manager: Jeremy Coles
– User Board Chair: Roger Barlow
– Deployment Board Chair: Dave Kelsey
“External Input”
– Dissemination Officer: Sarah Pearce
– CERN Liaison: Tony Cass
– UK e-Science Liaison: Neil Geddes
– GGF Liaison: Pete Clarke
– PPARC Head of e-Science: Guy Rickett
Context
– “authority” via the Collaboration Board
– “reporting” via the Project Manager
– “strategic” from the User Board and Deployment Board
– “external” from the dissemination officer and liaison members
A. Management Structure In LCG Context
[Diagram: CB and PMB in relation to ARDA, Expmts, EGEE and LCG.]
– Deployment Board: Tier1/Tier2, Testbeds, Rollout; Service specification & provision
– User Board: Requirements; Application Development; User feedback
– Middleware areas: Metadata, Workload, Network, Security, Info. Mon., Storage
A. GridPP Management: Staff Effort
A. Management, Travel, Operations
Reporting line:
– Production Manager: via the Deputy Project Leader to the EGEE SA1 Infrastructure activity.
– Dissemination Officer: via the Project Manager and, partially, to the EGEE NA2 Dissemination activity.
B. Middleware, Security and Network Development
GridPP2 Project: Managing the Middleware
[Diagram: PMB, User Board and Deployment Board in relation to ARDA, Expmts, EGEE and LCG, with four layers:]
I. Experiment Layer
II. Application Middleware
III. Grid Middleware
IV. Facilities and Fabrics
– Deployment Board: Tier1/Tier2, Testbeds, Rollout; Service specification & provision
– User Board: Requirements; Application Development; User feedback
– Middleware areas: Metadata, Workload, Network, Security, Info. Mon., Storage
B. Middleware, Security and Network Development
M/S/N builds upon UK strengths as part of International development
– Configuration Management
– Storage Interfaces
– Network Monitoring
– Security
– Information Services
– Grid Data Management
[Diagram groups: Security, Middleware, Networking]
B. Middleware, Security and Network Development: Staff Effort
B. Middleware, Security, Network Development
Reporting line: via the middleware coordinator and also to the LCG/EGEE JRA1 Middleware area, if agreed, within the LCG/EGEE work areas.
C. Application Development
[Screenshots: GANGA, SAMGrid, Lattice QCD, AliEn → ARDA, CMS, BaBar]
C. Application Development: Staff Effort
C. Grid Application Development: LHC and US Experiments + Lattice QCD + Phenomenology
Reporting line: via the applications coordinator.
D. UK Tier-2 Centres
– NorthGrid ****: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
– SouthGrid *: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick
– ScotGrid *: Durham, Edinburgh, Glasgow
– LondonGrid ***: Brunel, Imperial, QMUL, RHUL, UCL
Current UK Status: 10 Sites via LCG (2 at RAL)
D. The UK Testbed: Hidden Sector
D. UK Tier-2 Centres: Staff Effort
D. Tier-2 Deployment: 4 Regional Centres - M/S/N support and System Management
Reporting line:
– Operations staff: via the Tier-2 Board Chair.
– UK Support Posts: via the Production Manager and also to the Deputy Project Leader for the EGEE SA1 Infrastructure activity.
E. The UK Tier-1/A Centre
– High quality data services
– National and International Role
– UK focus for International Grid development
– Experiments: LHCb, ATLAS, CMS, BaBar
– April 2004: 700 Dual CPU; 80TB Disk; 60TB Tape (Capacity 1PB)
– Grid Operations Centre
E. The UK Tier-1/A Centre: Staff Effort
E. Tier-1/A Deployment: Hardware, System Management, Experiment Support
Reporting line: via the Tier-1 Manager to the Tier-1/A Board.
Real Time Grid Monitoring: LCG2, 1 June 2004
E. Grid Operations
Grid Operations Centre
– Core Operational Tasks
– Monitor infrastructure, components and services
– Troubleshooting
– Verification of new sites joining Grid
– Acceptance tests of new middleware releases
– Verify suppliers are meeting SLA
– Performance tuning and optimisation
– Publishing use figures and accounts
– Grid information services
– Monitoring services
– Resource brokering
– Allocation and scheduling services
– Replica data catalogues
– Authorisation services
– Accounting services
Grid Support Centre
– Core Support Tasks
– Running UK Certificate Authority
E. Grid Operations: Staff Effort
Reporting line: via the Deputy Project Leader to the EGEE SA1 Infrastructure activity.
F. Tier-0 and LCG: Foundation Programme
F. LHC Computing Grid Project (LCG Phase 2) [review]
– Aim: build upon Phase 1
– Ensure development programmes are linked
– Project management: GridPP/LCG
– Shared expertise:
– LCG establishes the global computing infrastructure
– Allows all participating physicists to exploit LHC data
– Earmarked UK funding to be reviewed in Autumn 2004
Required Foundation: LCG Fabric, Technology and Deployment
The Challenges Ahead I: Implementing the Validation Process
[Diagram: release validation process.]
– Stages: Unit Test, Build, Integration, Certification, Production
– Groups: WPs, Build System, Integration Team, Test Group, Apps. Representatives, Users
– Testbeds: Development Testbed ~15CPU; Certification Testbed ~40CPU; Application Testbed ~1000CPU
– Flow labels: add unit tested code to repository; run nightly build & auto. tests; tagged package; individual WP tests; overall release tests; tagged release selected for certification; Grid certification; fix problems; certified release selected for deployment; application certification; certified public release for use by apps. (24x7); problem reports
– Releases: release candidates, tagged releases, certified releases
Process to: test frameworks, test support, test policies, test documentation, test platforms/compilers
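The release flow named on this slide (release candidate, tagged release, certified release, certified public release) can be pictured as a small promotion ladder. The following is only a sketch under stated assumptions: the `Stage` names mirror the slide labels, but the `promote` function and the rule that a failed test sends a release back to the candidate stage are illustrative guesses, not the actual EDG/LCG procedure.

```python
from enum import Enum


class Stage(Enum):
    """Release states, named after the labels on the slide."""
    CANDIDATE = "release candidate"
    TAGGED = "tagged release"
    CERTIFIED = "certified release"
    PUBLIC = "certified public release"


# Enum members iterate in definition order, which is the promotion order.
ORDER = list(Stage)


def promote(stage: Stage, tests_passed: bool) -> Stage:
    """Move a release one step up the ladder if its tests passed.

    Assumption (not from the slide): a failure produces a problem
    report and the release returns to the candidate stage for fixes.
    """
    if not tests_passed:
        return Stage.CANDIDATE
    index = ORDER.index(stage)
    # The public release is the top of the ladder; it cannot go higher.
    return ORDER[min(index + 1, len(ORDER) - 1)]


# Example: a candidate passes integration tests, then certification.
release = Stage.CANDIDATE
release = promote(release, tests_passed=True)   # tagged release
release = promote(release, tests_passed=True)   # certified release
print(release.value)
```

Each call to `promote` stands for one testbed stage in the diagram; a real process would also record which testbed and which test group signed off.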
The Challenges Ahead II: Improving Grid “Efficiency”
The Challenges Ahead III: Meeting Experiment Requirements (UK)
– Total Requirement:
– In International Context - Q LCG Resources:
The Challenges Ahead IV: Using (Anticipated) Grid Resources
Dynamic Grid Optimisation over JANET Network
[Two charts (note x2 scale change): ~7,000 1GHz CPUs and ~400 TB disk; ~30,000 1GHz CPUs and ~2200 TB disk]
The Challenges Ahead V: Work Group Computing
The Challenges Ahead VI: Events.. to Files.. to Events
[Diagram: RAW, ESD, AOD and TAG data held as data files at Tier-0 (International), Tier-1 (National), Tier-2 (Regional) and Tier-3 (Local); an “Interesting Events List” picks out Event 1, Event 2, Event 3 from the RAW, ESD, AOD and TAG data files.]
– VOMS-enhanced Grid certificates to access databases via metadata
– Non-Trivial..
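The "events to files to events" step can be sketched as a metadata lookup: select interesting events from TAG records, then work out which data files must be fetched to read them. This is a minimal sketch only; the record layout, the `n_jets` quantity, the file names and both function names are hypothetical and are not taken from any experiment's data model.

```python
from collections import defaultdict

# Hypothetical TAG records: one small summary row per event, including
# the name of the file that holds the event's AOD data.
TAG = [
    {"event": 1, "n_jets": 4, "aod_file": "aod_0001.root"},
    {"event": 2, "n_jets": 1, "aod_file": "aod_0001.root"},
    {"event": 3, "n_jets": 3, "aod_file": "aod_0002.root"},
    {"event": 4, "n_jets": 0, "aod_file": "aod_0002.root"},
]


def interesting_events(tag_records, predicate):
    """Events: apply a physics cut to the TAG rows only."""
    return [row for row in tag_records if predicate(row)]


def files_for(selected, file_key="aod_file"):
    """Files: group the selected event numbers by the file that holds them."""
    by_file = defaultdict(list)
    for row in selected:
        by_file[row[file_key]].append(row["event"])
    return dict(by_file)


# Select events with at least three jets, then list the files to fetch.
selected = interesting_events(TAG, lambda row: row["n_jets"] >= 3)
needed = files_for(selected)
print(needed)
```

The result maps each file to the event numbers wanted from it, which is the information a job would need to read only those events back out of the files.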
The Challenges Ahead VII: Software Distribution
– ATLAS Data Challenge (DC2) this year to validate world-wide computing model
– Packaging, distribution and installation:
  – Scale: one release build takes 10 hours, produces 2.5 GB of files
  – Complexity: 500 packages, Mloc, 100s of developers and 1000s of users
  – ATLAS collaboration is widely distributed: 140 institutes, all wanting to use the software
  – needs ‘push-button’ easy installation..
[Diagram: ATLAS data flow. Step 1: Monte Carlo Data Challenges (Physics Models, Monte Carlo Truth Data, Detector Simulation, MC Raw Data, Reconstruction, MC Event Summary Data, MC Event Tags). Step 2: Real Data (Data Acquisition, Level 3 trigger, Trigger System, Trigger Tags, Raw Data, Reconstruction, Event Summary Data (ESD), Event Tags, Calibration Data, Run Conditions).]
The Challenges Ahead VIII: Distributed Analysis
Complex workflow… LCG/ARDA Development
1. AliEn (ALICE Grid) provided a pre-Grid implementation [Perl scripts]
2. ARDA provides a framework for PP application middleware
The Challenges Ahead IX: Production Accounting
Complex workflow… LCG/ARDA Development
– Online monitoring
– Automatic accounting
– Meeting LCG and other requirements
[Monitoring screenshot: GridPP Grid Report for Tue, 1 Jun :00: CPUs Total: ; Hosts up: 442; Hosts down: 82; Avg Load (15, 5, 1m): 33%, 35%, 36%; Localtime: :00]
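As a worked example of the kind of figure automatic accounting could derive, the host counts and load averages quoted in the report on this slide (442 hosts up, 82 down, load 33%, 35%, 36%) give an availability fraction and a load trend. This is a sketch only: the `GridReport` class and its fields are hypothetical and do not correspond to any monitoring tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridReport:
    """Snapshot of the figures shown in the monitoring report."""
    hosts_up: int
    hosts_down: int
    load_15m: float  # average load over the last 15 minutes, as a fraction
    load_5m: float
    load_1m: float

    @property
    def hosts_total(self) -> int:
        return self.hosts_up + self.hosts_down

    @property
    def availability(self) -> float:
        """Fraction of known hosts that are currently up."""
        return self.hosts_up / self.hosts_total

    @property
    def load_rising(self) -> bool:
        """True when the shorter averages exceed the longer ones."""
        return self.load_1m > self.load_5m > self.load_15m


# Values taken from the report quoted on the slide.
report = GridReport(hosts_up=442, hosts_down=82,
                    load_15m=0.33, load_5m=0.35, load_1m=0.36)
print(f"{report.hosts_total} hosts, {report.availability:.1%} up, "
      f"load rising: {report.load_rising}")
```

With these numbers, 442 of 524 hosts are up, which is about 84%.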
The Challenges Ahead X: Sharing… MoUs, Guidelines and Policies
– Disk/CPU resources allocated to each “group”
– Grid is based on distributed resources - a “group” is an experiment
– An institute is typically involved in many experiments
– Institutes define priorities on computing resources via OPEN policy statements
– All jobs submitted via Globus authentication - Certificates identified by user and experiment
– Need to implement Grid “priority”
– Minimum amount of data to deliver at a time for a job?
– Where to store files?
– Which data access/storing activities have the highest priority?
– Sharing of the resources among groups?
– Users belong to multiple groups?
– How many jobs per group are allowed?
– What processing activities are allowed at each site?
– To which sites should data access and processing activities be sent?
– How should the resources of a local cluster of PCs be shared among groups?
– Tier-2 discussion prior to the Collaboration Meeting… issues will arise which require ALL Tier centres to define/sign up to an MoU and publish a policy (See Steve’s talk)
* Implemented by site administrators, with OPEN policies defined at each site based on e.g. case to funding authority
– What’s new? Ability to monitor/allocate unused resources
– We will be judged by how well we work as a set of Virtual Organisations
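One way to picture "resources allocated to each group" together with the "ability to monitor/allocate unused resources" is a proportional-share allocation in which any share a group does not use is offered to the groups that still have demand. The sketch below is illustrative only: the share values, the demand figures and the `allocate` function are assumptions, and real site policies would be set by each site's published policy statement.

```python
def allocate(total_cpus, shares, demand):
    """Split CPUs among groups in proportion to their policy shares.

    A group never receives more than it asks for; whatever it leaves
    unused is re-offered to the remaining groups, again in proportion
    to their shares. Shares are assumed to be positive numbers.
    """
    allocation = {group: 0.0 for group in shares}
    remaining = float(total_cpus)
    active = {g for g in shares if demand.get(g, 0) > 0}

    while remaining > 1e-9 and active:
        weight = sum(shares[g] for g in active)
        if weight <= 0:
            break
        satisfied = set()
        handed_out = 0.0
        for g in active:
            offer = remaining * shares[g] / weight
            need = demand[g] - allocation[g]
            grant = min(offer, need)
            allocation[g] += grant
            handed_out += grant
            if need - grant <= 1e-9:
                satisfied.add(g)
        remaining -= handed_out
        if not satisfied:
            break  # every active group took its full offer
        active -= satisfied
    return allocation


# Hypothetical site policy: shares 5:3:2 for three experiments.
shares = {"ATLAS": 5, "CMS": 3, "BaBar": 2}
demand = {"ATLAS": 20, "CMS": 100, "BaBar": 100}
result = allocate(100, shares, demand)
print(result)
```

In this example ATLAS asks for less than its share, so the 30 CPUs it leaves idle are split 3:2 between CMS and BaBar rather than standing unused.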
GridPP Summary: From Web to Grid
GridPP – Theory and Experiment
– UK GridPP started 1/9/01
– EU DataGrid: First Middleware ~1/9/01
– Development requires a testbed with feedback
  – “Operational Grid”
– Fit into UK e-Science structures
– Experience in distributed computing essential to build and exploit the Grid
– Scale in UK? 0.5 PBytes and 2,000 distributed CPUs: GridPP in Sept 2004
– Grid jobs are being submitted now.. user feedback loop is important..
– All experiments have immediate requirements
– Current Experiment Production: “The Grid” is a small component
– Non-technical issues:
  – Recognising context
  – Building upon expertise
  – Defining roles
  – Sharing resources
– Major deployment activity is LCG/EGEE
  – We contribute significantly to LCG and our success depends critically on LCG
– “Production Grid” will be difficult to realise: GridPP2 planning underway as part of LCG/EGEE
– Work Areas and Roles defined
– Many Challenges Ahead..
GridPP Summary: From Prototype to Production
[Diagram: three stages, from “Separate Experiments, Resources, Multiple Accounts” through “Prototype Grids” to “'One' Production Grid”.]
– Experiments: BaBar, D0, CDF, ATLAS, CMS, LHCb, ALICE
– Resources: 19 UK Institutes, RAL Computer Centre, CERN Computer Centre
– Prototype Grids: SAMGrid, BaBarGrid, LCG, EDG, GANGA, EGEE, ARDA; UK Prototype Tier-1/A Centre, CERN Prototype Tier-0 Centre, UK Prototype Tier-2 Centres
– 'One' Production Grid: LCG; CERN Tier-0 Centre, UK Tier-1/A Centre, 4 UK Tier-2 Centres