Tony Doyle GridPP2 Proposal and Responses to Questions, Grid Steering Committee, Coseners, 28 July 2003
Tony Doyle - University of Glasgow
GridPP2 Proposal
~30 page proposal + figures/tables + 11 planning documents:
15. Tier-0
16. Tier-1
17. Tier-2
18. The Network Sector
19. Middleware
20. Applications
21. Hardware Requirements
22. Management
23. Travel
24. Dissemination
25. From Testbed to Production
Production Grid: Whole Greater than the Sum of Parts
From Testbed to Production
[Figure: release flow from development to production. Labels from the diagram:]
WPs: add unit tested code to repository; individual WP tests (Unit Test)
Build System: run nightly build & auto. tests (Build)
Integration Team: overall release tests on Development Testbed ~15CPU (Integration); tagged package; release candidates become Tagged Releases
Test Group: Grid certification on Certification Testbed ~40CPU (Certification); tagged release selected for certification; problem reports; fix problems; release candidates become Certified Releases
Apps. Representatives: Application Certification
Production: certified release selected for deployment; certified public release for use by apps. on the 24x7 Production Testbed ~1000CPU; Users
Experiment Requirements: UK only
[Figure: Total Requirement]
Projected Hardware Resources
[Figure: Total Resources (note x2 scale change)]
GridPP2 Project Map
Built in: to identify progress…
Roadmap: GridPP2 Proposal
GridPP1, GridPP2, Exploitation
*Note: Added Estimate of Tier-2 SRIF-2 Hardware Value
Components: GridPP2 Proposal
*Note: Added Estimate of Tier-2 SRIF-2 Hardware Value
PPARC Call (27/2/03)
GridPP2 Response (30/5/03)
Projects Peer Review Panel (14-15/7/03)
Grid Steering Committee (28-29/7/03)
Science Committee (October 03)
Components: GridPP2 Proposal (£23.1m)
Vertical Integration
Conclusion
We request £23.1m to fund our three-year project, GridPP2.
This will provide a Production Grid incorporating:
1. access to the Tier-0 Centre at CERN and the LCG deployment releases
2. the UK Tier-1 Centre
3. integration of four distributed Tier-2 centres
4. technical development in Middleware, Security and Networking
5. Grid integration of the experiments
The project is in direct support of PPARC's highest priority science programme, the LHC.
Starting Point for Consideration of Priorities
GridPP Responses to Questions
1. If funded at a 25% reduced level, what would the GridPP2 priorities be and what would be delivered?
Prior to the planning documents and proposal writing, we discussed priorities to achieve a Production Grid: a robust, reliable, resilient, secure, stable service delivered to end-user applications. We adapt to reduced funding with this objective in mind.
GridPP2 Reduced Programme
2. What extra added value would be bought/delivered by increasing funding above this level?
Presented in terms of a 15% reduced level scenario and the corresponding GridPP2 Regained Programme (other scenarios possible).
Priorities: GridPP2 Proposal
1. Tier-1/A staff – National Grid Centre
2. Tier-1/A hardware – International Role
3. Tier-2 staff – UK e-Science Grid
4. Applications
– Grid Integration (GridPP2)
– Development (experiments proposals)
5. Middleware – EU-wide development
6. Tier-2 hardware – non-PPARC funding
7. CERN staff – quality assurance
8. CERN hardware – pro-rata contribution
Established entering proposal writing phase…
ALL of these are required to address the LHC Computing Challenge
GridPP2 Priorities: £17.3m Funding Scenario
End-user Driven Programme: Focus on Production Grid Deployment
GridPP2 Priorities:
Tier-1/A Staff and Hardware – 25% Reduction
Tier-2 Staff – 30% Reduction
Applications – 25% Reduction
Middleware, Security and Networking – 40% Reduction
Tier-0/LCG Contribution – Fixed Cost …
GridPP Management – Fixed Cost
Assessment by Area:
Highest Priority – (Inter)National Grid via LCG
Best Value For Money – National Grid
End-user Driven Project – Grid interfaces required
Essential Grid & e-Science Component – Grid development required
Basis of Deployment – International negotiation required with PPARC
Added functionality – Operations Manager and Dissemination Officer
Risk: Reduced ability to develop and maintain Production Grid
£17.3m Funding Scenario
Tier 0 and LCG: Foundation Programme
Aim: retain UK influence
Ensure development programmes are linked
Project management: GridPP and LCG
Shared expertise:
LCG establishes the global computing infrastructure
Allows all participating physicists to exploit LHC data
Proposed funding determined based on:
– Recently increased funding at CERN supporting LCG (recognition of Grid importance within the LHC programme)
– Appropriate share for the UK
– Requirements of LCG Phase 2
– Past GridPP1 contribution
Required Foundation: LCG Fabric, Technology and Deployment
Tier 1: Reduced System
Reduce Tier 1 manpower resources by 25%:
[Table: GridPP2 Proposal vs. Reduced Services. CPU 2.0; Disk 1.5; AFS; Tape 2.5; Core Services; Operations; Networking 0.5; Security; Deployment; Experiments; Management; Total 19 (Proposal), 14 (Reduced)]
Reduce hardware by 25%:
3 MSI2K CPU and 850 TB of disk by 2007, c.f. requirement of 12 MSI2K and 2200 TB in 2007
Reduced International Significance and Ability to Contribute to Grid Deployment
Tier 1: Reduced Services
Reduced Services (Concentrate on delivery to LHC/LCG)
Reduce Application Support by 1 FTE
– No explicit support for BaBar at Tier-A and best-effort support for non-LHC experiments
Reduce Management by 0.5 FTE
– Compromises ability to contribute fully to external projects
Reduce Deployment Team by 1 FTE
– Reduced participation/support for Grid Deployment programme at UK centre
Remove Security Support (1 FTE) and spread load across Support Team
– Risk of security exposure and less focus on propagating security knowledge to other sites
Drop AFS Support (0.5 FTE)
– No specialist file service support
Reduce Operations Support and Core Services by 1 FTE
– Slower fixing of broken boxes. Reduced hardware utilisation
Reduce Hardware by 25%
– Concentrate on Data Services. Greater reliance on Tier-2 Centres for CPU-intensive jobs
Reduced International Significance and Ability to Contribute to Grid Deployment
Tier 2: Reduced System Support
Reduce Tier 2 manpower resources by 30%:
Reduce Hardware Support by 1 FTE
– Reduces hardware support at a given Tier-2 centre by 50%
Reduce User Support by 1 FTE (50%)
– Harder to induct users in Grid technology and support them afterwards
Reduce Data Management by 1 FTE (50%)
– No longer a dedicated specialist service (1/2 post) at each Tier-2 centre
Remove Network (1 FTE) and VO Management Services (1 FTE)
– Services required by all e-Science Grid users
– Rely on the local centres providing these: increasing risks if not responsive to GridPP requirements
[Table: GridPP2 Proposal vs. Reduced Service, by year (Y1, Y2, Y3). Rows: Hardware Support; Core Services 4.0; User Support; Specialist Services: Security 1.0, Resource Broker 1.0, Network, Data Management, VO Management; Existing Staff -4.0; GridPP Total SY]
Reduce ability to access significant Tier-2 resources via Production Grid by ~one third
Middleware, Security & Networking: Reduced Development
Reduce middleware manpower by 40%:
– Reduce Security by 1 FTE
  Omit local access and usage control framework
– Reduce Information Services by 2 FTE
  Matching funding needed for EGEE participation; Risks UK leadership; Other potential solution (MDS) falls short of LCG requirements
– Reduce Data & Storage by 1 FTE
  Reduced data replication capabilities
– No Workload Mgmt. Development
  No tech transfer from Core Programme; No leverage of OGSA development
– Reduce Networking by 1 FTE
  No active participation in UKLight Programme
– Rely upon non-GridPP developments
Significantly Increased Project Risk
Experiments priorities:
1. Data & Storage – Mission critical to PP
   Information Services & Monitoring – Essential for understanding grid
2. Security – Robust against hackers, denial-of-service, secure file storage..
   Networking – PP input to performance & provisioning
3. Workload – Brokering development highly desirable
[Table: Activity, Proposal, Reduced. Rows: Security; Info. Services & Monit.; Data & Storage; Workload 2.5 (Proposal), 0 (Reduced); Networking; TOTAL]
[Figure legend: Security, Middleware, Networking]
Reduce ability to develop/understand Middleware as part of Production Grid Environment
Middleware, Security & Networking: Reduced Development
[Figure: Cuts From Reduced Development Programme; legend: Security, Middleware, Networking]
Applications: Reduced Development
Reduce Applications Resources by 25%
5 FTEs removed (less than current programme)
– Cuts in ongoing programme of work
– Loss of leadership within experiments programme
– Reduced non-LHC involvement
– No new experiment involvement
Long-term problem running separate computing systems: Non-Grid & Grid
Disenfranchise sizeable part of the community
– MICE
– Linear Collider
– Non-accelerator Physics
– Phenomenology
[Figure: Expt A Server, Expt B Servers, Expt C, Expt D, Middleware]
Risk of lost leadership and expertise
Failure to engage the whole PP community
Regained Programme
Reduce GridPP2 programme by 15%, leading to the following areas being brought back into scope of the GridPP2 programme
Note cuts already included (in both scenarios):
– Dissemination de-scoping: two year funding within GridPP2
– Spine Point Savings: SP11 used as average for new University appointments
[Table: Losses [£m], Reduced Services (25% cut) vs. Regained Programme (15% cut). Rows: Tier-1 staff; Tier-1 hardware; Tier-2 staff; Middleware posts; Application posts; Dissemination De-scoping; Spine Point Savings; Total]
Proposed cuts would compromise large areas of the GridPP2 Prioritised Programme
GridPP2 Priority Areas: Regain to maintain Production Grid
Propose to regain this programme by reducing severity of cuts
£19.6m Funding Scenario
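The two scenario totals follow from the £23.1m request: a 25% cut gives roughly £17.3m and a 15% cut roughly £19.6m. A minimal Python sketch of that arithmetic is given here; the function name `reduced_budget` and the rounding to the nearest £0.1m are illustrative assumptions, not something stated in the proposal.

```python
from decimal import Decimal, ROUND_HALF_UP

# Requested total in £m, as quoted in the proposal.
FULL_PROPOSAL_M = Decimal("23.1")


def reduced_budget(cut_percent):
    """Budget in £m after an overall percentage cut, rounded to 0.1 (assumed rounding)."""
    remaining = FULL_PROPOSAL_M * (Decimal(100) - Decimal(cut_percent)) / Decimal(100)
    return remaining.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


reduced = reduced_budget(25)   # "Reduced Programme" scenario
regained = reduced_budget(15)  # "Regained Programme" scenario

print(f"25% cut: £{reduced}m")
print(f"15% cut: £{regained}m")
print(f"Difference between scenarios: £{regained - reduced}m")
```

Using `Decimal` rather than floats keeps the currency rounding exact, which matters when the result is compared against figures quoted to one decimal place.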
Tier 1: Regained Programme
Restore Tier 1 manpower resources to 87% of original proposal
Regained programme includes 13.5 (GridPP) (CCLRC) FTE
Hardware: Regain 3.5 MSI2K CPU (+15%) and 1 PB of disk (+17.5%) by 2007
System Support: Focus on outward programme
Restore 1 FTE for Experiment Support
– Continue as BaBar Tier A through to 2007
Restore 1 FTE to Deployment
– Regain participation/support for Grid Deployment programme
Restore 0.5 FTE to Management
– Enable participation in wider Grid programme
[Table: Reduced Services vs. Regained Programme. CPU 2.0; Disk 1.5; AFS 0.0; Tape 2.5; Core Services 2.0; Operations 2.5; Networking 0.5; Security 0.0; Deployment; Experiments; Management; Total]
Restores International significance and ability to lead Grid Deployment
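The regained hardware figures can be cross-checked against the reduced-scenario figures quoted earlier in the deck (3 MSI2K of CPU and 850 TB of disk, against a 2007 requirement of 12 MSI2K and 2200 TB). The sketch below assumes the "+15%" and "+17.5%" are increases relative to those reduced-scenario figures; that reading, and all the names used, are assumptions made for illustration.

```python
# Reduced-scenario Tier-1 hardware by 2007, as quoted in the deck.
REDUCED = {"cpu_msi2k": 3.0, "disk_tb": 850.0}
# 2007 requirement, as quoted in the deck.
REQUIRED = {"cpu_msi2k": 12.0, "disk_tb": 2200.0}
# Regained increases; assumed to be relative to the reduced scenario.
REGAIN_PERCENT = {"cpu_msi2k": 15.0, "disk_tb": 17.5}


def regained(resource):
    """Reduced-scenario figure scaled up by the regained percentage."""
    return REDUCED[resource] * (1 + REGAIN_PERCENT[resource] / 100)


def fraction_of_requirement(resource):
    """Share of the 2007 requirement met by the regained figure."""
    return regained(resource) / REQUIRED[resource]


for name in REDUCED:
    print(f"{name}: regained {regained(name):.2f}, "
          f"{fraction_of_requirement(name):.1%} of 2007 requirement")
```

Under this assumption the results land close to the rounded values on the slide (3.5 MSI2K and 1 PB), which suggests the percentages are indeed quoted against the reduced scenario.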
Tier 2: Regained Programme
Restore Tier 2 manpower resources to 86% of original proposal
[Table: Reduced Service vs. Regained Programme, by year (Y1, Y2, Y3). Rows: Hardware Support; Core Services 4.0; User Support; Specialist Services: Security 1.0, Resource Broker 1.0, Network, Data Management, VO Management; Existing Staff -4.0; GridPP Total SY]
Reprofile Hardware Support to later years
– Delays Production Grid Roll-Out but establishes longer-term support
Reprofile and Restore User Support to 2 FTE in the 2nd & 3rd years
– Induct users in Grid technology and support them as LHC turn-on approaches
Restore Data Management to 2 FTE
– Allows dedicated specialist service inc. 0.5 FTE at each Tier-2
Partially Restore the Network and VO Management Services to 0.5 FTE each
– Reduce reliance on local centres
– Reduces risk that they are not responsive to GridPP requirements
Restores ability to access managed Tier 2 resources via Production Grid
Middleware, Security & Networking: Regained Programme
Restore 1.0 FTE Data & Storage
– meet PP data replication requirements
Restore 1.0 FTE Information & Monitoring
– enable delivery of robust information services
Restore 0.5 FTE for Security
– enable application of local site policies
Restore 0.5 FTE Networking
– enable UKLight direct participation
Regain 1.5 FTE Workload
– enable viable job brokering development programme
Regained programme defined by
– mission criticality (experiment requirements driven)
– International/UK-wide lead
– leverage of EGEE, UK core and LCG developments
[Table: Activity, Proposal, Reduced, Regained. Rows: Security; Info-Mon; Data & Storage; Workload; Networking; TOTAL]
[Figure legend: Security, Middleware, Networking]
Restores ability to develop key Middleware as part of Production Grid Environment
Applications: Regained Programme
Restore Applications Resources
– 2 FTEs restored (regain current GridPP1 complement)
Ongoing programme of work can continue
– Difficult to involve experiment activity not already engaged within GridPP
Still a risk in providing Grid access across PP community
Would need re-scoping (or de-scoping) of current activities
Project would need to build on cross-experiment collaboration – GridPP1 already has experience
– GANGA: ATLAS & LHCb
– SAM: CDF & D0
– Persistency: CMS & BaBar
Encourage new joint developments across experiments
Current knowledge base maintained and current engagement protected
Ability to rescope to engage the whole PP community
Conclusions
GridPP2 proposal strategic aim: meet all particle physics computing requirements via production grid
Balanced programme
Priorities focus on LCG development and deployment
Recognise challenge in going from Prototype to Production systems
25% reduced funding scenario would require significant de-scoping
– One scenario presented, focussing on LHC end-user driven objectives
15% reduced funding scenario would re-enable key aspects of the GridPP2 programme
Regain:
– Tier-1: International significance and ability to lead Grid Deployment
– Tier-2: ability to access managed Tier-2 resources via Production Grid
– Middleware: ability to develop key Middleware as part of Production Grid Environment
– Applications: maintain leadership and ability to re-scope to engage the whole PP community