GridPP Overview (emphasis on beyond GridPP)
Tony Doyle - University of Glasgow
Collaboration Meeting, 6 September 2005
From: Tony Doyle - University of Glasgow, Science Committee Meeting, 3 February 2005
“2004 was a pivotal year, marked by extraordinary and rapid change with respect to Grid deployment, in terms of scale and throughput. The scale of the Grid in the UK is more than 2000 CPUs and 1PB of disk storage (from a total of 9,000 CPUs and over 5PB internationally), providing a significant fraction of the total resources required by [...]. A peak load of almost 6,000 simultaneous jobs in August, with individual Resource Brokers able to handle up to 1,000 simultaneous jobs, gives confidence that the system should be able to scale up to the required 100,000 CPUs by [...]. A careful choice of sites leads to acceptable (>90%) throughput for the experiments, but the inherent complexity of the system is apparent and many operational improvements are required to establish and maintain a production Grid of the required scale. Numerous issues have been identified that are now being addressed as part of GridPP2 planning in order to establish the required resource for particle physics computing in the UK.”
Most projects fail in going from prototype to production…
There are many issues: a methodical approach is required.
Executive Summary II
At the end of GridPP2 Year 1, the initial foundations of “The Production Grid” are built. The focus is on “efficiency”.
Some Open Questions..
What will it take to build upon this foundation?
What are the underlying problems?
Open Questions
1. "LCG Service Challenges" (plans for SC4 based on experience of SC3. How do we all prepare?)
2. "Running Applications on the Grid" (Why won't my jobs run?)
3. "Grid Documentation" (What documentation is needed/missing? Is it a question of organisation?)
4. "What value does GridPP add?"
5. "Beyond GridPP2 and e-Infrastructure" (What is the current status of planning?)
6. "Managing Large Facilities in the LHC era" (What works? What doesn't? What won't?)
7. "What is a workable Tier-2 Deployment Model?"
8. "What is Middleware Support?" (What is it really all about?)
Aim: to recognise the problems (at all levels), respond accordingly, and define appropriate actions.
Beyond GridPP2..
Funding from September 2007 will be incorporated as part of PPARC’s request for planning input for LHC exploitation from the LHC experiments and GridPP. This input will be considered by a Panel consisting of Prof. G. Lafferty (Chair), Prof. S. Watts and Dr. P. Harris, meeting over the summer to provide input to Science Committee in the Autumn.
An important issue to note is the need to ensure matching funding is fully in place for the full term of EGEE-2, anticipated to be 1st April 2006 to 31st March [...]. Such funding for SA1 and JRA1 is currently provided by PPARC through GridPP2, but this will terminate under current arrangements at the end of GridPP2 in August 2007.
LCG Tier-1 Planning (CPU & Storage)
Experiment requests are large, e.g. in 2008: CPU ~50 MSi2k, Storage ~50 PB!
They can be met globally except in UK expected to contribute ~7%. [Currently more]
First LCG Tier-1 Compute Law: CPU:Storage ~1 [kSi2k/TB]
Second LCG Tier-1 Storage Law: Disk:Tape ~1
(The number to remember is.. 1)
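The two Tier-1 rules of thumb and the ~7% UK share can be combined into a quick back-of-envelope check. The sketch below is illustrative only: the helper name `uk_tier1_estimate` is hypothetical, and it assumes that "Storage" means disk plus tape and that the "~1" ratios hold exactly.

```python
def uk_tier1_estimate(global_cpu_ksi2k, global_storage_tb, uk_share_percent=7):
    """Scale global Tier-1 requests by the UK share (hypothetical helper)."""
    uk_cpu = global_cpu_ksi2k * uk_share_percent / 100
    uk_storage = global_storage_tb * uk_share_percent / 100
    return {
        "cpu_ksi2k": uk_cpu,
        "storage_tb": uk_storage,
        # First law: CPU:Storage ~ 1 kSi2k per TB
        "cpu_per_tb": uk_cpu / uk_storage,
        # Second law: Disk:Tape ~ 1, so storage is split evenly (assumption)
        "disk_tb": uk_storage / 2,
        "tape_tb": uk_storage / 2,
    }


# Slide figures for 2008: ~50 MSi2k = 50,000 kSi2k and ~50 PB = 50,000 TB.
estimate = uk_tier1_estimate(50_000, 50_000)
print(estimate)
```

With the slide's round numbers, the UK share comes out at about 3,500 kSi2k of CPU and 3,500 TB of storage, which is consistent with the first law by construction.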
LCG Tier-1 Planning (Storage)
LCG Tier-1 Planning
RAL, UK: Pledged / Planned to be pledged
Table columns: CPU (kSI2K), Disk (Tbytes), Tape (Tbytes)
[...]: March 2005 detailed planning (bottom up) v26b [uncertainty on when within bid to PPARC]
PPARC signatures required in Q[...]
[...]:
(a) March 2005 detailed planning (bottom up) v26b [current plan]
(b) August 2005 minimal Grid (top down) [input requiring LHC-UK experiments support, further iteration(s)..]
LCG Tier-2 Planning
UK, Sum of all Federations: Pledged / Planned to be pledged
Table columns: CPU (kSI2K), Disk (Tbytes)
2006: October 2004 Institute MoU commitments [deployment, 2005]
Requirement currently less than “planned”; reduced CPU and disk currently delivered, need to monitor this..
PPARC signatures required in Q[...]
[...]:
(a) 2007 MoU, followed by pessimistic guess [current plan]
(b) August 2005 minimal Grid (top down) [input requiring LHC-UK experiments support, further iteration(s)..]
Third LCG Tier-2 Compute Law: Tier-1:Tier-2 CPU ~1
Zeroth LCG Law: There is no Zeroth law – all is uncertain
Fifth LCG Tier-2 Storage Law: CPU:Disk ~5 [kSi2k/TB]
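As a rough illustration of the Tier-2 rules of thumb, the sketch below derives Tier-2 CPU and disk from a given Tier-1 CPU figure. The function name `tier2_from_tier1` and the 1,000 kSi2k input are hypothetical placeholders, not planning numbers, and the "~1" and "~5" ratios are treated as exact.

```python
def tier2_from_tier1(tier1_cpu_ksi2k, t1_to_t2_cpu=1.0, cpu_per_disk_tb=5.0):
    """Apply the Tier-2 rules of thumb (hypothetical helper).

    t1_to_t2_cpu:    Tier-1:Tier-2 CPU ratio (~1 on the slide)
    cpu_per_disk_tb: Tier-2 CPU:Disk ratio in kSi2k per TB (~5 on the slide)
    """
    tier2_cpu = tier1_cpu_ksi2k / t1_to_t2_cpu
    tier2_disk_tb = tier2_cpu / cpu_per_disk_tb
    return tier2_cpu, tier2_disk_tb


# Placeholder input: 1,000 kSi2k of Tier-1 CPU (illustrative only).
cpu, disk = tier2_from_tier1(1000.0)
print(f"Tier-2 CPU ~{cpu:.0f} kSi2k, Tier-2 disk ~{disk:.0f} TB")
```

The contrast with the Tier-1 ratio of ~1 kSi2k/TB reflects the Tier-2s being compute-heavy relative to their storage.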
Cascaded Pledges..
T2 resource LCG pledges depend upon MoU commitments.
Current (Q2) status:
  SouthGrid has (already) met its MoU commitment.
  Other T2s have not.
The Q3 status will be reported to PPARC as the year 1 outturn (info. must be correct).
"What value does GridPP add?"
High Level Value added by GridPP
LCG
1. Enabling a rapid start for the LCG Project
Middleware
2. Generic Metadata Development
3. To provide common storage solutions for the UK
4. To provide and maintain a central Workload Management system in the UK
5. Security in the Grid environment
6. Information Monitoring System
7. Network development
Applications
8. Integration of the LHC experiment applications
9. Ganga Development
10. Integration with running experiments
11. Connecting with the Theory Community
12. Grid Portal Infrastructure
13. The Deployment Team
14. The Tier-2 structures
15. The Tier-1 infrastructure
16. Grid Support
17. Service Challenges Coordination
18. The GridPP Website
19. The GridPP Identity
20. Dissemination
21. Management
22. The UK Grid
"What happens when GridPP disappears?"
"Beyond GridPP2 and e-Infrastructure"
LHC EXPLOITATION PLANNING REVIEW
Input is requested from the UK project spokespersons, for ATLAS and CMS for each of the financial years 2008/9 to 2011/12, and for LHCb, ALICE and GridPP for 2007/8 to 2011/12.
Physics programme: Please give a brief outline of the planned physics programme. Please also indicate how this planned programme could be enhanced with additional resources. In total this should be no more than 3 sides of A4. The aim is to understand the incremental physics return from increasing resources.
Input was based upon PPAP roadmap input "E-Science and LCG-2" (26 Oct 2004) and feedback from CB (12 Jan & 7 July 2005).
3 page description: “The Grid for LHC Exploitation”, submitted in August 2005.
Beyond GridPP2..
3 page description: “The Grid for LHC Exploitation”
“In order to calculate the minimum amount of resource required at the UK Tier-1 and Tier-2 we have taken the total Tier-1 and Tier-2 requirements of the experiments multiplied by a UK ‘share’.”
Experiments should determine the “incremental physics return from increasing resources”.
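The "requirements multiplied by a UK share" calculation quoted above can be sketched as follows. The share percentages are those listed on the Tier-1 and Tier-2 requirements slides (ALICE 1%, ATLAS 10%, CMS 5%, LHCb 15%). The requirement figures are hypothetical placeholders, since the real values are not given in this text, and the additional "Other 20%" row is not modelled because its exact definition is not stated.

```python
# UK share per experiment, in percent, as listed on the requirements slides.
UK_SHARE_PERCENT = {"ALICE": 1, "ATLAS": 10, "CMS": 5, "LHCb": 15}


def minimal_uk_resource(total_requirements, share_percent=UK_SHARE_PERCENT):
    """Multiply each experiment's total requirement by its UK share.

    total_requirements maps experiment name to a global requirement
    (e.g. CPU in kSI2K or disk in TB). Hypothetical helper for illustration.
    """
    per_experiment = {
        exp: total_requirements[exp] * share_percent[exp] / 100
        for exp in share_percent
    }
    return per_experiment, sum(per_experiment.values())


# Placeholder requirements (NOT real planning numbers): 1,000 units each.
placeholder = {"ALICE": 1000, "ATLAS": 1000, "CMS": 1000, "LHCb": 1000}
per_exp, lhc_total = minimal_uk_resource(placeholder)
print(per_exp, lhc_total)
```

An experiment wishing to increase its share would simply change its entry in the share table, which is the lever the "incremental physics return" question refers to.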
UK Support for the LHC Experiments
The basic functionality of the Tier-1 is:
  ALICE - Reconstruction, Chaotic Analysis
  ATLAS - Reconstruction, Scheduled Analysis/strimming, Calibration
  CMS - Reconstruction
  LHCb - Reconstruction, Scheduled strimming, Chaotic Analysis
The basic functionality of the Tier-2s is:
  ALICE - MC Production, Chaotic Analysis
  ATLAS - Simulation, Analysis, Calibration
  CMS - Analysis, All Simulation Production
  LHCb - MC Production, No analysis
Support for the LHC Experiments in 2008
Status of current UK planning by experiment
Panels: UK Tier-1 (~7% of Global Tier-1); UK Tier-2 (pre-SRIF3)
Tier-1 Requirements
Minimal UK Grid – each experiment may wish to increase its share (tape omitted for clarity)
Table columns: Requirements, CPU (KSI2K), Disk (TB), UK Share
Table rows (UK Share): ALICE UK (1%), ATLAS UK (10%), CMS UK (5%), LHCb UK (15%), LHC Total, Other (20%), UK Total
Tier-2 Requirements
Initial requirements can be met via SRIF3 ([...])
Uncertain beyond this..
Table columns: Requirements, CPU (KSI2K), Disk (TB), UK Share
Table rows (UK Share): ALICE UK (1%), ATLAS UK (10%), CMS UK (5%), LHCb UK (15%), LHC Total, Other (20%), UK Total
Manpower
Input requirements for “minimal” Grid
Supports LHC and other experiments
Does not include wider E-Infrastructure (EGEE and beyond)
FTE Table
Column headers: FY05, FY06, FY07, FY08, FY09, FY10, FY11, FY12
Period labels: Apr/05 - Mar/06, Apr/06 - Mar/07, Apr/07 - Aug/07, Sep-07, FY07
Sub-column labels: GridPP, Other, Total; GridPP, Other, Total; GridPP, Other, New, Total; New
Rows: Tier-1 Operation, Tier-2 Operation, Grid Operations, Management, Application Interfaces, Middleware Support, Total
Estimated Costs
Naïve Full Economic Cost approach: ~£10m p.a.

FEC Table [k£]
Inflation factor: 1.05
                        FY07      FY08      FY09      FY10      FY11      FY12
Average Salary          £44.5k    £46.7k    £49.1k    £51.5k    £54.1k    £56.8k
Effective FEC fraction  100%
Total FEC per FTE       £89.0k    £93.5k    £98.1k    £103.0k   £108.2k   £113.6k

Cost Table
                        FY07      FY08      FY09      FY10      FY11      FY12
Tier-1 Staff            £779k     £1,402k   £1,472k   £1,545k   £1,623k   £1,704k
Tier-1 Hardware         £2,196k   £2,041k   £2,721k   £1,793k   £1,435k   £1,416k
Tier-1 Running Costs    £86k      £170k     £273k     £464k     £607k     £748k
Tier-2 Staff            £779k     £1,402k   £1,472k   £1,545k   £1,623k   £1,704k
Tier-2 Hardware         £1,484k   £957k     £1,286k   £870k     £605k     £632k
Tier-2 Running Costs    £69k      £184k     £265k     £401k     £489k     £590k
Grid Operations         £519k     £748k     £785k     £824k     £865k     £909k
Management              £130k     £234k     £245k     £155k     £162k     £170k
Application Interfaces  £415k     £748k     £687k     £721k     £757k     £795k
Middleware Support      £675k     £1,215k   £981k     £824k     £865k     £909k
Travel and Operations   £211k     £294k     £290k     £289k     £299k     £309k
Grand Total             £7,343k   £9,393k   £10,478k  £9,431k   £9,331k   £9,886k
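The FEC table appears to follow a simple rule: the average salary grows by the 1.05 inflation factor each year, and the total FEC per FTE is the salary plus an overhead equal to the FEC fraction (100%) of that salary. That reading is an inference from the tabulated numbers rather than something the slide states. The sketch below reproduces the "Total FEC per FTE" row under that assumption; the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

BASE_SALARY_K = Decimal("44.5")   # FY07 average salary in k£ (from the table)
INFLATION = Decimal("1.05")       # inflation factor (from the table)
FEC_FRACTION = Decimal("1.00")    # "Effective FEC fraction 100%" (assumed overhead)
YEARS = ["FY07", "FY08", "FY09", "FY10", "FY11", "FY12"]


def fec_per_fte(base_salary, inflation, fec_fraction, years):
    """Return {year: (salary, total FEC per FTE)} in k£, unrounded."""
    table = {}
    salary = base_salary
    for year in years:
        table[year] = (salary, salary * (1 + fec_fraction))
        salary = salary * inflation  # compound inflation into the next year
    return table


def round_k(value):
    """Round to one decimal place of k£, half up, as the table appears to do."""
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


fec_table = fec_per_fte(BASE_SALARY_K, INFLATION, FEC_FRACTION, YEARS)
for year, (salary, total) in fec_table.items():
    print(f"{year}: salary £{round_k(salary)}k, FEC per FTE £{round_k(total)}k")
```

Decimal arithmetic is used instead of floats because the FY08 figure (93.45) sits exactly on a rounding boundary. Dividing a staff cost line by the FEC per FTE gives the implied headcount, for example £1,402k / £93.5k is roughly 15 FTE for Tier-1 Staff in FY08.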
Cost Breakdown
Total: £9,393k
Viewpoint: Enabling Grids for E-science in Europe is “E-Infrastructure”
Deliver a 24/7 Grid service to European science:
  Build a consistent, robust and secure Grid network that will attract additional computing resources.
  Continuously improve and maintain the middleware in order to deliver a reliable service to users.
  Attract new users from industry as well as science and ensure they receive the high standard of training and support they need.
100 million euros / 4 years, funded by EU
>400 software engineers + service support
70++ European partners
Phase 2 Overview
EGEE is the Grid Infrastructure Project in Europe
  Take the lead in developing roadmaps, white papers, collaborations
  Organise European flagship events
  Collaborate with other projects (including CPS)
  Start date = April
UK partners: CCLRC + NeSC + PPARC (+TCD) (n.b. UK e-Science, not only HEP)
  NeSC: Training, Dissemination & Applications
  NeSC: Networking
  CLRC: Grid Operations, Support & Management
  CLRC: Middleware Engineering (R-GMA)
UK phase 2 added partners: Glasgow, ICSTM, Leeds(?), Manchester, Oxford, (+QMUL)
  Funded effort dedicated to deploying regional grids (+dissemination)
  UK T2 coordinators (+newsletter)
Summary
This meeting aims to address the uncertain areas of developing and maintaining a Production Grid.
Long-term planning ([...]) is one of these (particularly) uncertain areas.
LCG MoUs will be signed shortly, based upon worldwide planning.
GridPP is providing PPARC with planning input for the LHC Exploitation Grid (+input from ALICE, ATLAS, CMS, LHCb).
The (full economic) costs involved for even a minimal LHC Computing Grid are significant.
GridPP needs to demonstrate its wider significance (in order to enhance PPARC’s funding at a higher level).
EGEE 2 is starting, but beyond EGEE requires more planning.
Real work is required for "Beyond GridPP2 and e-Infrastructure": open for (tomorrow’s) discussion..