LCG: LHC Computing Grid Project Status Report, 12 February 2003. Denis Linglin. Last updated: 9/02/03 07:24.

Project Goals
applications – environment, common tools, frameworks, persistency, ..
computing system
  – data recording, reconstruction, managed storage (CERN)
  – global grid service of collaborating computer centres
  – global analysis environment
central role of data challenges
  – deploy & evolve
  – experience → confidence
Goal – Prepare and deploy the LHC computing environment to help the experiments analyse the data coming from the detectors

Two Phases
Phase 1 – 2002-05 – R&D
  – Applications: prototyping → development
  – Develop and operate a Grid Service
  – Computing Services TDR – July 2005
Phase 2 – 2006-08 – Construction & operation
  – Installation, commissioning and operation of the initial global LHC data analysis Grid

Requirements & Implementation (info from SC2)
SC2 brings together the four experiments and the Tier 1 Regional Centres
It identifies common domains and sets requirements for the project
  – may use an RTAG (Requirements and Technical Assessment Group)
  – limited scope, two-month lifetime with intermediate report
  – one member per experiment + experts
PEB manages the implementation
  – organizing projects, work packages
  – coordinating between the Regional Centres
  – collaborating with Grid projects
  – organizing grid services
SC2 approves the work plan and monitors progress
Organisation chart: LHCC, Computing RRB, Overview Board, Software and Computing Committee (SC2), Project Execution Board (PEB)

SC2 Requirements Specification: status of RTAGs (info from SC2)

RTAG                                        Final report
On applications:
  data persistency                          Apr 02
  software support process                  May 02
  mathematical libraries                    May 02
  detector geometry description             Oct 02
  Monte Carlo generators                    Oct 02
  applications architectural blueprint      Oct 02
  detector simulation                       Dec 02
On Fabrics:
  mass storage requirements                 May 02
On Grid technology and deployment area:
  Grid technology use cases                 Jun 02
  Regional Center categorisation            Jun 02

Current status of RTAGs (and available reports) on

Work Planning Status (info from SC2)
High-level planning paper prepared and presented to LHCC in July
Level 1 and 2 milestones agreed with LHCC referees – November 2002
PBS/WBS agreed with experiments – December 2002 see
Formal work plans agreed for
  – Data Persistency (POOL)
  – Support for the Software Process & Infrastructure (SPI)
  – Mass Storage
  – Core software services (SEAL)
Work plans in preparation:
  – Mathematical Libraries
  – Physics Interfaces (PI)
LHC Global Grid Service
  – First service definition in preparation → February 2003

LCG Level 1 Milestones
[Timeline figure: years 2002 to 2005, by quarter (Q1 to Q4)]
Milestones shown:
  – Hybrid Event Store available for general users
  – Distributed production using grid services
  – First Global Grid Service (LCG-1) available
  – Distributed end-user interactive analysis
  – Full Persistency Framework
  – LCG-1 reliability and performance targets
  – “50% prototype” (LCG-3) available
  – LHC Global Grid TDR
Timeline labels: applications grid service launch workshop; “Here we are”

LCG Project Implementation
PEB: 4 areas of work
  – Applications: Torre Wenaus
  – Grid deployment: Ian Bird
  – Fabrics: Bernd Panzer
  – Provision of Grid Technology: David Foster
Organisation chart: LHCC, Computing RRB, Overview Board, Software and Computing Committee (SC2), Project Execution Board (PEB)

Applications Area
Area manager – Torre Wenaus
Importance of RTAGs to define scope
Open weekly applications area meetings
Software Architects Forum
  – process for taking LCG-wide software decisions
Staffing of projects
  – CERN, experiments, other institutes
  – CERN resources being merged into a single group (EP/SFT), with people moving together into building 32

Simulation
RTAGs have defined formal requirements for LCG for:
  – detector geometry description
  – MC generators
  – detector simulation
Support required for both GEANT4 and FLUKA
GEANT4
  – independent collaboration, including HEP institutes, LHC and other experiments, other sciences
  – significant LHC-related resources (including CERN)
  – MoU being redefined now
  – need to ensure long-term support
  – CERN resources will be under the direction of the project
  – process for agreeing common LHC priorities

Grid Deployment Area
Area manager – Ian Bird
Planning, building, commissioning and operating a stable, reliable, manageable Grid for Data Challenges and the general analysis workload
Integrating fabrics from many Regional Centres and CERN

Distributed Analysis must work
CERN will provide the data reconstruction & recording service (Tier 0), but only a small part of the analysis capacity
Current planning for capacity at CERN + principal Regional Centres
  – 2002: 650 KSI2000 → <1% of capacity required in 2008
  – 2005: 6,600 KSI2000 → <10% of 2008 capacity
KSI2000 at CC-IN2P3: March 2002 ~190, Nov. 2002 ~275, March 2003 ~700
% CPU (LHC/∑CC-IN2P3) = 16% in 2002
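The capacity figures above imply a lower bound on the 2008 requirement, which the slide does not state directly. The following is a minimal sketch of that arithmetic, assuming the "<1%" and "<10%" shares are read as strict upper bounds; the variable names are illustrative and not from the presentation.

```python
# Planned capacity at CERN + principal Regional Centres, in KSI2000
# (thousands of SPECint2000 units), as quoted on the slide.
planned_ksi2000 = {2002: 650, 2005: 6_600}

# Upper bound, in percent, on the share of the 2008 requirement
# that each planned figure represents (also from the slide).
max_share_percent = {2002: 1, 2005: 10}

# "X is less than p% of the 2008 capacity" implies capacity_2008 > X * 100 / p.
lower_bounds = {
    year: planned_ksi2000[year] * 100 / max_share_percent[year]
    for year in planned_ksi2000
}
implied_min_2008 = max(lower_bounds.values())

# CC-IN2P3 capacity over one year, approximate values from the slide.
cc_in2p3 = [("March 2002", 190), ("Nov. 2002", 275), ("March 2003", 700)]
growth_factor = cc_in2p3[-1][1] / cc_in2p3[0][1]

for year, bound in lower_bounds.items():
    print(f"{year} figure implies 2008 capacity > {bound:,.0f} KSI2000")
print(f"Tightest bound: > {implied_min_2008:,.0f} KSI2000")
print(f"CC-IN2P3 growth, March 2002 to March 2003: x{growth_factor:.1f}")
```

Both planning figures point to a 2008 requirement above roughly 65,000 KSI2000, which is consistent with the slide's point that CERN and the principal centres cover only a small fraction of the final need.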

Data Challenges in 2002

6 million events, ~20 sites

grid tools used at 11 sites

Grid Deployment
Experiments can do (and are doing) their event production using distributed resources with a variety of solutions
  – classic distributed production: send jobs to specific sites, simple bookkeeping
  – some use of Globus, and some of the HEP Grid tools
  – other integrated solutions (ALIEN)
The hard problem for distributed computing is data analysis – ESD and AOD
  – chaotic workload
  – unpredictable data access patterns
This is the problem that the LCG has to solve, and this is where Grid technology should really help

Deploying the LHC Grid
The priority for 2003 is to move from testbeds to a SERVICE
We need to learn how to OPERATE a Grid Service
Quality and Reliability are as important as functionality

Grid Deployment Board
Chair – Mirco Mazzucato
  – representatives from the experiments and from each country with an active Regional Centre taking part in the LCG Grid Service
  – forges the agreements, takes the decisions, defines the standards and policies that are needed to set up and manage the LCG Global Grid Services
  – coordinates the planning of resources for physics and computing data challenges
First meeting: 4 October in Milano
First task is the detailed definition of LCG-1, the initial LCG Global Grid Service

Grid Deployment - The Strategy
Get a basic grid service into production so that we know what works, what doesn’t, and what the priorities are, and evolve from there to the full LHC service
Agree on a common set of middleware to be used for the first LCG grid service, LCG-1
Target:
  – full definition of LCG-1 by February 2003
  – LCG-1 in operation mid-2003
  – LCG-1 in full service by end of 2003
This will be conservative (stability before functionality) and will not satisfy all of the HEPCAL requirements, but it must be sufficient for the data challenges scheduled in 2004

Centres taking part in LCG-1: around the world → around the clock

Centres taking part in LCG-1
Centres that have declared resources – Dec. 2002
Tier 0
  – CERN
Tier 1 Centres
  – Brookhaven National Lab
  – CNAF Bologna
  – Fermilab
  – FZK Karlsruhe
  – IN2P3 Lyon
  – Rutherford Appleton Lab (UK)
  – University of Tokyo
  – CERN
Other Centres
  – Academia Sinica (Taipei)
  – Barcelona
  – Caltech
  – GSI Darmstadt
  – Italian Tier 2s (Torino, Milano, Legnaro)
  – Manno (Switzerland)
  – Moscow State University
  – NIKHEF Amsterdam
  – Ohio Supercomputing Centre
  – Sweden (NorduGrid)
  – Tata Institute (India)
  – TRIUMF (Canada)
  – UCSD
  – UK Tier 2s
  – University of Florida, Gainesville
  – University of Prague
  – ……

LCG-1 as a service for LHC experiments
Mid-2003
  – 5-10 of the larger regional centres
  – available as one of the services used for simulation campaigns
2H03
  – add more capacity at operational regional centres
  – add more regional centres
  – activate operations centre, user support infrastructure
Early 2004
  – principal service for physics data challenges

Grid Technology in LCG
LCG expects to obtain Grid Technology, along with maintenance and support, from projects funded by national and regional e-science initiatives and, later, from industry

Grid Technology in LCG
Coordination by the project CTO – David Foster
This area of the project is concerned with
  – ensuring that the LCG requirements are known to current and potential Grid projects
  – active lobbying for suitable solutions: influencing plans and priorities
  – evaluating potential solutions
  – negotiating support for tools developed by Grid projects
  – developing a plan to supply solutions that do not emerge from other sources
BUT this must be done with caution
  – important to avoid HEP-SPECIAL solutions
  – important to migrate to standards as they emerge (avoid emotional attachment to prototypes)

Grid Technology Status
A base set of requirements has been defined (HEPCAL, HEP common application layer):
  – 43 use cases
  – ~2/3 of which should be satisfied ~2003 by currently funded projects
Good experience of working with Grid projects in Europe and the United States
Practical results from testbeds used for physics simulation campaigns
GLUE initiative has shown how to integrate the EDG and VDT toolkits
An initial agreement is being made on a joint toolkit for LCG-1

Grid Technology Status
We are still solving basic reliability & functionality problems
  – This is worrying, as we still have a long way to go to get to a solid service
  – At end 2002, a solid service in mid-2003 looks (surprisingly) ambitious
HEP needs to limit divergence in developments
  – Complexity adds cost
We have not yet addressed system-level issues
  – How to manage and maintain the Grid as a system providing a high-quality, reliable service
  – Current developments offer few tools for, and little treatment of, problem determination, error recovery, fault tolerance, etc.
Some of the advanced functionality we will need is only being thought about now
  – Comprehensive data management, SLAs, reservation schemes, interactive use
Many initiatives are underway and more are coming
How do we manage the complexity of all this?

Establishing Priorities
We need to create a basic infrastructure that works well
  – LHC needs a systems architecture and high-quality middleware that is reliable and fault tolerant
  – Tools for systems administration
  – Focus on mainline physics requirements and robust data handling
  – Simple end-user tools that deal with the complexity
Need to look at the overall picture of what we are trying to do and focus resources on key priority developments
We must simplify and make the simple things work well
It is easy to expand scope, much harder to contract it!

Grid Technology – Next Steps
Leverage the considerable investments being made
  – proposals being prepared for EU 6th Framework Programme, NSF-DoE funding round, various national science infrastructure funding opportunities
Priority target: hardening/re-engineering of current prototypes with correctly funded maintenance and support
But expect several major architectural changes before things mature

Target for the end of the decade
LHC data analysis using “global collaborative environments integrating large-scale, globally distributed computational systems and complex data collections linking tens of thousands of computers and hundreds of terabytes of storage”
The researchers concentrate on science, unaware of the details and complexity of the environment they are exploiting
Success will be when the scientist does not mention the Grid

A few things to keep in mind
A global grid infrastructure needs a coordinated management structure
Middleware for a global infrastructure
  – International development programme
  – World-wide support & maintenance
  – Regional and national sensitivities
Avoid HEP specials
  – Basic middleware for global science, not just for HEP
  – Plan for convergence with industrial solutions
Collaborative, complementary development projects
  – partnership of computer science, software engineering, scientists
  – funding from multiple agencies: national, regional, ..

Grid Technology Summary
Many R&D projects funded
  – to develop and demonstrate middleware
  – limited duration, many already in mid-life
Excellent initial experience
  – shows the potential for science grids
  – has given a lot of insight
  – but we now understand that this is very hard to do
Consolidation of the results and coordination of future efforts is now needed to build a solution for LHC
A priority now is to
  – harden/re-implement the current prototypes and pilot products
  – understand support issues
  – add the essential missing features for a production environment that were not part of the R&D projects

Fabric Area
Area manager – Bernd Panzer
CERN Tier 0+1 centre
  – high-performance data recording
  – automated systems management & operation
  – integration in LHC Grid
Tier 1,2 centre collaboration
  – develop/share experience on installing and operating a Grid
  – exchange information on planning and experience of large fabric management
  – look for areas for collaboration and cooperation
  – use HEPiX as the communications forum
Technology tracking & costing
  – new technology assessment (PASTA III) just completed (Feb 03)
  – re-costing of Phase II will be done in 1H03 in light of PASTA III and of the re-assessment of experiment trigger rates and event sizes (LHCC), but with no significant re-assessment of the analysis model

Mass Storage Requirements
Current mass storage requirements defined by ALICE for high-performance data recording
  – 350 MB/sec in 2002 → 750 MB/sec in 2005 → 1.2 GB/sec in 2008
Attempt to define requirements for mass storage support for analysis stalled
  – analysis model not clear enough
  – worrying for Tier 1 centres
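To give a sense of scale for the recording rates above, the sketch below converts each rate into a daily data volume. It assumes sustained recording for a full 24 hours and decimal units (1 GB = 1000 MB, 1 TB = 10^6 MB); neither assumption comes from the slide, and real duty cycles would be lower.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000  # decimal units (assumption)

# Recording rates from the slide, in MB/s; 1.2 GB/s is taken as 1200 MB/s.
rates_mb_per_s = {2002: 350, 2005: 750, 2008: 1200}


def daily_volume_tb(rate_mb_per_s: float) -> float:
    """Data volume in TB written in one day at a constant rate (hypothetical helper)."""
    return rate_mb_per_s * SECONDS_PER_DAY / MB_PER_TB


daily_tb = {year: daily_volume_tb(rate) for year, rate in rates_mb_per_s.items()}

for year, volume in daily_tb.items():
    print(f"{year}: {rates_mb_per_s[year]} MB/s is about {volume:.2f} TB/day")
```

Under these assumptions, the 2008 target corresponds to roughly 100 TB per day of continuous recording.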

Resources in Regional Centres
Estimates of resources in Regional Centres are being gathered by the Grid Deployment Board
Expect to be complete this month
Then we will compare with Data Challenge requirements
Delivery efficiency is a key factor, hard to estimate at present

Resources at CERN

[Chart: Requested, Committed and Cumulative Balance, in FTE-years, 2002 to 2005]

Computing Materials at CERN: Infrastructure + Physics

Challenges - I
General background
Complexity of the project: Regional Centres, Grid projects, experiments, funding sources and funding motivation
The project is operating in an environment where
  – there is already a great deal of activity: applications software, data challenges, grid testbeds
  – requirements are changing as understanding and experience develop
Fundamental technologies are evolving independently of the project and LHC

Challenges - II
Going well
  – Obtaining agreement on common requirements between the LHC experiments
  – Integrating all of the players in implementation teams: CERN staff and visitors, experiments, other institutes
  – Resources in Regional Centres, but we need to understand delivery efficiency
Going reasonably well
  – Influence on external projects to which LCG supplies resources: GEANT4, ROOT
  – Influence on grid projects and evolution

Challenges - III
Still in question
Production-quality service on a Grid: harder than it looks
  – Proceed with caution, with realistic targets
  – Urgent to establish how well middleware works, and to get suppliers focused on support and stability
Grids imply operation and management by the community: evolution from empires to a federation
We are a long way from demonstrating that we can do effective ESD analysis on a Grid

